A MONTE CARLO ANALYSIS OF EXPERIMENTWISE AND COMPARISONWISE
TYPE I ERROR RATE OF SIX SPECIFIED MULTIPLE COMPARISON
PROCEDURES WHEN APPLIED TO SMALL k's AND EQUAL AND
UNEQUAL SAMPLE SIZES
DISSERTATION
Presented to the Graduate Council of the
North Texas State University in Partial
Fulfillment of the Requirements
For the Degree of
DOCTOR OF PHILOSOPHY
By
William R. Yount, B.S., M.R.E., Ed.D.
Denton, Texas
December, 1985
Yount, W., A Monte Carlo Analysis of Experimentwise and
Comparisonwise Type I Error Rates of Six Specified Multiple
Comparison Procedures When Applied to Small k's and Equal and
Unequal Sample Sizes. Doctor of Philosophy (Educational
Research), December, 1985, 201 pp., 26 tables, 10 figures,
bibliography, 111 titles.
The problem of this study was to determine the dif-
ferences in experimentwise and comparisonwise Type I error
rate among six multiple comparison procedures when applied to
twenty-eight combinations of normally distributed data. These
were the Least Significant Difference, the Fisher-protected
Least Significant Difference, the Student Newman-Keuls Test,
the Duncan Multiple Range Test, the Tukey Honestly Sig-
nificant Difference, and the Scheffe Significant Difference.
The Spjøtvoll-Stoline and Tukey-Kramer HSD modifications were
used for unequal n conditions.
A Monte Carlo simulation was used for twenty-eight
combinations of k and n. The scores were normally distributed
(μ=100; σ=10). Specified multiple comparison procedures were
applied under two conditions: (a) all experiments and (b)
experiments in which the F-ratio was significant (0.05).
Error counts were maintained over 1000 repetitions.
The FLSD held experimentwise Type I error rate to
nominal alpha for the complete null hypothesis. The FLSD was
more sensitive to sample mean differences than the HSD while
protecting against experimentwise error. The unprotected LSD
was the only procedure to yield comparisonwise Type I error
rate at nominal alpha. The SNK and MRT error rates fell
between the FLSD and HSD rates. The SSD error rate was the
most conservative. Use of the harmonic mean of the two
unequal sample n's (HSD-TK) yielded uniformly better results
than use of the minimum n (HSD-SS). Bernhardson's formulas
controlled the experimentwise Type I error rate of the LSD
and MRT to nominal alpha, but pushed the HSD below the 0.95
confidence interval. Use of the unprotected HSD produced
fewer significant departures from nominal alpha. The for-
mulas had no effect on the SSD.
TABLE OF CONTENTS
Page
LIST OF TABLES
LIST OF ILLUSTRATIONS vii
Chapter
I. INTRODUCTION 1
Statement of the Problem
Purpose of the Study
Hypotheses
Significance of the Study
The Model of the Study
Definitions
Assumptions
Chapter Bibliography
II. SYNTHESIS OF RELATED LITERATURE 15
Introduction to Multiple Comparisons
The Concepts of Error Rate and Power
Types of Error Rates
The Concept of Power
Implications of Error Rate and Power
The Development and Definition of Multiple Comparison Procedures
Multiple Comparisons in Graduate Research
The Critical Difference Values of Multiple Comparison Procedures
The Least Significant Difference
Tukey's Honestly Significant Difference
The Student Newman-Keuls Test
The Duncan Multiple Range Test
The Scheffé Significant Difference
The Research of Carmer and Swanson
The Carmer and Swanson Model
Articles Citing the Carmer and Swanson Studies
The Research of Clemens Bernhardson
Chapter Bibliography
III. PROCEDURES 67
The Simulation Plan
Generating Random Numbers
Interpolating Critical Value Tables
The Main BASIC Program
Summary
Chapter Bibliography
IV. ANALYSIS OF DATA 86
Testing the Hypotheses
Related Findings
Chapter Bibliography
V. CONCLUSIONS, RECOMMENDATIONS, AND SUGGESTIONS FOR FURTHER STUDY 97
Appendix A 108 Chi-square test of random number generator
Appendix B 119 Two Samples of Data Generated by the Main Program
Appendix C 120 Main BASIC Program Listing
Appendix D 135 Analysis of the Stepwise Testing Procedure
Appendix E 140 Summary Sheets from Data Analysis
Appendix F 169 List of Articles Citing Carmer and Swanson Studies
Appendix G Critical Differences for each Multiple Comparison Procedure for equal n's for k=3 to k=6
Appendix H 180 Critical Differences for each Multiple Comparison Procedure for unequal n's for k=3 to k=6
Appendix I Results of z-tests between FLSD and Protected and Unprotected HSD Procedures for each k,J
Appendix J Graphic Displays of Error Rates for Protected and Unprotected Multiple Comparison Procedures
BIBLIOGRAPHY
LIST OF TABLES
Table Page
I. Comparison Between Experimentwise and Per Comparison Error Rates 17
II. Comparison Between Error Rates of the SNK and MRT Procedures 35
III. Multiple Comparison Procedures Used in Dissertations on File With Dissertation Abstracts International 37
IV. Comparison of Critical Values of the (F)LSD and HSD Multiple Comparison Procedures as k Varies 39
V. Comparison of Critical Values of (F)LSD, HSD, and SNK Multiple Comparison Procedures as r Varies 44
VI. Comparison of Critical Values of (F)LSD, HSD, SNK, and MRT Multiple Comparison Procedures as r Varies 46
VII. Comparison of Critical Values of (F)LSD, HSD, SNK, MRT and SSD Multiple Comparison Procedures as r Varies 48
VIII. Mean Chi-Square Values for Ten Repetitions of N = 1000 Scores and a Given μ 69
IX. Mean Chi-Square Values for Ten Repetitions of N Scores with μ = 20 71
X. Mean and Standard Deviation Values for Ten Sets of N = 10,000 Scores 72
XI. Critical Differences for Each of the Testing Procedures for k=3 and J=1 TO 5 176
XII. Critical Differences for Each of the Testing Procedures for k=4 and J=1 TO 5 177
XIII. Critical Differences for Each of the Testing Procedures for k=5 and J=1 TO 5 178
XIV. Critical Differences for Each of the Testing Procedures for k=6 and J=1 TO 5 179
XV. Critical Differences for Each of the Testing Procedures for k=3 and J=6 TO 7 180
XVI. Critical Differences for Each of the Testing Procedures for k=4 and J=6 TO 7 181
XVII. Critical Differences for Each of the Testing Procedures for k=5 and J=6 TO 7 182
XVIII. Critical Differences for Each of the Testing Procedures for k=6 and J=6 TO 7 183
XIX. Variables Associated with Specified Error Counts for Each Multiple Comparison Procedure and Two Kinds of Type I Error 81
XX. Counts of Significant F-Ratios for 1000 Repetitions of Each k,J Combination 83
XXI. Experimentwise Error Rates for Multiple Comparison Procedures Averaged Across Unequal N's 87
XXII. Comparisonwise Error Rates for Multiple Comparison Procedures Averaged Across Unequal N's 89
XXIII. Experimentwise Error Rates for Multiple Comparison Procedures Averaged Across Equal N's 90
XXIV. Comparisonwise Error Rates for Multiple Comparison Procedures Averaged Across Equal N's 91
XXV. Z-tests for Significant Difference of Proportions Between Experimentwise Type I Error Rates for the HSD and FLSD Multiple Comparison Procedures 185
XXVI. Z-tests for Significant Difference of Proportions Between Experimentwise Type I Error Rates for the Unprotected HSD and FLSD Multiple Comparison Procedures 186
LIST OF ILLUSTRATIONS
Figure Page
1. Three Kinds of Error in Hypothesis Testing 20
2. Effect of Size of Difference between Means on Error Rates and Power 23
3. Effect of Population Variance on Error Rates and Power 24
4. Effect of Per Comparison and Experimentwise Type I Error Rates on Power 25
5. The Randomized Block Design 53
6. Comparison of Two Research Designs 54
7. Graphic Presentation of Experimentwise Type I Error Rates in Relation to 0.95 Confidence Interval for α=0.05 and N=1000 Generated by Application of Bernhardson Formulas for k=3 and k=4 187
8. Graphic Presentation of Experimentwise Type I Error Rates in Relation to 0.95 Confidence Interval for α=0.05 and N=1000 Generated by Application of Bernhardson Formulas for k=5 and k=6 188
9. Graphic Presentation of Experimentwise Type I Error Rates in Relation to 0.95 Confidence Interval for α=0.05 and N=1000 Generated Without Prior Significant F-ratio for k=3 and k=4 189
10. Graphic Presentation of Experimentwise Type I Error Rates in Relation to 0.95 Confidence Interval for α=0.05 and N=1000 Generated Without Prior Significant F-ratio for k=5 and k=6
CHAPTER I
INTRODUCTION
One of the most popular and useful statistical tech-
niques in research is analysis of variance (9, p. 237). The
most common use of analysis of variance is in testing the
hypothesis that k > 2 population means are equal (19, p.
90). The purpose is to determine whether the sample means
are indicative of experimental treatment effects or merely
reflect chance variation (17, p. 511). Two statistical
conclusions are possible. Either the null condition of μ1 =
μ2 = ... = μk is tenable or it is rejected. But the re-
jection of the null hypothesis tells us nothing about which
means differ significantly from which other means (8, p.
368). Therefore, a significant omnibus F-ratio may raise
more questions than it answers (9, p. 275). When researchers
want to know which means in an experiment differ sufficiently
to produce the significant F-ratio, they study differences
between pairs of means by using search techniques called
multiple comparison procedures (17, p. 511). However, these
procedures vary in definition and implementation. It is
difficult to understand the differences between the various
approaches or to select the procedure which will yield the
most reliable results (2, p. 738).
In an effort to empirically clarify the problem of
selecting the appropriate multiple comparison procedure,
Carmer and Swanson conducted two Monte Carlo studies. The
procedures studied in 1971 were the Least Significant Dif-
ference (LSD), the Fisher-protected Least Significant Dif-
ference (FLSD), Tukey's Honestly Significant Difference
(HSD), Duncan's Multiple Range Test (MRT), and the Bayes
Least Significant Difference (BLSD). Their recommendation
was for the Fisher-protected Least Significant Difference (3,
p. 945).
The 1973 study included the five multiple comparison
procedures of the 1971 study and added the Scheffe Sig-
nificant Difference (SSD), the Student Newman-Keuls Test
(SNK), and a second Bayesian procedure called the Bayes Exact
Test (BET). The Bayes Least Significant Difference was
renamed the Bayesian Significant Difference (BSD). The
Fisher-protected Least Significant Difference was refined
into three approaches. The Least Significant Difference was
applied when the F-ratio was found significant at the 0.01
level (FSD1), the 0.05 level (FSD2), and the 0.10 level
(FSD3) (4, p. 67). Their recommendation was for the FSD2:
the Least Significant Difference when the F-ratio is sig-
nificant at 0.05. The HSD, SSD, SNK and FSD1 were eliminated
because they lacked power. The FSD3, LSD and MRT were
eliminated because they did not sufficiently protect against
experimentwise Type I errors. The BSD was eliminated because
the BET did slightly better. Both the BET and FSD2 were
recommended, but due to its parsimony, the FSD2 was recom-
mended as the multiple comparison of choice (4, p. 74).
However, Einot and Gabriel state that the Carmer and
Swanson studies are "misleading" because their conclusions
are simple consequences of the two basic kinds of Type I
error rate defined by the techniques, rather than the tech-
niques themselves (6, pp. 574-575). Some multiple comparison
procedures use an experimentwise Type I error rate while
others use a comparisonwise Type I error rate (6, p. 575; 9,
p. 278; 21, p. 327). Einot and Gabriel fault the Carmer and
Swanson studies for failing to consider the different Type I
error rates (6, p. 574).
Their solution to this problem was to set all multiple
comparison procedures to the same experimentwise Type I error
rate and compare them empirically through the use of a Monte
Carlo simulation. From this study they recommended Tukey's
Honestly Significant Difference (HSD) for its "elegant
simplicity" and power, which they reported was "little below
that of any other method." The Fisher-protected Least Sig-
nificant Difference (FLSD) was rejected because of its
liberal experimentwise Type I error rate (6).
Comparable conflict surrounds the Student Newman-Keuls
Test. Recent statistical texts recommend the Student Newman-
Keuls as the procedure of choice (7, p. 312; 8, p. 376; 9,
p. 307). However, Einot and Gabriel reject it because of its
excessive experimentwise Type I error rate (as compared to
the HSD) (6, p. 582). Likewise, Carmer and Swanson reject it
for its inability to detect real differences among means (as
compared to the FLSD) (4, p. 73). The confusion over mul-
tiple comparison procedures does not stop with Student
Newman-Keuls. Kirk summarizes his chapter on multiple com-
parisons by emphasizing that each has been recommended by one
or more statisticians (19, p. 127). Thus it is obvious that
conflicting recommendations abound in the area of multiple
comparisons.
Statement of the Problem
The problem of this study was to determine the dif-
ferences in experimentwise and comparisonwise Type I error
rates among six specified multiple comparison procedures.
Purpose of the Study
The purpose of this study was to empirically analyze the
Least Significant Difference, the Fisher-protected Least
Significant Difference, the Student Newman-Keuls Test, the
Duncan Multiple Range Test, the Tukey Honestly Significant
Difference, and the Scheffé Significant Difference in terms
of their error rates when applied to k,J experimental com-
binations of normally distributed data generated by Monte
Carlo methods.
Hypotheses
The first hypothesis of this study was that there would
be no difference in the ranking of error rates found by
Carmer and Swanson (1973) using large k's and equal n's and
the ranking obtained using small k's and unequal n's.
The second hypothesis of this study was that there would
be no statistically significant difference in experimentwise
Type I error rate between the HSD and FLSD procedures when
using the Bernhardson formulas.
Significance of the Study
This study was considered significant in that it
empirically investigated the error rates for six multiple
comparison procedures for specified k,J experimental combina-
tions of simulated data generated by Monte Carlo methods.
The question was whether the findings of Carmer and Swanson,
applicable to the large k's and equal n's found in agricul-
tural research, generalize to smaller k's and unequal n's
prevalent in educational research.
The study was further considered significant in that it
focused on the FLSD technique to determine if this method,
using the error rate definitions of Bernhardson, can yield
acceptable control for experimentwise Type I errors for data
common in educational research. Myette and White stated that
"further replications of the Carmer and Swanson (1973) [4.]
and Bernhardson (1975) [1] studies need to be conducted." If
further replications confirm that the two-stage t-test is as
accurate as it appears to be, then "the extensive work in
developing new techniques and modifications of existing
techniques may be focused in the wrong areas." Instead of
creating more techniques, "it is more important to systema-
tically integrate the information we now have and to deter-
mine if this simple approach is not only more parsimonious,
but just as accurate" (20, p. 14).
The Model of the Study
Data for this study was generated according to a com-
pletely randomized design. This is the model used by Kesel-
man and others in several Monte Carlo studies (13, p. 99; 15,
p. 127; 16, p. 48; 18, p. 585). The equation for the com-
pletely randomized design is given by
Yij = μ + τj + ε(i)j          Eq. 1
where μ is the population mean, τj is the effect of treatment
level j subject to the restriction that all effects sum to
zero, and ε(i)j is normally distributed experimental error
(19, p. 135). The population mean in this study was set to
one hundred (100), all treatment effects were set to zero,
and the normally distributed experimental error was simulated
by a pseudo-random number generator (See Appendix A for a
description of the generator).
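The dissertation's own simulation program was written in BASIC (Appendix C); as an illustrative sketch only, the completely randomized design of Eq. 1 under the complete null can be expressed in modern code, here using a library normal generator in place of the Appendix A pseudo-random generator:

```python
import random

def generate_experiment(k, n, mu=100.0, sigma=10.0):
    """Simulate one experiment under Eq. 1, Yij = mu + tau_j + e(i)j.

    Under the complete null every treatment effect tau_j is zero, so
    each of the k groups of n scores is drawn from the same
    N(100, 10^2) population.
    """
    tau = [0.0] * k  # all treatment effects set to zero
    return [[mu + tau[j] + random.gauss(0.0, sigma) for _ in range(n)]
            for j in range(k)]

groups = generate_experiment(k=3, n=5)
```

The function name and defaults here are invented for the example; only the model and the (μ=100, σ=10) parameters come from the study.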
Definitions
Bernhardson formulas.—Formulas developed by Clemens
Bernhardson (4) calculate αpc and αew only after a prelimi-
nary significant F-test.
Critical difference.—The critical difference of a
multiple comparison procedure is the computed difference
between two means required to declare them significantly
different. It is computed by multiplying the procedure's
critical value, taken from the appropriate statistical table,
by the standard error of difference between the two means.
Critical value.—The critical value of a multiple com-
parison procedure is the value drawn from a critical value
table designed for that comparison. The value depends on the
level of significance desired, the number of error degrees of
freedom, and, for some procedures, the number of means in the
experiment or steps between ordered means.
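The two definitions above combine as: critical difference = critical value × standard error of the difference between the two means. A minimal sketch, assuming the t-based form used by procedures such as the LSD (the function name is invented for this example):

```python
import math

def critical_difference(critical_value, ms_error, n1, n2):
    """Tabled critical value times the standard error of the
    difference between two means (t-based form, as for the LSD)."""
    se_diff = math.sqrt(ms_error * (1.0 / n1 + 1.0 / n2))
    return critical_value * se_diff

# e.g., a tabled critical value of 2.0, MS error of 100, n = 10 per group
cd = critical_difference(2.0, 100.0, 10, 10)
```

Range-based procedures such as the HSD use a differently scaled standard error with their studentized-range critical values, so this form is not universal.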
Experimentwise Type I error.—Type I errors are treated
differently by various multiple comparison procedures. Some
are based on an experimentwise Type I error rate, αew. This
rate is defined as the long run proportion of the number of
experiments containing at least one Type I error divided by
the total number of experiments (9, p. 278).
FLSD.—The term FLSD refers to the Fisher-protected
Least Significant Difference procedure which applies the
"unprotected LSD" only after a preliminary F-test is found to
be significant (4, p.67). The FLSD is also called a "two-
stage LSD" ( H , p. 884-).
(F)LSD.—The term (F)LSD refers to both FLSD and LSD
multiple comparison procedures. It is used when references
can apply to either procedure, such as in computing critical
differences used in testing. The (F)LSD is based on αpc (3,
4, 5).
HSD.—The term HSD refers to Tukey's Honestly Sig-
nificant Difference multiple comparison procedure which was
developed in 1953. It is one of the most widely used mul-
tiple comparison procedures (19, p. 116). Like the (F)LSD,
it is a simultaneous test procedure in that it uses one
critical value for all comparisons (7, p. 311). The HSD is
based on αew (21, p. 327).
k,J combination.—This term refers to two major vari-
ables in this study: the number of groups in an experiment,
k, and the sample size category, J. There were four levels
of k representing three, four, five and six groups. There
were seven levels of J. J(1) through J(5) represented equal n
sample sizes of 5, 10, 15, 20, and 25 respectively. J(6)
represented an unequal set of nj's in the ratio of
1:2:3:4:5:6 with n1=10. That is, when k=3, the sample n's
were 10, 20, and 30. When k=6, the sample n's were 10, 20,
30, 40, 50 and 60. J(7) represented a set of nj's in the
ratio of 4:1:1:1:1:1 with n1=80. That is, when k=3, the
sample n's were 80, 20 and 20. This provided twenty-eight
combinations of k,J.
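The k,J scheme above can be enumerated directly. A small sketch (function and variable names invented for this illustration):

```python
def sample_sizes(k, J):
    """Group sample sizes for one k,J combination as defined above."""
    if 1 <= J <= 5:          # J(1)..J(5): equal n's of 5, 10, 15, 20, 25
        return [5 * J] * k
    if J == 6:               # ratio 1:2:...:k with n1 = 10
        return [10 * (j + 1) for j in range(k)]
    if J == 7:               # ratio 4:1:...:1 with n1 = 80
        return [80] + [20] * (k - 1)
    raise ValueError("J must be 1 through 7")

# 4 levels of k (3..6) times 7 levels of J gives the 28 combinations
combinations = [(k, J) for k in range(3, 7) for J in range(1, 8)]
```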
LSD.—The term LSD refers to the Least Significant
Difference multiple comparison procedure. For purposes of
this study, the LSD was applied in the same way as the mul-
tiple t-test (11, p. 521), sometimes referred to as the
"ordinary" LSD or the "unrestricted" LSD (5, p. 10). This is
done to distinguish it from the FLSD which is a "protected"
(4, p. 67) or "restricted" (5, p. 11) LSD test.
Monte Carlo method.—The Monte Carlo method consists of
generating simulated random experiments by computer. Scores
are generated by a specified mathematical formula and
categorized into the desired research design. These scores
form the basis for testing statistical procedures (6, p. 579;
10, p. 72).
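Each simulated experiment is screened with an omnibus F-test before the protected procedures are applied. As a sketch only (not the dissertation's BASIC listing), the one-way ANOVA F-ratio written out from its definition:

```python
def f_ratio(groups):
    """One-way ANOVA F-ratio for a list of groups (equal or unequal n)."""
    scores = [x for g in groups for x in g]
    grand_mean = sum(scores) / len(scores)
    group_means = [sum(g) / len(g) for g in groups]
    # between-groups sum of squares, weighted by group size
    ss_between = sum(len(g) * (m - grand_mean) ** 2
                     for g, m in zip(groups, group_means))
    # within-groups (error) sum of squares
    ss_within = sum((x - m) ** 2
                    for g, m in zip(groups, group_means) for x in g)
    df_between = len(groups) - 1
    df_within = len(scores) - len(groups)
    return (ss_between / df_between) / (ss_within / df_within)
```

The resulting F would be compared against the tabled 0.05 critical value to decide whether the protected procedures proceed.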
MRT.—The term MRT refers to the Multiple Range Test,
developed by Duncan (1953). It is a stepwise multiple com-
parison procedure which is based on αpc (6, p. 575).
Per comparison Type I error.—Some multiple comparison
procedures use a per comparison or comparisonwise Type I
error rate, αpc, which is defined as the total number of Type
I errors made divided by the total number of possible com-
parisons (9, p. 278).
Power.—The power of a statistical test is the
probability that it will correctly reject a false null
hypothesis (9, p. 152).
SNK.—The term SNK refers to Student Newman-Keuls Test
which developed from the work of Student (1927), Newman
(1939), and Keuls (1952). Like the MRT, it is a stepwise
testing procedure and is based on αpc (6, p. 575).
SSD.—The term SSD refers to the Scheffé multiple com-
parison procedure (1953) which is the most flexible and
conservative of the multiple comparison procedures (7, p.
121). It is able not only to test pairwise comparisons, but
can also test any combination of means against any other
combination of means within the experiment. This flexi-
bility, however, reduces its ability to detect pairwise
differences (19, p. 122). The Scheffé Test is based on an
αew error rate (21, p. 327).
Type I error.—A Type I error is made when two popu-
lation means are declared different when they are actually
equal (22, p. 566). In statistical terms, it is rejecting a
true null hypothesis (12, p. 1374). The probability of
making a Type I error is symbolized by the letter alpha (a).
It is also referred to as the "level of significance" (19, p.
36).
Type II error.—A Type II error is made when two means
are declared equal when they are actually different (22, p.
566). Statistically speaking, it is retaining a false null
hypothesis (12, p. 1374). The probability of making a Type
II error is given by beta (β) (19, p. 36).
Type III error.—A Type III error is made when two
population means are declared different when they are, in
fact, different, but in reverse order (22, p. 566). The
probability of committing a Type III error is given by gamma
(γ) (11, p. 513).
Assumptions
It was assumed that the data produced by the random
number generator used in this study were not different from
data normally collected and analyzed by educational re-
searchers.
It was further assumed that small k's and unequal n's
better reflect the realities of educational research than
large k's and equal n's.
CHAPTER BIBLIOGRAPHY
1. Bernhardson, Clemens S., "375: Type I Error Rates When Multiple Comparison Procedures Follow a Significant F Test of ANOVA," Biometrics, XXXI (March 1975), pp. 229-232.
2. Boardman, Thomas J. and Moffitt, Donald R., "Graphical Monte Carlo Type I Error Rates for Multiple Comparison Procedures," Biometrics, XXVII (September 1971), pp. 738-743.
3. Carmer, S. G. and Swanson, M. R., "Detection of Differences Between Means: A Monte Carlo Study of Five Pairwise Multiple Comparison Procedures," Agronomy Journal, LXIII (1971), pp. 940-945.
4. _______ and _______, "An Evaluation of Ten Pairwise Multiple Comparison Procedures by Monte Carlo Methods," Journal of the American Statistical Association, LXVIII (1973), pp. 66-74.
5. _______ and Walker, W. M., "Pairwise Multiple Comparisons Procedures for Treatment Means," Technical Report Number 12, University of Illinois, Department of Agronomy, Urbana, Illinois (December 1983), pp. 1-33.
6. Einot, Israel and Gabriel, K. R., "A Study of Powers of Several Methods of Multiple Comparisons," Journal of the American Statistical Association, LXX (1975), pp. 574-583.
7. Ferguson, George A., Statistical Analysis in Psychology and Education, 5th ed., New York, McGraw-Hill Book Publishers, 1981.
8. Glass, Gene V. and Hopkins, Kenneth D., Statistical Methods in Education and Psychology, 2nd ed., Englewood Cliffs, New Jersey, Prentice-Hall, Inc., 1984.
9. Howell, David C., Statistical Methods for Psychology, Boston, Duxbury Press, 1982.
10. Howell, John F. and Games, Paul A., "The Effects of Variance Heterogeneity on Simultaneous Multiple Comparison Procedures with Equal Sample Size," British Journal of Mathematical and Statistical Psychology, XXVII (1974), pp. 72-81.
11. Harter, H. Leon, "Error Rates and Sample Sizes for Range Tests in Multiple Comparisons," Biometrics, XIII (1957), pp. 511-536.
12. Kemp, K. E., "Multiple Comparisons: Comparisonwise and Experimentwise Type I Error Rates and Their Relationship to Power," Journal of Dairy Science, LVIII (September 1975), pp. 1372-1378.
13. Keselman, H. J., "A Power Investigation of the Tukey Multiple Comparison Statistic," Educational and Psychological Measurement, XXXVI (1976), pp. 97-104.
14. _______; Games, Paul; and Rogan, Joanne C., "Protecting the Overall Rate of Type I Errors for Pairwise Comparisons With an Omnibus Test Statistic," Psychological Bulletin, LXXXVI (July 1979), pp. 884-888.
15. _______ and Rogan, Joanne C., "An Evaluation of Some Non-Parametric and Parametric Tests for Multiple Comparisons," British Journal of Mathematical and Statistical Psychology, XXX (May 1977), pp. 125-133.
16. _______ and _______, "A Comparison of the Modified-Tukey and Scheffé Methods of Multiple Comparisons for Pairwise Contrasts," Journal of the American Statistical Association, LXXIII (March 1978), pp. 47-52.
17. _______ and Toothaker, Larry E., "Comparison of Tukey's T-Method and Scheffé's S-Method for Various Numbers of All Possible Differences of Averages Contrasts Under Violation of Assumptions," Educational and Psychological Measurement, XXXIV (1974), pp. 511-519.
18. _______, _______, and Shooter, M., "An Evaluation of Two Unequal n Forms of the Tukey Multiple Comparison Statistic," Journal of the American Statistical Association, LXX (September 1975), pp. 584-587.
19. Kirk, Roger E., Experimental Design: Procedures for the Behavioral Sciences, 2nd ed., Belmont, California, Brooks/Cole Publishing Company, 1982.
20. Myette, Beverly M. and White, Karl R., "Selecting An Appropriate Multiple Comparison Technique: An In-tegration of Monte Carlo Studies," Paper presented before the Annual Meeting of the American Educa-tional Research Association, March 19-23, 1982.
21. Steel, R. G. D., "Query 163: Error Rates in Multiple Comparisons," Biometrics, (1961), pp. 326-328.
22. Welsch, Roy E., "Stepwise Multiple Comparison Procedures," Journal of the American Statistical Association, LXXII (1977), pp. 566-575.
CHAPTER II
SYNTHESIS OF RELATED LITERATURE
Introduction
The history of multiple comparison procedures "suffers
from an embarrassment in riches seldom found in statistics"
(47, p. 53). A number of authors have described theoreti-
cally, mathematically, and preferentially various multiple
comparison techniques. The literature is filled with con-
tradictory assumptions, recommendations, and conclusions
concerning which multiple comparison procedure to use under
what circumstance (45, p. 4).
The situation reflects the frustration of then Senator
Walter Mondale in a speech to the American Educational
Research Association in the early 1970's. Summing up the
results of his study of the research on integration in the
public schools, he said,
What I have not learned is what we should do about these problems. I had hoped to find research to support or to conclusively oppose my belief that quality integrated education is the most promising approach. But I have found very little conclusive evidence. For every study, statistical or theoretical, that contains a proposed solution or recommendation, there is always another, equally well documented, challenging the assumptions or the conclusions of the first. No one seems to agree with anyone else's approach. But more distressing, no one seems to know what works. As a result, I must confess, I stand with my colleagues confused and often disheartened (44, p. viii).
Much of the confusion in the literature concerning
multiple comparison procedures stems from differing perspec-
tives on experimental error and the power of statistical
tests. A clear concept of error rate and power is essential
for understanding the distinctives of each procedure.
The Concepts of Error Rate and Power
Types of Error Rates
Three types of error rate were briefly defined in Chap-
ter I and are analyzed more fully here. A Type I error is
made when two population means are declared significantly
different when they are actually equal (55, p. 566). It is
rejecting H0: μi = μj when H0 is true (31, p. 1374). It was
noted that Type I errors are treated differently by various
multiple comparison procedures. Some are based on an experi—
mentwise Type I error rate while others use a per comparison,
or comparisonwise (53, p. 539), Type I error rate (28, p.
278). An experimentwise error rate is the probability that
an experiment contains some incorrect decisions (47, p. 45).
It is a measure of the risk one takes in making one or more
Type I errors on all pairs of means within an experiment (33,
p. 884). A per comparison error rate is the risk of making a
Type I error when testing a single pair of means within an
experiment (33, p. 884; 51, p. 327; and 55, p. 566). The
constant relationship between the two kinds of Type I errors
is given by (41, p. 104; 51, p. 327)
(1 - αew) = (1 - αpc)^c          Eq. 2
where c is the number of comparisons being made. Table I
illustrates the relationship between the two error rates.
TABLE I
COMPARISON BETWEEN EXPERIMENTWISE AND PER COMPARISON ERROR RATES
Number of    Number of          Experimentwise    Per Comparison
Means (k)    Comparisons        Error Rate        Error Rate
             (k(k-1)/2)

Set α = αew = 0.05

  3            3                .050000           .016952
  4            6                .050000           .008512
  5           10                .050000           .005116
  6           15                .050000           .003413
 10           45                .050000           .001139

Set α = αpc = 0.05

  3            3                .142625           .050000
  4            6                .264908           .050000
  5           10                .401263           .050000
  6           15                .536708           .050000
 10           45                .900559           .050000
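Every entry in Table I can be reproduced from Eq. 2 and its inverse; a quick check in Python (an illustrative sketch; the function names are invented here):

```python
def alpha_ew_from_pc(alpha_pc, c):
    # Eq. 2 rearranged: a_ew = 1 - (1 - a_pc)^c
    return 1.0 - (1.0 - alpha_pc) ** c

def alpha_pc_from_ew(alpha_ew, c):
    # inverse of Eq. 2: a_pc = 1 - (1 - a_ew)^(1/c)
    return 1.0 - (1.0 - alpha_ew) ** (1.0 / c)

for k in (3, 4, 5, 6, 10):
    c = k * (k - 1) // 2   # number of pairwise comparisons
    print(k, c,
          round(alpha_pc_from_ew(0.05, c), 6),   # upper half of Table I
          round(alpha_ew_from_pc(0.05, c), 6))   # lower half of Table I
```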
It logically follows from equation 2 that as c in-
creases, the divergence between αew and αpc increases. Table
I shows that selecting a per comparison error rate of 0.05
increases the risk of committing a Type I error across all
pairs. For example, with 10 means in an experiment, there is
a 90 per cent chance of committing at least one Type I error
in testing the 45 pairs. Selecting an experimentwise error
rate of 0.05 decreases the probability of committing a Type I
error with any given pair. With 10 means in an experiment,
the per comparison probability of committing a Type I error
in testing a single pair is 0.001. Fisher writes in 1937,
When the z test does not demonstrate significant differentiation, much caution should be used before claiming significance for special comparisons. Comparisons, which the experiment was designed to make, may, of course, be made without hesitation. It is comparisons suggested subsequently, by a scrutiny of the results themselves, that are open to suspicion; for if the variants are numerous, a comparison of the highest with the lowest observed value, picked out from the results, will often appear to be significant, even from undifferentiated material. . . . Thus, in comparing the best with the worst of ten tested varieties, we have chosen the pair with the largest apparent difference out of 45 pairs, which might equally have been chosen. We might, therefore, require the probability of the observed difference to be as small as 1 in 900, instead of 1 in 20, before attaching statistical significance to the contrast (18, pp. 65-66).
Fisher s "1 in 900" [0.0011] agrees with the per comparison
error rate [0.0011] computed in Table I when a is set at 0.05
and an experimentwise procedure is used. However, this per
comparison probability is suggested by Fisher for testing the
largest difference in the 45 pairs. A simultaneous experi-
mentwise procedure tests all pairs at this level. It will be
shown that this greatly reduces the ability of a test to
detect differences between paired means.
The formula for computing the per comparison Type I
error rate is given by Kirk in equation 3 (41, p. 103).
αpc = (Number of contrasts falsely declared significant) / (Number of contrasts)          Eq. 3
The formula for computing the experimentwise Type I error
rate is given in equation 4 (41, p. 103).
αew = (Number of experiments with at least one contrast falsely declared significant) / (Number of experiments)          Eq. 4
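Equations 3 and 4 translate directly into the error counting a Monte Carlo study performs. A sketch (the data structure is hypothetical: one boolean per pairwise contrast, True when that contrast was declared significant):

```python
def error_rates(experiments):
    """Compute (a_pc, a_ew) per Eq. 3 and Eq. 4.

    experiments: list of simulated experiments; each experiment is a
    list of booleans, one per pairwise contrast, True when the
    contrast was declared significant. The complete null is assumed
    true, so every rejection counts as a Type I error.
    """
    n_contrasts = sum(len(e) for e in experiments)
    n_false = sum(sum(e) for e in experiments)
    a_pc = n_false / n_contrasts                                     # Eq. 3
    a_ew = sum(1 for e in experiments if any(e)) / len(experiments)  # Eq. 4
    return a_pc, a_ew
```

For example, four 3-group experiments (three contrasts each) with three false rejections spread over two experiments give a_pc = 3/12 and a_ew = 2/4.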
A Type II error is made when two means are declared
equal when they are actually different (55, p. 566). It is
retaining a false null hypothesis (25, p. 209; 31, p. 1374).
The probability of making a Type II error is symbolized by
beta (β) (41, p. 36). Type II error is directly linked to
the power of a test. A more powerful test will make fewer
Type II errors, declaring H0 true when it is actually false,
than a less powerful test (25, pp. 210-211).
A Type III error is made when two population means are
declared different when they are, in fact, different, but in
reverse order (55, p. 566). That is, sample mean 2, drawn
from population 2, is declared significantly lower than
sample mean 1, drawn from population 1, when population 2 is
actually larger than population 1 (26, p. 521). The proba-
bility of making a Type III error is symbolized by gamma (γ)
(26).
The relationship among the three types of errors was
clearly illustrated by Harter as shown in figure 1 (26, p.
521). The area labelled α/2 is the region of rejection of
distribution one. It represents the probability of making a
Type I error. Any mean falling inside either α/2 area is
considered significantly different from μ1. In figure 1,
sample mean A falls in this region of rejection. If sample A
was drawn from population 2, then a correct decision has been
made. If sample A was drawn from population 1, however, the
decision is incorrect and results in a Type I error.
Fig. 1 Three kinds of error in hypothesis testing. [Figure: distributions 1 and 2; sample mean A illustrates Type I error, sample mean B Type II error, sample mean C Type III error; power = 1-β.]
The area labelled β is that part of distribution two
which falls below the upper α/2 region of rejection of dis-
tribution one. This area represents the probability of
making a Type II error. A mean which was drawn from popula-
tion 2 is declared not significantly different from μ1 if it
falls in this region (25, pp. 211-212). Sample mean B falls
in this region. If sample B was drawn from population 1 then
a correct decision has been made. If sample B was drawn from
population 2, however, the decision is incorrect and results
in a Type II error.
The region labelled γ is that part of distribution two
which falls beyond the lower α/2 of distribution one. This
area represents the probability of making a Type III error
(26, p. 521). A sample mean which was drawn from population
2 is declared significantly lower than population 1 if it
falls in this region. Mean C falls in this region. If
sample C was drawn from population 1, then a Type I error has
been made. If sample C was drawn from population 2, then a
Type III error has been made. The level of occurrence of
Type III error was found to be little or no problem for any
of the multiple comparison techniques (6, p. 943; 55, p.
569).
A perpetual dilemma in statistical inference is that,
with regard to Type I and Type II errors, reducing the risk
of one increases the risk of the other. The decision con-
cerning which kind of error to control is not a mathematical
judgement but rather a subjective one (21, p. 99; 33, p.
886). The only way to simultaneously reduce both kinds of
error is to improve the research design itself. Increasing
the number of subjects, using more precise tools of measure-
ment, and choosing research designs which partition experi-
mental error into definable components, reduce both types of
error (8, p. 95; 31, p. 1375).
The Concept of Power.
The power of a multiple comparison procedure is defined
in terms of the number of comparisons it will identify as
significantly different (31, p. 1374). This is directly
related to the size of critical value used by the procedure.
The procedure with the lowest critical value is defined as
the "most powerful" (48, p. 481) because it will declare more
pairwise differences significant than a procedure with a
higher critical value. Power is the complement of Type II
error and has a probability of 1-β (56, p. 12). Figure 1
shows the power of a test as the area under distribution two
to the right of the demarcation line (48). Since this demar-
cation line is set by the upper α/2 region, power is directly
related to alpha and can always be increased for any method
by increasing the likelihood of Type I error (50, p. 355).
Using Harter's diagram, the effect of these variables of
hypothesis testing on error rates and power can be shown.
Figure 2 illustrates the effect of mean difference between
two populations on error rates and power. As the difference
between means grows, β decreases and power increases. That
is, a test will make fewer Type II errors and will be able to
detect differences more easily as the difference between
means increases.
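This relationship can be demonstrated with a small Monte Carlo sketch in the spirit of this study (illustrative Python, not the dissertation's own program; the function name and parameters are assumptions), drawing samples from normal populations (μ = 100, σ = 10) and counting rejections of a two-sided z-test:

```python
import math
import random
import statistics

def power_estimate(delta, n=10, sigma=10.0, z_crit=1.96, reps=2000, seed=1):
    """Monte Carlo estimate of two-sided z-test power for a true mean
    difference `delta`, with samples drawn from normal populations
    (mu = 100, sigma = 10) and sigma treated as known."""
    rng = random.Random(seed)
    se = sigma * math.sqrt(2.0 / n)  # standard error of the difference
    rejections = 0
    for _ in range(reps):
        m1 = statistics.fmean(rng.gauss(100.0, sigma) for _ in range(n))
        m2 = statistics.fmean(rng.gauss(100.0 + delta, sigma) for _ in range(n))
        if abs(m2 - m1) / se > z_crit:
            rejections += 1
    return rejections / reps

# Power climbs as the true difference between the population means grows:
for delta in (0.0, 5.0, 10.0, 15.0):
    print(delta, power_estimate(delta))
```

With delta = 0 the rejection rate approximates α = 0.05; as delta grows, β shrinks and the estimated power rises toward 1.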
Fig. 2 Effect of size of difference between means on error rates and power. [Figure: varying μ1 - μ2; α = 0.05.]
Figure 3 illustrates the effect of population variances
on error rate and power. One can increase the power of a
test for a given difference between means by decreasing the
variability of measurements. This can be done by studying
more homogeneous populations, using more precise instruments
or increasing the sample size (41, p. 39).

Fig. 3 Effect of population variance on error rates and power. [Figure: μ1 = μ2; σ1 < σ2; α = 0.05.]
The choice of an experimentwise or comparisonwise error
rate directly affects the power of a test. Figure 4 shows
two sample means being tested by two different multiple com-
parisons. The power resulting from the use of a per compari-
son procedure and alpha set at 0.05 is represented by the
area to the right of line 1. The power resulting from the
use of an experimentwise procedure and alpha set at 0.05 is
represented by the area to the right of line 2. It is ob-
vious that the experimentwise procedure will not declare as
many differences significant as the per comparison procedure
and is therefore considered less powerful (31, p. 1374).

Fig. 4 Effect of per comparison and experimentwise Type I error rates on power. [Figure: α(pc) = 0.05 (line 1) versus α(ew) = 0.05, i.e., a per pair level of 0.001 (line 2).]
Implications of Error Rates and Power
Is it more important to retain true null hypotheses or
reject false ones? Kemp [1975] reports Gill's [1973] con-
tention that avoiding experimentwise Type I errors is very
important (23; 31, p. 1375). Gill writes eight years later
that putting one's trust in a comparisonwise error rate "does
not restrict false findings sufficiently. . . .the frequency
of publications of false positives already far exceeds the
nominal rate that scientists believe they are operating with"
(24, p. 1506). Tukey [1953] and Ryan [1959] both support the
experiment rather than the comparison as the unit of study
and therefore recommend the experimentwise error rate (47, p.
52). A practical application of this concern is voiced by
Barcikowski. If an experiment shows erroneously that teach-
ing by television produces significant improvement in learn-
ing (Type I error), then huge sums of money will be invested
in equipment and teacher training. If no significant dif-
ference is found erroneously (Type II error), no action is
usually taken. Therefore, avoiding Type I errors is of
greatest importance (3).
These writers use an experimentwise error rate to mini-
mize the publishing of erroneous conclusions in the
literature. A per comparison rate yields "more fictitious
results" than the experimentwise rate. Replication is seldom
done. Findings often stand on one experiment. Therefore,
extreme caution is warranted. In summary, "it is better to
punish truth than to let falsehood gain respectability" (47,
p. 53).
Carmer and Walker take an opposing view. If the unit of
interest is the individual comparison rather than the entire
experiment of k(k-1)/2 comparisons, then the experimenter
should not be penalized for using an efficient experimental
design (10, p. 13). If the unit of interest is indeed the
individual comparison, then a comparisonwise error rate and a
comparisonwise multiple comparison procedure is justified.
The emphasis under these conditions is not avoiding experi-
mentwise Type I errors, but avoiding Type II errors. In
January 1985, Carmer stated that this 1983 report, with minor
revisions, was scheduled to be published in the Spring 1985
issue of The Journal of Agronomic Education (11). It there-
fore represents Carmer's most recent view of the multiple
comparison problem.
Carmer suggests this practical example. If a variety of
plant or fertilizer is declared superior to another when in
fact the two are equally effective (Type I error), there is
no economic loss if the two cost the same. But when two
varieties are declared the same when in fact one is superior
(Type II error), economic loss occurs if the inferior variety
is chosen. Therefore, Type II errors are more costly to
research users than Type I errors (8, p. 97).
Kemp agrees with Carmer's reasoning. Using milk pro-
duction as an example, he poses the same general question: If
two rations of feed yield equal amounts of milk, is great harm
done if the researcher declares one to be superior to the
other? If the two rations are really equal and the experiment
is carefully designed, then the difference between the sample
yields would be too small to justify a more expensive ration
to obtain the difference, whether it was "statistically sig-
nificant" or not. Several experiments would be run before
changes were made, and the probability of finding significant
differences, given equal population means, over several ex-
periments is very low (31, p. 1375). Duncan and Brant write
in support of the comparisonwise error rate, stating that the
objectives are the same in the simultaneous testing of m com-
parisons in one experiment as if each test were being made in
a separate experiment (12, p. 794).
Therefore, some writers shun procedures which use a com-
parisonwise Type I error rate because their greater concern
is the avoidance of Type I errors (24, 50, 55). Others shun
procedures which use an experimentwise Type I error rate
because their greater concern is avoiding Type II errors and
maximizing power (6, 7, 9, 10, 11, 12, 31).
Is there a distinction between researchers who desire
power to detect differences on the one hand and statistical
theorists who prefer avoiding false positives on the other?
The literature has shown support for such a distinction.
However, Kirk, in a personal letter, states that he does not
conceptualize the problem of multiple comparisons as a
"battle between theory and practice, although there are
schools of thought concerning the appropriateness of various
multiple comparison procedures" (42).
The Development and Definition of Multiple Comparison Procedures
The solution to the problem of hypothesis testing be-
tween two means when the population standard deviation is
unknown "might well be taken as the dawn of modern inferen-
tial statistical methods. It was found in 1908 by William S.
Gossett who published it under the pseudonym 'Student'" (25,
p. 217). The method is called the t-test, and when applied
one time to data from two samples at the 0.05 level of sig-
nificance, then the probability of committing a Type I error
is indeed 0.05 (25, p. 305). The original multiple compari-
son procedure was the multiple t-test, which applied
Student's t-test to paired means within an experiment (21, p.
97). That is, an experiment with four treatment means would
apply the t-test to the six pairs of means to determine which
means were significantly different from the others. The
problem with this approach is that multiple applications of
the t-test inflate the Type I error rate of the experiment.
If one were to set the level of significance to 0.05 and make
the six comparisons between paired means, the true proba-
bility of committing a Type I error increases to 0.265. This
is given by
    p = 1 - (1 - α)^c    Eq. 5

where c is the number of independent comparisons (25, p.
325). The Type I error rate probability grows rapidly with
the number of comparisons (47, p. 43).
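Equation 5 can be checked directly; with four means and six pairwise t-tests at the 0.05 level it reproduces the 0.265 figure above (illustrative Python):

```python
def inflated_alpha(alpha, c):
    """Eq. 5: probability of at least one Type I error among
    c independent comparisons, each tested at level alpha."""
    return 1.0 - (1.0 - alpha) ** c

print(round(inflated_alpha(0.05, 6), 3))   # 0.265 (six pairs, k = 4)
print(round(inflated_alpha(0.05, 45), 3))  # 0.901 (45 pairs, k = 10)
```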
In 1925, English statistician Sir Ronald A. Fisher
published a solution to testing more than two means without
increasing the Type I error rate in Statistical Methods for
Research Workers (17). In this work he acknowledged his debt
to Student in no uncertain terms.
The study of the exact sampling distributions of statistics commences in 1908 with "Student's" paper The Probable Error of a Mean. . . . "Student's" work was not quickly appreciated, and from the first edition [1925] it has been one of the chief purposes of this book to make better known the effect of his researches. . . (17, pp. 24-25).
But the contribution and influence of Fisher far surpassed
that of the man purported to be his master (30, p. 6).
The statistical technique developed by Fisher to solve
the problem of testing k > 2 means without increasing Type I
error is known as the analysis of variance (ANOVA). This
procedure evaluates whether there is any systematic dif-
ference among a set of k means. A significant F-ratio indi-
cates that the variance among the experimental means is
greater than one would expect if the null hypothesis is true
(25, p. 325).
Often, however, the fact that "one or more means differ"
is less important to the researcher than which means differ.
The multiple t-test can be employed to determine this, but
not without undoing what the ANOVA was designed to correct.
Fisher suggested a solution for the multiple comparison
problem in 1935 which he called the Least Significant Dif-
ference (LSD). Federer, in the "only textbook [in 1957] to
discuss multiple comparison procedures" (26, p. 515),
describes several variations of the LSD. One is the multiple
t-test, defined as the standard error of the mean times √2
times the value of t at the 0.05 level of significance for
the number of degrees of freedom associated with the standard
error (15, p. 20). Another variation is the Most Significant
Difference, MSD, which uses the 0.01 level of significance
for the t-distribution. The third variation described by
Federer required a significant F-ratio to precede the ap-
plication of the multiple t-test (15, p. 21). This third
variation is the one given by Kirk as the definition of the
LSD. The test consists of "first performing a test of the
overall null hypothesis with the ANOVA." If the F-ratio is
significant, then apply the multiple t-test. If the F-ratio
is not significant, no pairwise comparisons are made (41, p.
115).
These variations have caused a great deal of confusion
in the literature with regard to the "LSD." It will be shown
in a later section that some apply the LSD without the
preliminary F-test. This is the multiple t-test, sometimes
called the "unprotected LSD" (26, p. 513), the "ordinary
LSD," the "unrestricted LSD" (10, p. 10) or simply the LSD
(5). Others have emphasized the importance of the prelimi-
nary F-test and use such terms as the "two-stage strategy"
(33, p. 884), the "Fisher-protected" LSD (7, p. 67), the
"restricted" LSD (10, p. 11) or the LSD (41, p. 115). It is
sometimes difficult to distinguish which procedure is in-
tended when the literature reports findings on the "LSD."
The unprotected LSD provides the least protection against
Type I errors of all the multiple comparison procedures.
This is the reason for its name (49, p. 312). The FLSD
provides greater protection against experimentwise Type I
errors than the unrestricted LSD because of the preliminary
F-test (7, p. 67), but still yields an experimentwise rate
above nominal alpha because of its per comparison definition
(8, p. 99).
Another procedure, the Student Newman-Keuls (SNK),
developed from the work of Student (1927), Newman (1939), and
Keuls (1952), uses an ordered set of means and a range of
critical values rather than a single critical value for all
comparisons. Sample means are ordered from the smallest to
the largest. The largest difference, r = k means apart, is
tested first at a level of significance. If this difference
is significant, then means that are r = k-1 steps apart are
tested at alpha and so on. The actual Type I error rate is
neither per comparison nor experimentwise, but falls some-
where between the two (41, p. 123). The SNK uses a different
critical value for each value of r. For unequal n's,
Barcroft suggests using the harmonic mean (n') computed from
all the group n's in the experiment so long as the group
sizes are similar (16, p. 312; 28, p. 12). Another approach
is to substitute the harmonic mean of the two samples being
compared (n″) for n in the formula. The latter leads to
less bias than the former (28, p. 12) and was used in this
study.
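The two harmonic mean conventions can be made concrete with the standard library (a minimal sketch; the group sizes shown are hypothetical):

```python
from statistics import harmonic_mean

# n' computed from all group n's in the experiment (one convention):
n_prime = harmonic_mean([8, 10, 12])
# n'' computed from only the two samples being compared (used here):
n_double_prime = harmonic_mean([8, 12])

print(round(n_prime, 2))         # 9.73
print(round(n_double_prime, 2))  # 9.6
```

The pairwise harmonic mean reflects only the two groups actually under test, which is why it introduces less bias when the remaining group sizes differ.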
In 1953, responding to the criticism of the high experi-
mentwise Type I error rate of the LSD (6, p. 943), J. W.
Tukey developed a conservative multiple comparison procedure
called the Honestly Significant Difference (41, p. 115). The
HSD is one of the most widely used multiple comparison proce-
dures for evaluating pairwise comparisons among means (41, p.
116). It is similar to the LSD in that it is a simultaneous
test procedure, or STP (20, p. 487). That is, it uses one
critical value for all comparisons. The HSD is defined on an
experimentwise basis and therefore controls the Type I error
rate for all pairs to nominal alpha regardless of the number
of means in the experiment (31, p. 1375).
The HSD assumes equal sample sizes. Several modifi-
cations have been proposed for unequal sample size situ-
ations. The two selected for this study were the Tukey-
Kramer, developed in 1956, and the Spjøtvoll-Stoline,
developed in 1973 (41, pp. 118-120). The Tukey-Kramer
modification uses the harmonic mean of the n's of the two
means being tested (41, p. 120). This procedure generally
controls the rate of experimentwise Type I error and is as
sensitive to treatment differences as other recommended pro-
cedures (36, p. 127). The Spjøtvoll-Stoline uses n(min), the
smaller of the n's of the two means being tested. Since
critical values increase as n decreases, the use of n(min)
generates a more conservative test than the harmonic mean of
n's used in the Tukey-Kramer procedure (41, p. 119).
Also in 1953, H. Scheffé developed the most flexible and
conservative of the multiple comparison procedures (16, p.
121). The Scheffé Significant Difference (SSD) can be used
not only to evaluate pairwise comparisons, but can also be
used to test any combination of means against any other
combination of means within the experiment. This flexi-
bility, however, reduces its ability to detect pairwise
differences (41, p. 122). In fact, the test is so conserva-
tive that it may not detect any significant differences among
means even when the overall F-ratio is significant (49, p.
315).
In 1955, David B. Duncan developed the Multiple Range
Test (MRT). The MRT follows the same testing procedure as
the SNK but uses a different critical value table (41, pp.
824-825). The MRT table gives critical values which provide
a k-mean significance level equal to 1 - (1 - α)^(k-1) (41, p.
125). The farther the means are separated by rank, the more
lenient the standard of significance becomes for the MRT pro-
cedure, as shown in Table II (16, p. 311).
TABLE II

COMPARISON BETWEEN ERROR RATES OF SNK AND MRT PROCEDURES

Number of Means    MRT α = 1 - (1 - 0.05)^(k-1)    SNK α
      2                     0.05                    0.05
      3                     0.0975                  0.05
      4                     0.1426                  0.05
      5                     0.1855                  0.05
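The Table II values follow directly from the k-mean significance level formula (illustrative Python):

```python
def mrt_alpha(alpha, k):
    """Duncan MRT k-mean significance level: 1 - (1 - alpha)**(k - 1)."""
    return 1.0 - (1.0 - alpha) ** (k - 1)

for k in (2, 3, 4, 5):
    print(k, round(mrt_alpha(0.05, k), 4))  # 0.05, 0.0975, 0.1426, 0.1855
```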
When alpha is set at .05 in an experiment with five
means, the actual test of the largest difference will be made
at the 0.1855 level. The experimentwise error rate thus
varies with the number of treatments in the experiment, which
"does not make sense in the real world" (9, p. 123). Unequal
n's are handled in the same
manner as with the Newman-Keuls Test.
In 1965, Duncan proposed a modification to the LSD. The
procedure included the use of Bayesian statistical principles
in examining prior probabilities of decision errors and was
called the Bayesian Least Significant Difference (BLSD). The
procedure is named for Thomas Bayes. Fisher writes that
Bayes' "celebrated essay published in 1763 is well known as
containing the first attempt to use the theory of probability
as an instrument of inductive reasoning; that is, for arguing
from the particular to the general, or from the sample to the
population" (17, p. 22). Duncan's BLSD allows an experi-
menter to choose a value of k which represents the ratio of
relative seriousness of Type I to Type II errors. The BLSD
approximates the LSD when the F-ratio is large. But when F
is small, less than 2.5, the BLSD is more conservative and
tends to approximate the HSD (6, p. 942). The BLSD was found
to be as powerful as the FLSD by Carmer and Swanson in 1971
(6, p. 945).
Multiple Comparisons in Graduate Research
A prominent source of information concerning which
multiple comparison procedures are recommended for analyzing
actual research data is the doctoral dissertation. A search
of Dissertation Abstracts International in Education and
Psychology revealed one hundred fifty-seven dissertations
which had used one hundred sixty-two multiple comparison pro-
cedures in analyzing their data. The most popular procedure
since the earliest cited dissertation (1976) has been the
Student Newman-Keuls Test. The SNK accounted for 25 per cent
of procedures selected. Duncan's Multiple Range Test ac-
counted for another 19 per cent. Together, the range tests
made up 43 per cent of the procedures. The more conservative
procedures, the Scheffe Significant Difference and Tukey's
Honestly Significant Difference, accounted for 31 per cent of
the procedures. The more liberal, the Least Significant
Difference and the Fisher-protected Least Significant Dif-
ference, accounted for 26 per cent of the procedures. Table
III shows the frequencies of use of the various procedures.
TABLE III
MULTIPLE COMPARISON PROCEDURES USED IN DISSERTATIONS ON FILE WITH
DISSERTATION ABSTRACTS INTERNATIONAL
Procedure Name           Frequency of Use    Percent of Use
Newman-Keuls                    39               24.07%
Multiple Range Test             30               18.52
Scheffe                         29               17.90
Fisher-protected LSD            21               12.96
Unprotected LSD                 21               12.96
Tukey HSD                       21               12.96
Bayes Exact Test                 1                0.62
Total                          162               99.99%*

*Rounding error
The Critical Difference Values of Multiple Comparison Procedures
Given a specific experimental situation, critical dif-
ference values are computed differently for each multiple
comparison procedure.
The Least Significant Difference.
The LSD critical difference is given by (27, p. 268; 28,
p. 295; 41, p. 115)

    LSD = t(α/2, v) √(2MSW/n)    Eq. 6

where t is the Student's t-distribution table value, α/2 is
the upper portion of the level of significance, v is the
within degrees of freedom, MSW is the mean square within
value, and n is the number of subjects in each sample.
Carmer and Swanson use a slightly different form of equation
6 and give the LSD critical value as (6, p. 941; 7, p. 67;
10, p. 10)

    LSD = t(α, v) s(d)    Eq. 7

where s(d) is the standard error of the difference of the two
groups. It is clear from equations 6 and 7 that s(d) is equal
to √(2MSW/n). The relationship between Student's t Distri-
bution Table (41, Table E.4), and the Studentized Range Table
(41, Table E.7) is given by

    t(α/2, v) = q(α, 2, v)/√2    Eq. 8

Therefore equation 6 can be rewritten in terms of the Studen-
tized Range table as shown in equation 9.

    LSD = q(α, 2, v) √(MSW/n)    Eq. 9
This is equivalent to Carmer's formula in equation 7. The HSD
and SNK procedures both use the Studentized Range table. By
applying the relationship in equation 8 and using equation 9
in testing, the Studentized Range table may be used for the
(F)LSD as well.
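The identity in equation 8 can be checked arithmetically with table values quoted in this chapter: q(0.05, 2, 10) = 3.15 and the standard two-tailed value t(0.025, 10) = 2.228. A minimal sketch:

```python
import math

# Table values as quoted in this chapter:
q_2_10 = 3.15     # q(alpha = 0.05, r = 2, v = 10), Studentized Range
t_025_10 = 2.228  # t(alpha/2 = 0.025, v = 10), Student's t

# Eq. 8: t(alpha/2, v) = q(alpha, 2, v) / sqrt(2)
print(round(q_2_10 / math.sqrt(2), 3))  # agrees with t to table accuracy
```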
The LSD is considered by many to be an appropriate pro-
cedure if its use is restricted to experiments in which the
analysis of variance F value is significant (FLSD) and the
experimenter's interest is in the individual pairwise con-
trasts rather than the overall test (1, p. 194; 8, p. 95;
13, p. 40; 29, p. 72; 31, p. 1375; 33, p. 884). Several
studies found that the FLSD yielded the greatest power and an
"acceptably low" error rate (2, 19, 46, 54).
Tukey's Honestly Significant Difference.
The critical difference for the HSD is given by (41, p.
116)

    HSD = q(α, k, v) √(MSW/n)    Eq. 10
where q() is the Studentized Range critical value and k is
the number of means in the experiment. The key difference
between the (F)LSD and the HSD procedures is the value of r
used in entering the Studentized Range table. For the
(F)LSD, r always equals 2. For the HSD, r equals the number
of means in the experiment (r=k). Table IV demonstrates the
effect of r on the critical difference used in the two proce-
dures with varying k's.
TABLE IV

COMPARISON OF CRITICAL VALUES OF THE (F)LSD AND HSD MULTIPLE COMPARISON PROCEDURES AS k VARIES
(Kirk Table E.7 values)*

 k     Value of r           Critical Values
       (F)LSD   HSD         (F)LSD   HSD
 2        2      2           3.15    3.15
 4        2      4           3.15    4.33
10        2     10           3.15    5.60

*α = 0.05, v = 10
The result of the increasing critical value in HSD is a
"reduction in power" (7, p. 74). That is, it becomes in-
creasingly difficult to detect differences as the number of
groups in the experiment grows. However, it is this increas-
ing critical value that allows the HSD to maintain the
experimentwise Type I error rate at a. The constant value of
the (F)LSD allows the experimentwise error rate to increase
as k increases (31, p. 1377).
It should be noted that some take issue with Carmer and
Swanson's use of the term "reduction in power". They state
that this increasing value
. . . should not be interpreted to mean that the Tukey test is less powerful than other multiple comparison procedures as suggested by Carmer and Swanson [1973]. The sensitivity of the Tukey test is predictably less than other procedures (such as Newman-Keuls, Ryan and Duncan), since in its development, the test sets a different rate of Type I error. As Einot and Gabriel point out, the more power-concerned analysts can increase the sensitivity of the Tukey test by merely manipulating its Type I error (40, p. 586) [Einot and Gabriel suggest using the HSD at α = 0.25 in order to increase its power (14, p. 577)].
Aitkin comments that the "lack of sensitivity objected to in
experimental error rates" is brought about by using conven-
tional 0.05 or 0.01 levels of significance. "In many experi-
mental situations when the null hypothesis is known a priori
to be false, it is appropriate to increase substantially the
experimental error rate above these levels" (1, p. 193).
At any rate, the issue appears to be one of semantics.
Of what value is it to use the HSD with a higher probability
of committing experimentwise Type I errors over the (F)LSD
which, by definition, has a higher experimentwise Type I
error rate? And what does the term "reduction in power" mean?
It means nothing more than that the procedure declares fewer
comparisons significant than another. Not one writer dis-
putes the fact that the HSD will declare fewer comparisons
significant than the (F)LSD.
Games, Keselman, and Clinch call the HSD the "most
powerful simultaneous multiple comparison technique that
controls Type I error rate over the set or family of
comparisons" (22, p. 42). Still, Kemp says that it "lacks
sufficient power to be useful in sorting out which treatments
are different" (31, p. 1377) and quotes from Thomas [52] that
the HSD is "too conservative to be practical." Unless the
researcher is extremely concerned about Type I errors, use of
the HSD is "ruled out because of its poor sensitivity to real
differences" (6, pp. 944-945).
There is some controversy whether the HSD should be
applied after a significant F-ratio. Keselman suggests this
two stage strategy (32, p. 101). Carmer and Swanson apply
the HSD whether the F-ratio is significant or not (Carmer,
1973, p. 67). One study suggests that applying the HSD in a
two-stage procedure with ANOVA could lead to a "Type IV
error." Since the HSD is not directly related to the F
statistic, it could detect differences undetected by ANOVA,
hence, a Type IV error (43). Keselman and Murray studied
this possibility and state that researchers "need not be
concerned about committing a Type IV error, whether the
concept is theoretically valid or not" (34, p. 609).
There are actually two Tukey test statistics. The
preceding discussion has focused on the HSD, or Tukey A, as
some call the test (21, p. 98; 28, p. 303). This is the more
conservative of the two tests and the one used by Carmer and
Swanson in their studies (50, p. 355). The other test is
Tukey's Wholly Significant Difference (WSD), or Tukey B. The
critical value of the Tukey B is the mean of the critical
values for the SNK and HSD procedures. This is stated by
Howell as (28, p. 303)
q(WSD) = (q(k) + q(r))/2 Eq. 11
With this modification, the WSD proceeds in the same manner
as the SNK. Therefore, the WSD is more powerful, but more
complex and provides less protection against Type I errors
than the HSD (21, p. 98). Some confusion comes from authors
who treat the HSD and the WSD as the same test (41, p. 116).
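Equation 11 is simple arithmetic on the two tables. Using the values for r = 4, k = 5, v = 10 quoted in Table V below (illustrative Python):

```python
def q_wsd(q_snk, q_hsd):
    """Eq. 11: the Tukey B (WSD) critical value is the mean of the
    SNK and HSD critical values for the comparison at hand."""
    return (q_snk + q_hsd) / 2.0

# r = 4, k = 5, v = 10: q(SNK) = 4.33, q(HSD) = 4.65
print(round(q_wsd(4.33, 4.65), 2))  # 4.49
```

The result falls between the SNK and HSD values, matching the WSD's intermediate position between the two tests.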
A common occurrence in educational research is unequal
sample sizes. Several modifications of the HSD have been
suggested for this situation. The Spjøtvoll and Stoline
modification uses n(min), the n of the smaller group in the
comparison, in place of n in the HSD equation (41, pp. 118-
119). This modification also uses q' from the Studentized
Augmented Range Table (41, p. 846) in place of q. The Kramer
modification uses the sample sizes of the two means being
compared by replacing MSW/n with MSW(1/n(i) + 1/n(j))/2 (41, p.
120). This procedure controls experimentwise Type I error
and is as sensitive to differences as others (38, p. 51).
Myette and White found the Kramer modification to be the most
accurate (45, p. 14). Kirk recommends the use of Spjøtvoll
and Stoline when sample sizes are nearly equal and the Kramer
modification when they are moderately to severely unequal
(41, p. 120).
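The two unequal-n critical differences can be compared directly (a minimal sketch with hypothetical inputs; note that in practice the Spjøtvoll-Stoline procedure uses q′ from the Studentized Augmented Range table rather than the same q, so using one q value for both is a simplification here):

```python
import math

def kramer_cd(q, msw, n_i, n_j):
    """Tukey-Kramer critical difference:
    q * sqrt(MSW * (1/n_i + 1/n_j) / 2)."""
    return q * math.sqrt(msw * (1.0 / n_i + 1.0 / n_j) / 2.0)

def spjotvoll_stoline_cd(q, msw, n_i, n_j):
    """Spjotvoll-Stoline critical difference: q * sqrt(MSW / n_min),
    with n_min the smaller of the two group sizes."""
    return q * math.sqrt(msw / min(n_i, n_j))

# Hypothetical inputs (q = 4.65, MSW = 100, n's of 8 and 20):
q, msw = 4.65, 100.0
print(round(kramer_cd(q, msw, 8, 20), 2))
print(round(spjotvoll_stoline_cd(q, msw, 8, 20), 2))
```

With the same q, the Spjøtvoll-Stoline difference is the larger of the two, illustrating why n(min) yields the more conservative test.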
Both the (F)LSD and HSD procedures are simultaneous test
procedures (20, p. 485) which use a single critical value
regardless of how much separation exists between ranked means
(16, p. 311). If a study has five means, the HSD critical
value is 4.650 for a = 0.05 and v = 10 (see Table IV). This
one value is then applied to all pairwise comparisons, no
matter how many ranks separate the two means. The next two
procedures allow for a stepwise, or layered, approach to
making pairwise comparisons. The separation distance between
ranked means is taken into account when the critical dif-
ferences are determined (41, p. 123). These two procedures
are the Student Newman-Keuls Test and the Duncan Multiple
Range Test.
The Student-Newman-Keuls Range Test.
The range of critical differences for the Student
Newman-Keuls is given by (41, p. 124)
    SNK(r) = q(α, r, v) √(MSW/n)    Eq. 12
where q(a,r,v) is the critical value from the Studentized
Range table, and r is the distance between ranked means. The
critical differences for the SNK are compared with the (F)LSD
and HSD in Table V.
TABLE V

COMPARISON OF CRITICAL VALUES OF (F)LSD, HSD, AND SNK MULTIPLE COMPARISON PROCEDURES AS r VARIES
(Kirk Table E.7 values)*

 r    (F)LSD (r=2)    HSD (r=k=5)    SNK (r)
 5        3.15           4.65          4.65
 4        3.15           4.65          4.33
 3        3.15           4.65          3.88
 2        3.15           4.65          3.15

*α = 0.05, v = 10, k = 5
The first test made in the SNK procedure is on the difference
between the largest and smallest means in the experiment.
The r-mean range in this case is equal to the number of means
in the experiment. That is, r = k. For this test, the SNK
critical difference equals the HSD and yields a conservative
test for this largest difference. If this largest difference
is declared significant, then the means k-1 steps apart are
tested at the r = k-1 level of significance. As the distance
between ranked means (r) decreases, the critical value be-
comes more liberal. When r = 2, the SNK critical value
equals that used by the (F)LSD. See Appendix D for a
detailed analysis of the procedure. This range of critical
values attempts to balance the seriousness of committing Type
I and Type II errors (41, pp. 123-125). Dunnett writes, "For
significance testing, I think it is generally agreed that the
Newman-Keuls Test is preferable to Tukey's" (13, p. 140).
But Ramsey faults the test for its "clearly inflated experi-
mentwise Type I error rate" (48, p. 482).
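The layered logic of the step-down procedure can be sketched in Python (an illustrative translation; the present study's programs were written in BASIC, and the function name and blocking bookkeeping below are mine, not code from any of the cited studies). The q values are those of Table V (α = 0.05, v = 10, k = 5).

```python
from math import sqrt

# Studentized Range critical values q(0.05, r, 10) from Table V (Kirk Table E.7).
Q_TABLE = {2: 3.15, 3: 3.88, 4: 4.33, 5: 4.65}

def snk_pairwise(means, msw, n, q_table):
    """Step-down SNK sketch: a stretch of r ranked means is tested against
    q(alpha, r, v) * sqrt(MSW/n), and a nonsignificant stretch shields every
    stretch contained within it from further testing.
    Returns (i, j) index pairs into the sorted means declared significant."""
    ranked = sorted(means)
    k = len(ranked)
    significant = set()
    blocked = set()  # stretches shielded by a wider nonsignificant stretch
    for r in range(k, 1, -1):          # widest range (r = k) down to r = 2
        crit = q_table[r] * sqrt(msw / n)
        for i in range(k - r + 1):
            j = i + r - 1
            if (i, j) in blocked:
                continue
            if ranked[j] - ranked[i] > crit:
                significant.add((i, j))
            else:
                # shield all sub-stretches of the nonsignificant stretch
                for a in range(i, j + 1):
                    for b in range(a + 1, j + 1):
                        blocked.add((a, b))
    return significant
```

With means (100, 101, 102, 103, 120), MSW = 100, and n = 10, only the pairs involving the largest mean are declared significant; the four close means are shielded once their r = 4 stretch fails.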
The Duncan Multiple Range Test.
The critical difference for the MRT is given by (41, p.
126)

     MRTr = m(α,r,v) √(MSW/n)                   Eq. 13
where m is the critical value from Duncan's Multiple Range
Table. The Duncan Multiple Range Test follows the same
stepwise procedure as the Student Newman-Keuls, but uses the
Duncan New Multiple Range Test table rather than the Studen-
tized Range table. The use of this table provides critical
values that vary with the number of treatments in the experi-
ment. While the SNK maintains alpha for each pair of ordered
means (41, p. 126), the MRT does not. The level of αew for
the MRT increases with the number of treatments in the
ranking and should only be used "by researchers stranded
somewhere between reality and the Wonderful World of Statis-
tical Theory" (9, p. 123). Table VI shows critical dif-
ferences for procedures discussed to this point.
TABLE VI

COMPARISON OF CRITICAL VALUES OF (F)LSD, HSD, SNK, AND MRT
MULTIPLE COMPARISON PROCEDURES AS r VARIES*

     r     (F)LSD**   HSD**     SNK**     MRT***
           r=2        r=k=5     r         r

     5     3.15       4.65      4.65      3.43
     4     3.15       4.65      4.33      3.37
     3     3.15       4.65      3.88      3.30
     2     3.15       4.65      3.15      3.15

*α = 0.05, v = 10, k = 5   **Kirk Table E.7   ***Kirk Table E.8
Carmer and Walker go on to criticize the layered ap-
proach to multiple comparison testing because it confuses the
issue of error rates. It is better in their opinion to
choose an error rate, whether it be per comparison or experi-
mentwise, and then apply the best test for that error rate.

     With multiple range tests the difference between two
     treatments required for significance depends on [k].
     . . . [It] does not make much sense to think that the
     true difference between two treatments depends in any
     way on what other treatments are included in the
     experiment. . . . [W]e recommend that neither the DMRT,
     nor the SNK, nor any other multiple range procedures
     ever be used for comparisons among treatment means (10,
     p. 21).
The Scheffe Significant Difference.
The Scheffe Significant Difference is unlike the other
procedures in that it is directly tied to the F-test. Its
critical difference is given by (41, p. 121)

     SSD = √[(k-1)Fcv] √[MSW Σ(cj²/nj)]         Eq. 14

where k is the number of groups in the experiment, Fcv is the
critical value for the omnibus F-test, MSW is the mean square
within value, cj is the contrast factor for the jth group
(which equals 1 for pairwise comparisons), and nj is the
number of subjects in the jth group. For pairwise compari-
sons between groups of equal size, the formula simplifies to

     SSD = √[(k-1)Fcv] √[2MSW/n].               Eq. 15
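As an illustration, the pairwise formula translates directly into code. The numbers below are hypothetical (the F critical value must be taken from an F table for the appropriate degrees of freedom); nothing here is drawn from the study's data, and the function name is mine.

```python
from math import sqrt

def scheffe_critical_difference(k, f_cv, msw, n):
    """Scheffe critical difference for a pairwise comparison with equal
    group sizes: SSD = sqrt((k - 1) * F_cv) * sqrt(2 * MSW / n)."""
    return sqrt((k - 1) * f_cv) * sqrt(2.0 * msw / n)

# Hypothetical example: k = 3 groups of n = 10, MSW = 100, and an assumed
# tabled critical value F_cv = 3.89.
ssd = scheffe_critical_difference(3, 3.89, 100.0, 10)
```

A useful check on the formula: for k = 2 groups, Fcv equals t², so the Scheffe critical difference collapses to the ordinary t-based LSD; the conservatism enters only as k grows.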
Some writers state that the Scheffe "should never be
employed for pairwise comparisons" because of its extreme
conservatism (7, p. 73; 28, p. 304). Others laud the pro-
tection against experimentwise Type I error afforded by the
Scheffe, even with its lack of power. Gill writes "Only
Scheffe's procedure, among well-known methods, can guarantee
any rational assessment of strength of evidence" (24, p.
1506). Petrinovich and Hardyck state that the simplest
approach to multiple comparison procedures would be to apply
the Scheffe as the initial test of choice. A significant
Scheffe means that wrong conclusions are unlikely. A non-
significant Scheffe could be followed by the Tukey HSD (47,
p. 53). Games counters this position when he writes
     The [Petrinovich and Hardyck] recommendation is one
     arbitrary point on a continuum of choice. . . . [W]hen
     one specifies a conservative test, and then says that if
     this conservative test is not significant, he will use a
     more liberal test, he is merely adding ambiguity and
     inconsistency to his decision rule (21, p. 100).
Games recommends using a test and its associated error rate
which matches the questions which the experimenter wants to
answer (21, p. 101). Still others simply report that the
Scheffe is expected to be overly conservative when used for
pairwise comparisons (39, p. 513).
Table VII gives a final summary of critical differences
for each of the procedures selected for this present study.
These differences were computed for actual data in the Monte
Carlo study.
TABLE VII

COMPARISON OF CRITICAL VALUES OF (F)LSD, HSD, SNK, MRT, AND
SSD MULTIPLE COMPARISON PROCEDURES AS r VARIES*

     r     (F)LSD**   HSD**     SNK**     MRT***    SSD
           r=2        r=k=6     r         r

     6     7.170      10.503    10.503     8.121    12.260
     5     7.170      10.503    10.042     7.967    12.260
     4     7.170      10.503     9.528     7.788    12.260
     3     7.170      10.503     8.605     7.532    12.260
     2     7.170      10.503     7.170     7.149    12.260

*α = 0.05, n = 20, v = 114, k = 6
**Interpolated values from Kirk Table E.7
***Interpolated values from Kirk Table E.8
It is obvious from these values that the (F)LSD will declare
more pairwise differences significant than any of the others
because its critical difference is the smallest. Likewise
the SSD will declare fewer differences significant than any
other because its critical difference is the largest. In
fact, on the basis of critical differences alone, we can rank
the procedures according to expected error rates and power
from (F)LSD (high) through MRT, SNK, HSD to the SSD (low).
In summary, it appears that answers to the question of
which multiple comparison technique to use in a given situ-
ation are more philosophical than mathematical. Those who
are deeply concerned about avoiding Type I errors recommend
the more conservative, but less powerful, αew procedures such
as SSD and HSD. Those who are more concerned with power and
significance of specific contrasts recommend the more power-
ful, but less protected, αpc procedures such as (F)LSD and
SNK.
The decision as to which error rate is more appropriate
and what significance level is to be used must, finally, be
made by the experimenter (51, p. 327). The method of choice
depends on the experimenter's decision regarding which type
of error he would most like to minimize (47, p. 50). If an
investigator chooses a definition of error rate, a test, and
a significance level then this information permits him to
compute the significance level for any other suitable test
and error rate. This knowledge eliminates any paradox (51,
p. 328).
In March of 1982, Myette and White presented a paper at
the Annual Meeting of the American Educational Research
Association. They had attempted to synthesize the findings
of all Monte Carlo studies of multiple comparison procedures
in order to overcome the apparent contradictions of the
existing studies and bring some clarity to the problem of
choosing an appropriate multiple comparison technique. Only
twelve of the twenty Monte Carlo studies provided enough
codable data to permit synthesizing of results. But one key
conclusion was that the two-stage t-test, or FLSD, seemed to
provide a parsimonious solution to the bewildering problem of
selecting a multiple comparison procedure (45, p. 17). They
focused on two studies which had recommended this procedure:
Carmer and Swanson's 1973 study and Bernhardson's 1975
study. These studies form the heart of the present inves-
tigation and are now analyzed.
The Research of Carmer and Swanson
The 1973 study of Carmer and Swanson referenced by
Myette and White was really an extension of one published in
1971. The 1971 study focused on five multiple comparison
procedures: the unprotected LSD or multiple t-test, the
protected FLSD, the HSD, the MRT, and the BLSD. Eight sets
of 10 treatment means and seven sets of 20 treatment means,
each set possessing a different level of homogeneity, were
tested with 3, 4, 5, and 6 replicates, or scores per sample.
Data was categorized in a randomized complete block design.
One thousand replications of each experimental condition
produced (1000 reps x 15 sets of means x 4 levels of sample
size=) 60,000 experiments. Type I, II, and III error rates
were computed for each procedure. The observed rates of
correct decisions when real differences occur and the rates
of the three types of possible errors indicated that the
FLSD, the MRT, and the BLSD are more appropriate for use in
research than the LSD or the HSD. "Based on the statistical
properties observed in the study, a choice among these three
is difficult; however, the FLSD may be preferred due to its
familiarity to researchers and its simplicity of application"
(6, p. 940).
The 1973 study was an expansion of the 1971 study. Ten
multiple comparison procedures were studied in 1973. The
unprotected LSD, the MRT, and the HSD were carried over from
the 1971 study. The FLSD was tested at three levels of
significance for the F-test: the FSD1 applied the LSD when
the F-test was significant at the 0.01 level; the FSD2 ap-
plied the LSD when the F-test was significant at the 0.05
level; and the FSD3 applied the LSD when the F-test was
significant at the 0.10 level. The BLSD was renamed the
Bayes Significant Difference (BSD). The Student Newman-Keuls
(SNK), the Scheffe Significant Difference (SSD), and the
Bayes Exact Test (BET) were added to make a total of ten pro-
cedures.
Seven sets of 5 means, 8 sets of 10 means and 7 sets of
20 means, each with varying homogeneity, were tested with 3,
4, 6, and 8 replicates. Data was categorized in a randomized
block design. One thousand replications of each experimental
condition produced (1000 reps x 22 sets of means x 4 levels
of sample sizes=) 88,000 experiments. Type I, II, and III
error rates were computed for each procedure. The observed
rates of correct decisions when real differences occurred and
the observed rates of the three types of possible errors
indicated that the FSD2 and the BET are more appropriate for
use in research than the FSD3, the LSD or the MRT, which
produced excessive experimentwise error rates, or the FSD1,
SSD, or HSD, which lacked the power to detect real dif-
ferences (7, p. 74). Carmer and Swanson again recommended
the most powerful and parsimonious procedure as the method of
choice: the LSD should be applied after a preliminary F-test
is found significant at the 0.05 level (7).
The focus of criticism leveled at the Carmer and Swanson
recommendation is the per comparison error rate of the FLSD
and its accompanying excessive experimentwise error rate (4,
24, 47, 50). Carmer is not disturbed. He writes, "To me the
experimentwise error rate is of little or no use. One simply
does not care about it. The comparison is the unit of in-
terest in virtually all experiments I have been involved
with" (11). In his satirical look at the multiple comparison
problem, Carmer has Baby Bear, a young researcher, saying
that theoretical statisticians were "looking for honey up the
wrong tree when they invented experimentwise error rates" (9,
p. 123). This perspective grows directly out of Carmer's
view of the role of the experiment. A study of his use of
the randomized block design provides insight into this view.
The Carmer and Swanson Model
Carmer and Swanson used a randomized complete block
design in both 1971 and 1973. Though the model is not
clearly described in educational research terms in the Carmer
and Swanson articles, this design is confirmed by Welsch (55,
p. 568) and most recently in personal letters from Kirk and
Carmer (11, 42). This design is a two-way ANOVA with treat-
ments as one variable and blocks as the other. Each treat-
ment cell contains one score (41, p. 238). This design is
shown in Figure 5.
                         TREATMENTS

               A1     A2     A3    ...    Aj

     B    B1   Y11    Y12    Y13   ...    Y1j
     L
     O    B2   Y21    Y22    Y23   ...    Y2j
     C
     K    B3   Y31    Y32    Y33   ...    Y3j
     S     .    .      .      .            .
           .    .      .      .            .
          Bn   Yn1    Yn2    Yn3   ...    Ynj

     Fig. 5—Randomized block design
The blocking variable in the Carmer and Swanson studies was
replications of treatments which took values of 3, 4, 6, and
8. A replication in agricultural research is synonymous with
a score or "subject" in educational research (41, p. 238).
The reason for using the randomized block design grows
out of practical considerations of agricultural research.
Carmer and Walker shed light on this reasoning by way of an
example. Given fifteen cultivars, how might a researcher
determine which cultivars differ significantly in their yield
from the others? One approach is to pair all fifteen cul-
tivars with the others. This yields (15x14/2 =) 105 pairs of
cultivars. Using four replications for each of two cultivars
requires eight plots. Eight plots for 105 pairs requires 840
plots to conduct the experiment. Using the randomized block
design, all fifteen cultivars with their four replications
each can be tested with sixty plots. This is shown in Figure
6.
              COLUMNS                         COLUMNS

           1   2   3   4               1   2   3   ...  15

     R     X   X   X   X          R    X   X   X   ...   X
     O     X   X   X   X          O    X   X   X   ...   X
     W                            W    X   X   X   ...   X
     S                            S    X   X   X   ...   X

     15x14/2 = 105 trials              1 trial
     CxRx105 = 840 plots               CxR = 60 plots

     Fig. 6—Comparison of two research designs
It is good economy to reduce the required plots from 840
to 60. Further, this change yields a more efficient statisti-
cal design in that the first approach yields three error
degrees of freedom (v=3) while the latter yields forty-two
(v=42) (9, p. 122). The unit of interest, the individual
comparisons between the 105 pairs of cultivars, has not
changed. Therefore, the per comparison, not the experiment-
wise error rate, is of primary interest to the researcher.
Calculating the experimentwise error rate for the randomized
complete block design in Figure 6 yields a value of 0.78.
However, calculating the experimentwise error rate for the
105 independent trials yields a value of (1 - 0.95^105 =)
0.9954 (10).
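The second figure follows from the familiar formula for c independent tests at significance level α, αew = 1 - (1 - α)^c, which can be checked directly:

```python
# Experimentwise Type I error rate for c independent tests at alpha = 0.05:
#   alpha_ew = 1 - (1 - alpha) ** c
alpha = 0.05
c = 15 * 14 // 2               # 105 pairwise comparisons among 15 cultivars
alpha_ew = 1 - (1 - alpha) ** c
print(round(alpha_ew, 4))      # → 0.9954
```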
It is quite clear that the conceptual unit of interest
is the individual comparison and not the experiment. Carmer
and Walker state that "the penalties imposed by the use of an
experimentwise error rate [which include larger critical
values, larger Type II error rate, smaller power, and smaller
correct decision rate (10, p. 14)] should not be inflicted
upon the experimenter because he used a design with 60 ex-
perimental units rather than 105 trials occupying 840 experi-
mental units" (10, p. 13).
The equation for the randomized block design (41, p.
240) is given by

     Yij = μ + τj + πi + εij                    Eq. 16

where Yij is a score for the experimental unit in block i and
treatment level j, μ is the overall population mean, τj is
the effect of treatment level j and is subject to the
restriction that Στj = 0, πi is the effect of block i which is
normally distributed, and εij is the experimental error that
is normally distributed and independent of the block effect.
Gill recommends blocking units into homogeneous groups as the
chief device for reducing experimental error (24, p. 1507).
However, Carmer and Swanson set their block effect to a
constant zero (6, p. 942; 7, p. 68). Under this condition,
the randomized complete block design can be simplified to the
completely randomized design (41, p. 240). The equation for
the completely randomized design (41, p. 135) is given by

     Yij = μ + τj + εi(j).                      Eq. 17
The randomized complete block design has the advantage
of specifying the source of experiment error more precisely
than the completely randomized design. It does this by
dividing the error term, εi(j), in equation 17 into two
parts. The first part accounts for error among blocks, πi,
and the second accounts for the remainder. When the block
effect is zero, the generalized error terms in both equations
are equal. Under this condition, equation 17 can be used to
generate scores as precisely as equation 16. This is the
model used by Keselman and others in several studies (32, p.
99; 35, p. 264; 36, p. 127; 37, p. 1051; 38, p. 48; 40, p.
585).
Articles Citing the Carmer and Swanson Studies
The impact of the Carmer and Swanson recommendations for
the FLSD can be seen by the number of writers who have cited
their work. The Science Citation Index, 1973-1984, cites
seventy articles referencing the 1973 study. Thirty-one
articles published since 1974 used the FSD2 procedure and
specifically cited the Carmer and Swanson study (See Appendix
F for a complete list of articles noted in this section).
Four studies used the Bayes Exact Test which was the secon-
dary recommendation of the 1973 study. The BET performed as
well as the FLSD but is more difficult to use. Three ar-
ticles cited the 1973 study, but then used procedures that
the study did not recommend: the unprotected LSD, the Tukey
HSD and the Duncan MRT. Three articles discussed the general
use of multiple comparisons and were clearly supportive of
the Carmer and Swanson findings, while eight others only made
a passing reference to the studies. Eighteen articles could
not be located for analysis. Three articles openly chal-
lenged the findings of Carmer and Swanson on the basis of
error rate considerations.
One article extended the findings of the Carmer and
Swanson 1973 study and was cited by Myette and White in
support of the two-stage LSD technique. This is the 1975
study by Clemens S. Bernhardson (4).
The Research of Clemens Bernhardson
Bernhardson's study is a refutation of the findings of
Boardman and Moffitt's 1971 study (5). Their recommendation
against the LSD was based on empirical tests which were made
without doing a preliminary F test (4, p. 229). Bernhardson
developed four formulas with which to compute per comparison
and experimentwise error rates for multiple comparison proce-
dures to be used only after a significant F test. Equation
18 gives the per comparison rate for the combination sig-
nificant F test and multiple comparison procedure.
           Number of Type I errors following significant F test
     αA =  ----------------------------------------------------    Eq. 18
                  (Number of experiments)(k(k-1)/2)
This formula reduced the αpc of the FLSD below the nominal
level, but not as much as the HSD procedure. Bernhardson
further modified equation 18 by changing the denominator to
reflect only experiments with significant F-ratios. This
definition is given by equation 19.
           Number of Type I errors following significant F test
     αB =  ----------------------------------------------------    Eq. 19
           (Number of experiments with significant F)(k(k-1)/2)
This formula produced an unacceptably high error rate and was
not included in this study.
Bernhardson also studied two modifications of the for-
mula for experimentwise error rate. Equation 20 gives the
experimentwise error rate for the combination of a sig-
nificant F test and a multiple comparison procedure.
           Number of experiments with a significant F
                  and one or more Type I errors
     αC =  -------------------------------------------             Eq. 20
                     Number of experiments
This procedure reduced the αew of the FLSD to the level of the
HSD and was used in this study. An additional modification
of the experimentwise error rate was made by changing the
denominator to reflect only experiments with significant F-
ratios. This definition is given by equation 21.
           Number of experiments with a significant F
                  and one or more Type I errors
     αD =  -------------------------------------------             Eq. 21
                     Number of experiments
                     with a significant F
This formula also proved to yield excessive error rates and
was not used in this study. Bernhardson's results demon-
strated that the use of the modified formula reduces the two-
stage LSD experimentwise Type I error rate to the nominal
level of significance (4, p. 231). Since excessive experi-
mentwise error rate has been the chief argument against the
FLSD, it would seem that Bernhardson's findings demand fur-
ther investigation. This is precisely the conclusion of
Myette and White in their synthesis of twenty empirical
studies of multiple comparison procedures (45, pp. 13-14).
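Equations 18 and 20 are simple ratios of simulation counts. A sketch of how such counts translate into rates (the counts below are hypothetical, and the function names are mine, not Bernhardson's):

```python
def per_comparison_rate(type1_after_sig_f, n_experiments, k):
    """Eq. 18: Type I errors following a significant F test, divided by
    all possible pairwise comparisons over all experiments."""
    return type1_after_sig_f / (n_experiments * (k * (k - 1) / 2))

def experimentwise_rate(sig_f_with_errors, n_experiments):
    """Eq. 20: experiments with a significant F and at least one Type I
    error, divided by all experiments."""
    return sig_f_with_errors / n_experiments

# Hypothetical counts from a 1000-repetition simulation with k = 4 groups:
a_pc = per_comparison_rate(type1_after_sig_f=18, n_experiments=1000, k=4)  # 18 / 6000
a_ew = experimentwise_rate(sig_f_with_errors=47, n_experiments=1000)       # 47 / 1000
```

The key feature of both definitions is the denominator: dividing by all experiments (rather than only those with a significant F, as in Equations 19 and 21) is what holds the rates near the nominal level.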
It should be noted that Carmer has recently become more
extreme in his position. Given the fact that ANOVA tests a
complete null hypothesis, that is, that all means in the ex-
periment are equal; and the fact that one can scarcely design
a realistic experiment in which ten or fifteen means are
equal ["Baby Bear considered it to be rather unlikely that
all 15 cultivars were genetically alike, so he did not worry
a great deal about possible Type I errors." (9, p. 122)], the
ANOVA is known to be significant before it is applied. It is
therefore unnecessary.
     In deciding whether to use the ordinary LSD or the
     restricted LSD, the experimenter needs to consider the
     question: "How likely is it that all [k] treatments in
     my experiment have exactly the same true means?" If it
     is quite unlikely that all [k] treatments are equal,
     there may be little or no point in requiring the
     analysis of variance F ratio to be significant. On the
     other hand, if the experimenter has evidence that all
     [k] treatments might be expected to be equal, use of the
     restricted LSD [FLSD] may be a good choice (10, p. 12).
While one might not expect ten or fifteen means in an experi-
ment to be equal, what about experiments that consist of
smaller k's such as those common in educational research?
Carmer writes,

     I think that for 3, 4, or 5 treatments one might opt
     for the FLSD if he felt there really were possibilities
     that all treatments were the same. I think that would
     be preferable to using an experiment-wise procedure. In
     agricultural research, we are, of course, concerned
     about Type I errors, but in general using the LSD or
     FLSD at 5% or 1% will give adequate protection. If one
     has 15 treatments and uses Tukey's test, it is
     equivalent to using the LSD at [a far smaller
     significance level], and one has little chance of making
     a Type I error and little chance of detecting real
     differences (11).
The purpose of this study was to empirically study the
findings of Carmer and Swanson (6, 7) and Bernhardson U ) to
better understand the dynamics of the selected multiple com-
parison procedures, principally the FLSD and HSD, in relation
to per comparison and experimentwise error rates.
CHAPTER BIBLIOGRAPHY
1. Aitkin, M. A., "Multiple Comparisons in Psychological Experiments," The British Journal of Mathematical and Statistical Psychology, XXII (November 1969), pp. 193-198.
2. Balaam, L. N., "Multiple Comparisons: A Sampling Experiment," Australian Journal of Statistics, V (1963).
3. Barcikowski, Robert S., "Statistical Power With Group Mean As the Unit of Analysis," ED 191 910, National Institute of Education Grant, (Ohio State University, 1980).
4. Bernhardson, Clemens S., "375: Type I Error Rates When Multiple Comparison Procedures Follow a Significant F Test of ANOVA," Biometrics, XXXI (March 1975), pp. 229-232.
5. Boardman, Thomas J. and Moffitt, Donald R., "Graphical Monte Carlo Type I Error Rates for Multiple Comparison Procedures," Biometrics, XXVII (September 1971), pp. 738-743.
6. Carmer, S. G. and Swanson, M. R., "Detection of Differences Between Means: A Monte Carlo Study of Five Pairwise Multiple Comparison Procedures," Agronomy Journal, LXIII (1971), pp. 940-945.
7. ________, "An Evaluation of Ten Pairwise Multiple Comparison Procedures by Monte Carlo Methods," Journal of the American Statistical Association, LXVIII (1973), pp. 66-74.
8. ________, "Optimal Significance Levels for Application of the Least Significant Difference in Crop Performance Trials," Crop Science, XVI (January-February 1976), pp. 95-99.
9. ________ and Walker, W. M., "Baby Bear's Dilemma: A Statistical Tale," Agronomy Journal, LXXIV (1982).
10. ________, "Pairwise Multiple Comparisons Procedures for Treatment Means," Technical Report Number 12, University of Illinois, Department of Agronomy, Urbana, Illinois, (December 1983), pp. 1-33.
11. ________, Professor of Biometry, University of Illinois, Urbana, Illinois, personal letter received January 14, 1985.
12. Duncan, D. B. and Brant, L. J., "Adaptive t Tests for Multiple Comparisons," Biometrics, XXXIX, pp. 790-794.
13. Dunnett, C. W., "Answer to Query 272: Multiple Comparison Tests," Biometrics, XXVI (September 1969), pp. 139-140.
14. Einot, Israel and Gabriel, K. R., "A Study of Powers of Several Methods of Multiple Comparisons," Journal of the American Statistical Association, LXX (1975), pp. 574-583.
15. Federer, Walter T., Experimental Design: Theory and Application. New York, The Macmillan Company, 1955.
16. Ferguson, George A., Statistical Analysis in Psychology and Education, 5th ed., New York, McGraw Hill Book Publishers, 1981.
17. Fisher, R. A., Statistical Methods for Research Workers. 6th ed., Edinburgh (London), Oliver and Boyd, 1936.
18. ________, The Design of Experiments, 2nd ed., Edinburgh, Oliver and Boyd, 1937.
19. Fryer, H. C., Concepts and Methods of Experimental Statistics. Boston, Allyn and Bacon, 1966.
20. Gabriel, Ruben K., "Comment," Journal of the American Statistical Association, LXXIII (September 1978), pp. 485-487.
21. Games, Paul, "Inverse Relation Between the Risks of Type I and Type II Errors and Suggestions for the Unequal n Case in Multiple Comparisons," Psychological Bulletin, LXXV (1971), pp. 97-102.
63
22. ________, Keselman, H. J., and Clinch, Jennifer J., "Multiple Comparisons for Variance Heterogeneity," British Journal of Mathematical and Statistical Psychology, XXXII (1979), pp. 133-142.
23. Gill, J. L., "Current Status of Multiple Comparisons of Means in Designed Experiments," Journal of Dairy Science, LVI (1973).
24. ________, "Evolution of Statistical Design and Analysis of Experiments," Journal of Dairy Science, LXIV (June 1981), pp. 1494-1519.
25. Glass, Gene V. and Hopkins, Kenneth D., Statistical Methods in Education and Psychology, 2nd ed., Englewood Cliffs, New Jersey, Prentice-Hall, Inc., 1984.
26. Harter, H. Leon, "Error Rates and Sample Sizes for Range Tests in Multiple Comparisons," Biometrics, XIII (1957).
27. Hinkle, Dennis E.; Wiersma, William; and Jurs, Stephen G., Basic Behavioral Statistics, Boston, Houghton Mifflin Company, 1982.
28. Howell, David C., Statistical Methods for Psychology. Boston, Duxbury Press, 1982.
29. Howell, John F. and Games, Paul A., "The Effects of Variance Heterogeneity on Simultaneous Multiple Comparison Procedures with Equal Sample Size," British Journal of Mathematical and Statistical Psychology. XXVII (1974), pp. 72-81.
30. Johnson, Palmer O. and Jackson, Robert W. B., Modern Statistical Methods: Descriptive and Inductive, Chicago, Rand McNally & Company, 1959.
31. Kemp, K. E., "Multiple Comparisons: Comparisonwise and Experimentwise Type I Error Rates and Their Relationship to Power," Journal of Dairy Science, LVIII (September 1975), pp. 1372-1378.
32. Keselman, H. J., "A Power Investigation of the Tukey Multiple Comparison Statistic," Educational and Psychological Measurement, XXXVI (1976), pp. 97-104.
33. ________, Games, Paul, and Rogan, Joanne C., "Protecting the Overall Rate of Type I Errors for Pairwise Comparisons With an Omnibus Test Statistic," Psychological Bulletin, LXXXVI (July 1979), pp. 884-888.
34. ________ and Murray, Robert, "Tukey Tests for Pairwise Contrasts Following the Analysis of Variance: Is There a Type IV Error?," Psychological Bulletin, LXXXI (1974), p. 609.
35. ________ and Rogan, Joanne C., "Effect of Very Unequal Group Sizes on Tukey's Multiple Comparison Test," Educational and Psychological Measurement, XXXVI (Summer 1976), pp. 263-270.
36. ________ and Rogan, Joanne C., "An Evaluation of Some Non-Parametric and Parametric Tests for Multiple Comparisons," British Journal of Mathematical and Statistical Psychology, XXX (May 1977), pp. 125-133.
37. ________, "The Tukey Multiple Comparison Test: 1953-1976," Psychological Bulletin, LXXXIV (September 1977), pp. 1050-1056.
38. ________, "A Comparison of the Modified-Tukey and Scheffe Methods of Multiple Comparisons for Pairwise Contrasts," Journal of the American Statistical Association, LXXIII (March 1978).
39. ________ and Toothaker, Larry E., "Comparison of Tukey's T-Method and Scheffe's S-Method for Various Numbers of All Possible Differences of Averages Contrasts Under Violation of Assumptions," Educational and Psychological Measurement, (1974), pp. 511-519.
40. ________ and Shooter, M., "An Evaluation of Two Unequal n Forms of the Tukey Multiple Comparison Statistic," Journal of the American Statistical Association, LXX (September 1975).
4-1. Kirk, Roger E., Experimental Design: Procedures for the Behavioral Sciences, 2nd ed., Belmont, Calif^nilT Brooks/Cole Publishing Company, 1982.
42. Kirk, Roger E., Professor of Psychology, Baylor University, Waco, Texas, personal letter received January 22, 1985.
43. Levin, J. R. and Marascuilo, L. A., "Type IV Errors and Interactions," Psychological Bulletin, LXXVIII (1972), pp. 368-374.
44. Light, Richard J. and Pillemer, David B., Summing Up: The Science of Reviewing Research, Cambridge, Harvard University Press, 1984.
45. Myette, Beverly M. and White, Karl R., "Selecting An Appropriate Multiple Comparison Technique: An Integration of Monte Carlo Studies," Paper presented before the Annual Meeting of the American Educational Research Association, March 19-23, 1982.
46. O'Neill, R. and Wetherhill, G. B., "The Present State of Multiple Comparison Methods," Royal Statistical Society (Series B), XXXIII (1971).
47. Petrinovich, Lewis F. and Hardyck, Curtis D., "Error Rates for Multiple Comparison Methods: Some Evidence Concerning the Frequency of Erroneous Conclusions," Psychological Bulletin, LXXI (1969), pp. 43-54.
48. Ramsey, Philip H., "Power Differences Between Pairwise Multiple Comparisons," Journal of the American Statistical Association, LXXIII (1978), p. 479.
49. Roscoe, John T., Fundamental Research Statistics for the Behavioral Sciences, 2nd ed., New York, Holt, Rinehart and Winston, Inc., 1975.
50. Ryan, T. A., "Comment on 'Protecting the Overall Rate of Type I Errors for Pairwise Comparisons With an Omnibus Test Statistic,'" Psychological Bulletin, LXXXVIII (September 1980), pp. 354-355.
51. Steel, R. G. D., "Query 163: Error Rates in Multiple Comparisons," Biometrics (1961), pp. 326-328.
52. Thomas, D. A. H., "Error Rates in Multiple Comparisons Among Means: Results of a Simulation Exercise," Unpublished Master's Thesis, University of Kent, Canterbury, England.
53. Waldo, D. R., "An Evaluation of Multiple Comparison Procedures," Journal of Animal Science, XLII (1976), pp. 539-544.
54. Waller, Ray A. and Duncan, David B., "A Bayes Rule for the Symmetric Multiple Comparisons Problem," Journal of the American Statistical Association, LXIV (December 1969), p. 1484.
55. Welsch, Roy E., "Stepwise Multiple Comparison Procedures," Journal of the American Statistical Association, LXXII (1977), pp. 566-575.
56. Winer, B. J., Statistical Principles in Experimental Design, New York, McGraw-Hill Book Company, 1962.
CHAPTER III
PROCEDURES
The Simulation Plan
The following plan was followed for generating data,
applying the F-test and the six specified multiple comparison
procedures, and presenting the summary statistics.
Generating random numbers
The heart of this Monte Carlo study was a pseudo-random
number generator developed from the Fortran computer program
"RANDU." RANDU generates twelve (U=12) uniform random num-
bers ranging from 0.00 to 0.99, adds them together, and
subtracts the value of six (U/2) from the total. The result
is a pseudo-random number which, along with N-1 others,
simulates a normal distribution with a mean of 0 and standard
deviation of 1.0.
The generator routine was converted from Fortran IV to
BASIC. The BASIC version of the generator created an ex-
tremely leptokurtic distribution with U set to 12 (See Appen-
dix A). Scores were concentrated about the mean more than
one would expect in a normal curve. A BASIC program was
written to modify U and test distributions of scores by the
chi-square goodness of fit test until the generator could
produce a population of at least 1000 scores which fit the
theoretical normal curve. Beginning with U=12, 1000 scores
were generated by the equation

     Xij = μ + τj + εi(j)                       Eq. 23

This is equation 17 in Chapter II. The value of μ was a
constant 100. The value of τj was a constant 0 representing
the null condition of no treatment effect. The value of
εi(j) was simulated by the following BASIC program segment:
3690 ' GENERATE SCORES
3700 FOR S%=1 TO K            ' k groups
3710 FOR N%=1 TO NN(S%)       ' n for each group
3720 FOR J%=1 TO U            ' add U uniform random numbers
3730 A=A+RND
3740 NEXT J%
3750 E=(A-B)*SIGMA            ' subtract B=U/2 from sum; multiply by SIGMA=10
3760 A=0
3770 X=MU+E                   ' score = 100 + E(rror)
 ...
3840 NEXT N%                  ' (calculations of sums for group means not shown)
3850 NEXT S%
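The scheme above can be sketched in a modern language (a Python illustration, not the original BASIC; the function name is mine). One point worth noting: summing U uniform numbers and subtracting U/2 yields a variate with mean 0 but standard deviation √(U/12), not 1.0, which is consistent with the standard deviations near 1.29 reported later in Table X for U = 20.

```python
import random
import statistics

def pseudo_normal(u=20, mu=0.0, sigma=1.0, rng=random):
    """Sum u uniform(0,1) numbers, subtract u/2, and scale by sigma,
    mirroring lines 3720-3770 of the BASIC segment.

    The sum of u uniforms has mean u/2 and variance u/12, so the
    result has mean mu and standard deviation sigma * sqrt(u/12).
    """
    total = sum(rng.random() for _ in range(u))
    return mu + (total - u / 2.0) * sigma

random.seed(1985)
scores = [pseudo_normal(u=20) for _ in range(10000)]
print(round(statistics.mean(scores), 2))   # near 0
print(round(statistics.stdev(scores), 2))  # near sqrt(20/12), about 1.29
```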
Scores were then categorized into one of the following
exclusive score ranges: (1) less than or equal to 60, (2)
less than 70, (3) less than 80, (4) less than 90, (5) less
than 100, (6) less than 110, (7) less than 120, (8) less than
130, (9) less than 140, and (10) greater than or equal to
140. The theoretical Normal Curve distribution was divided
into ten equal z-score ranges which yielded theoretical
percentages for N scores categorized into ten classes. These
percentages were multiplied by N = 1000, yielding expected
frequencies for the ten categories of 1.3, 10.9, 54.6,
159.8, 273.4, 273.4, 159.8, 54.6, 10.9, and 1.3 scores
respectively.
Actual counts of range frequencies were tested for goodness
of fit against expected counts from the normal distribution.
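Working backward from the expected frequencies quoted above, the ten "equal z-score ranges" appear to span z = -3 to z = +3 in steps of 0.75; that reading (an inference, not stated explicitly in the text) reproduces the quoted counts exactly. A short Python check:

```python
from math import erf, sqrt

def normal_cdf(z):
    """Standard normal CDF via the error function."""
    return 0.5 * (1.0 + erf(z / sqrt(2.0)))

# Nine cut points from z = -3 to z = +3 in steps of 0.75
cuts = [-3.0 + 0.75 * i for i in range(9)]
probs, lower = [], 0.0
for c in cuts:
    probs.append(normal_cdf(c) - lower)
    lower = normal_cdf(c)
probs.append(1.0 - lower)  # upper tail

expected = [round(1000 * p, 1) for p in probs]
print(expected)
# [1.3, 10.9, 54.6, 159.8, 273.4, 273.4, 159.8, 54.6, 10.9, 1.3]
```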
This process was repeated ten times for each value of
U. An average chi-square value for the ten repetitions was
calculated and tested against the critical value of 16.919
(df = 9, a = 0.05). U was then incremented by 1 and the
procedure repeated. This program and the detailed results of
the chi-square tests are located in Appendix A. Table VIII
shows mean chi-square values obtained when 1000 scores were
placed into ten categories for ten repetitions.
TABLE VIII
MEAN CHI-SQUARE VALUES FOR TEN REPETITIONS OF N=1000 SCORES AND A GIVEN VALUE OF U

U*      MEAN CHI-SQUARE         U       MEAN CHI-SQUARE
12      100.062                 21        8.720
13       80.123                 22       10.453
14       63.245                 23       14.963
15       49.230                 24       20.424
16       39.384                 25       24.914
17       27.352                 26       36.280
18       20.113                 27       44.968
19       12.643 **              28       62.382
20        8.678 ***             29       86.846
                                30      100.194

*   Number of uniform randoms used to generate one "normal" random
**  "Good fit" at 5% level of significance
*** Best fit between generated scores and normal curve for N=1000 scores
As U increased from 12 to 18, the leptokurtosis of the
generated distributions decreased toward normality. As U
increased beyond 21, the generated distributions passed
through normality and became increasingly platykurtic. As
shown in Table VIII, the best fit between distributions
generated by the BASIC program and the normal curve occurred
when U was set at 20. That is, using twenty uniform random
numbers to produce each normal score produced the best simu-
lated normal distribution.
The next question to be answered was how large a popu-
lation of pseudo-random numbers could be generated while
using U = 20 uniform random numbers and still "fit" the
normal curve. The BASIC program was modified to maintain the
value of U at 20 and vary the size of N. Sets of 1000, 2000,
3000, 4000, and 5000 scores were generated and placed in ten
categories. A chi-square value was computed on each set of N
scores. This was repeated ten times, and a mean chi-square
value computed. This was tested against the same critical
value as before: 16.919 (df = 9, α = 0.05). All of these
sets fit the normal curve. Ten sets of 10,000 scores were
tested in the same manner. A significant chi-square declared
these distributions non-normal. Ten sets of 7500 and 5500
scores were tested in the same manner with the same result.
Table IX shows mean chi-square values for various sizes
of N using the best case U = 20 uniform random numbers. As
shown in the table, the random number generator produced
"normal" distributions up to N = 5000. This study used a
maximum of 180 scores per experiment (k=6, J=7, n's =
80, 20, 20, 20, 20, 20). Populations of 1000 are common in educa-
tional research. Therefore, the scores generated by the
pseudo-random number generator and equation 23 were well
within the bounds of a normal population distribution for
purposes of this study.
TABLE IX
MEAN CHI-SQUARE VALUES FOR TEN REPETITIONS OF N SCORES WITH U = 20
N        MEAN CHI-SQUARE
1000      8.610 *
2000      9.085 *
3000     11.515 *
5000     15.402 *
5500     17.195
7500     23.867
10000    22.227

* "Good fit" at 5% level of significance
The BASIC routine was further tested by generating ten
sets of ten thousand scores each with μ = 0, σ = 1, and U =
20. A mean and standard deviation were computed for each
set. Finally, average values for mean and standard deviation
were computed across all ten sets. These average values were
compared to 0 and 1.0 respectively to estimate the accuracy
of the BASIC version of RANDU. Results of this test are
shown in Table X.
TABLE X
MEAN AND STANDARD DEVIATION VALUES FOR TEN SETS OF N=10,000 SCORES
Set Mean Standard Deviation
 1     0.0145909     1.2959851
 2    -0.0092742     1.2947861
 3     0.0034212     1.2959851
 4     0.0026654     1.2860699
 5     0.0034329     1.2981012
 6     0.0092000     1.2819469
 7    -0.0031013     1.3148083
 8    -0.0171827     1.2973865
 9    -0.0067541     1.2890456
10     0.0186605     1.2829025
Mean: 0.0015659 1.2937017
Interpolating Critical Value Tables
Four critical value tables were required to analyze the
means of the generated scores. These were the F-distribution
table (3, Table E.5), the Studentized Range table (3, Table
E.7), the Duncan Multiple Range table (3, Table E.8), and the
Studentized Augmented Range table (3, Table E.18).
A BASIC program was written to interpolate the four
tables and store them in a single random access file for use
by the main program. Appropriate values were read from these
tables upon each k,J cycle of the simulation. These values
were printed at the top of each data summary sheet (see
Appendix E) in order to insure that each procedure was using
the proper testing criteria. Interpolated critical values
were considered sufficiently accurate for this study since
mainframe statistical packages such as SPSS and SAS use
interpolated critical values (4, p. 129).
The Main BASIC Program
The main program for this study, listed in Appendix C,
was written to do the following:
1. Initialize all variables and the printer.—This
subroutine dimensioned all variable arrays, set all variables
to their initial values, created all printing formats for
generated results and initialized the printout settings on
the printer.
2. Set cycle parameters.—This section set initial
values for k and J. The values of k ranged from 3 to 6 means
per experiment. The values of J ranged from 1 to 7, reflec-
ting seven categories of sample size. Upon initializing the
program, specific values could be selected for k and J. The
default values were k=3 and J=1. Subsequent cycles incre-
mented J to J+1 until J=8. J was then reset to 1 and k set
to k+1. Program execution ended when k incremented to 7.
4. Assigned sample n's.—For sample size conditions J(1)
through J(5), n was set to 5, 10, 15, 20, and 25 respec-
tively. For condition J(6), n was set in increments of 5,
beginning at 10. That is, for k=3, n1 was set to 10, n2 was
set to 15, and n3 was set to 20. Total N for J(6) ranged
from 45 (k=3) to 135 (k=6). For condition J(7), n1 was set
to 80, and all other n.'s to 20. Total N for J(7) ranged
from 120 (k=3) to 180 (k=6). These sample sizes were chosen
to reflect the common conditions of small k's and unequal n's
in educational research. Kirk recommends the HSD modifi-
cation by Spjøtvoll-Stoline when sample n's are approximately
equal and the Tukey-Kramer modification when there is a
moderate or large imbalance among sample n's (3, p. 120).
Therefore, J(6) was selected to reflect educational studies
in which groups are nearly equal in size and J(7) was
selected to reflect those studies in which one group is much
larger than the others.
5. Obtained critical values.—Critical values were read
from the four interpolated critical value tables which
resided in a single random access computer file. Each criti-
cal value was located in the random access file by a record
number computed by
R = (F-1)×1000 + (I-2)×200 + J        Eq. 24
where F was the file number (1-4), I was the table column
number, and J was the table row number. For the F-test, F
was 1, I was degrees of freedom between groups [k-1], and J
was degrees of freedom within groups [N-k]. The F-ratio
critical value for the case where k=4 and J=4 (equal n case,
n=20) was found as follows:
R = (F-1)×1000 + (I-2)×200 + J        Eq. 25
  = (1-1)×1000 + (3-2)×200 + 76
  = 276.
Record number 276 held the value of 2.739, the interpolated
F-test critical value for a = 0.05, dfb = 3, dfw = 76. The
critical value for the (F)LSD, located in the Studentized
Range Table (F=2, r=2) was found as follows:
R = (F-1)×1000 + (I-2)×200 + J        Eq. 26
  = (2-1)×1000 + (2-2)×200 + 76
  = 1076.
Record number 1076 held the value of 2.822, the interpolated
critical Studentized Range value for q(2), a = 0.05, dfw =
76. Values for hypothesis testing were obtained in this same
manner for the HSD, SNK(r) and MRT(r) for each k,J
combination. File three held values of the Multiple Range
table and file four held values of the Studentized Augmented
Range table, used by the Spjøtvoll-Stoline procedure.
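The record-locator arithmetic of equation 24 is simple enough to sanity-check in code (a Python sketch of the lookup; the original was BASIC, and the function name is mine):

```python
def record_number(f, i, j):
    """Record number in the random access file of critical values.

    f: file number (1-4), i: table column, j: table row,
    following R = (F-1)*1000 + (I-2)*200 + J (equation 24).
    """
    return (f - 1) * 1000 + (i - 2) * 200 + j

# F-test critical value for k=4, equal n=20: dfb = 3, dfw = 76
print(record_number(1, 3, 76))   # 276
# Studentized Range value q(2) for the (F)LSD, dfw = 76
print(record_number(2, 2, 76))   # 1076
```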
6. Generated scores and calculated an F-ratio.—The
specified number of scores were generated with the random
number generator. A sample of generated scores is located in
Appendix B. An F-ratio was computed and compared to the
appropriate F table value. If the computed F-ratio equalled
or was larger than the table value, then the F-ratio was
declared significant. A flag was set to "1" for a sig-
nificant F-test and "0" for a non-significant F-test. A
running count of significant F-ratios was maintained and
printed on the summary sheet.
7. Computed critical differences.—The six specified
multiple comparison procedures were used to test the pairwise
differences between group means. As demonstrated in Chapter
II, each multiple comparison procedure computes its critical
difference in a unique way. For the equal n cases (J=1 to
5), the standard error of the mean was computed the same way
for the (F)LSD, the MRT, the SNK and the HSD. The standard
error of the mean is the square root of the mean square
within value, taken from the ANOVA table, divided by n, the
number of scores in each group (1, p. 370):
s_x̄ = √(MSw/n)        Eq. 27
The critical difference for each multiple comparison
procedure was computed by multiplying the standard error of
the difference by the appropriate table critical value.
These equations are summarized as
(F)LSD = q(α,2,v) √(MSw/n)            Eq. 28
HSD    = q(α,k,v) √(MSw/n)            Eq. 29
SNK(r) = q(α,r,v) √(MSw/n)            Eq. 30
MRT(r) = m(α,r,v) √(MSw/n)            Eq. 31
SSD    = √[(k-1)F_cv] √[2MSw/n]       Eq. 32
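For the equal n cases the procedures differ only in the table value that multiplies the standard error, so equations 27 to 32 can be sketched together (a Python illustration; the q, m, and F inputs stand in for the interpolated table lookups, and q(α,4,76) below is an approximate value of my own, not taken from the text):

```python
from math import sqrt

def critical_differences(ms_within, n, k, q2, qk, f_cv):
    """Critical differences for the equal n case (equations 27-29, 32).

    q2 = q(alpha, 2, v) and qk = q(alpha, k, v); the SNK and MRT
    use q(alpha, r, v) and m(alpha, r, v) in the same pattern.
    """
    se = sqrt(ms_within / n)                              # equation 27
    flsd = q2 * se                                        # equation 28
    hsd = qk * se                                         # equation 29
    ssd = sqrt((k - 1) * f_cv) * sqrt(2 * ms_within / n)  # equation 32
    return flsd, hsd, ssd

# Illustrative inputs: MSw = 100, n = 20, k = 4,
# q(.05,2,76) = 2.822, q(.05,4,76) ~ 3.715, F(.05,3,76) = 2.739
flsd, hsd, ssd = critical_differences(100.0, 20, 4, 2.822, 3.715, 2.739)
print(flsd < hsd < ssd)  # True: (F)LSD smallest, SSD largest
```

The ordering (F)LSD < HSD < SSD in this sketch matches the low-to-high pattern the text reports for Tables XI to XIV.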
Tables XI to XIV (See Appendix G) contain critical difference
values for each multiple comparison procedure for the equal n
cases for k = 3 to 6. These values are the actual values
computed for the specified k,J combination on a randomly
selected ith cycle.
It is inappropriate to compare critical differences
across J because they depend directly on the magnitude of
MSw, which was computed from randomly varying scores. Values
for each k,J combination reflect one random cycle of its set
of 1000 cycles. For a specified k,J combination, however,
one can compare the critical differences across procedures.
In every case we find the differences ranging from low — the
(F)LSD — through the MRT, SNK, and HSD to high — the SSD.
For the two k,J conditions for unequal sample sizes (J=6
and J=7), a slightly different procedure was used to compute
critical differences. The standard error of the mean re-
quired modification to deal with unequal n's. For the
(F)LSD, the MRT, the SNK, and the Kramer modification of HSD,
n was replaced by the harmonic mean of the n's of the two
means being tested (2, p. 302; 3, p. 120). That is, n was
replaced by
n_h = 2 / (1/n_i + 1/n_j)        Eq. 33
This procedure was recommended by Keselman, Murray and Rogan
as inducing less bias into the critical difference than
computing n_h on all sample n's in the experiment (2, p.
302). The formula for the standard error of the mean used for
the unequal n case was therefore √(MSw/n_h) for the (F)LSD, the
MRT, the SNK, and the Tukey-Kramer HSD.
A second modification of the HSD used in this study was
the Spjøtvoll-Stoline. The standard error of the mean was
computed using the smaller of the two n's of the two means
being tested. This number, referred to as n_min, directly
replaced the n in the formula for the standard error of the
mean (3, p. 119). Using the minimum n rather than the har-
monic mean between the two n's increased the size of the
standard error of the mean, and, in turn, increased the
magnitude of the critical difference. Therefore this modifi-
cation was more conservative than the Tukey-Kramer.
The modification for the standard error of the mean for
SSD for unequal n's involved multiplying MSw by (1/n_i + 1/n_j)
where n_i and n_j are the sample sizes for the two means being
tested (3, pp. 121-2).
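The two unequal n modifications can be compared directly (a Python sketch; the pairing of 80 and 20 mirrors the J(7) condition, and MSw = 100 is illustrative):

```python
from math import sqrt

def se_tukey_kramer(ms_within, n_i, n_j):
    """Standard error using the harmonic mean of the pair (equation 33)."""
    n_h = 2.0 / (1.0 / n_i + 1.0 / n_j)
    return sqrt(ms_within / n_h)

def se_spjotvoll_stoline(ms_within, n_i, n_j):
    """Standard error using the smaller n of the pair."""
    return sqrt(ms_within / min(n_i, n_j))

# J(7) pairing: one group of 80 and one of 20, with MSw = 100
print(round(se_tukey_kramer(100.0, 80, 20), 3))       # harmonic n = 32
print(round(se_spjotvoll_stoline(100.0, 80, 20), 3))  # n_min = 20
```

Since n_min never exceeds the harmonic mean of the pair, the Spjøtvoll-Stoline standard error is never smaller, which is why it yields the more conservative test.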
Tables XV to XVIII (See Appendix H) contain critical
difference values for each multiple comparison procedure for
the unequal n cases and k = 3 to 6. The order of these
values differs from those of Tables XI to XIV. There, criti-
cal differences were given for each k,J combination and, for
the range tests, the value of r. In Tables XV to XVIII,
critical differences are given for specific paired-mean
tests. Means were rank ordered from high to low before the
multiple comparison tests were applied. Differences between
paired means were then ordered such that, given three means,
d=1 means the difference between means 1 and 3; d=2 the
difference between 1 and 2; and d=3 the difference between 2
and 3. The critical difference used to test the pairwise
difference was based on the specific n's in the pair. There-
fore for J=6 each pairwise difference is tested by a unique
critical difference. These critical differences are shown in
Tables XV to XVIII. In the case of the SNK and MRT, the
first test (d=1) must be significant for the second test
(d=2) to be made. The complete procedure for making stepwise
tests is given in Appendix D. Suffice it to say here that
the dash (-) in the table refers to range test values not
computed because a preliminary test proved not significant.
It is inappropriate to compare critical differences
across d in the tables because the ordered comparisons,
sample sizes, and the resultant critical values varied ran-
domly within each selected cycle of its set of 1000 cycles.
For a given comparison (d), however, one can compare the
critical differences across procedures. All procedures may
be directly compared for d=1. Procedures other than the MRT
and SNK can be compared for all values of d. In most cases
we find the differences ranging from low — the (F)LSD —
through the MRT, SNK, HSD-TK, and the HSD-SS to high — the SSD.
In some cases the critical differences for the HSD-SS
procedure were more conservative than the SSD.
8. Computed summary statistics.—One thousand experi-
ments were simulated for each k,J combination. Separate
counts for each multiple comparison procedure were made of
the number of (1) all experiments with at least one Type I
error, (2) experiments with a significant F-ratio and at
least one Type I error, (3) all comparisons incorrectly
declared significantly different, and (4) comparisons incor-
rectly declared significantly different in experiments with a
significant F-ratio. The variables which held these differ-
ing counts are shown in Table XIX.
The variables without a percent sign (%) were counters
which tracked each procedure's experimentwise and comparison-
wise errors. The counters under "EXPERIMENTWISE - ALL
ANOVAS" were incremented each time at least one Type I error
was made within an experiment. The first Type I error within
an experiment set a flag which caused all other Type I errors
within that experiment to be ignored. The counters under
"EXPERIMENTWISE - SIG ANOVAS" were incremented only when a
Type I error was made within an experiment with a significant
F-ratio. Counters under "COMPARISONWISE - ALL ANOVAS" were
incremented each time a comparison was declared significantly
different. Counters under "COMPARISONWISE - SIG ANOVAS" were
incremented only when significant comparisons were detected
in experiments with a significant F-ratio.
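The counting scheme just described, with a per-experiment flag so experimentwise errors count at most once, can be sketched as follows (a Python illustration; the names and demo data are mine):

```python
def tally_errors(experiments):
    """Tally experimentwise and comparisonwise Type I errors.

    experiments: list of (f_significant, comparison_flags) pairs,
    where comparison_flags marks each pairwise test declared
    significant (all nulls are true, so every rejection is an error).
    """
    ew_all = ew_sig = cw_all = cw_sig = 0
    for f_sig, flags in experiments:
        errors = sum(flags)
        cw_all += errors            # every false rejection counts
        if errors > 0:
            ew_all += 1             # at most once per experiment
        if f_sig:
            cw_sig += errors
            if errors > 0:
                ew_sig += 1
    return ew_all, ew_sig, cw_all, cw_sig

demo = [(True, [1, 1, 0]), (False, [0, 1, 0]), (True, [0, 0, 0])]
print(tally_errors(demo))  # (2, 1, 3, 2)
```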
TABLE XIX
VARIABLES ASSOCIATED WITH SPECIFIED ERROR COUNTS FOR EACH MULTIPLE COMPARISON PROCEDURE AND TWO KINDS OF TYPE I ERROR

                    EXPERIMENTWISE              COMPARISONWISE
Procedure       ALL ANOVAS   SIG ANOVAS     ALL ANOVAS   SIG ANOVAS
LSD             LE   PLE%                   LC   PLC%
FLSD                         FE   PFE%                   FC   PFC%
MRT             ME   PME%    SME  PSME%     MC   PMC%    SMC  PSMC%
SNK             NE   PNE%    SNE  PSNE%     NC   PNC%    SNC  PSNC%
HSD*            HE   PHE%    SHE  PSHE%     HC   PHC%    SHC  PSHC%
UNEQUAL N HSD
  HSD-SS*       HE   PHE%    SHE  PSHE%     HC   PHC%    SHC  PSHC%
  HSD-TK        TE   PTE%    STE  PSTE%     TC   PTC%    STC  PSTC%
SSD             SE   PSE%    SSE  PSSE%     SC   PSC%    SSC  PSSC%

* HSD and HSD-SS shared these variables since they ran under different J's.
The variables ending in a percent sign (%) held error
rate percentages. Experimentwise error counts were divided
by 1000 to yield a proportion, then multiplied by 100 to
compute the percentage. Per comparison error counts were
divided by 1000 × k(k-1)/2, then multiplied by 100 to yield
percentages.
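As a quick check of the comparisonwise denominator (Python; the counts are illustrative, and a k=4 design has k(k-1)/2 = 6 pairwise tests per experiment):

```python
def comparisonwise_pct(error_count, k, repetitions=1000):
    """Per cent of all pairwise comparisons falsely declared significant."""
    comparisons = repetitions * k * (k - 1) // 2
    return 100.0 * error_count / comparisons

# 300 false rejections across 1000 four-group experiments
print(comparisonwise_pct(300, k=4))  # 5.0
```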
All programs were developed in interpreted BASIC to
facilitate interactive testing of the procedures. The
greatest drawback of interpreted BASIC is its slowness of
execution. Each program statement is translated into machine
language as it is being executed. One thousand repetitions
of the smallest k,J combination — k=3, J=1 — required 57
minutes to run. Therefore it was decided to compile the main
program. Using a BASIC compiler, the interpreted BASIC
program was converted into machine language. When the com-
piled program executed, there was no translation of state-
ments necessary, and 1000 repetitions of k=3 and J=1 ran in
just under five minutes. Run times for each k,J combination
were printed on the summary sheets.
For α = 0.05, it was expected that the count of sig-
nificant ANOVAs out of 1000 repetitions would be 50. Due to
the randomness of the data, however, there were sets of 1000
repetitions that produced too many or too few significant
F's. It was also noted that tests made by the multiple
comparison procedures were affected by this fluctuation: as
the count of significant F's increased or decreased, so did
the count of significant comparisons. It was decided that
only data which generated between 46 and 54 significant F's
in 1000 repetitions (rounding error for 0.050) would be used
in comparing the multiple comparison procedures. Table XX
gives the counts of significant F-ratios for each k,J com-
bination for 68,000 experiments.
TABLE XX
COUNTS OF SIGNIFICANT F-RATIOS FOR 1000 REPETITIONS OF EACH k,J COMBINATION
k = 3            k = 4            k = 5            k = 6
1 39 50 51 59 49 44 53
2 57 4-0 61 56 54 43 56 45 49 52
3 42 63 50 42 53 49 44 66 35 56 53
4 36 56 50 38 42 43 47 64 55 51 60 45
5 54 55 54 53 54
6 56 58 53 61 46 56 44 46 56 42 46
7 5<? 53 45 44 49 58 55 62 43 63 37 48
9. Printed out results.—Summary statistics for the F-
tests and multiple comparison procedures for each k,J com-
bination were printed on a single 8.5x11" page. These data
summaries are located in Appendix E.
Summary
In summary, a Monte Carlo simulation procedure was used
where (1) experiment size (k) varied from 3 to 6 means; (2)
there were seven sample size patterns (J); (3) the population
from which scores were drawn had a mean of 100 and a standard
deviation of 10; (4) 1000 repetitions were computed for each
k,J combination; (5) five multiple comparison procedures were
applied to each pairwise comparison in each experiment under
the two conditions of (a) all experiments regardless of F-
ratio significance and (b) experiments in which the F-ratio
was significant at the 0.05 level; (6) counts and percentages
were maintained for the six specified procedures: LSD, FLSD,
MRT, SNK, HSD [HSD-SS and HSD-TK for unequal n cases] and
SSD.
CHAPTER BIBLIOGRAPHY
1. Glass, Gene V. and Hopkins, Kenneth D., Statistical Methods in Education and Psychology, 2nd ed., Englewood Cliffs, New Jersey, Prentice-Hall, Inc., 1984.
2. Howell, David C., Statistical Methods for Psychology, Boston, Duxbury Press, 1982.
3. Kirk, Roger E., Experimental Design: Procedures for the Behavioral Sciences, 2nd ed., Belmont, California, Brooks/Cole Publishing Company, 1982.
4. Wilkinson, Leland, SYSTAT: The System for Statistics, SYSTAT, Inc., Evanston, Ill., 1984.
CHAPTER IV
ANALYSIS OF DATA
The results of this study of specified multiple com-
parison procedure error rates are presented in two major
sections. The first section deals with the data directly
related to the two hypotheses stated in Chapter I. The
second section presents analysis of related data.
Hypotheses
The first hypothesis of this study was that there would
be no difference in the ranking of error rates found by
Carmer and Swanson (1973) using large k's and equal n's and
the ranking obtained in this study using small k's and un-
equal n's. The procedure for testing this hypothesis was to
compare the Type I error rates generated in this study to
those computed in the Carmer and Swanson study. If the
results of this comparison demonstrated that the level of
error rate followed the ranking of LSD > FLSD > MRT > SNK >
HSD > SSD, then hypothesis one was to be accepted. The em-
pirical data for experimentwise and comparisonwise Type I
error rates produced in unequal n cases for protected and
unprotected conditions are summarized in Table XXI.
TABLE XXI
EXPERIMENTWISE ERROR RATES FOR MULTIPLE COMPARISON PROCEDURES AVERAGED ACROSS
UNEQUAL N'S FOR K = 3 TO 6
Procedure        3        4        5        6
LSD            12.4%    19.3%    29.7%    35.2%
FLSDa           5.3      4.6      4.8      4.7
MRT             9.7     12.3     18.3     19.9
MRTa            5.1      4.4      4.7      4.5
SNK             5.1      4.7      4.8      4.3
SNKa            4.7      3.7      3.6      3.0
HSD-SS          3.0      2.6      3.0      2.6
HSD-SSa         2.8      2.4      2.3      2.2
HSD-TK          5.6      5.0      5.2      4.9
HSD-TKa         5.0      3.9      3.9      3.4
SSD             3.8      2.4      1.5      1.2
SSDa            3.8      2.4      1.5      1.2
Congruency      No b     No c     No c,d   No c
Congruencya     No       No       No       No

a Significant experiments. b HSD-SS < SSD and HSD-TK > SNK.
c HSD-TK > SNK. d HSD-SS = SSD.
In summary, the rankings of error rates for small k's and un-
equal n's were congruent with the rankings given by Carmer
and Swanson with the following exceptions: HSD-SS < SSD and
HSD-SSa < SSDa (k=3); HSD-SSa < SSDa (k=4); and SNK < HSD-TK
(all k). Table XXI shows that 37 of the 48 k,J rankings, or
77 per cent, were in the proper order.
It was shown in Chapter II that experimentwise and
comparisonwise Type I error rates are related in a fixed
manner by the equation
1 - α_ew = (1 - α_pc)^c        Eq. 34
where c is the number of comparisons to be made. It follows,
then, that the rankings of comparisonwise Type I error rates
should prove congruent with experimentwise Type I error
rates. The empirical data for comparisonwise Type I error
rates for unequal n cases are summarized in Table XXII. In
each case, with minor exceptions, the ranking of the Carmer
and Swanson studies was replicated.
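A quick numeric check of equation 34 (Python; treating the c comparisons as independent, which is the idealized case the equation assumes):

```python
def experimentwise_rate(alpha_pc, c):
    """Experimentwise rate implied by a per-comparison rate over c tests."""
    return 1.0 - (1.0 - alpha_pc) ** c

# k = 3 means give c = k(k-1)/2 = 3 pairwise comparisons
print(round(experimentwise_rate(0.05, 3), 3))   # 0.143
# k = 6 means give c = 15 comparisons
print(round(experimentwise_rate(0.05, 15), 3))  # 0.537
```

These figures echo the empirical pattern in Table XXI, where the unprotected LSD's experimentwise rate grows from roughly 12 per cent at k=3 to 35 per cent at k=6.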
In summary, the rankings of comparisonwise Type I error
rates for small k's and unequal n's were congruent with the
rankings of error rates given by Carmer and Swanson with the
exception of the following relationships: HSD-SS < SSD and
HSD-SS* < SSD* (k=3); HSD-SS* < SSD* (k=4); and SNK < HSD-TK
(k=6). Table XXII shows 44 of 48 of the k,J rankings, or 92
per cent, were in the proper order. When the overly conser-
vative HSD-SS procedure is excluded, 47 of 48, or 98 per
cent, fall in the proper order. Summing the counts from both
experimentwise and comparisonwise error rates, there are 81
of 96 of the k,J combinations, or 84 per cent, that fall in
the proper ranking. When the overly conservative HSD-SS is
excluded, there are 71 of 80, or 89 per cent, in the proper
rankings. Therefore, hypothesis one is retained for small
k's and unequal n's with the exceptions of the conservatism
of HSD-SS and liberalism of HSD-TK as they relate to the
equal n HSD procedure.
TABLE XXII
COMPARISONWISE ERROR RATES FOR MULTIPLE COMPARISON PROCEDURES AVERAGED ACROSS
UNEQUAL N'S FOR K = 3 TO 6
Procedure        3        4        5        6
LSD            5.25%    4.80%    5.30%    5.02%
FLSD*          2.67     1.70     1.48     1.31
MRT            4.34     3.13     3.14     2.63
MRT*           2.63     1.57     1.28     0.99
SNK            2.42     1.26     0.75     0.43
SNK*           2.24     1.09     0.62     0.43
HSD-SS         1.08     0.53     0.40     0.24
HSD-SS*        1.00     0.49     0.33     0.19
HSD-TK         2.17     1.11     0.69     0.46
HSD-TK*        1.97     0.93     0.56     0.36
SSD            1.43     0.50     0.22     0.10
SSD*           1.73     0.50     0.22     0.10
Congruency     No**     Yes      Yes      No***
Congruency*    No**     No**     Yes      Yes

* Significant experiments. ** HSD-SS < SSD. *** HSD-TK > SNK.
The rankings of experimentwise Type I error rates for
the equal n cases were congruent with the findings of Carmer
and Swanson with the exceptions of the relationships of SNK =
HSD for all k,J and FLSD* = MRT* (k=6). Table XXIII shows
that 29 of 40 k,J rankings, or 73 per cent, were in the
proper order.
TABLE XXIII
EXPERIMENTWISE ERROR RATES FOR MULTIPLE COMPARISON PROCEDURES AVERAGED ACROSS
EQUAL N'S FOR K = 3 TO 6
Procedure        3        4        5        6
LSD           11.96%   19.18%   27.56%   35.68%
FLSDa          5.16     5.14     4.86     5.20
MRT            9.70    13.56    18.68    22.48
MRTa           5.12     5.14     4.86     5.20
SNK            5.12     5.04     4.64     5.00
SNKa           4.74     4.30     3.70     3.80
HSD            5.12     5.04     4.64     5.00
HSDa           4.74     4.30     3.70     3.80
SSD            4.22     2.52     1.78     1.34
SSDa           4.22     2.52     1.78     1.34
Congruency     No b,c   No b,c   No b,c   No b,c
Congruencya    No b,c   No b,c   No b,c   No b,c

a Significant experiments. b FLSD = protected MRT.
c SNK = HSD.
If the ties are considered neutral, that is, as not refuting
the overall ranking order, then 40 of the 40 k,J rankings, or
100 per cent, are in proper order.
Table XXIV shows that the rankings of α_pc for the equal
n cases were congruent with the findings of Carmer and Swan-
son with the one exception of FLSD* = MRT* (k=3). Thirty-
nine of the 40 k,J rankings, or 98 per cent, are in the
proper order. If the ties are eliminated, then the ordering
is 100 per cent congruent.
TABLE XXIV
COMPARISONWISE ERROR RATES FOR MULTIPLE COMPARISON PROCEDURES AVERAGED ACROSS
EQUAL N'S FOR K = 3 TO 6
Procedure        3        4        5        6
LSD            5.11%    4.89%    5.05%    5.06%
FLSD*          2.62     2.05     1.62     1.52
MRT            4.25     3.61     3.28     3.12
MRT*           2.62     1.89     1.40     1.28
SNK            2.51     1.32     0.77     0.62
SNK*           2.39     1.20     0.68     0.54
HSD            2.07     1.05     0.63     …
HSD*           1.94     0.93     0.53     …
SSD            1.64     0.51     0.23     0.12
SSD*           1.64     0.51     0.23     0.12
Congruency     Yes      Yes      Yes      Yes
Congruency*    No**     Yes      Yes      Yes

* Significant experiments. ** FLSD* = MRT*.
The second hypothesis of this study was that there would
be no statistically significant difference in experimentwise
Type I error rate between the HSD and FLSD procedures when
using the Bernhardson formulas. The procedure for testing
hypothesis two was to test the difference between error rates
computed from the Bernhardson formulas for HSD and FLSD using
the z-test for difference between proportions (1, pp. 230-
232). Table XXV, located in Appendix I, shows the results of
these tests. Of the 36 comparisons, 21 were not signifi-
cantly different. That is, the FLSD and HSD procedures
produced comparable experimentwise Type I error rates. In
every case where a significant difference was found, the
difference was due to the conservatism of the HSD, not the
liberalism of the FLSD. The significant differences were due
to the HSD error rates falling below nominal a. Eight of the
15 significant differences were produced by the HSD-SS which
consistently yielded excessively conservative error rates.
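The z-test for the difference between two proportions used here can be sketched as follows (a Python illustration of the standard pooled-proportion form; whether it matches the Hinkle, Wiersma, and Jurs formulation in every detail is an assumption, and the error counts are illustrative):

```python
from math import sqrt

def z_two_proportions(x1, n1, x2, n2):
    """z-statistic for H0: p1 = p2, using the pooled proportion."""
    p1, p2 = x1 / n1, x2 / n2
    p = (x1 + x2) / (n1 + n2)          # pooled estimate under H0
    se = sqrt(p * (1 - p) * (1 / n1 + 1 / n2))
    return (p1 - p2) / se

# e.g. 50 vs 30 experimentwise errors out of 1000 repetitions each
z = z_two_proportions(50, 1000, 30, 1000)
print(round(z, 3))  # 2.282, beyond the ±1.96 criterion at alpha = 0.05
```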
The comparisons in Table XXV were made between the FLSD
and the HSD under the "Significant F-ratio" condition
produced by the Bernhardson formulas. Carmer and Swanson
applied the HSD without regard to the significance of the F-
ratio. Table XXVI (Appendix I) shows the comparison between
FLSD and the "unprotected HSD." Of the 36 comparisons, 30
were not significantly different. That is, the FLSD and HSD
procedures produced comparable experimentwise Type I error
rates when the HSD was applied without regard to the sig-
nificance of the F-ratio. All six of the significant dif-
ferences were produced by conservatism of HSD-SS, not the
liberalism of the FLSD.
Therefore, hypothesis two is accepted for 51 of 72
comparisons, or 71 per cent. When the consistently conserva-
tive HSD-SS procedure is excluded from the analysis, hypothe-
sis two is accepted for 49 of 56 comparisons, or 88 per cent.
Related Findings
There are several related findings resulting from
analysis of the data which fall outside the scope of the
hypotheses but which have direct bearing on the study -of
multiple comparison procedures. These are described in this
section.
1. A confidence interval (0.95) was computed to deter-
mine which procedures produced experimentwise error rates
significantly different from 0.05 (1, p. 230). The limits of
this confidence interval were 0.0635 and 0.0365. Using the
Bernhardson formulas for experimentwise Type I error rate, no
procedure produced an error rate in excess of 0.0635. The
FLSD and protected MRT procedures yielded consistent nominal
error rates under all k,J combinations. The protected SNK
procedure produced error rates below 0.0365 in nine k,J
combinations: k=4, and 7; k=5, J=3, 4, 6 and 7; and k=6,
J=5, 6 and 7. The protected HSD-TK produced error rates
below 0.0365 in only two k,J combinations: k=6, J=6 and 7.
The protected HSD-SS procedure produced error rates below
0.0365 in all unequal k,J combinations. The SSD produced
error rates below 0.0365 in all k,J combinations above k=3,
J=6. Figures 7 to 10 (See Appendix J) graphically display
the error rate relationships among the multiple comparison
procedures selected for this study. Figure 7 is a graphic
presentation of experimentwise Type I error rates in relation
to 0.95 confidence interval for 0.05 level of significance
generated by application of Bernhardson formulas for k=3 and
k=4. Figure 8 depicts experimentwise Type I error rates in
relation to 0.95 confidence interval for a =0.05 and N=1000
generated by application of Bernhardson formulas for k=5 and
k=6. Figure 9 depicts experimentwise Type I error rates in
relation to 0.95 confidence interval for a =0.05 and N=1000
generated without prior significant F-ratio for k=3 and k=4.
Figure 10 depicts experimentwise Type I error rates in rela-
tion to 0.95 confidence interval for a =0.05 and N=1000
generated without prior significant F-ratio for k=5 and k=6.
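The confidence limits 0.0365 and 0.0635 quoted above follow from the normal approximation for a proportion of 0.05 over N = 1000 repetitions, which is easy to verify (a Python sketch):

```python
from math import sqrt

def proportion_ci(p, n, z=1.96):
    """Normal-approximation confidence interval for a proportion."""
    half_width = z * sqrt(p * (1 - p) / n)
    return p - half_width, p + half_width

low, high = proportion_ci(0.05, 1000)
print(round(low, 4), round(high, 4))  # 0.0365 0.0635
```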
2. The HSD-TK modification produced error rates much
closer to nominal a and the equal n HSD than the HSD-SS
procedure for both unequal n conditions. The HSD-TK produced
slightly higher error rates than the SNK for the unequal n
conditions. The HSD-SS produced error rates closer to SSD
than HSD levels, and at times yielded a more conservative
test than SSD.
3. The SNK and HSD procedures produced identical ex-
perimentwise Type I error rates for all equal n cases.
4-. The FLSD and protected MRT procedures produced the
same experimentwise Type I error rates in 21 of the 28 k,J
combinations. Error rates of the protected MRT were slightly
lower in the following combinations: k=3, J=4 and J=7; k=4,
J=7; k=5, J=7; and k=6, J=6 and J=7.
5. The unprotected LSD and MRT produced experimentwise
Type I error rates far in excess of nominal α for all k,J.
6. The unprotected LSD was the only procedure to produce
a comparisonwise Type I error rate equal to nominal a. All
other procedures produced comparisonwise Type I error rates
below α. Use of the FLSD decreased α_pc below α, but not as
low as the HSD α_pc.
CHAPTER BIBLIOGRAPHY
1. Hinkle, Dennis E.; Wiersma, William and Jurs, Stephen G. Basic Behavioral Statistics. Boston, Houghton Mif-' flin Company, 1982.
CHAPTER V
CONCLUSIONS, RECOMMENDATIONS AND SUGGESTIONS FOR FURTHER RESEARCH
The driving force behind this study has been the desire
to work through the statistical jargon and conflicting as-
sumptions in the literature on multiple comparison procedures
to learn first-hand when and how they should be applied. The
problem of which procedure to use has proven to be related as
much to philosophical perspective as it is to mathematical
precision. It has been shown that each assertion for or
against a given procedure has been accompanied by theoretical
or empirical data analysis. The bewildering array of argu-
ments pro and con can leave the researcher, interested in his
subject more than the subtleties of statistical theory, with
little assurance that the procedure he has chosen will serve
his purposes better than those he did not choose. The recom-
mendations that follow come from both the synthesis of the
literature and the empirical analysis of data generated for
this study.
Conclusions
It is clear from the empirical data that if one's inter-
est is truly in testing paired comparisons in a parsimonious
design, the Least Significant Difference is the procedure of
choice. The unprotected LSD was the only procedure to yield
a comparisonwise Type I error rate at nominal α. This
parallels Carmer's most recent recommendation for the LSD
(3). The use of the FLSD reduces the comparisonwise Type I
error rate, but consequently reduces the power of the test as
well. Carmer suggests using the FLSD when the possibility is
good that all group means in an experiment are equal (4).
Otherwise, if the researcher has good reason to believe that
all means are not equal, the preliminary F-test is unneces-
sary. The smaller k's and unequal n's prevalent in educa-
tional research designs — due to sampling difficulties and
mortality of human subjects — suggest the benefit of at
least minimal protection afforded by the preliminary F-test.
It is clear from the empirical data that the FLSD
procedure does indeed hold the experimentwise Type I error
rate to nominal α for the complete null hypothesis. Com-
parison of the FLSD and HSD procedures reveals the FLSD
consistently yielded experimentwise error rates closer to
nominal α than the HSD, with a larger comparisonwise error
rate for every k,J combination. That is, the FLSD is more
sensitive to sample mean differences than the HSD while
protecting against experimentwise error.
Reviewing the writings of R. A. Fisher revealed the
conservatism of the HSD procedure. Given an experiment with
10 means (45 comparisons), he suggested that testing the one
pair with the largest difference out of the 45 should be made
against a probability of "1 in 900" rather than "1 in 20" (6,
p. 66). It was shown that this "1 in 900" is the same
probability, 0.0011, that the HSD would use to test all 44
remaining pairwise comparisons. If the interest of the
researcher is paired comparisons, the HSD becomes less at-
tractive as the number of means in his experiment increases.
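Fisher's figures can be verified directly. With ten means there are 10·9/2 = 45 pairwise comparisons, and spreading the nominal 0.05 across all of them gives 0.05/45 ≈ 0.0011, roughly "1 in 900." A minimal check of this arithmetic, sketched in Python rather than the study's own BASIC:

```python
# Check of Fisher's "1 in 900": with k = 10 means, the largest of the
# k(k-1)/2 = 45 pairwise differences is tested at about 0.05/45.
k = 10
n_comparisons = k * (k - 1) // 2       # 45 pairwise comparisons
per_comparison_alpha = 0.05 / n_comparisons

print(n_comparisons)                   # 45
print(round(per_comparison_alpha, 4))  # 0.0011
print(round(1 / 900, 4))               # 0.0011
```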
Einot and Gabriel criticized the Carmer and Swanson
findings as "simple consequences of error rates defined by
the procedures, rather than the techniques themselves" (5, p.
577). Their solution was to set all of the procedures to the
same experimentwise error rate and compare their perfor-
mance. Since the type of error rate defined by a procedure
is an integral part of the procedure itself, it seems Einot
and Gabriel violated the procedures by forcing one kind of
error rate on all. Their suggestion that power of the HSD
can be raised by increasing α from 0.05 to 0.25 begs the
question. Why strongly defend an experimentwise procedure,
in order to protect against excessive experimentwise error,
and then recommend raising the level of significance? Games
criticizes the Petrinovich and Hardyck recommendation on the
same grounds: "when one specifies a conservative test, and
then says that if this test is not significant, he will use a
more liberal test, he is merely adding confusion and incon-
sistency to his decision rule" (7, p. 100).
The SNK and MRT error rates fell between the FLSD and
HSD rates where they were expected. The SNK experimentwise
rate equalled the HSD rate for all k under the equal n
conditions. Analysis of data drawn from both significant and
non-significant experiment cycles showed that a pairwise
difference in a given experiment large enough to be detected
by SNK was also large enough to be detected by HSD. This is
because both procedures test the largest difference by the
same critical difference. Secondary significant differences
did not alter the experimentwise rate — one Type I error per
experiment is all that is considered. This finding opposes
Ramsey's objection to the SNK for its "excessive experiment-
wise error rate" (14, p. 482). Perhaps this is due to the
complete null condition used in this study. Petrinovich and
Hardyck report that the error rates for the SNK are similar
to the MRT — too high — "for all conditions save the com-
plete null hypothesis" (13, p. 53). At any rate, the SNK and
MRT appear to fall somewhere between pure comparisonwise and
experimentwise error levels. Carmer and Walker criticize
both SNK and MRT for this very reason. It is better, in
their opinion, to choose an error rate which can best answer
the questions of a given study, and then apply a procedure
defined by that error rate (3, p. 21).
Of the two unequal n modifications of HSD, the Tukey-
Kramer test fared much better than the Spjøtvoll-Stoline. In
some cases the HSD-SS was more conservative than the
Scheffe. The conservatism of the HSD-SS was not due to its
being a "better test," but rather due to the way it handled
the unequal n situation. Use of the harmonic mean of the n's
from the two samples being tested yielded uniformly better
results than use of the minimum n. The HSD-TK consistently
yielded error rates within the 0.95 confidence interval.
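The contrast between the two modifications lies in their error terms. The sketch below (Python; the MSW and sample sizes are illustrative values, not output from this study) compares the Tukey-Kramer error term, built on the average of the reciprocal sample sizes (equivalently, on the harmonic mean of the two n's), with a minimum-n error term of the kind used by the HSD-SS. The quantiles the two tests consult are a separate matter not shown here:

```python
import math

msw = 100.0         # illustrative within-groups mean square (assumed value)
n_i, n_j = 10, 40   # illustrative unequal sample sizes (assumed values)

# Tukey-Kramer error term: based on the mean of 1/n_i and 1/n_j,
# i.e., on the harmonic mean of the two sample sizes
se_tk = math.sqrt((msw / 2.0) * (1.0 / n_i + 1.0 / n_j))

# Minimum-n error term, as in the Spjotvoll-Stoline modification
se_min = math.sqrt(msw / min(n_i, n_j))

print(se_tk)             # 2.5
print(round(se_min, 3))  # 3.162  (larger, hence a more conservative test)
```

With badly unbalanced n's the minimum-n term is markedly larger, which is the arithmetic behind the HSD-SS sometimes falling below even the Scheffé rates.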
The Scheffe Significant Difference demonstrated its
severe limitations for use with pairwise comparisons. For
all k > 3, the SSD yielded error rates below the lower limit
of the 0.95 confidence interval.
The Bernhardson formulas proved essential for control-
ling the experimentwise error rates of the LSD and MRT
procedures. Application of these formulas yielded experi-
mentwise error rates not significantly different from
nominal α. The Bernhardson formulas were not helpful for the
SNK and HSD procedures. Under some conditions, the preceding
significant F-test pushed the HSD below the lower limit of
the 0.95 confidence interval. Use of the unprotected HSD
resulted in fewer significant departures from nominal α. The
Bernhardson formulas had no effect on the SSD.
The "perpetual dilemma" (7, p. 99; 10) of increasing one
type of error by reducing the other points up the limitation
of statistical tests. Simultaneously reducing both types of
error is achieved only by improving the research design
itself. This includes increasing the number of subjects,
using more precise tools of measurement, or using research
designs that better partition error (2, p. 95; 9, p. 1375),
such as in the use of homogeneous blocks (8).
Recommendations
Is the researcher's concern focused on each pair of
means, or with the entire family of all pairs? If his
concern is with each pair, then the use of the
unprotected LSD is most appropriate. If his concern is with
the entire experiment, then the unprotected HSD is most
appropriate.
Is the researcher reasonably sure that his treatment
means are different? If he is, then the preliminary F-test is
unnecessary and restricts the ability of the test to detect
pairwise differences. Use of the unprotected LSD is most
appropriate. If he is unsure about the equality of treatment
means — a common occurrence with the smaller k's found in
educational research — then the preliminary F-test can be
applied. Use of the FLSD is most appropriate.
What is the anticipated cost of implementing the
findings of the study? Will the incorrect rejection of the
null hypothesis (Type I error) result in large commitments of
time and financial resources? If so, it is recommended to
protect against this possibility by using the HSD or the
FLSD. Will the incorrect retention of the null hypothesis
(Type II error) result in inefficient programs or methods
being continued while better procedures are not explored? If
so, it is recommended to protect against this possibility by
using the LSD if group means are known a priori to be dif-
ferent, or the FLSD if they are not.
How important is the magnitude of the difference being
sought? If the researcher is seeking only those differences
large enough to engender economically feasible changes which
will produce practical improvements, then the conservative
HSD or SSD can be applied. If, however, the researcher is
exploring a broad area of interest and seeks those dif-
ferences, however small, that are at least statistically
different, if not practically different, the FLSD or LSD
procedures can be applied. Differences found through this
approach can be focused on in further studies.
Suggestions for Further Study
There are several questions that have been suggested by
this study that could be investigated further. These are
discussed in this section.
This study was limited to the complete null hypothesis.
Its findings are limited to situations where one is testing
group means under the complete null condition. Bernhardson
also tested the complete null situation (1). Ryan raises the
question of "partial" null hypotheses in which, for example,
nine means are equal and the tenth is much larger (15, p.
354). Even Carmer and Swanson reported an experimentwise
Type I error rate as high as 45.5 per cent for the FLSD under
the partial null condition (2, Table 4, p. 70). Ryan
criticizes not only Carmer and Swanson, but also Keselman,
Games, and Rogan (1979, 1980) for "perpetuating the same
misleading recommendations by considering only complete
nulls" (15, p. 355). Related studies could be done to inves-
tigate the error rates and power for partial null conditions.
This study used two patterns of unequal n's related to
the recommendations of Kirk for using the Tukey-Kramer and
Spjøtvoll-Stoline modifications of the HSD. Related studies
could be done to investigate other patterns of unequal n's
with regard to specific kinds of educational research.
This study simulated the random drawing of scores from a
single population with mean of 100 and standard deviation of
10. A related study could be done to investigate the effects
of differing population variances on error rate and power of
the multiple comparison procedures using the Games-Howell and
Tamhane modifications of the HSD (11, pp. 120-121).
In the most recent publication of Carmer and Walker, the
suggestion was made that the study of differing rates of
application of pesticide, for example, would be better
analyzed with trend analysis through multiple regression than
by way of multiple comparisons (3, p. 5). Testing the sig-
nificance of differences between regression coefficients,
using effect coding, is tantamount to testing the differences
between means (12, p. 299). With the advent of computer
packages such as SPSS and SAS, which permit application of
multiple regression techniques with relative ease, a study
could be done investigating the differences between trend
analysis and pairwise comparisons. It may well be that, as
multiple regression grows in usage, and the pendulum swings
away from "all or nothing" hypothesis testing, the use of
multiple comparison procedures may decline. Certainly the
literature has shown movement away from this area since the
mid 1970's. Whether this happens or not, this study of
multiple comparison procedures was an excursion through the
Wonderful World of Statistical Theory which has both
broadened and deepened my understanding of and appreciation
for the logic and technical vocabulary of the field.
CHAPTER BIBLIOGRAPHY
1. Bernhardson, Clemens S., "375: Type I Error Rates When Multiple Comparison Procedures Follow a Significant F Test of ANOVA," Biometrics, XXXI (March 1975), pp. 229-232.
2. Carmer, S. G., "Optimal Significant Levels for Application of the Least Significant Difference in Crop Performance Trials," Crop Science, XVI (January-February 1976), pp. 95-99.
3. ________ and Walker, W. M., "Pairwise Multiple Comparisons Procedures for Treatment Means," Technical Report Number 12, University of Illinois, Department of Agronomy, Urbana, Illinois (December 1983), pp. 1-33.
4. ________, Professor of Biometry, University of Illinois, Urbana, Illinois, Personal letter received January 14, 1985.
5. Einot, Israel and Gabriel, K. R., "A Study of Powers of Several Methods of Multiple Comparisons," Journal of the American Statistical Association, LXX (1975), pp. 574-583.
6. Fisher, R. A., Statistical Methods for Research Workers, 6th ed., Edinburgh, Oliver and Boyd, 1936.
7. Games, Paul, "Inverse Relation Between the Risks of Type I and Type II Errors and Suggestions for the Unequal n Case in Multiple Comparisons," Psychological Bulletin, LXXV (1971), pp. 97-102.
8. Gill, J. L., "Evolution of Statistical Design and Analysis of Experiments," Journal of Dairy Science, LXIV (June 1981), pp. 1494-1519.
9. Kemp, K. E., "Multiple Comparisons: Comparisonwise and Experimentwise Type I Error Rates and Their Relationship to Power," Journal of Dairy Science, LVIII (September 1975), pp. 1372-1378.
10. Keselman, H. J.; Games, Paul; and Rogan, Joanne C., "Protecting the Overall Rate of Type I Errors for Pairwise Comparisons With an Omnibus Test Statistic," Psychological Bulletin, LXXXVI (July 1979), pp. 884-888.
11. Kirk, Roger E., Professor of Psychology, Baylor University, Waco, Texas, Personal letter received January 22, 1985.
12. Pedhazur, Elazar J., Multiple Regression in Behavioral Research: Explanation and Prediction, 2nd ed., New York, Holt, Rinehart and Winston, Inc., 1982.
13. Petrinovich, Lewis F. and Hardyck, Curtis D., "Error Rates for Multiple Comparison Methods: Some Evidence Concerning the Frequency of Erroneous Conclusions," Psychological Bulletin, LXXI (1969), pp. 43-54.
14. Ramsey, Philip H., "Power Differences Between Pairwise Multiple Comparisons," Journal of the American Statistical Association, LXXIII (1978), p. 479.
15. Ryan, T. A., "Comment on 'Protecting the Overall Rate of Type I Errors for Pairwise Comparisons With an Omnibus Test Statistic,'" Psychological Bulletin, LXXXVIII (September 1980), pp. 354-355.
APPENDIX A
A GOODNESS OF FIT CHI-SQUARE TEST OF RANDOM NUMBER NORMALITY
This program tested the ability of a microcomputer to generate normally distributed random numbers. One thousand random numbers were categorized into ten classes. This distribution was tested against the expected frequencies of normal numbers calculated from percentage values in a normal curve table.
Following the program listing are selected printouts showing the relationship between the theoretical normal curve and the distribution of generated data ("*") for U = 12, 16, 19, 20, 21, 24, and 28.
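The generator under test forms each score as X = (A - U/2) * SIGMA + MU, where A is the sum of U uniform random numbers. A compact Python restatement of the idea is given below (the appendix program itself is in BASIC); note that the sum of U Uniform(0,1) draws has variance U/12, so the centered sum has unit variance exactly at U = 12:

```python
import random
import statistics

def clt_normal(u=12, mu=100.0, sigma=10.0):
    """One approximately normal deviate, mirroring the BASIC routine
    X = (A - U/2) * SIGMA + MU, where A sums u uniform random numbers.
    The centered sum has variance u/12, so this scaling yields a
    standard deviation of sigma only when u == 12."""
    a = sum(random.random() for _ in range(u))
    return (a - u / 2.0) * sigma + mu

random.seed(12)
scores = [clt_normal() for _ in range(10000)]
print(round(statistics.fmean(scores), 1))   # close to 100.0
print(round(statistics.pstdev(scores), 1))  # close to 10.0
```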
PROGRAM LISTING:
1110 ' ***** INITIATE *****
1120 CLS : CLEAR : COLOR 4,0,0,0
1130 DEFINT J,K,N
1140 U=11                       ' NUM OF UNIFORM RND #
1150 N=1000                     ' NUM OF REPETITIONS
1160 MU=100                     ' POPULATION MEAN
1170 SIGMA=10                   ' POPULATION SDEV
1180 CNT=10                     ' # REPETITIONS
1190 DIM C(10),PER(10),E(10),D(10),D2(10),DE(10),T(10),AVG(10)
1200 '
1210 GOSUB 2030                 'SET UP PRINTER
1220 '
1230 U=U+1 : IF U=31 THEN END
1250 T1=VAL(MID$(TIME$,4,2))*60+VAL(RIGHT$(TIME$,2))
1260 RANDOMIZE T1
1270 ' ***** BEGIN RND NUMBER LOOP *****
1280 CLS
1290 FOR REP=1 TO 10
1300 PRINT USING "NOW ON REP ##  ";REP;
1310 FOR J=1 TO N
1320 FOR K=1 TO U
1330 A=A+RND                    ' NID RND FROM U UNIFORM RND
1340 NEXT K
1350 X=(A-(U/2))*SIGMA+MU
1360 A=0
1400 ' ***** CATEGORIZE SCORES *****
1410 '
1430 IF X<=60 THEN C(1)=C(1)+l: T(1)=T(1)+1 :GOTO 1620 1450 IF X<70 THEN C(2)=C(2)+1: T(2)=T(2)+1 :G0T0 1620 1470 IF X<80 THEN C(3)=C(3)+1: T(3)=T(3)+1 :G0T0 1620 1490 IF X<90 THEN C(4)=C(4)+1: T(4)=T(4)+1 :GOTO 1620 1510 IF X<100 THEN C(5)=C(5)+1: T(5)=T(5)+1 :GOTO 1620 1530 IF X<110 THEN C(6)=C(6)+1: T(6)=T(6)+1 :G0T0 1620 1550 IF X<120 THEN C(7)=C(7)+1: T(7)=T(7)+1 :G0T0 1620 1570 IF X<130 THEN C(8)=C(8)+1: T(8)=T(8)+1 :GOTO 1620 1590 IF X<140 THEN C(9)=C(9)+1: T(9)=T(9)+1 :GOTO 1620 1610 IF X=>140 THEN C(10)=C(10)+1:T(10)=T(10)+1 1620 NEXT J 1630 * 1 6 4 0 ' ****** GOODNESS OF FIT TEST ****** 1650 ' 1660 FOR 1=1 TO 10 1670 READ PER(I)
1680 E(I)=N*PER(I)              'CALC EXPECTED FREQUENCIES
1690 NEXT I
1700 DATA .0013,.0109,.0546,.1598,.2734,.2734,.1598,.0546,.0109,.0013
1720 RESTORE
1730 ' 1740 FOR 1=1 TO 10 1750 D(I)=C(I)-E(I) 1760 D2(I)=D(I)*D(I) 1770 DE(I)=D2(I)/E(I) 1780 CHI=CHI+DE(I) 1790 NEXT I
1810 SUMCHI=SUMCHI+CHI : PRINT USING "CHI(##) = ####.###";REP,CHI
1820 FOR 11=1 TO 10 :C(II)=0 :NEXT 1830 NEXT REP
1840 AVGCHI=SUMCHI/10 : PRINT USING "AVGCHI = ####.###";AVGCHI
1850 BEND=AVGCHI
1860 FOR 1=1 TO 10 1870 AVG(I)=T(I)/CNT 1880 T(I)=0 1890 NEXT I 1900 ' i9io ; ***** P R I N T OUTPUT ***** ±7 20
1930 G0SUB 2300 1940 ' 1950 ' **** L 0 0 p B A C K 1960 1970 LPRINT CHR$(12); : GOTO 1230 1980 '
1990 ' S E T UP PRINTER ROUTINE
2020 ' =========================================== 2030 LPRINT CHR$(15); 'COMPRESSED PRINT
2040 LPRINT CHR$(27);CHR$(9);CHR$(20); 'MOVE TO COL 25
2050 LPRINT CHR$(27);CHR$(57); 'SET LEFT MARGIN 2060 WIDTH LPRINT 132 2070 SCALE = 2 2080 F0$="## + ###.# \"+STRING$(85," ")+"\" 2090 FE$=" | \"+STRING$(85," ")+"\"
2100 H1$="CHI-SQUARE TEST FOR NORMALITY OF RANDOM NUMBERS" 2110 H2$="TEN REPETITIONS" 2120 H3$="================-===============______________:_!! 2130 HH$="N = 1000 U = ##"
2140 Fl$=" -1- - 2- -3- -4- -5- -6- -7- -8- -9- 10 " 2160 H4$=" I
2180 lil'l ttl'l lit-*, m • " m j m • * tlt'Tlttl" 2190 Fl$= Category: "I'll ###'# "*'* ###-# ###-#
2200 F2$=" Observed: "+F2$ 2210 F3$=" Expected: "+F3$ 2230 H4$=" "+H4$ 2240 RETURN 2250 ' 2260 '========================================:==============
2 2 7 0 ' PRINT ROUTINE 2 2 8 0 ' = = = = = = = = = = = = = = = = = = = = = = : = = = = = = = ; = = = = = = = = = = = = = = = = = = =
2290 ' =========
2300 LPRINT:LPRINT
2310 TX=45-(LEN(Hl$))/2
2320 LPRINT TAB(TX);H1$
2330 TX=45-(LEN(H2$))/2 2340 LPRINT TAB(TX);H2$ 2350 TX=(45-LEN(H3$)/2) 2360 LPRINT TAB(TX);H3$ 2370 TX=(45-LEN(HH$)/2) 2380 LPRINT TAB(TX) USING HH$;U 2390 LPRINT
2400 H5$="MEAN CHI-SQUARE =####.###" 2410 TX=(45-(LEN(H5$)/2))
2420 LPRINT TAB(TX) USING H5$;AVGCHI 2430 LPRINT
2440 LPRINT Fl$ : LPRINT H4$
g E ̂ 2560 FOR 1=1 TO 10 2570 EX=(E(I)/SCALE)-1: IF EX<0 THEN EX=0 2580 0BS=(AVG(I)/SCALE) 2590 IF EX=0 THEN E$="" . G0T0 26?n 2600 IF EX<1 THEN E$=":" . G0T0 2620 2610 E$=STRING$(EX," * 2620 IF 0BS=0 THEN 0$="" . G0T0 2650 2630 IF 0BS<1 THEN 0$="<" ! GOTO 2fisn 2640 0$=STRING$(0BS,"*") * 2650 LPRINT USING F0$;I,AVG(I),0$ 2660 LPRINT USING FE$;E$
2670 NEXT I 2680 '
2690 FOR I = 1 TO 10 : AVG(I)=0 : NEXT 2700 T0T=0 : CHI=0 : SUMCHI=0 : AVGCHI=0 2740 RETURN 2750 ' 2760 '========================== END =====
CHI-SQUARE TEST FOR NORMALITY OF RANDOM NUMBERS
TEN REPETITIONS OF N=1000 SCORES
UNIFORM RANDOM NUMBERS (U) = 12
MEAN CHI-SQUARE = 100.062

Category:  -1-    -2-    -3-    -4-    -5-    -6-    -7-    -8-    -9-   -10-
Observed:  0.1    0.9   19.5  143.9  335.9  338.5  138.1   22.5    0.6    0.0
Expected:  1.3   10.9   54.6  159.8  273.4  273.4  159.8   54.6   10.9    1.3

[histogram of observed (*) against expected normal frequencies omitted]
CHI-SQUARE TEST FOR NORMALITY OF RANDOM NUMBERS
TEN REPETITIONS OF N=1000 SCORES
UNIFORM RANDOM NUMBERS (U) = 16
MEAN CHI-SQUARE = 39.384

Category:  -1-    -2-    -3-    -4-    -5-    -6-    -7-    -8-    -9-   -10-
Observed:  0.2    3.4   34.8  152.8  304.7  312.2  152.8   34.4    4.6    0.1
Expected:  1.3   10.9   54.6  159.8  273.4  273.4  159.8   54.6   10.9    1.3

[histogram of observed (*) against expected normal frequencies omitted]
CHI-SQUARE TEST FOR NORMALITY OF RANDOM NUMBERS
TEN REPETITIONS OF N=1000 SCORES
UNIFORM RANDOM NUMBERS (U) = 19
MEAN CHI-SQUARE = 12.643

Category:  -1-    -2-    -3-    -4-    -5-    -6-    -7-    -8-    -9-   -10-
Observed:  0.5    7.2   53.9  157.3  280.6  287.1  157.8   47.8    7.6    0.2
Expected:  1.3   10.9   54.6  159.8  273.4  273.4  159.8   54.6   10.9    1.3

[histogram of observed (*) against expected normal frequencies omitted]
CHI-SQUARE TEST FOR NORMALITY OF RANDOM NUMBERS
TEN REPETITIONS OF N=1000 SCORES
UNIFORM RANDOM NUMBERS (U) = 20
MEAN CHI-SQUARE = 3.678

Category:  -1-    -2-    -3-    -4-    -5-    -6-    -7-    -8-    -9-   -10-
Observed:  1.1    7.8   49.7  160.7  272.8  279.4  165.5   52.3    9.9    0.8
Expected:  1.3   10.9   54.6  159.8  273.4  273.4  159.8   54.6   10.9    1.3

[histogram of observed (*) against expected normal frequencies omitted]
CHI-SQUARE TEST FOR NORMALITY OF RANDOM NUMBERS
TEN REPETITIONS OF N=1000 SCORES
UNIFORM RANDOM NUMBERS (U) = 21
MEAN CHI-SQUARE = 3.720

Category:  -1-    -2-    -3-    -4-    -5-    -6-    -7-    -8-    -9-   -10-
Observed:  1.2    8.3   55.7  157.1  269.0  275.1  165.6   54.0   12.3    1.7
Expected:  1.3   10.9   54.6  159.8  273.4  273.4  159.8   54.6   10.9    1.3

[histogram of observed (*) against expected normal frequencies omitted]
CHI-SQUARE TEST FOR NORMALITY OF RANDOM NUMBERS
TEN REPETITIONS OF N=1000 SCORES
UNIFORM RANDOM NUMBERS (U) = 24
MEAN CHI-SQUARE = 20.424

Category:  -1-    -2-    -3-    -4-    -5-    -6-    -7-    -8-    -9-   -10-
Observed:  1.7   16.1   63.3  163.9  261.4  255.4  158.4   62.2   15.4    2.2
Expected:  1.3   10.9   54.6  159.8  273.4  273.4  159.8   54.6   10.9    1.3

[histogram of observed (*) against expected normal frequencies omitted]
CHI-SQUARE TEST FOR NORMALITY OF RANDOM NUMBERS
TEN REPETITIONS OF N=1000 SCORES
UNIFORM RANDOM NUMBERS (U) = 28
MEAN CHI-SQUARE = 62.

Category:  -1-    -2-    -3-    -4-    -5-    -6-    -7-    -8-    -9-   -10-
Observed:  3.8   19.6   75.0  160.3  248.9  241.2  152.8   71.4   23.2    3.8
Expected:  1.3   10.9   54.6  159.8  273.4  273.4  159.8   54.6   10.9    1.3

[histogram of observed (*) against expected normal frequencies omitted]
APPENDIX B
TWO SAMPLES OF DATA GENERATED BY THE MAIN PROGRAM
This appendix displays two samples of data generated by the main program. Both samples were produced with three groups [k=3] and five scores per group [J=1].

Significant F-Ratio

        Group 1   Group 2   Group 3
         103.16    117.60     67.99
          94.70    124.01    100.50
         105.17    108.79    104.71
          88.60    100.01     99.57
          80.07    116.15    103.83
  M:      94.34    113.31     95.32
  s:     10.391     9.194    15.431

SOURCE        SS   DF      MS      F     Fcv   Sig?
Between   1141.1    2  570.54  3.975   3.890    Yes
Within    1722.5   12  143.54
Total     2863.6   14

Non-Significant F-Ratio

        Group 1   Group 2   Group 3
          81.36    109.30    104.91
         105.66    105.82     82.38
          89.36     78.68     97.01
         103.99     93.44     85.25
          92.13    106.09    103.37
  M:      94.50     98.67     94.58
  s:     10.240    12.709    10.319

SOURCE        SS   DF      MS      F     Fcv   Sig?
Between     56.7    2   28.34  0.228   3.890     No
Within    1491.4   12  124.29
Total     1548.1   14
APPENDIX C
The Main BASIC Program Listing
1000 '
1010 'PROGRAM TO ANALYZE EXPERIMENTWISE AND COMPARISONWISE ERROR RATES
1020 '   FOR SIX SELECTED MULTIPLE COMPARISON PROCEDURES
1030 '   Ph.D. Dissertation in Educational Research, College of Education
1050 '   North Texas State University
1060 '   William R. Yount
1070 '   July 1985
1080 '
1090 '
1310 I " " " " " " " " " " " " " " " " " " " INITIALIZATION SUBROUTINE """""'"""""""mmmmmmmi
1320 'SET UP VARIABLES 1330 ' 1340 KEY OFF : CLS : CLEAR 1350 REP%=1000 1360 RANDOMIZE VAL(RIGHT$(TIME$,2)) 1370 FOR RX%=1 TO RND*100:NEXT 1380 DEFINT I-K 1390 DEFSTR 0
1400 N(1)=5 : N(2)=10 : N(3)=15 : N(4)=20 : N(5)=25
1450 M ">'NZZ ^ I N T ° U T. 0 F M D L T I P L E COMPARISON ERROR RATES * * *» 1 W ) UZ - GROUPS (k): # COMPUTATION TIME: \ \"
1460 1470 1480 1490 1500 1510 1520 1530 1540 1550 1560 1570 1580 1590 1600 1610 1620 1630 1640 1650 1660 1670 1680 1690 1700 1710 1720 1730 1740 1760 1770 1780 1790 1800 1810 1820 1830 1840 1850 1860 1870 1890 1900 1910 1920 1930 1940 1950 1960 1970 1980 1990 2000
03 05 06 07 08 09 010 011 012 013 014 015 016 017 018 019 020 021 022 023 024 025 026 027 028 029 030 031 032 OFMT 0TBL1= 0TBL2= 0TBL3= 0TBL4== ODIFN= ODIFM= ODIFL= ODIFH= ODIFS= ODIFHS ODIFHT
="SIZE (J): ft
#
r =" #: ti
F ##.###
Q ##.###
M ##.###
QT" ##.###"
"F-TESTS: ####" :"SIG F-TESTS: #### E: ## ="PERCENT SIG: #.### E: 0.050"
EXPERIMENTWISE
COMPARISONS: #####"
COMPARISONWISE"
T?
__ ft
="LSD ft
="FLSD _ f t
="MRT __ f T ="SNK _ _ t t
="SSD IT
="HSD
ALL ANOVAS SIG ANOVAS ALL ANOVAS
:"HSD-TK ft
"HSD-SS tf
"REP=### "SOURCE " BET " WITH " TOT "DIF=## "DIF=## "DIF=## "DIF=## "DIF=## ="DIF=## '="DIF=##
##### ##.##%
##### ##.##%
##### ##### ##.##% ##.##% ##### ##### ##.##% ##.##% ##### ##### ##.##% ##.##% ##### ##### ##.##% ##.##%
UNEQUAL ##### ##.##% ##### ##.##%
NCOMPS= #### DFB DF MS ## ##### .## ##
### ##### .##"
###"
##### ##.##% ##### ##.##%
K=# J=# SS
######.# ######.# #######.# ## NCV(##)=##.##
MCV(##)=##.## LCV=##.## » HCV=##.## " SCV=##.## » HCV-SS=##.##" HCV-TK=##.##"
#####" ##.##%"
##### ##.##% ##### ##.##% ##### ##.##% ##### ##.##%
N HSD ##### ##.##% ##### ##.##%
'=## DFW=### F FCV .### ##.###
SIG ANOVAS" ft
#####" ##.##%" #####" ##.##%" #####" ##.##%" #####" ##.##%" #####" ##.##%"
tt
#####" ##.##%" #####" ##.##%"
DFT=### NTOT=###" : SIG ICNT" : # ###"
.##
.##
.##
.##
.##
.##
J SET UP PRINTER AND DISK FOR OUTPUT
LOCATE 3,1 : INPUT "(P)RINTER OR (S)CREEN "-ZZ$ IF ZZ$="P" OR ZZ$="p" THEN 0P="LPT1:" : GOTO 1960 IF ZZ$="S" OR ZZ$="s" THEN 0P="SCRN:" BEEP : GOTO 1920 OPEN OP FOR OUTPUT AS #3 IF OP="SCRN:" THEN 2100 PRINT "READY PRINTER — THEN PRESS ANY KEY" A$=INKEY$ : IF A$="" THEN 1990
GOTO 1960
r'Skip printer setup
2010 LPRINT CHR$(27);"f";CHR$(2) 2020 LPRINT CHR$(27);"y"; 2030 LPRINT CHR$(27);"q"; 2040 LPRINT CHR$(27);CHR$(9);CHR$(12); 2050 LPRINT CHR$(27);CHR$(57) ; 2060 LPRINT CHR$(27);CHR$(9);CHR$(75); 2070 LPRINT CHR$(27);CHR$(58); 2080 WIDTH LPRINT 80 2090 LPRINT CHR$(27);CHR$(9);CHR$(15); 2100 GOTO 2590 2120 2130 2140 2150 2160 2170 2180 2190 2200 2210 2220 2230 2240 2250 2260 2270 2280 2290 2300 2310 2320 2330 2340 2350 2360 2370 2380 2390 2400 2410 2420 2430 2440 2450 2470 2480 2490 2500 2510 2520
font module #2 10 pitch quality print move to col 12 set l.m. at 12 move to col 75 set r.m. at 75
move to col 12 SET UP PARAMETERS
MCP TEST PRINTOUT SUBROUTINE
LPRINT TAB(3+XX*6);XX; : LPRINT TAB(3+XX*6);"===":
NEXT XX : NEXT XX
LPRINT LAB$ FOR 11=2 TO K FOR XX=2 TO K LPRINT FOR XX=K TO 2 STEP -1
LPRINT " :"+STRING$((K-XX+1)*2 " ")• FOR YY=K-1 TO 1 STEP -1
IF YY>=XX THEN LPRINT TAB(LP0S(0)+1);" "• • GOTO 2240 M„vrn
L P R I N T TAB(LP0S(0)+1);USING » \ \ ";FLAG$(XX,YY); NEXT YY LPRINT
NEXT XX LPRINT RETURN i
J DIFFERENCE CHART PRINTOUT SUBROUTINE
LPRINT :LPRINT :LPRINT "PAIRWISE DIFFERENCES:" • LPRINT LPRINT TAB(13);""; FOR II=K-1 TO 1 STEP -1
LPRINT USING " ###.## ";XBAR(II); NEXT II LPRINT LPRINT TAB(13);""; FOR 11=1 TO K-l
LPRINT " ====== NEXT II LPRINT FOR II=K TO 2 STEP -1
LPRINT USING " ###.## FOR JJ=K-1 TO 1 STEP -1
IF JJ>=II THEN LPRINT TAB(LP0S(0)+1);" "• • GOTO ?500 DIF=ABS(XBAR(JJ)-XBAR(II)) ' * 0 0 1 0 2 5 0 0
LPRINT TAB(LP0S(0)+1);USING "###.## ";DIF-NEXT JJ
LPRINT NEXT II
";XBAR(II);
2530 RETURN
:'Select 1 cycle to print
2560 PREPARE PARAMETERS FOR REPETITIONS """"""'"""""""tmmtf
2570 ' INCREMENT K,J 2580 ' 2600 BEEP:BEEP
2610 LOCATE 5,1 : INPUT "ENTER K(3,4,5,6) AND J(l,2,3 4 5 6 7) AS # # "*lf T 2620 LOCATE 6,1 : INPUT "PRINTOUT (Y) OR NO PRINTOUT (N) OF CALCS "'-wt 2630 IF OPTO'T' AND OPTO»N" THEN BEEP : GOTO 2620 ' 2640 SETUP=1 2650 GOTO 2880 2660 ' 2670 ' TOP OF INCREMENT CYCLE 2680 '
2690 K=K+1 : J=0 : IF K=7 THEN 8510      :'Goto program end
2700 J=J+1 : IF J=8 THEN 2690
2880 NCOMPS=REP%*(K*(K-1))/2              :'# of pairwise comparisons
2890 PRTOUT=0                             :'Set printer flag
2900 RANDOMIZE RX%                        :'Randomize again
2920 TIME$="00:00:00"                     :'Reset timer to 0
2930 PRINT "*** BEGIN ***» :'Scrn display of status 2940 PRINT USING "*=» K=# J=#"-K J 2950 PRINT ' ' 2960 RD%=RND*1000 2970 PRINT "REP #"RD%" WILL BE PRINTED" 2980 * 2990 ' ASSIGN SAMPLE SIZES 3000 * 3010 NT0T=0 3020 FOR 1=1 TO K 3030 IF J<>6 THEN 3060 30^0 IF 1=1 THEN NN(I)=10 ELSE NN(I)=NN(I-l)+5 3050 GOTO 3100 3060 IF J<>7 THEN 3090 3 0 7 0 I F 1=1 THEN NN(I)=80 ELSE NN(I)=20 3080 GOTO 3100 3090 NN(I)=N( J) 3100 NTOT=NTOT+NN(I) 3110 NEXT I 3120 ' 3130 DFB=K-1 3140 DFW=NT0T-K 3150 DFT=NT0T-1 3160 ' 3170 ' READ CRITICAL VALUES 3180 * 3190 OPEN "G:TABLE.RND" AS #2 LEN=15
3210 ™ L D #2'3 A S TAB$' 2 A S B$' 2 A S W$' 4 A S CV5$' 4 A S C V 1 $
3220 IF 0PT="Y" THEN LPRINT "CRITICAL VALUES-" 3230 'F TABLE 3240 FX=1 : IX=DFB : JX=DFW 3250 RECORD = (FX-1)*1000 + (IX-2)*200 + JX
124 3260 GET #2, RECORD 3270 FCV=CVS(CV5$) 3280 IF OPT="Y" THEN LPRINT USING "FCV=##.###";FCV 3290
3300 'STUDENTIZED RANGE: SNK(LEVELS), LSD(2), HSD(K) 3310 FX=2 K J
3320 FOR IX=2 TO K 3330 RECORD = (FX-1)*1000 + (IX-2)*200 + JX 3340 GET #2, RECORD 3350 Q(IX)=CVS(CV5$)
3370 NEXTFIXPT=="Y" ™ E N L P R I N T U S I N G ";IX,Q(IX);
3380 IF OPT="Y" THEN LPRINT 3390 '
3400 'MULTIPLE RANGE: DMRT(LEVELS) 3410 FX=3 3420 FOR IX=2 TO K 3430 RECORD = (FX-1)*1000 + (IX-2)*200 + JX 3440 GET #2, RECORD 3450 M(IX)=CVS(CV5$)
3460 IF OPT="Y" THEN LPRINT USING "M(#)=#.### "-IX MCIXV 3470 NEXT IX , u , n u ^ ' 3480 IF OPT="Y" THEN LPRINT 3490 ' 3500 'STUDENT AUGMENTED RANGE: H-SS(K) 3510 FX=4 : IX=K
3520 RECORD = (FX-1)*1000 + (IX-2)*200 + JX 3530 GET #2, RECORD 3540 QT=CVS(CV5$)
3560 'F °PT="Y" T H E N L P R I N T U S I N G "QT=#-### "^ T : L P ™
3570 CLOSE #2
3590 '""'"'"""""'"""""""HMtmMMMMt BEGIN MAIN LOOP """""'"""""""""MMttrmMMnnHi
3600 3610 FOR NREPS%=1 TO REP% 3620 IF INT(NREPS%/C)<>NREPS%/C THEN 3640 3630 PRINT USING "REP: #### TIME: \ \":NREPS% TIMES 3640 IF NREPS%ORD% OR OPT="N" THEN 3700 , $
3660 PRTOUT-fING OFMT;NREPSZ'It'J'KCOMPS.DFB.I>™.DfT,NTOT 3670 ' 3680 ' GENERATE SCORES 3690 ' 3700 FOR S%=1 TO K 3710 FOR N%=1 TO NN(S%) 3720 FOR U%=1 TO 20 :'
3730 A=A+RND                  :'Random NID error generated from 20
3740 NEXT U%                   :' uniform random numbers. B=10. SIGMA=10.
3750 E=(A-B)*SIGMA
3760 A=0
3770 X=MU+E                    :'Individual score without treatment effect
3780 SX2=SX2+X*X               :'Sum of X squared
3790 T(S%)=T(S%)+X             :'Sum of Xj
3800 NEXT N%
3810 XBAR(S%)=T(S%)/NN(S%)      :'Mean i
3820 TJ(S%)=T(S%)*T(S%)/NN(S%)  :'Tj squared / ni
3830 T=T+T(S%)                  :'Sum sum Xij
3840 TJ=TJ+TJ(S%)               :'Sum of all Tj
3850 NEXT S%
3860 ' 3870 IF NREPS%ORD% OR OPT="N" THEN 4000 3880 LPRINT :LPRINT "DATA SUMMARY:" 3890 FOR 1=1 TO K 3900 LPRINT USING "MEAN(#)=###.# ";I,XBAR(I); 3910 NEXT I 3920 LPRINT 3930 FOR 1=1 TO K 3940 LPRINT USING " N(#)= ## ";I,NN(I); 3950 NEXT I 3960 LPRINT 3970 ' 3980 ' CALCULATE F 3990 ' 4000 TTN=T*T/NT0T 4010 SSB=TJ-TTN 4020 SSW=SX2-TJ 4030 SST=SX2-TTN 4040 MSB=SSB/DFB 4050 MSW=SSW/DFW 4060 F =MSB/MSW
4080 ' IS F-RATIO SIGNIFICANT? 4090 ' 4100 IF F>=FCV THEN SIG=1 ELSE SIG=0 4110 IF SIG=1 THEN ICNT=ICNT+1 4120 *
4130 IF NREPS%ORD% OR OPT="N" THEN 4230 4140 LPRINT 4150 LPRINT 0TBL1
4160 LPRINT USING 0TBL2;SSB,DFB,MSB,F,FCV,SIG,ICNT 4170 LPRINT USING 0TBL3;SSW,DFW,MSW 4180 LPRINT USING 0TBL4;SST,DFT 4190 LPRINT 4220 ' 4230 FOR 1=1 TO K C L E A N U P
4240 T(I)=0 : TJ(I)=0 4250 NEXT I 4260 SX2=0 : T=0 : TJ=0 4280 ' """'"""MMimiiimniitmimti! MULTIPLE COMPARISONS """'""""""""""""""mmmmi 4285 4290 ' 4300 ' RANK MEANS FROM HIGH TO LOW 4310 ' 4320 FOR PRI=1 TO K-l 4330 FOR SEC=PRI+1 TO K 4340 IF XBAR(PRI)>=XBAR(SEC) THEN 4370 XBAR(l),NN(l)=high
4350 SWAP XBAR(PRI),XBAR(SEC)
4360 SWAP NN(PRI),NN(SEC)
4370 NEXT SEC
4380 NEXT PRI
4400 IF J>5 THEN 6180 :' Goto Unequal n routine
4410 '
4420 ' EQUAL N ROUTINE
4430 '
4440 SDS=SQR(2*MSW/NN(1)) :' S.E.D.:SSD
4450 SD=SQR(MSW/NN(1)) :' S.E.D.:LSD,HSD,SNK,MRT
4460 LCV=Q(2)*SD :' LSD CV
4470 HCV=Q(K)*SD :' HSD CV
4480 SCV=SQR((K-1)*FCV)*SDS :' SSD CV
4490 IF NREPS%<>RD% OR OPT="N" THEN 4950
4500 LPRINT "SED (L,M,N,H) = SQT(MSW / N)"
4510 LPRINT USING "SED = SQT(####.#/##)";MSW,NN(1)
4520 LPRINT USING "SED = ###.##";SD
4530 LPRINT
4540 LPRINT "SED (SSD) = SQT(2 * MSW / N)"
4550 LPRINT USING "SED = SQT(2 * ####.#/##)";MSW,NN(1)
4560 LPRINT USING "SED = ###.##";SDS
4570 LPRINT
4580 LPRINT "LSD CV = Q(2) * SD"
4590 LPRINT USING " = #.### * ##.###";Q(2),SD
4600 LPRINT USING " = ##.###";LCV
4610 LPRINT
4620 LPRINT "SNK CV = Q(K) * SD"
4630 FOR I=2 TO K
4640 SNK=Q(I)*SD
4650 LPRINT USING "SNK(#) = #.### * ##.### = #.###";I,Q(I),SD,SNK
4660 NEXT I : LPRINT
4670 LPRINT "MRT CV = M(K) * SD"
4680 FOR I=2 TO K
4690 MRT=M(I)*SD
4700 LPRINT USING "MRT(#) = #.### * ##.### = #.###";I,M(I),SD,MRT
4710 NEXT I
4720 LPRINT
4730 LPRINT "HSD CV = Q(K) * SD"
4740 LPRINT USING " = #.### * ##.###";Q(K),SD
4750 LPRINT USING " = ##.###";HCV
4760 LPRINT
4770 LPRINT "SSD CV = SQT((K-1)*FCV) * SDS"
4780 LPRINT USING " = SQT(( # )*#.### * ##.###";K-1,FCV,SDS
4790 LPRINT USING " = ##.###";SCV
4800 LPRINT CHR$(12);
4810 LPRINT "RANKED MEANS:"
4820 FOR I=1 TO K
4830 LPRINT USING "MEAN(#)=###.# ";I,XBAR(I);
4840 NEXT I
4850 LPRINT
4860 FOR I=1 TO K
4870 LPRINT USING " N(#)=## ";I,NN(I);
4880 NEXT I
4890 NEXT I
4900 GOSUB 2320
4910 LPRINT : LPRINT
4920 LPRINT "MULTIPLE COMPARISON PROCEDURES:" : LPRINT
4930 '
4940 ' STUDENT NEWMAN KEULS (SNK): EQUAL N'S
4950 LAB$="LAYER"
4960 FOR II=1 TO K: FOR JJ=1 TO K: FLAG$(II,JJ)="0": NEXT JJ,II
4970 '
4980 FOR II=K TO 2 STEP -1
4990 FOR JJ=1 TO II-1
5000 IF FLAG$(II,JJ)="-" AND JJ=1 THEN II=2 : GOTO 5200 :'SKIP REST
5010 IF FLAG$(II,JJ)="-" THEN JJ=II-1 : GOTO 5190 :'SKIP THIS ROW
5020 KK=II-JJ+1 :'Layer index for SNK
5030 NCV(KK)=Q(KK)*SD
5040 DIF=ABS(XBAR(II)-XBAR(JJ))
5050 IF NREPS%=RD% AND OPT="Y" THEN LPRINT USING ODIFN;DIF,KK,NCV(KK)
5060 '
5070 IF DIF>=NCV(KK) THEN 5140
5080 FLAG$(II,JJ+1)="-" :'COMP NOT SIG
5090 FLAG$(II-1,JJ)="-"
5100 FLAG$(II,JJ)="NS"
5110 IF KK=K THEN 5210 :'SKIP ALL TESTS
5120 JJ=II-1 :'SKIP REST OF ROW
5130 GOTO 5180
5140 IF NF=1 THEN 5160 ELSE NE=NE+1 : NF=1 :'EW (ALL)
5150 IF SIG=1 THEN SNE=SNE+1 :'EW (SIG)
5160 NC=NC+1 : FLAG$(II,JJ)="*" :'PC (ALL)
5170 IF SIG=1 THEN SNC=SNC+1 :'PC (SIG)
5180 IF NREPS%=RD% AND OPT="Y" THEN GOSUB 2150
5190 NEXT JJ
5200 NEXT II
5210 '
5220 IF NREPS%=RD% AND OPT="Y" THEN LAB$="FINAL" : GOSUB 2150
5240 ' MULTIPLE RANGE TEST (MRT): EQUAL N'S
5250 '
5260 FOR II=1 TO K: FOR JJ=1 TO K: FLAG$(II,JJ)="0": NEXT JJ,II
5270 LAB$="LAYER"
5280 FOR II=K TO 2 STEP -1
5290 FOR JJ=1 TO II-1
5300 IF FLAG$(II,JJ)="-" AND JJ=1 THEN II=2 : GOTO 5500 :'SKIP REST
5310 IF FLAG$(II,JJ)="-" THEN JJ=II-1 : GOTO 5490 :'SKIP THIS ROW
5320 KK=II-JJ+1 :'Layer index for MRT
5330 MCV(KK)=M(KK)*SD
5340 DIF=ABS(XBAR(II)-XBAR(JJ))
5350 IF NREPS%=RD% AND OPT="Y" THEN LPRINT USING ODIFM;DIF,KK,MCV(KK)
5360 '
5370 IF DIF>=MCV(KK) THEN 5440
5380 FLAG$(II,JJ+1)="-"
5390 FLAG$(II-1,JJ)="-"
5400 FLAG$(II,JJ)="NS"
5410 IF KK=K THEN 5510 :'SKIP ALL TESTS
5420 JJ=II-1 :'SKIP REST OF ROW
5430 GOTO 5490
5440 IF MF=1 THEN 5460 ELSE ME=ME+1 : MF=1
5450 IF SIG=1 THEN SME=SME+1
5460 MC=MC+1 : FLAG$(II,JJ)="*"
5470 IF SIG=1 THEN SMC=SMC+1
5480 IF NREPS%=RD% AND OPT="Y" THEN GOSUB 2150
5490 NEXT JJ
5500 NEXT II
5510 '
5530 IF NREPS%=RD% AND OPT="Y" THEN LAB$="FINAL" : GOSUB 2150
5550 ' (F)LSD PROCEDURE: EQUAL N'S
5560 FOR II=1 TO K: FOR JJ=1 TO K: FLAG$(II,JJ)="0": NEXT JJ,II
5570 '
5580 FOR II=K TO 2 STEP -1
5590 FOR JJ=1 TO II-1
5600 DIF=ABS(XBAR(II)-XBAR(JJ))
5610 IF NREPS%=RD% AND OPT="Y" THEN LPRINT USING ODIFL;DIF,LCV
5620 IF DIF>=LCV THEN 5650
5630 FLAG$(II,JJ)="NS"
5640 GOTO 5690
5650 IF LF=1 THEN 5670 ELSE LE=LE+1 : LF=1
5660 IF SIG=1 THEN FE=FE+1
5670 LC=LC+1 : FLAG$(II,JJ)="*"
5680 IF SIG=1 THEN FC=FC+1
5690 NEXT JJ
5700 NEXT II
5710 '
5720 IF NREPS%=RD% AND OPT="Y" THEN LAB$="FINAL" : GOSUB 2150
5740 ' TUKEY PROCEDURE (HSD): EQUAL N'S
5750 '
5760 FOR II=1 TO K: FOR JJ=1 TO K: FLAG$(II,JJ)="0": NEXT JJ,II
5770 '
5780 FOR II=K TO 2 STEP -1
5790 FOR JJ=1 TO II-1
5800 DIF=ABS(XBAR(II)-XBAR(JJ))
5810 IF NREPS%=RD% AND OPT="Y" THEN LPRINT USING ODIFH;DIF,HCV
5820 '
5830 IF DIF>=HCV THEN 5860
5840 FLAG$(II,JJ)="NS"
5850 GOTO 5900
5860 IF HF=1 THEN 5880 ELSE HE=HE+1 : HF=1
5870 IF SIG=1 THEN SHE=SHE+1
5880 HC=HC+1 : FLAG$(II,JJ)="*"
5890 IF SIG=1 THEN SHC=SHC+1
5900 NEXT JJ
5910 NEXT II
5920 '
5940 IF NREPS%=RD% AND OPT="Y" THEN LAB$="FINAL" : GOSUB 2150
5950 ' SCHEFFE PROCEDURE (SSD): EQUAL N'S
5960 '
5970 FOR II=1 TO K: FOR JJ=1 TO K: FLAG$(II,JJ)="0": NEXT JJ,II
5980 '
5990 FOR II=K TO 2 STEP -1
6000 FOR JJ=1 TO II-1
6010 DIF=ABS(XBAR(II)-XBAR(JJ))
6020 IF NREPS%=RD% AND OPT="Y" THEN LPRINT USING ODIFS;DIF,SCV
6030 '
6040 IF DIF>=SCV THEN 6070
6050 FLAG$(II,JJ)="NS"
6060 GOTO 6110
6070 IF SF=1 THEN 6090 ELSE SE=SE+1 : SF=1
6080 IF SIG=1 THEN SSE=SSE+1
6090 SC=SC+1 : FLAG$(II,JJ)="*"
6100 IF SIG=1 THEN SSC=SSC+1
6110 NEXT JJ
6120 NEXT II
6130 '
6140 IF NREPS%=RD% AND OPT="Y" THEN LAB$="FINAL" : GOSUB 2150
6150 '
6170 GOTO 7720 :' Skip unequal n routine
6180 ' UNEQUAL N ROUTINE
6190 '
6200 ' SNK UNEQUAL N'S
6210 '
6220 FOR II=1 TO K: FOR JJ=1 TO K: FLAG$(II,JJ)="0": NEXT JJ,II
6240 LAB$="LAYER"
6250 FOR II=K TO 2 STEP -1
6260 FOR JJ=1 TO II-1
6270 IF FLAG$(II,JJ)="-" AND JJ=1 THEN II=2 : GOTO 6480 :'SKIP REST
6280 IF FLAG$(II,JJ)="-" THEN JJ=II-1 : GOTO 6470 :'SKIP THIS ROW
6290 KK=II-JJ+1 :'Layer index for SNK
6300 NHAR=2/((1/NN(II))+(1/NN(JJ))) :'SNK
6310 NCV(KK)=Q(KK)*SQR(MSW/NHAR) :'SNK
6320 DIF=ABS(XBAR(II)-XBAR(JJ))
6330 IF NREPS%=RD% AND OPT="Y" THEN LPRINT USING ODIFN;DIF,KK,NCV(KK)
6340 '
6350 IF DIF>=NCV(KK) THEN 6420
6360 FLAG$(II,JJ+1)="-"
6370 FLAG$(II-1,JJ)="-"
6380 FLAG$(II,JJ)="NS"
6390 IF KK=K THEN 6490 :'SKIP ALL TESTS
6400 JJ=II-1 :'SKIP REST OF ROW
6410 GOTO 6460
6420 IF NF=1 THEN 6440 ELSE NE=NE+1 : NF=1
6430 IF SIG=1 THEN SNE=SNE+1
6440 NC=NC+1 : FLAG$(II,JJ)="*"
6450 IF SIG=1 THEN SNC=SNC+1
6460 IF NREPS%=RD% AND OPT="Y" THEN GOSUB 2150
6470 NEXT JJ
6480 NEXT II
6490 '
6500 IF NREPS%=RD% AND OPT="Y" THEN LAB$="FINAL" : GOSUB 2150
6510 '
6520 ' MRT UNEQUAL N'S
6530 '
6540 FOR II=1 TO K: FOR JJ=1 TO K: FLAG$(II,JJ)="0": NEXT JJ,II
6550 LAB$="LAYER"
6560 '
6570 FOR II=K TO 2 STEP -1
6580 FOR JJ=1 TO II-1
6590 IF FLAG$(II,JJ)="-" AND JJ=1 THEN II=2 : GOTO 6800 :'SKIP REST
6600 IF FLAG$(II,JJ)="-" THEN JJ=II-1 : GOTO 6790 :'SKIP THIS ROW
6610 KK=II-JJ+1 :'Layer index for MRT
6620 NHAR=2/((1/NN(II))+(1/NN(JJ))) :'MRT
6630 MCV(KK)=M(KK)*SQR(MSW/NHAR) :'MRT
6640 DIF=ABS(XBAR(II)-XBAR(JJ))
6650 IF NREPS%=RD% AND OPT="Y" THEN LPRINT USING ODIFM;DIF,KK,MCV(KK)
6660 '
6670 IF DIF>=MCV(KK) THEN 6740
6680 FLAG$(II,JJ+1)="-"
6690 FLAG$(II-1,JJ)="-"
6700 FLAG$(II,JJ)="NS"
6710 IF KK=K THEN 6810 :'SKIP ALL TESTS
6720 JJ=II-1 :'SKIP REST OF ROW
6730 GOTO 6780
6740 IF MF=1 THEN 6760 ELSE ME=ME+1 : MF=1
6750 IF SIG=1 THEN SME=SME+1
6760 MC=MC+1 : FLAG$(II,JJ)="*"
6770 IF SIG=1 THEN SMC=SMC+1
6780 IF NREPS%=RD% AND OPT="Y" THEN GOSUB 2150
6790 NEXT JJ
6800 NEXT II
6810 '
6820 IF NREPS%=RD% AND OPT="Y" THEN LAB$="FINAL" : GOSUB 2150
6830 '
6840 ' (F)LSD UNEQUAL N'S
6850 '
6860 LAB$="FINAL"
6870 FOR II=1 TO K: FOR JJ=1 TO K: FLAG$(II,JJ)="0": NEXT JJ,II
6880 '
6890 FOR II=K TO 2 STEP -1
6900 FOR JJ=1 TO II-1
6910 NLS=((1/NN(II))+(1/NN(JJ))) :'(F)LSD
6920 LCV=Q(2)*SQR(MSW*NLS/2) :'(F)LSD
6930 DIF=ABS(XBAR(II)-XBAR(JJ))
6940 IF NREPS%=RD% AND OPT="Y" THEN LPRINT USING ODIFL;DIF,LCV
6950 '
6960 IF DIF>=LCV THEN 6990
6970 FLAG$(II,JJ)="NS"
6980 GOTO 7030
6990 IF LF=1 THEN 7010 ELSE LE=LE+1 : LF=1
7000 IF SIG=1 THEN FE=FE+1
7010 LC=LC+1 : FLAG$(II,JJ)="*"
7020 IF SIG=1 THEN FC=FC+1
7030 NEXT JJ
7040 NEXT II
7050 IF NREPS%=RD% AND OPT="Y" THEN GOSUB 2150
7060 '
7070 ' SPJOTVOLL-STOLINE MODIFICATION OF HSD
7080 '
7090 FOR II=1 TO K: FOR JJ=1 TO K: FLAG$(II,JJ)="0": NEXT JJ,II
7110 FOR II=K TO 2 STEP -1
7120 FOR JJ=1 TO II-1
7130 IF NN(II)<=NN(JJ) THEN NMIN=NN(II) ELSE NMIN=NN(JJ) :' H-SS
7140 HCV=QT*SQR(MSW/NMIN) :' H-SS
7150 DIF=ABS(XBAR(II)-XBAR(JJ))
7160 IF NREPS%=RD% AND OPT="Y" THEN LPRINT USING ODIFHS;DIF,HCV
7180 IF DIF>=HCV THEN 7210
7190 FLAG$(II,JJ)="NS"
7200 GOTO 7250
7210 IF HF=1 THEN 7230 ELSE HE=HE+1 : HF=1
7220 IF SIG=1 THEN SHE=SHE+1
7230 HC=HC+1 : FLAG$(II,JJ)="*"
7240 IF SIG=1 THEN SHC=SHC+1
7250 NEXT JJ
7260 NEXT II
7270 IF NREPS%=RD% AND OPT="Y" THEN GOSUB 2150
7280 '
7290 ' TUKEY-KRAMER MODIFICATION OF HSD
7300 '
7310 FOR II=1 TO K: FOR JJ=1 TO K: FLAG$(II,JJ)="0": NEXT JJ,II
7320 FOR II=K TO 2 STEP -1
7330 FOR JJ=1 TO II-1
7340 NTK=((1/NN(II))+(1/NN(JJ)))/2 :' H-TK
7350 TCV=Q(K)*SQR(MSW*NTK) :' H-TK
7360 DIF=ABS(XBAR(II)-XBAR(JJ))
7370 IF NREPS%=RD% AND OPT="Y" THEN LPRINT USING ODIFHT;DIF,TCV
7380 '
7390 IF DIF>=TCV THEN 7420
7400 FLAG$(II,JJ)="NS"
7410 GOTO 7460
7420 IF TF=1 THEN 7440 ELSE TE=TE+1 : TF=1
7430 IF SIG=1 THEN STE=STE+1
7440 TC=TC+1 : FLAG$(II,JJ)="*"
7450 IF SIG=1 THEN STC=STC+1
7460 NEXT JJ
7470 NEXT II
7480 IF NREPS%=RD% AND OPT="Y" THEN GOSUB 2150
7490 '
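The three unequal-n standard-error adjustments used in this routine (harmonic mean of the two n's at 6300-6310, smaller of the two n's at 7130-7140, and the Tukey-Kramer average of reciprocals at 7340-7350) can be sketched in Python; the function names are illustrative, not part of the program:

```python
import math

def se_harmonic(msw, n1, n2):
    """SNK/MRT unequal-n: harmonic mean of the two sample sizes."""
    nhar = 2.0 / (1.0 / n1 + 1.0 / n2)
    return math.sqrt(msw / nhar)

def se_spjotvoll_stoline(msw, n1, n2):
    """Spjotvoll-Stoline: use the smaller of the two sample sizes."""
    return math.sqrt(msw / min(n1, n2))

def se_tukey_kramer(msw, n1, n2):
    """Tukey-Kramer: average of the reciprocals; algebraically this is
    identical to the harmonic-mean form, so the procedures differ only
    in the critical value (q at the layer vs. q at k) and the n used."""
    return math.sqrt(msw * (1.0 / n1 + 1.0 / n2) / 2.0)
```

Note that the harmonic-mean and Tukey-Kramer standard errors coincide; the Spjotvoll-Stoline form is the most conservative because the smaller n inflates the standard error.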
7500 ' SSD UNEQUAL N'S
7510 '
7520 FOR II=1 TO K: FOR JJ=1 TO K: FLAG$(II,JJ)="0": NEXT JJ,II
7540 FOR II=K TO 2 STEP -1
7550 FOR JJ=1 TO II-1
7560 NSD=((1/NN(II))+(1/NN(JJ))) :' SSD
7570 SCV=SQR((K-1)*FCV)*SQR(MSW*NSD) :' SSD
7580 DIF=ABS(XBAR(II)-XBAR(JJ))
7590 IF NREPS%=RD% AND OPT="Y" THEN LPRINT USING ODIFS;DIF,SCV
7600 '
7610 IF DIF>=SCV THEN 7640
7620 FLAG$(II,JJ)="NS"
7630 GOTO 7680
7640 IF SF=1 THEN 7660 ELSE SE=SE+1 : SF=1
7650 IF SIG=1 THEN SSE=SSE+1
7660 SC=SC+1 : FLAG$(II,JJ)="*"
7670 IF SIG=1 THEN SSC=SSC+1
7680 NEXT JJ
7690 NEXT II
7700 IF NREPS%=RD% AND OPT="Y" THEN GOSUB 2150
7720 ' ZERO OUT CYCLE COUNTERS
7740 FOR II=1 TO K : XBAR(II)=0 : NEXT II
7750 LF=0: HF=0: SF=0: NF=0: MF=0: TF=0
7760 '
7770 NEXT NREPS%
7780 '
7790 '!!!!!!!!!!!!!!!!!!!! SUMMARIZE RESULTS !!!!!!!!!!!!!!!!!!!!
7820 ' CALCULATE PERCENTS
7830 '
7840 PLE=C*LE/REP% :' EW Type I error rate (all anovas) - LSD
7850 PFE=C*FE/REP% :' EW Type I error rate (sig anovas) - FLSD
7860 PLC=C*LC/NCOMPS :' PC Type I error rate (all anovas) - LSD
7870 PFC=C*FC/NCOMPS :' PC Type I error rate (sig anovas) - FLSD
7880 '
7890 PME=C*ME/REP% :' MRT: EW (all)
7900 PSME=C*SME/REP% :' EW (sig)
7910 PMC=C*MC/NCOMPS :' PC (all)
7920 PSMC=C*SMC/NCOMPS :' PC (sig)
7930 '
7940 PNE=C*NE/REP% :' SNK
7950 PSNE=C*SNE/REP%
7960 PNC=C*NC/NCOMPS
7970 PSNC=C*SNC/NCOMPS
7980 '
7990 PSE=C*SE/REP% :' SSD
8000 PSSE=C*SSE/REP%
8010 PSC=C*SC/NCOMPS
8020 PSSC=C*SSC/NCOMPS
8030 '
8040 IF J>5 THEN 8110
8050 PHE=C*HE/REP% :' HSD
8060 PSHE=C*SHE/REP%
8070 PHC=C*HC/NCOMPS
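The percent calculations above divide each error count by the number of repetitions (experimentwise) or by the total number of comparisons (comparisonwise); C appears to be the percent-scaling constant, 100. A one-line Python sketch of that assumed interpretation:

```python
def rate_percent(count, denominator):
    """Convert an error count to a Type I error rate in percent,
    e.g. count/REP% for experimentwise, count/NCOMPS for comparisonwise."""
    return 100.0 * count / denominator
```

Checked against the first summary sheet in Appendix E: the LSD experimentwise count 109 over 1000 repetitions gives 10.90%, and the comparisonwise count 145 over 3000 comparisons gives 4.83%.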
8080 PSHC=C*SHC/NCOMPS
8090 GOTO 8210 :' Skip Unequal n HSD
8110 PHE=C*HE/REP% :' HSD-SS
8120 PSHE=C*SHE/REP%
8130 PHC=C*HC/NCOMPS
8140 PSHC=C*SHC/NCOMPS
8150 '
8160 PTE=C*TE/REP% :' HSD-TK
8170 PSTE=C*STE/REP%
8180 PTC=C*TC/NCOMPS
8190 PSTC=C*STC/NCOMPS
8200 '
8230 ' PRINT RUN TIME FOR K,J COMBINATION
8240 ' PRINT OUT SUMMARY SHEET
8250 '
8270 IF OPT="Y" THEN LPRINT CHR$(12);
8280 PRINT #3," " : PRINT #3," " : PRINT #3," "
8290 PRINT#3, O1
8300 PRINT#3," " : PRINT#3," " : PRINT #3," "
8310 PRINT#3, USING O2;K,CALC$
8320 PRINT#3, USING O3;J,DATE$
8330 PRINT#3," " : PRINT #3," "
8340 PRINT#3, O5
8350 PRINT#3, O6
8360 FOR I=2 TO K
8370 PRINT#3, USING O7;I,FCV,Q(I),M(I),QT
8380 NEXT I
8390 PRINT#3, O8
8400 PRINT#3, USING O9;REP%
8410 PRINT#3, USING O10;ICNT,.05*REP%,NCOMPS
8420 PRINT#3, USING O11;ICNT/REP%
8430 PRINT#3," "
8440 PRINT#3, O12
8450 PRINT#3, O13
8460 PRINT#3, O14
8470 PRINT#3, O15
8480 PRINT#3," "
8490 PRINT#3, USING O16;LE,LC
8500 PRINT#3, USING O17;PLE,PLC
8510 PRINT#3," "
8520 PRINT#3, USING O18;FE,FC
8530 PRINT#3, USING O19;PFE,PFC
8540 PRINT#3," "
8550 PRINT#3, USING O20;ME,SME,MC,SMC
8560 PRINT#3, USING O21;PME,PSME,PMC,PSMC
8570 PRINT#3," "
8580 PRINT#3, USING O22;NE,SNE,NC,SNC
8590 PRINT#3, USING O23;PNE,PSNE,PNC,PSNC
8600 PRINT#3," "
8610 '
8620 IF J>5 THEN 8700 :' Skip Equal n HSD
8630 '
8640 PRINT#3, USING O26;HE,SHE,HC,SHC :' HSD
8650 PRINT#3, USING O27;PHE,PSHE,PHC,PSHC
8660 '
8670 GOTO 8790 :' Skip Unequal n HSD
8680 '
8690 PRINT#3," "
8700 PRINT#3, O28
8710 PRINT#3," "
8720 PRINT#3, USING O31;HE,SHE,HC,SHC :' HSD-SS
8730 PRINT#3, USING O32;PHE,PSHE,PHC,PSHC
8740 PRINT#3," "
8750 PRINT#3, USING O29;TE,STE,TC,STC :' HSD-TK
8760 PRINT#3, USING O30;PTE,PSTE,PTC,PSTC
8770 PRINT#3," "
8780 PRINT#3, O28
8790 PRINT#3," "
8800 PRINT#3, USING O24;SE,SSE,SC,SSC :' SSD
8810 PRINT#3, USING O25;PSE,PSSE,PSC,PSSC
8820 PRINT#3," "
8830 PRINT#3, O33
8840 PRINT#3, CHR$(12);
8860 '
8870 ' ZERO OUT COUNTERS FOR NEXT K,J CYCLE
8880 '
8900 ICNT=0
8910 LF=0: HF=0: SF=0: NF=0: MF=0: TF=0
8920 LE=0: HE=0: SE=0: NE=0: ME=0: TE=0
8930 LC=0: HC=0: SC=0: NC=0: MC=0: TC=0: FC=0
8940 FE=0: SHE=0: SSE=0: SNE=0: SME=0: STE=0
8950 FC=0: SHC=0: SSC=0: SNC=0: SMC=0: STC=0
8960 SIG=0
8970 '
8980 GOTO 2590 :' RETURN FOR NEXT CYCLE
9000 END
APPENDIX D
ANALYSIS OF THE STEPWISE AND SIMULTANEOUS TESTING PROCEDURES
Both the SNK and MRT are stepwise, or layered, multiple comparison procedures. The critical differences computed by these procedures depend on the distance between the ranked means being tested. An example is presented below to demonstrate how these procedures are applied and how they differ from simultaneous procedures.
The example shown below is taken from one of the significant F-tests: repetition 130, k=6, J=5. The six group means from this cycle were 97.63, 97.06, 107.51, 100.55, 101.48, and 102.02. The ANOVA for these data is shown below:
              SS     df       MS       F      Fcv
Between    1769.1      5   353.83   2.632   2.281
Within    19359.4    144   134.44
Total     21128.5    149
Step 1. Rank order the group means. This results in the following listing of group means:
Rank     Mean    Original position
  1     107.51          3
  2     102.02          6
  3     101.48          5
  4     100.55          4
  5      97.63          1
  6      97.06          2
Step 2. Create a paired difference matrix as shown below:
          97.63   100.55   101.48   102.02   107.51
97.06      0.57     3.50     4.42     4.96  (10.45)
97.63               2.92     3.85     4.39     9.88
100.55                       0.93     1.47     6.96
101.48                                0.54     6.03
102.02                                         5.49
Step 3. Calculate the first critical difference. The first test is made of the largest difference, 10.45. The critical difference is computed by multiplying the standard error of difference by the appropriate Studentized Range Table critical value, as shown below:

SNK(6) = q(.05,6,144) √(134.44/25)
       = (4.07)(2.319)
       = 9.438

Since the actual difference is larger than the critical difference, this pair is declared significantly different. This is reflected by a "*". If this pair had not been declared significantly different, no further tests would have been made.
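The layered critical difference just computed can be checked with a short Python sketch (the function name is illustrative, not part of the original program):

```python
import math

def snk_critical(q, msw, n):
    """Critical difference for one SNK layer:
    q(alpha, r, dfw) times the standard error sqrt(MSW / n)."""
    return q * math.sqrt(msw / n)
```

With q = 4.07, MSW = 134.44 and n = 25 this reproduces the 9.438 above; with q = 3.90 it reproduces the 9.044 used in the next step.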
Step 4. Having tested the two means r = k = 6 ranks apart, we now test means r = k-1 = 5 ranks apart, shown below in ( ).
          97.63   100.55   101.48   102.02   107.51
97.06      0.57     3.50     4.42    (4.96)      *
97.63               2.92     3.85     4.39    (9.88)
100.55                       0.93     1.47     6.96
101.48                                0.54     6.03
102.02                                         5.49

The critical difference for this test is computed as follows:

SNK(5) = q(.05,5,144) √(134.44/25)
       = (3.90)(2.319)
       = 9.044

Since the actual difference, 4.96, is less than the critical difference, the pair is declared not significantly different. No further tests are made on this row. Additionally, no further tests are made from this column to the left. The test barrier, shown as a "-" sign, marks the limits for further pairwise tests.
          97.63   100.55   101.48   102.02   107.51
97.06      0.57     3.50      -       NS       *
97.63               2.92     3.85      -     (9.88)
100.55                       0.93     1.47     6.96
101.48                                0.54     6.03
102.02                                         5.49
The next test made is between the actual difference 9.88 and the critical difference 9.04. Since the actual difference is greater than the critical difference, this pair is declared significantly different. No further tests are made on this row because of the barrier (-).
          97.63   100.55   101.48   102.02   107.51
97.06      0.57     3.50      -       NS       *
97.63               2.92     3.85      -       *
100.55                       0.93     1.47     6.96
101.48                                0.54     6.03
102.02                                         5.49
Having tested means r = k-1 ranks apart, we now test means r = k-2 = 4 ranks apart, as shown in the ( ) below.
          97.63   100.55   101.48   102.02   107.51
97.06      0.57     3.50     (-)      NS       *
97.63               2.92     3.85     (-)      *
100.55                       0.93     1.47   (6.96)
101.48                                0.54     6.03
102.02                                         5.49

The critical difference for this test is computed as follows:

SNK(4) = q(.05,4,144) √(134.44/25)
       = (3.66)(2.319)
       = 8.488
Since the actual difference, 6.96, is less than the critical difference, the pair is declared not significantly different. Barriers are set to the left of and below this cell, as shown below.

          97.63   100.55   101.48   102.02   107.51
97.06      0.57     3.50      -       NS       *
97.63               2.92     3.85      -       *
100.55                       0.93      -       NS
101.48                                0.54      -
102.02                                         5.49

No further tests are made because the next test, r = k-3 = 3 means apart, has been barred from testing. At this point, the multiple comparison procedure ends.

The MRT procedure follows the same process but uses slightly lower critical values to compute its layered critical differences. In this example, the MRT critical differences were 7.34, 7.20, and 7.03 for r = 6, 5, and 4, respectively. In this case both procedures declared the same two comparisons significantly different.

As a point of comparison, the LSD critical difference was computed as 6.47:

LSD = q(.05,2,144) √(134.44/25)
    = (2.79)(2.319)
    = 6.470
It declared 3 comparisons significantly different, as shown below:

          97.63   100.55   101.48   102.02   107.51
97.06       NS       NS       NS       NS       *
97.63                NS       NS       NS       *
100.55                        NS       NS       *
101.48                                 NS       NS
102.02                                          NS

Notice that all of the differences were tested. The HSD uses the same simultaneous approach. Its critical difference was 9.44 for all differences:
HSD = q(.05,6,144) √(134.44/25)
    = (4.07)(2.319)
    = 9.438
The same pairs declared different by SNK and MRT were declared significant by the HSD, as shown below:

          97.63   100.55   101.48   102.02   107.51
97.06       NS       NS       NS       NS       *
97.63                NS       NS       NS       *
100.55                        NS       NS       NS
101.48                                 NS       NS
102.02                                          NS
The Scheffe Significant Difference was computed to be 11.075:
SSD = √((k-1)Fcv) √(2MSW/n)
    = √((6-1)(2.281)) √((2)(134.44)/25)
    = √(11.405) √(10.755)
    = 11.075
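The Scheffe computation can be verified with a small Python sketch (the function name is illustrative):

```python
import math

def scheffe_critical(k, fcv, msw, n):
    """Scheffe critical difference for equal n:
    sqrt((k-1) * Fcv) * sqrt(2 * MSW / n)."""
    return math.sqrt((k - 1) * fcv) * math.sqrt(2.0 * msw / n)
```

With k = 6, Fcv = 2.281, MSW = 134.44 and n = 25 this reproduces the 11.075 above.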
It declared no differences significant, as shown below:

          97.63   100.55   101.48   102.02   107.51
97.06       NS       NS       NS       NS       NS
97.63                NS       NS       NS       NS
100.55                        NS       NS       NS
101.48                                 NS       NS
102.02                                          NS
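The stepwise SNK logic traced in this appendix can be restated compactly. The sketch below is a simplified, containment-based restatement of the program's flag-and-barrier bookkeeping, using the q values quoted above; the q entry for r = 3 is an assumed filler that is never consulted in this example because that layer is barred:

```python
import math

def snk_layered(means, sd_err, q_by_r):
    """Step-down SNK: rank the means, test the widest span first,
    and bar every pair contained inside a nonsignificant span."""
    xs = sorted(means, reverse=True)
    k = len(xs)
    blocked = []       # spans (i, j) declared not significant
    significant = []
    for r in range(k, 1, -1):                 # widest layer first
        for i in range(k - r + 1):
            j = i + r - 1
            if any(a <= i and j <= b for a, b in blocked):
                blocked.append((i, j))        # inherited barrier
                continue
            if xs[i] - xs[j] >= q_by_r[r] * sd_err:
                significant.append((xs[i], xs[j]))
            else:
                blocked.append((i, j))
    return significant

# Appendix D example: k=6, MSW=134.44, n=25, q values from the text.
sd = math.sqrt(134.44 / 25)                   # 2.319
q = {6: 4.07, 5: 3.90, 4: 3.66, 3: 3.36, 2: 2.79}   # 3.36 is an assumed filler
pairs = snk_layered([97.63, 97.06, 107.51, 100.55, 101.48, 102.02], sd, q)
```

Run on the six means of this appendix, the sketch declares exactly the two pairs (107.51, 97.06) and (107.51, 97.63) significant, matching the worked example.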
APPENDIX E
* * * PRINT OUT OF MULTIPLE COMPARISON ERROR RATES * * *
GROUPS (k): 3     SIZE (J): 1     COMPUTATION TIME: 00:04:34

r        2:      3:
F      3.890   3.890
Q      3.080   3.770
M      3.080   3.230
QT     3.791   3.791

F-TESTS: 1000    SIG F-TESTS: 50    E: 50    COMPARISONS: 3000
PERCENT SIG: 0.050    E: 0.050

           EXPERIMENTWISE              COMPARISONWISE
        ALL ANOVAS   SIG ANOVAS    ALL ANOVAS   SIG ANOVAS
LSD     109 10.90%                 145  4.83%
FLSD                  50  5.00%                  77  2.57%
MRT      90  9.00%    50  5.00%    123  4.10%    77  2.57%
SNK      48  4.80%    45  4.50%     70  2.33%    67  2.23%
HSD      48  4.80%    45  4.50%     60  2.00%    57  1.90%
SSD      39  3.90%    39  3.90%     48  1.60%    48  1.60%
* * * PRINT OUT OF MULTIPLE COMPARISON ERROR RATES * * *
GROUPS (k): 3     SIZE (J): 2     COMPUTATION TIME: 00:08:00

r        2:      3:
F      3.355   3.355
Q      2.905   3.510
M      2.905   3.050
QT     3.519   3.519

F-TESTS: 1000    SIG F-TESTS: 52    E: 50    COMPARISONS: 3000
PERCENT SIG: 0.052    E: 0.050

           EXPERIMENTWISE              COMPARISONWISE
        ALL ANOVAS   SIG ANOVAS    ALL ANOVAS   SIG ANOVAS
LSD     128 12.80%                 159  5.30%
FLSD                  52  5.20%                  82  2.73%
MRT     104 10.40%    52  5.20%    135  4.50%    82  2.73%
SNK      51  5.10%    47  4.70%     77  2.57%    73  2.43%
HSD      51  5.10%    47  4.70%     58  1.93%    54  1.80%
SSD      40  4.00%    40  4.00%     44  1.47%    44  1.47%
* * * PRINT OUT OF MULTIPLE COMPARISON ERROR RATES * * *

GROUPS (k): 3     SIZE (J): 3     COMPUTATION TIME: 00:11:25

r        2:      3:
F      3.222   3.222
Q      2.857   3.436
M      2.857   3.007
QT     3.446   3.446

F-TESTS: 1000    SIG F-TESTS: 50    E: 50    COMPARISONS: 3000
PERCENT SIG: 0.050    E: 0.050

           EXPERIMENTWISE              COMPARISONWISE
        ALL ANOVAS   SIG ANOVAS    ALL ANOVAS   SIG ANOVAS
LSD     108 10.80%                 138  4.60%
FLSD                  50  5.00%                  75  2.50%
MRT      90  9.00%    50  5.00%    117  3.90%    75  2.50%
SNK      49  4.90%    46  4.60%     71  2.37%    68  2.27%
HSD      49  4.90%    46  4.60%     59  1.97%    56  1.87%
SSD      40  4.00%    40  4.00%     46  1.53%    46  1.53%
* * * PRINT OUT OF MULTIPLE COMPARISON ERROR RATES * * *

GROUPS (k): 3     SIZE (J): 4     COMPUTATION TIME: 00:14:51

r        2:      3:
F      3.162   3.162
Q      2.834   3.406
M      2.834   2.984
QT     3.413   3.413

F-TESTS: 1000    SIG F-TESTS: 50    E: 50    COMPARISONS: 3000
PERCENT SIG: 0.050    E: 0.050

           EXPERIMENTWISE              COMPARISONWISE
        ALL ANOVAS   SIG ANOVAS    ALL ANOVAS   SIG ANOVAS
LSD     126 12.60%                 155  5.17%
FLSD                  50  5.00%                  75  2.50%
MRT     101 10.10%    50  5.00%    130  4.33%    75  2.50%
SNK      54  5.40%    48  4.80%     78  2.60%    72  2.40%
HSD      54  5.40%    48  4.80%     66  2.20%    60  2.00%
SSD      47  4.70%    47  4.70%     54  1.80%    54  1.80%
* * * PRINT OUT OF MULTIPLE COMPARISON ERROR RATES * * *

GROUPS (k): 3     SIZE (J): 5     COMPUTATION TIME: 00:18:19

r        2:      3:
F      3.134   3.134
Q      2.824   3.392
M      2.821   2.971
QT     3.397   3.397

F-TESTS: 1000    SIG F-TESTS: 54    E: 50    COMPARISONS: 3000
PERCENT SIG: 0.054    E: 0.050

           EXPERIMENTWISE              COMPARISONWISE
        ALL ANOVAS   SIG ANOVAS    ALL ANOVAS   SIG ANOVAS
LSD     127 12.70%                 161  5.37%
FLSD                  54  5.40%                  84  2.80%
MRT     100 10.00%    54  5.40%    133  4.43%    84  2.80%
SNK      54  5.40%    51  5.10%     81  2.70%    78  2.60%
HSD      54  5.40%    51  5.10%     67  2.23%    64  2.13%
SSD      45  4.50%    45  4.50%     54  1.80%    54  1.80%
k * •* PRINT OUT OF MULTIPLE COMPARISON ERROR RATES * * *
GROUPS (k) SIZE (J):
3 6
COMPUTATION TIME: 00:12:13
r 2: 3 :
F 3.222 3. 222
Q 2.857 3.436
M 2.857 3 .007
QT 3 . 446 3.446
F-TESTS: 1000 SIG F-TESTS: 53 E: 50 PERCENT SIG: 0.053 E: 0.050
EXPERIMENTWISE
ALL ANOVAS SIG ANOVAS
COMPARISONS: 3000
COMPARISONWISE
ALL ANOVAS SIG ANOVAS
LSD
FLSD
MRT
SNK
HSD—SS
HSD—TK
SSD
127 12.70%
100 10.00%
51 5.10%
45 4 . 50%
53 5.30%
53 5 . 30%
48 4.80%
162 5.40%
8 2 2.73%
134 82 4.47% 2.7 3%
75 72 2. 50% 2. 40%
34 3 .40%
34 3 . 40%
N HSD
37 1 . 23% 1
54 5.40%
51 5.10%
65 2.17%
KT u r n
2
45 4.50%
52 1 . 73%
37
62
52 .1. 735
* * * PRINT OUT OF MULTIPLE COMPARISON ERROR RATES * * *
GROUPS (k): 3     SIZE (J): 7
COMPUTATION TIME: 00:29:19
r 2 : 3 :
F 3 .074 3.074
Q 2.802 3 . 362
M 2. 794 2.944
QT 3.364 3 . 364
F-TESTS: 1000 SIG F-TESTS: 53 E: 50 PERCENT SIG: 0.053 E: 0.050
EXPERIMENTWISE
ALL ANOVAS SIG ANOVAS ALL ANOVAS
LSD 120 153 12.00% 5.10%
FLSD 53 5 . 30%
MRT 94 51 126 9 . 40% 5.10% 4. 20%
SNK 51 43 70 5.10% 4. 30% 2.3 3%
----- UNEQUAL N HSD -----
HSD-SS 26 21 28 2 . 60% 2.10% 0.93%
HSD-TK 57 48 65 5 . 70% 4 . 80% 2.17%
----- UNEQUAL N HSD -----
SSD 30 30 3 4 3 . 00% 3 .00% 1.13%
COMPARISONS: 3000
COMPARISONWISE
SIG ANOVAS
78 2.60%
76 2.53%
62 2.07%
23 0.11%
56 1.87%
34 1.13%
* * * PRINT OUT OF MULTIPLE COMPARISON ERROR RATES * * *

GROUPS (k): 4     SIZE (J): 1     COMPUTATION TIME: 00:06:23

r        2:      3:      4:
F      3.240   3.240   3.240
Q      3.000   3.650   4.050
M      3.000   3.150   3.230
QT     4.050   4.050   4.050

F-TESTS: 1000    SIG F-TESTS: 51    E: 50    COMPARISONS: 6000
PERCENT SIG: 0.051    E: 0.050

           EXPERIMENTWISE              COMPARISONWISE
        ALL ANOVAS   SIG ANOVAS    ALL ANOVAS   SIG ANOVAS
LSD     181 18.10%                 290  4.83%
FLSD                  51  5.10%                 133  2.22%
MRT     128 12.80%    51  5.10%    216  3.60%   124  2.07%
SNK      46  4.60%    40  4.00%     77  1.28%    71  1.18%
HSD      46  4.60%    40  4.00%     62  1.03%    56  0.93%
SSD      25  2.50%    25  2.50%     33  0.55%    33  0.55%
* * * PRINT OUT OF MULTIPLE COMPARISON ERROR RATES * * *

GROUPS (k): 4     SIZE (J): 2     COMPUTATION TIME: 00:10:54

r        2:      3:      4:
F      2.872   2.872   2.872
Q      2.872   3.460   3.814
M      2.872   3.022   3.108
QT     3.814   3.814   3.814

F-TESTS: 1000    SIG F-TESTS: 54    E: 50    COMPARISONS: 6000
PERCENT SIG: 0.054    E: 0.050

           EXPERIMENTWISE              COMPARISONWISE
        ALL ANOVAS   SIG ANOVAS    ALL ANOVAS   SIG ANOVAS
LSD     182 18.20%                 285  4.75%
FLSD                  54  5.40%                 132  2.20%
MRT     132 13.20%    54  5.40%    214  3.57%   119  1.98%
SNK      50  5.00%    46  4.60%     92  1.53%    88  1.47%
HSD      50  5.00%    46  4.60%     70  1.17%    66  1.10%
SSD      27  2.70%    27  2.70%     38  0.63%    38  0.63%
* * * PRINT OUT OF MULTIPLE COMPARISON ERROR RATES * * *

GROUPS (k): 4     SIZE (J): 3     COMPUTATION TIME: 00:15:29

r        2:      3:      4:
F      2.776   2.776   2.776
Q      2.836   3.408   3.750
M      2.836   2.986   3.084
QT     3.749   3.749   3.749

F-TESTS: 1000    SIG F-TESTS: 53    E: 50    COMPARISONS: 6000
PERCENT SIG: 0.053    E: 0.050

           EXPERIMENTWISE              COMPARISONWISE
        ALL ANOVAS   SIG ANOVAS    ALL ANOVAS   SIG ANOVAS
LSD     195 19.50%                 300  5.00%
FLSD                  53  5.30%                 120  2.00%
MRT     142 14.20%    53  5.30%    220  3.67%   111  1.85%
SNK      55  5.50%    48  4.80%     83  1.38%    76  1.27%
HSD      55  5.50%    48  4.80%     66  1.10%    59  0.98%
SSD      35  3.50%    35  3.50%     38  0.63%    38  0.63%
* * * PRINT OUT OF MULTIPLE COMPARISON ERROR RATES * * *

GROUPS (k): 4     SIZE (J): 4     COMPUTATION TIME: 00:20:03

r        2:      3:      4:
F      2.739   2.739   2.739
Q      2.822   3.389   3.724
M      2.818   2.968   3.068
QT     3.724   3.724   3.724

F-TESTS: 1000    SIG F-TESTS: 45    E: 50    COMPARISONS: 6000
PERCENT SIG: 0.045    E: 0.050

           EXPERIMENTWISE              COMPARISONWISE
        ALL ANOVAS   SIG ANOVAS    ALL ANOVAS   SIG ANOVAS
LSD     207 20.70%                 297  4.95%
FLSD                  45  4.50%                 101  1.68%
MRT     139 13.90%    45  4.50%    207  3.45%    93  1.55%
SNK      49  4.90%    36  3.60%     65  1.08%    52  0.87%
HSD      49  4.90%    36  3.60%     56  0.93%    43  0.72%
SSD      15  1.50%    15  1.50%     16  0.27%    16  0.27%
* * * PRINT OUT OF MULTIPLE COMPARISON ERROR RATES * * *

GROUPS (k): 4     SIZE (J): 5     COMPUTATION TIME: 00:24:36

r        2:      3:      4:
F      2.712   2.712   2.712
Q      2.812   3.376   3.704
M      2.803   2.953   3.053
QT     3.707   3.707   3.707

F-TESTS: 1000    SIG F-TESTS: 54    E: 50    COMPARISONS: 6000
PERCENT SIG: 0.054    E: 0.050

           EXPERIMENTWISE              COMPARISONWISE
        ALL ANOVAS   SIG ANOVAS    ALL ANOVAS   SIG ANOVAS
LSD     194 19.40%                 307  5.12%
FLSD                  54  5.40%                 129  2.15%
MRT     137 13.70%    54  5.40%    227  3.78%   121  2.02%
SNK      52  5.20%    45  4.50%     79  1.32%    72  1.20%
HSD      52  5.20%    45  4.50%     61  1.02%    54  0.90%
SSD      24  2.40%    24  2.40%     28  0.47%    28  0.47%
* * * PRINT OUT OF MULTIPLE COMPARISON ERROR RATES * * *
GROUPS (k): SIZE (J):
4 6
COMPUTATION TIME: 00:19:17
F 2. 752 2.752 2. 752
Q 2.827 3 . 396 3. 734
M 2.826 2.976 3.076
QT 3. 733 3.733 3.733
F-TESTS: 1000 SIG F-TESTS: 46 E: 50 PERCENT SIG: 0.046 E: 0.050
COMPARISONS 6000
EXPERIMENTWISE
ALL ANOVAS SIG ANOVAS
COMPARISONWISE
ALL ANOVAS SIG ANOVAS
LSD
FLSD
MRT
SNK
HSD-SS
HSD-TK
198 19.80%
130 13.00%
52 5. 20%
24 2 . 40%
53 5.30%
46 4 . 60%
45 4 . 50%
39 3 .90%
300 5.00%
200 3.33%
1 . 42%
UNEQUAL N HSD
24 2 . 40%
40 4 . 00%
31 0.5 2%
70 1.17%
106 1 .7 7%
101 1 .68%
7 2 1 . 20%
31 0.5 2%
5 7 0 . 95%
-UNEQUAL N HSD
26 2.60%
26 2 . 60%
35 0 . 58%
35 0.58%
rt * * PRINT OUT OF MULTIPLE COMPARISON ERROR RATES * * *
GROUPS (k) SIZE (J):
4 7
COMPUTATION TIME: 00:35:13
r        2:      3:      4:
F      2.674   2.674   2.674
Q      2.792   3.347   3.667
M      2.786   2.936   3.036
QT     3.672   3.672   3.672
F-TESTS 1000 SIG F-TESTS: 45 E 50 COMPARISONS : 6000 PERCENT SIG: 0.045 E • 0.050
EXPERIMENTWISE C OMPARISONWI SE
ALL ANOVAS SIG ANOVAS ALL ANOVAS SIG ANOVAS
LSD 187 276 18.70% 4 .60%
FLSD 45 9 7 4.50% 1 . 62%
MRT 116 43 175 87 11.60% 4. 30% 2.92% 1 .45%
SNK 4 2 3 4 66 58 4 . 20% 3 . 40% 1 . 10% 0.9 7%
----- UNEQUAL N HSD -----
HSD-SS 28 23 32 27 2.80% 2. 30% 0 .53% 0. 4 5%
HSD-TK 47 38 63 54 4 . 70% 3 . 80% 1 .05% 0.90%
----- UNEQUAL N HSD -----
SSD 22 22 25 25 2 . 20% 2 . 20% 0. 42% 0.4 2%
* * * PRINT OUT OF MULTIPLE COMPARISON ERROR RATES * * *

GROUPS (k): 5     SIZE (J): 1     COMPUTATION TIME: 00:08:17

r        2:      3:      4:      5:
F      2.870   2.870   2.870   2.870
Q      2.950   3.580   3.960   4.230
M      2.950   3.100   3.180   3.250
QT     4.233   4.233   4.233   4.233

F-TESTS: 1000    SIG F-TESTS: 49    E: 50    COMPARISONS: 10000
PERCENT SIG: 0.049    E: 0.050

           EXPERIMENTWISE              COMPARISONWISE
        ALL ANOVAS   SIG ANOVAS    ALL ANOVAS   SIG ANOVAS
LSD     264 26.40%                 482  4.82%
FLSD                  49  4.90%                 161  1.61%
MRT     193 19.30%    49  4.90%    344  3.44%   139  1.39%
SNK      47  4.70%    39  3.90%     79  0.79%    71  0.71%
HSD      47  4.70%    39  3.90%     62  0.62%    54  0.54%
SSD      23  2.30%    23  2.30%     30  0.30%    30  0.30%
* * * PRINT OUT OF MULTIPLE COMPARISON ERROR RATES * * *

GROUPS (k): 5     SIZE (J): 2     COMPUTATION TIME: 00:13:59

r        2:      3:      4:      5:
F      2.590   2.590   2.590   2.590
Q      2.853   3.430   3.777   4.013
M      2.853   3.003   3.095   3.162
QT     4.024   4.024   4.024   4.024

F-TESTS: 1000    SIG F-TESTS: 45    E: 50    COMPARISONS: 10000
PERCENT SIG: 0.045    E: 0.050

           EXPERIMENTWISE              COMPARISONWISE
        ALL ANOVAS   SIG ANOVAS    ALL ANOVAS   SIG ANOVAS
LSD     263 26.30%                 485  4.85%
FLSD                  45  4.50%                 148  1.48%
MRT     181 18.10%    45  4.50%    319  3.19%   127  1.27%
SNK      45  4.50%    37  3.70%     69  0.69%    61  0.61%
HSD      45  4.50%    37  3.70%     59  0.59%    51  0.51%
SSD      13  1.30%    13  1.30%     15  0.15%    15  0.15%
* * * PRINT OUT OF MULTIPLE COMPARISON ERROR RATES * * *

GROUPS (k): 5     SIZE (J): 3     COMPUTATION TIME: 00:20:00

r        2:      3:      4:      5:
F      2.517   2.517   2.517   2.517
Q      2.825   3.393   3.730   3.970
M      2.822   2.972   3.072   3.135
QT     3.968   3.968   3.968   3.968

F-TESTS: 1000    SIG F-TESTS: 49    E: 50    COMPARISONS: 10000
PERCENT SIG: 0.049    E: 0.050

           EXPERIMENTWISE              COMPARISONWISE
        ALL ANOVAS   SIG ANOVAS    ALL ANOVAS   SIG ANOVAS
LSD     279 27.90%                 516  5.16%
FLSD                  49  4.90%                 174  1.74%
MRT     191 19.10%    49  4.90%    343  3.43%   147  1.47%
SNK      48  4.80%    36  3.60%     80  0.80%    68  0.68%
HSD      48  4.80%    36  3.60%     65  0.65%    53  0.53%
SSD      18  1.80%    18  1.80%     26  0.26%    26  0.26%
* * * PRINT OUT OF MULTIPLE COMPARISON ERROR RATES * * *

GROUPS (k): 5     SIZE (J): 4     COMPUTATION TIME: 00:25:41

r        2:      3:      4:      5:
F      2.483   2.483   2.483   2.483
Q      2.813   3.377   3.705   3.945
M      2.804   2.954   3.054   3.123
QT     3.942   3.942   3.942   3.942

F-TESTS: 1000    SIG F-TESTS: 47    E: 50    COMPARISONS: 10000
PERCENT SIG: 0.047    E: 0.050

           EXPERIMENTWISE              COMPARISONWISE
        ALL ANOVAS   SIG ANOVAS    ALL ANOVAS   SIG ANOVAS
LSD     278 27.80%                 485  4.85%
FLSD                  47  4.70%                 158  1.58%
MRT     180 18.00%    47  4.70%    316  3.16%   139  1.39%
SNK      46  4.60%    35  3.50%     70  0.70%    59  0.59%
HSD      46  4.60%    35  3.50%     56  0.56%    45  0.45%
SSD      16  1.60%    16  1.60%     18  0.18%    18  0.18%
* * * PRINT OUT OF MULTIPLE COMPARISON ERROR RATES * * *

GROUPS (k): 5     SIZE (J): 5     COMPUTATION TIME: 00:31:35

r        2:      3:      4:      5:
F      2.450   2.450   2.450   2.450
Q      2.800   3.360   3.680   3.920
M      2.792   2.943   3.042   3.112
QT     3.917   3.917   3.917   3.917

F-TESTS: 1000    SIG F-TESTS: 53    E: 50    COMPARISONS: 10000
PERCENT SIG: 0.053    E: 0.050

           EXPERIMENTWISE              COMPARISONWISE
        ALL ANOVAS   SIG ANOVAS    ALL ANOVAS   SIG ANOVAS
LSD     294 29.40%                 508  5.08%
FLSD                  53  5.30%                 169  1.69%
MRT     189 18.90%    53  5.30%    316  3.16%   149  1.49%
SNK      46  4.60%    38  3.80%     88  0.88%    80  0.80%
HSD      46  4.60%    38  3.80%     71  0.71%    63  0.63%
SSD      19  1.90%    19  1.90%     28  0.28%    28  0.28%
* * PRINT OUT OF MULTIPLE COMPARISON ERROR RATES * *
GROUPS (k): SIZE (J):
5 6
COMPUTATION TIME: 00:27:54
F 2.433 2.483 2. 483 2.483
Q 2.813 3.377 3.705 3.945
M 2.804 2.954 3 .054 3. 123
QT 3.942 3.94 2 3 .94 2 3.942
F-TESTS: 1000 SIG F-TESTS: 46 E: 50 PERCENT SIG: 0.046 E: 0.050
COMPARISONS 10000
EXPERIMENTWISE
ALL ANOVAS SIG ANOVAS
COMPARISONWISE
ALL ANOVAS SIG ANOVAS
LSD 295 29.50%
531 5.31%
FLSD
MRT
SNK
178 17.80%
45 4 . 50%
46 4 . 60%
45 4.50%
36 3 . 60%
299 2.99%
65 0.65%
1 . 38%
113 1 . 13%
0 . 565
HSD-SS
HSD-TK
SSD
19 1 . 90%
47 4 . 70%
13 1 . 30%
-UNEQUAL N HSD
17 1 . 70%
38 3.80%
27 0.27%
61 0 . 6 1 %
-UNEQUAL N HSD
13 1 . 30%
20 0 . 20%
z o 0 . 25%
52 0.52%
20 0 . 20%
PRINT OUT OF MULTIPLE COMPARISON ERROR RATES * * *
GROUPS (k) SIZE (J):
5 7
COMPUTATION TIME: 00:41:37
F 2.437 2.437 2.437 2.437
Q 2. 783 3.331 3.651 3.885
M 2. 779 2.929 3.029 3.099
QT 3.883 3.883 3.883 3 .883
F-TESTS: 1000 SIG F-TESTS: 49 E: 50 PERCENT SIG: 0.049 E: 0.050
COMPARISONS: 10000
EXPERIMENTWISE COMPARISONWISE —
ALL ANOVAS SIG ANOVAS ALL ANOVAS SIG ANOVA
LSD 299 528 29.90% 5 . 28%
FLSD 49 158 4.90% 1 . 58%
MRT 188 48 328 142 18.80% 4.80% 3.28% 1.42%
SNK 51 35 84 6 7 5.10% 3. 50% 0 .84% 0.6 7%
UNEQUAL N HSD
HSD-SS 41 29 53 41 4.10% 2.90% 0. 53% 0.41%
HSD-TK 56 39 77 60 5 . 60% 3 . 90% 0.7 7% 0. 60%
----- UNEQUAL N HSD -----
SSD 17 17 24 24 1.70% 1.70% 0.24% 0.24%
* * * PRINT OUT OF MULTIPLE COMPARISON ERROR RATES * * *

GROUPS (k): 6     SIZE (J): 1     COMPUTATION TIME: 00:10:24

r        2:      3:      4:      5:      6:
F      2.620   2.620   2.620   2.620   2.620
Q      2.920   3.530   3.900   4.170   4.370
M      2.920   3.070   3.150   3.220   3.280
QT     4.373   4.373   4.373   4.373   4.373

F-TESTS: 1000    SIG F-TESTS: 53    E: 50    COMPARISONS: 15000
PERCENT SIG: 0.053    E: 0.050

           EXPERIMENTWISE              COMPARISONWISE
        ALL ANOVAS   SIG ANOVAS    ALL ANOVAS   SIG ANOVAS
LSD     346 34.60%                 780  5.20%
FLSD                  53  5.30%                 240  1.60%
MRT     231 23.10%    53  5.30%    513  3.42%   209  1.39%
SNK      50  5.00%    39  3.90%    100  0.67%    89  0.59%
HSD      50  5.00%    39  3.90%     75  0.50%    64  0.43%
SSD      11  1.10%    11  1.10%     16  0.11%    16  0.11%
* * * PRINT OUT OF MULTIPLE COMPARISON ERROR RATES * * *

GROUPS (k): 6     SIZE (J): 2     COMPUTATION TIME: 00:18:17

r        2:      3:      4:      5:      6:
F      2.394   2.394   2.394   2.394   2.394
Q      2.839   3.412   3.755   3.963   4.181
M      2.839   2.989   3.086   3.149   3.206
QT     4.184   4.184   4.184   4.184   4.184

F-TESTS: 1000    SIG F-TESTS: 49    E: 50    COMPARISONS: 15000
PERCENT SIG: 0.049    E: 0.050

           EXPERIMENTWISE              COMPARISONWISE
        ALL ANOVAS   SIG ANOVAS    ALL ANOVAS   SIG ANOVAS
LSD     354 35.40%                 741  4.94%
FLSD                  49  4.90%                 206  1.37%
MRT     229 22.90%    49  4.90%    461  3.07%   177  1.18%
SNK      49  4.90%    40  4.00%     89  0.59%    80  0.53%
HSD      49  4.90%    40  4.00%     72  0.48%    63  0.42%
SSD      11  1.10%    11  1.10%     13  0.09%    13  0.09%
164

* * * PRINT OUT OF MULTIPLE COMPARISON ERROR RATES * * *

GROUPS (k): 6     SIZE (J): 3
COMPUTATION TIME: 00:24:0*

r        F        Q        M        QT
2:     2.338    2.818    2.812    4.136
3:     2.338    3.384    2.962    4.136
4:     2.338    3.716    3.062    4.136
5:     2.338    3.956    3.128    4.136
6:     2.338    4.136    3.188    4.136

F-TESTS: 1000   SIG F-TESTS: 53   E: 50   PERCENT SIG: 0.053   E: 0.050
COMPARISONS: 15000

                 EXPERIMENTWISE                    COMPARISONWISE
            ALL ANOVAS     SIG ANOVAS        ALL ANOVAS     SIG ANOVAS
LSD         350  35.00%                      756   5.04%
FLSD         53   5.30%                      233   1.55%
MRT         217  21.70%     53   5.30%       460   3.07%    193   1.29%
SNK          55   5.50%     39   3.90%        98   0.65%     81   0.54%
HSD          55   5.50%     39   3.90%        81   0.54%     65   0.43%
SSD          21   2.10%     21   2.10%        28   0.19%     28   0.19%
165

* * * PRINT OUT OF MULTIPLE COMPARISON ERROR RATES * * *

GROUPS (k): 6     SIZE (J): 4
COMPUTATION TIME: 00:30:56

r        F        Q        M        QT
2:     2.298    2.803    2.---    4.103
3:     2.298    3.364    2.945    4.103
4:     2.298    3.686    3.045    4.103
5:     2.298    3.926    3.115    4.103
6:     2.298    4.106    3.175    4.103

F-TESTS: 1000   SIG F-TESTS: 51   E: 50   PERCENT SIG: 0.051   E: 0.050
COMPARISONS: 15000

                 EXPERIMENTWISE                    COMPARISONWISE
            ALL ANOVAS     SIG ANOVAS        ALL ANOVAS     SIG ANOVAS
LSD         359  35.90%                      766   5.11%
FLSD         51   5.10%                      228   1.52%
MRT         225  22.50%     51   5.10%       455   3.03%    187   1.25%
SNK          48   4.80%     38   3.80%        95   0.63%     85   0.57%
HSD          48   4.80%     38   3.80%        66   0.44%     56   0.37%
SSD          11   1.10%     11   1.10%        15   0.10%     15   0.10%
166
* * * PRINT OUT OF MULTIPLE COMPARISON ERROR RATES * * *

GROUPS (k): 6     SIZE (J): 5
COMPUTATION TIME: 00:39:24

r        F        Q        M        QT
2:     2.281    2.788    2.783    4.070
3:     2.281    3.340    2.933    4.070
4:     2.281    3.660    3.033    4.070
5:     2.281    3.896    3.103    4.070
6:     2.281    4.072    3.163    4.070

F-TESTS: 1000   SIG F-TESTS: 54   E: 50   PERCENT SIG: 0.054   E: 0.050
COMPARISONS: 15000

                 EXPERIMENTWISE                    COMPARISONWISE
            ALL ANOVAS     SIG ANOVAS        ALL ANOVAS     SIG ANOVAS
LSD         375  37.50%                      767   5.11%
FLSD         54   5.40%                      231   1.54%
MRT         222  22.20%     54   5.40%       453   3.02%    193   1.29%
SNK          48   4.80%     34   3.40%        87   0.58%     73   0.49%
HSD          48   4.80%     34   3.40%        69   0.46%     55   0.37%
SSD          13   1.30%     13   1.30%        16   0.11%     16   0.11%
167
* * * PRINT OUT OF MULTIPLE COMPARISON ERROR RATES * * *

GROUPS (k): 6     SIZE (J): 6
COMPUTATION TIME: 00:38:04

r        F        Q        M        QT
2:     2.287    2.796    2.789    4.086
3:     2.287    3.353    2.939    4.086
4:     2.287    3.673    3.039    4.086
5:     2.287    3.911    3.109    4.086
6:     2.287    4.089    3.169    4.086

F-TESTS: 1000   SIG F-TESTS: 46   E: 50   PERCENT SIG: 0.046   E: 0.050
COMPARISONS: 15000

                 EXPERIMENTWISE                    COMPARISONWISE
            ALL ANOVAS     SIG ANOVAS        ALL ANOVAS     SIG ANOVAS
LSD         342  34.20%                      741   4.94%
FLSD         46   4.60%                      195   1.30%
MRT         192  19.20%     45   4.50%       387   2.58%    149   0.99%
SNK          39   3.90%     28   2.80%        70   0.47%     58   0.39%
             UNEQUAL N HSD
HSD-SS       19   1.90%     18   1.80%        21   0.14%     20   0.13%
HSD-TK       46   4.60%     34   3.40%        65   0.43%     52   0.35%
SSD          13   1.30%     13   1.30%        14   0.09%     14   0.09%
168
* * * PRINT OUT OF MULTIPLE COMPARISON ERROR RATES * * *

GROUPS (k): 6     SIZE (J): 7
COMPUTATION TIME: 00:48:19

r        F        Q        M        QT
2:     2.270    2.773    2.772    4.037
3:     2.270    3.315    2.922    4.037
4:     2.270    3.635    3.022    4.037
5:     2.270    3.866    3.092    4.037
6:     2.270    4.037    3.152    4.037

F-TESTS: 1000   SIG F-TESTS: 48   E: 50   PERCENT SIG: 0.048   E: 0.050
COMPARISONS: 15000

                 EXPERIMENTWISE                    COMPARISONWISE
            ALL ANOVAS     SIG ANOVAS        ALL ANOVAS     SIG ANOVAS
LSD         361  36.10%                      763   5.09%
FLSD         48   4.80%                      196   1.31%
MRT         206  20.60%     45   4.50%       400   2.67%    147   0.98%
SNK          46   4.60%     31   3.10%        85   0.57%     70   0.47%
             UNEQUAL N HSD
HSD-SS       36   3.60%     25   2.50%        49   0.33%     38   0.25%
HSD-TK       51   5.10%     33   3.30%        73   0.49%     55   0.37%
SSD          11   1.10%     11   1.10%        16   0.11%     16   0.11%
APPENDIX F
Articles Citing the Findings of Carmer and Swanson 1973
Articles Which Used FLSD
1. Atchley, W. R.; Rutledge, J. J.; Cowley, D. E., "A Multivariate Statistical Analysis of Direct and Correlated Response to Selection in the Rat," Evolution, XXXVI (July 1982), pp. 677-698.

2. Bryant, Edwin H., "Morphometric Adaptation of the Housefly, MUSCA DOMESTICA L., in the United States," Evolution, XXXI (September 1977), pp. 580-596.

3. _______ and Turner, Carl R., "Comparative Morphometric Adaptation in the Housefly and Facefly in the United States," Evolution, XXXII (December 1978), pp. 759-770.

4. Cameron, Guy N. and Kincaid, W. Bradley, "Species Removal Effects on Movements of Sigmodon hispides [cotton rats] and Reithrodontomys fulvescens [harvest mice]," American Midland Naturalist, CVIII (July 1982), pp. 60-67.

5. Cardon, Kathleen; Anthony, Rita Jo; Hendricks, Deloy G.; and Mahoney, Arthur W., "Effect of Atmospheric Oxidation on Bioavailability of Meat Iron and Liver Weights in Rats," Journal of Nutrition, CX (March 1980), pp. 567-574.

6. Dhingra, O. D. and Sinclair, J. B., "Survival of Macrophomina phaseolina Sclerotia in Soil: Effects of Soil Moisture, Carbon: Nitrogen Ratios, Carbon Sources, and Nitrogen Concentrations," Phytopathology, LXV (March 1975), pp. 236-240.
169
170
7. Fajemisin, J. M. and Hooker, A. L., "Predisposition to Diplodia Stalk Rot in Corn Affected by Three Helminthosporium Leaf Blights," Phytopathology, LXIV (December 1974), pp. 1496-1499.

8. _______, "Top Weight, Root Weight, and Root Rot of Corn Seedlings as Influenced by Three Helminthosporium Leaf Blights," Plant Disease Reporter, LVIII (April 1974), pp. 313-317.

9. Farmer, Bonnie R.; Mahoney, Arthur W.; Hendricks, Deloy G.; and Gillett, Tedford, "Iron Bioavailability of Hand-Deboned and Mechanically Deboned Beef," Journal of Food Science, XLII (November-December 1977), pp. 1630-1632.

10. Friedrich, J. W.; Smith, Dale; and Schrader, L. E., "Herbage Yield and Chemical Composition of Switchgrass as Affected by N, S, and K Fertilizations," Agronomy Journal, LXIX (January-February 1977), pp. 30-33.

11. Fritzell, Erik K., "Habitat Use by Prairie Raccoons During the Waterfowl Breeding Season," Journal of Wildlife Management, XLII (January 1978), pp. 118-127.

12. Garcia-de-Siles, J. L.; Ziegler, J. H.; and Wilson, L. L., "Effects of Marbling and Conformation Scores on Quality and Quantity Characteristics of Steer and Heifer Carcasses," Journal of Animal Science, XLIV (January 1977), pp. 36-46.

13. _______, "Prediction of Beef Quality by Three Grading Systems," Journal of Food Science, XLII (May-June 1977), pp. 711-715.

14. _______, "Growth, Carcass, and Muscle Characters of Hereford and Holstein Steers," Journal of Animal Science, XLIV (June 1977), pp. 973-984.

15. Hammerstedt, Roy H. and Hay, Sandra R., "Effect of Incubation Temperature on Motility and cAMP Content of Bovine Sperm," Archives of Biochemistry and Biophysics, CXCIX (February 1980), pp. 427-437.

16. Harrison, R. G. and Massaro, T. A., "Influence of Oxygen and Glucose on the Water and Ion Content of Swine Aorta," American Journal of Physiology, CCXXXI (December 1976), pp. 1800-1805.
171
17. Ilyas, M. B.; Ellis, M. A.; and Sinclair, J. B., "Evaluation of Soil Fungicides for Control of Charcoal Rot of Soybeans," Plant Disease Reporter, LIX (April 1975), pp. 360-364.

18. Krapu, Gary and Swanson, George, "Some Nutritional Aspects of Reproduction in Prairie Nesting Pintails," Journal of Wildlife Management, XXXIX (January 1975), pp. 156-162.

19. Lorenz, K. and Dilsaver, W., "Microwave Heating of Food Materials at Various Altitudes," Journal of Food Science, XLI (May-June 1976), pp. 699-702.

20. Mahoney, Arthur W. and Hendricks, Deloy G., "Some Effects of Different Phosphate Compounds on Iron and Calcium Absorption," Journal of Food Science, XLV (September-October 1978), pp. 1473-1476.

21. _______ and Gillett, Tedford, "Effect of Sodium Nitrate on the Bioavailability of Meat Iron for the Anemic Rat," Journal of Nutrition, CIX (December 1979).

22. _______; Farmer, Bonnie R.; and Hendricks, Deloy G., "Effects of Level and Source of Dietary Fat on the Bioavailability of Iron from Turkey Meat for the Anemic Rat," Journal of Nutrition, CX (August 1980), pp. 1703-1708.

23. Mills, David E. and Robertshaw, David, "Response of Plasma Prolactin to Changes in Ambient Temperature and Humidity in Man," Journal of Clinical Endocrinology and Metabolism, LII (February 1981), pp. 279-283.

24. Richards, J. Scott; Hurt, Michael; and Melamed, Laurence, "Spinal Cord Injury: A Sensory Restriction Perspective," Archives of Physical Medicine and Rehabilitation, LXIII (May 1982), pp. 195-199.

25. Rominger, R. S.; Smith, Dale; Petersen, L. A., "Yields and Elemental Composition of Alfalfa Plant Parts at Late Bud Under Two Fertility Levels," Canadian Journal of Plant Science, LV (January 1975), pp. 69-75.
26. _______, "Chemical Composition of Alfalfa as Influenced by High Rates of K Topdressed as KCl and K2SO4," Agronomy Journal, LXVIII (July-August 1976), pp. 573-577.
172
27. Smith, Dale and Rominger, R. S., "Distribution of Elements Among Individual Parts of the Orchard Grass Shoot and Influence of Two Fertility Levels," Canadian Journal of Plant Science, LIV (July 1974), pp. 485-494.

28. Solso, Robert L. and McCarthy, Judith E., "Prototype Formation of Faces: A Case of Pseudo-memory," British Journal of Psychology, LXXII (November 1981), pp. 499-503.

29. Thatcher, R. W.; Lester, M. L.; McAlaster, R.; and Horst, R., "Effects of Low Levels of Cadmium and Lead on Cognitive Functioning in Children," Archives of Environmental Health, XXXVII (May-June 1982), pp. 159-166.

30. Volenec, Jeff; Smith, Dale; Soberalske, R. M.; and Ream, H. W., "Greenhouse Alfalfa Yields With Single and Split Applications of Deproteinized Alfalfa Juice," Agronomy Journal, LXXI (July-August 1979), pp. 695-697.
Articles That Cited Carmer and Swanson But Did Not Specifically State Whether
the LSD was Protected
1. Hagman, Joseph D. and Williams, Evelyn, "Use of Distance and Location in Short Term Motor Memory," Perceptual and Motor Skills, XLIV (June 1977), pp. 867-873.

2. Jensen, Craig, "Generality of Learning Differences in Brain-Weight-Selected Mice," Journal of Comparative and Physiological Psychology, XCI (June 1977).

3. Parker, Robert J.; Hartman, Kathleen D.; and Sieber, Susan M., "Lymphatic Absorption and Tissue Disposition of Liposome-entrapped [14C]Adriamycin Following Intraperitoneal Administration to Rats," Cancer Research, XLI (April 1981), pp. 1311-1317.

4. Spring, David R. and Dale, Philip S., "Discrimination of Linguistic Stress in Early Infancy," Journal of Speech and Hearing Research, XX (June 1977), pp. 224-232.
173
Articles That Used the Bayes Exact Test, A Secondary Recommendation

1. Chamblee, Rick W.; Thompson, Layfayette; and Bunn, Tommie, "Management of Broadleaf Signalgrass (Brachiaria platyphylla) in Peanuts (Arachis hypogaea) with Herbicides," Weed Science, XXX (January 1982), pp. 40-44.

2. _______ and Coble, Harold, "Interference of Broadleaf Signalgrass (Brachiaria platyphylla) in Peanuts (Arachis hypogaea)," Weed Science, XXX (January 1982), pp. 45-49.

3. Johnson, Douglas H., "The Comparison of Usage and Availability Measurements For Evaluating Resource Preference," Ecology, LXI (February 1980), pp. 65-71.

4. Santos, P. F. and Whitford, W. G., "The Effects of Microarthropods on Litter Decomposition in a Chihuahuan Desert Ecosystem," Ecology, LXII (June 1981), pp. 654-663.
Articles That Used Unrecommended Procedures
1. May, Philip R. A.; Tuma, A. H.; and Yale, Coralee, "Schizophrenia: A Follow-Up Study of Results of Treatment," Archives of General Psychiatry, XXXIII (April 1976), pp. 481-486.

2. Nilwik, H. J. M., "Growth Analysis of Sweet Pepper (Capsicum annuum L.): The Influence of Irradiance and Temperature Under Greenhouse Conditions in Winter," Annals of Botany, XLVIII (August 1981), pp. 129-136.

3. Rees, R. G.; Thompson, J. P.; and Mayer, R. J., "Slow Rusting and Tolerance to Rusts in Wheat: The Progress and Effects of Epidemics of Puccinia graminis tritici in Selected Wheat Cultivars," Australian Journal of Agricultural Research, XXX (May 1979), pp. 403-419.
174
Articles on Multiple Comparisons that Clearly Support the Carmer and Swanson Findings

1. Keselman, H. J. and Rogan, Joanne C., "An Evaluation of Some Non-Parametric and Parametric Tests for Multiple Comparisons," British Journal of Mathematical and Statistical Psychology, XXX (May 1977), pp. 125-133.

2. _______; Games, Paul; and Rogan, Joanne C., "Protecting the Overall Rate of Type I Errors for Pairwise Comparisons With an Omnibus Test Statistic," Psychological Bulletin, LXXXVI (July 1979), pp. 884-888.

3. Wike, Edward L. and Church, James D., "Further Comments on Nonparametric Multiple Comparison Tests," Perceptual and Motor Skills, XLV (December 1977), pp. 917-918.
Articles That Made a Passing Reference To the Carmer and Swanson Studies
1. Adwinckle, Herb S.; Polach, F. J.; Molin, W. T.; and Pearson, R. C., "Pathogenicity of Phytophthora cactorum Isolates from New York Apple Trees and Other Sources," Phytopathology, LXV (September 1975), pp. 989-994.

2. Carmer, S. G., "Optimal Significant Levels for Application of the Least Significant Difference in Crop Performance Trials," Crop Science, XVI (January-February 1976), pp. 95-99.

3. Daniel, Wayne W.; Coogler, Carol G., "Statistical Applications in Physical Medicine," American Journal of Physical Medicine, LIV (February 1975).

4. Gill, J. L., "Evolution of Statistical Design and Analysis of Experiments," Journal of Dairy Science, LXIV (June 1981), pp. 1494-1519.

5. Kemp, K. E., "Multiple Comparisons: Comparisonwise and Experimentwise Type I Error Rates and Their Relationship to Power," Journal of Dairy Science, LVIII (September 1975), pp. 1372-1378.
175
6. Keselman, H. J. and Rogan, Joanne C., "The Tukey Multiple Comparison Test: 1953-1976," Psychological Bulletin, LXXXIV (September 1977), pp. 1050-1056.

7. Madden, L. V.; Knoke, J. K.; and Raymond, Louie, "Considerations for the Use of Multiple Comparison Procedures in Phytopathological Investigations," Phytopathology, LXXII (August 1982), pp. 1015-1017.

8. Petersen, R. G., "Use and Misuse of Multiple Comparison Procedures," Agronomy Journal, LXIX (March-April 1977), pp. 205-208.

Articles Openly Critical of the Carmer and Swanson Findings

1. Einot, Israel and Gabriel, K. R., "A Study of Powers of Several Methods of Multiple Comparisons," Journal of the American Statistical Association, LXX (1975), pp. 574-583.

2. Games, Paul, "A Three-Factor Model Encompassing Many Possible Statistical Tests on Independent Groups," Psychological Bulletin, LXXXV (January 1978), pp. 168-182.

3. Ryan, T. A., "Comment on 'Protecting the Overall Rate of Type I Errors for Pairwise Comparisons With an Omnibus Test Statistic,'" Psychological Bulletin, LXXXVIII (September 1980), pp. 354-355.
APPENDIX G
TABLES XI TO XIV SHOWING CRITICAL DIFFERENCES FOR EACH MULTIPLE COMPARISON PROCEDURE FOR
EQUAL N'S FOR k=3 TO k=6
TABLE XI
CRITICAL DIFFERENCES FOR EACH OF THE TESTING PROCEDURES FOR k=3 AND J=1 TO 5
J   r    i*   (F)LSD    MRT       SNK       HSD      SSD
              q(2)      m(r)      q(r)      q(k)

1   2   728   13.768    13.768    13.768
    3                   14.439    16.853
2   2   758   10.673    10.673    10.673
    3                   11.206    12.896
3   2    53    8.908     8.908     8.908
    3                    9.376    10.714
4   2   117    7.331     7.331     7.331
    3                    7.719     8.809
5   2   803    8.572     8.572     8.572
    3                    9.019    10.296

*i = randomly selected iteration from the 1000 repetitions
176
177
TABLE XII
CRITICAL DIFFERENCES FOR EACH OF THE TESTING PROCEDURES FOR k=4 AND J=1 TO 5
J   r    i    (F)LSD    MRT       SNK       HSD       SSD
              q(2)      m(r)      q(r)      q(k)

1   2   165   17.115    17.115    17.115
    3                   17.971    20.823
    4                   18.427    23.105    23.105    25.154
2   2   652    9.845     9.845     9.845
    3                   10.359    11.860
    4                   10.654    13.074    13.074    14.229
3   2   249   10.035    10.035    10.035
    3                   10.565    12.059
    4                   10.912    13.269    13.269    14.440
4   2   683    9.284     9.284     9.284
    3                    9.765    11.151
    4                   10.094    12.252    12.252    13.337
5   2   191    7.295     7.295     7.295
    3                    7.661     8.759
    4                    7.921     9.609     9.609    10.465
178
TABLE XIII
CRITICAL DIFFERENCES FOR EACH OF THE TESTING PROCEDURES FOR k=5 AND J=1 TO 5
J   r    i    (F)LSD    MRT       SNK       HSD       SSD
              q(2)      m(r)      q(r)      q(k)

1   2   382   16.028    16.028    16.028
    3                   16.843    19.141
    4                   17.278    21.515
    5                   17.658    22.982    22.982    26.034
2   2   224   11.319    11.319    11.319
    3                   11.914    13.611
    4                   12.281    14.990
    5                   12.549    15.922    15.922    18.063
3   2   869    9.462     9.462     9.462
    3                    9.956    11.365
    4                   10.291    12.493
    5                   10.500    13.297    13.297    15.028
4   2   295    7.143     7.143     7.143
    3                    7.502     8.576
    4                    7.756     9.410
    5                    7.930    10.019    10.019    11.320
5   2   793    7.860     7.860     7.860
    3                    8.260     9.432
    4                    8.541    10.330
    5                    8.737    11.004    11.004    12.427
179
TABLE XIV
CRITICAL DIFFERENCES FOR EACH OF THE TESTING PROCEDURES FOR k=6 AND J=1 TO 5

J   r    i    (F)LSD    MRT       SNK       HSD       SSD
              q(2)      q'(r)     q(r)      q(k)

1   2   619   18.872    18.872    18.872
    3                   19.841    22.814
    4                   20.358    25.206
    5                   20.811    26.951
    6                   21.199    28.243    28.243    33.081
2   2   298   11.576    11.576    11.576
    3                   12.188    13.912
    4                   12.583    15.311
    5                   12.840    16.159
    6                   13.073    17.048    17.048    19.951
3   2   856    9.391     9.391     9.391
    3                    9.871    11.278
    4                   10.205    12.384
    5                   10.425    13.184
    6                   10.625    13.784    13.784    16.114
4   2   313    7.170     7.170     7.170
    3                    7.532     8.605
    4                    7.788     9.428
    5                    7.967    10.042
    6                    8.121    10.503    10.503    12.262
5   2   326    6.858     6.858     6.858
    3                    7.216     8.216
    4                    7.462     9.003
    5                    7.634     9.538
    6                    7.781    10.016    10.016    11.748
APPENDIX H
TABLES XV TO XVIII SHOWING CRITICAL DIFFERENCES FOR EACH MULTIPLE COMPARISON PROCEDURE FOR
UNEQUAL N'S FOR k=3 TO k=6
TABLE XV
CRITICAL DIFFERENCES FOR EACH OF THE TESTING PROCEDURES FOR k=3 AND J=6 TO 7
J   d*   i    (F)LSD   MRT     SNK     HSD-SS   HSD-TK   SSD
              q(2)     m(r)    q(r)    q(k)     q(k)

6   1   728   10.60    11.16   12.75   13.32    12.75    14.00
    2         10.06    --      --      12.64    12.09    14.00
    3          8.87    --      --      11.14    10.67    11.43
7   1   895    7.22     7.58    8.66    9.03     8.66     8.67
    2          5.71    --      --       7.14     6.85     8.67
    3          5.71    --      --       7.14     6.85     8.67

*d = order of mean difference comparison tested
180
181
TABLE XVI
CRITICAL DIFFERENCES FOR EACH OF THE TESTING PROCEDURES FOR k=4 AND J=6 TO 7
J   d    i    (F)LSD   MRT     SNK     HSD-SS   HSD-TK   SSD
              q(2)     m(r)    q(r)    q(k)     q(k)

6   1   951    8.66     9.42   11.44   12.22    11.44    12.45
    2          7.61     8.01   11.44   10.59    10.05    10.93
    3          9.82     9.41   --      14.97    12.97    14.11
    4         10.32    10.89   --      14.97    13.67    14.88
    5          9.49    --      --      14.97    12.53    13.64
    6          8.28    --      --      12.22    10.94    11.90
7   1   833    7.99     8.69   10.50   10.51    10.50    11.47
    2          6.32    --      --      10.51     8.30     9.07
    3          7.99    --      --      10.51    10.50    11.47
    4          7.99    --      --      10.51    10.50    11.47
    5          6.32    --      --      10.51     8.30     9.07
    6          6.32    --      --      10.51     8.30     9.07
182
TABLE XVII
CRITICAL DIFFERENCES FOR EACH OF THE TESTING PROCEDURES FOR k=5 AND J=6 TO 7
J   d    i    (F)LSD   MRT     SNK     HSD-SS   HSD-TK   SSD
              q(2)     m(r)    q(r)    q(k)     q(k)

6   1   571    9.62    10.68   13.49   16.51    13.49    15.24
    2         10.20    --      --      16.51    14.31    16.16
    3          9.85    --      --      16.51    13.82    15.62
    4         10.75    --      --      16.51    15.08    17.04
    5          8.33    --      --      13.48    11.68    13.20
    6          8.99    --      --      13.48    12.62    14.25
    7          8.60    --      --      13.48    12.06    13.63
    8          7.13    --      --      10.44    10.00    11.30
    9          7.90    --      --      11.67    11.08    12.52
   10          7.60    --      --      11.67    10.66    12.05
7   1    43    7.44     8.29   10.39   10.38    10.39    11.80
    2          5.88    --      10.39   10.38     8.21     9.33
    3          7.44    --      --      10.38    10.39    11.80
    4          7.44    --      --      10.38    10.39    11.80
    5          7.44    --      --      10.38    10.39    11.80
    6          5.88    --      --      10.38     8.21     9.33
    7          7.44    --      --      10.38    10.39    11.80
    8          7.44    --      --      10.38    10.39    11.80
    9          5.88    --      --      10.38     8.21     9.33
   10          5.88    --      --      10.38     8.21     9.33
183
TABLE XVIII
CRITICAL DIFFERENCES FOR EACH OF THE TESTING PROCEDURES FOR k=6 AND J=6 TO 7
J   d    i    (F)LSD   MRT     SNK     HSD-SS   HSD-TK   SSD
              q(2)     m(r)    q(r)    q(k)     q(k)

6   1   793   10.06    11.41   14.72   18.34    14.72    17.21
    2          8.66    --      14.72   14.98    12.67    14.81
    3          7.87    --      --      12.97    11.51    13.46
    4          6.98    --      --      10.59    10.21    11.94
    5          7.35    --      --      11.60    10.75    12.57
    6         10.50    --      --      18.34    15.36    17.96
    7          9.17    --      --      14.98    13.41    15.68
    8          8.42    --      --      12.97    12.32    14.40
    9          7.60    --      --      11.60    11.12    13.00
   10         10.25    --      --      18.34    14.99    17.53
   11          8.87    --      --      14.98    12.98    15.18
   12          8.10    --      --      12.97    11.85    13.86
   13         10.87    --      --      18.34    15.90    18.59
   14          9.59    --      --      14.98    14.02    16.40
   15         11.46    --      --      18.34    16.76    19.60
7   1   601    8.39     9.54   12.21   12.21    12.21    14.42
    2          6.63     9.54   --      12.21     9.66    11.40
    3          8.39    --      --      12.21    12.21    14.42
    4          8.89    --      --      12.21    12.21    14.42
    5          8.89    --      --      12.21    12.21    14.42
    6          8.89    --      --      12.21    12.21    14.42
    7          6.63    --      --      12.21     9.66    11.40
    8          8.39    --      --      12.21    12.21    14.42
    9          8.39    --      --      12.21    12.21    14.42
   10          8.39    --      --      12.21    12.21    14.42
   11          6.63    --      --      12.21     9.66    11.40
   12          8.39    --      --      12.21    12.21    14.42
   13          8.39    --      --      12.21    12.21    14.42
   14          6.63    --      --      12.21     9.66    11.40
   15          6.63    --      --      12.21     9.66    11.40
APPENDIX I
This appendix contains summaries of the z-tests performed between the HSD (and its modifications) and the FLSD multiple comparison procedures. Table XXV shows the results of the z-tests when the Bernhardson formulas are applied to both the HSD and LSD procedures. Table XXVI shows the results of the z-tests when the unprotected HSD is compared with the FLSD. In every case, significant z scores were produced by the conservatism of the HSD rather than the liberalism of the FLSD.
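The z statistic in Tables XXV and XXVI compares two experimentwise error proportions. Judging from the printed standard errors, the denominator used is sqrt(p(1-p)/N), where N = 1000 and p is the mean of the two proportions being compared. The sketch below (Python; not the dissertation's original program, and the pooling assumption is inferred from the printed values) reproduces two rows of Table XXV:

```python
import math

def z_two_proportions(count1, count2, n=1000):
    """z for the difference of two error proportions out of n experiments,
    using the pooled-proportion standard error that matches Tables XXV-XXVI."""
    p1, p2 = count1 / n, count2 / n
    p_bar = (p1 + p2) / 2
    se = math.sqrt(p_bar * (1 - p_bar) / n)
    return (p1 - p2) / se, se

# k=3, J=6 (Spjotvoll-Stoline row): FLSD count 53 vs. protected HSD count 34
z, se = z_two_proportions(53, 34)
assert round(se, 4) == 0.0065 and round(z, 4) == 2.9456   # significant (> 1.96)

# k=3, J=1: FLSD 50 vs. HSD 45 -- not significant
z, se = z_two_proportions(50, 45)
assert round(z, 4) == 0.7433
```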
184
185
TABLE XXV
Z-TESTS FOR SIGNIFICANT DIFFERENCE OF PROPORTIONS BETWEEN EXPERIMENTWISE TYPE I ERROR RATES FOR THE HSD AND
FLSD MULTIPLE COMPARISON PROCEDURES
             FLSD           HSD         Standard error      z
k   J     CNT     %      CNT     %      of Difference     score     SIG?

3   1      50   .050      45   .045       0.0067          0.7433     --
3   2      52   .052      47   .047       0.0069          0.7289     --
3   3      50   .050      46   .046       0.0068          0.5917     --
3   4      50   .050      48   .048       0.0068          0.2930     --
3   5      54   .054      51   .051       0.0071          0.4254     --
3   6*     53   .053      34   .034       0.0065          2.9456     Yes
3   6**    53   .053      51   .051       0.0070          0.2849     --
3   7*     53   .053      21   .021       0.0060          5.3609     Yes
3   7**    53   .053      48   .048       0.0069          0.7221     --
4   1      51   .051      40   .040       0.0066          1.6692     --
4   2      54   .054      46   .046       0.0069          1.1608     --
4   3      53   .053      48   .048       0.0069          0.7221     --
4   4      45   .045      36   .036       0.0062          1.4438     --
4   5      54   .054      45   .045       0.0069          1.3121     --
4   6*     46   .046      24   .024       0.0058          3.7855     Yes
4   6**    46   .046      40   .040       0.0064          0.9353     --
4   7*     45   .045      23   .023       0.0057          3.8388     Yes
4   7**    45   .045      38   .038       0.0063          1.1099     --
5   1      49   .049      39   .039       0.0065          1.5419     --
5   2      45   .045      37   .037       0.0063          1.2758     --
5   3      49   .049      36   .036       0.0064          2.0379     Yes
5   4      47   .047      35   .035       0.0063          1.9137     --
5   5      53   .053      38   .038       0.0066          2.2761     Yes
5   6*     46   .046      17   .017       0.0055          5.2504     Yes
5   6**    46   .046      38   .038       0.0063          1.2612     --
5   7*     49   .049      29   .029       0.0061          3.2669     Yes
5   7**    49   .049      39   .039       0.0065          1.5419     --
6   1      53   .053      39   .039       0.0066          2.1134     Yes
6   2      49   .049      40   .040       0.0065          1.3802     --
6   3      53   .053      39   .039       0.0066          2.1134     Yes
6   4      51   .051      38   .038       0.0065          1.9936     Yes
6   5      54   .054      34   .034       0.0065          3.0837     Yes
6   6*     46   .046      18   .018       0.0056          5.0309     Yes
6   6**    46   .046      34   .034       0.0062          1.9365     --
6   7*     48   .048      25   .025       0.0059          3.8784     Yes
6   7**    48   .048      33   .033       0.0062          2.4063     Yes

*Spjøtvoll-Stoline modification     **Tukey-Kramer modification
186
TABLE XXVI
Z-TESTS FOR SIGNIFICANT DIFFERENCE OF PROPORTIONS BETWEEN EXPERIMENTWISE TYPE I ERROR RATES FOR THE UNPROTECTED
HSD AND FLSD MULTIPLE COMPARISON PROCEDURES
             FLSD           HSD         Standard error      z
k   J     CNT     %      CNT     %      of Difference     score     SIG?

3   1      50   .050      48   .048       0.0068          0.2930     --
3   2      52   .052      51   .051       0.0070          0.1431     --
3   3      50   .050      49   .049       0.0069          0.1458     --
3   4      50   .050      54   .054       0.0070         -0.5697     --
3   5      54   .054      54   .054       0.0071          0.0000     --
3   6*     53   .053      34   .034       0.0065          2.9456     Yes
3   6**    53   .053      54   .054       0.0071         -0.1405     --
3   7*     53   .053      26   .026       0.0062          4.3835     Yes
3   7**    53   .053      57   .057       0.0072         -0.5548     --
4   1      51   .051      46   .046       0.0068          0.7360     --
4   2      54   .054      50   .050       0.0070          0.5697     --
4   3      53   .053      55   .055       0.0071         -0.2798     --
4   4      45   .045      49   .049       0.0067         -0.5977     --
4   5      54   .054      52   .052       0.0071          0.2823     --
4   6*     46   .046      24   .024       0.0058          3.7855     Yes
4   6**    46   .046      53   .053       0.0069         -1.0205     --
4   7*     45   .045      28   .028       0.0059          2.8667     Yes
4   7**    45   .045      47   .047       0.0066         -0.3019     --
5   1      49   .049      47   .047       0.0068          0.2959     --
5   2      45   .045      45   .045       0.0066          0.0000     --
5   3      49   .049      48   .048       0.0068          0.1472     --
5   4      47   .047      46   .046       0.0067          0.1502     --
5   5      53   .053      46   .046       0.0069          1.0205     --
5   6*     46   .046      19   .019       0.0056          4.8150     Yes
5   6**    46   .046      47   .047       0.0067         -0.1502     --
5   7*     49   .049      41   .041       0.0066          1.2203     --
5   7**    49   .049      56   .056       0.0071         -0.9925     --
6   1      53   .053      50   .050       0.0070          0.4292     --
6   2      49   .049      49   .049       0.0068          0.0000     --
6   3      53   .053      55   .055       0.0071         -0.2798     --
6   4      51   .051      48   .048       0.0069          0.4374     --
6   5      54   .054      48   .048       0.0070          0.8624     --
6   6*     46   .046      19   .019       0.0056          4.8150     Yes
6   6**    46   .046      46   .046       0.0066          0.0000     --
6   7*     48   .048      36   .036       0.0063          1.8918     --
6   7**    48   .048      51   .051       0.0069         -0.4374     --

*Spjøtvoll-Stoline modification     **Tukey-Kramer modification
APPENDIX J
GRAPHIC DISPLAYS OF EXPERIMENTWISE TYPE I ERROR RATES IN RELATION TO A 0.95 CONFIDENCE INTERVAL FOR a=0.05 AND N=1000
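The 0.95 interval used in these figures follows from the normal approximation to the binomial: with a true rate of a=0.05 over N=1000 experiments, the expected error count is 50, with limits 50 ± 1.96 x sqrt(0.05 x 0.95 / 1000) x 1000, or roughly 36 to 64 errors. A minimal sketch of that computation (an assumed reconstruction, not from the dissertation):

```python
import math

# 0.95 confidence interval for the number of Type I errors in N = 1000
# experiments when the true rate is alpha = 0.05 (normal approximation).
alpha, N, z = 0.05, 1000, 1.96
half_width = z * math.sqrt(alpha * (1 - alpha) / N)
lower, upper = (alpha - half_width) * N, (alpha + half_width) * N
assert round(lower) == 36 and round(upper) == 64   # the band drawn in Figs. 7-10
```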
J:    1   2   3   4   5   6   7          1   2   3   4   5   6   7
                 k = 3                              k = 4

[Plot not reproducible from the source scan. Vertical axis: number of experimentwise Type I errors, 15 to 64; the double rules mark the upper and lower limits of the 0.95 confidence interval about 50 errors. Each procedure's count is plotted by its legend letter.]

(F)LSD   (M)RT   S(N)K   (H)SD   HSD-(T)K   (H*)SD-SS   (S)SD

Fig. 7--Graphic presentation of experimentwise Type I error rates in relation to 0.95 confidence interval generated by application of Bernhardson formulas for k=3 and k=4.
187
188
J:    1   2   3   4   5   6   7          1   2   3   4   5   6   7
                 k = 5                              k = 6

[Plot not reproducible from the source scan. Vertical axis: number of experimentwise Type I errors, 12 to 64; the double rules mark the upper and lower limits of the 0.95 confidence interval about 50 errors. Each procedure's count is plotted by its legend letter.]

(F)LSD   (M)RT   S(N)K   (H)SD   HSD-(T)K   (H*)SD-SS   (S)SD

Fig. 8--Graphic presentation of experimentwise Type I error rates in relation to 0.95 confidence interval generated by application of Bernhardson formulas for k=5 and k=6.
189
J:    1   2   3   4   5   6   7          1   2   3   4   5   6   7
                 k = 3                              k = 4

[Plot not reproducible from the source scan. Vertical axis: number of experimentwise Type I errors, 14 to 64, with the 0.95 confidence band about 50 errors. LSD and MRT counts exceed the upper limit and are printed numerically above the plot (e.g., L128 = LSD count of 128, M104 = MRT count of 104).]

(L)SD   (M)RT   S(N)K   (H)SD   HSD-(T)K   (H*)SD-SS   (S)SD

Fig. 9--Graphic presentation of experimentwise Type I error rates in relation to 0.95 confidence interval generated without prior significant F-ratio for k=3 and k=4.
190
J:    1   2   3   4   5   6   7          1   2   3   4   5   6   7
                 k = 5                              k = 6

[Plot not reproducible from the source scan. Vertical axis: number of experimentwise Type I errors, 12 to 64, with the 0.95 confidence band about 50 errors. LSD and MRT counts exceed the upper limit and are printed numerically above the plot (e.g., L278 = LSD count of 278, M193 = MRT count of 193).]

(L)SD   (M)RT   S(N)K   (H)SD   HSD-(T)K   (H*)SD-SS   (S)SD

Fig. 10--Graphic presentation of experimentwise Type I error rates in relation to 0.95 confidence interval generated without prior significant F-ratio for k=5 and k=6.
BIBLIOGRAPHY
Books
Cohen, Jacob, Statistical Power Analysis for the Behavioral Sciences, Revised edition, New York, Academic Press, 1977.
Couch, James V., Fundamentals of Statistics for the Behavioral Sciences, New York, St. Martin's Press, 1982.

Federer, Walter T., Experimental Design: Theory and Application, New York, The Macmillan Company, 1955.
Ferguson, George A., Statistical Analysis in Psychology and Education, 5th ed., New York, McGraw Hill Book Publishers, 1981.
Fisher, R. A., Statistical Methods for Research Workers, 6th ed., Edinburgh (London), Oliver and Boyd, 1936.

_______, Design of Experiments, 2nd ed., Edinburgh, Oliver and Boyd, 1937.
Fryer, H. C., Concepts and Methods of Experimental Statistics. Boston, Allyn and Bacon, 1966.
Glass, Gene V. and Hopkins, Kenneth D., Statistical Methods in Education and Psychology. 2nd ed., Englewood Cliffs, New Jersey, Prentice-Hall, Inc., 1984.
Hinkle, Dennis E.; Wiersma, William and Jurs, Stephen G., Basic Behavioral Statistics, Boston, Houghton Mifflin Company, 1982.
Howell, David C., Statistical Methods for Psychology, Boston, Duxbury Press, 1982.
Johnson, Palmer O. and Jackson, Robert W. B., Modern Statistical Methods: Descriptive and Inductive, Chicago, Rand McNally & Company, 1959.
Kirk, Roger E., Experimental Design: Procedures for the Behavioral Sciences, 2nd ed., Belmont, California, Brooks/Cole Publishing Company, 1982.
191
192
Light, Richard J. and Pillemer, David B., Summing Up: The Science of Reviewing Research, Cambridge, Harvard University Press, 1984.

Miller, R. G., Simultaneous Statistical Inference, New York, McGraw-Hill, 1966.

Pedhazur, Elazar J., Multiple Regression in Behavioral Research: Explanation and Prediction, 2nd ed., New York, Holt, Rinehart and Winston, Inc., 1982.

Roscoe, John T., Fundamental Research Statistics for the Behavioral Sciences, 2nd ed., New York, Holt, Rinehart and Winston, Inc., 1975.
Winer, B.J., Statistical Principles in Experimental Design. New York, McGraw-Hill Book Company, 1962.
Articles
Adwinckle, Herb S.; Polach, F. J.; and Molin, W. T., "Pathogenicity of Phytophthora cactorum Isolates from New York Apple Trees and Other Sources," Phytopathology, LXV (September 1975), pp. 989-994.

Aitkin, M. A., "Multiple Comparisons in Psychological Experiments," The British Journal of Mathematical and Statistical Psychology, XXII (November 1969).

Anderson, D. A., "Overall Confidence Levels of the Least Significant Difference Procedure," The American Statistician, Vol. XXVI (1972).

Atchley, W. R.; Rutledge, J. J.; Cowley, D. E., "A Multivariate Statistical Analysis of Direct and Correlated Response to Selection in the Rat," Evolution, XXXVI (July 1982), pp. 677-698.

Balaam, L. N., "Multiple Comparisons: A Sampling Experiment," Australian Journal of Statistics, Vol. V (1963).

Bernhardson, Clemens S., "375: Type I Error Rates When Multiple Comparison Procedures Follow a Significant F Test of ANOVA," Biometrics, XXXI (March 1975), pp. 229-232.

Boardman, Thomas J. and Moffitt, D. R., "Graphical Monte Carlo Type I Error Rates for Multiple Comparison Procedures," Biometrics, XXVII (September 1971), pp. 738-744.
193
Bryant, Edwin H., "Morphometric Adaptation of the Housefly, MUSCA DOMESTICA L., in the United States," Evolution, XXXI (September 1977), pp. 580-596.

_______ and Turner, Carl R., "Comparative Morphometric Adaptation in the Housefly and Facefly in the United States," Evolution, XXXII (December 1978), pp. 759-770.

Cameron, Guy N. and Kincaid, W. Bradley, "Species Removal Effects on Movements of Sigmodon hispides [cotton rats] and Reithrodontomys fulvescens [harvest mice]," American Midland Naturalist, CVIII (July 1982), pp. 60-67.

Cardon, Kathleen; Anthony, Rita Jo; Hendricks, Deloy G.; and Mahoney, Arthur W., "Effect of Atmospheric Oxidation on Bioavailability of Meat Iron and Liver Weights in Rats," Journal of Nutrition, CX (March 1980), pp. 567-574.
Carmer, S. G., "Optimal Significant Levels for Application of the Least Significant Difference in Crop Performance Trials," Crop Science, XVI (January-February 1976), pp. 95-99.

_______ and Swanson, M. R., "Detection of Differences Between Means: A Monte Carlo Study of Five Pairwise Multiple Comparison Procedures," Agronomy Journal, LXIII (1971), pp. 940-945.

_______, "An Evaluation of Ten Pairwise Multiple Comparison Procedures by Monte Carlo Methods," Journal of the American Statistical Association, LXVIII (1973), pp. 66-74.

_______ and Walker, W. M., "Baby Bear's Dilemma: A Statistical Tale," Agronomy Journal, LXXIV (January-February 1982), pp. 122-124.

Chamblee, Rick W.; Thompson, Layfayette; and Bunn, Tommie, "Management of Broadleaf Signalgrass (Brachiaria platyphylla) in Peanuts (Arachis hypogaea) with Herbicides," Weed Science, XXX (January 1982), pp. 40-44.

_______ and Coble, Harold, "Interference of Broadleaf Signalgrass (Brachiaria platyphylla) in Peanuts (Arachis hypogaea)," Weed Science, XXX (January 1982), pp. 45-49.
194
Daniel, Wayne W.; Coogler, Carol G., "Statistical Applica-tions in Physical Medicine," American Journal of Physical Medicine, LIV (February 1975).
Dhingra, 0. D. and Sinclair, J. B., "Survival of Macrophomina phaseolina Sclerotia in Soil: Effects of Soil Mois-ture, Carbon: Nitrogen Ratios, Carbon Sources, and Nitrogen Concentrations," Phytopathology. LXV (Mnrnh 1975), pp. 236-240.
Duncan, D. B. and Brant, L. J., "Adaptive t Tests for Mul-tiple Comparisons," Biometrics. XXXIX, pp. 790-794..
Dunnett, C. W., "Answer to Query 272: Multiple Comparison Tests," Biometrics. XXVI (September 1969), pp. 139-14-0 •
Einot, Israel and Gabriel, K. R., "A Study of Powers of Several Methods of Multiple Comparisons," Journal of the American Statistical Association. LXX (1975). pp. 574-583. ~ '
Fajemisin, J. M. and Hooker, A. L., "Predisposition to Diplodia Stalk Rot in Corn Affected by Three Helmin-thosporium Leaf Blights," Phytopathology. LXTV (December 1974), pp. U96-1499.
________, "Top Weight, Root Weight, and Root Rot of Corn Seedlings as Influenced by Three Helminthosporium Leaf Blights," Plant Disease Reporter, LVIII (April 1974), pp. 313-317.
Farmer, Bonnie R.; Mahoney, Arthur W.; Hendricks, Deloy G.; and Gillett, Tedford, "Iron Bioavailability of Hand-Deboned and Mechanically Deboned Beef," Journal of Food Science, XLII (November-December 1977), pp. 1630-1632.
Friedrich, J. W.; Smith, Dale; and Schrader, L. E., "Herbage Yield and Chemical Composition of Switchgrass as Affected by N, S, and K Fertilizations," Agronomy Journal, LXIX (January-February 1977), pp. 30-33.
Fritzell, Erik K., "Habitat Use by Prairie Raccoons During the Waterfowl Breeding Season," Journal of Wildlife Management, XLII (January 1978), pp. 118-127.
Gabriel, Ruben K., "Comment," Journal of the American Statistical Association, LXXIII (September 1978), pp. 485-487.
Games, Paul, "A Three-Factor Model Encompassing Many Possible Statistical Tests on Independent Groups," Psychological Bulletin, LXXXV (January 1978), pp. 168-182.
, "Inverse Relation Between the Risks of Type I and Type II Errors and Suggestions for the Unequal n Case in Multiple Comparisons," Psychological Bulletin, LXXV (1971), pp. 97-102.
________; Keselman, H. J.; and Clinch, Jennifer J., "Multiple Comparisons for Variance Heterogeneity," British Journal of Mathematical and Statistical Psychology, XXXII (1979), pp. 133-142.
Garcia-de-Siles, J. L.; Ziegler, J. H.; and Wilson, L. L., "Effects of Marbling and Conformation Scores on Quality and Quantity Characteristics of Steer and Heifer Carcasses," Journal of Animal Science, XLIV (January 1977), pp. 36-46.
"Prediction of Beef Quality by Three Grading ' Systems," Journal of Food Science. XLII fMav-.Tunp
1977), pp. 711-715. *
"Growth, Carcass, and Muscle Characters of Hereford ^?lr
Holstein Steers," Journal of Animal Science, XLIV (June 1977), pp. 973-984..
Gill, J. L., "Current Status of Multiple Comparisons of Means in Designed Experiments," Journal of Dairy Science, LVI (1973).
________, "Evolution of Statistical Design and Analysis of Experiments," Journal of Dairy Science, LXIV (June 1981), pp. 1494-1519.
Hagman, Joseph D. and Williams, Evelyn, "Use of Distance and Location Information in Short-Term Motor Memory," Perceptual and Motor Skills, XLIV (June 1977), pp. 867-873.
Hammerstedt, Roy H. and Hay, Sandra R., "Effect of Incubation Temperature on Motility and cAMP Content of Bovine Sperm," Archives of Biochemistry and Biophysics, CXCIX (February 1980), pp. 427-437.
Harrison, R. G. and Massaro, T. A., "Influence of Oxygen and Glucose on the Water and Ion Content of Swine Aorta," American Journal of Physiology, CCXXXI (December 1976), pp. 1800-1806.
Harter, H. Leon, "Error Rates and Sample Sizes for Range Tests in Multiple Comparisons," Biometrics, XIII (1957), pp. 511-536.
Howell, John F. and Games, Paul A., "The Effects of Variance Heterogeneity on Simultaneous Multiple Comparison Procedures with Equal Sample Size," British Journal of Mathematical and Statistical Psychology, XXVII (1974), pp. 72-81.
Ilyas, M. B.; Ellis, M. A.; and Sinclair, J. B., "Evaluation of Soil Fungicides for Control of Charcoal Rot of Soybeans," Plant Disease Reporter, LIX (April 1975), pp. 360-364.
Jensen, Craig, "Generality of Learning Differences in Brain-Weight-Selected Mice," Journal of Comparative and Physiological Psychology, XCI (June 1977), pp. 629-641.
Johnson, Douglas H., "The Comparison of Usage and Availability Measurements for Evaluating Resource Preference," Ecology, LXI (February 1980), pp. 65-71.
Johnson, Steven B. and Berger, R. D., "On the Status of Statistics in Phytopathology," Phytopathology, LXXII (1982), pp. 1014-1015.
Kemp, K. E., "Multiple Comparisons: Comparisonwise and Experimentwise Type I Error Rates and Their Relationship to Power," Journal of Dairy Science, LVIII (September 1975), pp. 1372-1378.
Keselman, H. J., "A Power Investigation of the Tukey Multiple Comparison Statistic," Educational and Psychological Measurement, XXXVI (1976), pp. 97-104.
________; Games, Paul; and Rogan, Joanne C., "Protecting the Overall Rate of Type I Errors for Pairwise Comparisons With an Omnibus Test Statistic," Psychological Bulletin, LXXXVI (July 1979), pp. 884-888.
________ and Murray, Robert, "Tukey Tests for Pairwise Contrasts Following the Analysis of Variance: Is There a Type IV Error?," Psychological Bulletin, LXXXI (1974), p. 609.
________ and Rogan, Joanne C., "Effect of Very Unequal Group Sizes on Tukey's Multiple Comparison Test," Educational and Psychological Measurement, XXXVI (Summer 1976), pp. 263-270.
________ and Rogan, Joanne C., "An Evaluation of Some Non-Parametric and Parametric Tests for Multiple Comparisons," British Journal of Mathematical and Statistical Psychology, XXX (May 1977), pp. 125-133.
"The Tukey Multiple Test: 1953-1976," Psychological Bulletin,
LXXXIV (September 1977), pp. 1050-1056.
"A Comparison of the —TT 7 vvmjJUl XQUU U J_ Modifled-Tukey and Scheffe Methods of Multiple Comparisons for Pairwise Contrasts," Journal of the 1978)Can S t ^ i ^ i c a l Association. VXXIII (March
________ and Toothaker, Larry E., "Comparison of Tukey's T-Method and Scheffe's S-Method for Various Numbers of All Possible Differences of Averages Contrasts Under Violation of Assumptions," Educational and Psychological Measurement, XXXIV (1974).
________; Toothaker, Larry E.; and Shooter, M., "An Evaluation of Two Unequal n Forms of the Tukey Multiple Comparison Statistic," Journal of the American Statistical Association, LXX (September 1975), pp. 584-587.
Krapu, Gary and Swanson, George, "Some Nutritional Aspects of Reproduction in Prairie Nesting Pintails," Journal of Wildlife Management, XXXIX (January 1975), pp. 156-162.
Levin, J. R. and Marascuilo, L. A., "Type IV Errors and Interactions," Psychological Bulletin, LXXVIII (1972), pp. 368-374.
Lorenz, K. and Dilsaver, W., "Microwave Heating of Food Materials at Various Altitudes," Journal of Food Science, XLI (May-June 1976), pp. 699-702.
Madden, L. V.; Knoke, J. K.; and Louie, Raymond, "Considerations for the Use of Multiple Comparison Procedures in Phytopathological Investigations," Phytopathology, LXXII (August 1982), pp. 1015-1017.
Mahoney, Arthur W. and Hendricks, Deloy G., "Some Effects of Different Phosphate Compounds on Iron and Calcium Absorption," Journal of Food Science, XLV (September-October 1978), pp. 1473-1476.
________ and Gillett, Tedford, "Effect of Sodium Nitrate on the Bioavailability of Meat Iron for the Anemic Rat," Journal of Nutrition, CIX (December 1979).
________; Farmer, Bonnie R.; and Hendricks, Deloy G., "Effects of Level and Source of Dietary Fat on the Bioavailability of Iron from Turkey Meat for the Anemic Rat," Journal of Nutrition, CX (August 1980), pp. 1703-1708.
May, Philip R. A.; Tuma, A. H.; and Yale, Coralee, "Schizophrenia: A Follow-Up Study of Results of Treatment," Archives of General Psychiatry, XXXIII (April 1976), pp. 481-485.
Mills, David E. and Robertshaw, David, "Response of Plasma Prolactin to Changes in Ambient Temperature and Humidity in Man," Journal of Clinical Endocrinology and Metabolism, LII (February 1981), pp. 279-283.
Nilwik, H. J. M., "Growth Analysis of Sweet Pepper (Capsicum annuum L.): The Influence of Irradiance and Temperature Under Greenhouse Conditions in Winter," Annals of Botany, XLVIII (August 1981), pp. 129-136.
O'Neill, R. and Wetherill, G. B., "The Present State of Multiple Comparison Methods," Journal of the Royal Statistical Society (Series B), XXXIII (1971).
Parker, Robert J.; Hartman, Kathleen D.; and Sieber, Susan M., "Lymphatic Absorption and Tissue Disposition of Liposome-Entrapped [C]Adriamycin Following Intraperitoneal Administration to Rats," Cancer Research, XLI (April 1981), pp. 1311-1317.
Petersen, R. G., "Use and Misuse of Multiple Comparison Procedures," Agronomy Journal, LXIX (March-April 1977), pp. 205-208.
Petrinovich, Lewis F. and Hardyck, Curtis D., "Error Rates for Multiple Comparison Methods: Some Evidence Concerning the Frequency of Erroneous Conclusions," Psychological Bulletin, LXXI (1969), pp. 43-54.
Ramsey, Philip H., "Power Differences Between Pairwise Multiple Comparisons," Journal of the American Statistical Association, LXXIII (1978), p. 479.
Rees, R. G.; Thompson, J. P.; and Mayer, R. J., "Slow Rusting and Tolerance to Rusts in Wheat: The Progress and Effects of Epidemics of Puccinia graminis tritici in Selected Wheat Cultivars," Australian Journal of Agricultural Research, XXX (May 1979), pp. 403-419.
Richards, J. Scott; Hurt, Michael; and Melamed, Laurence, "Spinal Cord Injury: A Sensory Restriction Perspective," Archives of Physical Medicine and Rehabilitation, LXIII (May 1982), pp. 195-199.
Rominger, R. S.; Smith, Dale; and Petersen, L. A., "Yields and Elemental Composition of Alfalfa Plant Parts at Late Bud Under Two Fertility Levels," Canadian Journal of Plant Science, LV (January 1975), pp. 69-75.
"Yield and Chemical Composition of Alfalfa as Influenced by High Rates of K Topdressed as KC1 and K?S0,," Agronomy Journal, LXVIII (July-August, T975), pp, 573-577.
Ryan, T. A., "Comment on 'Protecting the Overall Rate of Type I Errors for Pairwise Comparisons With an Omnibus Test Statistic,'" Psychological Bulletin, LXXXVIII (September 1980), pp. 354-355.
Santos, P. F. and Whitford, W. G., "The Effects of Microarthropods on Litter Decomposition in a Chihuahuan Desert Ecosystem," Ecology, LXII (June 1981), pp. 654-663.
Smith, Dale and Rominger, R. S., "Distribution of Elements Among Individual Parts of the Orchard Grass Shoot and Influence of Two Fertility Levels," Canadian Journal of Plant Science, LIV (July 1974), pp. 485-494.
Solso, Robert L. and McCarthy, Judith E., "Prototype Formation of Faces: A Case of Pseudo-memory," British Journal of Psychology, LXXII (November 1981), pp. 499-503.
Spring, David R. and Dale, Philip S., "Discrimination of Linguistic Stress in Early Infancy," Journal of Speech and Hearing Research, XX (June 1977), pp. 224-232.
Steel, R. G. D., "Query 163: Error Rates in Multiple Comparisons," Biometrics (1961), pp. 326-328.
Thatcher, R. W.; Lester, M. L.; McAlaster, R.; and Horst, R., "Effects of Low Levels of Cadmium and Lead on Cognitive Functioning in Children," Archives of Environmental Health, XXXVII (May-June 1982), pp. 159-166.
Volenec, Jeff; Smith, Dale; Soberalske, R. M.; and Ream, H. W., "Greenhouse Alfalfa Yields With Single and Split Applications of Deproteinized Alfalfa Juice," Agronomy Journal, LXXI (July-August 1979), pp. 695-697.
Waller, Ray A. and Duncan, David B., "A Bayes Rule for the Symmetric Multiple Comparisons Problem," Journal of the American Statistical Association, LXIV (December 1969), p. 1485.
Welsch, Roy E., "Stepwise Multiple Comparison Procedures," Journal of the American Statistical Association, LXXII (1977), pp. 566-575.
Wike, Edward L. and Church, James D., "Further Comments on Nonparametric Multiple Comparison Tests," Perceptual and Motor Skills, XLV (December 1977), pp. 917-918.
Willson, V. L., "Research Techniques in AERJ Articles: 1969-1978," Educational Researcher, IX (1980), pp. 5-10.
Reports and Manuals
Barcikowski, Robert S., "Statistical Power With Group Mean As the Unit of Analysis," ED 191 910, National Institute of Education Grant (Ohio State University, 1980).
Carmer, S. G. and Walker, W. M., "Pairwise Multiple Comparison Procedures for Treatment Means," Technical Report Number 12, University of Illinois, Department of Agronomy, Urbana, Illinois (December 1983), pp. 1-33.
Wilkinson, Leland, SYSTAT: The System for Statistics, SYSTAT, Inc., Evanston, Ill., 1984.
Unpublished Materials
Carmer, S. G., Professor of Biometry, University of Illinois, Urbana, Illinois, Personal letter received January 1985.
Kirk, Roger E., Professor of Psychology, Baylor University, Waco, Texas, Personal letter received January 22, 1985.
Myette, Beverly M. and White, Karl R., "Selecting An Appropriate Multiple Comparison Technique: An Integration of Monte Carlo Studies," Paper presented before the Annual Meeting of the American Educational Research Association, March 19-23, 1982.
Thomas, D. A. H., "Error Rates in Multiple Comparisons Among Means: Results of a Simulation Exercise," Unpublished Master's Thesis, University of Kent, Canterbury, England.
Waller, R. A., "On the Bayes Rule for the Symmetric Multiple Comparison Problem," Unpublished Notes, Kansas State University, Manhattan, Kansas 66506.