A MONTE CARLO ANALYSIS OF EXPERIMENTWISE AND COMPARISONWISE
TYPE I ERROR RATE OF SIX SPECIFIED MULTIPLE COMPARISON
PROCEDURES WHEN APPLIED TO SMALL k's AND EQUAL AND
UNEQUAL SAMPLE SIZES
DISSERTATION
Presented to the Graduate Council of the
North Texas State University in Partial
Fulfillment of the Requirements
For the Degree of
DOCTOR OF PHILOSOPHY
By
William R. Yount, B.S., M.R.E., Ed.D.
Denton, Texas
December, 1985
Yount, W., A Monte Carlo Analysis of Experimentwise and
Comparisonwise Type I Error Rates of Six Specified Multiple
Comparison Procedures When Applied to Small k's and Equal and
Unequal Sample Sizes. Doctor of Philosophy (Educational
Research), December, 1985, 201 pp., 26 tables, 10 figures,
bibliography, 111 titles.
The problem of this study was to determine the dif-
ferences in experimentwise and comparisonwise Type I error
rate among six multiple comparison procedures when applied to
twenty-eight combinations of normally distributed data. These
were the Least Significant Difference, the Fisher-protected
Least Significant Difference, the Student Newman-Keuls Test,
the Duncan Multiple Range Test, the Tukey Honestly Sig-
nificant Difference, and the Scheffe Significant Difference.
The Spjøtvoll-Stoline and Tukey-Kramer HSD modifications were
used for unequal n conditions.
A Monte Carlo simulation was used for twenty-eight
combinations of k and n. The scores were normally distributed
(μ=100; σ=10). Specified multiple comparison procedures were
applied under two conditions: (a) all experiments and (b)
experiments in which the F-ratio was significant (0.05).
Error counts were maintained over 1000 repetitions.
The FLSD held experimentwise Type I error rate to
nominal alpha for the complete null hypothesis. The FLSD was
more sensitive to sample mean differences than the HSD while
protecting against experimentwise error. The unprotected LSD
was the only procedure to yield comparisonwise Type I error
rate at nominal alpha. The SNK and MRT error rates fell
between the FLSD and HSD rates. The SSD error rate was the
most conservative. Use of the harmonic mean of the two
unequal sample n's (HSD-TK) yielded uniformly better results
than use of the minimum n (HSD-SS). Bernhardson's formulas
controlled the experimentwise Type I error rate of the LSD
and MRT to nominal alpha, but pushed the HSD below the 0.95
confidence interval. Use of the unprotected HSD produced
fewer significant departures from nominal alpha. The for-
mulas had no effect on the SSD.
TABLE OF CONTENTS
Page
LIST OF TABLES
LIST OF ILLUSTRATIONS vii
Chapter
I. INTRODUCTION 1
Statement of the Problem
Purpose of the Study
Hypotheses
Significance of the Study
The Model of the Study
Definitions
Assumptions
Chapter Bibliography
II. SYNTHESIS OF RELATED LITERATURE 15
Introduction to Multiple Comparisons
The Concepts of Error Rate and Power
Types of Error Rates
The Concept of Power
Implications of Error Rate and Power
The Development and Definition of Multiple Comparison Procedures
Multiple Comparisons in Graduate Research
The Critical Difference Values of Multiple Comparison Procedures
The Least Significant Difference
Tukey's Honestly Significant Difference
The Student Newman-Keuls Test
The Duncan Multiple Range Test
The Scheffé Significant Difference
The Research of Carmer and Swanson
The Carmer and Swanson Model
Articles Citing the Carmer and Swanson Studies
The Research of Clemens Bernhardson
Chapter Bibliography
III. PROCEDURES 67
The Simulation Plan
Generating Random Numbers
Interpolating Critical Value Tables
The Main BASIC Program
Summary
Chapter Bibliography
IV. ANALYSIS OF DATA 86
Testing the Hypotheses
Related Findings
Chapter Bibliography
V. CONCLUSIONS, RECOMMENDATIONS, AND SUGGESTIONS FOR FURTHER STUDY 97
Appendix A 108 Chi-square test of random number generator
Appendix B 119 Two Samples of Data Generated by the Main Program
Appendix C 120 Main BASIC Program Listing
Appendix D 135 Analysis of the Stepwise Testing Procedure
Appendix E 140 Summary Sheets from Data Analysis
Appendix F 169 List of Articles Citing Carmer and Swanson Studies
Appendix G Critical Differences for each Multiple Comparison Procedure for equal n's for k=3 to k=6
Appendix H 180 Critical Differences for each Multiple Comparison Procedure for unequal n's for k=3 to k=6
Appendix I Results of z-tests between FLSD and Protected and Unprotected HSD Procedures for each k,J
Appendix J Graphic Displays of Error Rates for Protected and Unprotected Multiple Comparison Procedures
BIBLIOGRAPHY
LIST OF TABLES
Table Page
I. Comparison Between Experimentwise and Per Comparison Error Rates 17
II. Comparison Between Error Rates of the SNK and MRT Procedures 35
III. Multiple Comparison Procedures Used in Dissertations on File With Dissertation Abstracts International 37
IV. Comparison of Critical Values of the (F)LSD and HSD Multiple Comparison Procedures as k Varies 39
V. Comparison of Critical Values of (F)LSD, HSD, and SNK Multiple Comparison Procedures as r Varies 44
VI. Comparison of Critical Values of (F)LSD, HSD, SNK, and MRT Multiple Comparison Procedures as r Varies 46
VII. Comparison of Critical Values of (F)LSD, HSD, SNK, MRT and SSD Multiple Comparison Procedures as r Varies 48
VIII. Mean Chi-Square Values for Ten Repetitions of N = 1000 Scores and a Given μ 69
IX. Mean Chi-Square Values for Ten Repetitions of N Scores with μ = 20 71
X. Mean and Standard Deviation Values for Ten Sets of N = 10,000 Scores 72
XI. Critical Differences for Each of the Testing Procedures for k=3 and J=1 TO 5 176
XII. Critical Differences for Each of the Testing Procedures for k=4 and J=1 TO 5 177
XIII. Critical Differences for Each of the Testing Procedures for k=5 and J=1 TO 5 178
XIV. Critical Differences for Each of the Testing Procedures for k=6 and J=1 TO 5 179
XV. Critical Differences for Each of the Testing Procedures for k=3 and J=6 TO 7 180
XVI. Critical Differences for Each of the Testing Procedures for k=4 and J=6 TO 7 181
XVII. Critical Differences for Each of the Testing Procedures for k=5 and J=6 TO 7 182
XVIII. Critical Differences for Each of the Testing Procedures for k=6 and J=6 TO 7 183
XIX. Variables Associated with Specified Error Counts for Each Multiple Comparison Procedure and Two Kinds of Type I Error 81
XX. Counts of Significant F-Ratios for 1000 Repetitions of Each k,J Combination 83
XXI. Experimentwise Error Rates for Multiple Comparison Procedures Averaged Across Unequal N's 87
XXII. Comparisonwise Error Rates for Multiple Comparison Procedures Averaged Across Unequal N's 89
XXIII. Experimentwise Error Rates for Multiple Comparison Procedures Averaged Across Equal N's 90
XXIV. Comparisonwise Error Rates for Multiple Comparison Procedures Averaged Across Equal N's 91
XXV. Z-tests for Significant Difference of Proportions Between Experimentwise Type I Error Rates for the HSD and FLSD Multiple Comparison Procedures 185
XXVI. Z-tests for Significant Difference of Proportions Between Experimentwise Type I Error Rates for the Unprotected HSD and FLSD Multiple Comparison Procedures 186
LIST OF ILLUSTRATIONS
Figure Page
1. Three Kinds of Error in Hypothesis Testing 20
2. Effect of Size of Difference between Means on Error Rates and Power 23
3. Effect of Population Variance on Error Rates and Power 24
4. Effect of Per Comparison and Experimentwise Type I Error Rates on Power 25
5. The Randomized Block Design 53
6. Comparison of Two Research Designs 54
7. Graphic Presentation of Experimentwise Type I Error Rates in Relation to 0.95 Confidence Interval for α=0.05 and N=1000 Generated by Application of Bernhardson Formulas for k=3 and k=4 187
8. Graphic Presentation of Experimentwise Type I Error Rates in Relation to 0.95 Confidence Interval for α=0.05 and N=1000 Generated by Application of Bernhardson Formulas for k=5 and k=6 188
9. Graphic Presentation of Experimentwise Type I Error Rates in Relation to 0.95 Confidence Interval for α=0.05 and N=1000 Generated Without Prior Significant F-ratio for k=3 and k=4 189
10. Graphic Presentation of Experimentwise Type I Error Rates in Relation to 0.95 Confidence Interval for α=0.05 and N=1000 Generated Without Prior Significant F-ratio for k=5 and k=6
CHAPTER I
INTRODUCTION
One of the most popular and useful statistical tech-
niques in research is analysis of variance (9, p. 237). The
most common use of analysis of variance is in testing the
hypothesis that k > 2 population means are equal (19, p.
90). The purpose is to determine whether the sample means
are indicative of experimental treatment effects or merely
reflect chance variation (17, p. 511). Two statistical
conclusions are possible. Either the null condition of μ1 =
μ2 = ... = μk is tenable or it is rejected. But the re-
jection of the null hypothesis tells us nothing about which
means differ significantly from which other means (8, p.
368). Therefore, a significant omnibus F-ratio may raise
more questions than it answers (9, p. 275). When researchers
want to know which means in an experiment differ sufficiently
to produce the significant F-ratio, they study differences
between pairs of means by using search techniques called
multiple comparison procedures (17, p. 511). However, these
procedures vary in definition and implementation. It is
difficult to understand the differences between the various
approaches or to select the procedure which will yield the
most reliable results (2, p. 738).
In an effort to empirically clarify the problem of
selecting the appropriate multiple comparison procedure,
Carmer and Swanson conducted two Monte Carlo studies. The
procedures studied in 1971 were the Least Significant Dif-
ference (LSD), the Fisher-protected Least Significant Dif-
ference (FLSD), Tukey's Honestly Significant Difference
(HSD), Duncan's Multiple Range Test (MRT), and the Bayes
Least Significant Difference (BLSD). Their recommendation
was for the Fisher-protected Least Significant Difference (3,
p. 945).
The 1973 study included the five multiple comparison
procedures of the 1971 study and added the Scheffe Sig-
nificant Difference (SSD), the Student Newman-Keuls Test
(SNK), and a second Bayesian procedure called the Bayes Exact
Test (BET). The Bayes Least Significant Difference was
renamed the Bayesian Significant Difference (BSD). The
Fisher-protected Least Significant Difference was refined
into three approaches. The Least Significant Difference was
applied when the F-ratio was found significant at the 0.01
level (FSD1), the 0.05 level (FSD2), and the 0.10 level
(FSD3) (4, p. 67). Their recommendation was for the FSD2:
the Least Significant Difference when the F-ratio is sig-
nificant at 0.05. The HSD, SSD, SNK and FSD1 were eliminated
because they lacked power. The FSD3, LSD and MRT were
eliminated because they did not sufficiently protect against
experimentwise Type I errors. The BSD was eliminated because
the BET did slightly better. Both the BET and FSD2 were
recommended, but due to its parsimony, the FSD2 was recom-
mended as the multiple comparison of choice (4, p. 74).
However, Einot and Gabriel state that the Carmer and
Swanson studies are "misleading" because their conclusions
are simple consequences of the two basic kinds of Type I
error rate defined by the techniques, rather than the tech-
niques themselves (6, pp. 574-575). Some multiple comparison
procedures use an experimentwise Type I error rate while
others use a comparisonwise Type I error rate (6, p. 575; 9,
p. 278; 21, p. 327). Einot and Gabriel fault the Carmer and
Swanson studies for failing to consider the different Type I
error rates (6, p. 574).
Their solution to this problem was to set all multiple
comparison procedures to the same experimentwise Type I error
rate and compare them empirically through the use of a Monte
Carlo simulation. From this study they recommended Tukey's
Honestly Significant Difference (HSD) for its "elegant
simplicity" and power, which they reported was "little below
that of any other method." The Fisher-protected Least Sig-
nificant Difference (FLSD) was rejected because of its
liberal experimentwise Type I error rate (6).
Comparable conflict surrounds the Student Newman-Keuls
Test. Recent statistical texts recommend the Student Newman-
Keuls as the procedure of choice (7, p. 312; 8, p. 376; 9,
p. 307). However, Einot and Gabriel reject it because of its
excessive experimentwise Type I error rate (as compared to
the HSD) (6, p. 582). Likewise, Carmer and Swanson reject it
for its inability to detect real differences among means (as
compared to the FLSD) (4, p. 73). The confusion over mul-
tiple comparison procedures does not stop with Student
Newman-Keuls. Kirk summarizes his chapter on multiple com-
parisons by emphasizing that each has been recommended by one
or more statisticians (19, p. 127). Thus it is obvious that
conflicting recommendations abound in the area of multiple
comparisons.
Statement of the Problem
The problem of this study was to determine the dif-
ferences in experimentwise and comparisonwise Type I error
rates among six specified multiple comparison procedures.
Purpose of the Study
The purpose of this study was to empirically analyze the
Least Significant Difference, the Fisher-protected Least
Significant Difference, the Student Newman-Keuls Test, the
Duncan Multiple Range Test, the Tukey Honestly Significant
Difference, and the Scheffé Significant Difference in terms
of their error rates when applied to k,J experimental com-
binations of normally distributed data generated by Monte
Carlo methods.
Hypotheses
The first hypothesis of this study was that there would
be no difference in the ranking of error rates found by
Carmer and Swanson (1973) using large k's and equal n's and
the ranking obtained using small k's and unequal n's.
The second hypothesis of this study was that there would
be no statistically significant difference in experimentwise
Type I error rate between the HSD and FLSD procedures when
using the Bernhardson formulas.
Significance of the Study
This study was considered significant in that it
empirically investigated the error rates for six multiple
comparison procedures for specified k,J experimental combina-
tions of simulated data generated by Monte Carlo methods.
The question was whether the findings of Carmer and Swanson,
applicable to the large k's and equal n's found in agricul-
tural research, generalize to smaller k's and unequal n's
prevalent in educational research.
The study was further considered significant in that it
focused on the FLSD technique to determine if this method,
using the error rate definitions of Bernhardson, can yield
acceptable control for experimentwise Type I errors for data
common in educational research. Myette and White stated that
"further replications of the Carmer and Swanson (1973) [4.]
and Bernhardson (1975) [1] studies need to be conducted." If
further replications confirm that the two-stage t-test is as
accurate as it appears to be, then "the extensive work in
developing new techniques and modifications of existing
techniques may be focused in the wrong areas." Instead of
creating more techniques, "it is more important to systema-
tically integrate the information we now have and to deter-
mine if this simple approach is not only more parsimonious,
but just as accurate" (20, p. 14).
The Model of the Study
Data for this study was generated according to a com-
pletely randomized design. This is the model used by Kesel-
man and others in several Monte Carlo studies (13, p. 99; 15,
p. 127; 16, p. 48; 18, p. 585). The equation for the com-
pletely randomized design is given by
Yij = μ + τj + ε(i)j          Eq. 1
where μ is the population mean, τj is the effect of treatment
level j subject to the restriction that all effects sum to
zero, and ε(i)j is normally distributed experimental error
(19, p. 135). The population mean in this study was set to
one hundred (100), all treatment effects were set to zero,
and the normally distributed experimental error was simulated
by a pseudo-random number generator (See Appendix A for a
description of the generator).
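The dissertation's own simulation program was written in BASIC (Appendix C); as an illustrative sketch only, the completely randomized design of Eq. 1 under the complete null can be expressed in modern code, here using a library normal generator in place of the Appendix A pseudo-random generator:

```python
import random

def generate_experiment(k, n, mu=100.0, sigma=10.0):
    """Simulate one experiment under Eq. 1, Yij = mu + tau_j + e(i)j.

    Under the complete null every treatment effect tau_j is zero, so
    each of the k groups of n scores is drawn from the same
    N(100, 10^2) population.
    """
    tau = [0.0] * k  # all treatment effects set to zero
    return [[mu + tau[j] + random.gauss(0.0, sigma) for _ in range(n)]
            for j in range(k)]

groups = generate_experiment(k=3, n=5)
```

The function name and defaults here are invented for the example; only the model and the (μ=100, σ=10) parameters come from the study.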
Definitions
Bernhardson formulas.—Formulas developed by Clemens
Bernhardson (4) calculate αpc and αew only after a prelimi-
nary significant F-test.
Critical difference.—The critical difference of a
multiple comparison procedure is the computed difference
between two means required to declare them significantly
different. It is computed by multiplying the procedure's
critical value, taken from the appropriate statistical table,
by the standard error of difference between the two means.
Critical value.—The critical value of a multiple com-
parison procedure is the value drawn from a critical value
table designed for that comparison. The value depends on the
level of significance desired, the number of error degrees of
freedom, and, for some procedures, the number of means in the
experiment or steps between ordered means.
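The two definitions above combine as: critical difference = critical value × standard error of the difference between the two means. A minimal sketch, assuming the t-based form used by procedures such as the LSD (the function name is invented for this example):

```python
import math

def critical_difference(critical_value, ms_error, n1, n2):
    """Tabled critical value times the standard error of the
    difference between two means (t-based form, as for the LSD)."""
    se_diff = math.sqrt(ms_error * (1.0 / n1 + 1.0 / n2))
    return critical_value * se_diff

# e.g., a tabled critical value of 2.0, MS error of 100, n = 10 per group
cd = critical_difference(2.0, 100.0, 10, 10)
```

Range-based procedures such as the HSD use a differently scaled standard error with their studentized-range critical values, so this form is not universal.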
Experimentwise Type I error.—Type I errors are treated
differently by various multiple comparison procedures. Some
are based on an experimentwise Type I error rate, αew. This
rate is defined as the long run proportion of the number of
experiments containing at least one Type I error divided by
the total number of experiments (9, p. 278).
FLSD.—The term FLSD refers to the Fisher-protected
Least Significant Difference procedure which applies the
"unprotected LSD" only after a preliminary F-test is found to
be significant (4, p.67). The FLSD is also called a "two-
stage LSD" ( H , p. 884-).
(F)LSD.—The term (F)LSD refers to both FLSD and LSD
multiple comparison procedures. It is used when references
can apply to either procedure, such as in computing critical
differences used in testing. The (F)LSD is based on αpc (3,
4, 5).
HSD.—The term HSD refers to Tukey's Honestly Sig-
nificant Difference multiple comparison procedure which was
developed in 1953. It is one of the most widely used mul-
tiple comparison procedures (19, p. 116). Like the (F)LSD,
it is a simultaneous test procedure in that it uses one
critical value for all comparisons (7, p. 311). The HSD is
based on αew (21, p. 327).
k,J combination.—This term refers to two major vari-
ables in this study: the number of groups in an experiment,
k, and the sample size category, J. There were four levels
of k representing three, four, five and six groups. There
were seven levels of J. J(1) through J(5) represented equal n
sample sizes of 5, 10, 15, 20, and 25 respectively. J(6)
represented an unequal set of nj's in the ratio of
1:2:3:4:5:6 with n1=10. That is, when k=3, the sample n's
were 10, 20, and 30. When k=6, the sample n's were 10, 20,
30, 40, 50 and 60. J(7) represented a set of nj's in the
ratio of 4:1:1:1:1:1 with n1=80. That is, when k=3, the
sample n's were 80, 20 and 20. This provided twenty-eight
combinations of k,J.
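The k,J scheme above can be enumerated directly. A small sketch (function and variable names invented for this illustration):

```python
def sample_sizes(k, J):
    """Group sample sizes for one k,J combination as defined above."""
    if 1 <= J <= 5:          # J(1)..J(5): equal n's of 5, 10, 15, 20, 25
        return [5 * J] * k
    if J == 6:               # ratio 1:2:...:k with n1 = 10
        return [10 * (j + 1) for j in range(k)]
    if J == 7:               # ratio 4:1:...:1 with n1 = 80
        return [80] + [20] * (k - 1)
    raise ValueError("J must be 1 through 7")

# 4 levels of k (3..6) times 7 levels of J gives the 28 combinations
combinations = [(k, J) for k in range(3, 7) for J in range(1, 8)]
```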
LSD.—The term LSD refers to the Least Significant
Difference multiple comparison procedure. For purposes of
this study, the LSD was applied in the same way as the mul-
tiple t-test (11, p. 521), sometimes referred to as the
"ordinary" LSD or the "unrestricted" LSD (5, p. 10). This is
done to distinguish it from the FLSD which is a "protected"
(4, p. 67) or "restricted" (5, p. 11) LSD test.
Monte Carlo method.—The Monte Carlo method consists of
generating simulated random experiments by computer. Scores
are generated by a specified mathematical formula and
categorized into the desired research design. These scores
form the basis for testing statistical procedures (6, p. 579;
10, p. 72).
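Each simulated experiment is screened with an omnibus F-test before the protected procedures are applied. As a sketch only (not the dissertation's BASIC listing), the one-way ANOVA F-ratio written out from its definition:

```python
def f_ratio(groups):
    """One-way ANOVA F-ratio for a list of groups (equal or unequal n)."""
    scores = [x for g in groups for x in g]
    grand_mean = sum(scores) / len(scores)
    group_means = [sum(g) / len(g) for g in groups]
    # between-groups sum of squares, weighted by group size
    ss_between = sum(len(g) * (m - grand_mean) ** 2
                     for g, m in zip(groups, group_means))
    # within-groups (error) sum of squares
    ss_within = sum((x - m) ** 2
                    for g, m in zip(groups, group_means) for x in g)
    df_between = len(groups) - 1
    df_within = len(scores) - len(groups)
    return (ss_between / df_between) / (ss_within / df_within)
```

The resulting F would be compared against the tabled 0.05 critical value to decide whether the protected procedures proceed.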
MRT.—The term MRT refers to the Multiple Range Test,
developed by Duncan (1953). It is a stepwise multiple com-
parison procedure which is based on αpc (6, p. 575).
Per comparison Type I error.—Some multiple comparison
procedures use a per comparison or comparisonwise Type I
error rate, αpc, which is defined as the total number of Type
I errors made divided by the total number of possible com-
parisons (9, p. 278).
Power.—The power of a statistical test is the
probability that it will correctly reject a false null
hypothesis (9, p. 152).
SNK.—The term SNK refers to Student Newman-Keuls Test
which developed from the work of Student (1927), Newman
(1939), and Keuls (1952). Like the MRT, it is a stepwise
testing procedure and is based on αpc (6, p. 575).
SSD.—The term SSD refers to the Scheffé multiple com-
parison procedure (1953) which is the most flexible and
conservative of the multiple comparison procedures (7, p.
121). It is able not only to test pairwise comparisons, but
can also test any combination of means against any other
combination of means within the experiment. This flexi-
bility, however, reduces its ability to detect pairwise
differences (19, p. 122). The Scheffé Test is based on an
αew error rate (21, p. 327).
Type I error.—A Type I error is made when two popu-
lation means are declared different when they are actually
equal (22, p. 566). In statistical terms, it is rejecting a
true null hypothesis (12, p. 1374). The probability of
making a Type I error is symbolized by the letter alpha (a).
It is also referred to as the "level of significance" (19, p.
36).
Type II error.—A Type II error is made when two means
are declared equal when they are actually different (22, p.
566). Statistically speaking, it is retaining a false null
hypothesis (12, p. 1374). The probability of making a Type
II error is given by beta (β) (19, p. 36).
Type III error.—A Type III error is made when two
population means are declared different when they are, in
fact, different, but in reverse order (22, p. 566). The
probability of committing a Type III error is given by gamma
(γ) (11, p. 513).
Assumptions
It was assumed that the data produced by the random
number generator used in this study were not different from
data normally collected and analyzed by educational re-
searchers.
It was further assumed that small k's and unequal n's
better reflect the realities of educational research than
large k's and equal n's.
CHAPTER BIBLIOGRAPHY
1. Bernhardson, Clemens S., "375: Type I Error Rates When Multiple Comparison Procedures Follow a Significant F Test of ANOVA," Biometrics, XXXI (March 1975), pp. 229-232.
2. Boardman, Thomas J. and Moffitt, Donald R., "Graphical Monte Carlo Type I Error Rates for Multiple Comparison Procedures," Biometrics, XXVII (September 1971), pp. 738-743.
3. Carmer, S. G. and Swanson, M. R., "Detection of Differences Between Means: A Monte Carlo Study of Five Pairwise Multiple Comparison Procedures," Agronomy Journal, LXIII (1971), pp. 940-945.
4. _______ and _______, "An Evaluation of Ten Pairwise Multiple Comparison Procedures by Monte Carlo Methods," Journal of the American Statistical Association, LXVIII (1973), pp. 66-74.
5. _______ and Walker, W. M., "Pairwise Multiple Comparisons Procedures for Treatment Means," Technical Report Number 12, University of Illinois, Department of Agronomy, Urbana, Illinois (December 1983), pp. 1-33.
6. Einot, Israel and Gabriel, K. R., "A Study of Powers of Several Methods of Multiple Comparisons," Journal of the American Statistical Association, LXX (1975), pp. 574-583.
7. Ferguson, George A., Statistical Analysis in Psychology and Education, 5th ed., New York, McGraw-Hill Book Publishers, 1981.
8. Glass, Gene V. and Hopkins, Kenneth D., Statistical Methods in Education and Psychology, 2nd ed., Englewood Cliffs, New Jersey, Prentice-Hall, Inc., 1984.
9. Howell, David C., Statistical Methods for Psychology, Boston, Duxbury Press, 1982.
10. Howell, John F. and Games, Paul A., "The Effects of Variance Heterogeneity on Simultaneous Multiple Comparison Procedures with Equal Sample Size," British Journal of Mathematical and Statistical Psychology, XXVII (1974), pp. 72-81.
11. Harter, H. Leon, "Error Rates and Sample Sizes for Range Tests in Multiple Comparisons," Biometrics, XIII (1957), pp. 511-536.
12. Kemp, K. E., "Multiple Comparisons: Comparisonwise and Experimentwise Type I Error Rates and Their Relationship to Power," Journal of Dairy Science, LVIII (September 1975), pp. 1372-1378.
13. Keselman, H. J., "A Power Investigation of the Tukey Multiple Comparison Statistic," Educational and Psychological Measurement, XXXVI (1976), pp. 97-104.
14. _______; Games, Paul; and Rogan, Joanne C., "Protecting the Overall Rate of Type I Errors for Pairwise Comparisons With an Omnibus Test Statistic," Psychological Bulletin, LXXXVI (July 1979), pp. 884-888.
15. _______ and Rogan, Joanne C., "An Evaluation of Some Non-Parametric and Parametric Tests for Multiple Comparisons," British Journal of Mathematical and Statistical Psychology, XXX (May 1977), pp. 125-133.
16. _______ and _______, "A Comparison of the Modified-Tukey and Scheffé Methods of Multiple Comparisons for Pairwise Contrasts," Journal of the American Statistical Association, LXXIII (March 1978), pp. 47-52.
17. _______ and Toothaker, Larry E., "Comparison of Tukey's T-Method and Scheffé's S-Method for Various Numbers of All Possible Differences of Averages Contrasts Under Violation of Assumptions," Educational and Psychological Measurement, XXXIV (1974), pp. 511-519.
18. _______, _______, and Shooter, M., "An Evaluation of Two Unequal n Forms of the Tukey Multiple Comparison Statistic," Journal of the American Statistical Association, LXX (September 1975), pp. 584-587.
19. Kirk, Roger E., Experimental Design: Procedures for the Behavioral Sciences, 2nd ed., Belmont, California, Brooks/Cole Publishing Company, 1982.
20. Myette, Beverly M. and White, Karl R., "Selecting An Appropriate Multiple Comparison Technique: An In-tegration of Monte Carlo Studies," Paper presented before the Annual Meeting of the American Educa-tional Research Association, March 19-23, 1982.
21. Steel, R. G. D., "Query 163: Error Rates in Multiple Comparisons," Biometrics, (1961), pp. 326-328.
22. Welsch, Roy E., "Stepwise Multiple Comparison Procedures," Journal of the American Statistical Association, LXXII (1977), pp. 566-575.
CHAPTER II
SYNTHESIS OF RELATED LITERATURE
Introduction
The history of multiple comparison procedures "suffers
from an embarrassment in riches seldom found in statistics"
(47, p. 53). A number of authors have described theoreti-
cally, mathematically, and preferentially various multiple
comparison techniques. The literature is filled with con-
tradictory assumptions, recommendations, and conclusions
concerning which multiple comparison procedure to use under
what circumstance (45, p. 4).
The situation reflects the frustration of then Senator
Walter Mondale in a speech to the American Educational
Research Association in the early 1970's. Summing up the
results of his study of the research on integration in the
public schools, he said,
What I have not learned is what we should do about these problems. I had hoped to find research to support or to conclusively oppose my belief that quality integrated education is the most promising approach. But I have found very little conclusive evidence. For every study, statistical or theoretical, that contains a proposed solution or recommendation, there is always another, equally well documented, challenging the assumptions or the conclusions of the first. No one seems to agree with anyone else's approach. But more distressing, no one seems to know what works. As a result, I must confess, I stand with my colleagues confused and often disheartened (44, p. viii).
Much of the confusion in the literature concerning
multiple comparison procedures stems from differing perspec-
tives on experimental error and the power of statistical
tests. A clear concept of error rate and power is essential
for understanding the distinctives of each procedure.
The Concepts of Error Rate and Power
Types of Error Rates
Three types of error rate were briefly defined in Chap-
ter I and are analyzed more fully here. A Type I error is
made when two population means are declared significantly
different when they are actually equal (55, p. 566). It is
rejecting H0: μi = μj when H0 is true (31, p. 1374). It was
noted that Type I errors are treated differently by various
multiple comparison procedures. Some are based on an experi—
mentwise Type I error rate while others use a per comparison,
or comparisonwise (53, p. 539), Type I error rate (28, p.
278). An experimentwise error rate is the probability that
an experiment contains some incorrect decisions (47, p. 45).
It is a measure of the risk one takes in making one or more
Type I errors on all pairs of means within an experiment (33,
p. 884). A per comparison error rate is the risk of making a
Type I error when testing a single pair of means within an
experiment (33, p. 884; 51, p. 327; and 55, p. 566). The
constant relationship between the two kinds of Type I errors
is given by (41, p. 104; 51, p. 327)
(1 - αew) = (1 - αpc)^c          Eq. 2
where c is the number of comparisons being made. Table I
illustrates the relationship between the two error rates.
TABLE I
COMPARISON BETWEEN EXPERIMENTWISE AND PER COMPARISON ERROR RATES
Number of    Number of          Experimentwise    Per Comparison
Means (k)    Comparisons        Error Rate        Error Rate
             (k(k-1)/2)

Set α = αew = 0.05

  3            3                .050000           .016952
  4            6                .050000           .008512
  5           10                .050000           .005116
  6           15                .050000           .003413
 10           45                .050000           .001139

Set α = αpc = 0.05

  3            3                .142625           .050000
  4            6                .264908           .050000
  5           10                .401263           .050000
  6           15                .536708           .050000
 10           45                .900559           .050000
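Every entry in Table I can be reproduced from Eq. 2 and its inverse; a quick check in Python (an illustrative sketch; the function names are invented here):

```python
def alpha_ew_from_pc(alpha_pc, c):
    # Eq. 2 rearranged: a_ew = 1 - (1 - a_pc)^c
    return 1.0 - (1.0 - alpha_pc) ** c

def alpha_pc_from_ew(alpha_ew, c):
    # inverse of Eq. 2: a_pc = 1 - (1 - a_ew)^(1/c)
    return 1.0 - (1.0 - alpha_ew) ** (1.0 / c)

for k in (3, 4, 5, 6, 10):
    c = k * (k - 1) // 2   # number of pairwise comparisons
    print(k, c,
          round(alpha_pc_from_ew(0.05, c), 6),   # upper half of Table I
          round(alpha_ew_from_pc(0.05, c), 6))   # lower half of Table I
```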
It logically follows from equation 2 that as c in-
creases, the divergence between αew and αpc increases. Table
I shows that selecting a per comparison error rate of 0.05
increases the risk of committing a Type I error across all
pairs. For example, with 10 means in an experiment, there is
a 90 per cent chance of committing at least one Type I error
in testing the 45 pairs. Selecting an experimentwise error
rate of 0.05 decreases the probability of committing a Type I
error with any given pair. With 10 means in an experiment,
the per comparison probability of committing a Type I error
in testing a single pair is 0.001. Fisher writes in 1937,
When the z test does not demonstrate significant differentiation, much caution should be used before claiming significance for special comparisons. Comparisons, which the experiment was designed to make, may, of course, be made without hesitation. It is comparisons suggested subsequently, by a scrutiny of the results themselves, that are open to suspicion; for if the variants are numerous, a comparison of the highest with the lowest observed value, picked out from the results, will often appear to be significant, even from undifferentiated material. . . . Thus, in comparing the best with the worst of ten tested varieties, we have chosen the pair with the largest apparent difference out of 45 pairs, which might equally have been chosen. We might, therefore, require the probability of the observed difference to be as small as 1 in 900, instead of 1 in 20, before attaching statistical significance to the contrast (18, pp. 65-66).
Fisher s "1 in 900" [0.0011] agrees with the per comparison
error rate [0.0011] computed in Table I when a is set at 0.05
and an experimentwise procedure is used. However, this per
comparison probability is suggested by Fisher for testing the
largest difference in the 45 pairs. A simultaneous experi-
mentwise procedure tests all pairs at this level. It will be
shown that this greatly reduces the ability of a test to
detect differences between paired means.
The formula for computing the per comparison Type I
error rate is given by Kirk in equation 3 (41, p. 103).
αpc = (Number of contrasts falsely declared significant) / (Number of contrasts)          Eq. 3
The formula for computing the experimentwise Type I error
rate is given in equation 4 (41, p. 103).
αew = (Number of experiments with at least one contrast falsely declared significant) / (Number of experiments)          Eq. 4
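Equations 3 and 4 translate directly into the error counting a Monte Carlo study performs. A sketch (the data structure is hypothetical: one boolean per pairwise contrast, True when that contrast was declared significant):

```python
def error_rates(experiments):
    """Compute (a_pc, a_ew) per Eq. 3 and Eq. 4.

    experiments: list of simulated experiments; each experiment is a
    list of booleans, one per pairwise contrast, True when the
    contrast was declared significant. The complete null is assumed
    true, so every rejection counts as a Type I error.
    """
    n_contrasts = sum(len(e) for e in experiments)
    n_false = sum(sum(e) for e in experiments)
    a_pc = n_false / n_contrasts                                     # Eq. 3
    a_ew = sum(1 for e in experiments if any(e)) / len(experiments)  # Eq. 4
    return a_pc, a_ew
```

For example, four 3-group experiments (three contrasts each) with three false rejections spread over two experiments give a_pc = 3/12 and a_ew = 2/4.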
A Type II error is made when two means are declared
equal when they are actually different (55, p. 566). It is
retaining a false null hypothesis (25, p. 209; 31, p. 1374).
The probability of making a Type II error is symbolized by
beta (β) (41, p. 36). Type II error is directly linked to
the power of a test. A more powerful test will make fewer
Type II errors, declaring H0 true when it is actually false,
than a less powerful test (25, pp. 210-211).
A Type III error is made when two population means are
declared different when they are, in fact, different, but in
reverse order (55, p. 566). That is, sample mean 2, drawn
from population 2, is declared significantly lower than
sample mean 1, drawn from population 1, when population 2 is
actually larger than population 1 (26, p. 521). The proba-
bility of making a Type III error is symbolized by gamma (γ)
(26).
The relationship among the three types of errors was
clearly illustrated by Harter as shown in figure 1 (26, p.
521). The area labelled α/2 is the region of rejection of
distribution one. It represents the probability of making a
Type I error. Any mean falling inside either α/2 area is
considered significantly different from μ1. In figure 1,
sample mean A falls in this region of rejection. If sample A
was drawn from population 2, then a correct decision has been
made. If sample A was drawn from population 1, however, the
decision is incorrect and results in a Type I error.
Fig. 1 Three kinds of error in hypothesis testing. [Figure: distributions 1 and 2; sample mean A illustrates Type I error, sample mean B Type II error, sample mean C Type III error; power = 1-β.]
The area labelled β is that part of distribution two
which falls below the upper α/2 region of rejection of dis-
tribution one. This area represents the probability of
making a Type II error. A mean which was drawn from popula-
tion 2 is declared not significantly different from μ1 if it
falls in this region (25, pp. 211-212). Sample mean B falls
in this region. If sample B was drawn from population 1 then
a correct decision has been made. If sample B was drawn from
population 2, however, the decision is incorrect and results
in a Type II error.
The region labelled γ is that part of distribution two
which falls beyond the lower α/2 of distribution one. This
area represents the probability of making a Type III error
(26, p. 521). A sample mean which was drawn from population
2 is declared significantly lower than population 1 if it
falls in this region. Mean C falls in this region. If
sample C was drawn from population 1, then a Type I error has
been made. If sample C was drawn from population 2, then a
Type III error has been made. The level of occurrence of
Type III error was found to be little or no problem for any
of the multiple comparison techniques (6, p. 943; 55, p.
569).
A perpetual dilemma in statistical inference is that,
with regard to Type I and Type II errors, reducing the risk
of one increases the risk of the other. The decision con-
cerning which kind of error to control is not a mathematical
judgement but rather a subjective one (21, p. 99; 33, p.
886). The only way to simultaneously reduce both kinds of
error is to improve the research design itself. Increasing
the number of subjects, using more precise tools of measure-
ment, and choosing research designs which partition experi-
mental error into definable components, reduce both types of
error (8, p. 95; 31, p. 1375).
The Concept of Power.
The power of a multiple comparison procedure is defined
in terms of the number of comparisons it will identify as
significantly different (31, p. 1374). This is directly
related to the size of critical value used by the procedure.
The procedure with the lowest critical value is defined as
the "most powerful" (48, p. 481) because it will declare more
pairwise differences significant than a procedure with a
higher critical value. Power is the complement of Type II
error and has a probability of 1-β (56, p. 12). Figure 1
shows the power of a test as the area under distribution two
to the right of the demarcation line (48). Since this demar-
cation line is set by the upper α/2 region, power is directly
related to alpha and can always be increased for any method
by increasing the likelihood of Type I error (50, p. 355).
Using Harter's diagram, the effect of these variables of
hypothesis testing on error rates and power can be shown.
Figure 2 illustrates the effect of mean difference between
two populations on error rates and power. As the difference
between means grows, β decreases and power increases. That
is, a test will make fewer Type II errors and will be able to
detect differences more easily as the difference between
means increases.
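This relationship can be demonstrated with a small Monte Carlo sketch in the spirit of this study (illustrative Python, not the dissertation's own program; the function name and parameters are assumptions), drawing samples from normal populations (μ = 100, σ = 10) and counting rejections of a two-sided z-test:

```python
import math
import random
import statistics

def power_estimate(delta, n=10, sigma=10.0, z_crit=1.96, reps=2000, seed=1):
    """Monte Carlo estimate of two-sided z-test power for a true mean
    difference `delta`, with samples drawn from normal populations
    (mu = 100, sigma = 10) and sigma treated as known."""
    rng = random.Random(seed)
    se = sigma * math.sqrt(2.0 / n)  # standard error of the difference
    rejections = 0
    for _ in range(reps):
        m1 = statistics.fmean(rng.gauss(100.0, sigma) for _ in range(n))
        m2 = statistics.fmean(rng.gauss(100.0 + delta, sigma) for _ in range(n))
        if abs(m2 - m1) / se > z_crit:
            rejections += 1
    return rejections / reps

# Power climbs as the true difference between the population means grows:
for delta in (0.0, 5.0, 10.0, 15.0):
    print(delta, power_estimate(delta))
```

With delta = 0 the rejection rate approximates α = 0.05; as delta grows, β shrinks and the estimated power rises toward 1.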
Fig. 2 Effect of size of difference between means on error rates and power. [Figure: varying μ1 - μ2; α = 0.05.]
Figure 3 illustrates the effect of population variances
on error rate and power. One can increase the power of a
test for a given difference between means by decreasing the
variability of measurements. This can be done by studying
more homogeneous populations, using more precise instruments
or increasing the sample size (41, p. 39).

Fig. 3 Effect of population variance on error rates and power. [Figure: μ1 = μ2; σ1 < σ2; α = 0.05.]
The choice of an experimentwise or comparisonwise error
rate directly affects the power of a test. Figure 4 shows
two sample means being tested by two different multiple com-
parisons. The power resulting from the use of a per compari-
son procedure and alpha set at 0.05 is represented by the
area to the right of line 1. The power resulting from the
use of an experimentwise procedure and alpha set at 0.05 is
represented by the area to the right of line 2. It is ob-
vious that the experimentwise procedure will not declare as
many differences significant as the per comparison procedure
and is therefore considered less powerful (31, p. 1374).

Fig. 4 Effect of per comparison and experimentwise Type I error rates on power. [Figure: α(pc) = 0.05 (line 1) versus α(ew) = 0.05, i.e., a per pair level of 0.001 (line 2).]
Implications of Error Rates and Power
Is it more important to retain true null hypotheses or
reject false ones? Kemp [1975] reports Gill's [1973] con-
tention that avoiding experimentwise Type I errors is very
important (23; 31, p. 1375). Gill writes eight years later
that putting one's trust in a comparisonwise error rate "does
not restrict false findings sufficiently. . . .the frequency
of publications of false positives already far exceeds the
nominal rate that scientists believe they are operating with"
(24, p. 1506). Tukey [1953] and Ryan [1959] both support the
experiment rather than the comparison as the unit of study
and therefore recommend the experimentwise error rate (47, p.
52). A practical application of this concern is voiced by
Barcikowski. If an experiment shows erroneously that teach-
ing by television produces significant improvement in learn-
ing (Type I error), then huge sums of money will be invested
in equipment and teacher training. If no significant dif-
ference is found erroneously (Type II error), no action is
usually taken. Therefore, avoiding Type I errors is of
greatest importance (3).
These writers use an experimentwise error rate to mini-
mize the publishing of erroneous conclusions in the
literature. A per comparison rate yields "more fictitious
results" than the experimentwise rate. Replication is seldom
done. Findings often stand on one experiment. Therefore,
extreme caution is warranted. In summary, "it is better to
punish truth than to let falsehood gain respectability" (47,
p. 53).
Carmer and Walker take an opposing view. If the unit of
interest is the individual comparison rather than the entire
experiment of k(k-1)/2 comparisons, then the experimenter
should not be penalized for using an efficient experimental
design (10, p. 13). If the unit of interest is indeed the
individual comparison, then a comparisonwise error rate and a
comparisonwise multiple comparison procedure is justified.
The emphasis under these conditions is not avoiding experi-
mentwise Type I errors, but avoiding Type II errors. In
January 1985, Carmer stated that this 1983 report, with minor
revisions, was scheduled to be published in the Spring 1985
issue of The Journal of Agronomic Education (11). It there-
fore represents Carmer's most recent view of the multiple
comparison problem.
Carmer suggests this practical example. If a variety of
plant or fertilizer is declared superior to another when in
fact the two are equally effective (Type I error), there is
no economic loss if the two cost the same. But when two
varieties are declared the same when in fact one is superior
(Type II error), economic loss occurs if the inferior variety
is chosen. Therefore, Type II errors are more costly to
research users than Type I errors (8, p. 97).
Kemp agrees with Carmer's reasoning. Using milk pro-
duction as an example, he poses the same general question: If
two rations of feed yield equal amounts of milk, is great harm
done if the researcher declares one to be superior to the
other? If the two rations are really equal and the experiment
is carefully designed, then the difference between the sample
yields would be too small to justify a more expensive ration
to obtain the difference, whether it was "statistically sig-
nificant" or not. Several experiments would be run before
changes were made, and the probability of finding significant
differences, given equal population means, over several ex-
periments is very low (31, p. 1375). Duncan and Brant write
in support of the comparisonwise error rate, stating that the
objectives are the same in the simultaneous testing of m com-
parisons in one experiment as if each test were being made in
a separate experiment (12, p. 794).
Therefore, some writers shun procedures which use a com-
parisonwise Type I error rate because their greater concern
is the avoidance of Type I errors (24, 50, 55). Others shun
procedures which use an experimentwise Type I error rate
because their greater concern is avoiding Type II errors and
maximizing power (6, 7, 9, 10, 11, 12, 31).
Is there a distinction between researchers who desire
power to detect differences on the one hand and statistical
theorists who prefer avoiding false positives on the other?
The literature has shown support for such a distinction.
However, Kirk, in a personal letter, states that he does not
conceptualize the problem of multiple comparisons as a
"battle between theory and practice, although there are
schools of thought concerning the appropriateness of various
multiple comparison procedures" (42).
The Development and Definition of Multiple Comparison Procedures
The solution to the problem of hypothesis testing be-
tween two means when the population standard deviation is
unknown "might well be taken as the dawn of modern inferen-
tial statistical methods. It was found in 1908 by William S.
Gossett who published it under the pseudonym 'Student'" (25,
p. 217). The method is called the t-test, and when applied
one time to data from two samples at the 0.05 level of sig-
nificance, then the probability of committing a Type I error
is indeed 0.05 (25, p. 305). The original multiple compari-
son procedure was the multiple t-test, which applied
Student's t-test to paired means within an experiment (21, p.
97). That is, an experiment with four treatment means would
apply the t-test to the six pairs of means to determine which
means were significantly different from the others. The
problem with this approach is that multiple applications of
the t-test inflate the Type I error rate of the experiment.
If one were to set the level of significance to 0.05 and make
the six comparisons between paired means, the true proba-
bility of committing a Type I error increases to 0.265. This
is given by
    p = 1 - (1 - α)^c    Eq. 5

where c is the number of independent comparisons (25, p.
325). The Type I error rate probability grows rapidly with
the number of comparisons (47, p. 43).
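Equation 5 can be checked directly; with four means and six pairwise t-tests at the 0.05 level it reproduces the 0.265 figure above (illustrative Python):

```python
def inflated_alpha(alpha, c):
    """Eq. 5: probability of at least one Type I error among
    c independent comparisons, each tested at level alpha."""
    return 1.0 - (1.0 - alpha) ** c

print(round(inflated_alpha(0.05, 6), 3))   # 0.265 (six pairs, k = 4)
print(round(inflated_alpha(0.05, 45), 3))  # 0.901 (45 pairs, k = 10)
```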
In 1925, English statistician Sir Ronald A. Fisher
published a solution to testing more than two means without
increasing the Type I error rate in Statistical Methods for
Research Workers (17). In this work he acknowledged his debt
to Student in no uncertain terms.
The study of the exact sampling distributions of statistics commences in 1908 with "Student's" paper The Probable Error of a Mean. . . . "Student's" work was not quickly appreciated, and from the first edition [1925] it has been one of the chief purposes of this book to make better known the effect of his researches. . . (17, pp. 24-25).
But the contribution and influence of Fisher far surpassed
that of the man purported to be his master (30, p. 6).
The statistical technique developed by Fisher to solve
the problem of testing k > 2 means without increasing Type I
error is known as the analysis of variance (ANOVA). This
procedure evaluates whether there is any systematic dif-
ference among a set of k means. A significant F-ratio indi-
cates that the variance among the experimental means is
greater than one would expect if the null hypothesis is true
(25, p. 325).
Often, however, the fact that "one or more means differ"
is less important to the researcher than which means differ.
The multiple t-test can be employed to determine this, but
not without undoing what the ANOVA was designed to correct.
Fisher suggested a solution for the multiple comparison
problem in 1935 which he called the Least Significant Dif-
ference (LSD). Federer, in the "only textbook [in 1957] to
discuss multiple comparison procedures" (26, p. 515),
describes several variations of the LSD. One is the multiple
t-test, defined as the standard error of the mean times √2
times the value of t at the 0.05 level of significance for
the number of degrees of freedom associated with the standard
error (15, p. 20). Another variation is the Most Significant
Difference, MSD, which uses the 0.01 level of significance
for the t-distribution. The third variation described by
Federer required a significant F-ratio to precede the ap-
plication of the multiple t-test (15, p. 21). This third
variation is the one given by Kirk as the definition of the
LSD. The test consists of "first performing a test of the
overall null hypothesis with the ANOVA." If the F-ratio is
significant, then apply the multiple t-test. If the F-ratio
is not significant, no pairwise comparisons are made (41, p.
115).
These variations have caused a great deal of confusion
in the literature with regard to the "LSD." It will be shown
in a later section that some apply the LSD without the
preliminary F-test. This is the multiple t-test, sometimes
called the "unprotected LSD" (26, p. 513), the "ordinary
LSD," the "unrestricted LSD" (10, p. 10) or simply the LSD
(5). Others have emphasized the importance of the prelimi-
nary F-test and use such terms as the "two-stage strategy"
(33, p. 884), the "Fisher-protected" LSD (7, p. 67), the
"restricted" LSD (10, p. 11) or the LSD (41, p. 115). It is
sometimes difficult to distinguish which procedure is in-
tended when the literature reports findings on the "LSD."
The unprotected LSD provides the least protection against
Type I errors of all the multiple comparison procedures.
This is the reason for its name (49, p. 312). The FLSD
provides greater protection against experimentwise Type I
errors than the unrestricted LSD because of the preliminary
F-test (7, p. 67), but still yields an experimentwise rate
above nominal alpha because of its per comparison definition
(8, p. 99).
Another procedure, the Student Newman-Keuls (SNK),
developed from the work of Student (1927), Newman (1939), and
Keuls (1952), uses an ordered set of means and a range of
critical values rather than a single critical value for all
comparisons. Sample means are ordered from the smallest to
the largest. The largest difference, r = k means apart, is
tested first at a level of significance. If this difference
is significant, then means that are r = k-1 steps apart are
tested at alpha and so on. The actual Type I error rate is
neither per comparison nor experimentwise, but falls some-
where between the two (41, p. 123). The SNK uses a different
critical value for each value of r. For unequal n's,
Barcroft suggests using the harmonic mean (n') computed from
all the group n's in the experiment so long as the group
sizes are similar (16, p. 312; 28, p. 12). Another approach
is to substitute the harmonic mean of the two samples being
compared (n″) for n in the formula. The latter leads to
less bias than the former (28, p. 12) and was used in this
study.
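The two harmonic mean conventions can be made concrete with the standard library (a minimal sketch; the group sizes shown are hypothetical):

```python
from statistics import harmonic_mean

# n' computed from all group n's in the experiment (one convention):
n_prime = harmonic_mean([8, 10, 12])
# n'' computed from only the two samples being compared (used here):
n_double_prime = harmonic_mean([8, 12])

print(round(n_prime, 2))         # 9.73
print(round(n_double_prime, 2))  # 9.6
```

The pairwise harmonic mean reflects only the two groups actually under test, which is why it introduces less bias when the remaining group sizes differ.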
In 1953, responding to the criticism of the high experi-
mentwise Type I error rate of the LSD (6, p. 943), J. W.
Tukey developed a conservative multiple comparison procedure
called the Honestly Significant Difference (41, p. 115). The
HSD is one of the most widely used multiple comparison proce-
dures for evaluating pairwise comparisons among means (41, p.
116). It is similar to the LSD in that it is a simultaneous
test procedure, or STP (20, p. 487). That is, it uses one
critical value for all comparisons. The HSD is defined on an
experimentwise basis and therefore controls the Type I error
rate for all pairs to nominal alpha regardless of the number
of means in the experiment (31, p. 1375).
The HSD assumes equal sample sizes. Several modifi-
cations have been proposed for unequal sample size situ-
ations. The two selected for this study were the Tukey-
Kramer, developed in 1956, and the Spjøtvoll-Stoline,
developed in 1973 (41, pp. 118-120). The Tukey-Kramer
modification uses the harmonic mean of the n's of the two
means being tested (41, p. 120). This procedure generally
controls the rate of experimentwise Type I error and is as
sensitive to treatment differences as other recommended pro-
cedures (36, p. 127). The Spjøtvoll-Stoline uses n(min), the
smaller of the n's of the two means being tested. Since
critical values increase as n decreases, the use of n(min)
generates a more conservative test than the harmonic mean of
n's used in the Tukey-Kramer procedure (41, p. 119).
Also in 1953, H. Scheffé developed the most flexible and
conservative of the multiple comparison procedures (16, p.
121). The Scheffé Significant Difference (SSD) can be used
not only to evaluate pairwise comparisons, but can also be
used to test any combination of means against any other
combination of means within the experiment. This flexi-
bility, however, reduces its ability to detect pairwise
differences (41, p. 122). In fact, the test is so conserva-
tive that it may not detect any significant differences among
means even when the overall F-ratio is significant (49, p.
315).
In 1955, David B. Duncan developed the Multiple Range
Test (MRT). The MRT follows the same testing procedure as
the SNK but uses a different critical value table (41, pp.
824-825). The MRT table gives critical values which provide
a k-mean significance level equal to 1 - (1 - α)^(k-1) (41, p.
125). The farther the means are separated by rank, the more
lenient the standard of significance becomes for the MRT pro-
cedure, as shown in Table II (16, p. 311).
TABLE II

COMPARISON BETWEEN ERROR RATES OF SNK AND MRT PROCEDURES

Number of Means    MRT α = 1 - (1 - 0.05)^(k-1)    SNK α
      2                     0.05                    0.05
      3                     0.0975                  0.05
      4                     0.1426                  0.05
      5                     0.1855                  0.05
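The Table II values follow directly from the k-mean significance level formula (illustrative Python):

```python
def mrt_alpha(alpha, k):
    """Duncan MRT k-mean significance level: 1 - (1 - alpha)**(k - 1)."""
    return 1.0 - (1.0 - alpha) ** (k - 1)

for k in (2, 3, 4, 5):
    print(k, round(mrt_alpha(0.05, k), 4))  # 0.05, 0.0975, 0.1426, 0.1855
```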
When alpha is set at .05 in an experiment with five
means, the actual test of the largest difference will be made
at the 0.1855 level. The experimentwise error rate thus
varies with the number of treatments in the experiment, which
"does not make sense in the real world" (9, p. 123). Unequal
n's are handled in the same
manner as with the Newman-Keuls Test.
In 1965, Duncan proposed a modification to the LSD. The
procedure included the use of Bayesian statistical principles
in examining prior probabilities of decision errors and was
called the Bayesian Least Significant Difference (BLSD). The
procedure is named for Thomas Bayes. Fisher writes that
Bayes' "celebrated essay published in 1763 is well known as
containing the first attempt to use the theory of probability
as an instrument of inductive reasoning; that is, for arguing
from the particular to the general, or from the sample to the
population" (17, p. 22). Duncan's BLSD allows an experi-
menter to choose a value of k which represents the ratio of
relative seriousness of Type I to Type II errors. The BLSD
approximates the LSD when the F-ratio is large. But when F
is small, less than 2.5, the BLSD is more conservative and
tends to approximate the HSD (6, p. 942). The BLSD was found
to be as powerful as the FLSD by Carmer and Swanson in 1971
(6, p. 945).
Multiple Comparisons in Graduate Research
A prominent source of information concerning which
multiple comparison procedures are recommended for analyzing
actual research data is the doctoral dissertation. A search
of Dissertation Abstracts International in Education and
Psychology revealed one hundred fifty-seven dissertations
which had used one hundred sixty-two multiple comparison pro-
cedures in analyzing their data. The most popular procedure
since the earliest cited dissertation (1976) has been the
Student Newman-Keuls Test. The SNK accounted for 25 per cent
of procedures selected. Duncan's Multiple Range Test ac-
counted for another 19 per cent. Together, the range tests
made up 43 per cent of the procedures. The more conservative
procedures, the Scheffe Significant Difference and Tukey's
Honestly Significant Difference, accounted for 31 per cent of
the procedures. The more liberal, the Least Significant
Difference and the Fisher-protected Least Significant Dif-
ference, accounted for 26 per cent of the procedures. Table
III shows the frequencies of use of the various procedures.
TABLE III
MULTIPLE COMPARISON PROCEDURES USED IN DISSERTATIONS ON FILE WITH
DISSERTATION ABSTRACTS INTERNATIONAL
Procedure Name           Frequency of Use    Percent of Use
Newman-Keuls                    39               24.07%
Multiple Range Test             30               18.52
Scheffe                         29               17.90
Fisher-protected LSD            21               12.96
Unprotected LSD                 21               12.96
Tukey HSD                       21               12.96
Bayes Exact Test                 1                0.62
Total                          162               99.99%*

*Rounding error
The Critical Difference Values of Multiple Comparison Procedures
Given a specific experimental situation, critical dif-
ference values are computed differently for each multiple
comparison procedure.
The Least Significant Difference.
The LSD critical difference is given by (27, p. 268; 28,
p. 295; 41, p. 115)

    LSD = t(α/2, v) √(2MSW/n)    Eq. 6

where t is the Student's t-distribution table value, α/2 is
the upper portion of the level of significance, v is the
within degrees of freedom, MSW is the mean square within
value, and n is the number of subjects in each sample.
Carmer and Swanson use a slightly different form of equation
6 and give the LSD critical value as (6, p. 941; 7, p. 67;
10, p. 10)

    LSD = t(α, v) s(d)    Eq. 7

where s(d) is the standard error of the difference of the two
groups. It is clear from equations 6 and 7 that s(d) is equal
to √(2MSW/n). The relationship between Student's t Distri-
bution Table (41, Table E.4), and the Studentized Range Table
(41, Table E.7) is given by

    t(α/2, v) = q(α, 2, v)/√2    Eq. 8

Therefore equation 6 can be rewritten in terms of the Studen-
tized Range table as shown in equation 9.

    LSD = q(α, 2, v) √(MSW/n)    Eq. 9
This is equivalent to Carmer's formula in equation 7. The HSD
and SNK procedures both use the Studentized Range table. By
applying the relationship in equation 8 and using equation 9
in testing, the Studentized Range table may be used for the
(F)LSD as well.
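The identity in equation 8 can be checked arithmetically with table values quoted in this chapter: q(0.05, 2, 10) = 3.15 and the standard two-tailed value t(0.025, 10) = 2.228. A minimal sketch:

```python
import math

# Table values as quoted in this chapter:
q_2_10 = 3.15     # q(alpha = 0.05, r = 2, v = 10), Studentized Range
t_025_10 = 2.228  # t(alpha/2 = 0.025, v = 10), Student's t

# Eq. 8: t(alpha/2, v) = q(alpha, 2, v) / sqrt(2)
print(round(q_2_10 / math.sqrt(2), 3))  # agrees with t to table accuracy
```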
The LSD is considered by many to be an appropriate pro-
cedure if its use is restricted to experiments in which the
analysis of variance F value is significant (FLSD) and the
experimenter's interest is in the individual pairwise con-
trasts rather than the overall test (1, p. 194; 8, p. 95;
13, p. 40; 29, p. 72; 31, p. 1375; 33, p. 884). Several
studies found that the FLSD yielded the greatest power and an
"acceptably low" error rate (2, 19, 46, 54).
Tukey's Honestly Significant Difference.
The critical difference for the HSD is given by (41, p.
116)

    HSD = q(α, k, v) √(MSW/n)    Eq. 10
where q() is the Studentized Range critical value and k is
the number of means in the experiment. The key difference
between the (F)LSD and the HSD procedures is the value of r
used in entering the Studentized Range table. For the
(F)LSD, r always equals 2. For the HSD, r equals the number
of means in the experiment (r=k). Table IV demonstrates the
effect of r on the critical difference used in the two proce-
dures with varying k's.
TABLE IV

COMPARISON OF CRITICAL VALUES OF THE (F)LSD AND HSD MULTIPLE COMPARISON PROCEDURES AS k VARIES
(Kirk Table E.7 values)*

 k     Value of r           Critical Values
       (F)LSD   HSD         (F)LSD   HSD
 2        2      2           3.15    3.15
 4        2      4           3.15    4.33
10        2     10           3.15    5.60

*α = 0.05, v = 10
The result of the increasing critical value in HSD is a
"reduction in power" (7, p. 74). That is, it becomes in-
creasingly difficult to detect differences as the number of
groups in the experiment grows. However, it is this increas-
ing critical value that allows the HSD to maintain the
experimentwise Type I error rate at a. The constant value of
the (F)LSD allows the experimentwise error rate to increase
as k increases (31, p. 1377).
It should be noted that some take issue with Carmer and
Swanson's use of the term "reduction in power". They state
that this increasing value
. . . should not be interpreted to mean that the Tukey test is less powerful than other multiple comparison procedures as suggested by Carmer and Swanson [1973]. The sensitivity of the Tukey test is predictably less than other procedures (such as Newman-Keuls, Ryan and Duncan), since in its development, the test sets a different rate of Type I error. As Einot and Gabriel point out, the more power-concerned analysts can increase the sensitivity of the Tukey test by merely manipulating its Type I error (40, p. 586) [Einot and Gabriel suggest using the HSD at α = 0.25 in order to increase its power (14, p. 577)].
Aitkin comments that the "lack of sensitivity objected to in
experimental error rates" is brought about by using conven-
tional 0.05 or 0.01 levels of significance. "In many experi-
mental situations when the null hypothesis is known a priori
to be false, it is appropriate to increase substantially the
experimental error rate above these levels" (1, p. 193).
At any rate, the issue appears to be one of semantics.
Of what value is it to use the HSD with a higher probability
of committing experimentwise Type I errors over the (F)LSD
which, by definition, has a higher experimentwise Type I
error rate? And what does the term "reduction in power" mean?
It means nothing more than that the procedure declares fewer
comparisons significant than another. Not one writer dis-
putes the fact that the HSD will declare fewer comparisons
significant than the (F)LSD.
Games, Keselman, and Clinch call the HSD the "most
powerful simultaneous multiple comparison technique that
controls Type I error rate over the set or family of
comparisons" (22, p. 42). Still, Kemp says that it "lacks
sufficient power to be useful in sorting out which treatments
are different" (31, p. 1377) and quotes from Thomas [52] that
the HSD is "too conservative to be practical." Unless the
researcher is extremely concerned about Type I errors, use of
the HSD is "ruled out because of its poor sensitivity to real
differences" (6, pp. 944-945).
There is some controversy whether the HSD should be
applied after a significant F-ratio. Keselman suggests this
two stage strategy (32, p. 101). Carmer and Swanson apply
the HSD whether the F-ratio is significant or not (Carmer,
1973, p. 67). One study suggests that applying the HSD in a
two-stage procedure with ANOVA could lead to a "Type IV
error." Since the HSD is not directly related to the F
statistic, it could detect differences undetected by ANOVA,
hence, a Type IV error (43). Keselman and Murray studied
this possibility and state that researchers "need not be
concerned about committing a Type IV error, whether the
concept is theoretically valid or not" (34, p. 609).
There are actually two Tukey test statistics. The
preceding discussion has focused on the HSD, or Tukey A, as
some call the test (21, p. 98; 28, p. 303). This is the more
conservative of the two tests and the one used by Carmer and
Swanson in their studies (50, p. 355). The other test is
Tukey's Wholly Significant Difference (WSD), or Tukey B. The
critical value of the Tukey B is the mean of the critical
values for the SNK and HSD procedures. This is stated by
Howell as (28, p. 303)
q(WSD) = (q(k) + q(r))/2 Eq. 11
With this modification, the WSD proceeds in the same manner
as the SNK. Therefore, the WSD is more powerful, but more
complex and provides less protection against Type I errors
than the HSD (21, p. 98). Some confusion comes from authors
who treat the HSD and the WSD as the same test (41, p. 116).
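Equation 11 is simple arithmetic on the two tables. Using the values for r = 4, k = 5, v = 10 quoted in Table V below (illustrative Python):

```python
def q_wsd(q_snk, q_hsd):
    """Eq. 11: the Tukey B (WSD) critical value is the mean of the
    SNK and HSD critical values for the comparison at hand."""
    return (q_snk + q_hsd) / 2.0

# r = 4, k = 5, v = 10: q(SNK) = 4.33, q(HSD) = 4.65
print(round(q_wsd(4.33, 4.65), 2))  # 4.49
```

The result falls between the SNK and HSD values, matching the WSD's intermediate position between the two tests.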
A common occurrence in educational research is unequal
sample sizes. Several modifications of the HSD have been
suggested for this situation. The Spjøtvoll and Stoline
modification uses n(min), the n of the smaller group in the
comparison, in place of n in the HSD equation (41, pp. 118-
119). This modification also uses q' from the Studentized
Augmented Range Table (41, p. 846) in place of q. The Kramer
modification uses the sample sizes of the two means being
compared by replacing MSW/n with MSW(1/n(i) + 1/n(j))/2 (41, p.
120). This procedure controls experimentwise Type I error
and is as sensitive to differences as others (38, p. 51).
Myette and White found the Kramer modification to be the most
accurate (45, p. 14). Kirk recommends the use of Spjøtvoll
and Stoline when sample sizes are nearly equal and the Kramer
modification when they are moderately to severely unequal
(41, p. 120).
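The two unequal-n critical differences can be compared directly (a minimal sketch with hypothetical inputs; note that in practice the Spjøtvoll-Stoline procedure uses q′ from the Studentized Augmented Range table rather than the same q, so using one q value for both is a simplification here):

```python
import math

def kramer_cd(q, msw, n_i, n_j):
    """Tukey-Kramer critical difference:
    q * sqrt(MSW * (1/n_i + 1/n_j) / 2)."""
    return q * math.sqrt(msw * (1.0 / n_i + 1.0 / n_j) / 2.0)

def spjotvoll_stoline_cd(q, msw, n_i, n_j):
    """Spjotvoll-Stoline critical difference: q * sqrt(MSW / n_min),
    with n_min the smaller of the two group sizes."""
    return q * math.sqrt(msw / min(n_i, n_j))

# Hypothetical inputs (q = 4.65, MSW = 100, n's of 8 and 20):
q, msw = 4.65, 100.0
print(round(kramer_cd(q, msw, 8, 20), 2))
print(round(spjotvoll_stoline_cd(q, msw, 8, 20), 2))
```

With the same q, the Spjøtvoll-Stoline difference is the larger of the two, illustrating why n(min) yields the more conservative test.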
Both the (F)LSD and HSD procedures are simultaneous test
procedures (20, p. 485) which use a single critical value
regardless of how much separation exists between ranked means
(16, p. 311). If a study has five means, the HSD critical
value is 4.650 for a = 0.05 and v = 10 (see Table IV). This
one value is then applied to all pairwise comparisons, no
matter how many ranks separate the two means. The next two
procedures allow for a stepwise, or layered, approach to
making pairwise comparisons. The separation distance between
ranked means is taken into account when the critical dif-
ferences are determined (41, p. 123). These two procedures
are the Student Newman-Keuls Test and the Duncan Multiple
Range Test.
The Student-Newman-Keuls Range Test.
The range of critical differences for the Student
Newman-Keuls is given by (41, p. 124)
    SNK(r) = q(α, r, v) √(MSW/n)    Eq. 12
where q(a,r,v) is the critical value from the Studentized
Range table, and r is the distance between ranked means. The
critical differences for the SNK are compared with the (F)LSD
and HSD in Table V.
TABLE V

COMPARISON OF CRITICAL VALUES OF (F)LSD, HSD, AND SNK MULTIPLE COMPARISON PROCEDURES AS r VARIES
(Kirk Table E.7 values)*

 r    (F)LSD (r=2)    HSD (r=k=5)    SNK (r)
 5        3.15           4.65          4.65
 4        3.15           4.65          4.33
 3        3.15           4.65          3.88
 2        3.15           4.65          3.15

*α = 0.05, v = 10, k = 5
The first test made in the SNK procedure is on the difference
between the largest and smallest means in the experiment.
The r-mean range in this case is equal to the number of means
in the experiment. That is, r = k. For this test, the SNK
critical difference equals the HSD and yields a conservative
test for this largest difference. If this largest difference
is declared significant, then the means k-1 steps apart are
tested at the r = k-1 level of significance. As the distance
between ranked means (r) decreases, the critical value be-
comes more liberal. When r = 2, the SNK critical value
equals that used by the (F)LSD. See Appendix D for a
detailed analysis of the procedure. This range of critical
values attempts to balance the seriousness of committing Type
I and Type II errors (41, pp. 123-125). Dunnett writes, "For
significance testing, I think it is generally agreed that the
Newman-Keuls Test is preferable to Tukey's" (13, p. 140).
But Ramsey faults the test for its "clearly inflated experi-
mentwise Type I error rate" (48, p. 482).
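The layered logic of the step-down procedure can be sketched in Python (an illustrative translation; the present study's programs were written in BASIC, and the function name and blocking bookkeeping below are mine, not code from any of the cited studies). The q values are those of Table V (α = 0.05, v = 10, k = 5).

```python
from math import sqrt

# Studentized Range critical values q(0.05, r, 10) from Table V (Kirk Table E.7).
Q_TABLE = {2: 3.15, 3: 3.88, 4: 4.33, 5: 4.65}

def snk_pairwise(means, msw, n, q_table):
    """Step-down SNK sketch: a stretch of r ranked means is tested against
    q(alpha, r, v) * sqrt(MSW/n), and a nonsignificant stretch shields every
    stretch contained within it from further testing.
    Returns (i, j) index pairs into the sorted means declared significant."""
    ranked = sorted(means)
    k = len(ranked)
    significant = set()
    blocked = set()  # stretches shielded by a wider nonsignificant stretch
    for r in range(k, 1, -1):          # widest range (r = k) down to r = 2
        crit = q_table[r] * sqrt(msw / n)
        for i in range(k - r + 1):
            j = i + r - 1
            if (i, j) in blocked:
                continue
            if ranked[j] - ranked[i] > crit:
                significant.add((i, j))
            else:
                # shield all sub-stretches of the nonsignificant stretch
                for a in range(i, j + 1):
                    for b in range(a + 1, j + 1):
                        blocked.add((a, b))
    return significant
```

With means (100, 101, 102, 103, 120), MSW = 100, and n = 10, only the pairs involving the largest mean are declared significant; the four close means are shielded once their r = 4 stretch fails.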
The Duncan Multiple Range Test.
The critical difference for the MRT is given by (41, p.
126)

     MRTr = m(α,r,v) √(MSW/n)                   Eq. 13
where m is the critical value from Duncan's Multiple Range
Table. The Duncan Multiple Range Test follows the same
stepwise procedure as the Student Newman-Keuls, but uses the
Duncan New Multiple Range Test table rather than the Studen-
tized Range table. The use of this table provides critical
values that vary with the number of treatments in the experi-
ment. While the SNK maintains alpha for each pair of ordered
means (41, p. 126), the MRT does not. The level of αew for
the MRT increases with the number of treatments in the
ranking and should only be used "by researchers stranded
somewhere between reality and the Wonderful World of Statis-
tical Theory" (9, p. 123). Table VI shows critical dif-
ferences for procedures discussed to this point.
TABLE VI

COMPARISON OF CRITICAL VALUES OF (F)LSD, HSD, SNK, AND MRT
MULTIPLE COMPARISON PROCEDURES AS r VARIES*

     r     (F)LSD**   HSD**     SNK**     MRT***
           r=2        r=k=5     r         r

     5     3.15       4.65      4.65      3.43
     4     3.15       4.65      4.33      3.37
     3     3.15       4.65      3.88      3.30
     2     3.15       4.65      3.15      3.15

*α = 0.05, v = 10, k = 5   **Kirk Table E.7   ***Kirk Table E.8
Carmer and Walker go on to criticize the layered ap-
proach to multiple comparison testing because it confuses the
issue of error rates. It is better in their opinion to
choose an error rate, whether it be per comparison or experi-
mentwise, and then apply the best test for that error rate.

     With multiple range tests the difference between two
     treatments required for significance depends on [k].
     . . . [It] does not make much sense to think that the
     true difference between two treatments depends in any
     way on what other treatments are included in the
     experiment. . . . [W]e recommend that neither the DMRT,
     nor the SNK, nor any other multiple range procedures
     ever be used for comparisons among treatment means (10,
     p. 21).
The Scheffe Significant Difference.
The Scheffe Significant Difference is unlike the other
procedures in that it is directly tied to the F-test. Its
critical difference is given by (41, p. 121)

     SSD = √[(k-1)Fcv] √[MSW Σ(cj²/nj)]         Eq. 14

where k is the number of groups in the experiment, Fcv is the
critical value for the omnibus F-test, MSW is the mean square
within value, cj is the contrast factor for the jth group
(which equals 1 for pairwise comparisons), and nj is the
number of subjects in the jth group. For pairwise compari-
sons between groups of equal size, the formula simplifies to

     SSD = √[(k-1)Fcv] √[2MSW/n].               Eq. 15
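As an illustration, the pairwise formula translates directly into code. The numbers below are hypothetical (the F critical value must be taken from an F table for the appropriate degrees of freedom); nothing here is drawn from the study's data, and the function name is mine.

```python
from math import sqrt

def scheffe_critical_difference(k, f_cv, msw, n):
    """Scheffe critical difference for a pairwise comparison with equal
    group sizes: SSD = sqrt((k - 1) * F_cv) * sqrt(2 * MSW / n)."""
    return sqrt((k - 1) * f_cv) * sqrt(2.0 * msw / n)

# Hypothetical example: k = 3 groups of n = 10, MSW = 100, and an assumed
# tabled critical value F_cv = 3.89.
ssd = scheffe_critical_difference(3, 3.89, 100.0, 10)
```

A useful check on the formula: for k = 2 groups, Fcv equals t², so the Scheffe critical difference collapses to the ordinary t-based LSD; the conservatism enters only as k grows.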
Some writers state that the Scheffe "should never be
employed for pairwise comparisons" because of its extreme
conservatism (7, p. 73; 28, p. 304). Others laud the pro-
tection against experimentwise Type I error afforded by the
Scheffe, even with its lack of power. Gill writes "Only
Scheffe's procedure, among well-known methods, can guarantee
any rational assessment of strength of evidence" (24, p.
1506). Petrinovich and Hardyck state that the simplest
approach to multiple comparison procedures would be to apply
the Scheffe as the initial test of choice. A significant
Scheffe means that wrong conclusions are unlikely. A non-
significant Scheffe could be followed by the Tukey HSD (47,
p. 53). Games counters this position when he writes
     The [Petrinovich and Hardyck] recommendation is one
     arbitrary point on a continuum of choice. . . . [W]hen
     one specifies a conservative test, and then says that if
     this conservative test is not significant, he will use a
     more liberal test, he is merely adding ambiguity and
     inconsistency to his decision rule (21, p. 100).
Games recommends using a test and its associated error rate
which matches the questions which the experimenter wants to
answer (21, p. 101). Still others simply report that the
Scheffe is expected to be overly conservative when used for
pairwise comparisons (39, p. 513).
Table VII gives a final summary of critical differences
for each of the procedures selected for this present study.
These differences were computed for actual data in the Monte
Carlo study.
TABLE VII

COMPARISON OF CRITICAL VALUES OF (F)LSD, HSD, SNK, MRT, AND
SSD MULTIPLE COMPARISON PROCEDURES AS r VARIES*

     r     (F)LSD**   HSD**     SNK**     MRT***    SSD
           r=2        r=k=6     r         r

     6     7.170      10.503    10.503     8.121    12.260
     5     7.170      10.503    10.042     7.967    12.260
     4     7.170      10.503     9.528     7.788    12.260
     3     7.170      10.503     8.605     7.532    12.260
     2     7.170      10.503     7.170     7.149    12.260

*α = 0.05, n = 20, v = 114, k = 6
**Interpolated values from Kirk Table E.7
***Interpolated values from Kirk Table E.8
It is obvious from these values that the (F)LSD will declare
more pairwise differences significant than any of the others
because its critical difference is the smallest. Likewise
the SSD will declare fewer differences significant than any
other because its critical difference is the largest. In
fact, on the basis of critical differences alone, we can rank
the procedures according to expected error rates and power
from (F)LSD (high) through MRT, SNK, HSD to the SSD (low).
In summary, it appears that answers to the question of
which multiple comparison technique to use in a given situ-
ation are more philosophical than mathematical. Those who
are deeply concerned about avoiding Type I errors recommend
the more conservative, but less powerful, αew procedures such
as SSD and HSD. Those who are more concerned with power and
significance of specific contrasts recommend the more power-
ful, but less protected, αpc procedures such as (F)LSD and
SNK.
The decision as to which error rate is more appropriate
and what significance level is to be used must, finally, be
made by the experimenter (51, p. 327). The method of choice
depends on the experimenter's decision regarding which type
of error he would most like to minimize (47, p. 50). If an
investigator chooses a definition of error rate, a test, and
a significance level then this information permits him to
compute the significance level for any other suitable test
and error rate. This knowledge eliminates any paradox (51,
p. 328).
In March of 1982, Myette and White presented a paper at
the Annual Meeting of the American Educational Research
Association. They had attempted to synthesize the findings
of all Monte Carlo studies of multiple comparison procedures
in order to overcome the apparent contradictions of the
existing studies and bring some clarity to the problem of
choosing an appropriate multiple comparison technique. Only
twelve of the twenty Monte Carlo studies provided enough
codable data to permit synthesizing of results. But one key
conclusion was that the two-stage t-test, or FLSD, seemed to
provide a parsimonious solution to the bewildering problem of
selecting a multiple comparison procedure (45, p. 17). They
focused on two studies which had recommended this procedure:
Carmer and Swanson's 1973 study and Bernhardson's 1975
study. These studies form the heart of the present inves-
tigation and are now analyzed.
The Research of Carmer and Swanson
The 1973 study of Carmer and Swanson referenced by
Myette and White was really an extension of one published in
1971. The 1971 study focused on five multiple comparison
procedures: the unprotected LSD or multiple t-test, the
protected FLSD, the HSD, the MRT, and the BLSD. Eight sets
of 10 treatment means and seven sets of 20 treatment means,
each set possessing a different level of homogeneity, were
tested with 3, 4, 5, and 6 replicates, or scores per sample.
Data was categorized in a randomized complete block design.
One thousand replications of each experimental condition
produced (1000 reps x 15 sets of means x 4 levels of sample
size=) 60,000 experiments. Type I, II, and III error rates
were computed for each procedure. The observed rates of
correct decisions when real differences occur and the rates
of the three types of possible errors indicated that the
FLSD, the MRT, and the BLSD are more appropriate for use in
research than the LSD or the HSD. "Based on the statistical
properties observed in the study, a choice among these three
is difficult; however, the FLSD may be preferred due to its
familiarity to researchers and its simplicity of application"
(6, p. 940).
The 1973 study was an expansion of the 1971 study. Ten
multiple comparison procedures were studied in 1973. The
unprotected LSD, the MRT, and the HSD were carried over from
the 1971 study. The FLSD was tested at three levels of
significance for the F-test: the FSD1 applied the LSD when
the F-test was significant at the 0.01 level; the FSD2 ap-
plied the LSD when the F-test was significant at the 0.05
level; and the FSD3 applied the LSD when the F-test was
significant at the 0.10 level. The BLSD was renamed the
Bayes Significant Difference (BSD). The Student Newman-Keuls
(SNK), the Scheffe Significant Difference (SSD), and the
Bayes Exact Test (BET) were added to make a total of ten pro-
cedures.
Seven sets of 5 means, 8 sets of 10 means and 7 sets of
20 means, each with varying homogeneity, were tested with 3,
4, 6, and 8 replicates. Data was categorized in a randomized
block design. One thousand replications of each experimental
condition produced (1000 reps x 22 sets of means x 4 levels
of sample sizes=) 88,000 experiments. Type I, II, and III
error rates were computed for each procedure. The observed
rates of correct decisions when real differences occurred and
the observed rates of the three types of possible errors
indicated that the FSD2 and the BET are more appropriate for
use in research than the FSD3, the LSD or the MRT, which
produced excessive experimentwise error rates, or the FSD1,
SSD, or HSD, which lacked the power to detect real dif-
ferences (7, p. 74). Carmer and Swanson again recommended
the most powerful and parsimonious procedure as the method of
choice: the LSD should be applied after a preliminary F-test
is found significant at the 0.05 level (7).
The focus of criticism leveled at the Carmer and Swanson
recommendation is the per comparison error rate of the FLSD
and its accompanying excessive experimentwise error rate (4,
24, 47, 50). Carmer is not disturbed. He writes, "To me the
experimentwise error rate is of little or no use. One simply
does not care about it. The comparison is the unit of in-
terest in virtually all experiments I have been involved
with" (11). In his satirical look at the multiple comparison
problem, Carmer has Baby Bear, a young researcher, saying
that theoretical statisticians were "looking for honey up the
wrong tree when they invented experimentwise error rates" (9,
p. 123). This perspective grows directly out of Carmer's
view of the role of the experiment. A study of his use of
the randomized block design provides insight into this view.
The Carmer and Swanson Model
Carmer and Swanson used a randomized complete block
design in both 1971 and 1973. Though the model is not
clearly described in educational research terms in the Carmer
and Swanson articles, this design is confirmed by Welsch (55,
p. 568) and most recently in personal letters from Kirk and
Carmer (11, 42). This design is a two-way ANOVA with treat-
ments as one variable and blocks as the other. Each treat-
ment cell contains one score (41, p. 238). This design is
shown in Figure 5.
                         TREATMENTS

               A1     A2     A3    ...    Aj

     B    B1   Y11    Y12    Y13   ...    Y1j
     L
     O    B2   Y21    Y22    Y23   ...    Y2j
     C
     K    B3   Y31    Y32    Y33   ...    Y3j
     S     .    .      .      .            .
           .    .      .      .            .
          Bn   Yn1    Yn2    Yn3   ...    Ynj

     Fig. 5—Randomized block design
The blocking variable in the Carmer and Swanson studies was
replications of treatments which took values of 3, 4, 6, and
8. A replication in agricultural research is synonymous with
a score or "subject" in educational research (41, p. 238).
The reason for using the randomized block design grows
out of practical considerations of agricultural research.
Carmer and Walker shed light on this reasoning by way of an
example. Given fifteen cultivars, how might a researcher
determine which cultivars differ significantly in their yield
from the others? One approach is to pair all fifteen cul-
tivars with the others. This yields (15x14/2 =) 105 pairs of
cultivars. Using four replications for each of two cultivars
requires eight plots. Eight plots for 105 pairs requires 840
plots to conduct the experiment. Using the randomized block
design, all fifteen cultivars with their four replications
each can be tested with sixty plots. This is shown in Figure
6.
              COLUMNS                         COLUMNS

           1   2   3   4               1   2   3   ...  15

     R     X   X   X   X          R    X   X   X   ...   X
     O     X   X   X   X          O    X   X   X   ...   X
     W                            W    X   X   X   ...   X
     S                            S    X   X   X   ...   X

     15x14/2 = 105 trials              1 trial
     CxRx105 = 840 plots               CxR = 60 plots

     Fig. 6—Comparison of two research designs
It is good economy to reduce the required plots from 840
to 60. Further, this change yields a more efficient statisti-
cal design in that the first approach yields three error
degrees of freedom (v=3) while the latter yields forty-two
(v=42) (9, p. 122). The unit of interest, the individual
comparisons between the 105 pairs of cultivars, has not
changed. Therefore, the per comparison, not the experiment-
wise error rate, is of primary interest to the researcher.
Calculating the experimentwise error rate for the randomized
complete block design in Figure 6 yields a value of 0.78.
However, calculating the experimentwise error rate for the
105 independent trials yields a value of (1 - 0.95^105 =)
0.9954 (10).
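The second figure follows from the familiar formula for c independent tests at significance level α, αew = 1 - (1 - α)^c, which can be checked directly:

```python
# Experimentwise Type I error rate for c independent tests at alpha = 0.05:
#   alpha_ew = 1 - (1 - alpha) ** c
alpha = 0.05
c = 15 * 14 // 2               # 105 pairwise comparisons among 15 cultivars
alpha_ew = 1 - (1 - alpha) ** c
print(round(alpha_ew, 4))      # → 0.9954
```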
It is quite clear that the conceptual unit of interest
is the individual comparison and not the experiment. Carmer
and Walker state that "the penalties imposed by the use of an
experimentwise error rate [which include larger critical
values, larger Type II error rate, smaller power, and smaller
correct decision rate (10, p. 14)] should not be inflicted
upon the experimenter because he used a design with 60 ex-
perimental units rather than 105 trials occupying 840 experi-
mental units" (10, p. 13).
The equation for the randomized block design (41, p.
240) is given by

     Yij = μ + τj + πi + εij                    Eq. 16

where Yij is a score for the experimental unit in block i and
treatment level j, μ is the overall population mean, τj is
the effect of treatment level j and is subject to the
restriction that Στj = 0, πi is the effect of block i which is
normally distributed, and εij is the experimental error that
is normally distributed and independent of the block effect.
Gill recommends blocking units into homogeneous groups as the
chief device for reducing experimental error (24, p. 1507).
However, Carmer and Swanson set their block effect to a
constant zero (6, p. 942; 7, p. 68). Under this condition,
the randomized complete block design can be simplified to the
completely randomized design (41, p. 240). The equation for
the completely randomized design (41, p. 135) is given by

     Yij = μ + τj + εi(j).                      Eq. 17
The randomized complete block design has the advantage
of specifying the source of experiment error more precisely
than the completely randomized design. It does this by
dividing the error term, εi(j), in equation 17 into two
parts. The first part accounts for error among blocks, πi,
and the second accounts for the remainder. When the block
effect is zero, the generalized error terms in both equations
are equal. Under this condition, equation 17 can be used to
generate scores as precisely as equation 16. This is the
model used by Keselman and others in several studies (32, p.
99; 35, p. 264; 36, p. 127; 37, p. 1051; 38, p. 48; 40, p.
585).
Articles Citing the Carmer and Swanson Studies
The impact of the Carmer and Swanson recommendations for
the FLSD can be seen by the number of writers who have cited
their work. The Science Citation Index, 1973-1984, cites
seventy articles referencing the 1973 study. Thirty-one
articles published since 1974 used the FSD2 procedure and
specifically cited the Carmer and Swanson study (See Appendix
F for a complete list of articles noted in this section).
Four studies used the Bayes Exact Test which was the secon-
dary recommendation of the 1973 study. The BET performed as
well as the FLSD but is more difficult to use. Three ar-
ticles cited the 1973 study, but then used procedures that
the study did not recommend: the unprotected LSD, the Tukey
HSD and the Duncan MRT. Three articles discussed the general
use of multiple comparisons and were clearly supportive of
the Carmer and Swanson findings, while eight others only made
a passing reference to the studies. Eighteen articles could
not be located for analysis. Three articles openly chal-
lenged the findings of Carmer and Swanson on the basis of
error rate considerations.
One article extended the findings of the Carmer and
Swanson 1973 study and was cited by Myette and White in
support of the two-stage LSD technique. This is the 1975
study by Clemens S. Bernhardson (4).
The Research of Clemens Bernhardson
Bernhardson's study is a refutation of the findings of
Boardman and Moffitt's 1971 study (5). Their recommendation
against the LSD was based on empirical tests which were made
without doing a preliminary F test (4, p. 229). Bernhardson
developed four formulas with which to compute per comparison
and experimentwise error rates for multiple comparison proce-
dures to be used only after a significant F test. Equation
18 gives the per comparison rate for the combination sig-
nificant F test and multiple comparison procedure.
           Number of Type I errors following significant F test
     αA =  ----------------------------------------------------    Eq. 18
                  (Number of experiments)(k(k-1)/2)
This formula reduced the αpc of the FLSD below the nominal
level, but not as much as the HSD procedure. Bernhardson
further modified equation 18 by changing the denominator to
reflect only experiments with significant F-ratios. This
definition is given by equation 19.
           Number of Type I errors following significant F test
     αB =  ----------------------------------------------------    Eq. 19
           (Number of experiments with significant F)(k(k-1)/2)
This formula produced an unacceptably high error rate and was
not included in this study.
Bernhardson also studied two modifications of the for-
mula for experimentwise error rate. Equation 20 gives the
experimentwise error rate for the combination of a sig-
nificant F test and a multiple comparison procedure.
           Number of experiments with a significant F
                  and one or more Type I errors
     αC =  -------------------------------------------             Eq. 20
                     Number of experiments
This procedure reduced the αew of the FLSD to the level of the
HSD and was used in this study. An additional modification
of the experimentwise error rate was made by changing the
denominator to reflect only experiments with significant F-
ratios. This definition is given by equation 21.
           Number of experiments with a significant F
                  and one or more Type I errors
     αD =  -------------------------------------------             Eq. 21
                     Number of experiments
                     with a significant F
This formula also proved to yield excessive error rates and
was not used in this study. Bernhardson's results demon-
strated that the use of the modified formula reduces the two-
stage LSD experimentwise Type I error rate to the nominal
level of significance (4, p. 231). Since excessive experi-
mentwise error rate has been the chief argument against the
FLSD, it would seem that Bernhardson's findings demand fur-
ther investigation. This is precisely the conclusion of
Myette and White in their synthesis of twenty empirical
studies of multiple comparison procedures (45, pp. 13-14).
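Equations 18 and 20 are simple ratios of simulation counts. A sketch of how such counts translate into rates (the counts below are hypothetical, and the function names are mine, not Bernhardson's):

```python
def per_comparison_rate(type1_after_sig_f, n_experiments, k):
    """Eq. 18: Type I errors following a significant F test, divided by
    all possible pairwise comparisons over all experiments."""
    return type1_after_sig_f / (n_experiments * (k * (k - 1) / 2))

def experimentwise_rate(sig_f_with_errors, n_experiments):
    """Eq. 20: experiments with a significant F and at least one Type I
    error, divided by all experiments."""
    return sig_f_with_errors / n_experiments

# Hypothetical counts from a 1000-repetition simulation with k = 4 groups:
a_pc = per_comparison_rate(type1_after_sig_f=18, n_experiments=1000, k=4)  # 18 / 6000
a_ew = experimentwise_rate(sig_f_with_errors=47, n_experiments=1000)       # 47 / 1000
```

The key feature of both definitions is the denominator: dividing by all experiments (rather than only those with a significant F, as in Equations 19 and 21) is what holds the rates near the nominal level.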
It should be noted that Carmer has recently become more
extreme in his position. Given the fact that ANOVA tests a
complete null hypothesis, that is, that all means in the ex-
periment are equal; and the fact that one can scarcely design
a realistic experiment in which ten or fifteen means are
equal ["Baby Bear considered it to be rather unlikely that
all 15 cultivars were genetically alike, so he did not worry
a great deal about possible Type I errors." (9, p. 122)], the
ANOVA is known to be significant before it is applied. It is
therefore unnecessary.
     In deciding whether to use the ordinary LSD or the
     restricted LSD, the experimenter needs to consider the
     question: "How likely is it that all [k] treatments in
     my experiment have exactly the same true means?" If it
     is quite unlikely that all [k] treatments are equal,
     there may be little or no point in requiring the
     analysis of variance F ratio to be significant. On the
     other hand, if the experimenter has evidence that all
     [k] treatments might be expected to be equal, use of the
     restricted LSD [FLSD] may be a good choice (10, p. 12).
While one might not expect ten or fifteen means in an experi-
ment to be equal, what about experiments that consist of
smaller k's such as those common in educational research?
Carmer writes,

     I think that for 3, 4, or 5 treatments one might opt
     for the FLSD if he felt there really were possibilities
     that all treatments were the same. I think that would
     be preferable to using an experiment-wise procedure. In
     agricultural research, we are, of course, concerned
     about Type I errors, but in general using the LSD or
     FLSD at 5% or 1% will give adequate protection. If one
     has 15 treatments and uses Tukey's test, it is
     equivalent to using the LSD at [a far smaller
     significance level], and one has little chance of making
     a Type I error and little chance of detecting real
     differences (11).
The purpose of this study was to empirically study the
findings of Carmer and Swanson (6, 7) and Bernhardson U ) to
better understand the dynamics of the selected multiple com-
parison procedures, principally the FLSD and HSD, in relation
to per comparison and experimentwise error rates.
CHAPTER BIBLIOGRAPHY
1. Aitkin, M. A., "Multiple Comparisons in Psychological Experiments," The British Journal of Mathematical and Statistical Psychology, XXII (November 1969), pp. 193-198.
2. Balaam, L. N., "Multiple Comparisons: A Sampling Experiment," Australian Journal of Statistics, V (1963).
3. Barcikowski, Robert S., "Statistical Power With Group Mean As the Unit of Analysis," ED 191 910, National Institute of Education Grant, (Ohio State University, 1980).
4. Bernhardson, Clemens S., "375: Type I Error Rates When Multiple Comparison Procedures Follow a Significant F Test of ANOVA," Biometrics, XXXI (March 1975), pp. 229-232.
5. Boardman, Thomas J. and Moffitt, Donald R., "Graphical Monte Carlo Type I Error Rates for Multiple Comparison Procedures," Biometrics, XXVII (September 1971), pp. 738-743.
6. Carmer, S. G. and Swanson, M. R., "Detection of Differences Between Means: A Monte Carlo Study of Five Pairwise Multiple Comparison Procedures," Agronomy Journal, LXIII (1971), pp. 940-945.
7. ________, "An Evaluation of Ten Pairwise Multiple Comparison Procedures by Monte Carlo Methods," Journal of the American Statistical Association, LXVIII (1973), pp. 66-74.
8. ________, "Optimal Significance Levels for Application of the Least Significant Difference in Crop Performance Trials," Crop Science, XVI (January-February 1976), pp. 95-99.
9. ________ and Walker, W. M., "Baby Bear's Dilemma: A Statistical Tale," Agronomy Journal, LXXIV (1982).
10. ________, "Pairwise Multiple Comparisons Procedures for Treatment Means," Technical Report Number 12, University of Illinois, Department of Agronomy, Urbana, Illinois, (December 1983), pp. 1-33.
11. ________, Professor of Biometry, University of Illinois, Urbana, Illinois, personal letter received January 14, 1985.
12. Duncan, D. B. and Brant, L. J., "Adaptive t Tests for Multiple Comparisons," Biometrics, XXXIX, pp. 790-794.
13. Dunnett, C. W., "Answer to Query 272: Multiple Comparison Tests," Biometrics, XXVI (September 1969), pp. 139-140.
14. Einot, Israel and Gabriel, K. R., "A Study of Powers of Several Methods of Multiple Comparisons," Journal of the American Statistical Association, LXX (1975), pp. 574-583.
15. Federer, Walter T., Experimental Design: Theory and Application. New York, The Macmillan Company, 1955.
16. Ferguson, George A., Statistical Analysis in Psychology and Education, 5th ed., New York, McGraw Hill Book Publishers, 1981.
17. Fisher, R. A., Statistical Methods for Research Workers. 6th ed., Edinburgh (London), Oliver and Boyd, 1936.
18. ________, The Design of Experiments, 2nd ed., Edinburgh, Oliver and Boyd, 1937.
19. Fryer, H. C., Concepts and Methods of Experimental Statistics. Boston, Allyn and Bacon, 1966.
20. Gabriel, Ruben K., "Comment," Journal of the American Statistical Association, LXXIII (September 1978), pp. 485-487.
21. Games, Paul, "Inverse Relation Between the Risks of Type I and Type II Errors and Suggestions for the Unequal n Case in Multiple Comparisons," Psychological Bulletin, LXXV (1971), pp. 97-102.
63
22. ________, Keselman, H. J., and Clinch, Jennifer J., "Multiple Comparisons for Variance Heterogeneity," British Journal of Mathematical and Statistical Psychology, XXXII (1979), pp. 133-142.
23. Gill, J. L., "Current Status of Multiple Comparisons of Means in Designed Experiments," Journal of Dairy Science, LVI (1973).
24. ________, "Evolution of Statistical Design and Analysis of Experiments," Journal of Dairy Science, LXIV (June 1981), pp. 1494-1519.
25. Glass, Gene V. and Hopkins, Kenneth D., Statistical Methods in Education and Psychology, 2nd ed., Englewood Cliffs, New Jersey, Prentice-Hall, Inc., 1984.
26. Harter, H. Leon, "Error Rates and Sample Sizes for Range Tests in Multiple Comparisons," Biometrics, XIII (1957).
27. Hinkle, Dennis E.; Wiersma, William; and Jurs, Stephen G., Basic Behavioral Statistics, Boston, Houghton Mifflin Company, 1982.
28. Howell, David C., Statistical Methods for Psychology. Boston, Duxbury Press, 1982.
29. Howell, John F. and Games, Paul A., "The Effects of Variance Heterogeneity on Simultaneous Multiple Comparison Procedures with Equal Sample Size," British Journal of Mathematical and Statistical Psychology. XXVII (1974), pp. 72-81.
30. Johnson, Palmer O. and Jackson, Robert W. B., Modern Statistical Methods: Descriptive and Inductive, Chicago, Rand McNally & Company, 1959.
31. Kemp, K. E., "Multiple Comparisons: Comparisonwise and Experimentwise Type I Error Rates and Their Relationship to Power," Journal of Dairy Science, LVIII (September 1975), pp. 1372-1378.
32. Keselman, H. J., "A Power Investigation of the Tukey Multiple Comparison Statistic," Educational and Psychological Measurement, XXXVI (1976), pp. 97-104.
33. ________, Games, Paul, and Rogan, Joanne C., "Protecting the Overall Rate of Type I Errors for Pairwise Comparisons With an Omnibus Test Statistic," Psychological Bulletin, LXXXVI (July 1979), pp. 884-888.
34. ________ and Murray, Robert, "Tukey Tests for Pairwise Contrasts Following the Analysis of Variance: Is There a Type IV Error?," Psychological Bulletin, LXXXI (1974), p. 609.
35. ________ and Rogan, Joanne C., "Effect of Very Unequal Group Sizes on Tukey's Multiple Comparison Test," Educational and Psychological Measurement, XXXVI (Summer 1976), pp. 263-270.
36. ________ and Rogan, Joanne C., "An Evaluation of Some Non-Parametric and Parametric Tests for Multiple Comparisons," British Journal of Mathematical and Statistical Psychology, XXX (May 1977), pp. 125-133.
37. ________, "The Tukey Multiple Comparison Test: 1953-1976," Psychological Bulletin, LXXXIV (September 1977), pp. 1050-1056.
38. ________, "A Comparison of the Modified-Tukey and Scheffe Methods of Multiple Comparisons for Pairwise Contrasts," Journal of the American Statistical Association, LXXIII (March 1978).
39. ________ and Toothaker, Larry E., "Comparison of Tukey's T-Method and Scheffe's S-Method for Various Numbers of All Possible Differences of Averages Contrasts Under Violation of Assumptions," Educational and Psychological Measurement, (1974), pp. 511-519.
40. ________ and Shooter, M., "An Evaluation of Two Unequal n Forms of the Tukey Multiple Comparison Statistic," Journal of the American Statistical Association, LXX (September 1975).
4-1. Kirk, Roger E., Experimental Design: Procedures for the Behavioral Sciences, 2nd ed., Belmont, Calif^nilT Brooks/Cole Publishing Company, 1982.
42. Kirk, Roger E., Professor of Psychology, Baylor University, Waco, Texas, personal letter received January 22, 1985.
43. Levin, J. R. and Marascuilo, L. A., "Type IV Errors and Interactions," Psychological Bulletin, LXXVIII (1972), pp. 368-374.
44. Light, Richard J. and Pillemer, David B., Summing Up: The Science of Reviewing Research, Cambridge, Harvard University Press, 1984.
45. Myette, Beverly M. and White, Karl R., "Selecting An Appropriate Multiple Comparison Technique: An Integration of Monte Carlo Studies," Paper presented before the Annual Meeting of the American Educational Research Association, March 19-23, 1982.
46. O'Neill, R. and Wetherhill, G. B., "The Present State of Multiple Comparison Methods," Royal Statistical Society (Series B), XXXIII (1971).
47. Petrinovich, Lewis F. and Hardyck, Curtis D., "Error Rates for Multiple Comparison Methods: Some Evidence Concerning the Frequency of Erroneous Conclusions," Psychological Bulletin, LXXI (1969), pp. 43-54.
48. Ramsey, Philip H., "Power Differences Between Pairwise Multiple Comparisons," Journal of the American Statistical Association, LXXIII (1978), p. 479.
49. Roscoe, John T., Fundamental Research Statistics for the Behavioral Sciences, 2nd ed., New York, Holt, Rinehart and Winston, Inc., 1975.
50. Ryan, T. A., "Comment on 'Protecting the Overall Rate of Type I Errors for Pairwise Comparisons With an Omnibus Test Statistic,'" Psychological Bulletin, LXXXVIII (September 1980), pp. 354-355.
51. Steel, R. G. D., "Query 163: Error Rates in Multiple Comparisons," Biometrics (1961), pp. 326-328.
52. Thomas, D. A. H., "Error Rates in Multiple Comparisons Among Means: Results of a Simulation Exercise," Unpublished Master's Thesis, University of Kent, Canterbury, England.
53. Waldo, D. R., "An Evaluation of Multiple Comparison Procedures," Journal of Animal Science, XLII (1976), pp. 539-544.
54. Waller, Ray A. and Duncan, David B., "A Bayes Rule for the Symmetric Multiple Comparisons Problem," Journal of the American Statistical Association, LXIV (December 1969), p. 1484.
55. Welsch, Roy E., "Stepwise Multiple Comparison Procedures," Journal of the American Statistical Association, LXXII (1977), pp. 566-575.
56. Winer, B. J., Statistical Principles in Experimental Design, New York, McGraw-Hill Book Company, 1962.
CHAPTER III
PROCEDURES
The Simulation Plan
The following plan was followed for generating data,
applying the F-test and the six specified multiple comparison
procedures, and presenting the summary statistics.
Generating random numbers
The heart of this Monte Carlo study was a pseudo-random
number generator developed from the Fortran computer program
"RANDU." RANDU generates twelve (U=12) uniform random num-
bers ranging from 0.00 to 0.99, adds them together, and
subtracts the value of six (U/2) from the total. The result
is a pseudo-random number which, along with N-1 others,
simulates a normal distribution with a mean of 0 and standard
deviation of 1.0.
The generator routine was converted from Fortran IV to
BASIC. The BASIC version of the generator created an ex-
tremely leptokurtic distribution with U set to 12 (See Appen-
dix A). Scores were concentrated about the mean more than
one would expect in a normal curve. A BASIC program was
written to modify U and test distributions of scores by the
chi-square goodness of fit test until the generator could
produce a population of at least 1000 scores which fit the
theoretical normal curve. Beginning with U=12, 1000 scores
were generated by the equation

     Xij = μ + τj + εi(j)                       Eq. 23

This is equation 17 in Chapter II. The value of μ was a
constant 100. The value of τj was a constant 0 representing
the null condition of no treatment effect. The value of
εi(j) was simulated by the following BASIC program segment:
3690 ' GENERATE SCORES
3700 FOR S%=1 TO K            ' k groups
3710 FOR N%=1 TO NN(S%)       ' n for each group
3720 FOR J%=1 TO U            ' add U uniform random numbers
3730 A=A+RND
3740 NEXT J%
3750 E=(A-B)*SIGMA            ' subtract B=U/2 from sum; multiply by SIGMA=10
3760 A=0
3770 X=MU+E                   ' score = 100 + E(rror)
 ...
3840 NEXT N%                  ' (calculations of sums for group means not shown)
3850 NEXT S%
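The scheme above can be sketched in a modern language (a Python illustration, not the original BASIC; the function name is mine). One point worth noting: summing U uniform numbers and subtracting U/2 yields a variate with mean 0 but standard deviation √(U/12), not 1.0, which is consistent with the standard deviations near 1.29 reported later in Table X for U = 20.

```python
import random
import statistics

def pseudo_normal(u=20, mu=0.0, sigma=1.0, rng=random):
    """Sum u uniform(0,1) numbers, subtract u/2, and scale by sigma,
    mirroring lines 3720-3770 of the BASIC segment.

    The sum of u uniforms has mean u/2 and variance u/12, so the
    result has mean mu and standard deviation sigma * sqrt(u/12).
    """
    total = sum(rng.random() for _ in range(u))
    return mu + (total - u / 2.0) * sigma

random.seed(1985)
scores = [pseudo_normal(u=20) for _ in range(10000)]
print(round(statistics.mean(scores), 2))   # near 0
print(round(statistics.stdev(scores), 2))  # near sqrt(20/12), about 1.29
```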
Scores were then categorized into one of the following
exclusive score ranges: (1) less than or equal to 60, (2)
less than 70, (3) less than 80, (4) less than 90, (5) less
than 100, (6) less than 110, (7) less than 120, (8) less than
130, (9) less than 140, and (10) greater than or equal to
140. The theoretical Normal Curve distribution was divided
into ten equal z-score ranges which yielded theoretical
percentages for N scores categorized into ten classes. These
percentages were multiplied by N = 1000, yielding expected
frequencies for the ten categories of 1.3, 10.9, 54.6,
159.8, 273.4, 273.4, 159.8, 54.6, 10.9, and 1.3 scores
respectively.
Actual counts of range frequencies were tested for goodness
of fit against expected counts from the normal distribution.
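Working backward from the expected frequencies quoted above, the ten "equal z-score ranges" appear to span z = -3 to z = +3 in steps of 0.75; that reading (an inference, not stated explicitly in the text) reproduces the quoted counts exactly. A short Python check:

```python
from math import erf, sqrt

def normal_cdf(z):
    """Standard normal CDF via the error function."""
    return 0.5 * (1.0 + erf(z / sqrt(2.0)))

# Nine cut points from z = -3 to z = +3 in steps of 0.75
cuts = [-3.0 + 0.75 * i for i in range(9)]
probs, lower = [], 0.0
for c in cuts:
    probs.append(normal_cdf(c) - lower)
    lower = normal_cdf(c)
probs.append(1.0 - lower)  # upper tail

expected = [round(1000 * p, 1) for p in probs]
print(expected)
# [1.3, 10.9, 54.6, 159.8, 273.4, 273.4, 159.8, 54.6, 10.9, 1.3]
```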
This process was repeated ten times for each value of
U. An average chi-square value for the ten repetitions was
calculated and tested against the critical value of 16.919
(df = 9, a = 0.05). U was then incremented by 1 and the
procedure repeated. This program and the detailed results of
the chi-square tests are located in Appendix A. Table VIII
shows mean chi-square values obtained when 1000 scores were
placed into ten categories for ten repetitions.
TABLE VIII
MEAN CHI-SQUARE VALUES FOR TEN REPETITIONS OF N=1000 SCORES AND A GIVEN VALUE OF U

U*      MEAN CHI-SQUARE         U       MEAN CHI-SQUARE
12      100.062                 21        8.720
13       80.123                 22       10.453
14       63.245                 23       14.963
15       49.230                 24       20.424
16       39.384                 25       24.914
17       27.352                 26       36.280
18       20.113                 27       44.968
19       12.643 **              28       62.382
20        8.678 ***             29       86.846
                                30      100.194

*   Number of uniform randoms used to generate one "normal" random
**  "Good fit" at 5% level of significance
*** Best fit between generated scores and normal curve for N=1000 scores
As U increased from 12 to 18, the leptokurtosis of the
generated distributions decreased toward normality. As U
increased beyond 21, the generated distributions passed
through normality and became increasingly platykurtic. As
shown in Table VIII, the best fit between distributions
generated by the BASIC program and the normal curve occurred
when U was set at 20. That is, using twenty uniform random
numbers to produce each normal score produced the best simu-
lated normal distribution.
The next question to be answered was how large a popu-
lation of pseudo-random numbers could be generated while
using U = 20 uniform random numbers and still "fit" the
normal curve. The BASIC program was modified to maintain the
value of U at 20 and vary the size of N. Sets of 1000, 2000,
3000, 4000, and 5000 scores were generated and placed in ten
categories. A chi-square value was computed on each set of N
scores. This was repeated ten times, and a mean chi-square
value computed. This was tested against the same critical
value as before: 16.919 (df = 9, α = 0.05). All of these
sets fit the normal curve. Ten sets of 10,000 scores were
tested in the same manner. A significant chi-square declared
these distributions non-normal. Ten sets of 7500 and 5500
scores were tested in the same manner with the same result.
Table IX shows mean chi-square values for various sizes
of N using the best case U = 20 uniform random numbers. As
shown in the table, the random number generator produced
"normal" distributions up to N = 5000. This study used a
maximum of 180 scores per experiment (k=6, J=7, n's =
80, 20, 20, 20, 20, 20). Populations of 1000 are common in educa-
tional research. Therefore, the scores generated by the
pseudo-random number generator and equation 23 were well
within the bounds of a normal population distribution for
purposes of this study.
TABLE IX
MEAN CHI-SQUARE VALUES FOR TEN REPETITIONS OF N SCORES WITH U = 20
N        MEAN CHI-SQUARE
1000      8.610 *
2000      9.085 *
3000     11.515 *
5000     15.402 *
5500     17.195
7500     23.867
10000    22.227

* "Good fit" at 5% level of significance
The BASIC routine was further tested by generating ten
sets of ten thousand scores each with μ = 0, σ = 1, and U =
20. A mean and standard deviation were computed for each
set. Finally, average values for mean and standard deviation
were computed across all ten sets. These average values were
compared to 0 and 1.0 respectively to estimate the accuracy
of the BASIC version of RANDU. Results of this test are
shown in Table X.
TABLE X
MEAN AND STANDARD DEVIATION VALUES FOR TEN SETS OF N=10,000 SCORES
Set Mean Standard Deviation
 1     0.0145909     1.2959851
 2    -0.0092742     1.2947861
 3     0.0034212     1.2959851
 4     0.0026654     1.2860699
 5     0.0034329     1.2981012
 6     0.0092000     1.2819469
 7    -0.0031013     1.3148083
 8    -0.0171827     1.2973865
 9    -0.0067541     1.2890456
10     0.0186605     1.2829025
Mean: 0.0015659 1.2937017
Interpolating Critical Value Tables
Four critical value tables were required to analyze the
means of the generated scores. These were the F-distribution
table (3, Table E.5), the Studentized Range table (3, Table
E.7), the Duncan Multiple Range table (3, Table E.8), and the
Studentized Augmented Range table (3, Table E.18).
A BASIC program was written to interpolate the four
tables and store them in a single random access file for use
by the main program. Appropriate values were read from these
tables upon each k,J cycle of the simulation. These values
were printed at the top of each data summary sheet (see
Appendix E) in order to insure that each procedure was using
the proper testing criteria. Interpolated critical values
were considered sufficiently accurate for this study since
mainframe statistical packages such as SPSS and SAS use
interpolated critical values (4, p. 129).
The Main BASIC Program
The main program for this study, listed in Appendix C,
was written to do the following:
1. Initialize all variables and the printer.—This
subroutine dimensioned all variable arrays, set all variables
to their initial values, created all printing formats for
generated results and initialized the printout settings on
the printer.
2. Set cycle parameters.—This section set initial
values for k and J. The values of k ranged from 3 to 6 means
per experiment. The values of J ranged from 1 to 7, reflec-
ting seven categories of sample size. Upon initializing the
program, specific values could be selected for k and J. The
default values were k=3 and J=1. Subsequent cycles incre-
mented J to J+1 until J=8. J was then reset to 1 and k set
to k+1. Program execution ended when k incremented to 7.
4. Assigned sample n's.—For sample size conditions J(1)
through J(5), n was set to 5, 10, 15, 20, and 25 respec-
tively. For condition J(6), n was set in increments of 5,
beginning at 10. That is, for k=3, n1 was set to 10, n2 was
set to 15, and n3 was set to 20. Total N for J(6) ranged
from 45 (k=3) to 135 (k=6). For condition J(7), n1 was set
to 80, and all other n.'s to 20. Total N for J(7) ranged
from 120 (k=3) to 180 (k=6). These sample sizes were chosen
to reflect the common conditions of small k's and unequal n's
in educational research. Kirk recommends the HSD modifi-
cation by Spjøtvoll-Stoline when sample n's are approximately
equal and the Tukey-Kramer modification when there is a
moderate or large imbalance among sample n's (3, p. 120).
Therefore, J(6) was selected to reflect educational studies
in which groups are nearly equal in size and J(7) was
selected to reflect those studies in which one group is much
larger than the others.
5. Obtained critical values.—Critical values were read
from the four interpolated critical value tables which
resided in a single random access computer file. Each criti-
cal value was located in the random access file by a record
number computed by
R = (F-1)×1000 + (I-2)×200 + J        Eq. 24
where F was the file number (1-4), I was the table column
number, and J was the table row number. For the F-test, F
was 1, I was degrees of freedom between groups [k-1], and J
was degrees of freedom within groups [N-k]. The F-ratio
critical value for the case where k=4 and J=4 (equal n case,
n=20) was found as follows:
R = (F-1)×1000 + (I-2)×200 + J        Eq. 25
  = (1-1)×1000 + (3-2)×200 + 76
  = 276.
Record number 276 held the value of 2.739, the interpolated
F-test critical value for a = 0.05, dfb = 3, dfw = 76. The
critical value for the (F)LSD, located in the Studentized
Range Table (F=2, r=2) was found as follows:
R = (F-1)×1000 + (I-2)×200 + J        Eq. 26
  = (2-1)×1000 + (2-2)×200 + 76
  = 1076.
Record number 1076 held the value of 2.822, the interpolated
critical Studentized Range value for q(2), a = 0.05, dfw =
76. Values for hypothesis testing were obtained in this same
manner for the HSD, SNK(r) and MRT(r) for each k,J
combination. File three held values of the Multiple Range
table and file four held values of the Studentized Augmented
Range table, used by the Spjøtvoll-Stoline procedure.
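The record-locator arithmetic of equation 24 is simple enough to sanity-check in code (a Python sketch of the lookup; the original was BASIC, and the function name is mine):

```python
def record_number(f, i, j):
    """Record number in the random access file of critical values.

    f: file number (1-4), i: table column, j: table row,
    following R = (F-1)*1000 + (I-2)*200 + J (equation 24).
    """
    return (f - 1) * 1000 + (i - 2) * 200 + j

# F-test critical value for k=4, equal n=20: dfb = 3, dfw = 76
print(record_number(1, 3, 76))   # 276
# Studentized Range value q(2) for the (F)LSD, dfw = 76
print(record_number(2, 2, 76))   # 1076
```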
6. Generated scores and calculated an F-ratio.—The
specified number of scores were generated with the random
number generator. A sample of generated scores is located in
Appendix B. An F-ratio was computed and compared to the
appropriate F table value. If the computed F-ratio equalled
or was larger than the table value, then the F-ratio was
declared significant. A flag was set to "1" for a sig-
nificant F-test and "0" for a non-significant F-test. A
running count of significant F-ratios was maintained and
printed on the summary sheet.
7. Computed critical differences.—The six specified
multiple comparison procedures were used to test the pairwise
differences between group means. As demonstrated in Chapter
II, each multiple comparison procedure computes its critical
difference in a unique way. For the equal n cases (J=1 to
5), the standard error of the mean was computed the same way
for the (F)LSD, the MRT, the SNK and the HSD. The standard
error of the mean is the square root of the mean square
within value, taken from the ANOVA table, divided by n, the
number of scores in each group (1, p. 370):
s_x̄ = √(MSw/n)        Eq. 27
The critical difference for each multiple comparison
procedure was computed by multiplying the standard error of
the difference by the appropriate table critical value.
These equations are summarized as
(F)LSD = q(α,2,v) √(MSw/n)            Eq. 28
HSD    = q(α,k,v) √(MSw/n)            Eq. 29
SNK(r) = q(α,r,v) √(MSw/n)            Eq. 30
MRT(r) = m(α,r,v) √(MSw/n)            Eq. 31
SSD    = √[(k-1)F_cv] √[2MSw/n]       Eq. 32
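For the equal n cases the procedures differ only in the table value that multiplies the standard error, so equations 27 to 32 can be sketched together (a Python illustration; the q, m, and F inputs stand in for the interpolated table lookups, and q(α,4,76) below is an approximate value of my own, not taken from the text):

```python
from math import sqrt

def critical_differences(ms_within, n, k, q2, qk, f_cv):
    """Critical differences for the equal n case (equations 27-29, 32).

    q2 = q(alpha, 2, v) and qk = q(alpha, k, v); the SNK and MRT
    use q(alpha, r, v) and m(alpha, r, v) in the same pattern.
    """
    se = sqrt(ms_within / n)                              # equation 27
    flsd = q2 * se                                        # equation 28
    hsd = qk * se                                         # equation 29
    ssd = sqrt((k - 1) * f_cv) * sqrt(2 * ms_within / n)  # equation 32
    return flsd, hsd, ssd

# Illustrative inputs: MSw = 100, n = 20, k = 4,
# q(.05,2,76) = 2.822, q(.05,4,76) ~ 3.715, F(.05,3,76) = 2.739
flsd, hsd, ssd = critical_differences(100.0, 20, 4, 2.822, 3.715, 2.739)
print(flsd < hsd < ssd)  # True: (F)LSD smallest, SSD largest
```

The ordering (F)LSD < HSD < SSD in this sketch matches the low-to-high pattern the text reports for Tables XI to XIV.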
Tables XI to XIV (See Appendix G) contain critical difference
values for each multiple comparison procedure for the equal n
cases for k = 3 to 6. These values are the actual values
computed for the specified k,J combination on a randomly
selected ith cycle.
It is inappropriate to compare critical differences
across J because they depend directly on the magnitude of
MSw, which was computed from randomly varying scores. Values
for each k,J combination reflect one random cycle of its set
of 1000 cycles. For a specified k,J combination, however,
one can compare the critical differences across procedures.
In every case we find the differences ranging from low — the
(F)LSD — through the MRT, SNK, and HSD to high — the SSD.
For the two k,J conditions for unequal sample sizes (J=6
and J=7), a slightly different procedure was used to compute
critical differences. The standard error of the mean re-
quired modification to deal with unequal n's. For the
(F)LSD, the MRT, the SNK, and the Kramer modification of HSD,
n was replaced by the harmonic mean of the n's of the two
means being tested (2, p. 302; 3, p. 120). That is, n was
replaced by
n_h = 2 / (1/n_i + 1/n_j)        Eq. 33
This procedure was recommended by Keselman, Murray and Rogan
as inducing less bias into the critical difference than
computing n_h on all sample n's in the experiment (2, p.
302). The formula for the standard error of the mean used for
the unequal n case was therefore √(MSw/n_h) for the (F)LSD, the
MRT, the SNK, and the Tukey-Kramer HSD.
A second modification of the HSD used in this study was
the Spjøtvoll-Stoline. The standard error of the mean was
computed using the smaller of the two n's of the two means
being tested. This number, referred to as n_min, directly
replaced the n in the formula for the standard error of the
mean (3, p. 119). Using the minimum n rather than the har-
monic mean between the two n's increased the size of the
standard error of the mean, and, in turn, increased the
magnitude of the critical difference. Therefore this modifi-
cation was more conservative than the Tukey-Kramer.
The modification for the standard error of the mean for
SSD for unequal n's involved multiplying MSw by (1/n_i + 1/n_j)
where n_i and n_j are the sample sizes for the two means being
tested (3, pp. 121-2).
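The two unequal n modifications can be compared directly (a Python sketch; the pairing of 80 and 20 mirrors the J(7) condition, and MSw = 100 is illustrative):

```python
from math import sqrt

def se_tukey_kramer(ms_within, n_i, n_j):
    """Standard error using the harmonic mean of the pair (equation 33)."""
    n_h = 2.0 / (1.0 / n_i + 1.0 / n_j)
    return sqrt(ms_within / n_h)

def se_spjotvoll_stoline(ms_within, n_i, n_j):
    """Standard error using the smaller n of the pair."""
    return sqrt(ms_within / min(n_i, n_j))

# J(7) pairing: one group of 80 and one of 20, with MSw = 100
print(round(se_tukey_kramer(100.0, 80, 20), 3))       # harmonic n = 32
print(round(se_spjotvoll_stoline(100.0, 80, 20), 3))  # n_min = 20
```

Since n_min never exceeds the harmonic mean of the pair, the Spjøtvoll-Stoline standard error is never smaller, which is why it yields the more conservative test.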
Tables XV to XVIII (See Appendix H) contain critical
difference values for each multiple comparison procedure for
the unequal n cases and k = 3 to 6. The order of these
values differs from those of Tables XI to XIV. There, criti-
cal differences were given for each k,J combination and, for
the range tests, the value of r. In Tables XV to XVIII,
critical differences are given for specific paired-mean
tests. Means were rank ordered from high to low before the
multiple comparison tests were applied. Differences between
paired means were then ordered such that, given three means,
d=1 means the difference between means 1 and 3; d=2 the
difference between 1 and 2; and d=3 the difference between 2
and 3. The critical difference used to test the pairwise
difference was based on the specific n's in the pair. There-
fore for J=6 each pairwise difference is tested by a unique
critical difference. These critical differences are shown in
Tables XV to XVIII. In the case of the SNK and MRT, the
first test (d=1) must be significant for the second test
(d=2) to be made. The complete procedure for making stepwise
tests is given in Appendix D. Suffice it to say here that
the dash (-) in the table refers to range test values not
computed because a preliminary test proved not significant.
It is inappropriate to compare critical differences
across d in the tables because the ordered comparisons,
sample sizes, and the resultant critical values varied ran-
domly within each selected cycle of its set of 1000 cycles.
For a given comparison (d), however, one can compare the
critical differences across procedures. All procedures may
be directly compared for d=1. Procedures other than the MRT
and SNK can be compared for all values of d. In most cases
we find the differences ranging from low — the (F)LSD —
through the MRT, SNK, HSD-TK, and the HSD-SS to high — the SSD.
In some cases the critical differences for the HSD-SS
procedure were more conservative than the SSD.
8. Computed summary statistics.—One thousand experi-
ments were simulated for each k,J combination. Separate
counts for each multiple comparison procedure were made of
the number of (1) all experiments with at least one Type I
error, (2) experiments with a significant F-ratio and at
least one Type I error, (3) all comparisons incorrectly
declared significantly different, and (4) comparisons incor-
rectly declared significantly different in experiments with a
significant F-ratio. The variables which held these differ-
ing counts are shown in Table XIX.
The variables without a percent sign (%) were counters
which tracked each procedure's experimentwise and comparison-
wise errors. The counters under "EXPERIMENTWISE - ALL
ANOVAS" were incremented each time at least one Type I error
was made within an experiment. The first Type I error within
an experiment set a flag which caused all other Type I errors
within that experiment to be ignored. The counters under
"EXPERIMENTWISE - SIG ANOVAS" were incremented only when a
Type I error was made within an experiment with a significant
F-ratio. Counters under "COMPARISONWISE - ALL ANOVAS" were
incremented each time a comparison was declared significantly
different. Counters under "COMPARISONWISE - SIG ANOVAS" were
incremented only when significant comparisons were detected
in experiments with a significant F-ratio.
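The counting scheme just described, with a per-experiment flag so experimentwise errors count at most once, can be sketched as follows (a Python illustration; the names and demo data are mine):

```python
def tally_errors(experiments):
    """Tally experimentwise and comparisonwise Type I errors.

    experiments: list of (f_significant, comparison_flags) pairs,
    where comparison_flags marks each pairwise test declared
    significant (all nulls are true, so every rejection is an error).
    """
    ew_all = ew_sig = cw_all = cw_sig = 0
    for f_sig, flags in experiments:
        errors = sum(flags)
        cw_all += errors            # every false rejection counts
        if errors > 0:
            ew_all += 1             # at most once per experiment
        if f_sig:
            cw_sig += errors
            if errors > 0:
                ew_sig += 1
    return ew_all, ew_sig, cw_all, cw_sig

demo = [(True, [1, 1, 0]), (False, [0, 1, 0]), (True, [0, 0, 0])]
print(tally_errors(demo))  # (2, 1, 3, 2)
```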
TABLE XIX
VARIABLES ASSOCIATED WITH SPECIFIED ERROR COUNTS FOR EACH MULTIPLE COMPARISON PROCEDURE AND TWO KINDS OF TYPE I ERROR

                    EXPERIMENTWISE              COMPARISONWISE
Procedure       ALL ANOVAS   SIG ANOVAS     ALL ANOVAS   SIG ANOVAS
LSD             LE   PLE%                   LC   PLC%
FLSD                         FE   PFE%                   FC   PFC%
MRT             ME   PME%    SME  PSME%     MC   PMC%    SMC  PSMC%
SNK             NE   PNE%    SNE  PSNE%     NC   PNC%    SNC  PSNC%
HSD*            HE   PHE%    SHE  PSHE%     HC   PHC%    SHC  PSHC%
UNEQUAL N HSD
  HSD-SS*       HE   PHE%    SHE  PSHE%     HC   PHC%    SHC  PSHC%
  HSD-TK        TE   PTE%    STE  PSTE%     TC   PTC%    STC  PSTC%
SSD             SE   PSE%    SSE  PSSE%     SC   PSC%    SSC  PSSC%

* HSD and HSD-SS shared these variables since they ran under different J's.
The variables ending in a percent sign (%) held error
rate percentages. Experimentwise error counts were divided
by 1000 to yield a proportion, then multiplied by 100 to
compute the percentage. Per comparison error counts were
divided by 1000 × k(k-1)/2, then multiplied by 100 to yield
percentages.
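As a quick check of the comparisonwise denominator (Python; the counts are illustrative, and a k=4 design has k(k-1)/2 = 6 pairwise tests per experiment):

```python
def comparisonwise_pct(error_count, k, repetitions=1000):
    """Per cent of all pairwise comparisons falsely declared significant."""
    comparisons = repetitions * k * (k - 1) // 2
    return 100.0 * error_count / comparisons

# 300 false rejections across 1000 four-group experiments
print(comparisonwise_pct(300, k=4))  # 5.0
```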
All programs were developed in interpreted BASIC to
facilitate interactive testing of the procedures. The
greatest drawback of interpreted BASIC is its slowness of
execution. Each program statement is translated into machine
language as it is being executed. One thousand repetitions
of the smallest k,J combination — k=3, J=1 — required 57
minutes to run. Therefore it was decided to compile the main
program. Using a BASIC compiler, the interpreted BASIC
program was converted into machine language. When the com-
piled program executed, there was no translation of state-
ments necessary, and 1000 repetitions of k=3 and J=1 ran in
just under five minutes. Run times for each k,J combination
were printed on the summary sheets.
For α = 0.05, it was expected that the count of sig-
nificant ANOVAs out of 1000 repetitions would be 50. Due to
the randomness of the data, however, there were sets of 1000
repetitions that produced too many or too few significant
F's. It was also noted that tests made by the multiple
comparison procedures were affected by this fluctuation: as
the count of significant F's increased or decreased, so did
the count of significant comparisons. It was decided that
only data which generated between 46 and 54 significant F's
in 1000 repetitions (rounding error for 0.050) would be used
in comparing the multiple comparison procedures. Table XX
gives the counts of significant F-ratios for each k,J com-
bination for 68,000 experiments.
TABLE XX
COUNTS OF SIGNIFICANT F-RATIOS FOR 1000 REPETITIONS OF EACH k,J COMBINATION
k = 3            k = 4            k = 5            k = 6
1 39 50 51 59 49 44 53
2 57 4-0 61 56 54 43 56 45 49 52
3 42 63 50 42 53 49 44 66 35 56 53
4 36 56 50 38 42 43 47 64 55 51 60 45
5 54 55 54 53 54
6 56 58 53 61 46 56 44 46 56 42 46
7 5<? 53 45 44 49 58 55 62 43 63 37 48
9. Printed out results.—Summary statistics for the F-
tests and multiple comparison procedures for each k,J com-
bination were printed on a single 8.5x11" page. These data
summaries are located in Appendix E.
Summary
In summary, a Monte Carlo simulation procedure was used
where (1) experiment size (k) varied from 3 to 6 means; (2)
there were seven sample size patterns (J); (3) the population
from which scores were drawn had a mean of 100 and a standard
deviation of 10; (4) 1000 repetitions were computed for each
k,J combination; (5) five multiple comparison procedures were
applied to each pairwise comparison in each experiment under
the two conditions of (a) all experiments regardless of F-
ratio significance and (b) experiments in which the F-ratio
was significant at the 0.05 level; (6) counts and percentages
were maintained for the six specified procedures: LSD, FLSD,
MRT, SNK, HSD [HSD-SS and HSD-TK for unequal n cases] and
SSD.
CHAPTER BIBLIOGRAPHY
1. Glass, Gene V. and Hopkins, Kenneth D., Statistical Methods in Education and Psychology, 2nd ed., Englewood Cliffs, New Jersey, Prentice-Hall, Inc., 1984.
2. Howell, David C., Statistical Methods for Psychology, Boston, Duxbury Press, 1982.
3. Kirk, Roger E., Experimental Design: Procedures for the Behavioral Sciences, 2nd ed., Belmont, California, Brooks/Cole Publishing Company, 1982.
4. Wilkinson, Leland, SYSTAT: The System for Statistics, SYSTAT, Inc., Evanston, Ill., 1984.
CHAPTER IV
ANALYSIS OF DATA
The results of this study of specified multiple com-
parison procedure error rates are presented in two major
sections. The first section deals with the data directly
related to the two hypotheses stated in Chapter I. The
second section presents analysis of related data.
Hypotheses
The first hypothesis of this study was that there would
be no difference in the ranking of error rates found by
Carmer and Swanson (1973) using large k's and equal n's and
the ranking obtained in this study using small k's and un-
equal n's. The procedure for testing this hypothesis was to
compare the Type I error rates generated in this study to
those computed in the Carmer and Swanson study. If the
results of this comparison demonstrated that the level of
error rate followed the ranking of LSD > FLSD > MRT > SNK >
HSD > SSD, then hypothesis one was to be accepted. The em-
pirical data for experimentwise and comparisonwise Type I
error rates produced in unequal n cases for protected and
unprotected conditions are summarized in Table XXI.
TABLE XXI
EXPERIMENTWISE ERROR RATES FOR MULTIPLE COMPARISON PROCEDURES AVERAGED ACROSS
UNEQUAL N'S FOR K = 3 TO 6
Procedure        3        4        5        6
LSD            12.4%    19.3%    29.7%    35.2%
FLSDa           5.3      4.6      4.8      4.7
MRT             9.7     12.3     18.3     19.9
MRTa            5.1      4.4      4.7      4.5
SNK             5.1      4.7      4.8      4.3
SNKa            4.7      3.7      3.6      3.0
HSD-SS          3.0      2.6      3.0      2.6
HSD-SSa         2.8      2.4      2.3      2.2
HSD-TK          5.6      5.0      5.2      4.9
HSD-TKa         5.0      3.9      3.9      3.4
SSD             3.8      2.4      1.5      1.2
SSDa            3.8      2.4      1.5      1.2
Congruency      No b     No c     No c,d   No c
Congruencya     No       No       No       No

a Significant experiments. b HSD-SS < SSD and HSD-TK > SNK.
c HSD-TK > SNK. d HSD-SS = SSD.
In summary, the rankings of error rates for small k's and un-
equal n's were congruent with the rankings given by Carmer
and Swanson with the following exceptions: HSD-SS < SSD and
HSD-SSa < SSDa (k=3); HSD-SSa < SSDa (k=4); and SNK < HSD-TK
(all k). Table XXI shows that 37 of the 48 k,J rankings, or
77 per cent, were in the proper order.
It was shown in Chapter II that experimentwise and
comparisonwise Type I error rates are related in a fixed
manner by the equation
1 - α_ew = (1 - α_pc)^c        Eq. 34
where c is the number of comparisons to be made. It follows,
then, that the rankings of comparisonwise Type I error rates
should prove congruent with experimentwise Type I error
rates. The empirical data for comparisonwise Type I error
rates for unequal n cases are summarized in Table XXII. In
each case, with minor exceptions, the ranking of the Carmer
and Swanson studies was replicated.
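A quick numeric check of equation 34 (Python; treating the c comparisons as independent, which is the idealized case the equation assumes):

```python
def experimentwise_rate(alpha_pc, c):
    """Experimentwise rate implied by a per-comparison rate over c tests."""
    return 1.0 - (1.0 - alpha_pc) ** c

# k = 3 means give c = k(k-1)/2 = 3 pairwise comparisons
print(round(experimentwise_rate(0.05, 3), 3))   # 0.143
# k = 6 means give c = 15 comparisons
print(round(experimentwise_rate(0.05, 15), 3))  # 0.537
```

These figures echo the empirical pattern in Table XXI, where the unprotected LSD's experimentwise rate grows from roughly 12 per cent at k=3 to 35 per cent at k=6.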
In summary, the rankings of comparisonwise Type I error
rates for small k's and unequal n's were congruent with the
rankings of error rates given by Carmer and Swanson with the
exception of the following relationships: HSD-SS < SSD and
HSD-SS* < SSD* (k=3); HSD-SS* < SSD* (k=4); and SNK < HSD-TK
(k=6). Table XXII shows 44 of 48 of the k,J rankings, or 92
per cent, were in the proper order. When the overly conser-
vative HSD-SS procedure is excluded, 47 of 48, or 98 per
cent, fall in the proper order. Summing the counts from both
experimentwise and comparisonwise error rates, there are 81
of 96 of the k,J combinations, or 84 per cent, that fall in
the proper ranking. When the overly conservative HSD-SS is
excluded, there are 71 of 80, or 89 per cent, in the proper
rankings. Therefore, hypothesis one is retained for small
k's and unequal n's with the exceptions of the conservatism
of HSD-SS and liberalism of HSD-TK as they relate to the
equal n HSD procedure.
TABLE XXII
COMPARISONWISE ERROR RATES FOR MULTIPLE COMPARISON PROCEDURES AVERAGED ACROSS
UNEQUAL N'S FOR K = 3 TO 6
Procedure        3        4        5        6
LSD            5.25%    4.80%    5.30%    5.02%
FLSD*          2.67     1.70     1.48     1.31
MRT            4.34     3.13     3.14     2.63
MRT*           2.63     1.57     1.28     0.99
SNK            2.42     1.26     0.75     0.43
SNK*           2.24     1.09     0.62     0.43
HSD-SS         1.08     0.53     0.40     0.24
HSD-SS*        1.00     0.49     0.33     0.19
HSD-TK         2.17     1.11     0.69     0.46
HSD-TK*        1.97     0.93     0.56     0.36
SSD            1.43     0.50     0.22     0.10
SSD*           1.73     0.50     0.22     0.10
Congruency     No**     Yes      Yes      No***
Congruency*    No**     No**     Yes      Yes

* Significant experiments. ** HSD-SS < SSD. *** HSD-TK > SNK.
The rankings of experimentwise Type I error rates for
the equal n cases were congruent with the findings of Carmer
and Swanson with the exceptions of the relationships of SNK =
HSD for all k,J and FLSD* = MRT* (k=6). Table XXIII shows
that 29 of 40 k,J rankings, or 73 per cent, were in the
proper order.
TABLE XXIII
EXPERIMENTWISE ERROR RATES FOR MULTIPLE COMPARISON PROCEDURES AVERAGED ACROSS
EQUAL N'S FOR K = 3 TO 6
Procedure        3        4        5        6
LSD           11.96%   19.18%   27.56%   35.68%
FLSDa          5.16     5.14     4.86     5.20
MRT            9.70    13.56    18.68    22.48
MRTa           5.12     5.14     4.86     5.20
SNK            5.12     5.04     4.64     5.00
SNKa           4.74     4.30     3.70     3.80
HSD            5.12     5.04     4.64     5.00
HSDa           4.74     4.30     3.70     3.80
SSD            4.22     2.52     1.78     1.34
SSDa           4.22     2.52     1.78     1.34
Congruency     No b,c   No b,c   No b,c   No b,c
Congruencya    No b,c   No b,c   No b,c   No b,c

a Significant experiments. b FLSD = protected MRT.
c SNK = HSD.
If the ties are considered neutral, that is, as not refuting
the overall ranking order, then 40 of the 40 k,J rankings, or
100 per cent, are in proper order.
Table XXIV shows that the rankings of α_pc for the equal
n cases were congruent with the findings of Carmer and Swan-
son with the one exception of FLSD* = MRT* (k=3). Thirty-
nine of the 40 k,J rankings, or 98 per cent, are in the
proper order. If the ties are eliminated, then the ordering
is 100 per cent congruent.
TABLE XXIV
COMPARISONWISE ERROR RATES FOR MULTIPLE COMPARISON PROCEDURES AVERAGED ACROSS
EQUAL N'S FOR K = 3 TO 6
Procedure        3        4        5        6
LSD            5.11%    4.89%    5.05%    5.06%
FLSD*          2.62     2.05     1.62     1.52
MRT            4.25     3.61     3.28     3.12
MRT*           2.62     1.89     1.40     1.28
SNK            2.51     1.32     0.77     0.62
SNK*           2.39     1.20     0.68     0.54
HSD            2.07     1.05     0.63     …
HSD*           1.94     0.93     0.53     …
SSD            1.64     0.51     0.23     0.12
SSD*           1.64     0.51     0.23     0.12
Congruency     Yes      Yes      Yes      Yes
Congruency*    No**     Yes      Yes      Yes

* Significant experiments. ** FLSD* = MRT*.
The second hypothesis of this study was that there would
be no statistically significant difference in experimentwise
Type I error rate between the HSD and FLSD procedures when
using the Bernhardson formulas. The procedure for testing
hypothesis two was to test the difference between error rates
computed from the Bernhardson formulas for HSD and FLSD using
the z-test for difference between proportions (1, pp. 230-
232). Table XXV, located in Appendix I, shows the results of
these tests. Of the 36 comparisons, 21 were not signifi-
cantly different. That is, the FLSD and HSD procedures
produced comparable experimentwise Type I error rates. In
every case where a significant difference was found, the
difference was due to the conservatism of the HSD, not the
liberalism of the FLSD. The significant differences were due
to the HSD error rates falling below nominal a. Eight of the
15 significant differences were produced by the HSD-SS which
consistently yielded excessively conservative error rates.
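The z-test for the difference between two proportions used here can be sketched as follows (a Python illustration of the standard pooled-proportion form; whether it matches the Hinkle, Wiersma, and Jurs formulation in every detail is an assumption, and the error counts are illustrative):

```python
from math import sqrt

def z_two_proportions(x1, n1, x2, n2):
    """z-statistic for H0: p1 = p2, using the pooled proportion."""
    p1, p2 = x1 / n1, x2 / n2
    p = (x1 + x2) / (n1 + n2)          # pooled estimate under H0
    se = sqrt(p * (1 - p) * (1 / n1 + 1 / n2))
    return (p1 - p2) / se

# e.g. 50 vs 30 experimentwise errors out of 1000 repetitions each
z = z_two_proportions(50, 1000, 30, 1000)
print(round(z, 3))  # 2.282, beyond the ±1.96 criterion at alpha = 0.05
```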
The comparisons in Table XXV were made between the FLSD
and the HSD under the "Significant F-ratio" condition
produced by the Bernhardson formulas. Carmer and Swanson
applied the HSD without regard to the significance of the F-
ratio. Table XXVI (Appendix I) shows the comparison between
FLSD and the "unprotected HSD." Of the 36 comparisons, 30
were not significantly different. That is, the FLSD and HSD
procedures produced comparable experimentwise Type I error
rates when the HSD was applied without regard to the sig-
nificance of the F-ratio. All six of the significant dif-
ferences were produced by conservatism of HSD-SS, not the
liberalism of the FLSD.
Therefore, hypothesis two is accepted for 51 of 72
comparisons, or 71 per cent. When the consistently conserva-
tive HSD-SS procedure is excluded from the analysis, hypothe-
sis two is accepted for 49 of 56 comparisons, or 88 per cent.
Related Findings
There are several related findings resulting from
analysis of the data which fall outside the scope of the
hypotheses but which have direct bearing on the study -of
multiple comparison procedures. These are described in this
section.
1. A confidence interval (0.95) was computed to deter-
mine which procedures produced experimentwise error rates
significantly different from 0.05 (1, p. 230). The limits of
this confidence interval were 0.0635 and 0.0365. Using the
Bernhardson formulas for experimentwise Type I error rate, no
procedure produced an error rate in excess of 0.0635. The
FLSD and protected MRT procedures yielded consistent nominal
error rates under all k,J combinations. The protected SNK
procedure produced error rates below 0.0365 in nine k,J
combinations: k=4, and 7; k=5, J=3, 4, 6 and 7; and k=6,
J=5, 6 and 7. The protected HSD-TK produced error rates
below 0.0365 in only two k,J combinations: k=6, J=6 and 7.
The protected HSD-SS procedure produced error rates below
0.0365 in all unequal k,J combinations. The SSD produced
error rates below 0.0365 in all k,J combinations above k=3,
J=6. Figures 7 to 10 (See Appendix J) graphically display
the error rate relationships among the multiple comparison
procedures selected for this study. Figure 7 is a graphic
presentation of experimentwise Type I error rates in relation
to 0.95 confidence interval for 0.05 level of significance
generated by application of Bernhardson formulas for k=3 and
k=4. Figure 8 depicts experimentwise Type I error rates in
relation to 0.95 confidence interval for a =0.05 and N=1000
generated by application of Bernhardson formulas for k=5 and
k=6. Figure 9 depicts experimentwise Type I error rates in
relation to 0.95 confidence interval for a =0.05 and N=1000
generated without prior significant F-ratio for k=3 and k=4.
Figure 10 depicts experimentwise Type I error rates in rela-
tion to 0.95 confidence interval for a =0.05 and N=1000
generated without prior significant F-ratio for k=5 and k=6.
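The confidence limits 0.0365 and 0.0635 quoted above follow from the normal approximation for a proportion of 0.05 over N = 1000 repetitions, which is easy to verify (a Python sketch):

```python
from math import sqrt

def proportion_ci(p, n, z=1.96):
    """Normal-approximation confidence interval for a proportion."""
    half_width = z * sqrt(p * (1 - p) / n)
    return p - half_width, p + half_width

low, high = proportion_ci(0.05, 1000)
print(round(low, 4), round(high, 4))  # 0.0365 0.0635
```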
2. The HSD-TK modification produced error rates much
closer to nominal a and the equal n HSD than the HSD-SS
procedure for both unequal n conditions. The HSD-TK produced
slightly higher error rates than the SNK for the unequal n
conditions. The HSD-SS produced error rates closer to SSD
than HSD levels, and at times yielded a more conservative
test than SSD.
3. The SNK and HSD procedures produced identical ex-
perimentwise Type I error rates for all equal n cases.
4-. The FLSD and protected MRT procedures produced the
same experimentwise Type I error rates in 21 of the 28 k,J
combinations. Error rates of the protected MRT were slightly
lower in the following combinations: k=3, J=4 and J=7; k=4,
J=7; k=5, J=7; and k=6, J=6 and J=7.
5. The unprotected LSD and MRT produced experimentwise
Type I error rates far in excess of nominal α for all k,J.
6. The unprotected LSD was the only procedure to produce
a comparisonwise Type I error rate equal to nominal a. All
other procedures produced comparisonwise Type I error rates
below α. Use of the FLSD decreased α_pc below α, but not as
low as the HSD α_pc.
CHAPTER BIBLIOGRAPHY
1. Hinkle, Dennis E.; Wiersma, William and Jurs, Stephen G. Basic Behavioral Statistics. Boston, Houghton Mif-' flin Company, 1982.
CHAPTER V
CONCLUSIONS, RECOMMENDATIONS AND SUGGESTIONS FOR FURTHER RESEARCH
The driving force behind this study has been the desire
to work through the statistical jargon and conflicting as-
sumptions in the literature on multiple comparison procedures
to learn first-hand when and how they should be applied. The
problem of which procedure to use has proven to be related as
much to philosophical perspective as it is to mathematical
precision. It has been shown that each assertion for or
against a given procedure has been accompanied by theoretical
or empirical data analysis. The bewildering array of argu-
ments pro and con can leave the researcher, interested in his
subject more than the subtleties of statistical theory, with
little assurance that the procedure he has chosen will serve
his purposes better than those he did not choose. The recom-
mendations that follow come from both the synthesis of the
literature and the empirical analysis of data generated for
this study.
Conclusions
It is clear from the empirical data that if one's inter-
est is truly in testing paired comparisons in a parsimonious
design, the Least Significant Difference is the procedure of
choice. The unprotected LSD was the only procedure to yield
a comparisonwise Type I error rate at nominal α. This
parallels Carmer's most recent recommendation for the LSD
(3). The use of the FLSD reduces the comparisonwise Type I
error rate, but consequently reduces the power of the test as
well. Carmer suggests using the FLSD when the possibility is
good that all group means in an experiment are equal (4).
Otherwise, if the researcher has good reason to believe that
all means are not equal, the preliminary F-test is unneces-
sary. The smaller k's and unequal n's prevalent in educa-
tional research designs — due to sampling difficulties and
mortality of human subjects — suggest the benefit of at
least minimal protection afforded by the preliminary F-test.
It is clear from the empirical data that the FLSD
procedure does indeed hold the experimentwise Type I error
rate to nominal α for the complete null hypothesis. Com-
parison of the FLSD and HSD procedures reveals the FLSD
consistently yielded experimentwise error rates closer to
nominal α than the HSD, with a larger comparisonwise error
rate for every k,J combination. That is, the FLSD is more
sensitive to sample mean differences than the HSD while
protecting against experimentwise error.
Reviewing the writings of R. A. Fisher revealed the
conservatism of the HSD procedure. Given an experiment with
10 means (45 comparisons), he suggested that testing the one
pair with the largest difference out of the 45 should be made
against a probability of "1 in 900" rather than "1 in 20" (6,
p. 66). It was shown that this "1 in 900" is the same
probability, 0.0011, that the HSD would use to test all 44
remaining pairwise comparisons. If the interest of the
researcher is paired comparisons, the HSD becomes less at-
tractive as the number of means in his experiment increases.
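Fisher's figures can be verified directly. With ten means there are 10·9/2 = 45 pairwise comparisons, and spreading the nominal 0.05 across all of them gives 0.05/45 ≈ 0.0011, roughly "1 in 900." A minimal check of this arithmetic, sketched in Python rather than the study's own BASIC:

```python
# Check of Fisher's "1 in 900": with k = 10 means, the largest of the
# k(k-1)/2 = 45 pairwise differences is tested at about 0.05/45.
k = 10
n_comparisons = k * (k - 1) // 2       # 45 pairwise comparisons
per_comparison_alpha = 0.05 / n_comparisons

print(n_comparisons)                   # 45
print(round(per_comparison_alpha, 4))  # 0.0011
print(round(1 / 900, 4))               # 0.0011
```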
Einot and Gabriel criticized the Carmer and Swanson
findings as "simple consequences of error rates defined by
the procedures, rather than the techniques themselves" (5, p.
577). Their solution was to set all of the procedures to the
same experimentwise error rate and compare their perfor-
mance. Since the type of error rate defined by a procedure
is an integral part of the procedure itself, it seems Einot
and Gabriel violated the procedures by forcing one kind of
error rate on all. Their suggestion that power of the HSD
can be raised by increasing α from 0.05 to 0.25 begs the
question. Why strongly defend an experimentwise procedure,
in order to protect against excessive experimentwise error,
and then recommend raising the level of significance? Games
criticizes the Petrinovich and Hardyck recommendation on the
same grounds: "when one specifies a conservative test, and
then says that if this test is not significant, he will use a
more liberal test, he is merely adding confusion and incon-
sistency to his decision rule" (7, p. 100).
The SNK and MRT error rates fell between the FLSD and
HSD rates where they were expected. The SNK experimentwise
rate equalled the HSD rate for all k under the equal n
conditions. Analysis of data drawn from both significant and
non-significant experiment cycles showed that a pairwise
difference in a given experiment large enough to be detected
by SNK was also large enough to be detected by HSD. This is
because both procedures test the largest difference by the
same critical difference. Secondary significant differences
did not alter the experimentwise rate — one Type I error per
experiment is all that is considered. This finding opposes
Ramsey's objection to the SNK for its "excessive experiment-
wise error rate" (14, p. 482). Perhaps this is due to the
complete null condition used in this study. Petrinovich and
Hardyck report that the error rates for the SNK are similar
to the MRT — too high — "for all conditions save the com-
plete null hypothesis" (13, p. 53). At any rate, the SNK and
MRT appear to fall somewhere between pure comparisonwise and
experimentwise error levels. Carmer and Walker criticize
both SNK and MRT for this very reason. It is better, in
their opinion, to choose an error rate which can best answer
the questions of a given study, and then apply a procedure
defined by that error rate (3, p. 21).
Of the two unequal n modifications of HSD, the Tukey-
Kramer test fared much better than the Spjøtvoll-Stoline. In
some cases the HSD-SS was more conservative than the
Scheffe. The conservatism of the HSD-SS was not due to its
being a "better test," but rather due to the way it handled
the unequal n situation. Use of the harmonic mean of the n's
from the two samples being tested yielded uniformly better
results than use of the minimum n. The HSD-TK consistently
yielded error rates within the 0.95 confidence interval.
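The contrast between the two modifications lies in their error terms. The sketch below (Python; the MSW and sample sizes are illustrative values, not output from this study) compares the Tukey-Kramer error term, built on the average of the reciprocal sample sizes (equivalently, on the harmonic mean of the two n's), with a minimum-n error term of the kind used by the HSD-SS. The quantiles the two tests consult are a separate matter not shown here:

```python
import math

msw = 100.0         # illustrative within-groups mean square (assumed value)
n_i, n_j = 10, 40   # illustrative unequal sample sizes (assumed values)

# Tukey-Kramer error term: based on the mean of 1/n_i and 1/n_j,
# i.e., on the harmonic mean of the two sample sizes
se_tk = math.sqrt((msw / 2.0) * (1.0 / n_i + 1.0 / n_j))

# Minimum-n error term, as in the Spjotvoll-Stoline modification
se_min = math.sqrt(msw / min(n_i, n_j))

print(se_tk)             # 2.5
print(round(se_min, 3))  # 3.162  (larger, hence a more conservative test)
```

With badly unbalanced n's the minimum-n term is markedly larger, which is the arithmetic behind the HSD-SS sometimes falling below even the Scheffé rates.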
The Scheffe Significant Difference demonstrated its
severe limitations for use with pairwise comparisons. For
all k > 3, the SSD yielded error rates below the lower limit
of the 0.95 confidence interval.
The Bernhardson formulas proved essential for control-
ling the experimentwise error rates of the LSD and MRT
procedures. Application of these formulas yielded experi-
mentwise error rates not significantly different from
nominal α. The Bernhardson formulas were not helpful for the
SNK and HSD procedures. Under some conditions, the preceding
significant F-test pushed the HSD below the lower limit of
the 0.95 confidence interval. Use of the unprotected HSD
resulted in fewer significant departures from nominal α. The
Bernhardson formulas had no effect on the SSD.
The "perpetual dilemma" (7, p. 99; 10) of increasing one
type of error by reducing the other points up the limitation
of statistical tests. Simultaneously reducing both types of
error is achieved only by improving the research design
itself. This includes increasing the number of subjects,
using more precise tools of measurement, or using research
designs that better partition error (2, p. 95; 9, p. 1375),
such as in the use of homogeneous blocks (8).
Recommendations
Is the researcher's concern focused on each pair of
means, or with the entire family of all pairs? If his
concern is with each pair, then the use of the
unprotected LSD is most appropriate. If his concern is with
the entire experiment, then the unprotected HSD is most
appropriate.
Is the researcher reasonably sure that his treatment
means are different? If he is, then the preliminary F-test is
unnecessary and restricts the ability of the test to detect
pairwise differences. Use of the unprotected LSD is most
appropriate. If he is unsure about the equality of treatment
means — a common occurrence with the smaller k's found in
educational research — then the preliminary F-test can be
applied. Use of the FLSD is most appropriate.
What is the anticipated cost of implementing the
findings of the study? Will the incorrect rejection of the
null hypothesis (Type I error) result in large commitments of
time and financial resources? If so, it is recommended to
protect against this possibility by using the HSD or the
FLSD. Will the incorrect retention of the null hypothesis
(Type II error) result in inefficient programs or methods
being continued while better procedures are not explored? If
so, it is recommended to protect against this possibility by
using the LSD if group means are known a priori to be dif-
ferent, or the FLSD if they are not.
How important is the magnitude of the difference being
sought? If the researcher is seeking only those differences
large enough to engender economically feasible changes which
will produce practical improvements, then the conservative
HSD or SSD can be applied. If, however, the researcher is
exploring a broad area of interest and seeks those dif-
ferences, however small, that are at least statistically
different, if not practically different, the FLSD or LSD
procedures can be applied. Differences found through this
approach can be focused on in further studies.
Suggestions for Further Study
There are several questions that have been suggested by
this study that could be investigated further. These are
discussed in this section.
This study was limited to the complete null hypothesis.
Its findings are limited to situations where one is testing
group means under the complete null condition. Bernhardson
also tested the complete null situation (1). Ryan raises the
question of "partial" null hypotheses in which, for example,
nine means are equal and the tenth is much larger (15, p.
354). Even Carmer and Swanson reported an experimentwise
Type I error rate as high as 45.5 per cent for the FLSD under
the partial null condition (2, Table 4, p. 70). Ryan
criticizes not only Carmer and Swanson, but also Keselman,
Games, and Rogan (1979, 1980) for "perpetuating the same
misleading recommendations by considering only complete
nulls" (15, p. 355). Related studies could be done to inves-
tigate the error rates and power for partial null conditions.
This study used two patterns of unequal n's related to
the recommendations of Kirk for using the Tukey-Kramer and
Spjøtvoll-Stoline modifications of the HSD. Related studies
could be done to investigate other patterns of unequal n's
with regard to specific kinds of educational research.
This study simulated the random drawing of scores from a
single population with mean of 100 and standard deviation of
10. A related study could be done to investigate the effects
of differing population variances on error rate and power of
the multiple comparison procedures using the Games-Howell and
Tamhane modifications of the HSD (11, pp. 120-121).
In the most recent publication of Carmer and Walker, the
suggestion was made that the study of differing rates of
application of pesticide, for example, would be better
analyzed with trend analysis through multiple regression than
by way of multiple comparisons (3, p. 5). Testing the sig-
nificance of differences between regression coefficients,
using effect coding, is tantamount to testing the differences
between means (12, p. 299). With the advent of computer
packages such as SPSS and SAS, which permit application of
multiple regression techniques with relative ease, a study
could be done investigating the differences between trend
analysis and pairwise comparisons. It may well be that, as
multiple regression grows in usage, and the pendulum swings
away from "all or nothing" hypothesis testing, the use of
multiple comparison procedures may decline. Certainly the
literature has shown movement away from this area since the
mid 1970's. Whether this happens or not, this study of
multiple comparison procedures was an excursion through the
Wonderful World of Statistical Theory which has both
broadened and deepened my understanding of and appreciation
for the logic and technical vocabulary of the field.
CHAPTER BIBLIOGRAPHY
1. Bernhardson, Clemens S., "375: Type I Error Rates When Multiple Comparison Procedures Follow a Significant F Test of ANOVA," Biometrics, XXXI (March 1975), pp. 229-232.
2. Carmer, S. G., "Optimal Significant Levels for Application of the Least Significant Difference in Crop Performance Trials," Crop Science, XVI (January-February 1976), pp. 95-99.
3. ________ and Walker, W. M., "Pairwise Multiple Comparisons Procedures for Treatment Means," Technical Report Number 12, University of Illinois, Department of Agronomy, Urbana, Illinois (December 1983), pp. 1-33.
4. ________, Professor of Biometry, University of Illinois, Urbana, Illinois, Personal letter received January 14, 1985.
5. Einot, Israel and Gabriel, K. R., "A Study of Powers of Several Methods of Multiple Comparisons," Journal of the American Statistical Association, LXX (1975), pp. 574-583.
6. Fisher, R. A., Statistical Methods for Research Workers, 6th ed., Edinburgh, Oliver and Boyd, 1936.
7. Games, Paul, "Inverse Relation Between the Risks of Type I and Type II Errors and Suggestions for the Unequal n Case in Multiple Comparisons," Psychological Bulletin, LXXV (1971), pp. 97-102.
8. Gill, J. L., "Evolution of Statistical Design and Analysis of Experiments," Journal of Dairy Science, LXIV (June 1981), pp. 1494-1519.
9. Kemp, K. E., "Multiple Comparisons: Comparisonwise and Experimentwise Type I Error Rates and Their Relationship to Power," Journal of Dairy Science, LVIII (September 1975), pp. 1372-1378.
10. Keselman, H. J.; Games, Paul; and Rogan, Joanne C., "Protecting the Overall Rate of Type I Errors for Pairwise Comparisons With an Omnibus Test Statistic," Psychological Bulletin, LXXXVI (July 1979), pp. 884-888.
11. Kirk, Roger E., Professor of Psychology, Baylor University, Waco, Texas, Personal letter received January 22, 1985.
12. Pedhazur, Elazar J., Multiple Regression in Behavioral Research: Explanation and Prediction, 2nd ed., New York, Holt, Rinehart and Winston, Inc., 1982.
13. Petrinovich, Lewis F. and Hardyck, Curtis D., "Error Rates for Multiple Comparison Methods: Some Evidence Concerning the Frequency of Erroneous Conclusions," Psychological Bulletin, LXXI (1969), pp. 43-54.
14. Ramsey, Philip H., "Power Differences Between Pairwise Multiple Comparisons," Journal of the American Statistical Association, LXXIII (1978), p. 479.
15. Ryan, T. A., "Comment on 'Protecting the Overall Rate of Type I Errors for Pairwise Comparisons With an Omnibus Test Statistic,'" Psychological Bulletin, LXXXVIII (September 1980), pp. 354-355.
APPENDIX A
A GOODNESS OF FIT CHI-SQUARE TEST OF RANDOM NUMBER NORMALITY
This program tested the ability of a microcomputer to generate normally distributed random numbers. One thousand random numbers were categorized into ten classes. This distribution was tested against the expected frequencies of normal numbers calculated from percentage values in a normal curve table.
Following the program listing are selected printouts showing the relationship between the theoretical normal curve and the distribution of generated data ("*") for U = 12, 16, 19, 20, 21, 24, and 28.
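The generator under test forms each score as X = (A - U/2) * SIGMA + MU, where A is the sum of U uniform random numbers. A compact Python restatement of the idea is given below (the appendix program itself is in BASIC); note that the sum of U Uniform(0,1) draws has variance U/12, so the centered sum has unit variance exactly at U = 12:

```python
import random
import statistics

def clt_normal(u=12, mu=100.0, sigma=10.0):
    """One approximately normal deviate, mirroring the BASIC routine
    X = (A - U/2) * SIGMA + MU, where A sums u uniform random numbers.
    The centered sum has variance u/12, so this scaling yields a
    standard deviation of sigma only when u == 12."""
    a = sum(random.random() for _ in range(u))
    return (a - u / 2.0) * sigma + mu

random.seed(12)
scores = [clt_normal() for _ in range(10000)]
print(round(statistics.fmean(scores), 1))   # close to 100.0
print(round(statistics.pstdev(scores), 1))  # close to 10.0
```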
PROGRAM LISTING:
1110 ' ***** INITIATE *****
1120 CLS : CLEAR : COLOR 4,0,0,0
1130 DEFINT J,K,N
1140 U=11                       ' NUM OF UNIFORM RND #
1150 N=1000                     ' NUM OF REPETITIONS
1160 MU=100                     ' POPULATION MEAN
1170 SIGMA=10                   ' POPULATION SDEV
1180 CNT=10                     ' # REPETITIONS
1190 DIM C(10),PER(10),E(10),D(10),D2(10),DE(10),T(10),AVG(10)
1200 '
1210 GOSUB 2030                 'SET UP PRINTER
1220 '
1230 U=U+1 : IF U=31 THEN END
1250 T1=VAL(MID$(TIME$,4,2))*60+VAL(RIGHT$(TIME$,2))
1260 RANDOMIZE T1
1270 ' ***** BEGIN RND NUMBER LOOP *****
1280 CLS
1290 FOR REP=1 TO 10
1300 PRINT USING "NOW ON REP ##  ";REP;
1310 FOR J=1 TO N
1320 FOR K=1 TO U
1330 A=A+RND                    ' NID RND FROM U UNIFORM RND
1340 NEXT K
1350 X=(A-(U/2))*SIGMA+MU
1360 A=0
1400 ' ***** CATEGORIZE SCORES *****
1410 '
1430 IF X<=60 THEN C(1)=C(1)+l: T(1)=T(1)+1 :GOTO 1620 1450 IF X<70 THEN C(2)=C(2)+1: T(2)=T(2)+1 :G0T0 1620 1470 IF X<80 THEN C(3)=C(3)+1: T(3)=T(3)+1 :G0T0 1620 1490 IF X<90 THEN C(4)=C(4)+1: T(4)=T(4)+1 :GOTO 1620 1510 IF X<100 THEN C(5)=C(5)+1: T(5)=T(5)+1 :GOTO 1620 1530 IF X<110 THEN C(6)=C(6)+1: T(6)=T(6)+1 :G0T0 1620 1550 IF X<120 THEN C(7)=C(7)+1: T(7)=T(7)+1 :G0T0 1620 1570 IF X<130 THEN C(8)=C(8)+1: T(8)=T(8)+1 :GOTO 1620 1590 IF X<140 THEN C(9)=C(9)+1: T(9)=T(9)+1 :GOTO 1620 1610 IF X=>140 THEN C(10)=C(10)+1:T(10)=T(10)+1 1620 NEXT J 1630 * 1 6 4 0 ' ****** GOODNESS OF FIT TEST ****** 1650 ' 1660 FOR 1=1 TO 10 1670 READ PER(I)
1680 E(I)=N*PER(I)              'CALC EXPECTED FREQUENCIES
1690 NEXT I
1700 DATA .0013,.0109,.0546,.1598,.2734,.2734,.1598,.0546,.0109,.0013
1720 RESTORE
1730 ' 1740 FOR 1=1 TO 10 1750 D(I)=C(I)-E(I) 1760 D2(I)=D(I)*D(I) 1770 DE(I)=D2(I)/E(I) 1780 CHI=CHI+DE(I) 1790 NEXT I
1810 SUMCHI=SUMCHI+CHI : PRINT USING "CHI(##) = ####.###";REP,CHI
1820 FOR 11=1 TO 10 :C(II)=0 :NEXT 1830 NEXT REP
1840 AVGCHI=SUMCHI/10 : PRINT USING "AVGCHI = ####.###";AVGCHI
1850 BEND=AVGCHI
1860 FOR 1=1 TO 10 1870 AVG(I)=T(I)/CNT 1880 T(I)=0 1890 NEXT I 1900 ' i9io ; ***** P R I N T OUTPUT ***** ±7 20
1930 G0SUB 2300 1940 ' 1950 ' **** L 0 0 p B A C K 1960 1970 LPRINT CHR$(12); : GOTO 1230 1980 '
1990 ' S E T UP PRINTER ROUTINE
2020 ' =========================================== 2030 LPRINT CHR$(15); 'COMPRESSED PRINT
2040 LPRINT CHR$(27);CHR$(9);CHR$(20); 'MOVE TO COL 25
2050 LPRINT CHR$(27);CHR$(57); 'SET LEFT MARGIN 2060 WIDTH LPRINT 132 2070 SCALE = 2 2080 F0$="## + ###.# \"+STRING$(85," ")+"\" 2090 FE$=" | \"+STRING$(85," ")+"\"
2100 H1$="CHI-SQUARE TEST FOR NORMALITY OF RANDOM NUMBERS" 2110 H2$="TEN REPETITIONS" 2120 H3$="================-===============______________:_!! 2130 HH$="N = 1000 U = ##"
2140 Fl$=" -1- - 2- -3- -4- -5- -6- -7- -8- -9- 10 " 2160 H4$=" I
2180 lil'l ttl'l lit-*, m • " m j m • * tlt'Tlttl" 2190 Fl$= Category: "I'll ###'# "*'* ###-# ###-#
2200 F2$=" Observed: "+F2$ 2210 F3$=" Expected: "+F3$ 2230 H4$=" "+H4$ 2240 RETURN 2250 ' 2260 '========================================:==============
2 2 7 0 ' PRINT ROUTINE 2 2 8 0 ' = = = = = = = = = = = = = = = = = = = = = = : = = = = = = = ; = = = = = = = = = = = = = = = = = = =
2290 ' =========
2300 LPRINT:LPRINT
2310 TX=45-(LEN(Hl$))/2
2320 LPRINT TAB(TX);H1$
2330 TX=45-(LEN(H2$))/2 2340 LPRINT TAB(TX);H2$ 2350 TX=(45-LEN(H3$)/2) 2360 LPRINT TAB(TX);H3$ 2370 TX=(45-LEN(HH$)/2) 2380 LPRINT TAB(TX) USING HH$;U 2390 LPRINT
2400 H5$="MEAN CHI-SQUARE =####.###" 2410 TX=(45-(LEN(H5$)/2))
2420 LPRINT TAB(TX) USING H5$;AVGCHI 2430 LPRINT
2440 LPRINT Fl$ : LPRINT H4$
g E ̂ 2560 FOR 1=1 TO 10 2570 EX=(E(I)/SCALE)-1: IF EX<0 THEN EX=0 2580 0BS=(AVG(I)/SCALE) 2590 IF EX=0 THEN E$="" . G0T0 26?n 2600 IF EX<1 THEN E$=":" . G0T0 2620 2610 E$=STRING$(EX," * 2620 IF 0BS=0 THEN 0$="" . G0T0 2650 2630 IF 0BS<1 THEN 0$="<" ! GOTO 2fisn 2640 0$=STRING$(0BS,"*") * 2650 LPRINT USING F0$;I,AVG(I),0$ 2660 LPRINT USING FE$;E$
2670 NEXT I 2680 '
2690 FOR I = 1 TO 10 : AVG(I)=0 : NEXT 2700 T0T=0 : CHI=0 : SUMCHI=0 : AVGCHI=0 2740 RETURN 2750 ' 2760 '========================== END =====
CHI-SQUARE TEST FOR NORMALITY OF RANDOM NUMBERS
TEN REPETITIONS OF N=1000 SCORES
UNIFORM RANDOM NUMBERS (U) = 12
MEAN CHI-SQUARE = 100.062

Category:  -1-    -2-    -3-    -4-    -5-    -6-    -7-    -8-    -9-   -10-
Observed:  0.1    0.9   19.5  143.9  335.9  338.5  138.1   22.5    0.6    0.0
Expected:  1.3   10.9   54.6  159.8  273.4  273.4  159.8   54.6   10.9    1.3

[histogram of observed (*) against expected normal frequencies omitted]
CHI-SQUARE TEST FOR NORMALITY OF RANDOM NUMBERS
TEN REPETITIONS OF N=1000 SCORES
UNIFORM RANDOM NUMBERS (U) = 16
MEAN CHI-SQUARE = 39.384

Category:  -1-    -2-    -3-    -4-    -5-    -6-    -7-    -8-    -9-   -10-
Observed:  0.2    3.4   34.8  152.8  304.7  312.2  152.8   34.4    4.6    0.1
Expected:  1.3   10.9   54.6  159.8  273.4  273.4  159.8   54.6   10.9    1.3

[histogram of observed (*) against expected normal frequencies omitted]
CHI-SQUARE TEST FOR NORMALITY OF RANDOM NUMBERS
TEN REPETITIONS OF N=1000 SCORES
UNIFORM RANDOM NUMBERS (U) = 19
MEAN CHI-SQUARE = 12.643

Category:  -1-    -2-    -3-    -4-    -5-    -6-    -7-    -8-    -9-   -10-
Observed:  0.5    7.2   53.9  157.3  280.6  287.1  157.8   47.8    7.6    0.2
Expected:  1.3   10.9   54.6  159.8  273.4  273.4  159.8   54.6   10.9    1.3

[histogram of observed (*) against expected normal frequencies omitted]
CHI-SQUARE TEST FOR NORMALITY OF RANDOM NUMBERS
TEN REPETITIONS OF N=1000 SCORES
UNIFORM RANDOM NUMBERS (U) = 20
MEAN CHI-SQUARE = 3.678

Category:  -1-    -2-    -3-    -4-    -5-    -6-    -7-    -8-    -9-   -10-
Observed:  1.1    7.8   49.7  160.7  272.8  279.4  165.5   52.3    9.9    0.8
Expected:  1.3   10.9   54.6  159.8  273.4  273.4  159.8   54.6   10.9    1.3

[histogram of observed (*) against expected normal frequencies omitted]
CHI-SQUARE TEST FOR NORMALITY OF RANDOM NUMBERS
TEN REPETITIONS OF N=1000 SCORES
UNIFORM RANDOM NUMBERS (U) = 21
MEAN CHI-SQUARE = 3.720

Category:  -1-    -2-    -3-    -4-    -5-    -6-    -7-    -8-    -9-   -10-
Observed:  1.2    8.3   55.7  157.1  269.0  275.1  165.6   54.0   12.3    1.7
Expected:  1.3   10.9   54.6  159.8  273.4  273.4  159.8   54.6   10.9    1.3

[histogram of observed (*) against expected normal frequencies omitted]
CHI-SQUARE TEST FOR NORMALITY OF RANDOM NUMBERS
TEN REPETITIONS OF N=1000 SCORES
UNIFORM RANDOM NUMBERS (U) = 24
MEAN CHI-SQUARE = 20.424

Category:  -1-    -2-    -3-    -4-    -5-    -6-    -7-    -8-    -9-   -10-
Observed:  1.7   16.1   63.3  163.9  261.4  255.4  158.4   62.2   15.4    2.2
Expected:  1.3   10.9   54.6  159.8  273.4  273.4  159.8   54.6   10.9    1.3

[histogram of observed (*) against expected normal frequencies omitted]
CHI-SQUARE TEST FOR NORMALITY OF RANDOM NUMBERS
TEN REPETITIONS OF N=1000 SCORES
UNIFORM RANDOM NUMBERS (U) = 28
MEAN CHI-SQUARE = 62.

Category:  -1-    -2-    -3-    -4-    -5-    -6-    -7-    -8-    -9-   -10-
Observed:  3.8   19.6   75.0  160.3  248.9  241.2  152.8   71.4   23.2    3.8
Expected:  1.3   10.9   54.6  159.8  273.4  273.4  159.8   54.6   10.9    1.3

[histogram of observed (*) against expected normal frequencies omitted]
APPENDIX B
TWO SAMPLES OF DATA GENERATED BY THE MAIN PROGRAM
This appendix displays two samples of data generated by the main program. Both samples were produced with three groups [k=3] and five scores per group [J=1].

Significant F-Ratio

        Group 1   Group 2   Group 3
         103.16    117.60     67.99
          94.70    124.01    100.50
         105.17    108.79    104.71
          88.60    100.01     99.57
          80.07    116.15    103.83
  M:      94.34    113.31     95.32
  s:     10.391     9.194    15.431

SOURCE        SS   DF      MS      F     Fcv   Sig?
Between   1141.1    2  570.54  3.975   3.890    Yes
Within    1722.5   12  143.54
Total     2863.6   14

Non-Significant F-Ratio

        Group 1   Group 2   Group 3
          81.36    109.30    104.91
         105.66    105.82     82.38
          89.36     78.68     97.01
         103.99     93.44     85.25
          92.13    106.09    103.37
  M:      94.50     98.67     94.58
  s:     10.240    12.709    10.319

SOURCE        SS   DF      MS      F     Fcv   Sig?
Between     56.7    2   28.34  0.228   3.890     No
Within    1491.4   12  124.29
Total     1548.1   14
APPENDIX C
The Main BASIC Program Listing
1000 '
1010 'PROGRAM TO ANALYZE EXPERIMENTWISE AND COMPARISONWISE ERROR RATES
1020 '   FOR SIX SELECTED MULTIPLE COMPARISON PROCEDURES
1030 '   Ph.D. Dissertation in Educational Research, College of Education
1050 '   North Texas State University
1060 '   William R. Yount
1070 '   July 1985
1080 '
1090 '
1310 I " " " " " " " " " " " " " " " " " " " INITIALIZATION SUBROUTINE """""'"""""""mmmmmmmi
1320 'SET UP VARIABLES 1330 ' 1340 KEY OFF : CLS : CLEAR 1350 REP%=1000 1360 RANDOMIZE VAL(RIGHT$(TIME$,2)) 1370 FOR RX%=1 TO RND*100:NEXT 1380 DEFINT I-K 1390 DEFSTR 0
1400 N(1)=5 : N(2)=10 : N(3)=15 : N(4)=20 : N(5)=25
1450 M ">'NZZ ^ I N T ° U T. 0 F M D L T I P L E COMPARISON ERROR RATES * * *» 1 W ) UZ - GROUPS (k): # COMPUTATION TIME: \ \"
1460 1470 1480 1490 1500 1510 1520 1530 1540 1550 1560 1570 1580 1590 1600 1610 1620 1630 1640 1650 1660 1670 1680 1690 1700 1710 1720 1730 1740 1760 1770 1780 1790 1800 1810 1820 1830 1840 1850 1860 1870 1890 1900 1910 1920 1930 1940 1950 1960 1970 1980 1990 2000
03 05 06 07 08 09 010 011 012 013 014 015 016 017 018 019 020 021 022 023 024 025 026 027 028 029 030 031 032 OFMT 0TBL1= 0TBL2= 0TBL3= 0TBL4== ODIFN= ODIFM= ODIFL= ODIFH= ODIFS= ODIFHS ODIFHT
="SIZE (J): ft
#
r =" #: ti
F ##.###
Q ##.###
M ##.###
QT" ##.###"
"F-TESTS: ####" :"SIG F-TESTS: #### E: ## ="PERCENT SIG: #.### E: 0.050"
EXPERIMENTWISE
COMPARISONS: #####"
COMPARISONWISE"
T?
__ ft
="LSD ft
="FLSD _ f t
="MRT __ f T ="SNK _ _ t t
="SSD IT
="HSD
ALL ANOVAS SIG ANOVAS ALL ANOVAS
:"HSD-TK ft
"HSD-SS tf
"REP=### "SOURCE " BET " WITH " TOT "DIF=## "DIF=## "DIF=## "DIF=## "DIF=## ="DIF=## '="DIF=##
##### ##.##%
##### ##.##%
##### ##### ##.##% ##.##% ##### ##### ##.##% ##.##% ##### ##### ##.##% ##.##% ##### ##### ##.##% ##.##%
UNEQUAL ##### ##.##% ##### ##.##%
NCOMPS= #### DFB DF MS ## ##### .## ##
### ##### .##"
###"
##### ##.##% ##### ##.##%
K=# J=# SS
######.# ######.# #######.# ## NCV(##)=##.##
MCV(##)=##.## LCV=##.## » HCV=##.## " SCV=##.## » HCV-SS=##.##" HCV-TK=##.##"
#####" ##.##%"
##### ##.##% ##### ##.##% ##### ##.##% ##### ##.##%
N HSD ##### ##.##% ##### ##.##%
'=## DFW=### F FCV .### ##.###
SIG ANOVAS" ft
#####" ##.##%" #####" ##.##%" #####" ##.##%" #####" ##.##%" #####" ##.##%"
tt
#####" ##.##%" #####" ##.##%"
DFT=### NTOT=###" : SIG ICNT" : # ###"
.##
.##
.##
.##
.##
.##
J SET UP PRINTER AND DISK FOR OUTPUT
LOCATE 3,1 : INPUT "(P)RINTER OR (S)CREEN "-ZZ$ IF ZZ$="P" OR ZZ$="p" THEN 0P="LPT1:" : GOTO 1960 IF ZZ$="S" OR ZZ$="s" THEN 0P="SCRN:" BEEP : GOTO 1920 OPEN OP FOR OUTPUT AS #3 IF OP="SCRN:" THEN 2100 PRINT "READY PRINTER — THEN PRESS ANY KEY" A$=INKEY$ : IF A$="" THEN 1990
GOTO 1960
r'Skip printer setup
2010 LPRINT CHR$(27);"f";CHR$(2) 2020 LPRINT CHR$(27);"y"; 2030 LPRINT CHR$(27);"q"; 2040 LPRINT CHR$(27);CHR$(9);CHR$(12); 2050 LPRINT CHR$(27);CHR$(57) ; 2060 LPRINT CHR$(27);CHR$(9);CHR$(75); 2070 LPRINT CHR$(27);CHR$(58); 2080 WIDTH LPRINT 80 2090 LPRINT CHR$(27);CHR$(9);CHR$(15); 2100 GOTO 2590 2120 2130 2140 2150 2160 2170 2180 2190 2200 2210 2220 2230 2240 2250 2260 2270 2280 2290 2300 2310 2320 2330 2340 2350 2360 2370 2380 2390 2400 2410 2420 2430 2440 2450 2470 2480 2490 2500 2510 2520
font module #2 10 pitch quality print move to col 12 set l.m. at 12 move to col 75 set r.m. at 75
move to col 12 SET UP PARAMETERS
MCP TEST PRINTOUT SUBROUTINE
LPRINT TAB(3+XX*6);XX; : LPRINT TAB(3+XX*6);"===":
NEXT XX : NEXT XX
LPRINT LAB$ FOR 11=2 TO K FOR XX=2 TO K LPRINT FOR XX=K TO 2 STEP -1
LPRINT " :"+STRING$((K-XX+1)*2 " ")• FOR YY=K-1 TO 1 STEP -1
IF YY>=XX THEN LPRINT TAB(LP0S(0)+1);" "• • GOTO 2240 M„vrn
L P R I N T TAB(LP0S(0)+1);USING » \ \ ";FLAG$(XX,YY); NEXT YY LPRINT
NEXT XX LPRINT RETURN i
J DIFFERENCE CHART PRINTOUT SUBROUTINE
LPRINT :LPRINT :LPRINT "PAIRWISE DIFFERENCES:" • LPRINT LPRINT TAB(13);""; FOR II=K-1 TO 1 STEP -1
LPRINT USING " ###.## ";XBAR(II); NEXT II LPRINT LPRINT TAB(13);""; FOR 11=1 TO K-l
LPRINT " ====== NEXT II LPRINT FOR II=K TO 2 STEP -1
LPRINT USING " ###.## FOR JJ=K-1 TO 1 STEP -1
IF JJ>=II THEN LPRINT TAB(LP0S(0)+1);" "• • GOTO ?500 DIF=ABS(XBAR(JJ)-XBAR(II)) ' * 0 0 1 0 2 5 0 0
LPRINT TAB(LP0S(0)+1);USING "###.## ";DIF-NEXT JJ
LPRINT NEXT II
";XBAR(II);
2530 RETURN
:'Select 1 cycle to print
2560 PREPARE PARAMETERS FOR REPETITIONS """"""'"""""""tmmtf
2570 ' INCREMENT K,J 2580 ' 2600 BEEP:BEEP
2610 LOCATE 5,1 : INPUT "ENTER K(3,4,5,6) AND J(l,2,3 4 5 6 7) AS # # "*lf T 2620 LOCATE 6,1 : INPUT "PRINTOUT (Y) OR NO PRINTOUT (N) OF CALCS "'-wt 2630 IF OPTO'T' AND OPTO»N" THEN BEEP : GOTO 2620 ' 2640 SETUP=1 2650 GOTO 2880 2660 ' 2670 ' TOP OF INCREMENT CYCLE 2680 '
2690 K=K+1 : J=0 : IF K=7 THEN 8510      :'Goto program end
2700 J=J+1 : IF J=8 THEN 2690
2880 NCOMPS=REP%*(K*(K-1))/2              :'# of pairwise comparisons
2890 PRTOUT=0                             :'Set printer flag
2900 RANDOMIZE RX%                        :'Randomize again
2920 TIME$="00:00:00"                     :'Reset timer to 0
2930 PRINT "*** BEGIN ***» :'Scrn display of status 2940 PRINT USING "*=» K=# J=#"-K J 2950 PRINT ' ' 2960 RD%=RND*1000 2970 PRINT "REP #"RD%" WILL BE PRINTED" 2980 * 2990 ' ASSIGN SAMPLE SIZES 3000 * 3010 NT0T=0 3020 FOR 1=1 TO K 3030 IF J<>6 THEN 3060 30^0 IF 1=1 THEN NN(I)=10 ELSE NN(I)=NN(I-l)+5 3050 GOTO 3100 3060 IF J<>7 THEN 3090 3 0 7 0 I F 1=1 THEN NN(I)=80 ELSE NN(I)=20 3080 GOTO 3100 3090 NN(I)=N( J) 3100 NTOT=NTOT+NN(I) 3110 NEXT I 3120 ' 3130 DFB=K-1 3140 DFW=NT0T-K 3150 DFT=NT0T-1 3160 ' 3170 ' READ CRITICAL VALUES 3180 * 3190 OPEN "G:TABLE.RND" AS #2 LEN=15
3210 ™ L D #2'3 A S TAB$' 2 A S B$' 2 A S W$' 4 A S CV5$' 4 A S C V 1 $
3220 IF 0PT="Y" THEN LPRINT "CRITICAL VALUES-" 3230 'F TABLE 3240 FX=1 : IX=DFB : JX=DFW 3250 RECORD = (FX-1)*1000 + (IX-2)*200 + JX
124 3260 GET #2, RECORD 3270 FCV=CVS(CV5$) 3280 IF OPT="Y" THEN LPRINT USING "FCV=##.###";FCV 3290
3300 'STUDENTIZED RANGE: SNK(LEVELS), LSD(2), HSD(K) 3310 FX=2 K J
3320 FOR IX=2 TO K 3330 RECORD = (FX-1)*1000 + (IX-2)*200 + JX 3340 GET #2, RECORD 3350 Q(IX)=CVS(CV5$)
3370 NEXTFIXPT=="Y" ™ E N L P R I N T U S I N G ";IX,Q(IX);
3380 IF OPT="Y" THEN LPRINT 3390 '
3400 'MULTIPLE RANGE: DMRT(LEVELS) 3410 FX=3 3420 FOR IX=2 TO K 3430 RECORD = (FX-1)*1000 + (IX-2)*200 + JX 3440 GET #2, RECORD 3450 M(IX)=CVS(CV5$)
3460 IF OPT="Y" THEN LPRINT USING "M(#)=#.### "-IX MCIXV 3470 NEXT IX , u , n u ^ ' 3480 IF OPT="Y" THEN LPRINT 3490 ' 3500 'STUDENT AUGMENTED RANGE: H-SS(K) 3510 FX=4 : IX=K
3520 RECORD = (FX-1)*1000 + (IX-2)*200 + JX 3530 GET #2, RECORD 3540 QT=CVS(CV5$)
3560 'F °PT="Y" T H E N L P R I N T U S I N G "QT=#-### "^ T : L P ™
3570 CLOSE #2
3590 '""'"'"""""'"""""""HMtmMMMMt BEGIN MAIN LOOP """""'"""""""""MMttrmMMnnHi
3600 3610 FOR NREPS%=1 TO REP% 3620 IF INT(NREPS%/C)<>NREPS%/C THEN 3640 3630 PRINT USING "REP: #### TIME: \ \":NREPS% TIMES 3640 IF NREPS%ORD% OR OPT="N" THEN 3700 , $
3660 PRTOUT-fING OFMT;NREPSZ'It'J'KCOMPS.DFB.I>™.DfT,NTOT 3670 ' 3680 ' GENERATE SCORES 3690 ' 3700 FOR S%=1 TO K 3710 FOR N%=1 TO NN(S%) 3720 FOR U%=1 TO 20 :'
3730 A=A+RND                  :'Random NID error generated from 20
3740 NEXT U%                   :' uniform random numbers. B=10. SIGMA=10.
3750 E=(A-B)*SIGMA
3760 A=0
3770 X=MU+E                    :'Individual score without treatment effect
3780 SX2=SX2+X*X               :'Sum of X squared
3790 T(S%)=T(S%)+X             :'Sum of Xj
3800 NEXT N%
3810 XBAR(S%)=T(S%)/NN(S%)      :'Mean i
3820 TJ(S%)=T(S%)*T(S%)/NN(S%)  :'Tj squared / ni
3830 T=T+T(S%)                  :'Sum sum Xij
3840 TJ=TJ+TJ(S%)               :'Sum of all Tj
3850 NEXT S%
3860 ' 3870 IF NREPS%ORD% OR OPT="N" THEN 4000 3880 LPRINT :LPRINT "DATA SUMMARY:" 3890 FOR 1=1 TO K 3900 LPRINT USING "MEAN(#)=###.# ";I,XBAR(I); 3910 NEXT I 3920 LPRINT 3930 FOR 1=1 TO K 3940 LPRINT USING " N(#)= ## ";I,NN(I); 3950 NEXT I 3960 LPRINT 3970 ' 3980 ' CALCULATE F 3990 ' 4000 TTN=T*T/NT0T 4010 SSB=TJ-TTN 4020 SSW=SX2-TJ 4030 SST=SX2-TTN 4040 MSB=SSB/DFB 4050 MSW=SSW/DFW 4060 F =MSB/MSW
4080 ' IS F-RATIO SIGNIFICANT? 4090 ' 4100 IF F>=FCV THEN SIG=1 ELSE SIG=0 4110 IF SIG=1 THEN ICNT=ICNT+1 4120 *
4130 IF NREPS%ORD% OR OPT="N" THEN 4230 4140 LPRINT 4150 LPRINT 0TBL1
4160 LPRINT USING 0TBL2;SSB,DFB,MSB,F,FCV,SIG,ICNT 4170 LPRINT USING 0TBL3;SSW,DFW,MSW 4180 LPRINT USING 0TBL4;SST,DFT 4190 LPRINT 4220 ' 4230 FOR 1=1 TO K C L E A N U P
4240 T(I)=0 : TJ(I)=0 4250 NEXT I 4260 SX2=0 : T=0 : TJ=0 4280 ' """'"""MMimiiimniitmimti! MULTIPLE COMPARISONS """'""""""""""""""mmmmi 4285 4290 ' 4300 ' RANK MEANS FROM HIGH TO LOW 4310 ' 4320 FOR PRI=1 TO K-l 4330 FOR SEC=PRI+1 TO K 4340 IF XBAR(PRI)>=XBAR(SEC) THEN 4370 XBAR(l),NN(l)=high
4350 SWAP XBAR(PRI),XBAR(SEC)
4360 SWAP NN(PRI),NN(SEC)
4370 NEXT SEC
4380 NEXT PRI
4400 IF J>5 THEN 6180 :' Goto Unequal n routine
4410 '
4420 ' EQUAL N ROUTINE
4430 '
4440 SDS=SQR(2*MSW/NN(1)) :' S.E.D.:SSD
4450 SD=SQR(MSW/NN(1)) :' S.E.D.:LSD,HSD,SNK,MRT
4460 LCV=Q(2)*SD :' LSD CV
4470 HCV=Q(K)*SD :' HSD CV
4480 SCV=SQR((K-1)*FCV)*SDS :' SSD CV
4490 IF NREPS%<>RD% OR OPT="N" THEN 4950
4500 LPRINT "SED (L,M,N,H) = SQT(MSW / N)"
4510 LPRINT USING "SED = SQT(####.#/##)";MSW,NN(1)
4520 LPRINT USING "SED = ###.##";SD
4530 LPRINT
4540 LPRINT "SED (SSD) = SQT(2 * MSW / N)"
4550 LPRINT USING "SED = SQT(2 * ####.#/##)";MSW,NN(1)
4560 LPRINT USING "SED = ###.##";SDS
4570 LPRINT
4580 LPRINT "LSD CV = Q(2) * SD"
4590 LPRINT USING " = #.### * ##.###";Q(2),SD
4600 LPRINT USING " = ##.###";LCV
4610 LPRINT
4620 LPRINT "SNK CV = Q(K) * SD"
4630 FOR I=2 TO K
4640 SNK=Q(I)*SD
4650 LPRINT USING "SNK(#) = #.### * ##.### = #.###";I,Q(I),SD,SNK
4660 NEXT I : LPRINT
4670 LPRINT "MRT CV = M(K) * SD"
4680 FOR I=2 TO K
4690 MRT=M(I)*SD
4700 LPRINT USING "MRT(#) = #.### * ##.### = #.###";I,M(I),SD,MRT
4710 NEXT I
4720 LPRINT
4730 LPRINT "HSD CV = Q(K) * SD"
4740 LPRINT USING " = #.### * ##.###";Q(K),SD
4750 LPRINT USING " = ##.###";HCV
4760 LPRINT
4770 LPRINT "SSD CV = SQT((K-1)*FCV) * SDS"
4780 LPRINT USING " = SQT(( # )*#.### * ##.###";K-1,FCV,SDS
4790 LPRINT USING " = ##.###";SCV
4800 LPRINT CHR$(12);
4810 LPRINT "RANKED MEANS:"
4820 FOR I=1 TO K
4830 LPRINT USING "MEAN(#)=###.# ";I,XBAR(I);
4840 NEXT I
4850 LPRINT
4860 FOR I=1 TO K
4870 LPRINT USING " N(#)=## ";I,NN(I);
4880 NEXT I
4890 NEXT I
4900 GOSUB 2320
4910 LPRINT : LPRINT
4920 LPRINT "MULTIPLE COMPARISON PROCEDURES:" : LPRINT
4930 '
4940 ' STUDENT NEWMAN KEULS (SNK): EQUAL N'S
4950 LAB$="LAYER"
4960 FOR II=1 TO K: FOR JJ=1 TO K: FLAG$(II,JJ)="0": NEXT JJ,II
4970 '
4980 FOR II=K TO 2 STEP -1
4990 FOR JJ=1 TO II-1
5000 IF FLAG$(II,JJ)="-" AND JJ=1 THEN II=2 : GOTO 5200 :'SKIP REST
5010 IF FLAG$(II,JJ)="-" THEN JJ=II-1 : GOTO 5190 :'SKIP THIS ROW
5020 KK=II-JJ+1 :'Layer index for SNK
5030 NCV(KK)=Q(KK)*SD
5040 DIF=ABS(XBAR(II)-XBAR(JJ))
5050 IF NREPS%=RD% AND OPT="Y" THEN LPRINT USING ODIFN;DIF,KK,NCV(KK)
5060 '
5070 IF DIF>=NCV(KK) THEN 5140
5080 FLAG$(II,JJ+1)="-" :'COMP NOT SIG
5090 FLAG$(II-1,JJ)="-"
5100 FLAG$(II,JJ)="NS"
5110 IF KK=K THEN 5210 :'SKIP ALL TESTS
5120 JJ=II-1 :'SKIP REST OF ROW
5130 GOTO 5180
5140 IF NF=1 THEN 5160 ELSE NE=NE+1 : NF=1 :'EW (ALL)
5150 IF SIG=1 THEN SNE=SNE+1 :'EW (SIG)
5160 NC=NC+1 : FLAG$(II,JJ)="*" :'PC (ALL)
5170 IF SIG=1 THEN SNC=SNC+1 :'PC (SIG)
5180 IF NREPS%=RD% AND OPT="Y" THEN GOSUB 2150
5190 NEXT JJ
5200 NEXT II
5210 '
5220 IF NREPS%=RD% AND OPT="Y" THEN LAB$="FINAL" : GOSUB 2150
5240 ' MULTIPLE RANGE TEST (MRT): EQUAL N'S
5250 '
5260 FOR II=1 TO K: FOR JJ=1 TO K: FLAG$(II,JJ)="0": NEXT JJ,II
5270 LAB$="LAYER"
5280 FOR II=K TO 2 STEP -1
5290 FOR JJ=1 TO II-1
5300 IF FLAG$(II,JJ)="-" AND JJ=1 THEN II=2 : GOTO 5500 :'SKIP REST
5310 IF FLAG$(II,JJ)="-" THEN JJ=II-1 : GOTO 5490 :'SKIP THIS ROW
5320 KK=II-JJ+1 :'Layer index for MRT
5330 MCV(KK)=M(KK)*SD
5340 DIF=ABS(XBAR(II)-XBAR(JJ))
5350 IF NREPS%=RD% AND OPT="Y" THEN LPRINT USING ODIFM;DIF,KK,MCV(KK)
5360 '
5370 IF DIF>=MCV(KK) THEN 5440
5380 FLAG$(II,JJ+1)="-"
5390 FLAG$(II-1,JJ)="-"
5400 FLAG$(II,JJ)="NS"
5410 IF KK=K THEN 5510 :'SKIP ALL TESTS
5420 JJ=II-1 :'SKIP REST OF ROW
5430 GOTO 5490
5440 IF MF=1 THEN 5460 ELSE ME=ME+1 : MF=1
5450 IF SIG=1 THEN SME=SME+1
5460 MC=MC+1 : FLAG$(II,JJ)="*"
5470 IF SIG=1 THEN SMC=SMC+1
5480 IF NREPS%=RD% AND OPT="Y" THEN GOSUB 2150
5490 NEXT JJ
5500 NEXT II
5510 '
5530 IF NREPS%=RD% AND OPT="Y" THEN LAB$="FINAL" : GOSUB 2150
5550 ' (F)LSD PROCEDURE: EQUAL N'S
5560 FOR II=1 TO K: FOR JJ=1 TO K: FLAG$(II,JJ)="0": NEXT JJ,II
5570 '
5580 FOR II=K TO 2 STEP -1
5590 FOR JJ=1 TO II-1
5600 DIF=ABS(XBAR(II)-XBAR(JJ))
5610 IF NREPS%=RD% AND OPT="Y" THEN LPRINT USING ODIFL;DIF,LCV
5620 IF DIF>=LCV THEN 5650
5630 FLAG$(II,JJ)="NS"
5640 GOTO 5690
5650 IF LF=1 THEN 5670 ELSE LE=LE+1 : LF=1
5660 IF SIG=1 THEN FE=FE+1
5670 LC=LC+1 : FLAG$(II,JJ)="*"
5680 IF SIG=1 THEN FC=FC+1
5690 NEXT JJ
5700 NEXT II
5710 '
5720 IF NREPS%=RD% AND OPT="Y" THEN LAB$="FINAL" : GOSUB 2150
5740 ' TUKEY PROCEDURE (HSD): EQUAL N'S
5750 '
5760 FOR II=1 TO K: FOR JJ=1 TO K: FLAG$(II,JJ)="0": NEXT JJ,II
5770 '
5780 FOR II=K TO 2 STEP -1
5790 FOR JJ=1 TO II-1
5800 DIF=ABS(XBAR(II)-XBAR(JJ))
5810 IF NREPS%=RD% AND OPT="Y" THEN LPRINT USING ODIFH;DIF,HCV
5820 '
5830 IF DIF>=HCV THEN 5860
5840 FLAG$(II,JJ)="NS"
5850 GOTO 5900
5860 IF HF=1 THEN 5880 ELSE HE=HE+1 : HF=1
5870 IF SIG=1 THEN SHE=SHE+1
5880 HC=HC+1 : FLAG$(II,JJ)="*"
5890 IF SIG=1 THEN SHC=SHC+1
5900 NEXT JJ
5910 NEXT II
5920 '
5940 IF NREPS%=RD% AND OPT="Y" THEN LAB$="FINAL" : GOSUB 2150
5950 ' SCHEFFE PROCEDURE (SSD): EQUAL N'S
5960 '
5970 FOR II=1 TO K: FOR JJ=1 TO K: FLAG$(II,JJ)="0": NEXT JJ,II
5980 '
5990 FOR II=K TO 2 STEP -1
6000 FOR JJ=1 TO II-1
6010 DIF=ABS(XBAR(II)-XBAR(JJ))
6020 IF NREPS%=RD% AND OPT="Y" THEN LPRINT USING ODIFS;DIF,SCV
6030 '
6040 IF DIF>=SCV THEN 6070
6050 FLAG$(II,JJ)="NS"
6060 GOTO 6110
6070 IF SF=1 THEN 6090 ELSE SE=SE+1 : SF=1
6080 IF SIG=1 THEN SSE=SSE+1
6090 SC=SC+1 : FLAG$(II,JJ)="*"
6100 IF SIG=1 THEN SSC=SSC+1
6110 NEXT JJ
6120 NEXT II
6130 '
6140 IF NREPS%=RD% AND OPT="Y" THEN LAB$="FINAL" : GOSUB 2150
6150 '
6170 GOTO 7720 :' Skip unequal n routine
6180 ' UNEQUAL N ROUTINE
6190 '
6200 ' SNK UNEQUAL N'S
6210 '
6220 FOR II=1 TO K: FOR JJ=1 TO K: FLAG$(II,JJ)="0": NEXT JJ,II
6240 LAB$="LAYER"
6250 FOR II=K TO 2 STEP -1
6260 FOR JJ=1 TO II-1
6270 IF FLAG$(II,JJ)="-" AND JJ=1 THEN II=2 : GOTO 6480 :'SKIP REST
6280 IF FLAG$(II,JJ)="-" THEN JJ=II-1 : GOTO 6470 :'SKIP THIS ROW
6290 KK=II-JJ+1 :'Layer index for SNK
6300 NHAR=2/((1/NN(II))+(1/NN(JJ))) :'SNK
6310 NCV(KK)=Q(KK)*SQR(MSW/NHAR) :'SNK
6320 DIF=ABS(XBAR(II)-XBAR(JJ))
6330 IF NREPS%=RD% AND OPT="Y" THEN LPRINT USING ODIFN;DIF,KK,NCV(KK)
6340 '
6350 IF DIF>=NCV(KK) THEN 6420
6360 FLAG$(II,JJ+1)="-"
6370 FLAG$(II-1,JJ)="-"
6380 FLAG$(II,JJ)="NS"
6390 IF KK=K THEN 6490 :'SKIP ALL TESTS
6400 JJ=II-1 :'SKIP REST OF ROW
6410 GOTO 6460
6420 IF NF=1 THEN 6440 ELSE NE=NE+1 : NF=1
6430 IF SIG=1 THEN SNE=SNE+1
6440 NC=NC+1 : FLAG$(II,JJ)="*"
6450 IF SIG=1 THEN SNC=SNC+1
6460 IF NREPS%=RD% AND OPT="Y" THEN GOSUB 2150
6470 NEXT JJ
6480 NEXT II
6490 '
6500 IF NREPS%=RD% AND OPT="Y" THEN LAB$="FINAL" : GOSUB 2150
6510 '
6520 ' MRT UNEQUAL N'S
6530 '
6540 FOR II=1 TO K: FOR JJ=1 TO K: FLAG$(II,JJ)="0": NEXT JJ,II
6550 LAB$="LAYER"
6560 '
6570 FOR II=K TO 2 STEP -1
6580 FOR JJ=1 TO II-1
6590 IF FLAG$(II,JJ)="-" AND JJ=1 THEN II=2 : GOTO 6800 :'SKIP REST
6600 IF FLAG$(II,JJ)="-" THEN JJ=II-1 : GOTO 6790 :'SKIP THIS ROW
6610 KK=II-JJ+1 :'Layer index for MRT
6620 NHAR=2/((1/NN(II))+(1/NN(JJ))) :'MRT
6630 MCV(KK)=M(KK)*SQR(MSW/NHAR) :'MRT
6640 DIF=ABS(XBAR(II)-XBAR(JJ))
6650 IF NREPS%=RD% AND OPT="Y" THEN LPRINT USING ODIFM;DIF,KK,MCV(KK)
6660 '
6670 IF DIF>=MCV(KK) THEN 6740
6680 FLAG$(II,JJ+1)="-"
6690 FLAG$(II-1,JJ)="-"
6700 FLAG$(II,JJ)="NS"
6710 IF KK=K THEN 6810 :'SKIP ALL TESTS
6720 JJ=II-1 :'SKIP REST OF ROW
6730 GOTO 6780
6740 IF MF=1 THEN 6760 ELSE ME=ME+1 : MF=1
6750 IF SIG=1 THEN SME=SME+1
6760 MC=MC+1 : FLAG$(II,JJ)="*"
6770 IF SIG=1 THEN SMC=SMC+1
6780 IF NREPS%=RD% AND OPT="Y" THEN GOSUB 2150
6790 NEXT JJ
6800 NEXT II
6810 '
6820 IF NREPS%=RD% AND OPT="Y" THEN LAB$="FINAL" : GOSUB 2150
6830 '
6840 ' (F)LSD UNEQUAL N'S
6850 '
6860 LAB$="FINAL"
6870 FOR II=1 TO K: FOR JJ=1 TO K: FLAG$(II,JJ)="0": NEXT JJ,II
6880 '
6890 FOR II=K TO 2 STEP -1
6900 FOR JJ=1 TO II-1
6910 NLS=((1/NN(II))+(1/NN(JJ))) :'(F)LSD
6920 LCV=Q(2)*SQR(MSW*NLS/2) :'(F)LSD
6930 DIF=ABS(XBAR(II)-XBAR(JJ))
6940 IF NREPS%=RD% AND OPT="Y" THEN LPRINT USING ODIFL;DIF,LCV
6950 '
6960 IF DIF>=LCV THEN 6990
6970 FLAG$(II,JJ)="NS"
6980 GOTO 7030
6990 IF LF=1 THEN 7010 ELSE LE=LE+1 : LF=1
7000 IF SIG=1 THEN FE=FE+1
7010 LC=LC+1 : FLAG$(II,JJ)="*"
7020 IF SIG=1 THEN FC=FC+1
7030 NEXT JJ
7040 NEXT II
7050 IF NREPS%=RD% AND OPT="Y" THEN GOSUB 2150
7060 '
7070 ' SPJOTVOLL-STOLINE MODIFICATION OF HSD
7080 '
7090 FOR II=1 TO K: FOR JJ=1 TO K: FLAG$(II,JJ)="0": NEXT JJ,II
7110 FOR II=K TO 2 STEP -1
7120 FOR JJ=1 TO II-1
7130 IF NN(II)<=NN(JJ) THEN NMIN=NN(II) ELSE NMIN=NN(JJ) :' H-SS
7140 HCV=QT*SQR(MSW/NMIN) :' H-SS
7150 DIF=ABS(XBAR(II)-XBAR(JJ))
7160 IF NREPS%=RD% AND OPT="Y" THEN LPRINT USING ODIFHS;DIF,HCV
7180 IF DIF>=HCV THEN 7210
7190 FLAG$(II,JJ)="NS"
7200 GOTO 7250
7210 IF HF=1 THEN 7230 ELSE HE=HE+1 : HF=1
7220 IF SIG=1 THEN SHE=SHE+1
7230 HC=HC+1 : FLAG$(II,JJ)="*"
7240 IF SIG=1 THEN SHC=SHC+1
7250 NEXT JJ
7260 NEXT II
7270 IF NREPS%=RD% AND OPT="Y" THEN GOSUB 2150
7280 '
7290 ' TUKEY-KRAMER MODIFICATION OF HSD
7300 '
7310 FOR II=1 TO K: FOR JJ=1 TO K: FLAG$(II,JJ)="0": NEXT JJ,II
7320 FOR II=K TO 2 STEP -1
7330 FOR JJ=1 TO II-1
7340 NTK=((1/NN(II))+(1/NN(JJ)))/2 :' H-TK
7350 TCV=Q(K)*SQR(MSW*NTK) :' H-TK
7360 DIF=ABS(XBAR(II)-XBAR(JJ))
7370 IF NREPS%=RD% AND OPT="Y" THEN LPRINT USING ODIFHT;DIF,TCV
7380 '
7390 IF DIF>=TCV THEN 7420
7400 FLAG$(II,JJ)="NS"
7410 GOTO 7460
7420 IF TF=1 THEN 7440 ELSE TE=TE+1 : TF=1
7430 IF SIG=1 THEN STE=STE+1
7440 TC=TC+1 : FLAG$(II,JJ)="*"
7450 IF SIG=1 THEN STC=STC+1
7460 NEXT JJ
7470 NEXT II
7480 IF NREPS%=RD% AND OPT="Y" THEN GOSUB 2150
7490 '
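The three unequal-n standard-error adjustments used in this routine (harmonic mean of the two n's at 6300-6310, smaller of the two n's at 7130-7140, and the Tukey-Kramer average of reciprocals at 7340-7350) can be sketched in Python; the function names are illustrative, not part of the program:

```python
import math

def se_harmonic(msw, n1, n2):
    """SNK/MRT unequal-n: harmonic mean of the two sample sizes."""
    nhar = 2.0 / (1.0 / n1 + 1.0 / n2)
    return math.sqrt(msw / nhar)

def se_spjotvoll_stoline(msw, n1, n2):
    """Spjotvoll-Stoline: use the smaller of the two sample sizes."""
    return math.sqrt(msw / min(n1, n2))

def se_tukey_kramer(msw, n1, n2):
    """Tukey-Kramer: average of the reciprocals; algebraically this is
    identical to the harmonic-mean form, so the procedures differ only
    in the critical value (q at the layer vs. q at k) and the n used."""
    return math.sqrt(msw * (1.0 / n1 + 1.0 / n2) / 2.0)
```

Note that the harmonic-mean and Tukey-Kramer standard errors coincide; the Spjotvoll-Stoline form is the most conservative because the smaller n inflates the standard error.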
7500 ' SSD UNEQUAL N'S
7510 '
7520 FOR II=1 TO K: FOR JJ=1 TO K: FLAG$(II,JJ)="0": NEXT JJ,II
7540 FOR II=K TO 2 STEP -1
7550 FOR JJ=1 TO II-1
7560 NSD=((1/NN(II))+(1/NN(JJ))) :' SSD
7570 SCV=SQR((K-1)*FCV)*SQR(MSW*NSD) :' SSD
7580 DIF=ABS(XBAR(II)-XBAR(JJ))
7590 IF NREPS%=RD% AND OPT="Y" THEN LPRINT USING ODIFS;DIF,SCV
7600 '
7610 IF DIF>=SCV THEN 7640
7620 FLAG$(II,JJ)="NS"
7630 GOTO 7680
7640 IF SF=1 THEN 7660 ELSE SE=SE+1 : SF=1
7650 IF SIG=1 THEN SSE=SSE+1
7660 SC=SC+1 : FLAG$(II,JJ)="*"
7670 IF SIG=1 THEN SSC=SSC+1
7680 NEXT JJ
7690 NEXT II
7700 IF NREPS%=RD% AND OPT="Y" THEN GOSUB 2150
7720 ' ZERO OUT CYCLE COUNTERS
7740 FOR II=1 TO K : XBAR(II)=0 : NEXT II
7750 LF=0: HF=0: SF=0: NF=0: MF=0: TF=0
7760 '
7770 NEXT NREPS%
7780 '
7790 '!!!!!!!!!!!!!!!!!!!! SUMMARIZE RESULTS !!!!!!!!!!!!!!!!!!!!
7820 ' CALCULATE PERCENTS
7830 '
7840 PLE=C*LE/REP% :' EW Type I error rate (all anovas) - LSD
7850 PFE=C*FE/REP% :' EW Type I error rate (sig anovas) - FLSD
7860 PLC=C*LC/NCOMPS :' PC Type I error rate (all anovas) - LSD
7870 PFC=C*FC/NCOMPS :' PC Type I error rate (sig anovas) - FLSD
7880 '
7890 PME=C*ME/REP% :' MRT: EW (all)
7900 PSME=C*SME/REP% :' EW (sig)
7910 PMC=C*MC/NCOMPS :' PC (all)
7920 PSMC=C*SMC/NCOMPS :' PC (sig)
7930 '
7940 PNE=C*NE/REP% :' SNK
7950 PSNE=C*SNE/REP%
7960 PNC=C*NC/NCOMPS
7970 PSNC=C*SNC/NCOMPS
7980 '
7990 PSE=C*SE/REP% :' SSD
8000 PSSE=C*SSE/REP%
8010 PSC=C*SC/NCOMPS
8020 PSSC=C*SSC/NCOMPS
8030 '
8040 IF J>5 THEN 8110
8050 PHE=C*HE/REP% :' HSD
8060 PSHE=C*SHE/REP%
8070 PHC=C*HC/NCOMPS
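The percent calculations above divide each error count by the number of repetitions (experimentwise) or by the total number of comparisons (comparisonwise); C appears to be the percent-scaling constant, 100. A one-line Python sketch of that assumed interpretation:

```python
def rate_percent(count, denominator):
    """Convert an error count to a Type I error rate in percent,
    e.g. count/REP% for experimentwise, count/NCOMPS for comparisonwise."""
    return 100.0 * count / denominator
```

Checked against the first summary sheet in Appendix E: the LSD experimentwise count 109 over 1000 repetitions gives 10.90%, and the comparisonwise count 145 over 3000 comparisons gives 4.83%.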
8080 PSHC=C*SHC/NCOMPS
8090 GOTO 8210 :' Skip Unequal n HSD
8110 PHE=C*HE/REP% :' HSD-SS
8120 PSHE=C*SHE/REP%
8130 PHC=C*HC/NCOMPS
8140 PSHC=C*SHC/NCOMPS
8150 '
8160 PTE=C*TE/REP% :' HSD-TK
8170 PSTE=C*STE/REP%
8180 PTC=C*TC/NCOMPS
8190 PSTC=C*STC/NCOMPS
8200 '
8230 ' PRINT RUN TIME FOR K,J COMBINATION
8240 ' PRINT OUT SUMMARY SHEET
8250 '
8270 IF OPT="Y" THEN LPRINT CHR$(12);
8280 PRINT #3," " : PRINT #3," " : PRINT #3," "
8290 PRINT#3, O1
8300 PRINT#3," " : PRINT#3," " : PRINT #3," "
8310 PRINT#3, USING O2;K,CALC$
8320 PRINT#3, USING O3;J,DATE$
8330 PRINT#3," " : PRINT #3," "
8340 PRINT#3, O5
8350 PRINT#3, O6
8360 FOR I=2 TO K
8370 PRINT#3, USING O7;I,FCV,Q(I),M(I),QT
8380 NEXT I
8390 PRINT#3, O8
8400 PRINT#3, USING O9;REP%
8410 PRINT#3, USING O10;ICNT,.05*REP%,NCOMPS
8420 PRINT#3, USING O11;ICNT/REP%
8430 PRINT#3," "
8440 PRINT#3, O12
8450 PRINT#3, O13
8460 PRINT#3, O14
8470 PRINT#3, O15
8480 PRINT#3," "
8490 PRINT#3, USING O16;LE,LC
8500 PRINT#3, USING O17;PLE,PLC
8510 PRINT#3," "
8520 PRINT#3, USING O18;FE,FC
8530 PRINT#3, USING O19;PFE,PFC
8540 PRINT#3," "
8550 PRINT#3, USING O20;ME,SME,MC,SMC
8560 PRINT#3, USING O21;PME,PSME,PMC,PSMC
8570 PRINT#3," "
8580 PRINT#3, USING O22;NE,SNE,NC,SNC
8590 PRINT#3, USING O23;PNE,PSNE,PNC,PSNC
8600 PRINT#3," "
8610 '
8620 IF J>5 THEN 8700 :' Skip Equal n HSD
8630 '
8640 PRINT#3, USING O26;HE,SHE,HC,SHC :' HSD
8650 PRINT#3, USING O27;PHE,PSHE,PHC,PSHC
8660 '
8670 GOTO 8790 :' Skip Unequal n HSD
8680 '
8690 PRINT#3," "
8700 PRINT#3, O28
8710 PRINT#3," "
8720 PRINT#3, USING O31;HE,SHE,HC,SHC :' HSD-SS
8730 PRINT#3, USING O32;PHE,PSHE,PHC,PSHC
8740 PRINT#3," "
8750 PRINT#3, USING O29;TE,STE,TC,STC :' HSD-TK
8760 PRINT#3, USING O30;PTE,PSTE,PTC,PSTC
8770 PRINT#3," "
8780 PRINT#3, O28
8790 PRINT#3," "
8800 PRINT#3, USING O24;SE,SSE,SC,SSC :' SSD
8810 PRINT#3, USING O25;PSE,PSSE,PSC,PSSC
8820 PRINT#3," "
8830 PRINT#3, O33
8840 PRINT#3, CHR$(12);
8860 '
8870 ' ZERO OUT COUNTERS FOR NEXT K,J CYCLE
8880 '
8900 ICNT=0
8910 LF=0: HF=0: SF=0: NF=0: MF=0: TF=0
8920 LE=0: HE=0: SE=0: NE=0: ME=0: TE=0
8930 LC=0: HC=0: SC=0: NC=0: MC=0: TC=0: FC=0
8940 FE=0: SHE=0: SSE=0: SNE=0: SME=0: STE=0
8950 FC=0: SHC=0: SSC=0: SNC=0: SMC=0: STC=0
8960 SIG=0
8970 '
8980 GOTO 2590 :' RETURN FOR NEXT CYCLE
9000 END
APPENDIX D
ANALYSIS OF THE STEPWISE AND SIMULTANEOUS TESTING PROCEDURES
Both the SNK and MRT are stepwise, or layered, multiple comparison procedures. The critical differences computed by these procedures depend on the distance between the ranked means being tested. An example is presented below to demonstrate how these procedures are applied and how they differ from simultaneous procedures.
The example shown below is taken from one of the significant F-tests: repetition 130, k=6, J=5. The six group means from this cycle were 97.63, 97.06, 107.51, 100.55, 101.48, and 102.02. The ANOVA for these data is shown below:
              SS     df       MS       F      Fcv
Between    1769.1      5   353.83   2.632   2.281
Within    19359.4    144   134.44
Total     21128.5    149
Step 1. Rank order the group means. This results in the following listing of group means:
Rank     Mean    Original position
  1     107.51          3
  2     102.02          6
  3     101.48          5
  4     100.55          4
  5      97.63          1
  6      97.06          2
Step 2. Create a paired difference matrix as shown below:
          97.63   100.55   101.48   102.02   107.51
97.06      0.57     3.50     4.42     4.96  (10.45)
97.63               2.92     3.85     4.39     9.88
100.55                       0.93     1.47     6.96
101.48                                0.54     6.03
102.02                                         5.49
Step 3. Calculate the first critical difference. The first test is made of the largest difference, 10.45. The critical difference is computed by multiplying the standard error of difference by the appropriate Studentized Range Table critical value, as shown below:

SNK(6) = q(.05,6,144) √(134.44/25)
       = (4.07)(2.319)
       = 9.438

Since the actual difference is larger than the critical difference, this pair is declared significantly different. This is reflected by a "*". If this pair had not been declared significantly different, no further tests would have been made.
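The layered critical difference just computed can be checked with a short Python sketch (the function name is illustrative, not part of the original program):

```python
import math

def snk_critical(q, msw, n):
    """Critical difference for one SNK layer:
    q(alpha, r, dfw) times the standard error sqrt(MSW / n)."""
    return q * math.sqrt(msw / n)
```

With q = 4.07, MSW = 134.44 and n = 25 this reproduces the 9.438 above; with q = 3.90 it reproduces the 9.044 used in the next step.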
Step 4. Having tested the two means r = k = 6 ranks apart, we now test means r = k-1 = 5 ranks apart, shown below in ( ).
          97.63   100.55   101.48   102.02   107.51
97.06      0.57     3.50     4.42    (4.96)      *
97.63               2.92     3.85     4.39    (9.88)
100.55                       0.93     1.47     6.96
101.48                                0.54     6.03
102.02                                         5.49

The critical difference for this test is computed as follows:

SNK(5) = q(.05,5,144) √(134.44/25)
       = (3.90)(2.319)
       = 9.044

Since the actual difference, 4.96, is less than the critical difference, the pair is declared not significantly different. No further tests are made on this row. Additionally, no further tests are made from this column to the left. The test barrier, shown as a "-" sign, marks the limits for further pairwise tests.
          97.63   100.55   101.48   102.02   107.51
97.06      0.57     3.50      -       NS       *
97.63               2.92     3.85      -     (9.88)
100.55                       0.93     1.47     6.96
101.48                                0.54     6.03
102.02                                         5.49
The next test made is between the actual difference 9.88 and the critical difference 9.04. Since the actual difference is greater than the critical difference, this pair is declared significantly different. No further tests are made on this row because of the barrier (-).
          97.63   100.55   101.48   102.02   107.51
97.06      0.57     3.50      -       NS       *
97.63               2.92     3.85      -       *
100.55                       0.93     1.47     6.96
101.48                                0.54     6.03
102.02                                         5.49
Having tested means r = k-1 ranks apart, we now test means r = k-2 = 4 ranks apart, as shown in the ( ) below.
          97.63   100.55   101.48   102.02   107.51
97.06      0.57     3.50     (-)      NS       *
97.63               2.92     3.85     (-)      *
100.55                       0.93     1.47   (6.96)
101.48                                0.54     6.03
102.02                                         5.49

The critical difference for this test is computed as follows:

SNK(4) = q(.05,4,144) √(134.44/25)
       = (3.66)(2.319)
       = 8.488
Since the actual difference, 6.96, is less than the critical difference, the pair is declared not significantly different. Barriers are set to the left of and below this cell, as shown below.

          97.63   100.55   101.48   102.02   107.51
97.06      0.57     3.50      -       NS       *
97.63               2.92     3.85      -       *
100.55                       0.93      -       NS
101.48                                0.54      -
102.02                                         5.49

No further tests are made because the next test, r = k-3 = 3 means apart, has been barred from testing. At this point, the multiple comparison procedure ends.

The MRT procedure follows the same process but uses slightly lower critical values to compute its layered critical differences. In this example, the MRT critical differences were 7.34, 7.20, and 7.03 for r = 6, 5, and 4, respectively. In this case both procedures declared the same two comparisons significantly different.

As a point of comparison, the LSD critical difference was computed as 6.47:

LSD = q(.05,2,144) √(134.44/25)
    = (2.79)(2.319)
    = 6.470
It declared 3 comparisons significantly different, as shown below:

          97.63   100.55   101.48   102.02   107.51
97.06       NS       NS       NS       NS       *
97.63                NS       NS       NS       *
100.55                        NS       NS       *
101.48                                 NS       NS
102.02                                          NS

Notice that all of the differences were tested. The HSD uses the same simultaneous approach. Its critical difference was 9.44 for all differences:
HSD = q(.05,6,144) √(134.44/25)
    = (4.07)(2.319)
    = 9.438
The same pairs declared different by SNK and MRT were declared significant by the HSD, as shown below:

          97.63   100.55   101.48   102.02   107.51
97.06       NS       NS       NS       NS       *
97.63                NS       NS       NS       *
100.55                        NS       NS       NS
101.48                                 NS       NS
102.02                                          NS
The Scheffe Significant Difference was computed to be 11.075:
SSD = √((k-1)Fcv) √(2MSW/n)
    = √((6-1)(2.281)) √((2)(134.44)/25)
    = √(11.405) √(10.755)
    = 11.075
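The Scheffe computation can be verified with a small Python sketch (the function name is illustrative):

```python
import math

def scheffe_critical(k, fcv, msw, n):
    """Scheffe critical difference for equal n:
    sqrt((k-1) * Fcv) * sqrt(2 * MSW / n)."""
    return math.sqrt((k - 1) * fcv) * math.sqrt(2.0 * msw / n)
```

With k = 6, Fcv = 2.281, MSW = 134.44 and n = 25 this reproduces the 11.075 above.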
It declared no differences significant, as shown below:

          97.63   100.55   101.48   102.02   107.51
97.06       NS       NS       NS       NS       NS
97.63                NS       NS       NS       NS
100.55                        NS       NS       NS
101.48                                 NS       NS
102.02                                          NS
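The stepwise SNK logic traced in this appendix can be restated compactly. The sketch below is a simplified, containment-based restatement of the program's flag-and-barrier bookkeeping, using the q values quoted above; the q entry for r = 3 is an assumed filler that is never consulted in this example because that layer is barred:

```python
import math

def snk_layered(means, sd_err, q_by_r):
    """Step-down SNK: rank the means, test the widest span first,
    and bar every pair contained inside a nonsignificant span."""
    xs = sorted(means, reverse=True)
    k = len(xs)
    blocked = []       # spans (i, j) declared not significant
    significant = []
    for r in range(k, 1, -1):                 # widest layer first
        for i in range(k - r + 1):
            j = i + r - 1
            if any(a <= i and j <= b for a, b in blocked):
                blocked.append((i, j))        # inherited barrier
                continue
            if xs[i] - xs[j] >= q_by_r[r] * sd_err:
                significant.append((xs[i], xs[j]))
            else:
                blocked.append((i, j))
    return significant

# Appendix D example: k=6, MSW=134.44, n=25, q values from the text.
sd = math.sqrt(134.44 / 25)                   # 2.319
q = {6: 4.07, 5: 3.90, 4: 3.66, 3: 3.36, 2: 2.79}   # 3.36 is an assumed filler
pairs = snk_layered([97.63, 97.06, 107.51, 100.55, 101.48, 102.02], sd, q)
```

Run on the six means of this appendix, the sketch declares exactly the two pairs (107.51, 97.06) and (107.51, 97.63) significant, matching the worked example.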
APPENDIX E
* * * PRINT OUT OF MULTIPLE COMPARISON ERROR RATES * * *
GROUPS (k): 3     SIZE (J): 1     COMPUTATION TIME: 00:04:34

r        2:      3:
F      3.890   3.890
Q      3.080   3.770
M      3.080   3.230
QT     3.791   3.791

F-TESTS: 1000    SIG F-TESTS: 50    E: 50    COMPARISONS: 3000
PERCENT SIG: 0.050    E: 0.050

           EXPERIMENTWISE              COMPARISONWISE
        ALL ANOVAS   SIG ANOVAS    ALL ANOVAS   SIG ANOVAS
LSD     109 10.90%                 145  4.83%
FLSD                  50  5.00%                  77  2.57%
MRT      90  9.00%    50  5.00%    123  4.10%    77  2.57%
SNK      48  4.80%    45  4.50%     70  2.33%    67  2.23%
HSD      48  4.80%    45  4.50%     60  2.00%    57  1.90%
SSD      39  3.90%    39  3.90%     48  1.60%    48  1.60%
* * * PRINT OUT OF MULTIPLE COMPARISON ERROR RATES * * *
GROUPS (k): 3     SIZE (J): 2     COMPUTATION TIME: 00:08:00

r        2:      3:
F      3.355   3.355
Q      2.905   3.510
M      2.905   3.050
QT     3.519   3.519

F-TESTS: 1000    SIG F-TESTS: 52    E: 50    COMPARISONS: 3000
PERCENT SIG: 0.052    E: 0.050

           EXPERIMENTWISE              COMPARISONWISE
        ALL ANOVAS   SIG ANOVAS    ALL ANOVAS   SIG ANOVAS
LSD     128 12.80%                 159  5.30%
FLSD                  52  5.20%                  82  2.73%
MRT     104 10.40%    52  5.20%    135  4.50%    82  2.73%
SNK      51  5.10%    47  4.70%     77  2.57%    73  2.43%
HSD      51  5.10%    47  4.70%     58  1.93%    54  1.80%
SSD      40  4.00%    40  4.00%     44  1.47%    44  1.47%
* * * PRINT OUT OF MULTIPLE COMPARISON ERROR RATES * * *

GROUPS (k): 3     SIZE (J): 3     COMPUTATION TIME: 00:11:25

r        2:      3:
F      3.222   3.222
Q      2.857   3.436
M      2.857   3.007
QT     3.446   3.446

F-TESTS: 1000    SIG F-TESTS: 50    E: 50    COMPARISONS: 3000
PERCENT SIG: 0.050    E: 0.050

           EXPERIMENTWISE              COMPARISONWISE
        ALL ANOVAS   SIG ANOVAS    ALL ANOVAS   SIG ANOVAS
LSD     108 10.80%                 138  4.60%
FLSD                  50  5.00%                  75  2.50%
MRT      90  9.00%    50  5.00%    117  3.90%    75  2.50%
SNK      49  4.90%    46  4.60%     71  2.37%    68  2.27%
HSD      49  4.90%    46  4.60%     59  1.97%    56  1.87%
SSD      40  4.00%    40  4.00%     46  1.53%    46  1.53%
* * * PRINT OUT OF MULTIPLE COMPARISON ERROR RATES * * *

GROUPS (k): 3     SIZE (J): 4     COMPUTATION TIME: 00:14:51

r        2:      3:
F      3.162   3.162
Q      2.834   3.406
M      2.834   2.984
QT     3.413   3.413

F-TESTS: 1000    SIG F-TESTS: 50    E: 50    COMPARISONS: 3000
PERCENT SIG: 0.050    E: 0.050

           EXPERIMENTWISE              COMPARISONWISE
        ALL ANOVAS   SIG ANOVAS    ALL ANOVAS   SIG ANOVAS
LSD     126 12.60%                 155  5.17%
FLSD                  50  5.00%                  75  2.50%
MRT     101 10.10%    50  5.00%    130  4.33%    75  2.50%
SNK      54  5.40%    48  4.80%     78  2.60%    72  2.40%
HSD      54  5.40%    48  4.80%     66  2.20%    60  2.00%
SSD      47  4.70%    47  4.70%     54  1.80%    54  1.80%
* * * PRINT OUT OF MULTIPLE COMPARISON ERROR RATES * * *

GROUPS (k): 3     SIZE (J): 5     COMPUTATION TIME: 00:18:19

r        2:      3:
F      3.134   3.134
Q      2.824   3.392
M      2.821   2.971
QT     3.397   3.397

F-TESTS: 1000    SIG F-TESTS: 54    E: 50    COMPARISONS: 3000
PERCENT SIG: 0.054    E: 0.050

           EXPERIMENTWISE              COMPARISONWISE
        ALL ANOVAS   SIG ANOVAS    ALL ANOVAS   SIG ANOVAS
LSD     127 12.70%                 161  5.37%
FLSD                  54  5.40%                  84  2.80%
MRT     100 10.00%    54  5.40%    133  4.43%    84  2.80%
SNK      54  5.40%    51  5.10%     81  2.70%    78  2.60%
HSD      54  5.40%    51  5.10%     67  2.23%    64  2.13%
SSD      45  4.50%    45  4.50%     54  1.80%    54  1.80%
k * •* PRINT OUT OF MULTIPLE COMPARISON ERROR RATES * * *
GROUPS (k) SIZE (J):
3 6
COMPUTATION TIME: 00:12:13
r 2: 3 :
F 3.222 3. 222
Q 2.857 3.436
M 2.857 3 .007
QT 3 . 446 3.446
F-TESTS: 1000 SIG F-TESTS: 53 E: 50 PERCENT SIG: 0.053 E: 0.050
EXPERIMENTWISE
ALL ANOVAS SIG ANOVAS
COMPARISONS: 3000
COMPARISONWISE
ALL ANOVAS SIG ANOVAS
LSD
FLSD
MRT
SNK
HSD—SS
HSD—TK
SSD
127 12.70%
100 10.00%
51 5.10%
45 4 . 50%
53 5.30%
53 5 . 30%
48 4.80%
162 5.40%
8 2 2.73%
134 82 4.47% 2.7 3%
75 72 2. 50% 2. 40%
34 3 .40%
34 3 . 40%
N HSD
37 1 . 23% 1
54 5.40%
51 5.10%
65 2.17%
KT u r n
2
45 4.50%
52 1 . 73%
37
62
52 .1. 735
* * * PRINT OUT OF MULTIPLE COMPARISON ERROR RATES * * *
GROUPS (k): 3     SIZE (J): 7
COMPUTATION TIME: 00:29:19
r 2 : 3 :
F 3 .074 3.074
Q 2.802 3 . 362
M 2. 794 2.944
QT 3.364 3 . 364
F-TESTS: 1000 SIG F-TESTS: 53 E: 50 PERCENT SIG: 0.053 E: 0.050
EXPERIMENTWISE
ALL ANOVAS SIG ANOVAS ALL ANOVAS
LSD 120 153 12.00% 5.10%
FLSD 53 5 . 30%
MRT 94 51 126 9 . 40% 5.10% 4. 20%
SNK 51 43 70 5.10% 4. 30% 2.3 3%
----- UNEQUAL N HSD -----
HSD-SS 26 21 28 2 . 60% 2.10% 0.93%
HSD-TK 57 48 65 5 . 70% 4 . 80% 2.17%
----- UNEQUAL N HSD -----
SSD 30 30 3 4 3 . 00% 3 .00% 1.13%
COMPARISONS: 3000
COMPARISONWISE
SIG ANOVAS
78 2.60%
76 2.53%
62 2.07%
23 0.11%
56 1.87%
34 1.13%
* * * PRINT OUT OF MULTIPLE COMPARISON ERROR RATES * * *

GROUPS (k): 4     SIZE (J): 1     COMPUTATION TIME: 00:06:23

r        2:      3:      4:
F      3.240   3.240   3.240
Q      3.000   3.650   4.050
M      3.000   3.150   3.230
QT     4.050   4.050   4.050

F-TESTS: 1000    SIG F-TESTS: 51    E: 50    COMPARISONS: 6000
PERCENT SIG: 0.051    E: 0.050

           EXPERIMENTWISE              COMPARISONWISE
        ALL ANOVAS   SIG ANOVAS    ALL ANOVAS   SIG ANOVAS
LSD     181 18.10%                 290  4.83%
FLSD                  51  5.10%                 133  2.22%
MRT     128 12.80%    51  5.10%    216  3.60%   124  2.07%
SNK      46  4.60%    40  4.00%     77  1.28%    71  1.18%
HSD      46  4.60%    40  4.00%     62  1.03%    56  0.93%
SSD      25  2.50%    25  2.50%     33  0.55%    33  0.55%
* * * PRINT OUT OF MULTIPLE COMPARISON ERROR RATES * * *

GROUPS (k): 4     SIZE (J): 2     COMPUTATION TIME: 00:10:54

r        2:      3:      4:
F      2.872   2.872   2.872
Q      2.872   3.460   3.814
M      2.872   3.022   3.108
QT     3.814   3.814   3.814

F-TESTS: 1000    SIG F-TESTS: 54    E: 50    COMPARISONS: 6000
PERCENT SIG: 0.054    E: 0.050

           EXPERIMENTWISE              COMPARISONWISE
        ALL ANOVAS   SIG ANOVAS    ALL ANOVAS   SIG ANOVAS
LSD     182 18.20%                 285  4.75%
FLSD                  54  5.40%                 132  2.20%
MRT     132 13.20%    54  5.40%    214  3.57%   119  1.98%
SNK      50  5.00%    46  4.60%     92  1.53%    88  1.47%
HSD      50  5.00%    46  4.60%     70  1.17%    66  1.10%
SSD      27  2.70%    27  2.70%     38  0.63%    38  0.63%
* * * PRINT OUT OF MULTIPLE COMPARISON ERROR RATES * * *

GROUPS (k): 4     SIZE (J): 3     COMPUTATION TIME: 00:15:29

r        2:      3:      4:
F      2.776   2.776   2.776
Q      2.836   3.408   3.750
M      2.836   2.986   3.084
QT     3.749   3.749   3.749

F-TESTS: 1000    SIG F-TESTS: 53    E: 50    COMPARISONS: 6000
PERCENT SIG: 0.053    E: 0.050

           EXPERIMENTWISE              COMPARISONWISE
        ALL ANOVAS   SIG ANOVAS    ALL ANOVAS   SIG ANOVAS
LSD     195 19.50%                 300  5.00%
FLSD                  53  5.30%                 120  2.00%
MRT     142 14.20%    53  5.30%    220  3.67%   111  1.85%
SNK      55  5.50%    48  4.80%     83  1.38%    76  1.27%
HSD      55  5.50%    48  4.80%     66  1.10%    59  0.98%
SSD      35  3.50%    35  3.50%     38  0.63%    38  0.63%
* * * PRINT OUT OF MULTIPLE COMPARISON ERROR RATES * * *

GROUPS (k): 4     SIZE (J): 4     COMPUTATION TIME: 00:20:03

r        2:      3:      4:
F      2.739   2.739   2.739
Q      2.822   3.389   3.724
M      2.818   2.968   3.068
QT     3.724   3.724   3.724

F-TESTS: 1000    SIG F-TESTS: 45    E: 50    COMPARISONS: 6000
PERCENT SIG: 0.045    E: 0.050

           EXPERIMENTWISE              COMPARISONWISE
        ALL ANOVAS   SIG ANOVAS    ALL ANOVAS   SIG ANOVAS
LSD     207 20.70%                 297  4.95%
FLSD                  45  4.50%                 101  1.68%
MRT     139 13.90%    45  4.50%    207  3.45%    93  1.55%
SNK      49  4.90%    36  3.60%     65  1.08%    52  0.87%
HSD      49  4.90%    36  3.60%     56  0.93%    43  0.72%
SSD      15  1.50%    15  1.50%     16  0.27%    16  0.27%
* * * PRINT OUT OF MULTIPLE COMPARISON ERROR RATES * * *

GROUPS (k): 4     SIZE (J): 5     COMPUTATION TIME: 00:24:36

r        2:      3:      4:
F      2.712   2.712   2.712
Q      2.812   3.376   3.704
M      2.803   2.953   3.053
QT     3.707   3.707   3.707

F-TESTS: 1000    SIG F-TESTS: 54    E: 50    COMPARISONS: 6000
PERCENT SIG: 0.054    E: 0.050

           EXPERIMENTWISE              COMPARISONWISE
        ALL ANOVAS   SIG ANOVAS    ALL ANOVAS   SIG ANOVAS
LSD     194 19.40%                 307  5.12%
FLSD                  54  5.40%                 129  2.15%
MRT     137 13.70%    54  5.40%    227  3.78%   121  2.02%
SNK      52  5.20%    45  4.50%     79  1.32%    72  1.20%
HSD      52  5.20%    45  4.50%     61  1.02%    54  0.90%
SSD      24  2.40%    24  2.40%     28  0.47%    28  0.47%
* * * PRINT OUT OF MULTIPLE COMPARISON ERROR RATES * * *
GROUPS (k): SIZE (J):
4 6
COMPUTATION TIME: 00:19:17
F 2. 752 2.752 2. 752
Q 2.827 3 . 396 3. 734
M 2.826 2.976 3.076
QT 3. 733 3.733 3.733
F-TESTS: 1000 SIG F-TESTS: 46 E: 50 PERCENT SIG: 0.046 E: 0.050
COMPARISONS 6000
EXPERIMENTWISE
ALL ANOVAS SIG ANOVAS
COMPARISONWISE
ALL ANOVAS SIG ANOVAS
LSD
FLSD
MRT
SNK
HSD-SS
HSD-TK
198 19.80%
130 13.00%
52 5. 20%
24 2 . 40%
53 5.30%
46 4 . 60%
45 4 . 50%
39 3 .90%
300 5.00%
200 3.33%
1 . 42%
UNEQUAL N HSD
24 2 . 40%
40 4 . 00%
31 0.5 2%
70 1.17%
106 1 .7 7%
101 1 .68%
7 2 1 . 20%
31 0.5 2%
5 7 0 . 95%
-UNEQUAL N HSD
26 2.60%
26 2 . 60%
35 0 . 58%
35 0.58%
rt * * PRINT OUT OF MULTIPLE COMPARISON ERROR RATES * * *
GROUPS (k) SIZE (J):
4 7
COMPUTATION TIME: 00:35:13
r        2:      3:      4:
F      2.674   2.674   2.674
Q      2.792   3.347   3.667
M      2.786   2.936   3.036
QT     3.672   3.672   3.672
F-TESTS 1000 SIG F-TESTS: 45 E 50 COMPARISONS : 6000 PERCENT SIG: 0.045 E • 0.050
EXPERIMENTWISE C OMPARISONWI SE
ALL ANOVAS SIG ANOVAS ALL ANOVAS SIG ANOVAS
LSD 187 276 18.70% 4 .60%
FLSD 45 9 7 4.50% 1 . 62%
MRT 116 43 175 87 11.60% 4. 30% 2.92% 1 .45%
SNK 4 2 3 4 66 58 4 . 20% 3 . 40% 1 . 10% 0.9 7%
----- UNEQUAL N HSD -----
HSD-SS 28 23 32 27 2.80% 2. 30% 0 .53% 0. 4 5%
HSD-TK 47 38 63 54 4 . 70% 3 . 80% 1 .05% 0.90%
----- UNEQUAL N HSD -----
SSD 22 22 25 25 2 . 20% 2 . 20% 0. 42% 0.4 2%
* * * PRINT OUT OF MULTIPLE COMPARISON ERROR RATES * * *

GROUPS (k): 5     SIZE (J): 1     COMPUTATION TIME: 00:08:17

r        2:      3:      4:      5:
F      2.870   2.870   2.870   2.870
Q      2.950   3.580   3.960   4.230
M      2.950   3.100   3.180   3.250
QT     4.233   4.233   4.233   4.233

F-TESTS: 1000    SIG F-TESTS: 49    E: 50    COMPARISONS: 10000
PERCENT SIG: 0.049    E: 0.050

           EXPERIMENTWISE              COMPARISONWISE
        ALL ANOVAS   SIG ANOVAS    ALL ANOVAS   SIG ANOVAS
LSD     264 26.40%                 482  4.82%
FLSD                  49  4.90%                 161  1.61%
MRT     193 19.30%    49  4.90%    344  3.44%   139  1.39%
SNK      47  4.70%    39  3.90%     79  0.79%    71  0.71%
HSD      47  4.70%    39  3.90%     62  0.62%    54  0.54%
SSD      23  2.30%    23  2.30%     30  0.30%    30  0.30%
* * * PRINT OUT OF MULTIPLE COMPARISON ERROR RATES * * *

GROUPS (k): 5     SIZE (J): 2     COMPUTATION TIME: 00:13:59

r        2:      3:      4:      5:
F      2.590   2.590   2.590   2.590
Q      2.853   3.430   3.777   4.013
M      2.853   3.003   3.095   3.162
QT     4.024   4.024   4.024   4.024

F-TESTS: 1000    SIG F-TESTS: 45    E: 50    COMPARISONS: 10000
PERCENT SIG: 0.045    E: 0.050

           EXPERIMENTWISE              COMPARISONWISE
        ALL ANOVAS   SIG ANOVAS    ALL ANOVAS   SIG ANOVAS
LSD     263 26.30%                 485  4.85%
FLSD                  45  4.50%                 148  1.48%
MRT     181 18.10%    45  4.50%    319  3.19%   127  1.27%
SNK      45  4.50%    37  3.70%     69  0.69%    61  0.61%
HSD      45  4.50%    37  3.70%     59  0.59%    51  0.51%
SSD      13  1.30%    13  1.30%     15  0.15%    15  0.15%
* * * PRINT OUT OF MULTIPLE COMPARISON ERROR RATES * * *

GROUPS (k): 5     SIZE (J): 3     COMPUTATION TIME: 00:20:00

r        2:      3:      4:      5:
F      2.517   2.517   2.517   2.517
Q      2.825   3.393   3.730   3.970
M      2.822   2.972   3.072   3.135
QT     3.968   3.968   3.968   3.968

F-TESTS: 1000    SIG F-TESTS: 49    E: 50    COMPARISONS: 10000
PERCENT SIG: 0.049    E: 0.050

           EXPERIMENTWISE              COMPARISONWISE
        ALL ANOVAS   SIG ANOVAS    ALL ANOVAS   SIG ANOVAS
LSD     279 27.90%                 516  5.16%
FLSD                  49  4.90%                 174  1.74%
MRT     191 19.10%    49  4.90%    343  3.43%   147  1.47%
SNK      48  4.80%    36  3.60%     80  0.80%    68  0.68%
HSD      48  4.80%    36  3.60%     65  0.65%    53  0.53%
SSD      18  1.80%    18  1.80%     26  0.26%    26  0.26%
* * * PRINT OUT OF MULTIPLE COMPARISON ERROR RATES * * *

GROUPS (k): 5     SIZE (J): 4     COMPUTATION TIME: 00:25:41

r        2:      3:      4:      5:
F      2.483   2.483   2.483   2.483
Q      2.813   3.377   3.705   3.945
M      2.804   2.954   3.054   3.123
QT     3.942   3.942   3.942   3.942

F-TESTS: 1000    SIG F-TESTS: 47    E: 50    COMPARISONS: 10000
PERCENT SIG: 0.047    E: 0.050

           EXPERIMENTWISE              COMPARISONWISE
        ALL ANOVAS   SIG ANOVAS    ALL ANOVAS   SIG ANOVAS
LSD     278 27.80%                 485  4.85%
FLSD                  47  4.70%                 158  1.58%
MRT     180 18.00%    47  4.70%    316  3.16%   139  1.39%
SNK      46  4.60%    35  3.50%     70  0.70%    59  0.59%
HSD      46  4.60%    35  3.50%     56  0.56%    45  0.45%
SSD      16  1.60%    16  1.60%     18  0.18%    18  0.18%
* * * PRINT OUT OF MULTIPLE COMPARISON ERROR RATES * * *

GROUPS (k): 5     SIZE (J): 5     COMPUTATION TIME: 00:31:35

r        2:      3:      4:      5:
F      2.450   2.450   2.450   2.450
Q      2.800   3.360   3.680   3.920
M      2.792   2.943   3.042   3.112
QT     3.917   3.917   3.917   3.917

F-TESTS: 1000    SIG F-TESTS: 53    E: 50    COMPARISONS: 10000
PERCENT SIG: 0.053    E: 0.050

           EXPERIMENTWISE              COMPARISONWISE
        ALL ANOVAS   SIG ANOVAS    ALL ANOVAS   SIG ANOVAS
LSD     294 29.40%                 508  5.08%
FLSD                  53  5.30%                 169  1.69%
MRT     189 18.90%    53  5.30%    316  3.16%   149  1.49%
SNK      46  4.60%    38  3.80%     88  0.88%    80  0.80%
HSD      46  4.60%    38  3.80%     71  0.71%    63  0.63%
SSD      19  1.90%    19  1.90%     28  0.28%    28  0.28%
* * PRINT OUT OF MULTIPLE COMPARISON ERROR RATES * *
GROUPS (k): SIZE (J):
5 6
COMPUTATION TIME: 00:27:54
F 2.433 2.483 2. 483 2.483
Q 2.813 3.377 3.705 3.945
M 2.804 2.954 3 .054 3. 123
QT 3.942 3.94 2 3 .94 2 3.942
F-TESTS: 1000 SIG F-TESTS: 46 E: 50 PERCENT SIG: 0.046 E: 0.050
COMPARISONS 10000
EXPERIMENTWISE
ALL ANOVAS SIG ANOVAS
COMPARISONWISE
ALL ANOVAS SIG ANOVAS
LSD 295 29.50%
531 5.31%
FLSD
MRT
SNK
178 17.80%
45 4 . 50%
46 4 . 60%
45 4.50%
36 3 . 60%
299 2.99%
65 0.65%
1 . 38%
113 1 . 13%
0 . 565
HSD-SS
HSD-TK
SSD
19 1 . 90%
47 4 . 70%
13 1 . 30%
-UNEQUAL N HSD
17 1 . 70%
38 3.80%
27 0.27%
61 0 . 6 1 %
-UNEQUAL N HSD
13 1 . 30%
20 0 . 20%
z o 0 . 25%
52 0.52%
20 0 . 20%
PRINT OUT OF MULTIPLE COMPARISON ERROR RATES * * *
GROUPS (k) SIZE (J):
5 7
COMPUTATION TIME: 00:41:37
F 2.437 2.437 2.437 2.437
Q 2. 783 3.331 3.651 3.885
M 2. 779 2.929 3.029 3.099
QT 3.883 3.883 3.883 3 .883
F-TESTS: 1000 SIG F-TESTS: 49 E: 50 PERCENT SIG: 0.049 E: 0.050
COMPARISONS: 10000
EXPERIMENTWISE COMPARISONWISE —
ALL ANOVAS SIG ANOVAS ALL ANOVAS SIG ANOVA
LSD 299 528 29.90% 5 . 28%
FLSD 49 158 4.90% 1 . 58%
MRT 188 48 328 142 18.80% 4.80% 3.28% 1.42%
SNK 51 35 84 6 7 5.10% 3. 50% 0 .84% 0.6 7%
UNEQUAL N HSD
HSD-SS 41 29 53 41 4.10% 2.90% 0. 53% 0.41%
HSD-TK 56 39 77 60 5 . 60% 3 . 90% 0.7 7% 0. 60%
----- UNEQUAL N HSD -----
SSD 17 17 24 24 1.70% 1.70% 0.24% 0.24%
* * * PRINT OUT OF MULTIPLE COMPARISON ERROR RATES * * *

GROUPS (k): 6     SIZE (J): 1     COMPUTATION TIME: 00:10:24

r        2:      3:      4:      5:      6:
F      2.620   2.620   2.620   2.620   2.620
Q      2.920   3.530   3.900   4.170   4.370
M      2.920   3.070   3.150   3.220   3.280
QT     4.373   4.373   4.373   4.373   4.373

F-TESTS: 1000    SIG F-TESTS: 53    E: 50    COMPARISONS: 15000
PERCENT SIG: 0.053    E: 0.050

           EXPERIMENTWISE              COMPARISONWISE
        ALL ANOVAS   SIG ANOVAS    ALL ANOVAS   SIG ANOVAS
LSD     346 34.60%                 780  5.20%
FLSD                  53  5.30%                 240  1.60%
MRT     231 23.10%    53  5.30%    513  3.42%   209  1.39%
SNK      50  5.00%    39  3.90%    100  0.67%    89  0.59%
HSD      50  5.00%    39  3.90%     75  0.50%    64  0.43%
SSD      11  1.10%    11  1.10%     16  0.11%    16  0.11%
* * * PRINT OUT OF MULTIPLE COMPARISON ERROR RATES * * *

GROUPS (k): 6     SIZE (J): 2     COMPUTATION TIME: 00:18:17

r        2:      3:      4:      5:      6:
F      2.394   2.394   2.394   2.394   2.394
Q      2.839   3.412   3.755   3.963   4.181
M      2.839   2.989   3.086   3.149   3.206
QT     4.184   4.184   4.184   4.184   4.184

F-TESTS: 1000    SIG F-TESTS: 49    E: 50    COMPARISONS: 15000
PERCENT SIG: 0.049    E: 0.050

           EXPERIMENTWISE              COMPARISONWISE
        ALL ANOVAS   SIG ANOVAS    ALL ANOVAS   SIG ANOVAS
LSD     354 35.40%                 741  4.94%
FLSD                  49  4.90%                 206  1.37%
MRT     229 22.90%    49  4.90%    461  3.07%   177  1.18%
SNK      49  4.90%    40  4.00%     89  0.59%    80  0.53%
HSD      49  4.90%    40  4.00%     72  0.48%    63  0.42%
SSD      11  1.10%    11  1.10%     13  0.09%    13  0.09%
164

* * * PRINT OUT OF MULTIPLE COMPARISON ERROR RATES * * *

GROUPS (k): 6     SIZE (J): 3
COMPUTATION TIME: 00:24:0*

r        F        Q        M        QT
2:     2.338    2.818    2.812    4.136
3:     2.338    3.384    2.962    4.136
4:     2.338    3.716    3.062    4.136
5:     2.338    3.956    3.128    4.136
6:     2.338    4.136    3.188    4.136

F-TESTS: 1000   SIG F-TESTS: 53   E: 50   PERCENT SIG: 0.053   E: 0.050
COMPARISONS: 15000

                 EXPERIMENTWISE                    COMPARISONWISE
            ALL ANOVAS     SIG ANOVAS        ALL ANOVAS     SIG ANOVAS
LSD         350  35.00%                      756   5.04%
FLSD         53   5.30%                      233   1.55%
MRT         217  21.70%     53   5.30%       460   3.07%    193   1.29%
SNK          55   5.50%     39   3.90%        98   0.65%     81   0.54%
HSD          55   5.50%     39   3.90%        81   0.54%     65   0.43%
SSD          21   2.10%     21   2.10%        28   0.19%     28   0.19%
165

* * * PRINT OUT OF MULTIPLE COMPARISON ERROR RATES * * *

GROUPS (k): 6     SIZE (J): 4
COMPUTATION TIME: 00:30:56

r        F        Q        M        QT
2:     2.298    2.803    2.---    4.103
3:     2.298    3.364    2.945    4.103
4:     2.298    3.686    3.045    4.103
5:     2.298    3.926    3.115    4.103
6:     2.298    4.106    3.175    4.103

F-TESTS: 1000   SIG F-TESTS: 51   E: 50   PERCENT SIG: 0.051   E: 0.050
COMPARISONS: 15000

                 EXPERIMENTWISE                    COMPARISONWISE
            ALL ANOVAS     SIG ANOVAS        ALL ANOVAS     SIG ANOVAS
LSD         359  35.90%                      766   5.11%
FLSD         51   5.10%                      228   1.52%
MRT         225  22.50%     51   5.10%       455   3.03%    187   1.25%
SNK          48   4.80%     38   3.80%        95   0.63%     85   0.57%
HSD          48   4.80%     38   3.80%        66   0.44%     56   0.37%
SSD          11   1.10%     11   1.10%        15   0.10%     15   0.10%
166
* * * PRINT OUT OF MULTIPLE COMPARISON ERROR RATES * * *

GROUPS (k): 6     SIZE (J): 5
COMPUTATION TIME: 00:39:24

r        F        Q        M        QT
2:     2.281    2.788    2.783    4.070
3:     2.281    3.340    2.933    4.070
4:     2.281    3.660    3.033    4.070
5:     2.281    3.896    3.103    4.070
6:     2.281    4.072    3.163    4.070

F-TESTS: 1000   SIG F-TESTS: 54   E: 50   PERCENT SIG: 0.054   E: 0.050
COMPARISONS: 15000

                 EXPERIMENTWISE                    COMPARISONWISE
            ALL ANOVAS     SIG ANOVAS        ALL ANOVAS     SIG ANOVAS
LSD         375  37.50%                      767   5.11%
FLSD         54   5.40%                      231   1.54%
MRT         222  22.20%     54   5.40%       453   3.02%    193   1.29%
SNK          48   4.80%     34   3.40%        87   0.58%     73   0.49%
HSD          48   4.80%     34   3.40%        69   0.46%     55   0.37%
SSD          13   1.30%     13   1.30%        16   0.11%     16   0.11%
167
* * * PRINT OUT OF MULTIPLE COMPARISON ERROR RATES * * *

GROUPS (k): 6     SIZE (J): 6
COMPUTATION TIME: 00:38:04

r        F        Q        M        QT
2:     2.287    2.796    2.789    4.086
3:     2.287    3.353    2.939    4.086
4:     2.287    3.673    3.039    4.086
5:     2.287    3.911    3.109    4.086
6:     2.287    4.089    3.169    4.086

F-TESTS: 1000   SIG F-TESTS: 46   E: 50   PERCENT SIG: 0.046   E: 0.050
COMPARISONS: 15000

                 EXPERIMENTWISE                    COMPARISONWISE
            ALL ANOVAS     SIG ANOVAS        ALL ANOVAS     SIG ANOVAS
LSD         342  34.20%                      741   4.94%
FLSD         46   4.60%                      195   1.30%
MRT         192  19.20%     45   4.50%       387   2.58%    149   0.99%
SNK          39   3.90%     28   2.80%        70   0.47%     58   0.39%
             UNEQUAL N HSD
HSD-SS       19   1.90%     18   1.80%        21   0.14%     20   0.13%
HSD-TK       46   4.60%     34   3.40%        65   0.43%     52   0.35%
SSD          13   1.30%     13   1.30%        14   0.09%     14   0.09%
168
* * * PRINT OUT OF MULTIPLE COMPARISON ERROR RATES * * *

GROUPS (k): 6     SIZE (J): 7
COMPUTATION TIME: 00:48:19

r        F        Q        M        QT
2:     2.270    2.773    2.772    4.037
3:     2.270    3.315    2.922    4.037
4:     2.270    3.635    3.022    4.037
5:     2.270    3.866    3.092    4.037
6:     2.270    4.037    3.152    4.037

F-TESTS: 1000   SIG F-TESTS: 48   E: 50   PERCENT SIG: 0.048   E: 0.050
COMPARISONS: 15000

                 EXPERIMENTWISE                    COMPARISONWISE
            ALL ANOVAS     SIG ANOVAS        ALL ANOVAS     SIG ANOVAS
LSD         361  36.10%                      763   5.09%
FLSD         48   4.80%                      196   1.31%
MRT         206  20.60%     45   4.50%       400   2.67%    147   0.98%
SNK          46   4.60%     31   3.10%        85   0.57%     70   0.47%
             UNEQUAL N HSD
HSD-SS       36   3.60%     25   2.50%        49   0.33%     38   0.25%
HSD-TK       51   5.10%     33   3.30%        73   0.49%     55   0.37%
SSD          11   1.10%     11   1.10%        16   0.11%     16   0.11%
APPENDIX F
Articles Citing the Findings of Carmer and Swanson 1973
Articles Which Used FLSD
1. Atchley, W. R.; Rutledge, J. J.; Cowley, D. E., "A Multivariate Statistical Analysis of Direct and Correlated Response to Selection in the Rat," Evolution, XXXVI (July 1982), pp. 677-698.

2. Bryant, Edwin H., "Morphometric Adaptation of the Housefly, MUSCA DOMESTICA L., in the United States," Evolution, XXXI (September 1977), pp. 580-596.

3. _______ and Turner, Carl R., "Comparative Morphometric Adaptation in the Housefly and Facefly in the United States," Evolution, XXXII (December 1978), pp. 759-770.

4. Cameron, Guy N. and Kincaid, W. Bradley, "Species Removal Effects on Movements of Sigmodon hispides [cotton rats] and Reithrodontomys fulvescens [harvest mice]," American Midland Naturalist, CVIII (July 1982), pp. 60-67.

5. Cardon, Kathleen; Anthony, Rita Jo; Hendricks, Deloy G.; and Mahoney, Arthur W., "Effect of Atmospheric Oxidation on Bioavailability of Meat Iron and Liver Weights in Rats," Journal of Nutrition, CX (March 1980), pp. 567-574.

6. Dhingra, O. D. and Sinclair, J. B., "Survival of Macrophomina phaseolina Sclerotia in Soil: Effects of Soil Moisture, Carbon: Nitrogen Ratios, Carbon Sources, and Nitrogen Concentrations," Phytopathology, LXV (March 1975), pp. 236-240.
169
170
7. Fajemisin, J. M. and Hooker, A. L., "Predisposition to Diplodia Stalk Rot in Corn Affected by Three Helminthosporium Leaf Blights," Phytopathology, LXIV (December 1974), pp. 1496-1499.

8. _______, "Top Weight, Root Weight, and Root Rot of Corn Seedlings as Influenced by Three Helminthosporium Leaf Blights," Plant Disease Reporter, LVIII (April 1974), pp. 313-317.

9. Farmer, Bonnie R.; Mahoney, Arthur W.; Hendricks, Deloy G.; and Gillett, Tedford, "Iron Bioavailability of Hand-Deboned and Mechanically Deboned Beef," Journal of Food Science, XLII (November-December 1977), pp. 1630-1632.

10. Friedrich, J. W.; Smith, Dale; and Schrader, L. E., "Herbage Yield and Chemical Composition of Switchgrass as Affected by N, S, and K Fertilizations," Agronomy Journal, LXIX (January-February 1977), pp. 30-33.

11. Fritzell, Erik K., "Habitat Use by Prairie Raccoons During the Waterfowl Breeding Season," Journal of Wildlife Management, XLII (January 1978), pp. 118-127.

12. Garcia-de-Siles, J. L.; Ziegler, J. H.; and Wilson, L. L., "Effects of Marbling and Conformation Scores on Quality and Quantity Characteristics of Steer and Heifer Carcasses," Journal of Animal Science, XLIV (January 1977), pp. 36-46.

13. _______, "Prediction of Beef Quality by Three Grading Systems," Journal of Food Science, XLII (May-June 1977), pp. 711-715.

14. _______, "Growth, Carcass, and Muscle Characters of Hereford and Holstein Steers," Journal of Animal Science, XLIV (June 1977), pp. 973-984.

15. Hammerstedt, Roy H. and Hay, Sandra R., "Effect of Incubation Temperature on Motility and cAMP Content of Bovine Sperm," Archives of Biochemistry and Biophysics, CXCIX (February 1980), pp. 427-437.

16. Harrison, R. G. and Massaro, T. A., "Influence of Oxygen and Glucose on the Water and Ion Content of Swine Aorta," American Journal of Physiology, CCXXXI (December 1976), pp. 1800-1805.
171
17. Ilyas, M. B.; Ellis, M. A.; and Sinclair, J. B., "Evaluation of Soil Fungicides for Control of Charcoal Rot of Soybeans," Plant Disease Reporter, LIX (April 1975), pp. 360-364.

18. Krapu, Gary and Swanson, George, "Some Nutritional Aspects of Reproduction in Prairie Nesting Pintails," Journal of Wildlife Management, XXXIX (January 1975), pp. 156-162.

19. Lorenz, K. and Dilsaver, W., "Microwave Heating of Food Materials at Various Altitudes," Journal of Food Science, XLI (May-June 1976), pp. 699-702.

20. Mahoney, Arthur W. and Hendricks, Deloy G., "Some Effects of Different Phosphate Compounds on Iron and Calcium Absorption," Journal of Food Science, XLV (September-October 1978), pp. 1473-1476.

21. _______ and Gillett, Tedford, "Effect of Sodium Nitrate on the Bioavailability of Meat Iron for the Anemic Rat," Journal of Nutrition, CIX (December 1979).

22. _______; Farmer, Bonnie R.; and Hendricks, Deloy G., "Effects of Level and Source of Dietary Fat on the Bioavailability of Iron from Turkey Meat for the Anemic Rat," Journal of Nutrition, CX (August 1980), pp. 1703-1708.

23. Mills, David E. and Robertshaw, David, "Response of Plasma Prolactin to Changes in Ambient Temperature and Humidity in Man," Journal of Clinical Endocrinology and Metabolism, LII (February 1981), pp. 279-283.

24. Richards, J. Scott; Hurt, Michael; and Melamed, Laurence, "Spinal Cord Injury: A Sensory Restriction Perspective," Archives of Physical Medicine and Rehabilitation, LXIII (May 1982), pp. 195-199.

25. Rominger, R. S.; Smith, Dale; Petersen, L. A., "Yields and Elemental Composition of Alfalfa Plant Parts at Late Bud Under Two Fertility Levels," Canadian Journal of Plant Science, LV (January 1975), pp. 69-75.
26. _______, "Chemical Composition of Alfalfa as Influenced by High Rates of K Topdressed as KCl and K2SO4," Agronomy Journal, LXVIII (July-August 1976), pp. 573-577.
172
27. Smith, Dale and Rominger, R. S., "Distribution of Elements Among Individual Parts of the Orchard Grass Shoot and Influence of Two Fertility Levels," Canadian Journal of Plant Science, LIV (July 1974), pp. 485-494.

28. Solso, Robert L. and McCarthy, Judith E., "Prototype Formation of Faces: A Case of Pseudo-memory," British Journal of Psychology, LXXII (November 1981), pp. 499-503.

29. Thatcher, R. W.; Lester, M. L.; McAlaster, R.; and Horst, R., "Effects of Low Levels of Cadmium and Lead on Cognitive Functioning in Children," Archives of Environmental Health, XXXVII (May-June 1982), pp. 159-166.

30. Volenec, Jeff; Smith, Dale; Soberalske, R. M.; and Ream, H. W., "Greenhouse Alfalfa Yields With Single and Split Applications of Deproteinized Alfalfa Juice," Agronomy Journal, LXXI (July-August 1979), pp. 695-697.
Articles That Cited Carmer and Swanson But Did Not Specifically State Whether
the LSD was Protected
1. Hagman, Joseph D. and Williams, Evelyn, "Use of Distance and Location in Short Term Motor Memory," Perceptual and Motor Skills, XLIV (June 1977), pp. 867-873.

2. Jensen, Craig, "Generality of Learning Differences in Brain-Weight-Selected Mice," Journal of Comparative and Physiological Psychology, XCI (June 1977).

3. Parker, Robert J.; Hartman, Kathleen D.; and Sieber, Susan M., "Lymphatic Absorption and Tissue Disposition of Liposome-entrapped [14C]Adriamycin Following Intraperitoneal Administration to Rats," Cancer Research, XLI (April 1981), pp. 1311-1317.

4. Spring, David R. and Dale, Philip S., "Discrimination of Linguistic Stress in Early Infancy," Journal of Speech and Hearing Research, XX (June 1977), pp. 224-232.
173
Articles That Used the Bayes Exact Test, A Secondary Recommendation

1. Chamblee, Rick W.; Thompson, Layfayette; and Bunn, Tommie, "Management of Broadleaf Signalgrass (Brachiaria platyphylla) in Peanuts (Arachis hypogaea) with Herbicides," Weed Science, XXX (January 1982), pp. 40-44.

2. _______ and Coble, Harold, "Interference of Broadleaf Signalgrass (Brachiaria platyphylla) in Peanuts (Arachis hypogaea)," Weed Science, XXX (January 1982), pp. 45-49.

3. Johnson, Douglas H., "The Comparison of Usage and Availability Measurements For Evaluating Resource Preference," Ecology, LXI (February 1980), pp. 65-71.

4. Santos, P. F. and Whitford, W. G., "The Effects of Microarthropods on Litter Decomposition in a Chihuahuan Desert Ecosystem," Ecology, LXII (June 1981), pp. 654-663.
Articles That Used Unrecommended Procedures
1. May, Philip R. A.; Tuma, A. H.; and Yale, Coralee, "Schizophrenia: A Follow-Up Study of Results of Treatment," Archives of General Psychiatry, XXXIII (April 1976), pp. 481-486.

2. Nilwik, H. J. M., "Growth Analysis of Sweet Pepper (Capsicum annuum L.): The Influence of Irradiance and Temperature Under Greenhouse Conditions in Winter," Annals of Botany, XLVIII (August 1981), pp. 129-136.

3. Rees, R. G.; Thompson, J. P.; and Mayer, R. J., "Slow Rusting and Tolerance to Rusts in Wheat: The Progress and Effects of Epidemics of Puccinia graminis tritici in Selected Wheat Cultivars," Australian Journal of Agricultural Research, XXX (May 1979), pp. 403-419.
174
Articles on Multiple Comparisons that Clearly Support the Carmer and Swanson Findings

1. Keselman, H. J. and Rogan, Joanne C., "An Evaluation of Some Non-Parametric and Parametric Tests for Multiple Comparisons," British Journal of Mathematical and Statistical Psychology, XXX (May 1977), pp. 125-133.

2. _______; Games, Paul; and Rogan, Joanne C., "Protecting the Overall Rate of Type I Errors for Pairwise Comparisons With an Omnibus Test Statistic," Psychological Bulletin, LXXXVI (July 1979), pp. 884-888.

3. Wike, Edward L. and Church, James D., "Further Comments on Nonparametric Multiple Comparison Tests," Perceptual and Motor Skills, XLV (December 1977), pp. 917-918.
Articles That Made a Passing Reference To the Carmer and Swanson Studies
1. Adwinckle, Herb S.; Polach, F. J.; Molin, W. T.; and Pearson, R. C., "Pathogenicity of Phytophthora cactorum Isolates from New York Apple Trees and Other Sources," Phytopathology, LXV (September 1975), pp. 989-994.

2. Carmer, S. G., "Optimal Significant Levels for Application of the Least Significant Difference in Crop Performance Trials," Crop Science, XVI (January-February 1976), pp. 95-99.

3. Daniel, Wayne W.; Coogler, Carol G., "Statistical Applications in Physical Medicine," American Journal of Physical Medicine, LIV (February 1975).

4. Gill, J. L., "Evolution of Statistical Design and Analysis of Experiments," Journal of Dairy Science, LXIV (June 1981), pp. 1494-1519.

5. Kemp, K. E., "Multiple Comparisons: Comparisonwise and Experimentwise Type I Error Rates and Their Relationship to Power," Journal of Dairy Science, LVIII (September 1975), pp. 1372-1378.
175
6. Keselman, H. J. and Rogan, Joanne C., "The Tukey Multiple Comparison Test: 1953-1976," Psychological Bulletin, LXXXIV (September 1977), pp. 1050-1056.

7. Madden, L. V.; Knoke, J. K.; and Raymond, Louie, "Considerations for the Use of Multiple Comparison Procedures in Phytopathological Investigations," Phytopathology, LXXII (August 1982), pp. 1015-1017.

8. Petersen, R. G., "Use and Misuse of Multiple Comparison Procedures," Agronomy Journal, LXIX (March-April 1977), pp. 205-208.

Articles Openly Critical of the Carmer and Swanson Findings

1. Einot, Israel and Gabriel, K. R., "A Study of Powers of Several Methods of Multiple Comparisons," Journal of the American Statistical Association, LXX (1975), pp. 574-583.

2. Games, Paul, "A Three-Factor Model Encompassing Many Possible Statistical Tests on Independent Groups," Psychological Bulletin, LXXXV (January 1978), pp. 168-182.

3. Ryan, T. A., "Comment on 'Protecting the Overall Rate of Type I Errors for Pairwise Comparisons With an Omnibus Test Statistic,'" Psychological Bulletin, LXXXVIII (September 1980), pp. 354-355.
APPENDIX G
TABLES XI TO XIV SHOWING CRITICAL DIFFERENCES FOR EACH MULTIPLE COMPARISON PROCEDURE FOR
EQUAL N'S FOR k=3 TO k=6
TABLE XI
CRITICAL DIFFERENCES FOR EACH OF THE TESTING PROCEDURES FOR k=3 AND J=1 TO 5
J   r    i*   (F)LSD    MRT       SNK       HSD      SSD
              q(2)      m(r)      q(r)      q(k)

1   2   728   13.768    13.768    13.768
    3                   14.439    16.853
2   2   758   10.673    10.673    10.673
    3                   11.206    12.896
3   2    53    8.908     8.908     8.908
    3                    9.376    10.714
4   2   117    7.331     7.331     7.331
    3                    7.719     8.809
5   2   803    8.572     8.572     8.572
    3                    9.019    10.296

*i = randomly selected iteration from the 1000 repetitions
176
177
TABLE XII
CRITICAL DIFFERENCES FOR EACH OF THE TESTING PROCEDURES FOR k=4 AND J=1 TO 5
J   r    i    (F)LSD    MRT       SNK       HSD       SSD
              q(2)      m(r)      q(r)      q(k)

1   2   165   17.115    17.115    17.115
    3                   17.971    20.823
    4                   18.427    23.105    23.105    25.154
2   2   652    9.845     9.845     9.845
    3                   10.359    11.860
    4                   10.654    13.074    13.074    14.229
3   2   249   10.035    10.035    10.035
    3                   10.565    12.059
    4                   10.912    13.269    13.269    14.440
4   2   683    9.284     9.284     9.284
    3                    9.765    11.151
    4                   10.094    12.252    12.252    13.337
5   2   191    7.295     7.295     7.295
    3                    7.661     8.759
    4                    7.921     9.609     9.609    10.465
178
TABLE XIII
CRITICAL DIFFERENCES FOR EACH OF THE TESTING PROCEDURES FOR k=5 AND J=1 TO 5
J   r    i    (F)LSD    MRT       SNK       HSD       SSD
              q(2)      m(r)      q(r)      q(k)

1   2   382   16.028    16.028    16.028
    3                   16.843    19.141
    4                   17.278    21.515
    5                   17.658    22.982    22.982    26.034
2   2   224   11.319    11.319    11.319
    3                   11.914    13.611
    4                   12.281    14.990
    5                   12.549    15.922    15.922    18.063
3   2   869    9.462     9.462     9.462
    3                    9.956    11.365
    4                   10.291    12.493
    5                   10.500    13.297    13.297    15.028
4   2   295    7.143     7.143     7.143
    3                    7.502     8.576
    4                    7.756     9.410
    5                    7.930    10.019    10.019    11.320
5   2   793    7.860     7.860     7.860
    3                    8.260     9.432
    4                    8.541    10.330
    5                    8.737    11.004    11.004    12.427
179
TABLE XIV
CRITICAL DIFFERENCES FOR EACH OF THE TESTING PROCEDURES FOR k=6 AND J=1 TO 5

J   r    i    (F)LSD    MRT       SNK       HSD       SSD
              q(2)      q'(r)     q(r)      q(k)

1   2   619   18.872    18.872    18.872
    3                   19.841    22.814
    4                   20.358    25.206
    5                   20.811    26.951
    6                   21.199    28.243    28.243    33.081
2   2   298   11.576    11.576    11.576
    3                   12.188    13.912
    4                   12.583    15.311
    5                   12.840    16.159
    6                   13.073    17.048    17.048    19.951
3   2   856    9.391     9.391     9.391
    3                    9.871    11.278
    4                   10.205    12.384
    5                   10.425    13.184
    6                   10.625    13.784    13.784    16.114
4   2   313    7.170     7.170     7.170
    3                    7.532     8.605
    4                    7.788     9.428
    5                    7.967    10.042
    6                    8.121    10.503    10.503    12.262
5   2   326    6.858     6.858     6.858
    3                    7.216     8.216
    4                    7.462     9.003
    5                    7.634     9.538
    6                    7.781    10.016    10.016    11.748
APPENDIX H
TABLES XV TO XVIII SHOWING CRITICAL DIFFERENCES FOR EACH MULTIPLE COMPARISON PROCEDURE FOR
UNEQUAL N'S FOR k=3 TO k=6
TABLE XV
CRITICAL DIFFERENCES FOR EACH OF THE TESTING PROCEDURES FOR k=3 AND J=6 TO 7
J   d*   i    (F)LSD   MRT     SNK     HSD-SS   HSD-TK   SSD
              q(2)     m(r)    q(r)    q(k)     q(k)

6   1   728   10.60    11.16   12.75   13.32    12.75    14.00
    2         10.06    --      --      12.64    12.09    14.00
    3          8.87    --      --      11.14    10.67    11.43
7   1   895    7.22     7.58    8.66    9.03     8.66     8.67
    2          5.71    --      --       7.14     6.85     8.67
    3          5.71    --      --       7.14     6.85     8.67

*d = order of mean difference comparison tested
180
181
TABLE XVI
CRITICAL DIFFERENCES FOR EACH OF THE TESTING PROCEDURES FOR k=4 AND J=6 TO 7
J   d    i    (F)LSD   MRT     SNK     HSD-SS   HSD-TK   SSD
              q(2)     m(r)    q(r)    q(k)     q(k)

6   1   951    8.66     9.42   11.44   12.22    11.44    12.45
    2          7.61     8.01   11.44   10.59    10.05    10.93
    3          9.82     9.41   --      14.97    12.97    14.11
    4         10.32    10.89   --      14.97    13.67    14.88
    5          9.49    --      --      14.97    12.53    13.64
    6          8.28    --      --      12.22    10.94    11.90
7   1   833    7.99     8.69   10.50   10.51    10.50    11.47
    2          6.32    --      --      10.51     8.30     9.07
    3          7.99    --      --      10.51    10.50    11.47
    4          7.99    --      --      10.51    10.50    11.47
    5          6.32    --      --      10.51     8.30     9.07
    6          6.32    --      --      10.51     8.30     9.07
182
TABLE XVII
CRITICAL DIFFERENCES FOR EACH OF THE TESTING PROCEDURES FOR k=5 AND J=6 TO 7
J   d    i    (F)LSD   MRT     SNK     HSD-SS   HSD-TK   SSD
              q(2)     m(r)    q(r)    q(k)     q(k)

6   1   571    9.62    10.68   13.49   16.51    13.49    15.24
    2         10.20    --      --      16.51    14.31    16.16
    3          9.85    --      --      16.51    13.82    15.62
    4         10.75    --      --      16.51    15.08    17.04
    5          8.33    --      --      13.48    11.68    13.20
    6          8.99    --      --      13.48    12.62    14.25
    7          8.60    --      --      13.48    12.06    13.63
    8          7.13    --      --      10.44    10.00    11.30
    9          7.90    --      --      11.67    11.08    12.52
   10          7.60    --      --      11.67    10.66    12.05
7   1    43    7.44     8.29   10.39   10.38    10.39    11.80
    2          5.88    --      10.39   10.38     8.21     9.33
    3          7.44    --      --      10.38    10.39    11.80
    4          7.44    --      --      10.38    10.39    11.80
    5          7.44    --      --      10.38    10.39    11.80
    6          5.88    --      --      10.38     8.21     9.33
    7          7.44    --      --      10.38    10.39    11.80
    8          7.44    --      --      10.38    10.39    11.80
    9          5.88    --      --      10.38     8.21     9.33
   10          5.88    --      --      10.38     8.21     9.33
183
TABLE XVIII
CRITICAL DIFFERENCES FOR EACH OF THE TESTING PROCEDURES FOR k=6 AND J=6 TO 7
J   d    i    (F)LSD   MRT     SNK     HSD-SS   HSD-TK   SSD
              q(2)     m(r)    q(r)    q(k)     q(k)

6   1   793   10.06    11.41   14.72   18.34    14.72    17.21
    2          8.66    --      14.72   14.98    12.67    14.81
    3          7.87    --      --      12.97    11.51    13.46
    4          6.98    --      --      10.59    10.21    11.94
    5          7.35    --      --      11.60    10.75    12.57
    6         10.50    --      --      18.34    15.36    17.96
    7          9.17    --      --      14.98    13.41    15.68
    8          8.42    --      --      12.97    12.32    14.40
    9          7.60    --      --      11.60    11.12    13.00
   10         10.25    --      --      18.34    14.99    17.53
   11          8.87    --      --      14.98    12.98    15.18
   12          8.10    --      --      12.97    11.85    13.86
   13         10.87    --      --      18.34    15.90    18.59
   14          9.59    --      --      14.98    14.02    16.40
   15         11.46    --      --      18.34    16.76    19.60
7   1   601    8.39     9.54   12.21   12.21    12.21    14.42
    2          6.63     9.54   --      12.21     9.66    11.40
    3          8.39    --      --      12.21    12.21    14.42
    4          8.89    --      --      12.21    12.21    14.42
    5          8.89    --      --      12.21    12.21    14.42
    6          8.89    --      --      12.21    12.21    14.42
    7          6.63    --      --      12.21     9.66    11.40
    8          8.39    --      --      12.21    12.21    14.42
    9          8.39    --      --      12.21    12.21    14.42
   10          8.39    --      --      12.21    12.21    14.42
   11          6.63    --      --      12.21     9.66    11.40
   12          8.39    --      --      12.21    12.21    14.42
   13          8.39    --      --      12.21    12.21    14.42
   14          6.63    --      --      12.21     9.66    11.40
   15          6.63    --      --      12.21     9.66    11.40
APPENDIX I
This appendix contains summaries of the z-tests performed between the HSD (and its modifications) and the FLSD multiple comparison procedures. Table XXV shows the results of the z-tests when the Bernhardson formulas are applied to both the HSD and LSD procedures. Table XXVI shows the results of the z-tests when the unprotected HSD is compared with the FLSD. In every case, significant z scores were produced by the conservatism of the HSD rather than the liberalism of the FLSD.
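The z statistic in Tables XXV and XXVI compares two experimentwise error proportions. Judging from the printed standard errors, the denominator used is sqrt(p(1-p)/N), where N = 1000 and p is the mean of the two proportions being compared. The sketch below (Python; not the dissertation's original program, and the pooling assumption is inferred from the printed values) reproduces two rows of Table XXV:

```python
import math

def z_two_proportions(count1, count2, n=1000):
    """z for the difference of two error proportions out of n experiments,
    using the pooled-proportion standard error that matches Tables XXV-XXVI."""
    p1, p2 = count1 / n, count2 / n
    p_bar = (p1 + p2) / 2
    se = math.sqrt(p_bar * (1 - p_bar) / n)
    return (p1 - p2) / se, se

# k=3, J=6 (Spjotvoll-Stoline row): FLSD count 53 vs. protected HSD count 34
z, se = z_two_proportions(53, 34)
assert round(se, 4) == 0.0065 and round(z, 4) == 2.9456   # significant (> 1.96)

# k=3, J=1: FLSD 50 vs. HSD 45 -- not significant
z, se = z_two_proportions(50, 45)
assert round(z, 4) == 0.7433
```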
184
185
TABLE XXV
Z-TESTS FOR SIGNIFICANT DIFFERENCE OF PROPORTIONS BETWEEN EXPERIMENTWISE TYPE I ERROR RATES FOR THE HSD AND
FLSD MULTIPLE COMPARISON PROCEDURES
             FLSD           HSD         Standard error      z
k   J     CNT     %      CNT     %      of Difference     score     SIG?

3   1      50   .050      45   .045       0.0067          0.7433     --
3   2      52   .052      47   .047       0.0069          0.7289     --
3   3      50   .050      46   .046       0.0068          0.5917     --
3   4      50   .050      48   .048       0.0068          0.2930     --
3   5      54   .054      51   .051       0.0071          0.4254     --
3   6*     53   .053      34   .034       0.0065          2.9456     Yes
3   6**    53   .053      51   .051       0.0070          0.2849     --
3   7*     53   .053      21   .021       0.0060          5.3609     Yes
3   7**    53   .053      48   .048       0.0069          0.7221     --
4   1      51   .051      40   .040       0.0066          1.6692     --
4   2      54   .054      46   .046       0.0069          1.1608     --
4   3      53   .053      48   .048       0.0069          0.7221     --
4   4      45   .045      36   .036       0.0062          1.4438     --
4   5      54   .054      45   .045       0.0069          1.3121     --
4   6*     46   .046      24   .024       0.0058          3.7855     Yes
4   6**    46   .046      40   .040       0.0064          0.9353     --
4   7*     45   .045      23   .023       0.0057          3.8388     Yes
4   7**    45   .045      38   .038       0.0063          1.1099     --
5   1      49   .049      39   .039       0.0065          1.5419     --
5   2      45   .045      37   .037       0.0063          1.2758     --
5   3      49   .049      36   .036       0.0064          2.0379     Yes
5   4      47   .047      35   .035       0.0063          1.9137     --
5   5      53   .053      38   .038       0.0066          2.2761     Yes
5   6*     46   .046      17   .017       0.0055          5.2504     Yes
5   6**    46   .046      38   .038       0.0063          1.2612     --
5   7*     49   .049      29   .029       0.0061          3.2669     Yes
5   7**    49   .049      39   .039       0.0065          1.5419     --
6   1      53   .053      39   .039       0.0066          2.1134     Yes
6   2      49   .049      40   .040       0.0065          1.3802     --
6   3      53   .053      39   .039       0.0066          2.1134     Yes
6   4      51   .051      38   .038       0.0065          1.9936     Yes
6   5      54   .054      34   .034       0.0065          3.0837     Yes
6   6*     46   .046      18   .018       0.0056          5.0309     Yes
6   6**    46   .046      34   .034       0.0062          1.9365     --
6   7*     48   .048      25   .025       0.0059          3.8784     Yes
6   7**    48   .048      33   .033       0.0062          2.4063     Yes

*Spjøtvoll-Stoline modification     **Tukey-Kramer modification
186
TABLE XXVI
Z-TESTS FOR SIGNIFICANT DIFFERENCE OF PROPORTIONS BETWEEN EXPERIMENTWISE TYPE I ERROR RATES FOR THE UNPROTECTED
HSD AND FLSD MULTIPLE COMPARISON PROCEDURES
             FLSD           HSD         Standard error      z
k   J     CNT     %      CNT     %      of Difference     score     SIG?

3   1      50   .050      48   .048       0.0068          0.2930     --
3   2      52   .052      51   .051       0.0070          0.1431     --
3   3      50   .050      49   .049       0.0069          0.1458     --
3   4      50   .050      54   .054       0.0070         -0.5697     --
3   5      54   .054      54   .054       0.0071          0.0000     --
3   6*     53   .053      34   .034       0.0065          2.9456     Yes
3   6**    53   .053      54   .054       0.0071         -0.1405     --
3   7*     53   .053      26   .026       0.0062          4.3835     Yes
3   7**    53   .053      57   .057       0.0072         -0.5548     --
4   1      51   .051      46   .046       0.0068          0.7360     --
4   2      54   .054      50   .050       0.0070          0.5697     --
4   3      53   .053      55   .055       0.0071         -0.2798     --
4   4      45   .045      49   .049       0.0067         -0.5977     --
4   5      54   .054      52   .052       0.0071          0.2823     --
4   6*     46   .046      24   .024       0.0058          3.7855     Yes
4   6**    46   .046      53   .053       0.0069         -1.0205     --
4   7*     45   .045      28   .028       0.0059          2.8667     Yes
4   7**    45   .045      47   .047       0.0066         -0.3019     --
5   1      49   .049      47   .047       0.0068          0.2959     --
5   2      45   .045      45   .045       0.0066          0.0000     --
5   3      49   .049      48   .048       0.0068          0.1472     --
5   4      47   .047      46   .046       0.0067          0.1502     --
5   5      53   .053      46   .046       0.0069          1.0205     --
5   6*     46   .046      19   .019       0.0056          4.8150     Yes
5   6**    46   .046      47   .047       0.0067         -0.1502     --
5   7*     49   .049      41   .041       0.0066          1.2203     --
5   7**    49   .049      56   .056       0.0071         -0.9925     --
6   1      53   .053      50   .050       0.0070          0.4292     --
6   2      49   .049      49   .049       0.0068          0.0000     --
6   3      53   .053      55   .055       0.0071         -0.2798     --
6   4      51   .051      48   .048       0.0069          0.4374     --
6   5      54   .054      48   .048       0.0070          0.8624     --
6   6*     46   .046      19   .019       0.0056          4.8150     Yes
6   6**    46   .046      46   .046       0.0066          0.0000     --
6   7*     48   .048      36   .036       0.0063          1.8918     --
6   7**    48   .048      51   .051       0.0069         -0.4374     --

*Spjøtvoll-Stoline modification     **Tukey-Kramer modification
APPENDIX J
GRAPHIC DISPLAYS OF EXPERIMENTWISE TYPE I ERROR RATES IN RELATION TO A 0.95 CONFIDENCE INTERVAL FOR a=0.05 AND N=1000
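The 0.95 interval used in these figures follows from the normal approximation to the binomial: with a true rate of a=0.05 over N=1000 experiments, the expected error count is 50, with limits 50 ± 1.96 x sqrt(0.05 x 0.95 / 1000) x 1000, or roughly 36 to 64 errors. A minimal sketch of that computation (an assumed reconstruction, not from the dissertation):

```python
import math

# 0.95 confidence interval for the number of Type I errors in N = 1000
# experiments when the true rate is alpha = 0.05 (normal approximation).
alpha, N, z = 0.05, 1000, 1.96
half_width = z * math.sqrt(alpha * (1 - alpha) / N)
lower, upper = (alpha - half_width) * N, (alpha + half_width) * N
assert round(lower) == 36 and round(upper) == 64   # the band drawn in Figs. 7-10
```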
J:    1   2   3   4   5   6   7          1   2   3   4   5   6   7
                 k = 3                              k = 4

[Plot not reproducible from the source scan. Vertical axis: number of experimentwise Type I errors, 15 to 64; the double rules mark the upper and lower limits of the 0.95 confidence interval about 50 errors. Each procedure's count is plotted by its legend letter.]

(F)LSD   (M)RT   S(N)K   (H)SD   HSD-(T)K   (H*)SD-SS   (S)SD

Fig. 7--Graphic presentation of experimentwise Type I error rates in relation to 0.95 confidence interval generated by application of Bernhardson formulas for k=3 and k=4.
187
188
J:    1   2   3   4   5   6   7          1   2   3   4   5   6   7
                 k = 5                              k = 6

[Plot not reproducible from the source scan. Vertical axis: number of experimentwise Type I errors, 12 to 64; the double rules mark the upper and lower limits of the 0.95 confidence interval about 50 errors. Each procedure's count is plotted by its legend letter.]

(F)LSD   (M)RT   S(N)K   (H)SD   HSD-(T)K   (H*)SD-SS   (S)SD

Fig. 8--Graphic presentation of experimentwise Type I error rates in relation to 0.95 confidence interval generated by application of Bernhardson formulas for k=5 and k=6.
189
J:    1   2   3   4   5   6   7          1   2   3   4   5   6   7
                 k = 3                              k = 4

[Plot not reproducible from the source scan. Vertical axis: number of experimentwise Type I errors, 14 to 64, with the 0.95 confidence band about 50 errors. LSD and MRT counts exceed the upper limit and are printed numerically above the plot (e.g., L128 = LSD count of 128, M104 = MRT count of 104).]

(L)SD   (M)RT   S(N)K   (H)SD   HSD-(T)K   (H*)SD-SS   (S)SD

Fig. 9--Graphic presentation of experimentwise Type I error rates in relation to 0.95 confidence interval generated without prior significant F-ratio for k=3 and k=4.
190
J:    1   2   3   4   5   6   7          1   2   3   4   5   6   7
                 k = 5                              k = 6

[Plot not reproducible from the source scan. Vertical axis: number of experimentwise Type I errors, 12 to 64, with the 0.95 confidence band about 50 errors. LSD and MRT counts exceed the upper limit and are printed numerically above the plot (e.g., L278 = LSD count of 278, M193 = MRT count of 193).]

(L)SD   (M)RT   S(N)K   (H)SD   HSD-(T)K   (H*)SD-SS   (S)SD

Fig. 10--Graphic presentation of experimentwise Type I error rates in relation to 0.95 confidence interval generated without prior significant F-ratio for k=5 and k=6.
BIBLIOGRAPHY
Books
Cohen, Jacob, Statistical Power Analysis for the Behavioral Sciences, Revised edition, New York, Academic Press, 1977.
Couch, James V., Fundamentals of Statistics for the Behavioral Sciences, New York, St. Martin's Press, 1982.

Federer, Walter T., Experimental Design: Theory and Application, New York, The Macmillan Company, 1955.
Ferguson, George A., Statistical Analysis in Psychology and Education, 5th ed., New York, McGraw Hill Book Publishers, 1981.
Fisher, R. A., Statistical Methods for Research Workers, 6th ed., Edinburgh (London), Oliver and Boyd, 1936.

_______, Design of Experiments, 2nd ed., Edinburgh, Oliver and Boyd, 1937.
Fryer, H. C., Concepts and Methods of Experimental Statistics. Boston, Allyn and Bacon, 1966.
Glass, Gene V. and Hopkins, Kenneth D., Statistical Methods in Education and Psychology. 2nd ed., Englewood Cliffs, New Jersey, Prentice-Hall, Inc., 1984.
Hinkle, Dennis E.; Wiersma, William and Jurs, Stephen G., Basic Behavioral Statistics, Boston, Houghton Mifflin Company, 1982.
Howell, David C., Statistical Methods for Psychology, Boston, Duxbury Press, 1982.
Johnson, Palmer O. and Jackson, Robert W. B., Modern Statistical Methods: Descriptive and Inductive, Chicago, Rand McNally & Company, 1959.
Kirk, Roger E., Experimental Design: Procedures for the Behavioral Sciences, 2nd ed., Belmont, California, Brooks/Cole Publishing Company, 1982.
191
192
Light, Richard J. and Pillemer, David B., Summing Up: The Science of Reviewing Research, Cambridge, Harvard University Press, 1984.

Miller, R. G., Simultaneous Statistical Inference, New York, McGraw-Hill, 1966.

Pedhazur, Elazar J., Multiple Regression in Behavioral Research: Explanation and Prediction, 2nd ed., New York, Holt, Rinehart and Winston, Inc., 1982.

Roscoe, John T., Fundamental Research Statistics for the Behavioral Sciences, 2nd ed., New York, Holt, Rinehart and Winston, Inc., 1975.
Winer, B.J., Statistical Principles in Experimental Design. New York, McGraw-Hill Book Company, 1962.
Articles
Adwinckle, Herb S.; Polach, F. J.; and Molin, W. T., "Pathogenicity of Phytophthora cactorum Isolates from New York Apple Trees and Other Sources," Phytopathology, LXV (September 1975), pp. 989-994.

Aitkin, M. A., "Multiple Comparisons in Psychological Experiments," The British Journal of Mathematical and Statistical Psychology, XXII (November 1969).

Anderson, D. A., "Overall Confidence Levels of the Least Significant Difference Procedure," The American Statistician, Vol. XXVI (1972).

Atchley, W. R.; Rutledge, J. J.; Cowley, D. E., "A Multivariate Statistical Analysis of Direct and Correlated Response to Selection in the Rat," Evolution, XXXVI (July 1982), pp. 677-698.

Balaam, L. N., "Multiple Comparisons: A Sampling Experiment," Australian Journal of Statistics, Vol. V (1963).

Bernhardson, Clemens S., "375: Type I Error Rates When Multiple Comparison Procedures Follow a Significant F Test of ANOVA," Biometrics, XXXI (March 1975), pp. 229-232.

Boardman, Thomas J. and Moffitt, D. R., "Graphical Monte Carlo Type I Error Rates for Multiple Comparison Procedures," Biometrics, XXVII (September 1971), pp. 738-744.
193
Bryant, Edwin H., "Morphometric Adaptation of the Housefly, MUSCA DOMESTICA L., in the United States," Evolution, XXXI (September 1977), pp. 580-596.

_______ and Turner, Carl R., "Comparative Morphometric Adaptation in the Housefly and Facefly in the United States," Evolution, XXXII (December 1978), pp. 759-770.

Cameron, Guy N. and Kincaid, W. Bradley, "Species Removal Effects on Movements of Sigmodon hispides [cotton rats] and Reithrodontomys fulvescens [harvest mice]," American Midland Naturalist, CVIII (July 1982), pp. 60-67.

Cardon, Kathleen; Anthony, Rita Jo; Hendricks, Deloy G.; and Mahoney, Arthur W., "Effect of Atmospheric Oxidation on Bioavailability of Meat Iron and Liver Weights in Rats," Journal of Nutrition, CX (March 1980), pp. 567-574.
Carmer, S. G., "Optimal Significant Levels for Application of the Least Significant Difference in Crop Performance Trials," Crop Science, XVI (January-February 1976), pp. 95-99.

_______ and Swanson, M. R., "Detection of Differences Between Means: A Monte Carlo Study of Five Pairwise Multiple Comparison Procedures," Agronomy Journal, LXIII (1971), pp. 940-945.

_______, "An Evaluation of Ten Pairwise Multiple Comparison Procedures by Monte Carlo Methods," Journal of the American Statistical Association, LXVIII (1973), pp. 66-74.

_______ and Walker, W. M., "Baby Bear's Dilemma: A Statistical Tale," Agronomy Journal, LXXIV (January-February 1982), pp. 122-124.

Chamblee, Rick W.; Thompson, Layfayette; and Bunn, Tommie, "Management of Broadleaf Signalgrass (Brachiaria platyphylla) in Peanuts (Arachis hypogaea) with Herbicides," Weed Science, XXX (January 1982), pp. 40-44.

_______ and Coble, Harold, "Interference of Broadleaf Signalgrass (Brachiaria platyphylla) in Peanuts (Arachis hypogaea)," Weed Science, XXX (January 1982), pp. 45-49.
194
Daniel, Wayne W.; Coogler, Carol G., "Statistical Applica-tions in Physical Medicine," American Journal of Physical Medicine, LIV (February 1975).
Dhingra, 0. D. and Sinclair, J. B., "Survival of Macrophomina phaseolina Sclerotia in Soil: Effects of Soil Mois-ture, Carbon: Nitrogen Ratios, Carbon Sources, and Nitrogen Concentrations," Phytopathology. LXV (Mnrnh 1975), pp. 236-240.
Duncan, D. B. and Brant, L. J., "Adaptive t Tests for Mul-tiple Comparisons," Biometrics. XXXIX, pp. 790-794..
Dunnett, C. W., "Answer to Query 272: Multiple Comparison Tests," Biometrics. XXVI (September 1969), pp. 139-14-0 •
Einot, Israel and Gabriel, K. R., "A Study of Powers of Several Methods of Multiple Comparisons," Journal of the American Statistical Association. LXX (1975). pp. 574-583. ~ '
Fajemisin, J. M. and Hooker, A. L., "Predisposition to Diplodia Stalk Rot in Corn Affected by Three Helmin-thosporium Leaf Blights," Phytopathology. LXTV (December 1974), pp. U96-1499.
________, "Top Weight, Root Weight, and Root Rot of Corn Seedlings as Influenced by Three Helminthosporium Leaf Blights," Plant Disease Reporter, LVIII (April 1974), pp. 313-317.
Farmer, Bonnie R.; Mahoney, Arthur W.; Hendricks, Deloy G.; and Gillett, Tedford, "Iron Bioavailability of Hand-Deboned and Mechanically Deboned Beef," Journal of Food Science, XLII (November-December 1977), pp. 1630-1632.
Friedrich, J. W.; Smith, Dale; and Schrader, L. E., "Herbage Yield and Chemical Composition of Switchgrass as Affected by N, S, and K Fertilizations," Agronomy Journal, LXIX (January-February 1977), pp. 30-33.
Fritzell, Erik K., "Habitat Use by Prairie Raccoons During the Waterfowl Breeding Season," Journal of Wildlife Management, XLII (January 1978), pp. 118-127.
Gabriel, Ruben K., "Comment," Journal of the American Statistical Association, LXXIII (September 1978), pp. 485-487.
Games, Paul, "A Three-Factor Model Encompassing Many Possible Statistical Tests on Independent Groups," Psychological Bulletin, LXXXV (January 1978), pp. 168-182.
, "Inverse Relation Between the Risks of Type I and Type II Errors and Suggestions for the Unequal n Case in Multiple Comparisons," Psychological Bulletin, LXXV (1971), pp. 97-102.
________; Keselman, H. J.; and Clinch, Jennifer J., "Multiple Comparisons for Variance Heterogeneity," British Journal of Mathematical and Statistical Psychology, XXXII (1979), pp. 133-142.
Garcia-de-Siles, J. L.; Ziegler, J. H.; and Wilson, L. L., "Effects of Marbling and Conformation Scores on Quality and Quantity Characteristics of Steer and Heifer Carcasses," Journal of Animal Science, XLIV (January 1977), pp. 36-46.
"Prediction of Beef Quality by Three Grading ' Systems," Journal of Food Science. XLII fMav-.Tunp
1977), pp. 711-715. *
"Growth, Carcass, and Muscle Characters of Hereford ^?lr
Holstein Steers," Journal of Animal Science, XLIV (June 1977), pp. 973-984..
Gill, J. L., "Current Status of Multiple Comparisons of Means in Designed Experiments," Journal of Dairy Science, LVI (1973).
________, "Evolution of Statistical Design and Analysis of Experiments," Journal of Dairy Science, LXIV (June 1981), pp. 1494-1519.
Hagman, Joseph D. and Williams, Evelyn, "Use of Distance and Location Information in Short-Term Motor Memory," Perceptual and Motor Skills, XLIV (June 1977), pp. 867-873.
Hammerstedt, Roy H. and Hay, Sandra R., "Effect of Incubation Temperature on Motility and cAMP Content of Bovine Sperm," Archives of Biochemistry and Biophysics, CXCIX (February 1980), pp. 427-437.
Harrison, R. G. and Massaro, T. A., "Influence of Oxygen and Glucose on the Water and Ion Content of Swine Aorta," American Journal of Physiology, CCXXXI (December 1976), pp. 1800-1806.
Harter, H. Leon, "Error Rates and Sample Sizes for Range Tests in Multiple Comparisons," Biometrics, XIII (1957), pp. 511-536.
Howell, John F. and Games, Paul A., "The Effects of Variance Heterogeneity on Simultaneous Multiple Comparison Procedures with Equal Sample Size," British Journal of Mathematical and Statistical Psychology, XXVII (1974), pp. 72-81.
Ilyas, M. B.; Ellis, M. A.; and Sinclair, J. B., "Evaluation of Soil Fungicides for Control of Charcoal Rot of Soybeans," Plant Disease Reporter, LIX (April 1975), pp. 360-364.
Jensen, Craig, "Generality of Learning Differences in Brain-Weight-Selected Mice," Journal of Comparative and Physiological Psychology, XCI (June 1977), pp. 629-641.
Johnson, Douglas H., "The Comparison of Usage and Availability Measurements for Evaluating Resource Preference," Ecology, LXI (February 1980), pp. 65-71.
Johnson, Steven B. and Berger, R. D., "On the Status of Statistics in Phytopathology," Phytopathology, LXXII (1982), pp. 1014-1015.
Kemp, K. E., "Multiple Comparisons: Comparisonwise and Experimentwise Type I Error Rates and Their Relationship to Power," Journal of Dairy Science, LVIII (September 1975), pp. 1372-1378.
Keselman, H. J., "A Power Investigation of the Tukey Multiple Comparison Statistic," Educational and Psychological Measurement, XXXVI (1976), pp. 97-104.
________; Games, Paul; and Rogan, Joanne C., "Protecting the Overall Rate of Type I Errors for Pairwise Comparisons With an Omnibus Test Statistic," Psychological Bulletin, LXXXVI (July 1979), pp. 884-888.
________ and Murray, Robert, "Tukey Tests for Pairwise Contrasts Following the Analysis of Variance: Is There a Type IV Error?," Psychological Bulletin, LXXXI (1974), p. 609.
________ and Rogan, Joanne C., "Effect of Very Unequal Group Sizes on Tukey's Multiple Comparison Test," Educational and Psychological Measurement, XXXVI (Summer 1976), pp. 263-270.
________ and Rogan, Joanne C., "An Evaluation of Some Non-Parametric and Parametric Tests for Multiple Comparisons," British Journal of Mathematical and Statistical Psychology, XXX (May 1977), pp. 125-133.
"The Tukey Multiple Test: 1953-1976," Psychological Bulletin,
LXXXIV (September 1977), pp. 1050-1056.
"A Comparison of the —TT 7 vvmjJUl XQUU U J_ Modifled-Tukey and Scheffe Methods of Multiple Comparisons for Pairwise Contrasts," Journal of the 1978)Can S t ^ i ^ i c a l Association. VXXIII (March
________ and Toothaker, Larry E., "Comparison of Tukey's T-Method and Scheffe's S-Method for Various Numbers of All Possible Differences of Averages Contrasts Under Violation of Assumptions," Educational and Psychological Measurement, XXXIV (1974).
________; Toothaker, Larry E.; and Shooter, M., "An Evaluation of Two Unequal n Forms of the Tukey Multiple Comparison Statistic," Journal of the American Statistical Association, LXX (September 1975), pp. 584-587.
Krapu, Gary and Swanson, George, "Some Nutritional Aspects of Reproduction in Prairie Nesting Pintails," Journal of Wildlife Management, XXXIX (January 1975), pp. 156-162.
Levin, J. R. and Marascuilo, L. A., "Type IV Errors and Interactions," Psychological Bulletin, LXXVIII (1972), pp. 368-374.
Lorenz, K. and Dilsaver, W., "Microwave Heating of Food Materials at Various Altitudes," Journal of Food Science, XLI (May-June 1976), pp. 699-702.
Madden, L. V.; Knoke, J. K.; and Louie, Raymond, "Considerations for the Use of Multiple Comparison Procedures in Phytopathological Investigations," Phytopathology, LXXII (August 1982), pp. 1015-1017.
Mahoney, Arthur W. and Hendricks, Deloy G., "Some Effects of Different Phosphate Compounds on Iron and Calcium Absorption," Journal of Food Science, XLV (September-October 1978), pp. 1473-1476.
________ and Gillett, Tedford, "Effect of Sodium Nitrate on the Bioavailability of Meat Iron for the Anemic Rat," Journal of Nutrition, CIX (December 1979).
________; Farmer, Bonnie R.; and Hendricks, Deloy G., "Effects of Level and Source of Dietary Fat on the Bioavailability of Iron from Turkey Meat for the Anemic Rat," Journal of Nutrition, CX (August 1980), pp. 1703-1708.
May, Philip R. A.; Tuma, A. H.; and Yale, Coralee, "Schizophrenia: A Follow-Up Study of Results of Treatment," Archives of General Psychiatry, XXXIII (April 1976), pp. 481-485.
Mills, David E. and Robertshaw, David, "Response of Plasma Prolactin to Changes in Ambient Temperature and Humidity in Man," Journal of Clinical Endocrinology and Metabolism, LII (February 1981), pp. 279-283.
Nilwik, H. J. M., "Growth Analysis of Sweet Pepper (Capsicum annuum L.): The Influence of Irradiance and Temperature Under Greenhouse Conditions in Winter," Annals of Botany, XLVIII (August 1981), pp. 129-136.
O'Neill, R. and Wetherill, G. B., "The Present State of Multiple Comparison Methods," Journal of the Royal Statistical Society (Series B), XXXIII (1971).
Parker, Robert J.; Hartman, Kathleen D.; and Sieber, Susan M., "Lymphatic Absorption and Tissue Disposition of Liposome-Entrapped [C]Adriamycin Following Intraperitoneal Administration to Rats," Cancer Research, XLI (April 1981), pp. 1311-1317.
Petersen, R. G., "Use and Misuse of Multiple Comparison Procedures," Agronomy Journal, LXIX (March-April 1977), pp. 205-208.
Petrinovich, Lewis F. and Hardyck, Curtis D., "Error Rates for Multiple Comparison Methods: Some Evidence Concerning the Frequency of Erroneous Conclusions," Psychological Bulletin, LXXI (1969), pp. 43-54.
Ramsey, Philip H., "Power Differences Between Pairwise Multiple Comparisons," Journal of the American Statistical Association, LXXIII (1978), p. 479.
Rees, R. G.; Thompson, J. P.; and Mayer, R. J., "Slow Rusting and Tolerance to Rusts in Wheat: The Progress and Effects of Epidemics of Puccinia graminis tritici in Selected Wheat Cultivars," Australian Journal of Agricultural Research, XXX (May 1979), pp. 403-419.
Richards, J. Scott; Hurt, Michael; and Melamed, Laurence, "Spinal Cord Injury: A Sensory Restriction Perspective," Archives of Physical Medicine and Rehabilitation, LXIII (May 1982), pp. 195-199.
Rominger, R. S.; Smith, Dale; and Petersen, L. A., "Yields and Elemental Composition of Alfalfa Plant Parts at Late Bud Under Two Fertility Levels," Canadian Journal of Plant Science, LV (January 1975), pp. 69-75.
"Yield and Chemical Composition of Alfalfa as Influenced by High Rates of K Topdressed as KC1 and K?S0,," Agronomy Journal, LXVIII (July-August, T975), pp, 573-577.
Ryan, T. A., "Comment on 'Protecting the Overall Rate of Type I Errors for Pairwise Comparisons With an Omnibus Test Statistic,'" Psychological Bulletin, LXXXVIII (September 1980), pp. 354-355.
Santos, P. F. and Whitford, W. G., "The Effects of Microarthropods on Litter Decomposition in a Chihuahuan Desert Ecosystem," Ecology, LXII (June 1981), pp. 654-663.
Smith, Dale and Rominger, R. S., "Distribution of Elements Among Individual Parts of the Orchard Grass Shoot and Influence of Two Fertility Levels," Canadian Journal of Plant Science, LIV (July 1974), pp. 485-494.
Solso, Robert L. and McCarthy, Judith E., "Prototype Formation of Faces: A Case of Pseudo-memory," British Journal of Psychology, LXXII (November 1981), pp. 499-503.
Spring, David R. and Dale, Philip S., "Discrimination of Linguistic Stress in Early Infancy," Journal of Speech and Hearing Research, XX (June 1977), pp. 224-232.
Steel, R. G. D., "Query 163: Error Rates in Multiple Comparisons," Biometrics (1961), pp. 326-328.
Thatcher, R. W.; Lester, M. L.; McAlaster, R.; and Horst, R., "Effects of Low Levels of Cadmium and Lead on Cognitive Functioning in Children," Archives of Environmental Health, XXXVII (May-June 1982), pp. 159-166.
Volenec, Jeff; Smith, Dale; Soberalske, R. M.; and Ream, H. W., "Greenhouse Alfalfa Yields With Single and Split Applications of Deproteinized Alfalfa Juice," Agronomy Journal, LXXI (July-August 1979), pp. 695-697.
Waller, Ray A. and Duncan, David B., "A Bayes Rule for the Symmetric Multiple Comparisons Problem," Journal of the American Statistical Association, LXIV (December 1969), p. 1485.
Welsch, Roy E., "Stepwise Multiple Comparison Procedures," Journal of the American Statistical Association, LXXII (1977), pp. 566-575.
Wike, Edward L. and Church, James D., "Further Comments on Nonparametric Multiple Comparison Tests," Perceptual and Motor Skills, XLV (December 1977), pp. 917-918.
Willson, V. L., "Research Techniques in AERJ Articles: 1969-1978," Educational Researcher, IX (1980), pp. 5-10.
Reports and Manuals
Barcikowski, Robert S., "Statistical Power With Group Mean As the Unit of Analysis," ED 191 910, National Institute of Education Grant (Ohio State University, 1980).
Carmer, S. G. and Walker, W. M., "Pairwise Multiple Comparison Procedures for Treatment Means," Technical Report Number 12, University of Illinois, Department of Agronomy, Urbana, Illinois (December 1983), pp. 1-33.
Wilkinson, Leland, SYSTAT: The System for Statistics, SYSTAT, Inc., Evanston, Ill., 1984.
Unpublished Materials
Carmer, S. G., Professor of Biometry, University of Illinois, Urbana, Illinois, Personal letter received January 1985.
Kirk, Roger E., Professor of Psychology, Baylor University, Waco, Texas, Personal letter received January 22, 1985.
Myette, Beverly M. and White, Karl R., "Selecting An Appropriate Multiple Comparison Technique: An Integration of Monte Carlo Studies," Paper presented before the Annual Meeting of the American Educational Research Association, March 19-23, 1982.
Thomas, D. A. H., "Error Rates in Multiple Comparisons Among Means: Results of a Simulation Exercise," Unpublished Master's Thesis, University of Kent, Canterbury, England.
Waller, R. A., "On the Bayes Rule for the Symmetric Multiple Comparison Problem," Unpublished Notes, Kansas State University, Manhattan, Kansas 66506.