post hoc power
TRANSCRIPT
8/2/2019 Post Hoc Power
Post Hoc Power: A Concept Whose Time Has Come
Anthony J. Onwuegbuzie
Department of Educational Measurement and Research, University of South Florida
Nancy L. Leech
Department of Educational Psychology
University of Colorado at Denver and Health Sciences
This article advocates the use of post hoc power analyses. First, reasons for the nonuse of a priori power analyses are presented. Next, post hoc power is defined and its utility delineated. Third, a step-by-step guide is provided for conducting post hoc power analyses. Fourth, a heuristic example is provided to illustrate how post hoc power can help to rule in/out rival explanations in the presence of statistically nonsignificant findings. Finally, several methods are outlined that describe how post hoc power analyses can be used to improve the design of independent replications.

power, a priori power, post hoc power, significance testing, effect size
For more than 75 years, null hypothesis significance testing (NHST) has dominated the quantitative paradigm, stemming from the seminal works of Fisher (1925/1941)
and Neyman and Pearson (1928). NHST was designed to provide a means of ruling
out a chance finding, thereby reducing the chance of falsely rejecting the null hy-
pothesis in favor of the alternative hypothesis (i.e., committing a Type I error).
Although NHST has permeated the behavioral and social science field since its
inception, its practice has been subjected to severe criticism with the number of
critics growing throughout the years. The most consistent criticism of NHST that
has emerged is that statistical significance is not synonymous with practical importance. More specifically, statistical significance does not provide any information about how important or meaningful an observed finding is (e.g., Bakan, 1966; Cahan, 2000; Carver, 1978, 1993; Cohen, 1994, 1997; Guttman, 1985; Loftus, 1996; Meehl, 1967, 1978; Nix & Barnette, 1998; Onwuegbuzie & Daniel, 2003; Rozeboom, 1960; Schmidt, 1992, 1996; Schmidt & Hunter, 1997).

UNDERSTANDING STATISTICS, 3(4), 201–230. Copyright 2004, Lawrence Erlbaum Associates, Inc.
As a result of this limitation of NHST, some researchers (e.g., Carver, 1993)
contend that effect sizes, which represent measures of practical importance,
should replace statistical significance testing completely. However, reporting and
interpreting only effect sizes could lead to the overinterpretation of a finding
(Onwuegbuzie & Levin, 2003; Onwuegbuzie, Levin, & Leech, 2003). As noted by
Robinson and Levin (1997):

Although effect sizes speak loads about the magnitude of a difference or relationship, they are, in and of themselves, silent with respect to the probability that the estimated difference or relationship is due to chance (sampling error). Permitting authors to promote and publish seemingly interesting or unusual outcomes when it can be documented that such outcomes are not really that unusual would open the publication floodgates to chance occurrences and other strange phenomena. (p. 25)
Moreover, interpreting effect sizes in the absence of an NHST could adversely affect conclusion validity, just as interpreting p values without interpreting effect sizes could lead to conclusion invalidity (Onwuegbuzie, 2001). Table 1 illustrates the four possible outcomes and their conclusion validity when combining p values and
effect sizes. This table echoes the decision table that demonstrates the relationship
between Type I error and Type II error. It can be seen from Table 1 that conclusion
validity occurs when (a) both statistical significance (e.g., p < .05) and a large effect size prevail or (b) both statistical nonsignificance (e.g., p > .05) and a small effect size occur. Conversely, conclusion invalidity exists if (a) statistical nonsignificance (e.g., p > .05) is combined with a large effect size (i.e., Type A error) or (b) statistical significance (e.g., p < .05) is combined with a small effect size (i.e., Type B error). Both Type A and Type B errors suggest that any declaration on the part of the researcher that the observed finding is meaningful would be misleading and hence result in conclusion invalidity (Onwuegbuzie, 2001). In this
respect, conclusion validity is similar to what Levin and Robinson (2000) termed "conclusion coherence." Levin and Robinson (2000) defined conclusion coherence as "consistency between the hypothesis-testing and estimation phases of the decision-oriented empirical study" (p. 35).
Therefore, as can be seen in Table 1, always interpreting effect sizes regardless of whether statistical significance is observed may lead to a Type A or Type B error being committed. Similarly, Type A and Type B errors also prevail when NHST is used by itself. In fact, conclusion validity is maximized only if both p values and effect sizes are utilized. Indeed, Robinson and Levin (1997) proposed
a two-step process for making statistical inferences. According to this model, a
statistically significant observed finding is followed by the reporting and interpreting of one or more indexes of practical importance; however, no effect sizes are reported in light of a statistically nonsignificant finding. In other words, analysts
should determine first whether the observed result is statistically significant (Step
1), and if and only if statistical significance is found, then they should report how
large or important the observed finding is (Step 2). In this way, the statistical sig-
nificance test in Step 1 serves as a gatekeeper for the reporting and interpreting
of effect sizes in Step 2.
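As a sketch, the gatekeeper logic of the two-step process can be expressed in a few lines of Python. The function name, the alpha default, and the report strings are illustrative assumptions, not part of the procedure as published:

```python
# A minimal sketch of the two-step inference process: statistical
# significance (Step 1) acts as a gatekeeper for reporting and
# interpreting the effect size (Step 2).

def two_step_report(p_value: float, effect_size: float, alpha: float = 0.05) -> str:
    # Step 1: the significance test is the gatekeeper.
    if p_value >= alpha:
        return "Not statistically significant; no effect size reported."
    # Step 2: only now is the magnitude of the finding reported.
    return f"Statistically significant (p = {p_value:.3f}); effect size = {effect_size:.2f}."

print(two_step_report(0.20, 0.45))
print(two_step_report(0.01, 0.45))
```

Note that this is exactly the structure the article goes on to question: when power is low, Step 1 can block the reporting of a nontrivial effect.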
This two-step process is indirectly endorsed by the latest edition of the Publica-
tion Manual of the American Psychological Association (American Psychological Association [APA], 2001):

When reporting inferential statistics (e.g., t tests, F tests, and chi-square), include [italics added] information about the obtained magnitude or value of the test statistic, the degrees of freedom, the probability of obtaining a value as extreme as or more extreme than the one obtained, and the direction of the effect. Be sure to include [italics added] sufficient descriptive statistics (e.g., per-cell sample size, means, correlations, standard deviations) so that the nature of the effect being reported can be understood by the reader and for future meta-analyses. (p. 22)
Three pages later, the APA (2001) stated:
Neither of the two types of probability value directly reflects the magnitude of an ef-
fect or the strength of a relationship. For the reader to fully understand the importance
of your findings, it is almost always necessary to include some index of effect size or
strength of relationship in your Results section. (p. 25)
On the following page, APA (2001) stated that
The general principle to be followed, however, is to provide the reader not only with
TABLE 1
Possible Outcomes and Conclusion Validity When Null Hypothesis Significance Testing
and Effect Sizes Are Combined

                                           Effect Size
p Value                            Large                  Small
Not statistically significant      Type A error           Conclusion validity
Statistically significant          Conclusion validity    Type B error
Most recently, Onwuegbuzie and Levin (2003) proposed a three-step procedure
when two or more hypothesis tests are conducted within the same study, which in-
volves testing the trend of the set of hypotheses at the third step. Using either the two-step method or the three-step method helps to reduce not only the probability
of committing a Type A error but also the probability of committing a Type B er-
ror, namely, declaring as important a statistically significant finding with a small
effect size (Onwuegbuzie, 2001). However, whereas Type B error almost certainly
will be reduced by using one of these methods compared to using NHST alone, the
reduction in the probability of Type A error is not guaranteed using these proce-
dures. This is because if statistical power is lacking, then the first step of the
two-step method and the first and third steps of the three-step procedure, which
serve as gatekeepers for computing effect sizes, may lead to the nonreporting of a nontrivial effect (i.e., Type A error). Simply put, sample sizes that are too small increase the probability of a Type II error (not rejecting a false null hypothesis) and subsequently increase the probability of committing a Type A error.
Clearly, both error probabilities can be reduced if researchers conduct a priori
power analyses to select appropriate sample sizes. However, unfortunately, such
analyses are rarely employed (Cohen, 1992; Keselman et al., 1998; Onwuegbuzie,
2002). When a priori power analyses have been omitted, researchers should con-
duct post hoc power analyses, especially for nonstatistically significant findings.
This would help researchers determine whether low power threatens the internal validity of findings (i.e., Type A error). Yet, researchers typically have not used
this technique.
Thus, in this article, we advocate the use of post hoc power analyses. First, we
present reasons for the nonuse of a priori power analyses. Next, we define post hoc
power and delineate its utility. Third, we provide a step-by-step guide for conduct-
ing post hoc power analyses. Fourth, we provide a heuristic example to illustrate
how post hoc power can help to rule in/out rival explanations in the presence of statistically nonsignificant findings. Finally, we outline several methods that describe how post hoc power analyses can be used to improve the design of independent replications.
DEFINITION OF STATISTICAL POWER
Neyman and Pearson (1933a, 1933b) were the first to discuss the concepts of Type I and Type II error. Type I error occurs when the researcher rejects the null hypothesis when it is true. As noted previously, the Type I error probability is determined by the significance level (α). For example, if a 5% level of significance is designated, then the Type I error rate is 5%. Stated another way, α represents the conditional
probability of rejecting the null hypothesis when decisions are made over repeated samples from the same population under the same null and alternative hypothesis, assuming the null hypothesis is true. Conversely, Type II error occurs when the analyst accepts the null hypothesis when the alternative hypothesis is true. The conditional probability of making a Type II error under the alternative hypothesis is denoted by β.

Statistical power is the conditional probability of rejecting the null hypothesis
(i.e., accepting the alternative hypothesis) when the alternative hypothesis is true.
The most common definition of power comes from Cohen (1988), who defined the power of a statistical test as "the probability [assuming the null hypothesis is false] that it will lead to the rejection of the null hypothesis, i.e., the probability that it will result in the conclusion that the phenomenon exists" (p. 4). Power can be viewed as how likely it is that the researcher will find a relationship or difference that really prevails. It is given by 1 − β.
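For a simple case, power has a closed form. The sketch below, using only the Python standard library, computes 1 − β for a two-sided one-sample z test; the effect size d = 0.5 and the sample sizes are illustrative values, not values taken from the article:

```python
from statistics import NormalDist
from math import sqrt

def z_test_power(d: float, n: int, alpha: float = 0.05) -> float:
    """Power (1 - beta) of a two-sided one-sample z test for
    standardized effect size d and sample size n."""
    z = NormalDist()
    z_crit = z.inv_cdf(1 - alpha / 2)   # critical value of the test
    shift = d * sqrt(n)                 # mean of the test statistic under H1
    # Probability that the test statistic lands in the rejection
    # region when the alternative hypothesis is true:
    return z.cdf(-z_crit + shift) + z.cdf(-z_crit - shift)

# Power grows toward 1 as the sample size increases (d = 0.5 assumed):
for n in (10, 30, 100):
    print(n, round(z_test_power(0.5, n), 2))
```

A useful sanity check on such a function: when d = 0, the "power" collapses to α, the Type I error rate.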
Statistical power estimates are affected by three factors.1 The first factor is level
of significance. Holding all other aspects constant, increasing the level of signifi-
cance increases power but also increases the probability of rejecting the null hy-
pothesis when it is true. The second influential factor is the effect size. Specifically, the larger the difference between the value of the parameter under the null hypothesis and the value of the parameter under the alternative hypothesis, the greater the power to detect it. The third instrumental component is the sample size. The larger the sample size, the greater the likelihood of rejecting the null hypothesis2 (Chase & Tucker, 1976; Cohen, 1965, 1969, 1988, 1992). However, caution is needed due
to the fact that large sample sizes can increase power to the point of rejecting the
null hypothesis when the associated observed finding is not practically important.
Cohen (1965), in accordance with McNemar (1960), recommended a probabil-
ity of .80 or greater for correctly rejecting the null hypothesis representing a me-
dium effect at the 5% level of significance. This recommendation was based on
considering the ratio of the probability of committing a Type I error (i.e., 5%) to
the probability of committing a Type II error (i.e., 1 − .80 = .20). In this case, the ratio was 1:4, reflecting the contention that Type I errors are generally more serious than are Type II errors.
Power, level of significance, effect size, and sample size are related such that any one of these components is a function of the other three components. As noted by Cohen (1988), when any three of them are fixed, "the fourth is completely determined" (p. 14). Thus, there are four possible types of power analyses in which one of the parameters is determined as a function of the other three as follows: (a) power as a function of level of significance, effect size, and sample size; (b) effect size as a function
1Recently, Onwuegbuzie and Daniel (2004) demonstrated empirically that power also is affected by score reliability. Specifically, Onwuegbuzie and Daniel showed that the higher the score reliability,
of level of significance, sample size, and power; (c) level of significance as a func-
tion of sample size, effect size, and power; and (d) sample size as a function of level of significance, effect size, and power (Cohen, 1965, 1988). The latter type of power analysis is the most popular and most useful for planning research studies (Cohen, 1992). This form of power analysis, which is called an a priori power analysis, helps the researcher to ascertain the sample size necessary to obtain a desired level of power for a specified effect size and level of significance. Conventionally, most researchers set the power coefficient at .80 and the level of significance at .05. Thus, once the expected effect size and type of analysis are specified, then the sample size needed to meet all specifications can be determined.
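For the z test, this fourth type of power analysis (sample size as a function of the other three parameters) has a well-known closed form, n = ((z_alpha + z_beta) / d)². The standard-library Python sketch below illustrates that formula with the conventional inputs just described; it is an approximation for a two-sided one-sample z test, not Cohen's (1988) tabled procedure:

```python
from statistics import NormalDist
from math import ceil

def a_priori_n(d: float, power: float = 0.80, alpha: float = 0.05) -> int:
    """Smallest n giving the desired power for a two-sided one-sample
    z test with standardized effect size d (closed-form approximation)."""
    z = NormalDist()
    z_alpha = z.inv_cdf(1 - alpha / 2)  # two-sided critical value
    z_beta = z.inv_cdf(power)           # quantile for the desired power
    return ceil(((z_alpha + z_beta) / d) ** 2)

# Conventional inputs: power = .80, alpha = .05, medium effect d = .5:
print(a_priori_n(0.5))  # → 32
```

As expected, halving the anticipated effect size roughly quadruples the required sample size, which is why overestimating the effect at the planning stage is so costly.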
The value of an a priori power analysis is that it helps the researcher in planning
research studies (Sherron, 1988). By conducting such an analysis, researchers put themselves in the position to select a sample size that is large enough to lead to a
rejection of the null hypothesis for a given effect size. Alternatively stated, a priori
power analyses help researchers to obtain the necessary sample sizes to reach a de-
cision with adequate power. Indeed, the optimum time to conduct a power analysis
is during the research design phase (Wooley & Dawson, 1983).
Failing to consider statistical power can have dire consequences for research-
ers. First and foremost, low statistical power reduces the probability of rejecting
the null hypothesis and therefore increases the probability of committing a Type II
error (Bakan, 1966; Cohen, 1988). It may also increase the probability of committing a Type I error (Overall, 1969), yield misleading results in power studies (Chase & Tucker, 1976), and prevent potentially important studies from being published as a result of publication bias (Greenwald, 1975), which has been called the "file-drawer problem," representing the tendency to keep statistically nonsignificant results in file drawers (Rosenthal, 1979).
It has been exactly 40 years since Jacob Cohen (1962) conducted the first sur-
vey of power. In this seminal work, Cohen assessed the power of studies published
in the abnormal-social psychology literature. Using the reported sample size and a
nondirectional significance level of 5%, Cohen calculated the average power to detect a hypothesized effect (i.e., hypothesized power) across the 70 selected studies for nine frequently used statistical tests using small, medium, and large estimated effect size values. The average power of the 2,088 major statistical tests was .18, .48, and .83 for detecting a small, medium, and large effect size, respec-
tively. The average hypothesized statistical power of .48 for medium effects indi-
cated that studies in the abnormal psychology field had, on average, less than a
50% chance of correctly rejecting the null hypothesis (Brewer, 1972; Halpin &
Easterday, 1999).
During the three decades after Cohen's (1962) investigation, several researchers conducted hypothetical power surveys across a myriad of disci-
Owen, 1973), communication (Chase & Tucker, 1975; Katzer & Sodt, 1973),
communication disorders (Kroll & Chase, 1975), mass communication (Chase &
Baran, 1976), counselor education (Haase, 1974), social work education (Orme & Tolman, 1986), science education (Penick & Brewer, 1972; Wooley & Dawson,
1983), English education (Daly & Hexamer, 1983), gerontology (Levenson,
1980), marketing research (Sawyer & Ball, 1981), and mathematics education
(Halpin & Easterday, 1999). The average hypothetical power of these 15 studies
was .24, .63, and .85 for small, medium, and large effects, respectively. Assuming
that a medium effect size is appropriate for use in most studies because of its com-
bination of being practically meaningful and realistic (Cohen, 1965; Cooper &
Findley, 1982; Haase, Waechter, & Solomon, 1982), the average power of .63
across these studies is disturbing. Similarly disturbing is the average hypothesized power of .64 for a medium effect reported by Rossi (1990) across 25 power sur-
veys involving more than 1,500 journal articles and 40,000 statistical tests.
An even more alarming picture is painted by Schmidt and Hunter (1997), who reported that "the average [hypothesized] power of null hypothesis significance tests in typical studies and research literature is in the .40 to .60 range (Cohen, 1962, 1965, 1988, 1992; Schmidt, 1996; Schmidt, Hunter, & Urry, 1976; Sedlmeier & Gigerenzer, 1989) [with] .50 as a rough average" (p. 40). Unfortu-
nately, an average hypothetical power of .5 indicates that more than one half of all
statistical tests in the social and behavioral science literature will be statistically nonsignificant. As noted by Schmidt and Hunter (1997), "This level of accuracy is so low that it could be achieved just by flipping a (unbiased) coin!" (p. 40). Yet, the
fact that power is unacceptably low in most studies suggests that misuse of NHST
is to blame, not the logic of NHST. Moreover, the publication bias that prevails in
research suggests that the hypothetical power estimates provided previously likely
represent an upper bound. Thus, as declared by Rossi (1997), "it is possible that at least some controversies in the social and behavioral sciences may be artifactual in nature" (p. 178). Indeed, it can be argued that low statistical power represents more of a research design issue than a statistical issue because it can be rectified by using a larger sample.
Bearing in mind the importance of conducting statistical power analyses, it is
extremely surprising that very few researchers conduct and report power analyses
for their studies (Brewer, 1972; Cohen, 1962, 1965, 1988, 1992; Keselman et al.,
1998; Onwuegbuzie, 2002; Sherron, 1988) even though statistical power has been
promoted actively since the 1960s (Cohen, 1962, 1965, 1969) and even though for
many types of statistical analyses (e.g., r, z, F, χ2), tables have been provided by Cohen (1988, 1992) to determine the necessary sample size. Even when a priori power has been calculated, it is rarely reported (Wooley & Dawson, 1983). This lack of power analyses still prevails despite the recommendations of the APA
The lack of use of power analysis might be the result of one or more of the fol-
lowing factors. First and foremost, evidence exists that statistical power is not suf-
ficiently understood by researchers (Cohen, 1988, 1992). Second, it appears that the concept and applications of power are not taught in many undergraduate- and
graduate-level statistical courses. Moreover, when power is taught, it is likely that
inadequate coverage is given. Disturbingly, Mundfrom, Shaw, Thomas, Young,
and Moore (1998) reported that the issue of statistical power is regarded by in-
structors of research methodology, statistics, and measurement as being only the
34th most important topic in their fields out of the 39 topics presented. Also in the
Mundfrom et al. study, power received the same low ranking with respect to cover-
age in the instructors classes. Clearly, if power is not being given a high status in
quantitative-based research courses, then students similarly will not take it seriously. In any case, these students will not be suitably equipped to conduct such
analyses.
Another reason for the spasmodic use of statistical power possibly stems from
the incongruency between endorsement and practice. For instance, although the APA (2001) stipulated that power analyses be conducted, the manual, despite providing several NHST examples, does not provide any examples of how to report statistical power (Fidler, 2002). Harris (1997) also provided an additional rationale
for the lack of power analyses:
I suspect that this low rate of use of power analysis is largely due to the lack of proportionality between the effort required to learn and execute power analyses (e.g., dealing with noncentral distributions or learning the appropriate effect-size measure with which to enter the power tables in a given chapter of Cohen, 1977) and the low payoff from such an analysis (e.g., the high probability that resource constraints will force you to settle for a lower N than your power analysis says you should have), especially given the uncertainties involved in a priori estimates of effect sizes and standard deviations, which render the resulting power calculation rather suspect. If calculation of the sample size needed for adequate power and for choosing between alternative interpretations of a nonsignificant result could be made more nearly equal in difficulty to the effort we've grown accustomed to putting into significance testing itself, more of us might in fact carry out these preliminary and supplementary analyses. (p. 165)
A further reason why a priori power analyses are not conducted likely stems
from the fact that the most commonly used statistical packages, such as the Statis-
tical Package for the Social Sciences (SPSS; SPSS Inc., 2001) and the Statistical
Analysis System (SAS Institute Inc., 2002), do not allow researchers directly to
conduct power analyses. Furthermore, the statistical software programs that con-
duct power analyses (e.g., Erdfelder, Faul, & Buchner, 1996; Morse, 2001), although extremely useful, typically do not conduct other types of analyses, and can be expen-
sive. Even when researchers have power software in their possession, the lack of
information regarding components needed to calculate power (e.g., effect size,
variance) serves as an additional impediment to a priori power analyses. It is likely that the lack of power analyses coupled with a publication bias pro-
mulgates the publishing of findings that are statistically significant but have small
effect sizes (possibly increasing Type B error) as well as leading researchers to
eliminate valuable hypotheses (Halpin & Easterday, 1999). Thus, we recommend
in the strongest possible manner that all quantitative researchers conduct a priori
power analyses whenever possible. These analyses should be reported in the
Method section of research reports. This report also should include a rationale for
criteria used for all input variables (i.e., power, significance level, effect size;
APA, 2001; Cohen, 1973, 1988). Inclusion of such analyses will help researchers to make optimum choices on the components (e.g., sample size, number of vari-
ables studied) needed to design a trustworthy study.
POST HOC POWER ANALYSES
Whether an a priori power analysis is undertaken and reported, problems can still
arise. One problem that commonly occurs in educational research is when the study is completed and a statistically nonsignificant result is found. In many cases, the researcher then disregards the study (i.e., file-drawer problem) or, when he or she submits the final report to a journal for review, finds it is rejected (i.e., publication
bias). Unfortunately, most researchers do not determine whether the statistically
nonsignificant result is the result of insufficient statistical power. That is, without
knowing the power of the statistical test, it is not possible to rule in or rule out low
statistical power as a threat to internal validity (Onwuegbuzie, 2003). Nor can an a
priori power analysis necessarily rule in/out this threat. This is because a priori
power analyses involve the use of a priori estimates of effect sizes and standard de-
viations (Harris, 1997). As such, a priori power analyses do not represent the power to detect the observed effect of the ensuing study; rather, they represent the power to detect hypothesized effects. Before the study is conducted, researchers do not know what the observed effect size will be. All they can do is try to estimate it based
on previous research and theory (Wilkinson & the Task Force on Statistical Infer-
ence, 1999). The observed effect size could end up being much smaller or much
larger than the hypothesized effect size on which the power analysis is undertaken.
(Indeed, this is a criticism of the power surveys highlighted previously; Mulaik,
Raju, & Harshman, 1997.) In particular, if the observed effect size is smaller than
what is proposed, the sample size yielded by the a priori power analysis might be
smaller than is needed to detect it. In other words, a smaller effect size than anticipated increases the chances of Type II error.
gate the performance of an NHST (Mulaik et al., 1997; Schmidt, 1996; Sherron,
1988). Such a technique leads to what is often called a post hoc power analysis. In-
terestingly, several authors have recommended the use of post hoc power analysesfor statistically nonsignificant findings (Cohen, 1969; Dayton, Schafer, & Rogers,
1973; Fagely, 1985; Fagley & McKinney, 1983; Sawyer & Ball, 1981; Wooley &
Dawson, 1983).
When post hoc power should be reported has been the subject of debate. Al-
though some researchers advocate that post hoc power always be reported (e.g.,
Wooley & Dawson, 1983), the majority of researchers advocate reporting post hoc
power only for statistically nonsignificant results (Cohen, 1965; Fagely, 1985;
Fagley & McKinney, 1983; Sawyer & Ball, 1981). However, both sets of analysts
have agreed that estimating the power of significance tests that yield statistically nonsignificant findings plays an important role in their interpretations (e.g.,
Fagely, 1985; Fagley & McKinney, 1983; Sawyer & Ball, 1981; Tversky &
Kahneman, 1971). Specifically, statistically nonsignificant results in a study with
low power suggest ambiguity. Conversely, statistically nonsignificant results in a
study with high power contribute to the body of knowledge because power can be
ruled out as a threat to internal validity (e.g., Fagely, 1985; Fagley & McKinney,
1983; Sawyer & Ball, 1981; Tversky & Kahneman, 1971). To this end, statistically
nonsignificant results can make a greater contribution to the research community
than they presently do. As noted by Fagely (1985), "Just as rejecting the null does not guarantee large and meaningful effects, accepting the null does not preclude interpretable results" (p. 392).
Conveniently, post hoc power analyses can be conducted relatively easily be-
cause some of the major statistical software programs compute post hoc power es-
timates. In fact, post hoc power coefficients are available in SPSS for the general
linear model (GLM). Post hoc power (or "observed power," as it is called in the SPSS output) is based on "taking the observed effect size as the assumed population effect, which produces a positively biased but consistent estimate of the effect" (D. Nichols,3 personal communication, November 4, 2002). For example, the post hoc power procedure for analyses of variance (ANOVAs) and multiple
ANOVAs is contained within the options button.4 It should be noted that due to
sampling error, the observed effect size used to compute the post hoc power esti-
mate might be very different than the true (population) effect size, culminating in a
misleading evaluation of power.
3David Nichols is a Principal Support Statistician and Manager of Statistical Support, SPSS Technical Support.
4Details of the post hoc power computations in the statistical algorithms for the GLM and
PROCEDURE FOR CONDUCTING POST HOC
POWER ANALYSES
Although post hoc power analysis procedures exist for multivariate tests of statisti-
cal significance such as multivariate analysis of variance and multivariate analysis
of covariance (cf. Cohen, 1988), these methods are sufficiently complex to warrant
use of statistical software. Thus, for this article, we restrict our attention to post hoc
power analysis procedures for univariate tests of statistical significance. What fol-
lows is a description of how to undertake a post hoc power analysis by hand for
univariate tests, that is, for situations in which there is one quantitative dependent
variable and one or more independent variables. Here, we use Cohen's (1988) framework for conducting post hoc power for multiple regression and correlation analysis. Furthermore, these steps can also be used for a priori power analyses by
substituting in a hypothesized effect size value instead of an observed effect size
value. As noted by Cohen (1988), the advantage of this framework is that it can be
used for any type of relationship between the independent variable(s) and depend-
ent variable (e.g., linear or curvilinear, whole or partial, general or partial). Further-
more, the independent variables can be quantitative or qualitative (i.e., nominal
scale), direct variates or covariates, and main effects or interactions. Also, the inde-
pendent variables can represent ex post facto research or retrospective research and can be naturally occurring or the results of experimental manipulation. Most important, however, because multiple regression subsumes all other univariate statistical
techniques as special cases (Cohen, 1968), this post hoc power analysis framework
can be used for other members of the univariate GLM, including correlation analy-
sis, t tests, ANOVA, and analysis of covariance.
The F test can be used for all univariate members of the GLM to test the null hypothesis that the proportion of variance in the dependent variable that is accounted for by one or more independent variables (PVi) is zero in the population. This F test
can be expressed as

F = (PVi / dfi) / (PVe / dfe),    (1)

where PVi is the proportion of the variance in the dependent variable explained by the set of independent variables, PVe is the proportion of error variance, dfi is the degrees of freedom for the numerator (i.e., the number of independent variables in the set), and dfe is the degrees of freedom for the denominator (i.e., degrees of freedom for the error variance). Equation 1 can be rewritten as

F = (PVi / PVe) × (dfe / dfi).    (2)
The left-hand term in Equation 2, which represents the proportion of variance in the dependent variable that is accounted for by the set of independent variables, is a measure of the effect size (ES) in the sample. The right-hand term in Equation 2, namely, the study size or sensitivity of the statistical test, provides information about the sample size and number of independent variables. Thus, degree of statistical significance is a function of the product of the effect size and the study size (Cohen, 1988).
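This decomposition can be checked numerically. The short Python sketch below computes F both ways; the values for the explained variance (R² = .13), the number of predictors, and the sample size are illustrative, and it assumes the whole-model case in which PVe = 1 − PVi:

```python
def f_from_components(pv_i: float, df_i: int, df_e: int) -> float:
    """F statistic per Equation 1: (PVi/dfi) / (PVe/dfe)."""
    pv_e = 1.0 - pv_i                      # error variance (whole-model case)
    return (pv_i / df_i) / (pv_e / df_e)

# Illustrative case: R^2 = .13 with 3 predictors and N = 104,
# so dfi = 3 and dfe = 104 - 3 - 1 = 100.
pv_i, df_i, df_e = 0.13, 3, 100
f1 = f_from_components(pv_i, df_i, df_e)

# Equation 2: the same F as effect size (PVi/PVe) times study size (dfe/dfi).
es = pv_i / (1 - pv_i)
f2 = es * (df_e / df_i)
print(round(f1, 3), round(f2, 3))  # the two forms agree
```

The example makes the point in the text concrete: doubling dfe (roughly, doubling the sample) doubles F without any change in the effect size itself.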
Because most theoretically selected independent variables have a nonzero relationship (i.e., nonzero effect size) with dependent variables, the distribution of F values in any particular investigation likely takes the form of a noncentral F distribution (Murphy & Myors, 1998). Thus, the (post hoc) power of a statistical significance test can be defined as "the proportion of the noncentral distribution that exceeds the critical value used to define statistical significance" (Murphy & Myors, 1998, p. 24). The shape of the noncentral F distribution as well as its range are a function of both the noncentrality parameter and the respective degrees of freedom (i.e., dfi and dfe). The noncentrality parameter, which is an index that determines the extent to which the null hypothesis is false, is given by
= ES(dfi + dfe + 1), (3)
where ES, the effect size, is given by

ES = PVi / PVe,    (4)

and dfi and dfe are the numerator and denominator degrees of freedom, respectively. Once the noncentrality parameter, λ, has been calculated, Table 2 or Table 3, depending on the level of α, can be used to determine the post hoc power of the statistical test. Specifically, these tables are entered for values of α, λ, dfi, and dfe.
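Equations 3 and 4 are straightforward to script; this sketch computes ES and λ, using the values from the article's later heuristic example (ES ≈ .0107 with dfi = 1, dfe = 199) purely as illustration.

```python
# Sketch of Equations 3 and 4. The demonstration values mirror the
# article's heuristic example and are illustrative only.

def effect_size(pv_i, pv_e):
    """Equation 4: ES = PVi / PVe."""
    return pv_i / pv_e

def noncentrality(es, df_i, df_e):
    """Equation 3: lambda = ES * (dfi + dfe + 1)."""
    return es * (df_i + df_e + 1)

lam = noncentrality(0.0107, 1, 199)
print(round(lam, 2))  # 2.15
```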
Step 1: Statistical Significance Criterion (α)

Table 2 represents α = .05, and Table 3 represents α = .01. (Interpolation and extrapolation techniques can be used for values between .01 and .05 and outside these limits, respectively.)
Step 2: The Noncentrality Parameter (λ)

In Tables 2 and 3, power values are presented for 15 λ values. Because λ is a continuous variable, linear interpolation typically is needed. According to Cohen (1988),
TABLE 2
Power of the F Test As a Function of λ, dfi, and dfe

dfi dfe λ: 2 4 6 8 10 12 14 16 18 20 24 28 32 36 40
1 20 27 48 64 77 85 91 95 97 98 99 *
60 29 50 67 79 88 92 96 98 99 99 *
120 29 51 68 80 88 93 96 98 99 99 *
29 52 69 81 89 93 96 98 99 99 *
2 20 20 36 52 65 75 83 88 92 95 97 99 *
60 22 40 56 69 79 87 91 95 97 98 *
120 22 41 57 71 80 87 92 95 97 98 *
23 42 58 72 82 88 93 96 97 99 *
3 20 17 30 44 56 67 75 82 87 91 94 97 99 *
60 19 34 49 62 73 81 87 92 95 97 98 *
120 19 35 50 64 75 83 89 93 95 97 99 *
19 36 52 65 76 84 90 93 96 98 99 *
4 20 15 26 38 49 60 69 76 83 87 91 95 98 99 *
60 17 30 44 57 68 77 83 89 92 95 98 99 *
120 17 31 46 58 70 78 85 90 93 96 98 99 *
17 32 47 60 72 80 87 91 94 96 99 *
5 20 13 23 34 44 54 63 71 78 83 87 93 96 98 99 *
60 15 27 40 52 63 72 80 86 90 93 97 99 *
120 16 29 41 54 65 75 82 87 91 94 98 99 *
16 29 43 56 68 77 84 89 93 95 98 99 *
6 20 12 21 30 40 50 59 66 73 79 84 91 95 97 99 *
60 14 25 37 48 59 68 76 83 87 91 96 98 99 *
120 14 27 39 50 62 71 79 85 89 93 97 99 99 *
15 27 40 53 64 74 81 87 91 94 97 99 *
7 20 11 19 28 37 46 54 62 69 75 80 88 93 96 98 99
60 17 24 35 45 56 65 73 80 85 89 94 97 99 99 *
120 13 25 37 47 59 68 76 82 87 91 96 98 99 *
14 25 38 50 61 71 79 85 89 93 97 99 99 *
8 20 10 18 26 34 42 50 58 65 71 76 85 91 94 97 98
60 12 23 33 43 52 62 70 77 83 87 93 97 98 99 *
120 12 24 35 45 55 65 73 80 85 89 95 98 99 *
13 24 36 48 59 68 77 83 88 92 96 99 99 *
9 20 10 17 24 32 39 47 54 61 68 73 82 88 93 96 97
60 11 21 31 41 50 58 67 74 80 85 92 96 98 99 *
120 11 22 33 44 53 62 71 78 83 88 94 97 99 *
13 23 34 45 56 66 74 81 86 90 95 98 99 *
10 20 09 16 23 30 37 44 51 58 64 70 79 86 91 94 96
60 10 20 30 39 48 56 65 72 78 83 90 95 97 99 99
120 11 21 31 41 51 60 69 75 81 86 93 96 98 99 *
12 21 32 43 54 64 72 79 85 89 94 98 99 *
(continued)
TABLE 2 (Continued)
dfi dfe λ: 2 4 6 8 10 12 14 16 18 20 24 28 32 36 40
11 20 09 15 21 27 34 41 48 55 61 67 76 84 89 93 96
60 11 18 27 36 45 54 62 70 76 81 89 94 97 98 99
120 11 19 29 39 49 58 67 74 80 85 91 96 98 99 *
12 21 31 42 52 62 70 78 83 89 94 97 99 99 *
12 20 08 14 20 27 33 39 45 52 58 64 74 81 87 91 94
60 09 18 28 36 44 52 59 67 73 79 87 93 96 98 99
120 10 20 30 40 48 56 63 72 78 83 90 95 97 99 99
11 20 30 40 50 60 69 76 82 87 93 97 98 99 *
13 20 09 13 19 24 30 37 43 49 55 61 71 79 85 90 93
60 10 16 24 33 41 50 58 65 71 77 86 92 95 97 99
120 10 18 26 36 45 54 62 70 76 81 89 94 97 98 99
11 19 29 40 50 59 67 75 81 85 92 96 98 99 *
14 20 09 13 18 23 29 35 41 47 52 58 68 76 83 88 92
60 10 16 23 31 40 48 56 63 69 75 84 90 94 97 98
120 10 17 25 34 43 52 61 68 74 80 88 93 97 98 99
11 19 28 38 48 58 65 73 79 84 91 96 98 99 *
15 20 08 12 17 22 27 33 39 44 50 55 65 74 81 86 90
60 09 15 22 30 38 46 54 61 67 73 73 89 94 96 98
120 10 16 24 33 42 51 59 66 73 78 78 92 96 98 99
10 18 27 37 47 56 64 72 78 83 83 95 97 99 99
18 20 08 11 15 19 24 29 34 39 44 49 58 67 74 80 85
60 09 14 20 27 34 41 48 55 62 68 78 85 91 94 97
120 09 15 22 30 38 46 54 61 68 74 83 90 94 97 98
10 17 25 34 43 52 60 68 76 80 88 93 97 98 99
20 20 08 11 14 18 22 26 31 36 40 45 54 63 70 77 82
60 08 13 19 25 31 38 45 52 58 64 75 83 89 93 96
120 09 14 21 28 36 43 51 58 65 71 81 88 93 96 98
09 16 24 32 41 50 58 65 72 78 87 92 96 98 99
24 20 07 10 13 16 19 23 27 31 35 39 47 55 62 69 75
60 08 12 17 22 28 34 40 46 52 58 69 78 84 89 93
120 08 13 19 25 32 39 46 53 60 66 76 84 90 94 96
09 15 21 29 37 46 54 61 68 74 83 90 94 97 98
30 20 07 09 11 14 16 19 22 25 29 32 39 45 53 59 65
60 08 11 15 19 24 29 34 40 45 51 61 70 78 84 89
120 08 12 16 22 28 34 40 46 53 59 69 78 85 90 94
08 13 19 26 33 41 48 56 62 69 79 87 92 95 97
40 20 07 08 10 11 13 15 18 20 22 25 30 35 41 47 52
60 07 10 12 16 19 23 27 32 36 41 50 59 67 74 80
120 07 10 14 18 23 28 33 38 44 49 60 69 77 83 88
08 12 17 22 29 35 42 49 55 61 72 81 87 92 95
48 20 07 08 09 10 12 14 15 17 19 21 25 30 35 39 44
60 07 09 11 14 17 20 24 28 31 35 44 52 59 67 73
TABLE 2 (Continued)
dfi dfe λ: 2 4 6 8 10 12 14 16 18 20 24 28 32 36 40
60 20 07 08 09 10 12 13 14 16 18 19 23 26 30 34 38
60 07 08 10 12 15 17 20 23 26 29 36 43 50 57 63
120 07 09 11 14 17 21 25 28 33 37 46 54 62 70 76
07 10 14 18 23 28 34 39 45 51 62 71 79 85 90
120 20 06 07 08 08 09 10 10 11 12 12 14 15 17 19 20
60 06 07 08 09 10 11 12 13 15 16 19 23 26 30 34
120 06 07 08 10 11 13 15 17 19 21 26 31 36 41 47
06 08 11 13 16 19 23 27 31 35 43 52 60 68 74
Note. α = .05. From Statistical Power Analysis for the Behavioral Sciences (2nd ed., pp. 420–423), by J. Cohen, 1988, Hillsdale, NJ: Lawrence Erlbaum Associates, Inc. Reprinted with permission.
*Power values here and to the right are > .995.
TABLE 3
Power of the F Test As a Function of λ, dfi, and dfe

dfi dfe λ: 2 4 6 8 10 12 14 16 18 20 24 28 32 36 40
1 20 10 23 37 51 63 73 80 86 90 94 97 99 *
60 12 26 42 57 69 79 86 91 94 96 99 *
120 12 27 44 58 71 80 87 92 95 97 99 *
12 28 45 60 72 81 88 92 95 97 99 *
2 20 06 15 26 37 48 58 67 75 81 86 93 96 98 99
60 08 18 30 45 57 68 76 83 88 92 97 99 99 *
120 08 19 33 46 59 70 78 85 90 93 97 99 *
08 20 35 49 61 72 80 87 91 94 98 99 *
3 20 05 11 20 29 39 48 57 65 72 78 87 93 96 98
60 06 14 25 37 49 60 69 77 83 88 94 97 99 *
120 06 15 27 39 51 62 72 79 85 90 95 98 99 *
07 16 29 42 54 65 74 82 87 91 96 98 *
4 20 04 09 16 23 32 41 49 57 64 71 81 89 93 96
60 05 12 21 31 42 53 62 71 78 83 91 96 98 99
120 05 13 23 34 45 56 66 74 81 86 93 97 98 *
06 14 25 37 49 60 69 77 84 89 95 98 99 *
5 20 03 07 13 20 27 35 43 50 58 64 76 84 90 93 *
60 04 10 18 28 37 48 57 66 73 79 88 94 97 98
120 04 11 20 30 41 51 61 70 77 83 91 95 98 99
05 12 22 33 44 55 65 74 80 86 93 97 99 99
6 20 03 06 11 17 23 30 37 45 52 58 70 79 86 91 *
60 04 09 16 24 34 43 52 61 69 75 85 92 96 98
120 04 10 17 27 37 47 57 66 73 79 89 94 97 99
TABLE 3 (Continued)
dfi dfe λ: 2 4 6 8 10 12 14 16 18 20 24 28 32 36 40
7 20 03 05 09 15 20 26 33 40 46 53 65 74 82 88 99
60 04 08 14 22 30 39 48 57 64 71 82 90 94 97 *
120 04 09 16 24 34 43 53 62 69 76 86 93 96 96
04 10 18 27 37 48 58 67 74 81 90 95 98 99
8 20 02 05 08 13 18 23 29 36 42 48 60 70 78 84 98
60 03 07 13 20 27 36 44 53 61 67 79 87 93 96 *
120 03 08 14 22 31 40 49 58 66 73 84 91 95 98
04 09 16 25 35 45 55 64 72 78 88 94 97 99
9 20 02 04 07 11 16 21 26 32 38 44 55 65 74 81 86
60 03 06 11 18 25 33 41 49 57 64 76 85 91 95 97
120 03 07 13 20 29 37 46 55 63 70 81 89 94 97 98
04 08 15 23 33 42 52 61 69 76 86 93 96 98 99
10 20 02 04 06 10 14 19 24 29 34 40 51 61 70 77 83
60 02 06 10 16 23 30 38 46 54 61 73 83 89 94 96
120 02 07 12 19 27 35 44 52 60 67 79 87 93 96 98
03 08 14 22 31 40 49 58 66 74 84 91 96 98 99
11 20 02 04 07 10 13 17 22 26 31 37 47 57 66 74 80
60 03 05 10 15 21 29 36 44 51 58 71 80 87 92 95
120 03 06 11 17 25 33 41 50 58 65 77 86 92 95 97
03 07 13 20 29 38 47 56 64 71 83 90 95 97 99
12 20 02 03 05 08 12 15 20 24 29 34 44 53 62 70 77
60 02 05 09 14 20 26 33 41 48 55 68 78 85 91 94
120 02 06 11 17 24 31 38 47 55 62 75 84 90 95 97
03 07 12 19 27 36 45 54 62 69 81 89 94 97 99
13 20 02 04 06 08 11 14 18 22 26 31 40 50 59 67 74
60 02 05 08 13 18 25 32 39 46 52 65 76 84 89 93
120 02 05 10 15 22 29 37 45 53 60 72 82 89 93 96
03 06 11 18 26 35 44 52 60 68 80 88 93 96 98
14 20 02 04 05 08 10 13 17 20 24 29 38 47 55 63 70
60 02 05 08 12 17 23 30 36 43 50 62 73 82 88 92
120 02 05 09 14 21 28 35 43 50 58 70 80 88 92 96
03 06 11 17 25 33 42 50 59 66 78 87 92 96 98
15 20 02 03 05 07 09 12 15 19 23 27 35 44 52 60 67
60 02 04 07 11 16 22 28 34 41 47 60 71 80 86 91
120 02 05 09 13 19 26 33 41 48 55 68 78 86 90 95
02 06 10 16 24 32 40 49 57 64 77 86 92 95 98
18 20 02 03 04 06 08 10 13 15 18 22 29 36 44 51 59
60 02 04 06 10 14 18 23 29 35 41 53 64 74 81 87
120 02 04 07 11 17 22 29 36 43 49 62 73 82 88 93
02 05 09 14 21 28 36 44 52 59 72 82 89 94 96
20 20 02 03 04 05 07 09 11 14 16 19 25 32 39 46 53
60 02 04 06 09 12 16 21 26 32 37 49 60 70 78 84
for interpolation,when .995.
spect to the reciprocals of dfe, that is, 1/20, 1/60, 1/120, and 0, respectively. In particular, when the power coefficient is needed for a given λ at a given dfe, where dfe falls between Ldfe and Udfe, the lower and upper values presented in Table 2 and Table 3, the power values PowerL and PowerU are obtained for λ at Ldfe and Udfe by linear interpolation. Then, to compute power for the given dfe, the following formula is used:

Power = PowerL + [(1/Ldfe − 1/dfe) / (1/Ldfe − 1/Udfe)] × (PowerU − PowerL).    (5)

For Udfe = ∞, 1/Udfe = 0, such that for dfe > 120, the denominator of Equation 5 is 1/120 − 0 = .00833 (Cohen, 1988).
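Equation 5 can be sketched as a small function; an infinite Udfe is handled by taking 1/Udfe = 0, as in the text. The tabled power values in the demonstration call are hypothetical.

```python
# Sketch of Equation 5: interpolating tabled power values over the
# reciprocals of dfe (Cohen, 1988). Udfe may be infinite, in which
# case 1/Udfe = 0. The power values below are hypothetical.
import math

def interpolate_power(power_l, power_u, ldf_e, udf_e, df_e):
    """Linear interpolation in 1/dfe between tabled Ldfe and Udfe."""
    inv_u = 0.0 if math.isinf(udf_e) else 1.0 / udf_e
    weight = (1.0 / ldf_e - 1.0 / df_e) / (1.0 / ldf_e - inv_u)
    return power_l + weight * (power_u - power_l)

# dfe = 240 lies halfway between 120 and infinity in reciprocal terms:
print(round(interpolate_power(0.50, 0.60, 120, math.inf, 240), 2))  # 0.55
```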
Virtually all univariate tests of statistical significance fall into one of the following three cases: Case A, Case B, or Case C. Case A, the simplest case, involves testing a set (A) of one or more independent variables. As before, dfi is the number of independent variables contained in the set, and Equation 3 is used, with the effect size being represented by

ES = R²Y·A / (1 − R²Y·A),    (6)

where the numerator represents the proportion of variance explained by the variables in Set A and the denominator represents the error variance proportion.
With respect to Case B, the proportion of variance in the dependent variable accounted for by Set A, over and above what is explained by another set, B, is ascertained. This is represented by R²Y·A,B − R²Y·B. The error variance proportion is then 1 − R²Y·A,B. Equation 3 can then be used, with the effect size being represented by

ES = (R²Y·A,B − R²Y·B) / (1 − R²Y·A,B).    (7)

The error degrees of freedom for Case B is given by

dfe = N − dfi − dfb − 1,    (8)

where dfi is the number of variables contained in Set A, and dfb is the number of variables in Set B. Case C is similar to Case B, inasmuch as the test of interest is the unique contribution of Set A over and above Set B; however, the error variance is estimated from a model that also includes a third set, C, so that the error variance proportion is 1 − R²Y·A,B,C.
Therefore, Equation 3 can then be used, with the effect size being represented by

ES = (R²Y·A,B − R²Y·B) / (1 − R²Y·A,B,C).    (9)
The error degrees of freedom for Case C is given by

dfe = N − dfi − dfb − dfc − 1,    (10)

where dfi is the number of variables contained in Set A, dfb is the number of variables in Set B, and dfc is the number of variables in Set C. As noted by Cohen (1988), Case C represents the most general case, with Cases A and B being special cases of Case C.
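The three cases can be collected into a few helper functions. A sketch, using hypothetical R² values chosen only to exercise Equations 6 through 10:

```python
# Sketch of the Case A, B, and C effect sizes (Equations 6, 7, 9) and
# error degrees of freedom (Equations 8, 10). The R-squared values in
# the demonstration calls are hypothetical.

def es_case_a(r2_ya):
    """Equation 6: ES = R2(Y.A) / (1 - R2(Y.A))."""
    return r2_ya / (1.0 - r2_ya)

def es_case_b(r2_yab, r2_yb):
    """Equation 7: ES = (R2(Y.A,B) - R2(Y.B)) / (1 - R2(Y.A,B))."""
    return (r2_yab - r2_yb) / (1.0 - r2_yab)

def es_case_c(r2_yab, r2_yb, r2_yabc):
    """Equation 9: ES = (R2(Y.A,B) - R2(Y.B)) / (1 - R2(Y.A,B,C))."""
    return (r2_yab - r2_yb) / (1.0 - r2_yabc)

def df_error(n, df_i, df_b=0, df_c=0):
    """Equations 8 and 10: dfe = N - dfi - dfb - dfc - 1."""
    return n - df_i - df_b - df_c - 1

print(round(es_case_a(0.20), 4))              # 0.25
print(round(es_case_b(0.30, 0.20), 4))        # 0.1429
print(round(es_case_c(0.30, 0.20, 0.35), 4))  # 0.1538
print(df_error(100, 3, df_b=2))               # 94
```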
FRAMEWORK FOR CONDUCTING POST HOC
POWER ANALYSES
We agree that post hoc power analyses should accompany statistically nonsignificant findings.5 In fact, such analyses can provide useful information for replication studies. In particular, the components of the post hoc power analysis can be used to conduct a priori power analyses in subsequent replication investigations.
Figure 1 displays our power-based framework for conducting NHST. Specifically, once the research purpose and hypotheses have been determined, the next step is to use an a priori power analysis to design the study. Once data have been collected, the next step is to test the hypotheses. For each hypothesis, if statistical significance is reached (e.g., at the 5% level), then the researcher should report the effect size and a confidence interval around the effect size (e.g., Bird, 2002; Chandler, 1957; Cumming & Finch, 2001; Fleishman, 1980; Steiger & Fouladi, 1992, 1997; Thompson, 2002). Conversely, if statistical significance is not reached, then the researcher should conduct a post hoc power analysis in an attempt to rule in or to rule out inadequate power (e.g., power < .80) as a threat to the internal validity of the finding.
HEURISTIC EXAMPLE
Recently, Onwuegbuzie, Witcher, Filer, Collins, and Moore (2003) conducted a study investigating characteristics associated with teachers' views on discipline.
5Moreover, we recommend that upper bounds for post hoc power estimates be computed (Steiger & Fouladi, 1997). This upper bound is estimated via the noncentrality parameter. However, this is beyond
The theoretical framework for this investigation, although not presented here, can be found by examining the original study. Although several independent variables were examined by Onwuegbuzie, Witcher, et al., we restrict our attention to one of them, namely, ethnicity (i.e., Caucasian American vs. minority) and its relationship to discipline styles.
Participants were 201 students at a large mid-southern university who were either preservice (77.0%) or in-service (23.0%) teachers. The sample size was se-
FIGURE 1 Power-based framework for conducting null hypothesis significance tests. [Flowchart: Identify research purpose → Determine and state hypotheses → Conduct a priori power analysis as part of planning research design → Collect data → Conduct null hypothesis significance test. If statistical significance is achieved, report effect size and confidence interval around effect size. If not, report and interpret the post hoc power coefficient. If the post hoc power coefficient is low, lack of power is a rival explanation of the statistically nonsignificant finding; if not, rule out power as a rival explanation of the statistical nonsignificance.]
d = .5) at the (two-tailed) .05 level of significance, maintaining a family-wise error of 5% (i.e., approximately .01 for each set of statistical tests comprising the three subscales used; Erdfelder et al., 1996). The preservice teachers were selected from several sections of an introductory-level undergraduate education class. On the other hand, the in-service teachers represented graduate students who were enrolled in one of two sections of a research methodology course.
During the first week of class, participants were administered the Beliefs on Discipline Inventory (BODI), which was developed by Roy T. Tamashiro and Carl D. Glickman (as cited in Wolfgang & Glickman, 1986). This measure was constructed to assess teachers' beliefs about classroom discipline by indicating the degree to which they are noninterventionists, interventionists, and interactionalists. The BODI contains 12 multiple-choice items, each with two response options. For each item, participants are asked to select the statement with which they most
agree. The BODI contains three subscales representing the Noninterventionist, Interventionist, and Interactionalist orientations, with scores on each subscale ranging from 0 to 8. A high score on any of these scales represents a teacher's proclivity toward the particular discipline approach. For our study, the Noninterventionist, Interventionist, and Interactionalist subscales generated scores that had classical theory alpha reliability coefficients of .72 (95% confidence interval [CI] = .66, .77), .75 (95% CI = .69, .80), and .94 (95% CI = .93, .95), respectively.
A series of independent t tests, using the Bonferroni adjustment to maintain a family-wise error of 5%, revealed no statistically significant difference between Caucasian American (n = 175) and minority participants (n = 26) for scores on the Interventionist, t(199) = 1.47, p > .05; Noninterventionist, t(199) = 0.88, p > .05; and Interactionalist, t(199) = 0.52, p > .05, subscales. After finding statistical nonsignificance, Onwuegbuzie, Witcher, et al. (2003) could have concluded that there were no ethnic differences in discipline beliefs. However, Onwuegbuzie, Witcher, et al. decided to conduct a post hoc power analysis. The post hoc power analysis for this test of ethnic differences revealed low statistical power. Thus, Onwuegbuzie, Witcher, et al. (2003) concluded the following:
The finding of no ethnic differences in discipline beliefs also is not congruent with Witcher et al. (2001), who reported that minority preservice teachers less often endorsed classroom and behavior management skills as characteristic of effective teachers than did Caucasian-American preservice teachers. Again, the non-significance could have stemmed from the relatively small proportion of minority students (i.e., 12.9%), which induced relatively low statistical power (i.e., 0.59) for detecting a moderate effect size for comparing the two groups with respect to three outcomes (Erdfelder et al., 1996). More specifically, using the actual observed effect sizes pertaining to these three differences, and applying the Bonferroni adjustment, the post-hoc statistical power estimates were .12, .06, and .03 for interventionist,
Replications are thus needed to determine the reliability of the present findings of no
ethnic differences in discipline belief. (p. 19)
Although Onwuegbuzie, Witcher, et al. (2003) computed their power coefficients using a statistical power program (i.e., Erdfelder et al., 1996), these estimates could have been calculated by hand using Equation 3 and Tables 2 and 3. For example, to determine the power of the test of racial differences in levels of interventionist orientation, Table 3 would be used (i.e., α = .01) as a result of the Bonferroni adjustment used (i.e., .05/3). The proportion of variance in interventionist orientation accounted for by race (i.e., the effect size) can be obtained by using one of the following three transformation formulae:
ES = t² / (t² + dfe),    (11)

ES = (dfi × F) / (dfi × F + dfe),    (12)

ES = d² / (d² + 4),    (13)
where t² is the square of the t value, F is the F value, d² is the square of Cohen's d statistic, and dfi and dfe are the numerator and denominator degrees of freedom, respectively (Murphy & Myors, 1998). Using the t value of 1.47 reported previously by Onwuegbuzie, Witcher, et al. pertaining to the racial difference in interventionist orientation, the associated numerator degrees of freedom (dfi) of 1 (i.e., one independent variable: race), and the corresponding denominator degrees of freedom (dfe) of 199 (i.e., total sample size − 2) in Equation 11, we obtain

ES = (1.47)² / [(1.47)² + 199] = .0107.    (14)
Next, we substitute this value of ES and the numerator and denominator degrees of freedom into Equation 3 to yield a noncentrality parameter of λ = .0107 × (1 + 199 + 1) = 2.15 ≈ 2. From Table 3, we use dfi = 1 and λ = 2. Now, dfe = 199 is between Ldfe = 120 and Udfe = ∞. We could use Equation 5; however, because both the lower and upper power values are .12, our estimated power value will be .12, which represents extremely low statistical power for detecting the small observed effect size of
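As a cross-check on the table-based calculation, post hoc power can also be approximated by Monte Carlo simulation of the noncentral F distribution. This is a sketch, not the authors' procedure; the settings (ES = .0107, dfi = 1, dfe = 199, α = .01) come from the example, and the estimate should land near the reported .12.

```python
# Monte Carlo sketch of post hoc power for the univariate F test.
# The noncentral F is simulated directly: the numerator chi-square
# carries the noncentrality, the denominator is central.
import math
import random

def simulate_power(es, df_i, df_e, alpha, trials=100_000, seed=7):
    rng = random.Random(seed)
    lam = es * (df_i + df_e + 1)  # Equation 3

    def chi2(df, noncentrality=0.0):
        # Noncentral chi-square: one squared shifted normal plus a
        # central chi-square (gamma with shape df/2, scale 2).
        out = rng.gauss(math.sqrt(noncentrality), 1.0) ** 2
        if df > 1:
            out += rng.gammavariate((df - 1) / 2.0, 2.0)
        return out

    def f_value(nc):
        return (chi2(df_i, nc) / df_i) / (chi2(df_e) / df_e)

    # Estimate the critical value from the central F distribution,
    # then the rejection rate under the noncentral alternative.
    central = sorted(f_value(0.0) for _ in range(trials))
    f_crit = central[min(int((1.0 - alpha) * trials), trials - 1)]
    return sum(f_value(lam) > f_crit for _ in range(trials)) / trials

print(round(simulate_power(0.0107, 1, 199, 0.01), 2))
```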
If Onwuegbuzie, Witcher, et al. (2003) had stopped after finding statistically
nonsignificant results, this would have been yet another instance of the deepening
file-drawer problem (Rosenthal, 1979). The post hoc power analysis allowed thestatistically nonsignificant finding pertaining to ethnicity to be placed in a more
appropriate context. This in turn helped the researchers to understand that their sta-
tistically nonsignificant finding was not necessarily due to an absence of a rela-
tionship between ethnicity and discipline style in the population but was due at
least in part to a lack of statistical power.
By conducting a post hoc analysis, Onwuegbuzie, Witcher, et al. (2003) real-
ized that the conclusion that no ethnic differences prevailed in discipline beliefs
likely would have resulted in a Type II error being committed. Similarly, if
Onwuegbuzie, Witcher, et al. had bypassed conducting a statistical significance test and had merely computed and interpreted the associated effect size, they
would have increased the probability of committing a Type A error. Either way,
conclusion validity would have been threatened.
As noted by Onwuegbuzie and Levin (2003) and Onwuegbuzie, Levin, et al.
(2003), the best way to increase conclusion validity is through independent (exter-
nal) replications of results (i.e., two or more independent studies yielding similar
findings that produce statistically and practically compatible outcomes). Indeed,
independent replications "make invaluable contributions to the cumulative knowledge in a given domain" (Robinson & Levin, 1997, p. 25). Thus, post hoc analyses can play a very important role in promoting as well as improving the quality of external replications. The post hoc analysis documented previously, by revealing
low statistical power first and foremost, strengthens the rationale for conducting
independent replications. Moreover, researchers interested in performing inde-
pendent replications could use the information from Onwuegbuzie, Witcher, et
al.s (2003) post hoc analysis for their subsequent a priori power analyses. Indeed,
as part of their post hoc power analysis, Onwuegbuzie, Witcher, et al. could have
conducted what we call a what-if post hoc power analysis to determine how many more cases would have been needed to increase the power coefficient to a more acceptable level (e.g., .80). As such, what-if post hoc power analyses provide useful
information for future researchers.
Even novice researchers who are unable or unwilling to conduct a priori power
analyses can use the post hoc power values to warn them to employ larger samples
(i.e., n > 201) in independent replication studies than that used in Onwuegbuzie,
Witcher, et al.s (2003) investigation. Moreover, novice researchers can use infor-
mation from what-if post hoc power analyses to set their minimum group/sample
sizes. Although we strongly encourage these researchers to perform a priori power
analyses, we believe that what-if post hoc power analyses place the novice researcher in a better position to select appropriate sample sizes, ones that reduce
around what-if post hoc power estimates using the (95%) upper and lower confidence limits around the observed effect size (cf. Bird, 2002). For instance, for the two-sample t-test context, Hedges and Olkin's (1985) z-based 95% CI6 could be used to determine the upper and lower limits that subsequently would be used to
derive a confidence interval around the estimated minimum sample size needed to
obtain adequate power in an independent replication.
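A what-if analysis of the kind described above can be sketched by searching for the smallest total N whose approximate power reaches the target. The sketch below assumes a two-group comparison (dfi = 1, so λ = ES × N) and uses a normal approximation to the noncentral distribution, so the resulting N is indicative only.

```python
# Sketch: what-if post hoc power analysis via a normal approximation
# (delta = sqrt(lambda), power ~ P(Z > z_crit - delta)). This ignores
# the far rejection tail and the t-vs-z correction.
import math

def normal_cdf(z):
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def normal_quantile(p):
    # Stdlib-only inverse of the standard normal CDF, by bisection.
    lo, hi = -10.0, 10.0
    for _ in range(100):
        mid = (lo + hi) / 2.0
        if normal_cdf(mid) < p:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0

def what_if_n(es, alpha=0.01, target=0.80, start_n=4):
    """Smallest total N whose approximate two-tailed power >= target."""
    z_crit = normal_quantile(1.0 - alpha / 2.0)
    n = start_n
    while True:
        delta = math.sqrt(es * n)  # sqrt(lambda); dfi + dfe + 1 = N here
        power = 1.0 - normal_cdf(z_crit - delta)
        if power >= target:
            return n, power
        n += 1

# ES = .0107 from the example; the answer is far above the original N = 201:
n_needed, approx_power = what_if_n(0.0107, alpha=0.01, target=0.80)
print(n_needed, round(approx_power, 3))
```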
Post hoc power analyses also could be used for meta-analytic purposes. In par-
ticular, if researchers routinely were to conduct post hoc power analyses,
meta-analysts could determine the average observed power across studies for any
hypothesis of interest. Knowledge of observed power across studies in turn would
help researchers interpret the extent to which studies in a particular area were truly
contributing to the knowledge base. Low average observed power across inde-pendent replications would necessitate research design modifications in future ex-
ternal replications. Additionally, what we call weighted meta-analyses could be
conducted wherein effect sizes across studies are weighted by their respective post
hoc power estimates. For example, if an analysis revealed a post hoc power esti-
mate of .80, the effect-size estimate of interest would be multiplied by this value to
yield a power-adjusted, effect-size coefficient. Similarly, a post hoc power esti-
mate of .60 would adjust the observed effect-size coefficient by this amount. By
using this technique when conducting meta-analyses, effect sizes resulting from studies with adequate power would be given more weight than effect sizes derived from investigations with low power.
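The proposed power weighting can be sketched in a few lines; the study values below are fabricated for illustration only.

```python
# Sketch of the proposed power-weighted meta-analysis: each study's
# effect size is multiplied by its post hoc power estimate. The
# (effect size, post hoc power) pairs are fabricated.

def power_adjusted_effects(studies):
    """studies: list of (effect_size, post_hoc_power) pairs."""
    return [es * power for es, power in studies]

def weighted_mean_effect(studies):
    # Power-weighted average effect size across studies.
    total_power = sum(power for _, power in studies)
    return sum(es * power for es, power in studies) / total_power

studies = [(0.50, 0.80), (0.30, 0.60), (0.10, 0.20)]
print([round(v, 2) for v in power_adjusted_effects(studies)])  # [0.4, 0.18, 0.02]
print(round(weighted_mean_effect(studies), 3))                 # 0.375
```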
In addition, if a researcher conducts both an a priori and a post hoc power analysis within the same study, then a power discrepancy index (PDI) could be computed, which represents the difference between the a priori and post hoc power coefficients. The PDI would then provide information to researchers for increasing the sensitivity (i.e., "the degree to which sampling error introduces imprecision into the results of a study"; Murphy & Myors, 1998, p. 5) of future a priori power analyses. Sensitivity might be improved by increasing the sample size in independent replication studies. Alternatively, sensitivity might be increased by using measures that yield more reliable and valid scores or a study design that allows researchers to control for undesirable sources of variability in their data (Murphy & Myors, 1998).
We have outlined several ways in which post hoc power analyses can be used to
improve the quality of present and future studies. However, this list is by no means
exhaustive. Indeed, we are convinced that other uses of post hoc power analyses are waiting to be uncovered. Regardless of the way in which a post hoc power
analysis is utilized, it is clear that this approach represents good detective work be-
cause it enables researchers to get to know their data better and consequently puts
6According to Hess and Kromrey (2002), Hedges and Olkin's (1985) z-based confidence interval
them in a position to interpret their data in a more meaningful way and with more
conclusion validity. Therefore, we believe that post hoc power analyses should
play a central role in NHST.
SUMMARY AND CONCLUSIONS
Robinson and Levin (1997) proposed a two-step procedure for analyzing empirical data whereby researchers first evaluate the probability of an observed effect (i.e., statistical significance), and if and only if statistical significance is found, then they assess the effect size. Recently, Onwuegbuzie and Levin (2003) proposed a three-step procedure for use when two or more hypothesis tests are conducted within the same study, which involves testing the trend of the set of hypotheses at the third step. Although both methods are appealing, their effectiveness depends on the statistical
power of the underlying hypothesis tests. Specifically, if power is lacking, then the
first step of the two-step method and the first and third steps of the three-step proce-
dure, which serve as gatekeepers for computing effect sizes, may lead to the
nonreporting of a nontrivial effect (i.e., Type A error; Onwuegbuzie, 2001).
Because the typical level of power for medium effect sizes in the behavioral and
social sciences is around .50 (Cohen, 1962), the incidence of Type A error likely is
high. Clearly, this incidence can be reduced if researchers conduct an a priori
power analysis to select appropriate sample sizes. However, such analyses arerarely employed (Cohen, 1992). Regardless, when a statistically nonsignificant
finding emerges, researchers should then conduct a post hoc power analysis. This
would help researchers determine whether low power threatens the internal valid-
ity of their findings (i.e., Type A error). Moreover, if researchers conduct post hoc
power analyses whether or not statistical significance is reached, then meta-analysts
would be able to assess whether independent replications are making significant
contributions to the accumulation of knowledge in a given domain. Unfortunately,
virtually no meta-analyst has formally used this technique.
Thus, in this article, we advocate the use of post hoc power analyses in empirical studies, especially when statistically nonsignificant findings prevail. First, reasons for the nonuse of a priori power analyses were presented. Second, post hoc power was defined and its utility delineated. Third, a step-by-step guide was provided for conducting post hoc power analyses. Fourth, a heuristic example was provided to illustrate how post hoc power can help to rule in/out rival explanations for observed findings. Finally, several methods were outlined that describe how post hoc power analyses can be used to improve the design of independent replications.
Although we advocate the use of post hoc power analyses, we believe that such approaches should never be used as a substitute for a priori power analyses. Moreover, we recommend that a priori power analyses always be conducted and re-
more statistically nonsignificant findings emerge. Post hoc power analyses rely more on available data and less on speculation than do a priori power analyses that are based on hypothesized effect sizes.

Indeed, we agree with Wooley and Dawson (1983), who suggested editorial policies "to require all such information relating to a priori design considerations and post hoc interpretation to be incorporated as a standard component of any research report submitted for publication" (p. 680). Although it could be argued that
this recommendation is ambitious, it is no more ambitious than the editorial poli-
cies at 20 journals that now formally stipulate that effect sizes be reported for all
statistically significant findings (Capraro & Capraro, 2002). In fact, post hoc
power provides a nice balance in report writing because we believe that post hoc
power coefficients are to statistically nonsignificant findings as effect sizes are to statistically significant findings. In any case, we believe that such a policy of con-
ducting and reporting a priori and post hoc power analyses would simultaneously
reduce the incidence of Type II and Type A errors and subsequently reduce the in-
cidence of publication bias and the file-drawer problem. This can only help to in-
crease the accumulation of knowledge across studies because meta-analysts will
have much more information with which to work. This surely would represent a step
in the right direction.
REFERENCES
American Psychological Association. (2001). Publication manual of the American Psychological Association (5th ed.). Washington, DC: Author.
Bakan, D. (1966). The test of significance in psychological research. Psychological Bulletin, 66, 423–437.
Bird, K. D. (2002). Confidence intervals for effect sizes in analysis of variance. Educational and Psychological Measurement, 62, 197–226.
Brewer, J. K. (1972). On the power of statistical tests in the American Educational Research Journal. American Educational Research Journal, 9, 391–401.
Brewer, J. K., & Owen, P. W. (1973). A note on the power of statistical tests in the Journal of Educational Measurement. Journal of Educational Measurement, 10, 71–74.
Cahan, S. (2000). Statistical significance is not a "kosher certificate" for observed effects: A critical analysis of the two-step approach to the evaluation of empirical results. Educational Researcher, 29(1), 31–34.
Capraro, R. M., & Capraro, M. M. (2002). Treatments of effect sizes and statistical significance tests in textbooks. Educational and Psychological Measurement, 62, 771–782.
Carver, R. P. (1978). The case against statistical significance testing. Harvard Educational Review, 48, 378–399.
Carver, R. P. (1993). The case against statistical significance testing, revisited. Journal of Experimental Education, 61, 287–292.
Chandler, R. E. (1957). The statistical concepts of confidence and significance. Psychological Bulletin, 54, 429–430.
Chase, L. J., & Baran, S. J. (1976). An assessment of quantitative research in mass communication.
Chase, L. J., & Chase, R. B. (1976). A statistical power analysis of applied psychological research. Journal of Applied Psychology, 61, 234–237.
Chase, L. J., & Tucker, R. K. (1975). A power-analytical examination of contemporary communication research. Speech Monographs, 42(1), 29–41.
Chase, L. J., & Tucker, R. K. (1976). Statistical power: Derivation, development, and data-analytic implications. The Psychological Record, 26, 473–486.
Cohen, J. (1962). The statistical power of abnormal-social psychological research: A review. Journal of Abnormal and Social Psychology, 65, 145–153.
Cohen, J. (1965). Some statistical issues in psychological research. In B. B. Wolman (Ed.), Handbook of clinical psychology (pp. 95–121). New York: McGraw-Hill.
Cohen, J. (1968). Multiple regression as a general data-analytic system. Psychological Bulletin, 70, 426–443.
Cohen, J. (1969). Statistical power analysis for the behavioral sciences. New York: Academic.
Cohen, J. (1973). Statistical power analysis and research results. American Educational Research Journal, 10, 225–230.
Cohen, J. (1977). Statistical power analysis for the behavioral sciences (Rev. ed.). New York: Academic.
Cohen, J. (1988). Statistical power analysis for the behavioral sciences (2nd ed.). Hillsdale, NJ: Lawrence Erlbaum Associates, Inc.
Cohen, J. (1992). A power primer. Psychological Bulletin, 112, 155–159.
Cohen, J. (1994). The earth is round (p < .05). American Psychologist, 49, 997–1003.
Cohen, J. (1997). The earth is round (p < .05). In L. L. Harlow, S. A. Mulaik, & J. H. Steiger (Eds.), What if there were no significance tests? (pp. 21–35). Mahwah, NJ: Lawrence Erlbaum Associates, Inc.
Cooper, H. M., & Findley, M. (1982). Expected effect sizes: Estimates for statistical power analysis in social psychology. Personality and Social Psychology Bulletin, 8, 168–173.
Cumming, G., & Finch, S. (2001). A primer on the understanding, use and calculation of confidence intervals that are based on central and noncentral distributions. Educational and Psychological Measurement, 61, 532–575.
Daly, J., & Hexamer, A. (1983). Statistical power in research in English education. Research in the Teaching of English, 17, 157–164.
Dayton, C. M., Schafer, W. D., & Rogers, B. G. (1973). On appropriate uses and interpretations of power analysis: A comment. American Educational Research Journal, 10, 231–234.
Erdfelder, E., Faul, F., & Buchner, A. (1996). GPOWER: A general power analysis program. Behavior Research Methods, Instruments, & Computers, 28, 1–11.
Fagley, N. S. (1985). Applied statistical power analysis and the interpretation of nonsignificant results by research consumers. Journal of Counseling Psychology, 32, 391–396.
Fagley, N. S., & McKinney, I. J. (1983). Reviewer bias for statistically significant results: A reexamination. Journal of Counseling Psychology, 30, 298–300.
Fidler, F. (2002). The fifth edition of the APA Publication Manual: Why its statistics recommendations are so controversial. Educational and Psychological Measurement, 62, 749–770.
Fisher, R. A. (1941). Statistical methods for research workers (8th ed.). Edinburgh, Scotland: Oliver & Boyd. (Original work published 1925)
Fleishman, A. I. (1980). Confidence intervals for correlation ratios. Educational and Psychological Measurement, 40, 659–670.
Greenwald, A. G. (1975). Consequences of prejudice against the null hypothesis. Psychological Bulletin, 82, 1–20.
Guttman, L. B. (1985). The illogic of statistical inference for cumulative science. Applied Stochastic Models and Data Analysis, 1, 3–10.
Haase, R. F. (1974). Power analysis of research in counselor education. Counselor Education and Supervision.
Haase, R. F., Waechter, D. M., & Solomon, G. S. (1982). How significant is a significant difference? Average effect size of research in counseling psychology. Journal of Counseling Psychology, 29, 58–65.
Halpin, R., & Easterday, K. E. (1999). The importance of statistical power for quantitative research: A post-hoc review. Louisiana Educational Research Journal, 24, 5–28.
Harris, R. J. (1997). Reforming significance testing via three-valued logic. In L. L. Harlow, S. A. Mulaik, & J. H. Steiger (Eds.), What if there were no significance tests? (pp. 145–174). Mahwah, NJ: Lawrence Erlbaum Associates, Inc.
Hedges, L. V., & Olkin, I. (1985). Statistical methods for meta-analysis. New York: Academic.
Hess, M. R., & Kromrey, J. D. (2002, April). Interval estimates of effect size: An empirical comparison of methods for constructing confidence bands around standardized mean differences. Paper presented at the annual meeting of the American Educational Research Association, New Orleans, LA.
Katzer, J., & Sodt, J. (1973). An analysis of the use of statistical testing in communication research. The Journal of Communication, 23, 251–265.
Keselman, H. J., Huberty, C. J., Lix, L. M., Olejnik, S., Cribbie, R. A., Donahue, B., et al. (1998). Statistical practices of educational researchers: An analysis of their ANOVA, MANOVA, and ANCOVA analyses. Review of Educational Research, 68, 350–386.
Kroll, R. M., & Chase, L. J. (1975). Communication disorders: A power analytical assessment of recent research. Journal of Communication Disorders, 8, 237–247.
Levenson, R. L. (1980). Statistical power analysis: Implications for researchers, planners, and practitioners in gerontology. Gerontologist, 20, 494–498.
Levin, J. R., & Robinson, D. H. (2000). Statistical hypothesis testing, effect-size estimation, and the conclusion coherence of primary research studies. Educational Researcher, 29(1), 34–36.
Loftus, G. R. (1996). Psychology will be a much better science when we change the way we analyze data. Current Directions in Psychological Science, 5, 161–171.
McNemar, Q. (1960). At random: Sense and nonsense. American Psychologist, 15, 295–300.
Meehl, P. E. (1967). Theory testing in psychology and physics: A methodological paradox. Philosophy of Science, 34, 103–115.
Meehl, P. E. (1978). Theoretical risks and tabular asterisks: Sir Karl, Sir Ronald, and the slow progress of soft psychology. Journal of Consulting and Clinical Psychology, 46, 806–834.
Morse, D. T. (2001, November). YAPP: Yet another power program. Paper presented at the annual meeting of the Mid-South Educational Research Association, Chattanooga, TN.
Mulaik, S. A., Raju, N. S., & Harshman, R. A. (1997). There is a time and a place for significance testing. In L. L. Harlow, S. A. Mulaik, & J. H. Steiger (Eds.), What if there were no significance tests? (pp. 65–115). Mahwah, NJ: Lawrence Erlbaum Associates, Inc.
Mundfrom, D. J., Shaw, D. G., Thomas, A., Young, S., & Moore, A. D. (1998, April). Introductory graduate research courses: An examination of the knowledge base. Paper presented at the annual meeting of the American Educational Research Association, San Diego, CA.
Murphy, K. R., & Myors, B. (1998). Statistical power analysis: A simple and general model for traditional and modern hypothesis tests. Mahwah, NJ: Lawrence Erlbaum Associates, Inc.
Neyman, J., & Pearson, E. S. (1928). On the use and interpretation of certain test criteria for the purposes of statistical inference. Biometrika, 20A, 175–240, 263–294.
Neyman, J., & Pearson, E. S. (1933a). On the problem of the most efficient tests of statistical hypotheses. Philosophical Transactions of the Royal Society of London, Series A, 231, 289–337.
Neyman, J., & Pearson, E. S. (1933b). Testing statistical hypotheses in relation to probabilities a priori. Proceedings of the Cambridge Philosophical Society, 29, 492–510.
Nix, T. W., & Barnette, J. J. (1998). The data analysis dilemma: Ban or abandon. A review of null hypothesis significance testing. Research in the Schools, 5, 3–14.
Onwuegbuzie, A. J. (2002). Common analytical and interpretational errors in educational research: An analysis of the 1998 volume of the British Journal of Educational Psychology. Educational Research Quarterly, 26, 11–22.
Onwuegbuzie, A. J. (2003). Expanding the framework of internal and external validity in quantitative research. Research in the Schools, 10, 71–90.
Onwuegbuzie, A. J., & Daniel, L. G. (2002). A framework for reporting and interpreting internal consistency reliability estimates. Measurement and Evaluation in Counseling and Development, 35, 89–103.
Onwuegbuzie, A. J., & Daniel, L. G. (2003, February 12). Typology of analytical and interpretational errors in quantitative and qualitative educational research. Current Issues in Education, 6(2) [Online]. Retrieved October 10, 2004, from http://cie.ed.asu.edu/volume6/number2/
Onwuegbuzie, A. J., & Daniel, L. G. (2004). Reliability generalization: The importance of considering sample specificity, confidence intervals, and subgroup differences. Research in the Schools, 11(1), 61–72.
Onwuegbuzie, A. J., & Levin, J. R. (2003). A new proposed three-step method for testing result direction. Manuscript in preparation.
Onwuegbuzie, A. J., & Levin, J. R. (2003). Without supporting statistical evidence, where would reported measures of substantive importance lead? To no good effect. Journal of Modern Applied Statistical Methods, 2, 133–151.
Onwuegbuzie, A. J., Levin, J. R., & Leech, N. L. (2003). Do effect-size measures measure up? A brief assessment. Learning Disabilities: A Contemporary Journal, 1, 37–40.
Onwuegbuzie, A. J., Witcher, A. E., Filer, J., Collins, K. M. T., & Moore, J. (2003). Factors associated with teachers' beliefs about discipline in the context of practice. Research in the Schools, 10(2), 35–44.
Orme, J. G., & Tolman, R. M. (1986). The statistical power of a decade of social work education research. Social Science Review, 60, 619–632.
Overall, J. E. (1969). Classical statistical hypotheses within the context of Bayesian theory. Psychological Bulletin, 71, 285–292.
Penick, J. E., & Brewer, J. K. (1972). The power of statistical tests in science teaching research. Journal of Research in Science Teaching, 9, 377–381.
Robinson, D. H., & Levin, J. R. (1997). Reflections on statistical and substantive significance, with a slice of replication. Educational Researcher, 26(5), 21–26.
Rosenthal, R. (1979). The file-drawer problem and tolerance for null results. Psychological Bulletin, 86, 638–641.
Rossi, J. S. (1990). Statistical power of psychological research: What have we gained in 20 years? Journal of Consulting and Clinical Psychology, 58, 646–656.
Rossi, J. S. (1997). A case study in the failure of psychology as a cumulative science: The spontaneous recovery of verbal learning. In L. L. Harlow, S. A. Mulaik, & J. H. Steiger (Eds.), What if there were no significance tests? (pp. 175–197). Mahwah, NJ: Lawrence Erlbaum Associates, Inc.
Rozeboom, W. W. (1960). The fallacy of the null hypothesis significance test. Psychological Bulletin, 57, 416–428.
SAS Institute, Inc. (2002). SAS/STAT User's Guide (Version 8.2) [Computer software]. Cary, NC: Author.
Sawyer, A. G., & Ball, A. D. (1981). Statistical power and effect size in marketing research. Journal of Marketing Research, 18, 275–290.
Schmidt, F. L. (1992). What do data really mean? American Psychologist, 47, 1173–1181.
Schmidt, F. L. (1996). Statistical significance testing and cumulative knowledge in psychology: Implications for the training of researchers. Psychological Methods, 1, 115–129.
Schmidt, F. L., & Hunter, J. E. (1997). Eight common but false objections to the discontinuation of significance testing in the analysis of research data. In L. L. Harlow, S. A. Mulaik, & J. H. Steiger (Eds.), What if there were no significance tests? (pp. 37–64). Mahwah, NJ: Lawrence Erlbaum Associates, Inc.
Schmidt, F. L., Hunter, J. E., & Urry, V. E. (1976). Statistical power in criterion-related validation studies. Journal of Applied Psychology, 61, 473–485.
Sedlmeier, P., & Gigerenzer, G. (1989). Do studies of statistical power have an effect on the power of studies? Psychological Bulletin, 105, 309–316.
Sherron, R. H. (1988). Power analysis: The other half of the coin. Community/Junior College Quarterly, 12, 169–175.
SPSS, Inc. (2001). SPSS 11.0 for Windows [Computer software]. Chicago: Author.
Steiger, J. H., & Fouladi, R. T. (1992). R2: A computer program for interval estimation, power calculation, and hypothesis testing for the squared multiple correlation. Behavior Research Methods, Instruments, and Computers, 24, 581–582.
Steiger, J. H., & Fouladi, R. T. (1997). Noncentrality interval estimation and the evaluation of statistical models. In L. L. Harlow, S. A. Mulaik, & J. H. Steiger (Eds.), What if there were no significance tests? (pp. 221–257). Mahwah, NJ: Lawrence Erlbaum Associates, Inc.
Thompson, B. (2002). What future quantitative social science research could look like: Confidence intervals for effect sizes. Educational Researcher, 31(3), 25–32.
Tversky, A., & Kahneman, D. (1971). Belief in the law of small numbers. Psychological Bulletin, 76, 105–110.
Wilkinson, L., & The Task Force on Statistical Inference. (1999). Statistical methods in psychology journals: Guidelines and explanations. American Psychologist, 54, 594–604.
Witcher, A. E., Onwuegbuzie, A. J., & Minor, L. C. (2001). Characteristics of effective teachers: Perceptions of preservice teachers. Research in the Schools, 8(2), 45–57.
Wolfgang, D. H., & Glickman, C. D. (1986). Solving discipline problems: Strategies for classroom
teachers (2nd ed.). Boston: Allyn & Bacon.
Wooley, T. W., & Dawson, G. O. (1983). A follow-up power analysis of the statistical tests used in the Journal of Research in Science Teaching. Journal of Research in Science Teaching, 20, 673–681.