post hoc power
TRANSCRIPT
8/2/2019 Post Hoc Power
Post Hoc Power: A Concept Whose Time Has Come
Anthony J. Onwuegbuzie
Department of Educational Measurement and Research, University of South Florida
Nancy L. Leech
Department of Educational Psychology
University of Colorado at Denver and Health Sciences
This article advocates the use of post hoc power analyses. First, reasons for the nonuse of a priori power analyses are presented. Next, post hoc power is defined and its utility delineated. Third, a step-by-step guide is provided for conducting post hoc power analyses. Fourth, a heuristic example is provided to illustrate how post hoc power can help to rule in/out rival explanations in the presence of statistically nonsignificant findings. Finally, several methods are outlined that describe how post hoc power analyses can be used to improve the design of independent replications.

power, a priori power, post hoc power, significance testing, effect size
For more than 75 years, null hypothesis significance testing (NHST) has dominated the quantitative paradigm, stemming from the seminal works of Fisher (1925/1941)
and Neyman and Pearson (1928). NHST was designed to provide a means of ruling
out a chance finding, thereby reducing the chance of falsely rejecting the null hy-
pothesis in favor of the alternative hypothesis (i.e., committing a Type I error).
Although NHST has permeated the behavioral and social science field since its
inception, its practice has been subjected to severe criticism with the number of
critics growing throughout the years. The most consistent criticism of NHST that
has emerged is that statistical significance is not synonymous with practical importance. More specifically, statistical significance does not provide any information about how important or meaningful an observed finding is (e.g., Bakan, 1966; Cahan, 2000; Carver, 1978, 1993; Cohen, 1994, 1997; Guttman, 1985; Loftus, 1996; Meehl, 1967, 1978; Nix & Barnette, 1998; Onwuegbuzie & Daniel, 2003; Rozeboom, 1960; Schmidt, 1992, 1996; Schmidt & Hunter, 1997).

UNDERSTANDING STATISTICS, 3(4), 201–230. Copyright 2004, Lawrence Erlbaum Associates, Inc.
As a result of this limitation of NHST, some researchers (e.g., Carver, 1993)
contend that effect sizes, which represent measures of practical importance,
should replace statistical significance testing completely. However, reporting and
interpreting only effect sizes could lead to the overinterpretation of a finding
(Onwuegbuzie & Levin, 2003; Onwuegbuzie, Levin, & Leech, 2003). As noted by
Robinson and Levin (1997):

Although effect sizes speak loads about the magnitude of a difference or relationship, they are, in and of themselves, silent with respect to the probability that the estimated difference or relationship is due to chance (sampling error). Permitting authors to promote and publish seemingly interesting or unusual outcomes when it can be documented that such outcomes are not really that unusual would open the publication floodgates to chance occurrences and other strange phenomena. (p. 25)
Moreover, interpreting effect sizes in the absence of an NHST could adversely affect conclusion validity, just as interpreting p values without interpreting effect sizes could lead to conclusion invalidity (Onwuegbuzie, 2001). Table 1 illustrates the four possible outcomes and their conclusion validity when combining p values and
effect sizes. This table echoes the decision table that demonstrates the relationship
between Type I error and Type II error. It can be seen from Table 1 that conclusion
validity occurs when (a) both statistical significance (e.g., p < .05) and a large effect size prevail or (b) both statistical nonsignificance (e.g., p > .05) and a small effect size occur. Conversely, conclusion invalidity exists if (a) statistical nonsignificance (e.g., p > .05) is combined with a large effect size (i.e., Type A error) or (b) statistical significance (e.g., p < .05) is combined with a small effect size (i.e., Type B error). Both Type A and Type B errors suggest that any declaration on the part of the researcher that the observed finding is meaningful would be misleading and hence result in conclusion invalidity (Onwuegbuzie, 2001). In this
respect, conclusion validity is similar to what Levin and Robinson (2000) termed "conclusion coherence." Levin and Robinson (2000) defined conclusion coherence as "consistency between the hypothesis-testing and estimation phases of the decision-oriented empirical study" (p. 35).
Therefore, as can be seen in Table 1, always interpreting effect sizes regardless of whether statistical significance is observed may lead to a Type A or Type B error being committed. Similarly, Type A and Type B errors also prevail when NHST is used by itself. In fact, conclusion validity is maximized only if both p values and effect sizes are utilized. Indeed, Robinson and Levin (1997) proposed
a two-step process for making statistical inferences. According to this model, a
statistically significant observed finding is followed by the reporting and interpreting of one or more indexes of practical importance; however, no effect sizes are reported in light of a statistically nonsignificant finding. In other words, analysts
should determine first whether the observed result is statistically significant (Step
1), and if and only if statistical significance is found, then they should report how
large or important the observed finding is (Step 2). In this way, the statistical sig-
nificance test in Step 1 serves as a gatekeeper for the reporting and interpreting
of effect sizes in Step 2.
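As a sketch, the gatekeeper logic of the two-step process can be expressed in a few lines of Python. The function name, the alpha default, and the report strings are illustrative assumptions, not part of the procedure as published:

```python
# A minimal sketch of the two-step inference process: statistical
# significance (Step 1) acts as a gatekeeper for reporting and
# interpreting the effect size (Step 2).

def two_step_report(p_value: float, effect_size: float, alpha: float = 0.05) -> str:
    # Step 1: the significance test is the gatekeeper.
    if p_value >= alpha:
        return "Not statistically significant; no effect size reported."
    # Step 2: only now is the magnitude of the finding reported.
    return f"Statistically significant (p = {p_value:.3f}); effect size = {effect_size:.2f}."

print(two_step_report(0.20, 0.45))
print(two_step_report(0.01, 0.45))
```

Note that this is exactly the structure the article goes on to question: when power is low, Step 1 can block the reporting of a nontrivial effect.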
This two-step process is indirectly endorsed by the latest edition of the Publica-
tion Manual of the American Psychological Association (American Psychological Association [APA], 2001):

When reporting inferential statistics (e.g., t tests, F tests, and chi-square), include [italics added] information about the obtained magnitude or value of the test statistic, the degrees of freedom, the probability of obtaining a value as extreme as or more extreme than the one obtained, and the direction of the effect. Be sure to include [italics added] sufficient descriptive statistics (e.g., per-cell sample size, means, correlations, standard deviations) so that the nature of the effect being reported can be understood by the reader and for future meta-analyses. (p. 22)
Three pages later, the APA (2001) stated:
Neither of the two types of probability value directly reflects the magnitude of an ef-
fect or the strength of a relationship. For the reader to fully understand the importance
of your findings, it is almost always necessary to include some index of effect size or
strength of relationship in your Results section. (p. 25)
On the following page, APA (2001) stated that
The general principle to be followed, however, is to provide the reader not only with
TABLE 1
Possible Outcomes and Conclusion Validity When Null Hypothesis Significance Testing
and Effect Sizes Are Combined

                                           Effect Size
p Value                            Large                  Small
Not statistically significant      Type A error           Conclusion validity
Statistically significant          Conclusion validity    Type B error
Most recently, Onwuegbuzie and Levin (2003) proposed a three-step procedure
when two or more hypothesis tests are conducted within the same study, which in-
volves testing the trend of the set of hypotheses at the third step. Using either the two-step method or the three-step method helps to reduce not only the probability
of committing a Type A error but also the probability of committing a Type B er-
ror, namely, declaring as important a statistically significant finding with a small
effect size (Onwuegbuzie, 2001). However, whereas Type B error almost certainly
will be reduced by using one of these methods compared to using NHST alone, the
reduction in the probability of Type A error is not guaranteed using these proce-
dures. This is because if statistical power is lacking, then the first step of the
two-step method and the first and third steps of the three-step procedure, which
serve as gatekeepers for computing effect sizes, may lead to the nonreporting of a nontrivial effect (i.e., Type A error). Simply put, sample sizes that are too small increase the probability of a Type II error (not rejecting a false null hypothesis) and subsequently increase the probability of committing a Type A error.
Clearly, both error probabilities can be reduced if researchers conduct a priori
power analyses to select appropriate sample sizes. However, unfortunately, such
analyses are rarely employed (Cohen, 1992; Keselman et al., 1998; Onwuegbuzie,
2002). When a priori power analyses have been omitted, researchers should con-
duct post hoc power analyses, especially for nonstatistically significant findings.
This would help researchers determine whether low power threatens the internal validity of findings (i.e., Type A error). Yet, researchers typically have not used
this technique.
Thus, in this article, we advocate the use of post hoc power analyses. First, we
present reasons for the nonuse of a priori power analyses. Next, we define post hoc
power and delineate its utility. Third, we provide a step-by-step guide for conduct-
ing post hoc power analyses. Fourth, we provide a heuristic example to illustrate
how post hoc power can help to rule in/out rival explanations in the presence of statistically nonsignificant findings. Finally, we outline several methods that describe how post hoc power analyses can be used to improve the design of independent replications.
DEFINITION OF STATISTICAL POWER
Neyman and Pearson (1933a, 1933b) were the first to discuss the concepts of Type I and Type II error. Type I error occurs when the researcher rejects the null hypothesis when it is true. As noted previously, the Type I error probability is determined by the significance level (α). For example, if a 5% level of significance is designated, then the Type I error rate is 5%. Stated another way, α represents the conditional
probability of rejecting the null hypothesis when decisions are made over repeated samples from the same population under the same null and alternative hypothesis, assuming the null hypothesis is true. Conversely, Type II error occurs when the analyst accepts the null hypothesis when the alternative hypothesis is true. The conditional probability of making a Type II error under the alternative hypothesis is denoted by β.

Statistical power is the conditional probability of rejecting the null hypothesis
(i.e., accepting the alternative hypothesis) when the alternative hypothesis is true.
The most common definition of power comes from Cohen (1988), who defined the power of a statistical test as "the probability [assuming the null hypothesis is false] that it will lead to the rejection of the null hypothesis, i.e., the probability that it will result in the conclusion that the phenomenon exists" (p. 4). Power can be viewed as how likely it is that the researcher will find a relationship or difference that really prevails. It is given by 1 − β.
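For a simple case, power has a closed form. The sketch below, using only the Python standard library, computes 1 − β for a two-sided one-sample z test; the effect size d = 0.5 and the sample sizes are illustrative values, not values taken from the article:

```python
from statistics import NormalDist
from math import sqrt

def z_test_power(d: float, n: int, alpha: float = 0.05) -> float:
    """Power (1 - beta) of a two-sided one-sample z test for
    standardized effect size d and sample size n."""
    z = NormalDist()
    z_crit = z.inv_cdf(1 - alpha / 2)   # critical value of the test
    shift = d * sqrt(n)                 # mean of the test statistic under H1
    # Probability that the test statistic lands in the rejection
    # region when the alternative hypothesis is true:
    return z.cdf(-z_crit + shift) + z.cdf(-z_crit - shift)

# Power grows toward 1 as the sample size increases (d = 0.5 assumed):
for n in (10, 30, 100):
    print(n, round(z_test_power(0.5, n), 2))
```

A useful sanity check on such a function: when d = 0, the "power" collapses to α, the Type I error rate.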
Statistical power estimates are affected by three factors.1 The first factor is level
of significance. Holding all other aspects constant, increasing the level of signifi-
cance increases power but also increases the probability of rejecting the null hy-
pothesis when it is true. The second influential factor is the effect size. Specifically, the larger the difference between the value of the parameter under the null hypothesis and the value of the parameter under the alternative hypothesis, the greater the power to detect it. The third instrumental component is the sample size. The larger the sample size, the greater the likelihood of rejecting the null hypothesis2 (Chase & Tucker, 1976; Cohen, 1965, 1969, 1988, 1992). However, caution is needed due
to the fact that large sample sizes can increase power to the point of rejecting the
null hypothesis when the associated observed finding is not practically important.
Cohen (1965), in accordance with McNemar (1960), recommended a probabil-
ity of .80 or greater for correctly rejecting the null hypothesis representing a me-
dium effect at the 5% level of significance. This recommendation was based on
considering the ratio of the probability of committing a Type I error (i.e., 5%) to
the probability of committing a Type II error (i.e., 1 − .80 = .20). In this case, the ratio was 1:4, reflecting the contention that Type I errors are generally more serious than are Type II errors.
Power, level of significance, effect size, and sample size are related such that any one of these components is a function of the other three components. As noted by Cohen (1988), when any three of them are fixed, "the fourth is completely determined" (p. 14). Thus, there are four possible types of power analyses in which one of the parameters is determined as a function of the other three as follows: (a) power as a function of level of significance, effect size, and sample size; (b) effect size as a function
1Recently, Onwuegbuzie and Daniel (2004) demonstrated empirically that power also is affected by score reliability. Specifically, Onwuegbuzie and Daniel showed that the higher the score reliability,
of level of significance, sample size, and power; (c) level of significance as a func-
tion of sample size, effect size, and power; and (d) sample size as a function of level of significance, effect size, and power (Cohen, 1965, 1988). The latter type of power analysis is the most popular and most useful for planning research studies (Cohen, 1992). This form of power analysis, which is called an a priori power analysis, helps the researcher to ascertain the sample size necessary to obtain a desired level of power for a specified effect size and level of significance. Conventionally, most researchers set the power coefficient at .80 and the level of significance at .05. Thus, once the expected effect size and type of analysis are specified, then the sample size needed to meet all specifications can be determined.
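For the z test, this fourth type of power analysis (sample size as a function of the other three parameters) has a well-known closed form, n = ((z_alpha + z_beta) / d)². The standard-library Python sketch below illustrates that formula with the conventional inputs just described; it is an approximation for a two-sided one-sample z test, not Cohen's (1988) tabled procedure:

```python
from statistics import NormalDist
from math import ceil

def a_priori_n(d: float, power: float = 0.80, alpha: float = 0.05) -> int:
    """Smallest n giving the desired power for a two-sided one-sample
    z test with standardized effect size d (closed-form approximation)."""
    z = NormalDist()
    z_alpha = z.inv_cdf(1 - alpha / 2)  # two-sided critical value
    z_beta = z.inv_cdf(power)           # quantile for the desired power
    return ceil(((z_alpha + z_beta) / d) ** 2)

# Conventional inputs: power = .80, alpha = .05, medium effect d = .5:
print(a_priori_n(0.5))  # → 32
```

As expected, halving the anticipated effect size roughly quadruples the required sample size, which is why overestimating the effect at the planning stage is so costly.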
The value of an a priori power analysis is that it helps the researcher in planning
research studies (Sherron, 1988). By conducting such an analysis, researchers put themselves in the position to select a sample size that is large enough to lead to a
rejection of the null hypothesis for a given effect size. Alternatively stated, a priori
power analyses help researchers to obtain the necessary sample sizes to reach a de-
cision with adequate power. Indeed, the optimum time to conduct a power analysis
is during the research design phase (Wooley & Dawson, 1983).
Failing to consider statistical power can have dire consequences for research-
ers. First and foremost, low statistical power reduces the probability of rejecting
the null hypothesis and therefore increases the probability of committing a Type II
error (Bakan, 1966; Cohen, 1988). It may also increase the probability of committing a Type I error (Overall, 1969), yield misleading results in power studies (Chase & Tucker, 1976), and prevent potentially important studies from being published as a result of publication bias (Greenwald, 1975), which has been called the "file-drawer problem," representing the tendency to keep statistically nonsignificant results in file drawers (Rosenthal, 1979).
It has been exactly 40 years since Jacob Cohen (1962) conducted the first sur-
vey of power. In this seminal work, Cohen assessed the power of studies published
in the abnormal-social psychology literature. Using the reported sample size and a
nondirectional significance level of 5%, Cohen calculated the average power to detect a hypothesized effect (i.e., hypothesized power) across the 70 selected studies for nine frequently used statistical tests using small, medium, and large estimated effect size values. The average power of the 2,088 major statistical tests was .18, .48, and .83 for detecting a small, medium, and large effect size, respec-
tively. The average hypothesized statistical power of .48 for medium effects indi-
cated that studies in the abnormal psychology field had, on average, less than a
50% chance of correctly rejecting the null hypothesis (Brewer, 1972; Halpin &
Easterday, 1999).
During the three decades after Cohen's (1962) investigation, several researchers conducted hypothetical power surveys across a myriad of disci-
Owen, 1973), communication (Chase & Tucker, 1975; Katzer & Sodt, 1973),
communication disorders (Kroll & Chase, 1975), mass communication (Chase &
Baran, 1976), counselor education (Haase, 1974), social work education (Orme & Tolman, 1986), science education (Penick & Brewer, 1972; Wooley & Dawson,
1983), English education (Daly & Hexamer, 1983), gerontology (Levenson,
1980), marketing research (Sawyer & Ball, 1981), and mathematics education
(Halpin & Easterday, 1999). The average hypothetical power of these 15 studies
was .24, .63, and .85 for small, medium, and large effects, respectively. Assuming
that a medium effect size is appropriate for use in most studies because of its com-
bination of being practically meaningful and realistic (Cohen, 1965; Cooper &
Findley, 1982; Haase, Waechter, & Solomon, 1982), the average power of .63
across these studies is disturbing. Similarly disturbing is the average hypothesized power of .64 for a medium effect reported by Rossi (1990) across 25 power sur-
veys involving more than 1,500 journal articles and 40,000 statistical tests.
An even more alarming picture is painted by Schmidt and Hunter (1997), who reported that "the average [hypothesized] power of null hypothesis significance tests in typical studies and research literature is in the .40 to .60 range (Cohen, 1962, 1965, 1988, 1992; Schmidt, 1996; Schmidt, Hunter, & Urry, 1976; Sedlmeier & Gigerenzer, 1989) [with] .50 as a rough average" (p. 40). Unfortu-
nately, an average hypothetical power of .5 indicates that more than one half of all
statistical tests in the social and behavioral science literature will be statistically nonsignificant. As noted by Schmidt and Hunter (1997), "This level of accuracy is so low that it could be achieved just by flipping a (unbiased) coin!" (p. 40). Yet, the
fact that power is unacceptably low in most studies suggests that misuse of NHST
is to blame, not the logic of NHST. Moreover, the publication bias that prevails in
research suggests that the hypothetical power estimates provided previously likely
represent an upper bound. Thus, as declared by Rossi (1997), "it is possible that at least some controversies in the social and behavioral sciences may be artifactual in nature" (p. 178). Indeed, it can be argued that low statistical power represents more of a research design issue than a statistical issue because it can be rectified by using a larger sample.
Bearing in mind the importance of conducting statistical power analyses, it is
extremely surprising that very few researchers conduct and report power analyses
for their studies (Brewer, 1972; Cohen, 1962, 1965, 1988, 1992; Keselman et al.,
1998; Onwuegbuzie, 2002; Sherron, 1988) even though statistical power has been
promoted actively since the 1960s (Cohen, 1962, 1965, 1969) and even though for
many types of statistical analyses (e.g., r, z, F, χ2), tables have been provided by Cohen (1988, 1992) to determine the necessary sample size. Even when a priori power has been calculated, it is rarely reported (Wooley & Dawson, 1983). This lack of power analyses still prevails despite the recommendations of the APA
The lack of use of power analysis might be the result of one or more of the fol-
lowing factors. First and foremost, evidence exists that statistical power is not suf-
ficiently understood by researchers (Cohen, 1988, 1992). Second, it appears that the concept and applications of power are not taught in many undergraduate- and
graduate-level statistical courses. Moreover, when power is taught, it is likely that
inadequate coverage is given. Disturbingly, Mundfrom, Shaw, Thomas, Young,
and Moore (1998) reported that the issue of statistical power is regarded by in-
structors of research methodology, statistics, and measurement as being only the
34th most important topic in their fields out of the 39 topics presented. Also in the
Mundfrom et al. study, power received the same low ranking with respect to cover-
age in the instructors classes. Clearly, if power is not being given a high status in
quantitative-based research courses, then students similarly will not take it seriously. In any case, these students will not be suitably equipped to conduct such
analyses.
Another reason for the spasmodic use of statistical power possibly stems from
the incongruency between endorsement and practice. For instance, although the APA (2001) stipulated that power analyses be conducted, the manual, despite providing several NHST examples, does not provide any examples of how to report statistical power (Fidler, 2002). Harris (1997) also provided an additional rationale
for the lack of power analyses:
I suspect that this low rate of use of power analysis is largely due to the lack of proportionality between the effort required to learn and execute power analyses (e.g., dealing with noncentral distributions or learning the appropriate effect-size measure with which to enter the power tables in a given chapter of Cohen, 1977) and the low payoff from such an analysis (e.g., the high probability that resource constraints will force you to settle for a lower N than your power analysis says you should have), especially given the uncertainties involved in a priori estimates of effect sizes and standard deviations, which render the resulting power calculation rather suspect. If calculation of the sample size needed for adequate power and for choosing between alternative interpretations of a nonsignificant result could be made more nearly equal in difficulty to the effort we've grown accustomed to putting into significance testing itself, more of us might in fact carry out these preliminary and supplementary analyses. (p. 165)
A further reason why a priori power analyses are not conducted likely stems
from the fact that the most commonly used statistical packages, such as the Statis-
tical Package for the Social Sciences (SPSS; SPSS Inc., 2001) and the Statistical
Analysis System (SAS Institute Inc., 2002), do not allow researchers directly to
conduct power analyses. Furthermore, the statistical software programs that con-
duct power analyses (e.g., Erdfelder, Faul, & Buchner, 1996; Morse, 2001), although extremely useful, typically do not conduct other types of analyses, and can be expen-
sive. Even when researchers have power software in their possession, the lack of
information regarding components needed to calculate power (e.g., effect size,
variance) serves as an additional impediment to a priori power analyses. It is likely that the lack of power analyses coupled with a publication bias pro-
mulgates the publishing of findings that are statistically significant but have small
effect sizes (possibly increasing Type B error) as well as leading researchers to
eliminate valuable hypotheses (Halpin & Easterday, 1999). Thus, we recommend
in the strongest possible manner that all quantitative researchers conduct a priori
power analyses whenever possible. These analyses should be reported in the
Method section of research reports. This report also should include a rationale for
criteria used for all input variables (i.e., power, significance level, effect size;
APA, 2001; Cohen, 1973, 1988). Inclusion of such analyses will help researchers to make optimum choices on the components (e.g., sample size, number of vari-
ables studied) needed to design a trustworthy study.
POST HOC POWER ANALYSES
Whether an a priori power analysis is undertaken and reported, problems can still
arise. One problem that commonly occurs in educational research is when the study is completed and a statistically nonsignificant result is found. In many cases, the researcher then disregards the study (i.e., file-drawer problem) or, when he or she submits the final report to a journal for review, finds it is rejected (i.e., publication
bias). Unfortunately, most researchers do not determine whether the statistically
nonsignificant result is the result of insufficient statistical power. That is, without
knowing the power of the statistical test, it is not possible to rule in or rule out low
statistical power as a threat to internal validity (Onwuegbuzie, 2003). Nor can an a
priori power analysis necessarily rule in/out this threat. This is because a priori
power analyses involve the use of a priori estimates of effect sizes and standard de-
viations (Harris, 1997). As such, a priori power analyses do not represent the power to detect the observed effect of the ensuing study; rather, they represent the power to detect hypothesized effects. Before the study is conducted, researchers do not know what the observed effect size will be. All they can do is try to estimate it based
on previous research and theory (Wilkinson & the Task Force on Statistical Infer-
ence, 1999). The observed effect size could end up being much smaller or much
larger than the hypothesized effect size on which the power analysis is undertaken.
(Indeed, this is a criticism of the power surveys highlighted previously; Mulaik,
Raju, & Harshman, 1997.) In particular, if the observed effect size is smaller than
what is proposed, the sample size yielded by the a priori power analysis might be
smaller than is needed to detect it. In other words, a smaller effect size than anticipated increases the chances of Type II error.
gate the performance of an NHST (Mulaik et al., 1997; Schmidt, 1996; Sherron,
1988). Such a technique leads to what is often called a post hoc power analysis. In-
terestingly, several authors have recommended the use of post hoc power analysesfor statistically nonsignificant findings (Cohen, 1969; Dayton, Schafer, & Rogers,
1973; Fagely, 1985; Fagley & McKinney, 1983; Sawyer & Ball, 1981; Wooley &
Dawson, 1983).
When post hoc power should be reported has been the subject of debate. Al-
though some researchers advocate that post hoc power always be reported (e.g.,
Wooley & Dawson, 1983), the majority of researchers advocate reporting post hoc
power only for statistically nonsignificant results (Cohen, 1965; Fagely, 1985;
Fagley & McKinney, 1983; Sawyer & Ball, 1981). However, both sets of analysts
have agreed that estimating the power of significance tests that yield statistically nonsignificant findings plays an important role in their interpretations (e.g.,
Fagely, 1985; Fagley & McKinney, 1983; Sawyer & Ball, 1981; Tversky &
Kahneman, 1971). Specifically, statistically nonsignificant results in a study with
low power suggest ambiguity. Conversely, statistically nonsignificant results in a
study with high power contribute to the body of knowledge because power can be
ruled out as a threat to internal validity (e.g., Fagely, 1985; Fagley & McKinney,
1983; Sawyer & Ball, 1981; Tversky & Kahneman, 1971). To this end, statistically
nonsignificant results can make a greater contribution to the research community
than they presently do. As noted by Fagely (1985), "Just as rejecting the null does not guarantee large and meaningful effects, accepting the null does not preclude interpretable results" (p. 392).
Conveniently, post hoc power analyses can be conducted relatively easily be-
cause some of the major statistical software programs compute post hoc power es-
timates. In fact, post hoc power coefficients are available in SPSS for the general
linear model (GLM). Post hoc power (or "observed power," as it is called in the SPSS output) is based on "taking the observed effect size as the assumed population effect, which produces a positively biased but consistent estimate of the effect" (D. Nichols,3 personal communication, November 4, 2002). For example, the post hoc power procedure for analyses of variance (ANOVAs) and multiple
ANOVAs is contained within the options button.4 It should be noted that due to
sampling error, the observed effect size used to compute the post hoc power esti-
mate might be very different than the true (population) effect size, culminating in a
misleading evaluation of power.
3David Nichols is a Principal Support Statistician and Manager of Statistical Support, SPSS Technical Support.
4Details of the post hoc power computations in the statistical algorithms for the GLM and
PROCEDURE FOR CONDUCTING POST HOC
POWER ANALYSES
Although post hoc power analysis procedures exist for multivariate tests of statisti-
cal significance such as multivariate analysis of variance and multivariate analysis
of covariance (cf. Cohen, 1988), these methods are sufficiently complex to warrant
use of statistical software. Thus, for this article, we restrict our attention to post hoc
power analysis procedures for univariate tests of statistical significance. What fol-
lows is a description of how to undertake a post hoc power analysis by hand for
univariate tests, that is, for situations in which there is one quantitative dependent
variable and one or more independent variables. Here, we use Cohen's (1988) framework for conducting post hoc power for multiple regression and correlation analysis. Furthermore, these steps can also be used for a priori power analyses by
substituting in a hypothesized effect size value instead of an observed effect size
value. As noted by Cohen (1988), the advantage of this framework is that it can be
used for any type of relationship between the independent variable(s) and depend-
ent variable (e.g., linear or curvilinear, whole or partial, general or partial). Further-
more, the independent variables can be quantitative or qualitative (i.e., nominal
scale), direct variates or covariates, and main effects or interactions. Also, the inde-
pendent variables can represent ex post facto research or retrospective research and can be naturally occurring or the results of experimental manipulation. Most important, however, because multiple regression subsumes all other univariate statistical
techniques as special cases (Cohen, 1968), this post hoc power analysis framework
can be used for other members of the univariate GLM, including correlation analy-
sis, t tests, ANOVA, and analysis of covariance.
The F test can be used for all univariate members of the GLM to test the null hypothesis that the proportion of variance in the dependent variable that is accounted for by one or more independent variables (PVi) is zero in the population. This F test
can be expressed as

F = (PVi / dfi) / (PVe / dfe),    (1)

where PVi is the proportion of the variance in the dependent variable explained by the set of independent variables, PVe is the proportion of error variance, dfi is the degrees of freedom for the numerator (i.e., the number of independent variables in the set), and dfe is the degrees of freedom for the denominator (i.e., degrees of freedom for the error variance). Equation 1 can be rewritten as

F = (PVi / PVe) × (dfe / dfi).    (2)
The left-hand term in Equation 2, which represents the proportion of variance in the dependent variable that is accounted for by the set of independent variables, is a measure of the effect size (ES) in the sample. The right-hand term in Equation 2, namely, the study size or sensitivity of the statistical test, provides information about the sample size and number of independent variables. Thus, degree of statistical significance is a function of the product of the effect size and the study size (Cohen, 1988).
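This decomposition can be checked numerically. The short Python sketch below computes F both ways; the values for the explained variance (R² = .13), the number of predictors, and the sample size are illustrative, and it assumes the whole-model case in which PVe = 1 − PVi:

```python
def f_from_components(pv_i: float, df_i: int, df_e: int) -> float:
    """F statistic per Equation 1: (PVi/dfi) / (PVe/dfe)."""
    pv_e = 1.0 - pv_i                      # error variance (whole-model case)
    return (pv_i / df_i) / (pv_e / df_e)

# Illustrative case: R^2 = .13 with 3 predictors and N = 104,
# so dfi = 3 and dfe = 104 - 3 - 1 = 100.
pv_i, df_i, df_e = 0.13, 3, 100
f1 = f_from_components(pv_i, df_i, df_e)

# Equation 2: the same F as effect size (PVi/PVe) times study size (dfe/dfi).
es = pv_i / (1 - pv_i)
f2 = es * (df_e / df_i)
print(round(f1, 3), round(f2, 3))  # the two forms agree
```

The example makes the point in the text concrete: doubling dfe (roughly, doubling the sample) doubles F without any change in the effect size itself.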
Because most theoretically selected independent variables have a nonzero relationship (i.e., nonzero effect size) with dependent variables, the distribution of F values in any particular investigation likely takes the form of a noncentral F distribution (Murphy & Myors, 1998). Thus, the (post hoc) power of a statistical significance test can be defined as "the proportion of the noncentral distribution that exceeds the critical value used to define statistical significance" (Murphy & Myors, 1998, p. 24). The shape of the noncentral F distribution as well as its range are a function of both the noncentrality parameter and the respective degrees of freedom (i.e., dfi and dfe). The noncentrality parameter, which is an index that determines the extent to which the null hypothesis is false, is given by
= ES(dfi + dfe + 1), (3)
where ES, the effect size, is given by

ES = PVi / PVe,    (4)

and dfi and dfe are the numerator and denominator degrees of freedom, respectively. Once the noncentrality parameter, λ, has been calculated, Table 2 or Table 3, depending on the level of α, can be used to determine the post hoc power of the statistical test. Specifically, these tables are entered for values of α, λ, dfi, and dfe.
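Equations 3 and 4 are straightforward to script; this sketch computes ES and λ, using the values from the article's later heuristic example (ES ≈ .0107 with dfi = 1, dfe = 199) purely as illustration.

```python
# Sketch of Equations 3 and 4. The demonstration values mirror the
# article's heuristic example and are illustrative only.

def effect_size(pv_i, pv_e):
    """Equation 4: ES = PVi / PVe."""
    return pv_i / pv_e

def noncentrality(es, df_i, df_e):
    """Equation 3: lambda = ES * (dfi + dfe + 1)."""
    return es * (df_i + df_e + 1)

lam = noncentrality(0.0107, 1, 199)
print(round(lam, 2))  # 2.15
```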
Step 1: Statistical Significance Criterion (α)

Table 2 represents α = .05, and Table 3 represents α = .01. (Interpolation and extrapolation techniques can be used for values between .01 and .05 and outside these limits, respectively.)
Step 2: The Noncentrality Parameter (λ)

In Tables 2 and 3, power values are presented for 15 λ values. Because λ is a continuous variable, linear interpolation typically is needed. According to Cohen (1988),
TABLE 2
Power of the F Test As a Function of λ, dfi, and dfe

dfi dfe λ: 2 4 6 8 10 12 14 16 18 20 24 28 32 36 40
1 20 27 48 64 77 85 91 95 97 98 99 *
60 29 50 67 79 88 92 96 98 99 99 *
120 29 51 68 80 88 93 96 98 99 99 *
29 52 69 81 89 93 96 98 99 99 *
2 20 20 36 52 65 75 83 88 92 95 97 99 *
60 22 40 56 69 79 87 91 95 97 98 *
120 22 41 57 71 80 87 92 95 97 98 *
23 42 58 72 82 88 93 96 97 99 *
3 20 17 30 44 56 67 75 82 87 91 94 97 99 *
60 19 34 49 62 73 81 87 92 95 97 98 *
120 19 35 50 64 75 83 89 93 95 97 99 *
19 36 52 65 76 84 90 93 96 98 99 *
4 20 15 26 38 49 60 69 76 83 87 91 95 98 99 *
60 17 30 44 57 68 77 83 89 92 95 98 99 *
120 17 31 46 58 70 78 85 90 93 96 98 99 *
17 32 47 60 72 80 87 91 94 96 99 *
5 20 13 23 34 44 54 63 71 78 83 87 93 96 98 99 *
60 15 27 40 52 63 72 80 86 90 93 97 99 *
120 16 29 41 54 65 75 82 87 91 94 98 99 *
16 29 43 56 68 77 84 89 93 95 98 99 *
6 20 12 21 30 40 50 59 66 73 79 84 91 95 97 99 *
60 14 25 37 48 59 68 76 83 87 91 96 98 99 *
120 14 27 39 50 62 71 79 85 89 93 97 99 99 *
15 27 40 53 64 74 81 87 91 94 97 99 *
7 20 11 19 28 37 46 54 62 69 75 80 88 93 96 98 99
60 17 24 35 45 56 65 73 80 85 89 94 97 99 99 *
120 13 25 37 47 59 68 76 82 87 91 96 98 99 *
14 25 38 50 61 71 79 85 89 93 97 99 99 *
8 20 10 18 26 34 42 50 58 65 71 76 85 91 94 97 98
60 12 23 33 43 52 62 70 77 83 87 93 97 98 99 *
120 12 24 35 45 55 65 73 80 85 89 95 98 99 *
13 24 36 48 59 68 77 83 88 92 96 99 99 *
9 20 10 17 24 32 39 47 54 61 68 73 82 88 93 96 97
60 11 21 31 41 50 58 67 74 80 85 92 96 98 99 *
120 11 22 33 44 53 62 71 78 83 88 94 97 99 *
13 23 34 45 56 66 74 81 86 90 95 98 99 *
10 20 09 16 23 30 37 44 51 58 64 70 79 86 91 94 96
60 10 20 30 39 48 56 65 72 78 83 90 95 97 99 99
120 11 21 31 41 51 60 69 75 81 86 93 96 98 99 *
12 21 32 43 54 64 72 79 85 89 94 98 99 *
(continued)
TABLE 2 (Continued)
dfi dfe λ: 2 4 6 8 10 12 14 16 18 20 24 28 32 36 40
11 20 09 15 21 27 34 41 48 55 61 67 76 84 89 93 96
60 11 18 27 36 45 54 62 70 76 81 89 94 97 98 99
120 11 19 29 39 49 58 67 74 80 85 91 96 98 99 *
12 21 31 42 52 62 70 78 83 89 94 97 99 99 *
12 20 08 14 20 27 33 39 45 52 58 64 74 81 87 91 94
60 09 18 28 36 44 52 59 67 73 79 87 93 96 98 99
120 10 20 30 40 48 56 63 72 78 83 90 95 97 99 99
11 20 30 40 50 60 69 76 82 87 93 97 98 99 *
13 20 09 13 19 24 30 37 43 49 55 61 71 79 85 90 93
60 10 16 24 33 41 50 58 65 71 77 86 92 95 97 99
120 10 18 26 36 45 54 62 70 76 81 89 94 97 98 99
11 19 29 40 50 59 67 75 81 85 92 96 98 99 *
14 20 09 13 18 23 29 35 41 47 52 58 68 76 83 88 92
60 10 16 23 31 40 48 56 63 69 75 84 90 94 97 98
120 10 17 25 34 43 52 61 68 74 80 88 93 97 98 99
11 19 28 38 48 58 65 73 79 84 91 96 98 99 *
15 20 08 12 17 22 27 33 39 44 50 55 65 74 81 86 90
60 09 15 22 30 38 46 54 61 67 73 73 89 94 96 98
120 10 16 24 33 42 51 59 66 73 78 78 92 96 98 99
10 18 27 37 47 56 64 72 78 83 83 95 97 99 99
18 20 08 11 15 19 24 29 34 39 44 49 58 67 74 80 85
60 09 14 20 27 34 41 48 55 62 68 78 85 91 94 97
120 09 15 22 30 38 46 54 61 68 74 83 90 94 97 98
10 17 25 34 43 52 60 68 76 80 88 93 97 98 99
20 20 08 11 14 18 22 26 31 36 40 45 54 63 70 77 82
60 08 13 19 25 31 38 45 52 58 64 75 83 89 93 96
120 09 14 21 28 36 43 51 58 65 71 81 88 93 96 98
09 16 24 32 41 50 58 65 72 78 87 92 96 98 99
24 20 07 10 13 16 19 23 27 31 35 39 47 55 62 69 75
60 08 12 17 22 28 34 40 46 52 58 69 78 84 89 93
120 08 13 19 25 32 39 46 53 60 66 76 84 90 94 96
09 15 21 29 37 46 54 61 68 74 83 90 94 97 98
30 20 07 09 11 14 16 19 22 25 29 32 39 45 53 59 65
60 08 11 15 19 24 29 34 40 45 51 61 70 78 84 89
120 08 12 16 22 28 34 40 46 53 59 69 78 85 90 94
08 13 19 26 33 41 48 56 62 69 79 87 92 95 97
40 20 07 08 10 11 13 15 18 20 22 25 30 35 41 47 52
60 07 10 12 16 19 23 27 32 36 41 50 59 67 74 80
120 07 10 14 18 23 28 33 38 44 49 60 69 77 83 88
08 12 17 22 29 35 42 49 55 61 72 81 87 92 95
48 20 07 08 09 10 12 14 15 17 19 21 25 30 35 39 44
60 07 09 11 14 17 20 24 28 31 35 44 52 59 67 73
TABLE 2 (Continued)
dfi dfe λ: 2 4 6 8 10 12 14 16 18 20 24 28 32 36 40
60 20 07 08 09 10 12 13 14 16 18 19 23 26 30 34 38
60 07 08 10 12 15 17 20 23 26 29 36 43 50 57 63
120 07 09 11 14 17 21 25 28 33 37 46 54 62 70 76
07 10 14 18 23 28 34 39 45 51 62 71 79 85 90
120 20 06 07 08 08 09 10 10 11 12 12 14 15 17 19 20
60 06 07 08 09 10 11 12 13 15 16 19 23 26 30 34
120 06 07 08 10 11 13 15 17 19 21 26 31 36 41 47
06 08 11 13 16 19 23 27 31 35 43 52 60 68 74
Note. α = .05. From Statistical Power Analysis for the Behavioral Sciences (2nd ed., pp. 420–423), by J. Cohen, 1988, Hillsdale, NJ: Lawrence Erlbaum Associates, Inc. Reprinted with permission.
*Power values here and to the right are > .995.
TABLE 3
Power of the F Test As a Function of λ, dfi, and dfe

dfi dfe λ: 2 4 6 8 10 12 14 16 18 20 24 28 32 36 40
1 20 10 23 37 51 63 73 80 86 90 94 97 99 *
60 12 26 42 57 69 79 86 91 94 96 99 *
120 12 27 44 58 71 80 87 92 95 97 99 *
12 28 45 60 72 81 88 92 95 97 99 *
2 20 06 15 26 37 48 58 67 75 81 86 93 96 98 99
60 08 18 30 45 57 68 76 83 88 92 97 99 99 *
120 08 19 33 46 59 70 78 85 90 93 97 99 *
08 20 35 49 61 72 80 87 91 94 98 99 *
3 20 05 11 20 29 39 48 57 65 72 78 87 93 96 98
60 06 14 25 37 49 60 69 77 83 88 94 97 99 *
120 06 15 27 39 51 62 72 79 85 90 95 98 99 *
07 16 29 42 54 65 74 82 87 91 96 98 *
4 20 04 09 16 23 32 41 49 57 64 71 81 89 93 96
60 05 12 21 31 42 53 62 71 78 83 91 96 98 99
120 05 13 23 34 45 56 66 74 81 86 93 97 98 *
06 14 25 37 49 60 69 77 84 89 95 98 99 *
5 20 03 07 13 20 27 35 43 50 58 64 76 84 90 93 *
60 04 10 18 28 37 48 57 66 73 79 88 94 97 98
120 04 11 20 30 41 51 61 70 77 83 91 95 98 99
05 12 22 33 44 55 65 74 80 86 93 97 99 99
6 20 03 06 11 17 23 30 37 45 52 58 70 79 86 91 *
60 04 09 16 24 34 43 52 61 69 75 85 92 96 98
120 04 10 17 27 37 47 57 66 73 79 89 94 97 99
TABLE 3 (Continued)
dfi dfe λ: 2 4 6 8 10 12 14 16 18 20 24 28 32 36 40
7 20 03 05 09 15 20 26 33 40 46 53 65 74 82 88 99
60 04 08 14 22 30 39 48 57 64 71 82 90 94 97 *
120 04 09 16 24 34 43 53 62 69 76 86 93 96 96
04 10 18 27 37 48 58 67 74 81 90 95 98 99
8 20 02 05 08 13 18 23 29 36 42 48 60 70 78 84 98
60 03 07 13 20 27 36 44 53 61 67 79 87 93 96 *
120 03 08 14 22 31 40 49 58 66 73 84 91 95 98
04 09 16 25 35 45 55 64 72 78 88 94 97 99
9 20 02 04 07 11 16 21 26 32 38 44 55 65 74 81 86
60 03 06 11 18 25 33 41 49 57 64 76 85 91 95 97
120 03 07 13 20 29 37 46 55 63 70 81 89 94 97 98
04 08 15 23 33 42 52 61 69 76 86 93 96 98 99
10 20 02 04 06 10 14 19 24 29 34 40 51 61 70 77 83
60 02 06 10 16 23 30 38 46 54 61 73 83 89 94 96
120 02 07 12 19 27 35 44 52 60 67 79 87 93 96 98
03 08 14 22 31 40 49 58 66 74 84 91 96 98 99
11 20 02 04 07 10 13 17 22 26 31 37 47 57 66 74 80
60 03 05 10 15 21 29 36 44 51 58 71 80 87 92 95
120 03 06 11 17 25 33 41 50 58 65 77 86 92 95 97
03 07 13 20 29 38 47 56 64 71 83 90 95 97 99
12 20 02 03 05 08 12 15 20 24 29 34 44 53 62 70 77
60 02 05 09 14 20 26 33 41 48 55 68 78 85 91 94
120 02 06 11 17 24 31 38 47 55 62 75 84 90 95 97
03 07 12 19 27 36 45 54 62 69 81 89 94 97 99
13 20 02 04 06 08 11 14 18 22 26 31 40 50 59 67 74
60 02 05 08 13 18 25 32 39 46 52 65 76 84 89 93
120 02 05 10 15 22 29 37 45 53 60 72 82 89 93 96
03 06 11 18 26 35 44 52 60 68 80 88 93 96 98
14 20 02 04 05 08 10 13 17 20 24 29 38 47 55 63 70
60 02 05 08 12 17 23 30 36 43 50 62 73 82 88 92
120 02 05 09 14 21 28 35 43 50 58 70 80 88 92 96
03 06 11 17 25 33 42 50 59 66 78 87 92 96 98
15 20 02 03 05 07 09 12 15 19 23 27 35 44 52 60 67
60 02 04 07 11 16 22 28 34 41 47 60 71 80 86 91
120 02 05 09 13 19 26 33 41 48 55 68 78 86 90 95
02 06 10 16 24 32 40 49 57 64 77 86 92 95 98
18 20 02 03 04 06 08 10 13 15 18 22 29 36 44 51 59
60 02 04 06 10 14 18 23 29 35 41 53 64 74 81 87
120 02 04 07 11 17 22 29 36 43 49 62 73 82 88 93
02 05 09 14 21 28 36 44 52 59 72 82 89 94 96
20 20 02 03 04 05 07 09 11 14 16 19 25 32 39 46 53
60 02 04 06 09 12 16 21 26 32 37 49 60 70 78 84
for interpolation,when .995.
spect to the reciprocals of dfe, that is, 1/20, 1/60, 1/120, and 0, respectively. In particular, when the power coefficient is needed for a given λ at a given dfe, where dfe falls between Ldfe and Udfe, the lower and upper values presented in Table 2 and Table 3, the power values PowerL and PowerU are obtained for λ at Ldfe and Udfe by linear interpolation. Then, to compute power for the given dfe, the following formula is used:

Power = PowerL + [(1/Ldfe − 1/dfe) / (1/Ldfe − 1/Udfe)] × (PowerU − PowerL).    (5)

For Udfe = ∞, 1/Udfe = 0, such that for dfe > 120, the denominator of Equation 5 is 1/120 − 0 = .00833 (Cohen, 1988).
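Equation 5 can be sketched as a small function; an infinite Udfe is handled by taking 1/Udfe = 0, as in the text. The tabled power values in the demonstration call are hypothetical.

```python
# Sketch of Equation 5: interpolating tabled power values over the
# reciprocals of dfe (Cohen, 1988). Udfe may be infinite, in which
# case 1/Udfe = 0. The power values below are hypothetical.
import math

def interpolate_power(power_l, power_u, ldf_e, udf_e, df_e):
    """Linear interpolation in 1/dfe between tabled Ldfe and Udfe."""
    inv_u = 0.0 if math.isinf(udf_e) else 1.0 / udf_e
    weight = (1.0 / ldf_e - 1.0 / df_e) / (1.0 / ldf_e - inv_u)
    return power_l + weight * (power_u - power_l)

# dfe = 240 lies halfway between 120 and infinity in reciprocal terms:
print(round(interpolate_power(0.50, 0.60, 120, math.inf, 240), 2))  # 0.55
```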
Virtually all univariate tests of statistical significance fall into one of the following three cases: Case A, Case B, or Case C. Case A, the simplest case, involves testing a set (A) of one or more independent variables. As before, dfi is the number of independent variables contained in the set, and Equation 3 is used, with the effect size being represented by

ES = R²Y·A / (1 − R²Y·A),    (6)

where the numerator represents the proportion of variance explained by the variables in Set A and the denominator represents the error variance proportion.
With respect to Case B, the proportion of variance in the dependent variable accounted for by Set A, over and above what is explained by another set, B, is ascertained. This is represented by R²Y·A,B − R²Y·B. The error variance proportion is then 1 − R²Y·A,B. Equation 3 can then be used, with the effect size being represented by

ES = (R²Y·A,B − R²Y·B) / (1 − R²Y·A,B).    (7)

The error degrees of freedom for Case B is given by

dfe = N − dfi − dfb − 1,    (8)

where dfi is the number of variables contained in Set A, and dfb is the number of variables in Set B. Case C is similar to Case B, inasmuch as the test of interest is the unique contribution of Set A over and above Set B; however, the error variance is estimated from a model that also includes a third set, C, so that the error variance proportion is 1 − R²Y·A,B,C.
Therefore, Equation 3 can then be used, with the effect size being represented by

ES = (R²Y·A,B − R²Y·B) / (1 − R²Y·A,B,C).    (9)
The error degrees of freedom for Case C is given by

dfe = N − dfi − dfb − dfc − 1,    (10)

where dfi is the number of variables contained in Set A, dfb is the number of variables in Set B, and dfc is the number of variables in Set C. As noted by Cohen (1988), Case C represents the most general case, with Cases A and B being special cases of Case C.
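The three cases can be collected into a few helper functions. A sketch, using hypothetical R² values chosen only to exercise Equations 6 through 10:

```python
# Sketch of the Case A, B, and C effect sizes (Equations 6, 7, 9) and
# error degrees of freedom (Equations 8, 10). The R-squared values in
# the demonstration calls are hypothetical.

def es_case_a(r2_ya):
    """Equation 6: ES = R2(Y.A) / (1 - R2(Y.A))."""
    return r2_ya / (1.0 - r2_ya)

def es_case_b(r2_yab, r2_yb):
    """Equation 7: ES = (R2(Y.A,B) - R2(Y.B)) / (1 - R2(Y.A,B))."""
    return (r2_yab - r2_yb) / (1.0 - r2_yab)

def es_case_c(r2_yab, r2_yb, r2_yabc):
    """Equation 9: ES = (R2(Y.A,B) - R2(Y.B)) / (1 - R2(Y.A,B,C))."""
    return (r2_yab - r2_yb) / (1.0 - r2_yabc)

def df_error(n, df_i, df_b=0, df_c=0):
    """Equations 8 and 10: dfe = N - dfi - dfb - dfc - 1."""
    return n - df_i - df_b - df_c - 1

print(round(es_case_a(0.20), 4))              # 0.25
print(round(es_case_b(0.30, 0.20), 4))        # 0.1429
print(round(es_case_c(0.30, 0.20, 0.35), 4))  # 0.1538
print(df_error(100, 3, df_b=2))               # 94
```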
FRAMEWORK FOR CONDUCTING POST HOC
POWER ANALYSES
We agree that post hoc power analyses should accompany statistically nonsignificant findings.5 In fact, such analyses can provide useful information for replication studies. In particular, the components of the post hoc power analysis can be used to conduct a priori power analyses in subsequent replication investigations.
Figure 1 displays our power-based framework for conducting NHST. Specifically, once the research purpose and hypotheses have been determined, the next step is to use an a priori power analysis to design the study. Once data have been collected, the next step is to test the hypotheses. For each hypothesis, if statistical significance is reached (e.g., at the 5% level), then the researcher should report the effect size and a confidence interval around the effect size (e.g., Bird, 2002; Chandler, 1957; Cumming & Finch, 2001; Fleishman, 1980; Steiger & Fouladi, 1992, 1997; Thompson, 2002). Conversely, if statistical significance is not reached, then the researcher should conduct a post hoc power analysis in an attempt to rule in or to rule out inadequate power (e.g., power < .80) as a threat to the internal validity of the finding.
HEURISTIC EXAMPLE
Recently, Onwuegbuzie, Witcher, Filer, Collins, and Moore (2003) conducted a study investigating characteristics associated with teachers' views on discipline.
5Moreover, we recommend that upper bounds for post hoc power estimates be computed (Steiger & Fouladi, 1997). This upper bound is estimated via the noncentrality parameter. However, this is beyond
The theoretical framework for this investigation, although not presented here, can be found by examining the original study. Although several independent variables were examined by Onwuegbuzie, Witcher, et al., we restrict our attention to one of them, namely, ethnicity (i.e., Caucasian American vs. minority) and its relationship to discipline styles.
Participants were 201 students at a large mid-southern university who were either preservice (77.0%) or in-service (23.0%) teachers. The sample size was se-
FIGURE 1 Power-based framework for conducting null hypothesis significance tests. [Flowchart: Identify research purpose → Determine and state hypotheses → Conduct a priori power analysis as part of planning research design → Collect data → Conduct null hypothesis significance test. If statistical significance is achieved, report effect size and confidence interval around effect size. If not, report and interpret the post hoc power coefficient. If the post hoc power coefficient is low, lack of power is a rival explanation of the statistically nonsignificant finding; if not, rule out power as a rival explanation of the statistical nonsignificance.]
d = .5) at the (two-tailed) .05 level of significance, maintaining a family-wise error of 5% (i.e., approximately .01 for each set of statistical tests comprising the three subscales used; Erdfelder et al., 1996). The preservice teachers were selected from several sections of an introductory-level undergraduate education class. On the other hand, the in-service teachers represented graduate students who were enrolled in one of two sections of a research methodology course.
During the first week of class, participants were administered the Beliefs on Discipline Inventory (BODI), which was developed by Roy T. Tamashiro and Carl D. Glickman (as cited in Wolfgang & Glickman, 1986). This measure was constructed to assess teachers' beliefs about classroom discipline by indicating the degree to which they are noninterventionists, interventionists, and interactionalists. The BODI contains 12 multiple-choice items, each with two response options. For each item, participants are asked to select the statement with which they most
agree. The BODI contains three subscales representing the Noninterventionist, Interventionist, and Interactionalist orientations, with scores on each subscale ranging from 0 to 8. A high score on any of these scales represents a teacher's proclivity toward the particular discipline approach. For our study, the Noninterventionist, Interventionist, and Interactionalist subscales generated scores that had classical theory alpha reliability coefficients of .72 (95% confidence interval [CI] = .66, .77), .75 (95% CI = .69, .80), and .94 (95% CI = .93, .95), respectively.
A series of independent t tests, using the Bonferroni adjustment to maintain a family-wise error of 5%, revealed no statistically significant difference between Caucasian American (n = 175) and minority participants (n = 26) for scores on the Interventionist, t(199) = 1.47, p > .05; Noninterventionist, t(199) = 0.88, p > .05; and Interactionalist, t(199) = 0.52, p > .05, subscales. After finding statistical nonsignificance, Onwuegbuzie, Witcher, et al. (2003) could have concluded that there were no ethnic differences in discipline beliefs. However, Onwuegbuzie, Witcher, et al. decided to conduct a post hoc power analysis. The post hoc power analysis for this test of ethnic differences revealed low statistical power. Thus, Onwuegbuzie, Witcher, et al. (2003) concluded the following:
The finding of no ethnic differences in discipline beliefs also is not congruent with Witcher et al. (2001), who reported that minority preservice teachers less often endorsed classroom and behavior management skills as characteristic of effective teachers than did Caucasian-American preservice teachers. Again, the non-significance could have stemmed from the relatively small proportion of minority students (i.e., 12.9%), which induced relatively low statistical power (i.e., 0.59) for detecting a moderate effect size for comparing the two groups with respect to three outcomes (Erdfelder et al., 1996). More specifically, using the actual observed effect sizes pertaining to these three differences, and applying the Bonferroni adjustment, the post-hoc statistical power estimates were .12, .06, and .03 for interventionist,
Replications are thus needed to determine the reliability of the present findings of no
ethnic differences in discipline belief. (p. 19)
Although Onwuegbuzie, Witcher, et al. (2003) computed their power coefficients using a statistical power program (i.e., Erdfelder et al., 1996), these estimates could have been calculated by hand using Equation 3 and Tables 2 and 3. For example, to determine the power of the test of racial differences in levels of interventionist orientation, Table 3 would be used (i.e., α = .01) as a result of the Bonferroni adjustment used (i.e., .05/3). The proportion of variance in interventionist orientation accounted for by race (i.e., the effect size) can be obtained by using one of the following three transformation formulae:
ES = t² / (t² + dfe),    (11)

ES = (dfi × F) / (dfi × F + dfe),    (12)

ES = d² / (d² + 4),    (13)
where t² is the square of the t value, F is the F value, d² is the square of Cohen's d statistic, and dfi and dfe are the numerator and denominator degrees of freedom, respectively (Murphy & Myors, 1998). Using the t value of 1.47 reported previously by Onwuegbuzie, Witcher, et al. pertaining to the racial difference in interventionist orientation, the associated numerator degrees of freedom (dfi) of 1 (i.e., one independent variable: race), and the corresponding denominator degrees of freedom (dfe) of 199 (i.e., total sample size − 2) in Equation 11, we obtain

ES = (1.47)² / [(1.47)² + 199] = .0107.    (14)
Next, we substitute this value of ES and the numerator and denominator degrees of freedom into Equation 3 to yield a noncentrality parameter of λ = .0107 × (1 + 199 + 1) = 2.15 ≈ 2. From Table 3, we use dfi = 1 and λ = 2. Now, dfe = 199 is between Ldfe = 120 and Udfe = ∞. We could use Equation 5; however, because both the lower and upper power values are .12, our estimated power value will be .12, which represents extremely low statistical power for detecting the small observed effect size of
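As a cross-check on the table-based calculation, post hoc power can also be approximated by Monte Carlo simulation of the noncentral F distribution. This is a sketch, not the authors' procedure; the settings (ES = .0107, dfi = 1, dfe = 199, α = .01) come from the example, and the estimate should land near the reported .12.

```python
# Monte Carlo sketch of post hoc power for the univariate F test.
# The noncentral F is simulated directly: the numerator chi-square
# carries the noncentrality, the denominator is central.
import math
import random

def simulate_power(es, df_i, df_e, alpha, trials=100_000, seed=7):
    rng = random.Random(seed)
    lam = es * (df_i + df_e + 1)  # Equation 3

    def chi2(df, noncentrality=0.0):
        # Noncentral chi-square: one squared shifted normal plus a
        # central chi-square (gamma with shape df/2, scale 2).
        out = rng.gauss(math.sqrt(noncentrality), 1.0) ** 2
        if df > 1:
            out += rng.gammavariate((df - 1) / 2.0, 2.0)
        return out

    def f_value(nc):
        return (chi2(df_i, nc) / df_i) / (chi2(df_e) / df_e)

    # Estimate the critical value from the central F distribution,
    # then the rejection rate under the noncentral alternative.
    central = sorted(f_value(0.0) for _ in range(trials))
    f_crit = central[min(int((1.0 - alpha) * trials), trials - 1)]
    return sum(f_value(lam) > f_crit for _ in range(trials)) / trials

print(round(simulate_power(0.0107, 1, 199, 0.01), 2))
```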
If Onwuegbuzie, Witcher, et al. (2003) had stopped after finding statistically
nonsignificant results, this would have been yet another instance of the deepening
file-drawer problem (Rosenthal, 1979). The post hoc power analysis allowed thestatistically nonsignificant finding pertaining to ethnicity to be placed in a more
appropriate context. This in turn helped the researchers to understand that their sta-
tistically nonsignificant finding was not necessarily due to an absence of a rela-
tionship between ethnicity and discipline style in the population but was due at
least in part to a lack of statistical power.
By conducting a post hoc analysis, Onwuegbuzie, Witcher, et al. (2003) real-
ized that the conclusion that no ethnic differences prevailed in discipline beliefs
likely would have resulted in a Type II error being committed. Similarly, if
Onwuegbuzie, Witcher, et al. had bypassed conducting a statistical significance test and had merely computed and interpreted the associated effect size, they
would have increased the probability of committing a Type A error. Either way,
conclusion validity would have been threatened.
As noted by Onwuegbuzie and Levin (2003) and Onwuegbuzie, Levin, et al.
(2003), the best way to increase conclusion validity is through independent (exter-
nal) replications of results (i.e., two or more independent studies yielding similar
findings that produce statistically and practically compatible outcomes). Indeed,
independent replications "make invaluable contributions to the cumulative knowledge in a given domain" (Robinson & Levin, 1997, p. 25). Thus, post hoc analyses can play a very important role in promoting as well as improving the quality of external replications. The post hoc analysis documented previously, by revealing
low statistical power first and foremost, strengthens the rationale for conducting
independent replications. Moreover, researchers interested in performing inde-
pendent replications could use the information from Onwuegbuzie, Witcher, et
al.s (2003) post hoc analysis for their subsequent a priori power analyses. Indeed,
as part of their post hoc power analysis, Onwuegbuzie, Witcher, et al. could have
conducted what we call a what-if post hoc power analysis to determine how many more cases would have been needed to increase the power coefficient to a more acceptable level (e.g., .80). As such, what-if post hoc power analyses provide useful
information for future researchers.
Even novice researchers who are unable or unwilling to conduct a priori power
analyses can use the post hoc power values to warn them to employ larger samples
(i.e., n > 201) in independent replication studies than that used in Onwuegbuzie,
Witcher, et al.s (2003) investigation. Moreover, novice researchers can use infor-
mation from what-if post hoc power analyses to set their minimum group/sample
sizes. Although we strongly encourage these researchers to perform a priori power
analyses, we believe that what-if post hoc power analyses place the novice researcher in a better position to select appropriate sample sizes, ones that reduce
around what-if post hoc power estimates using the (95%) upper and lower confidence limits around the observed effect size (cf. Bird, 2002). For instance, for the two-sample t-test context, Hedges and Olkin's (1985) z-based 95% CI6 could be used to determine the upper and lower limits that subsequently would be used to
derive a confidence interval around the estimated minimum sample size needed to
obtain adequate power in an independent replication.
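A what-if analysis of the kind described above can be sketched by searching for the smallest total N whose approximate power reaches the target. The sketch below assumes a two-group comparison (dfi = 1, so λ = ES × N) and uses a normal approximation to the noncentral distribution, so the resulting N is indicative only.

```python
# Sketch: what-if post hoc power analysis via a normal approximation
# (delta = sqrt(lambda), power ~ P(Z > z_crit - delta)). This ignores
# the far rejection tail and the t-vs-z correction.
import math

def normal_cdf(z):
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def normal_quantile(p):
    # Stdlib-only inverse of the standard normal CDF, by bisection.
    lo, hi = -10.0, 10.0
    for _ in range(100):
        mid = (lo + hi) / 2.0
        if normal_cdf(mid) < p:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0

def what_if_n(es, alpha=0.01, target=0.80, start_n=4):
    """Smallest total N whose approximate two-tailed power >= target."""
    z_crit = normal_quantile(1.0 - alpha / 2.0)
    n = start_n
    while True:
        delta = math.sqrt(es * n)  # sqrt(lambda); dfi + dfe + 1 = N here
        power = 1.0 - normal_cdf(z_crit - delta)
        if power >= target:
            return n, power
        n += 1

# ES = .0107 from the example; the answer is far above the original N = 201:
n_needed, approx_power = what_if_n(0.0107, alpha=0.01, target=0.80)
print(n_needed, round(approx_power, 3))
```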
Post hoc power analyses also could be used for meta-analytic purposes. In par-
ticular, if researchers routinely were to conduct post hoc power analyses,
meta-analysts could determine the average observed power across studies for any
hypothesis of interest. Knowledge of observed power across studies in turn would
help researchers interpret the extent to which studies in a particular area were truly
contributing to the knowledge base. Low average observed power across inde-pendent replications would necessitate research design modifications in future ex-
ternal replications. Additionally, what we call weighted meta-analyses could be
conducted wherein effect sizes across studies are weighted by their respective post
hoc power estimates. For example, if an analysis revealed a post hoc power esti-
mate of .80, the effect-size estimate of interest would be multiplied by this value to
yield a power-adjusted, effect-size coefficient. Similarly, a post hoc power esti-
mate of .60 would adjust the observed effect-size coefficient by this amount. By
using this technique when conducting meta-analyses, effect sizes resulting from studies with adequate power would be given more weight than effect sizes derived from investigations with low power.
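The proposed power weighting can be sketched in a few lines; the study values below are fabricated for illustration only.

```python
# Sketch of the proposed power-weighted meta-analysis: each study's
# effect size is multiplied by its post hoc power estimate. The
# (effect size, post hoc power) pairs are fabricated.

def power_adjusted_effects(studies):
    """studies: list of (effect_size, post_hoc_power) pairs."""
    return [es * power for es, power in studies]

def weighted_mean_effect(studies):
    # Power-weighted average effect size across studies.
    total_power = sum(power for _, power in studies)
    return sum(es * power for es, power in studies) / total_power

studies = [(0.50, 0.80), (0.30, 0.60), (0.10, 0.20)]
print([round(v, 2) for v in power_adjusted_effects(studies)])  # [0.4, 0.18, 0.02]
print(round(weighted_mean_effect(studies), 3))                 # 0.375
```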
In addition, if a researcher conducts both an a priori and a post hoc power analysis within the same study, then a power discrepancy index (PDI) could be computed, which represents the difference between the a priori and post hoc power coefficients. The PDI would then provide information to researchers for increasing the sensitivity (i.e., "the degree to which sampling error introduces imprecision into the results of a study"; Murphy & Myors, 1998, p. 5) of future a priori power analyses. Sensitivity might be improved by increasing the sample size in independent replication studies. Alternatively, sensitivity might be increased by using measures that yield more reliable and valid scores or a study design that allows researchers to control for undesirable sources of variability in their data (Murphy & Myors, 1998).
We have outlined several ways in which post hoc power analyses can be used to
improve the quality of present and future studies. However, this list is by no means
exhaustive. Indeed, we are convinced that other uses of post hoc power analyses are waiting to be uncovered. Regardless of the way in which a post hoc power
analysis is utilized, it is clear that this approach represents good detective work be-
cause it enables researchers to get to know their data better and consequently puts
6According to Hess and Kromrey (2002), Hedges and Olkin's (1985) z-based confidence interval
them in a position to interpret their data in a more meaningful way and with more
conclusion validity. Therefore, we believe that post hoc power analyses should
play a central role in NHST.
SUMMARY AND CONCLUSIONS
Robinson and Levin (1997) proposed a two-step procedure for analyzing empirical data whereby researchers first evaluate the probability of an observed effect (i.e., statistical significance), and if and only if statistical significance is found, then they assess the effect size. Recently, Onwuegbuzie and Levin (2003) proposed a three-step procedure for use when two or more hypothesis tests are conducted within the same study, which involves testing the trend of the set of hypotheses at the third step. Although both methods are appealing, their effectiveness depends on the statistical
power of the underlying hypothesis tests. Specifically, if power is lacking, then the
first step of the two-step method and the first and third steps of the three-step proce-
dure, which serve as gatekeepers for computing effect sizes, may lead to the
nonreporting of a nontrivial effect (i.e., Type A error; Onwuegbuzie, 2001).
Because the typical level of power for medium effect sizes in the behavioral and
social sciences is around .50 (Cohen, 1962), the incidence of Type A error likely is
high. Clearly, this incidence can be reduced if researchers conduct an a priori
power analysis to select appropriate sample sizes. However, such analyses arerarely employed (Cohen, 1992). Regardless, when a statistically nonsignificant
finding emerges, researchers should then conduct a post hoc power analysis. This
would help researchers determine whether low power threatens the internal valid-
ity of their findings (i.e., Type A error). Moreover, if researchers conduct post hoc
power analyses whether or not statistical significance is reached, then meta-analysts
would be able to assess whether independent replications are making significant
contributions to the accumulation of knowledge in a given domain. Unfortunately,
virtually no meta-analyst has formally used this technique.
Thus, in this article, we advocate the use of post hoc power analyses in empirical studies, especially when statistically nonsignificant findings prevail. First, reasons for the nonuse of a priori power analyses were presented. Second, post hoc power was defined and its utility delineated. Third, a step-by-step guide was provided for conducting post hoc power analyses. Fourth, a heuristic example was provided to illustrate how post hoc power can help to rule in/out rival explanations for observed findings. Finally, several methods were outlined that describe how post hoc power analyses can be used to improve the design of independent replications.
Although we advocate the use of post hoc power analyses, we believe that such approaches should never be used as a substitute for a priori power analyses. Moreover, we recommend that a priori power analyses always be conducted and re-
more statistically nonsignificant findings emerge. Post hoc power analyses rely more on available data and less on speculation than do a priori power analyses that are based on hypothesized effect sizes.

Indeed, we agree with Wooley and Dawson (1983), who suggested editorial policies "to require all such information relating to a priori design considerations and post hoc interpretation to be incorporated as a standard component of any research report submitted for publication" (p. 680). Although it could be argued that
this recommendation is ambitious, it is no more ambitious than the editorial poli-
cies at 20 journals that now formally stipulate that effect sizes be reported for all
statistically significant findings (Capraro & Capraro, 2002). In fact, post hoc
power provides a nice balance in report writing because we believe that post hoc
power coefficients are to statistically nonsignificant findings as effect sizes are to statistically significant findings. In any case, we believe that such a policy of con-
ducting and reporting a priori and post hoc power analyses would simultaneously
reduce the incidence of Type II and Type A errors and subsequently reduce the in-
cidence of publication bias and the file-drawer problem. This can only help to in-
crease the accumulation of knowledge across studies because meta-analysts will
have much more information with which to work. This surely would represent a step
in the right direction.
REFERENCES
American Psychological Association. (2001). Publication manual of the American Psychological Association (5th ed.). Washington, DC: Author.
Bakan, D. (1966). The test of significance in psychological research. Psychological Bulletin, 66, 423–437.
Bird, K. D. (2002). Confidence intervals for effect sizes in analysis of variance. Educational and Psychological Measurement, 62, 197–226.
Brewer, J. K. (1972). On the power of statistical tests in the American Educational Research Journal. American Educational Research Journal, 9, 391–401.
Brewer, J. K., & Owen, P. W. (1973). A note on the power of statistical tests in the Journal of Educational Measurement. Journal of Educational Measurement, 10, 71–74.
Cahan, S. (2000). Statistical significance is not a "kosher certificate" for observed effects: A critical analysis of the two-step approach to the evaluation of empirical results. Educational Researcher, 29(1), 31–34.
Capraro, R. M., & Capraro, M. M. (2002). Treatments of effect sizes and statistical significance tests in textbooks. Educational and Psychological Measurement, 62, 771–782.
Carver, R. P. (1978). The case against statistical significance testing. Harvard Educational Review, 48, 378–399.
Carver, R. P. (1993). The case against statistical significance testing, revisited. Journal of Experimental Education, 61, 287–292.
Chandler, R. E. (1957). The statistical concepts of confidence and significance. Psychological Bulletin, 54, 429–430.
Chase, L. J., & Baran, S. J. (1976). An assessment of quantitative research in mass communication.
Chase, L. J., & Chase, R. B. (1976). A statistical power analysis of applied psychological research. Journal of Applied Psychology, 61, 234–237.
Chase, L. J., & Tucker, R. K. (1975). A power-analytical examination of contemporary communication research. Speech Monographs, 42(1), 29–41.
Chase, L. J., & Tucker, R. K. (1976). Statistical power: Derivation, development, and data-analytic implications. The Psychological Record, 26, 473–486.
Cohen, J. (1962). The statistical power of abnormal-social psychological research: A review. Journal of Abnormal and Social Psychology, 65, 145–153.
Cohen, J. (1965). Some statistical issues in psychological research. In B. B. Wolman (Ed.), Handbook of clinical psychology (pp. 95–121). New York: McGraw-Hill.
Cohen, J. (1968). Multiple regression as a general data-analytic system. Psychological Bulletin, 70, 426–443.
Cohen, J. (1969). Statistical power analysis for the behavioral sciences. New York: Academic.
Cohen, J. (1973). Statistical power analysis and research results. American Educational Research Journal, 10, 225–230.
Cohen, J. (1977). Statistical power analysis for the behavioral sciences (Rev. ed.). New York: Academic.
Cohen, J. (1988). Statistical power analysis for the behavioral sciences (2nd ed.). Hillsdale, NJ: Lawrence Erlbaum Associates, Inc.
Cohen, J. (1992). A power primer. Psychological Bulletin, 112, 155–159.
Cohen, J. (1994). The earth is round (p < .05). American Psychologist, 49, 997–1003.
Cohen, J. (1997). The earth is round (p < .05). In L. L. Harlow, S. A. Mulaik, & J. H. Steiger (Eds.), What if there were no significance tests? (pp. 21–35). Mahwah, NJ: Lawrence Erlbaum Associates, Inc.
Cooper, H. M., & Findley, M. (1982). Expected effect sizes: Estimates for statistical power analysis in social psychology. Personality and Social Psychology Bulletin, 8, 168–173.
Cumming, G., & Finch, S. (2001). A primer on the understanding, use and calculation of confidence intervals that are based on central and noncentral distributions. Educational and Psychological Measurement, 61, 532–575.
Daly, J., & Hexamer, A. (1983). Statistical power in research in English education. Research in the Teaching of English, 17, 157–164.
Dayton, C. M., Schafer, W. D., & Rogers, B. G. (1973). On appropriate uses and interpretations of power analysis: A comment. American Educational Research Journal, 10, 231–234.
Erdfelder, E., Faul, F., & Buchner, A. (1996). GPOWER: A general power analysis program. Behavior Research Methods, Instruments, & Computers, 28, 1–11.
Fagley, N. S. (1985). Applied statistical power analysis and the interpretation of nonsignificant results by research consumers. Journal of Counseling Psychology, 32, 391–396.
Fagley, N. S., & McKinney, I. J. (1983). Reviewer bias for statistically significant results: A reexamination. Journal of Counseling Psychology, 30, 298–300.
Fidler, F. (2002). The fifth edition of the APA Publication Manual: Why its statistics recommendations are so controversial. Educational and Psychological Measurement, 62, 749–770.
Fisher, R. A. (1941). Statistical methods for research workers (8th ed.). Edinburgh, Scotland: Oliver & Boyd. (Original work published 1925)
Fleishman, A. I. (1980). Confidence intervals for correlation ratios. Educational and Psychological Measurement, 40, 659–670.
Greenwald, A. G. (1975). Consequences of prejudice against the null hypothesis. Psychological Bulletin, 82, 1–20.
Guttman, L. B. (1985). The illogic of statistical inference for cumulative science. Applied Stochastic Models and Data Analysis, 1, 3–10.
Haase, R. F. (1974). Power analysis of research in counselor education. Counselor Education and Supervision.
Haase, R. F., Waechter, D. M., & Solomon, G. S. (1982). How significant is a significant difference? Average effect size of research in counseling psychology. Journal of Counseling Psychology, 29, 58–65.
Halpin, R., & Easterday, K. E. (1999). The importance of statistical power for quantitative research: A post-hoc review. Louisiana Educational Research Journal, 24, 5–28.
Harris, R. J. (1997). Reforming significance testing via three-valued logic. In L. L. Harlow, S. A. Mulaik, & J. H. Steiger (Eds.), What if there were no significance tests? (pp. 145–174). Mahwah, NJ: Lawrence Erlbaum Associates, Inc.
Hedges, L. V., & Olkin, I. (1985). Statistical methods for meta-analysis. New York: Academic.
Hess, M. R., & Kromrey, J. D. (2002, April). Interval estimates of effect size: An empirical comparison of methods for constructing confidence bands around standardized mean differences. Paper presented at the annual meeting of the American Educational Research Association, New Orleans, LA.
Katzer, J., & Sodt, J. (1973). An analysis of the use of statistical testing in communication research. The Journal of Communication, 23, 251–265.
Keselman, H. J., Huberty, C. J., Lix, L. M., Olejnik, S., Cribbie, R. A., Donahue, B., et al. (1998). Statistical practices of educational researchers: An analysis of their ANOVA, MANOVA, and ANCOVA analyses. Review of Educational Research, 68, 350–386.
Kroll, R. M., & Chase, L. J. (1975). Communication disorders: A power analytical assessment of recent research. Journal of Communication Disorders, 8, 237–247.
Levenson, R. L. (1980). Statistical power analysis: Implications for researchers, planners, and practitioners in gerontology. Gerontologist, 20, 494–498.
Levin, J. R., & Robinson, D. H. (2000). Statistical hypothesis testing, effect-size estimation, and the conclusion coherence of primary research studies. Educational Researcher, 29(1), 34–36.
Loftus, G. R. (1996). Psychology will be a much better science when we change the way we analyze data. Current Directions in Psychological Science, 5, 161–171.
McNemar, Q. (1960). At random: Sense and nonsense. American Psychologist, 15, 295–300.
Meehl, P. E. (1967). Theory testing in psychology and physics: A methodological paradox. Philosophy of Science, 34, 103–115.
Meehl, P. E. (1978). Theoretical risks and tabular asterisks: Sir Karl, Sir Ronald, and the slow progress of soft psychology. Journal of Consulting and Clinical Psychology, 46, 806–834.
Morse, D. T. (2001, November). YAPP: Yet another power program. Paper presented at the annual meeting of the Mid-South Educational Research Association, Chattanooga, TN.
Mulaik, S. A., Raju, N. S., & Harshman, R. A. (1997). There is a time and a place for significance testing. In L. L. Harlow, S. A. Mulaik, & J. H. Steiger (Eds.), What if there were no significance tests? (pp. 65–115). Mahwah, NJ: Lawrence Erlbaum Associates, Inc.
Mundfrom, D. J., Shaw, D. G., Thomas, A., Young, S., & Moore, A. D. (1998, April). Introductory graduate research courses: An examination of the knowledge base. Paper presented at the annual meeting of the American Educational Research Association, San Diego, CA.
Murphy, K. R., & Myors, B. (1998). Statistical power analysis: A simple and general model for traditional and modern hypothesis tests. Mahwah, NJ: Lawrence Erlbaum Associates, Inc.
Neyman, J., & Pearson, E. S. (1928). On the use and interpretation of certain test criteria for the purposes of statistical inference. Biometrika, 20A, 175–240, 263–294.
Neyman, J., & Pearson, E. S. (1933a). On the problem of the most efficient tests of statistical hypotheses. Philosophical Transactions of the Royal Society of London, Series A, 231, 289–337.
Neyman, J., & Pearson, E. S. (1933b). Testing statistical hypotheses in relation to probabilities a priori. Proceedings of the Cambridge Philosophical Society, 29, 492–510.
Nix, T. W., & Barnette, J. J. (1998). The data analysis dilemma: Ban or abandon. A review of null hypothesis significance testing. Research in the Schools, 5, 3–14.
Onwuegbuzie, A. J. (2002). Common analytical and interpretational errors in educational research: An analysis of the 1998 volume of the British Journal of Educational Psychology. Educational Research Quarterly, 26, 11–22.
Onwuegbuzie, A. J. (2003). Expanding the framework of internal and external validity in quantitative research. Research in the Schools, 10, 71–90.
Onwuegbuzie, A. J., & Daniel, L. G. (2002). A framework for reporting and interpreting internal consistency reliability estimates. Measurement and Evaluation in Counseling and Development, 35, 89–103.
Onwuegbuzie, A. J., & Daniel, L. G. (2003, February 12). Typology of analytical and interpretational errors in quantitative and qualitative educational research. Current Issues in Education, 6(2) [Online]. Retrieved October 10, 2004, from http://cie.ed.asu.edu/volume6/number2/
Onwuegbuzie, A. J., & Daniel, L. G. (2004). Reliability generalization: The importance of considering sample specificity, confidence intervals, and subgroup differences. Research in the Schools, 11(1), 61–72.
Onwuegbuzie, A. J., & Levin, J. R. (2003). A new proposed three-step method for testing result direction. Manuscript in preparation.
Onwuegbuzie, A. J., & Levin, J. R. (2003). Without supporting statistical evidence, where would reported measures of substantive importance lead? To no good effect. Journal of Modern Applied Statistical Methods, 2, 133–151.
Onwuegbuzie, A. J., Levin, J. R., & Leech, N. L. (2003). Do effect-size measures measure up? A brief assessment. Learning Disabilities: A Contemporary Journal, 1, 37–40.
Onwuegbuzie, A. J., Witcher, A. E., Filer, J., Collins, K. M. T., & Moore, J. (2003). Factors associated with teachers' beliefs about discipline in the context of practice. Research in the Schools, 10(2), 35–44.
Orme, J. G., & Tolman, R. M. (1986). The statistical power of a decade of social work education research. Social Science Review, 60, 619–632.
Overall, J. E. (1969). Classical statistical hypotheses within the context of Bayesian theory. Psychological Bulletin, 71, 285–292.
Penick, J. E., & Brewer, J. K. (1972). The power of statistical tests in science teaching research. Journal of Research in Science Teaching, 9, 377–381.
Robinson, D. H., & Levin, J. R. (1997). Reflections on statistical and substantive significance, with a slice of replication. Educational Researcher, 26(5), 21–26.
Rosenthal, R. (1979). The file-drawer problem and tolerance for null results. Psychological Bulletin, 86, 638–641.
Rossi, J. S. (1990). Statistical power of psychological research: What have we gained in 20 years? Journal of Consulting and Clinical Psychology, 58, 646–656.
Rossi, J. S. (1997). A case study in the failure of psychology as a cumulative science: The spontaneous recovery of verbal learning. In L. L. Harlow, S. A. Mulaik, & J. H. Steiger (Eds.), What if there were no significance tests? (pp. 175–197). Mahwah, NJ: Lawrence Erlbaum Associates, Inc.
Rozeboom, W. W. (1960). The fallacy of the null hypothesis significance test. Psychological Bulletin, 57, 416–428.
SAS Institute, Inc. (2002). SAS/STAT User's Guide (Version 8.2) [Computer software]. Cary, NC: Author.
Sawyer, A. G., & Ball, A. D. (1981). Statistical power and effect size in marketing research. Journal of Marketing Research, 18, 275–290.
Schmidt, F. L. (1992). What do data really mean? American Psychologist, 47, 1173–1181.
Schmidt, F. L. (1996). Statistical significance testing and cumulative knowledge in psychology: Implications for the training of researchers. Psychological Methods, 1, 115–129.
Schmidt, F. L., & Hunter, J. E. (1997). Eight common but false objections to the discontinuation of significance testing in the analysis of research data. In L. L. Harlow, S. A. Mulaik, & J. H. Steiger (Eds.), What if there were no significance tests? (pp. 37–64). Mahwah, NJ: Lawrence Erlbaum Associates, Inc.
Schmidt, F. L., Hunter, J. E., & Urry, V. E. (1976). Statistical power in criterion-related validation studies. Journal of Applied Psychology, 61, 473–485.
Sedlmeier, P., & Gigerenzer, G. (1989). Do studies of statistical power have an effect on the power of studies? Psychological Bulletin, 105, 309–316.
Sherron, R. H. (1988). Power analysis: The other half of the coin. Community/Junior College Quarterly, 12, 169–175.
SPSS, Inc. (2001). SPSS 11.0 for Windows [Computer software]. Chicago: Author.
Steiger, J. H., & Fouladi, R. T. (1992). R2: A computer program for interval estimation, power calculation, and hypothesis testing for the squared multiple correlation. Behavior Research Methods, Instruments, and Computers, 24, 581–582.
Steiger, J. H., & Fouladi, R. T. (1997). Noncentrality interval estimation and the evaluation of statistical models. In L. L. Harlow, S. A. Mulaik, & J. H. Steiger (Eds.), What if there were no significance tests? (pp. 221–257). Mahwah, NJ: Lawrence Erlbaum Associates, Inc.
Thompson, B. (2002). What future quantitative social science research could look like: Confidence intervals for effect sizes. Educational Researcher, 31(3), 25–32.
Tversky, A., & Kahneman, D. (1971). Belief in the law of small numbers. Psychological Bulletin, 76, 105–110.
Wilkinson, L., & The Task Force on Statistical Inference. (1999). Statistical methods in psychology journals: Guidelines and explanations. American Psychologist, 54, 594–604.
Witcher, A. E., Onwuegbuzie, A. J., & Minor, L. C. (2001). Characteristics of effective teachers: Perceptions of preservice teachers. Research in the Schools, 8(2), 45–57.
Wolfgang, D. H., & Glickman, C. D. (1986). Solving discipline problems: Strategies for classroom
teachers (2nd ed.). Boston: Allyn & Bacon.
Wooley, T. W., & Dawson, G. O. (1983). A follow-up power analysis of the statistical tests used in the Journal of Research in Science Teaching. Journal of Research in Science Teaching, 20, 673–681.