chapter 7 power & sample size calculationyibi/teaching/stat222/2017/lectures/c07.pdf · power...

Chapter 7 Power & Sample Size Calculation

Yibi Huang

Chapter 7 Power & Sample Size Calculation for CRDsSection 10.3 Power & Sample Size for Factorial Designs

Chapter 7 - 1

Power & Sample Size Calculation

When proposing an experiment (applying for funding etc),nowadays one needs to show that the proposed sample size (i.e.the number of experiment units) is

I neither so small that scientifically interesting effects will beswamped by random variation (i.e., has enough power toreject a false H0)

I nor larger than necessary, wasting resources (time & money)

Chapter 7 - 2

Errors and Power in Hypothesis TestingI A Type I error occurs when H0 is true but is rejected.

I A Type II error occurs when H0 is false but is accepted.

I The (significance) level of a test is the chance of making aType I error, i.e., the chance to reject a H0 when it is true.

I The power of the test is the the chance of rejecting H0 whenHa is true:

power = 1− P(making type II error|H0 is false) = 1− β= P(correctly reject H0|H0 is false)

I A good test should have small test-level and large power.

Ha is rejected H0 is rejected(H0 is accepted) (Ha is accepted)

H0 is true√

α = P(Type I error)

H0 is false β = P(Type II error)√

Chapter 7 - 3

Power Calculation for ANOVAFor a CRD design with g treatments, recall

the means model: yij = µi + εij

and the effects model: yij = µ+ αi + εij ,

for i = 1, . . . , g , and j = 1, . . . , ni .

The two models are related via the relationship

µi = µ+ αi .

For CRDs, we use the means model more often. However, forpower and sample size calculation, we need to use the effectsmodel and µ must be defined as

µ =1

N

g∑i=1

ni∑j=1

µi =1

N

g∑i=1

niµi .

which implies that∑g

i=1 niαi = 0. Here N =∑g

i=1 ni is the totalnumber of experimental units.

Chapter 7 - 4

Recall the H0 and Ha for the ANOVA F test are

H0 : µ1 = · · · = µg v.s. Ha : µi ’s not all equal,

in terms of the means model, or equivalently in terms of the effectsmodel, they are

H0 : α1 = · · · = αg = 0 v.s. Ha : Not all αi ’s are 0.

Recall we reject H0 at level α if the test statistic F = MSTrtMSE

exceeds the critical value Fα,g−1,N−g . So

Power = P(Reject H0|Ha is true) = P(F > Fα, g−1,N−g ).

To find P(F > Fα, g−1,N−g ), we must know the distribution of F .

I What is the distribution of F under H0? Fg−1,N−g .

I And under Ha?

Chapter 7 - 5

For F =MSTrtMSE

=SSTrt/(g − 1)

SSE/(N − g), recall in Ch3 we’ve shown that

I no matter under H0 or Ha, it is always true that

SSE

σ2∼ χ2

N−g , and E(MSE) = σ2

I For the numerator, SSTrt/σ2 has a χ2

g−1 distribution underH0. But under Ha, it has a non-central Chi-squaredistribution χ2

g−1,δ with non-central parameter

δ =

∑gi=1 niα

2i

σ2.

Moreover,E(SSTrt) = (g − 1)σ2 +

∑g

i=1niα

2i︸︷︷︸

δσ2

=

{(g − 1)σ2 under H0

(g − 1 + δ)σ2 under Ha

Chapter 7 - 6

Non-Central F -DistributionUnder Ha, it can be shown that

F =MSTrtMSE

has a non-central F -distribution on degrees of freedom g − 1and N − g , with non-centrality parameter δ, denoted as

F ∼ Fg−1,N−g ,δ where δ =

∑gi=1 niα

2i

σ2.

Recall that

αi = µi − µ, and µ =1

N

g∑i=1

ni∑j=1

µi =1

N

g∑i=1

niµi .

Non-central F -distribution is also right skewed, and the greater thenon-centrality parameter δ, the further away the peak of thedistribution from 0.

Chapter 7 - 7

Example 1: Power Calculation (1)I g = 5 treatment groups

group sizes: n1 = n2 = n3 = 5, n4 = 6, n5 = 4I assume σ = 0.8I desired significance level α = 0.05I find the power of the test when Ha is true with

µ1 = 1.6, µ2 = 0.6, µ3 = 2, µ4 = 0, µ5 = 1.

Solution.The grand mean µ is

µ =

∑gi=1 niµiN

=5× 1.6 + 5× 0.6 + 5× 2 + 6× 0 + 4× 1

5 + 5 + 5 + 6 + 4

=25

25= 1

The αi ’s are thus αi = µi − µ = µi − 1:

α1 = 0.6, α2 = −0.4, α3 = 1, α4 = −1, α5 = 0.

Chapter 7 - 8

Example 1: Power Calculation (2)The non-centrality parameter is

δ =

∑gi=1 niα

2i

σ2=

5× 0.62 + 5× (−0.4)2 + 5× 12 + 6× (−1)2 + 4× 02

0.82

=13.6

0.64= 21.25

So

F =MSTrtMSE

∼

{Fg−1,N−g = F4, 25−5 under H0

Fg−1,N−g ,δ = F4, 25−5, 21.25 under Ha

0 5 10 15

0.0

0.4

0.8

df(x

0, d

f1 =

4, d

f2 =

20)

H0: F ∼ F4,20

Ha: F ∼ F4,20,21.25

Chapter 7 - 9

Example 1: Power Calculation (3)The critical value to reject H0 keeping the significance level atα = 0.05 is F0.05,4,20 ≈ 2.866.

> qf(0.05,4,20, lower.tail=F)

[1] 2.866081

When H0 is true, the chance that H0 is rejected is only 0.05 (thered shaded area.)

0 5 10 15

0.0

0.4

0.8

df(x

0, d

f1 =

4, d

f2 =

20)

Accept H0 Reject H0

F0.05,4,20

Area = 0.05

Reminder: Don’t confuse the following two:I in most STAT books, Fα,df 1,df 2 means the area of the upper

tail is α.I in R, qf(alpha, df1, df2) means the the area of the lower

tail is α.Chapter 7 - 10

Example 1: Power Calculation (4)If Ha is true and

µ1 = 1.6, µ2 = 0.6, µ3 = 2, µ4 = 0, µ5 = 1,

we know then F ∼ F4,20,δ=21.25. The power to reject H0 is the areaunder the density of F4,20,δ=21.25 beyond the critical value (theblue shaded area,) which is 0.9249.

> pf(qf(.95,4,20),4,20, ncp=21.25, lower.tail=F)

[1] 0.9249342

In the R codes, ncp means the “non-centrality parameter.”

0 5 10 15

0.0

0.4

0.8

df(x

0, d

f1 =

4, d

f2 =

20)

Accept H0 Reject H0

F0.05,4,20

Power

Chapter 7 - 11

Example 1: Power Calculation (5)

Remark

The power of a test is not a single value, but a function of theparameters in Ha. If parameter µi ’s change their values, the powerof the test also changes. It makes no sense to talk about the powerof a test without specifying the parameters in Ha

Chapter 7 - 12

Power Of A Test Is Affected By ...

The larger the non-centrality parameter

δ =

∑gi=1 niα

2i

σ2,

the greater the power.

The power of a test will increase if

I the number of replicate ni per treatment increases

I the magnitude of treatment effects αi ’s increase (under theconstraint

∑i niαi = 0)

I the size of noise σ2 decreases.

Chapter 7 - 13

Example 2: Sample Size Calculation (1)I g = 5 treatment groups of equal sample size ni = n for all iI assume σ = 0.8I desired significance level α = 0.05I Assuming the design is balanced that all groups have identical

sample size n, what is the minimal sample size n pertreatment to have power 0.95 when

α1 = 0.5, α2 = −0.5, α3 = 1, α4 = −1, α5 = 0?

Solution. The non-centrality parameter is

δ =

∑gi=1 niα

2i

σ2=

n

0.82[0.52 + (−0.5)2 + 12 + (−1)2 + 02]

=n

0.82× 2.5 = 3.9n

So

F =MSTrtMSE

∼

{Fg−1,N−g = F4, 5n−5 under H0

Fg−1,N−g ,δ = F4, 5n−5, 3.9n under Ha

Chapter 7 - 14

Recall the critical value F ∗ for rejecting H0 at level α = 0.05 is

F ∗ = Fα, g−1,N−g = F0.05, 4, 5n−5

which can be find in R via the command

> F.crit = qf(alpha, g-1, N-g, lower.tail=F) # syntax

> F.crit = qf(0.05, 4, 5*n-5, lower.tail=F) # sub-in the values

By definition,

Power = P(reject H0 |Ha is true)

= P(the non central F statistic ≥ F ∗)

= P(F (g − 1,N − g , δ) ≥ F ∗)

= P(F (4, 5n − 5, 3.9n) ≥ F ∗)

which can be found in R via the command

> pf(F.crit, g-1, N-g, ncp=delta, lower.tail=F) # syntax

> pf(F.crit, 4, 5*n-5, ncp=3.9*n, lower.tail=F) # sub-in values

In the R codes, ncp means the “non-centrality parameter.”Chapter 7 - 15

Now we find the R code to find the power of the ANOVA F -testwhen n is known. Let’s plug in different values of n and see whatis the smallest n to make power ≥ 0.95.

> F.crit = qf(alpha, g-1, N-g, lower.tail=F)

> pf(F.crit, g-1, N-g, ncp=delta, lower.tail=F) # syntax

> n = 5

> F.crit = qf(0.05, 4, 5*n-5, lower.tail=F)

> pf(F.crit, 4, 5*n-5, ncp=3.9*n, lower.tail=F)

[1] 0.8994675 # less than 0.95, not high enough

> n = 6

> F.crit = qf(0.05, 4, 5*n-5, lower.tail=F)

> pf(F.crit, 4, 5*n-5, ncp=3.9*n, lower.tail=F)

[1] 0.9578791 # greater than 0.95, bingo!

So we need 6 replicates in each of the 5 treatment groups toensure a power of 0.95 when

α1 = 0.5, α2 = −0.5, α3 = 1, α4 = −1, α5 = 0.

Remark: Again, we have to specify the Ha fully to to find theappropriate sample size.

Chapter 7 - 16

But σ2 is Unknown . . .

As σ2 is usually unknown, here are a few ways to make a guess.

I Make a small-sample pilot study to get an estimate of σ.

I Based on prior studies or knowledge about the experimentalunits, can you think of a range of plausible values for σ?If so, choose the biggest one.

I You could repeat the sample size calculations for various levelsof σ to see how it affects the needed sample size.

Chapter 7 - 17

How to Specify the Ha?

As the power of a test depends on the alternative hypothesis Ha,that is, the αi ’s, one might has to try several sets of αi ’s to findthe appropriate sample size. But how many Ha’s we have to try?

Here is a useful trick.

1. Suppose we would be interested if any two means differed byD or more.

2. The smallest value of δ in this case is when two means differby exactly, D, and the other g − 2 means are halfway between.

3. So try α1 = D/2, α2 = −D/2, and αi = 0 for all other τ .Assuming equal sample sizes, the non-centrality parameter is

δ =∑i

nα2i

σ2=

n(D2/4 + D2/4)

σ2=

nD2

2σ2.

Chapter 7 - 18

Power and Sample Size Calculation for Complete Block Designs

For complete block designs

yij = µ+ αi + βj + εij RCBD

yijk = µ+ αi + βj + γk + εijk Latin Square

yijk` = µ+ αi + βj + γk + δ` + εijk` Graeco-Latin Square

where αi ’s are the treatment effect, and βj , γk , δ` are effects ofblocking factors, when Ha is true, treatments have different effects,the ANOVA F -statistic also has a non-central F -distribution

F =MStrt

MSE∼

{Fg−1, df of MSE under H0

Fg−1, df of MSE,δ under Ha

where the non-centrality parameter δ is

δ =r∑

i α2i

σ2, where r = # of replicates per treatment

Note that the df changes from N − g to the df of MSE.Chapter 7 - 19

Power and Sample Size Calculation for Balanced Factorial Designs

I Section 10.3 in Oehlert’s book

I We may ignore factorial structure and compute power andsample size for the overall null hypothesis of no model effects.

I We will address methods for balanced data only because mostfactorial experiments are designed to be balanced, and simpleformulae Power for balanced data for noncentrality parametersexist only for balanced data.

I For factorial data, we usually test H0 about main effects orinteractions in addition to the overall H0 of no model effects.

Chapter 7 - 20

Non-Centrality Parameter for Balanced Factorial Designs

Ex, consider a 3-way factorial design

yijk` = µ+ αi + βj + γk + αβij + βγjk + αγij + αβγijk + εijk`

to calculate the power of the F -test of a main effect or aninteraction effect under Ha, the non-centrality parameter can becalculated as follows:

I Square the factorial effects (using the zero-sum constraint)and sum them,

I Multiply this sum by the total number of data in the designdivided by the number of levels in the effect, and

I Divide that product by the error variance.

Chapter 7 - 21

E.g., consider a a× b × c factorial design with r replicates pertreatment


To test about the B main effects

H0 : β1 = · · · = βb = 0 v.s. Ha : not all βj are 0

the ANOVA F -statistic for B main-effect is

F =MSB

MSE∼

{Fb−1, df of MSE under H0

Fb−1, df of MSE,δ under Ha


δ =

(N

b

∑jβ2j

)/σ2 =

(rac∑

jβ2j

)/σ2.

Here N = abcr is the total number of observations.

Chapter 7 - 22

E.g., consider a a× b × c factorial design with r replicates pertreatment


To test about the AB interactions

H0 : all αβij = 0 v.s. Ha : not all αβij = 0

the ANOVA F -statistic for AB interactions

F =MSAB

MSE∼

{F(a−1)(b−1), df of MSE under H0

F(a−1)(b−1), df of MSE,δ under Ha


δ =

(N

ab

∑ij

(αβij)2

)/σ2 =

(rc∑

ij(αβij)

2)/

σ2.

Here N = abcr is the total number of observations.

Chapter 7 - 23

Sample Size Should Be The Last Thing To Consider

In practice, sample size calculation is the last thing to do indesigning an experiment. Consider the followings before calculatingthe required sample size.

I asking the right question,

I choosing an appropriate response,

I thinking about what factors need to be blocked on,

I choosing a right design,

I setting up the right procedure to recruit subjects, and so on,

All the above are more important than doing the right sample sizecalculation!

Sometimes, choosing a better design can substantially reduces therequired sample size.

Chapter 7 - 24

chapter 7 power & sample size calculationyibi/teaching/stat222/2017/lectures/c07.pdf · power...

Documents