
Principles of sample size calculation

Jonathan Cook (with thanks to Doug Altman)

Centre for Statistics in Medicine, NDORMS, University of Oxford

EQUATOR – OUCAGS training course, 24 October 2015


Outline

Principles of study design

Principles of study sample size calculation

How to determine the sample size

How to calculate in practice

Summary


Study design – general principles

Play of chance could mislead

– The more subtle the question, the more precise the evaluation needs to be (i.e. more information, that is, more data)

We need to be clear about our question

– What exactly are we interested in?

– How precisely do we want to know it?

Study (including sample size) should be fit for purpose

– Relevant

– Sufficient for the intended analysis


Study size – how big?

Fundamental aspect of study design

– How many participants are needed?

Ethically and scientifically important

– Legitimate experimentation

– Add to knowledge

Impact upon study conduct (e.g. 100 versus 2000)

– Management of project

– Timeframe

– Cost


Principles of sample size calculation

Aim

– We wish to compare the outcome between the treatments and determine if there is a difference between them

Typical approach for RCT sample size calculation

– Choose the key (primary) outcome and base the calculation on it

– Get a large enough sample size to have reassurance that we will be able to detect a meaningful difference in the primary outcome

Main alternative approach

– Seek to estimate a quantity with a given precision

Same principles apply to all types of study

– What we are looking for may well differ


Reaching the wrong conclusion (1)

What can go wrong:

May conclude that there is a difference in outcome between active and control groups, when in fact no such difference exists

Technically called a Type I error

– more usefully called a false-positive result

Probability of making such an error is designated α, commonly known as the significance level

Risk of false-positive conclusion (Type I error) does not decrease as the sample size increases


Reaching the wrong conclusion (2)

May conclude that there is no evidence of a difference in outcomes between active and control groups, when in fact there is such a difference

Technically called a Type II error

– more usefully called a false-negative result

Probability of making such an error is often designated β (1 − β is commonly known as the statistical power)

Risk of missing an important difference (Type II error) decreases as the sample size increases
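This relationship between sample size and the Type II error can be shown numerically. A minimal sketch using only the Python standard library, assuming a two-group comparison of means under a normal approximation (the function name and the effect size of 0.5 are illustrative, not from the slides):

```python
from math import sqrt
from statistics import NormalDist

def approx_power(n_per_group: int, effect_size: float, alpha: float = 0.05) -> float:
    """Approximate power of a two-sample comparison of means.

    Normal approximation: power = Phi(d * sqrt(n/2) - z_{1-alpha/2}),
    where d is the standardised effect size (mean difference / SD).
    """
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    return NormalDist().cdf(effect_size * sqrt(n_per_group / 2) - z_alpha)

# Power rises (Type II error falls) as n grows, for a fixed effect size d = 0.5:
for n in (20, 40, 64, 100):
    print(n, round(approx_power(n, 0.5), 2))
```

The significance level α stays fixed whatever the sample size; only the false-negative risk shrinks as n grows.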

Type I and Type II errors

                                 There really is a difference                 There really is no difference
Statistically significant        OK                                           Type I error (false positive)
Statistically non-significant    Type II error (false negative; 1 − power)    OK


How is sample size determined?

Sample size calculation sets the recruitment target

– Usually a formula is available

– Note: it is analysable data that count, not participants per se

Required size is dependent upon:

– Trial design (e.g. cluster trial)

– Statistical analysis (e.g. t-test)

– Statistical parameters (e.g. sig. level and power)

– Difference we desire to detect (i.e. δ)

Some inputs have conventions, some don’t

– Educated guess sometimes needed


What do we typically need for a standard RCT calculation?

Binary outcome

1. Anticipated control and intervention group rates (implies % target difference)

2. Significance level (α)

3. Power (1-β)

Continuous outcome – either 1, 2, 4 & 5 (or 3–5)

1. Anticipated mean in each group (or more simply the target mean difference)

2. Anticipated standard deviation

3. Mean diff/SD (often called “effect size”)

4. Significance level (α)

5. Power (1-β)

More complicated study designs/statistical analyses require more

inputs and may be framed differently


Choice of Type I & II errors (α and β)

Varying α and power (1 − β) often produces greatly different sample sizes

– For example, for a difference of 80% versus 70% in cure rate post-treatment, α = 5% and power = 80% requires 294 per group

How many does the following need:

– α = 5% and power = 90%?

– α = 1% and power = 90%?

Many clinical trials (and other studies) are far too small!

– Why?
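The 294-per-group figure above can be reproduced with the usual two-proportion formula. A sketch in Python using only the standard library (the function name is illustrative; the formula is the standard normal approximation for comparing two proportions):

```python
from math import ceil, sqrt
from statistics import NormalDist

def n_per_group_two_proportions(p1: float, p2: float,
                                alpha: float = 0.05, power: float = 0.80) -> int:
    """Sample size per group to detect p1 vs p2 with a two-sided test
    (normal approximation to the two-proportion comparison)."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_beta = NormalDist().inv_cdf(power)
    p_bar = (p1 + p2) / 2
    numerator = (z_alpha * sqrt(2 * p_bar * (1 - p_bar))
                 + z_beta * sqrt(p1 * (1 - p1) + p2 * (1 - p2))) ** 2
    return ceil(numerator / (p1 - p2) ** 2)

# 80% vs 70% cure rate, alpha = 5%, power = 80%  ->  294 per group, as on the slide
print(n_per_group_two_proportions(0.80, 0.70))
# Raising the power to 90%, or also tightening alpha to 1%, increases the requirement:
print(n_per_group_two_proportions(0.80, 0.70, power=0.90))
print(n_per_group_two_proportions(0.80, 0.70, alpha=0.01, power=0.90))
```

Running the last two lines answers the slide's question: demanding more power, or a stricter significance level, pushes the per-group requirement well above 294.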


Some ways to increase power

Increase sample size

– Extend recruitment period

– Relax inclusion criteria (though this can work against you by increasing variability)

– Make the trial multi-centre, or add further centres

Increase event rate/reduce variation

– Selectively enrol “high-risk” patients

– Use a combined endpoint or a more precisely measured outcome

– Do not exclude those at most risk of an event (e.g. oldest patients)


Example - FILMS trial

“….. to detect a 6-point ETDRS score difference (an effect size of 0.5) using a t-test at a 5% level of significance and 80% power, it was estimated that 64 participants would be necessary in each group. This calculation was based on data from published studies.14,15”



Target difference

How do we determine the difference we wish to detect?

– Variety of formal and informal approaches available

– They can be judgement based, data driven or a combination

Most seek to identify a target difference which is viewed as important

– e.g. minimum clinically important difference (MCID)

• Hard to pin down!!!

For a continuous outcome, Cohen’s guidance (small, medium and large effect sizes) is often resorted to


Example text – FILMS expanded

FILMS trial: The primary outcome is ETDRS distance visual acuity. A target difference of a mean difference of 5 letters with a common standard deviation (SD) of 12 was assumed. Five letters is equivalent to one line on a visual acuity chart and is viewed as an important difference by patients and clinicians. The SD value was based upon two previous studies – one RCT and one observational comparative study. This target difference is equivalent to a standardised effect size of 0.42. Setting the statistical significance to the 2-sided 5% level and seeking 90% power, 123 participants per group are required; 246 in total.
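The FILMS calculation can be approximated with the standard two-means formula. A sketch in Python, standard library only (the function name is illustrative); note that the normal approximation lands just below the published figure of 123 per group, a small discrepancy that presumably reflects the trial's use of the t distribution:

```python
from math import ceil
from statistics import NormalDist

def n_per_group_two_means(mean_diff: float, sd: float,
                          alpha: float = 0.05, power: float = 0.90) -> int:
    """Sample size per group to detect mean_diff with common SD sd
    (two-sided test, normal approximation to the two-sample comparison)."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_beta = NormalDist().inv_cdf(power)
    d = mean_diff / sd  # standardised effect size (5/12 = 0.42 in FILMS)
    return ceil(2 * (z_alpha + z_beta) ** 2 / d ** 2)

# FILMS inputs: 5-letter difference, SD of 12, two-sided 5% level, 90% power
print(n_per_group_two_means(5, 12))
```

Only the ratio mean_diff/sd enters the formula, which is why a calculation can be framed either with the two raw inputs or with the standardised effect size alone.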


How to calculate sample size

Using a formula

– Lots out there, sometimes only subtly different

Nomogram

Software (recommended approach)

– Formula

– Simulation (e.g. if no formula is readily available)

– Online/Apps
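The simulation option above can be sketched as: simulate many trials at a candidate size, analyse each one, and record the fraction reaching significance (the estimated power). A minimal Monte Carlo sketch for a two-group comparison of means using only the Python standard library (parameters are illustrative; a real design with no closed-form formula would substitute its own data-generation and analysis steps):

```python
import random
from math import sqrt
from statistics import NormalDist, mean

def simulated_power(n_per_group: int, mean_diff: float, sd: float,
                    alpha: float = 0.05, n_sims: int = 2000,
                    seed: int = 1) -> float:
    """Estimate power by simulation: generate trials under the alternative,
    test each with a two-sample z-test (SD treated as known), and return
    the proportion of significant results."""
    rng = random.Random(seed)
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)
    se = sd * sqrt(2 / n_per_group)  # SE of the difference in group means
    hits = 0
    for _ in range(n_sims):
        control = [rng.gauss(0.0, sd) for _ in range(n_per_group)]
        active = [rng.gauss(mean_diff, sd) for _ in range(n_per_group)]
        z = (mean(active) - mean(control)) / se
        if abs(z) >= z_crit:
            hits += 1
    return hits / n_sims

# Effect size 0.5 (difference 6, SD 12) with 64 per group: estimate near 0.80,
# subject to Monte Carlo error of roughly +/- 0.01 at 2000 simulations
print(round(simulated_power(64, 6.0, 12.0), 2))
```

In practice one repeats this over a grid of candidate sample sizes and picks the smallest n whose estimated power clears the target.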

Sample size nomogram

Example

– Power 0.8

– Significance level 0.1

– Effect size 0.5

Crude nomogram: 120

Proper calculation: 128


http://homepage.stat.uiowa.edu/~rlenth/Power/

GOOD FOR ROUGH ESTIMATES, BUT DANGEROUS

GET SOME ADVICE FROM SOMEONE EXPERIENCED (USUALLY NEED TO CONSULT A STATISTICIAN)!!!!


Summary

The sample size is important

– Affects many things

– It needs to fit the aim, design and the analysis

Sample size process is complex

– Not a one-hit wonder

– Easy to go wrong