TRANSCRIPT
Principles of sample size calculation
Jonathan Cook (with thanks to Doug Altman)
Centre for Statistics in Medicine, NDORMS, University of Oxford
EQUATOR – OUCAGS training course, 24 October 2015
Outline
Principles of study design
Principles of study sample size calculation
How to determine the sample size
How to calculate in practice
Summary
Study design – general principles
Play of chance could mislead
– The more subtle a question, the more precise we need to be in the evaluation (i.e. more information, that is, more data)
We need to be clear about our question– What exactly are we interested in?
– How precisely do we want to know it?
Study (including sample size) should be fit for purpose
– Relevant
– Sufficient for the intended analysis
Study size – how big?
Fundamental aspect of study design
– How many participants are needed?
Ethically and scientifically important
– Legitimate experimentation
– Add to knowledge
Impact upon study conduct (e.g. 100 versus 2000)
– Management of project
– Timeframe
– Cost
Principles of sample size calculation
Aim
– We wish to compare the outcome between the treatments and determine if there is a difference between them
Typical approach for RCT sample size calculation
– Choose the key (primary) outcome and base the calculation on it
– Get a large enough sample size to have reassurance that we will be able to detect a meaningful difference in the primary outcome
Main alternative approach
– Seek to estimate a quantity with a given precision
Same principles apply to all types of study
– What we are looking for may well differ
Reaching the wrong conclusion (1)
What can go wrong:
May conclude that there is a difference in outcome between active and control groups, when in fact no such difference exists
Technically called a Type I error
– more usefully called a false-positive result
Probability of making such an error is designated α, commonly known as the significance level
Risk of false-positive conclusion (Type I error) does not decrease as the sample size increases
Reaching the wrong conclusion (2)
May conclude that there is no evidence of a difference in outcomes between active and control groups, when in fact there is such a difference
Technically called a Type II error
– more usefully called a false-negative result
Probability of making such an error is often designated β (1 − β is commonly known as the statistical power)
Risk of missing an important difference (Type II error) decreases as the sample size increases
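The point above can be illustrated numerically. The following is a sketch, not from the slides: it approximates the power of a two-sided two-sample comparison of means (the far tail is ignored, as is conventional), using only Python's standard-library statistics.NormalDist; the illustrative inputs (a 5-point difference, SD 12) are assumptions.

```python
from math import sqrt
from statistics import NormalDist

def approx_power(n_per_group, delta, sd, alpha=0.05):
    """Approximate power of a two-sided two-sample z-test to detect a
    mean difference `delta` with common SD `sd`."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    # Expected test statistic under the alternative hypothesis
    ncp = delta / (sd * sqrt(2 / n_per_group))
    return NormalDist().cdf(ncp - z_alpha)

# Power rises (i.e. the Type II error rate falls) as the per-group size grows:
for n in (20, 50, 100, 200):
    print(n, round(approx_power(n, delta=5, sd=12), 2))
```

Doubling the sample size does not double the power; the gain flattens off as power approaches 1.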
Type I and Type II errors

                               There really is     There really is
                               a difference        no difference
Statistically significant      OK                  Type I error (false positive)
Statistically non-significant  Type II error       OK
                               (false negative; β = 1 − power)
How is sample size determined?
Sample size calculation sets the recruitment target
– Usually a formula is available
– Note: analysable data, not participants per se
Required size is dependent upon:
– Trial design (e.g. cluster trial)
– Statistical analysis (e.g. t-test)
– Statistical parameters (e.g. sig. level and power)
– Difference we desire to detect (i.e. δ)
Some inputs have conventions, some don't
– Educated guess sometimes needed
What do we typically need for a standard RCT calculation?
Binary outcome
1. Anticipated control and intervention group rates (implies % target difference)
2. Significance level (α)
3. Power (1 − β)
Continuous outcome – either 1, 2, 4 & 5 (or 3–5)
1. Anticipated mean in each group (or more simply the target mean difference)
2. Anticipated standard deviation
3. Mean difference/SD (often called the "effect size")
4. Significance level (α)
5. Power (1 − β)
More complicated study designs/statistical analyses require more inputs and may be framed differently
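For a continuous outcome, the inputs above combine in a standard normal-approximation formula: n per group = 2(z₁₋α/₂ + z_power)² × (SD/δ)². A minimal sketch (not from the slides; a t-test-based calculation would add one or two participants per group), using only Python's standard library:

```python
from math import ceil
from statistics import NormalDist

def n_per_group_continuous(delta, sd, alpha=0.05, power=0.80):
    """Participants per group to detect a mean difference `delta` with
    common standard deviation `sd` in a two-sided test, via the normal
    approximation 2 * (z_alpha + z_power)^2 * (sd/delta)^2."""
    z = NormalDist().inv_cdf
    return ceil(2 * (z(1 - alpha / 2) + z(power)) ** 2 * (sd / delta) ** 2)

# An effect size (delta/SD) of 0.5 at the 5% level with 80% power:
print(n_per_group_continuous(delta=0.5, sd=1.0))  # 63 per group
```

Note that only the ratio δ/SD matters here, which is why inputs 1 and 2 can be replaced by input 3 (the standardised effect size).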
Choice of Type I & II errors (α and β)
Varying α and power (1 − β) often produces greatly different sample sizes
– For example, for a difference of 80% versus 70% in cure rate post treatment, α = 5% and power = 80% requires 294 per group
How many does the following need:
– α = 5% and power = 90%?
– α = 1% and power = 90%?
Many clinical trials (and other studies) are far too small!
– Why?
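The worked example and the two follow-up questions above can be reproduced with a short calculation. This is a sketch, not taken from the slides: it uses the pooled-variance normal approximation for comparing two proportions (other formulas, e.g. with a continuity correction, give slightly different numbers), with z-quantiles from Python's standard-library statistics.NormalDist.

```python
from math import ceil, sqrt
from statistics import NormalDist

def n_per_group_binary(p1, p2, alpha=0.05, power=0.80):
    """Participants per group to detect rates p1 vs p2 in a two-sided
    test, using the pooled-variance normal approximation."""
    z = NormalDist().inv_cdf
    z_a, z_b = z(1 - alpha / 2), z(power)
    p_bar = (p1 + p2) / 2
    numerator = (z_a * sqrt(2 * p_bar * (1 - p_bar))
                 + z_b * sqrt(p1 * (1 - p1) + p2 * (1 - p2))) ** 2
    return ceil(numerator / (p1 - p2) ** 2)

print(n_per_group_binary(0.80, 0.70))                          # 294 per group
print(n_per_group_binary(0.80, 0.70, power=0.90))              # 392 per group
print(n_per_group_binary(0.80, 0.70, alpha=0.01, power=0.90))  # 556 per group
```

Tightening both error rates nearly doubles the required size in this example, which hints at one reason so many trials end up too small.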
Some ways to increase power
Increase sample size
– Extend recruitment period
– Relax inclusion criteria (though this can work against power by increasing variability)
– Make the trial multi-centre, or add further centres
Increase event rate/reduce variation
– Selectively enrol “high-risk” patients
– Use a combined endpoint or a more precisely measured outcome
– Do not exclude those at most risk of an event (e.g. oldest patients)
Example - FILMS trial
“….. to detect a 6-point ETDRS score difference (an effect size of 0.5) using a t-test at a 5% level of significance and 80% power, it was estimated that 64 participants would be necessary in each group. This calculation was based on data from published studies.14,15”
Target difference
How do we determine the difference we wish to detect?
– Variety of formal and informal approaches available
– They can be judgement based, data driven or a combination
Most seek to identify a target difference which is viewed as important
– e.g. minimum clinically important difference (MCID)
• Hard to pin down!!!
For a continuous outcome, Cohen's guidance is often resorted to (small, medium and large)
Example text – FILMS expanded
FILMS trial: The primary outcome is ETDRS distance visual acuity. A target difference of a mean difference of 5 letters with a common standard deviation (SD) of 12 was assumed. Five letters is equivalent to one line on a visual acuity chart and is viewed as an important difference by patients and clinicians. The SD value was based upon two previous studies – one RCT and one observational comparative study. This target difference is equivalent to a standardised effect size of 0.42. Setting the statistical significance to the 2-sided 5% level and seeking 90% power, 123 participants per group are required; 246 in total.
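The FILMS numbers can be checked by hand. This sketch is not the trial's actual method (which the text does not specify beyond "t-test"): it uses the normal approximation for a two-sample t-test plus the common z²/4 small-sample correction, with standard-library z-quantiles.

```python
from math import ceil
from statistics import NormalDist

def n_per_group_t(effect_size, alpha=0.05, power=0.80):
    """Per-group n for a two-sample t-test on a standardised effect
    size (mean difference / SD): normal approximation plus the usual
    z^2/4 small-sample correction."""
    z = NormalDist().inv_cdf
    z_a, z_b = z(1 - alpha / 2), z(power)
    return ceil(2 * (z_a + z_b) ** 2 / effect_size ** 2 + z_a ** 2 / 4)

print(n_per_group_t(5 / 12, power=0.90))  # 123, matching the text above
print(n_per_group_t(0.5))                 # 64, as in the earlier FILMS quote
```

Reassuringly, the same crude formula reproduces both the 64-per-group figure (effect size 0.5, 80% power) and the 123-per-group figure (effect size 5/12, 90% power).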
How to calculate sample size
Using a formula
– Lots out there, sometimes only subtly different
Nomogram
Software (recommended approach)
– Formula
– Simulation (e.g. if no formula is readily available)
– Online/Apps
Sample size nomogram
[Nomogram figure: total sample size read from a line joining power, significance level and standardised effect size]
Example: power = 0.8, significance level = 0.1, effect size = 0.5
– Crude nomogram reading: 120
– Proper calculation: 128
http://homepage.stat.uiowa.edu/~rlenth/Power/
GOOD FOR ROUGH ESTIMATES BUT DANGEROUS
GET SOME ADVICE FROM SOMEONE EXPERIENCED (USUALLY NEED TO CONSULT A STATISTICIAN)!!!!