TRANSCRIPT
14/06/2012
Sample size and power analysis
You should have at least 80% power of your study
(Anonymous)
Sample size conversation
Source: http://www.xtranormal.com/watch/6871831/biostatistics-vs-lab-research
Two scenarios
A researcher conducted a study comparing the effect of an intervention vs. placebo on reducing body weight, and found a 5 lb reduction in the intervention group with P=0.01.
Another researcher conducted a similar study comparing the effect of the same intervention vs. the same placebo on reducing body weight, and found the same 5 lb reduction in the intervention group, but could not claim that the intervention was effective because P=0.35.
What do you think the crying researcher did differently from the smiling one?
Question (1) to Statistician
Question: How can I make my P-value smaller?
Enroll as many as you can.
Answer (1) to Researcher
You almost always need to estimate the required sample size, or to estimate analytical power given a sample size, when you are planning a study. The only exception may be a pilot study (a smaller study to show feasibility, or to collect data to plan a larger study).
Through this process, you can avoid wasting your efforts and resources conducting studies that are hopeless to begin with.
Question (2) to Statistician
Question: Can I keep enrolling participants
into my study until I observe P<0.05?
Answer (2) to Researcher: Absolutely NOT
(Repeatedly testing until P<0.05 inflates the Type I error rate.)
Question (3) to Statistician
If only I had a cent for every time I was asked:
How many participants do I need for my study?
Answer (3) to Researcher
The purpose of sample size formulae ‘is not to give an exact number…but rather to subject the study design to scrutiny, including an assessment of the validity and reliability of data collection, and to give an estimate to distinguish whether tens, hundreds, or thousands of participants are required’
Williamson et al. (2000) JRSSA 163(1): 5-13
Answer (3) to Researcher
It is not an easy question like:
How much money should I take on my holidays?
Statistics lecture
There is no such thing as a sample size problem. Sample size is but one aspect of study design. When you are asked to help determine the sample size, a lot of questions must be asked and answered before you get to that one… You may often end up never discussing sample size because there are other matters that override it in importance.
Russell Lenth (2001)
Sample size depends not only on the desired power, but also on the true variability in the population and on the specification of a practically significant effect size.
Question (4) to Statistician
Question: How do these terms fit together?
Session
Start of 1-hour session
Ingredients
α, Significance level / Type I error: the probability of erroneously rejecting H0 (concluding that there is an effect, when in fact there is no effect).
β, Type II error: the probability of erroneously failing to reject H0 (concluding that there is no effect, when in fact there is an effect).
1-β, Power: the chance of correctly detecting H1 (concluding that there is an effect, when in fact there is an effect).
Δ, Effect: the difference in body weight between intervention and placebo groups.
Significance level (α) • First type of error: concluding that there is an effect, when in fact there is no effect.
The α level of your test is the probability that you will falsely conclude that the program has an effect, when in fact it does not.
So with an α level of 5%, you can be 95% confident in the validity of your conclusion that the program had an effect.
For policy purposes, you want to be very confident in the answer you give: the α level will be set fairly low.
Common levels of α: 5%, 1%, 10%.
1-β, Power
Purpose of power analysis
Power analyses need to be conducted to ensure
adequate sample size to detect a meaningful
effect of your intervention
Interpretation
A power of 80% tells us that, in 80% of experiments of this sample size conducted in this population, if there is indeed an effect in the population, we will be able to say in our sample that there is an effect at the desired level of confidence. The larger the sample, the larger the power. Common powers used: 80%, 90%.
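This frequency interpretation can be checked by simulation: repeat the experiment many times under a true effect and count how often P<0.05. A minimal sketch assuming scipy is available; the sample size of 64 per group and standardized effect of 0.5 are illustrative choices that give roughly 80% power:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)
n_per_group, true_diff, sd = 64, 0.5, 1.0  # d = 0.5 gives ~80% power at n = 64
n_sims = 2000
rejections = 0
for _ in range(n_sims):
    control = rng.normal(0.0, sd, n_per_group)
    treated = rng.normal(true_diff, sd, n_per_group)
    _, p = stats.ttest_ind(control, treated)  # two-sample t-test
    if p < 0.05:
        rejections += 1
empirical_power = rejections / n_sims  # fraction of "significant" experiments
```

The empirical rejection rate should land near 0.80, illustrating that power is a long-run frequency over repeated experiments.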
Visual concept of power
[Figure: null distribution (difference = 0) and clinically relevant alternative (difference = 10%), each with standard error 3.3.]
Rejection region: any value >= 6.5 (0 + 3.3 × 1.96). For a 5% significance level, each one-tail area = 2.5% (Z_α/2 = 1.96).
Power = the chance of being in the rejection region if the alternative is true = the area to the right of this cutoff (in yellow).
Visual concept of power
Rejection region: any value >= 6.5 (0 + 3.3 × 1.96).
Power = the chance of being in the rejection region if the alternative is true = the area to the right of this cutoff (in yellow).
Power here: P(Z > (6.5 − 10)/3.3) = P(Z > −1.06) = 85%
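The 85% figure can be reproduced numerically. A sketch assuming scipy, using the slide's values (null mean 0, alternative mean 10, standard error 3.3):

```python
from scipy.stats import norm

null_mean, alt_mean, se = 0.0, 10.0, 3.3
z_crit = norm.ppf(0.975)                 # 1.96 for a two-sided 5% test
cutoff = null_mean + z_crit * se         # rejection cutoff, ~6.5
# Power = probability of exceeding the cutoff under the alternative
power = norm.sf((cutoff - alt_mean) / se)
```

With the exact 1.96 critical value this gives about 0.86, matching the slide's rounded 85%.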
Is power analysis always needed?
Needed when:
• Designing a study
• Applying for a grant
Less needed when:
• Secondary data analysis
• Pilot study to assess effect
A priori power analysis
You want to find how many cases you will need to achieve a specified amount of power, given a specified effect size and the criterion of significance to be employed.
A posteriori power analysis
You want to find out what the power would be for a specified effect size, sample size, and criterion of significance to be employed.
Δ, Effect sizes
Effect size
• A descriptive metric that characterizes the
standardized difference (in SD units) between the
mean of a control group and the mean of a
treatment group (intervention)
• Can also be calculated from correlational data
derived from pre-experimental designs or from
repeated measures designs
Sources for finding an effect size
• On the basis of previous research (meta-analysis): reviewing the previous literature and calculating the previously observed effect size (in the same and/or similar situations).
• Pilot study: when no prior studies exist from which one can extrapolate an ES, it is often appropriate to conduct a small study with 10-20 participants in order to get an initial estimate of the effect size.
• On the basis of theoretical importance: deciding whether a small, medium, or large effect is required; the smallest size that would be clinically meaningful.
Unstandardized effect size
GB = 38.07 s, USA = 38.08 s, D = 0.01 s
Standardized effect size
• The standard deviation captures the variability in the outcome: the more variability, the higher the standard deviation.
• The standardized effect size is the effect size divided by the standard deviation of the outcome:
d = effect size / standard deviation
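As a small sketch of the formula (the 5 lb reduction is taken from the opening scenarios; the 10 lb pooled standard deviation is a hypothetical number for illustration):

```python
def cohens_d(mean_treatment, mean_control, sd_pooled):
    """Standardized effect size: the raw difference in SD units."""
    return (mean_treatment - mean_control) / sd_pooled

# Hypothetical: a 5 lb weight reduction with a pooled SD of 10 lb
d = cohens_d(5.0, 0.0, 10.0)  # 0.5
```

The same raw difference yields a larger standardized effect when the outcome is less variable, which is why the SD belongs in the denominator.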
Zero effect size
d = 0.00
Control Group / Intervention Group: overlapping distributions
d = 0.00 means that the average treatment participant outperformed 50% of the control participants
Moderate effect size
Control Group / Treatment Group
d = 0.40
d = 0.40 means that the average treatment participant outperformed 65% of the control participants
Large effect size
Control Group / Intervention Condition
d = 0.85
d = 0.85 means that the average treatment participant outperformed 80% of the control participants
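These "outperformed X% of controls" figures follow from the normal model: assuming both groups are normally distributed with equal SDs, the average treatment participant sits at the Φ(d) percentile of the control distribution. A sketch assuming scipy:

```python
from scipy.stats import norm

# Percentile of the average treatment participant within the control
# distribution, for the effect sizes shown on the slides
percentiles = {d: norm.cdf(d) * 100 for d in (0.0, 0.40, 0.85)}
# d = 0.00 -> 50%, d = 0.40 -> ~65.5%, d = 0.85 -> ~80.2%
```

This reproduces the slide's 65% and 80% figures, and shows that a zero effect corresponds to the 50th percentile (complete overlap), not 0%.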
Attrition rate
Study design
Measurement of outcome
Attrition rate
If the study is longitudinal or an intervention study, the sample size needs to be adjusted for the attrition rate.
Get attrition estimates from pilot studies or from the literature on studies in the same population; a default estimate would be 20%.
Do the power calculation and then adjust the sample size:
Final N = (N from power estimate) / (1 − attrition rate)
Example: 20% attrition rate
Power analysis yields a total sample size of 100
Targeted N=???
Study design
Different designs have different power distributions and considerations:
- A regression-type design differs from a 2 × 2 ANOVA
- Longitudinal vs. cross-sectional designs
Power programs are harder to find for some designs than others:
- Longitudinal
- Nested/clustered designs
- Dichotomous and categorical outcomes
Keep in mind the aim of the study, not just the design.
Measurement of outcome
The level of measurement of the outcome can influence power estimates:
- Differences in means (e.g., an intervention study looking at differences in depression using the CES-D)
- Differences in proportions (e.g., an intervention study looking at differences in depression diagnosis)
Power is done for the primary outcome.
If there are several important outcomes, conduct power for all of them and select the sample size so that power is at least .80 for all outcomes.
Inter-relationship
Factors needed for sample sizes
Power
Size of the effect
- Study design
- Measurement of outcome
Significance level desired
Attrition
Inter-relationship
n, sample size — α, significance level — Δ, effect size — 1-β, power
Standard case (alpha = 0.05)
[Figure: sampling distributions of the test statistic T under H0 and under HA; the tail beyond the critical value shows POWER = 1 − β for the given effect size.]
Increased α (alpha = 0.1)
[Figure: as above with a larger rejection region, so power increases.]
Decreased α (alpha = 0.01)
[Figure: as above with a smaller rejection region, so power decreases.]
Increased n (alpha = 0.05)
[Figure: narrower sampling distributions under H0 and HA, so power increases.]
Increased effect size (alpha = 0.05)
[Figure: sampling distributions further apart under H0 and HA, so power increases.]
Key points
The power of a statistical test is influenced by the:
• Sample size (n): ↑ n → power ↑
• Significance level (α): ↑ α → power ↑
• Difference (effect) to be detected (Δ): ↑ Δ → power ↑
• Variation in the outcome (σ²): ↓ σ² → power ↑
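All four relationships can be verified with a normal-approximation power formula for a two-sided, two-sample comparison of means (the helper function is a sketch, not taken from any particular package):

```python
from scipy.stats import norm

def power_two_sample(delta, sigma, n, alpha=0.05):
    """Approximate power of a two-sided two-sample test of means,
    with n participants per group, using the normal approximation."""
    se = sigma * (2.0 / n) ** 0.5            # SE of the difference in means
    z_crit = norm.ppf(1 - alpha / 2)         # two-sided critical value
    shift = delta / se                       # standardized true difference
    return norm.sf(z_crit - shift) + norm.cdf(-z_crit - shift)

# Baseline: the HbA1c example's parameters (delta=1, sigma=2.2, n=77) -> ~0.80
base = power_two_sample(delta=1.0, sigma=2.2, n=77)
```

Raising n, α, or Δ, or lowering σ, all push the value above the baseline, exactly as the bullet list states.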
Key points
What we need → Where we get it:
• Significance level → often conventionally set at 5%. The lower it is, the larger the sample size needed for a given power.
• The mean and the variability of the outcome in the comparison group → from previous surveys conducted in similar settings. The larger the variability, the larger the sample for a given power.
• The effect size that we want to detect → what is the smallest effect that should prompt a policy response? The smaller the effect size, the larger the sample size we need for a given power.
Reporting power and sample size
CONSORT 22-point checklist PAPER SECTION Item Description
TITLE & ABSTRACT 1 How participants were allocated to interventions (e.g., "random allocation", "randomized", or "randomly assigned").
INTRODUCTION
Background 2 Scientific background and explanation of rationale.
METHODS
Participants 3 Eligibility criteria for participants and the settings and locations where the data were collected.
Interventions 4 Precise details of the interventions intended for each group and how and when they were actually administered.
Objectives 5 Specific objectives and hypotheses.
Outcomes 6 Clearly defined primary and secondary outcome measures and, when applicable, any methods used to enhance the quality of
measurements (e.g., multiple observations, training of assessors).
Sample size 7 How sample size was determined and, when applicable, explanation of any interim analyses and stopping rules.
Randomization --
Sequence generation
8 Method used to generate the random allocation sequence, including details of any restriction (e.g., blocking, stratification).
Randomization --
Allocation concealment
9 Method used to implement the random allocation sequence (e.g., numbered containers or central telephone), clarifying whether the
sequence was concealed until interventions were assigned.
Randomization --
Implementation
10 Who generated the allocation sequence, who enrolled participants, and who assigned participants to their groups.
Blinding (masking) 11 Whether or not participants, those administering the interventions, and those assessing the outcomes were blinded to group
assignment. When relevant, how the success of blinding was evaluated.
Statistical methods 12 Statistical methods used to compare groups for primary outcome(s); Methods for additional analyses, such as subgroup analyses and
adjusted analyses.
RESULTS
Participant flow 13 Flow of participants through each stage (a diagram is strongly recommended). Specifically, for each group report the numbers of
participants randomly assigned, receiving intended treatment, completing the study protocol, and analyzed for the primary outcome. Describe protocol deviations from study as planned, together with reasons.
Recruitment 14 Dates defining the periods of recruitment and follow-up.
Baseline data 15 Baseline demographic and clinical characteristics of each group.
Numbers analyzed 16 Number of participants (denominator) in each group included in each analysis and whether the analysis was by "intention-to-treat".
State the results in absolute numbers when feasible (e.g., 10/20, not 50%).
Outcomes and
estimation
17 For each primary and secondary outcome, a summary of results for each group, and the estimated effect size and its precision (e.g.,
95% confidence interval).
Ancillary analyses 18 Address multiplicity by reporting any other analyses performed, including subgroup analyses and adjusted analyses, indicating those
pre-specified and those exploratory.
Adverse events 19 All important adverse events or side effects in each intervention group.
DISCUSSION
Interpretation 20 Interpretation of the results, taking into account study hypotheses, sources of potential bias or imprecision and the dangers associated
with multiplicity of analyses and outcomes.
Generalizability 21 Generalizability (external validity) of the trial findings.
Overall evidence 22 General interpretation of the results in the context of current evidence.
Scientific rationale
Patient population
Sample size
Study designs & methods
Patient flow
Statistical analysis & results
Interpretation
Reporting power
Reality vs. scientific validity
The final n must balance reality (resources) against scientific validity (sample size formulae).
Resources
• Number of available participants
• Laboratory resources
–Diagnostic tests, training program etc. – if
needed
• Time you have available
–Set by funding agency
–Set by your career trajectory
• Funds and personnel
Example and software
Example
A clinician wants to conduct an RCT to assess the effect of an intervention to reduce HbA1c level among patients with type 2 diabetes. Pilot data suggest that the mean HbA1c level among patients without this intervention is 8.7% with a standard deviation of 2.2%. We believe that the intervention will decrease patients' HbA1c level by 1%. A total of 154 patients (77 patients in each group) are needed to achieve 80% power at a two-sided 5% significance level.
Estimation of sample size comparing two group means (independent-sample t-test): comparing post-trial values
Example
• Select the appropriate statistical test, based on the types of
outcome measures.
• Determine the minimum effect size.
• For continuous outcomes, estimate the standard deviation. For
dichotomous outcomes, estimate the baseline risk or incidence/
prevalence of the event.
• Set limits for Type I (α) and Type II (β) error.
• Specify your null hypothesis and alternative hypothesis (1-
tailed or 2-tailed).
Parameters needed for sample size computation
α, significance level = 5% (2-sided)
1-β, power = 80%
Δ, effect size = 1
σ, variability = 2.2
m, sample size ratio between the two groups = 1
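These parameters can be fed to a power routine in code as well as to G*Power; a sketch assuming the statsmodels library is available:

```python
import math
from statsmodels.stats.power import TTestIndPower

# HbA1c example: delta = 1, sigma = 2.2, alpha = 0.05, power = 0.80
d = 1.0 / 2.2                      # standardized effect size ~0.455
analysis = TTestIndPower()
n_per_group = analysis.solve_power(effect_size=d, alpha=0.05,
                                   power=0.80, ratio=1.0,
                                   alternative='two-sided')
# Rounds up to 77 per group, 154 in total, matching the example
```

The same function can be run "a posteriori" by supplying nobs1 and leaving power unspecified, mirroring the a priori / a posteriori distinction earlier in the talk.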
Software
G*Power: http://www.psycho.uni-duesseldorf.de/aap/projects/gpower/
PS: http://biostat.mc.vanderbilt.edu/twiki/bin/view/Main/PowerSampleSize
Russ Lenth: http://www.stat.uiowa.edu/~rlenth/Power/
Epi Info: http://www.cdc.gov/Epiinfo/
WinPepi: http://www.brixtonhealth.com/pepi4windows.html
PASS: http://www.ncss.com/pass.html
G*Power 3
Determine the effect size
• Click on Determine
• Select n1=n2 for equal sample size
• Calculate and transfer to main window
Determine sample size
154 patients
77 in each group
Determine power
Achieved 80%
power for 154
patients
Sample sizes vs. power
Conclusion
The crying researcher understood 80% (CI 70%-90%) of what he needs for a smaller p-value:
• Better understanding of study design
• Good knowledge of the outcome measure
• Good statistical approach
→ Greater power to detect a true difference!
Optimum sample size?
If in doubt…
Call Biostatisticians!!!!
FCEB (Flinders Centre for Epidemiology and Biostatistics) Discipline of General Practice
Level 3 Health Sciences Building
Flinders Medical Centre
Don’t miss…
FCEB Launch!!!!
3:00 PM today
Rooms 3.06-3.09, Health Science Lecture Theatre Complex