
Page 1: Statistical Power and Sample Size Calculations Drug Development Statistics & Data Management July 2014 Cathryn Lewis Professor of Genetic Epidemiology

Statistical Power and Sample Size Calculations

Drug Development Statistics & Data Management

July 2014

Cathryn Lewis
Professor of Genetic Epidemiology & Statistics
Department of Medical & Molecular Genetics
King’s College London

With thanks to Irene Rebollo Mesa and Frühling Rijsdijk

Page 2: Outline

1. Concepts of power
2. Power and types of error
3. Software to calculate power
4. Power for continuous outcome
5. Power for proportion, success/failure
6. Quiz!

Page 3: Planning a Study

Question: What are the study endpoints?

Types of endpoints:

• Binary clinical outcome: death from disease.

• Quantitative: creatinine, cholesterol levels, quality of life (QOL).

• Time to event: time to graft failure, time to death, time to recovery.

Good qualities:

- Clinically meaningful

- Practical and feasible to measure

- Occurs frequently enough over the duration of the trial

Page 4: Planning a Study

Question: What is the expected prevalence of the outcome (discrete) or variability of the outcome (continuous)?

• Based on previous studies, a pilot study, or a hospital/NHS report.
• Variability and prevalence are vital for power.
• Both are best at intermediate levels.

Question: What is the expected difference between groups

• in proportion of events (if discrete), or
• in mean measure (if continuous)?

• Based on previous studies or a pilot study.
• Alternatively, the minimum clinically relevant difference.
• The larger the difference, the higher the power.

Page 5: Design: What is your Hypothesis?

1. Superiority

Objective: To determine whether there is evidence of a statistical difference in the comparison of interest between two treatment (Tx) regimes:

A: Tx of interest

B: Placebo or active control Tx

H0: The two Txs have equal effect with respect to the mean response: μA = μB

H1: The two Txs are different with respect to the mean response: μA ≠ μB

Page 6: Statistical Power

Page 7: Power

• Definition: The expected proportion of samples in which we correctly reject the null hypothesis

• It depends on:

1. Size of the (treatment) effect in the population

2. The significance level at which we reject the null (α, typically 0.05)

3. Sample size (N)

4. Design of the study: parallel, crossover, etc.

5. Endpoint measurement (categorical, ordinal, continuous)

6. The expected dropout rate
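The first three factors, plus dropout, can be sketched numerically. The following is a minimal sketch, assuming a two-sided z-test on the difference between two group means with a known common standard deviation. The function names and the example values (difference 5, standard deviation 10, 10% dropout) are illustrative assumptions, not taken from the slides.

```python
from math import ceil, sqrt
from statistics import NormalDist


def power_two_means(delta, sigma, n_per_group, alpha=0.05):
    """Approximate power of a two-sided z-test comparing two group means.

    Assumes a known common standard deviation and equal group sizes.
    """
    z = NormalDist()
    se = sigma * sqrt(2.0 / n_per_group)   # standard error of the difference in means
    z_crit = z.inv_cdf(1 - alpha / 2)      # rejection threshold on the test statistic
    shift = abs(delta) / se                # expected value of the test statistic under HA
    # Probability that the statistic falls in either rejection region under HA
    return z.cdf(shift - z_crit) + z.cdf(-shift - z_crit)


def inflate_for_dropout(n_per_group, dropout_rate):
    """Recruit enough subjects that n_per_group are expected to complete."""
    return ceil(n_per_group / (1 - dropout_rate))


# Hypothetical values: power rises with sample size for a fixed effect.
for n in (20, 40, 80, 160):
    print(n, round(power_two_means(delta=5, sigma=10, n_per_group=n), 3))

print(inflate_for_dropout(100, 0.10))
```

With no true effect (`delta=0`) the function returns the significance level itself, which is the Type I error rate.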

Page 8: Power primer

• We summarise the results of a trial in a statistical analysis with a test statistic (e.g. chi-squared, Z score)

• This provides a measure of support for a certain hypothesis

• We pre-determine a threshold on the test statistic for rejecting the null hypothesis

YES or NO decision-making: significance testing

[Figure: test statistic axis divided into YES and NO regions]

This inevitably leads to two types of mistake:

• false positive (YES instead of NO) (Type I)

• false negative (NO instead of YES) (Type II)

Page 9: Standard Case

[Figure: sampling distributions of the test statistic T if H0 were true and if HA were true; alpha 0.05; POWER: 1 - β]

Page 10:

• H0 true, rejection of H0: significance level, Type I error = α

• HA true, rejection of H0: power, 1 - Type II error = 1 - β

• HA true, non-rejection of H0: Type II error = β

Page 11: Hypothesis testing

• Null hypothesis: no effect

• A ‘significant’ result means that we can reject the null hypothesis

• A ‘non-significant’ result means that we cannot reject the null hypothesis

Page 12: Statistical significance

• The ‘p-value’: the probability of obtaining a result at least as extreme as the one observed if the null were in fact true

• We reject the null when the p-value falls below the significance level, which is the probability of a false positive error if the null were in fact true

• Typically, we are willing to incorrectly reject the null 5% or 1% of the time (Type I error)

Page 13:

• H0 true, rejection of H0: significance level, Type I error = α

• HA true, rejection of H0: power, 1 - Type II error = 1 - β

• HA true, non-rejection of H0: Type II error = β

Page 14:

• H0 true, rejection of H0: Type I error at rate α

• H0 true, non-rejection of H0: nonsignificant result (1 - α)

• HA true, rejection of H0: significant result (1 - β)

• HA true, non-rejection of H0: Type II error at rate β
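The four outcomes can be checked by simulation. This is a minimal sketch, assuming normally distributed outcomes with known standard deviation and a two-sided z-test. The function name `rejection_rate` and all parameter values are illustrative assumptions. With no true difference, the rejection rate should be close to α. With a true difference, it estimates the power.

```python
import random
from math import sqrt
from statistics import NormalDist, fmean


def rejection_rate(true_diff, sigma=1.0, n=50, alpha=0.05, trials=2000, seed=0):
    """Fraction of simulated two-group trials in which a two-sided z-test rejects H0."""
    rng = random.Random(seed)                      # fixed seed so the run is repeatable
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)   # threshold on the test statistic
    se = sigma * sqrt(2.0 / n)                     # standard error of the mean difference
    rejections = 0
    for _ in range(trials):
        group_a = [rng.gauss(true_diff, sigma) for _ in range(n)]
        group_b = [rng.gauss(0.0, sigma) for _ in range(n)]
        z_stat = (fmean(group_a) - fmean(group_b)) / se
        if abs(z_stat) > z_crit:
            rejections += 1
    return rejections / trials


type_i_rate = rejection_rate(true_diff=0.0)   # H0 true: rejections are Type I errors
power = rejection_rate(true_diff=0.5)         # HA true: rejections are correct decisions
print("Type I error rate:", type_i_rate)
print("Power:", power, "Type II error rate:", round(1 - power, 3))
```

For these settings the normal-approximation power is about 0.71, so the simulated value should land near that figure.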

Page 15: Standard Case

[Figure: sampling distributions of the test statistic T if H0 were true and if HA were true; alpha 0.05; POWER: 1 - β]

Page 16: Increased effect size

[Figure: sampling distributions of the test statistic T if H0 were true and if HA were true; alpha 0.05; POWER: 1 - β ↑]

Page 17: More conservative α

[Figure: sampling distributions of the test statistic T if H0 were true and if HA were true; alpha 0.01; POWER: 1 - β ↓]

Page 18: Less conservative α

[Figure: sampling distributions of the test statistic if H0 were true and if HA were true; alpha 0.1; POWER: 1 - β ↑]

Page 19: Reduced variation

[Figure: sampling distributions of the test statistic T if H0 were true and if HA were true; alpha 0.05; POWER: 1 - β ↑]
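The direction of each change in the preceding figures can be reproduced numerically. This is a rough sketch under a normal approximation for a two-sided comparison of two means. The baseline values (difference 5, standard deviation 10, 40 per group) are hypothetical and chosen only to make the contrasts visible.

```python
from math import sqrt
from statistics import NormalDist


def power(delta, sigma, n, alpha):
    """Normal-approximation power for a two-sided comparison of two means."""
    z = NormalDist()
    shift = abs(delta) / (sigma * sqrt(2.0 / n))   # distance between the H0 and HA distributions
    z_crit = z.inv_cdf(1 - alpha / 2)
    return z.cdf(shift - z_crit) + z.cdf(-shift - z_crit)


# Each scenario changes exactly one input relative to the standard case.
scenarios = {
    "standard case": dict(delta=5, sigma=10, n=40, alpha=0.05),
    "increased effect size": dict(delta=7, sigma=10, n=40, alpha=0.05),
    "more conservative alpha": dict(delta=5, sigma=10, n=40, alpha=0.01),
    "less conservative alpha": dict(delta=5, sigma=10, n=40, alpha=0.10),
    "reduced variation": dict(delta=5, sigma=8, n=40, alpha=0.05),
}

results = {name: power(**params) for name, params in scenarios.items()}
for name, value in results.items():
    print(f"{name:26s} power = {value:.3f}")
```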

Page 20: Determining Sample Size

We need:

– Acceptable type I error rate (α)

• usually 0.05, or 0.025 if one-sided

– A meaningful difference in the response: the smallest Tx effect that is clinically worth detecting or that we wish to detect

– The desired power (1 - β) to detect this difference, minimum 80%

– Ratio of allocation to the groups (equal sample sizes?)

– Whether to use a one-sided or two-sided test

In addition:

– The variability common to the two populations, for a continuous endpoint

– The response (event) rate of the control group, for a binary endpoint
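For a continuous endpoint, these ingredients combine in the usual normal-approximation formula. The notation is an assumption introduced here rather than taken from the slides: Δ is the difference to detect, σ the common standard deviation, r = n_2 / n_1 the allocation ratio, and z_q the q-th quantile of the standard normal distribution.

```latex
n_1 = \left(1 + \frac{1}{r}\right)
      \frac{\sigma^2 \left(z_{1-\alpha/2} + z_{1-\beta}\right)^2}{\Delta^2},
\qquad
n_2 = r\, n_1
```

With equal allocation (r = 1) this reduces to n = 2σ²(z_{1-α/2} + z_{1-β})² / Δ² per group. For a one-sided test, z_{1-α/2} is replaced by z_{1-α}.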

Page 21: Calculating power using software or Web

- PRISM StatMate ($50)

- G*Power 3 (free)

- Statistical software: SPSS, SAS, Stata, R

- PS Power and Sample Size Calculation (free) (Windows)

- Web: Google “Statistical Power Calculation”

- Russell V. Lenth: http://www.stat.uiowa.edu/~rlenth/Power/

- David Schoenfeld: http://hedwig.mgh.harvard.edu/sample_size/size.html

- Perform the calculation with two methods and check that the answers are similar

Page 22: Russ Lenth’s Power and Sample size page

http://www.stat.uiowa.edu/~rlenth/Power/

Page 23:

http://hedwig.mgh.harvard.edu/sample_size/size.html

Page 24: Determining Sample Size: Continuous outcome

• Two anti-hypertensives
– Testing for superiority

• Endpoint: difference in diastolic BP
– Continuous variable

• Relevant parameters
– Difference in diastolic BP between drugs: 2 mm Hg
– Standard deviation of diastolic BP in each group: 10 mm Hg
– Significance level: 0.05
– Required power: 0.8
– Assume equal-sized groups

• Calculate sample size required

393 patients in each group
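The figure of 393 per group can be reproduced with the normal-approximation formula for two means, assuming a two-sided test. The slide does not say which method the software used, so this is a sketch of one calculation that matches, and the function name `n_per_group_means` is hypothetical.

```python
from math import ceil
from statistics import NormalDist


def n_per_group_means(delta, sigma, alpha=0.05, power=0.80):
    """Sample size per group for a two-sided comparison of two means (equal groups)."""
    z = NormalDist()
    z_alpha = z.inv_cdf(1 - alpha / 2)   # two-sided significance threshold
    z_beta = z.inv_cdf(power)            # quantile corresponding to the required power
    return ceil(2 * (sigma * (z_alpha + z_beta) / delta) ** 2)


# Parameters from the anti-hypertensive example: 2 mm Hg difference, SD 10 mm Hg.
print(n_per_group_means(delta=2, sigma=10))  # 393
```

Calculations based on the t distribution rather than the normal approximation generally give a slightly larger number.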

Page 25: Russ Lenth’s Power and Sample size page

http://www.stat.uiowa.edu/~rlenth/Power/


Page 27: Power, by difference between two groups

Page 28: Continuous outcome

Page 29: Determining Sample Size: Discrete Example

• APT070 perfusion vs. cold storage of kidney
• Testing for superiority

• Endpoint: delayed graft function after transplantation
• Proportion of patients experiencing delayed graft function

• Relevant parameters
• Baseline prevalence: 35%
• Minimum clinically significant difference: 10%
• p1 = 0.35, p2 = 0.25 [proportion with delayed graft function in each group]

• Significance level = 0.05
• Power = 80%

• Calculate sample size required

349 patients in each group
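One calculation that reproduces 349 per group is the normal-approximation formula for two proportions with the Fleiss continuity correction. That this is what the software behind the slide used is an assumption, and the function name is hypothetical. Without the correction the same formula gives a smaller number.

```python
from math import ceil, sqrt
from statistics import NormalDist


def n_per_group_proportions(p1, p2, alpha=0.05, power=0.80, continuity=True):
    """Sample size per group for a two-sided comparison of two proportions."""
    z = NormalDist()
    z_alpha = z.inv_cdf(1 - alpha / 2)
    z_beta = z.inv_cdf(power)
    p_bar = (p1 + p2) / 2          # pooled proportion under H0 (equal groups)
    diff = abs(p1 - p2)
    n = (z_alpha * sqrt(2 * p_bar * (1 - p_bar))
         + z_beta * sqrt(p1 * (1 - p1) + p2 * (1 - p2))) ** 2 / diff ** 2
    if continuity:
        # Fleiss continuity correction for equal group sizes
        n = n / 4 * (1 + sqrt(1 + 4 / (n * diff))) ** 2
    return ceil(n)


# Delayed graft function example: 35% vs 25%.
print(n_per_group_proportions(0.35, 0.25))                    # 349
print(n_per_group_proportions(0.35, 0.25, continuity=False))  # 329
```

The gap between the two results is one reason to run the calculation with more than one tool and check which formula each applies.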

Page 30: Russ Lenth’s Power and Sample size page

http://www.stat.uiowa.edu/~rlenth/Power/

Page 31:

http://hedwig.mgh.harvard.edu/sample_size/size.html

With 349 patients on treatment A and 349 patients on treatment B there will be an 80% chance of detecting a significant difference at a two-sided 0.05 significance level. This assumes that the response rate of treatment A is 0.35 and the response rate of treatment B is 0.25.

Page 32: Discrete outcome

Page 33: How to use power calculations

• Use power prospectively for planning future studies
– Determine an appropriate sample size
– Evaluate a planned study: will it yield useful information?

• Put science before statistics
– Use effect sizes that are clinically relevant
– Don’t get distracted by statistical considerations

• Perform a pilot study
– Helps establish procedures, understand and protect against the unexpected
– Gives variance estimates needed in determining sample size

Page 34: Design: What is your Hypothesis?

1. Superiority

H0: μA = μB

H1: μA ≠ μB

2. Equivalence

Objective: To demonstrate that two treatments have no clinically meaningful difference

H0: The two Txs’ effects are different with respect to the mean response: μA - μB ≥ d or μA - μB ≤ -d

H1: The two Txs are equal with respect to the mean response: -d < μA - μB < d

d = largest difference clinically acceptable

Page 35: Design: What is your Hypothesis?

3. Non-Inferiority

Objective: To demonstrate that a given treatment is not clinically inferior to another

H0: A given Tx is inferior with respect to the mean response: μA - μB ≤ -d

H1: A given Tx is non-inferior with respect to the mean response: μA - μB > -d

(taking a larger mean response as better)

Page 36: QUIZ

Assume 80% power, α = 0.05, two-sided.

Which study needs the largest sample size? How many subjects: (x) more with A, (y) more with B, or (z) the same?

1. Mortality: Study A 20% vs 10%; Study B 20% vs 15%

2. Mortality: Study A 20% vs 10%; Study B 40% vs 30%

3. Diastolic BP: Study A 80 vs 85 mmHg, st. dev. 10; Study B 90 vs 95 mmHg, st. dev. 10

4. Diastolic BP: Study A 80 vs 85 mmHg, st. dev. 10; Study B 80 vs 85 mmHg, st. dev. 8

Page 37: ANSWERS

1. B: the effect size is bigger in A (mortality is halved); the smaller effect in B needs a larger sample size to detect

2. B: the smaller (relative) difference needs more subjects

3. Same: the difference is the same in both studies, so only the standard deviation matters

4. A: a bigger standard deviation needs more subjects
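The quiz answers can be checked with the same normal-approximation formulas, here without a continuity correction. The choice of formulas and the function names are assumptions. Exact numbers differ between tools, but the ordering of the studies should not.

```python
from math import ceil, sqrt
from statistics import NormalDist

Z = NormalDist()
Z_ALPHA = Z.inv_cdf(1 - 0.05 / 2)   # two-sided alpha = 0.05
Z_BETA = Z.inv_cdf(0.80)            # 80% power


def n_proportions(p1, p2):
    """Per-group sample size for comparing two proportions (no continuity correction)."""
    p_bar = (p1 + p2) / 2
    numerator = (Z_ALPHA * sqrt(2 * p_bar * (1 - p_bar))
                 + Z_BETA * sqrt(p1 * (1 - p1) + p2 * (1 - p2))) ** 2
    return ceil(numerator / (p1 - p2) ** 2)


def n_means(mean1, mean2, sd):
    """Per-group sample size for comparing two means with a common SD."""
    return ceil(2 * (sd * (Z_ALPHA + Z_BETA) / (mean1 - mean2)) ** 2)


# (Study A, Study B) per-group sample sizes for each quiz question
quiz = {
    1: (n_proportions(0.20, 0.10), n_proportions(0.20, 0.15)),
    2: (n_proportions(0.20, 0.10), n_proportions(0.40, 0.30)),
    3: (n_means(80, 85, 10), n_means(90, 95, 10)),
    4: (n_means(80, 85, 10), n_means(80, 85, 8)),
}

for question, (n_a, n_b) in quiz.items():
    verdict = "same" if n_a == n_b else ("more with A" if n_a > n_b else "more with B")
    print(f"Q{question}: A needs {n_a}, B needs {n_b} per group -> {verdict}")
```

In question 2 the absolute difference is 10 percentage points in both studies, yet B needs more subjects because proportions nearer 50% have larger variance.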