
Testing and Estimation Procedures in Multi-Armed Designs with Treatment Selection

Gernot Wassmer, PhD

Institut für Medizinische Statistik, Informatik und Epidemiologie

Universität zu Köln

ADDPLAN GmbH

Adaptive Design KOL Lecture Series, August 14th, 2009

Introduction

Confirmatory adaptive designs are a generalization of group sequential designs: at interim analyses, confirmatory analysis is performed under control of the Type I error rate, and data-dependent changes of the design are allowed.

Three particular applications

– Sample size reassessment

– Treatment arm selection

– Subset selection (“enrichment designs”)

This talk shows

– how to reach a test decision in an adaptive multi-armed trial with treatment selection at interim

– how to calculate confidence intervals and overall p-values

Confirmatory adaptive designs

can be based on

– the combination testing principle

– the conditional error approach

Combination testing principle

Combination of p-values with a specific combination function

(Bauer, 1989; Bauer & Köhne, 1994)

Inverse normal method: The test decision is based on

$$Z_k^* = \frac{w_1 \Phi^{-1}(1-p_1) + \dots + w_k \Phi^{-1}(1-p_k)}{\sqrt{w_1^2 + \dots + w_k^2}},$$

where the weights $w_k$ are prefixed (Lehmacher & Wassmer, 1999).
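The inverse normal statistic can be sketched in a few lines of Python. This is a minimal illustration using only the standard library; the function name `inverse_normal_statistic` and the example p-values and weights are assumptions made for this sketch, not part of the talk.

```python
from math import sqrt
from statistics import NormalDist

_STD_NORMAL = NormalDist()  # standard normal distribution


def inverse_normal_statistic(p_values, weights):
    """Z*_k = sum_j w_j * Phi^{-1}(1 - p_j) / sqrt(sum_j w_j^2).

    p_values: stage-wise one-sided p-values, each strictly between 0 and 1.
    weights:  prefixed weights, one per stage (illustrative choice below).
    """
    if len(p_values) != len(weights):
        raise ValueError("need one weight per stage-wise p-value")
    numerator = sum(w * _STD_NORMAL.inv_cdf(1.0 - p)
                    for p, w in zip(p_values, weights))
    return numerator / sqrt(sum(w * w for w in weights))


# Two stages with equal weights.
z_equal = inverse_normal_statistic([0.025, 0.025], [1.0, 1.0])

# Two stages weighted by the square root of hypothetical planned stage sizes.
z_weighted = inverse_normal_statistic([0.10, 0.01], [sqrt(50), sqrt(100)])
```

The weights must be fixed before the data are seen; choosing them as square roots of the planned stage sample sizes is one common convention, assumed here for illustration.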

The conditional error approach

Plan a trial with a reasonable (optimum) design, including sample size calculation and timing of interim analyses.

Calculate the conditional Type I error rate $\varepsilon(x_1,\dots,x_k)$ at any time during the course of the trial:

$\varepsilon(x_1,\dots,x_k)$ = conditional probability, under $H_0$, of rejecting $H_0$ in one of the subsequent stages, given $x_1,\dots,x_k$

$x_1,\dots,x_k$: data up to stage $k$

The remainder of the trial can be defined as a test at level $\varepsilon(x_1,\dots,x_k)$, where the design of this test is arbitrary.

Müller & Schäfer (2001): “CRP principle”; Brannath, Posch & Bauer (2002): “Recursive testing principle”

The situation

Consider many-to-one comparisons, e.g., G treatment arms and one control, normal case.

Throughout this talk, we consider one-sided testing.

In an interim stage a treatment arm is selected based on data observed so far.

Not only selection procedures, but also other adaptive strategies (e.g., sample size reassessment) can be performed.

Application within “Adaptive seamless designs” using the combination testing principle

Sources for alpha inflation

Interim analyses

Sample size reassessment

Multiple arms
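To make the "multiple arms" source of inflation concrete, the following Monte Carlo sketch estimates the Type I error rate when the best of several arms is tested naively against an unadjusted critical value of 1.96. Equal sample sizes, known variance, and the function name `naive_max_type1_error` are assumptions of this sketch.

```python
import random
from math import sqrt


def naive_max_type1_error(n_arms, n_sims=20000, crit=1.96, seed=1):
    """Estimate P(max_i Z_i >= crit) under the global null hypothesis.

    Each Z_i compares one treatment arm with the shared control, so the
    statistics are correlated (0.5 with equal sample sizes).
    """
    rng = random.Random(seed)
    rejections = 0
    for _ in range(n_sims):
        control = rng.gauss(0.0, 1.0)  # standardized control-arm mean
        # Z_i = (treatment_i - control) / sqrt(2) is N(0, 1) under H0
        z_max = max((rng.gauss(0.0, 1.0) - control) / sqrt(2.0)
                    for _ in range(n_arms))
        if z_max >= crit:
            rejections += 1
    return rejections / n_sims


rate_one_arm = naive_max_type1_error(1)     # close to the nominal 0.025
rate_three_arms = naive_max_type1_error(3)  # clearly above 0.025
```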

The proposed adaptive procedure fulfils the regulatory requirements for the analysis of adaptive trials in that it strongly controls the prespecified Type I error rate.

This procedure will be based on the application of the closed test procedure together with combination tests (e.g., Bauer & Kieser, 1999; Hellmich, 2001; Posch et al., 2005; Bretz et al., 2009).

Other approaches: Thall et al., 1988; Follmann et al., 1994; Stallard and Todd, 2003; Stallard and Friede, 2008.

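Closed testing procedure

Stage I: closed system of hypotheses (G = 3)

$H_0^1 \cap H_0^2 \cap H_0^3$

$H_0^1 \cap H_0^2$, $H_0^1 \cap H_0^3$, $H_0^2 \cap H_0^3$

$H_0^1$, $H_0^2$, $H_0^3$

Stage II: $H_0^S$ for the selected treatment arm $S$

Simple “trick”: Tests of intersection hypotheses are formally performed as tests for $H_0^S$.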

At the first interim analysis, consider a test statistic for $H_0^1 \cap H_0^2 \cap H_0^3$, e.g., the test statistic

$$\max(Z_1^1, Z_1^2, Z_1^3),$$

where $Z_1^i$ denotes the first stage test statistic for $H_0^i$, $i = 1, 2, 3$.

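That is, compute Dunnett’s adjusted p-value for each intersection hypothesis. Critical values $c_{D,G,\alpha}$ are chosen according to

$$\int_{-\infty}^{\infty} \prod_{i=1}^{G} \Phi\!\left(\frac{\lambda_i x + c_{D,G,\alpha}}{\sqrt{1-\lambda_i^2}}\right)\varphi(x)\,dx = 1-\alpha,$$

where $\lambda_i = \sqrt{n_i/(n_0+n_i)}$, and $\Phi$ and $\varphi$ denote the standard normal cdf and its density, respectively.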

Or compute the p-value using Dunnett’s t distribution.

Test decision for the second stage:

$H_0^S$ is rejected if

$$\min_{J \ni S} C(p_J, q_S) \ge u_2,$$

where $p_J$ is the p-value of the Dunnett test for testing $\bigcap_{i \in J} H_0^i$, $q_S$ is the second stage p-value for the selected treatment arm, and $u_2$ is the critical value for the second stage. Here

$$C(p, q) = \frac{w_1 \Phi^{-1}(1-p) + w_2 \Phi^{-1}(1-q)}{\sqrt{w_1^2 + w_2^2}}.$$

This is the use of the inverse normal method for the Dunnett test situation.
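The second-stage decision can be sketched as a loop over all index sets J containing the selected arm S. For brevity this sketch uses Bonferroni-adjusted first-stage p-values for the intersection hypotheses instead of Dunnett p-values; the function name `reject_selected`, the clipping constant, and the example numbers are assumptions made for illustration.

```python
from itertools import combinations
from math import sqrt, inf
from statistics import NormalDist

_N = NormalDist()


def _z(p, eps=1e-12):
    """Phi^{-1}(1 - p), with p clipped away from 0 and 1 to stay finite."""
    p = min(max(p, eps), 1.0 - eps)
    return _N.inv_cdf(1.0 - p)


def reject_selected(p_stage1, selected, q_selected, u2, w1=1.0, w2=1.0):
    """Closed test for H_0^S with the inverse normal combination.

    p_stage1:   first-stage p-values, one per treatment arm.
    selected:   index S of the arm carried into stage II.
    q_selected: second-stage p-value of the selected arm.
    Returns (decision, smallest combination statistic over all J containing S).
    """
    others = [a for a in range(len(p_stage1)) if a != selected]
    worst = inf
    for size in range(len(others) + 1):
        for extra in combinations(others, size):
            subset = (selected,) + extra
            # Bonferroni p-value for the intersection hypothesis over `subset`
            p_j = min(1.0, len(subset) * min(p_stage1[a] for a in subset))
            combo = (w1 * _z(p_j) + w2 * _z(q_selected)) / sqrt(w1**2 + w2**2)
            worst = min(worst, combo)
    return worst >= u2, worst


decision, statistic = reject_selected([0.2, 0.1, 0.01], 2, 0.005, u2=1.96)
```

With these illustrative inputs the global intersection hypothesis has the largest first-stage p-value (0.03) and therefore determines the decision.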

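Example S = 3

$H_0^S$ (here $H_0^3$) can be rejected if all combination tests exceed the critical value $u_2$.

Stage I: $\mu_0 = \mu_1 = \mu_2 = \mu_3$; $\mu_0 = \mu_1 = \mu_2$, $\mu_0 = \mu_1 = \mu_3$, $\mu_0 = \mu_2 = \mu_3$; $\mu_0 = \mu_1$, $\mu_0 = \mu_2$, $\mu_0 = \mu_3$

Stage II: $H_0^3$: $\mu_0 = \mu_3$

Simple shortcut: If the treatment arm with the largest test statistic is selected, it suffices to combine the test for $H_0$: $\mu_0 = \mu_1 = \mu_2 = \mu_3$ with the test for $H_0$: $\mu_0 = \mu_3$.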

Properties of the Procedure

The choice of tests for the intersection hypotheses is free, i.e., you might select, e.g., Dunnett’s test, or the Bonferroni, Simes, or Sidak test.

The procedure may become inconsonant and, hence, conservative, i.e., you can reject the global hypothesis but no single hypothesis (Friede and Stallard, 2008).

A hypothesis can be rejected at a later stage even if it was not selected for the current stage (and not rejected before). This can happen if, e.g., the test statistic for the global hypothesis exceeds u2 in the second stage but not u1 in the first stage, and the test statistic for the de-selected hypothesis exceeds u1 in the first stage.


An alternative procedure

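(König et al., 2008)

Compute the conditional error at the first stage:

$$\varepsilon_{CD}(z_1, t_1) = 1 - \int_{-\infty}^{\infty} \prod_{i=1}^{G} \Phi\!\left(\frac{c_{D,G,\alpha} - \sqrt{t_1}\, z_1^i + \sqrt{1-t_1}\,\lambda_i x}{\sqrt{(1-t_1)(1-\lambda_i^2)}}\right)\varphi(x)\,dx,$$

where $c_{D,G,\alpha}$ is the critical value of the Dunnett test, $z_1^i$ is the observed first stage test statistic for $H_0^i$, $t_1$ denotes the information at the interim stage, and $\lambda_i = \sqrt{n_i/(n_0+n_i)}$.

In the second stage, perform a

– conditional second-stage Dunnett test, or

– separate second-stage Dunnett test

at conditional level $\varepsilon_{CD}(z_1, t_1)$.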

This is the application of the CRP principle (Müller & Schäfer, 2001).

It assumes the variance to be known

A comparison shows that

the conditional second-stage Dunnett test

– performs best but is hardly better if a treatment arm selection was performed (cf. Friede and Stallard, 2008)

– is identical with the conventional Dunnett test if no adaptations were performed

– becomes complicated if, e.g., allocation is not constant or variance is unknown

the inverse normal technique

– is not optimum but enables early stopping and more general adaptations

– is straightforward if, e.g., allocation is not constant or variance is unknown


Overall p-values

Defined as the smallest significance level for which the test results yield rejection of the considered (single) hypothesis.

The repeated overall p-value can be calculated at any stage of the trial. That is,

$p_k^g \le \alpha \iff H_0^g$ can be rejected at stage $k$.

The p-values account for the step-down nature of the closed testing principle and are completely consistent with the test decision.
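For a two-stage design without early stopping, the overall p-value of the selected hypothesis can be sketched as the largest combination p-value over all intersection hypotheses containing the selected arm. The absence of interim stopping, the function name `overall_p_value`, and the numbers are assumptions of this sketch; with stopping boundaries the calculation is more involved.

```python
from math import sqrt
from statistics import NormalDist

_N = NormalDist()


def overall_p_value(p_intersections, q_selected, w1=1.0, w2=1.0):
    """Overall p-value for H_0^S in a two-stage design without early stopping.

    p_intersections: first-stage p-values p_J for every J containing S,
                     each strictly between 0 and 1.
    q_selected:      second-stage p-value of the selected arm.
    """
    z_q = _N.inv_cdf(1.0 - q_selected)
    combos = [(w1 * _N.inv_cdf(1.0 - p_j) + w2 * z_q) / sqrt(w1**2 + w2**2)
              for p_j in p_intersections]
    # The smallest combination statistic gives the largest p-value.
    return 1.0 - _N.cdf(min(combos))


p_overall = overall_p_value([0.01, 0.02, 0.02, 0.03], 0.005)
p_weak_stage2 = overall_p_value([0.01, 0.02, 0.02, 0.03], 0.4)
```

By construction, the overall p-value is at most α exactly when every combination statistic reaches $\Phi^{-1}(1-\alpha)$, which is the consistency with the test decision stated above.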

Overall confidence intervals

Confidence intervals based on stepwise testing are difficult to construct. This is a specific feature of multiple testing procedures and not of adaptive testing.

Posch et al. (2005) proposed to construct confidence intervals based on the single step adjusted overall p-values. These can also be applied for the conditional Dunnett test.

The repeated confidence intervals (RCIs) are not, in general, consistent with the test decision. It might happen that, e.g., a hypothesis is rejected but the lower bound of the CI is smaller than 0.

They can be provided for each step of the trial.

In general, they may fail to become narrower for increasing sample size (e.g., if Bonferroni or Simes intersection tests are used).

Illustration

Two-stage design with G treatment arms

Selection of the treatment arm with the highest response, no efficacy stop at interim

Bonferroni (or Simes) correction is used for the first stage

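The lower bound $lb_j$ of the 95% confidence interval for the effect $\delta_j = \mu_j - \mu_0$ of the selected treatment arm at the second stage is calculated through

$$lb_j = \max\{\delta_j : \Phi^{-1}(1 - \min\{1, G\,p_{1j}(\delta_j)\}) + \Phi^{-1}(1 - p_{2j}(\delta_j)) \ge \sqrt{2}\cdot 1.96\},$$

where

$$p_{ij}(\delta_j) = 1 - \Phi\!\left(\frac{\bar{x}_{ij} - \bar{x}_{i0} - \delta_j}{\sigma}\sqrt{\frac{n_{ij}\,n_{i0}}{n_{ij} + n_{i0}}}\right), \quad i = 1, 2.$$

It is easy to see that

$$lb_j \le \bar{x}_{1j} - \bar{x}_{10} - \Phi^{-1}\!\left(1 - \frac{1}{G}\right)\sigma\sqrt{\frac{n_{1j} + n_{10}}{n_{1j}\,n_{10}}}$$

and, analogously,

$$ub_j \ge \bar{x}_{1j} - \bar{x}_{10} + \Phi^{-1}\!\left(1 - \frac{1}{G}\right)\sigma\sqrt{\frac{n_{1j} + n_{10}}{n_{1j}\,n_{10}}}.$$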

Summary

The adaptive procedures fulfil the regulatory requirements for the analysis of adaptive trials in that they control the prespecified Type I error rate. For regulatory purposes, the class of envisaged decisions after stage 1 should be stated in the protocol.

The “rules” for adaptation and stopping for futility

– need not be pre-specified

– may depend on all interim data, including secondary and safety endpoints

– can make use of Bayesian principles integrating all information available, also external to the study

– should be evaluated (e.g., via simulations) and the preferred version recommended, e.g., in the DMC charter

Software: ADDPLAN MC is available for designing and analyzing these trials.


References

• Bauer, P. (1989). Multistage testing with adaptive designs (with Discussion). Biometrie und Informatik in Medizin und Biologie 20, 130–148.

• Bauer, P., Köhne, K. (1994). Evaluation of experiments with adaptive interim analyses. Biometrics 50, 1029–1041.

• Bauer, P., Kieser, M. (1999). Combining different phases in the development of medical treatments within a single trial. Statistics in Medicine 18, 1833–1848.

• Brannath, W., Posch, M., Bauer, P. (2002). Recursive combination tests. J. Amer. Stat. Ass. 97, 236–244.

• Follmann, D. A., Proschan, M. A., Geller, N. L. (1994). Monitoring pairwise comparisons in multi-armed clinical trials. Biometrics 50, 325–336.

• Friede, T., Stallard, N. (2008). A comparison of methods for adaptive treatment selection. Biometrical J. 50, 767–781.

• Hellmich, M. (2001). Monitoring clinical trials with multiple arms. Biometrics 57, 892–898.

• König, F., Brannath, W., Bretz, F., Posch, M. (2008). Adaptive Dunnett tests for treatment selection. Statistics in Medicine 27, 1612–1625.

• Lehmacher, W., Wassmer, G. (1999). Adaptive sample size calculations in group sequential trials. Biometrics 55, 1286–1290.

• Müller, H.H., Schäfer, H. (2001). Adaptive group sequential designs for clinical trials, combining the advantages of adaptive and of classical group sequential approaches. Biometrics 57, 886–891.

• Posch, M., König, F., Branson, M., Brannath, W., Dunger-Baldauf, C., Bauer, P. (2005). Testing and estimation in flexible group sequential designs with adaptive treatment selection. Statistics in Medicine 24, 3697–3714.

• Posch, M., Wassmer, G., Brannath, W. (2008). A note on repeated p-values for group sequential designs. Biometrika 95, 253–256.

• Stallard, N., Friede, T. (2008). A group-sequential design for clinical trials with treatment selection. Statistics in Medicine 27, 6209–6227.

• Stallard, N., Todd, S. (2003). Sequential designs for phase III clinical trials incorporating treatment selection. Statistics in Medicine 22, 689–703.

• Thall, P.F., Simon, R., Ellenberg, S.S. (1988). Two-stage selection and testing designs for comparative clinical trials. Biometrika 75, 303–310.
