Adaptive Designs. P. Bauer, Medical University of Vienna, June 2007


Content
Statistical issues to be addressed in planning a classical frequentist trial; sequential trials as a step forward; types of design modifications; the principles to achieve flexibility; the price to be paid (weighting, optimality, feasibility); a typical scenario; concluding remarks.

The classical frequentist trial
The details of the design, such as sample size, method of randomization, ..., are laid down in advance. The statistical analysis strategy is also pre-fixed. There is a lack of flexibility to deal with current information emerging from inside or outside the trial which could raise the demand for design modifications.

The classical frequentist trial - statistical issues for planning
Population, inclusion criteria, treatments (doses), application time(s), period and mode, main outcome variable(s), measurement time(s), recruitment time, follow-up time, secondary outcome variable(s), safety variables.

Statistical issues - continued
Analysis strategy: statistical model (e.g., parametric versus non-parametric), goal (e.g., superiority versus non-inferiority), test statistics, covariables, handling multiplicity, subgroups, handling missing information, checking the assumptions, handling deviations from the assumptions, checking the stability of results, handling current safety data (dropping treatments?), ...

Statistical issues - continued
Significance level, power, sample size, relevant effect size(s), variability of the outcome variable, outcome in the control group(s), drop-out rate, ...
Dealing with the unexpected?

A step forward - sequential designs
The sample size is not fixed! By looking at the outcome more than once during the trial, decisions such as stopping with an efficacy claim or stopping for futility may be taken earlier in the trial. The sample size adapts to the true effect.

Further design issues
Number of interim analyses, timing of interim analyses, decision rules to be applied in the interim analyses, rules for dropping treatments, maximum sample size, ...

Sequential designs - continued
The false positive error rate can only be calculated in advance if the decision rules are also specified in advance! For particular deviations from the rules the impact on the error rate may be known. E.g.: in case of a large mid-trial effect you may increase, in case of a small effect you may decrease the sample size! (Posch et al. [2003], see later.)

Planned adaptivity
Lay down a set of adaptation rules among which you can choose. Calculate the maximum possible type I error rate inflation which may occur with these rules. Adjust the rejection boundaries so that the maximum type I error rate is controlled. As in sequential designs, you have to adhere to the pre-specified set of rules!

Fully adaptive (flexible) designs
Traditionally the notion "adaptive" was used for data-dependent randomization methods like the "play the winner" treatment allocation rule. Adaptive or flexible designs allow for mid-trial design modifications based on information from inside and outside the trial without compromising the false positive error rate (and hopefully improving the performance of the running trial).

Issues of flexibility
Selection of treatments, reallocation of samples, modification of the total sample size, modification of the statistical analysis, choosing optimal scores, insertion or skipping of interim analyses, subgroup selection, changing goals (non-inferiority to superiority), modification of endpoints, ... Only writing amendments for online design modifications will not be the general solution!
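Before turning to the invariance principles, here is a minimal Monte Carlo sketch (not part of the original slides) of why uncontrolled, data-driven sample size reassessment is a problem: if the second-stage sample size is recalculated from the unblinded interim data and the conventional fixed-sample test is then applied, the type I error rate is inflated. All design numbers below (one-sample one-sided z-test, known σ = 1, planned n = 100 with an interim look halfway, a rule targeting 80% conditional power at the observed interim effect, capped at 200 second-stage observations) are hypothetical illustration choices.

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(1)
alpha, cp_target = 0.025, 0.80
n1, n2_plan, n2_max = 50, 50, 200        # interim after half of the planned n = 100
z_alpha = norm.ppf(1 - alpha)
n2_grid = np.arange(n2_plan, n2_max + 1, 10)

nsim = 400_000
# stage-1 z-scores simulated under H0 (true mean 0, sigma = 1, one-sample test)
z1 = rng.standard_normal(nsim)
delta_hat = z1 / np.sqrt(n1)             # interim estimate of the mean

def cond_power(n2, z1, delta):
    """Conditional power of the naive pooled z-test, plugging in effect `delta`."""
    num = z_alpha * np.sqrt(n1 + n2) - z1 * np.sqrt(n1) - delta * n2
    return 1 - norm.cdf(num / np.sqrt(n2))

# data-driven rule: smallest grid n2 reaching the target conditional power at the
# observed effect, capped at n2_max; keep the planned n2 if the estimate is <= 0
cp = np.stack([cond_power(m, z1, delta_hat) for m in n2_grid])   # grid x nsim
reaches = cp >= cp_target
n2_rule = np.where(reaches.any(axis=0), n2_grid[reaches.argmax(axis=0)], n2_max)
n2 = np.where(delta_hat > 0, n2_rule, n2_plan)

# stage 2 under H0, then the *conventional* fixed-sample test on the pooled data
z2 = rng.standard_normal(nsim)           # stage-2 z-score on its own n2 observations
z_pooled = (np.sqrt(n1) * z1 + np.sqrt(n2) * z2) / np.sqrt(n1 + n2)
print("nominal level:", alpha)
print("estimated actual level:", (z_pooled >= z_alpha).mean())   # noticeably above alpha
```

"Planned adaptivity" corresponds to computing the worst-case inflation over such a pre-specified family of rules and adjusting the boundary accordingly; the combination tests of the next section avoid the adjustment altogether.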
The invariance principles to achieve flexibility: 1. Adaptive combination tests
In a two-stage design an interim analysis is planned after a sample size of n_1 observations. Denote by p_1 and p_2 the p-values of a test in the disjoint first- and second-stage samples, respectively, of the planned two-stage design (instead of p-values, stage-wise z-scores, i.e. standardized treatment differences, could be used).

1. Two-stage combination tests (cont.)
If different patients are investigated at the two stages, these two p-values p_1 and p_2 under the null hypothesis generally are independent and uniformly distributed on [0, 1] (this can be relaxed). These properties still hold true if the second stage of the design is modified in the interim analysis based on information from the first stage!

1. Two-stage combination tests (cont.)
The final test decision is based on a pre-fixed combination of the two p-values (or z-scores), e.g., on the product function $p_1 p_2$ [R. A. Fisher] or the inverse normal function $C(p_1, p_2) = 1 - \Phi\left(w_1 \Phi^{-1}(1-p_1) + w_2 \Phi^{-1}(1-p_2)\right)$ with pre-fixed weights, $w_1^2 + w_2^2 = 1$.

The weighted z-scores - an explanation (we pretend to know the variance σ²)
The z-scores (mean treatment effect divided by its standard error, balanced design), stage 1:
$z_1 = \frac{\bar{x}_{1A} - \bar{x}_{1B}}{\sigma\sqrt{2/n_1}}$.
If there is no treatment effect (under the null), the z-score for large sample sizes follows a standard normal distribution!

The weighted z-scores - an explanation (cont.)
Stage 2: $z_2 = \frac{\bar{x}_{2A} - \bar{x}_{2B}}{\sigma\sqrt{2/n_2}}$;  total: $z = \frac{\bar{x}_{A} - \bar{x}_{B}}{\sigma\sqrt{2/(n_1+n_2)}}$.

The clue
The z-scores from disjoint samples under the null for large samples follow independent standard normal distributions. Adaptations (e.g., changing the second-stage sample size) performed before the sample at the following stage is observed obviously have no influence on this universal property! Using the planned sample sizes n_1 and n_2 for the weights w_1 and w_2 to combine the scores,
$z = \frac{\sqrt{n_1}\, z_1 + \sqrt{n_2}\, z_2}{\sqrt{n_1 + n_2}}$,
we again obtain under the null a standard normal test statistic, which can easily be used for testing. If no adaptation is made we end up with the conventional test! Don't pool the samples, combine the test statistics!

The distribution of the combination function under the null does not depend on design modifications. Hence the adaptive test is still a test at the level α for the modified design! Applicable for multiple looks: sequential decision boundaries can be applied, and recursive application allows for a flexible number of looks. Bauer [1989], Bauer and Köhne [1994], Lehmacher and Wassmer [1999], Cui et al. [1999], Brannath et al. [2002].

1. Two-stage combination tests (cont.)
[Schematic: the two-stage combination test with sequential decision boundaries. Stage 1 (n_1 observations): rejection of H_0 if p_1 ≤ α_1, futility stop with acceptance of H_0 if p_1 ≥ α_0, continuation and adaptation otherwise. Stage 2 (n_2 further observations, n_1 + n_2 in total): rejection of H_0 if C(p_1, p_2) ≤ c, acceptance of H_0 otherwise.]

The invariance principles (cont.): 2. The conditional error concept
The conditional error probability of a designed trial at a particular time is the probability under the null hypothesis to get a rejection later on, given the information observed up to that time (Proschan and Hunsberger [1995]). Design modifications that always preserve the conditional error probability also preserve the level α! (Müller and Schäfer [2001, 2004]): any design change is allowed at any (unplanned) time if the new design has a conditional error probability never exceeding that of the original design!
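To make the inverse normal combination concrete, here is a small sketch (not part of the slides) for a balanced two-arm comparison with known σ = 1; the stage sizes 40 and 60 and the true effect of 0.4 are arbitrary illustration values. It also checks numerically that, when the planned stage sizes are actually used, the combination test coincides with the conventional pooled z-test.

```python
import numpy as np
from scipy.stats import norm

def inverse_normal_combination(p1, p2, w1, w2):
    """Pre-fixed inverse normal combination of independent stage-wise p-values."""
    z = (w1 * norm.ppf(1 - p1) + w2 * norm.ppf(1 - p2)) / np.sqrt(w1**2 + w2**2)
    return 1 - norm.cdf(z)          # overall p-value of the combination test

# planned per-group stage sizes (hypothetical numbers); they fix the weights
n1, n2 = 40, 60
w1, w2 = np.sqrt(n1), np.sqrt(n2)

# toy balanced two-arm data with known sigma = 1 and true effect 0.4
rng = np.random.default_rng(7)
sigma = 1.0
x1A, x1B = rng.normal(0.4, sigma, n1), rng.normal(0.0, sigma, n1)
x2A, x2B = rng.normal(0.4, sigma, n2), rng.normal(0.0, sigma, n2)

# stage-wise z-scores and p-values, exactly as defined above
z1 = (x1A.mean() - x1B.mean()) / (sigma * np.sqrt(2 / n1))
z2 = (x2A.mean() - x2B.mean()) / (sigma * np.sqrt(2 / n2))
p1, p2 = 1 - norm.cdf(z1), 1 - norm.cdf(z2)

p_comb = inverse_normal_combination(p1, p2, w1, w2)

# when the planned n1, n2 are actually used, pooling the raw data gives the
# conventional z-test, and its p-value coincides with the combination test
xA, xB = np.concatenate([x1A, x2A]), np.concatenate([x1B, x2B])
z_pooled = (xA.mean() - xB.mean()) / (sigma * np.sqrt(2 / (n1 + n2)))
print(p_comb, 1 - norm.cdf(z_pooled))   # identical up to floating point error
```

The point of the construction is that the weights are fixed by the plan, not by the data, so the null distribution of the combination is untouched by whatever second-stage modification is made at the interim analysis.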
[Schematic: the conditional error approach. Early rejection of H_0 if p_1 ≤ α_1, futility stop with acceptance of H_0 if p_1 ≥ α_0, continuation (another n_2 observations) otherwise; the conditional error function (CEF) is defined on the continuation region.]

The conditional error function is defined as the conditional error probability as a function of the outcome up to the interim analysis, in our example a function of p_1. It can be explicitly defined in advance, like the circular error function (Proschan & Hunsberger, 1995), or implicitly by the pre-planned design, e.g. by a group sequential design (Müller & Schäfer, 2001).

Combination test and conditional error
If the conditional error can be calculated, then there is a close relationship between the two approaches (Posch and Bauer [1999]). Example: assume that the final test decision for the product criterion was planned to be $p_1 p_2 \le c_\alpha$ [α = 0.025, c_α = 0.0038, p_1 = 0.045]. Any second-stage test with the rejection region $p_2 \le c_\alpha / p_1$ [p_2 ≤ 0.0844] would preserve the conditional error!

Conditional error function - general
Testing the mean of a normal distribution (known variance σ²) in a design with fixed sample size n; look into the data after n_1 observations. One-sided critical region at the end: reject if $Z_n = \sqrt{n}\,\bar{x}_n/\sigma \ge z_{1-\alpha}$. Sufficient test statistic after n_1 observations: $Z_{n_1} = \sqrt{n_1}\,\bar{x}_{n_1}/\sigma$.

Interim analysis after n_1 observations - reassessment of trial perspectives
Conditional power and CRP (conditional rejection probability): what is the probability to reject H_0 at the end, conditionally on the results after n_1 observations?

Conditional power versus overall power
Overall power: expected value of the random variable "conditional power", taken over all possible interim outcomes (including those which definitely have not been observed in the trial). It is tempting, and suggests itself, to reassess the trial perspectives after knowing the interim results.

Conditional power
Proposals to estimate the conditional power: 1. Use some fixed effect size δ_c for determining the conditional power, e.g., the effect size δ_c = δ from the planning phase. The true conditional power is unknown since the true mean is unknown too!

Estimation of the conditional power
2. Use the effect observed in the interim analysis. 3. A weighted mixture of 1 and 2, e.g., via the posterior distribution of the effect in the interim analysis. To remember: since the conditional power in any case depends on the interim outcome, it is a random variable!

Density of the conditional power using the effect size from the planning phase
α = 0.05/2 = 0.025, 1 − β = 0.8, δ = 1 (in the following called "conditional power").

I. Density of the conditional power in a perfectly powered study depending on the inspection time r
We assume that the true effect size is exactly equal to the one used in the planning phase. Under these hypothetical assumptions we calculate the density of the true conditional power depending on the information time r = n_1/n.
[Plots: density of the conditional power based on δ = 1, for several inspection times r.]

II. The conditional power halfway through the trial depending on the true effect
We assume that a mid-trial interim analysis (r = 1/2) is performed and that, for calculating the conditional power, the experimenter uses the effect size δ the study has been powered for. We calculate the conditional power depending on the unknown true effect size, which is generally different from δ.
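The following sketch (not part of the slides) writes out the standard conditional error and conditional power expressions for this fixed-sample one-sided z-test and reproduces the numbers of the product-criterion example above; the interim z-score of 1.0, the total sample size n = 100 and the planning effect 0.28 (roughly the effect giving 80% power at n = 100) are hypothetical illustration values.

```python
import numpy as np
from scipy.stats import norm, chi2

alpha = 0.025
z_alpha = norm.ppf(1 - alpha)
n, n1 = 100, 50                    # fixed total sample size, interim after n1

def conditional_error(z1):
    """P_H0(rejection at the end | interim z-score z1 after n1 of n observations)."""
    return 1 - norm.cdf((z_alpha * np.sqrt(n) - z1 * np.sqrt(n1)) / np.sqrt(n - n1))

def conditional_power(z1, delta, sigma=1.0):
    """Probability of final rejection given z1, if the true mean equals delta."""
    num = z_alpha * np.sqrt(n) - z1 * np.sqrt(n1) - delta * (n - n1) / sigma
    return 1 - norm.cdf(num / np.sqrt(n - n1))

z1 = 1.0                                          # hypothetical interim z-score
print(conditional_error(z1))                      # CRP under the null
print(conditional_power(z1, delta=0.28))          # plugging in the planning effect
print(conditional_power(z1, delta=z1 / np.sqrt(n1)))   # plugging in the interim estimate

# averaging the conditional error over all interim outcomes under H0 returns alpha
zg = np.linspace(-8, 8, 20001)
print(np.sum(conditional_error(zg) * norm.pdf(zg)) * (zg[1] - zg[0]))  # ~0.025

# the product-criterion example: reject if p1*p2 <= c_alpha with
# c_alpha = exp(-0.5 * chi2(4 df) quantile); given p1, any stage-2 test with
# rejection region p2 <= c_alpha / p1 preserves the conditional error
c_alpha = np.exp(-0.5 * chi2.ppf(1 - alpha, df=4))
p1 = 0.045
print(c_alpha, c_alpha / p1)   # ~0.0038 and ~0.085, matching the quoted values up to rounding
```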
[Plots: densities of the conditional power based on δ = 1, for several true effect sizes.]

III. Comparison of the conditional power vs. the predictive power (using the interim effect estimate)
[Plots: comparison of conditional and predictive power.]

IV. The deviation between predictive and true conditional power - simulations

Pros
Enormous flexibility (no pre-specification), e.g., look into the unblinded data at whatever time to decide on forthcoming interim analyses. Using the appropriate combination function (or conditional error function), in case of no design adaptations the analysis is conventional; in this case no price is paid for the option of flexibility! Every conventional test with a fixed adaptation rule can deal with unexpected deviations from this rule via the conditional error function.

Cons (there is no such thing as a free lunch!)
Non-standard test statistics other than the sufficient statistics are used. Conflicting decisions with conventional analyses may occur (put on restrictions). Problems of estimation without mandatory adaptation rules (confidence intervals are available). Problems of interpretation if the adaptation changes the hypothesis (multiple inference is available, see later "Combining phases").

Estimation (without early stopping)
Median unbiased estimation; unbiased estimation (random weights).

Confidence intervals
Define adaptive tests for every point in the parameter space; the one-sided (1 − α)-CI contains all points for which the one-sided null hypothesis is not rejected at the level α. Use the same sequential boundaries for the shifted test statistics (a conservative repeated CI which can be calculated at any interim analysis). Use conditional error functions for all parameter values to perform the dual tests. Use an ordering of the sample space so that early stopping always leads to wider CIs. Brannath et al. (2006); Mehta et al. (2007); Brannath et al.

The optimality issue of not using sufficient test statistics
Simple null and alternative hypotheses; a fixed sample size reassessment rule defines the spending functions for early rejection and early acceptance, respectively; no costs for interim analyses. Then there is a non-adaptive LR-test (with interim analyses at all sample sizes possible in the adaptive design) which leads to a lower average sample size both under the null and under the alternative hypothesis. Tsiatis & Mehta (2003).

The optimality issue (cont.)
A risk function in the form of a weighted sum of the type I error probability, the type II error probability and the expected information is considered. The Bayes risk, in the form of an expectation of this risk function over an a-priori distribution on a finite grid in the parameter space, is minimized. Each admissible decision rule is a Bayes rule with a solution based on the sufficient statistics. Jennison & Turnbull (2006).

The optimality issue (cont.)
These optimality results only apply for the a-priori fixed risk structure. Real life is higher-dimensional and more complex than the simple models. E.g., a simple risk structure may (and often does) change over time (and over the trial)!

The optimality issue (cont.)
Example: an unexpected safety issue arises in a clinical trial, so that a larger sample size is required under the experimental therapy to achieve a sufficiently precise estimate of the cost-benefit relationship before registration. Now the original weight for the expected information in the a-priori defined risk function may be completely irrelevant. The costs for additional sampling may be completely dominated by the need to get more information on a variable other than any pre-planned outcome variable.

The optimality issue (cont.)
Given the adaptation and the interim data, we can again use conditionally optimal designs based on the sufficient statistics for the rest of the trial (Brannath et al. (2006a)). Fully adaptive designs are a way to deal with such situations of a changing environment while still not compromising the type I error rate.

The weighting issue
When does sample size adaptation never inflate the type I error rate of the conventional test? One-sided z-test for a normal mean; sample size adaptation in an interim look at the data after half of the sample, n_1 = n/2 (Posch et al. [2003]).

The weighting issue (cont.)
Adaptive tests generalize group sequential designs. They can be identical to the conventional tests when no adaptations are performed. However, in case of (sample size) adaptations, due to the a-priori definition of the combination function, observations from different stages are in general weighted differently (the decision of the adaptive test is not based on the sufficient test statistics). Extreme situations with absurd test decisions can be constructed: if you decide to take one observation for the second stage instead of the planned 500, this observation may overrule the first-stage data (a numerical sketch follows below)!

The weighting issue (cont.) - comments
There are statistics used in other fields where the observations may be weighted differently. Typically the rejection region of the adaptive test is consistent with the fixed-sample test if one applies reasonable sample size reassessment strategies, adequate early stopping boundaries, or the marginally conservative dual test (which rejects only if both the adaptive test and the LR-test based on the total sample reject).

The weighting issue (cont.) - an alternative: the worst-case adjustment
For every interim outcome determine the worst-case sample size reassessment rule which would produce the largest type I error rate for a test based on the conventional sufficient statistics. Adjust the critical level of this test so that the level α is controlled for all possible sample size rules. Proschan & Hunsberger (1995), Wassmer (1999), Brannath et al. (2006b). This test can be uniformly improved by a fully adaptive test based on the worst-case CE-function!
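To put numbers on the extreme situation mentioned above (a single second-stage observation instead of the planned 500), here is a small sketch (not part of the slides) for a one-sample, one-sided z-test with known σ = 1; the interim mean of exactly 0 and the single second-stage value of 2.9 are contrived illustration values.

```python
import numpy as np
from scipy.stats import norm

# One-sample, one-sided z-test of H0: mu <= 0 with known sigma = 1, alpha = 0.025.
# Planned stage sizes n1 = n2 = 500; the inverse normal weights are fixed at the
# planned sizes, i.e. w1 = w2 = sqrt(1/2).
sigma, alpha = 1.0, 0.025
n1 = 500
w1 = w2 = np.sqrt(0.5)
z_crit = norm.ppf(1 - alpha)

# contrived interim data: 500 observations with mean exactly 0 (no effect at all)
z1 = 0.0
# the experimenter then takes only ONE second-stage observation instead of 500,
# and it happens to equal 2.9 (a single N(0,1) draw exceeds 2.9 with prob. ~0.002)
x2 = np.array([2.9])
z2 = np.sqrt(len(x2)) * x2.mean() / sigma

# adaptive test with the pre-fixed weights: the lone observation gets half the weight
z_adaptive = (w1 * z1 + w2 * z2) / np.sqrt(w1**2 + w2**2)

# conventional sufficient statistic: pool all 501 observations
pooled_mean = (n1 * 0.0 + x2.sum()) / (n1 + len(x2))
z_sufficient = np.sqrt(n1 + len(x2)) * pooled_mean / sigma

print("adaptive (pre-fixed weights):", round(float(z_adaptive), 3), z_adaptive >= z_crit)
print("sufficient statistic        :", round(float(z_sufficient), 3), z_sufficient >= z_crit)
# the weighted combination is about 2.05 and rejects, while the pooled z of about
# 0.13 is nowhere near the critical value: the absurd decision described above
```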
Reassessment of trial perspectives - may we be misguided?
Scenario: interim look halfway through a classical trial with α = 0.025 (one-sided), power 1 − β = 0.80, normal distribution, σ² = 1, effect size for planning δ = 1. Conditional power to get a rejection at the end, given the first half of the outcome data: the conditional power is random because it depends on the random interim results, and the true conditional power is unknown since it depends on the unknown effect size. Some plug in the a-priori effect size δ from the planning phase; some use the effect size observed in the interim analysis.

Reassessment of trial perspectives (cont.) - the distribution of the conditional power
What is the distribution of the conditional power if we look halfway through an underpowered trial into the data, given that the actual effect size is only 40% of the (optimistic) effect size used in the planning phase? Bauer and König [2006].
[Plot: densities of the conditional power, plugging in the planning effect size δ and the observed interim effect.]

Cons (continued)
Possible impact of the interim analysis on the course of the trial (more than in sequential trials). A fundamental concern: a trial is sized to finally reach some decisions with a controlled probability of erroneous decisions; why do we expect that early information from the trial will reliably guide us to the right track? Flexible designs are more difficult to handle (also as compared to sequential designs).

The feasibility issue
Flexible designs require careful planning and improved logistics in order to maintain the integrity and persuasiveness of the results. The control of the information flow is crucial, as various unblinded material may be needed in interim analyses to achieve good decisions about the necessity and type of adaptations to be done. The adaptation process has to be documented carefully so that adaptations can be justified, considering the complications connected with such designs.

The feasibility issue (cont.)
Besides the other (known) prices to be paid for flexibility, the feasibility issues may be the largest obstacle against a wide use of flexible designs! Reflection Paper on Methodological Issues in Confirmatory Clinical Trials with Flexible Design and Analysis Plan, EMEA, 2006. For a survey of applications, see Bauer & Einfalt (2006).

Planning for adaptations is important (several authors)
"Sometimes one can get the impression that papers on flexible design have to include this type of warning, and the further impression that the authors feel somewhat guilty for showing how much freedom is possible, and, after having gone public, want to get some distance between their work and the possible consequences." (talk by J. Röhmel)

A typical application: dose selection and confirmative inference (the burning issue of combining phases)
Scenario: 4 doses, placebo, parallel groups, balanced; many-one comparisons of the doses with placebo; multiple level α. E.g., a sequential adaptive Dunnett, Hochberg or strictly hierarchical test procedure: Bauer & Kieser (1999), Hommel (2001). Adaptive seamless designs: Posch et al. (2005), the Novartis group, e.g., Biometrical Journal (2006).
[Schematic (PSI 2006 / Adaptive Designs): standard two-phase development (plan & design phase IIb: learning and dose selection with doses A, B, C, D and control; then plan & design phase III: confirming) versus an adaptive seamless design (plan & design phase IIb and III: learning, selecting and confirming in a single trial). Comparison of ASD for treatment selection with separate phase II and III trials: Bretz et al. (2006).]

Combining phases (cont.) - adaptive interim analysis
Options: skip low, ineffective doses (use a surrogate endpoint?) or high, unsafe doses; early stopping with a positive efficacy claim or for futility (surrogate?); redistribution of the sample units saved among the remaining doses and placebo; change of the reallocation ratio because more observations on a high dose are needed in order to address an arising safety problem; increase of the planned total sample size.
Selection and multiple (closed) testing
Treatments: A, B, C (control). Two comparisons: A vs. C and B vs. C. We predefine adaptive combination tests for the null hypotheses A = C, B = C, and A = B = C (no effects). Suppose treatment B has been selected for the second stage. Final analysis: B is claimed to be effective if both the global null hypothesis A = B = C and the individual null hypothesis B = C are rejected by their combination tests at the level α.
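To make the selection-plus-closed-testing scheme above concrete, here is a sketch (not part of the slides) using one-sided z-tests with known σ = 1 and a simple Bonferroni test for the intersection hypothesis A = B = C; the talk mentions Dunnett, Hochberg or strictly hierarchical procedures instead, and all sample sizes and true means below are hypothetical illustration values.

```python
import numpy as np
from scipy.stats import norm

alpha, sigma = 0.025, 1.0
n1, n2 = 50, 100                      # per-arm stage sizes (hypothetical)
w1, w2 = np.sqrt(n1), np.sqrt(n2)     # pre-fixed inverse normal weights

def pval(x_trt, x_ctl, n):
    """One-sided p-value of the z-test for 'treatment better than control'."""
    z = (x_trt.mean() - x_ctl.mean()) / (sigma * np.sqrt(2 / n))
    return 1 - norm.cdf(z)

def combine(p_first, p_second):
    """Pre-fixed inverse normal combination of the stage-wise p-values."""
    z = (w1 * norm.ppf(1 - p_first) + w2 * norm.ppf(1 - p_second)) / np.sqrt(w1**2 + w2**2)
    return 1 - norm.cdf(z)

rng = np.random.default_rng(11)
mu = {"A": 0.1, "B": 0.4, "C": 0.0}   # hypothetical true means (C = control)

# stage 1: all three arms are run
stage1 = {arm: rng.normal(mu[arm], sigma, n1) for arm in mu}
p1_A = pval(stage1["A"], stage1["C"], n1)
p1_B = pval(stage1["B"], stage1["C"], n1)
p1_ABC = min(1.0, 2 * min(p1_A, p1_B))   # Bonferroni test of the global null A=B=C

# interim decision: drop A, carry only B and the control into stage 2
stage2 = {arm: rng.normal(mu[arm], sigma, n2) for arm in ("B", "C")}
p2_B = pval(stage2["B"], stage2["C"], n2)
# with only B left, the B-vs-C comparison is also a valid stage-2 test of A=B=C
p2_ABC = p2_B

# closed testing: claim efficacy of B only if the combination tests of both the
# global null A=B=C and the individual null B=C reject at level alpha
reject_ABC = combine(p1_ABC, p2_ABC) <= alpha
reject_B = combine(p1_B, p2_B) <= alpha
print("claim efficacy of B:", bool(reject_ABC and reject_B))
```

With these illustration values the claim will usually, but not always, succeed; the point of the construction is that the familywise error rate stays at α even though the choice of the carried-forward arm was made on unblinded interim data.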
Reflection Paper (EMEA)
In general, changes to the design of an ongoing phase III trial are not recommended. If such changes are anticipated in a confirmatory clinical trial, this would require pre-planning and a clear justification from an experimental point of view. Studies with interim analyses where there are marked differences between different study parts or stages will be difficult to interpret. From a regulatory point of view, whenever trials are planned to incorporate design modifications based on the results of an interim analysis, the applicant must pre-plan methods to ensure that results from different stages of the trial can be justifiably combined. In this respect, studies with adaptive designs need at least the same careful investigation of heterogeneity and justification as is usually required for the combination of individual trials in a meta-analysis.

Reflection Paper (EMEA)
The need to reassess sample size in some experimental conditions is acknowledged. However, if more than one sample size reassessment seems necessary, this might raise concern that the experimental conditions are fluctuating and not fully understood. External knowledge from other studies may suggest ...; in such cases, adaptive designs may allow an opportunity to discuss changes of the primary endpoint, changes in the components .... A change in the primary endpoint after an interim analysis should not be acceptable. An adaptive design, combined with a multiple testing procedure, may offer the opportunity to stop recruitment of a placebo group after an interim analysis as soon as superiority of the experimental treatment over placebo has been demonstrated.

Reflection Paper (EMEA)
Investigators may wish to further investigate more than one dose of the experimental treatment in phase III. Early interim results may resolve some of the ambiguities, and recruitment may be stopped for some doses: it is not sufficient to show that some dose of the experimental treatment is effective; in consequence, a multiple testing procedure to identify the appropriate dose should be incorporated. Switching from non-inferiority to superiority: there may be the desire to continue the study to demonstrate, with additional patients, superiority of the experimental treatment over the active comparator; this possibility should, however, be set into perspective. If, based on interim analysis results, it can be assumed that the trial will still have sufficient power but using a randomization ratio of, say, 1:2, then this may be seen as a useful option.

Reflection Paper (EMEA)
In some cases of late phase II development, the selection of doses is already well established and further investigation in phase II would be performed with the same endpoints that are of relevance in phase III. Similar considerations as outlined for the selection of treatment arms at an interim analysis may apply and would allow for the conduct of a combined phase II / phase III trial. Phase II / phase III combination trials, when appropriately planned, may be used to better investigate the correlation between surrogate endpoints and clinical endpoints, and may therefore support the process of providing justification that an optimal dose regimen for the experimental drug has been selected. However, it will not be acceptable to argue for the acceptability of an application with only one combined phase II / phase III trial.

Conclusions
In late-phase trials, flexibility should be used carefully and thoughtfully to maintain the integrity and persuasiveness of the results. Too early looks may be strongly misleading. Flexible designs are an excellent tool to deal with unexpected findings (e.g., larger sample sizes are needed because of an arising safety issue, or can be afforded because of a new market situation; a dose adaptation seems to be indicated; a subgroup stands out; ...).

References: Special issue "Adaptive designs in clinical trials", Biometrical Journal 48 (4), 2006.

Thank you for your patience!