TRANSCRIPT
CE Course on Adaptive Dose-Response Studies
2007 Joint Statistical Meetings, Salt Lake City, UT – July 29, 2007
PRESENTERS
Christopher S. Coffey, PhD, University of Alabama at Birmingham
Email: [email protected]
Brenda Gaydos, PhD, Eli Lilly and Company
Email: [email protected]
José Pinheiro, PhD, Novartis Pharmaceuticals
Email: [email protected]
LEARNING OBJECTIVES
At the end of the course, students should:
• Understand distinction between adaptive dose-response designs and other types of adaptive designs
• Understand use of adaptive designs in early and late phase drug development
• Understand advantages of adaptive dose-response designs over more traditional dose-response designs
• Understand how to implement adaptive dose-response trials
OUTLINE
I. What are Adaptive Designs? (~15 min.)
II. Summary of Development Stages (~15 min.)
III. Adaptive Dose-Response Methods for Early Exploratory Studies (~50 min.)
IV. Fixed Design Dose-Response Methods (~30 min.)
Break (~15 min.)
V. Adaptive Dose-Response Methods for Late-Stage Exploratory Development (~50 min.)
VI. Simulations to Illustrate the Performance and Implementation of Adaptive Dose-Response Designs and Their Comparison to Traditional Methods (~50 min.)
VII. Overall Conclusions and Recommendations (~15 min)
I. What are Adaptive Designs?
Outline:
1) Definition of an ‘adaptive design’.
2) Types of adaptive designs
3) Adaptive dose-response designs
WHAT ARE ADAPTIVE DESIGNS?
Recently, there has been considerable research on adaptive designs (also called flexible or innovative designs).
The rapid proliferation of interest in adaptive designs and the inconsistent use of terminology have created confusion about similarities and differences among the various techniques.
For example, the definition of an “adaptive design” itself is a common source of confusion.
WHAT ARE ADAPTIVE DESIGNS?
PhRMA Working Group on Adaptive Designs (2006):
“By adaptive design we refer to a clinical study design that uses accumulating data to modify aspects of the study as it continues, without undermining the validity and integrity of the trial.”
“…changes are made by design, and not on an ad hoc basis”
“…not a remedy for inadequate planning.”
WHAT ARE ADAPTIVE DESIGNS?
Adaptive designs are NOT new.
The methodology has existed for decades.
However, because this is a rapidly expanding area of research, more practical experience is needed.
Although adaptive designs are scientifically and operationally more complex, the issues are resolvable.
WHAT ARE ADAPTIVE DESIGNS?
Myth #1: A study with one or more protocol amendments is an adaptive design.
Protocol Amendment:
• Addresses the unanticipated
• Need may or may not be based on outcome data
• Can compromise validity of study conclusions
Adaptive by DESIGN:
• Planned flexibility to address areas of design uncertainty
• Impact of planned changes on conclusions understood
• MAY require a protocol amendment (but less likely)
WHAT ARE ADAPTIVE DESIGNS?
Myth #2: Adaptive design protocols should be vague to allow for flexibility.
In order to enable the process to be simulated, the extent to which adaptation is planned should be described a priori in detail, if possible.
Hung et al. (2006): “At the very least, the regulatory agencies need to know every detail of how the trial proceeded during its conduct and adaptations.”
WHAT ARE ADAPTIVE DESIGNS?
For this course, we focus on adaptive dose-response methods.
Such adaptive designs:
• Offer more efficient ways to learn about dose response
• Provide more information on dose-response profile earlier in development.
• Guide decision making on whether to continue program and, if so, which dose to select for further development
• Aim to increase probability of technical success by taking correct choice of dose forward for further study.
WHAT ARE ADAPTIVE DESIGNS?
Infinite number of adaptive design possibilities:
• Adapting dose is only one possibility.
• Many other aspects of the study can be changed:
- sample size
- final test statistic
- primary endpoint
- inclusion/exclusion criteria
- number of treatment arms
- randomization procedure
- number of interim looks
- goal: superiority to non-inferiority
• Define objective of the adaptation and the design elements to adapt.
WHAT ARE ADAPTIVE DESIGNS?
[Figure: classification of design changes as planned or unplanned. Planned changes make up the adaptive/flexible designs: adaptive dose-response (highlighted), seamless Phase II/III designs, adaptive randomization, sample size re-estimation, and changes to other aspects (test statistic, primary endpoint, inclusion/exclusion criteria, dose, etc.). Sample size re-estimation divides into internal pilots (estimated nuisance parameters), estimated treatment effect (known variance), and estimated “effect size”.]
WHAT ARE ADAPTIVE DESIGNS?
Historically, a great deal of controversy surrounding adaptive designs has been focused around a particular type of sample size re-estimation design:
[Figure: the sample size re-estimation branch of the diagram, with estimated treatment effect (known variance) highlighted alongside estimated “effect size” and internal pilots (estimated nuisance parameters).]
WHAT ARE ADAPTIVE DESIGNS?
When rule for increasing sample size can be pre-specified, sample size re-estimation based on a revised estimate of treatment effect is nearly always less efficient than a group sequential approach.
- Tsiatis & Mehta (2003); Jennison & Turnbull (2003, 2006); Mehta & Patel (2006)
However, little controversy surrounds the use of IP designs (re-estimating only nuisance parameters).
Since internal pilot (IP) designs can be implemented in large clinical trials with little penalty, their use should be encouraged.
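As a rough illustration of an internal pilot design, the sketch below recomputes the per-arm sample size of a two-arm trial after an interim estimate of the standard deviation, while the planned treatment effect stays fixed. The normal-approximation formula, the function name `per_arm_n`, and all numbers are illustrative assumptions rather than anything taken from the course.

```python
import math
from statistics import NormalDist

def per_arm_n(sigma, delta, alpha=0.05, power=0.90):
    """Per-arm n for a two-sided, two-sample z-test (normal approximation)."""
    z = NormalDist().inv_cdf
    return math.ceil(2 * (z(1 - alpha / 2) + z(power)) ** 2 * sigma ** 2 / delta ** 2)

# Planning stage: assumed SD (the nuisance parameter) and the effect of interest.
planned_n = per_arm_n(sigma=10.0, delta=5.0)

# Internal pilot: the pooled SD from the first patients is larger than assumed.
# Only the nuisance parameter is updated; delta stays at its planned value.
reestimated_n = per_arm_n(sigma=12.0, delta=5.0)

# A common restriction (assumed here): never go below the planned sample size.
final_n = max(planned_n, reestimated_n)
```

Because the treatment effect estimate is never looked at, this kind of update is what distinguishes an internal pilot from re-estimation based on the observed effect.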
WHAT ARE ADAPTIVE DESIGNS?
More recently, focus has shifted to the logistical barriers that need to be overcome before any adaptive design can be practically implemented.
These include:
Budget Administration
Increased communication with clinical sites
Information Technology
Protocol Issues
Shameless plug for upcoming 2007 JSM panel session:
Issues and Solutions to Planning and Implementing an Adaptive Design in Practice
Organizer: Brenda Gaydos, Eli Lilly and Company
Time: Monday, 10:30-12:20
Panelists:
Michael Krams, Wyeth
Paul Gallo, Novartis Pharmaceuticals
Gernot Wassmer, The University of Cologne
Jerald S. Schindler, Cytel
Christopher S. Coffey, UAB
WHAT ARE ADAPTIVE DESIGNS?
SUMMARY
Adaptive dose-response studies are one of many types of possible adaptive designs.
Adaptive designs are NOT always “better”.
Simulations under realistic scenarios are needed to assess how the design will perform.
Suggest routinely assessing the appropriateness of novel designs and analyses when developing clinical plans.
Adaptive by DESIGN – thorough upfront planning is required
II. Summary of Drug Development Stages
• Overview of Drug Development
• Traditional Phases of Clinical Development (I-III)
• Some Statistics
• Shift to Learn & Confirm Paradigm
OVERVIEW OF DRUG DEVELOPMENT
Drug Discovery
Preclinical Testing
File Investigational New Drug Application (IND)
Clinical Trial Development Phases
Phase I
Phase II
Phase III
File New Drug Application (NDA)
Review and Approval Process
Phase IV (Post-Marketing Studies)
PRIOR TO CLINICAL DEVELOPMENT
Drug Discovery (hypothesis generation)
Target-disease link identification & validation using biological tools
Assay development to support screening & evaluate screening hits
Molecule identification for preclinical testing
1 in 10,000 molecules synthesized will become a new medicine (approved)
Preclinical Testing
In vitro (laboratory) & in vivo (animal subjects) studies to determine preliminary efficacy and pharmacokinetic information
Determine dose range to explore in Phase I
File Investigational New Drug Application (IND)
Effective if FDA does not disapprove within 30 days
Institutional Review Boards (IRBs) where clinical studies will be conducted must review and approve prior to clinical study start
PHASE I
Typically in healthy volunteers (~ 20 to 100)
In cancer research, patients are typical (toxicity expected at efficacious levels)
Time in Phase I ~ 1.5 years
Objectives
Determine how the drug is absorbed, distributed, metabolized, and excreted
Identify the safe dose range for first efficacy dose (in patient studies)
– Maximum Tolerated Dose (MTD)
– No Observed Adverse Effect Level (NOAEL)
Assess preliminary efficacy
– If applicable animal model to relate concentration in animals to humans
– If applicable biomarker to relate biological activity to clinical efficacy
– If models available to relate concentration to marketed compound
PHASE I (cont.)
Typical Study Types
Single Dose Safety Study (SDSS)
– Also referred to as first human dose (FHD)
Multiple Dose Safety Study (MDSS)
Proof of Concept Study (PoC)
– Assess biological activity to support investment in Phase II
– MDSS study may be referred to as MDSS/PoC study if assessing preliminary efficacy & safety
– May be referred to as a Phase Ib study if in patients
Additional studies that may be run in Phase I
Biomarker studies
Methods studies
PHASE II
Clinical trials in patients (~ 100 to 500)
Time in Phase II ~ 2 years
Objectives
Determine if there is a therapeutic response
Determine the appropriate dose, dose regimen, and patient population (inclusion/exclusion criteria) for further study in Phase III
Clinical trial material is not yet commercial formulation
PHASE III
Clinical trials in patients (~ 1000 to 5000)
Time in Phase III ~ 3.5 years
Objective
Evaluate overall benefit-risk relationship of the drug
Provide adequate basis for physician labeling
Clinical trial material is the expected marketing formulation
OTHER CONCURRENT STUDIES
Toxicology studies in animals
Prior to Phase I
During Phase I and/or II to support longer term human exposure
Biopharmaceutical package (clinical pharmacology)
Typically run during Phase III
Characterize drug kinetics (e.g. drug-drug interaction studies, kinetics in special populations, bio-equivalence, QT-prolongation)
Material development
FOLLOWING CLINICAL DEVELOPMENT
NDA
Contains all scientific information gathered
Typically 100,000 pages or more
Once approved, company must continue to submit periodic reports including adverse reactions
Phase IV (Post-Marketing Studies)
Involves post-launch safety surveillance to monitor for any rare or long-term adverse effects and ongoing technical support of a drug
Studies may be mandated by regulatory authorities or undertaken by the company to gain more information
FROM WWW.PHRMA.ORG 7/07
Discovery/Preclinical Testing (6.5 years)
• Test population: laboratory and animal studies
• Purpose: assess safety, biological activity, and formulations
• Success rate: 5,000 compounds evaluated
File IND at FDA
Clinical Trials (success rate: 5 enter trials)
Phase I (1.5 years)
• Test population: 20 to 100 healthy volunteers
• Purpose: determine safety and dosage
Phase II (2 years)
• Test population: 100 to 500 patient volunteers
• Purpose: evaluate effectiveness, look for side effects
Phase III (3.5 years)
• Test population: 1000 to 5000 patient volunteers
• Purpose: confirm effectiveness, monitor adverse reactions from long-term use
File NDA at FDA
FDA (1.5 years)
• Purpose: review process / approval
• Success rate: 1 approved
Total: 15 years
Phase IV
• Additional post-marketing testing required by FDA
SOME STATISTICS
Development time from lab to patient: 10-15 years *
Cost of bringing a new medicine to market: $800 million to $1.7 billion **
Success rate for novel candidate at Phase 1 to market: 8% **
Failure rate in Phase II: 40% *
Failure rate in Phase III: 45% ***
Development costs escalating: risen 55% in last 5 years **
* PhRMA; ** FDA Critical Path, 2004; *** Kola & Landis, 2004
R & D PRODUCTIVITY DECREASING
[Figure: industry R&D expense ($ billions, axis from $0 to $40) and annual NME approvals (axis from 0 to 200) by year, 1980 to 2005. Series: R&D investment, NME approvals.]
Source: PhRMA, FDA, Lehman Brothers
CRITICAL PATH INITIATIVE: A CALL TO ACTION
"Critical Path" Paper Calls for
Academic Researchers, Product Developers, and Patient Groups To Work With FDA To Help Identify Opportunities to Modernize Tools for Speeding Approvable, Innovative Products To Improve Public Health
www.fda.gov/oc/initiatives/criticalpath/whitepaper.html
Source: Lawrence J. Lesko, Clinical Pharmacology Subcommittee of ACPS, Nov 4, 2004
SHIFT TO LEARN & CONFIRM
[Figure: timeline from IND to NDA submission contrasting the phased approach (Phase 1, Phase 2, Phase 3, Phase 4, with transition time between phases) with the learn-and-confirm approach (Learn, transition zone, Confirm, Phase 4).]
Source: Robert R. Ruffolo, Jr., Ph.D., Wyeth
WHY THE SHIFT?
Increased focus on the “Learn” phase
– More information on dose-response (safety/efficacy)
Reduction in transition time
– Seamless from Phase I-II
– Seamless from Phase II-III
Expected Outcome
– Reduce clinical development time & costs
– Increase information at time of NDA
OBJECTIVES ADAPTIVE DOSE-RESPONSE
Getting information as fast as ethically possible about key aspects of the dose-response
Increased information with improved efficiency
Explore more doses with same sample size as fixed design
More observations at doses that better inform the dose-response
More observations on the doses that are most promising
Feasible to combine PoC and Phase II dose finding with early stopping for futility
Shorten development timelines
More informative Go / No Go decisions
Improve Pr(TS), the probability of technical success, in Phase III
Feasible to combine Phase II dose finding with Phase III
III. Adaptive Dose-Response Methods for Early Exploratory Studies
Outline:
• Summary of major philosophies regarding definition of maximum tolerated dose (MTD)
• Conventional 3+3 designs
• Model-based designs
• Case Studies
MTD - DEFINITION
Phase I clinical trials typically want to determine some maximum tolerated dose (MTD).
Accurate determination of the MTD is very important since the dose established as the MTD will be used for further testing in later phases.
Passing on too low of a dose may jeopardize a potentially useful drug
Passing on too high of a dose puts patients in later phase trials at risk
MTD - DEFINITION
Two major philosophies regarding MTD definition:
Dose that, if exceeded, would put patients at ‘unacceptable risk’ of toxicity.
• Treat the MTD as being observed from the data
• Vague from a statistician's point of view since ‘unacceptable risk’ may not be defined quantitatively
Specifying ‘unacceptable risk’ as a probability.
• Treat the MTD as an unknown parameter of a monotonic dose response curve.
• The MTD is estimated corresponding to a specified probability.
MTD - DEFINITION
These two definitions lead to two different approaches for designing phase I clinical trials:
1) Conventional up-and-down designs
• Such as 3+3 designs for cancer
2) Model-based designs where MTD is a quantile to be estimated
• Random walk rule
• Bayesian methods
CONVENTIONAL 3+3 DESIGNS
Conventional 3+3 methods employ an ad-hoc approach to screen dose levels and identify the MTD.
Toxicity is defined as a binary event and patients are treated in groups of three, starting with the initial dose.
Algorithm iterates moving dose up or down depending on the number of toxicities observed.
No estimation in a traditional sense is involved.
The MTD is a statistic identified from the data - highest dose studied with less than, say 1/3 toxicities (i.e., 0 or 1 dose-limiting toxicities out of six patients).
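[Flowchart: Start at the lowest reasonable dose. Treat 3 patients at the dose and count events. With 0 events, increase dose to next level. With 1 event, treat 3 additional patients at the dose and count events again: with 0 events, increase dose to next level; with 1 or more, decrease dose or stop and select lower dose. With 2 or more events in the first 3 patients, decrease dose or stop and select lower dose.]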
CONVENTIONAL 3+3 DESIGNS
[Figure: chance of “stepping up” (0% to 100%) versus true “p” (0.0 to 1.0).]
Even with a 30% chance of an “event” there is still a 50% chance of stepping up!
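The stepping-up probability quoted on this slide can be checked directly from the 3+3 rule: the design escalates if 0 of the first 3 patients have an event, or if exactly 1 of 3 does and none of the next 3 do. A minimal sketch, assuming independent patients with a common event probability (the function name `prob_step_up` is ours):

```python
def prob_step_up(p):
    """Chance that a 3+3 design escalates past a dose with true event rate p."""
    q = 1.0 - p
    none_of_three = q ** 3            # 0/3 events: escalate immediately
    one_of_three = 3 * p * q ** 2     # 1/3 events: treat 3 more patients
    # After 1/3 events, escalate only if the second cohort has 0/3 events.
    return none_of_three + one_of_three * q ** 3

for p in (0.1, 0.2, 0.3, 0.5):
    print(f"true p = {p:.1f}: chance of stepping up = {prob_step_up(p):.3f}")
```

At p = 0.3 this gives about 0.494, which agrees with the “50% chance” stated on the slide.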
CONVENTIONAL 3+3 DESIGNS
Strengths:
Simple to implement and understand
Requires no computer program
Familiar to many clinicians
CONVENTIONAL 3+3 DESIGNS
Drawbacks:
Estimate of MTD has no clear relationship with any percentile of the dose toxic response distribution
Tend to treat many patients at low, ineffective doses
No satisfactory approach for obtaining CI for MTD.
Often provide poor estimates of MTD (i.e., large uncertainty) - probability of stopping at incorrect dose is generally higher than perceived.
Hence, unsafe or non-efficacious doses may be advanced to Phase III trials.
RWR DESIGNS
Random Walk Rule (RWR, biased coin) designs:
Non-parametric model-based approaches to MTD estimation.
• MTD is treated as a quantile of a dose-response distribution, but no underlying parametric distribution is assumed.
Sample dose levels in unimodal region around MTD.
Provides unified approach targeting any quantile of interest.
Generalization of conventional up-and-down methods.
RWR DESIGNS
As in conventional designs, patients are treated sequentially, and dose escalation occurs when no toxicities are observed.
However, instead of applying a deterministic rule, a “biased coin” is flipped after observing each response.
The algorithm escalates to the next dose with probability p, where p depends on the targeted level of the response.
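One common version of this rule is the biased coin design of Durham and Flournoy: after a toxicity the next patient gets the next lower dose; after no toxicity the coin is flipped and the dose goes up with probability target / (1 - target), otherwise it stays. The sketch below simulates that version. It is an assumption that this is the variant the presenters had in mind, and the true toxicity rates, seed, and function names are hypothetical.

```python
import random

def escalation_prob(target):
    """Coin bias b = target / (1 - target), valid for target <= 0.5."""
    if not 0.0 < target <= 0.5:
        raise ValueError("this rule assumes 0 < target <= 0.5")
    return target / (1.0 - target)

def simulate_biased_coin(true_tox, target, n_patients, seed=0):
    """Return the dose index assigned to each patient, treating one at a time."""
    rng = random.Random(seed)
    b = escalation_prob(target)
    level, path = 0, []                       # start at the lowest dose
    for _ in range(n_patients):
        path.append(level)
        toxicity = rng.random() < true_tox[level]
        if toxicity:
            level = max(level - 1, 0)         # step down after a toxicity
        elif rng.random() < b:
            level = min(level + 1, len(true_tox) - 1)  # heads: step up
        # tails: stay at the current dose
    return path

true_tox = [0.05, 0.10, 0.20, 0.35, 0.50]     # hypothetical true toxicity rates
path = simulate_biased_coin(true_tox, target=0.20, n_patients=30)
```

With a target of 20% the coin comes up heads a quarter of the time, so the walk climbs slowly and tends to spend most of its time near the dose whose toxicity rate matches the target.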
RWR DESIGNS
Strengths:
Non-parametric
Having a workable finite distribution theory
Simple and intuitive to implement
Simple software in MATLAB has been developed which gives the finite properties of the design (Durham, Flournoy, & Rosenberger, 1997)
CRM
Continual Reassessment Method (CRM):
Originated as a Bayesian method for phase I cancer trials of cytotoxic agents.
For a pre-defined set of doses and a binary response, estimates MTD as the dose level that yields a particular target proportion of responses (e.g., TD20).
Assumes a particular model (such as logistic function)
Assignment of doses converges to the MTD.
See Garrett-Mayer (2006) for an excellent tutorial.
CRM
The method assumes that the probabilities of both efficacy and toxicity increase with increasing dose.
The method also assumes that toxicity can be defined as a binary outcome.
The “acceptable” toxicity rate is explicitly defined and the MTD is the highest (most efficacious) dose with acceptable toxicity.
Similar designs can be used to explore dose-efficacy relationships (for agents that are non-cytotoxic).
CRM
The method begins with an assumed a priori dose-toxicity curve and a chosen target toxicity rate.
The first patients are assigned the dose most likely to be associated with the target toxicity level.
The estimated dose-toxicity curve is refit (i.e., the posterior distribution of the model is updated) after each patient’s outcome has been observed.
Hence, the updated curve is shifted slightly up or down depending on whether the patient experienced a dose-limiting toxicity.
CRM
The next patient is assigned the dose closest to the MTD based on the updated dose-toxicity curve (posterior distribution).
Patients continue to be treated until some pre-defined level of certainty is achieved or pre-defined stopping criteria are met.
Once the stopping criteria are met, the final dose is selected as the MTD.
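The updating steps above can be sketched in a few lines. This is a minimal illustration, not a validated implementation: it assumes a one-parameter logistic curve with the intercept fixed at 3, a unit exponential prior on the slope, dose labels back-solved from hypothetical prior guesses (the "skeleton"), and a simple grid in place of proper numerical integration. All names and numbers are ours.

```python
import math

def tox_prob(a, d):
    """One-parameter logistic curve with the intercept fixed at 3 (assumed)."""
    return 1.0 / (1.0 + math.exp(-(3.0 + a * d)))

def dose_labels(skeleton, a0=1.0):
    """Back-solve dose labels so the curve at a = a0 matches the prior guesses."""
    return [(math.log(p / (1.0 - p)) - 3.0) / a0 for p in skeleton]

def posterior_tox(labels, outcomes, grid_size=2000, a_max=10.0):
    """Posterior mean toxicity at each dose under a unit exponential prior on a.

    outcomes is a list of (dose_index, had_dlt) pairs.
    """
    step = a_max / grid_size
    grid = [(k + 0.5) * step for k in range(grid_size)]
    weights = []
    for a in grid:
        w = math.exp(-a)                        # unit exponential prior density
        for idx, dlt in outcomes:
            p = tox_prob(a, labels[idx])
            w *= p if dlt else (1.0 - p)        # Bernoulli likelihood
        weights.append(w)
    total = sum(weights)
    return [sum(w * tox_prob(a, d) for w, a in zip(weights, grid)) / total
            for d in labels]

def next_dose(post, target):
    """Index of the dose whose estimated toxicity is closest to the target."""
    return min(range(len(post)), key=lambda i: abs(post[i] - target))

skeleton = [0.05, 0.10, 0.20, 0.35]             # hypothetical prior guesses
labels = dose_labels(skeleton)
outcomes = [(2, False), (2, False), (2, True)]  # hypothetical patients at dose index 2
post = posterior_tox(labels, outcomes)
recommended = next_dose(post, target=0.20)
```

Each new outcome is appended to `outcomes` and the two calls are repeated, so the recommended dose can move up or down as the estimated curve shifts.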
CRM
For example, consider the following curve:
[Figure: assumed dose-toxicity curve; event rate (0 to 1) versus dose (0 to 20).]
If target level of toxicity is 10%, then dose level 5 would be the optimal starting dose.
CRM
An example of how the CRM might work:
[Figure: event rate (0 to 1) versus dose (0 to 20).]
CRM
An example of how the CRM might work:
[Figure: event rate (0 to 1) versus dose (0 to 20), with the final dose marked.]
CRM
The implementation of a CRM requires a substantial collaboration between the investigator and statistician.
This collaboration is important in order to determine:
The dose-toxicity model to use
The target rate of toxicity (or response)
Stopping rules
CRM
Several types of mathematical models for the dose-toxicity curves may be chosen:
“One parameter” logistic models fix a midpoint and use the data to estimate the slope of the curve.
“Two parameter” logistic models estimate parameters that determine both midpoint and slope of the curve.
Hyperbolic tangent models
The choice of model is an aspect of CRM design that requires a statistician's assistance.
However, the estimation of the MTD has been shown to be fairly robust to model misspecification.
CRM
Choosing the target rate of toxicity is a key component of a CRM.
This requires defining the dose for which the probability of a dose-limiting toxicity (DLT) is equal to some specified value θ:
Pr{ DLT | Dose = MTD } = θ
This determination should involve the opinions of several investigators and will depend on the nature of the DLT.
CRM
Stopping rules for CRM designs:
Continue until a fixed number of patients is treated.
Continue until a fixed number of patients have been treated at a dose (for discrete doses).
Continue until the target dose changes by less than 10% (for continuous doses).
CRM
When publishing results from a CRM trial, typical to display:
The recommended dose (MTD) for a future trial, along with some estimate of the variability surrounding the MTD estimate.
A table that shows how the CRM progressed, including:
• Number of dose-limiting toxicities for each cohort
• Estimated dose at end of each cohort
CRM
Strengths:
“Learns” from information gained at early time points in the study – all patients studied contribute to the estimated dose.
Less likely to treat patients at toxic doses – tends to incur fewer dose-limiting toxicities.
More likely to treat patients at efficacious doses
Can more accurately estimate the MTD as compared to standard 3+3 designs
CRM
Drawbacks:
Mathematical and statistical complexities make it difficult for many clinical investigators to understand.
Properties must be assessed via simulation.
Early on, large dose escalations can occur based on little information which may cause more patients to be treated at unsafe doses.
Dosing first patients at level deemed appropriate by the a priori curve may be worrisome due to uncertainty surrounding this curve.
MODIFIED CRM’s
To address some of the concerns with the original CRM, several modified CRM approaches have been developed and implemented:
Always start at the lowest dose level under consideration
Enroll 2-3 patients in each cohort
Proceed as a standard 3+3 dose escalation design in the absence of dose-limiting toxicities.
Any given dose escalation cannot increase by more than one level.
MODIFIED CRM’s
Strengths:
Mathematical model is not solely responsible for determining dosage increases – restricted by design.
Starting dose can be chosen as with a traditional design – start dosing at the lowest level
MODIFIED CRM’s
Drawbacks:
Mathematical and statistical complexities make it difficult for many clinical investigators to understand.
Properties must be assessed via simulation.
OTHER BAYESIAN DESIGNS
1) Escalation with overdose control:
Similar to CRM, but addresses ethical need to control probability of overdosing.
Designed to approach MTD as rapidly as possible subject to constraint that the predicted proportion of patients given an overdose is less than or equal to a specified bound α.
The dose for each patient is chosen so that the predicted probability that it exceeds the MTD is α.
Bayesian feasible – minimizes the predicted amount by which any given patient is overdosed.
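One way to read the overdose-control rule: the next dose is the quantile of the posterior distribution of the MTD at the chosen feasibility bound, so the posterior probability that the dose exceeds the MTD does not go above that bound. The sketch below assumes posterior draws of the MTD are already available (here simulated from a hypothetical lognormal) and restricts the choice to a hypothetical set of discrete doses. It is an illustration of the idea, not the published EWOC implementation.

```python
import random

def ewoc_dose(mtd_draws, available_doses, feasibility_bound=0.25):
    """Largest available dose whose posterior overdose probability is <= the bound."""
    n = len(mtd_draws)
    best = None
    for dose in sorted(available_doses):
        # Posterior probability that this dose exceeds the MTD.
        p_overdose = sum(1 for m in mtd_draws if m < dose) / n
        if p_overdose <= feasibility_bound:
            best = dose
    return best  # None if even the lowest dose is judged too risky

rng = random.Random(1)
mtd_draws = [rng.lognormvariate(3.0, 0.4) for _ in range(5000)]  # hypothetical posterior
dose = ewoc_dose(mtd_draws, available_doses=[5, 10, 15, 20, 30])
```

Lowering the feasibility bound makes the design more conservative, since fewer doses satisfy the overdose constraint.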
OTHER BAYESIAN DESIGNS
2) Designs based on Bayesian decision theory:
Focused on efficient estimation and decision making by providing tools to achieve various goals
- Reducing sample size
- Reducing cost
- Maximizing information
- Increasing likelihood of making a ‘correct’ decision
Use gain functions (based on desired goal) that are sequentially updated after each response.
Next dose assignment is determined by maximizing the gain function.
OTHER BAYESIAN DESIGNS
2) Designs based on Bayesian decision theory (cont.):
Whitehead & Brunier (1995) introduced a design that incorporates elements of Bayesian decision theory:
• Set of assigned dose levels
• Priors and loss functions
At each stage, a dose is selected by minimizing the asymptotic posterior variance of the MTD estimator with respect to the possible doses to be assigned.
Specifying the loss function as minimizing distance that next assigned dose is from the target quantile will yield a procedure similar to the CRM.
OTHER BAYESIAN DESIGNS
3) Bayesian D-optimal designs:
Similar to decision theoretic approaches
Concerned with both efficiency of estimation and protecting patients from being assigned to highly toxic doses.
Introduces formal optimality criterion (D-optimality) minimizing the determinant of the variance-covariance matrix of the model parameter estimates.
Constraint incorporates optimal design points and ensures that probability an administered dose exceeds the maximum acceptable dose is low.
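To make the D-optimality criterion concrete, the sketch below evaluates the determinant of the Fisher information matrix for a two-parameter logistic dose-toxicity model under a few candidate designs. Maximizing this determinant is equivalent to minimizing the determinant of the asymptotic variance-covariance matrix of the parameter estimates. The model form, parameter values, and candidate designs are hypothetical, and the safety constraint described above is left out.

```python
import math

def logistic(x):
    return 1.0 / (1.0 + math.exp(-x))

def information_det(doses, weights, intercept, slope):
    """det of the Fisher information for logit P(tox) = intercept + slope * dose."""
    m00 = m01 = m11 = 0.0
    for d, w in zip(doses, weights):
        p = logistic(intercept + slope * d)
        v = w * p * (1.0 - p)          # information weight of one observation
        m00 += v
        m01 += v * d
        m11 += v * d * d
    return m00 * m11 - m01 ** 2        # larger det = smaller generalized variance

# Hypothetical current parameter estimates and candidate designs.
intercept, slope = -3.0, 0.5
candidates = {
    "low/high": ([2.0, 10.0], [0.5, 0.5]),
    "middle": ([5.0, 7.0], [0.5, 0.5]),
    "single": ([6.0], [1.0]),
}
dets = {name: information_det(d, w, intercept, slope)
        for name, (d, w) in candidates.items()}
best = max(dets, key=dets.get)
```

A single-dose design has a determinant of zero because two parameters cannot be estimated from one dose. Since the criterion depends on the current parameter estimates, the preferred allocation changes as those estimates are updated.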
OTHER BAYESIAN DESIGNS
3) Bayesian D-optimal designs (cont.):
The optimal allocation changes with each update of the posterior distribution.
Target the overall dose-response curve rather than the MTD only; therefore, any level of response can be estimated.
Concerned mainly with collective ethics (doing what is best for future patients) as opposed to individual ethics (doing what is best for current patients)
However, computational demands are high
PENALIZED D-OPTIMAL DESIGNS
Penalized D-Optimal designs:
Non-Bayesian designs which allow simultaneous assessment of efficacy and toxicity.
Attempt to find design that maximizes the information (collective ethics) but under control of total penalty for treating patients in the trial (individual ethics).
Similar to Bayesian D-optimal designs, the D-optimality criterion is applied at each step of the sequential trial to maximize expected increment of information about efficacy and toxicity dose response.
PENALIZED D-OPTIMAL DESIGNS
Penalized D-Optimal Designs (cont.) :
Flexibility in constraint and underlying bivariate model make approach particularly useful in early phase trials.
Scope extends beyond MTD estimation and allows a number of questions involving efficacy and safety dose-response relationships to be addressed simultaneously.
Has potential to accelerate the drug development process by combining traditional Phase I and Phase II into a single trial.
CASE STUDIES
Objective: Establish the MTD of Nalmefene that spares analgesia in an acceptable number of patients receiving epidural Fentanyl and dilute Bupivacaine for postoperative pain control.
CASE STUDIES
Doses Studied: 0.25, 0.50, 0.75 or 1.00 µg/kg Nalmefene
Toxicity: Reversal of analgesia was defined as increase in pain score of 2 or more above baseline on a visual analog scale from 0-10 after nalmefene administration.
MTD: That dose (among the four studied) with a final mean probability of reversal of analgesia (ROA) closest to 20%.
The investigators utilized the modified CRM.
Patients were treated in cohorts of one, starting with the lowest dose.
CASE STUDIES
The investigators assumed that a one-parameter logistic function described the risk of reversal of analgesia (ROA) at the ith nalmefene dose, d_i:
$$\Pr(\mathrm{ROA} \mid d_i) = \frac{\exp(3 + \alpha d_i)}{1 + \exp(3 + \alpha d_i)}$$
In order to estimate an initial probability of ROA for each dose, a prior unit exponential distribution was chosen for the model parameter α.
The estimated curve was modified after the response for each subject was observed.
The MTD was determined after 25 patients were treated and evaluated for ROA.
CASE STUDIES
Sequence of nalmefene doses over the course of the trial is shown to the right.
The modified CRM treated the last 7 patients, and 15 of the last 17 patients, at the estimated MTD.
CASE STUDIES
After 25 patients were treated, the final estimated median posterior probabilities of ROA were:
- 11% for 0.25 dose group
- 21% for 0.50 dose group
- 41% for 0.75 dose group
- 80% for 1.00 dose group
CASE STUDIES
Objective: Determine the minimum effective dose regimen (MEDR) of intravenous ibuprofen required to close ductus arteriosus in infants with a postmenstrual age of 27-29 weeks at birth.
CASE STUDIES
Doses Studied: Loading doses of 5, 10, 15, or 20 mg/kg, followed by two doses (half loading dose) at 24 hr. intervals
Efficacy: Target closure
MEDR: That dose (among the four studied) with a final mean probability of target closure closest to 80%.
The investigators utilized the CRM.
Cohorts of three consecutive patients received the same dose regimen.
CASE STUDIES
Each of the four dose levels was arbitrarily associated by the investigator with the following prior guesses of success probability: 60%, 80%, 90%, and 95%.
The one-parameter logistic model (with scale parameter fixed at 3) was chosen in order to fit the dose-response curve.
A prior exponential distribution with parameter 0.5 was initially chosen for the model parameter.
The dose allocated to each new cohort of patients was the dose level with updated response probability closest to the target rate of 80%, unless adverse events were observed.
CASE STUDIES
The CRM continued until one of the following was met:
A total of 20 subjects were studied
Estimated efficacy was too low for all levels
Suitable estimation of the MEDR was obtained – based on predictive gains of further patient inclusions on the response probability and width of the credibility interval
CASE STUDIES
Sequential posterior estimated probabilities of success of the four tested doses, updated after each new cohort, are shown to the right.
Failures were recorded in 4 patients.
CASE STUDIES
After 20 patients were treated, the final estimated mean posterior probabilities of success were:
- 56% for 5 mg/kg group
- 77% for 10 mg/kg group
- 88% for 15 mg/kg group
- 94% for 20 mg/kg group
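Both case studies define the selected dose as the one whose final estimated probability is closest to the target (20% reversal of analgesia for nalmefene, 80% closure for ibuprofen). The short sketch below applies that stated rule to the final estimates reported on the slides. The function name is ours, and the slides themselves do not state which dose the investigators ultimately selected.

```python
def closest_to_target(estimates, target):
    """Dose whose estimated response probability is closest to the target."""
    return min(estimates, key=lambda dose: abs(estimates[dose] - target))

# Final posterior estimates reported for the two case studies.
nalmefene_roa = {0.25: 0.11, 0.50: 0.21, 0.75: 0.41, 1.00: 0.80}   # µg/kg
ibuprofen_closure = {5: 0.56, 10: 0.77, 15: 0.88, 20: 0.94}        # mg/kg loading dose

mtd = closest_to_target(nalmefene_roa, target=0.20)
medr = closest_to_target(ibuprofen_closure, target=0.80)
```

With these numbers the rule points to 0.50 µg/kg nalmefene and a 10 mg/kg ibuprofen loading dose.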
SUMMARY
Standard 3+3 designs were not designed with the intention of producing accurate estimates of a target quantile.
Rather, they are designed to screen drugs quickly and identify a dose level that does not exhibit too much toxicity.
Bayesian model-based methods (CRM, EWOC, decision theoretic approaches, etc.) provide better estimates of the MTD and dose-response curve.
SUMMARY
However, such methods are complicated to explain to non-statisticians and computationally challenging to implement.
The key to their usefulness lies in the packaging of these methods in user-friendly software that runs quickly and is well-documented.
IV. Fixed Design Dose-Response Methods
Outline:
• Traditional designs
• Multiple comparison procedures approach
• Modeling approach
• Combination methods
DIFFERENT GOALS
Establish proof-of-concept (PoC): response (typically a biomarker) changes with dose
Obtain maximum tolerated dose (MTD), or maximum safe dose (MSD) – safety driven
Estimate minimum effective dose (MED), maximum useful dose (MUD) – efficacy driven
Model dose-response relationship for efficacy, safety, or both
Fixed designs: allocation ratios are determined prior to start of trial and remain unchanged during it
TRADITIONAL DESIGNS
Choice of design will depend on study goals
Parallel groups
Patients independently randomized to dose groups, each patient receives just one dose
Inter-patient variation influences precision → larger N
Most commonly used design in dose finding (DF) studies
Cross-over designs
Each patient receives all available doses, in randomized sequence (typically chosen to minimize confounding with period, previous dose, etc; e.g., Williams design, Latin squares)
Within-patient variance determines precision → smaller N
Typically only used when endpoint is persistent (e.g., Asthma)
TRADITIONAL DESIGNS (CONT.)
Dose escalation
Cohorts of patients allocated sequentially to increasing doses
Safety is evaluated for current cohort, before new one started
Main goal is to estimate maximum tolerated dose (MTD)
Placebo and/or active control patients included for blinding
Titration designs
Patients are titrated to desired dose level – can be optional (e.g., based on efficacy) or forced
Optional titration designs can be challenging for dose response estimation (e.g., non-responder receiving higher doses)
Factorial designs (drug combinations)
Randomized concentration designs
MINIMUM EFFECTIVE DOSE – MED
MED is one of the key concepts in dose finding, often assumed the target dose
ICH-E4 (1996): Dose-response information to support drug registration: “… smallest dose with a discernible useful effect …”
Ruberg, 1995: “… smallest dose producing a clinically important response that can be declared statistically significantly different from placebo …”
General perception: too high doses are brought into Ph. III
FDA: 20% of drugs approved between 1980 and 1999 had dose changed by more than 33% after approval (80% reductions)
ANALYSIS APPROACHES IN DF
Main strategies: (i) multiple comparison procedures (MCP) based on contrast tests of doses and (ii) modeling of dose response relationship
MULTIPLE COMPARISONS PROCEDURES
Two main goals: identification of dose response signal (PoC) and selection of target dose – both implemented via hypothesis testing
Two levels of multiplicity involved:
PoC: multiple samples – adequate global test (e.g., trend test)
Dose selection: multiple testing, multiplicity adjustment (e.g., Dunnett, Hochberg)
MCP is the most common approach used in DF studies – sample size calculations are typically based on the power to establish PoC for an assumed treatment effect
Dose is treated as a categorical variable
MCP - ADVANTAGES
Easy to implement and interpret: series of individual hypothesis tests based on contrasts between doses
Does not require much prior knowledge of dose response relationship – less sensitive to assumptions
Useful with small number of doses (e.g., 2 or 3), when modeling is not feasible
Reliable, validated software available for analysis (e.g., PROC MULTTEST in SAS)
MCP - DISADVANTAGES
Not designed for estimation of target dose, such as MED: can only select one out of doses used in trial
Does not provide information about precision of selected dose – confidence intervals not available
Including clinical relevance criterion typically difficult (emphasis is on hypothesis testing)
Does not provide information on dose response (DR) profile
MODELING
Parametric model is used to represent DR profile
Requires sufficient number of doses (typically > 3) and previous knowledge of DR shape
Dose is treated as a continuous variable
Dose response models are typically non-linear and monotone; typical examples:
Linear, non-monotonic, and non-parametric (e.g., splines) can also be used in practice, typically when less is known
MODELING (CONT.)
Target dose estimation is done via inverse regression
PoC can be tested based on fitted model (e.g., likelihood ratio test vs. flat DR model)
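A minimal sketch of target dose estimation by inverse regression, assuming a fitted Emax model for the placebo-adjusted effect. The parameter values (Emax = 0.7, ED50 = 0.2) are hypothetical and only illustrate the inversion; the clinically relevant effect of 0.4 is the one used in the MCP-Mod example later in this section.

```python
def emax_effect(dose, emax, ed50):
    """Placebo-adjusted mean effect under an Emax model: Emax * d / (ED50 + d)."""
    return emax * dose / (ed50 + dose)


def med_emax(delta, emax, ed50):
    """Invert the Emax curve: smallest dose whose effect equals delta.

    Returns None when delta cannot be reached (delta >= Emax) or is not
    a positive effect, i.e. the MED estimate does not exist.
    """
    if delta <= 0 or delta >= emax:
        return None
    # Solve emax * d / (ed50 + d) = delta for d.
    return delta * ed50 / (emax - delta)


# Hypothetical fitted parameters; delta = 0.4 is the clinically relevant effect.
med = med_emax(delta=0.4, emax=0.7, ed50=0.2)
unreachable = med_emax(delta=0.8, emax=0.7, ed50=0.2)
```

The second call shows the case where the model plateau lies below the required effect, so no MED estimate exists.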
MODELING – ADVANTAGES
Straightforward to estimate target doses, such as MED and MUD, which do not need to be included in study
Precision of estimated target doses can be assessed, e.g., using confidence intervals – can also be used for evaluating sample size calculations
Easy to include requirements on clinical relevance
Allows better understanding of DR, providing useful information for planning future studies (e.g., simulations)
Does not involve multiple comparisons, so multiplicity adjustment is not needed
MODELING – DISADVANTAGES
Requires prior knowledge of DR shape, if parametric model is used – more sensitive to assumptions
Difficult to use with small number of doses
Estimation and analysis are less straightforward than with MCP, especially when nonlinear models are used
Sample size calculations are more complex, generally requiring simulations
TYPICAL DOSE RESPONSE MODELS
MODEL SELECTION PROBLEM
True dose response shape is typically unknown at the time study is being planned
Choice of working model may have substantial impact on dose estimation
Current model selection approaches do not take into account statistical uncertainty associated with choice of DR model
How to combine MCP and Modeling, benefiting from the advantages of each approach?
MCP-MOD – A UNIFIED DF APPROACH
Set of candidate models
Optimal contrast coefficients
Selection of significant models while controlling FWER
Selection of a single model using max t, AIC, possibly combined with external data
Dose estimation and selection (MED, MSD,…)
MCP-MOD – OVERVIEW
DR model does not need to be specified beforehand; just set of possible candidate models
Candidate models are expressed in terms of optimal contrasts (maximize power of test when model is correct)
MCP approach used to control FWER of multiple model contrasts test used to test PoC
When PoC established, select best DR model (e.g., AIC)
Selected model used to estimate target doses, taking into account clinical relevance – estimate may not exist
Precision of target dose estimates can be assessed and used for sample size calculations
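A rough sketch of how candidate shapes translate into contrast coefficients. It assumes equal allocation and a common variance, in which case the power-maximizing contrast is proportional to the centered model means. The dose levels are those of the example that follows; the Emax guess ED50 = 0.2 is hypothetical.

```python
import math


def optimal_contrast(means):
    """Unit-length contrast proportional to mu - mean(mu).

    Valid under equal sample sizes and equal variances (an assumption here).
    """
    center = sum(means) / len(means)
    deviations = [m - center for m in means]
    norm = math.sqrt(sum(d * d for d in deviations))
    return [d / norm for d in deviations]


doses = [0.0, 0.05, 0.2, 0.6, 1.0]

# Standardized candidate shapes; ED50 = 0.2 is a hypothetical prior guess.
shapes = {
    "linear": [d for d in doses],
    "emax": [d / (0.2 + d) for d in doses],
}
contrasts = {name: optimal_contrast(mu) for name, mu in shapes.items()}

# With a balanced design, the correlation between two contrast test
# statistics is the inner product of their unit-length contrasts.
correlation = sum(a * b for a, b in zip(contrasts["linear"], contrasts["emax"]))
```

Highly correlated contrasts cost less in the multiplicity adjustment, which is why the correlation is worth inspecting when the candidate set is chosen.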
MCP-MOD – EXAMPLE
Randomized, double-blind, parallel group DF study
Placebo and four active doses: 0.05, 0.2, 0.6, and 1
100 patients per arm
Normally distributed endpoint, constant variance
All doses well tolerated – MSD > 1
Planned PoC test: step-down hierarchical procedure; preserve 5% one-sided FWER
MCP-MOD – EXAMPLE (CONT.)
What should be the MED?
EXAMPLE – CANDIDATE MODELS
Five candidate models identified: linear, linear in log-dose, Emax, quadratic, and exponential
High correlation between some contrasts (e.g., linear and linear in log-dose) → less impact on multiplicity adjustment
EXAMPLE – MODEL CONTRASTS
EXAMPLE – RESULTS
All contrast tests highly significant – critical value, adjusting for multiplicity = 1.93, 5% one-sided FWER
Emax model selected as best, based on AIC
Clinically relevant effect: increase of Δ = 0.4 over placebo
Different MED estimates
pd is predicted DR at dose d, Ld and Ud are CI limits
EXAMPLE – MED ESTIMATES
CONCLUSIONS
Fixed dose allocation designs are still prevalent in clinical development
Multiple comparison procedures are most commonly used approach for establishing PoC and estimating dose
Model-based methods are generally advantageous (compared to MCP), but require more assumptions
Combination methods taking advantage of the better features of MCP and modeling are available – more experience using them is needed, as is wider software availability
Adaptive dose allocation methods give greater flexibility and can lead to substantial gains in efficiency
1. Multiple Comparison Methods
Combination Tests
2. Model Based Methods
Bayesian Approach
Normal Dynamic Linear Model
D-Optimal Criterion
Clinical Utility Function
V. Adaptive Dose-Response Methods for Late-Stage Exploratory Development
Focus for late stage exploratory designs:
Population average dose response studies in patients
Other important considerations:
Exposure-response models
– Increase understanding of population dose response
– Adjust individual dosing in practice
– Useful to develop adaptive dose-response designs
Final conclusions on dose-response from entire database
– Not restricted to studies designed to inform about dose response
LEARNING ABOUT THE DOSE-RESPONSE
CLASSIFICATION OF ADAPTIVE METHODS
Multiple Comparison Approaches
Frequentist based
Some approaches imbed Bayesian methods within study stages
Model Based Approaches
Frequentist & Bayesian
Normal Dynamic Linear Model (NDLM, non-parametric)
D-optimal Criterion (parametric)
Advantages
Very few (or no) assumptions about dose response shape
E.g. monotonic µ1 < µ2 < µ3 < µ4
Strong control of Type I error
Disadvantages
Doesn’t leverage information across doses
NO information about what is happening between doses
Some inefficiencies for fixed designs:
Typically requires high sample sizes per dose group
Feasibility limits number of doses explored
May identify if dose response exists BUT
Provides limited information on dose-response
MULTIPLE COMPARISON (MC) PROCEDURES
Objectives
Establish a dose-response relationship (trend test)
Identify dose(s) effective relative to a control (pairwise comparisons)
Approaches
Extending the classical group sequential framework
Combination function approaches (foundation in meta-analysis)
Types of adaptations:
Early termination of inferior dose(s)
Add dose(s)
Sample size reassessment for future stages
Early stopping for futility or efficacy
Seamless shift across development phases
MC FRAMEWORK FOR ADAPTIVE METHODS
Stallard & Todd (2003)
Extended classical group sequential designs to multiple treatment arms
Identify best treatment based on a maximum standardized test statistic
GROUP SEQUENTIAL DESIGNS
Approach
Trial analyzed in a series of independent stages
Very flexible
– Do NOT have to define what you will adapt in advance
– Bayesian decision theoretical approach can be used (posterior probabilities of events)
– Do have to define a-priori how you will combine the test statistics from the stages to make inference
Controls family-wise type I error rate
Adjustments are needed for inference (Posch et al. 2005)
Multiplicity adjusted p-values for dose-control comparisons
Point estimates and CI adjusted for
– Early stopping
– Treatment selection
ADAPTIVE TREATMENT SELECTION BASED ON COMBINATION TESTS
Stage wise tests
Independent observations between stages
Let p1 , p2 be p-values from stages 1 and 2 respectively
A-priori define at minimum
Combination function
Stage wise and overall alpha levels
>18 different combination functions (Becker, 1994)
Commonly used functions
Fisher’s: C(p1, p2) = p1*p2
Inverse Normal: $C(p_1, p_2) = w_1 \Phi^{-1}(1 - p_1) + w_2 \Phi^{-1}(1 - p_2)$, with $w_1^2 + w_2^2 = 1$ ($\Phi$ = standard normal CDF)
THEORETICAL BACKGROUND
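Under the Null, $P_i \sim U(0,1)$ iid
Let $X_i = -2 \ln P_i$; then $X_i \sim \chi^2_{df=2}$ iid and $\sum_{i=1}^{n} X_i \sim \chi^2_{df=2n}$
To test the Global Null Hypothesis $H_0 = H_{01} \cap H_{02}$
Note: $H_{01} \cap H_{02}$ is assessed through $\Pr(P_1 P_2 \le c)$ for a critical value $c$
Hence compare $-2(\ln p_1 + \ln p_2)$ to the critical value $\chi^2_{df=4}(1 - \alpha)$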
COMBINING P-VALUES: FISHER’S METHOD
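As a small illustration of Fisher's method for two stages, the sketch below uses the closed-form survival function of a chi-square distribution with 4 degrees of freedom, exp(-x/2)(1 + x/2), so no statistics library is needed. The function names are illustrative only.

```python
import math


def chi2_4df_sf(x):
    """Upper tail probability of a chi-square variable with 4 degrees of freedom."""
    return math.exp(-x / 2.0) * (1.0 + x / 2.0)


def fisher_combination_pvalue(p1, p2):
    """Combined p-value for the global null from two independent stage p-values."""
    statistic = -2.0 * (math.log(p1) + math.log(p2))
    return chi2_4df_sf(statistic)


# Equivalent form in terms of the product p1 * p2:
#   p = p1 * p2 * (1 - ln(p1 * p2))
combined = fisher_combination_pvalue(0.05, 0.05)
```

Rejecting when the statistic exceeds the 95th percentile of the chi-square distribution with 4 df (about 9.49) is the same as rejecting when the product p1*p2 falls below a fixed constant.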
Note P may only be approximately uniform [0,1] under the Null:
IF individual hypotheses are composite, or if responses are discrete
Jennison & Turnbull (2005); Robins, et al. (2000)
Two-stage procedure Bauer & Kieser (1999):
Weaker condition:
Distribution of P1 and conditional distribution of P2|P1
stochastically larger than or equal to the uniform distribution on [0,1]
ON DISTRIBUTION OF P-VALUE
Jennison & Turnbull (2005)
Inverse Normal: $C(p_1, p_2) = w_1 \Phi^{-1}(1 - p_1) + w_2 \Phi^{-1}(1 - p_2)$ ($\Phi$ = standard normal CDF)
Historically
$(Z_1 + \dots + Z_K)/\sqrt{K} \sim N(0,1)$, where $Z_k = \Phi^{-1}(1 - p_k) \sim N(0,1)$
Mosteller & Bush (1954): generalization based on fixed weights
– If $\sum_{k=1}^{K} w_k^2 = 1$
– Then $\sum_{k=1}^{K} w_k Z_k \sim N(0,1)$
Interpretation concern
– Weighting patient information unequally based on stage
INVERSE NORMAL FUNCTION
Let $w_k = \sqrt{n_k / N}$, so that $w_k^2 = n_k / N$ ($n_k$ = sample size of stage $k$, $N$ = total sample size)
Then the z-statistic for pooled data equals: $Z = \sum_{k=1}^{K} w_k Z_k$
Combination test statistic equals the z-statistic for pooled data
– Invariant to partitioning of the data
– A function of the sufficient statistic (efficient)
– Sample size of stages must be fixed
Note: In general, the number of stages & weights can be adapted for K > 2
Fisher (1998) “variance spending”
– Spend $w_k^2$ of the variance of the Z statistic at stage $k$: study ends when the sum is 1
INVERSE NORMAL FUNCTION (cont.)
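The claim that the inverse normal combination with weights sqrt(n_k / N) reproduces the pooled z-statistic can be checked numerically. This is a sketch for a one-sample z-test with known standard deviation; the data values and function names are made up for illustration.

```python
import math


def stage_z(values, sigma):
    """One-sample z-statistic for H0: mean = 0 with known sigma."""
    n = len(values)
    return math.sqrt(n) * (sum(values) / n) / sigma


def inverse_normal_z(stages, sigma):
    """Combine stagewise z-statistics with weights w_k = sqrt(n_k / N)."""
    total = sum(len(stage) for stage in stages)
    return sum(
        math.sqrt(len(stage) / total) * stage_z(stage, sigma) for stage in stages
    )


def pooled_z(stages, sigma):
    """z-statistic computed on all observations pooled together."""
    pooled = [value for stage in stages for value in stage]
    return stage_z(pooled, sigma)


# Hypothetical two-stage data, sigma assumed known and equal to 1.
stages = [[0.4, 1.1, -0.3, 0.9, 0.6], [0.2, 0.8, 1.3]]
z_combined = inverse_normal_z(stages, sigma=1.0)
z_pooled = pooled_z(stages, sigma=1.0)
```

The identity holds only when the weights match the realized stage sizes, which is why the stage sample sizes must be fixed in advance for the combination test to coincide with the pooled analysis.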
Bauer & Köhne (1994)
The pre-specified combination test needs to be followed
Properties only hold if followed
e.g. Cannot decide to treat Stage 1 as internal pilot (even if no adaptation is made for Stage 2)
“Protocol has to describe which types of adaptation are intended.”
Conclusions depend on types of adaptations
Ad-hoc adaptations (even if family-wise Type I error preserved) can make interpretation difficult
Estimates may be biased or intractable
POTENTIAL ABUSES
Combines
Closed testing procedure
A multiplicity adjustment procedure
– e.g. Bonferroni-Holm, min P, Simes
A combination test procedure
– Fisher’s, Inverse Normal
Same approach can be used for seamless designs
Stages can span Phase II/III
APPLICATION TO DOSE FINDING
Following example taken from PhRMA Adaptive Design Working Group training presentation
Titled: Adaptive Seamless Designs for Phase IIb/III Clinical Trials
Author: Jeff Maca, Ph.D., Novartis
Full set of this and other training slides can be found at the following open access WEB site:
http://biopharmnet.com/doc/doc12004.html
AN EXAMPLE
Closed test procedure
• n null hypotheses H1, …, Hn
• Closed test procedure considers all intersection hypotheses.
• Hi is rejected at global level α if all hypotheses HI formed by intersection with Hi are rejected at local level α
H1 can only be rejected at α = .05 if H12 is also rejected at α = .05
Source: Jeff Maca
CLOSED TESTING
• A typical study with 3 doses → 3 pairwise hypotheses.
• Multiplicity can be handled by adjusting p-values from each stage using Simes procedure
$q_S = \min_{i = 1, \dots, |S|} \frac{|S|}{i}\, p_{(i)}$
|S| is the number of elements in hypothesis S, p(i) are the ordered p-values
Source: Jeff Maca
CLOSED TESTING (cont.)
Stage sample sizes: n1 = 75, n2 =75
Unadjusted pairwise p-values from the first stage:
p1,1= 0.23, p1,2 = 0.18, p1,3 = 0.08
Dose 3 selected at interim
Unadjusted p-value from second stage: p2,3 = .01
Source: Jeff Maca
SCENARIO: DOSE FINDING 3 DOSES & CONTROL
q1,123 = min( 3*.08, 1.5*.18, 1*.23) = .23
q2,123 = p2,3 = .01
C(q1,123, q2,123) = 2.17 P value = .015
Source: Jeff Maca
THREE-WAY TEST
q1,13 = min( 2*.08, 1*. 23)= .16
q1,23 = min(2*.08,1*.18) = .16
q2,13 = q2,23 = p2,3 = .01
C(q1,13, q2,13) = C(q1,23, q2,23) = 2.35 P.value = .0094
Source: Jeff Maca
TWO-WAY TEST
q1,3 = p1,3 = .08
q2,3 = p2,3 = .01
C(q1,3, q2,3) = 2.64 P.value = .0042
Conclusion: Dose 3 is effective
Source: Jeff Maca
FINAL TEST
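The three-way, two-way, and final tests above can be reproduced with a short script. This is a sketch that assumes the inverse normal combination with equal weights sqrt(1/2) (from n1 = n2 = 75) and Simes-adjusted stagewise p-values; the function names and the level alpha = 0.05 are illustrative choices, not part of the original slides.

```python
from itertools import combinations
from statistics import NormalDist

STD_NORMAL = NormalDist()


def simes(pvalues):
    """Simes-adjusted p-value for an intersection hypothesis."""
    ordered = sorted(pvalues)
    m = len(ordered)
    return min(m / rank * p for rank, p in enumerate(ordered, start=1))


def inverse_normal(q1, q2, w1, w2):
    """Return the combination statistic and its one-sided p-value."""
    z = w1 * STD_NORMAL.inv_cdf(1 - q1) + w2 * STD_NORMAL.inv_cdf(1 - q2)
    return z, 1 - STD_NORMAL.cdf(z)


stage1 = {1: 0.23, 2: 0.18, 3: 0.08}  # unadjusted stage 1 p-values by dose
stage2 = {3: 0.01}                    # only dose 3 continues to stage 2
n1, n2 = 75, 75
w1 = (n1 / (n1 + n2)) ** 0.5
w2 = (n2 / (n1 + n2)) ** 0.5
selected = 3
alpha = 0.05  # assumed level for this illustration

results = {}
for size in (3, 2, 1):
    for subset in combinations(stage1, size):
        if selected not in subset:
            continue  # only intersections containing the selected dose matter
        q1 = simes([stage1[d] for d in subset])
        q2 = simes([stage2[d] for d in subset if d in stage2])
        results[subset] = inverse_normal(q1, q2, w1, w2)

# Closed testing: dose 3 is declared effective only if every intersection
# hypothesis that contains it is rejected.
dose3_effective = all(p <= alpha for _, p in results.values())
```

The dictionary `results` holds one (statistic, p-value) pair per intersection hypothesis, keyed by the doses it contains.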
Assess design options/power via simulations
Power is a function of unknown dose response
In two stage approach with fixed sample sizes, inverse normal combination function is efficient
Model based approaches may be more efficient (but also more complex)
Resulting estimates can be biased
Recommend assessing via simulation
Last resort, use the last stage for estimation purposes
DO NOT ABUSE
Follow required pre-specified rules
Describe possible adaptations in protocol
RECOMMENDATIONS
ASSUMES a functional relationship between the dose and response
Parametric & Non-parametric model-based approaches
Estimates, such as ED95, inferred from the model
Potential inefficiencies with fixed dose design
Provides limited information on dose-response
– Same number of patients assigned to each dose
Often high likelihood doses selected a-priori are not optimal
Unlikely to identify at predetermined levels of precision, e.g., MED, ED95
MODEL BASED APPROACHES
Objectives
Estimate dose-response
Identify optimal (target) dose(s)
Modeling components
Parametric or non-parametric
Prior distributions on model parameters
Decision making components
Dose allocation
Stopping rules
Highly flexible
BAYESIAN APPROACH
Objectives:
Identify target dose (ED95)
Estimate dose response
Modeling Component:
NDLM
Decision Making Components:
Dose Allocation Rule
Model-based optimization criteria
Stopping Rules
Decision analytic
EXAMPLE: ASTIN (Krams et al. 2003)
Objective: Allocate patients to maximize information about ED95
Maximize Utility Function
Minus the variance of the predicted mean response at the ED95
Includes uncertainty in ED95 dose & in the dose response
Function of future patient data
Determining next patient assignment
Calculate the expected utility for each possible dose assignment
– Expectation over the posterior predictive distribution for the data yet to be observed
– Ongoing patient data predicted from earlier data using a longitudinal model (that gets updated during the study)
– Assume next patient is last patient
Assign dose that is expected to result in the smallest variance
– Randomly across doses within 5% of optimal dose
DOSE ALLOCATION RULE
Function of the posterior mean and variance of ED95
Stop for efficacy:
Lower bound of the 80% credible interval for the change relative to placebo at the ED95 is > 2 points
Minimum of 250 evaluable patients
Stop for futility:
Upper bound of the 80% credible interval for the change relative to placebo at the ED95 is < 1 point
Minimum of 500 evaluable patients
Maximum sample size = 1300
STOPPING RULES
West and Harrison (1997): Bayesian Forecasting and Dynamic Models
A piece-wise linear model
Smoothed transitions in the dose-response slope across the doses
Does not restrict the shape of the dose response curve
Developed for analysis and forecasting of time series data
Other non-parametric models
Splines, Kernel Methods
NORMAL DYNAMIC LINEAR MODEL
Assumptions
Response at each dose normally distributed about a mean
Change in mean between adjacent doses can be predicted by a simple linear model
Variability decomposed into two components
Observational variability for the patient response about the mean for the given dose
System variability around the linear model that relates the adjacent means
NDLM (cont.)
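Let $R_{ik}$ be the ith patient response at dose k, and $D_k$ represent the kth dose with mean $\mu_k$
Observation Equation:
$R_{ik} \mid D_k = \mu_k + \varepsilon_{ik}$, where $\varepsilon_{ik} \sim N(0, \sigma^2)$
System Equations:
$\mu_k = \mu_{k-1} + \delta_{k-1} + \omega_k$, where $\omega_k \sim N(0, H\sigma^2)$
$\delta_k = \delta_{k-1} + \nu_k$, where $\nu_k \sim N(0, H\sigma^2)$
Priors placed on:
$\mu_i$ (mean at dose i), $H$ (smoothing parameter), $\delta$ (slope parameters), $\sigma^2$ (observational variance)
NDLM (cont.)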
Neuropathic Pain
Minimum Clinical Significance:
Average Daily Pain Score (ADPS)
Ranges (0 no pain, 10 worst pain)
1.5 difference from placebo change from baseline
Design PoC study to select future dose(s) for Phase III
12 fold dose range
Dose-response unknown…may be inverted-U shaped
Positive control desirable for assay sensitivity
Too costly to explore dose-range?
CONSIDER
[Figure: Response vs. Dose, with labels “Not Informative” and “Informative”]
FIXED DOSE DESIGN
• Pfizer: Smith, Jones, Morris, Grieve, Tan (2006)
Study periods: 1 wk lead-in, 4 wks double-blind treatment, 1 wk follow-up
Arms: 7 doses, positive control, placebo
Max n=35 per arm
Type I error < 5%
Power ~ 80%
ADAPTIVE PoC CASE STUDY
Decision Making Components:
Dose Allocation Rule
– Initiate all 9 arms (equal allocation)
Stopping Rules
– Actions
– 2 interim analyses
– Drop up to 2 non-efficacious arms at each look
– Stop the study early if all doses non-efficacious
– Don’t stop early for efficacy (gather more information)
– Decision rules
– Futility at dose Pr ( Effect at dose < 1.5 ) > 0.80
– Worth continuing if Pr (Effect at dose > 1.5) > 0.80
Modeling Component:
Normal Dynamic Linear Model
ADAPTIVE FEATURES
Trial stopped at first interim
Flat dose-response
Approximately $2M saved due to stopping early
Rough comparison to fixed design with pairwise comparison of each dose to placebo
Approximately 3-4 times larger
– No early stopping
– Controlling for multiple comparisons
– Type I error, 1-sided, 10%
RESULTS
Targets estimation of the overall dose-response
Formal optimality criterion (D-optimal)
Minimize determinant of the variance covariance matrix of the model parameter estimates (maximizes information)
Allocates patients (sequentially or group sequentially) to provide the most information
Typically keep allocation of placebo constant
Wide class of models are applicable
e.g. Four-parameter logistic model
D-OPTIMAL
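To make the D-optimality criterion concrete, the sketch below compares a few candidate allocation ratios by the determinant of the information matrix (maximizing it is equivalent to minimizing the determinant of the covariance matrix of the parameter estimates). As a simplifying assumption it uses a three-parameter Emax model rather than the four-parameter logistic, so the determinant can be written out by hand. The dose levels, parameter guesses, and candidate allocations are all hypothetical.

```python
def emax_gradient(dose, emax, ed50):
    """Partial derivatives of E0 + Emax*d/(ED50 + d) w.r.t. (E0, Emax, ED50)."""
    return (1.0, dose / (ed50 + dose), -emax * dose / (ed50 + dose) ** 2)


def information_matrix(doses, weights, emax, ed50):
    """Per-patient information matrix (unit variance) for an allocation."""
    info = [[0.0] * 3 for _ in range(3)]
    for dose, weight in zip(doses, weights):
        grad = emax_gradient(dose, emax, ed50)
        for r in range(3):
            for c in range(3):
                info[r][c] += weight * grad[r] * grad[c]
    return info


def det3(m):
    """Determinant of a 3x3 matrix by cofactor expansion."""
    return (
        m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
        - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
        + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0])
    )


def d_criterion(doses, weights, emax, ed50):
    return det3(information_matrix(doses, weights, emax, ed50))


doses = [0.0, 2.0, 4.0, 6.0, 8.0]
emax_guess, ed50_guess = 1.0, 2.0  # current (hypothetical) parameter estimates

candidates = {
    "equal": [0.2, 0.2, 0.2, 0.2, 0.2],
    "ends_heavy": [0.3, 0.15, 0.1, 0.15, 0.3],
    "low_heavy": [0.3, 0.3, 0.2, 0.1, 0.1],
    "two_doses_only": [0.5, 0.0, 0.0, 0.0, 0.5],
}
scores = {
    name: d_criterion(doses, weights, emax_guess, ed50_guess)
    for name, weights in candidates.items()
}
best = max(scores, key=scores.get)
```

An allocation that uses only two doses cannot identify three parameters, so its determinant is zero. In an adaptive design the parameter guesses would be refreshed after each cohort and the comparison repeated.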
One approach (ASRS WG white paper)
Allocate equally across doses for first cohort
Fit model
Based on this model, determine optimal allocation ratio for next cohort of patients to maximize information
Bayesian D-Optimal
Place a prior distribution on the model parameters
After each cohort, calculate the posterior distribution
Similarly, update allocation ratio to minimize determinant of the variance-covariance matrix of model parameters
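D-OPTIMAL (cont.)
4-PARAMETER LOGISTIC MODEL
$R_i = \theta_2 + \frac{\theta_1 - \theta_2}{1 + (D_i / \theta_3)^{\theta_4}} + \varepsilon_i$
$i$: Patient indicator
$R_i$: Patient response
$D_i$: Level of drug
$\theta_1$: Response at 0 drug
$\theta_2$: Max. attributable effect of drug + $\theta_1$
$\theta_3$: Dose producing response half way between $\theta_1$ and $\theta_2$
$\theta_4$: Related to steepness of slope
$\varepsilon_i$: Random error for patient i {often iid N(0,1)}
1-1 Comparison to Emax Model
$R_i = E_0 + \frac{E_{\max} D_i^{h}}{D_i^{h} + ED_{50}^{h}} + \varepsilon_i$, with Hill coefficient $h$
$\theta_4 < 0$: $\theta_2 = E_0$; $\theta_1 - \theta_2 = E_{\max}$; $\theta_3 = ED_{50}$; $-\theta_4$ = Hill Coef
$\theta_4 > 0$: $\theta_2 = E_0$; $\theta_1 - \theta_2 = E_{\max}$; $(\theta_3)^{-1} = ED_{50}$; $\theta_4$ = Hill Coef; $(D_i)^{-1}$ in place of $D_i$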
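The one-to-one correspondence between the two parameterizations can be checked numerically. The sketch assumes the four-parameter logistic is written as theta2 + (theta1 - theta2) / (1 + (D/theta3)^theta4) and the Emax model in its sigmoid (Hill) form; the parameter values are arbitrary test values.

```python
def four_parameter_logistic(dose, theta1, theta2, theta3, theta4):
    """Mean response under the assumed 4-parameter logistic form."""
    return theta2 + (theta1 - theta2) / (1.0 + (dose / theta3) ** theta4)


def sigmoid_emax(dose, e0, emax, ed50, hill):
    """Mean response under the sigmoid Emax model."""
    return e0 + emax * dose ** hill / (dose ** hill + ed50 ** hill)


# Strictly positive doses only: dose = 0 with a negative exponent is undefined.
doses = [0.5, 1.0, 2.0, 4.0, 8.0]
theta1, theta2, theta3 = 3.0, 1.0, 2.0

# Case theta4 < 0: E0 = theta2, Emax = theta1 - theta2, ED50 = theta3, Hill = -theta4.
max_gap_negative = max(
    abs(
        four_parameter_logistic(d, theta1, theta2, theta3, -1.5)
        - sigmoid_emax(d, theta2, theta1 - theta2, theta3, 1.5)
    )
    for d in doses
)

# Case theta4 > 0: same mapping, but on the reciprocal dose scale
# (1/D in place of D, ED50 = 1/theta3, Hill = theta4).
max_gap_positive = max(
    abs(
        four_parameter_logistic(d, theta1, theta2, theta3, 1.5)
        - sigmoid_emax(1.0 / d, theta2, theta1 - theta2, 1.0 / theta3, 1.5)
    )
    for d in doses
)
```

Both gaps are zero up to floating-point rounding, which is what the comparison table states.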
Requires monotonicity
Increasing or decreasing
Minimum of 5 doses desirable
4-parameter model
If highest dose < ED95
Estimates of Emax, ED50, and Hill Coefficient (gamma) impacted
– High coefficient of variation & bias
Fit in data range usually good
Bayesian approach
Strong priors might be assumed for Emax if highest dose thought to be less than ED95
4-PARAMETER LOGISTIC MODEL (cont.)
How many patients to assign per cohort
Doses to include
Sample size
Fixed
Information driven (select criteria for determinant)
Include stop for futility
Likelihood the trend test will not be statistically significant
Likelihood that effect of each dose is less than some threshold
DESIGN QUESTIONS TO EXPLORE THROUGH SIMULATION
Phase II dose ranging study
Schizophrenia
Objective
Confirm positive POC study
– 3 arm: High dose, Active (assay sensitivity), Placebo
Explore lower doses
Determine dose(s) for Phase III
Dose range 8 fold
4 doses
Primary Measure
PANSS total score at 6 weeks
EXAMPLE: ADAPTIVE DESIGN NOT RECOMMENDED
Subjective primary measure / 20 sites
Significant effect due to site
Desirable to stratify by site
Long term outcome relative to expected enrollment rate
No biomarker
Narrow dose range well covered by 4 doses
High dose effective, but may not be near Emax
STUDY CHARACTERISTICS
Fixed design with equal allocation
Adaptive allocation
Bayesian D-Optimal Criterion (4 parameter logistic model)
– Allocation adapts to increase efficiency of estimates of model parameters
– Target is the overall dose-response curve
Stopping Rules (4 interim analyses)
Stop for Futility
– If predicted mean difference high dose vs placebo > -5, with 95% confidence
Stop for Efficacy
– If predicted mean difference low dose vs placebo < 0, with 95% confidence
DESIGNS COMPARED
No compelling advantage to adaptive randomization over fixed allocation
Adaptive randomization favored ~ equal allocation
– Slightly more on placebo, slightly less on lower doses
Fixed design would be slightly more powerful for pairwise comparisons with unequal allocation (2:1:1:1:1) & would not affect dose-response estimation adversely
Perfect information was assumed in simulations for adaptive allocation
– Lag between patient outcome data & enrollment worsens performance
Use of parametric dose-response model (unknown dose-response)
Additional resources/complexity not warranted
RECOMMENDATION
Decision theoretic approach to choice of dose
Doses comparable on the utility index scale
Maximize utility
Quantify benefit risk / tradeoffs
Incorporate both efficacy & safety measures
Subjective
Requires development of subjective value functions
Requires development of subjective weights to define the importance of each measure to the decision
Functional form can be Additive or Multiplicative
Can be complex to interpret
UTILITIES
Change in CGI-Severity
Value
0.0
1.0
0.8 Value Function
Efficacy: CGI-Severity
EXAMPLE: VALUE FUNCTION
INDEX: .5*.25 + .8*.35 + (.3*.6 + .9*.4)*.4 = .621
Attributes (weight): Health Outcome (0.25), Efficacy (0.35), Safety (0.4)
Sub-attributes (weight within attribute, value): Weight Loss (1, 0.5), CGI-S (1, 0.8), AE (0.6, 0.3), QT (0.4, 0.9)
Note on Multiplicative Utility:
Similar to above (can incorporate weights directly into the value function)
Define value function to go to 0 value quickly if undesirable trait (e.g. safety concern)
EXAMPLE: ADDITIVE UTILITY INDEX
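The additive index above is a two-level weighted sum, which the following sketch reproduces. The weights and values are those of the example; the dictionary layout and function name are only one possible way to organize them.

```python
def additive_utility(attributes):
    """Weighted sum over attributes of the weighted sum over their sub-attributes."""
    total = 0.0
    for attribute in attributes.values():
        within = sum(
            weight * value for weight, value in attribute["sub_attributes"].values()
        )
        total += attribute["weight"] * within
    return total


# Each sub-attribute is stored as (weight within attribute, value).
attributes = {
    "health_outcome": {"weight": 0.25, "sub_attributes": {"weight_loss": (1.0, 0.5)}},
    "efficacy": {"weight": 0.35, "sub_attributes": {"cgi_s": (1.0, 0.8)}},
    "safety": {
        "weight": 0.40,
        "sub_attributes": {"ae": (0.6, 0.3), "qt": (0.4, 0.9)},
    },
}
index = additive_utility(attributes)
```

Computing the index for each dose with the same weights puts the doses on a common utility scale, so they can be ranked directly.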
Consider routinely assessing appropriateness of adaptive designs in exploratory development
Assess potential gains against those of standard fixed designs
Balance complexity with potential gains
Trial simulations typically needed
Fine tune design
Assess operating characteristics
Recommended even when ONLY considering a fixed trial design
Consider Seamless PoC/Phase 2 dose-response studies
Recommend model based approaches
More informative of dose response profile than MC
Critical to assess model assumptions
Non-parametric models less restrictive
CONCLUDING REMARKS
VI. Simulations Comparing Adaptive and Non-Adaptive DF Methods
Outline:
• Evaluating statistical operational characteristics of complex DF designs and methods
• Performance metrics and graphical displays
• Comparing DF designs and methods: PhRMA’s Adaptive Dose Ranging Studies working group simulation study
• Conclusions from ADRS WG simulations
MOTIVATION
Evaluation of operational characteristics (OCs) of proposed statistical methods is a critical step in designing a clinical trial – comparison of methods
The OCs include the power to detect signals of interest, the precision of estimates for quantities of interest, expected duration, etc. → in particular, used to determine sample size and number of arms
Complexity of adaptive dose finding designs and other non-traditional dose finding methods → typically no closed form expressions for OCs metrics
Simulation-based evaluation needs to be employed
KEY GOALS OF DF TRIALS
Typical goals of Phase II trials:
Determine evidence of dose response (DR) signal, i.e., if average response changes with dose level – proof-of-concept (PoC)
Select target dose(s) for confirmatory phase – typically MED; other targets also used (e.g., maximum useful dose)
Estimate DR profile – usually for efficacy, but safety of increasing interest
These goals determine the design of the study and the operational characteristics that need to be evaluated
SIMULATING DF TRIALS
Trial simulation is the main tool for evaluating study OCs; it needs to properly incorporate multiple factors in study:
Type: parallel groups, cross-over, titration, etc
Available doses, inclusion of active control(s)
Dose allocation scheme (fixed vs. adaptive)
If adaptive, frequency and timing of adaptations (and algorithm for recalculating allocation ratios)
Dose response profile(s):
more than one should be used to assess sensitivity
flat dose response should be included to assess Type I error and impact on dose selection
SIMULATING DF TRIALS (CONT.)
Response variables: type (e.g., continuous, binary, count, ordinal); distribution (e.g., normal, Poisson)
Possible covariates and their role in DR model
Longitudinal measurements per patient (when, how many)
Variance and covariance parameters (e.g., within- and between patient variances, between-site variances)
Sample size (e.g., expected, maximum)
Drop-out and missing data models (e.g., time to drop-out)
Patient accrual process (e.g., rates, uniformity over time)
Stopping rules, if any (e.g., futility, efficacy)
SIMULATING DF TRIALS (CONT.)
Number of simulations (need to take into account desired precision for OCs estimates)
Statistical analysis methods:
testing for DR signal
selecting target dose(s) – may need target clinical effect
estimating DR profile
Sensitivity analysis: impact of changes in assumed parameters/models/design on OCs (highly recommended)
Choice of software: general purpose (e.g., Trial Simulator) vs. customized (e.g., R or S-PLUS suite of functions)
PERFORMANCE METRICS
Used to quantify performance of different designs and methods with regard to key study goals
1. Detecting DR
Pr(DR) = probability of identifying DR (usual power for sample size calculations in Phase II trials)
estimate = % simul. trials for which DR was detected
2. Dose selection
Pr(dose) = probability of selecting a dose at end of trial
- dose selection also based on clinical relevance of effect
- Pr(dose) ≤ Pr(DR), typically different
estimate = % simul. trials for which dose was selected
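A minimal sketch of how Pr(DR) can be estimated by simulation. It assumes a parallel-group trial with equally spaced doses, a normally distributed endpoint with known standard deviation, and a single linear trend contrast tested one-sided at 5%. The means, sigma, and sample size are hypothetical, and arm means are drawn directly from their sampling distribution instead of simulating individual patients.

```python
import math
import random


def simulate_pr_dr(true_means, n_per_arm, sigma, n_sims, seed=1, z_crit=1.645):
    """Proportion of simulated trials in which the trend test detects DR."""
    rng = random.Random(seed)
    k = len(true_means)
    center = (k - 1) / 2
    contrast = [i - center for i in range(k)]  # linear trend, sums to zero
    se = sigma * math.sqrt(sum(c * c for c in contrast) / n_per_arm)
    detected = 0
    for _ in range(n_sims):
        arm_means = [
            rng.gauss(mu, sigma / math.sqrt(n_per_arm)) for mu in true_means
        ]
        z = sum(c * m for c, m in zip(contrast, arm_means)) / se
        detected += z > z_crit
    return detected / n_sims


flat = [0.0, 0.0, 0.0, 0.0, 0.0]        # flat DR: estimates Type I error
linear = [0.0, 0.25, 0.5, 0.75, 1.0]    # hypothetical active DR profile

type_one_error = simulate_pr_dr(flat, n_per_arm=30, sigma=2.0, n_sims=2000)
power = simulate_pr_dr(linear, n_per_arm=30, sigma=2.0, n_sims=2000)
```

The number of simulated trials controls the precision of both estimates: with 2000 trials the standard error of the Type I error estimate is roughly 0.005.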
DOSE SELECTION METRICS
Let $d_{targ}$ and $\hat{d}_{targ}$ represent the target dose and its estimate
Bias = $E(\hat{d}_{targ} - d_{targ})$
pBias = % Bias = $100\, E(\hat{d}_{targ} - d_{targ}) / d_{targ}$
pError = % Error = $100\, E|\hat{d}_{targ} - d_{targ}| / d_{targ}$
Expected value E(.) estimated by the simulation averages
For methods based on hypothesis testing (e.g., Dunnett), $\hat{d}_{targ}$ is typically one of the doses in the study
For model-based methods, it takes values on a continuous scale (typically within the dose range for the trial)
Can also define Bias, pBias, and pError for target effect and the effect associated with $\hat{d}_{targ}$
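These dose selection metrics reduce to simple averages over the simulated trials. The sketch below assumes pError is the relative absolute error and takes a list of estimated target doses, one per simulated trial in which a dose was selected; the numbers are made up.

```python
def dose_selection_metrics(estimates, target):
    """Bias, % bias and % absolute error of target dose estimates."""
    n = len(estimates)
    bias = sum(d - target for d in estimates) / n
    abs_error = sum(abs(d - target) for d in estimates) / n
    return {
        "bias": bias,
        "p_bias": 100.0 * bias / target,
        "p_error": 100.0 * abs_error / target,
    }


# Hypothetical estimates from four simulated trials, true target dose = 5.
metrics = dose_selection_metrics([4.0, 6.0, 6.0, 8.0], target=5.0)
```

Bias can be close to zero while pError is large when over- and under-estimates cancel, so the two are usually reported together.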
TARGET DOSE INTERVAL
Doses with an effect within ± 100p% of target effect
Example for p = 0.1 (i.e., ± 10% of target effect)
target dose interval
DOSE RESPONSE METRICS
For pre-defined grid of doses d1, d2, …, dK in range of interest, let $\mu_1, \mu_2, \dots, \mu_K$ be the corresponding expected responses and $\hat{\mu}_1, \hat{\mu}_2, \dots, \hat{\mu}_K$ the estimated responses
APE: average prediction error = $E\left(\sum_{i=1}^{K} |\hat{\mu}_i - \mu_i|\right) / K$
pAPE: % APE wrt target effect = $100\,\mathrm{APE}/\Delta$
PEQ(q): prediction error quantile of order q = $Q_q(|\hat{\mu}_i - \mu_i|)$, e.g., median prediction error
expected values estimated by simulation means, quantiles estimated by simulation quantiles
GRAPHICAL DISPLAYS
importance of conveying relevant info via plots
histograms
dotplots
barplots
Sample DR curves
Trellis plots to combine information
GRAPHICAL DISPLAYS
Conveying relevant information in concise and efficient way is a critical step in simulation report
Graphical displays are well-suited for this purpose, but must be chosen appropriately
Because simulations usually include many different combinations of scenarios (e.g., sample size, number of doses) and methods, Trellis displays are particularly useful in presenting information
Will describe and illustrate various efficient graphical displays in the context of PhRMA WG simulations
PhRMA ADRS WG
Adaptive Dose Ranging Studies (ADRS) working group (WG): one of 10 Pharmaceutical Innovation Steering Committee (PISC) WGs
Formed as result of BCG survey to identify key drivers of poor performance in pharmaceutical industry → poor understanding of DR indicated as one of leading causes for high attrition in late development.
Close collaboration with Novel Adaptive Designs PISC WG
ADRS WG: TEAM MEMBERS
• Alex Dmitrienko, Eli Lilly
• Amit Roy, BMS
• Beat Neuenschwander, Novartis
• Björn Bornkamp, U. Dortmund
• Brenda Gaydos, Eli Lilly
• Chyi-Hung Hsu, Novartis
• Frank Bretz, Novartis
• Frank Shen, BMS
• Franz König, U. Vienna
• Greg Enas, Eli Lilly
• José Pinheiro, Novartis
• Michael Krams, Wyeth
• Qing Liu, J&J
• Rick Sax, AstraZeneca
• Tom Parke, Tessella
ADRS WG: GOALS AND SCOPE
Investigate and develop designs and methods for efficiently learning about efficacy and safety DR profiles → benefit/risk profile
Evaluate statistical OCs of alternative designs and methods (adaptive and fixed) to make recommendations on their use
Increase awareness about ADRS, promoting their use, when advantageous
Comprehensive simulation study comparing ADRS to other DF methods, quantifying potential gains
SUMMARY OF DESIGN AND ASSUMPTIONS
Proof-of-concept + dose-finding trial, motivated by neuropathic pain indication (conclusions and recommendations can be generalized)
Key questions: Whether there is evidence of dose response and, if so, which dose level to bring to confirmatory phase and how well dose response (DR) curve is estimated.
Primary endpoint: Change from baseline in VAS at Week 6 (continuous, normally distributed)
Dose design scenarios (parallel arms):
- 5 equally spaced dose levels: 0, 2, 4, 6, 8
- 7 unequally spaced dose levels: 0, 2, 3, 4, 5, 6, 8
- 9 equally spaced dose levels: 0, 1, …, 8
Significance level: one sided FWER = 0.05
Sample sizes: 150 and 250 patients (total)
DOSE RESPONSE PROFILES
DF METHODS USED IN SIMULATIONS
Traditional ANOVA based on pairwise comparisons and multiplicity adjustment (Dunnett)
MCP-Mod: combination of multiple comparison procedure (MCP) and modeling (Bretz, Pinheiro, and Branson, 2005)
MTT: novel method based on Multiple Trend Tests
Bayesian Model Averaging: BMA
Nonparametric local regression fitting: LOCFIT
GADA: Dynamic dose allocation based on Bayesian normal dynamic linear model (Krams, Lee, and Berry, 2005)
D-opt: adaptive dose allocation based on D-optimality criterion
TARGET DOSE INTERVALS
Target clinical effect: Δ = -1.3 units (reduction in VAS)
SELECTED SIMULATION RESULTS
More detailed results given in the ADRS WG’s White Paper, available at http://biopharmnet.com/doc/doc12005.html
POWER TO IDENTIFY DR
DOSE SELECTION UNDER FLAT DR
DOSE SELECTION UNDER ACTIVE DR
CORRECT TARGET DOSE INTERVAL
DOSES SELECTED – LOGISTIC, N=150
DOSES SELECTED – UMBRELLA, N=150
% AVG. PREDICTION ERROR, N = 150
SAMPLE PRED. DR – LOGISTIC, N=150
SAMPLE PRED. DR – UMBRELLA, N=150
ADRS WG CONCLUSIONS
Detecting DR is considerably easier than estimating it
Current sample sizes for DF studies, based on power to detect DR, are inappropriate for dose selection and DR estimation
None of the methods had good performance in estimating dose in the correct target interval: maximum observed percentage of correct interval selection – 60% → larger N needed
Adaptive dose-ranging methods (i.e., ADRS) lead to gains in power to detect DR, precision to select target dose, and to estimate DR – greatest potential in the latter two
ADRS WG CONCLUSIONS
Model-based methods have superior performance compared to methods based on hypothesis testing
Increasing the number of doses beyond 5 does not seem to produce significant gains (provided overall N is fixed): there is a trade-off between more detail about the DR and less precision at each dose
In practice, need to balance gains associated with adaptive dose ranging designs against greater methodological and operational complexity
VII. Overall Conclusions and Recommendations
CONCLUSIONS & RECOMMENDATIONS
Adaptive, model-based dose finding designs should be routinely considered for use in drug development (Early Development, PoC, Dose Ranging); they can lead to substantial gains in efficiency over traditional methods
Dose assignment algorithm should be prospectively and clearly specified in study protocol
Trial simulations should be used to fully evaluate operational characteristics of design prior to study start
Seamless approaches should be considered to improve efficiency, especially between PoC (Ph. I/IIa) and dose ranging (Ph. IIb)
CONCL. & RECOMMENDATIONS (CONT.)
Sample size calculations for adaptive DF designs should take into account the precision of target dose estimates and, more broadly, the accuracy of the decision(s) to be made from the study
Early stopping rules, for efficacy and safety, should be implemented, when feasible, to allow greater efficiency gains in adaptive designs
Potential gains associated with adaptive approaches should always be weighed against the additional complexity and costs of their implementation; they are not a panacea
CONCL. & RECOMMENDATIONS (CONT.)
Greater usage of these adaptive DF designs should be encouraged and will require:
• Good-quality software packages with well-documented code and examples for implementing approaches and conducting the simulations needed to evaluate operating characteristics of these methods.
• A greater understanding of the strengths and weaknesses of these approaches (hopefully, this course has helped in this regard)
• More published examples of studies that have utilized these methods.
Practical Considerations
PRACTICAL CONSIDERATIONS
Assessing projects for an adaptive design:
• Rapid acquisition of data relative to enrollment rate
- Outcome is more immediately (and accurately) observable
- Trials of longer duration, with relatively slow recruitment, can be good candidates
• Existence of predictive biological models and/or prior information in patient population of interest
- Predictive models for longer term outcome
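The point about data acquisition relative to enrollment can be checked with simple arithmetic. The sketch below counts how many patients are enrolled and how many already have an observed endpoint at a given calendar week. The 6-week endpoint delay matches the Week 6 endpoint used in the simulation study; the constant enrollment rates and function names are hypothetical.

```python
def patients_with_outcome(week, enroll_per_week, total_n, endpoint_delay_weeks=6):
    """Return (enrolled, completed) counts at a given calendar week.

    Assumes a constant enrollment rate and no dropout (both simplifications).
    """
    enrolled = min(total_n, int(enroll_per_week * week))
    completed = min(total_n, max(0, int(enroll_per_week * (week - endpoint_delay_weeks))))
    return enrolled, completed


def fraction_enrolled_before_first_outcome(enroll_per_week, total_n, endpoint_delay_weeks=6):
    """Share of patients allocated before any endpoint data can inform adaptation."""
    return min(total_n, enroll_per_week * endpoint_delay_weeks) / total_n


if __name__ == "__main__":
    for rate in (5, 25):  # hypothetical patients per week
        print(rate, patients_with_outcome(12, rate, 150),
              fraction_enrolled_before_first_outcome(rate, 150))
```

With 150 patients and 5 patients per week, 20% are enrolled before the first outcome arrives. At 25 per week, every patient is enrolled before any outcome is available, leaving nothing to adapt.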
PRACTICAL CONSIDERATIONS
Assessing projects for an adaptive design (cont.):
• Ethical considerations as driver for adaptation
• A high exploratory aspect may indicate greater efficiency gains
- Uncertainty relative to e.g., dose, variability, effect size
- Wide dose range
• Caution: Assuming the patient population remains constant over time
- Trials of long duration
- Selection bias due to unblinding of information
PRACTICAL CONSIDERATIONS
Considerations for simulations:
• Leverage information from disease state and exposure-response models
- Selecting dose-response model
- Defining prior distributions for model parameters
- Development of adaptive algorithm (decision criteria)
- Trial simulations to assess design performance
• Optimize over PD response model (best guess of truth)
• Assess sensitivity to using other response models (include models different from the design dose-response model)
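One way to set up such a sensitivity check is to define several candidate dose-response shapes and see how the true target dose moves between them. In the sketch below the profile names echo shapes mentioned in the course (logistic, umbrella), but every parameter value is a hypothetical placeholder, and the grid search in `target_dose` is just one simple way to locate the target.

```python
import math

TARGET_EFFECT = -1.3   # target change vs placebo, from the simulation study slides
MAX_DOSE = 8

# Hypothetical mean-response profiles on the dose range 0 to 8 (all values assumed).
PROFILES = {
    "linear": lambda d: -0.2 * d,
    "emax": lambda d: -1.8 * d / (1.5 + d),
    "logistic": lambda d: -1.7 / (1.0 + math.exp((4.0 - d) / 0.8)),
    "umbrella": lambda d: -(0.6 * d - 0.05 * d * d),
}


def target_dose(profile, target=TARGET_EFFECT, max_dose=MAX_DOSE, step=0.01):
    """Smallest grid dose whose effect versus placebo reaches the target, else None."""
    placebo = profile(0.0)
    n_steps = int(round(max_dose / step))
    for i in range(n_steps + 1):
        dose = i * step
        if profile(dose) - placebo <= target:
            return dose
    return None


if __name__ == "__main__":
    for name, profile in PROFILES.items():
        print(name, target_dose(profile))
```

A design tuned to the best-guess profile can then be re-simulated under each alternative shape to see how much its performance degrades when the assumed model is wrong.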
PRACTICAL CONSIDERATIONS
Considerations for simulations (cont.):
• Understand impact of enrollment rate
- Include different rates in simulations
- Consider controlling rate if simulations indicate gains
• Assess impact across different dropout models
• Include information lag (e.g., batches) in simulations
• Demonstrate control of Type I error rate
- Simulate over a grid of scenarios in the null space
- Simulate across various dropout & enrollment rate models
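A minimal sketch of this kind of Type I error check is shown below. It simulates a flat dose-response (the null) and records how often a one-sided linear trend test rejects across a small grid of dropout rates. The test statistic, the completely random dropout mechanism, and the residual standard deviation are simplifying assumptions, not the methods compared in the course. A real adaptive design would be simulated with its own decision rules.

```python
import math
import random

DOSES = [0, 2, 4, 6, 8]
Z_CRIT = 1.645   # one-sided 5% critical value of the standard normal


def trend_test_rejects(data):
    """One-sided test for a negative linear slope (normal approximation)."""
    n = len(data)
    mean_x = sum(x for x, _ in data) / n
    mean_y = sum(y for _, y in data) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in data)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in data)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    rss = sum((y - intercept - slope * x) ** 2 for x, y in data)
    se = math.sqrt(rss / (n - 2) / sxx)
    return slope / se < -Z_CRIT


def type_one_error(total_n, dropout_rate, sigma=2.0, n_sims=1000, seed=7):
    """Rejection rate under a flat dose-response with completely random dropout."""
    rng = random.Random(seed)
    per_arm = total_n // len(DOSES)
    rejections = 0
    for _ in range(n_sims):
        data = [(dose, rng.gauss(0.0, sigma))
                for dose in DOSES
                for _ in range(per_arm)
                if rng.random() >= dropout_rate]   # keep completers only
        rejections += trend_test_rejects(data)
    return rejections / n_sims


if __name__ == "__main__":
    for dropout in (0.0, 0.1, 0.3):   # hypothetical grid of dropout rates
        print(dropout, type_one_error(150, dropout))
```

Rejection rates close to 0.05 across the grid would support error control for this simple test. Rates that drift upward under some scenario would flag a problem to resolve before the protocol is finalized.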
PRACTICAL CONSIDERATIONS
Practical Issues for implementation:
• Additional time to develop the design & protocol
- May need to run extensive simulations to understand operating characteristics
- Communicate with the primary investigators about the design, receive feedback, and address concerns
• Clinical trial material needs
- Dosage strengths, quantity, packaging
• Additional resources for modeling & data analysis
- Interim data preparations & analyses
- Final analysis more complex
PRACTICAL CONSIDERATIONS
Practical Issues for implementation (cont.):
• Increase in site communication
- Design changes / Patient treatment assignments
- Fax, Interactive Voice Response Systems, WEB interface
- Additional site training
• Determine type of committee needed to monitor trial
- Ensure protocol is followed (no programming errors)
- Unanticipated safety signals not accounted for in the adaptive algorithm
- Engage committee early in scenario simulations (prior to protocol approval)
PRACTICAL CONSIDERATIONS
Practical Issues for implementation (cont.):
• Determine what data will be needed
• How will data be collected?
- Electronic data capture, Expedited report forms (not monitored), Voice Response system, Excel Spreadsheet
- eDC systems not friendly for interim data extraction
• How clean do the data need to be?
- Fully verified (locked) data is not typical
- Use latest data in modeling/analysis (continually clean data)
• Document, Document, Document!!!