Assessment of Audit Methodologies for Bias Evaluation of Tumor Progression in Oncology Clinical Trials (Running title: Evaluation of Audit Methodologies)
Jenny J Zhang Ph.D.1, Lijun Zhang Ph.D.1, Huanyu Chen Ph.D.1, Anthony J Murgo M.D.3, Lori E Dodd Ph.D.2, Richard Pazdur M.D.3, Rajeshwari Sridhara Ph.D.1*
1U.S. Food and Drug Administration, CDER/OTS/OB/DBV, Silver Spring, MD 20993
2Biostatistics Research Branch, National Institute of Allergy and Infectious Disease, NIH, Bethesda, MD 20892
3U.S. Food and Drug Administration, CDER/OND/OHOP, Silver Spring, MD 20993
*Corresponding author: [email protected]; phone: (301) 796-1759; fax: (301) 796-9733
Disclaimer: This article reflects the views of the authors and should not be construed to
represent the FDA’s views or policies.
Conflicts of interest: None.
on July 4, 2018. © 2013 American Association for Cancer Research.clincancerres.aacrjournals.org Downloaded from
Author manuscripts have been peer reviewed and accepted for publication but have not yet been edited. Author Manuscript Published OnlineFirst on March 26, 2013; DOI: 10.1158/1078-0432.CCR-12-3364
Abstract
As progression-free survival (PFS) has become increasingly used as the primary endpoint
in oncology phase 3 trials, the Food and Drug Administration (FDA) has generally
required a complete-case blinded independent central review (BICR) of PFS to assess and
reduce potential bias in the investigator or local site evaluation (LE). However, recent
publications and FDA analyses have shown a high correlation between LE and BICR
assessments of the PFS treatment effect, which questions whether complete-case BICR is
necessary. One potential alternative is to use BICR as an audit tool to detect evaluation
bias in the LE. In this paper, the performance characteristics of two audit methods
proposed in the literature are evaluated on 26 prospective, randomized phase 3
registration trials in non-hematologic malignancies. The results support that a BICR
audit to assess potential bias in the LE is a feasible approach. However, implementation
and logistical challenges need further consideration and discussion.
Keywords
Blinded independent central review (BICR); Progression-free survival (PFS); Audit
methodology; Bias
Introduction
Progression-free survival (PFS) is defined as the time from randomization to either
disease progression or death, whichever occurs first. When PFS is the primary efficacy
endpoint of a clinical trial, the Food and Drug Administration (FDA) and other regulatory
authorities have generally required a blinded independent central review (BICR) under
the assumption that the investigator or local site evaluation (LE) could potentially be
biased due to the subjectivity in the measurement and interpretation of PFS. However,
this approach may lead to a greater than 30% disagreement at the patient-level between
the BICR and LE assessments and/or among the independent reviewers themselves.
These disagreements have been attributed to a variety of reasons, including selection of
different target lesions by the reviewers (1). In addition, since treatment is generally
changed after LE-determined progression resulting in no further protocol-specified
progression assessments before the BICR is conducted, missing data and informative
censoring are limitations of BICR-determined PFS analyses that may result in biased
treatment effect estimates (2). Informative censoring results when patients declared to
have progressive disease by the LE are censored by the BICR due to lack of further tumor
assessments after LE-determined progression. Note that LE assessments may include
local radiographic reads and/or other clinical assessments. BICR radiologists are not part
of the clinical trial investigation, typically do not have information about clinical
assessments, and are blinded to treatment assignment.
The role of BICR was first examined by Dodd and colleagues in 2008 in six phase 3
oncology trials (2), where differences between LE and BICR did not result in different
conclusions about treatment efficacy, despite relatively high discrepancy rates at the
patient-level. The Pharmaceutical Research and Manufacturers of America (PhRMA)
PFS Working Group performed a meta-analysis of 27 trials, which showed that while
discrepancies in determining the progression dates were observed on average in 50% of
patients, the relative treatment effect as measured by a hazard ratio (HR) was similar
when assessed by either LE or BICR (3). To further confirm these results, the FDA
conducted a meta-analysis of 28 randomized, phase 3 trials submitted for review in
consideration of approval across 9 non-hematologic malignant tumor types with BICR-
and LE-assessed PFS results reported (4). Note that there is some overlap of the trials
included in the FDA analysis with that of the analyses in Dodd et al. 2008 (2) and Amit et
al. 2011 (3).
The FDA analysis showed that there existed a high degree of association between LE and
BICR estimates of the PFS treatment effect (r = 0.954 [95% confidence interval (CI):
0.908, 0.977]). The overall ratio of hazard ratios (BICR vs. LE) was 1.03 (95% CI: 0.99,
1.07), indicating only a 3% difference between the two evaluations. Throughout, a
hazard ratio less than 1 indicates a treatment effect in favor of the experimental therapy.
Subgroup analyses of blinded versus open-label trials, interim versus final analysis results,
and first line versus subsequent line indications all showed similarly high correlations (4).
Although an inherent measurement error exists in the reading of radiographic scans and
disagreements between reviewers at the patient-level are commonly observed, regulatory
considerations for drug approval are based on the relative treatment effect at the
population level. Given that the FDA meta-analysis corroborated the high degree of
association between LE and BICR PFS treatment effects purported in recent publications,
complete-case BICR may not be necessary in many oncology trials. These findings
motivate the exploration of a random sample-based BICR audit as an alternative method
for bias evaluation to assess whether or not consistency of the treatment effect can be
concluded.
The idea behind the audit strategy is to increase our confidence in the LE results of PFS
by conducting a BICR in a random sample of patients. The main savings of such a
strategy lies in the situation where there is no actual bias in the LE results and only a
partial BICR audit is needed to confirm that fact. Other potential benefits include: a
reduction in trial complexity and burden to investigators, and avoidance of some missing
data issues. Currently, two methods have been proposed in the literature for a random
sample-based BICR audit.
In 2011, Dodd and colleagues (5) proposed a two-stage procedure based on a hazard ratio
estimator to evaluate the consistency of treatment effect (as measured by a hazard ratio)
between the BICR audited assessments and the LE assessments that is more efficient than
the standard estimator based on the audit subset alone. Amit and colleagues (3) proposed
a procedure based on differential discordance rates of LE versus BICR between the
treatment and control arms. These two approaches will be respectively referred to herein
as Method A and Method B. The objective of this paper is to evaluate the performance
characteristics of these two audit methods.
Methods
Brief summaries of the two audit methods are given below. We refer the reader to the
original publications by Dodd et al. 2011 (5) and Amit et al. 2011 (3) for more details.
The goal of Method A is to provide assurance that the LE PFS treatment effect estimate is
unlikely to be substantially biased. Thus, a BICR audit should only be considered when
the LE hazard ratio indicates a clinically meaningful and statistically significant effect in
favor of the experimental arm. The method is a two-stage procedure. First, the BICR-
based hazard ratio is estimated on the audited subset. Audited subjects were selected as a
simple random sample from all subjects in the study. If the hazard ratio is confirmed to
be significant, then the process concludes. If not, then a complete-case BICR is
conducted. A more efficient estimator of the BICR hazard ratio simultaneously
incorporates information from the patient-level LE data on all cases and the retrospective
random sample BICR audit cases. A formula to estimate the audit size is also provided,
which depends on the effect size and the minimum important difference (MID). The
MID is a threshold value (e.g. HR = 1) used in the proposed two-stage testing procedure.
The upper bound of the confidence interval of the BICR hazard ratio estimate is
compared to the MID to determine whether consistency of the PFS treatment effect has
been verified (5).
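The final comparison in Method A can be sketched in a few lines of Python. This is only an illustration of the decision step (upper confidence bound versus MID), not the efficient estimator of Dodd et al. The function names, the Wald-type interval, the two-sided 95% level (z = 1.96), and the input values are all assumptions made for this sketch.

```python
import math


def upper_ci_bound(log_hr: float, se_log_hr: float, z: float = 1.96) -> float:
    """Upper bound of a Wald-type interval for a hazard ratio.

    z = 1.96 (two-sided 95%) is an assumption of this sketch.
    """
    return math.exp(log_hr + z * se_log_hr)


def audit_decision(log_hr: float, se_log_hr: float, mid: float = 1.0) -> str:
    """Compare the upper bound of the BICR hazard ratio estimate to the MID."""
    if upper_ci_bound(log_hr, se_log_hr) < mid:
        return "consistency verified"
    return "complete-case BICR"


# Hypothetical audit estimates (not taken from any trial in this paper).
print(audit_decision(math.log(0.60), 0.15))  # strong effect
print(audit_decision(math.log(0.90), 0.15))  # borderline effect
```

With these hypothetical inputs, the first audit would stop after the first stage, while the second would proceed to a complete-case BICR.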
The basis of Method B is to use differential discordance as a measure to detect evaluation
bias. Two measures are defined in Table 1: the early discrepancy rate (EDR) is the
frequency that the LE declares progression earlier than BICR, and the late discrepancy
rate (LDR) is the frequency that the LE declares progression later than BICR. The
differential discordance for each measure is the difference between the rate on the
experimental arm and that on the control arm. A differential discordance beyond a
certain threshold is suggestive of bias being present in the LE assessment, although the
method does not quantify the uncertainty in estimation of the differential discordance. To
implement the method, a threshold of differential discordance that triggers complete-case
BICR is required; increasing the threshold decreases the sensitivity of the method to
detect bias, while improving its specificity to rule it out. Threshold values ranging from
0.075 to 0.100 and BICR audit sizes of 100-160 patients were recommended based on
simulation studies in Amit et al. 2011 (3). A negative differential discordance for EDR
and/or a positive differential discordance for LDR are indicative of bias in the LE result
in favor of the experimental arm, which would trigger a complete-case audit. A negative
differential discordance for EDR means a higher rate of LE progressions being called
earlier than BICR on the control arm, and a positive differential discordance for LDR
means a higher rate of LE progressions being called later than BICR on the experimental
arm (3).
EDR and LDR are calculated as
$$\mathrm{EDR} = \frac{b + a_3}{a + b} \quad \text{and} \quad \mathrm{LDR} = \frac{c + a_2}{b + c + a_2 + a_3}$$
using the cell values from Table 1.
The following hypothetical example illustrates the calculations. In the
experimental arm of a study, a total of 90 patients were assessed to have PD
by both BICR and LE (cell ‘a’ in Table 1), of which 50 patients had agreement between
BICR and LE on the timing and occurrence of PD (a1), 25 patients had LE declaring PD
later than BICR (a2), and 15 patients had LE declaring PD earlier than BICR (a3). A
total of 30 patients were assessed to have PD by LE but not BICR (cell ‘b’ in Table 1),
and 25 patients were assessed to have PD by BICR but not LE (cell ‘c’ in Table 1).
Seventy patients did not have PD by either LE or BICR assessments (cell ‘d’ in Table 1).
According to the formulas in Table 1, the corresponding EDR and LDR for the
experimental arm are (30+15)/(50+25+15+30) = 45/120 = 0.375 and
(25+25)/(30+25+25+15) = 50/95 = 0.526, respectively. Similar calculations are needed
for the control arm. The differential discordance for EDR/LDR would be the difference
between arms for each rate.
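The hypothetical example above can be reproduced with a short Python sketch. The class and function names are illustrative assumptions, not part of Amit et al.; only the cell definitions and the two rate formulas come from Table 1.

```python
from dataclasses import dataclass


@dataclass
class ArmCounts:
    """Cell counts from Table 1 for one treatment arm."""
    a1: int  # LE and BICR agree on timing and occurrence of PD
    a2: int  # LE declares PD later than BICR
    a3: int  # LE declares PD earlier than BICR
    b: int   # PD by LE but not BICR
    c: int   # PD by BICR but not LE
    d: int   # no PD by either assessment (not used by EDR or LDR)


def edr(n: ArmCounts) -> float:
    """Early discrepancy rate: (b + a3) / (a + b), with a = a1 + a2 + a3."""
    a = n.a1 + n.a2 + n.a3
    return (n.b + n.a3) / (a + n.b)


def ldr(n: ArmCounts) -> float:
    """Late discrepancy rate: (c + a2) / (b + c + a2 + a3)."""
    return (n.c + n.a2) / (n.b + n.c + n.a2 + n.a3)


def differential_discordance(experimental: ArmCounts, control: ArmCounts):
    """Experimental-minus-control difference for (EDR, LDR)."""
    return edr(experimental) - edr(control), ldr(experimental) - ldr(control)


# Experimental arm of the hypothetical example in the text.
experimental = ArmCounts(a1=50, a2=25, a3=15, b=30, c=25, d=70)
print(round(edr(experimental), 3), round(ldr(experimental), 3))  # 0.375 0.526
```

Passing a second `ArmCounts` for the control arm to `differential_discordance` gives the two differences that Method B compares against its threshold.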
The performance evaluation of these two proposed audit methods is based on 26
randomized, superiority phase 3 trials submitted for review in consideration of approval
in non-hematologic malignancies across 9 tumor types (Table 2). Note that these 26 trials
are a subset of the 28 trials included in the FDA meta-analysis reported in Zhang and
colleagues (4) with adequate data for evaluation to determine whether a random sample-
based BICR audit is a viable alternative to a complete-case BICR with respect to its
ability to detect bias in the LE. As a result of several trials having multiple cohorts or
multiple treatment arms, the number of analysis units or randomized comparisons was
greater than the number of trials (i.e. 31 instead of 26). Trials with multiple treatment
arms were weighted accordingly to account for correlation.
Table 3 provides a summary of the performance evaluation strategy and summary
measures of the two audit methods. Since all trials evaluated had a complete-case BICR
conducted, random sample audits are performed 10,000 times for each trial to assess the
performance of Method A. Summary measures of the mean audit size, percentage of
complete-case audits, and percentage of audits confirming the LE result (i.e., where
consistency of the PFS treatment effect is concluded) out of the 10,000 simulated
replicates were obtained. Note that these replicate audits are conducted for performance
evaluation purposes only; in practice, these procedures would be implemented a single
time.
For Method B, our evaluations calculated the differential discordance for both the early
and late discrepancy rates (EDR and LDR). The audit size was fixed at 160 patients, the
maximum audit size recommended by Amit et al. 2011 (3); analyses using other audit
sizes were also performed and gave similar results. For each trial, a random sample of
160 patients was drawn and the differential discordances for EDR and LDR were
calculated. To assess performance characteristics, the random samples were repeatedly
drawn 10,000 times; summary measures of the mean differential discordance values and
the percentage of times a complete-case audit was recommended (i.e. bias was detected)
out of the 10,000 replicates were obtained for each study. Under Method B, a trial was
considered to have a biased LE result, and thus a complete-case audit was recommended,
if the differential discordance for EDR or LDR passed the specified threshold value.
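The replicate-audit evaluation of Method B can be sketched as follows. This is a minimal illustration under stated assumptions, not the code used for the analyses: the patient-level encoding (an arm label plus a Table 1 category), the function names, the handling of empty denominators, and the example data are all hypothetical. The flagging rule follows the direction described in the Methods (a negative differential discordance for EDR or a positive one for LDR, beyond the threshold).

```python
import random
from collections import Counter


def rates(categories):
    """Return (EDR, LDR) for one arm from Table 1 category labels."""
    n = Counter(categories)
    edr_den = n["a1"] + n["a2"] + n["a3"] + n["b"]
    ldr_den = n["b"] + n["c"] + n["a2"] + n["a3"]
    # Assumption: a rate with an empty denominator is treated as 0.
    edr = (n["b"] + n["a3"]) / edr_den if edr_den else 0.0
    ldr = (n["c"] + n["a2"]) / ldr_den if ldr_den else 0.0
    return edr, ldr


def flags_bias(sample, threshold):
    """True if DD(EDR) < -threshold or DD(LDR) > threshold."""
    exp = [cat for arm, cat in sample if arm == "experimental"]
    ctl = [cat for arm, cat in sample if arm == "control"]
    edr_e, ldr_e = rates(exp)
    edr_c, ldr_c = rates(ctl)
    return (edr_e - edr_c) < -threshold or (ldr_e - ldr_c) > threshold


def proportion_flagged(patients, audit_size=160, threshold=0.075,
                       n_rep=10_000, seed=0):
    """Share of replicate audits that would trigger a complete-case BICR."""
    rng = random.Random(seed)
    hits = sum(flags_bias(rng.sample(patients, audit_size), threshold)
               for _ in range(n_rep))
    return hits / n_rep


# Hypothetical trial: both arms share the same category mix (no true bias).
mix = {"a1": 50, "a2": 25, "a3": 15, "b": 30, "c": 25, "d": 70}
patients = [(arm, cat) for arm in ("experimental", "control")
            for cat, k in mix.items() for _ in range(k)]
print(proportion_flagged(patients, n_rep=2000))
```

Because the two hypothetical arms are identical, any audits flagged here reflect sampling variability alone, which is one way to see how the threshold trades sensitivity against false alarms.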
Results
Table 2 summarizes the level of LE-BICR discordance between treatment arms across the
26 trials, divided into disagreements on censoring status and the timing of progression
given a 7-day window. We see that the discordance rates are around 20% across arms for
both categories. Table 4 presents the performance measures for the two audit methods
for all evaluated trials; these results will be discussed and described in the figures below
and summarized in Table 5. Note that since the audit size for Method B was fixed at 160
patients for all studies, no mean audit size was reported.
Figure 1 assesses Method A by looking into the relationship between the mean audit size
for each trial and the upper bound of the 95% confidence interval (CI) of the LE hazard
ratio estimate. The cluster of circles at mean audit size of 100% are those trials for which
complete-case BICR audits were needed in all 10,000 replicates. Those trials all had
upper 95% CI bounds of the LE hazard ratio greater than 0.90. This means that, as
expected, trials with borderline or non-significant LE results would be recommended
complete-case BICR audits.
For all the other trials, the mean audit size decreases with the upper bound. This means
that trials with larger, more significant LE results would obtain the most savings in terms
of needing a much smaller audit size (most are below 50%). This general relationship
between the mean audit size and the upper CI bound holds true across tumor types
(results not shown).
Figure 2 assesses Method B by looking into the relationship between the HR ratio of
BICR versus LE and the differential discordance for the early or late discrepancy rate
(EDR or LDR). Note that an HR ratio greater than 1 implies an overestimate of the
treatment effect by the LE. Recall that EDR is the frequency that the LE declares
progression earlier than BICR. As explained previously, a negative differential
discordance for EDR is suggestive of bias in the LE result favoring the experimental arm.
In support of this rationale, we see that the differential discordance for EDR decreases as
the HR ratio increases. This means that, as more LE progressions are being called earlier
than BICR on the control arm, the difference in BICR and LE hazard ratio estimates also
increases. The reverse relationship is seen for the late discrepancy rate (LDR) since
LDR is the complement of EDR. This general relationship between the HR ratio of
BICR versus LE and the differential discordance for EDR/LDR holds true across tumor
types (results not shown).
Table 5 summarizes measures from both methods by categorizing the trials with respect
to their LE hazard ratio estimate. Of the 12 trials with a large observed LE-assessed PFS
treatment effect (HR ≤ 0.5), the median across studies of the mean audit sizes over the
10,000 replicates (in short, the median mean audit size) from Method A was 35% and the
LE was confirmed in all 10,000 replicate audits for all 12 trials (i.e. consistency of the
treatment effect was concluded for all replicates). This indicates very little (if any) loss in
power from the two-stage procedure of Method A. Of the same 12 trials, the mean across
studies of the proportion of times a complete-case audit was recommended over the
10,000 replicates from Method B was 43% and 37% for threshold values of 0.075 and
0.100, respectively. A complete-case audit is recommended for a study if the differential
discordance (DD) in EDR is less than the negative of the threshold value or the DD in
LDR is greater than the threshold value.
For smaller treatment effects (HR > 0.75), only 2 of the 8 trials resulted in all 10,000
replicate audits confirming the LE for Method A and the median mean audit size was
100%. It should be noted that, for most trials, either all 10,000 replicate audits confirmed
the LE or none did. Whereas there is an intuitive trend in decreased savings (larger
median mean audit sizes) using Method A as the observed LE treatment effect becomes
smaller, no such trend was seen for Method B as the mean proportion of times a
complete-case audit is recommended stayed fairly constant across the HR categories.
Case Studies
Carcinoid Study
One trial with evaluation bias present was the carcinoid trial (study 26 in Table 4), which
was discussed by the Oncologic Drug Advisory Committee (ODAC) in April of 2011 (6).
This was a phase 3, randomized (1:1), placebo-controlled study of everolimus for the
treatment of patients with unresectable or metastatic carcinoid tumor. The primary
endpoint was PFS by BICR. At the second interim analysis, an unprecedented
discordance of the PFS treatment effect was observed between the LE and BICR. The
LE PFS result (HR= 0.78, p = 0.003) crossed the efficacy boundary of p = 0.010 while
the BICR PFS result (HR= 0.93, p = 0.233) crossed the futility boundary of p = 0.175.
The boundary p-values are the significance levels to conclude either efficacy or futility,
while preserving the type I and II errors, respectively. Clearly, some bias was present in
this trial. The HR ratio was 1.19. Given that the LE and BICR gave such divergent
views of efficacy, it was of particular interest how the two audit methods would perform
for this study.
Table 6 summarizes the discordance between arms with respect to censoring status,
progression time, and censoring time. We see some discrepancies between the two arms.
For Method A, 100% of the 10,000 replicates resulted in complete-case audits with 0%
being able to verify the consistency of the treatment effect. For Method B, however, only
32.7% and 22.1% of the 10,000 replicates recommended a complete-case audit for
threshold values of 0.075 and 0.100, respectively, to support the conclusion that bias may
be present. Note that the fixed audit size of 160 was 37% of the total number of patients.
Soft Tissue Sarcoma (STS) Study
To illustrate the potential savings in audit size, another case study is presented (study 22
in Table 4), which was discussed by the Oncologic Drug Advisory Committee (ODAC)
in March of 2012 (7). This was a phase 3, randomized (1:1), placebo-controlled,
maintenance trial of ridaforolimus in 711 patients with soft tissue sarcoma (STS). The
final LE and BICR PFS hazard ratio estimates were 0.72 and 0.76, respectively, with an
HR ratio of 1.06. Table 6 summarizes the discordance between arms for this study,
which appears fairly balanced. For Method A, only 28% of the 10,000 replicates
resulted in complete-case audits with 100% verifying consistency of the treatment effect.
The mean audit size was 48%, a savings of over 50% in audit size.
For Method B, the fixed audit size of 160 was 23% of the total number of patients. Of
the 10,000 replicates, 51.7% and 41.4% recommended a complete-case audit for
threshold values of 0.075 and 0.100, respectively, to support the conclusion that bias may
be present. This was largely due to increased LDR discrepancies, as 2.0% and 41.4% of
the 10,000 replicates resulted in differential discordance values for EDR and LDR,
respectively, that passed a threshold of 0.100. Thus, there is quite a bit of variability in the
differential discordance for LDR making it difficult to determine whether or not bias was
present in this study using Method B.
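As a quick arithmetic check of the quantities quoted in the two case studies, the HR ratio (BICR vs. LE) is the BICR hazard ratio divided by the LE hazard ratio. The symbols below are shorthand introduced only for this check:

$$\frac{\mathrm{HR}_{\mathrm{BICR}}}{\mathrm{HR}_{\mathrm{LE}}} = \frac{0.93}{0.78} \approx 1.19 \ \text{(carcinoid study)}, \qquad \frac{\mathrm{HR}_{\mathrm{BICR}}}{\mathrm{HR}_{\mathrm{LE}}} = \frac{0.76}{0.72} \approx 1.06 \ \text{(STS study)}$$

Similarly, for the STS study the fixed Method B audit of 160 patients corresponds to

$$\frac{160}{711} \approx 0.225,$$

or roughly 23% of the patients.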
Discussion
Although measurement error is inherent in the reading of radiographic scans, regulatory
considerations for drug approval are based on the relative treatment effect at the
population level. Given that multiple publications (2-4) have corroborated the strong
correlation between LE- and BICR-assessed PFS treatment effect estimates, there is a
need for the exploration of alternative strategies to detect bias in the LE. The results of
the analyses presented herein support that a random sample-based BICR audit is a viable
alternative to a complete-case BICR, and may be a more efficient strategy for bias
evaluation of the LE. Note that, although there is general agreement, in one study
reported here (study 26 in Table 4), there were divergent conclusions depending on the
choice of LE and BICR, justifying a continued need for BICR to provide assurance about
the LE-based treatment effect.
Method A seems to perform well in most situations; i.e. it seems able to distinguish
between trials with and without bias present. The savings with respect to audit size varies
from case to case, depending on the study size and magnitude of the observed LE PFS
treatment effect; larger effect sizes and/or studies typically require smaller audit
proportions. Method B is intuitively appealing, but needs further evaluation, particularly
with respect to determination of the appropriate threshold value. Potential reasons for its
variable performance that need further evaluation include loss of important information
due to dichotomization and ignoring patients who were censored by both the LE and
BICR (cell “d” in Table 1) in the definition of EDR and LDR. Method B counts
discordances but not how far apart they are. For example, with the LDR, the LE could
call PD right after BICR or a long time after. If the late discrepancies occurred in the
control arm many visits after BICR, this would produce more significant bias in the HR
estimate. On a similar note, we could have a relatively small number of late
discrepancies, but if they occur very late, then this would produce greater bias in the HR
estimate.
Although real-time BICRs are becoming more prominent with advances in modern
digital technology, none of the studies included in the analyses presented herein had
real-time BICRs. It should be noted that real-time BICR would ameliorate bias concerns
due to informative censoring, but does not alleviate potential bias due to the subjectivity
inherent to the PFS endpoint.
We acknowledge that the studies included in this analysis are limited to registration trials
that all had complete-case BICR and may not be representative of the population of all
clinical trials conducted. However, they are representative of the clinical trials that are
generally submitted to the FDA for regulatory consideration and for which BICRs are
usually required.
An ODAC meeting was held in July 2012 to discuss whether the current practice of
complete-case BICR should be replaced by a random sample-based BICR audit, based on
the information and analyses presented herein (8). All committee members agreed that a
random sample-based BICR audit should be considered; however, the potential merits
must be viewed in tandem with the potential limitations and challenges. They also
advised against the complete elimination of BICR, which could jeopardize the integrity
of the LE.
Although these analyses have demonstrated that a BICR audit to assess potential bias in
the LE is a feasible approach, the logistics of how the audit should proceed need further
discussion and consideration. The method of selecting the random sample audit needs to
be determined; a simple random sample audit may not be sufficient, for example, to
ensure representation of all study sites. Efforts should also be made to minimize any
additional burden that the audit may cause the investigator or sponsor without
compromising the integrity of the study. Selection of the actual audit strategy to
implement within a trial may need to be determined on a case-by-case basis. While we
focused on two methodologies, we expect that other approaches will become available
for consideration in the future.
References
1. Ford R, Schwartz L, Dancey J, Dodd LE, Eisenhauer EA, Gwyther S et al. Lessons
learned from independent central review. European Journal of Cancer 2009; 45: 268-274.
2. Dodd LE, Korn EL, Freidlin B, Jaffe CC, Rubinstein LV, Dancey J et al. Blinded
independent central review of progression-free survival in phase III clinical trials:
important design element or unnecessary expense? Journal of Clinical Oncology. 2008;
26: 3791-3796.
3. Amit O, Mannino F, Stone AM, Bushnell W, Denne J, Helterbrand J et al. Blinded
independent central review of progression in cancer clinical trials: results from a meta-
analysis. European Journal of Cancer. 2011; 47: 1772-1778.
4. Zhang JJ, Chen H, He K, Tang S, Justice R, Keegan P et al. Evaluation of blinded
independent central review of tumor progression in oncology clinical trials: A meta-
analysis. Drug Information Journal 2013;47: 167-74.
5. Dodd LE, Korn EL, Freidlin B, Gray R, Bhattacharya S. An audit strategy for
progression-free survival. Biometrics. 2011;67:1092-1099.
6. Oncologic Drug Advisory Committee – Everolimus for Carcinoid Tumors. April 12,
2011 Meeting.
http://www.fda.gov/AdvisoryCommittees/CommitteesMeetingMaterials/Drugs/Oncologi
cDrugsAdvisoryCommittee/ucm235829.htm
7. Oncologic Drug Advisory Committee – Ridaforolimus for STS. March 20, 2012
Meeting.
http://www.fda.gov/AdvisoryCommittees/CommitteesMeetingMaterials/Drugs/Oncologi
cDrugsAdvisoryCommittee/ucm285400.htm
8. Oncologic Drugs Advisory Committee. Evaluation of Radiologic Review of PFS in
Non-Hematologic Malignancies. July 24, 2012 Meeting.
http://www.fda.gov/AdvisoryCommittees/CommitteesMeetingMaterials/Drugs/OncologicDrugsAdvisoryCommittee/ucm285400.htm
Figure captions:
Figure 1. Relationship between mean audit size from Method A and CI upper bound of
LE hazard ratio
Figure 2. Relationship between HR ratio (BICR vs. LE) and differential discordance in
EDR/LDR from Method B
Table 1: Definition of EDR and LDR from Amit et al. 2011

                         BICR
                 PD                   No PD
LE    PD         a = a1 + a2 + a3     b
      No PD      c                    d

a1: number of agreements on timing and occurrence of PD
a2: number of times LE declares PD later than BICR
a3: number of times LE declares PD earlier than BICR
PD: progressive disease
Adapted from European Journal of Cancer, Amit et al., Copyright (2011), with permission from Elsevier (3).
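
As an illustration of how the cell counts in Table 1 feed the two discrepancy rates, the following minimal Python sketch computes EDR and LDR for one treatment arm. The formulas used, EDR = (b + a3) / (a + b) and LDR = (c + a2) / (b + c + a2 + a3), are taken as an assumption from the cited Amit et al. paper (3) and are not spelled out in this table; the class name `ArmCounts`, the function names, and the example counts are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class ArmCounts:
    """LE-by-BICR cross-classification for one treatment arm (Table 1 cells)."""
    a1: int  # LE and BICR agree on occurrence and timing of PD
    a2: int  # both declare PD, LE later than BICR
    a3: int  # both declare PD, LE earlier than BICR
    b: int   # LE declares PD, BICR does not
    c: int   # BICR declares PD, LE does not
    d: int   # neither declares PD

    @property
    def a(self) -> int:
        # a = a1 + a2 + a3, as defined in Table 1
        return self.a1 + self.a2 + self.a3


def edr(x: ArmCounts) -> float:
    """Early discrepancy rate (assumed form): (b + a3) / (a + b)."""
    return (x.b + x.a3) / (x.a + x.b)


def ldr(x: ArmCounts) -> float:
    """Late discrepancy rate (assumed form): (c + a2) / (b + c + a2 + a3)."""
    return (x.c + x.a2) / (x.b + x.c + x.a2 + x.a3)


# Hypothetical counts, for illustration only (not data from any trial).
example = ArmCounts(a1=60, a2=10, a3=15, b=20, c=12, d=83)
example_edr = edr(example)  # 35 / 105
example_ldr = ldr(example)  # 22 / 57
```

Neither function guards against a zero denominator; an arm with no LE-declared progressions or no discrepancies would need separate handling.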
Table 2. Summary of Study Characteristics

Study characteristics                           Meta-analysis trials (N = 26)
Tumor type
  MBC                                           6
  RCC                                           7
  MCRC                                          4
  Other[a]                                      9
Design
  A vs. B / A+B vs. A / Placebo or BSC[b]       4 / 6 / 16
  Open-label / Double-blind                     11 / 15
  Interim / Final analysis                      9 / 17
  1:1 / 2:1 randomization                       18 / 8
  1st / subsequent[c] / maintenance line        12 / 12 / 3
Sample size
  Median                                        716.5
  Min, Max                                      171, 1286

Discordance measure                                     Control (N = 31)    Experimental (N = 31)
% LE-BICR discordance on censoring status
  Mean (SD)                                             23.8% (9.7%)        23.0% (7.7%)
  Median                                                23.7%               24.5%
  Min, Max                                              9.1%, 43.3%         11.1%, 36.9%
% LE-BICR discordance on timing of PD (7-day window)
  Mean (SD)                                             21.5% (8.1%)        20.4% (8.7%)
  Median                                                21.4%               20.2%
  Min, Max                                              8.4%, 38.7%         6.1%, 41.4%

[a] Includes trials in non-small cell lung cancer (3), pancreatic neuroendocrine tumors (2), soft tissue sarcoma (2), gastrointestinal stromal tumor (1), ovarian cancer (1), carcinoid tumors (1)
[b] BSC = best supportive care
[c] One trial (#12 in Table 4) included both 1st and 2nd line patients and is double counted here
Abbreviations: MBC = metastatic breast cancer; RCC = renal cell carcinoma; MCRC = metastatic colorectal cancer
Table 3. Performance Evaluation Summary Measures of Audit Methods (for Each Study)

                      Method A (NCI Method)                  Method B (PhRMA Method)
Evaluation strategy   Conduct 10,000 random sample audits    Conduct 10,000 random sample audits
                      of varying size                        of fixed size (N = 160)
Summary measures      Mean audit size                        Mean differential discordance for EDR
                      % complete-case (CC) audits[1]         Mean differential discordance for LDR
                      % audits confirming LE result          % complete-case (CC) audit[1] recommended

[1] A complete-case (CC) audit is an audit with size = 100%
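
The replicate-audit idea in Table 3 can be sketched in Python as repeated random subsampling followed by a summary over replicates. This is a simplified illustration only, not the evaluation code used in the paper. It assumes the audit is drawn without replacement from the pooled patients of both arms, and it replaces the EDR/LDR measures with a single hypothetical per-patient flag (True when LE and BICR disagree). The function name `replicate_audits`, the synthetic arms, and the reduced number of replicates are all assumptions.

```python
import random


def replicate_audits(control, experimental, audit_size, n_reps, seed=0):
    """Return one differential discordance value per replicate audit.

    control / experimental: lists of booleans (hypothetical discordance flags).
    Differential discordance here = experimental rate minus control rate.
    """
    rng = random.Random(seed)  # fixed seed keeps the sketch reproducible
    pooled = [(0, flag) for flag in control] + [(1, flag) for flag in experimental]
    results = []
    for _ in range(n_reps):
        audit = rng.sample(pooled, audit_size)  # sample without replacement
        ctl = [flag for arm, flag in audit if arm == 0]
        exp = [flag for arm, flag in audit if arm == 1]
        if not ctl or not exp:
            continue  # skip a degenerate audit in which one arm is empty
        results.append(sum(exp) / len(exp) - sum(ctl) / len(ctl))
    return results


# Synthetic arms for illustration: 24% and 23% discordant, 200 patients each.
control_arm = [True] * 48 + [False] * 152
experimental_arm = [True] * 46 + [False] * 154

dd_values = replicate_audits(control_arm, experimental_arm,
                             audit_size=160, n_reps=1000)
mean_dd = sum(dd_values) / len(dd_values)
# Share of replicates whose absolute differential discordance exceeds 0.100.
pct_flagged = 100 * sum(abs(x) > 0.100 for x in dd_values) / len(dd_values)
```

The fixed audit size of 160 follows the Method B column of Table 3; the paper uses 10,000 replicates per study, reduced here to keep the sketch quick to run.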
Table 4: Evaluation Results for Method A and Method B

                                                                              Method A                                   Method B
Study     N     Tumor      LE HR               CC BICR HR          HR ratio   % CC       Mean audit  % replicate audits  % CC audits  % CC audits
                type[1]    (95% CI)            (95% CI)            (BICR/LE)  audits[2]  size[2]     confirming LE[3]    (0.100)[4]   (0.075)[5]
1         752   MBC        0.79 (0.68, 0.91)   0.74 (0.64, 0.87)   0.94       0%         73%         100%                14.8%        23.2%
2         722   MBC        0.49 (0.40, 0.59)   0.54 (0.44, 0.67)   1.10       5%         29%         100%                48.4%        57.7%
3-HER2-   952   MBC        0.89 (0.76, 1.04)   0.96 (0.78, 1.20)   1.08       100%       100%        0%                  26.1%        37.1%
3-HER2+   219   MBC        0.72 (0.54, 0.97)   0.67 (0.45, 0.99)   0.93       100%       100%        100%                21.2%        39.6%
4-Anth    622   MBC        0.66 (0.54, 0.81)   0.79 (0.63, 1.00)   1.20       78%        86%         36%                 66.0%        75.3%
4-Cap     615   MBC        0.67 (0.56, 0.82)   0.70 (0.56, 0.87)   1.04       32%        55%         100%                29.6%        38.7%
5         762   MBC        0.81 (0.68, 0.95)   0.86 (0.72, 1.04)   1.06       100%       100%        0%                  48.0%        58.0%
6         724   MBC        0.44 (0.36, 0.55)   0.35 (0.27, 0.46)   0.80       0%         30%         100%                4.9%         8.0%
7         769   RCC        0.44 (0.35, 0.54)   0.45 (0.37, 0.56)   1.02       0%         30%         100%                30.8%        38.9%
8         750   RCC        0.41 (0.33, 0.52)   0.41 (0.31, 0.53)   1.00       0%         35%         100%                23.4%        30.2%
9-25mg    416   RCC        0.70 (0.58, 0.86)   0.68 (0.55, 0.85)   0.97       2%         61%         100%                15.6%        25.6%
9-15mg    417   RCC        0.75 (0.61, 0.92)   0.76 (0.62, 0.94)   1.01       100%       100%        100%                3.7%         7.6%
10        416   RCC        0.33 (0.26, 0.42)   0.33 (0.26, 0.43)   1.00       0%         35%         100%                17.4%        24.1%
11        649   RCC        0.62 (0.51, 0.75)   0.59 (0.47, 0.74)   0.95       21%        41%         100%                19.4%        27.3%
12        435   RCC        0.43 (0.34, 0.54)   0.41 (0.32, 0.54)   0.95       0%         35%         100%                19.8%        29.0%
13        723   RCC        0.68 (0.56, 0.82)   0.68 (0.56, 0.83)   1.00       14%        49%         100%                24.5%        33.7%
14        463   MCRC       0.39 (0.32, 0.48)   0.55 (0.45, 0.67)   1.41       5%         29%         100%                98.2%        99.0%
15-Oxal   812   MCRC       1.35 (1.09, 1.67)   1.38 (1.08, 1.77)   1.02       100%       100%        0%                  45.5%        54.7%
16-WT     656   MCRC       0.81 (0.67, 0.98)   0.80 (0.66, 0.98)   0.99       100%       100%        100%                25.6%        34.0%
16-Mu     527   MCRC       1.15 (0.95, 1.40)   1.22 (1.00, 1.50)   1.06       100%       100%        0%                  16.4%        26.8%
17-WT     597   MCRC       0.71 (0.58, 0.87)   0.75 (0.62, 0.92)   1.06       16%        68%         100%                24.5%        33.4%
17-Mu     589   MCRC       0.82 (0.67, 0.99)   0.90 (0.74, 1.10)   1.10       100%       100%        0%                  64.3%        74.6%
18        663   NSCLC      0.50 (0.41, 0.60)   0.63 (0.52, 0.76)   1.26       32%        46%         100%                53.4%        62.2%
19        884   NSCLC      0.71 (0.61, 0.82)   0.71 (0.60, 0.83)   1.00       29%        46%         100%                17.4%        24.8%
20        171   PNET       0.42 (0.26, 0.66)   0.31 (0.18, 0.54)   0.74       1%         55%         100%                0.0%         0.0%
21        410   PNET       0.38 (0.29, 0.48)   0.40 (0.30, 0.54)   1.05       0%         40%         100%                77.4%        84.7%
22        711   STS        0.72 (0.61, 0.85)   0.76 (0.64, 0.90)   1.06       28%        48%         100%                41.4%        51.7%
23        369   STS        0.35 (0.28, 0.45)   0.31 (0.24, 0.41)   0.89       0%         35%         100%                10.4%        16.0%
24        312   GIST       0.29 (0.20, 0.40)   0.32 (0.23, 0.45)   1.10       0%         40%         100%                56.5%        65.5%
25        645   Ovarian    0.69 (0.58, 0.82)   0.79 (0.65, 0.96)   1.14       67%        80%         100%                55.5%        66.0%
26        429   Carcinoid  0.78 (0.62, 0.98)   0.93 (0.71, 1.22)   1.19       100%       100%        0%                  22.1%        32.7%

Abbreviations: Anth = anthracycline; Cap = capecitabine; Oxal = oxaliplatin; WT = wild type; Mu = mutant; CC = complete-case; DD = differential discordance
[1] MBC = metastatic breast cancer, RCC = renal cell carcinoma, MCRC = metastatic colorectal cancer, NSCLC = non-small cell lung cancer, PNET = pancreatic neuroendocrine tumors, STS = soft tissue sarcoma, GIST = gastrointestinal stromal tumor
[2] Over the 10,000 replicates per study
[3] % of 10,000 audit replicates (whether partial or CC) per study in which consistency of the PFS treatment effect is concluded (i.e. the LE result is confirmed)
[4] % of 10,000 replicate audits per study for which CC audit is recommended (i.e. differential discordance (DD) in EDR < -0.100 or DD in LDR > 0.100)
[5] % of 10,000 replicate audits per study for which CC audit is recommended (i.e. DD in EDR < -0.075 or DD in LDR > 0.075)
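
Two quantities in Table 4 follow simple rules that a short Python sketch can make concrete: the HR ratio (BICR HR divided by LE HR) and the Method B rule for recommending a complete-case audit stated in the footnotes (DD in EDR below the negative threshold, or DD in LDR above the threshold). The function names are hypothetical, and the sign convention for differential discordance (experimental arm minus control arm) is an assumption rather than something stated in this table.

```python
def hr_ratio(bicr_hr: float, le_hr: float) -> float:
    """HR ratio as tabulated in Table 4: BICR hazard ratio over LE hazard ratio."""
    return bicr_hr / le_hr


def recommend_cc_audit(dd_edr: float, dd_ldr: float, threshold: float = 0.100) -> bool:
    """Method B rule from the Table 4 footnotes.

    A complete-case audit is recommended when DD in EDR < -threshold
    or DD in LDR > threshold.
    """
    return dd_edr < -threshold or dd_ldr > threshold


# Study 1 in Table 4: LE HR 0.79, CC BICR HR 0.74, tabulated HR ratio 0.94.
study1_ratio = round(hr_ratio(0.74, 0.79), 2)

# Hypothetical DD values: not flagged at 0.100, flagged at the stricter 0.075.
flag_at_0100 = recommend_cc_audit(dd_edr=-0.08, dd_ldr=0.02, threshold=0.100)
flag_at_0075 = recommend_cc_audit(dd_edr=-0.08, dd_ldr=0.02, threshold=0.075)
```

The second pair of calls shows why the 0.075 column in Table 4 is never smaller than the 0.100 column: lowering the threshold can only add replicates to the flagged set.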
Table 5: Method A vs. Method B

                      Method A                                        Method B: mean of % CC audit recommended[4]
LE hazard     N[1]    Median mean audit      All replicate audits    Threshold = 0.100    Threshold = 0.075
ratio                 size[2] (min, max)     confirmed LE[3]
≤ 0.50        12      35% (29%, 55%)         12 (100%)               37% (0%, 98%)        43% (0%, 99%)
0.50 – 0.75   11      61% (41%, 100%)        10 (91%)                29% (4%, 66%)        39% (8%, 75%)
> 0.75        8       100% (73%, 100%)       2 (25%)                 33% (15%, 64%)       43% (23%, 75%)

[1] N = number of studies
[2] Median across studies of the mean over 10,000 replicates for each study
[3] Number of studies for which all 10,000 replicate audits concluded consistency of the PFS treatment effect (i.e. confirmed the LE result)
[4] Mean across studies of the % of 10,000 replicate audits for which a complete-case (CC) audit is recommended; a CC audit is recommended for a study if the differential discordance (DD) in EDR is less than the negative of the threshold value or the DD in LDR is greater than the threshold value
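
The aggregation described in the Table 5 footnotes can be checked against Table 4. The Python sketch below recomputes the first row (LE hazard ratio ≤ 0.50) from the twelve Table 4 studies that fall in that stratum; the values are transcribed from Table 4, while the variable names and the use of simple rounding to whole percentages are assumptions made for illustration.

```python
from statistics import mean, median

# (study, LE HR, mean audit size %, % CC recommended at 0.100, at 0.075),
# transcribed from Table 4 for studies with LE HR <= 0.50.
low_hr_studies = [
    ("2",  0.49, 29, 48.4, 57.7),
    ("6",  0.44, 30,  4.9,  8.0),
    ("7",  0.44, 30, 30.8, 38.9),
    ("8",  0.41, 35, 23.4, 30.2),
    ("10", 0.33, 35, 17.4, 24.1),
    ("12", 0.43, 35, 19.8, 29.0),
    ("14", 0.39, 29, 98.2, 99.0),
    ("18", 0.50, 46, 53.4, 62.2),
    ("20", 0.42, 55,  0.0,  0.0),
    ("21", 0.38, 40, 77.4, 84.7),
    ("23", 0.35, 35, 10.4, 16.0),
    ("24", 0.29, 40, 56.5, 65.5),
]

audit_sizes = [row[2] for row in low_hr_studies]

n_studies = len(low_hr_studies)
median_audit_size = median(audit_sizes)              # Method A column
audit_size_range = (min(audit_sizes), max(audit_sizes))
mean_cc_0100 = round(mean(row[3] for row in low_hr_studies))  # Method B, 0.100
mean_cc_0075 = round(mean(row[4] for row in low_hr_studies))  # Method B, 0.075
```

Under these assumptions the recomputed values agree with the first row of Table 5: 12 studies, a median mean audit size of 35% with range 29% to 55%, and mean CC-audit recommendation rates of 37% and 43%.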
Table 6: Discordance in Case Studies

                      Case Study 1                   Case Study 2
Discordance           Placebo       Everolimus       Placebo       Ridaforolimus
                      (N = 213)     (N = 216)        (N = 364)     (N = 347)
Censoring status      38%           26%              14%           16%
PD time[1]            15%           18%              21%           27%
Censoring time[1]     1.4%          3.7%             0.8%          2.0%

[1] PD/censoring discordant outside of 7-day window
Figure 1: [plot not reproduced] Mean audit size (%) versus 95% CI upper bound of LE hazard ratio; r = 0.797 (0.617, 0.898). © 2013 American Association for Cancer Research
Figure 2: [plots not reproduced] HR ratio (BICR/LE) versus mean differential discordance. Left panel: early discrepancy rate (EDR), r = -0.658 (-0.821, -0.396). Right panel: late discrepancy rate (LDR), r = 0.827 (0.669, 0.914). Reference lines: HR ratio = 1, threshold = 0.075, threshold = 0.1. © 2013 American Association for Cancer Research
Clin Cancer Res. Published OnlineFirst March 26, 2013.
Jenny J Zhang, Lijun Zhang, Huanyu Chen, et al. Assessment of Audit Methodologies for Bias Evaluation of Tumor Progression in Oncology Clinical Trials.
doi: 10.1158/1078-0432.CCR-12-3364