Outcomes Research and Diagnostic Testing
Challenges and Approaches
Rebecca Smith-Bindman, MD
Associate Professor Radiology, Epidemiology and Biostatistics
Obstetrics, Gynecology and Reproductive Sciences
University of California, San Francisco
Outline
I Goals of Outcomes Research and Diagnostic Testing (Dx)
Similarities / differences compared with treatment studies
II Unique Challenges
II a: Diagnostic testing leads to an increase in number of cases
II b: Methods of avoiding bias
III Examples of studies focused on mammographic screening
III a Utilization
III b Accuracy
III c Outcomes
I Goals of Outcomes Research
• Goals are the same as for studies assessing treatment
• IOM domains of quality all apply
Safety. Ex: Amniocentesis vs CVS
Timeliness. Ex: Timing between screening and diagnosis
Equity. Ex: Testing by race
Efficiency. Ex: Procedures per diagnosis
Patient centered. Ex: Patient satisfaction with imaging
Effectiveness (outcomes). Ex: Does the use of a particular test improve outcomes?
Accuracy (unique to testing). Ex: Does the test find disease with reasonable sensitivity and false positive rate?
• Donabedian model of quality applies
Structure. Ex: MRI scanners in a community
Process. Ex: Mammography as a HEDIS measure
Outcomes. Ex: Improved patient health due to use of testing
II Unique Challenges Measuring Outcomes and Testing
• Early diagnosis of disease is only a small determinant of outcomes
• An accurate test is necessary, but not sufficient, evidence of value
•Need to adjust for other factors when evaluating outcomes
• Underlying health and co-morbidity (screening vs diagnostic tests)
• Quality of treatment after diagnosis
• Potential biases introduced through the process of diagnosis
Biases Introduced Through Testing: Over-Diagnosis and Lead-Time Biases
• There is a broad spectrum of abnormalities for every disease
• Not all patients labeled with a disease would have become symptomatic (e.g., prostate cancer)
• The more patients tested, the more are labeled (labeling may exceed true disease)
• Evaluating outcomes among this larger pool of patients will bias the results: it is easy to "cure" patients and have very good outcomes when many of them do not really have the disease
Study of Treatment
[Figure: timeline from symptom onset to treatment (Rx) to outcome (symptom-free survival period); adjust for co-morbidity and severity of disease, then assess outcomes]
Study of Diagnosis
[Figure: testing timeline for an over-diagnosed patient]
By testing asymptomatic patients, you will label some patients who would never have become symptomatic or been diagnosed
These patients will have better outcomes
Study of Diagnosis
[Figure: timeline from testing to symptom onset; the interval between them is the lead time]
The time between diagnosis and when symptoms would have occurred in the absence of screening is defined as the LEAD time
You don't know when symptoms would have developed
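A small worked example may make lead-time bias concrete. The sketch below assumes a hypothetical patient whose date of death is unchanged by screening; only the date of diagnosis moves earlier. All dates and the helper name `survival_from_diagnosis` are illustrative assumptions.

```python
def survival_from_diagnosis(diagnosis_year: float, death_year: float) -> float:
    """Survival time as usually reported: measured from the date of diagnosis."""
    return death_year - diagnosis_year


# Hypothetical patient (all dates are illustrative assumptions):
symptom_onset_year = 10.0    # clinical diagnosis would occur here without screening
screen_detection_year = 7.0  # screening finds the same disease 3 years earlier
death_year = 14.0            # death occurs at the same time either way

lead_time = symptom_onset_year - screen_detection_year
survival_unscreened = survival_from_diagnosis(symptom_onset_year, death_year)
survival_screened = survival_from_diagnosis(screen_detection_year, death_year)

print(f"lead time: {lead_time} years")
print(f"survival from diagnosis, unscreened: {survival_unscreened} years")
print(f"survival from diagnosis, screened: {survival_screened} years")
```

The screened patient appears to survive 3 years longer although the date of death is identical; the entire apparent gain is lead time.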
Question: Would the use of a (screening) test in this population have led to earlier diagnosis and improved outcomes for those patients who were going to be diagnosed with disease (e.g., prostate cancer)?
[Figure: original population of healthy and sick patients vs the tested population, in which more patients are labeled sick]
It is not possible to evaluate survival among those "diagnosed", because this group is very different from the group of patients you originally hoped to study
Over-Diagnosis and Lead-Time Biases
• Often acknowledged and then dismissed or "accounted for"
• These biases are impossible to account for in any reliable fashion
It is impossible to know how deep the reservoir of disease is
It is impossible to quantify the lead time
• Studies that evaluate outcomes only in a screened population are flawed! Example: Henschke, NEJM 2006: CT for lung cancer screening
II b Methods to Avoid Biases when Assessing Outcomes
• Assess outcomes that will come to attention even without screening: death, advanced-stage cancer, critical-stage disease
• Do not consider the diagnosis of early disease itself to be beneficial; rather, a decrease in late-stage disease is the benefit
• Assess outcomes in the ENTIRE population: those who did and did not undergo the test
• For example, what is the rate of late-stage disease in the screened population compared with the unscreened population?
[Figure: Not Tested vs Tested populations; red dots mark bad outcomes]
• Assess the test in the entire population
• Assess the rate of bad outcomes (red dots) in each group
• Among those diagnosed, there will be a higher percentage of bad outcomes in the Not Tested group, 3/6 (50%), than in the Tested group, 3/15 (20%), because of over-diagnosis from testing
• However, the rate of bad outcomes in the whole group will be lower among the tested only if the test is beneficial. Here there are 3 adverse outcomes in each group, so there is no benefit
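The arithmetic on this slide can be written out to show why the denominator matters. This is a minimal sketch using the slide's counts (6 vs 15 diagnosed, 3 bad outcomes each), plus one assumption not in the slide: each group is taken to contain 100 people (`POPULATION_PER_GROUP`, hypothetical).

```python
from fractions import Fraction

# Counts from the slide; the size of each group is an illustrative assumption.
POPULATION_PER_GROUP = 100

groups = {
    "not tested": {"diagnosed": 6, "bad_outcomes": 3},
    "tested": {"diagnosed": 15, "bad_outcomes": 3},
}


def rate(events: int, denominator: int) -> Fraction:
    """Exact proportion, kept as a fraction to avoid rounding noise."""
    return Fraction(events, denominator)


for name, g in groups.items():
    among_diagnosed = rate(g["bad_outcomes"], g["diagnosed"])      # biased by over-diagnosis
    in_population = rate(g["bad_outcomes"], POPULATION_PER_GROUP)  # whole-group comparison
    print(f"{name}: {float(among_diagnosed):.0%} of those diagnosed, "
          f"{float(in_population):.0%} of the whole group")
```

Among the diagnosed, testing looks protective (50% vs 20%), but in the whole group the rate of bad outcomes is identical, which is the comparison that reflects real benefit.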
III Example of Diagnostic Testing Studies: Mammography
• The most studied diagnostic test
• Studied with 7 RCTs and thousands of observational studies
• Will use mammography to highlight different approaches to using observational data to assess:
III a Utilization
III b Accuracy
III c Outcomes
III a Utilization
• Utilization relatively easy to assess: example by age, race, SES
• Important to distinguish a screening test from a diagnostic test
• Interesting ONLY if receipt of the test is used as a measure of quality
HEDIS measures include receipt of mammography
• Utilization cannot be used to assess whether a test is valuable
• Studies of utilization are interesting once effectiveness is established
Utilization: Are There Questions To Be Answered with These or Similar Data?
• Diagnostic testing of all types is increasing rapidly
• For radiology tests: CT has increased 10% per year over the last decade, and MRI 15% per year
• For cardiology tests, the increase has been even greater
• The most interesting studies would assess the appropriateness or outcomes of testing, but as a first step, it is worth documenting current practice and cost
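To see what these annual growth rates imply over a decade, a short calculation helps. This sketch assumes the 10% (CT) and 15% (MRI) rates compound annually and stay constant for all ten years; `cumulative_growth` is a hypothetical helper name.

```python
def cumulative_growth(annual_rate: float, years: int) -> float:
    """Multiplicative change after `years` of constant compound growth."""
    return (1 + annual_rate) ** years


for modality, annual_rate in [("CT", 0.10), ("MRI", 0.15)]:
    factor = cumulative_growth(annual_rate, 10)
    print(f"{modality}: {annual_rate:.0%} per year for 10 years -> {factor:.1f}x")
```

Under these assumptions, CT use grows about 2.6-fold and MRI use about 4-fold over the decade.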
III b Accuracy
• Which of two tests is more discriminatory?
• Useful once the general effectiveness of a test has been established
• Potentially misleading: finding more disease is not necessarily better
• On a practical level, it is impossible to do an RCT for each new test; the general assumption is that once one diagnostic test has been studied, new tests can be compared with the earlier, proven test
Accuracy: Digital vs Film Mammography
• Both use x-rays
Film: image captured on film
Digital: image captured electronically and stored digitally
• Digital images are viewed directly on a computer, and the radiologist can alter contrast / brightness or magnify without additional x-rays
• Digital is more expensive (2-4 times the cost) and is reimbursed at a higher rate
• The Digital Mammographic Imaging Screening Trial (DMIST), funded by the NCI, was designed to determine the accuracy of digital versus film mammography
Pisano, October 2005 NEJM
• 49,500 women, 33 sites across the U.S., 2001-2002
• All women had both digital and film mammography
• Each woman's two exams were interpreted by different physicians
• Digital mammography was better in several groups, with higher sensitivity and no change in false positive rate

Sensitivity                          Digital   Film
Women < age 50                       78%       51%
Women with dense breasts             70%       55%
Pre- and peri-menopausal women       72%       51%

DMIST: Design Typical of an Accuracy Study
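Sensitivity and false positive rate come from cross-classifying each exam against true cancer status, and because every DMIST woman had both exams, the two tests can be compared on the same cancers. The sketch below uses hypothetical counts chosen only to reproduce the 78% vs 51% sensitivity for women under 50; the false positive counts and the paired (discordant) counts are invented for illustration. McNemar's test is shown as one standard paired comparison, not necessarily the analysis DMIST used.

```python
from dataclasses import dataclass


@dataclass
class TwoByTwo:
    """Test results cross-classified against true cancer status."""
    true_pos: int
    false_neg: int
    false_pos: int
    true_neg: int


def sensitivity(t: TwoByTwo) -> float:
    """Share of women with cancer whose exam was positive."""
    return t.true_pos / (t.true_pos + t.false_neg)


def false_positive_rate(t: TwoByTwo) -> float:
    """Share of women without cancer whose exam was positive."""
    return t.false_pos / (t.false_pos + t.true_neg)


def mcnemar_statistic(only_a_positive: int, only_b_positive: int) -> float:
    """McNemar chi-square (1 df, no continuity correction) for paired tests.

    Because every woman had both exams, only the discordant cancers carry
    information: those found by test A but missed by B, and vice versa.
    """
    b, c = only_a_positive, only_b_positive
    return (b - c) ** 2 / (b + c)


# Hypothetical counts: 100 cancers and 1,000 cancer-free women under age 50.
digital = TwoByTwo(true_pos=78, false_neg=22, false_pos=90, true_neg=910)
film = TwoByTwo(true_pos=51, false_neg=49, false_pos=90, true_neg=910)
# Paired view of the same 100 cancers: 43 found by both, 35 by digital only,
# 8 by film only, 14 by neither (43 + 35 = 78 digital; 43 + 8 = 51 film).
chi2 = mcnemar_statistic(only_a_positive=35, only_b_positive=8)

print(f"sensitivity: digital {sensitivity(digital):.0%}, film {sensitivity(film):.0%}")
print(f"false positive rate: digital {false_positive_rate(digital):.1%}, "
      f"film {false_positive_rate(film):.1%}")
print(f"McNemar chi-square: {chi2:.1f}")
```

A McNemar statistic above 3.84 corresponds to p < 0.05 with 1 degree of freedom.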
Can You Assess Accuracy Using Existing Data?
• It is important to assess the accuracy of technology (new and old), ideally before widespread utilization
• Head-to-head comparisons are extremely expensive: DMIST cost $26 million
• Secondary analysis of existing data is possible
• Evaluate accuracy among those who have undergone the test, using HMO data and disease / screening registries
• For breast cancer, the Breast Cancer Surveillance Consortium is an extremely rich database
Disease/Testing Registry
Breast Cancer Surveillance Consortium (BCSC)
• Registry of women who have undergone mammography within 7 regions of the U.S. (San Francisco, Seattle, Vermont, New Hampshire, North Carolina, New Mexico, Colorado)
• Includes mammographic interpretation (accuracy), cancer risk factors, demographic information, and cancer diagnosis and treatment
• BCSC will share data, or if you work with a BCSC collaborator, may analyze the data for you (Great Resource!)
Accuracy Study: Computer Aided Detection (CAD)
• Computer program assists radiologist in interpreting mammogram
• Large numbers of lesions are flagged by the computer, and the radiologist must consider whether they are real
• Mixed results
• It is expensive and can be billed in addition to the mammogram
• Since utilization of CAD has increased over time, it would be ideal to study its impact in actual practice
Accuracy of CAD Using Observational Data
• 429,345 mammograms obtained from 1998 - 2002 at 43 centers as part of BCSC
• 7 of the 43 facilities switched to CAD
• Evaluated accuracy with and without CAD, and before and after CAD
Methodologically, the study tried to account for not being a prospective trial by making multiple comparisons
The authors adjusted facility accuracy for factors known to alter accuracy
• CAD had worse accuracy: higher false positive rates, higher biopsy rates, and slightly lower detection of invasive cancer
Fenton, NEJM 2007
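One simple way to formalize a before/after comparison that also uses facilities that never adopted CAD is a difference-in-differences contrast: the change at adopting facilities minus the change over the same period at non-adopting ones. This is our illustration of the idea, not necessarily the analysis used in the study, and all counts below are hypothetical.

```python
def rate_per_1000(events: int, exams: int) -> float:
    """Events per 1,000 screening exams."""
    return 1000 * events / exams


def difference_in_differences(adopter_before: float, adopter_after: float,
                              comparison_before: float, comparison_after: float) -> float:
    """Change at CAD-adopting facilities minus the change over the same period
    at facilities that never adopted CAD (the secular trend)."""
    return (adopter_after - adopter_before) - (comparison_after - comparison_before)


# Hypothetical recall counts among cancer-free women (false positives).
adopters_before = rate_per_1000(events=1_000, exams=10_000)       # 100 per 1,000
adopters_after = rate_per_1000(events=1_200, exams=10_000)        # 120 per 1,000
non_adopters_before = rate_per_1000(events=9_500, exams=100_000)  # 95 per 1,000
non_adopters_after = rate_per_1000(events=10_000, exams=100_000)  # 100 per 1,000

effect = difference_in_differences(adopters_before, adopters_after,
                                   non_adopters_before, non_adopters_after)
print(f"change in false positives attributable to CAD (per 1,000 exams): {effect:.1f}")
```

The contrast is only credible if both sets of facilities would have followed parallel trends without CAD, and it does not by itself adjust for the patient and facility factors that alter accuracy.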
Accuracy: US / UK Comparison of Screening Mammography
• Accuracy of mammography seemed higher in the UK. Was this due to the age of women, frequency of screening, or the mix of diagnostic exams?
• Comparison of mammographic accuracy in the US and UK:
UK National Breast Cancer Screening Program
US National Breast and Cervical Cancer Early Detection Program (CDC)
US BCSC (NCI)
• Sample included 5 million women and 26,000 breast cancers
• Compared cancer detection rate, recall rate, and unnecessary biopsy rate after accounting for patient factors that affect accuracy (age, screening cycle, and symptoms)
• Results were dramatic: 2-3 times as many tests and negative biopsies in the US, with no differences in the number or types of cancers detected
Smith-Bindman, JAMA, 2003
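Accounting for patient factors such as age can be illustrated with direct standardization: each country's age-specific rates are averaged using the same standard population weights. This is one common adjustment method, shown for age only; the counts, age bands, and standard population are hypothetical, and the study's actual adjustment method is not specified here.

```python
def directly_standardized_rate(events: dict, exams: dict, standard: dict) -> float:
    """Weighted average of age-specific rates, using a shared standard population.

    events / exams: counts per age group for one country.
    standard: size of each age group in the chosen standard population.
    """
    total = sum(standard.values())
    return sum(standard[age] * events[age] / exams[age] for age in standard) / total


# Hypothetical counts; the two countries deliberately have different age mixes.
standard_population = {"50-59": 60, "60-69": 40}
us_recalls = {"50-59": 280, "60-69": 50}
us_exams = {"50-59": 2_000, "60-69": 500}
uk_recalls = {"50-59": 35, "60-69": 100}
uk_exams = {"50-59": 500, "60-69": 2_000}

us_rate = directly_standardized_rate(us_recalls, us_exams, standard_population)
uk_rate = directly_standardized_rate(uk_recalls, uk_exams, standard_population)
print(f"age-standardized recall rate: US {us_rate:.1%}, UK {uk_rate:.1%}, "
      f"ratio {us_rate / uk_rate:.1f}")
```

In this made-up example, the crude US/UK ratio (about 2.4) shrinks to 2.0 after standardization, because the two samples have different age mixes.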
Other Accuracy Studies Completed Using BCSC Data
• Accuracy of mammography by patient factors: age, race, ethnicity, breast density, family history of cancer, presence of breast implants - to help develop guidelines around who should undergo mammography
• Accuracy of mammography by screening frequency to estimate how often to screen
• Accuracy of mammography by physician factors - such as annual volume - and institutional characteristics - such as whether a facility serves a high proportion of uninsured.
• Development of accuracy guidelines based on standard of care
Accuracy versus Outcomes
• An accurate test is necessary but not sufficient
• Commenting on the recent CAD study, Suzanne Fletcher noted:
"With mammography, we have multiple studies showing what happens to mortality rates if you get screening versus if you don't. With CAD systems [and the other newer tests we are using] we don't have such studies"
• Although not ideal, we have come to accept proxy studies comparing a new test to one already studied, without requiring direct evidence
• Ideally want to conduct outcomes studies of each new test - to see if exposure to the test improves patient outcomes
III c Outcomes Studies of Diagnostic Tests
• RCTs are always desirable, but they need large sample sizes, in part because testing contributes only a small amount to outcomes
• To determine whether lung cancer screening with CT reduces lung cancer-specific mortality, 25,000 high-risk patients were included in an ongoing NCI-funded study, powered to detect a 50% or greater reduction in mortality
• Can observational (and existing secondary) data be used to assess the impact and outcomes associated with the use of tests?
Bach et al., CT screening and lung cancer outcomes, JAMA 2007
Henschke et al., Survival of patients with stage I lung cancer detected on CT, NEJM 2006
• Iezzoni, Risk Adjustment, chapter 5, focuses on using observational data to get at effectiveness
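Why RCTs of testing need large samples can be made concrete with the standard normal-approximation sample size formula for comparing two proportions (two-sided test). The baseline mortality (2% over follow-up), the power, and the effect sizes below are hypothetical and are not the design parameters of the trial mentioned above.

```python
import math
from statistics import NormalDist


def n_per_arm(p_control: float, relative_reduction: float,
              alpha: float = 0.05, power: float = 0.80) -> int:
    """Per-arm sample size to compare two proportions (two-sided test,
    normal approximation)."""
    p_screened = p_control * (1 - relative_reduction)
    p_bar = (p_control + p_screened) / 2
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_beta = NormalDist().inv_cdf(power)
    numerator = (z_alpha * math.sqrt(2 * p_bar * (1 - p_bar))
                 + z_beta * math.sqrt(p_control * (1 - p_control)
                                      + p_screened * (1 - p_screened))) ** 2
    return math.ceil(numerator / (p_control - p_screened) ** 2)


# Hypothetical: 2% of the control arm dies of lung cancer during follow-up.
for reduction in (0.50, 0.20):
    print(f"{reduction:.0%} mortality reduction: "
          f"{n_per_arm(0.02, reduction):,} participants per arm")
```

The required size grows rapidly as the expected benefit of testing shrinks, which is why screening trials are so large.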
Observational Studies of Diagnostic Tests to Assess Outcomes
Ideal Data Base
Large, population based
Information on exposure to tests, and on outcomes, among everyone
Sufficient information on clinical characteristics / co-morbidity
Clinical disease registries often have very detailed clinical/ outcome data
SEER-Medicare: large population-based database with details of physician services (including tests and MD visits, from Medicare); complete ascertainment of cancer cases (through SEER); and co-morbidity that can be derived using ICD-9 codes
Breast Cancer Surveillance Consortium (BCSC): clinical mammography database, with detailed information about several breast cancer-specific covariates and cancer outcomes
Observational Data To Address Outcomes: Breast Cancer and the Benefit of Mammography
Three outcomes studies illustrated
I. Is mammographic screening beneficial in the elderly?
II. Are the persistent racial and ethnic differences in breast cancer mortality due to screening?
III. How much of racial and ethnic outcome differences are due to screening, treatment, biology, and SES?
Outcomes i) : Is Mammography Effective In Elderly Women?
• Ideal study RCT, mortality endpoint
• Aside from issues of cost, there are ethical concerns: mammography is widely available and covered by Medicare
• Can an observational study get at effectiveness
• NCI / CMS Collaboration
• Links SEER population based cancer registries (100% cases) with complete Medicare data
• Provides a 5% random sample of patients from the same areas
• Using these data, you can create a population-based cohort, including all cases of a particular type and a sample that can be used to approximate all controls
For example: request data for a 100% sample of breast cancer cases
Request a 5% random sample of controls
• The data set is large and requires programming expertise, but it is a rich data set for studies on utilization, quality and outcomes of cancer care
SEER - Medicare Data Set
Combine cancers (cases) and 5% controls to recreate population
Determine which patients underwent screening
Assess if women who underwent regular screening had fewer breast cancer deaths (or advanced cancer) than women who did not
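The three steps above can be sketched as a weighting exercise: because cases are a 100% sample and controls a 5% sample, each sampled control stands in for 20 women when rebuilding population denominators. This is a minimal sketch with hypothetical counts; it assumes screening status is ascertained the same way (from Medicare claims) for cases and controls, and the function name and record layout are illustrative, not a SEER-Medicare file format.

```python
from collections import defaultdict

# Cases come from a 100% sample; controls from a 5% random sample,
# so each sampled control stands in for 1 / 0.05 = 20 women.
CASE_WEIGHT = 1
CONTROL_WEIGHT = 20


def advanced_cancer_rate_per_1000(records):
    """records: iterable of (is_case, screened, advanced_stage, count) tuples.

    Returns advanced-stage breast cancers per 1,000 women in the reconstructed
    population, separately for screened (True) and unscreened (False) women.
    """
    population = defaultdict(float)
    advanced = defaultdict(float)
    for is_case, screened, advanced_stage, count in records:
        weight = CASE_WEIGHT if is_case else CONTROL_WEIGHT
        population[screened] += count * weight
        if is_case and advanced_stage:
            advanced[screened] += count
    return {screened: 1000 * advanced[screened] / population[screened]
            for screened in population}


# Hypothetical counts (is_case, screened, advanced_stage, count):
records = [
    (True, True, True, 20), (True, True, False, 80), (False, True, False, 995),
    (True, False, True, 40), (True, False, False, 60), (False, False, False, 995),
]
rates = advanced_cancer_rate_per_1000(records)
print(f"screened: {rates[True]:.1f}, unscreened: {rates[False]:.1f} per 1,000 women")
```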
Creating SEER - Medicare Data Set
[Figure: No Screening vs Screening groups]
You will see more early-stage cancer in the screened group
You need to assess the rate of late-stage disease in the screened group
A stage shift toward less advanced disease suggests benefit
Creating SEER - Medicare Data Set
Smith-Bindman, American Journal of Medicine, 1999
Analysis of SEER-Medicare Data
[Figure: Relative risk of breast cancer by stage among women who underwent screening vs those who didn't]
Smith-Bindman, American Journal of Medicine, 1999
Analysis of SEER-Medicare Data
[Figure: Absolute and relative risk of breast cancer by stage among women who underwent screening vs those who didn't]
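Absolute and relative risks by stage can be computed from counts in the screened and unscreened groups. The sketch below uses hypothetical counts that show the expected pattern (more early-stage and fewer late-stage cancers with screening) and a Katz log-scale 95% interval for the relative risk. When denominators are reconstructed from a weighted 5% sample, as in SEER-Medicare, this interval understates the uncertainty; it is shown only to make the arithmetic explicit.

```python
import math


def risk_comparison(events_screened: int, n_screened: float,
                    events_unscreened: int, n_unscreened: float, z: float = 1.96):
    """Absolute risk difference (per 1,000) and relative risk, screened vs not,
    with a log-scale (Katz) 95% confidence interval for the relative risk."""
    risk_screened = events_screened / n_screened
    risk_unscreened = events_unscreened / n_unscreened
    rr = risk_screened / risk_unscreened
    se_log_rr = math.sqrt(1 / events_screened - 1 / n_screened
                          + 1 / events_unscreened - 1 / n_unscreened)
    return {
        "risk_difference_per_1000": 1000 * (risk_screened - risk_unscreened),
        "relative_risk": rr,
        "ci95": (rr * math.exp(-z * se_log_rr), rr * math.exp(z * se_log_rr)),
    }


# Hypothetical counts per stage among 20,000 screened and 20,000 unscreened women.
by_stage = {
    "early stage": risk_comparison(120, 20_000, 80, 20_000),
    "late stage": risk_comparison(20, 20_000, 40, 20_000),
}
for stage, result in by_stage.items():
    low, high = result["ci95"]
    print(f"{stage}: RD {result['risk_difference_per_1000']:+.1f}/1,000, "
          f"RR {result['relative_risk']:.2f} (95% CI {low:.2f}-{high:.2f})")
```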
Outcomes ii) Cancer Differences by Race Ethnicity
• There are persistent racial/ethnic differences in cancer mortality rates
African American (AA) women have fewer cases of breast cancer, but more deaths
• Decline in cancer mortality largely seen among whites
• If mammography use is the same among all women, yet breast cancer mortality rates have not declined among minority women, this might suggest that mammography is not the reason for the improvements in breast cancer mortality
How To Reconcile Similar Utilization of Mammography and Cancer Differences by Race/Ethnicity
• Difficult to disentangle issues of biology/screening/treatment
• There may be persistent differences in the use of mammography that could explain outcome differences
• To answer this question, our strategy was to:
a) Assess differences in tumor characteristics by R/E
b) Assess if use of screening predicts characteristics
c) Assess differences in the use of screening by R/E
d) See if differences in tumor characteristics decreased after stratifying by screening
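Step d) can be illustrated with a Mantel-Haenszel relative risk, which pools within-stratum comparisons across screening-interval strata. This is one standard way to stratify, chosen for illustration; the study itself describes stratifying by screening and adjusting for risk factors. The counts are hypothetical and constructed so that rates are equal within each stratum while African American women are concentrated in the less frequently screened stratum.

```python
def crude_rr(strata):
    """Relative risk ignoring strata. Each stratum is
    (events_exposed, n_exposed, events_unexposed, n_unexposed)."""
    a = sum(s[0] for s in strata)
    n1 = sum(s[1] for s in strata)
    c = sum(s[2] for s in strata)
    n0 = sum(s[3] for s in strata)
    return (a / n1) / (c / n0)


def mantel_haenszel_rr(strata):
    """Mantel-Haenszel relative risk pooled across strata."""
    numerator = sum(a * n0 / (n1 + n0) for a, n1, c, n0 in strata)
    denominator = sum(c * n1 / (n1 + n0) for a, n1, c, n0 in strata)
    return numerator / denominator


# Hypothetical advanced-cancer counts, African American (exposed) vs white women.
strata = [
    (2, 1_000, 18, 9_000),  # annual screening: 2 per 1,000 in both groups
    (6, 1_000, 6, 1_000),   # 4+ years between screens: 6 per 1,000 in both groups
]
print(f"crude RR: {crude_rr(strata):.2f}")
print(f"Mantel-Haenszel RR (screening-adjusted): {mantel_haenszel_rr(strata):.2f}")
```

Here the crude relative risk is about 1.7, but it falls to 1.0 once screening frequency is accounted for: the whole crude difference is explained by differences in screening.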
Race and Outcomes using BCSC: Study Design
• Retrospective cohort: 1995-2002
• Mammography assessed using medical records and patient surveys
• All women who ever had a mammogram were included, including women who only had diagnostic mammogram
• Women characterized by their type and frequency of screening
• 1 million racially diverse women age 40-85; 17,000+ cancers
a. Tumor Characteristics at Diagnosis: Tumors Worse Among AA
Relative Risk of Advanced Disease by Race/Ethnicity

                    Large (>15 mm)   Advanced Stage   High Grade   Lymph Node +
White               ref              ref              ref          ref
African American    1.2              1.3              1.4          1.3
Hispanic            0.91             0.92             0.92         0.91
Native American     0.62             0.61             0.80         0.64
Asian               0.73             0.72             0.89         0.62
b. Tumor Characteristics by Screening: More Frequent Screening is Better
Relative Risk of Advanced Disease by Screening

                    Large (>15 mm)   Advanced Stage   High Grade   Lymph Node +
1 year              ref              ref              ref          ref
2 years             1.1              1.1              1.1          1.0
3 years             1.5              1.5              1.3          1.4
4 years             2.6              2.6              2.2          2.5
c. Differences in Screening by Race / Ethnicity: Minorities Screened Less Frequently
History of Mammography 5 Years Prior to Diagnosis

                             Never   Inadequate
White               74%      6%      18%
African American    57%      16%     34% (RR 1.6)
Hispanic            66%      8%      24% (RR 1.5)
Asian               70%      8%      19% (RR 1.4)
d. After Accounting for Screening, Little Black/White Difference: Size
[Figure: large breast cancer rate (cancers per 1,000 mammograms, axis 0-7) by screening interval: 1 yr, 2 yrs, 3 yrs, 4+ yrs, First]
d. After Accounting for Screening, Little Black/White Difference: Stage
[Figure: advanced-stage breast cancer rate (cancers per 1,000 mammograms, axis 0-7) by screening interval: 1 yr, 2 yrs, 3 yrs, 4+ yrs, First]
Outcomes ii) Summary
• Persistent differences in the use of mammography by race
• Mammography is at least in part causal for differences in cancer characteristics by race/ethnicity
Methodology:
Assessed screening in the cohort
Assessed outcomes in the cohort (advanced disease)
Assessed the association between screening and outcome after stratifying by the most important predictor (screening frequency) and adjusting for patient risk factors
Lessons to Be Learned: Evaluating Outcomes Regarding Diagnostic Effectiveness Using Observational Data
• 1. It is important to assess exposure to the test in a well-defined cohort; the cohort cannot be limited to those in whom the test was abnormal, or in whom the test led to a referral or to a diagnosis
• 2. It is important to evaluate the outcome in the entire cohort, not just those who had the test. If the test had no impact on outcomes, you should see the "bad outcomes" equally in those who were and were not exposed
• 3. If exposure and outcomes are comprehensively assessed, you need to make sure you can assess "risk factors" or "confounding" factors that might erroneously lead you to conclude there is an association between exposure and outcome (i.e., if those who were destined to have good outcomes were also more likely to be exposed)
• 4. If you have exposure, outcomes, and important confounders, you can assess the effectiveness of exposure
Outcomes iii) Factors That Contribute to Breast Cancer Outcomes
Differences in breast cancer screening, breast cancer treatment, biology and SES exist by race and ethnicity
How much each of these factors contributes to the differences in breast cancer survival and mortality is not fully known, as it has been difficult to assess all of them simultaneously in a single database
To complete this analysis, we used SEER-Medicare data and included measures of screening, treatment, biology, co-morbidity, and demographics
Variables Included in SEER-Medicare Survival Analysis
• SEER site
• Age (SEER)
• Race and ethnicity (SEER)
• Mammography screening: pattern during the preceding 3 years (Medicare)
• Tumor characteristics at diagnosis (SEER)
• Biological measures (SEER): estrogen receptor, histological grade
• Breast cancer treatment (SEER and Medicare); Haggstrom, Cancer 2005
• Co-morbidity (Medicare): Charlson co-morbidity index derived from Medicare (many different measures of co-morbidity can be derived)
• Socioeconomic factors (Medicare): median income of the zip code of residence; community size (rural, less rural, urban, metropolitan, big city)
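Deriving a Charlson-type co-morbidity score from claims amounts to mapping diagnosis codes to conditions and summing condition weights. The sketch below is deliberately incomplete: it covers only four conditions from a Deyo-style ICD-9-CM adaptation (myocardial infarction, heart failure, and two diabetes categories), and the table name `CHARLSON_SUBSET` and the function are hypothetical. A real analysis should use a complete, validated mapping and a defined claims look-back window.

```python
# Illustrative subset of a Deyo-style ICD-9-CM mapping for the Charlson index.
# Condition -> (Charlson weight, ICD-9 code prefixes without the decimal point).
CHARLSON_SUBSET = {
    "myocardial_infarction": (1, ("410", "412")),
    "congestive_heart_failure": (1, ("428",)),
    "diabetes_uncomplicated": (1, ("2500", "2501", "2502", "2503", "2507")),
    "diabetes_with_complications": (2, ("2504", "2505", "2506")),
}


def charlson_subset_score(icd9_codes):
    """Sum Charlson weights for conditions found in a list of ICD-9 codes.

    Each condition counts once; if both diabetes categories are present,
    only the more severe one is scored.
    """
    codes = [code.replace(".", "") for code in icd9_codes]
    present = {
        name
        for name, (_, prefixes) in CHARLSON_SUBSET.items()
        if any(code.startswith(prefixes) for code in codes)
    }
    if "diabetes_with_complications" in present:
        present.discard("diabetes_uncomplicated")
    return sum(CHARLSON_SUBSET[name][0] for name in present)


# Hypothetical claims for one woman: MI, heart failure, and diabetes coded
# both with and without complications.
print(charlson_subset_score(["410.71", "428.0", "250.00", "250.40"]))
```

Codes outside the mapping, such as 401.9 (hypertension, not a Charlson condition), contribute nothing.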
Statistical Analysis
• A Cox proportional hazards model was used to model time from breast cancer diagnosis to cancer-specific death among all women with breast cancer, and stratified by stage at diagnosis (0/I, II/III, IV)
• The base model controlled for patient age and SEER site
• Additional variables were added to build increasingly adjusted models
• We included interactions between screening / age, screening / stage, and stage / radiation
• The order in which variables are added determines how large an effect is attributed to each variable in the intermediate models, but in the fully adjusted model, the order no longer matters
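What a Cox model estimates can be shown in stripped-down form: a single binary covariate (for example, a racial/ethnic group vs the reference group), the Breslow approximation for tied death times, and Newton-Raphson maximization of the partial likelihood. The multivariable models, interactions, and stage stratification described above would in practice be fit with statistical software; the data and function names here are hypothetical.

```python
import math


def cox_score_information(beta, times, events, x):
    """Score and observed information of the Cox partial log-likelihood
    for a single covariate, using the Breslow approximation for ties."""
    score = 0.0
    information = 0.0
    death_times = sorted({t for t, died in zip(times, events) if died})
    for t in death_times:
        at_risk = [i for i, ti in enumerate(times) if ti >= t]
        deaths = [i for i, ti in enumerate(times) if ti == t and events[i]]
        weights = [math.exp(beta * x[i]) for i in at_risk]
        s0 = sum(weights)
        s1 = sum(w * x[i] for w, i in zip(weights, at_risk))
        s2 = sum(w * x[i] ** 2 for w, i in zip(weights, at_risk))
        mean_x = s1 / s0  # risk-weighted mean covariate in the risk set
        score += sum(x[i] for i in deaths) - len(deaths) * mean_x
        information += len(deaths) * (s2 / s0 - mean_x ** 2)
    return score, information


def fit_cox_one_covariate(times, events, x, max_iter=50, tol=1e-10):
    """Newton-Raphson maximization; returns the log hazard ratio."""
    beta = 0.0
    for _ in range(max_iter):
        score, information = cox_score_information(beta, times, events, x)
        if information == 0:
            raise ValueError("covariate does not vary within any risk set")
        step = score / information
        beta += step
        if abs(step) < tol:
            break
    return beta


# Hypothetical data: years from diagnosis to cancer-specific death or censoring.
times = [2, 3, 5, 6, 8, 9, 10, 12]
events = [1, 1, 1, 1, 1, 1, 0, 1]  # 0 = censored (alive or died of other causes)
x = [1, 1, 0, 1, 0, 1, 0, 0]       # 1 = group of interest, 0 = reference group

log_hr = fit_cox_one_covariate(times, events, x)
print(f"hazard ratio: {math.exp(log_hr):.2f}")
```

Adding covariates to a model like this one changes the estimated hazard ratio for the group of interest, which is what the sequentially adjusted models track.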
[Figure: Relative hazard of cancer-specific death compared with white women, for African American, Hispanic, and Asian/PI women, across sequentially adjusted models: base model, screening, tumor severity, biology, Rx, co-morbidity, SES]
Conclusion
Outcomes studies focused on diagnostic testing are important
Given how much money we spend on testing, it is important to assess its value
There are several methodological biases that need to be taken into account
There are many opportunities to use existing data to assess the utilization, accuracy and outcomes associated with diagnostic testing
To account for these biases in studies of testing, you sometimes need to think about unusual study designs
Thank you