
Outcomes Research and Diagnostic Testing

Challenges and Approaches

Rebecca Smith-Bindman, MD

Associate Professor, Radiology, Epidemiology and Biostatistics

Obstetrics, Gynecology and Reproductive Sciences

University of California, San Francisco

Outline

I Goals of Outcomes Research and Diagnostic Testing (Dx)

Similarities / differences compared with treatment studies

II Unique Challenges

II a: Diagnostic testing leads to an increase in the number of cases

II b: Methods of avoiding bias

III Examples of studies focused on mammographic screening

III a Utilization

III b Accuracy

III c Outcomes

I Goals of Outcomes Research

•Goals are the same as for studies assessing treatment

•IOM domains of quality all apply

Safety (Ex: amniocentesis vs CVS)

Timeliness (Ex: timing between screening and diagnosis)

Equity (Ex: testing by race)

Efficiency (Ex: procedures per diagnosis)

Patient centered (Ex: patient satisfaction with imaging)

Effectiveness / outcomes (Ex: does the use of a particular test improve outcomes?)

Accuracy, unique to testing: does the test find disease with reasonable sensitivity and false positive rate?

•Donabedian model of quality applies

Structure (Ex: MRI scanners in a community)

Process (Ex: mammography as a HEDIS measure)

Outcomes (Ex: improved patient health due to use of testing)

II Unique Challenges in Measuring Outcomes and Testing

•Early diagnosis of disease is a small determinant of outcomes

•An accurate test is necessary, but not sufficient, evidence of value

•Need to adjust for other factors when evaluating outcomes

• Underlying health and co-morbidity (screening vs diagnostic tests)

• Quality of treatment after diagnosis

• Potential biases introduced through the process of diagnosis

Biases Introduced Through Testing: Overdiagnosis and Lead-Time Biases

•There is a broad spectrum of abnormalities for every disease

•Not all patients labeled with a disease would have become symptomatic (e.g., prostate cancer)

•The more patients tested, the more patients labeled (labeling may exceed true disease)

•Evaluating outcomes among this larger pool of patients will produce biased results: it is easy to “cure” patients, and to report very good outcomes, when the patients don’t really have the disease.

Study of Treatment

[Figure: Symptom, then treatment (Rx), then a symptom-free survival period in which outcomes are assessed after adjusting for co-morbidity and severity of disease.]

By evaluating a test in asymptomatic patients, you will label patients who would never have become symptomatic and so never would have been diagnosed.

These patients will have better outcomes.

[Figure: Study of Diagnosis. Testing identifies an over-diagnosed patient.]

The time between diagnosis (Dx) and when symptoms would have occurred in the absence of screening is defined as the LEAD time.

You don’t know when symptoms would have developed.

[Figure: Study of Diagnosis. Testing precedes the symptom; the interval between them is the lead time.]
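To make lead-time bias concrete, here is a minimal sketch using one hypothetical patient (the years are invented for illustration). Screening moves the date of diagnosis earlier, but in this sketch the date of death does not change, so "survival from diagnosis" grows by exactly the lead time even though the patient gains nothing.

```python
from dataclasses import dataclass


@dataclass
class Patient:
    """Hypothetical patient whose course of disease is NOT changed by screening."""
    symptom_onset_year: float  # when symptoms would appear without screening
    death_year: float          # same with or without screening in this sketch


def survival_from_diagnosis(patient: Patient, lead_time_years: float) -> float:
    """Years from diagnosis to death when diagnosis is moved earlier by the lead time."""
    diagnosis_year = patient.symptom_onset_year - lead_time_years
    return patient.death_year - diagnosis_year


patient = Patient(symptom_onset_year=10.0, death_year=13.0)
without_screening = survival_from_diagnosis(patient, lead_time_years=0.0)
with_screening = survival_from_diagnosis(patient, lead_time_years=2.0)
print(without_screening, with_screening)  # 3.0 5.0
```

Because the true lead time is unknown for any real patient, this apparent gain cannot simply be subtracted out, which is why survival measured from diagnosis is an unreliable outcome for screening studies.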

Question: Would the use of a (screening) test in this population have led to earlier diagnosis and improved outcomes for those patients who were going to be diagnosed with disease (e.g., prostate cancer)?

[Figure: The population and the tested population, each made up of healthy patients and sick patients.]

It is not possible to evaluate survival of those “diagnosed”, as this population of patients is very different from the group of patients you originally hoped to study.

Over-Diagnosis and Lead-Time Biases

• Often acknowledged, then dismissed or assumed to be accounted for

•These biases are impossible to account for in any reliable fashion

It is impossible to know how deep the reservoir of disease is

It is impossible to quantify the lead time

•Studies that evaluate outcomes in a screened population are flawed! Example: Henschke, NEJM 2006: CT for lung cancer screening

II b Methods to Avoid Biases When Assessing Outcomes

• Assess outcomes that will come to attention even without screening

Death, advanced-stage cancer, critical-stage disease

• Do not consider the diagnosis of early disease itself to be beneficial; rather, a decrease in late-stage disease is the benefit

• Assess outcomes in the ENTIRE population: those who did and did not undergo the test

• For example, what is the rate of late-stage disease in the screened population?

[Figure: Not Tested group vs Tested group.]

Assess the test in the entire population.

Assess the rate of bad outcomes (red dots) in each group.

Among diagnosed patients, the percentage with bad outcomes will be higher in the Not Tested group, 3/6 (50%), than in the Tested group, 3/15 (20%), because testing over-diagnoses disease.

However, the rate of bad outcomes across the whole group will be lower in the tested group only if the test is beneficial. Here there are 3 adverse outcomes in each group, so there is no benefit.
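The arithmetic behind the figure can be sketched as below. The counts 3/6 and 3/15 come from the slide; the group size of 100 is a hypothetical value, assuming (as "3 adverse outcomes per group" implies) that both groups are the same size.

```python
def rate(events: int, denominator: int) -> float:
    """Proportion of the denominator that experienced the event."""
    return events / denominator


# Counts from the figure: 3 bad outcomes in each group.
bad_outcomes = {"not_tested": 3, "tested": 3}
diagnosed = {"not_tested": 6, "tested": 15}  # testing labels more people
# Hypothetical: both groups are assumed to contain the same number of people.
group_size = 100

for group in ("not_tested", "tested"):
    among_diagnosed = rate(bad_outcomes[group], diagnosed[group])
    in_population = rate(bad_outcomes[group], group_size)
    print(f"{group}: {among_diagnosed:.0%} of diagnosed, {in_population:.0%} of population")
```

The denominator is the design choice that matters: dividing by the number diagnosed rewards over-diagnosis, while dividing by everyone in the group shows no difference.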

III Example of Diagnostic Testing Studies: Mammography

• The most studied diagnostic test

• Studied in 7 RCTs and thousands of observational studies

• We will use mammography to highlight different approaches to using observational data to assess:

III a Utilization

III b Accuracy

III c Outcomes

III a Utilization

• Utilization is relatively easy to assess: for example, by age, race, SES

• Important to distinguish a screening test from a diagnostic test

• Interesting ONLY if receipt of the test is used as a measure of quality

HEDIS measures include receipt of mammography

• Utilization cannot be used to assess whether a test is valuable

• Studies of utilization are interesting once effectiveness is established

Utilization: Are There Questions To Be Answered with These or Similar Data?

•Diagnostic testing of all types is increasing rapidly

•For radiology tests: CT use has increased 10% per year for the last decade, and MRI use 15% per year for the last decade

•For cardiology tests: the increase has been even greater

•The most interesting studies would assess the appropriateness of testing or outcomes, but as a first step, it is useful to document current practice and cost
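Annual growth rates compound, so a decade of growth is larger than it sounds. A minimal sketch, assuming the slide's rates held steady every year:

```python
def growth_over(years: int, annual_rate: float) -> float:
    """Multiplicative increase after `years` of compound growth at `annual_rate`."""
    return (1 + annual_rate) ** years


ct_multiplier = growth_over(10, 0.10)   # CT: 10% per year
mri_multiplier = growth_over(10, 0.15)  # MRI: 15% per year
print(f"CT: {ct_multiplier:.2f}x, MRI: {mri_multiplier:.2f}x")
```

Under that assumption, CT use roughly 2.6-folds and MRI use roughly 4-folds over ten years.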

III b Accuracy

• Which of two tests is more discriminatory

• Useful once the general effectiveness of a test has been established

• Potentially misleading: finding more disease is not necessarily better

• On a practical level, it is impossible to do an RCT for each new test; the general assumption is that once one diagnostic test has been studied, new tests can be compared with the earlier proven test
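Comparing how discriminatory two tests are comes down to a 2x2 table of test result against true disease status. The sketch below uses hypothetical counts for two tests in the same women; test B finds more disease but also produces more false positives, which is why "finding more disease" is not automatically better.

```python
from dataclasses import dataclass


@dataclass
class TwoByTwo:
    """Counts of test results against the true disease status."""
    true_pos: int   # test positive, disease present
    false_neg: int  # test negative, disease present
    false_pos: int  # test positive, disease absent
    true_neg: int   # test negative, disease absent

    @property
    def sensitivity(self) -> float:
        return self.true_pos / (self.true_pos + self.false_neg)

    @property
    def false_positive_rate(self) -> float:
        return self.false_pos / (self.false_pos + self.true_neg)


# Hypothetical results for two tests in the same 10,000 women (50 with disease).
test_a = TwoByTwo(true_pos=40, false_neg=10, false_pos=500, true_neg=9450)
test_b = TwoByTwo(true_pos=45, false_neg=5, false_pos=800, true_neg=9150)

for name, t in (("A", test_a), ("B", test_b)):
    print(f"Test {name}: sensitivity {t.sensitivity:.0%}, "
          f"false positive rate {t.false_positive_rate:.1%}")
```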

Accuracy: Digital vs Film Mammography

•Both use x-rays. Film: image captured on film. Digital: image captured electronically and stored digitally

•Digital images are viewed directly on a computer, and the radiologist can alter contrast and brightness or magnify without additional x-rays

•Digital is more expensive (2-4 times the cost) and is reimbursed at a higher rate

•The Digital Mammographic Imaging Screening Trial (DMIST), funded by the NCI, was designed to determine the accuracy of digital versus film mammography

Pisano, NEJM, October 2005

• 49,500 women, 33 sites across the U.S., 2001-2002

• All women had both digital and film mammography

• The two exams for each woman were interpreted by different physicians

• Digital mammography performed better in several groups, with higher sensitivity and no change in false positive rate:

Sensitivity | Digital | Film

Women < age 50 | 78% | 51%

Women with dense breasts | 70% | 55%

Pre- and peri-menopausal women | 72% | 51%
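Because every woman received both exams, the two sensitivities come from paired data. One standard way to compare paired proportions is McNemar's test, which uses only the discordant pairs. The counts below are hypothetical, and this is a generic sketch of the technique rather than the analysis DMIST actually reported.

```python
import math
from typing import Tuple


def mcnemar(only_test1: int, only_test2: int) -> Tuple[float, float]:
    """McNemar chi-square (no continuity correction) and two-sided p-value.

    Only discordant pairs enter the test: women positive on one test but not the other.
    """
    stat = (only_test1 - only_test2) ** 2 / (only_test1 + only_test2)
    # Upper tail of a chi-square with 1 df: P(X > stat) = erfc(sqrt(stat / 2)).
    p_value = math.erfc(math.sqrt(stat / 2))
    return stat, p_value


# Hypothetical paired results among 100 women with cancer, each imaged both ways.
both_positive, digital_only, film_only, both_negative = 50, 30, 12, 8
n_cancers = both_positive + digital_only + film_only + both_negative

sens_digital = (both_positive + digital_only) / n_cancers
sens_film = (both_positive + film_only) / n_cancers
stat, p_value = mcnemar(digital_only, film_only)
print(f"sensitivity {sens_digital:.0%} vs {sens_film:.0%}; chi2 = {stat:.2f}, p = {p_value:.4f}")
```

Pairing removes between-woman variation (age, breast density) from the comparison, which is a large part of why the within-woman design is typical of accuracy studies.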

DMIST: Design Typical of an Accuracy Study

Can You Assess Accuracy Using Existing Data?

• It is important to assess the accuracy of technology (new and old), ideally before widespread use

• Head-to-head comparisons are extremely expensive: DMIST cost $26 million

• Secondary analysis of existing data is possible

• Evaluate accuracy among those who have undergone the test, using data from HMOs or disease / screening registries

• For breast cancer, the Breast Cancer Surveillance Consortium is an extremely rich database

Disease/Testing Registry

Breast Cancer Surveillance Consortium (BCSC)

• Registry of women who have undergone mammography within 7 regions of the U.S. (San Francisco, Seattle, Vermont, New Hampshire, North Carolina, New Mexico, Colorado)

• Includes: mammographic interpretation (accuracy); cancer risk factors; demographic information; cancer diagnosis and treatment

• The BCSC will share data, or if you work with a BCSC collaborator, they may analyze the data for you (great resource!)

Accuracy Study: Computer Aided Detection (CAD)

• Computer program assists radiologist in interpreting mammogram

• Large numbers of lesions are flagged by the computer, and the radiologist must consider whether they are real

• Mixed results

• It is expensive and can be billed in addition to the mammogram

• Since use of CAD has increased over time, it would be ideal to study its impact in actual practice

Accuracy of CAD Using Observational Data

• 429,345 mammograms obtained from 1998-2002 at 43 centers as part of the BCSC

• 7 of the 43 facilities switched to CAD

• Evaluated accuracy with and without CAD, and before and after CAD

Methodologically, the authors tried to account for the fact that this was not a prospective trial by making multiple comparisons

They adjusted facility accuracy for factors known to alter accuracy

• CAD had worse accuracy: higher false positive rates, higher biopsy rates, and slightly lower detection of invasive cancer

Fenton, NEJM 2007
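Combining "before vs after" with "with vs without CAD" is the idea behind a difference-in-differences comparison: the change at facilities that adopted CAD is compared with the change over the same period at facilities that did not, which nets out secular trends affecting everyone. The rates below are hypothetical, and this is an illustration of the general design, not the adjusted models used in the published study.

```python
def difference_in_differences(
    adopters_before: float,
    adopters_after: float,
    comparison_before: float,
    comparison_after: float,
) -> float:
    """Change at adopting facilities minus the change at comparison facilities."""
    return (adopters_after - adopters_before) - (comparison_after - comparison_before)


# Hypothetical false positive rates (%) before and after the CAD adoption period.
effect = difference_in_differences(
    adopters_before=8.0, adopters_after=10.0,     # facilities that switched to CAD
    comparison_before=9.0, comparison_after=9.5,  # facilities that never used CAD
)
print(f"Change attributable to CAD (assuming parallel trends): {effect:+.1f} points")
```

The key assumption is that, without CAD, adopting and non-adopting facilities would have changed in parallel.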

Accuracy: US/UK Comparison of Screening Mammography

• Accuracy of mammography seemed higher in the UK. Was this due to the age of the women, frequency of screening, or mix of diagnostic exams?

• Comparison of mammographic accuracy in the US and UK using the UK National Breast Cancer Screening Program, the US National Breast and Cervical Cancer Early Detection Program (CDC), and the US BCSC (NCI)

• Sample included 5 million women and 26,000 breast cancers

• Compared cancer detection rate, recall rate, and unnecessary biopsy rate after accounting for patient factors that affect accuracy (age, screening cycle, and symptoms)

• Results were dramatic: 2-3 times as many tests and negative biopsies in the US, with no differences in the number or types of cancers detected. Smith-Bindman, JAMA, 2003
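Accounting for differences in patient mix can be illustrated with direct standardization: each program's age-specific rates are applied to one shared standard population, so differences in who gets screened no longer drive the comparison. All rates and weights below are hypothetical, and the published analysis may have adjusted differently (it also accounted for screening cycle and symptoms).

```python
from typing import Dict

AgeRates = Dict[str, float]

# Hypothetical recall rates (per 100 screens) by age group in each program.
us_recall = {"50-59": 12.0, "60-69": 10.0, "70+": 8.0}
uk_recall = {"50-59": 6.0, "60-69": 5.0, "70+": 4.0}
# Hypothetical standard population: share of screens in each age group.
standard = {"50-59": 0.5, "60-69": 0.3, "70+": 0.2}


def directly_standardized(rates: AgeRates, weights: AgeRates) -> float:
    """Weighted average of age-specific rates using a shared standard population."""
    total_weight = sum(weights.values())
    return sum(rates[age] * w for age, w in weights.items()) / total_weight


us_std = directly_standardized(us_recall, standard)
uk_std = directly_standardized(uk_recall, standard)
print(f"US {us_std:.1f} vs UK {uk_std:.1f} recalls per 100 screens (ratio {us_std / uk_std:.1f})")
```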

Other Accuracy Studies Completed Using BCSC Data

• Accuracy of mammography by patient factors: age, race, ethnicity, breast density, family history of cancer, presence of breast implants - to help develop guidelines around who should undergo mammography

• Accuracy of mammography by screening frequency to estimate how often to screen

• Accuracy of mammography by physician factors - such as annual volume - and institutional characteristics - such as whether a facility serves a high proportion of uninsured.

• Development of accuracy guidelines based on standard of care


Accuracy versus Outcomes

• An accurate test is necessary but not sufficient

• In commenting on the recent CAD study, Suzanne Fletcher noted:

With mammography, we have multiple studies showing what happens to mortality rates if you get screening versus if you don't. With CAD systems [and the other newer tests we are using] we don’t have such studies

• Although not ideal, we have come to accept proxy studies comparing a new test to one already studied, without requiring direct evidence

• Ideally want to conduct outcomes studies of each new test - to see if exposure to the test improves patient outcomes

III c Outcomes Studies of Diagnostic Tests

• RCTs are always desirable, but they require a large sample size, in part because testing contributes only a small amount to outcomes

• To determine whether lung cancer screening with CT reduces lung cancer-specific mortality, 25,000 high-risk patients were included in an ongoing NCI-funded study, which is powered to detect a 50% or greater reduction in mortality.
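Why such trials need so many participants can be sketched with the usual normal-approximation sample-size formula for comparing two proportions. The 1% baseline mortality is a hypothetical value chosen for illustration, not a figure from the NCI study.

```python
import math
from statistics import NormalDist


def n_per_arm(p_control: float, relative_reduction: float,
              alpha: float = 0.05, power: float = 0.80) -> int:
    """Per-arm sample size to compare two proportions (normal approximation)."""
    p_screened = p_control * (1 - relative_reduction)
    p_bar = (p_control + p_screened) / 2
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_beta = NormalDist().inv_cdf(power)
    numerator = (z_alpha * math.sqrt(2 * p_bar * (1 - p_bar))
                 + z_beta * math.sqrt(p_control * (1 - p_control)
                                      + p_screened * (1 - p_screened))) ** 2
    return math.ceil(numerator / (p_control - p_screened) ** 2)


# Hypothetical 1% lung cancer mortality over follow-up in the unscreened arm.
for reduction in (0.5, 0.2):
    print(f"{reduction:.0%} reduction: {n_per_arm(0.01, reduction):,} per arm")
```

Under these assumptions a 50% reduction needs a few thousand people per arm, while a more modest 20% reduction needs roughly seven times as many.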

• Can observational (and existing secondary) data be used to assess the impact and outcomes associated with the use of tests?

Bach et al., CT screening and lung cancer outcomes, JAMA 2007

Henschke et al., Survival of patients with stage I lung cancer detected on CT, NEJM 2006

• Iezzoni, Risk Adjustment, chapter 5, focuses on using observational data to get at effectiveness

Observational Studies of Diagnostic Tests to Assess Outcomes

Ideal Database

Large, population based

Information on exposure to tests, and outcomes, among everyone

Sufficient information on clinical characteristics / co-morbidity

Clinical disease registries often have very detailed clinical / outcome data

SEER-Medicare: large population-based database with details of physician services (including tests and MD visits, from Medicare); complete ascertainment of cancer cases (through SEER); and co-morbidity that can be derived using ICD-9 codes

Breast Cancer Surveillance Consortium (BCSC): clinical mammography database, with detailed information about several breast cancer specific covariates and cancer outcomes

Observational Data To Address Outcomes: Breast Cancer and The Benefit of Mammography

Three outcomes studies illustrated

I. Is mammographic screening beneficial in the elderly?

II. Are the persistent racial and ethnic differences in breast cancer mortality due to screening?

III. How much of the racial and ethnic differences in outcomes are due to screening, treatment, biology, and SES?

Outcomes i): Is Mammography Effective in Elderly Women?

• Ideal study: RCT with a mortality endpoint

• Aside from issues of cost, there are ethical concerns: mammography is widely available and covered by Medicare

• Can an observational study get at effectiveness?

• NCI / CMS collaboration

• Links SEER population-based cancer registries (100% of cases) with complete Medicare data

• Provides a 5% random sample of patients from the same areas

• Using these data, you can create a population-based cohort, including all cases of a particular type and a sample that can be used to approximate all controls.

For example: request data on a 100% sample of breast cancer cases

Request a 5% random sample of controls

• The data set is large and requires programming expertise, but it is a rich data set for studies on utilization, quality, and outcomes of cancer care

SEER-Medicare Data Set

Combine cancers (cases) and 5% controls to recreate population

Determine which patients underwent screening

Assess if women who underwent regular screening had fewer breast cancer deaths (or advanced cancer) than women who did not
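The three steps above can be sketched as a weighted analysis. Every case is included, while controls are a 5% sample, so each control is given a weight of 20 to stand in for the women it represents. The records are hypothetical, and the sketch assumes for simplicity that the control sample contains no cases.

```python
from dataclasses import dataclass
from typing import Iterable

# 100% of cases are included; controls are a 5% random sample,
# so each control stands in for 1 / 0.05 = 20 women.
CONTROL_WEIGHT = 20


@dataclass
class Woman:
    screened: bool
    is_case: bool
    late_stage: bool = False


def late_stage_per_1000(cohort: Iterable[Woman], screened: bool) -> float:
    """Weighted rate of late-stage cancer per 1,000 women in one screening group."""
    late = total = 0.0
    for w in cohort:
        if w.screened != screened:
            continue
        weight = 1 if w.is_case else CONTROL_WEIGHT
        total += weight
        if w.is_case and w.late_stage:
            late += weight
    return 1000 * late / total


# Hypothetical records: more cancers are found in screened women,
# but fewer of them are late stage.
cohort = (
    [Woman(screened=True, is_case=True, late_stage=i < 5) for i in range(30)]
    + [Woman(screened=True, is_case=False) for _ in range(100)]
    + [Woman(screened=False, is_case=True, late_stage=i < 10) for i in range(20)]
    + [Woman(screened=False, is_case=False) for _ in range(100)]
)
screened_rate = late_stage_per_1000(cohort, screened=True)
unscreened_rate = late_stage_per_1000(cohort, screened=False)
print(f"late stage per 1,000: {screened_rate:.2f} vs {unscreened_rate:.2f}; "
      f"RR {screened_rate / unscreened_rate:.2f}")
```

Note that the outcome is late-stage disease in everyone, not survival among those diagnosed, so the extra early cancers found by screening do not flatter the result.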

Creating the SEER-Medicare Data Set

[Figure: No screening group vs screening group.]

We will see more early-stage cancer in the screened group

We need to assess the rate of late-stage disease in the screened group

A stage shift toward less advanced disease suggests benefit

[Figure: Creating the SEER-Medicare data set. Smith-Bindman, American Journal of Medicine, 1999]

Analysis of SEER-Medicare Data

[Figure: Relative risk of breast cancer by stage among women who underwent screening vs those who didn’t. Smith-Bindman, American Journal of Medicine, 1999]

Analysis of SEER-Medicare Data

[Figure: Absolute and relative risk of breast cancer by stage among women who underwent screening vs those who didn’t]

Outcomes ii): Cancer Differences by Race/Ethnicity

• There are persistent racial/ethnic differences in cancer mortality rates

African American (AA) women have fewer cases of breast cancer, but more deaths

• The decline in cancer mortality has largely been seen among White women

• If mammography use is the same among all women, and yet mortality rates have not declined in all groups, this might suggest that mammography is not the reason for breast cancer improvements

How To Reconcile Similar Utilization of Mammography and Cancer Differences by Race/Ethnicity

• Difficult to disentangle issues of biology / screening / treatment

• There may be persistent differences in the use of mammography that could explain outcome differences

• To answer this question, our strategy was to:

a) Assess differences in tumor characteristics by race/ethnicity

b) Assess whether use of screening predicts tumor characteristics

c) Assess differences in the use of screening by race/ethnicity

d) See if differences in tumor characteristics decreased after stratifying by screening
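Step d) can be illustrated with a Mantel-Haenszel risk ratio, a standard way to pool a comparison across strata (here, screening interval). In the hypothetical counts below the risk of advanced disease is identical for both groups within each stratum, but one group is concentrated in the long-interval stratum, so the crude risk ratio looks elevated while the stratified one does not. The published analysis may have used regression rather than this exact estimator.

```python
from typing import NamedTuple, Sequence


class Stratum(NamedTuple):
    exposed_events: int      # e.g., advanced cancers among African American women
    exposed_total: int
    unexposed_events: int    # e.g., advanced cancers among White women
    unexposed_total: int


def crude_rr(strata: Sequence[Stratum]) -> float:
    """Risk ratio after collapsing all strata together."""
    a = sum(s.exposed_events for s in strata)
    n1 = sum(s.exposed_total for s in strata)
    c = sum(s.unexposed_events for s in strata)
    n0 = sum(s.unexposed_total for s in strata)
    return (a / n1) / (c / n0)


def mantel_haenszel_rr(strata: Sequence[Stratum]) -> float:
    """Mantel-Haenszel pooled risk ratio across strata."""
    num = sum(s.exposed_events * s.unexposed_total / (s.exposed_total + s.unexposed_total)
              for s in strata)
    den = sum(s.unexposed_events * s.exposed_total / (s.exposed_total + s.unexposed_total)
              for s in strata)
    return num / den


# Hypothetical counts stratified by screening interval.
strata = [
    Stratum(2, 200, 10, 1000),  # annual screening: 1% risk in both groups
    Stratum(8, 200, 4, 100),    # 4+ years since last mammogram: 4% risk in both groups
]
print(f"crude RR {crude_rr(strata):.2f}, Mantel-Haenszel RR {mantel_haenszel_rr(strata):.2f}")
```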

Race and Outcomes using BCSC: Study Design

• Retrospective cohort: 1995-2002

• Mammography assessed using medical records, patient survey

• All women who ever had a mammogram were included, including women who only had diagnostic mammogram

• Women characterized by their type and frequency of screening

• 1 million racially diverse women age 40-85; 17,000+ cancers

a. Tumor Characteristics at Diagnosis: Tumors Worse Among AA Women

Relative Risk of Advanced Disease by Race/Ethnicity

Race/ethnicity | Large (>15 mm) | Advanced stage | High grade | Lymph node +

White | ref | ref | ref | ref

African American | 1.2 | 1.3 | 1.4 | 1.3

Hispanic | .91 | .92 | .92 | .91

Native American | .62 | .61 | .80 | .64

Asian | .73 | .72 | .89 | .62

b. Tumor Characteristics by Screening: More Frequent Screening Is Better

Relative Risk of Advanced Disease by Screening

Screening interval | Large (>15 mm) | Advanced stage | High grade | Lymph node +

1 year | referent | referent | referent | referent

2 years | 1.1 | 1.1 | 1.1 | 1.0

3 years | 1.5 | 1.5 | 1.3 | 1.4

4 years | 2.6 | 2.6 | 2.2 | 2.5

c. Differences in Screening by Race/Ethnicity: Minorities Screened Less Frequently

History of Mammography 5 Years Prior to Diagnosis

Never Inadequate

White 74% 6% 18%

African American 57% 16% 34% (RR 1.6)

Hispanic 66% 8% 24% (RR 1.5)

Asian 70% 8% 19% (RR 1.4)

d. After Accounting for Screening, Little Black/White Difference: Size

[Figure: Large breast cancer rate (cancers per 1,000 mammograms) by screening history: 1 yr, 2 yrs, 3 yrs, 4+ yrs, First]

d. After Accounting for Screening, Little Black/White Difference: Stage

[Figure: Advanced-stage breast cancer rate (cancers per 1,000 mammograms) by screening history: 1 yr, 2 yrs, 3 yrs, 4+ yrs, First]

Outcomes ii) Summary

• Persistent differences in the use of mammography by race

• Mammography is at least in part causal for differences in cancer characteristics by race/ethnicity

Methodology: assessed screening in the cohort

Assessed outcomes in the cohort (advanced disease)

Assessed the association between screening and outcome after stratifying by the most important predictor (screening frequency) and adjusting for patient risk factors

Lessons to Be Learned: Evaluating Outcomes Regarding Diagnostic Effectiveness Using Observational Data

• 1. It is important to assess exposure to the test in a well-defined cohort: the cohort cannot be limited to those in whom the test was abnormal, or in whom the test led to a referral or a diagnosis

• 2. It is important to evaluate the outcome in the entire cohort, not just those who had the test. If the test had no impact on outcomes, you should see the “bad outcomes” equally in those who were and were not exposed

• 3. If exposure and outcomes are comprehensively assessed, you need to make sure you can assess “risk factors” or “confounding” factors that might erroneously lead you to conclude there is an association between exposure and outcome (i.e., if those who were destined to have good outcomes were also more likely to be exposed)

• 4. If you have exposure, outcomes, and important confounders, you can assess the effectiveness of exposure

Outcomes iii) Factors That Contribute to Breast Cancer Outcomes

Differences in breast cancer screening, breast cancer treatment, biology, and SES exist by race and ethnicity

How much each of these factors contributes to the differences in breast cancer survival and mortality is not fully known, as it has been difficult to assess them all simultaneously in a single database

To complete this analysis, we used SEER-Medicare data and included measures of screening, treatment, biology, co-morbidity, and demographics

Variables Included in SEER-Medicare Survival Analysis

• SEER site

• Age (SEER)

• Race and ethnicity (SEER)

• Mammography screening, pattern during the preceding 3 years (Medicare)

• Tumor characteristics at diagnosis (SEER)

• Biological measures (SEER): estrogen receptor, histological grade

• Breast cancer treatment (SEER and Medicare): Haggstrom, Cancer 2005

• Co-morbidity (Medicare): Charlson co-morbidity index derived from Medicare (many different measures of co-morbidity can be derived)

• Socioeconomic factors (Medicare): median income of the zip code of residence; community size (rural, less rural, urban, metropolitan, big city)

Statistical Analysis

• A Cox proportional hazards model was used to model time from breast cancer diagnosis to cancer-specific death among all women with breast cancer, and stratified by stage at diagnosis (0/I, II/III, IV)

• The base model controlled for patient age and SEER site.

• Additional variables were added to create increasingly adjusted models

• We included interactions between screening and age, screening and stage, and stage and radiation

• The order in which variables are added determines how large an effect each variable appears to have on the estimated RR, but in the fully adjusted model the order no longer matters
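One common way to summarize sequentially adjusted models is the share of the base model's excess hazard (HR - 1) that disappears as each group of variables is added. The hazard ratios below are hypothetical, not results from this analysis, and because the share attributed to each step depends on the order of entry, only the final row is order-independent.

```python
# Hypothetical hazard ratios (African American vs White women) from
# successively adjusted Cox models; each row adds one group of variables.
sequential_models = [
    ("Base (age, SEER site)", 1.80),
    ("+ Screening", 1.60),
    ("+ Tumor severity", 1.30),
    ("+ Biology", 1.20),
    ("+ Treatment", 1.15),
    ("+ Co-morbidity", 1.12),
    ("+ SES", 1.10),
]


def share_of_excess_explained(hr_base: float, hr_adjusted: float) -> float:
    """Fraction of the base model's excess hazard (HR - 1) removed by adjustment."""
    return (hr_base - hr_adjusted) / (hr_base - 1)


hr_base = sequential_models[0][1]
for label, hr in sequential_models:
    explained = share_of_excess_explained(hr_base, hr)
    print(f"{label:<24} HR {hr:.2f}  excess explained {explained:.0%}")
```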

[Figure: Relative hazard of cancer-specific death compared with White women, for AA, Hispanic, and Asian/PI women, across successively adjusted models: base model, screening, tumor severity, biology, Rx, co-morbidity, SES]

Conclusion

Outcomes studies focused on diagnostic testing are important

Given how much $$$$$ we spend on testing, it is important to assess its value

There are several methodological biases that need to be taken into account

Lots of opportunities to use existing data to assess the utilization, accuracy and outcomes associated with diagnostic testing

When thinking about testing, to account for biases, you sometimes need to think about unusual study designs

Thank you
