
Building Risk Adjustment Models

Andy Auerbach MD MPH

Overview

• Reminder of what we are talking about
• Building your own
• Model discrimination
• Model calibration
• Model validation

Outcomes measurement is an applied form of risk adjustment

• Risk adjustment models typically explain only 20-25% of the variation in health care utilization

• Explaining even this amount of variation can be important if the remaining variation is essentially random

• Example: supports equitable allocation of capitation payments from health plans to providers

Where does ‘risk adjustment’ fit in our model?

• Donabedian A. JAMA 1988;260:1743-8

[Diagram: the Structure → Process → Outcomes framework. Structure: community, delivery system, provider, and population characteristics. Process: health care providers (technical, care, and interpersonal processes) and the public & patients (access, equity, adherence). Outcomes: health status, functional status, satisfaction, mortality, and cost.]

Patient severity of illness is a kind of confounder

[Diagram: the classic confounding triangle. The exposure affects the outcome; the confounding factor (patient severity) is linked to both.]

What risk factors are….

• They are:
– Factors that affect the patient’s risk for the outcome independent of the treatment

• These factors can also be associated with:
– Risks for receiving the treatment (allocation bias; propensity scores)
– Modification of the effect of treatment (interaction terms)

Risk factors are not…

• While there may be some in common, risk factors for an outcome given a health condition are not the same as the risk factors for the condition.
– Hyperlipidemia is a risk factor for MI but not for survival following an MI

Because in the end your analyses will look like this….

Measure = Predictor + confounders + error term

Measure = Predictor + risk of outcome + other confounders + error term
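
A minimal sketch of what such a risk-adjusted analysis looks like in practice, using simulated toy data (all variable names hypothetical) and a logistic model in Python:

    import numpy as np
    import pandas as pd
    import statsmodels.formula.api as smf

    # Toy simulated cohort standing in for real data; names are hypothetical.
    rng = np.random.default_rng(0)
    n = 500
    df = pd.DataFrame({
        "age": rng.normal(65, 10, n),
        "diabetes": rng.integers(0, 2, n),
        "treated": rng.integers(0, 2, n),
    })
    true_logit = -8 + 0.08 * df["age"] + 0.4 * df["diabetes"] - 0.3 * df["treated"]
    df["died"] = (rng.random(n) < 1 / (1 + np.exp(-true_logit))).astype(int)

    # Measure = Predictor + risk of outcome + other confounders + error:
    # the coefficient on 'treated' is the risk-adjusted effect of interest.
    model = smf.logit("died ~ treated + age + diabetes", data=df).fit(disp=0)
    print(model.params)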

Building Your Own Risk-Adjustment Model

• What outcomes?
– Must be clearly defined and frequent enough for modeling

• What population?
– Generic or disease specific

• What time period?
– Single visit/hospitalization, or a disease state that includes multiple observations

• What purpose?
– Implications for how good the model needs to be

Inclusion/Exclusion: Hospital Survival for Pneumonia

• Include
– Primary ICD-9 code 480-487 (viral/bacterial pneumonias)
– Secondary ICD-9 code 480-487 with a primary code of empyema (510), pleurisy (511), pneumothorax (512), lung abscess (513), or respiratory failure (518)

• Exclude
– Age <18 years old
– Admission in prior 10 days
– Other diagnoses of acute trauma
– HIV, cystic fibrosis, tuberculosis, post-operative pneumonia
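
As a sketch (not the study's actual code), these criteria might be applied to a discharge dataset along these lines, assuming hypothetical column names:

    import pandas as pd

    # Hypothetical discharge records; ICD-9 codes stored as strings.
    df = pd.DataFrame({
        "age": [70, 45, 15, 60],
        "primary_dx": ["486", "518", "480", "482"],
        "secondary_dx": ["428", "481", "038", "042"],
    })

    pneumonia = [str(c) for c in range(480, 488)]       # 480-487
    resp_primary = ["510", "511", "512", "513", "518"]  # empyema, pleurisy, ...

    include = (
        df["primary_dx"].str[:3].isin(pneumonia)
        | (df["primary_dx"].str[:3].isin(resp_primary)
           & df["secondary_dx"].str[:3].isin(pneumonia))
    )
    exclude = df["age"] < 18  # plus prior admission, trauma, HIV, CF, TB, post-op
    cohort = df[include & ~exclude]
    print(cohort)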

Episode of Care

• Does the dataset include multiple observations (visits) of the same individual over time?
– Re-hospitalizations
– Hospital transfers

• Can dataset support linking observations (visits) over time?

• Inclusion and exclusion criteria should describe handling of multiple observations

Identifying Risk Factors for Model

• Previous literature

• Expert opinion/consensus

• Data dredging (retrospective)

Reliability of Measurement

• Is the ascertainment and recording of the variable standardized within and across sites?

• Are there audits of the data quality and attempts to correct errors?

Missing Data

• Amount
• Why is it missing? Biased ascertainment?
• Does missing indicate normal or some other value?
• Can missing data be minimized by inclusion/exclusion criteria?
• May want to impute missing values
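
A minimal imputation sketch using scikit-learn; median imputation is shown, but the choice of strategy should follow the questions above:

    import numpy as np
    from sklearn.impute import SimpleImputer

    # Hypothetical lab values (e.g., creatinine, sodium) with missing entries.
    X = np.array([[1.0, 140.0],
                  [np.nan, 135.0],
                  [2.1, np.nan]])

    # Median imputation; whether 'missing' really means 'normal' should be
    # examined before picking a strategy.
    imputer = SimpleImputer(strategy="median")
    X_filled = imputer.fit_transform(X)
    print(X_filled)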

Risk Factors: Which Value With Multiple Measurements?

• First? Worst? Last?

• Consider whether the timing of risk factor data collection accurately reflects the relevant health state; timing can confound quality ratings or increase the number of missing values

• May be able to improve estimate of some risk factors using multiple measures

Co-Morbidity or Complication

• Including complications in risk adjustment models gives providers credit for poor quality care

• True co-morbidities may be dropped from risk adjustment models out of concern that they sometimes represent complications

Vulnerability

• Should models include adjustment for race, gender, or insurance type?

Caveats to risk factors: Gaming

• Situation in which the coding of risk factors is influenced by coder’s knowledge or assumptions regarding how the data will be used to create a performance report or to calculate payment

• The potential for gaming to alter results (e.g., quality comparisons of providers) depends on the degree to which it occurs differentially across providers

Caveats: Co-morbidities and Complications

• In administrative data, preexisting and acute illnesses have been coded without differentiation (e.g., acute vs. preexisting thromboembolism)
– Generally not an issue for chronic diseases
– Linking to earlier records (e.g., previous admissions) can be helpful

• Condition present at admission (CPAA) coding is now a standard part of California hospital discharge data

Risk Factors: Patient Characteristics Not Process of Care

• Processes of care can be indicative of severity
– Use of IABP (intra-aortic balloon pump)

• However treatments also reflect practice style/quality

• Process measures can be explored as possible mediators rather than as risk factors for outcomes

Coronary Artery Disease: Mortality Rates by Race

Elements in Model                Black/White Risk Ratio    Confidence Interval
Race                             1.41                      1.27-1.56
Race + clinical elements*        1.18                      1.05-1.32
Race + clinical elements + Rx    1.08                      0.97-1.20

*Age, coronary anatomy, ejection fraction, CHF, angina, AMI, mitral regurgitation, peripheral vascular disease, coexisting illnesses. Peterson et al, NEJM, 1997.

Building Multivariate Models

• Start with conceptual framework from literature and expert opinion

• Pre-specify statistical significance for retaining variables

Building models: More in depth coursework

• Model selection, checking, and bootstrapping: Epi 208 (mostly), with some in Epi 209

• Selection and evaluation of prediction models, and multiple imputation: Biostat 210

• Regression Methods in Biostatistics (Vittinghoff et al.) has extensive material on model selection and evaluation

Empirical Testing of Risk Factors

• Univariate analyses to perform range checks and to eliminate invalid values and low-frequency factors

• Bivariate analyses to identify insignificant or counterintuitive factors

• Test variables for linear, exponential, u-shaped, or threshold effects
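
One way to test functional form, sketched on toy data: compare a linear term against a model that adds a quadratic term (a spline would serve the same purpose):

    import numpy as np
    import pandas as pd
    import statsmodels.formula.api as smf

    rng = np.random.default_rng(1)
    df = pd.DataFrame({"age": rng.uniform(30, 90, 400)})
    # Toy U-shaped relationship between age and outcome risk.
    risk = 0.002 * (df["age"] - 60) ** 2 - 1.5
    df["died"] = (rng.random(400) < 1 / (1 + np.exp(-risk))).astype(int)

    linear = smf.logit("died ~ age", data=df).fit(disp=0)
    quadratic = smf.logit("died ~ age + I(age**2)", data=df).fit(disp=0)

    # A significant squared term (or a likelihood-ratio test on the two
    # log-likelihoods) flags departure from a purely linear effect.
    print(linear.llf, quadratic.llf)
    print(quadratic.pvalues)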

Building Multivariate Models

• Stepwise addition (or subtraction), monitoring for:
– 20% or more change in a predictor’s parameter estimate
– Statistical significance of individual predictors

• Test for connections between risk and outcome/predictors
– Add interactions between the predictor and risk factors (or between risk factors)
– Stratified analyses
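
A sketch of the 20%-change-in-estimate check on toy data (names hypothetical): fit the model with and without a candidate risk factor and compare the predictor's coefficient:

    import numpy as np
    import pandas as pd
    import statsmodels.formula.api as smf

    rng = np.random.default_rng(2)
    n = 1000
    df = pd.DataFrame({"severity": rng.normal(0, 1, n)})
    # Sicker patients are more likely to be treated (confounding by severity).
    df["treated"] = (rng.random(n) < 1 / (1 + np.exp(-df["severity"]))).astype(int)
    logit = -2 + 1.0 * df["severity"] + 0.3 * df["treated"]
    df["died"] = (rng.random(n) < 1 / (1 + np.exp(-logit))).astype(int)

    crude = smf.logit("died ~ treated", data=df).fit(disp=0)
    adjusted = smf.logit("died ~ treated + severity", data=df).fit(disp=0)

    b0, b1 = crude.params["treated"], adjusted.params["treated"]
    change = abs(b1 - b0) / abs(b0)
    print(f"change in 'treated' estimate: {change:.0%}")  # >=20% -> keep 'severity'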

CABG Registry in NY State: Significant Risk Factors for Hospital Mortality for Coronary Artery Bypass Graft Surgery, 1989-1992

Risk Factors in Large Data Sets: Can you have too much power?

• Large datasets prone to finding statistical significance

• May want to consider whether statistical significance is clinically significant
– May also want to select risk factors based on a clinically relevant prevalence…

• Conversely, consider forcing in clinically important predictors even if not statistically significant

Counterintuitive findings in risk adjustment

• Outcomes of MI treatment
– Hypertension appears protective (decreased risk of mortality)
– Perhaps a surrogate for patients on beta blockers

• If you don’t believe hypertension is truly protective, it is best to drop it from the model

Smaller Models are Preferred

• Rule of thumb: 10-30 observations per covariate (not generally an issue in large datasets)

• Smaller models are more comprehensible

• Less risk of “overfitting” the data

Evaluating Model’s Predictive Power

• Linear regression (continuous outcomes)

• Logistic regression (dichotomous outcomes)

Evaluating Linear Regression Models

• R2 is the percentage of variation in outcomes explained by the model; best for continuous dependent variables
– Length of stay
– Health care costs

• Ranges from 0-100%
• Generally more is better
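
Computing R2 for a linear model, as a minimal sketch on toy length-of-stay data:

    import numpy as np
    import statsmodels.api as sm

    rng = np.random.default_rng(3)
    age = rng.normal(65, 10, 300)
    los = 2 + 0.05 * age + rng.normal(0, 2, 300)  # toy length of stay

    X = sm.add_constant(age)
    fit = sm.OLS(los, X).fit()
    print(f"R^2 = {fit.rsquared:.3f}")  # share of LOS variation explained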

More to Modeling than Numbers

• R2 biased upward by more predictors

• The approach to categorizing outliers can affect R2, since less skewed data is easier to predict and yields a higher R2

• Model subject to random tendencies of particular dataset

Evaluating Logistic Models

• Discrimination - accuracy of predicting outcomes among all individuals depending on their characteristics

• Calibration - how well prediction works across the range of risk

Discrimination

• C index - compares all random pairs of individuals in each outcome group (alive vs dead) to see if risk adjustment model predicts a higher likelihood of death for those who died (concordant)

• Ranges from 0-1 based on proportion of concordant pairs and half of ties
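
The C index for a dichotomous outcome equals the area under the ROC curve, so it can be computed directly; a minimal sketch:

    import numpy as np
    from sklearn.metrics import roc_auc_score

    # Observed outcomes (1 = died) and model-predicted death probabilities.
    died = np.array([0, 0, 1, 0, 1, 1])
    predicted = np.array([0.1, 0.3, 0.4, 0.2, 0.8, 0.7])

    # Probability that a randomly chosen decedent receives a higher predicted
    # risk than a randomly chosen survivor (ties count as half).
    print(roc_auc_score(died, predicted))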

Adequacy of Risk Adjustment Models

• C index of 0.5 is no better than random; 1.0 indicates perfect prediction

• Typical risk adjustment models achieve 0.7-0.8
– 0.5 SDs better than chance results in c statistic = 0.64
– 1.0 SDs better than chance results in c statistic = 0.76
– 1.5 SDs better than chance results in c statistic = 0.86
– 2.0 SDs better than chance results in c statistic = 0.92

Best Model Doesn’t Always Have Biggest C statistic

• Adding health conditions that result from complications will raise c statistic of model but not make the model better for predicting quality.

Spurious Assessment of Model Performance

• Missing values can lead to some patients being dropped from models

• When comparing models, be certain that the same group of patients is used for all models; otherwise comparisons may reflect more than model performance

Calibration - Hosmer-Lemeshow

• The size of the C index does not indicate how well the model performs across the range of risk

• Stratify individuals into groups (e.g., 10 groups) of equal size according to predicted likelihood of the adverse outcome (e.g., death)

• Compare actual vs. expected outcomes for each stratum

• Want a non-significant p value for each stratum and across strata (Hosmer-Lemeshow statistic)
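
Common Python libraries do not ship a Hosmer-Lemeshow test, so a hand-rolled sketch (risk deciles, chi-squared with k-2 degrees of freedom):

    import numpy as np
    import pandas as pd
    from scipy.stats import chi2

    def hosmer_lemeshow(y, p, groups=10):
        """Chi-squared comparing observed vs expected events by risk stratum."""
        df = pd.DataFrame({"y": y, "p": p})
        df["stratum"] = pd.qcut(df["p"], groups, labels=False, duplicates="drop")
        g = df.groupby("stratum").agg(obs=("y", "sum"), exp=("p", "sum"),
                                      n=("y", "size"))
        stat = (((g["obs"] - g["exp"]) ** 2
                 / (g["exp"] * (1 - g["exp"] / g["n"]))).sum())
        k = len(g)
        return stat, chi2.sf(stat, k - 2)  # k strata -> k-2 degrees of freedom

    rng = np.random.default_rng(4)
    p = rng.uniform(0.01, 0.5, 2000)
    y = (rng.random(2000) < p).astype(int)  # well calibrated by construction
    print(hosmer_lemeshow(y, p))            # expect a non-significant p value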

Stratifying by Risk

• Hosmer-Lemeshow provides a summary statistic of how well the model is calibrated

• Also useful to look at how well model performs at extremes (high risk and low risk)

Hosmer-Lemeshow

• For k strata, the chi-squared statistic has k-2 degrees of freedom

• Can obtain a false negative (non-significant p value) by having too few cases in a stratum

Goodness-of-fit tests for AMI mortality models

Individual’s CABG Mortality Risk

• A 65-year-old obese non-white woman with diabetes and a serum creatinine of 1 mg/dl presents with an urgent need for CABG surgery. What is her risk of death?

Calculating Expected Outcomes

• Solve the multivariate model incorporating an individual’s specific characteristics

• For continuous outcomes the predicted values are the expected values

• For dichotomous outcomes the sum of the derived predictor variables produces a “logit”, which can be algebraically converted to a probability:

probability = e^(log odds) / (1 + e^(log odds))

Individual’s Predicted CABG Mortality Risk

• A 65-year-old obese non-white woman with diabetes presents with an urgent need for CABG surgery. What is her risk of death?

• Log odds = -9.74 + 65(0.06) + 1(0.37) + 1(0.16) + 1(0.42) + 1(0.26) + 1(1.15) + 1(0.09) = -3.39

• Probability of death = e^(log odds) / (1 + e^(log odds)) = 0.034/1.034 = 3.3%
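
The same arithmetic in a few lines, using the coefficients as given on the slide:

    import math

    # Intercept plus the patient's characteristics times their coefficients.
    log_odds = -9.74 + 65 * 0.06 + 0.37 + 0.16 + 0.42 + 0.26 + 1.15 + 0.09
    prob = math.exp(log_odds) / (1 + math.exp(log_odds))
    print(f"log odds = {log_odds:.2f}, predicted mortality = {prob:.1%}")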

Observed CABG Mortality Risk

• Actual outcome of whether individual lived or died

• The observed rate for a group is the number of deaths divided by the number of people in that group

Actual and Expected CABG Surgery Mortality Rates by Patient Severity of Illness in New York

[Figure: actual vs. expected mortality rates by severity stratum; chi-squared p = .16]

Validating Model

• Eyeball test
– Face validity/content validity
– Does the empirically derived model correspond to a pre-determined conceptual model?
– If not, is that because of highly correlated predictors? A dataset limitation? A modeling error?

• Internal validation in a split sample
• Test in a different dataset

Internal Validation

• Take advantage of the large size of administrative datasets

• Establish development and validation datasets
– Randomly split samples
– Samples from different time periods/areas
– Determine stability of the model’s predictive power

• Re-estimate model using all available data
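
A split-sample validation sketch on toy data: fit on the development half, check discrimination on the held-out half, then re-estimate on all data:

    import numpy as np
    import pandas as pd
    import statsmodels.formula.api as smf
    from sklearn.metrics import roc_auc_score
    from sklearn.model_selection import train_test_split

    rng = np.random.default_rng(5)
    n = 4000
    df = pd.DataFrame({"age": rng.normal(65, 10, n),
                       "chf": rng.integers(0, 2, n)})
    logit = -7 + 0.07 * df["age"] + 0.8 * df["chf"]
    df["died"] = (rng.random(n) < 1 / (1 + np.exp(-logit))).astype(int)

    dev, val = train_test_split(df, test_size=0.5, random_state=0)
    fit = smf.logit("died ~ age + chf", data=dev).fit(disp=0)

    # Similar C statistics in the two halves suggest the model is not overfit.
    print("development c =", roc_auc_score(dev["died"], fit.predict(dev)))
    print("validation  c =", roc_auc_score(val["died"], fit.predict(val)))

    final = smf.logit("died ~ age + chf", data=df).fit(disp=0)  # all data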

Overfitting Data: Overspecified Model

• Model performs much better in fitted data set than validation data set

• May be due to:
– Infrequent predictors
– Unreliable predictors
– Including variables that do not meet pre-specified statistical significance

Model Performance for Risk Adjustment (R2)

Model                                  Fitting R2    Validating R2
5-level risk adjustment variable       0.134         0.133
5-level plus 10 other predictors       0.260         0.225
5-level plus 40-65 other predictors    0.293         0.195

Validating Model in Other Datasets: Predicting Mortality following CABG

Dataset       STS     NY      VA      Duke    MN
C statistic   .759    .768    .722    .789    .752

Jones et al, JACC, 1996

Recalibrating Risk Adjustment Models

• Necessary when the observed outcome rate differs from the expected rate derived from a different population

• This could reflect quality of care or differences in coding practices

• The assumption is that the relative weights of the predictors to one another are correct

• Recalibration is an adjustment to all predictor coefficients to force average expected outcome rate to equal observed outcome rate

Recalibrating Risk Adjustment Models

• New York AMI mortality rate is 15%
• California AMI mortality rate is 13%
• Is care or coding different?

• To use a New York-derived risk adjustment model to predict expected deaths in California, the predictions need to be adjusted (e.g., multiply by 13/15)
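
A simple recalibration sketch in this spirit: scale the expected risks so the average expected rate matches the observed rate (refitting only the model intercept is a common alternative):

    import numpy as np

    rng = np.random.default_rng(6)
    expected = rng.uniform(0.02, 0.30, 1000)  # NY-derived predicted risks
    observed_rate = 0.13                      # California observed rate

    # Force the mean expected rate to equal the observed rate (13/15 style).
    recalibrated = expected * (observed_rate / expected.mean())
    print(expected.mean(), recalibrated.mean())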

Summary: Risk Adjustment Using Secondary Data

• Requires large datasets

• Risk factors are patient characteristics that predict outcomes, not process of care and not complications

• Multivariate model building should be guided by literature/expert opinion

• The smallest model that performs well is generally best

• Next time we will evaluate model performance

Summary

• Summary statistics provide a means for evaluating the predictive power of multivariate models

• Care should be taken to look beyond summary statistics to ensure that the model is not overspecified and that it conforms to a conceptual model

• Models should be validated with internal and ideally external data

• Next time we will review how risk adjustment models should deal with hierarchical data structures