Adding Community-level Social Determinants of Health Factors to Patient-level Data to Predict Stroke
Rema Padman, PhD; Min Chen, PhD
Carnegie Mellon University, Pittsburgh, PA;
Florida International University, Miami, FL
DISCLAIMER: The views, opinions and images expressed in this presentation are those of the authors and do not necessarily represent official policy or position of HIMSS.
BG5, March 9, 2020
Conflict of Interest
Rema Padman, PhD and Min Chen, PhD
Have no real or apparent conflicts of interest to report.
Collaborators
Xuan Tan, PhD Candidate
Department of Information Systems & Business Analytics, Florida International University
Manjiri Kshirsagar, Jana Macickova, Ashita Vadlamudi, Chi Zhang
Graduate students, Heinz College of Information Systems and Public Policy, Carnegie
Mellon University
Agenda
• Introduction and Motivation
• Study questions
• Datasets and Analysis Cohort Identification
• Social Determinants of Health Factors
• Stroke Prediction Models
• Results and Discussion
• Conclusions
Learning Objectives
• Evaluate the extent to which data available only at admission can be
used to provide a relatively reliable prediction of acute disease
diagnosis, such as stroke, and which type of information is the most
valuable
• Assess the value of adding SDoH to the prediction of disease
incidence and outcomes
• Leverage SDoH data to achieve a more accurate risk assessment, and
ultimately, better performance for healthcare providers and better
outcomes for population health
Introduction
• Stroke or brain attack is one of
the leading causes of disability
and death worldwide [1]
• Estimated cost is $34 billion each
year in the US [2-3]
• Undiagnosed stroke or
misdiagnosed stroke means
delayed treatment or no
treatment at all
https://getaheadofstroke.org/
https://www.stroke.net.nz/stroke-information
Introduction
• Stroke occurs when blood supply to a
certain area in the brain suddenly gets interrupted and brain cells die quickly
due to lack of oxygen [4-5]
• The harmful effects of stroke vary from
person to person and depend on the
affected area in the brain, size of stroke,
age, comorbidities [4-5]
• How quickly the patient receives medical
treatment is most critical for recovery. The
first hour of symptom onset is called “the
golden hour” [6]
• Patients who receive medications or
procedures to restore blood flow within
the first three hours have significantly
higher probability of recovery [6]
Timely and Accurate Diagnosis of Stroke: A Serious Challenge
• Many medical conditions can initially look like stroke; these are known as stroke mimics
• Diagnosis relies on laboratory resources and on time-consuming, expensive imaging, which are not always readily available when a patient is admitted to an emergency care facility [7]
https://www.kob.com/albuquerque-news/be-fast-neurologists-lay-out-guide-to-spotting-stroke-symptoms/5363071/
• Social Determinants of Health (SDoH) have been shown to have an
association with risk of stroke and many other diseases [8-9]
Summary of Previous Studies on Stroke Diagnosis Prediction
• Several studies have attempted to detect potential biomarkers to
distinguish acute ischemic stroke from stroke-mimics [10-12]
• Some studies have reported predicting risk or mortality of stroke using
claims data [13, 15]. Others, such as Teoh (2018), predicted diagnosis
of stroke within one year using electronic health records data in Japan,
reporting an area under the ROC curve of 0.67 [14]
• Few studies have tried to incorporate SDoH information to predict
stroke. Min et al. (2018) derived a model for stroke pre-diagnosis with
potentially modifiable risk factors (including lifestyle factors), and were
able to correctly discriminate between normal subjects and stroke
patients in 65% of the cases [16]
What are Social Determinants of Health (SDoH)?
• Social Determinants of Health (SDoH) include various community and social factors, such as “conditions in which people are born, grow, work, live, and age, and the systems shaping the conditions of daily life” [17]
• Socioeconomic status
• Education
• Occupation
• Transportation
• Health insurance
• Urban/rural residence
• Social support
• Neighborhood factors
• …
https://hitconsultant.net/2019/03/18/social-determinants-of-health-sdoh-collection/#.XkweTihKg2w
Impact of Social Determinants of Health
Social determinants of health have a tremendous effect on an individual’s health regardless of age, race, or ethnicity.
• 20 percent of a person’s health and well-being is related to access to care and quality of services [19-21]
• The physical environment, social determinants, and behavioral factors drive 80 percent of health outcomes [19-21]
• Your ZIP code could matter more than your genetic code
Impact of Social Determinants of Health
• Population health suggests addressing upstream SDoH factors such as access to healthy food and viable transportation options [18]
• There is significant correlation between certain SDoH factors (e.g., neighborhood socioeconomic status indicators) and clinical outcomes, including hospitalizations due to stroke [8-9]
Acoihc.az.gov
• There are few systematic studies assessing the value of SDoH factors in the
prediction of diverse clinical events
• There is a need to explicitly evaluate whether and how social determinants
of health data can contribute to improving patient risk stratification and
prediction
Study Questions:
• Motivated by the current challenges of acute disease prediction,
especially the diagnosis of stroke when a patient presents for
emergency care at the Emergency Department (ED), this study aims to
answer the following questions:
• Can we use data available only at admission to provide a reliable
prediction of stroke diagnosis? Moreover, which type of information is
the most valuable?
• How can we identify an analysis cohort that best represents the
potential stroke population?
• How can we leverage SDoH data to achieve more accurate
prediction and risk assessment?
Datasets
• State Inpatient Databases (SID) from the Agency for Healthcare
Research and Quality (AHRQ) – HCUP Data
• The universe of discharge records from patients admitted across all
Florida community hospitals
• 2012 to 2014
• American Community Survey (ACS) Data
• Data from the US Census Bureau
• Demographic, socioeconomic, and other neighborhood information
about individuals and households at various geographic levels of
aggregation
HCUP Data
• The Healthcare Cost and Utilization Project (HCUP) has been
developed through a Federal-State-Industry partnership sponsored by
the AHRQ
• HCUP maintains both the State Inpatient Databases (SID) and the
State Emergency Department Databases (SEDD)
• SID contains all hospital inpatient discharge records including
information on patients seen in the emergency room and
subsequently admitted to the hospital
• SEDD captures discharge information on all “treat and release” ED
visits
ACS Data vs. Census Data
• Similarities
• Both administered by the U.S. Census Bureau
• Both provide neighborhood level SDoH information
• Differences
• Census is conducted once every 10 years
• ACS provides more up-to-date information about the social and
economic needs of the community
Creating the Analysis Dataset
Records with admitting diagnosis codes related to stroke and mimics:
89,925 (stroke sample) + 55,051 (mimics sample) = 144,976
Ended up as stroke:
66,704 (stroke sample) + 786 (mimics sample) = 67,490
Ended up as non-stroke:
23,221 (stroke sample) + 54,265 (mimics sample) = 77,486
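The tallies above can be checked with a few lines of arithmetic. The sketch below uses only the counts shown on the slide; the dictionary keys and variable names are illustrative and are not taken from the study's code.

```python
# Counts copied from the slide; the dictionary keys are illustrative names.
admitted = {"stroke_sample": 89_925, "mimics_sample": 55_051}
ended_as_stroke = {"stroke_sample": 66_704, "mimics_sample": 786}
ended_as_non_stroke = {"stroke_sample": 23_221, "mimics_sample": 54_265}

total_admitted = sum(admitted.values())
total_stroke = sum(ended_as_stroke.values())
total_non_stroke = sum(ended_as_non_stroke.values())

# Each admitting sample should split exactly into stroke and non-stroke outcomes.
for sample, n in admitted.items():
    assert ended_as_stroke[sample] + ended_as_non_stroke[sample] == n

# Share of each admitting group that ended up with a stroke diagnosis.
stroke_rate = {s: ended_as_stroke[s] / admitted[s] for s in admitted}

print(total_admitted, total_stroke, total_non_stroke)
print({s: round(r, 3) for s, r in stroke_rate.items()})
```

On these counts, roughly 74% of the stroke sample and about 1.4% of the mimics sample ended up as stroke.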
Identifying Stroke Mimics
• We compiled a list of conditions whose initial symptoms resemble those of stroke by consulting physicians, Epocrates, and the medical literature
• We identified stroke mimics in the patient-level data using admitting diagnosis
codes for specific conditions (e.g. hypoglycemia, complicated migraine, seizure)
• Top 5 diagnoses from Stroke Mimics diagnosis codes
DX_CCS1  Description                            Percent of observations
83       Epilepsy; convulsions                  56.8%
50       Diabetes mellitus with complications   6.3%
660      Alcohol-related disorders              5.1%
35       Cancer of brain and nervous system     4.1%
51       Other endocrine disorders              1.9%
Stroke versus Stroke Mimics - Examples
         Admitting Symptoms                                     Principal Diagnosis
Patient  ICD-9 Code  Description                                Clinical Classification Code  Description
A        78039       Convulsions                                83                            Epilepsy, convulsions
B        78039       Convulsions                                131                           Respiratory failure
C        78039       Convulsions                                129                           Aspiration pneumonitis; food/vomitus
D        43491       Cerebral artery occlusion with infarction  109                           Acute cerebrovascular disease
E        431         Intracerebral hemorrhage                   109                           Acute cerebrovascular disease
Available Patient-Level Variables in HCUP
Information available at IP admission:
• Age, gender, race, ZIP code, rural or urban residence, median income of patient’s ZIP code
• Admission time
• Patient point of origin (e.g., home, ER, nursing facility)
• Admitting diagnosis code
• Number of chronic conditions
• Primary expected payer
• Whether it was a weekend admission
• Whether the admission was during night shift
Information available at discharge:
• Discharge time
• Died during hospitalization?
• Physician ID
• Diagnosis Related Group
• Major Diagnostic Category
• Length of stay
• Procedures
• Total hospital charges
Patient ID is used to track a patient’s visits across hospitals over time
HCUP Data Used for Analysis
Information available at ED presentation (extracted from IP admission):
• Age, gender, race, ZIP code, rural or urban residence, median income of patient’s ZIP code
• Patient point of origin (ER)
• Number of chronic conditions
• Primary expected payer
Patient ID is used to track a patient’s visits across hospitals over time
Outcome Measure: Binary indicator variable of stroke versus non-stroke (using Primary Diagnosis Code from IP admission)
SDoH Data for Analysis
431 Variables: household characteristics, relationship & marital status, fertility, educational attainment, veteran status, disability status, residence 1 year ago, place of birth, citizenship status, language spoken at home, ancestry, computer use, employment status, commuting to work, occupation & industry, income, health insurance coverage, poverty status, housing characteristics, vehicles available, house value & expense, Gini index
Subjective selection based on literature review
78 Variables: chosen from a larger set of 4 ACS tables and several hundred variables because they represent social, economic, housing, and demographic characteristics that the literature relates to health status
Data cleaning steps
(i) Removing non-numeric indicators of missing data and replacing them with column averages
(ii) Log10-transforming columns with large dollar values to bring them closer in scale to the other columns, which ranged from 0-100 (percentages)
(iii) Only a negligible portion of the smaller-population ZIP codes had missing column values, so replacing the missing values with the column average was deemed appropriate
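Cleaning steps (i) and (ii) can be sketched in a few lines. This is a minimal illustration using only the Python standard library; the column names, the toy values, and the "-" and "(X)" missing-data markers are assumptions made for the example and are not taken from the study's ACS extract.

```python
import math

# Hypothetical ZIP-level rows; column names and the "-" / "(X)" missing
# markers are assumptions for illustration, not taken from the ACS files.
rows = [
    {"zip": "33101", "pct_poverty": "21.5", "median_home_value": "250000"},
    {"zip": "33102", "pct_poverty": "-", "median_home_value": "400000"},
    {"zip": "33103", "pct_poverty": "10.5", "median_home_value": "(X)"},
]
numeric_cols = ["pct_poverty", "median_home_value"]
dollar_cols = ["median_home_value"]  # columns holding large dollar values


def to_float(value):
    """Return a float, or None when the cell holds a non-numeric marker."""
    try:
        return float(value)
    except (TypeError, ValueError):
        return None


def clean(rows, numeric_cols, dollar_cols):
    parsed = [{**r, **{c: to_float(r[c]) for c in numeric_cols}} for r in rows]
    for col in numeric_cols:
        observed = [r[col] for r in parsed if r[col] is not None]
        col_mean = sum(observed) / len(observed)
        for r in parsed:
            # Step (i): replace missing markers with the column average.
            if r[col] is None:
                r[col] = col_mean
            # Step (ii): log10-transform dollar columns (assumes values > 0).
            if col in dollar_cols:
                r[col] = math.log10(r[col])
    return parsed


cleaned = clean(rows, numeric_cols, dollar_cols)
for r in cleaned:
    print(r)
```

In this sketch the column average is imputed before the log transform, following the order in which the slide lists the steps; whether the study applied them in exactly that order is not stated.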
Flow Chart of Analysis Data Creation
Patient Level Stroke and Mimics Data
Original dataset (2012-2014): 144,976 rows
-> Keep only index admission
De-duplicated dataset: 125,266 rows
-> Keep only ED transferred to IP records
ED to IP dataset: 106,010 rows
-> Remove NAs
No-NA dataset: 101,558 rows (70% of original)
Zip Code Level SDoH Data
Original ACS dataset (2010-2014): 983 rows
-> Join
Patient Level Data + SDoH Data
Diagnosis + ACS dataset: 101,558 rows
-> Remove NAs
No-NA Diagnosis + ACS dataset (input for analytical models): 97,134 rows
Train set (80% of 97,134): 77,707 rows
Test set (20% of 97,134): 19,427 rows
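The flow above can be sketched on toy records as follows. All field names and values are hypothetical, and treating the "index admission" as each patient's earliest record is an assumption made for this illustration; the slides do not spell out the rule. The last lines only check the reported 80/20 split sizes.

```python
# Toy patient-level records; field names are hypothetical, not HCUP's.
visits = [
    {"patient_id": 1, "admit_day": 10, "from_ed": True, "zip": "33101", "age": 70, "stroke": 1},
    {"patient_id": 1, "admit_day": 40, "from_ed": True, "zip": "33101", "age": 70, "stroke": 0},
    {"patient_id": 2, "admit_day": 5, "from_ed": False, "zip": "33102", "age": 55, "stroke": 0},
    {"patient_id": 3, "admit_day": 7, "from_ed": True, "zip": "33102", "age": None, "stroke": 0},
    {"patient_id": 4, "admit_day": 9, "from_ed": True, "zip": "33102", "age": 61, "stroke": 1},
    {"patient_id": 5, "admit_day": 3, "from_ed": True, "zip": "99999", "age": 48, "stroke": 0},
]
# Toy ZIP-level SDoH table keyed by ZIP code.
acs = {"33101": {"pct_poverty": 21.5}, "33102": {"pct_poverty": 12.0}}


def has_na(row):
    """True when any field of the record is missing."""
    return any(v is None for v in row.values())


# 1. Keep only each patient's index admission (assumed here: the earliest).
index_visits = {}
for v in sorted(visits, key=lambda r: r["admit_day"]):
    index_visits.setdefault(v["patient_id"], v)
dedup = list(index_visits.values())

# 2. Keep only records that came through the ED into inpatient care.
ed_to_ip = [v for v in dedup if v["from_ed"]]

# 3. Remove records with missing patient-level values.
no_na = [v for v in ed_to_ip if not has_na(v)]

# 4. Join ZIP-level SDoH values, then drop rows the join left incomplete.
joined = [{**v, "pct_poverty": acs.get(v["zip"], {}).get("pct_poverty")} for v in no_na]
model_input = [r for r in joined if not has_na(r)]

# 5. Sizes of an 80/20 split of the 97,134 rows reported on the slide.
n_total = 97_134
n_train = round(0.8 * n_total)
n_test = n_total - n_train
print(len(dedup), len(ed_to_ip), len(no_na), len(model_input), n_train, n_test)
```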
Descriptive Summary
                                  Stroke Sample  Mimics Sample  All Sample
Age***                            71.11(14.74)   55.83(22.74)   63.72(20.51)
Female***                         0.50(0.50)     0.51(0.50)     0.51(0.50)
Number of Chronic Conditions***   7.14(3.00)     5.30(3.21)     6.25(3.23)
Elixhauser Score                  7.59(9.82)     7.61(9.84)     7.60(9.83)
Race/Ethnicity
  White***                        0.66(0.47)     0.62(0.49)     0.64(0.48)
  Black***                        0.17(0.38)     0.20(0.40)     0.19(0.39)
  Hispanic***                     0.14(0.35)     0.15(0.36)     0.15(0.35)
  Other Race                      0.03(0.16)     0.03(0.16)     0.03(0.16)
Medical Insurance
  Medicare***                     0.70(0.46)     0.49(0.50)     0.60(0.49)
  Medicaid***                     0.08(0.27)     0.18(0.38)     0.12(0.33)
  Private Insurance***            0.13(0.34)     0.17(0.37)     0.15(0.36)
  Other Insurance***              0.09(0.29)     0.16(0.37)     0.13(0.33)
Urban Residence***                0.96(0.20)     0.97(0.17)     0.96(0.19)
Median household income for patient's ZIP Code
  First Quartile***               0.40(0.49)     0.42(0.49)     0.41(0.49)
  Second Quartile**               0.33(0.47)     0.32(0.47)     0.33(0.47)
  Third Quartile                  0.20(0.40)     0.20(0.40)     0.20(0.40)
  Fourth Quartile                 0.07(0.25)     0.06(0.24)     0.06(0.25)
Number of Observations            50,159         46,975         97,134
Note: Standard deviations are in parentheses. ***significant at 1%; **significant at 5%; *significant at 10%.
Prediction Models
• Logistic Regression (LR): a popular baseline model to determine the
relationship between a set of predictor variables and a binary
outcome variable [22]
• Random Forest (RF): a supervised machine learning method for
classification that fits multiple decision trees on different subsamples
of the data to classify outcomes [23]
• Gradient Boosting Machine (GBM): a machine learning method for
classification, which produces a prediction model in the form of an
ensemble of weak prediction models [24]
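To make the baseline concrete, here is a minimal sketch of logistic regression fit by batch gradient descent, written with the standard library only. The toy features (a centered age value and a Medicare flag), the learning rate, and the number of epochs are assumptions for illustration; the slides do not state which software or settings the study used, and the random forest and gradient boosting models are not reproduced here.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def fit_logistic(xs, ys, lr=0.5, epochs=1000):
    """Fit weights and intercept by batch gradient descent on the log-loss."""
    n_features = len(xs[0])
    w = [0.0] * n_features
    b = 0.0
    for _ in range(epochs):
        grad_w = [0.0] * n_features
        grad_b = 0.0
        for x, y in zip(xs, ys):
            # Prediction error for this record drives the gradient.
            err = sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b) - y
            for j, xj in enumerate(x):
                grad_w[j] += err * xj
            grad_b += err
        w = [wi - lr * g / len(xs) for wi, g in zip(w, grad_w)]
        b -= lr * grad_b / len(xs)
    return w, b


def predict_proba(w, b, x):
    """Predicted probability of the positive class (stroke)."""
    return sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)


# Toy data: [age in decades, centered at 60; Medicare flag] -> stroke (1) or mimic (0).
xs = [[-2.0, 0], [-1.5, 0], [-0.5, 0], [0.0, 1], [0.5, 1],
      [1.0, 1], [1.5, 1], [-1.0, 1], [2.0, 0]]
ys = [0, 0, 0, 1, 1, 1, 1, 0, 1]

w, b = fit_logistic(xs, ys)
p_older = predict_proba(w, b, [1.5, 1])
p_younger = predict_proba(w, b, [-1.5, 0])
print(w, b, p_older, p_younger)
```

Random forests and gradient boosting replace this single linear score with ensembles of decision trees, which is why they can capture interactions that the baseline cannot.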
Model Evaluation Scenarios
• HCUP variables only
• Add community-level SDoH variables from the ACS data
• Add Comorbidities to HCUP variables: Elixhauser Index
• Add community-level SDoH variables from the ACS data and
Elixhauser Index
Key Performance Measures
• Accuracy: % of predictions that the model got right
• AUC: Probability that the model will rank a randomly chosen positive example
higher than a randomly chosen negative example
• Precision: Ratio of correct positive results to total positive results predicted by
the model
• Recall: Ratio of correct positive results to the total number of actual positives in the
data
• F1 score: A [0, 1] score that is computed as the harmonic mean
between precision and recall
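The measures above can be computed directly from labels, predictions, and scores. The sketch below is a plain-Python illustration on toy values (not study data); the function names are made up for the example, and it does not guard against empty classes or zero denominators.

```python
def confusion_counts(y_true, y_pred):
    """Return true positives, false positives, false negatives, true negatives."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    return tp, fp, fn, tn


def classification_metrics(y_true, y_pred):
    tp, fp, fn, tn = confusion_counts(y_true, y_pred)
    accuracy = (tp + tn) / len(y_true)
    precision = tp / (tp + fp)  # correct positives / predicted positives
    recall = tp / (tp + fn)     # correct positives / actual positives
    f1 = 2 * precision * recall / (precision + recall)  # harmonic mean
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


def auc(y_true, scores):
    """Share of (positive, negative) pairs ranked correctly; ties count half."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Toy labels (1 = stroke) and model scores; not study data.
y_true = [1, 1, 1, 1, 0, 0, 0, 0]
scores = [0.9, 0.8, 0.6, 0.4, 0.7, 0.3, 0.2, 0.1]
y_pred = [1 if s >= 0.5 else 0 for s in scores]

metrics = classification_metrics(y_true, y_pred)
auc_value = auc(y_true, scores)
print(metrics, auc_value)
```

The same F1 formula gives a quick consistency check on reported results: a precision of 0.66 with a recall of 0.75 yields an F1 of about 0.70.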
Performance of Stroke Diagnosis Models
Model                      Predictors              Accuracy  AUC   Precision  Recall  F1 Score
Logistic Regression        SID only                0.67      0.67  0.66       0.75    0.70
Logistic Regression        SID + ACS               0.67      0.67  0.66       0.75    0.70
Logistic Regression        SID + Elixhauser        0.67      0.67  0.66       0.75    0.70
Logistic Regression        SID + ACS + Elixhauser  0.68      0.68  0.67       0.75    0.71
Random Forest              SID only                0.69      0.68  0.65       0.86    0.74
Random Forest              SID + ACS               0.68      0.67  0.66       0.77    0.71
Random Forest              SID + Elixhauser        0.69      0.68  0.65       0.86    0.74
Random Forest              SID + ACS + Elixhauser  0.68      0.67  0.66       0.77    0.71
Gradient Boosting Machine  SID only                0.69      0.68  0.65       0.84    0.73
Gradient Boosting Machine  SID + ACS               0.70      0.69  0.67       0.81    0.73
Gradient Boosting Machine  SID + Elixhauser        0.69      0.68  0.65       0.84    0.73
Gradient Boosting Machine  SID + ACS + Elixhauser  0.70      0.69  0.67       0.82    0.74
Discussion: Why Does SDoH Information Lack Strong Predictive Power?
• Insufficient variability in the ZIP code level SDoH measures
  • Welch-adjusted ANOVA shows significant differences in means between stroke and non-stroke patients for most of the ACS variables
• Patient-level demographics present in claims data may have accounted for much of the variability in the ACS variables
  • Regressing the ACS SDoH variables on the patient-level variables gives an R-squared of 30-40%
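The R-squared check above measures how much of an ACS variable's variation is already explained by patient-level data. The sketch below is a simplified, one-predictor illustration on made-up values; the study regressed on the full set of patient-level variables, and the variable names here are hypothetical.

```python
def r_squared(x, y):
    """R-squared from a one-predictor least-squares fit of y on x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    # Residual and total sums of squares around the fitted line and the mean.
    ss_res = sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    return 1.0 - ss_res / ss_tot


# Toy values: patient-level ZIP income quartile vs. a ZIP-level poverty percentage.
income_quartile = [1, 1, 2, 2, 3, 3, 4, 4]
pct_poverty = [30, 18, 22, 25, 12, 20, 15, 8]

r2 = r_squared(income_quartile, pct_poverty)
print(round(r2, 3))
```

A higher R-squared means the community-level variable carries less information beyond what the patient-level variables already provide, which is one reading of why adding ACS variables changed the results so little.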
Key Takeaways
• Claims data can be used to predict stroke diagnosis
• Individual-level SDoH variables are important predictors
  • Age
  • Primary payer
• Adding community-level ACS variables to the patient-level data did
not improve predictive power substantially
Contributions, Limitations and Implications
• One of the first large-scale studies that systematically assesses the added
value of SDoH information using claims data
• Development and integration of individual level SDoH screening tools is
strongly indicated; incentivize the collection of SDoH data through
financial or quality measures
• Link data from Emergency Department and pre-hospital care settings,
including outpatient ambulatory care to get a more complete patient
trajectory
• Integrate clinical data with the claims data to include laboratory and
imaging test results
• Investigate stroke prediction at milestone events after ED presentation,
such as when a test is completed, to determine the added value of
distinct interventions
References
1. “Heart Disease and Stroke Statistics-2018 Update: A Report From the American Heart Association”. American Heart Association. March 20, 2018.
2. Benjamin EJ, Virani SS, Callaway CW, Chamberlain AM, Chang AR, Cheng S, et al. Heart Disease and Stroke Statistics-2018 Update: A Report From the American Heart
Association. Circulation. 2018;137(12):e67-e492.
3. Johnson CO, Nguyen M, Roth GA, Nichols E, Alam T, Abate D, et al. Global, Regional, and National Burden of Stroke, 1990–2016: A Systematic Analysis for the Global Burden of
Disease Study 2016. The Lancet Neurology. 2019;18(5):439-58.
4. Kelly AG, Hellkamp AS, Olson D, Smith EE, Schwamm LH. Predictors of Rapid Brain Imaging in Acute Stroke. Stroke. 2012;43(5):1279-84.
5. Alberts MJ, Hademenos G, Latchaw RE, Jagoda A, Marler JR, Mayberg MR, et al. Recommendations for the Establishment of Primary Stroke Centers. JAMA. 2000;283(23):3102-9.
6. Mayo Clinic. Stroke-Diagnosis & Treatment 2019 [Available from: https://www.mayoclinic.org/diseases-conditions/stroke/diagnosis-treatment/drc-20350119].
7. Musuka TD, Wilton SB, Traboulsi M, Hill MD. Diagnosis and Management of Acute Ischemic Stroke: Speed is Critical. CMAJ. 2015;187(12):887-93.
8. Chan KS, Roberts E, McCleary R, Buttorff C, Gaskin DJ. Community Characteristics and Mortality: the Relative Strength of Association of Different Community Characteristics.
American journal of public health. 2014;104(9):1751-8.
9. Hill PL, Weston SJ, Jackson JJ. Connecting Social Environment Variables to the Onset of Major Specific Health Outcomes. Psychology & health. 2014;29(7):753-67.
10. Glickman SW, Phillips S, Anstrom KJ, Laskowitz DT, & Cairns CB. Discriminative capacity of biomarkers for acute stroke in the emergency department. The Journal of emergency
medicine. 2010; 41(3): 333-339.
11. Reynolds MA, Kirchick HJ, Dahlen JR, Anderberg JM, McPherson PH, Nakamura KK, ... & Buechler KF. Early biomarkers of stroke. Clinical Chemistry. 2003; 49(10):1733-1739.
12. Saenger AK, & Christenson RH. Stroke biomarkers: progress and challenges for diagnosis, prognosis, differentiation, and treatment. Clinical chemistry. 2010; 56(1): 21-33.
13. Ong MEH, Chan YH, Lin WP, & Chung WL. Validating the ABCD2 score for predicting stroke risk after transient ischemic attack in the ED. The American journal of emergency
medicine. 2010; 28(1): 44-48.
14. Teoh D. Towards stroke prediction using electronic health records. BMC medical informatics and decision making. 2018; 18(1): 1-11.
15. Cheon S, Kim J, Lim J. The Use of Deep Learning to Predict Stroke Patient Mortality. International journal of environmental research and public health. 2019 Jan;16(11):1876.
16. Min SN, Park SJ, Kim DJ, Subramaniyam M, & Lee KS. Development of an Algorithm for Stroke Prediction: A National Health Insurance Database Study in Korea. European
neurology. 2018; 79(3-4): 214-220.
17. World Health Organization. Social determinants of health. [Available from: http://www.who.int/social_determinants/sdh_definition/en/].
18. Freij M, Dullabh P, Hovey L, Leonard J, Card A, Dhopeshwarkar R. Incorporating Social Determinants of Health in Electronic Health Records: A Qualitative Study of Perspectives on
Current Practices among Top Vendors. NORC at the University of Chicago 2018.
19. ProMedica. Social Determinants of Health 2019 [Available from: https://www.promedica.org/socialdeterminants/pages/default.aspx].
20. County Health Rankings & Roadmaps. County Health Rankings Model 2016 [Available from: https://www.countyhealthrankings.org/county-health-rankings-model].
21. Jessica T. Claudio, HRET HIIN, the Association for Community Health Improvement. Reducing Root Causes of Harm: Social Determinants of Health. November 15, 2018. [Available from: http://www.hret-hiin.org/Resources/health_care_disparities/18/hret-hiin-virtual-event-reducing-root-causes-of-harm-social-determinants-of-health-slides.pdf].
22. Hosmer, D. W., and Lemeshow, S. 2000. “Interpretation of the Fitted Logistic Regression Model,” Chapter 3 in Applied Logistic Regression (2nd ed.), New York: Wiley, pp. 47-90.
23. Hastie, Trevor; Tibshirani, Robert; Friedman, Jerome (2008). The Elements of Statistical Learning (2nd ed.). Springer. ISBN 0-387-95284-5.
24. Hastie, T.; Tibshirani, R.; Friedman, J. H. (2009). "10. Boosting and Additive Trees". The Elements of Statistical Learning (2nd ed.). New York: Springer.
Thank you!
We welcome and appreciate your feedback.
Questions?