tdwi stl 20140306 analytics - leslie mcintosh
DESCRIPTION
Data mining and analytics in healthcare by Leslie McIntosh from Washington University School of Medicine / BJC Healthcare at the 2014-03-06 TDWI St. Louis chapter meeting. Contact Leslie through LinkedIn (http://www.linkedin.com/pub/leslie-mcintosh/3/176/659)
TRANSCRIPT
Healthcare Analytics: Mining Electronic Medical Records
Leslie McIntosh, PhD, MPH
Health Informatics
Computer Science/IS
Health/Medicine
Analytics/Statistics
https://edgewatertech.wordpress.com
Located, Consolidated, and Standardized the Data
Facilities
Syntactic interoperability: Use of standardized programming interfaces
APIs, Service-oriented architecture, Messaging standards
Semantic interoperability: Use of controlled vocabularies and ontologies
ICD9-10 – Diagnosis Codes
LOINC – Laboratory tests
SNOMED – Clinical terms (e.g. diagnoses)
CPT – Medical procedures
RxNORM – Medications
Facilitating Interoperability
Clinical Data
Demographics: name, address, race, gender, phone number
Visits: age, patient type, facility, diagnosis code, procedure code
Labs: age, collection time, facility, lab procedure, lab test name, specimen type (e.g. serum vs CSF glucose), result
Medications: age, duration, frequency, medication, route & form
Allergies: allergen type, allergy reaction, sensitivity, severity, type, onset date
Vitals: age, body site, facility, measurement, observation, value and units
Documents: age, document content, document name, facility and physician
Illicit Drug Use: type, history, positive/negative
Biospecimen Data: specimen type, accession number and clinical diagnosis
Natural Language Processing (NLP)
Custom Analytics
Acknowledgements
• Leslie D. McIntosh, PhD, MPH – Washington University
• Walton Sumner, MD – Washington University
• Lynn Latham – BJC
• Bijoy George – Washington University
• Pavan Kalantri – Washington University
• Suhas Khot – Washington University
• Anthony Juehne – Washington University
• Rakesh Nagarajan, MD, PhD – Washington University
Alcohol, Tobacco, and Illicit Drug Use in Patients are Not Discrete Data
Problem
Electronic Medical Documents
Rules Creation
Negative vs. Positive Use
Current Use, Quit Duration
Substance, Volume, Duration
Family vs. Personal History
Alcohol, Tobacco, Illicit Drugs
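The rule categories above (substance, negative vs. positive use, current vs. quit, family vs. personal history) could be sketched as a handful of regular expressions. This is only a minimal illustration and not the rule set used in the project: the patterns, the function name `classify_substance_use`, and the sample sentences are all hypothetical.

```python
import re

# Hypothetical patterns for illustration; a real rule set would be much
# larger and reviewed by clinicians.
SUBSTANCES = {
    "alcohol": r"\b(alcohol|etoh|beer|wine)\b",
    "tobacco": r"\b(tobacco|smok\w*|cigarettes?)\b",
    "illicit drugs": r"\b(cocaine|heroin|marijuana|illicit drugs?)\b",
}
NEGATION = r"\b(denies|no|never|negative for)\b"
QUIT = r"\b(quit|former|stopped)\b"
FAMILY = r"\b(family history|mother|father|brother|sister)\b"


def classify_substance_use(sentence):
    """Return one finding per substance mentioned in a single sentence.

    Negation, quit status, and family history are judged at sentence
    level, which is a simplification: a sentence mentioning two
    substances gets the same status for both.
    """
    text = sentence.lower()
    findings = []
    for substance, pattern in SUBSTANCES.items():
        if not re.search(pattern, text):
            continue
        history = "family" if re.search(FAMILY, text) else "personal"
        if re.search(NEGATION, text):
            status = "negative"
        elif re.search(QUIT, text):
            status = "quit"
        else:
            status = "current"
        findings.append(
            {"substance": substance, "status": status, "history": history}
        )
    return findings


# Hypothetical sentences of the kind found in free-text notes.
for note in ["Patient denies tobacco use.",
             "Quit smoking 5 years ago.",
             "Father was a heavy drinker of beer."]:
    print(note, classify_substance_use(note))
```

Rules like these turn free-text mentions into discrete values that can be stored next to coded data; manual curation would still be needed to catch the cases the patterns miss.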
CIDER
Controlled Vocabulary
Discrete data extraction
Defined
Variable
Search
Manual curation
Mining Discrete Data
Chronic Diseases associated with Multiple CTs
Unknown what Diagnoses are Associated with Multiple CT Scans
Problem #1
Common Chronic Diseases
Problem #2
Acknowledgements
• Richard Griffey, MD, Division of Emergency Medicine, Washington University School of Medicine
• Leslie McIntosh, PhD, MPH, Center for Biomedical Informatics, Department of Pathology, Washington University School of Medicine
• Tom Bailey, MD, Division of Infectious Diseases, Department of Medicine, Washington University School of Medicine
Disclosure
• Emergency Medicine Foundation & Emergency Medicine Patient Safety Foundation Patient Safety Fellowship
• Institutional KM1 Comparative Effectiveness Award Number KM1CA156708 through the National Cancer Institute (NCI) at the National Institutes of Health (NIH) and Grant Numbers UL1 RR024992, KL2 RR024994, TL1 RR024995 through The Clinical and Translational Science Award (CTSA) program of the National Center for Research Resources and the National Center for Advancing Translational Sciences at the National Institutes of Health.
Background
Methods
Design: Exploratory, Retrospective
Step 1: Identify conditions among patients associated with being in the top 10% of CT study count
Step 2: Test whether, among all patients, having one of these conditions increased the odds of being highly imaged (in the top 10%).
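Step 2 asks whether a condition raises the odds of being highly imaged. The slides do not say which statistical method was used, so the following is only a sketch of one common approach: an odds ratio from a 2x2 table with a 95% confidence interval computed on the log scale. The function name and all counts are hypothetical.

```python
import math


def odds_ratio(exposed_cases, exposed_noncases,
               unexposed_cases, unexposed_noncases):
    """Odds ratio and 95% CI (log method) from a 2x2 table.

    "Exposed" = patients with the condition of interest;
    "case" = patient in the top 10% of CT study count.
    All four cells must be greater than zero.
    """
    a, b, c, d = (exposed_cases, exposed_noncases,
                  unexposed_cases, unexposed_noncases)
    estimate = (a * d) / (b * c)
    se_log = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(math.log(estimate) - 1.96 * se_log)
    upper = math.exp(math.log(estimate) + 1.96 * se_log)
    return estimate, lower, upper


# Hypothetical counts, not taken from the study.
estimate, lower, upper = odds_ratio(40, 60, 100, 800)
print(f"OR = {estimate:.2f}, 95% CI {lower:.2f} to {upper:.2f}")
```

An interval that lies entirely above 1 would suggest that patients with the condition are more likely to be highly imaged than patients without it.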
Unique patients BJH - ED (2011): 58,079
Prior CT at BJH* (2004-2011): 35,982 (0); 21,404 (1+ with Dx); 693 (1+ w/o Dx)
Patients with 5+ CT (2004-2011): 18,816 (no); 2,588 (yes)
1. Identify patients in top 10% of CT
Top 10%
ED Visit + CT (ever)
*CTs were limited to those commonly ordered from the ED (e.g. head, cervical spine, chest, abdomen-pelvis)
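Finding the top 10% amounts to ranking patients by CT count and choosing a cutoff; in the slides the cutoff appears as 5 or more CTs. The sketch below shows one way to derive such a cutoff from per-patient counts. The function names and the sample counts are hypothetical, and ties at the cutoff are kept, so the selected group can be slightly larger than 10%.

```python
import math


def top_decile_cutoff(ct_counts):
    """Smallest CT count that places a patient in the top 10%."""
    ranked = sorted(ct_counts.values(), reverse=True)
    k = max(1, math.ceil(len(ranked) / 10))  # size of the top decile
    return ranked[k - 1]


def highly_imaged(ct_counts):
    """Patient IDs whose CT count is at or above the top-decile cutoff."""
    cutoff = top_decile_cutoff(ct_counts)
    return {pid for pid, n in ct_counts.items() if n >= cutoff}


# Hypothetical patient IDs mapped to number of CT studies.
counts = {"p1": 1, "p2": 1, "p3": 2, "p4": 1, "p5": 3,
          "p6": 1, "p7": 2, "p8": 1, "p9": 7, "p10": 1}
print(top_decile_cutoff(counts), highly_imaged(counts))
```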
Top 10%
2. Identify diagnoses (ICD-9s) associated with these visits*
• Rank by frequency & dual review those appearing >100 times
• Statistical scoring (based on NLP algorithm tf-idf)
• Exclude: cancer diagnoses, non-chronic conditions (e.g. trauma), those not mapping to an indication for imaging (e.g. HTN, DM)
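The slides do not describe how tf-idf was applied to diagnoses, so the following sketch makes an assumption: each patient's set of diagnosis codes is treated as a "document", term frequency is measured within the highly imaged group, and inverse document frequency is measured across all patients. Codes that are common in the group but rare overall score highest; codes nearly everyone has score near zero. The function name and the codes in the example are placeholders.

```python
import math
from collections import Counter


def tfidf_scores(group_codes, all_patient_codes):
    """Score diagnosis codes for the highly imaged group.

    group_codes: list of codes recorded in the group (repeats allowed).
    all_patient_codes: list of sets, one set of codes per patient.
    """
    tf = Counter(group_codes)
    total = sum(tf.values())
    n_patients = len(all_patient_codes)
    scores = {}
    for code, count in tf.items():
        # Number of patients who have this code at least once.
        df = sum(1 for codes in all_patient_codes if code in codes)
        idf = math.log(n_patients / max(df, 1))
        scores[code] = (count / total) * idf
    return scores


# Placeholder data: code "A" is in every patient, code "B" in half.
patients = [{"A", "B"}, {"A"}, {"A"}, {"A", "B"}]
group = ["B", "A", "B", "A"]
for code, score in sorted(tfidf_scores(group, patients).items(),
                          key=lambda item: item[1], reverse=True):
    print(code, round(score, 3))
```

A score like this is only a ranking aid; the dual review and exclusion rules listed above would still decide which diagnoses are kept.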
1830 malignant neoplasm of ovary
7533 other specified congenital anomalies of kidney
5308 other specified disorders of esophagus
5678 other specified peritonitis
75313 polycystic kidney autosomal dominant
2384 polycythemia vera
7530 renal agenesis and dysgenesis
5582 toxic gastroenteritis and colitis
1551 malignant neoplasm of intrahepatic bile ducts
20210 mycosis fungoides unspecified site
4413 abdominal aortic aneurysm ruptured
1520 malignant neoplasm of duodenum
1541 malignant neoplasm of rectum
53087 mechanical complication of esophagostomy
56489 other functional disorders of intestine
19882 secondary malignant neoplasm of genital organs
4412 thoracic aortic aneurysm without mention of rupture
5187 transfusion related acute lung injury
6190 urinary-genital tract fistula female
99681 complications of transplanted kidney
* Inpatient & ED only
Candidate diagnoses
112.5 partial complex seizure
282.6 sickle cell disease
115.1 schizophrenia
774.0 psychosis
979.7 petit mal seizure
532.4 pulmonary embolus
487.3 quadriplegia
475.8 hemiparesis
784.9 HIV
282.3 Hb-SS disease without crisis
232.0 ulcerative colitis
215.1 lupus
212.3 sickle cell NOS
343.5 migraine
489.6 constipation
234.1 Crohn’s disease
299.5 hydrocephalus
214.1 sickle cell pain crisis
HCUP Clinical Classification System
Sickle cell disease
282.3 348.6 594.7 843.6 119.0
298.4 120.9 879.4 282.3 948.1
387.3 282.6 859.1 214.1 902.0
893.0 213.3 912.1 981.1 873.0
Paralysis
349.4 348.6 594.7 843.6 432.4
475.8 120.9 879.4 239.5 948.1
387.3 212.3 859.1 203.1 902.0
893.0 487.3 912.1 981.1 873.0
Regional enteritis
349.4 348.6 594.7 843.6 234.1
298.4 120.9 879.4 239.5 948.1
387.3 212.3 859.1 203.1 902.0
232.0 213.3 912.1 981.1 873.0
Psychosis
349.4 348.6 594.7 843.6 432.4
298.4 120.9 879.4 239.5 948.1
387.3 290.3 859.1 203.1 902.0
893.0 249.3 912.1 115.1 873.0
3. Cluster similar diagnoses into “conditions”
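Clustering diagnoses into conditions can be pictured as a lookup from each diagnosis code to a broader category, in the spirit of the HCUP Clinical Classification System named on the slide. The sketch below is not the real HCUP mapping: the table reuses a few of the illustrative codes and condition names shown on the slides, and the function name is hypothetical.

```python
from collections import defaultdict

# Illustrative mapping built from codes and labels shown on the slides;
# a real analysis would load the full published classification instead.
CODE_TO_CONDITION = {
    "282.6": "Sickle cell disease",
    "282.3": "Sickle cell disease",
    "214.1": "Sickle cell disease",
    "487.3": "Paralysis",
    "475.8": "Paralysis",
    "232.0": "Regional enteritis",
    "234.1": "Regional enteritis",
    "115.1": "Psychosis",
}


def conditions_by_patient(diagnoses):
    """Collapse (patient_id, code) pairs into a set of conditions per patient.

    Codes that are missing from the mapping are ignored.
    """
    result = defaultdict(set)
    for patient_id, code in diagnoses:
        condition = CODE_TO_CONDITION.get(code)
        if condition is not None:
            result[patient_id].add(condition)
    return dict(result)


# Hypothetical patients; "999.9" stands in for an unmapped code.
rows = [("p1", "282.3"), ("p1", "214.1"), ("p1", "999.9"),
        ("p2", "475.8"), ("p2", "234.1")]
print(conditions_by_patient(rows))
```

Working at the condition level means several codes for the same disease count once per patient, which keeps the later odds comparison from being split across near-duplicate diagnoses.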
214.1
282.3
213.3
94
532
1137
6,331
Diagnoses 94 21
Methods – Step 2
Patients at BJH (2010)
Patients at BJH (years in system)
Diagnosis of Interest
CT Status
Big Data Consumable
Built a Tool
Trained Clinicians
People
Computer Scientists/Developers
Analysts
Clinical Researchers
In the end, we have…
Developers, Analysts, Clinicians
DATA
Data Drivers
Liaisons facilitating the transformation of data to information to knowledge
What’s next