Creating Dynamic Groupers Using Overrepresentation of Clinical Terms
Tomasz Adamusiak MD PhD
Froedtert & Medical College of Wisconsin
2
Conflict of interest disclosure
Tomasz Adamusiak has no real or apparent conflicts of interest to report
3
Learning objectives
• Recognize the value of structured clinical information
• Identify computational and terminology challenges in big data analytics
• Evaluate how this approach applies to different use cases
4
What is a grouper?
Lists of specific values derived from standard vocabularies used to define clinical concepts, e.g. patients with diabetes
• SNOMED CT concepts
• ICD-9/10 codes
• EDG terms
• CQM Value Sets
5
Diabetes: Eye Exam (CMS eMeasure CMS131v2)
Value Set Name: Diabetes
Type: Grouping
Steward: National Committee for Quality Assurance
Program: CMS, MU2 EP Update 2013-06-14
… … …
190330002 Diabetes mellitus, juvenile type, with hyperosmolar coma (disorder) [SNOMEDCT]
250 Diabetes mellitus without mention of complication, type II or unspecified type, not stated as uncontrolled [ICD9CM]
E10.10 Type 1 diabetes mellitus with ketoacidosis without coma [ICD10CM]
6
Mining associations in EHR data
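Patient counts (rows: glucohemoglobin measurement; columns: diabetes mellitus)
                                  Diabetes: Yes   Diabetes: No
Glucohemoglobin measurement: Yes  1509            5442
Glucohemoglobin measurement: No   881             99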
7
Positive association
Background reference
Dynamic = expansion + association
8
CPT-4 83036 (hemoglobin; glycosylated, A1C)
ICD-10 E08-E13 (diabetes mellitus)
Extract-Load-Transform
9
Transformation in ClinMiner https://clinminer.hmgc.mcw.edu user:epicdemo pass:epicdemo
10
This image by Tomasz Adamusiak is licensed under a CC BY 3.0 US license
ClinMiner is a non-commercial, prototype software
Pilot: test all possible diabetes associations
11
8k patients
12M observations
Labs (CPT-4/LOINC)
Medications (RxNorm)
Problems (ICD-9)
Procedures (CPT-4)
18 764 terms
162 significant associations
Summarize, but normalize per patient: 1 + 1 = 1 (repeated observations of a term in the same patient count once)
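A minimal sketch of the per-patient normalization, assuming observations arrive as (patient, term) pairs. The function name and the sample identifiers are illustrative and are not taken from ClinMiner.

```python
from collections import defaultdict

def patients_per_term(observations):
    """Count distinct patients per term.

    Repeated observations of a term in the same patient collapse
    into one, which is the "1 + 1 = 1" rule.
    """
    seen = defaultdict(set)
    for patient_id, term in observations:
        seen[term].add(patient_id)
    return {term: len(patients) for term, patients in seen.items()}

# Hypothetical sample data: patient p1 has the same lab term recorded twice.
observations = [
    ("p1", "CPT-4:83036"),
    ("p1", "CPT-4:83036"),
    ("p2", "CPT-4:83036"),
]
counts = patients_per_term(observations)
print(counts)  # {'CPT-4:83036': 2}
```

Counting patients rather than observations keeps frequently repeated labs from dominating the association counts.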
12
Parent Concepts
ICD-10-CM
Expansion to parent concepts is relatively straightforward in ICD
13
Parent Concepts
ICD-10-CM
Caveat: flat hierarchy results in disconnected clinical contexts
Q: Which codes cover all of tuberculosis? (ICD-9-CM examples)
• 010-018.99 Tuberculosis
• 137 Late effects of tuberculosis
• 647.3 Tuberculosis complicating pregnancy, childbirth, or the puerperium
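A minimal sketch of why a tuberculosis grouper over a flat hierarchy needs several disconnected rules rather than one subtree. The rule format and the function name `in_grouper` are assumptions made for illustration; the three rules mirror the bullets above.

```python
# Each rule is either an inclusive range of 3-digit categories or a code prefix.
TB_GROUPER = [
    ("range", "010", "018"),  # 010-018.99 Tuberculosis
    ("prefix", "137"),        # Late effects of tuberculosis
    ("prefix", "647.3"),      # Tuberculosis complicating pregnancy, childbirth, or the puerperium
]

def in_grouper(code, grouper):
    """Return True if an ICD-9 code string matches any rule in the grouper."""
    category = code.split(".")[0]
    for rule in grouper:
        if rule[0] == "range":
            _, low, high = rule
            # Only plain 3-digit categories are compared, so E and V codes never match a range.
            if len(category) == 3 and category.isdigit() and low <= category <= high:
                return True
        elif rule[0] == "prefix" and code.startswith(rule[1]):
            return True
    return False

for code in ["011.90", "137.0", "647.31", "250.00"]:
    print(code, in_grouper(code, TB_GROUPER))
```

A single range rule would miss 137 and 647.3, which sit in unrelated chapters of the code book.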
14
Expansion has to take into account multiple inheritance in SNOMED CT
15
SNOMED CT
Parent Concepts
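A minimal sketch of expansion under multiple inheritance, where a concept can have more than one parent and shared ancestors must be visited only once. The concept names and parent links below are illustrative placeholders, not actual SNOMED CT content, and `ancestors` is a hypothetical helper.

```python
# Hypothetical parent links; one concept has two parents.
PARENTS = {
    "Diabetic retinopathy": ["Retinal disorder", "Diabetic complication"],
    "Retinal disorder": ["Eye disorder"],
    "Diabetic complication": ["Diabetes mellitus"],
    "Eye disorder": ["Disease"],
    "Diabetes mellitus": ["Disease"],
    "Disease": [],
}

def ancestors(concept, parents):
    """Collect every ancestor reachable through any parent link."""
    found = set()
    stack = list(parents.get(concept, []))
    while stack:
        node = stack.pop()
        if node not in found:  # shared ancestors are expanded only once
            found.add(node)
            stack.extend(parents.get(node, []))
    return found

print(sorted(ancestors("Diabetic retinopathy", PARENTS)))
```

Storing the full ancestor set for every concept is what makes this step memory intensive on a large terminology.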
Pieter Brueghel the Elder (1526/1530–1569) [Public domain], via Wikimedia Commons
In pursuit of a single language
16
Integrating terminologies with UMLS
Donald A.B. Lindberg, M.D.
Clinical Terminologies
UMLS
17
UMLS is ideal for integration of heterogeneous clinical data
• Single entry point to Meaningful Use (MU) terminologies
• Cross-walk between MU terms
• Terminology-agnostic
• Text-mining
18
UMLS establishes equivalence mappings across biomedical terminologies
UMLS concept: Exanthema (C0015230)
• SNOMED CT: Cutaneous eruption (SCT:112625008)
• SNOMED CT: Eruption (SCT:1806006)
• ICD-10-CM: rash NOS (ICD-10:R21)
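A minimal sketch of a cross-walk through a shared UMLS concept, using the Exanthema example from this slide. The in-memory dictionary stands in for the UMLS tables, and `build_index` and `crosswalk` are hypothetical helpers. It also assumes each source code maps to one concept, which is a simplification.

```python
# Source codes grouped under a UMLS concept unique identifier (CUI).
CUI_TO_CODES = {
    "C0015230": [  # Exanthema
        ("SCT", "112625008", "Cutaneous eruption"),
        ("SCT", "1806006", "Eruption"),
        ("ICD-10", "R21", "rash NOS"),
    ],
}

def build_index(cui_to_codes):
    """Map each (source, code) pair back to its CUI."""
    index = {}
    for cui, entries in cui_to_codes.items():
        for source, code, _label in entries:
            index[(source, code)] = cui
    return index

def crosswalk(source, code, target, cui_to_codes, index):
    """Return codes in the target source that share a CUI with the input code."""
    cui = index.get((source, code))
    if cui is None:
        return []
    return [c for s, c, _label in cui_to_codes[cui] if s == target]

INDEX = build_index(CUI_TO_CODES)
print(crosswalk("SCT", "1806006", "ICD-10", CUI_TO_CODES, INDEX))  # ['R21']
```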
6° of terminological Kevin Bacon
Acute myocardial infarction
Myocardial ischemia
Vascular Diseases
Disorder of soft tissue
Collagen Diseases
Connective Tissue Diseases
Epidermal and dermal conditions
Skin and subcutaneous tissue disorders
Dermatologic disorders
21
Expansion limited to MU terminologies and by semantic type
22
Finding
Disease or Syndrome
Ignore
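A minimal sketch of an expansion restricted by source vocabulary and semantic type. The slide does not state which semantic types are kept, so the allowed sets are parameters and the values used below are arbitrary. The toy graph, the source labels, and the function name `expand` are also assumptions.

```python
from collections import deque

def expand(start, children, info, allowed_sources, allowed_types):
    """Breadth-first expansion that keeps, and continues through, only allowed concepts."""
    kept = []
    seen = {start}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for child in children.get(node, []):
            if child in seen:
                continue
            seen.add(child)
            source, semantic_type = info[child]
            if source in allowed_sources and semantic_type in allowed_types:
                kept.append(child)
                queue.append(child)  # rejected concepts are not traversed further
    return kept

# Hypothetical toy graph: (source vocabulary, semantic type) per concept.
CHILDREN = {"root": ["x", "y"], "x": ["z"], "y": ["w"]}
INFO = {
    "x": ("SNOMEDCT", "Disease or Syndrome"),
    "y": ("OTHER", "Disease or Syndrome"),
    "z": ("ICD10CM", "Finding"),
    "w": ("SNOMEDCT", "Disease or Syndrome"),
}
result = expand("root", CHILDREN, INFO, {"SNOMEDCT", "ICD10CM"}, {"Disease or Syndrome"})
print(result)  # ['x']
```

Stopping at a rejected concept is one possible design choice. It keeps the expansion small, at the cost of missing allowed concepts that are reachable only through a rejected one.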
Open issue: cycles due to subtle differences in meaning
23
Immune System
Endocrine System
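A minimal sketch of how cycles in merged broader-than links could be detected before expansion. The node names are generic placeholders rather than real terminology content, and `find_cycle` is a hypothetical helper. It is a standard depth-first search, not necessarily the approach used in the work presented here.

```python
def find_cycle(broader):
    """Return one cycle as a list of nodes (first node repeated at the end), or None."""
    WHITE, GRAY, BLACK = 0, 1, 2  # unvisited, on the current path, finished
    color = {}
    path = []

    def visit(node):
        color[node] = GRAY
        path.append(node)
        for parent in broader.get(node, []):
            state = color.get(parent, WHITE)
            if state == GRAY:  # link back into the current path closes a cycle
                return path[path.index(parent):] + [parent]
            if state == WHITE:
                cycle = visit(parent)
                if cycle:
                    return cycle
        path.pop()
        color[node] = BLACK
        return None

    for start in list(broader):
        if color.get(start, WHITE) == WHITE:
            cycle = visit(start)
            if cycle:
                return cycle
    return None

# Hypothetical links: A is narrower than B, B than C, and C than A.
print(find_cycle({"A": ["B"], "B": ["C"], "C": ["A"]}))  # ['A', 'B', 'C', 'A']
```

The recursive form is fine for a sketch. A deep terminology may need an iterative version to stay within Python's recursion limit.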
Expansion in UMLS across MU sources
24
Diabetes mellitus without mention of complication, type II or unspecified type, not stated as uncontrolled
Sources: ICD-9, ICD-10, SNOMED CT, NDF-RT
Roots: Situation with explicit context; Metabolic diseases
Statistical methods for establishing over/under-representation
• Serial contingency tables
• Chi-squared test with Bonferroni correction
• Relative risk (RR) as the estimate of effect size
• Test diabetes against each of the 18 764 concepts
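A minimal sketch of one test in the series, using only the standard library. The counts are hypothetical and are not taken from the study. The cell layout (a, b, c, d), the 0.05 significance level, and the function names are assumptions. The statistic is the Pearson chi-squared without continuity correction, which may differ from the variant actually used.

```python
import math

def chi_squared_2x2(a, b, c, d):
    """Pearson chi-squared for the table [[a, b], [c, d]] (all margins must be non-zero)."""
    n = a + b + c + d
    return n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))

def p_value_1df(chi2):
    """Upper-tail p-value of a chi-squared statistic with one degree of freedom."""
    return math.erfc(math.sqrt(chi2 / 2.0))

def relative_risk(a, b, c, d):
    """P(term | diabetes) divided by P(term | no diabetes)."""
    return (a / (a + b)) / (c / (c + d))

def is_significant(p, n_tests, alpha=0.05):
    """Bonferroni correction: compare p against alpha divided by the number of tests."""
    return p < alpha / n_tests

# Hypothetical counts, NOT from the study:
# a = diabetes and term, b = diabetes without term,
# c = no diabetes with term, d = neither.
a, b, c, d = 60, 40, 30, 170
chi2 = chi_squared_2x2(a, b, c, d)
p = p_value_1df(chi2)
rr = relative_risk(a, b, c, d)
significant = is_significant(p, n_tests=18764)
print(round(chi2, 2), rr, significant)
```

An RR above 1 indicates over-representation of the term among patients with diabetes, and an RR below 1 indicates under-representation.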
25
EHR-based association rule mining
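Patient counts (rows: glucohemoglobin measurement, C0202054; columns: diabetes mellitus, C0011849)
                                  Diabetes: Yes   Diabetes: No
Glucohemoglobin measurement: Yes  1509            5442
Glucohemoglobin measurement: No   881             99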
26
Positive association
Background reference
Other positive associations
• C0785704 Blood glucose monitoring equipment
• C0935929 Antidiabetics
• C0304870 Insulin, Long-Acting
• C0770893 Metformin hydrochloride
• C0011882 Diabetic Neuropathies
• C0011880 Diabetic Ketoacidosis
• C0011884 Diabetic Retinopathy
27
Expansion: generalization at the class or system level
A non-representative control background can bias the findings
Diabetes inversely associated with
• C1314183 Special EEG tests
• C0242953 Barbiturate hypnotic
• C0064636 lamotrigine
• C1719410 Epilepsy and recurrent seizures
28
Open issue: reconciling lab orders with results
Clinical Laboratory
Hemoglobin A1c/Hemoglobin.total in Blood by HPLC (LOINC:17856-6)
Hemoglobin; glycosylated (A1C) (CPT-4:83036)
29
Challenges
• Availability of correctly and exhaustively coded data
• Expansion with multiple inheritance is memory intensive
• Testing all possible (180M) combinations is computationally expensive
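A short worked calculation of the scale of exhaustive testing. It assumes the 180M figure refers to unordered pairs among the 18 764 terms from the pilot, which the slide does not state explicitly. The 0.05 significance level is likewise an assumption.

```python
import math

N_TERMS = 18764  # number of terms reported for the pilot

# Unordered pairs of distinct terms: N * (N - 1) / 2
n_pairs = math.comb(N_TERMS, 2)

# Per-test threshold if every pair were tested with a Bonferroni correction
bonferroni_alpha = 0.05 / n_pairs

print(f"{n_pairs:,} pairs")  # 176,034,466 pairs
print(f"per-test alpha: {bonferroni_alpha:.2e}")
```

Under that assumption the count is about 176 million, consistent with the rounded 180M on the slide.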
30
What can we learn from other industries?
31
Thank You!
Tomasz Adamusiak MD PhD
Human and Molecular Genetics Center
Medical College of Wisconsin
@7omasz
For more information
• Next-generation phenotyping using the Unified Medical Language System (UMLS). Adamusiak T, Shimoyama N, Shimoyama M, JMIR Med Inform. doi:10.2196/medinform.3172
• EHR-based phenome wide association study in pancreatic cancer. Adamusiak T, Shimoyama M, AMIA Summits Transl Sci Proc. 2014 (in press)