CLINICAL NATURAL LANGUAGE PROCESSING
FOR HEALTH OUTCOMES RESEARCH:
Current Applications and Future State for Patient Care
Majid Afshar, MD, MSCR
Assistant Professor
Division of Allergy, Pulmonary and Critical Care Medicine
Department of Medicine
CONTENT
• Clinical NLP
– Current research: machine learning/data-driven methods
• Language Model
– Future research
– Future applications
• Model Deployment
• Career Development through NIH
DISCLOSURE
• None
NLP TASKS
• Sentiment analysis: is a text positive or negative?
• Text generation: provide a prompt and the model will generate what follows.
• Named entity recognition (NER): in an input sentence, label each word with the entity it represents (anatomy, symptom, medication, etc.)
• Question answering: provide the model with some context and a question, extract the answer from the context.
• Filling masked text: given a text with masked words, fill in the blanks.
• Summarization: generate a summary of a long text.
• Translation: translate a text into another language.
• Feature extraction: return a tensor representation of the text.
• Relation extraction: identify semantic relationships between two or more entities.
CLINICAL DOCUMENTS IN THE EHR
EXAMPLE #1:
Social History:
- Previous smoker - Heavy alcohol use. ( Pint of bourbon daily) - No drugs
EXAMPLE #2:
HISTORY OF PRESENT ILLNESS: …past medical history of seizures from
"quitting beer" who presents to the ER by paramedics after being found down
from an apparent fall down the stairs.
• lorazepam 1 mg inj 1 mg
intravenous every 2 hours as needed
EXAMPLE #3:
• Alcohol use 3.0 - 4.0 oz/weeKS 6 - 8 Standard drinks or
equivalent per week Comment: 3/16- he drinks 2 beers/night M-F and
weekends 6 pack and 1/2 bottle of whiskey - on each weekend day
NATURAL LANGUAGE PROCESSING PIPELINE
▪ Unstructured Information Management Architecture (UIMA)
– Java-based framework for the analysis of unstructured content like text, video, and audio data
– Originally developed by IBM and later open sourced
▪ NLP Engine: Apache clinical Text and Knowledge Extraction System (cTAKES)
– Built on Apache UIMA (Unstructured Information Management Architecture)
– Modular, portable (Java), open-source
Clinical Document → Annotators → XMI File
cTAKES PIPELINE
Boundary Detection: … The patient underwent a CT scan in April which did not reveal lesions in his liver. …
Tokenization: The patient underwent a CT scan in April which did not reveal lesions in his liver .
Part-of-speech Tagging: DT NN VBD DT NN NN IN NNP WDT VBD RB VB NNS IN PRP$ NN .
Normalization: - - undergo - - - - - - do - - lesion - - -
Courtesy of Guergana Savova, PhD. Boston Children’s Hospital/Harvard Medical School
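The first two stages above (boundary detection, tokenization) can be sketched in a few lines of Python. This is a toy illustration only, not cTAKES code: cTAKES is Java-based, and the regular expressions and the second sentence of the sample note are assumptions made up for this example.

```python
import re

def split_sentences(text):
    """Naive boundary detection: split after ., ! or ? followed by whitespace."""
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    return [p for p in parts if p]

def tokenize(sentence):
    """Naive tokenization: runs of word characters (allowing internal
    hyphens/apostrophes) or single punctuation marks."""
    return re.findall(r"\w+(?:[-']\w+)*|[^\w\s]", sentence)

# First sentence is the slide's example; the second is invented for illustration.
note = ("The patient underwent a CT scan in April which did not reveal "
        "lesions in his liver. No further imaging was ordered.")

sentences = split_sentences(note)
tokens = tokenize(sentences[0])
print(tokens)
```

Note that the final period becomes its own token, matching the tokenized line on the slide.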
NAMED ENTITY RECOGNITION
https://ctakes.apache.org/
NORMALIZE TO DOMAIN ONTOLOGIES
▪ Map synonyms to the same Concept Unique Identifier (CUI)
▪ C0572070 → ‘heroin overdose’
▪ Atoms/concept relations: overdose heroin, diamorphine overdose, intentional self poisoning by and exposure to narcotic drugs
Entity Recognition
CT scan | Procedure | UMLS ID: C0040405
Lesion | Disease / Disorder | UMLS ID: C0022198
Liver | Anatomy | UMLS ID: C0023884
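Mapping synonyms to a shared CUI can be pictured as a dictionary lookup. The sketch below is a minimal Python illustration, not the cTAKES implementation: the tiny lexicon reuses only the CUIs shown on these slides, the plural "lesions" is added by hand (an assumption, since no lemmatizer is used here), and the function name `find_concepts` is hypothetical.

```python
import re

# Toy lexicon built from the CUIs on the slides (surface form -> CUI).
SYNONYM_TO_CUI = {
    "ct scan": "C0040405",
    "lesion": "C0022198",
    "lesions": "C0022198",          # plural added by hand (assumption)
    "liver": "C0023884",
    "heroin overdose": "C0572070",
    "overdose heroin": "C0572070",
    "diamorphine overdose": "C0572070",
}

# Entity types as listed on the Entity Recognition slide.
CUI_TO_TYPE = {
    "C0040405": "Procedure",
    "C0022198": "Disease / Disorder",
    "C0023884": "Anatomy",
}

def find_concepts(text, max_words=3):
    """Greedy longest-match lookup of word n-grams against the lexicon."""
    words = re.findall(r"\w+", text.lower())
    found = []
    i = 0
    while i < len(words):
        for n in range(min(max_words, len(words) - i), 0, -1):
            phrase = " ".join(words[i:i + n])
            cui = SYNONYM_TO_CUI.get(phrase)
            if cui:
                found.append((phrase, cui, CUI_TO_TYPE.get(cui)))
                i += n
                break
        else:
            i += 1  # no lexicon entry starts at this word
    return found

sentence = ("The patient underwent a CT scan in April which did not "
            "reveal lesions in his liver.")
concepts = find_concepts(sentence)
print(concepts)
```

Different surface forms such as "diamorphine overdose" and "overdose heroin" resolve to the same CUI, which is the point of normalizing to an ontology.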
PRE-PROCESSING NOTES
Dligach D et al. J Biomed Inform. 2020
NLP ARCHITECTURE AT SCALE
Afshar et al. J Amer Med Inform Assoc. 2019; AMIA Knowledge Center – August 8, 2019 Webinar
CONVERSION TO STANDARDIZED MEDICAL VOCABULARY
Data Standard Frequency (%)
LOINC 17,024,130,269 (45.1%)
SNOMEDct_US 11,233,587,817 (29.8%)
MeSH 5,752,874,568 (15.3%)
RxNORM 1,580,300,288 (4.2%)
ICD-9-CM 818,290,694 (2.2%)
ICD-10-CM 526,588,528 (1.4%)
CPT 599,669,981 (1.5%)
HCPT 186,444,461 (0.5%)
Total CUIs: 37,721,886,606
Clinical documents Frequency (%)
Clinical Notes (admission, daily progress, discharge, nursing, ancillary) 69,671,392 (83.8%)
Laboratory result reports 9,555,249 (11.5%)
Radiology reports 2,844,433 (3.5%)
Pathology reports 575,966 (0.7%)
Electrocardiogram reports 351,446 (0.4%)
Cardiology-related reports 113,164 (0.1%)
Unlabeled documents 32,704 (<0.1%)
Afshar et al. J Amer Med Inform Assoc. 2019
APPLICATION FOR 30-DAY MORTALITY PREDICTION
▪ 30M notes and 1.3M unigrams
▪ 100K unique CUIs
▪ Among top CUI features:
– Beta = 3.1 |c0012306| Hydromorphone
– Beta = 1.4 |c0026549| Morphine
Afshar et al. J Amer Med Inform Assoc. 2019
EARLY DETECTION OF RESPIRATORY FAILURE
[Figure: bar chart; y-axis: Percent (0 to 50); x-axis: ICU Day 1, ICU Day 2; p=0.439]
• Consecutive ICU admissions across 3 years:
• ARDS mentioned in the notes of 31% of ARDS-confirmed cases
• ~30% not receiving lung protective ventilation across first 2 days
LUNG SAFE. JAMA 2016; Needham et al. AJRCCM 2014.
KEYWORD RULE-BASED ALGORITHMS FOR RESPIRATORY FAILURE
Afshar et al. J AMIA Annu Symp Proc. 2018
MODEL DEVELOPMENT
Supervised Learning
Assertion Classifier
Free text
Learner Predictor
Training Dataset 80% = 446
Testing Dataset 20% = 112
Feature Engineering
• 533 patients, 162,440 radiology reports and clinical notes
• 9,255 radiology reports
• 1,704 radiology reports within 24 hours of qualifying oxygen ratio
• 6,026 unique words (unigrams)
• 1,774 Concept Unique Identifiers
• Accounting for negation (no edema vs. edema)
Afshar et al. J AMIA Annu Symp Proc. 2018
ARDS YES/NO
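One way to make unigram features account for negation ("no edema" vs. "edema") is to prefix tokens that follow a negation trigger. The Python sketch below is a generic window-based approach and an assumption, not the method used in the cited study: the trigger list, the three-token scope, and the `NEG_` prefix are all hypothetical choices.

```python
import re
from collections import Counter

NEGATION_TRIGGERS = {"no", "not", "without", "denies"}  # hypothetical short list
SCOPE = 3  # hypothetical: number of following tokens treated as negated

def negation_aware_unigrams(text):
    """Count unigrams, prefixing tokens inside a negation scope with NEG_."""
    tokens = re.findall(r"\w+|[^\w\s]", text.lower())
    features = Counter()
    remaining = 0  # tokens left in the current negation scope
    for tok in tokens:
        if not tok[0].isalnum():      # punctuation closes the scope
            remaining = 0
            continue
        if tok in NEGATION_TRIGGERS:  # trigger opens a new scope
            remaining = SCOPE
            continue
        if remaining > 0:
            features["NEG_" + tok] += 1
            remaining -= 1
        else:
            features[tok] += 1
    return features

features = negation_aware_unigrams("No edema. Bilateral infiltrates with edema.")
print(features)
```

With this encoding a classifier sees `NEG_edema` and `edema` as separate features, so a negated finding does not count as evidence for the finding.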
MODEL DEVELOPMENT AND INTERNAL VALIDATION
Feature Engineering | Model/Machine Learning Classifier | NPV | PPV | ROC AUC
Text with keyword | Rule-based system | 89% (81%-97%) | 42% (28%-56%) | NA
Unigrams (24hr) | Support Vector Machine | 82% (74%-90%) | 55% (43%-77%) | 0.73 (0.65-0.85)
Unigrams (All time) | Support Vector Machine | 85% (78%-93%) | 63% (43%-82%) | 0.81 (0.72-0.91)
• 533 patients, 162,440 notes → 9,255 radiology reports → 1,704 radiology reports w/n 24hrs
Afshar et al. J AMIA Annu Symp Proc. 2018
TOP FEATURES AND EXTERNAL VALIDATION
Positive CUI Features
• Respiratory Distress Syndrome, Adult
• Malaise
• Fluid overload
• Aspiration-action
• Both lungs
• Communicable diseases
• Pulmonary edema
• Infiltration
• On ventilator
• Disease progression
Negative CUI Features
• Edema
• Plain chest x-ray
• Pneumothorax
• Tracheal extubation
• Chronic obstructive airway disease
• Atelectasis
• Widened mediastinum
• Comminuted fracture type
Text vs. CUI-based models
Mayampurath A. et al. Crit Care Med. 2020; https://github.com/AfsharJoyceInfoLab/ARDS_Classifier
ERA OF DEEP LEARNING AND LESS FEATURE ENGINEERING
Dligach D. et al. J Amer Med Inform Assoc. 2019
Playground.tensorflow.org
Feature Engineering
Supervised Learning (Reference: EHR)
Assertion Classifier
Concept unique identifiers vs. Character-based vs. words (n-grams)
Learner Predictor
Training/Validation (80%)
Testing (20%)
Opioid Misuse: YES / NO
Heroin, Cocaine, Injection drug, Overdose, Inhale, Asthma, Breath
CUI embeddings into Convolutional Neural Network
C0011892, C0009170, C0556406, C0029944, C0004048, C0004096, C0225386
Sharma et al. BMC Med Inform Dec Making 2020
SCREENING FOR OPIOID MISUSE
COMPARISON OF MODEL
• Rule-based approach
– (+) urine drug screen opioid and
  • illicit drug
  • Nonmedical benzodiazepine Rx
  • Nonmedical amphetamine Rx
– (+) urine drug screen opioid only and no Rx
– ICD-9/10 codes for opioid poisoning or intoxication
Sharma et al. BMC Med Inform Dec Making 2020
PREDICTIVE VALIDITY METRICS
Model | ROC AUC (95% CI) | F1 | Sensitivity % (95% CI) | Specificity % (95% CI) | PPV % (95% CI) | NPV % (95% CI)
Rule-Based Keyword Approach | NA | 76 | 87 (76-94) | 79 (71-86) | 68 (57-78) | 92 (85-96)
Logistic Regression - CUIs | 91 (86-95) | 79 | 71 (58-81) | 95 (90-98) | 89 (77-96) | 86 (80-91)
Convolutional neural network - CUIs | 93 (90-97) | 81 | 79 (68-88) | 91 (85-95) | 82 (70-90) | 89 (83-94)
Convolutional neural network - words | 94 (91-98) | 84 | 75 (63-85) | 98 (93-100) | 94 (85-99) | 88 (82-93)
Sharma et al. BMC Med Inform Dec Making 2020
EXTERNAL VALIDATION
• First 24 hrs:
– Sensitivity and specificity of 0.75 (0.71-0.78) and 0.99 (0.99-0.99)
– PPV and NPV of 0.61 (0.57-0.64) and 0.99 (0.99-0.99)
• ~2 alerts per day for every 100 patients
• 1 in every 1.6 alerts would be a true positive
• Number needed to evaluate of 1.6
Rush University Medical Center – Manuscript under review
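The "number needed to evaluate" reported above is the reciprocal of PPV: 1 / 0.61 ≈ 1.6 alerts per true positive. The Python sketch below shows that arithmetic alongside the standard confusion-matrix definitions of the four metrics. The function names and the small counts used to exercise `validity_metrics` are hypothetical and are not data from the study.

```python
def validity_metrics(tp, fp, tn, fn):
    """Standard definitions from a 2x2 confusion matrix."""
    return {
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "ppv": tp / (tp + fp),
        "npv": tn / (tn + fn),
    }

def number_needed_to_evaluate(ppv):
    """Alerts a clinician must review, on average, to find one true positive."""
    if not 0 < ppv <= 1:
        raise ValueError("ppv must be in (0, 1]")
    return 1.0 / ppv

# PPV reported on the slide for the first 24 hours.
nne = number_needed_to_evaluate(0.61)
print(round(nne, 1))

# Hypothetical counts, only to show the function in use.
example = validity_metrics(tp=3, fp=1, tn=5, fn=1)
print(example)
```

At roughly 2 alerts per day per 100 patients, a PPV of 0.61 implies about 1.2 true-positive alerts per day per 100 patients.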
UW HEALTH ADVANCED ANALYTICS
Courtesy of Frank Liao, PhD, Director of Data Science & Advanced Analytics, Enterprise Analytics at UW Health
QUASI-EXPERIMENTAL STUDY
ClinicalTrials.gov Identifier: NCT03833804 Data-driven Identification for Substance Misuse (NIH/NIDA R01 DA051464)
ERA OF DEEPER LEARNING WITH PRE-TRAINED LANGUAGE MODELS
https://tinyurl.com/NAACLTransfer
• General-domain language models vs. domain-specific pre-training
• Bidirectional Encoder Representations from Transformers (BERT) is the first end-to-end representation model that achieves state-of-the-art performance on a large suite of sentence-level and token-level tasks, outperforming many task-specific architectures
EVOLUTION OF NLP LANGUAGE MODELS
Microsoft.com
GOOGLE BERT LANGUAGE MODEL IN PRACTICE
https://github.com/google-research/bert
FEATURE EXTRACTION
Dligach D, et al. Journal of American Medical Informatics Association, 2019
MASKED LANGUAGE MODELING VS BILLING CODE PREDICTION
Self-supervised pre-training is competitive with using billing codes for supervision
Dligach D, et al. Journal of Biomedical Informatics, 2020.
Dligach D, Afshar M, Miller T. Journal of American Medical Informatics Association, 2019
SELF-SUPERVISED WITH MASKED LANGUAGE MODEL
Dligach D, et al. Journal of Biomedical Informatics, 2020.
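Masked language modeling is self-supervised because the training labels come from the text itself: some tokens are hidden and the model is asked to recover them. The Python sketch below shows only how such input/label pairs could be built. It is a simplified, BERT-style illustration and an assumption, not the procedure from the cited paper: the 15% rate follows the original BERT recipe, while the `mask_tokens` helper and fixed seed are hypothetical.

```python
import random

MASK = "[MASK]"

def mask_tokens(tokens, mask_rate=0.15, seed=0):
    """Hide a random subset of tokens; the hidden tokens become the labels."""
    if not tokens:
        return [], []
    rng = random.Random(seed)  # fixed seed so the example is repeatable
    k = max(1, round(mask_rate * len(tokens)))
    positions = set(rng.sample(range(len(tokens)), k))
    inputs = [MASK if i in positions else t for i, t in enumerate(tokens)]
    labels = [t if i in positions else None for i, t in enumerate(tokens)]
    return inputs, labels

tokens = ("The patient underwent a CT scan in April which did not "
          "reveal lesions in his liver .").split()
inputs, labels = mask_tokens(tokens)
print(inputs)
print(labels)
```

No manual annotation or billing code is needed to produce these labels, which is what makes the comparison with billing-code supervision meaningful.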
LEARNING HIERARCHICAL TRANSFORMER-BASED REPRESENTATIONS OF CLINICAL NOTES
Systems | Test Average Macro F1 Score
Our hierarchical model | 0.743
The best performing domain-specific pre-trained model | 0.755
The best performing model from the challenge | 0.675
Xu S. AMIA Annual Symposium Poster Presentation, 2020.
BIOMEDICAL LANGUAGE UNDERSTANDING AND REASONING BENCHMARK (BLURB)
https://microsoft.github.io/BLURB/
EHR DEPLOYMENT CONSIDERATIONS
• Calibration and Model Updating
• Fair and Equitable/Bias Checks
• Interpretability and Explainability of Algorithms
• Workload
DOCTOR GPT-3: HYPE OR REALITY?
https://www.nabla.com/blog/gpt-3/
CALIBRATION DRIFT
DECISION CURVE ANALYSIS
PEth = phosphatidylethanol; BAC = blood alcohol concentration. Decision curve analysis was applied to examine the net
benefit of the best derived biomarker against BAC.
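Net benefit in decision curve analysis is commonly computed as TP/n − (FP/n) × pt/(1 − pt), where pt is the threshold probability. The Python sketch below implements that standard formula together with the "treat all" reference strategy. The function names and the four-patient toy data are hypothetical and unrelated to the PEth/BAC analysis on the slide.

```python
def net_benefit(y_true, y_prob, threshold):
    """Net benefit of treating patients whose predicted risk >= threshold."""
    if not 0 < threshold < 1:
        raise ValueError("threshold must be in (0, 1)")
    n = len(y_true)
    tp = sum(1 for y, p in zip(y_true, y_prob) if p >= threshold and y == 1)
    fp = sum(1 for y, p in zip(y_true, y_prob) if p >= threshold and y == 0)
    return tp / n - (fp / n) * threshold / (1 - threshold)

def net_benefit_treat_all(y_true, threshold):
    """Reference strategy: treat every patient regardless of the model."""
    prevalence = sum(y_true) / len(y_true)
    return prevalence - (1 - prevalence) * threshold / (1 - threshold)

# Hypothetical outcomes (1 = event) and predicted probabilities.
y_true = [1, 1, 0, 0]
y_prob = [0.9, 0.4, 0.6, 0.1]

for pt in (0.2, 0.5):
    print(pt, net_benefit(y_true, y_prob, pt), net_benefit_treat_all(y_true, pt))
```

A decision curve plots these values across a range of thresholds; a marker or model is useful at a given threshold when its net benefit exceeds both "treat all" and "treat none" (net benefit 0).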
BIAS ASSESSMENT
MODEL-AGNOSTIC INTERPRETATION METHODS
• Partial Dependence Plot
• Individual Conditional Expectation (ICE)
• Accumulated Local Effects
• Feature interaction
• Permutation Feature Importance
• Global Surrogate
• Local Surrogate (LIME)
• Scoped Rules (Anchors)
• Shapley Values
https://christophm.github.io/interpretable-ml-book/
Local Interpretable Model-Agnostic Explanations (LIME)
Kulshrestha S. et al. Manuscript submitted
• Prediction of severe chest injury from first 8 hours of notes during trauma encounter: Top Features from CUI-embedded CNN
Global Local
CAREER DEVELOPMENT
• NIH/NIAAA F32 or equivalent
– Ruth L. Kirschstein National Research Service Fellowship
– 1-year during clinical fellowship
• NIH/NIAAA Loan Repayment Program
– Entering 5th year
– LRP Ambassador
– Up to $35,000/yr towards education debt
CAREER DEVELOPMENT
• NIH/NIAAA K23 Career Development Award or Equivalent
– Year 4/5
– Protects 75% effort
– Begin building lab/team
• NIH/NIDA R01 Independent Researcher Award
– Year 1/5
– Budget to support our lab/team
• Study Section Reviewer: AHRQ, DOD, NIH
– Early Career Review Program
UW-Madison Critical Care Medicine Data Science Lab
LAB
• Matthew Churpek, MD, MPH, PhD (co-lead)
• Madeline Oguss, MPH
• Meysam Ghaffari, PhD
• Azi Bashiri, PhD
• John Caskey, PhD
• Alex Spicer, MS
• Kyle Carey, MS
COLLABORATORS
• Dmitriy Dligach, PhD
• Cara Joyce, PhD
• Anoop Mayampurath, PhD
• Niranjan Karnik, MD, PhD
• Hale Thompson, PhD
• Brihat Sharma, PhD
• David Gustafson, PhD
• Randy Brown, MD, PhD
Twitter: @UW_ICU_DataSci