clinical natural language processing for health …

CLINICAL NATURAL LANGUAGE PROCESSING

FOR HEALTH OUTCOMES RESEARCH:

Current Applications and Future State for Patient Care

Majid Afshar, MD, MSCR

Assistant Professor

Division of Allergy, Pulmonary and Critical Care Medicine

Department of Medicine

CONTENT

• Clinical NLP
– Current research: machine learning/data-driven methods

• Language models
– Future research
– Future applications

• Model Deployment

• Career Development through NIH

DISCLOSURE

• None

NLP TASKS

• Sentiment analysis: is a text positive or negative?

• Text generation: provide a prompt and the model generates what follows.

• Named entity recognition (NER): label each word in an input sentence with the entity it represents (anatomy, symptom, medication, etc.).

• Question answering: given some context and a question, extract the answer from the context.

• Filling masked text: given a text with masked words, fill in the blanks.

• Summarization: generate a summary of a long text.

• Translation: translate a text into another language.

• Feature extraction: return a tensor representation of the text.

• Relation extraction: identify semantic relationships between two or more entities.

CLINICAL DOCUMENTS IN THE EHR

EXAMPLE #1:

Social History:

- Previous smoker - Heavy alcohol use. ( Pint of bourbon daily) - No drugs

EXAMPLE #2:

HISTORY OF PRESENT ILLNESS: …past medical history of seizures from

"quitting beer" who presents to the ER by paramedics after being found down

from an apparent fall down the stairs.

• lorazepam 1 mg inj 1 mg

intravenous every 2 hours as needed

EXAMPLE #3:

• Alcohol use 3.0 - 4.0 oz/weeKS 6 - 8 Standard drinks or

equivalent per week Comment: 3/16- he drinks 2 beers/night M-F and

weekends 6 pack and 1/2 bottle of whiskey - on each weekend day
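The three examples record the same construct, alcohol use, in three different shapes, which is why hand-written patterns cover so little of it. As a minimal sketch, assuming Python's standard `re` module and a hypothetical helper `drinks_per_week_range`, a pattern written for the templated field in Example #3 recovers the structured range there but finds nothing in the free-text wording of Example #1:

```python
import re

# Hypothetical pattern for the templated field seen in Example #3,
# e.g. "6 - 8 Standard drinks or equivalent per week".
DRINKS_PATTERN = re.compile(
    r"(\d+(?:\.\d+)?)\s*-\s*(\d+(?:\.\d+)?)\s*standard\s+drinks",
    re.IGNORECASE,
)

def drinks_per_week_range(note):
    """Return (low, high) standard drinks per week, or None if the pattern is absent."""
    match = DRINKS_PATTERN.search(note)
    if match is None:
        return None
    return float(match.group(1)), float(match.group(2))

example_1 = "- Previous smoker - Heavy alcohol use. ( Pint of bourbon daily) - No drugs"
example_3 = ("Alcohol use 3.0 - 4.0 oz/weeKS 6 - 8 Standard drinks or "
             "equivalent per week Comment: 3/16- he drinks 2 beers/night M-F")

print(drinks_per_week_range(example_3))  # (6.0, 8.0)
print(drinks_per_week_range(example_1))  # None: the free-text quantity is missed
```

Every new phrasing ("Pint of bourbon daily", "2 beers/night") would need its own rule.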

NATURAL LANGUAGE PROCESSING PIPELINE

▪ Unstructured Information Management Architecture (UIMA)
– Java-based framework for the analysis of unstructured content such as text, video, and audio data
– Originally developed by IBM and later open-sourced

▪ NLP engine: Apache clinical Text Analysis and Knowledge Extraction System (cTAKES)
– Built on Apache UIMA (Unstructured Information Management Architecture)
– Modular, portable (Java), open-source

Clinical Document → Annotators → XMI File

cTAKES PIPELINE

Boundary detection: … The patient underwent a CT scan in April which did not reveal lesions in his liver. …

Tokenization: The patient underwent a CT scan in April which did not reveal lesions in his liver .

Part-of-speech tagging: DT NN VBD DT NN NN IN NNP WDT VBD RB VB NNS IN PRP$ NN .

Normalization: - - undergo - - - - - - do - - lesion - - -

Courtesy of Guergana Savova, PhD. Boston Children’s Hospital/Harvard Medical School
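The pipeline stages above can be approximated with simple rules. A minimal sketch, assuming regular-expression splitting and a hypothetical three-entry lemma table (`LEMMAS`); this is not how cTAKES implements its annotators, and part-of-speech tagging is left out because it needs a trained model:

```python
import re

# Hypothetical lemma table standing in for a real morphological normalizer.
LEMMAS = {"underwent": "undergo", "did": "do", "lesions": "lesion"}

def split_sentences(text):
    """Naive boundary detection: split after '.', '!' or '?' followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]

def tokenize(sentence):
    """Separate words from punctuation, e.g. 'liver.' -> 'liver', '.'."""
    return re.findall(r"\w+|[^\w\s]", sentence)

def normalize(tokens):
    """Map inflected forms to a base form; '-' marks tokens left unchanged."""
    return [LEMMAS.get(tok.lower(), "-") for tok in tokens]

text = "The patient underwent a CT scan in April which did not reveal lesions in his liver."
for sentence in split_sentences(text):
    tokens = tokenize(sentence)
    print(" ".join(tokens))
    print(" ".join(normalize(tokens)))
```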

NAMED ENTITY RECOGNITION

https://ctakes.apache.org/

NORMALIZE TO DOMAIN ONTOLOGIES

▪ Map synonyms to the same Concept Unique Identifier (CUI)

▪ C0572070 → ‘heroin overdose’

▪ Atoms/concept relations: overdose heroin, diamorphine overdose, intentional self poisoning by and exposure to narcotic drugs

Entity Recognition

CT scan: Procedure (UMLS ID: C0040405)

Lesion: Disease/Disorder (UMLS ID: C0022198)

Liver: Anatomy (UMLS ID: C0023884)
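Normalization amounts to mapping every synonym onto one identifier. A minimal sketch, assuming a hypothetical in-memory dictionary (`DICTIONARY`) built from the CUIs on these slides and greedy longest-match lookup; cTAKES's dictionary lookup and the full UMLS rely on much larger resources:

```python
import re

# Hypothetical mini-dictionary: surface string -> CUI. The CUIs are the ones on
# the slides; the synonym strings are illustrative, not a UMLS extract.
DICTIONARY = {
    "ct scan": "C0040405",
    "lesion": "C0022198",
    "lesions": "C0022198",
    "liver": "C0023884",
    "heroin overdose": "C0572070",
    "overdose heroin": "C0572070",
    "diamorphine overdose": "C0572070",
}
# Semantic groups as shown on the Entity Recognition slide.
SEMANTIC_GROUP = {"C0040405": "Procedure", "C0022198": "Disease/Disorder", "C0023884": "Anatomy"}
MAX_WORDS = max(len(term.split()) for term in DICTIONARY)

def find_concepts(text):
    """Greedy longest-match lookup; returns (matched text, CUI, semantic group) tuples."""
    words = re.findall(r"\w+", text.lower())
    found, i = [], 0
    while i < len(words):
        for n in range(min(MAX_WORDS, len(words) - i), 0, -1):
            span = " ".join(words[i:i + n])
            if span in DICTIONARY:
                cui = DICTIONARY[span]
                found.append((span, cui, SEMANTIC_GROUP.get(cui, "unspecified")))
                i += n
                break
        else:
            i += 1  # no dictionary entry starts at this word
    return found

sentence = "The patient underwent a CT scan in April which did not reveal lesions in his liver."
for match in find_concepts(sentence):
    print(match)
# Different synonyms collapse to one identifier.
print({cui for _, cui, _ in find_concepts("heroin overdose; diamorphine overdose")})
```

Note that "lesions" is returned even though the sentence says the CT scan did not reveal lesions: dictionary lookup alone has no notion of negation.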

PRE-PROCESSING NOTES

Dligach D et al. J Biomed Inform. 2020

NLP ARCHITECTURE AT SCALE

Afshar et al. J Amer Med Inform Assoc. 2019; AMIA Knowledge Center – August 8, 2019 Webinar

CONVERSION TO STANDARDIZED MEDICAL VOCABULARY

Data Standard Frequency (%)

LOINC 17,024,130,269 (45.1%)

SNOMEDct_US 11,233,587,817 (29.8%)

MeSH 5,752,874,568 (15.3%)

RxNORM 1,580,300,288 (4.2%)

ICD-9-CM 818,290,694 (2.2%)

ICD-10-CM 526,588,528 (1.4%)

CPT 599,669,981 (1.5%)

HCPT 186,444,461 (0.5%)

Total CUIs: 37,721,886,606

Clinical documents Frequency (%)

Clinical notes (admission, daily progress, discharge, nursing, ancillary) 69,671,392 (83.8%)

Laboratory result reports 9,555,249 (11.5%)

Radiology reports 2,844,433 (3.5%)

Pathology reports 575,966 (0.7%)

Electrocardiogram reports 351,446 (0.4%)

Cardiology-related reports 113,164 (0.1%)

Unlabeled documents 32,704 (<0.1%)

Afshar et al. J Amer Med Inform Assoc. 2019

APPLICATION FOR 30-DAY MORTALITY PREDICTION

▪ 30M notes and 1.3M unigrams
▪ 100K unique CUIs
▪ Among top CUI features:
– Beta = 3.1: C0012306 (Hydromorphone)
– Beta = 1.4: C0026549 (Morphine)

Afshar et al. J Amer Med Inform Assoc. 2019

EARLY DETECTION OF RESPIRATORY FAILURE

[Figure: percent by ICU day (ICU Day 1 vs. ICU Day 2); y-axis 0 to 50%; p = 0.439]

• Consecutive ICU admissions across 3 years:

• ARDS was mentioned in the notes of 31% of confirmed ARDS cases

• ~30% did not receive lung-protective ventilation during the first 2 days

LUNG SAFE. JAMA 2016; Needham et al. AJRCC 2014.

KEYWORD RULE-BASED ALGORITHMS FOR RESPIRATORY FAILURE

Afshar et al. AMIA Annu Symp Proc. 2018
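A keyword rule flags a document whenever any listed term appears. A minimal sketch, assuming a hypothetical three-term list (`ARDS_KEYWORDS`), not the terms used in the cited algorithm:

```python
import re

# Hypothetical keyword list; the published algorithm's terms are not reproduced here.
ARDS_KEYWORDS = ["ards", "acute respiratory distress syndrome", "bilateral infiltrates"]

# Word boundaries keep "ards" from matching inside words such as "towards".
KEYWORD_PATTERN = re.compile(
    r"\b(?:" + "|".join(re.escape(k) for k in ARDS_KEYWORDS) + r")\b",
    re.IGNORECASE,
)

def keyword_flag(note):
    """Return True if any keyword occurs anywhere in the note."""
    return KEYWORD_PATTERN.search(note) is not None

notes = [
    "CXR: bilateral infiltrates concerning for ARDS.",
    "Weaning towards extubation.",
    "No evidence of ARDS.",
]
print([keyword_flag(n) for n in notes])  # [True, False, True]
```

The third note is flagged even though it rules ARDS out, one way a keyword rule accumulates false positives.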

MODEL DEVELOPMENT

[Diagram: supervised learning assertion classifier. Free text → feature engineering → learner (training dataset: 80% = 446) → predictor (testing dataset: 20% = 112) → ARDS YES/NO]

• 533 patients; 162,440 radiology reports and clinical notes
– 9,255 radiology reports
– 1,704 radiology reports within 24 hours of the qualifying oxygen ratio

• 6,026 unique words (unigrams); 1,774 Concept Unique Identifiers (CUIs)

• Accounting for negation (no edema vs. edema)
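The negation bullet is what separates an assertion classifier from a plain keyword match. A minimal sketch of a simplified NegEx-style heuristic, assuming a hypothetical trigger list and a five-word look-back window; it is not the cTAKES assertion module:

```python
import re

# Hypothetical triggers and window; NegEx itself uses a larger curated lexicon
# that also covers post-negation and pseudo-negation phrases.
NEGATION_TRIGGERS = {"no", "not", "without", "denies", "negative for"}
WINDOW = 5  # number of words before the concept that are searched for a trigger

def is_negated(sentence, concept):
    """True if a trigger appears within WINDOW words before the first mention of concept."""
    words = re.findall(r"\w+", sentence.lower())
    target = concept.lower().split()
    for i in range(len(words) - len(target) + 1):
        if words[i:i + len(target)] == target:
            preceding = words[max(0, i - WINDOW):i]
            # Two-word triggers such as "negative for" are checked as bigrams.
            bigrams = {" ".join(pair) for pair in zip(preceding, preceding[1:])}
            return bool(NEGATION_TRIGGERS & (set(preceding) | bigrams))
    raise ValueError(f"{concept!r} not found in sentence")

print(is_negated("No edema.", "edema"))                        # True
print(is_negated("Diffuse edema in both lungs.", "edema"))     # False
print(is_negated("Chest x-ray negative for edema.", "edema"))  # True
```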

MODEL DEVELOPMENT AND INTERNAL VALIDATION

Feature engineering | Model/machine learning classifier | NPV | PPV | ROC AUC
Text with keyword | Rule-based system | 89% (81%-97%) | 42% (28%-56%) | NA
Unigrams (24 hr) | Support vector machine | 82% (74%-90%) | 55% (43%-77%) | 0.73 (0.65-0.85)
Unigrams (all time) | Support vector machine | 85% (78%-93%) | 63% (43%-82%) | 0.81 (0.72-0.91)
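The rule-based row presumably reports NA because a rule emits only YES or NO, giving a single operating point rather than a ranking. For a scored model such as the SVM, ROC AUC equals the probability that a randomly chosen positive case scores higher than a randomly chosen negative one. A minimal sketch of that pairwise (Mann-Whitney) computation, using made-up decision scores:

```python
def roc_auc(labels, scores):
    """ROC AUC as P(random positive outscores random negative); ties count as half.

    Compares every positive with every negative, O(P*N), which is fine for a sketch.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))

# Made-up SVM decision scores for six reports (1 = ARDS confirmed).
labels = [1, 1, 1, 0, 0, 0]
scores = [2.1, 0.4, -0.3, 0.1, -0.5, -1.2]
print(roc_auc(labels, scores))  # 8 of 9 positive-negative pairs ranked correctly
```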

• 533 patients, 162,440 notes → 9,255 radiology reports → 1,704 radiology reports within 24 hours


TOP FEATURES AND EXTERNAL VALIDATION

Positive CUI features: Respiratory Distress Syndrome, Adult; Malaise; Fluid overload; Aspiration-action; Both lungs; Communicable diseases; Pulmonary edema; Infiltration; On ventilator; Disease progression

Negative CUI features: Edema; Plain chest x-ray; Pneumothorax; Tracheal extubation; Chronic obstructive airway disease; Atelectasis; Widened mediastinum; Comminuted fracture type

Text vs. CUI-based models

Mayampurath A. et al. Crit Care Med. 2020; https://github.com/AfsharJoyceInfoLab/ARDS_Classifier

ERA OF DEEP LEARNING AND LESS FEATURE ENGINEERING

Dligach D. et al. J Amer Med Inform Assoc. 2019

Playground.tensorflow.org

[Diagram: supervised learning assertion classifier (reference: EHR). Feature engineering compares concept unique identifiers vs. character-based vs. words (n-grams) → learner trained on training/validation data (80%) → predictor evaluated on testing data (20%) → YES/NO]

[Figure: example terms: Opioid misuse; Heroin; Cocaine; Injection drug; Overdose; Inhale; Asthma; Breath]

CUI embeddings into Convolutional Neural Network

C0011892, C0009170, C0556406, C0029944, C0004048, C0004096, C0225386

Sharma et al. BMC Med Inform Dec Making 2020
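The CNN's data flow can be sketched as embedding lookup, one-dimensional convolution over the CUI sequence, max-pooling, and a sigmoid output. A minimal forward pass in plain Python, assuming toy sizes and randomly initialized, untrained weights (`EMBED_DIM`, `FILTER_WIDTH` and `N_FILTERS` are hypothetical); the published model's architecture and hyperparameters are not reproduced here:

```python
import math
import random

random.seed(0)
EMBED_DIM, FILTER_WIDTH, N_FILTERS = 4, 2, 3  # hypothetical toy sizes

# Randomly initialized (untrained) embeddings for some CUIs from the slide.
CUIS = ["C0011892", "C0009170", "C0029944", "C0004096"]
EMBEDDING = {c: [random.gauss(0, 1) for _ in range(EMBED_DIM)] for c in CUIS}

# Each filter spans FILTER_WIDTH consecutive CUI vectors.
FILTERS = [[[random.gauss(0, 0.5) for _ in range(EMBED_DIM)] for _ in range(FILTER_WIDTH)]
           for _ in range(N_FILTERS)]
BIASES = [0.0] * N_FILTERS
OUT_WEIGHTS = [random.gauss(0, 0.5) for _ in range(N_FILTERS)]

def conv_maxpool(sequence):
    """1-D convolution over the CUI sequence, ReLU, then max-pooling over positions."""
    vectors = [EMBEDDING[c] for c in sequence]
    pooled = []
    for filt, bias in zip(FILTERS, BIASES):
        activations = []
        for start in range(len(vectors) - FILTER_WIDTH + 1):
            window = vectors[start:start + FILTER_WIDTH]
            z = bias + sum(w * x
                           for w_row, x_row in zip(filt, window)
                           for w, x in zip(w_row, x_row))
            activations.append(max(0.0, z))  # ReLU
        pooled.append(max(activations))      # strongest response anywhere in the note
    return pooled

def predict(sequence):
    """Linear layer on the pooled features, squashed by a sigmoid into P(YES)."""
    logit = sum(w * h for w, h in zip(OUT_WEIGHTS, conv_maxpool(sequence)))
    return 1.0 / (1.0 + math.exp(-logit))

note = ["C0011892", "C0029944", "C0009170"]  # a note reduced to its CUI sequence
print(round(predict(note), 3))
```

In practice the embeddings, filters and output weights are learned by backpropagation in a deep learning framework. The sequence must contain at least `FILTER_WIDTH` CUIs for the convolution to produce any output.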

SCREENING FOR OPIOID MISUSE

COMPARISON OF MODELS

• Rule-based approach
– (+) urine drug screen for opioid and illicit drug
– Nonmedical benzodiazepine Rx
– Nonmedical amphetamine Rx
– (+) urine drug screen for opioid only and no Rx
– ICD-9/10 codes for opioid poisoning or intoxication
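The rule-based comparator can be expressed as a disjunction of the criteria above. A minimal sketch, assuming a hypothetical `Encounter` record with one field per criterion and reading "opioid only and no Rx" as a positive opioid screen without an opioid prescription; the ICD prefixes are illustrative opioid-poisoning codes, not the study's code list, and intoxication codes are omitted:

```python
from dataclasses import dataclass, field

# Illustrative opioid-poisoning prefixes (ICD-10-CM T40.0-T40.4, T40.6; ICD-9-CM 965.0x).
# Not the study's code list.
OPIOID_POISONING_PREFIXES = ("T40.0", "T40.1", "T40.2", "T40.3", "T40.4", "T40.6", "965.0")

@dataclass
class Encounter:
    """Hypothetical structured fields for one encounter."""
    uds_opioid_positive: bool = False
    uds_illicit_positive: bool = False
    opioid_rx: bool = False
    nonmedical_benzodiazepine_rx: bool = False
    nonmedical_amphetamine_rx: bool = False
    icd_codes: list = field(default_factory=list)

def rule_based_flag(enc):
    """Flag the encounter if any single criterion is met."""
    return any([
        enc.uds_opioid_positive and enc.uds_illicit_positive,
        enc.nonmedical_benzodiazepine_rx,
        enc.nonmedical_amphetamine_rx,
        enc.uds_opioid_positive and not enc.opioid_rx,
        any(code.startswith(OPIOID_POISONING_PREFIXES) for code in enc.icd_codes),
    ])

print(rule_based_flag(Encounter(uds_opioid_positive=True)))                  # True
print(rule_based_flag(Encounter(uds_opioid_positive=True, opioid_rx=True)))  # False
print(rule_based_flag(Encounter(icd_codes=["T40.1X1A"])))                    # True
```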


PREDICTIVE VALIDITY METRICS

Model | ROC AUC (95% CI) | F1 | Sensitivity % (95% CI) | Specificity % (95% CI) | PPV % (95% CI) | NPV % (95% CI)
Rule-based keyword approach | NA | 76 | 87 (76-94) | 79 (71-86) | 68 (57-78) | 92 (85-96)
Logistic regression - CUIs | 91 (86-95) | 79 | 71 (58-81) | 95 (90-98) | 89 (77-96) | 86 (80-91)
Convolutional neural network - CUIs | 93 (90-97) | 81 | 79 (68-88) | 91 (85-95) | 82 (70-90) | 89 (83-94)
Convolutional neural network - words | 94 (91-98) | 84 | 75 (63-85) | 98 (93-100) | 94 (85-99) | 88 (82-93)
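Each column in the table follows from the confusion matrix, and F1 is the harmonic mean of PPV (precision) and sensitivity (recall), so the F1 column can be checked against two other columns. A minimal sketch, with made-up counts for the confusion-matrix example:

```python
def confusion_metrics(tp, fp, fn, tn):
    """Sensitivity, specificity, PPV, NPV and F1 from confusion-matrix counts."""
    sensitivity = tp / (tp + fn)  # recall
    specificity = tn / (tn + fp)
    ppv = tp / (tp + fp)          # precision
    npv = tn / (tn + fn)
    f1 = 2 * ppv * sensitivity / (ppv + sensitivity)
    return {"sensitivity": sensitivity, "specificity": specificity,
            "ppv": ppv, "npv": npv, "f1": f1}

def f1_from(ppv, sensitivity):
    """F1 as the harmonic mean of PPV and sensitivity."""
    return 2 * ppv * sensitivity / (ppv + sensitivity)

print(confusion_metrics(tp=8, fp=2, fn=2, tn=88))  # made-up counts
# Rule-based row: PPV 68%, sensitivity 87%.
print(round(100 * f1_from(0.68, 0.87)))  # 76
```

The same check gives 79 for logistic regression; for the two CNN rows it lands about one point below the table, consistent with the inputs being rounded.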


EXTERNAL VALIDATION

• First 24 hours:
– Sensitivity 0.75 (0.71-0.78) and specificity 0.99 (0.99-0.99)
– PPV 0.61 (0.57-0.64) and NPV 0.99 (0.99-0.99)

• ~2 alerts per day for every 100 patients

• About 1 in every 1.6 alerts would be a true positive

• Number needed to evaluate (1/PPV): 1.6

Rush University Medical Center – Manuscript under review
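The number needed to evaluate is the reciprocal of PPV, so 1 / 0.61 ≈ 1.64 alerts per true positive. A minimal sketch, with a hypothetical census of 300 patients to translate the alert rate into daily workload; `expected_alerts` is a hypothetical helper:

```python
def number_needed_to_evaluate(ppv):
    """Alerts a clinician reviews per true positive found: NNE = 1 / PPV."""
    return 1.0 / ppv

def expected_alerts(alerts_per_100_patients_per_day, census, ppv):
    """Expected daily alerts and true positives for a given census."""
    alerts = alerts_per_100_patients_per_day * census / 100
    return alerts, alerts * ppv

print(round(number_needed_to_evaluate(0.61), 1))  # 1.6
alerts, true_pos = expected_alerts(2, census=300, ppv=0.61)  # census is hypothetical
print(alerts, round(true_pos, 2))  # 6.0 3.66
```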

UW HEALTH ADVANCED ANALYTICS

Courtesy of Frank Liao, PhD, Director of Data Science & Advanced Analytics, Enterprise Analytics at UW Health

QUASI-EXPERIMENTAL STUDY

ClinicalTrials.gov Identifier: NCT03833804 Data-driven Identification for Substance Misuse (NIH/NIDA R01 DA051464)

ERA OF DEEPER LEARNING WITH PRE-TRAINED LANGUAGE MODELS

https://tinyurl.com/NAACLTransfer

• General-domain language models vs. domain-specific pre-training

• Bidirectional Encoder Representations from Transformers (BERT) is the first fine-tuning-based representation model that achieves state-of-the-art performance on a large suite of sentence-level and token-level tasks, outperforming many task-specific architectures

https://tinyurl.com/NAACLTransfer

EVOLUTION OF NLP LANGUAGE MODELS

Microsoft.com

GOOGLE BERT LANGUAGE MODEL IN PRACTICE

https://github.com/google-research/bert

FEATURE EXTRACTION

Dligach D, et al. Journal of the American Medical Informatics Association, 2019

MASKED LANGUAGE MODELING VS BILLING CODE PREDICTION

Self-supervised pre-training is competitive with using billing codes for supervision

Dligach D, et al. Journal of Biomedical Informatics, 2020.

Dligach D, Afshar M, Miller T. Journal of the American Medical Informatics Association, 2019

SELF-SUPERVISED WITH MASKED LANGUAGE MODEL

Dligach D, et al. Journal of Biomedical Informatics, 2020.
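Masked language modeling creates its own labels from unlabeled notes. A minimal sketch of the masking recipe described in the original BERT paper (select about 15% of positions; replace 80% of those with `[MASK]`, 10% with a random token, and leave 10% unchanged), assuming whitespace tokens rather than BERT's WordPiece vocabulary; `mask_tokens` is a hypothetical helper:

```python
import random

def mask_tokens(tokens, vocab, mask_rate=0.15, seed=0):
    """Return (inputs, labels); labels is None wherever no prediction is required."""
    rng = random.Random(seed)
    inputs, labels = list(tokens), [None] * len(tokens)
    n_targets = max(1, round(mask_rate * len(tokens)))
    for i in rng.sample(range(len(tokens)), n_targets):
        labels[i] = tokens[i]  # the model must recover the original token here
        roll = rng.random()
        if roll < 0.8:
            inputs[i] = "[MASK]"
        elif roll < 0.9:
            inputs[i] = rng.choice(vocab)  # random replacement
        # otherwise the original token is kept as input
    return inputs, labels

tokens = "the patient underwent a ct scan in april which did not reveal lesions".split()
inputs, labels = mask_tokens(tokens, vocab=sorted(set(tokens)))
print(inputs)
print([t for t in labels if t is not None])
```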

LEARNING HIERARCHICAL TRANSFORMER-BASED REPRESENTATIONS OF CLINICAL NOTES

Systems | Test average macro F1 score
Our hierarchical model | 0.743
The best-performing domain-specific pre-trained model | 0.755
The best-performing model from the challenge | 0.675

Xu S. AMIA Annual Symposium Poster Presentation, 2020.
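Macro F1 computes F1 separately for each class and takes the unweighted mean, so rare classes weigh as much as common ones. A minimal sketch with hypothetical labels (the challenge's actual label set is not given here):

```python
def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 over every class seen in either list."""
    classes = sorted(set(y_true) | set(y_pred))
    scores = []
    for c in classes:
        tp = sum(t == c and p == c for t, p in zip(y_true, y_pred))
        fp = sum(t != c and p == c for t, p in zip(y_true, y_pred))
        fn = sum(t == c and p != c for t, p in zip(y_true, y_pred))
        denom = 2 * tp + fp + fn
        scores.append(2 * tp / denom if denom else 0.0)  # F1 = 2TP / (2TP + FP + FN)
    return sum(scores) / len(scores)

y_true = ["yes", "yes", "no", "no", "no", "unknown"]
y_pred = ["yes", "no", "no", "no", "no", "no"]
print(round(macro_f1(y_true, y_pred), 3))  # (0.667 + 0.75 + 0.0) / 3 = 0.472
```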

BIOMEDICAL LANGUAGE UNDERSTANDING AND REASONING BENCHMARK (BLURB)

https://microsoft.github.io/BLURB/

EHR DEPLOYMENT CONSIDERATIONS

• Calibration and Model Updating

• Fair and Equitable/Bias Checks

• Interpretability and Explainability of Algorithms

• Workload

DOCTOR GPT-3: HYPE OR REALITY?

https://www.nabla.com/blog/gpt-3/

CALIBRATION DRIFT
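Calibration drift means predicted risks stop matching observed event rates after deployment, for example as the patient mix changes. One simple monitoring statistic is the observed-to-expected (O/E) ratio per time window. A minimal sketch on hypothetical quarterly data; it is one common check, not necessarily the analysis behind this slide:

```python
def observed_to_expected(outcomes, predicted_probs):
    """Calibration-in-the-large: observed events divided by the sum of predicted risks.

    A ratio near 1.0 means the model predicts about the right number of events;
    a ratio moving away from 1.0 across successive windows suggests drift.
    """
    return sum(outcomes) / sum(predicted_probs)

# Hypothetical quarterly data: identical predicted risks, rising observed events.
windows = {
    "Q1": ([0, 0, 1, 0, 0, 0, 0, 1, 0, 0], [0.2] * 10),
    "Q2": ([0, 1, 1, 0, 0, 1, 0, 1, 0, 0], [0.2] * 10),
}
for name, (outcomes, probs) in windows.items():
    print(name, round(observed_to_expected(outcomes, probs), 2))
```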

DECISION CURVE ANALYSIS

PEth = phosphatidylethanol; BAC = blood alcohol concentration. Decision curve analysis was applied to examine the net benefit of the best derived biomarker against BAC.
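Net benefit at a threshold probability p_t weighs true positives against false positives by the odds p_t / (1 - p_t): NB = TP/n - (FP/n) × p_t / (1 - p_t). A minimal sketch, assuming each marker has already been converted into a predicted probability (raw PEth or BAC concentrations are not probabilities) and using made-up data; a decision curve repeats this over a range of thresholds and compares the result with treat-all and treat-none (net benefit 0):

```python
def net_benefit(y_true, y_prob, threshold):
    """Net benefit of intervening on everyone whose predicted risk is >= threshold."""
    n = len(y_true)
    tp = sum(1 for y, p in zip(y_true, y_prob) if p >= threshold and y == 1)
    fp = sum(1 for y, p in zip(y_true, y_prob) if p >= threshold and y == 0)
    return tp / n - (fp / n) * threshold / (1 - threshold)

def net_benefit_treat_all(y_true, threshold):
    """Reference strategy: intervene on everyone, regardless of the marker."""
    prevalence = sum(y_true) / len(y_true)
    return prevalence - (1 - prevalence) * threshold / (1 - threshold)

# Made-up outcomes and predicted probabilities for ten patients.
y_true = [1, 1, 0, 0, 0, 0, 0, 0, 0, 0]
y_prob = [0.9, 0.6, 0.7, 0.2, 0.1, 0.1, 0.1, 0.1, 0.1, 0.1]
for t in (0.1, 0.3, 0.5):
    print(t, round(net_benefit(y_true, y_prob, t), 3), round(net_benefit_treat_all(y_true, t), 3))
```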

BIAS ASSESSMENT

MODEL-AGNOSTIC INTERPRETATION METHODS

• Partial Dependence Plot
• Individual Conditional Expectation (ICE)
• Accumulated Local Effects
• Feature interaction
• Permutation Feature Importance
• Global Surrogate
• Local Surrogate (LIME)
• Scoped Rules (Anchors)
• Shapley Values

https://christophm.github.io/interpretable-ml-book/
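Permutation feature importance, from the list above, is the simplest of these methods to sketch: shuffle one feature column, re-score the model, and record the drop in performance. A minimal version, assuming accuracy as the metric and a hypothetical two-feature toy model; the trauma example that follows uses LIME, which is not shown here:

```python
import random

def accuracy(model, X, y):
    """Fraction of rows where the model's prediction matches the label."""
    return sum(model(row) == label for row, label in zip(X, y)) / len(y)

def permutation_importance(model, X, y, feature, n_repeats=10, seed=0):
    """Mean drop in accuracy after shuffling one feature column; the model stays a black box."""
    rng = random.Random(seed)
    baseline = accuracy(model, X, y)
    drops = []
    for _ in range(n_repeats):
        column = [row[feature] for row in X]
        rng.shuffle(column)
        X_perm = [row[:feature] + [value] + row[feature + 1:] for row, value in zip(X, column)]
        drops.append(baseline - accuracy(model, X_perm, y))
    return sum(drops) / n_repeats

def toy_model(row):
    """Hypothetical black box: predicts 1 when feature 0 is present, ignores feature 1."""
    return int(row[0] > 0)

X = [[1, 0], [1, 1], [0, 1], [0, 0], [1, 1], [0, 0]]
y = [1, 1, 0, 0, 1, 0]
print(permutation_importance(toy_model, X, y, feature=0))  # positive: the model relies on it
print(permutation_importance(toy_model, X, y, feature=1))  # 0.0: the model ignores it
```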

Local Interpretable Model-Agnostic Explanations (LIME)

Kulshrestha S. et al. Manuscript submitted

• Prediction of severe chest injury from the first 8 hours of notes during a trauma encounter: top features from the CUI-embedded CNN

[Figure panels: Global, Local]

CAREER DEVELOPMENT

• NIH/NIAAA F32 or equivalent

– Ruth L. Kirschstein National Research Service Fellowship

– 1 year during clinical fellowship

• NIH/NIAAA Loan Repayment Program

– Entering 5th year

– LRP Ambassador

– Up to $35,000/yr towards education debt

CAREER DEVELOPMENT

• NIH/NIAAA K23 Career Development Award or equivalent
– Year 4/5
– Protects 75% effort
– Begin building lab/team

• NIH/NIDA R01 Independent Researcher Award
– Year 1/5
– Budget to support our lab/team

• Study Section Reviewer: AHRQ, DOD, NIH
– Early Career Review Program

UW-Madison Critical Care Medicine Data Science Lab

LAB

• Matthew Churpek, MD, MPH, PhD (co-lead)

• Madeline Oguss, MPH

• Meysam Ghaffari, PhD

• Azi Bashiri, PhD

• John Caskey, PhD

• Alex Spicer, MS

• Kyle Carey, MS

COLLABORATORS

• Dmitriy Dligach, PhD

• Cara Joyce, PhD

• Anoop Mayampurath, PhD

• Niranjan Karnik, MD, PhD

• Hale Thompson, PhD

• Brihat Sharma, PhD

• David Gustafson, PhD

• Randy Brown, MD, PhD

Twitter: @UW_ICU_DataSci
