automating the inpatient chronic heart failure quality measures in va jennifer garvin phd, mba,...

Automating the Inpatient Chronic Heart Failure Quality Measures in

VA

Jennifer Garvin PhD, MBA, RHIA, CPHQ, CCS, CTR, FAHIMASalt Lake City VA Healthcare System IDEAS Center

University of Utah 5/16/14This study is supported by the VA HSR&D IBE- 09-069-1 grant

www.wordle.net

Overview

• Describe development of natural language processing (NLP) tools

• Describe inpatient chronic heart failure (CHF) quality measures

• Demonstrate the development of natural language process development tools using the case study of CHF

ClinicalGuidelines

FormularyClinicalProcesses

AppropriatenessMeasures

DecisionSupportPerformance

Measures

ClinicalReminders

Applied Use Case Purpose & Background

Evidence

http://www.healthquality.va.gov/chf/

http://www.healthquality.va.gov/chf/

VA Informatics and Computing Infrastructure- VINCI

• Research approvals • Assigned a research folder on VINCI• In VINCI all approved investigators and staff

can access data• http://www.hsrd.research.va.gov/for_researchers/vinci/

4

http://www.hsrd.research.va.gov/for_researchers/vinci/

http://www.hsrd.research.va.gov/for_researchers/vinci/

Preparation for Research• Workflow analysis- visitation and discussion with 2 echo

laboratories • Understanding how the documents are developed-

variety of approaches• Initial review of document structure to inform our

sampling strategy- we planned to oversampled free text and semi-structured text– Paragraph with no outline structure (free text)– Outline with some free text (semi-structured text)– Outline (highly structured text)

5

Document Format (Fake Texts using XXX in Place of Letters)

6

EF Sampling Strategy

• We had a total of 765 documents available, we needed 367 minimum documents for the test set, and used the remaining 398 documents were available for training.

• However, if during system training, the performance of the system reaches the pre-specified level of accuracy without using all available training documents, the remaining unused documents in the training set could be added to the test set.

• The 765 documents were randomly assigned to a training and test set in preparation for annotation.

7

NLP Development Methods- Reference (Gold) Standard

Development• All documents in the training and test sets

must have an accompanying reference (gold) standard so that the accuracy of the system can be measured during training and testing

• Software program called Knowtator was used for annotation

• Two independent reviewers with a third adjudicator when disagreement occurred

8

Ejection Fraction Annotation Schema1. Ejection Fraction – annotate all mentions of left ventricular ejection fraction. 2. Value – annotate all mentions of the quantitative value associated with left ventricular ejection fraction. 3. Qualitative assessment – annotate all mentions of qualitative assessment of LV ejection fraction and LV systolic function.4. LV systolic function – annotate all mentions of left ventricular systolic function.5. Document levela) EF Range (<40%, >=40%, undetermined)b) Informativeness: The format of the document was

sufficiently predictable that I could skim it rapidly to find what I wanted

c) Consistency: In order to verify that the information in the document was internally consistent, I found that I had to go “back and forth”

Classes

NLP Development Methods- Training and Testing• Training

– The system we used was developed but was not “trained” for this specific use case

– Separated the training documents into batches of documents and ran the system of a set of the batches

– Evaluated false positives and false negatives based on comparison to the reference (gold) standard

– Reprogram the system and run against the next batch– When pre-specified level of accuracy reached, measure accuracy at

the last iteration

• Testing– The system is run on the sequestered documents and the output

received by the statistician

10

Diagram of Overall Classification and Sub-classifications

Automated Data Acquisition for Heart Failure (ADAHF)

Summary of Development Steps

• Determine a use case• Investigate if there are

existing tools that have been used for a given use case

• If none, an existing tool may need to be may be generalized or a new one developed

• Determine data elements• Determine development

environment• Train and Test• Assess accuracy via

sensitivity,(recall) specificity, positive predictive value (precision)

Stakeholder EngagementTheoretic Framework and Model

• The Promoting Action on Research on Implementation in Health Services (PARIHS) framework1-2 – Evidence– Context – Facilitation

• Socio-Technical Model (STM)3

Eight Dimensions of which we are using four:– hardware and software– clinical content– workflow and

communication– internal organizational

features

1Stetler, 2011 http://www.implementationscience.com/content/6/1/99 2

Kitson , 2008 http://www.implementationscience.com/content/3/1/1

3 Sittig and Singh , A new sociotechnical model for studying health information technology in complex adaptive healthcare systems, Qual Saf Health Care 2010;19

http://www.implementationscience.com/content/6/1/99



Stakeholder Engagement: Semi-structured Interview and Thematic

Analysis • Approach is “Applied”- to solve a problem6 using a theoretical

thematic analysis7

• Two independent reviewers each create summary and organize identify themes to answer research questions

• Research group met to develop consensus codes on master themes.

• Three documents resulted – two summaries, group- consensus codes , consensus codes with highlighted text

6 Guest et al, Applied Thematic Analysis , 20127 Braun et al, Using Thematic Analysis in Psychology , 2006

Stakeholder Interview Results: Respondent Characteristics

• We interviewed 13 stakeholders. The interviewees included among others: clinical quality specialists; directors of quality management, clinical analysis and reporting; epidemiology; clinicians and pharmacists; and program analysts

• The range for the number of years in VA of respondents is 2-35

• And similarly, the range for the number of years in quality/patient safety is 2-33

Stakeholder Engagement Preliminary Results - Internal Factors

• Internal factors that facilitate implementation of an automated system include: – Use of evidence-based care– A culture of continuous quality improvement

coupled with measurement and accountability processes

– Quality control reporting both within and external to the VA.

Stakeholder Engagement (cont.)Hardware and Software-Preliminary Findings

• Informatics is used for:– Communication between

Providers and Patients• Secure messaging• Blue Button download• Kiosks• Mobile technology• MyHealtheVet

• Informatics is used for (cont.):– Clinical Care

• CPRS• Clinical Decision Support• Templates designed to

facilitate clinically relevant content

• Smart forms• CART-CL• Primary Care Almanac

We have an Informatics-Rich Environment in VA

Stakeholder Engagement- Preliminary Results- Informatics Applications Used

with Quality Metrics• Quality Improvement

Functionality (current)• Performance integrated

tracking application (PITA)• Measure Master Report

• System extraction and output specifics:

• Capture the concepts and values as well as the words around the concepts

• Have the ability to adjust the EF value captured

• In Development Growing number of informatics tools

• Process chart notes using NLP

• Surveillance tools• Analytic tools

• Meaningful Use– Determine how we

could provide data for meaningful use.

Formative Evaluation Process

• Initial stakeholder engagement- went through a couple cycles of:– Develop a prototype of

the tool with report– Revise based on

feedback

• Developed a final prototype of the table

• Develop an initial functional HMP Module

• User-centered Design Analysis

Thank you! Questions or Comments?

• Please contact me at [email protected]

• This study is undertaken as part of the VA HSR&D IBE- 09-069-1 grant. The views expressed in this article are those of the authors and do not necessarily represent the views of the Department of Veterans Affairs or the University of Utah School of Medicine.

• I thank the Department of Veterans Affairs for my fellowship and Gail Graham RHIA and Mark Weiner MD for being my mentors

• ADAHF Team:– Julia Heavirland/ Jenifer Williams

(Annotators)– Youngjun Kim (Application Specialist)– Stephane Meystre MD, PhD (Faculty

System Developer)– Drs. Bruce Bray, Paul Heidenreich, Mary

Goldstein, Wendy Chapman, Michael Matheny, Gobbel (Co-investigators)

– Andrew Redd PhD and Dan Bolton MS (Statisticians)

– Megha Kalsy MS and Natalie Kelly MBA (Stakeholder Engagement)

– Jennifer Garvin PhD, MBA (Principal Investigator)

mailto:[email protected]

Extra Slides

EF Sampling Strategy• To account for clustering, the sample size was increased by

the design effect. For ICC = 0.005, Deff = 1 + 25(0.005) = 1.125 so we require 179(1.125) = 201.375 202 positive cases or 29 positive cases per facility;

• Dividing by prevalence estimate of EF in documents, we have 29/0.80 = 36.25 37 documents required per facility. We multiply that result by the number of sites and the product is 37(7) = 259

• Doubling the sample size for the three facilities with free- or semi-structured text resulted in a minimum of 367 documents in the test set.

22

Definitions• Sensitivity is the proportion of patients with disease who test positive.

In probability notation: P(T+|D+) = TP / (TP+FN). • Specificity is the proportion of patients without disease who test

negative. In probability notation: P(T-|D-) = TN / (TN + FP). • Sensitivity and specificity describe how well the test discriminates

between patients with and without disease. They address a different question than we want answered when evaluating a patient, however. What we usually want to know is: given a certain test result, what is the probability of disease? This is the predictive value of the test. Predictive value of a positive test (PPV) is the proportion of patients with positive tests who have disease. In probability notation: (D+|T+) = TP / (TP+FP).

23

Definitions

• The weighted harmonic mean of precision and recall is the F-measure

• F= 2*Precision*Recall/(Precision + Recall)• Kappa- more accurate than percent agreement

as it accounts for chance agreement• K= Observed agreement + Hypothetical

probability of chance agreement/1-hypothetical probability of chance agreement

24

Definitions

• Regular expressions:– A regular expression (regex or regexp for short) is

a special text string for describing a search pattern.

– www.regular-expressions.info

25

automating the inpatient chronic heart failure quality measures in va jennifer garvin phd, mba,...

Documents