
Software Engineering of NLP-Based Computer Assisted Coding Applications

Mark Morsch, MS; Carol Stoyla, BS, CLA; Ronald Sheffer, Jr., MA; Brian Potter, PhD

A-Life Medical, Inc. – San Diego, CA

Presentation Overview

• Introduction and Motivations
• Background
  – What is CMM (or CMMI)?
  – Can CMM be applied to NLP Software Development?
• Development Process
  – Key Practice Areas
  – Example Development Schedule
• Testing Model for NLP-based CAC Software
• NLP and Scalability
• Conclusion – Focus on Results

Introduction – NLP and CAC

• Natural Language Processing (NLP) software applications “read” physician notes and extract facts for coding

• NLP for CAC requires electronic text – human transcription or speech recognition, OCR of typed text also possible

• Primary applications today – CPT and ICD-9 coding for certain outpatient specialties, for example
  – Radiology: the largest application
  – Emergency Medicine: includes E/M coding
  – Pathology

• Emerging applications
  – Inpatient coding: will it work with current workflow and tools?
  – Quality measures: JCAHO, CMS, etc.
  – Outcomes analysis

Motivations

• Why do we want structured processes for NLP software development?
  – Users should be confident in the results
  – Over time, CAC software should consistently improve
  – Medical coding is difficult and constantly changing, and the amount of prerequisite knowledge is massive

• What is (medical coding) or (NLP software development) like?

“I see mysteries and complications wherever I look, and I have never met a steadily logical person.” - Martha Gellhorn

• Structured processes bring order and logic to development
  – Deliver updates on schedule
  – More confidence that what is promised can be delivered
  – Verify that the application works as intended

Capability Maturity Model (CMM)

• Using a ranking system, measures the maturity of an organization’s software development processes
  – Developed by the Software Engineering Institute (SEI) at Carnegie Mellon University
  – Defines best practices for organizations involved in product development
  – Started in the 1980s to assess the capability of government contractors
  – Originally published in 1989, updates halted in 1997 in favor of CMMI

• Capability Maturity Model Integration (CMMI) is the successor to CMM
  – Integrates models from various disciplines – software development, systems engineering, integrated product development and software acquisition
  – Better fit to iterative development methods, versus the traditional waterfall approach

Software Hall of Shame

(Source: Charette, Robert. Why Software Fails. IEEE Spectrum, September 2005)

CMMI Overview

• Level 1: Initial – Ad hoc processes, results are unpredictable and primarily driven by the skill of the team

• Level 2: Managed – Core software development activities followed primarily at the project level

• Level 3: Defined – Development activities are implemented and managed across multiple projects, performance improved through training, verification & validation and integrated project management

• Level 4: Quantitatively Managed – Measures of business results such as cost, quality and timeliness utilized to improve organization performance, statistical quality control

• Level 5: Optimizing – Continuous, quantitative and proactive process improvement allowing an organization to learn, adapt and improve

Applying CMM

• NLP, like other Artificial Intelligence (AI) software, is often not developed following a software development process
  – Input requirements very difficult to fully specify
  – Complex algorithms require special knowledge
  – Development is often evolutionary or experimental
  – Individuals in NLP development often do not have experience in software engineering

• Criticisms of CMM
  – Emphasis of process over the individual
  – Lack of emphasis on innovation
  – Emphasis on activities over results

Development Process

• Performance is consistent and continuously improving over time

• Foster innovative thinking
• Robust testing model for measuring results
• Combine efforts of three skill areas:
  – Computer science
  – Linguistics
  – Medical coding

Five Practice Areas

1. Requirements Management
2. Rapid Development Cycle
3. Verification and Validation
4. Complete Configuration Control
5. Formal Build and Installation Process

1. Requirements Management

• Using a defect tracking tool, domain experts file bug reports and enhancement requests

• Reports include example medical documents and details of the desired output

• Items are assigned a severity and frequency
• Priority list is defined at the start of each update cycle
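One way to turn severity and frequency into a priority list can be sketched as below. This is a minimal illustration, not the process described on the slide: the `Item` fields, the 1 to 5 severity scale, and the `severity * frequency` score are all assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass
class Item:
    """A bug report or enhancement request (hypothetical structure)."""
    item_id: str
    kind: str         # "bug" or "enhancement"
    severity: int     # assumed scale: 1 (minor) to 5 (critical)
    frequency: float  # assumed share of documents affected, 0.0 to 1.0


def priority_list(items):
    """Order items by severity * frequency, highest score first."""
    return sorted(items, key=lambda i: i.severity * i.frequency, reverse=True)


items = [
    Item("A", "bug", 5, 0.01),          # severe but rare
    Item("B", "enhancement", 2, 0.20),  # mild but common
    Item("C", "bug", 3, 0.05),
]
ranked = [i.item_id for i in priority_list(items)]
```

With this scoring, a mild issue that touches many documents can outrank a severe one that is rare. A team that wants critical defects to always come first would sort on severity before frequency instead.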

2. Rapid Development Cycle

• Relatively short timelines between the publication date and the implementation date of coding changes
  – 6 to 12 weeks typically

• Weekly build and unit test cycles
  – Verification of changes taking place with each unit test

• For new product development, iterations are extended to accommodate more significant development

3. Verification and Validation

• Verification ensures the changes work correctly
  – With zero or minimal regressions

• Validation ensures that the right changes have been done
• Verification is done at both the unit and system testing levels
  – Unit testing is performed at the component level and is executed by the NLP Development team
  – QA team may assist in the analysis of unit test results

• Independent quality assurance team, separate from the NLP development team, performs system verification and validation

4. Complete Configuration Control

• All source code and system knowledge base files are maintained within a configuration control system
  – Examples: VSS, CVS, ClearCase

• Development Lead is responsible for coordinating source code check-in and setting build checkpoints

• All changes are recorded and documented, and past build configurations can be recovered

5. Formal Build and Installation Process

• Installation packages are used with written installation instructions

• Installation package records all component names and version numbers

• Used to install releases into both the QA and production environments

• Greatly reduces the likelihood of errors during the installation process
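The idea of a package that records every component name and version can be sketched as a small manifest check. The JSON layout, the function names, and the component names below are hypothetical and only illustrate the principle; the slides do not describe the actual package format.

```python
import json


def build_manifest(release, components):
    """Return a JSON manifest listing each component name and version."""
    entries = [{"name": n, "version": v} for n, v in sorted(components.items())]
    return json.dumps({"release": release, "components": entries}, indent=2)


def verify_install(manifest_json, installed):
    """Return names of components whose installed version differs from the manifest."""
    manifest = json.loads(manifest_json)
    return [
        c["name"]
        for c in manifest["components"]
        if installed.get(c["name"]) != c["version"]
    ]


# Hypothetical release and component names, for illustration only.
manifest = build_manifest(
    "example-release", {"nlp-engine": "4.2.0", "knowledge-base": "4.2.1"}
)
mismatches = verify_install(
    manifest, {"nlp-engine": "4.2.0", "knowledge-base": "4.1.9"}
)
```

Running the same check in both the QA and production environments gives a quick confirmation that the two installs match the release that was tested.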

Example Development Schedule

• Phase 1: Requirements Analysis
  – Weeks 1–2: Bug reports and enhancement requests analyzed and prioritized

• Phase 2: Development and Unit Testing
  – Weeks 3–9: Changes implemented, may overlap with Phase 1 if final code updates are not known

• Phase 3: System Testing
  – Weeks 10–11: System installation package is built and delivered to the QA team

• Phase 4: Production Deployment and Documentation
  – Week 12: NLP software installed into the production environment

Tracking The Process

Testing Model

• For NLP software, it’s difficult to determine the appropriate level of testing

• Regression Testing
  – Ensures development does not break current behavior
  – Large-scale test with a statistically significant sample (5% to 7%) of monthly production data
  – For A-Life in radiology, over 150,000 documents per batch

• Progression Testing
  – Verifies changes are functioning as expected
  – Scale is much smaller, hundreds of documents
  – Use examples identified by domain experts during Requirements Analysis phase
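Drawing a regression batch from production data can be sketched as follows. The slides do not say how the sample is selected, so simple random sampling with a fixed seed is an assumption here, and `regression_sample` is a hypothetical helper.

```python
import random


def regression_sample(doc_ids, fraction=0.05, seed=0):
    """Draw a reproducible simple random sample of document IDs.

    Assumes simple random sampling; the fixed seed means the same
    batch can be rebuilt later to compare two software builds.
    """
    k = round(len(doc_ids) * fraction)
    return random.Random(seed).sample(doc_ids, k)


month_of_docs = list(range(1000))  # stand-in for one month of document IDs
batch = regression_sample(month_of_docs, fraction=0.05)
```

Keeping the batch fixed between builds matters more than the exact sampling method: a regression only shows up as a difference if both builds process the same documents.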

Test Execution and Analysis

• Automation of execution and analysis is essential to reach this scale

• Unit Testing Platform
  – Encapsulates the core NLP processing
  – Used by the NLP development team

• System Testing Platform
  – Copy of the production system, including all pre- and post-processing stages

• Analysis Platform
  – Scripts compare differences between any two test runs
  – Visual evaluation tool allows coding experts to score each change
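A comparison script of the kind mentioned above might look like the sketch below. The data layout (a mapping from document ID to a list of assigned codes) and the name `compare_runs` are assumptions, and the code strings are placeholders rather than real coding results.

```python
def compare_runs(baseline, candidate):
    """Map document ID to (codes added, codes removed) where the two runs differ."""
    changes = {}
    for doc_id in baseline.keys() | candidate.keys():
        old = set(baseline.get(doc_id, ()))
        new = set(candidate.get(doc_id, ()))
        if old != new:
            changes[doc_id] = (sorted(new - old), sorted(old - new))
    return changes


# Placeholder outputs from two test runs over the same documents.
baseline = {"doc1": ["71020"], "doc2": ["72100", "73030"]}
candidate = {"doc1": ["71020"], "doc2": ["72100"], "doc3": ["70450"]}
diff = compare_runs(baseline, candidate)
```

Only the documents in `diff` need to go to a coding expert, who scores each change as an improvement or a regression. That filtering is what makes review practical at the scale of a large regression batch.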

Coder Change Statistics

Analysis Platform

NLP and Scalability

• Even with large-scale testing, NLP software will encounter new or unfamiliar language

• Two qualities of graceful behavior:
  – Understand more than patterns of words but also model the underlying semantics – use ontologies
  – Detect situations when the content of the document is not adequately recognized by the NLP software

• Can NLP CAC software scale across medical domains?
  – Most current applications are focused on medical specialties
  – Verification and validation to address an even larger scale
  – Tighter focus on a single coding system, such as ICD-9, that can be validated in a narrower context
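The second quality, detecting when a document is not adequately recognized, can be illustrated with a very rough coverage check. This is a simplified stand-in and not the method used by the software described here: the token-coverage measure, the 0.8 threshold, and the name `needs_human_review` are all assumptions.

```python
def needs_human_review(tokens, known_terms, min_coverage=0.8):
    """Flag a document when too few of its tokens are recognized.

    Coverage is the fraction of tokens found in known_terms
    (case-insensitive). Empty documents are always flagged.
    """
    if not tokens:
        return True
    recognized = sum(1 for t in tokens if t.lower() in known_terms)
    return recognized / len(tokens) < min_coverage


known = {"chest", "pain", "no", "acute"}
familiar = needs_human_review(["Chest", "pain", "no", "acute", "xyzzy"], known)
unfamiliar = needs_human_review(["xyzzy", "plugh", "chest"], known)
```

A production system would judge recognition at the level of meaning rather than individual words, in line with the first quality above. The sketch only shows the routing decision: documents that fall below the threshold go to a human coder instead of being coded automatically.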

Conclusion – Focus on Results

• Structured software development works for NLP
  – Gives confidence to both the developer and user
  – Iterative, results-driven approach is best

• CAC applications require means of verification that are transparent, repeatable and scalable

• Onus on NLP software developers to verify performance on a large scale

• Acceptable performance is difficult to quantify
• Ongoing work - Define a verification process applicable across multiple medical domains

More Information

• Mark Morsch, VP NLP/Software Engineering, [email protected]

• CMMI Web Site - http://www.sei.cmu.edu/cmmi/
• Why Software Fails. IEEE Spectrum, Sept 2005 - http://www.spectrum.ieee.org/sep05/1685
• LifeCode® NLP technology - http://www.alifemedical.com/documents/LifeCodeAIMagazine.pdf
