XML Structuring of Clinical Narrative Using Natural Language Processing

Naomi Sager
HL7-CDA2, Acapulco, Mexico
October 20, 2004

Page 2: Talk - XML Structuring Clinical Narrative

October 20, 2004 XML Structuring of Clinical Narrative using NLP—2

Good morning. I would like to thank the Program Committee for this opportunity to introduce you to Natural Language Processing (NLP). Perhaps my presence here means that people no longer pose the question:

Why Process Clinical Narrative?

Why process clinical narrative?

• Natural language patient documents contain important information
  - details and context of findings
  - time features of disease process
• Structured Data Entry (SDE) cannot capture it all
  - menus too detailed?
  - menus too brief?
• Natural language is natural
  - known
  - powerful
  - habitual

SOME BACKGROUND

NLP goes back some 45 years.

In the late 1950s, the US National Science Foundation was concerned with the post-war explosion of the scientific and technical literature. They sought new means of processing and retrieving textual information. They turned to linguists to help solve the problem, with surprising initial success.

First English parsing program

• University of Pennsylvania, Department of Linguistics, 1959
• UNIVAC I
  - Vacuum tubes
  - 1,000 words of storage (backed by tapes)
• Parsed 1-page scientific text.

The first English parsing program ran successfully in 1959, on the Univac, one of the very first computers.

Few of you can call up an image of the Univac, but I can, because my office at the University of Pennsylvania was on top of it. That is, I was on the second floor and the Univac occupied a very large room on the ground floor below.

The walls of that room were lined from floor to ceiling with racks of chassis filled with vacuum tubes. Yes, vacuum tubes. It was one person's sole job to hunt down and replace tubes that were no longer lit.

Technical issues

• Need high speed and large memory
• Need large and rich lexicon
• Need new forms of rules

If a parser functioned 45 years ago, you may ask: why is it taking so long for NLP applications to emerge?

For one, the technology had to catch up with the possibility.

At first it took 20 minutes to parse a sentence.

Also, a sample text needed only a small dictionary: just the words of the text with their parts of speech and certain attributes. For anything beyond a sample, full dictionaries, or lexicons, had to be built.

Then unanticipated issues arose. For example, if the parsing grammar covered all the possible ways you can compose a sentence, then unless constrained, it would build many parses for a single sentence.

So new kinds of constraining rules had to be implemented.

NLP issues

• Massive detail of language
  - How to organize it
• Meaning
  - How to characterize it
• Information
  - How to represent it

As the issues became defined, researchers generally associated these three major issues with three major levels of processing:

Syntax, Semantics, Pragmatics.

On the Syntax level, the process was to obtain the grammatical structure of sentences (what came to be known as parsing).

On the Semantics level, the process was to treat word meanings and relations by some operational system of attributes attached to words.

On the Pragmatics level, it was understood that we had to develop representational structures (perhaps in the then-emerging database framework) and create the appropriate application algorithms.

Major levels of processing

• Syntax
  - grammatical structure of sentences
• Semantics
  - word meanings and relations
• Pragmatics
  - representational structures and application algorithms

Not so easy! Progress came to be measured first in years, and then in decades. Here, briefly, is how those decades were spent.

The first decade saw different theories of grammar being implemented in a variety of parsing algorithms.

NLP … by decades

1965-1975: Parsing using linguistics
  - Rule collections [Harvard]
  - Transformational Generative Grammar [IBM]
  - Linguistic String Analysis [NYU]
  - Augmented Transition Network [BBN]

Strangely, parsing a sentence, that is, obtaining a grammatical representation that corresponded to the meaning of the sentence, proved to be unexpectedly difficult.

By the end of the decade, after much time and money had been spent, most of the parsing efforts were abandoned.

Almost uniquely, not at NYU.

NLP … by decades

1975-1985: Semantic Representation
  - Semantic Primitives [Yale]
  - Conceptual Graphs [IBM]
  - Semantic Nets
  - Artificial Intelligence (Block World)
  - Sublanguage Analysis [NYU]

In the second decade, along came the semanticists and the new field of artificial intelligence. No more parsing. The attack on meaning was direct.

A Yale researcher proposed a set of semantic primitives into which all word meanings and relations were to be decomposed.

Others worked with representational formalisms, such as Conceptual Graphs and Semantic Nets.

A successful AI program was able to interpret such instructions as “Place the green block on the red block” and cause the block images on the screen to carry out the action. Unfortunately, these efforts remained on the block world level.

At NYU, we continued with linguistic methods, specializing the general NLP system for the “sublanguage” of medicine.

NLP … by decades

1985-1995: Parsing Using Statistics
  - Corpus-Based Text Processing

Decade 3. Back to parsing. For information, it was inescapable.

The new power of computers suggested to some researchers that grammatical relations and word associations could be discovered automatically from gigabytes of text strings.

This work is ongoing.

NLP … by decades

1995-now: Diverse
  Google search of “natural language processing”
  - 785 actual hits (out of 496,000 reported hits)

Over the last decade, the technology of the Internet has spawned diverse efforts. A Google search for ‘Natural Language Processing’ yielded 785 relevant hits.

Only time will tell which ones prove successful.

NLP … by decades

1995-now: Medical Language Processing
  - MLP
  - MedLEE
  - Language and Computing
  - A-Life

Some work of medical interest in the last decade includes

+ The MLP System, which I will focus on today;
+ MedLEE, developed at Columbia Presbyterian Medical Center;
+ Language and Computing, from Europe; and
+ A-Life, a relative newcomer to the field.

The first two are academic projects; the last two are commercial.

Since the remainder of my talk will be devoted to medical language processing, I thought I would start by sharing with you some gems of clinical narrative.

Medical Memos

The following quotes were taken from actual medical records dictated by physicians. They appeared in a column written by Richard Lederer, Ph.D., for the Journal of Court Reporting:

• “Discharge status: Alive but without permission. The patient will need disposition, and therefore we will get Dr. Blank to dispose of him.”
• “By the time he was admitted, his rapid heart had stopped, and he was feeling better.”
• “On the second day the knee was better and on the third day it had completely disappeared.”
• “The patient has been depressed ever since she began seeing me in 1983.”

Actually my favorite is the statement: “Discharge Status: Alive but without permission.”

When we process this sentence into its informational components, what we call Health Information Units, or HIUs, the result is less funny but more regular; the information content is made explicit.

Health Information Units (HIUs)

“Discharge status : alive but without permission .”

• HIU #1: “Discharge status : alive”
• CONNECTIVE: “but”
• HIU #2: “Discharge status : without permission”

Here, “Discharge Status” has been copied into the second HIU to create a complete information unit. So we have 2 units:

“Discharge Status: alive”
“Discharge Status: without permission”

occurring with the connective “but”.

This is a small example of what language does to information.
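As a rough illustration only (this is not the MLP system's actual method, which works from a parse rather than from raw strings), the copying of a shared label into each unit can be sketched in Python. The names `SplitStatement` and `split_labeled_statement` are hypothetical, and the sketch assumes a statement of the form "label: A but B".

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class SplitStatement:
    units: List[str]            # one complete statement per information unit
    connective: Optional[str]   # the word that joined the units, if any


def split_labeled_statement(statement: str, connective: str = "but") -> SplitStatement:
    """Split 'label: A <connective> B' into 'label: A' and 'label: B'."""
    label, colon, body = statement.rstrip(". ").partition(":")
    if not colon:
        # No label to copy; leave the statement untouched.
        return SplitStatement([statement], None)
    label = label.strip()
    parts = [part.strip() for part in body.split(f" {connective} ")]
    if len(parts) < 2:
        return SplitStatement([f"{label}: {body.strip()}"], None)
    # Copy the shared label into every unit so each one stands alone.
    return SplitStatement([f"{label}: {part}" for part in parts], connective)


result = split_labeled_statement("Discharge status: alive but without permission.")
print(result.units)
print(result.connective)
```

The point of the sketch is only that each unit ends up carrying its own copy of “Discharge status”, with the connective kept alongside the units rather than discarded.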

In its millennia of evolution, language developed ways of shortening the message without losing content. For example, here, as readers, we fill out a statement with its missing words because the missing words are repeats of previous words in a parallel position: before and after the conjunction “but”.

A major job of Natural Language Processing is, so to speak, to undo evolution and present the underlying content in a more regular form.

Transformation into information units

Original sentence from an anonymized patient document:

  Today, she has no cough, chest pain, or shortness of breath.

is transformed into single information units:

  Today, she has no cough, today, she has no chest pain, and today, she has no shortness of breath.

where:
• Time word “today” is distributed to the basic statements;
• Negative word “no” is distributed to every object;
• “or” in distributed negative statement is transformed to “and”.

Here is another example of NLP restoring a complex sentence to its underlying information units.

The sentence is “Today she has no cough, chest pain or shortness of breath”. It is transformed into 3 single information units:

“Today she has no cough”,
“Today she has no chest pain”, and
“Today she has no shortness of breath”.

NLP obtained them from the original sentence by expanding around the conjunctions, copying parallel material, and changing ‘or’ to ‘and’ under negation.
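The expansion just described can be sketched in a few lines of Python. This is a toy under stated assumptions, not the MLP algorithm: it handles only sentences of the shape "<prefix> no A, B, or C." using a regular expression, whereas the real system expands conjunctions on the parse tree. The function name `expand_negated_objects` is hypothetical.

```python
import re
from typing import List


def expand_negated_objects(sentence: str) -> List[str]:
    """Turn '<prefix> no A, B, or C.' into one negated statement per object."""
    match = re.match(r"^(?P<prefix>.*?\b)no\s+(?P<objects>.+?)\.?$", sentence.strip())
    if match is None:
        # Not a negated list; return the sentence unchanged.
        return [sentence.strip()]
    prefix = match.group("prefix")  # e.g. "Today, she has "
    # Split the object list on commas and on the conjunctions 'or' / 'and'.
    objects = re.split(r"\s*,\s*(?:or\s+|and\s+)?|\s+(?:or|and)\s+", match.group("objects"))
    # The prefix (time word, subject, verb) and the negation are copied to every object.
    return [f"{prefix}no {obj}." for obj in objects if obj]


units = expand_negated_objects("Today, she has no cough, chest pain, or shortness of breath.")
for unit in units:
    print(unit)
```

Returning the units as a list reflects the change of ‘or’ to ‘and’: every statement in the list is asserted, so the list as a whole is read as a conjunction.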

Still, you might ask: “What is the utility in breaking up complex information into its more elementary components?”

The answer is: while text contains valuable information, there is simply too much of it.

How to treat textual content

• Granted: Text contains valuable information.
• Text is too voluminous for sequential viewing.
• Develop a method for selective viewing:
  - Identify fact units
  - Tag words with their medical content
  - Provide a means of sorting facts by their tags
  - Link sorted facts to their textual context.

A clinician may confront a patient chart containing 20 or 30 or even 50 documents. It is not possible to read through them all in order to find the facts relevant to an immediate concern; for example, to follow a particular patient problem.

By now it is accepted that textual content must be included in the Electronic Health Record. But blobs of text are unwieldy when it comes to accessing specific content.

Here is where Natural Language Processing may help.

By identifying discrete information units, tagging the words with their medical content, providing a means of sorting facts by their tags, and linking the sorted facts to their textual context, the system creates “hooks” into the text for selective viewing and other applications.
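A minimal sketch of such “hooks”, assuming each fact unit is stored with a list of tags and the ID of the sentence it came from. The dictionary layout, the tag names, and the sentence IDs `S1` to `S3` are all hypothetical; only the fact texts are taken from examples in this talk.

```python
from collections import defaultdict


def build_tag_index(facts):
    """Map each tag to the facts that carry it, preserving input order."""
    index = defaultdict(list)
    for fact in facts:
        for tag in fact["tags"]:
            index[tag].append(fact)
    return dict(index)


def context_of(fact, sentences):
    """Follow a fact's sentence ID back to its original text."""
    return sentences[fact["sid"]]


# Hypothetical sentence store: sentence ID -> original text.
sentences = {
    "S1": "Hypertension, well-controlled.",
    "S2": "Osteoarthritis, right knee.",
    "S3": "Today, she has no cough.",
}
# Hypothetical fact units with illustrative tag names.
facts = [
    {"text": "Hypertension, well-controlled", "tags": ["cardiovascular"], "sid": "S1"},
    {"text": "Osteoarthritis, right knee", "tags": ["musculoskeletal", "extremity"], "sid": "S2"},
    {"text": "she has no cough", "tags": ["respiratory"], "sid": "S3"},
]

index = build_tag_index(facts)
for fact in index["respiratory"]:
    print(fact["text"], "<-", context_of(fact, sentences))
```

Because a fact can carry several tags, the same fact can surface under more than one heading, which is what allows the same data to be viewed by different sort criteria.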


• Viewer of Dolin

Here you see an example of selective viewing, using a viewer specifically developed to work with patient documents that have been processed by MLP.

Along the top you see (in red) that CHART QUERIES have been chosen.

The selected Query Type is SUMMARY SHEET.

About SUMMARY SHEETS:

Clinicians use a variety of approaches to organizing information contained in the chart, for efficient retrieval and rapid review of historical information. One of the most common views is a SUMMARY SHEET that summarizes key information useful in managing a patient's medical problems.

MLP-processed documents allow for multiple query approaches and different organizational views of the clinical information contained in the chart. This is due to the comprehensive tagging of the data, starting at the document-level information and progressing down to the clinical content contained within each HIU. We will demonstrate the retrieval and display of clinical information on one patient, but many other views are possible.

The Patient is HL0130, a male whose date of birth is September 24, 1932.

We choose to sort the medical facts, the HIUs, by Anatomic System.

After clicking on SUBMIT you see on the left the SUMMARY SHEET for Patient HL0130, who is represented in the database by 1 document.

DIAGNOSIS, SIGNS AND SYMPTOMS, MEDICATION, ALLERGIES, and HEALTH-RELATED HABITS.

The subheadings in the SUMMARY SHEET depend on the content of the given patient's documents.

Patient HL0130, for example, has findings in 4 anatomic systems: the cardiovascular, integumentary, musculoskeletal, and respiratory.

The HIUs obtained by the MLP system for this document are sorted (due to our choice above) by Anatomic System. Thus, for example, if we choose (under DIAGNOSIS, SIGNS AND SYMPTOMS) to see data regarding the patient's CARDIOVASCULAR SYSTEM,

VIEWER: Click on CARDIOVASCULAR SYSTEM, under DIAGNOSIS, SIGNS AND SYMPTOMS

and click on this subheading, we see the HIUs that contain a tag for the Cardiovascular System. Each HIU carries the date of the visit, and the HIUs are arranged in reverse chronological order.

VIEWER: Click on “Hypertension, well-controlled”

When we click on an HIU, for example the first HIU under Cardiovascular System, “Hypertension, well-controlled”, the sentence containing that HIU appears at the top of the right screen in the context of the given document.

You will perhaps recognize here the text of the Consultation Note concerning Henry Levin the 7th, here anonymized to Patient HL0130. The text appears as an example in the HL7 CDA Release 2 Document.

VIEWER: Click on Respiratory System

If we click on Respiratory System, we see a greater number of HIUs than under Cardiovascular System, a quick indication that this is a major problem area for this patient,

VIEWER: Click on the last HIU under Respiratory System

who has, in fact, been referred for management of his asthma, as we see in one of the Respiratory System HIUs.

Under Health Related Habits,

VIEWER: Click on HIU “prior smoking history”

we find that the patient has a “prior smoking history”,

VIEWER: Click on HIU “1 pack per day between the ages of 20 and 55”

detailed as “1 pack per day between the ages of 20 and 55”.

Note that negative statements are highlighted. Here, the HIU “Smoking: then he quit” is understood to be a negation of smoking.

VIEWER: Click on HIU “Smoking: then he quit”

This came from ‘and then he quit’ under ‘Smoking’. ‘Smoking’ was copied into the HIU the way ‘Discharge Status’ was copied in ‘Discharge Status: Alive but without permission’.

As we noted before, the HIUs for this document were sorted by Anatomic System. Another view of the data is obtained by choosing to sort by BODY REGION.

VIEWER: Sort on BODY REGION, SUBMIT

Now we may see, for example, the data that pertains to EXTREMITY,

VIEWER: Click on EXTREMITY. Illuminate HIUs “Skin: erythematous rash, left index finger” and “Osteoarthritis, right knee”.

where we see several HIUs: “Skin: erythematous rash, left index finger” and “Osteoarthritis, right knee”.

Selective viewing can become important when a patient has numerous documents. For example,

VIEWER: Pull down PATIENTS to SPF and click. Return Sort to Anatomic System.

This patient, with 36 documents in the database, has problems in almost every anatomic system, and is, or was, on a dozen types of medication. The SUMMARY SHEET for this patient thus contains more headings than were displayed for the previous patient.

A Viewer is one possible application of Medical Language Processing, perhaps the most important in terms of patient care.

We will return to the Viewer for further examples later.

Now, how is all this accomplished?

For medical language processing, we need to determine the regular forms that are specific for clinical content, yet based on general properties of language.

Summarizing in 6 points what lies behind what we have seen thus far, first, we recognize that

Basis of Medical Language Processing

1. There is an underlying informational structure in all natural language sentences.
2. The information content of a sentence is given by its syntactic structure and the meaning of the individual words:

   structure + word meaning = information

3. Structure is given by parsing. A word’s meaning is determined by what other words it occurs with.

A word may have intrinsic meaning but it functions as a part of language by its relations to other words.

In linguist Firth’s words: Know a word by the company it keeps.

Basis of Medical Language Processing

4. A semantic class is formed of words that occur in similar environments

SUBJECT   VERB        OBJECT
patient   developed   mild cold
patient   developed   fever
patient   developed   severe pain in abdomen
patient   developed   cough
patient   developed   severe respiratory distress

SYMPTOM CLASS

A semantic class is formed of words that occur in similar environments. Thus, for example, the formation of the symptom class.

‘Cold’ in ‘mild cold’ is the central noun in the object of ‘developed’, in ‘Patient developed mild cold’;
‘Fever’ occurs similarly in ‘Patient developed fever’.
So also ‘pain’ in ‘Patient developed severe pain in abdomen’.
And again, for ‘cough’ in ‘Patient developed cough’.
And for ‘distress’ in ‘Patient developed severe respiratory distress’.

While we know these words are all symptoms based on medical knowledge, the significance for computer processing is that they form a class that, together with other classes, forms patterned occurrences.
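The idea of a class defined by shared environment can be sketched as follows. This is an illustration under simplifying assumptions: the subject-verb-object triples are taken as already extracted by a parser, with each object reduced to its central noun, and the helper name `classes_by_environment` is hypothetical.

```python
from collections import defaultdict


def classes_by_environment(triples):
    """Group object nouns by the (subject, verb) environment they occur in."""
    classes = defaultdict(set)
    for subject, verb, obj in triples:
        classes[(subject, verb)].add(obj)
    return dict(classes)


# Central nouns of the objects in the five example sentences above.
triples = [
    ("patient", "developed", "cold"),
    ("patient", "developed", "fever"),
    ("patient", "developed", "pain"),
    ("patient", "developed", "cough"),
    ("patient", "developed", "distress"),
]

classes = classes_by_environment(triples)
print(sorted(classes[("patient", "developed")]))
```

The words that land in the same set are candidates for one semantic class, here the symptom class. No medical knowledge is used; the grouping comes entirely from where the words occur.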

Basis of Medical Language Processing

5. A statement type in a subject area is formed of semantic classes co-occurring frequently in a syntactic relation.
6. Clinical statement types comprise a computable semantic structure for housing narrative clinical information.

PT    V-PT   NEG   PTPART   SYMPT   TIME
she   has    no    chest    pain    today

Here we see an example of a patient-state statement type, based on the frequent co-occurrence of words in the classes for Patient, Patient-verb, and Symptom in the Subject-Verb-Object relation, corresponding to the words ‘She has pain’.

We see here only a flattened version of the structure. It does not display explicitly the modifier relations:

‘chest’ as a patient-part modifier of ‘pain’;
‘no’ as a negation modifier of ‘pain’; and
‘today’ as a time modifier of the whole statement.
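A toy sketch of recognizing this statement type from word classes, assuming a six-word lexicon that uses the class labels from the slide (PT, V-PT, NEG, PTPART, SYMPT, TIME). The function `analyze` and its word-order matching are hypothetical simplifications; the approach described in this talk establishes the Subject-Verb-Object relation by parsing, not by word order.

```python
# Toy lexicon: word -> semantic class, using the class labels from the slide.
LEXICON = {
    "she": "PT",
    "has": "V-PT",
    "no": "NEG",
    "chest": "PTPART",
    "pain": "SYMPT",
    "today": "TIME",
}
CORE_PATTERN = ["PT", "V-PT", "SYMPT"]       # the Patient / Patient-verb / Symptom core
MODIFIER_CLASSES = {"NEG", "PTPART", "TIME"}


def analyze(sentence):
    """Check for the patient-state core and collect the modifier words."""
    words = sentence.lower().rstrip(".").replace(",", " ").split()
    classed = [(word, LEXICON.get(word)) for word in words]
    core = [cls for _, cls in classed if cls in CORE_PATTERN]
    modifiers = {cls: word for word, cls in classed if cls in MODIFIER_CLASSES}
    return {"is_patient_state": core == CORE_PATTERN, "modifiers": modifiers}


print(analyze("She has no chest pain today."))
```

Keeping the modifiers separate from the core mirrors the distinction drawn above: ‘She has pain’ is the statement type, and ‘no’, ‘chest’, and ‘today’ attach to it as modifiers.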

Instances of statement types in texts, when output as XML structures enriched with tags that represent their medical content, become HIUs, Health Information Units.

For each document sentence, the first step is to produce a parse tree.

A parse of one sentence

[Parse tree diagram; the layout did not survive extraction. The tree descends from SENTENCE through TEXTLET, ONESENT, and CENTER to an ASSERTION node whose children include SA, SUBJECT, TENSE, VERB, and OBJECT. The words of the sentence hang from terminal nodes such as PRO, TV, T, and N. Attribute labels shown in the diagram include H-PT/H-FAMILY, VHAVE, H-NEG, H-INDIC, and NTIME2.]

*SID=990318P2 030.20B.03.02
Today , she has no cough .

The overall structure is an Assertion, with a Subject, Verb and Object.

Words are associated with the bottom-most or “terminal” nodes, which are parts of speech. Thus, the last word ‘cough’ is a noun N in the lexicon that matches the terminal node N in the parse tree. The pink symbols here are attributes carried by the matched word in the lexicon. For example, ‘no’ has the attribute H-NEG in the lexicon, H for Healthcare sublanguage.

XML output of one sentence

<SID id="990318P2 098.20B.03.02">
  <!-- Today , she has no cough . -->
  <PATIENT-STATE-HIU id="990318P2 098.20B.03.02" sect="REVIEW OF SYSTEMS" row="1">
    <EVENT-TIME>
      <REF-PT> <_4152><tm><tm_tm-loc> Today </tm_tm-loc></tm></_4152> , </REF-PT>
    </EVENT-TIME>
    <PT-DEMOG><GENDER>[FEMALE]</GENDER></PT-DEMOG>
    <SUBJECT> <_5705><per> she </per></_5705> </SUBJECT>
    <VERB> <_7168><li><li_vhv> has </li_vhv></li></_7168>
      <TENSE>[PRESENT]</TENSE>
    </VERB>
    <PSTATE-DATA>
      <SIGN-SYMP>
        <MODS><NEG> <_3440><md><md_ng> no </md_ng></md></_3440></NEG></MODS>
        <_802><s-s><a-s_resp><b-r_m-r> cough </b-r_m-r></a-s_resp></s-s></_802>
      </SIGN-SYMP>
    </PSTATE-DATA>
  </PATIENT-STATE-HIU>
</SID>

After several stages of processing, the parse tree has been transformed into a medically labeled XML structure, an HIU, in which the individual terms carry XML tags that represent their medical content.

This is a Patient State type HIU.
The EVENT-TIME is “today”.
The Patient DEMOGraphic information is the GENDER ‘female’ from the lexicon entry for the SUBJECT word ‘she’.
The SUBJECT is ‘she’ and the VERB is ‘has’.
The PSTATE DATA is a SIGN-SYMPTOM ‘cough’ with the modifier NEG whose value is the word “no”.

Notice that each word is carrying XML tags of its own.

For example, ‘cough’ is tagged as s-s (sign-symptom), a-s_resp (anatomic system, respiratory), b-r_m-r (body region, multi-region).

These tags are drawn from the Structured Health Markup Language, or SHML.
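To suggest how an application might read such an HIU, here is a minimal sketch using Python's standard `xml.etree.ElementTree`. The XML below is a simplified version of the slide's output: the numeric wrapper elements and some inner tags are omitted for brevity, and the function name `summarize_hiu` is hypothetical. It is not a description of how the Viewer is implemented.

```python
import xml.etree.ElementTree as ET

# Simplified HIU, modeled on the slide's XML output (wrapper elements omitted).
HIU_XML = """
<PATIENT-STATE-HIU sect="REVIEW OF SYSTEMS">
  <EVENT-TIME><REF-PT><tm>Today</tm></REF-PT></EVENT-TIME>
  <SUBJECT><per>she</per></SUBJECT>
  <VERB><li>has</li></VERB>
  <PSTATE-DATA>
    <SIGN-SYMP>
      <MODS><NEG><md>no</md></NEG></MODS>
      <s-s><a-s_resp><b-r_m-r>cough</b-r_m-r></a-s_resp></s-s>
    </SIGN-SYMP>
  </PSTATE-DATA>
</PATIENT-STATE-HIU>
"""


def summarize_hiu(xml_text):
    """Pull out the finding, the tags nested around it, and whether it is negated."""
    root = ET.fromstring(xml_text.strip())
    finding = root.find("./PSTATE-DATA/SIGN-SYMP")
    term = finding.find("./s-s")
    return {
        "type": root.tag,
        "section": root.get("sect"),
        "word": "".join(term.itertext()).strip(),
        "tags": [element.tag for element in term.iter()],  # outermost tag first
        "negated": finding.find("./MODS/NEG") is not None,
    }


print(summarize_hiu(HIU_XML))
```

Because the tags are nested around the word itself, a query for "respiratory findings" reduces to looking for an `a-s_resp` element, and the NEG modifier tells the application that the finding is denied rather than present.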

We will return to the SHML as it is used in the overall process of converting clinical narrative into a structured, medically tagged representation of content.

Page 87: Talk - XML Structuring Clinical Narrative

MLP with SHML linkage

[Pipeline diagram]
Medical documents
-> Preprocessing (standardization)
-> Documents with SIDs
-> MLP (with the MLP and SHML Dictionaries)
-> Documents in HIU’s with SHML and MLP tags
-> Viewer and Other Applications

GENERATORS
• SHML/DTD
• SHML/XSL
• SHML/XQL

Page 88: Talk - XML Structuring Clinical Narrative

In the overall process, medical documents first pass through a

preprocessing stage where every sentence receives a Sentence Identifier.

The documents are then processed by the MLP system, and then, by drawing on a dictionary containing the SHML tags of

the words, the documents obtain a representation as HIUs. This

representation, along with the original documents, serves as input to a viewer or

other applications.

Page 89: Talk - XML Structuring Clinical Narrative

Preprocessing

• Identification of sections and SIDs
• Name identification (person, geographical location, institution,…)
• Spelling
• Punctuation
• Time, date, unit, number standardization (ANSI standard)

Page 90: Talk - XML Structuring Clinical Narrative

In the course of processing documents from 15 institutional sources,

encompassing close to 118,000 sentences, we have encountered

37 document types and 491 different section names.

Page 91: Talk - XML Structuring Clinical Narrative

Document types

Acute Care Visit
Admission Note
Breast Clinic Note
Breast ONC Interval Note
Cardiology Assoc Clinic Note
Cardiology Assoc. Clinic Note
Cardiology Associate CCU Admission Note
Cardiology Associates Admission Note
Cardiology Associates Progress Note
Clinic Note
Clinic Notes
Consultant Note
Consultation Report
Discharge Summary
EEG Report
Emergency Department Report
Encounter Note
Follow-Up Clinic Note
GIM Acute Care Visit
GIM Return Visit
Good Health Clinic Consultation note
Interval Note
Neurology New Patient Eval
New Patient Evaluation
OHNS Clinic Note
Operative Report
Orthopaedic Clinic Note
Physical and Occ. Therapy Note
Pre-Operative Visit
Procedure Note/Report
Pulmonary Consultation Note
Pulmonary Return Visit
Renal New Patient Evaluation
Renal Return Patient Eval
Renal Return Patient Evaluation
Return Visit
Rheumatology Clinic Note

117,581 sentences

Page 92: Talk - XML Structuring Clinical Narrative

Clearly, this is only a sample of what lies out there.

But a sufficient sample to reveal a number of issues in preparing documents for standardized processing, whether by

NLP or other means.

Page 93: Talk - XML Structuring Clinical Narrative

Sections

PROCEDURES/THERAPIES | LAB - D | LAB / TELEMETRY
PROCEDURES/THERAPIES | LAB - D | LAB WORK BLOOD DRAW
PROCEDURES/THERAPIES | LAB - D | LABORATORIES
PROCEDURES/THERAPIES | LAB - D | LABORATORIES AT ADMISSION
PROCEDURES/THERAPIES | LAB - D | LABORATORY
PROCEDURES/THERAPIES | LAB - D | LABORATORY ADDENDUM
PROCEDURES/THERAPIES | LAB - D | LABORATORY DATA
PROCEDURES/THERAPIES | LAB - D | LABORATORY DATA / TEST RESULTS
PROCEDURES/THERAPIES | LAB - D | LABORATORY DATA AT ADMISSION
PROCEDURES/THERAPIES | LAB - D | LABORATORY DATA ON ADMISSION
PROCEDURES/THERAPIES | LAB - D | LABORATORY EVALUATION
PROCEDURES/THERAPIES | LAB - D | LABORATORY OF NOTE
PROCEDURES/THERAPIES | LAB - D | LABORATORY ON ADMISSION
PROCEDURES/THERAPIES | LAB - D | LABORATORY RESULTS
PROCEDURES/THERAPIES | LAB - D | LABORATORY STUDIES
PROCEDURES/THERAPIES | LAB - D | LABORATORY TESTS ON ADMISSION
PROCEDURES/THERAPIES | LAB - D | LABORATORY VALUES
PROCEDURES/THERAPIES | LAB - D | LABS

Page 94: Talk - XML Structuring Clinical Narrative

As you well know, section names range over a wide gamut. The 491 different

section names that we have encountered probably only scratch the surface. It is

not easy to group section names into gross classes but some are variants on a

single theme.

Here, for example, are different section names concerning Laboratory Tests. For document processing, variants are given

a single designation, in this case, for LAB, the letter “D”.
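The mapping of variant section names to one designation can be pictured as a simple lookup. The sketch below is only an illustration: the letter “D” for LAB comes from the slide, while the `SECTION_CODES` table, the `section_code` function, and the normalization rule (uppercase, collapsed whitespace, trailing colon dropped) are assumptions made for this example.

```python
# A few of the LAB variants from the slide; every one maps to the letter "D".
LAB_VARIANTS = (
    "LAB / TELEMETRY",
    "LAB WORK BLOOD DRAW",
    "LABORATORIES",
    "LABORATORY",
    "LABORATORY DATA",
    "LABORATORY DATA / TEST RESULTS",
    "LABORATORY RESULTS",
    "LABORATORY STUDIES",
    "LABORATORY VALUES",
    "LABS",
)

# Hypothetical lookup table: normalized section heading -> section letter.
SECTION_CODES = {name: "D" for name in LAB_VARIANTS}


def section_code(heading):
    """Return the section letter for a heading, or None if it is unknown."""
    # Assumed normalization: uppercase, single spaces, no trailing colon.
    key = " ".join(heading.upper().split()).rstrip(":")
    return SECTION_CODES.get(key)


if __name__ == "__main__":
    print(section_code("Laboratory  Data:"))
    print(section_code("Chief Complaint"))
```

A table like this grows with every new institution, which is one reason the 491 section names matter for preprocessing.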

Page 95: Talk - XML Structuring Clinical Narrative

Sentence id (SID)

*SID=MLPC15 HL0130.001B.01.002
XPT-HL0130 is a 67 year old male referred for further asthma management .

• MLPC15: Institutional document set
• HL0130: Patient Number
• 001: Record number
• B: Section id
• 01: Paragraph number
• 002: Sentence number within paragraph

Page 96: Talk - XML Structuring Clinical Narrative

The section letter appears in the Sentence Identifier, the SID, along with other coded information. The SID identifies:

• the institutional document set,
• a patient number,
• record number,
• section identifier (here, B for mainly historical information),
• the paragraph number within the section, and
• the sentence number within the paragraph.
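The SID components listed above can be pulled apart with a regular expression. This is a sketch under an assumed layout, inferred only from the two SIDs shown in this talk: document set, a space, patient number, a period, record number immediately followed by a one-letter section id, then paragraph and sentence numbers separated by periods. `SID_PATTERN` and `parse_sid` are hypothetical names, not part of the MLP system.

```python
import re

# Assumed layout: <document set> <patient>.<record><section>.<paragraph>.<sentence>
SID_PATTERN = re.compile(
    r"^(?P<document_set>\S+)\s+"
    r"(?P<patient>[^.\s]+)\."
    r"(?P<record>\d+)(?P<section>[A-Z])\."
    r"(?P<paragraph>\d+)\.(?P<sentence>\d+)$"
)


def parse_sid(sid):
    """Split a SID string into its six coded fields."""
    match = SID_PATTERN.match(sid.strip())
    if match is None:
        raise ValueError(f"unrecognized SID layout: {sid!r}")
    return match.groupdict()


if __name__ == "__main__":
    print(parse_sid("MLPC15 HL0130.001B.01.002"))
    print(parse_sid("990318P2 098.20B.03.02"))
```

Fields are kept as strings so that leading zeros, as in paragraph “01”, are preserved.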

Page 97: Talk - XML Structuring Clinical Narrative

MLP with SHML linkage

[Pipeline diagram repeated from slide 87: Medical documents, Preprocessing (standardization), Documents with SIDs, MLP with the MLP and SHML Dictionaries, Documents in HIU’s with SHML and MLP tags, then Viewer and Other Applications. GENERATORS: SHML/DTD, SHML/XSL, SHML/XQL]

Page 98: Talk - XML Structuring Clinical Narrative

We return to the overall process. After preprocessing, documents enter the MLP proper.

Page 99: Talk - XML Structuring Clinical Narrative

[Diagram: MLP system components]

Source Text: ENGLISH, FRENCH, GERMAN

1. PARSING (ENGLISH, FRENCH, GERMAN)
2. SELECTION (ENGLISH, FRENCH, GERMAN)
3. Transformation (ENGLISH, FRENCH, GERMAN)
4. REGULARIZATION
5. INFORMATION FORMATTING

RELATIONAL dBMS / XML & SHML VIEWER

Resources shown:
• PARSER
• SYNTACTIC & MEDICAL LEXICON
• GRAMMAR RULES: BNF & RESTRICTIONS
• MEDICAL COOCCURRENCE PATTERNS
• MEDICAL REPRESENTATION STRUCTURE

Page 100: Talk - XML Structuring Clinical Narrative

It makes good sense to modularize natural language processing.

We noted earlier that initially researchers identified 3 major levels of processing:

syntax, semantics, pragmatics. In practice, the functions become more

specific, particularly in response to implementation issues.

Page 101: Talk - XML Structuring Clinical Narrative

Here we see the process divided into 5 sequential components.

The first component is PARSING. It draws on a lexicon and a grammar. The MLP system has been implemented in 3 languages, actually 4, as Dutch has recently been added by a European colleague.

Page 102: Talk - XML Structuring Clinical Narrative

The lexicons are, naturally, language-specific; the grammars for related

languages are similar. In fact, the French and German MLP

grammars were developed as updates to the English MLP grammar.

Page 103: Talk - XML Structuring Clinical Narrative

The second component, SELECTION, resolves ambiguity where possible, based

on medical word-class co-occurrence patterns. We will see some examples.

The third component transforms complex sentences into their individual

information units, as we saw for the sentence:

"Today she has no cough, chest pain or shortness of breath".

Page 104: Talk - XML Structuring Clinical Narrative

The fourth component provides a uniform connective structure for sentences with more than 1 information unit. By this time the different language versions can use the same program.

The fifth component maps each information unit into the appropriate medical statement type, which has an

XML representation and will become an HIU with the addition of SHML tags.
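To make the third component concrete, here is a toy Python sketch of what expanding a conjoined sentence into individual information units produces. It is only a string-level illustration: the MLP transformation works on parse trees, not on text splitting, and the function name `expand_conjoined` and its split rule are assumptions made for this example.

```python
import re

# Split on commas (optionally followed by or/and) and on a bare or/and.
_CONJ = re.compile(r"\s*,\s*(?:(?:or|and)\s+)?|\s+(?:or|and)\s+")


def expand_conjoined(stem, conjoined):
    """Repeat the shared stem once for each conjoined item."""
    items = _CONJ.split(conjoined.strip())
    return [f"{stem} {item}" for item in items if item]


if __name__ == "__main__":
    units = expand_conjoined(
        "Today she has no",
        "cough, chest pain or shortness of breath",
    )
    for unit in units:
        print(unit)
```

Each resulting unit carries the shared time and negation, which is why the single-sentence HIU for ‘cough’ shown earlier still contains ‘Today’ and ‘no’.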

Page 105: Talk - XML Structuring Clinical Narrative

HIU Types

• PATIENT-STATE (34.59%)
• ALLERGIES (0.73%)
• MEDICATIONS-INFO (8.27%)
• IMAGING-INFO (1.67%)
• MED-SURG-PROCEDURES (4.32%)
• LAB-TEST (5.17%)
• FAMILY-FRIEND (0.19%)
• MISC-TREATMENTS (1.80%)
• EKG-TEST (0.61%)
• DOCUMENT-INFO (2.46%)
• TEXTPLUS (8.19%)

Page 106: Talk - XML Structuring Clinical Narrative

A working set of HIU types classifies information at the highest level and is

useful for retrieval of information.

Page 107: Talk - XML Structuring Clinical Narrative

PATIENT-STATE is by far the most frequently occurring HIU type. It covers all descriptions of the patient, the patient's problems, risk factors, functionality, and historical information.

ALLERGIES, as an HIU type, has been singled out from the PATIENT-STATE HIU type because of its singular importance in patient care.

MEDICATION-INFO HIUs include the named medication and dose, along with

whatever time or change information occurs in the statement.

Page 108: Talk - XML Structuring Clinical Narrative

IMAGING-INFO has its own importance among diagnostic tools. Other procedures could also be singled out to become HIU types.

MED-SURG-PROCEDURES also could be further divided. It functions on a high level to distinguish procedures from all other information.

LAB-TEST HIUs cover blood work, urinalysis, culturing and the like.

FAMILY-FRIEND HIUs cover Family History, and other statements involving persons other than the patient or caregivers.

Page 109: Talk - XML Structuring Clinical Narrative

MISC-TREATMENTS covers such complementary treatments as bedrest, physical therapy and the like, but also an MLP statement type of "general medical management".

EKG-INFO as an HIU type is an example where the special language of a diagnostic test requires, virtually, a subgrammar to process it and a special structure to house the information. Many more of this type will certainly be needed.

Page 110: Talk - XML Structuring Clinical Narrative

DOCUMENT-INFO is an HIU type to hold information about a document being

referenced or discussed.

TEXTPLUS is an HIU type that picks up un-analyzed sentences. The words are

tagged, but no parse was possible over that sentence or stretch of words within a

sentence.

Page 111: Talk - XML Structuring Clinical Narrative

I am often asked, which part of the MLP process is the hardest?

The answer is PARSING. Who would have thought there could be so many ways to analyze a sentence? Or how important the correct assignment of structure is to obtaining a correct representation of information?

Equally important, though, is the resolution of ambiguity. Ambiguity comes in 2 flavors: word-sense and

syntactic.

Page 112: Talk - XML Structuring Clinical Narrative

In word sense ambiguity, a word has 2 or more distinct meanings.

Consider ‘depression’ in 2 occurrences:

Page 113: Talk - XML Structuring Clinical Narrative

Word sense ambiguity

Patient suffers from severe depression.
vs.
Electrocardiogram shows ST depression in lead 5.

Page 114: Talk - XML Structuring Clinical Narrative

‘Patient suffers from severe depression’
versus
‘Electrocardiogram shows ST depression in lead 5’

Clearly, the different contexts distinguish the different meanings. But storing

contexts on the level of the words themselves is not feasible because of the

large number of words and the variety of contexts. However, storing contexts in

terms of word classes is manageable.

Page 115: Talk - XML Structuring Clinical Narrative

Resolution of word sense ambiguity

‘no growth or masses felt’
  growth: H-INDIC, H-PTFUNC
  or: OR
  masses: H-INDIC

‘growth and development normal’
  growth: H-INDIC, H-PTFUNC
  and: AND
  development: H-PTFUNC

Page 116: Talk - XML Structuring Clinical Narrative

For example, word class patterns resolve the ambiguity of 'growth'.

• ‘Growth’ in ‘growth and development normal’ is a word describing a normal patient physiological function, in the Healthcare sublanguage word class H-PTFUNC.
• ‘Growth’ in ‘no growth or masses felt’ is a disease-indicator word, in the class H-INDIC.

Conjoined nouns should be in the same class (or certain compatible classes), so this decides which sense of ‘growth’ is correct in these occurrences.
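The conjunction rule for ‘growth’ can be sketched in a few lines of Python. The three lexicon entries are the ones on the slide. The `LEXICON` dictionary and the `resolve_conjoined` function are hypothetical, and the sketch handles only the "same class" case, not the "compatible classes" mentioned above.

```python
# Toy lexicon: each word lists its possible sublanguage classes (from the slide).
LEXICON = {
    "growth": {"H-INDIC", "H-PTFUNC"},
    "masses": {"H-INDIC"},
    "development": {"H-PTFUNC"},
}


def resolve_conjoined(word, conjunct, lexicon=LEXICON):
    """Keep only the classes of `word` that its conjunct shares.

    If the two words share no class, the word is left ambiguous.
    """
    shared = lexicon[word] & lexicon[conjunct]
    return shared or lexicon[word]


if __name__ == "__main__":
    print(resolve_conjoined("growth", "masses"))       # no growth or masses felt
    print(resolve_conjoined("growth", "development"))  # growth and development normal
```

Because the rule is stated over classes rather than words, the same few lines cover any pair of conjoined nouns in the lexicon.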

Page 117: Talk - XML Structuring Clinical Narrative

Resolution of syntactic ambiguity

N          P    N          CONJ   N
swelling   in   (knees     and    hands)
H-INDIC         H-PTPART          H-PTPART
= swelling in knees and swelling in hands

N          P    N          CONJ   N
(swelling  in   knees)     and    fever
H-INDIC         H-PTPART          H-INDIC
= swelling in knees and fever

Page 118: Talk - XML Structuring Clinical Narrative

In addition to word sense ambiguity, there is syntactic ambiguity.

From the same sequence of Noun, Preposition, Noun, Conjunction, Noun we

can have two different groupings.

Page 119: Talk - XML Structuring Clinical Narrative

Again we rely on matching medical word classes.

• In ‘swelling in knees and hands’ the match is H-PTPART: ‘knees and hands’

• In ‘swelling in knees and fever’, the match is H-INDIC: ‘swelling and fever’

This matching of subclasses is important because, if we treat the two structures as equivalent, the system could generate the

incorrect ‘swelling in fever’.
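A small Python sketch shows how the class match picks the grouping for a Noun, Preposition, Noun, Conjunction, Noun sequence. The word classes are the ones on the slide. The `WORD_CLASS` table and the function `group_n_p_n_conj_n` are made up for this illustration, and a real system would draw on the full set of co-occurrence patterns rather than a simple equality test.

```python
# Word classes from the slide.
WORD_CLASS = {
    "swelling": "H-INDIC",
    "knees": "H-PTPART",
    "hands": "H-PTPART",
    "fever": "H-INDIC",
}


def group_n_p_n_conj_n(n1, prep, n2, conj, n3, classes=WORD_CLASS):
    """Return the expanded reading of 'N1 P N2 CONJ N3' chosen by class match."""
    if classes[n3] == classes[n2]:
        # N3 pairs with N2: N1 P (N2 CONJ N3)
        return [f"{n1} {prep} {n2}", f"{n1} {prep} {n3}"]
    if classes[n3] == classes[n1]:
        # N3 pairs with N1: (N1 P N2) CONJ N3
        return [f"{n1} {prep} {n2}", n3]
    return None  # no class match; left unresolved in this sketch


if __name__ == "__main__":
    print(group_n_p_n_conj_n("swelling", "in", "knees", "and", "hands"))
    print(group_n_p_n_conj_n("swelling", "in", "knees", "and", "fever"))
```

The second call never produces ‘swelling in fever’, because ‘fever’ matches ‘swelling’ rather than ‘knees’.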

Page 120: Talk - XML Structuring Clinical Narrative

Word class patterns

• 58 semantic classes
• Conjunction Equivalent Classes: 47 patterns.
• Computed Phrase (Left Adjunct+Noun): 147 patterns.
• Computed Phrase (Noun+Noun): 58 patterns.
• Computed Phrase (Noun+Right Adjunct): 165 patterns.
• Noun−Preposition−Noun: 3,383 patterns.
• Adjective−Noun: 727 patterns.
• Noun−Noun: 546 patterns.
• Subject−Verb−Object: 566 patterns.
• Subject−Be−Object: 100 patterns.

Page 121: Talk - XML Structuring Clinical Narrative

Resolution of ambiguity by matching subclasses requires the accumulation of

many instances of well-formed word class co-occurrence patterns.

On the one hand, it is daunting to see the amount of language data that is needed

to accomplish the conversion of free text to structured information.

On the other hand, it is rather remarkable that it can be done with as

few as 58 word classes.

Page 122: Talk - XML Structuring Clinical Narrative

MLP with SHML linkage

[Pipeline diagram repeated from slide 87: Medical documents, Preprocessing (standardization), Documents with SIDs, MLP with the MLP and SHML Dictionaries, Documents in HIU’s with SHML and MLP tags, then Viewer and Other Applications. GENERATORS: SHML/DTD, SHML/XSL, SHML/XQL]

Page 123: Talk - XML Structuring Clinical Narrative

Now, having dwelt on the MLP part of the overall process, let us return to the SHML tags, the “hooks” into specific

medical content.

Page 124: Talk - XML Structuring Clinical Narrative

SHML
Structured Health Markup Language

• Medical knowledge XML tag set
• Designed to work with Medical Language Processing via XML
• Authors: David Rothwell, MD, Richard Wheeler, MD, Ngô Thanh Nhàn, Ph.D.

Page 125: Talk - XML Structuring Clinical Narrative

SHML is a medical knowledge XML tag set, designed to work with medical

language processing. It is the work over the past several years primarily by Dr.

David Rothwell, well known for his authorship of Snomed, along with Dr.

Richard Wheeler, formerly Chief Medical Manager of Healthmatics, and Dr. Ngô

Thanh Nhàn, the computer scientist who created the XML implementation of the

SHML.

Dr. Nhàn will be speaking in this Conference on Friday.

Page 126: Talk - XML Structuring Clinical Narrative

SHML tags are a mix of linguistic and medical categories needed to perform

information-sensitive tasks: For example, to provide a physician selective viewing of the content of a patient’s documents,

as we have seen; or to perform a retrieval over a patient population, such as to meet a JCAHO requirement, an application we

will see shortly.

Some of the features of SHML are

Page 127: Talk - XML Structuring Clinical Narrative

SHML

• Uses XML formalism
• Data and document are combined
• SHML tags are metadata: medical information not explicit in text
• Preserves fundamental structure of document (EHR)
• Users can create their own tags and tag extensions

Page 128: Talk - XML Structuring Clinical Narrative

SHML tags: example

<dx>
  <a-s_resp_l-r>
    <b-r_m-r>
      <dx-prcss_imm_all>
        <dx-kind_d-k-resp_r-a-d>Asthma</dx-kind_d-k-resp_r-a-d>
      </dx-prcss_imm_all>
    </b-r_m-r>
  </a-s_resp_l-r>
</dx>

Page 129: Talk - XML Structuring Clinical Narrative

As an example of SHML tagging, ‘Asthma’ is a diagnosis dx, associated with the anatomic system a-s, specifically respiratory resp, lower respiratory system l-r.

It is associated with the body region b-r, of the type multi-region m-r.

It is associated with the disease process dx-prcss, immunologic imm, allergic all. And in a typing of diagnoses by group, it is respiratory, more specifically reactive airways disease r-a-d.

Page 130: Talk - XML Structuring Clinical Narrative

Example with Snomed code

<dx>
  <a-s_resp_l-r>
    <b-r_m-r>
      <dx-prcss_imm_all>
        <dx-kind_d-k-resp_r-a-d>
          <Snomed_D2-51000>Asthma</Snomed_D2-51000>
        </dx-kind_d-k-resp_r-a-d>
      </dx-prcss_imm_all>
    </b-r_m-r>
  </a-s_resp_l-r>
</dx>

Page 131: Talk - XML Structuring Clinical Narrative

SHML tagging is not coding, but codes can be added as additional tags. As you

see here, the Snomed code for Asthma has been added as another tag.
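Adding a code as one more tag amounts to wrapping the term in one more element. The Python sketch below does this for the Asthma example with the standard library. The helper name `add_code_tag` is hypothetical, and the sketch assumes the term sits at the bottom of a single chain of nested tags, as it does on the slide.

```python
import xml.etree.ElementTree as ET

# The Asthma example, written without whitespace between tags.
ASTHMA = (
    "<dx><a-s_resp_l-r><b-r_m-r><dx-prcss_imm_all><dx-kind_d-k-resp_r-a-d>"
    "Asthma"
    "</dx-kind_d-k-resp_r-a-d></dx-prcss_imm_all></b-r_m-r></a-s_resp_l-r></dx>"
)


def add_code_tag(xml_text, code_tag):
    """Wrap the innermost term in an additional element named `code_tag`."""
    root = ET.fromstring(xml_text)
    node = root
    while len(node):  # descend along the single chain of nested tags
        node = node[0]
    wrapper = ET.SubElement(node, code_tag)
    wrapper.text, node.text = node.text, None  # move the term into the wrapper
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    print(add_code_tag(ASTHMA, "Snomed_D2-51000"))
```

The existing SHML tags are untouched, so anything that reads the original tagging continues to work after the code is added.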

Dr. Rothwell has kindly allowed me to use a number of his slides to illustrate

various features of the SHML.

Page 132: Talk - XML Structuring Clinical Narrative

SHML tags: formal taxonomies

• Anatomy <a-s>
• Body region <b-r>
• Organisms <or>
• Chemicals <chem>
• Meds <med>
• Diagnoses <dx>
• Procedures <pr>
• …

Page 133: Talk - XML Structuring Clinical Narrative

One can describe SHML tags as formal taxonomies.

Page 134: Talk - XML Structuring Clinical Narrative

SHML Tag Types

• Activities (sports,…)
• Medications: (Multum), med-class
• Chemicals
• Time: freq, repetition, exact, begin, end
• Links
• Modifiers: modal, negation, changes, amount, desc, s-q
• Person: kin, civil
• Demographic
• Anatomic structure
• Body region
• Sign-symptom
• Diagnosis
• Dx-process
• Dx group by system
• Procedures
• Organisms
• Allergies
• Pt social behavior
• Health status (adl…)

Page 135: Talk - XML Structuring Clinical Narrative

These are the main tag types now in use.

Page 136: Talk - XML Structuring Clinical Narrative

SHML tag system

Tag name                  | Anatomic Structure     | Body Region
anatomic system           | a-s                    | b-r
neurologic system         | a-s_nr                 | b-r_h-n_hd
central nervous system    | a-s_nr_cns             | b-r_h-n_hd
brain                     | a-s_nr_cns_brn         | b-r_h-n_hd
respiratory system        | a-s_rsp                | b-r_tk_thx
upper respiratory tract   | a-s_rsp_u-r            | b-r_tk_thx
lower respiratory tract   | a-s_rsp_l-r            | b-r_tk_thx
lung                      | a-s_rsp_l-r_lng        | b-r_tk_thx
stomach                   | a-s_gi_gi-tr_u-gi_stm  | b-r_tk_thx

Page 137: Talk - XML Structuring Clinical Narrative

The encapsulated hierarchic structure of the anatomic system is illustrated here by

the tag for 'brain' in the central nervous system, of the neurologic system, an

anatomic system.

A feature of the SHML is that a term which has an a-s tag also has a b-r tag.

Medical conditions are located both with regard to anatomy and the body region in

which they occur.
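Because each tag encapsulates its place in the hierarchy, the ancestor classes can be read directly from the tag name. The Python sketch below assumes that the underscore separates hierarchy levels and that hyphens belong to a level's own name, which is how the tags on the slide appear to be built. The helper names `tag_levels` and `is_a` are hypothetical.

```python
def tag_levels(tag):
    """List the tag's ancestors from the top class down to the tag itself."""
    parts = tag.split("_")
    return ["_".join(parts[:i]) for i in range(1, len(parts) + 1)]


def is_a(tag, ancestor):
    """True if `tag` equals `ancestor` or lies below it in the hierarchy."""
    return tag == ancestor or tag.startswith(ancestor + "_")


if __name__ == "__main__":
    print(tag_levels("a-s_nr_cns_brn"))
    print(is_a("a-s_rsp_l-r_lng", "a-s_rsp"))
```

A retrieval for everything in the respiratory system can then be a prefix test on the tag rather than a lookup in a separate taxonomy table.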

Page 138: Talk - XML Structuring Clinical Narrative

SHML tags

Organisms <or>
– microorganism <or_mc>
– bacteria <or_mc_bct>
– Gram positive <or_mc_bct_gm-pos>
– Gram negative <or_mc_bct_gm-neg>
– virus <or_mc_vr>
– Rickettsia <or_mc_rck>
– fungus <or_mc_fgs>
– parasite <or_mc_par>
– arthropod <or_mc_arthropod>

Page 139: Talk - XML Structuring Clinical Narrative

This is an example of the tag classes for organisms: at the highest level, ‘or’ for organisms; the subclass microorganisms ‘or_mc’; the further subclass bacteria ‘or_mc_bct’; and so forth.

Page 140: Talk - XML Structuring Clinical Narrative

SHML tag system

Term            | SHML tag | SHML support tags
Penicillin      | med      | med-cl_antiinf_pcn
Ampicillin      | med      | med-cl_antiinf_pcn
Ace inhibitor   | med      | med-cl_aceinh
Captopril       | med      | med-cl_aceinh

Page 141: Talk - XML Structuring Clinical Narrative

Here we see a sample of how medications are tagged: with ‘med’ and the “support class” of their medication class, for example, penicillin, carrying the support class tag med-cl_antiinf_pcn, indicating its medication class is anti-infective, type penicillin.

Page 142: Talk - XML Structuring Clinical Narrative

SHML tag system

Tag Name                        | dx and Support Tags
Diagnosis                       | dx
Diagnostic process              | dx-prcss
Infectious diagnostic process   | dx-prcss_inf
Immune diagnostic process       | dx-prcss_imm
Neoplastic diagnostic process   | dx-prcss_np

Diagnosis group                 | dx-kind
Neurologic disease              | dx-kind_nr
Migraine                        | dx-kind_nr_migr
Reactive Airway Disease         | dx-kind_resp_r-a-d
Asthma                          | dx-kind_resp_r-a-d

Page 143: Talk - XML Structuring Clinical Narrative

Here we see the dx tag for diagnosis with examples of 2 types of support tags,

one for the diagnostic process and one for the diagnostic group. We saw earlier

examples of both support tags in the tagging of the diagnosis ASTHMA.

Page 144: Talk - XML Structuring Clinical Narrative

Where the tags deal with general language, they follow the classes of the

MLP system. For example, words expressing time.

Page 145: Talk - XML Structuring Clinical Narrative

Terms expressing time

Term           | MLP class | Part of Speech | SHML tag
since          | H-TMPREP  | P              | tm_prep
habitual       | H-TMDUR   | ADJ            | tm_dur
fleeting       | H-TMDUR   | ADJ            | tm_dur
end stage      | H-TMEND   | N              | tm_end
discontinue    | H-TMEND   | V              | tm_end
emergent       | H-TMBEG   | ADJ            | tm_beg
initially      | H-TMBEG   | D              | tm_beg
on admission   | H-TMLOC   | D              | tm_loc
antecede       | H-TMLOC   | V              | tm_loc

Page 146: Talk - XML Structuring Clinical Narrative

and negation.

Page 147: Talk - XML Structuring Clinical Narrative

Terms expressing negation

Term            | MLP class | Part of Speech | SHML Tag
without         | H-NEG     | P              | md_ng
reject          | H-NEG     | V              | md_ng
nothing         | H-NEG     | PRO            | md_ng
not able        | H-NEG     | ADJ            | md_ng
never           | H-NEG     | D              | md_ng
exclude         | H-NEG     | V              | md_ng
in absence of   | H-NEG     | P              | md_ng
deny            | H-NEG     | V              | md_ng
decline         | H-NEG     | V              | md_ng

Page 148: Talk - XML Structuring Clinical Narrative

There are a remarkable number of ways to express negation. The MLP lexicon contains 282 negation terms — nouns,

adjectives, adverbs and verbs. For example, the verb ‘declined’ in ‘patient

declined surgery’ is a patient action, but in terms of whether surgery was

performed, it is a negation.

Page 149: Talk - XML Structuring Clinical Narrative

Of course, ‘decline’ has another meaning in relation to quantities, indicating a lessened value, as in ‘hemoglobin declined slightly from 12.3’, but that is a matter of ambiguity resolution.

Page 150: Talk - XML Structuring Clinical Narrative

Terms expressing uncertainty

Term           | MLP class | Part of Speech | SHML tag
hypothetical   | H-MODAL   | ADJ            | md_modal
hypothesize    | H-MODAL   | TV             | md_modal
hypothesis     | H-MODAL   | N              | md_modal
doubtful       | H-MODAL   | ADJ            | md_modal
conceivably    | H-MODAL   | D              | md_modal
assumption     | H-MODAL   | N              | md_modal
assume         | H-MODAL   | TV             | md_modal
allegedly      | H-MODAL   | D              | md_modal

Page 151: Talk - XML Structuring Clinical Narrative

The tag ‘md_modal’ corresponds to the MLP class H-MODAL that contains

terms that express uncertainty, a surprising 905 terms in the MLP lexicon.

Page 152: Talk - XML Structuring Clinical Narrative

MLP with SHML linkage

[Pipeline diagram repeated from slide 87: Medical documents, Preprocessing (standardization), Documents with SIDs, MLP with the MLP and SHML Dictionaries, Documents in HIU’s with SHML and MLP tags, then Viewer and Other Applications. GENERATORS: SHML/DTD, SHML/XSL, SHML/XQL]

Page 153: Talk - XML Structuring Clinical Narrative

To see how the combined MLP-SHML system functions, we return to the viewer,

this time for an example involving a patient population.

Page 154: Talk - XML Structuring Clinical Narrative

JCAHO ORYX core measures

• JCAHO – Joint Commission on Accreditation of Healthcare Organizations
• ORYX – JCAHO’s Quality Measurement System to allow quality care comparison between organizations.

Page 155: Talk - XML Structuring Clinical Narrative

A number of accrediting organizations are requiring healthcare providers to

demonstrate the quality of the care being delivered. An example is the JCAHO

ORYX Core Measures.

As these measures have been refined, more clinical information is being

required, which often must be abstracted from clinic notes. This puts an added

burden and cost on provider organizations.

Page 156: Talk - XML Structuring Clinical Narrative

A JCAHO ORYX core measure for congestive heart failure

• What percent of patients with congestive heart failure (CHF) and a low ejection fraction (EF) are on ACE inhibitors?

Page 157: Talk - XML Structuring Clinical Narrative

One of the requirements of the Congestive Heart Failure measure is

what percent of patients with CHF, and a low ejection fraction, are on ACE

Inhibitors.

ACE is Angiotensin Converting Enzyme. ACE Inhibitor is a medication that can be

used to lower blood pressure, but has also been shown to decrease morbidity

and mortality in patients with Congestive Heart Failure, or a recent MI.

Page 158: Talk - XML Structuring Clinical Narrative

For most organizations, accessing the Ejection Fraction requires abstracting

the chart.

This information can also be accessed by processing the documents and running a

query on the processed documents.

Page 159: Talk - XML Structuring Clinical Narrative

Viewer: JCAHO requirement

• A set of CHF patients, with EF < 40% on ACE inhibitors

Page 160: Talk - XML Structuring Clinical Narrative

VIEWER:
- Select JCAHO Core Measures at the top of the Viewer
- Select JCAHO CHF in the Query Type field.
- Select idtmerg in the Document Set field,
- push Submit

Page 161: Talk - XML Structuring Clinical Narrative

I have selected a data set of discharge summaries for 95 patients who were hospitalized for a variety of cardiac conditions.

We find that there were 42 CHF Patients.
- of those 42, 11 had a documented Ejection Fraction of < 40% (the query also accepted 40%)
- of those 11, 8 were on ACE Inhibitors.
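The query is a sequence of nested filters, which the following Python sketch reproduces on made-up records. The `Patient` fields, the function `chf_ace_measure`, and the four sample patients are all hypothetical. In the real system each field would be derived from HIUs rather than entered by hand. The threshold comparison uses "less than or equal" because the query also accepted 40%.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Patient:
    patient_id: str
    has_chf: bool
    ejection_fraction: Optional[float]  # percent; None if not documented
    on_ace_inhibitor: bool


def chf_ace_measure(patients, ef_threshold=40.0):
    """Return the counts (CHF, CHF with low EF, of those on an ACE inhibitor)."""
    chf = [p for p in patients if p.has_chf]
    low_ef = [
        p for p in chf
        if p.ejection_fraction is not None
        and p.ejection_fraction <= ef_threshold
    ]
    on_ace = [p for p in low_ef if p.on_ace_inhibitor]
    return len(chf), len(low_ef), len(on_ace)


if __name__ == "__main__":
    # Invented records for illustration only.
    sample = [
        Patient("P1", True, 20.0, True),
        Patient("P2", True, 40.0, False),
        Patient("P3", True, None, True),
        Patient("P4", False, 30.0, True),
    ]
    print(chf_ace_measure(sample))
```

Patients without a documented ejection fraction drop out at the second step, which mirrors the step from 42 CHF patients to 11 in the data set above.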

Page 162: Talk - XML Structuring Clinical Narrative

We can open some of the HIU's that gave us the results for these patients.

VIEWER Open 42 CHF Pts

As an example of CHF patients:

VIEWER Choose IDT051
Click on HIU ‘She was found to be in Congestive Heart Failure’

Page 163: Talk - XML Structuring Clinical Narrative

One HIU: ‘She was found to be in Congestive Heart Failure’

By clicking on this HIU, we see the text context.

The link to the text is important. NLP is not perfect, and 1 HIU may not be the

whole story.

Other patients have more than 1 HIU

Page 164: Talk - XML Structuring Clinical Narrative

VIEWER Choose IDT061
Click on HIU ‘Chest Xray showed severe congestive heart failure’

This patient qualifies for CHF by the first HIU: ‘Chest Xray showed severe congestive heart failure’ and by others.

VIEWER Choose IDT061
Click on last HIU ‘congestive heart failure’

CHF was the Discharge Diagnosis.

Page 165: Talk - XML Structuring Clinical Narrative

Of the 42 CHF patients, 11 had EF less than 40 %

Clicking on the first patient in this group

VIEWER Go to 11 Pts with EF < 40 %

Click on IDT001

Page 166: Talk - XML Structuring Clinical Narrative

we see the HIU ‘Most recent echocardiogram in 05/00/92 (May 1992)

showed an ejection fraction of 20 %’.

VIEWER Click on HIU ‘Most recent echocardiogram in 05/00/92

showed an ejection fraction of 20 %’.

Page 167: Talk - XML Structuring Clinical Narrative

For this patient there might be a question whether the ORYX measure is met, since the ‘most recent’ EF was obtained in May and this admission is in September.

The other criteria are met. We find the HIU ‘IMPRESSION: Congestive Heart

Failure’

VIEWER Click on HIU ‘IMPRESSION: Congestive Heart

Failure’

Page 168: Talk - XML Structuring Clinical Narrative

and numerous HIUs showing the patient is taking Captopril, an ACE Inhibitor.

VIEWER Highlight (without clicking) the 4 Captopril HIUs

Page 169: Talk - XML Structuring Clinical Narrative

If time permits, or at some other time during the conference, we may explore

other examples.

Page 170: Talk - XML Structuring Clinical Narrative

To conclude, I would like to summarize the program for treating text in the EHR, using natural language processing, that I

have just presented.

Page 171: Talk - XML Structuring Clinical Narrative

Text in EHR

TEXT → MLP → … → XML/SHML → HIUs → VIEWER

Page 172: Talk - XML Structuring Clinical Narrative

Text is captured by whatever means are available at the time, and is preprocessed.

Page 173: Talk - XML Structuring Clinical Narrative

Text in EHR

TEXT → MLP → … → XML/SHML → HIUs → VIEWER

1. Electronic form: Transcription, Voice, OCR

2. Preprocessing

Page 174: Talk - XML Structuring Clinical Narrative

Text sentences are then passed through the 5 components of the MLP system to produce XML trees. NIMPH is a quality control procedure applied to NLP output trees. XML-SHML tags are added to text words, creating HIUs, a representation of clinical facts, which are input to the Viewer.
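
As a rough illustration of this flow, the stages can be chained so that each consumes the previous stage's output. The Python sketch below is a toy: every function is a hypothetical placeholder (the five MLP components are collapsed into one stand-in, and the `SENTENCE`, `WORD`, and `HIU` tags and the quality check are invented for the example), so it shows only the order of the steps, not how the MLP system implements them.

```python
import xml.etree.ElementTree as ET


def preprocess(text):
    """Placeholder: normalize whitespace and split into sentences on periods."""
    return [s.strip() for s in " ".join(text.split()).split(".") if s.strip()]


def mlp_parse(sentence):
    """Placeholder for the 5 MLP components: put each word into an XML tree."""
    tree = ET.Element("SENTENCE")
    for word in sentence.split():
        ET.SubElement(tree, "WORD").text = word
    return tree


def quality_check(tree):
    """Placeholder for a NIMPH-style check: accept only non-empty trees."""
    return len(tree) > 0


def tag_hiu(tree):
    """Placeholder tagging step: wrap a checked tree in a hypothetical HIU element."""
    hiu = ET.Element("HIU")
    hiu.append(tree)
    return hiu


def run_pipeline(text):
    """TEXT -> preprocessing -> parse -> quality check -> tagged HIUs."""
    trees = [mlp_parse(s) for s in preprocess(text)]
    return [tag_hiu(t) for t in trees if quality_check(t)]


hius = run_pipeline("Chest Xray showed severe congestive heart failure.")
print(ET.tostring(hius[0], encoding="unicode"))
```

The resulting list is what a viewer would take as input; each stage can be replaced independently as long as it accepts and returns the same kind of object.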

Page 175: Talk - XML Structuring Clinical Narrative

Text in EHR

TEXT → MLP → … → XML/SHML → HIUs → VIEWER

1. Electronic form: Transcription, Voice, OCR

2. Preprocessing

5 MLP steps

dBtrees + NIMPH

tagging clinical facts

Page 176: Talk - XML Structuring Clinical Narrative

Templates are provided to sort and display HIUs in their document context,

or to perform other tasks on the database created by MLP.

Page 177: Talk - XML Structuring Clinical Narrative

Text in EHR

TEXT → MLP → … → XML/SHML → HIUs → VIEWER

1. Electronic form: Transcription, Voice, OCR

2. Preprocessing

5 MLP steps

dBtrees + NIMPH

tagging clinical facts

Templates: Sort, Display/Access, Perform analysis, Perform tasks

Page 178: Talk - XML Structuring Clinical Narrative

Whether or not an NLP system, the MLP or another, will come to play a role in the

future EHR, I hope today I have given you an inkling of what is involved in

creating an NLP system, and also a hint of what might be its contribution.

Page 179: Talk - XML Structuring Clinical Narrative

And I thank you for your patient attention.

Page 180: Talk - XML Structuring Clinical Narrative

The end