
Creating and Evaluating a Consensus for Negated and Speculative Words in a Swedish Clinical Corpus

Hercules Dalianis, Maria Skeppstedt

Stockholm University, Department of Computer and Systems Sciences

Dalianis & Skeppstedt, NeSp-NLP July 10, 2010

Intro and Contents

• An experiment with annotated clinical text

1 Background

2 Creation of a consensus

3 Automatic detection of cues and the class

4 Comparison with the BioScope Corpus

5 Conclusion and next step



What is special about clinical text?



Kvinna med hjärtsvikt, förmaksflimmer, angina pectoris. Ensamstående änka. Tidigare CVL med sequelae högersidig hemipares och afasi. Tidigare vårdad för krampanfall misstänkt apoplektisk. Inkommer nu efter att ha blivit hittad på en stol och sannolikt suttit så över natten. Inkommer nu för utredning. Sonen Johan är med.

Example of clinical text (Swedish)



Woman with heart failure, atrial fibrillation, and angina pectoris. Single, widow. Former CVL with sequelae, right hemiparesis and aphasia. Prior hosp. care for seizures, apoplectic suspected. Arrives at hospital after being found in a chair, having probably been sitting there overnight. Arrives for further investigation and care. Accompanied by her son Johan.

Example of clinical text (English translation)



Related research: Negation and speculation detection in clinical text

• Both rule-based systems and machine learning systems

• Precision and recall from just above 80% to just below 100%

• Mostly on English text



The Stockholm EPR Corpus

• Clinics in Stockholm

• 2006-2008

• >800 clinics, >1 million patients

• In Swedish



The annotation

• Three annotators

• The assessment part of health records

• 6 740 sentences

Annotated:

– Cues for negation and speculation

– The class of each sentence: either certain or uncertain, or the sentence broken up into sub-clauses


The annotation


<Sentence>
  <Uncertain>
    <Speculative_words><Negation>Not</Negation> really</Speculative_words>
    much worse than before
  </Uncertain>
</Sentence>
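A minimal sketch of how an annotated sentence like the one above could be read programmatically, using Python's standard `xml.etree.ElementTree`. The tag names `Sentence`, `Uncertain`, `Speculative_words` and `Negation` come from the slide; the tag name `Certain` and the function name `read_annotation` are assumptions made for illustration.

```python
import xml.etree.ElementTree as ET

# The annotated example from the slide, with the sentence tag closed.
ANNOTATED = (
    "<Sentence><Uncertain>"
    "<Speculative_words><Negation>Not</Negation> really</Speculative_words>"
    " much worse than before"
    "</Uncertain></Sentence>"
)

def read_annotation(xml_text):
    """Return (class labels, negation cues, speculation cues) for one sentence."""
    root = ET.fromstring(xml_text)
    # A sentence may be split into several classified sub-clauses,
    # so every class tag is collected. "Certain" is an assumed tag name.
    classes = [el.tag for el in root.iter() if el.tag in ("Certain", "Uncertain")]
    negations = ["".join(el.itertext()).strip() for el in root.iter("Negation")]
    # A speculation cue may contain a nested negation cue; join all of its text.
    speculations = [
        " ".join("".join(el.itertext()).split())
        for el in root.iter("Speculative_words")
    ]
    return classes, negations, speculations

print(read_annotation(ANNOTATED))
```

Collecting all class tags, rather than only the first one, keeps the sketch usable for sentences that the annotators broke up into sub-clauses.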


Construction of a consensus

General idea:

• Choose the majority annotation

Discarded:

• The first annotation rounds (16%)

• A further 2% that were too different to be resolved

In the resulting consensus:

• 92% identically annotated by at least two persons

• 6% identically annotated by at least two persons for class (for cues, identical only when the scope is disregarded, e.g. "could perhaps")

• 2% identical only for class, and only when the scope of the class is disregarded
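The majority idea with a relaxed fallback can be sketched as follows. This is an illustration under stated assumptions, not the procedure actually used: the representation of an annotation as a `(class, cues)` tuple, the function name `majority`, and the `key` argument that decides what counts as "identical" are all hypothetical.

```python
from collections import Counter

def majority(annotations, key=lambda a: a):
    """Return an annotation agreed on by at least two annotators, else None.

    `key` maps an annotation to the part that must match, so a looser key
    (for example, class only) models "identical when X is disregarded".
    """
    counts = Counter(key(a) for a in annotations)
    top_key, count = counts.most_common(1)[0]
    if count < 2:
        return None
    # Return the first annotation that carries the winning key.
    return next(a for a in annotations if key(a) == top_key)

# Hypothetical annotations of one sentence by three annotators: (class, cues).
a1 = ("uncertain", ("could perhaps",))
a2 = ("uncertain", ("could", "perhaps"))
a3 = ("certain", ())

exact = majority([a1, a2, a3])                          # no two are identical
by_class = majority([a1, a2, a3], key=lambda a: a[0])   # agree on class only
print(exact, by_class)
```

Trying the strict key first and a looser key only when it fails mirrors the three levels of agreement listed above.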



Differences between the individual annotations and the consensus

1. Fewer uncertain expressions

2. Fewer cues for speculation

3. Fewer sentences that were divided into sub-clauses



The BioScope Corpus

1. Cues for speculation and negation

2. The scope of speculation and negation

<sentence id="S1345.2">Correlation with the patient's height and weight <xcope id="X1345.2.1"><cue type="speculation" ref="X1345.2.1">may</cue> be some value</xcope>.</sentence>
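As an illustration of how cues and their scopes are linked in this format, the sketch below reads the example sentence with Python's standard `xml.etree.ElementTree`. The element and attribute names (`xcope`, `cue`, `type`, `ref`, `id`) are taken from the example; the function name `cues_and_scopes` is hypothetical.

```python
import xml.etree.ElementTree as ET

BIOSCOPE = (
    '<sentence id="S1345.2">Correlation with the patient\'s height and weight '
    '<xcope id="X1345.2.1"><cue type="speculation" ref="X1345.2.1">may</cue>'
    ' be some value</xcope>.</sentence>'
)

def cues_and_scopes(xml_text):
    """Return a list of (cue type, cue text, scope text) triples."""
    root = ET.fromstring(xml_text)
    # Index every scope by its id so each cue can look up its own scope via `ref`.
    scopes = {x.get("id"): "".join(x.itertext()) for x in root.iter("xcope")}
    return [
        (c.get("type"), c.text, scopes.get(c.get("ref")))
        for c in root.iter("cue")
    ]

print(cues_and_scopes(BIOSCOPE))
```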



Comparison between the BioScope Corpus and our corpus

Type of word                            Our consensus   BioScope

Unique negation cues                               13         19

Negation cues occurring only once                   5         10

Unique speculation cues                           408         79

Speculation cues occurring only once              294         19
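Counts of this kind can be obtained from a list of cue occurrences as sketched below. The function name `cue_stats` and the toy cue list are hypothetical; only the four figures passed to `share_once` come from the table.

```python
from collections import Counter

def cue_stats(cue_occurrences):
    """Return (number of unique cues, number of cues occurring only once)."""
    counts = Counter(cue.lower() for cue in cue_occurrences)
    once = sum(1 for n in counts.values() if n == 1)
    return len(counts), once

def share_once(unique, once):
    """Fraction of the unique cues that occur only once."""
    return once / unique

# Toy input, not taken from either corpus.
print(cue_stats(["may", "might", "may", "could perhaps"]))

# Shares computed from the speculation rows of the table.
print(share_once(408, 294), share_once(79, 19))
```

With the table's figures, 294 of 408 unique speculation cues in the consensus occur only once, roughly 72%, against 19 of 79 in BioScope, roughly 24%.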



Our corpus/the BioScope Corpus

1. Less detailed guidelines/More detailed guidelines

2. Consensus with majority decision/Resolving differences with chief annotator (also higher inter-annotator agreement)

3. Assessment part from many clinics/Radiology reports



Experiment with the Stanford Named Entity Recognizer

Based on Conditional Random Fields

• Detection of cues and of the class (certain/uncertain)

• Comparison between our corpus and the BioScope Corpus
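A sequence labeller of this kind is typically trained on one token per line with its label in a second column. The sketch below writes that layout; the tab-separated two-column format, the label names `NEG`, `SPEC` and `O`, and the function name `to_columns` are assumptions for illustration, and the column mapping actually expected by the tool is set in its own configuration.

```python
def to_columns(tokens, cue_labels, background="O"):
    """Render one sentence as tab-separated token/label lines.

    tokens:     list of token strings
    cue_labels: dict mapping a token index to its label (hypothetical labels)
    background: label for tokens that are not part of any cue
    """
    return "\n".join(
        f"{token}\t{cue_labels.get(i, background)}"
        for i, token in enumerate(tokens)
    )

tokens = "Not really much worse than before".split()
# Assumed labelling of the annotated example: "Not" as negation cue,
# "really" as the rest of the speculation cue.
print(to_columns(tokens, {0: "NEG", 1: "SPEC"}))
```

A token that belongs to two cues at once, as "Not" does in the annotated example, has to be given a single label in this layout, which is one of the simplifications such a format forces.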



Result of automatic detection of cues for negation

                      Precision   Recall

Our corpus                0.879    0.917

The BioScope corpus       0.976    0.967



Result of automatic detection of cues for speculation

                      Precision   Recall

Our corpus                0.674    0.354

The BioScope corpus       0.946    0.908



Result of automatic detection of class and scope

                                                       Precision   Recall

Our corpus (uncertain expression)                          0.494    0.371

BioScope (scope for either negation or speculation)        0.838    0.812
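Precision and recall can be combined into a single figure with the balanced F-score, the harmonic mean of the two. The slides report only precision and recall, so the values printed below are derived here and are not results stated by the authors; the dictionary keys are descriptive labels chosen for this sketch.

```python
def f1(precision, recall):
    """Balanced F-score: harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

# (precision, recall) pairs as reported on the result slides.
reported = {
    "negation cues, our corpus": (0.879, 0.917),
    "negation cues, BioScope": (0.976, 0.967),
    "speculation cues, our corpus": (0.674, 0.354),
    "speculation cues, BioScope": (0.946, 0.908),
    "uncertain expression, our corpus": (0.494, 0.371),
    "scope, BioScope": (0.838, 0.812),
}

for name, (p, r) in reported.items():
    print(f"{name}: F1 = {f1(p, r):.3f}")
```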



Conclusion and next step

1. Low results for detecting cues for speculation and class in our constructed corpus

2. Simplifying the task can hopefully result in:

• Higher inter-annotator agreement

• Speculation that is easier to learn to detect automatically


Thank you!

Questions?

Hercules Dalianis

Maria Skeppstedt
