Creating and Evaluating a Consensus for Negated and Speculative Words in a Swedish Clinical Corpus
Hercules Dalianis, Maria Skeppstedt
Stockholm University, Department of Computer and Systems Sciences
Dalianis & Skeppstedt, NeSp-NLP July 10, 2010
Intro and Contents
• An experiment with annotated clinical text
1 Background
2 Creation of a consensus
3 Automatic detection of cues and the class
4 Comparison with the BioScope Corpus
5 Conclusion and next step
What is special about clinical text?
Kvinna med hjärtsvikt, förmaksflimmer, angina pectoris. Ensamstående änka. Tidigare CVL med sequelae högersidig hemipares och afasi. Tidigare vårdad för krampanfall misstänkt apoplektisk. Inkommer nu efter att ha blivit hittad på en stol och sannolikt suttit så över natten. Inkommer nu för utredning. Sonen Johan är med.
Example of clinical text (Swedish)
Woman with heart failure, atrial
fibrillation, and angina pectoris. Single,
widow. Former CVL with sequelae, right
hemiparesis and aphasia. Prior hosp. care
for seizures, suspected apoplectic. Arrives at
hospital after being found in a chair and
probably having sat there overnight. Arrives
for further investigation and care.
Accompanied by her son Johan.
Example of clinical text (English translation)
Related research: Negation and speculation detection in clinical text
• Both rule-based systems and machine learning systems
• Reported precision and recall range from just above 80% to just below 100%
• Mostly on English text
The Stockholm EPR Corpus
• Clinics in Stockholm
• 2006-2008
• >800 clinics, >1 million patients
• In Swedish
The annotation
• Three annotators
• The assessment part of health records
• 6 740 sentences
Annotated:
– Cues for negation and speculation
– Classify the sentence as either certain or uncertain, or break it up into sub-clauses
The annotation
<Sentence>
<Uncertain>
<Speculative_words>
<Negation>Not</Negation>
really
</Speculative_words>
much worse than before
</Uncertain>
</Sentence>
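A minimal sketch of how an annotated sentence like the one above could be read programmatically. It assumes the final tag is a closing tag, and it assumes a `Certain` element exists alongside `Uncertain`; the slide only shows `Uncertain`, so that name is a guess. The helper `summarize` is hypothetical.

```python
import xml.etree.ElementTree as ET

# The slide's example, with the last tag written as a closing tag.
EXAMPLE = (
    "<Sentence><Uncertain><Speculative_words>"
    "<Negation>Not</Negation> really</Speculative_words>"
    " much worse than before</Uncertain></Sentence>"
)

def _text(element):
    """Full text under an element, with whitespace normalized."""
    return " ".join("".join(element.itertext()).split())

def summarize(sentence_xml):
    """Return (class spans, cue spans) as lists of (tag, text) pairs."""
    root = ET.fromstring(sentence_xml)
    # "Certain" is an assumed tag name; only "Uncertain" appears on the slide.
    classes = [(el.tag, _text(el)) for el in root.iter()
               if el.tag in ("Certain", "Uncertain")]
    cues = [(el.tag, _text(el)) for el in root.iter()
            if el.tag in ("Negation", "Speculative_words")]
    return classes, cues

classes, cues = summarize(EXAMPLE)
```

Here the negation cue is nested inside the speculation cue, so both are reported, outermost first.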
Construction of a consensus
General idea:
• Choose the majority annotation
Discarded:
• The first annotation rounds (16%)
• A further 2% that were too different to be resolved
In the resulting consensus:
• 92% identically annotated by at least two persons
• 6% identically annotated by at least two persons for class; for cues, identical only when the scope of the cue is disregarded (e.g. "could perhaps")
• 2% identical only for class, and only when the scope of the class is disregarded
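The majority idea can be sketched roughly as below. This is a simplification under stated assumptions, not the procedure actually used: each annotator's annotation of a sentence is reduced to a hypothetical `(class, set of cues)` pair, scope is ignored, and the fallback to class-only agreement only loosely mirrors the tiers listed above.

```python
from collections import Counter

def majority(values):
    """Return the value shared by a strict majority of annotators, else None."""
    value, count = Counter(values).most_common(1)[0]
    return value if count * 2 > len(values) else None

def consensus(annotations):
    """annotations: one (sentence_class, frozenset_of_cues) pair per annotator."""
    full = majority(annotations)
    if full is not None:
        return full, "identical"
    # Fall back to agreement on the class alone (cues left unresolved).
    cls = majority([sentence_class for sentence_class, _ in annotations])
    if cls is not None:
        return (cls, None), "class only"
    return None, "unresolved"

a = ("uncertain", frozenset({"perhaps"}))
b = ("uncertain", frozenset({"could perhaps"}))
c = ("certain", frozenset())
print(consensus([a, a, c]))
print(consensus([a, b, c]))
```

In the second call no two annotators agree on the cues, so only the class survives into the consensus.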
Differences between the individual annotations and the consensus
1. Fewer uncertain expressions
2. Fewer cues for speculation
3. Fewer sentences that were divided into sub-clauses
The BioScope Corpus
1. Cues for speculation and negation
2. The scope of speculation and negation
<sentence id="S1345.2">Correlation with the patient's height and weight <xcope id="X1345.2.1"><cue type="speculation" ref="X1345.2.1">may</cue> be some value</xcope>.</sentence>
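As an illustration of how cue and scope relate in this markup, the sketch below pairs each `cue` with the `xcope` its `ref` attribute points to. It only relies on the element and attribute names visible in the example sentence; the helper `cues_with_scope` is hypothetical.

```python
import xml.etree.ElementTree as ET

SENTENCE = (
    '<sentence id="S1345.2">Correlation with the patient\'s height and weight '
    '<xcope id="X1345.2.1"><cue type="speculation" ref="X1345.2.1">may</cue>'
    ' be some value</xcope>.</sentence>'
)

def cues_with_scope(sentence_xml):
    """Return (cue type, cue text, scope text) for every cue in a sentence."""
    root = ET.fromstring(sentence_xml)
    # Map each scope id to the full text it covers, nested elements included.
    scopes = {x.get("id"): "".join(x.itertext()) for x in root.iter("xcope")}
    return [(c.get("type"), c.text, scopes.get(c.get("ref")))
            for c in root.iter("cue")]

print(cues_with_scope(SENTENCE))
```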
Comparison between the BioScope Corpus and our corpus
Type of word                          Our consensus  BioScope
Unique negation cues                  13             19
Negation cues occurring only once     5              10
Unique speculation cues               408            79
Speculation cues occurring only once  294            19
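Counts of this kind could be produced along the following lines. The function `cue_stats` and the toy cue list are illustrative only; the `TABLE` figures are copied from the comparison above, and the singleton shares are derived here rather than reported in the slides.

```python
from collections import Counter

def cue_stats(cues):
    """Return (number of distinct cues, number of cues seen exactly once)."""
    counts = Counter(cue.lower() for cue in cues)
    return len(counts), sum(1 for n in counts.values() if n == 1)

# Toy input, not taken from either corpus.
print(cue_stats(["inte", "Inte", "ej", "kanske"]))

# (unique cues, cues occurring only once), as given in the table.
TABLE = {
    ("negation", "consensus"): (13, 5),
    ("negation", "bioscope"): (19, 10),
    ("speculation", "consensus"): (408, 294),
    ("speculation", "bioscope"): (79, 19),
}
singleton_share = {key: once / unique for key, (unique, once) in TABLE.items()}
```

By this derived measure, roughly 72% of the speculation cues in the consensus occur only once, against roughly 24% in BioScope.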
Our corpus / the BioScope Corpus
1. Not so detailed guidelines / More detailed guidelines
2. Consensus with majority decision / Resolving differences with chief annotator (also higher inter-annotator agreement)
3. Assessment part from many clinics / Radiology reports
Experiment with the Stanford Named Entity Recognizer
Based on Conditional Random Fields
• Detection of cues and of the certain/uncertain class
• Comparison between our corpus and the BioScope Corpus
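A sequence labeller of this kind is typically trained on one token per line with a label column. The sketch below writes a sentence in a two-column, tab-separated layout of the sort Stanford NER training files commonly use. The label names `SPEC` and `O`, the helper `to_tsv`, and the decision to flatten the nested negation into the speculation label are all assumptions; the slides do not state how the annotation was encoded.

```python
def to_tsv(tokens, cue_labels, background="O"):
    """Render one sentence as token<TAB>label lines, ended by a blank line.

    tokens: list of str
    cue_labels: dict mapping token index to a label (hypothetical label set)
    """
    lines = [f"{token}\t{cue_labels.get(i, background)}"
             for i, token in enumerate(tokens)]
    return "\n".join(lines) + "\n\n"

tokens = ["Not", "really", "much", "worse", "than", "before"]
# Assumption: the whole phrase "Not really" is labelled as one speculation
# cue, which drops the nested negation annotation.
print(to_tsv(tokens, {0: "SPEC", 1: "SPEC"}), end="")
```

A flat label column cannot express a cue nested inside another cue, so some such flattening choice has to be made before training.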
Result of automatic detection of cues for negation
                     Precision  Recall
Our corpus           0.879      0.917
The BioScope corpus  0.976      0.967
Result of automatic detection of cues for speculation
                     Precision  Recall
Our corpus           0.674      0.354
The BioScope corpus  0.946      0.908
Result of automatic detection of class and scope
                                                     Precision  Recall
Our corpus (uncertain expression)                    0.494      0.371
BioScope (scope for either negation or speculation)  0.838      0.812
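To compare the result tables with a single number, precision and recall can be combined into the balanced F-score, F1 = 2PR / (P + R). The slides report only precision and recall, so the F1 values computed below are derived here, not reported results; the dictionary keys are informal labels.

```python
def f1(precision, recall):
    """Harmonic mean of precision and recall (0.0 when both are zero)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

# (precision, recall) pairs copied from the three result tables.
RESULTS = {
    "negation cues, our corpus": (0.879, 0.917),
    "negation cues, BioScope": (0.976, 0.967),
    "speculation cues, our corpus": (0.674, 0.354),
    "speculation cues, BioScope": (0.946, 0.908),
    "uncertain expression, our corpus": (0.494, 0.371),
    "scope, BioScope": (0.838, 0.812),
}

for name, (p, r) in RESULTS.items():
    print(f"{name}: F1 = {f1(p, r):.3f}")
```

On this derived measure, speculation cue detection comes out near 0.46 on the consensus against roughly 0.93 on BioScope, which matches the gap visible in the precision and recall figures.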
Conclusion and next step
1. Low results for detecting cues for speculation and class in our constructed corpus
2. Simplifying the task can hopefully result in:
• Higher inter-annotator agreement
• Easier to automatically learn to detect speculation