![Page 1: Exploiting Subjectivity Classification to Improve Information Extraction Ellen Riloff University of Utah Janyce Wiebe University of Pittsburgh William](https://reader035.vdocuments.net/reader035/viewer/2022062321/56649ed35503460f94be356f/html5/thumbnails/1.jpg)
Exploiting Subjectivity Classification to Improve Information Extraction
Ellen Riloff, University of Utah
Janyce Wiebe, University of Pittsburgh
William Phillips, University of Utah
Subjectivity?
• Definition: Subjective language expresses or refers to opinions, emotions, sentiments and other private states.
• Related Work:
– Sentiments (Turney & Littman 2003; Dave, Lawrence, & Pennock 2003; Pang & Lee 2004)
– Product Reputation Tracking (Morinaga et al. 2002; Yi et al. 2003)
– Opinion-Oriented Summarization and QA (Hu & Liu 2004; Yu & Hatzivassiloglou 2003)
• Opinion - personal beliefs
• Emotion - state of mind
• Sentiments - positive/negative judgements
Motivation
• Our observation: many false hits produced by Information Extraction (IE) systems come from subjective sentences.
• Hypothesis: we can improve IE performance by avoiding extractions from subjective sentences.
Examples
“D’Aubuisson unleashed harsh attacks on Duarte…”
“The Parliament exploded into fury against the government when word leaked out…”
“The subversives must suspend the aggression against the people and the destruction of the economy…”
The Big Picture
[Diagram: Subjective Sentence Classifier → subjective sentences → Selective Information Extraction; objective sentences → Full Information Extraction]
The Subjectivity Classifier
• Most documents contain a mix of subjective and objective sentences.
– 44% of sentences in newspaper articles are subjective! (Wiebe et al. 2004)
• We used the Naïve Bayes subjective sentence classifier developed by Wiebe & Riloff (2005).
– Classifies at the sentence level
– Unsupervised
– Rivals the best supervised methods
Initial Training Data Creation
[Diagram: unlabeled texts and subjective clues → rule-based subjective sentence classifier and rule-based objective sentence classifier → subjective & objective sentences]
Naïve Bayes Training
[Diagram: training set → extraction pattern learner → subjective patterns and objective patterns; subjective patterns, objective patterns, subjective clues, and POS features → Naïve Bayes training → Naïve Bayes Classifier]
NB Confidence Measure
CM =
MUC-4 IE Task
• To extract information about terrorist events in Latin America.
• Evaluated performance on 4 types of information:
– perpetrators (individuals), victims, targets, weapons
• Corpus: 1700 texts
– 1400 used for training, 100 for tuning, 200 for testing
• Used AutoSlog-TS to generate extraction patterns
– system used 397 patterns
Base IE System Performance
System Rec Prec F #Correct #Wrong
IE .52 .42 .47 266 367
Filtering Subjective Sentences
System Rec Prec F #Correct #Wrong
IE .52 .42 .47 266 367
IE+SubjFilter .44 .44 .44 218 (-48) 273 (-94)
Source Attribution Sentences
• In news articles, factual information is often prefaced with a source attribution. Examples:
“The Associated Press reported…” “The President stated…”
• Source attribution sentences often contain important facts even if subjective language is also present.
Source Attribution Modification
• Keep a subjective sentence if it contains a source attribution:
1) the sentence contains a communication verb: {affirm, announce, cite, confirm, convey, disclose, report, tell, say, state}
2) the subjectivity classifier considers the sentence to be only weakly subjective (CM ≤ 25)
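A minimal Python sketch of this check, under stated assumptions: both conditions must hold, the threshold is read as an upper bound on CM (CM ≤ 25), and inflected verb forms are matched by crude suffix stripping. All function and variable names are hypothetical and are not taken from the original system.

```python
# Communication verbs listed on the slide.
COMMUNICATION_VERBS = {
    "affirm", "announce", "cite", "confirm", "convey",
    "disclose", "report", "tell", "say", "state",
}

# Irregular past forms that suffix stripping cannot recover (assumption).
IRREGULAR_FORMS = {"told": "tell", "said": "say"}


def _candidate_lemmas(word):
    """Return rough base-form guesses for a token (crude, not a real lemmatizer)."""
    w = word.lower().strip(".,;:!?\"'()")
    candidates = {w, IRREGULAR_FORMS.get(w, w)}
    for suffix in ("s", "d", "ed", "ing"):
        if w.endswith(suffix):
            stem = w[: -len(suffix)]
            candidates.add(stem)
            candidates.add(stem + "e")  # e.g. "citing" -> "cit" -> "cite"
    return candidates


def has_communication_verb(tokens):
    """True if any token looks like a form of a listed communication verb."""
    return any(_candidate_lemmas(t) & COMMUNICATION_VERBS for t in tokens)


def keep_subjective_sentence(tokens, cm, max_cm=25.0):
    """Keep a subjective sentence only if it looks like a source attribution.

    Assumes both conditions are required and that "weakly subjective"
    means cm <= max_cm.
    """
    return has_communication_verb(tokens) and cm <= max_cm


sentence = "The Associated Press reported heavy fighting .".split()
print(keep_subjective_sentence(sentence, cm=10.0))
print(keep_subjective_sentence(sentence, cm=40.0))
```

Suffix stripping also matches noun uses such as "states" or "reports", so a fuller version would check part-of-speech tags before accepting a match.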
Results with Source Attribution Modification
System Rec. Prec. F #Correct #Wrong
IE .52 .42 .47 266 367
IE+SubjFilter .44 .44 .44 218(-48) 273(-94)
IE+SubjFilter2 .46 .44 .45 231(-35) 289(-78)
Selective Filtering
• We observed that subjective sentences can contain important facts. For example:
“He was outraged by the terrorist attack on the World Trade Center.”
• Modification: selectively extract information from subjective sentences.
• Done using Indicator Patterns.
Indicator Patterns
• We defined an indicator pattern as a pattern that has the following AutoSlog-TS statistics:
P(relevant | pattern) ≥ 0.65 and Frequency ≥ 10
• Indicator Patterns clearly represent a fact of interest:
– “murder of X”
– “X was assassinated”
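The indicator-pattern test and the selective filtering it enables can be sketched in Python as follows. This is an illustration only: the thresholds are read as lower bounds, the `PatternStats` type and function names are hypothetical, and the statistics in the example are made up rather than taken from the MUC-4 experiments.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PatternStats:
    pattern: str
    p_relevant: float  # P(relevant | pattern), as estimated by the pattern learner
    frequency: int     # number of times the pattern fired in training


def is_indicator(stats, min_p=0.65, min_freq=10):
    """Indicator pattern: high relevance probability and enough occurrences."""
    return stats.p_relevant >= min_p and stats.frequency >= min_freq


def patterns_for_sentence(all_patterns, sentence_is_subjective):
    """Objective sentences get every pattern; subjective ones only indicators."""
    if not sentence_is_subjective:
        return list(all_patterns)
    return [p for p in all_patterns if is_indicator(p)]


# Made-up statistics, for illustration only.
patterns = [
    PatternStats("murder of <np>", p_relevant=0.90, frequency=40),
    PatternStats("attacked <dobj>", p_relevant=0.55, frequency=120),
    PatternStats("<subj> was assassinated", p_relevant=0.80, frequency=6),
]

for p in patterns_for_sentence(patterns, sentence_is_subjective=True):
    print(p.pattern)
```

With these invented numbers only "murder of <np>" survives for a subjective sentence: the second pattern is too weakly tied to relevant texts and the third is too rare.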
Results for Selective Subjectivity Filtering
System Rec Prec F #Correct #Wrong
IE .52 .42 .47 266 367
IE+SubjFilter .44 .44 .44 218 (-48) 273 (-94)
IE+SubjFilter2 .46 .44 .45 231 (-35) 289 (-78)
IE+SF2+Slct .51 .45 .48 258 (-8) 311 (-56)
Removing Subjective Extraction Patterns
• Example:
“…to destroy the building.”
“…to destroy the process of reconciliation.”
• Use subjectivity analysis to remove subjective patterns.
• We classified a pattern as subjective if:
1) P(subjective | pattern) > .50 and
2) frequency ≥ 10
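A small Python sketch of this pattern filter. It assumes that P(subjective | pattern) is estimated as the fraction of a pattern's occurrences that fall in sentences labeled subjective, and that the frequency threshold is a lower bound. The function names and the counts in the example are hypothetical.

```python
def is_subjective_pattern(subjective_hits, total_hits, min_p=0.50, min_freq=10):
    """Classify a pattern as subjective from its occurrence counts.

    subjective_hits: occurrences inside sentences labeled subjective
    total_hits:      all occurrences of the pattern
    """
    if total_hits < min_freq:
        return False  # too rare to judge reliably
    return subjective_hits / total_hits > min_p


def remove_subjective_patterns(counts):
    """Keep only the patterns that are not classified as subjective.

    counts maps pattern -> (subjective_hits, total_hits).
    """
    return [
        pattern
        for pattern, (subj, total) in counts.items()
        if not is_subjective_pattern(subj, total)
    ]


# Made-up counts, for illustration only.
counts = {
    "to destroy <dobj>": (14, 20),
    "murder of <np>": (5, 40),
    "dialogue with <np>": (4, 5),
}
print(remove_subjective_patterns(counts))
```

In this invented example "to destroy <dobj>" is dropped, while "dialogue with <np>" is kept only because it is too infrequent to meet the second condition.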
Final Results
System Rec Prec F #Correct #Wrong
IE .52 .42 .47 266 367
IE+SubjFilter .44 .44 .44 218 (-48) 273 (-94)
IE+SubjFilter2 .46 .44 .45 231 (-35) 289 (-78)
IE+SF2+Slct .51 .45 .48 258 (-8) 311 (-56)
IE+SF2+Slct-SubjEPs .51 .46 .48 258 (-8) 305 (-62)
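The precision column can be recomputed from the #Correct and #Wrong counts as correct / (correct + wrong). The Python sketch below does this for the five systems in the table. Recall is not recomputed because the number of answer-key items is not given on the slide.

```python
# (#Correct, #Wrong) for each system, copied from the results table.
results = {
    "IE": (266, 367),
    "IE+SubjFilter": (218, 273),
    "IE+SubjFilter2": (231, 289),
    "IE+SF2+Slct": (258, 311),
    "IE+SF2+Slct-SubjEPs": (258, 305),
}


def precision(correct, wrong):
    """Fraction of extractions that were correct."""
    return correct / (correct + wrong)


for system, (correct, wrong) in results.items():
    print(f"{system:20s} precision = {precision(correct, wrong):.2f}")
```

The computed values round to .42, .44, .44, .45, and .46, which agrees with the Prec column.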
Subjectivity Filtering Combined with Topic Classification
System Rec Prec
IE .52 .42
IE w/Perfect TC .52 .53
IE w/Perfect TC + SubjFilter .51 .56
Conclusions
• Subjectivity filtering strategies improved IE precision with minimal recall loss.
• The benefits of subjectivity classification are synergistic with those of topic classification.
• As subjectivity classification improves, we expect corresponding improvements to IE.
IE Evaluation
Performed at the extraction level, before template generation
[Diagram: Standard IE System: texts → Slot Extraction Component → extracts → Template Generation Component]
• We defined an indicator pattern as a pattern that has the following AutoSlog-TS statistics:
P(relevant | pattern) ≥ 0.65 and Frequency ≥ 10
• Using only the indicator patterns for IE is not sufficient.
System Rec Prec F
IE .52 .42 .47
IE (Indicators Only) .40 .54 .46
IE System
• We used AutoSlog-TS to generate extraction patterns.
– 40,553 distinct patterns were learned
• We manually reviewed the top patterns (2808 patterns).
• The final system used 397 patterns.
Examples of Filtered Extractions
• The demonstrators, convoked by the solidarity with Latin America Committee, verbally attacked Salvadoran President Alfredo Cristiani and have asked the Spanish government to offer itself as a mediator to promote an end to the armed conflict.
PATTERN: attacked <dobj>
VICTIM: “Salvadoran President Alfredo Cristiani”
Examples of Filtered Extractions
• The crime was directed at hindering the development of the electoral process and destroying the reconciliation process…
PATTERN: destroying <dobj>
TARGET: “the reconciliation process”
• Presidents, political and social figures of the continent have said that the solution is not based on the destruction of a native plant but in active fight against drug consumption.
PATTERN: destruction of <np>
TARGET: “a native plant”
Breakdown by Extraction Type
Category   Baseline       SubjFilter
           Rec    Prec    Rec    Prec
Perp       .47    .33     .45    .38
Victim     .51    .50     .50    .52
Target     .63    .42     .62    .47
Weapon     .45    .39     .43    .42
Total      .52    .42     .51    .46
Subjective Patterns
The following extraction patterns were classified as subjective:
attacks on <np>          to attack <dobj>
communique by <np>       to destroy <dobj>
<subj> was linked        leaders of <np>
<subj> unleashed         was aimed at <np>
offensive against <np>   dialogue with <np>
Metaphor
• False hits can come from subjective sentences that contain metaphorical language.
“The Parliament exploded into fury against the government when word leaked out…”