
LREC, Malta, May 20th, 2010
Annotation Scheme and Gold Standard for
Dutch sentiment-bearing Adjectives
Isa Maks and Piek Vossen
Faculty of Arts, VU University Amsterdam

Overview of Presentation
• Annotation scheme for subjectivity and sentiment annotation of (Dutch) Adjectives
• Composition of the Gold Standard
• Results of Human Annotation Task
• Conclusions and Future Work

Sentiment Lexicon
• tools for automatic lexicon building
• systems for rich automatic sentiment analysis and opinion mining
• Gold Standard to evaluate automatically built sentiment lexicons
• Guidelines for sentiment and subjectivity annotation
• Sentiment Lexicon: morphology, morpho-syntax, semantics, usage, etc., plus sentiment and subjectivity information
• Dutch Wordnet (synsets)
• Dutch Reference Lexicon (lexical units)

Existing Annotation Schemata
• ‘prior’ polarity: positive (good), negative (ugly), neutral (direct), posneg (curious)
• ‘prior’ subjectivity: subjective or objective, e.g. alarm [emotion] vs. alarm [device]
• annotation at word sense (or synset) level
Wiebe et al. (2004, 2006), Su et al. (2008)
• new: Attitude Holder

Attitude Holder (examples)
• ...there are reports from inside Gaza (AE) that criticize (NEG) Hamas (TOPIC)
• ...the dominant media (AE) vilify (NEG) Hamas (TOPIC) and …. (SW)
• Bush (AE) is angry (NEG) about Obama’s behaviour (TOPIC) ...
• Bush is bad (NEG) for the economy ... (SW)
SW = Speaker or Writer; AE = Agent or Experiencer

Values for Attitude Holder Annotation
• SW: speaker’s or writer’s attitude (bad, ugly, beautiful)
• AE: agent’s or experiencer’s attitude (angry, bent on)
• no specific attitude holder (waterproof, rainy, biological)

Lexical Unit | Subjectivity | Attitude Holder | Polarity | Semantic Category | Illustration
burgerlijk (civil) | obj | - | ntr | n.r. | Burgerlijk huwelijk (civil marriage)
burgerlijk (narrow-minded) | subj | SW | neg | judgment (moral) | Zijn buren zijn vreselijk burgerlijk (his neighbours are terribly narrow-minded)
wreed (cruel) | subj | SW | neg | judgment (moral) | Een wrede despoot (a cruel tyrant)
wreed (fantastic, cool) | subj | SW | pos | appreciation | Ze rijden daar in vet wrede auto’s rond (they drive around in really cool cars)
gelukkig (happy, satisfied) | subj | AE | pos | emotion | Bos is gelukkig met Zalms keuze (Bos is happy with Zalm’s choice)
bijziend (myopic) | obj | - | neg | descriptive | Zo’n 30% van de bevolking is bijziend (30% of the population is myopic)
afkerig (averse) | subj | AE | neg | appreciation | Hij is afkerig van geld (he is averse to money)
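The sense-level annotations on this slide can be pictured as one record per lexical unit. The following is a minimal sketch under that assumption; the class names, field names and helper functions are hypothetical and are not taken from the authors' data format. Only the values in the records come from the slide.

```python
from dataclasses import dataclass
from enum import Enum


class Subjectivity(Enum):
    SUBJ = "subj"
    OBJ = "obj"


class AttitudeHolder(Enum):
    SW = "SW"    # speaker or writer
    AE = "AE"    # agent or experiencer
    NONE = "-"   # no specific attitude holder


class Polarity(Enum):
    POS = "pos"
    NEG = "neg"
    NTR = "ntr"
    POSNEG = "posneg"


@dataclass(frozen=True)
class SenseAnnotation:
    """One annotated lexical unit (word sense); hypothetical layout."""
    lemma: str
    gloss: str
    subjectivity: Subjectivity
    attitude_holder: AttitudeHolder
    polarity: Polarity
    category: str


# Values copied from the slide's example table.
ENTRIES = [
    SenseAnnotation("burgerlijk", "civil", Subjectivity.OBJ,
                    AttitudeHolder.NONE, Polarity.NTR, "n.r."),
    SenseAnnotation("burgerlijk", "narrow-minded", Subjectivity.SUBJ,
                    AttitudeHolder.SW, Polarity.NEG, "judgment (moral)"),
    SenseAnnotation("wreed", "cruel", Subjectivity.SUBJ,
                    AttitudeHolder.SW, Polarity.NEG, "judgment (moral)"),
    SenseAnnotation("wreed", "fantastic, cool", Subjectivity.SUBJ,
                    AttitudeHolder.SW, Polarity.POS, "appreciation"),
    SenseAnnotation("gelukkig", "happy, satisfied", Subjectivity.SUBJ,
                    AttitudeHolder.AE, Polarity.POS, "emotion"),
    SenseAnnotation("bijziend", "myopic", Subjectivity.OBJ,
                    AttitudeHolder.NONE, Polarity.NEG, "descriptive"),
    SenseAnnotation("afkerig", "averse", Subjectivity.SUBJ,
                    AttitudeHolder.AE, Polarity.NEG, "appreciation"),
]


def senses_of(lemma):
    """Return all annotated senses of a word."""
    return [e for e in ENTRIES if e.lemma == lemma]


def is_polarity_ambiguous(lemma):
    """True if the word's senses carry more than one polarity value."""
    return len({e.polarity for e in senses_of(lemma)}) > 1


def is_subjectivity_ambiguous(lemma):
    """True if the word has both subjective and objective senses."""
    return len({e.subjectivity for e in senses_of(lemma)}) > 1


if __name__ == "__main__":
    for lemma in ("burgerlijk", "wreed", "gelukkig"):
        print(lemma, is_subjectivity_ambiguous(lemma), is_polarity_ambiguous(lemma))
```

In this sketch burgerlijk is ambiguous in both subjectivity and polarity, while wreed is ambiguous in polarity only, which is why a word-level label would lose information for both.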

Polarity, Subjectivity and Attitude Holder
Subj = subjectivity; Obj = objectivity; AE = Agent/Experiencer; SW = Speaker/Writer; No-AH = no attitude holder
Examples shown in the figure:
• a cautious estimate
• W.B. is happy with the choice for ...
• Bush is angry over Obama's leaking of private conversation ...
• They drive around in beautiful cars
• Bush is bad for the economy ...
• water resistant watches
• deaf man
• civil marriage

Gold Standard Annotation Schema
Summarizing:
• annotation at word sense level (instead of word level), because words may be ambiguous in subjectivity or polarity
• annotation of subjectivity (objective vs. subjective) and polarity (positive, negative, posneg, neutral)
• annotation of attitude holder (whose opinion: speaker/writer or agent/experiencer)
Question: How reliable is human annotation with a complex schema for subjectivity annotation?

Data Set Gold Standard
Requirements:
• representative of the whole lexicon
• relevant to automatic annotation of subjectivity
  – inclusion of subjective and objective lexical items
  – equal distribution of items across the lexicon with regard to frequency, polysemy and synset size
English:
• General Inquirer (Stone, 1966), Hatzivassiloglou et al. (1997), Riloff and Wiebe (2005), Jijkoun et al. (2008)
• Micro-WNOp (Cerini et al., 2007), Su et al. (2008)

Composition Gold Standard ADJECTIVES
band | frequency | polysemy | synset size
high | 179 | 129 | 202
mid | 164 | 239 | 256
low | 266 | 241 | 151
total | 609 | 609 | 609
3 variants:
• 609 lexical units
• 512 synsets
• 390 words
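As a quick consistency check, the counts per band can be summed for each dimension and turned into shares. This is only an illustrative sketch: the dictionary layout and the function name `band_shares` are assumptions, and only the numbers come from the slide.

```python
# Counts per band for each lexicon dimension, copied from the slide.
STRATA = {
    "frequency": {"high": 179, "mid": 164, "low": 266},
    "polysemy": {"high": 129, "mid": 239, "low": 241},
    "synset size": {"high": 202, "mid": 256, "low": 151},
}
TOTAL_LEXICAL_UNITS = 609


def band_shares(counts):
    """Return each band's share of the dimension total, in percent."""
    total = sum(counts.values())
    return {band: round(100 * n / total, 1) for band, n in counts.items()}


if __name__ == "__main__":
    for dimension, counts in STRATA.items():
        # Every dimension partitions the same 609 lexical units.
        assert sum(counts.values()) == TOTAL_LEXICAL_UNITS
        print(dimension, band_shares(counts))
```

Each dimension partitions the same 609 lexical units, so the three column totals must agree; the shares show how far each dimension is from a perfectly even three-way split.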

Inter-annotator results
Overall agreement for 2 annotators (single-category kappa computation):
annotation | agreement | kappa
polarity | 86.3% | κ=0.80
attitude holder | 87% | κ=0.73
both polarity and attitude holder | 79% | κ=0.73
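For readers unfamiliar with the two measures reported here, the sketch below computes percentage agreement and plain Cohen's kappa for two annotators. It is an assumption that this matches the slide's "single-category kappa computation", whose exact definition is not given; the function names and the toy labels are hypothetical and do not come from the gold standard.

```python
from collections import Counter


def observed_agreement(labels_a, labels_b):
    """Fraction of items on which both annotators chose the same label."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("label lists must be non-empty and of equal length")
    matches = sum(a == b for a, b in zip(labels_a, labels_b))
    return matches / len(labels_a)


def cohen_kappa(labels_a, labels_b):
    """Cohen's kappa: agreement corrected for chance agreement."""
    n = len(labels_a)
    p_o = observed_agreement(labels_a, labels_b)
    counts_a, counts_b = Counter(labels_a), Counter(labels_b)
    # Chance agreement from each annotator's own label distribution.
    p_e = sum(counts_a[label] * counts_b[label]
              for label in set(counts_a) | set(counts_b)) / (n * n)
    if p_e == 1.0:
        return 1.0  # both annotators used one identical label throughout
    return (p_o - p_e) / (1 - p_e)


# Toy data, invented for illustration only.
TOY_A = ["pos", "pos", "neg", "neg"]
TOY_B = ["pos", "neg", "neg", "neg"]

if __name__ == "__main__":
    print(observed_agreement(TOY_A, TOY_B))  # 3 of 4 items match
    print(cohen_kappa(TOY_A, TOY_B))
```

Kappa is lower than raw agreement whenever some agreement is expected by chance, which is why the slide reports both figures.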

Comparison with Other Studies
Annotation levels of the compared studies: word sense level, word level, word level, synset level

Analysis of Disagreements
• OBJ-neg (0.34) vs. SW-neg and OBJ-pos (0.23) vs. SW-pos: kaalhoofdig (bald-headed), oud (old: having lived for a long time), doofstom (deaf-mute), droog (dry), langzaam (slow), zuiver (pure), etc.
• AE-pos vs. SW-neg: belust (bent on), as in hij is belust op geld (he is bent on money)
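A disagreement analysis of this kind can be sketched by counting which pairs of combined labels the two annotators confuse. The function names and the toy labels below are hypothetical, and the sketch does not reproduce the figures in parentheses on the slide, whose exact definition is not stated.

```python
from collections import Counter


def disagreement_pairs(labels_a, labels_b):
    """Count unordered label pairs for items the annotators disagree on."""
    if len(labels_a) != len(labels_b):
        raise ValueError("label lists must have equal length")
    pairs = Counter()
    for a, b in zip(labels_a, labels_b):
        if a != b:
            # Sort so that (x, y) and (y, x) count as the same confusion.
            pairs[tuple(sorted((a, b)))] += 1
    return pairs


def pair_shares(pairs):
    """Share of each confusion pair among all disagreements."""
    total = sum(pairs.values())
    return {pair: n / total for pair, n in pairs.items()} if total else {}


# Toy data, invented for illustration only.
TOY_A = ["OBJ-neg", "SW-neg", "OBJ-pos", "AE-pos", "SW-pos"]
TOY_B = ["SW-neg", "SW-neg", "SW-pos", "SW-neg", "SW-pos"]

if __name__ == "__main__":
    for pair, count in disagreement_pairs(TOY_A, TOY_B).most_common():
        print(pair, count)
```

Sorting each pair makes the count independent of which annotator chose which label, so OBJ-neg vs. SW-neg is tallied once regardless of direction.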

Human annotations across various lexicon dimensions
• agreement decreases when word frequency increases
• agreement decreases when polysemy increases
• agreement increases when the item is a member of a large synset
(65%)

Conclusions
We designed a new annotation scheme for polarity, subjectivity and attitude holder annotation and showed that all substantial categories can be reliably annotated by human annotators. We assume that this holds for automatic annotation as well.
We aimed at an equal distribution of test items across 3 lexicon dimensions (word frequency, large synset membership and polysemy) relevant to subjectivity and polarity identification; we measured correlations between polarity annotation and each of these lexicon dimensions.
Future Work
• Development of similar annotation schemes and gold standards for nouns and verbs
• Use of the gold standard to test methods and techniques to build a sentiment lexicon for Dutch

Acknowledgements
• The research is part of the project From Text To Political Positions (http://www2.let.vu.nl/oz/cltl/t2pp)
• Funded by the Interfaculty Research Institute CAMeRA - VU University Amsterdam
• Gold standard data available at (http://www2.let.vu.nl/oz/cltl/t2pp)

Thank you for your attention

Subjectivity | Attitude Holder | Polarity | Semantic Category | Lexical Unit | Illustration
obj | - | ntr | - | burgerlijk (civil) | Burgerlijk huwelijk (civil marriage)
obj | - | neg | descriptive | bijziend (myopic) | Zo’n 30% van de bevolking is bijziend (30% of the population is myopic)
subj | SW | neg | judgment (moral) | wreed (cruel) | Een wrede despoot (a cruel tyrant)
subj | SW | pos | appreciation | wreed (fantastic) | Ze rijden daar in vet wrede auto’s rond (they drive around in cool cars)
subj | SW | neg | judgment (moral) | burgerlijk (narrow-minded) | Zijn buren zijn vreselijk burgerlijk (his neighbours are terribly narrow-minded)
subj | AE | pos | emotion | gelukkig (happy) | Bos is gelukkig met Zalms keuze (Bos is happy with Zalm’s choice)
subj | AE | neg | appreciation | afkerig (averse) | Hij is afkerig van geld (he is averse to money)

Attitude holder: CDA-lijsttrekker (CDA party leader)
Polarity: negative
Topic: linkse coalitie (left-wing coalition)

study | polarity | Subj vs. Obj | attitude holder
This study | 86% (κ=0.80) | - | 87% (κ=0.73)
Jijkoun et al. (2008) | 79% (κ=0.66) | - | -
Andreevskaia et al. (2006) | 79% | - | -
Su et al. (2009) | 89% (κ=0.83) | 90% (κ=0.79) | -

What is an Opinion or Attitude (Kim and Hovy, 2006)
(1) Bush is bad for the economy (judgment)
(2) Bush is angry about Obama’s behaviour (emotion -> judgment)
aspect | sentence 1 | sentence 2
attitude holder | Speaker/writer | Bush
polarity | negative | negative
topic | Bush (for the economy) | Obama’s behaviour

Gold standard distribution
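Charts: polarity; attitude holder; polarity and attitude holder
Attitude holder: AE 8%, SW 63%, OBJ 29%
Polarity and attitude holder, categories in legend order: AE-neg, AE-pos, SW-neg, SW-pos, SW-pn, OBJ-ntr, OBJ-neg, OBJ-pos, OBJ-pn
Chart values in extracted order: 5%, 2%, 30%, 29%, 6%, 21%, 2%, 4%, 1%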
