Semi-automatic Building Method for a Multidimensional Affect Dictionary for a New Language

Guillaume Pitel, Gregory Grefenstette

LREC2008

Manually Built Resources

• Defining Semantic Dimensions of Affect


• Creating seed words
– L1: For each dimension, select 2 to 4 words. Total 229 seed words.
– L2: Extended L1 to an average of 10 words per class. Total 881 seed words.


• Creating gold standard
– L3: Built using a synonym dictionary(*), with some words manually deleted by a human annotator.
– Total 4980 word-to-class relations (3513 distinct words; a word can belong to more than one class).
– L2 was included, leaving 2632 words for evaluation.

Classifying affect words along their dimensions

• SL-dLSA+SVM
• Semantic Likeliness from diversified LSA and SVM.
• δ ∈ [1..10, 15, 20, 25, 30]: window size.
• Considered the windows [0, +δ], [−δ, +δ], [−δ, 0].
• For each word, each window creates a 300-dimension LSA vector.
• Total 12600 dimensions.
– Raw cooccurrence matrices would have totaled some 5.3 million dimensions.
– A 44-class SVM classifier was trained.
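The 12600 figure follows from the counts listed above: 14 window sizes, 3 window shapes per size, and 300 LSA dimensions per window. A minimal sketch of that arithmetic, where the names `DELTAS`, `window_shapes`, and `feature_layout` are hypothetical and only illustrate how the concatenated feature space could be laid out:

```python
# Window sizes listed on the slide: 1..10, then 15, 20, 25, 30.
DELTAS = list(range(1, 11)) + [15, 20, 25, 30]

# Dimensions kept per LSA space, as stated on the slide.
LSA_DIMS = 300


def window_shapes(delta):
    """The three windows as (left, right) offsets around the target word."""
    return [(0, delta), (-delta, delta), (-delta, 0)]


def feature_layout():
    """One (delta, left, right) entry per LSA space (hypothetical layout)."""
    return [(d, left, right) for d in DELTAS for (left, right) in window_shapes(d)]


def total_dimensions():
    """Size of the concatenated vector fed to the classifier."""
    return len(feature_layout()) * LSA_DIMS


if __name__ == "__main__":
    print(len(feature_layout()), "LSA spaces,", total_dimensions(), "dimensions")
```

Each word would then be represented by concatenating its 42 LSA vectors, which is far smaller than the raw cooccurrence space mentioned above.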

Scores of the SL-dLSA+SVM 44 class classifier

• Trained on L1
• Trained on L2

Scores of the SL-dLSA+SVM 44 class classifier

• Classification of the word “désagrément” (annoyance, unpleasantness) using SL-dLSA+SVM with L2

• Classification of the word “disgrâce” (disgrace, disfavour) using SL-dLSA+SVM with L2

Classifying with SL-PMI measure

• Semantic Orientation Pointwise Mutual Information (Turney and Littman, 2002)
– The SO-PMI measure is intended to evaluate the positiveness/negativeness of a given word.
– They adapt SO-PMI to a likeliness measure.

Classifying with SL-PMI measure

• SL-PMI_C (Semantic Likeliness Pointwise Mutual Information from Information Retrieval for class C)

• H_δ(w1, w2) is the number of cooccurrences of words w1 and w2 in a δ-word window.
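The slide does not reproduce the SL-PMI formula, so the following is only a sketch of how a windowed cooccurrence count like H_δ could feed a PMI-style likeliness score. The symmetric window, the smoothing constant, the normalization by corpus length, and the function names (`cooccurrences`, `pmi`, `likeliness`) are all assumptions for illustration, not the authors' definition.

```python
import math
from collections import Counter


def cooccurrences(tokens, w1, w2, delta):
    """Count position pairs (i, j) with tokens[i] == w1, tokens[j] == w2
    and 0 < |i - j| <= delta. The symmetric window is an assumption."""
    positions2 = [j for j, t in enumerate(tokens) if t == w2]
    count = 0
    for i, t in enumerate(tokens):
        if t == w1:
            count += sum(1 for j in positions2 if 0 < abs(i - j) <= delta)
    return count


def pmi(tokens, w1, w2, delta, smoothing=0.01):
    """PMI-style association from windowed counts.
    The smoothing term (hypothetical choice) avoids log(0)."""
    freq = Counter(tokens)
    n = len(tokens)
    joint = cooccurrences(tokens, w1, w2, delta) + smoothing
    return math.log2(joint * n / ((freq[w1] + smoothing) * (freq[w2] + smoothing)))


def likeliness(tokens, word, seeds, delta):
    """Sum of associations between a word and a class's seed words
    (assumed form of a likeliness score)."""
    return sum(pmi(tokens, word, seed, delta) for seed in seeds)


if __name__ == "__main__":
    toy = "joy happy joy happy sad rain sad rain".split()
    print(likeliness(toy, "happy", ["joy"], 1), likeliness(toy, "rain", ["joy"], 1))
```

A word would be assigned to the class whose seed list gives it the highest score; unlike SO-PMI, no negative seed set is subtracted in this sketch.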

Scores of the SL-PMI 44-class classifier


Classifying with SL-LSA measure

• As for the SO-PMI, the original SO-LSA measure is intended to evaluate the positiveness/negativeness of a given word.

• LSAδ(w) is the vector representing word w in an LSA space built with a δ-word window.
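The SL-LSA formula is not shown on the slide. In the original SO-LSA measure, the association between two words is the cosine of their LSA vectors; a likeliness variant could plausibly score a word against each class's seed vectors in the same way. The sketch below assumes an average-cosine score and uses toy 2-dimensional vectors; `sl_lsa`, `classify`, and the averaging itself are hypothetical choices.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def sl_lsa(word_vec, seed_vecs):
    """Average cosine between a word vector and a class's seed vectors
    (assumed form of the likeliness score)."""
    if not seed_vecs:
        return 0.0
    return sum(cosine(word_vec, s) for s in seed_vecs) / len(seed_vecs)


def classify(word_vec, class_seeds):
    """Return the class whose seeds are most similar to the word."""
    return max(class_seeds, key=lambda c: sl_lsa(word_vec, class_seeds[c]))


if __name__ == "__main__":
    # Toy vectors standing in for LSA vectors built with one window size.
    seeds = {"Pleasure": [[1.0, 0.0], [0.9, 0.1]], "Annoyance": [[0.0, 1.0]]}
    print(classify([0.8, 0.2], seeds))
```

In practice the vectors would come from an LSA space built with one window size δ, which is why the classifiers are reported per δ.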

F-scores for the SL-LSA 44-class classifiers


F-scores of the classification methods

• Using L1 as the training data.

• Using L2 as the training data.

Improvement ratios between L2 and L1 F-scores

Perspectives

• They did not evaluate the SVM classifier on simple LSA feature spaces.

• SL-LSA family of classifiers
– had similar F-scores, but their kappa agreement was very low (0.26~0.34).
– Selecting the correct answers from SL-LSA(L2,30) and SL-LSA(L2,2) would raise the F-score from 0.13 to 0.19.

• Train a SL-dLSA+SVM classifier using L3 data.
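The low kappa values reported for the SL-LSA classifiers mean that they tend to disagree on which words they get right, which is why combining them could help. The slides do not say which kappa statistic was used; the sketch below assumes Cohen's kappa between two classifiers' predictions, plus a simple oracle combination that counts a word as correct if either classifier is right. The oracle function reports accuracy, not the F-score quoted on the slide, and both function names are hypothetical.

```python
from collections import Counter


def cohen_kappa(labels_a, labels_b):
    """Cohen's kappa between two label sequences of equal length."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("label sequences must be non-empty and equal in length")
    n = len(labels_a)
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    count_a, count_b = Counter(labels_a), Counter(labels_b)
    # Chance agreement from each classifier's label distribution.
    expected = sum(count_a[c] * count_b[c] for c in count_a) / (n * n)
    if expected == 1.0:
        return 1.0
    return (observed - expected) / (1 - expected)


def oracle_accuracy(gold, pred_a, pred_b):
    """Fraction of items where at least one classifier matches the gold label."""
    hits = sum(g in (a, b) for g, a, b in zip(gold, pred_a, pred_b))
    return hits / len(gold)


if __name__ == "__main__":
    a = ["x", "x", "y", "y"]
    b = ["x", "y", "y", "y"]
    print(cohen_kappa(a, b), oracle_accuracy(["x", "y", "y", "x"], a, b))
```

Two classifiers with similar scores but low kappa make different mistakes, so the oracle figure is an upper bound on what a combination of the two could reach.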

Perspectives

• Some classes partially overlap.
– Advantage and Facilitation
– Comfort and Pleasure
– Admiration and Praise
• See page 7
