dynamicconstruconofdiconaries(( forsenmentclassiﬁcaon( · bonne((good) 1.360 0.509 posive triste(...

Dynamic construc.on of dic.onaries for sen.ment classifica.on

Hanen Ameur* Salma Jamoussi #

*,# Mul(media, InfoRma(on systems and Advanced Compu(ng Laboratory MIRACL-‐Sfax University, Sfax-‐Tunisia

IEEE Interna.onal Conference on Data Mining -‐ SENTIRE

(ICDMW 2013)

Outline

1-‐ Introduc(on

2-‐ Related work

3-‐ Proposed method

4-‐ Conclusion and future work

3.1-‐ Acquisi(on and prepara(on of the corpus

3.2-‐ Dic(onaries construc(on

3.3-‐ Comment classifica(on

3.4-‐ Evalua(on and discussion

The sen(ments have become numerically documented.

How can I analyze and classify automa.cally an opinion text based on its textual content?

Introduc.on

With the emergence of web 2.0 and the advent of community sites, hundreds of thousands of sen(ments are shared and circulated every day on the canvas.

It is very important to process the sen(ments in many fields (commercial, poli(cal and others).

Introduc.on

Goal: Determine the polarity "sen.mental orienta.on" of a text bearer sen(ment and the intensity of its polarity “valence”.

Differen(a(on between posi.ve and nega.ve sen(ment.

Sentiment classification

Linguistic approach Statistic approach Hybrid approach

Related work (Linguis.c approach)

1-‐Sen(ment lexicon construc(on.

Linguis.c approach is depicted in two essen(al tasks:

2-‐Subjec(ve sentence classifica(on. § Manual method (by experts), e.g. [1] and [2].

§ Dic(onaries-‐based method , e.g. [3] and [4]

§ Corpora based method , e.g. [5] and [6]

§ Hybrid method , e.g. [7]

§ Concept based method , e.g. [8], [9] and [10]

èCount the number of posi(ve words and the number of nega(ve words present in a sentence. [11] and [12]

Proposed method

Acquisi.on and prepara.on of the corpus

Pretreated corpus

Learning corpus

Test corpus

Facebook Pages

Dic.onaries construc.on

Comment classifica.on

Evalua.on

P N

Test corpus classified by the expert

Test corpus classified by the system

U.liza.on Enrichment

Proposed method Acquisi.on and prepara.on of the corpus

It contains comments collected from the Facebook social network.

A lack of corpus Facebook

Automa(c construc(on of corpus Facebook.

32 poli.cal pages

acquisi.on and structuring program

The raw material of a sen(ment classifica(on system.

Proposed method Acquisi.on and prepara.on of the corpus

1-‐Normaliza.on

2-‐Filtering

4-‐Lemma.za.on

5-‐Stop words removal

Lengthening

@ Ridha I am against the frauds commieed with Ben Ali **thief :((( hep://www.facebook.com/.

@ Ridha i am against the frauds commieed with ben ali thief :((( hep://www.facebook.com/.

i am against the frauds commieed with ben ali thief :(((

je suis contre les fraudes commises avec ben ali voleur :(((

je être contre le fraude commeere avec ben ali voleur :(((

contre fraude commeere ben ali voleur :(((

contre fraude comme^re ben ali voleur :(

3-‐Transla.on

Proposed method

Pretreated corpus

Learning corpus

Test corpus


Facebook Pages



Evalua.on

P N




Proposed method Dic.onaries construc.on

Goal: Generate two dic(onaries (posi.ve and nega.ve) covering the majority of the sen(ment lexicon related to our applica(on.

Learning corpus

Ini(al dic(onaries construc(on

Dic(onaries enrichment

Nega.ve Posi.ve


Ini.al dic.onaries construc.on

Positive emotion symbols

Negative emotion symbols

:) :p :-*



Principle :

@ Calculate the posi(ve and nega(ve valences of each lexicon word.

Frequency of the word m with emo(on symbols having the polarity pol, with pol={pos,neg}. The sum of the frequencies of all words present in the dic.onary of polarity pol.

Each word can be present in the two dic.onaries, but with different valences.



Principle :

@ Calculate the posi(ve and nega(ve valences of each lexicon word.

@ Compare the two posi(ve and nega(ve valences of the word.

Word Posi.ve valence Nega.ve valence

Polarity

paix (peace) 4.667 3.364 Posi.ve

bonne (good) 1.360 0.509 Posi.ve

triste (sad) 0.146 1.019 Nega.ve

guerre (war) 0.236 0.815 Nega.ve


Dic.onaries enrichment

Goals :

@ It is based on the words present in the ini.al dic.onaries whose their valences are known.

-‐ Se^le and adjust the valences of words present in the dic(onaries.

-‐ Add new words to the dic(onaries.

This gentleman deserves the respect, he is the best

Pretreatment

monsieur mériter respect meilleur

Words Frequencies Valences monsieur 14.4682 0.4608 mériter 16.1875 0.5236

meilleur 67.7028 2.3637

…

Nega.ve dic.onary Words Frequencies Valences monsieur 12.5258 0.2879 mériter 81.3735 1.8664

meilleur 241.4172 5.5568

…

Posi.ve dic.onary



Goals :





Pretreatment


1- Calculate the posi(ve and nega(ve valences of the comment.

Valences 7.3317 2.9928



Goals :





Pretreatment


2- Compare the posi(ve and nega(ve valences of the comment.

Valences 7.3317 2.9928

Calculate the percentage of the polarity of the comment.

%Pos= 0.7107



Goals :





Pretreatment


Valences 7.3317 2.9928

%Pos= 0.7107 Words Frequencies Valences monsieur 12.5258 0.2879

mériter 81.3735 1.8664

meilleur 241.4172 5.5568

…

Posi.ve dic.onary

13.2359 0.3042

82.0836 1.8866

242.1273 5.5649 respect 0.7107 0.0163

Modify the valences of the existent words

è Add nonexistent words



@ Apply the enrichment principle on 30000 comments.

Ini.al dic.onaries Enriched dic.onaries

Posi.ve Nega.ve Posi.ve Nega.ve

23705 5106 44302 46420

Ini.al dic.onaries Enriched dic.onaries

Posi.ve valence Nega.ve valence Posi.ve valence Nega.ve valence

peace 4.6702 3.3646 5.5566 2.5632

love 1.2491 1.3254 1.1789 0.5572

urgency 0.1012 0 0.0686 0.0988

educe 0 0 0 0.213

• Quan.ta.ve viewpoint

• Qualita.ve viewpoint

-‐ Adjust the valences of exis(ng words.

-‐ Correct the valences of exis(ng words.

-‐ Add words to the dic(onaries.


Handling nega.on

@ Reverse the polarity of all words preceded by one of the nega(on par(cles.

The nega.on par.cles

'ne', 'n', 'pas', 'ni', 'jamais', 'aucun', 'no‘, 'none', 'not', 'neither', 'never',

'ever',

لم', 'لن', 'ال' '

@ At the enrichment dic.onaries

Add words preceded by a nega(on par(cle to the dic(onary that corresponds to the inverse comment’s polarity.

First method :

Reverse also the frequencies of all words preceded by a nega(on par(cle, when calcula(ng the valences of the comment.

Second method :

Proposed method

Pretreated corpus

Learning corpus

Test corpus


Facebook Pages



Evalua.on

P N




Proposed method Comment classifica.on

Goal: Determine the polari.es of comments (posi(ve/nega(ve) using the dic(onaries obtained in the previous step.

We love you Mister President Obama.

Pretreatment

aimer monsieur président obama

2.8696 0.5626 4.3439 5.8631 0.2039 0.3058 2.3450 2.1411

Valences ∑ Pcom : 10.207 Ncom : 4.4861

Proposed method

Pretreated corpus

Learning corpus

Test corpus


Facebook Pages



Evalua.on

P N




Proposed method Evalua.on

The external evalua.on

@ Measure the adequacy between the results of our system classifica(on and that made by the expert.

Poli.cal dic.onaries

Ini.al Enriched

w/ nega.on w/o nega.on Method 1 Method 2

Error rate 35.02 33.75 18.75 19.83

Recall 65.08 66.39 80.76 80.76

Accuracy 66.06 66.52 81.01 80.07

F-‐score 65.57 66.45 80.89 80.42

Conclusion and future work

Realize an automa.c classifica.on system of comments derived from the social network Facebook.

Construc(on of dic.onaries covering the majority of the sen(ment lexicon from our learning corpus (based on linguis.c approach).

They are used for the calcula(on of the posi(ve and nega(ve polari.es of comments.

Propose a sta.s.c method to construct a sen(ment lexicon served for the sen(ment classifica(on.

Find a vector representa.on of words and sentences and exploit the classifica(on methods (supervised, unsupervised, and semi-‐supervised).

References

[1] Nasukawa, T., Yi, J., Sen(ment analysis: Capturing favorability using natural language processing. In: Proceedings of the Conference on Knowledge Capture (K-‐CAP). New York, NY, USA (2003)

[2] Wilson, T., Wiebe, J., Homann, P., Recognizing contextual polarity in phrase-‐level sen(ment analysis. In: Proceedings of the Human Language Technology Conference and the Conference on Empirical Methods in Natural Language Processing (HLT/EMNLP). Vancouver, CA (2005)

[3] J. Kamps and M. Marx, “Words with astude,” in 1st Interna(onal WordNet Conference, Mysore, India, 2002.

[4] M. Taboada, C. Anthony, and K. Voll, “Methods for crea(ng seman(c orienta(on dic(onaries,” in Conference on Language Resources and Evalua(on (LREC), 2006.

[5] H. Kanayama and T. Nasukawa, “Fully automa(c lexicon expansion for domain-‐oriented sen(ment analysis,” in Proceedings of the Conference on Empirical Methods in Natural Language Processing (EMNLP), Sydney, Australia, July 2006.

[6] X. Ding and B. Liu, “The u(lity of linguis(c rules in opinion mining,” in Proceedings of the 30th annual interna(onal ACM SIGIR conference on Research and development in informa(on retrieval, New York, NY, USA, 2007.

References

[7] A. Pak and P. Paroubek, “Construc(on dun lexique affec(f pour le franais par(r de twieer,” TALN 2010. B(ment 508, F-‐91405 Orsay Cedex, France: Universit de Paris-‐Sud, juillet2010. [8] E. Cambria, T. Mazzocco, and A. Hussain, “Applica(on of mul(-‐dimensional scaling and ar(ficial neural networks for biologically inspired opinion mining,” Biologically Inspired Cogni(ve Architectures, vol. 4, p. 4153, 2013. [9] M. Grassi, E. Cambria, A. Hussain, and F. Piazza, “Sen(c web: A new paradigm for managing social media affec(ve informa(on,” Cogni(ve Computa(on, vol. 3, no. 3, pp. 480– 489, 2011. [10] E. Cambria, B. Schuller, Y. Xia, and C. Havasi, “New avenues in opinion mining and sen(ment analysis,” IEEE Intelligent Systems, vol. 28, no. 2, pp. 15–21, 2013. [11] M. Hu and B. Liu, “Mining opinion features in customer reviews,” in Proceedings of AAAI, Seaele, WA, USA, 2004. [12] D. Poirier, F. Fessant, C. Bothorel, E. Guimier De Neef, and M. Boull´e, “Approches sta(s(que et linguis(que pour la classifica(on de textes d’opinion portant sur les films,” vol. E17, pp. 147–169, 2009.

Thank you :)

dynamicconstruconofdiconaries(( forsenmentclassiﬁcaon( · bonne((good) 1.360 0.509 posive triste(...

Documents