dynamicconstruconofdiconaries(( forsenmentclassificaon( · bonne((good) 1.360 0.509 posive triste(...
TRANSCRIPT
-
Dynamic construc.on of dic.onaries for sen.ment classifica.on
Hanen Ameur* Salma Jamoussi #
*,# Mul(media, InfoRma(on systems and Advanced Compu(ng Laboratory MIRACL-‐Sfax University, Sfax-‐Tunisia
IEEE Interna.onal Conference on Data Mining -‐ SENTIRE
(ICDMW 2013)
-
Outline
1-‐ Introduc(on
2-‐ Related work
3-‐ Proposed method
4-‐ Conclusion and future work
3.1-‐ Acquisi(on and prepara(on of the corpus
3.2-‐ Dic(onaries construc(on
3.3-‐ Comment classifica(on
3.4-‐ Evalua(on and discussion
-
The sen(ments have become numerically documented.
How can I analyze and classify automa.cally an opinion text based on its textual content?
Introduc.on
With the emergence of web 2.0 and the advent of community sites, hundreds of thousands of sen(ments are shared and circulated every day on the canvas.
It is very important to process the sen(ments in many fields (commercial, poli(cal and others).
-
Introduc.on
Goal: Determine the polarity "sen.mental orienta.on" of a text bearer sen(ment and the intensity of its polarity “valence”.
Differen(a(on between posi.ve and nega.ve sen(ment.
Sentiment classification
Linguistic approach Statistic approach Hybrid approach
-
Related work (Linguis.c approach)
1-‐Sen(ment lexicon construc(on.
Linguis.c approach is depicted in two essen(al tasks:
2-‐Subjec(ve sentence classifica(on. § Manual method (by experts), e.g. [1] and [2].
§ Dic(onaries-‐based method , e.g. [3] and [4]
§ Corpora based method , e.g. [5] and [6]
§ Hybrid method , e.g. [7]
§ Concept based method , e.g. [8], [9] and [10]
èCount the number of posi(ve words and the number of nega(ve words present in a sentence. [11] and [12]
-
Proposed method
Acquisi.on and prepara.on of the corpus
Pretreated corpus
Learning corpus
Test corpus
Facebook Pages
Dic.onaries construc.on
Comment classifica.on
Evalua.on
P N
Test corpus classified by the expert
Test corpus classified by the system
U.liza.on Enrichment
-
Proposed method Acquisi.on and prepara.on of the corpus
It contains comments collected from the Facebook social network.
A lack of corpus Facebook
Automa(c construc(on of corpus Facebook.
32 poli.cal pages
acquisi.on and structuring program
The raw material of a sen(ment classifica(on system.
-
Proposed method Acquisi.on and prepara.on of the corpus
1-‐Normaliza.on
2-‐Filtering
4-‐Lemma.za.on
5-‐Stop words removal
Lengthening
@ Ridha I am against the frauds commieed with Ben Ali **thief :((( hep://www.facebook.com/.
@ Ridha i am against the frauds commieed with ben ali thief :((( hep://www.facebook.com/.
i am against the frauds commieed with ben ali thief :(((
je suis contre les fraudes commises avec ben ali voleur :(((
je être contre le fraude commeere avec ben ali voleur :(((
contre fraude commeere ben ali voleur :(((
contre fraude comme^re ben ali voleur :(
3-‐Transla.on
-
Proposed method
Pretreated corpus
Learning corpus
Test corpus
Acquisi.on and prepara.on of the corpus
Facebook Pages
Dic.onaries construc.on
Comment classifica.on
Evalua.on
P N
Test corpus classified by the expert
Test corpus classified by the system
U.liza.on Enrichment
-
Proposed method Dic.onaries construc.on
Goal: Generate two dic(onaries (posi.ve and nega.ve) covering the majority of the sen(ment lexicon related to our applica(on.
Learning corpus
Ini(al dic(onaries construc(on
Dic(onaries enrichment
Nega.ve Posi.ve
-
Proposed method Dic.onaries construc.on
Ini.al dic.onaries construc.on
Positive emotion symbols
Negative emotion symbols
:) :p :-*
-
Proposed method Dic.onaries construc.on
Ini.al dic.onaries construc.on
Principle :
@ Calculate the posi(ve and nega(ve valences of each lexicon word.
Frequency of the word m with emo(on symbols having the polarity pol, with pol={pos,neg}. The sum of the frequencies of all words present in the dic.onary of polarity pol.
Each word can be present in the two dic.onaries, but with different valences.
-
Proposed method Dic.onaries construc.on
Ini.al dic.onaries construc.on
Principle :
@ Calculate the posi(ve and nega(ve valences of each lexicon word.
@ Compare the two posi(ve and nega(ve valences of the word.
Word Posi.ve valence Nega.ve valence
Polarity
paix (peace) 4.667 3.364 Posi.ve
bonne (good) 1.360 0.509 Posi.ve
triste (sad) 0.146 1.019 Nega.ve
guerre (war) 0.236 0.815 Nega.ve
-
Proposed method Dic.onaries construc.on
Dic.onaries enrichment
Goals :
@ It is based on the words present in the ini.al dic.onaries whose their valences are known.
-‐ Se^le and adjust the valences of words present in the dic(onaries.
-‐ Add new words to the dic(onaries.
This gentleman deserves the respect, he is the best
Pretreatment
monsieur mériter respect meilleur
Words Frequencies Valences monsieur 14.4682 0.4608 mériter 16.1875 0.5236
meilleur 67.7028 2.3637
…
Nega.ve dic.onary Words Frequencies Valences monsieur 12.5258 0.2879 mériter 81.3735 1.8664
meilleur 241.4172 5.5568
…
Posi.ve dic.onary
-
Proposed method Dic.onaries construc.on
Dic.onaries enrichment
Goals :
@ It is based on the words present in the ini.al dic.onaries whose their valences are known.
-‐ Se^le and adjust the valences of words present in the dic(onaries.
-‐ Add new words to the dic(onaries.
This gentleman deserves the respect, he is the best
Pretreatment
monsieur mériter respect meilleur
1- Calculate the posi(ve and nega(ve valences of the comment.
Valences 7.3317 2.9928
-
Proposed method Dic.onaries construc.on
Dic.onaries enrichment
Goals :
@ It is based on the words present in the ini.al dic.onaries whose their valences are known.
-‐ Se^le and adjust the valences of words present in the dic(onaries.
-‐ Add new words to the dic(onaries.
This gentleman deserves the respect, he is the best
Pretreatment
monsieur mériter respect meilleur
2- Compare the posi(ve and nega(ve valences of the comment.
Valences 7.3317 2.9928
Calculate the percentage of the polarity of the comment.
%Pos= 0.7107
-
Proposed method Dic.onaries construc.on
Dic.onaries enrichment
Goals :
@ It is based on the words present in the ini.al dic.onaries whose their valences are known.
-‐ Se^le and adjust the valences of words present in the dic(onaries.
-‐ Add new words to the dic(onaries.
This gentleman deserves the respect, he is the best
Pretreatment
monsieur mériter respect meilleur
Valences 7.3317 2.9928
%Pos= 0.7107 Words Frequencies Valences monsieur 12.5258 0.2879
mériter 81.3735 1.8664
meilleur 241.4172 5.5568
…
Posi.ve dic.onary
13.2359 0.3042
82.0836 1.8866
242.1273 5.5649 respect 0.7107 0.0163
Modify the valences of the existent words
è Add nonexistent words
-
Proposed method Dic.onaries construc.on
Dic.onaries enrichment
@ Apply the enrichment principle on 30000 comments.
Ini.al dic.onaries Enriched dic.onaries
Posi.ve Nega.ve Posi.ve Nega.ve
23705 5106 44302 46420
Ini.al dic.onaries Enriched dic.onaries
Posi.ve valence Nega.ve valence Posi.ve valence Nega.ve valence
peace 4.6702 3.3646 5.5566 2.5632
love 1.2491 1.3254 1.1789 0.5572
urgency 0.1012 0 0.0686 0.0988
educe 0 0 0 0.213
• Quan.ta.ve viewpoint
• Qualita.ve viewpoint
-‐ Adjust the valences of exis(ng words.
-‐ Correct the valences of exis(ng words.
-‐ Add words to the dic(onaries.
-
Proposed method Dic.onaries construc.on
Handling nega.on
@ Reverse the polarity of all words preceded by one of the nega(on par(cles.
The nega.on par.cles
'ne', 'n', 'pas', 'ni', 'jamais', 'aucun', 'no‘, 'none', 'not', 'neither', 'never',
'ever',
لم', 'لن', 'ال' '
@ At the enrichment dic.onaries
Add words preceded by a nega(on par(cle to the dic(onary that corresponds to the inverse comment’s polarity.
First method :
Reverse also the frequencies of all words preceded by a nega(on par(cle, when calcula(ng the valences of the comment.
Second method :
-
Proposed method
Pretreated corpus
Learning corpus
Test corpus
Acquisi.on and prepara.on of the corpus
Facebook Pages
Dic.onaries construc.on
Comment classifica.on
Evalua.on
P N
Test corpus classified by the expert
Test corpus classified by the system
U.liza.on Enrichment
-
Proposed method Comment classifica.on
Goal: Determine the polari.es of comments (posi(ve/nega(ve) using the dic(onaries obtained in the previous step.
We love you Mister President Obama.
Pretreatment
aimer monsieur président obama
2.8696 0.5626 4.3439 5.8631 0.2039 0.3058 2.3450 2.1411
Valences ∑ Pcom : 10.207 Ncom : 4.4861
-
Proposed method
Pretreated corpus
Learning corpus
Test corpus
Acquisi.on and prepara.on of the corpus
Facebook Pages
Dic.onaries construc.on
Comment classifica.on
Evalua.on
P N
Test corpus classified by the expert
Test corpus classified by the system
U.liza.on Enrichment
-
Proposed method Evalua.on
The external evalua.on
@ Measure the adequacy between the results of our system classifica(on and that made by the expert.
Poli.cal dic.onaries
Ini.al Enriched
w/ nega.on w/o nega.on Method 1 Method 2
Error rate 35.02 33.75 18.75 19.83
Recall 65.08 66.39 80.76 80.76
Accuracy 66.06 66.52 81.01 80.07
F-‐score 65.57 66.45 80.89 80.42
-
Conclusion and future work
Realize an automa.c classifica.on system of comments derived from the social network Facebook.
Construc(on of dic.onaries covering the majority of the sen(ment lexicon from our learning corpus (based on linguis.c approach).
They are used for the calcula(on of the posi(ve and nega(ve polari.es of comments.
Propose a sta.s.c method to construct a sen(ment lexicon served for the sen(ment classifica(on.
Find a vector representa.on of words and sentences and exploit the classifica(on methods (supervised, unsupervised, and semi-‐supervised).
-
References
[1] Nasukawa, T., Yi, J., Sen(ment analysis: Capturing favorability using natural language processing. In: Proceedings of the Conference on Knowledge Capture (K-‐CAP). New York, NY, USA (2003)
[2] Wilson, T., Wiebe, J., Homann, P., Recognizing contextual polarity in phrase-‐level sen(ment analysis. In: Proceedings of the Human Language Technology Conference and the Conference on Empirical Methods in Natural Language Processing (HLT/EMNLP). Vancouver, CA (2005)
[3] J. Kamps and M. Marx, “Words with astude,” in 1st Interna(onal WordNet Conference, Mysore, India, 2002.
[4] M. Taboada, C. Anthony, and K. Voll, “Methods for crea(ng seman(c orienta(on dic(onaries,” in Conference on Language Resources and Evalua(on (LREC), 2006.
[5] H. Kanayama and T. Nasukawa, “Fully automa(c lexicon expansion for domain-‐oriented sen(ment analysis,” in Proceedings of the Conference on Empirical Methods in Natural Language Processing (EMNLP), Sydney, Australia, July 2006.
[6] X. Ding and B. Liu, “The u(lity of linguis(c rules in opinion mining,” in Proceedings of the 30th annual interna(onal ACM SIGIR conference on Research and development in informa(on retrieval, New York, NY, USA, 2007.
-
References
[7] A. Pak and P. Paroubek, “Construc(on dun lexique affec(f pour le franais par(r de twieer,” TALN 2010. B(ment 508, F-‐91405 Orsay Cedex, France: Universit de Paris-‐Sud, juillet2010. [8] E. Cambria, T. Mazzocco, and A. Hussain, “Applica(on of mul(-‐dimensional scaling and ar(ficial neural networks for biologically inspired opinion mining,” Biologically Inspired Cogni(ve Architectures, vol. 4, p. 4153, 2013. [9] M. Grassi, E. Cambria, A. Hussain, and F. Piazza, “Sen(c web: A new paradigm for managing social media affec(ve informa(on,” Cogni(ve Computa(on, vol. 3, no. 3, pp. 480– 489, 2011. [10] E. Cambria, B. Schuller, Y. Xia, and C. Havasi, “New avenues in opinion mining and sen(ment analysis,” IEEE Intelligent Systems, vol. 28, no. 2, pp. 15–21, 2013. [11] M. Hu and B. Liu, “Mining opinion features in customer reviews,” in Proceedings of AAAI, Seaele, WA, USA, 2004. [12] D. Poirier, F. Fessant, C. Bothorel, E. Guimier De Neef, and M. Boull´e, “Approches sta(s(que et linguis(que pour la classifica(on de textes d’opinion portant sur les films,” vol. E17, pp. 147–169, 2009.
-
Thank you :)