
Korean Twitter Emotion Classification Using Automatically Built Emotion Lexicons and Fine-Grained Features

Hyo Jin Do and Ho-Jin Choi
School of Computing

Korea Advanced Institute of Science and Technology
Daejeon, Republic of Korea

{hjdo, hojinc}@kaist.ac.kr

Abstract

In recent years many people have begun to express their thoughts and opinions on Twitter. Naturally, Twitter has become an effective source to investigate people's emotions for numerous applications. Classifying only positive and negative tweets has been exploited in depth, whereas analyzing finer emotions is still a difficult task. More elaborate emotion lexicons should be developed to deal with this problem, but existing lexicon sets are mostly in English. Moreover, building such lexicons is known to be extremely labor-intensive or resource-intensive. Finer-grained features need to be taken into account when determining finer emotions, but many existing works still utilize coarse features that have been widely used in analyzing only the polarity of emotion. In this paper, we present a method to automatically build fine-grained emotion lexicon sets and suggest features that improve the performance of machine learning based emotion classification of Korean Twitter texts.

1 Introduction

Nowadays, people freely express their thoughts on microblogs, and Twitter is known to be one of the most popular such services. In 2014, 500 million tweets were sent per day by 316 million monthly active users across the globe [1]. Not surprisingly, Twitter has been actively mined in the field of computer science to investigate public opinion (Diakopoulos and Shamma, 2010; Kim et al., 2014; O'Connor et al.,

[1] https://about.twitter.com/company

2010), get real-time information (Doan et al., 2012), and even forecast future events (Bollen et al., 2011). All such research shows Twitter's potential in the analysis of human thought and behavior. In particular, researchers are showing interest in the analysis of human emotions presented in Twitter messages. Many studies have been done to classify sentiments (positive and negative) in tweets. Going further, researchers are currently trying to analyze fine-grained emotions beyond polarity. Fine-grained emotion analysis is known to be more challenging than sentiment analysis because it must identify subtle differences between emotions. Dealing with emotions in an individual Twitter post is even more difficult because of its short length and the frequent use of informal words and erroneous sentence structures. Elaborate emotion lexicons should be used to deal with the problem, but non-English speaking countries have difficulties using existing lexicon sets because they are mostly in English. Further, building such lexicons is known to be extremely labor-intensive or resource-intensive, which can be a burden to under-resourced countries. Moreover, a set of features that achieves the best performance in fine-grained emotion classification, particularly attuned to tweets written in a specific language, has yet to be explored.

Our goal in this paper is to classify Korean Twitter messages into fine-grained emotions. The emotion types are Ekman's six basic emotions (Ekman, 1992), which are known to be the most frequently used in the field of computer science for emotion mining and classification (Bann and Bryson, 2012). For this goal, we employed machine learning algorithms

PACLIC 29

29th Pacific Asia Conference on Language, Information and Computation: Posters, pages 142-150

Shanghai, China, October 30 - November 1, 2015. Copyright 2015 by Hyo Jin Do and Ho-Jin Choi


with fine-grained features including an emotion lexicon feature. Specifically, we addressed the following problems:

1. Emotion lexicon construction. Is there any simple and automatic method to generate emotion lexicons particularly attuned to the Twitter domain without using other lexical resources?

2. Feature engineering. What is the best set of features that can effectively show the subtle distinctions between finer-grained emotions expressed in Korean Twitter texts?

We propose an emotion lexicon construction method and features to address the problems above. Our main contributions are the following:

1. Emotion lexicon construction. We propose the weighted tweet frequency (weighted TwF) method, a simple and automatic way to build emotion lexicon lists directly from an annotated corpus without using other resources. The method will be useful for many countries where relevant resources are not available.

2. Feature engineering. We propose a set of fine-grained and language-specific features that improves the overall performance of machine learning based emotion classification of Korean Twitter texts.

3. Resource and dataset. Our study is unique because emotion analysis on Korean Twitter texts has rarely been addressed before. In addition, we built an annotated dataset, emotion lexicon sets, and other resources. Since finding related datasets and resources in Korean is difficult, we believe our work can contribute to future related studies.

The rest of this paper is organized as follows. Section 2 overviews related work, and in Section 3, we introduce our annotated dataset. In Section 4, we present our emotion lexicon construction method, and in Section 5, we describe the features designed for classification. We provide experimental results and analysis in Section 6 and conclude in Section 7.

2 Related Work

There have been extensive studies on sentiment analysis that classify expressions of sentiment into positive and negative emotions. In the last few years, researchers have started to explore finer granularity of emotion because a simple division of polarity may not suffice in many real-world applications. There are two main approaches to emotion analysis: a lexicon based approach and a machine learning based approach. The lexicon based approach utilizes a dictionary of words annotated with their emotional orientation, and simply counts the words or aggregates the corresponding values present in texts. In contrast, the machine learning based approach performs classification using machine learning algorithms based on carefully designed features. Roberts et al. (2012) tried to classify seven emotions in the Twitter domain with binary support vector machine (SVM) classifiers, and Balabantaray et al. (2012) also used SVM classifiers with features including WordNet-Affect emotion lexicons.

A large number of existing emotion lexicon sets were built manually, such as WordNet-Affect (Strapparava and Valitutti, 2004) and Linguistic Inquiry and Word Count (Pennebaker et al., 2001). Crowd-sourcing is often utilized to obtain a large volume of human annotated lexicon sets, such as the NRC Word-Emotion Association Lexicon (Mohammad and Turney, 2013). Non-English speaking countries like Korea have difficulties building emotion lexicons without human labor because existing lexicons and crowd-sourcing platforms are mostly available in English. To deal with the difficulties, one popular approach is to build lexicons upon other resources. For example, AffectNet (Cambria and Hussain, 2012) was constructed using ConceptNet (Liu and Singh, 2004) and WordNet-Affect (Strapparava and Valitutti, 2004). Another popular choice for building lexicon sets automatically is translating existing lexicon lists written in English; those built by Remus et al. (2010) and Momtazi (2012) are examples. We also propose an automatic method, one that requires neither lexical resources nor translation.

Very few attempts have been made so far to analyze emotions in Korean text. Cho and Lee (2006) identified eight emotions in Korean song lyrics with manually annotated word emotion vectors. Lee et al.



(2013) classified Korean tweets into seven emotions and achieved 52% accuracy when using the multinomial naïve Bayes algorithm with morpheme features. The only publicly available Korean emotion lexicons we found were a set of 265 terms of nine emotion types, manually built by Rhee et al. (2008). Our work differs from the aforementioned Korean studies because we automatically construct larger emotion lexicon sets and introduce fine-grained features that are particularly attuned to Korean Twitter texts.

3 Korean Twitter Emotion Analysis (KTEA) Dataset

A Twitter dataset annotated with emotion types is essential for training in the machine learning based approach. To build the corpus, we collected random Korean Twitter messages using the Twitter streaming API. We removed tweets with RT, URL links, and replies. After the collection process, a corpus can be annotated either manually by human annotators or automatically by distant labels (Go et al., 2009; Wicaksono et al.; Lee et al., 2013). In our case, we manually annotated the corpus. Each tweet was labeled by three annotators, producing three emotion labels per tweet. Consequently, we constructed the Korean Twitter Emotion Analysis (KTEA) dataset [2], which contains 5,706 valid tweets labeled with seven types of emotion: Ekman's six emotions and no emotion (neutral). Using the dataset, we constructed emotion lexicon sets as described in Section 4 and trained our machine learning algorithms as presented in Section 5.

4 Constructing Emotion Lexicons

4.1 Our Approach - Weighted Tweet Frequency

We built emotion lexicons automatically from the annotated corpus without using other lexical resources. For the construction, we utilized part of our KTEA dataset: the set of tweets, each of which was labeled as representing one of Ekman's six emotion types (disregarding the neutral case) by at least one annotator. Table 1 shows the number of tweets we used per emotion for the purpose of lexicon construction.

[2] goo.gl/Gu0GNw

Emotion     Number of Tweets
Happiness   770
Sadness     1377
Anger       903
Disgust     694
Surprise    475
Fear        228
Total       4447

Table 1: The number of tweets we used to generate emotion lexicons using the weighted TwF approach

To generate emotion lexicons, we propose the weighted tweet frequency (weighted TwF) method. First, we aggregated tweets of the same emotion label into one document (d), producing six documents (D) of tweets as a result. Using the six documents, we calculated the weighted TwF for each term (t) that appeared in the documents. The weighted TwF is expressed in Equations 1, 2, and 3. Consequently, we generated six emotion lexicon lists, one list for each emotion type. Each lexicon entry has a weighted TwF value that shows the strength of the corresponding emotion, i.e., the higher the value is, the stronger the emotion. The basic idea is similar to the concept of term frequency - inverse document frequency (TF-IDF) [3], in which the occurrences of a term are counted and a penalty is given if the term appears in several documents. However, TF-IDF is not appropriate for our task because the structures of tweets are often highly ungrammatical, and there are many tweets with meaningless terms, which are sometimes excessively repeated within one tweet. In such cases, the meaningless terms produce high term frequency, which results in erroneous emotion lexicons. As illustrated in Figure 1, when term frequency (TF-IDF) is used, some words (names, in this example), such as 시우민 "Xiumin", 성규 "Sung Kyu", and 김민석 "Kim Min Seok", rank high in the happiness lexicon list. This is because a few tweets excessively repeat those names. Similar kinds of unstructured tweets are frequent on Twitter, and we can disregard such cases by using the tweet frequency defined in Equation 1: it counts the number of tweets instead, as true emotion lexicons appear across many tweets, not in a few erroneous tweets.

[3] https://en.wikipedia.org/wiki/Tf-idf



Figure 1: Example of top-ranked happiness lexicons generated from TF-IDF and weighted TwF

Another reason why TF-IDF is not suitable is the log term in IDF, which is trivial due to the small number of documents. Thus, we used a simple weighting scheme instead, as in Equation 2. We set the weight to zero when a lexicon appeared in all the emotion documents in order to remove lexical items that appear very frequently but without any emotion, for example, 있다 "is" and 나 "I".

- d : a document containing all tweets of the same emotion
- D : the total set of ds
- t : the target term
- n : the number of ds in which t appears

Normalized Tweet Frequency = (number of tweets in d where t appears) / (total number of terms in d)    (1)

Weight = 1/n if n < |D|, 0 if n = |D|    (2)

weighted Tweet Frequency (weighted TwF) = Normalized Tweet Frequency × Weight    (3)
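Equations 1-3 can be sketched in a few lines of code. The following Python sketch is our own re-implementation of the weighted TwF computation under the definitions above; the function and variable names are ours, not the authors':

```python
from collections import Counter

def weighted_twf(docs):
    """Weighted tweet frequency (Eqs. 1-3) for every term.

    docs: dict mapping an emotion label to a list of tweets,
          where each tweet is a list of tokens.
    Returns a dict: emotion -> {term: weighted TwF score}.
    """
    num_docs = len(docs)  # |D|
    # n: in how many emotion documents each term appears
    doc_count = Counter()
    for tweets in docs.values():
        doc_count.update({t for tweet in tweets for t in tweet})

    scores = {}
    for emotion, tweets in docs.items():
        total_terms = sum(len(tweet) for tweet in tweets)
        # Eq. 1 numerator: count tweets containing t, not raw occurrences,
        # so a term excessively repeated in one tweet is not inflated
        tweet_freq = Counter()
        for tweet in tweets:
            tweet_freq.update(set(tweet))
        scores[emotion] = {}
        for term, tf in tweet_freq.items():
            n = doc_count[term]
            weight = 0.0 if n == num_docs else 1.0 / n           # Eq. 2
            scores[emotion][term] = (tf / total_terms) * weight  # Eq. 3
    return scores
```

A term like 있다 "is" that occurs in every emotion document receives weight 0 and drops out, exactly as Equation 2 intends.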

Automatic methods of building emotion lexicons have been studied in many works. There are two widely used methods, namely, a thesaurus-based approach (Section 4.2) and a translation-based approach (Section 4.3).

4.2 Thesaurus-Based Approach

The thesaurus-based method builds emotion lexicon lists using synonyms. Using a small set of emotion seed words, this method looks for synonyms using a

thesaurus and adds them to the emotion lexicon lists. Due to the lack of a large and representative Korean thesaurus, we combined various publicly available resources, namely, Dong-a's Prime dictionary [4], the Naver dictionary [5], a Korean thesaurus [6], and WiseWordNet [7]. First, the seed words (happiness, sadness, anger, disgust, surprise, and fear) were translated into Korean using Dong-a's Prime English-Korean Dictionary. Then, we extended the emotion lexicon sets to include derivatives and synonyms using the various resources and thesauruses. Since the resources were not perfect, there were many erroneous synonyms. Thus, as the last step, we manually removed the unreasonable ones. The detailed procedure is summarized in Table 2.

4.3 Translation-Based Approach

There are many lexical resources in English for emotion analysis. This method translates such resources into a specific language, in our case, Korean. Among the many lexical resources, we chose WordNet-Affect (Strapparava and Valitutti, 2004), as it is one of the popular and typical emotion lexicon sets used in emotion analysis, and it is freely available. WordNet-Affect contains WordNet synonyms and is manually annotated with Ekman's six emotions. We translated the WordNet-Affect list using Google Translate [8]. We employed the Google service as it is the most widely used translator and its performance is known to be fairly accurate. However, there were

[4] http://www.dongapublishing.com/entry/index.html
[5] dic.naver.com
[6] http://www.wordnet.co.kr/
[7] Software Research Laboratory, ETRI
[8] https://translate.google.co.kr/



Thesaurus-Based Approach
Seed words: happiness, sadness, anger, disgust, surprise, fear (each seed word starts the corresponding emotion lexicon list)
Step 1. Translate seed words to Korean using Dong-a's Prime dictionary
Step 2. Add derivatives using the NAVER dictionary
Step 3. Using the Korean thesaurus, add synonyms of each word
Step 4. Using WiseWordNet, add primary synonyms of each word
Step 5. Leave only exclusive words for each emotion and remove duplicates within each list
Step 6. Manually remove unreasonable or misleading emotion words

Table 2: Procedure for making emotion lexicons using the thesaurus-based approach

some erroneous translations, since the Korean translator is not perfect. Thus, we manually modified and removed problematic words and duplicates.

4.4 Comparison

In this section, we explain the qualitative aspects of our lexicon construction method in comparison with other approaches. The advantages of the emotion lexicon sets built by our weighted TwF approach are the following:

1. As the wordlist is constructed from real Twitter messages, the method generates Twitter-specific lexicons that include slang, swear words, and ungrammatical words. Example: 존잘님 "slang for a handsome person", 조아 "ungrammatical word for like".

2. Our method discovers topics that are closely related to particular emotions. Example: 야자 "night school study" (sadness: many students feel sad when they are forced to study at night in school).

3. It is possible to discover keywords that appear particularly within a specific time range. The method automatically updates the lexicons to include newly coined words, which are essential for emotion analysis in the Twitter domain. Example: 빅뱅 "Big Bang" (happiness: the famous Korean singer Big Bang released a new album at the time we constructed the emotion lexicons).

We show the effectiveness of our weighted TwF approach by comparing it with the popular

Approach           Weighted TwF   Thesaurus       Translation
Automatic?         O              O               O
Resource-free?     O              X (thesaurus)   X (translator)
No manual work?    O              X               X
Twitter-specific?  O              X               X

Table 3: Comparison of our weighted TwF approach with the thesaurus-based and translation-based approaches

thesaurus-based and translation-based approaches. Table 3 compares the three approaches. All three can generate emotion lexicons automatically. To be specific, using the thesaurus-based approach, we are able to construct emotion wordlists easily and automatically from only a small set of seed words. The translation-based approach likewise translates existing emotion lexicons automatically using translators. However, the thesaurus-based approach is heavily dependent on lexical resources like dictionaries and thesauruses, and a well-built thesaurus is not likely to be available in many non-English speaking countries. Additionally, the translation-based approach requires a reliable translator. In comparison, our weighted TwF approach is based on statistics, which are independent of lexical resources and translators; thus, it would be very useful for under-resourced countries. Moreover, we observed that the thesaurus- and translation-based approaches generate a lot of erroneous words due to errors in the resources and translators; hence, manual removal of those words was necessary to achieve accurate results. In contrast, our approach generates lexicons with strength values that show how strongly a word belongs to an emotion type. Even if erroneous words are included in the list, they are likely to be ignored due to their low weighted TwF values. Lastly, our lexicon sets are particularly attuned to the Twitter domain; they include slang, jargon, ungrammatical words, and newly coined words, whereas most other approaches do not.

5 Machine Learning with Fine-Grained Features

Our goal is to classify Korean Twitter messages according to one of the following six emotions: happiness, sadness, anger, disgust, surprise, and fear. We used a machine learning algorithm to classify each Twitter message, represented by a feature vector. We



first explain the features that we propose in this work and then describe our machine learning based classification.

5.1 Fine-Grained Features

Feature engineering is very important in machine learning. Features that have been traditionally used in emotion analysis are lexicons and punctuation. Positive and negative emoticons such as :) and :( have also been used in some research. However, more fine-grained and language-specific features are necessary to distinguish finer granularities of emotion. To come up with effective features, we worked with the following ideas:

• Emoticons and symbols may express specific types of fine-grained emotions
• Some alphabet letters may convey emotions
• Exclamation words may appear in surprise messages
• Swear words may appear in angry messages

We explain how we designed the features according to the ideas presented above, with some examples.

Fine-Grained Emoticons and Symbols Emoticons and symbols are important in analyzing online language because many people express their feelings using them. We constructed a list of emoticons for each fine-grained emotion type that are used in Korea, as well as general emoticons widely used in Eastern and Western countries. We built this dictionary of emoticons and symbols with the aid of various website articles. Moreover, we included Emojis, which have become increasingly popular on Twitter since mobile devices adopted them. We sorted each emoticon and symbol into one of the six emotions using the explanations given on the websites. One interesting aspect of the dictionary is that it utilizes regular expressions to incorporate the various mutations of each emoticon. For example, Korean emoticons often use varied or extended particles to represent the mouth of a face: in the case of a smiling face (ˆˆ), people use mutations such as (ˆ ˆ), (ˆ.ˆ), (ˆㅅˆ), and (ˆ3ˆ). In other words, similar to languages, emoticons also have informal versions with similar patterns. Thus, we incorporated such common cases with regular expressions. Part of the dictionary of emoticons and symbols is shown in Figure 2.

Figure 2: Example of the fine-grained emoticon-emotion dictionary

Korean Emotion Letters Language-specific features are important in performing emotion analysis for a specific language. Koreans use certain Korean letters to show emotions, so we took into account the letters listed in Table 4. 'ㅋ' and 'ㅎ' are often used to indicate laughter, while 'ㅠ' and 'ㅜ' indicate crying. Also, sequences of letters such as 'ㄷㄷ' and 'ㅂㄷㅂㄷ' are often used to express fear and anger, respectively. We counted and normalized the number of such emotion letters and added the result as a language-specific feature.

Korean Letters   Meaning
ㅋ, ㅎ           Laughing with happiness
ㅠ, ㅜ           Crying with sadness
ㄷㄷ             Shaking with fear
ㅂㄷㅂㄷ          Trembling with anger

Table 4: List of Korean letters closely related to emotions

Exclamations of Surprise According to the Merriam-Webster dictionary, the definition of an exclamation is "a sharp or sudden cry; a word, phrase, or sound that expresses a strong emotion" [9]. We assumed that exclamations are often used in tweets expressing surprise, such as 맙소사 "oh no", 앗 "oh dear", and 우와 "wow". We searched various websites and collected examples to make a list of exclamations of surprise. As a result, we constructed a list of 45 surprise exclamation words. We then counted the number of occurrences of such words in each tweet and added the count as a feature.

Swear Words We observed that swear words are often used in angry tweets; therefore, we assumed that their occurrence is a strong clue for identifying tweets expressing anger. We constructed our own list of Korean swear words by combining numerous related

[9] http://www.merriam-webster.com/dictionary/exclamation
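The regular-expression treatment of emoticon mutations described above can be illustrated as follows. The pattern is our own illustrative guess at one dictionary entry, not the authors' actual resource:

```python
import re

# One hypothetical entry: a smiling face whose "mouth" may be absent,
# a space, a dot, or a letter such as ㅅ or 3, with optional
# enclosing parentheses, so one pattern covers many informal variants.
SMILE = re.compile(r"\(?\^[ .ㅅ3_]?\^\)?")

mutations = ["^^", "(^ ^)", "(^.^)", "(^ㅅ^)", "(^3^)"]
matched = [bool(SMILE.search(s)) for s in mutations]
```

A single pattern like this replaces dozens of literal dictionary entries, which is the point of using regular expressions in the emoticon dictionary.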



resources and websites. As a result, a list of 227 Korean swear words was built. For each tweet, we counted the number of occurrences of swear words and added the count as a feature.

Consequently, we designed a feature vector based on the conventional features as well as the features we presented above. To sum up, we considered the following features for classification:

1. Emotion lexicons (weighted TwF)
2. Emotion lexicons (thesaurus+translation)
3. Punctuation (? ! !? . , ∼)
4. Fine-grained emoticons and symbols
5. Korean emotion letters
6. Exclamations of surprise
7. Swear words
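Putting the feature groups above together, a tweet's feature vector might be assembled as in the sketch below. The tokenizer and all resources passed in are simplified placeholders, not the actual dictionaries and lists described in this paper:

```python
PUNCT = ["?", "!", "!?", ".", ",", "~"]

def feature_vector(tweet, lexicons, letters, exclamations, swears):
    """Sketch of a per-tweet feature vector. Every resource here
    (lexicon dicts, letter/word lists) is a placeholder; group 4
    (emoticon regexes) is omitted for brevity."""
    tokens = tweet.split()  # naive whitespace tokenization
    vec = []
    # groups 1-2: summed lexicon strength per emotion (e.g. weighted TwF)
    for lex in lexicons:
        vec.append(sum(lex.get(t, 0.0) for t in tokens))
    # group 3: punctuation counts
    vec += [tweet.count(p) for p in PUNCT]
    # group 5: Korean emotion letters, normalized by tweet length
    vec += [tweet.count(ch) / max(len(tweet), 1) for ch in letters]
    # groups 6-7: surprise exclamation and swear-word occurrence counts
    vec.append(sum(tweet.count(w) for w in exclamations))
    vec.append(sum(tweet.count(w) for w in swears))
    return vec
```

Each tweet thus becomes a fixed-length numeric vector that any of the classifiers discussed in Section 5.2 can consume.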

5.2 Machine Learning Based Classification

Before constructing a machine learning classifier, we applied the synthetic minority oversampling technique (SMOTE) (Chawla et al., 2002) to the training set; SMOTE is a well-known oversampling method that is known to be more effective than plain oversampling with replication. We preferred an oversampling method to an undersampling method since our dataset is highly imbalanced, and undersampling would remove too many instances. SMOTE generates synthetic instances of the minority class by choosing a random point on the line segment between a minority instance and a neighbor randomly selected from its k nearest minority neighbors. As a result of applying SMOTE to our training set, we obtained a balanced dataset, which is favored by most machine learning algorithms. We compared several machine learning algorithms for classification, including support vector machine (SVM), multinomial logistic regression, random forest, J48, naive Bayes, and ZeroR.
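SMOTE's interpolation step can be sketched in pure Python as follows; this is a toy illustration of the idea, not the WEKA implementation used in the experiments:

```python
import random

def smote(minority, k=2, n_new=4, seed=0):
    """Toy SMOTE: synthesize minority-class points by interpolating
    between a sample and one of its k nearest minority neighbors.
    minority: list of numeric tuples (feature vectors)."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        x = rng.choice(minority)
        # k nearest neighbors of x by squared Euclidean distance
        neighbors = sorted(
            (p for p in minority if p is not x),
            key=lambda p: sum((a - b) ** 2 for a, b in zip(x, p)),
        )[:k]
        nb = rng.choice(neighbors)
        gap = rng.random()  # random point on the segment x -> nb
        synthetic.append(tuple(a + gap * (b - a) for a, b in zip(x, nb)))
    return synthetic
```

Because each synthetic point lies on a segment between two real minority instances, the new points stay inside the region the minority class already occupies rather than being exact duplicates.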

6 Experimental Results and Analysis

We performed experiments using WEKA [10] to evaluate 1) our weighted tweet frequency method and 2) the performance of machine learning based classification using the feature vector we engineered.

Dataset For training and testing the machine learning algorithms, we used the 899 Twitter messages from our KTEA dataset for which

[10] http://www.cs.waikato.ac.nz/ml/weka/

three annotators all agreed on the emotion type, excluding neutral. We performed 5-fold cross-validation.

Performance Measure We used precision, recall, and F-measure to evaluate the classification performance for each emotion type. Also, the weighted average of each measure was computed to determine the overall performance on the unbalanced test dataset.

Weighted Tweet Frequency We investigated the performance of our lexicon building method, weighted tweet frequency, and compared it with the performance of the thesaurus- and translation-based methods. We found that the lexicons from the thesaurus- and translation-based approaches suffer from low coverage due to the lack of reliable words produced by the Korean resources and translator. Therefore, we combined the lexicon lists produced by the thesaurus- and translation-based approaches to make a larger emotion lexicon list. In other words, we compared our approach (weighted TwF) against the combined approach (thesaurus+translation). The precision, recall, and F-measure when using SVM are shown in Figure 3. The F-measure of our approach is higher than that of the thesaurus+translation approach. The precision of the thesaurus+translation approach is relatively high due to the manual removal of erroneous words from its lists; however, its recall is very low because it does not contain Twitter-specific words. Furthermore, our approach, used together with the thesaurus+translation approach, achieves the best performance.

Machine Learning Based Classification First, we investigated the most appropriate machine learning algorithm for classification. We tested various machine learning algorithms: SVM, multinomial logistic regression, random forest, J48, naive Bayes, and ZeroR. Figure 4 shows the results. SVM produced the best precision, recall, and F-measure compared to the others.
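The weighted-average measure used for the unbalanced test set can be computed as in this small helper, an illustration of the standard definition rather than the exact WEKA computation; the parameter names are ours:

```python
def weighted_f1(per_class):
    """Weighted-average F-measure over an unbalanced test set.
    per_class: list of (support, precision, recall) tuples,
    one per emotion class."""
    total = sum(s for s, _, _ in per_class)
    score = 0.0
    for support, p, r in per_class:
        # harmonic mean of precision and recall for this class
        f1 = 2 * p * r / (p + r) if (p + r) else 0.0
        # weight each class by its share of the test instances
        score += (support / total) * f1
    return score
```

Weighting by class support keeps a well-classified but rare emotion such as fear from dominating the overall score.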

We conducted another experiment to evaluate how well the features we proposed improved the performance of SVM. As shown in Figure 5, the best performance was observed when all the features were combined, with an overall F-measure of about 70%. The emotion lexicon and punctuation features achieved an F-measure of about 64%. Adding the exclamation-of-surprise feature improved the classification of the surprise emotion by 12% in F-measure. Further adding Korean emotion letters



Figure 3: Precision, recall, and F-measure using our weighted TwF, thesaurus+translation, and the two approaches combined. The best accuracy was observed when all the approaches were used.

Figure 4: Precision, recall, and F-measure achieved by using different machine learning algorithms. SVM generated the best performance.

helped to classify sadness, increasing the classification score from 67% to 72.1%, while the value for fear increased from 76.3% to 79.5%. Moreover, adding the emoticon and symbol feature particularly improved the classification of happiness, from 73% to 76.3%. Lastly, we added the swear word feature. As expected, it improved the classification of anger from 54.4% to 56.5%. Overall, we found that our fine-grained features helped the analysis of fine-grained emotions, and we believe that improving the feature resources will further improve the overall performance.
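The kind of count-based feature extraction ablated above can be pictured with a small sketch. Everything here is hypothetical: the one-word lexicon, the emotion-letter and emoticon lists stand in for the paper's automatically built Korean resources, and in the actual system such feature vectors would be fed to the SVM classifier.

```python
def extract_features(tweet, lexicon, emotion_letters, emoticons, swear_words):
    """Count-based features of the kind combined in the experiments:
    lexicon hits per emotion, punctuation, Korean emotion letters,
    emoticons/symbols, and swear words."""
    feats = {}
    for word, emotion in lexicon.items():   # emotion lexicon feature
        if word in tweet:
            key = "lex_" + emotion
            feats[key] = feats.get(key, 0) + 1
    feats["excl"] = tweet.count("!")        # punctuation features
    feats["quest"] = tweet.count("?")
    feats["letters"] = sum(tweet.count(s) for s in emotion_letters)
    feats["emoticons"] = sum(tweet.count(s) for s in emoticons)
    feats["swears"] = sum(tweet.count(s) for s in swear_words)
    return feats

# Toy resources (illustrative only): 행복 "happiness"; ㅋㅋ is a laughing
# emotion letter, ㅠㅠ a crying one.
feats = extract_features("오늘 너무 행복해!! ㅋㅋ",
                         lexicon={"행복": "happiness"},
                         emotion_letters=["ㅠㅠ", "ㅋㅋ"],
                         emoticons=["^^"],
                         swear_words=[])
```

For the example tweet ("so happy today!! haha"), the extractor fires one happiness-lexicon hit, two exclamation marks, and one emotion letter, which is the sort of signal the ablation shows helping specific emotions.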

7 Conclusion

We proposed a machine learning based classification method that sorts Korean Twitter messages into

Figure 5: Precision, recall, and F-measure using combined sets of features. Using all features achieved the best performance, with an F-measure of about 70%.

six emotion types using carefully designed features. Emotion analysis research on under-resourced languages can benefit from our emotion lexicon building method, as we automatically construct lexicons without any help from other resources and tools. In addition, we suggested several fine-grained features to improve classification performance. We believe that our research, the KTEA dataset, and resources represent a significant step forward in Korean Twitter emotion analysis studies, which have rarely been addressed before.

Acknowledgment

This work was supported by the National Research Foundation of Korea (NRF) grant funded by the Korea government (MSIP) (No. 2010-0028631).


References

R. C. Balabantaray, Mudasir Mohammad, and Nibha Sharma. 2012. Multi-class twitter emotion classification: A new approach. International Journal of Applied Information Systems, 4(1):48–53.

E. Y. Bann and J. J. Bryson. 2012. The conceptualisation of emotion qualia: Semantic clustering of emotional tweets. In Proceedings of the 13th Neural Computation and Psychology Workshop. World Scientific.

Johan Bollen, Huina Mao, and Xiaojun Zeng. 2011. Twitter mood predicts the stock market. Journal of Computational Science, 2(1):1–8.

Erik Cambria and Amir Hussain. 2012. Sentic Computing: Techniques, Tools, and Applications, volume 2. Springer Science & Business Media.

Nitesh V. Chawla, Kevin W. Bowyer, Lawrence O. Hall, and W. Philip Kegelmeyer. 2002. SMOTE: Synthetic minority over-sampling technique. Journal of Artificial Intelligence Research, 16:321–357.

Young Hwan Cho and Kong Joo Lee. 2006. Automatic affect recognition using natural language processing techniques and manually built affect lexicon. IEICE Transactions on Information and Systems, 89(12):2964–2971.

Nicholas A. Diakopoulos and David A. Shamma. 2010. Characterizing debate performance via aggregated twitter sentiment. In Proceedings of the SIGCHI Conference on Human Factors in Computing Systems, pages 1195–1198. ACM.

Son Doan, Bao-Khanh Ho Vo, and Nigel Collier. 2012. An analysis of twitter messages in the 2011 tohoku earthquake. In Electronic Healthcare, volume 91, pages 58–66. Springer.

Paul Ekman. 1992. An argument for basic emotions. Cognition & Emotion, 6(3-4):169–200.

Alec Go, Richa Bhayani, and Lei Huang. 2009. Twitter sentiment classification using distant supervision. CS224N Project Report, Stanford, 1:12.

Jeongin Kim, Dongjin Choi, Myunggwon Hwang, and Pankoo Kim. 2014. Analysis on smartphone related twitter reviews by using opinion mining techniques. In Advanced Approaches to Intelligent Information and Database Systems, pages 205–212. Springer.

Cheolseong Lee, Donghee Choi, Seongsoon Kim, and Jaewoo Kang. 2013. Classification and analysis of emotion in korean microblog texts. Journal of Korean Institute of Information Scientists and Engineers: Databases, 40(3):159–167.

Hugo Liu and Push Singh. 2004. ConceptNet — a practical commonsense reasoning tool-kit. BT Technology Journal, 22(4):211–226.

Saif M. Mohammad and Peter D. Turney. 2013. Crowdsourcing a word–emotion association lexicon. Computational Intelligence, 29(3):436–465.

Saeedeh Momtazi. 2012. Fine-grained german sentiment analysis on social media. In Proceedings of the 8th International Conference on Language Resources and Evaluation. European Language Resources Association (ELRA).

Brendan O'Connor, Ramnath Balasubramanyan, Bryan R. Routledge, and Noah A. Smith. 2010. From tweets to polls: Linking text sentiment to public opinion time series. Proceedings of the International Conference on Web and Social Media, 11(122-129):1–2.

James W. Pennebaker, Martha E. Francis, and Roger J. Booth. 2001. Linguistic inquiry and word count: LIWC 2001. Mahway: Lawrence Erlbaum Associates, 71:2001.

Robert Remus, Uwe Quasthoff, and Gerhard Heyer. 2010. SentiWS - a publicly available german-language resource for sentiment analysis. In Proceedings of the 7th International Conference on Language Resources and Evaluation. European Language Resources Association (ELRA).

June Woong Rhee, Hyun Joo Song, Eun Kyung Na, and Hyun Suk Kim. 2008. Classification of emotion terms in korean. Korean Journal of Journalism and Communication Studies, 52(1):85–116.

Kirk Roberts, Michael A. Roach, Joseph Johnson, Josh Guthrie, and Sanda M. Harabagiu. 2012. EmpaTweet: Annotating and detecting emotions on twitter. In Proceedings of the 8th International Conference on Language Resources and Evaluation. European Language Resources Association (ELRA).

Carlo Strapparava and Alessandro Valitutti. 2004. WordNet-Affect: An affective extension of WordNet. In Proceedings of the 4th International Conference on Language Resources and Evaluation, volume 4, pages 1083–1086.

Alfan F. Wicaksono, Clara Vania, Distiawan T. Bayu, and Mirna Adriani. 2014. Automatically building a corpus for sentiment analysis on indonesian tweets. In Proceedings of the 28th Pacific Asia Conference on Language, Information and Computing.
