Sense-Level Semantic Clustering of Hashtags in Social Media

Ali Javed
Department of Computer Science
University of Vermont
Burlington, Vermont 05405, U.S.A.
[email protected]

Byung Suk Lee
Department of Computer Science
University of Vermont
Burlington, Vermont 05405, U.S.A.
[email protected]

Abstract

We enhance the accuracy of the currently available semantic hashtag clustering method, which leverages hashtag semantics extracted from dictionaries such as Wordnet and Wikipedia. While immune to the uncontrolled and often sparse usage of hashtags, the current method distinguishes hashtag semantics only at the word level. Unfortunately, a word can have multiple senses, each representing a distinct exact meaning, and therefore word-level semantic clustering fails to disambiguate the true sense-level semantics of hashtags and, as a result, may generate incorrect clusters. This paper shows how this problem can be overcome through sense-level clustering and demonstrates its impact on clustering behavior and accuracy.

1 Introduction

Hashtags are used in major social media (e.g., Twitter, Facebook, Tumblr, Instagram, Google+, Pinterest) for various purposes – to tell jokes, follow topics, post advertisements, collect consumer feedback, etc. For instance, McDonald's created the hashtag #Mcdstories to collect consumer feedback; #OccupyWallStreet, #ShareaCoke and #NationalFriedChickenDay are only a few examples of many successful hashtag campaigns.

Twitter was the first social media platform to introduce hashtags, and it is used as the representative social media platform in this paper. Most tweets contain one or more hashtags in their texts of up to 140 characters.

Clustering is commonly used as a text classification technique, and clustering of hashtags is the first step in the classification of tweets, given that hashtags are used to index those tweets. Therefore, classification of tweets benefits from accurate clustering of hashtags.

Social media is arguably the best source of timely information. On Twitter alone, for example, an average of 6,000 micro-messages are posted per second (Internet Live Stats, last viewed in May 2016). Thus, social media analysts use clusters of hashtags as the basis for more complex tasks (Muntean et al., 2012) such as retrieving relevant tweets (Muntean et al., 2012; Park and Shin, 2014), tweet ranking, sentiment analysis (Wang et al., 2011), data visualization (Bhulai et al., 2012), semantic information retrieval (Teufl and Kraxberger, 2011), and user characterization. Therefore, the accuracy of hashtag clustering is important to the quality of the resulting information in those tasks.

The popular approach to hashtag clustering has been to leverage the tweet texts accompanying hashtags (Costa et al., 2013; Teufl and Kraxberger, 2011; Tsur et al., 2012; Tsur et al., 2013; Bhulai et al., 2012; Muntean et al., 2012; Rosa et al., 2011) by identifying their "contextual" semantics (Saif et al., 2012). There are two prominent problems with this approach, however. First, a majority of hashtags are not used frequently enough to find sizable tweet texts accompanying them, causing a sparsity problem. Second, tweet texts are open-ended, with no control over their contents at all, and therefore often exhibit poor linguistic quality. (According to Pear Analytics, 40.1% of tweets are "pointless babble" (Kelly, last viewed in May 2016).) These problems make text-based techniques ineffective for hashtag clustering. Hence, methods that utilize other means of identifying the semantics of hashtags are needed.

In this regard, the focus of this paper is on leveraging dictionary metadata to identify the semantics of hashtags. We adopt the pioneering work done by Vicient and Moreno (2014). Their approach identifies the "lexical" semantics of hashtags from external resources (e.g., Wordnet, Wikipedia) independent of the tweet messages themselves. To the best of our knowledge, their work is the only one that uses this metadata-based approach. This approach has the advantage of being immune to the sparsity and poor linguistic quality of tweet messages, and the results of their work demonstrate it.

On the other hand, their work has a major drawback, in that it makes clustering decisions at the word level while the correct decision can be made at the sense (or "concept") level. It goes without saying that the correct use of metadata is critical to the performance of any metadata-based approach, and indeed clustering hashtags based on their word-level semantics has been shown to erroneously put hashtags of different senses in the same cluster (more on this in Section 4).

In this paper, we devise a more accurate sense-level metadata-based semantic clustering algorithm. The critical area of improvement is the construction of the similarity matrix between pairs of hashtags, which is then input to a clustering algorithm. The immediate benefit is shown in the accuracy of the resulting clusters, which we demonstrate using a toy example. Experimental results using gold standard testing show a 26% gain in clustering accuracy in terms of the weighted average pairwise maximum f-score (Equation 5), where the weight is the size of a ground truth cluster. Despite the gain in clustering accuracy, we were able to keep the run-time and space overheads for similarity matrix construction within a constant factor (e.g., 5 to 10) through a careful implementation scheme.

The remainder of this paper is organized as follows. Section 2 provides some background knowledge. Section 3 describes the semantic hashtag clustering algorithm designed by Vicient and Moreno (2014). Section 4 discusses the proposed sense-level semantic enhancement to the clustering algorithm, and Section 5 presents its evaluation against the word-level semantic clustering. Section 6 discusses other work related to semantic hashtag clustering. Section 7 summarizes the paper and suggests future work.

Concept       Meaning
desert.n.01   arid land with little or no vegetation
abandon.v.05  leave someone who needs or counts on you; leave in the lurch
defect.v.01   desert (a cause, a country or an army), often in order to join the opposing cause, country, or army
desert.v.03   leave behind

Table 1: Example synset for the word "desert".

2 Background

2.1 Wordnet – synset hierarchy and similarity measure

Wordnet is a free and publicly available lexical database of the English language. It groups English words into sets of synonyms called synsets. Each word in Wordnet must point to at least one synset, and each synset must point to at least one word. Hence, there is a many-to-many relationship between synsets and words (Vicient, 2014). Synsets in Wordnet are interlinked by their semantic and lexical relationships, which results in a network of meaningfully related words and concepts.

Table 1 shows an example synset. The synset contains 4 different concepts, where a concept is a specific sense of a word – e.g., "desert" meaning "arid land with little or no vegetation", or "desert" meaning "to leave someone who needs or counts on you".

All of these concepts are linked to each other using the semantic and lexical relationships mentioned. For example, "oasis.n.01" (meaning "a fertile tract in a desert") is a meronym of "desert.n.01", i.e., "oasis.n.01" is a part of "desert.n.01".

Given this network of relationships, Wordnet is frequently used in automatic text analysis through its application program interface (API). There are different API functions that allow for the calculation of semantic similarity between synsets, and the Wu-Palmer similarity measure (Wu and Palmer, 1994) is used in this paper in order to stay consistent with the baseline algorithm by Vicient and Moreno (2014). In a lexical database like the Wordnet synset database, where concepts are organized in a hierarchical structure, the Wu-Palmer similarity between two concepts C_1 and C_2, denoted as sim_WP(C_1, C_2), is defined as

sim_WP(C_1, C_2) = \frac{2 \cdot depth(LCS(C_1, C_2))}{depth(C_1) + depth(C_2)}    (1)

where LCS(C_1, C_2) is the least common subsumer of C_1 and C_2 in the hierarchy of synsets.

This Wordnet functionality is used to calculate the semantic similarity between hashtags in this paper, that is, by grounding hashtags to specific concepts (called "semantic grounding") and calculating the similarity between the concepts.
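As a concrete illustration, the snippet below lists the synsets of the word "desert" from Table 1 and computes the Wu-Palmer similarity of Equation 1 between the "desert.n.01" and "oasis.n.01" concepts mentioned above; the use of the NLTK interface to Wordnet is our choice for illustration and is not prescribed by the paper.

    # Minimal sketch: Wu-Palmer similarity between Wordnet synsets via NLTK.
    # Assumes NLTK is installed and the 'wordnet' corpus has been downloaded
    # with nltk.download('wordnet').
    from nltk.corpus import wordnet as wn

    # All candidate concepts (senses) of the word "desert", as in Table 1.
    for synset in wn.synsets('desert'):
        print(synset.name(), '-', synset.definition())

    # Wu-Palmer similarity (Equation 1) between two specific concepts.
    desert = wn.synset('desert.n.01')
    oasis = wn.synset('oasis.n.01')
    print(desert.wup_similarity(oasis))  # a value in (0, 1]; higher means more similar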

2.2 Wikipedia – auxiliary categories

Wikipedia is by far the most popular crowd-sourced encyclopedia. Not all hashtags can be grounded semantically using Wordnet because many of them are simply not legitimate terms found in Wordnet (e.g., #Honda). This situation is where Wikipedia can be used to look up those hashtags. Wikipedia provides auxiliary categories for each article. For example, when Wikipedia is queried for categories related to the page titled "Honda", it returns the following auxiliary categories.

['Automotive companies of Japan',
 'Companies based in Tokyo',
 'Boat builders',
 'Truck manufacturers',
 ...]

Auxiliary categories can be thought of as the categories the page belongs to. In this example, if we are unable to look up the word "Honda" in Wordnet, then, through the help of these auxiliary categories, we can relate the term to Japan, Automotive, Company, etc. There are several open-source Wikipedia APIs available to achieve this purpose – for example, the Python library "wikipedia".
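The snippet below sketches such a category lookup with the Python "wikipedia" library mentioned above; the exact calls and the error handling are one possible arrangement, not the paper's implementation.

    # Sketch: retrieving auxiliary categories for a term via the Python
    # "wikipedia" library (pip install wikipedia).
    import wikipedia

    try:
        page = wikipedia.page('Honda')   # look up the article matching the term
        print(page.categories[:10])      # e.g. 'Automotive companies of Japan', ...
    except wikipedia.exceptions.DisambiguationError as e:
        print('Ambiguous term; candidate pages:', e.options[:5])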

2.3 Hierarchical clustering

Hierarchical clustering is a viable approach to cluster analysis, and is particularly suitable for the purpose of hashtag clustering in this paper for a few reasons. First, the approach does not require a priori information about the number of clusters. (The number of output clusters is not known in most real applications.) Second, it is suited to the taxonomic nature of language semantics. Third, it facilitates a fair comparison with the algorithm by Vicient and Moreno (2014), which also uses hierarchical clustering.

There are two popular strategies for hierarchical clustering – bottom-up (or agglomerative) and top-down (or divisive). In the bottom-up strategy, each element starts in its own cluster and two clusters are merged to form one larger cluster as the clustering process moves up the hierarchy. In the top-down strategy, all elements start in one cluster and one cluster is split into two smaller clusters as the clustering process moves down the hierarchy. The bottom-up strategy is used in this paper because it is conceptually simpler than top-down (Manning et al., 2008).

For the bottom-up strategy, several distance measurement methods are available to provide linkage criteria for building up a hierarchy of clusters. Among them, the single-linkage method and the unweighted pair group method with arithmetic mean (UPGMA) are used most commonly, and both are used in this paper. The single-linkage method calculates the distance between two clusters C_u and C_v as

d(C_u, C_v) = \min_{u_i \in C_u, v_j \in C_v} dist(u_i, v_j)    (2)

and UPGMA calculates the distance as

d(C_u, C_v) = \frac{\sum_{u_i \in C_u, v_j \in C_v} d(u_i, v_j)}{|C_u| \times |C_v|}    (3)

where |C_u| and |C_v| denote the number of elements in clusters C_u and C_v, respectively.

To generate output clusters, "flat clusters" are extracted from the hierarchy. There are multiple possible criteria to do that (SciPy.org, last viewed in May 2016), and in this paper we use the "distance criterion" – that is, given either of the distance measures discussed above, flat clusters are formed from the hierarchy when items in each cluster are no farther apart than a distance threshold.
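As a minimal sketch of this procedure, the following SciPy calls build a hierarchy from a small made-up distance matrix with either linkage method and then extract flat clusters with the distance criterion; SciPy is the library documented in the reference cited above, but the specific code and data are ours.

    # Sketch: agglomerative clustering with single linkage or UPGMA ("average")
    # and flat-cluster extraction by the distance criterion.
    import numpy as np
    from scipy.cluster.hierarchy import linkage, fcluster
    from scipy.spatial.distance import squareform

    # A small symmetric distance matrix (1 - similarity) for four items.
    D = np.array([[0.0, 0.2, 0.5, 0.7],
                  [0.2, 0.0, 0.1, 0.4],
                  [0.5, 0.1, 0.0, 0.6],
                  [0.7, 0.4, 0.6, 0.0]])

    condensed = squareform(D)                # SciPy expects the condensed form
    Z = linkage(condensed, method='single')  # or method='average' for UPGMA
    labels = fcluster(Z, t=0.5, criterion='distance')
    print(labels)                            # flat cluster id per item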

3 Semantic Hashtag Clustering

We adopted the semantic clustering approach proposed by Vicient and Moreno (2014) specifically for hashtags. This approach uses Wordnet and Wikipedia as the metadata for identifying the lexical semantics of a hashtag. The source code of their algorithms was not available, and therefore we implemented the approach described in Vicient's PhD dissertation (Vicient, 2014) to the best of our abilities using the algorithms and descriptions provided.

There are three major steps in their semantic clustering algorithm: (a) semantic grounding, (b) similarity matrix construction, and (c) semantic clustering. Algorithm 1 summarizes the steps.


Input: list H of hashtags
Output: clusters

Stage 1 (Semantic grounding):
Step 1: For each hashtag h ∈ H, perform Step 1a.
  Step 1a: Look up h in Wordnet. If h is found, then append the synset of h to a list (LC_h). Otherwise, segment h into multiple words, drop the leftmost word, and then try Step 1a again using the reduced h until either a match is found in Wordnet or no word is left in h.
Step 2: For each h ∈ H that has an empty list LC_h, look up h in Wikipedia. If an article matching h is found in Wikipedia, acquire the list of auxiliary categories for the article, extract the main nouns from the auxiliary categories, and then, for each main noun extracted, go to Step 1a using the main noun as h.

Stage 2 (Similarity matrix construction): Discard any hashtag h that has an empty LC_h. Calculate the maximum pairwise similarity between each pair of lists LC_hi and LC_hj (i ≠ j) using any ontology-based similarity measure.

Stage 3 (Clustering): Perform clustering on the distance matrix (1's complement of the similarity matrix) resulting from Stage 2.

Algorithm 1: Semantic hashtag clustering (Vicient and Moreno, 2014).

In the first stage (i.e., semantic grounding), each hashtag is looked up in Wordnet. If there is a direct match, that is, the hashtag is found in Wordnet, then it is added as a single candidate synset, and, accordingly, all the concepts (or senses) (see Section 2.1) belonging to the synset are saved in the form of a list of candidate concepts related to the hashtag. We call this list LC_h. If, on the other hand, the hashtag is not found in Wordnet, then the hashtag is split into multiple terms (using a word segmentation technique) and the leftmost term is dropped sequentially until either a match is found in Wordnet or there is no term left.

Each hashtag that was not found in Wordnet in Step 1 (i.e., whose LC_h is empty) is looked up in Wikipedia. If a match is found in Wikipedia, the auxiliary categories (see Section 2.2) of the article are acquired. The main nouns from the auxiliary categories are then looked up in Wordnet, and if a match is found, we save the concepts by appending them to the list LC_h; this step is repeated for each main noun.
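A rough sketch of this grounding stage is shown below. It assumes NLTK's Wordnet interface, the third-party wordsegment package for word segmentation, and the wikipedia package for the fallback; these tool choices and the crude main-noun extraction are our own simplifications rather than the tools used in the original implementation.

    # Sketch of Stage 1 (semantic grounding) under the assumptions stated above.
    from nltk.corpus import wordnet as wn
    from wordsegment import load, segment   # assumed word segmentation helper
    import wikipedia

    load()  # load the wordsegment corpus once

    def ground_hashtag(tag):
        """Return the candidate concept list LC_h for a hashtag (without '#')."""
        words = segment(tag.lower())  # e.g. 'nationalfriedchickenday' -> ['national', 'fried', 'chicken', 'day']
        # Drop the leftmost word until the remainder matches a Wordnet entry.
        while words:
            synsets = wn.synsets('_'.join(words))
            if synsets:
                return list(synsets)          # LC_h: all senses of the matched term
            words = words[1:]
        return []                             # empty LC_h; Wikipedia fallback needed

    def ground_via_wikipedia(tag):
        """Fallback: ground a hashtag through the nouns of its Wikipedia categories."""
        concepts = []
        try:
            page = wikipedia.page(tag)
        except Exception:                     # missing or ambiguous page
            return concepts
        for category in page.categories:
            head_noun = category.split()[-1]  # crude stand-in for main-noun extraction
            concepts.extend(wn.synsets(head_noun, pos=wn.NOUN))
        return concepts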

In the second stage (i.e., similarity matrix construction), first, hashtags associated with an empty list of concepts are discarded; in other words, hashtags that did not match any Wordnet entry, either by themselves or by using the word segmentation technique, and also had no entry found in Wikipedia are discarded. Then, using the remaining hashtags (each of whose LC_h contains at least one concept), semantic similarity is calculated between each pair of them. Any ontology-based measure can be used, and the Wu-Palmer measure (see Section 2.1) has been used in our work to stay consistent with the original work by Vicient and Moreno (2014).

Specifically, the similarity between two hashtags, h_i and h_j, is calculated as the maximum pairwise similarity (based on the Wu-Palmer measure) between one set of concepts in LC_hi and another set of concepts in LC_hj. Calculating the similarity this way is expected to find the correct sense of a hashtag (among all the senses/concepts in LC_h).

Finally, in the third stage (i.e., clustering), any clustering algorithm can be used to cluster hashtags based on the similarity matrix obtained in the second stage. As mentioned earlier, in this paper we use hierarchical clustering, which was also used in the original work by Vicient and Moreno (2014).

4 Sense-Level Semantic Hashtag Clustering

In this section, we describe the enhancement made to the word-level semantic hashtag clustering and showcase its positive impact using a toy example. Both Stage 1 (i.e., semantic grounding) and Stage 3 (i.e., clustering) of the sense-level semantic clustering algorithm are essentially the same as those in the word-level semantic clustering algorithm (see Algorithm 1 in Section 3). So, here, we discuss only Stage 2 (i.e., similarity matrix construction) of the algorithm, with a focus on the difference in the calculation of maximum pairwise similarity.

4.1 Similarity matrix construction

4.1.1 Word-level versus sense-level similarity matrix

As mentioned in Section 3, the similarity between two hashtags h_i and h_j is defined as the maximum pairwise similarity between one set of senses in LC_hi and another set of senses in LC_hj. (Recall that LC_h denotes the list of senses retrieved from Wordnet to semantically ground a hashtag h.) This maximum pairwise similarity is an effective choice for disambiguating the sense of a hashtag and was used to achieve a positive effect in the word-based approach by Vicient and Moreno (2014).

However, we have observed many instances where a hashtag word has multiple senses and it introduces an error in the clustering result. That is, the word-level algorithm does not distinguish among different senses of the same word when constructing a similarity matrix and, as a result, two hashtags are misjudged to be semantically similar (because they are similar to a third hashtag in two different senses) and are included in the same cluster. Moreover, a false triangle that violates the triangular inequality property may be formed at the word level. (Note this property is required of any distance metric like Wu-Palmer.) See Figure 1 for an illustration. As a side effect, we have observed that a cluster tends to be formed centered around a hashtag that takes on multiple senses.

(a) Sense level. (b) Word level.

(Edge weights denote similarity values (= 1 − distance). Assume the minimum similarity threshold is 0.5. Then, at the sense level (a), two clusters ({H1, H2}, {H1, H3}) should be formed because H2 and H3 are not similar (note 0.1 < 0.5), but, at the word level (b), one cluster {H1, H2, H3} is formed because it appears as if H2 and H3 were similar via H1. Moreover, the false triangle that appears to be formed at the word level violates the triangular inequality property because dist(H1, H2) + dist(H1, H3) < dist(H2, H3).)

Figure 1: An illustration of clustering at the word level versus the sense level.

Thus, we chose to explicitly record the sense in which a hashtag is close to another hashtag when constructing a similarity matrix. This sense-level handling of hashtag semantic distance helps us ensure that the incorrect clustering problem of word-level clustering does not happen. Accordingly, it avoids the formation of clusters that are centered around a hashtag that has multiple senses.

4.1.2 Word-level similarity matrix construction

Algorithm 2 outlines the steps of calculating the maximum pairwise similarity between hashtags in the word-level algorithm. One maximum pairwise similarity value is calculated for each pair of hashtags semantically grounded in the previous stage (i.e., Stage 1) and is entered into the similarity matrix. The similarity matrix size is |H|², where |H| is the number of hashtags that have at least one sense (i.e., a nonempty LC_h). Note that the pairwise similarity comparison is still done at the sense level, considering all senses of the hashtags that are compared.

Input: set H of hashtags h with nonempty LC_h.
Output: pairwise hashtag similarity matrix.
1  Initialize an empty similarity matrix M[|H|, |H|].
2  for each pair (h_i, h_j) of hashtags in H do
3      // Calculate the maximum pairwise similarity between h_i and h_j.
4      Initialize maxSim to 0.
5      for each s_p ∈ LC_hi do
6          for each s_q ∈ LC_hj do
7              Calculate the similarity sim between s_p and s_q.
8              if sim > maxSim then
9                  Update maxSim to sim.
10             end
11         end
12     end
13     Enter maxSim into M[i, j].
14 end

Algorithm 2: Word-level construction of semantic similarity matrix.
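A direct Python transcription of Algorithm 2 might look as follows; the wup helper, the use of NLTK synsets as sense objects, and the grounded mapping from each hashtag to its nonempty list LC_h are assumptions made for illustration.

    # Sketch of Algorithm 2: word-level similarity matrix construction.
    # 'grounded' maps each hashtag to its nonempty list of Wordnet synsets (LC_h),
    # e.g. as produced by the grounding sketch in Section 3.
    import numpy as np

    def wup(s1, s2):
        """Wu-Palmer similarity between two synsets (0 if undefined)."""
        return s1.wup_similarity(s2) or 0.0

    def word_level_matrix(grounded):
        tags = list(grounded)
        M = np.zeros((len(tags), len(tags)))
        for i, hi in enumerate(tags):
            for j, hj in enumerate(tags):
                if i >= j:
                    continue
                # Maximum pairwise similarity over all sense pairs of h_i and h_j.
                max_sim = max(wup(sp, sq)
                              for sp in grounded[hi] for sq in grounded[hj])
                M[i, j] = M[j, i] = max_sim
        return tags, M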

4.1.3 Sense-level similarity matrix construction

Algorithm 3 outlines the steps of constructing a similarity matrix in the sense-level algorithm. Unlike the case of the word-level algorithm, entries in the similarity matrix are between the senses that make up the maximum-similarity pairs between pairs of hashtags. Since these senses are not known until the maximum pairwise similarity calculations are completed, the construction of the similarity matrix is deferred until then.

Input: set H of hashtags h with nonempty LC_h.
Output: pairwise hashtag similarity matrix.
1  Create an empty list LH_s of (hashtag sense pair, pairwise maximum similarity).
2  for each pair (h_i, h_j) of hashtags in H do
3      // Calculate the maximum pairwise similarity between h_i and h_j.
4      Initialize maxSim to 0.
5      Initialize maxSimPair to (null, null).
6      for each s_p ∈ LC_hi do
7          for each s_q ∈ LC_hj do
8              Calculate the similarity sim between s_p and s_q.
9              if sim > maxSim then
10                 Update maxSim to sim.
11                 Update maxSimPair to (h_i.s_p, h_j.s_q).
12             end
13         end
14     end
15     Add (maxSimPair, maxSim) to LH_s.
16 end
17 // Construct the similarity matrix.
18 Count the number |S| of distinct hashtag senses in LH_s.
19 Initialize a similarity matrix M[|S|, |S|] as a 0 matrix.
20 for each triplet (h_i.s_p, h_j.s_q, maxSim) in LH_s do
21     Update M[m, n] to maxSim, where (m, n) is the matrix index for (h_i.s_p, h_j.s_q).
22 end

Algorithm 3: Sense-level construction of semantic similarity matrix.

In the first phase (Lines 2–16), for each pair of hashtags, the algorithm saves the pair of senses (h_i.s_p, h_j.s_q) in the maximum similarity pair and the maximum similarity value in the list LH_s. Then, in the second phase (Lines 18–22), for each triplet element (h_i.s_p, h_j.s_q, maxSim) in LH_s, the algorithm enters the maximum similarity value maxSim at the matrix index corresponding to the pair of senses (h_i.s_p, h_j.s_q).

This two-phase construction of the similarity matrix brings two advantages. First, it enables the algorithm to use exactly the needed number of matrix entries for those senses that are distinct among all senses that constitute pairwise maximum similarities between hashtags. The size of the matrix, therefore, is |S|², where S is the set of distinct senses in LH_s (see Lines 18–19). Second, it enables the algorithm to add exactly the needed number of entries, that is, |H|² entries (i.e., one for each pair of hashtags; see Lines 20–22), into a matrix of size |S|², where |S|² > |H|². (The remaining entries are initialized to 0 and remain 0, as they are for pairs of senses that do not represent a maximum similarity pair between any hashtags.) Our observation is that the ratio |S|/|H| is limited to between 5 and 10 for most individual hashtags, which is consistent with Vicient's statement (Vicient, 2014) that, out of 903 semantically grounded hashtags, almost 100 have only 2 senses and very few have more than 5 senses.
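For concreteness, the two-phase construction can be sketched in Python as follows; the variable names follow Algorithm 3, while the dictionary-based data structures and the sim argument (e.g., the Wu-Palmer helper above) are our own choices.

    # Sketch of Algorithm 3: sense-level similarity matrix construction.
    # Phase 1 records, for each hashtag pair, the pair of senses realizing the
    # maximum similarity; phase 2 builds an |S| x |S| matrix over the distinct
    # senses that appear in those pairs.
    import numpy as np
    from itertools import combinations

    def sense_level_matrix(grounded, sim):
        # Phase 1 (Lines 2-16): list LH_s of ((sense_i, sense_j), maxSim).
        LH_s = []
        for hi, hj in combinations(grounded, 2):
            max_sim, max_pair = 0.0, (None, None)
            for sp in grounded[hi]:
                for sq in grounded[hj]:
                    s = sim(sp, sq)
                    if s > max_sim:
                        max_sim, max_pair = s, ((hi, sp), (hj, sq))
            LH_s.append((max_pair, max_sim))

        # Phase 2 (Lines 18-22): index the distinct senses and fill in only the
        # recorded maximum-similarity entries; all other entries stay 0.
        senses = sorted({hs for pair, _ in LH_s if pair[0] is not None for hs in pair},
                        key=repr)
        index = {hs: k for k, hs in enumerate(senses)}
        M = np.zeros((len(senses), len(senses)))
        for (a, b), max_sim in LH_s:
            if a is None:
                continue
            m, n = index[a], index[b]
            M[m, n] = M[n, m] = max_sim
        return senses, M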

Since the items being clustered are hashtags, although their similarities are measured at the sense level, a number of interesting points hold. First, we do not need to add similarities between all pairs of senses to the similarity matrix. Second, a hashtag may appear in multiple clusters, where each cluster is formed based on a distinct sense of the hashtag, and therefore the resulting clusters are overlapping.

4.2 A toy example

To demonstrate the merit of clustering at the sense level as opposed to the word level, we made a toy set of hashtags and ran the metadata-based semantic clustering algorithm at both the word level and the sense level. The hashtags used are #date, #august, #tree, and #fruit. From Wordnet, we found that there were 3 senses associated with the word august, 13 senses with date, 5 senses with fruit, and 7 senses with tree.

Using the Wu-Palmer similarity measure (explained in Section 2.1) at the word level, we obtained the distance matrix shown below.

Hashtag  august  date   fruit  tree
august   0.000   0.200  0.500  0.667
date     0.200   0.000  0.100  0.400
fruit    0.500   0.100  0.000  0.556
tree     0.667   0.400  0.556  0.000

Then, to perform clustering using the word-level distance matrix as the input, we used both single-linkage and UPGMA (see Section 2.3) as the measures for calculating the distance between newly formed clusters, and the distance threshold for extracting flat clusters from the hierarchy was set to 0.5.
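This word-level toy clustering can be checked with the same SciPy calls sketched in Section 2.3 by feeding in the distance matrix above; the snippet is only a sanity check and is not part of the original experiments.

    # Reproducing the word-level toy clustering (distance threshold 0.5).
    import numpy as np
    from scipy.cluster.hierarchy import linkage, fcluster
    from scipy.spatial.distance import squareform

    tags = ['august', 'date', 'fruit', 'tree']
    D = np.array([[0.000, 0.200, 0.500, 0.667],
                  [0.200, 0.000, 0.100, 0.400],
                  [0.500, 0.100, 0.000, 0.556],
                  [0.667, 0.400, 0.556, 0.000]])

    for method in ('single', 'average'):     # 'average' is UPGMA
        labels = fcluster(linkage(squareform(D), method=method),
                          t=0.5, criterion='distance')
        # Expected: one cluster with 'single'; 'tree' split off with 'average'
        # (cf. Table 2).
        print(method, dict(zip(tags, labels)))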

Table 2 shows the clusters obtained using the word-level clustering. We see that #august, #date, and #fruit are included in the same cluster for both distance measures. This example demonstrates a case in which #date takes on multiple sense identities and glues together #august and #fruit in the same cluster at the word level, although these two are not similar at the sense level, as shown next.

Hashtag  Cluster using single-linkage  Cluster using UPGMA
august   1                             1
date     1                             1
fruit    1                             1
tree     1                             2

Table 2: Cluster assignment at the word level.

Now, using the sense-level clustering, out of a total of 28 senses associated with the four hashtags, the algorithm picked the 10 senses shown in Table 3. These 10 senses were picked as a result of the maximum pairwise similarity calculations between the two sets of senses belonging to each pair of hashtags. (With 4 hashtags, there is a maximum of 12 senses that can be obtained for 6 (= C(4, 2)) maximum similarity pairs, and in this example case, there were duplicate senses, consequently giving 10 distinct senses.) As mentioned earlier, each of these senses represents the semantics of the hashtag word it belongs to, and thus makes an entry into the similarity (or distance) matrix input to the hierarchical clustering algorithm.

The distance matrix obtained from the 10 senses is shown in Figure 2. The numbers in bold face are the maximum similarity values entered. Note that a distance of 1.000 means a similarity of 0.000.

Table 4 shows the resulting cluster assignments. (The outcome is the same for both distance measures, which we believe is coincidental.) We see that #august and #date are together in the same cluster, and so are #date and #fruit, but, unlike in the word-level clustering result, #august, #date, and #fruit are not all together in the same cluster. This separation is because, at the sense level, #date can no longer take on multiple identities as it did at the word level.

Sense         Semantics
august.n.01   the month following July and preceding September
august.a.01   of or befitting a lord
corner.v.02   force a person or animal into a position from which he cannot escape
date.n.02     a participant in a date
date.n.06     the particular day, month, or year (usually according to the Gregorian calendar) that an event occurred
date.n.08     sweet edible fruit of the date palm with single long woody seed
fruit.n.01    the ripened reproductive body of a seed plant
fruit.v.01    cause to bear fruit
tree.n.01     a tall perennial woody plant having a main trunk and branches forming a distinct elevated crown; includes both gymnosperms and angiosperms
yield.n.03    an amount of product

('n' stands for noun, 'v' for verb, and 'a' for adjective.)

Table 3: Senses and their semantics (source: Wordnet).

Hashtag  Hashtag sense  Cluster using single-linkage  Cluster using UPGMA
date     date.n.02      1                             1
tree     tree.n.01      1                             1
fruit    yield.n.03     2                             2
fruit    fruit.v.01     3                             3
august   august.a.01    3                             3
tree     corner.v.02    4                             4
fruit    fruit.n.01     5                             5
date     date.n.08      5                             5
august   august.n.01    6                             6
date     date.n.06      6                             6

Table 4: Cluster assignment at the sense level.

5 Evaluation

In this evaluation, all algorithms were implemented in Python and the experiments were performed on a computer with the OS X operating system, a 2.6 GHz Intel Core i5 processor, and 8 GB of 1600 MHz DDR3 memory.

5.1 Experiment setup

5.1.1 Performance metric

Evaluating clustering output is known to be a "black art" (Jain and Dubes, 1988) with no objective accuracy criterion. It is particularly challenging for the semantic hashtag clustering addressed in this paper. Sense-level clustering generates more items to be clustered than word-level clustering, and the output clusters are overlapping. Therefore, internal measures (e.g., Silhouette coefficient, SSE) are not desirable because they simply consider the cohesion and separation among the output clusters without regard to the semantic accuracy of the items clustered.

Hashtag sense  Hashtag  august.n.01  august.a.01  corner.v.02  date.n.02  date.n.06  date.n.08  fruit.n.01  fruit.v.01  tree.n.01  yield.n.03
august.n.01    august   0.000        1.000        1.000        1.000      0.200      1.000      1.000       1.000       1.000      1.000
august.a.01    august   1.000        0.000        0.667        1.000      1.000      1.000      1.000       0.500       1.000      1.000
corner.v.02    tree     1.000        0.667        0.000        1.000      1.000      1.000      1.000       1.000       1.000      1.000
date.n.02      date     1.000        1.000        1.000        0.000      1.000      1.000      1.000       1.000       0.400      1.000
date.n.06      date     0.200        1.000        1.000        1.000      0.000      1.000      1.000       1.000       1.000      1.000
date.n.08      date     1.000        1.000        1.000        1.000      1.000      0.000      0.100       1.000       1.000      1.000
fruit.n.01     fruit    1.000        1.000        1.000        1.000      1.000      0.100      0.000       1.000       1.000      1.000
fruit.v.01     fruit    1.000        0.500        1.000        1.000      1.000      1.000      1.000       0.000       1.000      1.000
tree.n.01      tree     1.000        1.000        1.000        0.400      1.000      1.000      1.000       1.000       0.000      0.556
yield.n.03     fruit    1.000        1.000        1.000        1.000      1.000      1.000      1.000       1.000       0.556      0.000

Figure 2: Distance matrix in the toy example.

For this reason, our evaluation is done with an external "standard" (i.e., a gold standard test). To this end, we use the f-score, which is commonly used in conjunction with recall and precision to evaluate clusters in reference to ground truth clusters, as the accuracy metric. In our evaluation, the f-score is calculated for each pair of a cluster in the ground truth cluster set and a cluster in the evaluated algorithm's output cluster set. Then, the final f-score resulting from the comparison of the two cluster sets is obtained in two different ways, depending on the purpose of the evaluation. For the purpose of evaluating individual output clusters, the pairwise maximum (i.e., "best match") f-score, denoted as fm-score, is used as the final score. Given a ground truth cluster G_i matched against an output cluster set C, the fm-score is obtained as

fm-score(G_i, C) = \max_{C_j \in C, f-score(G_i, C_j) > 0} f-score(G_i, C_j)    (4)

where the pairwise matching is one-to-one between G and C.

On the other hand, for comparing the overall accuracy of the entire sets of clusters, the weighted average of pairwise maximum f-scores, denoted as fa-score, is used instead. Given a ground truth cluster set G and an output cluster set C, the fa-score is calculated as

fa-score(G, C) = \frac{\sum_{G_i \in G} fm-score(G_i, C) \times |G_i|}{\sum_{G_i \in G} |G_i|}    (5)
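A small sketch of how Equations 4 and 5 can be computed from cluster memberships is given below; representing clusters as Python sets is our own convention, and the one-to-one matching constraint of Equation 4 is not enforced in this simplified version.

    # Sketch of the fm-score (Eq. 4) and fa-score (Eq. 5) computations.
    # Ground truth clusters and output clusters are both given as lists of sets
    # of hashtags (or hashtag senses).

    def f_score(g, c):
        """Pairwise f-score between a ground truth cluster g and an output cluster c."""
        overlap = len(g & c)
        if overlap == 0:
            return 0.0
        precision = overlap / len(c)
        recall = overlap / len(g)
        return 2 * precision * recall / (precision + recall)

    def fm_score(g, output_clusters):
        """Best-match f-score of ground truth cluster g against the output set (Eq. 4).
        Note: the paper additionally requires the matching to be one-to-one
        between G and C; that constraint is not enforced in this sketch."""
        return max((f_score(g, c) for c in output_clusters), default=0.0)

    def fa_score(ground_truth, output_clusters):
        """Size-weighted average of fm-scores over all ground truth clusters (Eq. 5)."""
        total = sum(len(g) for g in ground_truth)
        return sum(fm_score(g, output_clusters) * len(g) for g in ground_truth) / total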

5.1.2 Dataset

With the focus of the evaluation on comparing the sense-level and the word-level versions of the same clustering algorithm, deliberate choices were made in the selection of the dataset and the number of hashtags used in the experiments so that they match those used in the evaluation of word-level clustering by Vicient and Moreno (2014). They used tweet messages from the Symplur website, and so did we.

We manually gathered a tweet dataset from the Symplur web site (http://www.symplur.com). The dataset consists of 1,010 unique hashtags that are included in 2,910 tweets. The median number of tweets per hashtag was only two. (The distribution of the number of tweet messages per hashtag generally follows the power law (Muntean et al., 2012).)

5.2 Experiment: gold standard test

In order to enable the gold standard test, we prepared a ground truth based on observed hashtag semantics. Out of the 1,010 hashtags, we manually annotated the semantics to choose 230 hashtags and classified them into 15 clusters. The remaining hashtags were classified as noise. Figure 3 shows the sizes of the 15 ground truth clusters.

Figure 3: Sizes of ground truth clusters.

The distance threshold for determining flat clusters in hierarchical clustering was set using the "best result" approach. That is, we tried both distance measures (i.e., single-linkage and UPGMA) and different distance threshold values and picked the measure and value that produced the best result based on the weighted average f-score measure.
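This "best result" selection amounts to a small grid search, sketched below; cluster_hashtags is a hypothetical wrapper around the clustering pipeline, ground_truth is the annotated cluster set, and fa_score is the helper sketched in Section 5.1.1.

    # Sketch of the "best result" selection of linkage method and threshold.
    # cluster_hashtags(method, threshold) and ground_truth are assumed to exist.
    import numpy as np

    best = None
    for method in ('single', 'average'):              # 'average' is UPGMA
        for threshold in np.arange(0.1, 1.0, 0.05):
            clusters = cluster_hashtags(method, threshold)
            score = fa_score(ground_truth, clusters)
            if best is None or score > best[0]:
                best = (score, method, round(threshold, 2))
    print(best)   # (best fa-score, linkage method, distance threshold)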

Figure 4 shows the accuracies achieved by the metadata-based semantic clustering at the word level and the sense level. Table 5 shows more details, including precision and recall for individual clusters. From the results we see that every sense-level cluster outperforms its word-level counterpart (except cluster 1, due to a rounding-off difference). In particular, the fm-scores are zero for word-level clusters 6, 14, and 15, thus bringing the performance gain to "infinity". (Word-level clustering did not generate any cluster of size 3 or larger with a best-match f-score to clusters 6, 14, and 15 greater than 0.1.) Further, when all 15 clusters are considered together, the weighted average of maximum pairwise f-scores, fa-score, is 0.43 for sense-level clustering and 0.34 for word-level clustering – a 26% gain.

Figure 4: Maximum pairwise f-scores of output clusters from word-level and sense-level semantic clustering.

Ground truth clusters    Sense-level clusters                     Word-level clusters
Id   Size                Recall  Precision  fm-score  Size        Recall  Precision  fm-score  Size
1    32                  0.63    0.65       0.63      31          0.63    0.67       0.65      30
2    26                  0.35    0.39       0.37      23          0.31    0.35       0.33      23
3    23                  0.39    0.43       0.41      21          0.35    0.19       0.24      43
4    23                  0.91    0.84       0.88      25          0.83    0.76       0.79      25
5    22                  0.41    0.45       0.43      20          0.41    0.20       0.27      44
6    14                  0.21    0.18       0.19      17          n/a     n/a        n/a       n/a
7    14                  0.64    0.50       0.56      18          0.64    0.50       0.56      18
8    12                  0.25    0.43       0.32      7           0.50    0.24       0.32      25
9    11                  0.82    0.39       0.53      23          0.82    0.08       0.14      118
10   11                  0.18    0.11       0.14      18          0.09    0.17       0.12      6
11   10                  0.40    0.27       0.32      15          0.50    0.08       0.14      59
12   9                   0.11    0.25       0.15      4           0.11    0.17       0.13      6
13   9                   0.22    0.33       0.27      6           0.22    0.29       0.25      7
14   8                   0.13    0.25       0.17      4           n/a     n/a        n/a       n/a
15   6                   0.17    0.20       0.18      5           n/a     n/a        n/a       n/a

fa-score (weighted average of fm-scores) is 0.43 for sense-level clusters and 0.34 for word-level clusters.

Table 5: Details of gold standard test results.

6 Related Work

There are several works on semantic clustering of hashtags that focused on the contextual semantics of hashtags (Tsur et al., 2012; Tsur et al., 2013; Muntean et al., 2012; Rosa et al., 2011; Stilo and Velardi, 2014) by using the bag-of-words model to represent the texts accompanying a hashtag. Tsur et al. (2012; 2013) and Muntean et al. (2012) appended the tweets that belonged to each unique hashtag into a unique document called a "virtual document". These documents were then represented as vectors in the vector space model. Rosa et al. (2011) used hashtag clusters to achieve topical clustering of tweets, where they compared the effects of expanding URLs found in tweets. Stilo and Velardi (2014) clustered hashtag "senses" based on their temporal co-occurrence with other hashtags. The term "sense" in their work is different from the lexical sense used in this paper.

Lacking the ability to form lexical-semantic sense-level clusters of hashtags has been a major shortcoming of the current approaches. To the best of our knowledge, the work by Vicient and Moreno (2014) is the only one that opened research in this direction. They used Wordnet and Wikipedia as the metadata sources for clustering hashtags at the word level.

7 Conclusion

In this paper, we enhanced the current metadata-based semantic hashtag clustering algorithm by determining the semantic similarity between hashtags at the sense level as opposed to the word level. This sense-level decision on clustering avoids incorrectly putting hashtags of different senses in the same cluster. The result was significantly higher accuracy of semantic clusters without increasing the complexities of the algorithm in practice. A gold standard test showed that the sense-level algorithm produced significantly more accurate clusters than the word-level algorithm, with an overall gain of 26% in the weighted average of maximum pairwise f-scores.

For future work, new metadata sources can be added to provide the metadata-based semantic hashtag clustering algorithm with more abilities. For example, to understand hashtags in a different language, online translation services like Google Translate (https://translate.google.com) can be a good source, since empirical evidence suggests that they can be very effective in identifying spelling errors, abbreviations, etc. Additionally, crowdsourced websites like Urban Dictionary (www.urbandictionary.com), which specialize in informal human communication, can be a helpful metadata source for decoding the lexical semantics of hashtags. Internet search engines also provide rich information on the semantics of hashtags.

Acknowledgments

This project was supported by a Fulbright Program grant sponsored by the Bureau of Educational and Cultural Affairs of the United States Department of State and administered by the Institute of International Education.

References

Sandjai Bhulai, Peter Kampstra, Lidewij Kooiman, Ger Koole, Marijn Deurloo, and Bert Kok. 2012. Trend visualization on Twitter: What's hot and what's not? In Proceedings of the 1st International Conference on Data Analytics, pages 43–48.

Joanna Costa, Catarina Silva, Mario Antunes, and Bernardete Ribeiro. 2013. Defining semantic meta-hashtags for twitter classification. Lecture Notes in Computer Science, 7824:226–235.

Internet Live Stats. Last viewed in May 2016. Twitter usage statistics. http://www.internetlivestats.com/twitter-statistics/.

Anil K. Jain and Richard C. Dubes. 1988. Algorithms for Clustering Data. Prentice-Hall.

Ryan Kelly. Last viewed in May 2016. Twitter study reveals interesting results about usage – 40% is pointless babble. http://pearanalytics.com/blog/2009/twitter-study-reveals-interesting-results-40-percent-pointless-babble/.

Christopher D. Manning, Prabhakar Raghavan, and Hinrich Schütze. 2008. Introduction to Information Retrieval. Cambridge University Press.

Cristina Muntean, Gabriel Morar, and Darie Moldovan. 2012. Exploring the meaning behind twitter hashtags through clustering. Lecture Notes in Business Information Processing, 127:231–242.

Suzi Park and Hyopil Shin. 2014. Identification of implicit topics in twitter data not containing explicit search queries. In Proceedings of the 25th International Conference on Computational Linguistics, pages 58–68.

Kevin Dela Rosa, Rushin Shah, and Bo Lin. 2011. Topical clustering of tweets. In Proceedings of the 3rd Workshop on Social Web Search and Mining, pages 133–138, July.

Hassan Saif, Yulan He, and Harith Alani. 2012. Semantic sentiment analysis of twitter. In Proceedings of the 11th International Conference on The Semantic Web – Volume Part I, pages 508–524.

SciPy.org. Last viewed in May 2016. Hierarchical clustering flat cluster parameters. http://docs.scipy.org/doc/scipy-0.14.0/reference/generated/scipy.cluster.hierarchy.fcluster.html.

Giovanni Stilo and Paola Velardi. 2014. Temporal semantics: Time-varying hashtag sense clustering. In Proceedings of the 19th International Conference on Knowledge Engineering and Knowledge Management, pages 563–578, November.

Peter Teufl and Stefan Kraxberger. 2011. Extracting semantic knowledge from twitter. In Proceedings of the 3rd IFIP WG 8.5 International Conference on Electronic Participation, pages 48–59.

Oren Tsur, Adi Littman, and Ari Rappoport. 2012. Scalable multi stage clustering of tagged micro-messages. In Proceedings of the 21st International Conference on World Wide Web, pages 621–622, April.

Oren Tsur, Adi Littman, and Ari Rappoport. 2013. Efficient clustering of short messages into general domains. In Proceedings of the 7th International AAAI Conference on Weblogs and Social Media.

Carlos Vicient and Antonio Moreno. 2014. Unsupervised semantic clustering of twitter hashtags. In Proceedings of the 21st European Conference on Artificial Intelligence, pages 1119–1120, August.

Carlos Vicient. 2014. Moving Towards The Semantic Web: Enabling New Technologies through the Semantic Annotation of Social Contents. Ph.D. thesis, Universitat Rovira i Virgili, December.

Xiaolong Wang, Furu Wei, Xiaohua Liu, Ming Zhou, and Ming Zhang. 2011. Topic sentiment analysis in Twitter: a graph-based hashtag sentiment classification approach. In Proceedings of the 20th ACM Conference on Information and Knowledge Management, pages 1031–1040, October.

Zhibiao Wu and Martha Palmer. 1994. Verbs semantics and lexical selection. In Proceedings of the 32nd Annual Meeting on Association for Computational Linguistics, pages 133–138.