meaning as collective use: predicting semantic hashtag categories on twitter

Meaning as Collective Use:Predicting Semantic Hashtag Categories on Twitter

Lisa PoschGraz University of Technology,

Knowledge ManagementInstitute

Inffeldgasse 13, 8010 Graz,Austria

[email protected]

Claudia WagnerJOANNEUM RESEARCH,Institute for Information and

Communication TechnologiesSteyrergasse 17, 8010 Graz,

[email protected]

Philipp SingerGraz University of Technology,



[email protected]

Markus StrohmaierGraz University of Technology,



[email protected]

ABSTRACTThis paper sets out to explore whether data about the us-age of hashtags on Twitter contains information about theirsemantics. Towards that end, we perform initial statisti-cal hypothesis tests to quantify the association between us-age patterns and semantics of hashtags. To assess the util-ity of pragmatic features – which describe how a hashtagis used over time – for semantic analysis of hashtags, weconduct various hashtag stream classification experimentsand compare their utility with the utility of lexical features.Our results indicate that pragmatic features indeed containvaluable information for classifying hashtags into semanticcategories. Although pragmatic features do not outperformlexical features in our experiments, we argue that pragmaticfeatures are important and relevant for settings in which tex-tual information might be sparse or absent (e.g., in socialvideo streams).

Categories and Subject DescriptorsH.3.5 [Information Storage and Retrieval]: Online In-formation Services—Web-based services

KeywordsTwitter; hashtags; social structure; semantics

1. INTRODUCTIONA hashtag is a string of characters preceded by the hash

(#) character and it is used on platforms like Twitter asdescriptive label or to build communities around particulartopics [15]. To outside observers, the meaning of hashtagsis usually difficult to analyze, as they consist of short, oftenabbreviated or concatenated concepts (e.g., #MSM2013).

Copyright is held by the International World Wide Web ConferenceCommittee (IW3C2). IW3C2 reserves the right to provide a hyperlinkto the author’s site if the Material is used in electronic media.WWW 2013 Companion, May 13–17, 2013, Rio de Janeiro, Brazil.ACM 978-1-4503-2038-2/13/05.

Thus, new methods and techniques for analyzing the se-mantics of hashtags are definitely needed.

A simplistic view on Wittgenstein’s work [17] suggeststhat meaning is use. This indicates that the meaning ofa word is not defined by a reference to the object it denotes,but by the variety of uses to which the word is put. There-fore, one can use the narrow, lexical context of a word (i.e.,its co-occurring words) to approximate its meaning. Ourwork builds on this observation, but focuses on the prag-matics of a word (i.e., how a word, or in our case a hashtag,is used by a large group of users) – rather than its narrow,lexical context.

The aim of this work is to investigate to what extent prag-matic characteristics of a hashtag (which capture how a largegroup of users uses a hashtag) may reveal information aboutits semantics. Specifically, our work addresses the followingresearch questions:

• Do different semantic categories of hashtags reveal sub-stantially different usage patterns?

• To what extent do pragmatic and lexical properties ofhashtags help to predict the semantic category of ahashtag?

To address these research questions we conducted an em-pirical study on a broad range of diverse hashtag streams be-longing to eight different semantic categories (such as tech-nology, sports or idioms) which have been identified in pre-vious research [12] and have shown to be useful for groupinghashtags. From each of the eight categories, we selected tensample hashtags at random and collected temporal snap-shots of messages containing at least one of these hashtagsat three different points in time. To quantify how hashtagsare used over time, we extended the set of pragmatic streammeasures which we introduced in our previous work [16] andapplied them to the hashtag streams in our dataset. Thesepragmatic measures capture not only the social structure ofa hashtag at specific points in time, but also the changes insocial structure over time.

To answer the first research question, we used statisti-cal standard tests which allow to quantify the associationbetween pragmatic characteristics of hashtag streams andtheir semantic categories. To tackle the second researchquestion, we firstly computed lexical features using a a stan-dard bag-of-words model with term frequency (TF). Then,we trained several classification models with lexical featuresonly, pragmatic features only and a combination of both. Wecompared the performance of different classification modelsby using standard evaluation measures such as the F1-score(which is defined as the harmonic mean of precision andrecall). To get a fair baseline for our classification mod-els, we constructed a control dataset by randomly shufflingthe category labels of the hashtag streams. That means wedestroyed the original relationship between the pragmaticproperties and the semantic categories of hashtags.

Our results show that pragmatic features indeed revealinformation about hashtags’ semantics and perform signifi-cantly better than the baseline. They can therefore be usefulfor the task of semantically annotating social media con-tent. Not surprisingly, our results also show that lexical fea-tures are more suitable than pragmatic features for the taskof semantically categorizing hashtag streams. However, anadvantage of pragmatic features is that they are language-and text-independent. Pragmatic features can be applied totasks where the creation of lexical features is not possible –such as multimedia streams. Also for scenarios where tex-tual content is available, pragmatic features allow for moreflexibility due to their independence of the language used inthe corpus. Our results are relevant for social media andsemantic web researchers who are interested in analyzingthe semantics of hashtags in textual or non-textual socialstreams (e.g., social video streams).

This paper is structured as follows: Section 2 gives anoverview of related research on analyzing the semantics oftags in social bookmarking systems and research on hash-tagging on Twitter in general. In Section 3 we describeour experimental setup, including our datasets, feature en-gineering and evaluation approach. Our results are reportedin Section 4 and further discussed in Section 5. Finally, weconclude our work in Section 6.

2. RELATED WORKIn the past, a considerable effort has been spent on study-

ing the semantics of tags (e.g., tags in social bookmarkingsystems), but also hashtags in Twitter have received atten-tion from the research community.

Semantics of tags: On the one hand, researchers ex-plored to what extent semantics emerge from folksonomiesby investigating different algorithms for extracting tag net-works and hierarchies from such systems (see e.g., [1], [3] or[13]). The work of [14] evaluated three state-of-the-art folk-sonomy induction algorithms in the context of five socialtagging systems. Their results show that those algorithmsspecifically developed to capture intuitions of social taggingsystems outperform traditional hierarchical clustering tech-niques. Korner et al. [5] investigated how tagging usage pat-terns influence the quality of the emergent semantics. Theyfound that ‘verbose’ taggers (describers) are more useful forthe emergence of tag semantics than users who use a smallset of tags (categorizers).

On the other hand, researchers investigated to what extenttags (and the resources they annotate) can be semantically

grounded and classified into predefined semantic categories.For example, Noll and Meinel [8] presented a study of thecharacteristics of tags and determined their usefulness forweb page classification [9]. Similar to our work, Overell etal. [10] presented an approach which allows classifying tagsinto semantic categories. They trained a classifier to classifyWikipedia articles into semantic categories, mapped Flickrtags to Wikipedia articles using anchor texts in Wikipediaand finally classified Flickr tags into semantic categories byusing the previously trained classifier. Their results showthat their ClassTag system increases the coverage of the vo-cabulary by 115% compared to a simple WordNet approachwhich classifies Flickr tags by mapping them to WordNetvia string matching techniques. Unlike our work, they didnot take into account how tags are used, but learn relationsbetween tags and semantic categories via mapping them toWikipedia articles.

Pragmatics and semantics of hashtags: On Twitter,users have developed a tagging culture by adding a hashsymbol (#) in front of a short keyword. The first introduc-tion of the usage of hashtags was provided by Chris Messinain a blog post [7]. Huang et al. [4] state that this kindof new tagging culture has created a completely new phe-nomenon, called micro-meme. The difference between suchmicro-memes and other social tagging systems is that theparticipation in micro-memes is an a-priori approach, whileother social tagging systems follow an a-posteriori approach.This is due to the fact that users are influenced by the ob-servation of the usage of micro-meme hashtags adopted byother users. The work of [4] suggests that hashtagging inTwitter is more commonly used to join public discussionsthan to organize content for future retrieval. The role ofhashtags has also been investigated in [18]. Their studyconfirms that a hashtag serves both as a tag of content anda symbol of community membership. Laniado and Mika [6]explored to what extent hashtags can be used as strong iden-tifiers like URIs are used in the Semantic Web. They mea-sured the quality of hashtags as identifiers for the SemanticWeb, defining several metrics to characterize hashtag usageon the dimensions of frequency, specificity, consistency, andstability over time. Their results indicate that the lexicalusage of hashtags can indeed be used to identify hashtagswhich have the desirable properties of strong identifiers. Un-like our work, their work focuses on lexical usage patternsand measures to what extent those patterns contribute tothe differentiation between strong and weak semantic iden-tifiers (binary classification) while we use usage patterns toclassify hashtags into semantic categories.

Recently, researchers have also started to explore the dif-fusion dynamics of hashtags - i.e., how hashtags spread inonline communities. For example the work of [15] aims topredict the exposure of a hashtag in a given time framewhile [12] are interested in the temporal spreading patternsof hashtags.

3. EXPERIMENTAL SETUPOur experiments are designed to explore to what extent

pragmatic properties of hashtag streams can be used togauge the semantic category of a hashtag. We are not onlyinterested in the idiosyncrasies of hashtag usage within onesemantic category but also in the deltas between differentsemantic categories. In this section, we first introduce ourdataset as well as the pragmatic and lexical measures which

we used to describe hashtag streams. Then we present themethodology and evaluation approach which we used to an-swer our research questions.

3.1 DatasetIn this work we use data that we acquired from Twit-

ter’s API. Romero et al. [12] conducted a user study and aclassification experiment and identified eight broad semanticcategories of hashtags: celebrity, games, idiom, movies/TV,music, political, sports and technology. We used a list con-sisting of the 500 hashtags which were used by most userswithin their dataset and which were manually assigned tothe eight categories as a starting point for creating our owndataset.

For each category, we chose ten hashtags at random (seeTable 1). We biased our random sample towards activehashtag streams by re-sampling hashtags for which we foundless than 1000 posts at the beginning of our data collection(March 4th, 2012). For those categories for which we couldnot find ten hashtags that had more than 1000 posts (i.e.,games and celebrity), we selected the most active hashtagsper category (i.e., the hashtags for which we found the mostposts).

The dataset consists of three parts, each part representinga time frame of four weeks. The different time frames ensurethat we can observe the usage of a hashtag over a givenperiod of time. The time frames are independent of eachother, i.e., the data collected at one time frame does notcontain any information of the data collected at another timeframe.

At the start of each time frame, we retrieved the mostrecent tweets in English for each hashtag using Twitter’spublic search API. Afterwards, we retrieved the followersand followees of each user who had authored at least onemessage in our hashtag stream dataset. Some pragmatic fea-tures capture information about who potentially consumesa hashtag stream (followers) or who potentially informs au-thors of a hashtag stream (followees) and therefore requirethe one-hop neighborhood of hashtag streams’ authors. Inthis work, we call users who hold both of these roles (i.e.,have established a bidirectional link with an author) friends.The starting dates of the time frames were March 4th (t0),April 1st (t1) and April 29th, 2012 (t2). Table 2 depictsthe number of tweets and relations between users that wecollected during each time frame.

The stream tweets were retrieved on the first day of eachtime frame, fetching tweets that were authored a maximumof seven days previous to the date of retrieval. During thefirst week of each time frame, the user IDs of the followersand followees were collected. Figure 1 depicts this process.

Since we were interested in learning what types of char-acteristics are useful for describing a semantic hashtag cat-egory, we removed hashtag streams that belong to multiple

t0 t1 t23/4/2012 4/1/2012 4/29/2012

stream tweets

crawl of social

structure

stream tweets

crawl of social

structure

stream tweets

crawl of social

structure

1 week

Figure 1: Timeline of the data collection process

categories (concretely, we removed the two hashtags #bsband #mj). We also decided to remove inactive hashtagstreams (those where less than 300 posts where retrieved)as estimating information theoretic measures is problematicif only few observations are available [11]. The most commonsolution is to restrict the measurements to situations whereone has an adequate amount of data. We found four inactivehashtags in the category games and seven in the categorycelebrity. The removal of these hashtag streams resulted inthe complete removal of the category celebrity as it was onlyleft with one hashtag stream (#michaeljackson). A possi-ble explanation for the low number of tweets in the hashtagstreams for this category is that topics related to celebritieshave a shorter life-span than topics related to other cate-gories. Our final datasets consist of 64 hashtag streams andseven semantic categories which were sufficiently active dur-ing our observation period.

Table 2: Description of the complete dataset

t0 t1 t2

Tweets 94,634 94,984 95,105Authors 53,593 54,099 53,750Followers 56,685,755 58,822,119 66,450,378Followees 34,025,961 34,263,129 37,674,363Friends 21,696,134 21,914,947 24,449,705Mean Followers per Author 1,057.71 1,087.31 1,236.29Mean Followees per Author 634.90 633.34 700.92Mean Friends per Author 404.83 405.09 454.88

3.2 Feature EngineeringIn the following, we define the pragmatic and lexical fea-

tures which we designed to capture the different social andmessage based structures of hashtag streams. For our prag-matic features we further differentiate between static prag-matic features (which capture the social structure of a hash-tag at a specific point in time) and dynamic pragmatic fea-tures (which combine information from several time points).

3.2.1 Static Pragmatic Measures:Entropy Measures are used to measure the random-

ness of streams’ authors and their followers, followees andfriends. For each hashtag stream, we rank the authorsby the number of messages they published in that stream(norm entropy author) and we rank the followers (norm -entropy follower), followees (norm entropy followee) and

Table 1: Randomly selected hashtags per category (ordered alphabetically)technology idioms sports political games music celebrity movies

blackberry factaboutme f1 climate e3 bsb ashleytisdale avatarebay followfriday football gaza games eurovision brazilmissesdemi bbcqt

facebook dontyouhate golf healthcare gaming lastfm bsb bonesflickr iloveitwhen nascar iran mafiawars listeningto michaeljackson chuck

google iwish nba mmot mobsterworld mj mj gleeiphone nevertrust nhl noh8 mw2 music niley glennbeck

microsoft omgfacts redsox obama ps3 musicmonday regis moviesphotoshop oneofmyfollowers soccer politics spymaster nowplaying teamtaylor supernatural

socialmedia rememberwhen sports teaparty uncharted2 paramore tilatequila tvtwitter wheniwaslittle yankees tehran wow snsd weloveyoumiley xfactor

friends (norm entropy friend) by the number of stream’sauthors they are related with. A high author entropy indi-cates that the stream is created in a democratic way since allauthors contribute equally much. A high follower entropyand friend entropy indicate that the followers and friends donot focus their attention towards few authors but distributeit equally across all authors. A high followee entropy andfriend entropy indicate that the authors do not focus theirattention on a selected part of their audience.

Overlap Measures describe the overlap between the au-thors and the followers (overlap authorfollower), followees(overlap authorfollowee) or friends (overlap authorfriend) ofa hashtag stream. If overlap is one, all authors of a streamare also followers, followees or friends of stream authors.This indicates that the stream is consumed and produced bythe same users. A high overlap suggests that the communityaround the hashtag is rather closed, while a low overlap indi-cates that the community is more open and that active andpassive part of the community do not extensively overlap.

Coverage Measures characterize a hashtag stream viathe nature of its messages. We introduce four coveragemeasures. The informational coverage measure (informa-tional) indicates how many messages of a stream have aninformational purpose - i.e., contain a link. The conversa-tional coverage (conversational) measures the mean numberof messages of a stream that have a conversational purpose- i.e., those messages that are directed to one or severalspecific users (e.g., through @replies). The retweet cover-age (retweet) measures the percentage of messages whichare retweets. The hashtag coverage (hashtag) measures themean number of hashtags per message in a stream.

3.2.2 Dynamic Pragmatic Measures:To explore how the social structure of a hashtag stream

changes over time, we measure the distance between thetweet-frequency distributions of authors at different timepoints, and the author-frequency distributions of followers,followees or friends at different time points. The intuitionbehind these features is that certain semantic categories ofhashtags may have a fast changing social structure sincenew people start and stop using those types of hashtagsfrequently, while other semantic categories may have a morestable community around them which changes less over time.

We use a symmetric variation of the Kullback-Leibler di-vergence (DKL) which represents a natural distance mea-sure between two probability distributions (A and B) andis defined as follows: 1

2DKL(A||B) + 1

2DKL(B||A). The KL

divergence is also known as relative entropy or informationdivergence. The KL divergence is zero if the two distri-butions are identical and approaches infinity as they differmore and more. We measure the KL divergence for the dis-tributions of authors (kl authors), followers (kl followers),followees (kl followees) and friends (kl friends).

Figure 2 visualizes the different time frames and their no-tation. t0 only contains the static features computed fromdata collected at t0. Consequently, t1 and t2 only containthe static features computed from data collected at t1 ort2, respectively. t0→1 includes static features computed ondata collected at t0 and the dynamic measures computedon data collected at t0 and t1. t1→0 includes static featurescomputed on data collected at t1 and the dynamic measurescomputed on data collected at t0 and t1. t1→2 and t2→1 aredefined in the same way.

3.2.3 Lexical Measures:We use vector-based methods which allow representing

each microblog message as a vector of terms and use termfrequency (TF ) as weighting schema. In this work lexicalmeasures are always computed for individual time pointsand are therefore static measures.

3.3 Usage Patterns of Hashtag CategoriesOur first aim is to investigate whether different seman-

tic categories of hashtags reveal substantially different us-age patterns (such as that they are used and/or consumedby different sets of users or that they are used for differ-ent purpose). To compare the pragmatic fingerprints ofhashtags belonging to different semantic categories and toquantify the differences between categories, we conducteda pairwise Mann-Whitney-Wilcoxon-Test which is a non-parametric statistical hypothesis test for assessing whetherone of two samples of independent observations tends to havelarger values than the other. We used a non-parametric testsince the Shapiro-Wilk-Test revealed that not all featuresare normally distributed, even after applying arcsine trans-formation to ratio measures. Holm-Bonferroni method wasused for adjusting the p-values and counteract the problemof multiple comparisons. For this experiment, we used thetimeframes t0→1 and t1→2.

t2t1t0

t0→1 t1→2

t1→0 t2→1

Figure 2: Illustration of our time frames

3.4 Hashtag ClassificationOur second aim is to investigate to what extent pragmatic

and lexical properties of hashtag streams contribute to clas-sify them into their semantic categories. That means weaim to classify temporal snapshots of hashtag streams intotheir correct semantic categories (to which they were as-signed in [12]) just by analyzing how they are used overtime. We then compare the performance of the pragmati-cally informed classifier with the performance of a classifierinformed by lexical features within a semantic multiclassclassification task. We used the timeframes t1→0 and t2→1

for this experiment in order to avoid including informationfrom the ‘future’ in our classification.

We performed grid search with varying hyperparametersusing Support Vector Machine (linear and RBF kernels) andan ensemble method with extremely randomized trees. Sinceextremely randomized trees are a probabilistic method andperform slightly different in each run, we run them ten timesand report the average scores. The features were standard-ized by subtracting the mean and scaling to unit variance.We used stratified 6-fold cross-validation (CV) to train andtest each classification model.

Since we have two different types of pragmatic features,static and dynamic ones, we trained and tested three sepa-rate classification models which were only informed by prag-matic information:

Static Pragmatic Model: We trained and tested thisclassification model with static pragmatic features on datacollected at t1 using stratified 6-fold CV. The experimentwas repeated on the data collected at t2.

Dynamic Pragmatic Model: We trained and testedthe classification model with dynamic pragmatic features ondata collected at t0 and t1 using stratified 6-fold CV. Thecomputation of our dynamic features requires at least twotime points. We repeated this experiment on data collectedat t1 and t2.

Combined Pragmatic Model: We combined the staticand dynamic pragmatic features, and trained and tested theclassification model on the data of t1→0 using stratified 6-fold CV. Again, we repeated the experiment on the data oft2→1.

We also performed our classification with a model usingour lexical features (i.e., TF weighted words). Finally, wetrained and tested a combined classification model usingpragmatic and lexical features, which leads to the follow-ing classification models:

Lexical Model: We trained and tested the model ondata from t1 using stratified 6-fold CV and repeated theexperiment on data collected at t2.

Combined Pragmatic and Lexical Model: We trainedand tested the mixed classifier on the data of t1→0 us-ing stratified 6-fold CV, then repeated this experiment fort2→1. A simple concatenation of pragmatic and lexical fea-tures is not useful, since the vast amount of lexical featureswould overrule the pragmatic features. Therefore, we useda stacking method (see [2]) and performed firstly a clas-sification using lexical features alone and Leave-One-Outcross-validation. We used a SVM with linear kernel forthis classification since it worked best for these features.Secondly, we combined the pragmatic features with the re-sulting seven probability features which we got from theprevious classification model and which describe how likelyeach semantic class is for a certain stream given its words.

To get a fair baseline for our experiment, we constructed acontrol dataset by randomly shuffling the category labels of

the 64 hashtag streams. That means we destroyed the orig-inal relationship between the pragmatic properties and thesemantic categories of hashtags and evaluated the perfor-mance of a classifier which tries to use pragmatic propertiesto classify hashtags into their shuffled categories within a 6-fold cross-validation. We repeated the random shuffling 100times and used the resulting average F1-score as our base-line performance. For the baseline classifier we also used gridsearch to determine the optimal parameters prior to train-ing. Our baseline classifier tests how well randomly assignedcategories can be identified compared to our real semanticcategories. One needs to note that a simple random guesserbaseline would be a weaker baseline than the one describedabove and would lead to a performance of 1/7.

To gain further insights into the impact of individual prop-erties, we analyzed their information gain (IG) with respectto the categories. The information gain measures how accu-rately a specific stream property P is able to predict stream’scategory C and is defined as follows: IG(C,P ) =H(C) −H(C | P ) where H denotes the entropy.

4. RESULTSIn the following section, we present the results from our

empirical study on usage patterns of different semantic cat-egories of hashtags.

4.1 Usage Patterns of Hashtag CategoriesTo answer our first research question, we explored to what

extent usage patterns of hashtag streams in different seman-tic categories are indeed significantly different.

Our results indicate that some pragmatic measures areindeed significantly different for distinct semantic categories.This indicates that hashtags of certain categories are usedin a very specific way which may allow us to relate thesehashtags with their semantic categories just by observinghow users use them. Table 3 depicts the measures that showstatistically significant (p < 0.05) differences in both t0→1

and t1→2. In total, 35 pragmatic category differences werefound to be statistically significant (with p < 0.05) for t0→1

and 33 for t1→2. 26 pragmatic category differences werefound to be significant for both t0→1 and t1→2 which suggeststhat results are independent of our choice of time frame.

Table 3: Features which showed a statistically significant difference (with p < 0.05) for each pair of categoriesin both t0→1 and t1→2

games idioms movies music political sportsidioms informational

retweetmovies informationalmusic informationalpolitical kl followers kl authors

kl followerskl followeesinformationalhashtag

sports kl followers kl authorskl followersinformational

technology kl followers kl authors kl friends kl friends overlap authorfollowerkl followers overlap authorfriendkl followeeskl friendsinformationalretweethashtag

games

idioms

movies

music

political

sports

technology

0.0 0.2 0.4 0.6 0.8 1.0

Informational Coverage

(a) This figure shows the percentage of mes-sages of hashtag streams belonging to differ-ent categories that contain at least one link.

games

idioms

movies

music

political

sports

technology

6 8 10 12 14

KL Divergence Authors

(b) This figure shows how much the au-thors’ tweet-frequency distributions of hash-tag streams of different categories change onaverage.

games

idioms

movies

music

political

sports

technology

1 2 3 4 5

KL Divergence Followers

(c) This figure shows how much the follow-ers’ author-frequency distributions of hash-tag streams of different categories change onaverage.

games

idioms

movies

music

political

sports

technology

1 2 3 4 5 6

KL Divergence Friends

(d) This figure shows how much the friends’author-frequency distributions of hashtagstreams of different categories change on av-erage.

Figure 3: Each plot shows the feature distribution of different categories of one of the 4 best pragmaticfeatures for t0→1. We obtained similar results for t1→2.

Not surprisingly, the category which shows the most spe-cific usage patterns is idioms and therefore the hashtags ofthis category can be distinguished from all hashtags just byanalyzing their pragmatic properties. Hashtag streams ofthe category idioms exhibit a significantly lower informa-tional coverage than hashtag streams of all other categories(see Figure 3(a)) and a significantly higher symmetric KL di-vergence for author’s tweet-frequency distributions (see Fig-ure 3(b)). Also the followers’ and friends’ author-frequencydistributions tend to have a higher symmetric KL divergencefor idioms hashtags than for other hashtags (see Figures 3(c)and 3(d)). This indicates that the social structure of hashtagstreams in the category idioms changes faster than hashtagsof other categories. Furthermore, hashtag streams of thiscategory are less informative - i.e., contain significantly lesslinks per message on average.

The category technology can be distinguished from allother categories except sports, particularly because its fol-lowers’ and friends’ author-frequency distributions havesignificantly lower symmetric KL divergences than hash-tags in the categories games, idioms, movies and music(see Figures 3(c) and 3(d)). This indicates that hashtagstreams in the category technology have a stable socialstructure which changes less over time. This is not sur-prising since this semantic category denotes a topical areaand users who are interested in such areas may consumeand provide information on a regular base. It is especiallyinteresting to note that the only pragmatic measures whichallows distinguishing political and technological hashtagstreams are the author-follower and author-friend over-laps since these overlaps are significantly lower for thecategory technology compared to the category political.

Com

b. P

ragm

atic

and

Lex

ical

Com

bine

d P

ragm

atic

Dyn

amic

Pra

gmat

ic

Lexi

cal

Sta

tic P

ragm

atic

F1 m

easu

re

0.0

0.2

0.4

0.6

0.8

1.0

Baseline performance Model performance

(a) t1→0

Com

b. P

ragm

atic

and

Lex

ical

Com

bine

d P

ragm

atic

Dyn

amic

Pra

gmat

ic

Lexi

cal

Sta

tic P

ragm

atic

F1 m

easu

re

0.0

0.2

0.4

0.6

0.8

1.0

Baseline performance Model performance

(b) t2→1

Figure 4: Weighted averaged F1-scores of different classification models trained and tested on t1→0 4(a) andt2→1 4(b) using 6-fold cross-validation

This indicates that the content of hashtag streams of thecategory political is far more likely to be produced and con-sumed by the same people than content of technologicalhashtag streams.

Comparing the individual measures reveals that the infor-mational coverage (six category pairs) and the symmetricKL divergences of followers’ author-frequency distributions(six category pairs), authors’ tweet-frequency distributions(three pairs) and friends’ follower-frequency distributions(three pairs) are the most discriminative measures. Figure 3depicts the distributions of these four measures per category.Other measures that show significant differences in mediansfor both t0→1 and t1→2 are the symmetric KL divergenceof followees’ author-frequency distributions (two pairs), theauthor-follower and the author-friend overlap (one pair) aswell as the retweet and hashtag coverage (two pairs).

Some measures like the conversational coverage measuredid not show any significant differences for any of the cat-egory pairs, for any time frame. This indicates that in allhashtag streams an equal amount of conversational activitiestake place.

4.2 Hashtag ClassificationIn order to quantify the value of different pragmatic and

lexical properties of hashtag streams for predicting their se-mantic category, we conducted a hashtag stream classifi-cation experiment and systematically compared the perfor-mance of various classification models trained with differentsets of features.

Figure 4 shows the performance of the best classifier (ex-tremely randomized trees) trained with different sets of fea-tures. One can see from this figure that in general lexicalfeatures perform better than pragmatic features, but alsothat pragmatic features (both static and dynamic) signifi-cantly outperform a random baseline. This indicates thatpragmatic features indeed reveal information about a hash-tag’s meaning, even though they do not match the perfor-mance of lexical features in this case. In 4(a) we can see thatfor t1→0 the combination of lexical and pragmatic featuresperforms slightly better than using lexical features alone.

Table 4: Top features for two different datasetsranked via Information Gain

Rank t1→0 t2→1

1 informational kl followers2 kl followers informational3 kl friends hashtag4 hashtag kl followees5 norm entropy friend kl friends

4.2.1 Feature Ranking:In addition to the overall classification performance which

can be achieved solely based on analyzing the pragmatics ofhashtags, we were also interested in gaining insights into theimpact of individual pragmatic features. To evaluate the in-dividual performance of the features we used informationgain (with respect to the categories) as a ranking criterion.The ranking was performed on t1→0 and t2→1 with stratified6-fold cross-validation. Table 4 shows that the top five fea-tures (i.e., the pragmatic features which reveal most aboutthe semantic of hashtags) are features which capture thetemporal dynamics of the social context of a hashtag (i.e.,the temporal follower, followees and friends dynamics) aswell as the informational and hashtag coverage. This indi-cates that the collective purpose for which a hashtag is used(i.e., if it used to share information rather than for otherpurposes) and the social dynamics around a hashtags – i.e.,who uses a hashtag for whom – play a key role in under-standing its semantics.

5. DISCUSSION OF RESULTSAlthough our results show that lexical features work best

within the semantic classification task, those features aretext and language dependent. Therefore, their applicabilityis limited to settings where text is available. Pragmatic fea-tures on the other hand rely on usage information which isindependent of the type of content which is shared in socialstreams and can therefore also be computed for social videoor image streams.

We believe that pragmatic features can supplement lex-ical features if lexical features alone are not sufficient. Inour experiments, we could see that the performance mayslightly increase when combining pragmatic and lexical fea-tures. However, the effect was not significant. We thinkthe reason for this is that in our setup lexical features alonealready achieved good performance.

The classification results coincide with the results of thestatistical significance tests. Ranking the properties by in-formation gain showed that the most discriminative proper-ties (the ones that showed a statistical significance in botht0→1 and t1→2 for the highest amount of category pairs)found in 4.1 were also the top ranked features (informationalcoverage and the KL divergences).

6. CONCLUSIONS AND IMPLICATIONSOur work suggests that the collective usage of hashtags

indeed reveals information about their semantics. How-ever, further research is required to explore the relationsbetween usage information and semantics, especially in do-mains where limited text is available. We hope that ourresearch is a first step into this direction since it shows thathashtags of different semantic categories are indeed used indifferent ways.

Our work has implications for researchers and practition-ers interested in investigating the semantics of social mediacontent. Social media applications such as Twitter providea huge amount of textual information. Beside the textualinformation, also usage information can be obtained fromthese platforms and our work shows how this informationcan be exploited for assigning semantic annotations to tex-tual data streams.

7. ACKNOWLEDGEMENTSThis work was supported in part by a DOC-fForte fellow-

ship of the Austrian Academy of Science to Claudia Wagner.Furthermore, this work is in part funded by the FWF Aus-trian Science Fund Grant I677 and the Know-Center Graz.

8. REFERENCES[1] D. Benz, C. Korner, A. Hotho, G. Stumme, and

M. Strohmaier. One tag to bind them all: Measuringterm abstractness in social metadata. In Proc. of 8thExtended Semantic Web Conference ESWC 2011,Heraklion, Crete, (May 2011), 2011.

[2] T. Hastie, R. Tibshirani, and J. Friedman. TheElements of Statistical Learning. Springer Series inStatistics. Springer New York Inc., New York, NY,USA, 2001.

[3] P. Heymann and H. Garcia-Molina. Collaborativecreation of communal hierarchical taxonomies in socialtagging systems. Technical Report 2006-10, ComputerScience Department, April 2006.

[4] J. Huang, K. M. Thornton, and E. N. Efthimiadis.Conversational tagging in twitter. In Proceedings ofthe 21st ACM conference on Hypertext andhypermedia, HT ’10, pages 173–178, New York, NY,USA, 2010. ACM.

[5] C. Korner, D. Benz, M. Strohmaier, A. Hotho, andG. Stumme. Stop thinking, start tagging - tagsemantics emerge from collaborative verbosity. In

Proceedings of the 19th International World WideWeb Conference (WWW 2010), Raleigh, NC, USA,Apr. 2010. ACM.

[6] D. Laniado and P. Mika. Making sense of twitter. InP. F. Patel-Schneider, Y. Pan, P. Hitzler, P. Mika,L. Zhang, J. Z. Pan, I. Horrocks, and B. Glimm,editors, International Semantic Web Conference (1),volume 6496 of Lecture Notes in Computer Science,pages 470–485. Springer, 2010.

[7] C. Messina. Groups for twitter; or a proposal fortwitter tag channels. http://factoryjoe.com/blog/2007/08/25/groups-for-twitter-or-a-proposal-

for-twitter-tag-channels/, 2007.

[8] M. G. Noll and C. Meinel. Exploring socialannotations for web document classification. InProceedings of the 2008 ACM symposium on Appliedcomputing, SAC ’08, pages 2315–2320, New York, NY,USA, 2008. ACM.

[9] M. G. Noll and C. Meinel. The metadata triumvirate:Social annotations, anchor texts and search queries. InWeb Intelligence, pages 640–647. IEEE, 2008.

[10] S. Overell, B. Sigurbjornsson, and R. van Zwol.Classifying tags using open content resources. InProceedings of the Second ACM InternationalConference on Web Search and Data Mining, WSDM’09, pages 64–73, New York, NY, USA, 2009. ACM.

[11] S. Panzeri and A. Treves. Analytical estimates oflimited sampling biases in different informationmeasures. Network: Computation in Neural Systems,7(1):87–107, 1996.

[12] D. M. Romero, B. Meeder, and J. Kleinberg.Differences in the mechanics of information diffusionacross topics: idioms, political hashtags, and complexcontagion on twitter. In Proceedings of the 20thinternational conference on World wide web, WWW’11, pages 695–704, New York, NY, USA, 2011. ACM.

[13] P. Schmitz. Inducing ontology from flickr tags. InProceedings of the Workshop on Collaborative Taggingat WWW2006, Edinburgh, Scotland, May 2006.

[14] M. Strohmaier, D. Helic, D. Benz, C. Korner, andR. Kern. Evaluation of folksonomy inductionalgorithms. Transactions on Intelligent Systems andTechnology, 2012.

[15] O. Tsur and A. Rappoport. What’s in a hashtag?:content based prediction of the spread of ideas inmicroblogging communities. In Proceedings of the fifthACM international conference on Web search anddata mining, WSDM ’12, pages 643–652, New York,NY, USA, 2012. ACM.

[16] C. Wagner and M. Strohmaier. The wisdom intweetonomies: Acquiring latent conceptual structuresfrom social awareness streams. In Semantic SearchWorkshop at WWW2010, 2010.

[17] L. Wittgenstein. Philosophical Investigations.Blackwell Publishers, London, United Kingdom, 1953.Republished 2001.

[18] L. Yang, T. Sun, M. Zhang, and Q. Mei. We knowwhat @you #tag: does the dual role affect hashtagadoption? In Proceedings of the 21st internationalconference on World Wide Web, WWW ’12, pages261–270, New York, NY, USA, 2012. ACM.