using topic models for twitter hashtag recommendation
Presentation given at the Making Sense of Microposts Workshop at the World Wide Web Conference 2013.
ELIS – Multimedia Lab
Fréderic Godin, Viktor Slavkovikj, Wesley De Neve, Benjamin Schrauwen and Rik Van de Walle
Using Topic Models for Twitter Hashtag Recommendation
Multimedia Lab, Ghent University – iMinds, Belgium
Reservoir Lab, Ghent University, Belgium
Image and Video Systems Lab, KAIST, South Korea
Making Sense of Microposts Workshop @ World Wide Web Conference 2013
Introduction (1)
Indexing
Search
Linking
General Topic
Memes Grouping
Information retrieval
Introduction (2)
±10% of tweets contain a hashtag
3% of the hashtags are used more than 5 times
Goal
Suggest keywords that reflect the general topic of a tweet and that could be used as hashtags
Promote hashtags for effective indexing
Allow for effective search of tweets through hashtags
Reduce the use of sparse hashtags
Architectural overview
Tweet → Basic filter → Language identification → Topic distribution → Hashtag suggestion → Hashtagged tweet
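The stages above can be read as a chain of functions. The sketch below is hypothetical: `recommend` and its stage arguments are names made up for illustration, and skipping non-English tweets (returning `None`) is an assumption, since the slides only say that the topic model is language-dependent.

```python
from typing import Callable, List, Optional

def recommend(
    tweet: str,
    basic_filter: Callable[[str], str],
    is_english: Callable[[str], bool],
    topic_distribution: Callable[[str], List[float]],
    suggest_hashtags: Callable[[List[float]], List[str]],
) -> Optional[str]:
    """Pass one tweet through the stages and return it with hashtags appended."""
    text = basic_filter(tweet)                  # basic filter (clean-up)
    if not is_english(text):                    # language identification
        return None                             # assumption: non-English tweets are skipped
    theta = topic_distribution(text)            # per-topic probabilities
    keywords = suggest_hashtags(theta)          # hashtag suggestion
    return tweet + " " + " ".join("#" + k for k in keywords)  # hashtagged tweet


# Usage with toy stand-ins for the trained components:
print(recommend(
    "Sign the petition for the fiscal cliff!",
    basic_filter=str.lower,
    is_english=lambda text: True,
    topic_distribution=lambda text: [0.2, 0.8],
    suggest_hashtags=lambda theta: ["fiscal", "political"],
))
# Sign the petition for the fiscal cliff! #fiscal #political
```

Passing the stages in as functions keeps each trained component (filter, language classifier, topic model, suggester) replaceable on its own.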
Basic filter
Clean up the tweet: remove URLs, special HTML entities, digits, punctuation, the hash character, …
During training:
Remove tweets with just one word
Remove retweets
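A minimal sketch of the basic filter using Python's `re` module. The slide lists the clean-up rules only in part (it ends with "…"), so the regular expressions below are assumptions, and detecting retweets by an `RT @` prefix is a guess; tweet metadata could serve the same purpose.

```python
import re
from typing import Iterable, List

URL_RE = re.compile(r"https?://\S+")
ENTITY_RE = re.compile(r"&[a-zA-Z]+;|&#\d+;")   # e.g. &amp; or &#39;

def clean_tweet(tweet: str) -> str:
    """Strip URLs, HTML entities, digits, punctuation and the '#' character."""
    text = URL_RE.sub(" ", tweet)
    text = ENTITY_RE.sub(" ", text)
    text = re.sub(r"\d+", " ", text)
    text = re.sub(r"[^\w\s]", " ", text)         # punctuation, including '#'
    return " ".join(text.split())                 # collapse whitespace

def training_filter(tweets: Iterable[str]) -> List[str]:
    """Training-time filter: drop retweets and tweets with just one word."""
    kept = []
    for tweet in tweets:
        if tweet.startswith("RT @"):              # assumption: retweet marker
            continue
        cleaned = clean_tweet(tweet)
        if len(cleaned.split()) > 1:
            kept.append(cleaned)
    return kept


print(clean_tweet("Please RT!! sign the #petition &amp; more at http://t.co/abc 2013"))
# Please RT sign the petition more at
```

Removing only the `#` character keeps the hashtag word itself (`#petition` becomes `petition`), so existing hashtags still contribute to the topic model.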
Language identification
Why: We need to build a language-dependent topic model.
Goal: Build an unsupervised classifier that discriminates between English and non-English tweets.
How: Naive Bayes and the Expectation-Maximization (EM) algorithm, with character n-gram features
Result: Evaluation on a test set of 1000 randomly selected tweets
            Lui & Baldwin (LangID.py)   Our algorithm
Precision   97.9%                       97.0%
Recall      91.8%                       97.8%
F1          94.8%                       97.4%
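To illustrate unsupervised Naive Bayes trained with EM on character n-grams, here is a small two-class sketch. It is not the authors' classifier: the trigram size, Laplace smoothing (`alpha`), the fixed iteration count and the deterministic seeding of the two classes are all assumptions. EM only splits the tweets into two clusters; deciding which cluster is "English" still requires, for example, one tweet known to be English (here text 0 is assumed to be English).

```python
import math
from collections import Counter
from typing import List

def char_ngrams(text: str, n: int = 3) -> Counter:
    """Character n-gram counts of a lower-cased, space-padded string."""
    padded = f" {text.lower()} "
    return Counter(padded[i:i + n] for i in range(len(padded) - n + 1))

def em_naive_bayes(texts: List[str], n_iter: int = 20, alpha: float = 1.0) -> List[List[float]]:
    """Two-class multinomial Naive Bayes fitted by EM; returns P(class | text) per text."""
    docs = [char_ngrams(t) for t in texts]
    vocab = sorted(set().union(*docs))
    # Assumed deterministic start: text 0 seeds class 0, the text sharing the
    # fewest n-grams with it seeds class 1, all other texts start undecided.
    far = min(range(1, len(docs)), key=lambda j: len(docs[0].keys() & docs[j].keys()))
    resp = [[0.5, 0.5] for _ in docs]
    resp[0], resp[far] = [1.0, 0.0], [0.0, 1.0]
    for _ in range(n_iter):
        # M-step: priors and smoothed n-gram log-probabilities from soft counts.
        priors = [sum(r[k] for r in resp) / len(docs) for k in range(2)]
        log_cond = []
        for k in range(2):
            counts = Counter()
            for r, d in zip(resp, docs):
                for g, c in d.items():
                    counts[g] += r[k] * c
            total = sum(counts.values()) + alpha * len(vocab)
            log_cond.append({g: math.log((counts[g] + alpha) / total) for g in vocab})
        # E-step: posterior class probabilities under the Naive Bayes assumption.
        resp = []
        for d in docs:
            scores = [math.log(priors[k]) + sum(c * log_cond[k][g] for g, c in d.items())
                      for k in range(2)]
            top = max(scores)
            weights = [math.exp(s - top) for s in scores]   # stable softmax
            resp.append([w / sum(weights) for w in weights])
    return resp


texts = ["the cat and the dog", "the dog and the bird", "привет как дела",
         "как дела у тебя", "the bird and the cat", "у тебя как дела"]
labels = [max(range(2), key=lambda k: p[k]) for p in em_naive_bayes(texts)]
print(labels)
```

For reference, the F1 values in the table follow from F1 = 2PR / (P + R); for example, 2 · 0.970 · 0.978 / (0.970 + 0.978) ≈ 0.974.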
Calculating the topic distribution
Idea: Find the general topic(s) of a tweet
How: Use Latent Dirichlet Allocation (LDA) to find the topic distribution in an unsupervised manner
Training: 1.8 million tweets pre-filtered on 4000 keywords; 200 topics, α = 0.1, β = 0.1
Example: "Please RT!! sign Bernie Sanders petition for the fiscal cliff! http://.."
Topic distribution (topics 0, 1, 2, 3, …, 57, …, 199): [0.1; 0.0; 0.0; 0.0; …; 0.8; …; 0.05]
Topic 57: 1. Fiscal, 2. Political, 3. President, …
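The slides do not say how LDA inference was performed. As one possibility, the sketch below uses collapsed Gibbs sampling in plain Python, with the slide's hyperparameters (α = β = 0.1) as defaults; the function name, iteration count and toy documents are assumptions. It only shows the mechanics: a pure-Python sampler would be far too slow for 1.8 million tweets and 200 topics.

```python
import random
from typing import List, Tuple

def lda_gibbs(docs: List[List[str]], n_topics: int, alpha: float = 0.1,
              beta: float = 0.1, n_iter: int = 200,
              seed: int = 0) -> Tuple[List[List[float]], List[List[float]], List[str]]:
    """Collapsed Gibbs sampling for LDA; returns theta, phi and the vocabulary."""
    rng = random.Random(seed)
    vocab = sorted({w for doc in docs for w in doc})
    word_id = {w: i for i, w in enumerate(vocab)}
    K, V = n_topics, len(vocab)
    ndk = [[0] * K for _ in docs]       # tokens of document d assigned to topic k
    nkw = [[0] * V for _ in range(K)]   # occurrences of word v assigned to topic k
    nk = [0] * K                        # total tokens assigned to topic k
    z = []                              # current topic of every token
    for d, doc in enumerate(docs):      # random initial assignment
        z.append([])
        for w in doc:
            k = rng.randrange(K)
            z[d].append(k)
            ndk[d][k] += 1
            nkw[k][word_id[w]] += 1
            nk[k] += 1
    for _ in range(n_iter):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                k, v = z[d][i], word_id[w]
                # Remove the token's current assignment from the counts...
                ndk[d][k] -= 1
                nkw[k][v] -= 1
                nk[k] -= 1
                # ...then resample its topic from the full conditional distribution.
                weights = [(ndk[d][t] + alpha) * (nkw[t][v] + beta) / (nk[t] + V * beta)
                           for t in range(K)]
                k = rng.choices(range(K), weights=weights)[0]
                z[d][i] = k
                ndk[d][k] += 1
                nkw[k][v] += 1
                nk[k] += 1
    # Per-document topic distribution (theta) and per-topic word distribution (phi).
    theta = [[(ndk[d][k] + alpha) / (len(doc) + K * alpha) for k in range(K)]
             for d, doc in enumerate(docs)]
    phi = [[(nkw[k][v] + beta) / (nk[k] + V * beta) for v in range(V)] for k in range(K)]
    return theta, phi, vocab


docs = [["fiscal", "cliff", "president", "policy"],
        ["school", "period", "today", "school"],
        ["fiscal", "political", "president", "cliff"],
        ["school", "time", "period", "today"]]
theta, phi, vocab = lda_gibbs(docs, n_topics=2)
for k, row in enumerate(phi):
    top = sorted(range(len(vocab)), key=lambda v: -row[v])[:3]
    print(k, [vocab[v] for v in top])   # highest-ranked keywords per topic
```

Each row of `theta` plays the role of the vector in the example above, and sorting a row of `phi` gives a ranked keyword list such as the one shown for topic 57.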
Hashtag suggestion (1)
Idea: Suggest a number of hashtags based on the topic distribution of the tweet
How: Sample the topic distribution and suggest the top-ranked keywords
Example (tweet → suggested keywords):
"Yay, we got sixth period today" → school, business, light, time, period
"Please RT!! Sign Bernie Sanders petition for the fiscal cliff! http://.." → fiscal, political, traffic, president, policy
"comfort, elegance, prettiness" → little, good, love, relationship, god
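One reading of "sample the topic distribution and suggest the top-ranked keywords", sketched here as an assumption rather than the authors' exact procedure: draw a topic from θ, emit that topic's best keyword not yet suggested, and repeat until n keywords are collected. In practice the ranked keyword lists would come from the trained topic model; the lists and the function name in this example are made up.

```python
import random
from typing import Dict, List

def suggest_hashtags(theta: List[float], topic_words: Dict[int, List[str]],
                     n: int = 5, seed: int = 0) -> List[str]:
    """Suggest up to n distinct keywords by sampling topics from theta."""
    rng = random.Random(seed)
    topics = list(range(len(theta)))
    weights = list(theta)
    next_rank = [0] * len(theta)   # index of the next candidate keyword per topic
    suggestions: List[str] = []
    while len(suggestions) < n and any(w > 0 for w in weights):
        k = rng.choices(topics, weights=weights)[0]
        words = topic_words.get(k, [])
        # Skip keywords that were already suggested via another topic.
        while next_rank[k] < len(words) and words[next_rank[k]] in suggestions:
            next_rank[k] += 1
        if next_rank[k] >= len(words):
            weights[k] = 0.0       # topic has no keywords left: stop sampling it
            continue
        suggestions.append(words[next_rank[k]])
        next_rank[k] += 1
    return suggestions


# Toy input; topic 1 loosely mirrors the fiscal topic from the slides.
theta = [0.1, 0.8, 0.1]
topic_words = {0: ["school", "period", "time"],
               1: ["fiscal", "political", "president", "policy"],
               2: ["love", "good", "relationship"]}
print(suggest_hashtags(theta, topic_words, n=5))
```

Because topics are drawn in proportion to θ, a dominant topic contributes most of the suggestions, while smaller topics can still add a keyword now and then.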
Hashtag suggestion (2)
[Figure: Evaluation of 100 tweets. Percentage of tweets (%) versus number of correctly suggested hashtags (0 to 10), when suggesting 5 hashtags and when suggesting 10 hashtags.]
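The evaluation reports, for each number of correct hashtags, the share of the evaluated tweets. Assuming every suggested hashtag has already been judged correct or not (the slides do not describe how), that histogram can be computed as below; the function name and the judgements in the usage example are made up.

```python
from collections import Counter
from typing import List, Set

def correct_count_histogram(suggested: List[List[str]], relevant: List[Set[str]],
                            max_count: int = 10) -> List[float]:
    """Percentage of tweets with exactly c correctly suggested hashtags, c = 0..max_count."""
    per_tweet = Counter(sum(tag in good for tag in tags)
                        for tags, good in zip(suggested, relevant))
    return [100.0 * per_tweet[c] / len(suggested) for c in range(max_count + 1)]


# Made-up judgements for four tweets with two suggestions each:
suggested = [["school", "time"], ["fiscal", "traffic"], ["love", "god"], ["light", "period"]]
relevant = [{"school", "time"}, {"fiscal"}, {"love"}, set()]
print(correct_count_histogram(suggested, relevant, max_count=2))
# [25.0, 50.0, 25.0]
```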
Conclusions and Future Work
We built a hashtag recommendation system:
Suggests general keywords
Unsupervised
In the future:
Use more context information: semantic web, social graph, …
Adopt a hybrid approach between general and specific hashtags
#Questions @frederic_godin