A Network-Based Model for Predicting Hashtag Breakouts in Twitter
Agenda
Background
Methodology
Our visualization tool
Experiment & Results
Introduction
Tweets:
Textual contents
User interaction: retweeting, mentioning, replying, etc.
Hashtags:
Tagging mechanism created by users
Help in categorizing tweets
Become very popular in trending topics
Some Definitions
Tweet Hashtag Volume: number of tweets containing a given hashtag per day.
Spike: sharp increase in the volume
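As a minimal sketch of the volume definition above, the following counts tweets per day that contain a given hashtag. The input layout (a list of `(date, text)` pairs), the function name `daily_hashtag_volume`, and the sample tweets are assumptions made for illustration; they are not taken from the slides.

```python
from collections import Counter
from datetime import date

def daily_hashtag_volume(tweets, hashtag):
    """Count tweets per day that contain `hashtag` (case-insensitive)."""
    tag = hashtag.lower()
    volume = Counter()
    for day, text in tweets:
        # Compare whole tokens so that "#ai" does not match "#aid";
        # trailing punctuation is stripped before comparing.
        tokens = {tok.lower().rstrip(".,!?:;") for tok in text.split()}
        if tag in tokens:
            # A tweet counts once even if it repeats the hashtag.
            volume[day] += 1
    return dict(volume)

# Made-up sample data, for illustration only.
sample_tweets = [
    (date(2014, 1, 1), "Loving the #WorldCup build-up"),
    (date(2014, 1, 1), "No tag here"),
    (date(2014, 1, 1), "#worldcup tickets!"),
    (date(2014, 1, 2), "Watching #WorldCup."),
]
volumes = daily_hashtag_volume(sample_tweets, "#WorldCup")
```

The resulting per-day counts form the volume time series in which a spike would be looked for.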
Research Question
Some hashtags become viral.
Can we predict whether a hashtag will go viral at its nascent stage?
Network-based?
Text-content-based?
Viral Diffusion
Network Based Analysis
• Arruda et al. examined the role of centrality measures in disease spreading on an SIR model and in rumor spreading on a social network.
• In the SIR model for rumors, infected individuals recover with some probability, while a spreader becomes a carrier through contacts in social networks.
Content Based Analysis
• Hypothesis: specific groups of words are more likely to be contained in viral tweets.
• Li et al. analyzed tweets in terms of emotional divergence aspects (or sentiment analysis) and noted that highly interactive tweets tend to contain more negative emotions than other tweets.
Running Average and Standard Deviation
20-day sliding window
Hashtag Volume
Utilizing Three Sigma Rule
68-95-99.7 Rule
Empirical rule
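One possible reading of the three-sigma rule over a 20-day sliding window can be sketched as follows. The exact threshold used in the talk is not stated in the slides, so the rule here (flag a day whose volume exceeds the mean plus three standard deviations of the preceding 20 days) is an assumption, as are the function name `find_spikes` and the toy series.

```python
from statistics import mean, pstdev

def find_spikes(volumes, window=20, k=3.0):
    """Return indices of days whose volume exceeds mean + k * std
    of the preceding `window` days (assumed spike rule)."""
    spikes = []
    for i in range(window, len(volumes)):
        history = volumes[i - window:i]
        mu = mean(history)
        sigma = pstdev(history)  # population std of the window
        # Note: with a perfectly flat history sigma is 0, so any increase is flagged.
        if volumes[i] > mu + k * sigma:
            spikes.append(i)
    return spikes

# Made-up daily volumes: 20 quiet days, one sharp increase, then back to normal.
series = [10, 12, 11, 9, 10, 13, 12, 11, 10, 9,
          11, 12, 10, 11, 13, 12, 10, 9, 11, 10,
          60, 12]
spikes = find_spikes(series)
```

Under the empirical (68-95-99.7) rule, roughly 99.7% of values from a normal distribution fall within three standard deviations of the mean, which is why a value beyond that band is treated as unusual.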
Hashtags Distribution
Accumulative Period
Break out or Die out?
Build a predictive learning model based on …
Break out vs Die out
[Figure panels: Break out; Non break out (Die out)]
Our Approach
Can we predict #Hashtag breakouts in Twitter at their early stages using local and global network interaction measures?
Local measures: interaction network within the 20-day accumulation window.
Global measures: interaction network from earlier until the end of the current window.
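The local/global distinction can be illustrated with a small sketch that splits timestamped interactions into two edge lists. It assumes each interaction is a `(source user, target user, day index)` triple and reads "from earlier" as "all data before the window"; both are assumptions, and `split_interactions` is a hypothetical helper name.

```python
def split_interactions(interactions, window_start, window_end):
    """Split timestamped edges into a local edge list (inside the window)
    and a global edge list (everything up to the end of the window)."""
    local_edges = [(u, v) for u, v, day in interactions
                   if window_start <= day <= window_end]
    global_edges = [(u, v) for u, v, day in interactions
                    if day <= window_end]
    return local_edges, global_edges

# Made-up interactions: (source user, target user, day index).
interactions = [
    ("alice", "bob", 3),
    ("carol", "bob", 25),
    ("bob", "dave", 38),
    ("dave", "erin", 45),
]
# A 20-day accumulation window covering days 21 through 40.
local_edges, global_edges = split_interactions(
    interactions, window_start=21, window_end=40)
```

Network measures computed on `local_edges` would then serve as local features, and the same measures on `global_edges` as global features.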
1. Define a 3-sigma/empirical-rule-based breakout measure
2. Model evolutionary episodes of hashtag volumes as:
• Accumulation, Breakout, Die-Out
3. Extract local and global network features
4. Train and test a classifier to:
• Predict if Accumulation leads to Breakout or Die-Out
IDENTIFY evolutionary episodes in #Hashtag volume time-series
[Figure: hashtag volume time series segmented into Accumulation, Breakout, and Die-out episodes]
Trending Hashtag Forecaster
Local and global network measures are computed as features
Network measures:
Eigenvector Centrality
PageRank
Closeness Centrality
Betweenness Centrality
Degree Centrality
Indegree Centrality
Outdegree Centrality
Link Rate
Distinct Link Rate
Number of Uninfected neighbors of early adopters
Neighborhood average degree
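To make a few of the listed measures concrete, here is a sketch that computes degree, indegree, and outdegree centrality plus the uninfected neighbors of early adopters from a directed edge list. It assumes distinct directed edges, normalizes by n - 1 (a common convention, not confirmed by the slides), and takes "uninfected" to mean "has not yet used the hashtag". The function names and the toy graph are hypothetical.

```python
from collections import defaultdict

def degree_features(edges):
    """Degree, indegree, and outdegree centrality for each node,
    normalized by (number of nodes - 1)."""
    out_deg = defaultdict(int)
    in_deg = defaultdict(int)
    nodes = set()
    for src, dst in edges:
        out_deg[src] += 1
        in_deg[dst] += 1
        nodes.update((src, dst))
    scale = 1.0 / (len(nodes) - 1) if len(nodes) > 1 else 0.0
    return {
        n: {
            "indegree": in_deg[n] * scale,
            "outdegree": out_deg[n] * scale,
            "degree": (in_deg[n] + out_deg[n]) * scale,
        }
        for n in nodes
    }

def uninfected_neighbors(edges, adopters):
    """Nodes adjacent to at least one adopter that are not adopters themselves."""
    neighbors = set()
    for src, dst in edges:
        if src in adopters:
            neighbors.add(dst)
        if dst in adopters:
            neighbors.add(src)
    return neighbors - set(adopters)

# Made-up interaction graph: an edge (u, v) means u interacted with v.
edges = [("a", "b"), ("c", "b"), ("b", "d")]
features = degree_features(edges)
exposed = uninfected_neighbors(edges, adopters={"a", "b"})
```

The size of `exposed` corresponds to the "number of uninfected neighbors of early adopters" measure; path-based measures such as closeness or betweenness would usually come from a graph library instead.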
PCA Ranking of Features
Exploratory method: reduces the original measure variables through an orthogonal transformation.
PCA returns (linearly uncorrelated) components sorted by the variance each one explains.
The leading components capture the highest variance among instances.
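The idea can be sketched by finding the first principal component of a feature matrix with power iteration on its covariance matrix, then ordering features by the absolute size of their loadings. Power iteration stands in here for a full PCA routine, and ranking by first-component loadings is an assumption about how the ranking was done. The feature names and values below are toy data, not results from the talk.

```python
def covariance_matrix(rows):
    """Sample covariance matrix of a list of equal-length feature rows."""
    n, d = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(d)]
    centered = [[r[j] - means[j] for j in range(d)] for r in rows]
    return [
        [sum(c[i] * c[j] for c in centered) / (n - 1) for j in range(d)]
        for i in range(d)
    ]

def first_component(cov, iterations=200):
    """Leading eigenvector of `cov` and the variance along it (power iteration).
    Assumes the start vector is not orthogonal to the leading eigenvector."""
    d = len(cov)
    vec = [1.0] * d
    for _ in range(iterations):
        nxt = [sum(cov[i][j] * vec[j] for j in range(d)) for i in range(d)]
        norm = sum(x * x for x in nxt) ** 0.5
        if norm == 0.0:
            break  # all-zero covariance: no variance to explain
        vec = [x / norm for x in nxt]
    variance = sum(
        vec[i] * sum(cov[i][j] * vec[j] for j in range(d)) for i in range(d)
    )
    return vec, variance

# Toy feature matrix: one row per hashtag instance, one column per measure.
names = ["degree", "pagerank", "link_rate"]
rows = [(1.0, 1.0, 5.0), (2.0, 2.0, 5.0), (3.0, 3.0, 5.0)]

cov = covariance_matrix(rows)
loadings, variance = first_component(cov)
# Rank features by the absolute size of their loading on the first component.
ranked = sorted(zip(names, loadings), key=lambda p: abs(p[1]), reverse=True)
```

In this toy data the third column never changes, so it carries no variance and lands last in `ranked`.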
Prediction Accuracies
• Break out
• Non break out (die out)
Conclusion and Future Work
• A content-independent, network-based classifier for predicting hashtag breakouts
• Next, we propose to study the utility of content-based features such as keywords, named entities, topics, and sentiments.
Thank you for listening!
Any questions?