A Network-Based Model for Predicting Hashtag Breakouts in Twitter

Upload: sultan-alzahrani

Post on 17-Jul-2015

TRANSCRIPT

Page 1: A network based model for predicting a hashtag break out in twitter

A Network-Based Model for Predicting Hashtag Breakouts in Twitter

Page 2: A network based model for predicting a hashtag break out in twitter

Agenda

Background

Methodology

Our visualization tool

Experiment & Results

Page 3: A network based model for predicting a hashtag break out in twitter

Introduction

Tweets:

Textual contents

User interaction: retweeting, mentioning, replying, etc.

Hashtags:

A tagging mechanism created by users

Help in categorizing tweets

Become very popular in trending topics

Page 4: A network based model for predicting a hashtag break out in twitter

Some Definitions

Tweet Hashtag Volume: the number of tweets containing a given hashtag per day.

Spike: a sharp increase in the volume.

Page 5: A network based model for predicting a hashtag break out in twitter

Research Question

Some hashtags become viral.

Can we predict at a nascent stage whether a hashtag will go viral?

Network-based?

Textual-content-based?

Page 6: A network based model for predicting a hashtag break out in twitter

Viral Diffusion

Network Based Analysis

• Arruda et al. examined the role of centrality measures in disease spread on an SIR model and in rumor spreading on a social network.

• In the SIR model for rumors, infected individuals recover with some probability, while a spreader becomes a carrier through contacts in social networks.

Content Based Analysis

• Hypothesis: specific groups of words are more likely to be contained in viral tweets.

• Li et al. analyzed tweets in terms of emotional divergence aspects (or sentiment analysis) and noted that highly interactive tweets tend to contain more negative emotions than other tweets.

Page 7: A network based model for predicting a hashtag break out in twitter

Running average and standard deviation

20-day sliding window

Page 9: A network based model for predicting a hashtag break out in twitter

Hashtag Volume

Page 10: A network based model for predicting a hashtag break out in twitter

Utilizing Three Sigma Rule

68-95-99.7 Rule

Empirical rule
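
The spike definition can be illustrated with a minimal sketch that combines the 20-day running average and standard deviation with the three-sigma rule. The function name `detect_spikes`, the choice of the population standard deviation, and the decision to compare each day against the 20 days that precede it are assumptions made for illustration; the slides do not specify these details.

```python
from statistics import mean, pstdev


def detect_spikes(volumes, window=20, k=3.0):
    """Return indices of days whose volume exceeds mean + k * std
    of the preceding `window` days (assumed convention)."""
    spikes = []
    for t in range(window, len(volumes)):
        history = volumes[t - window:t]   # the previous `window` days only
        mu = mean(history)                # running average
        sigma = pstdev(history)           # running (population) std deviation
        if volumes[t] > mu + k * sigma:   # three-sigma rule when k = 3
            spikes.append(t)
    return spikes


# Hypothetical daily tweet counts for one hashtag: 20 quiet days, then a jump.
daily = [10, 12, 11, 9, 10, 13, 12, 11, 10, 9,
         11, 12, 10, 11, 13, 12, 10, 11, 12, 10, 60, 12]
print(detect_spikes(daily))
```

With a perfectly flat history the standard deviation is zero, so any increase at all would be flagged; a real implementation would probably add a minimum-volume floor.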

Page 11: A network based model for predicting a hashtag break out in twitter

Hashtags Distribution

Page 12: A network based model for predicting a hashtag break out in twitter

Accumulative Period

Break out or Die out?

Build a predictive learning model based on …

Page 14: A network based model for predicting a hashtag break out in twitter

Break out vs Die out

[Figure panels: Break out; Non-breakout (Die out)]

Page 15: A network based model for predicting a hashtag break out in twitter

Our Approach

Can we predict #Hashtag breakouts in Twitter at their early stages using local and global network interaction measures?

Local measures: the interaction network within the 20-day accumulation window.

Global measures: interaction network from earlier until the end of the current window.
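
The distinction between the local and global networks can be sketched as a filter over a timestamped interaction log. The record layout `(day, source_user, target_user)`, the function name `split_interactions`, and the sample data are hypothetical; the slides do not describe how interactions were stored.

```python
def split_interactions(interactions, window_start, window_end):
    """Split (day, source_user, target_user) records into two edge sets.

    local_edges:  interactions inside the accumulation window
    global_edges: every interaction up to the end of that window
    """
    local_edges = set()
    global_edges = set()
    for day, src, dst in interactions:
        if day <= window_end:
            global_edges.add((src, dst))
            if day >= window_start:
                local_edges.add((src, dst))
    return local_edges, global_edges


# Hypothetical log of retweets/mentions/replies, indexed by day number.
log = [
    (1, "ann", "bob"),
    (5, "bob", "cat"),
    (25, "cat", "dan"),
    (30, "ann", "dan"),
    (45, "dan", "eve"),   # after the window, so excluded from both sets
]
local_edges, global_edges = split_interactions(log, window_start=21, window_end=40)
print(sorted(local_edges))
print(sorted(global_edges))
```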

1. Define a 3-sigma/empirical-rule-based breakout measure

2. Model evolutionary episodes of hashtag volumes, as:

• Accumulation, Breakout, Die-Out

3. Extract local and global network features

4. Train and test a classifier to:

• Predict if Accumulation leads to Breakout or Die-Out

Page 16: A network based model for predicting a hashtag break out in twitter

IDENTIFY evolutionary episodes in #Hashtag volume time-series

[Figure: #Hashtag volume time series annotated with Accumulation, Breakout, and Die-out episodes]

Page 17: A network based model for predicting a hashtag break out in twitter

Trending Hashtag Forecaster

Page 18: A network based model for predicting a hashtag break out in twitter

Local and global network measures are computed as features

Network measures:

Eigenvector Centrality

PageRank

Closeness Centrality

Betweenness Centrality

Degree Centrality

In-degree Centrality

Out-degree Centrality

Link Rate

Distinct Link Rate

Number of Uninfected neighbors of early adopters

Neighborhood average degree
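
A few of the simpler measures can be computed directly from a directed edge set, as in the sketch below. The normalization by n - 1 is the common convention for degree centrality, and treating a "neighbor" as any user connected in either direction is an assumption; the slides do not define these measures precisely, and the function name `degree_features` is hypothetical.

```python
from collections import defaultdict


def degree_features(edges, early_adopters):
    """Compute in/out-degree centrality and the number of uninfected
    neighbors of early adopters from directed (source, target) edges."""
    out_nbrs = defaultdict(set)
    in_nbrs = defaultdict(set)
    nodes = set()
    for src, dst in edges:
        if src == dst:
            continue                      # ignore self-interactions
        out_nbrs[src].add(dst)
        in_nbrs[dst].add(src)
        nodes.update((src, dst))

    scale = max(len(nodes) - 1, 1)        # normalize by n - 1
    in_centrality = {n: len(in_nbrs[n]) / scale for n in nodes}
    out_centrality = {n: len(out_nbrs[n]) / scale for n in nodes}

    adopters = set(early_adopters)
    uninfected = set()
    for user in adopters:
        # Neighbors in either direction who have not used the hashtag yet.
        uninfected |= (out_nbrs[user] | in_nbrs[user]) - adopters
    return in_centrality, out_centrality, len(uninfected)


# Hypothetical interaction network and early adopters of a hashtag.
edges = {("ann", "bob"), ("bob", "cat"), ("cat", "dan"), ("ann", "dan")}
in_c, out_c, n_uninfected = degree_features(edges, early_adopters=["ann", "bob"])
print(in_c["dan"], out_c["ann"], n_uninfected)
```

Measures such as eigenvector, closeness, and betweenness centrality or PageRank are usually taken from a graph library rather than written by hand.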

Page 19: A network based model for predicting a hashtag break out in twitter

PCA Ranking of Features

Exploratory method: reduces the original measure variables through an orthogonal transformation.

PCA returns a sorted set of linearly uncorrelated components, each with its variance.

Components that capture the highest variance among instances rank first.
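
For reference, the standard PCA computation behind such a ranking can be written as follows. Here X is assumed to be the n × p matrix of network measures (one row per hashtag instance); whether the features were standardized first is not stated in the slides.

```latex
\begin{aligned}
\tilde{X} &= X - \mathbf{1}\,\bar{x}^{\top}
  && \text{(center each feature column)} \\
C &= \frac{1}{n-1}\,\tilde{X}^{\top}\tilde{X} = V \Lambda V^{\top}
  && \text{(eigendecomposition of the covariance matrix)} \\
\Lambda &= \operatorname{diag}(\lambda_1, \dots, \lambda_p),
  \quad \lambda_1 \ge \lambda_2 \ge \dots \ge \lambda_p \ge 0 \\
r_i &= \frac{\lambda_i}{\sum_{j=1}^{p} \lambda_j}
  && \text{(share of total variance explained by component } i\text{)}
\end{aligned}
```

The columns of V are the orthogonal component directions, and the magnitude of each original feature's loading on the leading components gives one way to rank the features.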

Page 20: A network based model for predicting a hashtag break out in twitter

PCA Ranking of Features

Page 21: A network based model for predicting a hashtag break out in twitter

Prediction Accuracies

• Break out

• Non-breakout (die out)

Page 22: A network based model for predicting a hashtag break out in twitter

Conclusion and Future Work

• A content-independent, network-based classifier for predicting hashtag breakouts

• Next, we propose to study the utility of content-based features such as keywords, named entities, topics and sentiments.

Page 23: A network based model for predicting a hashtag break out in twitter

Thank you for listening!

Any questions?