modeling and predicting retweeting dynamics via a mixture ...gdac.uqam.ca/ ·...

2
Modeling and predicting retweeting dynamics via a mixture process Jinhua Gao, Huawei Shen, Shenghua Liu and Xueqi Cheng [email protected], {shenhuawei, liushenghua, cxq}@ict.ac.cn CAS Key Laboratory of Network Data Science and Technology, Institute of Computing Technology, Chinese Academy of Sciences, Beijing, China ABSTRACT Modeling and predicting retweeting dynamics in social me- dia has important implications to an array of applications. Existing models either fail to model the triggering effect of retweeting dynamics, e.g., the model based on reinforced Poisson process, or are hard to be trained using only the retweeting dynamics of individual tweet, e.g., the model based on self-exciting Hawkes process. In this paper, mo- tivated by the observation that each retweeting dynamics is generally dominated by a handful of key nodes that sepa- rately trigger a high number of retweets, we propose a mix- ture process to model and predict retweeting dynamics, with each subprocess capturing the retweeting dynamics initiated by a key node. Experiments demonstrate that the proposed model outperforms the state-of-the-art model. Keywords retweeting dynamics; popularity prediction; mixture process 1. INTRODUCTION Different from traditional media where people access in- formation released generally by a single source, social media platforms, e.g., Facebook, Twitter, and Sina Weibo, offer a decentralized style of information spreading, i.e., infor- mation spreads among users via retweeting or forwarding. Consequently, popularity of individual information follows a highly-asymmetric distribution, and it is a challenging prob- lem to predict the popularity of information. Retweeting dynamics prediction, i.e., predicting how popular a piece of information will become, attracts much research attention in the last decade. Existing methods make predictions either by exploring relevant factors and applying standard regres- sion/classification methods, or by fitting retweeting dynam- ics using certain class of functions [1]. Recently, researchers begin to directly model the process that a piece of information gains its popularity. Shen et al. employed reinforced Poisson process (RPP) to model the arriving process of paper citations [2], and this model is Copyright is held by the author/owner(s). WWW’16 Companion, April 11–15, 2016, Montréal, Québec, Canada. ACM 978-1-4503-4144-8/16/04. http://dx.doi.org/10.1145/2872518.2889389. subsequently extended to model the retweeting dynamics of microblog [3]. RPP assumes that information spreading is an arriving process of attention to a particular piece of infor- mation, simply using the count of retweets as the sole indi- cator of triggering effect of all retweets rather than modeling the triggering effect of each retweet separately. Self-exciting Hawkes process (SEHP) is then proposed to explicitly model the triggering effect of each retweet, with the arriving rate of retweets being determined by the aggregation of the trig- gering effect of all the previous retweets [4, 5, 6]. Although SEHP has the flexibility to capture the triggering effect of each retweet, it is impractical to learn the strength of trig- gering effect for each retweet when only the retweeting dy- namics of individual tweet is available. Therefore, we still lack an effective method to predict retweeting dynamics in social media. In this paper, we propose a mixture process to model and predict retweeting dynamics. Our model is motivated by the observation that the retweeting process of tweets could be generally characterized by a diffusion tree with only a hand- ful of key nodes, each triggering a high number of retweets (Fig. 1A). Thus, the whole process of retweeting dynamics could be divided into several subprocesses, each of which is well modeled via a RPP model (Fig. 1B), and the whole process is taken as the aggregation of these subprocesses (Fig. 1C). The model based on such a mixture process has higher flexibility than RPP model, and could be much easily trained than SEHP model. Experiments demonstrate that the proposed model outperforms the state-of-the-art model at predicting retweeting dynamics. 2. MODEL AND VALIDATION Retweeting dynamics of a tweet during a time period [0,T ] is denoted by a set of retweeting moments {ti }(1 i n), where n represents the total number of retweets up to time T . Without loss of generality, we have 0 = t 0 t 1 ≤···≤ t i ≤···≤ t n T . We model the retweeting dynamics as a mixture process of RPP characterized by the rate function x(t)= k l=1 λ l f (t - τ l ; θ l )c(t), (1) where k is the number of subprocesses, λ l captures the strength of triggering effect initiated by the source retweeter of the l-th subprocess, τ l is the moment when the l-th subprocess begins, f (t; θ l ) is the relaxation function that characterizes how the triggering effect matters over time modulated by pa- rameters θ l , and c(t) is the total number of retweets received 33

Upload: trinhnhan

Post on 13-Jul-2019

215 views

Category:

Documents


0 download

TRANSCRIPT

Modeling and predicting retweeting dynamics via amixture process

Jinhua Gao, Huawei Shen, Shenghua Liu and Xueqi [email protected], {shenhuawei, liushenghua, cxq}@ict.ac.cn

CAS Key Laboratory of Network Data Science and Technology, Institute of Computing Technology,Chinese Academy of Sciences, Beijing, China

ABSTRACTModeling and predicting retweeting dynamics in social me-dia has important implications to an array of applications.Existing models either fail to model the triggering effect ofretweeting dynamics, e.g., the model based on reinforcedPoisson process, or are hard to be trained using only theretweeting dynamics of individual tweet, e.g., the modelbased on self-exciting Hawkes process. In this paper, mo-tivated by the observation that each retweeting dynamics isgenerally dominated by a handful of key nodes that sepa-rately trigger a high number of retweets, we propose a mix-ture process to model and predict retweeting dynamics, witheach subprocess capturing the retweeting dynamics initiatedby a key node. Experiments demonstrate that the proposedmodel outperforms the state-of-the-art model.

Keywordsretweeting dynamics; popularity prediction; mixture process

1. INTRODUCTIONDifferent from traditional media where people access in-

formation released generally by a single source, social mediaplatforms, e.g., Facebook, Twitter, and Sina Weibo, offera decentralized style of information spreading, i.e., infor-mation spreads among users via retweeting or forwarding.Consequently, popularity of individual information follows ahighly-asymmetric distribution, and it is a challenging prob-lem to predict the popularity of information. Retweetingdynamics prediction, i.e., predicting how popular a piece ofinformation will become, attracts much research attention inthe last decade. Existing methods make predictions eitherby exploring relevant factors and applying standard regres-sion/classification methods, or by fitting retweeting dynam-ics using certain class of functions [1].Recently, researchers begin to directly model the process

that a piece of information gains its popularity. Shen etal. employed reinforced Poisson process (RPP) to modelthe arriving process of paper citations [2], and this model is

Copyright is held by the author/owner(s).WWW’16 Companion, April 11–15, 2016, Montréal, Québec, Canada.ACM 978-1-4503-4144-8/16/04.http://dx.doi.org/10.1145/2872518.2889389.

subsequently extended to model the retweeting dynamics ofmicroblog [3]. RPP assumes that information spreading isan arriving process of attention to a particular piece of infor-mation, simply using the count of retweets as the sole indi-cator of triggering effect of all retweets rather than modelingthe triggering effect of each retweet separately. Self-excitingHawkes process (SEHP) is then proposed to explicitly modelthe triggering effect of each retweet, with the arriving rateof retweets being determined by the aggregation of the trig-gering effect of all the previous retweets [4, 5, 6]. AlthoughSEHP has the flexibility to capture the triggering effect ofeach retweet, it is impractical to learn the strength of trig-gering effect for each retweet when only the retweeting dy-namics of individual tweet is available. Therefore, we stilllack an effective method to predict retweeting dynamics insocial media.

In this paper, we propose a mixture process to model andpredict retweeting dynamics. Our model is motivated by theobservation that the retweeting process of tweets could begenerally characterized by a diffusion tree with only a hand-ful of key nodes, each triggering a high number of retweets(Fig. 1A). Thus, the whole process of retweeting dynamicscould be divided into several subprocesses, each of which iswell modeled via a RPP model (Fig. 1B), and the wholeprocess is taken as the aggregation of these subprocesses(Fig. 1C). The model based on such a mixture process hashigher flexibility than RPP model, and could be much easilytrained than SEHP model. Experiments demonstrate thatthe proposed model outperforms the state-of-the-art modelat predicting retweeting dynamics.

2. MODEL AND VALIDATIONRetweeting dynamics of a tweet during a time period [0, T ]

is denoted by a set of retweeting moments {ti}(1 ≤ i ≤ n),where n represents the total number of retweets up to timeT . Without loss of generality, we have 0 = t0 ≤ t1 ≤ · · · ≤ti ≤ · · · ≤ tn ≤ T . We model the retweeting dynamics as amixture process of RPP characterized by the rate function

x(t) =

k∑l=1

λlf(t− τl; θl)c(t), (1)

where k is the number of subprocesses, λl captures the strengthof triggering effect initiated by the source retweeter of thel-th subprocess, τl is the moment when the l-th subprocessbegins, f(t; θl) is the relaxation function that characterizeshow the triggering effect matters over time modulated by pa-rameters θl, and c(t) is the total number of retweets received

33

Real data RPP model

A

B

C

0

20

40

60

80

100

120

140

160

180

0 5 10 15 20 25 30 35 40 45

Fre

qu

ency

Hours after publication

Real data

Our model

Figure 1: Retweeting dynamics of a message in SinaWeibo. (A) Diffusion tree of the retweeting pro-cess; (B) Subprocesses, each being initiated by a keynode; (C) Retweeting dynamics of the message.

up to time t. In this paper, we use log-normal function asthe relaxation function as in [2].The probability of the i-th retweet arrives at ti, given the

(i− 1)-th retweet arrives at ti−1, can be written as

p1(ti|ti−1) = x(ti)exp

(−∫ ti

ti−1

x(t)dt

), (2)

and the probability of no retweet between [tn, T ] is

p0(T |tn) = exp

(−∫ T

tn

x(t)dt

). (3)

Therefore, the likelihood of the retweeting dynamics {ti}ni=1

during [0, T ] is

L = p0(T |tn)n∏

i=1

p1(ti|ti−1). (4)

The parameters (i.e., λl, θl, τl) are estimated by maximizingthe log likelihood, and the retweeting dynamics of messagecould be predicted as

c(t) = n ∗ exp

(∫ t

T

k∑l=1

λlf(s− τl; θl)ds

). (5)

To evaluate the effectiveness of the proposed model, wecompare it with the RPP model [2] on a dataset crawledfrom Sina Weibo, the largest microblogging platform. Wefilter out the tweets with less than 100 retweets, and obtainthe retweeting dynamics of 164 tweets. For each tweet, weuse its retweeting dynamics in the first two hours for train-ing, and predict its dynamics at the following hours. In theexperiments, we set k=3 with the prior that few tweets hasmore than 3 key nodes in its retweeting dynamics in the firsttwo hours. We use MAPE (mean average percentage error)to evaluate prediction accuracy. Results are shown in Fig. 2.

0

100

200

300

400

500

600

700

800

900

1000

0 500 1000 1500

Real data

Our model

RPP model

0

50

100

150

200

250

300

350

400

450

500

0 50 100 150 200

Minutes after publication

Acc

um

ula

tiv

e re

twee

t co

un

t

0.00

0.05

0.10

0.15

0.20

0.25

0.30

0.35

0.40

0.45

0 5 10 15 20

MA

PE

Hours after training period

Our model

RPP model

Figure 2: Comparison with the RPP model.

To offer some intuitions about the comparison, we also givetwo examples. We can see that our model not only fits wellthe retweeting dynamics in training period, but also has amore desirable prediction accuracy than competing model.

3. CONCLUSIONIn this paper, we proposed a mixture process to model

and predict retweeting dynamics in social media. The ideais motivated by the fact that the whole process of retweet-ing dynamics could be effectively captured by a handful ofsubprocesses initiated by retweeters with high number of fol-lowers. The proposed model is efficient to be trained usingonly the retweeting dynamics of individual tweet. Experi-mental results demonstrate the effectiveness of the proposedmodel, compared with the state-of-the-art model.

4. ACKNOWLEDGMENTSThis work was funded by the National Basic Research

Program of China (the 973 program) with grant numbers2014CB340401 and 2013CB329602, and the National Natu-ral Science Foundation of China with grant numbers 61472400,61232010, and 61572467.

5. REFERENCES[1] G. Szabo and B. A. Huberman. Predicting the

popularity of online content. Communications of theACM 53(8):80–88, 2010.

[2] H. W. Shen, D. Wang, C. Song, and A.-L. Barabasi.Modeling and predicting popularity dynamics viareinforced poisson processes. In AAAI 2014, pp.291–297.

[3] S. Gao, J. Ma, and Z. Chen. Modeling and predictingretweeting dynamics on microblogging platforms. InWSDM 2015, pp. 107–116.

[4] R. Crane and D. Sornette. Robust dynamic classesrevealed by measuring the response function of a socialsystem. PNAS, 105(41):15649–15653, 2008.

[5] P. Bao, H. W. Shen, X. Jin, and X. Q. Cheng. Modelingand predicting popularity dynamics of microblogs usingself-excited hawkes processes. In WWW 2015, pp. 9–10.

[6] Q. Zhao, M. A. Erdogdu, H. Y. He, A. Rajaraman, andJ. Leskovec. SEISMIC: A Self-Exciting Point ProcessModel for Predicting Tweet Popularity. In KDD 2015,pp. 1513–1522.

34