introduction of “insights into internet memes”

29
INTRODUCTION OF “INSIGHTS INTO INTERNET MEMES” Proceedings of the Fifth International AAAI Conference on Weblogs and Social Media (July 17, 2011 – July 21, 2011) Christian Bauckhage Fraunhofer IAIS Bonn, Germany

Upload: bart

Post on 23-Mar-2016

40 views

Category:

Documents


5 download

DESCRIPTION

Introduction of “Insights into Internet Memes”. Proceedings of the Fifth International AAAI Conference on Weblogs and Social Media ( July 17, 2011 – July 21, 2011) Christian Bauckhage Fraunhofer IAIS Bonn, Germany. Introduction. - PowerPoint PPT Presentation

TRANSCRIPT

Page 1: Introduction of “Insights  into Internet  Memes”

INTRODUCTION OF “INSIGHTS INTO INTERNET MEMES”Proceedings of the Fifth International AAAI Conference on Weblogs and Social Media

(July 17, 2011 – July 21, 2011)Christian Bauckhage

Fraunhofer IAISBonn, Germany

Page 2: Introduction of “Insights  into Internet  Memes”

Introduction Dawkin, R. 1976. The Selfish Gene. He

postulates memes as a cultural analogue of genes in order to explain how rumors, catch-phrases, melodies, or fashion trends replicate through a population.

Focuses on observable characteristics of Internet memes in epidemic outbreaks.

Means: email, instant messaging, forums, blogs, or social networking sites.

Page 3: Introduction of “Insights  into Internet  Memes”

Introduction Content Example:

Internet memes are inside jokes or pieces of hip underground knowledge, that many people are in on.

Page 4: Introduction of “Insights  into Internet  Memes”

Introduction Most Internet memes spread rapidly,

voluntarily, unpredictably and uncontrollably.

Viral marketing: public relation, advertisement, political campaigning. Image than information.

knowyourmeme.com, memedump.com, or memebase.com

Collected from: Google Insights, delicious.com, digg.com, and stumbleupon.com.

Page 5: Introduction of “Insights  into Internet  Memes”

Introduction Using mathematical epidemiology and log-

normal distributions in modeling the temporal dynamics of Internet memes.

Analyze similarities and differences among the time series data from different sources .

Introduce mathematical models of outbreak data and apply them to characterize Internet memes and their evolution.

Page 6: Introduction of “Insights  into Internet  Memes”

Related Work Identify influential members in a community

so as to contain the spread of misinformation or rumors (Budak, Agrawal, and Abbadi 2010; Shah and Zaman 2009). Set up models of how events disseminate

through online communities and use these to track memes through specific social media (Adar and Adamic 2005; Lin et al. 2010)

Investigate the interplay between social and traditional media (Leskovec, Backstrom, and Kleinberg 2009).

Page 7: Introduction of “Insights  into Internet  Memes”

Related Work Outbreak analysis for trend prediction is an

active area of research in epidemic modeling (Britton 2010).

An emergent property of the scale- free nature of social or communication networks (Keeling and Eames 2005; Lloyd and May 2001; Pastor-Satorras and Vespignani 2001).

Infer social relations from observations of information propagation among individuals (Myers and Leskovec 2010).

Page 8: Introduction of “Insights  into Internet  Memes”

Related Work Yang and Lescovec (2011) and Kubo et al. (2007)

cluster time series obtained from a micro blogging service in order to predict future interest in a topic.

The results indicate that more elaborate compartment models and log-normal distributions capture trend prediction more accurately.

Log-normal distributions :to accurately model a wide range of long tail phenomena (Limpert, Stahel, and Abbt 2001) including Internet measurements such as communication times or the growth of the web graph (Downey 2005; Mitzenmacher 2004).

Page 9: Introduction of “Insights  into Internet  Memes”

Data Collection and Preprocessing Collection of 150 Internet memes.

Page 10: Introduction of “Insights  into Internet  Memes”

Data Collection and Preprocessing Google Insights : A service by Google that

provides statistics on queries terms users have entered into the Google search engine.

Delicious : A social bookmarking service for storing web bookmarks.

Digg is a social news service where users can vote on web content submitted by others.

StumbleUpon is a discovery engine that recommends web content that has been entered by its users.

Page 11: Introduction of “Insights  into Internet  Memes”

Data Collection and Preprocessing Time series = [] T: From January 2004 to December 2010. (84

months) Comparing meme related activities across

different sources, the data were turned into probability vectors where = / .

Onset times using Teager-Kaiser operator : A signal processing technique to detect abrupt variations in a data stream.TK() =

For each time series, the earliest variation was define the onset time . Model fitting in was done using series = [, . . . , ].

Page 12: Introduction of “Insights  into Internet  Memes”

Immediate Observations and Implications

Over the years there is a growing correlation between the frequencies of meme related queries to Google and activities of the Delicious community.

Page 13: Introduction of “Insights  into Internet  Memes”

Immediate Observations and Implications

In an attempt to quantify this observation, the author examined weighted average annual correlations between series from Google Insights and their counterparts from the other services.

avgcorr(y) =

The largest correlations are between the Google and the Delicious time series.

Page 14: Introduction of “Insights  into Internet  Memes”

Immediate Observations and Implications

The author compared interests and behaviors of different communities and determined the average daily activities ranked in descending order.

The twenty highest ranking memes according to per day popularity in the Delicious, Digg, and StumbleUpon communities.

Page 15: Introduction of “Insights  into Internet  Memes”

Immediate Observations and Implications

In the case of Digg reflects its role as a social news service: if content or stories that just showed up on the Internet are posted at Digg, users react quickly to the news.

The shorter the time since onset, the more meme related daily activity there is. On the other hand, memes that have been around for a while hardly provoke further reactions from the Digg community.

A quarter of the memes that are most popular among the StumbleUpon community have to do with rather artistic content, which is in contrast to the most popular memes determined from Delicious. Users of recommendation engines are more after sophisticated content than after mundane jokes or fads.

Page 16: Introduction of “Insights  into Internet  Memes”

Modeling Meme Dynamics Investigate the use of two classes of

models and argue that and why log-normal distributions are well suited to represent the temporal dynamics of Internet memes.

Page 17: Introduction of “Insights  into Internet  Memes”

Compartment Models Compartment models are an established

approach to describe the progress of an epidemic in a large population.

SIRS model:population was divided into (S): susceptible to the disease, (I):those who are infectious, and (R):those who have recovered, which are governed by the following differential equations:

S˙(t) = −βI(t)S(t) + φR(t) I˙(t) = βI(t)S(t) − νI(t) R˙(t) = νI(t) − φR(t)

Page 18: Introduction of “Insights  into Internet  Memes”

Compartment Models Simpler models (of type SI, SIS, SIR) have

been used to study information dissemination within web-based communities (Kubo et al. 2007; Myers and Leskovec 2010) and were reported to give a good account of the interaction dynamics in social networks.

General assumption: meme related time series available from Google Insights correspond to the infectious rates I(t) of epidemic processes.

Page 19: Introduction of “Insights  into Internet  Memes”

Compartment Models However, differential equations are nonlinear so that model

fitting is complicated. The author use Markov Chain Monte Carlo methods and found SIRS type models to provide the best explanations of meme activity data.

SIRS models reproduces the general behavior of memes, but tend to underestimate the early contagious stages of the meme. This indicates that stochastic compartment models with constant parameters lack the flexibility required to accurately describe the temporal dynamics of Internet memes.

Page 20: Introduction of “Insights  into Internet  Memes”

Log-Normal Models Log-normal distributions have been successfully

used to model frequency distributions. (Wu and Huberman 2007; Lescovec, Adamic, and Huberman 2007; Crane and Sornette 2008).

A random variable x is log-normally distributed, if log(x) has a normal distribution. Accordingly, the probability density function of such a random variable is f(x) = exp

The distribution is only defined for positive values, skewed to the left, and often long-tailed. The mean μ and standard deviation σ of log(x) define the curve.

The process is governed by a time-dependent random variable γt such that =

Page 21: Introduction of “Insights  into Internet  Memes”

Log-Normal Models Such processes are commonly applied to describe

growth and decline in biological or economic systems and provide accurate models for a variety of Internet related phenomena, not for Internet memes.

The author determined the best fitting log-normal distribution using least squares optimization for each of the 150 time series.

Log-normal distributions provide a highly accurate account of the temporal dynamics of the memes.

Page 22: Introduction of “Insights  into Internet  Memes”

Log-Normal Models In order to quantify this impression, the

author performed with all 150 memes and found the p-values of SIRS. Log-normal models exceeded a confidence threshold of 0.9 in about 70% of the cases.

In 83% of the cases, the p-values obtained for log-normal fits exceeded those of the corresponding SIRS fits.

Kullback-Leibler divergence log

Page 23: Introduction of “Insights  into Internet  Memes”

Log-Normal Models Table 1 lists the resulting DKL measures (closer to 0.0

is better) for SIRS and log-normal fits. In 55% of the cases, the log-normal fits better DKL measures than SIRS model.

Log-normal distributions provide accurate descriptions.

Page 24: Introduction of “Insights  into Internet  Memes”

Implications and Application to Prediction

Log-normal do not model processes and mechanisms of meme spread but summarize corresponding time series.

Work by Dover, Goldberg, and Shapira (2010) shown that temporally log-normal diffusion rates indicate networks of log-normal link distributions.

For the majority of Internet memes data represented by log-normal distributions, the author conjecture that they spread through rather homogenous communities of similar interests and preferences instead of through the Internet at large.

Page 25: Introduction of “Insights  into Internet  Memes”

Implications and Application to Prediction

Apply the resulting descriptions in order to produce a compressed representation of memes in the space spanned by the shape parameters μ and σ.

The majority of memes is found in a cluster represented by the “salad fingers” meme.

Memes on the top right indicating a pattern of still increasing popularity.

Page 26: Introduction of “Insights  into Internet  Memes”

Implications and Application to Prediction

Forecast future evolution according to the corresponding log-normal model.

10-year forecasts for a collection of six memes.

Page 27: Introduction of “Insights  into Internet  Memes”

Conclusion The term Internet meme is used to describe evolving

content that rapidly gains popularity on the Internet.

Users of the Digg social news service react to recent memes and users of the StumbleUpon appear to be interested mostly in sophisticated memes.

Compartment models give a good account of the growth and decline patterns of memes yet lack the flexibility to characterize short-lived bursts of meme related activity.

Log-normal distributions account for time-dependent growth and decline rates.

Page 28: Introduction of “Insights  into Internet  Memes”

Conclusion Taking into account the fact that log-normal

diffusion processes indicate networks of log-normal link distributions (Dover, Goldberg, and Shapira 2010) and the observation that the globally scale free Internet graph appears to contain many log-normal sub-graphs(Pennock et al. 2002), the majority of currently famous Internet memes spreads through homogenous communities and social networks rather than through the Internet at large.

Page 29: Introduction of “Insights  into Internet  Memes”

Thanks for your attention!!