CS 599: Social Media Analysis University of Southern California 1 Information Diffusion Kristina Lerman University of Southern California



Information diffusion on Twitter follower graph


Diffusion on networks
• The spread of disease, ideas, behaviors, … on a network can be described as a contagion process, in which an active (infected/informed/adopted) node activates its inactive neighbors with some probability
– This creates a cascade on the network
• How large do cascades become?
• What determines their growth?
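The contagion process described above can be sketched as a short simulation. This is a minimal sketch, assuming an independent-cascade rule (each newly active node gets a single chance to activate each inactive neighbor, with a fixed probability `p`); the function name `simulate_cascade` and the adjacency-dict representation are illustrative choices, not something the lecture specifies.

```python
import random


def simulate_cascade(neighbors, seeds, p, rng):
    """Simulate one contagion cascade (independent-cascade style).

    neighbors: dict mapping each node to a list of its neighbors
    seeds: initially active nodes
    p: probability that an active node activates an inactive neighbor
    Each newly activated node gets one chance to activate each neighbor.
    Returns the set of nodes that are active when the cascade stops.
    """
    active = set(seeds)
    frontier = list(seeds)
    while frontier:
        next_frontier = []
        for u in frontier:
            for v in neighbors[u]:
                if v not in active and rng.random() < p:
                    active.add(v)
                    next_frontier.append(v)
        frontier = next_frontier
    return active


if __name__ == "__main__":
    # Hypothetical toy network: a path 0-1-2-3-4
    toy = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2, 4], 4: [3]}
    print(len(simulate_cascade(toy, [0], 0.5, random.Random(0))))
```

Repeating the run many times with different random seeds gives a distribution of cascade sizes, which is the quantity the questions above ask about.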


Gangnam style

• "Gangnam Style" became the first YouTube video to reach one billion views

• As of May 31, 2014, the music video had been viewed over two billion times, almost 13,000 man-years of viewing!


Ebola outbreak


Studying diffusion: data

• Large-scale data about contagion processes are now available
– YouTube views [Crane & Sornette, 2008]
– Flickr favorites [Cha, Mislove & Gummadi, 2009]
– Twitter retweets [Ghosh & Lerman, 2011]
– Facebook likes [Dow, Adamic & Friggeri, 2014]
• Challenges
– Volume of the data
• Storing and processing data
– Complexity
• How does the “whole” depend on its “parts”?
• Networks add to complexity


Studying diffusion: methods
• Analytic models
– Model cascading behavior, e.g., with differential equations
• Solve the model under different conditions
• Simulations
– Implement a model to synthetically recreate the process
• Empirical studies
– Do observations of real-world data agree with model and simulation results?


Cascading Behavior in Complex Socio-technical Networks (Borge-Holthoefer et al.)
• Research questions
– How can global cascades occur on sparse networks?
– What affects cascade growth?
• Network topology
• How a node is activated by an active neighbor
• Properties of the diffusing item?
– How can cascades be characterized?
• Models of diffusion on networks
– Threshold model
– Epidemic models
– Complex contagion
• Empirical data allow testing of models


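Threshold model (Watts 2002)
• Each node $i$ has an infection threshold $\phi_i$
– Node $i$ becomes infected if the fraction of its infected neighbors is more than $\phi_i$

[Figure: example network (nodes 1 to 4) with infected and exposed nodes; exposure response function: infection probability vs. number of infected neighbors, a step from 0 to 1 at $\phi_i k_i$, where $k_i$ is the degree of node $i$]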


Threshold model (Watts 2002)
• Under some conditions, global cascades can start from a few “infected” seeds
– Network topology and individual thresholds interact in cascading behavior
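The threshold rule can be sketched as a deterministic update loop that runs until no node changes state. This is a minimal sketch, assuming an undirected network stored as an adjacency dict and per-node thresholds in a dict; the name `threshold_cascade` is hypothetical, and the strict "more than" comparison follows the wording on the slide.

```python
def threshold_cascade(neighbors, thresholds, seeds):
    """Threshold dynamics in the spirit of Watts (2002).

    neighbors: dict mapping each node to a list of its neighbors
    thresholds: dict mapping each node i to its threshold phi_i
    seeds: initially infected nodes
    An uninfected node i becomes infected once the fraction of its
    infected neighbors is more than thresholds[i]. Passes over all nodes
    repeat until a full pass changes nothing.
    """
    active = set(seeds)
    changed = True
    while changed:
        changed = False
        for i, nbrs in neighbors.items():
            if i in active or not nbrs:
                continue  # already infected, or isolated node
            frac = sum(1 for j in nbrs if j in active) / len(nbrs)
            if frac > thresholds[i]:
                active.add(i)
                changed = True
    return active
```

Because infection is never reversed, the final set does not depend on the order in which nodes are visited. Lowering every threshold on a path graph from 0.6 to 0.4 turns a single seed from a dead end into a cascade that covers the whole path, a small example of thresholds and topology interacting.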


Epidemic models
• Infected nodes propagate the contagion to susceptible neighbors with probability $\lambda$ (the transmissibility, or virality, of the contagion)

[Figure: example network with infected and exposed nodes; exposure response function: infection probability (up to 1) vs. number of infected neighbors]


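Epidemic models
• Epidemic threshold $\lambda_c$, a critical value of the transmissibility $\lambda$:
– For $\lambda < \lambda_c$, localized cascades (the epidemic dies out)
– For $\lambda > \lambda_c$, global cascades
• The epidemic threshold depends only on topology: $\lambda_c = 1/\lambda_{1,A}$, where $\lambda_{1,A}$ is the largest eigenvalue of the network's adjacency matrix $A$
– True for any network

[Figure: number of infected nodes (from 0 up to N) vs. transmissibility $\lambda$, with the epidemic threshold marked]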
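Because the threshold depends only on the adjacency matrix, it can be computed directly from the graph. This is a minimal sketch, assuming an undirected, unweighted network stored as a dense symmetric NumPy array; `epidemic_threshold` is a hypothetical helper name. Here the transmissibility is the effective ratio of infection to curing probability, as in the SIS model of Wang et al. (2003) covered later.

```python
import numpy as np


def epidemic_threshold(adjacency):
    """Return 1 / lambda_max(A) for a symmetric 0/1 adjacency matrix A.

    An epidemic whose effective transmissibility is below this value
    dies out; above it, a global cascade becomes possible.
    """
    A = np.asarray(adjacency, dtype=float)
    # eigvalsh returns the (real) eigenvalues of a symmetric matrix
    lam_max = np.linalg.eigvalsh(A).max()
    return 1.0 / lam_max


if __name__ == "__main__":
    # Complete graph on 4 nodes (every node has degree 3)
    K4 = np.ones((4, 4)) - np.eye(4)
    print(epidemic_threshold(K4))
```

For a k-regular graph the largest eigenvalue equals k, so the threshold reduces to 1/k. A hub raises the largest eigenvalue (a star with n leaves has largest eigenvalue √n), which lowers the threshold.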


Complex contagion
• A virus can propagate with a single exposure; the spread of behaviors requires multiple exposures
• Non-monotonic exposure response

[Figure: exposure response function: infection probability (up to 1) vs. number of infected neighbors; example network with infected and exposed nodes]
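The three exposure response functions from the threshold, epidemic, and complex contagion slides can be compared side by side. This is a sketch under stated assumptions: the epidemic curve assumes each infected neighbor transmits independently with probability `lam`, and the complex contagion curve is a hypothetical non-monotonic shape with made-up parameters (`peak`, `p_max`, `decay`), chosen only to rise and then fall as the slide describes.

```python
def threshold_response(k, degree, phi):
    """Threshold model: probability jumps from 0 to 1 once the fraction
    of infected neighbors k/degree is more than the threshold phi."""
    return 1.0 if k / degree > phi else 0.0


def epidemic_response(k, lam):
    """Epidemic model, assuming independent exposures: each of the k
    infected neighbors transmits with probability lam, so
    P(infected) = 1 - (1 - lam)**k."""
    return 1.0 - (1.0 - lam) ** k


def complex_response(k, peak=2, p_max=0.6, decay=0.5):
    """Hypothetical non-monotonic response for complex contagion: rises
    linearly up to `peak` exposures, then decays geometrically.
    The shape and parameter values are illustrative only."""
    if k == 0:
        return 0.0
    if k <= peak:
        return p_max * k / peak
    return p_max * decay ** (k - peak)


if __name__ == "__main__":
    for k in range(6):
        print(k, threshold_response(k, 5, 0.4),
              round(epidemic_response(k, 0.3), 3),
              round(complex_response(k), 3))
```

The key contrast is that a single exposure already gives a nonzero infection probability in the epidemic model, while the threshold and complex contagion curves need several infected neighbors before they become large.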


Characterizing cascades
• Connected, tree-like subgraph; typically star-like
• Size related to centrality


Seeding large outbreaks
• How to select seeds that will initiate large outbreaks?
– Influence maximization
• Are some network positions better at triggering large outbreaks?
– Being a hub is sufficient but not necessary
• “Million follower fallacy” (Cha et al.)
– “Hub firewall”: epidemics die out when they reach a hub
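Influence maximization, mentioned above, is often approached with a greedy heuristic. The lecture does not say which method it has in mind; this sketch swaps in the standard greedy approach of Kempe, Kleinberg & Tardos (2003): repeatedly add the node that most increases the expected cascade size, estimated by Monte Carlo simulation of an independent-cascade model. The names `cascade_size` and `greedy_seeds`, the probability `p`, and the number of runs are illustrative assumptions.

```python
import random


def cascade_size(neighbors, seeds, p, rng):
    """Number of nodes reached by one independent-cascade run."""
    active, frontier = set(seeds), list(seeds)
    while frontier:
        nxt = []
        for u in frontier:
            for v in neighbors[u]:
                if v not in active and rng.random() < p:
                    active.add(v)
                    nxt.append(v)
        frontier = nxt
    return len(active)


def greedy_seeds(neighbors, k, p, runs=200, seed=0):
    """Pick k seeds greedily by estimated expected cascade size.

    At each round, every not-yet-chosen node is tried as an additional
    seed; the one with the largest Monte Carlo average is kept.
    """
    rng = random.Random(seed)
    chosen = []
    for _ in range(k):
        best, best_val = None, -1.0
        for v in neighbors:
            if v in chosen:
                continue
            trial = chosen + [v]
            val = sum(cascade_size(neighbors, trial, p, rng)
                      for _ in range(runs)) / runs
            if val > best_val:
                best, best_val = v, val
        chosen.append(best)
    return chosen
```

Each round runs `runs` simulations per candidate node, so this is only practical for small graphs; it illustrates the idea rather than a scalable method.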


Epidemic Spreading in Real Networks: An Eigenvalue Viewpoint [Wang et al., 2003]
• Research questions
– How do epidemics cascade on a real network?
– Does an epidemic threshold exist for a given network?
• Contributions
– Model how epidemics propagate on a network
– Propagation depends on network topology
• The epidemic threshold is related to the largest eigenvalue of the adjacency matrix describing the network


Homogeneous mixing model
• Homogeneous mixing
– Each node interacts with every other node
• Infection rate $\beta$: a node infects a neighbor with probability $\beta$
• Curing rate $\delta$: an infected node is cured with probability $\delta$

[Figure: infected, exposed, and cured nodes]



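Homogeneous mixing: epidemic threshold
• Infection rate $\beta$: a node infects a neighbor with probability $\beta$
• Curing rate $\delta$: a node is cured with probability $\delta$
– Number of infected nodes: $N_{\mathrm{inf}} = \left(1 - \frac{\delta}{\beta \langle k \rangle}\right) N$
– Epidemic threshold: critical value $\beta/\delta = 1/\langle k \rangle$
• above which $N_{\mathrm{inf}} \propto N$, and below which $N_{\mathrm{inf}} \to 0$

[Figure: infected, exposed, and cured nodes]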
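The formula for the number of infected nodes under homogeneous mixing can be checked numerically. This sketch assumes one standard mean-field equation for the infected fraction ρ, dρ/dt = β⟨k⟩ρ(1 − ρ) − δρ, whose nonzero fixed point is ρ = 1 − δ/(β⟨k⟩); the function names, the initial fraction, and the Euler step settings are illustrative choices.

```python
def mean_field_fraction(beta, delta, k_avg, rho0=0.01, dt=0.01, steps=200_000):
    """Euler-integrate d(rho)/dt = beta*<k>*rho*(1 - rho) - delta*rho
    (assumed mean-field SIS equation) and return the final infected
    fraction rho = N_inf / N."""
    rho = rho0
    for _ in range(steps):
        rho += dt * (beta * k_avg * rho * (1.0 - rho) - delta * rho)
    return rho


def steady_state_fraction(beta, delta, k_avg):
    """Closed form N_inf / N = 1 - delta / (beta * <k>) above the
    threshold beta/delta = 1/<k>, and 0 below it."""
    return max(0.0, 1.0 - delta / (beta * k_avg))


if __name__ == "__main__":
    k_avg, delta = 5.0, 0.2  # hypothetical values; threshold beta = 0.04
    for beta in (0.02, 0.05, 0.1):
        print(beta, round(mean_field_fraction(beta, delta, k_avg), 4),
              round(steady_state_fraction(beta, delta, k_avg), 4))
```

Below the threshold the integrated fraction decays toward zero; above it, the integration settles on the closed-form value. Close to the threshold convergence is slow, so more steps would be needed there.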


Epidemics on networks
• The homogeneous mixing model is a good approximation of virus propagation in a population where contact among individuals is homogeneous, i.e., each individual is equally likely to encounter any other
– Public spaces: airports, shopping centers, …
– Schools
– Public transportation
• But social interactions are usually structured
– What role does network structure play in epidemic spread?
– How does the size of cascades depend on network properties?


Model of epidemic cascades on a network


Simulations on real and synthetic graphs
• Simulate epidemics on
– Real-world networks
– Scale-free graphs (power-law degree distribution)
– Random graphs (Poisson degree distribution)
• Results are the same as for the homogeneous mixing model
• Simulation steps
– Start with a set of randomly chosen infected nodes
– At each time step
• Each infected node attempts to infect each neighbor (probability $\beta$)
• Each infected node is cured (probability $\delta$)
– Continue until the number of infected nodes no longer changes
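The simulation steps listed above translate into a short loop. This is a minimal sketch, assuming synchronous updates and an adjacency-dict network; `simulate_sis` is a hypothetical name. Because the infected count of a stochastic epidemic keeps fluctuating above threshold, the sketch runs for a fixed number of `steps` (stopping early if the epidemic dies out) instead of waiting for the count to stop changing.

```python
import random


def simulate_sis(neighbors, beta, delta, n_seeds, steps, seed=0):
    """Discrete-time epidemic simulation following the slide's steps.

    Start with n_seeds randomly chosen infected nodes. At each time step,
    every infected node tries to infect each uninfected neighbor with
    probability beta and is cured with probability delta; both updates
    use the infected set from the start of the step.
    Returns the number of infected nodes after each step.
    """
    rng = random.Random(seed)
    infected = set(rng.sample(list(neighbors), n_seeds))
    history = [len(infected)]
    for _ in range(steps):
        newly_infected = set()
        for u in infected:
            for v in neighbors[u]:
                if v not in infected and rng.random() < beta:
                    newly_infected.add(v)
        cured = {u for u in infected if rng.random() < delta}
        infected = (infected - cured) | newly_infected
        history.append(len(infected))
        if not infected:  # epidemic died out; nothing more can change
            break
    return history
```

Running it on real, scale-free, and random graphs with the same β and δ and plotting `history` gives cascade-size-vs-time curves of the kind shown on the next slides.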


Simulation results on a real-world network
• Simulations on the 10,900-node Oregon network graph, with $\langle k \rangle = 5.72$, $\delta = 0.14$

[Figure: cascade size vs. time for $\beta/\delta = 1.75$ and $\beta/\delta = 0.58$]


Epidemic threshold


Epidemic threshold and cascade growth

[Figure: cascade growth over time for $\beta/\delta$ = 0.4, 0.2, 0.13, 0.1, and 0.06]


Epidemic threshold and cascade size

[Figure: number of infected nodes (from 0 up to N) vs. effective transmissibility $\beta/\delta$, with the epidemic threshold marked]


Summary
• A variety of models have been proposed to explain cascading behavior on networks
– Some models explain the relationship between properties of the network and properties of cascades, e.g., the epidemic threshold depends on the largest eigenvalue of the graph's adjacency matrix
– Some models can produce global cascades
• What do the data say?


The Structure of Online Diffusion Networks
SHARAD GOEL, Yahoo! Research
DUNCAN J. WATTS, Yahoo! Research
DANIEL G. GOLDSTEIN, Yahoo! Research


“A relatively small number of seeds can trigger a relatively large number of adoptions via some, usually multistep, diffusion process”


• How often?
• How much?
• Is it worth it?


Findings
• Most cascades are small and shallow
• Most adoptions lie in such cascades
• It is rare for adoptions to result from chains of referrals
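The "small and shallow" finding rests on two per-cascade measures, size (number of adopters) and depth (longest referral chain). This sketch assumes each adoption records at most one referrer, so the data form a forest of parent pointers; the names `cascade_stats` and `share_of_adoptions` and the toy data are hypothetical.

```python
from collections import defaultdict


def cascade_stats(parent):
    """Size and depth of each cascade in a diffusion forest.

    parent maps every adopter to the adopter who referred them, or to None
    for a seed (an adoption with no observed referrer). Each seed roots
    one cascade; depth is the longest referral chain from the seed, so a
    lone seed has depth 0. Returns {root: (size, depth)}.
    """
    stats = defaultdict(lambda: [0, 0])
    for node in parent:
        # Walk up the referral chain to find the root and this node's depth.
        depth, cur = 0, node
        while parent[cur] is not None:
            cur = parent[cur]
            depth += 1
        entry = stats[cur]
        entry[0] += 1
        entry[1] = max(entry[1], depth)
    return {root: tuple(v) for root, v in stats.items()}


def share_of_adoptions(stats, max_size):
    """Fraction of all adoptions that belong to cascades of size <= max_size."""
    total = sum(size for size, _ in stats.values())
    small = sum(size for size, _ in stats.values() if size <= max_size)
    return small / total


if __name__ == "__main__":
    toy = {"a": None, "b": "a", "c": "b", "d": None, "e": None, "f": "e"}
    st = cascade_stats(toy)
    print(st, share_of_adoptions(st, 2))
```

Walking up the chain for every node costs time proportional to depth, which is cheap precisely because real cascades tend to be shallow.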


7 Different Sources

Yahoo! Kindness
• During a one-month period in 2010, Yahoo!’s philanthropic arm launched a website (kindness.yahoo.com)
• 59,000 users adopted the campaign


Zync: a plug-in for Yahoo! Messenger, an instant messaging (IM) application, that allows pairs of users to watch videos synchronously while sending instant messages to one another.


The Secretary Game: players are encouraged to share the game’s URL with at least three other people, with an explanation that the game designers are seeking the world’s best players.


Twitter News Stories: 80,000 news stories posted on Twitter during November 2011, where the original article was distributed by one of five popular news sites: The New York Times, CNN, MSNBC, Yahoo! News, and The Huffington Post.
• Tweeted → Adopted


Twitter Videos: 540,000 YouTube videos posted on Twitter during November 2011
• Tweeted → Adopted


Friend Sense: a third-party Facebook application that queried respondents about their political views as well as their beliefs about their friends’ political views


Yahoo! Voice: a paid service launched in 2004 that allows users to make voice-over-IP calls to phones through Yahoo! Messenger.
• 1.8 million users purchased voice credits; these users are defined as adopters


Data sources varied in
• Cost
• Nature of the network
• Incentive
• Timescale


“The usual intuition regarding heavy-tailed distributions, however, is that large events, although rare, are sufficiently large to dominate certain key properties of the corresponding system.”


Authors’ point of view
• Diffusion on online social networks does not really follow epidemic models.
• Researchers should focus on subcritical processes.


Authors’ point of view
• What accounts for the sudden popularity of some YouTube videos or products like Gmail and Facebook?
– Mass media and traditional advertising.


Implementation Details (Time Permitting)
• MapReduce parallel computation framework
• Tree canonicalization
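Tree canonicalization is listed above without details. One standard way to do it, swapped in here as an illustration (the slides do not say which scheme the authors used), is the AHU (Aho, Hopcroft & Ullman) encoding for rooted, unordered trees: two cascades receive the same string exactly when their trees are isomorphic, so the strings can serve as keys when counting cascade structures. In a MapReduce job, a map step could emit one canonical string per cascade and a reduce step could sum the counts; `Counter` plays that role in this single-machine sketch. The names `canonical_form` and `count_structures` are hypothetical.

```python
from collections import Counter


def canonical_form(children, root):
    """AHU-style canonical string for a rooted, unordered tree.

    children maps each node to a list of its children (leaves may be
    absent). Sorting the children's strings makes the result independent
    of child order, so isomorphic trees get identical strings.
    """
    kids = sorted(canonical_form(children, c) for c in children.get(root, []))
    return "(" + "".join(kids) + ")"


def count_structures(cascades):
    """Count distinct cascade shapes; cascades is a list of (children, root)."""
    return Counter(canonical_form(ch, r) for ch, r in cascades)


if __name__ == "__main__":
    star = ({0: [1, 2, 3]}, 0)
    chain = ({0: [1], 1: [2]}, 0)
    print(count_structures([star, chain, star]))
```

The recursion depth equals the cascade depth, which is small for the shallow cascades reported in the findings; very deep trees would need an iterative version.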


Questions?! Comments!

Thanks for listening