Towards Maximising Cross-Community Information Diffusion


Page 1: Towards Maximising Cross-Community Information Diffusion

© Copyright 2011 Digital Enterprise Research Institute. All rights reserved.

Digital Enterprise Research Institute www.deri.ie

Enabling Networked Knowledge

Towards Cross-Community Information Diffusion Maximisation

Václav Belák, Samantha Lam, Conor Hayes

Page 2

Motivation

•  Information cascades are of high interest in marketing, CRM, etc.

•  A common approach is to maximise information diffusion by targeting influential actors

•  In the context of many online communities (e.g. discussion fora), information is shared with the community as a whole rather than with individual actors

[Figure: common case, targeting individuals; cross-community case, targeting communities]

Page 3

Objectives

•  Our main hypothesis is that it is possible to efficiently spread a message over the information flow network by targeting highly influential communities

•  The main problem is then formulated as predicting the set of communities to target such that the message spreads over the network as much as possible, measured as:
    •  spread over the actors, i.e. the user activation fraction
    •  spread over the communities, i.e. the community activation fraction
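
The two spread measures can be sketched as follows. This is a minimal illustration, not the authors' code: the helper names are hypothetical, and the rule that a community counts as activated once at least one of its members is active is an assumption, since the slides do not state the criterion.

```python
from typing import Dict, Set


def user_activation_fraction(active: Set[str], users: Set[str]) -> float:
    """u: share of all users that ended up active after the cascade."""
    return len(active & users) / len(users) if users else 0.0


def community_activation_fraction(active: Set[str],
                                  communities: Dict[str, Set[str]]) -> float:
    """c: share of communities counted as activated.
    Assumption (not from the slides): one active member activates a community."""
    if not communities:
        return 0.0
    hit = sum(1 for members in communities.values() if members & active)
    return hit / len(communities)


users = {"a", "b", "c", "d"}
communities = {"C1": {"a", "b"}, "C2": {"c"}, "C3": {"d"}}
active = {"a", "b"}
print(user_activation_fraction(active, users))  # 0.5
print(round(community_activation_fraction(active, communities), 3))  # 0.333
```

With fuzzy (degree-of-membership) communities, a membership threshold would be needed to decide who counts as a member; that choice is also left open here.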

Page 4

Methods: Definition of Impact

•  We propose (Belák et al., ’12) to take two factors into account:
    1.  the degree of community membership of the users
    2.  the centrality of the users within each community

•  The impact of community A on community B is defined as the average centrality of the actors from A within B, weighted by their membership in A
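
Read literally, the definition above is a membership-weighted mean: the impact of A on B is the sum over users of (membership in A × centrality within B), divided by the sum of memberships in A. A minimal sketch under that reading follows; the data layout and function names are hypothetical, and the centrality measure is left open, as it is on the slide.

```python
from typing import Dict, Iterable

# Hypothetical data layout:
#   membership[user][community] = degree of membership in [0, 1]
#   centrality[user][community] = centrality of the user within that community
Table = Dict[str, Dict[str, float]]


def impact(a: str, b: str, membership: Table, centrality: Table) -> float:
    """Average centrality within b of the users of a, weighted by membership in a."""
    num = den = 0.0
    for user, degrees in membership.items():
        w = degrees.get(a, 0.0)
        if w > 0.0:
            num += w * centrality.get(user, {}).get(b, 0.0)
            den += w
    return num / den if den > 0.0 else 0.0


def impact_matrix(communities: Iterable[str], membership: Table,
                  centrality: Table) -> Table:
    """Row a, column b holds the impact of community a on community b."""
    cs = list(communities)
    return {a: {b: impact(a, b, membership, centrality) for b in cs} for a in cs}


membership = {"u1": {"A": 1.0}, "u2": {"A": 0.5, "B": 0.5}, "u3": {"B": 1.0}}
centrality = {"u1": {"B": 0.2}, "u2": {"B": 0.8}, "u3": {"B": 0.4}}
# (1.0 * 0.2 + 0.5 * 0.8) / (1.0 + 0.5) = 0.4
print(round(impact("A", "B", membership, centrality), 3))  # 0.4
```

Users with no membership in A contribute nothing to A's row, so a community's impact is driven by how central its own members are elsewhere.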

Page 5

Methods: Targeting Communities

•  The level of dispersion (heterogeneity) of the total impact of community i can be measured as the entropy of the i-th row/column of the impact matrix

•  We propose to rank communities for targeting by their impact focus (IF), the product of the total impact of community i and its entropy

•  We simulated the diffusion by extending the Independent Cascade Model (ICM) and the Linear Threshold Model (LTM) (Kempe et al., ’03):
    1.  Take q target communities and sample s users from each of them
    2.  Run the original models from the union of the sampled users

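•  The information diffusion network is derived from the reply-to network: if user i replies to user j, information flows from j to i

[Figure: reply-to edge "i replies to j" and the corresponding weighted information-flow edge from j to i (edge labels w and r in the original diagram)]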
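
The simulation steps above can be sketched as follows: seeds are sampled from the q target communities and the unmodified ICM and LTM (Kempe et al., ’03) are run from their union. This is an illustrative sketch, not the authors' implementation. Two points are assumptions: the weight of the flow edge from j to i is taken to be the share of i's replies addressed to j (so the weights entering each user sum to 1, as the LTM requires), and the same weights are reused as ICM activation probabilities. All function names are hypothetical.

```python
import random
from collections import defaultdict
from typing import Dict, Iterable, List, Set, Tuple

Graph = Dict[str, Dict[str, float]]  # source -> {target: edge weight}


def flow_network(replies: Iterable[Tuple[str, str]]) -> Graph:
    """(i, j) means "i replies to j"; information is taken to flow from j to i.
    Assumed weight: w[j][i] = replies from i to j / all replies written by i."""
    counts: Dict[str, Dict[str, int]] = defaultdict(lambda: defaultdict(int))
    written: Dict[str, int] = defaultdict(int)
    for i, j in replies:
        if i != j:  # self-replies carry no information between users
            counts[j][i] += 1
            written[i] += 1
    return {j: {i: n / written[i] for i, n in row.items()}
            for j, row in counts.items()}


def sample_seeds(targets: List[str], communities: Dict[str, Set[str]],
                 s: int, rng: random.Random) -> Set[str]:
    """Step 1: sample s users from each target community; return their union."""
    seeds: Set[str] = set()
    for c in targets:
        members = sorted(communities[c])  # sorted for reproducible sampling
        seeds.update(rng.sample(members, min(s, len(members))))
    return seeds


def independent_cascade(g: Graph, seeds: Set[str], rng: random.Random) -> Set[str]:
    """Step 2, ICM: every newly activated u gets one chance to activate each
    inactive out-neighbour v, succeeding with probability g[u][v]."""
    active, frontier = set(seeds), sorted(seeds)
    while frontier:
        newly: List[str] = []
        for u in frontier:
            for v, p in g.get(u, {}).items():
                if v not in active and rng.random() < p:
                    active.add(v)
                    newly.append(v)
        frontier = newly
    return active


def linear_threshold(g: Graph, seeds: Set[str], rng: random.Random) -> Set[str]:
    """Step 2, LTM: each user draws a threshold uniformly from (0, 1] and becomes
    active once the summed weight of its active in-neighbours reaches it."""
    incoming: Dict[str, Dict[str, float]] = defaultdict(dict)
    for u, row in g.items():
        for v, w in row.items():
            incoming[v][u] = w
    nodes = sorted(set(g) | set(incoming) | set(seeds))
    theta = {v: 1.0 - rng.random() for v in nodes}
    active = set(seeds)
    changed = True
    while changed:  # repeat until no further user can be activated
        changed = False
        for v in nodes:
            if v in active:
                continue
            pressure = sum(w for u, w in incoming[v].items() if u in active)
            if pressure >= theta[v]:
                active.add(v)
                changed = True
    return active


replies = [("b", "a"), ("c", "b"), ("c", "a")]  # toy data: b replied to a, ...
g = flow_network(replies)  # {'a': {'b': 1.0, 'c': 0.5}, 'b': {'c': 0.5}}
rng = random.Random(42)
seeds = sample_seeds(["C1"], {"C1": {"a"}, "C2": {"c"}}, s=1, rng=rng)
print(sorted(linear_threshold(g, seeds, rng)))  # ['a', 'b', 'c']
```

A single run is random, so the user and community activation fractions would normally be averaged over many runs for each set of targets.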

Page 6

Evaluation Strategy

•  IF was compared with random targeting (R) and group in-degree (GI) (Everett & Borgatti, ’99)

•  The main aim was to investigate the robustness of our framework with respect to:
    •  the character of the system
    •  the diffusion models
    •  the user and community activation fractions

•  Procedural outline:
    1.  Target q communities using one of the heuristics evaluated on the data from time-slice t
    2.  Run the diffusion model on the network from time-slice t+1
    3.  Compute the average user and community spreads over all pairs (t, t+1)
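
The three targeting strategies and the procedural outline can be sketched as below. This is a hedged reading, not the authors' code. Assumptions: IF uses row i of the impact matrix (the impact of i on the other communities) with the diagonal skipped and natural-log entropy; GI is computed on the reply-to network (edge i → j meaning i replied to j) and counts distinct outsiders with an edge into the community; evaluate() takes hypothetical callables for choosing targets on slice t and measuring the (u, c) spread on slice t+1.

```python
import math
import random
from statistics import mean
from typing import Callable, Dict, Iterable, List, Sequence, Set, Tuple


def _entropy(values: List[float]) -> float:
    """Shannon entropy (natural log) of non-negative values normalised to sum to 1."""
    total = sum(values)
    if total <= 0:
        return 0.0
    return -sum((v / total) * math.log(v / total) for v in values if v > 0)


def impact_focus(impact: Dict[str, Dict[str, float]], q: int) -> List[str]:
    """IF: total impact of i times the entropy of row i (diagonal skipped)."""
    score = {}
    for i, row in impact.items():
        out = [v for j, v in row.items() if j != i]
        score[i] = sum(out) * _entropy(out)
    return sorted(score, key=lambda c: (-score[c], c))[:q]


def group_in_degree(replies_to: Dict[str, Iterable[str]],
                    communities: Dict[str, Set[str]], q: int) -> List[str]:
    """GI: number of distinct non-members with at least one edge into the group."""
    score = {c: sum(1 for u, vs in replies_to.items()
                    if u not in m and any(v in m for v in vs))
             for c, m in communities.items()}
    return sorted(score, key=lambda c: (-score[c], c))[:q]


def random_targets(communities: Dict[str, Set[str]], q: int,
                   rng: random.Random) -> List[str]:
    """R: q communities chosen uniformly at random."""
    return rng.sample(sorted(communities), q)


def evaluate(snapshots: Sequence[object],
             choose: Callable[[object], List[str]],
             spread: Callable[[object, List[str]], Tuple[float, float]]
             ) -> Tuple[float, float]:
    """Average (user, community) spread over all pairs (t, t+1): targets are
    chosen on slice t, the diffusion runs on slice t+1. Needs >= 2 slices."""
    pairs = [spread(nxt, choose(prev))
             for prev, nxt in zip(snapshots, snapshots[1:])]
    return mean(u for u, _ in pairs), mean(c for _, c in pairs)


impact = {"A": {"A": 0.9, "B": 0.3, "C": 0.3},
          "B": {"A": 0.6, "B": 0.9, "C": 0.0},
          "C": {"A": 0.1, "B": 0.1, "C": 0.9}}
print(impact_focus(impact, q=2))  # ['A', 'C']
```

Under this reading, the entropy factor is zero when all of a community's impact falls on a single other community (B in the example), so IF favours communities whose impact is both large and spread across many communities.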

Page 7

Evaluation Data-Sets

•  Boards: 51 weeks of data from the largest Irish discussion board system
    •  Segmented using a 1-week sliding window
    •  A 1-week window represents approx. 84% of the cross-fora posting activity
    •  540 communities and 5.3k users per snapshot on average

•  SAP: 5 years of data from the technical support fora of SAP
    •  Used only for the diffusion experiments
    •  Segmented using a 2-month sliding window
    •  A 2-month window represents approx. 50% of the cross-fora posting activity
    •  33 communities and 2k users per snapshot on average
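
Splitting a posting history into snapshots can be sketched as follows. The slides give only the window lengths (1 week for Boards, 2 months for SAP); the step between consecutive windows is not stated, so it is a parameter here, and the record layout and function name are hypothetical.

```python
from datetime import datetime, timedelta
from typing import List, Tuple

Post = Tuple[datetime, str, str]  # hypothetical record: (timestamp, replier, replied-to user)


def sliding_windows(posts: List[Post], window: timedelta,
                    step: timedelta) -> List[List[Post]]:
    """Cut the reply log into snapshots covering [start, start + window),
    with a new window starting every `step` (step == window: no overlap)."""
    if step <= timedelta(0):
        raise ValueError("step must be positive")
    if not posts:
        return []
    posts = sorted(posts)
    start, end = posts[0][0], posts[-1][0]
    snapshots: List[List[Post]] = []
    while start <= end:
        stop = start + window
        snapshots.append([p for p in posts if start <= p[0] < stop])
        start += step
    return snapshots


log = [(datetime(2012, 1, 1), "x", "y"), (datetime(2012, 1, 2), "y", "x"),
       (datetime(2012, 1, 9), "z", "x"), (datetime(2012, 1, 16), "x", "z")]
weekly = sliding_windows(log, window=timedelta(weeks=1), step=timedelta(weeks=1))
print([len(s) for s in weekly])  # [2, 1, 1]
```

Since `timedelta` has no month unit, a 2-month window for SAP would have to be approximated in days (e.g. `timedelta(days=61)`). The list comprehension rescans the whole log per window, which is fine for a sketch but quadratic on large logs.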

Page 8

User Activation Fraction: One Targeted Community

[Figure: mean user activation fraction (u) against user sample size (s) for IF, GI and R; panels: q=1, Boards−LTM and q=1, SAP−LTM]

Page 9

Community Activation Fraction: One Targeted Community

[Figure: mean community activation fraction (c) against user sample size (s) for IF, GI and R; panels: q=1, Boards−LTM and q=1, SAP−LTM]

Page 10

Community Activation Fraction: Five Targeted Communities

[Figure: mean community activation fraction (c) against user sample size (s) for IF, GI and R; panels: q=5, Boards−LTM and q=5, SAP−LTM]

Page 11

Results Highlights

•  The diffusion process saturated at approximately 80% of users or communities in Boards and 30% in SAP

•  It was more efficient to target a small number of communities

•  Impact Focus outperformed the other two strategies with respect to both user and community activation fractions, especially for a small number of targeted communities (q ∈ [1, 2]) and seed users (s ∈ [1, 20])

•  Diminishing returns: additional targeted communities and seed users yielded progressively smaller gains in spread

•  For a high number of targeted communities and seed users, the random strategy outperformed the other two with respect to the community activation fraction in the SAP data-set
    •  The SAP network was fragmented into many small components, which made it hard to reach peripheral communities

Page 12

Conclusion

•  The evaluation demonstrated that the framework:
    •  is able to identify highly influential communities
    •  can predict which communities to target such that the message spreads efficiently over both individual users and communities

•  We aim to extend it with content analysis
    •  e.g. which are the most influential communities with respect to a particular topic?

•  We will also investigate empirically observed topic cascades and modify our models accordingly if needed

Page 13

Questions?

References
•  V. Belák, S. Lam, and C. Hayes. Cross-community influence in discussion fora. ICWSM. AAAI, 2012.
•  M. Everett and S. Borgatti. The centrality of groups and classes. J. of Mathematical Sociology, 23(3):181–201, 1999.
•  D. Kempe, J. Kleinberg, and É. Tardos. Maximizing the spread of influence through a social network. SIGKDD. ACM, 2003.