debunking the myths of influence maximizationmitdbg.github.io/nedbday/2017/talks/galhotr.pdf ·...

33
Debunking the Myths of Influence Maximization Akhil Arora 1 , Sainyam Galhotra 1 , Sayan Ranu [email protected] University of Massachusetts, Amherst January 27, 2017 NEDB, 2017 1 The first two authors have contributed equally to this work. Influence Maximization January 27, 2017 NEDB, 2017 1 / 19

Upload: others

Post on 09-Jun-2020

1 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: Debunking the Myths of Influence Maximizationmitdbg.github.io/nedbday/2017/talks/galhotr.pdf · 2020-01-25 · Debunking the Myths of Influence Maximization Akhil Arora1, Sainyam

Debunking the Myths of InfluenceMaximization

Akhil Arora1, Sainyam Galhotra1, Sayan Ranu

[email protected] of Massachusetts, Amherst

January 27, 2017NEDB, 2017

1The first two authors have contributed equally to this work.Influence Maximization January 27, 2017 NEDB, 2017 1 / 19

Page 2: Debunking the Myths of Influence Maximizationmitdbg.github.io/nedbday/2017/talks/galhotr.pdf · 2020-01-25 · Debunking the Myths of Influence Maximization Akhil Arora1, Sainyam

Information Propagation2: Need for Modelling??

Many real-world processes can be interpreted using conceptsfrom information propagationFor example: Spread of Diseases

2Propagation/Flow/Spread/Diffusion, would be used interchangeablyInfluence Maximization January 27, 2017 NEDB, 2017 2 / 19

Page 3: Debunking the Myths of Influence Maximizationmitdbg.github.io/nedbday/2017/talks/galhotr.pdf · 2020-01-25 · Debunking the Myths of Influence Maximization Akhil Arora1, Sainyam

Need for Modelling??

Traffic Congestion and its propagation

Influence Maximization January 27, 2017 NEDB, 2017 3 / 19

Page 4: Debunking the Myths of Influence Maximizationmitdbg.github.io/nedbday/2017/talks/galhotr.pdf · 2020-01-25 · Debunking the Myths of Influence Maximization Akhil Arora1, Sainyam

Other Applications

Using the word-of-mouth effect for:

Viral Marketing: Product/Topic/Event promotionManaging Celebrity/Political campaigns

Detect and Prevent Outbreaks/Epidemics/RumoursMany more . . .

Influence Maximization January 27, 2017 NEDB, 2017 4 / 19

Page 5: Debunking the Myths of Influence Maximizationmitdbg.github.io/nedbday/2017/talks/galhotr.pdf · 2020-01-25 · Debunking the Myths of Influence Maximization Akhil Arora1, Sainyam

Other Applications

Using the word-of-mouth effect for:Viral Marketing: Product/Topic/Event promotionManaging Celebrity/Political campaigns

Detect and Prevent Outbreaks/Epidemics/RumoursMany more . . .

Influence Maximization January 27, 2017 NEDB, 2017 4 / 19

Page 6: Debunking the Myths of Influence Maximizationmitdbg.github.io/nedbday/2017/talks/galhotr.pdf · 2020-01-25 · Debunking the Myths of Influence Maximization Akhil Arora1, Sainyam

Other Applications

Using the word-of-mouth effect for:Viral Marketing: Product/Topic/Event promotionManaging Celebrity/Political campaigns

Detect and Prevent Outbreaks/Epidemics/RumoursMany more . . .

Influence Maximization January 27, 2017 NEDB, 2017 4 / 19

Page 7: Debunking the Myths of Influence Maximizationmitdbg.github.io/nedbday/2017/talks/galhotr.pdf · 2020-01-25 · Debunking the Myths of Influence Maximization Akhil Arora1, Sainyam

Existing Information Propagation models

Independent Cascade (IC) and Weighted Cascade (WC) ModelsLinear Threshold (LT) ModelOther models – Heat Diffusion etc.

0.5 0.2

0.7

0.1

0.1

0.3

0.6

0.4

0.2

0.4

A

B C

D

E

F

G

H

Influence Maximization January 27, 2017 NEDB, 2017 5 / 19

Page 8: Debunking the Myths of Influence Maximizationmitdbg.github.io/nedbday/2017/talks/galhotr.pdf · 2020-01-25 · Debunking the Myths of Influence Maximization Akhil Arora1, Sainyam

Existing Information Propagation models

Independent Cascade (IC) and Weighted Cascade (WC) ModelsLinear Threshold (LT) ModelOther models – Heat Diffusion etc.

0.5 0.2

0.7

0.1

0.1

0.3

0.6

0.4

0.2

0.4

A

B C

D

E

F

G

H

Influence Maximization January 27, 2017 NEDB, 2017 5 / 19

Page 9: Debunking the Myths of Influence Maximizationmitdbg.github.io/nedbday/2017/talks/galhotr.pdf · 2020-01-25 · Debunking the Myths of Influence Maximization Akhil Arora1, Sainyam

Existing Information Propagation models

0.5 0.2

0.7

0.1

0.1

0.3

0.6

0.4

0.2

0.4

A

B

D

C

E

F

G

H

Influence Maximization January 27, 2017 NEDB, 2017 6 / 19

Page 10: Debunking the Myths of Influence Maximizationmitdbg.github.io/nedbday/2017/talks/galhotr.pdf · 2020-01-25 · Debunking the Myths of Influence Maximization Akhil Arora1, Sainyam

Existing Information Propagation models

0.5 0.2

0.7

0.1

0.1

0.3

0.6

0.4

0.2

0.4

A

B C

D

F

E

G

H

Influence Maximization January 27, 2017 NEDB, 2017 7 / 19

Page 11: Debunking the Myths of Influence Maximizationmitdbg.github.io/nedbday/2017/talks/galhotr.pdf · 2020-01-25 · Debunking the Myths of Influence Maximization Akhil Arora1, Sainyam

Existing Information Propagation models

0.5 0.2

0.7

0.1

0.1

0.3

0.6

0.4

0.2

0.4

A

B C

D

E

F

G

H

Influence Maximization January 27, 2017 NEDB, 2017 8 / 19

Page 12: Debunking the Myths of Influence Maximizationmitdbg.github.io/nedbday/2017/talks/galhotr.pdf · 2020-01-25 · Debunking the Myths of Influence Maximization Akhil Arora1, Sainyam

The Influence Maximization (IM) Problem

Input: A graph G, an information-diffusion model IConstraints: The budget (k = |S|) defining the size of the seed-set

Task: Identify the set of most-influential nodes in a networkMaximize σ(S) = E[F(S)]: Expected number of nodes active at theend, if set S is targeted for initial activation

Tractability: The IM problem is NP-hard. Need for ApproximateSolutions!The spread function σ is Monotone and Submodular, thus, asimple GREEDY algorithm provides the best possible (1− 1/e)approximation

Influence Maximization January 27, 2017 NEDB, 2017 9 / 19

Page 13: Debunking the Myths of Influence Maximizationmitdbg.github.io/nedbday/2017/talks/galhotr.pdf · 2020-01-25 · Debunking the Myths of Influence Maximization Akhil Arora1, Sainyam

The Influence Maximization (IM) Problem

Input: A graph G, an information-diffusion model IConstraints: The budget (k = |S|) defining the size of the seed-setTask: Identify the set of most-influential nodes in a network

Maximize σ(S) = E[F(S)]: Expected number of nodes active at theend, if set S is targeted for initial activation

Tractability: The IM problem is NP-hard. Need for ApproximateSolutions!The spread function σ is Monotone and Submodular, thus, asimple GREEDY algorithm provides the best possible (1− 1/e)approximation

Influence Maximization January 27, 2017 NEDB, 2017 9 / 19

Page 14: Debunking the Myths of Influence Maximizationmitdbg.github.io/nedbday/2017/talks/galhotr.pdf · 2020-01-25 · Debunking the Myths of Influence Maximization Akhil Arora1, Sainyam

The Influence Maximization (IM) Problem

Input: A graph G, an information-diffusion model IConstraints: The budget (k = |S|) defining the size of the seed-setTask: Identify the set of most-influential nodes in a network

Maximize σ(S) = E[F(S)]: Expected number of nodes active at theend, if set S is targeted for initial activation

Tractability: The IM problem is NP-hard. Need for ApproximateSolutions!The spread function σ is Monotone and Submodular, thus, asimple GREEDY algorithm provides the best possible (1− 1/e)approximation

Influence Maximization January 27, 2017 NEDB, 2017 9 / 19

Page 15: Debunking the Myths of Influence Maximizationmitdbg.github.io/nedbday/2017/talks/galhotr.pdf · 2020-01-25 · Debunking the Myths of Influence Maximization Akhil Arora1, Sainyam

Need for benchmarking? Wide variety of techniques

MC Simulation1. Run MC Simulation from each node to estimate its spread.2. Exploit submodularity to prune out nodes with low spread

SamplingStore a DAG for a sample of nodes and use it to estimate influ-ence

Approximate ScoringEstimate the influence of the nodes using heuristics as exactcomputation is #P - hard

Influence Maximization January 27, 2017 NEDB, 2017 10 / 19

Page 16: Debunking the Myths of Influence Maximizationmitdbg.github.io/nedbday/2017/talks/galhotr.pdf · 2020-01-25 · Debunking the Myths of Influence Maximization Akhil Arora1, Sainyam

Need for benchmarking? Wide variety of techniques

MC Simulation1. Run MC Simulation from each node to estimate its spread.2. Exploit submodularity to prune out nodes with low spread

SamplingStore a DAG for a sample of nodes and use it to estimate influ-ence

Approximate ScoringEstimate the influence of the nodes using heuristics as exactcomputation is #P - hard

Influence Maximization January 27, 2017 NEDB, 2017 10 / 19

Page 17: Debunking the Myths of Influence Maximizationmitdbg.github.io/nedbday/2017/talks/galhotr.pdf · 2020-01-25 · Debunking the Myths of Influence Maximization Akhil Arora1, Sainyam

Need for benchmarking? Wide variety of techniques

MC Simulation1. Run MC Simulation from each node to estimate its spread.2. Exploit submodularity to prune out nodes with low spread

SamplingStore a DAG for a sample of nodes and use it to estimate influ-ence

Approximate ScoringEstimate the influence of the nodes using heuristics as exactcomputation is #P - hard

Influence Maximization January 27, 2017 NEDB, 2017 10 / 19

Page 18: Debunking the Myths of Influence Maximizationmitdbg.github.io/nedbday/2017/talks/galhotr.pdf · 2020-01-25 · Debunking the Myths of Influence Maximization Akhil Arora1, Sainyam

Need for benchmarking? Wide variety of techniques

MC Simulation1. Run MC Simulation from each node to estimate its spread.2. Exploit submodularity to prune out nodes with low spread

SamplingStore a DAG for a sample of nodes and use it to estimate influ-ence

Approximate ScoringEstimate the influence of the nodes using heuristics as exactcomputation is #P - hard

Influence Maximization January 27, 2017 NEDB, 2017 10 / 19

Page 19: Debunking the Myths of Influence Maximizationmitdbg.github.io/nedbday/2017/talks/galhotr.pdf · 2020-01-25 · Debunking the Myths of Influence Maximization Akhil Arora1, Sainyam

Need for benchmarking? : Ambiguities

Existing Literature: Use IC, WC interchangeablyActual scenario: Varied behaviour in terms of the spread of seednodes, efficiency and scalability aspects of different techniques.

0 100 200

102

104

106

Ru

nn

ing

tim

e (

se

cs)

Seeds (k)

ICWC

Figure: IMM (ε = 0.5) for Orkut dataset

Influence Maximization January 27, 2017 NEDB, 2017 11 / 19

Page 20: Debunking the Myths of Influence Maximizationmitdbg.github.io/nedbday/2017/talks/galhotr.pdf · 2020-01-25 · Debunking the Myths of Influence Maximization Akhil Arora1, Sainyam

Need for benchmarking? : Ambiguities

State-of-the-art technique in one aspect behaves the worst inanother aspect of the problem.

0 100 200Seeds (k)

103

104

105

Mem

ory

(MB

)

EaSyIMIMM

(a) Memory

0 100 200Seeds (k)

101

102

103

104

Run

ning

tim

e (s

ecs)

EaSyIMIMM

(b) Running Time

Influence Maximization January 27, 2017 NEDB, 2017 12 / 19

Page 21: Debunking the Myths of Influence Maximizationmitdbg.github.io/nedbday/2017/talks/galhotr.pdf · 2020-01-25 · Debunking the Myths of Influence Maximization Akhil Arora1, Sainyam

Important Questions

How to choose the most appropriate IM technique in a givenspecific scenario?

What does it really mean to claim to be the state-of-the-art?Are the claims made by the recent papers true?

Influence Maximization January 27, 2017 NEDB, 2017 13 / 19

Page 22: Debunking the Myths of Influence Maximizationmitdbg.github.io/nedbday/2017/talks/galhotr.pdf · 2020-01-25 · Debunking the Myths of Influence Maximization Akhil Arora1, Sainyam

Important Questions

How to choose the most appropriate IM technique in a givenspecific scenario?What does it really mean to claim to be the state-of-the-art?

Are the claims made by the recent papers true?

Influence Maximization January 27, 2017 NEDB, 2017 13 / 19

Page 23: Debunking the Myths of Influence Maximizationmitdbg.github.io/nedbday/2017/talks/galhotr.pdf · 2020-01-25 · Debunking the Myths of Influence Maximization Akhil Arora1, Sainyam

Important Questions

How to choose the most appropriate IM technique in a givenspecific scenario?What does it really mean to claim to be the state-of-the-art?Are the claims made by the recent papers true?

Influence Maximization January 27, 2017 NEDB, 2017 13 / 19

Page 24: Debunking the Myths of Influence Maximizationmitdbg.github.io/nedbday/2017/talks/galhotr.pdf · 2020-01-25 · Debunking the Myths of Influence Maximization Akhil Arora1, Sainyam

Our Framework

Generic framework applicable on all techniques.Unified approach to tune the external parameters.

Setu

p

Algorithms Propagation Models Datasets Configurations

IM F

ram

ew

orkSeed Selection Spread Computation

Monte-Carlo (MC)

Simulations

Insig

hts

Eva

luati

on

Quality

Efficiency Scalability

Node

Ord

ering

Influen

ce

Estim

atio

n

Convergence

Robustness

Upda

te

Data

Str

uctu

res

#Parameters

Convergence calculationSelect the parameters which provide the best quality withouthampering the efficiency and scalability of the technique.

Influence Maximization January 27, 2017 NEDB, 2017 14 / 19

Page 25: Debunking the Myths of Influence Maximizationmitdbg.github.io/nedbday/2017/talks/galhotr.pdf · 2020-01-25 · Debunking the Myths of Influence Maximization Akhil Arora1, Sainyam

Our Framework

Generic framework applicable on all techniques.Unified approach to tune the external parameters.

Setu

p

Algorithms Propagation Models Datasets Configurations

IM F

ram

ew

orkSeed Selection Spread Computation

Monte-Carlo (MC)

Simulations

Insig

hts

Eva

luati

on

Quality

Efficiency Scalability

Node

Ord

ering

Influen

ce

Estim

atio

n

Convergence

Robustness

Upda

te

Data

Str

uctu

res

#Parameters

Convergence calculationSelect the parameters which provide the best quality withouthampering the efficiency and scalability of the technique.

Influence Maximization January 27, 2017 NEDB, 2017 14 / 19

Page 26: Debunking the Myths of Influence Maximizationmitdbg.github.io/nedbday/2017/talks/galhotr.pdf · 2020-01-25 · Debunking the Myths of Influence Maximization Akhil Arora1, Sainyam

Myths

IMM is always faster than TIM+?

Model ε (TIM+) ε (IMM) Time (TIM+) Time (IMM) GainIC 0.05 0.05 8582.23 829.6 10.3xLT 0.35 0.1 0.79 1.2 0.65x

Table: Comparison of convergence parameter and running time (secs) for IMM and TIM+

over HepPH dataset for 200 seeds

Influence Maximization January 27, 2017 NEDB, 2017 15 / 19

Page 27: Debunking the Myths of Influence Maximizationmitdbg.github.io/nedbday/2017/talks/galhotr.pdf · 2020-01-25 · Debunking the Myths of Influence Maximization Akhil Arora1, Sainyam

Myths

IMM is always faster than TIM+?

Model ε (TIM+) ε (IMM) Time (TIM+) Time (IMM) GainIC 0.05 0.05 8582.23 829.6 10.3xLT 0.35 0.1 0.79 1.2 0.65x

Table: Comparison of convergence parameter and running time (secs) for IMM and TIM+

over HepPH dataset for 200 seeds

Influence Maximization January 27, 2017 NEDB, 2017 15 / 19

Page 28: Debunking the Myths of Influence Maximizationmitdbg.github.io/nedbday/2017/talks/galhotr.pdf · 2020-01-25 · Debunking the Myths of Influence Maximization Akhil Arora1, Sainyam

Myths

CELF++ is the fastest IM technique in the MC estimationparadigm?

50

60

70

80

1 2 3 4 5 6 7 8 9 101112

Ru

nn

ing

Tim

e (

in m

in)

Independent Runs (WC)

CELFCELF++

(c) Nethept (WC)

120

130

140

150

160

170

180

1 2 3 4 5 6 7 8 9 101112

Ru

nn

ing

Tim

e (

in m

in)

Independent Runs (LT)

CELFCELF++

(d) Nethept (LT)

Influence Maximization January 27, 2017 NEDB, 2017 16 / 19

Page 29: Debunking the Myths of Influence Maximizationmitdbg.github.io/nedbday/2017/talks/galhotr.pdf · 2020-01-25 · Debunking the Myths of Influence Maximization Akhil Arora1, Sainyam

Myths

CELF++ is the fastest IM technique in the MC estimationparadigm?

50

60

70

80

1 2 3 4 5 6 7 8 9 101112

Ru

nn

ing

Tim

e (

in m

in)

Independent Runs (WC)

CELFCELF++

(e) Nethept (WC)

120

130

140

150

160

170

180

1 2 3 4 5 6 7 8 9 101112

Ru

nn

ing

Tim

e (

in m

in)

Independent Runs (LT)

CELFCELF++

(f) Nethept (LT)

Influence Maximization January 27, 2017 NEDB, 2017 16 / 19

Page 30: Debunking the Myths of Influence Maximizationmitdbg.github.io/nedbday/2017/talks/galhotr.pdf · 2020-01-25 · Debunking the Myths of Influence Maximization Akhil Arora1, Sainyam

Myths

SIMPATH is faster the LDAG?

0

0.2

0.4

0.6

0.8

1

1.2

40 80 120 160 200

Tim

e (

in m

in)

#Seeds (k)

LDAGSIMPATH

(g) Nethept

5

10

15

20

25

30

35

40 80 120 160 200

Tim

e (

in m

in)

#Seeds (k)

LDAGSIMPATH

(h) DBLP

Influence Maximization January 27, 2017 NEDB, 2017 17 / 19

Page 31: Debunking the Myths of Influence Maximizationmitdbg.github.io/nedbday/2017/talks/galhotr.pdf · 2020-01-25 · Debunking the Myths of Influence Maximization Akhil Arora1, Sainyam

Myths

SIMPATH is faster the LDAG?

0

0.2

0.4

0.6

0.8

1

1.2

40 80 120 160 200

Tim

e (

in m

in)

#Seeds (k)

LDAGSIMPATH

(i) Nethept

5

10

15

20

25

30

35

40 80 120 160 200

Tim

e (

in m

in)

#Seeds (k)

LDAGSIMPATH

(j) DBLP

Influence Maximization January 27, 2017 NEDB, 2017 17 / 19

Page 32: Debunking the Myths of Influence Maximizationmitdbg.github.io/nedbday/2017/talks/galhotr.pdf · 2020-01-25 · Debunking the Myths of Influence Maximization Akhil Arora1, Sainyam

Conclusions

No technique is the best on all aspects of IM.

Quality

EfficiencyMemory Footprint

TIM/IMMPMC

CELF/CELF++EaSyIM

ME

IRIEIMRank

StaticGreedy

LDAGSIMPATH

(k) Qualitative catego-rization of IM techniques

(l) Which technique to choose & when?

Influence Maximization January 27, 2017 NEDB, 2017 18 / 19

Page 33: Debunking the Myths of Influence Maximizationmitdbg.github.io/nedbday/2017/talks/galhotr.pdf · 2020-01-25 · Debunking the Myths of Influence Maximization Akhil Arora1, Sainyam

Thanks!

For more details, please refer :A. Arora, S. Galhotra, S. Ranu. Debunking the Myths of InfluenceMaximization : An In-Depth Benchmarking Study. SIGMOD 2017

Influence Maximization January 27, 2017 NEDB, 2017 19 / 19