nonparametric link prediction in dynamic graphs

Nonparametric Link Prediction in Dynamic Graphs. Purnamrita Sarkar (UC Berkeley), Deepayan Chakrabarti (Facebook), Michael Jordan (UC Berkeley)

Upload: deion

Post on 24-Feb-2016


TRANSCRIPT

Page 1: Nonparametric Link Prediction in Dynamic Graphs

Nonparametric Link Prediction in Dynamic Graphs

Purnamrita Sarkar (UC Berkeley), Deepayan Chakrabarti (Facebook), Michael Jordan (UC Berkeley)

Page 2: Nonparametric Link Prediction in Dynamic Graphs

Link Prediction: Who is most likely to interact with a given node?

Friend suggestion in Facebook

Should Facebook suggest Alice as a friend for Bob?

[Figure: nodes Bob and Alice]

Page 3: Nonparametric Link Prediction in Dynamic Graphs

Link Prediction

Movie recommendation in Netflix

Should Netflix suggest this movie to Alice?

[Figure: nodes Alice, Bob, and Charlie]

Page 4: Nonparametric Link Prediction in Dynamic Graphs

Link Prediction

Prediction using simple features:
• degree of a node
• number of common neighbors
• last time a link appeared

What if the graph is dynamic?
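As a minimal sketch of the three simple features above, assuming each snapshot is stored as an adjacency dict (node → set of neighbors) and time steps are 0-indexed, they can be computed as follows. The representation and function names are illustrative choices, not the authors' code.

```python
from typing import Dict, List, Optional, Set

# Hypothetical representation: one undirected snapshot G_t as node -> set of neighbours.
Graph = Dict[int, Set[int]]


def degree(g: Graph, i: int) -> int:
    """Degree of node i in one snapshot."""
    return len(g.get(i, set()))


def common_neighbors(g: Graph, i: int, j: int) -> int:
    """Number of nodes adjacent to both i and j in one snapshot."""
    return len(g.get(i, set()) & g.get(j, set()))


def last_link(snapshots: List[Graph], i: int, j: int) -> Optional[int]:
    """Index t of the most recent snapshot containing edge (i, j), or None if never seen."""
    for t in range(len(snapshots) - 1, -1, -1):
        if j in snapshots[t].get(i, set()):
            return t
    return None


# Usage: edge 0-1 exists only in the first snapshot.
G1 = {0: {1}, 1: {0}}
G2 = {0: {2}, 1: {2}, 2: {0, 1}}
print(degree(G2, 2), common_neighbors(G2, 0, 1), last_link([G1, G2], 0, 1))  # 2 1 0
```

All three features are static per snapshot; only the last-link feature looks across time, which is the gap the dynamic model below addresses.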

Page 5: Nonparametric Link Prediction in Dynamic Graphs

Related Work

Generative models:
• Exponential family random graph models [Hanneke+/’06]
• Dynamics in latent space [Sarkar+/’05]
• Extension of mixed membership block models [Fu+/’10]

Other approaches:
• Autoregressive models for links [Huang+/’09]
• Extensions of static features [Tylenda+/’09]

Page 6: Nonparametric Link Prediction in Dynamic Graphs

Goal

Link Prediction incorporating graph dynamics, requiring weak modeling assumptions, allowing fast predictions, and offering consistency guarantees.

Page 7: Nonparametric Link Prediction in Dynamic Graphs

Outline

• Model
• Estimator
• Consistency
• Scalability
• Experiments

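Page 8: Nonparametric Link Prediction in Dynamic Graphs

The Link Prediction Problem in Dynamic Graphs

[Figure: graph sequence $G_1, G_2, \ldots, G_{T+1}$ with $Y_1(i,j) = 1$, $Y_2(i,j) = 0$, $Y_{T+1}(i,j) = ?$]

$$Y_{T+1}(i,j) \mid G_1, G_2, \ldots, G_T \sim \mathrm{Bernoulli}\big(g_{G_1, G_2, \ldots, G_T}(i,j)\big)$$

Here $Y_{T+1}(i,j)$ indicates an edge at time T+1, and $g$ depends on features of the previous graphs and of this pair of nodes.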

Page 9: Nonparametric Link Prediction in Dynamic Graphs

Including graph-based features

Example set of features for pair (i,j):
• cn(i,j) (common neighbors)
• ℓℓ(i,j) (last time a link was formed)
• deg(j)

Represent dynamics using “datacubes” of these features, i.e. approximately a multi-dimensional histogram on binned feature values.

[Figure: datacube with axes cn, ℓℓ, deg; highlighted cell $1 \le \mathrm{cn} \le 3$, $3 \le \mathrm{deg} \le 6$, $1 \le \ell\ell \le 2$]

For each cell:
• $\eta_t$ = number of pairs in $G_t$ with these features
• $\eta_t^+$ = number of pairs in $G_t$ with these features which had an edge in $G_{t+1}$

A high $\eta_t^+/\eta_t$ means this feature combination is more likely to create a new edge at time t+1.
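A minimal sketch of building one datacube for a transition $G_t \to G_{t+1}$, assuming adjacency-dict snapshots, counting every node pair, and using a hypothetical binning that caps the common-neighbor count at 3. Each cell s maps to the pair $(\eta_t(s), \eta_t^+(s))$; real feature sets and bin boundaries would differ.

```python
from collections import Counter
from itertools import combinations
from typing import Callable, Dict, Hashable, Set, Tuple

Graph = Dict[int, Set[int]]
Datacube = Dict[Hashable, Tuple[int, int]]  # cell s -> (eta_t(s), eta_t^+(s))


def binned_cn(g: Graph) -> Callable[[int, int], Hashable]:
    """Hypothetical feature: common-neighbour count in g, capped at 3 (cells 0, 1, 2, 3)."""
    def feature(i: int, j: int) -> Hashable:
        return min(len(g.get(i, set()) & g.get(j, set())), 3)
    return feature


def build_datacube(g_t: Graph, g_next: Graph,
                   feature: Callable[[int, int], Hashable]) -> Datacube:
    """Count, per feature cell, the pairs of G_t and how many of them are edges in G_{t+1}."""
    eta: Counter = Counter()
    eta_plus: Counter = Counter()
    nodes = sorted(set(g_t) | set(g_next))
    for i, j in combinations(nodes, 2):
        s = feature(i, j)  # binned feature value of pair (i, j) at time t
        eta[s] += 1
        if j in g_next.get(i, set()):  # pair has an edge in G_{t+1}
            eta_plus[s] += 1
    return {s: (eta[s], eta_plus[s]) for s in eta}


# Usage: path 0-2-1 plus isolated node 3 at time t; triangle 0-1-2 at time t+1.
g_t = {0: {2}, 1: {2}, 2: {0, 1}, 3: set()}
g_next = {0: {1, 2}, 1: {0, 2}, 2: {0, 1}, 3: set()}
cube = build_datacube(g_t, g_next, binned_cn(g_t))
print(cube)  # {1: (1, 1), 0: (5, 2)}
```

In this toy example the cell with one common neighbor has ratio 1/1, versus 2/5 for the cell with none, matching the intuition that a high $\eta_t^+/\eta_t$ marks a link-friendly feature combination.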

Page 10: Nonparametric Link Prediction in Dynamic Graphs

Including graph-based features

[Figure: graphs $G_1, G_2, \ldots, G_T$ with $Y_1(i,j) = 1$, $Y_2(i,j) = 0$, $Y_{T+1}(i,j) = ?$; datacube cell $1 \le \mathrm{cn}(i,j) \le 3$, $3 \le \mathrm{deg}(i,j) \le 6$, $1 \le \ell\ell(i,j) \le 2$]

How do we form these datacubes? Vanilla idea: one datacube for each transition $G_t \to G_{t+1}$, aggregated over all pairs (i,j). This does not allow for differently evolving communities.

Page 11: Nonparametric Link Prediction in Dynamic Graphs

Our Model

How do we form these datacubes? Our model: one datacube for each neighborhood. This captures local evolution.

[Figure: graphs $G_1, G_2, \ldots, G_T$ with $Y_1(i,j) = 1$, $Y_2(i,j) = 0$, $Y_{T+1}(i,j) = ?$; datacube cell $1 \le \mathrm{cn}(i,j) \le 3$, $3 \le \mathrm{deg}(i,j) \le 6$, $1 \le \ell\ell(i,j) \le 2$]

Page 12: Nonparametric Link Prediction in Dynamic Graphs

Our Model

Datacube $d_t(i)$, for each feature cell s:
• $\eta_{it}(s)$: number of node pairs with feature s in the neighborhood of i at time t
• $\eta^+_{it}(s)$: number of node pairs with feature s in the neighborhood of i at time t which got connected at time t+1

[Figure: datacube cell $1 \le \mathrm{cn}(i,j) \le 3$, $3 \le \mathrm{deg}(i,j) \le 6$, $1 \le \ell\ell(i,j) \le 2$]

Neighborhood $N_t(i)$ = nodes within 2 hops of i. Features are extracted from $(N_{t-p}, \ldots, N_t)$.
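A minimal sketch of the per-node datacube $d_t(i)$, assuming $N_t(i)$ is found by breadth-first search up to two hops and that a pair counts as "in the neighborhood of i" when both endpoints lie in $N_t(i)$. That pair rule is an assumption of this sketch; the slide does not spell it out. The feature function is passed in, so any binning scheme can be used.

```python
from collections import Counter, deque
from itertools import combinations
from typing import Callable, Dict, Hashable, Set, Tuple

Graph = Dict[int, Set[int]]


def neighborhood(g: Graph, i: int, hops: int = 2) -> Set[int]:
    """N_t(i): nodes within `hops` hops of i in snapshot g (including i itself)."""
    seen = {i}
    frontier = deque([(i, 0)])
    while frontier:
        u, dist = frontier.popleft()
        if dist == hops:
            continue  # do not expand beyond the hop limit
        for v in g.get(u, set()):
            if v not in seen:
                seen.add(v)
                frontier.append((v, dist + 1))
    return seen


def neighborhood_datacube(g_t: Graph, g_next: Graph, i: int,
                          feature: Callable[[int, int], Hashable]
                          ) -> Dict[Hashable, Tuple[int, int]]:
    """d_t(i): cell s -> (eta_it(s), eta_it^+(s)) over pairs with both ends in N_t(i)."""
    eta: Counter = Counter()
    eta_plus: Counter = Counter()
    for j, k in combinations(sorted(neighborhood(g_t, i)), 2):
        s = feature(j, k)
        eta[s] += 1
        if k in g_next.get(j, set()):  # pair (j, k) connected at time t+1
            eta_plus[s] += 1
    return {s: (eta[s], eta_plus[s]) for s in eta}


# Usage: path 0-1-2-3-4 at time t; only edge 0-2 appears at time t+1.
path = {0: {1}, 1: {0, 2}, 2: {1, 3}, 3: {2, 4}, 4: {3}}
print(sorted(neighborhood(path, 0)))  # [0, 1, 2]
print(neighborhood_datacube(path, {0: {2}, 2: {0}}, 0, lambda j, k: 0))  # {0: (3, 1)}
```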

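Page 13: Nonparametric Link Prediction in Dynamic Graphs

Our Model

Datacube $d_t(i)$ captures graph evolution in the local neighborhood of a node in the recent past.

Model:

$$Y_{T+1}(i,j) \mid G_1, G_2, \ldots, G_T \sim \mathrm{Bernoulli}\big(g_{G_1, G_2, \ldots, G_T}(i,j)\big), \qquad g_{G_1, G_2, \ldots, G_T}(i,j) = g\big(d_t(i), s_t(i,j)\big)$$

Here $s_t(i,j)$ are the features of the pair and $d_t(i)$ captures local evolution patterns.

What is $g(\cdot)$?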

Page 14: Nonparametric Link Prediction in Dynamic Graphs

Outline

• Model
• Estimator
• Consistency
• Scalability
• Experiments

Page 15: Nonparametric Link Prediction in Dynamic Graphs

Kernel Estimator for g

[Figure: graphs $G_1, G_2, \ldots, G_{T-2}, G_{T-1}, G_T$. The query is the datacube at $T-1$ together with the feature vector at time $T$; we compute its similarities to the (datacube, feature) pairs from $t = 1, 2, 3, \ldots$]

Page 16: Nonparametric Link Prediction in Dynamic Graphs

Kernel Estimator for g

Factorize the similarity function:

$$K\big((d, s), (d', s')\big) = K(d, d')\,\mathbb{I}\{s = s'\}$$

This allows computation of $g(\cdot)$ via simple lookups.

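Page 17: Nonparametric Link Prediction in Dynamic Graphs

Kernel Estimator for g

[Figure: graphs $G_1, G_2, \ldots, G_{T-2}, G_{T-1}, G_T$ with datacubes at $t = 1, 2, 3, \ldots$; similarities are computed only between datacubes, giving weights $w_1, \ldots, w_4$ for datacubes with counts $(\eta_1, \eta_1^+), \ldots, (\eta_4, \eta_4^+)$]

$$\hat g = \frac{w_1\eta_1^+ + w_2\eta_2^+ + w_3\eta_3^+ + w_4\eta_4^+}{w_1\eta_1 + w_2\eta_2 + w_3\eta_3 + w_4\eta_4}$$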
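A minimal sketch of the weighted ratio for a single query cell s, assuming the kernel weights $w_k = K(\text{query}, d_k)$ are already computed. Because the similarity factorizes, each datacube contributes only its own $(\eta_k(s), \eta_k^+(s))$, which is one table lookup per datacube. Returning 0.0 when no weighted datacube has pairs in cell s is an arbitrary choice for illustration.

```python
from typing import Sequence


def kernel_estimate(weights: Sequence[float],
                    eta: Sequence[int],
                    eta_plus: Sequence[int]) -> float:
    """g_hat = sum_k w_k * eta_k^+ / sum_k w_k * eta_k for one query cell s.

    weights[k]:  kernel similarity between the query datacube and datacube k
    eta[k]:      eta_k(s), pairs in cell s of datacube k
    eta_plus[k]: eta_k^+(s), how many of those pairs formed a link at the next step
    """
    num = sum(w * ep for w, ep in zip(weights, eta_plus))
    den = sum(w * e for w, e in zip(weights, eta))
    return num / den if den > 0 else 0.0  # assumption: no evidence -> 0.0


# Usage with four datacubes; the most similar ones dominate the estimate.
print(kernel_estimate([1.0, 0.5, 0.1, 0.0], [10, 20, 5, 100], [2, 10, 5, 100]))
```

Note how the fourth datacube, despite its large counts, has no influence because its weight is zero.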

Page 18: Nonparametric Link Prediction in Dynamic Graphs

Kernel Estimator for g

Factorize the similarity function:

$$K\big((d, s), (d', s')\big) = K(d, d')\,\mathbb{I}\{s = s'\}$$

This allows computation of $g(\cdot)$ via simple lookups. What is $K(\cdot,\cdot)$?

Page 19: Nonparametric Link Prediction in Dynamic Graphs

Similarity between two datacubes

Idea 1: For each cell s, take $(\eta_1^+/\eta_1 - \eta_2^+/\eta_2)^2$ and sum over cells.

Problem: the magnitude of η is ignored; 5/10 and 50/100 are treated equally.

Consider the distribution instead.

[Figure: two datacubes with counts $(\eta_1, \eta_1^+)$ and $(\eta_2, \eta_2^+)$]
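A minimal sketch of Idea 1, assuming datacubes are dicts from cell to $(\eta, \eta^+)$ and that only cells present and non-empty in both datacubes are compared (how other cells are handled is not specified on the slide). It reproduces the stated problem: 5/10 and 50/100 give distance zero.

```python
from typing import Dict, Tuple

Datacube = Dict[int, Tuple[int, int]]  # cell s -> (eta(s), eta^+(s))


def idea1_distance(c1: Datacube, c2: Datacube) -> float:
    """Sum over shared cells of (eta1^+/eta1 - eta2^+/eta2)^2."""
    total = 0.0
    for s in c1.keys() & c2.keys():
        (n1, p1), (n2, p2) = c1[s], c2[s]
        if n1 > 0 and n2 > 0:  # skip empty cells to avoid division by zero
            total += (p1 / n1 - p2 / n2) ** 2
    return total


# Same rate, very different amounts of evidence: the distance ignores the magnitude of eta.
print(idea1_distance({0: (10, 5)}, {0: (100, 50)}))  # 0.0
```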

Page 20: Nonparametric Link Prediction in Dynamic Graphs

Similarity between two datacubes

Idea 2: For each cell s, compute the posterior distribution of the edge creation probability.

dist = total variation distance between these distributions, summed over all cells.

$$K(d_1, d_2) = b^{\mathrm{dist}(d_1, d_2)}, \qquad 0 < b < 1$$

As b → 0, $K(d_1, d_2) \to 0$ unless $\mathrm{dist}(d_1, d_2) = 0$.

[Figure: posterior distributions for two cells with counts $(\eta_1, \eta_1^+)$ and $(\eta_2, \eta_2^+)$]
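A minimal sketch of Idea 2, assuming (as a later slide suggests) a normal approximation to each cell's posterior, with add-one smoothing so the variance is never zero; the smoothing and the default b = 0.5 are assumptions of this sketch. The total variation distance between two normals is integrated numerically with the midpoint rule, summed over cells present in both datacubes, and plugged into $K = b^{\mathrm{dist}}$.

```python
import math
from typing import Dict, Tuple

Datacube = Dict[int, Tuple[int, int]]  # cell s -> (eta(s), eta^+(s))


def normal_approx(eta: int, eta_plus: int) -> Tuple[float, float]:
    """Mean and std of a normal approximation to the edge-creation probability.

    Assumption: add-one smoothing, so the standard deviation is always positive.
    """
    p = (eta_plus + 1) / (eta + 2)
    return p, math.sqrt(p * (1 - p) / (eta + 2))


def normal_pdf(x: float, mu: float, sd: float) -> float:
    return math.exp(-0.5 * ((x - mu) / sd) ** 2) / (sd * math.sqrt(2 * math.pi))


def tv_normal(mu1: float, sd1: float, mu2: float, sd2: float, steps: int = 20000) -> float:
    """Total variation distance 0.5 * integral |p1(x) - p2(x)| dx, via the midpoint rule."""
    lo = min(mu1 - 10 * sd1, mu2 - 10 * sd2)
    hi = max(mu1 + 10 * sd1, mu2 + 10 * sd2)
    h = (hi - lo) / steps
    total = 0.0
    for k in range(steps):
        x = lo + (k + 0.5) * h
        total += abs(normal_pdf(x, mu1, sd1) - normal_pdf(x, mu2, sd2))
    return 0.5 * h * total


def dist(c1: Datacube, c2: Datacube) -> float:
    """Sum of per-cell total variation distances (only cells present in both datacubes)."""
    return sum(tv_normal(*normal_approx(*c1[s]), *normal_approx(*c2[s]))
               for s in c1.keys() & c2.keys())


def kernel(c1: Datacube, c2: Datacube, b: float = 0.5) -> float:
    """K(d1, d2) = b ** dist(d1, d2) with 0 < b < 1; equals 1 when dist == 0."""
    return b ** dist(c1, c2)


# 5/10 vs 50/100: same rate, but different posterior widths, so the distance is positive.
print(dist({0: (10, 5)}, {0: (100, 50)}) > 0)  # True
```

Unlike Idea 1, the amount of evidence now matters: the 50/100 cell has a much narrower posterior than the 5/10 cell, so the two are no longer treated as identical.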

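Page 21: Nonparametric Link Prediction in Dynamic Graphs

Kernel Estimator for g

$$\hat f(d, s) = \frac{1}{\#}\sum_{i', t'} K\big(d, d_{t'}(i')\big)\,\eta_{i't'}(s), \qquad \hat h(d, s) = \frac{1}{\#}\sum_{i', t'} K\big(d, d_{t'}(i')\big)\,\eta^+_{i't'}(s), \qquad \hat g(d, s) = \frac{\hat h(d, s)}{\hat f(d, s)}$$

Here the sums run over the datacubes $d_{t'}(i')$ and # is the number of terms.

Want to show: $\hat g \to g$.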

Page 22: Nonparametric Link Prediction in Dynamic Graphs

Outline

• Model
• Estimator
• Consistency
• Scalability
• Experiments

Page 23: Nonparametric Link Prediction in Dynamic Graphs

Consistency of Estimator

Lemma 1: As T → ∞, for some R > 0,

Proof using:

$\hat g(\cdot,\cdot)$, $\hat h(\cdot,\cdot)$, $\hat f(\cdot,\cdot)$

As T → ∞,

Page 24: Nonparametric Link Prediction in Dynamic Graphs

Consistency of Estimator

Lemma 2: As T → ∞,

$\hat g(\cdot,\cdot)$, $\hat h(\cdot,\cdot)$, $\hat f(\cdot,\cdot)$

Page 25: Nonparametric Link Prediction in Dynamic Graphs

Consistency of Estimator

Assumption: finite graph.

Proof sketch:
• Dynamics are Markovian with finite state space
• ⇒ the chain must eventually enter a closed, irreducible communication class
• ⇒ geometric ergodicity if the class is aperiodic (if not, more complicated…)
• ⇒ strong mixing with exponential decay
• ⇒ variances decay as o(1/T)

Page 26: Nonparametric Link Prediction in Dynamic Graphs

Consistency of Estimator

Theorem:

Proof Sketch:

for some R>0

So

Page 27: Nonparametric Link Prediction in Dynamic Graphs

Outline

• Model
• Estimator
• Consistency
• Scalability
• Experiments

Page 28: Nonparametric Link Prediction in Dynamic Graphs

Scalability

Full solution: summing over all n datacubes for all T timesteps. This is infeasible.

Approximate solution: sum over the nearest neighbors of the query datacube.

How do we find nearest neighbors? Locality Sensitive Hashing (LSH) [Indyk+/’98, Broder+/’98]
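A minimal sketch of the approximate solution, assuming similarities to all stored datacubes are available so the k nearest ones can be picked by brute force; in the approach described here, LSH replaces this brute-force ranking step. The function name and the k parameter are illustrative.

```python
import heapq
from typing import Sequence


def knn_kernel_estimate(sims: Sequence[float], eta: Sequence[int],
                        eta_plus: Sequence[int], k: int) -> float:
    """Kernel estimate of g restricted to the k datacubes most similar to the query."""
    top = heapq.nlargest(k, range(len(sims)), key=lambda idx: sims[idx])
    num = sum(sims[idx] * eta_plus[idx] for idx in top)
    den = sum(sims[idx] * eta[idx] for idx in top)
    return num / den if den > 0 else 0.0


sims = [0.9, 0.01, 0.8]
eta, eta_plus = [10, 10, 10], [5, 0, 5]
print(knn_kernel_estimate(sims, eta, eta_plus, k=2))  # 0.5 (the near-zero weight is dropped)
print(knn_kernel_estimate(sims, eta, eta_plus, k=3))  # full sum, slightly below 0.5
```

Dropping datacubes with tiny kernel weights changes the estimate only slightly, which is why summing over nearest neighbors is a reasonable approximation.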

Page 29: Nonparametric Link Prediction in Dynamic Graphs

Using LSH

Devise a hashing function for datacubes such that “similar” datacubes tend to be hashed to the same bucket, where “similar” means a small total variation distance between cells of datacubes.

Page 30: Nonparametric Link Prediction in Dynamic Graphs

Using LSH

Step 1: Map datacubes to bit vectors.
• Use B1 buckets to discretize [0,1].
• Use B2 bits for each bucket: for probability mass p, the first ≈ p·B2 bits are set to 1.
• Total M·B1·B2 bits, where M = max number of occupied cells ≪ total number of cells.

[Figure: plot of a cell's distribution]
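A minimal sketch of Step 1 for one cell, assuming the cell's distribution has already been discretized into B1 bucket masses and that "the first ≈ p·B2 bits" means round(p·B2) ones (the exact rounding is an assumption). Because the encoding is unary, the Hamming distance within a bucket equals the difference in the number of ones, so Hamming/B2 approximates the L1 distance. A datacube's vector would concatenate the encodings of its M occupied cells.

```python
from typing import List, Sequence


def encode_cell(masses: Sequence[float], b2: int) -> List[int]:
    """Unary-encode B1 bucket masses: mass p -> round(p * b2) ones followed by zeros."""
    bits: List[int] = []
    for p in masses:
        ones = round(p * b2)
        bits.extend([1] * ones + [0] * (b2 - ones))
    return bits


def hamming(u: Sequence[int], v: Sequence[int]) -> int:
    return sum(a != b for a, b in zip(u, v))


p = [0.5, 0.5, 0.0, 0.0]  # B1 = 4 buckets
q = [0.0, 0.5, 0.5, 0.0]
B2 = 10
l1 = sum(abs(a - b) for a, b in zip(p, q))
print(hamming(encode_cell(p, B2), encode_cell(q, B2)) / B2, l1)  # 1.0 1.0
```

Since the total variation distance is half the L1 distance, small Hamming distance between bit vectors corresponds to small total variation distance between the cells' distributions.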

Page 31: Nonparametric Link Prediction in Dynamic Graphs

Using LSH

Step 1: Map datacubes to bit vectors. Total variation distance = ½ × L1 distance between distributions ≈ Hamming distance between vectors (up to scaling).

Step 2: Hash function = k out of M·B1·B2 bits.

Page 32: Nonparametric Link Prediction in Dynamic Graphs

Fast Search Using LSH

[Figure: datacube bit vectors (e.g. 1111111111000000000111111111000) mapped to short hash keys (e.g. 0011, 1011, 1111)]
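A minimal sketch of bit-sampling LSH for Hamming distance, the standard scheme the slides refer to: each of ℓ hash functions keys a bit vector by k randomly chosen positions, and datacubes that collide in any table become candidate nearest neighbors. The class name, parameters, and seed are illustrative; for simplicity this sketch materializes the bit vectors, which the approach described here avoids.

```python
import random
from collections import defaultdict
from typing import DefaultDict, List, Sequence, Set, Tuple


class BitSamplingLSH:
    """l hash tables; hash function t maps a bit vector to its bits at k sampled positions."""

    def __init__(self, n_bits: int, k: int, l: int, seed: int = 0) -> None:
        rng = random.Random(seed)
        self.positions = [rng.sample(range(n_bits), k) for _ in range(l)]
        self.tables: List[DefaultDict[Tuple[int, ...], List[int]]] = [
            defaultdict(list) for _ in range(l)]

    def _keys(self, bits: Sequence[int]) -> List[Tuple[int, ...]]:
        return [tuple(bits[p] for p in pos) for pos in self.positions]  # O(k * l) work

    def insert(self, item: int, bits: Sequence[int]) -> None:
        for table, key in zip(self.tables, self._keys(bits)):
            table[key].append(item)

    def candidates(self, bits: Sequence[int]) -> Set[int]:
        """Items sharing a bucket with `bits` in at least one table."""
        found: Set[int] = set()
        for table, key in zip(self.tables, self._keys(bits)):
            found.update(table.get(key, []))
        return found


a = [1] * 10 + [0] * 9 + [1] * 10 + [0]
b = a[:-1] + [1]  # differs from a in one bit
lsh = BitSamplingLSH(n_bits=len(a), k=4, l=8)
lsh.insert(0, a)
lsh.insert(1, [0] * len(a))
print(lsh.candidates(b))  # very likely contains 0
```

Larger k makes each table more selective; larger ℓ raises the chance that a true near neighbor collides in at least one table.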

Page 33: Nonparametric Link Prediction in Dynamic Graphs

Outline

• Model
• Estimator
• Consistency
• Scalability
• Experiments

Page 34: Nonparametric Link Prediction in Dynamic Graphs

Experiments

Baselines:
• LL: last link (time of last occurrence of a pair)
• CN: rank by number of common neighbors
• AA: more weight to low-degree common neighbors
• Katz: accounts for longer paths
• CN-all: apply CN to all graphs; AA-all, Katz-all: similar

Page 35: Nonparametric Link Prediction in Dynamic Graphs

Setup

• Pick a random subset S from nodes with degree > 0 in $G_{T+1}$.
• For each s ∈ S, predict a ranked list of nodes likely to link to s.
• Report mean AUC (higher is better).

[Figure: $G_1, G_2, \ldots, G_T$ form the training data; $G_{T+1}$ is the test data]
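A minimal sketch of the per-node AUC, assuming each query node s has a score for every candidate node and that the true neighbors of s in $G_{T+1}$ are known. AUC is computed as the fraction of (neighbor, non-neighbor) pairs ranked correctly, counting ties as one half; the mean over S is then a plain average.

```python
from statistics import mean
from typing import Dict, Set


def auc(scores: Dict[int, float], positives: Set[int]) -> float:
    """Probability that a random true neighbour outranks a random non-neighbour (ties count 1/2)."""
    pos = [v for node, v in scores.items() if node in positives]
    neg = [v for node, v in scores.items() if node not in positives]
    if not pos or not neg:
        return float("nan")  # AUC is undefined without both classes
    wins = sum(1.0 if p > q else 0.5 if p == q else 0.0 for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


scores = {1: 0.9, 2: 0.8, 3: 0.1, 4: 0.5}
print(mean([auc(scores, {1, 2}), auc(scores, {1, 3})]))  # mean of 1.0 and 0.5 -> 0.75
```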

Page 36: Nonparametric Link Prediction in Dynamic Graphs

Simulations: social network model of Hoff et al.

• Each node has an independently drawn feature vector.
• Edge(i,j) depends on the features of i and j.
• Seasonality effect: feature importance varies with season ⇒ different communities in each season.
• Feature vectors evolve smoothly over time ⇒ evolving community structures.

Page 37: Nonparametric Link Prediction in Dynamic Graphs

Simulations

NonParam is much better than others in the presence of seasonality

CN, AA, and Katz implicitly assume smooth evolution

Page 38: Nonparametric Link Prediction in Dynamic Graphs

Sensor Network*

* www.select.cs.cmu.edu/data

Page 39: Nonparametric Link Prediction in Dynamic Graphs

Summary

• Link formation is assumed to depend on the neighborhood’s evolution over a time window.
• Admits a kernel-based estimator, with consistency and scalability via LSH.
• Works particularly well for seasonal effects and differently evolving communities.

Page 40: Nonparametric Link Prediction in Dynamic Graphs

Thanks!

Page 41: Nonparametric Link Prediction in Dynamic Graphs

Problem statement: We are given $\{G_1, G_2, \ldots, G_t\}$ and want to predict $G_{t+1}$.

• Model 1: $Y_{t+1}(i,j) = f\big(Y_{t-p+1}(i,j), \ldots, Y_t(i,j)\big)$. Treats all edges as independent and only looks at one feature.
• Model 2: $G_{t+1} = f(G_{t-p+1}, G_{t-p+2}, \ldots, G_t)$. Huge dimensionality; probably intractable.
• Middle ground: learn a local prediction model for $Y_{t+1}(i,j)$ using a few features, and patch these together to predict the entire graph.

Page 42: Nonparametric Link Prediction in Dynamic Graphs

Our Model

Idea: $Y_{t+1}(i,j)$ depends on features of (i,j) and on the neighborhood of i in the p previous graphs.

Features specific to (i,j) at time t: {deg(i), deg(j), cn(i,j), ℓℓ(i,j)}

Features of the neighborhood of i:
• Should reflect the evolution of the graph.
• But should also be similar to the features of (i,j).
• Should be amenable to fast algorithms.

Page 43: Nonparametric Link Prediction in Dynamic Graphs

Estimation

Kernel Estimator of g

Once you have computed the kernel similarities between datacubes, everything boils down to table lookups.

Page 44: Nonparametric Link Prediction in Dynamic Graphs

Distance between two datacubes

Can just compare rates of link formation, i.e. η+/η, but this does not take into account the variance.

Instead, make a normal approximation to η+/η and look at the total variation distance.

As b → 0, $K(d_t(i), d_{t'}(i')) \to 0$ unless $D(d_t(i), d_{t'}(i')) = 0$.

Page 46: Nonparametric Link Prediction in Dynamic Graphs

Consistency of Estimator

Define B, which kind of behaves like a bias term.

Page 47: Nonparametric Link Prediction in Dynamic Graphs

Consistency of Estimator

Show:

Assumption 1: b → 0 as nT → ∞ [similar to kernel density estimation].

Show that for bounded q,

Assumption 2: Introduce the strong mixing coefficient α(k); roughly, this bounds the degree of dependence between two neighborhoods at distance k.

The total covariance between all neighborhoods is bounded. Assume

Page 48: Nonparametric Link Prediction in Dynamic Graphs

Including graph-based features

[Figure: graphs $G_1, G_2, \ldots, G_T$ with $Y_1(i,j) = 1$, $Y_2(i,j) = 0$, $Y_{T+1}(i,j) = ?$; datacube cell $1 \le \mathrm{cn}(i,j) \le 3$, $3 \le \mathrm{deg}(i,j) \le 6$, $1 \le \ell\ell(i,j) \le 2$]

Idea 1: Make one datacube per $(G_t, G_{t+1})$ transition, and learn how successful this feature combination has been in generating links in the past. Too global.

Idea 2: Make one datacube for each pair of nodes. Too local, not to mention expensive.

Page 49: Nonparametric Link Prediction in Dynamic Graphs

Our Model

Datacube $d_t(i)$ captures the evolution of a small (2-hop) neighborhood around node i.

Close nodes will have overlapping neighborhoods ⇒ similar datacubes.

$$Y_{T+1}(i,j) \mid G_1, G_2, \ldots, G_T \sim \mathrm{Ber}\big(g(d_{T-1}(i), s_T(i,j))\big)$$

[Figure: graphs $G_1, G_2, \ldots, G_T$ with $Y_1(i,j) = 1$, $Y_2(i,j) = 0$, $Y_{T+1}(i,j) = ?$; pair features $s_T(i,j)$ fall in datacube cell $1 \le \mathrm{cn}(i,j) \le 3$, $3 \le \mathrm{deg}(i,j) \le 6$, $1 \le \ell\ell(i,j) \le 2$]

Page 50: Nonparametric Link Prediction in Dynamic Graphs

Building neighborhood features

Let S = range of s(i,j). Assume S is finite.

Datacube $d_t(i)$:
• $\eta_{it}(s)$: number of pairs with feature s in the neighborhood of i at time t
• $\eta^+_{it}(s)$: number of pairs which got connected at time t+1, out of $\eta_{it}(s)$

This captures the evolution of the neighborhood from t to t+1. We use the past evolution pattern of a neighborhood in predicting its future evolution.

But how do we estimate g efficiently? We will show that the inference of g boils down to table lookups in the datacubes $d_t(i)$.

Page 51: Nonparametric Link Prediction in Dynamic Graphs

Kernel Estimator for g

[Figure: graphs $G_1, G_2, \ldots, G_{T-2}, G_{T-1}, G_T$. The query is the datacube at $T-1$ together with the feature vector at time $T$; we compute its similarities to the (datacube, feature) pairs from $t = 1, 2, 3, \ldots$]

Huge # of combinations of (datacube, feature) pairs

Page 52: Nonparametric Link Prediction in Dynamic Graphs

Similarity between two datacubes

Idea 1: For each cell s, take $(\eta_1^+/\eta_1 - \eta_2^+/\eta_2)^2$ and sum. Trouble: we do not take the magnitude of η into account; 3/10 and 12/40 are treated the same way.

Idea 2: For each cell, compute a normal approximation to the posterior of η+/η. Use the total variation distance, summed over all cells.

$$K(d_1, d_2) = b^{\mathrm{dist}(d_1, d_2)}, \qquad 0 < b < 1$$

As b → 0, $K(d_1, d_2) \to 0$ unless $\mathrm{dist}(d_1, d_2) = 0$.

[Figure: cells with $(\eta, \eta^+) = (10, 3)$ and $(40, 12)$ and their approximate posterior distributions]

Page 53: Nonparametric Link Prediction in Dynamic Graphs

Consistency of Estimator

Define bias term B:

$$\hat g - g = \frac{B + \big(\hat h - E[\hat h]\big) - g(\cdot,\cdot)\big(\hat f - E[\hat f]\big)}{\hat f}$$

All we need now is B → 0, and that both $\hat h$ and $\hat f$ are consistent.

Assumption 1: b → 0 as nT → ∞ [similar to kernel density estimation].

Will need some sort of control over the dependency structure.

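Page 54: Nonparametric Link Prediction in Dynamic Graphs

Consistency of Estimator

Forget about the timestep for now.

(A1) Assume the graph has a fixed growth rate ρ, i.e. the number of nodes at distance k from any node is $O(k^{\rho-1})$ ⇒ bounded degree, bounded neighborhood size.

$$\mathrm{var}\Big[\frac{1}{nT}\sum q(\cdot,\cdot)\Big] = \frac{1}{n^2T^2}\sum\sum \mathrm{cov}\big[q(\cdot,\cdot),\, q(\cdot,\cdot)\big]$$

The sums run over {node, timestep}; each term $q(\cdot,\cdot)$ depends on the neighborhood of some node j at some time t′, so the terms can be heavily dependent.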

Page 55: Nonparametric Link Prediction in Dynamic Graphs

Consistency of Estimator, if we forget about time

$$\frac{1}{n^2}\sum\sum \mathrm{cov}\big[q(\cdot,\cdot),\, q(\cdot,\cdot)\big]$$

• #datacubes from overlapping neighborhoods = O(n)
• Neighborhoods k hops away: $O(k^{\rho-1})$ such neighborhoods
• Introduce mixing coefficients α(k) to bound the degree of dependence between two nodes more than k hops away: O(α(k)) covariance per neighborhood. Sufficient to have
• #datacubes from non-overlapping neighborhoods = O(n)

Page 56: Nonparametric Link Prediction in Dynamic Graphs

Adding the time component

Make a stacked graph of nT nodes; the previous analysis holds.

[Figure: stacked graph snapshots, including $G_{t+1}$]

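Page 57: Nonparametric Link Prediction in Dynamic Graphs

Consistency of Estimator

Can show: B → 0.

Plug in $\hat f$ and $\hat h$ for q, and prove that under some regularity conditions,

$$\hat g - g = \frac{B + \big(\hat h - E[\hat h]\big) - g(\cdot,\cdot)\big(\hat f - E[\hat f]\big)}{\hat f}$$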

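Page 58: Nonparametric Link Prediction in Dynamic Graphs

Fast Algorithms: quick recap

[Figure: graphs $G_1, G_2, \ldots, G_{T-1}, G_T, G_{T+1}$ with datacubes at $t = 1, 2, 3, \ldots$; similarities are computed only between datacubes, giving weights $w_1, \ldots, w_4$ for datacubes with counts $(\eta_1, \eta_1^+), \ldots, (\eta_4, \eta_4^+)$]

$$\hat g = \frac{w_1\eta_1^+ + w_2\eta_2^+ + w_3\eta_3^+ + w_4\eta_4^+}{w_1\eta_1 + w_2\eta_2 + w_3\eta_3 + w_4\eta_4}$$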

Page 59: Nonparametric Link Prediction in Dynamic Graphs

Using LSH

Devise a hashing function for datacubes such that “similar” datacubes tend to be hashed to the same bucket, where “similar” means a small total variation distance between cells of datacubes.

• Use B1 buckets to discretize [0,1].
• Use B2 bits for each bucket: for probability mass p, the first ≈ p·B2 bits are set to 1.
• Total M·B1·B2 bits, where M = max number of occupied cells ≪ total number of cells.

[Figure: plot of a cell's distribution]

Page 60: Nonparametric Link Prediction in Dynamic Graphs

Fast Search Using LSH

The distance between datacubes now becomes the Hamming distance between M·B1·B2 bits.

We never have to build this vector explicitly. We just need to pick k bits out of M·B1·B2 uniformly at random, for each of ℓ such hash functions.

Hence the total work to hash a neighborhood is O(kℓ). We do this once, in the preprocessing phase.

Page 61: Nonparametric Link Prediction in Dynamic Graphs

Scalability: Locality Sensitive Hashing (LSH)*

Main idea: design a hash function such that two “similar” entities get hashed to the same bucket with high probability.

Widely used in information retrieval for removing near-duplicate documents.

We will use the hashing scheme for Hamming distances.

Page 62: Nonparametric Link Prediction in Dynamic Graphs

Simulations

• All algorithms perform well on stationary time series.
• All algorithms based only on smooth transitions (CN, AA, Katz) fail for seasonal trends.
• NonParam works better than LL as long as it has seen all seasonal transitions.
• LL’s performance gets better with large T and less randomness.

Page 63: Nonparametric Link Prediction in Dynamic Graphs

Real graphs

• Citeseer, NIPS, and HepTh (Physics community) graphs.

Page 64: Nonparametric Link Prediction in Dynamic Graphs

Including graph-based features

[Figure: graphs $G_1, G_2, \ldots, G_T$ with $Y_1(i,j) = 1$, $Y_2(i,j) = 0$, $Y_{T+1}(i,j) = ?$; datacube cell $1 \le \mathrm{cn}(i,j) \le 3$, $3 \le \mathrm{deg}(i,j) \le 6$, $1 \le \ell\ell(i,j) \le 2$]

How do we form these datacubes? Idea 2: one datacube for each pair (i,j), aggregated over $G_1 \to \ldots \to G_t \to G_{t+1}$. Too local and expensive.

Page 66: Nonparametric Link Prediction in Dynamic Graphs

Using LSH

Step 1: Map datacubes to bit vectors. Total variation distance = ½ × L1 distance between distributions ≈ Hamming distance between vectors (up to scaling).

Step 2: Sample k out of M·B1·B2 bits.

Step 3: Hash function = values of these k bits in the bit vector for the datacube. O(k) computation per datacube.