SnapNETS: Automatic Segmentation of Network Sequences with Node Labels Sorour E. Amiri, Liangzhe Chen, B. Aditya Prakash Department of Computer Science Virginia Tech AAAI, San Francisco, USA, February 9, 2017

Upload: sorour-ekhtiari-amiri

Post on 12-Apr-2017

TRANSCRIPT

Page 1: SnapNETS: Automatic Segmentation of Network Sequences with Node Labels

SnapNETS: Automatic Segmentation of Network Sequences with Node Labels

Sorour E. Amiri, Liangzhe Chen, B. Aditya Prakash

Department of Computer Science, Virginia Tech

AAAI, San Francisco, USA, February 9, 2017

Page 2: SnapNETS: Automatic Segmentation of Network Sequences with Node Labels

Outline
• Motivation
• Alternative Approaches
• Our Proposed Method: SnapNETS
• Experiments
• Conclusion

Amiri, Chen, Prakash 2

Page 3: SnapNETS: Automatic Segmentation of Network Sequences with Node Labels

Network Sequences

• Epidemiology: disease spreads over contact networks
• Social media: information spreads over friendship networks

[Figure: two example network sequences G1 to G4. Flu: nodes are Uninfected or Infected. Meme: nodes are Inactive or Active.]

Page 4: SnapNETS: Automatic Segmentation of Network Sequences with Node Labels

Making sense of network sequences

Flu: when do the infection patterns change?

Reason:
• Virus mutation
• Vaccination
• …

[Figure: flu sequence G1 to G4 with Uninfected and Infected nodes; patterns labeled Star, Bridge, Near Clique.]

Page 5: SnapNETS: Automatic Segmentation of Network Sequences with Node Labels

Making sense of network sequences

Meme: when do the activation patterns change?

Reason:
• Event
• …

[Figure: meme sequence G1 to G4 with Inactive and Active nodes; patterns labeled Star, Clique.]

Page 6: SnapNETS: Automatic Segmentation of Network Sequences with Node Labels

Problem 1: Network sequence segmentation

Given: a sequence of networks with labeled nodes.
Find: the best segmentation, which captures the different distributions of node labels.

In this work: binary labels {0, 1}

[Figure: sequence G1 to G4 with patterns labeled Star, Bridge, Near Clique.]

Page 7: SnapNETS: Automatic Segmentation of Network Sequences with Node Labels

Desirable Properties

P1. Parameter-free:
• No threshold, no fixed granularity

P2. Comprehensive:
• Use the entire graph

P3. Scalable

Page 9: SnapNETS: Automatic Segmentation of Network Sequences with Node Labels

Alternative 1: Feature Extraction & Time-series

[Henderson et al. 2010] [Likas, Vlassis, and Verbeek 2003] [Li et al. 2009]

Step 1: Feature extraction
• F1: #cliques (of active subgraph)
• F2: #ladders (of inactive subgraph)
• F3: #ladders (of active subgraph)

Example feature rows over G1 to G4 (as shown on the slide):
0 0 0 … 2
1 1 0 … 0
0 0 0 … 1

Step 2: Time-series segmentation

[Figure: features time series F1, F2, F3 over snapshots G1 to G4.]

Page 10: SnapNETS: Automatic Segmentation of Network Sequences with Node Labels

Alternative 1: Feature Extraction & Time-series

Drawbacks:
• Laborious feature engineering
  o # Cliques
  o # Ladders
• “Local” change detection:
  o One aggregation time period
  o Threshold

[Figure: features time series F1, F2, F3 over snapshots G1 to G4.]

Page 11: SnapNETS: Automatic Segmentation of Network Sequences with Node Labels

Alternative 2: Plain-graph-based analysis

[Shah et al. 2015] [Sun et al. 2007] [Lin et al. 2009] [Qu et al. 2014]

Step 1: Extract active subgraphs

Step 2: Dynamic graph segmentation

[Figure: snapshots G1 to G4 at each step.]

Page 12: SnapNETS: Automatic Segmentation of Network Sequences with Node Labels

Alternative 2: Plain-graph-based analysis

Drawbacks:
• Inactive nodes are important for detecting different patterns

[Figure: entire graph vs. dynamic graph segmentation over G1 to G4. Annotations: Chain; Roles are different.]

Page 13: SnapNETS: Automatic Segmentation of Network Sequences with Node Labels

Desirable Properties

P1. Parameter-free:
• No threshold, no fixed granularity

P2. Comprehensive:
• Use the entire graph

P3. Scalable

[Figure: Comparison of SnapNETS.]

Page 14: SnapNETS: Automatic Segmentation of Network Sequences with Node Labels

Outline
• Motivation
• Alternative Approaches
• Our Proposed Method: SnapNETS
  o Main Idea and Overview
  o Goal 1: Summarizing Act-snapshots
  o Goal 2: Constructing the segmentation graph
  o Goal 3: Finding the best segmentation
• Experiments
• Conclusion

Page 15: SnapNETS: Automatic Segmentation of Network Sequences with Node Labels

Main Idea: Segmentation graph

Nodes: one node for each segment, plus {Source (‘s’), Target (‘t’)}
• Source (‘s’) = start time
• Target (‘t’) = end time

Edges: a directed edge connects the nodes of adjacent segments

Best segmentation problem → Path optimization problem

[Figure: input sequence and its segmentation graph.]

Page 16: SnapNETS: Automatic Segmentation of Network Sequences with Node Labels

Overview of SnapNETS

Goal 1. Summarize each graph:
• Keep structural and label-dependent properties

Goal 2. Construct the segmentation graph:
• Define nodes and edges
• Define edge weights
  o Extract the features of summarized graphs

Goal 3. Find the best segmentation:
• Define the best segmentation (path)
• Compute the best segmentation

Page 17: SnapNETS: Automatic Segmentation of Network Sequences with Node Labels

Technical Challenges

Using the entire graph snapshots:
• Summarize graph while satisfying P2

Finding the number of segments:
• Compute segmentation while satisfying P1

Reminder: P1. Parameter-free; P2. Comprehensive; P3. Scalable

Page 19: SnapNETS: Automatic Segmentation of Network Sequences with Node Labels

Goal 1: Summarizing graph snapshots

We want to preserve:
• Structural properties
• Node labels

Role of the eigenvalue:
• The leading eigenvalue of the adjacency matrix determines the epidemic threshold in most diffusion models [Prakash et al. ICDM 2011]
• Same leading eigenvalue → same diffusive properties
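To make the role of the leading eigenvalue concrete, here is a minimal sketch (not from the slides) that computes it with NumPy for two small undirected graphs. The helper names `leading_eigenvalue`, `star`, and `clique` are illustrative only. A larger leading eigenvalue corresponds to a graph on which a contagion takes hold more easily, so the clique below is more "diffusive" than the star.

```python
import numpy as np

def leading_eigenvalue(adj):
    """Largest eigenvalue of a symmetric (undirected) adjacency matrix."""
    # eigvalsh returns eigenvalues in ascending order, so take the last one.
    return float(np.linalg.eigvalsh(adj)[-1])

def star(n_leaves):
    """Adjacency matrix of a star: node 0 is connected to every leaf."""
    adj = np.zeros((n_leaves + 1, n_leaves + 1))
    adj[0, 1:] = 1.0
    adj[1:, 0] = 1.0
    return adj

def clique(n):
    """Adjacency matrix of a complete graph on n nodes."""
    return np.ones((n, n)) - np.eye(n)

# Both graphs have 5 nodes but very different leading eigenvalues:
# a star with 4 leaves has sqrt(4) = 2, a clique on 5 nodes has 5 - 1 = 4.
lam_star = leading_eigenvalue(star(4))
lam_clique = leading_eigenvalue(clique(5))
print(round(lam_star, 6), round(lam_clique, 6))
```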

Page 20: SnapNETS: Automatic Segmentation of Network Sequences with Node Labels

Our summarization approach

We want to get a smaller graph with similar eigenvalues:
• Successively merge nodes

Page 21: SnapNETS: Automatic Segmentation of Network Sequences with Node Labels

Problem 2: Graph summarization

Given: a graph with labeled nodes and a compression ratio.
Find: a coarsened graph such that: [condition not preserved in the transcript]

Page 22: SnapNETS: Automatic Segmentation of Network Sequences with Node Labels

Our Approach

Given: a graph with labeled nodes and a compression ratio.
Find: a coarsened graph such that: [condition not preserved in the transcript]

• Keep leading eigenvalue
  o Matrix perturbation approach
• Based on CoarseNet [Purohit et al. KDD 2014]
  o Successively merge nodes
  o Do not merge nodes with different labels

[Figure: node-merging example with values 0.1, 0.1, 0.1, 0.2, 0.2.]
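The merge step can be illustrated with a brute-force sketch. This is not the authors' algorithm: the slides say the method uses a matrix perturbation approach, whereas the code below simply recomputes the leading eigenvalue after every candidate merge. The function names and the rule that a merged node takes the average of the two original edge weights are assumptions made for illustration.

```python
import numpy as np

def leading_eigenvalue(adj):
    """Largest eigenvalue of a symmetric adjacency matrix."""
    return float(np.linalg.eigvalsh(adj)[-1])

def merge_nodes(adj, labels, a, b):
    """Merge node b into node a and return the smaller (adj, labels).

    Assumed rule: the merged node's weight to each neighbor is the
    average of the weights that a and b had to that neighbor.
    """
    adj = adj.astype(float).copy()
    merged = (adj[a] + adj[b]) / 2.0
    merged[a] = 0.0  # no self-loop on the merged node
    merged[b] = 0.0
    adj[a, :] = merged
    adj[:, a] = merged
    keep = [i for i in range(len(adj)) if i != b]
    return adj[np.ix_(keep, keep)], [labels[i] for i in keep]

def best_same_label_merge(adj, labels):
    """Find the connected, same-label pair whose merge changes the
    leading eigenvalue the least. Returns (change, a, b) or None."""
    base = leading_eigenvalue(adj)
    best = None
    n = len(adj)
    for a in range(n):
        for b in range(a + 1, n):
            if adj[a, b] == 0 or labels[a] != labels[b]:
                continue  # only merge neighbors that share a label
            coarse, _ = merge_nodes(adj, labels, a, b)
            change = abs(leading_eigenvalue(coarse) - base)
            if best is None or change < best[0]:
                best = (change, a, b)
    return best

# Path graph 0-1-2-3 with labels 1, 1, 0, 0.
path_adj = np.array([[0, 1, 0, 0],
                     [1, 0, 1, 0],
                     [0, 1, 0, 1],
                     [0, 0, 1, 0]], dtype=float)
path_labels = [1, 1, 0, 0]
change, a, b = best_same_label_merge(path_adj, path_labels)
coarse_adj, coarse_labels = merge_nodes(path_adj, path_labels, a, b)
print(a, b, coarse_adj.shape, coarse_labels)
```

Repeating `best_same_label_merge` followed by `merge_nodes` until the graph reaches the desired size would give a summary in the spirit of the slide, at a much higher cost than a perturbation-based score.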

Page 24: SnapNETS: Automatic Segmentation of Network Sequences with Node Labels

Goal 2: Segmentation graph

Nodes: one node for each segment, plus {Source (‘s’), Target (‘t’)}
• Source (‘s’) = start time
• Target (‘t’) = end time

Edges: a directed edge connects the nodes of adjacent segments
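A minimal sketch of how such a segmentation graph could be built, under one reading of the slide: every contiguous run of snapshots is a candidate segment, ‘s’ links to segments that begin at the first snapshot, and segments that end at the last snapshot link to ‘t’. The function name and the half-open `(i, j)` encoding of a segment are assumptions for illustration; edge weights are left out here.

```python
from itertools import combinations

def build_segmentation_graph(num_snapshots):
    """Return (nodes, edges) for a sequence of `num_snapshots` graphs.

    A segment (i, j) covers snapshots i, ..., j-1 (half-open interval).
    """
    T = num_snapshots
    segments = list(combinations(range(T + 1), 2))  # all (i, j) with i < j
    nodes = ["s"] + segments + ["t"]
    edges = []
    for (i, j) in segments:
        if i == 0:
            edges.append(("s", (i, j)))      # segment begins at the start time
        if j == T:
            edges.append(((i, j), "t"))      # segment finishes at the end time
        # Adjacent segments: the next segment starts where this one ends.
        edges.extend(((i, j), (j, k)) for k in range(j + 1, T + 1))
    return nodes, edges

nodes, edges = build_segmentation_graph(4)
print(len(nodes), len(edges))
```

Because every edge goes from a segment to one that starts later, the result is a directed acyclic graph.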

Page 25: SnapNETS: Automatic Segmentation of Network Sequences with Node Labels

Edge Weights

How can we measure the distance between two segments?

[Figure: edge between two segment nodes with unknown weight w.]

Page 26: SnapNETS: Automatic Segmentation of Network Sequences with Node Labels

Our Approach

Step 1: Extract features from summary graphs:
• Easier and more efficient than on original graphs
• No complex features

Example feature vector: F = [3.9, 13, ..., 2.2]

Page 27: SnapNETS: Automatic Segmentation of Network Sequences with Node Labels

Edge Weights

Step 2: Distance of adjacent segments

[Figure: edge between adjacent segment nodes with weight w.]
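The slides do not preserve the distance formula, so the following is only a sketch under two stated assumptions: a segment is represented by the mean of its snapshots' feature vectors, and the edge weight is the Euclidean distance between two such means. The names `segment_feature` and `edge_weight` are hypothetical.

```python
import numpy as np

def segment_feature(snapshot_features, i, j):
    """Mean feature vector of snapshots i, ..., j-1 (assumed aggregation)."""
    return np.mean(snapshot_features[i:j], axis=0)

def edge_weight(snapshot_features, seg_a, seg_b):
    """Euclidean distance between two adjacent segments (assumed distance)."""
    fa = segment_feature(snapshot_features, *seg_a)
    fb = segment_feature(snapshot_features, *seg_b)
    return float(np.linalg.norm(fa - fb))

# Four snapshots with two features each; the features change after snapshot 2.
features = np.array([[1.0, 0.0],
                     [1.0, 0.0],
                     [4.0, 4.0],
                     [4.0, 4.0]])
w_change = edge_weight(features, (0, 2), (2, 4))  # cut where the features change
w_same = edge_weight(features, (0, 1), (1, 2))    # cut inside a stable stretch
print(w_change, w_same)
```

With any distance of this kind, a cut placed at a real change gets a heavy edge and a cut inside a homogeneous stretch gets a light one, which is what the path optimization later exploits.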

Page 29: SnapNETS: Automatic Segmentation of Network Sequences with Node Labels

Goal 3: Finding the best segmentation

Observation:
• For each segmentation there is a path from ‘s’ to ‘t’
• For each path from ‘s’ to ‘t’ there is a segmentation

Therefore:
• Best segmentation problem → Path optimization problem
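The two directions of this observation can be sketched as a pair of small conversions. The encoding is an assumption for illustration: a segmentation of T snapshots is a sorted list of cut points strictly between 0 and T, and a segment node is a half-open pair `(i, j)`.

```python
from itertools import combinations

def segmentation_to_path(cuts, num_snapshots):
    """Map sorted cut points to the s-t path that visits the segments in order."""
    bounds = [0] + list(cuts) + [num_snapshots]
    segments = list(zip(bounds[:-1], bounds[1:]))
    return ["s"] + segments + ["t"]

def path_to_segmentation(path):
    """Recover the cut points from an s-t path of segment nodes."""
    segments = path[1:-1]
    # Each segment except the last ends at a cut point.
    return [j for (_, j) in segments[:-1]]

example_path = segmentation_to_path([1, 3], 4)
print(example_path)
print(path_to_segmentation(example_path))

# Round trip over every possible segmentation of 4 snapshots.
all_cuts = [list(c) for r in range(4) for c in combinations(range(1, 4), r)]
round_trips = [path_to_segmentation(segmentation_to_path(c, 4)) for c in all_cuts]
```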

Page 30: SnapNETS: Automatic Segmentation of Network Sequences with Node Labels

Possible approach: Longest path?

Given: a segmentation graph
Find: the longest path from ‘s’ to ‘t’

Problem: over-segmentation

[Figure: two paths from ‘s’ to ‘t’. A path of many edges of weight 0.01 has Sum = 3; a path of three edges of weight 0.9 has Sum = 2.7.]

Page 31: SnapNETS: Automatic Segmentation of Network Sequences with Node Labels

Problem 3: Finding the best segmentation

Given: a segmentation graph
Find: the average longest path from ‘s’ to ‘t’

Our idea: Average longest path (ALP)

Advantages:
• Parameter-free
• Naturally balances the weight of the path with the number of segments
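A small worked comparison shows why averaging helps. The numbers follow the over-segmentation example from the longest-path slide; the count of 300 small edges is an assumption chosen so that the total comes to about 3, and the average is taken over the number of edges.

```python
# Two candidate s-t paths, written as lists of edge weights.
many_small = [0.01] * 300  # assumed: 300 tiny edges, total about 3
few_large = [0.9] * 3      # three heavy edges, total about 2.7

def total_weight(path):
    return sum(path)

def average_weight(path):
    return sum(path) / len(path)

paths = {"many_small": many_small, "few_large": few_large}

# Longest path prefers the over-segmented path because its total is larger.
longest = max(paths, key=lambda name: total_weight(paths[name]))
# Average longest path prefers the path whose individual cuts are strong.
alp = max(paths, key=lambda name: average_weight(paths[name]))
print(longest, alp)
```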

Page 32: SnapNETS: Automatic Segmentation of Network Sequences with Node Labels

Solving ALP

• Finding the ALP in general graphs is NP-hard.
• The segmentation graph is a DAG → ALP can be solved in polynomial time
• State-of-the-art algorithm [Waggoner et al. WACV 2013]
  o Time complexity: cubic. Not scalable!

Page 33: SnapNETS: Automatic Segmentation of Network Sequences with Node Labels

Our Solution: LAYERED-ALP

• Dynamic programming
• Optimal solution

lp1 = Longest path with 1 segment

lp2 = Longest path with 2 segments

lp4 = Longest path with 4 segments

Page 34: SnapNETS: Automatic Segmentation of Network Sequences with Node Labels

Our Solution: LAYERED-ALP

Steps:
• Build layers
• Find LP in each layer
• Find ALP

Time complexity: linear!
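The three steps can be sketched as a layered dynamic program over a weighted DAG. This is an illustration of the idea rather than the authors' LAYERED-ALP: the function name `layered_alp` is hypothetical, layers are indexed by the number of edges on the path (a path with k segments has k + 1 edges in a graph that includes ‘s’ and ‘t’), the average is taken over edges, and the sketch runs in time proportional to the number of layers times the number of edges, so it does not demonstrate the complexity bound stated on the slide.

```python
from collections import defaultdict

def layered_alp(edges, source="s", target="t"):
    """Best average-weight path from source to target in a weighted DAG.

    `edges` maps (u, v) -> weight. Returns (average_weight, path).
    """
    out = defaultdict(list)
    nodes = set()
    for (u, v), w in edges.items():
        out[u].append((v, w))
        nodes.update((u, v))

    # Build layers: layers[k][v] = (weight, predecessor) of the heaviest
    # path from source to v that uses exactly k edges.
    layers = [{source: (0.0, None)}]
    for _ in range(1, len(nodes)):  # a DAG path has at most |V| - 1 edges
        layer = {}
        for u, (weight_u, _pred) in layers[-1].items():
            for v, w in out[u]:
                if v not in layer or weight_u + w > layer[v][0]:
                    layer[v] = (weight_u + w, u)
        if not layer:
            break
        layers.append(layer)

    # Longest path per layer, then the best average across layers.
    candidates = [(layers[k][target][0] / k, k)
                  for k in range(1, len(layers)) if target in layers[k]]
    if not candidates:
        raise ValueError("target is not reachable from source")
    best_avg, k = max(candidates)

    # Walk predecessors back through the layers to recover the path.
    path, node = [target], target
    while k > 0:
        node = layers[k][node][1]
        path.append(node)
        k -= 1
    return best_avg, path[::-1]

toy_edges = {("s", "a"): 0.9, ("a", "t"): 0.9, ("s", "t"): 0.5,
             ("s", "b"): 0.01, ("b", "a"): 0.01}
best_avg, best_path = layered_alp(toy_edges)
print(best_avg, best_path)
```

In the toy graph the three s-t paths have averages 0.5 (one edge), 0.9 (two edges), and about 0.31 (three edges), so the two-edge path through `a` is selected.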

Page 35: SnapNETS: Automatic Segmentation of Network Sequences with Node Labels

Complete algorithm

Time complexity: sub-quadratic

Page 36: SnapNETS: Automatic Segmentation of Network Sequences with Node Labels

Complete algorithm: Parallel

Time complexity: [expression not preserved in the transcript]

Page 38: SnapNETS: Automatic Segmentation of Network Sequences with Node Labels

Experiments: datasets

Different domains with a range of sizes:
• BA-degree: Random Barabási-Albert graph
• AS-Oregon: Autonomous Systems peering information
• Higgs: Tweets dataset (with the follower-followee network)
• Portland: Contact network between people of Portland
• Memetracker: Who-copies-from-whom blog and website network
• IranElect: Follower-followee network of Twitter related to the Iran election
• DBLP: Co-authorship network related to the ‘network’ topic

Page 39: SnapNETS: Automatic Segmentation of Network Sequences with Node Labels

Experiments: baselines

Feature extraction & time series:
• DYNAMMO [Li et al. KDD 2009]:
  o Change point detection (reconstruction errors)
  o # segments = # segments of SnapNETS
• K-means [Likas et al. Pattern Recognition 2003]:
  o Segment when a new cluster is detected

Dynamic graph:
• VOG [Koutra et al. SDM 2014]:
  o 10 most important sub-structures
  o Cut when the set of sub-structures changes significantly (threshold = the one that gives the best result)

Page 40: SnapNETS: Automatic Segmentation of Network Sequences with Node Labels

Experiments: baselines-variations

• SN-ORIG: Original graphs instead of summary graphs
• SN-LP: Longest Path instead of ALP
• SN-GREEDY: Greedy approach instead of ALP

Page 41: SnapNETS: Automatic Segmentation of Network Sequences with Node Labels

Experiments: Quantitative analysis

• SnapNETS outperforms the baselines
• Clear patterns in summary graphs

[Figure: AS-Oregon. Annotation: infection moves to new community.]

Page 42: SnapNETS: Automatic Segmentation of Network Sequences with Node Labels

Case studies: Memetracker

• Summary graphs are close to the case when all nodes have the same label (f5)
• Random nodes are active (f8)
• Summary graphs are substantially sparser (f2)
• Many active nodes got merged into important nodes such as CNN and BBC to form hubs (f6)

[Figure annotations: “Can I call you Joe?”; Televised vice-presidential debates.]

Page 43: SnapNETS: Automatic Segmentation of Network Sequences with Node Labels

Case studies: AS-Oregon

New community → new segment

Page 44: SnapNETS: Automatic Segmentation of Network Sequences with Node Labels

Scalability

• Scalability of SnapNETS
• Speedup by parallelizing construction of the segmentation graph

[Figure annotation: Near-linear.]

Page 46: SnapNETS: Automatic Segmentation of Network Sequences with Node Labels

Discussion: SnapNETS

Patterns: the ‘placement’ and ‘connection’ of active/inactive nodes:
• structural (e.g., community/role/centrality)
• rate changes

Global method: SnapNETS is a ‘global’ method and not simply a change-point detection method.

• Graph summarization and features
• Average Longest Path

Properties: P1. Parameter-free; P2. Comprehensive; P3. Scalable

Page 47: SnapNETS: Automatic Segmentation of Network Sequences with Node Labels

Future Work

• Handle dynamic graphs with varying nodes and edges
• More node labels and real-valued features
• Work with partially observed graphs

Page 48: SnapNETS: Automatic Segmentation of Network Sequences with Node Labels

Any questions?

Funding:

Code at: https://github.com/SorourAmiri/SnapNETS

Sorour E. Amiri, Liangzhe Chen, B. Aditya Prakash

Goal 1: Graph summarization
• Successively merge nodes
• Keep leading eigenvalue
• Keep same set of labels

Goal 2: Segmentation graph
• Nodes
• Edges
• Edge weights

Goal 3: Finding the best segmentation
• ALP

Result: SnapNETS