GraphScope: Parameter-Free Mining of Large Time-Evolving Graphs

Jimeng Sun (CMU)

Spiros Papadimitriou (IBM)

Philip S. Yu (IBM)

Christos Faloutsos (CMU)

Motivation of GraphScope

Time-evolving graphs: network traffic graphs, email networks, customer-product relationships, call detail records in telecom networks, financial transaction data

Key questions:

1. How to monitor community structures?

2. How to detect the change points?

1. Community discovery

[Figure: a customer-product graph and its adjacency matrix, reordered into customer groups (rows) and product groups (columns)]

Simultaneously group customers and products, or sources and destinations in traffic graphs, or senders and recipients in communication graphs, etc.

2. Change detection

Find change points in group structure

[Figure: customer-product group structure over time, with a change point annotated "holiday season"]

Problem definition

Given graphs G1, G2, …, Gt, where each Gi is n-by-m:

1. Partition them into time segments G(1), G(2), …

2. For each segment, identify the groups

Goals: 1. Scalable, 2. Parameter-free, 3. Incremental

Outline

Motivation

GraphScope

Community discovery Change detection

Experiments

Community detection

The clustering problem is treated as a compression problem.

[Figure: graph snapshots at t = 0, t = 1, t = 2]

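Cost objective within a time segment

Notation: $d$ is the segment duration; $n_i$ and $m_j$ are the sizes of row group $i$ and col. group $j$ (the figure shows ℓ = 3 col. groups); $p_{i,j}$ is the density of ones (edges) in block $(i,j)$.

Code cost: $\sum_{i,j} d\, n_i m_j\, H(p_{i,j})$ bits total; for example, $d\, n_1 m_2\, H(p_{1,2})$ bits for block (1,2).

Description cost: $\sum_{i,j} \log(d\, n_i m_j) + \log^* d$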
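The code-cost term can be made concrete with a short sketch. The function names (`binary_entropy`, `segment_code_cost`), the use of NumPy arrays for the snapshots, and base-2 logarithms are assumptions made for illustration; the slides do not prescribe an implementation.

```python
import math
import numpy as np

def binary_entropy(p):
    """Shannon entropy H(p), in bits, of a Bernoulli(p) source."""
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -p * math.log2(p) - (1.0 - p) * math.log2(1.0 - p)

def segment_code_cost(graphs, row_groups, col_groups):
    """Sum over blocks (i, j) of d * n_i * m_j * H(p_ij), in bits.

    graphs: list of d binary n-by-m arrays forming one time segment.
    row_groups, col_groups: integer group label for each row / column.
    """
    stacked = np.stack(graphs)                  # shape (d, n, m)
    d = stacked.shape[0]
    row_groups = np.asarray(row_groups)
    col_groups = np.asarray(col_groups)
    total = 0.0
    for i in np.unique(row_groups):
        rows = np.flatnonzero(row_groups == i)
        for j in np.unique(col_groups):
            cols = np.flatnonzero(col_groups == j)
            block = stacked[:, rows][:, :, cols]
            size = d * len(rows) * len(cols)    # d * n_i * m_j cells
            p = float(block.sum()) / size       # density of ones in the block
            total += size * binary_entropy(p)
    return total

# Hypothetical toy segment: two identical snapshots with two clean communities.
g = np.array([[1, 1, 0, 0],
              [1, 1, 0, 0],
              [0, 0, 1, 1],
              [0, 0, 1, 1]])
good = segment_code_cost([g, g], [0, 0, 1, 1], [0, 0, 1, 1])
flat = segment_code_cost([g, g], [0, 0, 0, 0], [0, 0, 0, 0])
```

With the matching grouping every block is all ones or all zeros, so `good` is 0 bits; with a single group the density is 0.5 over 32 cells, so `flat` is 32 bits.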

Cost objective within a time segment

Grouping                        Code cost (blocks)    Description cost (blocks' model)
one row group, one col group    high                  low
n row groups, m col groups      low                   high
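The two extremes can be checked numerically on a toy matrix. In the sketch below, the description cost is simplified to the bits needed to send each block's edge count, taken here as log2(block size + 1); that simplification, the function names, and the toy data are assumptions, not the deck's exact formula.

```python
import math
import numpy as np

def entropy(p):
    """Binary entropy in bits."""
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -p * math.log2(p) - (1.0 - p) * math.log2(1.0 - p)

def segment_costs(graphs, row_groups, col_groups):
    """Return (code_cost, description_cost) in bits for one segment."""
    stacked = np.stack(graphs)                  # shape (d, n, m)
    d = stacked.shape[0]
    row_groups = np.asarray(row_groups)
    col_groups = np.asarray(col_groups)
    code = 0.0
    desc = 0.0
    for i in np.unique(row_groups):
        rows = np.flatnonzero(row_groups == i)
        for j in np.unique(col_groups):
            cols = np.flatnonzero(col_groups == j)
            size = d * len(rows) * len(cols)
            p = float(stacked[:, rows][:, :, cols].sum()) / size
            code += size * entropy(p)           # bits for the block contents
            desc += math.log2(size + 1)         # assumed: bits for the edge count
    return code, desc

g = np.array([[1, 1, 0, 0],
              [1, 1, 0, 0],
              [0, 0, 1, 1],
              [0, 0, 1, 1]])
one_group = segment_costs([g], [0, 0, 0, 0], [0, 0, 0, 0])
two_by_two = segment_costs([g], [0, 0, 1, 1], [0, 0, 1, 1])
singletons = segment_costs([g], [0, 1, 2, 3], [0, 1, 2, 3])
```

On this example the single group costs about 20.1 bits in total (16 code + 4.1 description), singleton groups cost 16 bits (all description), and the 2-by-2 grouping costs about 9.3 bits, which is the cheapest of the three.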

Cost objective within a time segment

code cost (blocks)

k = 3 row groups, ℓ = 3 col groups

Search for the optimum grouping

The problem is NP-hard even for one timestamp and column permutation only (reduction from TSP) [Johnson+ 03]

Heuristics:

Search: Split, Merge, Shuffle

Initialization: Resume, Restart
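As an illustration of the Shuffle heuristic, the sketch below performs one pass over the rows: with the column groups held fixed, each row moves to the row group whose block densities encode it in the fewest bits. The function name `shuffle_rows`, the `eps` clipping of densities, and the single-pass structure are assumptions for this sketch; a full search would alternate row and column passes and combine them with splits and merges.

```python
import numpy as np

def shuffle_rows(graphs, row_groups, col_groups, k, l, eps=1e-9):
    """One shuffle pass over rows; returns the new row-group labels.

    graphs: list of d binary n-by-m arrays (one time segment).
    k, l: number of row groups and column groups.
    """
    stacked = np.stack(graphs).astype(float)     # shape (d, n, m)
    d = stacked.shape[0]
    row_groups = np.asarray(row_groups)
    col_groups = np.asarray(col_groups)

    col_onehot = np.eye(l)[col_groups]           # (m, l) membership matrix
    row_onehot = np.eye(k)[row_groups]           # (n, k) membership matrix

    # ones[r, j]: edges of row r into column group j, summed over the segment
    ones = stacked.sum(axis=0) @ col_onehot      # (n, l)
    # cells[j]: number of cells one row has in column group j over the segment
    cells = d * col_onehot.sum(axis=0)           # (l,)

    # Block densities p[i, j] under the current grouping
    block_ones = row_onehot.T @ ones             # (k, l)
    block_cells = np.outer(row_onehot.sum(axis=0), cells)
    p = block_ones / np.maximum(block_cells, 1.0)
    p = np.clip(p, eps, 1.0 - eps)               # avoid log(0)

    # cost[r, i]: bits to encode row r using the densities of row group i
    cost = -(ones @ np.log2(p).T + (cells - ones) @ np.log2(1.0 - p).T)
    return cost.argmin(axis=1)

# Hypothetical example: row 2 starts in the wrong group and gets moved.
g = np.array([[1, 1, 0, 0],
              [1, 1, 0, 0],
              [0, 0, 1, 1],
              [0, 0, 1, 1]])
new_rows = shuffle_rows([g, g], [0, 0, 0, 1], [0, 0, 1, 1], k=2, l=2)
```

Here `new_rows` comes out as `[0, 0, 1, 1]`: rows 0 and 1 stay in group 0, and row 2 joins row 3 in group 1.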

Outline

Community discovery Change detection

Experiments

Change point detection

Option 1: Append to current segment

Option 2: Start new segment, making the new graph a change point (a split in time)

In both cases, we do row and col. shuffles, splits and/or merges

Choose the most parsimonious option
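The choice between the two options can be sketched as a comparison of encoding costs: appending is charged the increase in the current segment's cost, and starting a new segment is charged the cost of encoding the new graph on its own. In this sketch the groupings are supplied by the caller rather than re-optimized by shuffles, splits and merges, and the description cost is simplified to per-block edge-count bits plus log* of the segment duration; those simplifications and all function names are assumptions.

```python
import math
import numpy as np

def entropy(p):
    """Binary entropy in bits."""
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -p * math.log2(p) - (1.0 - p) * math.log2(1.0 - p)

def log_star(x):
    """Sum of the positive terms log2(x) + log2(log2(x)) + ... (constant omitted)."""
    total = 0.0
    term = math.log2(x)
    while term > 0:
        total += term
        term = math.log2(term)
    return total

def segment_cost(graphs, row_groups, col_groups):
    """Code cost plus simplified description cost of one segment, in bits."""
    stacked = np.stack(graphs)
    d = stacked.shape[0]
    row_groups = np.asarray(row_groups)
    col_groups = np.asarray(col_groups)
    total = log_star(d)                          # bits for the segment duration
    for i in np.unique(row_groups):
        rows = np.flatnonzero(row_groups == i)
        for j in np.unique(col_groups):
            cols = np.flatnonzero(col_groups == j)
            size = d * len(rows) * len(cols)
            p = float(stacked[:, rows][:, :, cols].sum()) / size
            total += size * entropy(p) + math.log2(size + 1)
    return total

def is_change_point(segment, new_graph, seg_groups, new_groups):
    """True if starting a new segment is cheaper than appending new_graph."""
    rows, cols = seg_groups
    append_delta = (segment_cost(segment + [new_graph], rows, cols)
                    - segment_cost(segment, rows, cols))
    new_segment = segment_cost([new_graph], *new_groups)
    return bool(new_segment < append_delta)

g = np.array([[1, 1, 0, 0],
              [1, 1, 0, 0],
              [0, 0, 1, 1],
              [0, 0, 1, 1]])
h = 1 - g                                        # communities swap partners
groups = ([0, 0, 1, 1], [0, 0, 1, 1])
same = is_change_point([g, g, g], g, groups, groups)
changed = is_change_point([g, g, g], h, groups, groups)
```

A fourth copy of `g` is cheaper to append, so `same` is `False`; the flipped graph `h` would blur the block densities of the current segment, so `changed` is `True`.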

Outline

Single timestamp Multiple timestamp

Experiments

Objectives

Effectiveness on community discovery and change detection

Compression benefit

Scalable, incremental computation

Evolving communities: NETWORK

29K hosts (nodes), 12K edges per hour (on avg), 1,220 hours, ~ 14.6M edges total

[Figure: evolving communities over time]

Community change points: ENRON

34K email addresses, 12K emails per week (on avg), 165 weeks, ~ 2M emails total

Key change points correspond to key events

Compression gain

GraphScope gives 10%-150% compression gain

Graph stream clustering

Scalability: NETWORK

29K hosts (nodes), 12K edges per hour (on average), 1,220 hours (timestamps), ~ 14.6M edges total

< 2 sec / snapshot on avg

Related work

Co-clustering [Dhillon+ KDD03] [Chakrabarti+ KDD04]

Graph partitioning [Karypis+ 99]

Time-evolving graphs [Chakrabarti+ KDD06] [Chi+ KDD07] [Asur+ KDD07]

Summary

Organize into few, homogeneous communities

Find changes in community structure

Scalable Parameter-free Incremental


Graph stream clustering

[Figure: graph snapshots at t = 0, t = 1, t = 2]

Graph clustering – [Chakrabarti+ KDD’04]

Why is this better?

[Figure: two alternative column groupings of the same matrix, one versus the other]

Good clustering:

1. Similar nodes are grouped together

2. As few groups as necessary

Good clustering implies a few, homogeneous blocks, which implies good compression.

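Cost objective

Setting: an $n \times m$ adj. matrix with row groups of sizes $n_1, n_2, n_3$ and ℓ = 3 col. groups of sizes $m_1, m_2, m_3$; $p_{i,j}$ is the density of ones (edges) in block $(i,j)$:

p1,1 p1,2 p1,3
p2,1 p2,2 p2,3
p3,1 p3,2 p3,3

Code cost (block size × entropy): $\sum_{i,j} n_i m_j H(p_{i,j})$ bits total; for example, $n_1 m_2 H(p_{1,2})$ bits for block (1,2). This assumes the group partitionings, sizes and densities are given.

Description cost: $\sum_i (\text{row-partition}_i \text{ description}) + \sum_j (\text{col-partition}_j \text{ description}) + \sum_{i,j} (\text{transmit \#edges } e_{i,j})$, where transmitting $e_{i,j}$ takes $\log n_i m_j$ bits.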

Graph clustering: Scalability

[Figure: Time vs. Size; running time of splits and shuffles against the number of edges]

Linear in the number of edges: scalable

Cost objective

Code cost (blocks) at the two extremes:

one row group, one col group: high

n row groups, m col groups: low

An intermediate grouping: k = 3 row groups, ℓ = 3 col groups

Search for optimum

[Figure: Cost vs. number of groups, from one row group and one col group up to n row groups and m col groups]

Search for optimum: Summary

Alternate split and shuffle steps, starting from k = 1, ℓ = 1:

k=1, ℓ=1 → k=1, ℓ=2 → k=2, ℓ=2 → k=2, ℓ=3 → k=3, ℓ=3 → k=3, ℓ=4 → k=4, ℓ=4 → k=4, ℓ=5 → k=5, ℓ=5

Split: increase k or ℓ

Shuffle: rearrange rows and cols

Merge: decrease k or ℓ

Given a graph of interactions or associations: customers to products, documents to terms, people to people, computer communications, financial transactions

Find simultaneously: communities (source and destination), and their number
