Danai Koutra – CMU/Technicolor Researcher, Carnegie Mellon University, at MLconf ATL

Carnegie Mellon University

Making Sense of Large Graphs: Summarization and Similarity

MLconf ’14, Atlanta, GA

Danai Koutra Computer Science Department

Carnegie Mellon University

[email protected] http://www.cs.cmu.edu/~dkoutra

Making sense of large graphs

Danai Koutra (CMU) 2

Human Connectome Project

scalable algorithms and models for understanding massive graphs.

>1.25B users!

Understanding Large Graphs


Part 1: Summarization


79,870 email accounts 288,364 emails

Ever tried visualizing a large graph?

After this talk, you’ll know how to find…


VoG Top-3 Stars [email protected]

[email protected]

Enron Summary


VoG Top Near Bipartite Core

Ski excursion: organizers, participants

“Affair”: commenters, CC’ed

Problem Definition


Given: a graph

Find: a succinct summary with possibly overlapping subgraphs ≈ important graph structures.

[Koutra, Kang, Vreeken, Faloutsos. SDM’14]


Lady Gaga Fan Club

Main Ideas

Idea 1: Use well-known structures (vocabulary):

Idea 2: Best graph summary ⇒ optimal compression (MDL)


Shortest lossless description

Minimum Description Length


BACKGROUND

min L(M) + L(D|M)

L(M): # bits for M
L(D|M): # bits for the data using M

Candidate models: $a_1 x + a_0$ vs. $a_{10} x^{10} + a_9 x^9 + \dots + a_0$

errors { }

simple & good explanations (~Occam’s razor)
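The trade-off between L(M) and L(D|M) can be sketched numerically. The snippet below is only an illustration under assumed costs, not the encoding used in the talk: `BITS_PER_COEFF` and `PRECISION` are hypothetical choices, and the residual code is deliberately crude. It compares a fitted line with a degree-10 polynomial that passes through all 11 points exactly.

```python
import math

BITS_PER_COEFF = 32   # assumed cost of storing one coefficient
PRECISION = 0.01      # assumed resolution at which residuals are stored


def fit_line(xs, ys):
    """Ordinary least-squares fit of y = a1*x + a0 (closed form)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    a1 = sxy / sxx
    return a1, mean_y - a1 * mean_x


def residual_bits(r):
    """Crude code for one residual: 1 sign bit + log2 of its quantized size."""
    return 1 + math.log2(round(abs(r) / PRECISION) + 1)


def description_length(num_coeffs, residuals):
    """L(M) + L(D|M) under the toy costs above."""
    model_bits = num_coeffs * BITS_PER_COEFF
    data_bits = sum(residual_bits(r) for r in residuals)
    return model_bits + data_bits


# Eleven points that lie close to y = 2x + 1.
xs = list(range(11))
noise = [0.1, -0.2, 0.05, 0.15, -0.1, 0.0, 0.2, -0.15, 0.1, -0.05, 0.1]
ys = [2 * x + 1 + e for x, e in zip(xs, noise)]

a1, a0 = fit_line(xs, ys)
line_residuals = [y - (a1 * x + a0) for x, y in zip(xs, ys)]
line_bits = description_length(2, line_residuals)

# A degree-10 polynomial through 11 points with distinct x has zero residuals,
# so it pays for its 11 coefficients plus the 1-bit-per-point floor.
poly_bits = description_length(11, [0.0] * len(xs))

print(f"line:      {line_bits:.1f} bits")
print(f"degree-10: {poly_bits:.1f} bits")
```

Under these assumed costs the line is cheaper overall: it pays a few bits per point for its errors but saves nine coefficients.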

Formally: Minimum Graph Description


Given:
- a graph G
- vocabulary Ω

Find: model M s.t. min L(G,M) = min{ L(M) + L(E) }

Adjacency A = Model M ⊕ Error E (the error E records where the model and the adjacency matrix disagree)
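To make the model/error split concrete, here is a minimal sketch on a toy graph. It assumes the error is the exclusive OR between the edges the model claims and the edges the graph actually has; the helper names `star_edges` and `clique_edges` and the toy data are hypothetical.

```python
from itertools import combinations


def star_edges(hub, spokes):
    """Edges a 'star' structure claims: hub connected to every spoke."""
    return {frozenset((hub, s)) for s in spokes}


def clique_edges(nodes):
    """Edges a 'full clique' structure claims: all pairs of its nodes."""
    return {frozenset(pair) for pair in combinations(nodes, 2)}


# Toy graph G as a set of undirected edges.
graph = {frozenset(e) for e in [
    (0, 1), (0, 2), (0, 3), (0, 4),      # star around node 0
    (5, 6), (5, 7), (6, 7),              # triangle on 5, 6, 7
    (4, 5),                              # one edge no structure explains
]}

# Model M: one star (which over-claims a spoke to node 8) and one clique.
model = star_edges(0, [1, 2, 3, 4, 8]) | clique_edges([5, 6, 7])

# Error E = M XOR A: edges the model claims but G lacks, plus edges G has
# but the model misses.
error = model ^ graph

print(sorted(tuple(sorted(e)) for e in error))
```

Both kinds of disagreement end up in the error: the over-claimed spoke (0, 8) and the unexplained edge (4, 5). A good model keeps this set small without becoming expensive itself.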

VoG: Overview


argmin

≈

≈?



Pick best (with some criterion)

Summary

Q: Which structures to pick?


A: Those that minimize the description length of G.

With a candidate set S, there are $2^{|S|}$ combinations.
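Because trying every subset of candidates is infeasible, some heuristic is needed. The sketch below shows one greedy pass under a toy cost; it is not necessarily the selection rule used in VoG. Each candidate carries an assumed model cost in bits and the set of edge ids it explains, and every unexplained edge costs a flat `EDGE_BITS`. All names and numbers are hypothetical.

```python
EDGE_BITS = 8  # assumed cost of listing one unexplained edge


def total_bits(selected, all_edges):
    """Toy L(M) + L(E): structure costs plus a flat cost per uncovered edge."""
    covered = set().union(*(s["edges"] for s in selected))
    model_bits = sum(s["bits"] for s in selected)
    return model_bits + EDGE_BITS * len(all_edges - covered)


def greedy_select(candidates, all_edges):
    """Visit candidates by standalone saving; keep one only if the total drops."""
    ranked = sorted(
        candidates,
        key=lambda s: EDGE_BITS * len(s["edges"]) - s["bits"],
        reverse=True,
    )
    selected = []
    best = total_bits(selected, all_edges)
    for cand in ranked:
        trial = total_bits(selected + [cand], all_edges)
        if trial < best:
            selected.append(cand)
            best = trial
    return selected, best


all_edges = set(range(20))  # edge ids of a toy graph
candidates = [
    {"name": "star", "edges": set(range(0, 10)), "bits": 20},
    {"name": "clique", "edges": set(range(10, 16)), "bits": 16},
    {"name": "overlap", "edges": set(range(8, 12)), "bits": 30},
    {"name": "tiny", "edges": {16}, "bits": 12},
]

chosen, bits = greedy_select(candidates, all_edges)
print([s["name"] for s in chosen], bits)
```

The pass makes one cost evaluation per candidate instead of one per subset. The price is that the result is not guaranteed to be the optimal subset.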

Runtime


VoG is near-linear in the number of edges of the input graph.

1.25B users!

Understanding a wiki graph


Nodes: wiki editors Edges: co-edited

I don’t see anything! :(

Wiki Controversial Article


Stars: admins, bots, heavy users

Bipartite cores: edit wars

Kiev vs. Kyiv; vandals vs. admins

VoG vs. other methods


                        VoG   Bounded-Error    Motif            Clustering   Cross-
                              Summarization    Simplification   Methods      Associations
                              [Navlakha+’08]   [Dunne+’13]                   [Chakrabarti+’03]
Variety of Structures   ✔     ✗                ✗                ✗            ✗
Important Structures    ✔     ✗                ✗                ✗            ✗
Low Complexity          ✔     ✗                ✗                ✔(?)         ✔
Visualization           ✔     ✔                ✔                ✗            ✗
Graph Summary           ✔     ✔                ✔                ✗            ✗

Stars, cliques, near-cliques

VoG: summary


•  Focus on important
•  possibly-overlapping structures
•  with known graph-theoretic properties

www.cs.cmu.edu/~dkoutra/SRC/vog.tar

Understanding Large Graphs


Part 2: Similarities

friendship graph ≈ wall posts graph?


1. Behavioral Patterns

VS.

Are the graphs / behaviors similar?

Why graph similarity?


2. Classification

3. Temporal anomaly detection

4. Intrusion detection


Day 1 Day 2 Day 3 Day 4

sim1 sim2 sim3

Problem Definition: Graph Similarity

•  Given:
   (i) 2 graphs with the same nodes and different edge sets
   (ii) node correspondence

•  Find: similarity score s ∈ [0,1]


GA

GB

Obvious solution?

Edge Overlap (EO) # of common edges (normalized or not)


GA

GB

… but “barbell”…

EO(B10,mB10) == EO(B10,mmB10)
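The barbell objection can be reproduced in a few lines. This is a minimal sketch that assumes B10 is two 5-node cliques joined by a single bridge edge, and that the two modified graphs on the slide each drop one edge, one inside a clique and one on the bridge; the transcript does not define mB10 and mmB10, so that mapping is an assumption. The function names are hypothetical.

```python
from itertools import combinations


def barbell(clique_size):
    """Two cliques of `clique_size` nodes joined by a single bridge edge."""
    left = range(clique_size)
    right = range(clique_size, 2 * clique_size)
    edges = {frozenset(p) for p in combinations(left, 2)}
    edges |= {frozenset(p) for p in combinations(right, 2)}
    bridge = frozenset((clique_size - 1, clique_size))
    edges.add(bridge)
    return edges, bridge


def edge_overlap(edges_a, edges_b):
    """Unnormalized Edge Overlap: number of edges the two graphs share."""
    return len(edges_a & edges_b)


def is_connected(edges, num_nodes):
    """Depth-first search from node 0 over an undirected edge set."""
    neighbors = {v: set() for v in range(num_nodes)}
    for e in edges:
        u, v = tuple(e)
        neighbors[u].add(v)
        neighbors[v].add(u)
    seen, stack = {0}, [0]
    while stack:
        for nxt in neighbors[stack.pop()] - seen:
            seen.add(nxt)
            stack.append(nxt)
    return len(seen) == num_nodes


b10, bridge = barbell(5)                      # 10 nodes, 21 edges
no_clique_edge = b10 - {frozenset((0, 1))}    # drop an edge inside a clique
no_bridge = b10 - {bridge}                    # drop the bridge

print(edge_overlap(b10, no_clique_edge), edge_overlap(b10, no_bridge))
print(is_connected(no_clique_edge, 10), is_connected(no_bridge, 10))
```

Edge Overlap scores both modified graphs identically, yet only the bridge removal disconnects the graph. Edge Overlap treats every edge as equally important.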


GA GA

GB GB’

What makes a similarity function good?

•  Properties:
   ◦ Intuitive
   ◦ Scalable

Properties like: “Edge-importance” ✗

Properties like: “Weight-awareness” ✗

MAIN IDEA: DELTACON

[Figure: pairwise node-influence matrices $S_A$ and $S_B$]

①  Find the pairwise node influence, $S_A$ & $S_B$.
②  Find the similarity between $S_A$ & $S_B$.

DETAILS

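How? Using Belief Propagation

INTUITION: Attenuating Neighboring Influence, for small ε:

$$S = [I + \varepsilon^2 D - \varepsilon A]^{-1} \approx [I - \varepsilon A]^{-1} = I + \varepsilon A + \varepsilon^2 A^2 + \dots$$

($\varepsilon A$: 1-hop, $\varepsilon^2 A^2$: 2-hops, …)

Note: $\varepsilon > \varepsilon^2 > \dots$, $0 < \varepsilon < 1$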
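The influence matrix S = [I + ε²D − εA]⁻¹ can be computed directly for a tiny graph. The sketch below is illustrative only: it uses a dense Gauss-Jordan inverse written in plain Python, which is quadratic in memory and not the scalable method the talk refers to. The function names and ε = 0.1 are assumptions.

```python
def identity(n):
    return [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]


def invert(matrix):
    """Gauss-Jordan inversion with partial pivoting (fine for tiny matrices)."""
    n = len(matrix)
    aug = [row[:] + ident_row for row, ident_row in zip(matrix, identity(n))]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        scale = aug[col][col]
        aug[col] = [v / scale for v in aug[col]]
        for r in range(n):
            if r != col:
                factor = aug[r][col]
                aug[r] = [v - factor * p for v, p in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def influence_matrix(adjacency, eps):
    """S = [I + eps^2 D - eps A]^(-1), with D the diagonal degree matrix."""
    n = len(adjacency)
    degrees = [sum(row) for row in adjacency]
    system = [
        [(1.0 + eps**2 * degrees[i] if i == j else 0.0) - eps * adjacency[i][j]
         for j in range(n)]
        for i in range(n)
    ]
    return invert(system)


# Path graph 0 - 1 - 2: node 1 is one hop from node 0, node 2 is two hops away.
path = [[0, 1, 0],
        [1, 0, 1],
        [0, 1, 0]]
S = influence_matrix(path, eps=0.1)
print(f"s[0][1] = {S[0][1]:.4f}, s[0][2] = {S[0][2]:.4f}")
```

On the path graph, node 0 influences its direct neighbor roughly ten times more strongly than the node two hops away. This matches the attenuation ε > ε² in the series expansion.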

OUR SOLUTION: DELTACON


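$$\mathrm{sim}(S_A, S_B) = \frac{1}{1 + \sqrt{\sum_{i,j} \left( \sqrt{s_{A,ij}} - \sqrt{s_{B,ij}} \right)^2}}$$

The square-root term in the denominator is the “Root” Euclidean Distance between $S_A$ and $S_B$.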
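The second step can be sketched on its own, reading the slide's formula as sim = 1 / (1 + d), where d is the "root" Euclidean distance: the Euclidean distance between the element-wise square roots of the two influence matrices. The matrices below are hypothetical, hand-written values rather than output of the real method, and the function names are assumptions.

```python
import math


def root_euclidean_distance(s_a, s_b):
    """sqrt(sum_ij (sqrt(s_a[i][j]) - sqrt(s_b[i][j]))^2); entries must be >= 0."""
    total = 0.0
    for row_a, row_b in zip(s_a, s_b):
        for a, b in zip(row_a, row_b):
            total += (math.sqrt(a) - math.sqrt(b)) ** 2
    return math.sqrt(total)


def deltacon_similarity(s_a, s_b):
    """Map the distance d >= 0 to a score in (0, 1]; identical matrices give 1."""
    return 1.0 / (1.0 + root_euclidean_distance(s_a, s_b))


# Hypothetical, hand-written influence matrices for two 3-node graphs.
s_a = [[1.00, 0.10, 0.01],
       [0.10, 1.00, 0.10],
       [0.01, 0.10, 1.00]]
s_b = [[1.00, 0.10, 0.00],
       [0.10, 1.00, 0.00],
       [0.00, 0.00, 1.00]]

print(deltacon_similarity(s_a, s_a))
print(deltacon_similarity(s_a, s_b))
```

Taking square roots before differencing boosts the contribution of small entries, so changes in weak, multi-hop influence still register in the score.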

… but $O(n^2)$ … faster?

$O(m_1 + m_2)$ in the paper :)

•  Nodes: email accounts of employees
•  Edges: email exchange

Day 1 Day 2 Day 3 Day 4 Day 5

sim1 sim2 sim3 sim4
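The day-by-day pipeline can be sketched as follows. Two parts are stand-ins and should be read as assumptions: the similarity here is Jaccard overlap of edge sets rather than DeltaCon, and the flagging rule (more than one standard deviation below the mean) is a hypothetical threshold chosen for this toy data.

```python
import statistics


def jaccard_similarity(edges_a, edges_b):
    """Stand-in similarity: shared edges over all edges seen in either graph."""
    union = edges_a | edges_b
    return len(edges_a & edges_b) / len(union) if union else 1.0


def consecutive_similarities(snapshots, similarity):
    """sim_t = similarity(G_t, G_{t+1}) for each pair of consecutive days."""
    return [similarity(a, b) for a, b in zip(snapshots, snapshots[1:])]


def flag_anomalies(sims, num_std=1.0):
    """Indices t where sim_t falls more than num_std deviations below the mean."""
    mean = statistics.mean(sims)
    spread = statistics.pstdev(sims)
    return [t for t, s in enumerate(sims) if s < mean - num_std * spread]


# Five toy daily graphs; the communication pattern shifts between day 3 and day 4.
days = [
    {(1, 2), (2, 3), (3, 4), (4, 5)},
    {(1, 2), (2, 3), (3, 4), (4, 5), (1, 5)},
    {(1, 2), (2, 3), (3, 4), (4, 5)},
    {(1, 3), (2, 5), (1, 4), (4, 5)},
    {(1, 3), (2, 5), (1, 4), (4, 5), (2, 4)},
]

sims = consecutive_similarities(days, jaccard_similarity)
anomalies = flag_anomalies(sims)
print([round(s, 3) for s in sims], anomalies)
```

Index 2 corresponds to sim3, the transition from day 3 to day 4. Any similarity function with the same signature can be passed in place of the stand-in.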


Temporal Anomaly Detection

[Plot: similarity vs. consecutive days]

Feb 4: Lay resigns

Brain-Connectivity Graph Clustering

•  114 brain graphs
   ◦ Nodes: 70 cortical regions
   ◦ Edges: connections
•  Attributes: gender, IQ, age…

t-test p-value = 0.0057

Graph Understanding via …

•  … Summarization …
   ◦ VoG: to spot the important graph structures
•  … Comparison …
   ◦ DeltaCon: to find the similarity between aligned networks
   ◦ BiG-Align and Uni-Align: to align bi/uni-partite graphs efficiently

Thank you!

www.cs.cmu.edu/~dkoutra/pub.htm [email protected]


summarization similarities Understanding
