Transcript
Page 1: Danai Koutra – CMU/Technicolor Researcher, Carnegie Mellon University at MLconf ATL

Carnegie  Mellon  University  

Making  Sense  of  Large  Graphs:  Summarization  and  Similarity  

Mlconf  ‘14,  Atlanta,  GA  

Danai Koutra Computer Science Department

Carnegie Mellon University

[email protected] http://www.cs.cmu.edu/~dkoutra

Page 2: Danai Koutra – CMU/Technicolor Researcher, Carnegie Mellon University at MLconf ATL

Making  sense  of  large  graphs  

Danai Koutra (CMU) 2

Human Connectome

Project

scalable algorithms and models for understanding massive graphs.

>1.25B users!

Page 3: Danai Koutra – CMU/Technicolor Researcher, Carnegie Mellon University at MLconf ATL

Understanding  Large  Graphs  

Danai Koutra (CMU) 3

Part 1 S u m m a r i z a t i o n

Page 4: Danai Koutra – CMU/Technicolor Researcher, Carnegie Mellon University at MLconf ATL

Danai Koutra (CMU) 4

79,870 email accounts 288,364 emails

Ever  tried  visualizing  a  large  graph?  

Page 5: Danai Koutra – CMU/Technicolor Researcher, Carnegie Mellon University at MLconf ATL

Danai Koutra (CMU) 5

79,870 email accounts 288,364 emails

Ever  tried  visualizing  a  large  graph?  

Page 6: Danai Koutra – CMU/Technicolor Researcher, Carnegie Mellon University at MLconf ATL

After  this  talk,  you’ll  know    how  to  Cind…  

Danai Koutra (CMU) 6

VoG Top-3 Stars [email protected]  

[email protected]  

Page 7: Danai Koutra – CMU/Technicolor Researcher, Carnegie Mellon University at MLconf ATL

Enron  Summary  

Danai Koutra (CMU) 7

VoG Top Near Bipartite Core Ski

excursion

organizers participants

“Affair”

Commenters CC’ed

Page 8: Danai Koutra – CMU/Technicolor Researcher, Carnegie Mellon University at MLconf ATL

Problem  DeCinition  

Danai Koutra (CMU) 8

Given: a graph

Find:

≈ important graph

structures.

a succinct summary with possibly overlapping subgraphs

[Koutra, Kang, Vreeken, Faloutsos. SDM’14]

Danai Koutra (CMU) 8

Lady Gaga Fan Club

Page 9: Danai Koutra – CMU/Technicolor Researcher, Carnegie Mellon University at MLconf ATL

Main  Ideas  

Idea 1: Use well-known structures (vocabulary):

Idea 2: Best graph summary   è optimal compression (MDL)

Danai Koutra (CMU) 9

Shortest lossless description

Page 10: Danai Koutra – CMU/Technicolor Researcher, Carnegie Mellon University at MLconf ATL

Minimum  Description  Length  

Danai Koutra (CMU) 10

BACKGROUND  

a1 x + a0

min  L(M)        +        L(D|M)  

a10 x10 + a9 x9 + … + a0

errors

{ }

simple & good explanations

# bits for M

# bits for the data using M

~Occam’s razor

Page 11: Danai Koutra – CMU/Technicolor Researcher, Carnegie Mellon University at MLconf ATL

Formally:  Minimum  Graph  Description    

Danai Koutra (CMU) 11

Given: - a graph G - vocabulary Ω

Find: model M s.t. min L(G,M) = min{ L(M) + L(E) }

Adjacency A Model M Error E

Page 12: Danai Koutra – CMU/Technicolor Researcher, Carnegie Mellon University at MLconf ATL

VoG:  Overview  

Danai Koutra (CMU) 12

argmin    

≈?

Page 13: Danai Koutra – CMU/Technicolor Researcher, Carnegie Mellon University at MLconf ATL

VoG:  Overview  

Danai Koutra (CMU) 13

Pick best (with some criterion)

Summary

Page 14: Danai Koutra – CMU/Technicolor Researcher, Carnegie Mellon University at MLconf ATL

Q:  Which  structures  to  pick?  

Danai Koutra (CMU) 14

A: Those that min description length S of G

2|S| combinations

Page 15: Danai Koutra – CMU/Technicolor Researcher, Carnegie Mellon University at MLconf ATL

Runtime  

Danai Koutra (CMU) 15

VOG is near-linear on # edges of the input graph.

1.25B users!

Page 16: Danai Koutra – CMU/Technicolor Researcher, Carnegie Mellon University at MLconf ATL

Understanding  a    wiki  graph  

Danai Koutra (CMU) 16

Nodes: wiki editors Edges: co-edited

I don’t see anything! L

Page 17: Danai Koutra – CMU/Technicolor Researcher, Carnegie Mellon University at MLconf ATL

Wiki  Controversial  Article  

Danai Koutra (CMU) 17

Stars: admins, bots, heavy users

Bipartite cores: edit wars

Kiev vs. Kyiv vandals vs. admins

Page 18: Danai Koutra – CMU/Technicolor Researcher, Carnegie Mellon University at MLconf ATL

VoG    vs.  other  methods  

Danai Koutra (CMU) 18

VoG   Bounded-­‐Error  Summariza@on  

Mo@f  Simplifica@on  

Clustering  Methods  

Cross-­‐Associa@ons  

 

Variety  of  Structures   ✔   ✗   ✗   ✗   ✗  Important  Structures   ✔   ✗   ✗   ✗   ✗  Low  Complexity   ✔   ✗   ✗   ✔(?)   ✔  Visualiza@on   ✔   ✔   ✔   ✗   ✗  Graph  Summary   ✔   ✔   ✔   ✗   ✗  

Stars, cliques near-cliques

[Navlakha+’08] [Dunne+’13] [Chakrabarti+’03]

Page 19: Danai Koutra – CMU/Technicolor Researcher, Carnegie Mellon University at MLconf ATL

VoG:  summary  

Danai Koutra (CMU) 19

•  Focus on important •  possibly-overlapping structures •  with known graph-theoretic properties

 www.cs.cmu.edu/~dkoutra/SRC/vog.tar  

Page 20: Danai Koutra – CMU/Technicolor Researcher, Carnegie Mellon University at MLconf ATL

Understanding  Large  Graphs  

Danai Koutra (CMU) 20

Part 2 S i m i l a r i t i e s

Page 21: Danai Koutra – CMU/Technicolor Researcher, Carnegie Mellon University at MLconf ATL

friendship  graph  ≈  wall  posts  graph?  

Danai Koutra (CMU) 21

Behavioral  PaOerns  1  

VS.  

Are  the  graphs  /  behaviors  similar?  

Page 22: Danai Koutra – CMU/Technicolor Researcher, Carnegie Mellon University at MLconf ATL

Why  graph  similarity?  

Danai Koutra (CMU) 22

Classification 2

Temporal  anomaly    detec@on  

3

Intrusion  detec@on  4  

�! �!12 13 14 22 23

Day  1                    Day  2                      Day  3                    Day  4  

sim1   sim2   sim3  

Page 23: Danai Koutra – CMU/Technicolor Researcher, Carnegie Mellon University at MLconf ATL

Problem  DeCinition:  Graph  Similarity  

•  Given: (i) 2 graphs with the same nodes and different edge sets (ii) node correspondence

•  Find: similarity score s [0,1]

Danai Koutra (CMU) 23

GA

GB

Page 24: Danai Koutra – CMU/Technicolor Researcher, Carnegie Mellon University at MLconf ATL

Obvious  solution?  

Edge Overlap (EO) # of common edges (normalized or not)

Danai Koutra 24

GA

GB

Page 25: Danai Koutra – CMU/Technicolor Researcher, Carnegie Mellon University at MLconf ATL

…  but  “barbell”…  

EO(B10,mB10) == EO(B10,mmB10)

Danai Koutra 25

GA GA

GB GB’

Page 26: Danai Koutra – CMU/Technicolor Researcher, Carnegie Mellon University at MLconf ATL

What  makes  a  similarity    function  good?  

26

•  Properties: ² Intuitive

Danai Koutra

ProperFes  like:  “Edge-­‐importance”  

Page 27: Danai Koutra – CMU/Technicolor Researcher, Carnegie Mellon University at MLconf ATL

✗  

What  makes  a  similarity    function  good?  

27

•  Properties: ² Intuitive

² Scalable

Danai Koutra

ProperFes  like:  “Weight-­‐awareness”  

✗  

Page 28: Danai Koutra – CMU/Technicolor Researcher, Carnegie Mellon University at MLconf ATL

MAIN  IDEA:  DELTACON  

28

SA  =   SB =  

①  Find the pairwise node influence, SA & SB. ②  Find the similarity between SA & SB.

Danai Koutra (CMU)

DETAILS  

Page 29: Danai Koutra – CMU/Technicolor Researcher, Carnegie Mellon University at MLconf ATL

How?  Using  Belief  Propagation  Attenuating Neighboring Influence for small ε:

29

S = [I+ε 2D−εA]−1 ≈

≈ [I−εA]−1 = I+εA+ε 2A2 +...

1-hop 2-hops …

Note: ε > ε2 > ..., 0<ε<1

INTUITION  

Danai Koutra (CMU)

Page 30: Danai Koutra – CMU/Technicolor Researcher, Carnegie Mellon University at MLconf ATL

OUR  SOLUTION:  DELTACON  

30

DETAILS  

①  Find the pairwise node influence, SA & SB. ②  Find the similarity between SA & SB.

Danai Koutra (CMU)

sim( ) = 1

1+ sA,ij − sB,ij( )2

i, j∑SA,SB  

SA  =   SB =  

“Root” Euclidean Distance

Page 31: Danai Koutra – CMU/Technicolor Researcher, Carnegie Mellon University at MLconf ATL

…  but  O(n2)  …  

31

f a s t e r ?

O(m1+m2) in the paper J

Danai Koutra (CMU)

Page 32: Danai Koutra – CMU/Technicolor Researcher, Carnegie Mellon University at MLconf ATL

32

•  Nodes:  email  accounts  of  employees  •  Edges:  email  exchange    

Day  1                              Day  2                              Day  3                          Day  4                            Day  5    

sim1   sim2   sim3   sim4  

Danai Koutra (CMU)

Temporal  Anomaly    Detection    

Page 33: Danai Koutra – CMU/Technicolor Researcher, Carnegie Mellon University at MLconf ATL

33

similarity  

consecu@ve  days  Danai Koutra (CMU)

Feb  4:  Lay  resigns  

Temporal  Anomaly    Detection    

Page 34: Danai Koutra – CMU/Technicolor Researcher, Carnegie Mellon University at MLconf ATL

Brain-­‐Connectivity    Graph  Clustering  

34

•  114 brain graphs ² Nodes: 70 cortical regions ² Edges: connections

•  Attributes: gender, IQ, age…

Danai Koutra (CMU)

Page 35: Danai Koutra – CMU/Technicolor Researcher, Carnegie Mellon University at MLconf ATL

Brain-­‐Connectivity    Graph  Clustering  

35 Danai Koutra (CMU) 35 Danai Koutra (CMU) Danai Koutra (CMU) 35

t-­‐test    p-­‐value  =  0.0057  

Page 36: Danai Koutra – CMU/Technicolor Researcher, Carnegie Mellon University at MLconf ATL

Graph  Understanding  via  …  

•  … Summarization … ² VoG: to spot the important graph structures

•  … Comparison …

² DeltaCon: to find the similarity between aligned networks ² BiG-Align to align bi/uni-partite ² Uni-Align graphs efficiently

36 Danai Koutra (CMU) Danai Koutra (CMU) 36

Page 37: Danai Koutra – CMU/Technicolor Researcher, Carnegie Mellon University at MLconf ATL

Thank  you!  

www.cs.cmu.edu/~dkoutra/pub.htm [email protected]

Danai Koutra (CMU) 37

summarization similarities Understanding


Top Related