Danai Koutra – CMU/Technicolor Researcher, Carnegie Mellon University, at MLconf ATL

Post on 05-Dec-2014


DESCRIPTION

Networks naturally capture a host of real-world interactions, ranging from friendships to brain activity. But given a massive graph, like the Facebook social graph, what can be said about its structure? Which are its most important structures? How does it compare to other networks like Twitter? This talk will focus on my work developing scalable algorithms and models that help us make sense of large graphs via pattern discovery and similarity analysis. I will begin by presenting VoG, an approach that efficiently summarizes large graphs by finding their most interesting and semantically meaningful structures. Starting from a clutter of millions of nodes and edges, such as the Enron who-mails-whom graph, our Minimum Description Length-based algorithm disentangles the complex graph connectivity and spotlights the structures that ‘best’ describe the graph. Then, for similarity analysis at the graph level, I will introduce the problems of graph comparison and graph alignment. I will conclude by showing how to apply my methods to temporal anomaly detection, brain graph clustering, de-anonymization of bipartite (e.g., user-group membership) and unipartite graphs, and more.

TRANSCRIPT

Carnegie  Mellon  University  

Making  Sense  of  Large  Graphs:  Summarization  and  Similarity  

MLconf ’14, Atlanta, GA

Danai Koutra Computer Science Department

Carnegie Mellon University

danai@cs.cmu.edu http://www.cs.cmu.edu/~dkoutra

Making  sense  of  large  graphs  

Danai Koutra (CMU) 2

Human Connectome

Project

scalable algorithms and models for understanding massive graphs.

>1.25B users!

Understanding  Large  Graphs  


Part 1: Summarization


79,870 email accounts 288,364 emails

Ever  tried  visualizing  a  large  graph?  


After this talk, you’ll know how to find…


VoG Top-3 Stars klay@enron.com  

kenneth.lay@enron.com  

Enron  Summary  


VoG Top Near Bipartite Core Ski

excursion

organizers participants

“Affair”

Commenters CC’ed

Problem Definition


Given: a graph

Find: a succinct summary with possibly overlapping subgraphs ≈ important graph structures.

[Koutra, Kang, Vreeken, Faloutsos. SDM’14]


Lady Gaga Fan Club

Main  Ideas  

Idea 1: Use well-known structures (vocabulary):

Idea 2: Best graph summary ⇒ optimal compression (MDL)


Shortest lossless description

Minimum  Description  Length  


BACKGROUND  

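$$\min_M \; L(M) + L(D \mid M)$$

where L(M) = # bits for M, and L(D|M) = # bits for the data using M.

Example models: $a_1 x + a_0$ vs. $a_{10} x^{10} + a_9 x^9 + \dots + a_0$, with the errors of each fit encoded in L(D|M).

Simple & good explanations (~Occam’s razor)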
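The trade-off between L(M) and L(D|M) can be sketched on the slide's own example of a line versus a degree-10 polynomial. This is a minimal illustration, not the encoding used in VoG: the fixed 32-bit cost per coefficient, the Gaussian code length for the errors (kept only up to a constant shared by all models, so the totals can be negative), and the synthetic data are all assumptions made for the example.

```python
import numpy as np

BITS_PER_COEFF = 32  # assumption: every coefficient costs a fixed 32 bits


def description_length(x, y, degree):
    """Crude two-part code length (in bits) of a polynomial fit of the given degree."""
    coeffs = np.polyfit(x, y, degree)
    residuals = y - np.polyval(coeffs, x)
    n = len(x)
    rss = float(np.sum(residuals ** 2))
    model_bits = BITS_PER_COEFF * (degree + 1)   # L(M)
    # L(D|M): Gaussian code length of the errors, dropping a constant
    # that is identical for every model and so does not affect the argmin.
    data_bits = 0.5 * n * np.log2(rss / n)
    return model_bits + data_bits


# Synthetic data: a noisy line (hypothetical, for illustration only).
rng = np.random.default_rng(0)
x = np.linspace(-1.0, 1.0, 50)
y = 2.0 * x + 1.0 + rng.normal(scale=0.1, size=x.size)

bits_linear = description_length(x, y, 1)
bits_deg10 = description_length(x, y, 10)
best_degree = min(range(0, 11), key=lambda d: description_length(x, y, d))
print(bits_linear, bits_deg10, best_degree)
```

The degree-10 fit lowers the error term slightly, but under these assumptions that saving is much smaller than the cost of nine extra coefficients, which is the Occam's-razor behaviour the slide points at.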

Formally:  Minimum  Graph  Description    


Given: - a graph G - vocabulary Ω

Find: model M s.t. min L(G,M) = min{ L(M) + L(E) }

Adjacency A Model M Error E

VoG:  Overview  


argmin    

≈?

VoG:  Overview  


Pick best (with some criterion)

Summary

Q:  Which  structures  to  pick?  


A: Those that minimize the description length of G

For a set S of candidate structures: 2^|S| combinations

Runtime  


VoG is near-linear in the number of edges of the input graph.

1.25B users!

Understanding  a    wiki  graph  


Nodes: wiki editors Edges: co-edited

I don’t see anything! :(

Wiki  Controversial  Article  


Stars: admins, bots, heavy users

Bipartite cores: edit wars

Kiev vs. Kyiv vandals vs. admins

VoG    vs.  other  methods  


Property                VoG   Bounded-Error Summarization   Motif Simplification   Clustering Methods   Cross-Associations
Variety of Structures   ✔     ✗                             ✗                      ✗                    ✗
Important Structures    ✔     ✗                             ✗                      ✗                    ✗
Low Complexity          ✔     ✗                             ✗                      ✔(?)                 ✔
Visualization           ✔     ✔                             ✔                      ✗                    ✗
Graph Summary           ✔     ✔                             ✔                      ✗                    ✗

Stars, cliques, near-cliques

[Navlakha+’08] [Dunne+’13] [Chakrabarti+’03]

VoG:  summary  


•  Focus on important
•  possibly-overlapping structures
•  with known graph-theoretic properties

 www.cs.cmu.edu/~dkoutra/SRC/vog.tar  

Understanding  Large  Graphs  


Part 2: Similarities

friendship  graph  ≈  wall  posts  graph?  


1 Behavioral Patterns

VS.  

Are  the  graphs  /  behaviors  similar?  

Why  graph  similarity?  


2 Classification

3 Temporal anomaly detection

4 Intrusion detection

Day  1                    Day  2                      Day  3                    Day  4  

sim1   sim2   sim3  

Problem Definition: Graph Similarity

•  Given:
   (i) 2 graphs with the same nodes and different edge sets
   (ii) node correspondence

•  Find: similarity score s ∈ [0,1]


GA

GB

Obvious  solution?  

Edge Overlap (EO) # of common edges (normalized or not)


GA

GB

…  but  “barbell”…  

EO(B10,mB10) == EO(B10,mmB10)


GA GA

GB GB’
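The barbell objection to edge overlap can be reproduced in a few lines. This is a sketch under assumptions: the barbell is built here as two 5-node cliques joined by one bridge edge, the normalization (shared edges over the average edge count) is just one of several possible choices, and the helper names are made up for the example.

```python
from itertools import combinations


def clique_edges(nodes):
    """All undirected edges among `nodes`, stored as sorted tuples."""
    return {tuple(sorted(pair)) for pair in combinations(nodes, 2)}


def edge_overlap(edges_a, edges_b):
    """Normalized edge overlap: shared edges divided by the average edge count."""
    common = len(edges_a & edges_b)
    return 2 * common / (len(edges_a) + len(edges_b))


# Barbell on 10 nodes: two 5-cliques joined by the bridge (4, 5).
bridge = (4, 5)
barbell = clique_edges(range(0, 5)) | clique_edges(range(5, 10)) | {bridge}

without_bridge = barbell - {bridge}        # splits the graph into two components
without_clique_edge = barbell - {(0, 1)}   # graph stays connected

eo_bridge = edge_overlap(barbell, without_bridge)
eo_clique = edge_overlap(barbell, without_clique_edge)
print(eo_bridge, eo_clique)
```

Both scores come out identical because edge overlap only counts edges; it cannot tell that removing the bridge disconnects the graph while removing an edge inside a clique barely changes it.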

What  makes  a  similarity    function  good?  

•  Properties:
   - Intuitive

Properties like: “Edge-importance”

✗

What  makes  a  similarity    function  good?  

•  Properties:
   - Intuitive
   - Scalable

Properties like: “Weight-awareness”

✗

MAIN  IDEA:  DELTACON  

①  Find the pairwise node influence, S_A & S_B.
②  Find the similarity between S_A & S_B.

DETAILS  

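How? Using Belief Propagation

Attenuating Neighboring Influence for small ε:

$$S = [I + \epsilon^2 D - \epsilon A]^{-1} \approx [I - \epsilon A]^{-1} = I + \epsilon A + \epsilon^2 A^2 + \dots$$

(the terms $\epsilon A$, $\epsilon^2 A^2$, … correspond to 1-hop, 2-hop, … influence)

Note: $\epsilon > \epsilon^2 > \dots$, $0 < \epsilon < 1$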
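The influence matrix can be computed directly for a small graph. The following is a sketch that assumes a dense NumPy adjacency matrix and an explicit matrix inverse, which is only practical for tiny inputs; the function name `node_influence`, the 3-node path graph, and the value of ε are illustrative choices, not taken from the talk.

```python
import numpy as np


def node_influence(A, eps):
    """S = [I + eps^2 D - eps A]^{-1} for a symmetric adjacency matrix A."""
    n = A.shape[0]
    D = np.diag(A.sum(axis=1))   # diagonal degree matrix
    return np.linalg.inv(np.eye(n) + eps ** 2 * D - eps * A)


# Toy input: the path graph 0 - 1 - 2.
A = np.array([[0.0, 1.0, 0.0],
              [1.0, 0.0, 1.0],
              [0.0, 1.0, 0.0]])
eps = 0.01

S = node_influence(A, eps)
first_order = np.eye(3) + eps * A          # I + eps*A, the 1-hop part of the series
max_gap = np.abs(S - first_order).max()    # remainder is of order eps^2

print(S[0, 1], S[0, 2], max_gap)
```

On this example the direct neighbours (0, 1) influence each other at roughly ε, while the 2-hop pair (0, 2) gets roughly ε², which matches the attenuation described in the note above.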

INTUITION  


OUR  SOLUTION:  DELTACON  

DETAILS

①  Find the pairwise node influence, S_A & S_B.
②  Find the similarity between S_A & S_B.

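$$\mathrm{sim}(S_A, S_B) = \frac{1}{1 + \sqrt{\sum_{i,j} \left(\sqrt{s_{A,ij}} - \sqrt{s_{B,ij}}\right)^2}}$$

The distance in the denominator is the “Root” Euclidean Distance between $S_A$ and $S_B$.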
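The two steps can be put together in a naive sketch that inverts dense matrices, so it illustrates the idea rather than the fast algorithm from the paper. The function names, the default ε = 0.05, the clipping of tiny negative round-off values before the square root, and the 10-node barbell test graph are assumptions made for the example.

```python
import numpy as np
from itertools import combinations


def node_influence(A, eps):
    """Step 1: S = [I + eps^2 D - eps A]^{-1} (dense inverse, small graphs only)."""
    n = A.shape[0]
    D = np.diag(A.sum(axis=1))
    return np.linalg.inv(np.eye(n) + eps ** 2 * D - eps * A)


def deltacon_similarity(A1, A2, eps=0.05):
    """Step 2: 1 / (1 + root Euclidean distance) between the influence matrices."""
    # Clip guards against tiny negative values caused by floating-point round-off.
    S1 = np.clip(node_influence(A1, eps), 0.0, None)
    S2 = np.clip(node_influence(A2, eps), 0.0, None)
    root_ed = np.sqrt(np.sum((np.sqrt(S1) - np.sqrt(S2)) ** 2))
    return 1.0 / (1.0 + root_ed)


def barbell(missing=None):
    """Two 5-node cliques joined by the bridge (4, 5); optionally drop one edge."""
    A = np.zeros((10, 10))
    edges = (list(combinations(range(5), 2))
             + list(combinations(range(5, 10), 2))
             + [(4, 5)])
    for i, j in edges:
        if (i, j) != missing:
            A[i, j] = A[j, i] = 1.0
    return A


full = barbell()
sim_no_bridge = deltacon_similarity(full, barbell(missing=(4, 5)))
sim_no_clique_edge = deltacon_similarity(full, barbell(missing=(0, 1)))
print(sim_no_bridge, sim_no_clique_edge)
```

Unlike edge overlap, this score separates the two barbell variants: dropping the bridge wipes out all influence between the two cliques, so it is rated less similar to the original than dropping an edge inside a clique.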

… but O(n²) …

faster?

O(m1 + m2) in the paper :)


•  Nodes: email accounts of employees
•  Edges: email exchange

Day  1                              Day  2                              Day  3                          Day  4                            Day  5    

sim1   sim2   sim3   sim4  


Temporal  Anomaly    Detection    

Plot: similarity between consecutive days

Feb  4:  Lay  resigns  

Temporal  Anomaly    Detection    
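Once a similarity score is available for each pair of consecutive days, unusually low scores can be flagged. The sketch below uses a median and median-absolute-deviation rule, which is an assumption chosen for simplicity and not necessarily the rule used in the talk; the scores are hypothetical, not the Enron values.

```python
from statistics import median


def flag_anomalies(similarities, k=3.0):
    """Return indices whose similarity lies more than k MADs below the median."""
    center = median(similarities)
    mad = median(abs(s - center) for s in similarities)
    threshold = center - k * mad
    return [i for i, s in enumerate(similarities) if s < threshold]


# Hypothetical similarity scores: sims[i] compares day i with day i + 1.
sims = [0.90, 0.91, 0.89, 0.40, 0.90]
anomalous_transitions = flag_anomalies(sims)
print(anomalous_transitions)
```

A median-based rule is used here because a single extreme drop inflates the standard deviation on short series, which can hide the very point one wants to flag.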

Brain-Connectivity Graph Clustering

•  114 brain graphs
   - Nodes: 70 cortical regions
   - Edges: connections

•  Attributes: gender, IQ, age…

Brain-Connectivity Graph Clustering


t-test p-value = 0.0057

Graph  Understanding  via  …  

•  … Summarization …
   - VoG: to spot the important graph structures

•  … Comparison …
   - DeltaCon: to find the similarity between aligned networks
   - BiG-Align, Uni-Align: to align bi/uni-partite graphs efficiently

Thank  you!  

www.cs.cmu.edu/~dkoutra/pub.htm danai@cs.cmu.edu


summarization similarities Understanding
