Danai Koutra – CMU/Technicolor Researcher, Carnegie Mellon University at MLconf ATL
DESCRIPTION
Networks naturally capture a host of interactions in the real world, spanning from friendships to brain activity. But given a massive graph, like the Facebook social graph, what can be said about its structure? Which are its most important structures? How does it compare to other networks like Twitter? This talk will focus on my work developing scalable algorithms and models that help us make sense of large graphs via pattern discovery and similarity analysis. I will begin by presenting VoG, an approach that efficiently summarizes large graphs by finding their most interesting and semantically meaningful structures. Starting from a clutter of millions of nodes and edges, such as the Enron who-mails-whom graph, our Minimum Description Length-based algorithm disentangles the complex graph connectivity and spotlights the structures that ‘best’ describe the graph. Then, for similarity analysis at the graph level, I will introduce the problems of graph comparison and graph alignment. I will conclude by showing how to apply my methods to temporal anomaly detection, brain graph clustering, deanonymization of bipartite (e.g., user-group membership) and unipartite graphs, and more.
TRANSCRIPT
Carnegie Mellon University
Making Sense of Large Graphs: Summarization and Similarity
MLconf ’14, Atlanta, GA
Danai Koutra Computer Science Department
Carnegie Mellon University
[email protected] http://www.cs.cmu.edu/~dkoutra
Making sense of large graphs
Danai Koutra (CMU) 2
Human Connectome Project
scalable algorithms and models for understanding massive graphs.
>1.25B users!
Understanding Large Graphs
Part 1: Summarization
79,870 email accounts 288,364 emails
Ever tried visualizing a large graph?
Enron Summary
VoG: Top Near-Bipartite Core
Ski excursion: organizers, participants
“Affair”: commenters, CC’ed
Problem Definition
Given: a graph
Find: a succinct summary with possibly overlapping subgraphs ≈ important graph structures.
[Koutra, Kang, Vreeken, Faloutsos. SDM’14]
Lady Gaga Fan Club
Main Ideas
Idea 1: Use well-known structures (vocabulary):
Idea 2: Best graph summary ⇒ optimal compression (MDL)
Shortest lossless description
Minimum Description Length
BACKGROUND
min L(M) + L(D|M)
L(M): # bits for M
L(D|M): # bits for the data using M
Example: $a_1 x + a_0$ (plus errors) vs. $a_{10} x^{10} + a_9 x^9 + \dots + a_0$
~Occam’s razor: simple & good explanations
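The two-part cost L(M) + L(D|M) can be made concrete with a small polynomial-fitting sketch. Everything specific here is an assumption made for illustration and does not come from the talk: 32 bits per coefficient for L(M), a quantized Gaussian code for the residuals as L(D|M), the toy data, and the helper names `fit_polynomial` and `description_length`. Degrees are limited to 0..4 because the plain normal-equation solver becomes ill-conditioned for high degrees.

```python
import math

def fit_polynomial(xs, ys, degree):
    """Least-squares polynomial fit via the normal equations (pure Python)."""
    k = degree + 1
    # Normal equations (V^T V) c = V^T y, where V[i][p] = xs[i] ** p.
    lhs = [[sum(x ** (r + c) for x in xs) for c in range(k)] for r in range(k)]
    rhs = [sum((x ** r) * y for x, y in zip(xs, ys)) for r in range(k)]
    # Forward elimination with partial pivoting.
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(lhs[r][col]))
        lhs[col], lhs[pivot] = lhs[pivot], lhs[col]
        rhs[col], rhs[pivot] = rhs[pivot], rhs[col]
        for r in range(col + 1, k):
            factor = lhs[r][col] / lhs[col][col]
            lhs[r] = [a - factor * b for a, b in zip(lhs[r], lhs[col])]
            rhs[r] -= factor * rhs[col]
    # Back substitution; coeffs[p] multiplies x ** p.
    coeffs = [0.0] * k
    for r in reversed(range(k)):
        tail = sum(lhs[r][c] * coeffs[c] for c in range(r + 1, k))
        coeffs[r] = (rhs[r] - tail) / lhs[r][r]
    return coeffs

def description_length(xs, ys, degree, bits_per_coeff=32, precision=1e-3):
    """L(M) + L(D|M) in bits, under the assumed encoding described above."""
    coeffs = fit_polynomial(xs, ys, degree)
    residuals = [y - sum(c * x ** p for p, c in enumerate(coeffs))
                 for x, y in zip(xs, ys)]
    # Floor the variance at the quantization step so the code length stays finite.
    variance = max(sum(r * r for r in residuals) / len(xs), precision ** 2)
    model_bits = bits_per_coeff * (degree + 1)
    data_bits = len(xs) * (0.5 * math.log2(2 * math.pi * math.e * variance)
                           - math.log2(precision))
    return model_bits + data_bits

# Toy data: a line plus small alternating "errors".
xs = [i / 19 for i in range(20)]
ys = [2 * x + 1 + 0.05 * (-1) ** i for i, x in enumerate(xs)]

best_degree = min(range(5), key=lambda d: description_length(xs, ys, d))
print(best_degree)
```

A higher-degree polynomial never fits worse, but each extra coefficient costs bits that the small drop in residual cost does not repay, which is the Occam's razor trade-off on the slide.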
Formally: Minimum Graph Description
Given:
- a graph G
- vocabulary Ω
Find: model M s.t. min L(G, M) = min { L(M) + L(E) }
Adjacency A, Model M, Error E
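To make L(M) + L(E) tangible, here is a minimal sketch with a one-word vocabulary (cliques only). The encoding is a deliberate simplification chosen for this example, not the encoding used in the VoG paper: a clique costs the bits needed to state its size and member nodes, and the error E (the edges where the model and the adjacency disagree) costs the bits needed to state how many such pairs exist and which ones. The function names are hypothetical.

```python
import math
from itertools import combinations

def bits_for_edge_set(num_edges, num_pairs):
    """Bits to send an edge set: how many edges, then which node pairs they occupy."""
    return math.log2(num_pairs + 1) + math.log2(math.comb(num_pairs, num_edges))

def bits_for_clique(clique_size, num_nodes):
    """Bits to send one clique: its size, then which nodes belong to it."""
    return math.log2(num_nodes) + math.log2(math.comb(num_nodes, clique_size))

def total_bits(edges, num_nodes, cliques):
    """L(M) + L(E) for a model M given as a list of cliques (sets of nodes)."""
    num_pairs = num_nodes * (num_nodes - 1) // 2
    modeled = set()
    for clique in cliques:
        modeled.update(frozenset(p) for p in combinations(sorted(clique), 2))
    error = modeled ^ edges  # pairs where model and adjacency disagree
    model_bits = sum(bits_for_clique(len(c), num_nodes) for c in cliques)
    return model_bits + bits_for_edge_set(len(error), num_pairs)

# Toy graph on 10 nodes: a 6-clique on nodes 0..5 plus one stray edge (6, 7).
edges = {frozenset(p) for p in combinations(range(6), 2)} | {frozenset((6, 7))}

bits_empty_model = total_bits(edges, 10, cliques=[])
bits_clique_model = total_bits(edges, 10, cliques=[set(range(6))])
print(round(bits_empty_model, 1), round(bits_clique_model, 1))
```

With the empty model every edge has to be sent as error; naming the clique explains 15 of the 16 edges at once, so the total description is shorter.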
VoG: Overview
[Figure: candidate subgraphs are matched to vocabulary structures (argmin; ≈, ≈?)]
Pick best (with some criterion)
Summary
Q: Which structures to pick?
A: Those that minimize the description length of G
For a set S of candidate structures: $2^{|S|}$ combinations
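Because trying all $2^{|S|}$ subsets is infeasible, a heuristic is needed. The sketch below shows one possible greedy criterion: visit candidates in order of their stand-alone cost and keep a candidate only if it lowers the total description length. The cost model (a fixed bit cost per structure and a flat 8 bits per unexplained edge), the candidate tuples, and the function names are all assumptions for illustration, not the talk's actual procedure.

```python
def total_cost(selected, all_edges, bits_per_error_edge=8.0):
    """Model bits for the chosen structures plus bits for every unexplained edge."""
    explained = set()
    for _name, _bits, covered in selected:
        explained |= covered
    model_bits = sum(bits for _name, bits, _covered in selected)
    return model_bits + bits_per_error_edge * len(all_edges - explained)

def greedy_select(candidates, all_edges):
    """Keep a candidate only if adding it shortens the total description."""
    ordered = sorted(candidates, key=lambda c: total_cost([c], all_edges))
    selected = []
    best = total_cost([], all_edges)
    for candidate in ordered:
        trial = total_cost(selected + [candidate], all_edges)
        if trial < best:
            selected.append(candidate)
            best = trial
    return selected, best

# Toy instance: edges are just ids 0..19; each candidate is
# (name, bits to describe the structure, set of edge ids it explains).
all_edges = set(range(20))
candidates = [
    ("clique", 20, set(range(0, 10))),
    ("star", 15, set(range(10, 16))),
    ("redundant clique", 25, set(range(0, 8))),
    ("tiny chain", 30, {16, 17}),
]

summary, cost = greedy_select(candidates, all_edges)
print([name for name, _bits, _covered in summary], cost)
```

The redundant and overly expensive candidates are skipped because they cost more bits than they save; the pass looks at each candidate once instead of enumerating every subset.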
Runtime
VoG is near-linear in the number of edges of the input graph.
1.25B users!
Understanding a wiki graph
Nodes: wiki editors
Edges: co-edited
I don’t see anything! ☹
Wiki Controversial Article
Stars: admins, bots, heavy users
Bipartite cores: edit wars
Kiev vs. Kyiv; vandals vs. admins
VoG vs. other methods
                        VoG   Bounded-Error Summarization   Motif Simplification   Clustering Methods   Cross-Associations
Variety of Structures   ✔     ✗                             ✗                      ✗                    ✗
Important Structures    ✔     ✗                             ✗                      ✗                    ✗
Low Complexity          ✔     ✗                             ✗                      ✔(?)                 ✔
Visualization           ✔     ✔                             ✔                      ✗                    ✗
Graph Summary           ✔     ✔                             ✔                      ✗                    ✗
Stars, cliques, near-cliques
[Navlakha+’08] [Dunne+’13] [Chakrabarti+’03]
VoG: summary
• Focus on important
• possibly-overlapping structures
• with known graph-theoretic properties
www.cs.cmu.edu/~dkoutra/SRC/vog.tar
Understanding Large Graphs
Part 2: Similarities
friendship graph ≈ wall posts graph?
① Behavioral Patterns
VS.
Are the graphs / behaviors similar?
Why graph similarity?
② Classification
③ Temporal anomaly detection
④ Intrusion detection
Day 1 Day 2 Day 3 Day 4
sim1 sim2 sim3
Problem Definition: Graph Similarity
• Given: (i) 2 graphs with the same nodes and different edge sets (ii) node correspondence
• Find: similarity score s ∈ [0,1]
GA
GB
Obvious solution?
Edge Overlap (EO): # of common edges (normalized or not)
GA
GB
… but “barbell”…
EO(B10,mB10) == EO(B10,mmB10)
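The barbell problem is easy to reproduce. The sketch below assumes that B10 is two 5-cliques joined by one bridge edge, that the two variants drop a within-clique edge and the bridge edge respectively, and that the normalized form of EO is shared edges divided by the union of edges. None of these details are spelled out on the slide, and the function names are hypothetical.

```python
from itertools import combinations

def barbell_edges(clique_size):
    """Two cliques of the given size joined by a single bridge edge."""
    left = range(clique_size)
    right = range(clique_size, 2 * clique_size)
    edges = {frozenset(p) for p in combinations(left, 2)}
    edges |= {frozenset(p) for p in combinations(right, 2)}
    edges.add(frozenset((clique_size - 1, clique_size)))  # the bridge
    return edges

def edge_overlap(edges_a, edges_b):
    """Normalized edge overlap: shared edges divided by all edges seen."""
    union = edges_a | edges_b
    return len(edges_a & edges_b) / len(union) if union else 1.0

b10 = barbell_edges(5)                          # 10 + 10 + 1 = 21 edges
without_clique_edge = b10 - {frozenset((0, 1))}  # graph stays connected
without_bridge = b10 - {frozenset((4, 5))}       # graph splits in two

eo_clique = edge_overlap(b10, without_clique_edge)
eo_bridge = edge_overlap(b10, without_bridge)
print(eo_clique, eo_bridge)
```

Both scores come out identical even though removing the bridge disconnects the graph, which is why edge overlap alone fails the "edge importance" intuition.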
[Figure: G_A vs. G_B and G_A vs. G_B’]
What makes a similarity function good?
• Properties:
  - Intuitive: properties like “Edge-importance” and “Weight-awareness”
  - Scalable
MAIN IDEA: DELTACON
[Figure: influence matrices S_A and S_B]
① Find the pairwise node influence, S_A & S_B.
② Find the similarity between S_A & S_B.
DETAILS
How? Using Belief Propagation. Attenuating neighboring influence for small ε:
$S = [I + \epsilon^2 D - \epsilon A]^{-1} \approx [I - \epsilon A]^{-1} = I + \epsilon A + \epsilon^2 A^2 + \dots$
INTUITION
εA: 1-hop, ε²A²: 2-hops, …
Note: ε > ε² > …, 0 < ε < 1
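The attenuation intuition can be checked numerically by truncating the series I + εA + ε²A² after two hops. This is only an illustration of the power-series view on a hypothetical 4-node path graph with ε = 0.1; it is not how the influence matrix would be computed in practice, and the helper names are made up for the example.

```python
def mat_mul(a, b):
    """Plain dense matrix product for small lists-of-lists matrices."""
    n, m, p = len(a), len(b), len(b[0])
    return [[sum(a[i][k] * b[k][j] for k in range(m)) for j in range(p)]
            for i in range(n)]

def truncated_influence(adj, eps, hops=2):
    """I + eps*A + eps^2*A^2 + ... truncated after `hops` terms."""
    n = len(adj)
    identity = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    result = [row[:] for row in identity]
    term = identity
    for _ in range(hops):
        # Next term of the series: eps * (previous term) * A.
        term = [[eps * v for v in row] for row in mat_mul(term, adj)]
        result = [[r + t for r, t in zip(r_row, t_row)]
                  for r_row, t_row in zip(result, term)]
    return result

# Path graph 0 - 1 - 2 - 3.
path = [
    [0, 1, 0, 0],
    [1, 0, 1, 0],
    [0, 1, 0, 1],
    [0, 0, 1, 0],
]

S = truncated_influence(path, eps=0.1, hops=2)
print(S[0][1], S[0][2], S[0][3])
```

Node 0 influences its 1-hop neighbor with weight ε, its 2-hop neighbor with weight ε², and the 3-hop node not at all under a 2-hop truncation, matching the note that ε > ε² > ….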
OUR SOLUTION: DELTACON
DETAILS
① Find the pairwise node influence, S_A & S_B.
② Find the similarity between S_A & S_B.
$sim(S_A, S_B) = \frac{1}{1 + \sqrt{\sum_{i,j} \left(\sqrt{s_{A,ij}} - \sqrt{s_{B,ij}}\right)^2}}$
“Root” Euclidean Distance
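Both steps fit in a short quadratic-time sketch: invert I + ε²D − εA for each graph, then turn the root Euclidean distance into a similarity. The Gauss-Jordan inverse, the choice ε = 1 / (1 + max degree), the small barbell of two 4-cliques, and all function names are assumptions made for this example; this is the naive dense version, not the fast variant referred to in the talk.

```python
import math
from itertools import combinations

def invert(matrix):
    """Gauss-Jordan inverse with partial pivoting (fine for small dense matrices)."""
    n = len(matrix)
    aug = [list(row) + [1.0 if i == j else 0.0 for j in range(n)]
           for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [v / p for v in aug[col]]
        for r in range(n):
            factor = aug[r][col]
            if r != col and factor != 0.0:
                aug[r] = [a - factor * b for a, b in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]

def influence_matrix(edges, n, eps):
    """S = [I + eps^2 D - eps A]^{-1} for an undirected, unweighted graph."""
    m = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    for u, v in edges:
        m[u][v] -= eps
        m[v][u] -= eps
        m[u][u] += eps * eps  # each incident edge adds eps^2 to the degree term
        m[v][v] += eps * eps
    return invert(m)

def deltacon_similarity(edges_a, edges_b, n, eps):
    """1 / (1 + root Euclidean distance) between the two influence matrices."""
    sa = influence_matrix(edges_a, n, eps)
    sb = influence_matrix(edges_b, n, eps)
    dist = math.sqrt(sum(
        (math.sqrt(max(sa[i][j], 0.0)) - math.sqrt(max(sb[i][j], 0.0))) ** 2
        for i in range(n) for j in range(n)))
    return 1.0 / (1.0 + dist)

# Small barbell: cliques on nodes 0..3 and 4..7, bridge (3, 4); max degree is 4.
barbell = set(combinations(range(4), 2)) | set(combinations(range(4, 8), 2)) | {(3, 4)}
eps = 1.0 / (1 + 4)

sim_same = deltacon_similarity(barbell, barbell, 8, eps)
sim_clique = deltacon_similarity(barbell, barbell - {(0, 1)}, 8, eps)
sim_bridge = deltacon_similarity(barbell, barbell - {(3, 4)}, 8, eps)
print(sim_same, sim_clique, sim_bridge)
```

Unlike edge overlap, this score separates the two barbell variants: dropping the bridge removes all influence between the two cliques, so it is penalized more than dropping an edge inside a clique.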
… but O(n²) …
faster?
O(m1 + m2) in the paper ☺
• Nodes: email accounts of employees
• Edges: email exchange
Day 1 Day 2 Day 3 Day 4 Day 5
sim1 sim2 sim3 sim4
Temporal Anomaly Detection
[Plot: similarity between consecutive days; annotation: Feb 4: Lay resigns]
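Once consecutive-day similarities sim1, sim2, … are available, anomalous transitions are the ones with unusually low similarity. The sketch below flags scores more than two standard deviations below the mean. The threshold rule and the scores themselves are made up for illustration; they are not the Enron results and not necessarily the rule used in the talk.

```python
from statistics import mean, pstdev

def flag_anomalies(similarities, num_std=2.0):
    """Indices t where sim(day t, day t+1) falls far below the typical level."""
    threshold = mean(similarities) - num_std * pstdev(similarities)
    return [t for t, s in enumerate(similarities) if s < threshold]

# Hypothetical similarity scores between consecutive daily graphs.
sims = [0.90, 0.92, 0.91, 0.40, 0.90, 0.93]

anomalous = flag_anomalies(sims)
print(anomalous)
```

An index t in the result means the graph changed sharply between day t and day t+1, which is the kind of dip the plot annotates.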
Brain-Connectivity Graph Clustering
• 114 brain graphs
  - Nodes: 70 cortical regions
  - Edges: connections
• Attributes: gender, IQ, age…
Brain-Connectivity Graph Clustering
t-test p-value = 0.0057
Graph Understanding via …
• … Summarization …
  - VoG: to spot the important graph structures
• … Comparison …
  - DeltaCon: to find the similarity between aligned networks
  - BiG-Align, Uni-Align: to align bi/uni-partite graphs efficiently
Thank you!
www.cs.cmu.edu/~dkoutra/pub.htm [email protected]