part 1: graph mining – patternschristos/talks/17-cmu-tepper-alan/faloutsos-p1-patterns.pdf(c) c....
TRANSCRIPT
(C) C. Faloutsos, 2017
1
CMU SCS
Part 1: Graph Mining – patterns
Christos Faloutsos CMU
CMU SCS
Our goal:
Open source system for mining huge graphs: PEGASUS project (PEta GrAph mining
System) • www.cs.cmu.edu/~pegasus • code and papers Tepper, CMU, April 4 (c) C. Faloutsos, 2017 2
CMU SCS
References • D. Chakrabarti, C. Faloutsos: Graph Mining – Laws,
Tools and Case Studies, Morgan Claypool 2012 • http://www.morganclaypool.com/doi/abs/10.2200/
S00449ED1V01Y201209DMK006
Tepper, CMU, April 4 (c) C. Faloutsos, 2017 3
CMU SCS
Outline
• Introduction – Motivation • Part#1: Patterns in graphs • Part#2: Tools (Ranking, proximity) • Conclusions
Tepper, CMU, April 4 (c) C. Faloutsos, 2017 4
(C) C. Faloutsos, 2017
2
CMU SCS
Graphs - why should we care?
Internet Map [lumeta.com]
Food Web [Martinez ’91]
Protein Interactions [genomebiology.com]
Friendship Network [Moody ’01]
Tepper, CMU, April 4 (c) C. Faloutsos, 2017 5
CMU SCS
Graphs - why should we care? • IR: bi-partite graphs (doc-terms)
• web: hyper-text graph
• ... and more:
D1
DN
T1
TM
... ...
Tepper, CMU, April 4 (c) C. Faloutsos, 2017 6
CMU SCS
Graphs - why should we care? • network of companies & board-of-directors
members • ‘viral’ marketing • web-log (‘blog’) news propagation • computer network security: email/IP traffic
and anomaly detection • ....
Tepper, CMU, April 4 (c) C. Faloutsos, 2017 7
CMU SCS
Outline
• Introduction – Motivation • Patterns in graphs
– Patterns in Static graphs – Patterns in Weighted graphs – Patterns in Time evolving graphs
Tepper, CMU, April 4 (c) C. Faloutsos, 2017 8
(C) C. Faloutsos, 2017
3
CMU SCS
Network and graph mining
• How does the Internet look like? • How does FaceBook look like?
• What is ‘normal’/‘abnormal’? • which patterns/laws hold?
Tepper, CMU, April 4 (c) C. Faloutsos, 2017 9
CMU SCS
Network and graph mining
• How does the Internet look like? • How does FaceBook look like?
• What is ‘normal’/‘abnormal’? • which patterns/laws hold?
– To spot anomalies (rarities), we have to discover patterns
Tepper, CMU, April 4 (c) C. Faloutsos, 2017 10
CMU SCS
Network and graph mining
• How does the Internet look like? • How does FaceBook look like?
• What is ‘normal’/‘abnormal’? • which patterns/laws hold?
– To spot anomalies (rarities), we have to discover patterns
– Large datasets reveal patterns/anomalies that may be invisible otherwise…
Tepper, CMU, April 4 (c) C. Faloutsos, 2017 11
CMU SCS
Topology
How does the Internet look like? Any rules?
(Looks random – right?)
Tepper, CMU, April 4 (c) C. Faloutsos, 2017 12
(C) C. Faloutsos, 2017
4
CMU SCS
Graph mining • Are real graphs random?
Tepper, CMU, April 4 (c) C. Faloutsos, 2017 13
CMU SCS
Laws and patterns • Are real graphs random? • A: NO!!
– Diameter – in- and out- degree distributions – other (surprising) patterns
• So, let’s look at the data
Tepper, CMU, April 4 (c) C. Faloutsos, 2017 14
CMU SCS
Laws – degree distributions • Q: avg degree is ~2 - what is the most
probable degree?
degree
count ??
2
Tepper, CMU, April 4 (c) C. Faloutsos, 2017 15
CMU SCS
Laws – degree distributions • Q: avg degree is ~2 - what is the most
probable degree?
degree degree
count ?? count
2 2
Tepper, CMU, April 4 (c) C. Faloutsos, 2017 16
(C) C. Faloutsos, 2017
5
CMU SCS
Solution S1 .Power-law: outdegree O
The plot is linear in log-log scale [FFF’99]
freq = degree (-2.15)
O = -2.15 Exponent = slope
Outdegree
Frequency
Nov’97
-2.15
Tepper, CMU, April 4 (c) C. Faloutsos, 2017 17
CMU SCS
Solution# S.1’
• Power law in the degree distribution [SIGCOMM99]
log(rank)
log(degree)
-0.82
internet domains
att.com
ibm.com
Tepper, CMU, April 4 (c) C. Faloutsos, 2017 18
CMU SCS
Solution# S.2: Eigen Exponent E
• A2: power law in the eigenvalues of the adjacency matrix
E = -0.48
Exponent = slope
Eigenvalue
Rank of decreasing eigenvalue
May 2001
Tepper, CMU, April 4 (c) C. Faloutsos, 2017 19
CMU SCS
Solution# S.2: Eigen Exponent E
• [Mihail, Papadimitriou ’02]: slope is ½ of rank exponent
E = -0.48
Exponent = slope
Eigenvalue
Rank of decreasing eigenvalue
May 2001
Tepper, CMU, April 4 (c) C. Faloutsos, 2017 20
(C) C. Faloutsos, 2017
6
CMU SCS
But: How about graphs from other domains?
Tepper, CMU, April 4 (c) C. Faloutsos, 2017 21
CMU SCS
More power laws: • web hit counts [w/ A. Montgomery]
Web Site Traffic
in-degree (log scale)
Count (log scale)
Zipf
users sites
``ebay’’
Tepper, CMU, April 4 (c) C. Faloutsos, 2017 22
CMU SCS
epinions.com • who-trusts-whom
[Richardson + Domingos, KDD 2001]
(out) degree
count
trusts-2000-people user
Tepper, CMU, April 4 (c) C. Faloutsos, 2017 23
CMU SCS
And numerous more • # of sexual contacts • Income [Pareto] –’80-20 distribution’ • Duration of downloads [Bestavros+] • Duration of UNIX jobs (‘mice and
elephants’) • Size of files of a user • … • ‘Black swans’ Tepper, CMU, April 4 (c) C. Faloutsos, 2017 24
(C) C. Faloutsos, 2017
7
CMU SCS
Outline
• Introduction – Motivation • Patterns in graphs
– Patterns in Static graphs • Degree • Triangles • …
– Patterns in Weighted graphs – Patterns in Time evolving graphs
• Generators Tepper, CMU, April 4 (c) C. Faloutsos, 2017 25
CMU SCS
Solution# S.3: Triangle ‘Laws’
• Real social networks have a lot of triangles
Tepper, CMU, April 4 (c) C. Faloutsos, 2017 26
CMU SCS
Solution# S.3: Triangle ‘Laws’
• Real social networks have a lot of triangles – Friends of friends are friends
• Any patterns?
Tepper, CMU, April 4 (c) C. Faloutsos, 2017 27
CMU SCS
Triangle Law: #S.3 [Tsourakakis ICDM 2008]
ASN HEP-TH
Epinions X-axis: # of Triangles a node participates in Y-axis: count of such nodes
Tepper, CMU, April 4 (c) C. Faloutsos, 2017 28
(C) C. Faloutsos, 2017
8
CMU SCS
Triangle Law: #S.3 [Tsourakakis ICDM 2008]
ASN HEP-TH
Epinions X-axis: # of Triangles a node participates in Y-axis: count of such nodes
Tepper, CMU, April 4 (c) C. Faloutsos, 2017 29
CMU SCS
Triangle Law: #S.4 [Tsourakakis ICDM 2008]
SN Reuters
Epinions X-axis: degree Y-axis: mean # triangles n friends -> ~n1.6 triangles
Tepper, CMU, April 4 (c) C. Faloutsos, 2017 30
CMU SCS
Outline
• Introduction – Motivation • Patterns in graphs
– Patterns in Static graphs – Patterns in Weighted graphs – Patterns in Time evolving graphs
• Generators
Tepper, CMU, April 4 (c) C. Faloutsos, 2017 31
CMU SCS Observations on weighted
graphs? • A: yes - even more ‘laws’!
M. McGlohon, L. Akoglu, and C. Faloutsos Weighted Graphs and Disconnected Components: Patterns and a Generator. SIG-KDD 2008
Tepper, CMU, April 4 (c) C. Faloutsos, 2017 32
(C) C. Faloutsos, 2017
9
CMU SCS
Observation W.1: Fortification Q: How do the weights of nodes relate to degree?
Tepper, CMU, April 4 (c) C. Faloutsos, 2017 33
CMU SCS
Observation W.1: Fortification
More donors, more $ ?
$10
$5
‘Reagan’
‘Clinton’ $7
Tepper, CMU, April 4 (c) C. Faloutsos, 2017 34
CMU SCS
Edges (# donors)
In-weights ($)
Observation W.1: fortification: Snapshot Power Law
• Weight: super-linear on in-degree • exponent ‘iw’: 1.01 < iw < 1.26
Orgs-Candidates
e.g. John Kerry, $10M received, from 1K donors
More donors, even more $
$10
$5
Tepper, CMU, April 4 (c) C. Faloutsos, 2017 35
CMU SCS
Outline
• Introduction – Motivation • Patterns in graphs
– Patterns in Static graphs – Patterns in Weighted graphs – Patterns in Time evolving graphs
• Generators
Tepper, CMU, April 4 (c) C. Faloutsos, 2017 36
(C) C. Faloutsos, 2017
10
CMU SCS
Problem: Time evolution • with Jure Leskovec (CMU ->
Stanford)
• and Jon Kleinberg (Cornell – sabb. @ CMU)
Tepper, CMU, April 4 (c) C. Faloutsos, 2017 37
CMU SCS
T.1 Evolution of the Diameter • Prior work on Power Law graphs hints
at slowly growing diameter: – diameter ~ O(log N) – diameter ~ O(log log N)
• What is happening in real data?
Tepper, CMU, April 4 (c) C. Faloutsos, 2017 38
CMU SCS
T.1 Evolution of the Diameter • Prior work on Power Law graphs hints
at slowly growing diameter: – diameter ~ O(log N) – diameter ~ O(log log N)
• What is happening in real data? • Diameter shrinks over time
Tepper, CMU, April 4 (c) C. Faloutsos, 2017 39
CMU SCS
T.1 Diameter – “Patents”
• Patent citation network
• 25 years of data • @1999
– 2.9 M nodes – 16.5 M edges
time [years]
diameter
Tepper, CMU, April 4 (c) C. Faloutsos, 2017 40
(C) C. Faloutsos, 2017
11
CMU SCS T.2 Temporal Evolution of the
Graphs
• N(t) … nodes at time t • E(t) … edges at time t • Suppose that
N(t+1) = 2 * N(t) • Q: what is your guess for
E(t+1) =? 2 * E(t)
Tepper, CMU, April 4 (c) C. Faloutsos, 2017 41
CMU SCS T.2 Temporal Evolution of the
Graphs
• N(t) … nodes at time t • E(t) … edges at time t • Suppose that
N(t+1) = 2 * N(t) • Q: what is your guess for
E(t+1) =? 2 * E(t)
• A: over-doubled! – But obeying the ``Densification Power Law’’
Tepper, CMU, April 4 (c) C. Faloutsos, 2017 42
CMU SCS T.2 Densification – Patent
Citations • Citations among
patents granted • @1999
– 2.9 M nodes – 16.5 M edges
• Each year is a datapoint
N(t)
E(t)
1.66
Tepper, CMU, April 4 (c) C. Faloutsos, 2017 43
CMU SCS
Outline
• Introduction – Motivation • Patterns in graphs
– Patterns in Static graphs – Patterns in Weighted graphs – Patterns in Time evolving graphs
• Generators
Tepper, CMU, April 4 (c) C. Faloutsos, 2017 44
(C) C. Faloutsos, 2017
12
CMU SCS
More on Time-evolving graphs
M. McGlohon, L. Akoglu, and C. Faloutsos Weighted Graphs and Disconnected Components: Patterns and a Generator. SIG-KDD 2008
Tepper, CMU, April 4 (c) C. Faloutsos, 2017 45
CMU SCS
Observation T.3: NLCC behavior Q: How do NLCC’s emerge and join with
the GCC? (``NLCC’’ = non-largest conn. components) – Do they continue to grow in size? – or do they shrink? – or stabilize?
Tepper, CMU, April 4 (c) C. Faloutsos, 2017 46
CMU SCS
Observation T.3: NLCC behavior • After the gelling point, the GCC takes off, but
NLCC’s remain ~constant (actually, oscillate).
IMDB
CC size
Time-stamp Tepper, CMU, April 4 (c) C. Faloutsos, 2017 47
CMU SCS Generalized Iterated Matrix
Vector Multiplication (GIMV)
PEGASUS: A Peta-Scale Graph Mining System - Implementation and Observations. U Kang, Charalampos E. Tsourakakis, and Christos Faloutsos. (ICDM) 2009, Miami, Florida, USA. Best Application Paper (runner-up).
Tepper, CMU, April 4 (c) C. Faloutsos, 2017 48
(C) C. Faloutsos, 2017
13
CMU SCS
Example: GIM-V At Work • Connected Components
Size
Count
Tepper, CMU, April 4 (c) C. Faloutsos, 2017 49
CMU SCS
Example: GIM-V At Work • Connected Components
Size
Count
~0.7B singleton nodes
Tepper, CMU, April 4 (c) C. Faloutsos, 2017 50
CMU SCS
Example: GIM-V At Work • Connected Components
Size
Count
Tepper, CMU, April 4 (c) C. Faloutsos, 2017 51
CMU SCS
Example: GIM-V At Work • Connected Components
Size
Count 300-size
cmpt X 500. Why? 1100-size cmpt
X 65. Why?
Tepper, CMU, April 4 (c) C. Faloutsos, 2017 52
(C) C. Faloutsos, 2017
14
CMU SCS
Example: GIM-V At Work • Connected Components
Size
Count
suspicious financial-advice sites
(not existing now)
Tepper, CMU, April 4 (c) C. Faloutsos, 2017 53
CMU SCS
GIM-V At Work • Connected Components over Time • LinkedIn: 7.5M nodes and 58M edges
Stable tail slope after the gelling point
Tepper, CMU, April 4 (c) C. Faloutsos, 2017 54
CMU SCS
Timing for Blogs
• with Mary McGlohon (CMU) • Jure Leskovec (CMU->Stanford) • Natalie Glance (now at Google) • Mat Hurst (now at MSR) [SDM’07]
Tepper, CMU, April 4 (c) C. Faloutsos, 2017 55
CMU SCS
T.4 : popularity over time
Post popularity drops-off – exponentially?
lag: days after post
# in links
1 2 3
@t
@t + lag
Tepper, CMU, April 4 (c) C. Faloutsos, 2017 56
(C) C. Faloutsos, 2017
15
CMU SCS
T.4 : popularity over time
Post popularity drops-off – exponentially? POWER LAW! Exponent?
# in links (log)
1 2 3 days after post (log)
Tepper, CMU, April 4 (c) C. Faloutsos, 2017 57
CMU SCS
T.4 : popularity over time
Post popularity drops-off – exponentially? POWER LAW! Exponent? -1.6 • close to -1.5: Barabasi’s stack model • and like the zero-crossings of a random walk
# in links (log)
1 2 3
-1.6
days after post (log)
Tepper, CMU, April 4 (c) C. Faloutsos, 2017 58
CMU SCS
Conclusions (part1)
MANY patterns in real graphs • Skewed degree distributions • Small (and shrinking) diameter • Power-laws wrt triangles • Oscillating size of connected components • … and more
Tepper, CMU, April 4 (c) C. Faloutsos, 2017 59
CMU SCS
References • D. Chakrabarti, C. Faloutsos: Graph Mining – Laws,
Tools and Case Studies, Morgan Claypool 2012 • http://www.morganclaypool.com/doi/abs/10.2200/
S00449ED1V01Y201209DMK006
Tepper, CMU, April 4 (c) C. Faloutsos, 2017 60
(C) C. Faloutsos, 2017
16
CMU SCS
References • Jure Leskovec, Jon Kleinberg and Christos Faloutsos
Graphs over Time: Densification Laws, Shrinking Diameters and Possible Explanations, KDD 2005 (Best Research paper award).
Tepper, CMU, April 4 (c) C. Faloutsos, 2017 61
CMU SCS
Project info www.cs.cmu.edu/~pegasus
Akoglu, Leman
Chau, Polo
Kang, U
McGlohon, Mary
Tsourakakis, Babis
Tong, Hanghang
Prakash, Aditya
Thanks to: NSF IIS-0705359, IIS-0534205, CTA-INARC; Yahoo (M45), LLNL, IBM, SPRINT, INTEL, HP Tepper, CMU, April 4 (c) C. Faloutsos, 2017 62
CMU SCS
Part1 END
Tepper, CMU, April 4 (c) C. Faloutsos, 2017 63