U Kang 1
Advanced Data Mining
Graph Basics and Diameter
U Kang, Seoul National University
In This Lecture
Basic definitions in graph mining
Small World Phenomenon
Diameter over time
Outline
Basic Definition
Small World Phenomenon
Diameter over Time
Conclusion
Basic Definition
Graph: a way of specifying relationships among a collection of items.
Nodes (or vertices): the items
Edges: the relationships between pairs of items
Basic Definition
Types of graph: directed vs. undirected graph
[Figure: a directed graph next to an undirected graph]
Basic Definition
Types of graph: weighted vs. unweighted graph
[Figure: a weighted graph with edge weights 1.2, 0.1, and 2.5 next to an unweighted graph]
Basic Definition
Types of graph: simple vs. attributed graph
[Figure: a simple graph next to an attributed graph whose nodes carry the labels CEO, Research Manager, Assistant, and Marketing Manager]
Examples of graph
ARPANET, December 1970 1)
Social network 2) (Facebook, Twitter)
1) UCLA and BBN, ARPANET in December 1970, 1970, https://commons.wikimedia.org/wiki/File:ARPANET_1970_Map.png
2) MRFerocius, Social Network, 2011, https://stackoverflow.com/questions/4594962/social-network-directed-graph-library-for-net
More Examples
Protein interactions 1)
World Wide Web
Document network 2)
Patent, DBLP
1) Wikipedia, Schizophrenia PPI, 2016, https://en.wikipedia.org/wiki/Protein%E2%80%93protein_interaction
2) Wikipedia, Logo of the English Wikipedia, 2001, https://en.wikipedia.org/wiki/English_Wikipedia
Even more examples
Call graph (who calls whom)
Email graph (who emailed whom)
Movie-actor database from IMDB
(more examples?)
More definitions
Two vertices are adjacent if they share a common edge
Two adjacent vertices are neighbors
An edge is incident with another edge if they share a vertex
An edge is incident with two vertices
The degree of a node is the number of its neighbors
Directed graph: in-degree, out-degree
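As a minimal sketch of these definitions, neighbors and degrees can be computed from an edge list. The example graph and the helper names `neighbors` and `in_out_degrees` are illustrative assumptions, not part of the lecture.

```python
from collections import defaultdict

def neighbors(edges):
    """Map each node of an undirected graph to the set of its neighbors."""
    adj = defaultdict(set)
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    return adj

def in_out_degrees(edges):
    """Return (in_degree, out_degree) for a directed edge list of (source, target) pairs."""
    indeg, outdeg = defaultdict(int), defaultdict(int)
    for u, v in edges:
        outdeg[u] += 1
        indeg[v] += 1
    return indeg, outdeg

# Hypothetical example graph with edges A-B, A-C, B-C, C-D.
edges = [("A", "B"), ("A", "C"), ("B", "C"), ("C", "D")]

# Undirected view: degree = number of neighbors.
adj = neighbors(edges)
degree = {node: len(nbrs) for node, nbrs in adj.items()}

# Directed view: read each pair as source -> target.
indeg, outdeg = in_out_degrees(edges)
```

Read as an undirected graph, node C has degree 3; read as a directed graph, it has in-degree 2 and out-degree 1.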
Connection
Connected graph
Disconnected graph
Connected Component
[Figure: a graph with lettered nodes, split into two small disconnected components and one giant connected component]
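One way to find connected components is to start a breadth-first search from every node that has not been reached yet. The sketch below assumes an undirected graph stored as an adjacency dictionary; the example graph and the function name are made up for illustration.

```python
from collections import deque

def connected_components(adj):
    """Return the connected components of an undirected graph as a list of sets."""
    seen = set()
    components = []
    for start in adj:
        if start in seen:
            continue
        # Everything reachable from `start` forms one component.
        seen.add(start)
        component = {start}
        queue = deque([start])
        while queue:
            u = queue.popleft()
            for v in adj[u]:
                if v not in seen:
                    seen.add(v)
                    component.add(v)
                    queue.append(v)
        components.append(component)
    return components

# Hypothetical graph: a triangle, a single edge, and an isolated node.
adj = {1: [2, 3], 2: [1, 3], 3: [1, 2], 4: [5], 5: [4], 6: []}
components = connected_components(adj)
largest = max(components, key=len)
```

In this toy graph the triangle plays the role of the largest ("giant") component.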
Special Families of Graph
Star graph
Equation for |E| (= m) as a function of |V| (= n)?
Chain graph
Equation for |E| (= m) as a function of |V| (= n)?
Complete graph = full graph = clique graph
Equation for |E| (= m) as a function of |V| (= n)?
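For undirected graphs without self-loops, the edge counts are m = n - 1 for the star, m = n - 1 for the chain, and m = n(n - 1)/2 for the complete graph. The sketch below checks this by building each family explicitly; the function names and node numbering are illustrative assumptions.

```python
def star_edges(n):
    """Star on n nodes: node 0 is the center, joined to every other node."""
    return [(0, i) for i in range(1, n)]

def chain_edges(n):
    """Chain on n nodes: node i is joined to node i + 1."""
    return [(i, i + 1) for i in range(n - 1)]

def complete_edges(n):
    """Complete graph on n nodes: every unordered pair of nodes is joined."""
    return [(i, j) for i in range(n) for j in range(i + 1, n)]

n = 6
m_star = len(star_edges(n))          # n - 1
m_chain = len(chain_edges(n))        # n - 1
m_complete = len(complete_edges(n))  # n * (n - 1) / 2
```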
Special Families of Graph
Bipartite graph
Complete bipartite graph
Path, Cycle, Walk
Path: a sequence of connected vertices
Cycle: a path whose start and end vertices are the same
Simple path: no repeated vertices
Simple cycle: no repeated vertices, except the start vertex (= end vertex)
Some authors use ‘path’ for simple path (no repetition), and ‘walk’ for path (repetition allowed)
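The four definitions can be written as small checks on a vertex sequence. This is a sketch that follows the slide's terminology (a path may repeat vertices); the adjacency-set representation, the function names, and the example graph are assumptions made for illustration.

```python
def is_path(adj, seq):
    """True if every pair of consecutive vertices in seq is joined by an edge."""
    return len(seq) > 0 and all(v in adj[u] for u, v in zip(seq, seq[1:]))

def is_simple_path(adj, seq):
    """A path with no repeated vertices."""
    return is_path(adj, seq) and len(set(seq)) == len(seq)

def is_cycle(adj, seq):
    """A path whose start and end vertices are the same."""
    return len(seq) > 1 and seq[0] == seq[-1] and is_path(adj, seq)

def is_simple_cycle(adj, seq):
    """A cycle with no repeated vertices except start vertex == end vertex."""
    return is_cycle(adj, seq) and len(set(seq[:-1])) == len(seq) - 1

# Hypothetical graph: a square 1-2-3-4-1 with one diagonal 1-3.
adj = {1: {2, 3, 4}, 2: {1, 3}, 3: {1, 2, 4}, 4: {1, 3}}
```

For example, `[1, 2, 3, 1, 4]` is a path but not a simple path, and `[1, 2, 3, 4, 1]` is a simple cycle.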
Subgraph
Subgraph: a subset of the nodes and edges of the graph
Induced subgraph: the subgraph formed by a set of vertices together with all edges of the graph between them
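A short sketch of the induced subgraph: keep exactly those edges whose two endpoints both lie in the chosen vertex set. The edge-list representation and the example are illustrative assumptions.

```python
def induced_subgraph(edges, nodes):
    """Edges of the subgraph induced by `nodes`: both endpoints must be in `nodes`."""
    keep = set(nodes)
    return [(u, v) for u, v in edges if u in keep and v in keep]

# Hypothetical example graph with edges A-B, A-C, B-C, C-D.
edges = [("A", "B"), ("A", "C"), ("B", "C"), ("C", "D")]
sub = induced_subgraph(edges, {"A", "B", "C"})
```

Dropping the edge A-C from `sub` would still give a subgraph, but no longer the induced one.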
Outline
Basic Definition
Small World Phenomenon
Diameter over Time
Conclusion
Distance in graph
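Length of a path: number of steps from beginning to end
Distance between two nodes: length of the shortest path between them
[Figure: a social network centered on “You”]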
Distance 1: “friends”
Distance 2: “friends of friends”
Distance 3: “friends of friends of friends”
Algorithm: Breadth First Search
Find one-step neighbors.
Find two-step neighbors.
Continue…
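The level-by-level expansion can be sketched as a standard breadth-first search that records each node's distance from the source. The friendship graph and all names in it are hypothetical.

```python
from collections import deque

def bfs_distances(adj, source):
    """Return the shortest-path distance (in steps) from source to every reachable node."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:          # first visit gives the shortest distance
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist

# Hypothetical friendship graph centered on "you".
adj = {
    "you": ["ann", "bob"],
    "ann": ["you", "cat"],
    "bob": ["you", "cat"],
    "cat": ["ann", "bob", "dan"],
    "dan": ["cat"],
}
dist = bfs_distances(adj, "you")
```

Here `ann` and `bob` are friends (distance 1), `cat` is a friend of friends (distance 2), and `dan` is at distance 3.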
Radius and Diameter
Radius of a node: the largest of its shortest-path distances to all other nodes
Diameter of a graph: the maximum radius over all nodes
[Figure: example graph with a node v]
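Both quantities follow directly from breadth-first search: one search per node gives its radius (elsewhere often called eccentricity), and the maximum over all nodes gives the diameter. The sketch below assumes a connected, undirected graph; the function names and the chain example are illustrative.

```python
from collections import deque

def node_radius(adj, source):
    """Longest shortest-path distance from source (assumes a connected graph)."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return max(dist.values())

def diameter(adj):
    """Maximum node radius; runs one BFS per node."""
    return max(node_radius(adj, v) for v in adj)

# Hypothetical chain graph a - b - c - d.
adj = {"a": ["b"], "b": ["a", "c"], "c": ["b", "d"], "d": ["c"]}
```

Running one search per node costs O(n(n + m)) time, which is why exact diameters become expensive on very large graphs.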
Small World Phenomenon
Milgram’s experiment
Six degrees of separation
The median value is “6”
[Figure: number of chains vs. number of intermediaries, N = 64]
- Jeffrey Travers and Stanley Milgram, An Experimental Study of the Small World Problem, Sociometry, Vol. 32, No. 4, 1969, pp. 432
Small World Phenomenon
Erdos Number
Paul Erdos: published ~1500 papers
Erdos number: the distance to Paul Erdos in the co-authorship graph
[Figures: 1) Paul Erdos with Terence Tao; 2) Erdos number]
1) Billy and Grace Tao, Paul Erdos with Terence Tao, 1985, https://commons.wikimedia.org/wiki/File:Paul_Erdos_with_Terence_Tao.jpg
2) H2g2bob, Erdosnumber, 2006, https://commons.wikimedia.org/wiki/File:Erdosnumber.png
Small World Phenomenon
Erdos Number
Albert Einstein: 2
Enrico Fermi: 3
Noam Chomsky: 4
Most mathematicians have Erdos numbers at most 4 or 5
Small World Phenomenon
Kevin Bacon number
Distance to Kevin Bacon in the actor-actor graph from the Internet Movie Database (IMDB) as of 1994
Average Bacon number of actors: 2.9
- GabboT, Kevin Bacon TIFF 2015, 2015, https://commons.wikimedia.org/wiki/File:Kevin_Bacon_TIFF_2015.jpg
Outline
Basic Definition
Small World Phenomenon
Diameter over Time
  Observation
  Model
Conclusion
Materials based on Jure Leskovec’s slide. J. Leskovec, J. Kleinberg, C. Faloutsos. “Graphs over Time: Densification Laws, Shrinking Diameters and Possible Explanations”, ACM SIGKDD, 2005
Graph Evolution
How do the graphs evolve over time?
Applications
Network simulation
Network prediction
Graph Evolution
How do the graphs evolve over time?
Conventional wisdom
Constant average degree: the number of edges grows linearly with the number of nodes
Slowly growing diameter
New findings
Densification power law: networks are becoming denser over time
Shrinking diameter
Temporal Evolution of Graphs
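Densification Power Law
Networks are becoming denser over time
Average degree is increasing
With N(t) nodes and E(t) edges at time t:
$E(t) \propto N(t)^{a}$
or equivalently
$\log E(t) = a \log N(t) + \text{const}$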
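To see why densification means a growing average degree, here is a short derivation. It assumes the power law $E(t) \propto N(t)^{a}$ for the number of edges E(t) and nodes N(t); the constant $c$ is introduced here only for illustration.

```latex
E(t) = c \, N(t)^{a}
\quad\Longrightarrow\quad
\frac{E(t)}{N(t)} = c \, N(t)^{a-1}
```

The ratio E(t)/N(t) is the average out-degree of a directed graph (the average degree of an undirected graph is 2E(t)/N(t)). It stays constant when a = 1 and grows with N(t) whenever a > 1.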
Graph Densification – A closer look
Densification Power Law
Densification exponent a, with 1 ≤ a ≤ 2:
a = 1: linear growth, constant out-degree (assumed in the literature so far)
a = 2: quadratic growth, clique
What are the exponents a for real graphs?
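One plausible way to estimate the exponent a from a series of graph snapshots is a least-squares line fit of log E against log N; the slope of that line is the estimate. This is a sketch, not necessarily the fitting procedure used for the plots in this lecture, and the snapshot numbers below are synthetic, not real data.

```python
import math

def densification_exponent(nodes, edges):
    """Least-squares slope of log(E) against log(N) over a series of snapshots."""
    xs = [math.log(n) for n in nodes]
    ys = [math.log(e) for e in edges]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    return sxy / sxx

# Synthetic snapshots (not real data) that follow E = N^1.5 exactly.
nodes = [100, 400, 1600, 6400]
edges = [n ** 1.5 for n in nodes]
a = densification_exponent(nodes, edges)
```

On data that grows linearly (E proportional to N) the same function returns a slope close to 1, and on clique-like growth (E proportional to N squared) a slope close to 2.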
Densification – Physics Citations
Citations among physics papers
1992: 1,293 papers, 2,717 citations
2003: 29,555 papers, 352,807 citations
For each month M, create a graph of all citations up to month M
[Figure: E(t) vs. N(t), slope 1.69]
Densification – Patent Citations
Citations among patents granted
1975: 334,000 nodes, 676,000 edges
1999: 2.9 million nodes, 16.5 million edges
Each year is a data point
[Figure: E(t) vs. N(t), slope 1.66]
Densification – Autonomous Systems
Graph of the Internet
1997: 3,000 nodes, 10,000 edges
2000: 6,000 nodes, 26,000 edges
One graph per day
[Figure: E(t) vs. N(t), slope 1.18]
Densification – Affiliation Network
Authors linked to their publications
1992: 318 nodes, 272 edges
2002: 60,000 nodes (20,000 authors, 38,000 papers), 133,000 edges
[Figure: E(t) vs. N(t), slope 1.15]
Graph Densification – Summary
The traditional constant out-degree assumption does not hold
Real-world graphs: $E(t) \propto N(t)^{a}$ with $a > 1$
The average degree is increasing
Evolution of the Diameter
Prior work on power-law graphs hints at a slowly growing diameter:
diameter ~ O(log N)
diameter ~ O(log log N)
What is happening in real data?
Diameter shrinks over time
As the network grows, the distances between nodes slowly decrease
Diameter – ArXiv citation graph
Citations among physics papers
1992–2003
One graph per year
[Figure: diameter vs. time in years]
Diameter – “Autonomous Systems”
Graph of the Internet
One graph per day
1997–2000
[Figure: diameter vs. number of nodes]
Diameter – “Affiliation Network”
Graph of collaborations in physics – authors linked to papers
10 years of data
[Figure: diameter vs. time in years]
Diameter – “Patents”
Patent citation network
25 years of data
[Figure: diameter vs. time in years]
Outline
Basic Definition
Small World Phenomenon
Diameter over Time
  Observation
  Model
Conclusion
Models
Existing graph generation models do not capture the Densification Power Law or shrinking diameters
Can we find a simple model of local behavior, which naturally leads to observed phenomena?
“Forest Fire” model
How do people make friends in a new environment?
1. First find a person and make friends
2. Follow one of his friends
3. Continue recursively
4. From time to time get introduced to a new person
Forest Fire model imitates exactly this process
“Forest Fire” – the Model
A node arrives
Randomly chooses an “ambassador”
Starts burning nodes (with probability p) and adds links to burned nodes
“Fire” spreads recursively
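The steps above can be sketched as a small simulation. This is a simplified reading of the slide, not the full model from the Leskovec et al. paper: it assumes a directed graph, burns each not-yet-burned out-neighbor independently with probability p, and links the new node to every burned node. The function name, parameters, and seeding are illustrative assumptions.

```python
import random

def forest_fire(n, p, seed=0):
    """Grow a directed graph on nodes 0..n-1; returns {node: set of nodes it links to}."""
    rng = random.Random(seed)
    out_links = {0: set()}                 # the first node has nobody to link to
    for v in range(1, n):                  # a new node v arrives
        ambassador = rng.randrange(v)      # uniformly random existing node
        burned = {ambassador}
        frontier = [ambassador]
        while frontier:                    # the fire spreads recursively
            w = frontier.pop()
            for u in out_links[w]:
                if u not in burned and rng.random() < p:
                    burned.add(u)
                    frontier.append(u)
        out_links[v] = burned              # v links to every burned node
    return out_links

graph = forest_fire(200, 0.4, seed=1)
num_edges = sum(len(targets) for targets in graph.values())
```

With p = 0 every new node links only to its ambassador, so the graph has exactly n - 1 edges; larger p lets the fire reach more nodes and produces more edges per arriving node.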
Forest Fire in Action
Forest Fire generates graphs that Densify and have Shrinking Diameter
Outline
Basic Definition
Small World Phenomenon
Diameter over Time
Conclusion
Conclusion
Definitions: make sure you know them
Small World Phenomenon: six degrees of separation
Diameter over time: shrinking diameter
Questions?