advanced data mining - datalab.snu.ac.krukang/courses/20f-adm/l2-graphbasic.pdf · protein...

Post on 25-Sep-2020


U Kang 1

Advanced Data Mining

Graph Basics and Diameter

U Kang, Seoul National University

In This Lecture

Basic definitions in graph mining
Small World Phenomenon
Diameter over time

Outline

Basic Definition
Small World Phenomenon
Diameter over Time
Conclusion

Basic Definition

Graph: a way of specifying relationships among a collection of items.
  Nodes (or vertices): the items
  Edges: the relationships between pairs of nodes
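One common way to hold such a graph in memory is an adjacency list that maps each node to the set of its neighbors. The sketch below is only an illustration: the function name `build_adjacency` and the toy edge list are assumptions, not part of the lecture.

```python
from collections import defaultdict

def build_adjacency(edges, directed=False):
    """Build an adjacency-list view of a graph from (u, v) pairs."""
    adj = defaultdict(set)
    for u, v in edges:
        adj[u].add(v)
        adj[v]  # touch v so that it appears as a key even with no out-edges
        if not directed:
            adj[v].add(u)
    return dict(adj)

# Hypothetical toy graph: four items and the relationships among them.
edges = [("A", "B"), ("A", "C"), ("B", "C"), ("C", "D")]
graph = build_adjacency(edges)
print(sorted(graph["C"]))  # ['A', 'B', 'D']
```

Passing `directed=True` keeps only the u-to-v direction of each pair, which corresponds to the directed graphs introduced on the following slides.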

Basic Definition

Types of graph
  Directed vs. undirected graph
[Figure: examples of a directed graph and an undirected graph]

Basic Definition

Types of graph
  Weighted vs. unweighted graph
[Figure: examples of a weighted graph (edge weights 1.2, 0.1, 2.5) and an unweighted graph]

Basic Definition

Types of graph
  Simple vs. attributed graph
[Figure: examples of a simple graph and an attributed graph with node labels CEO, Research Manager, Assistant, Marketing Manager]

Examples of graph

ARPANET, December 1970 1)
Social network 2)
Facebook, Twitter

1) UCLA and BBN, ARPANET in December 1970, 1970, https://commons.wikimedia.org/wiki/File:ARPANET_1970_Map.png
2) MRFerocius, Social Network, 2011, https://stackoverflow.com/questions/4594962/social-network-directed-graph-library-for-net

More Examples

Protein Interactions 1)
World Wide Web
Document Network 2)
Patent
DBLP

1) Wikipedia, Schizophrenia PPI, 2016, https://en.wikipedia.org/wiki/Protein%E2%80%93protein_interaction
2) Wikipedia, Logo of the English Wikipedia, 2001, https://en.wikipedia.org/wiki/English_Wikipedia

Even more examples

Call graph (who calls whom)
Email graph (who emailed whom)
Movie-actor database from IMDB
(more examples?)

More definitions

Two vertices are adjacent if they share a common edge
  Two adjacent vertices are neighbors
An edge is incident with another edge if they share a vertex
  An edge is incident with its two end vertices
The degree of a node is the number of its neighbors
  Directed graph: in-degree (number of incoming edges), out-degree (number of outgoing edges)
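The degree definitions above can be checked on a small example. This is a minimal sketch that assumes a simple graph (no self-loops, no duplicate edges) given as a list of (u, v) pairs; the function names and the toy data are illustrative only.

```python
from collections import Counter

def degrees(edges):
    """Degree of each node in an undirected simple graph."""
    deg = Counter()
    for u, v in edges:
        deg[u] += 1  # v is a neighbor of u
        deg[v] += 1  # u is a neighbor of v
    return deg

def in_out_degrees(edges):
    """In-degree and out-degree of each node when (u, v) means u points to v."""
    in_deg, out_deg = Counter(), Counter()
    for u, v in edges:
        out_deg[u] += 1
        in_deg[v] += 1
    return in_deg, out_deg

# Hypothetical toy edge list.
edges = [("A", "B"), ("A", "C"), ("B", "C"), ("C", "D")]
deg = degrees(edges)
in_deg, out_deg = in_out_degrees(edges)
print(deg["C"], in_deg["C"], out_deg["C"])  # 3 2 1
```

In the directed reading, the in-degree and out-degree of a node add up to its undirected degree, as the values for C show.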

Connection

Connected graph

Disconnected graph


Connected Component

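[Figure: a graph on nodes A, B, C, D, E, F, G, H, I, K, L, M split into connected components; the largest is labeled “Giant Connected Component” and two smaller ones are each labeled “Disconnected Component”]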
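Connected components can be found by repeatedly exploring outward from a node that has not been reached yet. The sketch below assumes an undirected graph stored as a dict of neighbor sets; the toy graph is made up for illustration and is not the one in the slide's figure.

```python
from collections import deque

def connected_components(adj):
    """Return the connected components of an undirected graph, largest first."""
    seen = set()
    components = []
    for start in adj:
        if start in seen:
            continue
        seen.add(start)
        component = {start}
        queue = deque([start])
        while queue:
            node = queue.popleft()
            for nb in adj[node]:
                if nb not in seen:
                    seen.add(nb)
                    component.add(nb)
                    queue.append(nb)
        components.append(component)
    return sorted(components, key=len, reverse=True)

# Hypothetical toy graph with three components.
adj = {
    "A": {"B"}, "B": {"A", "C"}, "C": {"B"},
    "D": {"E"}, "E": {"D"},
    "F": set(),
}
components = connected_components(adj)
print(len(components), sorted(components[0]))  # 3 ['A', 'B', 'C']
```

The first entry of the returned list plays the role of the giant connected component when one component is much larger than the others.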

Special Families of Graph

Star graph
  Equation for |E| (=m) as a function of |V| (=n)?
Chain graph
  Equation for |E| (=m) as a function of |V| (=n)?
Complete graph = full graph = clique graph
  Equation for |E| (=m) as a function of |V| (=n)?
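One way to check a candidate answer to these questions is to build each family for a few values of n and count the edges. The constructions below are a sketch with hypothetical helper names; they assume undirected simple graphs on nodes 0 to n-1.

```python
def star_edges(n):
    """Star on n nodes: node 0 is the hub and every other node links to it."""
    return [(0, i) for i in range(1, n)]

def chain_edges(n):
    """Chain on n nodes: node i links to node i + 1."""
    return [(i, i + 1) for i in range(n - 1)]

def complete_edges(n):
    """Complete graph on n nodes: every unordered pair of nodes is an edge."""
    return [(i, j) for i in range(n) for j in range(i + 1, n)]

for n in (2, 5, 10):
    assert len(star_edges(n)) == n - 1                   # star: m = n - 1
    assert len(chain_edges(n)) == n - 1                  # chain: m = n - 1
    assert len(complete_edges(n)) == n * (n - 1) // 2    # complete: m = n(n-1)/2
```

The counts agree with m = n - 1 for the star and the chain, and m = n(n-1)/2 for the complete graph.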

Special Families of Graph

Bipartite graph

Complete bipartite graph


Path, Cycle, Walk

Path: a sequence of vertices in which each consecutive pair is connected by an edge
Cycle: a path whose start and end vertices are the same
Simple path: no repeated vertices
Simple cycle: no repeated vertices, except the start vertex (= end vertex)

Some authors use ‘path’ for simple path (no repetition), and ‘walk’ for path (repetition allowed)
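The four definitions can be written as small predicates over a vertex sequence. This is a sketch under stated assumptions: the graph is undirected and stored as a dict of neighbor sets, the function names are illustrative, and the extra length check in `is_simple_cycle` (which rules out going back and forth over one edge) is an added convention for undirected graphs rather than something stated on the slide.

```python
def is_path(adj, seq):
    """True if each consecutive pair of vertices in seq is joined by an edge."""
    return len(seq) >= 1 and all(v in adj.get(u, ()) for u, v in zip(seq, seq[1:]))

def is_cycle(adj, seq):
    """True if seq is a path that starts and ends at the same vertex."""
    return len(seq) >= 2 and seq[0] == seq[-1] and is_path(adj, seq)

def is_simple_path(adj, seq):
    """True if seq is a path with no repeated vertices."""
    return is_path(adj, seq) and len(set(seq)) == len(seq)

def is_simple_cycle(adj, seq):
    """True if seq is a cycle whose only repeated vertex is the start/end vertex."""
    # Assumption: require at least three distinct vertices so A-B-A does not count.
    return (is_cycle(adj, seq) and len(seq) >= 4
            and len(set(seq[:-1])) == len(seq) - 1)

# Hypothetical toy graph: a triangle A-B-C with D attached to C.
adj = {"A": {"B", "C"}, "B": {"A", "C"}, "C": {"A", "B", "D"}, "D": {"C"}}
print(is_simple_path(adj, ["A", "B", "C", "D"]))   # True
print(is_simple_cycle(adj, ["A", "B", "C", "A"]))  # True
print(is_simple_path(adj, ["A", "B", "C", "A"]))   # False
```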

Subgraph

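Subgraph: a subset of the vertices and edges of the graph that itself forms a graph
Induced subgraph: the subgraph formed by a set of vertices together with every edge of the graph whose endpoints are both in that set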
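An induced subgraph is straightforward to compute from an adjacency list: keep the chosen vertices and drop every neighbor outside the chosen set. The function name and the toy graph below are assumptions for illustration.

```python
def induced_subgraph(adj, vertices):
    """Subgraph induced by `vertices`: keep only edges with both endpoints kept."""
    keep = set(vertices)
    return {u: adj[u] & keep for u in adj if u in keep}

# Hypothetical toy graph: a triangle A-B-C with D attached to C.
adj = {"A": {"B", "C"}, "B": {"A", "C"}, "C": {"A", "B", "D"}, "D": {"C"}}
sub = induced_subgraph(adj, ["A", "B", "D"])
print(sub == {"A": {"B"}, "B": {"A"}, "D": set()})  # True
```

Here D stays in the result as an isolated node, because its only neighbor C was not selected.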

Outline

Basic Definition
Small World Phenomenon
Diameter over Time
Conclusion

Distance in graph

Length of a path: number of steps (edges) from beginning to end
Distance between two nodes: length of a shortest path between them

Distance 1: “friends”


Distance 2: “friends of friends”


Distance 3: “friends of friends of friends”


Algorithm: Breadth First Search

Find one-step neighbors.
Find two-step neighbors.
Continue…
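The level-by-level search can be sketched with a queue: nodes at distance 1 are visited before nodes at distance 2, and so on. The graph below is a made-up friendship network, and `bfs_distances` is an illustrative name.

```python
from collections import deque

def bfs_distances(adj, source):
    """Shortest-path distance (in edges) from source to every reachable node."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for nb in adj[node]:
            if nb not in dist:           # first visit gives the shortest distance
                dist[nb] = dist[node] + 1
                queue.append(nb)
    return dist

# Hypothetical friendship graph centered on "You".
adj = {
    "You": {"Ann", "Bob"},
    "Ann": {"You", "Cat"},
    "Bob": {"You"},
    "Cat": {"Ann", "Dan"},
    "Dan": {"Cat"},
}
dist = bfs_distances(adj, "You")
print(dist["Ann"], dist["Cat"], dist["Dan"])  # 1 2 3
```

In this toy network Ann is a friend, Cat is a friend of a friend, and Dan is a friend of a friend of a friend.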

Radius and Diameter

Radius of a node: the largest of its shortest-path distances to all other nodes
Diameter of a graph: the maximum radius over all nodes
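With these definitions, both quantities follow from one breadth-first search per node. The sketch below assumes a connected, undirected, unweighted graph stored as a dict of neighbor sets; the function names are illustrative. Note that many textbooks call the per-node quantity the eccentricity, while this lecture calls it the radius of a node.

```python
from collections import deque

def node_radius(adj, source):
    """Largest shortest-path distance from source to any node it can reach."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for nb in adj[node]:
            if nb not in dist:
                dist[nb] = dist[node] + 1
                queue.append(nb)
    return max(dist.values())

def diameter(adj):
    """Maximum node radius over all nodes (assumes the graph is connected)."""
    return max(node_radius(adj, v) for v in adj)

# Hypothetical chain graph A - B - C - D.
adj = {"A": {"B"}, "B": {"A", "C"}, "C": {"B", "D"}, "D": {"C"}}
print(node_radius(adj, "A"), node_radius(adj, "B"), diameter(adj))  # 3 2 3
```

Running one search per node costs roughly O(n·m) time, which is one reason diameters of very large graphs are usually estimated rather than computed exactly.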

Small World Phenomenon

Milgram’s experiment

Six degrees of separation

The median value is “6”
[Figure: number of chains versus number of intermediaries, N = 64]

- Jeffrey Travers and Stanley Milgram, An Experimental Study of the Small World Problem, Sociometry, Vol. 32, No. 4, 1969, pp. 432


Small World Phenomenon

Erdos Number
  Paul Erdos: published ~1500 papers
  Erdos number: the distance to Paul Erdos in the co-authorship graph

1) Billy and Grace Tao, Paul Erdos with Terence Tao, 1985, https://commons.wikimedia.org/wiki/File:Paul_Erdos_with_Terence_Tao.jpg
2) H2g2bob, Erdosnumber, 2006, https://commons.wikimedia.org/wiki/File:Erdosnumber.png

Small World Phenomenon

Erdos Number
  Albert Einstein: 2
  Enrico Fermi: 3
  Noam Chomsky: 4
  Most mathematicians have Erdos numbers of at most 4 or 5

Small World Phenomenon

Kevin Bacon number
  Distance to Kevin Bacon in the actor-actor graph from the Internet Movie Database (IMDB) as of 1994
  Average Bacon number of actors: 2.9

- GabboT, Kevin Bacon TIFF 2015, 2015, https://commons.wikimedia.org/wiki/File:Kevin_Bacon_TIFF_2015.jpg

Outline

Basic Definition
Small World Phenomenon
Diameter over Time
  Observation
  Model
Conclusion

Materials based on Jure Leskovec’s slides. J. Leskovec, J. Kleinberg, C. Faloutsos. “Graphs over Time: Densification Laws, Shrinking Diameters and Possible Explanations”, ACM SIGKDD, 2005

Graph Evolution

How do the graphs evolve over time?

Application
  Network simulation
  Network prediction

Graph Evolution

How do the graphs evolve over time?

Conventional Wisdom
  Constant average degree: the number of edges grows linearly with the number of nodes
  Slowly growing diameter
New findings
  Densification power law: networks are becoming denser over time
  Shrinking diameter

Temporal Evolution of Graphs

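Densification Power Law
  Networks are becoming denser over time
  Average degree is increasing
  $E(t) \propto N(t)^a$, or equivalently $\log E(t) = a \log N(t) + \text{const}$, where $N(t)$ and $E(t)$ are the numbers of nodes and edges at time $t$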
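The link between the power law and the growing average degree can be made explicit with a short derivation. It is a sketch that writes the law with an assumed proportionality constant $c$ and uses $E(t)/N(t)$ as the average out-degree of a directed graph (for an undirected graph the average degree is $2E(t)/N(t)$, which grows in the same way).

```latex
E(t) = c \, N(t)^{a}
\quad\Longrightarrow\quad
\frac{E(t)}{N(t)} = c \, N(t)^{a-1}
```

For $a = 1$ the right-hand side is constant, which is the constant-degree case. For any $a > 1$ it grows with $N(t)$. As an illustrative value, with $a = 1.5$ a doubling of the number of nodes multiplies the number of edges by $2^{1.5} \approx 2.83$, so the edges more than double.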

Graph Densification – A closer look

Densification Power Law: $E(t) \propto N(t)^a$
Densification exponent: 1 ≤ a ≤ 2
  a=1: linear growth – constant out-degree (assumed in the literature so far)
  a=2: quadratic growth – clique
What are the exponents a for real graphs?
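One simple way to estimate the exponent a from a sequence of graph snapshots is a least-squares line fit of log E(t) against log N(t); the slope of that line is the estimate. The sketch below uses only the standard library, and the snapshot numbers are synthetic values generated from E = 2·N^1.5, not data from the lecture or the paper.

```python
import math

def densification_exponent(nodes, edges):
    """Least-squares slope of log E(t) against log N(t)."""
    xs = [math.log(n) for n in nodes]
    ys = [math.log(e) for e in edges]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    return sxy / sxx

# Synthetic snapshots (made-up data) that follow E = 2 * N^1.5 exactly.
nodes = [100, 200, 400, 800, 1600]
edges = [2 * n ** 1.5 for n in nodes]
a_hat = densification_exponent(nodes, edges)
print(round(a_hat, 3))  # 1.5
```

On real snapshots the points do not lie exactly on a line, so the fitted slope is an estimate whose quality depends on how many snapshots are available.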

Densification – Physics Citations

Citations among physics papers

1992: 1,293 papers, 2,717 citations
2003: 29,555 papers, 352,807 citations
For each month M, create a graph of all citations up to month M
[Figure: E(t) versus N(t); fitted densification exponent 1.69]

Densification – Patent Citations

Citations among patents granted

1975: 334,000 nodes, 676,000 edges
1999: 2.9 million nodes, 16.5 million edges
Each year is a data point
[Figure: E(t) versus N(t); fitted densification exponent 1.66]

Densification – Autonomous Systems

Graph of Internet
1997: 3,000 nodes, 10,000 edges
2000: 6,000 nodes, 26,000 edges
One graph per day
[Figure: E(t) versus N(t); fitted densification exponent 1.18]

Densification – Affiliation Network

Authors linked to their publications

1992: 318 nodes, 272 edges
2002: 60,000 nodes (20,000 authors, 38,000 papers), 133,000 edges
[Figure: E(t) versus N(t); fitted densification exponent 1.15]

Graph Densification – Summary

The traditional constant out-degree assumption does not hold
Real-world graphs: the average degree is increasing

Evolution of the Diameter

Prior work on power-law graphs hints at a slowly growing diameter:
  diameter ~ O(log N)
  diameter ~ O(log log N)
What is happening in real data?
Diameter shrinks over time
  As the network grows, the distances between nodes slowly decrease

Diameter – ArXiv citation graph

Citations among physics papers

1992–2003
One graph per year
[Figure: diameter versus time in years]

Diameter – “Autonomous Systems”

Graph of Internet
One graph per day
1997–2000
[Figure: diameter versus number of nodes]

Diameter – “Affiliation Network”

Graph of collaborations in physics – authors linked to papers

10 years of data

[Figure: diameter versus time in years]

Diameter – “Patents”

Patent citation network

25 years of data

[Figure: diameter versus time in years]

Outline

Basic Definition
Small World Phenomenon
Diameter over Time
  Observation
  Model
Conclusion

Models

Existing graph generation models do not capture the Densification Power Law and shrinking diameters

Can we find a simple model of local behavior that naturally leads to the observed phenomena?

“Forest Fire” model

How do people make friends in a new environment?

1. First find a person and make friends
2. Follow one of their friends
3. Continue recursively
4. From time to time get introduced to a new person

Forest Fire model imitates exactly this process

“Forest Fire” – the Model

A node arrives
Randomly chooses an “ambassador”
Starts burning nodes (with probability p) and adds links to burned nodes
“Fire” spreads recursively
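The four steps above can be turned into a small simulation. The code below is a deliberately simplified sketch, not the published model: it assumes that the fire spreads only along the out-links of burning nodes, that each such link burns independently with probability p, and that the new node links to every burned node including the ambassador. The published Forest Fire model has further details that this sketch does not try to reproduce. The function name and parameters are illustrative.

```python
import random

def forest_fire(num_nodes, p, seed=0):
    """Grow a directed graph node by node with a simplified forest-fire rule.

    Returns a dict mapping each node to the set of nodes it links to.
    """
    rng = random.Random(seed)
    out_links = {0: set()}                   # the first node has nobody to link to
    for new in range(1, num_nodes):
        ambassador = rng.randrange(new)      # uniform choice among existing nodes
        burned = {ambassador}
        frontier = [ambassador]
        while frontier:                      # the fire spreads recursively
            node = frontier.pop()
            for nb in out_links[node]:
                if nb not in burned and rng.random() < p:
                    burned.add(nb)
                    frontier.append(nb)
        out_links[new] = burned              # link to every burned node
    return out_links

graph = forest_fire(200, p=0.4, seed=1)
num_edges = sum(len(targets) for targets in graph.values())
print(len(graph), num_edges)
```

With p = 0 every new node links only to its ambassador, so the graph has exactly n - 1 edges. Larger values of p let the fire reach more nodes, which is the mechanism the lecture credits for densification.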

Forest Fire in Action

Forest Fire generates graphs that densify and have shrinking diameter

Outline

Basic Definition
Small World Phenomenon
Diameter over Time
Conclusion

Conclusion

Definitions: make sure you know them
Small World Phenomenon
  Six degrees of separation
Diameter over time
  Shrinking diameter

Questions?
