Statistical Analysis of Network Data with R
Network Cohesion & Graph Partitioning
Kim Seonghyeon
April 14, 2017
Kim Seonghyeon Statistical Analysis of Network Data with R April 14, 2017 1 / 27
Network Cohesion
subgraph & censuses
clique
clique: a complete subgraph (every pair of its vertices is joined by an edge)
maximal clique: a clique that is not a subset of a larger clique
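A minimal plain-Python sketch of the two clique censuses, assuming an undirected graph stored as a dict that maps each vertex to the set of its neighbors (in R, igraph's `cliques()` and `max_cliques()` cover this). `maximal_cliques` uses the Bron-Kerbosch algorithm with pivoting, a standard choice that is not taken from the slides; the function names and the toy graph in the usage note are illustrative assumptions.

```python
from collections import Counter


def maximal_cliques(adj):
    """Enumerate maximal cliques with the Bron-Kerbosch algorithm (with pivoting).

    adj maps each vertex to the set of its neighbors (undirected, no self-loops).
    """
    cliques = []

    def expand(r, p, x):
        if not p and not x:
            cliques.append(frozenset(r))  # r cannot be extended, so it is maximal
            return
        # Pivot on the vertex with the most neighbors in p to prune branches.
        pivot = max(p | x, key=lambda u: len(adj[u] & p))
        for v in list(p - adj[pivot]):
            expand(r | {v}, p & adj[v], x & adj[v])
            p = p - {v}
            x = x | {v}

    expand(set(), set(adj), set())
    return cliques


def clique_size_counts(adj):
    """Count all cliques (maximal or not) by size.

    Each clique is grown only with neighbors larger than its last vertex,
    so every clique is generated exactly once (vertex labels must be comparable).
    """
    counts = Counter()

    def grow(clique, candidates):
        counts[len(clique)] += 1
        for v in sorted(candidates):
            grow(clique + [v], {w for w in candidates & adj[v] if w > v})

    for v in sorted(adj):
        grow([v], {w for w in adj[v] if w > v})
    return counts
```

On a four-vertex "diamond" (a 4-clique with one edge removed), `maximal_cliques` returns the two triangles, while `clique_size_counts` reports 4 cliques of size 1, 5 of size 2 and 2 of size 3.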
subgraph & censuses
Figure 1: karate (karate club network; vertices labeled H, 2-33 and A)
subgraph & censuses
Table 1: number of cliques by size
size    1   2   3   4   5
count  34  78  45  11   2
Table 2: number of maximal cliques by size
size    2   3   4   5
count  11  21   2   2
subgraph & censuses
core & coreness
k-core: a weakened notion of a clique. A k-core of G is a subgraph H such that
(i) all vertex degrees in H are at least k, and
(ii) no other subgraph obeying the same condition contains it (i.e., it is maximal in this property).
coreness: $\mathrm{coreness}(v) = \max\{k : H \text{ is a } k\text{-core},\ v \in V_H\}$
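Coreness can be computed by repeatedly deleting a vertex of minimum remaining degree: the largest minimum degree seen up to the moment a vertex is deleted is its coreness. The sketch below is a simple quadratic-time version of that peeling idea, assuming the same adjacency-dict representation; `coreness` here is a hypothetical helper written for illustration, not igraph's function of the same name.

```python
def coreness(adj):
    """Coreness of every vertex by repeatedly peeling a minimum-degree vertex.

    adj maps each vertex to the set of its neighbors (undirected graph).
    """
    degree = {v: len(nbrs) for v, nbrs in adj.items()}
    remaining = set(adj)
    core = {}
    k = 0
    while remaining:
        v = min(remaining, key=degree.__getitem__)
        # The current level k never decreases as vertices are peeled off.
        k = max(k, degree[v])
        core[v] = k
        remaining.remove(v)
        for w in adj[v]:
            if w in remaining:
                degree[w] -= 1  # w lost one neighbor in the remaining subgraph
    return core
```

For a triangle with a pendant vertex attached, the pendant vertex gets coreness 1 and the three triangle vertices get coreness 2.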
subgraph & censuses
Figure 2: visualization with coreness
subgraph & censuses
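Censuses (directed graph): each unordered pair of vertices is classified as
mutual: $C_{mut} = \{\{u, v\} \subset V_G : (u, v) \in E_G \text{ and } (v, u) \in E_G\}$
asymmetric: $C_{asym} = \{\{u, v\} \subset V_G : (u, v) \in E_G \text{ or } (v, u) \in E_G\} \setminus C_{mut}$
null: $C_{null} = \{\{u, v\} : \{u, v\} \subset V_G\} \setminus (C_{mut} \cup C_{asym})$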
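The dyad census needs only the arc set: a pair is mutual if both arcs are present, asymmetric if exactly one is, and null otherwise, so the null count is the total number of pairs minus the other two. A sketch assuming vertices `0..n-1`, arcs given as `(u, v)` tuples, and self-loops ignored; `dyad_census` is a hypothetical name here (R's igraph has a `dyad_census()` function reporting `mut`, `asym` and `null`).

```python
def dyad_census(n_vertices, arcs):
    """Count mutual, asymmetric and null unordered vertex pairs of a directed graph."""
    arc_set = {(u, v) for u, v in arcs if u != v}  # drop self-loops and duplicates
    mutual = asym = 0
    seen = set()
    for u, v in arc_set:
        pair = frozenset((u, v))
        if pair in seen:  # the reverse arc of a mutual pair was already counted
            continue
        seen.add(pair)
        if (v, u) in arc_set:
            mutual += 1
        else:
            asym += 1
    null = n_vertices * (n_vertices - 1) // 2 - mutual - asym
    return {"mut": mutual, "asym": asym, "null": null}
```

For 4 vertices with arcs 0→1, 1→0, 1→2 and 2→3, the census is 1 mutual, 2 asymmetric and 3 null pairs.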
subgraph & censuses
## aidsblog
## $v
## [1] 146
##
## $e
## [1] 183
##
## $mut
## [1] 3
##
## $asym
## [1] 177
##
## $null
## [1] 10405
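The aidsblog output can be checked by hand, assuming the graph has no self-loops or multiple edges: the three census classes partition all unordered vertex pairs, and each mutual pair accounts for two edges while each asymmetric pair accounts for one.

```latex
\binom{146}{2} = \frac{146 \cdot 145}{2} = 10585 = 3 + 177 + 10405,
\qquad
|E_G| = 2|C_{mut}| + |C_{asym}| = 2 \cdot 3 + 177 = 183.
```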
Density and Related Notions of Relative Frequency
Density
Density of a subgraph H:
$\mathrm{den}(H) = \dfrac{|E_H|}{|V_H|(|V_H| - 1)/2}$
*If G is a directed graph, the denominator is replaced by $|V_H|(|V_H| - 1)$.
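Density is a single ratio of observed to possible edges. A sketch with a hypothetical `density` helper that takes vertex and edge counts directly, assuming a simple graph (no self-loops or multiple edges):

```python
def density(n_vertices, n_edges, directed=False):
    """Edge density: observed edges divided by the number of possible edges."""
    if n_vertices < 2:
        raise ValueError("density needs at least two vertices")
    possible = n_vertices * (n_vertices - 1)  # ordered pairs (directed case)
    if not directed:
        possible //= 2  # unordered pairs (undirected case)
    return n_edges / possible
```

With the karate counts from Table 1 (34 vertices, 78 edges) this gives 78/561 ≈ 0.139; with the aidsblog counts (146 vertices, 183 directed edges) it gives 183/21170 ≈ 0.0086.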
Density and Related Notions of Relative Frequency
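clustering coefficient
global clustering coefficient:
$\mathrm{cl}_T(G) = \dfrac{3\tau_\triangle(G)}{\tau_3(G)}$
where $\tau_\triangle(G)$ is the number of triangles in G and $\tau_3(G)$ is the number of connected triples (pairs of edges sharing a vertex).
local clustering coefficient:
$\mathrm{cl}(v) = \dfrac{\tau_\triangle(v)}{\tau_3(v)}$
where $\tau_\triangle(v)$ is the number of triangles containing v and $\tau_3(v) = \binom{d_v}{2}$ is the number of connected triples centered at v.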
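Both coefficients follow from per-vertex counts: $\tau_\triangle(v)$ is the number of neighbor pairs of v that are themselves adjacent, and $\tau_3(v) = d_v(d_v - 1)/2$. Summing $\tau_\triangle(v)$ over all vertices counts each triangle three times, which is exactly the factor 3 in $\mathrm{cl}_T(G)$. A sketch assuming an adjacency dict; the helper names are hypothetical, and returning NaN for vertices of degree below 2 is a convention chosen here.

```python
from itertools import combinations


def triangles_and_triples(adj, v):
    """Return (triangles containing v, connected triples centered at v)."""
    nbrs = adj[v]
    triangles = sum(1 for a, b in combinations(nbrs, 2) if b in adj[a])
    d = len(nbrs)
    return triangles, d * (d - 1) // 2


def local_clustering(adj, v):
    tri, triples = triangles_and_triples(adj, v)
    return tri / triples if triples else float("nan")  # undefined when degree < 2


def global_clustering(adj):
    tri_sum = triple_sum = 0
    for v in adj:
        tri, triples = triangles_and_triples(adj, v)
        tri_sum += tri  # each triangle is counted once at each of its 3 vertices
        triple_sum += triples
    return tri_sum / triple_sum  # = 3 * tau_triangle(G) / tau_3(G)
```

For a hypothetical triangle a-b-c with a pendant vertex d attached to c, this gives $\mathrm{cl}_T = 3/5 = 0.6$, $\mathrm{cl}(a) = 1$ and $\mathrm{cl}(c) = 1/3$.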
Density and Related Notions of Relative Frequency
Figure 3: transitivity (example graph on vertices a, b, c, d)
Density and Related Notions of Relative Frequency
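reciprocity (directed graph)
type 1 (fraction of connected vertex pairs that are mutual):
$\mathrm{rec}_1(G) = \dfrac{|C_{mut}|}{|C_{mut} \cup C_{asym}|}$
type 2 (fraction of edges that are reciprocated):
$\mathrm{rec}_2(G) = \dfrac{2|C_{mut}|}{|E_G|}$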
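Both reciprocity measures come from counting arcs whose reverse arc is also present. A sketch with a hypothetical `reciprocity` helper, assuming arcs as `(u, v)` tuples, no self-loops, and at least one arc:

```python
def reciprocity(arcs):
    """Return (type 1, type 2) reciprocity of a directed graph given as (u, v) arcs."""
    arc_set = {(u, v) for u, v in arcs if u != v}
    # Each mutual pair contributes two reciprocated arcs, hence the // 2.
    mutual = sum(1 for u, v in arc_set if (v, u) in arc_set) // 2  # |C_mut|
    connected_pairs = len({frozenset(arc) for arc in arc_set})      # |C_mut ∪ C_asym|
    rec1 = mutual / connected_pairs
    rec2 = 2 * mutual / len(arc_set)
    return rec1, rec2
```

For the arcs a→b, b→a and b→c, this gives $\mathrm{rec}_1 = 1/2$ and $\mathrm{rec}_2 = 2/3$. Plugging the aidsblog census into the same formulas gives $\mathrm{rec}_1 = 3/180 \approx 0.0167$ and $\mathrm{rec}_2 = 6/183 \approx 0.0328$.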
Density and Related Notions of Relative Frequency
Figure 4: reciprocity (example graph on vertices a, b, c)
Connectivity, Cuts, and Flows
Connectivity
A graph G is said to be connected if every vertex is reachable from every other vertex.
A connected component of a graph is a maximally connected subgraph.
diameter: the length of the longest geodesic, i.e., the largest shortest-path distance between any pair of vertices.
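Breadth-first search gives both notions at once: the vertices reached from a start vertex form its connected component, and the BFS distances are geodesic lengths, whose maximum over all pairs is the diameter. A sketch assuming an adjacency dict with hypothetical helper names; reporting an infinite diameter for a disconnected graph is a convention chosen here (igraph's `diameter()` instead reports the largest diameter among the components by default).

```python
from collections import deque


def bfs_distances(adj, source):
    """Geodesic (shortest-path) distances from source by breadth-first search."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for w in adj[u]:
            if w not in dist:
                dist[w] = dist[u] + 1
                queue.append(w)
    return dist


def connected_components(adj):
    """List of vertex sets, one per connected component."""
    seen, components = set(), []
    for v in adj:
        if v not in seen:
            component = set(bfs_distances(adj, v))
            seen |= component
            components.append(component)
    return components


def diameter(adj):
    """Longest geodesic distance; infinite if the graph is disconnected."""
    best = 0
    for v in adj:
        dist = bfs_distances(adj, v)
        if len(dist) < len(adj):
            return float("inf")
        best = max(best, max(dist.values()))
    return best
```

For a path 1-2-3-4 the diameter is 3; adding a separate edge 5-6 gives two components and an infinite diameter under this convention.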
Connectivity, Cuts, and Flows
k-vertex-connected
A graph G is called k-vertex-connected if
(i) the number of vertices $N_v > k$, and
(ii) the removal of any subset of vertices $X \subset V$ of cardinality $|X| < k$ leaves a subgraph that is connected.
connectivity: $\mathrm{connectivity}(G) = \max\{k : G \text{ is } k\text{-vertex-connected}\}$
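For small graphs the definition can be checked directly: the connectivity is the size of the smallest vertex set whose removal disconnects G, capped at $N_v - 1$ because condition (i) requires $N_v > k$. The brute-force sketch below tries removal sets in increasing size, so it is exponential and only illustrates the definition (R's igraph provides `vertex_connectivity()` for real use). Helper names are hypothetical.

```python
from itertools import combinations


def is_connected(adj, vertices):
    """True if the subgraph induced by `vertices` is connected (depth-first search)."""
    vertices = set(vertices)
    if not vertices:
        return True
    start = next(iter(vertices))
    seen, stack = {start}, [start]
    while stack:
        u = stack.pop()
        for w in adj[u]:
            if w in vertices and w not in seen:
                seen.add(w)
                stack.append(w)
    return seen == vertices


def vertex_connectivity(adj):
    """Brute-force vertex connectivity of a small undirected graph."""
    nodes = list(adj)
    n = len(nodes)
    # Try removal sets X of size 0, 1, ..., n-2; the first disconnecting size is the answer.
    for size in range(n - 1):
        for removed in combinations(nodes, size):
            if not is_connected(adj, set(nodes) - set(removed)):
                return size
    # No vertex set disconnects G (e.g. a complete graph): condition (i) caps k at n - 1.
    return n - 1
```

A 4-cycle has connectivity 2, the complete graph on 4 vertices has connectivity 3, and a path on 3 vertices has connectivity 1.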
Graph Partitioning
Graph Partition
Graph Partition
partition: $\mathcal{C} = \{C_1, \ldots, C_K\}$, a partition of the vertex set $V_G$
$E(C_k, C_{k'})$: the set of edges connecting vertices in $C_k$ to vertices in $C_{k'}$
We seek a partition $\mathcal{C}$ in which $E(C_k, C_{k'})$ for $k \neq k'$ is relatively small compared with $E(C_k, C_k)$.
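Given a block label for each vertex, the sizes $|E(C_k, C_{k'})|$ can be tabulated in one pass over the edge list. A sketch assuming an undirected edge list and block labels `0..K-1`; each edge is counted once in the set $E(C_k, C_{k'})$ it belongs to, and `block_edge_counts` is a hypothetical helper.

```python
def block_edge_counts(edges, membership, k):
    """K x K matrix whose (i, j) entry is |E(C_i, C_j)| for an undirected edge list.

    membership[v] is the block index (0..k-1) of vertex v. An edge inside C_i is
    counted once in entry (i, i); an edge between C_i and C_j (i != j) appears in
    both (i, j) and (j, i), since E(C_i, C_j) and E(C_j, C_i) are the same set.
    """
    counts = [[0] * k for _ in range(k)]
    for u, v in edges:
        i, j = membership[u], membership[v]
        counts[i][j] += 1
        if i != j:
            counts[j][i] += 1
    return counts
```

For two hypothetical triangles joined by a single edge, with each triangle as a block, the result is `[[3, 1], [1, 3]]`: three edges inside each block and one between them, which is the kind of partition sought.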
Hierarchical Clustering
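modularity
$e_{ij} = \dfrac{|E(C_i, C_j)|}{2|E|}, \qquad a_i = \sum_{j=1}^{K} e_{ij}$
where $|E(C_i, C_j)| = \sum_{u \in C_i} \sum_{v \in C_j} A_{uv}$, so an edge inside $C_i$ is counted twice and $\sum_{i,j} e_{ij} = 1$.
$\mathrm{mod}(\mathcal{C}) = \sum_{i=1}^{K} \left(e_{ii} - a_i^2\right)$
$\mathrm{mod}(\mathcal{C}) = 0$ if $e_{ij} = a_i a_j$ for all $i, j$ (edges fall between clusters as if at random).
$\mathrm{mod}(\mathcal{C})$ is large if $\sum_{i=1}^{K} e_{ii} = 1$ (every edge lies within a cluster).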
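A direct transcription of the modularity formula, assuming an undirected, unweighted edge list and block labels `0..K-1`. Taking $|E(C_i, C_j)| = \sum_{u \in C_i, v \in C_j} A_{uv}$, every edge adds $1/(2|E|)$ to both $e_{ij}$ and $e_{ji}$, so an edge inside $C_i$ adds $1/|E|$ to $e_{ii}$. The helper name is hypothetical (R's igraph provides `modularity()`).

```python
def modularity(edges, membership, k):
    """mod(C) = sum_i (e_ii - a_i^2) for an undirected, unweighted edge list.

    membership[v] is the block index (0..k-1) of vertex v.
    """
    m = len(edges)
    e = [[0.0] * k for _ in range(k)]
    for u, v in edges:
        i, j = membership[u], membership[v]
        # A_uv = A_vu = 1: the edge contributes once in each direction.
        e[i][j] += 1 / (2 * m)
        e[j][i] += 1 / (2 * m)
    a = [sum(row) for row in e]  # a_i = sum_j e_ij
    return sum(e[i][i] - a[i] ** 2 for i in range(k))
```

For two triangles joined by one edge, split into the two triangles, this gives $2(6/14 - 1/4) = 5/14 \approx 0.357$; putting all vertices into a single block gives 0.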
Hierarchical Clustering
## fraction1
##        1    2    3 sum
## 1   0.04 0.04 0.12 0.2
## 2   0.04 0.04 0.12 0.2
## 3   0.12 0.12 0.36 0.6
## sum 0.20 0.20 0.60 1.0
## fraction2
##       1   2   3 sum
## 1   0.2 0.0 0.0 0.2
## 2   0.0 0.2 0.0 0.2
## 3   0.0 0.0 0.6 0.6
## sum 0.2 0.2 0.6 1.0
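Reading the row sums as $a_i$ ($a = (0.2, 0.2, 0.6)$ in both tables) and the entries as $e_{ij}$, the modularity of each table follows directly from $\mathrm{mod}(\mathcal{C}) = \sum_i (e_{ii} - a_i^2)$:

```latex
\begin{aligned}
\text{fraction1:}\quad & (0.04 - 0.2^2) + (0.04 - 0.2^2) + (0.36 - 0.6^2) = 0 \\
\text{fraction2:}\quad & (0.2 - 0.2^2) + (0.2 - 0.2^2) + (0.6 - 0.6^2) = 0.16 + 0.16 + 0.24 = 0.56
\end{aligned}
```

fraction1 is the $e_{ij} = a_i a_j$ case and scores 0. fraction2 places every edge inside a cluster and attains $1 - \sum_i a_i^2 = 0.56$, the largest value possible for these $a_i$.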
Hierarchical Clustering
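Hierarchical methods
agglomerative: begin with the partition $\{\{v_1\}, \ldots, \{v_{N_v}\}\}$ (each vertex in its own cluster) and successively merge clusters
divisive: begin with the partition $\{V\}$ (all vertices in one cluster) and successively split clusters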
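The slide does not fix a merge rule. One common agglomerative choice, assumed here, is greedy modularity merging: start from singletons, repeatedly merge the pair of adjacent clusters whose union gives the highest modularity, and keep the best partition seen along the way. This is the idea behind Newman's greedy method, which R's igraph implements efficiently as `cluster_fast_greedy()`; the naive sketch below recomputes modularity from scratch at every step and suits only small graphs. Helper names are hypothetical.

```python
from itertools import combinations


def partition_modularity(edges, clusters):
    """Modularity of a partition given as a list of vertex sets (undirected, unweighted)."""
    m = len(edges)
    label = {v: k for k, cluster in enumerate(clusters) for v in cluster}
    degree_sum = [0] * len(clusters)
    internal = [0] * len(clusters)
    for u, v in edges:
        degree_sum[label[u]] += 1
        degree_sum[label[v]] += 1
        if label[u] == label[v]:
            internal[label[u]] += 1
    # e_kk = internal_k / m and a_k = degree_sum_k / (2m)
    return sum(internal[k] / m - (degree_sum[k] / (2 * m)) ** 2 for k in range(len(clusters)))


def greedy_agglomerative(edges):
    """Greedy modularity merging; returns (best partition seen, its modularity)."""
    nodes = sorted({v for edge in edges for v in edge})
    clusters = [frozenset([v]) for v in nodes]
    best, best_q = list(clusters), partition_modularity(edges, clusters)
    while len(clusters) > 1:
        candidates = []
        for i, j in combinations(range(len(clusters)), 2):
            joined = any(
                (u in clusters[i] and v in clusters[j]) or (u in clusters[j] and v in clusters[i])
                for u, v in edges
            )
            if joined:  # only merge clusters that share at least one edge
                merged = [c for k, c in enumerate(clusters) if k not in (i, j)]
                merged.append(clusters[i] | clusters[j])
                candidates.append((partition_modularity(edges, merged), merged))
        if not candidates:  # remaining clusters are disconnected from each other
            break
        q, clusters = max(candidates, key=lambda pair: pair[0])
        if q > best_q:
            best, best_q = clusters, q
    return best, best_q
```

On two triangles joined by one edge, the best partition found is the two triangles, with modularity 5/14; recording every merge would give the dendrogram that such figures display.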
Hierarchical Clustering
Figure 5: Agglomerative Clustering (karate network with vertices labeled H, 2-33 and A, and the corresponding actors Mr Hi, Actor 2 to Actor 33, and John A)
Spectral Partitioning
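graph Laplacian
graph Laplacian: $L = D - A$, where $A$ is the adjacency matrix and $D = \mathrm{diag}[(d_v)]$ is the diagonal matrix of vertex degrees.
$\lambda_1 \le \cdots \le \lambda_{N_v}$ are the eigenvalues of $L$.
The graph G consists of K connected components if and only if $\lambda_1(L) = \cdots = \lambda_K(L) = 0$ and $\lambda_{K+1}(L) > 0$.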
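The component criterion can be checked numerically on a small example. The slides work in R, where `eigen()` applied to igraph's `laplacian_matrix()` would do this; to stay dependency-free, the sketch below swaps in the classical cyclic Jacobi eigenvalue method for small symmetric matrices. The toy graph (a triangle, a single edge and an isolated vertex, i.e. three components) and the helper names are assumptions for illustration.

```python
import math


def laplacian(n, edges):
    """L = D - A for an undirected, unweighted graph on vertices 0..n-1."""
    lap = [[0.0] * n for _ in range(n)]
    for u, v in edges:
        lap[u][v] -= 1
        lap[v][u] -= 1
        lap[u][u] += 1
        lap[v][v] += 1
    return lap


def symmetric_eigenvalues(a, tol=1e-20, max_sweeps=100):
    """Eigenvalues (ascending) of a small symmetric matrix via cyclic Jacobi rotations."""
    n = len(a)
    a = [row[:] for row in a]
    for _ in range(max_sweeps):
        if sum(a[i][j] ** 2 for i in range(n) for j in range(n) if i != j) < tol:
            break
        for p in range(n - 1):
            for q in range(p + 1, n):
                if a[p][q] == 0.0:
                    continue
                # Choose the rotation that zeroes the (p, q) entry.
                theta = (a[q][q] - a[p][p]) / (2 * a[p][q])
                t = math.copysign(1.0, theta) / (abs(theta) + math.sqrt(theta * theta + 1))
                c = 1 / math.sqrt(t * t + 1)
                s = t * c
                for k in range(n):  # A <- A P (columns p, q)
                    akp, akq = a[k][p], a[k][q]
                    a[k][p], a[k][q] = c * akp - s * akq, s * akp + c * akq
                for k in range(n):  # A <- P^T A (rows p, q)
                    apk, aqk = a[p][k], a[q][k]
                    a[p][k], a[q][k] = c * apk - s * aqk, s * apk + c * aqk
    return sorted(a[i][i] for i in range(n))


# Triangle {0, 1, 2}, edge {3, 4}, isolated vertex 5: three connected components.
eigenvalues = symmetric_eigenvalues(laplacian(6, [(0, 1), (1, 2), (0, 2), (3, 4)]))
```

For this graph the eigenvalues are 0, 0, 0, 2, 3, 3 (up to rounding): exactly three zeros, one per component.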
Spectral Partitioning
Figure 6: example graph on vertices A-L with three connected components (A-E, F-I, J-L)
Spectral Partitioning
## [1] 0 0 0 1 2 2 2 2 3 3 4 5
##      A    B    C    D    E   F   G   H   I    J    K    L
## 1 0.00 0.00 0.00 0.00 0.00 0.5 0.5 0.5 0.5 0.00 0.00 0.00
## 2 0.00 0.00 0.00 0.00 0.00 0.0 0.0 0.0 0.0 0.58 0.58 0.58
## 3 0.45 0.45 0.45 0.45 0.45 0.0 0.0 0.0 0.0 0.00 0.00 0.00
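The three printed eigenvalues equal to 0 match the three components, and each printed eigenvector is constant on one block of vertices and zero elsewhere. Every row of $L$ sums to zero, so the indicator vector $\mathbf{1}_C$ of a component $C$ satisfies $L\mathbf{1}_C = 0$; scaled to unit length, its nonzero entries are $1/\sqrt{|C|}$, which reproduces the printed values (rounded to two decimals):

```latex
\frac{1}{\sqrt{5}} \approx 0.447 \;(\text{A-E}), \qquad
\frac{1}{\sqrt{4}} = 0.5 \;(\text{F-I}), \qquad
\frac{1}{\sqrt{3}} \approx 0.577 \;(\text{J-L})
```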
Spectral Partitioning
spectral bisection
If $\lambda_2(L)$ is close to zero, we might expect G to have a good candidate for bisection, i.e., a split into two pieces joined by few edges.
Partition the vertices by the sign of their entries in the corresponding eigenvector $x_2$ (the Fiedler vector):
$S = \{v \in V : x_2(v) \ge 0\}, \quad \bar{S} = \{v \in V : x_2(v) < 0\}$
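A sketch of spectral bisection on a hypothetical graph of two triangles joined by one edge. To run without a linear-algebra library it uses the cyclic Jacobi eigenvalue method, accumulating the rotations to obtain eigenvectors; in R one would use `eigen()` instead. The rule $S = \{v : x_2(v) \ge 0\}$ is applied as stated; because an eigenvector's sign is arbitrary, $S$ and $\bar{S}$ may come out swapped. Helper names are hypothetical.

```python
import math


def laplacian(n, edges):
    """L = D - A for an undirected, unweighted graph on vertices 0..n-1."""
    lap = [[0.0] * n for _ in range(n)]
    for u, v in edges:
        lap[u][v] -= 1
        lap[v][u] -= 1
        lap[u][u] += 1
        lap[v][v] += 1
    return lap


def jacobi_eigh(a, tol=1e-20, max_sweeps=100):
    """(eigenvalue, eigenvector) pairs of a small symmetric matrix, sorted by
    eigenvalue, computed with cyclic Jacobi rotations."""
    n = len(a)
    a = [row[:] for row in a]
    vecs = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    for _ in range(max_sweeps):
        if sum(a[i][j] ** 2 for i in range(n) for j in range(n) if i != j) < tol:
            break
        for p in range(n - 1):
            for q in range(p + 1, n):
                if a[p][q] == 0.0:
                    continue
                # Choose the rotation that zeroes the (p, q) entry.
                theta = (a[q][q] - a[p][p]) / (2 * a[p][q])
                t = math.copysign(1.0, theta) / (abs(theta) + math.sqrt(theta * theta + 1))
                c = 1 / math.sqrt(t * t + 1)
                s = t * c
                for k in range(n):  # A <- A P and V <- V P (columns p, q)
                    akp, akq = a[k][p], a[k][q]
                    a[k][p], a[k][q] = c * akp - s * akq, s * akp + c * akq
                    vkp, vkq = vecs[k][p], vecs[k][q]
                    vecs[k][p], vecs[k][q] = c * vkp - s * vkq, s * vkp + c * vkq
                for k in range(n):  # A <- P^T A (rows p, q)
                    apk, aqk = a[p][k], a[q][k]
                    a[p][k], a[q][k] = c * apk - s * aqk, s * apk + c * aqk
    pairs = [(a[i][i], [vecs[k][i] for k in range(n)]) for i in range(n)]
    return sorted(pairs, key=lambda pair: pair[0])


def spectral_bisection(n, edges):
    """Split vertices by the sign of the Fiedler vector x2 (eigenvector of lambda_2)."""
    lam2, x2 = jacobi_eigh(laplacian(n, edges))[1]
    s = {v for v in range(n) if x2[v] >= 0}
    s_bar = set(range(n)) - s
    return lam2, s, s_bar
```

Here $\lambda_2 = (5 - \sqrt{17})/2 \approx 0.438$, small relative to the other nonzero eigenvalues (3 and about 4.56), and the sign split recovers the two triangles.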
Spectral Partitioning
Figure 7: spectral partitioning (panel 1: Eigenvalues of Graph Laplacian against Index; panel 2: Fiedler Vector Entry against Actor Number)