![Page 1: SCAN: A Structural Clustering Algorithm for Networks](https://reader036.vdocuments.net/reader036/viewer/2022062408/568134ad550346895d9bc320/html5/thumbnails/1.jpg)
SCAN: A Structural Clustering Algorithm for Networks
Xiaowei Xu, Nurcan Yuruk, Zhidan Feng, and Thomas Schweiger
KDD’07
![Page 2: SCAN: A Structural Clustering Algorithm for Networks](https://reader036.vdocuments.net/reader036/viewer/2022062408/568134ad550346895d9bc320/html5/thumbnails/2.jpg)
An Introduction to DBSCAN
DBSCAN is a density-based algorithm.
– Density = number of points within a specified radius (Eps)
– A point is a core point if it has more than a specified number of points (MinPts) within Eps. These are the points in the interior of a cluster.
– A border point has fewer than MinPts points within Eps, but lies in the neighborhood of a core point
– A noise point is any point that is neither a core point nor a border point.
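The three point types can be illustrated with a minimal sketch. The function name `classify_points`, the brute-force distance scan, and the choice not to count a point as its own neighbor are assumptions made for illustration; the "more than MinPts" test follows the wording of the slide.

```python
import math


def classify_points(points, eps, min_pts):
    """Label each point as 'core', 'border', or 'noise' (illustrative sketch)."""

    def neighbors(i):
        # Indices of all other points within distance eps of point i.
        return [j for j, q in enumerate(points)
                if j != i and math.dist(points[i], q) <= eps]

    neigh = [neighbors(i) for i in range(len(points))]
    # Core point: more than min_pts points within eps (assumption: self excluded).
    core = {i for i, n in enumerate(neigh) if len(n) > min_pts}

    labels = []
    for i, n in enumerate(neigh):
        if i in core:
            labels.append("core")
        elif any(j in core for j in n):
            # Not dense enough itself, but inside a core point's neighborhood.
            labels.append("border")
        else:
            labels.append("noise")
    return labels
```

For example, `classify_points(pts, eps=1.2, min_pts=3)` returns one label per input point, in input order.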
![Page 3: SCAN: A Structural Clustering Algorithm for Networks](https://reader036.vdocuments.net/reader036/viewer/2022062408/568134ad550346895d9bc320/html5/thumbnails/3.jpg)
DBSCAN: Core, Border, and Noise Points
![Page 4: SCAN: A Structural Clustering Algorithm for Networks](https://reader036.vdocuments.net/reader036/viewer/2022062408/568134ad550346895d9bc320/html5/thumbnails/4.jpg)
DBSCAN Algorithm
– Eliminate noise points
– Perform clustering on the remaining points
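The two steps above can be sketched as follows. This is a simplified illustration, not a reference implementation: the name `dbscan`, the brute-force neighbor search, and the use of `None` as the noise label are assumptions, and the core test again uses the slide's "more than MinPts" wording.

```python
import math


def dbscan(points, eps, min_pts):
    """Return a cluster id per point, or None for noise (illustrative sketch)."""
    n = len(points)
    # Brute-force eps-neighborhoods (self excluded, by assumption).
    neigh = [[j for j in range(n)
              if j != i and math.dist(points[i], points[j]) <= eps]
             for i in range(n)]
    core = [len(nb) > min_pts for nb in neigh]

    labels = [None] * n  # points never reached from a core point stay noise
    cluster = 0
    for i in range(n):
        if not core[i] or labels[i] is not None:
            continue
        # Grow a new cluster from an unassigned core point.
        labels[i] = cluster
        stack = [i]
        while stack:
            u = stack.pop()
            for w in neigh[u]:
                if labels[w] is None:
                    labels[w] = cluster
                    if core[w]:
                        # Only core points keep expanding the cluster.
                        stack.append(w)
        cluster += 1
    return labels
```

Border points are attached to the first cluster that reaches them; noise points are simply the ones left unlabeled after all core points have been expanded.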
![Page 5: SCAN: A Structural Clustering Algorithm for Networks](https://reader036.vdocuments.net/reader036/viewer/2022062408/568134ad550346895d9bc320/html5/thumbnails/5.jpg)
DBSCAN: Core, Border and Noise Points
[Figure: original points (left) and point types core, border and noise (right), for Eps = 10, MinPts = 4]
![Page 6: SCAN: A Structural Clustering Algorithm for Networks](https://reader036.vdocuments.net/reader036/viewer/2022062408/568134ad550346895d9bc320/html5/thumbnails/6.jpg)
When DBSCAN Works Well
[Figure: original points (left) and the clusters found (right)]
• Resistant to Noise
• Can handle clusters of different shapes and sizes
![Page 7: SCAN: A Structural Clustering Algorithm for Networks](https://reader036.vdocuments.net/reader036/viewer/2022062408/568134ad550346895d9bc320/html5/thumbnails/7.jpg)
DBSCAN: Determining Eps and MinPts
The idea is that, for points in a cluster, the kth nearest neighbor lies at roughly the same distance.
For noise points, the kth nearest neighbor lies farther away.
So, plot the sorted distance of every point to its kth nearest neighbor.
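A minimal sketch of the quantity being plotted, assuming a small in-memory point set and a brute-force search (the name `sorted_k_distances` is hypothetical). A sharp rise near the end of the sorted list suggests a candidate value for Eps, with k playing the role of MinPts.

```python
import math


def sorted_k_distances(points, k):
    """Distance from each point to its k-th nearest neighbor, sorted ascending."""
    k_dists = []
    for i, p in enumerate(points):
        # Distances from p to every other point, nearest first.
        dists = sorted(math.dist(p, q) for j, q in enumerate(points) if j != i)
        k_dists.append(dists[k - 1])
    return sorted(k_dists)
```

The returned list can be passed directly to any plotting routine; points inside clusters form the flat part of the curve and noise points form the tail.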
![Page 8: SCAN: A Structural Clustering Algorithm for Networks](https://reader036.vdocuments.net/reader036/viewer/2022062408/568134ad550346895d9bc320/html5/thumbnails/8.jpg)
Network Clustering Problem
Networks made up of the mutual relationships of data elements usually have an underlying structure. Because relationships are complex, it is difficult to discover these structures. How can the structure be made clear?
Stated another way: given only information about who associates with whom, can one identify clusters of individuals with common interests or special relationships (families, cliques, terrorist cells)?
![Page 9: SCAN: A Structural Clustering Algorithm for Networks](https://reader036.vdocuments.net/reader036/viewer/2022062408/568134ad550346895d9bc320/html5/thumbnails/9.jpg)
An Example of Networks
• How many clusters?
• What size should they be?
• What is the best partitioning?
• Should some points be differentiated?
![Page 10: SCAN: A Structural Clustering Algorithm for Networks](https://reader036.vdocuments.net/reader036/viewer/2022062408/568134ad550346895d9bc320/html5/thumbnails/10.jpg)
A Social Network Model
Individuals in a tight social group, or clique, know many of the same people, regardless of the size of the group.
Individuals who are hubs know many people in different groups but belong to no single group. Politicians, for example, bridge multiple groups.
Individuals who are outliers reside at the margins of society. Hermits, for example, know few people and belong to no group.
![Page 11: SCAN: A Structural Clustering Algorithm for Networks](https://reader036.vdocuments.net/reader036/viewer/2022062408/568134ad550346895d9bc320/html5/thumbnails/11.jpg)
The Neighborhood of a Vertex
Define $\Gamma(v)$ as the immediate neighborhood of a vertex $v$ (i.e., the set of people that an individual knows).
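As an illustration, the neighborhoods can be built from an undirected edge list. The function name `closed_neighborhoods` is hypothetical, and including each vertex in its own neighborhood is an assumption made here (it keeps the similarity of two adjacent vertices non-zero even when they share no other neighbor).

```python
from collections import defaultdict


def closed_neighborhoods(edges):
    """Map each vertex to its neighborhood (assumption: the vertex itself is included)."""
    gamma = defaultdict(set)
    for v, w in edges:
        # Undirected edge: each endpoint joins both neighborhoods.
        gamma[v].update((v, w))
        gamma[w].update((v, w))
    return dict(gamma)
```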
![Page 12: SCAN: A Structural Clustering Algorithm for Networks](https://reader036.vdocuments.net/reader036/viewer/2022062408/568134ad550346895d9bc320/html5/thumbnails/12.jpg)
Structural Similarity
The desired features tend to be captured by a measure we call Structural Similarity
Structural similarity is large for members of a clique and small for hubs and outliers.
$$\sigma(v, w) = \frac{|\Gamma(v) \cap \Gamma(w)|}{\sqrt{|\Gamma(v)|\,|\Gamma(w)|}}$$
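A small sketch of the similarity computation, assuming the neighborhoods are given as a dict of sets in which every vertex is a member of its own neighborhood (the names `structural_similarity` and `gamma` are illustrative):

```python
import math


def structural_similarity(gamma, v, w):
    """Shared neighbors of v and w, normalized by the geometric mean of neighborhood sizes."""
    shared = len(gamma[v] & gamma[w])
    return shared / math.sqrt(len(gamma[v]) * len(gamma[w]))


# Hypothetical toy graph: a triangle a-b-c with a pendant vertex d attached to c.
gamma = {
    "a": {"a", "b", "c"},
    "b": {"a", "b", "c"},
    "c": {"a", "b", "c", "d"},
    "d": {"c", "d"},
}
print(structural_similarity(gamma, "a", "b"))  # two members of the triangle
print(structural_similarity(gamma, "c", "d"))  # triangle member and pendant vertex
```

The pair inside the triangle shares its whole neighborhood and scores higher than the pair involving the pendant vertex, which is the behavior the measure is meant to capture.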
![Page 13: SCAN: A Structural Clustering Algorithm for Networks](https://reader036.vdocuments.net/reader036/viewer/2022062408/568134ad550346895d9bc320/html5/thumbnails/13.jpg)
Structural Connectivity [1]
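ε-Neighborhood: $N_\varepsilon(v) = \{\, w \in \Gamma(v) \mid \sigma(v, w) \ge \varepsilon \,\}$
Core: $CORE_{\varepsilon,\mu}(v) \Leftrightarrow |N_\varepsilon(v)| \ge \mu$
Direct structure reachable: $DirREACH_{\varepsilon,\mu}(v, w) \Leftrightarrow CORE_{\varepsilon,\mu}(v) \wedge w \in N_\varepsilon(v)$
Structure reachable: $REACH_{\varepsilon,\mu}$ is the transitive closure of direct structure reachability
Structure connected: $CONNECT_{\varepsilon,\mu}(v, w) \Leftrightarrow \exists u \in V : REACH_{\varepsilon,\mu}(u, v) \wedge REACH_{\varepsilon,\mu}(u, w)$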
[1] M. Ester, H. P. Kriegel, J. Sander, & X. Xu (KDD'97)
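The ε-neighborhood and core tests can be sketched directly from their definitions. This assumes the neighborhoods are stored as a dict of sets that include the vertex itself; all function names and the toy graph are illustrative.

```python
import math


def sigma(gamma, v, w):
    # Structural similarity of v and w.
    return len(gamma[v] & gamma[w]) / math.sqrt(len(gamma[v]) * len(gamma[w]))


def eps_neighborhood(gamma, v, eps):
    """Neighbors of v whose structural similarity to v is at least eps."""
    return {w for w in gamma[v] if sigma(gamma, v, w) >= eps}


def is_core(gamma, v, eps, mu):
    """v is a core if its eps-neighborhood has at least mu members."""
    return len(eps_neighborhood(gamma, v, eps)) >= mu


def directly_reachable(gamma, v, w, eps, mu):
    """w is directly structure reachable from v: v is a core and w is in its eps-neighborhood."""
    return is_core(gamma, v, eps, mu) and w in eps_neighborhood(gamma, v, eps)


# Hypothetical toy graph: a triangle a-b-c with a pendant vertex d attached to c.
gamma = {
    "a": {"a", "b", "c"},
    "b": {"a", "b", "c"},
    "c": {"a", "b", "c", "d"},
    "d": {"c", "d"},
}
```

Note that direct reachability is not symmetric: with `eps=0.7` and `mu=3`, `d` is directly reachable from the core `c`, but `c` is not directly reachable from `d`, because `d` is not a core.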
![Page 14: SCAN: A Structural Clustering Algorithm for Networks](https://reader036.vdocuments.net/reader036/viewer/2022062408/568134ad550346895d9bc320/html5/thumbnails/14.jpg)
Structure-Connected Clusters
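Structure-connected cluster C
– Connectivity: $\forall v, w \in C : CONNECT_{\varepsilon,\mu}(v, w)$
– Maximality: $\forall v, w \in V : v \in C \wedge REACH_{\varepsilon,\mu}(v, w) \Rightarrow w \in C$
Hubs:
– Do not belong to any cluster
– Bridge to many clusters
Outliers:
– Do not belong to any cluster
– Connect to fewer clusters
[Figure: example network with one hub and one outlier labeled]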
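Putting the definitions together, a compact sketch of the clustering procedure might look as follows. It is an illustration under stated assumptions, not the authors' implementation: the function name `scan`, the breadth-first expansion, the rule "a non-member adjacent to at least two clusters is a hub, otherwise an outlier", and the toy graph are all choices made here.

```python
import math
from collections import defaultdict, deque


def scan(edges, eps, mu):
    """Return (cluster id per clustered vertex, 'hub'/'outlier' per remaining vertex)."""
    gamma = defaultdict(set)
    for v, w in edges:
        gamma[v].update((v, w))  # assumption: a vertex is in its own neighborhood
        gamma[w].update((v, w))

    def sigma(v, w):
        return len(gamma[v] & gamma[w]) / math.sqrt(len(gamma[v]) * len(gamma[w]))

    def eps_nbrs(v):
        return {w for w in gamma[v] if sigma(v, w) >= eps}

    cluster_of = {}
    next_id = 0
    for v in gamma:
        if v in cluster_of or len(eps_nbrs(v)) < mu:
            continue
        # v is an unassigned core: collect everything structure reachable from it.
        cluster_of[v] = next_id
        queue = deque([v])
        while queue:
            u = queue.popleft()
            nbrs = eps_nbrs(u)
            if len(nbrs) < mu:
                continue  # non-core members join the cluster but do not expand it
            for w in nbrs:
                if w not in cluster_of:
                    cluster_of[w] = next_id
                    queue.append(w)
        next_id += 1

    roles = {}
    for v in gamma:
        if v in cluster_of:
            continue
        touched = {cluster_of[w] for w in gamma[v] if w in cluster_of}
        roles[v] = "hub" if len(touched) >= 2 else "outlier"
    return cluster_of, roles


# Hypothetical example: two 4-cliques, vertex 9 bridging them, vertex 10 hanging off one.
clique_a = [(1, 2), (1, 3), (1, 4), (2, 3), (2, 4), (3, 4)]
clique_b = [(5, 6), (5, 7), (5, 8), (6, 7), (6, 8), (7, 8)]
clusters, roles = scan(clique_a + clique_b + [(9, 1), (9, 5), (10, 8)], eps=0.7, mu=3)
```

In this toy graph the two cliques come out as separate clusters, vertex 9 is reported as a hub because it touches both, and vertex 10 is reported as an outlier because it touches only one.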
![Page 15: SCAN: A Structural Clustering Algorithm for Networks](https://reader036.vdocuments.net/reader036/viewer/2022062408/568134ad550346895d9bc320/html5/thumbnails/15.jpg)
Algorithm
μ = 2, ε = 0.7
[Figure: example network with vertices 0 to 13]
![Page 16: SCAN: A Structural Clustering Algorithm for Networks](https://reader036.vdocuments.net/reader036/viewer/2022062408/568134ad550346895d9bc320/html5/thumbnails/16.jpg)
Algorithm
μ = 2, ε = 0.7
[Figure: example network with vertices 0 to 13; structural similarity shown at this step: 0.63]
![Page 17: SCAN: A Structural Clustering Algorithm for Networks](https://reader036.vdocuments.net/reader036/viewer/2022062408/568134ad550346895d9bc320/html5/thumbnails/17.jpg)
Algorithm
μ = 2, ε = 0.7
[Figure: example network with vertices 0 to 13; structural similarities shown at this step: 0.75, 0.67, 0.82]
![Page 18: SCAN: A Structural Clustering Algorithm for Networks](https://reader036.vdocuments.net/reader036/viewer/2022062408/568134ad550346895d9bc320/html5/thumbnails/18.jpg)
Algorithm
μ = 2, ε = 0.7
[Figure: example network with vertices 0 to 13]
![Page 19: SCAN: A Structural Clustering Algorithm for Networks](https://reader036.vdocuments.net/reader036/viewer/2022062408/568134ad550346895d9bc320/html5/thumbnails/19.jpg)
Algorithm
μ = 2, ε = 0.7
[Figure: example network with vertices 0 to 13; structural similarity shown at this step: 0.67]
![Page 20: SCAN: A Structural Clustering Algorithm for Networks](https://reader036.vdocuments.net/reader036/viewer/2022062408/568134ad550346895d9bc320/html5/thumbnails/20.jpg)
Algorithm
μ = 2, ε = 0.7
[Figure: example network with vertices 0 to 13; structural similarities shown at this step: 0.73, 0.73, 0.73]
![Page 21: SCAN: A Structural Clustering Algorithm for Networks](https://reader036.vdocuments.net/reader036/viewer/2022062408/568134ad550346895d9bc320/html5/thumbnails/21.jpg)
Algorithm
μ = 2, ε = 0.7
[Figure: example network with vertices 0 to 13]
![Page 22: SCAN: A Structural Clustering Algorithm for Networks](https://reader036.vdocuments.net/reader036/viewer/2022062408/568134ad550346895d9bc320/html5/thumbnails/22.jpg)
Algorithm
μ = 2, ε = 0.7
[Figure: example network with vertices 0 to 13; structural similarity shown at this step: 0.51]
![Page 23: SCAN: A Structural Clustering Algorithm for Networks](https://reader036.vdocuments.net/reader036/viewer/2022062408/568134ad550346895d9bc320/html5/thumbnails/23.jpg)
Algorithm
μ = 2, ε = 0.7
[Figure: example network with vertices 0 to 13; structural similarity shown at this step: 0.68]
![Page 24: SCAN: A Structural Clustering Algorithm for Networks](https://reader036.vdocuments.net/reader036/viewer/2022062408/568134ad550346895d9bc320/html5/thumbnails/24.jpg)
Algorithm
μ = 2, ε = 0.7
[Figure: example network with vertices 0 to 13; structural similarity shown at this step: 0.51]
![Page 25: SCAN: A Structural Clustering Algorithm for Networks](https://reader036.vdocuments.net/reader036/viewer/2022062408/568134ad550346895d9bc320/html5/thumbnails/25.jpg)
Algorithm
μ = 2, ε = 0.7
[Figure: example network with vertices 0 to 13]
![Page 26: SCAN: A Structural Clustering Algorithm for Networks](https://reader036.vdocuments.net/reader036/viewer/2022062408/568134ad550346895d9bc320/html5/thumbnails/26.jpg)
Algorithm
μ = 2, ε = 0.7
[Figure: example network with vertices 0 to 13; structural similarities shown at this step: 0.51, 0.51, 0.68]
![Page 27: SCAN: A Structural Clustering Algorithm for Networks](https://reader036.vdocuments.net/reader036/viewer/2022062408/568134ad550346895d9bc320/html5/thumbnails/27.jpg)
Algorithm
μ = 2, ε = 0.7
[Figure: example network with vertices 0 to 13]
![Page 28: SCAN: A Structural Clustering Algorithm for Networks](https://reader036.vdocuments.net/reader036/viewer/2022062408/568134ad550346895d9bc320/html5/thumbnails/28.jpg)
Running Time
Running time: O(|E|)
For sparse networks: O(|V|)
[2] A. Clauset, M. E. J. Newman, & C. Moore, Phys. Rev. E 70, 066111 (2004).
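One way to read the bound, as a rough sketch: assume adjacency lists, and count each structural-similarity evaluation for a (vertex, neighbor) pair as one unit of work. The symbols $\deg(v)$ for the degree of a vertex and $T$ for the total work are introduced here for illustration. Each vertex's neighborhood is then scanned a constant number of times, so the work is proportional to the sum of the degrees:

```latex
T \;=\; O\!\Big(\sum_{v \in V} \deg(v)\Big) \;=\; O(2|E|) \;=\; O(|E|),
\qquad
|E| = O(|V|) \;\Longrightarrow\; T = O(|V|).
```

The second implication is the sparse case: when the number of edges grows only linearly with the number of vertices, the same bound becomes linear in |V|.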
![Page 29: SCAN: A Structural Clustering Algorithm for Networks](https://reader036.vdocuments.net/reader036/viewer/2022062408/568134ad550346895d9bc320/html5/thumbnails/29.jpg)
Conclusion
We propose a novel network clustering algorithm:
– It is fast: O(|E|) in general, and O(|V|) for scale-free networks
– It can find clusters, as well as hubs and outliers