![Page 1: Introduction to Clustering and Spectral Clusteringpages.stat.wisc.edu/~karlrohe/netsci/IntroToClustering.pdf · Behavioral Ecology and Sociobiology, 54:396–405, 2003. Community](https://reader033.vdocuments.net/reader033/viewer/2022052005/601864e8c3690e007f05415a/html5/thumbnails/1.jpg)
Introduction to Clustering and
Spectral ClusteringKarl Rohe
Originally a job talk atWilliams College in January 2011.
1
![Page 2: Introduction to Clustering and Spectral Clusteringpages.stat.wisc.edu/~karlrohe/netsci/IntroToClustering.pdf · Behavioral Ecology and Sociobiology, 54:396–405, 2003. Community](https://reader033.vdocuments.net/reader033/viewer/2022052005/601864e8c3690e007f05415a/html5/thumbnails/2.jpg)
1) Intro to clustering
2) Spectral clustering
2
![Page 3: Introduction to Clustering and Spectral Clusteringpages.stat.wisc.edu/~karlrohe/netsci/IntroToClustering.pdf · Behavioral Ecology and Sociobiology, 54:396–405, 2003. Community](https://reader033.vdocuments.net/reader033/viewer/2022052005/601864e8c3690e007f05415a/html5/thumbnails/3.jpg)
UsefulSubjective Computationally Challenging
Clustering divides a data set into sets of similar points.
3
![Page 4: Introduction to Clustering and Spectral Clusteringpages.stat.wisc.edu/~karlrohe/netsci/IntroToClustering.pdf · Behavioral Ecology and Sociobiology, 54:396–405, 2003. Community](https://reader033.vdocuments.net/reader033/viewer/2022052005/601864e8c3690e007f05415a/html5/thumbnails/4.jpg)
• Urbanization
• Irises
• Financial sectors
• Dolphin social network
• Natural image patches
Image from NASA
Clustering has many applications
4
![Page 5: Introduction to Clustering and Spectral Clusteringpages.stat.wisc.edu/~karlrohe/netsci/IntroToClustering.pdf · Behavioral Ecology and Sociobiology, 54:396–405, 2003. Community](https://reader033.vdocuments.net/reader033/viewer/2022052005/601864e8c3690e007f05415a/html5/thumbnails/5.jpg)
Anderson, Edgar (1935). The irises of the Gaspe Peninsula, Bulletin of the American Iris Society, 59, 2–5
Clustering has many applications
• Urbanization
• Irises
• Financial sectors
• Dolphin social network
• Natural image patches
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●
●
●
●
●
● ●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
● ●
●
●
●
●
●
●
●●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●
Irises
PC1
PC2
Sepal.Length, Sepal.Width, Petal.Length, Petal.Width
5
![Page 6: Introduction to Clustering and Spectral Clusteringpages.stat.wisc.edu/~karlrohe/netsci/IntroToClustering.pdf · Behavioral Ecology and Sociobiology, 54:396–405, 2003. Community](https://reader033.vdocuments.net/reader033/viewer/2022052005/601864e8c3690e007f05415a/html5/thumbnails/6.jpg)
Anderson, Edgar (1935). The irises of the Gaspe Peninsula, Bulletin of the American Iris Society, 59, 2–5
Clustering has many applications
• Urbanization
• Iris species
• Financial sectors
• Dolphin social network
• Natural image patches
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●
●
●
●
●
● ●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
● ●
●
●
●
●
●
●
●●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●
Irises
PC1
PC2
SetosaVersicolorVirginica
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●
●
●
●
●
● ●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
● ●
●
●
●
●
●
●
●●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●
6
![Page 7: Introduction to Clustering and Spectral Clusteringpages.stat.wisc.edu/~karlrohe/netsci/IntroToClustering.pdf · Behavioral Ecology and Sociobiology, 54:396–405, 2003. Community](https://reader033.vdocuments.net/reader033/viewer/2022052005/601864e8c3690e007f05415a/html5/thumbnails/7.jpg)
Daily closing stock prices for all S&P 500Going back to 2005
Clustering has many applications
• Urbanization
• Iris species
• Financial sectors
• Dolphin social network
• Natural image patches
data overwhelm
7
![Page 8: Introduction to Clustering and Spectral Clusteringpages.stat.wisc.edu/~karlrohe/netsci/IntroToClustering.pdf · Behavioral Ecology and Sociobiology, 54:396–405, 2003. Community](https://reader033.vdocuments.net/reader033/viewer/2022052005/601864e8c3690e007f05415a/html5/thumbnails/8.jpg)
Clustering has many applications
• Urbanization
• Iris species
• Financial sectors
• Dolphin social network
• Natural image patches
8
![Page 9: Introduction to Clustering and Spectral Clusteringpages.stat.wisc.edu/~karlrohe/netsci/IntroToClustering.pdf · Behavioral Ecology and Sociobiology, 54:396–405, 2003. Community](https://reader033.vdocuments.net/reader033/viewer/2022052005/601864e8c3690e007f05415a/html5/thumbnails/9.jpg)
D. Lusseau, K. Schneider, O.J. Boisseau, P. Haase, E. Slooten, and S.M. Dawson. The bottlenose dolphin community of Doubtful Sound features a large proportion of long-lasting associations. Behavioral Ecology and Sociobiology, 54:396–405, 2003.
Community structure in large networks 21
(a) Zachary’s karate club network . . .
0.1
1
1 10
! (
conducta
nce)
k (number of nodes in the cluster)
cut Acut A+B
(b) . . . and it’s community profile plot
(c) Dolphins social network . . .
0.01
0.1
1
1 10 100
! (
conducta
nce)
k (number of nodes in the cluster)
cut
(d) . . . and it’s community profile plot
(e) Monks social network . . .
0.1
1
1 10
! (
conducta
nce)
k (number of nodes in the cluster)
cut
(f) . . . and it’s community profile plot
(g) Network science network . . .
0.001
0.01
0.1
1
1 10 100
! (
conducta
nce)
k (number of nodes in the cluster)
A
BC
D
C+E
(h) . . . and it’s community profile plot
Figure 5: Depiction of several small social networks that are common test sets for community detection algorithmsand their network community profile plots. (5(a)–5(b)) Zachary’s karate club network. (5(c)–5(d)) A network ofdolphins. (5(e)–5(f)) A network of monks. (5(g)–5(h)) A network of researchers researching networks.
Clustering has many applications
• Urbanization
• Iris species
• Financial sectors
• Dolphin social network
• Natural image patches
9
![Page 10: Introduction to Clustering and Spectral Clusteringpages.stat.wisc.edu/~karlrohe/netsci/IntroToClustering.pdf · Behavioral Ecology and Sociobiology, 54:396–405, 2003. Community](https://reader033.vdocuments.net/reader033/viewer/2022052005/601864e8c3690e007f05415a/html5/thumbnails/10.jpg)
Gallant Neuroscience lab
Clustering has many applications
• Urbanization
• Iris species
• Financial sectors
• Dolphin social network
• Natural image patches
what you are seeing changes what is going on in the back of your brain.
fMRI can measure this.
10
![Page 11: Introduction to Clustering and Spectral Clusteringpages.stat.wisc.edu/~karlrohe/netsci/IntroToClustering.pdf · Behavioral Ecology and Sociobiology, 54:396–405, 2003. Community](https://reader033.vdocuments.net/reader033/viewer/2022052005/601864e8c3690e007f05415a/html5/thumbnails/11.jpg)
Clustering has many applications
• Urbanization
• Iris species
• Financial sectors
• Dolphin social network
• Natural image patches
11
![Page 12: Introduction to Clustering and Spectral Clusteringpages.stat.wisc.edu/~karlrohe/netsci/IntroToClustering.pdf · Behavioral Ecology and Sociobiology, 54:396–405, 2003. Community](https://reader033.vdocuments.net/reader033/viewer/2022052005/601864e8c3690e007f05415a/html5/thumbnails/12.jpg)
UsefulSubjective Computationally Challenging
Clustering divides a data set into sets of similar points.
12
![Page 13: Introduction to Clustering and Spectral Clusteringpages.stat.wisc.edu/~karlrohe/netsci/IntroToClustering.pdf · Behavioral Ecology and Sociobiology, 54:396–405, 2003. Community](https://reader033.vdocuments.net/reader033/viewer/2022052005/601864e8c3690e007f05415a/html5/thumbnails/13.jpg)
Grouping similar objects is natural. But, how do you define “similar”?
13
●
●
0 5 10 15 20
05
1015
c(0, 20)
c(0,
17)
![Page 14: Introduction to Clustering and Spectral Clusteringpages.stat.wisc.edu/~karlrohe/netsci/IntroToClustering.pdf · Behavioral Ecology and Sociobiology, 54:396–405, 2003. Community](https://reader033.vdocuments.net/reader033/viewer/2022052005/601864e8c3690e007f05415a/html5/thumbnails/14.jpg)
On those remote pages [of an ancient Chinese encyclopedia] it is written that animals are divided into (a) those that belong to the Emperor, (b) embalmed ones, (c) those that are trained, (d) suckling pigs, (e) mermaids, (f) fabulous ones, (g) stray dogs, (h) those that are included in this classification, (e) those that tremble as if they were mad, (j) innumerable ones, (k) those drawn with a very fine camel’s hair brush, (l) other, (m) those that have just broken a flower vase, (n) those that resemble flies from a distance.
- Jorge Luis Borges, Other Inquisitions14
Grouping similar objects is natural. But, how do you define “similar”?
![Page 15: Introduction to Clustering and Spectral Clusteringpages.stat.wisc.edu/~karlrohe/netsci/IntroToClustering.pdf · Behavioral Ecology and Sociobiology, 54:396–405, 2003. Community](https://reader033.vdocuments.net/reader033/viewer/2022052005/601864e8c3690e007f05415a/html5/thumbnails/15.jpg)
UsefulSubjective Computationally Challenging
Clustering divides a data set into sets of similar points.
15
![Page 16: Introduction to Clustering and Spectral Clusteringpages.stat.wisc.edu/~karlrohe/netsci/IntroToClustering.pdf · Behavioral Ecology and Sociobiology, 54:396–405, 2003. Community](https://reader033.vdocuments.net/reader033/viewer/2022052005/601864e8c3690e007f05415a/html5/thumbnails/16.jpg)
16
![Page 17: Introduction to Clustering and Spectral Clusteringpages.stat.wisc.edu/~karlrohe/netsci/IntroToClustering.pdf · Behavioral Ecology and Sociobiology, 54:396–405, 2003. Community](https://reader033.vdocuments.net/reader033/viewer/2022052005/601864e8c3690e007f05415a/html5/thumbnails/17.jpg)
Log daily closing stock prices for all S&P 500, 2005 - Present.
17
![Page 18: Introduction to Clustering and Spectral Clusteringpages.stat.wisc.edu/~karlrohe/netsci/IntroToClustering.pdf · Behavioral Ecology and Sociobiology, 54:396–405, 2003. Community](https://reader033.vdocuments.net/reader033/viewer/2022052005/601864e8c3690e007f05415a/html5/thumbnails/18.jpg)
While clustering is natural for humans. Clustering large data sets is difficult.
We need to teach computers how to cluster!
18
![Page 19: Introduction to Clustering and Spectral Clusteringpages.stat.wisc.edu/~karlrohe/netsci/IntroToClustering.pdf · Behavioral Ecology and Sociobiology, 54:396–405, 2003. Community](https://reader033.vdocuments.net/reader033/viewer/2022052005/601864e8c3690e007f05415a/html5/thumbnails/19.jpg)
While clustering is natural for humans. Clustering large data sets is difficult.
We need to teach computers how to cluster!
• Computers only understand algorithms.
• Often, clustering algorithm are motivated by optimization problems.
19
![Page 20: Introduction to Clustering and Spectral Clusteringpages.stat.wisc.edu/~karlrohe/netsci/IntroToClustering.pdf · Behavioral Ecology and Sociobiology, 54:396–405, 2003. Community](https://reader033.vdocuments.net/reader033/viewer/2022052005/601864e8c3690e007f05415a/html5/thumbnails/20.jpg)
Partitions
20
![Page 21: Introduction to Clustering and Spectral Clusteringpages.stat.wisc.edu/~karlrohe/netsci/IntroToClustering.pdf · Behavioral Ecology and Sociobiology, 54:396–405, 2003. Community](https://reader033.vdocuments.net/reader033/viewer/2022052005/601864e8c3690e007f05415a/html5/thumbnails/21.jpg)
Partitions
Community structure in large networks 21
(a) Zachary’s karate club network . . .
0.1
1
1 10
! (
conducta
nce)
k (number of nodes in the cluster)
cut Acut A+B
(b) . . . and it’s community profile plot
(c) Dolphins social network . . .
0.01
0.1
1
1 10 100
! (
conducta
nce)
k (number of nodes in the cluster)
cut
(d) . . . and it’s community profile plot
(e) Monks social network . . .
0.1
1
1 10
! (
conducta
nce)
k (number of nodes in the cluster)
cut
(f) . . . and it’s community profile plot
(g) Network science network . . .
0.001
0.01
0.1
1
1 10 100
! (
conducta
nce)
k (number of nodes in the cluster)
A
BC
D
C+E
(h) . . . and it’s community profile plot
Figure 5: Depiction of several small social networks that are common test sets for community detection algorithmsand their network community profile plots. (5(a)–5(b)) Zachary’s karate club network. (5(c)–5(d)) A network ofdolphins. (5(e)–5(f)) A network of monks. (5(g)–5(h)) A network of researchers researching networks.
21
![Page 22: Introduction to Clustering and Spectral Clusteringpages.stat.wisc.edu/~karlrohe/netsci/IntroToClustering.pdf · Behavioral Ecology and Sociobiology, 54:396–405, 2003. Community](https://reader033.vdocuments.net/reader033/viewer/2022052005/601864e8c3690e007f05415a/html5/thumbnails/22.jpg)
Clustering as an optimization problem
minC
f(C)
C: any partition f: measures how bad C is
22
![Page 23: Introduction to Clustering and Spectral Clusteringpages.stat.wisc.edu/~karlrohe/netsci/IntroToClustering.pdf · Behavioral Ecology and Sociobiology, 54:396–405, 2003. Community](https://reader033.vdocuments.net/reader033/viewer/2022052005/601864e8c3690e007f05415a/html5/thumbnails/23.jpg)
Convexity
23
![Page 24: Introduction to Clustering and Spectral Clusteringpages.stat.wisc.edu/~karlrohe/netsci/IntroToClustering.pdf · Behavioral Ecology and Sociobiology, 54:396–405, 2003. Community](https://reader033.vdocuments.net/reader033/viewer/2022052005/601864e8c3690e007f05415a/html5/thumbnails/24.jpg)
Convexity
24
![Page 25: Introduction to Clustering and Spectral Clusteringpages.stat.wisc.edu/~karlrohe/netsci/IntroToClustering.pdf · Behavioral Ecology and Sociobiology, 54:396–405, 2003. Community](https://reader033.vdocuments.net/reader033/viewer/2022052005/601864e8c3690e007f05415a/html5/thumbnails/25.jpg)
Convexity
25
![Page 26: Introduction to Clustering and Spectral Clusteringpages.stat.wisc.edu/~karlrohe/netsci/IntroToClustering.pdf · Behavioral Ecology and Sociobiology, 54:396–405, 2003. Community](https://reader033.vdocuments.net/reader033/viewer/2022052005/601864e8c3690e007f05415a/html5/thumbnails/26.jpg)
Convexity
26
![Page 27: Introduction to Clustering and Spectral Clusteringpages.stat.wisc.edu/~karlrohe/netsci/IntroToClustering.pdf · Behavioral Ecology and Sociobiology, 54:396–405, 2003. Community](https://reader033.vdocuments.net/reader033/viewer/2022052005/601864e8c3690e007f05415a/html5/thumbnails/27.jpg)
ConvexityMinimizing convex functions on convex sets is “easy”
Set the derivative equal to zero or
Just go downhill!
27
![Page 28: Introduction to Clustering and Spectral Clusteringpages.stat.wisc.edu/~karlrohe/netsci/IntroToClustering.pdf · Behavioral Ecology and Sociobiology, 54:396–405, 2003. Community](https://reader033.vdocuments.net/reader033/viewer/2022052005/601864e8c3690e007f05415a/html5/thumbnails/28.jpg)
Usually, clustering problems are not convex.
x1, . . . , xn � Rd
28
![Page 29: Introduction to Clustering and Spectral Clusteringpages.stat.wisc.edu/~karlrohe/netsci/IntroToClustering.pdf · Behavioral Ecology and Sociobiology, 54:396–405, 2003. Community](https://reader033.vdocuments.net/reader033/viewer/2022052005/601864e8c3690e007f05415a/html5/thumbnails/29.jpg)
Usually, clustering problems are not convex.
x1, . . . , xn � Rdk-means
f(c1, . . . , ck) =�
i
minj⇥xi � cj⇥2
2
29
![Page 30: Introduction to Clustering and Spectral Clusteringpages.stat.wisc.edu/~karlrohe/netsci/IntroToClustering.pdf · Behavioral Ecology and Sociobiology, 54:396–405, 2003. Community](https://reader033.vdocuments.net/reader033/viewer/2022052005/601864e8c3690e007f05415a/html5/thumbnails/30.jpg)
Usually, clustering problems are not convex.
x1, . . . , xn � Rdk-means
f(c1, . . . , ck) =�
i
minj⇥xi � cj⇥2
2
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●
●
●
●
●
● ●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
● ●
●
●
●
●
●
●
●●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●
Irises
PC1
PC2
SetosaVersicolorVirginica
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●
●
●
●
●
● ●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
● ●
●
●
●
●
●
●
●●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●
30
![Page 31: Introduction to Clustering and Spectral Clusteringpages.stat.wisc.edu/~karlrohe/netsci/IntroToClustering.pdf · Behavioral Ecology and Sociobiology, 54:396–405, 2003. Community](https://reader033.vdocuments.net/reader033/viewer/2022052005/601864e8c3690e007f05415a/html5/thumbnails/31.jpg)
Usually, clustering problems are not convex.
x1, . . . , xn � Rdk-means
f(c1, . . . , ck) =�
i
minj⇥xi � cj⇥2
2
minc1,...ck
f(c1, . . . , ck)●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●
●
●
●
●
● ●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
● ●
●
●
●
●
●
●
●●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●
Irises
PC1
PC2
SetosaVersicolorVirginica
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●
●
●
●
●
● ●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
● ●
●
●
●
●
●
●
●●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●
31
![Page 32: Introduction to Clustering and Spectral Clusteringpages.stat.wisc.edu/~karlrohe/netsci/IntroToClustering.pdf · Behavioral Ecology and Sociobiology, 54:396–405, 2003. Community](https://reader033.vdocuments.net/reader033/viewer/2022052005/601864e8c3690e007f05415a/html5/thumbnails/32.jpg)
1
2 34
32
Aij =�
1 if node i is connected to node j0 otherwise
.
A � {0, 1}n�nSymmetric adjacency matrix
![Page 33: Introduction to Clustering and Spectral Clusteringpages.stat.wisc.edu/~karlrohe/netsci/IntroToClustering.pdf · Behavioral Ecology and Sociobiology, 54:396–405, 2003. Community](https://reader033.vdocuments.net/reader033/viewer/2022052005/601864e8c3690e007f05415a/html5/thumbnails/33.jpg)
Normalized cuts:
Shi, Malik (2000). Normalized Cuts and Image Segmentation, IEEE Transactions Pattern Analysis and Machine Learning, 22, 888-905.
Community structure in large networks 21
(a) Zachary’s karate club network . . .
0.1
1
1 10
! (
co
nd
ucta
nce
)
k (number of nodes in the cluster)
cut Acut A+B
(b) . . . and it’s community profile plot
(c) Dolphins social network . . .
0.01
0.1
1
1 10 100
! (
co
nd
ucta
nce
)
k (number of nodes in the cluster)
cut
(d) . . . and it’s community profile plot
(e) Monks social network . . .
0.1
1
1 10
! (
co
nd
ucta
nce
)
k (number of nodes in the cluster)
cut
(f) . . . and it’s community profile plot
(g) Network science network . . .
0.001
0.01
0.1
1
1 10 100
! (
co
nd
ucta
nce
)
k (number of nodes in the cluster)
A
BC
D
C+E
(h) . . . and it’s community profile plot
Figure 5: Depiction of several small social networks that are common test sets for community detection algorithmsand their network community profile plots. (5(a)–5(b)) Zachary’s karate club network. (5(c)–5(d)) A network ofdolphins. (5(e)–5(f)) A network of monks. (5(g)–5(h)) A network of researchers researching networks.
minP,Q
�i�P,j�Q Aij�
i�P,j�P⇥Q Aij+
�i�P,j�Q Aij�
i�Q,j�P⇥Q Aij
Aij =�
1 if node i is connected to node j0 otherwise
.
33
A � {0, 1}n�nSymmetric adjacency matrix
![Page 34: Introduction to Clustering and Spectral Clusteringpages.stat.wisc.edu/~karlrohe/netsci/IntroToClustering.pdf · Behavioral Ecology and Sociobiology, 54:396–405, 2003. Community](https://reader033.vdocuments.net/reader033/viewer/2022052005/601864e8c3690e007f05415a/html5/thumbnails/34.jpg)
Normalized cuts:
It can be shown that normalized cuts is equivalent to the following problem:
subject to yT D1 = 0
miny
yT (D �A)yyT Dy
yi ⇥ {�1/b, b} ⇤i
Community structure in large networks 21
(a) Zachary’s karate club network . . .
0.1
1
1 10
! (
co
nd
ucta
nce
)
k (number of nodes in the cluster)
cut Acut A+B
(b) . . . and it’s community profile plot
(c) Dolphins social network . . .
0.01
0.1
1
1 10 100
! (
co
nd
ucta
nce
)
k (number of nodes in the cluster)
cut
(d) . . . and it’s community profile plot
(e) Monks social network . . .
0.1
1
1 10
! (
co
nd
ucta
nce
)
k (number of nodes in the cluster)
cut
(f) . . . and it’s community profile plot
(g) Network science network . . .
0.001
0.01
0.1
1
1 10 100
! (
co
nd
ucta
nce
)
k (number of nodes in the cluster)
A
BC
D
C+E
(h) . . . and it’s community profile plot
Figure 5: Depiction of several small social networks that are common test sets for community detection algorithmsand their network community profile plots. (5(a)–5(b)) Zachary’s karate club network. (5(c)–5(d)) A network ofdolphins. (5(e)–5(f)) A network of monks. (5(g)–5(h)) A network of researchers researching networks.
34
![Page 35: Introduction to Clustering and Spectral Clusteringpages.stat.wisc.edu/~karlrohe/netsci/IntroToClustering.pdf · Behavioral Ecology and Sociobiology, 54:396–405, 2003. Community](https://reader033.vdocuments.net/reader033/viewer/2022052005/601864e8c3690e007f05415a/html5/thumbnails/35.jpg)
There are _________ many partitions of n data points into 2 clusters.
35
![Page 36: Introduction to Clustering and Spectral Clusteringpages.stat.wisc.edu/~karlrohe/netsci/IntroToClustering.pdf · Behavioral Ecology and Sociobiology, 54:396–405, 2003. Community](https://reader033.vdocuments.net/reader033/viewer/2022052005/601864e8c3690e007f05415a/html5/thumbnails/36.jpg)
There are very many partitions of the data points
There are _________ many partitions of n data points into 2 clusters.
2n�1 � 1
(a,b) (b,a) (ab, ) ( ,ab)
36
![Page 37: Introduction to Clustering and Spectral Clusteringpages.stat.wisc.edu/~karlrohe/netsci/IntroToClustering.pdf · Behavioral Ecology and Sociobiology, 54:396–405, 2003. Community](https://reader033.vdocuments.net/reader033/viewer/2022052005/601864e8c3690e007f05415a/html5/thumbnails/37.jpg)
(a,b) (b,a) (ab, ) ( ,ab)
There are _________ many partitions of n data points into 2 clusters.
2n�1 � 1
There are very many partitions of the data points
37
![Page 38: Introduction to Clustering and Spectral Clusteringpages.stat.wisc.edu/~karlrohe/netsci/IntroToClustering.pdf · Behavioral Ecology and Sociobiology, 54:396–405, 2003. Community](https://reader033.vdocuments.net/reader033/viewer/2022052005/601864e8c3690e007f05415a/html5/thumbnails/38.jpg)
Champion has postulated that there are
283 atoms in the universe.
Not possible to look at all partitions!
There are _________ many partitions of n data points into 2 clusters.
2n�1 � 1
Matthew Champion, "Re: How many atoms make up the universe?", 1998
There are very many partitions of the data points
38
![Page 39: Introduction to Clustering and Spectral Clusteringpages.stat.wisc.edu/~karlrohe/netsci/IntroToClustering.pdf · Behavioral Ecology and Sociobiology, 54:396–405, 2003. Community](https://reader033.vdocuments.net/reader033/viewer/2022052005/601864e8c3690e007f05415a/html5/thumbnails/39.jpg)
Clustering poses several challenges.
• The approach you choose is often subjective
• Difficult to optimize.
• Must rely on approximations: Local optima, “convex relaxation.”
39
![Page 40: Introduction to Clustering and Spectral Clusteringpages.stat.wisc.edu/~karlrohe/netsci/IntroToClustering.pdf · Behavioral Ecology and Sociobiology, 54:396–405, 2003. Community](https://reader033.vdocuments.net/reader033/viewer/2022052005/601864e8c3690e007f05415a/html5/thumbnails/40.jpg)
The algorithmRelationship to normalized cutsEuclidean versionAdvantages
Spectral Clustering
40
![Page 41: Introduction to Clustering and Spectral Clusteringpages.stat.wisc.edu/~karlrohe/netsci/IntroToClustering.pdf · Behavioral Ecology and Sociobiology, 54:396–405, 2003. Community](https://reader033.vdocuments.net/reader033/viewer/2022052005/601864e8c3690e007f05415a/html5/thumbnails/41.jpg)
The algorithm (with graph data)
A � {0, 1}n�nAdjacency matrix
21
3
4
5 6
41
![Page 42: Introduction to Clustering and Spectral Clusteringpages.stat.wisc.edu/~karlrohe/netsci/IntroToClustering.pdf · Behavioral Ecology and Sociobiology, 54:396–405, 2003. Community](https://reader033.vdocuments.net/reader033/viewer/2022052005/601864e8c3690e007f05415a/html5/thumbnails/42.jpg)
The algorithm (with graph data)
D � Rn�n
A � {0, 1}n�n.
Dii =�
j
Aij
Adjacency matrix
L = I �D�1A
21
3
4
5 6
42
![Page 43: Introduction to Clustering and Spectral Clusteringpages.stat.wisc.edu/~karlrohe/netsci/IntroToClustering.pdf · Behavioral Ecology and Sociobiology, 54:396–405, 2003. Community](https://reader033.vdocuments.net/reader033/viewer/2022052005/601864e8c3690e007f05415a/html5/thumbnails/43.jpg)
The algorithm (with graph data)
D � Rn�n
Find the eigenvector corresponding to the second smallest eigenvalue.
A � {0, 1}n�n.
Dii =�
j
Aij
Adjacency matrix
L = I �D�1A
y � Rn
21
3
4
5 6
43
![Page 44: Introduction to Clustering and Spectral Clusteringpages.stat.wisc.edu/~karlrohe/netsci/IntroToClustering.pdf · Behavioral Ecology and Sociobiology, 54:396–405, 2003. Community](https://reader033.vdocuments.net/reader033/viewer/2022052005/601864e8c3690e007f05415a/html5/thumbnails/44.jpg)
The algorithm (with graph data)
D � Rn�n
Find the eigenvector corresponding to the second smallest eigenvalue.
A � {0, 1}n�n.
Dii =�
j
Aij
Adjacency matrix
L = I �D�1A
y � Rn Ly = �y
Eigenvectors are beautiful ANDelusive.
it is difficult to exactly define the vectors .
21
3
4
5 6
-.5-.5
-.3
.3
.5 .5
yi < 0 =� i ⇥ Ayi � 0 =⇥ i ⇤ B
44
![Page 45: Introduction to Clustering and Spectral Clusteringpages.stat.wisc.edu/~karlrohe/netsci/IntroToClustering.pdf · Behavioral Ecology and Sociobiology, 54:396–405, 2003. Community](https://reader033.vdocuments.net/reader033/viewer/2022052005/601864e8c3690e007f05415a/html5/thumbnails/45.jpg)
The algorithmRelationship to normalized cutsEuclidean version Advantages
Spectral Clustering
45
![Page 46: Introduction to Clustering and Spectral Clusteringpages.stat.wisc.edu/~karlrohe/netsci/IntroToClustering.pdf · Behavioral Ecology and Sociobiology, 54:396–405, 2003. Community](https://reader033.vdocuments.net/reader033/viewer/2022052005/601864e8c3690e007f05415a/html5/thumbnails/46.jpg)
Recall normalized cuts:
minP,Q
�i�P,j�Q Aij�
i�P,j�P⇥Q Aij+
�i�P,j�Q Aij�
i�Q,j�P⇥Q Aij
subject to yT D1 = 0
miny
yT (D �A)yyT Dy
yi ⇥ {�1/b, b} ⇤i
46
![Page 47: Introduction to Clustering and Spectral Clusteringpages.stat.wisc.edu/~karlrohe/netsci/IntroToClustering.pdf · Behavioral Ecology and Sociobiology, 54:396–405, 2003. Community](https://reader033.vdocuments.net/reader033/viewer/2022052005/601864e8c3690e007f05415a/html5/thumbnails/47.jpg)
Spectral clustering is a “convex relaxation” of normalized cuts
Because of the restriction to a discrete set, this problem is not convex. “Relax” the problem. Optimize over
The optimum is the second smallest eigenvector of .L
y � Rn, yT D1 = 0
subject to yT D1 = 0
miny
yT (D �A)yyT Dy
yi ⇥ {�1/b, b} ⇤i
47
![Page 48: Introduction to Clustering and Spectral Clusteringpages.stat.wisc.edu/~karlrohe/netsci/IntroToClustering.pdf · Behavioral Ecology and Sociobiology, 54:396–405, 2003. Community](https://reader033.vdocuments.net/reader033/viewer/2022052005/601864e8c3690e007f05415a/html5/thumbnails/48.jpg)
The algorithmRelationship to normalized cutsEuclidean version Advantages
Spectral Clustering
48
![Page 49: Introduction to Clustering and Spectral Clusteringpages.stat.wisc.edu/~karlrohe/netsci/IntroToClustering.pdf · Behavioral Ecology and Sociobiology, 54:396–405, 2003. Community](https://reader033.vdocuments.net/reader033/viewer/2022052005/601864e8c3690e007f05415a/html5/thumbnails/49.jpg)
The algorithm (with graph data)
D � Rn�n
Find the eigenvector corresponding to the second smallest eigenvalue.
A � {0, 1}n�n.
Dii =�
j
Aij
Adjacency matrix
L = I �D�1A
y � Rn Ly = �y
21
3
4
5 6
-.5-.5
-.3
.3
.5 .5
yi < 0 =� i ⇥ Ayi � 0 =⇥ i ⇤ B
49
![Page 50: Introduction to Clustering and Spectral Clusteringpages.stat.wisc.edu/~karlrohe/netsci/IntroToClustering.pdf · Behavioral Ecology and Sociobiology, 54:396–405, 2003. Community](https://reader033.vdocuments.net/reader033/viewer/2022052005/601864e8c3690e007f05415a/html5/thumbnails/50.jpg)
The algorithm (with Euclidean data)
Kij = exp(�⇥xi � xj⇥22/�2)
D � Rn�n
.
Dii =�
j
Kij
L = I �D�1K
Here the subjectivity of the similarity function is obvious! Need to choose a function for K_{ij}
Find the eigenvector corresponding to the second smallest eigenvalue.
y � Rn Ly = �y
K � Rn�n
yi < 0 =� i ⇥ Ayi � 0 =⇥ i ⇤ B
50
![Page 51: Introduction to Clustering and Spectral Clusteringpages.stat.wisc.edu/~karlrohe/netsci/IntroToClustering.pdf · Behavioral Ecology and Sociobiology, 54:396–405, 2003. Community](https://reader033.vdocuments.net/reader033/viewer/2022052005/601864e8c3690e007f05415a/html5/thumbnails/51.jpg)
The algorithmRelationship to normalized cutsEuclidean version Advantages
Spectral Clustering
51
![Page 52: Introduction to Clustering and Spectral Clusteringpages.stat.wisc.edu/~karlrohe/netsci/IntroToClustering.pdf · Behavioral Ecology and Sociobiology, 54:396–405, 2003. Community](https://reader033.vdocuments.net/reader033/viewer/2022052005/601864e8c3690e007f05415a/html5/thumbnails/52.jpg)
●
●●
●
●●
●
●
●
●
●
●
●
●●
●
●●
●●
●
●
●
●
●
●
●
●
●●●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
● ●
●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
−4 −2 0 2 4
−4−2
02
4
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●
●
●
●
●
● ●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
● ●
●
●
●
●
●
●
●●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●
Irises
PC1
PC2
SetosaVersicolorVirginica
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●
●
●
●
●
● ●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
● ●
●
●
●
●
●
●
●●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●
k-means
52
![Page 53: Introduction to Clustering and Spectral Clusteringpages.stat.wisc.edu/~karlrohe/netsci/IntroToClustering.pdf · Behavioral Ecology and Sociobiology, 54:396–405, 2003. Community](https://reader033.vdocuments.net/reader033/viewer/2022052005/601864e8c3690e007f05415a/html5/thumbnails/53.jpg)
●
●●
●
●●
●
●
●
●
●
●
●
●●
●
●●
●●
●
●
●
●
●
●
●
●
●●●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
● ●
●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
−4 −2 0 2 4
−4−2
02
4
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●
●
●
●
●
● ●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
● ●
●
●
●
●
●
●
●●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●
Irises
PC1
PC2
SetosaVersicolorVirginica
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●
●
●
●
●
● ●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
● ●
●
●
●
●
●
●
●●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●
k-means
53
![Page 54: Introduction to Clustering and Spectral Clusteringpages.stat.wisc.edu/~karlrohe/netsci/IntroToClustering.pdf · Behavioral Ecology and Sociobiology, 54:396–405, 2003. Community](https://reader033.vdocuments.net/reader033/viewer/2022052005/601864e8c3690e007f05415a/html5/thumbnails/54.jpg)
Because it relies on a “local” measure of similarity, spectral clustering can
detect oddly shaped clusters.
●
●●
●
●●
●
●
●
●
●
●
●
●●
●
●●
●●
●
●
●
●
●
●
●
●
●●●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
● ●
●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
−4 −2 0 2 4
−4−2
02
4
●
●●
●
●●
●
●
●
●
●
●
●
●●
●
●●
●●
●
●
●
●
●
●
●
●
●●●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
● ●
●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
−4 −2 0 2 4
−4−2
02
4
“k-nearest neighbors”
Another similarity function
54
![Page 55: Introduction to Clustering and Spectral Clusteringpages.stat.wisc.edu/~karlrohe/netsci/IntroToClustering.pdf · Behavioral Ecology and Sociobiology, 54:396–405, 2003. Community](https://reader033.vdocuments.net/reader033/viewer/2022052005/601864e8c3690e007f05415a/html5/thumbnails/55.jpg)
Spectral clustering . . .
• is computationally tractable.
• is a “convex relaxation” of normalized cuts.
• is able to find oddly shaped clusters.
• has interesting connections to • spectral graph theory, • random walks on graphs,• and electrical network theory.
In contrast to regression , clustering is much more subjective.
there are several different clustering algorithms?
What criteria should you use to decide?
55
![Page 56: Introduction to Clustering and Spectral Clusteringpages.stat.wisc.edu/~karlrohe/netsci/IntroToClustering.pdf · Behavioral Ecology and Sociobiology, 54:396–405, 2003. Community](https://reader033.vdocuments.net/reader033/viewer/2022052005/601864e8c3690e007f05415a/html5/thumbnails/56.jpg)
1) Intro to clustering
2) Spectral clustering
3) Research56
![Page 57: Introduction to Clustering and Spectral Clusteringpages.stat.wisc.edu/~karlrohe/netsci/IntroToClustering.pdf · Behavioral Ecology and Sociobiology, 54:396–405, 2003. Community](https://reader033.vdocuments.net/reader033/viewer/2022052005/601864e8c3690e007f05415a/html5/thumbnails/57.jpg)
Why should you trust spectral clustering?
Statistical EstimationStochastic BlockmodelTheorem
57
![Page 58: Introduction to Clustering and Spectral Clusteringpages.stat.wisc.edu/~karlrohe/netsci/IntroToClustering.pdf · Behavioral Ecology and Sociobiology, 54:396–405, 2003. Community](https://reader033.vdocuments.net/reader033/viewer/2022052005/601864e8c3690e007f05415a/html5/thumbnails/58.jpg)
x1, . . . , xn � Rp
y1, . . . , yn � RFor example, say you have the GPA of some students
and some predictors (height, SAT score, # roommates, etc.)
58
Yi = Xi� + erroriSay with a few conditions on the error distribution, is a reasonable model for the data.
A statistical model to study an algorithm
![Page 59: Introduction to Clustering and Spectral Clusteringpages.stat.wisc.edu/~karlrohe/netsci/IntroToClustering.pdf · Behavioral Ecology and Sociobiology, 54:396–405, 2003. Community](https://reader033.vdocuments.net/reader033/viewer/2022052005/601864e8c3690e007f05415a/html5/thumbnails/59.jpg)
One desirable result:
�̂n = argminb�Rd
n�
i=1
(Yi �Xib)2
�̂n � �
What can be said about
This would suggest that least squares is reasonable.
59
Yi = Xi� + errori
A statistical model to study an algorithm
![Page 60: Introduction to Clustering and Spectral Clusteringpages.stat.wisc.edu/~karlrohe/netsci/IntroToClustering.pdf · Behavioral Ecology and Sociobiology, 54:396–405, 2003. Community](https://reader033.vdocuments.net/reader033/viewer/2022052005/601864e8c3690e007f05415a/html5/thumbnails/60.jpg)
To study the estimation performance of spectral clustering, we need a
statistical model.
60
![Page 61: Introduction to Clustering and Spectral Clusteringpages.stat.wisc.edu/~karlrohe/netsci/IntroToClustering.pdf · Behavioral Ecology and Sociobiology, 54:396–405, 2003. Community](https://reader033.vdocuments.net/reader033/viewer/2022052005/601864e8c3690e007f05415a/html5/thumbnails/61.jpg)
Statistical EstimationStochastic BlockmodelTheorem
Why should you trust spectral clustering?
61
![Page 62: Introduction to Clustering and Spectral Clusteringpages.stat.wisc.edu/~karlrohe/netsci/IntroToClustering.pdf · Behavioral Ecology and Sociobiology, 54:396–405, 2003. Community](https://reader033.vdocuments.net/reader033/viewer/2022052005/601864e8c3690e007f05415a/html5/thumbnails/62.jpg)
• Divide nodes into k blocks• Let each block have an equal
proportion of the nodes• Edges: independent, Bernoulli
with probability p if nodes in same block r otherwise
A simple example of the Stochastic Block Model
Clustering: estimate block membership for each node62
![Page 63: Introduction to Clustering and Spectral Clusteringpages.stat.wisc.edu/~karlrohe/netsci/IntroToClustering.pdf · Behavioral Ecology and Sociobiology, 54:396–405, 2003. Community](https://reader033.vdocuments.net/reader033/viewer/2022052005/601864e8c3690e007f05415a/html5/thumbnails/63.jpg)
Statistical EstimationStochastic BlockmodelTheorem
Why should you trust spectral clustering?
63
![Page 64: Introduction to Clustering and Spectral Clusteringpages.stat.wisc.edu/~karlrohe/netsci/IntroToClustering.pdf · Behavioral Ecology and Sociobiology, 54:396–405, 2003. Community](https://reader033.vdocuments.net/reader033/viewer/2022052005/601864e8c3690e007f05415a/html5/thumbnails/64.jpg)
Theorem
Under the previously described Stochastic Block Model, for , the number of “misclustered” nodes is bounded
p �= r
as the number of nodes and the number of blocks
n�⇥k = k(n)�⇥
number of misclustered nodes = o(k3 log2 n) a.s.
R., Chatterjee, Yu. Spectral clustering and the high-dimensional Stochastic Blockmodel. Annals of Statistics, pending minor revisions. 64
![Page 65: Introduction to Clustering and Spectral Clusteringpages.stat.wisc.edu/~karlrohe/netsci/IntroToClustering.pdf · Behavioral Ecology and Sociobiology, 54:396–405, 2003. Community](https://reader033.vdocuments.net/reader033/viewer/2022052005/601864e8c3690e007f05415a/html5/thumbnails/65.jpg)
Clustering is useful, subjective, and computationally challenging.
Spectral clustering uses the eigenvectors of the graph Laplacian to relax a non-convex problem.
Spectral clustering can estimate the blocks in the Stochastic Blockmodel.
Conclusions
65
![Page 66: Introduction to Clustering and Spectral Clusteringpages.stat.wisc.edu/~karlrohe/netsci/IntroToClustering.pdf · Behavioral Ecology and Sociobiology, 54:396–405, 2003. Community](https://reader033.vdocuments.net/reader033/viewer/2022052005/601864e8c3690e007f05415a/html5/thumbnails/66.jpg)
ReferencesAnderson (1935). The irises of the Gaspe Peninsula, Bulletin of the American Iris Society, 59, 2–5
Lusseau, Schneider, Boisseau, Haase, Slooten, and Dawson (2003). The bottlenose dolphin community of doubtful sound features a large proportion of long-lasting associations. Behavioral Ecology and Sociobiology, 54:396–405.
R., Chatterjee, Yu (2010). Spectral clustering and the high-dimensional Stochastic Blockmodel. Annals of Statistics, pending minor revisions.
Shi, Malik (2000). Normalized Cuts and Image Segmentation, IEEE Transactions Pattern Analysis and Machine Learning, 22:888-905.
66