![Page 1: Hierarchical)&)Spectral)clustering) Lecture)13)people.csail.mit.edu/dsontag/courses/ml16/slides/lecture13.pdfHierarchical)&)Spectral)clustering) Lecture)13) David&Sontag& New&York&University&](https://reader033.vdocuments.net/reader033/viewer/2022052722/5f0c1c637e708231d433caff/html5/thumbnails/1.jpg)
Hierarchical & Spectral Clustering, Lecture 13
David Sontag, New York University
Slides adapted from Luke Zettlemoyer, Vibhav Gogate, Carlos Guestrin, Andrew Moore, Dan Klein
![Page 2: Hierarchical)&)Spectral)clustering) Lecture)13)people.csail.mit.edu/dsontag/courses/ml16/slides/lecture13.pdfHierarchical)&)Spectral)clustering) Lecture)13) David&Sontag& New&York&University&](https://reader033.vdocuments.net/reader033/viewer/2022052722/5f0c1c637e708231d433caff/html5/thumbnails/2.jpg)
Agglomerative Clustering
• Agglomerative clustering:
  – First merge very similar instances
  – Incrementally build larger clusters out of smaller clusters
• Algorithm:
  – Maintain a set of clusters
  – Initially, each instance in its own cluster
  – Repeat:
    • Pick the two closest clusters
    • Merge them into a new cluster
    • Stop when there's only one cluster left
• Produces not one clustering, but a family of clusterings represented by a dendrogram
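The loop above can be sketched in a few lines of Python. This is a deliberately naive illustration, not an efficient implementation: the function names, the 1-D toy data, and the choice of closest-pair distance between clusters are assumptions made for the example. The recorded merge history is the information a dendrogram displays.

```python
from itertools import combinations

def agglomerative(points, cluster_distance):
    """Merge the two closest clusters until one remains; return the merge history."""
    clusters = [[i] for i in range(len(points))]  # each instance starts in its own cluster
    merges = []
    while len(clusters) > 1:
        # Pick the pair of clusters with the smallest distance.
        a, b = min(
            combinations(range(len(clusters)), 2),
            key=lambda ab: cluster_distance(points, clusters[ab[0]], clusters[ab[1]]),
        )
        merges.append((clusters[a], clusters[b]))
        merged = clusters[a] + clusters[b]
        # Replace the two clusters with their union.
        clusters = [c for k, c in enumerate(clusters) if k not in (a, b)] + [merged]
    return merges

def closest_pair(points, c1, c2):
    """Distance between two clusters of 1-D points (hypothetical helper)."""
    return min(abs(points[i] - points[j]) for i in c1 for j in c2)

points = [0.0, 0.1, 1.0, 1.2, 5.0]  # toy data, assumed for illustration
history = agglomerative(points, closest_pair)
for left, right in history:
    print(left, "+", right)
```

Cutting the history after any number of merges gives one member of the family of clusterings.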
![Page 3: Hierarchical)&)Spectral)clustering) Lecture)13)people.csail.mit.edu/dsontag/courses/ml16/slides/lecture13.pdfHierarchical)&)Spectral)clustering) Lecture)13) David&Sontag& New&York&University&](https://reader033.vdocuments.net/reader033/viewer/2022052722/5f0c1c637e708231d433caff/html5/thumbnails/3.jpg)
Agglomerative Clustering
• How should we define "closest" for clusters with multiple elements?
![Page 4: Hierarchical)&)Spectral)clustering) Lecture)13)people.csail.mit.edu/dsontag/courses/ml16/slides/lecture13.pdfHierarchical)&)Spectral)clustering) Lecture)13) David&Sontag& New&York&University&](https://reader033.vdocuments.net/reader033/viewer/2022052722/5f0c1c637e708231d433caff/html5/thumbnails/4.jpg)
Agglomerative Clustering
• How should we define "closest" for clusters with multiple elements?
• Many options:
  – Closest pair (single-link clustering)
  – Farthest pair (complete-link clustering)
  – Average of all pairs
• Different choices create different clustering behaviors
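As a small illustration of how the three options differ, the sketch below computes each cluster-to-cluster distance on the same pair of toy clusters. The function names and the data are assumptions made for the example; Euclidean distance is assumed as the point-to-point distance.

```python
import math

def pairwise(c1, c2):
    """All point-to-point Euclidean distances between two clusters."""
    return [math.dist(p, q) for p in c1 for q in c2]

def single_link(c1, c2):
    """Closest pair."""
    return min(pairwise(c1, c2))

def complete_link(c1, c2):
    """Farthest pair."""
    return max(pairwise(c1, c2))

def average_link(c1, c2):
    """Average of all pairs."""
    d = pairwise(c1, c2)
    return sum(d) / len(d)

a = [(0.0, 0.0), (1.0, 0.0)]  # toy clusters, assumed for illustration
b = [(3.0, 0.0), (7.0, 0.0)]
print(single_link(a, b), complete_link(a, b), average_link(a, b))  # 2.0 7.0 4.5
```

The same two clusters look close under single link and far apart under complete link, which is why the choice changes which pair gets merged next.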
![Page 5: Hierarchical)&)Spectral)clustering) Lecture)13)people.csail.mit.edu/dsontag/courses/ml16/slides/lecture13.pdfHierarchical)&)Spectral)clustering) Lecture)13) David&Sontag& New&York&University&](https://reader033.vdocuments.net/reader033/viewer/2022052722/5f0c1c637e708231d433caff/html5/thumbnails/5.jpg)
Agglomerative Clustering
• How should we define "closest" for clusters with multiple elements?
Closest pair (single-link clustering)
Farthest pair (complete-link clustering)
[Figures: "Single Link Example" and "Complete Link Example", each on points numbered 1-8]
[Pictures from Thorsten Joachims]
![Page 6: Hierarchical)&)Spectral)clustering) Lecture)13)people.csail.mit.edu/dsontag/courses/ml16/slides/lecture13.pdfHierarchical)&)Spectral)clustering) Lecture)13) David&Sontag& New&York&University&](https://reader033.vdocuments.net/reader033/viewer/2022052722/5f0c1c637e708231d433caff/html5/thumbnails/6.jpg)
Clustering Behavior
[Figure: three dendrogram panels labeled "Average", "Farthest", and "Nearest"]
Mouse tumor data from [Hastie et al.]
![Page 7: Hierarchical)&)Spectral)clustering) Lecture)13)people.csail.mit.edu/dsontag/courses/ml16/slides/lecture13.pdfHierarchical)&)Spectral)clustering) Lecture)13) David&Sontag& New&York&University&](https://reader033.vdocuments.net/reader033/viewer/2022052722/5f0c1c637e708231d433caff/html5/thumbnails/7.jpg)
Agglomerative Clustering
When can this be expected to work?
Closest pair (single-link clustering)
[Figure: "Single Link Example" on points numbered 1-8]
Strong separation property: All points are more similar to points in their own cluster than to any points in any other cluster
Then, the true clustering corresponds to some pruning of the tree obtained by single-link clustering!
Slightly weaker (stability) conditions are handled by average-link clustering
(Balcan et al., 2008)
![Page 8: Hierarchical)&)Spectral)clustering) Lecture)13)people.csail.mit.edu/dsontag/courses/ml16/slides/lecture13.pdfHierarchical)&)Spectral)clustering) Lecture)13) David&Sontag& New&York&University&](https://reader033.vdocuments.net/reader033/viewer/2022052722/5f0c1c637e708231d433caff/html5/thumbnails/8.jpg)
Spectral Clustering
Slides adapted from James Hays, Alan Fern, and Tommi Jaakkola
![Page 9: Hierarchical)&)Spectral)clustering) Lecture)13)people.csail.mit.edu/dsontag/courses/ml16/slides/lecture13.pdfHierarchical)&)Spectral)clustering) Lecture)13) David&Sontag& New&York&University&](https://reader033.vdocuments.net/reader033/viewer/2022052722/5f0c1c637e708231d433caff/html5/thumbnails/9.jpg)
Spectral clustering
[Shi & Malik '00; Ng, Jordan, Weiss NIPS '01]
[Figure: side-by-side comparison labeled "K-means" and "Spectral clustering". All plots use axes from 0 to 5, except "Rows of Y", which runs from −0.6 to 1. Panel titles: "nips, 8 clusters"; "lineandballs, 3 clusters"; "fourclouds, 2 clusters"; "squiggles, 4 clusters"; "twocircles, 2 clusters"; "threecircles-joined, 2 clusters"; "threecircles-joined, 3 clusters"; "Rows of Y (jittered, randomly subsampled) for twocircles"; "two circles, 2 clusters (K-means)"; "threecircles-joined, 3 clusters (connected components)"; "lineandballs, 3 clusters (Meila and Shi algorithm)"; "nips, 8 clusters (Kannan et al. algorithm)"]
![Page 10: Hierarchical)&)Spectral)clustering) Lecture)13)people.csail.mit.edu/dsontag/courses/ml16/slides/lecture13.pdfHierarchical)&)Spectral)clustering) Lecture)13) David&Sontag& New&York&University&](https://reader033.vdocuments.net/reader033/viewer/2022052722/5f0c1c637e708231d433caff/html5/thumbnails/10.jpg)
Spectral clustering
[Figures from Ng, Jordan, Weiss NIPS '01]
[Figure: clustering results on 2-D point sets, axes from 0 to 5. Panel titles: "nips, 8 clusters"; "lineandballs, 3 clusters"; "fourclouds, 2 clusters"; "squiggles, 4 clusters"; "twocircles, 2 clusters"; "threecircles-joined, 2 clusters"; "threecircles-joined, 3 clusters"; "Rows of Y (jittered, randomly subsampled) for twocircles"; "two circles, 2 clusters (K-means)"; "threecircles-joined, 3 clusters (connected components)"; "lineandballs, 3 clusters (Meila and Shi algorithm)"; "nips, 8 clusters (Kannan et al. algorithm)"]
![Page 11: Hierarchical)&)Spectral)clustering) Lecture)13)people.csail.mit.edu/dsontag/courses/ml16/slides/lecture13.pdfHierarchical)&)Spectral)clustering) Lecture)13) David&Sontag& New&York&University&](https://reader033.vdocuments.net/reader033/viewer/2022052722/5f0c1c637e708231d433caff/html5/thumbnails/11.jpg)
Spectral clustering
Group points based on links in a graph
[Figure: graph with two groups of nodes labeled A and B]
[Slide from James Hays]
![Page 12: Hierarchical)&)Spectral)clustering) Lecture)13)people.csail.mit.edu/dsontag/courses/ml16/slides/lecture13.pdfHierarchical)&)Spectral)clustering) Lecture)13) David&Sontag& New&York&University&](https://reader033.vdocuments.net/reader033/viewer/2022052722/5f0c1c637e708231d433caff/html5/thumbnails/12.jpg)
How to create the graph?
• It is common to use a Gaussian kernel to compute similarity between objects
  [Equation: Gaussian kernel similarity; not legible in the source]
• One could create
  – A fully connected graph
  – K nearest neighbor graph (each node is only connected to its K nearest neighbors)
[Figure: graph with two groups of nodes labeled A and B]
[Slide from Alan Fern]
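Both graph constructions can be sketched with NumPy. The kernel form $\exp(-\|x_i - x_j\|^2 / (2\sigma^2))$, the bandwidth parameter `sigma`, the zeroed diagonal, and the "either endpoint" symmetrization of the nearest-neighbor graph are assumptions made for this sketch; the slide's exact formula is not legible, and other normalizations are in common use.

```python
import numpy as np

def squared_distances(X):
    """Matrix of squared Euclidean distances between the rows of X."""
    diff = X[:, None, :] - X[None, :, :]
    return np.sum(diff ** 2, axis=-1)

def gaussian_affinity(X, sigma=1.0):
    """Fully connected graph with Gaussian kernel weights (assumed kernel form)."""
    W = np.exp(-squared_distances(X) / (2.0 * sigma ** 2))
    np.fill_diagonal(W, 0.0)  # no self-loops, a convention assumed here
    return W

def knn_affinity(X, k, sigma=1.0):
    """Keep edge i-j only if j is among the k nearest neighbors of i, or i of j."""
    sq = squared_distances(X)
    np.fill_diagonal(sq, np.inf)  # a node is not its own neighbor
    nearest = np.argsort(sq, axis=1)[:, :k]
    keep = np.zeros(sq.shape, dtype=bool)
    keep[np.repeat(np.arange(len(X)), k), nearest.ravel()] = True
    keep = keep | keep.T  # symmetrize so the graph is undirected
    return np.where(keep, gaussian_affinity(X, sigma), 0.0)

# Toy data, assumed for illustration: two well-separated groups.
X = np.array([[0.0, 0.0], [0.1, 0.0], [0.0, 0.1], [5.0, 5.0], [5.1, 5.0]])
W_full = gaussian_affinity(X)
W_knn = knn_affinity(X, k=1)
print(np.count_nonzero(W_full), np.count_nonzero(W_knn))
```

The nearest-neighbor graph is much sparser, which matters for large datasets since the fully connected matrix has one entry per pair of points.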
![Page 13: Hierarchical)&)Spectral)clustering) Lecture)13)people.csail.mit.edu/dsontag/courses/ml16/slides/lecture13.pdfHierarchical)&)Spectral)clustering) Lecture)13) David&Sontag& New&York&University&](https://reader033.vdocuments.net/reader033/viewer/2022052722/5f0c1c637e708231d433caff/html5/thumbnails/13.jpg)
Can we use minimum cut for clustering?
[Excerpt from Shi and Malik, "Normalized Cuts and Image Segmentation", p. 889]

[...] methods, the segmentation criteria used in most of them are based on local properties of the graph. Because perceptual grouping is about extracting the global impressions of a scene, as we saw earlier, this partitioning criterion often falls short of this main goal.

In this paper, we propose a new graph-theoretic criterion for measuring the goodness of an image partition: the normalized cut. We introduce and justify this criterion in Section 2. The minimization of this criterion can be formulated as a generalized eigenvalue problem. The eigenvectors can be used to construct good partitions of the image and the process can be continued recursively as desired (Section 2.1). Section 3 gives a detailed explanation of the steps of our grouping algorithm. In Section 4, we show experimental results. The formulation and minimization of the normalized cut criterion draws on a body of results from the field of spectral graph theory (Section 5). Relationship to work in computer vision is discussed in Section 6 and comparison with related eigenvector based segmentation methods is represented in Section 6.1. We conclude in Section 7.

The main results in this paper were first presented in [20].

2 GROUPING AS GRAPH PARTITIONING

A graph $G = (V, E)$ can be partitioned into two disjoint sets, $A$, $B$, $A \cup B = V$, $A \cap B = \emptyset$, by simply removing edges connecting the two parts. The degree of dissimilarity between these two pieces can be computed as total weight of the edges that have been removed. In graph theoretic language, it is called the cut:

$$cut(A, B) = \sum_{u \in A,\, v \in B} w(u, v). \qquad (1)$$

The optimal bipartitioning of a graph is the one that minimizes this cut value. Although there are an exponential number of such partitions, finding the minimum cut of a graph is a well-studied problem and there exist efficient algorithms for solving it.

Wu and Leahy [25] proposed a clustering method based on this minimum cut criterion. In particular, they seek to partition a graph into k-subgraphs such that the maximum cut across the subgroups is minimized. This problem can be efficiently solved by recursively finding the minimum cuts that bisect the existing segments. As shown in Wu and Leahy's work, this globally optimal criterion can be used to produce good segmentation on some of the images.

However, as Wu and Leahy also noticed in their work, the minimum cut criteria favors cutting small sets of isolated nodes in the graph. This is not surprising since the cut defined in (1) increases with the number of edges going across the two partitioned parts. Fig. 1 illustrates one such case. Assuming the edge weights are inversely proportional to the distance between the two nodes, we see the cut that partitions out node $n_1$ or $n_2$ will have a very small value. In fact, any cut that partitions out individual nodes on the right half will have smaller cut value than the cut that partitions the nodes into the left and right halves.

To avoid this unnatural bias for partitioning out small sets of points, we propose a new measure of disassociation between two groups. Instead of looking at the value of total edge weight connecting the two partitions, our measure computes the cut cost as a fraction of the total edge connections to all the nodes in the graph. We call this disassociation measure the normalized cut (Ncut):

$$Ncut(A, B) = \frac{cut(A, B)}{assoc(A, V)} + \frac{cut(A, B)}{assoc(B, V)}, \qquad (2)$$

where $assoc(A, V) = \sum_{u \in A,\, t \in V} w(u, t)$ is the total connection from nodes in $A$ to all nodes in the graph and $assoc(B, V)$ is similarly defined. With this definition of the disassociation between the groups, the cut that partitions out small isolated points will no longer have small Ncut value, since the cut value will almost certainly be a large percentage of the total connection from that small set to all other nodes. In the case illustrated in Fig. 1, we see that the cut1 value across node $n_1$ will be 100 percent of the total connection from that node.

In the same spirit, we can define a measure for total normalized association within groups for a given partition:

$$Nassoc(A, B) = \frac{assoc(A, A)}{assoc(A, V)} + \frac{assoc(B, B)}{assoc(B, V)}, \qquad (3)$$

where $assoc(A, A)$ and $assoc(B, B)$ are total weights of edges connecting nodes within $A$ and $B$, respectively. We see again this is an unbiased measure, which reflects how tightly on average nodes within the group are connected to each other.

Another important property of this definition of association and disassociation of a partition is that they are naturally related:

$$\begin{aligned}
Ncut(A, B) &= \frac{cut(A, B)}{assoc(A, V)} + \frac{cut(A, B)}{assoc(B, V)} \\
&= \frac{assoc(A, V) - assoc(A, A)}{assoc(A, V)} + \frac{assoc(B, V) - assoc(B, B)}{assoc(B, V)} \\
&= 2 - \left( \frac{assoc(A, A)}{assoc(A, V)} + \frac{assoc(B, B)}{assoc(B, V)} \right) \\
&= 2 - Nassoc(A, B).
\end{aligned}$$

Hence, the two partition criteria that we seek in our grouping algorithm, minimizing the disassociation between the groups and maximizing the association within the [...]

Fig. 1. A case where minimum cut gives a bad partition.
[Shi & Malik '00]
![Page 14: Hierarchical)&)Spectral)clustering) Lecture)13)people.csail.mit.edu/dsontag/courses/ml16/slides/lecture13.pdfHierarchical)&)Spectral)clustering) Lecture)13) David&Sontag& New&York&University&](https://reader033.vdocuments.net/reader033/viewer/2022052722/5f0c1c637e708231d433caff/html5/thumbnails/14.jpg)
Graph partitioning
![Page 15: Hierarchical)&)Spectral)clustering) Lecture)13)people.csail.mit.edu/dsontag/courses/ml16/slides/lecture13.pdfHierarchical)&)Spectral)clustering) Lecture)13) David&Sontag& New&York&University&](https://reader033.vdocuments.net/reader033/viewer/2022052722/5f0c1c637e708231d433caff/html5/thumbnails/15.jpg)
Graph Terminologies
• Degree of nodes
• Volume of a set
![Page 16: Hierarchical)&)Spectral)clustering) Lecture)13)people.csail.mit.edu/dsontag/courses/ml16/slides/lecture13.pdfHierarchical)&)Spectral)clustering) Lecture)13) David&Sontag& New&York&University&](https://reader033.vdocuments.net/reader033/viewer/2022052722/5f0c1c637e708231d433caff/html5/thumbnails/16.jpg)
Graph Cut
• Consider a partition of the graph into two parts A and B
• Cut(A, B): sum of the weights of the set of edges that connect the two groups
• An intuitive goal is to find the partition that minimizes the cut
![Page 17: Hierarchical)&)Spectral)clustering) Lecture)13)people.csail.mit.edu/dsontag/courses/ml16/slides/lecture13.pdfHierarchical)&)Spectral)clustering) Lecture)13) David&Sontag& New&York&University&](https://reader033.vdocuments.net/reader033/viewer/2022052722/5f0c1c637e708231d433caff/html5/thumbnails/17.jpg)
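Normalized Cut
• Consider the connectivity between groups relative to the volume of each group
[Figure: graph partitioned into groups A and B]
$$Ncut(A, B) = \frac{cut(A, B)}{Vol(A)} + \frac{cut(A, B)}{Vol(B)}$$
$$Ncut(A, B) = cut(A, B)\, \frac{Vol(A) + Vol(B)}{Vol(A)\, Vol(B)}$$
For a fixed cut value, this is minimized when Vol(A) and Vol(B) are equal, so the criterion encourages a balanced cut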
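A toy computation makes the difference between the two criteria concrete. The sketch below assumes a symmetric weight matrix `W`, takes the volume of a set to be the sum of the degrees of its nodes, and uses a made-up 7-node graph (two triangles joined by a weak edge, plus one weakly attached node); the graph and the function names are illustrative assumptions.

```python
import numpy as np

def cut(W, A, B):
    """Sum of the weights of edges with one endpoint in A and the other in B."""
    return W[np.ix_(A, B)].sum()

def vol(W, A):
    """Volume of A, taken here as the sum of the degrees of its nodes."""
    return W[A, :].sum()

def ncut(W, A, B):
    c = cut(W, A, B)
    return c / vol(W, A) + c / vol(W, B)

# Hypothetical graph: triangles {0,1,2} and {3,4,5}, weak bridge 2-3, stray node 6.
W = np.zeros((7, 7))
edges = {(0, 1): 1.0, (0, 2): 1.0, (1, 2): 1.0,
         (3, 4): 1.0, (3, 5): 1.0, (4, 5): 1.0,
         (2, 3): 0.3, (5, 6): 0.2}
for (i, j), w in edges.items():
    W[i, j] = W[j, i] = w

isolate = ([6], [0, 1, 2, 3, 4, 5])      # split off the single stray node
balanced = ([0, 1, 2], [3, 4, 5, 6])     # split between the two triangles

print("cut :", cut(W, *isolate), cut(W, *balanced))
print("Ncut:", ncut(W, *isolate), ncut(W, *balanced))
```

On this graph the plain cut prefers isolating node 6 (cut of about 0.2 against about 0.3), while Ncut strongly prefers the balanced split, because the isolated node's cut is its entire volume.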
![Page 18: Hierarchical)&)Spectral)clustering) Lecture)13)people.csail.mit.edu/dsontag/courses/ml16/slides/lecture13.pdfHierarchical)&)Spectral)clustering) Lecture)13) David&Sontag& New&York&University&](https://reader033.vdocuments.net/reader033/viewer/2022052722/5f0c1c637e708231d433caff/html5/thumbnails/18.jpg)
Solving NCut
• How to minimize Ncut?
  Let $W$ be the similarity matrix, $W(i,j) = W_{i,j}$;
  Let $D$ be the diag. matrix, $D(i,i) = \sum_j W(i,j)$;
  Let $x$ be a vector in $\{1, -1\}^N$, $x(i) = 1 \Leftrightarrow i \in A$.
• With some simplifications, we can show:
$$\min_x Ncut(x) = \min_y \frac{y^T (D - W)\, y}{y^T D\, y}$$
  (Rayleigh quotient)
  Subject to: $y^T D \mathbf{1} = 0$ (y takes discrete values)
NP-Hard!
![Page 19: Hierarchical)&)Spectral)clustering) Lecture)13)people.csail.mit.edu/dsontag/courses/ml16/slides/lecture13.pdfHierarchical)&)Spectral)clustering) Lecture)13) David&Sontag& New&York&University&](https://reader033.vdocuments.net/reader033/viewer/2022052722/5f0c1c637e708231d433caff/html5/thumbnails/19.jpg)
Solving NCut
• Relax the optimization problem into the continuous domain by solving generalized eigenvalue system:
  $\min_y\; y^T (D - W)\, y$ subject to $y^T D\, y = 1$
• Which gives: $(D - W)\, y = \lambda D\, y$
• Note that $(D - W)\, \mathbf{1} = 0$, so the first eigenvector is $y_0 = \mathbf{1}$ with eigenvalue $0$.
• The second smallest eigenvector is the real valued solution to this problem!!
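One way to see where the generalized eigenvalue system comes from is a Lagrange multiplier argument. The sketch below assumes $W$ is symmetric; the Lagrangian $\mathcal{L}$ and the multiplier $\lambda$ are notation introduced here, not taken from the slides.

```latex
\begin{align*}
\mathcal{L}(y, \lambda) &= y^T (D - W)\, y - \lambda \left( y^T D\, y - 1 \right) \\
\nabla_y \mathcal{L} &= 2 (D - W)\, y - 2 \lambda D\, y = 0
  \;\Longrightarrow\; (D - W)\, y = \lambda D\, y \\
y^T (D - W)\, y &= \lambda\, y^T D\, y = \lambda
\end{align*}
```

So at any stationary point the objective equals the eigenvalue $\lambda$. The smallest eigenvalue, $0$, belongs to the constant vector, which the constraint $y^T D \mathbf{1} = 0$ rules out. Generalized eigenvectors with distinct eigenvalues are orthogonal with respect to $D$, so the eigenvector of the second smallest eigenvalue satisfies that constraint and minimizes the relaxed objective.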
![Page 20: Hierarchical)&)Spectral)clustering) Lecture)13)people.csail.mit.edu/dsontag/courses/ml16/slides/lecture13.pdfHierarchical)&)Spectral)clustering) Lecture)13) David&Sontag& New&York&University&](https://reader033.vdocuments.net/reader033/viewer/2022052722/5f0c1c637e708231d433caff/html5/thumbnails/20.jpg)
2-way Normalized Cuts
1. Compute the affinity matrix $W$, compute the degree matrix ($D$), $D$ is diagonal and $D(i,i) = \sum_j W(i,j)$
2. Solve $(D - W)\, y = \lambda D\, y$, where $D - W$ is called the Laplacian matrix
3. Use the eigenvector with the second smallest eigenvalue to bipartition the graph into two parts.
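The three steps can be sketched with NumPy alone. Rather than calling a generalized eigensolver, this sketch assumes every node has positive degree and solves the equivalent symmetric problem $D^{-1/2}(D - W)D^{-1/2} z = \lambda z$, recovering $y = D^{-1/2} z$. Splitting at zero is one of several possible thresholds, and the toy graph is an assumption made for the example.

```python
import numpy as np

def second_generalized_eigenvector(W):
    """Return (y, lam) solving (D - W) y = lam * D y for the second smallest lam."""
    d = W.sum(axis=1)                      # degrees, D(i, i) = sum_j W(i, j)
    d_inv_sqrt = 1.0 / np.sqrt(d)          # assumes every node has positive degree
    L = np.diag(d) - W                     # Laplacian matrix
    # Equivalent symmetric problem: D^{-1/2} L D^{-1/2} z = lam z, with y = D^{-1/2} z.
    L_sym = d_inv_sqrt[:, None] * L * d_inv_sqrt[None, :]
    eigvals, eigvecs = np.linalg.eigh(L_sym)   # eigenvalues in ascending order
    return d_inv_sqrt * eigvecs[:, 1], eigvals[1]

def two_way_ncut(W):
    """Bipartition by the sign of the second eigenvector (threshold 0, an assumption)."""
    y, _ = second_generalized_eigenvector(W)
    return y > 0

# Hypothetical graph: two triangles joined by one weak edge.
W = np.zeros((6, 6))
for i, j, w in [(0, 1, 1.0), (0, 2, 1.0), (1, 2, 1.0),
                (3, 4, 1.0), (3, 5, 1.0), (4, 5, 1.0), (2, 3, 0.1)]:
    W[i, j] = W[j, i] = w

labels = two_way_ncut(W)
print(labels)
```

Which side is labeled `True` depends on the arbitrary sign of the eigenvector, so only the grouping is meaningful.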
![Page 21: Hierarchical)&)Spectral)clustering) Lecture)13)people.csail.mit.edu/dsontag/courses/ml16/slides/lecture13.pdfHierarchical)&)Spectral)clustering) Lecture)13) David&Sontag& New&York&University&](https://reader033.vdocuments.net/reader033/viewer/2022052722/5f0c1c637e708231d433caff/html5/thumbnails/21.jpg)
Creating Bi-partition Using 2nd Eigenvector
• Sometimes there is not a clear threshold to split based on the second vector since it takes continuous values
• How to choose the splitting point?
  a) Pick a constant value (0, or 0.5).
  b) Pick the median value as splitting point.
  c) Look for the splitting point that has the minimum Ncut value:
     1. Choose n possible splitting points.
     2. Compute Ncut value.
     3. Pick minimum.
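Option (c) amounts to a one-dimensional sweep. In the sketch below, the candidate splitting points are the midpoints between consecutive distinct entries of the vector, which is one possible way to choose them. The vector `y` is a made-up stand-in for a second eigenvector, chosen so that a fixed threshold of 0 would give a poor split; the graph and the names are likewise assumptions.

```python
import numpy as np

def ncut_value(W, mask):
    """Ncut of the partition (mask, ~mask) for a symmetric affinity matrix W."""
    c = W[mask][:, ~mask].sum()
    return c / W[mask].sum() + c / W[~mask].sum()

def best_split(W, y):
    """Return the threshold on y with minimum Ncut, and that Ncut value."""
    values = np.unique(y)                          # sorted distinct entries
    candidates = (values[:-1] + values[1:]) / 2.0  # midpoints keep both sides non-empty
    scores = [ncut_value(W, y > t) for t in candidates]
    best = int(np.argmin(scores))
    return candidates[best], scores[best]

# Hypothetical graph: two triangles joined by one weak edge.
W = np.zeros((6, 6))
for i, j, w in [(0, 1, 1.0), (0, 2, 1.0), (1, 2, 1.0),
                (3, 4, 1.0), (3, 5, 1.0), (4, 5, 1.0), (2, 3, 0.1)]:
    W[i, j] = W[j, i] = w

y = np.array([-0.6, -0.5, -0.4, -0.1, 0.3, 0.4])  # made-up vector for illustration
threshold, score = best_split(W, y)
print(threshold, score, ncut_value(W, y > 0))
```

Here the sweep places the threshold between −0.4 and −0.1 and separates the two triangles, whereas thresholding at 0 would cut node 3 away from its own triangle at a much higher Ncut.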
![Page 22: Hierarchical)&)Spectral)clustering) Lecture)13)people.csail.mit.edu/dsontag/courses/ml16/slides/lecture13.pdfHierarchical)&)Spectral)clustering) Lecture)13) David&Sontag& New&York&University&](https://reader033.vdocuments.net/reader033/viewer/2022052722/5f0c1c637e708231d433caff/html5/thumbnails/22.jpg)
Spectral clustering: example
[Figure: two scatter plots of the example data]
Tommi Jaakkola, MIT CSAIL
![Page 23: Hierarchical)&)Spectral)clustering) Lecture)13)people.csail.mit.edu/dsontag/courses/ml16/slides/lecture13.pdfHierarchical)&)Spectral)clustering) Lecture)13) David&Sontag& New&York&University&](https://reader033.vdocuments.net/reader033/viewer/2022052722/5f0c1c637e708231d433caff/html5/thumbnails/23.jpg)
Spectral clustering: example cont'd
[Figure: plot over indices 0 to 40, with values between −0.5 and 0.5]
Components of the eigenvector corresponding to the second largest eigenvalue
Tommi Jaakkola, MIT CSAIL
![Page 24: Hierarchical)&)Spectral)clustering) Lecture)13)people.csail.mit.edu/dsontag/courses/ml16/slides/lecture13.pdfHierarchical)&)Spectral)clustering) Lecture)13) David&Sontag& New&York&University&](https://reader033.vdocuments.net/reader033/viewer/2022052722/5f0c1c637e708231d433caff/html5/thumbnails/24.jpg)
K-way Partition?
• Recursive bi-partitioning (Hagen et al., '91)
  – Recursively apply bi-partitioning algorithm in a hierarchical divisive manner.
  – Disadvantages: Inefficient, unstable
• Cluster multiple eigenvectors
  – Build a reduced space from multiple eigenvectors.
  – Commonly used in recent papers
  – A preferable approach: it's like doing dimension reduction then k-means
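The second approach can be sketched as "embed, then run k-means". The sketch below assumes positive degrees, uses the k generalized eigenvectors of $(D - W)\, y = \lambda D\, y$ with the smallest eigenvalues as the reduced space, and runs a plain Lloyd's k-means with a deterministic farthest-point initialization. Published variants differ in which matrix they decompose and in how they normalize rows; those choices, the function names, and the toy graph are assumptions here.

```python
import numpy as np

def spectral_embedding(W, k):
    """Rows of the k generalized eigenvectors of (D - W) y = lam D y with smallest lam."""
    d = W.sum(axis=1)
    s = 1.0 / np.sqrt(d)                                   # assumes positive degrees
    L_sym = np.eye(len(W)) - s[:, None] * W * s[None, :]   # D^{-1/2} (D - W) D^{-1/2}
    _, vecs = np.linalg.eigh(L_sym)                        # ascending eigenvalues
    return s[:, None] * vecs[:, :k]                        # y = D^{-1/2} z

def kmeans(Y, k, iters=50):
    """Plain Lloyd's algorithm with a deterministic farthest-point initialization."""
    centers = [Y[0]]
    for _ in range(k - 1):
        dist = np.min([np.sum((Y - c) ** 2, axis=1) for c in centers], axis=0)
        centers.append(Y[np.argmax(dist)])
    centers = np.array(centers)
    for _ in range(iters):
        sq = ((Y[:, None, :] - centers[None, :, :]) ** 2).sum(axis=-1)
        labels = np.argmin(sq, axis=1)
        for j in range(k):
            if np.any(labels == j):
                centers[j] = Y[labels == j].mean(axis=0)
    return labels

# Hypothetical graph: three triangles chained by two weak edges.
W = np.zeros((9, 9))
for base in (0, 3, 6):
    for i, j in [(0, 1), (0, 2), (1, 2)]:
        W[base + i, base + j] = W[base + j, base + i] = 1.0
for i, j in [(2, 3), (5, 6)]:
    W[i, j] = W[j, i] = 0.05

labels = kmeans(spectral_embedding(W, 3), 3)
print(labels)
```

In the embedded space the nodes of each triangle land almost on top of each other, which is why a simple k-means suffices there even when it would fail on the original data.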
![Page 25: Hierarchical)&)Spectral)clustering) Lecture)13)people.csail.mit.edu/dsontag/courses/ml16/slides/lecture13.pdfHierarchical)&)Spectral)clustering) Lecture)13) David&Sontag& New&York&University&](https://reader033.vdocuments.net/reader033/viewer/2022052722/5f0c1c637e708231d433caff/html5/thumbnails/25.jpg)