Statistical Data Mining and Machine Learning
Hilary Term 2016

Dino Sejdinovic
Department of Statistics, Oxford

Slides and other materials available at: http://www.stats.ox.ac.uk/~sejdinov/sdmml
Visualisation and Dimensionality Reduction
Last time: Biplots, MDS

Biplots: a way to visualise relationships between principal components and the original variables. In its scaled version, projected points are uncorrelated and equi-variant, and angles between variables correspond to correlations.
[Figure: biplot of the USArrests data on the first two principal components (Comp.1, Comp.2), with the 50 US states as points and variable arrows for Murder, Assault, UrbanPop and Rape.]
Multidimensional Scaling: a class of dimensionality reduction techniques which find a representation of the data items $x_1, \ldots, x_n \in \mathbb{R}^p$ in a lower-dimensional space $z_1, \ldots, z_n \in \mathbb{R}^k$ which approximately preserves the inter-point (dis)similarities.
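The classical (Torgerson) variant introduced below works with inner products recovered from squared Euclidean distances by double centring: $B = -\frac{1}{2} H D^{(2)} H$ with centring matrix $H = I - \frac{1}{n}\mathbf{1}\mathbf{1}^\top$, which satisfies $b_{ij} = \langle x_i - \bar{x}, x_j - \bar{x} \rangle$. A small pure-Python check of this identity (illustrative only, not the cmdscale implementation; function names are my own):

```python
import math

def double_centre(D2):
    """B = -0.5 * H D2 H for the centring matrix H = I - (1/n) 11^T.
    For Euclidean squared distances D2, b_ij equals the inner product
    <x_i - xbar, x_j - xbar> of the centred data points."""
    n = len(D2)
    row = [sum(r) / n for r in D2]                       # row means
    col = [sum(D2[i][j] for i in range(n)) / n for j in range(n)]  # column means
    tot = sum(row) / n                                   # grand mean
    return [[-0.5 * (D2[i][j] - row[i] - col[j] + tot) for j in range(n)]
            for i in range(n)]

X = [(0.0, 0.0), (2.0, 0.0), (0.0, 1.0), (3.0, 4.0)]
n = len(X)
D2 = [[math.dist(X[i], X[j]) ** 2 for j in range(n)] for i in range(n)]
B = double_centre(D2)

# compare against the centred inner products directly
xbar = tuple(sum(c) / n for c in zip(*X))
def inner(i, j):
    return sum((X[i][k] - xbar[k]) * (X[j][k] - xbar[k]) for k in range(2))

print(all(abs(B[i][j] - inner(i, j)) < 1e-9
          for i in range(n) for j in range(n)))  # True
```

Classical MDS then takes the top eigenvectors of B to obtain the configuration Z.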
[Figure: classical MDS configuration recovered from inter-city distances for ten US cities: Atlanta, Chicago, Denver, Houston, LA, Miami, NY, SF, Seattle, DC.]
Visualisation and Dimensionality Reduction Multidimensional Scaling
Varieties of MDS

Choices of (dis)similarities and stress functions S(Z) lead to different algorithms.

Classical/Torgerson: preserves inner products instead; strain function (cmdscale):

S(Z) = \sum_{i \neq j} (b_{ij} - \langle z_i - \bar{z}, z_j - \bar{z} \rangle)^2

Metric Shepard-Kruskal: preserves distances w.r.t. squared stress:

S(Z) = \sum_{i \neq j} (d_{ij} - \|z_i - z_j\|_2)^2

Sammon: preserves shorter distances more (sammon):

S(Z) = \sum_{i \neq j} \frac{(d_{ij} - \|z_i - z_j\|_2)^2}{d_{ij}}

Non-metric Shepard-Kruskal: ignores actual distance values, only preserves ranks (isoMDS):

S(Z) = \min_{g \text{ increasing}} \frac{\sum_{i \neq j} (g(d_{ij}) - \|z_i - z_j\|_2)^2}{\sum_{i \neq j} \|z_i - z_j\|_2^2}
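The stress functions are straightforward to evaluate for a candidate configuration Z. A minimal pure-Python sketch (the course's examples use R; these function names are my own) of the metric Shepard-Kruskal and Sammon stresses:

```python
import math

def metric_stress(D, Z):
    """Metric Shepard-Kruskal stress: sum over i != j of (d_ij - ||z_i - z_j||_2)^2."""
    n = len(D)
    return sum((D[i][j] - math.dist(Z[i], Z[j])) ** 2
               for i in range(n) for j in range(n) if i != j)

def sammon_stress(D, Z):
    """Sammon stress: each squared error is divided by d_ij, so shorter
    original distances are weighted more heavily."""
    n = len(D)
    return sum((D[i][j] - math.dist(Z[i], Z[j])) ** 2 / D[i][j]
               for i in range(n) for j in range(n) if i != j)

# Toy data: three collinear points at 0, 1, 3.
X = [(0.0,), (1.0,), (3.0,)]
n = len(X)
D = [[math.dist(X[i], X[j]) for j in range(n)] for i in range(n)]

# A perfect embedding has zero stress under both criteria.
Z = [(0.0,), (1.0,), (3.0,)]
print(metric_stress(D, Z), sammon_stress(D, Z))  # 0.0 0.0
```

Minimising these over Z (by gradient descent or majorisation) gives the corresponding MDS embedding.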
Example: Language data
Presence or absence of 2867 homologous traits in 87 Indo-European languages.
> X <- read.table("http://www.stats.ox.ac.uk/~sejdinov/sdmml/data/cognate.txt")
> X[1:15,1:16]
                V1 V2 V3 V4 V5 V6 V7 V8 V9 V10 V11 V12 V13 V14 V15 V16
Irish_A          0  0  0  0  1  0  0  0  0   0   0   0   0   0   0   0
Irish_B          0  0  0  0  1  0  0  0  0   0   0   0   0   0   0   0
Welsh_N          0  0  0  1  0  0  0  0  0   0   0   0   0   0   0   0
Welsh_C          0  0  0  1  0  0  0  0  0   0   0   0   0   0   0   0
Breton_List      0  0  0  0  1  0  0  0  0   0   0   0   0   0   0   0
Breton_SE        0  0  0  0  1  0  0  0  0   0   0   0   0   0   0   0
Breton_ST        0  0  0  0  1  0  0  0  0   0   0   0   0   0   0   0
Romanian_List    0  1  0  0  0  0  0  0  0   0   0   0   0   0   0   0
Vlach            0  1  0  0  0  0  0  0  0   0   0   0   0   0   0   0
Italian          0  1  0  0  0  0  0  0  0   0   0   0   0   0   0   0
Ladin            0  1  0  0  0  0  0  0  0   0   0   0   0   0   0   0
Provencal        0  1  0  0  0  0  0  0  0   0   0   0   0   0   0   0
French           0  1  0  0  0  0  0  0  0   0   0   0   0   0   0   0
Walloon          0  1  0  0  0  0  0  0  0   0   0   0   0   0   0   0
French_Creole_C  0  1  0  0  0  0  0  0  0   0   0   0   0   0   0   0
Example: Language data

Using MDS with Sammon scaling. In contrast to classical MDS (cmdscale), which minimises the error in the squared distances, $(d_{ij}^2 - \tilde{d}_{ij}^2)^2$, Sammon puts more weight on reproducing the separation of points which are close, by forcing them apart. Projection by MDS (Jaccard dissimilarity, sammon) with cluster discovery by k-means (Jaccard):

[Figure: Sammon MDS projection of the 87 Indo-European languages, coloured by k-means cluster; the Celtic, Romance, Germanic, Baltic, Slavic, Indo-Iranian, Greek, Armenian and Albanian groups, plus Hittite and Tocharian, form visible clusters.]

There is an obvious east-to-west (top-left to bottom-right) separation of languages in the MDS, and the clusters in the MDS grouping agree with the clusters discovered by agglomerative clustering and k-means. The two clustering methods group the languages slightly differently, with k-means splitting the Germanic languages.
## Display the clusters in an MDS projection; sammon does this nicely.
## Assumes D is a dissimilarity matrix (e.g. Jaccard) on the rows of X,
## and km is a fitted kmeans object.
library(MASS)   # provides sammon() and eqscplot()
di.sam <- sammon(D, magic=0.20000002, niter=1000, tol=1e-8)
eqscplot(di.sam$points, pch=km$cluster, col=km$cluster)
text(di.sam$points, labels=row.names(X), pos=4, col=km$cluster)
Visualisation and Dimensionality Reduction Isomap
Nonlinear Dimensionality Reduction

Two aims of the different varieties of MDS: to visualise the (dis)similarities among items in a dataset, where these (dis)similarities may not have a Euclidean geometric interpretation; and to perform nonlinear dimensionality reduction.

Many high-dimensional datasets exhibit low-dimensional structure ("live on a low-dimensional manifold").
Isomap

Isomap is a nonlinear dimensionality reduction technique based on classical MDS. It differs from other varieties of MDS in that it uses estimates of the geodesic distances between the data points.
converts distances to inner products (17), which uniquely characterize the geometry of the data in a form that supports efficient optimization. The global minimum of Eq. 1 is achieved by setting the coordinates y_i to the top d eigenvectors of the matrix τ(D_G) (13).

As with PCA or MDS, the true dimensionality of the data can be estimated from the decrease in error as the dimensionality of Y is increased. For the Swiss roll, where classical methods fail, the residual variance of Isomap correctly bottoms out at d = 2 (Fig. 2B).

Just as PCA and MDS are guaranteed, given sufficient data, to recover the true structure of linear manifolds, Isomap is guaranteed asymptotically to recover the true dimensionality and geometric structure of a strictly larger class of nonlinear manifolds. Like the Swiss roll, these are manifolds whose intrinsic geometry is that of a convex region of Euclidean space, but whose ambient geometry in the high-dimensional input space may be highly folded, twisted, or curved. For non-Euclidean manifolds, such as a hemisphere or the surface of a doughnut, Isomap still produces a globally optimal low-dimensional Euclidean representation, as measured by Eq. 1.

These guarantees of asymptotic convergence rest on a proof that as the number of data points increases, the graph distances d_G(i, j) provide increasingly better approximations to the intrinsic geodesic distances d_M(i, j), becoming arbitrarily accurate in the limit of infinite data (18, 19). How quickly d_G(i, j) converges to d_M(i, j) depends on certain parameters of the manifold as it lies within the high-dimensional space (radius of curvature and branch separation) and on the density of points. To the extent that a data set presents extreme values of these parameters or deviates from a uniform density, asymptotic convergence still holds in general, but the sample size required to estimate geodesic distance accurately may be impractically large.

Isomap's global coordinates provide a simple way to analyze and manipulate high-dimensional observations in terms of their intrinsic nonlinear degrees of freedom. For a set of synthetic face images, known to have three degrees of freedom, Isomap correctly detects the dimensionality (Fig. 2A) and separates out the true underlying factors (Fig. 1A). The algorithm also recovers the known low-dimensional structure of a set of noisy real images, generated by a human hand varying in finger extension and wrist rotation (Fig. 2C) (20). Given a more complex data set of handwritten digits, which does not have a clear manifold geometry, Isomap still finds globally meaningful coordinates (Fig. 1B) and nonlinear structure that PCA or MDS do not detect (Fig. 2D). For all three data sets, the natural appearance of linear interpolations between distant points in the low-dimensional coordinate space confirms that Isomap has captured the data's perceptually relevant structure (Fig. 4).

Previous attempts to extend PCA and MDS to nonlinear data sets fall into two broad classes, each of which suffers from limitations overcome by our approach. Local linear techniques (21-23) are not designed to represent the global structure of a data set within a single coordinate system, as we do in Fig. 1. Nonlinear techniques based on greedy optimization procedures (24-30) attempt to discover global structure, but lack the crucial algorithmic features that Isomap inherits from PCA and MDS: a noniterative, polynomial time procedure with a guarantee of global optimality; for intrinsically Euclidean man-

Fig. 2. The residual variance of PCA (open triangles), MDS [open triangles in (A) through (C); open circles in (D)], and Isomap (filled circles) on four data sets (42). (A) Face images varying in pose and illumination (Fig. 1A). (B) Swiss roll data (Fig. 3). (C) Hand images varying in finger extension and wrist rotation (20). (D) Handwritten "2"s (Fig. 1B). In all cases, residual variance decreases as the dimensionality d is increased. The intrinsic dimensionality of the data can be estimated by looking for the "elbow" at which this curve ceases to decrease significantly with added dimensions. Arrows mark the true or approximate dimensionality, when known. Note the tendency of PCA and MDS to overestimate the dimensionality, in contrast to Isomap.

Fig. 3. The "Swiss roll" data set, illustrating how Isomap exploits geodesic paths for nonlinear dimensionality reduction. (A) For two arbitrary points (circled) on a nonlinear manifold, their Euclidean distance in the high-dimensional input space (length of dashed line) may not accurately reflect their intrinsic similarity, as measured by geodesic distance along the low-dimensional manifold (length of solid curve). (B) The neighborhood graph G constructed in step one of Isomap (with K = 7 and N = 1000 data points) allows an approximation (red segments) to the true geodesic path to be computed efficiently in step two, as the shortest path in G. (C) The two-dimensional embedding recovered by Isomap in step three, which best preserves the shortest path distances in the neighborhood graph (overlaid). Straight lines in the embedding (blue) now represent simpler and cleaner approximations to the true geodesic paths than do the corresponding graph paths (red).

Reports, Science, Vol. 290, 22 December 2000, p. 2321.
Tenenbaum et al. (2000)
Visualisation and Dimensionality Reduction Isomap
Isomap

1 Calculate Euclidean distances d_ij for i, j = 1, ..., n between all data points.
2 Form a graph G with the n samples as nodes, and edges between the respective K nearest neighbours (K-Isomap) or between i and j if d_ij < ε (ε-Isomap).
3 For i, j linked by an edge, set d^G_ij = d_ij. Otherwise, set d^G_ij to the shortest-path distance between i and j in G.
4 Run classical MDS using the distances d^G_ij.
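The first three steps above can be sketched directly. The following pure-Python fragment (illustrative only, not the isomap{vegan} implementation; names are my own) builds the ε-neighbourhood graph and computes the graph distances d^G_ij with Floyd-Warshall:

```python
import math

def isomap_graph_distances(X, eps):
    """Steps 1-3 of eps-Isomap: Euclidean distances, eps-neighbourhood graph,
    then shortest-path (estimated geodesic) distances via Floyd-Warshall."""
    n = len(X)
    d = [[math.dist(X[i], X[j]) for j in range(n)] for i in range(n)]
    # keep only edges with d_ij < eps; non-neighbours start at infinity
    dG = [[d[i][j] if (i == j or d[i][j] < eps) else math.inf for j in range(n)]
          for i in range(n)]
    # Floyd-Warshall: dG[i][j] becomes the shortest-path distance in G
    for k in range(n):
        for i in range(n):
            for j in range(n):
                if dG[i][k] + dG[k][j] < dG[i][j]:
                    dG[i][j] = dG[i][k] + dG[k][j]
    return dG

# Four points on a quarter circle: the Euclidean chord between the endpoints
# cuts across, but the graph distance follows the curve through the neighbours.
pts = [(math.cos(t), math.sin(t)) for t in (0.0, 0.5, 1.0, 1.5)]
dG = isomap_graph_distances(pts, eps=0.6)
chord = math.dist(pts[0], pts[3])
print(dG[0][3] > chord)  # True: geodesic estimate exceeds the straight-line chord
```

Step 4 would then feed dG into classical MDS to obtain the embedding.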
R function: isomap{vegan}.
Handwritten Characters
[...]tion to geodesic distance. For faraway points, geodesic distance can be approximated by adding up a sequence of "short hops" between neighboring points. These approximations are computed efficiently by finding shortest paths in a graph with edges connecting neighboring data points.

The complete isometric feature mapping, or Isomap, algorithm has three steps, which are detailed in Table 1. The first step determines which points are neighbors on the manifold M, based on the distances d_X(i, j) between pairs of points i, j in the input space X. Two simple methods are to connect each point to all points within some fixed radius ε, or to all of its K nearest neighbors (15). These neighborhood relations are represented as a weighted graph G over the data points, with edges of weight d_X(i, j) between neighboring points (Fig. 3B).

In its second step, Isomap estimates the geodesic distances d_M(i, j) between all pairs of points on the manifold M by computing their shortest path distances d_G(i, j) in the graph G. One simple algorithm (16) for finding shortest paths is given in Table 1.

The final step applies classical MDS to the matrix of graph distances D_G = {d_G(i, j)}, constructing an embedding of the data in a d-dimensional Euclidean space Y that best preserves the manifold's estimated intrinsic geometry (Fig. 3C). The coordinate vectors y_i for points in Y are chosen to minimize the cost function

E = ‖τ(D_G) − τ(D_Y)‖_{L²}    (1)

where D_Y denotes the matrix of Euclidean distances {d_Y(i, j) = ‖y_i − y_j‖} and ‖A‖_{L²} the L² matrix norm √(Σ_{i,j} A_ij²). The τ operator

Fig. 1. (A) A canonical dimensionality reduction problem from visual perception. The input consists of a sequence of 4096-dimensional vectors, representing the brightness values of 64 pixel by 64 pixel images of a face rendered with different poses and lighting directions. Applied to N = 698 raw images, Isomap (K = 6) learns a three-dimensional embedding of the data's intrinsic geometric structure. A two-dimensional projection is shown, with a sample of the original input images (red circles) superimposed on all the data points (blue) and horizontal sliders (under the images) representing the third dimension. Each coordinate axis of the embedding correlates highly with one degree of freedom underlying the original data: left-right pose (x axis, R = 0.99), up-down pose (y axis, R = 0.90), and lighting direction (slider position, R = 0.92). The input-space distances d_X(i, j) given to Isomap were Euclidean distances between the 4096-dimensional image vectors. (B) Isomap applied to N = 1000 handwritten "2"s from the MNIST database (40). The two most significant dimensions in the Isomap embedding, shown here, articulate the major features of the "2": bottom loop (x axis) and top arch (y axis). Input-space distances d_X(i, j) were measured by tangent distance, a metric designed to capture the invariances relevant in handwriting recognition (41). Here we used ε-Isomap (with ε = 4.2) because we did not expect a constant dimensionality to hold over the whole data set; consistent with this, Isomap finds several tendrils projecting from the higher dimensional mass of data and representing successive exaggerations of an extra stroke or ornament in the digit.

Tenenbaum et al. (2000)
Faces
[Figure: Isomap embedding of the face images from Fig. 1A of Tenenbaum et al. (2000).]
Nonlinear Dimensionality Reduction Techniques
Kernel PCA
Locally Linear Embedding
Laplacian Eigenmaps
Maximum Variance Unfolding
Clustering
Many datasets consist of multiple heterogeneous subsets.

Cluster analysis: given unlabelled data, we want algorithms that automatically group the data points into coherent subsets/clusters. Examples:

market segmentation of shoppers based on browsing and purchase histories
different types of breast cancer based on gene expression measurements
discovering communities in social networks
image segmentation
Types of Clustering
Model-based clustering: each cluster is described using a probability model.

Model-free clustering: defined by similarity/dissimilarity among instances within clusters.
This Lecture: Two “Model-free” Clustering Methods
K-means clustering: a partition-based method into K clusters. It finds groups such that the variation within each group is small. The number of clusters K is usually fixed beforehand, or various values of K are investigated as part of the analysis.

Hierarchical clustering: nearby data items are joined into clusters, then clusters into super-clusters, forming a hierarchy. Typically the hierarchy forms a binary tree (a dendrogram) where each cluster has two "children" clusters. The dendrogram allows us to view the clusterings for each possible number of clusters, from 1 to n (the number of data items).
Clustering K-means
K-means

Partition-based methods seek to divide the data points into a pre-assigned number of clusters C_1, ..., C_K, where for all k, k' ∈ {1, ..., K},

C_k \subset \{1, \ldots, n\}, \quad C_k \cap C_{k'} = \emptyset \ \ \forall k \neq k', \quad \bigcup_{k=1}^K C_k = \{1, \ldots, n\}.
For each cluster, represent it using a prototype or cluster centroid μ_k. We can measure the quality of a cluster by its within-cluster deviance

W(C_k, \mu_k) = \sum_{i \in C_k} \|x_i - \mu_k\|_2^2.

The overall quality of the clustering is given by the total within-cluster deviance:

W = \sum_{k=1}^K W(C_k, \mu_k).

The overall objective is to choose both the cluster centroids and the allocation of points so as to minimise this objective function.
K-means

W = \sum_{k=1}^K \sum_{i \in C_k} \|x_i - \mu_k\|_2^2 = \sum_{i=1}^n \|x_i - \mu_{c_i}\|_2^2

where c_i = k if and only if i ∈ C_k.

Given a partition {C_k}, we can find the optimal prototypes easily by differentiating W with respect to μ_k and setting the derivative to zero:

\frac{\partial W}{\partial \mu_k} = -2 \sum_{i \in C_k} (x_i - \mu_k) = 0 \quad \Rightarrow \quad \mu_k = \frac{1}{|C_k|} \sum_{i \in C_k} x_i

Given the prototypes, we can easily find the optimal partition by assigning each data point to the closest cluster prototype:

c_i = \operatorname{argmin}_k \|x_i - \mu_k\|_2^2

But joint minimisation over both is computationally difficult.
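The centroid condition can be checked numerically. A small pure-Python experiment (the course uses R; the helper name below is my own) confirming that the cluster mean minimises the within-cluster deviance:

```python
import random

def deviance(points, mu):
    """Within-cluster deviance W(C_k, mu_k) = sum over the cluster of ||x_i - mu_k||_2^2."""
    return sum(sum((xi - mi) ** 2 for xi, mi in zip(x, mu)) for x in points)

random.seed(0)
cluster = [(random.gauss(0, 1), random.gauss(0, 1)) for _ in range(50)]

# Setting the gradient to zero gives mu_k = mean of the cluster:
mu = tuple(sum(coord) / len(cluster) for coord in zip(*cluster))

# Any other prototype gives at least as large a deviance.
trials = [(mu[0] + random.uniform(-1, 1), mu[1] + random.uniform(-1, 1))
          for _ in range(200)]
print(all(deviance(cluster, mu) <= deviance(cluster, other)
          for other in trials))  # True
```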
K-means

The K-means algorithm is a widely used method that returns a local optimum of the objective function W, using iterative and alternating minimisation.

1 Randomly initialise K cluster centroids μ_1, ..., μ_K.
2 Cluster assignment: for each i = 1, ..., n, assign x_i to the cluster with the nearest centroid, c_i := argmin_k ‖x_i − μ_k‖₂². Set C_k := {i : c_i = k} for each k.
3 Move centroids: set μ_1, ..., μ_K to the averages of the new clusters, μ_k := (1/|C_k|) Σ_{i∈C_k} x_i.
4 Repeat steps 2-3 until convergence.
5 Return the partition {C_1, ..., C_K} and the means μ_1, ..., μ_K.
K-means

The algorithm stops in a finite number of iterations: between steps 2 and 3, W either stays constant or decreases, which implies that we never revisit the same partition; as there are only finitely many partitions, the number of iterations cannot exceed their number.

The K-means algorithm need not converge to the global optimum. K-means is a heuristic search algorithm, so it can get stuck at suboptimal configurations, and the result depends on the starting configuration. Typically one performs a number of runs from different starting configurations and picks the end result with minimum W.
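The alternating scheme with random restarts can be sketched compactly. A minimal pure-Python version of Lloyd's algorithm (the course uses R's kmeans; all names below are my own):

```python
import math
import random

def kmeans(X, K, iters=100):
    """One run of the K-means algorithm: alternate cluster assignment and
    centroid moves. Returns (assignments, centroids, total deviance W)."""
    mus = random.sample(X, K)              # step 1: random initial centroids
    for _ in range(iters):
        # step 2: assign each point to the nearest centroid
        c = [min(range(K), key=lambda k: math.dist(x, mus[k])) for x in X]
        # step 3: move each centroid to the mean of its cluster
        new = []
        for k in range(K):
            pts = [x for x, ck in zip(X, c) if ck == k]
            new.append(tuple(sum(v) / len(pts) for v in zip(*pts)) if pts else mus[k])
        if new == mus:                     # converged: the partition cannot change
            break
        mus = new
    c = [min(range(K), key=lambda k: math.dist(x, mus[k])) for x in X]
    W = sum(math.dist(x, mus[ck]) ** 2 for x, ck in zip(X, c))
    return c, mus, W

random.seed(1)
# two well-separated Gaussian blobs in the plane
X = [(random.gauss(0, 0.2), random.gauss(0, 0.2)) for _ in range(30)] + \
    [(random.gauss(3, 0.2), random.gauss(3, 0.2)) for _ in range(30)]
# several runs from different starting configurations; keep the minimum-W result
c, mus, W = min((kmeans(X, 2) for _ in range(10)), key=lambda r: r[2])
print(round(W, 2))
```

In R the restarts correspond to the nstart argument of kmeans.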
[Figure: three K-means runs on the same two-dimensional data from different starting configurations, reaching local optima with W = 9.184, W = 3.418 and W = 9.264.]
K-means on Crabs
Looking at the Crabs data again.
library(MASS)
library(lattice)
data(crabs)

splom(~log(crabs[,4:8]), pch=as.numeric(crabs[,2]), col=as.numeric(crabs[,1]),
      main="circle/triangle is gender, black/red is species")
K-means on Crabs

[Figure: scatter plot matrix (splom) of the log-transformed crabs measurements FL, RW, CL, CW and BD; circle/triangle is gender, black/red is species.]
K-means on Crabs
Apply K-means with 2 clusters and plot results.
Crabs.kmeans <- kmeans(log(crabs[,4:8]), 2, nstart=1, iter.max=10)

splom(~log(crabs[,4:8]), col=Crabs.kmeans$cluster+2,
      main="blue/green is cluster; finds big/small")
K-means on Crabsblue/green is cluster finds big/small
[Figure: scatter plot matrix of log(crabs[,4:8]) (FL, RW, CL, CW, BD), coloured by the two K-means clusters, which separate big from small crabs]
Clustering K-means
K-means on Crabs
‘Whiten’ or ‘sphere’1 the data using PCA.
pcp <- princomp(log(crabs[,4:8]))
Crabs.sphered <- pcp$scores %*% diag(1/pcp$sdev)
splom(~Crabs.sphered[,1:3],
      col=as.numeric(crabs[,1]), pch=as.numeric(crabs[,2]),
      main="circle/triangle is gender, black/red is species")
And apply K-means again.
Crabs.kmeans <- kmeans(Crabs.sphered, 2, nstart=1, iter.max=20)
splom(~Crabs.sphered[,1:3],
      col=Crabs.kmeans$cluster+2, main="blue/green is cluster")
¹ Apply a linear transformation so that the covariance matrix is the identity.
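The whitening step in the R code above can be sketched outside R as well; the following is a minimal NumPy illustration (on made-up toy data, not the crabs measurements): project onto principal components via an eigendecomposition of the sample covariance, then rescale each score by its standard deviation so the result has identity covariance.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5)) @ rng.normal(size=(5, 5))  # correlated toy data

Xc = X - X.mean(axis=0)                 # centre the data
cov = Xc.T @ Xc / (len(X) - 1)          # sample covariance
eigval, eigvec = np.linalg.eigh(cov)    # PCA via eigendecomposition
scores = Xc @ eigvec                    # principal component scores
whitened = scores / np.sqrt(eigval)     # divide each score by its sd

# the whitened data now has (approximately) identity covariance
assert np.allclose(np.cov(whitened, rowvar=False), np.eye(5), atol=1e-8)
```

This is exactly what `pcp$scores %*% diag(1/pcp$sdev)` does in the R snippet, up to `princomp`'s divisor convention.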
Clustering K-means
K-means on Crabs
circle/triangle is gender, black/red is species
[Figure: scatter plot matrix of the first three sphered components V1, V2, V3; circle/triangle marks gender, black/red marks species]
blue/green is cluster
[Figure: the same scatter plot matrix of V1, V2, V3, coloured by the two K-means clusters (blue/green)]
Discovers the gender difference... but the result depends crucially on sphering the data first!
Clustering K-means
K-means on Crabs with K = 4
circle/triangle is gender, black/red is species
[Figure: scatter plot matrix of V1, V2, V3 with gender (circle/triangle) and species (black/red) markers]
colors are clusters
[Figure: the same scatter plot matrix with points coloured by the four K-means clusters]
> table(Crabs.kmeans$cluster, Crabs.class)
   Crabs.class
    BF BM OF OM
  1  3  0 41  0
  2 39  8  6  0
  3  8 42  0  0
  4  0  0  3 50
Clustering K-means
K-means Additional Comments
Good practice initialization. Randomly pick K training examples (without replacement) and set µ1, µ2, . . . , µK equal to those examples.
Sensitivity to distance measure. Euclidean distance can be greatly affected by measurement unit and by strong correlations. Can use Mahalanobis distance instead:

‖x − y‖_M = √( (x − y)ᵀ M⁻¹ (x − y) )

where M is a positive semi-definite matrix, e.g. the sample covariance.
Determination of K. The K-means objective will always improve with a larger number of clusters K. Determination of K requires an additional regularization criterion. E.g., in DP-means², use

W = Σ_{k=1}^{K} Σ_{i∈C_k} ‖x_i − µ_k‖₂² + λK
² DP-means paper.
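The comments above can be put together in a minimal sketch (NumPy, not the lecture's R code): Lloyd's K-means with the good-practice initialisation — centroids drawn from the data without replacement — plus the penalised objective W + λK from the DP-means discussion. The toy data and the value of λ are illustrative assumptions.

```python
import numpy as np

def kmeans(X, K, iters=50, seed=0):
    rng = np.random.default_rng(seed)
    # good-practice init: K distinct training examples, no replacement
    mu = X[rng.choice(len(X), size=K, replace=False)]
    for _ in range(iters):
        d = ((X[:, None, :] - mu[None, :, :]) ** 2).sum(-1)
        z = d.argmin(axis=1)                      # assign to nearest centroid
        mu = np.array([X[z == k].mean(axis=0) if np.any(z == k) else mu[k]
                       for k in range(K)])        # update (guard empty clusters)
    return z, mu

def penalised_objective(X, z, mu, lam):
    W = ((X - mu[z]) ** 2).sum()   # within-cluster sum of squares
    return W + lam * len(mu)       # regularise with lambda * K

rng = np.random.default_rng(1)
X = np.vstack([rng.normal(c, 0.1, size=(50, 2)) for c in ((0, 0), (3, 3))])
z, mu = kmeans(X, K=2)
print(penalised_objective(X, z, mu, lam=1.0))
```

Evaluating `penalised_objective` across several K gives a principled way to compare cluster counts, since W alone always decreases with K.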
Clustering K-means
Other partition based methods
Other partition-based methods with related ideas:
K-medoids³: requires cluster centroids µi to be an observation xj
K-medians: cluster centroids represented by a median in each dimension
K-modes: cluster centroids represented by a mode estimated from a cluster
³ See also Affinity propagation.
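To make the K-medoids distinction concrete, here is a tiny NumPy sketch of the medoid of a single cluster — the observation minimising total dissimilarity to the others. This illustrates only the centroid-update idea, not the full PAM algorithm.

```python
import numpy as np

def medoid(cluster):
    # pairwise Euclidean distances within the cluster
    d = np.sqrt(((cluster[:, None, :] - cluster[None, :, :]) ** 2).sum(-1))
    # the medoid is the observation with smallest total distance to the rest
    return cluster[d.sum(axis=1).argmin()]

pts = np.array([[0.0, 0.0], [1.0, 0.0], [2.0, 0.0], [3.0, 0.0], [100.0, 0.0]])
print(medoid(pts))  # unlike a mean, this is always an actual observation
```

Note how the outlier at (100, 0) would drag a mean far from the data, while the medoid stays at a real point, (2, 0).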
Clustering Hierarchical Clustering
Hierarchical Clustering
Hierarchically structured data is ubiquitous (genus, species, subspecies, individuals...).
There are two general strategies for generating hierarchical clusters. Both proceed by seeking to minimize some measure of overall dissimilarity:
Agglomerative / Bottom-Up / Merging
Divisive / Top-Down / Splitting
Higher level clusters are created by merging clusters at lower levels. This process can easily be viewed as a tree/dendrogram.
Avoids specifying how many clusters are appropriate.
hclust, agnes{cluster}
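The agglomerative (bottom-up) strategy can be sketched naively in NumPy: start from singleton clusters and repeatedly merge the pair with the smallest complete-linkage dissimilarity, recording each merge height. This is only an O(n³)-ish illustration; hclust and agnes do the same thing far more efficiently.

```python
import numpy as np

def agglomerate(X):
    D = np.sqrt(((X[:, None, :] - X[None, :, :]) ** 2).sum(-1))
    clusters = [[i] for i in range(len(X))]
    heights = []
    while len(clusters) > 1:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                # complete linkage: max pairwise distance between clusters
                h = max(D[i, j] for i in clusters[a] for j in clusters[b])
                if best is None or h < best[0]:
                    best = (h, a, b)
        h, a, b = best
        heights.append(h)                     # height of this merge
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]
    return heights  # monotone for complete linkage (not for every linkage)

X = np.array([[0.0], [0.1], [1.0], [1.2], [10.0]])
print(agglomerate(X))
```

The recorded heights are exactly the levels at which the dendrogram branches join.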
Clustering Hierarchical Clustering
EU Indicators Data
6 Economic indicators for EU countries in 2011.
> eu <- read.csv('http://www.stats.ox.ac.uk/~sejdinov/sdmml/data/eu_indicators.csv', sep=' ')
> eu[1:15,]
   Countries  abbr    CPI   UNE    INP     BOP     PRC UN_perc
1  Belgium    BE   116.03  4.77 125.59   908.6  6716.5    -1.6
2  Bulgaria   BG   141.20  7.31 102.39    27.8  1094.7     3.5
3  CzechRep.  CZ   116.20  4.88 119.01  -277.9  2616.4    -0.6
4  Denmark    DK   114.20  6.03  88.20  1156.4  7992.4     0.5
5  Germany    DE   111.60  4.63 111.30   499.4  6774.6    -1.3
6  Estonia    EE   135.08  9.71 111.50   153.4  2194.1    -7.7
7  Ireland    IE   106.80 10.20 111.20  -166.5  6525.1     2.0
8  Greece     EL   122.83 11.30  78.22  -764.1  5620.1     6.4
9  Spain      ES   116.97 15.79  83.44  -280.8  4955.8     0.7
10 France     FR   111.55  6.77  92.60  -337.1  6828.5    -0.9
11 Italy      IT   115.00  5.05  87.80  -366.2  5996.6    -0.5
12 Cyprus     CY   116.44  5.14  86.91 -1090.6  5310.3    -0.4
13 Latvia     LV   144.47 12.11 110.39    42.3  1968.3    -3.6
14 Lithuania  LT   135.08 11.47 114.50   -77.4  2130.6    -4.3
15 Luxembourg LU   118.19  3.14  85.51  2016.5 10051.6    -3.0
Data from Greenacre (2012)
Clustering Hierarchical Clustering
EU Indicators Data
dat <- scale(eu[,3:8])
rownames(dat) <- eu$Countries
biplot(princomp(dat))
[Figure: PCA biplot of the scaled EU indicators (CPI, UNE, INP, BOP, PRC, UN_perc); countries labelled by name]
Clustering Hierarchical Clustering
Visualising Hierarchical Clustering
> hc <- hclust(dist(dat))
> plot(hc, hang=-1)
[Figure: cluster dendrogram of the EU countries, hclust (*, "complete") on dist(dat), height on the vertical axis]
> library(ape)
> plot(as.phylo(hc), type = "fan")
[Figure: the same dendrogram drawn as a fan with the ape package, countries around the circumference]
Clustering Hierarchical Clustering
Visualising Hierarchical Clustering
Levels in the dendrogram represent a dissimilarity between examples.
Tree dissimilarity d^T_ij = minimum height in the tree at which examples i and j belong to the same cluster.
Ultrametric (stronger than triangle) inequality:

d^T_ij ≤ max{ d^T_ik, d^T_kj }.

Hierarchical clustering can be interpreted as an approximation of a given dissimilarity d_ij by an ultrametric dissimilarity.
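As a quick numerical illustration (a hand-built example, not derived from the datasets above): suppose examples 1 and 2 merge at height 1 and example 3 joins them at height 4. The resulting tree dissimilarity is ultrametric, which the following checks exhaustively:

```python
import numpy as np

# tree dissimilarity: {1,2} merge at height 1, {3} joins at height 4,
# so d12 = 1 and d13 = d23 = 4
dT = np.array([[0.0, 1.0, 4.0],
               [1.0, 0.0, 4.0],
               [4.0, 4.0, 0.0]])

for i in range(3):
    for j in range(3):
        for k in range(3):
            # ultrametric inequality, stronger than the triangle inequality
            assert dT[i, j] <= max(dT[i, k], dT[k, j])
print("ultrametric OK")
```

Intuitively, of any three tree dissimilarities among {i, j, k}, the two largest are always equal — here d13 = d23 = 4.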
Clustering Hierarchical Clustering
Measuring Dissimilarity Between Clusters
To join clusters Ci and Cj into super-clusters, we need a way to measure the dissimilarity D(Ci, Cj) between them.
[Figure: five points x1, . . . , x5 with clusters {x1, x2} and {x3, x4, x5}, illustrating
single linkage: d(x1, x3)
complete linkage: d(x2, x5)
average linkage: (1/6) Σ_{i=1}^{2} Σ_{j=3}^{5} d(x_i, x_j)]
Clustering Hierarchical Clustering
Measuring Dissimilarity Between Clusters
To join clusters Ci and Cj into super-clusters, we need a way to measure the dissimilarity D(Ci, Cj) between them.
(a) Single Linkage: elongated, loosely connected clusters

D(Ci, Cj) = min_{x,y} ( d(x, y) | x ∈ Ci, y ∈ Cj )

(b) Complete Linkage: compact clusters, relatively similar objects can remain separated at high levels

D(Ci, Cj) = max_{x,y} ( d(x, y) | x ∈ Ci, y ∈ Cj )

(c) Average Linkage: tries to balance the two above, but affected by the scale of dissimilarities

D(Ci, Cj) = avg_{x,y} ( d(x, y) | x ∈ Ci, y ∈ Cj )
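The three definitions reduce to one-liners over a pairwise distance matrix; a NumPy sketch with made-up points, where Ci and Cj are index sets into the data:

```python
import numpy as np

def linkages(X, Ci, Cj):
    # all pairwise Euclidean distances between the two clusters
    d = np.sqrt(((X[Ci][:, None, :] - X[Cj][None, :, :]) ** 2).sum(-1))
    return d.min(), d.max(), d.mean()  # single, complete, average

X = np.array([[0.0, 0.0], [0.0, 1.0], [3.0, 0.0], [4.0, 0.0]])
single, complete, average = linkages(X, [0, 1], [2, 3])
print(single, complete, average)
```

On these points the single linkage is 3 (the closest cross-cluster pair), while complete linkage is √17, the distance between the farthest pair.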
Clustering Hierarchical Clustering
Hierarchical Clustering on Artificial Dataset
[Figure: scatter plot of the artificial dataset (V1 vs V2); each of the 3000 points is labelled by its row index]
Clustering Hierarchical Clustering
Hierarchical Clustering on Artificial Dataset
dat = xclara  # 3000 x 2
library(cluster)

# plot the data
plot(dat, type="n")
text(dat, labels=row.names(dat))

plot(agnes(dat, method="single"))
plot(agnes(dat, method="complete"))
plot(agnes(dat, method="average"))
Clustering Hierarchical Clustering
Hierarchical Clustering on Artificial Dataset
[Figure: agnes dendrograms for the artificial dataset under single, complete and average linkage]
1751
1572 11
8011
0218
1115
4112
3415
05 115
717
5792
813
7619
0210
8617
6020
2219
4512
0312
0796
210
1415
3113
26 1399
1306
1969 1
478
971
1049
1133
1458
1637
1100
1275
2010 14
7620
3312
6414
4913
1691
311
6212
6017
0717
7893
294
9 101
517
6519
9894
114
3817
7915
1518
0418
7811
4912
05 1219
1773 9
0713
3115
4418
1811
9819
8311
6517
8315
1919
9018
6212
3213
6014
1616
7420
03 977
1623
1197
1555
1873
1332
1659
1717
1385
1715
1514
1153
1508
1676
1181
1532
1545
1171
1736 17
6616
6219
11 152
393
813
33 102
512
8418
5213
4417
24 131
818
9619
2917
7620
1514
1314
1999
510
3719
81 904
1871
988
1046
1128
1253 16
4710
8514
2499
417
47 1713
1159
1697
1716
1912
1397
1407 19
8218
5610
1111
4213
1212
7911
5214
0618
98 982
1114
1597
1432
1211
1654
1084
1118
1411
1113
1454 10
2811
0318
6614
67 152
812
2216
81 164
211
3814
15 1256
1314
1501
1885 15
58 1704
916
1457
1466
1108
1216
1979
1345
1952 11
4111
47 181
591
517
20 1005
1845
2025 13
5319
9692
415
11 1602
1959
1328
1712
1136
1938
1956
1484
970
1093 19
8512
0612
1418
7610
2616
7713
00 1679
1178
1370
1186
1237
1381 13
7912
5516
1218
3712
9514
99 1282
1901
1068
1259
1274
1758 93
098
518
8496
720
1110
87 1746
1913
1140
1231
1245
1375
1403 17
4319
61 1488
1212
1441
1643
1888 16
2620
32 188
299
116
1619
21 103
911
5611
66 1795
1947
1185
1242 16
3314
8614
8515
3716
38 986
2002 1
726
1167
1266
1869
1989
1699
1805
1323
1391 1
179
1240
912
1960 11
32 948
1781
1444
1687
1927 19
0615
0320
08 1551
1308
1479
1448
1510
987
1439
2045
1215
1598 1
304
1673
956
978
1335
1513
1824
1729
1060
1971
1730
1204 1
003
1257
1793
1987
1106
1182
1690
1789 20
30 1518 10
2310
9217
0214
4514
7115
2496
110
6413
2119
70 154
014
9020
1293
115
6317
8893
514
3711
9513
3619
95 1030
1742
1875 13
5210
0411
4613
6192
212
5412
33 1955 13
4814
3415
9110
1312
61 1628
1939 1
506
1322
1918
1378
2024
1130
1398
1834
1863
1859 14
5594
011
3513
6514
0291
192
918
46 1754
1210
1358
1785 15
3418
08 1973
1243
1727
2047
1027
1619 19
4813
9319
97 1075
1814
1079
1943 15
47 1905
1112
1953 1
252
1350
1750 14
9618
19 130
113
9613
9416
1717
3815
2117
03 159
614
7311
8816
6020
09 1065
1903 16
8010
4418
3012
7012
9616
1392
710
9019
6617
0117
8618
8320
3618
9111
9218
41 969
1721
1763
966
1276
1770
903
1117
1688
1796 95
215
4310
4110
8217
56 162
918
3615
49 1535
1803
1857
1072
1139
1317
1062
1639
1115
1155 18
8112
2813
4316
1120
1319
4219
86 960
1560
972
1487 1
614
1177
964
1150
1168 10
4013
34 1357
1761 12
8318
72 2016
1292
908
1349
1246
1307 1
584
1895
1914
947
1752
1423
1810
1732
1829 16
7012
6914
21 193
495
117
9016
89 979
1609
1526
1698 10
9711
3114
4312
5812
91 204
818
6111
6015
53 202
191
711
6310
9914
9710
2210
7710
3518
51 1111 1
954
1470
1653
990
1213
1706
1217
1641
1656
1018
1605
1395
2043
1854 90
920
46 1305
1417
1412
1693
1993
1539
1608
1193
1354
992
1071
1286
1290
1794
1817
2044
1739
1705
1031
1073
2039
1191
1461 14
2714
4616
8611
6116
9211
1011
8414
52 1347 9
7310
5217
9720
42 1556 12
9712
6515
4216
4020
0519
3320
2616
2712
7713
92 177
112
4410
8013
5618
1214
9218
38 1695
1570
1657
1418
1977
1091
1236
1568 9
8011
1916
2218
80 178
712
9414
3516
3110
4312
7117
18 1380
1089
1926
1121
1250 9
5818
8714
0915
9414
2914
4019
62 142
812
2115
02 100
012
1816
7210
1616
00 153
818
4219
68 1920
1684
946
1886
1017
1733
918
1363
1792
1799
1950
1980
1063 1507 14
6917
2214
7515
3615
7416
7120
1790
295
711
9613
37 1491
1582
1894 17
1419
2819
8820
2713
1916
67 201
999
310
5115
6620
38 123
810
1915
20 1554
1900
1158
1164
1447
1709
1410
2006
1404
1450 9
2317
35 2018 1
145
939
1209
2041 12
5112
88 943
1154 16
65 999
1033
1741 14
6013
1013
77 200
4 1666
1889 1
169
1414
1034
1433
1865
1924
1187
1504 13
5195
411
2415
9394
211
9012
7812
2914
9511
4814
3119
92 1749 12
0217
4518
53 158
813
8710
09 945
921
1915
906
1001 1
576
1386
1477
1056
1480
1069
905
1645
1273
1775
1820
1481 14
0810
10 968
1565
1571
1719
965
1646
1021
1426
1964
2014 1
910
1517
955
1500
2028
1320
1324
914
1768 20
3410
6715
99 1045
1755
1116
1048
1843
1744
1823
1821
1235
1200
1224
910
1053
1801
1844
1083
1849
1675
1199
1595
1678
1327
1220
1668
1774 13
2518
99 1194
1516
1607
1364
1175
1032
1371
1020
1366
1590
1753
1807
1850
1975
1137
1589
1648 1
134
953
1800
1723
1081
1644
925
1601 1
227 15
5714
0120
0019
51 1618
1442
1772
1338
327
1430 11
2012
8919
2270
810
3816
1012
0113
7312
4917
2815
2519
3024
142
534 66
3 61
829
115
998
1685
2051
2507
2948
2055
2120
2938
2825 28
6021
6929
2521
4924
0722
1320
6627
03 2281
2629
2465
2078
2284
2115
2697
2139
2602
2521
2935
2649
2414
2964
2788
2832
2614
2737 24
0929
5420
9726
5826
6921
2327
6922
6724
0025
7027
8623
0523
08 2930
2529 21
3324
8926
72 2209
2536
2896 2
464
2706
2806
2122
2440
2720
2778
2592
2885 26
1527
2128
1827
5128
7829
73 2709
2620
2888 2
837
2869
2071
2383
2617 28
1523
2823
48 2513
2965
2998 24
7625
9721
4227
8120
8420
9825
46 2870 20
9023
2924
85 2311
2351
2096
2300
2811 2
856
2201
2334 22
4524
7128
0922
3522
7724
8727
53 2566 24
0824
1329
7628
3629
28 2429
2095
2549 22
6526
7728
9823
4923
74 2864
2855 24
10 2219
2262
2876
2873
2315
2812 2
681
2992
2415
2382
2369
2535
2653
2579
2852
2883 24
2724
0124
9827
5524
6226
3924
9125
9423
4326
35 2365
2622 26
8624
9329
6922
7629
29 242
8 2158
2361
2793
2080
2137
2562
2531
2130
2322 24
2522
3222
9026
8327
6022
9324
5829
8421
7826
2129
0222
0524
75 246
022
7824
0325
3227
0425
8224
0426
5624
38 2517
2246
2761
2951
2060
2146
2607
2543
2911 2
738
2564 2730
2805 27
5020
6424
7922
5222
30 2188
2673
2075
2168
2337
2559 27
9528
3529
00 2416
2700
2695 28
7522
8624
2126
0921
2822
7023
5229
6123
9024
7421
9123
9125
0423
6424
5926
3323
1723
60 2558
2424
2914
2611
2651
2820
2439
2872
2393
2584
2436
2514
2576 20
6222
9723
41 2207
2103
2814
2829 2
392
2575
2569
2865
2953
2813
2881
2884
2121
2222
2112
2229
2548
2675
2816
2511
2823
2174
2810
2131
2652
2692
2908
2143
2378
2057
2089
2268
2707
2175 21
4824
9923
4626
2427
8228
2620
6727
6522
2428
4823
9426
80 2644
2118
2228
2527
2573 24
05 2236
2370
2742
2445
2843
2190
2238
2211
2254
2398
2568
2746 2
204
2567 2
743
2842
2110
2616
2295
2114
2135
2312 2
645
2822
2170
2868
2273
2239
2266
2335
2824 25
0229
3122
2323
5725
5626
8926
2727
14 2117
2323
2797
2124
2515 22
4925
6124
9222
1724
0625
2526
38 2844
2802
2451
2263
2132
2358
2705
2741 28
6323
8820
8529
8823
7629
9322
5628
49 2647
2478 25
06 2113
2151
2827
2423
2373
2968 23
8125
05 2719
2309
2985
2368
2324
2978 28
4026
9129
87 2747
2648
2995
2444
2661
2173
2780
2257
2306
2828 25
0025
5328
6128
91 298
627
17 227
220
8826
57 2282
2091
2162
2450
2956
2109
2269
2271
2728 24
1120
9229
4224
30 296
022
5122
6028
38 2604
2682
2632
2225
2758 23
0224
3423
7124
96 2587 2
244
2915
2554
2698
2420
2461
2516
2522
2684
2913
2608
2107
2326
2385
2975
2395
2618
2668
2218
2939
2715
2186
2231
2193
2776
2291
2526 2
386
2972
2418
2947
2530 29
09 2678
2052
2431 22
9627
1127
1320
7721
3421
5026
2829
50 2166
2181
2214
2237
2980
2303
2733
2770 2
294
2063
2255 25
4527
4827
79 2053
2596
2076
2099
2867
2331
2557
2694
2187
2399
2203 24
6629
9626
4228
5723
3323
6227
8920
9427
5624
2225
5224
5728
9028
94 276
726
3429
05 2643
2664
2494
2419
2630
2074
2981
2846
2585
2437
2936
2182
2777 2
227
2292
2880 2
375
2307
2519
2722 29
4427
85 295
927
9829
67 2145
2240
2775 21
0226
5527
83 248
230
0021
3621
9522
48 2736 2
762
2058
2990
2086
2108
2625 23
1629
4629
5824
5426
0627
68 267
421
5623
5321
6723
1829
34 2957 21
7724
7225
6329
0725
4426
59 2389
2723
2882
2105
2833 29
1228
8622
7529
2124
9725
60 2687
2859
2708 28
5027
3925
1826
8528
04 273
122
6120
6123
59 2441
2484
2577 27
2529
2726
5429
9723
3627
6620
6525
9321
92 2159 29
1028
8721
7922
2024
5520
6829
7422
4723
8424
17 2259
2899 24
7025
3323
1424
73 2726
2172
2970
2547
2877
2989
2800
2483
2079
2199
2745
2387
2298
2646
2841
2773
2082
2469
2901 29
71 270
128
6623
1929
4129
6328
3125
7428
0823
5426
93 2463
2379
2637
2509
2600 27
8728
9225
4125
5027
1829
9920
8322
4221
4129
2020
9326
4121
2623
9722
0624
81 2274
2435
2449 22
3324
5627
5221
6427
4021
3828
3421
8522
3423
3825
0127
63 258
325
7223
0426
6025
3923
6325
4229
3729
24 2488
2432
2081
2111
2279
2468
2528 2
879
2208
2667
2790
2087
2226
2327
2161
2350
2922 2
650
2994 27
12 2665
2906
2280
2153
2503
2520 23
4225
9920
5627
8426
0128
4529
4925
1222
8323
5520
6925
89 217
127
2929
1625
3422
1229
4322
0223
2128
7421
9426
36 221
627
9922
8729
52 2977
2571
2160
2955
2127
2918
2180
2749 21
63 2807
2565
2839
2250
2754
2771
2688
2871
2332
2299
2452
2101
2119
2759
2072
2285
2598
2372
2367
2258
2613
2330
2140
2716
2734
2155
2603
2523
2962
2581
2591
2198
2917 24
4829
2629
3223
0124
8624
4226
4023
4526
7928
9320
7326
9626
7622
8923
2525
88 2510 29
8224
9525
8626
1027
7229
9126
2320
5928
1922
2123
1021
0626
7028
9525
7822
1522
8827
4424
4321
4423
66 252
424
0226
9924
2629
0422
6421
1621
8921
9628
8923
4726
7124
8027
9623
5623
9625
0829
7928
4721
7621
0026
1925
3726
1221
4725
3821
2922
41 2595
2794
2580
2830
2791
2152
2490
2433
2735 26
2627
2727
5727
3223
4427
1029
1927
0228
9720
5421
6523
3921
0428
5822
5327
24 244
6 2340
2540
2453
2690
2933
2184
2605
2966
2183
2853 29
0321
2525
5528
5424
4722
0023
2027
7426
6622
4326
3129
83 259
028
0121
5421
5723
1328
5124
1228
6224
7721
9727
6425
5128
1720
7028
2128
0329
4529
40 309
1127
1658
2792
2923
610
1248
387
562
2662
2380
2467
353
770
2377
2210 41
674
926
63
02
46
810
Dendrogram of agnes(x = dat, method = "single")
Agglomerative Coefficient = 0.93dat
Hei
ght
Clustering Hierarchical Clustering
Hierarchical Clustering on Artificial Dataset
[Figure: Dendrogram of agnes(x = dat, method = "complete"); Agglomerative Coefficient = 0.99; y-axis: Height; leaf labels omitted]
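The dendrograms above are produced with R's agnes (cluster package), which also reports the agglomerative coefficient shown in each caption. As an illustrative sketch only, not the slides' code, the same single-vs-complete linkage comparison can be reproduced in Python with SciPy; the artificial dataset (three Gaussian blobs), the seed, and the agglomerative-coefficient helper (based on Kaufman and Rousseeuw's definition: the mean over observations of one minus the height of the observation's first merge divided by the final merge height) are all assumptions here.

```python
# Illustrative Python counterpart to the slides' R agnes() dendrograms
# (single vs. complete linkage). Dataset, seed, and the agglomerative-
# coefficient re-implementation are assumptions, not the slides' code.
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster

rng = np.random.default_rng(0)
# Artificial dataset: three well-separated Gaussian blobs in 2D.
centres = np.array([[0.0, 0.0], [4.0, 0.0], [0.0, 4.0]])
X = np.vstack([rng.normal(c, 0.3, size=(50, 2)) for c in centres])
n = len(X)

def agglomerative_coefficient(Z, n):
    """Kaufman-Rousseeuw AC: mean over observations of
    1 - (height of the observation's first merge) / (final merge height)."""
    first = np.full(n, np.nan)
    for a, b, h, _ in Z:
        for child in (int(a), int(b)):
            if child < n and np.isnan(first[child]):
                first[child] = h
    return float(np.mean(1.0 - first / Z[-1, 2]))

for method in ("single", "complete"):
    Z = linkage(X, method=method)  # (n-1) x 4 merge table
    k = np.unique(fcluster(Z, t=3, criterion="maxclust")).size
    print(f"{method}: {k} clusters, AC = {agglomerative_coefficient(Z, n):.2f}")
```

The merge table `Z` is what `scipy.cluster.hierarchy.dendrogram(Z)` would draw, mirroring the agnes plots above; on well-separated data the complete-linkage coefficient tends to be the larger, consistent with the 0.99 vs 0.93 reported in the captions.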
Clustering Hierarchical Clustering
Hierarchical Clustering on Artificial Dataset
[Figure: a further agnes dendrogram on the artificial dataset; leaf labels omitted]
9613
3714
9115
8218
9417
14 918
1363
1950
1980
1063
1792
1799 97
315
5610
5217
9720
4212
6515
4216
2716
4020
0519
3320
26 980
1277
1392
1119
1622
1880
1080
1356
1812
1418
1977 1244
1771
1469
1507
1492
1838
1695
1570
1657
1220
1668
1774
1325
1899
942
1190
1278
1229
1495
1023
1445
1471
1524
1290
1794
1817
2044
1110
1184
1452
1347
1148
1431
1992
1749
1019
1520
1554
1900
1158
1164
1447
1709
1091
1236
1297
1568
1410
2006 9
5515
0020
28 1224
1201
1723
993
1051
1566
2038
1238
1056
1480 91
417
6820
3410
6715
99 1048
1843
1744
1823 18
2110
4517
55 1116
921
1915
1199
1595
1678 1
327
958
1887
1428
1409
1594
1429
1440
1962
1032
1371 12
2792
516
0199
816
8511
3413
3811
7515
6511
9413
6415
1616
07 1249
1728
1320
1324 15
5711
2716
5856
226
62 2380 2
467
2792
2125
2453
2690 29
3321
2922
4125
9522
4429
1525
5426
9822
9924
5220
5224
3122
9627
1127
13 2077
2105
2833
2886
2454
2674
2606
2768
2912
2167
2333
2193
2776
2231
2107
2186
2385
2975
2218
2939
2715
2326
2395
2618
2668
2291
2526
2909
2386
2972
2678
2748
2779 21
4725
3821
7724
7225
6323
8927
2323
1829
3429
5725
4426
5929
0724
0226
9924
2629
0420
5928
1922
2123
1021
3421
6621
5026
2829
50 2571
2106
2670
2895 25
7822
1522
8827
44 2443
2356
2396
2508
2979
2275
2921
2518
2731
2687
2859
2850
2708 23
2724
9725
6027
3922
8028
8729
6324
8326
8528
04 2448
2926
2932 2
524
2586
2610
2772
2991 21
8328
5326
3129
8325
5128
1720
5125
0729
4824
2024
6125
1620
7921
9927
4523
8722
9826
4628
4126
3027
50 2773
2074
2981
2585
2846
2375
2437
2936 23
0721
8227
7722
2722
9228
8025
7428
0821
2729
1821
6321
8027
4925
6528
3928
0722
5027
5427
71 2332
2367
2688
2871
2056
2784
2601
2845
2949 25
1221
5526
03 2345
2102
2655
2783
2482
2283
2355
2136
2736
3000
2195
2248
2140
2716
2734
2198
2917
2523
2962
2679
2893
2412
2862
2477
2253
2724
2446
2344
2710 29
1920
6829
7421
7922
4723
8424
1721
6029
5522
5928
9924
7025
3323
1424
7327
2623
2429
7828
4026
4829
9526
6126
9129
8727
4721
0026
1925
3726
1221
7229
7025
4728
7729
8928
00 215
727
9123
1328
5121
1621
8924
8027
9621
9628
8923
4726
71 2447
2830
2903
2803
2945
2058
2990
2316
2086
2108
2625
2272
2946
2910
2191
2391
2504
2364
2459
2633 27
4123
1723
6025
5824
2429
1426
1126
5128
2024
3928
7223
9325
8424
3625
1425
7620
6027
3821
4626
0725
6425
4329
1127
3028
0520
6222
9723
4122
0721
2128
1328
8128
8429
6120
6424
7922
5222
3021
8826
7320
7521
6823
3725
5927
9528
3529
0021
2822
7023
5223
9024
7420
8529
8823
7629
9329
5922
5628
4925
0624
7826
4724
4421
1323
8125
0523
0929
8523
6821
5128
2727
1923
7329
6824
23 2958
2063
2255
2545
2103
2814
2829
2392
2575
2569
2865
2953
2112
2229
2143
2378
2511
2823
2174
2810
2548
2675
2816
2131
2652
2692
2908
2088
2657
2282
2604
2682
2091
2162
2450
2956
2109
2269
2411
2271
2728
2225
2758
2302
2434
2371
2496
2587 27
6220
9229
4224
3026
3222
5122
6028
3829
6022
5727
1723
0628
2825
0025
5328
6128
9129
8620
7123
8326
1729
6529
9823
2823
4825
1324
7628
0924
1329
7628
3629
2820
8420
9825
4628
7022
4524
7121
5824
2922
7629
2924
2820
9525
4922
6524
1026
7728
9821
4227
8125
9728
1520
9023
2924
8523
1123
5120
9623
0028
1128
5622
0123
3424
0822
8624
2126
0924
1627
0026
9528
7522
1923
4923
7428
5528
64 2388
2235
2277
2487
2753
2566
2080
2137
2562
2531
2130
2322
2425
2173
2780
2232
2290
2683
2760
2293
2458
2984
2362
2789
2522
2684
2913 26
0821
8122
1422
3729
8023
0322
9427
3327
7020
5325
9624
6629
9626
4228
5727
9829
6720
7623
3125
5726
9420
9928
6721
8723
9922
0320
8722
2622
2024
5525
8125
9120
8224
6929
0129
7127
0128
6625
1927
2229
4427
8528
8220
9427
5628
3124
2225
5227
6724
5728
9028
9424
9426
3429
0526
4326
6420
5421
65 2339
2104
2858
2702
2897
2081
2111
2279
2152
2490
2433
2735 26
2622
0826
6727
9024
6825
2828
79 2906
2354
2693
2463
2718
2999
2379
2637
2550
2509
2600
2787
2892
2541 2
154
2847
2197
2764
2377
2055
2120
2938
2825
2860
2169
2925
2149
2407
2213
2489
2672
2706
2806
2115
2697
2649
2954
2620
2888
2837
2869
2066
2703
2281
2629
2078
2284
2139
2602
2465
2521
2935
2409
2414
2964
2788
2832
2614
2737
2418
2947
2530
2097
2658
2669
2709
2878
2973
2122
2440
2720
2778
2818
2751
2592
2885
2615
2721
2123
2769
2133
2267
2400
2570
2786
2529
2305
2308
2930
2093
2641
2126
2397
2456
2752 21
6422
0624
8122
7422
3324
3524
4925
7227
4023
0426
6025
3923
6321
3828
3421
8522
3423
38 2583
2501
2763
2542
2937 29
2421
0121
19 2759
2264
2145
2240
2775 21
6123
5029
2226
5029
9427
1226
6520
5723
4626
2421
4824
9920
8921
7522
6827
0722
1121
9024
4528
4321
1822
2825
2725
7324
0522
3622
3823
7027
4223
9820
6727
6522
2428
4825
6827
4623
9426
8026
4422
0925
3628
9624
6423
6127
9321
7826
2129
0224
1924
0426
5624
3825
1723
1929
41 2488
2205
2475
2460
2532
2704 25
8222
7824
0322
4627
6129
51 2222
2114
2135
2312
2645
2822
2863
2156
2353
2170
2868
2273
2239
2502
2931
2223
2357
2266
2335
2824
2556
2689
2627
2714
2117
2323
2797
2124
2515
2249
2561
2492
2132
2358
2705
2217
2844
2263
2451
2802
2406
2525
2638
2262
2876
2873
2462
2639
2491
2594
2315
2812
2382
2401
2498
2755
2415
2681
2992
2343
2635
2365
2622
2686
2369
2535
2653
2427
2579
2852
2883
2493
2969
2061
2359
2654
2997
2336
2766
2441
2484
2577
2725
2927
2301
2486
2442
2640
2065
2593
2192
2159
2204
2567
2743
2842
2110
2616
2295
2254
2782
2826
2258
2613 23
3020
8322
4221
4129
20 279
422
8925
1023
2525
8829
8223
4025
4026
2320
6925
8927
9929
4321
7127
2929
16 2534
2202
2321
2874 22
6124
9522
1222
8729
5229
7720
7326
96 2676
2194
2636
2216
2144
2366
770
2940
2072
2285
2598 23
7221
5323
4225
0325
20 2432
2599
2176
2580
2732
2555
2854
2184
2727
2757 22
0022
4323
2027
74 2666
2590
2801
2605
2966
2210
2923
2663
020
4060
Dendrogram of agnes(x = dat, method = "average")
Agglomerative Coefficient = 0.99dat
Hei
ght
Clustering Hierarchical Clustering
Using Dendrograms
Different ways of measuring dissimilarity result in different trees.
Dendrograms are useful for getting a feel for the structure of high-dimensional data, though they do not represent the distances between observations well.
Dendrograms show hierarchical clusters with respect to increasing values of the dissimilarity between clusters. Cutting a dendrogram horizontally at a particular height partitions the data into disjoint clusters, represented by the vertical lines the cut intersects. Cutting horizontally effectively reveals the state of the clustering algorithm at the point where the dissimilarity between clusters is no more than the height of the cut.
Despite the simplicity of this idea and the drawbacks above, hierarchical clustering methods provide users with interpretable dendrograms that allow clusters in high-dimensional data to be better understood.
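The cut-at-a-height idea can be sketched as follows. This is an illustrative example in Python with SciPy rather than the course's R (`agnes`); the synthetic data and the cut height of 3.0 are assumptions chosen so that the two blobs separate cleanly:

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster

rng = np.random.default_rng(0)
# Two well-separated blobs in 2-D (illustrative data, not from the lecture).
X = np.vstack([rng.normal(0, 0.5, size=(10, 2)),
               rng.normal(5, 0.5, size=(10, 2))])

# Average-linkage agglomerative clustering, analogous to
# agnes(dat, method = "average") in R.
Z = linkage(X, method="average")

# Cutting the dendrogram horizontally at height 3.0: observations end up in
# the same cluster iff they merge at a dissimilarity no greater than 3.0.
labels = fcluster(Z, t=3.0, criterion="distance")
print(labels)
```

Lowering `t` towards 0 recovers each observation as its own cluster; raising it past the root height merges everything into one cluster, which is exactly the horizontal-cut picture described above.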