UNSUPERVISED LEARNING IN R
Introduction to hierarchical clustering
Hierarchical clustering
● The number of clusters is not known ahead of time
● Two kinds: bottom-up (agglomerative) and top-down (divisive); this course covers bottom-up
Hierarchical clustering
Simple Example
(Figure sequence: 5 clusters, each point its own cluster → 4 clusters → 3 clusters → 2 clusters → 1 cluster; at each step the two closest clusters are merged)
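The merge sequence in the figures can be reproduced in base R. This is a minimal sketch on five made-up points (the coordinates in `pts` are hypothetical, not the slide's data); `cutree()` with the vector `k = 5:1` returns one column of cluster labels per level of the hierarchy.

```r
# Five hypothetical 2-D points (made up for illustration)
pts <- matrix(c( 0,   0,
                 0,   1,
                 5,   5,
                 5, 6.5,
                12,   0), ncol = 2, byrow = TRUE)

# Bottom-up clustering; hclust() uses complete linkage by default
hc <- hclust(dist(pts))

# Cluster labels at every level: columns for 5, 4, 3, 2 and 1 clusters
memberships <- cutree(hc, k = 5:1)
print(memberships)

# Merge history: negative entries are single points, positive entries refer to
# the cluster formed in that earlier row; height is the distance at each merge
print(cbind(hc$merge, height = round(hc$height, 2)))
```

Reading the columns left to right replays the figures: each step merges one pair of clusters, so the label vector loses exactly one distinct value per column.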
Hierarchical clustering in R
> # Calculates the Euclidean distance between each pair of observations (rows)
> dist_matrix <- dist(x)
> # Returns hierarchical clustering model
> hclust(d = dist_matrix)

Call:
hclust(d = dist_matrix)

Cluster method   : complete
Distance         : euclidean
Number of objects: 50

x is a data matrix
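A minimal sketch of the two calls above, assuming `x` is a numeric matrix with 50 rows. Here `x` is simulated with `rnorm()` as a hypothetical stand-in for the slide's data, so the printed values will differ from the course's.

```r
set.seed(42)
# Hypothetical stand-in for the slide's x: 50 observations (rows), 2 features
x <- matrix(rnorm(50 * 2), ncol = 2)

# dist() uses Euclidean distance by default and stores only the lower
# triangle: n * (n - 1) / 2 pairwise distances, 1225 for n = 50
dist_matrix <- dist(x)
length(dist_matrix)

# Each stored value is the Euclidean distance between two rows
d12_by_hand <- sqrt(sum((x[1, ] - x[2, ])^2))
as.matrix(dist_matrix)[1, 2] - d12_by_hand

# hclust() uses complete linkage unless told otherwise
hclust_model <- hclust(d = dist_matrix)
print(hclust_model)
```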
Selecting number of clusters
Interpreting results
> # Create hierarchical cluster model: hclust.out
> hclust.out <- hclust(dist(x))
> # Inspect the result
> summary(hclust.out)
            Length Class  Mode
merge       98     -none- numeric
height      49     -none- numeric
order       50     -none- numeric
labels       0     -none- NULL
method       1     -none- character
call         2     -none- call
dist.method  1     -none- character

This information isn't particularly useful for interpreting the clustering
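Although `summary()` says little, the components it lists do carry the clustering. A minimal sketch, assuming a simulated 50 × 2 matrix `x` (hypothetical data), showing what `merge`, `height` and `order` hold:

```r
set.seed(42)
# Hypothetical data: 50 observations with 2 features
x <- matrix(rnorm(100), ncol = 2)
hclust.out <- hclust(dist(x))

# merge: (n - 1) x 2 matrix; row i gives the two clusters joined at step i.
# Negative entries are single observations, positive entries refer to the
# cluster created at that earlier step. 49 x 2 = 98 values, as in summary().
dim(hclust.out$merge)
head(hclust.out$merge, 3)

# height: the linkage distance at which each of the 49 merges happened
head(hclust.out$height, 3)

# order: the observations rearranged so the dendrogram has no crossing branches
head(hclust.out$order)
```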
Dendrogram
● Tree-shaped structure used to interpret hierarchical clustering models
● The height at which two branches join is the distance between the two clusters being merged
(Figure sequence: a dendrogram built up one merge at a time; vertical axis: height)
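Dendrogram plotting in R
> # Draws a dendrogram
> plot(hclust.out)
> # Draws a red horizontal line at height 6, where the tree will be cut
> abline(h = 6, col = "red")
(Figure: dendrogram of hclust.out with a red horizontal line at height 6; vertical axis: height, ticks at 0, 2, 4, 6)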
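A sketch, assuming simulated data, that places the cut line at a height chosen to produce three clusters (the choice of three and the name `cut_height` are illustrative) and uses `rect.hclust()` from the stats package to outline the resulting clusters on the plot.

```r
set.seed(42)
# Hypothetical data: 50 observations with 2 features
x <- matrix(rnorm(100), ncol = 2)
hclust.out <- hclust(dist(x))

# Merge heights are sorted, so height[n - 3] is the 4 -> 3 cluster merge and
# height[n - 2] the 3 -> 2 merge; a line between them crosses three branches
n <- nrow(x)
cut_height <- mean(hclust.out$height[c(n - 3, n - 2)])

plot(hclust.out)
abline(h = cut_height, col = "red")
# rect.hclust() draws a box around each cluster below the cut and
# invisibly returns the members of each cluster
groups <- rect.hclust(hclust.out, h = cut_height, border = "blue")
length(groups)
```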
Tree 'cutting' in R
● The tree must be cut to obtain cluster assignments
> # Cut by height h
> cutree(hclust.out, h = 6)
 [1] 1 1 1 1 1 1 1 1 2 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 3 3 3 3 3 3
[32] 3 3 3 3 4 4 4 4 4 4 4 4 4 4 2 4 2 4 4
> # Cut by number of clusters k
> cutree(hclust.out, k = 2)
 [1] 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 2 2 2 2 2 2
[32] 2 2 2 2 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
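Cutting by height and cutting by number of clusters are two views of the same operation. A minimal sketch with simulated data and a hypothetical cut height of 1.5: for linkage methods whose merge heights increase monotonically, such as complete linkage, cutting at height h yields 1 plus the number of merges above h clusters.

```r
set.seed(42)
# Hypothetical data: 50 observations with 2 features
x <- matrix(rnorm(100), ncol = 2)
hclust.out <- hclust(dist(x))

h <- 1.5                                   # hypothetical cut height
by_height <- cutree(hclust.out, h = h)

# Merges at or below h stay joined; every merge above h is undone and adds
# one cluster, so the number of clusters is 1 + (number of heights above h)
k <- 1 + sum(hclust.out$height > h)
by_k <- cutree(hclust.out, k = k)

k
all(by_height == by_k)   # same assignments either way
table(by_height)         # cluster sizes
```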
Clustering linkage and practical matters
Linking clusters in hierarchical clustering
● How is the distance between clusters determined? What are the rules?
● Four methods to determine which clusters should be linked:
● Complete: computes the pairwise distances between all observations in cluster 1 and cluster 2, and uses the largest of these distances
● Single: same as above, but uses the smallest of the distances
● Average: same as above, but uses the average of the distances
● Centroid: finds the centroid of cluster 1 and the centroid of cluster 2, and uses the distance between the two centroids
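The four rules can be computed by hand for two small clusters. This sketch uses hypothetical coordinates chosen so that both clusters form before they are joined, then checks the hand-computed values against the final merge height reported by `hclust()`. For `method = "centroid"`, R's `hclust` documentation indicates squared Euclidean distances should be used, so that comparison uses `dist(pts)^2` and the squared centroid distance.

```r
# Two hypothetical clusters of two points each (rows are observations)
cluster1 <- matrix(c(0, 0,
                     1, 0), ncol = 2, byrow = TRUE)
cluster2 <- matrix(c(4, 0,
                     4, 2.5), ncol = 2, byrow = TRUE)
pts <- rbind(cluster1, cluster2)

# 2 x 2 block of pairwise distances: rows from cluster1, columns from cluster2
pair_d <- as.matrix(dist(pts))[1:2, 3:4]

complete <- max(pair_d)    # largest pairwise distance
single   <- min(pair_d)    # smallest pairwise distance
average  <- mean(pair_d)   # mean of the pairwise distances
centroid <- sqrt(sum((colMeans(cluster1) - colMeans(cluster2))^2))
round(c(complete = complete, single = single,
        average = average, centroid = centroid), 3)

# Within-cluster distances (1 and 2.5) are smaller than every between-cluster
# distance, so each tree's final merge joins cluster1 and cluster2 and its
# height equals the corresponding linkage distance
final_heights <- sapply(c("complete", "single", "average"),
                        function(m) tail(hclust(dist(pts), method = m)$height, 1))
final_heights

# "centroid" expects squared Euclidean distances; its final height is then
# the squared distance between the two centroids
tail(hclust(dist(pts)^2, method = "centroid")$height, 1)
```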
Linking methods: complete and average
Linking method: single
Linking method: centroid
Linkage in R
> # d is a distance matrix for the data matrix x (assumed; not shown on the slide)
> d <- dist(x)
> # Fitting hierarchical clustering models using different methods
> hclust.complete <- hclust(d, method = "complete")
> hclust.average <- hclust(d, method = "average")
> hclust.single <- hclust(d, method = "single")
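A sketch of comparing the fitted models, assuming a simulated `x` in place of the slide's data: each tree is cut into three clusters and the cluster sizes are tabulated. Which linkage gives the most balanced clusters depends on the data; single linkage is known for "chaining", which often yields one large cluster and a few very small ones.

```r
set.seed(42)
# Hypothetical stand-in for the slide's data: 50 observations, 2 features
x <- matrix(rnorm(100), ncol = 2)
d <- dist(x)

hclust.complete <- hclust(d, method = "complete")
hclust.average  <- hclust(d, method = "average")
hclust.single   <- hclust(d, method = "single")

# Cut every tree into 3 clusters and tabulate the cluster sizes
models <- list(complete = hclust.complete,
               average  = hclust.average,
               single   = hclust.single)
sizes <- lapply(models, function(m) table(cutree(m, k = 3)))
sizes

# Cross-tabulate two of the partitions to see where they disagree
table(complete = cutree(hclust.complete, k = 3),
      single   = cutree(hclust.single, k = 3))
```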
Practical matters
● Data on different scales can cause undesirable results in clustering methods, because features with large numeric ranges dominate the distance calculations
● The solution is to scale the data so that all features have the same mean and standard deviation
● Subtract the mean of each feature from all of its observations
● Divide each feature by the standard deviation of that feature
● Normalized features have a mean of zero and a standard deviation of one
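The two steps above can be written out explicitly and compared with `scale()`. A minimal sketch on hypothetical data with one feature in centimetres and one in currency units:

```r
# Hypothetical data: two features on very different scales
x <- cbind(height_cm = c(160, 185, 162, 183),
           income    = c(40000, 41000, 70000, 71000))

# Step 1: subtract each feature's mean from all of its observations
centered <- sweep(x, 2, colMeans(x), FUN = "-")
# Step 2: divide each feature by its standard deviation
standardized <- sweep(centered, 2, apply(x, 2, sd), FUN = "/")

colMeans(standardized)        # zero up to floating-point rounding
apply(standardized, 2, sd)    # 1 for each feature

# scale() does both steps and records the centres and scales as attributes
all.equal(standardized, scale(x), check.attributes = FALSE)
attr(scale(x), "scaled:center")
attr(scale(x), "scaled:scale")
```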
Practical matters
> # Check if scaling is necessary
> colMeans(x)
[1] -0.1337828  0.0594019
> apply(x, 2, sd)
[1] 1.974376 2.112357
> # Produce new matrix with columns of mean of 0 and sd of 1
> scaled_x <- scale(x)
> colMeans(scaled_x)
[1] 2.775558e-17 3.330669e-17
> apply(scaled_x, 2, sd)
[1] 1 1

x is a data matrix
Normalized features have a mean of 0 (the e-17 values above are zero up to floating-point rounding) and a standard deviation of 1
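To see why the check matters, this sketch (hypothetical data, with income values tens of thousands of times larger than heights) compares distances before and after scaling: on the raw data almost all of every squared distance comes from income, and the nearest neighbour of the first observation changes once both features are standardized.

```r
# Hypothetical data: height in cm and income, on very different scales
x <- cbind(height_cm = c(160, 185, 162, 183),
           income    = c(40000, 41000, 70000, 71000))

# Row index of observation 1's nearest neighbour under Euclidean distance
nearest_to_first <- function(m) {
  d <- as.matrix(dist(m))[1, -1]     # distances from row 1 to rows 2..n
  unname(which.min(d)) + 1           # shift back to the original row index
}
nearest_to_first(x)          # 2: incomes are close, the 25 cm height gap barely counts
nearest_to_first(scale(x))   # 3: the similar heights (160 vs 162) now count as well

# Share of each squared pairwise distance that comes from the income column
income_share <- function(m) {
  as.vector(dist(m[, "income", drop = FALSE])^2 / dist(m)^2)
}
round(income_share(x), 4)          # close to 1 for every pair
round(income_share(scale(x)), 4)   # much smaller for pairs that differ mainly in height
```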
Review of hierarchical clustering
Hierarchical clustering review
> # Fitting various hierarchical clustering models
> hclust.complete <- hclust(d, method = "complete")
> hclust.average <- hclust(d, method = "average")
> hclust.single <- hclust(d, method = "single")
Linking methods: complete and average
Hierarchical clustering review
(Figure sequence: dendrograms; vertical axis: height)
Hierarchical clustering review
> # Scale the data
> pokemon.scaled <- scale(pokemon)
> # Create hierarchical and k-means clustering models
> hclust.pokemon <- hclust(dist(pokemon.scaled), method = "complete")
> km.pokemon <- kmeans(pokemon.scaled, centers = 3, nstart = 20, iter.max = 50)
> # Compare results of the models
> cut.pokemon <- cutree(hclust.pokemon, k = 3)
> table(km.pokemon$cluster, cut.pokemon)
   cut.pokemon
      1   2   3
  1 242   1   0
  2 342   1   0
  3 204   9   1

Rows are k-means clusters; columns are the hierarchical clusters from cutting the tree at k = 3.
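The `pokemon` data are not reproduced here, so this sketch runs the same workflow on simulated data (three hypothetical groups in four features put on deliberately different scales). Cluster numbers from the two methods are arbitrary labels, so agreement shows up as one dominant count per row of the table, not necessarily on the diagonal.

```r
set.seed(1)
# Hypothetical stand-in for the pokemon data: 90 observations in 3 loose
# groups (means 0, 3 and 6), 4 features put on deliberately different scales
sim <- do.call(rbind, lapply(c(0, 3, 6), function(mu) {
  matrix(rnorm(30 * 4, mean = mu), ncol = 4)
}))
sim <- sweep(sim, 2, c(1, 10, 100, 1000), FUN = "*")

# Scale, fit both models, and cut the tree into 3 clusters
sim.scaled <- scale(sim)
hclust.sim <- hclust(dist(sim.scaled), method = "complete")
km.sim <- kmeans(sim.scaled, centers = 3, nstart = 20, iter.max = 50)
cut.sim <- cutree(hclust.sim, k = 3)

# Rows: k-means clusters; columns: hierarchical clusters
comparison <- table(km.sim$cluster, cut.sim)
comparison
```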