Hierarchical Clustering

Dr. Bernard Chen, Assistant Professor


TRANSCRIPT

Page 1: Hierarchical Clustering

Hierarchical Clustering

Dr. Bernard Chen, Assistant Professor

Page 2: Hierarchical Clustering

Outline

Hierarchical Clustering

Hybrid Hierarchical K-means clustering

DBSCAN

Page 3: Hierarchical Clustering

Hierarchical Clustering

Dendrogram

Venn Diagram of Clustered Data

From http://www.stat.unc.edu/postscript/papers/marron/Stat321FDA/RimaIzempresentation.ppt

Page 4: Hierarchical Clustering

Nearest Neighbor, Level 2, k = 7 clusters.


Page 5: Hierarchical Clustering

Nearest Neighbor, Level 3, k = 6 clusters.

Page 6: Hierarchical Clustering

Nearest Neighbor, Level 4, k = 5 clusters.

Page 7: Hierarchical Clustering

Nearest Neighbor, Level 5, k = 4 clusters.

Page 8: Hierarchical Clustering

Nearest Neighbor, Level 6, k = 3 clusters.

Page 9: Hierarchical Clustering

Nearest Neighbor, Level 7, k = 2 clusters.

Page 10: Hierarchical Clustering

Nearest Neighbor, Level 8, k = 1 cluster.
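The level-by-level progression above can be sketched as nearest-neighbor (single-link) agglomeration: at each level the two closest clusters merge, so k drops by exactly one per level until a single cluster remains. The 1-D toy points below are illustrative, not the data behind the slides' figures.

```python
# Sketch of nearest-neighbor (single-link) agglomeration on toy 1-D points.
# At each level the pair of clusters with the smallest single-link distance
# merges, so the cluster count k decreases by one per level, mirroring the
# "Level n, k = m clusters" slides.

def single_link_trace(points):
    clusters = [[p] for p in points]          # start: every point is its own cluster
    trace = [len(clusters)]                   # k at the first level
    while len(clusters) > 1:
        # find the pair of clusters with the smallest single-link distance
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                d = min(abs(a - b) for a in clusters[i] for b in clusters[j])
                if best is None or d < best[0]:
                    best = (d, i, j)
        _, i, j = best
        clusters[i] = clusters[i] + clusters[j]   # merge the closest pair
        del clusters[j]
        trace.append(len(clusters))
    return trace

print(single_link_trace([1.0, 1.5, 5.0, 5.2, 9.0]))  # k shrinks one step per level
```

With five points the trace runs k = 5, 4, 3, 2, 1, one merge per level, just as the slide sequence runs down to k = 1.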

Page 11: Hierarchical Clustering

Typical Alternatives to Calculate the Distance between Clusters

Single link: the smallest distance between an element in one cluster and an element in the other, i.e., dis(Ki, Kj) = min(tip, tjq)

Complete link: the largest distance between an element in one cluster and an element in the other, i.e., dis(Ki, Kj) = max(tip, tjq)

Average: the average distance between an element in one cluster and an element in the other, i.e., dis(Ki, Kj) = avg(tip, tjq)
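The three linkage definitions above translate directly into code. A minimal sketch, assuming 1-D points and absolute difference as the element distance (the slides don't fix a particular metric):

```python
# The three inter-cluster distances from the slide, for 1-D points with
# absolute difference as the element-level distance (illustrative only).

def dist(a, b):
    return abs(a - b)

def single_link(Ki, Kj):     # smallest pairwise distance between the clusters
    return min(dist(p, q) for p in Ki for q in Kj)

def complete_link(Ki, Kj):   # largest pairwise distance between the clusters
    return max(dist(p, q) for p in Ki for q in Kj)

def average_link(Ki, Kj):    # average over all cross-cluster pairs
    return sum(dist(p, q) for p in Ki for q in Kj) / (len(Ki) * len(Kj))

Ki, Kj = [0.0, 1.0], [3.0, 5.0]
print(single_link(Ki, Kj))    # 2.0 (between 1.0 and 3.0)
print(complete_link(Ki, Kj))  # 5.0 (between 0.0 and 5.0)
print(average_link(Ki, Kj))   # 3.5 (mean of 3, 5, 2, 4)
```

Single link tends to produce long chained clusters, complete link compact ones, and average link sits between the two.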

Page 12: Hierarchical Clustering

Functionally significant gene clusters

Two-way clustering

Gene clusters

Sample clusters
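Two-way clustering applies the same procedure along both axes of an expression matrix: once to the rows (gene clusters) and once to the columns (sample clusters), the latter by clustering the transpose. A hedged sketch using single-link grouping on a made-up 3x3 matrix:

```python
# Two-way clustering sketch: cluster the rows (genes) of a matrix, then
# transpose it and cluster the columns (samples). The matrix values and
# the choice of single-link Euclidean grouping are illustrative assumptions.

def single_link_groups(rows, k):
    """Single-link agglomeration of vectors down to k groups of row indices."""
    def d(u, v):  # Euclidean distance between two vectors
        return sum((a - b) ** 2 for a, b in zip(u, v)) ** 0.5
    groups = [[i] for i in range(len(rows))]
    while len(groups) > k:
        # merge the pair of groups with the smallest single-link distance
        _, i, j = min((min(d(rows[a], rows[b]) for a in g1 for b in g2), i, j)
                      for i, g1 in enumerate(groups)
                      for j, g2 in enumerate(groups) if j > i)
        groups[i] += groups[j]
        del groups[j]
    return groups

expr = [[1.0, 1.1, 5.0],     # rows = genes, columns = samples (made-up data)
        [1.2, 1.0, 5.2],
        [4.0, 4.1, 0.5]]
gene_clusters = single_link_groups(expr, 2)
cols = [list(c) for c in zip(*expr)]          # transpose for the second pass
sample_clusters = single_link_groups(cols, 2)
print(gene_clusters)    # genes 0 and 1 have similar profiles, so they group
print(sample_clusters)  # samples 0 and 1 group for the same reason
```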

Page 13: Hierarchical Clustering

Outline

Hierarchical Clustering

Hybrid Hierarchical K-means clustering

DBSCAN

Page 14: Hierarchical Clustering

Motivation

Among clustering algorithms, hierarchical and K-means clustering are the two most popular and classic methods. However, both have innate disadvantages.

K-means clustering requires the number of clusters to be specified in advance and chooses its initial centroids randomly; in other words, you don't know how to start.

With hierarchical clustering, it is hard to find a place to cut the dendrogram.
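The K-means drawback can be seen in a small sketch: k must be supplied up front, and the initial centroids are drawn at random, so different seeds can converge to different answers. The data and seeding scheme here are illustrative, not from the slides.

```python
# Sketch of the K-means pain point from the slide: the result depends on the
# number of clusters k (chosen in advance) and on randomly chosen initial
# centroids. Toy 1-D data; Lloyd's algorithm with a fixed iteration budget.
import random

def kmeans(points, k, seed):
    rng = random.Random(seed)
    centroids = rng.sample(points, k)          # random initial centroids
    for _ in range(20):                        # fixed number of update rounds
        groups = [[] for _ in range(k)]
        for p in points:                       # assign each point to its nearest centroid
            i = min(range(k), key=lambda c: abs(p - centroids[c]))
            groups[i].append(p)
        centroids = [sum(g) / len(g) if g else centroids[i]  # recompute means
                     for i, g in enumerate(groups)]
    return sorted(centroids)

data = [1.0, 1.1, 1.2, 5.0, 5.1, 9.0, 9.2]
print(kmeans(data, 2, seed=0))
print(kmeans(data, 2, seed=1))   # a different start can give a different answer
```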

Page 15: Hierarchical Clustering

Hybrid Hierarchical K-means Clustering (HHK) Algorithm

The idea, in brief, is to cluster roughly half of the data with hierarchical clustering, then hand off to K-means for the remainder

In order to generate super-rules, we let the hierarchical step terminate when it generates the largest number of clusters
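The slides don't spell out the HHK procedure in code, so the sketch below is one interpretation of the description above: hierarchically cluster roughly half of the data, then use the resulting cluster means as non-random seeds for a K-means pass over the full data set. The half-split rule and the single refinement pass are assumptions for illustration, not Dr. Chen's exact algorithm.

```python
# Hedged sketch of the HHK idea: hierarchical clustering on about half the
# data yields cluster means that seed K-means for the rest, removing the
# random-initialization problem. 1-D toy data; details are assumptions.

def single_link_cluster(points, k):
    """Agglomerative single-link clustering down to k clusters."""
    clusters = [[p] for p in points]
    while len(clusters) > k:
        # merge the pair of clusters with the smallest single-link distance
        _, i, j = min((min(abs(a - b) for a in c1 for b in c2), i, j)
                      for i, c1 in enumerate(clusters)
                      for j, c2 in enumerate(clusters) if j > i)
        clusters[i] += clusters[j]
        del clusters[j]
    return clusters

def hhk(points, k):
    half = sorted(points)[::2]                  # assumed "half" split (every other point)
    seeds = [sum(c) / len(c) for c in single_link_cluster(half, k)]
    # one K-means assignment/update pass over the full data, seeded by hierarchy
    groups = [[] for _ in range(k)]
    for p in points:
        groups[min(range(k), key=lambda c: abs(p - seeds[c]))].append(p)
    return [sum(g) / len(g) if g else seeds[i] for i, g in enumerate(groups)]

print(hhk([1.0, 1.2, 5.0, 5.3, 9.0, 9.1], k=3))
```

Because the seeds come from the hierarchical stage rather than a random draw, repeated runs give the same answer, which is the point of the hybrid.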

Page 16: Hierarchical Clustering

Hybrid Hierarchical K-means Clustering (HHK) Algorithm

Page 17: Hierarchical Clustering

Hybrid Hierarchical K-means Clustering (HHK) Algorithm Example

Pages 18-23: Hierarchical Clustering

Hybrid Hierarchical K-means Clustering (HHK) Algorithm Example (continued; step-by-step figures only)