clustering algorithms k-means hierarchic agglomerative clustering (hac) …. birch association rule...


Upload: audrey-matthews

Post on 02-Jan-2016


Page 1: Clustering Algorithms k-means Hierarchic Agglomerative Clustering (HAC) …. BIRCH Association Rule Hypergraph Partitioning (ARHP) Categorical clustering

Clustering Algorithms

• k-means
• Hierarchic Agglomerative Clustering (HAC)
• ….
• BIRCH
• Association Rule Hypergraph Partitioning (ARHP)
• Categorical clustering (CACTUS, STIRR)
• ……
• STC
• QDC

Hierarchical clustering

Given a set of N items to be clustered and an N×N distance (or similarity) matrix:

1. Start by assigning each item to its own cluster.

2. Find the closest (most similar) pair of clusters and merge them into a single cluster, so that there is now one fewer cluster.

3. Compute distances (similarities) between the new cluster and each of the old clusters.

4. Repeat steps 2 and 3 until all items are clustered into a single cluster of size N.
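
The four steps above can be sketched in Python as follows. This is only an illustrative sketch: the function name `agglomerate` and the merge-record format are assumptions, and because step 3 does not say how the new distances are computed, the sketch assumes single-link (the minimum of the two old distances).

```python
def agglomerate(dist):
    """Merge clusters bottom-up and return the list of merges (a dendrogram)."""
    n = len(dist)
    # Step 1: every item starts in its own cluster, keyed by an integer id.
    clusters = {i: [i] for i in range(n)}
    # Distances between current clusters, keyed by (smaller id, larger id).
    d = {(i, j): dist[i][j] for i in range(n) for j in range(i + 1, n)}
    merges = []
    next_id = n
    while len(clusters) > 1:
        # Step 2: find the closest pair of clusters and merge them.
        (a, b), best = min(d.items(), key=lambda kv: kv[1])
        merged = clusters.pop(a) + clusters.pop(b)
        # Step 3: distance from the new cluster to each remaining cluster
        # (single-link assumption: the minimum of the two old distances).
        new_d = {}
        for c in clusters:
            d_ac = d[(min(a, c), max(a, c))]
            d_bc = d[(min(b, c), max(b, c))]
            new_d[(c, next_id)] = min(d_ac, d_bc)
        # Drop every entry that mentions one of the two merged clusters.
        d = {pair: v for pair, v in d.items() if a not in pair and b not in pair}
        d.update(new_d)
        clusters[next_id] = merged
        merges.append((a, b, best, sorted(merged)))
        next_id += 1
    # Step 4: the loop ends when a single cluster of size N remains.
    return merges


# Four items on a line at positions 0, 1, 5 and 6.
dist = [
    [0, 1, 5, 6],
    [1, 0, 4, 5],
    [5, 4, 0, 1],
    [6, 5, 1, 0],
]
for merge in agglomerate(dist):
    print(merge)
```

Each record holds the two merged cluster ids, the distance at which they were merged, and the items of the new cluster, which is enough to draw a dendrogram.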

Iwona Białynicka-Birula - Clustering Web Search Results

Agglomerative hierarchical clustering

Clustering result: dendrogram

AHC variants

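• Various ways of calculating cluster similarity:

single-link (minimum distance between members of the two clusters)

complete-link (maximum distance between members of the two clusters)

group-average (average distance over all pairs of members)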
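
As a small illustration of the three variants, the sketch below computes the distance between two clusters from the pairwise distances of their members. The function name `linkage_distance` and its `method` parameter are hypothetical names chosen for this example.

```python
from itertools import product


def linkage_distance(cluster_a, cluster_b, dist, method="single"):
    """Distance between two clusters under the chosen linkage variant."""
    # All pairwise distances between a member of A and a member of B.
    pairwise = [dist(p, q) for p, q in product(cluster_a, cluster_b)]
    if method == "single":      # minimum pairwise distance
        return min(pairwise)
    if method == "complete":    # maximum pairwise distance
        return max(pairwise)
    if method == "average":     # mean over all pairs
        return sum(pairwise) / len(pairwise)
    raise ValueError(f"unknown linkage method: {method}")


# One-dimensional example using absolute difference as the distance.
a, b = [0.0, 1.0], [3.0, 5.0]
for method in ("single", "complete", "average"):
    print(method, linkage_distance(a, b, lambda p, q: abs(p - q), method))
# single 2.0
# complete 5.0
# average 3.5
```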

Data Clustering: K-means

• Partitional clustering
• Initial number of clusters k

K-means

1. Place K points into the space represented by the objects that are being clustered. These points represent initial group centroids.

2. Assign each object to the group that has the closest centroid.

3. When all objects have been assigned, recalculate the positions of the K centroids.

4. Repeat Steps 2 and 3 until the centroids no longer move. This produces a separation of the objects into groups from which the metric to be minimized can be calculated.
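
A minimal pure-Python sketch of these four steps is shown below. It assumes Euclidean distance and picks K of the objects themselves as the initial centroids, which is one common choice and not something the steps prescribe. The names `kmeans`, `seed` and `max_iter` are assumptions made for the example.

```python
import math
import random


def kmeans(points, k, seed=0, max_iter=100):
    """Cluster points (tuples of numbers) into k groups; return (centroids, labels)."""
    rng = random.Random(seed)
    # Step 1: place K initial centroids (assumption: K randomly chosen objects).
    centroids = [list(p) for p in rng.sample(points, k)]
    for _ in range(max_iter):
        # Step 2: assign each object to the group with the closest centroid.
        labels = [
            min(range(k), key=lambda c: math.dist(p, centroids[c]))
            for p in points
        ]
        # Step 3: recalculate the position of each centroid as the group mean.
        new_centroids = []
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                new_centroids.append([sum(x) / len(members) for x in zip(*members)])
            else:
                new_centroids.append(centroids[c])  # empty group: keep old centroid
        # Step 4: stop once the centroids no longer move.
        if new_centroids == centroids:
            break
        centroids = new_centroids
    return centroids, labels


points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
centroids, labels = kmeans(points, k=2)
print(labels)
```

The result depends on the initial centroids, so on less well-separated data different seeds can give different groupings.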

Example by Andrew W. Moore

K-means

K-means clustering (k=3)

Single-pass

threshold

Document Clustering: k-means

• k-means: distance-based flat clustering

• Advantages:
  • time complexity linear in the number of documents
  • works relatively well in low-dimensional spaces

• Drawbacks:
  • distance computation becomes problematic in high-dimensional spaces
  • the centroid vector may not summarize the cluster's documents well
  • the choice of the initial k clusters affects the quality of the clusters

0. Input: D ::= {d1, d2, …, dn}; k ::= the number of clusters
1. Select k document vectors as the initial centroids of the k clusters
2. Repeat
3.   Select one vector d from the remaining documents
4.   Compute the similarities between d and the k centroids
5.   Put d in the closest cluster and recompute that cluster's centroid
6. Until the centroids don't change
7. Output: k clusters of documents
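
The pseudocode above updates a centroid as soon as a document is assigned, which differs from the batch version of k-means. A rough sketch of that incremental variant is given below. It assumes raw term-frequency vectors, cosine similarity, and the first k documents as initial centroids; none of these choices is fixed by the slide, and the function names are hypothetical. It stops when a full pass changes no assignment, which implies that the centroids no longer change.

```python
import math
from collections import Counter


def cosine(u, v):
    """Cosine similarity between two sparse term-weight vectors (dicts)."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def mean_vector(vectors):
    """Centroid of a non-empty list of sparse vectors."""
    total = Counter()
    for v in vectors:
        total.update(v)
    return {t: w / len(vectors) for t, w in total.items()}


def cluster_documents(docs, k, max_passes=20):
    # Step 0: turn each document into a term-frequency vector.
    vectors = [Counter(d.lower().split()) for d in docs]
    # Step 1: assumption: the first k documents seed the k clusters.
    centroids = [dict(v) for v in vectors[:k]]
    labels = list(range(k)) + [None] * (len(vectors) - k)
    for _ in range(max_passes):                      # Step 2: repeat
        changed = False
        for i, v in enumerate(vectors):              # Step 3: next document
            # Step 4: similarity between the document and every centroid.
            best = max(range(k), key=lambda c: cosine(v, centroids[c]))
            if best != labels[i]:
                old, labels[i] = labels[i], best
                changed = True
                # Step 5: recompute the centroids of the affected clusters.
                for c in (old, best):
                    members = [vectors[j] for j, lab in enumerate(labels) if lab == c]
                    if c is not None and members:
                        centroids[c] = mean_vector(members)
        if not changed:                              # Step 6: nothing moved
            break
    return labels                                    # Step 7: cluster per document


docs = [
    "apple banana fruit",
    "goal football match",
    "banana fruit salad",
    "football match score",
]
print(cluster_documents(docs, k=2))  # [0, 1, 0, 1]
```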

Document Clustering: HAC

• Hierarchic agglomerative clustering (HAC): distance-based hierarchic clustering

• Advantages:
  • produces better-quality clusters
  • works relatively well in low-dimensional spaces

• Drawbacks:
  • distance computation becomes problematic in high-dimensional spaces
  • quadratic time complexity

0. Input: D ::= {d1, d2, …, dn}
1. Calculate the similarity matrix SIM[i,j]
2. Repeat
3.   Merge the two most similar clusters, K and L, to form a new cluster KL
4.   Compute the similarities between KL and each of the remaining clusters and update SIM[i,j]
5. Until there is a single cluster (or a specified number of clusters)
6. Output: dendrogram of clusters
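
The matrix update in steps 3 and 4 can be sketched as follows, starting from an already computed SIM matrix. The sketch assumes group-average similarity for the merged cluster KL, since the pseudocode leaves the update rule open. The function name `hac_from_similarity` and the `num_clusters` parameter are hypothetical.

```python
def hac_from_similarity(sim, num_clusters=1):
    """Merge clusters on a similarity matrix; return (clusters, merge history)."""
    sim = [row[:] for row in sim]                    # work on a copy of SIM
    members = {i: [i] for i in range(len(sim))}      # active clusters by matrix row
    merges = []
    # Step 5: stop at a single cluster or at the requested number of clusters.
    while len(members) > max(num_clusters, 1):
        active = sorted(members)
        # Step 3: pick the most similar pair (K, L) among the active clusters.
        k, l = max(
            ((i, j) for x, i in enumerate(active) for j in active[x + 1:]),
            key=lambda pair: sim[pair[0]][pair[1]],
        )
        merges.append((members[k][:], members[l][:], sim[k][l]))
        # Step 4: row and column k now represent KL. Assumption: group-average
        # similarity, weighted by the sizes of K and L.
        size_k, size_l = len(members[k]), len(members[l])
        for m in active:
            if m not in (k, l):
                avg = (size_k * sim[k][m] + size_l * sim[l][m]) / (size_k + size_l)
                sim[k][m] = sim[m][k] = avg
        members[k] += members.pop(l)                 # row l is no longer active
    # Step 6: the merge history is the dendrogram in list form.
    return list(members.values()), merges


sim = [
    [1.0, 0.9, 0.1, 0.2],
    [0.9, 1.0, 0.3, 0.1],
    [0.1, 0.3, 1.0, 0.8],
    [0.2, 0.1, 0.8, 1.0],
]
clusters, merges = hac_from_similarity(sim, num_clusters=2)
print(clusters)  # [[0, 1], [2, 3]]
```

Scanning all active pairs on every merge keeps the sketch short but makes it cubic in the number of documents in the worst case; the SIM matrix alone already needs quadratic space.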