Robust Information-theoretic Clustering. By C. Böhm, C. Faloutsos, J.-Y. Pan, and C. Plant. Presenter: Niyati Parikh



TRANSCRIPT

Page 1: Robust Information-theoretic Clustering

Robust Information-theoretic Clustering

By C. Böhm, C. Faloutsos, J.-Y. Pan, and C. Plant

Presenter: Niyati Parikh

Page 2: Robust Information-theoretic Clustering

Objective

Find a natural clustering in a dataset. Two questions:

- Goodness of a clustering
- Efficient algorithm for a good clustering

Page 3: Robust Information-theoretic Clustering

Define “goodness”

- Ability to describe the clusters succinctly
- Adopt VAC (Volume After Compression):
  - Record #bytes for the number of clusters k
  - Record #bytes for their type (Gaussian, uniform, ...)
  - Record the compressed location of each point

Page 4: Robust Information-theoretic Clustering

VAC

- Tells which grouping is better: lower VAC => better grouping
- Formula uses the decorrelation matrix
- Decorrelation matrix = matrix of eigenvectors

Page 5: Robust Information-theoretic Clustering

Computing VAC

Steps:

- Compute the covariance matrix of cluster C
- Compute PCA and obtain the eigenvector matrix
- Compute VAC from the matrix
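The steps above can be sketched in code. This is a simplified stand-in for VAC, not the paper's exact formula: it decorrelates the cluster with PCA and charges each coordinate its Gaussian code length at an assumed quantization width; the function name `vac_sketch` and the `grid` parameter are illustrative.

```python
import numpy as np

def vac_sketch(points, grid=1e-3):
    # Simplified stand-in for VAC (illustrative, not the paper's formula):
    # decorrelate with PCA, then charge each coordinate its Gaussian code
    # length at an assumed quantization width `grid`.
    X = np.asarray(points, dtype=float)
    mu = X.mean(axis=0)
    cov = np.cov(X - mu, rowvar=False)       # covariance matrix of the cluster
    _, vecs = np.linalg.eigh(cov)            # eigenvectors = decorrelation matrix
    Y = (X - mu) @ vecs                      # decorrelated coordinates
    sigma = Y.std(axis=0) + 1e-12
    pdf = np.exp(-0.5 * (Y / sigma) ** 2) / (sigma * np.sqrt(2.0 * np.pi))
    bits = -np.log2(np.clip(pdf * grid, 1e-300, 1.0))
    return bits.sum()                        # lower = cheaper to describe

# A tight cluster compresses better (lower cost) than a spread-out one:
rng = np.random.default_rng(0)
tight = rng.normal(0.0, 0.1, size=(200, 2))
loose = rng.normal(0.0, 10.0, size=(200, 2))
```

Under this toy cost, `vac_sketch(tight)` comes out lower than `vac_sketch(loose)`, matching the "lower VAC => better grouping" rule.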

Page 6: Robust Information-theoretic Clustering

Efficient algorithm

- Take an initial clustering given by any algorithm
- Refine that clustering to remove outliers/noise
- Output a better clustering by post-processing

Page 7: Robust Information-theoretic Clustering

Refining Clusters

- Use VAC to refine existing clusters
- Remove outliers from a given cluster C
- Define Core and Out as the sets of core points and outliers in C
- Initially, Out contains all points in C
- Arrange points in ascending order of their distance from the center
- Compute VAC
- Pick the closest point from Out and move it to Core
- Compute the new VAC
- If the new VAC increases, stop; else pick the next closest point and repeat
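The refinement loop above can be sketched as follows. `purify`, `core_cost`, and `noise_cost` are hypothetical names: `core_cost` stands for the cost of coding Core with the cluster model, `noise_cost` for the cost of coding Out as noise, and the toy cost functions in the demo are illustrative, not the paper's formulas.

```python
import numpy as np

def purify(points, core_cost, noise_cost):
    # Sketch of the slide's loop (hypothetical API): Core starts empty and
    # Out holds all points; points move to Core in ascending order of
    # distance from a robust center until the total VAC starts to increase.
    X = np.asarray(points, dtype=float)
    center = np.median(X, axis=0)
    X = X[np.argsort(np.linalg.norm(X - center, axis=1))]
    best_k, best_vac = 0, noise_cost(X)          # initially Out = all points
    for k in range(1, len(X) + 1):
        vac = core_cost(X[:k]) + noise_cost(X[k:])
        if vac > best_vac:                       # new VAC increased: stop
            break
        best_k, best_vac = k, vac
    return X[:best_k], X[best_k:]                # (Core, Out)

# Toy cost functions (illustrative, not the paper's formulas):
gauss = lambda pts: 0.0 if len(pts) == 0 else len(pts) * (1.0 + np.log(pts.var() + 1e-9))
flat = lambda pts: 8.0 * pts.size                # flat per-coordinate noise cost

rng = np.random.default_rng(1)
data = np.vstack([rng.normal(0.0, 0.1, size=(50, 1)), np.full((3, 1), 100.0)])
core, out = purify(data, gauss, flat)            # the 3 far points land in Out
```

The loop stops at the first cost increase, so the expensive-to-code far points stay in Out instead of inflating the cluster model.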

Page 8: Robust Information-theoretic Clustering

VAC and Robust estimation

- Conventional estimation: the covariance matrix uses the Mean
- Robust estimation: the covariance matrix uses the Median
- The Median is less affected by outliers than the Mean
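A minimal sketch of the median-based idea (this only robustifies the centering step; the paper's robust estimation goes further, and `robust_cov` is an illustrative name):

```python
import numpy as np

def robust_cov(points):
    # Sketch: center on the coordinate-wise Median instead of the Mean.
    # (Only the centering step is robustified here.)
    X = np.asarray(points, dtype=float)
    D = X - np.median(X, axis=0)
    return D.T @ D / len(X)

# The Median ignores a gross outlier that drags the Mean:
data = np.vstack([np.zeros((99, 1)), [[1000.0]]])
print(float(np.median(data)), float(data.mean()))   # 0.0 vs 10.0
```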

Page 9: Robust Information-theoretic Clustering

Sample Result

- Imperfect clusters formed by K-Means affect the purifying process
- May result in redundant clusters that could be merged

Page 10: Robust Information-theoretic Clustering

Cluster Merging

- Merge Ci and Cj only if the combined VAC decreases:

  savedCost(Ci, Cj) = VAC(Ci) + VAC(Cj) - VAC(Ci U Cj)

- If savedCost > 0, then merge Ci and Cj
- Greedy search to maximize savedCost, hence minimize VAC
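The greedy merging pass can be sketched as below; `merge_pass` and the toy `vac_toy` cost are hypothetical names, with any coding-cost function standing in for VAC.

```python
import numpy as np

def merge_pass(clusters, vac):
    # Greedy merging sketch: repeatedly merge the pair with the largest
    # positive savedCost = VAC(Ci) + VAC(Cj) - VAC(Ci U Cj); stop when no
    # merge lowers the total VAC.
    clusters = [np.asarray(c, dtype=float) for c in clusters]
    while len(clusters) > 1:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                union = np.vstack([clusters[i], clusters[j]])
                saved = vac(clusters[i]) + vac(clusters[j]) - vac(union)
                if saved > 0 and (best is None or saved > best[0]):
                    best = (saved, i, j)
        if best is None:
            break                    # no merge saves cost: done
        _, i, j = best
        merged = np.vstack([clusters[i], clusters[j]])
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)]
        clusters.append(merged)
    return clusters

# Toy VAC: fixed per-cluster header plus a variance-based point cost.
vac_toy = lambda c: 10.0 + len(c) * np.log(c.var() + 1.0)

rng = np.random.default_rng(2)
a1 = rng.normal(0.0, 0.1, size=(50, 1))    # two halves of one blob
a2 = rng.normal(0.0, 0.1, size=(50, 1))
b = rng.normal(100.0, 0.1, size=(50, 1))   # a distant blob
result = merge_pass([a1, a2, b], vac_toy)
```

With this toy cost, `a1` and `a2` merge (one header is saved and the variance barely changes), while merging `b` with the blob at 0 would inflate the variance cost, so it stays separate.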

Page 11: Robust Information-theoretic Clustering

Final Result

Page 12: Robust Information-theoretic Clustering

Experiment results

Page 13: Robust Information-theoretic Clustering

Example

Page 14: Robust Information-theoretic Clustering

Thank You

Questions?