RIC: Parameter-Free Noise-Robust Clustering

Intelligent Database Systems Lab, National Yunlin University of Science and Technology. RIC: Parameter-Free Noise-Robust Clustering. Presenter: Shu-Ya Li. Authors: CHRISTIAN BÖHM, CHRISTOS FALOUTSOS, JIA-YU PAN, CLAUDIA PLANT. TKDD, 2007.

Page 1: RIC:  Parameter-Free Noise-Robust Clustering

Intelligent Database Systems Lab

國立雲林科技大學 National Yunlin University of Science and Technology

RIC: Parameter-Free Noise-Robust Clustering

Presenter : Shu-Ya Li

Authors : CHRISTIAN BÖHM, CHRISTOS FALOUTSOS, JIA-YU PAN, CLAUDIA PLANT

TKDD, 2007

Page 2: RIC:  Parameter-Free Noise-Robust Clustering

Outline

Motivation

Objective

Methodology

Experiments and Results

Conclusion

Personal Comments

Page 3: RIC:  Parameter-Free Noise-Robust Clustering

Motivation

How can we find a natural clustering of a real-world point set that

contains an unknown number of clusters with different shapes, and

whose clusters may be contaminated by noise?

Page 4: RIC:  Parameter-Free Noise-Robust Clustering

Objectives

Find a natural clustering in a dataset

Goodness of a clustering

We use Volume after Compression (VAC) to quantify the 'goodness' of a grouping.

Efficient algorithm for a good clustering

Robust Fitting

Cluster Merging

MDL for classification, VAC for clustering

Page 5: RIC:  Parameter-Free Noise-Robust Clustering

VAC (Volume after Compression)

VAC tells which grouping is better

Lower VAC => better grouping

The formula uses the decorrelation matrix

Computing VAC

Compute the covariance matrix of cluster C

Compute PCA and obtain the decorrelation matrix

Compute VAC from the decorrelation matrix
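The first two steps above (covariance matrix, then PCA) can be sketched for a 2-D cluster in plain Python. This is only an illustration under assumptions: the function names are hypothetical, the decorrelation is taken to be the projection onto the eigenvectors of the covariance matrix, and the final step (VAC itself) is left out because the slides do not give its formula.

```python
import math


def covariance_2d(points):
    """Return the center and sample covariance entries (sxx, sxy, syy)."""
    n = len(points)
    mx = sum(x for x, _ in points) / n
    my = sum(y for _, y in points) / n
    sxx = sum((x - mx) ** 2 for x, _ in points) / (n - 1)
    syy = sum((y - my) ** 2 for _, y in points) / (n - 1)
    sxy = sum((x - mx) * (y - my) for x, y in points) / (n - 1)
    return (mx, my), (sxx, sxy, syy)


def pca_2d(cov):
    """Closed-form eigen-decomposition of a symmetric 2x2 matrix.

    Returns the eigenvalues (largest first) and the matching
    orthonormal eigenvectors.
    """
    sxx, sxy, syy = cov
    # Rotation angle that diagonalizes the matrix.
    theta = 0.5 * math.atan2(2.0 * sxy, sxx - syy)
    c, s = math.cos(theta), math.sin(theta)
    lam1 = sxx * c * c + 2.0 * sxy * c * s + syy * s * s
    lam2 = sxx * s * s - 2.0 * sxy * c * s + syy * c * c
    return (lam1, lam2), ((c, s), (-s, c))


def decorrelate(points):
    """Center the points and project them onto the eigenvectors."""
    (mx, my), cov = covariance_2d(points)
    eigenvalues, (v1, v2) = pca_2d(cov)
    projected = [
        ((x - mx) * v1[0] + (y - my) * v1[1],
         (x - mx) * v2[0] + (y - my) * v2[1])
        for x, y in points
    ]
    return eigenvalues, projected
```

After `decorrelate`, the coordinates of the projected points are uncorrelated, so each coordinate can be modeled and encoded on its own.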

Page 6: RIC:  Parameter-Free Noise-Robust Clustering

Computing VAC

VAC (volume after compression)

The bytes to record the distribution type (Gaussian, uniform, ...)

The bytes to record the number of clusters k

The bytes to describe the parameters of each distribution (e.g., mean, variance, covariance, slope, intercept) and then the location of each point

Cluster model

stat = (μi, σi, lbi, ubi, ...)

2.3 + 4.3 = 6.6 bits
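The sum above reads as two per-coordinate coding costs added into one cost for a point. A minimal sketch, assuming the usual Shannon cost of -log2(p) bits for an outcome of probability p. The probabilities 0.2 and 0.05 are chosen here only because they give roughly 2.3 and 4.3 bits; they do not come from the slides, and the function names are hypothetical.

```python
import math


def coding_cost_bits(probability):
    """Shannon coding cost, in bits, of an outcome with this probability."""
    if not 0.0 < probability <= 1.0:
        raise ValueError("probability must be in (0, 1]")
    return -math.log2(probability)


def point_cost_bits(coordinate_probabilities):
    """Cost of a point whose coordinates are encoded independently."""
    return sum(coding_cost_bits(p) for p in coordinate_probabilities)


# Illustrative probabilities only (an assumption, not from the slides).
total = point_cost_bits([0.2, 0.05])
print(round(total, 1))  # 6.6
```

A point in a low-probability region of its cluster model costs more bits, which is why a poorly fitting model raises the total.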

Page 7: RIC:  Parameter-Free Noise-Robust Clustering

Methodology – RIC framework

Robust Fitting

Mahalanobis distance defined by Λ and V

Conventional estimation: the covariance matrix uses the mean

Robust estimation: the covariance matrix uses the median

The median is less affected by outliers than the mean

PCA (Σ = VΛV^T)

[Figure: a cluster with the points labeled median, μ, and μR]

Page 8: RIC:  Parameter-Free Noise-Robust Clustering

Methodology – RIC framework

Cluster Merging

Merge Ci and Cj only if the combined VAC decreases

If savedCost > 0, then merge Ci and Cj

Greedy search to maximize savedCost and hence minimize VAC
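The merging rule can be sketched as a greedy loop. The sketch assumes savedCost is VAC(Ci) + VAC(Cj) - VAC(Ci ∪ Cj) and takes the cost function as a parameter, since the slides do not give the VAC formula. `toy_vac` is a made-up stand-in for 1-D points, used only to make the example runnable.

```python
import math
from itertools import combinations


def saved_cost(ci, cj, vac):
    """Bits saved by encoding Ci and Cj as one cluster instead of two."""
    return vac(ci) + vac(cj) - vac(ci + cj)


def greedy_merge(clusters, vac):
    """Repeatedly merge the pair with the largest positive saved cost."""
    clusters = [list(c) for c in clusters]
    while len(clusters) > 1:
        best_gain, best_pair = 0.0, None
        for i, j in combinations(range(len(clusters)), 2):
            gain = saved_cost(clusters[i], clusters[j], vac)
            if gain > best_gain:
                best_gain, best_pair = gain, (i, j)
        if best_pair is None:
            break  # no merge lowers the total cost any further
        i, j = best_pair
        merged = clusters[i] + clusters[j]
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)]
        clusters.append(merged)
    return clusters


def toy_vac(cluster):
    """Hypothetical cost: fixed model overhead plus bits per point by spread."""
    spread = max(cluster) - min(cluster)
    return 10.0 + len(cluster) * math.log2(1.0 + spread)


result = greedy_merge([[0, 1], [2, 3], [100, 101]], toy_vac)
```

With this toy cost, the two neighboring clusters are merged because sharing one model overhead saves bits, while the distant cluster stays separate because merging it would make every point more expensive to encode.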

Page 9: RIC:  Parameter-Free Noise-Robust Clustering

Experiments

Results on Synthetic Data

Page 10: RIC:  Parameter-Free Noise-Robust Clustering

Experiments

Performance on Real Data

Page 11: RIC:  Parameter-Free Noise-Robust Clustering

Experiments

Comparison of the result of filterOpt with the result of filterDist.

Page 12: RIC:  Parameter-Free Noise-Robust Clustering

Conclusion

The contributions of this work are the answers to two questions, organized in the RIC framework.

(Q1) Goodness measure.

We propose the VAC criterion using information-theory concepts, specifically the volume after compression.

(Q2) Efficiency.

Robust fitting (RF) algorithm, which carefully avoids outliers.

Cluster merging (CM) algorithm, which stitches clusters together if the stitching gives a better VAC score.

Page 13: RIC:  Parameter-Free Noise-Robust Clustering

Personal Comments

Advantages

Detailed description

Many pictures and examples

Drawback

The black-and-white pictures are difficult to read.

Application

Clustering