A Link-Based Cluster Ensemble Approach for Categorical Data Clustering

Intelligent Database Systems. Presenter: Jian-Ren Chen. Authors: Natthakan Iam-On, Tossapon Boongoen, Simon Garrett, and Chris Price. 2012, IEEE. A Link-Based Cluster Ensemble Approach for Categorical Data Clustering

Post on 22-Feb-2016

TRANSCRIPT

Page 1: A Link-Based Cluster Ensemble Approach for Categorical Data Clustering

Intelligent Database Systems Lab

Presenter: Jian-Ren Chen

Authors: Natthakan Iam-On, Tossapon Boongoen, Simon Garrett, and Chris Price

2012, IEEE

A Link-Based Cluster Ensemble Approach for Categorical Data Clustering

Page 2: A Link-Based Cluster Ensemble Approach for Categorical Data Clustering

Outlines

• Motivation
• Objectives
• Methodology
• Experiments
• Conclusions
• Comments

Page 3: A Link-Based Cluster Ensemble Approach for Categorical Data Clustering

Motivation

• Cluster ensembles: combine different clustering decisions in such a way as to achieve accuracy superior to that of any individual clustering.

Page 4: A Link-Based Cluster Ensemble Approach for Categorical Data Clustering

Objectives

• A new link-based approach improves the conventional binary cluster-association matrix by discovering its unknown entries through the similarity between clusters in an ensemble.
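To make the "conventional matrix" concrete, here is a minimal sketch of how a binary cluster-association matrix can be built from an ensemble. The function name, the input format (one integer label array per base clustering), and the toy data are assumptions for illustration and do not come from the slides. Each column stands for one cluster of one base clustering; an entry of 0 is the "unknown" association that the link-based approach aims to refine.

```python
import numpy as np

def binary_association_matrix(ensemble):
    """Build a 0/1 point-by-cluster matrix from a list of base clusterings.

    ensemble: list of 1-D integer label arrays, one per base clustering
              (hypothetical input format, not taken from the slides).
    Returns the matrix and, per column, the (clustering index, label) pair.
    """
    columns, owners = [], []
    for m, labels in enumerate(ensemble):
        labels = np.asarray(labels)
        for lab in np.unique(labels):
            # 1 where the point belongs to this cluster, 0 otherwise
            columns.append((labels == lab).astype(float))
            owners.append((m, int(lab)))
    return np.column_stack(columns), owners

# Toy ensemble: two base clusterings of five data points (made-up labels).
ensemble = [np.array([0, 0, 1, 1, 1]), np.array([0, 1, 1, 2, 2])]
bm, owners = binary_association_matrix(ensemble)
print(bm)
```

Every row contains exactly one 1 per base clustering; all other entries are 0 regardless of how similar the clusters are, which is the information gap the Objectives slide refers to.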

Page 5: A Link-Based Cluster Ensemble Approach for Categorical Data Clustering

Methodology

Creating a Cluster Ensemble

Generating a Refined Matrix

Applying a Consensus Function to RM

Page 6: A Link-Based Cluster Ensemble Approach for Categorical Data Clustering

Methodology

Creating a Cluster Ensemble

• Type I (Direct ensemble)
• Type II (Full-space ensemble)
• Type III (Subspace ensemble)

Generating a Refined Matrix

Applying a Consensus Function to RM

Page 7: A Link-Based Cluster Ensemble Approach for Categorical Data Clustering

Methodology

Creating a Cluster Ensemble

Generating a Refined Matrix

Applying a Consensus Function to RM

Page 8: A Link-Based Cluster Ensemble Approach for Categorical Data Clustering

Methodology

Creating a Cluster Ensemble

Generating a Refined Matrix

Applying a Consensus Function to RM

Page 9: A Link-Based Cluster Ensemble Approach for Categorical Data Clustering

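Methodology

Creating a Cluster Ensemble

Generating a Refined Matrix

Applying a Consensus Function to RM

• Given a graph G = (V, W), where V is the set of vertices and W is the matrix of edge weights
• SPEC finds the eigenvectors of W that correspond to the K largest eigenvalues
• These eigenvectors form the columns of another matrix U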
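The SPEC step can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: it assumes the refined matrix RM is turned into a weighted bipartite graph between data points and clusters, that the rows of U are normalized to unit length, and that a plain k-means groups the embedded rows. The helper names `spectral_embedding` and `kmeans`, the farthest-first initialization, and the toy `rm` are all hypothetical.

```python
import numpy as np

def spectral_embedding(W, K):
    """Return U: one K-dimensional, unit-length row per graph vertex."""
    W = np.asarray(W, dtype=float)
    # eigh handles symmetric matrices and returns eigenvalues in ascending order
    _, eigvecs = np.linalg.eigh(W)
    U = eigvecs[:, -K:]  # eigenvectors of the K largest eigenvalues
    norms = np.linalg.norm(U, axis=1, keepdims=True)
    return U / np.where(norms == 0, 1.0, norms)  # row normalization (assumed)

def kmeans(points, K, n_iter=20):
    """Tiny k-means with deterministic farthest-first initialization."""
    centres = [points[0]]
    for _ in range(1, K):
        d = np.min([np.sum((points - c) ** 2, axis=1) for c in centres], axis=0)
        centres.append(points[np.argmax(d)])
    centres = np.array(centres)
    for _ in range(n_iter):
        dist = ((points[:, None, :] - centres[None, :, :]) ** 2).sum(axis=2)
        labels = dist.argmin(axis=1)
        for k in range(K):
            if np.any(labels == k):
                centres[k] = points[labels == k].mean(axis=0)
    return labels

# Toy refined matrix: 4 data points (rows) by 2 clusters (columns), made up.
rm = np.array([[1, 0], [1, 0], [0, 1], [0, 1]], dtype=float)
n, c = rm.shape

# Assumed bipartite graph: points and clusters are vertices, RM holds the weights.
W = np.zeros((n + c, n + c))
W[:n, n:] = rm
W[n:, :n] = rm.T

U = spectral_embedding(W, K=2)
labels = kmeans(U, K=2)[:n]  # keep only the data-point vertices
print(labels)
```

Using `eigh` rather than a general eigensolver relies on W being symmetric, which holds for the bipartite construction assumed here.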

Page 10: A Link-Based Cluster Ensemble Approach for Categorical Data Clustering

Experiments

• Investigated Data Sets

Page 11: A Link-Based Cluster Ensemble Approach for Categorical Data Clustering

Experiments

Page 12: A Link-Based Cluster Ensemble Approach for Categorical Data Clustering

Experiments

Page 13: A Link-Based Cluster Ensemble Approach for Categorical Data Clustering

Experiments

Page 14: A Link-Based Cluster Ensemble Approach for Categorical Data Clustering

Conclusions

• The RM is constructed efficiently from the similarity among categorical labels, computed with the Weighted Triple-Quality (WTQ) similarity algorithm.

• The link-based method usually achieves superior clustering results.
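The slides do not spell out the WTQ computation, so the following is only a sketch of one plausible reading: clusters are graph vertices, the edge weight between two clusters is the Jaccard overlap of their members, and two clusters are similar when they share neighbours, with low-weight neighbours counting more. The function name `refined_matrix`, the decay factor `dc` and its default of 0.9, and the toy ensemble are assumptions made for illustration.

```python
import numpy as np

def refined_matrix(ensemble, dc=0.9):
    """Sketch of a refined matrix (RM) built from WTQ-style cluster similarity."""
    ensemble = [np.asarray(labels) for labels in ensemble]
    members, owner = [], []
    for m, labels in enumerate(ensemble):
        for lab in np.unique(labels):
            members.append(labels == lab)  # membership of one cluster
            owner.append(m)                # which base clustering it is from
    members = np.array(members, dtype=float)  # (n_clusters, n_points)
    owner = np.array(owner)

    # Edge weight between clusters: Jaccard overlap of their member sets.
    inter = members @ members.T
    sizes = members.sum(axis=1)
    w = inter / (sizes[:, None] + sizes[None, :] - inter)
    np.fill_diagonal(w, 0.0)

    # WTQ: sum over shared neighbours k of 1 / (total edge weight of k).
    total = w.sum(axis=1)
    inv = np.divide(1.0, total, out=np.zeros_like(total), where=total > 0)
    adj = (w > 0).astype(float)
    wtq = (adj * inv[None, :]) @ adj.T
    np.fill_diagonal(wtq, 0.0)
    sim = dc * wtq / wtq.max() if wtq.max() > 0 else wtq

    # RM entry: 1 for the point's own cluster; otherwise the similarity between
    # that cluster and the point's own cluster in the same base clustering.
    n = ensemble[0].size
    rm = np.zeros((n, len(owner)))
    for j in range(len(owner)):
        for i in range(n):
            if members[j, i]:
                rm[i, j] = 1.0
            else:
                own = np.flatnonzero((owner == owner[j]) & (members[:, i] > 0))[0]
                rm[i, j] = sim[j, own]
    return rm

# Toy ensemble: two base clusterings of five data points (made-up labels).
rm = refined_matrix([[0, 0, 1, 1, 1], [0, 1, 1, 2, 2]])
print(np.round(rm, 3))
```

Entries that were 1 in the binary matrix stay 1, while former 0 entries take a value of at most `dc`, so a point's own cluster always keeps the strongest association.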

Page 15: A Link-Based Cluster Ensemble Approach for Categorical Data Clustering

Comments

• Advantages
– The link-based method is efficient.

• Applications
– Categorical data clustering