unsupervised evolutionary clustering algorithm for mixed type data

Intelligent Database Systems Lab

National Yunlin University of Science and Technology

Unsupervised Evolutionary Clustering Algorithm for Mixed Type Data

Zhi Zheng, Maoguo Gong, Jingjing Ma, Licheng Jiao, Qiaodi Wu

CEC 2010

Presented by Chien-Hao Kung, 2011/12/1

N.Y.U.S.T.

I. M.

Outline

· Motivation
· Objectives
· Methodology
· Experiments
· Conclusions
· Comments

Motivation

· K-prototype (KP) is a well-known partitional clustering algorithm for mixed type data.

· However, it is sensitive to initialization and easily converges to a local optimum.

Objectives

· In this study, KP is applied as a local search strategy that runs inside a global (evolutionary) search, which helps KP overcome these flaws.

Methodology

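· K-prototype algorithm
─ Step 1. Initialize the K prototypes.
─ Step 2. For each data item, calculate its distance to every prototype and assign it to the nearest one.
─ Step 3. Retest every data item against the current prototypes and move it if another prototype is now nearer.
─ Step 4. Repeat Step 3 until no item changes its cluster.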
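
The four steps above can be sketched in Python as follows. This is a minimal illustration, not the authors' implementation: it assumes the usual k-prototypes dissimilarity (squared Euclidean distance on numerical attributes plus a weight times the number of categorical mismatches), it assumes the numerical attributes come first in every item, and it updates prototypes once per pass rather than after every single reassignment. The names `mixed_distance`, `update_prototype`, `k_prototypes`, `n_num` and `gamma` are chosen here for illustration only.

```python
import random

def mixed_distance(item, proto, n_num, gamma):
    """Squared Euclidean distance on the first n_num (numerical) attributes
    plus gamma times the number of mismatching categorical attributes."""
    num = sum((a - b) ** 2 for a, b in zip(item[:n_num], proto[:n_num]))
    cat = sum(a != b for a, b in zip(item[n_num:], proto[n_num:]))
    return num + gamma * cat

def update_prototype(members, n_num):
    """Mean of each numerical attribute, most frequent value of each categorical one."""
    cols = list(zip(*members))
    means = [sum(c) / len(c) for c in cols[:n_num]]
    modes = [max(set(c), key=c.count) for c in cols[n_num:]]
    return means + modes

def k_prototypes(data, k, n_num, gamma, max_iter=100, seed=0):
    rng = random.Random(seed)
    # Step 1: initialize with k distinct data items (one possible choice).
    protos = [list(p) for p in rng.sample(data, k)]
    labels = [None] * len(data)
    for _ in range(max_iter):
        changed = False
        # Steps 2-3: (re)assign every item to its nearest prototype.
        for i, item in enumerate(data):
            best = min(range(k),
                       key=lambda j: mixed_distance(item, protos[j], n_num, gamma))
            if best != labels[i]:
                labels[i] = best
                changed = True
        # Step 4: stop once no item changes its cluster.
        if not changed:
            break
        # Batch prototype update (assumption: done once per pass).
        for j in range(k):
            members = [d for d, lab in zip(data, labels) if lab == j]
            if members:  # keep the old prototype if a cluster is empty
                protos[j] = update_prototype(members, n_num)
    return labels, protos
```

Because the result depends on the prototypes picked in Step 1, different seeds can end in different local optima, which is the weakness the slides attribute to KP.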

· Evolutionary k-prototype (EKP)
─ Step 1. Initialization.
─ Step 2. Crossover.
─ Step 3. Mutation.
─ Step 4. KP Search.
─ Step 5. Evaluation and Selection.
─ Step 6. Termination Test.
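
The six steps form a loop in which a local search refines every offspring before selection. The skeleton below is a hedged sketch of that loop, with the operators passed in as functions. The slides do not say how parents are picked, how survivors are selected, or what the termination test is, so the random parent choice, the elitist "keep the best `pop_size` of parents plus children" rule and the fixed generation budget are all assumptions, and `evolve` and its parameter names are hypothetical.

```python
import random

def evolve(init, crossover, mutate, local_search, cost,
           pop_size, p_cross, p_mut, generations, seed=0):
    """Generic evolutionary loop with a local-search step (pop_size >= 2)."""
    rng = random.Random(seed)
    # Step 1: initialization.
    pop = [init(rng) for _ in range(pop_size)]
    # Step 6: termination test (assumption: fixed number of generations).
    for _ in range(generations):
        children = []
        for _ in range(pop_size):
            a, b = rng.sample(pop, 2)  # assumption: parents chosen at random
            # Step 2: crossover with probability p_cross.
            child = crossover(a, b, rng) if rng.random() < p_cross else list(a)
            # Step 3: mutation with probability p_mut.
            if rng.random() < p_mut:
                child = mutate(child, rng)
            # Step 4: local search (KP would fill this slot in EKP).
            children.append(local_search(child))
        # Step 5: evaluation and selection (assumption: elitist truncation).
        pop = sorted(pop + children, key=cost)[:pop_size]
    return min(pop, key=cost)
```

In an EKP-style use, an individual would encode K prototypes, `local_search` would run KP from those prototypes, and `cost` would be the clustering objective. With the elitist rule assumed here, the best cost never gets worse from one generation to the next.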

· Initialization: eight parameters have to be set before evolution.
─ Cluster number
─ r, a weight in EKP that balances the influence of numerical and categorical attributes on clustering
─ Population size
─ Proportion of initial individuals generated by choosing items randomly from the dataset (IP)
─ Crossover probability
─ Mutation probability
─ Distribution index of simulated binary crossover (SBX)
─ Distribution index n of polynomial mutation

─ Two kinds of random initialization schemes are used.
─ The first randomly chooses K data items as the cluster prototypes.
─ The second randomly generates K prototypes.

Ex: from the numerical ranges [2.23,5.63], [6.56,5.13] and the categorical value sets {1,2,3,4,5,6}, {2,4}, one generated prototype is {3.21,6.23,2,4}.
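
A short Python sketch of the two schemes follows. It assumes each data item lists its numerical attributes first, and that "randomly generating" means drawing each numerical value uniformly between the observed minimum and maximum and each categorical value from the values seen in the data; the slides only show an example, so that reading is an assumption. The function names and the `ip` argument (the proportion of individuals built from data items) are illustrative.

```python
import random

def init_from_data(data, k, rng):
    """Scheme 1: pick k distinct data items as prototypes."""
    return [list(item) for item in rng.sample(data, k)]

def init_random(data, k, n_num, rng):
    """Scheme 2: generate k prototypes at random (assumed: uniform within
    the observed range, categorical values drawn from the observed set)."""
    cols = list(zip(*data))
    protos = []
    for _ in range(k):
        num = [rng.uniform(min(c), max(c)) for c in cols[:n_num]]
        cat = [rng.choice(sorted(set(c))) for c in cols[n_num:]]
        protos.append(num + cat)
    return protos

def init_population(data, k, n_num, pop_size, ip, rng):
    """A fraction ip of the individuals uses scheme 1, the rest scheme 2."""
    n_from_data = round(ip * pop_size)
    return [init_from_data(data, k, rng) if i < n_from_data
            else init_random(data, k, n_num, rng)
            for i in range(pop_size)]
```

Mixing the two schemes gives the population some individuals that start on real data items and some that start elsewhere in the attribute space.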

· Crossover
─ Numerical type: simulated binary crossover (SBX)
─ Categorical type: single-point crossover
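
Both operators can be sketched briefly in Python. The SBX code uses the standard spread-factor formulation with a distribution index, called `eta` here; how the deck's chromosomes are laid out is not shown, so treating the numerical genes and the categorical genes as two separate flat lists is an assumption, as are the function names.

```python
import random

def sbx(p1, p2, eta, rng):
    """Simulated binary crossover on two equal-length lists of numbers."""
    c1, c2 = [], []
    for x1, x2 in zip(p1, p2):
        u = rng.random()  # u in [0, 1)
        if u <= 0.5:
            beta = (2 * u) ** (1 / (eta + 1))
        else:
            beta = (1 / (2 * (1 - u))) ** (1 / (eta + 1))
        # Children are spread symmetrically around the parents' midpoint.
        c1.append(0.5 * ((1 + beta) * x1 + (1 - beta) * x2))
        c2.append(0.5 * ((1 - beta) * x1 + (1 + beta) * x2))
    return c1, c2

def single_point(p1, p2, rng):
    """Single-point crossover on two equal-length lists of categorical genes."""
    if len(p1) < 2:
        return list(p1), list(p2)  # nothing to cut
    cut = rng.randrange(1, len(p1))  # cut point in 1 .. len-1
    return p1[:cut] + p2[cut:], p2[:cut] + p1[cut:]
```

A larger `eta` keeps SBX children closer to their parents, and a smaller one spreads them further apart. Single-point crossover only recombines values already present in the parents, so categorical genes stay inside their domains.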

· Mutation
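
The transcript keeps no detail for this slide beyond the parameter list, which mentions polynomial mutation. The sketch below therefore shows the basic textbook form of polynomial mutation for numerical genes, clipped to the attribute bounds. The random reset used for categorical genes is purely an assumption, since the deck does not say how they are mutated, and `polynomial_mutation`, `mutate`, `bounds` and `domains` are illustrative names.

```python
import random

def polynomial_mutation(x, lo, hi, eta, rng):
    """Basic polynomial mutation of one numerical gene, clipped to [lo, hi]."""
    u = rng.random()
    if u < 0.5:
        delta = (2 * u) ** (1 / (eta + 1)) - 1       # delta in [-1, 0)
    else:
        delta = 1 - (2 * (1 - u)) ** (1 / (eta + 1))  # delta in [0, 1)
    return min(hi, max(lo, x + delta * (hi - lo)))

def mutate(proto, bounds, domains, p_mut, eta, rng):
    """Mutate one prototype: numerical genes first, then categorical genes."""
    n_num = len(bounds)
    out = list(proto)
    for i, (lo, hi) in enumerate(bounds):
        if rng.random() < p_mut:
            out[i] = polynomial_mutation(out[i], lo, hi, eta, rng)
    for j, domain in enumerate(domains):
        if rng.random() < p_mut:
            # Assumption: a categorical gene is reset to a random domain value.
            out[n_num + j] = rng.choice(domain)
    return out
```

A larger `eta` concentrates mutated values near the original gene, so it controls how local the perturbation is.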

· KP Search

· Evaluation and Selection

· Termination Test

Experiments

· Parameter setting

· Dataset

Conclusions

· This paper proposes a novel unsupervised clustering algorithm for mixed type data, named evolutionary k-prototype (EKP).

· The experimental results show that the evolutionary framework markedly improves on the original algorithms.

· A version of EKP that can adjust the weight r automatically still needs to be studied.

Comments

· Drawback
─ This method requires too many parameters.

· Application
─ Clustering
