Fully Automatic Clustering System

Intelligent Database Systems Lab, National Yunlin University of Science and Technology. Advisor: Dr. Hsu. Graduate: Sheng-Hsuan Wang. Authors: Giuseppe Patane, Marco Russo. Department of Information Management. Fully Automatic Clustering System. IEEE Transactions on Neural Networks, vol. 13, no. 6, November 2002.

DESCRIPTION

Fully Automatic Clustering System. Advisor: Dr. Hsu. Graduate: Sheng-Hsuan Wang. Authors: Giuseppe Patane, Marco Russo. Department of Information Management. IEEE Transactions on Neural Networks, vol. 13, no. 6, November 2002.

TRANSCRIPT

Page 1: Fully Automatic Clustering System

Intelligent Database Systems Lab

National Yunlin University of Science and Technology (國立雲林科技大學)

Advisor : Dr. Hsu

Graduate : Sheng-Hsuan Wang

Authors : Giuseppe Patane

Marco Russo

Department of Information Management

Fully Automatic Clustering System

IEEE Transactions on Neural Networks, vol. 13, no. 6, November 2002

Page 2: Fully Automatic Clustering System

Outline

Motivation

Objective

Introduction

VQ

Previous Works: ELBG

FACS

Results

Conclusion

Personal Opinion

Review

Page 3: Fully Automatic Clustering System

Motivation

Can clustering be made fully automatic, so that the codebook size does not have to be fixed in advance?

How can the number of computations per iteration be kept low?

Page 4: Fully Automatic Clustering System

Objective

In this paper, the fully automatic clustering system (FACS) is presented.

The objective is to automatically compute a codebook of the right dimension (number of codewords) once the desired error is fixed.

To save on the number of computations per iteration, greedy techniques are adopted.

Page 5: Fully Automatic Clustering System

Introduction

Cluster analysis (CA, or clustering) and vector quantization (VQ).

The data are divided into groups (or cells). Each cell is represented by a vector called a codeword. The set of the codewords is called the codebook.

The difference between CA and VQ. Grouping data into a certain number of groups so that a loss (or error) function is minimized.

Page 6: Fully Automatic Clustering System

Clustering and VQ

Page 7: Fully Automatic Clustering System

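VQ-Definition

The objective of VQ is the representation of a set of feature vectors $x \in X \subseteq \mathbb{R}^k$ by a set, $Y = \{y_1, \ldots, y_{N_C}\}$, of $N_C$ reference vectors in $\mathbb{R}^k$. With $q(x)$ denoting the codeword assigned to $x$, the cells are

$$S_i = \{x \in X : q(x) = y_i\}, \quad i = 1, \ldots, N_C \qquad (1)$$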

Page 8: Fully Automatic Clustering System

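VQ-Quantization Error (QE)

Square error (SE):

$$d(x, y) = \sum_{i=1}^{k} (x_i - y_i)^2 \qquad (2)$$

Weighted square error (WSE):

$$d(x, y) = \sum_{i=1}^{k} w_i (x_i - y_i)^2 \qquad (3)$$

Mean quantization error (MQE), where $N_P$ is the number of input vectors:

$$MQE \equiv D(\{Y, S\}) = \frac{1}{N_P} \sum_{p=1}^{N_P} d(x_p, q(x_p)) = \frac{1}{N_P} \sum_{i=1}^{N_C} D_i \qquad (4)$$

$$D_i = \sum_{n:\, x_n \in S_i} d(x_n, y_i) \qquad (5)$$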
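The error measures of Eqs. (2) to (5) can be sketched in a few lines of NumPy. This is only an illustration: the function names, the `labels` array (cell index of each input vector), and the toy data are assumptions made here, not part of the slides.

```python
import numpy as np

def square_error(x, y):
    """Square error (SE), Eq. (2): sum of squared component differences."""
    x, y = np.asarray(x, dtype=float), np.asarray(y, dtype=float)
    return float(np.sum((x - y) ** 2))

def weighted_square_error(x, y, w):
    """Weighted square error (WSE), Eq. (3): each component has a weight w_i."""
    x, y, w = (np.asarray(a, dtype=float) for a in (x, y, w))
    return float(np.sum(w * (x - y) ** 2))

def mean_quantization_error(X, Y, labels):
    """MQE with SE, Eqs. (4)-(5): per-cell distortions D_i, summed and divided by N_P."""
    X, Y = np.asarray(X, dtype=float), np.asarray(Y, dtype=float)
    labels = np.asarray(labels)
    # D_i: total distortion of the vectors assigned to codeword y_i (0 for an empty cell).
    cell_distortions = np.array(
        [np.sum((X[labels == i] - Y[i]) ** 2) for i in range(len(Y))]
    )
    return float(cell_distortions.sum() / len(X)), cell_distortions

# Toy data (hypothetical): three 2-D vectors, two codewords.
X = np.array([[0.0, 0.0], [0.0, 2.0], [4.0, 0.0]])
Y = np.array([[0.0, 1.0], [4.0, 0.0]])
mqe, D = mean_quantization_error(X, Y, labels=[0, 0, 1])
```

Returning the per-cell distortions alongside the MQE is a convenience: the ELBG and FACS steps later in the deck compare each $D_i$ with the mean distortion.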

Page 9: Fully Automatic Clustering System

VQ-Nearest neighbor condition (NNC)

Nearest neighbor condition (NNC): Given a fixed codebook Y, the NNC consists in assigning to each input vector the nearest codeword.

$$\bar{S}_i = \{x \in X : d(x, y_i) \le d(x, y_j),\; j = 1, \ldots, N_C,\; j \ne i\}, \quad i = 1, \ldots, N_C \qquad (6)$$

The sets just defined constitute a partition of the input data set. This is called the Voronoi partition and is denoted by the symbol $P(Y) = \{\bar{S}_1, \ldots, \bar{S}_{N_C}\}$.

For every partition S of the input data set,

$$D(\{Y, S\}) \ge D(\{Y, P(Y)\}) \qquad (7)$$
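As a minimal sketch of the NNC of Eq. (6), assuming the SE as distance, each input vector can be assigned to its nearest codeword with a broadcasted distance matrix. The function name and the toy data are illustrative assumptions; ties are broken here in favor of the lowest codeword index, a choice the slides do not specify.

```python
import numpy as np

def voronoi_partition(X, Y):
    """Return, for each input vector, the index of its nearest codeword (SE distance)."""
    X, Y = np.asarray(X, dtype=float), np.asarray(Y, dtype=float)
    # Pairwise squared distances, shape (N_P, N_C).
    d = np.sum((X[:, None, :] - Y[None, :, :]) ** 2, axis=2)
    # argmin picks the first minimum, so ties go to the lowest index.
    return np.argmin(d, axis=1)

# Toy data (hypothetical): two well-separated groups and two codewords.
X = np.array([[0.0, 0.0], [0.2, 0.1], [5.0, 5.0], [4.8, 5.1]])
Y = np.array([[0.0, 0.0], [5.0, 5.0]])
labels = voronoi_partition(X, Y)
```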

Page 10: Fully Automatic Clustering System

VQ-Centroid condition (CC)

Centroid condition (CC): Given a fixed partition S, the CC concerns the procedure for finding the optimal codebook.

$$\bar{x}(A) = \frac{1}{N_A} \sum_{x \in A} x \qquad (8)$$

$$\bar{X}(S) = \{\bar{x}(S_i);\; i = 1, \ldots, N_C\} \qquad (9)$$

For every codebook Y,

$$D(\{Y, S\}) \ge D(\{\bar{X}(S), S\}) \qquad (10)$$
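The centroid condition of Eqs. (8) and (9) amounts to replacing every codeword with the mean of its cell. The sketch below assumes cells are given as a `labels` array; how empty cells are treated is not stated in the slides, so leaving them at the origin is purely an assumption of this example.

```python
import numpy as np

def centroid_codebook(X, labels, n_codewords):
    """Codebook built from cell centroids, Eqs. (8)-(9)."""
    X = np.asarray(X, dtype=float)
    labels = np.asarray(labels)
    Y = np.zeros((n_codewords, X.shape[1]))
    for i in range(n_codewords):
        members = X[labels == i]
        if len(members) > 0:
            Y[i] = members.mean(axis=0)  # x_bar(S_i)
        # Assumption: an empty cell keeps the zero vector in this sketch.
    return Y

# Toy data (hypothetical).
X = np.array([[0.0, 0.0], [0.0, 2.0], [4.0, 0.0]])
Y_new = centroid_codebook(X, labels=[0, 0, 1], n_codewords=2)
```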

Page 11: Fully Automatic Clustering System

Previous Works: ELBG

The starting point of the research reported in this paper was the authors' previous work: the ELBG [39].

1. Initialization.
2. Partition calculation, according to the NNC (6).
3. Termination condition check, based on the relative distortion change $|D_{prev} - D_{curr}| / D_{prev}$.
4. ELBG-block execution.
5. New codebook calculation, according to the CC (9).
6. Return to Step 2.
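The iteration above can be sketched as a loop. This is a hedged outline, not the authors' implementation: the threshold `eps`, the iteration cap `max_iter`, the use of SE, and the `block` hook standing in for the ELBG-block are all assumptions introduced here. With `block=None` the loop reduces to plain NNC/CC alternation.

```python
import numpy as np

def nnc(X, Y):
    """Step 2: nearest-codeword labels and the resulting MQE (SE distance)."""
    d = np.sum((X[:, None, :] - Y[None, :, :]) ** 2, axis=2)
    return np.argmin(d, axis=1), float(d.min(axis=1).mean())

def cc(X, labels, Y):
    """Step 5: move each codeword to the centroid of its cell (empty cells unchanged)."""
    Y_new = Y.copy()
    for i in range(len(Y)):
        members = X[labels == i]
        if len(members) > 0:
            Y_new[i] = members.mean(axis=0)
    return Y_new

def elbg_like_loop(X, Y0, eps=1e-4, max_iter=100, block=None):
    X = np.asarray(X, dtype=float)
    Y = np.asarray(Y0, dtype=float).copy()          # Step 1: initialization
    d_prev = None
    for _ in range(max_iter):
        labels, d_curr = nnc(X, Y)                  # Step 2: partition via the NNC
        # Step 3: stop on zero distortion or a small relative change (eps is assumed).
        if d_curr == 0.0:
            break
        if d_prev is not None and abs(d_prev - d_curr) / d_prev < eps:
            break
        if block is not None:                       # Step 4: hypothetical ELBG-block hook
            Y, labels = block(X, Y, labels)
        Y = cc(X, labels, Y)                        # Step 5: new codebook via the CC
        d_prev = d_curr
    labels, d_final = nnc(X, Y)                     # labels consistent with the final Y
    return Y, labels, d_final

# Toy data (hypothetical): two groups, deliberately poor initial codewords.
X = np.array([[0.0, 0.0], [0.0, 1.0], [10.0, 10.0], [10.0, 11.0]])
Y, labels, mqe = elbg_like_loop(X, Y0=[[0.0, 0.0], [1.0, 1.0]])
```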

Page 12: Fully Automatic Clustering System

A. ELBG-Block

The basic idea of the ELBG-block: a low-distortion cell is joined with a cell adjacent to it, and a high-distortion cell is split into two smaller ones.

The mean distortion per cell, $D_{mean}$, is defined as

$$D_{mean} = \frac{1}{N_C} \sum_{i=1}^{N_C} D_i \qquad (11)$$

Page 13: Fully Automatic Clustering System

A. ELBG-Block

Page 14: Fully Automatic Clustering System

A. ELBG-Block

Page 15: Fully Automatic Clustering System

A. ELBG-Block

1) SoCAs (shift of codeword attempt):

$i$th cell ($S_i$): a low-distortion cell ($D_i < D_{mean}$).

$l$th cell ($S_l$): the cell whose codeword ($y_l$) has the minimum distance from $y_i$.

$p$th cell ($S_p$): a high-distortion cell ($D_p > D_{mean}$). $S_p$ is looked for in a stochastic way.

We choose a cell with a probability proportional to its distortion value:

$$P_p = \frac{D_p}{\sum_{h:\, D_h > D_{mean}} D_h} \qquad (12)$$
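The stochastic choice of Eq. (12) can be illustrated as follows. The function names and the sample distortions are assumptions of this sketch; returning `None` when no cell lies above the mean is also a convention chosen here, since the slides do not cover that case.

```python
import numpy as np

def split_probabilities(D):
    """Eq. (12): probability of picking each cell, nonzero only where D_h > D_mean."""
    D = np.asarray(D, dtype=float)
    d_mean = D.mean()                       # Eq. (11)
    high = D > d_mean
    P = np.zeros_like(D)
    if high.any():
        P[high] = D[high] / D[high].sum()
    return P

def choose_cell_to_split(D, rng):
    """Draw the index p of a high-distortion cell, or None if there is none."""
    P = split_probabilities(D)
    if P.sum() == 0.0:
        return None
    return int(rng.choice(len(P), p=P))

# Hypothetical per-cell distortions: mean is 3, so only cells 2 and 3 are candidates.
P = split_probabilities([1.0, 1.0, 6.0, 4.0])
p = choose_cell_to_split([1.0, 1.0, 6.0, 4.0], np.random.default_rng(0))
```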

Page 16: Fully Automatic Clustering System

A. ELBG-Block

Splitting: We place both $y_i$ and $y_p$ on the principal diagonal of the $k$-dimensional hyperbox $I_p$; in this sense, we can say that the two codewords are near each other. Some local rearrangements are then executed.

$$I_p = [x_{1m}, x_{1M}] \times [x_{2m}, x_{2M}] \times \cdots \times [x_{km}, x_{kM}] \qquad (13)$$

Union:

$$S_l' = S_l \cup S_i, \qquad y_l' = \bar{x}(S_l') \qquad (14)$$
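A rough sketch of the splitting and union steps follows. Two things are assumptions here and are not taken from the slides: that $I_p$ is the bounding box of the vectors in $S_p$, and that the two codewords sit at fractions `t` and `1 - t` along its principal diagonal (the default `t=0.25` is hypothetical). The local rearrangements mentioned above are omitted.

```python
import numpy as np

def split_on_diagonal(cell_points, t=0.25):
    """Place two codewords on the principal diagonal of the cell's bounding hyperbox."""
    pts = np.asarray(cell_points, dtype=float)
    lo, hi = pts.min(axis=0), pts.max(axis=0)   # per-axis [x_jm, x_jM], as in Eq. (13)
    y_a = lo + t * (hi - lo)                    # assumed position, fraction t
    y_b = lo + (1.0 - t) * (hi - lo)            # assumed position, fraction 1 - t
    return y_a, y_b

def union_cells(cell_l, cell_i):
    """Eq. (14): merge S_l and S_i and use the centroid of the union as the codeword."""
    merged = np.vstack([np.asarray(cell_l, dtype=float), np.asarray(cell_i, dtype=float)])
    return merged, merged.mean(axis=0)

# Toy cells (hypothetical).
y_a, y_b = split_on_diagonal([[0.0, 0.0], [4.0, 2.0], [2.0, 8.0]])
merged, y_l_new = union_cells([[0.0, 0.0], [2.0, 0.0]], [[4.0, 6.0]])
```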

Page 17: Fully Automatic Clustering System

A. ELBG-Block

2) Mean Quantization Error Estimation and Eventual SoC: After the shift, we have a new codebook (Y’) and a new partition (S’). Therefore, we can calculate the new MQE. If it is lower than the value we had before the SoCA, the shift is confirmed. Otherwise, it is rejected.
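The confirm-or-reject rule can be written as a small helper. The name `keep_better` and the opaque `state` arguments (standing for a codebook and partition pair) are assumptions of this sketch.

```python
def keep_better(old_state, new_state, mqe_before, mqe_after):
    """Keep the shifted state only if the MQE strictly decreases; otherwise roll back."""
    if mqe_after < mqe_before:
        return new_state, mqe_after
    return old_state, mqe_before

# Hypothetical states: any objects holding (Y, S) and (Y', S') would do.
accepted = keep_better("Y,S", "Y',S'", mqe_before=1.0, mqe_after=0.5)
rejected = keep_better("Y,S", "Y',S'", mqe_before=1.0, mqe_after=1.0)
```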

Page 18: Fully Automatic Clustering System

B. Considerations Regarding the ELBG

Insertions are effected in the regions where the error is higher; deletions where the error is lower.

Operations are executed locally. Several insertions or deletions can be effected during the same iteration, always working locally.

Page 19: Fully Automatic Clustering System

FACS

Introduction. FACS is a CA/VQ technique whose objective is to automatically find the codebook of the right dimension. In FACS, the increase or decrease of the number of codewords happens smartly:

New codewords are inserted where the QE is higher. Codewords are eliminated where the error is lower.

Page 20: Fully Automatic Clustering System

FACS iteration

Page 21: Fully Automatic Clustering System

Smart growing phase.

Page 22: Fully Automatic Clustering System

p versus the number of iterations

Page 23: Fully Automatic Clustering System

Smart reduction phase.

Page 24: Fully Automatic Clustering System

FACS

The cell to eliminate is chosen with a probability that is a decreasing function of its distortion.

$$P_p = \frac{D_{mean} - D_p}{\sum_{h:\, D_h < D_{mean}} (D_{mean} - D_h)} \qquad (16)$$
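As an illustration of Eq. (16), the elimination probabilities can be computed from the per-cell distortions as below. The function name and the sample values are assumptions made for this sketch.

```python
import numpy as np

def elimination_probabilities(D):
    """Eq. (16): probability of eliminating each cell, nonzero only where D_h < D_mean."""
    D = np.asarray(D, dtype=float)
    d_mean = D.mean()
    low = D < d_mean
    P = np.zeros_like(D)
    gaps = d_mean - D[low]          # larger gap = lower distortion = more likely removed
    if gaps.size > 0:
        P[low] = gaps / gaps.sum()
    return P

# Hypothetical per-cell distortions: mean is 4, so only cells 0 and 1 are candidates.
P_elim = elimination_probabilities([1.0, 2.0, 6.0, 7.0])
```

With these values the lowest-distortion cell gets probability 0.6 and the next one 0.4, which matches the "decreasing function of its distortion" description.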

Page 25: Fully Automatic Clustering System

Behavior of FACS Versus the Number of Iterations and Termination Condition

Page 26: Fully Automatic Clustering System

Discussion about outliers

Page 27: Fully Automatic Clustering System

Results

A. Introduction.

B. Comparison With ELBG.

C. Comparison With GNG and GNG-U.

D. Comparison With FOSART.

E. Comparison With the Competitive Agglomeration Algorithm.

F. Classification.

Page 28: Fully Automatic Clustering System

B. Comparison With ELBG

Page 29: Fully Automatic Clustering System

C. Comparison With GNG and GNG-U.

GNG, GNG-U.

Codewords are inserted until the prefixed number is reached or the “performance measure” is fulfilled.

Our case, Te

Page 30: Fully Automatic Clustering System

D. Comparison With FOSART.

FOSART is an algorithm of the ART family. It is also used for VQ tasks.

Page 31: Fully Automatic Clustering System

E. Comparison With the Competitive Agglomeration.

Page 32: Fully Automatic Clustering System

F. Classification

Comparison between FACS and the GCS algorithm on a supervised classification problem, the two spirals.

Mode 1: The input is constituted by 194 2-D vectors representing the two spirals. The output is the related membership class (0 or 1). The WSE is employed.

Mode 2: The clustering phase occurs using only the part of the patterns related to the input, and using the SE.

Page 33: Fully Automatic Clustering System

F. Classification (cont.)

Page 34: Fully Automatic Clustering System

Conclusion

FACS is a new algorithm for CA/VQ that is able to autonomously find the number of codewords once the desired quantization error is specified.

In comparison to previous similar works, a significant improvement in the running time has been obtained.

Further studies will be made regarding the use of different distortion measures.

Page 35: Fully Automatic Clustering System

Personal Opinion

The starting point of the research reported in this paper was the authors' previous work: the ELBG.

The QE is a key index.

Page 36: Fully Automatic Clustering System

Review

Clustering vs. VQ

Previous works: ELBG

FACS: smart growing, smart reduction