Self-Organizing Topological Tree for Online Vector Quantization and Data Clustering

Intelligent Database Systems Lab, National Yunlin University of Science and Technology. Self-Organizing Topological Tree for Online Vector Quantization and Data Clustering. Advisor: Dr. Hsu. Graduate: Kuo-min Wang. Authors: Pengei Xu; Chip-Hong, Senior Member, IEEE; Andrew Palinski, Member, IEEE. 2005 Expert Systems with Applications.


Page 1: Self-Organizing Topological Tree for Online Vector Quantization and Data Clustering

Intelligent Database Systems Lab

國立雲林科技大學 National Yunlin University of Science and Technology

Self-Organizing Topological Tree for Online Vector Quantization and Data Clustering

Advisor: Dr. Hsu

Graduate: Kuo-min Wang

Authors: Pengei Xu; Chip-Hong, Senior Member, IEEE; Andrew Palinski, Member, IEEE

2005 Expert Systems with Applications

Page 2: Self-Organizing Topological Tree for Online Vector Quantization and Data Clustering

Outline
- Motivation
- Objective
- Introduction
- Structure of SOTT and Related Terminology
- Training Algorithm of SOTT
- Simulation Results
- Conclusions
- Personal Opinion

Page 3: Self-Organizing Topological Tree for Online Vector Quantization and Data Clustering

Motivation

The SOM performs two important operations:
- Vector quantization
- Topology-preserving mapping

Disadvantages of the SOM when applied to clustering problems:

1. The clustering result is sensitive to the number of partitions.

2. The clustering result of one partition provides knowledge at only one similarity level.

3. It favors "equally-sized compact spheroidal clusters".

4. The computational cost is high.

Page 4: Self-Organizing Topological Tree for Online Vector Quantization and Data Clustering

Objective
- Propose an online self-organizing topological tree (SOTT) with faster learning. Its computational complexity is O(log N) rather than O(N) as for the basic SOM.
- Devise a hybrid clustering algorithm that fully exploits the online learning and multi-resolution characteristics of SOTT.
- Propose a new linkage metric that can be updated online to accelerate the time-consuming agglomerative hierarchical clustering stage.

Page 5: Self-Organizing Topological Tree for Online Vector Quantization and Data Clustering

Introduction

Applications of the SOM:
- Obtaining an optimal set of codebook vectors that maximizes the rate-distortion performance (vector quantization).
- Clustering, which aims to segregate a chaotic mixture of patterns for the purpose of knowledge discovery and analysis.

The Growing SOM [1] overcomes the first problem by growing the SOM to a suitable number of partitions through the insertion of new neurons.

Page 6: Self-Organizing Topological Tree for Online Vector Quantization and Data Clustering

Introduction (cont.)
- The Tree-Structured SOM [20] provides a hierarchical structure that reduces the computational complexity and alleviates the first two problems simultaneously.
- GHSOM [26] was proposed to grow a hierarchical SOM to solve the third problem.

Page 7: Self-Organizing Topological Tree for Online Vector Quantization and Data Clustering

Introduction (cont.)

Vector Quantization
- Proposed by Y. Linde, A. Buzo, and R. M. Gray in 1980.
- The image is partitioned into a set of image blocks of size n × n, and each block is processed with a codebook designed in advance.
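The block-based encoding step described above can be sketched in a few lines of Python. This is only a minimal illustration: the helper names (`split_blocks`, `sq_dist`, `encode`), the 4 × 4 image, and the two-entry codebook are assumptions made for the example and do not come from the slides.

```python
def split_blocks(image, n):
    """Cut a 2-D list of pixels into flattened n x n blocks (row-major order)."""
    rows, cols = len(image), len(image[0])
    blocks = []
    for r in range(0, rows - n + 1, n):
        for c in range(0, cols - n + 1, n):
            blocks.append([image[r + dr][c + dc] for dr in range(n) for dc in range(n)])
    return blocks


def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def encode(blocks, codebook):
    """Replace each block by the index of its nearest codevector."""
    return [min(range(len(codebook)), key=lambda k: sq_dist(b, codebook[k])) for b in blocks]


# Hypothetical 4 x 4 grayscale image and a two-entry codebook for 2 x 2 blocks.
image = [
    [0, 0, 250, 255],
    [5, 0, 255, 245],
    [255, 250, 0, 10],
    [250, 255, 5, 0],
]
codebook = [[0, 0, 0, 0], [255, 255, 255, 255]]
indices = encode(split_blocks(image, 2), codebook)
```

Only the list of indices needs to be stored or transmitted; the decoder looks each index up in the same codebook to rebuild an approximation of the block.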

Page 8: Self-Organizing Topological Tree for Online Vector Quantization and Data Clustering

Introduction (cont.)

Generalized Lloyd Algorithm (GLA) [23]
- From a set of block vectors (training vectors), a clustering method is used to train a codebook that can reconstruct the original image blocks.

Tree search vector quantizer (TSVQ) [4]
- Speeds up the search for the nearest codevector.
- Requires more storage space.
- The image quality obtained with a tree-structured codebook is worse than that obtained with a conventional vector quantization codebook.

Tree-structure SOM (TS-SOM) [21]
- Organized layer by layer.
- All training data are fed into the system repeatedly at every layer, which taxes the system resources heavily for a large database.
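To make the codebook training idea behind the GLA concrete, here is a minimal sketch of a generic Lloyd iteration: assign every training vector to its nearest codevector, then move each codevector to the centroid of its cell. The function name `gla`, the fixed iteration count, the rule of leaving a codevector unchanged when its cell is empty, and the toy data are all assumptions for illustration; the slides do not give these details.

```python
def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def gla(training, codebook, iterations=10):
    """Alternate nearest-codevector assignment and centroid update."""
    codebook = [list(c) for c in codebook]
    for _ in range(iterations):
        # Assignment step: group training vectors by nearest codevector.
        cells = [[] for _ in codebook]
        for v in training:
            k = min(range(len(codebook)), key=lambda i: sq_dist(v, codebook[i]))
            cells[k].append(v)
        # Update step: move each codevector to the centroid of its cell.
        for k, cell in enumerate(cells):
            if cell:  # simplification: an empty cell keeps its old codevector
                dim = len(cell[0])
                codebook[k] = [sum(v[d] for v in cell) / len(cell) for d in range(dim)]
    return codebook


# Hypothetical training vectors and initial codebook.
training = [[0, 0], [2, 0], [10, 10], [12, 10]]
trained = gla(training, codebook=[[0, 0], [12, 10]])
```

Every assignment step compares each training vector with every codevector, which is the full search that tree-structured quantizers such as TSVQ try to shorten.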

Page 9: Self-Organizing Topological Tree for Online Vector Quantization and Data Clustering

Introduction (cont.)

A new multi-resolution self-organizing topological tree (SOTT) is proposed to accelerate the search procedure. Two issues have to be addressed:
- A tree search is globally suboptimal and can fail to find the real best-matching unit (BMU). A multi-path search is used to overcome this.
- Two kinds of neighborhood relationship must co-exist in the network: the inter-layer parent-child relationship and the intra-layer sibling relationship. The winning path is used to overcome this.

Page 10: Self-Organizing Topological Tree for Online Vector Quantization and Data Clustering

Introduction (cont.)
- In a hybrid clustering scheme, a low-complexity partition clustering algorithm is first applied to reduce the large amount of data before the computationally expensive agglomerative hierarchical clustering (AHC).
- A linkage metric is a proximity measure used to merge subsets rather than individual points in AHC.
- SOTT → AHC algorithm

Page 11: Self-Organizing Topological Tree for Online Vector Quantization and Data Clustering

Structure of SOTT and Related Terminology
- A static SOTT can be viewed as a multi-layer SOM with fixed depth and breadth.
- The input vector is $x = (x_1, x_2, \ldots, x_n) \in R^n$ and the weight set is $W = \{ w_{ij} \in R^n : i = 1, 2, \ldots, L \text{ and } j = 1, 2, \ldots, N_i \}$, where L is the number of layers and $N_i$ is the number of neurons at the ith layer.
- The SOTT defines a mapping $R^n \to W$.
- The ith layer has $N_i = B^{i-1} N_0$ neurons.
- Two kinds of relationship exist: the intra-layer neighborhood and the inter-layer neighborhood.

Page 12: Self-Organizing Topological Tree for Online Vector Quantization and Data Clustering

Structure of SOTT and Related Terminology
- $G_i$ is the fully connected graph formed by the neurons at the ith layer and their interconnections.
- A neuron u is said to be in the k-distance neighborhood of the neuron v if there is a connected path from u to v and $\| u - v \| \le k$.
- u is said to be a child of v iff $v \in G_i$ and $u \in G_{i+1} \cap H_v$.
- The neurons of $G_{i+1} \cap H_v$, which have the same parent neuron v, are called siblings.

Page 13: Self-Organizing Topological Tree for Online Vector Quantization and Data Clustering

Training Algorithm of SOTT

Initial weights:

$$w_{ij} = \left[ (j - 1/2) \cdot 255 / N_i, \; \ldots, \; (j - 1/2) \cdot 255 / N_i \right]^T, \quad i = 1, 2, \ldots, L \text{ and } j = 1, 2, \ldots, N_i$$
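As a rough sketch of this initialization, the Python below assumes the formula reads $w_{ij} = [(j - 1/2) \cdot 255 / N_i, \ldots]^T$ and that layer i holds $N_i = B^{i-1} N_0$ neurons, so the neurons of each layer start evenly spaced along the diagonal of the 0 to 255 input range. The function names and the parameter values in the example are illustrative assumptions.

```python
def layer_sizes(n0, breadth, layers):
    """Assumed layer sizes: N_i = B**(i-1) * N_0 for i = 1..L."""
    return [breadth ** (i - 1) * n0 for i in range(1, layers + 1)]


def init_weights(n0, breadth, layers, dim):
    """Assumed rule: w_ij repeats (j - 1/2) * 255 / N_i in every dimension."""
    weights = []
    for n_i in layer_sizes(n0, breadth, layers):
        layer = [[(j - 0.5) * 255.0 / n_i] * dim for j in range(1, n_i + 1)]
        weights.append(layer)
    return weights


# Hypothetical tree: N_0 = 2, breadth B = 2, L = 3 layers, 4-dimensional inputs.
sizes = layer_sizes(n0=2, breadth=2, layers=3)
weights = init_weights(n0=2, breadth=2, layers=3, dim=4)
```

With these values the first layer holds two neurons at 63.75 and 191.25 in every dimension, and deeper layers subdivide the same range more finely.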

Page 14: Self-Organizing Topological Tree for Online Vector Quantization and Data Clustering

Butterfly Permutation for Input Randomization
- Online learning makes the learning performance order dependent when the training set contains a high degree of redundant information.
- Pei and Lo [25] used a block-based butterfly jumping sequence to subsample the pixels from each block to form different training sweeps.

Page 15: Self-Organizing Topological Tree for Online Vector Quantization and Data Clustering

Butterfly Permutation for Input Randomization (cont.)
- A global butterfly permutation sequence is used to present the spatially correlated input data from a multidimensional coordinate system.
- The aim is to let the neurons learn the characteristics of the training source as early as possible, so that performance is not degraded by order-dependent learning.
- The butterfly permutation is defined by a mapping $Z_I \to Z_J^n$ from an input order number $r \in Z_I$ to a point $(x_1, x_2, \ldots, x_n) \in Z_J^n$ of an n-dimensional coordinate system, where $Z_J^n$ is a finite integer space bounded by $[0, 2^J - 1]^n$.
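The slides do not spell out the permutation itself. Purely to illustrate what a mapping from an order number to a point in $[0, 2^J - 1]^n$ can look like, the sketch below swaps in a bit-reversal ordering: it is a stand-in technique, not the authors' butterfly permutation. The function names and the choice n = 2, J = 2 are assumptions.

```python
def bit_reverse(value, bits):
    """Reverse the lowest `bits` bits of `value`."""
    out = 0
    for _ in range(bits):
        out = (out << 1) | (value & 1)
        value >>= 1
    return out


def order_to_coord(r, n, J):
    """Map order number r in [0, 2**(n*J)) to a point in [0, 2**J - 1]**n.

    Stand-in mapping (bit reversal followed by bit de-interleaving),
    NOT the butterfly permutation of the paper.
    """
    scrambled = bit_reverse(r, n * J)
    coord = [0] * n
    # De-interleave: bit b of `scrambled` becomes bit b // n of coordinate b % n.
    for b in range(n * J):
        if (scrambled >> b) & 1:
            coord[b % n] |= 1 << (b // n)
    return tuple(coord)


# Visit order for a hypothetical 4 x 4 grid (n = 2, J = 2).
visit_order = [order_to_coord(r, n=2, J=2) for r in range(16)]
```

Consecutive order numbers land far apart in the grid, so early inputs sample the whole coordinate range instead of one spatially correlated corner, which is the effect the slide asks of the input ordering.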

Page 16: Self-Organizing Topological Tree for Online Vector Quantization and Data Clustering

Searching for the Winning Path
- The winning neuron and its neighborhood are not updated until a winning path has been identified for each input.
- To trace the winning path, a single winning leaf must be found.
- Two key parameters, λ and κ, bias the competitiveness of some layers and emulate the positive effect of a multi-path search.

The idea of the algorithm is:

1) Find the winning child neurons progressively on each layer, until a winning child at the leaf layer is found.

2) The path to the win_leaf is then set as the winning path.
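The two steps above can be sketched as a greedy descent through the tree. This is a simplified single-path version under stated assumptions: the `Node` class, the squared Euclidean distance, and the example tree are hypothetical, and the biasing parameters λ and κ are left out because their exact role is not given here.

```python
def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


class Node:
    """Hypothetical tree neuron: a weight vector plus its child neurons."""

    def __init__(self, weight, children=None):
        self.weight = weight
        self.children = children or []


def winning_path(roots, x):
    """Descend layer by layer, always following the child closest to x."""
    path = []
    candidates = roots
    while candidates:
        winner = min(candidates, key=lambda node: sq_dist(node.weight, x))
        path.append(winner)
        candidates = winner.children  # empty once the leaf layer is reached
    return path


# Hypothetical two-layer tree with two root neurons and two children each.
tree = [
    Node([64, 64], [Node([32, 32]), Node([96, 96])]),
    Node([192, 192], [Node([160, 160]), Node([224, 224])]),
]
path = winning_path(tree, [100, 90])
win_leaf = path[-1]
```

At each layer only the children of the current winner are compared, so the number of distance evaluations grows with the depth of the tree rather than with the total number of neurons.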

Page 17: Self-Organizing Topological Tree for Online Vector Quantization and Data Clustering

Searching for the Winning Path (cont.)

Page 18: Self-Organizing Topological Tree for Online Vector Quantization and Data Clustering

Updating of Winning Path Neurons and Their Neighborhoods
- The gain is a monotonically decreasing function of the sweep time m.
- The neighborhood taper is controlled by the neighborhood width, which at sweep m is expressed in terms of its initial value at sweep 0, a constant k, and m.

Page 19: Self-Organizing Topological Tree for Online Vector Quantization and Data Clustering

Updating of Winning Path Neurons and Their Neighborhoods

Keeping both the intra-layer relationship and the inter-layer relationship correct is important, so the following updating rules are imposed:

1. The initial neighborhood widths are proportional to the layer sizes: the ratio of the initial widths of two adjacent layers equals the ratio of their neuron counts $N_i$.

2. The children neurons are updated only if their parent neuron is also updated.

3. A neighborhood neuron $u_i$ is updated only if it is sufficiently close to the winning neuron $v_i$ of its layer, that is, within the neighborhood width of that layer at sweep m.

Page 20: Self-Organizing Topological Tree for Online Vector Quantization and Data Clustering

Convergence Criteria

If the average square difference of the neuron weights $w_{Lj}$ at the leaf layer is less than 0.1, the training is terminated:

$$\frac{\sum_{j=1}^{B^{L-1}} \| w_{Lj}(m) - w_{Lj}(m-1) \|^2}{B^{L-1}} < 0.1$$
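A minimal sketch of this stopping test, assuming the "average square difference" is the mean squared Euclidean change of the leaf-layer weights between two consecutive sweeps. The function names and the sample weights are illustrative; only the 0.1 threshold comes from the slide.

```python
def leaf_change(prev_leaves, curr_leaves):
    """Mean squared Euclidean change of the leaf weights between two sweeps."""
    total = 0.0
    for old, new in zip(prev_leaves, curr_leaves):
        total += sum((a - b) ** 2 for a, b in zip(old, new))
    return total / len(curr_leaves)


def has_converged(prev_leaves, curr_leaves, threshold=0.1):
    """True when the average change falls below the threshold (0.1 on the slide)."""
    return leaf_change(prev_leaves, curr_leaves) < threshold


# Hypothetical leaf weights after sweep m-1 and sweep m.
prev = [[10.0, 10.0], [200.0, 200.0]]
curr = [[10.2, 10.0], [200.0, 199.9]]
done = has_converged(prev, curr)
```

Here the summed squared change is about 0.05 over two leaves, giving an average of about 0.025, so training would stop after this sweep.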

Page 21: Self-Organizing Topological Tree for Online Vector Quantization and Data Clustering

Hybrid Clustering Algorithm on SOTT
- The main idea behind the hybrid clustering is to combine the efficiency of partition clustering with the discriminating power of AHC.
- To merge clusters rather than individual points, the distance between individual points has to be generalized to the distance between clusters (sets of points).
- SOTT → AHC
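The two-stage idea can be sketched as follows: a cheap partition step first reduces the points to a few atomic clusters, and an agglomerative step then merges those clusters using a cluster-to-cluster distance. This is an illustration under loud assumptions: fixed prototypes stand in for trained SOTT leaf neurons, and plain single linkage is swapped in for the linkage metric of the paper, which is not reproduced here. All names and data are hypothetical.

```python
def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def partition(points, prototypes):
    """Stage 1: group points into atomic clusters by nearest prototype."""
    atoms = [[] for _ in prototypes]
    for p in points:
        k = min(range(len(prototypes)), key=lambda i: sq_dist(p, prototypes[i]))
        atoms[k].append(p)
    return [a for a in atoms if a]  # drop prototypes that attracted no points


def linkage(a, b):
    """Single linkage (stand-in metric): smallest distance between two clusters."""
    return min(sq_dist(p, q) for p in a for q in b)


def agglomerate(clusters, target):
    """Stage 2: repeatedly merge the two closest clusters until `target` remain."""
    clusters = [list(c) for c in clusters]
    while len(clusters) > target:
        pairs = [(i, j) for i in range(len(clusters)) for j in range(i + 1, len(clusters))]
        i, j = min(pairs, key=lambda ij: linkage(clusters[ij[0]], clusters[ij[1]]))
        clusters[i].extend(clusters.pop(j))  # j > i, so index i stays valid
    return clusters


# Hypothetical 2-D data and prototypes.
points = [[0, 0], [1, 0], [2, 0], [3, 0], [10, 0], [11, 0]]
prototypes = [[0.5, 0], [2.5, 0], [10.5, 0]]
atoms = partition(points, prototypes)
result = agglomerate(atoms, target=2)
```

The agglomerative stage works on three atomic clusters instead of six points; on large data sets this reduction is what keeps the expensive merging stage affordable.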

Page 22: Self-Organizing Topological Tree for Online Vector Quantization and Data Clustering

Hybrid Clustering Algorithm on SOTT
- A metric Bond(Ai, Aj) is defined to assess the connectivity of two atomic clusters Ai and Aj.
- Its computational complexity is O((ki + kj) ki kj).

Page 23: Self-Organizing Topological Tree for Online Vector Quantization and Data Clustering

Simulation Results
- Measure the performance of the proposed SOTT in VQ and compare it with the performance of the SOM, GLA [23], and TSVQ [4].

Page 24: Self-Organizing Topological Tree for Online Vector Quantization and Data Clustering

Simulation Results (cont.)

Page 25: Self-Organizing Topological Tree for Online Vector Quantization and Data Clustering

Simulation Results (cont.)

Page 26: Self-Organizing Topological Tree for Online Vector Quantization and Data Clustering

Simulation Results (cont.)

Page 27: Self-Organizing Topological Tree for Online Vector Quantization and Data Clustering

Simulation Results (cont.)

Page 28: Self-Organizing Topological Tree for Online Vector Quantization and Data Clustering

Simulation Results (cont.)

Page 29: Self-Organizing Topological Tree for Online Vector Quantization and Data Clustering

Conclusions
- The proposed SOTT hybrid clustering algorithm has been demonstrated to be computationally efficient and to possess good scalability.
- It overcomes the clustering performance deficiencies of the k-means and SOM algorithms.
- The experimental results show that the computational efficiency of SOTT is much better than that of the basic SOM and other vector quantizers.

Page 30: Self-Organizing Topological Tree for Online Vector Quantization and Data Clustering

Personal Opinions
- Advantage: The computational complexity is lower than that of the other methods.
- Application: Pattern classification applications.
- Drawback: The structure of the paper is not good, so it is not easy to understand.