Extensions of vector quantization for incremental clustering

Intelligent Database Systems Lab, National Yunlin University of Science and Technology. Extensions of vector quantization for incremental clustering. Edwin Lughofer, PR, Vol. 41, 2008, pp. 995–1011. Presenter: Wei-Shen Tai, 2011/1/19

TRANSCRIPT

Page 1: Extensions of vector quantization  for incremental clustering

Intelligent Database Systems Lab

國立雲林科技大學 (National Yunlin University of Science and Technology)

Extensions of vector quantization for incremental clustering

Edwin Lughofer

PR, Vol.41 2008, pp. 995–1011

Presenter : Wei-Shen Tai

2011/1/19

Page 2: Extensions of vector quantization  for incremental clustering

N.Y.U.S.T.

I. M.

Intelligent Database Systems Lab

Outline

Introduction

Vector quantization

Extensions of vector quantization

Evaluation

Conclusion and outlook

Comments

Page 3: Extensions of vector quantization  for incremental clustering

Motivation

Incremental clustering processes

Online measurements are often recorded continuously, resulting in data streams for various applications.

Processing these streams online must guarantee that query results are up to date and can be delivered with a small time delay.

Page 4: Extensions of vector quantization  for incremental clustering

Objective

An incremental and evolving vector quantization that:

Processes data streams in an online clustering scheme.

Does not require the number of clusters to be defined in advance.

Improves the quality of the cluster partition with several strategies.

Page 5: Extensions of vector quantization  for incremental clustering

Vector quantization

1. Choose initial values for the C cluster centers.
2. Fetch the next data sample from the data set.
3. Calculate the distance of the selected data point to all cluster centers.
4. Elicit the cluster center which is closest to the data point.
5. Update the p components of the winning cluster by moving it towards the selected point.
6. If the data set contains data points which have not yet been processed through steps 2–5, go to step 2.
7. If any cluster center was moved significantly in the last iteration, say by more than a given threshold, reset the pointer to the beginning of the data buffer and go to step 2; otherwise stop.
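The seven steps above can be sketched in Python as follows. This is a minimal illustration, not the author's implementation: the function name, the fixed learning gain `eta`, the threshold `eps`, the epoch limit and the random initialisation from data samples are all assumptions, and the update in step 5 is taken to be the usual rule "center += eta * (x - center)".

```python
import numpy as np

def vector_quantization(data, n_clusters, eta=0.1, eps=1e-3, max_epochs=100, seed=0):
    """Batch VQ following steps 1-7; eta, eps, max_epochs and seed are illustrative."""
    rng = np.random.default_rng(seed)
    data = np.asarray(data, dtype=float)
    # Step 1: initialise the C centers with randomly chosen samples (an assumption).
    centers = data[rng.choice(len(data), size=n_clusters, replace=False)].copy()
    for _ in range(max_epochs):
        start = centers.copy()
        for x in data:                                   # steps 2 and 6: sweep the data set
            dists = np.linalg.norm(centers - x, axis=1)  # step 3: Euclidean distances
            win = int(np.argmin(dists))                  # step 4: winning cluster
            centers[win] += eta * (x - centers[win])     # step 5: move winner towards x
        # Step 7: stop once no center moved by more than eps over the last sweep.
        if np.max(np.linalg.norm(centers - start, axis=1)) <= eps:
            break
    return centers
```

The number of clusters has to be passed in explicitly here, which is exactly the restriction the incremental variants remove.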

Page 6: Extensions of vector quantization  for incremental clustering

Vector quantization in incremental mode

Stability/plasticity dilemma in ART-2: the vigilance parameter ρ controls the tradeoff between adaptation of already learned clusters (stability) and generation of new clusters (plasticity).

Differences between VQ and VQ-INC:

The initial number of clusters is zero.

If the distance between the incoming input x and the closest cluster center c_win is larger than ρ and x is not faulty, a new cluster is created. Otherwise, c_win is moved towards x.

The ranges of all p variables are updated if x is not faulty.

The learning gain η decreases monotonically with the number of data points belonging to each cluster.
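A rough sketch of the incremental scheme described on this slide is given below. The class name, the `is_faulty` callback and the concrete gain `eta = 0.5 / k_win` are assumptions made for illustration; the slide only states that η decreases monotonically with the cluster size. The variable ranges are merely tracked here, not used in the distance.

```python
import numpy as np

class IncrementalVQ:
    """Sketch of VQ-INC: clusters evolve sample by sample, starting from none."""

    def __init__(self, rho, is_faulty=lambda x: False):
        self.rho = rho              # vigilance parameter
        self.is_faulty = is_faulty  # hypothetical fault check, not from the slides
        self.centers = []           # cluster centers
        self.counts = []            # k_i: number of samples assigned to cluster i
        self.lo = None              # running minimum of each of the p variables
        self.hi = None              # running maximum of each of the p variables

    def update(self, x):
        x = np.asarray(x, dtype=float)
        if self.is_faulty(x):
            return None             # faulty samples change nothing
        # Update the ranges of all p variables.
        self.lo = x.copy() if self.lo is None else np.minimum(self.lo, x)
        self.hi = x.copy() if self.hi is None else np.maximum(self.hi, x)
        if self.centers:
            dists = [np.linalg.norm(x - c) for c in self.centers]
            win = int(np.argmin(dists))
            if dists[win] <= self.rho:
                self.counts[win] += 1
                eta = 0.5 / self.counts[win]  # assumed monotonically decreasing gain
                self.centers[win] += eta * (x - self.centers[win])
                return win
        # No cluster yet, or the closest center is farther away than rho: new cluster.
        self.centers.append(x.copy())
        self.counts.append(1)
        return len(self.centers) - 1
```

Each call to `update` returns the index of the cluster that absorbed the sample, so a stream can be processed one sample at a time without storing it.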

Page 7: Extensions of vector quantization  for incremental clustering

An alternative distance strategy

Both "over-clustering" and an incorrect partition of the input space can occur in VQ-INC.

Instead of the classic Euclidean distance to the cluster center, VQ-INC-EXT takes the ranges of influence of all clusters into account by measuring the distance to the cluster surface along the direction towards the cluster center.
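One way to picture the surface-based distance is the sketch below. It assumes that each range of influence is an axis-parallel ellipsoid with semi-axes `widths`; this shape, the function name and the convention of returning a negative value inside the cluster are assumptions for illustration, not details taken from the slides.

```python
import numpy as np

def surface_distance(x, center, widths):
    """Signed distance from x to the cluster surface along the line towards the center.

    The cluster is modelled as an axis-parallel ellipsoid with semi-axes `widths`
    (an assumption). Negative values mean that x lies inside the range of influence.
    """
    x, center, widths = (np.asarray(v, dtype=float) for v in (x, center, widths))
    diff = x - center
    euclid = np.linalg.norm(diff)
    if euclid == 0.0:
        # Direction undefined at the center; report the shortest semi-axis by convention.
        return -float(np.min(widths))
    # Scaled radius: 1 on the surface, below 1 inside, above 1 outside.
    scaled = np.sqrt(np.sum((diff / widths) ** 2))
    # The surface point on the line from the center to x is center + diff / scaled.
    return float(euclid * (1.0 - 1.0 / scaled))
```

With this measure a sample just outside a wide cluster is closer to it than to a narrow cluster whose center happens to be nearer in the Euclidean sense.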

Page 8: Extensions of vector quantization  for incremental clustering

Satellite deletion

Cluster satellites: undesirable tiny clusters that lie very close to significantly bigger ones.

Identifying outliers and satellites (k_i: number of data points in cluster i; N: total number of data points):

If k_i/N < 1%, cluster i is regarded as an outlier cluster.

If k_i/N < low_mass and c_i lies inside the range of influence of any other cluster, calculate the distance of c_i to the surface of all other clusters and elicit the closest center c_win.
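The two checks can be sketched as below. The ellipsoidal ranges of influence (`widths`), the default `low_mass = 0.1`, the function name and the use of a scaled radius to pick the closest cluster are assumptions for illustration; the slide does not fix these details.

```python
import numpy as np

def find_outliers_and_satellites(centers, widths, counts, low_mass=0.1):
    """Return (outliers, satellites); satellites maps cluster i to its c_win index."""
    centers = np.asarray(centers, dtype=float)
    widths = np.asarray(widths, dtype=float)
    total = float(sum(counts))             # N: all data points seen so far
    outliers, satellites = [], {}
    for i, k_i in enumerate(counts):
        mass = k_i / total
        if mass < 0.01:                    # k_i / N < 1%: outlier cluster
            outliers.append(i)
            continue
        if mass >= low_mass:               # big enough, cannot be a satellite
            continue
        # Scaled radius of c_i with respect to every cluster j: <= 1 means inside.
        scaled = np.sqrt(np.sum(((centers[i] - centers) / widths) ** 2, axis=1))
        scaled[i] = np.inf                 # never compare a cluster with itself
        win = int(np.argmin(scaled))
        if scaled[win] <= 1.0:             # c_i lies inside the range of c_win
            satellites[i] = win
    return outliers, satellites
```

What happens to a detected satellite afterwards (for example deleting it and crediting its points to c_win) is left out of this sketch.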

Page 9: Extensions of vector quantization  for incremental clustering

A split-and-merge strategy

Parameter ρ: cannot be known in advance, and a bad setting may cause an incorrect cluster structure.

Non-optimal clustering: prevented by merging clusters that have grown together or by splitting big clusters that include more than one distinct data cloud.

Calculate the quality of the cluster partition in three phases: before the split, after the split (p results, p being the number of variables) and after the merge. Then pick the best cluster partition to replace the existing one.
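The selection step can be illustrated with the sketch below. The quality score used here (mean squared distance to the nearest center divided by the smallest squared distance between two centers, lower is better) is a stand-in chosen for illustration and is not claimed to be the validation index of the paper; both function names are hypothetical, and single-cluster partitions are simply scored as infinitely bad.

```python
import numpy as np

def partition_quality(data, centers):
    """Hypothetical quality score (lower is better): compactness over separation."""
    data = np.asarray(data, dtype=float)
    centers = np.asarray(centers, dtype=float)
    if len(centers) < 2:
        return float("inf")  # separation is undefined for one cluster (sketch choice)
    # Distance of every sample to every center, shape (n_samples, n_clusters).
    dists = np.linalg.norm(data[:, None, :] - centers[None, :, :], axis=2)
    compactness = np.mean(np.min(dists, axis=1) ** 2)
    separation = min(
        np.sum((a - b) ** 2) for i, a in enumerate(centers) for b in centers[i + 1:]
    )
    return float(compactness / separation)

def pick_best_partition(data, candidates):
    """Score every candidate set of centers and return (best index, all scores)."""
    scores = [partition_quality(data, c) for c in candidates]
    return int(np.argmin(scores)), scores
```

The candidate list would hold the current partition, the split variants and the merged variant; whichever scores best replaces the existing partition.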

Page 10: Extensions of vector quantization  for incremental clustering

Evaluation

Page 11: Extensions of vector quantization  for incremental clustering

Conclusion and outlook

A new extended vector quantization (VQ-INC-EXT):

Can be applied to data streams in fast online applications or to very large databases.

Provides an incremental learning scheme and incorporates a new distance measure, satellite deletion and an online split-and-merge strategy.

Outlook:

The split-and-merge strategy may suffer from low computation speed.

Reacting to drifts and shifts in the data: a drift changes the distribution of the underlying data smoothly over time, whereas a shift triggers an abrupt change in the data characteristics.

Page 12: Extensions of vector quantization  for incremental clustering

Comments

Advantage:

The proposed method extends VQ to an incremental learning VQ and adds several strategies that improve the quality of the cluster partition at the same time.

Data streams can be processed effectively by this online learning VQ.

Drawback:

In Algorithm 3, the vector of the winning cluster is updated by Eq. (1) according to the Manhattan distance between the winning cluster and the input, even when the new distance strategy is applied.

Application:

Online learning from data streams.