MINIMUM-DISTANCE-TO-MEANS CLUSTERING FOR VECTOR QUANTIZATION: NEW ALGORITHMS AND APPLICATIONS


Page 1

A short presentation of two interesting unsupervised learning algorithms for vector quantization recently published in the literature

MINIMUM-DISTANCE-TO-MEANS CLUSTERING FOR VECTOR QUANTIZATION: NEW ALGORITHMS AND APPLICATIONS

Page 2

Biography

Andrea Baraldi
• Laurea in Elect. Engineering, Univ. Bologna, 1989
• Consultant at ESA-ESRIN, 1991-1993
• Research associate at ISAO-CNR, Bologna, 1994-1996
• Post-doctoral fellowship at ICSI, Berkeley, 1997-1999

Scientific interests
• Remote sensing applications
• Image processing
• Computer vision
• Artificial intelligence (neural networks)

Page 3

About this presentation
• Basic concepts related to minimum-distance-to-means clustering
• Applications in data analysis and image processing
• Interesting clustering models taken from the literature:
  • Fully self-Organizing Simplified Adaptive Resonance Theory (FOSART, IEEE TSMC, 1999)
  • Enhanced Linde-Buzo-Gray (ELBG, IJKIES, 2000)

Page 4

Minimum-distance-to-means clustering
• Clustering as an ill-posed problem (heuristic techniques for grouping the data at hand)
• Cost function minimization (inductive learning to characterize future samples):
  • Mean-square-error minimization = minimum-distance-to-means (vector quantization; sketched below)
  • Entropy maximization (equiprobable cluster detection)
  • Joint probability maximization (pdf estimation)
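As a concrete reading of the mean-square-error criterion above, the following minimal Python sketch (ours, not from the slides) computes the minimum-distance-to-means cost of a codebook:

    import numpy as np

    def vq_distortion(X, Y):
        # Mean-square-error cost of codebook Y (Nc x d) on data X (N x d):
        # each sample is charged the squared Euclidean distance to its
        # nearest codeword, i.e., the minimum distance to the cluster means.
        d2 = ((X[:, None, :] - Y[None, :, :]) ** 2).sum(axis=2)  # (N, Nc)
        return d2.min(axis=1).mean()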

Page 5

Applications of unsupervised vector quantizers
• Detection of hidden data structures (data clustering, perceptual grouping)
• First-stage unsupervised learning in RBF networks (data classification, function regression) (Bruzzone, IEEE TGARS, 1999); see the sketch after this list
• Pixel-based initialization of context-based image segmentation techniques (image partitioning and classification)
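A sketch of the RBF use case (our illustration; the Gaussian basis and the width parameter are assumptions, not from the slides): the unsupervised vector quantizer supplies the hidden-layer centers, and a supervised second stage fits the output weights.

    import numpy as np

    def rbf_hidden_layer(X, centers, width):
        # Gaussian RBF activations; 'centers' come from the first,
        # unsupervised vector-quantization stage.
        d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        return np.exp(-d2 / (2.0 * width ** 2))

    # Second, supervised stage: least-squares output weights W for targets T.
    # W = np.linalg.lstsq(rbf_hidden_layer(X, centers, width), T, rcond=None)[0]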

Page 6

FOSART, by A. Baraldi, ISAO-CNR, IEEE TSMC, 1999

Input parameters:
• ρ ∈ (0, 1] (ART-based vigilance threshold)
• ε (convergence threshold, e.g., 0.001)

• Constructive: generates (resp. removes) units and lateral connections on an example-driven (resp. mini-batch) basis
• Topology-preserving
• Minimum-distance-to-means clustering
• On-line learning
• Soft-to-hard competitive
• Incapable of shifting codewords through non-contiguous Voronoi regions
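FOSART's exact match function is not reproduced here; the sketch below only illustrates the ART-style vigilance test that makes such a model constructive, with a hypothetical similarity measure mapped to (0, 1]:

    import numpy as np

    def resonates(x, prototype, rho):
        # ART-style vigilance test (illustrative similarity measure, not
        # FOSART's exact one): resonance requires that the similarity
        # between input x and the winning prototype be >= rho, rho in (0, 1].
        similarity = np.exp(-np.linalg.norm(x - prototype))
        return similarity >= rho

    # Constructive step: if no existing unit resonates with input x,
    # a new unit is generated with prototype = x.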

Page 7

FOSART APPLICATIONS: Perceptual grouping of non-convex data sets

Non-convex data set. Circular ring plus three Gaussian clusters. 140 data points.

FOSART processing: 11 templates, 3 maps.

Page 8

Input: 3-D digitized human face, 9371 data points.

FOSART APPLICATIONS: 3-D surface reconstruction

Output: 3370 nodes, 60 maps.

Page 9

ELBG, by M. Russo and G. Patanè, Univ. Messina, IJKIES, 2000
• c-means minimum-distance-to-means clustering (MacQueen, 1967; LBG, 1980)
• Initialized by means of random selection or splitting-by-two (Moody and Darken, 1988)
• Non-constructive
• Batch learning
• Hard competitive
• Capable of shifting codewords through non-contiguous Voronoi regions (in line with LBG-U, Fritzke, 1997)

Input parameters:
• c (number of clusters)
• ε (convergence threshold, e.g., 0.001)
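For reference, a minimal batch c-means (LBG-style) iteration, with ε used as a relative-distortion stopping threshold; a sketch under our assumptions, not the authors' implementation:

    import numpy as np

    def lbg(X, Y, eps=0.001, max_iter=100):
        # Alternate nearest-codeword assignment and mean update until the
        # relative drop in distortion falls below eps (hard competitive,
        # batch learning, non-constructive).
        D_old = None
        for _ in range(max_iter):
            d2 = ((X[:, None, :] - Y[None, :, :]) ** 2).sum(axis=2)
            labels = d2.argmin(axis=1)
            D = d2.min(axis=1).mean()
            if D_old is not None and (D_old - D) / D_old < eps:
                break
            D_old = D
            for i in range(len(Y)):
                members = X[labels == i]
                if len(members):
                    Y[i] = members.mean(axis=0)
        return Y, labels, D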

Page 10

Combination of ELBG with FOSART: FOSART initializes ELBG.

Input parameters of the two-stage clustering system are:
• ρ ∈ (0, 1] (ART-based vigilance threshold)
• ε (convergence threshold, e.g., 0.001)
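A hypothetical driver for the two-stage system (function names are ours; fosart() stands for the constructive first stage, lbg() for the batch refinement sketched earlier):

    # rho and eps are the only user-supplied parameters of the pipeline.
    # Y0 = fosart(X, rho=0.6, eps=0.001)    # first stage: number and
    #                                       # position of initial codewords
    # Y, labels, D = lbg(X, Y0, eps=0.001)  # second stage: ELBG-style
    #                                       # batch refinement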

Page 11

ELBG algorithm

• Ym: codebook at iteration m
• P(Ym): Voronoi (ideal) partition
• S(Ym): non-Voronoi (sub-optimal) partition
• D{Ym, S(Ym)} ≥ D{Ym, P(Ym)}
• Voronoi cell Si, i = 1, …, Nc, such that Si = {x ∈ X : d(x, yi) ≤ d(x, yj), j = 1, …, Nc, j ≠ i}
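In code, the Voronoi partition and its distortion read as follows (our sketch of the definitions above):

    import numpy as np

    def voronoi_partition(X, Y):
        # Assign each sample x to the cell S_i of its nearest codeword y_i;
        # the resulting D{Y, P(Y)} lower-bounds the distortion D{Y, S(Y)}
        # of any other partition of the same data over the same codebook.
        d2 = ((X[:, None, :] - Y[None, :, :]) ** 2).sum(axis=2)
        labels = d2.argmin(axis=1)      # cell index i for each sample
        D = d2.min(axis=1).sum()        # total distortion D{Y, P(Y)}
        return labels, D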

Page 12

ELBG block

• Utility Ui = Di / Dmean, Ui ∈ [0, ∞), i = 1, …, Nc: a dimensionless distortion measure
• “low” utility (Ui < 1): distortion below average → codeword to be shifted
• “high” utility (Ui > 1): distortion above average → codeword to be split
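The utility values can be computed directly from the partition (a sketch consistent with the definition above):

    import numpy as np

    def utilities(X, Y, labels):
        # U_i = D_i / D_mean: cell distortion normalized by the average
        # cell distortion; dimensionless, in [0, inf).  U_i < 1 flags a
        # shift candidate, U_i > 1 a split candidate.
        Nc = len(Y)
        D = np.array([((X[labels == i] - Y[i]) ** 2).sum() for i in range(Nc)])
        return D / D.mean()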

Page 13

ELBG block: iterative scheme

• C.1) Sequential search for the cell Si to be shifted (distortion below average)
• C.2) Stochastic search for the cell Sp to be split (distortion above average)
• C.3)
  a) Detection of the codeword yn closest to yi;
  b) “Local” LBG arrangement of codewords yi and yp;
  c) Arrangement of yn such that S’n = Sn ∪ Si;
• C.4) Compute D’n, D’p and D’i
• C.5) If (D’n + D’p + D’i) < (Dn + Dp + Di), then accept the shift (see the sketch below)
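Step C.5 reduces to a three-cell comparison; a minimal sketch, assuming the per-cell distortions before (Dn, Dp, Di) and after (D'n, D'p, D'i) the attempt have been computed as in C.4:

    def accept_shift(Dn, Dp, Di, Dn_new, Dp_new, Di_new):
        # Accept the codeword shift only if it lowers the summed
        # distortion of the three cells involved in the attempt.
        return (Dn_new + Dp_new + Di_new) < (Dn + Dp + Di)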

Page 14

ELBG block: initial situation, before the codeword-shift attempt

• C.1) Sequential search of cell Si to be shifted

• C.2) Stochastic search of cell Sp to be split

• C.3.a) Detection of codeword yn closest to yi;

Page 15

ELBG block: initialization of the “local” LBG arrangement of yi and yp

• C.3.b) “Local” LBG arrangement of codewords yi and yp;

Page 16

ELBG block: situation after the initialization of the codeword-shift attempt

• C.3.a) Detection of codeword yn closest to yi;

• C.3.b) “Local” LBG arrangement of codewords yi and yp;

Page 17

ELBG block: situation after the codeword-shift attempt

• C.3.b) “Local” LBG arrangement of codewords yi and yp;

• C.3.c) Arrangement of yn such that S’n = Sn ∪ Si;
• C.4) Compute D’n, D’p and D’i
• C.5) If (D’n + D’p + D’i) < (Dn + Dp + Di), then accept the shift

Page 18

Examples
• Polynomial case (Russo and Patanè, IJKIES, 2000)
• Cantor distribution (same as above)
• Fritzke’s 2-D data set (same as above)
• RBF network classification (Baraldi and Blonda, IGARSS 2000)

Lena image compression

Modified-LBG (Lee et al., IEEE Signal Proc. Lett., 1997) vs. ELBG with splitting-by-two vs. ELBG with FOSART:

c      | M-LBG                        | ELBG with splitting-by-two   | ELBG with FOSART
       | PSNR* (dB)   MSE     Iter.*  | PSNR (dB)   MSE     Iter.    | PSNR (dB)   MSE     Iter.
256    | 31.92        668.6   20      | 31.97       660.9   46 + 8   | 31.98       659.4   3 + 10
512    | 33.09        510.7   17      | 33.17       499.2   54 + 8   | 33.22       494.0   3 + 9
1024   | 34.42        376.0   19      | 34.72       349.3   64 + 9   | 34.78       344.3   3 + 9

Iter. for the two ELBG columns counts initialization + ELBG iterations (splitting-by-two + ELBG and FOSART + ELBG, respectively).

Comparison of M-LBG and ELBG in the clustering of the 16-dimensional Lena data set, consisting of 16384 vectors (Russo and Patanè, IJKIES, 2000). *: results taken from the literature (Russo and Patanè, IJKIES, 2000).
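The PSNR and MSE columns are mutually consistent if MSE is read as distortion per 16-pixel vector with 8-bit pixels (our reading, not stated on the slide):

    import math

    # M-LBG row, codebook size 256: MSE = 668.6 per 16-pixel vector.
    # PSNR = 10 * log10(255**2 / (MSE / 16)) ~= 31.92 dB, as tabulated.
    print(10 * math.log10(255 ** 2 / (668.6 / 16)))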

Page 19

Conclusions

ELBG (+ FOSART):
• is stable with respect to changes in initial conditions (i.e., it is effective in approaching the absolute minimum of the cost function)
• is fast to converge
• features low overhead with respect to traditional LBG (< 5%)
• performs better than or equal to other minimum-distance-to-means clustering algorithms found in the literature