exploiting data topology in visualization and clustering of self-organizing maps

Intelligent Database Systems Lab, 國立雲林科技大學 National Yunlin University of Science and Technology. Exploiting Data Topology in Visualization and Clustering of Self-Organizing Maps. Kadim Taşdemir and Erzsébet Merényi, Senior Member. TNN, 2011. Presented by Hung-Yi Cai, 2011/3/9.


Page 1: Exploiting Data Topology in Visualization and Clustering of Self-Organizing Maps

Intelligent Database Systems Lab

國立雲林科技大學 National Yunlin University of Science and Technology

Exploiting Data Topology in Visualization and Clustering of Self-Organizing Maps

Kadim Taşdemir and Erzsébet Merényi, Senior Member
TNN, 2011

Presented by Hung-Yi Cai
2011/3/9

Page 2: Exploiting Data Topology in Visualization and Clustering of Self-Organizing Maps

N.Y.U.S.T.

I. M.

Outline

· Motivation
· Objectives
· Previous Study
· Methodology
· Experiments
· Conclusions
· Comments

Page 3: Exploiting Data Topology in Visualization and Clustering of Self-Organizing Maps


Motivation

· Existing methods present different aspects of the information learned by the SOM, but data topology, which is present in the SOM’s knowledge, is greatly underutilized.

· Data topology can be integrated into the visualization of the SOM and thereby provide a more elaborate view of the cluster structure than existing schemes.

Page 4: Exploiting Data Topology in Visualization and Clustering of Self-Organizing Maps


Objectives

· To integrate the data topology, present in the SOM’s knowledge, into the visualization of the SOM for improved capture of clusters.

· This objective will be accomplished through a new concept of the “connectivity matrix” and its specific rendering over the SOM.

Page 5: Exploiting Data Topology in Visualization and Clustering of Self-Organizing Maps

Previous Study

· SOM is a topology-preserving mapping
  ─ Ideally, prototypes (neurons) that are neighbors in the SOM map are also neighbors (centroids of neighboring Voronoi polyhedra) in data space, and vice versa.

· Growing SOM
  ─ It appears less robust than the Kohonen SOM because of the large number of parameters needing adjustment.

· ViSOM
  ─ It requires a relatively large number of prototypes even for small data sets.

Page 6: Exploiting Data Topology in Visualization and Clustering of Self-Organizing Maps

Methodology

· Topology visualization through connectivity matrix of SOM prototypes
· CONNvis: visualization of the connectivity matrix
· Assessment of topology preservation with CONNvis

Page 7: Exploiting Data Topology in Visualization and Clustering of Self-Organizing Maps

Topology visualization through connectivity matrix of SOM prototypes

· Induced Delaunay Triangulation and Voronoi
  ─ It can be determined from the relationships of the best matching units (BMUs) and the second BMUs.

· Connectivity Matrix
  ─ It is a weighted analog of A (the adjacency matrix of the induced Delaunay triangulation), where the weights indicate the density distribution of the input data among the prototypes adjacent in the data manifold M.

$$RF_i = \bigcup_{j=1}^{N} RF_{ij}, \qquad CONN(i, j) = |RF_{ij}| + |RF_{ji}|$$

  ─ where RF_ij is the set of data vectors for which w_i is the BMU and w_j is the second BMU.
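The connectivity matrix can be sketched in a few lines. The following is a minimal illustration, not the authors' implementation: it assumes Euclidean distance for finding the BMU and second BMU, and the names `squared_distance`, `conn_matrix`, `data`, and `prototypes` are hypothetical. Each data vector adds one count to the pair (BMU, second BMU), and the matrix is then symmetrized so that entry (i, j) is |RF_ij| + |RF_ji|.

```python
def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def conn_matrix(data, prototypes):
    """Return CONN as a list of lists, CONN[i][j] = |RF_ij| + |RF_ji|."""
    n = len(prototypes)
    # rf[i][j] counts data vectors whose BMU is w_i and second BMU is w_j.
    rf = [[0] * n for _ in range(n)]
    for v in data:
        # Prototype indices ordered from closest to farthest (assumed metric).
        ranked = sorted(range(n), key=lambda k: squared_distance(v, prototypes[k]))
        bmu, second = ranked[0], ranked[1]
        rf[bmu][second] += 1
    # Symmetrize: a connection is counted from both endpoints.
    return [[rf[i][j] + rf[j][i] for j in range(n)] for i in range(n)]


# Toy example: three prototypes on a line, four data vectors.
prototypes = [[0.0, 0.0], [1.0, 0.0], [5.0, 0.0]]
data = [[0.1, 0.0], [0.4, 0.0], [0.9, 0.0], [4.0, 0.0]]
conn = conn_matrix(data, prototypes)
for row in conn:
    print(row)
```

In this toy case the first two prototypes share three data vectors across their boundary, while the second and third share only one, so the first connection is the stronger one.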

Page 8: Exploiting Data Topology in Visualization and Clustering of Self-Organizing Maps

CONNvis: visualization of the connectivity matrix

· Line width: Global Importance
  ─ Represents the strength of the connection and reflects the density distribution among the connected units.

· Line colors: Local Importance
  ─ A ranking of the connectivity strengths of w_i.
  ─ Reveals most-to-least dense regions local to w_i in data space.
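The local ranking that drives the line colors can be illustrated as follows. This is a sketch under assumptions: the function name `local_ranks` and the dictionary layout are hypothetical, ties are broken by prototype index, and the mapping from rank to an actual color is left out because the slides do not specify it.

```python
def local_ranks(conn):
    """For each prototype i, rank its connected neighbors by CONN(i, j).

    Returns {i: {j: rank}}, where rank 1 is the strongest connection of w_i.
    """
    ranks = {}
    for i, row in enumerate(conn):
        # Neighbors of w_i are the prototypes with a nonzero connection.
        neighbors = [j for j, strength in enumerate(row) if strength > 0 and j != i]
        # Strongest connection first; the sort is stable, so ties keep index order.
        neighbors.sort(key=lambda j: row[j], reverse=True)
        ranks[i] = {j: rank for rank, j in enumerate(neighbors, start=1)}
    return ranks


# Small symmetric connectivity matrix for three prototypes.
conn = [[0, 3, 0],
        [3, 0, 1],
        [0, 1, 0]]
ranks = local_ranks(conn)
print(ranks)
```

Note that the ranking is local: the connection between prototypes 1 and 2 has rank 2 as seen from prototype 1 but rank 1 as seen from prototype 2, whereas the line width depends only on the connection strength itself.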

Page 9: Exploiting Data Topology in Visualization and Clustering of Self-Organizing Maps

The threshold of width

Page 10: Exploiting Data Topology in Visualization and Clustering of Self-Organizing Maps

Assessment of topology preservation with CONNvis

· Topology violations
  ─ Forward topology violations: connected neural units that are not immediate neighbors in the map.
  ─ Backward topology violations: unconnected neural units that are immediate neighbors in the map.
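Both kinds of violation can be detected directly from the connectivity matrix and the grid positions of the units. The sketch below is illustrative only: it assumes a rectangular lattice where "immediate neighbors" means the 8-neighborhood, and the names `topology_violations` and `grid_pos` are hypothetical.

```python
def topology_violations(conn, grid_pos):
    """Return (forward, backward) lists of unit pairs (i, j) with i < j.

    conn[i][j] > 0 means units i and j are connected.
    grid_pos[i] is the (row, col) position of unit i on the SOM lattice.
    """
    forward, backward = [], []
    n = len(conn)
    for i in range(n):
        for j in range(i + 1, n):
            dr = abs(grid_pos[i][0] - grid_pos[j][0])
            dc = abs(grid_pos[i][1] - grid_pos[j][1])
            # Assumption: immediate neighbors are the 8 surrounding lattice cells.
            adjacent = max(dr, dc) == 1
            connected = conn[i][j] > 0
            if connected and not adjacent:
                forward.append((i, j))    # connected, but far apart on the map
            elif adjacent and not connected:
                backward.append((i, j))   # map neighbors, but no connection
    return forward, backward


# Toy 1x4 map: units 0 and 3 are connected across the map,
# while the adjacent units 1 and 2 are not connected.
conn = [[0, 5, 0, 2],
        [5, 0, 0, 0],
        [0, 0, 0, 4],
        [2, 0, 4, 0]]
grid_pos = [(0, 0), (0, 1), (0, 2), (0, 3)]
forward, backward = topology_violations(conn, grid_pos)
print("forward:", forward)
print("backward:", backward)
```

A hexagonal lattice or a 4-neighborhood would need a different adjacency test; only the `adjacent` line changes.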

Page 11: Exploiting Data Topology in Visualization and Clustering of Self-Organizing Maps

Remove weak connections

· Remove weak connections that link any two coarse clusters X and Y at their boundary.

Page 12: Exploiting Data Topology in Visualization and Clustering of Self-Organizing Maps

Experiments

· A real remote sensing spectral image of Ocean City

Page 13: Exploiting Data Topology in Visualization and Clustering of Self-Organizing Maps

Experiments

· Comparison with U-matrix and ISOMAP

Page 14: Exploiting Data Topology in Visualization and Clustering of Self-Organizing Maps

Conclusions

· CONNvis integrates data distribution into the customary Delaunay triangulation, which, when displayed on the SOM grid, enables 2-D visualization of the manifold structure regardless of the data dimensionality.

· CONNvis is also unique among SOM representations in that it shows both forward and backward topology violations on the SOM grid.

Page 15: Exploiting Data Topology in Visualization and Clustering of Self-Organizing Maps

Comments

· Advantages
  ─ CONNvis greatly assists in detailed identification of cluster boundaries.

· Applications
  ─ Data Clustering