Clustering Algorithms Meta Applier (CAMA) Toolbox Dmitry S. Shalymov Kirill S. Skrygan Dmitry A. Lyubimov

Page 1

Clustering Algorithms Meta Applier (CAMA) Toolbox

Dmitry S. Shalymov, Kirill S. Skrygan, Dmitry A. Lyubimov

Page 2

Clustering

• Goals
– To detect the underlying structure in data
– To reduce the size of the data set
– To extract unique objects

• Usage
– Data mining
– Machine learning
– Financial mathematics
– Optimization
– Statistics
– Pattern recognition
– Control strategies development

SYRCoSE’09

Page 3

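Clustering Problem

Data set: $X = \{x_1, x_2, \ldots, x_n\}$

Distance function: $\rho(x, x')$

Clustering algorithm: $\mathrm{Alg}\colon X \to Y$

Clustering and Classification

Mean intra-cluster distance:

$$W = \frac{\sum_{i<j} [y_i = y_j]\, \rho(x_i, x_j)}{\sum_{i<j} [y_i = y_j]} \to \min$$

Mean inter-cluster distance:

$$B = \frac{\sum_{i<j} [y_i \neq y_j]\, \rho(x_i, x_j)}{\sum_{i<j} [y_i \neq y_j]} \to \max$$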
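
The two quality criteria on this slide, the mean intra-cluster distance W (to be minimized) and the mean inter-cluster distance B (to be maximized), can be computed directly from a labelled data set. The following is a minimal sketch, not the CAMA implementation: the function names, the Euclidean choice for the distance, and the toy points are all assumptions made for illustration.

```python
from itertools import combinations
import math


def euclidean(a, b):
    """Euclidean distance; an assumed choice for the distance function."""
    return math.sqrt(sum((p - q) ** 2 for p, q in zip(a, b)))


def mean_pair_distances(points, labels, dist=euclidean):
    """Return (W, B): mean distance over same-label pairs and over
    different-label pairs, taken over all pairs i < j."""
    intra_sum = inter_sum = 0.0
    intra_n = inter_n = 0
    for i, j in combinations(range(len(points)), 2):
        d = dist(points[i], points[j])
        if labels[i] == labels[j]:
            intra_sum += d
            intra_n += 1
        else:
            inter_sum += d
            inter_n += 1
    # If there are no pairs of one kind, the criterion is undefined.
    W = intra_sum / intra_n if intra_n else float("nan")
    B = inter_sum / inter_n if inter_n else float("nan")
    return W, B


# Hypothetical toy data: two well-separated groups of two points each.
points = [(0.0, 0.0), (0.0, 1.0), (10.0, 0.0), (10.0, 1.0)]
labels = [0, 0, 1, 1]
W, B = mean_pair_distances(points, labels)
print(W, B)
```

A good labelling gives a small W and a large B; swapping two labels in the toy data above raises W and lowers B.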

Page 4

Variety of Clustering Algorithms

• Hierarchical
– Agglomerative
– Partitioning

• Iterative
– Hard (K-means, SVM, SPSA)
– Fuzzy (FCM)

Important parameters
– Distance norm
– Number of clusters
– Initial values of cluster centers
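
To make the three parameters concrete, here is a minimal K-means sketch in which each of them appears as an explicit argument. It is an illustration under assumptions, not code from the toolbox: the names `minkowski` and `k_means` are hypothetical, and the number of clusters is taken to be the number of initial centers supplied.

```python
def minkowski(a, b, p=2):
    """Minkowski distance of order p (p=2 is Euclidean, p=1 is Manhattan)."""
    return sum(abs(x - y) ** p for x, y in zip(a, b)) ** (1.0 / p)


def k_means(points, init_centers, dist=minkowski, max_iter=100):
    """Plain K-means. The number of clusters is len(init_centers).

    Note: the mean-based center update is the exact minimizer only for
    the Euclidean norm; with other norms it is a heuristic.
    """
    centers = [tuple(c) for c in init_centers]
    labels = []
    for _ in range(max_iter):
        # Assignment step: each point goes to its nearest center.
        labels = [
            min(range(len(centers)), key=lambda k: dist(x, centers[k]))
            for x in points
        ]
        # Update step: each center moves to the mean of its members.
        new_centers = []
        for k in range(len(centers)):
            members = [x for x, lab in zip(points, labels) if lab == k]
            if members:
                new_centers.append(
                    tuple(sum(col) / len(members) for col in zip(*members))
                )
            else:
                new_centers.append(centers[k])  # keep an empty cluster's center
        if new_centers == centers:
            break  # converged: centers no longer move
        centers = new_centers
    return centers, labels


# Hypothetical toy data and initial centers.
points = [(0.0, 0.0), (0.0, 1.0), (10.0, 0.0), (10.0, 1.0)]
centers, labels = k_means(points, init_centers=[(0.0, 0.0), (10.0, 0.0)])
print(centers, labels)
```

Changing `init_centers` or passing a different `dist` can change the resulting partition, which is why these parameters are singled out.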

Page 5

Cluster Stability Algorithms

• Indexes

• Stability (similarity, merit) functions

• Probabilistic measures assessing the likelihood of a decision

• Density estimation approaches

Page 6

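Stochastic Approximation

$$\theta^{*}\colon\ \frac{\partial L}{\partial \theta} = 0, \qquad g(\theta) = \frac{\partial L}{\partial \theta}$$

Recursive stochastic approximation

$$\hat{\theta}_{k+1} = \hat{\theta}_k - a_k\, \hat{g}_k(\hat{\theta}_k)$$

FDSA

$$\hat{g}_{ki}(\hat{\theta}_k) = \frac{y(\hat{\theta}_k + c_k e_i) - y(\hat{\theta}_k - c_k e_i)}{2 c_k}$$

SPSA

$$\hat{g}_{ki}(\hat{\theta}_k) = \frac{y(\hat{\theta}_k + c_k \Delta_k) - y(\hat{\theta}_k - c_k \Delta_k)}{2 c_k \Delta_{ki}}, \qquad \Delta_k = (\Delta_{k1}, \Delta_{k2}, \ldots, \Delta_{kp})^{T}$$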
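
The SPSA recursion can be sketched in a few lines. Unlike FDSA, which needs two loss measurements per coordinate, SPSA perturbs all coordinates at once and needs only two measurements per iteration. The sketch below rests on assumptions not stated on the slide: the perturbation components are drawn as ±1 with equal probability, the gain sequences decay as k^(-0.602) and k^(-0.101) (values commonly used in the SPSA literature), and the function name and the quadratic test loss are hypothetical.

```python
import random


def spsa_minimize(loss, theta0, a=0.1, c=0.1, n_iter=2000, seed=0):
    """Minimize `loss` with simultaneous perturbation stochastic approximation."""
    rng = random.Random(seed)
    theta = list(theta0)
    for k in range(1, n_iter + 1):
        a_k = a / k ** 0.602  # step-size gain (assumed decay schedule)
        c_k = c / k ** 0.101  # perturbation size (assumed decay schedule)
        # Random simultaneous perturbation: each component is +1 or -1.
        delta = [rng.choice((-1.0, 1.0)) for _ in theta]
        plus = [t + c_k * d for t, d in zip(theta, delta)]
        minus = [t - c_k * d for t, d in zip(theta, delta)]
        # Only two loss evaluations, whatever the dimension of theta.
        diff = loss(plus) - loss(minus)
        grad = [diff / (2.0 * c_k * d) for d in delta]
        theta = [t - a_k * g for t, g in zip(theta, grad)]
    return theta


def quadratic(theta):
    """Hypothetical test loss with its minimum at (1, -2)."""
    return (theta[0] - 1.0) ** 2 + (theta[1] + 2.0) ** 2


theta_hat = spsa_minimize(quadratic, [0.0, 0.0])
print(theta_hat)
```

For clustering, `theta` would hold the cluster centers and `loss` a distortion measure; that mapping is left out here to keep the sketch short.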

Page 7

Page 8

Effectiveness of SPSA

Page 9

Finding the number of clusters in a data set

• Run the SPSA algorithm for different numbers of clusters, K, and calculate the corresponding distortions $d_K$

• Select a transformation power, Y

• Calculate the “jumps” in transformed distortion: $J_K = d_K^{-Y} - d_{K-1}^{-Y}$

• Estimate the number of clusters in the data set by $K^{*} = \arg\max_K J_K$
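
The last three steps of this procedure reduce to simple arithmetic once the distortions are known. The sketch below assumes the distortions have already been computed for consecutive K starting at 1, and it uses the convention that the transformed distortion for K = 0 is zero, which comes from the usual formulation of the jump method rather than from the slide. The function name, the distortion values, and the power Y = 1 are hypothetical.

```python
def estimate_num_clusters(distortions, power):
    """Pick the K with the largest jump in transformed distortion.

    distortions: dict mapping K (consecutive integers from 1) to d_K > 0.
    power: the transformation power Y.
    """
    # Transform each distortion: d_K ** (-Y).
    transformed = {K: d ** (-power) for K, d in distortions.items()}
    transformed[0] = 0.0  # assumed convention for the K = 0 term
    # Jump J_K = d_K^(-Y) - d_(K-1)^(-Y).
    jumps = {K: transformed[K] - transformed[K - 1] for K in distortions}
    best_k = max(jumps, key=jumps.get)
    return best_k, jumps


# Hypothetical distortions: a sharp drop when going from 2 to 3 clusters.
distortions = {1: 10.0, 2: 4.0, 3: 0.5, 4: 0.45, 5: 0.4}
best_k, jumps = estimate_num_clusters(distortions, power=1.0)
print(best_k, jumps)
```

The transformation matters: in raw distortion the drop from K = 1 to K = 2 is the largest, while after the negative-power transform the jump at K = 3 dominates.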

Page 10

Structure of data set detection

Page 11

Examples

• Iris (3 clusters, 4 features, 150 instances)

• Wine (3 clusters, 13 features, 178 instances)

• Breast Cancer (2 clusters, 32 features, 569 instances)

• Image Segmentation (7 clusters, 19 features, 2310 instances)

Page 12

Software Tools for Clustering Analysis

• Research
– COMPACT
– DCPR (Data Clustering & Pattern Recognition)
– FCDA (Fuzzy Clustering and Data Analysis Toolbox)
– ClusterPack Matlab Toolbox
– The Curve Clustering Toolbox
– SOM (Self-Organizing Map)
– Spectral Clustering Toolbox
– Yashil's FCM Clustering

• Licensed software
– SPSS
– STATISTICA

• Characteristics
– Visualization
– Effectiveness analysis with patterns
– Tools to check performance

• Shortcomings
– Limited number of data sets and algorithms
– No way to load one's own algorithm
– No online services
– Dependence on MATLAB

Page 13

Clustering Algorithms Meta Applier

Page 14

Clustering Algorithms Meta Applier

Page 15

CAMA. Kernel

Page 16

CAMA. Kernel

Page 17

CAMA Toolbox

http://ancient.punklan.net:8084/CAMA2/index.jsp

Page 18

CAMA Toolbox

Page 19

CAMA Toolbox

Page 20

Thank you!
