Clustering Algorithms Meta Applier (CAMA) Toolbox
Dmitry S. Shalymov, Kirill S. Skrygan, Dmitry A. Lyubimov
Clustering
• Goals
– To detect the underlying structure in data
– To reduce data set capacity
– To extract unique objects
• Usage
– Data mining
– Machine learning
– Financial mathematics
– Optimization
– Statistics
– Pattern recognition
– Control strategies development
SYRCoSE’09
Clustering Problem
• Data set: $X = \{x_1, x_2, \ldots, x_n\}$
• Distance function: $\rho(x, x')$
• Clustering algorithm: $\mathrm{Alg}\colon X \to Y$
Clustering and Classification
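• Mean within-cluster distance, to be minimized:
$W = \dfrac{\sum_{i<j} [y_i = y_j]\, \rho(x_i, x_j)}{\sum_{i<j} [y_i = y_j]} \to \min$
• Mean between-cluster distance, to be maximized:
$B = \dfrac{\sum_{i<j} [y_i \neq y_j]\, \rho(x_i, x_j)}{\sum_{i<j} [y_i \neq y_j]} \to \max$
Here $y_i$ is the cluster label assigned to $x_i$, $\rho$ is the distance function, and $[\cdot]$ equals 1 when the condition holds and 0 otherwise.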
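As a minimal sketch of the two quality criteria, the snippet below computes the mean within-cluster distance W and the mean between-cluster distance B for a given labeling. The function name `cluster_quality`, the Euclidean distance, and the four sample points are assumptions made for illustration; they do not come from the slides.

```python
from itertools import combinations
from math import dist  # Euclidean distance (Python 3.8+), assumed here as rho


def cluster_quality(points, labels):
    """Return (W, B): mean within-cluster and mean between-cluster distance."""
    within_sum = between_sum = 0.0
    within_cnt = between_cnt = 0
    # Visit every unordered pair i < j exactly once.
    for i, j in combinations(range(len(points)), 2):
        d = dist(points[i], points[j])
        if labels[i] == labels[j]:
            within_sum += d
            within_cnt += 1
        else:
            between_sum += d
            between_cnt += 1
    if within_cnt == 0 or between_cnt == 0:
        raise ValueError("need at least one same-cluster and one cross-cluster pair")
    return within_sum / within_cnt, between_sum / between_cnt


# Hypothetical toy data: two tight groups far apart.
points = [(0.0, 0.0), (0.0, 1.0), (5.0, 0.0), (5.0, 1.0)]
good = cluster_quality(points, [0, 0, 1, 1])  # groups match the labels
bad = cluster_quality(points, [0, 1, 0, 1])   # labels cut across the groups
```

A labeling that follows the visible groups should give a smaller W and a larger B than one that cuts across them, which is the sense in which W is minimized and B is maximized.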
Variety of Clustering Algorithms
• Hierarchical
– Agglomerative
– Partitioning
• Iterative
– Hard (K-means, SVM, SPSA)
– Fuzzy (FCM)
• Important parameters
– Distance norm
– Number of clusters
– Initial values of cluster centers
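To make the three parameters concrete, here is a minimal sketch of a hard iterative algorithm (plain K-means) in which the distance function, the number of clusters, and the initial centers are all explicit inputs. The function name `k_means`, its signature, and the sample points are hypothetical and are not taken from the CAMA toolbox.

```python
from math import dist  # Euclidean norm as the default distance (Python 3.8+)


def k_means(points, centers, distance=dist, max_iter=100):
    """Plain K-means; the number of clusters is len(centers)."""
    centers = [tuple(c) for c in centers]
    labels = []
    for _ in range(max_iter):
        # Assignment step: each point goes to its nearest center.
        labels = [
            min(range(len(centers)), key=lambda j: distance(p, centers[j]))
            for p in points
        ]
        # Update step: each center moves to the mean of its members.
        new_centers = []
        for j, old in enumerate(centers):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                new_centers.append(tuple(sum(x) / len(members) for x in zip(*members)))
            else:
                new_centers.append(old)  # keep the center of an empty cluster
        if new_centers == centers:
            break  # converged: centers no longer move
        centers = new_centers
    return labels, centers


# Hypothetical toy data with two initial centers (so K = 2).
points = [(0.0, 0.0), (0.0, 1.0), (5.0, 0.0), (5.0, 1.0)]
labels, centers = k_means(points, centers=[(0.0, 0.0), (5.0, 0.0)])
```

Changing any of the three inputs can change the result. Note that the mean-based update step is the natural choice for the Euclidean norm; with another distance it is only a heuristic.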
Cluster Stability Algorithms
• Indexes
• Stability (similarity, merit) functions
• Probabilistic measures assessing the likelihood of a decision
• Density estimation approaches
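Stochastic Approximation
• Problem: find $\theta^*\colon \partial L / \partial \theta = 0$, where $g(\theta) = \partial L / \partial \theta$
• Recursive stochastic approximation:
$\hat{\theta}_{k+1} = \hat{\theta}_k - a_k\, \hat{g}_k(\hat{\theta}_k)$
• FDSA:
$\hat{g}_{ki}(\hat{\theta}_k) = \dfrac{y(\hat{\theta}_k + c_k e_i) - y(\hat{\theta}_k - c_k e_i)}{2 c_k}$
• SPSA:
$\hat{g}_{ki}(\hat{\theta}_k) = \dfrac{y(\hat{\theta}_k + c_k \Delta_k) - y(\hat{\theta}_k - c_k \Delta_k)}{2 c_k \Delta_{ki}}, \quad \Delta_k = (\Delta_{k1}, \Delta_{k2}, \ldots, \Delta_{kp})^T$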
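The SPSA (simultaneous perturbation stochastic approximation) recursion can be sketched in a few lines. This is an illustration under stated assumptions, not the authors' implementation: the function name `spsa_minimize`, the gain sequences a_k = a / k^0.602 and c_k = c / k^0.101, the ±1 perturbations, and the noise-free quadratic test loss are all choices made here and do not appear on the slides.

```python
import random


def spsa_minimize(loss, theta0, iterations=1000, a=0.2, c=0.1, seed=0):
    """Minimize loss(theta) using simultaneous-perturbation gradient estimates."""
    rng = random.Random(seed)
    theta = list(theta0)
    for k in range(1, iterations + 1):
        a_k = a / k ** 0.602  # step size (assumed decay schedule)
        c_k = c / k ** 0.101  # perturbation size (assumed decay schedule)
        # Random perturbation vector Delta_k with independent +/-1 components.
        delta = [rng.choice((-1.0, 1.0)) for _ in theta]
        theta_plus = [t + c_k * d for t, d in zip(theta, delta)]
        theta_minus = [t - c_k * d for t, d in zip(theta, delta)]
        # Two loss measurements per iteration, whatever the dimension p.
        diff = loss(theta_plus) - loss(theta_minus)
        gradient = [diff / (2.0 * c_k * d) for d in delta]
        theta = [t - a_k * g for t, g in zip(theta, gradient)]
    return theta


def quadratic(theta):
    """Hypothetical test loss with its minimum at (1, -2)."""
    return (theta[0] - 1.0) ** 2 + (theta[1] + 2.0) ** 2


estimate = spsa_minimize(quadratic, [0.0, 0.0])
```

The contrast with FDSA is in the measurement count: FDSA perturbs one coordinate at a time and needs 2p loss measurements per iteration, while the sketch above needs two regardless of p.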
Effectiveness of SPSA
Finding the number of clusters in data set
• Run the SPSA algorithm for different numbers of clusters, K, and calculate the corresponding distortions $d_K$
• Select a transformation power, Y
• Calculate the “jumps” in transformed distortion: $J_K = d_K^{-Y} - d_{K-1}^{-Y}$
• Estimate the number of clusters in the data set by $K^* = \arg\max_K J_K$
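The last two steps of the procedure above can be sketched as follows, assuming the distortions have already been computed for each K. The function name `estimate_num_clusters`, the convention that the transformed distortion for K = 0 is zero, and the distortion values in the example are assumptions made for illustration; the numbers are invented, not measured results.

```python
def estimate_num_clusters(distortions, power):
    """distortions maps K -> d_K; returns (K*, {K: J_K})."""
    # Transform each distortion: d_K ** (-Y).
    transformed = {k: d ** (-power) for k, d in distortions.items()}
    jumps = {}
    for k in sorted(transformed):
        if k - 1 in transformed:
            previous = transformed[k - 1]
        elif k == 1:
            previous = 0.0  # assumed convention: transformed distortion at K = 0 is 0
        else:
            continue  # no jump can be formed without d_(K-1)
        jumps[k] = transformed[k] - previous
    if not jumps:
        raise ValueError("no consecutive values of K to compare")
    best = max(jumps, key=jumps.get)  # K* = argmax_K J_K
    return best, jumps


# Invented distortions that drop sharply at K = 3.
distortions = {1: 10.0, 2: 4.0, 3: 0.5, 4: 0.4, 5: 0.35}
best, jumps = estimate_num_clusters(distortions, power=1.0)
```

The negative power turns a sharp drop in distortion into a large positive jump, so the K at which adding one more cluster helps the most stands out.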
Structure of data set detection
Examples
• Iris (3 clusters, 4 features, 150 instances)
• Wine (3 clusters, 13 features, 178 instances)
• Breast Cancer (2 clusters, 32 features, 569 instances)
• Image Segmentation (7 clusters, 19 features, 2310 instances)
Software Tools for Clustering Analysis
• Research
– COMPACT
– DCPR (Data Clustering & Pattern Recognition)
– FCDA (Fuzzy Clustering and Data Analysis Toolbox)
– ClusterPack Matlab Toolbox
– The Curve Clustering Toolbox
– SOM (Self-Organizing Map)
– Spectral Clustering Toolbox
– Yashil's FCM Clustering
• Licensed software
– SPSS
– STATISTICA
• Characteristics
– Visualization
– Effectiveness analysis with patterns
– Tools to check performance
• Shortcomings
– Limited number of data sets and algorithms
– No way to load one's own algorithm
– No on-line services
– MATLAB dependence
Clustering Algorithms Meta Applier
CAMA. Kernel
CAMA Toolbox
http://ancient.punklan.net:8084/CAMA2/index.jsp
Thank you!