model order selection for multiple cooperative swarms clustering

8/3/2019 Model Order Selection for Multiple Cooperative Swarms Clustering

http://slidepdf.com/reader/full/model-order-selection-for-multiple-cooperative-swarms-clustering 1/15

Model order selection for multiple cooperative swarms clustering

using stability analysis

Abbas Ahmadi a,⇑, Fakhri Karray b, Mohamed S. Kamel b

a Industrial Engineering Department, Amirkabir University of Technology, Tehran, Iranb Department of Electrical and Computer Engineering, University of Waterloo, Waterloo, Ontario, Canada

a r t i c l e i n f o

Article history:

Available online 28 October 2010

Keywords:

Model order selection

Data clustering

Particle swarm optimization

Cooperative swarms

Swarm intelligence

a b s t r a c t

Extracting different clusters of a given data is an appealing topic in swarm intelligence

applications. This paper introduces two main data clustering approaches based on particle

swarm optimization, namely single swarm and multiple cooperative swarms clustering. A

stability analysis is next introduced to determine the model order of the underlying data

using multiple cooperative swarms clustering. The proposed approach is assessed using

different data sets and its performance is compared with that of k-means, k-harmonic

means, fuzzy c -means and single swarm clustering techniques. The obtained results indi-

cate that the proposed approach fairly outperforms the other clustering approaches in

terms of different cluster validity measures.

Ó 2010 Elsevier Inc. All rights reserved.

1. Introduction

Recognizing subgroups of the given data is of interest in data clustering. A vast number of clustering techniques have

been developed to deal with unlabelled data based on different assumptions about the distribution, shape and size of the

data. Most of the clustering techniques require a priori knowledge about the number of clusters [5,25], whereas some other

approaches are capable of extracting such information [16].

Swarm intelligence approaches such as particle swarm optimization (PSO), biologically inspired by the social behavior of

flocking birds [15], have been applied for clustering applications [1,3,7,8,19,22–24]. The goal of PSO-based clustering tech-

niques is usually to find cluster centers. Most of the recent swarm clustering techniques use a single swarm approach to

reach a final clustering solution [8,18,19]. Multiple swarms clustering has been recently proposed [3]. A multiple swarms

clustering approach is useful to deal with high dimensional data as it uses a divide and conquer strategy. In other words, it

distributes the search space among multiple swarms, each of which explores its associated division while cooperating with

others. The novelty of this paper is to apply a stability analysis for determining the number of clusters in underlying data

using multiple cooperative swarms [4].This paper is organized as follows. First, an introduction to cluster analysis is given. Particle swarm optimization and par-

ticle swarm clustering approaches are next explained. Then, model order selection using stability analysis is described. Fi-

nally, experiments using eight different data sets and concluding remarks are provided.

2. Cluster analysis

Organizing a set of unlabeled data points Y into several clusters using some similarity measure is known as clustering [9].

The notion of similarity between samples is usually represented using their corresponding distance. Each cluster C k contains

0020-0255/$ - see front matter Ó 2010 Elsevier Inc. All rights reserved.doi:10.1016/j.ins.2010.10.010

⇑ Corresponding author. Tel.: +98 21 222 39403.

E-mail address: [email protected] (A. Ahmadi).

Information Sciences 182 (2012) 169–183

Contents lists available at ScienceDirect

Information Sciences

j o u r n a l h o m e p a g e : w w w . e l s e v i e r . c o m / l o c a t e / i n s

http://dx.doi.org/10.1016/j.ins.2010.10.010

mailto:[email protected]


http://www.sciencedirect.com/science/journal/00200255

http://www.elsevier.com/locate/ins

http://www.elsevier.com/locate/ins

http://www.sciencedirect.com/science/journal/00200255


mailto:[email protected]




a set of similar data points given by C k ¼ y k j

n onk

j¼1, where y k j denotes data point j in cluster k, nk indicates the number of its

associated data points and K is the number of clusters. Let us assume the solution of a clustering algorithm AK (Y ) for the given

data points Y of size N is presented by T :¼ AK (Y ) which is a vector of labels T ¼ ftigN i¼1, where t i denotes the obtained label for

data point i and t i 2 L : f1; . . . ; K g.

The main approaches for grouping data are hierarchical and partitional clustering. The hierarchical clustering approach

generates a hierarchy of clusters known as a dendrogram. Apparently, the dendrogram can be broken at different levels

to yield different clusterings of the data [13]. To build the dendrogram, agglomerative and divisive approaches are used.

In an agglomerative approach, each data point is initially considered as a cluster. Then, the two close clusters merge togetherand produce a new cluster. Merging close clusters is continued until all points form a single cluster. Different notions of

closeness exist, which are single link, complete link and average link.

In contrast to the agglomerative approach, the divisive approach begins with a single cluster containing all data points. It

then splits the data points into two separate clusters. This procedure continues till each cluster includes a single data point

[13].

In the partitional approach to data clustering, the aim is to partition a given data set into a pre-specified number of clus-

ters. Various partitional clustering algorithms are available. The most famous one is the k-means algorithm. The k-means

(KM) procedure initiates with k arbitrary random points as cluster centers. The algorithm assigns each data point to the near-

est center. New centers are then computed based on the associated data points of each cluster. This procedure is repeated

until no improvement is obtained after a certain number of iterations.

Unlike k-means, the k-harmonic means (KHM) algorithm, introduced by Zhang and Hsu [25], does not rely on the initial

solution. It utilizes the harmonic average of distances from each data point to the centers. As compared to the k-means algo-

rithm, it improves the quality of clustering results in certain cases [25].Another extension to k-means algorithm was suggested by Bezdeck using fuzzy set theory [5]. This algorithm is known as

fuzzy c -means (FCM) clustering in which every data point is associated to each cluster with some degree of membership.

Another class of partitional clustering is probabilistic clustering approaches, such as particle swarm-based clustering,

which are developed using probability theory [14].

In particle swarm clustering, the intention is to find the centers of clusters such that an objective function is optimized.

The objective function can be defined in terms of cluster validity measures. These measures are used to evaluate the quality

of clustering techniques [11]. There are numerous measures, which mainly tend to minimize intra-cluster distance, and/or

maximize the inter-cluster distance. Some widely used quality measures of clustering techniques are next described.

2.1. Compactness measure

Compactness measure, also represented by within-cluster distance, indicates how compact the clusters are [9]. This mea-

sure is denoted by F c ðm1;

. . .;

mK

Þ or simply by F c ðMÞ, where M = (m1

, . . . , mK

). Having K clusters, compactness measure isdefined as

F c ðMÞ ¼1

K

XK

k¼1

1

nk

Xnk

j¼1

d mk; y k j

; ð1Þ

where mk denotes the center of the cluster k and d(Á) stands for Euclidean distance. Clustering techniques tend to minimize

this measure.

2.2. Separation measure

This measure, also known as between-cluster distance, evaluates the separation of the clusters [9]. It is given by

F sðMÞ ¼

1

K ðK À 1ÞXK

j¼1

XK

k¼ jþ1 d m

j;m

kÀ Á:ð

2Þ

It is desirable to maximize this measure, or equivalently minimize ÀF sðMÞ.

2.3. Turi’s validity index

Turi’s validity index [21] is defined as

F TuriðMÞ ¼ ðc Â N ð2; 1Þ þ 1Þ Âintra

inter ; ð3Þ

where c is a user-specified parameter, equal to unity in this paper, and NðÁÞ is a Gaussian distribution with l = 2 and r = 1.

The intra denotes the within-cluster distance provided in Eq. (1). Also, the inter term is the minimum Euclidean distance be-

tween the cluster centers computed by

inter ¼ minfdðmk;mqÞg; k ¼ 1; 2; . . . ;K À 1; q ¼ k þ 1; . . . ;K : ð4Þ

170 A. Ahmadi et al. / Information Sciences 182 (2012) 169–183



The aim of the different clustering approaches is to minimize Turi’s index.

2.4. Dunn’s index

Let’s define a(C k, C q) and b(C k) as

a C k;C q

¼ min

x 2C k ;z2C qdð x ; zÞ;

bðC kÞ ¼ max x ;z2C k

dð x ; zÞ:ð5Þ

Now, Dunn’s index [10] can be computed as

F DunnðMÞ ¼ min16k6K

minkþ16q6K

a C k; C q

max16~k6K bðC

~kÞ

0@

1A

8<:

9=;: ð6Þ

Clustering techniques are required to maximize Dunn’s index.

2.5. S_Dbw index

Let the average scattering of the clusters is considered as a measure of compactness expressed by

Scatt ¼ K À1XK

k¼1

rðC kÞ rðY Þk k

; ð7Þ

where r(Á) stands for the variance of the data and kÁk indicates Euclidean norm. Then, the separation measure is given by

Sep ¼1

K ðK À 1Þ

XK

k¼1

XK

q¼1

D zk;q

À Ámax D mkð Þ;D mqð Þf g

; q–k; ð8Þ

where zk,q is the middle point of the line segment defined by cluster centers mk and mq. Also, DðmkÞ denotes a density func-

tion around point mk which is estimated by DðmkÞ ¼Pnk

j¼1 f mk; yk

j

, and

f mk; yk j ¼

1 if d mk; yk j

< ~r

0 Otherwise;( ð9Þ

where ~r ¼ K À1 ffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffi PK

k¼1krðC kÞq

k. Finally, S_Dbw index [11,12] is defined as

F S DbwðMÞ ¼ Scatt þ Sep: ð10Þ

Maximizing this index is of interest when trying to cluster a set of data into several groups.

3. Particle swarm clustering

Particle swarm optimization (PSO) is a search algorithm introduced for dealing with optimization problems [15]. The PSO

procedure commences with an initial swarm of particles in an n-dimensional space and evolves through a number of iter-

ations to find an optimal solution given a predefined objective function F . Each particle i is distinguished from others by its

position and velocity vectors, denoted by x i and vi, respectively. To choose a new velocity, each particle considers three com-ponents: its previous velocity, a personal best position and a global best position. The personal best and global best positions,

denoted by x pi and x *, respectively, keep track of the best solutions obtained so far by the associated particle and the swarm.

Thus, the new velocity and position are updated as

v i t þ 1ð Þ ¼ w v iðt Þ þ c 1r 1 x pi ðt Þ À x iðt Þ

À Áþ c 2r 2 x Ãðt Þ À x iðt Þð Þ; ð11Þ

x i t þ 1ð Þ ¼ x iðt Þ þ v i t þ 1ð Þ; ð12Þ

where w indicates the impact of the previous history of velocities on the current velocity, c 1 and c 2 are cognitive and social

components, respectively, r 1 and r 2 are generated randomly using a uniform distribution in interval [0,1]. If minimizing the

objective function is of interest, the personal best position of particle i at iteration t can be provided by

x pi ðt þ 1Þ ¼

x pi ðt Þ if F x iðt þ 1Þð ÞP F x pi ðt ÞÀ Á;

x iðt þ 1Þ otherwise:( ð13Þ

A. Ahmadi et al. / Information Sciences 182 (2012) 169–183 171



Moreover, the global best position is updated as

x Ãðt þ 1Þ ¼ arg min x

p

iðt Þ

F ð x pi ðt ÞÞ; i ¼ 1; 2; . . . ;n: ð14Þ

Maximum number of iterations, number of iterations with no improvement, and minimum objective function criterion

are common strategies to terminate the PSO procedure [2]. The first strategy is taken into consideration hereafter in this

paper.

3.1. Single swarm clustering

In this approach, the position of the particle i is expressed by x i = (m1, . . . , mK )i or simply by x i = (M)i, where

M = (m1, . . . , mK ) and mk denotes the center of cluster k. In other words, each particle contains a representative for the center

of all clusters. The representation of particle position x i for three-cluster case (K = 3) is illustrated in Fig. 1.

To model the clustering problem as an optimization problem, it is required to formulate an objective function. Cluster

validity measures described in Section 2 can be considered as the objective function. By considering F ðm1; . . . ; mK Þ or

F ðMÞ as the required objective function, PSO algorithm can explore the search space to find the cluster centers.

When the dimensionality of the data increases and the number of clusters is large, the ability of the single swarm clus-

tering is not sufficient to traverse all of the search space. Instead, multiple cooperative particle swarms can be considered to

determine clusters’ centers [3].

3.2. Multiple cooperative swarms clustering

Multiple cooperative swarms clustering approach assumes that the number of swarms is equal to the number of clusters

and particles of each swarm are candidates for the corresponding cluster’s center. The procedure of multiple cooperative

swarms clustering is completed through two main phases: to distribute the search task among multiple swarms and to build

up a cooperation mechanism between swarms. More detailed description of the proposed distribution and cooperation strat-

egies is given next.

3.2.1. Distribution strategy

The core idea is to divide the search space into different divisions sk, k 2 [1,. . . , K ]. Each division sk is denoted by its center

zk and width Rk; i.e., sk = f (zk, Rk). To distribute the search space into different divisions, a super-swarm is used. The super-

swarm, which is a population of particles, aims to find the center of divisions zk, k 2 [1,. . . , K ]. Each particle of the super-

swarm is defined as (z

1

,. . .

, z

K

), where z

k

denotes the center of division k. By repeating the single swarm clustering procedureusing one of the mentioned cluster validity measures as the objective function, the centers of different divisions are obtained.

Then, the width of divisions are computed by

Rk ¼ akkmax

; k 2 ½1; . . . ; K ; ð15Þ

where a is a positive constant selected experimentally and kkmax is the square root of the biggest eigenvalue of data points

belonging to division k [3].

Fig. 1. Representation of particle position in single swarm clustering.


http://-/?-

http://-/?-



3.2.2. Cooperation strategy

After distributing the search space, each division is assigned to a swarm. That is, the number of swarms is equal to the

number of divisions, or clusters, and particles of each swarm are candidates for the corresponding cluster’s center. In this

stage, there is information exchange between swarms and each swarm knows the global best, the best cluster center, of

the other swarms obtained so far. Therefore, there is a cooperative search scheme where each swarm explores its related

division to find the best solution for the associated cluster center interacting with other swarms. The schematic representa-

tion of the multiple cooperative swarms is depicted in Fig. 2.

In the multiple cooperative clustering approach, particles of each swarm are required to optimize the following

problem:

min F ðMiÞ

s:t: : m1

i 2 s1; . . . ;mK i 2 sK ;

i 2 ½1; . . . ;n;

ð16Þ

where FðÁÞ denotes one of the cluster validity measures introduced in Section 2.

The search procedure using multiple swarms is performed in a parallel scheme. First, n different solutions for cluster cen-

ters are obtained using Eq. (16). The best solution is called a new candidate for the cluster centers, denoted by

M0 = (m01, . . . , m0K ). To update the cluster centers, the following rule is applied:

Mðnew

Þ ¼M0

if F ðM0Þ 6 F ðMðoldÞÞ;

MðoldÞ

otherwise;

(ð17Þ

where M = (m1, . . . , mK ). In other words, if the objective value of the new candidate for cluster centers ( M0) is smaller than

that of the former candidate (M(old)), the new solution is accepted; otherwise, it is rejected. The overall algorithm of multiple

swarms clustering is provided in Algorithm 1.

The PSO-based clustering approaches assume that the number of clusters is known in advance. In this paper, the notion of

stability analysis is used to extract the number of clusters for the underlying data.

4. Model order selection using stability approach

Determining the number of clusters in data clustering is known as a model order selection problem. There exist two main

stages in model order selection. First, a clustering algorithm should be chosen. Then, the model order needs to be extracted,

given a set of data [16].

Fig. 2. Schematic representation of multiple swarms. First, the cooperation between multiple swarms initiates and each swarm investigates its associated

division (a). When the particles of each swarm converge (b), the final solution for cluster centers is revealed.


http://-/?-

http://-/?-



Algorithm 1: Multiple cooperative swarms clustering

Stage 1: Distribute search space into K different divisions s1, . . . , sK

Obtain the center of divisions z1, . . . , zK

Obtain the width of divisions R1, . . . , RK

Stage 2: Cooperate till convergence

Explore the division by

– 1.1. Computing new positions and velocities of all particles of swarms– 1.2. Determining the fitness value of all particles using the associate cluster validity measure

– 1.3. Choosing a solution that minimizes the optimization problem provided in Eq. (16) and denoting it as the

new candidate for cluster centers (m01, . . . , m0K )

Update the cluster centers

– 2.1. If the objective value of the new candidates for centers of clusters (m01, . . . , m0K ) is smaller than that of

previous iteration, accept the new solution; otherwise, reject it

– 2.2. If termination criterion is achieved, stop; otherwise, continue this stage

Most of the clustering approaches assume that the model order is known in advance. Here, we employ stability analysis to

obtain the number of clusters when using the multiple cooperative swarms to cluster the underlying data. A description of

stability analysis is provided before explaining the core algorithm.

Stability concept is used to evaluate the robustness of a clustering algorithm. In other words, the stability measure indi-cates how much the results of the clustering algorithm are reproducible on other data drawn from the same source. Some

examples of stable and unstable clustering are shown in Fig. 3 when the aim is to cluster the presented data into two groups.

As can be seen in Fig. 3, data points shown in Fig. 3(a) provide a stable clustering solution in a sense that the same clus-

tering results are obtained by repeating a clustering algorithm several times. However, the data points illustrated in Fig. 3(b)

and (c) do not yield stable clustering solutions when two clusters are of interest. That is, different results are generated by

running the clustering algorithm a number of times. Each line in Fig. 3 presents a possible clustering solution for the corre-

sponding data. The reason for getting unstable clustering solutions in these cases is the inappropriate number of clusters. In

other words, stable results are obtained for these data sets by choosing a suitable number of clusters. The proper number of

clusters for these data are three and four, respectively.

As a result, one of the issues that affects the stability of the solutions produced by a clustering algorithm is the model

order. For example, by assuming a large number of clusters the algorithm generates random groups of data influenced by

the changes observed in different samples. On the other hand, by choosing a very small number of clusters, the algorithm

may compound separated structures together and return unstable clusters [16]. As a result, one can utilize the stability mea-sure for estimating the model order of the unlabeled data [4].

The multiple cooperative swarms clustering data requires a priori knowledge of the model order in advance. In order to

enable this approach to estimate the number of clusters, the stability approach is taken into consideration. This paper uses

the stability method introduced by Lange et al. for the following reasons:

it requires no information about the data,

it can be applied to any clustering algorithm,

it returns the correct model order using the notion of maximal stability.

The required procedure for model order selection using stability analysis is provided in Algorithm 2.

A more precise schematic description of this algorithm is depicted in Fig. 4. The goal is to get the true cluster centers,

denoted by (m1, . . . , mK ), for the given data Y . First, the underlying data is randomly divided into two halves Y 1 and Y 2. Multi-

ple cooperative swarms approach is used to cluster these two halves and the obtained solutions are shown by T 1 and T 2,respectively. Next, a classifier /(Y 1) is trained by using the first half of data and its associated labels ( Y 1,T 1).

(a) Stable (b) Unstable (c) Unstable

Fig. 3. Examples of stable and unstable clustering when two clusters are desired.




Algorithm 2: Model order selection using stability analysis

for k 2 [2 . . . K ] do

for r 2 [1. . . r max] do

– Randomly split the given data Y into two halves Y 1, Y 2– Cluster Y 1, Y 2 independently using an appropriate clustering approach; i.e., T 1 :¼ Ak(Y 1), T 2 :¼ Ak(Y 2)

– Use (Y 1, T 1) to train classifier /(Y 1) and compute T 02 ¼ /ðY 2Þ

– Calculate the distance of the two solutions T 2 and T 02 for Y 2; i.e., dr ¼ dðT 2; T 02Þ

– Again cluster Y 1, Y 2 by assigning random labels to points

– Extend random clustering as above, and obtain the distance of the solutions; i.e., dnr

end for

– Compute the stability stab(k) = meanr (d)

– Compute the stability of random clusterings stabrand(k) = meanr (dn)

– sðkÞ ¼ stabðkÞstabrand ðkÞ

end for

– Select the model order k* such that k* = arg mink{s(k)}

Now, the trained classifier can be used to determine the labels for Y 2, denoted by T 02. Consequently, there exist two dif-ferent labels for Y 2. The more similar the labels are, the more stable the results would be. The similarity of the obtained solu-

tions can be stated in terms of their associated distance. Accordingly, if the distance is low, it is said that the obtained results

are stable. As it is revealed from the algorithm, the explained procedure is repeated several times (r max) to ensure that the

reported results are not generated by random. Furthermore, we repeat the whole procedure for different values of K to ex-

tract a correct model order.

Next, the most important aspects of the model order selection algorithm are explained.

4.1. Classifier /(Y)

A set of labeled data is required for training a classifier /. The data set Y 1 and its clustering solution from algorithm

Ak, i.e., T 1 :¼ Ak(Y 1), can be used to establish a classifier. There are a vast range of classifiers that can be used for clas-

sification. In this paper,k

-nearest neighbor (KNN) classifier was chosen as it requires no assumption on the distribution

of data.

Fig. 4. The schematic description of the model order selection algorithm.




4.2. Distance of solutions provided by clustering and classifier for the same data

Having a set of training data, the classifier can be tested using a test data Y 2. Its solution is given by T 02 ¼ /ðY 2Þ. But, there

exists another solution for the same data obtained from the multiple cooperative swarms clustering technique, i.e.,

T 2 :¼ Ak(Y 2). The distance of these two solutions is calculated by

d T 2; T 02À Á ¼ arg min

x2qkX

N

i¼1

# xðt 2iÞ – t 02iÈ É; ð18Þ

where

# t 2i – t 02i

È É¼

1 if t 2i – t 02i;

0 otherwise:

&ð19Þ

Also, qk contains all permutations of k labels and x is the optimal permutation which produces the maximum agreement

between two solutions [16].

2 3 4 5 6 7

14

16

18

20

22

24

k

s t a b

( k )

(a) Excluding random clustering

2 3 4 5 6 7

0.54

0.56

0.58

0.6

0.62

0.64

k

s

( k )

(b) Including random clustering

Fig. 5. The effect of the random clustering on the selection of the model order.

Table 1

Data sets selected from UCI machine learning repository.

Data set Classes Samples Dimensionality

Iris 3 150 4

Wine 3 178 13

Teaching assistant evaluation (TAE) 3 151 5

Breast cancer 2 569 30

Zoo 7 101 17

Glass identification 7 214 9

Diabetes 2 768 8




1. Speech data 2. Iris data

0 20 40 60 80 100

−1.5

−1

−0.5

0

0.5

1

1.5

2

Iterations

T u r i i n d e x

K−means

Single swarm

Multiple swarms

0 20 40 60 80 100

−0.5

0

0.5

1

1.5

2

Iterations

T u r i i n d e x

K−means

Single swarm

Multiple swarms

3. Wine data 4. Teaching assistant evaluation data

0 20 40 60 80 100−0.8

−0.6

−0.4

−0.2

0

0.2

0.4

0.6

Iterations

T u r i i n d e x

K−means

Single swarm

Multilpe swarms

0 20 40 60 80 100−0.8

−0.6

−0.4

−0.2

0

0.2

0.4

0.6

0.8

1

Iterations

T u r i i n d e x

K−means

Single swarmMultiple swarms

5. Breast cancer data 6. Zoo data

0 20 40 60 80 100

−0.6

−0.4

−0.2

0

0.2

0.4

0.6

Iterations

T u r i i n d e x

K−means

Single swarm

Multiple swarms

0 20 40 60 80 100

−6

−4

−2

0

2

4

6

8

10

Iterations

T u r i i n d e x

K−means

Single swarm

Multiple swarms

7. Glass data 8. Diabetes data

0 20 40 60 80 100

−6

−4

−2

0

2

4

6

8

10

Iterations

T u r i i n d e x

K−means

Single swarm

Multiple swarms

0 20 40 60 80 100

−0.3

−0.2

−0.1

0

0.1

0.2

0.3

0.4

0.5

Iterations

T u r i i n d e x

K−means

Single swarm

Multiple swarms

Fig. 6. Comparing the performance of the multiple cooperative swarms clustering with k-means and single swarm clustering in terms of Turi’s index.




4.3. Random clustering

The stability measure depends on the number of classes or clusters. For instance, the accuracy rate of 50% for binary clas-

sification is more or less the same as that of a random guess. However, this rate for k = 10 is much better than a random

predictor. In other words, if a clustering approach outcomes the same accuracy for model orders k1 and k2, where k1 < k2,

the clustering solution for k2 is more reliable than the other solution. Hence, the primary stability measure obtained for a

certain value k, stab(k) in Algorithm 2, should be normalized using the stability rate of a random clustering stabrand(k)

[16]. Therefore, the final stability measure for the model order k is obtained as follows:

sðkÞ ¼stabðkÞ

stabrandðkÞ

& ': ð20Þ

The effect of the random clustering is studied on the performance of the Zoo data set provided in Section 5 to determine

the model order of the data using k-means algorithm. The stability measure for different number of clusters with and with-

out using random clustering is shown in Fig. 5.

As depicted in Fig. 5, the model order of the zoo data using k-means clustering is recognized as 2 without considering

random clustering, while it becomes 6, which is close to the true model order, by normalizing the primary stability measure

by the stability of the random clustering.

4.4. Appropriate clustering approach

For a given data set, the algorithm does not provide the same result for multiple runs. Moreover, the model order is highlydependent on the type of appropriate clustering approach that is used in this algorithm (see Algorithm 2), and there is no

specific emphasis in the work of Lange et al. [16] on the type of clustering algorithm that should be used. K -means and k-

harmonic means algorithms are either sensitive to the initial conditions or to the type of data. In other words, they cannot

capture true underlying patterns of the data, and consequently the estimated model order is not robust. However, PSO-based

clustering methods such as single swarm or multiple cooperative swarms clustering do not rely on initial conditions, and

they are the search schemes which can explore the search space more effectively and may escape from local optimums.

Moreover, as described earlier, multiple cooperative swarms clustering is more probable to get the optimal solution as com-

pared with single swarm clustering and it can provide more stable and robust solutions.

Therefore, the multiple cooperative swarms approach distributes the search space among multiple swarms and enables

cooperation between swarms, leading to an effective search strategy. Accordingly, we propose to use multiple cooperative

swarms clustering in stability analysis-based approach to find the model order of the given data.

5. Experimental results

The performance of the proposed approach is evaluated and compared with other approaches such as single swarm clus-

tering, k-means and k-harmonic means clustering using eight different data sets, seven of which are selected from the UCI

machine learning repository [6], and the last one being a speech data set is taken from the standard TIMIT corpus [17]. The

name of data sets chosen from UCI machine learning repository, their associated number of classes, samples and dimensions

are provided in Table 1.

Also, the speech data include four phonemes: /aa/, /ae/, /ay/ and /el/, from the TIMIT corpus. A total of 800 samples from

these classes was selected, and twelve mel-frequency cepstral coefficients [20] have been considered as speech features.

Table 2

Average and standard deviation of different measures for speech data.

Method Turi’s index Dunn’s index S_Dbw

K -means 0.8328 [0.8167] 0.0789 [0.0142] 3.3093 [0.327]

K -harmonic means 3.54e05 [2.62e05] 0.0769 [0.0001] 3.3242 [0.0001]

Single swarm À1.4539 [0.8788] 0.1098 [0.014] 1.5531 [0.0372]

Cooperative swarms À1.6345 [1.0694] 0.1008 [0.0153] 1.583 [0.0388]

Table 3

Average and standard deviation of different measures for iris data.


K -means 0.4942 [0.3227] 0.1008 [0.0138] 3.0714 [0.2383]


Single swarm À0.8802 [0.4415] 0.3979 [0.0001] 1.4902 [0.0148]



http://-/?-

http://-/?-



The performance of the multiple cooperative swarms clustering approach is compared with k-means and single swarm

clustering techniques in terms of Turi’s validity index over 80 iterations (Fig. 6). The results are obtained by repeating the

algorithms 30 independent times. For these experiments, the parameters are considered as w = 1.2 (decreasing gradually

[2]), c 1 = 1.49, c 2 = 1.49, n = 30 (for all swarms). Also, the number of clusters is considered to be equal to the number of

classes.

Table 4Average and standard deviation of different measures for wine data.


K -means 0.2101 [0.3565] 0.016 [0.006] 3.1239 [0.4139]


Single swarm À0.3669 [0.4735] 0.1122 [0.0213] 1.3843 [0.0026]


Table 5

Average and standard deviation of different measures for TAE data.


K -means 0.6329 [0.7866] 0.0802 [0.0306] 3.2321 [0.5205]

K -harmonic means 1.36e06 [1.23e06] 0.123 [0.0001] 2.7483 [0.0001]Single swarm À0.5675 [0.6525] 0.1887 [0.0001] 1.4679 [0.0052]


Table 6

Average and standard deviation of different measures for breast cancer data.


K -means 0.1711 [0.1996] 0.0173 [0.0001] 2.1768 [0.0001]


Single swarm À0.62 [0.7997] 217.59 [79.079] 1.7454 [0.079]


Table 7

Average and standard deviation of different measures for zoo data.


K -means 0.8513 [1.0624] 0.2228 [0.0581] 2.5181 [0.2848]

K -harmonic means 1.239 [1.5692] 0.3168 [0.0938] 2.3048 [0.1174]

Single swarm À5.5567 [3.6787] 0.5427 [0.0165] 2.0528 [0.0142]


Table 8

Average and standard deviation of different measures for glass identification data.


K -means 0.7572 [0.9624] 0.0286 [0.001] 2.599 [ 0.2571]


Single swarm À4.214 [3.0376] 0.1877 [0.0363] 2.6797 [0.3372]


Table 9

Average and standard deviation of different measures for diabetes data.


K -means 0.243 [0.3398] 0.0137 [0.0001] 2.297 [ 0.0001]


Single swarm À0.2203 [0.2621] 1298.1 [0.0001] 1.5202 [0.027]





As illustrated in Fig. 6, multiple cooperative swarms clustering provides better results as compared with k-means, as well

as single swarm clustering approaches, in terms of Turi’s index for a majority of the data sets.

In Tables 2–9, the multiple cooperative swarms clustering is compared with other clustering approaches using different

cluster validity measures over 30 independent runs. The results presented for different data sets are in terms of average and

standard deviation ([r]) values.

As observed in Tables 2–9, multiple swarms clustering is able to provide better results in terms of the different cluster

validity measures for most of the data sets. This is because it is capable of manipulating multiple-objective problems, in

Speech data Iris data Wine data TAE data

K -means

2 3 4 5 6 7

0.1

0.15

0.2

0.25

0.3

0.35

0.4

0.45

k

s

( k )

2 3 4 5 6 7

0.05

0.1

0.15

0.2

0.25

0.3

0.35

0.4

0.45

0.5

k

s

( k )

2 3 4 5 6 7

0.1

0.15

0.2

0.25

0.3

0.35

0.4

k

s

( k )

2 3 4 5 6 7

0.25

0.3

0.35

0.4

0.45

0.5

0.55

0.6

0.65

k

s

( k )

K -harmonic means

2 3 4 5 6 7

0.1

0.15

0.2

0.25

0.3

0.35

0.4

k

s

( k )

2 3 4 5 6 7

0.22

0.24

0.26

0.28

0.3

0.32

0.34

0.36

0.38

0.4

k

s

( k )

2 3 4 5 6 7

0.3

0.35

0.4

0.45

0.5

k

s

( k )

2 3 4 5 6 7

0.25

0.3

0.35

0.4

0.45

0.5

0.55

k

s

( k )

Fuzzy c-means

2 3 4 5 6 7

0.1

0.15

0.2

0.25

0.3

0.35

k

s

( k )

2 3 4 5 6 7

0.05

0.1

0.15

0.2

0.25

0.3

0.35

k

s

( k )

2 3 4 5 6 7

0.1

0.15

0.2

0.25

0.3

0.35

0.4

k

s

( k )

2 3 4 5 6 7

0.2

0.25

0.3

0.35

0.4

0.45

k

s

( k )

Single swarm

2 3 4 5 6 7

0.58

0.6

0.62

0.64

0.66

0.68

0.7

0.72

0.74

k

s

( k )

2 3 4 5 6 7

0.4

0.41

0.42

0.43

0.44

0.45

0.46

0.47

0.48

k

s

( k )

2 3 4 5 6 7

0.43

0.44

0.45

0.46

0.47

0.48

0.49

0.5

0.51

k

s

( k )

2 3 4 5 6 7

0.58

0.6

0.62

0.64

0.66

0.68

0.7

0.72

0.74

k

s

( k )

Multiple swarms

2 3 4 5 6 7

0.58

0.59

0.6

0.61

0.62

0.63

0.64

0.65

k

s

( k )

2 3 4 5 6 7

0.45

0.5

0.55

0.6

0.65

0.7

0.75

k

s

( k )

2 3 4 5 6 7

0.35

0.4

0.45

0.5

0.55

0.6

0.65

k

s

( k )

2 3 4 5 6 7

0.5

0.52

0.54

0.56

0.58

0.6

0.62

0.64

0.66

0.68

k

s

( k )

Fig. 7. Stability measure as a function of model order: speech, iris, wine and TAE data sets.




contrast to k-means and k-harmonic means clustering, and it distributes the search space between multiple swarms and

solves the problem more effectively.

Now, the stability-based approach for model order selection in multiple cooperative swarms clustering is studied. The

PSO parameters are kept the same as before, and r max = 30 and k is considered to be 25 for KNN classifier. The stability mea-

sures of different model orders for the multiple cooperative swarms and other clustering approaches using different data sets

are presented in Figs. 7 and 8. The results for speech, iris, wine and teaching assistant evaluation data sets are provided in

Fig. 7 and for the last four data sets including breast cancer, zoo, glass identification and diabetes are shown in Fig. 8. In these

Breast cancer data Zoo data Glass data Diabetes data

K -means

2 3 4 5 6 7

0.05

0.1

0.15

0.2

0.25

0.3

0.35

k

s

( k )

2 3 4 5 6 7

0.54

0.56

0.58

0.6

0.62

0.64

k

s

( k )

2 3 4 5 6 7

0.1

0.15

0.2

0.25

0.3

0.35

0.4

0.45

0.5

k

s

( k )

2 3 4 5 6 7

0.15

0.2

0.25

0.3

0.35

0.4

0.45

k

s

( k )

K -harmonic means

2 3 4 5 6 7

0.4

0.45

0.5

0.55

0.6

0.65

k

s

( k )

2 3 4 5 6 7

0.52

0.54

0.56

0.58

0.6

0.62

0.64

0.66

k

s

( k )

2 3 4 5 6 7

0.14

0.16

0.18

0.2

0.22

0.24

0.26

0.28

0.3

k

s

( k )

2 3 4 5 6 7

0.35

0.4

0.45

0.5

0.55

k

s

( k )

Fuzzy c-means

2 3 4 5 6 7

0.05

0.1

0.15

0.2

0.25

0.3

0.35

0.4

0.45

k

s

( k )

2 3 4 5 6 7

0.1

0.2

0.3

0.4

0.5

0.6

0.7

k

s

( k )

2 3 4 5 6 7

0.1

0.15

0.2

0.25

0.3

0.35

0.4

0.45

0.5

k

s

( k )

2 3 4 5 6 7

0.1

0.15

0.2

0.25

0.3

0.35

k

s

( k )

Single swarm

2 3 4 5 6 7

0.2

0.22

0.24

0.26

0.28

0.3

0.32

k

s

( k )

2 3 4 5 6 7

0.56

0.57

0.58

0.59

0.6

0.61

0.62

0.63

0.64

k

s

( k )

2 3 4 5 6 7

0.25

0.3

0.35

0.4

0.45

k

s

( k )

2 3 4 5 6 70.1

0.15

0.2

0.25

0.3

0.35

0.4

0.45

0.5

k

s

( k )

Multiple swarms

2 3 4 5 6 7

0.05

0.1

0.15

0.2

0.25

0.3

0.35

0.4

k

s

( k )

2 3 4 5 6 7

0.46

0.48

0.5

0.52

0.54

0.56

k

s

( k )

2 3 4 5 6 7

0.25

0.3

0.35

0.4

0.45

k

s

( k )

2 3 4 5 6 7

0.1

0.15

0.2

0.25

0.3

0.35

0.4

k

s

( k )

Fig. 8. Stability measure as a function of model order: breast cancer, zoo, glass identification and diabetes data sets.




figures, k and s(k) indicate model order and stability measure for the given model order k, respectively. The corresponding

curves for single swarm and multiple swarms clustering approaches are obtained using Turi’s validity index.

According to Figs. 7 and 8, the proposed approach using multiple cooperative swarms clustering is able to identify the

correct model order for most of the data sets. Moreover, the best model order for different data sets can be obtained as pro-

vided in Table 10. The minimum value for stability measure given any clustering approach is considered as the best model

order (k*), i.e.

k

Ã

¼ arg mink fsðkÞg: ð21Þ

As presented in Table 10, k-means, k-harmonic means and fuzzy c -means clustering approaches do not converge to the

true model order using the stability-based approach for the most of the data sets. The performance of the single swarm clus-

tering is partially better than that of k-means, k-harmonic means and fuzzy c -means clustering because it does not depend

on initial conditions and it can escape trapping in local optimal solutions. Moreover, the multiple cooperative swarms ap-

proach using Turi’s index provides the true model order for majority of the data sets. As a result, Turi’s validity index is

appropriate for the model order selection using the proposed clustering approach. Its performance, based on Dunn’s index

and S_Dbw index, is also considerable as compared to the other clustering approaches. Subsequently, the proposed multiple

cooperative swarms can provide better estimates for model order, as well as stable clustering results as compared to the

other clustering techniques by using the introduced stability-based approach.

6. Conclusion

A new bio-inspired multiple cooperative swarms algorithm was described to deal with clustering problem. The stability

analysis-based approach was introduced to estimate the model order of the multiple cooperative swarms clustering. We pro-

posed to use multiple cooperative swarms clustering to find the model order of the data, due to its robustness and stable

solutions. The performance of the proposed approach has been evaluated using eight different data sets. The experiments

indicate that the proposed approach produces better results as compared with the k-means, k-harmonic means, fuzzy c -

means and single swarm clustering approaches. In the future, we will investigate other similarity measures as Euclidean dis-

tance works well when a dataset contains compact or isolated clusters. Furthermore, we will study other stability measures

since the used measure requires a considerable computational burden to discover a suitable model order.

References

[1] A. Abraham, C. Grosan, V. Ramos (Eds.), Swarm Intelligence in Data Mining, Springer, 2006.

[2] A. Abraham, H. Guo, H. Liu, Swarm Intelligence: Foundations, Perspectives and Applications, Studies in Computational Intelligence, Springer-Verlag,Germany, 2006.

[3] A. Ahmadi, F. Karray, M. Kamel, Multiple cooperating swarms for data clustering, in: IEEE Swarm Intelligence Symposium, 2007, pp. 206–212.

[4] A. Ahmadi, F. Karray, M. Kamel, Model order selection for multiple cooperative swarms clustering using stability analysis, in: IEEE Congress on

Evolutionary Computation within IEEE World Congress on Computational Intelligence, Hong Kong, 2008, pp. 3387–3394.

[5] J. Bezdek, Pattern Recognition with Fuzzy Objective Function Algoritms, Plenum Press, New York, 1981.

[6] C. Blake, C. Merz, UCI Repository of Machine Learning Databases. <http://www.ics.uci.edu/mlearn/MLRepository.html> , 1998.

[7] C. Chen, F. Ye, Particle swarm optimization algorithm and its application to clustering analysis, in: IEEE International Conference on Networking,

Sensing and Control, 2004, pp. 789–794.

[8] X. Cui, J. Gao, T.E. Potok, A flocking based algorithm for document clustering analysis, Journal of Systems Architecture 52 (8-9) (2006) 505–515.

[9] R. Duda, P. Hart, D. Stork, Pattern Classification, John Wiley and Sons, 2000.

[10] J. Dunn, A fuzzy relative of the isodata process and its use in detecting compact well-separated clusters, Cybernetics 3 (1973) 32–57.

[11] M. Halkidi, Y. Batistakis, M. Vazirgiannis, On clustering validation techniques, Intelligent Information Systems 17 (2–3) (2001) 107–145.

[12] M. Halkidi, M. Vazirgiannis, Clustering validity assessment: finding the optimal partitioning of a data set, in: International Conference on Data Mining,

2001, pp. 187–194.

[13] A. Jain, M. Murty, P. Flynn, Data clustering: a review, ACM Computing Surveys 31 (3) (1999) 264–323.

[14] M. Kazemian, Y. Ramezani, C. Lucas, B. Moshiri, Swarm Clustering Based on Flowers Pollination by Artificial Bees, Swarm Intelligence in Data Mining,

Springer, 2006. pp. 191–202.[15] J. Kennedy, R. Eberhart, Particle swarm optimization, in: IEEE International Conference on Neural Networks, vol. 4, 1995, pp. 1942–1948.

Table 10

The best model order (k*) for data sets.

Data set Real model order KM KHM FCM Single swarm Multiple swarms

Turi Dunn S_Dbw Turi Dunn S_Dbw

Speech 4 2 2 2 7 2 2 4 2 2

Iris 3 2 3 2 7 2 2 3 4 2

Wine 3 2 7 2 4 4 2 3 5 3

TAE 3 2 2 2 3 2 2 4 2 2

Breast cancer 2 2 3 2 2 2 2 2 2 2

Zoo 7 6 2 4 4 2 2 6 2 2

Glass 7 2 3 2 4 2 2 7 2 2

Diabetes 2 2 3 4 2 2 2 2 2 2


http://www.ics.uci.edu/mlearn/MLRepository.html

http://www.ics.uci.edu/mlearn/MLRepository.html



[16] T. Lange, V. Roth, M. Braun, J. Buhmann, Stability-based validation of clustering solutions, Neural Computing 16 (2004) 1299–1323.

[17] National Institute of Standards and Technology, TIMIT Acoustic-Phonetic Continuous Speech Corpus, Speech Disc 1-1.1, NTIS Order No. PB91-

5050651996, 1990.

[18] M. Omran, A. Engelbrecht, A. Salman, Particle swarm optimization method for image clustering, International Journal of Pattern Recognition and

Artificial Intelligence 19 (3) (2005) 297–321.

[19] M. Omran, A. Salman, A. Engelbrecht, Dynamic clustering using particle swarm optimization with application in image segmentation, Pattern Analysis

and Applications 6 (2006) 332–344.

[20] M. Seltzer, Sphinx III signal processing front end specification, Technical Report, CMU Speech Group, 1999.

[21] R. Turi, Clustering-based colour image segmentation, Ph.D. Thesis, Monash University, Australia, 2001.

[22] D. van der Merwe, A. Engelbrecht, Data clustering using particle swarm optimization, in: Proceeding of IEEE Congress on Evolutionary Computation,

vol. 1, 2003, pp. 215–220.[23] X. Xiao, E. Dow, R. Eberhart, Z. Miled, R. Oppelt, Gene clustering using self-organizing maps and particle swarm optimization, in: IEEE Procedings of

International Parallel Processing Symposium, 2003, p. 10.

[24] F. Ye, C. Chen, Alternative kpso-clustering algorithm, Tamkang Journal of Science and Engineering 8 (2) (2005) 165–174.

[25] B. Zhang, M. Hsu, K -harmonic means: a data clustering algorithm, Technical Report, Hewlett-Packard Labs, HPL-1999-124.


model order selection for multiple cooperative swarms clustering

Documents