Applied Soft Computing 28 (2015) 301–311

Contents lists available at ScienceDirect

Applied Soft Computing

journal homepage: www.elsevier.com/locate/asoc

Ant Colony Optimization based clustering methodology

Tülin İnkaya a,∗, Sinan Kayalıgil b, Nur Evin Özdemirel b

a Uludağ University, Industrial Engineering Department, Görükle, 16059 Bursa, Turkey
b Middle East Technical University, Industrial Engineering Department, Çankaya, 06800 Ankara, Turkey

a r t i c l e  i n f o

Article history:
Received 28 February 2014
Received in revised form 31 August 2014
Accepted 17 November 2014
Available online 13 December 2014

Keywords:
Clustering
Ant Colony Optimization
Multiple objectives
Data set reduction

a b s t r a c t

In this work we consider the spatial clustering problem with no a priori information. The number of clusters is unknown, and clusters may have arbitrary shapes and density differences. The proposed clustering methodology addresses several challenges of the clustering problem including solution evaluation, neighborhood construction, and data set reduction. In this context, we first introduce two objective functions, namely adjusted compactness and relative separation. Each objective function evaluates the clustering solution with respect to the local characteristics of the neighborhoods. This allows us to measure the quality of a wide range of clustering solutions without a priori information. Next, using the two objective functions we present a novel clustering methodology based on Ant Colony Optimization (ACO-C). ACO-C works in a multi-objective setting and yields a set of non-dominated solutions. ACO-C has two pre-processing steps: neighborhood construction and data set reduction. The former extracts the local characteristics of data points, whereas the latter is used for scalability. We compare the proposed methodology with other clustering approaches. The experimental results indicate that ACO-C outperforms the competing approaches. The multi-objective evaluation mechanism relative to the neighborhoods enhances the extraction of the arbitrary-shaped clusters having density variations.

© 2014 Elsevier B.V. All rights reserved.

∗ Corresponding author. Tel.: +90 224 2942605; fax: +90 224 2941903.
E-mail addresses: [email protected] (T. İnkaya), [email protected] (S. Kayalıgil), [email protected] (N.E. Özdemirel).

http://dx.doi.org/10.1016/j.asoc.2014.11.060
1568-4946/© 2014 Elsevier B.V. All rights reserved.

1. Introduction

Cluster analysis is the organization of a collection of data points into clusters based on similarity [1]. Clustering is usually considered as an unsupervised classification task. That is, the characteristics of the clusters and the number of clusters are not known a priori, and they are extracted during the clustering process. In this work we focus on spatial data sets in which a priori information about the data set (the number of clusters, shapes and densities of the clusters) is not available. Finding such clusters has applications in geographical information systems [2], computer graphics [3], and image segmentation [4]. In addition, clusters of spatial defect shapes provide valuable information about the potential problems in manufacturing processes of semiconductors [5,6].

We consider spatial clustering as an optimization problem. Our aim is to obtain compact, connected and well-separated clusters. To the best of our knowledge, there is not a single objective function that works well for any kind of geometrical clustering structure. Therefore, we first introduce two solution evaluation mechanisms for measuring the quality of a clustering solution. The main idea behind both mechanisms is similar, and each mechanism is based on two objectives: adjusted compactness and relative separation. The first objective measures the compactness and connectivity of a clustering solution, and the second objective is a measure for separation. The difference between the two mechanisms is the degree of locality addressed in the calculations. The main advantage of these objectives is that the length of an edge is evaluated relatively, that is, it is scaled relative to the lengths of other edges within its neighborhood. This scaling permits us to evaluate the quality of the clustering solution independent of the shape and density of the clusters.

We implement the proposed solution evaluation mechanisms in a clustering framework based on Ant Colony Optimization (ACO). In order to find the target clusters, we use two complementary objective functions (adjusted compactness and relative separation) in a multiple-objective context. Hence, the output of ACO-C is a set of non-dominated solutions. Different from the literature, we are not interested in finding all non-dominated solutions or the entire Pareto efficient frontier. ACO-C has two pre-processing steps: neighborhood construction and data set reduction. Neighborhood construction extracts the local connectivity, proximity and density information inherent in the data set. Data set reduction helps reduce the storage requirements and processing time for the clustering task. Our experimental results indicate that ACO-C finds
the arbitrary-shaped clusters with varying densities effectively, where the number of clusters is unknown.
Our contributions to the literature are as follows:
1. The proposed solution evaluation mechanisms allow us to quantify the quality of a clustering solution having arbitrary-shaped clusters with different densities in an optimization context. The use of these evaluation mechanisms is not restricted to ACO. They can be used in other metaheuristics and optimization-based clustering approaches.
2. The proposed ACO-based methodology introduces a general, unified framework for the spatial clustering problem without a priori information. It includes the solution evaluation mechanism, extraction of local properties, data set reduction, and the clustering task itself.
3. ACO-C is a novel methodology for the clustering problem in which there is no a priori information, that is,
- the number of clusters is unknown,
- clusters may have arbitrary shapes,
- there may be density variations within the clusters, and
- different clusters may have density differences.

We provide the related literature in Section 2. Section 3 introduces the solution evaluation mechanisms. The details of ACO-C are explained in Section 4. Section 5 is about the empirical performance of ACO-C. First, we set the algorithm parameters using a full factorial design. Then, we compare ACO-C with some well-known algorithms. Finally, we conclude in Section 6.

2. Related literature

The clustering algorithms can be classified into partitional, hierarchical, density-based algorithms, and metaheuristics (simulated annealing, tabu search, evolutionary algorithms, particle swarm optimization, ACO, and so on). [1,7,8] provide comprehensive reviews of clustering approaches. In this section, we present the related literature on the solution evaluation mechanisms and ant-based clustering algorithms.

2.1. Solution evaluation mechanisms

A good clustering solution has compact and connected clusters that are well-separated from each other. However, quantifying and measuring the clustering objectives (compactness, connectivity and separation) for a data set is not a trivial task. We review the solution evaluation mechanisms in the literature under four categories: partitional approaches, graph-based approaches, clustering validity indices, and multi-objective approaches.

Partitional approaches consider objective functions such as minimization of total variance/distance between all pairs of data points, or minimization of total variance/distance between data points and a cluster representative such as k-means [9,10] or k-medoids [11]. In these approaches, the number of clusters needs to be given as input, and the resulting clusters have spherical or ellipsoid shapes in general.

In order to handle the data sets with arbitrary-shaped clusters and density variations, graph-based approaches are proposed. Objective functions used are minimization of the maximum edge length in a cluster, maximization of the minimum/maximum/average distance between two clusters, and so on [12,13]. A typical complication for such objective functions is illustrated in Fig. 1(a). In Fig. 1(a) the maximum edge length within the spiral clusters is larger than the distance between these two clusters. In this case elimination of the longest edge causes division of the spiral clusters.

Another research stream in solution evaluation makes use of cluster validity indices. Validity indices are used to quantify the quality of a clustering solution and to determine the number of clusters in a data set [14,15]. In an effort to find the target clusters, some researchers use validity indices as objective functions in genetic algorithms [16–21]. However, most of the validity indices assume a certain geometrical structure in the cluster shapes. When a data set includes several different cluster structures, such as arbitrary shapes and density differences, these indices may fail. An example is provided in Fig. 1(b). The clustering solutions generated by DBSCAN [22] are evaluated using Dunn index [23] with different MinPts settings (within a range of 1–15). The number of clusters found with each setting is shown, e.g. 30 clusters are found when MinPts is set to two. Dunn index measures the minimum separation to maximum compactness ratio, so a higher Dunn index implies better clustering. Although the highest Dunn index (0.31) is achieved for the solutions with two and four clusters, the target solution has three clusters with a Dunn index of 0.09. Hence, Dunn index is not a proper objective function for such a data set.

Maulik and Bandyopadhyay [24] evaluates the performance of three clustering algorithms, namely k-means, single-linkage, and simulated annealing (SA), by using four cluster validity indices, namely Davies-Bouldin index, Dunn index, Calinski-Harabasz index, and index I. Compared to other validity indices, index I is found to be more consistent and reliable in finding the correct number of clusters. However, the four cluster validity indices are limited to extracting spherical clusters only. To handle different geometrical shapes, Bandyopadhyay et al. [25] uses a point symmetry-based distance measure in a genetic algorithm. The algorithm has difficulty handling asymmetric clusters and density differences within a cluster.

Since a single objective is often unsuitable to extract target clusters, multi-objective (MO) approaches are considered to optimize several objectives simultaneously. To the best of our knowledge, VIENNA [26] is the first multi-objective clustering algorithm, which is based on PESA [27]. It optimizes two objective functions, total intra-cluster variance and connectedness. However, the algorithm requires the target number of clusters. One of the well-known MO clustering algorithms is the multi-objective clustering with automatic k-determination (MOCK) [28]. MOCK is based on evolutionary algorithms, and uses compactness and connectedness as two complementary objective functions. It can detect the number of clusters in the data set. The output of the algorithm is a set of non-dominated clustering solutions. However, it is capable of finding well-separated clusters having hyperspherical shapes. Improvements in this algorithm and its applications have been investigated [29,30]. Saha and Bandyopadhyay [31] also considers the clustering problem in a multi-objective framework. They optimize Xie-Beni (XB) index [32] and Sym-index [21] simultaneously, and introduce a multi-objective SA algorithm. This work is also limited to finding symmetric clusters. Saha and Bandyopadhyay [33] proposes several connectivity-based validity indices based on the relative neighborhood graph. In addition to Sym-index and index I, [34] uses one of the connectivity-based validity indices [33] as the third objective. Adding this connectivity measure helps extraction of arbitrary shapes and asymmetric clusters.

There are additional solution approaches proposed for MO clustering, such as differential evolution [35,36], immune-inspired [37], and particle swarm optimization [38]. In these studies clustering objectives are either cluster validity indices such as XB index, Sym-index and FCM index, or compactness-connectivity objectives as in [28].
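The Dunn index used in the Fig. 1(b) illustration can be sketched as follows. This is our own minimal implementation for intuition only (the function name and the toy data are ours, not from [23]); it evaluates a hard partition given as lists of coordinate tuples.

```python
from itertools import combinations
from math import dist  # Euclidean distance (Python 3.8+)

def dunn_index(clusters):
    """Dunn index: minimum inter-cluster distance divided by the
    maximum intra-cluster diameter. Higher values suggest more
    compact, better separated clusters."""
    # Largest pairwise distance within any single cluster (diameter).
    diameter = max(
        (dist(p, q) for c in clusters for p, q in combinations(c, 2)),
        default=0.0,
    )
    # Smallest pairwise distance between points of different clusters.
    separation = min(
        dist(p, q)
        for a, b in combinations(clusters, 2)
        for p in a for q in b
    )
    return separation / diameter

# Two tight clusters far apart score high; as the text notes, the index
# rewards large separation relative to compactness.
tight = [[(0, 0), (0, 1)], [(10, 0), (10, 1)]]
print(dunn_index(tight))  # -> 10.0
```

As the text points out, such a global ratio can still prefer the wrong number of clusters when shapes are arbitrary, which motivates the neighborhood-relative objectives of Section 3.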
Fig. 1. (a) Example data set. (b) Dunn index values for the clustering solutions generated by DBSCAN with different MinPts settings.

To the best of our knowledge, [39] is the only study that applies ACO to the MO clustering problem. In this algorithm, there are two ant colonies working in parallel. Each colony optimizes a single objective function, either compactness or connectedness. The number of clusters is required as input. In addition, they test the proposed algorithm using the Iris data set only.

2.2. ACO and ant-based clustering algorithms

ACO was introduced by [40–42]. It is inspired from the behaviors of real ants. As ants search for food on the ground, they deposit a substance called pheromone on their paths. The concentration of the pheromone on the paths helps direct the colony to the food sources. Ant colony finds the food sources effectively by interacting with the environment. Solution representation, solution construction and pheromone update mechanisms are the main design choices of ACO.

In the clustering literature, several ant-based clustering algorithms have been proposed. For a comprehensive review about ant-based and swarm-based clustering one can refer to [43]. In this study, we categorize the related studies into three: ACO-based approaches, approaches that mimic ants gathering/sorting activities, and other ant-based approaches.

ACO-based approaches [44–52] are built upon the work of [42]. In these studies the total intra-cluster variance/distance is considered as the objective function, and the number of clusters is required a priori. An ant constructs a solution by assigning a data point to a cluster. The desirability of assigning a data point to a cluster is represented by the amount of pheromone. Ants update the pheromone in an amount proportional to the objective function value of the solution they generate. The proposed algorithms are capable of finding the clusters with spherical and compact shapes. There are also hybrid algorithms using ACO [53–55]. For instance, Kuo et al. [53] modifies the k-means algorithm by adding a probabilistic centroid assignment procedure. Huang et al. [54] introduces a hybridization of ACO and particle swarm optimization (PSO). In this approach, PSO helps optimize continuous variables, whereas ACO directs the search to promising regions using the pheromone values. Another hybrid algorithm that combines k-means, ACO and PSO is [55]. In [55] the cluster centers obtained by ACO and PSO are used to initialize k-means. In these hybrid studies the number of clusters is required as input, and the resulting clusters are compact and spherical.

The approaches that mimic ants gathering/sorting activities [56] form another research stream. ACO uses an explicit objective function whereas these approaches have an implicit objective function, and clusters emerge as the result of the gathering/sorting activities. An ant picks up a point in the space and drops it off near the points that are similar to it. These picking up and dropping off operations are performed using the probabilities that are calculated based on the similarity of the points in the neighborhood. Hence, ants work as if they are forming a topographic map. After forming this pseudo-topographic map, a cluster retrieval operation is applied to find the final clusters. Lumer and Faieta [57] generalizes the method for exploratory data analysis. [58,48,59–64] are the extensions and modifications of the algorithms proposed by [56,57]. For instance, Yang and Kamel [59] uses parallel and independent ant colonies aggregated by a hypergraph model. In [64] the local similarity of a point is measured by entropy rather than distance.

Other ant-based approaches use the emergent behavior of the ants. Azzag et al. [65] introduces a hierarchical ant-based algorithm to build a decision tree. Ants move a data point close to the similar points and away from the dissimilar ones on the decision tree. In [66,67] ants generate tours by inserting edges between data points. Pheromone is updated for each edge connecting a pair of points. The closer the distance between two points, the larger the amount of pheromone released. In the first phase of the algorithm, the edges between similar points become denser in terms of pheromone concentration. The next phase is the cluster retrieval process by using a hierarchical clustering algorithm. Ghosh et al. [68] introduces a clustering approach based on aggregation pheromone. Aggregation pheromone leads the data points to accumulate around the points with higher pheromone density. In order to obtain the desired number of clusters, merging operations are performed using the average-linkage agglomerative hierarchical clustering algorithm. Another ant-based clustering algorithm is the chaotic ant swarm optimization proposed by Wan et al. [69]. It combines the chaotic behavior of a single ant and self-organizing behavior of the ant colony. Given the number of clusters, the proposed approach
optimizes the total intra-cluster variation. Although it providessome improvement over PSO and k-means, the resulting clustersare still spherical.
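The pick/drop rules of the gathering/sorting approaches above are typically of the Deneubourg/Lumer–Faieta form sketched below. This is an illustrative sketch rather than any one cited algorithm's rule: the constants k1 and k2 and the exact functional form vary across those papers, and the names here are ours. f denotes an ant's perceived neighborhood similarity for a point, scaled to [0, 1].

```python
def pick_probability(f, k1=0.1):
    """Probability that an ant picks up a point whose neighborhood
    similarity is f (0 = isolated, 1 = surrounded by similar points).
    Low local similarity -> high pick-up probability."""
    return (k1 / (k1 + f)) ** 2

def drop_probability(f, k2=0.15):
    """Probability that a carrying ant drops its point at a location
    with neighborhood similarity f. High similarity -> high drop
    probability."""
    return (f / (k2 + f)) ** 2

# An isolated point is almost surely picked up, and a point is dropped
# preferentially where its neighbors are similar, so similar points
# gradually accumulate into heaps on the grid.
print(pick_probability(0.02), drop_probability(0.9))
```

The implicit objective the text mentions emerges from these two rules alone: no global function is ever evaluated, yet heaps of similar points form.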
3. How to evaluate a clustering solution?

Our aim is to obtain compact, connected and well-separated clusters. For this purpose, we introduce two solution evaluation mechanisms: Clustering Evaluation Relative to Neighborhood (CERN) and Weighted Clustering Evaluation Relative to Neighborhood (WCERN).

3.1. Clustering Evaluation Relative to Neighborhood (CERN)

3.1.1. Adjusted compactness

This objective is built upon the trade-off between the connectivity and relative compactness:

(a) Connectivity: Basically, connectivity is the degree to which neighboring data points are placed in the same cluster [26,28]. Then, we first need to define the neighborhood of a point. There are several neighborhood construction algorithms such as k-nearest neighbors (KNN), ε-neighborhood [22], NC algorithm [70], and so on. When there are arbitrary-shaped clusters with density differences, NC outperforms KNN and ε-neighborhood. It provides a unique neighborhood for each data point. NC also generates subclusters (closures), which are formed by merging the data points having common neighbors. These closures can be used as the basis of a clustering solution. For these reasons, we use the NC algorithm to determine the neighborhoods of individual data points.

Let Cm and Clp be the sets of points in cluster m and closure p, respectively. Connectivity of cluster m with respect to closure p is connect_mp = |Cm ∩ Clp| / |Clp| if Cm ∩ Clp ≠ ∅. In the ideal case, connectivity takes a value of one, which means that cluster m and closure p fully overlap. The connectivity of cluster m is calculated as connect_m = ∏_{p=1..nc} connect_mp, where nc is the total number of closures. In this calculation, if Cm ∩ Clp = ∅, then closure p is part of a cluster other than m, and, in this case, we take connect_mp = 1 so that the value of connect_m is not affected by such unrelated closure and cluster pairs. Merging multiple closures that are in the same cluster results in a connectivity value of one, whereas it is less than one when there are divided clusters.

(b) Relative compactness: We define the relative compactness of cluster m as the most inconsistent edge within its neighborhood. In the relative compactness calculation we consider the edges in the minimum spanning tree (MST) of a cluster. MST is a graph in which the sum of the edge lengths is the minimum, and the graph is connected with no cycles. These two properties together allow us to define compactness of a cluster in an efficient way. Then, we compare each edge in the MST with the edges in the neighborhood. More formally, relative compactness of cluster m is

r_comp_m = max_{(i,j) ∈ MST_m} [ d_ij / max{ d_kl : (k,l) ∈ MST_m(i) or (k,l) ∈ MST_m(j) } ]

where (i,j) is the edge between points i and j, d_ij is the Euclidean distance between points i and j, and MST_m and MST_m(i) are the sets of edges in the MST of the points in cluster m and in the neighborhood of point i in cluster m, respectively.

When the number of clusters increases, relative compactness improves (decreases) whereas the connectivity deteriorates (decreases). Combining connectivity and compactness, adjusted compactness of cluster m is obtained as comp_m = r_comp_m / connect_m. The overall compactness of a clustering solution is found as max_m {comp_m}.

3.1.2. Relative separation

A good clustering solution must have well-separated clusters. We define the relative separation based on the local properties of clusters. Let the nearest cluster to cluster m be n such that (m(i*), n(j*)) = argmin {d_ij : i ∈ Cm, j ∈ Cn, m ≠ n}. The relative separation of cluster m is

r_sep_c_m = min{ d_m(i),n(j) / max{ d_kl : (k,l) ∈ MST_m(i) if |Cm| > 1, or (k,l) ∈ MST_n(j) if |Cn| > 1 } }, if |Cm| > 1 or |Cn| > 1,
r_sep_c_m = 1, otherwise.

The overall separation of a clustering solution is min_m {r_sep_c_m}. CERN minimizes the adjusted compactness and maximizes the relative separation.

3.2. Weighted Clustering Evaluation Relative to Neighborhood (WCERN)

WCERN is similar to CERN; both compactness and separation are calculated relative to the neighborhoods. The only difference between CERN and WCERN is that the edge lengths are used as a weight factor in compactness and separation calculations in WCERN. Hence, relative compactness and relative separation of cluster m are calculated as

r_comp_w_m = max_{(i,j) ∈ MST_m} [ d²_ij / max{ d_kl : (k,l) ∈ MST_m(i) or (k,l) ∈ MST_m(j) } ]

and

r_sep_w_m = min{ d²_m(i),n(j) / max{ d_kl : (k,l) ∈ MST_m(i) if |Cm| > 1, or (k,l) ∈ MST_n(j) if |Cn| > 1 } }, if |Cm| > 1 or |Cn| > 1,
r_sep_w_m = d_m(i),n(j), otherwise.
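The relative compactness calculation above can be sketched as follows. The MST routine and the toy neighborhood map are our own illustration: in the paper the neighborhoods come from the NC algorithm [70], which we simply assume here as a given dict from point index to neighbor indices.

```python
from itertools import combinations
from math import dist

def mst_edges(points):
    """Prim's algorithm; returns MST edges as ((i, j), length) pairs."""
    if len(points) < 2:
        return []
    in_tree, edges = {0}, []
    while len(in_tree) < len(points):
        i, j = min(
            ((i, j) for i in in_tree for j in range(len(points))
             if j not in in_tree),
            key=lambda e: dist(points[e[0]], points[e[1]]),
        )
        in_tree.add(j)
        edges.append(((i, j), dist(points[i], points[j])))
    return edges

def relative_compactness(points, neighborhood):
    """r_comp: max over cluster MST edges (i, j) of d_ij divided by the
    longest MST edge within the neighborhood of i or of j.
    `neighborhood` maps a point index to its (NC-style) neighbors."""
    def longest_nbhd_edge(i):
        nbhd = [points[k] for k in [i] + neighborhood[i]]
        return max((d for _, d in mst_edges(nbhd)), default=0.0)

    return max(
        d / max(longest_nbhd_edge(i), longest_nbhd_edge(j))
        for (i, j), d in mst_edges(points)
    )

# Four evenly spaced points on a line; every MST edge has the same
# length as the edges in its local neighborhoods, so no edge is
# locally inconsistent.
pts = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0), (3.0, 0.0)]
nbhd = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}
print(relative_compactness(pts, nbhd))  # -> 1.0
```

Because each edge is scaled by its local neighborhood rather than a global threshold, a long edge in a sparse region can still be judged consistent, which is exactly the density-independence the text argues for.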
Similar to CERN, WCERN minimizes the adjusted compactness, and maximizes the relative separation.

4. The ACO-based clustering (ACO-C) methodology

ACO-C is a clustering methodology in a multi-objective framework. It has two pre-processing steps: neighborhood construction and data set reduction. Neighborhood construction helps extract the local information inherent in the data set. This local information is used in the solution evaluation. Data set reduction ensures the scalability of the approach.

In ACO-C an ant is a search agent. Ants construct tours by inserting edges between pairs of data points. Connected points in a tour form a cluster. During edge insertion each point is connected to exactly two points. This makes it easier to extract arbitrary-shaped clusters and reduces computational requirements.

The outline of the ACO-C methodology is presented in Fig. 2, where max_iter and no_ants denote the maximum number of iterations and the number of ants, respectively.

The ACO-C Methodology
Step 0. Pre-processing (neighborhood construction and data set reduction)
Step 1. Initialization of parameters
For t = 1,.., max_iter
    For s = 1,.., no_ants
        Step 2. Solution construction
        Step 3. Solution evaluation
        Step 4. Local search
    End for
    Step 5. Pheromone update
    Step 6. Non-dominated set update
End for

Fig. 2. The outline of the ACO-C methodology.

Step 0. Pre-processing.

4.1. Neighborhood construction

We construct the neighborhood of each data point and obtain closures (subclusters) using the NC algorithm [70]. NC closures have two properties: (1) A closure is either a cluster itself or a subset of a cluster (divided cluster). (2) There may be an outlier mix on the boundary of a closure. Hence, we focus on the merging operations and outlier detection in the clustering. In order to allow outlier detection and closure merging, we extend NC neighborhoods with distant neighbors and nowhere. Distant neighbors are the nearest pair of data points between two adjacent closures. Nowhere is a dummy point used for outlier detection. If a data point is connected to nowhere twice, then it is classified as an outlier. If a data point is connected to nowhere once, then it is the start/end point of a cluster. An example for neighborhood definition is provided in Fig. 3. Points j, k, l, m and n are neighbors of point i generated by NC, and point p is the distant neighbor of point i. Neighborhood of point i is extended by point p and nowhere. Note that not every point has a distant neighbor.

Fig. 3. An example for neighbors of point i.

4.2. Data set reduction via boundary formation

The interior points of a closure are already connected, hence it is sufficient to consider only the points on the boundaries of the closures for merging and outlier detection. Exclusion of interior points in a closure decreases the number of points in a data set and contributes to the scalability of ACO-C. We use the boundary extraction algorithm in [71].

Step 1. Initialization of parameters.

The parameters of ACO-C, including the number of ants (no_ants), the number of maximum iterations (max_iter), and the evaporation rate (ρ), are initialized. We conduct a factorial design in Section 5 to determine the values of these parameters.

Step 2. Solution construction.

When an ant starts clustering, the set of unvisited points, Do, is initialized as the entire data set, D. For each point in the data set, the set of currently available neighbors, NCS_i, is initialized as the set of its neighbors, NS_i. The current number of clusters (m) is set to one. There are two substeps in solution construction: point selection and edge insertion.

Point selection: Every time an ant starts a new tour, a new cluster is initialized. When a new cluster, Cm, is initialized, a point, say point i, is selected at random from the set of unvisited points, Do. Then, the related sets are updated as Cm = Cm ∪ {i}, Do = Do \ {i}, and NCS_k = NCS_k \ {i} for k ∈ Do. If NCS_i is non-empty or point i is not nowhere, we continue with edge insertion. Otherwise, the construction of the current cluster is finished, and a new cluster is initialized by incrementing the cluster index, m, by 1.

Edge insertion: An ant inserts an edge between point i and a point j selected from NCS_i. The pheromone concentration on edge (i,j), τ_ij, represents the tendency of edge (i,j) to occur in a cluster. Hence, the probability of selecting edge (i,j) is calculated as p_ij = τ_ij / Σ_{k ∈ NCS_i} τ_ik for j ∈ NCS_i. Then, the ant continues edge insertion starting from point j. The initial pheromone concentration is inversely proportional to the evaporation rate, τ_ij = 1/ρ, ∀i ∈ D, j ∈ NS_i. Point selection and edge insertion substeps are repeated until Do is empty. The details of Step 2 are presented in Fig. 4.

Step 3. Solution evaluation.

The performance of a clustering solution is evaluated using CERN and WCERN as described in Section 3.

Step 4. Local search.

In order to strengthen the exploitation property of ACO-C we apply local search to each clustering solution constructed. Conditional merging operations are performed in the local search. Let clusters m and n be adjacent clusters considered for merging, and let comp and sep be the adjusted compactness and relative separation of the current clustering solution, respectively. The adjusted compactness and relative separation after merging are comp′ and sep′, respectively. If comp′ ≤ comp and sep′ ≥ sep,
Step 2. Solution construction
Set m = 1, Do = D and NCS_i = NS_i, ∀i ∈ D.
While Do ≠ ∅
    2.1. Point selection
    Select point i from Do at random.
    Set Cm = Cm ∪ {i}, Do = Do \ {i}, and NCS_k = NCS_k \ {i} for k ∈ Do.
    While NCS_i ≠ ∅ and i ≠ nowhere
        2.2. Edge insertion
        Select edge (i,j) where j ∈ NCS_i using probabilities based on τ_ij, and insert edge (i,j).
        Set Cm = Cm ∪ {j}, Do = Do \ {j}, and NCS_k = NCS_k \ {j} for k ∈ Do.
        Then, set i = j.
    End while
    Set m = m + 1, and start a new cluster.
End while

Fig. 4. The details of Step 2.

clusters m and n are merged. The clustering solutions at the end of the local search form the set of solutions constructed by the ants (SC) in the current iteration.

Step 5. Pheromone update.

Pheromone update is performed for each solution component (edge) so that the effect of the solution component is well-reflected in the pheromone concentration.

There are two important properties about our clustering problem: (1) We are interested in arbitrary-shaped clusters with different densities, so reflecting the local density, connectivity and proximity relations is crucial in finding the target clusters. (2) We use adjusted compactness and relative separation as two complementary objective functions. Hence, we use the following pheromone update mechanism. For each data point i, the incumbent (minimum) adjusted compactness obtained so far for point i, inc_comp_i, and the incumbent (maximum) relative separation obtained so far for point i, inc_sep_i, are kept in the memory. We check whether or not the adjusted compactness and relative separation of the cluster to which edge (i,j) belongs are better than the corresponding incumbent values. More pheromone is released if an incumbent improves.

For all the edges in the clustering solution, E, the amount of pheromone released is proportional to the amount of improvement in the incumbents. The initial incumbent adjusted compactness and relative separation for each point are taken from the closures of the NC algorithm. Formally, the pheromone values are updated as

τ_ij = (1 − ρ) τ_ij + w_ij τ_ij, ∀i ∈ D, j ∈ NS_i,

where

w_ij = min{inc_comp_i, inc_comp_j} / comp(i,j) + sep(i,j) / max{inc_sep_i, inc_sep_j}, if (i,j) ∈ E,
w_ij = 0, otherwise.

Step 6. Non-dominated set update.

Let s1 and s2 be two clustering solutions generated by the ants. The aim is to minimize the maximum adjusted compactness and to maximize the minimum relative separation.

Definition 1. Solution s1 dominates solution s2 if

(i) comp_s1 < comp_s2 and sep_s1 ≥ sep_s2, or
(ii) comp_s1 ≤ comp_s2 and sep_s1 > sep_s2.

Definition 2. If there does not exist any other clustering solution dominating solution s1, then solution s1 is called a non-dominated solution.

We update the current set of non-dominated solutions (SN) at the end of each iteration using Definitions 1 and 2. We also update the incumbent compactness and separation (inc_comp_i and inc_sep_i) for the data set. If the maximum number of iterations is not exceeded, Steps 2–6 are repeated. Otherwise, ACO-C terminates with the non-dominated solutions in set SN.

5. Experimental results for the ACO-C methodology

In this section, we test the performance of ACO-C empirically. First, we present the test data sets and the performance evaluation criteria. Second, using some pilot data sets, we conduct a full factorial experiment in order to set the ACO-C parameters. Third, we elaborate on the impact of data set reduction. Finally, we compare the performance of ACO-C with other clustering algorithms.

The algorithm was coded in Matlab 7.9 and run on a PC with Intel Core 2 Duo 2.33 GHz processor and 2 GB RAM.

5.1. Data sets and performance evaluation criteria

We tested ACO-C using 32 data sets compiled from several sources [72–74]. These include 2- and higher dimensional data sets with various shapes of clusters (circular, elongated, spiral, etc.), intra-cluster and inter-cluster density variations, and outliers. Some example data sets are presented in Fig. 5.

We evaluated the accuracy of the clustering solution using Jaccard index (JI) and Rand index (RI). We define these measures as follows:

a: the number of point pairs that belong to the same target cluster and are assigned to the same cluster in the solution.
b: the number of point pairs that belong to the same target cluster but are assigned to different clusters in the solution.
c: the number of point pairs that belong to different target clusters but are assigned to the same cluster in the solution.
d: the number of point pairs that belong to different target clusters and are assigned to different clusters in the solution.

JI = a / (a + b + c)    (1)

RI = (a + d) / (a + b + c + d)    (2)

JI is one of the well-known external clustering validity indices. It takes values between zero and one, one indicating the target clustering is achieved. RI is also known as the simple matching coefficient. While JI focuses on point pairs correctly assigned to the same cluster, RI also takes into account the point pairs correctly assigned to different clusters. Both indices penalize the division of clusters as well as mixing them. We report the maximum JI and RI values in the set of non-dominated solutions.

5.2. Parameter settings for ACO-C

The three parameters of ACO-C are no_ants, max_iter, and ρ. We set max_iter to twice the number of points in the data set, and recorded the iteration number in which the target clustering was found. We used a full factorial experimental design in order to determine the best settings for no_ants and ρ. We also studied the impact of the solution evaluation function (EF) on the performance of ACO-C. The three factors used in the experimental design and
Fig. 5. Example data sets: (a) train2, (b) data-c-cc-nu-n, (c) data-uc-cc-nu-n, (d) data-c-cv-nu-n, (e) data-uc-cv-nu-n, (f) data circle, (g) train3, (h) data circle 1 20 1 1, (i) iris (projected to 3-dimensional space), (j) letters.

their levels are presented in Table 1. We conducted the full factorial experiment using a subset of 15 data sets. These 15 data sets were selected to represent different properties of all data sets.

Before discussing the full factorial design results, we present the ACO-C results for the example data set given in Fig. 5(d). The three non-dominated clustering solutions found by ACO-C are presented
Table 1
Experimental factors in ACO-C.

Factors                  | Level 0 | Level 1
Evaluation function, EF  | CERN    | WCERN
Evaporation rate, ρ      | 0.01    | 0.05
Number of ants, no_ants  | 5       | 10
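With two levels per factor, the full factorial design in Table 1 enumerates 2^3 = 8 runs. A small sketch of the enumeration (factor names follow Table 1; the code itself is illustrative, not from the paper):

```python
from itertools import product

# Factor levels taken from Table 1 (level 0, level 1).
factors = {
    "EF": ["CERN", "WCERN"],
    "rho": [0.01, 0.05],
    "no_ants": [5, 10],
}

# Full factorial design: one run per combination of factor levels.
runs = [dict(zip(factors, combo)) for combo in product(*factors.values())]
# len(runs) == 2 ** 3 == 8
```

Each of the eight runs is then executed on the 15 pilot data sets to pick the best settings.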
Fig. 6. Non-dominated clustering solutions for data-c-cv-nu-n: (a) solution with six clusters (JI = 1), (b) solution with three clusters (JI = 0.57), (c) solution with two clusters (JI = 0.54).
in Fig. 6. These solutions include the target clustering with a JI value of one. The resulting non-dominated solutions can be interpreted as clusterings of the points at different resolutions.
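The pair counts a–d and the indices JI and RI of Section 5.1 can be computed directly from two labelings. A small sketch (the labels below are made up for illustration; names are ours):

```python
from itertools import combinations

def jaccard_rand(target, solution):
    """Count point pairs (a, b, c, d as in Section 5.1) and return (JI, RI)."""
    a = b = c = d = 0
    for i, j in combinations(range(len(target)), 2):
        same_target = target[i] == target[j]
        same_solution = solution[i] == solution[j]
        if same_target and same_solution:
            a += 1   # same target cluster, same solution cluster
        elif same_target:
            b += 1   # same target cluster, split in the solution
        elif same_solution:
            c += 1   # different target clusters, merged in the solution
        else:
            d += 1   # separated in both
    return a / (a + b + c), (a + d) / (a + b + c + d)

# Identical partitions (up to relabeling) give JI = RI = 1.
ji, ri = jaccard_rand([0, 0, 1, 1], [1, 1, 0, 0])
```

Note that both indices depend only on which points share a cluster, so the cluster labels themselves are irrelevant.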
We also checked the convergence of ACO-C in the example data set. In Fig. 7 the set of non-dominated solutions stays the same after iteration number 70. This implies that convergence is achieved.

Fig. 7. Convergence analysis (number of non-dominated solutions vs. iteration number) for the example data set, data-c-cv-nu-n.

The main effect plots for the maximum RI and execution time are presented in Fig. 8. The low setting of the evaporation rate slows the convergence down and prevents ACO-C from missing the target solutions. However, the time spent in ACO-C increases three times with this setting. Increasing the number of ants used in ACO-C provides a slight improvement in the maximum RI in return for an increase in the time. Considering the trade-offs between the performance and time, the experiments are performed with the parameter settings ρ = 0.01 and no_ants = 5.

In Fig. 8(a) and (b) WCERN performs better than CERN in terms of the maximum RI and time. On the other hand, only CERN finds the target clustering in some data sets, i.e. the data sets in Fig. 5(d) and (e). Hence, CERN ensures finding the target clustering that is visible in low resolution. WCERN is more powerful in extracting clusters that are visible in higher resolution, such as the data sets in Fig. 5(f) and (h). CERN and WCERN complement each other in finding the target clusters, so we run ACO-C using both evaluation mechanisms. We consider the union of the non-dominated solutions obtained by both as the final solution set. Note that there is no significant interaction among the factors.

5.3. Data set reduction

We tested the impact of data set reduction using 32 data sets. The boundary extraction algorithms in [71] were used for data reduction. The number of points in the original data set was compared with the number of points after reduction.

Table 2 shows the data set reduction percentages for 2- and higher dimensional data sets. The reduction percentages vary depending on the shape of the clusters. The highest data set reduction percentages are achieved when clusters are convex, as in Fig. 5(f) and (h). When there are non-convex clusters as in Fig. 5(g), the reduction percentages are lower. In Section 5.4 clustering is performed on the data sets after the reduction.

Table 2
The percentages of data set reduction.

              | 2-Dimensional data sets | Higher dimensional data sets
Average (%)   | 42.79                   | 19.11
Std. dev. (%) | 20.42                   | 12.41
Min. (%)      | 1.52                    | 4.90
Max. (%)      | 74.29                   | 53.84

5.4. Comparison of the ACO-C methodology with others

The performance of ACO-C is compared with the results of k-means, single-linkage, DBSCAN, NC closures, and NOM [75]. In our comparison k-means represents the partitional clustering algorithms, and single-linkage the hierarchical clustering algorithms. DBSCAN is selected as a representative of the density-based clustering algorithms. The number of clusters is an input for k-means and single-linkage, therefore we run k-means and single-linkage for several values of the number of clusters. This number varies between 2 and 10% of the number of points in the data set with increments of 1, and the one with the best JI value is selected for each algorithm. In the same manner, for DBSCAN, among several Eps and MinPts settings the one with the best JI value is selected for comparison. NOM is a graph theoretical clustering algorithm. It also uses the neighborhoods constructed by the NC algorithm, hence we can elaborate on the impact of ACO-C better. For ACO-C, we consider the union of the non-dominated solutions obtained with the CERN and WCERN settings, and the sum of the execution times with CERN and WCERN is considered as the execution time of ACO-C.

The results for 32 data sets are summarized in Table 3. ACO-C finds the target clusters in 29 data sets out of 32. Single-linkage and
Fig. 8. (a) Main effect plots for maximum RI and (b) main effect plots for time.

Table 3
Comparison of ACO-C with k-means, single-linkage, DBSCAN, NC closures and NOM (32 data sets).

                           | k-Means | Single-linkage | DBSCAN | NC     | NOM     | ACO-C
# of data sets TC is found | 6       | 24             | 13     | 13     | 18      | 29
JI  Average                | 0.71    | 0.91           | 0.91   | 0.93   | 0.96    | 0.99
JI  Std. dev.              | 0.24    | 0.19           | 0.17   | 0.12   | 0.08    | 0.02
JI  Min.                   | 0.28    | 0.45           | 0.50   | 0.56   | 0.59    | 0.89
RI  Average                | 0.84    | 0.94           | 0.95   | 0.96   | 0.98    | 0.99
RI  Std. dev.              | 0.13    | 0.14           | 0.11   | 0.02   | 0.08    | 0.01
RI  Min.                   | 0.62    | 0.53           | 0.53   | 0.91   | 0.59    | 0.96
Time  Average              | 0.44    | 4.76           | 1.29   | 27.34  | 235.31  | 1089.41
Time  Std. dev.            | 0.60    | 8.47           | 2.01   | 71.29  | 511.85  | 823.81
Time  Min.                 | 0.05    | 0.38           | 0.03   | 0.05   | 0.07    | 2.09
Time  Max.                 | 2.19    | 32.60          | 7.45   | 318.46 | 1721.47 | 2916.20

NOM follow ACO-C with 24 and 18 data sets, respectively. ACO-C has the best average JI and RI values, followed by NOM, NC, DBSCAN and single-linkage. This indicates that ACO-C is able to form clusters that are close to the target clusters on the average. Moreover, the standard deviations of JI and RI are the smallest, and the minimum values of JI and RI are the highest for ACO-C. This indicates that even in the worst case ACO-C performs better than the competing approaches.

Typically, ACO-C has difficulty in detecting target clusters when there is noise, as for the data set in Fig. 5(g). The relative solution evaluation mechanisms of both CERN and WCERN are sensitive to density and distance changes, so these points are labeled as separate clusters. Although ACO-C yields the general structure of the target clusters in such data sets, it forms clusters by enclosing the noise as well.

The number of non-dominated solutions generated by CERN and WCERN varies between 1 and 6 for different data sets. Hence, the size of the non-dominated sets is reasonable for practical use.

The main limitation of ACO-C is the longer execution times compared to k-means, single-linkage and DBSCAN, partly due to the Matlab implementation. In this respect, improvements are required.

6. Conclusion

In this work we consider the spatial clustering problem with no a priori information on the data set. The clusters may include arbitrary shapes, and there may be density differences within and between the clusters. Moreover, the number of clusters is unknown. We present a novel ACO-based clustering methodology, namely ACO-C. In ACO-C we combine the connectivity, proximity, density and distance information with the exploration and exploitation capabilities of ACO in a multi-objective framework. The proposed clustering methodology is capable of handling several challenging issues of the clustering problem including solution evaluation, extraction of local properties, scalability and the clustering task itself. The performance of ACO-C is tested using a variety of data sets. The experimental results indicate that ACO-C outperforms the other competing approaches in terms of the validity indices JI and RI. In particular, the multi-objective framework and the solution evaluation relative to the neighborhoods enhance the algorithm in extracting arbitrary-shaped clusters, handling density variations, and finding the correct number of clusters. ACO-C achieves a reasonable number of non-dominated solutions for practical use.

The proposed methodology can generate non-dominated clustering solutions, which include the target clustering most of the time. These solutions represent alternative clustering patterns having different levels of resolution. Solutions with different resolutions allow the decision maker to analyze the trade-offs between the merging and division operations. A future research direction can be to find all the non-dominated clustering solutions in different resolutions, i.e. the Pareto efficient frontier.

ACO-C typically has problems with detection of the noise. Also, its execution times are relatively longer. The proposed approach can be improved in both areas.
References

[1] A.K. Jain, M.N. Murty, P.J. Flynn, Data clustering: a review, ACM Comput. Surv. 31 (3) (1999) 264–323.
[2] H. Alani, C.B. Jones, D. Tudhope, Voronoi-based region approximation for geographical information retrieval with gazetteers, Int. J. Geogr. Inf. Sci. 15 (4) (2001) 287–306.
[3] H. Pfister, M. Gross, Point-based computer graphics, IEEE Comput. Graph. Appl. 24 (4) (2004) 22–23.
[4] J. Freixenet, X. Munoz, D. Raba, J. Martí, X. Cufí, Yet another survey on image segmentation: region and boundary information integration, Lect. Notes Comput. Sci. 2352/2002 (2002) 21–25, http://dx.doi.org/10.1007/3-540-47977-5_27.
[5] T. Yuan, W. Kuo, A model-based clustering approach to the recognition of the spatial defect patterns produced during semiconductor fabrication, IIE Trans. 40 (2) (2007) 93–101.
[6] C.H. Wang, W. Kuo, H. Bensmail, Detection and classification of defect patterns on semiconductor wafers, IIE Trans. 39 (2006) 1059–1068.
[7] R. Xu, D. Wunsch, Survey of clustering algorithms, IEEE Trans. Neural Netw. 16 (3) (2005) 645–678.
[8] P. Berkhin, A survey of clustering data mining techniques, in: J. Kogan, C. Nicholas, M. Teboulle (Eds.), Grouping Multidimensional Data: Recent Advances in Clustering, Springer, Berlin, 2006, pp. 25–71.
[9] J.B. MacQueen, Some methods for classification and analysis of multivariate observations, in: Proceedings of the Fifth Berkeley Symposium on Mathematical Statistics and Probability, Berkeley, 1967, pp. 281–297.
[10] J.A. Hartigan, M.A. Wong, Algorithm AS 136: a k-means clustering algorithm, J. R. Stat. Soc.: Ser. C (Appl. Stat.) 28 (1) (1979) 100–108.
[11] L. Kaufman, P.J. Rousseeuw, Finding Groups in Data: An Introduction to Cluster Analysis, John Wiley & Sons, Hoboken, NJ, 1990.
[12] C.T. Zahn, Graph-theoretical methods for detecting and describing gestalt clusters, IEEE Trans. Comput. C-20 (1) (1971) 68–86.
[13] A.K. Jain, R.C. Dubes, Algorithms for clustering data, in: Prentice-Hall Advanced Reference Series, Prentice-Hall Inc., Upper Saddle River, NJ, 1988.
[14] M. Halkidi, Y. Batistakis, M. Vazirgiannis, Cluster validity methods: Part I, ACM SIGMOD Rec. 31 (2) (2002) 40–45.
[15] M. Halkidi, Y. Batistakis, M. Vazirgiannis, Cluster validity methods: Part II, ACM SIGMOD Rec. 31 (3) (2002) 19–27.
[16] S. Bandyopadhyay, U. Maulik, Nonparametric genetic clustering: comparison of validity indices, IEEE Trans. Syst. Man Cybern. Part C: Appl. Rev. 31 (1) (2001) 120–125.
[17] S. Bandyopadhyay, U. Maulik, Genetic clustering for automatic evolution of clusters and application to image classification, Pattern Recogn. 35 (2002) 1197–1208.
[18] E.R. Hruschka, N.F.F. Ebecken, A genetic algorithm for cluster analysis, Intell. Data Anal. 7 (2003) 15–25.
[19] W. Sheng, S. Swift, L. Zhang, X. Liu, A weighted sum validity function for clustering with a hybrid niching genetic algorithm, IEEE Trans. Syst. Man Cybern. Part B: Cybern. 35 (6) (2005) 1156–1167.
[20] S. Bandyopadhyay, S. Saha, GAPS: a clustering method using a new point symmetry-based distance measure, Pattern Recogn. 40 (2007) 3430–3451.
[21] S. Bandyopadhyay, S. Saha, A point symmetry-based clustering technique for automatic evolution of clusters, IEEE Trans. Knowl. Data Eng. 20 (11) (2008) 1441–1457.
[22] M. Ester, H.-P. Kriegel, J. Sander, X. Xu, A density-based algorithm for discovering clusters in large spatial databases with noise, in: Proceedings of the Second International Conference on Knowledge Discovery and Data Mining, Portland, OR, 1996, pp. 226–231.
[23] J.C. Dunn, Well-separated clusters and optimal fuzzy partitions, J. Cybern. 4 (1) (1974) 95–104.
[24] U. Maulik, S. Bandyopadhyay, Performance evaluation of some clustering algorithms and validity indices, IEEE Trans. Pattern Anal. Mach. Intell. 24 (12) (2002) 1650–1654.
[25] S. Bandyopadhyay, U. Maulik, A. Mukhopadhyay, Multi-objective genetic clustering for pixel classification in remote sensing imagery, IEEE Trans. Geosci. Remote Sens. 45 (5) (2007) 1506–1511.
[26] J. Handl, J. Knowles, Evolutionary multi-objective clustering, in: Proceedings of the 8th International Conference on Parallel Problem Solving from Nature, 2004, pp. 1081–1091.
[27] D.W. Corne, N.R. Jerram, J.D. Knowles, M.J. Oates, PESA-II: region-based selection in evolutionary multi-objective optimization, in: Proceedings of Genetic and Evolutionary Computation Conference, 2001, pp. 283–290.
[28] J. Handl, J. Knowles, An evolutionary approach to multi-objective clustering, IEEE Trans. Evolut. Comput. 11 (1) (2007) 56–76.
[29] N. Matake, T. Hiroyasu, M. Miki, T. Senda, Multi-objective clustering with automatic k-determination for large-scale data, in: Proceedings of Genetic and Evolutionary Computation Conference, London, England, 2007.
[30] C.-W. Tsai, W.-L. Chen, M.-G. Chiang, A modified multi-objective EA-based clustering algorithm with automatic determination of the number of clusters, Proc. IEEE Int. Conf. Syst. Man Cybern. (2012), http://dx.doi.org/10.1109/ICSMC.2012.6378178.
[31] S. Saha, S. Bandyopadhyay, A symmetry-based multi-objective clustering technique for automatic evolution of clusters, Pattern Recogn. 43 (2010) 738–751.
[32] X.L. Xie, G. Beni, A validity measure for fuzzy clustering, IEEE Trans. Pattern Anal. Mach. Intell. 13 (1991) 841–847.
[33] S. Saha, S. Bandyopadhyay, Some connectivity-based cluster validity indices, Appl. Soft Comput. 12 (2012) 1555–1565.
[34] S. Saha, S. Bandyopadhyay, A generalized automatic clustering algorithm in a multi-objective framework, Appl. Soft Comput. 13 (2013) 89–108.
[35] S. Das, A. Abraham, A. Konar, Clustering using Multi-objective Differential Evolution Algorithms, Metaheuristic Clustering, Springer, Heidelberg, Berlin, 2009, pp. 213–238.
[36] K. Suresh, D. Kundu, S. Ghosh, S. Das, A. Abraham, Data clustering using multi-objective differential evolution algorithms, Fundam. Inform. 97 (4) (2009) 381–403.
[37] M. Gong, L. Zhang, L. Jiao, S. Gou, Solving multi-objective clustering using an immune-inspired algorithm, in: Proceedings of IEEE Congress on Evolutionary Computation, 2007, http://dx.doi.org/10.1109/CEC.2007.4424449.
[38] A. Paoli, F. Melgani, E. Pasolli, Clustering of hyperspectral images based on multi-objective particle swarm optimization, IEEE Trans. Geosci. Remote Sens. 47 (12) (2009) 4175–4188.
[39] D.S. Santos, D.D. Oliveira, A.L.C. Bazzan, A Multiagent Multiobjective Clustering Algorithm. Data Mining and Multi-agent Integration, Springer, Berlin, 2009, pp. 239–249.
[40] M. Dorigo, V. Maniezzo, A. Colorni, Positive Feedback as a Search Strategy. Technical Report, 91-016, Politecnico di Milano, Dip. Elettronica, 1991.
[41] M. Dorigo, V. Maniezzo, A. Colorni, Ant system: optimization by a colony of cooperating agents, IEEE Trans. Syst. Man Cybern. Part B 26 (1) (1996) 29–41.
[42] M. Dorigo, G.D. Caro, L.M. Gambardella, Ant algorithms for discrete optimization, Artif. Life 5 (2) (1999) 137–172.
[43] J. Handl, B. Meyer, Ant-based and swarm-based clustering, Swarm Intell. 1 (2007) 95–113.
[44] X. Liu, F. Hu, An effective clustering algorithm with ant colony, J. Comput. 5 (4) (2010) 598–605.
[45] W. Yong, W. Peng-Cheng, Data clustering method based on ant swarm intelligence, in: Proceedings of IEEE International Conference on Computer Science and Information Technology, 2009, pp. 358–361.
[46] A.-P. Chen, C.-C. Chen, A new efficient approach for data clustering in electronic library using ant colony clustering algorithm, Electron. Libr. 24 (4) (2006) 548–559.
[47] Y. He, S. Hui, Y. Sim, A novel ant-based clustering approach for document clustering, in: H. Ng, M.-K. Leong, M.-Y. Kan, D. Ji (Eds.), Asia Information Retrieval Symposium, Springer, Singapore, 2006, pp. 537–544.
[48] Y. Kao, S.C. Fu, An ant-based clustering algorithm for manufacturing cell design, Int. J. Adv. Manuf. Technol. 28 (2006) 1182–1189.
[49] C.K. Ho, H.T. Ewe, A hybrid ant colony optimization approach (hACO) for constructing load-balanced clusters, in: Proceedings of the 2005 IEEE Congress on Evolutionary Computation, 2005, http://dx.doi.org/10.1109/CEC.2005.1554942.
[50] T.A. Runkler, Ant colony optimization of clustering models, Int. J. Intell. Syst. 20 (12) (2005) 1233–1251.
[51] S. Saatchi, C.C. Hung, Hybridization of the Ant Colony Optimization with the k-means Algorithm for Clustering. Lecture Notes in Computer Science: vol. 3540. Image Analysis, Springer, Berlin, 2005, pp. 511–520.
[52] P.S. Shelokar, V.K. Jayaraman, B.D. Kulkarni, An ant colony approach for clustering, Anal. Chim. Acta 509 (2004) 187–195.
[53] R.J. Kuo, H.S. Wang, T.-L. Hu, S.H. Chou, Application of ant k-means on clustering analysis, Comput. Math. Appl. 50 (2005) 1709–1724.
[54] C.L. Huang, W.-C. Huang, H.Y. Chang, Y.-C. Yeh, C.-Y. Tsai, Hybridization strategies for continuous ant colony optimization and particle swarm optimization applied to data clustering, Appl. Soft Comput. 13 (2013) 3864–3872.
[55] T. Niknam, B. Amiri, An efficient hybrid approach based on PSO, ACO and k-means for cluster analysis, Appl. Soft Comput. 10 (1) (2010) 183–197.
[56] J.-L. Deneubourg, S. Goss, N. Franks, A. Sendova-Franks, C. Detrain, The dynamics of collective sorting: robot-like ant and ant-like robot, in: Proceedings of the First Conference on Simulation of Adaptive Behavior: From Animals to Animats, MIT Press, 1991, pp. 356–365.
[57] E. Lumer, B. Faieta, Diversity and adaptation in populations of clustering ants, in: Proceedings of the Third International Conference on Simulation of Adaptive Behavior, MIT Press, Cambridge, 1994, pp. 501–508.
[58] U. Boryczka, Finding groups in data: cluster analysis with ants, Appl. Soft Comput. 9 (2009) 61–70.
[59] Y. Yang, M.S. Kamel, An aggregated clustering approach using multi-ant colonies algorithms, Pattern Recogn. 39 (2006) 1278–1289.
[60] J. Handl, J. Knowles, M. Dorigo, Ant-based clustering and topographic mapping, Artif. Life 12 (2006) 35–61.
[61] M. Martin, B. Chopard, P. Albuquerque, Formation of an ant cemetery: swarm intelligence or statistical accident? Future Gener. Comput. Syst. 18 (2002) 951–959.
[62] J. Handl, B. Meyer, Improved Ant-based Clustering and Sorting in a Document Retrieval Interface. Lecture Notes in Computer Science: vol. 2439. Parallel Problem Solving from Nature, Springer, Berlin, 2002, pp. 913–923.
[63] N. Monmarché, M. Slimane, G. Venturini, On Improving Clustering in Numerical Databases with Artificial Ants. Lecture Notes in Artificial Intelligence: vol. 1674. Advances in Artificial Life, Springer, Berlin, 1999, pp. 626–635.
[64] L. Zhang, Q. Cao, J. Lee, A novel ant-based clustering algorithm using Renyi entropy, Appl. Soft Comput. 13 (5) (2013) 2643–2657.
[65] H. Azzag, G. Venturini, A. Oliver, C. Guinot, A hierarchical ant-based clustering algorithm and its use in three real-world applications, Eur. J. Oper. Res. 179 (3) (2007) 906–922.
[66] A.N. Sinha, N. Das, G. Sahoo, Ant colony based hybrid optimization for data clustering, Kybernetes 36 (1/2) (2007) 175–191.
[67] C.F. Tsai, C.W. Tsai, H.C. Wu, T. Yang, ACODF: a novel data clustering approach for data mining in large databases, J. Syst. Softw. 73 (1) (2004) 133–145.
[68] A. Ghosh, A. Halder, M. Kothari, S. Ghosh, Aggregation pheromone density-based data clustering, Inf. Sci. 178 (2008) 2816–2831.
[69] M. Wan, C. Wang, L. Li, Y. Yang, Chaotic ant swarm approach for data clustering, Appl. Soft Comput. 12 (8) (2012) 2387–2393.
[70] T. İnkaya, S. Kayalıgil, N.E. Özdemirel, An adaptive neighbourhood construction algorithm based on density and connectivity, Pattern Recogn. Lett. 52 (2015) 17–24.
[71] T. İnkaya, S. Kayalıgil, N.E. Özdemirel, Extracting the non-convex boundaries of clusters: a post-clustering tool for spatial data sets. Technical Report, Middle East Technical University, Ankara, Turkey, 2014.
[72] K. Bache, M. Lichman, UCI Machine Learning Repository, University of California, School of Information and Computer Science, Irvine, CA, 2013, http://archive.ics.uci.edu/ml
[73] O. Sourina, Current Projects in the Homepage of Olga Sourina, 2013, http://www.ntu.edu.sg/home/eosourina/projects.html (accessed 21.11.13).
[74] C. Iyigun, Probabilistic Distance Clustering (Ph.D. dissertation), Rutgers University, New Brunswick, NJ, 2008.
[75] T. İnkaya, S. Kayalıgil, N.E. Özdemirel, A new density-based clustering approach in graph theoretic context, IADIS Int. J. Comput. Sci. Inf. Syst. 5 (2) (2010) 117–135.
Outline: Ant Colony Optimization based clustering methodology
1 Introduction
2 Related literature
2.1 Solution evaluation mechanisms
2.2 ACO and ant-based clustering algorithms
3 How to evaluate a clustering solution?
3.1 Clustering Evaluation Relative to Neighborhood (CERN)
3.1.1 Adjusted compactness
3.1.2 Relative separation
3.2 Weighted Clustering Evaluation Relative to Neighborhood (WCERN)
4 The ACO-based clustering (ACO-C) methodology
4.1 Neighborhood construction
4.2 Data set reduction via boundary formation
5 Experimental results for the ACO-C methodology
5.1 Data sets and performance evaluation criteria
5.2 Parameter settings for ACO-C
5.3 Data set reduction
5.4 Comparison of the ACO-C methodology with others
6 Conclusion
References