
2 Fuzzy C-Means and Subtractive Clustering method

2.1 Fuzzy C-Means algorithm (FCM)

The FCM algorithm uses fuzzy set memberships to associate every data point with at least one cluster. Given a dataset X = {x_i ∈ R^p, i=1..n}, where n > 0 is the number of data points and p > 0 is the dimension of the data space of X, let c, c ∈ N, 2 ≤ c ≤ n, be the number of clusters in X. Denote V = {v_k ∈ R^p, k=1..c} as the set of center points of the c clusters in the fuzzy partition, and U = {u_ki ∈ [0,1], i=1..n, k=1..c} as the partition matrix, where u_ki is the fuzzy membership degree of the data point x_i to the k-th cluster, and

\sum_{k=1}^{c} u_{ki} = 1, \quad i = 1..n.    (1)

The clustering problem is to determine the values of c and V such that:

J(X|U,V) = \sum_{i=1}^{n} \sum_{k=1}^{c} u_{ki}\, \|x_i - v_k\|^2 \rightarrow \min,    (2)

where ||x - y|| is the distance between the data points x and y in R^p, defined using the Euclidean distance as:

\|x - y\|^2 = \sum_{i=1}^{p} (x_i - y_i)^2.    (3)

By using fuzzy sets to assign data points to clusters, FCM allows adjacent clusters to overlap, and therefore provides more information on the relationships among the data points. In addition, by using a fuzzifier factor, m, in its objective function (4), the clustering model from FCM is more flexible in changing the overlap regions among clusters:

J(X|U,V) = \sum_{i=1}^{n} \sum_{k=1}^{c} u_{ki}^{m}\, \|x_i - v_k\|^2 \rightarrow \min,    (4)

where m, 1 < m < ∞, is the fuzzifier factor.

Equation (4) can be solved using Lagrange multipliers with respect to (1):

v_k = \frac{\sum_{i=1}^{n} u_{ki}^{m}\, x_i}{\sum_{i=1}^{n} u_{ki}^{m}},    (5)

u_{ki} = \frac{\left( 1 / \|x_i - v_k\|^2 \right)^{1/(m-1)}}{\sum_{j=1}^{c} \left( 1 / \|x_i - v_j\|^2 \right)^{1/(m-1)}}.    (6)

To estimate the solution of the system of equations (5) and (6), FCM uses an iterative process. The values of U are initialized randomly. The values of V are estimated using (5). The values of U are then re-estimated using (6) with the new values of V. This process is iterated until convergence, where ∃ε_u > 0, T > 0: ∀t > T,

\|U_{t+1} - U_t\| = \max_{i,k} \left| u_{ki}(t+1) - u_{ki}(t) \right| \le \varepsilon_u,    (7)

or ∃ε_v > 0, T > 0: ∀t > T,

\|V_{t+1} - V_t\| = \max_{k} \left\| v_k(t+1) - v_k(t) \right\| \le \varepsilon_v.    (8)

The FCM algorithm has the advantage of converging quickly; however, it may get stuck in local optima and be unable to detect the global optimum.
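The update loop in (5)-(8) is straightforward to implement. The following is a minimal NumPy sketch of that iteration, not the authors' implementation; the function name, the default fuzzifier m = 2, and the tolerance are illustrative choices.

```python
# Minimal sketch of the FCM iteration of eqs. (4)-(8); names are illustrative.
import numpy as np

def fcm(X, c, m=2.0, eps=1e-5, max_iter=300, rng=None):
    """Fuzzy C-Means: X is (n, p), c the number of clusters, m the fuzzifier."""
    rng = np.random.default_rng(rng)
    n = X.shape[0]
    # Random initialization of U, normalized so each column sums to 1 (eq. 1).
    U = rng.random((c, n))
    U /= U.sum(axis=0, keepdims=True)
    for _ in range(max_iter):
        Um = U ** m
        # Eq. (5): centers as membership-weighted means of the data.
        V = (Um @ X) / Um.sum(axis=1, keepdims=True)
        # Squared Euclidean distances ||x_i - v_k||^2 (eq. 3), shape (c, n).
        d2 = ((X[None, :, :] - V[:, None, :]) ** 2).sum(axis=2)
        d2 = np.fmax(d2, 1e-12)                 # guard against division by zero
        # Eq. (6): membership update.
        inv = d2 ** (-1.0 / (m - 1.0))
        U_new = inv / inv.sum(axis=0, keepdims=True)
        # Eq. (7): stop when the largest membership change falls below eps.
        if np.max(np.abs(U_new - U)) <= eps:
            U = U_new
            break
        U = U_new
    return U, V
```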

2.2 Subtractive Clustering method (SC)

SC is commonly used to find cluster centers and provide the optimal number of clusters. To detect the center point of a new cluster, SC considers each data point as a potential cluster center. The mountain function is used to measure the potential for each data point to be a cluster center:

M(x_i) = \sum_{j=1}^{n} e^{-\frac{\|x_i - x_j\|^2}{\sigma^2 / 2}},    (9)

where σ > 0 is a predefined constant that represents the mountain peak, or the neighborhood area radius, of each data point. The greater the number of close neighbors a data point has, the higher the magnitude of its mountain function. The mountain function can therefore be viewed as a measure of data density in the neighborhood of each data point. Data points with large values of the mountain function have a greater chance of becoming cluster centers, and the data point with the maximum value of the mountain function is the first choice for a cluster center.

Once a data point is selected as the center point of a cluster, its neighbors are affected. Let x* be the selected data point and M* be the mountain value at x*. The mountain function values of its neighbors are then amended as:


M_t(x_j) = M_{t-1}(x_j) - M^{*}\, e^{-\frac{\|x^{*} - x_j\|^2}{\beta^2 / 2}},    (10)

where β > 0, a predefined constant, represents the mountain radius, or the affected neighborhood area radius, at each data point. Usually, β > σ, and quite often β = 1.5σ.

SC stops increasing the number of clusters when the count reaches a predefined constant C_max, or when the ratio between the value of M* at time t, M_t*, and its original value, M_0*, falls below a predefined constant δ:

\frac{M_t^{*}}{M_0^{*}} < \delta.    (11)

While SC can help to determine the number of clusters in a dataset, the constants σ, β, and δ must be specified, and their appropriate values differ among datasets.
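For reference, here is a brief sketch of the traditional SC procedure, assuming the exponent scaling of (9) and (10) as reconstructed above; sigma, c_max, and delta are exactly the user-supplied constants the text warns about.

```python
# Sketch of traditional subtractive clustering, eqs. (9)-(11); the exact
# exponent scaling is an assumption reconstructed from the paper's notation.
import numpy as np

def subtractive_clustering(X, sigma, c_max=10, delta=0.15):
    beta = 1.5 * sigma                       # the text suggests beta = 1.5*sigma
    d2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=2)   # all pairs, O(n^2)
    # Eq. (9): mountain (potential) value of every data point.
    M = np.exp(-d2 / (sigma ** 2 / 2.0)).sum(axis=1)
    M0 = M.max()                             # original maximum M_0*
    centers = []
    while len(centers) < c_max:
        i_star = int(np.argmax(M))
        M_star = M[i_star]
        if M_star / M0 < delta:              # eq. (11): stop criterion
            break
        centers.append(X[i_star])
        # Eq. (10): subtract the selected peak from its neighborhood.
        M = M - M_star * np.exp(-d2[i_star] / (beta ** 2 / 2.0))
    return np.array(centers)
```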

3 The proposed algorithm

3.1 Fuzzy subtractive clustering (SC) method

We first propose a novel fuzzy SC method that uses a fuzzy partition of the data instead of the data themselves. Our method addresses the drawback of the traditional SC method in that it does not require a priori values of the mountain peak and mountain radii.

3.1.1 Fuzzy mountain function

Given {U,V}, a fuzzy partition on X, the accumulated density at v_k, k=1..c, is calculated [8] as

Acc(v_k) = \sum_{i=1}^{n} u_{ki}.    (16)

The density at each data point x_i is estimated, using a histogram-based method, as

dens(x_i) = \sum_{k=1}^{c} Acc(v_k)\, u_{ki}.    (17)

The density estimator in (17) is more accurate at the centers {v_k} than at other data points [8]. Therefore, if we try to find the c most dense data points using (17), we will simply obtain the centers of the c clusters of the fuzzy partition. To address this problem, we use the concept of a strong uniform fuzzy partition [8]. We define a new fuzzy partition {U',V} using {U,V}:

u'_{ki} = \frac{e^{-\|x_i - v_k\|^2 / \sigma_k^2}}{\sum_{l=1}^{c} e^{-\|x_i - v_l\|^2 / \sigma_l^2}},    (18)

where

\sigma_k^2 = \sum_{i=1}^{n} P(x_i | v_k)\, \|x_i - v_k\|^2.    (19)

Because {U',V} is a strong uniform fuzzy partition [8], the density at every data point can then be estimated as

dens(x_i) = \sum_{k=1}^{c} Acc(v_k)\, u'_{ki}.    (20)
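A compact sketch of this density estimate, given an FCM partition (U, V), might look as follows. The conditional P(x_i|v_k) in (19) is approximated here by memberships normalized over the data points, which is an assumption; all names are illustrative.

```python
# Sketch of the fuzzy density estimate, eqs. (16)-(20).
import numpy as np

def fuzzy_density(X, U, V):
    # Eq. (16): accumulated density at each center.
    acc = U.sum(axis=1)                                       # shape (c,)
    d2 = ((X[None, :, :] - V[:, None, :]) ** 2).sum(axis=2)   # (c, n)
    # Eq. (19): per-cluster spread sigma_k^2; P(x_i|v_k) is approximated
    # by memberships normalized over i (an assumption, not from the paper).
    P = U / np.fmax(U.sum(axis=1, keepdims=True), 1e-12)
    sigma2 = np.fmax((P * d2).sum(axis=1), 1e-12)             # shape (c,)
    # Eq. (18): strong uniform fuzzy partition U'.
    g = np.exp(-d2 / sigma2[:, None])
    U_prime = g / g.sum(axis=0, keepdims=True)
    # Eq. (20): density at every data point.
    dens = acc @ U_prime                                      # shape (n,)
    return dens, U_prime, acc
```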

3.1.2 Fuzzy mountain function amendment

Each time the most dense data point is selected, the mountain function of the other data points must be amended to enable the search for new cluster centers. In the traditional SC method, the amendment is done using the direct relationships between the selected data point and its neighborhood, i.e., using a pre-specified mountain radius (10). In our approach, this is done using the fuzzy partition, and no other parameters are required. First, the accumulated density of all cluster centers is revised using

Acc_{t+1}(v_k) = Acc_t(v_k) - M_t^{*}\, P(v_k | x_t^{*}),    (21)

where x_t* is the data point selected as the new cluster center and M_t* is its mountain function value, at time t. The density at each data point is then re-estimated using (20) and (21) as follows:

dens_{t+1}(x_i) = \sum_{k=1}^{c} Acc_{t+1}(v_k)\, u'_{ki}
               = \sum_{k=1}^{c} \left[ Acc_t(v_k) - M_t^{*}\, P(v_k | x_t^{*}) \right] u'_{ki}
               = \sum_{k=1}^{c} Acc_t(v_k)\, u'_{ki} - M_t^{*} \sum_{k=1}^{c} u'_{ki}\, P(v_k | x_t^{*})
               = dens_t(x_i) - M_t^{*} \sum_{k=1}^{c} u'_{ki}\, P(v_k | x_t^{*}).

Hence, the estimated density at each data point is amended as:

dens_{t+1}(x_i) = dens_t(x_i) - M_t^{*} \sum_{k=1}^{c} u'_{ki}\, P(v_k | x_t^{*}).    (22)

To determine the optimal number of clusters in the dataset, we first locate the √n most dense data points. Only those data points for which the ratio between the mountain function values at time t and time 0 is above 0.95 are selected. The number of selected points is taken as the optimal number of clusters in the dataset.

  • 8/10/2019 Ike 5086

    4/7

3.1.3 Fuzzy subtractive clustering algorithm

Input: Fuzzy partition {U,V} on X.
Output: Optimal number of clusters, c*, and c* cluster candidates.
Steps:
1. Construct a new fuzzy partition {U',V} using (18).
2. Estimate the density in X using (20).
3. Search for the most dense data point, say x*.
4. If dens_t(x*) > 0.95 * dens_0(x*), then select x* as a cluster candidate.
5. Amend all mountain function values using (22).
6. If the number of visited x* is less than √n, go to step 3.
7. Return the set of selected cluster candidates.

Regarding computational complexity, the evaluation of formula (18) costs the same as (6), and (19) the same as (5). The running time of (16), (20), (21), and (22) is O(c·n). Hence, the computational complexity of our fuzzy SC method is O(c·n), equal to one iteration of the FCM algorithm. This contrasts with the traditional SC method, (9) and (10), which is O(n²). Thus, for large datasets, while n increases significantly, c may grow as well but remains reasonably small compared to n. This illustrates another advantage of our method over the traditional SC.
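A sketch of this candidate search, building on the fuzzy_density sketch above, is given below. P(v_k|x_t*) is approximated by the column of U' at the selected point, and the visited point is masked to avoid re-selection; both are assumptions made for this illustration.

```python
# Sketch of the fuzzy SC candidate search (Sec. 3.1.3) with the amendment
# rule of eq. (22); relies on fuzzy_density() from the earlier sketch.
import numpy as np

def fuzzy_sc(X, U, V):
    n = X.shape[0]
    dens, U_prime, _ = fuzzy_density(X, U, V)
    dens0 = dens.copy()
    candidates = []
    for _ in range(int(np.ceil(np.sqrt(n)))):  # visit the sqrt(n) densest points
        i_star = int(np.argmax(dens))
        M_star = dens[i_star]
        if M_star > 0.95 * dens0[i_star]:      # step 4: ratio test
            candidates.append(i_star)
        # Eq. (22): amend the density of every data point; U'[:, i_star]
        # stands in for P(v_k | x_t*), an assumption of this sketch.
        p_vk = U_prime[:, i_star]
        dens = dens - M_star * (U_prime.T @ p_vk)
        dens[i_star] = -np.inf                 # do not revisit the same point
    return X[candidates]
```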

3.2 Model selection method

To decide the optimal number of clusters in the dataset, we randomly generate multiple clustering solutions and choose the best one. This approach is a novel method for model selection using the log-likelihood estimator with a fuzzy partition. Each clustering solution is modeled as Θ = {U,V}. Our model selection method is based on the likelihood of the solution model and the dataset:

L(\Theta | X) = L(U,V | X) = \prod_{i=1}^{n} P(x_i | U,V) = \prod_{i=1}^{n} \sum_{k=1}^{c} P(v_k)\, P(x_i | v_k).    (23)

The log-likelihood estimator is computed as

\log(L) = \sum_{i=1}^{n} \log \sum_{k=1}^{c} P(v_k)\, P(x_i | v_k) \rightarrow \max.    (24)

Because our clustering model is a possibility-based one, a possibility-to-probability transformation is needed before applying (24) to model selection. For each fuzzy partition, u_ki represents the possibility of x_i given the cluster v_k. We used the method of Florea et al. [9] to approximate the probability of x_i given v_k, P^a(x_i|v_k). The prior probability P(v_k) is then estimated as in (25):

P(v_k) = \frac{\sum_{i=1}^{n} P^{a}(x_i | v_k)}{\sum_{l=1}^{c} \sum_{i=1}^{n} P^{a}(x_i | v_l)}.    (25)

The probability of x_i given v_k is computed using P^a(x_i|v_k) and the Gaussian distribution model as

P(x_i | v_k) = \max\left( P^{a}(x_i | v_k),\; (2\pi\sigma_k^2)^{-1/2}\, e^{-\frac{\|x_i - v_k\|^2}{2\sigma_k^2}} \right).    (26)

Equation (26) represents the data distribution better when it is not Gaussian. Our model selection method is based on (24) together with (25) and (26).
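The score in (24)-(26) could be sketched as follows. The transformation of Florea et al. [9] is not reproduced here; the raw memberships stand in for P^a(x_i|v_k), and the spread σ_k² follows (19) with the same stand-in, so this is only a placeholder for the paper's actual transformation.

```python
# Sketch of the model-selection score, eqs. (24)-(26); Pa is a placeholder
# for the possibility-to-probability transformation of [9].
import numpy as np

def log_likelihood(X, U, V):
    d2 = ((X[None, :, :] - V[:, None, :]) ** 2).sum(axis=2)   # (c, n)
    Pa = U                                                    # stand-in for [9]
    # Eq. (25): prior probability of each cluster.
    Pv = Pa.sum(axis=1) / Pa.sum()                            # (c,)
    # Eq. (19): per-cluster spread as a membership-weighted mean distance.
    sigma2 = np.fmax((Pa * d2).sum(axis=1)
                     / np.fmax(Pa.sum(axis=1), 1e-12), 1e-12)
    # Eq. (26): max of the transformed possibility and a Gaussian density.
    gauss = (np.exp(-d2 / (2.0 * sigma2[:, None]))
             / np.sqrt(2.0 * np.pi * sigma2[:, None]))
    Pxv = np.maximum(Pa, gauss)
    # Eq. (24): log-likelihood of the solution model {U, V}.
    return float(np.log(np.fmax(Pv @ Pxv, 1e-300)).sum())
```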

3.3 The proposed algorithm

We combine the FCM algorithm with our fuzzy SC method into a novel clustering algorithm (fzSC) that automatically determines the optimal number of clusters and the fuzzy partition for the given dataset.

Input: The data to cluster, X = {x_i}, i=1..n.
Output: An optimal fuzzy partition solution:
  c: optimal number of clusters;
  V = {v_k}, k=1..c: cluster centers;
  U = {u_ki}, i=1..n, k=1..c: partition matrix.
Steps:
1. Randomly generate a set of L fuzzy partitions where the number of clusters is set to √n.
2. For each fuzzy partition:
   - Use the fuzzy SC method to determine the optimal number of clusters.
   - Run the FCM algorithm using the optimal number of clusters and the partition matrix found by the fuzzy SC method.
   - Compute the fitness value of the fuzzy partition solution using (24).
3. Select the best solution from the partition set based on the fitness values.
4. Return the best solution.
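Wiring the earlier sketches together, an end-to-end driver might look like the following. L, the seed, and the restart policy are illustrative; and where the paper reruns FCM from the partition matrix found by fuzzy SC, this simplified version starts a fresh FCM run at the found number of clusters, which is an assumption of the sketch.

```python
# End-to-end sketch of fzSC (Sec. 3.3); relies on fcm(), fuzzy_sc() and
# log_likelihood() from the earlier sketches.
import numpy as np

def fzsc(X, L=10, seed=0):
    rng = np.random.default_rng(seed)
    c0 = int(np.ceil(np.sqrt(X.shape[0])))    # initial cluster count: sqrt(n)
    best = None
    for _ in range(L):
        U, V = fcm(X, c0, rng=rng)            # step 1: partition at sqrt(n) clusters
        centers = fuzzy_sc(X, U, V)           # step 2a: candidate cluster centers
        if len(centers) < 2:
            continue                          # FCM requires 2 <= c <= n
        U, V = fcm(X, len(centers), rng=rng)  # step 2b: FCM at the found c
        score = log_likelihood(X, U, V)       # step 2c: fitness via eq. (24)
        if best is None or score > best[0]:
            best = (score, U, V)              # step 3: keep the best solution
    return best[1], best[2]                   # step 4: return it
```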

4 Experimental results

To evaluate the performance of the fzSC algorithm, we generated one dataset manually in a 2D space, with 171 data points in 6 clusters (labeled from 1 to 6, Fig. 1), and 200 datasets using the method based on a simple finite mixture model [12]. Dataset types are distinguished by dimension and cluster number; we generated (5-2+1)*(9-5+1) = 20 dataset types. For each type, we generated 10 datasets, for a total of 200. For the real datasets, we used the Iris, Wine, Glass, and Breast Cancer Wisconsin datasets from the UC Irvine Machine Learning Repository [10]. These real datasets contain classification information and are therefore helpful for clustering algorithm evaluation.

4.1 Evaluation of candidate cluster center initialization

We first used the manually created (MC) and Iris datasets to demonstrate the ability of the fzSC algorithm to determine the centers of the candidate clusters.

4.1.1 MC dataset

For the MC dataset, we first ran the standard FCM algorithm with the number of clusters set to 13, which is the square root of 171 (rounded), and the partition matrix initialized randomly. fzSC was then used to search for the six (the known number of clusters) most dense data points as candidate cluster centers. The 13 cluster centers found by the FCM algorithm and the six candidate cluster centers found by fzSC are plotted in Fig. 1.

Fig. 1. Candidate cluster centers in the MC dataset found using fzSC. Squares: cluster centers from FCM; dark circles: cluster centers found by fzSC. Classes are labeled with numbers from 1 to 6.

Fig. 1 shows that fzSC successfully detected the best centers for the six cluster candidates in the MC dataset.
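As a usage illustration, a hypothetical rerun of this experiment with the sketches above might look as follows; the actual 171-point MC dataset is not available here, so the six-cluster data below are an invented stand-in.

```python
# Hypothetical reproduction of the Sec. 4.1.1 flow; the dataset, counts,
# and spreads are invented stand-ins for the MC data.
import numpy as np

rng = np.random.default_rng(1)
X = np.vstack([rng.normal(loc, 0.3, size=(30, 2))
               for loc in [(0, 0), (3, 0), (0, 3), (3, 3), (6, 0), (6, 3)]])
U, V = fcm(X, int(np.ceil(np.sqrt(len(X)))))   # FCM at sqrt(n) clusters
candidates = fuzzy_sc(X, U, V)                 # fzSC's candidate centers
print(f"{len(candidates)} candidate cluster centers found")
```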

4.1.2 Iris dataset

For the Iris dataset, we first ran the standard FCM algorithm with the number of clusters set to 12, which is the square root of 150 (rounded), and the partition matrix initialized randomly. We then used fzSC to search for the three candidate cluster centers. Cluster centers found by FCM and fzSC are plotted in Fig. 2. fzSC successfully detected the best centers for the three cluster candidates in the Iris dataset.

Fig. 2. Candidate cluster centers in the Iris dataset found using fzSC. Squares: cluster centers from FCM; dark circles: cluster centers found by fzSC.

4.2 Evaluation of partition matrix initialization

To evaluate the effectiveness of fzSC in initializing the partition matrix for FCM, we compared the performance of fzSC with that of k-means, k-medians, and FCM on the MC dataset with 6 clusters. The partition matrices of the three standard algorithms were initialized randomly on each run. We ran each of these algorithms three times and recorded their best solutions. For the fzSC algorithm, the standard FCM algorithm was run once with the number of clusters set to 13. The fuzzy SC method was then applied to search for the six candidate cluster centers. The standard FCM algorithm was then rerun with the values of c and V set to the six cluster centers found by the fuzzy SC method. We repeated this experiment 20 times and averaged the performance of each algorithm (Table 1).

Table 1. Algorithm performance on the MC dataset with the number of clusters set to 6.

Algorithm    Correctness ratio by class                Avg. ratio
             1      2      3      4      5      6
fzSC         1.00   1.00   1.00   1.00   1.00   1.00   1.00
k-means      0.97   0.87   1.00   1.00   1.00   0.75   0.93
k-medians    0.95   0.82   1.00   1.00   1.00   0.62   0.90
FCM          0.97   1.00   0.95   1.00   1.00   0.96   0.98


[3] J.Y. Chen, Z. Qin, J. Jia, A weighted mean subtractive clustering algorithm, Information Technology, Vol. 7, pp. 356-360, 2008.

[4] C.C. Tuan, J.H. Lee, S.J. Chao, Using Nearest Neighbor Method and Subtractive Clustering-Based Method on Antenna-Array Selection Used in Virtual MIMO in Wireless Sensor Network, in Tenth International Conference on Mobile Data Management: Systems, Services and Middleware, Taipei, 2009, pp. 496-501.

[5] J.C. Collazo, F.M. Aceves, E.H. Gorrostieta, J.O. Pedraza, A.O. Sotomayor, M.R. Delgado, Comparison between Fuzzy C-means clustering and Fuzzy Clustering Subtractive in urban air pollution, in Electronics, Communications and Computer (CONIELECOMP), 2010 20th International Conference, Cholula, 2010, pp. 174-179.

[6] Q. Yang, D. Zhang, F. Tian, An initialization method for Fuzzy C-means algorithm using Subtractive Clustering, in Third International Conference on Intelligent Networks and Intelligent Systems, Vol. 10, 2010, pp. 393-396.

[7] J. Li, C.H. Chu, Y. Wang, W. Yan, An Improved Fuzzy C-means Algorithm for Manufacturing Cell Formation, Fuzzy Systems, Vol. 2, pp. 1505-1510, 2002.

[8] K. Loquin, O. Strauss, Histogram density estimators based upon a fuzzy partition, Statistics and Probability Letters, Vol. 78, pp. 1863-1868, 2008.

[9] M.C. Florea, A.L. Jousselme, D. Grenier, E. Bosse, Approximation techniques for the transformation of fuzzy sets into random sets, Fuzzy Sets and Systems, Vol. 159, pp. 270-288, 2008.

[10] A. Frank, A. Asuncion, Machine Learning Repository, 2010. [Online]. http://archive.ics.uci.edu/ml

[11] B.P.P. van Houte, J. Heringa, Accurate confidence aware clustering of array CGH tumor profiles, Bioinformatics, Vol. 26(1), pp. 6-14, 2010.

[12] L. Xu, M.I. Jordan, On convergence properties of the EM algorithm for Gaussian mixtures, Neural Computation, Vol. 8, pp. 129-151, 1996.