
Colour image segmentation using K-Medoids Clustering

Amit Yerpude, Dr. Sipi Dubey
Rungta College of Engg. & Tech., Bhilai, Chhattisgarh, India

[email protected]
[email protected]

Abstract—K-medoids clustering is used as a tool for clustering a colour space based on a distance criterion. This paper presents a colour image segmentation method that divides the colour space into clusters. Using various colour images, we show, both theoretically and experimentally, that K-Medoids converges to an approximation of the optimal solution under this criterion. We also compare the efficiency of the available algorithms for segmenting grey-scale as well as noisy images.

Keywords—Colour image segmentation, Clustering, K-Medoids, K-Means.

I. INTRODUCTION

Image segmentation [1][2] is an important research area in digital image processing. The field is extremely large, often drawing on techniques from a wide range of other mathematical disciplines. Segmentation means dividing an image into distinct objects or connected regions that do not overlap, so that the union of all the regions is the image itself. A region often has a similar intensity or a distinct boundary.

In order to facilitate practical manipulation, recognition, and object-based analysis of multimedia resources, partitioning the pixels of an image into groups with coherent properties is indispensable [3]. This process is known as image segmentation. Hundreds of methods for colour image segmentation have been proposed over the years. They fall mainly into two categories: contour-based and region-based. Methods of the first category detect discontinuities in an image, such as edges or contours, and use them to partition the image. Methods of the second category divide the pixels of an image into groups with coherent properties such as colour; that is, they use decision criteria to segment the image into regions according to the similarity of the pixels. Region growing and clustering are two representative region-based methods. The drawbacks of region growing are that it is difficult to define growing and stopping criteria that work across different images, and that the method is sensitive to noise.

Recently, many researchers have treated the segmentation problem as an unsupervised classification, or clustering, problem. In these methods, the segmentation is obtained as the global minimum of a criterion function based on the fuzzy/possibilistic distance between the prototypes and the image pixels. By partitioning pixels according to their global feature distribution, these methods achieve good global partitioning for most pixels. In these methods, however, the spatial relationships between pixels are rarely considered. This loss of spatial information can lead to unreasonable segmentations, because pixels that are similar in a low-level feature (such as colour) but spatially separated are grouped into one region. At the same time, the run-time complexity of this global partitioning is often high.

The process of grouping a set of physical or abstract objects into classes of similar objects is called clustering [4]. A cluster is a collection of data objects that are similar to one another within the same cluster and dissimilar to the objects in other clusters. Clustering is an example of unsupervised learning: unlike classification, it does not rely on predefined classes or class-labelled training examples. In data mining, effort has focused on finding methods for efficient and effective cluster analysis on large datasets. Active research themes include the scalability of clustering methods and their effectiveness on complex shapes and types of data.

Both the k-means and k-medoids algorithms are partitional, and both attempt to minimize squared error, i.e. the distance between the points assigned to a cluster and the point designated as that cluster's centre. In contrast to k-means, k-medoids chooses actual data points as centres, which makes it less sensitive to noisy and abnormal data. However, computing medoids is clearly more expensive than computing means, so k-medoids is generally suitable only for small datasets.

II. SIMILAR METHOD

K-Means algorithm: K-Means [4] is one of the simplest unsupervised learning algorithms for the well-known clustering problem. The procedure classifies a given data set into a certain number of clusters (say k) fixed a priori. The main idea is to define k centroids, one for each cluster. These centroids should be placed carefully, because different locations lead to different results; a good choice is to place them as far away from each other as possible. The next step is to take each point of the data set and associate it with the nearest centroid. When no point is pending, the first step is complete and an initial grouping is done.

Amit Yerpude et al., Int. J. Computer Technology & Applications, Vol 3 (1), 152-154

IJCTA | JAN-FEB 2012 Available [email protected]

152

ISSN:2229-6093


At this point, k new centroids need to be re-calculated as the centres of the clusters resulting from the previous step. Given these k new centroids, a new binding is made between the data points and their nearest new centroid, generating a loop. As a result of this loop, the k centroids change their location step by step until no more changes occur; in other words, the centroids no longer move. Finally, the algorithm aims at minimizing an objective function, in this case a squared-error function. The algorithm consists of the following steps:

1. Place k points into the space represented by the objects being clustered. These points represent the initial group centroids.

2. Assign each object to the group that has the closest centroid.

3. When all objects have been assigned, recalculate the positions of the k centroids.

4. Repeat Steps 2 and 3 until the centroids no longer move.
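The steps above can be sketched in Python. This is an illustrative sketch only, assuming Euclidean distance and NumPy; it is not the authors' implementation.

```python
import numpy as np

def kmeans(points, k, n_iter=100, seed=0):
    """Plain k-means: returns (centroids, labels) for an (n, d) array."""
    rng = np.random.default_rng(seed)
    # Step 1: pick k data points as the initial centroids.
    centroids = points[rng.choice(len(points), size=k, replace=False)]
    for _ in range(n_iter):
        # Step 2: assign each point to its nearest centroid.
        dists = np.linalg.norm(points[:, None, :] - centroids[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # Step 3: recompute each centroid as the mean of its cluster
        # (an empty cluster keeps its old centroid).
        new_centroids = np.array([
            points[labels == j].mean(axis=0) if np.any(labels == j) else centroids[j]
            for j in range(k)
        ])
        # Step 4: stop when the centroids no longer move.
        if np.allclose(new_centroids, centroids):
            break
        centroids = new_centroids
    return centroids, labels
```

For image segmentation, `points` would be the pixel colour vectors and `k` the desired number of segments.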

This produces a separation of the objects into groups for which the metric to be minimized can be calculated. Although the procedure can be proved always to terminate, the K-Means algorithm does not necessarily find the optimal configuration corresponding to the global minimum of the objective function. The algorithm is also significantly sensitive to the initial, randomly selected cluster centres; it can be run multiple times to reduce this effect. K-Means is a simple algorithm that has been adapted to many problem domains. First, the algorithm randomly selects k of the objects. Each selected object represents a single cluster, and because only one object is in the cluster at this point, it also represents the mean, or centre, of the cluster.

III. K-MEDOIDS CLUSTERING

K-Medoids algorithm: The K-Means algorithm is sensitive to outliers, since an object with an extremely large value may substantially distort the distribution of the data. How might the algorithm be modified to reduce this sensitivity? Instead of taking the mean value of the objects in a cluster as a reference point, a medoid can be used: the most centrally located object in the cluster. The partitioning can then still be performed on the principle of minimizing the sum of the dissimilarities between each object and its corresponding reference point. This is the basis of the K-Medoids method. The basic strategy of K-Medoids [4] clustering is to find k clusters among n objects by first arbitrarily choosing a representative object (the medoid) for each cluster; each remaining object is then assigned to the medoid it is most similar to. The K-Medoids method thus uses representative objects as reference points instead of the mean values of the objects in each cluster. The algorithm takes as input the parameter k, the number of clusters to partition the set of n objects into. A typical K-Medoids algorithm for partitioning around medoids (central objects) is as follows:

Input:

k: the number of clusters
D: a data set containing n objects

Output: a set of k clusters that minimizes the sum of the dissimilarities of all the objects to their nearest medoid.

Method:
Arbitrarily choose k objects in D as the initial representative objects (medoids);
Repeat:
  Assign each remaining object to the cluster with the nearest medoid;
  Randomly select a non-medoid object O_random;
  Compute the total cost S of swapping medoid O_j with O_random;
  If S < 0, swap O_j with O_random to form the new set of k medoids;
Until no change.
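This swap-based procedure can be sketched as follows. The sketch assumes Euclidean dissimilarity and NumPy, and accepts a random swap only when it lowers the total dissimilarity; it is illustrative, not the authors' implementation.

```python
import numpy as np

def kmedoids(points, k, n_iter=200, seed=0):
    """Swap-based k-medoids: returns (medoid points, labels)."""
    rng = np.random.default_rng(seed)
    n = len(points)
    # Arbitrarily choose k objects as the initial medoids.
    medoids = list(rng.choice(n, size=k, replace=False))

    def assign(med):
        # Distance of every object to every current medoid.
        d = np.linalg.norm(points[:, None, :] - points[med][None, :, :], axis=2)
        return d.min(axis=1).sum(), d.argmin(axis=1)

    cost, labels = assign(medoids)
    for _ in range(n_iter):
        j = rng.integers(k)             # medoid to try replacing
        o_random = rng.integers(n)      # random non-medoid candidate
        if o_random in medoids:
            continue
        trial = medoids.copy()
        trial[j] = o_random
        new_cost, new_labels = assign(trial)
        if new_cost < cost:             # S = new_cost - cost < 0: keep the swap
            medoids, cost, labels = trial, new_cost, new_labels
    return points[medoids], labels
```

Because the medoids are actual data points, a single extreme outlier cannot drag a cluster centre away as it can with a mean.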

Partitioning Around Medoids (PAM) was one of the first k-medoids algorithms of this kind. It attempts to determine k partitions for n objects: after an initial random selection of k medoids, the algorithm repeatedly tries to make a better choice of medoids.

IV. RESULT

To verify the proposed segmentation method, experiments were performed on images of different complexity. As examples, the figures give four sets of segmented colour images. Among the test images, Figure 4.4 is an aerial image and Figure 4.3 is corrupted with Gaussian noise. Our method is not sensitive to noise and is effective for grey-scale images as well.
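As an illustration of how clustering produces a segmented image, the pixel colours can be clustered and each pixel replaced by its cluster centre. The sketch below uses k-means for brevity (the paper's method substitutes k-medoids) and is an assumption-laden illustration, not the authors' code.

```python
import numpy as np

def segment_by_clustering(image, k, n_iter=20, seed=0):
    """Segment an RGB image by clustering its pixel colours.

    Each pixel is replaced by the centre of the colour cluster it
    belongs to, giving an image with at most k distinct colours.
    """
    h, w, c = image.shape
    pixels = image.reshape(-1, c).astype(float)
    rng = np.random.default_rng(seed)
    # Initial centres: k randomly chosen pixel colours.
    centres = pixels[rng.choice(len(pixels), size=k, replace=False)]
    for _ in range(n_iter):
        # Assign every pixel to its nearest colour centre.
        labels = np.linalg.norm(pixels[:, None] - centres[None], axis=2).argmin(axis=1)
        # Move each centre to the mean colour of its cluster.
        for j in range(k):
            if np.any(labels == j):
                centres[j] = pixels[labels == j].mean(axis=0)
    # Repaint each pixel with its cluster centre.
    return centres[labels].reshape(h, w, c).astype(image.dtype)
```

Increasing k (2, 4, 8 centroids in the figures below) yields progressively finer segmentations.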

Figure 4.1.1 (Original image)

Figure 4.1.2 (Segmented image using 2 centroids)

Figure 4.1.3 (Segmented image using 4 centroids)

Figure 4.1.4 (Segmented image using 8 centroids)


Figure 4.2.1 (Original image)

Figure 4.2.2 (Segmented image using 2 centroids)

Figure 4.2.3 (Segmented image using 4 centroids)

Figure 4.2.4 (Segmented image using 8 centroids)

Figure 4.3.1 (Original image)

Figure 4.3.2 (Segmented image using 2 centroids)

Figure 4.3.3 (Segmented image using 4 centroids)

Figure 4.3.4 (Segmented image using 8 centroids)

Figure 4.4.1 (Original image)

Figure 4.4.2 (Segmented image using 2 centroids)

Figure 4.4.3 (Segmented image using 4 centroids)

Figure 4.4.4 (Segmented image using 8 centroids)

V. CONCLUSION AND FUTURE WORK

In this paper we used the K-Medoids clustering technique to segment colour images. Figure 4.2 shows that the algorithm works for grey-scale images, and Figure 4.3 shows that it is also suitable for noisy images. The segmented images depend strongly on the number of segments, or centroids; our future work will therefore consider finding the optimal number of segments and providing more accurate centroids.

VI. REFERENCES

[1] S. Pradeesh Hosea, S. Ranichandra, and T. K. P. Rajagopal, "Color Image Segmentation – An Approach," vol. 2, no. 3, March 2011.

[2] N. Senthilkumaran and R. Rajesh, "A Note on Image Segmentation Techniques," International J. of Recent Trends in Engineering and Technology, vol. 3, no. 2, May 2010.

[3] Wen Gao, Wei Zeng, and Qixiang, "Color Image Segmentation Using Density Based Clustering," IEEE, 2003.

[4] T. Velmurugan and T. Santhanam, "Computational Complexity between K-Means and K-Medoids Clustering Algorithms," Journal of Computer Science, vol. 6, no. 3, 2010.

[5] Catherine A. Sugar and Gareth M. James, "Finding the Number of Clusters in a Data Set: An Information Theoretic Approach".

[6] Siddheswar Ray and Rose H. Turi, "Determination of Number of Clusters in K-Means Clustering and Application in Colour Image Segmentation".

[7] Osama Abu Abbas, "Comparisons Between Data Clustering Algorithms," The International Arab Journal of Information Technology, vol. 5, no. 3, p. 320, July 2008.

[8] Hae-Sang Park, Jong-Seok Lee, and Chi-Hyuck Jun, "K-means-like Algorithm for K-medoids Clustering and Its Performance".

[9] N. Senthilkumaran and R. Rajesh, "A Note on Image Segmentation Techniques," International J. of Recent Trends in Engineering and Technology, vol. 3, no. 2, May 2010.

[10] Krishna Kant Singh and Akansha Singh, "A Study Of Image Segmentation Algorithms For Different Types of Images," IJCSI International Journal of Computer Science Issues, vol. 7, no. 5, September 2010.
