the use of k-means and kohonen self organizing maps to classify cotton bales

27
1 THE USE OF K-MEANS AND THE USE OF K-MEANS AND KOHONEN SELF KOHONEN SELF ORGANIZING MAPS TO ORGANIZING MAPS TO CLASSIFY COTTON BALES CLASSIFY COTTON BALES J I Mwasiagi, Huang J I Mwasiagi, Huang XiuBao, Wang XinHou & XiuBao, Wang XinHou & Chen Qing-Dong Chen Qing-Dong

Upload: jirair

Post on 04-Feb-2016

50 views

Category:

Documents


0 download

DESCRIPTION

THE USE OF K-MEANS AND KOHONEN SELF ORGANIZING MAPS TO CLASSIFY COTTON BALES. J I Mwasiagi, Huang XiuBao, Wang XinHou & Chen Qing-Dong. Introduction. - PowerPoint PPT Presentation

TRANSCRIPT

Page 1: THE USE OF K-MEANS AND          KOHONEN SELF ORGANIZING MAPS TO CLASSIFY COTTON BALES

11

THE USE OF K-MEANS AND THE USE OF K-MEANS AND KOHONEN SELF ORGANIZING KOHONEN SELF ORGANIZING MAPS TO CLASSIFY COTTON MAPS TO CLASSIFY COTTON

BALESBALES

J I Mwasiagi, Huang XiuBao, J I Mwasiagi, Huang XiuBao, Wang XinHou & Wang XinHou &

Chen Qing-DongChen Qing-Dong

Page 2: THE USE OF K-MEANS AND          KOHONEN SELF ORGANIZING MAPS TO CLASSIFY COTTON BALES

22

IntroductionIntroduction Cotton yarn characteristics such as length, Cotton yarn characteristics such as length,

micronaire, uniformity, short fiber index, micronaire, uniformity, short fiber index, spinning consistency, color and trash spinning consistency, color and trash measurements can now be measured measurements can now be measured speedily using High Volume Instrument speedily using High Volume Instrument (HVI) (HVI)

To properly utilize the high dimensional To properly utilize the high dimensional data obtained from HVI system, clustering data obtained from HVI system, clustering techniques and other data analysis techniques and other data analysis techniques can be used to describe the techniques can be used to describe the datadata

Page 3: THE USE OF K-MEANS AND          KOHONEN SELF ORGANIZING MAPS TO CLASSIFY COTTON BALES

33

Kohonen Self Organizing Kohonen Self Organizing MapsMaps

   Kohonen Self Organizing Maps (SOM) Kohonen Self Organizing Maps (SOM) whose architecture is given in figure 1, whose architecture is given in figure 1, learn to recognize groups of similar input learn to recognize groups of similar input vectors in such a way that neurons vectors in such a way that neurons physically close together in the neuron physically close together in the neuron layer respond to similar input vectors layer respond to similar input vectors

Page 4: THE USE OF K-MEANS AND          KOHONEN SELF ORGANIZING MAPS TO CLASSIFY COTTON BALES

44

The architecture of SOMThe architecture of SOM

Page 5: THE USE OF K-MEANS AND          KOHONEN SELF ORGANIZING MAPS TO CLASSIFY COTTON BALES

55

SOMSOM

SOM learn both the distribution and topology of SOM learn both the distribution and topology of the input vectors they are trained on. The the input vectors they are trained on. The neurons in the layer of a SOM are arranged in neurons in the layer of a SOM are arranged in physical positions according to given topology physical positions according to given topology and distance functionsand distance functions

The quality of the SOM maps can be checked by The quality of the SOM maps can be checked by using two factors: data representation accuracy using two factors: data representation accuracy and data set topology representation accuracy. and data set topology representation accuracy.

Page 6: THE USE OF K-MEANS AND          KOHONEN SELF ORGANIZING MAPS TO CLASSIFY COTTON BALES

66

K-means clustering algorithmK-means clustering algorithmClustering algorithms attempt to organize Clustering algorithms attempt to organize

unlabeled feature vectors into clusters or unlabeled feature vectors into clusters or natural groupings natural groupings

K means clustering algorithm constructs k K means clustering algorithm constructs k (k fixed a (k fixed a prioripriori) clusters for a given data ) clusters for a given data set. set.

The K-means algorithm defines k centers The K-means algorithm defines k centers one for each cluster and hence has k one for each cluster and hence has k groupsgroups

Page 7: THE USE OF K-MEANS AND          KOHONEN SELF ORGANIZING MAPS TO CLASSIFY COTTON BALES

77

K-means clustering algorithmK-means clustering algorithm

Place Place kk points into the space represented by the points into the space represented by the objects that are being clustered. These points objects that are being clustered. These points represent initial group centers.represent initial group centers.

Assign each object to the group whose center is Assign each object to the group whose center is closest.closest.

When all objects have been assigned, When all objects have been assigned, recalculate the positions of the recalculate the positions of the kk centers. centers.

Repeat the second and third steps until the Repeat the second and third steps until the centers no longer move. This produces a centers no longer move. This produces a separation of the objects into groups from which separation of the objects into groups from which the metric to be minimized can be calculatedthe metric to be minimized can be calculated ..

Page 8: THE USE OF K-MEANS AND          KOHONEN SELF ORGANIZING MAPS TO CLASSIFY COTTON BALES

88

Quality of K-means clusteringQuality of K-means clustering

Davies Bouldin (DB) index Davies Bouldin (DB) index Silhouette meansSilhouette means

Page 9: THE USE OF K-MEANS AND          KOHONEN SELF ORGANIZING MAPS TO CLASSIFY COTTON BALES

99

Statistical TechniquesStatistical TechniquesGiven a sample of data set for one Given a sample of data set for one

attribute: attribute: Descriptive measures such as measure of Descriptive measures such as measure of

tendency and measure of dispersion tendency and measure of dispersion The results of the above analysis can be The results of the above analysis can be

displayed using two dimensional graphsdisplayed using two dimensional graphs

For data with many attributes (High For data with many attributes (High Dimensional data)Dimensional data)

Principal components analysis technique Principal components analysis technique

Page 10: THE USE OF K-MEANS AND          KOHONEN SELF ORGANIZING MAPS TO CLASSIFY COTTON BALES

1010

Materials and MethodsMaterials and Methods  

Materials Materials HVI data obtained from Shanghai HVI data obtained from Shanghai

Inspection Center of Industrial Inspection Center of Industrial Products and Materials (SICIPM)Products and Materials (SICIPM)

The total data sample had 2421 cotton The total data sample had 2421 cotton bales and 13 cotton 13 cotton lint bales and 13 cotton 13 cotton lint characteristics were measured for each characteristics were measured for each bale.bale.

Data 2421x13 (High dimensional)Data 2421x13 (High dimensional)

Page 11: THE USE OF K-MEANS AND          KOHONEN SELF ORGANIZING MAPS TO CLASSIFY COTTON BALES

1111

HVI CharacteristicsHVI Characteristics

13 cotton HVI characteristics:13 cotton HVI characteristics:

micronaire (Mic), maturity (Mat), length micronaire (Mic), maturity (Mat), length (Len), elongation (Elg), strength (Str), (Len), elongation (Elg), strength (Str), Short Fiber Index (Sfi), length uniformity Short Fiber Index (Sfi), length uniformity (Unf), Spinning Consistency Index (Sci), (Unf), Spinning Consistency Index (Sci), reflectance (Rd), yellowness (+b), trash reflectance (Rd), yellowness (+b), trash cent (Trc), trash area (Tra) and trash cent (Trc), trash area (Tra) and trash grade (Trg). grade (Trg).

Page 12: THE USE OF K-MEANS AND          KOHONEN SELF ORGANIZING MAPS TO CLASSIFY COTTON BALES

1212

MethodsMethods (i)(i)  Use SOM data visualization technique Use SOM data visualization technique

to get an idea of the nature of clustering to get an idea of the nature of clustering within HVI data,within HVI data,

(ii) Partition the HVI data using K-means (ii) Partition the HVI data using K-means technique. The value of k should be technique. The value of k should be obtained from (i) above, and obtained from (i) above, and

(iii) Check for data group compactness (iii) Check for data group compactness using other methods and techniques such using other methods and techniques such as silhouette means, coefficient of as silhouette means, coefficient of variation and principal component variation and principal component analysis. analysis.

Page 13: THE USE OF K-MEANS AND          KOHONEN SELF ORGANIZING MAPS TO CLASSIFY COTTON BALES

1313

MethodsMethods(i)(i)   (i)    (i) Use SOM data visualization technique Use SOM data visualization technique

to get an idea of the nature of clustering to get an idea of the nature of clustering within HVI data,within HVI data,

(ii) (ii) (ii) (ii) Partition the HVI data using K-means Partition the HVI data using K-means technique. The value of k should be technique. The value of k should be obtained from (i) above, and obtained from (i) above, and

( ( (iii) (iii) Check for data group compactness Check for data group compactness using other methods and techniques such using other methods and techniques such as silhouette means, coefficient of as silhouette means, coefficient of variation and principal component variation and principal component analysis. analysis.

Page 14: THE USE OF K-MEANS AND          KOHONEN SELF ORGANIZING MAPS TO CLASSIFY COTTON BALES

1414

Result and DiscussionResult and Discussion

Data VisualizationData Visualization SOM reduced the 2421x13 data into SOM reduced the 2421x13 data into

18x13 grids, with a quantization error of 18x13 grids, with a quantization error of 1.879 and topographic error of 0.083. 1.879 and topographic error of 0.083.

The number of hits for each node of the The number of hits for each node of the SOM map is given in figure 2, which SOM map is given in figure 2, which shows that there are multiple hits in 231 shows that there are multiple hits in 231 out of the 234 nodes of the visualization out of the 234 nodes of the visualization grids. grids.

Page 15: THE USE OF K-MEANS AND          KOHONEN SELF ORGANIZING MAPS TO CLASSIFY COTTON BALES

1515

No. of hits for SOM nodesNo. of hits for SOM nodes

Page 16: THE USE OF K-MEANS AND          KOHONEN SELF ORGANIZING MAPS TO CLASSIFY COTTON BALES

1616

Data ClusteringData Clustering

Optimum clusters is 19 (fig. 3)Optimum clusters is 19 (fig. 3)The means and the Coefficient of variation The means and the Coefficient of variation

(CV) for the groups are given in Tables 1 (CV) for the groups are given in Tables 1 and 2. and 2.

The CVs for the 2The CVs for the 2ndnd, 7, 7thth and 17 and 17thth group are group are higher than that of the main group (MG) higher than that of the main group (MG) for some attributes. for some attributes.

16 groups are properly partitioned16 groups are properly partitioned

Page 17: THE USE OF K-MEANS AND          KOHONEN SELF ORGANIZING MAPS TO CLASSIFY COTTON BALES

1717

Figure 3Figure 3

Page 18: THE USE OF K-MEANS AND          KOHONEN SELF ORGANIZING MAPS TO CLASSIFY COTTON BALES

1818

Table 1Table 1

Page 19: THE USE OF K-MEANS AND          KOHONEN SELF ORGANIZING MAPS TO CLASSIFY COTTON BALES

1919

Table 2Table 2

Page 20: THE USE OF K-MEANS AND          KOHONEN SELF ORGANIZING MAPS TO CLASSIFY COTTON BALES

2020

The Re-combined Subgroup The Re-combined Subgroup (RSG)(RSG)

22ndnd, 7, 7thth and 17 and 17thth groups re-combined and groups re-combined and designated as RSGdesignated as RSG

Visualization and the hits for RSG is given Visualization and the hits for RSG is given in figure 4in figure 4

Level of grouping is very low with only five Level of grouping is very low with only five out of the 66 nodes having hits of more out of the 66 nodes having hits of more than 5, and about 44% of all the nodes than 5, and about 44% of all the nodes have one (1) or zero hits. have one (1) or zero hits.

  

Page 21: THE USE OF K-MEANS AND          KOHONEN SELF ORGANIZING MAPS TO CLASSIFY COTTON BALES

2121

Visualization of RSGVisualization of RSG

Page 22: THE USE OF K-MEANS AND          KOHONEN SELF ORGANIZING MAPS TO CLASSIFY COTTON BALES

2222

Optimum clusters for RSGOptimum clusters for RSG

The silhouette means for the RSG is given The silhouette means for the RSG is given in figure 5. The optimum clustering is 3, in figure 5. The optimum clustering is 3, which was the initial number of groups in which was the initial number of groups in RSG before re-combining.RSG before re-combining.

Checking out the principal components of the Checking out the principal components of the attributes (Table 3), shows that 97. 87% of attributes (Table 3), shows that 97. 87% of the variance is accounted for by the first two the variance is accounted for by the first two principal components. principal components.

Page 23: THE USE OF K-MEANS AND          KOHONEN SELF ORGANIZING MAPS TO CLASSIFY COTTON BALES

2323

Clustering of RSGClustering of RSG

Fig. 5: Clustering the RSG

0.52

0.56

0.6

0.64

0.68

2 4 6 8 10 12

No. of groups in the data

sill

houette m

eans

Page 24: THE USE OF K-MEANS AND          KOHONEN SELF ORGANIZING MAPS TO CLASSIFY COTTON BALES

2424

Table 3Table 3

Page 25: THE USE OF K-MEANS AND          KOHONEN SELF ORGANIZING MAPS TO CLASSIFY COTTON BALES

2525

Scatter plot for RSGScatter plot for RSG

B and C are widely B and C are widely scattered and can be scattered and can be subdivided into two subdivided into two each-hence RSG has each-hence RSG has five groupsfive groups

Still the five groups do Still the five groups do not show a sense of not show a sense of compactness compactness

RSG is just a RSG is just a collection of outlierscollection of outliers

Page 26: THE USE OF K-MEANS AND          KOHONEN SELF ORGANIZING MAPS TO CLASSIFY COTTON BALES

2626

ConclusionConclusion

A cotton bale classification model has been A cotton bale classification model has been proposed and used to classify 2421 cotton balesproposed and used to classify 2421 cotton bales

The model reduced the 2421x13 HVI high The model reduced the 2421x13 HVI high dimensional data into 18x13 grids, with a dimensional data into 18x13 grids, with a quantization error of 1.879 and a topographic error quantization error of 1.879 and a topographic error of 0.083, and initially identified 19 of 0.083, and initially identified 19 groups in the groups in the data. data.

Three of the groups containing a total of 144 data Three of the groups containing a total of 144 data failed the compactness test. The final classification failed the compactness test. The final classification of the 2421 cotton bales contains 16 groups of of the 2421 cotton bales contains 16 groups of cotton bales having a total of 2277 bales and five cotton bales having a total of 2277 bales and five sets of 144 bales containing outliers. sets of 144 bales containing outliers.

Page 27: THE USE OF K-MEANS AND          KOHONEN SELF ORGANIZING MAPS TO CLASSIFY COTTON BALES

2727

Thanks !Thanks !

Q and AQ and A