

Combining Self-Organizing Map and clustering algorithms in industrial data analysis

Luca CANETTA
Naoufel CHEIKHROUHOU, Rémy GLARDON

    Laboratory for Production Management and Processes

    Institute for Production and Robotics

    Swiss Federal Institute of Technology at Lausanne


    Table of contents

Introduction

Motivation

    Clustering algorithms

Self-Organizing Map (SOM)

New hybrid two-stage approach

    Reference data sets analysis

    Industrial data analysis

    Conclusion


Introduction

Clustering definition (Everitt, 1993): given a collection of n objects (individuals, animals, plants, etc.), each described by a set of p characteristics or variables ($x_i \in \mathbb{R}^p$), derive a useful division into a number of classes (K) on the basis of their degree of similarity. Both the number of classes and the properties of the classes are to be determined.

Clustering quality measures:
- High intra-cluster similarity (homogeneous clusters)
- Low inter-cluster similarity (well-separated clusters)



[Figure: example of compact, well-separated clusters (four clusters labelled 1-4)]


    Motivation

Challenge:
- Database sizes and numbers
- Data complexity and heterogeneity
- Time responsiveness

Objective:
- Development of an effective and efficient clustering method


    Clustering algorithms

Hierarchical (Everitt, 1993; Mangiameli et al., 1996):
- Rigid
- Complexity O(n²)
- Several methods (Single, Average, Complete, Ward)

Partitioning (Hartigan, 1975; Fung, 2001):
- Complexity O(n)
- Fixed number of clusters (K)
- The choice of cluster seeds (barycentres) influences clustering performance
- K-means

Traditional two-stage approaches (Punj and Stewart, 1983):
- Hierarchical (Average/Ward) + partitioning (K-means)
- Combined complexity (see the sketch below)
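As a point of comparison for the hybrid approach introduced later, here is a minimal sketch of such a traditional two-stage chain, assuming scikit-learn is available; the function name `two_stage_ward_kmeans` and the random demo data are illustrative, not part of the original study.

```python
# Minimal sketch of a traditional two-stage chain (hierarchical + K-means).
import numpy as np
from sklearn.cluster import AgglomerativeClustering, KMeans

def two_stage_ward_kmeans(X, k):
    """Stage 1: Ward hierarchical clustering provides initial barycentres.
    Stage 2: K-means refines the partition starting from those seeds."""
    ward_labels = AgglomerativeClustering(n_clusters=k, linkage="ward").fit_predict(X)
    seeds = np.vstack([X[ward_labels == c].mean(axis=0) for c in range(k)])
    km = KMeans(n_clusters=k, init=seeds, n_init=1).fit(X)
    return km.labels_

# Example on random data: 300 objects described by 4 variables, grouped into 3 classes.
X = np.random.rand(300, 4)
labels = two_stage_ward_kmeans(X, k=3)
```

Because the hierarchical stage still runs on all n objects, the O(n²) cost remains; this is precisely what the SOM pre-treatment described later avoids.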


    Self-Organizing Map (SOM)

[Figure: SOM architecture, with input data feeding an input layer connected to an output layer of neurons]

    Neural Network technique (Kohonen, 1995)

    Two (or more) layers

Learning process (sketched below):
- Find the closest (most similar) neuron to the input data
- Update the winning neuron and its neighbours
- Analyse another input data point

Properties:
- Reduced sensitivity to noise (outliers)
- Visualisation of the input data topology (2-D/3-D feature map)
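A minimal NumPy sketch of the learning process listed above; the grid size, learning-rate and neighbourhood schedules are illustrative assumptions rather than the settings used in the presentation.

```python
import numpy as np

def train_som(X, grid=(10, 10), n_iter=5000, lr0=0.5, sigma0=3.0, seed=0):
    rng = np.random.default_rng(seed)
    rows, cols = grid
    n, p = X.shape
    W = rng.random((rows * cols, p))                     # neuron prototype vectors
    coords = np.array([(r, c) for r in range(rows) for c in range(cols)], dtype=float)
    for t in range(n_iter):
        x = X[rng.integers(n)]                           # analyse one input vector
        bmu = np.argmin(((W - x) ** 2).sum(axis=1))      # winning (closest) neuron
        lr = lr0 * np.exp(-t / n_iter)                   # decaying learning rate
        sigma = sigma0 * np.exp(-t / n_iter)             # shrinking neighbourhood
        d2 = ((coords - coords[bmu]) ** 2).sum(axis=1)   # grid distance to the winner
        h = np.exp(-d2 / (2 * sigma ** 2))               # neighbourhood function
        W += (lr * h)[:, None] * (x - W)                 # move winner and neighbours toward x
    return W, coords

prototypes, grid_positions = train_som(np.random.rand(1000, 4))
```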


New hybrid two-stage approaches

Objectives:
- Increasing clustering robustness
- Decreasing the impact of outliers
- Improving computational efficiency
- Improving assignment quality

Process description (see the sketch below):
input data [n × p] → SOM → neuron prototype vectors [m × p] (with m much smaller than n) → clustering algorithm (K-means, Average, Ward, Single or Complete) → K clusters
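A sketch of the two-stage process described above, assuming the third-party MiniSom package for the SOM stage and scikit-learn for the second stage; the map size, training length and the helper name `som_kmeans` are illustrative assumptions.

```python
import numpy as np
from minisom import MiniSom
from sklearn.cluster import KMeans

def som_kmeans(X, k, grid=(10, 10), n_iter=5000):
    # Stage 1: train a SOM; its m = grid[0] * grid[1] prototype vectors summarise the n inputs.
    som = MiniSom(grid[0], grid[1], X.shape[1], sigma=1.0, learning_rate=0.5, random_seed=0)
    som.train_random(X, n_iter)
    prototypes = som.get_weights().reshape(-1, X.shape[1])

    # Stage 2: cluster the prototype vectors instead of the raw data.
    km = KMeans(n_clusters=k, n_init=10).fit(prototypes)

    # Assign each original object to the cluster of its winning neuron (BMU).
    bmu_index = [np.ravel_multi_index(som.winner(x), grid) for x in X]
    return km.labels_[bmu_index]

labels = som_kmeans(np.random.rand(1000, 7), k=4)
```

Because the second-stage algorithm only sees the m prototype vectors instead of the n original objects, even an O(n²) hierarchical method becomes affordable, which is the efficiency argument made in the conclusion.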


    Approach features

    Similarity measure: Euclidean Distance

Data standardisation (min-max):
$$W_{il} = \frac{x_{il} - \min_j x_{jl}}{\max_j x_{jl} - \min_j x_{jl}}, \qquad i, j = 1, \dots, n, \quad l = 1, \dots, p, \quad x_i \in \mathbb{R}^p$$

Clustering validation: Lbler index
$$g(K) = \frac{h_e}{h_o} = \frac{\text{heterogeneity between groups}}{\text{homogeneity within groups}}, \qquad K_{opt} = \max_K \bigl( g(K+1) - g(K) \bigr)$$
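A sketch of these approach features in Python; the min-max standardisation follows the formula above, while the validation index is reconstructed here as a between-group/within-group inertia ratio and the K selection as the largest jump in g(K), both of which are assumptions rather than the exact published definition of the index.

```python
import numpy as np
from sklearn.cluster import KMeans

def minmax_standardise(X):
    # W_il = (x_il - min_j x_jl) / (max_j x_jl - min_j x_jl)
    xmin, xmax = X.min(axis=0), X.max(axis=0)
    return (X - xmin) / (xmax - xmin)

def g(X, labels):
    # heterogeneity between groups / homogeneity within groups
    centre = X.mean(axis=0)
    between = sum((labels == c).sum() * ((X[labels == c].mean(axis=0) - centre) ** 2).sum()
                  for c in np.unique(labels))
    within = sum(((X[labels == c] - X[labels == c].mean(axis=0)) ** 2).sum()
                 for c in np.unique(labels))
    return between / within

W = minmax_standardise(np.random.rand(500, 7))
scores = {k: g(W, KMeans(n_clusters=k, n_init=10).fit_predict(W)) for k in range(2, 9)}
k_opt = max(range(2, 8), key=lambda k: scores[k + 1] - scores[k])   # largest jump in g(K)
```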


Iris data set analysis
- 150 instances, 3 clusters (equal sizes)
- 4 dimensions, no overlapping clusters


- Only the approaches using Single linkage are not robust (they fail to determine the correct number of clusters).
- K-means assignment quality benefits the most from the SOM data pre-treatment.
- The hybrid two-stage approach SOM + K-means is the best-performing method.


Abalone data set analysis
- 4177 instances, 3 clusters
- 8 dimensions, overlapping clusters


- Hierarchical algorithms, if not preceded by SOM, are not robust.
- Average, Single, SOM + Single and SOM + Average have unsatisfactory assignment quality.
- SOM + Single, SOM + Average and all hierarchical algorithms will be disregarded.



- SOM + Ward is not robust.
- SOM + K-means is the best-performing method.
- SOM + hierarchical methods have poor assignment quality.


    Industrial data analysis

1020 instances, 7 dimensions:
1. Commonality
2. Average delivery time
3. Variation coefficient (VC) of the delivery time
4. Average monthly quantity
5. Variation coefficient of the monthly quantity
6. Unit price
7. Utilisation frequency
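An illustrative sketch of how such a 7-dimensional description could be assembled from order records, assuming pandas; the DataFrame columns and the tiny example data are hypothetical and are not the fields of the actual industrial database.

```python
import pandas as pd

orders = pd.DataFrame({
    "component":        ["c1", "c1", "c1", "c2", "c2"],
    "month":            [1, 2, 3, 1, 3],
    "delivery_time":    [10, 12, 8, 30, 25],
    "quantity":         [100, 120, 90, 10, 15],
    "unit_price":       [2.5, 2.5, 2.5, 40.0, 40.0],
    "n_products_using": [5, 5, 5, 1, 1],
})

def describe(group):
    vc = lambda s: s.std(ddof=0) / s.mean()            # variation coefficient
    return pd.Series({
        "commonality":       group["n_products_using"].iloc[0],
        "avg_delivery_time": group["delivery_time"].mean(),
        "vc_delivery_time":  vc(group["delivery_time"]),
        "avg_monthly_qty":   group.groupby("month")["quantity"].sum().mean(),
        "vc_monthly_qty":    vc(group.groupby("month")["quantity"].sum()),
        "unit_price":        group["unit_price"].iloc[0],
        "utilisation_freq":  group["month"].nunique(),
    })

features = orders.groupby("component").apply(describe)   # one 7-D row per component
```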

    SOM + K-means method

[Plot: Lbler index g(K) for the Wilt data over 2 to 8 clusters; 4 clusters selected]


Wilt data: description of the purchasing and utilisation characteristics of industrial components


Industrial data characterisation

Cluster A (416):
- Frequent use
- Lowest delivery time
- Lowest VC of delivery time

Cluster B (200):
- Highest price
- Lowest quantity

Cluster C (108):
- Lowest commonality
- Highest VC of quantity

Cluster D (296):
- Lowest price
- Highest commonality
- Highest frequency
- Highest quantity
- Highest delivery time
- Highest VC of delivery time


    Conclusion

Compared to traditional clustering algorithms, the new hybrid two-stage (SOM + clustering) methods have:
- Better robustness
- Better efficiency
  - SOM + Average takes 8 times less time than Average
  - SOM + K-means takes 3.6 times less time than K-means
  - The K-means algorithm converges faster when preceded by SOM
- Comparable assignment quality

SOM + K-means turns out to be the most robust and best-performing method.

Strategic products management


    SOM learning process

- An input $x_i$ is analysed and the winning neuron $j^*$ is selected.
- The weights of the winning neuron and its close neighbours ($j \in N_{j^*}(d)$) move toward $x_i$ according to the Kohonen rule:
  $$w_j(t) = w_j(t-1) + \alpha(d, t)\,\bigl(x_i(t) - w_j(t-1)\bigr)$$
- The training step is increased ($t = t + 1$) and another input $x_i$ is analysed.

[Figure: the winning neuron and its neighbours moving toward the input $x_i$]
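A small numeric illustration of the Kohonen rule above; the Gaussian form of the neighbourhood factor α(d, t) and its parameters are illustrative assumptions.

```python
import numpy as np

def kohonen_step(w_prev, x, d, t, lr0=0.5, tau=1000.0, sigma=1.5):
    """One update: w_j(t) = w_j(t-1) + alpha(d, t) * (x_i(t) - w_j(t-1)),
    where d is the grid distance between neuron j and the winning neuron."""
    alpha = lr0 * np.exp(-t / tau) * np.exp(-d ** 2 / (2 * sigma ** 2))
    return w_prev + alpha * (x - w_prev)

w = np.array([0.2, 0.8])
x = np.array([0.6, 0.4])
print(kohonen_step(w, x, d=0, t=0))   # the winner moves strongly toward x: [0.4, 0.6]
print(kohonen_step(w, x, d=3, t=0))   # a neighbour three grid steps away moves only slightly
```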


    Comparison results

    Clustering quality for the Iris data set

Correctly classified data (%):

                            K-means    Ward    Average    Complete    Single
Traditional clustering        88       87.33     88          83.33     67.33
SOM alone                     84.67
SOM two-stage clustering      92.67    88        82.67       88.87     66.67

Robustness: 6/6  0/6  6/6  6/6  0/6


    Comparison results

    Clustering quality for a subset of the Abalone data set

Correctly classified data (%):

                            K-means    Ward    Average    Complete    Single
Traditional clustering        51.10    40.26     48.61       51.09     37.22
SOM alone                     51.22
SOM two-stage clustering      51.22    52.70     37.00       52.39     37.00

Robustness: 6/6  0/6  6/6  6/6  0/6


    Comparison results

    Clustering quality for the entire Abalone data set

                        Correctly classified data (%)    Robustness
Traditional K-means                50.30                    6/6
SOM alone                          52.00                    6/6
SOM + K-means                      51.45                    6/6
SOM + Ward                         45.39                    0/6
SOM + Complete                     49.68                    6/6


    Industrial data clustering

Cluster A (416):
- Used frequently
- Shortest delivery time
- Least variable delivery time

Cluster B (200):
- Expensive items
- Smallest quantity

Cluster C (108):
- Lowest commonality
- Highest VC of quantity

Cluster D (296):
- Lowest unit price
- Highest commonality, frequency and quantity
- Highest and most variable delivery time


Part-Machine matrix (Burbidge, 1971)
Machines (rows) A-P; parts (columns): 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41

    A X X

    B X X X X X X X X

    C X X X X X

    D X X X X X X X

    E X X X X X X X X X X X X

    F X X X X X X X X X X X X X X X X X X

    G X X X

    H X X X X X X X X X X X X X X X X X X X

    I X X X X X X X X X X

    J X X X X X X X

    K X X X X X X

    L X X X X X

    M X X

    N X X X X

    O X X X X X X

    P X X X X X X X X

43 parts being manufactured using 16 machines (A-P)
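To make the link to the clustering machinery explicit, the following toy sketch encodes each part as a binary vector over the machines it visits and groups the parts; the miniature incidence matrix is invented for illustration (it is not Burbidge's data), and plain K-means stands in for the full SOM + K-means chain used in the presentation.

```python
import numpy as np
from sklearn.cluster import KMeans

machines = list("ABCDEF")
incidence = {                      # part -> machines it requires
    1: "AB", 2: "ABC", 3: "AC",
    4: "DE", 5: "DEF", 6: "EF",
}
X = np.array([[1 if m in used else 0 for m in machines]
              for used in incidence.values()])

labels = KMeans(n_clusters=2, n_init=10).fit_predict(X)
for part, lab in zip(incidence, labels):
    print(f"part {part} -> cell {lab}")   # parts sharing machines land in the same cell
```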


SOM + K-means: part grouping

3-cluster solution (part numbers):
- 5; 8; 9; 14; 15; 16; 19; 21; 23; 29; 33; 41; 43
- 2; 4; 7; 10; 17; 18; 28; 32; 37; 38; 40; 42
- 1; 3; 6; 11; 12; 13; 20; 22; 24; 25; 26; 27; 30; 31; 34; 35; 36; 39

5-cluster solution (part numbers):
- Cluster A: 5; 8; 9; 14; 15; 16; 19; 21; 23; 29; 33; 41; 43
- Cluster B: 2; 4; 10; 18; 28; 32; 37; 38; 40; 42
- Cluster C: 3; 11; 20; 22; 24; 27; 30
- Cluster D: 6; 7; 17; 34; 35; 36
- Cluster E: 1; 12; 13; 25; 26; 31; 39

Completing each part entirely within a single manufacturing cell can require including the same machine in more than one cell.


SOM + K-means: part grouping, Cluster C
Machine columns: D E O H F A B I P C G J K L M N

    1 X X X X

    3 X X X

    6 X X

    11 X X

    12 X X X

    13 X X X

    17 X X X

    20 X X

    22 X

    24 X X X X

    25 X X

    26 X

    27 X X X

    30 X X

    31 X X

    34 X X

    35 X X

    36 X

    39 X X


SOM + K-means: part grouping, Clusters A & B
Machine columns: D E O H F A B I P C G J K L M N

    5 X X X

    8 X X X

    9 X X X X

    14 X X X X

    15 X X

    16 X

    19 X X X X X

    21 X X X X

    23 X X X X

    29 X X

    33 X X X

    41 X X X

    43 X X X X

    2 X X X X X X

    4 X

    7 X X X

    10 X X X

    18 X X

    28 X X X

    32 X X X X

    37 X X X X X X

    38 X X X X

    40 X X X