

Combining Self-Organizing Map and clustering algorithms in industrial data analysis

Luca CANETTA
Naoufel CHEIKHROUHOU, Rémy GLARDON

    Laboratory for Production Management and Processes

    Institute for Production and Robotics

    Swiss Federal Institute of Technology at Lausanne


    Table of contents

Introduction

Motivation

    Clustering algorithms

Self-Organizing Map (SOM)

New hybrid two-stage approach

    Reference data sets analysis

    Industrial data analysis

    Conclusion


Introduction

Clustering definition (Everitt, 1993): given a collection of n objects (individuals, animals, plants, etc.), each described by a set of p characteristics or variables ($x_i \in \mathbb{R}^p$), derive a useful division into a number of classes (K) on the basis of their degree of similarity. Both the number of classes and the properties of the classes are to be determined.

Clustering quality measures:
- High intra-cluster similarity (homogeneous clusters)
- Low inter-cluster similarity (well-separated clusters)



[Figure: example of compact, well-separated clusters (four clusters labelled 1-4)]


    Motivation

Challenge:
- Database sizes and numbers
- Data complexity and heterogeneity
- Time responsiveness

Objective:
- Development of an effective and efficient clustering method


    Clustering algorithms

Hierarchical (Everitt, 1993; Mangiameli et al., 1996):
- Rigid
- Complexity O(n²)
- Several methods (Single, Average, Complete, Ward)

Partitioning (Hartigan, 1975; Fung, 2001):
- Complexity O(n)
- Fixed number of clusters (K)
- The choice of cluster seeds (barycentres) influences clustering performance
- K-means

Traditional two-stage approaches (Punj and Stewart, 1983):
- Hierarchical (Average/Ward) + partitioning (K-means)
- Combined complexity (see the sketch below)
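As a point of comparison for the hybrid approach introduced later, here is a minimal sketch of such a traditional two-stage chain, assuming scikit-learn is available; the function name `two_stage_ward_kmeans` and the random demo data are illustrative, not part of the original study.

```python
# Minimal sketch of a traditional two-stage chain (hierarchical + K-means).
import numpy as np
from sklearn.cluster import AgglomerativeClustering, KMeans

def two_stage_ward_kmeans(X, k):
    """Stage 1: Ward hierarchical clustering provides initial barycentres.
    Stage 2: K-means refines the partition starting from those seeds."""
    ward_labels = AgglomerativeClustering(n_clusters=k, linkage="ward").fit_predict(X)
    seeds = np.vstack([X[ward_labels == c].mean(axis=0) for c in range(k)])
    km = KMeans(n_clusters=k, init=seeds, n_init=1).fit(X)
    return km.labels_

# Example on random data: 300 objects described by 4 variables, grouped into 3 classes.
X = np.random.rand(300, 4)
labels = two_stage_ward_kmeans(X, k=3)
```

Because the hierarchical stage still runs on all n objects, the O(n²) cost remains; this is precisely what the SOM pre-treatment described later avoids.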


    Self-Organizing Map (SOM)

[Figure: SOM architecture, with input data feeding an input layer connected to an output layer of neurons]

    Neural Network technique (Kohonen, 1995)

    Two (or more) layers

Learning process (sketched below):
- Find the closest (most similar) neuron to the input data
- Update the winning neuron and its neighbours
- Analyse another input data point

Properties:
- Reduced sensitivity to noise (outliers)
- Visualisation of the input data topology (2-D/3-D feature map)
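A minimal NumPy sketch of the learning process listed above; the grid size, learning-rate and neighbourhood schedules are illustrative assumptions rather than the settings used in the presentation.

```python
import numpy as np

def train_som(X, grid=(10, 10), n_iter=5000, lr0=0.5, sigma0=3.0, seed=0):
    rng = np.random.default_rng(seed)
    rows, cols = grid
    n, p = X.shape
    W = rng.random((rows * cols, p))                     # neuron prototype vectors
    coords = np.array([(r, c) for r in range(rows) for c in range(cols)], dtype=float)
    for t in range(n_iter):
        x = X[rng.integers(n)]                           # analyse one input vector
        bmu = np.argmin(((W - x) ** 2).sum(axis=1))      # winning (closest) neuron
        lr = lr0 * np.exp(-t / n_iter)                   # decaying learning rate
        sigma = sigma0 * np.exp(-t / n_iter)             # shrinking neighbourhood
        d2 = ((coords - coords[bmu]) ** 2).sum(axis=1)   # grid distance to the winner
        h = np.exp(-d2 / (2 * sigma ** 2))               # neighbourhood function
        W += (lr * h)[:, None] * (x - W)                 # move winner and neighbours toward x
    return W, coords

prototypes, grid_positions = train_som(np.random.rand(1000, 4))
```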


New hybrid two-stage approaches

Objectives:
- Increasing clustering robustness
- Decreasing the impact of outliers
- Improving computational efficiency
- Improving assignment quality

Process description (see the sketch below):
input data [n × p] → SOM → neuron prototype vectors [m × p] (with m much smaller than n) → clustering algorithm (K-means, Average, Ward, Single or Complete) → K clusters
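A sketch of the two-stage process described above, assuming the third-party MiniSom package for the SOM stage and scikit-learn for the second stage; the map size, training length and the helper name `som_kmeans` are illustrative assumptions.

```python
import numpy as np
from minisom import MiniSom
from sklearn.cluster import KMeans

def som_kmeans(X, k, grid=(10, 10), n_iter=5000):
    # Stage 1: train a SOM; its m = grid[0] * grid[1] prototype vectors summarise the n inputs.
    som = MiniSom(grid[0], grid[1], X.shape[1], sigma=1.0, learning_rate=0.5, random_seed=0)
    som.train_random(X, n_iter)
    prototypes = som.get_weights().reshape(-1, X.shape[1])

    # Stage 2: cluster the prototype vectors instead of the raw data.
    km = KMeans(n_clusters=k, n_init=10).fit(prototypes)

    # Assign each original object to the cluster of its winning neuron (BMU).
    bmu_index = [np.ravel_multi_index(som.winner(x), grid) for x in X]
    return km.labels_[bmu_index]

labels = som_kmeans(np.random.rand(1000, 7), k=4)
```

Because the second-stage algorithm only sees the m prototype vectors instead of the n original objects, even an O(n²) hierarchical method becomes affordable, which is the efficiency argument made in the conclusion.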


    Approach features

    Similarity measure: Euclidean Distance

Data standardisation (min-max):
$$W_{il} = \frac{x_{il} - \min_j x_{jl}}{\max_j x_{jl} - \min_j x_{jl}}, \qquad i, j = 1, \dots, n, \quad l = 1, \dots, p, \quad x_i \in \mathbb{R}^p$$

Clustering validation: Lbler index
$$g(K) = \frac{h_e}{h_o} = \frac{\text{heterogeneity between groups}}{\text{homogeneity within groups}}, \qquad K_{opt} = \max_K \bigl( g(K+1) - g(K) \bigr)$$
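A sketch of these approach features in Python; the min-max standardisation follows the formula above, while the validation index is reconstructed here as a between-group/within-group inertia ratio and the K selection as the largest jump in g(K), both of which are assumptions rather than the exact published definition of the index.

```python
import numpy as np
from sklearn.cluster import KMeans

def minmax_standardise(X):
    # W_il = (x_il - min_j x_jl) / (max_j x_jl - min_j x_jl)
    xmin, xmax = X.min(axis=0), X.max(axis=0)
    return (X - xmin) / (xmax - xmin)

def g(X, labels):
    # heterogeneity between groups / homogeneity within groups
    centre = X.mean(axis=0)
    between = sum((labels == c).sum() * ((X[labels == c].mean(axis=0) - centre) ** 2).sum()
                  for c in np.unique(labels))
    within = sum(((X[labels == c] - X[labels == c].mean(axis=0)) ** 2).sum()
                 for c in np.unique(labels))
    return between / within

W = minmax_standardise(np.random.rand(500, 7))
scores = {k: g(W, KMeans(n_clusters=k, n_init=10).fit_predict(W)) for k in range(2, 9)}
k_opt = max(range(2, 8), key=lambda k: scores[k + 1] - scores[k])   # largest jump in g(K)
```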


Iris data set analysis
- 150 instances, 3 clusters (equal sizes)
- 4 dimensions, no overlapping clusters


- Only the approaches using Single linkage are not robust (they fail to determine the correct number of clusters).
- K-means assignment quality benefits the most from the SOM data pre-treatment.
- The hybrid two-stage approach SOM + K-means is the best-performing method.


Abalone data set analysis
- 4177 instances, 3 clusters
- 8 dimensions, overlapping clusters


- Hierarchical algorithms, if not preceded by SOM, are not robust.
- Average, Single, SOM + Single and SOM + Average have unsatisfactory assignment quality.
- SOM + Single, SOM + Average and all hierarchical algorithms will be disregarded.



- SOM + Ward is not robust.
- SOM + K-means is the best-performing method.
- SOM + hierarchical methods have poor assignment quality.


    Industrial data analysis

1020 instances, 7 dimensions:
1. Commonality
2. Average delivery time
3. Variation coefficient (VC) of the delivery time
4. Average monthly quantity
5. Variation coefficient of the monthly quantity
6. Unit price
7. Utilisation frequency
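An illustrative sketch of how such a 7-dimensional description could be assembled from order records, assuming pandas; the DataFrame columns and the tiny example data are hypothetical and are not the fields of the actual industrial database.

```python
import pandas as pd

orders = pd.DataFrame({
    "component":        ["c1", "c1", "c1", "c2", "c2"],
    "month":            [1, 2, 3, 1, 3],
    "delivery_time":    [10, 12, 8, 30, 25],
    "quantity":         [100, 120, 90, 10, 15],
    "unit_price":       [2.5, 2.5, 2.5, 40.0, 40.0],
    "n_products_using": [5, 5, 5, 1, 1],
})

def describe(group):
    vc = lambda s: s.std(ddof=0) / s.mean()            # variation coefficient
    return pd.Series({
        "commonality":       group["n_products_using"].iloc[0],
        "avg_delivery_time": group["delivery_time"].mean(),
        "vc_delivery_time":  vc(group["delivery_time"]),
        "avg_monthly_qty":   group.groupby("month")["quantity"].sum().mean(),
        "vc_monthly_qty":    vc(group.groupby("month")["quantity"].sum()),
        "unit_price":        group["unit_price"].iloc[0],
        "utilisation_freq":  group["month"].nunique(),
    })

features = orders.groupby("component").apply(describe)   # one 7-D row per component
```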

    SOM + K-means method

[Plot: Lbler index g(K) for the Wilt data over 2 to 8 clusters; 4 clusters selected]


Wilt data: description of the purchasing and utilisation characteristics of industrial components


Industrial data characterisation

Cluster A (416):
- Frequent use
- Lowest delivery time
- Lowest VC of delivery time

Cluster B (200):
- Highest price
- Lowest quantity

Cluster C (108):
- Lowest commonality
- Highest VC of quantity

Cluster D (296):
- Lowest price
- Highest commonality
- Highest frequency
- Highest quantity
- Highest delivery time
- Highest VC of delivery time


    Conclusion

Compared to traditional clustering algorithms, the new hybrid two-stage (SOM + clustering) methods have:
- Better robustness
- Better efficiency
  - SOM + Average takes 8 times less time than Average
  - SOM + K-means takes 3.6 times less time than K-means
  - The K-means algorithm converges faster when preceded by SOM
- Comparable assignment quality

SOM + K-means turns out to be the most robust and best-performing method.

Strategic products management


    SOM learning process

- An input $x_i$ is analysed and the winning neuron $j^*$ is selected.
- The weights of the winning neuron and its close neighbours ($j \in N_{j^*}(d)$) move toward $x_i$ according to the Kohonen rule:
  $$w_j(t) = w_j(t-1) + \alpha(d, t)\,\bigl(x_i(t) - w_j(t-1)\bigr)$$
- The training step is increased ($t = t + 1$) and another input $x_i$ is analysed.

[Figure: the winning neuron and its neighbours moving toward the input $x_i$]
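A small numeric illustration of the Kohonen rule above; the Gaussian form of the neighbourhood factor α(d, t) and its parameters are illustrative assumptions.

```python
import numpy as np

def kohonen_step(w_prev, x, d, t, lr0=0.5, tau=1000.0, sigma=1.5):
    """One update: w_j(t) = w_j(t-1) + alpha(d, t) * (x_i(t) - w_j(t-1)),
    where d is the grid distance between neuron j and the winning neuron."""
    alpha = lr0 * np.exp(-t / tau) * np.exp(-d ** 2 / (2 * sigma ** 2))
    return w_prev + alpha * (x - w_prev)

w = np.array([0.2, 0.8])
x = np.array([0.6, 0.4])
print(kohonen_step(w, x, d=0, t=0))   # the winner moves strongly toward x: [0.4, 0.6]
print(kohonen_step(w, x, d=3, t=0))   # a neighbour three grid steps away moves only slightly
```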


    Comparison results

    Clustering quality for the Iris data set

Correctly classified data (%):

                            K-means    Ward    Average    Complete    Single
Traditional clustering        88       87.33     88          83.33     67.33
SOM alone                     84.67
SOM two-stage clustering      92.67    88        82.67       88.87     66.67

Robustness: 6/6  0/6  6/6  6/6  0/6


    Comparison results

    Clustering quality for a subset of the Abalone data set

Correctly classified data (%):

                            K-means    Ward    Average    Complete    Single
Traditional clustering        51.10    40.26     48.61       51.09     37.22
SOM alone                     51.22
SOM two-stage clustering      51.22    52.70     37.00       52.39     37.00

Robustness: 6/6  0/6  6/6  6/6  0/6


    Comparison results

    Clustering quality for the entire Abalone data set

                        Correctly classified data (%)    Robustness
Traditional K-means                50.30                    6/6
SOM alone                          52.00                    6/6
SOM + K-means                      51.45                    6/6
SOM + Ward                         45.39                    0/6
SOM + Complete                     49.68                    6/6


    Industrial data clustering

Cluster A (416):
- Used frequently
- Shortest delivery time
- Least variable delivery time

Cluster B (200):
- Expensive items
- Smallest quantity

Cluster C (108):
- Lowest commonality
- Highest VC of quantity

Cluster D (296):
- Lowest unit price
- Highest commonality, frequency and quantity
- Highest and most variable delivery time


Part-Machine matrix (Burbidge, 1971)
Machines (rows) A-P; parts (columns): 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41

    A X X

    B X X X X X X X X

    C X X X X X

    D X X X X X X X

    E X X X X X X X X X X X X

    F X X X X X X X X X X X X X X X X X X

    G X X X

    H X X X X X X X X X X X X X X X X X X X

    I X X X X X X X X X X

    J X X X X X X X

    K X X X X X X

    L X X X X X

    M X X

    N X X X X

    O X X X X X X

    P X X X X X X X X

43 parts being manufactured using 16 machines (A-P)
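To make the link to the clustering machinery explicit, the following toy sketch encodes each part as a binary vector over the machines it visits and groups the parts; the miniature incidence matrix is invented for illustration (it is not Burbidge's data), and plain K-means stands in for the full SOM + K-means chain used in the presentation.

```python
import numpy as np
from sklearn.cluster import KMeans

machines = list("ABCDEF")
incidence = {                      # part -> machines it requires
    1: "AB", 2: "ABC", 3: "AC",
    4: "DE", 5: "DEF", 6: "EF",
}
X = np.array([[1 if m in used else 0 for m in machines]
              for used in incidence.values()])

labels = KMeans(n_clusters=2, n_init=10).fit_predict(X)
for part, lab in zip(incidence, labels):
    print(f"part {part} -> cell {lab}")   # parts sharing machines land in the same cell
```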


SOM + K-means: part grouping

3-cluster solution (part numbers):
- 5; 8; 9; 14; 15; 16; 19; 21; 23; 29; 33; 41; 43
- 2; 4; 7; 10; 17; 18; 28; 32; 37; 38; 40; 42
- 1; 3; 6; 11; 12; 13; 20; 22; 24; 25; 26; 27; 30; 31; 34; 35; 36; 39

5-cluster solution (part numbers):
- Cluster A: 5; 8; 9; 14; 15; 16; 19; 21; 23; 29; 33; 41; 43
- Cluster B: 2; 4; 10; 18; 28; 32; 37; 38; 40; 42
- Cluster C: 3; 11; 20; 22; 24; 27; 30
- Cluster D: 6; 7; 17; 34; 35; 36
- Cluster E: 1; 12; 13; 25; 26; 31; 39

Completing each part entirely within a single manufacturing cell can require including the same machine in more than one cell.


SOM + K-means: part grouping, Cluster C
Machine columns: D E O H F A B I P C G J K L M N

    1 X X X X

    3 X X X

    6 X X

    11 X X

    12 X X X

    13 X X X

    17 X X X

    20 X X

    22 X

    24 X X X X

    25 X X

    26 X

    27 X X X

    30 X X

    31 X X

    34 X X

    35 X X

    36 X

    39 X X


SOM + K-means: part grouping, Clusters A & B
Machine columns: D E O H F A B I P C G J K L M N

    5 X X X

    8 X X X

    9 X X X X

    14 X X X X

    15 X X

    16 X

    19 X X X X X

    21 X X X X

    23 X X X X

    29 X X

    33 X X X

    41 X X X

    43 X X X X

    2 X X X X X X

    4 X

    7 X X X

    10 X X X

    18 X X

    28 X X X

    32 X X X X

    37 X X X X X X

    38 X X X X

    40 X X X