A genetic clustering algorithm for data with non-spherical-shape clusters
Intelligent Database Systems Lab
National Yunlin University of Science and Technology (國立雲林科技大學)
A genetic clustering algorithm for data with non-spherical-shape clusters
Advisor: Dr. Hsu
Graduate: Sheng-Hsuan Wang
Authors: Lin Yu Tseng, Shiueng Bien Yang
Department of Information Management
Pattern Recognition 33 (2000) 1251-1259
N.Y.U.S.T.
I. M.
Outline
Motivation
Objective
Introduction
The basic concept of genetic strategy
The genetic clustering algorithm
Experiments
Concluding remarks and Summary
Personal opinions
Review
Motivation
Some problems of clustering:
The number of clusters is not known in advance.
The threshold distance d in neighborhood clustering has to be chosen.
Clusters may have a non-spherical shape.
Objective
To solve these problems of traditional clustering algorithms.
A genetic clustering algorithm that:
Handles non-spherical-shape clusters.
Clusters according to the similarities and automatically finds the proper k.
Introduction
Clustering methods can broadly be classified into two categories:
Hierarchical: agglomerative, divisive
Non-hierarchical: k-means
Introduction
The problems in most of these clustering algorithms:
How many clusters are there?
How are non-spherical-shape clusters handled?
What distance threshold should be used for merging?
GA clustering algorithm: clustering is treated as a search problem.
Basic concept of Classical Genetic Algorithm
Flowchart:
Encoding scheme
Fitness evaluation
Test for the end of the algorithm: YES, halt; NO, continue
Parent selection
Crossover operators
Mutation operators, then back to fitness evaluation
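The flowchart above can be sketched as a short loop. This is a minimal illustration of a classical genetic algorithm on bit strings, not the authors' implementation: the function name `run_ga`, the fixed generation budget used as the termination test, fitness-proportional selection, and one-point crossover are all assumptions made for the example.

```python
import random

def run_ga(fitness, length, pop_size=20, pc=0.8, pm=0.1, generations=30, seed=0):
    """Hypothetical classical GA on bit strings (assumes non-negative fitness,
    an even pop_size, and length >= 2)."""
    rng = random.Random(seed)
    # Encoding: each individual is a list of 0/1 genes.
    population = [[rng.randint(0, 1) for _ in range(length)] for _ in range(pop_size)]
    best = max(population, key=fitness)
    for _ in range(generations):  # termination test: fixed budget (assumption)
        # Fitness evaluation and fitness-proportional parent selection.
        weights = [fitness(s) + 1e-9 for s in population]
        parents = rng.choices(population, weights=weights, k=pop_size)
        children = []
        for a, b in zip(parents[0::2], parents[1::2]):
            a, b = a[:], b[:]
            if rng.random() < pc:  # one-point crossover (assumption)
                cut = rng.randint(1, length - 1)
                a, b = a[:cut] + b[cut:], b[:cut] + a[cut:]
            children += [a, b]
        # Mutation: flip each gene independently with probability pm.
        population = [[1 - g if rng.random() < pm else g for g in s] for s in children]
        best = max(population + [best], key=fitness)
    return best, population

# Toy fitness: the number of ones in the string.
best, final_population = run_ga(fitness=sum, length=8)
```

The best string seen so far is kept outside the population, so the returned `best` is never worse than any member of the final generation.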
The genetic clustering algorithm
The algorithm CLUSTERING consists of two stages:
First stage (nearest neighbor): the n objects O1, O2, …, On are grouped into clusters C1, C2, …, Cm.
Second stage (GA clustering): the clusters C1, C2, …, Cm are merged.
First Stage
Step 1: find the nearest neighbor of each object Oi.
Step 2: compute dav, the average of the nearest-neighbor distances.
What is the meaning of u?
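Steps 1 and 2 can be sketched as follows. The function names are hypothetical, the distance is assumed to be Euclidean, and the threshold is assumed to be d = u * dav; the transcript only shows u as a parameter (for example u=1.2 in the experiments) and does not state this formula explicitly.

```python
import math

def nearest_neighbor_distances(points):
    """For each point, the Euclidean distance to its nearest other point."""
    dists = []
    for i, p in enumerate(points):
        nearest = min(math.dist(p, q) for j, q in enumerate(points) if j != i)
        dists.append(nearest)
    return dists

def average_nn_distance(points):
    """dav: the mean of the nearest-neighbor distances."""
    nn = nearest_neighbor_distances(points)
    return sum(nn) / len(nn)

points = [(0.0, 0.0), (1.0, 0.0), (3.0, 0.0), (10.0, 0.0)]
nn = nearest_neighbor_distances(points)  # [1.0, 1.0, 2.0, 7.0]
d_av = average_nn_distance(points)       # 2.75
u = 1.5                                  # illustrative value only
d = u * d_av                             # assumed threshold d = u * dav
```

Under this assumption, a larger u gives a larger threshold and therefore fewer, bigger first-stage clusters.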
First Stage
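Step 3: compute the adjacency matrix A (n × n):
$$A(i,j) = \begin{cases} 1, & \text{if } \|O_i - O_j\| \le d \\ 0, & \text{otherwise} \end{cases} \qquad \text{where } 1 \le i, j \le n$$
Step 4: find the connected components of A, denoted by C1, C2, …, Cm.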
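Steps 3 and 4 amount to building a threshold graph and extracting its connected components. The sketch below assumes Euclidean distance and the comparison "distance <= d"; the function names are hypothetical.

```python
import math

def adjacency_matrix(points, d):
    """A[i][j] = 1 if the distance between points i and j is at most d, else 0."""
    n = len(points)
    return [[1 if math.dist(points[i], points[j]) <= d else 0 for j in range(n)]
            for i in range(n)]

def connected_components(adj):
    """Connected components of the graph given by adj, as sorted index lists."""
    n = len(adj)
    seen = [False] * n
    components = []
    for start in range(n):
        if seen[start]:
            continue
        stack, comp = [start], []
        seen[start] = True
        while stack:  # depth-first traversal from `start`
            i = stack.pop()
            comp.append(i)
            for j in range(n):
                if adj[i][j] and not seen[j]:
                    seen[j] = True
                    stack.append(j)
        components.append(sorted(comp))
    return components

points = [(0, 0), (1, 0), (2, 0), (10, 0), (11, 0)]
components = connected_components(adjacency_matrix(points, d=1.5))
```

Because membership spreads along chains of close neighbors, a component can be elongated or curved, which is what lets the first stage follow non-spherical shapes.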
Second Stage
The initialization step: population, coding, Dinter and Dintra
The three phases of GA: reproduction phase, crossover phase, mutation phase
Second Stage
Distance matrix D (m × m), holding the distance between each pair of clusters Ci and Cj.
Second Stage
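The initialization step
Population: 50 strings. The length of each string is m, one bit for each of the clusters {C1, C2, …, Cm}.
For each string Ri, two sets Ui and U’i are defined: Ui holds the clusters whose bit is 1 and U’i holds the clusters whose bit is 0.
R1 = 1 1 1 0 0: U1={C1, C2, C3} ; U’1={C4, C5}
R2 = 1 0 1 1 0: U2={C1, C3, C4} ; U’2={C2, C5}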
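The coding can be illustrated with a small decoder that turns a bit string into the two sets. The names `decode` and `random_population` are hypothetical; only the population size of 50 and the string length m come from the slide.

```python
import random

def decode(bits, clusters):
    """Split clusters into U (bit 1) and U' (bit 0) for one string."""
    selected = [c for b, c in zip(bits, clusters) if b == 1]
    rest = [c for b, c in zip(bits, clusters) if b == 0]
    return selected, rest

def random_population(m, size=50, seed=0):
    """Initial population: `size` random bit strings of length m."""
    rng = random.Random(seed)
    return [[rng.randint(0, 1) for _ in range(m)] for _ in range(size)]

clusters = ["C1", "C2", "C3", "C4", "C5"]
U1, U1_prime = decode([1, 1, 1, 0, 0], clusters)
U2, U2_prime = decode([1, 0, 1, 1, 0], clusters)
population = random_population(m=len(clusters))
```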
Second Stage
Intra-distance Dintra and the inter-distance Dinter
U1={C1, C2, C3} ; U’1={C4, C5, C7}
Second Stage
Reproduction phase
Fitness function: SCORE(Ri) = Dinter(Ri)*w – Dintra(Ri), with w within [1,3].
Reproduction probability of Ri: $SCORE(R_i) / \sum_{j=1}^{N} SCORE(R_j)$
Crossover phase: pc = 0.8.
Mutation phase: pm = 0.1.
Example strings: R1 = 1 1 1 0 0, R2 = 1 0 1 1 0
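The three phases can be sketched in a few lines. Dinter and Dintra are taken as given numbers here because their definitions are not in the transcript; the input values are made up for illustration, and one-point crossover and per-bit mutation are assumptions about the operators, which the slide does not specify.

```python
import random

def score(d_inter, d_intra, w):
    """SCORE(Ri) = Dinter(Ri) * w - Dintra(Ri)."""
    return d_inter * w - d_intra

def reproduction_probabilities(scores):
    """Share of each string in the total score (assumes positive scores)."""
    total = sum(scores)
    return [s / total for s in scores]

def one_point_crossover(a, b, point):
    """Swap the tails of two strings after position `point` (assumed operator)."""
    return a[:point] + b[point:], b[:point] + a[point:]

def mutate(bits, pm, rng):
    """Flip each bit independently with probability pm (assumed operator)."""
    return [1 - b if rng.random() < pm else b for b in bits]

# Illustrative (Dinter, Dintra) pairs with w = 2; not data from the paper.
scores = [score(4.0, 2.0, 2), score(3.0, 2.0, 2), score(2.0, 2.0, 2)]
probs = reproduction_probabilities(scores)

R1, R2 = [1, 1, 1, 0, 0], [1, 0, 1, 1, 0]
child1, child2 = one_point_crossover(R1, R2, point=2)
mutated = mutate(R1, pm=0.1, rng=random.Random(0))
```

A larger w rewards separation between the selected and unselected clusters more heavily relative to compactness, which is why the choice of w influences how many clusters are merged.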
Merge_Sets_Finding Algorithm
Step 1: Sort the strings by fitness so that $SCORE(R_1) \ge SCORE(R_2) \ge \dots \ge SCORE(R_N)$; set $i = 1$ and $U = \emptyset$.
Step 2: Choose $R_i$ and set $U = U \cup U_i$.
Step 3: Choose the smallest $l > i$ such that $U \cap U_l = \emptyset$. IF no such l exists THEN go to Step 4 (discarded), ELSE set i = l and go to Step 2 (merge).
Step 4: End.
Example: R1={C1, C2, C3}, R2={C3, C4, C6}, R3={C4, C5}
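Read as a greedy scan, the procedure keeps the best-scoring set and then every later set that shares no cluster with those already kept. The sketch below follows that reading, which is a reconstruction from the garbled slide rather than a confirmed transcription of the paper; the function name and the score values are hypothetical.

```python
def merge_sets_finding(scored_sets):
    """Greedy selection of pairwise-disjoint sets in decreasing score order.

    scored_sets: list of (score, set_of_cluster_names) pairs.
    Returns the kept sets; each kept set is a group of clusters to merge.
    """
    ranked = sorted(scored_sets, key=lambda pair: pair[0], reverse=True)
    covered = set()  # U: union of all sets kept so far
    chosen = []
    for _, members in ranked:
        if covered.isdisjoint(members):  # U and U_l share no cluster
            chosen.append(set(members))
            covered |= set(members)
        # otherwise the set overlaps U and is discarded
    return chosen

# The slide's example sets with made-up scores in decreasing order.
example = [
    (0.9, {"C1", "C2", "C3"}),
    (0.7, {"C3", "C4", "C6"}),
    (0.5, {"C4", "C5"}),
]
merged = merge_sets_finding(example)
```

In this example the second set shares C3 with the first and is dropped, while the third set is disjoint from the first and is kept.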
Experiments - 1
Noise : distance > 2dav
Original
Experiments - 1
u=1.2, 8 clusters
7 clusters
Experiments - 1
6 clusters
u=1.5 or 2, 5 clusters
Experiments - 1
u=1.2, w=2, 4 clusters (best)
3 clusters
Experiments - 1
2 clusters
4 clusters (direct GA)
Experiments - 1
4 clusters (k-means)
Experiments - 2
Original
4 clusters
3 clusters
2 clusters
Experiments - 3
Original
4 clusters
Concluding remarks and Summary
A genetic clustering algorithm, CLUSTERING:
Handles non-spherical-shape clusters.
Clusters automatically.
Uses binary search to find the proper interval for w.
Personal Opinions
The proper number of clusters is decided by the value of w.
Review
Using the genetic clustering algorithm (GCA) for automatic clustering.
Split: nearest neighbor (NN).
Merge: Merge_Sets_Finding Algorithm.