kmeans initialization

K-Means Clustering Problem Ahmad Sabiq Febri Maspiyanti Indah Kuntum Khairina Wiwin Farhania Yonatan

Upload: djempol

Post on 05-Dec-2014


Page 1: Kmeans initialization

K-Means Clustering Problem

Ahmad Sabiq

Febri Maspiyanti

Indah Kuntum Khairina

Wiwin Farhania

Yonatan

Page 2: Kmeans initialization

What is k-means?

• K-means partitions n objects into k clusters based on their attributes.

– Objects in the same cluster are close: their attributes are related to each other.

– Objects in different clusters are far apart: their attributes are very dissimilar.

Page 3: Kmeans initialization

Algorithm

• Input: n objects, k (integer k ≤ n)

• Output: k clusters

• Steps:

1. Select k initial centroids.

2. Calculate the distance between each object and each centroid.

3. Assign each object to the cluster with the nearest centroid.

4. Recalculate each centroid.

5. If the centroids don’t change, stop (convergence). Otherwise, go back to step 2.

• Complexity: O(k · n · d · total_iterations), where d is the number of attributes per object
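The five steps can be sketched in a few lines of Python. This is a minimal illustration rather than the presenters' code: the function name `kmeans`, the `max_iter` safety cap, and the choice to leave an empty cluster's centroid where it is are all assumptions made for the example. Distances are Euclidean.

```python
import math


def kmeans(points, centroids, max_iter=100):
    """Run steps 2-5 from the given initial centroids (step 1 is the caller's job).

    points and centroids are lists of equal-length tuples of numbers.
    Returns (centroids, labels, iterations).
    """
    centroids = [tuple(c) for c in centroids]
    labels = []
    for iteration in range(1, max_iter + 1):
        # Steps 2-3: measure the distance to every centroid, keep the nearest.
        labels = [
            min(range(len(centroids)), key=lambda j: math.dist(p, centroids[j]))
            for p in points
        ]
        # Step 4: each centroid becomes the mean of the objects assigned to it.
        new_centroids = []
        for j, old in enumerate(centroids):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                new_centroids.append(
                    tuple(sum(axis) / len(members) for axis in zip(*members))
                )
            else:
                # Assumption: an empty cluster keeps its previous centroid.
                new_centroids.append(old)
        # Step 5: stop as soon as no centroid has moved.
        if new_centroids == centroids:
            return centroids, labels, iteration
        centroids = new_centroids
    return centroids, labels, max_iter


points = [(1.0, 1.0), (1.5, 2.0), (1.0, 1.5), (8.0, 8.0), (9.0, 8.5), (8.5, 9.0)]
centroids, labels, iterations = kmeans(points, [(1.0, 1.0), (8.0, 8.0)])
print(centroids, labels, iterations)
```

Each pass computes n · k distances, and each distance costs d operations, which is where the k · n · d factor in the complexity comes from. On this toy input the loop stops on its second pass, once the centroids no longer move.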

Page 4: Kmeans initialization

Initialization

• Why is it important? What does it affect?

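– The clustering result: k-means only converges to a local optimum, and which one it reaches depends on the initial centroids.

– The total number of iterations, and therefore the running time.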
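Both effects show up even on a tiny one-dimensional example. The sketch below is illustrative only: the data set, the helper name `lloyd_1d`, and the use of the sum of squared errors (SSE) as the quality measure are assumptions, not taken from the slides.

```python
def lloyd_1d(values, centroids):
    """Plain k-means on 1-D data; returns (final centroids, SSE, iterations)."""
    centroids = list(centroids)
    iterations = 0
    while True:
        iterations += 1
        # Assign every value to its nearest centroid.
        clusters = [[] for _ in centroids]
        for v in values:
            nearest = min(range(len(centroids)), key=lambda j: abs(v - centroids[j]))
            clusters[nearest].append(v)
        # Recompute each centroid as its cluster mean
        # (an empty cluster keeps its old centroid).
        updated = [sum(c) / len(c) if c else old for c, old in zip(clusters, centroids)]
        if updated == centroids:
            break
        centroids = updated
    sse = sum((v - centroids[j]) ** 2 for j, c in enumerate(clusters) for v in c)
    return centroids, sse, iterations


data = [0, 1, 2, 10, 11, 12, 20, 21, 22]   # three obvious groups
good = lloyd_1d(data, [1, 11, 21])         # one starting centroid per group
bad = lloyd_1d(data, [0, 1, 2])            # all starting centroids in the first group
print("good:", good)
print("bad: ", bad)
```

With one starting centroid per group the run converges on the first pass with an SSE of 6. Starting with all three centroids in the first group takes three passes and ends in a worse local optimum (SSE 154.5), where the two right-hand groups share a single centroid.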

Page 5: Kmeans initialization

Good Initialization

3 clusters with 2 iterations…

Page 6: Kmeans initialization

Bad Initialization

3 clusters with 4 iterations…

Page 7: Kmeans initialization

Initialization Methods

1. Random

2. Forgy

3. MacQueen

4. Kaufman

Page 8: Kmeans initialization

Random

• Algorithm:

1. Assigns each object to a random cluster.

2. Computes the initial centroid of each cluster.
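A minimal sketch of these two steps follows. The function name `random_partition_init` and the `seed` parameter are hypothetical, and the slides do not say what to do when a cluster receives no objects, so the fallback used here is an assumption.

```python
import random


def random_partition_init(points, k, seed=None):
    """Random method: random cluster labels first, centroids second."""
    rng = random.Random(seed)
    # Step 1: assign each object to a random cluster.
    labels = [rng.randrange(k) for _ in points]
    # Step 2: compute the centroid (mean) of each cluster.
    centroids = []
    for j in range(k):
        members = [p for p, lab in zip(points, labels) if lab == j]
        if not members:
            # Assumption: an empty cluster falls back to one randomly chosen object.
            members = [rng.choice(points)]
        centroids.append(tuple(sum(axis) / len(members) for axis in zip(*members)))
    return centroids


points = [(1.0, 2.0), (2.0, 1.0), (8.0, 9.0), (9.0, 8.0), (5.0, 5.0), (4.0, 6.0)]
print(random_partition_init(points, k=3, seed=0))
```

Because every cluster is a random sample of the whole data set, the initial centroids produced this way tend to sit close to the overall mean of the data.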

Pages 9–11: Kmeans initialization

Random

[Figures illustrating Random initialization; only the axis ticks (0 to 9 and 0 to 35) survive in this transcript]

Page 12: Kmeans initialization

Forgy

• Algorithm:

1. Chooses k objects at random and uses them as the initial centroids.
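Forgy's method is a single sampling step. A minimal sketch, with the function name `forgy_init` and the `seed` parameter introduced only for illustration:

```python
import random


def forgy_init(points, k, seed=None):
    """Forgy method: k objects picked at random (without repeats) become the centroids."""
    rng = random.Random(seed)
    return [tuple(p) for p in rng.sample(points, k)]


points = [(1.0, 2.0), (2.0, 1.0), (8.0, 9.0), (9.0, 8.0), (5.0, 5.0), (4.0, 6.0)]
print(forgy_init(points, k=3, seed=0))
```

Unlike the Random method, every initial centroid here is an actual object from the data set.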

Page 13: Kmeans initialization

Forgy

[Figure illustrating Forgy initialization; only the axis ticks (0 to 9 and 0 to 35) survive in this transcript]

Page 14: Kmeans initialization

MacQueen

• Algorithm:

1. Chooses k objects at random and uses them as the initial centroids.

2. Assigns each remaining object to the cluster with the nearest centroid.

3. After each assignment, recalculates the centroid of the cluster that gained the object.
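The difference from Forgy is the single pass that updates a centroid immediately after every assignment. A minimal sketch follows; the function name `macqueen_init`, the `seed` parameter, processing the objects in input order, and counting each seed object as the first member of its cluster are assumptions made for this example.

```python
import math
import random


def macqueen_init(points, k, seed=None):
    """MacQueen method: random seeds, then one pass with per-assignment updates."""
    rng = random.Random(seed)
    # Step 1: k random objects are the initial centroids.
    seed_idx = rng.sample(range(len(points)), k)
    centroids = [list(points[i]) for i in seed_idx]
    counts = [1] * k  # assumption: each seed counts as its cluster's first member
    for i, p in enumerate(points):
        if i in seed_idx:
            continue
        # Step 2: assign the object to the cluster with the nearest centroid.
        j = min(range(k), key=lambda c: math.dist(p, centroids[c]))
        # Step 3: update that centroid right away (running mean of its members).
        counts[j] += 1
        centroids[j] = [m + (x - m) / counts[j] for m, x in zip(centroids[j], p)]
    return [tuple(c) for c in centroids]


points = [(1.0, 2.0), (2.0, 1.0), (8.0, 9.0), (9.0, 8.0), (5.0, 5.0), (4.0, 6.0)]
print(macqueen_init(points, k=2, seed=0))
```

The running-mean update avoids storing cluster memberships: moving the centroid toward the new object by 1/count gives the same result as recomputing the mean from scratch.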

Pages 15–24: Kmeans initialization

MacQueen

[Ten figure-only slides illustrating MacQueen initialization; only the axis ticks of the first plot (0 to 9 and 0 to 35) survive in this transcript]

Pages 25–32: Kmeans initialization

Kaufman

[Eight figure-only slides illustrating Kaufman initialization; no text survives in this transcript]

Page 33: Kmeans initialization

Kaufman

d = 24.33

D = 15.52

C = 0

Pages 34–35: Kmeans initialization

Kaufman

[Figure labels: C = 0 (five times on each slide); ∑C1 = 2.74 (page 35)]

Pages 36–37: Kmeans initialization

Kaufman

∑C1 = 2.74

∑C2 = 12.21

∑C3 = 12.36

∑C4 = 8.38

∑C5 = 52.55

∑C6 = 55.88

∑C7 = 53.77

∑C8 = 51.16

∑C9 = 42.69
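The Kaufman slides survive only as figure labels, so the following is a hedged reconstruction rather than the presenters' own wording. It assumes the method is the seed selection described by Kaufman and Rousseeuw, as summarized by Peña et al.: for a candidate object, every other unselected object contributes C = max(D − d, 0), where D is its distance to the nearest centroid already chosen and d is its distance to the candidate. The candidate with the largest ∑C becomes the next centroid. The labels on page 33 are consistent with this reading, since max(15.52 − 24.33, 0) = 0. The function name `kaufman_init` and the way the first centroid is picked (smallest total distance to all other objects) are assumptions.

```python
import math


def kaufman_init(points, k):
    """Greedy seed selection in the style of Kaufman and Rousseeuw (a sketch)."""
    n = len(points)
    dist = [[math.dist(p, q) for q in points] for p in points]
    # First centroid: the most centrally located object, taken here as the one
    # with the smallest total distance to all other objects (assumption).
    selected = [min(range(n), key=lambda i: sum(dist[i]))]
    while len(selected) < k:
        best, best_gain = None, -1.0
        for i in range(n):
            if i in selected:
                continue
            gain = 0.0  # the sum of C for candidate i
            for j in range(n):
                if j == i or j in selected:
                    continue
                D = min(dist[j][s] for s in selected)  # j to its nearest chosen centroid
                d = dist[j][i]                         # j to the candidate
                gain += max(D - d, 0.0)                # C = max(D - d, 0)
            if gain > best_gain:
                best, best_gain = i, gain
        selected.append(best)  # greedy choice: the largest sum of C wins
    return [tuple(points[i]) for i in selected]


points = [(0.0,), (1.0,), (2.0,), (3.0,), (10.0,), (11.0,), (12.0,)]
print(kaufman_init(points, k=2))
```

Unlike the other three methods, this one uses no random numbers, so it always returns the same centroids for the same data. The price is the pairwise distance table, which grows quadratically with the number of objects.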

Page 38: Kmeans initialization

References

1. J.M. Peña, J.A. Lozano, and P. Larrañaga. An Empirical Comparison of Four Initialization Methods for the K-Means Algorithm. Pattern Recognition Letters, vol. 20, pp. 1027–1040. 1999.

2. J.R. Cano, O. Cordón, F. Herrera, and L. Sánchez. A Greedy Randomized Adaptive Search Procedure Applied to the Clustering Problem as an Initialization Process Using K-Means as a Local Search Procedure. Journal of Intelligent and Fuzzy Systems, vol. 12, pp. 235–242. 2002.

3. L. Kaufman and P.J. Rousseeuw. Finding Groups in Data: An Introduction to Cluster Analysis. Wiley. 1990.

Page 39: Kmeans initialization

Questions

1. Why is initialization important in k-means?

2. Which initialization method has the greedy-choice property?

3. Explain the O(nkd) complexity of the Random method.