Nearest neighbor classification · distance concept, data types, binary data, interval data...

NEAREST NEIGHBOR CLASSIFICATION, presented by Zulhanif


Post on 28-Oct-2020


Page 1: NEAREST NEIGHBOR CLASSIFICATION · KONSEP JARAK(DISTANCE) Tipe data Binary Data Interval Data Euclidean, Manhattan (Block) Chi-square Jaccard, pattern Qualitativ Sifatjarak: •d(a,

NEAREST NEIGHBOR CLASSIFICATION

Presented by Zulhanif

Page 2

NEAREST NEIGHBOR?

• A classification method based on the k nearest distances
• One of the top 10 data mining algorithms
• K-NN is simple, yet it is still a current method for classification

Page 3

REVIEW

Page 4

REVIEW

Page 5

DISTANCE CONCEPT

Data types and distance measures:
• Interval data: Euclidean, Manhattan (Block)
• Qualitative data: Chi-square
• Binary data: Jaccard, pattern

Properties of a distance:
• d(a, b) ≥ 0
• d(a, a) = 0
• d(a, b) = d(b, a)
• d(a, b) increases as the two objects become less similar
• d(a, c) ≤ d(a, b) + d(b, c)
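The distance properties listed on this page can be checked numerically for a candidate measure. The following is a minimal sketch, assuming Python and a handful of arbitrary sample points; the helper name `satisfies_distance_properties` is hypothetical and not part of the slides.

```python
import itertools
import math


def euclidean(a, b):
    """Euclidean distance between two equal-length points."""
    return math.sqrt(sum((ai - bi) ** 2 for ai, bi in zip(a, b)))


def satisfies_distance_properties(dist, points, tol=1e-12):
    """Check identity, non-negativity, symmetry, and the triangle
    inequality for every combination of the given points."""
    for a in points:
        if abs(dist(a, a)) > tol:                # d(a, a) = 0
            return False
    for a, b in itertools.product(points, repeat=2):
        if dist(a, b) < 0:                       # d(a, b) >= 0
            return False
        if abs(dist(a, b) - dist(b, a)) > tol:   # d(a, b) = d(b, a)
            return False
    for a, b, c in itertools.product(points, repeat=3):
        if dist(a, c) > dist(a, b) + dist(b, c) + tol:  # triangle inequality
            return False
    return True


# Arbitrary illustrative points (an assumption, not data from the slides).
sample = [(1, 2), (3, 5), (0, 0), (4, 1)]
print(satisfies_distance_properties(euclidean, sample))
```

A check like this can only refute a property on the sampled points; it does not prove that a measure satisfies the properties in general.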

Page 6

EUCLIDEAN DISTANCE

[Figure: points A (x1, y1) and B (x2, y2) in the X-Y plane; the horizontal leg is x2-x1 and the vertical leg is y2-y1]

$$d = \sqrt{(x_2 - x_1)^2 + (y_2 - y_1)^2}$$

Page 7

EUCLIDEAN DISTANCE

[Figure: points A (1, 2) and B (3, 5) in the X-Y plane; the horizontal leg is 3-1 and the vertical leg is 5-2]

$$d = \sqrt{(3 - 1)^2 + (5 - 2)^2} = \sqrt{13} \approx 3.61$$
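The worked example for A (1, 2) and B (3, 5) can be reproduced in a few lines. This is a minimal sketch in Python; the function name `euclidean` is chosen here for illustration.

```python
import math


def euclidean(a, b):
    """Euclidean distance: square root of the summed squared differences."""
    return math.sqrt(sum((ai - bi) ** 2 for ai, bi in zip(a, b)))


d = euclidean((1, 2), (3, 5))
print(round(d, 2))  # 3.61
```

The same function works for points with more than two coordinates, since it sums over all coordinate pairs.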

Page 8

SOME DISTANCE MEASURES

Page 9

MANHATTAN

$$d_{ij} = \left( \sum_{k} \left| x_{ik} - x_{jk} \right|^{\lambda} \right)^{1/\lambda}$$

With λ = 1:

$$d_{ij} = \sum_{k} \left| x_{ik} - x_{jk} \right|$$
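As an illustration, the Manhattan (block) distance can be sketched as the sum of absolute coordinate differences between records i and j. The function name and the sample points (reused from the Euclidean example) are assumptions.

```python
def manhattan(x_i, x_j):
    """Manhattan (block) distance: sum over k of |x_ik - x_jk|."""
    return sum(abs(a - b) for a, b in zip(x_i, x_j))


print(manhattan((1, 2), (3, 5)))  # 5
```

For the same two points the Euclidean distance is about 3.61, so the block distance is the larger of the two here.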


Page 12

CHEBYSHEV DISTANCE

$$d_{ij} = \max_{k} \left| x_{ik} - x_{jk} \right|$$
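A short sketch of the Chebyshev distance, which keeps only the largest absolute coordinate difference. The function name and sample points are illustrative assumptions.

```python
def chebyshev(x_i, x_j):
    """Chebyshev distance: max over k of |x_ik - x_jk|."""
    return max(abs(a - b) for a, b in zip(x_i, x_j))


print(chebyshev((1, 2), (3, 5)))  # 3
```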


Page 15

NEAREST NEIGHBOR CLASSIFIERS

• Basic idea:
  • If the observed object walks like a duck, sounds like a duck, and swims like a duck, then it is probably a duck

[Figure: the distance from the test record to each training record is computed, and the k "nearest" records are chosen]
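The two steps named here (compute the distance to every training record, then choose the k nearest) can be sketched as follows. This is a minimal illustration, assuming Euclidean distance and a simple majority vote; the function name `knn_predict` and the toy records are hypothetical.

```python
import math
from collections import Counter


def knn_predict(train_X, train_y, query, k):
    """Predict the class of `query` by majority vote among the k training
    records with the smallest Euclidean distance."""
    distances = [
        (math.dist(x, query), label) for x, label in zip(train_X, train_y)
    ]
    distances.sort(key=lambda pair: pair[0])
    k_nearest_labels = [label for _, label in distances[:k]]
    # On a tied vote, Counter returns the label it encountered first.
    return Counter(k_nearest_labels).most_common(1)[0][0]


# Toy records invented for illustration only.
train_X = [(1, 1), (1, 2), (2, 1), (8, 8), (8, 9), (9, 8)]
train_y = ["duck", "duck", "duck", "not duck", "not duck", "not duck"]
print(knn_predict(train_X, train_y, (2, 2), k=3))  # duck
```

Nothing is fitted in advance: the training records are simply stored, and all the work happens when a new record is classified.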

Page 16

PREDICT CLASSIFIERS

• Store the training records
• Use training records to predict the class label of unseen cases

[Figure: a set of stored cases with attributes Atr1 ... AtrN and a Class column (labels A, B, C), next to an unseen case with attributes Atr1 ... AtrN and no class label]

Page 17

DEFINITION OF NEAREST NEIGHBOR

[Figure: three panels, each showing a record X and its neighborhood]

(a) 1-nearest neighbor (b) 2-nearest neighbor (c) 3-nearest neighbor

Page 18

NEAREST NEIGHBOR CLASSIFICATION... ISSUES

• Scaling issues
  • Attributes should be scaled to prevent one attribute from dominating the distance.
  • Example:
    • Height varies from 1.5 m to 1.8 m
    • Weight varies from 50 kg to 80 kg
    • Income varies from 3 million to 6 million
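One common way to apply such scaling is min-max rescaling of every attribute to the range [0, 1]. The sketch below assumes that choice (the slides do not name a specific scaling method) and uses three made-up records that span the ranges in the example.

```python
def min_max_scale(rows):
    """Rescale each column of `rows` to [0, 1] using its own min and max."""
    columns = list(zip(*rows))
    lows = [min(col) for col in columns]
    highs = [max(col) for col in columns]
    return [
        tuple(
            (v - lo) / (hi - lo) if hi > lo else 0.0
            for v, lo, hi in zip(row, lows, highs)
        )
        for row in rows
    ]


# (height in m, weight in kg, income): illustrative records, not slide data.
records = [
    (1.5, 50, 3_000_000),
    (1.8, 80, 6_000_000),
    (1.65, 65, 4_500_000),
]
scaled = min_max_scale(records)
print(scaled[2])
```

Without rescaling, a difference of a few million in income would swamp any difference in height or weight in the distance calculation; after rescaling, each attribute contributes on the same 0 to 1 scale.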

Page 19

NEAREST NEIGHBOR CLASSIFICATION...

• Choosing the value of k:
  • If k is too small, the classifier is sensitive to noise points
  • If k is too large, the neighborhood (the nearest points) may include points from other classes
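Both effects can be seen on a small one-dimensional toy set. The data below are invented for illustration: three class A records, one noisy class B record placed among them, and a distant cluster of class B records.

```python
from collections import Counter


def knn_predict_1d(train, query, k):
    """`train` is a list of (value, label) pairs; vote among the k nearest."""
    nearest = sorted(train, key=lambda rec: abs(rec[0] - query))[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]


train = [
    (1.0, "A"), (2.0, "A"), (3.0, "A"),
    (2.4, "B"),  # a single noisy record inside the A region
    (8.0, "B"), (9.0, "B"), (10.0, "B"), (11.0, "B"), (12.0, "B"),
]

for k in (1, 3, 9):
    print(k, knn_predict_1d(train, 2.5, k))
```

With k = 1 the noisy record decides the vote (B), with k = 3 the surrounding A records outvote it (A), and with k = 9 every record is a neighbor, so the larger but distant B cluster wins (B).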

Page 20

X1  X2  Y
 1   3  +
 2   1  +
 6   1  +
 7   3  +
 7   6  +
 5   1  +
 1   5  +
 6   5  +
 5   5  +
 2   7  +
10   9  -
 4   8  -
 6   4  -
 6   9  -
 9   7  -
 9   8  -
 7   4  -
 4   9  -
 7   5  -
 7  10  -
 8   9  -
 7   7  -
 4   5  -
 6  10  -
 5   5  ?

Page 21

The same records as on Page 20, with a Distance column added. The distance is the squared Euclidean distance from each record to the unclassified point (5, 5). For the first record (1, 3):

Distance = (5-1)^2 + (3-5)^2 = 20

Page 22

X1  X2  Y  Distance
 1   3  +     20
 2   1  +     25
 6   1  +     17
 7   3  +      8
 7   6  +      5
 5   1  +     16
 1   5  +     16
 6   5  +      1
 5   5  +      0
 2   7  +     13
10   9  -     41
 4   8  -     10
 6   4  -      2
 6   9  -     17
 9   7  -     20
 9   8  -     25
 7   4  -      5
 4   9  -     17
 7   5  -      4
 7  10  -     29
 8   9  -     25
 7   7  -      8
 4   5  -      1
 6  10  -     26
 5   5  ?

Page 23

Nearest Neighbor sign: the records closest to the unclassified point (5, 5) are marked with their class sign.

X1  X2  Distance  Nearest Neighbor sign
 7   3      8      +
 7   6      5      +
 6   5      1      +
 5   5      0      +
 6   4      2      -
 7   4      5      -
 7   5      4      -
 7   7      8      -
 4   5      1      -
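The Distance values in this example match squared Euclidean distances from each record to the unclassified point (5, 5), and the marked records are those with a squared distance of 8 or less. The sketch below recomputes both; the cutoff of 8 is an assumption read off the marked rows, not a value stated on the slides.

```python
# Training records (X1, X2, Y) from the example table.
train = [
    (1, 3, "+"), (2, 1, "+"), (6, 1, "+"), (7, 3, "+"), (7, 6, "+"),
    (5, 1, "+"), (1, 5, "+"), (6, 5, "+"), (5, 5, "+"), (2, 7, "+"),
    (10, 9, "-"), (4, 8, "-"), (6, 4, "-"), (6, 9, "-"), (9, 7, "-"),
    (9, 8, "-"), (7, 4, "-"), (4, 9, "-"), (7, 5, "-"), (7, 10, "-"),
    (8, 9, "-"), (7, 7, "-"), (4, 5, "-"), (6, 10, "-"),
]
query = (5, 5)


def squared_distance(x1, x2, q):
    """Squared Euclidean distance between (x1, x2) and the point q."""
    return (x1 - q[0]) ** 2 + (x2 - q[1]) ** 2


distances = [squared_distance(x1, x2, query) for x1, x2, _ in train]
print(distances)

# Assumed cutoff: keep the records whose squared distance is at most 8.
marked = [label for (_, _, label), d in zip(train, distances) if d <= 8]
print(marked.count("+"), marked.count("-"))  # 4 5
```

Under that assumed cutoff there are nine neighbors, four positive and five negative, so a simple majority vote would label the point as negative.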

Page 24

[Figure: scatter plot of the Positive and Negative records, with both axes running from 0 to 12 and the unclassified point marked "?"]

Page 25

EXERCISE

X1  X2  Y
 2   3  +
 1   1  +
 5   1  +
 7   3  +
 7   6  +
 3   1  +
 1   5  +
 6   5  +
 5   5  +
 2   7  +
 4   9  -
 6   8  -
 5   4  -
 5   9  -
 5   7  -
 7   8  -
 7   4  -
 4   9  -
 7   5  -
 7  10  -
 8   9  -
 7   7  -
 4   5  -
 6  10  -
 6   5  ?

Page 26

X1  X2  Y  Distance  Nearest Neighbor sign
 2   3  +     20
 1   1  +     41
 5   1  +     17
 7   3  +      5      +
 7   6  +      2      +
 3   1  +     25
 1   5  +     25
 6   5  +      0      +
 5   5  +      1      +
 2   7  +     20
 4   9  -     20
 6   8  -      9
 5   4  -      2      -
 5   9  -     17
 5   7  -      5      -
 7   8  -     10
 7   4  -      2      -
 4   9  -     20
 7   5  -      1      -
 7  10  -     26
 8   9  -     20
 7   7  -      5      -
 4   5  -      4      -
 6  10  -     25
 6   5  ?

Page 27

V-FOLD CROSS-VALIDATION

• V-fold cross-validation divides the data into V folds. Choose a value of k for the nearest neighbor model, then make predictions on the v-th fold (using the other V−1 folds as training data) and evaluate the error. This process is repeated successively for every possible v. At the end of the V folds, compute the average error. Repeat these steps for other values of k. The value of k with the smallest average error is selected.
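The procedure described above can be sketched as follows. This is a minimal illustration, assuming Euclidean distance, majority voting, and folds formed by taking every V-th record; the function names and the toy data are hypothetical.

```python
import math
from collections import Counter


def knn_predict(train, query, k):
    """`train` is a list of (point, label) pairs; vote among the k nearest."""
    nearest = sorted(train, key=lambda rec: math.dist(rec[0], query))[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]


def cross_validated_error(data, k, V):
    """Average misclassification rate of k-NN over V folds."""
    folds = [data[v::V] for v in range(V)]  # fold v takes every V-th record
    errors = []
    for v in range(V):
        test = folds[v]
        train = [rec for u in range(V) if u != v for rec in folds[u]]
        wrong = sum(knn_predict(train, x, k) != y for x, y in test)
        errors.append(wrong / len(test))
    return sum(errors) / V


def choose_k(data, candidate_ks, V):
    """Return the candidate k with the smallest average error."""
    return min(candidate_ks, key=lambda k: cross_validated_error(data, k, V))


# Two well-separated toy clusters, invented for illustration only.
data = [((i, i), "+") for i in range(10)] + [((i + 20, i + 20), "-") for i in range(10)]
print(choose_k(data, candidate_ks=(1, 3, 5), V=4))
```

When several values of k share the smallest error, `min` returns the first of them in `candidate_ks`. In practice the records are usually shuffled before the folds are formed, so that no fold is dominated by a single class.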

Page 28

N=40 FOLD=4

Tr   Test Test Test Test Test Test Test Test Test
Test Tr   Test Test Test Test Test Test Test Test
Test Test Tr   Test Test Test Test Test Test Test
Test Test Test Tr   Test Test Test Test Test Test
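For N=40 and 4 folds, each fold holds 10 records. The sketch below lists one possible index split, assuming contiguous folds and, as in the V-fold description, one held-out fold per round with the remaining V−1 folds used for training; the function name `fold_indices` is hypothetical.

```python
def fold_indices(N, V):
    """Return a list of (train_idx, test_idx) pairs, one per fold."""
    fold_size = N // V
    splits = []
    for v in range(V):
        test_idx = list(range(v * fold_size, (v + 1) * fold_size))
        held_out = set(test_idx)
        train_idx = [i for i in range(N) if i not in held_out]
        splits.append((train_idx, test_idx))
    return splits


splits = fold_indices(40, 4)
for v, (train_idx, test_idx) in enumerate(splits, start=1):
    print(f"round {v}: hold out records {test_idx[0]}-{test_idx[-1]}, "
          f"train on the other {len(train_idx)}")
```

Every record is held out exactly once across the four rounds. This simple version assumes N is divisible by V.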