
Upload: dorothy-terry

Post on 17-Jan-2016


1

Estimating the reliability of the kNN classification

Maxim Tsypin and Heinrich Röder

Biodesix, Steamboat Springs, CO

1. kNN classification

2. Problems with kNN

3. Estimating the reliability of the kNN classification


2

k-Nearest Neighbor (kNN) classification

• Two classes

• Training set: $N_1$ instances of class 1 and $N_2$ instances of class 2

• Each instance is characterized by d values and is represented by a point in d-dimensional space

• Among the k nearest neighbors of the test instance:

$k_1$ instances of class 1

$k_2$ instances of class 2

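Each instance: $\mathbf{x} = (x_1, \ldots, x_d)$

[Figure: test instances A, B, C in the $(x_1, x_2)$ plane; with k = 5, the neighbor counts $k_1 : k_2$ are A: 5:0, B: 3:2, C: 0:5.]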
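A minimal sketch of the neighbor counting described on this slide. It assumes Euclidean distance (the slides do not name a metric); the helper `knn_counts` and the toy training set are hypothetical, chosen so that three test points reproduce the A, B, C counts from the figure.

```python
import math
from collections import Counter

def knn_counts(train, test_point, k):
    """Return (k1, k2): how many of the k nearest training instances
    belong to class 1 and to class 2.

    train      -- list of (point, label) pairs, label in {1, 2}
    test_point -- tuple of d coordinates
    Euclidean distance is an assumption; the slides do not name a metric.
    """
    nearest = sorted(train, key=lambda item: math.dist(item[0], test_point))[:k]
    counts = Counter(label for _, label in nearest)
    return counts[1], counts[2]

# Hypothetical toy 2-D training set: class 1 at x = 0..4, class 2 at x = 6..10.
train = (
    [((float(x), 0.0), 1) for x in range(0, 5)]
    + [((float(x), 0.0), 2) for x in range(6, 11)]
)

for name, point in [("A", (0.0, 0.0)), ("B", (4.4, 0.0)), ("C", (10.0, 0.0))]:
    k1, k2 = knn_counts(train, point, k=5)
    print(f"{name}: {k1}:{k2}")  # A: 5:0, B: 3:2, C: 0:5
```

A plain majority vote would then assign B to class 1 just as firmly as A, even though its 3:2 split is far less convincing.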


3

Problems of simple kNN

• Works properly only when $N_1 = N_2$. Adding more instances of a given class to the training set biases the classification results in favor of that class.

• Provides no information on the confidence of the class assignment for individual test instances. Intuitively, the confidence should be greater in the 5:0 case than in the 3:2 case.

[Figure: test instances A, B, C in the $(x_1, x_2)$ plane; with k = 5, the neighbor counts $k_1 : k_2$ are A: 5:0, B: 3:2, C: 0:5.]
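The first problem can be illustrated with a small simulation. The setup is an illustrative assumption, not taken from the slides: both classes are drawn from the same uniform distribution on the unit square, so the coordinates carry no class information, and each test point is assigned by simple 5-NN majority vote. The function names and parameters are hypothetical.

```python
import math
import random

def majority_label(train, point, k=5):
    """Simple kNN: label held by the majority of the k nearest neighbors (k odd)."""
    nearest = sorted(train, key=lambda item: math.dist(item[0], point))[:k]
    k1 = sum(1 for _, label in nearest if label == 1)
    return 1 if k1 > k - k1 else 2

def fraction_assigned_to_class1(n1, n2, n_test=500, k=5, seed=0):
    """Both classes are drawn from the SAME uniform distribution on the unit
    square, so the coordinates carry no class information."""
    rng = random.Random(seed)

    def sample():
        return (rng.random(), rng.random())

    train = [(sample(), 1) for _ in range(n1)] + [(sample(), 2) for _ in range(n2)]
    votes = [majority_label(train, sample(), k) for _ in range(n_test)]
    return votes.count(1) / n_test

print(fraction_assigned_to_class1(200, 200))  # balanced training set
print(fraction_assigned_to_class1(300, 100))  # class 1 over-represented 3:1
```

With $N_1 = N_2$ the fraction of test points assigned to class 1 should sit near 1/2. With $N_1 = 3N_2$ it should rise well above 1/2 (about 0.9 if the five neighbor labels behaved like independent draws, each class 1 with probability 3/4), although the two classes are indistinguishable.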


4

The question

• Two classes

• Training set: $N_1$ instances of class 1 and $N_2$ instances of class 2

• Among the k nearest neighbors of a given test instance:

$k_1$ instances of class 1

$k_2$ instances of class 2

$k = k_1 + k_2$

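Given $N_1$, $N_2$, $k_1$, $k_2$, what is the probability that this test instance belongs to class 1?

[Figure: test instances A, B, C in the $(x_1, x_2)$ plane; with k = 5, the neighbor counts $k_1 : k_2$ are A: 5:0, B: 3:2, C: 0:5.]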
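One way to make the question concrete is a Bayesian Monte Carlo estimate. The model is an assumption introduced here for illustration, loosely in the spirit of the kernel density derivation mentioned on the next slide: within a fixed vicinity of the test instance, $k_i$ is Poisson with mean $N_i a_i$, where $a_i$ is the class-$i$ probability mass of the vicinity; each $a_i$ has a flat prior, so its posterior is Gamma with shape $k_i + 1$ and rate $N_i$; and, with equal class priors for the test instance, $P(\text{class } 1 \mid a_1, a_2) = a_1 / (a_1 + a_2)$. The function name is hypothetical.

```python
import random

def p_class1_monte_carlo(n1, n2, k1, k2, draws=100_000, seed=0):
    """Monte Carlo estimate of P(test instance is class 1 | N1, N2, k1, k2).

    Assumed model (illustrative, not from the slides):
      k_i ~ Poisson(N_i * a_i), flat prior on a_i -> a_i | k_i ~ Gamma(k_i + 1, rate N_i)
      P(class 1 | a_1, a_2) = a_1 / (a_1 + a_2)   (equal class priors)
    """
    rng = random.Random(seed)
    total = 0.0
    for _ in range(draws):
        # random.gammavariate takes (shape, scale); scale = 1 / rate.
        a1 = rng.gammavariate(k1 + 1, 1.0 / n1)
        a2 = rng.gammavariate(k2 + 1, 1.0 / n2)
        total += a1 / (a1 + a2)
    return total / draws

print(round(p_class1_monte_carlo(100, 100, 3, 2), 3))  # should be close to 4/7
print(round(p_class1_monte_carlo(200, 100, 3, 2), 3))  # should be below 1/2
```

Under this model, for $N_1 = N_2$ the ratio $a_1/(a_1 + a_2)$ is Beta$(k_1+1, k_2+1)$ distributed, so the estimate approaches $(k_1+1)/(k+2)$, about 0.57 for a 3:2 vote. Doubling $N_1$ while keeping the 3:2 counts pushes the estimate below 1/2.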


5

The answer

Two derivations:

1) Within the kernel density estimation framework: a fixed vicinity of the test instance determines the number of neighbors.

2) Within the kNN framework: a fixed number of neighbors determines the size of the vicinity.

Both approaches lead to the same result:

$$P(\text{class } 1) = \frac{k_1 + 1}{k_1 + k_2 + 2}\, {}_2F_1\!\left(1,\; k_2 + 1;\; k_1 + k_2 + 3;\; 1 - \frac{N_1}{N_2}\right),$$

where ${}_2F_1$ is the Gauss hypergeometric function.

• Quantifies the reliability of the class assignment for each individual test instance, depending only on the (known) training set data.

• Properly accounts for the complications that arise when the two classes have different numbers of training instances, i.e. $N_1 \neq N_2$.

For $N_1 = N_2$, this simplifies to:

$$P(\text{class } 1) = \frac{k_1 + 1}{k_1 + k_2 + 2}, \qquad \frac{P(\text{class } 1)}{P(\text{class } 2)} = \frac{k_1 + 1}{k_2 + 1}.$$
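The general formula on this slide can be evaluated with a short sketch. It sums the power series of ${}_2F_1(1, b; c; z)$, applies a Pfaff transformation for $z < -1/2$, and uses the complement $P(\text{class } 1) = 1 - P(\text{class } 2)$ when $N_1 < N_2$, so that the series argument stays where it converges. The function names are hypothetical. The cross-check assumes the Beta-posterior reading of the formula, $P(\text{class } 1) = \mathrm{E}\left[t / (t + (1-t) N_1/N_2)\right]$ with $t \sim \mathrm{Beta}(k_1+1, k_2+1)$. For production use, a library routine such as `scipy.special.hyp2f1` would be the safer choice.

```python
import math

def hyp2f1_1(b, c, z, tol=1e-15, max_terms=1_000_000):
    """Gauss hypergeometric 2F1(1, b; c; z); intended for z <= 0.

    Series: sum_n (b)_n / (c)_n * z**n, since (1)_n = n!.
    For z < -1/2 the Pfaff transformation
        2F1(1, b; c; z) = 2F1(1, c - b; c; z / (z - 1)) / (1 - z)
    maps the argument into (1/3, 1). Convergence slows for extreme
    class ratios (argument close to 1).
    """
    if z < -0.5:
        return hyp2f1_1(c - b, c, z / (z - 1.0), tol, max_terms) / (1.0 - z)
    total, term = 1.0, 1.0
    for n in range(max_terms):
        term *= (b + n) / (c + n) * z
        total += term
        if abs(term) <= tol * abs(total):
            break
    return total

def p_class1(n1, n2, k1, k2):
    """P(class 1) = (k1+1)/(k1+k2+2) * 2F1(1, k2+1; k1+k2+3; 1 - N1/N2)."""
    if n1 < n2:
        # Swap the class roles so the series argument is <= 0; under the
        # Beta-posterior reading the two class probabilities sum to 1.
        return 1.0 - p_class1(n2, n1, k2, k1)
    k = k1 + k2
    return (k1 + 1) / (k + 2) * hyp2f1_1(k2 + 1, k + 3, 1.0 - n1 / n2)

def p_class1_integral(n1, n2, k1, k2, m=20_000):
    """Cross-check: E[t / (t + (1 - t) * N1/N2)] with t ~ Beta(k1+1, k2+1),
    computed with the midpoint rule on m subintervals."""
    r = n1 / n2
    log_beta = math.lgamma(k1 + 1) + math.lgamma(k2 + 1) - math.lgamma(k1 + k2 + 2)
    h = 1.0 / m
    total = 0.0
    for i in range(m):
        t = (i + 0.5) * h
        density = math.exp(k1 * math.log(t) + k2 * math.log1p(-t) - log_beta)
        total += t / (t + (1.0 - t) * r) * density
    return total * h

for name, (k1, k2) in [("A", (5, 0)), ("B", (3, 2)), ("C", (0, 5))]:
    print(name, round(p_class1(100, 100, k1, k2), 3))  # A 0.857, B 0.571, C 0.143

print(round(p_class1(200, 100, 3, 2), 3), round(p_class1_integral(200, 100, 3, 2), 3))
```

For $N_1 = N_2$ the three test instances of the figure get 6/7, 4/7 and 1/7, so the 5:0 vote is indeed more reliable than the 3:2 vote. With $N_1 = 2N_2$, the same 3:2 vote yields a probability below 1/2, reflecting the over-representation of class 1 in the training set.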