1 estimating the reliability of the knn classification maxim tsypin and heinrich röder biodesix,...

Estimating the reliability of the kNN classification

Maxim Tsypin and Heinrich Röder

Biodesix, Steamboat Springs, CO

1. kNN classification

2. Problems with kNN

3. Estimating the reliability of the kNN classification

k-Nearest Neighbor (kNN) classification

• Two classes

• Training set: N1 instances of class 1 N2 instances of class 2

• Each instance is characterized by d values, and is represented by a point in d-dimensional space

• k nearest neighbors of the test instance:

k1 instances of class 1

1( )dx x x

k = 5A: 5:0B: 3:2C: 0:5

Problems of simple kNN

• Works properly only when N1 = N2 . Adding to the training set more instances of a given class would bias classification results in favor of this class.

• No information on the confidence of class assignment for the individual test instances. Intuitively, the confidence of class assignment in the 5:0 case should be greater than in the 3:2 case.

k = 5A: 5:0B: 3:2C: 0:5

The question

• Two classes

• Training set: N1 instances of class 1 N2 instances of class 2

• k nearest neighbors of a given test instance:

k = k1 + k2

Given N1 , N2 , k1 , k2 , what is the probability that this test instance belongs to class 1 ? x1

k = 5A: 5:0B: 3:2C: 0:5

The answerTwo derivations:1) within the kernel density estimation framework: a fixed vicinity of the

test instance determines the number of neighbors.2) within the kNN framework: a fixed number of neighbors determines

the size of the vicinity.Both approaches lead to the same result:

For N1 = N2, this simplifies to:

• Quantifies the reliability of class assignment for each individual test instance, depending only on the (known) training set data.

• Properly accounts for complications arising when the numbers of training instances in the two classes are different, i.e. N1 ≠ N2 .

12 1 2 1 2

1(class 1) 1, 1; 3;1 .

kP F k k k

1 1(class 1)(class 1) , .

2 (class 2) 1

k k P k

1 estimating the reliability of the knn classification maxim tsypin and heinrich röder biodesix,...

Documents

sweet knn: an efﬁcient knn on gpu through reconciliation...

dx140w - knn cambodia

nearest-neighbor classifierweb.iitd.ac.in/~bspanda/knn...

kato mivule: an investigation of data privacy and utility...

classification of knot defect types using wavelets and knn

knn matting

knn imputation

1 peter fox data analytics – itws-4963/itws-6965 week 5a,...

examples of classification methods csit5210. content knn...

presentación - knn.20085205

latihan knn

defect classification in ndt applications using time ... ·...

knn (k-nearest neighbors)

konseplearning naivebayes knn

12 knn perceptron

cs7267 machine learning - kennesaw state...

lecture 11: cross validation - stanford universityscenario2...

hog feature extraction and knn classification for

knn - parzen pencereleri

rooftop solar-knn