
Page 1:

K-nearest neighbor methods

William Cohen

10-601 April 2008

Page 2:

But first….

[Figure: scatter plot of Age in Years (y-axis, 0–50) against Number of Publications (x-axis, 0–160), with a univariate regression fit]

Page 3:

Onward: multivariate linear regression

Univariate:

$\mathbf{x} = (x_1, \ldots, x_n)$, $\mathbf{y} = (y_1, \ldots, y_n)$

$\hat{y}(x) = \hat{w}\,x$, with $\hat{w} = (\mathbf{x}^T\mathbf{x})^{-1}\mathbf{x}^T\mathbf{y}$

Multivariate:

$X = \begin{pmatrix} x_{11} & \ldots & x_{1k}\\ \vdots & & \vdots\\ x_{n1} & \ldots & x_{nk}\end{pmatrix}$ (row is example, col is feature), $\mathbf{y} = (y_1, \ldots, y_n)$

$\hat{y}(\mathbf{x}) = \hat{w}_1 x_1 + \ldots + \hat{w}_k x_k = \mathbf{x}^T\hat{\mathbf{w}}$, with $\hat{\mathbf{w}} = (X^T X)^{-1} X^T \mathbf{y}$

In both cases the fit minimizes squared error: $\hat{\mathbf{w}} = \arg\min_{\mathbf{w}} \sum_i [\hat{y}_{\mathbf{w}}(x_i) - y_i]^2$, where $\hat{y}_{\mathbf{w}}(x_i) = \mathbf{w}^T x_i$.
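A minimal sketch of the closed-form fit above (assuming NumPy; the synthetic data and variable names are illustrative, not from the lecture):

```python
import numpy as np

# Illustrative data: n examples (rows), k features (columns).
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))              # row is example, col is feature
true_w = np.array([2.0, -1.0, 0.5])
y = X @ true_w + rng.normal(scale=0.1, size=100)

# Closed-form least squares: w_hat = (X^T X)^{-1} X^T y.
# Solving the normal equations is more stable than forming the inverse explicitly.
w_hat = np.linalg.solve(X.T @ X, X.T @ y)

y_hat = X @ w_hat                          # predictions: y_hat(x) = x^T w_hat
print(w_hat)                               # should be close to true_w
```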

Page 4:

[Figure: scatter plot of Y against X]

Page 5:

Page 6:

Page 7:

ACM Computing Surveys 2002

Page 8:

Page 9:

Review of K-NN methods (so far)

Page 10:

Kernel regression

• aka locally weighted regression, locally linear regression, LOESS, …

What does making the kernel wider do to bias and variance?
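A minimal Nadaraya-Watson style sketch of kernel regression (assuming NumPy; the Gaussian kernel and the bandwidth h are illustrative choices). Making the kernel wider (larger h) averages over more distant points, which raises bias and lowers variance; a narrow kernel does the opposite.

```python
import numpy as np

def kernel_regression(x_query, X_train, y_train, h=1.0):
    """Predict y at x_query as a kernel-weighted average of the training targets."""
    dists = np.linalg.norm(X_train - x_query, axis=1)
    weights = np.exp(-(dists ** 2) / (2 * h ** 2))   # Gaussian kernel, width h
    return np.sum(weights * y_train) / np.sum(weights)

# Illustrative 1-D usage.
X_train = np.linspace(0, 10, 50).reshape(-1, 1)
y_train = np.sin(X_train).ravel()
print(kernel_regression(np.array([3.0]), X_train, y_train, h=0.5))
```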

Page 11:

BellCore's MovieRecommender

• Participants sent email to [email protected]
• System replied with a list of 500 movies to rate on a 1–10 scale (250 random, 250 popular)
  – Only a subset needs to be rated
• New participant P sends in rated movies via email
• System compares ratings for P to ratings of (a random sample of) previous users
• Most similar users are used to predict scores for unrated movies (more later)
• System returns recommendations in an email message.

Page 12:

Suggested Videos for: John A. Jamus.

Your must-see list with predicted ratings:

•7.0 "Alien (1979)"

•6.5 "Blade Runner"

•6.2 "Close Encounters Of The Third Kind (1977)"

Your video categories with average ratings:

•6.7 "Action/Adventure"

•6.5 "Science Fiction/Fantasy"

•6.3 "Children/Family"

•6.0 "Mystery/Suspense"

•5.9 "Comedy"

•5.8 "Drama"

Page 13:

The viewing patterns of 243 viewers were consulted. Patterns of 7 viewers were found to be most similar. Correlation with target viewer:

•0.59 viewer-130 ([email protected])

•0.55 bullert,jane r ([email protected])

•0.51 jan_arst ([email protected])

•0.46 Ken Cross ([email protected])

•0.42 rskt ([email protected])

•0.41 kkgg ([email protected])

•0.41 bnn ([email protected])

By category, their joint ratings recommend:

•Action/Adventure:

•"Excalibur" 8.0, 4 viewers

•"Apocalypse Now" 7.2, 4 viewers

•"Platoon" 8.3, 3 viewers

•Science Fiction/Fantasy:

•"Total Recall" 7.2, 5 viewers

•Children/Family:

•"Wizard Of Oz, The" 8.5, 4 viewers

•"Mary Poppins" 7.7, 3 viewers

•Mystery/Suspense:

•"Silence Of The Lambs, The" 9.3, 3 viewers

•Comedy:

•"National Lampoon's Animal House" 7.5, 4 viewers

•"Driving Miss Daisy" 7.5, 4 viewers

•"Hannah and Her Sisters" 8.0, 3 viewers

•Drama:

•"It's A Wonderful Life" 8.0, 5 viewers

•"Dead Poets Society" 7.0, 5 viewers

•"Rain Man" 7.5, 4 viewers

Correlation of predicted ratings with your actual ratings is: 0.64. This number measures ability to evaluate movies accurately for you. 0.15 means low ability. 0.85 means very good ability. 0.50 means fair ability.

Page 14:

Algorithms for Collaborative Filtering 1: Memory-Based Algorithms (Breese et al, UAI98)

• vi,j = vote of user i on item j

• Ii = items for which user i has voted

• Mean vote for i is $\bar{v}_i = \frac{1}{|I_i|}\sum_{j \in I_i} v_{i,j}$

• Predicted vote for "active user" a is the weighted sum $p_{a,j} = \bar{v}_a + \kappa \sum_{i=1}^{n} w(a,i)\,(v_{i,j} - \bar{v}_i)$, where the $w(a,i)$ are the weights of the n most similar users and $\kappa$ is a normalizer.
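A minimal sketch of the memory-based prediction above (assuming NumPy; Pearson correlation is used as the weight w(a,i), one of the options in Breese et al., and the variable names are illustrative):

```python
import numpy as np

def predict_vote(a, j, V, rated):
    """Predict user a's vote on item j.
    V[i, j] = vote of user i on item j; rated[i, j] = True iff user i voted on item j."""
    mean = np.array([V[i, rated[i]].mean() for i in range(V.shape[0])])  # mean vote per user
    num, denom = 0.0, 0.0
    for i in range(V.shape[0]):
        if i == a or not rated[i, j]:
            continue
        common = rated[a] & rated[i]                 # items both users have voted on
        if common.sum() < 2:
            continue
        w = np.corrcoef(V[a, common], V[i, common])[0, 1]   # weight w(a, i)
        if np.isnan(w):
            continue
        num += w * (V[i, j] - mean[i])
        denom += abs(w)                              # the normalizer sums |w(a, i)|
    return mean[a] + (num / denom if denom > 0 else 0.0)
```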

Page 15:

Basic k-nearest neighbor classification

• Training method:
  – Save the training examples
• At prediction time:
  – Find the k training examples (x1,y1),…,(xk,yk) that are closest to the test example x
  – Predict the most frequent class among those yi's.

• Example: http://cgm.cs.mcgill.ca/~soss/cs644/projects/simard/
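A minimal sketch of this procedure (assuming NumPy; Euclidean distance and the toy data are illustrative):

```python
import numpy as np
from collections import Counter

def knn_predict(x, X_train, y_train, k=3):
    """Classify x as the most frequent class among its k nearest training examples."""
    dists = np.linalg.norm(X_train - x, axis=1)   # distance to every training example
    nearest = np.argsort(dists)[:k]               # indices of the k closest examples
    return Counter(y_train[i] for i in nearest).most_common(1)[0][0]

# Illustrative usage.
X_train = np.array([[0.0, 0.0], [0.1, 0.2], [1.0, 1.0], [0.9, 1.1]])
y_train = np.array(["o", "o", "+", "+"])
print(knn_predict(np.array([0.2, 0.1]), X_train, y_train, k=3))   # likely "o"
```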

Page 16:

What is the decision boundary?

Voronoi diagram

Page 17:

Convergence of 1-NN

[Figure: a test point x with true label y, its nearest neighbor x1 with label y1 and a second neighbor x2 with label y2, with the class posteriors P(Y|x) and P(Y|x1) marked]

As the training set grows, the nearest neighbor x1 falls arbitrarily close to x, so assume the posteriors are equal: $P(Y|x_1) = P(Y|x)$. Let $y^* = \arg\max_y \Pr(y|x)$.

$P(\text{1-NN error}) = \Pr(y \neq y_1) = 1 - \sum_{y'} \Pr(y'|x)\,\Pr(y'|x_1)$

$= 1 - \sum_{y'} \Pr(y'|x)^2$

$\leq 1 - \Pr(y^*|x)^2$

$\leq 2\,(1 - \Pr(y^*|x))$

$= 2 \times (\text{Bayes optimal error rate})$
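A small simulation of the bound (a sketch under an assumed 1-D two-class setup with Gaussian class-conditional densities; none of this data comes from the lecture). With enough training points, the 1-NN test error should land between the Bayes error and roughly twice the Bayes error.

```python
import numpy as np

rng = np.random.default_rng(1)

def sample(n):
    """Two equally likely classes; class 1 is N(+1, 1), class 0 is N(-1, 1)."""
    y = rng.integers(0, 2, size=n)
    x = rng.normal(loc=np.where(y == 1, 1.0, -1.0), scale=1.0)
    return x, y

x_train, y_train = sample(2000)
x_test, y_test = sample(2000)

# 1-NN: each test point takes the label of its single closest training point.
nearest = np.abs(x_test[:, None] - x_train[None, :]).argmin(axis=1)
err_1nn = np.mean(y_train[nearest] != y_test)

# Bayes-optimal rule for this setup: predict class 1 iff x > 0.
err_bayes = np.mean((x_test > 0).astype(int) != y_test)

print(err_1nn, err_bayes, 2 * err_bayes)   # expect err_bayes <= err_1nn <= ~2 * err_bayes
```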

Page 18:

Basic k-nearest neighbor classification

• Training method:
  – Save the training examples
• At prediction time:
  – Find the k training examples (x1,y1),…,(xk,yk) that are closest to the test example x
  – Predict the most frequent class among those yi's.
• Improvements:
  – Weighting examples from the neighborhood
  – Measuring "closeness"
  – Finding "close" examples in a large training set quickly

Page 19:

K-NN and irrelevant features

+ ++ ++ + + +oo o ooo ooooo ooo oo oo?

Page 20:

K-NN and irrelevant features

[Figure: the same + and o examples scattered in two dimensions after an irrelevant feature is added, with the test point ?]

Page 21:

K-NN and irrelevant features

[Figure: the + and o examples and the test point ? shown again in two dimensions]

Page 22:

Ways of rescaling for KNN

Normalized L1 distance:

Scale by IG:

Modified value difference metric:
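One common way to instantiate the first two ideas is sketched below (per-feature range normalization inside an L1 distance, and weighting binary features by information gain), assuming NumPy; the exact definitions on the slide may differ.

```python
import numpy as np

def normalized_l1(x, z, feat_min, feat_max):
    """L1 distance with each feature rescaled by its observed range (one common choice)."""
    span = np.where(feat_max > feat_min, feat_max - feat_min, 1.0)  # guard constant features
    return np.sum(np.abs(x - z) / span)

def info_gain_weights(X_binary, y):
    """Weight each binary feature by its information gain with the class label;
    the weights can then multiply the per-feature differences inside a distance."""
    def entropy(labels):
        _, counts = np.unique(labels, return_counts=True)
        p = counts / counts.sum()
        return -np.sum(p * np.log2(p))
    base = entropy(y)
    weights = []
    for col in X_binary.T:
        on = col == 1
        h = 0.0
        if on.any():
            h += on.mean() * entropy(y[on])
        if (~on).any():
            h += (~on).mean() * entropy(y[~on])
        weights.append(base - h)
    return np.array(weights)
```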

Page 23:

Ways of rescaling for KNN

Dot product:

Cosine distance:

TFIDF weights for text: for doc j, feature i: $x_i = \mathrm{tf}_{i,j} \cdot \mathrm{idf}_i$, where

$\mathrm{tf}_{i,j}$ = # occurrences of term i in doc j

$\mathrm{idf}_i = \log\!\left(\dfrac{\#\text{docs in corpus}}{\#\text{docs in corpus that contain term } i}\right)$
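A minimal TF-IDF plus cosine-similarity sketch (assuming NumPy and a tiny made-up corpus; this uses the plain log ratio for idf, and the exact variant on the slide may differ):

```python
import numpy as np
from collections import Counter

docs = [["alien", "blade", "runner"], ["blade", "runner", "runner"], ["total", "recall", "alien"]]
vocab = sorted({t for d in docs for t in d})
n_docs = len(docs)

# idf_i = log(#docs in corpus / #docs in corpus that contain term i)
df = Counter(t for d in docs for t in set(d))
idf = np.array([np.log(n_docs / df[t]) for t in vocab])

def tfidf(doc):
    tf = Counter(doc)                                   # tf_{i,j}: occurrences of term i in doc j
    return np.array([tf[t] for t in vocab]) * idf

def cosine(u, v):
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))

print(cosine(tfidf(docs[0]), tfidf(docs[1])))           # docs 0 and 1 share "blade", "runner"
```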

Page 24:

Combining distances to neighbors

Standard KNN:

$C(D, y) = |\{(x', y') \in D : y' = y\}|$

$\hat{y} = \arg\max_y C(\mathrm{Neighbors}(x), y)$

Distance-weighted KNN:

$C(D, y) = \sum_{(x', y') \in D\,:\,y' = y} \mathrm{SIM}(x, x')$  or  $C(D, y) = \sum_{(x', y') \in D\,:\,y' = y} \frac{1}{1 - \mathrm{SIM}(x, x')}$

$\hat{y} = \arg\max_y C(\mathrm{Neighbors}(x), y)$

where SIM(x, x') is a similarity between x and x' (larger for closer examples).
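A minimal sketch of the distance-weighted vote (assuming NumPy; the similarity 1/(1 + distance) is an illustrative stand-in for whichever SIM the slide intends):

```python
import numpy as np
from collections import defaultdict

def weighted_knn_predict(x, X_train, y_train, k=5):
    """Each of the k nearest neighbors votes with weight SIM(x, x') instead of weight 1."""
    dists = np.linalg.norm(X_train - x, axis=1)
    nearest = np.argsort(dists)[:k]
    votes = defaultdict(float)
    for i in nearest:
        votes[y_train[i]] += 1.0 / (1.0 + dists[i])   # closer neighbors get larger weight
    return max(votes, key=votes.get)
```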

Page 25:

Page 26:

Page 27:

William W. Cohen & Haym Hirsh (1998): Joins that Generalize: Text Classification Using WHIRL. In KDD 1998: 169–173.

Page 28:

Page 29:

Page 30:

[Figure: two panels labeled M1 and M2]

Vitor Carvalho and William W. Cohen (2008): Ranking Users for Intelligent Message Addressing. In ECIR-2008; and current work with Vitor, me, and Ramnath Balasubramanyan.

Page 31:

Computing KNN: pros and cons

• Storage: all training examples are saved in memory
  – A decision tree or linear classifier is much smaller
• Time: to classify x, you need to loop over all training examples (x', y') to compute the distance between x and x'.
  – However, you get predictions for every class y
    • KNN is nice when there are many, many classes
  – Actually, there are some tricks to speed this up… especially when data is sparse (e.g., text)

Page 32:

Efficiently implementing KNN (for text)

IDF is nice computationally
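One way the sparsity trick is often implemented: keep an inverted index over the nonzero TF-IDF weights, so a query only touches documents that share at least one term with it. A hedged sketch (plain Python; names and structure are illustrative, not the lecture's implementation):

```python
from collections import defaultdict

def build_index(doc_vectors):
    """doc_vectors: list of {term: weight} sparse TF-IDF vectors.
    Returns an inverted index: term -> [(doc_id, weight), ...]."""
    index = defaultdict(list)
    for doc_id, vec in enumerate(doc_vectors):
        for term, w in vec.items():
            index[term].append((doc_id, w))
    return index

def knn_scores(query_vec, index):
    """Accumulate dot-product scores only for documents sharing a term with the query."""
    scores = defaultdict(float)
    for term, qw in query_vec.items():
        for doc_id, dw in index.get(term, []):
            scores[doc_id] += qw * dw
    return sorted(scores.items(), key=lambda kv: -kv[1])   # highest score first
```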

Page 33:

Tricks with fast KNN

K-means using r-NN

1. Pick k points c1=x1, …, ck=xk as centers
2. For each ci, find Di = Neighborhood(ci)
3. For each ci, let ci = mean(Di)
4. Go to step 2 …
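A plain k-means sketch of this loop (assuming NumPy). The assignment in step 2 is done by brute force here; the point of the slide is that a fast nearest-neighbor (r-NN) structure can replace that brute-force search.

```python
import numpy as np

def kmeans(X, k, n_iter=10, seed=0):
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), size=k, replace=False)].astype(float)  # step 1
    for _ in range(n_iter):
        # Step 2: D_i = the points whose nearest center is c_i
        # (brute-force distances here; a fast NN index is where the speedup comes from).
        dists = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
        assign = dists.argmin(axis=1)
        # Step 3: move each center to the mean of its neighborhood.
        for i in range(k):
            if np.any(assign == i):
                centers[i] = X[assign == i].mean(axis=0)
    return centers, assign
```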

Page 34:

Efficiently implementing KNN

[Figure: documents dj2, dj3, dj4]

Selective classification: given a training set and test set, find the N test cases that you can most confidently classify

Page 35:

Train once and select 100 test cases to classify
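A hedged sketch of one way to pick the most confidently classified test cases: score each test case by the margin between its top two distance-weighted votes and keep the N largest (assuming NumPy; the confidence measure is an illustrative choice, not necessarily the lecture's).

```python
import numpy as np
from collections import defaultdict

def select_most_confident(X_test, X_train, y_train, k=5, n_select=100):
    """Return indices of the n_select test cases with the largest vote margin."""
    margins = []
    for x in X_test:
        dists = np.linalg.norm(X_train - x, axis=1)
        nearest = np.argsort(dists)[:k]
        votes = defaultdict(float)
        for i in nearest:
            votes[y_train[i]] += 1.0 / (1.0 + dists[i])
        ranked = sorted(votes.values(), reverse=True)
        margin = ranked[0] - (ranked[1] if len(ranked) > 1 else 0.0)
        margins.append(margin)                        # big margin => confident prediction
    return np.argsort(margins)[::-1][:n_select]
```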