
Page 1:

Clustering, K-Means, andK-Nearest Neighbors

CMSC 678UMBC

March 7th, 2018

Most slides courtesy Hamed Pirsiavash

Page 2:

Recap from last time…

Page 3:

Geometric Rationale of LDiscA & PCA

Objective: to rigidly rotate the axes of the D-dimensional space to new positions (principal axes):

ordered such that principal axis 1 has the highest variance, axis 2 has the next highest variance, .... , and axis D has the lowest variance

covariance among each pair of the principal axes is zero (the principal axes are uncorrelated)

Courtesy Antano Žilinsko

Page 4:

L-Dimensional PCA

1. Compute mean $\mu$, priors, and common covariance $\Sigma$
2. Sphere the data (zero-mean, unit covariance)
3. Compute the (top L) eigenvectors from the sphere-d data, via $V$
4. Project the data

$\mu = \frac{1}{N} \sum_i x_i$

$\Sigma = \frac{1}{N} \sum_{i:\, y_i = k} (x_i - \mu)(x_i - \mu)^T$

$X^* = V D_B V^T$

Page 5:

Outline

Clustering basics

K-means: basic algorithm & extensions

Cluster evaluation

Non-parametric mode finding: density estimation

Graph & spectral clustering

Hierarchical clustering

K-Nearest Neighbor

Page 6:

Basic idea: group together similar instances
Example: 2D points

Clustering

Page 7:

Basic idea: group together similar instances
Example: 2D points

One option: small Euclidean distance (squared)

Clustering results are crucially dependent on the measure of similarity (or distance) between points to be clustered

Clustering

Page 8:

Simple clustering: organize elements into k groups

K-means
Mean shift
Spectral clustering

Hierarchical clustering: organize elements into a hierarchy

Bottom up - agglomerative
Top down - divisive

Clustering algorithms

Page 9:

Clustering examples: Image Segmentation

image credit: Berkeley segmentation benchmark

Page 10:

Clustering news articles

Clustering examples: News Feed

Page 11:

Clustering queries

Clustering examples: Image Search

Page 12:

Outline

Clustering basics

K-means: basic algorithm & extensions

Cluster evaluation

Non-parametric mode finding: density estimation

Graph & spectral clustering

Hierarchical clustering

K-Nearest Neighbor

Page 13:

Clustering using k-means

Data: D-dimensional observations (x1, x2, …, xn)

Goal: partition the n observations into k (≤ n) sets S = {S1, S2, …, Sk} so as to minimize the within-cluster sum of squared distances

cluster center
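The within-cluster objective referred to above (where the annotated "cluster center" is the mean $\mu_j$ of set $S_j$) is commonly written as:

$$\arg\min_{S} \sum_{j=1}^{k} \sum_{x \in S_j} \lVert x - \mu_j \rVert^2, \qquad \mu_j = \frac{1}{|S_j|} \sum_{x \in S_j} x$$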

Page 14:

Lloyd’s algorithm for k-means

Initialize k centers by picking k points randomly among all the points
Repeat till convergence (or max iterations):

Assign each point to the nearest center (assignment step)

Estimate the mean of each group (update step)

https://www.csee.umbc.edu/courses/graduate/678/spring18/kmeans/
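A minimal NumPy sketch of the assignment/update loop described on this slide (this is not the linked course demo; the function name and defaults are illustrative):

```python
import numpy as np

def lloyd_kmeans(X, k, max_iters=100, seed=0):
    """Basic Lloyd's algorithm: X is an (n, d) array, k the number of clusters."""
    rng = np.random.default_rng(seed)
    # Initialize k centers by picking k points at random among all the points
    centers = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(max_iters):
        # Assignment step: assign each point to the nearest center
        dists = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # Update step: estimate the mean of each group (keep the old center if a group is empty)
        new_centers = np.array([X[labels == j].mean(axis=0) if np.any(labels == j)
                                else centers[j] for j in range(k)])
        if np.allclose(new_centers, centers):  # converged: partitions stopped changing
            break
        centers = new_centers
    return labels, centers
```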

Page 15:

Guaranteed to converge in a finite number of iterations:
the objective decreases monotonically; a local minimum is reached once the partitions stop changing, and since there are only finitely many partitions, the k-means algorithm must converge

Running time per iteration:
Assignment step: O(NKD)
Computing cluster means: O(ND)

Issues with the algorithm:
Worst-case running time is super-polynomial in input size
No guarantees about global optimality

Optimal clustering even for 2 clusters is NP-hard [Aloise et al., 09]

Properties of Lloyd's algorithm

Page 16:

k-means++ algorithm

A way to pick good initial centers

Intuition: spread out the k initial cluster centers

The algorithm proceeds normally once the centers are initialized

[Arthur and Vassilvitskii’07] The approximation quality is O(log k) in expectation

k-means++ algorithm for initialization:

1. Choose one center uniformly at random among all the points

2. For each point x, compute D(x), the distance between x and the nearest center that has already been chosen

3. Choose one new data point at random as a new center, using a weighted probability distribution where a point x is chosen with probability proportional to D(x)²

4. Repeat Steps 2 and 3 until k centers have been chosen
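A sketch of this seeding procedure in NumPy (function and variable names are illustrative, not from the slides):

```python
import numpy as np

def kmeans_pp_init(X, k, seed=0):
    """k-means++ seeding: spread the initial centers out by sampling proportional to D(x)^2."""
    rng = np.random.default_rng(seed)
    # Step 1: choose the first center uniformly at random among all the points
    centers = [X[rng.integers(len(X))]]
    while len(centers) < k:
        # Step 2: D(x) = distance from each point to the nearest center chosen so far
        d2 = np.min(np.linalg.norm(X[:, None, :] - np.array(centers)[None, :, :],
                                   axis=2) ** 2, axis=1)
        # Step 3: sample a new center with probability proportional to D(x)^2
        centers.append(X[rng.choice(len(X), p=d2 / d2.sum())])
    return np.array(centers)  # Step 4: repeat until k centers have been chosen
```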

Page 17:

k-means for image segmentation


Grouping pixels based on intensity similarity

feature space: intensity value (1D)

(figure: segmentation results shown for K=2 and K=3)

Page 18:

Outline

Clustering basics

K-means: basic algorithm & extensions

Cluster evaluation

Non-parametric mode finding: density estimation

Graph & spectral clustering

Hierarchical clustering

K-Nearest Neighbor

Page 19:

Clustering Evaluation

(Classification: accuracy, recall, precision, F-score)

Greedy mapping: one-to-one

Optimistic mapping: many-to-one

Rigorous/information theoretic: V-measure

Page 20:

Clustering Evaluation: One-to-One

Each modeled cluster can map to at most one gold tag type, and vice versa

Greedily select the mapping to maximize accuracy

Page 21:

Clustering Evaluation: Many (classes)-to-One (cluster)

Each modeled cluster can map to at most one gold tag type,

but multiple clusters can map to the same gold tag

For each cluster: select the majority tag

Page 22:

Clustering Evaluation: V-Measure

Rosenberg and Hirschberg (2008): harmonic mean of homogeneity and completeness

entropy: $H(X) = -\sum_i p(x_i) \log p(x_i)$

Page 23:

Clustering Evaluation: V-Measure

Rosenberg and Hirschberg (2008): harmonic mean of homogeneity and completeness

entropy: $H(X) = -\sum_i p(x_i) \log p(x_i)$

entropy(point mass) = 0; entropy(uniform) = log K

Page 24:

Clustering Evaluation: V-Measure

Rosenberg and Hirschberg (2008): harmonic mean of homogeneity and completeness

Homogeneity: how well does each gold class map to a single cluster?

$$\text{homogeneity} = \begin{cases} 1 & \text{if } H(K, C) = 0 \\ 1 - \dfrac{H(C \mid K)}{H(C)} & \text{otherwise} \end{cases}$$

H(C | K) is maximized when a cluster provides no new information on the class grouping: not very homogeneous

(k indexes clusters, c indexes gold classes)

"In order to satisfy our homogeneity criteria, a clustering must assign only those datapoints that are members of a single class to a single cluster. That is, the class distribution within each cluster should be skewed to a single class, that is, zero entropy."

Page 25:

Clustering Evaluation: V-Measure

Rosenberg and Hirschberg (2008): harmonic mean of homogeneity and completeness

Completeness: how well does each learned cluster cover a single gold class?

$$\text{completeness} = \begin{cases} 1 & \text{if } H(K, C) = 0 \\ 1 - \dfrac{H(K \mid C)}{H(K)} & \text{otherwise} \end{cases}$$

H(K | C) is maximized when each class is represented (relatively) uniformly across clusters: not very complete

(k indexes clusters, c indexes gold classes)

"In order to satisfy the completeness criteria, a clustering must assign all of those datapoints that are members of a single class to a single cluster."

Page 26:

Clustering Evaluation: V-Measure

Rosenberg and Hirschberg (2008): harmonic mean of homogeneity and completeness

Homogeneity: how well does each gold class map to a single cluster?

Completeness: how well does each learned cluster cover a single gold class?

$$\text{homogeneity} = \begin{cases} 1 & \text{if } H(K, C) = 0 \\ 1 - \dfrac{H(C \mid K)}{H(C)} & \text{otherwise} \end{cases}$$

$$\text{completeness} = \begin{cases} 1 & \text{if } H(K, C) = 0 \\ 1 - \dfrac{H(K \mid C)}{H(K)} & \text{otherwise} \end{cases}$$

(k indexes clusters, c indexes gold classes)

Page 27:

Clustering Evaluation: V-Measure

Rosenberg and Hirschberg (2008): harmonic mean of homogeneity and completeness

Homogeneity: how well does each gold class map to a single cluster?

Completeness: how well does each learned cluster cover a single gold class?

$a_{ck}$ = # elements of class c in cluster k

$$\text{homogeneity} = \begin{cases} 1 & \text{if } H(K, C) = 0 \\ 1 - \dfrac{H(C \mid K)}{H(C)} & \text{otherwise} \end{cases} \qquad \text{completeness} = \begin{cases} 1 & \text{if } H(K, C) = 0 \\ 1 - \dfrac{H(K \mid C)}{H(K)} & \text{otherwise} \end{cases}$$

$$H(C \mid K) = -\sum_{k=1}^{K} \sum_{c=1}^{C} \frac{a_{ck}}{N} \log \frac{a_{ck}}{\sum_{c'} a_{c'k}} \qquad H(K \mid C) = -\sum_{c=1}^{C} \sum_{k=1}^{K} \frac{a_{ck}}{N} \log \frac{a_{ck}}{\sum_{k'} a_{ck'}}$$

Page 28:

Clustering Evaluation: V-Measure

Rosenberg and Hirschberg (2008): harmonic mean of homogeneity and completeness

Homogeneity: how well does each gold class map to a single cluster?

Completeness: how well does each learned cluster cover a single gold class?

$$H(C \mid K) = -\sum_{k=1}^{K} \sum_{c=1}^{C} \frac{a_{ck}}{N} \log \frac{a_{ck}}{\sum_{c'} a_{c'k}} \qquad H(K \mid C) = -\sum_{c=1}^{C} \sum_{k=1}^{K} \frac{a_{ck}}{N} \log \frac{a_{ck}}{\sum_{k'} a_{ck'}}$$

Example (rows are gold classes, columns are clusters):

a_ck    K=1  K=2  K=3
c=1      3    1    1
c=2      1    1    3
c=3      1    3    1

Homogeneity = Completeness = V-Measure = 0.14
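A quick Python check of this example, using the H(C|K) and H(K|C) definitions above (the count matrix `a` has gold classes as rows and clusters as columns):

```python
import numpy as np

a = np.array([[3., 1., 1.],
              [1., 1., 3.],
              [1., 3., 1.]])          # a[c, k] = # elements of class c in cluster k
N = a.sum()

H_C = -((a.sum(1) / N) * np.log(a.sum(1) / N)).sum()   # entropy of the gold classes
H_K = -((a.sum(0) / N) * np.log(a.sum(0) / N)).sum()   # entropy of the clusters
H_C_given_K = -np.sum(a / N * np.log(a / a.sum(0)))                    # H(C | K)
H_K_given_C = -np.sum(a / N * np.log(a / a.sum(1, keepdims=True)))     # H(K | C)

h = 1 - H_C_given_K / H_C       # homogeneity
c = 1 - H_K_given_C / H_K       # completeness
v = 2 * h * c / (h + c)         # V-measure: harmonic mean
print(round(h, 2), round(c, 2), round(v, 2))   # 0.14 0.14 0.14
```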

Page 29:

Outline

Clustering basics

K-means: basic algorithm & extensions

Cluster evaluation

Non-parametric mode finding: density estimation

Graph & spectral clustering

Hierarchical clustering

K-Nearest Neighbor

Page 30:

One issue with k-means is that it is sometimes hard to pick k.
The mean shift algorithm seeks modes, i.e. local maxima of density, in the feature space.
Mean shift automatically determines the number of clusters.

Clustering using density estimation

Kernel density estimator

Small h implies more modes (bumpy distribution)
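The kernel density estimator itself is not reproduced in the transcript; a standard form, with kernel $K$ and bandwidth $h$ over $D$-dimensional data, is:

$$\hat{f}(x) = \frac{1}{N} \sum_{i=1}^{N} \frac{1}{h^D} \, K\!\left(\frac{x - x_i}{h}\right)$$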

Page 31:

Mean shift algorithm

For each point x_i: set m_i = x_i

while not converged: compute

$$m_i \leftarrow \frac{\sum_j x_j \, K(m_i, x_j)}{\sum_j K(m_i, x_j)}$$

return {m_i}

self-clustering based on the kernel (similarity to other points)

Pros:
Does not assume a shape on clusters
Generic technique
Finds multiple modes
Parallelizable

Cons:
Slow: O(DN²) per iteration
Does not work well for high-dimensional features
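A rough NumPy sketch of the loop above with a Gaussian kernel (the bandwidth `h` and tolerance are illustrative choices, not from the slides):

```python
import numpy as np

def mean_shift(X, h=1.0, max_iters=100, tol=1e-5):
    """Move each point toward a mode of a Gaussian kernel density estimate."""
    M = X.copy()                                   # m_i starts at x_i
    for _ in range(max_iters):
        # kernel weights K(m_i, x_j) for every (mode, point) pair: O(D N^2)
        d2 = np.linalg.norm(M[:, None, :] - X[None, :, :], axis=2) ** 2
        K = np.exp(-d2 / (2 * h ** 2))
        # kernel-weighted mean of the data around each m_i
        M_new = (K @ X) / K.sum(axis=1, keepdims=True)
        if np.max(np.linalg.norm(M_new - M, axis=1)) < tol:
            break
        M = M_new
    return M   # points that end up at (approximately) the same mode form a cluster
```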

Page 32:

Mean shift clustering results

http://www.caip.rutgers.edu/~comanici/MSPAMI/msPamiResults.html

Page 33:

Outline

Clustering basics

K-means: basic algorithm & extensions

Cluster evaluation

Non-parametric mode finding: density estimation

Graph & spectral clustering

Hierarchical clustering

K-Nearest Neighbor

Page 34:

Spectral clustering

[Shi & Malik ‘00; Ng, Jordan, Weiss NIPS ‘01]

Page 35:

Group points based on the links in a graph

How do we create the graph?
Weights on the edges based on similarity between the points
A common choice is the Gaussian kernel

One could create:
A fully connected graph
A k-nearest graph (each node is connected only to its k-nearest neighbors)
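The Gaussian-kernel edge weight mentioned above is typically written with a scale hyperparameter $\sigma$:

$$w_{ij} = \exp\!\left(-\frac{\lVert x_i - x_j \rVert^2}{2\sigma^2}\right)$$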

Spectral clustering

(figure: an example graph partitioned into two groups, A and B)

Slide courtesy Alan Fern

Page 36:

Consider a partition of the graph into two parts A and B

Cut(A, B) is the weight of all edges that connect the two groups
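Written out with the edge weights W(i, j), this is:

$$\mathrm{Cut}(A, B) = \sum_{i \in A,\, j \in B} W(i, j)$$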

An intuitive goal is to find a partition that minimizes the cut
min-cuts in graphs can be computed in polynomial time

Graph cut

Page 37:

The weight of a cut is proportional to the number of edges in the cut; minimizing it tends to produce small, isolated components.

Problem with min-cut

[Shi & Malik, 2000 PAMI]

We would like a balanced cut

Page 38:

Let W(i, j) denote the matrix of edge weights. The degree of node i in the graph is:

The volume of a set A is defined as:
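The two definitions the colons point to are not reproduced in the transcript; the standard ones are:

$$d(i) = \sum_j W(i, j), \qquad \mathrm{Vol}(A) = \sum_{i \in A} d(i)$$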

Graphs as matrices

Page 39:

Normalized cut measures the connectivity between the groups relative to the volume of each group:
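The standard normalized cut objective (Shi & Malik, 2000), which this slide describes in words, is:

$$\mathrm{NCut}(A, B) = \frac{\mathrm{Cut}(A, B)}{\mathrm{Vol}(A)} + \frac{\mathrm{Cut}(A, B)}{\mathrm{Vol}(B)}$$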

Minimizing normalized cut is NP-Hard even for planar graphs [Shi & Malik, 00]

Normalized cut

minimized when Vol(A) = Vol(B): a balanced cut

Page 40:

W: the similarity matrix
D: a diagonal matrix with D(i, i) = d(i), the degree of node i
y: a vector in {1, -b}^N, with y(i) = 1 ⟺ i ∈ A (the value -b allows for a differing penalty on the two sides of the cut)

The matrix (D - W) is called the Laplacian of the graph

Solving normalized cuts

Page 41:

Normalized cuts objective:

Relax the integer constraint on y:

Same as a generalized eigenvalue problem:
the first eigenvector is y1 = 1, with corresponding eigenvalue 0
the eigenvector corresponding to the second smallest eigenvalue is the solution to the relaxed problem
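The formulas missing from this slide follow the standard Shi & Malik derivation: the relaxed objective is a Rayleigh quotient, and it leads to a generalized eigenvalue problem:

$$\min_y \frac{y^T (D - W)\, y}{y^T D\, y} \quad\Longrightarrow\quad (D - W)\, y = \lambda D\, y$$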

Solving normalized cuts

Page 42:

Hierarchical clustering


Page 43:

Outline

Clustering basics

K-means: basic algorithm & extensions

Cluster evaluation

Non-parametric mode finding: density estimation

Graph & spectral clustering

Hierarchical clustering

K-Nearest Neighbor

Page 44:

Agglomerative: a "bottom up" approach where elements start as individual clusters and clusters are merged as one moves up the hierarchy

Divisive: a "top down" approach where elements start as a single cluster and clusters are split as one moves down the hierarchy

Hierarchical clustering

Page 45:

Agglomerative clustering:
First merge very similar instances
Incrementally build larger clusters out of smaller clusters

Algorithm:
Maintain a set of clusters
Initially, each instance in its own cluster
Repeat:
Pick the two "closest" clusters
Merge them into a new cluster
Stop when there's only one cluster left

Produces not one clustering, but a family of clusterings represented by a dendrogram

Agglomerative clustering

Page 46:

How should we define “closest” for clusters with multiple elements?

Closest pair: single-link clustering
Farthest pair: complete-link clustering

Average of all pairs

Agglomerative clustering
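A short SciPy sketch contrasting these linkage choices on the same (synthetic) data; the `method` argument selects single-, complete-, or average-link clustering:

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster

rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0, 1, (20, 2)),      # two well-separated blobs
               rng.normal(5, 1, (20, 2))])

for method in ["single", "complete", "average"]:
    Z = linkage(X, method=method)                       # build the dendrogram bottom-up
    labels = fcluster(Z, t=2, criterion="maxclust")     # cut it into 2 flat clusters
    print(method, np.bincount(labels)[1:])              # cluster sizes per linkage choice
```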

Page 47:

Different choices create different clustering behaviors

Agglomerative clustering

Page 48:

Outline

Clustering basics

K-means: basic algorithm & extensions

Cluster evaluation

Non-parametric mode finding: density estimation

Graph & spectral clustering

Hierarchical clustering

K-Nearest Neighbor

Page 49:

Will Alice like the movie?
Alice and James are similar
James likes the movie
Alice must/might also like the movie

Represent data as vectors of feature values

Find closest (Euclidean norm) points

Nearest neighbor classifier

Page 50:

Nearest neighbor classifier

Training data is in the form of labeled examples (attributes, label). Fruit data:

label: {apples, oranges, lemons}
attributes: {width, height}

(figure: fruit data plotted by width and height)

Page 51:

Nearest neighbor classifier

test data

(figure: test point (a, b) is classified as lemon; test point (c, d) as apple)

Page 52:

k-Nearest neighbor classifier
Take majority vote among the k nearest neighbors

(figure: training data with an outlier)

Page 53:

k-Nearest neighbor classifier
Take majority vote among the k nearest neighbors

(figure: training data with an outlier)

What is the effect of k?

Page 54:

Decision boundaries: 1NN

Page 55:

Choice of features
We are assuming that all features are equally important
What happens if we scale one of the features by a factor of 100?

Choice of distance function
Euclidean, cosine similarity (angle), Gaussian, etc.
Should the coordinates be independent?

Choice of k

Inductive bias of the kNN classifier
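A compact NumPy k-NN classifier reflecting these choices (Euclidean distance, majority vote; integer class labels are assumed, and `np.argpartition` does the k-smallest selection in average linear time):

```python
import numpy as np

def knn_predict(X_train, y_train, X_test, k=3):
    """Classify each test point by majority vote among its k nearest training points."""
    preds = []
    for x in X_test:
        d = np.linalg.norm(X_train - x, axis=1)     # Euclidean distance to every training point
        nearest = np.argpartition(d, k)[:k]         # indices of the k smallest distances (unordered)
        votes = y_train[nearest]
        preds.append(np.bincount(votes).argmax())   # majority vote; ties go to the smaller label
    return np.array(preds)
```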

Page 56:

What is ?
Find all the windows in the image that match the neighborhood
To synthesize x:
pick one matching window at random
assign x to be the center pixel of that window

An exact match might not be present, so find the best matches using Euclidean distance and randomly choose between them, preferring better matches with higher probability

An example: Synthesizing one pixel

(figure: input image and synthesized image)

Slide from Alyosha Efros, ICCV 1999

Page 57:

kNN: Scene Completion

"Scene completion using millions of photographs", Hays and Efros, TOG 2007

Page 58:

Nearest neighbors

kNN: Scene Completion

"Scene completion using millions of photographs", Hays and Efros, TOG 2007

Page 59:

kNN: Scene Completion

"Scene completion using millions of photographs", Hays and Efros, TOG 2007

Page 60:

kNN: Scene Completion

"Scene completion using millions of photographs", Hays and Efros, TOG 2007

Page 61:

Time taken by kNN for N points of D dimensions:
time to compute distances: O(ND)
time to find the k nearest neighbors:
O(kN): repeated minima
O(N log N): sorting
O(N + k log N): min heap
O(N + k log k): fast median

Total time is dominated by distance computation

We can be faster if we are willing to sacrifice exactness

Practical issue when using kNN: speed

Page 62:

Practical issue when using kNN: Curse of dimensionality

How many neighborhoods are there?

d = 2: #bins = 10 x 10 = 100
d = 1000: #bins = 10^d = 10^1000

Atoms in the universe: ~10^80