
Understanding Thy Neighbors: Practical Perspectives from Modern Analysis

Sanjoy Dasgupta (CSE, UC San Diego)    Samory Kpotufe (ORFE, Princeton University)


k-Nearest Neighbor Approach:

Use the k closest datapoints to x to infer something about x.

Ubiquitous and Enduring in ML (implicit at times):

Traditional ML: Classification, Regression, Density Estimation, Bandits, Manifold Learning, Clustering ...

Modern ML: Matrix Completion, Inference on Graphs, Time Series Prediction ...

Of Practical Interest:

Which metric? Which values of k? Tradeoffs with big data?

A lot of recent insights towards these questions ...


Basic Intuition:

Closest neighbors of x should be mostly of similar type y = y(x) ...

[Figure: a query point x among labeled neighbors; prediction y ← 5]

Prediction: aggregate Y values in Neighborhood(x)

Similar Intuition: Classification Trees, RBF networks, Kernel machines.

Results by various authors help formalize the above intuition

Posner, Fix, Hodges, Cover, Hart, Devroye, Lugosi, Hero, Nobel, Györfi, Kulkarni, Ben-David, Shalev-Shwartz, Samworth, Gadat, H. Chen, Shah, Kpotufe, von Luxburg, Hein, Chaudhuri, Dasgupta, Langford, Kakade, Beygelzimer, Lee, Gray, Andoni, Clarkson, Krauthgamer, Indyk, ...


Cover both Statistical and Algorithmic Issues:

1. Statistical issues: how well can NN perform?

• When is 1-NN enough?

• For k-NN, what should k be?

• Is there always a curse of dimension?

2. Algorithmic issues: how efficient can NN be?

• Which data structure to use?

• Can we parallelize NN?

• What do we trade off?


Data representation is important:

Examples:

• Direct Euclidean

• Deep Neural Representation (image, speech)

• Word Embedding (text)

. . .

Representation ≡ choice of metric or dissimilarity ρ(x, x′)

Properties of ρ influence Statistical and Algorithmic aspects...


Tutorial Outline:

• PART I: Basic Statistical Insights

• PART II: Refined Analysis and Implementation


PART I: Basic Statistical Insights

• Universality

• Behavior of k-NN Distances

• From Regression to Classification

• Classification is easier than regression

• Multiclass and Mixed Costs


k-NN as a universal approach: it can fit anything, provided k grows (but not too fast) with sample size!

Let’s make this precise in the context of regression ...


k-NN Regression

i.i.d. Data: {(Xi, Yi)}_{i=1}^n, Y = f(X) + noise

Learn: fk(x) = avg (Yi) of k-NN(x).

k-NN is universally consistent:

Suppose k/n → 0 and k → ∞; then E|fk(X) − f(X)| → 0 as n → ∞

Any function f , no matter how complex.
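To make the statement concrete, here is a minimal numpy sketch (ours, not from the slides): fk(x) averages the Y-values of the k nearest training points, and k grows like log²n, one hypothetical schedule with k → ∞ and k/n → 0, on synthetic 1-d data with f(x) = sin(2πx).

    import numpy as np

    def f_k(X_train, Y_train, x, k):
        # fk(x) = average of the Y-values of the k nearest training points
        dists = np.linalg.norm(X_train - x, axis=1)   # Euclidean metric rho
        nearest = np.argsort(dists)[:k]
        return Y_train[nearest].mean()

    rng = np.random.default_rng(0)
    f = lambda X: np.sin(2 * np.pi * X[:, 0])         # true regression function
    for n in [100, 1000, 10000]:
        X = rng.uniform(0, 1, size=(n, 1))
        Y = f(X) + 0.3 * rng.standard_normal(n)
        k = max(1, int(np.log(n) ** 2))               # k -> inf while k/n -> 0
        X_test = rng.uniform(0, 1, size=(500, 1))
        err = np.mean([abs(f_k(X, Y, x, k) - np.sin(2 * np.pi * x[0]))
                       for x in X_test])
        print(f"n={n:6d}  k={k:3d}  E|fk(X) - f(X)| ~ {err:.3f}")

The printed error shrinks as n grows, with no modeling assumption on f beyond what the synthetic example fixes.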


Intuition:

Consider the k-NN {X(i)}_1^k of some x. As n ↗, they all move closer to x:

- {X(i)}_1^k → x as long as k is fixed or grows slowly (k/n → 0)

- Suppose f is continuous; then we also get {f(X(i))}_1^k → f(x)

- If k → ∞, then fk(x) = (1/k) ∑ (f(X(i)) + noise) → f(x)

Now, any f can be approximated arbitrarily well by continuous functions


Similar universality results for classification, density estimation, ...

Seminal results on k-NN consistency:

• [Fix, Hodges, 51]: classification + regularity, ℝ^d.

• [Cover, Hart, 65, 67, 68]: classification + regularity, any metric.

• [Stone, 77]: classification, universal, ℝ^d.

• [Devroye, Wagner, 77]: density estimation + regularity, ℝ^d.

• [Devroye, Györfi, Krzyżak, Lugosi, 94]: regression, universal, ℝ^d.

Main message: k should grow (not too fast) with n ... (e.g. k ∼ log n)

But we need a more refined picture ...


PART I: Basic Statistical Insights

• Universality

• Behavior of k-NN Distances

• From Regression to Classification

• Classification is easier than regression

• Multiclass and Mixed Costs


Why k-NN Distances?

Recall Intuition:

Closest neighbors of x should be mostly of similar type y = y(x) ...

So we hope that k-NN(x) are close to x ...

Formally: let rk(x) ≡ distance from x to its k-th NN in the i.i.d. sample {Xi}_1^n

Q: How small is rk(x), as a function of (PX, k, n)?


Fix x, and assume {Xi}_1^n i.i.d. ∼ PX with density pX on ℝ^d.

Bx ≡ B(x, rk(x)) ≡ smallest ball containing k-NN(x)

- Assume no ties: Pn(Bx) = k/n.

- w.h.p. Pn ≈ PX =⇒ PX(Bx) ≈ k/n.

Now: PX(Bx) ≡ ∫_{Bx} pX(x′) dx′ ≈ pX(x) · ∫_{Bx} dx′ ≈ pX(x) · rk(x)^d.

Therefore, w.h.p., rk(x) ≈ ( (1/pX(x)) · (k/n) )^{1/d}.
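A quick numerical check of this heuristic (our sketch, using the uniform density on [0,1]^d, so pX(x) = 1): the ratio rk(x) / (k/n)^{1/d} should stay roughly constant in k for each fixed d; the constant depends on the volume of the unit ball, which the derivation above absorbs.

    import numpy as np

    rng = np.random.default_rng(0)

    def r_k(X, x, k):
        # distance from x to its k-th nearest neighbor among the rows of X
        return np.sort(np.linalg.norm(X - x, axis=1))[k - 1]

    n = 20000
    for d in [1, 2, 5]:
        X = rng.uniform(0, 1, size=(n, d))
        x0 = np.full(d, 0.5)                    # a fixed interior point
        for k in [10, 100, 1000]:
            ratio = r_k(X, x0, k) / (k / n) ** (1 / d)
            print(f"d={d}  k={k:4d}  rk(x)/(k/n)^(1/d) ~ {ratio:.2f}")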


rk(x) ≈ ( (1/pX(x)) · (k/n) )^{1/d}

Immediate messages:

• rk(x) ↗ when the local density pX(x) ↘

• rk(x) ↗ when the input dimension d ↗

Use smaller k for higher-dimensional data ...

Curse of dimension: for rk ≈ ε we need n ≈ (1/ε)^d ... (e.g., ε = 0.1 in d = 10 already requires n ≈ 10^10)

Fortunately, the effective d can be small for high-dimensional X ∈ ℝ^D

Effective d is whichever d satisfies PX(B(x, r)) ≈ c · r^d

d = d(metric ρ; where X lies in ℝ^D)


Ex 1: Suppose X ∈ ℝ^D, but ρ(x, x′) ≈ ρ(x(1), x′(1)), i.e., ρ depends essentially on the first coordinate only ...

B(x, r) ≡ {x′ : ρ(x, x′) ≤ r}

PX(B(x, r)) ≈ r =⇒ effective d = 1


Ex 2: Suppose X ∈ ℝ^D, but lies on a d-dimensional space 𝒳 ...

Consider a ball B, of radius r, centered on 𝒳:

PX(B) ≈ pX · ∫_{B∩𝒳} dx ≈ pX · r^d

Thus we'd have rk(x) ≈ (k/n)^{1/d}, irrespective of D ≫ d.
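One practical consequence: the effective d can be read off from how rk(x) scales with k, since log rk(x) ≈ (1/d) log k + const. A rough sketch of this diagnostic (our own illustration, not from the slides), on 2-d data embedded isometrically in ℝ^50 by zero-padding:

    import numpy as np

    rng = np.random.default_rng(0)

    n, D, d_true = 20000, 50, 2
    Z = rng.uniform(0, 1, size=(n, d_true))
    X = np.zeros((n, D))
    X[:, :d_true] = Z                  # isometric embedding of 2-d data in R^50

    x0 = X[0]
    dists = np.sort(np.linalg.norm(X - x0, axis=1))[1:]   # drop x0 itself
    ks = np.array([8, 16, 32, 64, 128, 256])
    r = dists[ks - 1]                  # rk(x0) for each k

    # slope of log rk vs log k estimates 1/d (rough: one query point, boundary effects ignored)
    slope = np.polyfit(np.log(ks), np.log(r), 1)[0]
    print("estimated effective d:", 1 / slope)            # ~ 2, not 50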


Quick Simulations:

Embed (d = 2)-dimensional data into high-dimensional ℝ^D, letting D → ∞.

Fixing d = 2, average NN distances are stable as D varies.

[Figure: average NN distance vs. ambient dimension D, for data of intrinsic dimension d = 2]
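A toy version of this simulation (ours): generate intrinsically 2-d points, embed them in ℝ^D by a random orthonormal map (an isometry, so this is the idealized case), and watch the average 1-NN distance as D grows.

    import numpy as np

    rng = np.random.default_rng(0)

    n, d = 5000, 2
    Z = rng.standard_normal((n, d))                # intrinsically 2-d data
    for D in [2, 10, 100, 1000]:
        A = np.linalg.qr(rng.standard_normal((D, d)))[0]   # orthonormal columns
        X = Z @ A.T                                # isometric embedding into R^D
        nn = []
        for x in X[:200]:                          # subsample of query points
            dd = np.linalg.norm(X - x, axis=1)
            dd[dd == 0] = np.inf                   # exclude the point itself
            nn.append(dd.min())
        print(f"D={D:5d}  avg 1-NN distance ~ {np.mean(nn):.4f}")

Because the embedding preserves distances exactly, the average stays flat in D, matching the figure's message that the ambient dimension is irrelevant.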


Refined analysis for rk(x):

[J. Costa, A. Hero 04], [R. Samworth 12]

Implications:

rk(x) adaptive to d =⇒ NN methods adaptive to d ...

(d-sparse documents, images, robotics data on a d-manifold)


PART I: Basic Statistical Insights

• Universality

• Behavior of k-NN Distances

• From Regression to Classification

• Classification is easier than regression

• Multiclass and Mixed Costs


From bounds on rk(x) to error rates:

Program:

1. Regression bounds

2. Reduce Classification to Regression


k-NN Regression

Data: {(Xi, Yi)}_{i=1}^n, Y = f(X) + noise

Learn: fk(x) = avg (Yi) of k-NN(x).

Ideal Metric ρ: f(x) ≈ f(x′) if ρ(x, x′) ≈ 0

... e.g., assume f is Lipschitz: |f(x)− f(x′)| ≤ λ·ρ(x, x′).

Performance Goal:

Pick k such that ‖fk − f‖² ≡ E_X |fk(X) − f(X)|² is small.


Step 1: Bias-variance decomposition

A simple fact: E|Z − c|² = E|Z − EZ|² + |c − EZ|².

So fix x, fix the design {Xi}, and let f̄k(x) ≡ E_{Yi} fk(x), the expectation over the noise in the Yi's ...

E|fk(x) − f(x)|² = E|fk(x) − f̄k(x)|² (Variance) + |f(x) − f̄k(x)|² (Bias²).


Step 2: Bound the two terms

- Variance: recall fk(x) = (1/k) ∑_{Xi ∈ k-NN(x)} Yi

  Var(fk(x)) = (1/k²) ∑_{Xi ∈ k-NN(x)} Var(Yi) = σ²_Y / k

- Bias: note that f̄k(x) = (1/k) ∑_{Xi ∈ k-NN(x)} f(Xi), so

  |f̄k(x) − f(x)| ≤ (1/k) ∑_{Xi ∈ k-NN(x)} |f(Xi) − f(x)|
                 ≤ (λ/k) ∑_{Xi ∈ k-NN(x)} ρ(Xi, x)        (Lipschitz assumption)
                 ≤ λ · rk(x) ≈ (k/n)^{1/d}.
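Both bounds are easy to check numerically. A small sketch (ours): fix x and the design {Xi}, redraw the noise many times, and compare the empirical variance of fk(x) with σ²_Y / k, and the empirical bias with f̄k(x) − f(x).

    import numpy as np

    rng = np.random.default_rng(0)

    n, k, sigma = 2000, 50, 0.5
    f = lambda x: np.sin(2 * np.pi * x)
    X = rng.uniform(0, 1, n)                       # fixed design
    x0 = 0.5
    nn = np.argsort(np.abs(X - x0))[:k]            # the (fixed) k-NN of x0

    preds = []
    for _ in range(2000):                          # fresh noise each round
        Y = f(X) + sigma * rng.standard_normal(n)
        preds.append(Y[nn].mean())                 # fk(x0)
    preds = np.array(preds)

    print("empirical Var(fk(x0)):", preds.var())   # ~ sigma^2 / k
    print("sigma^2 / k          :", sigma**2 / k)
    print("empirical bias       :", preds.mean() - f(x0))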


Step 3: Integrate over x and {Xi}

We then get: E‖fk − f‖² ≲ 1/k + (k/n)^{2/d}.

Tradeoff on k:

Pick k = Θ(n^{2/(2+d)}) to get E‖fk − f‖² ≲ n^{−2/(2+d)}, which is optimal.

Best choice of k ↗ as n ↗ and d ↘

Choosing k by cross-validation yields the same optimal rates.
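In practice the tradeoff is resolved by cross-validation. A minimal holdout sketch (ours, on synthetic data): the selected k tends to land near the theoretical n^{2/(2+d)} order.

    import numpy as np

    rng = np.random.default_rng(0)

    def knn_predict(X_tr, Y_tr, X_te, k):
        preds = np.empty(len(X_te))
        for i, x in enumerate(X_te):
            nn = np.argsort(np.linalg.norm(X_tr - x, axis=1))[:k]
            preds[i] = Y_tr[nn].mean()
        return preds

    n, d = 2000, 2
    X = rng.uniform(0, 1, (n, d))
    f = lambda X: np.sin(2 * np.pi * X[:, 0]) * X[:, 1]
    Y = f(X) + 0.3 * rng.standard_normal(n)

    tr, va = slice(0, 1500), slice(1500, None)     # simple holdout split
    mse = {k: np.mean((knn_predict(X[tr], Y[tr], X[va], k) - Y[va]) ** 2)
           for k in [1, 2, 4, 8, 16, 32, 64, 128]}
    best = min(mse, key=mse.get)
    print("CV choice of k:", best)
    print("n^(2/(2+d))  ~", round(1500 ** (2 / (2 + d))))  # theoretical order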


Unbounded capacity but generalizes at non-trivial rates ...

True even when k is different at every x ∈ ℝ^d

(infinite number of parameters) [Kpo. 11]


Similar messages under generalizations of the Lipschitz assumption:

- Hölder continuity: |f(x) − f(x′)| ≤ λ · ρ(x, x′)^α.

(averaged versions lead to the so-called Nikolskii and Sobolev conditions)

|x − x′|^α gets flatter around x = 0 as α ↗.

Additional messages (as α ↗):

- Local averages (such as k-NN) are not appropriate for smoother (easier) f.

- Local polynomials are best, but harder to implement in high dimension.

(see e.g. [Györfi, Krzyżak, Walk, 02])


From bounds on rk(x) to error rates:

Program:

1. Regression bounds

2. Reduce Classification to Regression


k-NN Classification

Data: {(Xi, Yi)}_{i=1}^n, Y ∈ {0, 1}.

Learn: hk(x) = majority (Yi) of k-NN(x).

Reduces to regression: let fk(x) = avg (Yi) of k-NN(x)

... then: hk(x) ≡ 1{fk(x) ≥ 1/2}.

Performance Goal:

Pick k such that err(hk) ≡ P(hk(X) 6= Y ) is small.

Equivalently, consider E(hk) = err(hk)− err(h∗).

Page 114: Understanding Thy Neighbors: Practical …skk2175/Talks/ICML18-Part1.pdfUnderstanding Thy Neighbors: Practical Perspectives from Modern Analysis Sanjoy Dasgupta Samory Kpotufe CSE,

, , , , , , , , , ,

k-NN Classification

Data: {(Xi, Yi)}ni=1, Y ∈ {0, 1}.

Learn: hk(x) = majority (Yi) of k-NN(x).

Reduces to regression: let fk(x) = avg (Yi) of k-NN(x)

... then: hk(x) ≡ 1{fk(x) ≥ 1/2}.

Performance Goal:

Pick k such that err(hk) ≡ P(hk(X) 6= Y ) is small.

Equivalently, consider E(hk) = err(hk)− err(h∗).

Page 115: Understanding Thy Neighbors: Practical …skk2175/Talks/ICML18-Part1.pdfUnderstanding Thy Neighbors: Practical Perspectives from Modern Analysis Sanjoy Dasgupta Samory Kpotufe CSE,

, , , , , , , , , ,

k-NN Classification

Data: {(Xi, Yi)}ni=1, Y ∈ {0, 1}.

Learn: hk(x) = majority (Yi) of k-NN(x).

Reduces to regression: let fk(x) = avg (Yi) of k-NN(x)

... then: hk(x) ≡ 1{fk(x) ≥ 1/2}.

Performance Goal:

Pick k such that err(hk) ≡ P(hk(X) 6= Y ) is small.

Equivalently, consider E(hk) = err(hk)− err(h∗).

Page 116: Understanding Thy Neighbors: Practical …skk2175/Talks/ICML18-Part1.pdfUnderstanding Thy Neighbors: Practical Perspectives from Modern Analysis Sanjoy Dasgupta Samory Kpotufe CSE,

, , , , , , , , , ,

k-NN Classification

Data: {(Xi, Yi)}ni=1, Y ∈ {0, 1}.

Learn: hk(x) = majority (Yi) of k-NN(x).

Reduces to regression: let fk(x) = avg (Yi) of k-NN(x)

... then: hk(x) ≡ 1{fk(x) ≥ 1/2}.

Performance Goal:

Pick k such that err(hk) ≡ P(hk(X) 6= Y ) is small.

Equivalently, consider E(hk) = err(hk)− err(h∗).

Page 117: Understanding Thy Neighbors: Practical …skk2175/Talks/ICML18-Part1.pdfUnderstanding Thy Neighbors: Practical Perspectives from Modern Analysis Sanjoy Dasgupta Samory Kpotufe CSE,

, , , , , , , , , ,

k-NN Classification

Data: {(Xi, Yi)}ni=1, Y ∈ {0, 1}.

Learn: hk(x) = majority (Yi) of k-NN(x).

Reduces to regression: let fk(x) = avg (Yi) of k-NN(x)

... then: hk(x) ≡ 1{fk(x) ≥ 1/2}.

Performance Goal:

Pick k such that err(hk) ≡ P(hk(X) 6= Y ) is small.

Equivalently, consider E(hk) = err(hk)− err(h∗).
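A quick sanity check of this reduction (a scikit-learn sketch on assumed synthetic data; k is odd so label ties cannot occur): thresholding the k-NN regression estimate at 1/2 reproduces the k-NN majority vote.

    import numpy as np
    from sklearn.neighbors import KNeighborsClassifier, KNeighborsRegressor

    rng = np.random.default_rng(0)
    n, d, k = 2000, 5, 15
    X = rng.standard_normal((n, d))
    p = 1 / (1 + np.exp(-X[:, 0]))               # P(Y = 1 | x) = sigmoid(x_1)
    Y = (rng.uniform(size=n) < p).astype(int)
    Xtest = rng.standard_normal((500, d))

    h_k = KNeighborsClassifier(n_neighbors=k).fit(X, Y).predict(Xtest)  # majority(Y_i)
    f_k = KNeighborsRegressor(n_neighbors=k).fit(X, Y).predict(Xtest)   # avg(Y_i)
    print(np.all(h_k == (f_k >= 0.5)))           # True: h_k == 1{f_k >= 1/2}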

Remarks: f_k(x) estimates f(x) ≡ E[Y | x] = P(Y = 1 | x),

and h*(x) = 1{f(x) ≥ 1/2}, while h_k(x) ≡ 1{f_k(x) ≥ 1/2}.

f_k(x) ≈ f(x) implies h_k(x) = h*(x).

One can show: E(h_k) ≤ 2 ‖f_k − f‖.

For Lipschitz f: E E(h_k) ≲ n^{−1/(2+d)}, for k = Θ(n^{2/(2+d)}).

Similar messages on the choice of k ...
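For concreteness, the prescribed scaling as a tiny numeric sketch (the theory fixes only the exponents; the sample sizes below are arbitrary choices):

    # k = Theta(n^{2/(2+d)}) and rate n^{-1/(2+d)}, up to constants
    for d in (2, 10, 50):
        for n in (10**3, 10**6):
            k = round(n ** (2 / (2 + d)))
            rate = n ** (-1 / (2 + d))
            print(f"d={d:2d}  n={n:7d}  k ~ {k:5d}  rate ~ {rate:.3f}")

The printout makes the curse of dimension visible: for fixed n, the guaranteed rate degrades quickly as d grows.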


PART I: Basic Statistical Insights

• Universality

• Behavior of k-NN Distances

• From Regression to Classification

• Classification is easier than regression

• Multiclass and Mixed Costs

h* = 1{f ≥ 1/2}, while h_k ≡ 1{f_k ≥ 1/2}.

Suppose |f(x) − 1/2| ≥ δ for most values of x ...

Then |f_k − f| < δ implies h_k = h* often ... no need for f_k ≈ f.

Tsybakov's noise condition: P_X(|f − 1/2| < δ) ≤ δ^β.

If |f_k − f| < δ_n, then P_X(h_k ≠ h*) ≤ P_X(|f − 1/2| < δ_n) ≤ δ_n^β.

For Lipschitz f: E E(h_k) ≲ n^{−(β+1)/(2+d)}, for k = Θ(n^{2/(2+d)}).
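A simulation sketch of this effect (the regression function below is an assumption, chosen so that P_X(|f − 1/2| < δ) ∝ δ^β with β = 2): the disagreement P_X(h_k ≠ h*) is typically far smaller than the sup-norm error of f_k.

    import numpy as np
    from sklearn.neighbors import KNeighborsRegressor

    rng = np.random.default_rng(0)
    beta, n = 2.0, 5000
    X = rng.uniform(0, 1, (n, 1))

    # f crosses 1/2 at x = 1/2 with exponent 1/beta,
    # so P(|f - 1/2| < delta) ~ delta^beta
    def f(x):
        return 0.5 + 0.5 * np.sign(x - 0.5) * np.abs(2 * (x - 0.5)) ** (1 / beta)

    Y = (rng.uniform(size=n) < f(X[:, 0])).astype(int)

    k = round(n ** (2 / 3))                      # Theta(n^{2/(2+d)}) with d = 1
    f_k = KNeighborsRegressor(n_neighbors=k).fit(X, Y)

    grid = np.linspace(0, 1, 2001)[:, None]
    f_hat, f_true = f_k.predict(grid), f(grid[:, 0])
    print("sup|f_k - f|   :", np.abs(f_hat - f_true).max())
    print("P_X(h_k != h*) :", np.mean((f_hat >= 0.5) != (f_true >= 0.5)))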

For Lipschitz f: E E(h_k) ≲ n^{−(β+1)/(2+d)}, for k = Θ(n^{2/(2+d)}).

• Choice of metric → Lipschitzness of f, and intrinsic d.

• Large margin β mitigates the effect of the metric.
  (β = ∞ =⇒ no curse of dimension!)

Technical Remarks:

- The above rates assume P_X ≡ Uniform.
  ([Chaudhuri, Dasgupta 14], [Gadat et al 14])

- For non-uniform P_X, rates are worse, but understudied.
  ([Gadat et al 14], [Cannings et al 17], [Kpo., Martinet 17])
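Since the metric enters the estimator only through the neighbor sets, candidate metrics (and values of k) can simply be compared by validation error; a scikit-learn sketch, with the dataset and metric list as illustrative choices:

    from sklearn.datasets import load_digits
    from sklearn.model_selection import cross_val_score
    from sklearn.neighbors import KNeighborsClassifier

    X, y = load_digits(return_X_y=True)
    for metric in ("euclidean", "manhattan", "chebyshev"):
        for k in (1, 5, 15):
            clf = KNeighborsClassifier(n_neighbors=k, metric=metric)
            acc = cross_val_score(clf, X, y, cv=5).mean()
            print(f"metric={metric:10s} k={k:2d}  cv-accuracy={acc:.3f}")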


PART I: Basic Statistical Insights

• Universality

• Behavior of k-NN Distances

• From Regression to Classification

• Classification is easier than regression

• Multiclass and Mixed Costs

k-NN extends naturally to multiclass

Data: {(X_i, Y_i)}_{i=1}^n, with Y ∈ {1, . . . , L}.

Learn: h_k(x) = majority(Y_i) over the k-NN of x.

Reduction: let f_k^y(x) = proportion of labels with Y = y among the k-NN of x;
it estimates f^y(x) = P(Y = y | x).

... then: h_k(x) ≡ argmax_y f_k^y(x), and h*(x) = argmax_y f^y(x).

Previous insights extend easily ...
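This reduction is exposed directly in scikit-learn (a sketch on assumed random data): predict_proba of KNeighborsClassifier returns the neighbor proportions f_k^y, and in recent versions predict coincides with their argmax, which the last line checks.

    import numpy as np
    from sklearn.neighbors import KNeighborsClassifier

    rng = np.random.default_rng(0)
    n, d, L, k = 3000, 4, 5, 20
    X = rng.standard_normal((n, d))
    Y = rng.integers(0, L, size=n)                   # arbitrary labels in {0, ..., L-1}
    clf = KNeighborsClassifier(n_neighbors=k).fit(X, Y)

    Xtest = rng.standard_normal((200, d))
    proba = clf.predict_proba(Xtest)                 # f_k^y(x): per-class neighbor proportions
    print(np.allclose(proba.sum(axis=1), 1.0))       # proportions sum to one
    print(np.all(clf.predict(Xtest) == clf.classes_[proba.argmax(axis=1)]))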

• Lipschitzness: ‖f(x) − f(x′)‖ ≤ ρ(x, x′)

• Noise margin: at any x, we want f^(1)(x) ≫ f^(2)(x), where f^(1)(x) and
  f^(2)(x) denote the largest and second-largest of the class probabilities
  f^y(x) ...

  assume P_X( f^(1)(X) ≤ f^(2)(X) + δ ) ≤ δ^β

Then: E E(h_k) ≲ (n / log L)^{−(β+1)/(2+d)}, for k = Θ(n^{2/(2+d)}).

Same messages as earlier ...
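The quantity in this noise condition is directly computable from estimated class probabilities: sort each row and take the top-two gap (a sketch; the Dirichlet-sampled matrix is a stand-in assumption for predict_proba output):

    import numpy as np

    def margin_gap(proba):
        # top-two gap f^(1)(x) - f^(2)(x) per row of an (n, L) probability matrix
        top2 = np.sort(proba, axis=1)[:, -2:]
        return top2[:, 1] - top2[:, 0]

    proba = np.random.default_rng(0).dirichlet(np.ones(5), size=1000)
    for delta in (0.05, 0.1, 0.2):
        # empirical analogue of P_X(f^(1)(X) <= f^(2)(X) + delta)
        print(delta, np.mean(margin_gap(proba) <= delta))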

Mostly open:

Mixed-cost regimes (e.g., medicine, finance, ...)

y ←↩ expected cost when the prediction y is wrong, which need not equal 1 − P(Y = y)

Natural extensions of the previous insights are considered in [Reeve, Brown 17].

Practical Question:

assessing mixed costs, and integrating them with NN methods ...
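A natural plug-in rule in this regime (a sketch of the standard cost-sensitive construction, not a method claimed by the talk): given a cost matrix C with C[y, y′] = cost of predicting y′ when the truth is y, predict the label minimizing the estimated expected cost Σ_y f_k^y(x) · C[y, y′].

    import numpy as np

    def cost_sensitive_predict(proba, C):
        # proba: (n, L) estimated class probabilities, e.g. k-NN proportions f_k^y(x)
        # C: (L, L) cost matrix; entry [i, yhat] of proba @ C is the
        # estimated expected cost of predicting yhat at x_i
        return (proba @ C).argmin(axis=1)

    # Example: missing class 1 costs 5x more than a false alarm.
    C = np.array([[0.0, 1.0],
                  [5.0, 0.0]])
    proba = np.array([[0.9, 0.1], [0.7, 0.3], [0.5, 0.5]])
    print(cost_sensitive_predict(proba, C))      # [0 1 1]

With these costs the optimal threshold on f_k moves from 1/2 down to 1/6, which is why the middle point (estimated P(Y = 1 | x) = 0.3) is already classified as 1.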


End of Part I