Machine Learning: Some theoretical and practical problems


Olivier Bousquet

Journées MAS, Lille, 2006


Outline

1 Framework

2 Theoretical Results

3 Consequences

4 Practical Implications


The Setting

Prediction problems: after observing example pairs (X, Y), build a function g : X → Y that predicts well: g(X) ≈ Y

Typical setting is statistical (data assumed to be sampled i.i.d.)

Other setting: on-line adversarial (no assumption on the data generation mechanism)

Goal: find the best algorithm

Theoretical answer: fundamental limits of learning
Practical answer: guidelines for algorithm design


Definitions

We consider the classification setting: Y = {0, 1} with data sampled i.i.d.

A rule (or learning algorithm) is a mapping g_n : (X × Y)^n × X → Y

Sample: S_n = {(X_1, Y_1), ..., (X_n, Y_n)}

Misclassification error: L(g) = P(g(X) ≠ Y) (conditional on the sample)

Bayes error: best possible error L* = inf_g L(g) over all measurable functions

Sequence of classification rules {g_n}: defined for any sample size (algorithms are usually defined in this way, possibly with a sample-size-dependent parameter)

Consistency: lim_{n→∞} E L(g_n) = L*
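For reference, the optimum behind these definitions can be made explicit. The display below is an added note (standard facts, not on the original slide) writing the Bayes classifier in terms of the regression function η(x) = P(Y = 1|X = x).

```latex
% Standard facts, added for reference (not on the original slide).
% With regression function \eta(x) = P(Y = 1 \mid X = x),
% the Bayes classifier
\[
  g^*(x) = \mathbf{1}\{\eta(x) \geq 1/2\}
\]
% attains the Bayes error
\[
  L^* = L(g^*) = \mathbb{E}\big[\min\big(\eta(X),\, 1 - \eta(X)\big)\big],
\]
% so L^* = 0 exactly when \eta(X) \in \{0, 1\} almost surely,
% i.e. when the label is a deterministic function of X.
```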


Consistency

How to build a consistent sequence of rules?

Countable X: very easy, just wait! Eventually every point with non-zero probability is observed an unbounded number of times (i.e. take a majority vote over observed x, and predict randomly on unobserved ones; see the sketch after this slide)

Uncountable X: the observed sample has measure zero (for non-atomic measures), so this trick does not work

Instead, take a local majority vote with two conditions: more and more local, but also more and more points averaged
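A minimal sketch of the countable-X majority-vote rule mentioned above (my illustration, not code from the talk; inputs are assumed hashable):

```python
# A minimal sketch (not from the talk) of the trivially consistent rule
# on a countable X: majority vote over the labels seen at x, and a random
# prediction where x has never been observed.
from collections import Counter, defaultdict
import random

def majority_vote_rule(sample):
    """sample: list of (x, y) pairs with hashable x and y in {0, 1}."""
    counts = defaultdict(Counter)
    for x, y in sample:
        counts[x][y] += 1
    def g(x):
        if x in counts:
            return counts[x].most_common(1)[0][0]  # majority label at x
        return random.randint(0, 1)                # never-observed point
    return g

g = majority_vote_rule([(0, 1), (0, 1), (0, 0), (3, 0)])
print(g(0), g(3))  # -> 1 0
```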


Consistency of Histograms

Histogram in R^d: cubic cells of size h_n, prediction is constant over each cell (majority vote)

h_n → 0 and n h_n^d → ∞ are enough for universal consistency

Idea of the proof

Continuous functions with bounded support are dense in L^p(ν). Such functions are uniformly continuous and can thus be approximated by histograms (the average of the function over a cell), provided the cell size goes to 0. Since cells will contain more and more points (second condition), the cell value will eventually converge to the average over the cell.
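A sketch of the histogram rule (my illustration, not the talk's code); the schedule h_n = n^{-1/(2d)} is one assumed choice satisfying both h_n → 0 and n h_n^d = √n → ∞:

```python
# A sketch (not from the talk) of the histogram rule in R^d: cubic cells
# of side h_n, majority vote within each cell. h_n = n**(-1/(2*d)) is one
# schedule with h_n -> 0 and n * h_n**d -> infinity, as required.
import numpy as np
from collections import Counter, defaultdict

def histogram_rule(X, y):
    n, d = X.shape
    h = n ** (-1.0 / (2 * d))                 # cell size for this sample size
    cells = defaultdict(Counter)
    for xi, yi in zip(X, y):
        cells[tuple(np.floor(xi / h).astype(int))][yi] += 1
    def g(x):
        key = tuple(np.floor(x / h).astype(int))
        if key in cells:
            return cells[key].most_common(1)[0][0]  # majority vote in the cell
        return 0                                    # arbitrary on empty cells
    return g

rng = np.random.default_rng(0)
X = rng.uniform(-1, 1, size=(1000, 2))
y = (X[:, 0] + X[:, 1] > 0).astype(int)             # toy deterministic labels
g = histogram_rule(X, y)
print(g(np.array([0.5, 0.5])), g(np.array([-0.5, -0.5])))  # -> 1 0
```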


No Free Lunch

We can "learn" anything

Is the problem solved?

The question becomes: among the consistent algorithms, which one is the best?

We consider here the special case of classification

Similar phenomena occur for regression or density estimation

Unfortunately, there is no free lunch


No Free Lunch 1

Out-of-sample error: L'(g_n) = P(g_n(X) ≠ Y | X ∉ S_n)

Consider a uniform probability distribution µ over problems, i.e. for all x, E_µ P(Y = 1|X = x) = E_µ P(Y = 0|X = x)

All classifiers have the same average error

Theorem (Wolpert96)

For any classification rule g_n,

E_µ E L'(g_n) = 1/2
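A toy simulation of this statement (my construction, assuming a uniform prior over deterministic labelings of a finite X, which satisfies the condition above): whatever the learner does, its average out-of-sample error comes out near 1/2.

```python
# A minimal NFL 1 simulation (assumed setup, not from the talk): average a
# fixed learner's out-of-sample error over uniformly random deterministic
# labelings of a finite input space.
import numpy as np

rng = np.random.default_rng(0)
K, n, trials = 50, 20, 2000   # |X|, sample size, number of random problems
errs = []
for _ in range(trials):
    f = rng.integers(0, 2, size=K)           # random target, uniform over labelings
    xs = rng.integers(0, K, size=n)          # i.i.d. uniform training inputs
    ys = f[xs]
    # learner: memorize seen points, predict the training majority elsewhere
    default = int(ys.mean() >= 0.5)
    pred = np.full(K, default)
    pred[xs] = ys
    unseen = np.setdiff1d(np.arange(K), xs)  # out-of-sample points only
    errs.append((pred[unseen] != f[unseen]).mean())
print(np.mean(errs))  # ~ 0.5: no better than random guessing on unseen inputs
```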


No Free Lunch 2

A consequence of NFL1 is that there are always cases where an algorithm can be beaten.

A stronger version of NFL1: No Super Classifier

Theorem (DGL96)

For every sequence of classification rules {g_n} there is a universally consistent sequence {g'_n} such that for some distribution

L(g_n) > L(g'_n)

for all n.


No Free Lunch 3

A variation of NFL1

Arbitrarily bad error for fixed sample sizes

Theorem (Devroye82)

Fix an ε > 0. For any integer n and classification rule g_n, there exists a distribution of (X, Y) with Bayes risk L* = 0 such that

E L(g_n) ≥ 1/2 − ε


No Free Lunch 4

NFL3 possibly considers a different distribution for each n

What happens for a fixed distribution when n increases?

Slow rate phenomenon

Theorem (Devroye82)

Let {a_n} be a sequence of positive numbers converging to zero with 1/16 ≥ a_1 ≥ a_2 ≥ .... For every sequence of classification rules, there exists a distribution of (X, Y) with Bayes risk L* = 0 such that

E L(g_n) ≥ a_n

for all n.


Proofs

The idea is to create a "bad" distribution

It turns out that random ones are bad enough: just create a problem with no structure (the prediction at x is unrelated to the prediction at x')

All proofs work on finite (for fixed n) or countable (for varying n) spaces (no need to introduce uncountable X)

The trick is to make sure that there are enough points that have not been observed yet (on those, the error will be 1/2)


A closer look at consistency

Consider the trivially consistent rule for a countable space (majority vote)

Its error decreases with increasing sample size:

∀n, E L(g_n) ≥ E L(g_{n+1})
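A quick empirical check of this monotonicity claim (my simulation on an assumed small toy distribution; ties and unobserved points default to 0, one arbitrary choice):

```python
# Empirical check (my illustration) that the majority-vote rule's expected
# error decreases with n on a small countable space.
import numpy as np

rng = np.random.default_rng(0)
K = 10
p = np.full(K, 1.0 / K)                      # uniform P(X)
eta = np.linspace(0.1, 0.9, K)               # P(Y = 1 | X = x)

def expected_error(n, reps=3000):
    errs = []
    for _ in range(reps):
        xs = rng.choice(K, size=n, p=p)
        ys = (rng.uniform(size=n) < eta[xs]).astype(int)
        pred = np.zeros(K)
        for x in range(K):                   # majority vote at each point
            seen = ys[xs == x]
            pred[x] = 1 if seen.size and seen.mean() > 0.5 else 0
        # exact risk of this classifier under the known distribution
        errs.append(np.sum(p * np.where(pred == 1, 1 - eta, eta)))
    return np.mean(errs)

print([round(expected_error(n), 3) for n in (5, 20, 80)])  # decreasing
```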

Is this true in general for universally consistent rules?


Smart rules

Consistency for uncountable spaces is not so trivial

Smart rules

Definition

A sequence {g_n} of classification rules is smart if for any distribution and any integer n,

E L(g_n) ≥ E L(g_{n+1})

For uncountable spaces, some of the known universally consistent rules can be shown to be non-smart

Conjecture: on R^d, any smart rule is not universally consistent

Interpretation: consistency on uncountable spaces requires adapting the degree of smoothness to the sample size; this means there will be a point for which the degree of smoothness is too large


Anti-learning

The average error is 1/2, so there are problems for which the error is much worse than random guessing!

One can indeed construct distributions for which some standard algorithms have E L(g_n) arbitrarily close to 1 even with L* = 0!

Of course this occurs for a fixed sample size

Can one always do that (for any rule)?

The problem should have a structure, but one which is opposite to those preferred by the algorithm


Bayes Error Estimation

Assume we just want to estimate L*.

Of course, we could use any universally consistent algorithm and estimate its error. But we get slow rates!

Is there a better way?

Theorem (DGL96)

For every n, for any estimate L_n of the Bayes error L* and for every ε > 0, there exists a distribution of (X, Y) such that

E|L_n − L*| ≥ 1/4 − ε

Estimating this single number does not seem easier than estimating the whole set {x : P(Y = 1|x) > 1/2}
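A concrete instance of the plug-in approach mentioned above (a sketch on an assumed toy problem, not code from the talk): fit a universally consistent rule, here k-NN with k ≈ √n, on half of the data and report its hold-out error as the estimate of L*. Consistent, but slow, as the theorem says it must be in the worst case.

```python
# A sketch (my illustration) of the plug-in estimate of L*: hold-out error
# of a universally consistent rule (k-NN with k ~ sqrt(n)).
import numpy as np

rng = np.random.default_rng(0)
n = 2000
X = rng.uniform(-1, 1, size=(n, 2))
eta = 1.0 / (1.0 + np.exp(-4.0 * X[:, 0]))   # toy problem: known P(Y=1|x)
y = (rng.uniform(size=n) < eta).astype(int)

Xtr, ytr = X[: n // 2], y[: n // 2]
Xte, yte = X[n // 2 :], y[n // 2 :]
k = int(np.sqrt(len(Xtr)))                    # k_n -> inf, k_n / n -> 0
dists = np.linalg.norm(Xte[:, None, :] - Xtr[None, :, :], axis=-1)
nbrs = np.argsort(dists, axis=1)[:, :k]       # k nearest training points
pred = (ytr[nbrs].mean(axis=1) >= 0.5).astype(int)

L_hat = (pred != yte).mean()                  # hold-out error, estimates L*
L_star = np.minimum(eta, 1 - eta).mean()      # Bayes error of the toy problem
print(round(L_hat, 3), round(L_star, 3))      # close, but convergence is slow
```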


What can we hope to prove?

Our framework is too general! Nothing interesting can be said about learning algorithms

Can we prove something interesting under slightly more restrictive assumptions?

Are the distributions used to prove the NFLs pathological? (NFL 4 holds even within classes of "reasonable" distributions!)

If we can define which problems actually occur in real life, we can hope to derive appropriate algorithms (optimal on this class of problems)


The Bayesian Way

Assume something about how the data is generated

Consider an algorithm specifically tuned to this property

Prove that under this assumption the algorithm does well

Most results are going in this direction (sometimes in a subtle way)

Bayesian algorithms

Most minimax results are of this form:

inf_{g_n} sup_{P ∈ P} ( L(g_n) − inf_g L(g) )

Seems reasonable and useful for understanding, but does not provide guarantees


The Worst Case Way

Assume nothing about the data (distribution-free)

Restrict your objectives

Derive an algorithm that reaches this objective no matter what the data is

inf_{g_n} sup_P ( L(g_n) − inf_{g ∈ G} L(g) )

Gives guarantees

In between: adaptation
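Setting the two displays side by side (my restatement of the formulas above) makes the contrast explicit: the Bayesian way restricts the distributions P and compares to the global best, while the worst-case way allows any distribution but compares to the best within a class G.

```latex
% The two minimax programs restated side by side (my rewriting).
% Bayesian way: restricted distributions, unrestricted comparison:
\[
  \inf_{\{g_n\}} \ \sup_{P \in \mathcal{P}} \Big( L(g_n) - \inf_{g} L(g) \Big)
\]
% Worst-case way: unrestricted distributions, restricted comparison:
\[
  \inf_{\{g_n\}} \ \sup_{P} \Big( L(g_n) - \inf_{g \in \mathcal{G}} L(g) \Big)
\]
```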


Does this help practically?

We can probably come up with algorithms that work well on most real-world problems

If we have a characterization of these problems, we can even prove something about such algorithms

However, there is no guarantee that a new problem will satisfy this characterization

So there cannot be a formal proof that an algorithm is good or bad


If theory cannot help, what can we do?

Essentially a matter of finding an algorithm that implements the right notion of smoothness for the problem at hand

More an art than a science!


Priors

Algorithm design is composed of two steps

Choosing a preference: this first step is based on knowledge of the problem; this is where guidance (but no theory) is needed.

Exploiting it for inference: the second step can possibly be formalized (optimality with respect to assumptions). The main issue is computational cost.


Why can algorithms fail in practice?

1 Data representation (inappropriate features, errors, ...)

2 Data scarcity (not enough data samples)

3 Data overload (too many variables, too much noise)

4 Lack of understanding of the result (impossible validation) / lack of validation data

Examples

Forgot to remove the output variable (or a version of it): the algorithm picks it up

An irrelevant variable happens to be discriminative (e.g. date of sample collection)

Error in a measurement (misalignment in the database)


So, what would be helpful?

Flexible ways to incorporate knowledge/expertise

Provide tools that allow one to formulate prior knowledge in a natural way. Look for other types of prior assumptions that occur in various problems (e.g. manifold structure, clusteredness, analogy...)

Ability to understand what is found by the algorithm (need a language to interact with experts)

Investigate how to improve understandability (simpler models, separate models and language for interaction...). Improve interaction (understand the user's intent)

Computationally efficient algorithms

Scalability, anytime algorithms. Incorporate time complexity in the theoretical analysis (trade complexity for accuracy)


References

L. Devroye: Necessary and Sufficient Conditions for the Almost Everywhere Convergence of Nearest Neighbor Regression Function Estimates. Zeitschrift für Wahrscheinlichkeitstheorie und verwandte Gebiete, 61: 467-481 (1982)

D. Wolpert: The Lack of A Priori Distinctions Between Learning Algorithms. Neural Computation 8 (1996)

L. Devroye, L. Györfi and G. Lugosi: A Probabilistic Theory of Pattern Recognition. Springer (1996)
