Inductive Reasoning and (one of) the Foundations of Machine Learning

Upload: david-balduzzi
Posted on 13-Feb-2017

TRANSCRIPT

Page 1:

Inductive Reasoning and (one of) the Foundations of Machine Learning

“beware of mathematicians, and all those who make empty prophecies”
— St. Augustine

Pages 2-4: (title slide repeated)

Page 5:

Deductive reasoning:

All men are mortal. Socrates is a man.
Therefore, Socrates is mortal.

Page 6:

Deductive reasoning:

All men are mortal. Socrates is a man.
Therefore, Socrates is mortal.

Idea: Thinking is deductive reasoning!

Page 7:

[Image: page 1 of the original Dartmouth proposal. Photo courtesy Dartmouth College (AI Magazine, Winter 2006).]

Page 8:

50 years later: Trenchard More, John McCarthy, Marvin Minsky, Oliver Selfridge, Ray Solomonoff

Page 9:

A bump in the road

“To understand the real world, we must have a different set of primitives from the relatively simple line trackers suitable and sufficient for the blocks world”
— Patrick Winston (1975), Director of MIT’s AI Lab from 1972 to 1997

Page 10:

The AI winter: http://en.wikipedia.org/wiki/AI_winter

Page 11:

Reductio ad absurdum

“Intelligence is 10 million rules” — Doug Lenat

Page 12:

The story so far…
• Boy meets girl

Page 13:

The story so far…
• Boy meets girl
• Boy spends 100s of millions of dollars wooing girl with deductive reasoning

Page 14:

The story so far…
• Boy meets girl
• Boy spends 100s of millions of dollars wooing girl with deductive reasoning
• Girl says: “drop dead”; boy becomes very sad

Page 15:

The story so far…
• Boy meets girl
• Boy spends 100s of millions of dollars wooing girl with deductive reasoning
• Girl says: “drop dead”; boy becomes very sad

Next: Boy ponders the errors of his ways

Pages 16-17:

Next: Boy ponders the errors of his ways

“this book is composed […] upon one very simple theme […] that we can learn from our mistakes”
— Karl Popper, Conjectures and Refutations

Page 18:

We’re going to look at 4 learning algorithms.

Page 19:

Sequential prediction

Scenario: At time t, Forecaster predicts 0 or 1. Nature then reveals the truth.

Forecaster has access to N experts. One of them is always correct.

Goal: Predict as accurately as possible.

Page 20:

Algorithm #1

Set t = 1.
While t > 0:
  Step 1. Predict by majority vote.
  Step 2. Remove experts that are wrong.
  Step 3. t ← t+1

Page 21:

Algorithm #1

Set t = 1.
While t > 0:
  Step 1. Predict by majority vote.
  Step 2. Remove experts that are wrong.
  Step 3. t ← t+1

Question: How long to find the correct expert?

Page 22:

Algorithm #1

Set t = 1.
While t > 0:
  Step 1. Predict by majority vote.
  Step 2. Remove experts that are wrong.
  Step 3. t ← t+1

How long to find the correct expert? BAD!!!

Page 23:

Algorithm #1

Set t = 1.
While t > 0:
  Step 1. Predict by majority vote.
  Step 2. Remove experts that are wrong.
  Step 3. t ← t+1

Question: How many errors?

Page 24:

Algorithm #1

  Step 1. Predict by majority vote.
  Step 2. Remove experts that are wrong.

How many errors?

Page 25:

Algorithm #1

  Step 1. Predict by majority vote.
  Step 2. Remove experts that are wrong.

How many errors? When the algorithm makes a mistake, it removes ≥ half of the experts.

Page 26:

Algorithm #1

  Step 1. Predict by majority vote.
  Step 2. Remove experts that are wrong.

How many errors? When the algorithm makes a mistake, it removes ≥ half of the experts. After k mistakes at most N/2^k experts survive, and the always-correct expert always survives, so N/2^k ≥ 1 and hence:

errors ≤ log₂ N
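For concreteness, here is a minimal sketch of Algorithm #1 (the Halving algorithm) in Python. Representing experts as callables, and the names halving, experts, and truth, are illustrative choices, not from the talk.

# Minimal sketch of Algorithm #1 (Halving). Illustrative, not the talk's code.
# Each expert is a callable t -> {0, 1}; `truth` is Nature's revealed outcome.

def halving(experts, truth, rounds):
    """Majority vote over surviving experts; returns the number of mistakes."""
    alive = list(experts)        # experts not yet caught making an error
    mistakes = 0
    for t in range(rounds):
        votes = [e(t) for e in alive]
        prediction = 1 if 2 * sum(votes) >= len(votes) else 0  # majority vote
        outcome = truth(t)       # Nature then reveals the truth
        if prediction != outcome:
            mistakes += 1
        # Step 2: remove every expert that was wrong this round.
        alive = [e for e, v in zip(alive, votes) if v == outcome]
    return mistakes

A mistake means at least half of alive voted wrong, so each mistake at least halves alive; that is exactly the ≤ log₂ N argument above.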

Page 27:

Deep thought #1: Track errors, not runtime

Page 28:

What’s going on? Didn’t we just use deductive reasoning!?!

Page 29:

What’s going on? Didn’t we just use deductive reasoning!?!

Yes… but no!

Page 30:

What’s going on?

Algorithm: makes educated guesses about Nature (inductive)
Analysis: proves a theorem about the number of errors (deductive)

Page 31:

What’s going on?

Algorithm: makes educated guesses about Nature (inductive)
Analysis: proves a theorem about the number of errors (deductive)

The algorithm learns — but it does not deduce!

Page 32:

Adversarial prediction

Scenario: At time t, Forecaster predicts 0 or 1. Nature then reveals the truth.

Forecaster has access to N experts. One of them is always correct. Nature is adversarial.

Goal: Predict as accurately as possible.

Page 33:

At time t, Forecaster predicts 0 or 1. Nature then reveals the truth.

Forecaster has access to N experts. One of them is always correct. Nature is adversarial.

Goal: Predict as accurately as possible.

Seriously?!?!

Page 34:

Regret

Let m* be the best expert in hindsight.

regret := errors(Forecaster) − errors(m*)

(For example, if Forecaster makes 12 errors and the best expert makes 5, the regret is 7.)

Goal: Predict as accurately as possible. Minimize regret.

Page 35:

Algorithm #2

Pick β in (0,1). Assign weight 1 to each expert.
Set t = 1.
While t ≤ T:
  Step 1. Predict by weighted majority vote.
  Step 2. Multiply the weights of incorrect experts by β.
  Step 3. t ← t+1

Question: What is the regret?

Page 36:

Algorithm #2

Pick β in (0,1). Assign weight 1 to each expert.
Set t = 1.
While t ≤ T:
  Step 1. Predict by weighted majority vote.
  Step 2. Multiply the weights of incorrect experts by β.
  Step 3. t ← t+1

What is the regret? [choose β carefully]

$\text{regret} \le \sqrt{\frac{T \cdot \log N}{2}}$
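A minimal sketch of Algorithm #2 (weighted majority) in Python, under the same illustrative conventions as the Halving sketch; the function name and arguments are assumptions for illustration, not the speaker's code.

# Minimal sketch of Algorithm #2 (weighted majority). Illustrative only.

def weighted_majority(experts, truth, rounds, beta=0.5):
    """Weighted majority vote; wrong experts are shrunk by beta in (0,1)."""
    weights = [1.0] * len(experts)          # assign weight 1 to each expert
    mistakes = 0
    for t in range(rounds):
        votes = [e(t) for e in experts]
        weight_for_1 = sum(w for w, v in zip(weights, votes) if v == 1)
        prediction = 1 if 2 * weight_for_1 >= sum(weights) else 0
        outcome = truth(t)
        if prediction != outcome:
            mistakes += 1
        # Step 2: multiply the weights of incorrect experts by beta.
        weights = [w * beta if v != outcome else w
                   for w, v in zip(weights, votes)]
    return mistakes

Unlike Algorithm #1, no expert is ever eliminated: a wrong expert merely loses weight, so the algorithm still tracks the best expert even when no expert is perfect, which is what the regret bound quantifies.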

Page 37:

Deep thought #2: Model yourself, not Nature

Page 38:

Online Convex Optimization

Scenario: Convex set K; convex loss L(a,b) [in both arguments, separately]

At time t, Forecaster picks at in K. Nature responds with bt in K [Nature is adversarial]. Forecaster’s loss is L(at, bt).

Goal: Minimize regret.

Page 39:

Follow the Leader

Idea: Predict with the at that would have worked best on { b1, …, bt-1 }

Page 40:

Follow the Leader

Idea: Predict with the at that would have worked best on { b1, …, bt-1 }

Pick a1 at random.
Set t = 1.
While t ≤ T:
  Step 1. $a_t := \operatorname{argmin}_{a \in K} \left[ \sum_{i=1}^{t-1} L(a, b_i) \right]$
  Step 2. t ← t+1

Page 41:

Follow the Leader: BAD!

Pick a1 at random.
Set t = 1.
While t ≤ T:
  Step 1. $a_t := \operatorname{argmin}_{a \in K} \left[ \sum_{i=1}^{t-1} L(a, b_i) \right]$
  Step 2. t ← t+1

Problem: Nature pulls Forecaster back and forth. No memory!

Page 42:

Algorithm #3 (regularize)

Pick a1 at random.
Set t = 1.
While t ≤ T:
  Step 1. $a_t := \operatorname{argmin}_{a \in K} \left[ \sum_{i=1}^{t-1} L(a, b_i) + \frac{\lambda}{2} \cdot \|a\|_2^2 \right]$
  Step 2. t ← t+1

Page 43:

Algorithm #3 (gradient descent)

Pick a1 at random.
Set t = 1.
While t ≤ T:
  Step 1. $a_t \leftarrow a_{t-1} - \beta \cdot \frac{\partial}{\partial a} L(a_{t-1}, b_{t-1})$
  Step 2. t ← t+1

Page 44:

Algorithm #3 (gradient descent)

Pick a1 at random.
Set t = 1.
While t ≤ T:
  Step 1. $a_t \leftarrow a_{t-1} - \beta \cdot \frac{\partial}{\partial a} L(a_{t-1}, b_{t-1})$
  Step 2. t ← t+1

Intuition: β controls memory

Page 45:

Algorithm #3 (gradient descent)

Pick a1 at random.
Set t = 1.
While t ≤ T:
  Step 1. $a_t \leftarrow a_{t-1} - \beta \cdot \frac{\partial}{\partial a} L(a_{t-1}, b_{t-1})$
  Step 2. t ← t+1

What is the regret? [choose β carefully]

$\text{regret} \le \operatorname{diam}(K) \cdot \operatorname{Lipschitz}(L) \cdot \sqrt{T}$
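Here is a minimal sketch of the gradient-descent form of Algorithm #3 in Python, for the one-dimensional case K = [lo, hi]; grad_L and nature are assumed callables, and the projection step (clipping back onto K) is made explicit here although it is implicit in the slides' argmin formulation.

# Minimal sketch of Algorithm #3 (online gradient descent) on K = [lo, hi].
# grad_L(a, b) returns dL/da; nature may pick b_t adversarially. Illustrative.

def online_gradient_descent(grad_L, nature, lo=-1.0, hi=1.0,
                            rounds=100, beta=0.1):
    a = 0.0                            # pick a_1 (fixed here for simplicity)
    plays = []
    for t in range(rounds):
        plays.append(a)
        b = nature(t, a)               # Nature responds, possibly adversarially
        a = a - beta * grad_L(a, b)    # step: a_t <- a_{t-1} - beta * dL/da
        a = min(max(a, lo), hi)        # project back onto the convex set K
    return plays

β trades off memory: small β keeps the forecaster close to its past plays (long memory), while large β lets Nature yank it around, which is the failure mode of plain Follow the Leader.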

Page 46:

Deep thought #3: “Those who cannot remember [their] past are condemned to repeat it” — George Santayana

Page 47:

Minimax theorem

$\inf_{a \in K} \sup_{b \in K} L(a, b) = \sup_{b \in K} \inf_{a \in K} L(a, b)$

Page 48:

Minimax theorem

$\inf_{a \in K} \sup_{b \in K} L(a, b) = \sup_{b \in K} \inf_{a \in K} L(a, b)$

Left side: Forecaster picks a, Nature responds with b.

Page 49:

Minimax theorem

$\inf_{a \in K} \sup_{b \in K} L(a, b) = \sup_{b \in K} \inf_{a \in K} L(a, b)$

Left side: Forecaster picks a, Nature responds with b.
Right side: Nature picks b, Forecaster responds with a.

Page 50:

Minimax theorem

$\inf_{a \in K} \sup_{b \in K} L(a, b) = \sup_{b \in K} \inf_{a \in K} L(a, b)$

Left side: Forecaster picks a, Nature responds with b.
Right side: Nature picks b, Forecaster responds with a.

Going first hurts Forecaster, so

$\inf_{a \in K} \sup_{b \in K} L(a, b) \ge \sup_{b \in K} \inf_{a \in K} L(a, b)$

Pages 51-54:

Minimax theorem

Proof idea: A no-regret algorithm means Forecaster can asymptotically match hindsight (let m* be the best move in hindsight; regret := loss(Forecaster) − loss(m*)). So asymptotically the order of the players doesn’t matter:

$\inf_{a \in K} \sup_{b \in K} L(a, b) \le \sup_{b \in K} \inf_{a \in K} L(a, b)$

Finally, convert the series of moves into a single move, the average, via online-to-batch:

$\bar{a} = \frac{1}{T} \sum_{t=1}^{T} a_t$
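The chain of inequalities behind this proof idea can be written out. This is a hedged sketch assuming, as in the standard statement of the theorem, that L is convex in a and concave in b, and that Nature plays the best response $b_t \in \arg\sup_b L(a_t, b)$:

% Online-to-batch sketch (editor's reconstruction under the assumptions above)
\inf_{a}\sup_{b} L(a,b)
  \;\le\; \sup_{b} L(\bar a, b)                       % play the average \bar a
  \;\le\; \frac{1}{T}\sum_{t=1}^{T} L(a_t, b_t)       % convexity in a; b_t is a best response
  \;\le\; \inf_{a}\,\frac{1}{T}\sum_{t=1}^{T} L(a, b_t) + \frac{\mathrm{regret}}{T}  % no-regret guarantee
  \;\le\; \inf_{a}\, L(a, \bar b) + \frac{\mathrm{regret}}{T}   % concavity in b, \bar b the average
  \;\le\; \sup_{b}\inf_{a} L(a,b) + \frac{\mathrm{regret}}{T}.

Since the regret of Algorithm #3 grows like √T, regret/T → 0, so letting T → ∞ gives inf sup ≤ sup inf; combined with the easy direction from Page 50, this is the minimax theorem.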

Page 55:

Boosting

Scenario: Algorithm W is better than guessing on any data distribution: loss ≤ 0.5 − ε.

Goal: Combine W’s weak hypotheses to perform well.

Page 56:

The Boosting Game

Value of the game: V(w, d) = the (weighted) fraction of mistakes w makes on d.

Algorithm W is better than guessing on any data distribution: loss ≤ 0.5 − ε, i.e.

$\sup_{d} \inf_{w} V(w, d) \le \frac{1}{2} - \epsilon$

Page 57:

The Boosting Game

Value of the game: V(w, d) = the (weighted) fraction of mistakes w makes on d.

$\inf_{w} \sup_{d} V(w, d) \le \frac{1}{2} - \epsilon$ (MINIMAX!)

Page 58:

The Boosting Game

$\inf_{w} \sup_{d} V(w, d) \le \frac{1}{2} - \epsilon$ (MINIMAX!)

∃ a distribution w* on weak learners that averages correctly on any data!
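Spelling out why such a w* exists, as a hedged sketch (with w ranging over distributions on weak learners, d over distributions on examples, and $\delta_x$ the point mass at example x):

% Existence of w* (editor's reconstruction of the slide's argument)
\sup_{d}\,\inf_{w}\, V(w,d) \le \tfrac{1}{2}-\epsilon
\quad\Longrightarrow\quad
\inf_{w}\,\sup_{d}\, V(w,d) \le \tfrac{1}{2}-\epsilon
\qquad\text{(minimax)},
% so some mixture w^* satisfies V(w^*, \delta_x) \le \tfrac{1}{2}-\epsilon
% for every single example x.

In other words, on every example a strict majority (by weight) of w* is correct, so the weighted-majority vote of w* classifies every example correctly.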

Page 59:

Meta-Algorithm #4

Play Algorithm #2 against Algorithm W [#2 maximizes W’s mistakes]

Page 60:

Meta-Algorithm #4

Play Algorithm #2 against Algorithm W [#2 maximizes W’s mistakes]:

Algorithm #2 ⟷ Algorithm W

$\inf_{w} \sup_{d} V(w, d) \le \frac{1}{2} - \epsilon$

Page 61:

Meta-Algorithm #4

Play Algorithm #2 against Algorithm W [#2 maximizes W’s mistakes]:

Algorithm #2 ⟷ Algorithm W

$\inf_{w} \sup_{d} V(w, d) \le \frac{1}{2} - \epsilon$

• Freund and Schapire 1995
• Best learning algorithm in the late 1990s and early 2000s
• The authors won the Gödel Prize
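A minimal sketch of the AdaBoost pattern that Meta-Algorithm #4 names, in Python. Here weak_learner is an assumed callable that, given examples, ±1 labels, and a distribution over examples, returns a hypothesis with weighted error at most 1/2 − ε; the names and the exact update are illustrative, not the speaker's code.

import math

# Minimal sketch of Meta-Algorithm #4 (AdaBoost-style boosting). Illustrative.
# The distribution over examples plays the role of Algorithm #2's weights:
# it multiplicatively up-weights the examples the weak learner gets wrong.

def boost(X, y, weak_learner, T=50):
    n = len(y)
    dist = [1.0 / n] * n                 # distribution over training examples
    hypotheses, alphas = [], []
    for _ in range(T):
        h = weak_learner(X, y, dist)     # weak learner answers the distribution
        err = sum(d for d, x, label in zip(dist, X, y) if h(x) != label)
        err = min(max(err, 1e-10), 1.0 - 1e-10)     # guard against err = 0 or 1
        alpha = 0.5 * math.log((1.0 - err) / err)   # weight of this hypothesis
        # Multiplicative-weights step: examples h got wrong gain weight,
        # forcing the next weak learner to focus on them.
        dist = [d * math.exp(-alpha * label * h(x))
                for d, x, label in zip(dist, X, y)]
        Z = sum(dist)
        dist = [d / Z for d in dist]     # renormalize to a distribution
        hypotheses.append(h)
        alphas.append(alpha)

    def strong(x):                       # weighted-majority combination
        return 1 if sum(a * h(x) for a, h in zip(alphas, hypotheses)) >= 0 else -1

    return strong

The returned strong classifier is the distribution-over-learners w* promised by the minimax argument, realized constructively.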

Page 62:

Deep thought #4: Your teachers are not your friends

Page 63:

The story so far…
• Boy met girl
• Boy spent 100s of millions of dollars wooing girl with deductive reasoning
• Girl said: “drop dead”; boy became very sad

Page 64:

The story so far…
• Boy met girl
• Boy spent 100s of millions of dollars wooing girl with deductive reasoning
• Girl said: “drop dead”; boy became very sad
• Boy learnt to learn from mistakes

Page 65:

The story so far…
• Boy met girl
• Boy spent 100s of millions of dollars wooing girl with deductive reasoning
• Girl showed no interest; boy became very sad
• Boy learnt to learn from mistakes

Next: Boy invites girl for coffee. Girl accepts!

Page 66:

Online Convex Optimization (deep learning)

Apply Algorithm #3 to nonconvex optimization. The theorems no longer apply (the problem is not convex) → tons of engineering on top of #3. Amazing performance. New mathematics needs to be invented!

Page 67:

Online Convex Optimization (deep learning)

In the last 2 years deep learning has:
• Achieved better-than-human performance at object recognition (ImageNet)
• Outperformed humans at recognising street signs (Google Street View)
• Reached superhuman performance on Atari games (DeepMind)
• Delivered real-time translation: English voice to Chinese text and voice

Page 68:

Thank you!

#1. Halving
#2. Multiplicative Weights / Exponential Weights Algorithm (EWA)
#3. Online Gradient Descent (OGD) / Stochastic Gradient Descent (SGD) / Mirror Descent / Backpropagation
#4. AdaBoost

Page 69:

Details? Lecture notes on my webpage: https://dl.dropboxusercontent.com/u/5874168/math482.pdf

Page 70:

Vladimir Vapnik

Alexey Chervonenkis, 1938 — 2014

Page 71:

“[A] theory of induction is superfluous. It has no function in a logic of science. The best we can say of a hypothesis is that up to now it has been able to show its worth, and that it has been more successful than other hypotheses although, in principle, it can never be justified, verified, or even shown to be probable. This appraisal of the hypothesis relies solely upon deductive consequences (predictions) which may be drawn from the hypothesis: There is no need to even mention induction.”
— Karl Popper

Page 72:

“the learning process may be regarded as a search for a form of behaviour which will satisfy the teacher (or some other criterion)”
— Alan Turing, Computing Machinery and Intelligence (1950)