
Page 1: Game-theoretic Online Learning

An Introduction to Game-theoretic Online Learning

Tim van Erven

General Mathematics Colloquium, Leiden, December 4, 2014

Page 2: Game-theoretic Online Learning

[Diagram: Statistics, with branches Bayesian Statistics and Frequentist Statistics]

Page 3: Game-theoretic Online Learning

[Same diagram, with a marker: "A. was here"]

Page 4: Game-theoretic Online Learning

[Same diagram, now also showing Rogue Factions]

Page 5: Game-theoretic Online Learning

[Same diagram, now also showing Adversarial Bandits]

Page 6: Game-theoretic Online Learning

[Same diagram, now also showing Game-theoretic Online Learning, with a marker: "You are here"]

Page 7: Game-theoretic Online Learning

Outline

● Introduction to Online Learning
– Game-theoretic Model
– Regression example: Electricity
– Classification example: Spam
● Three algorithms
● Tuning the Learning Rate

Page 8: Game-theoretic Online Learning

Game-Theoretic Online Learning

● Predict data that arrive one by one
● Model: a repeated game against an adversary
● Applications:

– spam detection

– data compression

– online convex optimization

– predicting electricity consumption

– predicting air pollution levels

– ...

Page 9: Game-theoretic Online Learning

Repeated Game (Informally)

● Sequentially predict outcomes
● Measure the quality of each prediction by a loss
● Before predicting, get predictions (= advice) from experts
● Goal: predict as well as the best expert over all rounds
● Data and advice can be adversarial

Page 10: Game-theoretic Online Learning

Repeated Game

● Every round:

1. Get expert predictions

2. Predict

3. Outcome is revealed

4. Measure losses

Page 11: Game-theoretic Online Learning

Repeated Game

● Every round:

1. Get expert predictions

2. Predict

3. Outcome is revealed

4. Measure losses

● Best expert:

● Goal: minimize regret
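The formulas on these protocol slides did not survive the transcript. In the standard prediction-with-expert-advice notation (my reconstruction; the symbols T, K, f_{k,t}, ŷ_t, y_t are assumptions, not copied from the slides), the quantities read:

```latex
% T rounds, K experts; in round t expert k predicts f_{k,t},
% we predict \hat{y}_t, and then the outcome y_t is revealed.
\hat{L}_T = \sum_{t=1}^{T} \ell(\hat{y}_t, y_t),
\qquad
L_{k,T} = \sum_{t=1}^{T} \ell(f_{k,t}, y_t),
\qquad
\mathrm{Regret}_T = \hat{L}_T - \min_{1 \le k \le K} L_{k,T}.
```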

Page 12: Game-theoretic Online Learning

Outline

● Introduction to Online Learning
– Game-theoretic Model
– Regression example: Electricity
– Classification example: Spam
● Three algorithms
● Tuning the Learning Rate

Page 13: Game-theoretic Online Learning

Regression Example: Electricity

● Électricité de France: predict electricity demand one day ahead, every day
● Experts: complicated regression models
● Loss:

[Devaine, Gaillard, Goude, Stoltz, 2013]

Page 14: Game-theoretic Online Learning

Regression Example: Electricity

● Électricité de France: predict electricity demand one day ahead, every day
● Experts: complicated regression models
● Loss:
● Best model after one year:
● How much worse are we? (See the regret-bookkeeping sketch below.)

[Devaine, Gaillard, Goude, Stoltz, 2013]
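The loss formula on this slide is not in the transcript. Purely as an illustration of the round-by-round bookkeeping behind the question above, here is a minimal Python sketch that tracks our cumulative loss against the best expert's; squared error is a stand-in loss, and all names are hypothetical rather than taken from Devaine et al.

```python
# Minimal sketch of regret bookkeeping for day-ahead demand forecasting.
# Squared error is only a stand-in; the loss used on the slide is not shown.

def squared_loss(prediction, outcome):
    return (prediction - outcome) ** 2

def regret_after_one_year(expert_forecasts, demands, combine):
    """expert_forecasts[t][k]: forecast of expert k for day t;
    combine: maps one day's K forecasts to our own forecast."""
    K = len(expert_forecasts[0])
    our_loss = 0.0
    expert_loss = [0.0] * K
    for forecasts, demand in zip(expert_forecasts, demands):
        our_loss += squared_loss(combine(forecasts), demand)
        for k, f in enumerate(forecasts):
            expert_loss[k] += squared_loss(f, demand)
    return our_loss - min(expert_loss)  # "How much worse are we?"
```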

Page 15: Game-theoretic Online Learning

Outline

● Introduction to Online Learning
– Game-theoretic Model
– Regression example: Electricity
– Classification example: Spam
● Three algorithms
● Tuning the Learning Rate

Page 16: Game-theoretic Online Learning

Example: Spam Detection

[Illustration: a stream of incoming e-mail messages, each labeled spam or ham]

Page 17: Game-theoretic Online Learning

Classification Example: Spam

● Experts: spam detection algorithms
● Messages:
● Predictions:
● Loss:
● Regret: the extra mistakes we make over the best algorithm on the messages
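The loss on this slide counts classification mistakes; written out in standard notation (my reconstruction, with f_{k,t} the prediction of detector k and y_t the true label):

```latex
% Zero/one loss; regret = our mistakes minus the mistakes of the best detector.
\ell(\hat{y}_t, y_t) = \mathbf{1}\{\hat{y}_t \neq y_t\},
\qquad
\mathrm{Regret}_T
  = \sum_{t=1}^{T} \mathbf{1}\{\hat{y}_t \neq y_t\}
  \;-\; \min_{k} \sum_{t=1}^{T} \mathbf{1}\{f_{k,t} \neq y_t\}.
```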

Page 18: Game-theoretic Online Learning

Outline

● Introduction to Online Learning
● Three algorithms:

1. Halving

2. Follow the Leader (FTL)

3. Follow the Regularized Leader (FTRL)

● Tuning the Learning Rate

Page 19: Game-theoretic Online Learning

A First Algorithm: Halving

● Suppose one of the spam detectors is perfect

● Keep track of experts without mistakes so far:

● Halving algorithm: predict the majority vote of the experts that are still without mistakes (sketched in code below)

● Theorem: regret ≤ log2(K)
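A minimal Python sketch of the Halving idea (my own illustrative code, assuming binary spam/ham labels; variable names are not from the slides):

```python
# Halving: keep the experts that have made no mistakes and predict their
# majority vote. If one expert is perfect, each of our mistakes removes at
# least half of the remaining experts, so we make at most log2(K) mistakes.

def halving(expert_predictions, outcomes):
    """expert_predictions[t][k] in {0, 1}; outcomes[t] in {0, 1}."""
    K = len(expert_predictions[0])
    alive = set(range(K))            # experts without mistakes so far
    mistakes = 0
    for preds, y in zip(expert_predictions, outcomes):
        votes = sum(preds[k] for k in alive)
        prediction = 1 if 2 * votes > len(alive) else 0   # majority vote
        if prediction != y:
            mistakes += 1
        alive = {k for k in alive if preds[k] == y}
    return mistakes
```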

Page 20: Game-theoretic Online Learning

A First Algorithm: Halving

Theorem: regret ≤ log2(K)

● Does not grow with the number of rounds T

Proof:
● Suppose Halving makes M mistakes; the perfect expert makes none, so regret = M
● Every mistake eliminates at least half of the experts that were still without mistakes
● So after M mistakes at most K/2^M experts remain; this set never becomes empty, hence M ≤ log2(K) mistakes

Page 21: Game-theoretic Online Learning

Outline

● Introduction to Online Learning
● Three algorithms:

1. Halving

2. Follow the Leader (FTL)

3. Follow the Regularized Leader (FTRL)

● Tuning the Learning Rate

Page 22: Game-theoretic Online Learning

Follow the Leader

● Want to remove the unrealistic assumption that one expert is perfect

● FTL: copy the leader's prediction (see the sketch below)

● The leader at time t: the expert with the smallest cumulative loss so far (break ties randomly)
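A minimal Follow-the-Leader sketch in the same style (illustrative code under the notation above, not the slide's own pseudocode):

```python
import random

def follow_the_leader(expert_predictions, outcomes, loss):
    """Each round, copy the prediction of the expert with the smallest
    cumulative loss so far, breaking ties uniformly at random."""
    K = len(expert_predictions[0])
    cumulative = [0.0] * K
    our_loss = 0.0
    for preds, y in zip(expert_predictions, outcomes):
        best = min(cumulative)
        leader = random.choice([k for k in range(K) if cumulative[k] == best])
        our_loss += loss(preds[leader], y)
        for k in range(K):
            cumulative[k] += loss(preds[k], y)
    return our_loss - min(cumulative)   # regret against the best expert
```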

Page 23: Game-theoretic Online Learning

FTL Works with Perfect Expert

Theorem: Suppose one of the spam detectors is perfect. Then the expected regret is O(log K).

Proof:
● Expected regret = E[nr. of mistakes] - 0, because the perfect expert makes no mistakes
● Worst case: the imperfect experts get one loss in turn
● E[nr. of mistakes] ≤ 1/K + 1/(K-1) + ... + 1/2 ≤ ln(K)

Page 24: Game-theoretic Online Learning

FTL: More Good News

● No assumption of perfect expert

Theorem: regret ≤ (number of leader changes)

Page 25: Game-theoretic Online Learning

FTL: More Good News

● No assumption of perfect expert

Theorem: regret ≤ (number of leader changes)

● Proof sketch:
– No leader change: our loss = the loss of the leader, so the regret stays the same
– Leader change: our regret increases by at most 1 (the range of the losses)

Page 26: Game-theoretic Online Learning

FTL: More Good News

● No assumption of perfect expert

Theorem: regret ≤ (number of leader changes)

● Proof sketch:
– No leader change: our loss = the loss of the leader, so the regret stays the same
– Leader change: our regret increases by at most 1 (the range of the losses)

● Works well for i.i.d. losses, because the leader changes only finitely many times w.h.p.

Page 27: Game-theoretic Online Learning

FTL on IID Losses

● 4 experts with Bernoulli 0.1, 0.2, 0.3, 0.4 losses; regret = O(log K)

Page 28: Game-theoretic Online Learning

FTL Worst-case Losses

Page 29: Game-theoretic Online Learning

FTL Worst-case Losses

● Two experts with a tie / leader change every round:

Expert 1:        1    0    1    0    1    0   ...
Expert 2:        0    1    0    1    0    1   ...
FTL (expected):  1/2  1    1/2  1    1/2  1   ...

● Both experts have cumulative loss of about T/2
● FTL's expected loss is about 3T/4, so the regret is linear in T
● Problem: FTL is too sure of itself when there are no ties!
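A quick simulation of this adversarial sequence (my own illustrative code): two experts whose losses alternate 1,0,1,0,... and 0,1,0,1,..., so FTL faces a tie or a leader change every round and its regret grows linearly.

```python
import random

def ftl_regret_on_alternating_losses(T, seed=0):
    """The two experts' losses alternate 1,0,1,0,... and 0,1,0,1,..."""
    random.seed(seed)
    cumulative = [0.0, 0.0]
    our_loss = 0.0
    for t in range(T):
        losses = [1.0, 0.0] if t % 2 == 0 else [0.0, 1.0]
        best = min(cumulative)
        leader = random.choice([k for k in (0, 1) if cumulative[k] == best])
        our_loss += losses[leader]
        cumulative = [c + l for c, l in zip(cumulative, losses)]
    return our_loss - min(cumulative)

print(ftl_regret_on_alternating_losses(1000))  # roughly T/4, i.e. around 250
```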

Page 30: Game-theoretic Online Learning

Outline

● Introduction to Online Learning
● Three algorithms:

1. Halving

2. Follow the Leader (FTL)

3. Follow the Regularized Leader (FTRL)

● Tuning the Learning Rate

Page 31: Game-theoretic Online Learning

Solution: Be Less Sure!

● Pull the FTL choices towards the uniform distribution
● Follow the Leader:
– leader:
– as a distribution:
● Follow the Regularized Leader:
– add a penalty for being away from uniform; the Kullback-Leibler divergence in this talk
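The FTRL formula itself is missing from the transcript. One standard way to write FTRL with a Kullback-Leibler penalty towards the uniform distribution u (my reconstruction in standard notation; solving it gives the exponential-weights update) is:

```latex
% w ranges over the probability simplex \Delta_K; L_{k,t} is the cumulative
% loss of expert k and \eta > 0 is the learning rate (notation assumed).
w_{t+1} = \operatorname*{arg\,min}_{w \in \Delta_K}
  \Big( \sum_{k=1}^{K} w_k L_{k,t} + \tfrac{1}{\eta}\,\mathrm{KL}(w \,\|\, u) \Big)
\quad\Longrightarrow\quad
w_{k,t+1} = \frac{e^{-\eta L_{k,t}}}{\sum_{j=1}^{K} e^{-\eta L_{j,t}}}.
```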

Page 32: Game-theoretic Online Learning

The Learning Rate

● Follow the Regularized Leader:

● Very sensitive to the choice of the learning rate η

[Diagram: the effect of the learning rate: very large η behaves like Follow the Leader; η near 0 doesn't learn at all]
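A small Python sketch of this update (illustrative exponential-weights code under the assumed formula above, not the slide's own): with a very large η the weights concentrate on the leader, like FTL; with η near 0 they stay uniform and nothing is learned.

```python
import math

def exponential_weights(expert_losses, eta):
    """expert_losses[t][k]: loss of expert k in round t, assumed in [0, 1].
    Returns our total loss when we play the weighted mixture of the experts."""
    K = len(expert_losses[0])
    cumulative = [0.0] * K
    total = 0.0
    for losses in expert_losses:
        smallest = min(cumulative)                 # shift for numerical stability
        weights = [math.exp(-eta * (c - smallest)) for c in cumulative]
        norm = sum(weights)
        total += sum(w / norm * l for w, l in zip(weights, losses))
        cumulative = [c + l for c, l in zip(cumulative, losses)]
    return total
```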

Page 33: Game-theoretic Online Learning

Outline

● Introduction to Online Learning
● Three algorithms
● Tuning the Learning Rate:

– Safe tuning

– The Price of Robustness

– Learning the Learning Rate

Page 34: Game-theoretic Online Learning

The Worst-case Safe Learning Rate

Theorem: For FTRL with the worst-case safe learning rate, the regret is O(√(T ln K)).

● No (probabilistic) assumptions about the data!
● Optimal
● The √T rate is standard in online learning
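The slide's exact constants are not in the transcript; for reference, the standard worst-case tuning for exponential weights with losses in [0, 1] (as in Cesa-Bianchi and Lugosi, 2006; this may differ slightly from the slide's statement) is:

```latex
\eta = \sqrt{\frac{8 \ln K}{T}}
\qquad\Longrightarrow\qquad
\mathrm{Regret}_T \;\le\; \sqrt{\tfrac{T}{2}\,\ln K}.
```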

Page 35: Game-theoretic Online Learning

Outline

● Introduction to Online Learning
● Three algorithms
● Tuning the Learning Rate:

– Safe tuning

– The Price of Robustness

– Learning the Learning Rate

Page 36: Game-theoretic Online Learning

The Price of Robustness

● Safe tuning does much worse than FTL on i.i.d. losses

Page 37: Game-theoretic Online Learning

The Price of Robustness

Method                           | Special case of FTRL? | Perfect-expert data | IID data        | Worst-case data
Halving                          | no                    | log2(K)             | undefined       | undefined
Follow the Leader                | yes (η very large)    | O(log K)            | O(log K)        | linear in T
FTRL with worst-case safe tuning | yes (η small)         | O(√(T ln K))        | O(√(T ln K))    | O(√(T ln K))

Can we adapt to the optimal η automatically?

Page 38: Game-theoretic Online Learning

Outline

● Introduction to Online Learning
● Three algorithms
● Tuning the Learning Rate:

– Safe tuning

– The Price of Robustness

– Learning the Learning Rate

Page 39: Game-theoretic Online Learning

A Failed Approach: Being Meta

● We want the best of FTL and of the safe tuning
● Idea: a meta-problem

– Expert 1: FTL

– Expert 2: FTRL with Safe Tuning

Page 40: Game-theoretic Online Learning

A Failed Approach: Being Meta

● We want the best of FTL and of the safe tuning
● Idea: a meta-problem

– Expert 1: FTL

– Expert 2: FTRL with Safe Tuning

● Regret ≤ min(regret of FTL, regret of safe tuning) + meta-regret
● Best of both worlds if the meta-regret is small!

Page 41: Game-theoretic Online Learning

A Failed Approach: Being Meta

● We want the best of FTL and of the safe tuning
● Idea: a meta-problem

– Expert 1: FTL

– Expert 2: FTRL with Safe Tuning

● Regret ≤ min(regret of FTL, regret of safe tuning) + meta-regret
● Best of both worlds if the meta-regret is small!
● But if the regret of FTL is small (say O(log K)) and the meta-regret is of order √T, then √T is too big!

Page 42: Game-theoretic Online Learning

Partial Progress

● Safe tuning: Regret = O(√(T ln K))
● Improvement for small losses: Regret bounded in terms of the cumulative loss of the best expert instead of T

Page 43: Game-theoretic Online Learning

Partial Progress

● Safe tuning: Regret = O(√(T ln K))
● Improvement for small losses: Regret bounded in terms of the cumulative loss of the best expert instead of T
● Variance Bounds: Regret bounded in terms of a cumulative variance of the losses

● Cesa-Bianchi, Mansour, Stoltz, 2007
● vE, Grünwald, Koolen, De Rooij, 2011
● De Rooij, vE, Grünwald, Koolen, 2014

Page 44: Game-theoretic Online Learning

Partial Progress

● Safe tuning: Regret = O(√(T ln K))
● Improvement for small losses: Regret bounded in terms of the cumulative loss of the best expert instead of T
● Variance Bounds: Regret bounded in terms of a cumulative variance of the losses
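The bound formulas themselves are missing from the transcript; schematically, with L*_T the cumulative loss of the best expert and V_T a cumulative variance term, the three families of guarantees have the following standard forms (my summary, up to constants and lower-order terms; see the cited papers for the precise statements):

```latex
\text{safe tuning:}\;\; \mathrm{Regret}_T = O\!\big(\sqrt{T \ln K}\big),
\quad
\text{small losses:}\;\; \mathrm{Regret}_T = O\!\big(\sqrt{L_T^{*} \ln K} + \ln K\big),
\quad
\text{variance:}\;\; \mathrm{Regret}_T = O\!\big(\sqrt{V_T \ln K} + \ln K\big).
```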

Page 45: Game-theoretic Online Learning

Regret is a Difficult Function 1/2

● All the previous solutions could potentially end up in the wrong local minimum

Page 46: Game-theoretic Online Learning

Regret is a Difficult Function 2/2

● Regret as a function of T for fixed η is non-monotonic.

● This means some η may look very bad for a while, but end up being very good after all

● How do we see which η's are good?
● Use the (best-possible) monotonic lower bound per η

Page 47: Game-theoretic Online Learning

Learning the Learning Rate

● Koolen, vE, Grünwald, 2014

Page 48: Game-theoretic Online Learning

Learning the Learning Rate

● Track performance for a grid of learning rates

● Switch between them
– Pay for switching, but not too much

● Running time as fast as for a single fixed η
– does not depend on the size of the grid

Page 49: Game-theoretic Online Learning

Learning the Learning Rate

Theorems:

● For all interesting η: as good as all previous methods, up to a polylog(K, T) factor

Page 50: Game-theoretic Online Learning

Learning the Learning Rate

● Koolen, vE, Grünwald, 2014

Page 51: Game-theoretic Online Learning

The Price of Robustness

Method                           | Special case of FTRL? | Perfect-expert data | IID data      | Worst-case data | Compared to optimal
Follow the Leader                | yes (η very large)    | O(log K)            | O(log K)      | linear in T     | no useful guarantees
FTRL with worst-case safe tuning | yes (η small)         | O(√(T ln K))        | O(√(T ln K))  | O(√(T ln K))    | no useful guarantees
LLR                              |                       |                     |               |                 | within a polylog(K, T) factor

Can we adapt to the optimal η automatically? Yes!

Page 52: Game-theoretic Online Learning

Summary

● Game-theoretic Online Learning
– e.g. electricity forecasting, spam detection
● Three algorithms:
– Halving, Follow the (Regularized) Leader
● Tuning the Learning Rate:
– Safe tuning pays the Price of Robustness
– Learning the learning rate adapts to the optimal learning rate automatically

[Diagram repeated: Statistics, with Bayesian Statistics, Frequentist Statistics, Rogue Factions, Adversarial Bandits, and Game-theoretic Online Learning ("You are here")]

Page 53: Game-theoretic Online Learning

Future Work

● So far: compete with the best expert

● Online Convex Optimization: compete with the best convex combination of experts

● Future work: extend LLR to online convex optimization

Page 54: Game-theoretic Online Learning

References

● Cesa-Bianchi and Lugosi. Prediction, learning, and games. 2006.

● Cesa-Bianchi, Mansour, Stoltz. Improved second-order bounds for prediction with expert advice. Machine Learning, 66(2/3):321–352, 2007.

● Devaine, Gaillard, Goude, Stoltz. Forecasting electricity consumption by aggregating specialized experts. Machine Learning, 90(2):231–260, 2013.

● Van Erven, Grünwald, Koolen and De Rooij. Adaptive Hedge. NIPS, 2011.

● De Rooij, Van Erven, Grünwald, Koolen. Follow the Leader If You Can, Hedge If You Must. Journal of Machine Learning Research, 2014.

● Koolen, Van Erven, Grünwald. Learning the Learning Rate for Prediction with Expert Advice. NIPS, 2014.