
Blind online optimization: Gradient descent without a gradient

Abie Flaxman CMU

Adam Tauman Kalai TTI

Brendan McMahan CMU

Standard convex optimization

Convex feasible set S ⊆ ℝ^d

Concave function f : S → ℝ

Goal: find x with f(x) ≥ max_{z ∈ S} f(z) − ε = f(x*) − ε

Steepest ascent

• Move in the direction of steepest ascent

• Compute f′(x) (∇f(x) in higher dimensions)

• Works for convex optimization

(and many other problems)

[Figure: iterates x_1, x_2, x_3, x_4 climbing the function]
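A minimal numerical sketch of plain steepest ascent (my addition; the function and step size are made up for illustration):

```python
def gradient_ascent(grad_f, x0, eta=0.1, steps=100):
    """Plain steepest ascent: repeatedly step along the gradient."""
    x = x0
    for _ in range(steps):
        x = x + eta * grad_f(x)
    return x

# Example: maximize the concave f(x) = -(x - 3)^2, whose gradient is -2(x - 3).
print(gradient_ascent(lambda x: -2 * (x - 3), x0=0.0))  # ~3.0, the maximizer
```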

Typical application

• Company produces certain numbers of cars per month

• Vector x ∈ ℝ^d (#Corollas, #Camrys, …)

• Profit of company is concave function of production vector

• Maximize total (equivalently, average) profit

Problems

• Sequence of unknown concave functions f_1, f_2, …

• Period t: pick x_t from the convex set S, find out only f_t(x_t)

Problem definition and results

Theorem: against any sequence of concave functions, the algorithm presented here achieves sublinear expected regret:

max_{x ∈ S} Σ_{t=1}^T f_t(x) − E[ Σ_{t=1}^T f_t(x_t) ] = o(T)

Online model

• Holds for arbitrary sequences

• Stronger than the stochastic model:
  – f_1, f_2, … i.i.d. from D
  – x* = arg max_{x ∈ S} E_D[f(x)]

Outline

• Problem definition

• Simple algorithm

• Analysis sketch

• Variations

• Related work & applications

First try

[Figure: PROFIT vs. #CAMRYS; each period t the company plays x_t and observes only the value f_t(x_t), for t = 1, …, 4, while the underlying functions f_1, …, f_4 keep changing]

Zinkevich ’03: if we could only compute gradients, the update x_{t+1} = P_S(x_t + η ∇f_t(x_t)) would already give low regret against the best fixed x*.
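For reference, a minimal sketch of Zinkevich-style projected online gradient ascent in the full-information setting (my addition; the ball projection stands in for a general P_S):

```python
import numpy as np

def project_ball(x, radius=1.0):
    """Euclidean projection onto a ball; stands in for P_S."""
    norm = np.linalg.norm(x)
    return x if norm <= radius else x * (radius / norm)

def online_gradient_ascent(grad_oracles, x0, eta=0.1):
    """x_{t+1} = P_S(x_t + eta * grad f_t(x_t)), full information."""
    x = np.asarray(x0, dtype=float)
    plays = []
    for grad_f in grad_oracles:       # one unknown function per period
        plays.append(x.copy())
        x = project_ball(x + eta * grad_f(x))
    return plays

# Example: 50 periods of f_t(x) = -(x - c_t)^2 with a drifting target c_t.
targets = np.linspace(0.0, 1.0, 50)
plays = online_gradient_ascent([(lambda x, c=c: -2 * (x - c)) for c in targets],
                               x0=np.zeros(1))
print(plays[-1])  # tracks the recent targets, approaching 1.0
```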

Idea: one-point gradient

[Figure: PROFIT vs. #CAMRYS, with evaluation points x − δ, x, x + δ]

With probability ½, estimate = f(x + δ)/δ

With probability ½, estimate = −f(x − δ)/δ

E[estimate] = (f(x + δ) − f(x − δ)) / 2δ ≈ f′(x)
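A quick numerical check (not from the slides) that this randomized one-point estimate averages out to the centered difference, and hence to ≈ f′(x):

```python
import numpy as np

rng = np.random.default_rng(0)
f = lambda x: -(x - 3.0) ** 2          # concave test function, f'(x) = -2(x - 3)
x, delta = 1.0, 0.01

# With prob 1/2 evaluate at x + delta, with prob 1/2 at x - delta.
signs = rng.choice([1.0, -1.0], size=1_000_000)
estimates = signs * f(x + signs * delta) / delta

print(estimates.mean())  # close to f'(1.0) = 4.0
print(estimates.std())   # huge: each single estimate is very noisy
```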

d-dimensional online algorithm

• Each period t: draw u_t uniformly from the unit sphere, play y_t = x_t + δ u_t, and observe f_t(y_t)

• Update x_{t+1} = P_S(x_t + η (d/δ) f_t(y_t) u_t)

[Figure: iterates x_1, x_2, x_3, x_4 moving inside S, each period sampling from a small sphere around the current point]
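Putting the pieces together, a sketch of the resulting bandit algorithm (my own code; the step sizes and the ball-shaped S are illustrative assumptions):

```python
import numpy as np

def bandit_gradient_ascent(payoff, d, T, delta=0.1, eta=0.01, radius=1.0, seed=0):
    """One-point bandit gradient ascent: probe one random nearby point per
    period, turn the single observed value into a gradient estimate, and
    take a projected ascent step."""
    rng = np.random.default_rng(seed)
    x = np.zeros(d)
    total = 0.0
    for t in range(T):
        u = rng.normal(size=d)
        u /= np.linalg.norm(u)              # uniform direction on the unit sphere
        y = x + delta * u                   # the point actually played
        value = payoff(t, y)                # the only feedback we ever see
        total += value
        x = x + eta * (d / delta) * value * u   # one-point gradient step
        norm = np.linalg.norm(x)
        if norm > radius - delta:           # stay delta inside the ball so y is feasible
            x *= (radius - delta) / norm
    return total / T

# Example: a fixed concave payoff maximized at (0.5, ..., 0.5).
print(bandit_gradient_ascent(lambda t, y: -np.sum((y - 0.5) ** 2), d=3, T=20_000))
# average payoff climbs toward the optimum 0, up to O(delta) exploration loss
```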

Outline

• Problem definition

• Simple algorithm

• Analysis sketch

• Variations

• Related work & applications

Analysis ingredients

• E[1-point estimate] is the gradient of a smoothed function f̂

• f̂ − f is small

• Online gradient ascent analysis [Z03]

• Online expected gradient ascent analysis

• (Hidden complications)

1-pt gradient analysis

[Figure: PROFIT vs. #CAMRYS, with evaluation points x − δ and x + δ]

1-pt gradient analysis (d-dim)

• E[1-point estimate] is the gradient of the smoothed function f̂(x) = E_v[f(x + δv)], v uniform in the unit ball

• f̂ − f is small: |f̂(x) − f(x)| = O(δ) for Lipschitz f
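Written out, the identity behind the first bullet is the standard smoothing lemma (my reconstruction, not verbatim from the slide):

```latex
% Smoothed function: f averaged over a delta-ball around x
\hat{f}(x) \;=\; \mathbb{E}_{v \sim B}\bigl[\, f(x + \delta v) \,\bigr],
\qquad B = \text{unit ball}.
% The one-point estimate is an unbiased estimate of its gradient:
\nabla \hat{f}(x) \;=\; \mathbb{E}_{u \sim \mathbb{S}}
  \Bigl[\, \tfrac{d}{\delta}\, f(x + \delta u)\, u \,\Bigr],
\qquad \mathbb{S} = \text{unit sphere}.
```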

Online gradient ascent [Z03]

x_{t+1} = P_S(x_t + η ∇f_t(x_t))  ⇒  regret O(√T)

(concave f_t, bounded gradient)
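Spelled out, the standard form of the [Z03] guarantee (constants reconstructed from the usual statement, so treat as a sketch):

```latex
% For concave f_t with \|\nabla f_t\| \le G and \mathrm{diam}(S) \le D:
\sum_{t=1}^{T} f_t(x^*) - \sum_{t=1}^{T} f_t(x_t)
  \;\le\; \frac{D^2}{2\eta} + \frac{\eta G^2 T}{2}
  \;\le\; DG\sqrt{T}
\quad \text{for } \eta = \frac{D}{G\sqrt{T}}.
```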

Expected gradient ascent analysis

• Apply the deterministic gradient ascent analysis to the random estimates g_t: since E[g_t | x_t] is the gradient of the smoothed function, the [Z03] bound carries over in expectation

(concave, bounded gradient estimates)

Hidden complication…

• The algorithm samples at x_t + δu, which can land outside S near the boundary

• Fix: run the algorithm on a slightly shrunken set S′ so every sample stays feasible

• Thin sets are bad: shrinking a thin S gives away too much

• Round sets are good: …reshape S into “isotropic position” [LV03]

Outline

• Problem definition

• Simple algorithm

• Analysis sketch

• Variations

• Related work & applications

Variations

• Works against an adaptive adversary
  – Chooses f_t knowing x_1, x_2, …, x_{t−1}

• Also works if we only get a noisy estimate of f_t(x_t), i.e. E[h_t(x_t) | x_t] = f_t(x_t)

• The regret bound depends on the diameter of S and a bound on the gradient estimates

• Finite-difference gradient estimates fit the same framework
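For contrast with the one-point estimate, a sketch (my own) of the classical two-point finite-difference gradient mentioned above and in the table below; it needs 2d evaluations of a single function per gradient, which the bandit setting does not allow:

```python
import numpy as np

def finite_difference_gradient(f, x, delta=1e-4):
    """Classical two-point estimate: probe f twice per coordinate (2d
    evaluations of the *same* function, which the bandit model forbids)."""
    x = np.asarray(x, dtype=float)
    g = np.zeros_like(x)
    for i in range(len(x)):
        e = np.zeros_like(x)
        e[i] = delta
        g[i] = (f(x + e) - f(x - e)) / (2 * delta)
    return g

print(finite_difference_gradient(lambda z: -np.sum((z - 0.5) ** 2), np.zeros(3)))
# ~[1. 1. 1.], the true gradient at the origin
```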

Related convex optimization

                              Sighted                            Blind
                              (see entire function(s))           (evaluations only)

Regular (single f)            Gradient descent, Ellipsoid,       Finite difference
                              Random walk [BV02],
                              Sim. annealing [KV05], …

Stochastic (dist over f’s     Gradient descent (stoch.)          Finite difference,
or dist over errors)                                             1-pt. gradient appx. [G89,S97]

Online (f1, f2, f3, …)        Gradient descent (online) [Z03]    Finite difference [Kleinberg04],
                                                                 1-pt. gradient appx. [BKM04]

Multi-armed bandit (experts)

[Figure: slot machines with payoffs 1, 0, 0, 0; feasible set S]

[R52,ACFS95,…]

Driving to work (online routing)

• Exponentially many paths… exponentially many slot machines?

• No: the problem has finitely many dimensions, so the convex formulation applies

• Exploration/exploitation tradeoff

[TW02,KV02,AK04,BM04]

Online product design

Conclusions and future work

• Can “learn” to optimize a sequence of unrelated functions from evaluations

• Answer to:“What is the sound of one hand clapping?”

• Applications
  – Cholesterol
  – Paper airplanes
  – Advertising

• Future work
  – Many players using the same algorithm (game theory)
