DPA5 1 Dynamic Programming Applications Lecture 5

Upload: regina-dennis

Post on 02-Jan-2016

Page 1: DPA51 Dynamic Programming Applications Lecture 5

DPA5 1

Dynamic Programming Applications

Lecture 5

Preview

Last time:

Structural properties.

Today:

Optimal stopping & the OLA rule (Secretary problem, Asset selling)

Next time:

Infinite horizon.

The RM problem

$J_t(x,i) = \max\{J_{t-1}(x),\ R_i + J_{t-1}(x-1)\} = (R_i - OC_{t-1}(x))^+ + J_{t-1}(x)$

Optimal policy: accept class i iff $R_i \ge OC_{t-1}(x) = J_{t-1}(x) - J_{t-1}(x-1)$

Results:

1. $J_t(x)$ increasing in x - by induction

2. $OC_t(x)$ decreasing in x - single crossing

3. $OC_t(x)$ increasing in t - by induction + 2:

$J_t(x) = \sum_i p_i (R_i - OC_{t-1}(x))^+ + J_{t-1}(x)$

$J_t(x-1) = \sum_i p_i (R_i - OC_{t-1}(x-1))^+ + J_{t-1}(x-1)$

$OC_t(x) - OC_{t-1}(x) = \sum_i p_i [(R_i - OC_{t-1}(x))^+ - (R_i - OC_{t-1}(x-1))^+] \ge 0$
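
The three results can be checked numerically on a small instance. The sketch below is an illustration, not part of the lecture. It assumes that t counts periods remaining with J_0(x) = 0, that x is the remaining capacity with J_t(0) = 0, and that in each period a class-i request arrives with probability p_i. The function names and the fare data are hypothetical.

```python
def rm_value_table(revenues, probs, capacity, horizon):
    """Return J with J[t][x] = expected revenue, t periods to go, x units left.

    Assumptions (not stated on the slide): J[0][x] = 0, J[t][0] = 0, and in
    each period a class-i request arrives with probability probs[i].
    """
    J = [[0.0] * (capacity + 1)]
    for _ in range(horizon):
        prev = J[-1]
        row = [0.0]  # with no capacity left nothing can be accepted
        for x in range(1, capacity + 1):
            oc = prev[x] - prev[x - 1]  # opportunity cost OC_{t-1}(x)
            gain = sum(p * max(r - oc, 0.0) for r, p in zip(revenues, probs))
            row.append(gain + prev[x])
        J.append(row)
    return J


def opportunity_cost(J, t, x):
    """OC_t(x) = J_t(x) - J_t(x-1), defined for x >= 1."""
    return J[t][x] - J[t][x - 1]


# Hypothetical instance: three fare classes, 5 units of capacity, 10 periods.
J = rm_value_table([100.0, 60.0, 30.0], [0.2, 0.3, 0.4], capacity=5, horizon=10)
print([round(opportunity_cost(J, 10, x), 2) for x in range(1, 6)])
```

Printing the opportunity costs for a fixed t shows them falling as x grows, and comparing rows t and t-1 shows them rising with t, in line with results 2 and 3.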

The RM problem - results

• The optimal policy is characterized by threshold levels $b_i^t$ as follows:

Accept class i at time t iff $0 \le x < b_i^t$, where $b_i^t = \min\{x \mid OC_{t-1}(x) > R_i\}$

• Moreover, $b_1^t \ge \dots \ge b_m^t$, where $R_1 \ge \dots \ge R_m$

Optimal Stopping

At each stage a control is available that stops the evolution of the system.

At stage k there are 2 options:

1. Stop process (get a certain reward)

2. Continue process, perhaps at a certain cost, and select one of the next available choices.

If there is only one other choice besides stopping, the policy is characterized by the set of stopping states.

Secretary Problems

• Cayley 1875

• Interview N candidates for a job

• Must accept/reject at end of interview

• Objectives:

– Maximize expected ‘score’

– Maximize P(get the best) (you risk hiring nobody!)

Archetype problem

Make an irrevocable choice from a fixed number of opportunities whose values are revealed sequentially.

• Asset selling

• Purchasing with a deadline

• Exercising stock options (in your next HW)

Max P(get best)

• $W_t$ = history of relative ranks of candidates seen by time t (inclusive)

• $x_t$ = 1 if the t-th candidate is the best seen so far, 0 otherwise

• Relevant: t and $x_t$

• Fact: the event $x_t = 1$ and $W_{t-1}$ are statistically independent: $P(x_t = 1 \mid W_{t-1}) = P(x_t = 1) = 1/t$

Objective

$J_t$ = P(under optimal policy we select best candidate given that we’ve rejected t-1 so far)

$J_t(0)$ = P(under optimal policy we select best candidate given that we’ve seen t so far and the last one was NOT the best so far)

$J_t(1)$ = …

P(best of N| best of first t) = ?

DP equation

$J_{N+1} = 0$

$J_t = \frac{t-1}{t} J_t(0) + \frac{1}{t} J_t(1)$

$J_t(0) = J_{t+1}$ (must continue)

$J_t(1) = \max(t/N,\ J_{t+1})$ (accept or continue)

Fact 1: $J_{t-1} \ge J_t$

Fact 2: $J_t$ is decreasing in t and $t/N$ is increasing in t ⇒ single crossing

Define: $t^* = \min\{t \mid J_{t+1} \le t/N\}$
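
The DP equation can be run backward from the terminal condition J_{N+1} = 0. The following is a minimal sketch under that reading of the slide; the function name `secretary_dp` is an assumption introduced here, not something from the lecture.

```python
def secretary_dp(N):
    """Backward recursion for the 'maximize P(get best)' secretary problem.

    Returns (J, t_star), where J[t] is the success probability under the
    optimal policy given that the first t-1 candidates were rejected, and
    t_star = min{t : J[t+1] <= t/N}.
    """
    J = [0.0] * (N + 2)  # index 0 unused; J[N+1] = 0
    for t in range(N, 0, -1):
        j0 = J[t + 1]               # candidate t is not best so far: must continue
        j1 = max(t / N, J[t + 1])   # candidate t is best so far: accept or continue
        J[t] = (t - 1) / t * j0 + j1 / t
    t_star = min(t for t in range(1, N + 1) if J[t + 1] <= t / N)
    return J, t_star


J, t_star = secretary_dp(5)
print(t_star, round(J[1], 4))  # for N = 5: t* = 3 and J_1 = 13/30
```

For N = 5 the recursion gives J_5 = 0.2, J_4 = 0.35, J_3 = J_2 = J_1 = 13/30, so the first two candidates are always passed over.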

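Recursion

$J_t = J_{t^*}$, if $t < t^*$

$J_t = \frac{t-1}{t} J_{t+1} + \frac{1}{N}$, if $t \ge t^*$

Dividing by $t-1$: $\frac{J_t}{t-1} = \frac{J_{t+1}}{t} + \frac{1}{N(t-1)}$

Therefore: $J_{t+1} = \frac{t}{N} \sum_{s=t}^{N-1} \frac{1}{s}$ (after telescoping)

By definition, $t^*$ is the smallest t such that $J_{t^*+1} \le t^*/N$, so

$t^* = \min\{t \mid \sum_{s=t}^{N-1} 1/s \le 1\} = ?$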
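
The threshold can also be found directly from the harmonic-sum condition, without running the DP. The sketch below is illustrative only; `harmonic_threshold` and `success_probability` are hypothetical names, and the special case t* = 1 is handled separately because the telescoped formula divides by t - 1.

```python
import math


def harmonic_threshold(N):
    """Smallest t in 1..N with sum_{s=t}^{N-1} 1/s <= 1 (empty sum = 0)."""
    tail = 0.0
    t_star = N  # at t = N the sum is empty, so the condition holds
    for t in range(N - 1, 0, -1):
        tail += 1.0 / t  # tail is now sum_{s=t}^{N-1} 1/s
        if tail > 1.0:
            break
        t_star = t
    return t_star


def success_probability(N):
    """J_{t*}, using J_{t+1} = (t/N) * sum_{s=t}^{N-1} 1/s with t = t* - 1."""
    t_star = harmonic_threshold(N)
    if t_star == 1:
        return 1.0 / N  # the first candidate is accepted outright
    t = t_star - 1
    return t / N * sum(1.0 / s for s in range(t, N))


for n in (5, 100, 1000):
    print(n, harmonic_threshold(n), round(success_probability(n), 4))
print(round(1 / math.e, 4))
```

As N grows, both t*/N and the success probability approach 1/e.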

Policy

• For large N: $\sum_{s=t_0}^{N-1} 1/s \approx \log_e(N/t_0)$

• Therefore $t_0 \approx N/e$

• Policy: Interview the first N/e candidates and reject them, then select the first candidate who is the best seen so far.

• P(success) = $J(t_0) \approx t_0/N \approx 1/e \approx .3679$

• Empirical validation?
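
One way to approach the empirical-validation question is a Monte Carlo experiment. The sketch below is an assumption-laden illustration, not the lecture's method: candidates are modeled as a uniformly random permutation of distinct scores, the cutoff is taken as int(N/e), and the function name `simulate_cutoff_rule` is hypothetical.

```python
import math
import random


def simulate_cutoff_rule(N, trials, seed=0):
    """Estimate P(select the best) for the 'reject the first N/e' rule."""
    rng = random.Random(seed)
    k = int(N / math.e)  # number of candidates rejected outright
    wins = 0
    for _ in range(trials):
        scores = list(range(N))  # score N-1 is the overall best candidate
        rng.shuffle(scores)
        best_seen = max(scores[:k], default=-1)
        for s in scores[k:]:
            if s > best_seen:  # first candidate who is best so far: hire
                wins += (s == N - 1)
                break
        # if the loop never breaks, nobody is hired and the trial is a loss
    return wins / trials


print(simulate_cutoff_rule(100, 20000, seed=1), 1 / math.e)
```

With enough trials the estimated success rate should settle near 1/e, up to sampling noise.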

The Last Shall be First

“..The last person interviewed for a job gets it 55.8% of the time according to Runzheimer Canada, Inc. Early applicants are hired only 17.6% of the time; the management consulting firm suggests that job-seekers who find they are among the first to be grilled ‘tactfully ask to be rescheduled for a later date’. Mondays are also poor days to be interviewed and any day just before quitting time is also bad.”

(The Globe and Mail, Sept. 12, 1990, pg. A22)

Asset selling

• Like maximizing interview score, but with discounting/investment

• Offers: $w_0, w_1, \dots, w_{N-1}$ i.i.d. with fixed known distribution (if not known: inference, learning)

• Stage k choices:

1. Accept, and invest the amount $w_k$ at rate r

2. Reject, and wait until stage k+1

• Objective: maximize revenue at end of period N

Formulation

State:

• $x_k \ne T$: asset has not been sold, current offer is $x_k$

• $x_k = T$: asset has been sold

Decision:

• $u_k = u$: sell; $u_k = u'$: don’t sell

Plant equation:

$x_{k+1} = T$, if $x_k = T$, or if $x_k \ne T$ and $u_k = u$ (sell)

$x_{k+1} = w_k$, otherwise

Costs

$g_N(x_N) = x_N$ if $x_N \ne T$; 0 otherwise

$g_k(x_k) = (1+r)^{N-k} x_k$ if $x_k \ne T$ and $u_k = u$; 0 otherwise

$J_N(x_N) = x_N$ if $x_N \ne T$; 0 otherwise

$J_k(x_k) = \max((1+r)^{N-k} x_k,\ E_w\{J_{k+1}(w_k)\})$ if $x_k \ne T$; 0 otherwise

Policy

• Accept offer $x_k$ if $x_k > a_k$

• Reject offer $x_k$ if $x_k < a_k$

• Indifferent if $x_k = a_k$

Optimal policy is determined by the sequence $a_k$:

• $a_k = E_w\{J_{k+1}(w_k)\} / (1+r)^{N-k}$

Structural properties

Fact: $a_k \ge a_{k+1}$ for all k

Intuition:

if an offer is good enough to be acceptable at time k, it should be so at time k+1.
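
The thresholds and their monotonicity can be illustrated on a small example. Dividing the asset-selling DP equation by (1+r)^(N-k) gives J_k(x)/(1+r)^(N-k) = max(x, a_k), and therefore a_(N-1) = E[w]/(1+r) and a_k = E[max(w, a_(k+1))]/(1+r). The sketch below uses that normalized form; it assumes a discrete offer distribution, and the function name and the numbers are hypothetical.

```python
def asset_thresholds(offers, probs, r, N):
    """Return [a_0, ..., a_{N-1}] for i.i.d. offers with a discrete law.

    Uses the normalized recursion a_{N-1} = E[w] / (1 + r) and
    a_k = E[max(w, a_{k+1})] / (1 + r), derived from
    a_k = E{J_{k+1}(w)} / (1 + r)^(N - k).
    """
    a = [0.0] * N
    a[N - 1] = sum(p * w for w, p in zip(offers, probs)) / (1 + r)
    for k in range(N - 2, -1, -1):
        expected = sum(p * max(w, a[k + 1]) for w, p in zip(offers, probs))
        a[k] = expected / (1 + r)
    return a


# Hypothetical data: offers of 50, 100 or 150, interest rate 5%, N = 5.
a = asset_thresholds([50.0, 100.0, 150.0], [0.25, 0.5, 0.25], r=0.05, N=5)
print([round(v, 2) for v in a])
```

The printed sequence is non-increasing in k: the seller is choosier early on and accepts lower offers as the deadline approaches.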

General stopping & OLA

• Stopping mandatory at or before stage N

• Stationary: the state, control, and disturbance spaces (sets) and the cost per stage are constant over time

• Extra action: go to termination state at cost $t(x_k)$

DP-algorithm:

$J_N(x_N) = t(x_N)$

$J_k(x_k) = \min(t(x_k),\ \min_{u_k} E_w\{g(x_k,u_k,w_k) + J_{k+1}(f(x_k,u_k,w_k))\})$

Stopping set

It is optimal to stop at time k for states x in the set:

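$T_k = \{x \mid t(x) \le \min_u E\{g(x,u,w) + J_{k+1}(f(x,u,w))\}\}$

Fact: $J_{N-1}(x) \le J_N(x)$, so $J_{k-1}(x) \le J_k(x)$ for all k, x.

Cor.: $T_0 \subseteq \dots \subseteq T_k \subseteq T_{k+1} \subseteq \dots \subseteq T_{N-1}$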

Question: how to guarantee equality?

Absorbance

Condition: $T_{N-1}$ is absorbing if, whenever $x \in T_{N-1}$ and termination is not selected, the next state is also in $T_{N-1}$.

That is, $f(x,u,w) \in T_{N-1}$ for all $x \in T_{N-1}$, $u \in U(x)$, w.

Intuition: if you reach a state that’s optimal to stop at, but you don’t stop, then you move to a state that’s also optimal to stop at.

Theorem: If $T_{N-1}$ is absorbing then $T_k = T_{N-1}$ for all k.

OLA policy: iff $T_{N-1}$ (1-step stopping set) absorbing.
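
The theorem can be illustrated on a small example that is not from the lecture: purchasing with recall. The state x is the lowest price seen so far, stopping costs t(x) = x, and continuing costs a hypothetical search fee c and moves the state to min(x, w) for a fresh price w. Since the state can only decrease, the one-step stopping set {x : x <= c + E[min(x, w)]} is absorbing, so every T_k should coincide with it. All names and numbers below are assumptions made for the sketch.

```python
def stopping_sets(prices, probs, c, N):
    """Run the stopping DP and return {k: T_k} for k = 0..N-1.

    J_N(x) = t(x) = x; J_k(x) = min(x, c + E[J_{k+1}(min(x, w))]).
    There is a single 'continue' action, so no minimization over u is needed.
    """
    J_next = {x: float(x) for x in prices}  # terminal cost J_N
    sets = {}
    for k in range(N - 1, -1, -1):
        J_k, T_k = {}, set()
        for x in prices:
            cont = c + sum(p * J_next[min(x, w)] for w, p in zip(prices, probs))
            if x <= cont:  # stopping is at least as good as continuing
                T_k.add(x)
            J_k[x] = min(float(x), cont)
        sets[k] = T_k
        J_next = J_k
    return sets


# Hypothetical data: prices 1..10 equally likely, search fee 0.7, N = 8.
sets = stopping_sets(list(range(1, 11)), [0.1] * 10, c=0.7, N=8)
print(sorted(sets[7]), all(sets[k] == sets[7] for k in sets))
```

Here the one-step look-ahead rule (stop as soon as the best price seen is in T_{N-1}) agrees with the full DP at every stage.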