Dynamic Programming

Operations Research

Dynamic programming

• Series of inter-related decisions

• How to find the optimal combination of decisions over time?

• Production Scheduling, Inventory Control, Warehousing, given changes in demand

– How much to produce, store and sell over a period of time

Sometimes it's easier to work backwards from the end!

• You are going for an arranged marriage and you have 10 proposals, whom you will meet one after the other.

• Once you have rejected a proposal, you cannot get the same girl again!

• The current profile may be the best, or there may be better profiles yet to come . . .

• How to decide?

A problem along similar lines

• Let's say you have 10 tries to receive a random amount between 0 and 100. At each try, you can either accept what you get and quit, or pass to the next try.

• At each try, when would you accept or reject the value that comes up?

• Working backwards,

– On the last attempt (assuming you passed 9 times), you will have to take whatever value comes – expected value = 50.

– On the second last attempt, you can either accept whatever comes or pass to last attempt where you expect to get 50. So, in second last attempt, if you get more than 50 you should accept, else pass.

– On the third last attempt, you can either accept or pass to the second last attempt. There you accept half the time (anything above 50, expecting 75) and pass the other half to the last attempt (expecting 50). On average you would expect to get 0.5 × 75 + 0.5 × 50 = 62.5 over the last two attempts. Therefore, if you get more than 62.5 in the third last attempt, you should accept, else pass.

– On the 4th last attempt, you can either accept or pass to the 3rd last attempt. There you accept anything above 62.5 (probability 0.375, expecting 81.25) and otherwise pass, expecting 62.5 from the last two attempts. On average you would expect to get 0.375 × 81.25 + 0.625 × 62.5 ≈ 69.5 over the last 3 attempts. Therefore, if you get more than 69.5 in the 4th last attempt, you should accept, else pass.

– And so on . . .
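The backward reasoning above can be sketched in a few lines of Python. This is a minimal illustration, assuming each draw is uniformly distributed between 0 and 100 (which is what the expected values 50 and 75 imply); the function name `continuation_values` is made up for this sketch. Each step weights the "accept" and "pass" outcomes by their probabilities, so the value with three tries left works out to about 69.5.

```python
def continuation_values(tries, high=100.0):
    """Expected payoff under the best policy, indexed by tries left.

    values[k] is the expected payoff with k + 1 tries remaining,
    assuming each draw is uniform on [0, high] (an assumption of this sketch).
    """
    values = [high / 2.0]  # one try left: you must accept, so expect high / 2
    for _ in range(tries - 1):
        t = values[-1]  # value of passing = threshold for accepting now
        p_accept = (high - t) / high      # chance the draw beats the threshold
        mean_if_accept = (high + t) / 2.0  # average draw when it does
        # Accept with probability p_accept, otherwise pass and expect t.
        values.append(p_accept * mean_if_accept + (1.0 - p_accept) * t)
    return values


values = continuation_values(10)
for tries_left in range(2, len(values) + 1):
    # With k tries left, accept only if the draw beats the value of passing.
    threshold = values[tries_left - 2]
    print(f"{tries_left} tries left: accept if above {threshold:.2f}")
```

The first three entries of `values` are 50, 62.5 and 69.53125; the thresholds keep rising as more tries remain, because passing becomes less risky.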

Finding Shortest Path

[Figure: network of nodes A to J, arranged in Stage 1 through Stage 5]

Working Backwards

Stage 3

From   via H   via I   Shortest path   Go to
E      1+3     4+4     4               H
F      6+3     3+4     7               I
G      3+3     3+4     6               H

Stage 4

From   via E   via F   via G   Shortest path   Go to
B      7+4     4+7     6+6     11              E or F
C      3+4     2+7     4+6     7               E
D      4+4     1+7     5+6     8               E or F

Stage 5

From   via B   via C   via D   Shortest path   Go to
A      2+11    4+7     2+8     10              D

Shortest routes: ADEHJ = 2+4+1+3 = 10 and ADFIJ = 2+1+3+4 = 10
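The same backward pass can be sketched in Python. The arc lengths below are read off the first term of each sum in the tables; the lengths H to J = 3 and I to J = 4 are inferred from the second term of the Stage 3 sums, since the network figure itself is not reproduced here. The names `COSTS`, `backward_induction` and `all_routes` are made up for this sketch.

```python
# Arc lengths read off the tables (H->J and I->J are inferred, see above).
# Nodes are listed stage by stage; the solver relies on that ordering.
COSTS = {
    "A": {"B": 2, "C": 4, "D": 2},
    "B": {"E": 7, "F": 4, "G": 6},
    "C": {"E": 3, "F": 2, "G": 4},
    "D": {"E": 4, "F": 1, "G": 5},
    "E": {"H": 1, "I": 4},
    "F": {"H": 6, "I": 3},
    "G": {"H": 3, "I": 3},
    "H": {"J": 3},
    "I": {"J": 4},
}


def backward_induction(costs, destination):
    """Return (best distance to destination, best next nodes) per node."""
    best = {destination: 0}
    go_to = {}
    # Walk the nodes from the last stage to the first, so every successor
    # is already solved when a node is processed.
    for node in reversed(list(costs)):
        totals = {nxt: c + best[nxt] for nxt, c in costs[node].items()}
        best[node] = min(totals.values())
        go_to[node] = [nxt for nxt, t in totals.items() if t == best[node]]
    return best, go_to


def all_routes(node, go_to, destination):
    """List every shortest route from node, following the 'go to' choices."""
    if node == destination:
        return [node]
    return [node + rest
            for nxt in go_to[node]
            for rest in all_routes(nxt, go_to, destination)]


best, go_to = backward_induction(COSTS, "J")
routes = all_routes("A", go_to, "J")
print(best["A"], routes)
```

With these arc lengths, `best["A"]` is 10 and `routes` contains ADEHJ and ADFIJ, matching the tables. Ties are kept as lists in `go_to`, which is how the "E or F" entries show up.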

Bellman’s Principle of Optimality

• “An optimal policy has the property that whatever the initial state and initial decision are, the remaining decisions must constitute an optimal policy with regard to the state resulting from the first decision.”
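For the shortest-path example, the principle can be written as a recursion. The notation is introduced here for illustration and is not from the slides: f*(s) is the shortest distance from node s to J, c(s, x) is the length of the arc from s to x, and N(s) is the set of nodes directly reachable from s.

```latex
f^*(J) = 0, \qquad
f^*(s) = \min_{x \in N(s)} \bigl[\, c(s, x) + f^*(x) \,\bigr]
```

For instance, the Stage 5 row is this recursion evaluated at A: f*(A) = min{2 + 11, 4 + 7, 2 + 8} = 10.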
