
Page 1: Lecture 06 Dynamic Programming(1) 1

Process Optimisation

Dynamic programming

Page 2:

Introduction

Richard Ernest Bellman (August 26, 1920 - March 19, 1984) was an applied mathematician, celebrated for his invention of dynamic programming in 1953 and for important contributions in other fields of mathematics. During World War II he worked in the Theoretical Physics Division at Los Alamos. He received his Ph.D. from Princeton in 1946 and was later a professor at the University of Southern California.

Page 3:

Dynamic programming converts a large, complicated optimisation problem into a series of interconnected smaller ones, each containing only a few variables.

The result is a series of partial optimisations requiring a reduced effort to find the optimum, even though some of the variables may have to be enumerated throughout their range.

Then, the dynamic programming algorithm can be applied to find the optimum of the entire process by using the connected partial optimisations of the smaller problems.

Introduction

Page 4:

Introduction

Page 5:

This optimisation procedure was developed at the same organisation where Dantzig developed linear programming: the RAND Corporation, a U.S. Air Force-sponsored "think tank".

The research was in response to the need in the early 1950s, at the start of the missile age, for a solution to the optimum missile trajectory problem, which required extensions to the calculus of variations.

Two parallel efforts, one in U.S. by Richard Bellman and another in Russia by L. S. Pontryagin, led to similar but different solutions to the problem.

History

Page 6:

Ground-radar-controlled missile chasing a moving target.

History

Page 7:

The name, Dynamic Programming, was selected by Richard Bellman for this optimisation method, which he devised and described in a series of papers and in two books (Bellman, 1957; Bellman and Dreyfus, 1962).

There are continuous and discrete versions of this optimisation method. The continuous version is used for solutions to the trajectory problem where a continuous function is required, and the discrete version is used when a problem can be described in a series of stages.

Most engineering applications use the discrete version of dynamic programming.

History

Page 8:

Aris (1961) published results of research on the application of dynamic programming to the optimal design of chemical reactors, and Mitten and Nemhauser (1963) described a procedure for applying the method to a chemical process involving a branched system.

Wilde (1965) developed the concept of functional diagrams to represent the functional equations of dynamic programming and a systematic method of converting a process flow diagram to a dynamic programming functional diagram. These results have become the standard way of analysing processes for dynamic programming optimisation.

History

Page 9:

Definition of a stage

[Diagram: stage i receives the input state si and the decision di, produces the return Ri(si,di), and passes on the output state s̃i(si,di).]

Page 10:

Decision variables, di, can be manipulated independently, whilst state variables, si, are inputs to the stage from an adjacent stage and therefore cannot be manipulated independently.

The economic model is called a return function Ri(si,di) and gives the measure of profit or cost for the stage.

The stage will have outputs, s̃i, that are inputs to adjacent stages. Each stage has a transition function, s̃i=Ti(si,di), which could represent the material and energy balances at the stage.

Definition of a stage
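These definitions can be made concrete in code: a stage is just a pair of functions, one for the return and one for the transition. The sketch below is in Python; the particular return and transition functions are hypothetical, not taken from the lecture.

```python
from dataclasses import dataclass
from typing import Callable

@dataclass
class Stage:
    """One stage i of a serial process."""
    return_fn: Callable[[float, float], float]   # R_i(s_i, d_i): profit or cost
    transition: Callable[[float, float], float]  # T_i(s_i, d_i) -> s~_i (output state)

# Hypothetical stage: profit s*d, with a simple balance s - d as the transition.
stage = Stage(return_fn=lambda s, d: s * d,
              transition=lambda s, d: s - d)
```

Given an input state s = 3 and decision d = 2, this stage returns a profit of 6 and passes the output state 1 on to the adjacent stage.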

Page 11:

Three stage process

Functional diagram for a three-stage serial process.

[Diagram: stage 3 (input s3, decision d3, return R3(s3,d3), optimal return f3(s3)) feeds stage 2 through s̃3 = s2; stage 2 (d2, R2(s2,d2), f2(s2)) feeds stage 1 through s̃2 = s1; stage 1 (d1, R1(s1,d1), f1(s1)) produces the final output s̃1.]

Page 12:

The constraint equations, or transition functions, and the incident identities can be written as:

Optimise:    sum over i = 1..3 of Ri(si,di)

Subject to:  s̃i = Ti(si,di),   i = 1, 2, 3
             s̃i+1 = si,        i = 1, 2

There are four independent variables, d1, d2, d3 and s3, which are to be determined to optimise the sum of the returns R1, R2 and R3.

Three stage process

Page 13:

At the first stage, the following equation gives the dynamic programming algorithm in terms of maximising the profit given by the return function.

It is necessary to exhaustively list individual values of s1 and to search on d1 to determine f1(s1). The values of f1(s1) are tabulated and stored for future use.

f1(s1) = max_d1 [ R1(s1,d1) ]

First stage
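For a discrete problem, this tabulation can be sketched directly: enumerate s1, search on d1, and store the result. The return function and grids below are hypothetical stand-ins, not the lecture's.

```python
# Exhaustively list values of s1, search on d1, and store f1(s1) for later use.
def R1(s1, d1):
    return s1 * d1 - d1 ** 2      # hypothetical stage-1 return

s1_grid = [0, 1, 2, 3]            # enumerated state values
d1_grid = [0, 1, 2, 3]            # candidate decisions

f1 = {}                           # stored table: s1 -> (optimal return, optimal d1)
for s1 in s1_grid:
    f1[s1] = max((R1(s1, d1), d1) for d1 in d1_grid)
```

The table f1 is exactly what the second stage will look up once the incident identity supplies s1 from s̃2.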

Page 14:

At stage 2, the optimal information at stage 1 is used, and the dynamic programming algorithm at this stage is:

Again, it is necessary to exhaustively list individual values of s2 and to search on d2 to obtain the maximum of the sum of the return at stage 2 and the optimal return from stage 1, f1(s1). The appropriate values of f1(s1) are determined using the incident identity and the transition function, s1=s̃2=T2(s2,d2). Thus, the optimal values f2(s2) can be determined and stored for future use.

f2(s2) = max_d2 [ R2(s2,d2) + f1(s1) ]

Second stage

Page 15:

At the final stage, the optimal information f2(s2) from stage 2 is used, and the dynamic programming algorithm at this stage is:

At this point either the value of s3 is known or it is an independent variable. If s3 is a known constant value, it is necessary only to determine the value of d3 that maximises f3(s3) for that value of s3. If s3 is an independent variable, it is necessary to conduct a two-variable search over s3 and d3 to determine the maximum value of f3(s3).

f3(s3) = max_d3 [ R3(s3,d3) + f2(s2) ]

Final stage

Page 16:

“An optimal policy has the property that whatever the initial state and initial decision, the remaining decisions must constitute an optimal policy with regard to the state resulting from the first decision.”

This principle was stated mathematically as the dynamic programming algorithm for stage i of a serial process:

fi(si) = max_di [ Ri(si,di) + fi-1(si-1) ]

Principle of optimality
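The stage-by-stage recursion can be implemented directly for a discrete serial process. The sketch below assumes hypothetical return and transition functions (R = s*d, T = s - d) and small integer grids; only decisions whose output state stays on the grid are considered feasible.

```python
def solve_serial(n_stages, R, T, states, decisions):
    """Backward recursion: f_i(s) = max_d [ R(s,d) + f_{i-1}(T(s,d)) ]."""
    f = {s: 0.0 for s in states}              # f_0 = 0: no stages remaining
    policy = []
    for _ in range(n_stages):
        fi = {}
        for s in states:
            # search on d; f_{i-1} is looked up at the output state T(s, d)
            fi[s] = max((R(s, d) + f[T(s, d)], d)
                        for d in decisions if T(s, d) in f)
        policy.append({s: fi[s][1] for s in states})
        f = {s: fi[s][0] for s in states}
    return f, policy

f3, policy = solve_serial(3, lambda s, d: s * d, lambda s, d: s - d,
                          states=range(4), decisions=range(4))
```

Each pass tabulates and stores one stage's optimal returns, exactly as the three-stage example above does by hand.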

Page 17:

A tank truck of an expensive product manufactured in San Francisco is to be delivered to any major port on the East coast for shipment to Europe. The cost for shipment across the Atlantic is essentially the same from the major ports on the East coast. It is desired to select the optimum route (lowest road mileage) from San Francisco to the East coast.

Route planning

Page 18:

Route planning

Page 19:

Route planning

[Figure: functional diagram and road map for the route-planning problem. Four stages lead from San Francisco (stage 4), through Seattle or San Diego and the intermediate cities N3/C3/S3, N2/C2/S2 and N1/C1/S1 (stages 3, 2 and 1), to the East coast ports Boston, New York and Philadelphia, with the road mileage marked on each link.]

Page 20:

Route planning

[Figure: the same road map, annotated with the optimal cost-to-go values from the backward recursion.]

Solution table:

Stage 4               Stage 3                 Stage 2               Stage 1
s4   d4*  f4  s3      s3  d3*  f3  s2         s2  d2*  f2  s1       s1  d1*  f1  s̃1
SFO  L    16  N3      N3  S/R  10  N2/C2      N2  S    9   N1       N1  S    1   BOS
                      C3  S/R  13  C2/S2      C2  L    7   N1       C1  L    3   BOS
                      S3  L/S  15  C2/S2      S2  L    6   C1       S1  L    7   NY
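The backward recursion behind this table can be sketched as follows. The network and mileages below are illustrative stand-ins, not the lecture's actual map data.

```python
# Staged shortest-route by backward recursion:
# f(city) = min over next cities of [ d(city, next) + f(next) ].
dist = {                      # hypothetical link mileages
    'A':  {'B1': 4, 'B2': 2},
    'B1': {'C1': 3, 'C2': 6},
    'B2': {'C1': 5, 'C2': 1},
    'C1': {'END': 2},
    'C2': {'END': 4},
}

def shortest(dist, start, goal):
    f = {goal: (0, None)}              # cost-to-go and best next city
    def solve(city):
        if city not in f:
            f[city] = min((d + solve(nxt)[0], nxt)
                          for nxt, d in dist[city].items())
        return f[city]
    solve(start)
    route, c = [start], start          # recover the route from stored decisions
    while f[c][1] is not None:
        c = f[c][1]
        route.append(c)
    return f[start][0], route

total, route = shortest(dist, 'A', 'END')
```

As in the slide's table, each city stores its optimal remaining mileage and the decision that achieves it, so the best route is read off by following the stored decisions forward.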

Page 21:

A serial system has the output of one stage as the input to the following stage, and the maximum of the sum of the profits from each stage is to be found by determining the optimal values of the decision variables.

Serial system optimisation problems are of four types: initial value, final value, two-point boundary value and cyclic problems. In an initial value problem sN is a known constant; in a final value problem s̃1 is a known constant; and in a two-point boundary value problem both sN and s̃1 are known. In a cyclic problem sN = s̃1, and the best value of this common state has to be determined that maximises fN(sN).

Serial systems

Page 22:

Serial systems

Functional diagram for a serial process.

[Diagram: stage N (input sN, decision dN, return RN, optimal return fN(sN)) feeds stage N-1 through s̃N = sN-1; the chain continues down to stage 2 (d2, R2, f2(s2)), which feeds stage 1 through s̃2 = s1; stage 1 (d1, R1, f1(s1)) produces the final output s̃1.]

Page 23:

The dynamic programming algorithm for the ith stage of the initial value problem is:

Substituting the incident identity and transition function into this equation gives:

which shows that fi is a function of si, optimising out di.

Initial value problem

fi(si) = max_di [ Ri(si,di) + fi-1(si-1) ]

fi(si) = max_di [ Ri(si,di) + fi-1(Ti(si,di)) ]

Page 24:

At the last stage, stage N, the dynamic programming algorithm is:

If the value of sN is a known constant, the maximum return is fN(sN), and an exhaustive tabulation of sN is not required. However, if sN is not a constant and can be manipulated like a decision variable to maximise fN(sN), the dynamic programming algorithm at stage N is:

Initial value problem

fN(sN) = max_dN [ RN(sN,dN) + fN-1(TN(sN,dN)) ]

fN = max_sN,dN [ RN(sN,dN) + fN-1(sN-1) ]

Page 25:

For this situation the output from the first stage, s̃1, is a known constant. There are two approaches to solve this problem which are called state inversion and decision inversion.

State inversion means to transform the final value problem into an initial value problem by obtaining the N inverse transition functions, i.e., solve the transition functions for si in terms of s̃i as indicated below.

Reversing the arrows and renumbering the stages makes the problem into an initial value one.

Final value problem

si = T̄i(s̃i,di)

Page 26:

State inversion

Functional diagram for state inversion for the final value problem.

[Diagram: the serial chain of stages N, N-1, ..., 2, 1 with the state-flow arrows reversed, so that the known output s̃1 becomes the input and the renumbered stages form an initial value problem.]

Page 27:

In some cases inverting the transition functions is not possible, and the technique of decision inversion is employed. Here the roles of d1 and s̃1 are interchanged. The stage 1 transition function is:

This equation can be put in the form

and d1 is uniquely determined by specifying s1, since s̃1 is a constant in this case. Stage 1 is then decisionless and is combined with stage 2.

Final value problem

s̃1 = T1(s1,d1) = constant

d1 = T̄1(s1,s̃1)
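A minimal numeric sketch of decision inversion, assuming a hypothetical linear transition s̃1 = s1 - d1 (so the inverse is simply d1 = s1 - s̃1); the return functions are also hypothetical:

```python
S1_OUT = 2.0                      # the fixed final value s~1

def d1_from_s1(s1):
    # d1 = T-bar_1(s1, s~1): with s~1 = s1 - d1 fixed, d1 is forced by s1
    return s1 - S1_OUT

def combined_return(s2, d2, R1, R2, T2):
    """Return of the condensed stages 1 and 2, a function of s2 and d2 only."""
    s1 = T2(s2, d2)               # incident identity: s1 = s~2 = T2(s2, d2)
    return R2(s2, d2) + R1(s1, d1_from_s1(s1))

value = combined_return(5.0, 1.0,
                        R1=lambda s, d: s + d,
                        R2=lambda s, d: s * d,
                        T2=lambda s, d: s - d)
```

Because d1 is forced once s1 is known, the search at the condensed stage runs over d2 alone, as the functional equation on the next slide shows.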

Page 28:

The functional equation for the combined stages 1 and 2 is now: 

After decision inversion is performed, the usual serial problem procedure applies to the rest of the stages in the problem.

Final value problem

f2(s2) = max_d2 [ R2(s2,d2) + R1(s1,d1) ]

       = max_d2 [ R2(s2,d2) + R1( T2(s2,d2), T̄1(T2(s2,d2), s̃1) ) ]

       = max_d2 [ R2(s2,d2) + R̄1(s2,d2) ]

Page 29:

Decision inversion

Functional diagram for decision inversion for the final value problem.

[Diagram: four-stage chain with s̃4 = s3 into stage 3 (d3), s̃3 = s2 into stage 2 (d2), s̃2 = s1 into stage 1, and fixed output s̃1; stage 1 is decisionless after decision inversion.]

Page 30:

This type of problem arises when both the initial and final values of the state variables, sN and s̃1, are specified. The problem requires decision inversion, because state inversion would still give a two-point boundary value problem.

Decision inversion is performed by condensing stages 1 and 2, as in the final value problem. The partial optimisation then proceeds as in an initial value problem, with the dynamic programming algorithm at stage N being:

Two-point boundary value problem

fN(sN) = max_dN [ RN(sN,dN) + fN-1(sN-1) ]

Page 31:

The cyclic system is a special case of the two-point boundary value problem where sN= s̃1. The method to solve this problem is to select a value of s̃1=sN=C and proceed to determine the optimum return as a two-point boundary value problem. The dynamic programming algorithm at stage N is:

Then a single variable search is performed by varying C until the maximum return fN(C) is located. Fixing the value of a state variable is referred to as cutting the state.

Cyclic optimisation

fN(C) = max_dN [ RN(C,dN) + fN-1(sN-1) ]
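Cutting the state can be sketched as an outer search over C with an inner fixed-boundary optimisation. The two-stage system below is hypothetical: its transition simply sets the output state equal to the decision, so fixing s̃1 = C forces d1 = C.

```python
states = [0, 1, 2]                      # candidate values for the cut state C
def R2(s, d): return s * d              # hypothetical stage-2 return
def R1(s, d): return (s - d) ** 2       # hypothetical stage-1 return
def T(s, d):  return d                  # hypothetical transition: output = decision

best_value, best_C = None, None
for C in states:                        # cut the state: fix s2 = s~1 = C
    # inner problem: search d2; d1 is forced to C so that s~1 = T(s1, d1) = C
    value = max(R2(C, d2) + R1(T(C, d2), C) for d2 in states)
    if best_value is None or value > best_value:
        best_value, best_C = value, C
```

The outer loop is the single-variable search on C; each inner maximisation is an ordinary two-point boundary value subproblem.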

Page 32:

Functional diagram for N stage cyclic systems.

[Diagram: the serial chain of stages N through 1 (decisions dN, ..., d1, returns RN, ..., R1), with the output s̃1 of stage 1 fed back as the input sN of stage N.]

Cyclic optimisation

Page 33:

Mitten and Nemhauser (1963) outlined the following four steps for using dynamic programming:

1. Separate the process into stages.

2. Formulate the return and transition functions for each stage of the process.

3. For each stage select the inputs, decisions and outputs to have as few state variables per stage as possible.

4. Apply the dynamic programming algorithm to find the optimal return from the process and the optimal decisions at each stage.

Procedures

Page 34:

It is proposed to build thermal stations at three different sites. The total budget available is 3 units (1 unit = $10 million) and the feasible levels of investment on any thermal station are 0, 1, 2, or 3 units. The electric power obtainable (return function) for different investments is given below:

Find the investment policy for maximising the total electric power generated.

Example

Return function, Ri(x)

         Thermal station, i
           1      2      3
Ri(0)      0      0      0
Ri(1)      2      1      3
Ri(2)      4      5      5
Ri(3)      6      6      6
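This example can be solved by the serial algorithm above, taking the remaining budget as the state variable and the units invested at each station as the decision (a sketch of one such formulation):

```python
# State s = budget units still available; decision d = units given to station i.
R = {1: [0, 2, 4, 6], 2: [0, 1, 5, 6], 3: [0, 3, 5, 6]}   # R_i(x) from the table
BUDGET = 3

f = {s: 0 for s in range(BUDGET + 1)}      # f_0(s) = 0: no stations left
policy = {}
for i in (1, 2, 3):                        # treat each station as a stage
    g = {s: max((R[i][d] + f[s - d], d) for d in range(s + 1))
         for s in range(BUDGET + 1)}
    policy[i] = {s: g[s][1] for s in g}
    f = {s: g[s][0] for s in g}

best_power = f[BUDGET]

# Recover the optimal investments by walking the stored policies backwards.
alloc, s = {}, BUDGET
for i in (3, 2, 1):
    alloc[i] = policy[i][s]
    s -= alloc[i]
```

With the table's data this gives a maximum of 8 power units, investing 0, 2 and 1 units in stations 1, 2 and 3 respectively.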

Page 35:

Bellman, R.E., Dynamic Programming, Princeton University Press, Princeton, N.J. (1957).

Bellman, R.E., and S. Dreyfus, Applied Dynamic Programming, Princeton University Press, Princeton, N.J. (1962).

Aris, R., The Optimal Design of Chemical Reactors, Academic Press, New York (1961).

Mitten, L.G. and G.L. Nemhauser, "Multistage Optimization", Chemical Engineering Progress, 59 (1), 53 (Jan 1963).

Wilde, D.J., "Strategies for Optimization Macrosystems", Chemical Engineering Progress, 61 (3), 86 (March 1965).

References