Static and Dynamic Optimization (42111)
TRANSCRIPT
Build. 303b, room 048, Section for Dynamical Systems
Dept. of Applied Mathematics and Computer Science, The Technical University of Denmark
Email: [email protected], phone: +45 4525 3356, mobile: +45 9351 1161
2019-11-24 14:37
Lecture 12: Stochastic Dynamic Programming
Outline of lecture
Recap: L11 Deterministic Dynamic Programming (D)
Dynamic Programming (C)
Stochastics (random variables)
Stochastic Dynamic Programming
Booking profiles
Stochastic Bellman
Stochastic optimal stepping (SDD)
Reading guidance: DO p. 83-92.
Dynamic Programming (D)

Find a sequence of decisions $u_i$, $i = 0, 1, \ldots, N-1$, which takes the system

$$x_{i+1} = f_i(x_i, u_i) \qquad x_0 = \underline{x}_0$$

along a trajectory, such that the cost function

$$J = \phi(x_N) + \sum_{i=0}^{N-1} L_i(x_i, u_i)$$

is minimized.
Dynamic Programming

The Bellman function (the optimal cost to go) is defined as:

$$V_i(x_i) = \min_{u_i^{N-1}} J_i(x_i, u_i^{N-1})$$

and is a function of the present state, $x_i$, and index, $i$. In particular

$$V_N(x_N) = \phi_N(x_N)$$

Theorem

The Bellman function $V_i$ is given by the backward recursion

$$V_i(x_i) = \min_{u_i} \left[ L_i(x_i, u_i) + V_{i+1}(x_{i+1}) \right] \qquad x_{i+1} = f_i(x_i, u_i) \quad x_0 = \underline{x}_0$$

with the boundary condition $V_N(x_N) = \phi_N(x_N)$.

The Bellman equation is a functional equation; it gives a sufficient condition, and $V_0(x_0) = J^*$. □
Dynamic programming

$$u_i = \arg\min_{u_i} \Big[ \underbrace{L_i(x_i, u_i) + V_{i+1}(\underbrace{f_i(x_i, u_i)}_{x_{i+1}})}_{W_i(x_i, u_i)} \Big]$$

If a maximization problem: min → max.
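The backward recursion above can be sketched as a small tabular program. The Python sketch below is only an illustration of the scheme: the grids and the functions `f`, `L`, and `phi` are placeholders supplied by the user, not part of the lecture material.

```python
# Generic tabular backward recursion
#   V_i(x) = min_u [ L_i(x,u) + V_{i+1}(f_i(x,u)) ],  V_N(x) = phi(x)
# over finite state and decision grids.
def dynamic_programming(N, states, decisions, f, L, phi):
    V = {N: {x: phi(x) for x in states}}        # boundary condition
    policy = {}
    for i in range(N - 1, -1, -1):              # backwards: i = N-1, ..., 0
        V[i], policy[i] = {}, {}
        for x in states:
            # W_i(x,u) = L_i(x,u) + V_{i+1}(f_i(x,u)); skip infeasible moves
            candidates = {u: L(i, x, u) + V[i + 1][f(i, x, u)]
                          for u in decisions
                          if f(i, x, u) in V[i + 1]}
            u_star = min(candidates, key=candidates.get)
            V[i][x], policy[i][x] = candidates[u_star], u_star
    return V, policy

# Tiny demo: minimize x_4^2 + sum(x_i^2 + u_i^2) with x_{i+1} = x_i + u_i.
V, policy = dynamic_programming(
    N=4, states=range(-2, 3), decisions=(-1, 0, 1),
    f=lambda i, x, u: x + u,
    L=lambda i, x, u: x * x + u * u,
    phi=lambda x: x * x,
)
print(V[0][2], policy[0][2])   # -> 7 -1
```

Note that the result is a feedback policy `policy[i][x]`, i.e. a decision for every state and stage, not a single time sequence.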
Type of solutions
[Figure: the Bellman function $V_t(x)$ plotted over the state $x$ and the time index $i$.]

- Fish bone method (graphical method)
- Schematic method (tables) -> programming
- Analytical (e.g. separation of variables)

Analytical:
Guess the functional form of $V_i(x)$, i.e. up to a number of parameters. Check whether it satisfies the Bellman equation. This results in a (number of) recursion(s) for the parameter(s).
Continuous Dynamic Programming

Find the input function $u_t$, $t \in \mathbb{R}$ (more precisely $\{u\}_0^T$), that takes the system

$$\dot{x}_t = f_t(x_t, u_t) \qquad x_0 = \underline{x}_0 \quad t \in [0, T] \tag{1}$$

such that the cost function

$$J = \phi_T(x_T) + \int_0^T L_t(x_t, u_t)\, dt \tag{2}$$

is minimized. Define the truncated performance index (cost to go)

$$J_t(x_t, \{u\}_t^T) = \phi_T(x_T) + \int_t^T L_s(x_s, u_s)\, ds$$

The Bellman function (optimal cost to go) is defined by

$$V_t(x_t) = \min_{\{u\}_t^T} \left[ J_t(x_t, \{u\}_t^T) \right]$$

We have the following theorem, which states a sufficient condition.

Theorem

The Bellman function $V_t(x_t)$ satisfies the Hamilton-Jacobi-Bellman equation

$$-\frac{\partial V_t(x_t)}{\partial t} = \min_{u_t} \left[ L_t(x_t, u_t) + \frac{\partial V_t(x_t)}{\partial x} f_t(x_t, u_t) \right] \tag{3}$$

This is a PDE with boundary condition

$$V_T(x_T) = \phi_T(x_T)$$

□
Continuous Dynamic Programming

Proof.
In discrete time we have the Bellman equation

$$V_i(x_i) = \min_{u_i} \left[ L_i(x_i, u_i) + V_{i+1}(x_{i+1}) \right]$$

with the boundary condition $V_N(x_N) = \phi_N(x_N)$. Identify $t \leftrightarrow i$ and $t + \Delta t \leftrightarrow i + 1$. Then

$$V_t(x_t) = \min_{u_t} \left[ \int_t^{t+\Delta t} L_t(x_t, u_t)\, dt + V_{t+\Delta t}(x_{t+\Delta t}) \right]$$

Apply a Taylor expansion on $V_{t+\Delta t}(x_{t+\Delta t})$:

$$V_t(x_t) = \min_{u_t} \left[ L_t(x_t, u_t)\, \Delta t + V_t(x_t) + \frac{\partial V_t(x_t)}{\partial x} f_t\, \Delta t + \frac{\partial V_t(x_t)}{\partial t}\, \Delta t + o(|\Delta t|) \right]$$
Continuous Dynamic Programming

Proof (continued).

$$V_t(x_t) = \min_{u_t} \left[ L_t(x_t, u_t)\, \Delta t + V_t(x_t) + \frac{\partial V_t(x_t)}{\partial x} f_t\, \Delta t + \frac{\partial V_t(x_t)}{\partial t}\, \Delta t + o(|\Delta t|) \right]$$

(just a copy)

Collect the terms which do not depend on the decision ($u_t$):

$$V_t(x_t) = V_t(x_t) + \frac{\partial V_t(x_t)}{\partial t}\, \Delta t + \min_{u_t} \left[ L_t(x_t, u_t)\, \Delta t + \frac{\partial V_t(x_t)}{\partial x} f_t(x_t, u_t)\, \Delta t \right] + o(|\Delta t|)$$

In the limit $\Delta t \to 0$ (and after dividing by $\Delta t$):

$$-\frac{\partial V_t(x_t)}{\partial t} = \min_{u_t} \left[ L_t(x_t, u_t) + \frac{\partial V_t(x_t)}{\partial x} f_t(x_t, u_t) \right]$$

□
The HJB equation:

$$-\frac{\partial V_t(x_t)}{\partial t} = \min_{u_t} \left[ L_t(x_t, u_t) + \frac{\partial V_t(x_t)}{\partial x} f_t(x_t, u_t) \right]$$

(just a copy)

The Hamiltonian function

$$H_t(x_t, u_t, \lambda_t^T) = L_t(x_t, u_t) + \lambda_t^T f_t(x_t, u_t)$$

The HJB equation can also be formulated as

$$-\frac{\partial V_t(x_t)}{\partial t} = \min_{u_t} H_t\!\left(x_t, u_t, \frac{\partial V_t(x_t)}{\partial x}\right)$$

Link to Pontryagin's maximum principle:

$$\lambda_t^T = \frac{\partial V_t(x_t)}{\partial x}$$

$$\dot{x}_t = f_t(x_t, u_t) \qquad \text{State equation}$$

$$-\dot{\lambda}_t^T = \frac{\partial}{\partial x_t} H_t \qquad \text{Costate equation}$$

$$u_t = \arg\min_{u_t} \left[ H_t \right] \qquad \text{Optimality condition}$$
Motion control

Consider the system

$$\dot{x}_t = u_t \qquad x_0 = \underline{x}_0$$

and the performance index

$$J = \frac{1}{2} p x_T^2 + \int_0^T \frac{1}{2} u_t^2\, dt$$

The HJB equation, (3), gives:

$$-\frac{\partial V_t(x_t)}{\partial t} = \min_{u_t} \left[ \frac{1}{2} u_t^2 + \frac{\partial V_t(x_t)}{\partial x} u_t \right] \qquad V_T(x_T) = \frac{1}{2} p x_T^2$$

The minimization can be carried out and gives a solution w.r.t. $u_t$ which is

$$u_t = -\frac{\partial V_t(x_t)}{\partial x}$$

So if the Bellman function is known, the control action (the decision) can be determined from this. If the result above is inserted into the HJB equation we get

$$-\frac{\partial V_t(x_t)}{\partial t} = \frac{1}{2} \left[ \frac{\partial V_t(x_t)}{\partial x} \right]^2 - \left[ \frac{\partial V_t(x_t)}{\partial x} \right]^2 = -\frac{1}{2} \left[ \frac{\partial V_t(x_t)}{\partial x} \right]^2$$

which is a partial differential equation with the boundary condition

$$V_T(x_T) = \frac{1}{2} p x_T^2$$
PDE:

$$-\frac{\partial V_t(x_t)}{\partial t} = -\frac{1}{2} \left[ \frac{\partial V_t(x_t)}{\partial x} \right]^2$$

(just a copy)

Inspired by the boundary condition we guess a candidate function of the type

$$V_t(x) = \frac{1}{2} s_t x^2$$

where the time dependence is in the function $s_t$. Since

$$\frac{\partial V}{\partial x} = s_t x \qquad \frac{\partial V}{\partial t} = \frac{1}{2} \dot{s}_t x^2$$

the following equation

$$-\frac{1}{2} \dot{s}_t x^2 = -\frac{1}{2} (s_t x)^2$$

must be valid for any $x$, i.e. we can find $s_t$ by solving the ODE

$$\dot{s}_t = s_t^2 \qquad s_T = p$$

backwards. This is actually (a simple version of) the continuous-time Riccati equation. The solution can be found analytically or by means of numerical methods. Knowing the function $s_t$, we can find the control input

$$u_t = -\frac{\partial V_t(x_t)}{\partial x} = -s_t x_t$$
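The backward ODE for $s_t$ is easy to integrate numerically. A minimal Python sketch with an explicit Euler scheme; the values $p = 1$, $T = 1$ and the step count are illustrative, and the result is compared with the analytic solution $s_t = p/(1 + p(T - t))$:

```python
# Integrate the scalar Riccati equation  ds/dt = s^2,  s_T = p,
# backwards from t = T to t = 0 with an explicit Euler scheme.
def solve_riccati(p=1.0, T=1.0, n=10_000):
    dt = T / n
    s = p                          # boundary condition s_T = p
    for _ in range(n):
        s -= dt * s * s            # one Euler step backwards in time
    return s                       # approximation of s_0

p, T = 1.0, 1.0
s0 = solve_riccati(p, T)
exact = p / (1.0 + p * T)          # analytic value s_0 = p/(1 + pT) = 0.5
print(abs(s0 - exact) < 1e-3)      # -> True
```

With $s_t$ tabulated, the feedback law $u_t = -s_t x_t$ follows directly at each time step.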
Stochastic Dynamic Programming
The Bank loan

Deterministic:

$$x_{i+1} = (1 + r) x_i - u_i \qquad x_0 = \underline{x}_0$$

Stochastic:

$$x_{i+1} = (1 + r_i) x_i - u_i \qquad x_0 = \underline{x}_0$$

[Figure: rate of interest (%) fluctuating over time (months).]
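The stochastic recursion can be simulated directly. In the Python sketch below, the monthly interest-rate distribution, the payment `u`, and the initial balance are illustrative assumptions, not values from the slides:

```python
import random

# Simulate the stochastic loan balance x_{i+1} = (1 + r_i) x_i - u_i
# with a random monthly interest rate r_i.
def balance_path(x0=50_000.0, u=700.0, months=120, seed=0):
    rng = random.Random(seed)
    x, path = x0, [x0]
    for _ in range(months):
        r = rng.uniform(0.002, 0.006)   # assumed monthly rate: 0.2 % - 0.6 %
        x = (1.0 + r) * x - u           # interest accrues, then payment u is made
        path.append(x)
    return path

path = balance_path()
print(len(path), path[-1] < path[0])    # payment exceeds interest, so the balance falls
```

Re-running with different seeds gives different trajectories, which is exactly the point of the two-panel figure above.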
[Figure: two simulated realizations of the bank balance over time (years).]
Discrete Random Variable

$$X \in \{x^1, x^2, \ldots, x^m\} \subset \mathbb{R}^n$$

$$p_k = P\{X = x^k\} \ge 0 \qquad \sum_{k=1}^{m} p_k = 1$$

[Figure: bar plot of a probability mass function.]

$$E\{X\} = \sum_{k=1}^{m} p_k x^k \qquad E\{g(X)\} = \sum_{k=1}^{m} p_k g(x^k)$$
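The two expectation formulas translate directly into weighted sums. A small Python sketch with illustrative outcomes and probabilities:

```python
# E{X} = sum_k p_k x_k  and  E{g(X)} = sum_k p_k g(x_k)
# for a discrete random variable with outcomes xs and probabilities ps.
xs = [1, 2, 3]                                  # outcomes (illustrative)
ps = [0.2, 0.5, 0.3]                            # probabilities, summing to 1

assert abs(sum(ps) - 1.0) < 1e-12               # sanity check on the pmf

EX = sum(p * x for p, x in zip(ps, xs))         # E{X}
Eg = sum(p * x**2 for p, x in zip(ps, xs))      # E{g(X)} with g(x) = x^2
print(round(EX, 10), round(Eg, 10))             # -> 2.1 4.9
```

This weighted sum over noise outcomes is exactly the operation applied inside the stochastic Bellman recursion below.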
Stochastic Dynamic Programming

Consider the problem of minimizing (in some sense):

$$J = \phi_N(x_N, e_N) + \sum_{i=0}^{N-1} L_i(x_i, u_i, e_i)$$

subject to

$$x_{i+1} = f_i(x_i, u_i, e_i) \qquad x_0 = \underline{x}_0$$

and the constraints

$$(x_i, u_i, e_i) \in \mathcal{V}_i \qquad (x_N, e_N) \in \mathcal{V}_N$$

$e_i$ might be vectors reflecting model errors or direct stochastic effects.
Ranking performance indexes

When $e_i$ and others are stochastic variables, what do we mean by saying that one strategy is better than another?

In a deterministic situation we mean that

$$J_1 > J_2$$

($J_1$ ($J_2$) being the objective function for strategy 1 (2)).

In a stochastic situation we can choose the definition

$$E\{J_1\} > E\{J_2\}$$

but others do exist. This choice reflects some kind of average consideration.
Example: Booking profiles

Normally a plane is overbooked, i.e. more tickets are sold than the number of seats, $\bar{x}_N$. Let $x_i$ be the number of sold tickets at the beginning of day $i$.

[Timeline: days $0, 1, 2, \ldots, N$.]

If $x_N < \bar{x}_N$ we have empty seats: money out the window. If $x_N > \bar{x}_N$ we have to pay compensations: also money out the window.

So we want to find a strategy that minimizes:

$$E\{\phi(x_N - \bar{x}_N)\}$$

Let $w_i$ be the number of requests for a ticket on day $i$ (with probability $P\{w_i = k\} = p_k$), and let $v_i$ be the number of cancellations on day $i$ (with probability $P\{v_i = k\} = q_k$).

Dynamics:

$$x_{i+1} = x_i + \min(u_i, w_i) - v_i \qquad e_i = \begin{bmatrix} w_i \\ v_i \end{bmatrix}$$

Decision information: $u_i(x_i)$.
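A single realization of the booking dynamics can be simulated as below. The request/cancellation distributions and the admission strategy are purely illustrative assumptions, since the slide leaves $p_k$, $q_k$ and the strategy unspecified:

```python
import random

# One realization of  x_{i+1} = x_i + min(u_i, w_i) - v_i,  where
# w_i = ticket requests and v_i = cancellations on day i.
def simulate_bookings(days=30, seats=150, overbook=10, seed=42):
    rng = random.Random(seed)
    x = 0                                   # sold tickets at day 0
    for i in range(days):
        u = max(0, seats + overbook - x)    # assumed strategy: admit up to seats+overbook
        w = rng.randint(0, 12)              # requests on day i (assumed distribution)
        v = rng.randint(0, 2)               # cancellations on day i (assumed distribution)
        x = max(0, x + min(u, w) - v)       # dynamics, clipped at zero sold tickets
    return x

x_N = simulate_bookings()
print(0 <= x_N <= 160)                      # never exceeds seats + overbook  -> True
```

An actual optimal strategy $u_i(x_i)$ would come out of the stochastic Bellman recursion on the next slides, not from the fixed rule assumed here.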
Stochastic Bellman Equation

Consider the problem of minimizing:

$$J = E\left\{ \phi(x_N, e_N) + \sum_{i=0}^{N-1} L_i(x_i, u_i, e_i) \right\}$$

subject to

$$x_{i+1} = f_i(x_i, u_i, e_i) \qquad x_0 = \underline{x}_0$$

and the constraints

$$(x_i, u_i, e_i) \in \mathcal{V}_i \qquad (x_N, e_N) \in \mathcal{V}_N$$

Theorem

The Bellman function (optimal cost to go), $V_i(x_i)$, is given by the (backward) recursion:

$$V_i(x_i) = \min_{u_i} E\left\{ L_i(x_i, u_i, e_i) + V_{i+1}(x_{i+1}) \right\} \qquad x_{i+1} = f_i(x_i, u_i, e_i)$$

$$V_N(x_N) = E\left\{ \phi_N(x_N, e_N) \right\}$$

where the optimization is subject to the constraints and the available information. □
Discrete (SDD) case

If $e_i$ is discrete, i.e.

$$e_i \in \{e_i^1, e_i^2, \ldots, e_i^m\} \qquad p_i^k = P\{e_i = e_i^k\} \quad k = 1, 2, \ldots, m$$

then the stochastic Bellman equation can be expressed as

$$V_i(x_i) = \min_{u_i} \underbrace{\sum_{k=1}^{m} p_i^k \Big[ L_i(x_i, u_i, e_i^k) + V_{i+1}(\underbrace{f_i(x_i, u_i, e_i^k)}_{x_{i+1}}) \Big]}_{W_i(x_i, u_i)}$$

with boundary condition

$$V_N(x_N) = \sum_{k=1}^{m} p_N^k \phi_N(x_N, e_N^k)$$

The entries in the scheme below are now expected values (i.e. weighted sums).

W_i           u_i
x_i      0     1     2     3     V_i(x_i)   u_i^*(x_i)
 0
 1
 2
 3
 4
Optimal stochastic stepping (SDD)

Consider the system

$$x_{i+1} = x_i + u_i + e_i \qquad x_0 = 2$$

where

$$e_i \in \{-1, 0, 1\} \qquad u_i \in \{-1, 0, 1\} \qquad x_i \in \{-2, -1, 0, 1, 2\}$$

and the (state-dependent) noise probabilities $p_i^k$ are

x_i \ e_i    -1     0     1
-2            0    1/2   1/2
-1            0    1/2   1/2
 0           1/2    0    1/2
 1           1/2   1/2    0
 2           1/2   1/2    0

$$J = E\left\{ x_4^2 + \sum_{i=0}^{3} x_i^2 + u_i^2 \right\}$$

Notice: no stochastic components in the cost.
Optimal stochastic stepping (SDD)

Firstly, from

$$J = E\left\{ x_4^2 + \sum_{i=0}^{3} x_i^2 + u_i^2 \right\}$$

(no stochastics in the cost) we establish $V_4(x_4) = x_4^2$. We are assuming perfect state information.

x_4    V_4
-2      4
-1      1
 0      0
 1      1
 2      4
Optimal stochastic stepping (SDD)

Then we establish the $W_3(x_3, u_3)$ function (the cost to go):

$$W_3(x_3, u_3) = \sum_{k=1}^{m} p_3^k \Big[ L_3(x_3, u_3, e_3^k) + V_4(f_3(x_3, u_3, e_3^k)) \Big]$$

$$
\begin{aligned}
W_3(x_3, u_3) &= p_3^1 \big[ x_3^2 + u_3^2 + V_4(x_3 + u_3 + e_3^1) \big] \\
&\quad + p_3^2 \big[ x_3^2 + u_3^2 + V_4(x_3 + u_3 + e_3^2) \big] \\
&\quad + p_3^3 \big[ x_3^2 + u_3^2 + V_4(x_3 + u_3 + e_3^3) \big]
\end{aligned}
$$

where $x_3^2 + u_3^2 = L_3(x_3, u_3, e_3^k)$ and $x_3 + u_3 + e_3^k = f_3(x_3, u_3, e_3^k)$, or more compactly:

$$W_3(x_3, u_3) = x_3^2 + u_3^2 + p_3^1 V_4(x_3 + u_3 + e_3^1) + p_3^2 V_4(x_3 + u_3 + e_3^2) + p_3^3 V_4(x_3 + u_3 + e_3^3)$$
Optimal stochastic stepping (SDD)

$$W_3(x_3, u_3) = \sum_{k=1}^{3} p^k \big[ x_3^2 + u_3^2 + V_4(x_3 + u_3 + e_3^k) \big]$$

For example, with $x_3 = 0$, $u_3 = -1$ (the pairs $(e^k, p^k)$ are indicated):

$$
\begin{aligned}
W_3(0, -1) &= \tfrac{1}{2}\big[0^2 + (-1)^2 + V_4(0 - 1 - 1)\big] && (-1, \tfrac{1}{2}) \\
&\quad + 0 \cdot \big[0^2 + (-1)^2 + V_4(0 - 1 + 0)\big] && (0, 0) \\
&\quad + \tfrac{1}{2}\big[0^2 + (-1)^2 + V_4(0 - 1 + 1)\big] && (1, \tfrac{1}{2}) \\
&= \tfrac{1}{2}(1 + 4) + 0 + \tfrac{1}{2}(1 + 0) = 3
\end{aligned}
$$

x_3 \ u_3   -1     0     1
-2           ∞    6.5   5.5
-1          4.5   1.5   2.5
 0           3     1     3
 1          2.5   1.5   4.5
 2          5.5   3.5    ∞

(just for reference:)

x_4    V_4
-2      4
-1      1
 0      0
 1      1
 2      4
Optimal stochastic stepping (SDD)

x_3 \ u_3   -1     0     1     V_3(x_3)   u_3^*(x_3)
-2           ∞    6.5   5.5      5.5          1
-1          4.5   1.5   2.5      1.5          0
 0           3     1     3        1           0
 1          2.5   1.5   4.5      1.5          0
 2          5.5   3.5    ∞       3.5          0

x_2 \ u_2   -1      0      1     V_2(x_2)   u_2^*(x_2)
-2           ∞     7.5   6.25     6.25          1
-1          5.5   2.25   3.25     2.25          0
 0         4.25   1.5    3.25     1.5           0
 1         3.25   2.25   4.5      2.25          0
 2         6.25   6.5     ∞       6.25         -1
Optimal stochastic stepping (SDD)

x_1 \ u_1   -1      0      1     V_1(x_1)   u_1^*(x_1)
-2           ∞    8.25   6.88     6.88          1
-1         6.25   2.88   3.88     2.88          0
 0         4.88   2.25   4.88     2.25          0
 1         3.88   2.88   6.25     2.88          0
 2         6.88   8.25    ∞       6.88         -1

x_0 \ u_0   -1      0      1     V_0(x_0)   u_0^*(x_0)
 2         7.56   8.88    ∞       7.56         -1

Tracing back gives $u_i(x_i)$: a feedback solution, not a function of time.
Deterministic setting ($x_{i+1} = x_i + u_i$, $i = 0, \ldots, 3$):

i       0    1    2    3
u_i^*  -1    0    0    0

Stochastic setting ($x_{i+1} = x_i + u_i + e_i$, $i = 0, \ldots, 3$):

x     u_0^*   u_1^*   u_2^*   u_3^*
-2      -       1       1       1
-1      -       0       0       0
 0      -       0       0       0
 1      -       0       0       0
 2     -1      -1      -1       0

(only $x_0 = 2$ occurs at $i = 0$)
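The tables on the preceding slides can be reproduced by a direct implementation of the stochastic Bellman recursion. A dictionary-based Python sketch with the state-dependent noise table from slide 22 (zero-probability outcomes are simply omitted):

```python
# Backward recursion for the stochastic stepping example (slides 22-27).
INF = float("inf")
X = [-2, -1, 0, 1, 2]          # admissible states
U = [-1, 0, 1]                 # admissible decisions
N = 4                          # horizon

# State-dependent noise distribution p(e | x) from the table on slide 22.
P = {
    -2: {0: 0.5, 1: 0.5},
    -1: {0: 0.5, 1: 0.5},
     0: {-1: 0.5, 1: 0.5},
     1: {-1: 0.5, 0: 0.5},
     2: {-1: 0.5, 0: 0.5},
}

def backward_recursion():
    V = {N: {x: x**2 for x in X}}          # V_4(x) = x^2
    policy = {}
    for i in range(N - 1, -1, -1):
        V[i], policy[i] = {}, {}
        for x in X:
            best_u, best_w = None, INF
            for u in U:
                # Infeasible if some reachable x+u+e leaves the state space.
                if any(x + u + e not in X for e in P[x]):
                    continue                # corresponds to W = infinity
                # W_i(x,u) = x^2 + u^2 + sum_e p(e|x) V_{i+1}(x+u+e)
                w = x**2 + u**2 + sum(p * V[i + 1][x + u + e]
                                      for e, p in P[x].items())
                if w < best_w:
                    best_u, best_w = u, w
            V[i][x], policy[i][x] = best_w, best_u
    return V, policy

V, policy = backward_recursion()
print(V[0][2], policy[0][2])   # -> 7.5625 -1
```

The printed values match the $W_0$ table: $V_0(2) = 7.56$ with $u_0^*(2) = -1$, and `policy` holds the full feedback law $u_i^*(x_i)$.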
Concluding remarks
Discrete state and decision space.
Approximation. Grid covering state and decision space.
Curse of dimensionality - combinatorial explosion.