A near model-free method for solving the Hamilton-Jacobi-Bellman equation in high dimensions
Mathias Oster, Leon Sallandt, Reinhold Schneider
Technische Universität Berlin
ICODE Workshop on numerical solutions of HJB equations
10.01.2020
Motivation and Ingredients
Aim: Calculate optimal feedback laws (via HJB) for controlled PDEs.
Ingredients:
1. Reformulate the HJB equation as an operator equation.
2. Use Monte Carlo integration for least-squares approximation.
3. Use a nonlinear, smooth ansatz space: HT/TT, i.e. tree-based tensors.
Mathias Oster (TU Berlin) Solve HJB in high-dimensions ICODE 2 / 21
Classical optimal control problem
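Optimal control problem: find $u \in L^2(0,\infty)$ such that
$$\min_u J(x,u) = \min_u \int_0^\infty \frac{1}{2}\|x(s)\|_{\mathbb{R}^n}^2 + \frac{\lambda}{2}|u(s)|^2 \, ds,$$
subject to
$$\dot{x} = f(x,u), \quad x \in \Omega \subset \mathbb{R}^n, \quad x(0) = x_0.$$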
1. Note that the differential equation can be high-dimensional.
2. Linear ODE and quadratic cost → Riccati equation.
3. Nonlinear ODE and nonlinear cost → Hamilton-Jacobi-Bellman (HJB) equation.
Feedback control problem
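Define a feedback law $\alpha(x(t)) = u(t)$. Rephrase
$$\min_\alpha J_\alpha(x) = \min_\alpha \int_0^\infty \underbrace{\frac{1}{2}\|x(s,\alpha)\|_{\mathbb{R}^n}^2 + \frac{\lambda}{2}|(\alpha(x))(s)|^2}_{=:\, r_\alpha(x)} \, ds.$$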
Our goal: find an optimal feedback law α∗(x) = u.
Define the value function
$$v(x) := \inf_\alpha J_\alpha(x) \in \mathbb{R}.$$
Idea: if $v$ is differentiable, the feedback law is given by
$$\alpha(x) = -\frac{1}{\lambda} D_x v(x)\, D_u f(x,u) \quad \text{(easy to calculate!)}.$$
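A minimal Python sketch of the feedback formula above, under the assumption that the dynamics are control-affine with a scalar control, f(x, u) = g(x) + B u, so that D_u f = B. The quadratic value function, the matrix `P`, the vector `B` and the helper name `feedback` are hypothetical and chosen only for illustration.

```python
import numpy as np

def feedback(grad_v, B, lam):
    """Build alpha(x) = -(1/lam) * grad_v(x) . B, assuming f(x, u) = g(x) + B*u."""
    def alpha(x):
        return -float(np.dot(grad_v(x), B)) / lam
    return alpha

# Hypothetical quadratic value function v(x) = x^T P x, so grad v(x) = 2 P x.
P = np.array([[2.0, 0.5], [0.5, 1.0]])
B = np.array([0.0, 1.0])   # assumption: the control acts on the second state only
lam = 0.1

alpha = feedback(lambda x: 2.0 * P @ x, B, lam)
u = alpha(np.array([1.0, -1.0]))   # control value at the state (1, -1)
```

Only the gradient of v and the derivative of f with respect to the control enter; the drift g(x) is never needed for the update.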
The HJB equation
The value function obeys
$$\inf_\alpha \left\{ f(x,\alpha(x)) \cdot \nabla v(x) + r_\alpha(x) \right\} = 0.$$
The HJB equation is highly nonlinear and potentially high-dimensional!
But: for a fixed policy $\alpha(x)$ it reduces to a linear equation. Defining $L_\alpha := -f(x,\alpha) \cdot \nabla$ we get
$$L_\alpha v_\alpha(x) - r_\alpha(x) = 0.$$
Method of characteristics
Linearized HJB: $L_\alpha v_\alpha(x) - r_\alpha(x) = 0$.
Using the method of characteristics we obtain
$$\dot{x}(t) = f(x,\alpha), \qquad v_\alpha(x(0)) = \int_0^\tau r_\alpha(x(t)) \, dt + v_\alpha(x(\tau)),$$
which we call the Bellman-like equation.
Reformulation as Operator Equation
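Consider the Koopman operator:
$$K_\tau^\alpha : L^\infty_{\mathrm{loc}}(\Omega) \to L^\infty_{\mathrm{loc}}(\Omega), \quad K_\tau^\alpha[g](x) = g(x(\tau)).$$
Rewrite the Bellman-like equation: for all $x \in \Omega$,
$$v_\alpha(x(0)) = \int_0^\tau r_\alpha(x(t)) \, dt + v_\alpha(x(\tau)),$$
as
$$(\mathrm{Id} - K_\tau^\alpha)[v](x) = \underbrace{\int_0^\tau K_t^\alpha[r_\alpha](x) \, dt}_{=:\, R_\tau^\alpha(x)}.$$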
Policy iteration
Policy iteration uses a sequence of linearized HJB equations.
Algorithm (Policy iteration)
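Initialize with a stabilizing feedback $\alpha_0$. Iterate until convergence:
1. Find $v_{i+1}$ such that $(\mathrm{Id} - K_\tau^{\alpha_i}) v_{i+1}(\cdot) - R_\tau^{\alpha_i}(\cdot) = 0$.
2. Update the policy according to $\alpha_{i+1}(x) = -\frac{1}{\lambda} D_x v_{i+1}(x)\, D_u f(x,u)$.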
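The two steps can be traced by hand on a scalar linear-quadratic toy problem, which is not from the talk and is used here only because every quantity has a closed form. Assume the dynamics x' = a x + b u, the running cost x^2/2 + (lam/2) u^2, a linear policy alpha(x) = -k x and the ansatz v(x) = p x^2. Then K_tau[x^2] = exp(2 c tau) x^2 with c = a - b k, and both steps reduce to scalar formulas. All names and parameter values below are hypothetical.

```python
import math

def evaluate_policy(k, a, b, lam, tau=0.5):
    """Step 1: solve (Id - K_tau) v = R_tau for v(x) = p*x^2 under alpha(x) = -k*x."""
    c = a - b * k                       # closed-loop rate, negative for a stabilizing k
    decay = math.exp(2.0 * c * tau)     # K_tau[x^2] = decay * x^2
    # R_tau(x) / x^2: integral of the running cost along the closed-loop trajectory.
    reward = 0.5 * (1.0 + lam * k * k) * (decay - 1.0) / (2.0 * c)
    return reward / (1.0 - decay)

def policy_iteration(a, b, lam, k0, iters=20):
    """Alternate policy evaluation (step 1) and policy update (step 2)."""
    k = k0
    for _ in range(iters):
        p = evaluate_policy(k, a, b, lam)
        k = 2.0 * p * b / lam           # step 2: alpha(x) = -(1/lam) * v'(x) * b
    return p, k

a, b, lam = 1.0, 1.0, 1.0
p, k = policy_iteration(a, b, lam, k0=3.0)   # k0 = 3 stabilizes since a - b*k0 < 0

# Reference value from the scalar Riccati equation for this toy problem.
p_riccati = lam * (a + math.sqrt(a * a + b * b / lam)) / (2.0 * b * b)
```

In this toy setting the iterates approach the Riccati solution, which matches the earlier remark that the linear-quadratic case reduces to a Riccati equation.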
Least squares ansatz
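Problem: We need to solve
$$(\mathrm{Id} - K_\tau^{\alpha_i}) v_{\alpha_{i+1}}(\cdot) - R_\tau^{\alpha_i}(\cdot) = 0.$$
Idea: Solve on a suitable set $S$:
$$v_{\alpha_{i+1}} = \arg\min_{v \in S} \underbrace{\|(\mathrm{Id} - K_\tau^{\alpha_i}) v(\cdot) - R_\tau^{\alpha_i}(\cdot)\|^2_{L^2(\Omega)}}_{= \int_\Omega |(\mathrm{Id} - K_\tau^{\alpha_i}) v(x) - R_\tau^{\alpha_i}(x)|^2 \, dx}.$$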
Projected Policy iteration
Algorithm (Projected Policy iteration)
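Initialize with a stabilizing feedback $\alpha_0$. Iterate until convergence:
1. Find $v_{i+1} = \arg\min_{v \in S} \|(\mathrm{Id} - K_\tau^{\alpha_i}) v(\cdot) - R_\tau^{\alpha_i}(\cdot)\|^2_{L^2(\Omega)}$.
2. Update the policy according to $\alpha_{i+1}(x) = -\frac{1}{\lambda} D_x v_{i+1}(x)\, D_u f(x,u)$.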
Variational Monte-Carlo
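Approximate by Monte Carlo quadrature:
$$\|(\mathrm{Id} - K_\tau^{\alpha_i}) v(\cdot) - R_\tau^{\alpha_i}(\cdot)\|^2_{L^2(\Omega)} \approx \frac{1}{n} \sum_{j=1}^n |(\mathrm{Id} - K_\tau^{\alpha_i}) v(x_j) - R_\tau^{\alpha_i}(x_j)|^2,$$
$$v^*_{(n,s)} = \arg\min_{v \in S} \frac{1}{n} \sum_{j=1}^n |(\mathrm{Id} - K_\tau^{\alpha_i}) v(x_j) - R_\tau^{\alpha_i}(x_j)|^2.$$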
Proposition ([Eigel, Schneider et al., 19]). Let $\varepsilon > 0$ be such that $\inf_{v_s \in S} \|v^* - v_s\|^2_{L^2(\Omega)} \le \varepsilon$. Then
$$\mathbb{P}\left[\|v^* - v^*_{(n,s)}\|^2_{L^2(\Omega)} > \varepsilon\right] \le c_1(\varepsilon)\, e^{-c_2(\varepsilon) n}$$
with $c_1, c_2 > 0$.
Exponential decay with the number of samples chosen.
Solving the VMC equation
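$$\arg\min_{v \in S} \sum_{j=1}^n |(\mathrm{Id} - K_\tau^{\alpha_i}) v(x_j) - R_\tau^{\alpha_i}(x_j)|^2.$$
1. $v(x_j)$ → evaluate $v$ at the samples $x_j$.
2. $K_\tau^{\alpha_i} v(x_j)$ → evaluate $v$ at the transported samples (with policy $\alpha_i$).
3. $R_\tau^{\alpha_i}(x_j)$ → approximate the reward by the trapezoidal rule.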
What do we need for solving the equation?
A model-free solution is possible. Only a black-box solver of the ODE is needed.
What do we need for updating the policy?
We need $D_u f(x,u)$, i.e. the derivative of the right-hand side with respect to the control.
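The three evaluation steps can be sketched in Python for one policy evaluation. This is an illustration under stated assumptions, not the authors' implementation: the ansatz space is a small linear space span{x^2, x^4} instead of a tensor train, the system is a scalar toy problem x' = a x + b u with policy alpha(x) = -k x, and a hand-written RK4 routine (`rk4_flow`, a hypothetical name) stands in for the black-box ODE solver.

```python
import numpy as np

def rk4_flow(f, x, tau, steps):
    """Black-box ODE solver: return all intermediate states for every sample."""
    h = tau / steps
    traj = [x]
    for _ in range(steps):
        k1 = f(x)
        k2 = f(x + 0.5 * h * k1)
        k3 = f(x + 0.5 * h * k2)
        k4 = f(x + h * k3)
        x = x + h / 6.0 * (k1 + 2.0 * k2 + 2.0 * k3 + k4)
        traj.append(x)
    return np.array(traj), h

def basis(x):
    """Hypothetical linear ansatz space S = span{x^2, x^4}."""
    return np.stack([x**2, x**4], axis=-1)

# Toy problem (assumption): x' = a*x + b*u, running cost x^2/2 + lam/2 * u^2.
a, b, lam, k = 1.0, 1.0, 1.0, 3.0
alpha = lambda x: -k * x
closed_loop = lambda x: a * x + b * alpha(x)
reward = lambda x: 0.5 * x**2 + 0.5 * lam * alpha(x)**2

rng = np.random.default_rng(0)
samples = rng.uniform(-1.0, 1.0, size=200)
traj, h = rk4_flow(closed_loop, samples, tau=0.2, steps=40)

# Steps 1 and 2: basis at the samples minus basis at the transported samples.
A = basis(traj[0]) - basis(traj[-1])
# Step 3: trapezoidal rule for R_tau(x_j) along each trajectory.
r = reward(traj)
R = h * (0.5 * r[0] + r[1:-1].sum(axis=0) + 0.5 * r[-1])

# Least-squares solution of (Id - K_tau) v = R_tau on the samples.
coeffs, *_ = np.linalg.lstsq(A, R, rcond=None)

# For this toy problem the exact value function is p * x^2 with:
p_exact = (1.0 + lam * k**2) / (4.0 * (b * k - a))
```

The right-hand side of the ODE is only ever called through the solver, which is the sense in which the evaluation step needs no explicit model.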
Possible ansatz spaces
Full linear space of polynomials
Low-rank tensor manifolds
Deep Neural Networks
Here used:
Low-rank Tensor Train (TT-tensor) manifold
Riemannian manifold structure
Explicit representation of the tangent space
Convergence theory for optimization algorithms
Tensor Trains
Consider $\Pi_i = (1, x_i, x_i^2, x_i^3, \dots, x_i^k)$, one-dimensional polynomials.
Tensor product $\Pi = \bigotimes_{i=1}^n \Pi_i$.
$\dim(\Pi) = (k+1)^n$, huge if $n \gg 0$.
Reduce the size of the ansatz space by considering a nonlinear $\mathcal{M} \subset \Pi$.
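$v(x)$ = [Tensor network diagram: a chain of four cores $A_1, A_2, A_3, A_4$ joined by the ranks $r_1, r_2, r_3$; each core $A_i$ is attached to a polynomial basis node $P$ evaluated at $x_i$.]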
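A short Python sketch of how such a tensor train can be evaluated, assuming each core A_i is stored as an array of shape (r_{i-1}, k+1, r_i) with r_0 = r_n = 1 and that P(x_i) is the monomial vector (1, x_i, ..., x_i^k). The sizes, the random cores and the function names are hypothetical.

```python
import numpy as np

def monomials(x, k):
    """P(x) = (1, x, x^2, ..., x^k)."""
    return x ** np.arange(k + 1)

def tt_eval(cores, x):
    """Evaluate v(x): contract each core with P(x_i), then chain along the ranks."""
    k = cores[0].shape[1] - 1
    result = np.ones((1,))
    for core, xi in zip(cores, x):
        # core has shape (r_{i-1}, k+1, r_i); contract the middle index with P(x_i).
        matrix = np.einsum("ajb,j->ab", core, monomials(xi, k))
        result = result @ matrix
    return float(result[0])

def count_parameters(cores):
    """Number of entries stored in the TT cores."""
    return sum(core.size for core in cores)

# Hypothetical sizes: n = 4 variables, degree k = 4, all ranks equal to 3.
rng = np.random.default_rng(0)
n, k, r = 4, 4, 3
ranks = [1] + [r] * (n - 1) + [1]
cores = [rng.standard_normal((ranks[i], k + 1, ranks[i + 1])) for i in range(n)]

x = np.array([0.1, -0.2, 0.3, 0.4])
value = tt_eval(cores, x)

# Full coefficient tensor with (k+1)^n entries, built only to compare sizes and values.
full = np.einsum("aib,bjc,ckd,dle->ijkl", *cores)
```

Evaluation costs one small matrix product per variable, and the full coefficient tensor never has to be formed in practice; it is built here only as a check.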
Cost functional
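Modify the cost functional:
$$R_N(v) = \frac{1}{n} \sum_{j=1}^n |(\mathrm{Id} - K_\tau^{\alpha_i}) v(x_j) - R_\tau^{\alpha_i}(x_j)|^2 + \underbrace{|v(0)|^2 + |\nabla v(0)|^2}_{\text{vanishes in exact case}} + \underbrace{\mu \|v\|^2_{H^1(\Omega)}}_{\text{regularizer}}$$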
Example: Schloegl-like equation
Consider a Schlögl-like system with Neumann boundary condition, cf. [1, Dolgov, Kalise, Kunisch, 19]. Solve for $x \in \Omega = L^2(-1,1)$
$$\min_u J(x,u) = \min_u \int_0^\infty \frac{1}{2}\|x(s)\|^2 + \frac{\lambda}{2}|u(s)|^2 \, ds,$$
subject to
$$\dot{x}(t) = \sigma \Delta x(t) + x(t)^3 + \chi_\omega u(t), \qquad x(0) = x_0.$$
$\chi_\omega$ is the characteristic function of $\omega = [-0.4, 0.4]$.
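After discretization in space (finite differences):
$$\begin{pmatrix} \dot{x}_1 \\ \vdots \\ \dot{x}_n \end{pmatrix} = A \begin{pmatrix} x_1 \\ \vdots \\ x_n \end{pmatrix} + \begin{pmatrix} x_1^3 \\ \vdots \\ x_n^3 \end{pmatrix} + u \begin{pmatrix} 0 \\ \vdots \\ 1 \\ \vdots \\ 0 \end{pmatrix}$$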
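One possible finite-difference realization of this system in Python, given as a sketch only. The slides do not state the grid size, the value of sigma, or the treatment of the boundary, so n = 32, sigma = 0.2 and the mirrored ghost-point Neumann stencil are assumptions, as are the function names.

```python
import numpy as np

def neumann_laplacian(n, length=2.0):
    """Second-order finite-difference Laplacian with homogeneous Neumann boundary."""
    h = length / (n - 1)
    lap = -2.0 * np.eye(n) + np.eye(n, k=1) + np.eye(n, k=-1)
    # Mirrored ghost points at both ends: x_{-1} = x_1 and x_n = x_{n-2}.
    lap[0, 1] = 2.0
    lap[-1, -2] = 2.0
    return lap / h**2

def schloegl_rhs(x, u, A, b):
    """Right-hand side A x + x^3 + u b of the discretized system."""
    return A @ x + x**3 + b * u

n, sigma = 32, 0.2                         # hypothetical values, not from the slides
grid = np.linspace(-1.0, 1.0, n)
A = sigma * neumann_laplacian(n)
b = (np.abs(grid) <= 0.4).astype(float)    # chi_omega sampled on the grid

x0 = np.cos(np.pi * grid)                  # hypothetical initial state
dx = schloegl_rhs(x0, u=-1.0, A=A, b=b)
```

Since the control enters linearly, the derivative of the right-hand side with respect to u is just the vector `b`, which is all the policy update requires.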
Example: Schloegl-like equation
TT Degrees of Freedom
Full space: $5^{32}$. Reduced to ≈ 5000.
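To make the size comparison concrete, the following Python sketch counts parameters for the full tensor-product space and for a tensor train. Reading the full-space size as (k+1)^n with n = 32 variables and degree k = 4 is an inference from the figure above, and the uniform ranks tried below are hypothetical, since the slides do not list the ranks used.

```python
def full_dofs(n, k):
    """Dimension (k+1)^n of the full tensor-product polynomial space."""
    return (k + 1) ** n

def tt_dofs(n, k, rank):
    """Entries of n TT cores of shape (r_{i-1}, k+1, r_i) with uniform inner rank."""
    ranks = [1] + [rank] * (n - 1) + [1]
    return sum(ranks[i] * (k + 1) * ranks[i + 1] for i in range(n))

n, k = 32, 4                        # assumption: 32 variables, polynomial degree 4
print("full space:", full_dofs(n, k))
for rank in (4, 5, 6):              # hypothetical uniform ranks
    print("rank", rank, "->", tt_dofs(n, k, rank), "parameters")
```

Under these assumptions, uniform ranks of about 5 to 6 give a few thousand parameters, the same order of magnitude as the reduced count quoted on the slide.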
Example: Schloegl-like equation
[Figure, panel (a): Initial values x0 and x1, plotted over [−1, 1].]
[Figure, panel (b): Generated cost and squared Bellman error $|v(x_i) - J(x_i, \alpha(x_i))|^2$ for the initial values x0 and x1. Blue is Riccati, orange is $V_{L^2}$ and green is $V_{H^1}$. Cost labels as extracted from the plot: 2.85, 5.74, 2.15, 2.88, 2.12, 2.83.]
Figure: The generated controls for different initial values.
Example: Schloegl-like equation
[Figure, panel (a): Generated controls over time (0 to 5), initial value x0. Curves: Riccati, $V_{L^2}$, $V_{H^1}$.]
[Figure, panel (b): Generated controls over time (0 to 5), initial value x1. Curves: Riccati, $V_{L^2}$, $V_{H^1}$.]
Figure: The generated controls for different initial values.
What do we need for optimization
We only need
a discretization of the flow Φ (black box)
the derivative of the right-hand side f(x, u) w.r.t. the control (easy if linear)
the cost functional
to solve the equation and generate a feedback law.
Thank you for your attention
References and related work
[1] Sergey Dolgov, Dante Kalise, and Karl Kunisch. A Tensor Decomposition Approach for High-Dimensional Hamilton-Jacobi-Bellman Equations. arXiv e-prints, arXiv:1908.01533, Aug 2019.
[2] Martin Eigel, Reinhold Schneider, Philipp Trunschke, and Sebastian Wolf. Variational Monte Carlo - bridging concepts of machine learning and high-dimensional partial differential equations. Advances in Computational Mathematics, Oct 2019.
[3] Mathias Oster, Leon Sallandt, and Reinhold Schneider. Approximating the stationary Hamilton-Jacobi-Bellman equation by hierarchical tensor products, 2019.