a near model-free method for solving the hamilton-jacobi-bellman equation … · 2020. 1. 23. ·...

A near model-free method for solving theHamilton-Jacobi-Bellman equation in high dimensions

Mathias Oster, Leon Sallandt, Reinhold Schneider

Technische Universitat BerlinICODE Workshop on numerical solutions of HJB equations

10.01.2020

Motivation and Ingredients

Aim: Calculate optimal feedback laws (via HJB) for controlled PDEs.

Ingredients:

1 Reformulate the HJB equation as operator equation.

2 Use Monte Carlo integration for least squares approximation.

3 Use non linear, smooth Ansatz space: HT/TT – tree-based tensors.

Mathias Oster (TU Berlin) Solve HJB in high-dimensions ICODE 2 / 21

Classical optimal control problem

Optimal control problem: find u ∈ L2(0,∞) such that

minu

J(x , u) = minu

∫ ∞0

1

2‖x(s)‖2

Rn +λ

2|u(s)|2ds,

subject to

x = f (x , u), x ∈ Ω ⊂ Rx(0) = x0

1 Note that the differential equation can be high-dimensional

2 linear ODE and quadratic cost → Riccati equation

3 nonlinear ODE and nonlinear cost → Hamilton-Jacobi-Bellman (HJB)equation


Feedback control problem

Define a feedback-law α(x(t)) = u(t). Rephrase

minα

Jα(x) = minα

∫ ∞0

1

2‖x(s, α)‖2

Rn +λ

2|(α(x))(s)|2︸︷︷︸

=:rα(x)

ds,

Our goal: find an optimal feedback law α∗(x) = u.

Defining the value function

v(x) := infαJα(x) ∈ R

Idea: if v is differentiable, the feedback law is given by

α(x) = − 1

λDxv(x) Duf (x , u) (easy to calculate!).


The HJB equation

The value function obeys

infα

f (x , α(x)) · ∇v(x) + rα(x)

= 0

HJB equation is highly nonlinear and potentially high-dimensional!

But: For fixed policy α(x) it reduces to a linear equation: DefiningLα := −f (x , α) · ∇ we get

Lαvα(x)− rα(x) = 0.


Methods of characteristics

Linearized HJB:Lαvα(x)− rα(x) = 0.

Using the methods of characteristics we obtain

x(t) = f (x , α),

vα(x(0)) =

∫ τ

0rα(x(t))dt + vα(x(s)),

which we call Bellman-like equation.


Reformulation as Operator Equation

Consider the Koopman operator:

Kατ : Lloc,∞(Ω)→ Lloc,∞(Ω), Kα

τ [g ](x) = g(x(τ)).

Rewrite the Bellman-like equation: For all x ∈ Ω:

vα(x(0)) =

∫ τ

0rα(x(t))dt + vα(x(s)),

as

(Id− Kατ )[v ](x) =

∫ τ

0Kαt r(x)dt︸︷︷︸

=:Rατ (x)

.


Policy iteration

Policy iteration uses a sequence of linearized HJB equations.

Algorithm (Policy iteration)

Initialize with stabilizing feedback α0. Solve until convergence

1 Find vi+1 such that (Id− Kαiτ )vi+1(·)− Rαi

τ (·) = 0.

2 Update policy according to αi+1(x) = − 1λDxvi+1(x) Duf (t, x , u).


Least squares ansatz

Problem: We need to solve

(Id− Kαiτ )vαi+1(·)− Rαi

τ (·) = 0.

Idea: Solve on suitable S

vαi+1 = arg minv∈S

‖(Id− Kαiτ )v(·)− Rαi

τ (·)‖2L2(Ω)︸︷︷︸

=∫

Ω |(Id−Kαiτ )v(x)−Rαi

τ (x)|2dx

.


Projected Policy iteration

Algorithm (Projected Policy iteration)

Initialize with stabilizing feedback α0. Solve until convergence

1 Findvi+1 = arg min

v∈S‖(Id− Kαi

τ )v(·)− Rαiτ (·)‖2

L2(Ω).

2 Update policy according to αi+1(x) = − 1λDxvi+1(x) Duf (x , u).


Variational Monte-Carlo

Approximate by Monte-Carlo quadrature

‖(Id− Kαiτ )v(·)− Rαi

τ (·)‖2L2(Ω) ≈

1

n

n∑j=1

|(Id− Kαiτ )v(xj)− Rαi

τ (xj)|2.

v∗n,s = arg minv∈S

1

n

n∑j=1


τ (xj)|2

Proposition ([Eigel, Schneider et al, 19)

] Let ε > 0 such that infvs∈S‖v∗ − vs‖2

L2(Ω) ≤ ε. Then

P[‖v∗ − v∗(n,s)‖2L2(Ω) > ε] ≤ c1(ε)e−c2(ε)n

with c1, c2 > 0.

Exponential decay with number of samples chosen.Mathias Oster (TU Berlin) Solve HJB in high-dimensions ICODE 11 / 21

Solving the VMC equation

arg minn∑

j=1


τ (xj)|2.

1 v(xj)→ evaluate v at samples xj .

2 Kαiτ v(xj)→ evaluate v at transported samples (with policy αi ).

3 Rαiτ (xj)→ approximate reward by trapezoidal rule

What do we need for solving the equation?

Model-free solution is possible. Only a black-box solver of the ODE isneeded.

What do we need for updating the policy?

We need Duf (x , u), i.e. The derivative of the rhs w.r.t. the control.


Possible ansatz spaces

Full linear space of polynomials

Low-rank tensor manifolds

Deep Neural Networks

Here used:

Low rank Tensor Train (TT-tensor) manifold

Riemanian manifold structure

Explicit representation of tangential space

Convergence theory for optimization algorithms


Tensor Trains

Consider Πi = (1, xi , x2i , x

3i , .., x

ki ) one-dimensional polynomials.

Tensor product Π =⊗n

i=1 Πi .

dim(Π) = (k + 1)n, huge if n >> 0.

Reduce size of Ansatz space by considering non-linear M⊂ Π.

v(x) =

A1 A2 A3 A4

P P P P

x1 x2 x3 x4

r1 r2 r3


Cost functional

Modify cost-functional:

RN(v) =1

n

n∑j=1

|(Id− Kαiτ )v(xj)− R(xj)|2

+ |v(0)|2 + |∇v(0)|2︸︷︷︸vanishes in exact case

+µ‖v‖2H1(Ω)︸︷︷︸

regularizer


Example: Schloegl-like equation

Consider a Schlogl like system with Neumann boundary condition, c.f. [1,Dolgov, Kalise, Kunisch, 19]. Solve for x ∈ Ω = L2(−1, 1)

minu

J(x , u) = minu

∫ ∞0

1

2‖x(s)‖2 +

λ

2|u(s)|2ds,

subject to

x(t) = σ∆x(t) + x(t)3 + χωu(t)

x(0) = x0.

χω is characteristic function on ω = [−0.4, 0, 4].

After discretization in space (finite differences):x1...xn

= A

x1...xn

+

x31...x3n

+ u

0...1...0



TT Degrees of Freedom

Full space: 532. Reduced to ≈ 5000.



−1.0 −0.5 0.0 0.5 1.0

0.00

0.25

0.50

0.75

1.00

1.25

1.50

1.75

2.00 x0x1

(a) Initial values.

x0 x10

2

4

6

Cost

2.85

5.74

2.152.88

2.122.83

|v(x0)− J (x0, α(x0))|2 |v(x1)− J (x1, α(x1))|210−2

10−1

100

101

Bellm

anerrorsquared

(b) Generated cost and least squareserror. Blue is Riccati, orange is VL2 andgreen is VH1 .

Figure: The generated controls for different initial values.



0 1 2 3 4 5time

−20

−15

−10

−5

0

RiccatiVL2

VH1

(a) Generated controls, initial value x0

0 1 2 3 4 5time

−17.5

−15.0

−12.5

−10.0

−7.5

−5.0

−2.5

0.0

RiccatiVL2

VH1

(b) Generated controls, initial value x1

Figure: The generated controls for different initial values.


What do we need for optimization

We only need

a discretization of the flow Φ (blackbox)

the derivative of the rhs f (x , u) w.r.t. the control (easy if linear)

the cost functional

to solve the equation and generate a feedback law.

Thank you for your attention


References and related work

Sergey Dolgov, Dante Kalise, and Karl Kunisch.A Tensor Decomposition Approach for High-DimensionalHamilton-Jacobi-Bellman Equations.arXiv e-prints, page arXiv:1908.01533, Aug 2019.

Martin Eigel, Reinhold Schneider, Philipp Trunschke, and SebastianWolf.Variational monte carlo—bridging concepts of machine learning andhigh-dimensional partial differential equations.Advances in Computational Mathematics, Oct 2019.

Mathias Oster, Leon Sallandt, and Reinhold Schneider.Approximating the stationary hamilton-jacobi-bellman equation byhierarchical tensor products, 2019.


a near model-free method for solving the hamilton-jacobi-bellman equation … · 2020. 1. 23. ·...

Documents