Latent state estimation using control theory
Bert Kappen
SNN Donders Institute, Radboud University, Nijmegen
Gatsby Unit, UCL London
February 27, 2017
With Hans Christian Ruiz
-
Time series inference
The joint probability is
$$p(x_{1:T}, y_{1:T}) = p(x_{1:T})\, p(y_{1:T}|x_{1:T})$$
$$p(x_{1:T}) = \prod_{t=1}^{T} p(x_t|x_{t-1}), \qquad p(y_{1:T}|x_{1:T}) = \prod_{t=1}^{T} p(y_t|x_t)$$
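The factorization above can be made concrete with a minimal sketch. The Gaussian random-walk transition, the Gaussian observation model, and the names `sample_model`, `joint_log_prob`, `q_std` and `r_std` are illustrative assumptions, not part of the talk.

```python
import math
import random

def log_normal(x, mean, std):
    """Log density of a univariate Gaussian."""
    return -0.5 * math.log(2 * math.pi * std ** 2) - 0.5 * ((x - mean) / std) ** 2

def sample_model(T, x0=0.0, q_std=1.0, r_std=0.5, rng=None):
    """Ancestral sampling: x_t ~ p(x_t | x_{t-1}), then y_t ~ p(y_t | x_t)."""
    rng = rng or random.Random(0)
    xs, ys, x_prev = [], [], x0
    for _ in range(T):
        x = rng.gauss(x_prev, q_std)   # assumed transition p(x_t | x_{t-1})
        y = rng.gauss(x, r_std)        # assumed observation p(y_t | x_t)
        xs.append(x)
        ys.append(y)
        x_prev = x
    return xs, ys

def joint_log_prob(xs, ys, x0=0.0, q_std=1.0, r_std=0.5):
    """log p(x_{1:T}, y_{1:T}): sum of the log transition and log observation factors."""
    logp, x_prev = 0.0, x0
    for x, y in zip(xs, ys):
        logp += log_normal(x, x_prev, q_std) + log_normal(y, x, r_std)
        x_prev = x
    return logp

xs, ys = sample_model(T=5)
print(joint_log_prob(xs, ys))
```

The two products over t become two sums of log factors, which is why the joint is cheap to evaluate even though the posterior over the latent states is not.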
-
Time series inference
Given observed data $y_{1:T}$, estimate the latent states.
$$p(x_{1:T}|y_{1:T}) = \frac{p(x_{1:T}, y_{1:T})}{p(y_{1:T})}, \qquad p(y_{1:T}) = \sum_{x_{1:T}} p(x_{1:T}, y_{1:T})$$
This problem occurs in system identification: identify the system dynamics from observations.
Examples: cellular dynamical processes, finance.
-
fMRI
-
Parameter estimation by likelihood maximization
Suppose the model is known up to parameters $\theta$: $p(x, y|\theta)$, where $x = x_{1:T}$ and $y = y_{1:T}$.
Maximize the log likelihood:
$$L(\theta) = \log p(y|\theta) = \log \sum_x p(x, y|\theta)$$
$$\frac{\partial L(\theta)}{\partial\theta} = \frac{1}{p(y|\theta)} \sum_x \frac{\partial p(x, y|\theta)}{\partial\theta} = \sum_x \frac{\partial \log p(x, y|\theta)}{\partial\theta}\, p(x|y, \theta)$$
The parameter estimate requires statistics from the posterior.
Common approaches:
Kalman filtering: exact for linear dynamics and Gaussian observations
Variational, EP, extended Kalman filter: valid for simple posterior distributions
Particle filtering, SMC: generic, but inefficient
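The statement that the likelihood gradient is a posterior average can be checked numerically on a toy model. The sketch below assumes a hypothetical binary latent variable with p(x=1|theta) = sigmoid(theta) and a fixed observation table; none of these choices come from the talk.

```python
import math

def sigmoid(a):
    return 1.0 / (1.0 + math.exp(-a))

# Hypothetical toy model: binary latent x, binary observation y.
P_Y_GIVEN_X = {0: {0: 0.9, 1: 0.1}, 1: {0: 0.3, 1: 0.7}}

def joint(x, y, theta):
    """p(x, y | theta) = p(x | theta) p(y | x)."""
    prior = sigmoid(theta) if x == 1 else 1.0 - sigmoid(theta)
    return prior * P_Y_GIVEN_X[x][y]

def log_likelihood(y, theta):
    """L(theta) = log sum_x p(x, y | theta)."""
    return math.log(sum(joint(x, y, theta) for x in (0, 1)))

def grad_posterior_average(y, theta):
    """dL/dtheta as the posterior mean of d log p(x, y | theta) / dtheta."""
    evidence = sum(joint(x, y, theta) for x in (0, 1))
    grad = 0.0
    for x in (0, 1):
        posterior = joint(x, y, theta) / evidence
        dlogp = x - sigmoid(theta)   # derivative of the Bernoulli-logit log prior
        grad += posterior * dlogp
    return grad

theta, y, eps = 0.4, 1, 1e-6
finite_diff = (log_likelihood(y, theta + eps) - log_likelihood(y, theta - eps)) / (2 * eps)
print(grad_posterior_average(y, theta), finite_diff)
```

Here the posterior is a two-term sum. For a time series the same average runs over all latent trajectories, which is what makes the inference problem hard.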
-
Time series inference
$$p(x_{1:T}|x_0): \; dX_t = dW_t, \qquad p(y_T|x_T) \propto \exp\left(-(y_T - x_T)^2\right)$$
[Figure: plot against t, for t from 0 to 100]
-
Time series inference
$$p(x_{1:T}|x_0): \; dX_t = dW_t, \qquad p(y_T|x_T) \propto \exp\left(-(y_T - x_T)^2\right)$$
[Figure: two panels plotted against t, for t from 0 to 100]
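A minimal sketch of why naive sampling struggles on this example: paths are drawn from the prior dX_t = dW_t and weighted by the end-point likelihood. The step size, horizon, number of paths, and the observed value `y_T = 3.0` are hypothetical choices made only for illustration.

```python
import math
import random

def sample_brownian_path(n_steps, dt, x0, rng):
    """Euler discretization of dX_t = dW_t started at x0."""
    x, path = x0, [x0]
    for _ in range(n_steps):
        x += rng.gauss(0.0, math.sqrt(dt))
        path.append(x)
    return path

def posterior_weights(paths, y_T):
    """Normalized weights proportional to exp(-(y_T - x_T)^2)."""
    w = [math.exp(-(y_T - p[-1]) ** 2) for p in paths]
    total = sum(w)
    return [wi / total for wi in w]

rng = random.Random(1)
paths = [sample_brownian_path(n_steps=100, dt=0.01, x0=0.0, rng=rng) for _ in range(1000)]
weights = posterior_weights(paths, y_T=3.0)

ess = 1.0 / sum(w * w for w in weights)                        # effective sample size
posterior_mean_xT = sum(w * p[-1] for w, p in zip(weights, paths))
print(ess, posterior_mean_xT)
```

With these settings the prior for the end point is a unit Gaussian and the exact posterior mean is 2.0. Only the few prior paths that happen to end near the observation carry appreciable weight, so the effective sample size is a small fraction of the 1000 paths.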
-
Control approach to time-series inference
Outline:
Introduction to control theory and path integral control theory
Optimal control = optimal sampling
Adaptive importance sampler: estimate an improved sampler from self-generated data
fMRI application
-
Control theory
[Figure: horizontal axis 0 to 2, vertical axis -2 to 2]
Consider a stochastic dynamical system
$$dX_t = f(X_t, u)\,dt + dW_t, \qquad E(dW_{t,i}\, dW_{t,j}) = \nu_{ij}\,dt$$
Given $X_0$, find the control function $u(x, t)$ that minimizes the expected future cost
$$C = E\left(\phi(X_T) + \int_0^T dt\, V(X_t, u(X_t, t))\right)$$
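As a rough illustration of the expected future cost, the sketch below estimates C by Monte Carlo for a one-dimensional system simulated with the Euler-Maruyama scheme. The choices f(x, u) = u, V(x, u) = u^2/2, phi(x) = x^2, the scalar noise variance `nu`, and the function name `expected_cost` are assumptions for this example only.

```python
import math
import random

def expected_cost(control, n_paths=2000, n_steps=100, T=1.0, x0=1.0, nu=1.0, seed=0):
    """Monte Carlo estimate of C = E[phi(X_T) + int_0^T V(X_t, u) dt]."""
    rng = random.Random(seed)
    dt = T / n_steps
    total = 0.0
    for _ in range(n_paths):
        x, cost = x0, 0.0
        for k in range(n_steps):
            u = control(x, k * dt)
            cost += 0.5 * u * u * dt                           # assumed V(x, u) = u^2 / 2
            x += u * dt + rng.gauss(0.0, math.sqrt(nu * dt))   # assumed f(x, u) = u
        total += cost + x * x                                  # assumed phi(x) = x^2
    return total / n_paths

cost_uncontrolled = expected_cost(lambda x, t: 0.0)
cost_feedback = expected_cost(lambda x, t: -x)
print(cost_uncontrolled, cost_feedback)
```

The feedback u = -x is not claimed to be optimal. It only shows that a state-dependent control can lower the expected cost relative to doing nothing, which is the quantity the control problem minimizes over all u(x, t).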
-
Control theory
[Figure: two panels, horizontal axis 0 to 2, vertical axis -2 to 2]
Standard approach: define $J(x, t)$ as the optimal cost-to-go from $x, t$.
$$J(x, t) = \min_{u_{t:T}} E_u\left(\phi(X_T) + \int_t^T ds\, V(X_s, u(X_s, s)) \,\Big|\, X_t = x\right)$$
$J$ satisfies a partial differential equation
$$-\partial_t J(x, t) = \min_u \left(V(x, u) + f(x, u)\nabla_x J(x, t) + \frac{1}{2}\nu \nabla_x^2 J(x, t)\right), \qquad J(x, T) = \phi(x)$$
with $u = u(x, t)$. This is the HJB equation. The optimal control $u^*(x, t)$ defines a distribution over trajectories $p^*(\tau)$ ($= p^*(\tau|x_0, 0)$).
-
Path integral control theory
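[Figure: horizontal axis 0 to 2, vertical axis -2 to 2]
$$dX_t = f(X_t, t)\,dt + g(X_t, t)\left(u(X_t, t)\,dt + dW_t\right), \qquad X_0 = x_0$$
The goal is to find the function $u(x, t)$ that minimizes
$$C = E_u\left(S(\tau) + \int_0^T dt\, \frac{1}{2} u(X_t, t)^2\right), \qquad S(\tau) = \phi(X_T) + \int_0^T dt\, V(X_t, t)$$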
-
Path integral control theory
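[Figure: two panels, horizontal axis 0 to 2, vertical axis -2 to 2]
Equivalent formulation: find the distribution over trajectories $p$ that minimizes
$$C(p) = \int d\tau\, p(\tau)\left(S(\tau) + \log\frac{p(\tau)}{q(\tau)}\right)$$
where $q(\tau|x_0, 0)$ is the distribution over uncontrolled trajectories. The formulations are equivalent because
$$E_u \int_0^T dt\, \frac{1}{2} u(X_t, t)^2 = \int d\tau\, p(\tau) \log\frac{p(\tau)}{q(\tau)}.$$
The optimal solution is given by $p^*(\tau) = \frac{1}{\psi} q(\tau) e^{-S(\tau)}$.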
-
Path integral control theory
[Figure: two panels, horizontal axis 0 to 2, vertical axis -2 to 2]
Equivalent formulation: find the distribution over trajectories $p$ that minimizes
$$C(p) = \int d\tau\, p(\tau)\left(S(\tau) + \log\frac{p(\tau)}{q(\tau)}\right)$$
where $q(\tau|x_0, 0)$ is the distribution over uncontrolled trajectories.
The optimal solution is given by $p^*(\tau) = \frac{1}{\psi} q(\tau) e^{-S(\tau)} = p(\tau|u^*)$.
Equivalence of optimal control and discounted cost (Girsanov).
-
Path integral control theory
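[Figure: two panels, horizontal axis 0 to 2, vertical axis -2 to 2]
The optimal control cost is $C(p^*) = -\log\psi = J(x_0, 0)$ with
$$\psi = \int d\tau\, q(\tau) e^{-S(\tau)} = E_q\, e^{-S}$$
$$u^*\,dt = \frac{E_q\left(dW\, e^{-S}\right)}{E_q\left(e^{-S}\right)}$$
$J$ and $u^*$ can be computed by forward sampling from $q$.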
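The forward-sampling recipe can be sketched on a toy problem where the answer is known. The example assumes f = 0, g = 1, V = 0 and a hypothetical end cost phi(x) = x^2/2, so that S depends only on the end point. The function name and all parameter values are illustrative.

```python
import math
import random

def estimate_psi_and_control(n_paths=20000, n_steps=10, T=1.0, x0=1.0, seed=0):
    """Forward-sample dX = dW and estimate psi, J(x0, 0) and u*(x0, 0)."""
    rng = random.Random(seed)
    dt = T / n_steps
    sum_w, sum_w_dw0 = 0.0, 0.0
    for _ in range(n_paths):
        x, dw0 = x0, 0.0
        for k in range(n_steps):
            dw = rng.gauss(0.0, math.sqrt(dt))
            if k == 0:
                dw0 = dw                  # first noise increment, used for u*
            x += dw
        w = math.exp(-0.5 * x * x)        # e^{-S} with assumed S = phi(X_T) = X_T^2 / 2
        sum_w += w
        sum_w_dw0 += w * dw0
    psi = sum_w / n_paths
    u0 = sum_w_dw0 / (sum_w * dt)         # u* dt = E_q(dW e^{-S}) / E_q(e^{-S})
    return psi, -math.log(psi), u0

psi_hat, J_hat, u_hat = estimate_psi_and_control()
print(psi_hat, J_hat, u_hat)
```

For this Gaussian toy problem the closed form is J(x0, 0) = (1/2) log 2 + 1/4 and u*(x0, 0) = -1/2, so the sampled values can be compared against it. The control estimate is noisier than the cost estimate because it divides a noise increment by dt.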
-
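Estimating $\psi = E\, e^{-S}$
[Figure: panel titled "ESS = 1.8, C = 31.7", horizontal axis 0 to 2, vertical axis -2 to 2]
Sample $N$ trajectories from the uncontrolled dynamics:
$$\tau_i \sim q(\tau), \qquad w_i = e^{-S(\tau_i)}, \qquad \hat\psi = \frac{1}{N}\sum_i w_i$$
$\hat\psi$ is an unbiased estimate of $\psi$.
Sampling efficiency is inversely proportional to the variance in the (normalized) $w_i$:
$$ESS = \frac{N}{1 + N^2\, \mathrm{Var}(w)}$$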
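The effective sample size formula is short enough to write out directly. The sketch below implements it for weights normalized to sum to one and compares it with the equivalent closed form (sum w)^2 / sum w^2. The function names are illustrative.

```python
def effective_sample_size(weights):
    """ESS = N / (1 + N^2 Var(w)) for weights normalized to sum to one."""
    n = len(weights)
    total = sum(weights)
    normalized = [w / total for w in weights]
    mean = 1.0 / n
    variance = sum((w - mean) ** 2 for w in normalized) / n
    return n / (1.0 + n * n * variance)

def effective_sample_size_short(weights):
    """Algebraically equivalent form: (sum w)^2 / sum w^2."""
    return sum(weights) ** 2 / sum(w * w for w in weights)

example = [0.2, 1.5, 0.1, 3.0]
print(effective_sample_size(example), effective_sample_size_short(example))
```

Equal weights give ESS = N, and a single nonzero weight gives ESS = 1, which matches the reading of ESS as the number of samples that effectively contribute.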
-
Importance sampling
[Figure: three panels titled "ESS = 1.8, C = 31.7", "ESS = 3.5, C = 5.0" and "ESS = 9.5, C = 2.0", horizontal axis 0 to 2, vertical axis -2 to 2]
Sampling $N$ trajectories from the controlled dynamics and reweighting yields an unbiased estimate of the cost-to-go:
$$\tau_i \sim p(\tau), \qquad w_i = e^{-S(\tau_i)}\frac{q(\tau_i)}{p(\tau_i)} = e^{-S_u(\tau_i)}, \qquad \hat\psi = \frac{1}{N}\sum_i w_i$$
$$S_u(\tau) = S(\tau) + \int_0^T dt\, \frac{1}{2} u(X_t, t)^2 + \int_0^T u(X_t, t)\,dW_t$$
-
Importance sampling
[Figure: three panels titled "ESS = 1.8, C = 31.7", "ESS = 3.5, C = 5.0" and "ESS = 9.5, C = 2.0", horizontal axis 0 to 2, vertical axis -2 to 2]
$$S_u(\tau) = S(\tau) + \int_0^T dt\, \frac{1}{2} u(X_t, t)^2 + \int_0^T u(X_t, t)\,dW_t$$
Thm: A better $u$ (in the sense of optimal control) provides a better sampler (in the sense of effective sample size). The optimal $u = u^*$ (in the sense of optimal control) requires only one sample, and $S_{u^*}(\tau)$ is deterministic!
Thijssen, Kappen 2015
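The reweighting with $S_u$ and the claim of the theorem can be illustrated on a toy problem with a known optimal control. The sketch assumes f = 0, g = 1, V = 0 and a hypothetical end cost phi(x) = x^2/2, for which u*(x, t) = -x / (1 + T - t). The function names and parameter values are illustrative.

```python
import math
import random

def weighted_rollouts(control, n_paths=4000, n_steps=50, T=1.0, x0=3.0, seed=0):
    """Sample dX = u dt + dW and return the importance weights e^{-S_u}."""
    rng = random.Random(seed)
    dt = T / n_steps
    weights = []
    for _ in range(n_paths):
        x, s_u = x0, 0.0
        for k in range(n_steps):
            u = control(x, k * dt)
            dw = rng.gauss(0.0, math.sqrt(dt))
            s_u += 0.5 * u * u * dt + u * dw   # control cost plus the u dW term
            x += u * dt + dw
        s_u += 0.5 * x * x                     # assumed end cost phi(X_T) = X_T^2 / 2
        weights.append(math.exp(-s_u))
    return weights

def summarize(weights):
    """Return the estimate of psi and the ESS as a fraction of N."""
    n = len(weights)
    psi_hat = sum(weights) / n
    ess_fraction = sum(weights) ** 2 / (n * sum(w * w for w in weights))
    return psi_hat, ess_fraction

T = 1.0
psi_q, ess_q = summarize(weighted_rollouts(lambda x, t: 0.0))
psi_u, ess_u = summarize(weighted_rollouts(lambda x, t: -x / (1.0 + T - t)))
print(psi_q, ess_q)
print(psi_u, ess_u)
```

Both samplers target the same $\psi$, but under the optimal control the weights are nearly constant and the ESS fraction is close to one. The remaining spread comes from the time discretization.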
-
Proof
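The control cost is $C(p) = E_p\left(S(\tau) + \log\frac{p(\tau)}{q(\tau)}\right)$. Using Jensen's inequality:
$$C^* = -\log\int d\tau\, q(\tau) e^{-S(\tau)} = -\log\int d\tau\, p(\tau) e^{-S(\tau) - \log\frac{p(\tau)}{q(\tau)}} \le \int d\tau\, p(\tau)\left(S(\tau) + \log\frac{p(\tau)}{q(\tau)}\right)$$
The inequality is saturated when $S(\tau) + \log\frac{p(\tau)}{q(\tau)}$ has zero variance.
This is realized when $p = p^*$. ($p^*$ exists when $\int d\tau\, q(\tau) e^{-S(\tau)} < \infty$.)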
-
The Path Integral Cross Entropy (PICE) method
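We wish to estimate
$$\psi = \int d\tau\, q(\tau) e^{-S(\tau)}$$
The optimal (zero variance) importance sampler is $p^*(\tau) = \frac{1}{\psi} q(\tau) e^{-S(\tau)}$.
We approximate $p^*(\tau)$ with $p_u(\tau)$, where $u(x, t|\theta)$ is a parametrized control function.
Following the Cross Entropy method, we minimise $KL(p^*|p_u)$ by gradient descent:
$$\Delta\theta \propto E_u\left(e^{-S_u} \int_0^T dW_t\, \frac{\partial u(X_t, t|\theta)}{\partial\theta}\right)$$
The parametrization $u(x, t|\theta)$ is arbitrary.
Kappen, Ruiz 2016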
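The gradient expression on this slide can be recovered in a few lines. The derivation below is a sketch that assumes $g = 1$ and uses the relation $\log\frac{p_u(\tau)}{q(\tau)} = S_u(\tau) - S(\tau)$ implied by the importance weights on the earlier slide. Here $dW_t$ denotes the noise realized under the sampling control $u$.

```latex
\begin{align}
KL(p^*|p_u) &= \int d\tau\, p^*(\tau)\log\frac{p^*(\tau)}{p_u(\tau)}, \\
\log\frac{p_u(\tau)}{q(\tau)} &= \int_0^T dt\,\tfrac{1}{2}u^2 + \int_0^T u\,dW_t,
\qquad dW_t = dX_t - f\,dt - u\,dt, \\
\frac{\partial}{\partial\theta}\log p_u(\tau) &= \int_0^T dW_t\,\frac{\partial u(X_t,t|\theta)}{\partial\theta}, \\
\frac{\partial KL(p^*|p_u)}{\partial\theta}
&= -E_{p^*}\!\left(\int_0^T dW_t\,\frac{\partial u(X_t,t|\theta)}{\partial\theta}\right)
= -\frac{1}{\psi}\,E_u\!\left(e^{-S_u}\int_0^T dW_t\,\frac{\partial u(X_t,t|\theta)}{\partial\theta}\right).
\end{align}
```

The third line follows because the two terms proportional to $u\,\partial u/\partial\theta\,dt$ cancel when the trajectory is held fixed. The last step uses $p^*(\tau)/p_u(\tau) = e^{-S_u(\tau)}/\psi$, so the expectation can be estimated from trajectories generated by the current sampler.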
-
Adaptive importance sampling
for k = 0, ... do
    data_k = generate_data(model, u_k)      % Importance sampler
    u_{k+1} = learn_control(data_k, u_k)    % Gradient descent
end for
Parallel sampling
Parallel gradient computation
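The loop above can be sketched end to end on a toy problem. This is a minimal illustration, not the implementation used in the talk: it assumes f = 0, g = 1, V = 0, a hypothetical end cost phi(x) = x^2/2, and a one-parameter linear feedback u(x, t|theta) = theta * x. The learning rate and sample sizes are arbitrary.

```python
import math
import random

def rollout(theta, rng, x0=3.0, T=1.0, n_steps=20):
    """Simulate dX = u dt + dW with u = theta * x; return (S_u, int x dW)."""
    dt = T / n_steps
    x, s_u, grad_stat = x0, 0.0, 0.0
    for _ in range(n_steps):
        u = theta * x
        dw = rng.gauss(0.0, math.sqrt(dt))
        s_u += 0.5 * u * u * dt + u * dw   # control cost plus the u dW term
        grad_stat += x * dw                # int dW_t du/dtheta, with du/dtheta = x
        x += u * dt + dw
    s_u += 0.5 * x * x                     # assumed end cost phi(X_T) = X_T^2 / 2
    return s_u, grad_stat

def adaptive_importance_sampling(n_iters=15, n_paths=500, lr=0.1, seed=0):
    rng = random.Random(seed)
    theta, ess_history = 0.0, []
    for _ in range(n_iters):
        data = [rollout(theta, rng) for _ in range(n_paths)]       # importance sampler
        s_min = min(s for s, _ in data)
        w = [math.exp(-(s - s_min)) for s, _ in data]              # numerically stabilized
        total = sum(w)
        w = [wi / total for wi in w]
        ess_history.append(1.0 / (n_paths * sum(wi * wi for wi in w)))
        theta += lr * sum(wi * g for wi, (_, g) in zip(w, data))   # gradient step
    return theta, ess_history

theta_hat, ess_history = adaptive_importance_sampling()
print(theta_hat, ess_history[0], ess_history[-1])
```

Each iteration samples with the current control, reweights with e^{-S_u}, and moves theta along the weighted average of the gradient statistic. The rollouts within one iteration are independent, which is what makes the parallel sampling and parallel gradient computation mentioned on the slide possible.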
-
Inverted pendulum
Simple 2nd order pendulum with noise, $X = (\alpha, \dot\alpha)$
$$\ddot\alpha = -\cos\alpha + u, \qquad C = E\int_0^T dt\left(V(X_t) + \frac{1}{2} u(X_t, t)^2\right)$$
Naive grid: $u(x) = \sum_k u_k \delta_{x, x_k}$.
[Figure: three panels. ESS fraction (ss) versus iteration, 0 to 1000; J versus iteration, 0 to 1000; a two-dimensional map over a 0 to 6 by -2 to 2 grid]
Final fraction ESS < 1 due to time discretization and finite sample size effects.
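A naive grid control is a lookup table with one control value per cell of the discretized state space. The sketch below shows one way to set that up for a pendulum state (angle, angular velocity). The grid resolution, the velocity clipping range `vel_max`, the sign convention of the dynamics, and the placement of the noise on the velocity are all assumptions of this example.

```python
import math
import random

class GridControl:
    """Piecewise-constant control: one value u_k per cell of an (angle, velocity) grid."""

    def __init__(self, n_angle=20, n_vel=20, vel_max=3.0):
        self.n_angle, self.n_vel, self.vel_max = n_angle, n_vel, vel_max
        self.table = [[0.0] * n_vel for _ in range(n_angle)]

    def cell(self, angle, vel):
        """Map a continuous state to the indices of its grid cell."""
        a = angle % (2 * math.pi)
        i = min(int(a / (2 * math.pi) * self.n_angle), self.n_angle - 1)
        v = max(-self.vel_max, min(self.vel_max, vel))
        j = min(int((v + self.vel_max) / (2 * self.vel_max) * self.n_vel), self.n_vel - 1)
        return i, j

    def __call__(self, angle, vel):
        i, j = self.cell(angle, vel)
        return self.table[i][j]

def pendulum_step(angle, vel, control, dt, rng):
    """One Euler step of the assumed noisy dynamics: d(vel) = (-cos(angle) + u) dt + dW."""
    u = control(angle, vel)
    dw = rng.gauss(0.0, math.sqrt(dt))
    new_vel = vel + (-math.cos(angle) + u) * dt + dw
    new_angle = angle + vel * dt
    return new_angle, new_vel

control = GridControl()
state = pendulum_step(0.0, 0.0, control, dt=0.01, rng=random.Random(0))
print(control.cell(*state), state)
```

In an adaptive importance sampling loop the entries of `table` would play the role of the parameters, with each rollout contributing only to the cells it visits.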
-
Acrobot
Swing up and stabilize underactuated stochastic double pendulum.
(acrobot.mp4)
Neural network with 2 hidden layers, 50 neurons per layer. Input is position and velocity. 2000 iterations, with 30000 rollouts per iteration. 100 cores. 15 minutes.
-
Coordination of UAVs
Centralized path integral solution computed in real time (simulation) for 10 quadrotors. The objective is to fly a holding pattern near a fixed location while maintaining a minimal velocity and distance to other drones. Video at: http://www.snn.ru.nl/~bertk/control_theory/PI_quadrotors.mp4
Gomez et al. 2016
-
Coordination of UAVs
This behavior was replicated on real quadrotors, demonstrating high-dimensional non-linear stochastic optimal control in real time.
Chao Xu ACC 2017
-
Control Inference
Fleming, Mitter
-
Inferring neural activity from BOLD fMRI
Subjects were asked to respond as fast as possible to a visual or auditory stimulus.
Measure the fMRI BOLD response in motor cortex.
Reconstruct the neural signal from the BOLD signal.
Neural activity: $dZ_t = A Z_t\,dt + \sigma_z\,dW_t$
Hemodynamics:
    Vasodilation: $\dot s(z, s, f)$
    Blood flow: $\dot f(s)$
Balloon model:
    Blood volume: $\dot v(f, v)$
    Deoxygenated hemoglobin: $\dot q(f, v, q)$
BOLD: $y(v, q)$
-
Inferring neural activity from BOLD fMRI
Neural network with 8 layers and 50 units per layer. 45 learning iterations. 300,000 MC samples per iteration. 500 cores. CPU time 3100 sec.
-
Inferring neural activity from BOLD fMRI
-
Thank you!
Kappen, Hilbert Johan, and Hans Christian Ruiz. Adaptive importance sampling for control and inference. Journal of Statistical Physics 162.5 (2016): 1244-1266.