bert kappen snn donders institute, radboud university...

34
Latent state estimation using control theory Bert Kappen SNN Donders Institute, Radboud University, Nijmegen Gatsby Unit, UCL London February 27, 2017 1 1 with Hans Christian Ruiz Bert Kappen

Upload: vanhanh

Post on 03-May-2018

220 views

Category:

Documents


1 download

TRANSCRIPT

  • Latent state estimation using control theory

    Bert KappenSNN Donders Institute, Radboud University, Nijmegen

    Gatsby Unit, UCL London

    February 27, 2017

    1

    1with Hans Christian Ruiz

    Bert Kappen

  • Time series inference

    The joint probability is

    p(x1:T , y1:T ) = p(x1:T )p(y1:T |x1:T )

    p(x1:T ) =T

    t=1

    p(xt|xt1) p(y1:T |x1:T ) =T

    t=1

    p(yt|xt)

    Bert Kappen 1

  • Time series inference

    Given observed data y1:T , estimate the latent states.

    p(x1:T |y1:T ) =p(x1:T , y1:T )

    y1:T )p(y1:T ) =

    x1:T

    p(x1:T , y1:T )

    Bert Kappen 2

  • Time series inference

    Given observed data y1:T , estimate the latent states.

    p(x1:T |y1:T ) =p(x1:T , y1:T )

    y1:T )p(y1:T ) =

    x1:T

    p(x1:T , y1:T )

    Problem occurs in system identification: identify system dynamics from observa-tions.

    cellular dynamical processes, finance,

    Bert Kappen 3

  • fMRI

    Bert Kappen 4

  • Parameter estimation by likelihood maximization

    Suppose the model is known up to parameters: p(x, y|). 2.

    Maximize the log likelihood:

    L() = log p(y|) = log

    x

    p(x, y|)

    L() =

    1p(y|)

    x

    p(x, y|) =

    x

    log p(x, y|)

    p(x|y, )

    Parameter estimate requires statistics from the posterior.

    Common approaches:Kalman Filtering: exact for linear dynamics and Gaussian observationsVariational, EP, extended Kalman Filter: valid for simple posterior distributionsParticle filtering, SMC: generic, but inefficient

    2x = x1:T , y = y1:T .

    Bert Kappen 5

  • Time series inference

    p(x1:T |x0) : dXt = dWtp(yT |xT ) : exp((yT xT )2)

    0 10 20 30 40 50 60 70 80 90 1000.5

    0

    0.5

    1

    1.5

    2

    t

    Bert Kappen 6

  • Time series inference

    p(x1:T |x0) : dXt = dWtp(yT |xT ) : exp((yT xT )2)

    0 10 20 30 40 50 60 70 80 90 1000.5

    0

    0.5

    1

    1.5

    2

    t

    Bert Kappen 7

  • Time series inference

    p(x1:T |x0) : dXt = dWtp(yT |xT ) : exp((yT xT )2)

    0 10 20 30 40 50 60 70 80 90 100

    0.4

    0.2

    0

    0.2

    0.4

    0.6

    0.8

    1

    t

    0 10 20 30 40 50 60 70 80 90 100

    0.4

    0.2

    0

    0.2

    0.4

    0.6

    0.8

    1

    t

    Bert Kappen 8

  • Control approach to time-series inference

    Outline:

    Introduction to control theory and path integral control theory

    Optimal control optimal sampling

    Adaptive importance sampler: Estimate improved sampler from self-generateddata

    fMRI application

    Bert Kappen 9

  • Control theory

    0 0.5 1 1.5 22

    1

    0

    1

    2

    Consider a stochastic dynamical system

    dXt = f (Xt, u)dt + dWt E(dWt,idWt. j) = i jdt

    Given X0 find control function u(x, t) that minimizes the expected future cost

    C = E((XT ) +

    T0

    dtV(Xt, u(Xt, t)))

    Bert Kappen 10

  • Control theory

    0.5 1 1.5 2

    2

    1

    0

    1

    20 0.5 1 1.5 2

    2

    1

    0

    1

    2

    Standard approach: define J(x, t) is optimal cost-to-go from x, t.

    J(x, t) = min ut:TEu

    ((XT ) +

    Tt

    dtV(Xt, u(Xt, t)))

    Xt = x

    J satisfies a partial differential equation

    tJ(t, x) = minu

    (V(x, u) + f (x, u)xJ(x, t) +

    122xJ(x, t)

    )J(x,T ) = (x)

    with u = u(x, t).This is HJB equation. Optimal control u(x, t) defines distributionover trajectories p() (= p(|x0, 0)).

    Bert Kappen 11

  • Path integral control theory

    0 0.5 1 1.5 22

    1

    0

    1

    2

    dXt = f (Xt, t)dt + g(Xt, t)(u(Xt, t)dt + dWt) X0 = x0

    Goal is to find function u(x, t) that minimizes

    C = Eu

    (S () +

    T0

    dt12

    u(Xt, t)2)

    S () = (XT ) + T

    0V(Xt, t)

    Bert Kappen 12

  • Path integral control theory

    0 0.5 1 1.5 22

    1

    0

    1

    2

    0 0.5 1 1.5 22

    1

    0

    1

    2

    Equivalent formulation: Find distribution over trajectories p that minimizes 3

    C(p) =

    dp()(S () + log

    p()q()

    )q(|x0, 0) is distribution over uncontrolled trajectories.

    The optimal solution is given by p() = 1q()eS ()

    3Eu T0 dt

    12u(Xt, t)

    2 =

    dp() log p()q() .

    Bert Kappen 13

  • Path integral control theory

    0 0.5 1 1.5 22

    1

    0

    1

    2

    0 0.5 1 1.5 22

    1

    0

    1

    2

    Equivalent formulation: Find distribution over trajectories p that minimizes

    C(p) =

    dp()(S () + log

    p()q()

    )q(|x0, 0) is distribution over uncontrolled trajectories.

    The optimal solution is given by p() = 1q()eS () = p(|u).

    Equivalence of optimal control and discounted cost (Girsanov)

    Bert Kappen 14

  • Path integral control theory

    0 0.5 1 1.5 22

    1

    0

    1

    2

    0 0.5 1 1.5 22

    1

    0

    1

    2

    The optimal control cost is C(p) = log = J(x0, 0) with

    =

    dq()eS () = EqeS

    udt =Eq

    (dWeS

    )Eq

    (eS

    )J, u can be computed by forward sampling from q.

    Bert Kappen 15

  • Estimating = EeS

    0 0.5 1 1.5 22

    1

    0

    1

    2

    ESS = 1.8, C=31.7

    Sample N trajectories from uncontrolled dynamics

    i q() wi = eS (i) =1N

    i

    wi

    unbiased estimate of .

    Sampling efficiency is inversely proportional to variance in (normalized) wi.

    ES S =N

    1 + N2Var(w)

    Bert Kappen 16

  • Importance sampling

    0 0.5 1 1.5 22

    1

    0

    1

    2

    ESS = 1.8, C=31.7

    0 0.5 1 1.5 22

    1

    0

    1

    2

    ESS = 3.5, C=5.0

    0 0.5 1 1.5 22

    1

    0

    1

    2

    ESS=9.5, C=2.0

    Sample N trajectories from controlled dynamics and reweight yields unbiased es-timate of cost-to-go:

    i p() wi = eS (i)q(i)p(i)

    = eS u(i) =1N

    i

    wi

    S u() = S () + T

    0dt

    12

    u(Xt, t)2 + T

    0u(Xt, t)dWt

    Bert Kappen 17

  • Importance sampling

    0 0.5 1 1.5 22

    1

    0

    1

    2

    ESS = 1.8, C=31.7

    0 0.5 1 1.5 22

    1

    0

    1

    2

    ESS = 3.5, C=5.0

    0 0.5 1 1.5 22

    1

    0

    1

    2

    ESS=9.5, C=2.0

    S u() = S () + T

    0dt

    12

    u(Xt, t)2 + T

    0u(Xt, t)dWt

    Thm: Better u (in the sense of optimal control) provides a better sampler (in the senseof effective sample size). Optimal u = u (in the sense of optimal control) requires only one sample andS u() deterministic!

    Thijssen, Kappen 2015

    Bert Kappen 18

  • Proof

    Control cost is C(p) = Ep(S () + log p()q()

    )Using Jensens inequality:

    C = log

    q()eS () = log

    p()eS ()logp()q()

    p()(S () + log

    p()q()

    )

    Bert Kappen 19

  • Proof

    Control cost is C(p) = Ep(S () + log p()q()

    )Using Jensens inequality:

    C = log

    q()eS () = log

    p()eS ()logp()q()

    p()(S () + log

    p()q()

    )

    The inequality is saturated when S () + log p()q() has zero variance.

    This is realized when p = p 4.

    4p exists when q()eS () <

    Bert Kappen 20

  • The Path Integral Cross Entropy (PICE) method

    We wish to estimate

    =

    dq()eS ()

    The optimal (zero variance) importance sampler is p() = 1q()eS ().

    We approximate p() with pu(), where u(x, t|) is a parametrized control function.

    Following the Cross Entropy method, we minimise KL(p|pu).

    EueS u T

    0dWt

    u(Xt, t|)

    u(x, t|) is arbitrary.

    Kappen, Ruiz 2016

    Bert Kappen 21

  • Adaptive importance samplingfor k = 0, . . . do

    datak = generate data(model, uk) % Importance sampleruk+1 = learn control(datak, uk) % Gradient descent

    end for

    Parallel samplingParallel gradient computation

    Bert Kappen 22

  • Inverted pendulum

    Simple 2nd order pendulum with noise, X = (, )

    = cos + u C = E T

    0dtV(Xt) +

    12

    u(Xt, t)2

    Naive grid: u(x) =

    k ukx,xk.

    0 200 400 600 800 10000

    0.2

    0.4

    0.6

    0.8

    1

    ss

    0 200 400 600 800 10001.4

    1.2

    1

    0.8

    0.6

    0.4

    0.2

    0

    J

    0 2 4 6

    2

    1

    0

    1

    2 3

    2

    1

    0

    1

    2

    3

    Final fraction ES S < 1 due to time discretization, finite sample size effects.

    Bert Kappen 23

  • Acrobot

    Swing up and stabilize underactuated stochastic double pendulum.

    (acrobot.mp4)

    Neural network 2 hidden layers, 50 neurons per layer. Input is position and velocity. 2000iterations, with 30000 rollouts per iteration. 100 cores. 15 minutes

    Bert Kappen 24

    Lavf56.36.100

    basic_animation.mp4Media File (video/mp4)

  • Coordination of UAVs

    Centralized path integral solution computed in real time (simulation) for 10 quadrotors. Objectiveis to fly a holding pattern near a fixed location maintaining a minimal velocity and distance to otherdrones. Video at: http://www.snn.ru.nl/bertk/control_theory/PI_quadrotors.mp4

    Gomez et al. 2016

    Bert Kappen 25

    http://www.snn.ru.nl/~bertk/control_theory/PI_quadrotors.mp4

  • Coordination of UAVs

    This behavior was replicated on real quadrotors demonstrating high dimensionalnon-linear stochastic optimal control in real-time.

    Chao Xu ACC 2017

    Bert Kappen 26

  • Control Inference

    Fleming, Mitter

    Bert Kappen 27

  • Infering neural activity from BOLD fMRI

    Subjects were asked to respond as fast as possible to a visual or auditory stimulusMeasure fMRI BOLD response in motor cortex

    Reconstruct neural signal from BOLD signal.

    Neural activity dZt = AZtdt + zdWt

    HemodynamicsVasodilation s(z, s, f )Blood flow f (s)

    Balloon modelBlood volume v( f , v)Deoxygenated hemoglobin q( f , v, q)

    BOLD y(v, q)

    Bert Kappen 28

  • Infering neural activity from BOLD fMRI

    Subjects were asked to respond as fast as possible to a visual or auditory stimulusMeasure fMRI BOLD response in motor cortex

    Reconstruct neural signal from BOLD signal.

    Neural activity dZt = AZtdt + zdWt

    HemodynamicsVasodilation s(z, s, f )Blood flow f (s)

    Balloon modelBlood volume v( f , v)Deoxygenated hemoglobin q( f , v, q)

    BOLD y(v, q)

    Bert Kappen 29

  • Infering neural activity from BOLD fMRI

    Subjects were asked to respond as fast as possible to a visual or auditory stimulusMeasure fMRI BOLD response in motor cortex

    Reconstruct neural signal from BOLD signal.

    Neural activity dZt = AZtdt + zdWt

    HemodynamicsVasodilation s(z, s, f )Blood flow f (s)

    Balloon modelBlood volume v( f , v)Deoxygenated hemoglobin q( f , v, q)

    BOLD y(v, q)

    Bert Kappen 30

  • Infering neural activity from BOLD fMRI

    Neural network with 8 layers and 50 units per layer. 45 learning iterations. 300.000 MC samplesper iteration. 500 cores. CPU time 3100 sec.

    Bert Kappen 31

  • Infering neural activity from BOLD fMRI

    Bert Kappen 32

  • Thank you!

    Kappen, Hilbert Johan, and Hans Christian Ruiz. Adaptive importance sampling for control andinference. Journal of Statistical Physics 162.5 (2016): 1244-1266.

    Bert Kappen 33