
  • Machine Learning with Signal Processing
    Part IV: Application Examples

    Arno Solin
    Assistant Professor in Machine Learning
    Department of Computer Science, Aalto University

    ICML 2020 TUTORIAL

    @arnosolin · arno.solin.fi

  • Machine learning with signal processing: Part IV · Arno Solin

  • Outline

    Non-Gaussian likelihoods
    Spatio-temporal GPs
    Latent-space GPs
    Latent differential equations
    Langevin dynamics
    Summary

  • Gaussian processes ♥ SDEs

    GPs under the kernel formalism:

      f(t) ~ GP(0, κ(t, t′))
      y | f ~ ∏_i p(y_i | f(t_i))

    Stochastic differential equations:

      df(t) = F f(t) dt + L dβ(t)
      y_i ~ p(y_i | hᵀ f(t_i))

    Flexible model specification (kernel view) ↔ Inference / first-principles (SDE view)
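The duality above can be made concrete with a small sketch. A Matérn-3/2 GP prior, for example, is exactly equivalent to a two-dimensional linear SDE; the code below builds the state-space matrices and draws a sample path by simulating the discretized SDE. The parameter names (`ell`, `sigma`) and the stationary-covariance discretization trick are standard, but the specific values are illustrative, not from the tutorial.

```python
import numpy as np
from scipy.linalg import expm

def matern32_state_space(ell=1.0, sigma=1.0):
    """State-space (SDE) form of the Matern-3/2 kernel:
    dx(t) = F x(t) dt + L dbeta(t),  f(t) = H x(t)."""
    lam = np.sqrt(3.0) / ell
    F = np.array([[0.0, 1.0], [-lam**2, -2.0 * lam]])
    H = np.array([[1.0, 0.0]])
    Pinf = np.diag([sigma**2, lam**2 * sigma**2])  # stationary covariance
    return F, H, Pinf

def sample_path(ts, ell=1.0, sigma=1.0, seed=0):
    """Draw one GP sample path by simulating the discretized SDE."""
    rng = np.random.default_rng(seed)
    F, H, Pinf = matern32_state_space(ell, sigma)
    x = rng.multivariate_normal(np.zeros(2), Pinf)
    fs = [H @ x]
    for dt in np.diff(ts):
        A = expm(F * dt)                  # exact discrete-time transition
        Q = Pinf - A @ Pinf @ A.T         # process noise matched to stationarity
        x = A @ x + rng.multivariate_normal(np.zeros(2), Q)
        fs.append(H @ x)
    return np.concatenate(fs)

f = sample_path(np.linspace(0.0, 10.0, 200))
```

Simulating the SDE costs O(n) in the number of time points, versus O(n³) for naive kernel-matrix sampling; that is the computational point of the equivalence.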

  • Non-Gaussian likelihoods

    - The observation model might not be Gaussian:

        f(t) ~ GP(0, κ(t, t′)),   y | f ~ ∏_i p(y_i | f(t_i))

    - There exists a multitude of great methods to tackle general
      likelihoods with approximations of the form

        Q(f | D) = N(f | m + Kα, (K⁻¹ + W)⁻¹)

    - Use those methods, but deal with the latent using state-space models
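As a minimal sketch of where an approximation of that form comes from: the Laplace approximation finds the posterior mode by Newton iteration, and W is the negative Hessian of the log-likelihood at the mode. The code below does this for a Poisson (log-Gaussian Cox) likelihood in the naive kernel formulation; the toy kernel and counts are illustrative assumptions, and a state-space implementation would replace the dense solves with filtering.

```python
import numpy as np

def laplace_poisson(K, y, n_iter=50):
    """Newton iterations for the Laplace approximation in a log-Gaussian Cox
    model: y_i ~ Poisson(exp(f_i)), f ~ N(0, K). Returns the posterior mode f
    and the W appearing in Q(f | D) = N(f | mode, (K^-1 + W)^-1)."""
    n = len(y)
    f = np.zeros(n)
    for _ in range(n_iter):
        W = np.diag(np.exp(f))            # W = -Hessian of log p(y | f)
        g = y - np.exp(f)                 # gradient of log p(y | f)
        # Newton step, written without inverting K:
        f = np.linalg.solve(np.eye(n) + K @ W, K @ (W @ f + g))
    return f, W

# Toy data: a smooth squared-exponential kernel and small counts (illustrative)
t = np.linspace(0.0, 5.0, 30)
K = np.exp(-0.5 * (t[:, None] - t[None, :])**2) + 1e-6 * np.eye(30)
y = np.array([0, 1, 0, 2, 1, 3, 2, 4, 3, 2] * 3, dtype=float)
f_hat, W = laplace_poisson(K, y)
```

At convergence the mode satisfies the stationarity condition f = K (y − exp(f)), which is a quick correctness check.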

  • Inference

    - Laplace approximation
    - Variational Bayes
    - Direct KL minimization
    - EP or assumed density filtering (single-sweep EP)

    - All of these can be evaluated in terms of a (Kalman) filter forward
      and backward pass, or by iterating them
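The forward/backward pass mentioned above is, in the Gaussian-likelihood case, just a Kalman filter followed by a Rauch–Tung–Striebel smoother. A minimal sketch for a fixed-step linear-Gaussian state-space model (the toy random-walk model at the bottom is an assumption for demonstration):

```python
import numpy as np

def kalman_filter_smoother(A, Q, H, R, m0, P0, ys):
    """Forward Kalman filter + backward Rauch-Tung-Striebel smoother for
    x_k = A x_{k-1} + q_k,  y_k = H x_k + r_k, with fixed step size."""
    n, d = len(ys), m0.size
    ms, Ps = np.zeros((n, d)), np.zeros((n, d, d))
    m, P = m0, P0
    for k, y in enumerate(ys):
        if k > 0:                               # predict
            m, P = A @ m, A @ P @ A.T + Q
        S = H @ P @ H.T + R                     # update
        K = P @ H.T @ np.linalg.inv(S)
        m = m + K @ (y - H @ m)
        P = P - K @ S @ K.T
        ms[k], Ps[k] = m, P
    for k in range(n - 2, -1, -1):              # smooth (backward pass)
        Pp = A @ Ps[k] @ A.T + Q
        G = Ps[k] @ A.T @ np.linalg.inv(Pp)
        ms[k] = ms[k] + G @ (ms[k + 1] - A @ ms[k])
        Ps[k] = Ps[k] + G @ (Ps[k + 1] - Pp) @ G.T
    return ms, Ps

# Toy use: a slowly drifting scalar state observed in noise
A = np.array([[1.0]]); Q = np.array([[0.01]])
H = np.array([[1.0]]); R = np.array([[0.5]])
ys = np.full((40, 1), 2.0)
ms, Ps = kalman_filter_smoother(A, Q, H, R, np.zeros(1), np.eye(1), ys)
```

Both passes are linear in the number of time steps, which is what makes the state-space route to GP inference scale.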

  • Example

    - Commercial aircraft accidents 1919–2017
    - Log-Gaussian Cox process (Poisson likelihood) by ADF/EP
    - Daily binning, n = 35,959
    - GP prior with a covariance function:

        κ(t, t′) = κ_Mat.^{ν=3/2}(t, t′)
                   + κ_Per.^{year}(t, t′) · κ_Mat.^{ν=3/2}(t, t′)
                   + κ_Per.^{week}(t, t′) · κ_Mat.^{ν=3/2}(t, t′)

    - Learn hyperparameters by optimizing the marginal likelihood

    Nickisch et al. (2018). ICML.
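The covariance above (a long-term trend plus quasi-periodic yearly and weekly components) is easy to write down directly. The sketch below evaluates it as a dense kernel matrix; the lengthscales are illustrative placeholders, whereas in the actual example all hyperparameters are learned from the marginal likelihood, and the state-space form is used instead of dense matrices.

```python
import numpy as np

def matern32(t1, t2, ell=1.0, sigma=1.0):
    """Matern-3/2 covariance."""
    r = np.sqrt(3.0) * np.abs(t1[:, None] - t2[None, :]) / ell
    return sigma**2 * (1.0 + r) * np.exp(-r)

def periodic(t1, t2, period, ell=1.0):
    """Standard exponentiated-sine-squared periodic covariance."""
    d = np.pi * np.abs(t1[:, None] - t2[None, :]) / period
    return np.exp(-2.0 * np.sin(d)**2 / ell**2)

def accident_kernel(t1, t2):
    """Trend + quasi-periodic yearly and weekly components (time in days).
    Lengthscales below are illustrative assumptions, not learned values."""
    return (matern32(t1, t2, ell=1000.0)
            + periodic(t1, t2, period=365.25) * matern32(t1, t2, ell=3000.0)
            + periodic(t1, t2, period=7.0) * matern32(t1, t2, ell=3000.0))

t = np.arange(100.0)          # 100 daily bins
K = accident_kernel(t, t)
```

Multiplying a periodic kernel by a long-lengthscale Matérn makes the periodic pattern drift slowly over the decades rather than repeat exactly.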


  • Spatio-temporal GPs

    f(r) ~ GP(0, κ(r, r′)),   y | f ~ ∏_i p(y_i | f(r_i))

    f(x, t) ~ GP(0, κ(x, t; x′, t′)),   y | f ~ ∏_i p(y_i | f(x_i, t_i))

  • Spatio-temporal Gaussian processes

    GPs under the kernel formalism:

      f(x, t) ~ GP(0, k(x, t; x′, t′))
      y_i = f(x_i, t_i) + ε_i

    Stochastic partial differential equations:

      ∂f(x, t)/∂t = F f(x, t) + L w(x, t)
      y_i = H_i f(x, t_i) + ε_i

    [Figure: two surfaces over location x and time t, contrasting the
    covariance k(x, t; x′, t′) with the state at time t]
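One common special case worth sketching: when the space-time covariance is separable, k(x, t; x′, t′) = k_x(x, x′) · k_t(t, t′), the full covariance over a grid has Kronecker structure, and a Markovian (e.g. Matérn) temporal factor is exactly what the SDE/Kalman machinery exploits. The kernel choices and lengthscales below are illustrative assumptions.

```python
import numpy as np

def separable_st_cov(xs, ts, ell_x=1.0, ell_t=1.0):
    """Separable space-time covariance on a grid:
    k(x, t; x', t') = k_x(x, x') * k_t(t, t')  ->  K = K_x kron K_t."""
    Kx = np.exp(-0.5 * (xs[:, None] - xs[None, :])**2 / ell_x**2)  # sq.-exp. in space
    r = np.sqrt(3.0) * np.abs(ts[:, None] - ts[None, :]) / ell_t
    Kt = (1.0 + r) * np.exp(-r)                                    # Matern-3/2 in time
    return np.kron(Kx, Kt)

K = separable_st_cov(np.linspace(0.0, 1.0, 5), np.linspace(0.0, 1.0, 20))
```

With m spatial points and n time points, the Kronecker/Markov structure lets inference run as a Kalman filter over n steps with an m-dimensional (per spatial point) state, instead of factorizing an (mn) × (mn) matrix.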

  • Spatio-temporal GP regression

    Särkkä et al. (2013). IEEE Signal Processing Magazine.


  • Spatio-temporal GP classification

    Wilkinson et al. (2020). ICML.

  • Multi-view stereo as a temporal fusion problem

    [Figure: pipeline taking frames and camera poses through a cost volume
    and an encoder–decoder (with skip connections) to disparity/depth; a
    latent GP, driven by pose similarity, links the per-frame codes
    z_0, …, z_4 to the outputs y_1, …, y_4]

    - Inputs: frame pairs and relative camera poses
    - Encoder–decoder network for depth estimation
    - Treat the encoder as a feature extractor, and do GP regression
      in the latent space
    - The GP prior encodes the similarity of the camera views

    Hou et al. (2019). ICCV.
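The latent-space fusion step can be sketched as ordinary GP regression applied row-wise to the encoder codes, with a kernel on camera-pose distance. The kernel form, noise level, and lengthscale below are illustrative assumptions, not the choices of Hou et al.

```python
import numpy as np

def gp_fuse_latents(pose_dist, Z, noise=0.1, ell=1.0):
    """GP-smooth per-frame latent codes Z (rows = frames) using a kernel on
    pairwise camera-pose distances. Returns the posterior mean codes."""
    r = np.sqrt(3.0) * pose_dist / ell
    K = (1.0 + r) * np.exp(-r)                        # Matern-3/2 on pose distance
    alpha = np.linalg.solve(K + noise * np.eye(len(Z)), Z)
    return K @ alpha                                  # shrink codes toward neighbours

# Toy example: 5 frames along a camera track, 8-D latent codes
pos = np.linspace(0.0, 2.0, 5)
D = np.abs(pos[:, None] - pos[None, :])
Z = np.random.default_rng(0).normal(size=(5, 8))
Z_fused = gp_fuse_latents(D, Z)
```

Because nearby camera views get high kernel values, each fused code borrows strength from the frames that actually saw a similar part of the scene.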

  • Multi-view stereo as a temporal fusion problem

    Hou et al. (2019). ICCV. Video: https://youtu.be/iellGrlNW7k

  • Sequential priors in GAN latent space

    [Figure: interpolation between views in the GAN latent space]

    Hou et al. Submitted.

  • Latent differential equations

    - Define an implicit prior over functions through dynamics
    - Define observation likelihoods: anything differentiable w.r.t. the
      latent state (e.g., text models!)
    - Train everything jointly with automatic differentiation +
      variational inference

    Adapted from David Duvenaud.

  • Latent ODEs

    - Learning low-rank latent representations of possibly
      high-dimensional sequential data trajectories
    - Combination of variational auto-encoders (VAEs) for sequential data
      with a latent space governed by a continuous-time ODE

    Yildiz et al. (2019). NeurIPS.
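The core decoding step of a latent-ODE model is just numerical integration of learned dynamics from an encoded initial state. The sketch below uses fixed-step Euler and a hand-written linear vector field standing in for a trained neural network; real implementations use adaptive solvers and learned dynamics.

```python
import numpy as np

def euler_odeint(f, z0, ts):
    """Integrate dz/dt = f(z) with fixed-step Euler; returns the latent
    trajectory evaluated at the times ts."""
    zs = [z0]
    for dt in np.diff(ts):
        zs.append(zs[-1] + dt * f(zs[-1]))
    return np.stack(zs)

# Hypothetical "learned" dynamics: a damped oscillator in a 2-D latent space
W = np.array([[0.0, 1.0], [-1.0, -0.1]])
traj = euler_odeint(lambda z: W @ z, np.array([1.0, 0.0]),
                    np.linspace(0.0, 10.0, 500))
```

In the VAE setting, an encoder would produce the distribution over z0, the trajectory would be decoded back to observation space at each t_i, and everything would be trained end-to-end by backpropagating through the solver.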

  • In SDEs: Need to store noise

    - Infinite reparameterization trick: use the same Brownian motion
      sample on the forward and reverse passes

    Li et al. (2020). AISTATS.
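The "store the noise" idea, in isolation from the adjoint machinery of Li et al., can be sketched with a plain Euler–Maruyama solver that is driven by an explicitly stored sequence of Brownian increments: replaying the same increments reproduces exactly the same path, which is what makes gradients through an SDE solve well-defined. The Ornstein–Uhlenbeck drift and constant diffusion below are illustrative.

```python
import numpy as np

def euler_maruyama(drift, diffusion, x0, ts, dW):
    """Euler-Maruyama solve of dx = drift(x) dt + diffusion(x) dW, driven by
    a *given* (stored) sequence of Brownian increments dW."""
    xs = [x0]
    for dt, dw in zip(np.diff(ts), dW):
        x = xs[-1]
        xs.append(x + drift(x) * dt + diffusion(x) * dw)
    return np.array(xs)

ts = np.linspace(0.0, 1.0, 101)
rng = np.random.default_rng(0)
dW = rng.normal(0.0, np.sqrt(np.diff(ts)))       # stored noise, reused below

drift = lambda x: -x                              # Ornstein-Uhlenbeck drift
diffusion = lambda x: 0.5                         # constant diffusion
path_a = euler_maruyama(drift, diffusion, 1.0, ts, dW)
path_b = euler_maruyama(drift, diffusion, 1.0, ts, dW)  # same Brownian sample
```

The actual contribution of Li et al. is to reconstruct the Brownian path on the backward pass without storing all of it, but the replayability shown here is the property that makes that possible.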

  • Latent SDE models

    - Ornstein–Uhlenbeck prior, Laplace likelihood
    - MOCAP example: 50-D data, 6-D latent space, 11,000 parameters

    Li et al. (2020). AISTATS.

  • Langevin dynamics

    - Molecular simulation
    - Sampling and parameter inference
      (e.g., the Metropolis-adjusted Langevin algorithm)
    - Diffusion models (even for images!)

    [Figures: Langevin's model for Brownian motion; a denoising diffusion
    probabilistic model with Langevin dynamics]

    Ho et al. Submitted.
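The sampling use of Langevin dynamics mentioned above can be sketched in a few lines: the unadjusted Langevin algorithm follows the gradient of the log-density plus injected Gaussian noise, and its iterates approximately sample the target. The standard-normal target and step size below are illustrative choices (a Metropolis correction, as in MALA, would remove the discretization bias).

```python
import numpy as np

def langevin_sample(grad_log_p, x0, step=0.05, n=20000, seed=0):
    """Unadjusted Langevin algorithm:
    x <- x + (step/2) * grad log p(x) + sqrt(step) * N(0, 1)."""
    rng = np.random.default_rng(seed)
    x, xs = x0, []
    for _ in range(n):
        x = x + 0.5 * step * grad_log_p(x) + np.sqrt(step) * rng.normal()
        xs.append(x)
    return np.array(xs)

# Target: standard normal, so grad log p(x) = -x
samples = langevin_sample(lambda x: -x, x0=3.0)
```

Diffusion models run essentially this recursion with a learned score network in place of grad log p, annealed across noise levels.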

  • Summary

    Part I: Tools and discrete-time models
    Part II: SDEs (continuous-time models)
    Part III: Gaussian processes
    → Part IV: Application examples

  • Acknowledgements

    Simo Särkkä (Aalto), Manon Kok (TU Delft), Thomas Schön (Uppsala),
    Richard Turner (Cambridge), James Hensman (Amazon),
    Nicolas Durrande (prowler.io), Michael Riis Andersen (DTU),
    Emtiyaz Khan (RIKEN)

    The following researchers kindly provided material/examples:
    David Duvenaud, Markus Heinonen, Cagatay Yildiz, Jonathan Ho

  • Acknowledgements

    My group at Aalto University:
    Lassi, William, Paul, Yaowei, Prakhar, Ari, Otto,
    Santiago, Yuxin, Jerry, Christabella

  • References

    The examples in this tutorial were related to the following publications:

    - Solin, A., Särkkä, S., Kannala, J., and Rahtu, E. (2016). Terrain
      navigation in the magnetic landscape: Particle filtering for indoor
      positioning. European Navigation Conference (ENC).
    - Solin, A., Cortés, S., Rahtu, E., and Kannala, J. (2018). PIVO:
      Probabilistic inertial-visual odometry for occlusion-robust navigation.
      IEEE Winter Conference on Applications of Computer Vision (WACV).
    - Särkkä, S., Solin, A., and Hartikainen, J. (2013). Spatio-temporal
      learning via infinite-dimensional Bayesian filtering and smoothing.
      IEEE Signal Processing Magazine, 30(4):51–61.
    - Särkkä, S. and Solin, A. (2019). Applied Stochastic Differential
      Equations. Cambridge University Press, Cambridge, UK.
    - Solin, A. (2016). Stochastic Differential Equation Methods for
      Spatio-Temporal Gaussian Process Regression. Doctoral dissertation,
      Aalto University.
    - Solin, A., Hensman, J., and Turner, R. E. (2018). Infinite-horizon
      Gaussian processes. Advances in Neural Information Processing Systems
      (NeurIPS).
    - Durrande, N., Adam, V., Bordeaux, L., Eleftheriadis, S., and
      Hensman, J. (2019). Banded matrix operators for Gaussian Markov models
      in the automatic differentiation era. International Conference on
      Artificial Intelligence and Statistics (AISTATS). PMLR 89:2780–2789.

  • References (cont.)

    - Nickisch, H., Solin, A., and Grigorievskiy, A. (2018). State space
      Gaussian processes with non-Gaussian likelihood. International
      Conference on Machine Learning (ICML). PMLR 80:3789–3798.
    - Wilkinson, W. J., Chang, P., Riis Andersen, M., and Solin, A. (2020).
      State space expectation propagation: Efficient inference schemes for
      temporal Gaussian processes. International Conference on Machine
      Learning (ICML).
    - Hou, Y., Kannala, J., and Solin, A. (2019). Multi-view stereo by
      temporal nonparametric fusion. International Conference on Computer
      Vision (ICCV).
    - Hou, Y., Heljakka, A., and Solin, A. (submitted). Gaussian process
      priors for view-aware inference. arXiv:1912.03249.
    - Yildiz, C., Heinonen, M., and Lähdesmäki, H. (2019). ODE2VAE: Deep
      generative second order ODEs with Bayesian neural networks. Advances in
      Neural Information Processing Systems (NeurIPS).
    - Li, X., Wong, T.-K. L., Chen, R. T. Q., and Duvenaud, D. (2020).
      Scalable gradients for stochastic differential equations. International
      Conference on Artificial Intelligence and Statistics (AISTATS).
    - Ho, J., Jain, A., and Abbeel, P. (submitted). Denoising diffusion
      probabilistic models. arXiv:2006.11239.

  • Machine learning with signal processing

    Part I: Tools and discrete-time models
    Part II: SDEs (continuous-time models)
    Part III: Gaussian processes
    → Part IV: Application examples
