TRANSCRIPT
-
Machine Learning with Signal Processing
Part IV: Application Examples

Arno Solin
Assistant Professor in Machine Learning
Department of Computer Science, Aalto University

ICML 2020 Tutorial

Twitter: https://twitter.com/arnosolin
Web: http://arno.solin.fi
-
Machine learning with signal processing: Part IV · Arno Solin
Outline
Non-Gaussian likelihoods
Spatio-temporal GPs
Latent-space GPs
Latent differential equations
Langevin dynamics
Summary
-
Gaussian processes ♥ SDEs

GPs under the kernel formalism:
  f(t) ∼ GP(0, κ(t, t′))
  y | f ∼ ∏_i p(y_i | f(t_i))

Stochastic differential equations:
  df(t) = F f(t) dt + L dβ(t)
  y_i ∼ p(y_i | hᵀ f(t_i))

Flexible model specification ↔ Inference / First-principles
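The kernel-to-SDE correspondence above can be made concrete in a few lines: the LTI SDE admits an exact discrete-time state-space form. A minimal sketch for the Matérn-3/2 kernel, where the transition matrix is a matrix exponential and the process noise follows from stationarity (the parameter values and function names below are my own illustrative choices, not from the tutorial):

```python
import numpy as np
from scipy.linalg import expm

# Matérn-3/2 GP as an LTI SDE: df(t) = F f(t) dt + L dbeta(t),
# with state f(t) = [f, df/dt]. Illustrative hyperparameters:
ell, sigma2 = 1.0, 1.0            # lengthscale and marginal variance
lam = np.sqrt(3.0) / ell          # lambda = sqrt(3) / lengthscale

F = np.array([[0.0, 1.0],
              [-lam**2, -2.0 * lam]])
Pinf = np.diag([sigma2, lam**2 * sigma2])   # stationary state covariance

def discretize(dt):
    """Exact discretization: x_{k+1} = A x_k + q_k, q_k ~ N(0, Q).
    Q is obtained from stationarity, Q = Pinf - A Pinf A^T."""
    A = expm(F * dt)
    Q = Pinf - A @ Pinf @ A.T
    return A, Q
```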
-
Non-Gaussian likelihoods
• The observation model might not be Gaussian:
  f(t) ∼ GP(0, κ(t, t′))
  y | f ∼ ∏_i p(y_i | f(t_i))
• There exists a multitude of great methods to tackle general likelihoods with approximations of the form
  Q(f | D) = N(f | m + Kα, (K⁻¹ + W)⁻¹)
• Use those methods, but deal with the latent using state-space models
-
Inference
• Laplace approximation
• Variational Bayes
• Direct KL minimization
• EP or assumed density filtering (single-sweep EP)
• All can be evaluated in terms of a (Kalman) filter forward and backward pass, or by iterating these passes
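For a Gaussian likelihood, the forward pass referred to above is the standard Kalman filter; the non-Gaussian schemes iterate or moment-match around it. A minimal sketch of the filtering recursion, assuming scalar observations and column-vector states (the function name and array conventions are my own):

```python
import numpy as np

def kalman_filter(A, Q, H, R, ys, m0, P0):
    """Forward (filtering) pass for x_{k+1} = A x_k + q_k, y_k = H x_k + r_k.
    The accumulated log-likelihood is the (log) marginal likelihood used
    for hyperparameter learning."""
    m, P, ll = m0, P0, 0.0
    means = []
    for y in ys:
        # Predict step
        m = A @ m
        P = A @ P @ A.T + Q
        # Update step (scalar observation assumed)
        S = H @ P @ H.T + R                    # innovation covariance, (1, 1)
        K = P @ H.T @ np.linalg.inv(S)         # Kalman gain
        v = np.atleast_2d(y) - H @ m           # innovation
        ll += (-0.5 * (np.log(2.0 * np.pi * S)
                       + v.T @ np.linalg.inv(S) @ v)).item()
        m = m + K @ v
        P = P - K @ S @ K.T
        means.append((H @ m).item())
    return np.array(means), ll
```

The smoothing (backward) pass and the non-Gaussian site updates would wrap around this same recursion.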
-
Example
• Commercial aircraft accidents 1919–2017
• Log-Gaussian Cox process (Poisson likelihood) by ADF/EP
• Daily binning, n = 35,959
• GP prior with the covariance function
  κ(t, t′) = κ_Mat.(t, t′) + κ_Per.^year(t, t′) κ_Mat.(t, t′) + κ_Per.^week(t, t′) κ_Mat.(t, t′),
  where each κ_Mat. is a Matérn (ν = 3/2) kernel
• Learn hyperparameters by optimizing the marginal likelihood
Nickisch et al. (2018). ICML.
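The quasi-periodic covariance above is straightforward to compose directly. A sketch with placeholder lengthscales and variances (in the paper these hyperparameters are learned by optimizing the marginal likelihood, so the values below are purely illustrative):

```python
import numpy as np

def matern32(t1, t2, ell=1.0, sigma2=1.0):
    """Matérn kernel with smoothness nu = 3/2."""
    r = np.abs(t1 - t2) * np.sqrt(3.0) / ell
    return sigma2 * (1.0 + r) * np.exp(-r)

def periodic(t1, t2, period, ell=1.0):
    """Standard (exponentiated-sine-squared) periodic kernel."""
    return np.exp(-2.0 * np.sin(np.pi * np.abs(t1 - t2) / period) ** 2 / ell ** 2)

def k(t1, t2):
    # Slowly varying trend + yearly and weekly quasi-periodic components;
    # all lengthscales here are made-up placeholder values (time in days).
    return (matern32(t1, t2, ell=1000.0)
            + periodic(t1, t2, period=365.25) * matern32(t1, t2, ell=500.0)
            + periodic(t1, t2, period=7.0) * matern32(t1, t2, ell=500.0))
```

Sums and products of valid kernels are valid kernels, so the composite is positive semi-definite by construction.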
-
Spatio-temporal GPs
Spatial:
  f(r) ∼ GP(0, κ(r, r′))
  y | f ∼ ∏_i p(y_i | f(r_i))

Spatio-temporal:
  f(x, t) ∼ GP(0, κ(x, t; x′, t′))
  y | f ∼ ∏_i p(y_i | f(x_i, t_i))
-
Spatio-temporal Gaussian processes
GPs under the kernel formalism:
  f(x, t) ∼ GP(0, k(x, t; x′, t′))
  y_i = f(x_i, t_i) + ε_i

Stochastic partial differential equations:
  ∂f(x, t)/∂t = F f(x, t) + L w(x, t)
  y_i = H_i f(x, t_i) + ε_i

[Figure: surfaces of f(x, t) over location x and time t; left panel shows the covariance k(x, t; x′, t′), right panel highlights the state at time t]
-
Spatio-temporal GP regression
Särkkä et al. (2013). IEEE Signal Processing Magazine.
-
Spatio-temporal GP classification
Wilkinson et al. (2020). ICML.
-
Multi-view stereo as a temporal fusion problem

[Figure: pipeline diagram. Frames and camera poses feed a cost volume and an encoder–decoder network predicting disparity/depth; a latent GP with a pose-similarity kernel fuses the encoder codes z0, …, z4 across frames, with skip connections to the outputs y1, …, y4]

• Inputs: frame pairs and relative camera poses
• Encoder–decoder network for depth estimation
• Treat the encoder as a feature extractor, and do GP regression in the latent space
• The GP prior encodes the similarity of the camera views
Hou et al. (2019). ICCV.
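The latent-space fusion can be sketched as plain GP regression on the encoder codes, with a kernel over camera poses. Everything below (the squared-exponential kernel form, representing a pose by a 3-D camera position only, and all names) is a simplifying assumption for illustration, not the exact model of Hou et al.:

```python
import numpy as np

def pose_kernel(p1, p2, ell=1.0):
    """Hypothetical pose similarity: squared-exponential in camera position."""
    return np.exp(-np.sum((p1 - p2) ** 2) / (2.0 * ell ** 2))

def fuse_latents(poses, zs, noise=0.1):
    """GP regression on the encoder codes: returns the posterior mean of
    each latent dimension at the n camera poses (fused jointly, since the
    latent dimensions share the same pose kernel)."""
    n = len(poses)
    K = np.array([[pose_kernel(poses[i], poses[j]) for j in range(n)]
                  for i in range(n)])
    Z = np.stack(zs)                                  # (n, latent_dim)
    return K @ np.linalg.solve(K + noise * np.eye(n), Z)
```

Nearby views thus pull the per-frame latent codes towards each other, which is what makes the depth maps temporally consistent.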
-
Multi-view stereo as a temporal fusion problem
Hou et al. (2019). ICCV. Video: https://youtu.be/iellGrlNW7k
-
Sequential priors in GAN latent space
[Figure: frames generated by interpolating along a trajectory in the GAN latent space]
Hou et al. Submitted.
-
Latent differential equations
• Define an implicit prior over functions through dynamics
• Define observation likelihoods: anything differentiable w.r.t. the latent state (e.g., text models!)
• Train everything jointly with automatic differentiation + variational inference

Adapted from David Duvenaud.
-
Latent ODEs
• Learning low-rank latent representations of possibly high-dimensional sequential data trajectories
• Combination of variational auto-encoders (VAEs) for sequential data with a latent space governed by a continuous-time ODE
Yildiz et al. (2019). NeurIPS.
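The latent-ODE idea can be sketched with an off-the-shelf solver: a low-dimensional latent state evolves under a vector field (learned in ODE2VAE; a fixed linear oscillator stands in for it here), and a decoder maps the trajectory to data space. The toy dynamics, the linear decoder, and all names are illustrative assumptions, not the actual model:

```python
import numpy as np
from scipy.integrate import solve_ivp

def f_theta(t, z):
    # Stand-in for the learned neural vector field: a harmonic oscillator
    return np.array([z[1], -z[0]])

def decode(z, W):
    # Stand-in linear decoder from the latent space to data space
    return W @ z

z0 = np.array([1.0, 0.0])                 # latent initial state (from encoder)
ts = np.linspace(0.0, 2.0 * np.pi, 50)    # arbitrary observation times
sol = solve_ivp(f_theta, (ts[0], ts[-1]), z0, t_eval=ts, rtol=1e-8, atol=1e-10)
W = np.ones((50, 2))                      # hypothetical 50-D observation space
traj = np.stack([decode(z, W) for z in sol.y.T])
```

The key point is that the solver evaluates the latent state at arbitrary time points, so irregularly sampled trajectories pose no problem.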
-
In SDEs: Need to store noise
• Infinite reparameterization trick: use the same Brownian motion sample on the forward and reverse passes
Li et al. (2020). AISTATS.
-
Latent SDE models
Ornstein–Uhlenbeck prior, Laplace likelihood

MOCAP example: 50-D data, 6-D latent space, 11,000 parameters
Li et al. (2020). AISTATS.
-
Langevin dynamics
• Molecular simulation
• Sampling and parameter inference (e.g., the Metropolis-adjusted Langevin algorithm)
• Diffusion models (even for images!)

Langevin's model for Brownian motion

Denoising diffusion probabilistic model with Langevin dynamics
Ho et al. Submitted.
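The sampling use mentioned above can be sketched with unadjusted Langevin dynamics (no Metropolis correction), here targeting a standard normal; the step size and iteration count are arbitrary choices for illustration:

```python
import numpy as np

def langevin_sample(grad_log_p, x0, step=1e-2, n_steps=20000, seed=0):
    """Unadjusted Langevin dynamics:
    x <- x + step * grad log p(x) + sqrt(2 * step) * N(0, I)."""
    rng = np.random.default_rng(seed)
    x = np.asarray(x0, dtype=float).copy()
    out = np.empty((n_steps,) + x.shape)
    for i in range(n_steps):
        x = (x + step * grad_log_p(x)
             + np.sqrt(2.0 * step) * rng.standard_normal(x.shape))
        out[i] = x
    return out

# Target: standard normal, so grad log p(x) = -x
samples = langevin_sample(lambda x: -x, x0=[3.0])
```

The Metropolis-adjusted variant would accept or reject each proposal to remove the O(step) discretization bias; diffusion models run a related (annealed) version of this update.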
-
Summary
Part I: Tools and discrete-time models
Part II: SDEs (continuous-time models)
Part III: Gaussian processes
Part IV: Application examples (this part)
-
Acknowledgements
Simo Särkkä (Aalto)
Manon Kok (TU Delft)
Thomas Schön (Uppsala)
Richard Turner (Cambridge)
James Hensman (Amazon)
Nicolas Durrande (prowler.io)
Michael Riis Andersen (DTU)
Emtiyaz Khan (RIKEN)

The following researchers kindly provided material/examples: David Duvenaud, Markus Heinonen, Cagatay Yildiz, Jonathan Ho
-
Acknowledgements
Lassi William Paul
Yaowei Prakhar Ari Otto
Santiago Yuxin Jerry Christabella
My group at Aalto University
-
References
The examples in this tutorial were related to the following publications:

• Solin, A., Särkkä, S., Kannala, J., and Rahtu, E. (2016). Terrain navigation in the magnetic landscape: Particle filtering for indoor positioning. European Navigation Conference (ENC).
• Solin, A., Cortés, S., Rahtu, E., and Kannala, J. (2018). PIVO: Probabilistic inertial-visual odometry for occlusion-robust navigation. IEEE Winter Conference on Applications of Computer Vision (WACV).
• Särkkä, S., Solin, A., and Hartikainen, J. (2013). Spatio-temporal learning via infinite-dimensional Bayesian filtering and smoothing. IEEE Signal Processing Magazine, 30(4):51–61.
• Särkkä, S. and Solin, A. (2019). Applied Stochastic Differential Equations. Cambridge University Press, Cambridge, UK.
• Solin, A. (2016). Stochastic Differential Equation Methods for Spatio-Temporal Gaussian Process Regression. Doctoral dissertation, Aalto University.
• Solin, A., Hensman, J., and Turner, R. E. (2018). Infinite-horizon Gaussian processes. Advances in Neural Information Processing Systems (NeurIPS).
• Durrande, N., Adam, V., Bordeaux, L., Eleftheriadis, E., and Hensman, J. (2019). Banded matrix operators for Gaussian Markov models in the automatic differentiation era. International Conference on Artificial Intelligence and Statistics (AISTATS). PMLR 89:2780–2789.
-
References (cont.)

• Nickisch, H., Solin, A., and Grigorievskiy, A. (2018). State space Gaussian processes with non-Gaussian likelihood. International Conference on Machine Learning (ICML). PMLR 80:3789–3798.
• Wilkinson, W. J., Chang, P., Riis Andersen, M., and Solin, A. (2020). State space expectation propagation: Efficient inference schemes for temporal Gaussian processes. International Conference on Machine Learning (ICML).
• Hou, Y., Kannala, J., and Solin, A. (2019). Multi-view stereo by temporal nonparametric fusion. International Conference on Computer Vision (ICCV).
• Hou, Y., Heljakka, A., and Solin, A. (submitted). Gaussian process priors for view-aware inference. arXiv:1912.03249.
• Yildiz, C., Heinonen, M., and Lähdesmäki, H. (2019). ODE2VAE: Deep generative second order ODEs with Bayesian neural networks. Advances in Neural Information Processing Systems (NeurIPS).
• Li, X., Wong, T.-K. L., Chen, R. T. Q., and Duvenaud, D. (2020). Scalable gradients for stochastic differential equations. International Conference on Artificial Intelligence and Statistics (AISTATS).
• Ho, J., Jain, A., and Abbeel, P. (submitted). Denoising diffusion probabilistic models. arXiv:2006.11239.
-
Machine learning with signal processing
Part I: Tools and discrete-time models
Part II: SDEs (continuous-time models)
Part III: Gaussian processes
Part IV: Application examples