
Page 1: Deep Generative Models I

Deep Generative Models I: Variational Autoencoders – Part II

Machine Perception

Otmar Hilliges

23 April 2020

Page 2: Deep Generative Models I

Last week(s)

Intro to generative modelling

VAE

Derivation of the ELBO


Page 3: Deep Generative Models I

This week

More generative models

Step I: Variational autoencoders:
• Training
• Representation learning & disentanglement
• Applications

Step II: Generative adversarial networks

Page 4: Deep Generative Models I

Generative Modelling

Given training data, generate new samples drawn from the "same" distribution.

We want $p_{\text{model}}(x)$ to be similar to $p_{\text{data}}(x)$.

[Figure: training samples $\sim p_{\text{data}}(x)$ vs. generated samples $\sim p_{\text{model}}(x)$]

Page 5: Deep Generative Models I

Variational autoencoder

[Figure: the inference model maps $x$ to $z$; the generative model maps $z$ back to $x$]

Page 6: Deep Generative Models I

Recap: ELBO

$$\mathcal{L}(x^{(i)}, \Theta, \phi) = \mathbb{E}_z\big[\log p_\Theta(x^{(i)} \mid z)\big] - D_{KL}\big(q_\phi(z \mid x^{(i)}) \,\|\, p_\Theta(z)\big)$$

[Figure: VAE computation graph]

Encoder network $q_\phi(z \mid x)$: maps $x$ to $\mu_{z|x}, \Sigma_{z|x}$; sample $z \mid x \sim \mathcal{N}(\mu_{z|x}, \Sigma_{z|x})$. The KL term makes the approximate posterior similar to the prior.

Decoder network $p_\Theta(x \mid z)$: maps $z$ to $\mu_{x|z}, \Sigma_{x|z}$; sample $\hat{x} \mid z \sim \mathcal{N}(\mu_{x|z}, \Sigma_{x|z})$. The reconstruction term maximizes the reconstruction likelihood.
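To make the pipeline concrete, a minimal PyTorch sketch of this architecture follows. Everything specific in it is an illustrative assumption rather than the lecture's setup: the 784-dimensional input, the layer sizes, and a Bernoulli (logit) pixel likelihood in place of the Gaussian decoder shown above.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class VAE(nn.Module):
    """Illustrative VAE: diagonal-Gaussian encoder, Bernoulli decoder."""
    def __init__(self, x_dim=784, h_dim=400, z_dim=20):
        super().__init__()
        # Encoder q_phi(z|x): produces mean and log-variance of the posterior
        self.enc = nn.Linear(x_dim, h_dim)
        self.enc_mu = nn.Linear(h_dim, z_dim)
        self.enc_logvar = nn.Linear(h_dim, z_dim)
        # Decoder p_theta(x|z): produces pixel logits
        self.dec = nn.Linear(z_dim, h_dim)
        self.dec_out = nn.Linear(h_dim, x_dim)

    def encode(self, x):
        h = F.relu(self.enc(x))
        return self.enc_mu(h), self.enc_logvar(h)

    def decode(self, z):
        return self.dec_out(F.relu(self.dec(z)))

    def forward(self, x):
        mu, logvar = self.encode(x)
        # Reparameterized sample: z = mu + sigma * eps, eps ~ N(0, I)
        z = mu + torch.exp(0.5 * logvar) * torch.randn_like(mu)
        return self.decode(z), mu, logvar

def neg_elbo(x, logits, mu, logvar):
    # Reconstruction term: -E_q[log p(x|z)] for a Bernoulli likelihood
    rec = F.binary_cross_entropy_with_logits(logits, x, reduction='sum')
    # KL(q(z|x) || N(0, I)) in closed form for a diagonal Gaussian
    kl = -0.5 * torch.sum(1 + logvar - mu.pow(2) - logvar.exp())
    return rec + kl
```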

Page 7: Deep Generative Models I

Training VAEs

Training of VAEs is just backprop.

But wait, how do gradients flow through 𝑧?

Or any other random operation (requiring sampling)?

(see blackboard)
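The blackboard derivation is not reproduced in the transcript, but the standard answer is the reparameterization trick: move the randomness into a parameter-free noise variable $\epsilon \sim \mathcal{N}(0, I)$, so the sample becomes a deterministic, differentiable function of the distribution parameters. A minimal sketch, with illustrative shapes:

```python
import torch

mu = torch.zeros(8, 20, requires_grad=True)      # posterior mean (illustrative shape)
logvar = torch.zeros(8, 20, requires_grad=True)  # posterior log-variance

# Naive sampling z ~ N(mu, sigma^2) is a random op: no gradient path to mu/logvar.
# Reparameterized: the randomness lives in a parameter-free eps ~ N(0, I),
# so z is a deterministic, differentiable function of mu and logvar.
eps = torch.randn_like(mu)
z = mu + torch.exp(0.5 * logvar) * eps

z.sum().backward()
print(mu.grad is not None, logvar.grad is not None)  # True True
```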


Page 8: Deep Generative Models I


Page 9: Deep Generative Models I
Page 10: Deep Generative Models I

Generating Data

[Figure: decoder-only sampling pipeline]

Sample $z \sim \mathcal{N}(0, I)$ from the prior; the decoder network $p_\Theta(x \mid z)$ maps $z$ to $\mu_{x|z}, \Sigma_{x|z}$, and $\hat{x} \mid z \sim \mathcal{N}(\mu_{x|z}, \Sigma_{x|z})$.

At runtime, only the decoder network is used.

[Kingma and Welling, "Auto-Encoding Variational Bayes", ICLR 2014]
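A short sketch of this test-time procedure, reusing the illustrative VAE class from the training sketch above (the sigmoid converts decoder logits into Bernoulli means used as pixel values):

```python
import torch

model = VAE()  # assumes the illustrative VAE class defined earlier (z_dim=20)
model.eval()
with torch.no_grad():
    z = torch.randn(16, 20)                 # z ~ N(0, I): sample from the prior
    x_hat = torch.sigmoid(model.decode(z))  # decoder only; no encoder at runtime
```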

Page 11: Deep Generative Models I

Cross-modal Deep Variational Hand Pose Estimation

Adrian Spurr, Jie Song, Seonwook Park, Otmar Hilliges

CVPR 2018

Page 12: Deep Generative Models I

RGB hand pose estimation


Page 13: Deep Generative Models I

Assumption: Manifold of valid hand poses

Adapted from [Tagliasacchi et al. 2015]

Page 14: Deep Generative Models I

Cross-modal latent space – Inference model

Page 15: Deep Generative Models I

Cross-modal latent space – Generative model

Page 16: Deep Generative Models I


[Spurr et al.: CVPR '18]

Page 17: Deep Generative Models I

Cross-modal training objective

The original VAE considers only one modality → it must be modified to handle multiple modalities.

Consider $x_i$ and $x_t$, samples from different modalities (e.g. RGB and 3D joint positions).

Important: both inherently describe the same thing, in our case the hand pose.

Task: maximize $\log p_\theta(x_t)$ given $x_i$.


[Spurr et al.: CVPR '18]

Page 18: Deep Generative Models I

Cross-modal training objective derivation

log 𝑝 π‘₯𝑑 = 𝑧 π‘ž 𝑧 π‘₯𝑖 log 𝑝 π‘₯𝑑 𝑑𝑧 = 𝑧 π‘ž 𝑧 π‘₯𝑖 log𝑝 π‘₯𝑑 𝑝 𝑧 π‘₯𝑑 π‘ž(𝑧|π‘₯𝑖)

𝑝 𝑧 π‘₯𝑑 π‘ž(𝑧|π‘₯𝑖)𝑑𝑧 (Bayes’ Rule)

= 𝑧 π‘ž 𝑧 π‘₯𝑖 logπ‘ž 𝑧 π‘₯𝑖𝑝 𝑧 π‘₯𝑑

𝑑𝑧 + 𝑧 π‘ž 𝑧 π‘₯𝑖 log𝑝 π‘₯𝑑 𝑝(𝑧|π‘₯𝑑)

π‘ž(𝑧|π‘₯𝑖)𝑑𝑧 (Logarithms)

= 𝐷𝐾𝐿(π‘ž(𝑧|π‘₯𝑖)||𝑝 𝑧 π‘₯𝑑 ) + 𝑧 π‘ž 𝑧 π‘₯𝑖 log𝑝 π‘₯𝑑 𝑧 𝑝 𝑧

π‘ž 𝑧 π‘₯𝑖𝑑𝑧 (Definition KL)

β‰₯ 𝑧 π‘ž 𝑧 π‘₯𝑖 log 𝑝 π‘₯𝑑 𝑧 𝑑𝑧 βˆ’ 𝑧 π‘ž 𝑧 π‘₯𝑖 logπ‘ž 𝑧 π‘₯𝑖𝑝 𝑧

𝑑𝑧 ( logπ‘₯

𝑦=βˆ’ log

𝑦

π‘₯)

= 𝔼𝑧~π‘ž 𝑧 π‘₯𝑖 [log 𝑝 π‘₯𝑑 𝑧)] βˆ’ 𝐷𝐾𝐿(π‘ž(𝑧|π‘₯𝑖)||𝑝 𝑧 ) (𝔼 over z, KL)


[Spurr et al.: CVPR '18]
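As a sketch, the bound above can be turned directly into a training loss: encode one modality, decode the other, and regularize toward the prior. The `encoder_i` and `decoder_t` modules here are hypothetical stand-ins for the per-modality networks, and the Gaussian likelihood on $x_t$ (which reduces the reconstruction term to a squared error) is an assumption:

```python
import torch
import torch.nn.functional as F

def cross_modal_neg_elbo(x_i, x_t, encoder_i, decoder_t):
    """Negative cross-modal ELBO: encode one modality (e.g. RGB),
    reconstruct another (e.g. 3D joints). encoder_i returns (mu, logvar)."""
    mu, logvar = encoder_i(x_i)
    z = mu + torch.exp(0.5 * logvar) * torch.randn_like(mu)  # reparameterize
    x_t_hat = decoder_t(z)
    # E_{z~q(z|x_i)}[log p(x_t|z)] with a Gaussian likelihood -> squared error
    rec = F.mse_loss(x_t_hat, x_t, reduction='sum')
    # D_KL(q(z|x_i) || p(z)) with p(z) = N(0, I)
    kl = -0.5 * torch.sum(1 + logvar - mu.pow(2) - logvar.exp())
    return rec + kl
```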

Page 19: Deep Generative Models I

[Figure: t-SNE embedding comparisons; latent space trained individually vs. trained cross-modally]

[Spurr et al.: CVPR '18]

Page 20: Deep Generative Models I

[Figure: posterior estimates, RGB (real) to 3D joints; columns show RGB input, ground truth, and prediction]

[Spurr et al.: CVPR '18]

Page 21: Deep Generative Models I

Synthetic hands: Latent Space Walk

[Synthetic samples RGB] [Synthetic samples depth]

[Spurr et al.: CVPR '18]

Page 22: Deep Generative Models I

Learning useful representations


Goal: learn features that capture similarities and dissimilarities.

Requirement: an objective that defines a notion of utility (task-dependent).

Page 23: Deep Generative Models I

Autoencoders

[Figure: autoencoder, $X \xrightarrow{f} Z \xrightarrow{g} \hat{X}$]

Notion of utility: the ability to reconstruct pixels from the code (the features are a compressed representation of the original data).
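A minimal sketch of this setup, with illustrative sizes; training would minimize the reconstruction loss between $X$ and $g(f(X))$:

```python
import torch.nn as nn

# Minimal autoencoder sketch (sizes illustrative): f compresses X to a code Z,
# g reconstructs X-hat from Z.
f = nn.Sequential(nn.Linear(784, 128), nn.ReLU(), nn.Linear(128, 32))   # encoder
g = nn.Sequential(nn.Linear(32, 128), nn.ReLU(), nn.Linear(128, 784))   # decoder
loss_fn = nn.MSELoss()  # utility = ability to reconstruct pixels from the code
```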

Page 24: Deep Generative Models I

Autoencoders

[Figure: encoder $X \xrightarrow{f} Z$, the learned representation]

Entangled representation: individual dimensions of the code encode some unknown combination of features in the data.

Page 25: Deep Generative Models I

Goal: Disentangled Representations

[Figure: input → encoder → factorized code separating digit and style]

Goal: learn features that correspond to distinct factors of variation.

One notion of utility: statistical independence.

Page 26: Deep Generative Models I

Regular vs Variational Autoencoders

[Figure: AE (z-dim 50, t-SNE) vs. VAE (z-dim 50, t-SNE)]

Issue: the KL divergence regularizes the values of $z$, but the representations are still entangled.

Page 27: Deep Generative Models I

Unsupervised vs Semi-supervised

[image source: Babak Esmaeili]

[Kingma et al., "Semi-Supervised Learning with Deep Generative Models", NIPS 2014]

[Figure: semi-supervised model with a factorized code separating digit and style]

Separate the label $y$ from the "nuisance" variable $z$.
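A hedged sketch of the core idea, in the spirit of the M2 model of Kingma et al.: condition the decoder on the label $y$ explicitly, so the latent $z$ only needs to carry the remaining (style) variation. The class and sizes here are hypothetical:

```python
import torch
import torch.nn as nn

class ConditionalDecoder(nn.Module):
    """Decoder p(x | y, z): the digit label y is supplied explicitly,
    so z is free to capture only the 'nuisance' (style) variation."""
    def __init__(self, z_dim=20, n_classes=10, x_dim=784):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(z_dim + n_classes, 400),
                                 nn.ReLU(), nn.Linear(400, x_dim))

    def forward(self, y_onehot, z):
        # Concatenate label and latent code before decoding
        return self.net(torch.cat([y_onehot, z], dim=-1))
```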

Page 28: Deep Generative Models I

Learning factorized representations (unsupervised)


[image source: Emile Mathieu]

Page 29: Deep Generative Models I

𝛽-VAE

Goal: Learn disentangled representation without supervision

Idea: Provide framework for automated discovery of interpretable factorised latent representations

Approach: a modification of the VAE that introduces an adjustable hyperparameter 𝛽, balancing latent channel capacity and independence constraints against reconstruction accuracy.

[Higgins et al., "Learning Basic Visual Concepts with a Constrained Variational Framework", ICLR 2017]

Page 30: Deep Generative Models I

𝛽-VAE [Higgins et al., "Learning Basic Visual Concepts with a Constrained Variational Framework", ICLR 2017]

Page 31: Deep Generative Models I

𝛽-VAE

[Higgins et al., "Learning Basic Visual Concepts with a Constrained Variational Framework", ICLR 2017]

Assume that samples $x$ are generated by controlling the generative factors $v$ (conditionally independent) and $w$ (conditionally dependent).

Page 32: Deep Generative Models I

𝛽-VAE – Training Objective [Higgins et al., "Learning Basic Visual Concepts with a Constrained Variational Framework", ICLR 2017]

$$\max_{\phi, \theta} \; \mathbb{E}_{x \sim \mathcal{D}}\Big[\mathbb{E}_{z \sim q_\phi(z \mid x)}\big[\log p_\theta(x \mid z)\big]\Big] \quad \text{subject to} \quad D_{KL}\big(q_\phi(z \mid x) \,\|\, p(z)\big) < \delta$$

Page 33: Deep Generative Models I

𝛽-VAE – Training Objective [Higgins et al., "Learning Basic Visual Concepts with a Constrained Variational Framework", ICLR 2017]

Rewriting the constrained problem as a Lagrangian under the KKT conditions yields the 𝛽-VAE objective, in which the multiplier 𝛽 weights the KL term:

$$\mathcal{L}(\theta, \phi; x, \beta) = \mathbb{E}_{z \sim q_\phi(z \mid x)}\big[\log p_\theta(x \mid z)\big] - \beta\, D_{KL}\big(q_\phi(z \mid x) \,\|\, p(z)\big)$$

Setting 𝛽 = 1 recovers the vanilla VAE; 𝛽 > 1 trades reconstruction accuracy for stronger disentanglement pressure.
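As a sketch, the only change relative to a vanilla VAE loss is the weight on the KL term; the Bernoulli likelihood and the value 𝛽 = 4 are illustrative assumptions:

```python
import torch
import torch.nn.functional as F

def beta_vae_loss(x, logits, mu, logvar, beta=4.0):
    """Negative beta-VAE objective: identical to the VAE loss except that
    the KL term is weighted by beta (beta=1 recovers the vanilla VAE;
    beta > 1 pushes the posterior toward the factorized prior)."""
    rec = F.binary_cross_entropy_with_logits(logits, x, reduction='sum')
    kl = -0.5 * torch.sum(1 + logvar - mu.pow(2) - logvar.exp())
    return rec + beta * kl
```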

Page 34: Deep Generative Models I

Entangled versus Disentangled Representations

[Figure: 𝛽-VAE vs. vanilla VAE]

Page 35: Deep Generative Models I

Disentangling Latent Hands for Image Synthesis and Pose Estimation

Linlin Yang, Angela Yao

CVPR 2019

Page 36: Deep Generative Models I

Disentangling latent factors

[Yang et al.: CVPR '19]

Page 37: Deep Generative Models I

Disentangling latent factors

[Yang et al.: CVPR '19]

𝐿 πœ™π‘₯, πœ™π‘¦1 , πœ™π‘¦2 , πœƒπ‘₯, πœƒπ‘¦1 , πœƒπ‘¦2 = 𝐸𝐿𝐡𝑂𝑑𝑖𝑠 π‘₯, 𝑦1, 𝑦2, πœ™π‘¦1 , πœ™π‘¦2 , πœƒπ‘₯, πœƒπ‘¦1 , πœƒπ‘¦2+ πΈπΏπ΅π‘‚π‘’π‘šπ‘(π‘₯, 𝑦1, 𝑦2, πœ™π‘₯).

where

log 𝑝 π‘₯, 𝑦1, 𝑦2 β‰₯ 𝐸𝐿𝐡𝑂𝑑𝑖𝑠 π‘₯, 𝑦1, 𝑦2, πœ™π‘¦1 , πœ™π‘¦2 , πœƒπ‘₯, πœƒπ‘¦1 , πœƒπ‘¦2= πœ†π‘₯𝔼𝑧~π‘žπœ™π‘¦1,πœ™π‘¦2

log π‘πœƒπ‘₯ π‘₯ 𝑧

+ πœ†π‘¦1𝔼𝑧𝑦1~π‘žπœ™π‘¦1log π‘πœƒπ‘¦1 𝑦1 𝑧𝑦1

+ πœ†π‘¦2𝔼𝑧𝑦2~π‘žπœ™π‘¦2log π‘πœƒπ‘¦2 𝑦2 𝑧𝑦2

βˆ’ 𝛽𝐷𝐾𝐿 π‘žπœ™π‘¦1 ,πœ™π‘¦2𝑧 𝑦1, 𝑦2 ||𝑝 𝑧

and

πΈπΏπ΅π‘‚π‘’π‘šπ‘ π‘₯, 𝑦1, 𝑦2, πœ™π‘₯ = πœ†β€²π‘₯𝔼𝑧~π‘žπœ™π‘₯ log π‘πœƒπ‘₯ π‘₯ 𝑧

+ πœ†β€²π‘¦1𝔼𝑧𝑦1~π‘žπœ™π‘₯ log π‘πœƒπ‘¦1 𝑦1 𝑧𝑦1

+ πœ†β€²π‘¦2𝔼𝑧𝑦2~π‘žπœ™π‘₯ log π‘πœƒπ‘¦2 𝑦2 𝑧𝑦2

βˆ’ 𝛽′𝐷𝐾𝐿 π‘žπœ™π‘₯(𝑧|π‘₯)||𝑝 𝑧
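Both bounds share one shape: a λ-weighted sum of reconstruction log-likelihoods minus a 𝛽-weighted KL. A small hypothetical helper capturing that shape (the individual terms would come from the modality-specific encoders and decoders):

```python
def weighted_elbo(log_liks, lambdas, kl, beta):
    """Lambda-weighted sum of reconstruction log-likelihoods minus beta * KL:
    the common form of both ELBO_dis and ELBO_emb above."""
    return sum(lam * ll for lam, ll in zip(lambdas, log_liks)) - beta * kl

# Total objective (to maximize), assuming the individual terms are computed:
# elbo = weighted_elbo([ll_x, ll_y1, ll_y2], (lx, ly1, ly2), kl_dis, beta) \
#      + weighted_elbo([ll_x_e, ll_y1_e, ll_y2_e], (lx_e, ly1_e, ly2_e), kl_emb, beta_e)
```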

Page 38: Deep Generative Models I

Latent space walk with disentangled $z$


Page 39: Deep Generative Models I

Latent space walk with disentangled $z$


Page 40: Deep Generative Models I

Limitations of VAEs


Tendency to generate blurry images.

Believed to be due to the injected noise and weak inference models (the Gaussian assumption on the latent samples is too simplistic, and the model capacity is too weak).

More expressive models → substantially better results (e.g. Kingma et al. 2016, "Improved Variational Inference with Inverse Autoregressive Flow").

Page 41: Deep Generative Models I

Next

Deep Generative Modelling II:

Generative Adversarial Networks (GANs)
