

Supplemental Material: Overdispersed Variational Autoencoders

Harshil Shah and David Barber
Department of Computer Science

University College London

Neural network architectures

Below, we describe the architectures of all of the neural networks used for the parameters of the generative and variational distributions. Note that all hidden units use the tanh nonlinearity.

                    Hidden layers    Output nonlinearity
MNIST   π_θ(h)      2 × 200 units    sigmoid
        η_φ(x)      2 × 200 units    linear
        Ω_φ(x)      2 × 200 units    exp
FF      μ_θ(h)      2 × 100 units    sigmoid
        Σ_θ(h)      2 × 100 units    exp
        η_φ(x)      2 × 100 units    linear
        Ω_φ(x)      2 × 100 units    exp

(FF denotes the Frey Faces dataset.)
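For concreteness, the following is a minimal PyTorch sketch of these networks: a two-hidden-layer tanh MLP with a pluggable output nonlinearity. Only the hidden-layer sizes and output nonlinearities come from the table above; the input and latent dimensionalities (x_dim, h_dim) and all names are illustrative assumptions.

```python
import torch
import torch.nn as nn

class TwoLayerMLP(nn.Module):
    """Two tanh hidden layers followed by a configurable output nonlinearity."""

    def __init__(self, in_dim, hidden_dim, out_dim, out_fn):
        super().__init__()
        self.layers = nn.Sequential(
            nn.Linear(in_dim, hidden_dim), nn.Tanh(),
            nn.Linear(hidden_dim, hidden_dim), nn.Tanh(),
            nn.Linear(hidden_dim, out_dim),
        )
        self.out_fn = out_fn

    def forward(self, x):
        return self.out_fn(self.layers(x))

# Hypothetical dimensionalities; the table specifies only the hidden sizes.
x_dim, h_dim = 784, 50

# MNIST networks (2 x 200 hidden units each).
pi_theta  = TwoLayerMLP(h_dim, 200, x_dim, torch.sigmoid)  # Bernoulli means
eta_phi   = TwoLayerMLP(x_dim, 200, h_dim, lambda a: a)    # linear output
omega_phi = TwoLayerMLP(x_dim, 200, h_dim, torch.exp)      # positive via exp
```

The exp output keeps the dispersion parameters Ω strictly positive, while the linear output leaves the parameters η unconstrained.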

Variance of gradient estimates

Below, we show the sample variances of the gradient estimates during the first 1,000 iterations of training, for both datasets.
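The text does not state exactly how these variances were computed. One plausible recipe, sketched below under our own assumptions, is to redraw the stochastic gradient estimate several times at a fixed parameter setting and summarise the per-coordinate sample variance by its mean over coordinates; the function name and the summary statistic are ours.

```python
import torch

def gradient_sample_variance(stochastic_loss, params, n_estimates=100):
    """Sample variance of a stochastic gradient estimator.

    `stochastic_loss` must redraw its noise (e.g. the reparameterisation
    samples) on every call, so that repeated calls yield independent
    gradient estimates at the same parameter values.
    """
    grads = []
    for _ in range(n_estimates):
        loss = stochastic_loss()
        g = torch.autograd.grad(loss, params)
        grads.append(torch.cat([gi.reshape(-1) for gi in g]))
    grads = torch.stack(grads)                     # (n_estimates, n_params)
    return grads.var(dim=0, unbiased=True).mean()  # scalar summary
```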

MNIST

VAE vs. OVAE


IWAE vs. OIWAE

For the MNIST dataset, the OVAE gradient estimates have noticeably lower variance than the VAE estimates. Comparing the OIWAE against the IWAE, the difference is less clear, except in the first 300 training iterations.

Frey Faces

VAE vs. OVAE


IWAE vs. OIWAE

As with MNIST, for the Frey Faces dataset the OVAE gradient estimates have noticeably lower variance than the VAE estimates. However, this is not the case for the OIWAE compared against the IWAE: the two have very similar variances.

Generated output

Below, we show sample outputs generated from the prior and posterior of the learned models, for both datasets.
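For reference, prior samples decode latents drawn from p(h), while posterior samples first encode a data point and then decode a latent drawn from the approximate posterior. The sketch below assumes a standard normal prior and, purely for illustration, a Gaussian reparameterisation of q(h | x); the overdispersed variational family used in the paper is parameterised differently.

```python
import torch

@torch.no_grad()
def sample_from_prior(decoder, h_dim, n=16):
    # Draw latents from the standard normal prior and decode them.
    h = torch.randn(n, h_dim)
    return decoder(h)  # e.g. Bernoulli means for MNIST pixels

@torch.no_grad()
def sample_from_posterior(enc_mean, enc_var, decoder, x):
    # Illustrative Gaussian q(h | x); the paper's overdispersed
    # variational family differs in its details.
    mu, var = enc_mean(x), enc_var(x)
    h = mu + var.sqrt() * torch.randn_like(mu)
    return decoder(h)
```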

MNIST

OVAE: Prior

OVAE: Posterior

OIWAE: Prior

OIWAE: Posterior


Frey Faces

OVAE: Prior

OVAE: Posterior

OIWAE: Prior

OIWAE: Posterior