Supplemental Material: Overdispersed Variational Autoencoders
Harshil Shah and David Barber
Department of Computer Science
University College London
Neural network architectures

Below, we describe the architectures of all of the neural networks used for the parameters of the generative and variational distributions. Note that all hidden units used the tanh nonlinearity.
                        Hidden layers    Output nonlinearity
MNIST
  π_θ(h)                2 × 200 units    sigmoid
  η_φ^MNIST(x)          2 × 200 units    linear
  Ω_φ^MNIST(x)          2 × 200 units    exp
Frey Faces
  μ_θ(h)                2 × 100 units    sigmoid
  Σ_θ(h)                2 × 100 units    exp
  η_φ^FF(x)             2 × 100 units    linear
  Ω_φ^FF(x)             2 × 100 units    exp
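For concreteness, the following is a minimal PyTorch sketch of the MNIST networks in the table above. The choice of framework, the class names, and the 50-dimensional latent space are our assumptions; the table fixes only the hidden-layer sizes, the tanh hidden units, and the output nonlinearities.

```python
import torch
import torch.nn as nn

class MnistEncoder(nn.Module):
    """Maps x to natural parameters eta_phi(x) (linear output) and
    dispersions Omega_phi(x) (exp output, hence strictly positive)."""
    def __init__(self, x_dim=784, h_dim=50, hidden=200):  # h_dim is an assumption
        super().__init__()
        self.body = nn.Sequential(
            nn.Linear(x_dim, hidden), nn.Tanh(),   # 2 x 200 tanh hidden units
            nn.Linear(hidden, hidden), nn.Tanh(),
        )
        self.eta = nn.Linear(hidden, h_dim)
        self.log_omega = nn.Linear(hidden, h_dim)

    def forward(self, x):
        z = self.body(x)
        return self.eta(z), torch.exp(self.log_omega(z))

class MnistDecoder(nn.Module):
    """Maps a latent h to Bernoulli means pi_theta(h) via a sigmoid output."""
    def __init__(self, x_dim=784, h_dim=50, hidden=200):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(h_dim, hidden), nn.Tanh(),
            nn.Linear(hidden, hidden), nn.Tanh(),
            nn.Linear(hidden, x_dim), nn.Sigmoid(),
        )

    def forward(self, h):
        return self.net(h)
```

The Frey Faces networks follow the same pattern with 2 × 100 hidden units, with the decoder producing μ_θ(h) (sigmoid) and Σ_θ(h) (exp) instead of Bernoulli means.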
Variance of gradient estimates

Below, we show the sample variances of the gradient estimates during the first 1,000 iterations of training, for both datasets.
MNIST

[Figure: sample variance of gradient estimates, VAE vs. OVAE]
[Figure: sample variance of gradient estimates, IWAE vs. OIWAE]
For the MNIST dataset, the OVAE gradient estimates have noticeably lower variance than the VAE gradient estimates. Comparing the OIWAE against the IWAE, the difference is less clear, except in the first 300 training iterations.
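The paper does not document the exact measurement procedure; one plausible sketch is to hold the parameters fixed, draw K independent stochastic gradient estimates, and report the empirical per-parameter variance. The function `elbo_estimate` below is hypothetical, standing in for any single-sample Monte Carlo bound (VAE, IWAE, or their overdispersed variants).

```python
import torch

def gradient_variance(model, x, elbo_estimate, K=20):
    """Empirical per-parameter variance of K stochastic gradient estimates."""
    grads = []
    for _ in range(K):
        model.zero_grad()
        loss = -elbo_estimate(model, x)   # fresh Monte Carlo noise on each call
        loss.backward()
        grads.append(torch.cat([p.grad.flatten()
                                for p in model.parameters()
                                if p.grad is not None]))
    g = torch.stack(grads)                # shape: (K, num_params)
    return g.var(dim=0).mean()            # mean sample variance over parameters
```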
Frey Faces

[Figure: sample variance of gradient estimates, VAE vs. OVAE]
[Figure: sample variance of gradient estimates, IWAE vs. OIWAE]
As with MNIST, for the Frey Faces dataset the OVAE gradient estimates have noticeably lower variance than the VAE gradient estimates. However, this is not the case for the OIWAE when compared against the IWAE: the two have very similar variances.
Generated output
Below we show sample outputs generated from the prior and posterior of the learned models, for both datasets.
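The two sampling modes could be sketched as follows, reusing the networks from the first section. This is only an illustration: we treat η_φ(x) as the mean and Ω_φ(x) as the variance of a Gaussian q(h | x) purely for simplicity, which is an assumption about the parameterisation rather than the paper's exact construction.

```python
import torch

@torch.no_grad()
def sample_from_prior(decoder, n=16, h_dim=50):
    h = torch.randn(n, h_dim)        # h ~ N(0, I); h_dim is an assumption
    return decoder(h)                # e.g. Bernoulli means pi_theta(h) for MNIST

@torch.no_grad()
def sample_from_posterior(encoder, decoder, x):
    eta, omega = encoder(x)          # variational parameters for q(h | x)
    h = eta + omega.sqrt() * torch.randn_like(eta)   # reparameterised draw
    return decoder(h)
```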
MNIST

[Figure: OVAE samples from the prior]
[Figure: OVAE samples from the posterior]
[Figure: OIWAE samples from the prior]
[Figure: OIWAE samples from the posterior]
Frey Faces

[Figure: OVAE samples from the prior]
[Figure: OVAE samples from the posterior]
[Figure: OIWAE samples from the prior]
[Figure: OIWAE samples from the posterior]