Generative adversarial networks
[Title-slide sample images: BigGAN (2018), EBGAN (2017)]
Outline
• Generative tasks
• Formulations
  • Original GAN
  • DCGAN
  • WGAN, improved WGAN
  • LSGAN
• Issues
  • Stability
  • Mode collapse
  • Evaluation methodology
Generative tasks
• Generation (from scratch): learn to sample from the distribution represented by the training set
• Unsupervised learning task
Generative tasks
• Image-to-image translation
P. Isola, J.-Y. Zhu, T. Zhou, A. Efros, Image-to-Image Translation with Conditional Adversarial Networks, CVPR 2017
Designing a network for generative tasks
1. We need an architecture that can generate an image
• Recall upsampling architectures for dense prediction
[Figure: random seed or latent code fed to a generator]
Unconditional generation
Designing a network for generative tasks
1. We need an architecture that can generate an image
• Recall upsampling architectures for dense prediction
Image-to-image translation
Designing a network for generative tasks
1. We need an architecture that can generate an image
• Recall upsampling architectures for dense prediction
2. We need to design the right loss function
Learning to sample
Training data x ~ p_data    Generated samples x ~ p_model
We want to learn p_model that matches p_data
Adapted from Stanford CS231n
Generative adversarial networks
• Train two networks with opposing objectives:
  • Generator: learns to generate samples
  • Discriminator: learns to distinguish between generated and real samples
I. Goodfellow, J. Pouget-Abadie, M. Mirza, B. Xu, D. Warde-Farley, S. Ozair, A. Courville, Y. Bengio, Generative adversarial nets, NIPS 2014
[Figure: random noise z feeds the generator G; the discriminator D should label generated samples “fake” and training samples “real”]
Figure adapted from F. Fleuret
GAN objective
• The discriminator D(x) should output the probability that the sample x is real
• That is, we want D(x) to be close to 1 for real data and close to 0 for fake
• Expected conditional log likelihood for real and generated data:
E_{x~p_data}[log D(x)] + E_{x~p_G}[log(1 − D(x))]
  = E_{x~p_data}[log D(x)] + E_{z~p}[log(1 − D(G(z)))]
• We seed the generator with noise z drawn from a simple distribution p (Gaussian or uniform)
GAN objective
V(G, D) = E_{x~p_data}[log D(x)] + E_{z~p}[log(1 − D(G(z)))]
• The discriminator wants to correctly distinguish real and fake samples:
  D* = arg max_D V(G, D)
• The generator wants to fool the discriminator:
  G* = arg min_G V(G, D)
• Train the generator and discriminator jointly in a minimax game
Theoretical analysis
V(G, D) = E_{x~p_data}[log D(x)] + E_{z~p}[log(1 − D(G(z)))]
• Assuming unlimited capacity for generator and discriminator and unlimited training data:
  • The objective min_G max_D V(G, D) is equivalent to the Jensen-Shannon divergence between p_data and p_G, and the global optimum (Nash equilibrium) is given by p_data = p_G
  • If at each step, D is allowed to reach its optimum given G, and G is updated to decrease V(G, D), then p_G will eventually converge to p_data
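The optimality claim can be sanity-checked numerically. For a fixed generator, the integrand of V(G, D) at a point x is p_data(x) log D + p_G(x) log(1 − D), and the analysis says the maximizing value is D*(x) = p_data(x) / (p_data(x) + p_G(x)). A small numpy sketch with made-up density values:

```python
import numpy as np

def pointwise_objective(d, p_data, p_g):
    # Integrand of V(G, D) at a single point x:
    # p_data(x) * log D(x) + p_G(x) * log(1 - D(x))
    return p_data * np.log(d) + p_g * np.log(1.0 - d)

p_data, p_g = 0.7, 0.2  # illustrative density values at some x

# Brute-force maximization over a fine grid of discriminator outputs:
d_grid = np.linspace(1e-4, 1.0 - 1e-4, 100000)
d_best = d_grid[np.argmax(pointwise_objective(d_grid, p_data, p_g))]

# Closed-form optimum from the analysis:
d_star = p_data / (p_data + p_g)
print(d_best, d_star)  # both close to 0.7778
```

At the global optimum p_data = p_G this gives D*(x) = 1/2 everywhere: the discriminator can do no better than chance.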
GAN training
V(G, D) = E_{x~p_data}[log D(x)] + E_{z~p}[log(1 − D(G(z)))]
• Alternate between
  • Gradient ascent on discriminator:
    D* = arg max_D V(G, D)
  • Gradient descent on generator (minimize log-probability of discriminator being right):
    G* = arg min_G V(G, D) = arg min_G E_{z~p}[log(1 − D(G(z)))]
• In practice, do gradient ascent on generator (maximize log-probability of discriminator being wrong):
    G* = arg max_G E_{z~p}[log D(G(z))]
GAN training
min_G E_{z~p}[log(1 − D(G(z)))]  vs.  max_G E_{z~p}[log D(G(z))]
[Figure: the two generator losses plotted against the discriminator score, from low scores (low-quality samples) to high scores (high-quality samples); figure source]
• log(1 − D(G(z))): we want to learn from confidently rejected samples, but the gradients there are small; the gradients are large only for samples that already fool the discriminator, where we don’t need them
• −log(D(G(z))): large gradients for low-quality samples, small gradients for high-quality samples
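The gradient behavior of the two losses can be checked directly. Writing s for the discriminator score D(G(z)), the saturating loss log(1 − s) has derivative −1/(1 − s) and the non-saturating loss −log s has derivative −1/s; a short numpy comparison (the score values are illustrative):

```python
import numpy as np

# Discriminator outputs D(G(z)) for three fake samples:
# confidently rejected, uncertain, and confidently accepted.
s = np.array([0.01, 0.5, 0.99])

# Gradient magnitudes of the two generator losses with respect to s:
grad_saturating = 1.0 / (1.0 - s)   # |d/ds log(1 - s)|
grad_nonsaturating = 1.0 / s        # |d/ds (-log s)|

# At s = 0.01 (confidently rejected sample) the saturating loss gives a
# gradient of about 1, while the non-saturating loss gives about 100.
print(grad_saturating[0], grad_nonsaturating[0])
```

So the non-saturating form concentrates learning signal exactly on the samples the discriminator rejects, which is where the generator most needs it.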
GAN training algorithm
• Update discriminator
  • Repeat for k steps:
    • Sample mini-batch of noise samples z(1), …, z(m) and mini-batch of real samples x(1), …, x(m)
    • Update parameters of D by stochastic gradient ascent on
      (1/m) Σ_i [log D(x(i)) + log(1 − D(G(z(i))))]
• Update generator
  • Sample mini-batch of noise samples z(1), …, z(m)
  • Update parameters of G by stochastic gradient ascent on
      (1/m) Σ_i log D(G(z(i)))
• Repeat until happy with results
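The alternating updates above can be sketched on a 1-D toy problem. Everything here is illustrative and not from the slides: the shift-only generator G(z) = z + θ, the logistic discriminator D(x) = σ(wx + b), and all hyper-parameters are made up so that the two ascent steps can be written with hand-derived gradients:

```python
import numpy as np

rng = np.random.default_rng(0)
sigmoid = lambda t: 1.0 / (1.0 + np.exp(-t))

# Toy setup: real data ~ N(3, 1); generator G(z) = z + theta with z ~ N(0, 1);
# discriminator D(x) = sigmoid(w * x + b). Gradients are derived by hand.
theta, w, b = 0.0, 0.0, 0.0
lr, m = 0.05, 64

for step in range(2000):
    # Discriminator update: ascend (1/m) sum[log D(x) + log(1 - D(G(z)))]
    x = rng.normal(3.0, 1.0, m)           # real mini-batch
    g = rng.normal(0.0, 1.0, m) + theta   # generated mini-batch
    dx, dg = sigmoid(w * x + b), sigmoid(w * g + b)
    w += lr * np.mean((1 - dx) * x - dg * g)
    b += lr * np.mean((1 - dx) - dg)

    # Generator update: ascend (1/m) sum[log D(G(z))] (non-saturating form)
    g = rng.normal(0.0, 1.0, m) + theta
    dg = sigmoid(w * g + b)
    theta += lr * np.mean((1 - dg) * w)

print(theta)  # drifts toward the real mean (3.0)
```

The generator parameter θ is pushed toward the mean of the real data because the only way to raise D(G(z)) is to move generated samples where the discriminator assigns high scores.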
GAN: Conceptual picture
• Update discriminator: push D(x_data) close to 1 and D(G(z)) close to 0
• The generator is a “black box” to the discriminator
[Figure: z → G → G(z) → D → D(G(z)); x_data → D → D(x_data)]
GAN: Conceptual picture
• Update generator: increase D(G(z))
• Requires back-propagating through the composed generator-discriminator network (i.e., the discriminator cannot be a black box)
• The generator is exposed to real data only via the output of the discriminator (and its gradients)
[Figure: z → G → G(z) → D → D(G(z))]
GAN: Conceptual picture
• Test time
[Figure: z → G → G(z)]
Original GAN results
I. Goodfellow, J. Pouget-Abadie, M. Mirza, B. Xu, D. Warde-Farley, S. Ozair, A. Courville, Y. Bengio, Generative adversarial nets, NIPS 2014
Nearest real image for sample to the left
MNIST digits Toronto Face Dataset
Original GAN results
I. Goodfellow, J. Pouget-Abadie, M. Mirza, B. Xu, D. Warde-Farley, S. Ozair, A. Courville, Y. Bengio, Generative adversarial nets, NIPS 2014
CIFAR-10 (FC networks) CIFAR-10 (conv networks)
DCGAN
• Propose principles for designing convolutional architectures for GANs
• Generator architecture
A. Radford, L. Metz, S. Chintala, Unsupervised representation learning with deep convolutional generative adversarial networks, ICLR 2016
[Generator figure: uniformly distributed input, a linear transformation, four transposed convolution layers with ReLU activations, and tanh activations in the last layer]
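The upsampling path of such a generator can be checked with the standard transposed-convolution size formula. The kernel 4 / stride 2 / padding 1 configuration below is an assumption (it is common in DCGAN reimplementations; the paper's own layers use different kernel sizes), and the starting 4×4 feature map is likewise illustrative:

```python
def tconv_out(size, kernel=4, stride=2, pad=1):
    # Output spatial size of a 2-D transposed convolution
    # (standard formula, ignoring output_padding and dilation).
    return (size - 1) * stride - 2 * pad + kernel

size = 4  # linear projection of the latent code reshaped to a 4x4 feature map
sizes = [size]
for _ in range(4):  # four transposed-conv layers, each doubling the resolution
    size = tconv_out(size)
    sizes.append(size)
print(sizes)  # [4, 8, 16, 32, 64]
```

With kernel 4, stride 2, and padding 1 each layer exactly doubles the spatial size, so four layers take a 4×4 map to a 64×64 image.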
DCGAN
• Propose principles for designing convolutional architectures for GANs
• Discriminator architecture
  • Don’t use pooling, only strided convolutions
  • Use Leaky ReLU activations (sparse gradients cause problems for training)
  • Use only one FC layer before the softmax output
  • Use batch normalization after most layers (in the generator also)
A. Radford, L. Metz, S. Chintala, Unsupervised representation learning with deep convolutional generative adversarial networks, ICLR 2016
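The sparse-gradient point can be seen from the activation gradients alone: ReLU passes zero gradient through every negative pre-activation, while Leaky ReLU keeps a small slope, so the generator still receives signal through units the discriminator has switched off. A minimal numpy sketch (the 0.2 slope matches the DCGAN paper):

```python
import numpy as np

def relu_grad(x):
    # Gradient of ReLU: exactly zero for negative inputs.
    return (x > 0).astype(float)

def leaky_relu_grad(x, slope=0.2):
    # Gradient of Leaky ReLU: small but nonzero for negative inputs.
    return np.where(x > 0, 1.0, slope)

x = np.array([-2.0, -0.5, 1.0])  # pre-activations, two of them negative
print(relu_grad(x))        # [0. 0. 1.]  -> no signal through negative units
print(leaky_relu_grad(x))  # [0.2 0.2 1.]
```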
DCGAN results
Generated bedrooms after one epoch
DCGAN results
Generated bedrooms after five epochs
DCGAN results
Generated bedrooms from reference implementation
Source: F. Fleuret
Notice repetition artifacts (analysis)
DCGAN results
Interpolation between different points in the z space
DCGAN results
• Vector arithmetic in the z space
DCGAN results
• Pose transformation by adding a “turn” vector
Problems with GAN training
• Stability
  • Parameters can oscillate or diverge; generator loss does not correlate with sample quality
  • Behavior is very sensitive to hyper-parameter selection
Problems with GAN training
• Mode collapse
  • Generator ends up modeling only a small subset of the training data
Training tips and tricks
• Make sure the discriminator provides good gradients to the generator even when it can confidently reject all generator samples
  • Use a non-saturating generator loss (as in the original GAN paper)
  • Smooth discriminator targets for “real” data, e.g., use target value 0.9 instead of 1 when training the discriminator
T. Salimans, I. Goodfellow, W. Zaremba, V. Cheung, A. Radford, X. Chen, Improved techniques for training GANs, NIPS 2016
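The effect of the 0.9 target can be shown directly: under binary cross-entropy, the loss-minimizing prediction equals the target, so a smoothed target keeps the discriminator from saturating at 1 on real data. A small numpy check (grid search instead of calculus):

```python
import numpy as np

def bce(p, target):
    # Binary cross-entropy of prediction p against a (possibly soft) target.
    return -(target * np.log(p) + (1.0 - target) * np.log(1.0 - p))

p = np.linspace(1e-4, 1.0 - 1e-4, 100000)  # candidate discriminator outputs

best_hard = p[np.argmin(bce(p, 1.0))]  # hard target 1.0 pushes D(x) toward 1
best_soft = p[np.argmin(bce(p, 0.9))]  # smoothed target: optimum is at 0.9
print(best_hard, best_soft)
```

With the smoothed target the optimal discriminator output on real data is 0.9, so its logits (and hence its gradients to the generator) stay bounded.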
Training tips and tricks
• Use class labels in the discriminator if available
  • For a dataset with K classes, have the discriminator perform (K + 1)-class classification, where class K + 1 is “fake”
  • Empirically, this considerably improves the visual quality of generated samples, supposedly by getting the discriminator to focus on features that are also important to humans
T. Salimans, I. Goodfellow, W. Zaremba, V. Cheung, A. Radford, X. Chen, Improved techniques for training GANs, NIPS 2016
Training tips and tricks
• Virtual batch norm
  • Batch norm generally helps with GAN training, but can result in artifacts in generated images due to mini-batch dependence
  • Solution: compute mini-batch statistics from a fixed reference batch (plus the current sample)
T. Salimans, I. Goodfellow, W. Zaremba, V. Cheung, A. Radford, X. Chen, Improved techniques for training GANs, NIPS 2016
[Figure: batch-norm artifacts; the first mini-batch has orange-tinted samples, the second mini-batch has green-tinted samples]
Training tips and tricks
• Mini-batch discrimination
  • Discriminators that look at a single sample at a time cannot detect mode collapse (lack of diversity)
  • Solution: augment the feature representation of a single sample with features derived from the entire mini-batch
T. Salimans, I. Goodfellow, W. Zaremba, V. Cheung, A. Radford, X. Chen, Improved techniques for training GANs, NIPS 2016
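A simplified variant makes the idea concrete. The paper's mini-batch discrimination learns a tensor of pairwise sample distances; the sketch below instead appends the mean per-feature standard deviation of the batch to every sample (the cheaper trick later popularized by Progressive GAN). Either way, a collapsed batch becomes detectable from the appended feature:

```python
import numpy as np

def append_batch_std(features):
    # Append one scalar to every sample: the per-feature standard deviation
    # across the batch, averaged over features. A collapsed batch (identical
    # samples) yields ~0; a diverse batch yields a clearly positive value.
    std = features.std(axis=0).mean()
    col = np.full((features.shape[0], 1), std)
    return np.concatenate([features, col], axis=1)

rng = np.random.default_rng(0)
diverse = rng.normal(size=(8, 16))                     # healthy mini-batch
collapsed = np.tile(rng.normal(size=(1, 16)), (8, 1))  # mode collapse: 8 copies

print(append_batch_std(diverse)[0, -1])    # clearly positive
print(append_batch_std(collapsed)[0, -1])  # ~0
```

A discriminator given this extra feature can reject the whole batch when the generator produces near-identical samples, which penalizes mode collapse.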
Training tips and tricks
• Resources
  • T. Salimans, I. Goodfellow, W. Zaremba, V. Cheung, A. Radford, X. Chen, Improved techniques for training GANs, NIPS 2016
  • S. Chintala, E. Denton, M. Arjovsky, M. Mathieu, How to train a GAN? Tips and tricks to make GANs work, 2016
  • I. Goodfellow, NIPS 2016 Tutorial: Generative Adversarial Networks, arXiv 2017
Some popular GAN flavors
• WGAN and improved WGAN
• LSGAN
Wasserstein GAN
• Motivated by the Wasserstein or Earth mover’s distance, an alternative to the JS divergence for comparing distributions
• In practice, simply drop the sigmoid from the discriminator:
min_G max_D E_{x~p_data}[D(x)] − E_{z~p}[D(G(z))]
• Due to theoretical considerations, it is important to ensure smoothness of the discriminator
  • This paper’s suggested method is clipping the weights to a fixed range [−c, c]
M. Arjovsky, S. Chintala, L. Bottou, Wasserstein generative adversarial networks, ICML 2017
Wasserstein GAN
• Benefits
  • Better gradients, more stable training
M. Arjovsky, S. Chintala, L. Bottou, Wasserstein generative adversarial networks, ICML 2017
Wasserstein GAN
M. Arjovsky, S. Chintala, L. Bottou, Wasserstein generative adversarial networks, ICML 2017
• Benefits
  • Better gradients, more stable training
  • Objective function value is more meaningfully related to the quality of generator output
[Figure: original GAN divergence vs. WGAN divergence]
Improved Wasserstein GAN
• Weight clipping leads to problems with discriminator training
• Improved Wasserstein critic loss:
E_{x̃~p_G}[D(x̃)] − E_{x~p_data}[D(x)] + λ E_{x̂~p_x̂}[(‖∇_x̂ D(x̂)‖₂ − 1)²]
I. Gulrajani, F. Ahmed, M. Arjovsky, V. Dumoulin, A. Courville, Improved training of Wasserstein GANs, NIPS 2017
The last term is a unit-norm gradient penalty on points x̂ obtained by interpolating real and generated samples
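The penalty term has a closed form for a linear critic, which makes a small autograd-free sketch possible: for f(x) = w·x the input gradient is w everywhere, so the penalty is λ(‖w‖₂ − 1)² regardless of where x̂ lands. The interpolation step is still shown, since that is where the penalty is evaluated in general; λ = 10 is the paper's default, and the data is illustrative:

```python
import numpy as np

rng = np.random.default_rng(0)

def gradient_penalty(w, x_real, x_fake, lam=10.0):
    # For a linear critic f(x) = w @ x the input gradient is w at every point,
    # so the penalty can be computed without autograd.
    eps = rng.uniform(size=(x_real.shape[0], 1))
    x_hat = eps * x_real + (1.0 - eps) * x_fake  # interpolated points
    grad_norm = np.linalg.norm(w)                # ||grad_x f(x_hat)|| = ||w||
    return lam * np.mean((grad_norm - 1.0) ** 2), x_hat

x_real = rng.normal(3.0, 1.0, size=(64, 2))
x_fake = rng.normal(0.0, 1.0, size=(64, 2))

gp_unit, _ = gradient_penalty(np.array([0.6, 0.8]), x_real, x_fake)  # ||w|| = 1
gp_big, _ = gradient_penalty(np.array([3.0, 4.0]), x_real, x_fake)   # ||w|| = 5
print(gp_unit, gp_big)  # ~0.0 and 160.0
```

The penalty vanishes exactly when the critic has unit gradient norm, which is what replaces weight clipping as the Lipschitz constraint.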
Improved Wasserstein GAN: Results
I. Gulrajani, F. Ahmed, M. Arjovsky, V. Dumoulin, A. Courville, Improved training of Wasserstein GANs, NIPS 2017
LSGAN
• Use least squares cost for the generator and discriminator
• Equivalent to minimizing Pearson χ² divergence
D* = arg min_D E_{x~p_data}[(D(x) − 1)²] + E_{z~p}[D(G(z))²]
G* = arg min_G E_{z~p}[(D(G(z)) − 1)²]
• The discriminator pushes its response on real data close to 1 and its response on generated data close to 0; the generator pushes the response on generated data close to 1
X. Mao, Q. Li, H. Xie, R. Y. Lau, Z. Wang, and S. P. Smolley, Least squares generative adversarial networks, ICCV 2017
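The two least-squares losses can be written out in a few lines (numpy; the sample values are illustrative). The sketch also checks a useful consequence: at a point where real and fake data are equally likely, the discriminator's least-squares optimum is the midpoint 0.5 of its two targets:

```python
import numpy as np

def d_loss(d_real, d_fake):
    # Discriminator: push D(x) toward 1 on real data, D(G(z)) toward 0 on fakes.
    return np.mean((d_real - 1.0) ** 2) + np.mean(d_fake ** 2)

def g_loss(d_fake):
    # Generator: push D(G(z)) toward 1.
    return np.mean((d_fake - 1.0) ** 2)

# Where real and fake are equally likely, the discriminator minimizes
# (a - 1)^2 + a^2 over its output a; grid search finds the optimum at 0.5.
a = np.linspace(0.0, 1.0, 100001)
a_best = a[np.argmin((a - 1.0) ** 2 + a ** 2)]

print(d_loss(np.array([0.9]), np.array([0.1])), g_loss(np.array([0.1])), a_best)
```

Unlike the sigmoid cross-entropy loss, the quadratic penalty keeps growing with the distance from the target, so even samples far on the "correct" side of the boundary contribute gradients.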
LSGAN
• Supposedly produces higher-quality images
X. Mao, Q. Li, H. Xie, R. Y. Lau, Z. Wang, and S. P. Smolley, Least squares generative adversarial networks, ICCV 2017
LSGAN
• Supposedly produces higher-quality images
• More stable and resistant to mode collapse
X. Mao, Q. Li, H. Xie, R. Y. Lau, Z. Wang, and S. P. Smolley, Least squares generative adversarial networks, ICCV 2017
GAN explosion
https://github.com/hindupuravinash/the-gan-zoo
How to evaluate GANs?
• Showing pictures of samples is not enough, especially for simpler datasets like MNIST, CIFAR, faces, bedrooms, etc.
• We cannot directly compute the likelihoods of high-dimensional samples (real or generated), or compare their distributions
• Many GAN approaches claim mainly to improve stability, which is hard to evaluate
• For discussion, see Ian Goodfellow’s Twitter thread
GAN evaluation: Human studies
• Turing test
T. Salimans, I. Goodfellow, W. Zaremba, V. Cheung, A. Radford, X. Chen, Improved techniques for training GANs, NIPS 2016
GAN evaluation: Inception score
• Key idea: generators should produce images with a variety of recognizable object classes
• Defined as IS(G) = exp(E_{x~p_G} KL(p(y|x) ∥ p(y))), where p(y|x) is the posterior label distribution returned by an image classifier (e.g., InceptionNet) for sample x
  • If x contains a recognizable object, the entropy of p(y|x) should be low
  • If the generator generates images of diverse objects, the marginal distribution p(y) should have high entropy
• Disadvantage: a GAN that simply memorizes the training data (overfitting) or outputs a single image per class (mode dropping) could still score well
T. Salimans, I. Goodfellow, W. Zaremba, V. Cheung, A. Radford, X. Chen, Improved techniques for training GANs, NIPS 2016
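The definition is short enough to implement directly. In practice p(y|x) comes from InceptionNet over many samples; the hypothetical 4-class posteriors below just exercise the two extremes named on the slide (confident-and-diverse vs. uninformative):

```python
import numpy as np

def inception_score(p_yx):
    # p_yx: (num_samples, num_classes) array of posteriors p(y|x).
    p_y = p_yx.mean(axis=0)  # marginal label distribution p(y)
    # KL(p(y|x) || p(y)) per sample, then exp of the mean:
    kl = np.sum(p_yx * (np.log(p_yx) - np.log(p_y)), axis=1)
    return np.exp(kl.mean())

# Confident and diverse: each sample strongly predicts a different class.
confident = np.full((4, 4), 0.01)
np.fill_diagonal(confident, 0.97)

# Uninformative: every posterior equals the uniform marginal.
uniform = np.full((4, 4), 0.25)

print(inception_score(confident), inception_score(uniform))
```

The score is bounded between 1 (posteriors identical to the marginal) and the number of classes (perfectly confident and perfectly diverse), which is why the confident case scores near 4 here and the uniform case scores exactly 1.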
GAN evaluation: Fréchet Inception Distance
• Key idea: fit simple distributions to statistics of feature activations for real and generated data; estimate divergence parametrically
  • Pass generated samples through a network (InceptionNet) and compute activations for a chosen layer
  • Estimate the multivariate mean and covariance of the activations, and compute the Fréchet distance to those of the real data
• Advantages: correlated with visual quality of samples and human judgment; can detect mode dropping
• Disadvantage: cannot detect overfitting
M. Heusel, H. Ramsauer, T. Unterthiner, B. Nessler, S. Hochreiter, GANs trained by a two time-scale update rule converge to a local Nash equilibrium, NIPS 2017
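The Fréchet distance between the two fitted Gaussians is ‖μ₁ − μ₂‖² + Tr(Σ₁ + Σ₂ − 2(Σ₁Σ₂)^{1/2}); the real FID evaluates this on multivariate Inception activations (with a matrix square root). The sketch below uses the univariate special case, with scalar "features" standing in for Inception activations, so everything reduces to elementary operations:

```python
import numpy as np

def frechet_distance_1d(mu1, var1, mu2, var2):
    # Univariate special case of the Fréchet distance between Gaussians:
    # (mu1 - mu2)^2 + var1 + var2 - 2 * sqrt(var1 * var2)
    return (mu1 - mu2) ** 2 + var1 + var2 - 2.0 * np.sqrt(var1 * var2)

def fid(a, b):
    # Fit a Gaussian to each set of scalar "features" and compare.
    return frechet_distance_1d(a.mean(), a.var(), b.mean(), b.var())

rng = np.random.default_rng(0)
real_feats = rng.normal(0.0, 1.0, 10000)
good_feats = rng.normal(0.0, 1.0, 10000)  # well-matched "generator"
bad_feats = rng.normal(2.0, 0.5, 10000)   # biased, low-variance "generator"

print(fid(real_feats, good_feats), fid(real_feats, bad_feats))
```

The well-matched generator scores near zero while the biased one does not; a generator that drops modes shrinks the fitted variance relative to the real data, which this distance picks up (the Inception score would not).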
Are GANs created equal?
• From the abstract:
“We find that most models can reach similar scores with enough hyperparameter optimization and random restarts. This suggests that improvements can arise from a higher computational budget and tuning more than fundamental algorithmic changes … We did not find evidence that any of the tested algorithms consistently outperforms the non-saturating GAN introduced in Goodfellow et al. (2014)”
M. Lucic, K. Kurach, M. Michalski, O. Bousquet, S. Gelly, Are GANs created equal? A large-scale study, NIPS 2018