Generative adversarial networks
[Title-slide sample images: BigGAN (2018), EBGAN (2017)]
Outline
• Generative tasks
• Formulations
  • Original GAN
  • DCGAN
  • WGAN, improved WGAN
  • LSGAN
• Issues
  • Stability
  • Mode collapse
  • Evaluation methodology
Generative tasks
• Generation (from scratch): learn to sample from the distribution represented by the training set
• Unsupervised learning task
Generative tasks
• Image-to-image translation
P. Isola, J.-Y. Zhu, T. Zhou, A. Efros, Image-to-Image Translation with Conditional Adversarial Networks, CVPR 2017
Designing a network for generative tasks
1. We need an architecture that can generate an image
• Recall upsampling architectures for dense prediction
[Figure: random seed or latent code fed to a generator]
Unconditional generation
Designing a network for generative tasks
1. We need an architecture that can generate an image
• Recall upsampling architectures for dense prediction
Image-to-image translation
Designing a network for generative tasks
1. We need an architecture that can generate an image
• Recall upsampling architectures for dense prediction
2. We need to design the right loss function
Learning to sample
Training data x ~ p_data    Generated samples x ~ p_model
We want to learn p_model that matches p_data
Adapted from Stanford CS231n
Generative adversarial networks
• Train two networks with opposing objectives:
  • Generator: learns to generate samples
  • Discriminator: learns to distinguish between generated and real samples
I. Goodfellow, J. Pouget-Abadie, M. Mirza, B. Xu, D. Warde-Farley, S. Ozair, A. Courville, Y. Bengio, Generative adversarial nets, NIPS 2014
[Figure: random noise z feeds the generator G; the discriminator D should label generated samples “fake” and training samples “real”]
Figure adapted from F. Fleuret
GAN objective
• The discriminator D(x) should output the probability that the sample x is real
• That is, we want D(x) to be close to 1 for real data and close to 0 for fake
• Expected conditional log likelihood for real and generated data:
E_{x~p_data}[log D(x)] + E_{x~p_G}[log(1 − D(x))]
  = E_{x~p_data}[log D(x)] + E_{z~p}[log(1 − D(G(z)))]
• We seed the generator with noise z drawn from a simple distribution p (Gaussian or uniform)
GAN objective
V(G, D) = E_{x~p_data}[log D(x)] + E_{z~p}[log(1 − D(G(z)))]
• The discriminator wants to correctly distinguish real and fake samples:
  D* = arg max_D V(G, D)
• The generator wants to fool the discriminator:
  G* = arg min_G V(G, D)
• Train the generator and discriminator jointly in a minimax game
Theoretical analysis
V(G, D) = E_{x~p_data}[log D(x)] + E_{z~p}[log(1 − D(G(z)))]
• Assuming unlimited capacity for generator and discriminator and unlimited training data:
  • The objective min_G max_D V(G, D) is equivalent to the Jensen-Shannon divergence between p_data and p_G, and the global optimum (Nash equilibrium) is given by p_data = p_G
  • If at each step, D is allowed to reach its optimum given G, and G is updated to decrease V(G, D), then p_G will eventually converge to p_data
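The optimality claim can be sanity-checked numerically. For a fixed generator, the integrand of V(G, D) at a point x is p_data(x) log D + p_G(x) log(1 − D), and the analysis says the maximizing value is D*(x) = p_data(x) / (p_data(x) + p_G(x)). A small numpy sketch with made-up density values:

```python
import numpy as np

def pointwise_objective(d, p_data, p_g):
    # Integrand of V(G, D) at a single point x:
    # p_data(x) * log D(x) + p_G(x) * log(1 - D(x))
    return p_data * np.log(d) + p_g * np.log(1.0 - d)

p_data, p_g = 0.7, 0.2  # illustrative density values at some x

# Brute-force maximization over a fine grid of discriminator outputs:
d_grid = np.linspace(1e-4, 1.0 - 1e-4, 100000)
d_best = d_grid[np.argmax(pointwise_objective(d_grid, p_data, p_g))]

# Closed-form optimum from the analysis:
d_star = p_data / (p_data + p_g)
print(d_best, d_star)  # both close to 0.7778
```

At the global optimum p_data = p_G this gives D*(x) = 1/2 everywhere: the discriminator can do no better than chance.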
GAN training
V(G, D) = E_{x~p_data}[log D(x)] + E_{z~p}[log(1 − D(G(z)))]
• Alternate between
  • Gradient ascent on discriminator:
    D* = arg max_D V(G, D)
  • Gradient descent on generator (minimize log-probability of discriminator being right):
    G* = arg min_G V(G, D) = arg min_G E_{z~p}[log(1 − D(G(z)))]
• In practice, do gradient ascent on generator (maximize log-probability of discriminator being wrong):
    G* = arg max_G E_{z~p}[log D(G(z))]
GAN training
min_G E_{z~p}[log(1 − D(G(z)))]  vs.  max_G E_{z~p}[log D(G(z))]
[Figure: the two generator losses plotted against the discriminator score, from low scores (low-quality samples) to high scores (high-quality samples); figure source]
• log(1 − D(G(z))): we want to learn from confidently rejected samples, but the gradients there are small; the gradients are large only for samples that already fool the discriminator, where we don’t need them
• −log(D(G(z))): large gradients for low-quality samples, small gradients for high-quality samples
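The gradient behavior of the two losses can be checked directly. Writing s for the discriminator score D(G(z)), the saturating loss log(1 − s) has derivative −1/(1 − s) and the non-saturating loss −log s has derivative −1/s; a short numpy comparison (the score values are illustrative):

```python
import numpy as np

# Discriminator outputs D(G(z)) for three fake samples:
# confidently rejected, uncertain, and confidently accepted.
s = np.array([0.01, 0.5, 0.99])

# Gradient magnitudes of the two generator losses with respect to s:
grad_saturating = 1.0 / (1.0 - s)   # |d/ds log(1 - s)|
grad_nonsaturating = 1.0 / s        # |d/ds (-log s)|

# At s = 0.01 (confidently rejected sample) the saturating loss gives a
# gradient of about 1, while the non-saturating loss gives about 100.
print(grad_saturating[0], grad_nonsaturating[0])
```

So the non-saturating form concentrates learning signal exactly on the samples the discriminator rejects, which is where the generator most needs it.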
GAN training algorithm
• Update discriminator
  • Repeat for k steps:
    • Sample mini-batch of noise samples z(1), …, z(m) and mini-batch of real samples x(1), …, x(m)
    • Update parameters of D by stochastic gradient ascent on
      (1/m) Σ_i [log D(x(i)) + log(1 − D(G(z(i))))]
• Update generator
  • Sample mini-batch of noise samples z(1), …, z(m)
  • Update parameters of G by stochastic gradient ascent on
      (1/m) Σ_i log D(G(z(i)))
• Repeat until happy with results
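The alternating updates above can be sketched on a 1-D toy problem. Everything here is illustrative and not from the slides: the shift-only generator G(z) = z + θ, the logistic discriminator D(x) = σ(wx + b), and all hyper-parameters are made up so that the two ascent steps can be written with hand-derived gradients:

```python
import numpy as np

rng = np.random.default_rng(0)
sigmoid = lambda t: 1.0 / (1.0 + np.exp(-t))

# Toy setup: real data ~ N(3, 1); generator G(z) = z + theta with z ~ N(0, 1);
# discriminator D(x) = sigmoid(w * x + b). Gradients are derived by hand.
theta, w, b = 0.0, 0.0, 0.0
lr, m = 0.05, 64

for step in range(2000):
    # Discriminator update: ascend (1/m) sum[log D(x) + log(1 - D(G(z)))]
    x = rng.normal(3.0, 1.0, m)           # real mini-batch
    g = rng.normal(0.0, 1.0, m) + theta   # generated mini-batch
    dx, dg = sigmoid(w * x + b), sigmoid(w * g + b)
    w += lr * np.mean((1 - dx) * x - dg * g)
    b += lr * np.mean((1 - dx) - dg)

    # Generator update: ascend (1/m) sum[log D(G(z))] (non-saturating form)
    g = rng.normal(0.0, 1.0, m) + theta
    dg = sigmoid(w * g + b)
    theta += lr * np.mean((1 - dg) * w)

print(theta)  # drifts toward the real mean (3.0)
```

The generator parameter θ is pushed toward the mean of the real data because the only way to raise D(G(z)) is to move generated samples where the discriminator assigns high scores.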
GAN: Conceptual picture
• Update discriminator: push D(x_data) close to 1 and D(G(z)) close to 0
• The generator is a “black box” to the discriminator
[Figure: z → G → G(z) → D → D(G(z)); x_data → D → D(x_data)]
GAN: Conceptual picture
• Update generator: increase D(G(z))
• Requires back-propagating through the composed generator-discriminator network (i.e., the discriminator cannot be a black box)
• The generator is exposed to real data only via the output of the discriminator (and its gradients)
[Figure: z → G → G(z) → D → D(G(z))]
GAN: Conceptual picture
• Test time
[Figure: z → G → G(z)]
Original GAN results
I. Goodfellow, J. Pouget-Abadie, M. Mirza, B. Xu, D. Warde-Farley, S. Ozair, A. Courville, Y. Bengio, Generative adversarial nets, NIPS 2014
Nearest real image for sample to the left
MNIST digits Toronto Face Dataset
Original GAN results
I. Goodfellow, J. Pouget-Abadie, M. Mirza, B. Xu, D. Warde-Farley, S. Ozair, A. Courville, Y. Bengio, Generative adversarial nets, NIPS 2014
CIFAR-10 (FC networks) CIFAR-10 (conv networks)
DCGAN
• Propose principles for designing convolutional architectures for GANs
• Generator architecture
A. Radford, L. Metz, S. Chintala, Unsupervised representation learning with deep convolutional generative adversarial networks, ICLR 2016
[Generator figure: uniformly distributed input, a linear transformation, four transposed convolution layers with ReLU activations, and tanh activations in the last layer]
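The upsampling path of such a generator can be checked with the standard transposed-convolution size formula. The kernel 4 / stride 2 / padding 1 configuration below is an assumption (it is common in DCGAN reimplementations; the paper's own layers use different kernel sizes), and the starting 4×4 feature map is likewise illustrative:

```python
def tconv_out(size, kernel=4, stride=2, pad=1):
    # Output spatial size of a 2-D transposed convolution
    # (standard formula, ignoring output_padding and dilation).
    return (size - 1) * stride - 2 * pad + kernel

size = 4  # linear projection of the latent code reshaped to a 4x4 feature map
sizes = [size]
for _ in range(4):  # four transposed-conv layers, each doubling the resolution
    size = tconv_out(size)
    sizes.append(size)
print(sizes)  # [4, 8, 16, 32, 64]
```

With kernel 4, stride 2, and padding 1 each layer exactly doubles the spatial size, so four layers take a 4×4 map to a 64×64 image.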
DCGAN
• Propose principles for designing convolutional architectures for GANs
• Discriminator architecture
  • Don’t use pooling, only strided convolutions
  • Use Leaky ReLU activations (sparse gradients cause problems for training)
  • Use only one FC layer before the softmax output
  • Use batch normalization after most layers (in the generator also)
A. Radford, L. Metz, S. Chintala, Unsupervised representation learning with deep convolutional generative adversarial networks, ICLR 2016
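The sparse-gradient point can be seen from the activation gradients alone: ReLU passes zero gradient through every negative pre-activation, while Leaky ReLU keeps a small slope, so the generator still receives signal through units the discriminator has switched off. A minimal numpy sketch (the 0.2 slope matches the DCGAN paper):

```python
import numpy as np

def relu_grad(x):
    # Gradient of ReLU: exactly zero for negative inputs.
    return (x > 0).astype(float)

def leaky_relu_grad(x, slope=0.2):
    # Gradient of Leaky ReLU: small but nonzero for negative inputs.
    return np.where(x > 0, 1.0, slope)

x = np.array([-2.0, -0.5, 1.0])  # pre-activations, two of them negative
print(relu_grad(x))        # [0. 0. 1.]  -> no signal through negative units
print(leaky_relu_grad(x))  # [0.2 0.2 1.]
```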
DCGAN results
Generated bedrooms after one epoch
DCGAN results
Generated bedrooms after five epochs
DCGAN results
Generated bedrooms from reference implementation
Source: F. Fleuret
Notice repetition artifacts (analysis)
DCGAN results
Interpolation between different points in the z space
DCGAN results
• Vector arithmetic in the z space
DCGAN results
• Pose transformation by adding a “turn” vector
Problems with GAN training
• Stability
  • Parameters can oscillate or diverge; generator loss does not correlate with sample quality
  • Behavior is very sensitive to hyper-parameter selection
Problems with GAN training
• Mode collapse
  • Generator ends up modeling only a small subset of the training data
Training tips and tricks
• Make sure the discriminator provides good gradients to the generator even when it can confidently reject all generator samples
  • Use a non-saturating generator loss (as in the original GAN paper)
  • Smooth discriminator targets for “real” data, e.g., use target value 0.9 instead of 1 when training the discriminator
T. Salimans, I. Goodfellow, W. Zaremba, V. Cheung, A. Radford, X. Chen, Improved techniques for training GANs, NIPS 2016
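The effect of the 0.9 target can be shown directly: under binary cross-entropy, the loss-minimizing prediction equals the target, so a smoothed target keeps the discriminator from saturating at 1 on real data. A small numpy check (grid search instead of calculus):

```python
import numpy as np

def bce(p, target):
    # Binary cross-entropy of prediction p against a (possibly soft) target.
    return -(target * np.log(p) + (1.0 - target) * np.log(1.0 - p))

p = np.linspace(1e-4, 1.0 - 1e-4, 100000)  # candidate discriminator outputs

best_hard = p[np.argmin(bce(p, 1.0))]  # hard target 1.0 pushes D(x) toward 1
best_soft = p[np.argmin(bce(p, 0.9))]  # smoothed target: optimum is at 0.9
print(best_hard, best_soft)
```

With the smoothed target the optimal discriminator output on real data is 0.9, so its logits (and hence its gradients to the generator) stay bounded.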
Training tips and tricks
• Use class labels in the discriminator if available
  • For a dataset with K classes, have the discriminator perform (K + 1)-class classification, where class K + 1 is “fake”
  • Empirically, this considerably improves the visual quality of generated samples, supposedly by getting the discriminator to focus on features that are also important to humans
T. Salimans, I. Goodfellow, W. Zaremba, V. Cheung, A. Radford, X. Chen, Improved techniques for training GANs, NIPS 2016
Training tips and tricks
• Virtual batch norm
  • Batch norm generally helps with GAN training, but can result in artifacts in generated images due to mini-batch dependence
  • Solution: compute mini-batch statistics from a fixed reference batch (plus the current sample)
T. Salimans, I. Goodfellow, W. Zaremba, V. Cheung, A. Radford, X. Chen, Improved techniques for training GANs, NIPS 2016
[Figure: batch-norm artifacts; the first mini-batch has orange-tinted samples, the second mini-batch has green-tinted samples]
Training tips and tricks
• Mini-batch discrimination
  • Discriminators that look at a single sample at a time cannot detect mode collapse (lack of diversity)
  • Solution: augment the feature representation of a single sample with features derived from the entire mini-batch
T. Salimans, I. Goodfellow, W. Zaremba, V. Cheung, A. Radford, X. Chen, Improved techniques for training GANs, NIPS 2016
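A simplified variant makes the idea concrete. The paper's mini-batch discrimination learns a tensor of pairwise sample distances; the sketch below instead appends the mean per-feature standard deviation of the batch to every sample (the cheaper trick later popularized by Progressive GAN). Either way, a collapsed batch becomes detectable from the appended feature:

```python
import numpy as np

def append_batch_std(features):
    # Append one scalar to every sample: the per-feature standard deviation
    # across the batch, averaged over features. A collapsed batch (identical
    # samples) yields ~0; a diverse batch yields a clearly positive value.
    std = features.std(axis=0).mean()
    col = np.full((features.shape[0], 1), std)
    return np.concatenate([features, col], axis=1)

rng = np.random.default_rng(0)
diverse = rng.normal(size=(8, 16))                     # healthy mini-batch
collapsed = np.tile(rng.normal(size=(1, 16)), (8, 1))  # mode collapse: 8 copies

print(append_batch_std(diverse)[0, -1])    # clearly positive
print(append_batch_std(collapsed)[0, -1])  # ~0
```

A discriminator given this extra feature can reject the whole batch when the generator produces near-identical samples, which penalizes mode collapse.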
Training tips and tricks
• Resources
  • T. Salimans, I. Goodfellow, W. Zaremba, V. Cheung, A. Radford, X. Chen, Improved techniques for training GANs, NIPS 2016
  • S. Chintala, E. Denton, M. Arjovsky, M. Mathieu, How to train a GAN? Tips and tricks to make GANs work, 2016
  • I. Goodfellow, NIPS 2016 Tutorial: Generative Adversarial Networks, arXiv 2017
Some popular GAN flavors
• WGAN and improved WGAN
• LSGAN
Wasserstein GAN
• Motivated by the Wasserstein or Earth mover’s distance, an alternative to the JS divergence for comparing distributions
• In practice, simply drop the sigmoid from the discriminator:
min_G max_D E_{x~p_data}[D(x)] − E_{z~p}[D(G(z))]
• Due to theoretical considerations, it is important to ensure smoothness of the discriminator
  • This paper’s suggested method is clipping the weights to a fixed range [−c, c]
M. Arjovsky, S. Chintala, L. Bottou, Wasserstein generative adversarial networks, ICML 2017
Wasserstein GAN
• Benefits
  • Better gradients, more stable training
M. Arjovsky, S. Chintala, L. Bottou, Wasserstein generative adversarial networks, ICML 2017
Wasserstein GAN
M. Arjovsky, S. Chintala, L. Bottou, Wasserstein generative adversarial networks, ICML 2017
• Benefits
  • Better gradients, more stable training
  • Objective function value is more meaningfully related to the quality of generator output
[Figure: original GAN divergence vs. WGAN divergence]
Improved Wasserstein GAN
• Weight clipping leads to problems with discriminator training
• Improved Wasserstein critic loss:
E_{x̃~p_G}[D(x̃)] − E_{x~p_data}[D(x)] + λ E_{x̂~p_x̂}[(‖∇_x̂ D(x̂)‖₂ − 1)²]
I. Gulrajani, F. Ahmed, M. Arjovsky, V. Dumoulin, A. Courville, Improved training of Wasserstein GANs, NIPS 2017
The last term is a unit-norm gradient penalty on points x̂ obtained by interpolating real and generated samples
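The penalty term has a closed form for a linear critic, which makes a small autograd-free sketch possible: for f(x) = w·x the input gradient is w everywhere, so the penalty is λ(‖w‖₂ − 1)² regardless of where x̂ lands. The interpolation step is still shown, since that is where the penalty is evaluated in general; λ = 10 is the paper's default, and the data is illustrative:

```python
import numpy as np

rng = np.random.default_rng(0)

def gradient_penalty(w, x_real, x_fake, lam=10.0):
    # For a linear critic f(x) = w @ x the input gradient is w at every point,
    # so the penalty can be computed without autograd.
    eps = rng.uniform(size=(x_real.shape[0], 1))
    x_hat = eps * x_real + (1.0 - eps) * x_fake  # interpolated points
    grad_norm = np.linalg.norm(w)                # ||grad_x f(x_hat)|| = ||w||
    return lam * np.mean((grad_norm - 1.0) ** 2), x_hat

x_real = rng.normal(3.0, 1.0, size=(64, 2))
x_fake = rng.normal(0.0, 1.0, size=(64, 2))

gp_unit, _ = gradient_penalty(np.array([0.6, 0.8]), x_real, x_fake)  # ||w|| = 1
gp_big, _ = gradient_penalty(np.array([3.0, 4.0]), x_real, x_fake)   # ||w|| = 5
print(gp_unit, gp_big)  # ~0.0 and 160.0
```

The penalty vanishes exactly when the critic has unit gradient norm, which is what replaces weight clipping as the Lipschitz constraint.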
Improved Wasserstein GAN: Results
I. Gulrajani, F. Ahmed, M. Arjovsky, V. Dumoulin, A. Courville, Improved training of Wasserstein GANs, NIPS 2017
LSGAN
• Use least squares cost for the generator and discriminator
• Equivalent to minimizing Pearson χ² divergence
D* = arg min_D E_{x~p_data}[(D(x) − 1)²] + E_{z~p}[D(G(z))²]
G* = arg min_G E_{z~p}[(D(G(z)) − 1)²]
• The discriminator pushes its response on real data close to 1 and its response on generated data close to 0; the generator pushes the response on generated data close to 1
X. Mao, Q. Li, H. Xie, R. Y. Lau, Z. Wang, and S. P. Smolley, Least squares generative adversarial networks, ICCV 2017
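The two least-squares losses can be written out in a few lines (numpy; the sample values are illustrative). The sketch also checks a useful consequence: at a point where real and fake data are equally likely, the discriminator's least-squares optimum is the midpoint 0.5 of its two targets:

```python
import numpy as np

def d_loss(d_real, d_fake):
    # Discriminator: push D(x) toward 1 on real data, D(G(z)) toward 0 on fakes.
    return np.mean((d_real - 1.0) ** 2) + np.mean(d_fake ** 2)

def g_loss(d_fake):
    # Generator: push D(G(z)) toward 1.
    return np.mean((d_fake - 1.0) ** 2)

# Where real and fake are equally likely, the discriminator minimizes
# (a - 1)^2 + a^2 over its output a; grid search finds the optimum at 0.5.
a = np.linspace(0.0, 1.0, 100001)
a_best = a[np.argmin((a - 1.0) ** 2 + a ** 2)]

print(d_loss(np.array([0.9]), np.array([0.1])), g_loss(np.array([0.1])), a_best)
```

Unlike the sigmoid cross-entropy loss, the quadratic penalty keeps growing with the distance from the target, so even samples far on the "correct" side of the boundary contribute gradients.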
LSGAN
• Supposedly produces higher-quality images
X. Mao, Q. Li, H. Xie, R. Y. Lau, Z. Wang, and S. P. Smolley, Least squares generative adversarial networks, ICCV 2017
LSGAN
• Supposedly produces higher-quality images
• More stable and resistant to mode collapse
X. Mao, Q. Li, H. Xie, R. Y. Lau, Z. Wang, and S. P. Smolley, Least squares generative adversarial networks, ICCV 2017
GAN explosion
https://github.com/hindupuravinash/the-gan-zoo
How to evaluate GANs?
• Showing pictures of samples is not enough, especially for simpler datasets like MNIST, CIFAR, faces, bedrooms, etc.
• We cannot directly compute the likelihoods of high-dimensional samples (real or generated), or compare their distributions
• Many GAN approaches claim mainly to improve stability, which is hard to evaluate
• For discussion, see Ian Goodfellow’s Twitter thread
GAN evaluation: Human studies
• Turing test
T. Salimans, I. Goodfellow, W. Zaremba, V. Cheung, A. Radford, X. Chen, Improved techniques for training GANs, NIPS 2016
GAN evaluation: Inception score
• Key idea: generators should produce images with a variety of recognizable object classes
• Defined as IS(G) = exp(E_{x~p_G} KL(p(y|x) ∥ p(y))), where p(y|x) is the posterior label distribution returned by an image classifier (e.g., InceptionNet) for sample x
  • If x contains a recognizable object, the entropy of p(y|x) should be low
  • If the generator generates images of diverse objects, the marginal distribution p(y) should have high entropy
• Disadvantage: a GAN that simply memorizes the training data (overfitting) or outputs a single image per class (mode dropping) could still score well
T. Salimans, I. Goodfellow, W. Zaremba, V. Cheung, A. Radford, X. Chen, Improved techniques for training GANs, NIPS 2016
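The definition is short enough to implement directly. In practice p(y|x) comes from InceptionNet over many samples; the hypothetical 4-class posteriors below just exercise the two extremes named on the slide (confident-and-diverse vs. uninformative):

```python
import numpy as np

def inception_score(p_yx):
    # p_yx: (num_samples, num_classes) array of posteriors p(y|x).
    p_y = p_yx.mean(axis=0)  # marginal label distribution p(y)
    # KL(p(y|x) || p(y)) per sample, then exp of the mean:
    kl = np.sum(p_yx * (np.log(p_yx) - np.log(p_y)), axis=1)
    return np.exp(kl.mean())

# Confident and diverse: each sample strongly predicts a different class.
confident = np.full((4, 4), 0.01)
np.fill_diagonal(confident, 0.97)

# Uninformative: every posterior equals the uniform marginal.
uniform = np.full((4, 4), 0.25)

print(inception_score(confident), inception_score(uniform))
```

The score is bounded between 1 (posteriors identical to the marginal) and the number of classes (perfectly confident and perfectly diverse), which is why the confident case scores near 4 here and the uniform case scores exactly 1.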
GAN evaluation: Fréchet Inception Distance
• Key idea: fit simple distributions to statistics of feature activations for real and generated data; estimate divergence parametrically
  • Pass generated samples through a network (InceptionNet) and compute activations for a chosen layer
  • Estimate the multivariate mean and covariance of the activations, and compute the Fréchet distance to those of the real data
• Advantages: correlated with visual quality of samples and human judgment; can detect mode dropping
• Disadvantage: cannot detect overfitting
M. Heusel, H. Ramsauer, T. Unterthiner, B. Nessler, S. Hochreiter, GANs trained by a two time-scale update rule converge to a local Nash equilibrium, NIPS 2017
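The Fréchet distance between the two fitted Gaussians is ‖μ₁ − μ₂‖² + Tr(Σ₁ + Σ₂ − 2(Σ₁Σ₂)^{1/2}); the real FID evaluates this on multivariate Inception activations (with a matrix square root). The sketch below uses the univariate special case, with scalar "features" standing in for Inception activations, so everything reduces to elementary operations:

```python
import numpy as np

def frechet_distance_1d(mu1, var1, mu2, var2):
    # Univariate special case of the Fréchet distance between Gaussians:
    # (mu1 - mu2)^2 + var1 + var2 - 2 * sqrt(var1 * var2)
    return (mu1 - mu2) ** 2 + var1 + var2 - 2.0 * np.sqrt(var1 * var2)

def fid(a, b):
    # Fit a Gaussian to each set of scalar "features" and compare.
    return frechet_distance_1d(a.mean(), a.var(), b.mean(), b.var())

rng = np.random.default_rng(0)
real_feats = rng.normal(0.0, 1.0, 10000)
good_feats = rng.normal(0.0, 1.0, 10000)  # well-matched "generator"
bad_feats = rng.normal(2.0, 0.5, 10000)   # biased, low-variance "generator"

print(fid(real_feats, good_feats), fid(real_feats, bad_feats))
```

The well-matched generator scores near zero while the biased one does not; a generator that drops modes shrinks the fitted variance relative to the real data, which this distance picks up (the Inception score would not).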
Are GANs created equal?
• From the abstract:
“We find that most models can reach similar scores with enough hyperparameter optimization and random restarts. This suggests that improvements can arise from a higher computational budget and tuning more than fundamental algorithmic changes … We did not find evidence that any of the tested algorithms consistently outperforms the non-saturating GAN introduced in Goodfellow et al. (2014)”
M. Lucic, K. Kurach, M. Michalski, O. Bousquet, S. Gelly, Are GANs created equal? A large-scale study, NIPS 2018