
Multi-objective training of Generative Adversarial Networks with multiple discriminators

Isabela Albuquerque∗, Joao Monteiro∗, Thang Doan, Breandan Considine, Tiago Falk, and Ioannis Mitliagkas

∗Equal contribution

1 / 11

The multiple discriminators GAN setting

- Recent literature has proposed to tackle GAN training instability* issues with multiple discriminators (Ds):

1. Generative multi-adversarial networks, Durugkar et al. (2016)
2. Stabilizing GANs training with multiple random projections, Neyshabur et al. (2017)
3. Online Adaptative Curriculum Learning for GANs, Doan et al. (2018)
4. Domain Partitioning Network, Csaba et al. (2019)

*Mode-collapse or vanishing gradients

2 / 11

The multiple discriminators GAN setting

3 / 11

Our work

min L_G(z) = [l_1(z), l_2(z), ..., l_K(z)]^T

- Each l_k = -E_{z~p_z}[log D_k(G(z))] is the loss provided by the k-th discriminator (see the sketch below)
- Multiple gradient descent (MGD) is a natural choice to solve this problem
- But it might be too costly
- Alternative: maximize the hypervolume (HV) of a single solution

4 / 11
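As a rough illustration of the per-discriminator losses above, here is a minimal PyTorch-style sketch; the module and variable names (G, Ds, generator_losses) are hypothetical and not taken from the paper's code:

```python
import torch

def generator_losses(G, Ds, z):
    """Per-discriminator generator losses l_k = -E_{z~p_z}[log D_k(G(z))].

    G and the discriminators in Ds are assumed to be PyTorch modules whose
    outputs are probabilities in (0, 1); all names here are illustrative.
    """
    fake = G(z)
    eps = 1e-8  # avoid log(0)
    return torch.stack([-torch.log(D(fake) + eps).mean() for D in Ds])

# Usage sketch: K = 3 hypothetical discriminators, 100-dim latent prior
# losses = generator_losses(G, [D1, D2, D3], torch.randn(64, 100))
# 'losses' is the vector [l_1, l_2, l_3] treated as a multi-objective target.
```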

Multiple gradient descent

- Seeks a Pareto-stationary solution
- Two steps:
  1. Find a common descent direction for all l_k
     1.1 Minimum-norm element within the convex hull of all ∇l_k(x) (sketched after this slide)
  2. Update the parameters with x_{t+1} = x_t - λ w_t*/||w_t*||, where

     w_t* = argmin ||w||^2,  w = Σ_{k=1}^K α_k ∇l_k(x_t),  s.t. Σ_{k=1}^K α_k = 1, α_k ≥ 0 ∀k

5 / 11
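The minimum-norm element in step 1.1 can be approximated with a small Frank-Wolfe-style solver over the simplex of α weights, as in MGDA-type methods. The sketch below is a simplified illustration under that assumption; the function name and iteration budget are not the authors' implementation:

```python
import torch

def min_norm_direction(grads, iters=50):
    """Frank-Wolfe-style sketch of MGD's inner problem: the minimum-norm
    element of the convex hull of the per-loss gradients (rows of `grads`,
    each gradient flattened to a vector)."""
    K = grads.shape[0]
    alpha = torch.full((K,), 1.0 / K)               # start from the uniform combination
    for _ in range(iters):
        w = alpha @ grads                           # current point in the hull
        i = torch.argmin(grads @ w)                 # vertex minimizing <g_i, w>
        d = grads[i] - w
        gamma = (-(w @ d) / (d @ d).clamp_min(1e-12)).clamp(0.0, 1.0)
        alpha = (1 - gamma) * alpha                 # exact line search toward vertex i
        alpha[i] = alpha[i] + gamma
    return alpha, alpha @ grads

# Parameter update (step 2), for a flat parameter vector x and step size lam:
# alpha, w = min_norm_direction(torch.stack([g1, g2, g3]))
# x = x - lam * w / w.norm()
```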

Hypervolume maximization for training GANs

L_G = -log( Π_{k=1}^K (η - l_k) )  =  -Σ_{k=1}^K log(η - l_k)

∂L_G/∂θ = Σ_{k=1}^K 1/(η - l_k) ∂l_k/∂θ

η_t = δ max_k{l_k^t},  δ > 1

[Figure: loss plane with axes L_D1 and L_D2, a solution with losses l_1 and l_2, reference points η and η*, and the region labelled L_G]

6 / 11
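A minimal PyTorch-style sketch of this generator loss, assuming the per-discriminator losses are already collected in a vector; detaching η and using δ = 1.1 are illustrative assumptions, not taken from the paper's code:

```python
import torch

def hypervolume_loss(losses, delta=1.1):
    """Single-solution hypervolume-style generator loss from the slide:
    L_G = -sum_k log(eta - l_k), with eta_t = delta * max_k l_k^t (delta > 1).

    `losses` is the vector [l_1, ..., l_K]; eta is detached so gradients
    flow only through the individual losses.
    """
    eta = delta * losses.max().detach()
    return -torch.log(eta - losses + 1e-8).sum()

# Gradient view, matching dL_G/dtheta = sum_k 1/(eta - l_k) * dl_k/dtheta:
# discriminators the generator currently does worst against (l_k close to eta)
# receive proportionally larger weight in the update.
```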

MGD vs. HV maximization vs. Average loss minimization

- MGD seeks a Pareto-stationary solution
  - x_{t+1} ≺ x_t
- HV maximization seeks Pareto-optimal solutions
  - HV(x_{t+1}) > HV(x_t)
  - For the single-solution case, central regions of the Pareto front are preferred
- Average loss minimization does not enforce equally good individual losses
  - Might be problematic in case there is a trade-off between discriminators (see the toy comparison below)

7 / 11
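To make the last point concrete, here is a tiny numerical comparison of the gradient weights implied by average loss minimization versus HV maximization; the loss values and δ = 1.1 are hypothetical:

```python
import torch

# One discriminator the generator already beats (l = 0.1), one it is losing to (l = 2.0).
l = torch.tensor([0.1, 2.0])

avg_weights = torch.full_like(l, 1.0 / len(l))   # average loss: weight 0.5 on each gradient
eta = 1.1 * l.max()                              # eta = 2.2
hv_weights = 1.0 / (eta - l)                     # ~0.48 vs 5.0

# HV maximization concentrates the generator update on the discriminator it is
# currently losing to, while the plain average treats both losses equally.
```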

MNIST

- Same architecture, hyperparameters, and initialization for all methods
- 8 Ds, 100 epochs
- FID was calculated using a LeNet trained on MNIST until 98% test accuracy (a sketch of the Fréchet distance computation follows this slide)

[Figures: FID - MNIST per model (AVG, GMAN, HV, MGD); best FID achieved during training vs. wall-clock time until best FID (minutes) for HV, GMAN, MGD, and AVG]

8 / 11
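For reference, FID is the Fréchet distance between Gaussian fits of real and generated feature statistics; the sketch below assumes the features come from the MNIST-trained LeNet mentioned above, and the function name and NumPy/SciPy usage are illustrative rather than the authors' code:

```python
import numpy as np
from scipy import linalg

def frechet_distance(feats_real, feats_fake):
    """Frechet distance between Gaussian fits of two feature matrices
    (rows = samples), e.g. penultimate-layer LeNet features."""
    mu_r, cov_r = feats_real.mean(axis=0), np.cov(feats_real, rowvar=False)
    mu_f, cov_f = feats_fake.mean(axis=0), np.cov(feats_fake, rowvar=False)
    covmean = linalg.sqrtm(cov_r @ cov_f)
    if np.iscomplexobj(covmean):        # small imaginary parts are a sqrtm artifact
        covmean = covmean.real
    diff = mu_r - mu_f
    return float(diff @ diff + np.trace(cov_r + cov_f - 2.0 * covmean))
```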

Upscaled CIFAR-10 - Computational cost

- Different GANs trained with both 1 and 24 Ds + HV
- Same architecture and initialization for all methods
- Comparison of minimum FID obtained during training, along with computational cost in terms of time and space

Model      # Disc.   FID-ResNet   FLOPS*   Memory
DCGAN      1         4.22         8e10     1292
DCGAN      24        1.89         5e11     5671
LSGAN      1         4.55         8e10     1303
LSGAN      24        1.91         5e11     5682
HingeGAN   1         6.17         8e10     1303
HingeGAN   24        2.25         5e11     5682

*Floating-point operations per second

- Additional cost → performance improvement

9 / 11

Cats 256 × 256

10 / 11

Thank you!

Questions? Come to our poster! #4

Code: https://github.com/joaomonteirof/hGAN

11 / 11
