
Multi-objective training of Generative Adversarial Networks with multiple discriminators

Isabela Albuquerque∗, Joao Monteiro∗, Thang Doan, Breandan Considine, Tiago Falk, and Ioannis Mitliagkas

∗Equal contribution

1 / 11

The multiple discriminators GAN setting

- Recent literature has proposed to tackle GAN training instability* issues with multiple discriminators (Ds):

1. Generative multi-adversarial networks, Durugkar et al. (2016)
2. Stabilizing GANs training with multiple random projections, Neyshabur et al. (2017)
3. Online Adaptative Curriculum Learning for GANs, Doan et al. (2018)
4. Domain Partitioning Network, Csaba et al. (2019)

*Mode-collapse or vanishing gradients

2 / 11

The multiple discriminators GAN setting

3 / 11

Our work

min L_G(z) = [l_1(z), l_2(z), ..., l_K(z)]^T

- Each l_k = -E_{z~p_z}[log D_k(G(z))] is the loss provided by the k-th discriminator (see the sketch below)
- Multiple gradient descent (MGD) is a natural choice to solve this problem
- But it might be too costly
- Alternative: maximize the hypervolume (HV) of a single solution

4 / 11
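As a rough illustration of the per-discriminator losses above, here is a minimal PyTorch-style sketch; the module and variable names (G, Ds, generator_losses) are hypothetical and not taken from the paper's code:

```python
import torch

def generator_losses(G, Ds, z):
    """Per-discriminator generator losses l_k = -E_{z~p_z}[log D_k(G(z))].

    G and the discriminators in Ds are assumed to be PyTorch modules whose
    outputs are probabilities in (0, 1); all names here are illustrative.
    """
    fake = G(z)
    eps = 1e-8  # avoid log(0)
    return torch.stack([-torch.log(D(fake) + eps).mean() for D in Ds])

# Usage sketch: K = 3 hypothetical discriminators, 100-dim latent prior
# losses = generator_losses(G, [D1, D2, D3], torch.randn(64, 100))
# 'losses' is the vector [l_1, l_2, l_3] treated as a multi-objective target.
```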

Multiple gradient descent

- Seeks a Pareto-stationary solution
- Two steps:
  1. Find a common descent direction for all l_k
     1.1 Minimum-norm element within the convex hull of all ∇l_k(x) (sketched after this slide)
  2. Update the parameters with x_{t+1} = x_t - λ w_t*/||w_t*||, where

     w_t* = argmin ||w||^2,  w = Σ_{k=1}^K α_k ∇l_k(x_t),  s.t. Σ_{k=1}^K α_k = 1, α_k ≥ 0 ∀k

5 / 11
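The minimum-norm element in step 1.1 can be approximated with a small Frank-Wolfe-style solver over the simplex of α weights, as in MGDA-type methods. The sketch below is a simplified illustration under that assumption; the function name and iteration budget are not the authors' implementation:

```python
import torch

def min_norm_direction(grads, iters=50):
    """Frank-Wolfe-style sketch of MGD's inner problem: the minimum-norm
    element of the convex hull of the per-loss gradients (rows of `grads`,
    each gradient flattened to a vector)."""
    K = grads.shape[0]
    alpha = torch.full((K,), 1.0 / K)               # start from the uniform combination
    for _ in range(iters):
        w = alpha @ grads                           # current point in the hull
        i = torch.argmin(grads @ w)                 # vertex minimizing <g_i, w>
        d = grads[i] - w
        gamma = (-(w @ d) / (d @ d).clamp_min(1e-12)).clamp(0.0, 1.0)
        alpha = (1 - gamma) * alpha                 # exact line search toward vertex i
        alpha[i] = alpha[i] + gamma
    return alpha, alpha @ grads

# Parameter update (step 2), for a flat parameter vector x and step size lam:
# alpha, w = min_norm_direction(torch.stack([g1, g2, g3]))
# x = x - lam * w / w.norm()
```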

Hypervolume maximization for training GANs

L_G = -log( Π_{k=1}^K (η - l_k) )  =  -Σ_{k=1}^K log(η - l_k)

∂L_G/∂θ = Σ_{k=1}^K 1/(η - l_k) ∂l_k/∂θ

η_t = δ max_k{l_k^t},  δ > 1

[Figure: loss plane with axes L_D1 and L_D2, a solution with losses l_1 and l_2, reference points η and η*, and the region labelled L_G]

6 / 11
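A minimal PyTorch-style sketch of this generator loss, assuming the per-discriminator losses are already collected in a vector; detaching η and using δ = 1.1 are illustrative assumptions, not taken from the paper's code:

```python
import torch

def hypervolume_loss(losses, delta=1.1):
    """Single-solution hypervolume-style generator loss from the slide:
    L_G = -sum_k log(eta - l_k), with eta_t = delta * max_k l_k^t (delta > 1).

    `losses` is the vector [l_1, ..., l_K]; eta is detached so gradients
    flow only through the individual losses.
    """
    eta = delta * losses.max().detach()
    return -torch.log(eta - losses + 1e-8).sum()

# Gradient view, matching dL_G/dtheta = sum_k 1/(eta - l_k) * dl_k/dtheta:
# discriminators the generator currently does worst against (l_k close to eta)
# receive proportionally larger weight in the update.
```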

MGD vs. HV maximization vs. Average loss minimization

- MGD seeks a Pareto-stationary solution
  - x_{t+1} ≺ x_t
- HV maximization seeks Pareto-optimal solutions
  - HV(x_{t+1}) > HV(x_t)
  - For the single-solution case, central regions of the Pareto front are preferred
- Average loss minimization does not enforce equally good individual losses
  - Might be problematic in case there is a trade-off between discriminators (see the toy comparison below)

7 / 11
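To make the last point concrete, here is a tiny numerical comparison of the gradient weights implied by average loss minimization versus HV maximization; the loss values and δ = 1.1 are hypothetical:

```python
import torch

# One discriminator the generator already beats (l = 0.1), one it is losing to (l = 2.0).
l = torch.tensor([0.1, 2.0])

avg_weights = torch.full_like(l, 1.0 / len(l))   # average loss: weight 0.5 on each gradient
eta = 1.1 * l.max()                              # eta = 2.2
hv_weights = 1.0 / (eta - l)                     # ~0.48 vs 5.0

# HV maximization concentrates the generator update on the discriminator it is
# currently losing to, while the plain average treats both losses equally.
```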

MNIST

- Same architecture, hyperparameters, and initialization for all methods
- 8 Ds, 100 epochs
- FID was calculated using a LeNet trained on MNIST until 98% test accuracy (a sketch of the Fréchet distance computation follows this slide)

[Figures: FID - MNIST per model (AVG, GMAN, HV, MGD); best FID achieved during training vs. wall-clock time until best FID (minutes) for HV, GMAN, MGD, and AVG]

8 / 11
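For reference, FID is the Fréchet distance between Gaussian fits of real and generated feature statistics; the sketch below assumes the features come from the MNIST-trained LeNet mentioned above, and the function name and NumPy/SciPy usage are illustrative rather than the authors' code:

```python
import numpy as np
from scipy import linalg

def frechet_distance(feats_real, feats_fake):
    """Frechet distance between Gaussian fits of two feature matrices
    (rows = samples), e.g. penultimate-layer LeNet features."""
    mu_r, cov_r = feats_real.mean(axis=0), np.cov(feats_real, rowvar=False)
    mu_f, cov_f = feats_fake.mean(axis=0), np.cov(feats_fake, rowvar=False)
    covmean = linalg.sqrtm(cov_r @ cov_f)
    if np.iscomplexobj(covmean):        # small imaginary parts are a sqrtm artifact
        covmean = covmean.real
    diff = mu_r - mu_f
    return float(diff @ diff + np.trace(cov_r + cov_f - 2.0 * covmean))
```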

Upscaled CIFAR-10 - Computational cost

- Different GANs trained with both 1 and 24 Ds + HV
- Same architecture and initialization for all methods
- Comparison of minimum FID obtained during training, along with computational cost in terms of time and space

Model      # Disc.   FID-ResNet   FLOPS*   Memory
DCGAN      1         4.22         8e10     1292
DCGAN      24        1.89         5e11     5671
LSGAN      1         4.55         8e10     1303
LSGAN      24        1.91         5e11     5682
HingeGAN   1         6.17         8e10     1303
HingeGAN   24        2.25         5e11     5682

*Floating-point operations per second

- Additional cost → performance improvement

9 / 11

Cats 256 × 256

10 / 11

Thank you!

Questions? Come to our poster! #4

Code: https://github.com/joaomonteirof/hGAN

11 / 11
