Multi-objective training of Generative Adversarial Networks with multiple discriminators
Isabela Albuquerque∗, Joao Monteiro∗, Thang Doan, Breandan Considine, Tiago Falk, and Ioannis Mitliagkas
∗Equal contribution
1 / 11
The multiple discriminators GAN setting
- Recent literature proposed to tackle GANs training instability* issues with multiple discriminators (Ds)
  1. Generative multi-adversarial networks, Durugkar et al. (2016)
  2. Stabilizing GANs training with multiple random projections, Neyshabur et al. (2017)
  3. Online Adaptative Curriculum Learning for GANs, Doan et al. (2018)
  4. Domain Partitioning Network, Csaba et al. (2019)
*Mode-collapse or vanishing gradients
2 / 11
The multiple discriminators GAN setting
3 / 11
Our work

min L_G(z) = [l_1(z), l_2(z), ..., l_K(z)]^T

- Each l_k = −E_{z∼p_z} log D_k(G(z)) is the loss provided by the k-th discriminator
- Multiple gradient descent (MGD) is a natural choice to solve this problem
- But it might be too costly
- Alternative: maximize the hypervolume (HV) of a single solution

4 / 11
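The loss vector above can be sketched numerically. The affine "generator" and linear-sigmoid "discriminators" below are toy stand-ins, assumptions for illustration only, not the paper's architectures:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy stand-ins (assumptions for illustration, not the paper's networks):
# a fixed affine "generator" and K = 3 linear-sigmoid "discriminators".
W = rng.standard_normal((4, 2))
disc_weights = rng.standard_normal((3, 2))

def generator(z):
    return np.tanh(z @ W)

def discriminator(x, k):
    return 1.0 / (1.0 + np.exp(-(x @ disc_weights[k])))  # output in (0, 1)

z = rng.standard_normal((64, 4))          # z ~ p_z, here a standard normal
fake = generator(z)

# The generator's objective vector: l_k = -E_z[log D_k(G(z))]
loss_vector = np.array([-np.mean(np.log(discriminator(fake, k)))
                        for k in range(3)])
```

Each entry is strictly positive, since every D_k outputs a value in (0, 1); minimizing the vector jointly is the multi-objective problem above.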
Multiple gradient descent

- Seeks a Pareto-stationary solution
- Two steps:
  1. Find a common descent direction for all l_k
     1.1 Minimum-norm element within the convex hull of all ∇l_k(x)
  2. Update the parameters with x_{t+1} = x_t − λ w*_t / ||w*_t||, where

     w*_t = argmin ||w||^2,  w = ∑_{k=1}^{K} α_k ∇l_k(x_t),
     s.t. ∑_{k=1}^{K} α_k = 1,  α_k ≥ 0 ∀k
5 / 11
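A minimal NumPy sketch of the two steps for the special case K = 2, where the minimum-norm point in the convex hull of two gradients has a closed form (function names are mine, not from the paper):

```python
import numpy as np

def min_norm_point_2(g1, g2):
    """Minimum-norm element of the convex hull of two gradients.

    Closed-form K = 2 case of the quadratic program on the slide:
    min_{a in [0, 1]} ||a*g1 + (1 - a)*g2||^2.
    """
    diff = g1 - g2
    denom = diff @ diff
    if denom == 0.0:                  # identical gradients: any alpha works
        return g1.astype(float)
    a = np.clip((g2 - g1) @ g2 / denom, 0.0, 1.0)
    return a * g1 + (1.0 - a) * g2

def mgd_step(x, g1, g2, lr=0.1):
    """One MGD update: x_{t+1} = x_t - lr * w* / ||w*||."""
    w = min_norm_point_2(g1, g2)
    norm = np.linalg.norm(w)
    if norm == 0.0:                   # Pareto-stationary: no common descent direction
        return x
    return x - lr * w / norm
```

For K > 2 the same quadratic program is typically solved iteratively (e.g. Frank-Wolfe style), which is part of why MGD can be costly per step.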
Hypervolume maximization for training GANs

[Figure: the (L_D1, L_D2) loss plane, showing the hypervolume of a single solution (l_1, l_2) with respect to the nadir point η, with η* also marked]

L_G = −log( ∏_{k=1}^{K} (η − l_k) ) = −∑_{k=1}^{K} log(η − l_k)

∂L_G/∂θ = ∑_{k=1}^{K} [1/(η − l_k)] ∂l_k/∂θ

η_t = δ max_k{l_k^t},  δ > 1
6 / 11
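A minimal sketch of the hypervolume loss with its adaptive nadir point, assuming all l_k are positive (as −E log D_k losses are); the function name is illustrative:

```python
import numpy as np

def hypervolume_loss(losses, delta=1.05):
    """L_G = -sum_k log(eta - l_k), with the adaptive nadir point
    eta_t = delta * max_k l_k from the slide. delta > 1 keeps every
    eta - l_k strictly positive when the losses are positive.
    """
    losses = np.asarray(losses, dtype=float)
    eta = delta * losses.max()
    return -np.log(eta - losses).sum(), eta
```

Note the gradient that falls out of this loss: each ∂l_k/∂θ is scaled by 1/(η − l_k), so discriminators whose losses sit closest to the nadir point contribute most to the update.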
MGD vs. HV maximization vs. Average loss minimization
- MGD seeks a Pareto-stationary solution
  - x_{t+1} ≺ x_t
- HV maximization seeks Pareto-optimal solutions
  - HV(x_{t+1}) > HV(x_t)
  - For the single-solution case, central regions of the Pareto-front are preferred
- Average loss minimization does not enforce equally good individual losses
  - Might be problematic in case there is a trade-off between discriminators
7 / 11
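A tiny numeric check of the trade-off point above, using made-up loss values: averaging weights every ∂l_k/∂θ by 1/K, whereas the HV weight 1/(η − l_k) concentrates on the currently worst discriminator:

```python
import numpy as np

losses = np.array([0.5, 1.0, 3.0])   # hypothetical per-discriminator losses
eta = 1.1 * losses.max()             # adaptive nadir point with delta = 1.1

avg_weights = np.full(losses.size, 1.0 / losses.size)  # average-loss scheme
hv_weights = 1.0 / (eta - losses)                      # HV scheme: 1/(eta - l_k)

# The worst discriminator (l_3 = 3.0) dominates the HV update direction,
# while averaging treats all three losses identically.
```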
MNIST
- Same architecture, hyperparameters, and initialization for all methods
- 8 Ds, 100 epochs
- FID was calculated using a LeNet trained on MNIST until 98% test accuracy
[Figure: left, box plots of FID on MNIST for AVG, GMAN, HV, and MGD; right, best FID achieved during training vs. wall-clock time until best FID (minutes) for HV, GMAN, MGD, and AVG]
8 / 11
Upscaled CIFAR-10 - Computational cost
- Different GANs with both 1 and 24 Ds + HV
- Same architecture and initialization for all methods
- Comparison of minimum FID obtained during training, along with computational cost in terms of time and space

  Model      # Disc.   FID-ResNet   FLOPS*   Memory
  DCGAN      1         4.22         8e10     1292
  DCGAN      24        1.89         5e11     5671
  LSGAN      1         4.55         8e10     1303
  LSGAN      24        1.91         5e11     5682
  HingeGAN   1         6.17         8e10     1303
  HingeGAN   24        2.25         5e11     5682
  *Floating point operations per second

- Additional cost → performance improvement
9 / 11
Cats 256 × 256
10 / 11
Thank you!
Questions? Come to our poster! #4
Code: https://github.com/joaomonteirof/hGAN
11 / 11