Transcript
Page 1

Generative Adversarial Network

Seoul National University, Deep Learning, September-December 2019

Page 2

Generative adversarial network (Goodfellow, 2014)

Since the publication of GAN by Goodfellow (2014), many applications have been reported.

Page 3

Generative adversarial network (Goodfellow, 2014)

Since the publication of GAN by Goodfellow (2014), many variants of GAN have been published.

Page 4

Generative adversarial network

Similar setup as in VAE: attempt to generate x given z, where z has a smaller dimension.

Generative adversarial networks are based on a game-theoretic scenario in which the generator network must compete against an adversary.

The generator network produces samples x = g(z; θ_g) that attempt to fool the classifier into believing its samples are real. Its adversary, the discriminator network, attempts to distinguish between samples drawn from the training data and samples drawn from the generator through P(y = 1|x) = D(x).

Page 5

Generative adversarial network: Setup

Let y = 1 if the data is real and y = 0 if the data is fake. We assume that there is a lower-dimensional representation z of x.

To generate data, one needs to know p(x | y = 1).

If P(y = 1) = P(y = 0) = 0.5, we have

$$P(y = 1 \mid x) = \frac{p(x \mid y = 1)}{p(x \mid y = 1) + p(x \mid y = 0)}$$

In GAN, we specify P(y = 1|x) and p(x | y = 0), and estimate p(x | y = 1) by minimizing a distance between p(x | y = 0) and p(x | y = 1).

Page 6

Generative adversarial network

The likelihood function based on y | x ~ Ber(D(x)) is

$$\sum_{i=1}^{n} y_i \log D(x_i; \theta_d) + (1 - y_i) \log\big(1 - D(x_i; \theta_d)\big)$$

However, the x_i's are not observed for y_i = 0, and we replace x_i with g(z_i; θ_g). Since z is not observed for y = 0, the marginal likelihood is

$$L(\theta_d, \theta_g) = \prod_{i=1}^{n} D(x_i; \theta_d)^{y_i} \int p(z_i)\,\big(1 - D(g(z_i; \theta_g); \theta_d)\big)^{1 - y_i}\, dz_i$$

Page 7

Generative adversarial network: Discriminator and Generator

Consider the following quantity:

$$v(\theta_g, \theta_d) = \mathbb{E}_{x \sim p_{\text{data}}} \log D(x; \theta_d) + \mathbb{E}_{x \sim p_{\text{model}}} \log\big[1 - D(g(z; \theta_g); \theta_d)\big],$$

where p_data ≡ p(x | y = 1) and p_model ≡ p(x | y = 0). Note that this is not the expected likelihood in the usual sense.

Optimization (a toy sketch follows):

Discriminator: Maximize v(θ_g, θ_d) over θ_d given θ_g.
Generator: Minimize max_{θ_d} v(θ_g, θ_d) over θ_g.
Alternate Discriminator and Generator steps.
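A toy sketch of this alternating scheme (not the lecture's code; the fully connected nets, Adam settings, and the 1-D Gaussian stand-in for p_data are illustrative assumptions):

```python
import torch
import torch.nn as nn

z_dim, x_dim = 8, 2
G = nn.Sequential(nn.Linear(z_dim, 64), nn.ReLU(), nn.Linear(64, x_dim))
D = nn.Sequential(nn.Linear(x_dim, 64), nn.ReLU(), nn.Linear(64, 1), nn.Sigmoid())
opt_d = torch.optim.Adam(D.parameters(), lr=1e-4)
opt_g = torch.optim.Adam(G.parameters(), lr=1e-4)
bce = nn.BCELoss()

def real_batch(n=128):                      # stand-in for samples from p_data
    return torch.randn(n, x_dim) + 3.0

for step in range(1000):
    x, z = real_batch(), torch.randn(128, z_dim)
    # Discriminator step: maximize v(theta_g, theta_d) over theta_d,
    # i.e. minimize -[log D(x) + log(1 - D(g(z)))]
    d_loss = bce(D(x), torch.ones(128, 1)) + bce(D(G(z).detach()), torch.zeros(128, 1))
    opt_d.zero_grad(); d_loss.backward(); opt_d.step()
    # Generator step: minimize log(1 - D(g(z))) over theta_g
    g_loss = torch.log(1.0 - D(G(z)) + 1e-8).mean()
    opt_g.zero_grad(); g_loss.backward(); opt_g.step()
```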


Page 8

Generative adversarial network

Discriminator: Maximizing v(θ_g, θ_d) over θ_d estimates the discriminator. When the function space of D(x) is not restricted, the argmax of v over D is

$$D^*(x) = \frac{p_{\text{data}}}{p_{\text{data}} + p_{\text{model}}}.$$

Generator: Plugging P(y = 1 | x) = D^*(x) = p(x|y=1) / (p(x|y=1) + p(x|y=0)) into v(θ_g, θ_d), we minimize

$$v(\theta_g, \theta_d) = \int p(x \mid y = 1) \log \frac{p(x \mid y = 1)}{p(x \mid y = 1) + p(x \mid y = 0)}\, dx + \int p(x \mid y = 0) \log \frac{p(x \mid y = 0)}{p(x \mid y = 1) + p(x \mid y = 0)}\, dx$$

$$= KL(p_{\text{data}} \,\|\, p^{**}) + KL(p_{\text{model}} \,\|\, p^{**}) + \text{const},$$

where p^{**} = (p_data + p_model)/2.
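A quick numeric check of this identity (a sketch; the two 5-point discrete distributions are made up, and JS is taken without the 1/2 factors, as it is defined on a later slide, so that v(D^*) = JS - log 4):

```python
import numpy as np

rng = np.random.default_rng(0)
p_data = rng.random(5); p_data /= p_data.sum()     # arbitrary discrete p_data
p_model = rng.random(5); p_model /= p_model.sum()  # arbitrary discrete p_model
p_mix = (p_data + p_model) / 2                     # p** above

d_star = p_data / (p_data + p_model)               # optimal discriminator D*(x)
v_at_d_star = np.sum(p_data * np.log(d_star)) + np.sum(p_model * np.log(1 - d_star))

kl = lambda p, q: np.sum(p * np.log(p / q))
js = kl(p_data, p_mix) + kl(p_model, p_mix)        # JS without the 1/2 factors

assert np.isclose(v_at_d_star, js - np.log(4))
print(v_at_d_star, js - np.log(4))                 # identical up to rounding
```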


Page 9

Generative adversarial network

source: https://poloclub.github.io/ganlab/


Page 10

AE, VAE and GAN

Estimation of distribution:  AE: transformation | VAE: conditional distribution | GAN: distribution through transformation
Specifying transformation:   AE: x = g(z; θ_g), z = f(x; θ_f) | VAE: none | GAN: x = g(z; θ_g)
Specifying distributions:    AE: none | VAE: p(z), p(x|z), q(z|x) | GAN: p(z), and thus indirectly p(x)
Objective function:          AE: ‖x − g(f(x; θ_f); θ_g)‖² | VAE: KL divergence | GAN: Jensen-Shannon divergence

Page 11

GAN algorithm


Page 12

Implementation of GAN

In practice, D(x) is restricted to a neural network. We first maximize v(θ_g, θ_d) over θ_d to obtain θ_d^*. Then

$$\theta_g^* = \arg\min_{\theta_g} \mathbb{E}_{x \sim p_{\text{model}}} \log\big[1 - D(g(z; \theta_g); \theta_d^*)\big]$$

The i-th contribution to this expectation is stochastically evaluated by

$$\frac{1}{M} \sum_{m=1}^{M} \log\big[1 - D(g(z_i^{(m)}; \theta_g); \theta_d^*)\big],$$

where z_i^{(m)} is generated from p(z).

In practice, minimizing $\mathbb{E}_{x \sim p_{\text{model}}} \log[1 - D(g(z; \theta_g); \theta_d^*)]$ does not work well. Instead, one aims to maximize $\mathbb{E}_{x \sim p_{\text{model}}} \log[D(g(z; \theta_g); \theta_d^*)]$ over θ_g.
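A sketch contrasting the two generator losses (the near-zero d_fake tensor is a stand-in for D(g(z)) early in training, when the discriminator easily rejects fakes):

```python
import torch

d_fake = torch.rand(128, 1) * 1e-3  # stand-in for D(g(z)) early in training, near 0

# Saturating loss from the minimax game: the gradient of log(1 - D) is tiny when D ~ 0.
g_loss_saturating = torch.log(1.0 - d_fake).mean()

# Non-saturating heuristic: minimize -log D(g(z)) instead; same fixed point,
# much larger gradients when the discriminator easily rejects fake samples.
g_loss_nonsaturating = -torch.log(d_fake + 1e-8).mean()
print(g_loss_saturating.item(), g_loss_nonsaturating.item())
```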


Page 13

GAN: Comments

In a nutshell, GAN finds p(x | y = 0), i.e. p_model, by minimizing the KL divergence between p_model and (p_data + p_model)/2. Therefore the objective function is KL(p_model ‖ p^{**}).

The GAN algorithm represents one example of how to minimize KL(p_model ‖ p^{**}), namely finding P(y = 1|x) indexed by θ_d through the 'discriminator'.

Game-theoretic arguments may be oversold, since they are not essential for estimating the density. The role of the discriminator is to determine the loss function for the generator, KL(p_data ‖ p^{**}) + KL(p_model ‖ p^{**}), the Jensen-Shannon divergence.

The Jensen-Shannon divergence has an advantage over the KL divergence in that one can avoid the problem of non-overlapping supports.

Page 14

Detailed Architecture of GAN: DCGAN (Radford et al., 2016, ICLR)

(Figure from Radford et al., 2016, ICLR.)

Page 15

Detailed Architecture of GAN: DCGAN (Radford et al., 2016, ICLR)

(Figure from Radford et al., 2016, ICLR.)

Page 16

Detailed Architecture of GAN: DCGAN (Radford et al., 2016, ICLR)

Architectures designed for classification need modification for GANs.

Some tips are given, such as replacing max pooling with strided convolutional layers, using batch normalization, and using ReLU in the generator with tanh for the output layer. A sketch along these lines follows.

Issues of unstable training remained.
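A minimal generator following these tips (an illustrative sketch, not the architecture in the paper's figure): strided transposed convolutions instead of pooling, batch normalization, ReLU inside, and tanh at the output, mapping a 100-d z to a 64x64 RGB image.

```python
import torch
import torch.nn as nn

generator = nn.Sequential(
    nn.ConvTranspose2d(100, 512, kernel_size=4, stride=1, padding=0, bias=False),
    nn.BatchNorm2d(512), nn.ReLU(True),            # 4x4 feature map
    nn.ConvTranspose2d(512, 256, 4, 2, 1, bias=False),
    nn.BatchNorm2d(256), nn.ReLU(True),            # 8x8
    nn.ConvTranspose2d(256, 128, 4, 2, 1, bias=False),
    nn.BatchNorm2d(128), nn.ReLU(True),            # 16x16
    nn.ConvTranspose2d(128, 64, 4, 2, 1, bias=False),
    nn.BatchNorm2d(64), nn.ReLU(True),             # 32x32
    nn.ConvTranspose2d(64, 3, 4, 2, 1, bias=False),
    nn.Tanh(),                                     # 64x64 output in [-1, 1]
)

img = generator(torch.randn(1, 100, 1, 1))         # shape (1, 3, 64, 64)
```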


Page 17

Many GANs

Many ways of improving GANs (Salimans et al., 2016).

Many variants of GAN have been proposed:

CycleGAN (Zhu et al., 2017): domain transfer (input = horse, output = zebra)
Text-to-image (Reed et al., 2016, ICML)
Pix2pix (Isola et al., 2017, CVPR)

WGAN (Arjovsky et al., 2017) is the most popular; it uses the Wasserstein distance to optimize the generating distribution.

Page 18

Wasserstein GAN: Distance

If the real data distribution P_r of X admits a density and P_θ is the distribution of g(Z; θ) with a parametrized density, then, asymptotically, likelihood inference amounts to minimizing the Kullback-Leibler divergence KL(P_r ‖ P_θ).

When the distributions are supported on low-dimensional manifolds, the KL divergence is not defined.

WGAN minimizes the Wasserstein distance between P_r and P_θ.

Page 19

Distances between two distributions

Total variation distance:

$$\delta(P_r, P_\theta) = \sup_{A \in \Sigma} |P_r(A) - P_\theta(A)|,$$

where Σ denotes the set of all Borel subsets of a compact metric space X.

Kullback-Leibler divergence: KL(P_r ‖ P_θ)

Jensen-Shannon divergence:

$$JS(P_r, P_\theta) = KL(P_r \,\|\, P_m) + KL(P_\theta \,\|\, P_m),$$

where P_m = (P_r + P_θ)/2.

Page 20

Earth-mover distance

Earth-mover distance, or Wasserstein-1 distance:

$$W(P_r, P_\theta) = \inf_{\gamma \in \Pi(P_r, P_\theta)} \mathbb{E}_{(x,y) \sim \gamma}\big[\|x - y\|\big],$$

where Π(P_r, P_θ) denotes the set of all joint distributions γ(x, y) whose marginal distributions are respectively P_r and P_θ.

source: Cuturi and Solomon, 2017, NeurIPS tutorial, A Primer on optimal transport
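For intuition, the 1-D case can be computed directly; a sketch using SciPy's built-in estimator on two Gaussian samples (sample sizes and means are arbitrary choices):

```python
import numpy as np
from scipy.stats import wasserstein_distance

rng = np.random.default_rng(0)
u = rng.normal(0.0, 1.0, size=5000)   # samples from P_r
v = rng.normal(3.0, 1.0, size=5000)   # samples from P_theta

# For N(0,1) vs N(3,1) the optimal plan just shifts the heap by 3.
print(wasserstein_distance(u, v))     # ~ 3.0
```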


Page 21

Optimal transport

If we imagine the distributions as different heaps of a certain amount of earth, then the EMD is the minimal total amount of work it takes to transform one heap into the other. Work is defined as the amount of earth in a chunk times the distance it was moved.

Calculating the EMD is in itself an optimization problem: there are infinitely many ways to move the earth around, and we need to find the optimal one. The transport plan γ is what we are trying to find. It simply states how we distribute the amount of earth from each location in the support of P_r over the domain of P_θ, or vice versa.

Page 22

Wasserstein distance

source: STAT36-708 Lecture note from Larry Wasserman

Distances may ignore the underlying geometry of the space. For three densities p_1, p_2, p_3 with disjoint supports, we have

$$\int |p_1 - p_2|\, dx = \int |p_1 - p_3|\, dx = \int |p_2 - p_3|\, dx,$$

and similarly for the other distances. But our intuition tells us that p_1 and p_2 are close together, which is captured by the Wasserstein distance.

Page 23

Distance: Example

Let Z ~ U[0, 1] be uniform on the unit interval. Let P_0 be the distribution of (0, Z) ∈ R², with 0 on the x-axis and the random variable Z on the y-axis, uniform on a straight vertical line passing through the origin. Let P_θ be the distribution of (θ, Z). Then

$$W(P_0, P_\theta) = |\theta|$$

$$JS(P_0, P_\theta) = \begin{cases} \log 2 & \text{if } \theta \neq 0 \\ 0 & \text{if } \theta = 0 \end{cases}$$

$$KL(P_0 \,\|\, P_\theta) = \begin{cases} +\infty & \text{if } \theta \neq 0 \\ 0 & \text{if } \theta = 0 \end{cases}$$

$$\delta(P_0, P_\theta) = \begin{cases} 1 & \text{if } \theta \neq 0 \\ 0 & \text{if } \theta = 0 \end{cases}$$
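A numeric version of this example (a sketch: the continuous distributions are replaced by n samples each, and the discrete optimal-transport problem between equal-weight samples is solved as an assignment problem):

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

rng = np.random.default_rng(0)
theta, n = 0.5, 200
z0, z1 = rng.random(n), rng.random(n)
p0 = np.column_stack([np.zeros(n), z0])        # samples (0, Z) from P_0
pt = np.column_stack([np.full(n, theta), z1])  # samples (theta, Z) from P_theta

cost = np.linalg.norm(p0[:, None, :] - pt[None, :, :], axis=-1)  # pairwise ||x - y||
rows, cols = linear_sum_assignment(cost)       # optimal coupling for equal weights
print(cost[rows, cols].mean())                 # ~ |theta| = 0.5
```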


Page 24

Distance: Example

When θ_t → 0, the sequence (P_{θ_t})_{t∈N} converges to P_0 under the EM distance, but does not converge under the JS, KL, or TV divergences.

The KL, JS, and TV distances are not sensible loss functions when learning distributions supported on low-dimensional manifolds. However, the EM distance effectively captures the difference in this setup.

Page 25

Dual form of Wasserstein distance: Kantorovich-Rubinstein duality

$$W(P_r, P_\theta) = \inf_{\gamma \in \Pi(P_r, P_\theta)} \mathbb{E}_{(x,y) \sim \gamma}\big[\|x - y\|\big]$$

$$= \inf_{\gamma} \sup_{f} \mathbb{E}_{(x,y) \sim \gamma}\Big[\|x - y\| + \mathbb{E}_{s \sim P_r}[f(s)] - \mathbb{E}_{t \sim P_\theta}[f(t)] - \big(f(x) - f(y)\big)\Big]$$

since

$$\sup_{f} \mathbb{E}_{(x,y) \sim \gamma}\Big[\mathbb{E}_{s \sim P_r}[f(s)] - \mathbb{E}_{t \sim P_\theta}[f(t)] - \big(f(x) - f(y)\big)\Big] = \begin{cases} 0 & \text{if } \gamma \in \Pi(P_r, P_\theta) \\ \infty & \text{otherwise} \end{cases}$$

Using Sion's minimax theorem, it can be shown that strong duality holds.

Page 26

Kantorovich-Rubinstein duality, continued

Due to strong duality,

$$\inf_{\gamma} \sup_{f} \mathbb{E}_{(x,y) \sim \gamma}\Big[\|x - y\| + \mathbb{E}_{s \sim P_r}[f(s)] - \mathbb{E}_{t \sim P_\theta}[f(t)] - \big(f(x) - f(y)\big)\Big]$$

$$= \sup_{f} \inf_{\gamma} \mathbb{E}_{(x,y) \sim \gamma}\Big[\|x - y\| + \mathbb{E}_{s \sim P_r}[f(s)] - \mathbb{E}_{t \sim P_\theta}[f(t)] - \big(f(x) - f(y)\big)\Big]$$

$$= \sup_{f} \Big[\mathbb{E}_{s \sim P_r}[f(s)] - \mathbb{E}_{t \sim P_\theta}[f(t)] + \underbrace{\inf_{\gamma} \mathbb{E}_{(x,y) \sim \gamma}\big[\|x - y\| - (f(x) - f(y))\big]}_{= \begin{cases} 0 & \text{if } |f(x_1) - f(x_2)| \le |x_1 - x_2| \\ -\infty & \text{otherwise} \end{cases}}\Big]$$

$$= \sup_{f:\, |f(x_1) - f(x_2)| \le |x_1 - x_2|} \Big[\mathbb{E}_{s \sim P_r}[f(s)] - \mathbb{E}_{t \sim P_\theta}[f(t)]\Big]$$
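A small numeric check of this duality in 1-D (a sketch; the dual side solves a linear program for the values of a 1-Lipschitz f at the pooled sample points, and the primal side uses SciPy's closed-form 1-D estimator):

```python
import numpy as np
from scipy.optimize import linprog
from scipy.stats import wasserstein_distance

rng = np.random.default_rng(0)
xs = rng.normal(0.0, 1.0, size=15)   # samples from P_r
ys = rng.normal(2.0, 1.0, size=15)   # samples from P_theta
pts = np.concatenate([xs, ys])
n = len(pts)

# maximize (1/15) sum f(x_i) - (1/15) sum f(y_j), i.e. minimize -c @ f
c = np.concatenate([np.full(15, 1 / 15), np.full(15, -1 / 15)])

# Lipschitz constraints f(p) - f(q) <= |p - q| for all ordered pairs (p, q)
rows_a, rows_b = [], []
for i in range(n):
    for j in range(n):
        if i != j:
            row = np.zeros(n); row[i], row[j] = 1.0, -1.0
            rows_a.append(row); rows_b.append(abs(pts[i] - pts[j]))

res = linprog(-c, A_ub=np.array(rows_a), b_ub=np.array(rows_b),
              bounds=[(None, None)] * n)
print(-res.fun, wasserstein_distance(xs, ys))   # dual and primal values agree
```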


Page 27

Wasserstein GAN (Arjovsky et al., 2017)

Kantorovich-Rubinstein duality:

$$W(P_r, P_\theta) = \sup_{f:\, |f(x_1) - f(x_2)| \le |x_1 - x_2|} \mathbb{E}_{x \sim P_r}[f(x)] - \mathbb{E}_{x \sim P_\theta}[f(x)],$$

where the supremum is over all 1-Lipschitz functions f : C → R.

To approximate the computation of W(P_r, P_θ), consider a parametric family f_w indexed by w and solve

$$\max_{w \in \Omega} \mathbb{E}_{x \sim P_r}[f_w(x)] - \mathbb{E}_{z \sim p(z)}[f_w(g_\theta(z))]$$

WGAN trains a neural network parameterized with weights w lying in a compact space Ω and then backprops through $\mathbb{E}_{z \sim p(z)}[\nabla_\theta f_{w^*}(g_\theta(z))]$, where w^* is the argmax.

Page 28

Wasserstein GAN

Note that the fact that Ω is compact implies that all the functions f_w will be K-Lipschitz for some K.

In order to have the parameters w lie in a compact space, clamp the weights to, for example, Ω = [−0.01, 0.01]^l after each gradient update.

Weight clipping is a poor but practical way to enforce a Lipschitz constraint. A large clipping parameter may cause slow convergence, and a small one may lead to vanishing gradients.

Gulrajani et al. (2017): add a penalty term (‖∇_x D(x)‖_2 − K)² to enforce Lipschitz continuity. A sketch of both variants follows.
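A sketch of one critic update with weight clipping, plus the gradient-penalty alternative with K = 1 (network sizes, the RMSprop settings, and the Gaussian data stand-in are illustrative assumptions):

```python
import torch
import torch.nn as nn

f_critic = nn.Sequential(nn.Linear(2, 64), nn.ReLU(), nn.Linear(64, 1))  # f_w, no sigmoid
G = nn.Sequential(nn.Linear(8, 64), nn.ReLU(), nn.Linear(64, 2))
opt_c = torch.optim.RMSprop(f_critic.parameters(), lr=5e-5)

x_real = torch.randn(128, 2) + 3.0             # stand-in for a data batch
x_fake = G(torch.randn(128, 8)).detach()

# Critic step: maximize E[f_w(x)] - E[f_w(g(z))], i.e. minimize the negative
loss_c = f_critic(x_fake).mean() - f_critic(x_real).mean()
opt_c.zero_grad(); loss_c.backward(); opt_c.step()

# Weight clipping: keep w in Omega = [-0.01, 0.01]^l after each update
for p in f_critic.parameters():
    p.data.clamp_(-0.01, 0.01)

# Gradient-penalty alternative (Gulrajani et al., 2017) on interpolated points;
# add lambda * gp to loss_c instead of clipping.
eps = torch.rand(128, 1)
x_hat = (eps * x_real + (1 - eps) * x_fake).requires_grad_(True)
grad = torch.autograd.grad(f_critic(x_hat).sum(), x_hat, create_graph=True)[0]
gp = ((grad.norm(2, dim=1) - 1.0) ** 2).mean()
```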


Page 29

Wasserstein GAN algorithm


Page 30

Improved Wasserstein GAN algorithm (Gulrajani et al., 2017)


Page 31

Family of distance functions for distributions: Integral Probability Metrics

Given F, a set of functions from C to R, define

$$d_F(P, Q) = \sup_{f \in F} \mathbb{E}_{x \sim P}[f(x)] - \mathbb{E}_{x \sim Q}[f(x)],$$

called an Integral Probability Metric (IPM).

If F is the set of 1-Lipschitz functions, d_F(P, Q) is the Wasserstein distance.

If F is the set of all measurable functions bounded between [−1, 1], d_F(P, Q) is the total variation distance.

If F = {f ∈ H : ‖f‖_H ≤ 1} for some Reproducing Kernel Hilbert Space H, d_F(P, Q) is the maximum mean discrepancy (MMD).
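A sample-based MMD estimator (a sketch; the RBF kernel and its bandwidth are illustrative choices, and the estimator is the standard unbiased one):

```python
import numpy as np

def mmd2_unbiased(x, y, bandwidth=1.0):
    """Unbiased estimate of MMD^2 between samples x (n,d) and y (m,d), RBF kernel."""
    k = lambda a, b: np.exp(-np.sum((a[:, None, :] - b[None, :, :]) ** 2, -1)
                            / (2 * bandwidth ** 2))
    kxx, kyy, kxy = k(x, x), k(y, y), k(x, y)
    n, m = len(x), len(y)
    return ((kxx.sum() - np.trace(kxx)) / (n * (n - 1))      # drop diagonal terms
            + (kyy.sum() - np.trace(kyy)) / (m * (m - 1))
            - 2 * kxy.mean())

rng = np.random.default_rng(0)
same = mmd2_unbiased(rng.normal(size=(500, 2)), rng.normal(size=(500, 2)))
diff = mmd2_unbiased(rng.normal(size=(500, 2)), rng.normal(2.0, 1.0, (500, 2)))
print(same, diff)   # ~0 for identical distributions, clearly positive otherwise
```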


Page 32

WGAN vs. GAN


Page 33

Wasserstein Autoencoder (WAE)

Tolstikhin et al. (2018) proposed the Wasserstein Auto-Encoder (WAE), a new algorithm for building a generative model of the data distribution. WAE minimizes a penalized form of the Wasserstein distance between the model distribution and the target distribution.

The regularizer encourages the encoded training distribution to match the prior.

Compared with WGAN, WAE uses the primal definition of the Wasserstein distance.

Page 34

AE, VAE, WGAN and WAE

          AE              VAE          GAN/WGAN        WAE
Encoder   Deterministic   Stochastic   none            Stochastic
Decoder   Deterministic   Stochastic   Deterministic   Stochastic
P(z)      no              yes          yes             yes

Page 35

Wasserstein Autoencoder (WAE)

Let the deterministic decoder P_G(X|Z) map Z to X = G(Z) for a given G : Z → X. Let Q(Z|X) be a conditional distribution of Z given X. Let P_Z be the prior and Q_Z(Z) = E_{X∼P_X}[Q(Z|X)]. Then

$$\inf_{\Gamma \in \mathcal{P}(X \sim P_X,\, Y \sim P_G)} \mathbb{E}_{(X,Y) \sim \Gamma}[c(X, Y)] = \inf_{Q:\, Q_Z = P_Z} \mathbb{E}_{P_X} \mathbb{E}_{Q(Z|X)}[c(X, G(Z))]$$

Objective function for WAE:

$$D_{WAE}(P_X, P_G) = \inf_{Q(Z|X) \in \mathcal{Q}} \mathbb{E}_{P_X} \mathbb{E}_{Q(Z|X)}[c(X, G(Z))] + \lambda D_Z(Q_Z, P_Z),$$

where Q is any nonparametric set of probabilistic encoders, D_Z is an arbitrary divergence between Q_Z and P_Z, and λ > 0 is a hyperparameter.
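A sketch of one WAE-MMD training step under these definitions (a deterministic encoder as a simple member of Q, squared-error cost c, and an RBF kernel; all network sizes and λ are illustrative assumptions):

```python
import torch
import torch.nn as nn

enc = nn.Sequential(nn.Linear(784, 256), nn.ReLU(), nn.Linear(256, 8))   # Q(Z|X)
dec = nn.Sequential(nn.Linear(8, 256), nn.ReLU(), nn.Linear(256, 784))   # G(Z)
opt = torch.optim.Adam(list(enc.parameters()) + list(dec.parameters()), lr=1e-3)
lam = 10.0

def rbf(a, b, s=1.0):
    return torch.exp(-torch.cdist(a, b) ** 2 / (2 * s ** 2))

x = torch.rand(128, 784)              # stand-in for a data batch from P_X
z_q = enc(x)                          # samples from Q_Z
z_p = torch.randn(128, 8)             # samples from the prior P_Z

recon = ((dec(z_q) - x) ** 2).sum(1).mean()            # E c(X, G(Z))
mmd = (rbf(z_q, z_q).mean() + rbf(z_p, z_p).mean()     # biased MMD^2(Q_Z, P_Z)
       - 2 * rbf(z_q, z_p).mean())
loss = recon + lam * mmd                               # penalized WAE objective
opt.zero_grad(); loss.backward(); opt.step()
```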


Page 36

Wasserstein Autoencoder (WAE): Choices of penalty term

GAN-based D_Z: the Jensen-Shannon divergence between Q_Z and P_Z, D_JS(Q_Z, P_Z). Adversarial training: estimate γ by maximizing

$$\frac{\lambda}{n} \sum_{i=1}^{n} \log D_\gamma(z_i) + \log\big(1 - D_\gamma(\tilde{z}_i)\big),$$

where the z_i are sampled from the prior P_Z and the z̃_i from the encoder Q(Z|x_i).

MMD-based D_Z: for a positive-definite reproducing kernel k : Z × Z → R, the maximum mean discrepancy (MMD) is defined as

$$MMD_k(P_Z, Q_Z) = \Big\| \int_{\mathcal{Z}} k(z, \cdot)\, dP_Z(z) - \int_{\mathcal{Z}} k(z, \cdot)\, dQ_Z(z) \Big\|_{\mathcal{H}_k}$$

Page 37

WAE algorithms


Page 38

WGAN, WAE-GAN and WAE-MMD

                   WGAN             WAE-GAN            WAE-MMD
p(x|z; θ)          x = g(z; θ_g)    x = g(z; θ_g)      x = g(z; θ_g)
p(z)               normal           normal             normal
q(z|x)             none             q(z|x)             q(z|x)
P(y = 1|x)         none             P(y = 1|x)         none
W-distance         dual             primal             primal
critic for dual    f_w              -                  -
primal constraint  -                JS(p(z) ‖ q(z))    ‖∫k(z,·)dP(z) − ∫k(z,·)dQ(z)‖_H

