
  • Generative Adversarial Network


  • Generative adversarial network (Goodfellow, 2014)

    Since the publication of the GAN by Goodfellow (2014), many applications have been reported and many variants have been published.

  • Generative adversarial network

    The setup is similar to that of the VAE: we attempt to generate x from a latent variable z of smaller dimension.

    Generative adversarial networks are based on a game-theoretic scenario in which the generator network must compete against an adversary.

    The generator network produces samples x = g(z; θg) and attempts to fool the discriminator into believing they are real. Its adversary, the discriminator network, attempts to distinguish samples drawn from the training data from samples drawn from the generator, through P(y = 1|x) = D(x).
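    To make the setup concrete, a minimal generator/discriminator pair might be sketched as follows (an illustrative sketch in PyTorch; the layer sizes, latent dimension, and data dimension are assumptions, not taken from the slides):

    import torch.nn as nn

    LATENT_DIM = 64    # dimension of z (an assumed value)
    DATA_DIM = 784     # dimension of x, e.g. a flattened 28x28 image (assumed)

    # Generator g(z; theta_g): maps a latent code z to a sample x.
    generator = nn.Sequential(
        nn.Linear(LATENT_DIM, 256), nn.ReLU(),
        nn.Linear(256, DATA_DIM), nn.Tanh(),
    )

    # Discriminator D(x; theta_d): outputs P(y = 1 | x), the probability that x is real.
    discriminator = nn.Sequential(
        nn.Linear(DATA_DIM, 256), nn.LeakyReLU(0.2),
        nn.Linear(256, 1), nn.Sigmoid(),
    )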


  • Generative adversarial network: Setup

    Let y = 1 if the data is real and y = 0 if the data is fake. We assume that there is a lower-dimensional representation z of x.

    To generate data, one needs to know p(x|y = 1). If P(y = 1) = P(y = 0) = 0.5, we have

    P(y = 1|x) = p(x|y = 1) / {p(x|y = 1) + p(x|y = 0)}.

    In a GAN, we specify P(y = 1|x) and p(x|y = 0), and estimate p(x|y = 1) by minimizing a distance between p(x|y = 0) and p(x|y = 1).
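    As a toy numerical check of this relation (a hypothetical example in which p(x|y = 1) and p(x|y = 0) are unit-variance Gaussians; the densities and the evaluation point are arbitrary choices of mine):

    from math import exp, pi, sqrt

    def normal_pdf(x, mu, sigma):
        return exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * sqrt(2.0 * pi))

    # Suppose real data follow N(0, 1) and fake data follow N(2, 1), with P(y=1) = P(y=0) = 0.5.
    x = 1.0                              # the midpoint between the two means
    p_real = normal_pdf(x, 0.0, 1.0)     # p(x | y = 1)
    p_fake = normal_pdf(x, 2.0, 1.0)     # p(x | y = 0)
    print(p_real / (p_real + p_fake))    # P(y = 1 | x); equals 0.5 at the midpoint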


  • Generative adversarial network

    The likelihood function based on y|x ∼ Ber(D(x)) is

    ∑_{i=1}^{n} {yi log D(xi; θd) + (1 − yi) log(1 − D(xi; θd))}.

    However, the xi's are not observed for yi = 0, and we replace xi with g(zi; θg). Since z is not observed for y = 0, the marginal likelihood is

    L(θd, θg) = ∏_{i=1}^{n} { D(xi; θd)^{yi} ∫ p(zi) (1 − D(g(zi; θg); θd))^{1−yi} dzi }.


  • Generative adversarial network: Discriminator and Generator

    Consider the following quantity:

    v(θg, θd) = E_{x∼pdata} log D(x; θd) + E_{z∼p(z)} log[1 − D{g(z; θg); θd}],

    where pdata ≡ p(x|y = 1) and pmodel ≡ p(x|y = 0) is the distribution of g(z; θg) when z ∼ p(z). Note that this is not an expected likelihood in the usual sense.

    Optimization:

    Discriminator step: maximize v(θg, θd) over θd, given θg.
    Generator step: minimize max_{θd} v(θg, θd) over θg.
    Alternate the discriminator and generator steps.
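    Putting the two steps together, one pass of the alternating optimization can be sketched as follows (a minimal sketch continuing the generator/discriminator above; data_iterator, the learning rates, and the number of steps are placeholders I chose, not values from the slides):

    import torch
    import torch.nn as nn

    bce = nn.BCELoss()
    opt_d = torch.optim.Adam(discriminator.parameters(), lr=2e-4)
    opt_g = torch.optim.Adam(generator.parameters(), lr=2e-4)
    num_steps = 10_000                           # number of alternating updates (arbitrary)

    for step in range(num_steps):
        real_x = next(data_iterator)             # mini-batch of real samples (placeholder)
        batch = real_x.size(0)
        ones = torch.ones(batch, 1)
        zeros = torch.zeros(batch, 1)

        # Discriminator step: maximize v(theta_g, theta_d) over theta_d, i.e. minimize
        # the binary cross-entropy of D's predictions against the real/fake labels.
        z = torch.randn(batch, LATENT_DIM)
        fake_x = generator(z).detach()           # theta_g is held fixed in this step
        loss_d = bce(discriminator(real_x), ones) + bce(discriminator(fake_x), zeros)
        opt_d.zero_grad(); loss_d.backward(); opt_d.step()

        # Generator step: with theta_d fixed, minimize E log(1 - D(g(z))) over theta_g.
        # (The non-saturating variant described under "Implementation of GAN" below
        # maximizes E log D(g(z)) instead.)
        z = torch.randn(batch, LATENT_DIM)
        loss_g = torch.log(1.0 - discriminator(generator(z)) + 1e-8).mean()
        opt_g.zero_grad(); loss_g.backward(); opt_g.step()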


  • Generative adversarial network

    Discriminator: maximizing v(θg, θd) over θd amounts to estimating the discriminator. When the function space of D(x) is not restricted, the argmax of v over D is

    D∗(x) = pdata(x) / {pdata(x) + pmodel(x)}.

    Generator: plugging P(y = 1|x) = D∗(x) = p(x|y = 1) / {p(x|y = 1) + p(x|y = 0)} into v(θg, θd), we minimize

    v(θg, θd) = ∫ p(x|y = 1) log [ p(x|y = 1) / {p(x|y = 1) + p(x|y = 0)} ] dx
              + ∫ p(x|y = 0) log [ p(x|y = 0) / {p(x|y = 1) + p(x|y = 0)} ] dx

              = KL(pdata‖p∗∗) + KL(pmodel‖p∗∗) + const,

    where p∗∗ = (pdata + pmodel)/2.
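    For completeness, the constant can be made explicit (a short standard step not spelled out above): since p(x|y = 1) + p(x|y = 0) = 2p∗∗(x), each logarithm picks up a term −log 2, and

    v(\theta_g, \theta_d)\big|_{D = D^{*}} = \mathrm{KL}(p_{\mathrm{data}} \,\|\, p^{**}) + \mathrm{KL}(p_{\mathrm{model}} \,\|\, p^{**}) - \log 4,

    so the constant is −log 4; the objective attains this minimum value exactly when pdata = pmodel, in which case both KL terms vanish.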


  • Generative adversarial network

    [Figure: GAN training visualization from GAN Lab. Source: https://poloclub.github.io/ganlab/]


  • AE, VAE and GAN

                                  AE                              VAE                           GAN
    estimation of                 transformation                  conditional distribution      transformation
    distribution through
    specifying                    x = g(z; θg),                   none                          x = g(z; θg)
    transformation                z = f(x; θf)
    specifying                    none                            p(z), p(x|z), q(z|x)          p(z), and thus
    distributions                                                                               indirectly p(x)
    objective function            ‖x − g(f(x; θf); θg)‖²          KL divergence                 Jensen-Shannon divergence


  • GAN algorithm


  • Implementation of GAN

    In practice, D(x) is restricted to a neural network. We first maximize v(θg, θd) over θd to obtain θ∗d. Then

    θ∗g = argmin_{θg} E_{z∼p(z)} log[1 − D{g(z; θg); θ∗d}].

    The i-th contribution to E_{z∼p(z)} log[1 − D{g(z; θg); θ∗d}] is stochastically evaluated by

    (1/M) ∑_{m=1}^{M} log[1 − D{g(z_i^(m); θg); θ∗d}],

    where z_i^(m) is generated from p(z).

    In practice, minimizing E_{z∼p(z)} log[1 − D{g(z; θg); θ∗d}] does not work well. Instead, one maximizes E_{z∼p(z)} log D{g(z; θg); θ∗d} over θg.
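    A sketch of this Monte Carlo estimate, covering both the original (minimax) and the non-saturating generator objectives (assuming a generator and discriminator as in the earlier sketches; the function name, M, and eps are my own):

    import torch

    def generator_objective(generator, discriminator, latent_dim=64, M=128,
                            non_saturating=True, eps=1e-8):
        """Monte Carlo estimate of the generator objective over M draws z^(m) ~ p(z)."""
        z = torch.randn(M, latent_dim)                  # z^(m) ~ p(z) = N(0, I)
        d_fake = discriminator(generator(z))            # D{g(z; theta_g); theta_d*}
        if non_saturating:
            # maximize (1/M) sum log D(g(z)); return its negative so it can be minimized
            return -torch.log(d_fake + eps).mean()
        # original objective: minimize (1/M) sum log(1 - D(g(z)))
        return torch.log(1.0 - d_fake + eps).mean()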


  • GAN: Comments

    In a nutshell, GAN finds p(x|y = 0), i.e. pmodel, by minimizing a KL divergence between pmodel and (pdata + pmodel)/2; the objective function is therefore KL(pmodel‖p∗∗). The GAN algorithm is one example of how to minimize KL(pmodel‖p∗∗), namely by finding P(y = 1|x), indexed by θd, through the 'discriminator'.

    Game-theoretic arguments may be oversold, since they are not essential for estimating the density. The role of the discriminator is to determine the loss function for the generator, KL(pdata‖p∗∗) + KL(pmodel‖p∗∗), the Jensen-Shannon divergence. The Jensen-Shannon divergence has an advantage over the KL divergence in that it avoids the problem of non-overlapping supports.


  • Detailed Architecture of GAN: DCGAN (Radford et al., 2016, ICLR)

    [Figures from Radford et al., 2016, ICLR (DCGAN).]

    Architectures designed for classification need modification for GANs.

    Some tips are given, such as replacing max pooling with strided convolutional layers, using batch normalization, and using ReLU in the generator with tanh for the output, among others (a sketch follows below).

    Issues of unstable training remained.
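    As an illustration of these tips, a DCGAN-style generator could look like the following (a sketch only, not the exact architecture of Radford et al.; the channel counts, kernel sizes, and the 3x32x32 output resolution are assumptions):

    import torch.nn as nn

    # Strided transposed convolutions instead of pooling, batch normalization,
    # ReLU inside the generator, and tanh on the output layer.
    dcgan_generator = nn.Sequential(
        nn.ConvTranspose2d(100, 256, kernel_size=4, stride=1, padding=0),  # z: 100x1x1 -> 256x4x4
        nn.BatchNorm2d(256), nn.ReLU(),
        nn.ConvTranspose2d(256, 128, kernel_size=4, stride=2, padding=1),  # -> 128x8x8
        nn.BatchNorm2d(128), nn.ReLU(),
        nn.ConvTranspose2d(128, 64, kernel_size=4, stride=2, padding=1),   # -> 64x16x16
        nn.BatchNorm2d(64), nn.ReLU(),
        nn.ConvTranspose2d(64, 3, kernel_size=4, stride=2, padding=1),     # -> 3x32x32 image
        nn.Tanh(),
    )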


  • Many GANs

    There are many ways of improving GANs (Salimans et al., 2016), and many variants of GAN have been proposed:

    CycleGAN (Zhu et al., 2017): domain transfer (input = horse, output = zebra)
    Text-to-image (Reed et al., 2016, ICML)
    Pix2pix (Isola et al., 2017, CVPR)

    WGAN (Arjovsky et al., 2017) is the most popular; it uses the Wasserstein distance to optimize the generating distribution.


  • Wasserstein GAN: Distance

    If the real data distribution Pr of X admits a density and Pθ is the distribution of g(Z; θ), then, asymptotically, maximum likelihood inference amounts to minimizing the Kullback-Leibler divergence KL(Pr‖Pθ). When the distributions are supported on low-dimensional manifolds, however, the KL divergence is not defined, because the supports need not overlap.

    WGAN instead minimizes the Wasserstein distance between Pr and Pθ.


  • Distances between two distributions

    Total variation distance:

    δ(Pr, Pθ) = sup_{A∈Σ} |Pr(A) − Pθ(A)|,

    where Σ denotes the set of all Borel subsets of a compact metric space X.

    Kullback-Leibler divergence: KL(Pr‖Pθ).

    Jensen-Shannon divergence:

    JS(Pr, Pθ) = KL(Pr‖Pm) + KL(Pθ‖Pm),

    where Pm = (Pr + Pθ)/2.


  • Earth-mover distance

    Earth-mover distance (Wasserstein-1 distance):

    W(Pr, Pθ) = inf_{γ∈Π(Pr,Pθ)} E_{(x,y)∼γ}[‖x − y‖],

    where Π(Pr, Pθ) denotes the set of all joint distributions γ(x, y) whose marginals are respectively Pr and Pθ.

    source: Cuturi and Solomon, 2017, NeurIPS tutorial, A Primer on optimal transport
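    As a small numerical illustration (one-dimensional case only, where the optimal transport plan simply matches sorted values; the function name and the toy Gaussians are my own choices):

    import numpy as np

    def wasserstein1_1d(x, y):
        """Earth-mover (Wasserstein-1) distance between two equal-size 1-D samples.

        In one dimension the optimal transport plan pairs the sorted values, so W1
        reduces to the mean absolute difference between the order statistics.
        """
        x, y = np.sort(np.asarray(x)), np.sort(np.asarray(y))
        return float(np.mean(np.abs(x - y)))

    rng = np.random.default_rng(0)
    a = rng.normal(0.0, 1.0, size=10_000)    # samples from Pr
    b = rng.normal(2.0, 1.0, size=10_000)    # samples from P_theta, shifted by 2
    print(wasserstein1_1d(a, b))             # close to 2, the size of the shift

    Shifting a distribution by a constant shifts every quantile by that constant, so the printed value is close to 2.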


  • Optimal transport

    If we imagine the distributions as different ways of piling up a given amount of earth, the earth-mover distance is the minimum cost of turning one pile into the other, where the cost is the amount of earth moved times the distance it is moved.
