
Variational Autoencoders (VAEs)

Yuqin Yang

Wilson Lab Group Meeting Presentation

September 26 & October 3, 2017


Section 1: Preliminaries


Kullback-Leibler divergence

KL divergence (continuous case)

Let p(x) and q(x) be two probability densities. The KL divergence is defined as

$$\mathrm{KL}(p\,\|\,q) = \int p(x)\,\log\frac{p(x)}{q(x)}\,dx. \tag{1.1}$$

By Jensen's inequality, KL(p‖q) ≥ 0, with equality if and only if p = q almost everywhere.


Special case: multivariate Gaussian distribution

Suppose $p_1 \sim \mathcal{N}(\mu_1, \Sigma_1)$ and $p_2 \sim \mathcal{N}(\mu_2, \Sigma_2)$ are k-dimensional Gaussians. Then

$$\mathrm{KL}(p_1\,\|\,p_2) = \frac{1}{2}\left[\log\frac{\det(\Sigma_2)}{\det(\Sigma_1)} - k + \mathrm{tr}\!\left(\Sigma_2^{-1}\Sigma_1\right) + (\mu_2-\mu_1)^{\top}\Sigma_2^{-1}(\mu_2-\mu_1)\right].$$
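As a quick numerical check (a minimal sketch, not from the slides; it assumes SciPy is available), the snippet below evaluates this closed form for two small Gaussians and compares it with a Monte Carlo estimate of the integral in Equation (1.1):

```python
import numpy as np
from scipy.stats import multivariate_normal

rng = np.random.default_rng(0)
k = 2
mu1, Sigma1 = np.zeros(k), np.eye(k)
mu2, Sigma2 = np.array([1.0, -0.5]), np.diag([2.0, 0.5])

# Closed form: 1/2 [ log det(S2)/det(S1) - k + tr(S2^{-1} S1)
#                    + (mu2 - mu1)^T S2^{-1} (mu2 - mu1) ]
S2inv = np.linalg.inv(Sigma2)
diff = mu2 - mu1
kl_closed = 0.5 * (np.log(np.linalg.det(Sigma2) / np.linalg.det(Sigma1)) - k
                   + np.trace(S2inv @ Sigma1) + diff @ S2inv @ diff)

# Monte Carlo estimate of Equation (1.1): E_{x ~ p1}[log p1(x) - log p2(x)]
x = rng.multivariate_normal(mu1, Sigma1, size=200_000)
kl_mc = np.mean(multivariate_normal.logpdf(x, mu1, Sigma1)
                - multivariate_normal.logpdf(x, mu2, Sigma2))

print(kl_closed, kl_mc)  # the two values should agree to ~2 decimal places
```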


Variational Inference

Suppose we want to use Q(z) to approximate P(z|X), where P(z|X) has no explicit representation. A good approximation should minimize

$$\mathrm{KL}(Q(z)\,\|\,P(z|X)) = \int Q(z)\,\log\frac{Q(z)}{P(z|X)}\,dz.$$

By Bayes' rule, this can be rearranged into

$$\log P(X) - \mathrm{KL}(Q(z)\,\|\,P(z|X)) = \int Q(z)\,\log P(X|z)\,dz - \mathrm{KL}(Q(z)\,\|\,P(z)). \tag{1.2}$$
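The rearrangement is worth seeing once: apply Bayes' rule, P(z|X) = P(X|z)P(z)/P(X), inside the logarithm, and note that log P(X) does not depend on z, so it comes out of the expectation:

$$\begin{aligned}
\mathrm{KL}(Q(z)\,\|\,P(z|X)) &= \mathbb{E}_{Q(z)}\big[\log Q(z) - \log P(z|X)\big] \\
&= \mathbb{E}_{Q(z)}\big[\log Q(z) - \log P(X|z) - \log P(z)\big] + \log P(X) \\
&= \mathrm{KL}(Q(z)\,\|\,P(z)) - \int Q(z)\,\log P(X|z)\,dz + \log P(X).
\end{aligned}$$

Moving terms across the equality gives Equation (1.2).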


Section 2: Variational Autoencoders

¹ A. B. L. Larsen et al. (2015). “Autoencoding beyond pixels using a learned similarity metric”. arXiv:1512.09300.


Original problem

Given a dataset X from a distribution P(X), we want to generate new data that follows the unknown distribution P(X).

We construct a model f(z; θ): Z × Θ → X, where X is the space of observed variables (the data), Z is the space of latent variables, Θ is the parameter space, and f is a complex but deterministic mapping.

Latent variables: variables that are not directly observed but are instead inferred from other, directly observed variables. Given z, we can generate a sample X by f(z; θ).

We wish to optimize θ such that, when we sample z from P(z), f(z; θ) will, with high probability, be like the X's in our dataset.


Likelihood

$$P(X;\theta) = \int P(X|z;\theta)\,P(z)\,dz. \tag{2.1}$$

Choose θ to maximize this integral.

In VAEs, P(X|z; θ) = N(f(z; θ), σ²I) in the continuous case, and P(X|z; θ) = B(f(z; θ)) (Bernoulli) in the discrete case. In both cases, P(X|z; θ) is continuous with respect to θ, so we can use gradient ascent to maximize over θ.

Questions:

- How do we define the latent variable z so that it captures the latent information?

- How do we deal with the integral over z, and with its gradient with respect to θ?


Define latent variable

We want the latent variable to satisfy two properties:

- The latent variables are chosen automatically, because we do not know much about the intrinsic properties of X.

- Different components of z are mutually independent, to avoid overlap in the latent information.

VAEs assert that the latent variable can be drawn from a standard Gaussian distribution, N(0, I).

Assertion

Any distribution in d dimensions can be generated by taking a set of d variables that are normally distributed and mapping them through a sufficiently complicated function.

Since f(z; θ) is complicated enough (it is learned by a neural network), this choice of latent variable does not matter much, as the sketch below illustrates.
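As a small illustration of the assertion (a toy sketch, not from the slides, in the spirit of Doersch's VAE tutorial), a simple deterministic map pushes 2-D standard-Gaussian samples onto a ring, a distribution that looks nothing like a Gaussian:

```python
import numpy as np

rng = np.random.default_rng(1)
z = rng.standard_normal((5000, 2))        # z ~ N(0, I_2)

# Deterministic map g(z) = z/10 + z/||z||: pushes mass onto a unit ring.
norms = np.linalg.norm(z, axis=1, keepdims=True)
x = z / 10 + z / norms

r = np.linalg.norm(x, axis=1)
print(r.mean(), r.std())  # radii concentrate near 1: a ring, not a Gaussian blob
```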


Deal with the integral

$$P(X;\theta) \approx \frac{1}{n}\sum_{i} P(X|z^{(i)};\theta), \qquad z^{(i)} \sim \mathcal{N}(0, I).$$

Figure: Counterexample. We would need to set σ very small, which in turn requires a very large dataset.

In this case, we need a more efficient procedure for sampling z, as the sketch below suggests.
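The inefficiency is easy to see numerically. In the hypothetical sketch below (illustrative only; the "decoder" f and all names are stand-ins of my own), X lives in a moderately high-dimensional space and the naive estimator averages P(X|z⁽ⁱ⁾) over prior samples: almost every z contributes essentially zero, so the estimate is dominated by rare lucky draws.

```python
import numpy as np

rng = np.random.default_rng(0)
d, sigma, n = 20, 0.1, 10_000          # data dim, decoder noise, prior samples

f = lambda z: z                        # toy stand-in for a neural-network decoder
X = rng.standard_normal(d)             # a "data point" whose P(X) we want

def naive_estimate():
    z = rng.standard_normal((n, d))    # z^(i) ~ N(0, I)
    # log P(X|z) under N(f(z), sigma^2 I), up to the constant normalizer
    log_p = -np.sum((X - f(z))**2, axis=1) / (2 * sigma**2)
    return np.mean(np.exp(log_p))      # (1/n) * sum_i P(X|z^(i)), unnormalized

print([naive_estimate() for _ in range(3)])
# Typically prints [0.0, 0.0, 0.0]: with sigma = 0.1, virtually no prior
# sample lands close enough to X to contribute anything at all.
```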


Sampling in VAEs

The key idea behind the variational autoencoder is to attempt to sample values of z that are likely to have produced X, and to compute P(X) just from those.

New function Q(z): it gives us a distribution over the z values that are likely to produce X. Then $\mathbb{E}_{P(z)}[P(X|z)] \to \mathbb{E}_{Q(z)}[P(X|z)]$. The optimal choice of Q(z) would be P(z|X), but P(z|X) is intractable.

Aim:

Find a Q(z) that approximates P(z|X) while remaining simple enough to work with.


Recall: Variational Inference

For any Q(z) used to approximate P(z|X), Equation (1.2) gives

$$\log P(X) - \mathrm{KL}(Q(z)\,\|\,P(z|X)) = \mathbb{E}_{Q(z)}[\log P(X|z)] - \mathrm{KL}(Q(z)\,\|\,P(z)).$$

Since we are interested in inferring P(X), it makes sense to construct a Q that depends on X:

$$\log P(X) - \mathrm{KL}(Q(z|X)\,\|\,P(z|X)) = \mathbb{E}_{Q(z|X)}[\log P(X|z)] - \mathrm{KL}(Q(z|X)\,\|\,P(z)). \tag{2.2}$$

Aim:

Maximize log P(X) (w.r.t. θ) and minimize KL(Q(z|X)‖P(z|X)) ⇔ maximize the LHS ⇔ maximize the RHS.


Second term of RHS

Aim:

Minimize KL(Q(z|X)‖P(z)). We already have P(z) = N(0, I).

The usual choice is Q(z|X) = N(µ(X; φ), Σ(X; φ)), where µ and Σ are deterministic functions of X with parameters φ. (We omit φ in the following equations.) In addition, we constrain Σ to be a diagonal matrix.

Minimization

By the earlier formula for the KL divergence between multivariate Gaussians,

$$\mathrm{KL}(Q(z|X)\,\|\,P(z)) = \frac{1}{2}\left(\mathrm{tr}\,\Sigma(X) + \mu(X)^{\top}\mu(X) - k - \log\det\Sigma(X)\right).$$
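With Σ(X) diagonal, this term has a cheap elementwise form. A minimal sketch (assuming the encoder outputs the mean `mu` and the log of the diagonal variances `logvar`, a common parameterization that the slides do not spell out):

```python
import numpy as np

def kl_to_standard_normal(mu, logvar):
    """KL( N(mu, diag(exp(logvar))) || N(0, I) ), per the formula above.

    mu, logvar: arrays of shape (batch, k). Returns shape (batch,).
    """
    # 1/2 * ( tr(Sigma) + mu^T mu - k - log det(Sigma) ), elementwise for diag Sigma
    return 0.5 * np.sum(np.exp(logvar) + mu**2 - 1.0 - logvar, axis=1)

mu = np.zeros((1, 3)); logvar = np.zeros((1, 3))
print(kl_to_standard_normal(mu, logvar))  # [0.]  -- Q already equals P(z)
```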


First term of RHS

The first term is maximized with SGD. To approximate the expectation, take a single sample z from Q(z|X):

$$\mathbb{E}_{Q(z|X)}[\log P(X|z)] \approx \log P(X|z).$$

General maximization objective

$$\mathbb{E}_{X\sim D}\big[\log P(X) - \mathrm{KL}(Q(z|X)\,\|\,P(z|X))\big] = \mathbb{E}_{X\sim D}\big[\mathbb{E}_{z\sim Q(z|X)}[\log P(X|z)] - \mathrm{KL}(Q(z|X)\,\|\,P(z))\big]. \tag{2.3}$$

To use SGD, sample a value of X and a value of z, then compute the gradient of the RHS by backpropagation. Repeating this m times and averaging gives an estimate that converges to the gradient of the RHS.


Figure: Flow chart for the VAE algorithm.


Significant Problems

The algorithm seems complete, but there are two significant problems in the calculation:

- The gradient of the first term of the RHS in Equation (2.3) should involve the parameters of both P and Q, but our sampling method drops the parameters of Q. As a result, we cannot obtain the true gradient with respect to φ.

- The algorithm is split into two parts: the first half trains the model Q(z|X) on the given data X, and the second half trains the model f on the newly sampled z. Backpropagation cannot pass through the discontinuity at the sampling step, which makes the algorithm fail.


Modification by Reparameterization Trick

- To solve the first problem, we change the way we sample: first sample ε ∼ N(0, I), then define z = µ(X) + Σ(X)^{1/2} ε. This is an equivalent representation of the sample z from the previous algorithm, but the objective becomes

$$\mathbb{E}_{X\sim D}\Big[\mathbb{E}_{\epsilon\sim\mathcal{N}(0,I)}\big[\log P\big(X \mid \mu(X) + \Sigma(X)^{1/2}\epsilon\big)\big] - \mathrm{KL}(Q(z|X)\,\|\,P(z))\Big].$$

This time the sampling distribution no longer involves the functions we are optimizing.

- More generally, sample from Q(z|X) by evaluating a function h(η, X), where η is unobserved noise and h is continuous in X. (A discrete Q(z|X) fails here.) Backpropagation then works, as in the sketch below.
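A minimal PyTorch sketch of the trick (illustrative; the tensor names are my own). Gradients flow from z back into µ and Σ because the randomness is confined to ε:

```python
import torch

mu = torch.tensor([0.5, -1.0], requires_grad=True)
log_sigma = torch.tensor([0.0, 0.3], requires_grad=True)  # diagonal Sigma^(1/2)

eps = torch.randn(2)                   # eps ~ N(0, I): involves no parameters
z = mu + torch.exp(log_sigma) * eps    # z = mu(X) + Sigma(X)^{1/2} eps

loss = (z**2).sum()                    # stand-in for -log P(X|z)
loss.backward()
print(mu.grad, log_sigma.grad)         # well-defined gradients through the sample

# By contrast, torch.distributions.Normal(mu, sigma).sample() returns a draw
# detached from mu and sigma, so backprop stops there; .rsample() implements
# exactly the reparameterization written out above.
```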


Figure: Flow chart for the corrected VAE algorithm.
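Putting the pieces together, here is a compact, hypothetical PyTorch sketch of one training step of the corrected algorithm (architecture sizes and names are my own; the slides do not fix them). The loss is the negative of the RHS of Equation (2.3), estimated with a single (X, ε) sample per datapoint:

```python
import torch
import torch.nn as nn

d_x, d_z = 784, 20                      # e.g. flattened 28x28 images, 20-dim z

encoder = nn.Sequential(nn.Linear(d_x, 400), nn.ReLU(), nn.Linear(400, 2 * d_z))
decoder = nn.Sequential(nn.Linear(d_z, 400), nn.ReLU(), nn.Linear(400, d_x),
                        nn.Sigmoid())   # Bernoulli case: P(X|z) = B(f(z))
opt = torch.optim.Adam(list(encoder.parameters()) + list(decoder.parameters()))

def training_step(X):                   # X: (batch, d_x) with entries in [0, 1]
    mu, logvar = encoder(X).chunk(2, dim=1)    # Q(z|X) = N(mu, diag(exp(logvar)))
    eps = torch.randn_like(mu)
    z = mu + torch.exp(0.5 * logvar) * eps      # reparameterization trick
    f = decoder(z)
    # -E_Q[log P(X|z)]: Bernoulli negative log-likelihood
    rec = nn.functional.binary_cross_entropy(f, X, reduction="sum")
    # KL(Q(z|X) || P(z)) in closed form for a diagonal Gaussian
    kl = 0.5 * torch.sum(torch.exp(logvar) + mu**2 - 1.0 - logvar)
    loss = rec + kl                     # = -(RHS of Equation (2.3)), one sample
    opt.zero_grad(); loss.backward(); opt.step()
    return loss.item()

print(training_step(torch.rand(32, d_x)))   # one step on a random toy batch
```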


Verification

For the decoder:

We can simply sample z ∼ N(0, I) and feed it into the decoder f to generate new samples.

The probability P(X) for a test example X:

This is not tractable, because P is implicit. However, by Equation (2.2), since the KL divergence is non-negative, the RHS is a lower bound on log P(X), called the Evidence Lower BOund (ELBO). This lower bound is a useful tool for getting a rough idea of how well the model captures a particular datapoint X, because it converges quickly.


Remarks

Detailed remarks are not presented here.

- Interpretation of the RHS: the two terms have information-theoretic interpretations.

- Separating the RHS by sample.

- Regularization term: it can be obtained by a transformation of the RHS.

- Sampling for Q(z|X): the original paper expresses this distribution as g(X, ε), where ε ∼ p_ε independently; restrictions on p_ε are needed.²

² D. P. Kingma and M. Welling (2013). “Auto-encoding variational Bayes”. arXiv:1312.6114.


Section 3: Extensions of VAEs


Comparison Versus GAN

- Both are relatively new deep generative models.

- The biggest advantage of VAEs is the clean probabilistic formulation they come with, a result of maximizing a lower bound on the log-likelihood. VAEs are also usually easier to train and to get working: they are relatively easy to implement and robust to hyperparameter choices.

- GANs are better at generating visual features; the output of VAEs is sometimes blurry.

- More detailed discussions can be found on Reddit.


Conditional Variational Autoencoders

Original problem:

Given an input dataset X and outputs Y, we want to build a model P(Y|X) that maximizes the probability of the ground-truth distribution. Example: generating handwritten digits. We want to add digits to an existing string of digits written by a single person.

A standard regression model fails in this situation, because it ultimately generates an “average image” that minimizes the distance, which may look like a meaningless blur. CVAEs, however, allow us to tackle problems where the input-to-output mapping is one-to-many, without requiring us to explicitly specify the structure of the output distribution.


Figure: Flow chart for the CVAE algorithm.


$$P(Y|X) = \mathcal{N}(f(z, X), \sigma^2 I);$$

$$\log P(Y|X) - \mathrm{KL}(Q(z|Y,X)\,\|\,P(z|Y,X)) = \mathbb{E}_{Q(z|Y,X)}[\log P(Y|z,X)] - \mathrm{KL}(Q(z|Y,X)\,\|\,P(z|X)).$$
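In practice the conditioning typically enters by concatenation. The hypothetical sketch below (layer sizes and names are my own; it also assumes the common simplification P(z|X) = N(0, I)) feeds the condition X to both the encoder, which models Q(z|Y, X), and the decoder, which models P(Y|z, X); the loss is otherwise the same as for the plain VAE:

```python
import torch
import torch.nn as nn

d_y, d_cond, d_z = 784, 10, 20          # e.g. digit image Y, one-hot label X

enc = nn.Sequential(nn.Linear(d_y + d_cond, 400), nn.ReLU(), nn.Linear(400, 2 * d_z))
dec = nn.Sequential(nn.Linear(d_z + d_cond, 400), nn.ReLU(), nn.Linear(400, d_y),
                    nn.Sigmoid())

def cvae_loss(Y, X):
    mu, logvar = enc(torch.cat([Y, X], dim=1)).chunk(2, dim=1)   # Q(z|Y, X)
    z = mu + torch.exp(0.5 * logvar) * torch.randn_like(mu)      # reparameterized
    f = dec(torch.cat([z, X], dim=1))                            # P(Y|z, X)
    rec = nn.functional.binary_cross_entropy(f, Y, reduction="sum")
    kl = 0.5 * torch.sum(torch.exp(logvar) + mu**2 - 1.0 - logvar)
    return rec + kl

# At test time: pick a condition X (here the one-hot for class 3),
# sample z ~ N(0, I), and decode.
Y_new = dec(torch.cat([torch.randn(1, d_z), torch.eye(d_cond)[3:4]], dim=1))
```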


VAE-GAN³

Combine a VAE with a GAN by collapsing the decoder and the generator into a single network, since both map from a standard Gaussian distribution to X.

Figure: Overview of the VAE-GAN algorithm.

³ A. B. L. Larsen et al. (2015). “Autoencoding beyond pixels using a learned similarity metric”. arXiv:1512.09300.


- Instead of measuring the error element-wise, VAE-GAN measures it feature-wise, where the features are produced by the discriminator (see the sketch after the figure below).

- The generator and the decoder share their parameters.

- Three kinds of errors are optimized simultaneously.

Figure: Flow of the VAE-GAN algorithm. Grey arrows represent the terms in the training objective.
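A sketch of the feature-wise error term alone (hedged: the layer choice and all names are my own, and the full VAE-GAN objective also includes the KL term and the GAN term). The original X and its reconstruction are pushed through the discriminator up to an intermediate layer, and the squared error is measured between those features instead of between pixels:

```python
import torch
import torch.nn as nn

disc_features = nn.Sequential(           # discriminator trunk up to some layer l
    nn.Linear(784, 256), nn.ReLU(),
    nn.Linear(256, 128), nn.ReLU(),
)
disc_head = nn.Linear(128, 1)            # real/fake logit; used by the GAN term

def feature_wise_error(X, X_rec):
    # Element-wise error would be ((X - X_rec)**2).sum(); VAE-GAN instead
    # compares discriminator features of the original and the reconstruction.
    return ((disc_features(X) - disc_features(X_rec))**2).sum()

X = torch.rand(8, 784)
X_rec = torch.rand(8, 784)               # stand-in for decoder(encoder(X))
print(feature_wise_error(X, X_rec).item())
```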


That’s all. Thanks!
