Deep Generative Models I (DLAI D9L2 2017 UPC Deep Learning for Artificial Intelligence)


TRANSCRIPT

Page 1: Deep Generative Models I (DLAI D9L2 2017 UPC Deep Learning for Artificial Intelligence)

[course site]

Santiago Pascual de la Puente ([email protected])

PhD Candidate, Universitat Politecnica de Catalunya (Technical University of Catalonia)

Deep Generative Models I

#DLUPC

Page 2: Deep Generative Models I (DLAI D9L2 2017 UPC Deep Learning for Artificial Intelligence)

Outline

● Introduction
● Motivating Applications
● Generative Models Taxonomy
● PixelRNN, PixelCNN & Wavenet
● Variational Auto-Encoders

2

Page 3: Deep Generative Models I (DLAI D9L2 2017 UPC Deep Learning for Artificial Intelligence)

Introduction

Page 4: Deep Generative Models I (DLAI D9L2 2017 UPC Deep Learning for Artificial Intelligence)

What we are used to with Neural Nets

4

[Figure: an input image is fed through a network that outputs a class probability vector, e.g. [0, 1, 0]. Figure credit: Javier Ruiz]

P(Y = [0,1,0] | X = [pixel1, pixel2, …, pixel784])

Page 5: Deep Generative Models I (DLAI D9L2 2017 UPC Deep Learning for Artificial Intelligence)

What is a generative model?

5

X = {x1, x2, …, xN}

Page 6: Deep Generative Models I (DLAI D9L2 2017 UPC Deep Learning for Artificial Intelligence)

What is a generative model?

6

X = {x1, x2, …, xN}

X = {x1, x2, …, xN} → P(X)

Page 7: Deep Generative Models I (DLAI D9L2 2017 UPC Deep Learning for Artificial Intelligence)

What is a generative model?

7

X = {x1, x2, …, xN}, Y = {y1, y2, …, yN}

[Figure: discriminative labels over different domains, e.g. image classes (DOG, CAT, TRUCK, PIZZA), movie genres (THRILLER, SCI-FI, HISTORY), and phonemes (/aa/, /e/, /o/)]

Page 8: Deep Generative Models I (DLAI D9L2 2017 UPC Deep Learning for Artificial Intelligence)

8

We want our model with parameters θ to output samples distributed as Pmodel, matching the distribution of our training data Pdata (i.e. P(X)).

Example: y = f(x), where y is a scalar; make Pmodel similar to Pdata by training the parameters θ to maximize their similarity.

What is a generative model?
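In maximum-likelihood terms (a standard restatement of this slide rather than its literal equation, which was not captured in the transcript), training amounts to:

$$\theta^{*} = \arg\max_{\theta} \prod_{i=1}^{N} P_{model}(x_i; \theta) = \arg\max_{\theta} \sum_{i=1}^{N} \log P_{model}(x_i; \theta)$$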

Page 9: Deep Generative Models I (DLAI D9L2 2017 UPC Deep Learning for Artificial Intelligence)

Key Idea: our model cares about what distribution generated the input data

points, and we want to mimic it with our probabilistic model. Our learned

model should be able to make up new samples from the distribution, not

just copy and paste existing samples!

9

What is a generative model?

Figure from NIPS 2016 Tutorial: Generative Adversarial Networks (I. Goodfellow)

Page 10: Deep Generative Models I (DLAI D9L2 2017 UPC Deep Learning for Artificial Intelligence)

Why Generative Models?

● Model very complex and high-dimensional distributions.

● Be able to generate realistic synthetic samples

○ possibly perform data augmentation

○ simulate possible futures for learning algorithms

● Fill the blanks in the data

● Manipulate real samples with the assistance of the generative model

○ Example: edit pictures with guidance

Page 11: Deep Generative Models I (DLAI D9L2 2017 UPC Deep Learning for Artificial Intelligence)

Motivating Applications

Page 12: Deep Generative Models I (DLAI D9L2 2017 UPC Deep Learning for Artificial Intelligence)

Image inpainting

12

Recover lost information/add enhancing details by learning the natural distribution of pixels.

[Figure: original image vs. enhanced (inpainted) result]

Page 13: Deep Generative Models I (DLAI D9L2 2017 UPC Deep Learning for Artificial Intelligence)

Speech Enhancement

13

Recover lost information/add enhancing details by learning the natural distribution of audio samples.

original

enhanced

Page 14: Deep Generative Models I (DLAI D9L2 2017 UPC Deep Learning for Artificial Intelligence)

Speech Synthesis

14

Spontaneously generate new speech by learning its natural distribution over time.

Speech synthesizer

Page 15: Deep Generative Models I (DLAI D9L2 2017 UPC Deep Learning for Artificial Intelligence)

Image Generation

15

Spontaneously generate new images by learning their spatial distribution.

Image generator

Figure credit: I. Goodfellow

Page 16: Deep Generative Models I (DLAI D9L2 2017 UPC Deep Learning for Artificial Intelligence)

Super-resolution

16

Augment image resolution by introducing plausible details, again by learning the spatial distribution of natural images.

(Ledig et al. 2016)

Page 17: Deep Generative Models I (DLAI D9L2 2017 UPC Deep Learning for Artificial Intelligence)

Generative Models Taxonomy

Page 18: Deep Generative Models I (DLAI D9L2 2017 UPC Deep Learning for Artificial Intelligence)

18

Model the probability density function:

● Explicitly
  ○ With tractable density → PixelRNN, PixelCNN and Wavenet
  ○ With approximate density → Variational Auto-Encoders
● Implicitly
  ○ Generative Adversarial Networks

Taxonomy

Page 19: Deep Generative Models I (DLAI D9L2 2017 UPC Deep Learning for Artificial Intelligence)

PixelRNN, PixelCNN & Wavenet

Page 20: Deep Generative Models I (DLAI D9L2 2017 UPC Deep Learning for Artificial Intelligence)

Factorizing the joint distribution

20
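The factorization these slides build up (the original equation image was not captured, but this is the standard chain-rule decomposition used by PixelRNN, PixelCNN and Wavenet) is:

$$P(x) = P(x_1, \ldots, x_n) = \prod_{i=1}^{n} P(x_i \mid x_1, \ldots, x_{i-1})$$

so the model only ever has to predict one element at a time, conditioned on everything generated so far.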

Page 21: Deep Generative Models I (DLAI D9L2 2017 UPC Deep Learning for Artificial Intelligence)

Factorizing the joint distribution

21

Page 22: Deep Generative Models I (DLAI D9L2 2017 UPC Deep Learning for Artificial Intelligence)

Factorizing the joint distribution


22

Page 23: Deep Generative Models I (DLAI D9L2 2017 UPC Deep Learning for Artificial Intelligence)

PixelRNN

RNN/LSTM

At every pixel (x, y) in a greyscale image: 256 classes (8 bits/pixel).

23

Recall RNN formulation:
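The recurrence referenced here was shown as an image; the usual formulation is h_t = f(h_{t-1}, x_t), with a 256-way softmax on top of h_t to predict the next pixel value. Below is a minimal, hypothetical PyTorch sketch of that idea (a pixel-level LSTM rather than the full row/diagonal LSTM of the PixelRNN paper); the class name and sizes are illustrative.

```python
# Hypothetical sketch: an LSTM reads a greyscale image pixel by pixel and, at
# each step, outputs a 256-way categorical distribution over the next pixel.
import torch
import torch.nn as nn

class PixelLSTM(nn.Module):
    def __init__(self, hidden=128, levels=256):
        super().__init__()
        self.embed = nn.Embedding(levels, hidden)   # pixel value -> vector
        self.lstm = nn.LSTM(hidden, hidden, batch_first=True)
        self.out = nn.Linear(hidden, levels)        # 256 classes per pixel

    def forward(self, pixels):                      # pixels: (B, T) ints in [0, 255]
        h, _ = self.lstm(self.embed(pixels))
        return self.out(h)                          # (B, T, 256) logits

# Training: maximize log P(x_t | x_<t) via cross-entropy on shifted targets.
model = PixelLSTM()
x = torch.randint(0, 256, (8, 28 * 28))             # e.g. flattened 28x28 images
logits = model(x[:, :-1])
loss = nn.functional.cross_entropy(logits.reshape(-1, 256), x[:, 1:].reshape(-1))
loss.backward()
```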

Page 24: Deep Generative Models I (DLAI D9L2 2017 UPC Deep Learning for Artificial Intelligence)

Factorizing the joint distribution


24

Page 25: Deep Generative Models I (DLAI D9L2 2017 UPC Deep Learning for Artificial Intelligence)

Factorizing the joint distribution


A CNN? Whut??

25

Page 26: Deep Generative Models I (DLAI D9L2 2017 UPC Deep Learning for Artificial Intelligence)

My CNN is going causal...

hi there how are you doing ?

100 dim

Let’s say we have a sequence of 100 dimensional vectors describing words.

sequence length = 8

arrange in 2D: a matrix of size (sequence length = 8, 100 dim)

Focus on 1D to exemplify causality.

26

Page 27: Deep Generative Models I (DLAI D9L2 2017 UPC Deep Learning for Artificial Intelligence)

27

We can apply a 1D convolution over the 2D matrix, e.g. for an arbitrary kernel of width = 3:

Each 1D convolutional kernel (e.g. K1) is a 2D matrix of size (3, 100); it slides along the sequence (length = 8) across all 100 dimensions.

My CNN is going causal...

Page 28: Deep Generative Models I (DLAI D9L2 2017 UPC Deep Learning for Artificial Intelligence)

28

Keep in mind we are working with 100 dimensions although here we depict just one for simplicity

[Figure: a width-3 kernel with weights (w1, w2, w3) slides over input positions 1…8, producing output positions 1…6]

The output length of the convolution is well known to be: seqlength - kwidth + 1 = 8 - 3 + 1 = 6

So the output matrix will be (6, 100) because there was no padding

My CNN is going causal...
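As a quick check of the output-length formula, here is a hypothetical PyTorch snippet (PyTorch's Conv1d uses a (batch, channels, time) layout, whereas the slides draw the matrix as (time, dims)):

```python
# Verify: with no padding, output length = seqlength - kwidth + 1 = 8 - 3 + 1 = 6
import torch
import torch.nn as nn

seq_len, dims, kwidth = 8, 100, 3
x = torch.randn(1, dims, seq_len)          # one sequence of 8 word vectors
conv = nn.Conv1d(in_channels=dims, out_channels=dims, kernel_size=kwidth)
y = conv(x)                                 # "valid" convolution, no padding
print(y.shape)                              # torch.Size([1, 100, 6])
```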

Page 29: Deep Generative Models I (DLAI D9L2 2017 UPC Deep Learning for Artificial Intelligence)

29

My CNN is going causal...

When we add zero padding, we normally do so on both sides of the sequence (as in image padding).

[Figure: the sequence is padded with one zero on each side (0, 1…8, 0) and the width-3 kernel slides over it, producing output positions 1…8]

The output length of the convolution: seqlength - kwidth + 1 = 10 - 3 + 1 = 8

So the output matrix will be (8, 100) because we had padding.

Page 30: Deep Generative Models I (DLAI D9L2 2017 UPC Deep Learning for Artificial Intelligence)

30

Add the zero padding just on the left side of the sequence, not symmetrically.

[Figure: the sequence is padded with two zeros on the left (0, 0, 1…8) and the width-3 kernel slides over it, producing output positions 1…8]

The output length of the convolution: seqlength - kwidth + 1 = 10 - 3 + 1 = 8

So the output matrix will be (8, 100) because we had padding.

HOWEVER: now every time-step t depends on the two previous inputs as well as the current time-step → every output is causal.

Roughly: we make a causal convolution by left-padding the sequence with (kwidth - 1) zeros.

My CNN went causal
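A hypothetical PyTorch sketch of this trick (left-pad with kwidth - 1 zeros so each output only sees the present and the past; names are illustrative):

```python
# Causal 1D convolution: pad only on the left of the time axis.
import torch
import torch.nn as nn
import torch.nn.functional as F

class CausalConv1d(nn.Module):
    def __init__(self, channels, kwidth):
        super().__init__()
        self.left_pad = kwidth - 1
        self.conv = nn.Conv1d(channels, channels, kernel_size=kwidth)

    def forward(self, x):                    # x: (batch, channels, time)
        x = F.pad(x, (self.left_pad, 0))     # zeros only on the left
        return self.conv(x)                  # output keeps the input length

x = torch.randn(1, 100, 8)                   # 8 time steps, 100-dim features
print(CausalConv1d(100, 3)(x).shape)         # torch.Size([1, 100, 8])
```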

Page 31: Deep Generative Models I (DLAI D9L2 2017 UPC Deep Learning for Artificial Intelligence)

PixelCNN/Wavenet

31

Page 32: Deep Generative Models I (DLAI D9L2 2017 UPC Deep Learning for Artificial Intelligence)

PixelCNN/Wavenet

The receptive field has to be T.

32

Page 33: Deep Generative Models I (DLAI D9L2 2017 UPC Deep Learning for Artificial Intelligence)

PixelCNN/Wavenet

The receptive field has to be T.

Dilated convolutions help us reach a receptive field sufficiently large to emulate an RNN!

33
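A hypothetical sketch of how stacked dilated causal convolutions grow the receptive field exponentially with depth (kernel width 2, dilations 1, 2, 4, 8; the module itself is illustrative):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class DilatedCausalStack(nn.Module):
    def __init__(self, channels, kwidth=2, dilations=(1, 2, 4, 8)):
        super().__init__()
        self.kwidth, self.dilations = kwidth, dilations
        self.convs = nn.ModuleList(
            [nn.Conv1d(channels, channels, kwidth, dilation=d) for d in dilations])

    def forward(self, x):                                # x: (batch, channels, time)
        for conv, d in zip(self.convs, self.dilations):
            x = F.pad(x, ((self.kwidth - 1) * d, 0))     # causal (left) padding
            x = torch.relu(conv(x))
        return x

    def receptive_field(self):
        return 1 + sum((self.kwidth - 1) * d for d in self.dilations)

net = DilatedCausalStack(channels=64)
print(net.receptive_field())                             # 1 + 1 + 2 + 4 + 8 = 16 steps
print(net(torch.randn(1, 64, 100)).shape)                # torch.Size([1, 64, 100])
```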

Page 34: Deep Generative Models I (DLAI D9L2 2017 UPC Deep Learning for Artificial Intelligence)

Conditional PixelCNN/Wavenet

34

Page 35: Deep Generative Models I (DLAI D9L2 2017 UPC Deep Learning for Artificial Intelligence)

Conditional PixelCNN/Wavenet

35

Page 36: Deep Generative Models I (DLAI D9L2 2017 UPC Deep Learning for Artificial Intelligence)

Conditional PixelCNN/Wavenet

36
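The equations and figures on the conditional slides were not captured. As a rough illustration of the idea, here is a hypothetical sketch of a WaveNet-style gated activation where a global conditioning vector h (e.g. a speaker or class embedding) is projected and added inside the gate, z = tanh(Wf*x + Vf·h) ⊙ σ(Wg*x + Vg·h); all names and sizes are illustrative:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class ConditionalGatedConv(nn.Module):
    def __init__(self, channels, cond_dim, kwidth=2, dilation=1):
        super().__init__()
        self.pad = (kwidth - 1) * dilation
        self.filt = nn.Conv1d(channels, channels, kwidth, dilation=dilation)
        self.gate = nn.Conv1d(channels, channels, kwidth, dilation=dilation)
        self.cond_filt = nn.Linear(cond_dim, channels)   # Vf: conditioning -> filter
        self.cond_gate = nn.Linear(cond_dim, channels)   # Vg: conditioning -> gate

    def forward(self, x, h):                             # x: (B, C, T), h: (B, cond_dim)
        x = F.pad(x, (self.pad, 0))                      # keep the convolution causal
        f = self.filt(x) + self.cond_filt(h).unsqueeze(-1)
        g = self.gate(x) + self.cond_gate(h).unsqueeze(-1)
        return torch.tanh(f) * torch.sigmoid(g)

layer = ConditionalGatedConv(channels=64, cond_dim=10)
y = layer(torch.randn(2, 64, 100), torch.randn(2, 10))
print(y.shape)                                           # torch.Size([2, 64, 100])
```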

Page 37: Deep Generative Models I (DLAI D9L2 2017 UPC Deep Learning for Artificial Intelligence)

Variational Auto-Encoders

Page 38: Deep Generative Models I (DLAI D9L2 2017 UPC Deep Learning for Artificial Intelligence)

Auto-Encoder Neural Network

Autoencoders:
● Predict at the output the same input data.
● Do not need labels.

[Figure: x → Encode → c (code) → Decode → x̂]

38
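A minimal, hypothetical sketch of such an autoencoder in PyTorch (layer sizes and names are illustrative):

```python
# Encode x to a code c, decode back to x_hat, train to reproduce the input.
import torch
import torch.nn as nn

class AutoEncoder(nn.Module):
    def __init__(self, in_dim=784, code_dim=32):
        super().__init__()
        self.encode = nn.Sequential(nn.Linear(in_dim, 256), nn.ReLU(),
                                    nn.Linear(256, code_dim))
        self.decode = nn.Sequential(nn.Linear(code_dim, 256), nn.ReLU(),
                                    nn.Linear(256, in_dim), nn.Sigmoid())

    def forward(self, x):
        c = self.encode(x)                       # compressed code
        return self.decode(c)                    # reconstruction x_hat

model = AutoEncoder()
x = torch.rand(16, 784)                          # e.g. a batch of flattened images
loss = nn.functional.mse_loss(model(x), x)       # reconstruction loss, no labels
loss.backward()
```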

Page 39: Deep Generative Models I (DLAI D9L2 2017 UPC Deep Learning for Artificial Intelligence)

Auto-Encoder Neural Network

● Q: What generative thing can we do with an AE? How can we make it generate data?

c

Encode Decode

“Generate” 39

Page 40: Deep Generative Models I (DLAI D9L2 2017 UPC Deep Learning for Artificial Intelligence)

Auto-Encoder Neural Network

● Q: What generative thing can we do with an AE? How can we make it generate data?
  ○ A: This “just” memorizes codes C from our training samples!

c

Encode Decode

“Generate”

memorization

40

Page 41: Deep Generative Models I (DLAI D9L2 2017 UPC Deep Learning for Artificial Intelligence)

Variational Auto-Encoder

VAE intuitively:
● Introduce a restriction on z, such that our data points x (e.g. images) are distributed in a latent space (manifold) following a specified probability density function over z (normally N(0, I)).

z

Encode Decode

z ~ N(0, I)

41

Page 42: Deep Generative Models I (DLAI D9L2 2017 UPC Deep Learning for Artificial Intelligence)

VAE intuitively:
● Introduce a restriction on z, such that our data points x (e.g. images) are distributed in a latent space (manifold) following a specified probability density function over z (normally N(0, I)).

z

Encode Decode

z ~ N(0, I)

We can then sample z to generate new x data points (e.g. images).

Variational Auto-Encoder

42

Page 43: Deep Generative Models I (DLAI D9L2 2017 UPC Deep Learning for Artificial Intelligence)

Variational Auto-Encoder
● VAE, aka. where Bayesian theory and deep learning collide, in depth:

[Figure: the generative process: fixed parameters θ; sample z, N times; generate X, N times through a deterministic function]

43
Credit:

Page 44: Deep Generative Models I (DLAI D9L2 2017 UPC Deep Learning for Artificial Intelligence)

Variational Auto-Encoder
● VAE, aka. where Bayesian theory and deep learning collide, in depth:

[Figure: the generative process: fixed parameters θ; sample z, N times; generate X, N times through a deterministic function]

Maximize the probability of each X under the generative process.

Maximum likelihood framework
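The equation behind "maximize the probability of each X under the generative process" was shown as an image; in standard notation it is:

$$P(X) = \int P(X \mid z; \theta)\, P(z)\, dz, \qquad \theta^{*} = \arg\max_{\theta} \sum_{i=1}^{N} \log P(X_i)$$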

Page 45: Deep Generative Models I (DLAI D9L2 2017 UPC Deep Learning for Artificial Intelligence)

Variational Auto-Encoder

Intuition behind normally distributed z vectors: any output distribution can be achieved from the simple N(0, I) with powerful non-linear mappings.

45

Page 46: Deep Generative Models I (DLAI D9L2 2017 UPC Deep Learning for Artificial Intelligence)

Variational Auto-Encoder

Intuition behind normally distributed z vectors: any output distribution can be achieved from the simple N(0, I) with powerful non-linear mappings.

Who’s the strongest non-linear mapper in the universe?

46

Page 47: Deep Generative Models I (DLAI D9L2 2017 UPC Deep Learning for Artificial Intelligence)

Variational Auto-Encoder
● VAE, aka. where Bayesian theory and deep learning collide, in depth:

[Figure: the generative process: fixed parameters θ; sample z, N times; generate X, N times through a deterministic function]

Maximize the probability of each X under the generative process.

Maximum likelihood framework

Page 48: Deep Generative Models I (DLAI D9L2 2017 UPC Deep Learning for Artificial Intelligence)

Variational Auto-Encoder

Now to solve the maximum likelihood problem… We’d like to know P(X) and P(X|z). We introduce the posterior P(z|X) as a key piece → it lets us sample values of z likely to produce X, not just the whole space of possibilities.

But P(z|X) is unknown too! Variational Inference comes in to play its role: approximate P(z|X) with Q(z|X).

Key idea behind the variational inference application: find an approximation function that is good enough to represent the real one → optimization problem.

48

Page 49: Deep Generative Models I (DLAI D9L2 2017 UPC Deep Learning for Artificial Intelligence)

Variational Auto-Encoder
Neural network perspective

The approximated function Q(z|X) starts to shape up as a neural encoder, going from training datapoints x to the likely z points following Q(z|X), which in turn is similar to the real posterior P(z|X).

49
Credit: Altosaar

Page 50: Deep Generative Models I (DLAI D9L2 2017 UPC Deep Learning for Artificial Intelligence)

Variational Auto-Encoder
Neural network perspective

The (latent → data) mapping starts to shape up as a neural decoder, where we go from our sampled z to the reconstruction, which can have a very complex distribution.

50
Credit: Altosaar

Page 51: Deep Generative Models I (DLAI D9L2 2017 UPC Deep Learning for Artificial Intelligence)

Variational Auto-Encoder

Continuing with the encoder approximation Q(z|X), we compute its KL divergence with the true posterior distribution P(z|X):

51

KL divergence

Credit: Kristiadis

Page 52: Deep Generative Models I (DLAI D9L2 2017 UPC Deep Learning for Artificial Intelligence)

Variational Auto-Encoder

Continuing with the encoder approximation Q(z|X), we compute its KL divergence with the true posterior distribution P(z|X):

Bayes rule

52
Credit: Kristiadis

Page 53: Deep Generative Models I (DLAI D9L2 2017 UPC Deep Learning for Artificial Intelligence)

Variational Auto-Encoder

Continuing with the encoder approximation Q(z|X), we compute its KL divergence with the true posterior distribution P(z|X):

log P(X) gets out of the expectation because it has no dependency on z.

53
Credit: Kristiadis

Page 54: Deep Generative Models I (DLAI D9L2 2017 UPC Deep Learning for Artificial Intelligence)

Variational Auto-Encoder

Continuing with the encoder approximation Q(z|X), we compute its KL divergence with the true posterior distribution P(z|X):

54
Credit: Kristiadis

Page 55: Deep Generative Models I (DLAI D9L2 2017 UPC Deep Learning for Artificial Intelligence)

Variational Auto-Encoder

Continuing with the encoder approximation Q(z|X), we compute its KL divergence with the true posterior distribution P(z|X):

A bit more rearranging with signs and grouping leads us to a new KL term between Q(z|X) and P(z), i.e. between the encoder's approximate distribution and our prior.

55

Credit: Kristiadis

Page 56: Deep Generative Models I (DLAI D9L2 2017 UPC Deep Learning for Artificial Intelligence)

Variational Auto-Encoder

We finally reach the Variational Auto-Encoder objective function.

56
Credit: Kristiadis

Page 57: Deep Generative Models I (DLAI D9L2 2017 UPC Deep Learning for Artificial Intelligence)

Variational Auto-Encoder

We finally reach the Variational Auto-Encoder objective function.

log likelihood of our data

57
Credit: Kristiadis

Page 58: Deep Generative Models I (DLAI D9L2 2017 UPC Deep Learning for Artificial Intelligence)

Variational Auto-Encoder

We finally reach the Variational Auto-Encoder objective function.

Not computable and non-negative (KL) approximation error.

log likelihood of our data

58
Credit: Kristiadis

Page 59: Deep Generative Models I (DLAI D9L2 2017 UPC Deep Learning for Artificial Intelligence)

Variational Auto-Encoder

We finally reach the Variational Auto-Encoder objective function.

log likelihood of our data

Not computable and non-negative (KL) approximation error.

59

Reconstruction loss of our data given latent space → NEURAL DECODER reconstruction loss!

Credit: Kristiadis

Page 60: Deep Generative Models I (DLAI D9L2 2017 UPC Deep Learning for Artificial Intelligence)

Variational Auto-Encoder

We finally reach the Variational Auto-Encoder objective function.

log likelihood of our data

Not computable and non-negative (KL) approximation error.

Reconstruction loss of our data given latent space → NEURAL DECODER reconstruction loss!

Regularization of our latent representation → NEURAL ENCODER projects over prior.

60
Credit: Kristiadis
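The objective itself was shown as an image; in standard notation (Kingma & Welling 2013), the identity the slides annotate is:

$$\log P(X) - D_{KL}\big[Q(z \mid X)\,\|\,P(z \mid X)\big] = \mathbb{E}_{z \sim Q(z \mid X)}\big[\log P(X \mid z)\big] - D_{KL}\big[Q(z \mid X)\,\|\,P(z)\big]$$

The left-hand side is the data log-likelihood minus the non-computable, non-negative approximation error; the right-hand side is what we actually optimize: the decoder's reconstruction term minus the KL regularizer pushing the encoder's Q(z|X) towards the prior.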

Page 61: Deep Generative Models I (DLAI D9L2 2017 UPC Deep Learning for Artificial Intelligence)

Variational Auto-Encoder

Now, we have to define the shape of Q(z|X) to compute its divergence against the prior (i.e. to properly condense x samples over the surface of z). Simplest way: distribute it as a normal distribution with moments μ(X) and Σ(X).

This allows us to compute the KL-div with N(0, I) in closed form :)

61
Credit: Kristiadis
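For a diagonal Gaussian encoder Q(z|X) = N(μ(X), σ²(X) I) and prior N(0, I), the closed-form KL (standard, though the slide's equation image was not captured) is:

$$D_{KL}\big[\mathcal{N}(\mu, \sigma^2 I)\,\|\,\mathcal{N}(0, I)\big] = \frac{1}{2}\sum_{j=1}^{d}\left(\sigma_j^2 + \mu_j^2 - 1 - \log \sigma_j^2\right)$$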

Page 62: Deep Generative Models I (DLAI D9L2 2017 UPC Deep Learning for Artificial Intelligence)

Variational Auto-Encoder

z

Encode Decode

We can compose our encoder - decoder setup, and place our VAE losses to regularize and reconstruct.

62

Page 63: Deep Generative Models I (DLAI D9L2 2017 UPC Deep Learning for Artificial Intelligence)

Variational Auto-Encoder

z

Encode Decode

Reparameterization trick

But WAIT, how can we backprop through the sampling of z? Sampling is not differentiable!

63

Page 64: Deep Generative Models I (DLAI D9L2 2017 UPC Deep Learning for Artificial Intelligence)

Variational Auto-Encoder

z

Reparameterization trick

Sample ε ~ N(0, I) and operate with it, multiplying by σ and summing μ: z = μ + σ ⊙ ε.

64
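A hypothetical sketch of the trick in PyTorch (the function name is illustrative):

```python
# Reparameterization: sample eps ~ N(0, I) and compute z = mu + sigma * eps,
# so gradients can flow back into mu and sigma (sampling itself has no gradient).
import torch

def reparameterize(mu, logvar):
    sigma = torch.exp(0.5 * logvar)      # log-variance -> standard deviation
    eps = torch.randn_like(sigma)        # noise from N(0, I)
    return mu + sigma * eps              # differentiable w.r.t. mu and logvar

mu = torch.zeros(4, 20, requires_grad=True)
logvar = torch.zeros(4, 20, requires_grad=True)
z = reparameterize(mu, logvar)
z.sum().backward()                       # gradients reach mu and logvar
```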

Page 65: Deep Generative Models I (DLAI D9L2 2017 UPC Deep Learning for Artificial Intelligence)

Variational Auto-Encoder

z

Generative behavior

Q: How can we now generate new samples once the underlying generating distribution is learned?

65

Page 66: Deep Generative Models I (DLAI D9L2 2017 UPC Deep Learning for Artificial Intelligence)

Variational Auto-Encoder

z1

Generative behavior

Q: How can we now generate new samples once the underlying generating distribution is learned?

A: We can sample from our prior, for example, discarding the encoder path.

z2

z3

66
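A hypothetical sketch of this generative use (the decoder here is a stand-in, not the deck's model):

```python
# Sample z from the prior N(0, I) and decode it, discarding the encoder path.
import torch
import torch.nn as nn

decoder = nn.Sequential(nn.Linear(20, 400), nn.ReLU(),
                        nn.Linear(400, 784), nn.Sigmoid())   # stand-in decoder

with torch.no_grad():
    z = torch.randn(16, 20)              # 16 samples from the prior N(0, I)
    samples = decoder(z)                 # (16, 784), e.g. 16 new 28x28 images
```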

Page 67: Deep Generative Models I (DLAI D9L2 2017 UPC Deep Learning for Artificial Intelligence)

67

Variational Auto-Encoder

Walking around the z manifold dimensions gives us spontaneous generation of samples with different shapes, poses, identities, lighting, etc.

Examples:

MNIST manifold: https://youtu.be/hgyB8RegAlQ

Face manifold: https://www.youtube.com/watch?v=XNZIN7Jh3Sg

Page 68: Deep Generative Models I (DLAI D9L2 2017 UPC Deep Learning for Artificial Intelligence)

68

Variational Auto-Encoder

Walking around the z manifold dimensions gives us spontaneous generation of samples with different shapes, poses, identities, lighting, etc.

Example with MNIST manifold

Page 69: Deep Generative Models I (DLAI D9L2 2017 UPC Deep Learning for Artificial Intelligence)

69

Variational Auto-Encoder

Walking around the z manifold dimensions gives us spontaneous generation of samples with different shapes, poses, identities, lighting, etc.

Example with Faces manifold

Page 70: Deep Generative Models I (DLAI D9L2 2017 UPC Deep Learning for Artificial Intelligence)

Variational Auto-Encoder

A code walkthrough of VAEs with PyTorch!

https://github.com/pytorch/examples/tree/master/vae

70

Page 71: Deep Generative Models I (DLAI D9L2 2017 UPC Deep Learning for Artificial Intelligence)

Variational Auto-Encoder

Model

Loss

71
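The model and loss shown on this slide were screenshots and are not captured in the transcript; below is a hypothetical sketch in the spirit of the linked pytorch/examples VAE (layer sizes and names are illustrative, not the exact code from the slide):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class VAE(nn.Module):
    def __init__(self):
        super().__init__()
        self.fc1 = nn.Linear(784, 400)
        self.fc_mu = nn.Linear(400, 20)        # encoder outputs mu ...
        self.fc_logvar = nn.Linear(400, 20)    # ... and log-variance
        self.fc3 = nn.Linear(20, 400)
        self.fc4 = nn.Linear(400, 784)

    def encode(self, x):
        h = F.relu(self.fc1(x))
        return self.fc_mu(h), self.fc_logvar(h)

    def decode(self, z):
        return torch.sigmoid(self.fc4(F.relu(self.fc3(z))))

    def forward(self, x):
        mu, logvar = self.encode(x)
        z = mu + torch.exp(0.5 * logvar) * torch.randn_like(mu)   # reparameterization
        return self.decode(z), mu, logvar

def vae_loss(x_hat, x, mu, logvar):
    recon = F.binary_cross_entropy(x_hat, x, reduction='sum')      # decoder reconstruction
    kld = -0.5 * torch.sum(1 + logvar - mu.pow(2) - logvar.exp())  # KL vs N(0, I), closed form
    return recon + kld

model = VAE()
x = torch.rand(8, 784)
x_hat, mu, logvar = model(x)
loss = vae_loss(x_hat, x, mu, logvar)
loss.backward()
```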

Page 72: Deep Generative Models I (DLAI D9L2 2017 UPC Deep Learning for Artificial Intelligence)

Thanks! Questions?

Page 73: Deep Generative Models I (DLAI D9L2 2017 UPC Deep Learning for Artificial Intelligence)

References

73

● NIPS 2016 Tutorial: Generative Adversarial Networks (Goodfellow 2016)

● Pixel Recurrent Neural Networks (van den Oord et al. 2016)

● Conditional Image Generation with PixelCNN Decoders (van den Oord et al. 2016)

● Auto-Encoding Variational Bayes (Kingma & Welling 2013)

● https://wiseodd.github.io/techblog/2016/12/10/variational-autoencoder/

● https://jaan.io/what-is-variational-autoencoder-vae-tutorial/

● Tutorial on Variational Autoencoders (Doersch 2016)