
Page 1: Lecture 10: Autoencoders

Lecture 10: Autoencoders

Madhavan Mukund and Pranabendu Misra

Advanced Machine Learning 2021

Page 2: Lecture 10: Autoencoders

Learning (Sensible) Compressed Representations

Learning a sensible and compressed representation of a given dataset is an important task:

Given an image, learn about the objects in the image

Then we can add / remove / modify the objects in the image, thereby generating a new image. We can generate similar images. We can create a sensible interpolation between two images.

Given a music file, learn about the instruments in the music

Remove noise and distortions, add or remove other instruments, and so on ...

This is an unsupervised learning task

Page 4: Lecture 10: Autoencoders

Consider the following two images

Page 5: Lecture 10: Autoencoders

A usual interpolation between the two images could be:

Page 6: Lecture 10: Autoencoders

A more sensible interpolation between the two images would be:

This is something like what, for example, a child would imagine a dog transforming into a bird to look like.

Page 8: Lecture 10: Autoencoders

Example: shifting the image

Page 9: Lecture 10: Autoencoders

Example: Night to Day

Page 10: Lecture 10: Autoencoders

Super Resolution

Page 11: Lecture 10: Autoencoders

Caption to Image:

Input Caption: A bright blue bird with white belly

Page 12: Lecture 10: Autoencoders

Autoencoders

The idea is to learn how to encode the input into a compressed form, and then re-generate the input from the encoding.

An encoder network learns how to map the input x to a code h in a latent space of far smaller dimension.

At the same time, a decoder network learns to map the code h back to an approximate form x̂ of the input.

This is an Unsupervised Learning task.
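As a concrete illustration (not part of the original slides), a minimal PyTorch sketch of such an encoder/decoder pair could look as follows; the layer sizes (784-dimensional input, 32-dimensional code) are arbitrary choices for MNIST-like data.

```python
import torch.nn as nn

class Autoencoder(nn.Module):
    """Minimal autoencoder: x -> h (code) -> x_hat (reconstruction)."""
    def __init__(self, input_dim=784, code_dim=32):
        super().__init__()
        # Encoder: maps the input x to a low-dimensional code h
        self.encoder = nn.Sequential(
            nn.Linear(input_dim, 256), nn.ReLU(),
            nn.Linear(256, code_dim),
        )
        # Decoder: maps the code h back to an approximation x_hat of x
        self.decoder = nn.Sequential(
            nn.Linear(code_dim, 256), nn.ReLU(),
            nn.Linear(256, input_dim), nn.Sigmoid(),  # pixel values in [0, 1]
        )

    def forward(self, x):
        h = self.encoder(x)
        x_hat = self.decoder(h)
        return x_hat, h
```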

Page 16: Lecture 10: Autoencoders

Autoencoders

In equations:

h = f(W_h x)

x̂ = g(W_x h)

Here f and g are the non-linear activation functions.

Without these non-linear activations, and with W_x = W_h^T, the above equations would define PCA (Principal Component Analysis), where the dimension of the vector h is the number of principal components.

So autoencoders can be thought of as a generalization of PCA.
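As a sketch (not from the slides), these two equations can be written directly with explicit weight matrices; the dimensions are illustrative, and the last line shows the linear, tied-weight special case that corresponds to PCA.

```python
import torch

n, d = 784, 32                    # input and code dimensions (illustrative)
W_h = 0.01 * torch.randn(d, n)    # encoder weights
W_x = 0.01 * torch.randn(n, d)    # decoder weights
f, g = torch.tanh, torch.sigmoid  # non-linear activations

x = torch.rand(n)                 # a dummy input
h = f(W_h @ x)                    # h = f(W_h x)
x_hat = g(W_x @ h)                # x_hat = g(W_x h)

# Linear special case: drop f and g and tie W_x = W_h^T.
# Trained with squared error, this spans the same subspace as PCA
# with d principal components.
x_hat_linear = W_h.T @ (W_h @ x)
```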

Page 19: Lecture 10: Autoencoders

Autoencoders

Loss functions:

Mean Squared Error: ℓ(x, x̂) = (1/2)‖x − x̂‖², suitable for x ∈ R^n, e.g. an image.

Cross-Entropy loss: ℓ(x, x̂) = −∑_{i=1}^n [ x_i log(x̂_i) + (1 − x_i) log(1 − x̂_i) ], suitable for categorical x, e.g. binary.

These are the reconstruction errors.

Clearly, minimizing the loss function means that the encoder and decoder learn to compress and reconstruct the inputs in the training set. We can then validate using the test set.
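A short sketch of both reconstruction losses (assuming x and x̂ are PyTorch tensors of the same shape, with x̂ in (0, 1) for the cross-entropy case):

```python
import torch
import torch.nn.functional as F

def mse_reconstruction_loss(x, x_hat):
    # (1/2) * ||x - x_hat||^2, suitable for real-valued x such as images
    return 0.5 * torch.sum((x - x_hat) ** 2)

def ce_reconstruction_loss(x, x_hat):
    # -sum_i [ x_i log(x_hat_i) + (1 - x_i) log(1 - x_hat_i) ],
    # suitable for binary / categorical x
    return F.binary_cross_entropy(x_hat, x, reduction="sum")
```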

Page 22: Lecture 10: Autoencoders

The outputs generated by an autoencoder trained on MNIST.

Page 23: Lecture 10: Autoencoders

Autoencoders: Variants

De-noising Autoencoder:

The input to the encoder is x + r, where r is some small amount of random noise.

The loss function still compares x and x̂.

The idea is that the encoder will learn to separate out the noise r from x, thus learning useful features of the input and generating x̂ from those.

This filters out some of the noise that was present in the input data.
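A sketch of one denoising training step, reusing the hypothetical Autoencoder module from earlier; the Gaussian noise and the 0.1 noise scale are illustrative assumptions.

```python
import torch
import torch.nn.functional as F

def denoising_step(model, x, optimizer, noise_std=0.1):
    """One training step of a denoising autoencoder."""
    noisy_x = x + noise_std * torch.randn_like(x)  # feed x + r to the encoder
    x_hat, _ = model(noisy_x)                      # reconstruct from the noisy input
    loss = F.mse_loss(x_hat, x)                    # the loss still compares x and x_hat
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```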

Page 27: Lecture 10: Autoencoders

Autoencoders: Variants

Contractive Autoencoder:

Enforces the notion that two similar inputs x and x′ should map to similar codes h and h′ in the latent space.

The loss function has an extra regularization term, which is λ · ‖J_h(x)‖².

Recall that J_h(x), the Jacobian, is the matrix of partial derivatives of h w.r.t. x.

So the above term in the loss forces smaller derivatives for h. Hence, small changes in x should lead to small changes in h.
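A sketch of the extra regularization term for a single input vector, using torch.autograd.functional.jacobian; this brute-force version is illustrative only (in practice the penalty is usually computed analytically for a one-layer encoder), and the λ value is an arbitrary assumption.

```python
import torch
from torch.autograd.functional import jacobian

def contractive_penalty(encoder, x, lam=1e-3):
    """lam * ||J_h(x)||^2 for a single input vector x."""
    J = jacobian(encoder, x)        # matrix of partial derivatives of h w.r.t. x
    return lam * torch.sum(J ** 2)  # squared (Frobenius) norm of the Jacobian
```

The total loss is then the usual reconstruction error plus this penalty.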

Page 30: Lecture 10: Autoencoders

Autoencoders: The Latent Space

The input data-points x are mapped to points h in the latent space.

We choose the dimension of the latent space to reflect the representational dimension of the input data; e.g., for a data-set of human faces, a latent-space dimension of 50 would be reasonable.

Is the latent space sensible? If we sample a point from the latent space, will the decoder produce an interesting image out of it?
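The question can be tried directly in code: sample a random latent vector and decode it (a sketch, reusing the hypothetical model from earlier; the next slide shows why the result is usually disappointing).

```python
import torch

def decode_random_point(model, code_dim=50):
    """Sample a random point in the latent space and decode it."""
    with torch.no_grad():
        h_random = torch.randn(code_dim)   # a random latent-space point
        return model.decoder(h_random)     # often not a meaningful image
```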

Page 33: Lecture 10: Autoencoders

Autoencoders: Data generation from a random point in the Latent Space?

Expectation:

Reality:

It turns out that autoencoders learn something like JPEG compression: most random latent-space points are meaningless.

Page 36: Lecture 10: Autoencoders

Variational Autoencoders (VAE)

The generation aspect is at the center of this model.

The encoder maps x to a code h which has two parts, E(z) and V(z).

E and V stand for expectation and variance.

The next step is to sample a point z from a Gaussian distribution with expectation E(z) and variance V(z):

z = E(z) + ε ⊙ √V(z)

where ⊙ is the elementwise product, ε ∼ N(0, I_d), and d is the dimension of the latent space.

So the code in the latent space specifies a probability distribution.

Finally, the decoder maps this randomly sampled point z to x̂.
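A sketch of the sampling (reparameterization) step; the encoder is assumed to output the mean E(z) and the log-variance log V(z), a common parameterization that keeps the variance positive.

```python
import torch

def reparameterize(mean, log_var):
    """Sample z = E(z) + eps * sqrt(V(z)) with eps ~ N(0, I_d)."""
    eps = torch.randn_like(mean)    # eps ~ N(0, I_d)
    std = torch.exp(0.5 * log_var)  # sqrt(V(z))
    return mean + eps * std         # elementwise product

# Typical use inside a VAE forward pass (names are illustrative):
#   mean, log_var = encoder(x)
#   z = reparameterize(mean, log_var)
#   x_hat = decoder(z)
```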

Page 42: Lecture 10: Autoencoders

Variational Autoencoders (VAE)

Loss functions for VAE:

Apart from the usual loss, there is an extra regularization term in the loss:

β · ℓ_KL(z, N(0, I_d))

The reconstruction error ℓ(x, x̂) = (1/2)‖x − x̂‖² pushes the "probability balls" apart, because it penalizes overlapping.

We also have no control over the "size" of the bubbles.

Introducing the regularization term in the loss counters this.

Page 45: Lecture 10: Autoencoders

Variational Autoencoders (VAE)

Loss functions for VAE:

Apart from the usual loss, there is an extra regularization term in the loss:

ℓ_KL(z, N(0, I_d)) = (1/2) ∑_{i=1}^d ( V(z_i) − log(V(z_i)) − 1 + E(z_i)² )

This is the KL divergence between the distribution of z and N(0, I_d).

The term (V(z_i) − log(V(z_i)) − 1) is minimized when V(z_i) = 1. This makes the "size" of the balls radius 1.

The term E(z_i)² penalizes large values, so the balls are packed as tightly as possible.
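In the log-variance parameterization assumed above, this closed form translates into the sketch below; β is the weight on the regularization term from the earlier loss slide, treated here as a hyperparameter.

```python
import torch

def kl_regularizer(mean, log_var):
    """0.5 * sum_i ( V(z_i) - log V(z_i) - 1 + E(z_i)^2 )."""
    return 0.5 * torch.sum(torch.exp(log_var) - log_var - 1.0 + mean ** 2)

def vae_loss(x, x_hat, mean, log_var, beta=1.0):
    recon = 0.5 * torch.sum((x - x_hat) ** 2)   # reconstruction error
    return recon + beta * kl_regularizer(mean, log_var)
```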

Page 48: Lecture 10: Autoencoders

Variational Autoencoders (VAE)

Loss functions for VAE:

Overall, the loss function leads to a clustering of similar inputs in the latent space, and these clusters are densely packed.

This means that if we pick a point in the latent space, it can generate a good-quality image.

We can also interpolate between two input points in the latent space to generate new images.
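A sketch of latent-space interpolation between two inputs x1 and x2, assuming a trained VAE whose encoder returns (mean, log_var) as above; using the means as the codes is a common, simple choice.

```python
import torch

def interpolate(encoder, decoder, x1, x2, steps=8):
    """Decode points on the segment between the codes of x1 and x2."""
    with torch.no_grad():
        z1, _ = encoder(x1)                 # use the mean E(z) as the code
        z2, _ = encoder(x2)
        images = []
        for t in torch.linspace(0.0, 1.0, steps):
            z = (1 - t) * z1 + t * z2       # linear interpolation in latent space
            images.append(decoder(z))
    return images
```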

Page 51: Lecture 10: Autoencoders

Examples: VAE trained on Celebrity Faces

Interpolating between hair-colors:

Interpolating between no glasses and glasses

Page 55: Lecture 10: Autoencoders

References:

These slides are based on:

https://www.compthree.com/blog/autoencoder/

https://atcold.github.io/pytorch-Deep-Learning/en/week07/07-3/

https://atcold.github.io/pytorch-Deep-Learning/en/week08/08-3/
