
CS-F441: SELECTED TOPICS FROM COMPUTER SCIENCE (DEEP LEARNING FOR NLP & CV)

Lecture-KT-15: Autoencoders, VAE

Dr. Kamlesh Tiwari, Assistant Professor,

Department of Computer Science and Information Systems, BITS Pilani, Rajasthan-333031 INDIA

Nov 27, 2019 (Campus @ BITS-Pilani July-Dec 2019)

Autoencoders

High-dimensional data can often be represented in a low dimension.

Linear manifold

Try to make the output the same as the input, with a central bottleneck.
The bottleneck corresponds to PCA or something like it.
If the hidden and output units are linear and the network minimizes the squared reconstruction error, it does exactly what PCA does.
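To make the PCA claim concrete, here is a minimal pure-Python sketch (not from the lecture): a linear autoencoder with a single hidden unit, trained by full-batch gradient descent on the mean squared reconstruction error. The toy data, learning rate, step count, and initial weights are illustrative assumptions. Because the data lie on a line through the origin, the trained decoder weights should point along that line, which is the first principal direction.

```python
def train_linear_autoencoder(data, lr=0.05, steps=3000):
    """Linear autoencoder x -> d * (e . x) with ONE hidden unit (the bottleneck).

    Full-batch gradient descent on the mean squared reconstruction error.
    Hypothetical helper: assumes 2-D inputs; the fixed initial weights are arbitrary.
    """
    e = [0.5, 0.1]  # encoder weights (input -> code)
    d = [0.2, 0.3]  # decoder weights (code -> output)
    n = len(data)
    for _ in range(steps):
        grad_e = [0.0, 0.0]
        grad_d = [0.0, 0.0]
        for x in data:
            h = e[0] * x[0] + e[1] * x[1]            # the 1-number code
            r = [d[j] * h - x[j] for j in range(2)]  # residual x_hat - x
            r_dot_d = r[0] * d[0] + r[1] * d[1]
            for j in range(2):
                grad_d[j] += 2 * r[j] * h / n        # dL/dd_j
                grad_e[j] += 2 * r_dot_d * x[j] / n  # dL/de_j
        e = [e[j] - lr * grad_e[j] for j in range(2)]
        d = [d[j] - lr * grad_d[j] for j in range(2)]
    return e, d


def mean_reconstruction_error(data, e, d):
    total = 0.0
    for x in data:
        h = e[0] * x[0] + e[1] * x[1]
        total += sum((d[j] * h - x[j]) ** 2 for j in range(2))
    return total / len(data)


# Toy data (assumption): points on the line through the origin with direction (0.6, 0.8).
data = [[0.6 * t, 0.8 * t] for t in (-2, -1, 0, 1, 2)]
e, d = train_linear_autoencoder(data)
print(mean_reconstruction_error(data, e, d))  # close to 0
norm = (d[0] ** 2 + d[1] ** 2) ** 0.5
print([d[0] / norm, d[1] / norm])             # close to [0.6, 0.8]
```

With real, noisy data the learned weights span the same subspace as the top principal components, though not necessarily the same orthonormal basis.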

It uses a supervised learning method (the input serves as its own target) to do unsupervised learning.

The encoder converts coordinates in the input space to coordinates on the manifold (using a non-linear mapping); the decoder does the reverse.
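A hand-written (not learned) illustration of the encoder/decoder idea: if the data lie on the unit circle, a curved 1-dimensional manifold in 2-D, a non-linear encoder can map each point to a single manifold coordinate (its angle), and the decoder maps the angle back. The circle and the use of `atan2` are illustrative assumptions; a trained autoencoder would have to learn such a mapping from data. A linear bottleneck with one unit could not do this, since all of its reconstructions lie on a single line.

```python
import math


def encode(point):
    """Non-linear encoder: 2-D input coordinates -> 1 coordinate on the manifold (angle)."""
    x, y = point
    return math.atan2(y, x)


def decode(angle):
    """Decoder: manifold coordinate -> 2-D input coordinates."""
    return (math.cos(angle), math.sin(angle))


points = [(math.cos(t), math.sin(t)) for t in (0.0, 1.0, 2.5, -2.0)]
for p in points:
    print(p, "->", round(encode(p), 3), "->", decode(encode(p)))
```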

STCS-DL4NLP&CV (CS-F441) Campus @ BITS-Pilani Lecture-KT-15 (Nov 27, 2019) 2 / 9

Autoencoder example-1

Compressing digit images to 30 numbers.

MNIST¹ digit images (28 × 28), three hidden layers (decoder weights were the transposes of the encoder weights)

784 → 1000 → 500 → 250 → 30
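A minimal pure-Python sketch of this architecture with tied weights: the decoder reuses each encoder matrix transposed, in reverse order. Logistic units in every layer, zero initial biases, and the random initialisation scale are assumptions for illustration; the sketch covers only the forward pass and a parameter count, not training. A small layer-size list is used for the forward pass so it runs quickly in plain Python.

```python
import math
import random


def init_tied_autoencoder(sizes, seed=0):
    """Encoder matrices for sizes[0] -> ... -> sizes[-1]; the decoder reuses them transposed."""
    rng = random.Random(seed)
    weights = []
    for n_in, n_out in zip(sizes[:-1], sizes[1:]):
        scale = 1.0 / math.sqrt(n_in)  # assumed initialisation scale
        weights.append([[rng.gauss(0.0, scale) for _ in range(n_in)] for _ in range(n_out)])
    enc_biases = [[0.0] * n for n in sizes[1:]]
    dec_biases = [[0.0] * n for n in reversed(sizes[:-1])]
    return weights, enc_biases, dec_biases


def sigmoid(v):
    return [1.0 / (1.0 + math.exp(-a)) for a in v]


def forward(x, weights, enc_biases, dec_biases):
    """Return (code, reconstruction) for one input vector x."""
    h = x
    for W, b in zip(weights, enc_biases):            # encoder layer: sigmoid(W h + b)
        h = sigmoid([sum(w * hi for w, hi in zip(row, h)) + bj for row, bj in zip(W, b)])
    code = h
    for W, b in zip(reversed(weights), dec_biases):  # decoder layer: sigmoid(W^T h + b)
        cols = zip(*W)                               # columns of W are the rows of W^T
        h = sigmoid([sum(w * hi for w, hi in zip(col, h)) + bj for col, bj in zip(cols, b)])
    return code, h


def count_parameters(sizes, tied=True):
    weights = sum(a * b for a, b in zip(sizes[:-1], sizes[1:]))
    biases = sum(sizes[1:]) + sum(sizes[:-1])        # encoder biases + decoder biases
    return (weights if tied else 2 * weights) + biases


mnist_sizes = [784, 1000, 500, 250, 30]
print(count_parameters(mnist_sizes))              # 1420814
print(count_parameters(mnist_sizes, tied=False))  # 2837314

small = [8, 6, 4, 2]
params = init_tied_autoencoder(small)
code, recon = forward([0.5] * 8, *params)
print(len(code), len(recon))                      # 2 8
```

Tying the weights roughly halves the number of parameters, as the two counts show.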

¹ http://yann.lecun.com/exdb/mnist/



Compare documents for similarity

Bag of words.
Word counts are normalized to probabilities.
Compressed to 10 dimensions.
Softmax is used at the output.
400K business documents.
Hand-labeled with ground-truth categories.
Cosine similarity.
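A minimal sketch of the retrieval step: word counts are normalized to probabilities (the encoder input), and documents are ranked by the cosine similarity of their codes. The `encode` argument stands in for a trained 10-dimensional encoder, which is not shown; the identity function and the four-word vocabulary in the usage example are hypothetical placeholders.

```python
import math


def normalize_counts(counts):
    """Turn a bag-of-words count vector into a probability distribution."""
    total = sum(counts)
    return [c / total for c in counts] if total else [0.0] * len(counts)


def cosine_similarity(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def rank_by_similarity(query_counts, doc_counts, encode):
    """Return document indices sorted from most to least similar to the query."""
    q = encode(normalize_counts(query_counts))
    codes = [encode(normalize_counts(c)) for c in doc_counts]
    scores = [cosine_similarity(q, c) for c in codes]
    return sorted(range(len(codes)), key=lambda i: scores[i], reverse=True)


# Hypothetical vocabulary: ["stock", "market", "football", "goal"]
docs = [[3, 2, 0, 0], [0, 0, 4, 1], [1, 1, 0, 0]]
query = [2, 1, 0, 0]
print(rank_by_similarity(query, docs, encode=lambda p: p))  # [0, 2, 1]
```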



Latent semantic analysis (LSA) is much worse.

Figure: reduce to 2 real numbers using PCA with log(1 + count)

Figure: deep autoencoders
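A minimal pure-Python sketch of the PCA baseline: apply log(1 + count) to each entry, centre the data, and project each document onto the top two principal directions, found here by power iteration with deflation. Power iteration is a stand-in chosen so the sketch needs no libraries; a real pipeline would normally call an SVD routine. The iteration count and start vector are assumptions.

```python
import math


def log_transform(count_rows):
    return [[math.log1p(c) for c in row] for row in count_rows]  # log(1 + count)


def covariance(rows):
    n, dim = len(rows), len(rows[0])
    mean = [sum(r[j] for r in rows) / n for j in range(dim)]
    centred = [[r[j] - mean[j] for j in range(dim)] for r in rows]
    cov = [[sum(r[i] * r[j] for r in centred) / n for j in range(dim)] for i in range(dim)]
    return centred, cov


def top_eigenvector(matrix, iters=500):
    """Power iteration: repeatedly multiply by the matrix and renormalise."""
    dim = len(matrix)
    v = [1.0 / (k + 1) for k in range(dim)]  # arbitrary non-zero start vector
    for _ in range(iters):
        w = [sum(matrix[i][j] * v[j] for j in range(dim)) for i in range(dim)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:
            break
        v = [x / norm for x in w]
    eigenvalue = sum(v[i] * sum(matrix[i][j] * v[j] for j in range(dim)) for i in range(dim))
    return eigenvalue, v


def pca_2d(count_rows):
    """Reduce each document (a row of word counts) to 2 real numbers."""
    centred, cov = covariance(log_transform(count_rows))
    dim = len(cov)
    components = []
    for _ in range(2):
        lam, v = top_eigenvector(cov)
        components.append(v)
        # Deflate: remove the found direction so the next pass finds the next one.
        cov = [[cov[i][j] - lam * v[i] * v[j] for j in range(dim)] for i in range(dim)]
    return [[sum(r[j] * c[j] for j in range(dim)) for c in components] for r in centred]


counts = [[3, 0, 1, 0], [0, 2, 0, 5], [1, 1, 1, 1], [4, 0, 0, 2], [0, 3, 2, 0]]
for point in pca_2d(counts):
    print(point)
```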


Autoencoder
Undercomplete (the hidden layer is smaller than the input).
Non-linear activation.
Regularization is needed (to keep the coefficients small).
Dropout for learning a more general representation.
Denoising autoencoders.
Useful for segmentation and deep features.
Neural inpainting.
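A minimal sketch of two ideas from this list, under assumed names: masking noise for a denoising autoencoder (randomly zero a fraction of the inputs, which resembles applying dropout to the input layer, and ask the network to reconstruct the clean input), and an L2 penalty that keeps the weights small. The corruption rate, the penalty weight `lam`, and the `reconstruct` callable standing in for an encoder-decoder pair are hypothetical choices.

```python
import random


def mask_corrupt(x, drop_prob, rng):
    """Masking noise: each input is independently set to 0 with probability drop_prob."""
    return [0.0 if rng.random() < drop_prob else xi for xi in x]


def denoising_loss(x, reconstruct, weights, drop_prob=0.3, lam=1e-3, rng=None):
    """Squared error between the CLEAN input and the reconstruction of a corrupted copy,
    plus an L2 (weight-decay) penalty lam * sum(w^2) over all weights."""
    rng = rng or random.Random(0)
    x_tilde = mask_corrupt(x, drop_prob, rng)
    x_hat = reconstruct(x_tilde)
    recon = sum((a - b) ** 2 for a, b in zip(x, x_hat))
    l2 = lam * sum(w * w for w in weights)
    return recon + l2


rng = random.Random(42)
x = [1.0, 2.0, 3.0, 4.0]
print(mask_corrupt(x, 0.5, rng))
# An identity "network" cannot restore masked entries, so every dropped entry adds to the loss.
print(denoising_loss(x, reconstruct=lambda v: v, weights=[0.5, -0.5], drop_prob=0.5))
```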


Variational Autoencoders

A type of generative model

What if we feed an arbitrary code to the decoder?

We get garbage out.
How do we get a valid hidden representation by sampling?
Sample from a distribution. But we have no idea about its parameters...
Force the parameters to be those of a known distribution.
Train with a latent loss and a reconstruction loss.
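A minimal sketch of the sampling step, assuming (as is standard for VAEs) that the encoder outputs a mean `mu` and a standard deviation `sigma` per latent dimension. A code is drawn as z = mu + sigma * eps with eps ~ N(0, 1), a technique known as the reparameterization trick. Once training has pushed the codes towards N(0, 1), new data can be generated by sampling z directly from N(0, 1) and passing it to the decoder. The function names are hypothetical.

```python
import random


def sample_latent(mu, sigma, rng):
    """Reparameterised sample: z = mu + sigma * eps, eps ~ N(0, 1) per dimension."""
    return [m + s * rng.gauss(0.0, 1.0) for m, s in zip(mu, sigma)]


def sample_prior(dim, rng):
    """Generation time: draw z from the known prior N(0, I) and feed it to the decoder."""
    return [rng.gauss(0.0, 1.0) for _ in range(dim)]


rng = random.Random(0)
mu, sigma = [1.0, -2.0], [0.5, 0.1]
samples = [sample_latent(mu, sigma, rng) for _ in range(20000)]
means = [sum(z[k] for z in samples) / len(samples) for k in range(2)]
print(means)                # approximately [1.0, -2.0]
print(sample_prior(2, rng))
```

Writing the sample as a deterministic function of `mu`, `sigma`, and independent noise lets gradients flow back into the encoder's outputs during training.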







Minimize two quantities:
1. reconstruction loss: $\|x - \hat{x}\|^2$
2. latent loss: $D_{KL}\big(\mathcal{G}(z_\mu, z_\sigma) \,\|\, \mathcal{N}(0, 1)\big)$
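A minimal sketch of the two-term objective, assuming G(z_mu, z_sigma) is a diagonal Gaussian, so its KL divergence from N(0, 1) has the standard closed form ½(μ² + σ² − 1 − ln σ²) per latent dimension. Weighting the two terms equally is an assumption; the lecture does not specify a weighting.

```python
import math


def kl_to_standard_normal(mu, sigma):
    """KL( N(mu, sigma^2) || N(0, 1) ), summed over independent latent dimensions."""
    return sum(0.5 * (m * m + s * s - 1.0 - math.log(s * s)) for m, s in zip(mu, sigma))


def vae_loss(x, x_hat, mu, sigma):
    reconstruction = sum((a - b) ** 2 for a, b in zip(x, x_hat))  # ||x - x_hat||^2
    latent = kl_to_standard_normal(mu, sigma)                      # latent loss
    return reconstruction + latent


print(kl_to_standard_normal([0.0], [1.0]))              # 0.0: the code already matches N(0, 1)
print(kl_to_standard_normal([1.0], [1.0]))              # 0.5
print(vae_loss([1.0, 0.0], [0.5, 0.0], [1.0], [1.0]))   # 0.75
```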

The Kullback-Leibler divergence measures the difference between two probability distributions.

$$D_{KL}\big(p(x) \,\|\, q(x)\big) = \sum_{x \in X} p(x) \ln \frac{p(x)}{q(x)}$$
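A minimal sketch of this formula for discrete distributions given as lists of probabilities over the same outcomes. The handling of zero probabilities follows the usual conventions (0 · ln 0 = 0, and the divergence is infinite if q(x) = 0 where p(x) > 0). The example also shows that the divergence is not symmetric.

```python
import math


def kl_divergence(p, q):
    """D_KL(p || q) = sum_x p(x) * ln(p(x) / q(x)) for discrete distributions."""
    total = 0.0
    for px, qx in zip(p, q):
        if px == 0.0:
            continue              # convention: 0 * ln(0 / q) = 0
        if qx == 0.0:
            return math.inf       # p puts mass where q has none
        total += px * math.log(px / qx)
    return total


p, q = [0.5, 0.5], [0.25, 0.75]
print(kl_divergence(p, p))  # 0.0
print(kl_divergence(p, q))  # 0.5 * ln(4/3), about 0.1438
print(kl_divergence(q, p))  # a different value: KL is not symmetric
```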


Thank You!

Thank you very much for your attention²!

Queries?

² Credit: https://www.youtube.com/watch?v=NM6lrxy0bxs

