lecture 11: deep sequential models - github pages › lectures › apr2019 ›...

50
UVA DEEP LEARNING COURSE – EFSTRATIOS GAVVES DEEP SEQUENTIAL MODELS - 1 Lecture 11: Deep Sequential Models Efstratios Gavves

Upload: others

Post on 07-Jul-2020

11 views

Category:

Documents


1 download

TRANSCRIPT

Page 1: Lecture 11: Deep Sequential Models - GitHub Pages › lectures › apr2019 › lecture11-deep...Lecture 11: Deep Sequential Models Efstratios Gavves UVA DEEP LEARNING COURSE –EFSTRATIOS

UVA DEEP LEARNING COURSE – EFSTRATIOS GAVVES DEEP SEQUENTIAL MODELS - 1

Lecture 11: Deep Sequential ModelsEfstratios Gavves

Page 2: Lecture 11: Deep Sequential Models - GitHub Pages › lectures › apr2019 › lecture11-deep...Lecture 11: Deep Sequential Models Efstratios Gavves UVA DEEP LEARNING COURSE –EFSTRATIOS

UVA DEEP LEARNING COURSE – EFSTRATIOS GAVVES DEEP SEQUENTIAL MODELS - 2

oAutoregressive Models

oPixelCNN, PixelCNN++, PixelRNN

oWaveNet

oTime-Aligned DenseNets

Lecture overview

Page 3: Lecture 11: Deep Sequential Models - GitHub Pages › lectures › apr2019 › lecture11-deep...Lecture 11: Deep Sequential Models Efstratios Gavves UVA DEEP LEARNING COURSE –EFSTRATIOS

UVA DEEP LEARNING COURSE – EFSTRATIOS GAVVES DEEP SEQUENTIAL MODELS - 3

oLet’s assume we have signal modelled by an input random variable 𝑥◦Can be an image, video, text, music, temperature measurements

o Is there an order in all these signals?

Autoregressive Models

Page 4: Lecture 11: Deep Sequential Models - GitHub Pages › lectures › apr2019 › lecture11-deep...Lecture 11: Deep Sequential Models Efstratios Gavves UVA DEEP LEARNING COURSE –EFSTRATIOS

UVA DEEP LEARNING COURSE – EFSTRATIOS GAVVES DEEP SEQUENTIAL MODELS - 4

oLet’s assume we have signal modelled by an input random variable 𝑥◦Can be an image, video, text, music, temperature measurements

o Is there an order in all these signals?

Autoregressive Models

Page 5: Lecture 11: Deep Sequential Models - GitHub Pages › lectures › apr2019 › lecture11-deep...Lecture 11: Deep Sequential Models Efstratios Gavves UVA DEEP LEARNING COURSE –EFSTRATIOS

UVA DEEP LEARNING COURSE – EFSTRATIOS GAVVES DEEP SEQUENTIAL MODELS - 5

oLet’s assume we have signal modelled by an input random variable 𝑥◦Can be an image, video, text, music, temperature measurements

o Is there an order in all these signals? Other signals and orders?

Autoregressive Models

Page 6: Lecture 11: Deep Sequential Models - GitHub Pages › lectures › apr2019 › lecture11-deep...Lecture 11: Deep Sequential Models Efstratios Gavves UVA DEEP LEARNING COURSE –EFSTRATIOS

UVA DEEP LEARNING COURSE – EFSTRATIOS GAVVES DEEP SEQUENTIAL MODELS - 6

oLet’s assume we have signal modelled by an input random variable 𝑥◦Can be an image, video, text, music, temperature measurements

o Is there an order in all these signals?

Autoregressive Models

Page 7: Lecture 11: Deep Sequential Models - GitHub Pages › lectures › apr2019 › lecture11-deep...Lecture 11: Deep Sequential Models Efstratios Gavves UVA DEEP LEARNING COURSE –EFSTRATIOS

UVA DEEP LEARNING COURSE – EFSTRATIOS GAVVES DEEP SEQUENTIAL MODELS - 7

o If 𝑥 is sequential, there is an order: 𝑥 = [𝑥1, … , 𝑥𝑘]◦E.g., the order of words in a sentence

o If 𝑥 is not sequential, we can create an artificial order 𝑥 = [𝑥𝑟(1), … , 𝑥𝑟(𝑘)]◦E.g., the order with which pixels make (generate) an image

oThen, the marginal likelihood is a product of conditionals

𝑝 𝑥 =ෑ

𝑘=1

𝐷

𝑝(𝑥𝑘|𝑥𝑗<𝑘)

oDifferent from Recurrent Neural Networks(a) no parameter sharing(b) chains are not infinite in length

Autoregressive Models

Page 8: Lecture 11: Deep Sequential Models - GitHub Pages › lectures › apr2019 › lecture11-deep...Lecture 11: Deep Sequential Models Efstratios Gavves UVA DEEP LEARNING COURSE –EFSTRATIOS

UVA DEEP LEARNING COURSE – EFSTRATIOS GAVVES DEEP SEQUENTIAL MODELS - 8

oPros: because of the product decomposition, 𝑝 𝑥 is tractable

oCons: because the 𝑝 𝑥 is sequential, training is slower◦To generate every new word/frame/pixel, the previous words/frames/pixels in the order must be generated first no parallelism

Autoregressive Models

Page 9: Lecture 11: Deep Sequential Models - GitHub Pages › lectures › apr2019 › lecture11-deep...Lecture 11: Deep Sequential Models Efstratios Gavves UVA DEEP LEARNING COURSE –EFSTRATIOS

UVA DEEP LEARNING COURSE – EFSTRATIOS GAVVES DEEP SEQUENTIAL MODELS - 9

oSequential data is a natural fit◦Language modelling, time series, etc

◦For non-sequential data they are ok, although arguably artificial

oQuestion: How to model the conditionals 𝑝(𝑥𝑘|𝑥𝑗<𝑘)◦Logistic regression (Frey et al., 1996)

◦Neural networks (Bengio and Bengio, 2000)

oModern deep autoregressors◦NADE, MADE, PixelCNN, PixelCNN++, PixelRNN, WaveNet

The where and now of Deep Autoregressive Models

Page 10: Lecture 11: Deep Sequential Models - GitHub Pages › lectures › apr2019 › lecture11-deep...Lecture 11: Deep Sequential Models Efstratios Gavves UVA DEEP LEARNING COURSE –EFSTRATIOS

UVA DEEP LEARNING COURSE – EFSTRATIOS GAVVES DEEP SEQUENTIAL MODELS - 10

NADE

Neural Autoregressive Distribution Estimation, Larochelle and Murray, AISTATS 2011

Page 11: Lecture 11: Deep Sequential Models - GitHub Pages › lectures › apr2019 › lecture11-deep...Lecture 11: Deep Sequential Models Efstratios Gavves UVA DEEP LEARNING COURSE –EFSTRATIOS

UVA DEEP LEARNING COURSE – EFSTRATIOS GAVVES DEEP SEQUENTIAL MODELS - 11

oMinimizing negative log-likelihood as usual

ℒ = − log 𝑝 𝑥 = −

𝑘=1

𝐷

𝑝(𝑥𝑘|𝑥<𝑘)

oThen, we model the conditional as𝑝 𝑥𝑑 𝑥<𝑑 = 𝜎(Vd,∶ ⋅ ℎ𝑑 + bd)

where the latent variable ℎ𝑑 is defined asℎ𝑑 = 𝜎(𝑊:,𝑑 ⋅ 𝑥𝑑 + 𝑐)

NADE

Page 12: Lecture 11: Deep Sequential Models - GitHub Pages › lectures › apr2019 › lecture11-deep...Lecture 11: Deep Sequential Models Efstratios Gavves UVA DEEP LEARNING COURSE –EFSTRATIOS

UVA DEEP LEARNING COURSE – EFSTRATIOS GAVVES DEEP SEQUENTIAL MODELS - 12

o “Teacher forcing” training

NADE: Training & Testing

Training: Use ground truth values (e.g. of pixels) Testing: Use predicted values in previous order

Page 13: Lecture 11: Deep Sequential Models - GitHub Pages › lectures › apr2019 › lecture11-deep...Lecture 11: Deep Sequential Models Efstratios Gavves UVA DEEP LEARNING COURSE –EFSTRATIOS

UVA DEEP LEARNING COURSE – EFSTRATIOS GAVVES DEEP SEQUENTIAL MODELS - 13

NADE Visualizations

Page 14: Lecture 11: Deep Sequential Models - GitHub Pages › lectures › apr2019 › lecture11-deep...Lecture 11: Deep Sequential Models Efstratios Gavves UVA DEEP LEARNING COURSE –EFSTRATIOS

UVA DEEP LEARNING COURSE – EFSTRATIOS GAVVES DEEP SEQUENTIAL MODELS - 14

oQuestion: How could we construct an autoregressive autoencoder?

oTo rephrase: How to modify an autoencoder such that each output 𝑥𝑘depends only on the previous outputs 𝑥<𝑘 (autoregressive property)?◦Namely, the present 𝑘-th output 𝑥𝑘 must not depend on a computational path from future inputs 𝑥𝑘+1, … , 𝑥𝐷◦Autoregressive: 𝑝 𝑥|𝜃 = ς𝑘=1

𝐷 𝑝(𝑥𝑘|𝑥𝑗<𝑘 , 𝜃)

◦Autoencoder: 𝑝 𝑥|𝑥, 𝜃 = ς𝑘=1𝐷 𝑝(𝑥𝑘|𝑥𝑘 , 𝜃)

MADE

Masked Autoencoder for Distribution Estimation, Germain, Mathieu et al., ICML 2015

Page 15: Lecture 11: Deep Sequential Models - GitHub Pages › lectures › apr2019 › lecture11-deep...Lecture 11: Deep Sequential Models Efstratios Gavves UVA DEEP LEARNING COURSE –EFSTRATIOS

UVA DEEP LEARNING COURSE – EFSTRATIOS GAVVES DEEP SEQUENTIAL MODELS - 15

oQuestion: How could we construct an autoregressive autoencoder?

oTo rephrase: How to modify an autoencoder such that each output 𝑥𝑘depends only on the previous outputs 𝑥<𝑘 (autoregressive property)?◦Namely, the present 𝑘-th output 𝑥𝑘 must not depend on a computational path from future inputs 𝑥𝑘+1, … , 𝑥𝐷◦Autoregressive: 𝑝 𝑥|𝜃 = ς𝑘=1

𝐷 𝑝(𝑥𝑘|𝑥𝑗<𝑘 , 𝜃)

◦Autoencoder: 𝑝 𝑥|𝑥, 𝜃 = ς𝑘=1𝐷 𝑝(𝑥𝑘|𝑥𝑘 , 𝜃)

oAnswer: Masked convolutions!ℎ 𝑥 = 𝑔 𝑏 + 𝑊⨀𝑀𝑊 ⋅ 𝑥𝑥 = 𝜎 𝑐 + 𝑉⨀𝑀𝑉 ⋅ ℎ(𝑥)

MADE

Masked Autoencoder for Distribution Estimation, Germain, Mathieu et al., ICML 2015

Page 16: Lecture 11: Deep Sequential Models - GitHub Pages › lectures › apr2019 › lecture11-deep...Lecture 11: Deep Sequential Models Efstratios Gavves UVA DEEP LEARNING COURSE –EFSTRATIOS

UVA DEEP LEARNING COURSE – EFSTRATIOS GAVVES DEEP SEQUENTIAL MODELS - 16

MADE

Masked Autoencoder for Distribution Estimation, Germain, Mathieu et al., ICML 2015

Page 17: Lecture 11: Deep Sequential Models - GitHub Pages › lectures › apr2019 › lecture11-deep...Lecture 11: Deep Sequential Models Efstratios Gavves UVA DEEP LEARNING COURSE –EFSTRATIOS

UVA DEEP LEARNING COURSE – EFSTRATIOS GAVVES DEEP SEQUENTIAL MODELS - 17

oUnsupervised learning: learn how to model 𝑝(𝑥)

oDecompose the marginal

𝑝 𝑥 =ෑ

𝑖=1

𝑛2

𝑝(𝑥𝑖|𝑥1, … , 𝑥𝑖−1)

oAssume row-wise pixel by pixel generation and sequential colors RGB◦Each color conditioned on all colors from previous pixels and specific colors in the same pixel

𝑝 𝑥𝑖,𝑅|𝑥<𝑖 ⋅ 𝑝 𝑥𝑖,𝐺|𝑥<𝑖 , 𝑥𝑖,𝑅 ⋅ 𝑝 𝑥𝑖,𝐵|𝑥<𝑖 , 𝑥𝑖,𝑅 , 𝑥𝑖,𝐺

oFinal output is 256-way softmax

PixelRNN

Pixel Recurrent Neural Networks, van den Oord, Kalchbrenner and Kavukcuoglu, arXiv 2016

Page 18: Lecture 11: Deep Sequential Models - GitHub Pages › lectures › apr2019 › lecture11-deep...Lecture 11: Deep Sequential Models Efstratios Gavves UVA DEEP LEARNING COURSE –EFSTRATIOS

UVA DEEP LEARNING COURSE – EFSTRATIOS GAVVES DEEP SEQUENTIAL MODELS - 18

oHow to model the conditionals?

𝑝 𝑥𝑖,𝑅|𝑥<𝑖 , 𝑝 𝑥𝑖,𝐺|𝑥<𝑖 , 𝑥𝑖,𝑅 , 𝑝 𝑥𝑖,𝐵|𝑥<𝑖 , 𝑥𝑖,𝑅 , 𝑥𝑖,𝐺

oLSTM variants◦12 layers

oRow LSTM

oDiagonal Bi-LSTM

PixelRNN

Row LSTM Diagonal Bi-LSTM

Page 19: Lecture 11: Deep Sequential Models - GitHub Pages › lectures › apr2019 › lecture11-deep...Lecture 11: Deep Sequential Models Efstratios Gavves UVA DEEP LEARNING COURSE –EFSTRATIOS

UVA DEEP LEARNING COURSE – EFSTRATIOS GAVVES DEEP SEQUENTIAL MODELS - 19

oHidden state (i, j) = Hidden state (i-1, j-1) + Hidden state (i-1, j) +Hidden state (i-1, j+1) +p(i, j)

oBy recursion the hidden statecaptures a fairly triangular region

PixelRNN - RowLSTM

Row LSTM

Page 20: Lecture 11: Deep Sequential Models - GitHub Pages › lectures › apr2019 › lecture11-deep...Lecture 11: Deep Sequential Models Efstratios Gavves UVA DEEP LEARNING COURSE –EFSTRATIOS

UVA DEEP LEARNING COURSE – EFSTRATIOS GAVVES DEEP SEQUENTIAL MODELS - 20

oHow to capture the whole previous context

oPixel (i, j) = Pixel (i, j-1) + Pixel (i-1, j)

oProcessing goes on diagonally

oReceptive layer encompasses entire region

PixelRNN – Diagonal BiLSTM

Diagonal Bi-LSTM

Page 21: Lecture 11: Deep Sequential Models - GitHub Pages › lectures › apr2019 › lecture11-deep...Lecture 11: Deep Sequential Models Efstratios Gavves UVA DEEP LEARNING COURSE –EFSTRATIOS

UVA DEEP LEARNING COURSE – EFSTRATIOS GAVVES DEEP SEQUENTIAL MODELS - 21

oPropagate signal faster

oSpeed up convergence

PixelRNN – Residual connections

Page 22: Lecture 11: Deep Sequential Models - GitHub Pages › lectures › apr2019 › lecture11-deep...Lecture 11: Deep Sequential Models Efstratios Gavves UVA DEEP LEARNING COURSE –EFSTRATIOS

UVA DEEP LEARNING COURSE – EFSTRATIOS GAVVES DEEP SEQUENTIAL MODELS - 22

oPros: good modelling of 𝑝(𝑥) nice image generation

oHalf pro: Residual connections speeds up convergence

oCons: still slow training, slow generation

PixelRNN – Pros & Cons

Row LSTM Diagonal Bi-LSTM

Page 23: Lecture 11: Deep Sequential Models - GitHub Pages › lectures › apr2019 › lecture11-deep...Lecture 11: Deep Sequential Models Efstratios Gavves UVA DEEP LEARNING COURSE –EFSTRATIOS

UVA DEEP LEARNING COURSE – EFSTRATIOS GAVVES DEEP SEQUENTIAL MODELS - 24

PixelRNN - Generations

Page 24: Lecture 11: Deep Sequential Models - GitHub Pages › lectures › apr2019 › lecture11-deep...Lecture 11: Deep Sequential Models Efstratios Gavves UVA DEEP LEARNING COURSE –EFSTRATIOS

UVA DEEP LEARNING COURSE – EFSTRATIOS GAVVES DEEP SEQUENTIAL MODELS - 25

oUnfortunately, PixelRNN is too slow

oSolution: replace recurrent connections with convolutions◦Multiple convolutional layers to preserve spatial resolution

oTraining is much faster because all true pixels are known in advance, so we can parallelize◦Generation still sequential (pixels must be generated) still slow

PixelCNN

Stack of masked convolutions

Pixel Recurrent Neural Networks, van den Oord, Kalchbrenner and Kavukcuoglu, arXiv 2016

Page 25: Lecture 11: Deep Sequential Models - GitHub Pages › lectures › apr2019 › lecture11-deep...Lecture 11: Deep Sequential Models Efstratios Gavves UVA DEEP LEARNING COURSE –EFSTRATIOS

UVA DEEP LEARNING COURSE – EFSTRATIOS GAVVES DEEP SEQUENTIAL MODELS - 26

oUse masked convolutions again to enforce autoregressive relationships

PixelCNN

Page 26: Lecture 11: Deep Sequential Models - GitHub Pages › lectures › apr2019 › lecture11-deep...Lecture 11: Deep Sequential Models Efstratios Gavves UVA DEEP LEARNING COURSE –EFSTRATIOS

UVA DEEP LEARNING COURSE – EFSTRATIOS GAVVES DEEP SEQUENTIAL MODELS - 27

oCons: Performance is worse than PixelRNN

oWhy?

PixelCNN – Pros and Cons

Page 27: Lecture 11: Deep Sequential Models - GitHub Pages › lectures › apr2019 › lecture11-deep...Lecture 11: Deep Sequential Models Efstratios Gavves UVA DEEP LEARNING COURSE –EFSTRATIOS

UVA DEEP LEARNING COURSE – EFSTRATIOS GAVVES DEEP SEQUENTIAL MODELS - 28

oCons: Performance is worse than PixelRNN

oWhy?

oNot all past context is taken into account

oNew problem: the cascaded convolutions create a “blind spot”

PixelCNN – Pros and Cons

Page 28: Lecture 11: Deep Sequential Models - GitHub Pages › lectures › apr2019 › lecture11-deep...Lecture 11: Deep Sequential Models Efstratios Gavves UVA DEEP LEARNING COURSE –EFSTRATIOS

UVA DEEP LEARNING COURSE – EFSTRATIOS GAVVES DEEP SEQUENTIAL MODELS - 30

oBecause of(a) the limited receptive field of convolutions and(b) computing all features at once (not sequentially) cascading convolutions makes current pixel not depend on all previous blind spot

Blind spot

Page 29: Lecture 11: Deep Sequential Models - GitHub Pages › lectures › apr2019 › lecture11-deep...Lecture 11: Deep Sequential Models Efstratios Gavves UVA DEEP LEARNING COURSE –EFSTRATIOS

UVA DEEP LEARNING COURSE – EFSTRATIOS GAVVES DEEP SEQUENTIAL MODELS - 31

oUse two layers of convolutions stacks◦Horizontal stack: conditions on current row and takes as input the previous layer output and the vertical stack

◦Vertical stack: conditions on all rows above current pixels

oAlso replace ReLU with a tanh(𝑊 ∗ 𝑥)⋅ 𝜎(𝑈 ∗ 𝑥)

Fixing the blind spot: Gated PixelCNN

Page 30: Lecture 11: Deep Sequential Models - GitHub Pages › lectures › apr2019 › lecture11-deep...Lecture 11: Deep Sequential Models Efstratios Gavves UVA DEEP LEARNING COURSE –EFSTRATIOS

UVA DEEP LEARNING COURSE – EFSTRATIOS GAVVES DEEP SEQUENTIAL MODELS - 32

oCoral reef

PixelCNN - Generations

Page 31: Lecture 11: Deep Sequential Models - GitHub Pages › lectures › apr2019 › lecture11-deep...Lecture 11: Deep Sequential Models Efstratios Gavves UVA DEEP LEARNING COURSE –EFSTRATIOS

UVA DEEP LEARNING COURSE – EFSTRATIOS GAVVES DEEP SEQUENTIAL MODELS - 33

oSorrel horse

PixelCNN - Generation

Page 32: Lecture 11: Deep Sequential Models - GitHub Pages › lectures › apr2019 › lecture11-deep...Lecture 11: Deep Sequential Models Efstratios Gavves UVA DEEP LEARNING COURSE –EFSTRATIOS

UVA DEEP LEARNING COURSE – EFSTRATIOS GAVVES DEEP SEQUENTIAL MODELS - 34

oSandbar

PixelCNN - Generation

Page 33: Lecture 11: Deep Sequential Models - GitHub Pages › lectures › apr2019 › lecture11-deep...Lecture 11: Deep Sequential Models Efstratios Gavves UVA DEEP LEARNING COURSE –EFSTRATIOS

UVA DEEP LEARNING COURSE – EFSTRATIOS GAVVES DEEP SEQUENTIAL MODELS - 35

oLhasa Apso

PixelCNN - Generation

Page 34: Lecture 11: Deep Sequential Models - GitHub Pages › lectures › apr2019 › lecture11-deep...Lecture 11: Deep Sequential Models Efstratios Gavves UVA DEEP LEARNING COURSE –EFSTRATIOS

UVA DEEP LEARNING COURSE – EFSTRATIOS GAVVES DEEP SEQUENTIAL MODELS - 36

o Improving the PixelCNN model

oReplace the softmax output with a discretized logistic mixture lihelihood◦Softmax is too memory consuming and gives sparse gradients

◦ Instead, assume logistic distribution of intensity and round off to 8-bits

oCondition on whole pixels, not pixel colors

oDownsample with stride-2 convs to compute long-range dependencies

oUse shortcut connections

oDropout◦PixelCNN is too powerful a framework can onverfit easily

PixelCNN++

PixelCNN++: Improving the PixelCNN with Discretized Logistic, Salimans, Karpathy, Chen, Kingma, ICLR 2017

Page 35: Lecture 11: Deep Sequential Models - GitHub Pages › lectures › apr2019 › lecture11-deep...Lecture 11: Deep Sequential Models Efstratios Gavves UVA DEEP LEARNING COURSE –EFSTRATIOS

UVA DEEP LEARNING COURSE – EFSTRATIOS GAVVES DEEP SEQUENTIAL MODELS - 37

PixelCNN++

Page 36: Lecture 11: Deep Sequential Models - GitHub Pages › lectures › apr2019 › lecture11-deep...Lecture 11: Deep Sequential Models Efstratios Gavves UVA DEEP LEARNING COURSE –EFSTRATIOS

UVA DEEP LEARNING COURSE – EFSTRATIOS GAVVES DEEP SEQUENTIAL MODELS - 38

PixelCNN++ - Generations

Page 37: Lecture 11: Deep Sequential Models - GitHub Pages › lectures › apr2019 › lecture11-deep...Lecture 11: Deep Sequential Models Efstratios Gavves UVA DEEP LEARNING COURSE –EFSTRATIOS

UVA DEEP LEARNING COURSE – EFSTRATIOS GAVVES DEEP SEQUENTIAL MODELS - 39

oA standard VAE with a PixelCNN generator/decoder

oBe careful. Often the generator is so powerful, that the encoder/inference network is ignored Whatever the latent code 𝑧 there will be a nice image generated

PixelVAE

PixelVAE: A Latent Variable Model for Natural Images, Gulrajani et al., ICLR 2017

Page 38: Lecture 11: Deep Sequential Models - GitHub Pages › lectures › apr2019 › lecture11-deep...Lecture 11: Deep Sequential Models Efstratios Gavves UVA DEEP LEARNING COURSE –EFSTRATIOS

UVA DEEP LEARNING COURSE – EFSTRATIOS GAVVES DEEP SEQUENTIAL MODELS - 40

PixelVAE

Page 39: Lecture 11: Deep Sequential Models - GitHub Pages › lectures › apr2019 › lecture11-deep...Lecture 11: Deep Sequential Models Efstratios Gavves UVA DEEP LEARNING COURSE –EFSTRATIOS

UVA DEEP LEARNING COURSE – EFSTRATIOS GAVVES DEEP SEQUENTIAL MODELS - 41

PixelVAE - Generations

64x64 LSUN Bedrooms 64x64 ImageNet

Page 40: Lecture 11: Deep Sequential Models - GitHub Pages › lectures › apr2019 › lecture11-deep...Lecture 11: Deep Sequential Models Efstratios Gavves UVA DEEP LEARNING COURSE –EFSTRATIOS

UVA DEEP LEARNING COURSE – EFSTRATIOS GAVVES DEEP SEQUENTIAL MODELS - 42

PixelVAE - Generations

Varying pixel-level noise

Varying bottom latents

Varying top latents

Page 41: Lecture 11: Deep Sequential Models - GitHub Pages › lectures › apr2019 › lecture11-deep...Lecture 11: Deep Sequential Models Efstratios Gavves UVA DEEP LEARNING COURSE –EFSTRATIOS

UVA DEEP LEARNING COURSE – EFSTRATIOS GAVVES DEEP SEQUENTIAL MODELS - 43

WaveNet

WaveNet: A Generative Model for Raw Audio, van den Oord et al., arXiv 2017

Page 42: Lecture 11: Deep Sequential Models - GitHub Pages › lectures › apr2019 › lecture11-deep...Lecture 11: Deep Sequential Models Efstratios Gavves UVA DEEP LEARNING COURSE –EFSTRATIOS

UVA DEEP LEARNING COURSE – EFSTRATIOS GAVVES DEEP SEQUENTIAL MODELS - 44

o Inspired by PixelRNN and PixelCNN

oFully convolutional neural network

oUse dilated convolutions

oSamples

WaveNet

Page 43: Lecture 11: Deep Sequential Models - GitHub Pages › lectures › apr2019 › lecture11-deep...Lecture 11: Deep Sequential Models Efstratios Gavves UVA DEEP LEARNING COURSE –EFSTRATIOS

UVA DEEP LEARNING COURSE – EFSTRATIOS GAVVES DEEP SEQUENTIAL MODELS - 45

WaveNet architecture

Page 44: Lecture 11: Deep Sequential Models - GitHub Pages › lectures › apr2019 › lecture11-deep...Lecture 11: Deep Sequential Models Efstratios Gavves UVA DEEP LEARNING COURSE –EFSTRATIOS

UVA DEEP LEARNING COURSE – EFSTRATIOS GAVVES DEEP SEQUENTIAL MODELS - 46

oVideo Time: Properties, Encoders and Evaluation◦A. Ghodrati, E. Gavves, C. Snoek, BMVC 2018

oHow to model image sequences optimally?

VideoTime: Time-Aligned DenseNets

Video Time: Properties, Encoders and Evaluation, Ghodrati, Gavves, Snoek, BMVC 2018

Page 45: Lecture 11: Deep Sequential Models - GitHub Pages › lectures › apr2019 › lecture11-deep...Lecture 11: Deep Sequential Models Efstratios Gavves UVA DEEP LEARNING COURSE –EFSTRATIOS

UVA DEEP LEARNING COURSE – EFSTRATIOS GAVVES DEEP SEQUENTIAL MODELS - 47

A standard VAE with a PixelCNN generator/decoder

Properties of video

Forward Backward

Page 46: Lecture 11: Deep Sequential Models - GitHub Pages › lectures › apr2019 › lecture11-deep...Lecture 11: Deep Sequential Models Efstratios Gavves UVA DEEP LEARNING COURSE –EFSTRATIOS

UVA DEEP LEARNING COURSE – EFSTRATIOS GAVVES DEEP SEQUENTIAL MODELS - 48

oTemporal asymmetry

oTemporal continuity

oTemporal causality

Video properties Natural order (+)

Reverse order (-)

“Pretending to put something into something”

Page 47: Lecture 11: Deep Sequential Models - GitHub Pages › lectures › apr2019 › lecture11-deep...Lecture 11: Deep Sequential Models Efstratios Gavves UVA DEEP LEARNING COURSE –EFSTRATIOS

UVA DEEP LEARNING COURSE – EFSTRATIOS GAVVES DEEP SEQUENTIAL MODELS - 49

Current solutions

LSTM

LSTM

LSTM

LSTM

CNN3D CNN

3D CNN

3D CNNCNNCNNCNN

3D convolutions learn spatiotemporal

patterns within a video

LSTMs learn transitions between

subsequent states

Page 48: Lecture 11: Deep Sequential Models - GitHub Pages › lectures › apr2019 › lecture11-deep...Lecture 11: Deep Sequential Models Efstratios Gavves UVA DEEP LEARNING COURSE –EFSTRATIOS

UVA DEEP LEARNING COURSE – EFSTRATIOS GAVVES DEEP SEQUENTIAL MODELS - 50

oAutoregressive-like: each new frame directly depends on all previous frames◦They are not generative

oNo parameter sharing like ConvNets, unlike RNNs

oSequential like RNNs, unlike ConvNets

Time-Aligned DenseNets

Laye

r 1

Laye

r 2

Laye

r 3

Laye

r 4

CNNCNNCNNCNN

FC-R

eLU

-Dro

po

ut

FC

Page 49: Lecture 11: Deep Sequential Models - GitHub Pages › lectures › apr2019 › lecture11-deep...Lecture 11: Deep Sequential Models Efstratios Gavves UVA DEEP LEARNING COURSE –EFSTRATIOS

UVA DEEP LEARNING COURSE – EFSTRATIOS GAVVES DEEP SEQUENTIAL MODELS - 51

Better latent space

Page 50: Lecture 11: Deep Sequential Models - GitHub Pages › lectures › apr2019 › lecture11-deep...Lecture 11: Deep Sequential Models Efstratios Gavves UVA DEEP LEARNING COURSE –EFSTRATIOS

UVA DEEP LEARNING COURSEEFSTRATIOS GAVVES

DEEP SEQUENTIAL MODELS - 52

Summary

oAutoregressive Models

oPixelCNN, PixelCNN++, PixelRNN

oWaveNet

oTime-Aligned DenseNets