fei-fei li & andrej karpathy & justin johnson...
TRANSCRIPT
![Page 1: Fei-Fei Li & Andrej Karpathy & Justin Johnson …cs231n.stanford.edu/slides/2016/winter1516_lecture14.pdfFei-Fei Li & Andrej Karpathy & Justin Johnson Lecture 14 - 7 29 Feb 2016 Dense](https://reader030.vdocuments.net/reader030/viewer/2022040103/5e95274febc0745c0f695241/html5/thumbnails/1.jpg)
Lecture 14 - Fei-Fei Li & Andrej Karpathy & Justin Johnson 29 Feb 2016Fei-Fei Li & Andrej Karpathy & Justin Johnson Lecture 14 - 29 Feb 20161
![Page 2: Fei-Fei Li & Andrej Karpathy & Justin Johnson …cs231n.stanford.edu/slides/2016/winter1516_lecture14.pdfFei-Fei Li & Andrej Karpathy & Justin Johnson Lecture 14 - 7 29 Feb 2016 Dense](https://reader030.vdocuments.net/reader030/viewer/2022040103/5e95274febc0745c0f695241/html5/thumbnails/2.jpg)
Lecture 14 - Fei-Fei Li & Andrej Karpathy & Justin Johnson 29 Feb 2016
Administrative● Everyone should be done with Assignment 3 now● Milestone grades will go out soon
2
![Page 3: Fei-Fei Li & Andrej Karpathy & Justin Johnson …cs231n.stanford.edu/slides/2016/winter1516_lecture14.pdfFei-Fei Li & Andrej Karpathy & Justin Johnson Lecture 14 - 7 29 Feb 2016 Dense](https://reader030.vdocuments.net/reader030/viewer/2022040103/5e95274febc0745c0f695241/html5/thumbnails/3.jpg)
Lecture 14 - Fei-Fei Li & Andrej Karpathy & Justin Johnson 29 Feb 2016
Last class
3
SegmentationSpatial Transformer
Soft Attention
![Page 4: Fei-Fei Li & Andrej Karpathy & Justin Johnson …cs231n.stanford.edu/slides/2016/winter1516_lecture14.pdfFei-Fei Li & Andrej Karpathy & Justin Johnson Lecture 14 - 7 29 Feb 2016 Dense](https://reader030.vdocuments.net/reader030/viewer/2022040103/5e95274febc0745c0f695241/html5/thumbnails/4.jpg)
Lecture 14 - Fei-Fei Li & Andrej Karpathy & Justin Johnson 29 Feb 2016
Videos
4
![Page 5: Fei-Fei Li & Andrej Karpathy & Justin Johnson …cs231n.stanford.edu/slides/2016/winter1516_lecture14.pdfFei-Fei Li & Andrej Karpathy & Justin Johnson Lecture 14 - 7 29 Feb 2016 Dense](https://reader030.vdocuments.net/reader030/viewer/2022040103/5e95274febc0745c0f695241/html5/thumbnails/5.jpg)
Lecture 14 - Fei-Fei Li & Andrej Karpathy & Justin Johnson 29 Feb 2016
ConvNets for images
5
![Page 6: Fei-Fei Li & Andrej Karpathy & Justin Johnson …cs231n.stanford.edu/slides/2016/winter1516_lecture14.pdfFei-Fei Li & Andrej Karpathy & Justin Johnson Lecture 14 - 7 29 Feb 2016 Dense](https://reader030.vdocuments.net/reader030/viewer/2022040103/5e95274febc0745c0f695241/html5/thumbnails/6.jpg)
Lecture 14 - Fei-Fei Li & Andrej Karpathy & Justin Johnson 29 Feb 2016
Feature-based approaches to Activity Recognition
6
Dense trajectories and motion boundary descriptors for action recognitionWang et al., 2013
Action Recognition with Improved TrajectoriesWang and Schmid, 2013
(code available!)
![Page 7: Fei-Fei Li & Andrej Karpathy & Justin Johnson …cs231n.stanford.edu/slides/2016/winter1516_lecture14.pdfFei-Fei Li & Andrej Karpathy & Justin Johnson Lecture 14 - 7 29 Feb 2016 Dense](https://reader030.vdocuments.net/reader030/viewer/2022040103/5e95274febc0745c0f695241/html5/thumbnails/7.jpg)
Lecture 14 - Fei-Fei Li & Andrej Karpathy & Justin Johnson 29 Feb 20167
Dense trajectories and motion boundary descriptors for action recognitionWang et al., 2013
detect feature points track features with optical flow
extract HOG/HOF/MBH features in the (stabilized) coordinate system of each tracklet
![Page 8: Fei-Fei Li & Andrej Karpathy & Justin Johnson …cs231n.stanford.edu/slides/2016/winter1516_lecture14.pdfFei-Fei Li & Andrej Karpathy & Justin Johnson Lecture 14 - 7 29 Feb 2016 Dense](https://reader030.vdocuments.net/reader030/viewer/2022040103/5e95274febc0745c0f695241/html5/thumbnails/8.jpg)
Lecture 14 - Fei-Fei Li & Andrej Karpathy & Justin Johnson 29 Feb 20168
[J. Shi and C. Tomasi, “Good features to track,” CVPR 1994][Ivan Laptev 2005]
detected feature points
Dense trajectories and motion boundary descriptors for action recognitionWang et al., 2013
![Page 9: Fei-Fei Li & Andrej Karpathy & Justin Johnson …cs231n.stanford.edu/slides/2016/winter1516_lecture14.pdfFei-Fei Li & Andrej Karpathy & Justin Johnson Lecture 14 - 7 29 Feb 2016 Dense](https://reader030.vdocuments.net/reader030/viewer/2022040103/5e95274febc0745c0f695241/html5/thumbnails/9.jpg)
Lecture 14 - Fei-Fei Li & Andrej Karpathy & Justin Johnson 29 Feb 20169
Dense trajectories and motion boundary descriptors for action recognitionWang et al., 2013
track each keypoint using optical flow.
[G. Farnebäck, “Two-frame motion estimation based on polynomial expansion,” 2003][T. Brox and J. Malik, “Large displacement optical flow: Descriptor matching in variational motion estimation,” 2011]
![Page 10: Fei-Fei Li & Andrej Karpathy & Justin Johnson …cs231n.stanford.edu/slides/2016/winter1516_lecture14.pdfFei-Fei Li & Andrej Karpathy & Justin Johnson Lecture 14 - 7 29 Feb 2016 Dense](https://reader030.vdocuments.net/reader030/viewer/2022040103/5e95274febc0745c0f695241/html5/thumbnails/10.jpg)
Lecture 14 - Fei-Fei Li & Andrej Karpathy & Justin Johnson 29 Feb 201610
Dense trajectories and motion boundary descriptors for action recognitionWang et al., 2013
Extract features in the local coordinate system of each tracklet.
Accumulate into histograms, separately according to multiple spatio-temporal layouts.
![Page 11: Fei-Fei Li & Andrej Karpathy & Justin Johnson …cs231n.stanford.edu/slides/2016/winter1516_lecture14.pdfFei-Fei Li & Andrej Karpathy & Justin Johnson Lecture 14 - 7 29 Feb 2016 Dense](https://reader030.vdocuments.net/reader030/viewer/2022040103/5e95274febc0745c0f695241/html5/thumbnails/11.jpg)
Lecture 14 - Fei-Fei Li & Andrej Karpathy & Justin Johnson 29 Feb 201611
Case Study: AlexNet[Krizhevsky et al. 2012]
Input: 227x227x3 images
First layer (CONV1): 96 11x11 filters applied at stride 4=>Output volume [55x55x96]Q: What if the input is now a small chunk of video? E.g. [227x227x3x15] ?
![Page 12: Fei-Fei Li & Andrej Karpathy & Justin Johnson …cs231n.stanford.edu/slides/2016/winter1516_lecture14.pdfFei-Fei Li & Andrej Karpathy & Justin Johnson Lecture 14 - 7 29 Feb 2016 Dense](https://reader030.vdocuments.net/reader030/viewer/2022040103/5e95274febc0745c0f695241/html5/thumbnails/12.jpg)
Lecture 14 - Fei-Fei Li & Andrej Karpathy & Justin Johnson 29 Feb 201612
Case Study: AlexNet[Krizhevsky et al. 2012]
Input: 227x227x3 images
First layer (CONV1): 96 11x11 filters applied at stride 4=>Output volume [55x55x96]Q: What if the input is now a small chunk of video? E.g. [227x227x3x15] ?A: Extend the convolutional filters in time, perform spatio-temporal convolutions!E.g. can have 11x11xT filters, where T = 2..15.
![Page 13: Fei-Fei Li & Andrej Karpathy & Justin Johnson …cs231n.stanford.edu/slides/2016/winter1516_lecture14.pdfFei-Fei Li & Andrej Karpathy & Justin Johnson Lecture 14 - 7 29 Feb 2016 Dense](https://reader030.vdocuments.net/reader030/viewer/2022040103/5e95274febc0745c0f695241/html5/thumbnails/13.jpg)
Lecture 14 - Fei-Fei Li & Andrej Karpathy & Justin Johnson 29 Feb 201613
Spatio-Temporal ConvNets
[3D Convolutional Neural Networks for Human Action Recognition, Ji et al., 2010]
![Page 14: Fei-Fei Li & Andrej Karpathy & Justin Johnson …cs231n.stanford.edu/slides/2016/winter1516_lecture14.pdfFei-Fei Li & Andrej Karpathy & Justin Johnson Lecture 14 - 7 29 Feb 2016 Dense](https://reader030.vdocuments.net/reader030/viewer/2022040103/5e95274febc0745c0f695241/html5/thumbnails/14.jpg)
Lecture 14 - Fei-Fei Li & Andrej Karpathy & Justin Johnson 29 Feb 2016
Spatio-Temporal ConvNets
14
Sequential Deep Learning for Human Action Recognition, Baccouche et al., 2011
![Page 15: Fei-Fei Li & Andrej Karpathy & Justin Johnson …cs231n.stanford.edu/slides/2016/winter1516_lecture14.pdfFei-Fei Li & Andrej Karpathy & Justin Johnson Lecture 14 - 7 29 Feb 2016 Dense](https://reader030.vdocuments.net/reader030/viewer/2022040103/5e95274febc0745c0f695241/html5/thumbnails/15.jpg)
Lecture 14 - Fei-Fei Li & Andrej Karpathy & Justin Johnson 29 Feb 201615
Spatio-Temporal ConvNets
[Large-scale Video Classification with Convolutional Neural Networks, Karpathy et al., 2014]
spatio-temporal convolutions;worked best.
![Page 16: Fei-Fei Li & Andrej Karpathy & Justin Johnson …cs231n.stanford.edu/slides/2016/winter1516_lecture14.pdfFei-Fei Li & Andrej Karpathy & Justin Johnson Lecture 14 - 7 29 Feb 2016 Dense](https://reader030.vdocuments.net/reader030/viewer/2022040103/5e95274febc0745c0f695241/html5/thumbnails/16.jpg)
Lecture 14 - Fei-Fei Li & Andrej Karpathy & Justin Johnson 29 Feb 201616
Spatio-Temporal ConvNets
[Large-scale Video Classification with Convolutional Neural Networks, Karpathy et al., 2014]
Learned filters on the first layer
![Page 17: Fei-Fei Li & Andrej Karpathy & Justin Johnson …cs231n.stanford.edu/slides/2016/winter1516_lecture14.pdfFei-Fei Li & Andrej Karpathy & Justin Johnson Lecture 14 - 7 29 Feb 2016 Dense](https://reader030.vdocuments.net/reader030/viewer/2022040103/5e95274febc0745c0f695241/html5/thumbnails/17.jpg)
Lecture 14 - Fei-Fei Li & Andrej Karpathy & Justin Johnson 29 Feb 201617
Spatio-Temporal ConvNets
[Large-scale Video Classification with Convolutional Neural Networks, Karpathy et al., 2014]
1 million videos487 sports classes
![Page 18: Fei-Fei Li & Andrej Karpathy & Justin Johnson …cs231n.stanford.edu/slides/2016/winter1516_lecture14.pdfFei-Fei Li & Andrej Karpathy & Justin Johnson Lecture 14 - 7 29 Feb 2016 Dense](https://reader030.vdocuments.net/reader030/viewer/2022040103/5e95274febc0745c0f695241/html5/thumbnails/18.jpg)
Lecture 14 - Fei-Fei Li & Andrej Karpathy & Justin Johnson 29 Feb 201618
Spatio-Temporal ConvNets
[Large-scale Video Classification with Convolutional Neural Networks, Karpathy et al., 2014]
The motion information didn’t add all that much...
![Page 19: Fei-Fei Li & Andrej Karpathy & Justin Johnson …cs231n.stanford.edu/slides/2016/winter1516_lecture14.pdfFei-Fei Li & Andrej Karpathy & Justin Johnson Lecture 14 - 7 29 Feb 2016 Dense](https://reader030.vdocuments.net/reader030/viewer/2022040103/5e95274febc0745c0f695241/html5/thumbnails/19.jpg)
Lecture 14 - Fei-Fei Li & Andrej Karpathy & Justin Johnson 29 Feb 201619
Spatio-Temporal ConvNets
![Page 20: Fei-Fei Li & Andrej Karpathy & Justin Johnson …cs231n.stanford.edu/slides/2016/winter1516_lecture14.pdfFei-Fei Li & Andrej Karpathy & Justin Johnson Lecture 14 - 7 29 Feb 2016 Dense](https://reader030.vdocuments.net/reader030/viewer/2022040103/5e95274febc0745c0f695241/html5/thumbnails/20.jpg)
Lecture 14 - Fei-Fei Li & Andrej Karpathy & Justin Johnson 29 Feb 201620
Spatio-Temporal ConvNets
[Learning Spatiotemporal Features with 3D Convolutional Networks, Tran et al. 2015]
3D VGGNet, basically.
![Page 21: Fei-Fei Li & Andrej Karpathy & Justin Johnson …cs231n.stanford.edu/slides/2016/winter1516_lecture14.pdfFei-Fei Li & Andrej Karpathy & Justin Johnson Lecture 14 - 7 29 Feb 2016 Dense](https://reader030.vdocuments.net/reader030/viewer/2022040103/5e95274febc0745c0f695241/html5/thumbnails/21.jpg)
Lecture 14 - Fei-Fei Li & Andrej Karpathy & Justin Johnson 29 Feb 201621
Spatio-Temporal ConvNets
[Two-Stream Convolutional Networks for Action Recognition in Videos, Simonyan and Zisserman 2014]
[T. Brox and J. Malik, “Large displacement optical flow: Descriptor matching in variational motion estimation,” 2011]
(of VGGNet fame)
![Page 22: Fei-Fei Li & Andrej Karpathy & Justin Johnson …cs231n.stanford.edu/slides/2016/winter1516_lecture14.pdfFei-Fei Li & Andrej Karpathy & Justin Johnson Lecture 14 - 7 29 Feb 2016 Dense](https://reader030.vdocuments.net/reader030/viewer/2022040103/5e95274febc0745c0f695241/html5/thumbnails/22.jpg)
Lecture 14 - Fei-Fei Li & Andrej Karpathy & Justin Johnson 29 Feb 201622
Spatio-Temporal ConvNets
[Two-Stream Convolutional Networks for Action Recognition in Videos, Simonyan and Zisserman 2014]
[T. Brox and J. Malik, “Large displacement optical flow: Descriptor matching in variational motion estimation,” 2011]
Two-stream version works much better than either alone.
![Page 23: Fei-Fei Li & Andrej Karpathy & Justin Johnson …cs231n.stanford.edu/slides/2016/winter1516_lecture14.pdfFei-Fei Li & Andrej Karpathy & Justin Johnson Lecture 14 - 7 29 Feb 2016 Dense](https://reader030.vdocuments.net/reader030/viewer/2022040103/5e95274febc0745c0f695241/html5/thumbnails/23.jpg)
Lecture 14 - Fei-Fei Li & Andrej Karpathy & Justin Johnson 29 Feb 201623
Long-time Spatio-Temporal ConvNetsAll 3D ConvNets so far used local motion cues to get extra accuracy (e.g. half a second or so)Q: what if the temporal dependencies of interest are much much longer? E.g. several seconds?
event 1 event 2
![Page 24: Fei-Fei Li & Andrej Karpathy & Justin Johnson …cs231n.stanford.edu/slides/2016/winter1516_lecture14.pdfFei-Fei Li & Andrej Karpathy & Justin Johnson Lecture 14 - 7 29 Feb 2016 Dense](https://reader030.vdocuments.net/reader030/viewer/2022040103/5e95274febc0745c0f695241/html5/thumbnails/24.jpg)
Lecture 14 - Fei-Fei Li & Andrej Karpathy & Justin Johnson 29 Feb 201624
Sequential Deep Learning for Human Action Recognition, Baccouche et al., 2011
Long-time Spatio-Temporal ConvNets
(This paper was way ahead of its time. Cited 65 times.)
![Page 25: Fei-Fei Li & Andrej Karpathy & Justin Johnson …cs231n.stanford.edu/slides/2016/winter1516_lecture14.pdfFei-Fei Li & Andrej Karpathy & Justin Johnson Lecture 14 - 7 29 Feb 2016 Dense](https://reader030.vdocuments.net/reader030/viewer/2022040103/5e95274febc0745c0f695241/html5/thumbnails/25.jpg)
Lecture 14 - Fei-Fei Li & Andrej Karpathy & Justin Johnson 29 Feb 201625
Sequential Deep Learning for Human Action Recognition, Baccouche et al., 2011
Long-time Spatio-Temporal ConvNetsLSTM way before it was cool
(This paper was way ahead of its time. Cited 65 times.)
![Page 26: Fei-Fei Li & Andrej Karpathy & Justin Johnson …cs231n.stanford.edu/slides/2016/winter1516_lecture14.pdfFei-Fei Li & Andrej Karpathy & Justin Johnson Lecture 14 - 7 29 Feb 2016 Dense](https://reader030.vdocuments.net/reader030/viewer/2022040103/5e95274febc0745c0f695241/html5/thumbnails/26.jpg)
Lecture 14 - Fei-Fei Li & Andrej Karpathy & Justin Johnson 29 Feb 201626
Long-time Spatio-Temporal ConvNets
[Long-term Recurrent Convolutional Networks for Visual Recognition and Description, Donahue et al., 2015]
![Page 27: Fei-Fei Li & Andrej Karpathy & Justin Johnson …cs231n.stanford.edu/slides/2016/winter1516_lecture14.pdfFei-Fei Li & Andrej Karpathy & Justin Johnson Lecture 14 - 7 29 Feb 2016 Dense](https://reader030.vdocuments.net/reader030/viewer/2022040103/5e95274febc0745c0f695241/html5/thumbnails/27.jpg)
Lecture 14 - Fei-Fei Li & Andrej Karpathy & Justin Johnson 29 Feb 201627
Long-time Spatio-Temporal ConvNets
[Beyond Short Snippets: Deep Networks for Video Classification, Ng et al., 2015]
![Page 28: Fei-Fei Li & Andrej Karpathy & Justin Johnson …cs231n.stanford.edu/slides/2016/winter1516_lecture14.pdfFei-Fei Li & Andrej Karpathy & Justin Johnson Lecture 14 - 7 29 Feb 2016 Dense](https://reader030.vdocuments.net/reader030/viewer/2022040103/5e95274febc0745c0f695241/html5/thumbnails/28.jpg)
Lecture 14 - Fei-Fei Li & Andrej Karpathy & Justin Johnson 29 Feb 2016
Summary so farWe looked at two types of architectural patterns:
1. Model temporal motion locally (3D CONV)
2. Model temporal motion globally (LSTM / RNN)
+ Fusions of both approaches at the same time.
28
![Page 29: Fei-Fei Li & Andrej Karpathy & Justin Johnson …cs231n.stanford.edu/slides/2016/winter1516_lecture14.pdfFei-Fei Li & Andrej Karpathy & Justin Johnson Lecture 14 - 7 29 Feb 2016 Dense](https://reader030.vdocuments.net/reader030/viewer/2022040103/5e95274febc0745c0f695241/html5/thumbnails/29.jpg)
Lecture 14 - Fei-Fei Li & Andrej Karpathy & Justin Johnson 29 Feb 2016
Summary so farWe looked at two types of architectural patterns:
1. Model temporal motion locally (3D CONV)
2. Model temporal motion globally (LSTM / RNN)
+ Fusions of both approaches at the same time.
29
There is another (cleaner) way!
![Page 30: Fei-Fei Li & Andrej Karpathy & Justin Johnson …cs231n.stanford.edu/slides/2016/winter1516_lecture14.pdfFei-Fei Li & Andrej Karpathy & Justin Johnson Lecture 14 - 7 29 Feb 2016 Dense](https://reader030.vdocuments.net/reader030/viewer/2022040103/5e95274febc0745c0f695241/html5/thumbnails/30.jpg)
Lecture 14 - Fei-Fei Li & Andrej Karpathy & Justin Johnson 29 Feb 201630
3DCONVNET
video
RNN
Finite temporal extent(neurons that are onlya function of finitely manyvideo frames in the past)
Infinite (in theory) temporal extent(neurons that are functionof all video frames in the past)
![Page 31: Fei-Fei Li & Andrej Karpathy & Justin Johnson …cs231n.stanford.edu/slides/2016/winter1516_lecture14.pdfFei-Fei Li & Andrej Karpathy & Justin Johnson Lecture 14 - 7 29 Feb 2016 Dense](https://reader030.vdocuments.net/reader030/viewer/2022040103/5e95274febc0745c0f695241/html5/thumbnails/31.jpg)
Lecture 14 - Fei-Fei Li & Andrej Karpathy & Justin Johnson 29 Feb 201631
Long-time Spatio-Temporal ConvNets
[Delving Deeper into Convolutional Networks for Learning Video Representations, Ballas et al., 2016]
Beautiful:All neurons in the ConvNet are recurrent.
Only requires (existing) 2D CONV routines. No need for 3D spatio-temporal CONV.
![Page 32: Fei-Fei Li & Andrej Karpathy & Justin Johnson …cs231n.stanford.edu/slides/2016/winter1516_lecture14.pdfFei-Fei Li & Andrej Karpathy & Justin Johnson Lecture 14 - 7 29 Feb 2016 Dense](https://reader030.vdocuments.net/reader030/viewer/2022040103/5e95274febc0745c0f695241/html5/thumbnails/32.jpg)
Lecture 14 - Fei-Fei Li & Andrej Karpathy & Justin Johnson 29 Feb 201632
Long-time Spatio-Temporal ConvNets
[Delving Deeper into Convolutional Networks for Learning Video Representations, Ballas et al., 2016]
Convolution Layer
Normal ConvNet:
![Page 33: Fei-Fei Li & Andrej Karpathy & Justin Johnson …cs231n.stanford.edu/slides/2016/winter1516_lecture14.pdfFei-Fei Li & Andrej Karpathy & Justin Johnson Lecture 14 - 7 29 Feb 2016 Dense](https://reader030.vdocuments.net/reader030/viewer/2022040103/5e95274febc0745c0f695241/html5/thumbnails/33.jpg)
Lecture 14 - Fei-Fei Li & Andrej Karpathy & Justin Johnson 29 Feb 201633
Long-time Spatio-Temporal ConvNets
[Delving Deeper into Convolutional Networks for Learning Video Representations, Ballas et al., 2016]
CONV
CONV
layer N
layer N+1
layer N+1at previoustimestep
RNN-like recurrence (GRU)
![Page 34: Fei-Fei Li & Andrej Karpathy & Justin Johnson …cs231n.stanford.edu/slides/2016/winter1516_lecture14.pdfFei-Fei Li & Andrej Karpathy & Justin Johnson Lecture 14 - 7 29 Feb 2016 Dense](https://reader030.vdocuments.net/reader030/viewer/2022040103/5e95274febc0745c0f695241/html5/thumbnails/34.jpg)
Lecture 14 - Fei-Fei Li & Andrej Karpathy & Justin Johnson 29 Feb 201634
Long-time Spatio-Temporal ConvNets
[Delving Deeper into Convolutional Networks for Learning Video Representations, Ballas et al., 2016]
Recall: RNNs Vanilla RNN
LSTMGRU
![Page 35: Fei-Fei Li & Andrej Karpathy & Justin Johnson …cs231n.stanford.edu/slides/2016/winter1516_lecture14.pdfFei-Fei Li & Andrej Karpathy & Justin Johnson Lecture 14 - 7 29 Feb 2016 Dense](https://reader030.vdocuments.net/reader030/viewer/2022040103/5e95274febc0745c0f695241/html5/thumbnails/35.jpg)
Lecture 14 - Fei-Fei Li & Andrej Karpathy & Justin Johnson 29 Feb 201635
Long-time Spatio-Temporal ConvNets
[Delving Deeper into Convolutional Networks for Learning Video Representations, Ballas et al., 2016]
Recall: RNNs
GRU
Matrix multiply=>CONV
![Page 36: Fei-Fei Li & Andrej Karpathy & Justin Johnson …cs231n.stanford.edu/slides/2016/winter1516_lecture14.pdfFei-Fei Li & Andrej Karpathy & Justin Johnson Lecture 14 - 7 29 Feb 2016 Dense](https://reader030.vdocuments.net/reader030/viewer/2022040103/5e95274febc0745c0f695241/html5/thumbnails/36.jpg)
Lecture 14 - Fei-Fei Li & Andrej Karpathy & Justin Johnson 29 Feb 201636
3DCONVNET
video
RNN
Finite temporal extent(neurons that are onlya function of finitely manyvideo frames in the past)
Infinite (in theory) temporal extent(neurons that are functionof all video frames in the past)
![Page 37: Fei-Fei Li & Andrej Karpathy & Justin Johnson …cs231n.stanford.edu/slides/2016/winter1516_lecture14.pdfFei-Fei Li & Andrej Karpathy & Justin Johnson Lecture 14 - 7 29 Feb 2016 Dense](https://reader030.vdocuments.net/reader030/viewer/2022040103/5e95274febc0745c0f695241/html5/thumbnails/37.jpg)
Lecture 14 - Fei-Fei Li & Andrej Karpathy & Justin Johnson 29 Feb 201637
RNNCONVNET
video
Infinite (in theory) temporal extent(neurons that are functionof all video frames in the past)
i.e. we obtain:
![Page 38: Fei-Fei Li & Andrej Karpathy & Justin Johnson …cs231n.stanford.edu/slides/2016/winter1516_lecture14.pdfFei-Fei Li & Andrej Karpathy & Justin Johnson Lecture 14 - 7 29 Feb 2016 Dense](https://reader030.vdocuments.net/reader030/viewer/2022040103/5e95274febc0745c0f695241/html5/thumbnails/38.jpg)
Lecture 14 - Fei-Fei Li & Andrej Karpathy & Justin Johnson 29 Feb 2016
Summary- You think you need a Spatio-Temporal Fancy Video
ConvNet- STOP. Do you really?- Okay fine: do you want to model:
- local motion? (use 3D CONV), or - global motion? (use LSTM).
- Try out using Optical Flow in a second stream (can work better sometimes)
- Try out GRU-RCN! (imo best model)
38
![Page 39: Fei-Fei Li & Andrej Karpathy & Justin Johnson …cs231n.stanford.edu/slides/2016/winter1516_lecture14.pdfFei-Fei Li & Andrej Karpathy & Justin Johnson Lecture 14 - 7 29 Feb 2016 Dense](https://reader030.vdocuments.net/reader030/viewer/2022040103/5e95274febc0745c0f695241/html5/thumbnails/39.jpg)
Lecture 14 - Fei-Fei Li & Andrej Karpathy & Justin Johnson 29 Feb 2016
Unsupervised Learning
39
![Page 40: Fei-Fei Li & Andrej Karpathy & Justin Johnson …cs231n.stanford.edu/slides/2016/winter1516_lecture14.pdfFei-Fei Li & Andrej Karpathy & Justin Johnson Lecture 14 - 7 29 Feb 2016 Dense](https://reader030.vdocuments.net/reader030/viewer/2022040103/5e95274febc0745c0f695241/html5/thumbnails/40.jpg)
Lecture 14 - Fei-Fei Li & Andrej Karpathy & Justin Johnson 29 Feb 2016
Unsupervised Learning Overview
● Definitions● Autoencoders
○ Vanilla○ Variational
● Adversarial Networks
40
![Page 41: Fei-Fei Li & Andrej Karpathy & Justin Johnson …cs231n.stanford.edu/slides/2016/winter1516_lecture14.pdfFei-Fei Li & Andrej Karpathy & Justin Johnson Lecture 14 - 7 29 Feb 2016 Dense](https://reader030.vdocuments.net/reader030/viewer/2022040103/5e95274febc0745c0f695241/html5/thumbnails/41.jpg)
Lecture 14 - Fei-Fei Li & Andrej Karpathy & Justin Johnson 29 Feb 2016
Supervised vs Unsupervised
41
Supervised Learning
Data: (x, y)x is data, y is label
Goal: Learn a function to map x -> y
Examples: Classification, regression, object detection, semantic segmentation, image captioning, etc
![Page 42: Fei-Fei Li & Andrej Karpathy & Justin Johnson …cs231n.stanford.edu/slides/2016/winter1516_lecture14.pdfFei-Fei Li & Andrej Karpathy & Justin Johnson Lecture 14 - 7 29 Feb 2016 Dense](https://reader030.vdocuments.net/reader030/viewer/2022040103/5e95274febc0745c0f695241/html5/thumbnails/42.jpg)
Lecture 14 - Fei-Fei Li & Andrej Karpathy & Justin Johnson 29 Feb 2016
Supervised vs Unsupervised
42
Supervised Learning
Data: (x, y)x is data, y is label
Goal: Learn a function to map x -> y
Examples: Classification, regression, object detection, semantic segmentation, image captioning, etc
Unsupervised Learning
Data: xJust data, no labels!
Goal: Learn some structure of the data
Examples: Clustering, dimensionality reduction, feature learning, generative models, etc.
![Page 43: Fei-Fei Li & Andrej Karpathy & Justin Johnson …cs231n.stanford.edu/slides/2016/winter1516_lecture14.pdfFei-Fei Li & Andrej Karpathy & Justin Johnson Lecture 14 - 7 29 Feb 2016 Dense](https://reader030.vdocuments.net/reader030/viewer/2022040103/5e95274febc0745c0f695241/html5/thumbnails/43.jpg)
Lecture 14 - Fei-Fei Li & Andrej Karpathy & Justin Johnson 29 Feb 2016
Unsupervised Learning
43
● Autoencoders○ Traditional: feature learning○ Variational: generate samples
● Generative Adversarial Networks: Generate samples
![Page 44: Fei-Fei Li & Andrej Karpathy & Justin Johnson …cs231n.stanford.edu/slides/2016/winter1516_lecture14.pdfFei-Fei Li & Andrej Karpathy & Justin Johnson Lecture 14 - 7 29 Feb 2016 Dense](https://reader030.vdocuments.net/reader030/viewer/2022040103/5e95274febc0745c0f695241/html5/thumbnails/44.jpg)
Lecture 14 - Fei-Fei Li & Andrej Karpathy & Justin Johnson 29 Feb 2016
Autoencoders
44
x
z
Encoder
Input data
Features
![Page 45: Fei-Fei Li & Andrej Karpathy & Justin Johnson …cs231n.stanford.edu/slides/2016/winter1516_lecture14.pdfFei-Fei Li & Andrej Karpathy & Justin Johnson Lecture 14 - 7 29 Feb 2016 Dense](https://reader030.vdocuments.net/reader030/viewer/2022040103/5e95274febc0745c0f695241/html5/thumbnails/45.jpg)
Lecture 14 - Fei-Fei Li & Andrej Karpathy & Justin Johnson 29 Feb 2016
Autoencoders
45
x
z
Encoder
Input data
Features
Originally: Linear + nonlinearity (sigmoid)Later: Deep, fully-connectedLater: ReLU CNN
![Page 46: Fei-Fei Li & Andrej Karpathy & Justin Johnson …cs231n.stanford.edu/slides/2016/winter1516_lecture14.pdfFei-Fei Li & Andrej Karpathy & Justin Johnson Lecture 14 - 7 29 Feb 2016 Dense](https://reader030.vdocuments.net/reader030/viewer/2022040103/5e95274febc0745c0f695241/html5/thumbnails/46.jpg)
Lecture 14 - Fei-Fei Li & Andrej Karpathy & Justin Johnson 29 Feb 2016
Autoencoders
46
x
z
Encoder
Input data
Features
Originally: Linear + nonlinearity (sigmoid)Later: Deep, fully-connectedLater: ReLU CNN
z usually smaller than x(dimensionality reduction)
![Page 47: Fei-Fei Li & Andrej Karpathy & Justin Johnson …cs231n.stanford.edu/slides/2016/winter1516_lecture14.pdfFei-Fei Li & Andrej Karpathy & Justin Johnson Lecture 14 - 7 29 Feb 2016 Dense](https://reader030.vdocuments.net/reader030/viewer/2022040103/5e95274febc0745c0f695241/html5/thumbnails/47.jpg)
Lecture 14 - Fei-Fei Li & Andrej Karpathy & Justin Johnson 29 Feb 2016
Autoencoders
47
x
z
xx
Encoder
Decoder
Input data
Features
Reconstructed input data
![Page 48: Fei-Fei Li & Andrej Karpathy & Justin Johnson …cs231n.stanford.edu/slides/2016/winter1516_lecture14.pdfFei-Fei Li & Andrej Karpathy & Justin Johnson Lecture 14 - 7 29 Feb 2016 Dense](https://reader030.vdocuments.net/reader030/viewer/2022040103/5e95274febc0745c0f695241/html5/thumbnails/48.jpg)
Lecture 14 - Fei-Fei Li & Andrej Karpathy & Justin Johnson 29 Feb 2016
Autoencoders
48
x
z
xx
Encoder
Decoder
Input data
Features
Reconstructed input data
Originally: Linear + nonlinearity (sigmoid)Later: Deep, fully-connectedLater: ReLU CNN (upconv)
Encoder: 4-layer convDecoder: 4-layer upconv
![Page 49: Fei-Fei Li & Andrej Karpathy & Justin Johnson …cs231n.stanford.edu/slides/2016/winter1516_lecture14.pdfFei-Fei Li & Andrej Karpathy & Justin Johnson Lecture 14 - 7 29 Feb 2016 Dense](https://reader030.vdocuments.net/reader030/viewer/2022040103/5e95274febc0745c0f695241/html5/thumbnails/49.jpg)
Lecture 14 - Fei-Fei Li & Andrej Karpathy & Justin Johnson 29 Feb 2016
Autoencoders
49
x
z
xx
Encoder
Decoder
Input data
Features
Reconstructed input data
Originally: Linear + nonlinearity (sigmoid)Later: Deep, fully-connectedLater: ReLU CNN (upconv)
Train for reconstruction with no labels!
Encoder / decoder sometimes share weights
Example:dim(x) = Ddim(z) = Hwe: H x Dwd: D x H = we
T
![Page 50: Fei-Fei Li & Andrej Karpathy & Justin Johnson …cs231n.stanford.edu/slides/2016/winter1516_lecture14.pdfFei-Fei Li & Andrej Karpathy & Justin Johnson Lecture 14 - 7 29 Feb 2016 Dense](https://reader030.vdocuments.net/reader030/viewer/2022040103/5e95274febc0745c0f695241/html5/thumbnails/50.jpg)
Lecture 14 - Fei-Fei Li & Andrej Karpathy & Justin Johnson 29 Feb 2016
Autoencoders
50
x
z
xx
Encoder
Decoder
Input data
Features
Reconstructed input data
Loss function (Often L2)
Train for reconstruction with no labels!
![Page 51: Fei-Fei Li & Andrej Karpathy & Justin Johnson …cs231n.stanford.edu/slides/2016/winter1516_lecture14.pdfFei-Fei Li & Andrej Karpathy & Justin Johnson Lecture 14 - 7 29 Feb 2016 Dense](https://reader030.vdocuments.net/reader030/viewer/2022040103/5e95274febc0745c0f695241/html5/thumbnails/51.jpg)
Lecture 14 - Fei-Fei Li & Andrej Karpathy & Justin Johnson 29 Feb 2016
Autoencoders
51
x
z
Encoder
Input data
Features
xx
Decoder
Reconstructed input data
After training,throw away decoder!
![Page 52: Fei-Fei Li & Andrej Karpathy & Justin Johnson …cs231n.stanford.edu/slides/2016/winter1516_lecture14.pdfFei-Fei Li & Andrej Karpathy & Justin Johnson Lecture 14 - 7 29 Feb 2016 Dense](https://reader030.vdocuments.net/reader030/viewer/2022040103/5e95274febc0745c0f695241/html5/thumbnails/52.jpg)
Lecture 14 - Fei-Fei Li & Andrej Karpathy & Justin Johnson 29 Feb 2016
Autoencoders
52
x
z
yy
Encoder
Classifier
Input data
Features
Predicted Label
Loss function (Softmax, etc)
yUse encoder to initialize a supervised model
planedog deer
birdtruck
Train for final task (sometimes with
small data)
Fine-tuneencoderjointly withclassifier
![Page 53: Fei-Fei Li & Andrej Karpathy & Justin Johnson …cs231n.stanford.edu/slides/2016/winter1516_lecture14.pdfFei-Fei Li & Andrej Karpathy & Justin Johnson Lecture 14 - 7 29 Feb 2016 Dense](https://reader030.vdocuments.net/reader030/viewer/2022040103/5e95274febc0745c0f695241/html5/thumbnails/53.jpg)
Lecture 14 - Fei-Fei Li & Andrej Karpathy & Justin Johnson 29 Feb 2016
Autoencoders: Greedy Training
53Hinton and Salakhutdinov, “Reducing the Dimensionality of Data with Neural Networks”, Science 2006
In mid 2000s layer-wise pretraining with Restricted Boltzmann Machines (RBM) was common
Training deep nets was hard in 2006!
![Page 54: Fei-Fei Li & Andrej Karpathy & Justin Johnson …cs231n.stanford.edu/slides/2016/winter1516_lecture14.pdfFei-Fei Li & Andrej Karpathy & Justin Johnson Lecture 14 - 7 29 Feb 2016 Dense](https://reader030.vdocuments.net/reader030/viewer/2022040103/5e95274febc0745c0f695241/html5/thumbnails/54.jpg)
Lecture 14 - Fei-Fei Li & Andrej Karpathy & Justin Johnson 29 Feb 2016
Autoencoders: Greedy Training
54Hinton and Salakhutdinov, “Reducing the Dimensionality of Data with Neural Networks”, Science 2006
In mid 2000s layer-wise pretraining with Restricted Boltzmann Machines (RBM) was common
Training deep nets was hard in 2006!
Not common anymore
With ReLU, proper initialization, batchnorm, Adam, etc easily train from scratch
![Page 55: Fei-Fei Li & Andrej Karpathy & Justin Johnson …cs231n.stanford.edu/slides/2016/winter1516_lecture14.pdfFei-Fei Li & Andrej Karpathy & Justin Johnson Lecture 14 - 7 29 Feb 2016 Dense](https://reader030.vdocuments.net/reader030/viewer/2022040103/5e95274febc0745c0f695241/html5/thumbnails/55.jpg)
Lecture 14 - Fei-Fei Li & Andrej Karpathy & Justin Johnson 29 Feb 2016
Autoencoders
55
x
z
Encoder
Input data
Features
xx
Decoder
Reconstructed input data
Autoencoders can reconstruct data, and can learn features to initialize a supervised model
Can we generate images from an autoencoder?
![Page 56: Fei-Fei Li & Andrej Karpathy & Justin Johnson …cs231n.stanford.edu/slides/2016/winter1516_lecture14.pdfFei-Fei Li & Andrej Karpathy & Justin Johnson Lecture 14 - 7 29 Feb 2016 Dense](https://reader030.vdocuments.net/reader030/viewer/2022040103/5e95274febc0745c0f695241/html5/thumbnails/56.jpg)
Lecture 14 - Fei-Fei Li & Andrej Karpathy & Justin Johnson 29 Feb 2016
Variational Autoencoder
56
A Bayesian spin on an autoencoder - lets us generate data!
Assume our data is generated like this:
z xSample from
true prior
Sample from true conditional
Kingma and Welling, “Auto-Encoding Variational Bayes”, ICLR 2014
![Page 57: Fei-Fei Li & Andrej Karpathy & Justin Johnson …cs231n.stanford.edu/slides/2016/winter1516_lecture14.pdfFei-Fei Li & Andrej Karpathy & Justin Johnson Lecture 14 - 7 29 Feb 2016 Dense](https://reader030.vdocuments.net/reader030/viewer/2022040103/5e95274febc0745c0f695241/html5/thumbnails/57.jpg)
Lecture 14 - Fei-Fei Li & Andrej Karpathy & Justin Johnson 29 Feb 2016
Variational Autoencoder
57
A Bayesian spin on an autoencoder!
Assume our data is generated like this:
z xSample from
true prior
Sample from true conditional
Intuition: x is an image, z gives class, orientation, attributes, etc
Kingma and Welling, “Auto-Encoding Variational Bayes”, ICLR 2014
![Page 58: Fei-Fei Li & Andrej Karpathy & Justin Johnson …cs231n.stanford.edu/slides/2016/winter1516_lecture14.pdfFei-Fei Li & Andrej Karpathy & Justin Johnson Lecture 14 - 7 29 Feb 2016 Dense](https://reader030.vdocuments.net/reader030/viewer/2022040103/5e95274febc0745c0f695241/html5/thumbnails/58.jpg)
Lecture 14 - Fei-Fei Li & Andrej Karpathy & Justin Johnson 29 Feb 2016
Variational Autoencoder
58
A Bayesian spin on an autoencoder!
Assume our data is generated like this:
z xSample from
true prior
Sample from true conditional
Problem: Estimate without access to
latent states !
Intuition: x is an image, z gives class, orientation, attributes, etc
Kingma and Welling, “Auto-Encoding Variational Bayes”, ICLR 2014
![Page 59: Fei-Fei Li & Andrej Karpathy & Justin Johnson …cs231n.stanford.edu/slides/2016/winter1516_lecture14.pdfFei-Fei Li & Andrej Karpathy & Justin Johnson Lecture 14 - 7 29 Feb 2016 Dense](https://reader030.vdocuments.net/reader030/viewer/2022040103/5e95274febc0745c0f695241/html5/thumbnails/59.jpg)
Lecture 14 - Fei-Fei Li & Andrej Karpathy & Justin Johnson 29 Feb 2016
Variational Autoencoder
59
Prior: Assumeis a unit Gaussian
Kingma and Welling, ICLR 2014
![Page 60: Fei-Fei Li & Andrej Karpathy & Justin Johnson …cs231n.stanford.edu/slides/2016/winter1516_lecture14.pdfFei-Fei Li & Andrej Karpathy & Justin Johnson Lecture 14 - 7 29 Feb 2016 Dense](https://reader030.vdocuments.net/reader030/viewer/2022040103/5e95274febc0745c0f695241/html5/thumbnails/60.jpg)
Lecture 14 - Fei-Fei Li & Andrej Karpathy & Justin Johnson 29 Feb 2016
Variational Autoencoder
60
Prior: Assumeis a unit Gaussian
Conditional: Assume is a diagonal Gaussian, predict mean and variance with neural net
Kingma and Welling, ICLR 2014
![Page 61: Fei-Fei Li & Andrej Karpathy & Justin Johnson …cs231n.stanford.edu/slides/2016/winter1516_lecture14.pdfFei-Fei Li & Andrej Karpathy & Justin Johnson Lecture 14 - 7 29 Feb 2016 Dense](https://reader030.vdocuments.net/reader030/viewer/2022040103/5e95274febc0745c0f695241/html5/thumbnails/61.jpg)
Lecture 14 - Fei-Fei Li & Andrej Karpathy & Justin Johnson 29 Feb 2016
Variational Autoencoder
61
z
x Σx
Mean and (diagonal) covariance of
Latent state
Decoder network with parameters
Prior: Assumeis a unit Gaussian
Conditional: Assume is a diagonal Gaussian, predict mean and variance with neural net
Kingma and Welling, ICLR 2014
![Page 62: Fei-Fei Li & Andrej Karpathy & Justin Johnson …cs231n.stanford.edu/slides/2016/winter1516_lecture14.pdfFei-Fei Li & Andrej Karpathy & Justin Johnson Lecture 14 - 7 29 Feb 2016 Dense](https://reader030.vdocuments.net/reader030/viewer/2022040103/5e95274febc0745c0f695241/html5/thumbnails/62.jpg)
Lecture 14 - Fei-Fei Li & Andrej Karpathy & Justin Johnson 29 Feb 2016
Variational Autoencoder
62
z
x Σx
Mean and (diagonal) covariance of
Latent state
Decoder network with parameters
Prior: Assumeis a unit Gaussian
Conditional: Assume is a diagonal Gaussian, predict mean and variance with neural net Fully-connected or
upconvolutionalKingma and Welling, ICLR 2014
![Page 63: Fei-Fei Li & Andrej Karpathy & Justin Johnson …cs231n.stanford.edu/slides/2016/winter1516_lecture14.pdfFei-Fei Li & Andrej Karpathy & Justin Johnson Lecture 14 - 7 29 Feb 2016 Dense](https://reader030.vdocuments.net/reader030/viewer/2022040103/5e95274febc0745c0f695241/html5/thumbnails/63.jpg)
Lecture 14 - Fei-Fei Li & Andrej Karpathy & Justin Johnson 29 Feb 2016
Variational Autoencoder: EncoderBy Bayes Rule the posterior is:
63
Kingma and Welling, ICLR 2014
![Page 64: Fei-Fei Li & Andrej Karpathy & Justin Johnson …cs231n.stanford.edu/slides/2016/winter1516_lecture14.pdfFei-Fei Li & Andrej Karpathy & Justin Johnson Lecture 14 - 7 29 Feb 2016 Dense](https://reader030.vdocuments.net/reader030/viewer/2022040103/5e95274febc0745c0f695241/html5/thumbnails/64.jpg)
Lecture 14 - Fei-Fei Li & Andrej Karpathy & Justin Johnson 29 Feb 2016
Variational Autoencoder: EncoderBy Bayes Rule the posterior is:
64
Use decoder network =)Gaussian =)Intractible integral =(
Kingma and Welling, ICLR 2014
![Page 65: Fei-Fei Li & Andrej Karpathy & Justin Johnson …cs231n.stanford.edu/slides/2016/winter1516_lecture14.pdfFei-Fei Li & Andrej Karpathy & Justin Johnson Lecture 14 - 7 29 Feb 2016 Dense](https://reader030.vdocuments.net/reader030/viewer/2022040103/5e95274febc0745c0f695241/html5/thumbnails/65.jpg)
Lecture 14 - Fei-Fei Li & Andrej Karpathy & Justin Johnson 29 Feb 2016
Variational Autoencoder: EncoderBy Bayes Rule the posterior is:
65
x
z Σz
Mean and (diagonal) covariance of
Data point
Encoder network with parameters
Use decoder network =)Gaussian =)Intractible integral =(
Kingma and Welling, ICLR 2014
![Page 66: Fei-Fei Li & Andrej Karpathy & Justin Johnson …cs231n.stanford.edu/slides/2016/winter1516_lecture14.pdfFei-Fei Li & Andrej Karpathy & Justin Johnson Lecture 14 - 7 29 Feb 2016 Dense](https://reader030.vdocuments.net/reader030/viewer/2022040103/5e95274febc0745c0f695241/html5/thumbnails/66.jpg)
Lecture 14 - Fei-Fei Li & Andrej Karpathy & Justin Johnson 29 Feb 2016
Variational Autoencoder: EncoderBy Bayes Rule the posterior is:
66
x
z Σz
Mean and (diagonal) covariance of
Data point
Encoder network with parameters
Use decoder network =)Gaussian =)Intractible integral =(
Approximate posterior with encoder networkKingma and Welling,
ICLR 2014
![Page 67: Fei-Fei Li & Andrej Karpathy & Justin Johnson …cs231n.stanford.edu/slides/2016/winter1516_lecture14.pdfFei-Fei Li & Andrej Karpathy & Justin Johnson Lecture 14 - 7 29 Feb 2016 Dense](https://reader030.vdocuments.net/reader030/viewer/2022040103/5e95274febc0745c0f695241/html5/thumbnails/67.jpg)
Lecture 14 - Fei-Fei Li & Andrej Karpathy & Justin Johnson 29 Feb 2016
Variational Autoencoder: EncoderBy Bayes Rule the posterior is:
67
x
z Σz
Mean and (diagonal) covariance of
Data point
Encoder network with parameters
Use decoder network =)Gaussian =)Intractible integral =(
Approximate posterior with encoder network
Fully-connected or convolutional
Kingma and Welling, ICLR 2014
![Page 68: Fei-Fei Li & Andrej Karpathy & Justin Johnson …cs231n.stanford.edu/slides/2016/winter1516_lecture14.pdfFei-Fei Li & Andrej Karpathy & Justin Johnson Lecture 14 - 7 29 Feb 2016 Dense](https://reader030.vdocuments.net/reader030/viewer/2022040103/5e95274febc0745c0f695241/html5/thumbnails/68.jpg)
Lecture 14 - Fei-Fei Li & Andrej Karpathy & Justin Johnson 29 Feb 2016
Variational Autoencoder
68
x Kingma and Welling, ICLR 2014Data point
![Page 69: Fei-Fei Li & Andrej Karpathy & Justin Johnson …cs231n.stanford.edu/slides/2016/winter1516_lecture14.pdfFei-Fei Li & Andrej Karpathy & Justin Johnson Lecture 14 - 7 29 Feb 2016 Dense](https://reader030.vdocuments.net/reader030/viewer/2022040103/5e95274febc0745c0f695241/html5/thumbnails/69.jpg)
Lecture 14 - Fei-Fei Li & Andrej Karpathy & Justin Johnson 29 Feb 2016
Variational Autoencoder
69
x
z Σz
Kingma and Welling, ICLR 2014
Mean and (diagonal) covariance of
Data pointEncoder network
![Page 70: Fei-Fei Li & Andrej Karpathy & Justin Johnson …cs231n.stanford.edu/slides/2016/winter1516_lecture14.pdfFei-Fei Li & Andrej Karpathy & Justin Johnson Lecture 14 - 7 29 Feb 2016 Dense](https://reader030.vdocuments.net/reader030/viewer/2022040103/5e95274febc0745c0f695241/html5/thumbnails/70.jpg)
Lecture 14 - Fei-Fei Li & Andrej Karpathy & Justin Johnson 29 Feb 2016
Variational Autoencoder
70
x
z Σz
z
Kingma and Welling, ICLR 2014
Mean and (diagonal) covariance of
Data pointEncoder network
Sample from
![Page 71: Fei-Fei Li & Andrej Karpathy & Justin Johnson …cs231n.stanford.edu/slides/2016/winter1516_lecture14.pdfFei-Fei Li & Andrej Karpathy & Justin Johnson Lecture 14 - 7 29 Feb 2016 Dense](https://reader030.vdocuments.net/reader030/viewer/2022040103/5e95274febc0745c0f695241/html5/thumbnails/71.jpg)
Lecture 14 - Fei-Fei Li & Andrej Karpathy & Justin Johnson 29 Feb 2016
Variational Autoencoder
71
x
z Σz
z
x Σx
Kingma and Welling, ICLR 2014
Mean and (diagonal) covariance of
Mean and (diagonal) covariance of
Data pointEncoder network
Sample from
Decoder network
![Page 72: Fei-Fei Li & Andrej Karpathy & Justin Johnson …cs231n.stanford.edu/slides/2016/winter1516_lecture14.pdfFei-Fei Li & Andrej Karpathy & Justin Johnson Lecture 14 - 7 29 Feb 2016 Dense](https://reader030.vdocuments.net/reader030/viewer/2022040103/5e95274febc0745c0f695241/html5/thumbnails/72.jpg)
Lecture 14 - Fei-Fei Li & Andrej Karpathy & Justin Johnson 29 Feb 2016
Variational Autoencoder
72
x
z Σz
z
x
xx
Σx
Kingma and Welling, ICLR 2014
Mean and (diagonal) covariance of
Mean and (diagonal) covariance of
Data pointEncoder network
Sample from
Decoder network
Sample from Reconstructed
![Page 73: Fei-Fei Li & Andrej Karpathy & Justin Johnson …cs231n.stanford.edu/slides/2016/winter1516_lecture14.pdfFei-Fei Li & Andrej Karpathy & Justin Johnson Lecture 14 - 7 29 Feb 2016 Dense](https://reader030.vdocuments.net/reader030/viewer/2022040103/5e95274febc0745c0f695241/html5/thumbnails/73.jpg)
Lecture 14 - Fei-Fei Li & Andrej Karpathy & Justin Johnson 29 Feb 2016
Mean and (diagonal) covariance of(should be close to data x)
Variational Autoencoder
73
x
z Σz Mean and (diagonal) covariance of(should be close to prior ) Data point
Encoder network
z
x
Sample from
Decoder network
Sample from
Training like a normal autoencoder: reconstruction loss at the end, regularization toward prior in middle
xxReconstructed
Σx
Kingma and Welling, ICLR 2014
![Page 74: Fei-Fei Li & Andrej Karpathy & Justin Johnson …cs231n.stanford.edu/slides/2016/winter1516_lecture14.pdfFei-Fei Li & Andrej Karpathy & Justin Johnson Lecture 14 - 7 29 Feb 2016 Dense](https://reader030.vdocuments.net/reader030/viewer/2022040103/5e95274febc0745c0f695241/html5/thumbnails/74.jpg)
Lecture 14 - Fei-Fei Li & Andrej Karpathy & Justin Johnson 29 Feb 2016
Variational Autoencoder: Generate Data!
74
zSample from prior
After network is trained:
![Page 75: Fei-Fei Li & Andrej Karpathy & Justin Johnson …cs231n.stanford.edu/slides/2016/winter1516_lecture14.pdfFei-Fei Li & Andrej Karpathy & Justin Johnson Lecture 14 - 7 29 Feb 2016 Dense](https://reader030.vdocuments.net/reader030/viewer/2022040103/5e95274febc0745c0f695241/html5/thumbnails/75.jpg)
Lecture 14 - Fei-Fei Li & Andrej Karpathy & Justin Johnson 29 Feb 2016
Variational Autoencoder: Generate Data!
75
z
x Σx
Decodernetwork
Sample from prior
After network is trained:
![Page 76: Fei-Fei Li & Andrej Karpathy & Justin Johnson …cs231n.stanford.edu/slides/2016/winter1516_lecture14.pdfFei-Fei Li & Andrej Karpathy & Justin Johnson Lecture 14 - 7 29 Feb 2016 Dense](https://reader030.vdocuments.net/reader030/viewer/2022040103/5e95274febc0745c0f695241/html5/thumbnails/76.jpg)
Lecture 14 - Fei-Fei Li & Andrej Karpathy & Justin Johnson 29 Feb 2016
Variational Autoencoder: Generate Data!
76
z
x Σx
Decodernetwork
Sample from
Sample from prior
After network is trained:
xxGenerated
![Page 77: Fei-Fei Li & Andrej Karpathy & Justin Johnson …cs231n.stanford.edu/slides/2016/winter1516_lecture14.pdfFei-Fei Li & Andrej Karpathy & Justin Johnson Lecture 14 - 7 29 Feb 2016 Dense](https://reader030.vdocuments.net/reader030/viewer/2022040103/5e95274febc0745c0f695241/html5/thumbnails/77.jpg)
Lecture 14 - Fei-Fei Li & Andrej Karpathy & Justin Johnson 29 Feb 2016
Variational Autoencoder: Generate Data!
77
z
x Σx
Decodernetwork
Sample from
Sample from prior
After network is trained:
xxGenerated
![Page 78: Fei-Fei Li & Andrej Karpathy & Justin Johnson …cs231n.stanford.edu/slides/2016/winter1516_lecture14.pdfFei-Fei Li & Andrej Karpathy & Justin Johnson Lecture 14 - 7 29 Feb 2016 Dense](https://reader030.vdocuments.net/reader030/viewer/2022040103/5e95274febc0745c0f695241/html5/thumbnails/78.jpg)
Lecture 14 - Fei-Fei Li & Andrej Karpathy & Justin Johnson 29 Feb 2016
Variational Autoencoder: Generate Data!
78
z
x Σx
Decodernetwork
Sample from
Sample from prior
After network is trained:
xxGenerated
![Page 79: Fei-Fei Li & Andrej Karpathy & Justin Johnson …cs231n.stanford.edu/slides/2016/winter1516_lecture14.pdfFei-Fei Li & Andrej Karpathy & Justin Johnson Lecture 14 - 7 29 Feb 2016 Dense](https://reader030.vdocuments.net/reader030/viewer/2022040103/5e95274febc0745c0f695241/html5/thumbnails/79.jpg)
Lecture 14 - Fei-Fei Li & Andrej Karpathy & Justin Johnson 29 Feb 2016
Variational Autoencoder: Generate Data!
79
z
x Σx
Decodernetwork
Sample from
Sample from prior
Diagonal prior on z => independent latent variables
After network is trained:
xxGenerated
![Page 80: Fei-Fei Li & Andrej Karpathy & Justin Johnson …cs231n.stanford.edu/slides/2016/winter1516_lecture14.pdfFei-Fei Li & Andrej Karpathy & Justin Johnson Lecture 14 - 7 29 Feb 2016 Dense](https://reader030.vdocuments.net/reader030/viewer/2022040103/5e95274febc0745c0f695241/html5/thumbnails/80.jpg)
Lecture 14 - Fei-Fei Li & Andrej Karpathy & Justin Johnson 29 Feb 2016
Variational Autoencoder: MathMaximum Likelihood?
80
Maximize likelihood of dataset
Kingma and Welling, ICLR 2014
![Page 81: Fei-Fei Li & Andrej Karpathy & Justin Johnson …cs231n.stanford.edu/slides/2016/winter1516_lecture14.pdfFei-Fei Li & Andrej Karpathy & Justin Johnson Lecture 14 - 7 29 Feb 2016 Dense](https://reader030.vdocuments.net/reader030/viewer/2022040103/5e95274febc0745c0f695241/html5/thumbnails/81.jpg)
Lecture 14 - Fei-Fei Li & Andrej Karpathy & Justin Johnson 29 Feb 2016
Variational Autoencoder: MathMaximum Likelihood?
81
Maximize likelihood of dataset
Maximize log-likelihood instead because sums are nicer
Kingma and Welling, ICLR 2014
![Page 82: Fei-Fei Li & Andrej Karpathy & Justin Johnson …cs231n.stanford.edu/slides/2016/winter1516_lecture14.pdfFei-Fei Li & Andrej Karpathy & Justin Johnson Lecture 14 - 7 29 Feb 2016 Dense](https://reader030.vdocuments.net/reader030/viewer/2022040103/5e95274febc0745c0f695241/html5/thumbnails/82.jpg)
Lecture 14 - Fei-Fei Li & Andrej Karpathy & Justin Johnson 29 Feb 2016
Variational Autoencoder: MathMaximum Likelihood?
82
Maximize likelihood of dataset
Maximize log-likelihood instead because sums are nicer
Marginalize joint distribution
Kingma and Welling, ICLR 2014
![Page 83: Fei-Fei Li & Andrej Karpathy & Justin Johnson …cs231n.stanford.edu/slides/2016/winter1516_lecture14.pdfFei-Fei Li & Andrej Karpathy & Justin Johnson Lecture 14 - 7 29 Feb 2016 Dense](https://reader030.vdocuments.net/reader030/viewer/2022040103/5e95274febc0745c0f695241/html5/thumbnails/83.jpg)
Lecture 14 - Fei-Fei Li & Andrej Karpathy & Justin Johnson 29 Feb 2016
Variational Autoencoder: MathMaximum Likelihood?
83
Maximize likelihood of dataset
Maximize log-likelihood instead because sums are nicer
Intractible integral =(
![Page 84: Fei-Fei Li & Andrej Karpathy & Justin Johnson …cs231n.stanford.edu/slides/2016/winter1516_lecture14.pdfFei-Fei Li & Andrej Karpathy & Justin Johnson Lecture 14 - 7 29 Feb 2016 Dense](https://reader030.vdocuments.net/reader030/viewer/2022040103/5e95274febc0745c0f695241/html5/thumbnails/84.jpg)
Lecture 14 - Fei-Fei Li & Andrej Karpathy & Justin Johnson 29 Feb 2016
Variational Autoencoder: Math
84
![Page 85: Fei-Fei Li & Andrej Karpathy & Justin Johnson …cs231n.stanford.edu/slides/2016/winter1516_lecture14.pdfFei-Fei Li & Andrej Karpathy & Justin Johnson Lecture 14 - 7 29 Feb 2016 Dense](https://reader030.vdocuments.net/reader030/viewer/2022040103/5e95274febc0745c0f695241/html5/thumbnails/85.jpg)
Lecture 14 - Fei-Fei Li & Andrej Karpathy & Justin Johnson 29 Feb 2016
Variational Autoencoder: Math
85
![Page 86: Fei-Fei Li & Andrej Karpathy & Justin Johnson …cs231n.stanford.edu/slides/2016/winter1516_lecture14.pdfFei-Fei Li & Andrej Karpathy & Justin Johnson Lecture 14 - 7 29 Feb 2016 Dense](https://reader030.vdocuments.net/reader030/viewer/2022040103/5e95274febc0745c0f695241/html5/thumbnails/86.jpg)
Lecture 14 - Fei-Fei Li & Andrej Karpathy & Justin Johnson 29 Feb 2016
Variational Autoencoder: Math
86
![Page 87: Fei-Fei Li & Andrej Karpathy & Justin Johnson …cs231n.stanford.edu/slides/2016/winter1516_lecture14.pdfFei-Fei Li & Andrej Karpathy & Justin Johnson Lecture 14 - 7 29 Feb 2016 Dense](https://reader030.vdocuments.net/reader030/viewer/2022040103/5e95274febc0745c0f695241/html5/thumbnails/87.jpg)
Lecture 14 - Fei-Fei Li & Andrej Karpathy & Justin Johnson 29 Feb 2016
Variational Autoencoder: Math
87
![Page 88: Fei-Fei Li & Andrej Karpathy & Justin Johnson …cs231n.stanford.edu/slides/2016/winter1516_lecture14.pdfFei-Fei Li & Andrej Karpathy & Justin Johnson Lecture 14 - 7 29 Feb 2016 Dense](https://reader030.vdocuments.net/reader030/viewer/2022040103/5e95274febc0745c0f695241/html5/thumbnails/88.jpg)
Lecture 14 - Fei-Fei Li & Andrej Karpathy & Justin Johnson 29 Feb 2016
Variational Autoencoder: Math
88
![Page 89: Fei-Fei Li & Andrej Karpathy & Justin Johnson …cs231n.stanford.edu/slides/2016/winter1516_lecture14.pdfFei-Fei Li & Andrej Karpathy & Justin Johnson Lecture 14 - 7 29 Feb 2016 Dense](https://reader030.vdocuments.net/reader030/viewer/2022040103/5e95274febc0745c0f695241/html5/thumbnails/89.jpg)
Lecture 14 - Fei-Fei Li & Andrej Karpathy & Justin Johnson 29 Feb 2016
Variational Autoencoder: Math
89
![Page 90: Fei-Fei Li & Andrej Karpathy & Justin Johnson …cs231n.stanford.edu/slides/2016/winter1516_lecture14.pdfFei-Fei Li & Andrej Karpathy & Justin Johnson Lecture 14 - 7 29 Feb 2016 Dense](https://reader030.vdocuments.net/reader030/viewer/2022040103/5e95274febc0745c0f695241/html5/thumbnails/90.jpg)
Lecture 14 - Fei-Fei Li & Andrej Karpathy & Justin Johnson 29 Feb 2016
Variational Autoencoder: Math
90
“Elbow”
![Page 91: Fei-Fei Li & Andrej Karpathy & Justin Johnson …cs231n.stanford.edu/slides/2016/winter1516_lecture14.pdfFei-Fei Li & Andrej Karpathy & Justin Johnson Lecture 14 - 7 29 Feb 2016 Dense](https://reader030.vdocuments.net/reader030/viewer/2022040103/5e95274febc0745c0f695241/html5/thumbnails/91.jpg)
Lecture 14 - Fei-Fei Li & Andrej Karpathy & Justin Johnson 29 Feb 2016
Variational Autoencoder: Math
91
“Elbow”
![Page 92: Fei-Fei Li & Andrej Karpathy & Justin Johnson …cs231n.stanford.edu/slides/2016/winter1516_lecture14.pdfFei-Fei Li & Andrej Karpathy & Justin Johnson Lecture 14 - 7 29 Feb 2016 Dense](https://reader030.vdocuments.net/reader030/viewer/2022040103/5e95274febc0745c0f695241/html5/thumbnails/92.jpg)
Lecture 14 - Fei-Fei Li & Andrej Karpathy & Justin Johnson 29 Feb 2016
Variational Autoencoder: Math
92
Variational lower bound (elbow)
“Elbow”
![Page 93: Fei-Fei Li & Andrej Karpathy & Justin Johnson …cs231n.stanford.edu/slides/2016/winter1516_lecture14.pdfFei-Fei Li & Andrej Karpathy & Justin Johnson Lecture 14 - 7 29 Feb 2016 Dense](https://reader030.vdocuments.net/reader030/viewer/2022040103/5e95274febc0745c0f695241/html5/thumbnails/93.jpg)
Lecture 14 - Fei-Fei Li & Andrej Karpathy & Justin Johnson 29 Feb 2016
Variational Autoencoder: Math
93
Variational lower bound (elbow) Training: Maximize lower bound
“Elbow”
![Page 94: Fei-Fei Li & Andrej Karpathy & Justin Johnson …cs231n.stanford.edu/slides/2016/winter1516_lecture14.pdfFei-Fei Li & Andrej Karpathy & Justin Johnson Lecture 14 - 7 29 Feb 2016 Dense](https://reader030.vdocuments.net/reader030/viewer/2022040103/5e95274febc0745c0f695241/html5/thumbnails/94.jpg)
Lecture 14 - Fei-Fei Li & Andrej Karpathy & Justin Johnson 29 Feb 2016
Variational Autoencoder: Math
94
Reconstructthe input data
Variational lower bound (elbow) Training: Maximize lower bound
“Elbow”
![Page 95: Fei-Fei Li & Andrej Karpathy & Justin Johnson …cs231n.stanford.edu/slides/2016/winter1516_lecture14.pdfFei-Fei Li & Andrej Karpathy & Justin Johnson Lecture 14 - 7 29 Feb 2016 Dense](https://reader030.vdocuments.net/reader030/viewer/2022040103/5e95274febc0745c0f695241/html5/thumbnails/95.jpg)
Lecture 14 - Fei-Fei Li & Andrej Karpathy & Justin Johnson 29 Feb 2016
Variational Autoencoder: Math
95
Reconstructthe input data
Latent states should follow the prior
Variational lower bound (elbow) Training: Maximize lower bound
“Elbow”
![Page 96: Fei-Fei Li & Andrej Karpathy & Justin Johnson …cs231n.stanford.edu/slides/2016/winter1516_lecture14.pdfFei-Fei Li & Andrej Karpathy & Justin Johnson Lecture 14 - 7 29 Feb 2016 Dense](https://reader030.vdocuments.net/reader030/viewer/2022040103/5e95274febc0745c0f695241/html5/thumbnails/96.jpg)
Lecture 14 - Fei-Fei Li & Andrej Karpathy & Justin Johnson 29 Feb 2016
Variational Autoencoder: Math
96
Reconstructthe input data
Latent states should follow the prior
Variational lower bound (elbow) Training: Maximize lower bound
Samplingwithreparam.trick(see paper)
“Elbow”
![Page 97: Fei-Fei Li & Andrej Karpathy & Justin Johnson …cs231n.stanford.edu/slides/2016/winter1516_lecture14.pdfFei-Fei Li & Andrej Karpathy & Justin Johnson Lecture 14 - 7 29 Feb 2016 Dense](https://reader030.vdocuments.net/reader030/viewer/2022040103/5e95274febc0745c0f695241/html5/thumbnails/97.jpg)
Lecture 14 - Fei-Fei Li & Andrej Karpathy & Justin Johnson 29 Feb 2016
Variational Autoencoder: Math
97
Reconstructthe input data
Latent states should follow the prior
Variational lower bound (elbow) Training: Maximize lower bound
Everything is Gaussian, closed form solution!Sampling
withreparam.trick(see paper)
“Elbow”
![Page 98: Fei-Fei Li & Andrej Karpathy & Justin Johnson …cs231n.stanford.edu/slides/2016/winter1516_lecture14.pdfFei-Fei Li & Andrej Karpathy & Justin Johnson Lecture 14 - 7 29 Feb 2016 Dense](https://reader030.vdocuments.net/reader030/viewer/2022040103/5e95274febc0745c0f695241/html5/thumbnails/98.jpg)
Lecture 14 - Fei-Fei Li & Andrej Karpathy & Justin Johnson 29 Feb 2016
Autoencoder Overview
● Traditional Autoencoders○ Try to reconstruct input○ Used to learn features, initialize supervised model○ Not used much anymore
● Variational Autoencoders○ Bayesian meets deep learning○ Sample from model to generate images
98
![Page 99: Fei-Fei Li & Andrej Karpathy & Justin Johnson …cs231n.stanford.edu/slides/2016/winter1516_lecture14.pdfFei-Fei Li & Andrej Karpathy & Justin Johnson Lecture 14 - 7 29 Feb 2016 Dense](https://reader030.vdocuments.net/reader030/viewer/2022040103/5e95274febc0745c0f695241/html5/thumbnails/99.jpg)
Lecture 14 - Fei-Fei Li & Andrej Karpathy & Justin Johnson 29 Feb 2016
Generative Adversarial Nets
99
zRandom noise
Goodfellow et al, “Generative Adversarial Nets”, NIPS 2014
Can we generate images with less math?
![Page 100: Fei-Fei Li & Andrej Karpathy & Justin Johnson …cs231n.stanford.edu/slides/2016/winter1516_lecture14.pdfFei-Fei Li & Andrej Karpathy & Justin Johnson Lecture 14 - 7 29 Feb 2016 Dense](https://reader030.vdocuments.net/reader030/viewer/2022040103/5e95274febc0745c0f695241/html5/thumbnails/100.jpg)
Lecture 14 - Fei-Fei Li & Andrej Karpathy & Justin Johnson 29 Feb 2016
Generative Adversarial Nets
100
z
x
Generator
Random noise
Fake image
Goodfellow et al, “Generative Adversarial Nets”, NIPS 2014
Can we generate images with less math?
![Page 101: Fei-Fei Li & Andrej Karpathy & Justin Johnson …cs231n.stanford.edu/slides/2016/winter1516_lecture14.pdfFei-Fei Li & Andrej Karpathy & Justin Johnson Lecture 14 - 7 29 Feb 2016 Dense](https://reader030.vdocuments.net/reader030/viewer/2022040103/5e95274febc0745c0f695241/html5/thumbnails/101.jpg)
Lecture 14 - Fei-Fei Li & Andrej Karpathy & Justin Johnson 29 Feb 2016
Generative Adversarial Nets
101
z
x
Generator
Random noise
Fake image
yReal or fake?
Discriminator
Goodfellow et al, “Generative Adversarial Nets”, NIPS 2014
Can we generate images with less math?
![Page 102: Fei-Fei Li & Andrej Karpathy & Justin Johnson …cs231n.stanford.edu/slides/2016/winter1516_lecture14.pdfFei-Fei Li & Andrej Karpathy & Justin Johnson Lecture 14 - 7 29 Feb 2016 Dense](https://reader030.vdocuments.net/reader030/viewer/2022040103/5e95274febc0745c0f695241/html5/thumbnails/102.jpg)
Lecture 14 - Fei-Fei Li & Andrej Karpathy & Justin Johnson 29 Feb 2016
Generative Adversarial Nets
102
z
x
Generator
Random noise
Fake image
y
Real image
Real or fake?
Discriminator
x
Fake examples: from generatorReal examples: from dataset
Goodfellow et al, “Generative Adversarial Nets”, NIPS 2014
Can we generate images with less math?
![Page 103: Fei-Fei Li & Andrej Karpathy & Justin Johnson …cs231n.stanford.edu/slides/2016/winter1516_lecture14.pdfFei-Fei Li & Andrej Karpathy & Justin Johnson Lecture 14 - 7 29 Feb 2016 Dense](https://reader030.vdocuments.net/reader030/viewer/2022040103/5e95274febc0745c0f695241/html5/thumbnails/103.jpg)
Lecture 14 - Fei-Fei Li & Andrej Karpathy & Justin Johnson 29 Feb 2016
Generative Adversarial Nets
103
z
x
Generator
Random noise
Fake image
y
Real image
Real or fake?
Discriminator
x
Fake examples: from generatorReal examples: from dataset
Goodfellow et al, “Generative Adversarial Nets”, NIPS 2014
Train generator and discriminator jointlyAfter training, easy to generate images
Can we generate images with less math?
![Page 104: Fei-Fei Li & Andrej Karpathy & Justin Johnson …cs231n.stanford.edu/slides/2016/winter1516_lecture14.pdfFei-Fei Li & Andrej Karpathy & Justin Johnson Lecture 14 - 7 29 Feb 2016 Dense](https://reader030.vdocuments.net/reader030/viewer/2022040103/5e95274febc0745c0f695241/html5/thumbnails/104.jpg)
Lecture 14 - Fei-Fei Li & Andrej Karpathy & Justin Johnson 29 Feb 2016
Generative Adversarial Nets
104
Goodfellow et al, “Generative Adversarial Nets”, NIPS 2014Nearest neighbor from training set
Generated samples
![Page 105: Fei-Fei Li & Andrej Karpathy & Justin Johnson …cs231n.stanford.edu/slides/2016/winter1516_lecture14.pdfFei-Fei Li & Andrej Karpathy & Justin Johnson Lecture 14 - 7 29 Feb 2016 Dense](https://reader030.vdocuments.net/reader030/viewer/2022040103/5e95274febc0745c0f695241/html5/thumbnails/105.jpg)
Lecture 14 - Fei-Fei Li & Andrej Karpathy & Justin Johnson 29 Feb 2016
Generative Adversarial Nets
105
Goodfellow et al, “Generative Adversarial Nets”, NIPS 2014Nearest neighbor from training set
Generated samples (CIFAR-10)
![Page 106: Fei-Fei Li & Andrej Karpathy & Justin Johnson …cs231n.stanford.edu/slides/2016/winter1516_lecture14.pdfFei-Fei Li & Andrej Karpathy & Justin Johnson Lecture 14 - 7 29 Feb 2016 Dense](https://reader030.vdocuments.net/reader030/viewer/2022040103/5e95274febc0745c0f695241/html5/thumbnails/106.jpg)
Lecture 14 - Fei-Fei Li & Andrej Karpathy & Justin Johnson 29 Feb 2016
Generative Adversarial Nets: Multiscale
106Denton et al, “Deep generative image models using a Laplacian pyramid of adversarial networks”, NIPS 2015
Generate low-res
![Page 107: Fei-Fei Li & Andrej Karpathy & Justin Johnson …cs231n.stanford.edu/slides/2016/winter1516_lecture14.pdfFei-Fei Li & Andrej Karpathy & Justin Johnson Lecture 14 - 7 29 Feb 2016 Dense](https://reader030.vdocuments.net/reader030/viewer/2022040103/5e95274febc0745c0f695241/html5/thumbnails/107.jpg)
Lecture 14 - Fei-Fei Li & Andrej Karpathy & Justin Johnson 29 Feb 2016
Generative Adversarial Nets: Multiscale
107Denton et al, “Deep generative image models using a Laplacian pyramid of adversarial networks”, NIPS 2015
Generate low-res
Upsample
![Page 108: Fei-Fei Li & Andrej Karpathy & Justin Johnson …cs231n.stanford.edu/slides/2016/winter1516_lecture14.pdfFei-Fei Li & Andrej Karpathy & Justin Johnson Lecture 14 - 7 29 Feb 2016 Dense](https://reader030.vdocuments.net/reader030/viewer/2022040103/5e95274febc0745c0f695241/html5/thumbnails/108.jpg)
Lecture 14 - Fei-Fei Li & Andrej Karpathy & Justin Johnson 29 Feb 2016
Generative Adversarial Nets: Multiscale
108Denton et al, “Deep generative image models using a Laplacian pyramid of adversarial networks”, NIPS 2015
Generate low-res
Upsample
Generate delta, add
![Page 109: Fei-Fei Li & Andrej Karpathy & Justin Johnson …cs231n.stanford.edu/slides/2016/winter1516_lecture14.pdfFei-Fei Li & Andrej Karpathy & Justin Johnson Lecture 14 - 7 29 Feb 2016 Dense](https://reader030.vdocuments.net/reader030/viewer/2022040103/5e95274febc0745c0f695241/html5/thumbnails/109.jpg)
Lecture 14 - Fei-Fei Li & Andrej Karpathy & Justin Johnson 29 Feb 2016
Generative Adversarial Nets: Multiscale
109Denton et al, “Deep generative image models using a Laplacian pyramid of adversarial networks”, NIPS 2015
Generate low-res
Upsample
Generate delta, add
Upsample
![Page 110: Fei-Fei Li & Andrej Karpathy & Justin Johnson …cs231n.stanford.edu/slides/2016/winter1516_lecture14.pdfFei-Fei Li & Andrej Karpathy & Justin Johnson Lecture 14 - 7 29 Feb 2016 Dense](https://reader030.vdocuments.net/reader030/viewer/2022040103/5e95274febc0745c0f695241/html5/thumbnails/110.jpg)
Lecture 14 - Fei-Fei Li & Andrej Karpathy & Justin Johnson 29 Feb 2016
Generative Adversarial Nets: Multiscale
110Denton et al, “Deep generative image models using a Laplacian pyramid of adversarial networks”, NIPS 2015
Generate low-res
Upsample
Generate delta, add
Upsample
Generate delta, add
![Page 111: Fei-Fei Li & Andrej Karpathy & Justin Johnson …cs231n.stanford.edu/slides/2016/winter1516_lecture14.pdfFei-Fei Li & Andrej Karpathy & Justin Johnson Lecture 14 - 7 29 Feb 2016 Dense](https://reader030.vdocuments.net/reader030/viewer/2022040103/5e95274febc0745c0f695241/html5/thumbnails/111.jpg)
Lecture 14 - Fei-Fei Li & Andrej Karpathy & Justin Johnson 29 Feb 2016
Generative Adversarial Nets: Multiscale
111Denton et al, “Deep generative image models using a Laplacian pyramid of adversarial networks”, NIPS 2015
Generate low-res
Upsample
Generate delta, add
Upsample
Generate delta, add
Upsample
![Page 112: Fei-Fei Li & Andrej Karpathy & Justin Johnson …cs231n.stanford.edu/slides/2016/winter1516_lecture14.pdfFei-Fei Li & Andrej Karpathy & Justin Johnson Lecture 14 - 7 29 Feb 2016 Dense](https://reader030.vdocuments.net/reader030/viewer/2022040103/5e95274febc0745c0f695241/html5/thumbnails/112.jpg)
Lecture 14 - Fei-Fei Li & Andrej Karpathy & Justin Johnson 29 Feb 2016
Generative Adversarial Nets: Multiscale
112Denton et al, “Deep generative image models using a Laplacian pyramid of adversarial networks”, NIPS 2015
Generate low-res
Upsample
Generate delta, add
Upsample
Generate delta, add
Upsample
Generate delta, add
Done!
![Page 113: Fei-Fei Li & Andrej Karpathy & Justin Johnson …cs231n.stanford.edu/slides/2016/winter1516_lecture14.pdfFei-Fei Li & Andrej Karpathy & Justin Johnson Lecture 14 - 7 29 Feb 2016 Dense](https://reader030.vdocuments.net/reader030/viewer/2022040103/5e95274febc0745c0f695241/html5/thumbnails/113.jpg)
Lecture 14 - Fei-Fei Li & Andrej Karpathy & Justin Johnson 29 Feb 2016
Generative Adversarial Nets: Multiscale
113
Discriminators work at every scale!
Denton et al, NIPS 2015
![Page 114: Fei-Fei Li & Andrej Karpathy & Justin Johnson …cs231n.stanford.edu/slides/2016/winter1516_lecture14.pdfFei-Fei Li & Andrej Karpathy & Justin Johnson Lecture 14 - 7 29 Feb 2016 Dense](https://reader030.vdocuments.net/reader030/viewer/2022040103/5e95274febc0745c0f695241/html5/thumbnails/114.jpg)
Lecture 14 - Fei-Fei Li & Andrej Karpathy & Justin Johnson 29 Feb 2016
Generative Adversarial Nets: Multiscale
114
Train separate model per-class on CIFAR-10Denton et al, NIPS 2015
![Page 115: Fei-Fei Li & Andrej Karpathy & Justin Johnson …cs231n.stanford.edu/slides/2016/winter1516_lecture14.pdfFei-Fei Li & Andrej Karpathy & Justin Johnson Lecture 14 - 7 29 Feb 2016 Dense](https://reader030.vdocuments.net/reader030/viewer/2022040103/5e95274febc0745c0f695241/html5/thumbnails/115.jpg)
Lecture 14 - Fei-Fei Li & Andrej Karpathy & Justin Johnson 29 Feb 2016
Generative Adversarial Nets: Simplifying
115
Radford et al, “Unsupervised Representation Learning with Deep Convolutional Generative Adversarial Networks”, ICLR 2016
Generator is an upsampling network with fractionally-strided convolutionsDiscriminator is a convolutional network
![Page 116: Fei-Fei Li & Andrej Karpathy & Justin Johnson …cs231n.stanford.edu/slides/2016/winter1516_lecture14.pdfFei-Fei Li & Andrej Karpathy & Justin Johnson Lecture 14 - 7 29 Feb 2016 Dense](https://reader030.vdocuments.net/reader030/viewer/2022040103/5e95274febc0745c0f695241/html5/thumbnails/116.jpg)
Lecture 14 - Fei-Fei Li & Andrej Karpathy & Justin Johnson 29 Feb 2016
Generative Adversarial Nets: Simplifying
116
Radford et al, “Unsupervised Representation Learning with Deep Convolutional Generative Adversarial Networks”, ICLR 2016
Generator
![Page 117: Fei-Fei Li & Andrej Karpathy & Justin Johnson …cs231n.stanford.edu/slides/2016/winter1516_lecture14.pdfFei-Fei Li & Andrej Karpathy & Justin Johnson Lecture 14 - 7 29 Feb 2016 Dense](https://reader030.vdocuments.net/reader030/viewer/2022040103/5e95274febc0745c0f695241/html5/thumbnails/117.jpg)
Lecture 14 - Fei-Fei Li & Andrej Karpathy & Justin Johnson 29 Feb 2016
Generative Adversarial Nets: Simplifying
117
Radford et al, ICLR 2016
Samples from the model look amazing!
![Page 118: Fei-Fei Li & Andrej Karpathy & Justin Johnson …cs231n.stanford.edu/slides/2016/winter1516_lecture14.pdfFei-Fei Li & Andrej Karpathy & Justin Johnson Lecture 14 - 7 29 Feb 2016 Dense](https://reader030.vdocuments.net/reader030/viewer/2022040103/5e95274febc0745c0f695241/html5/thumbnails/118.jpg)
Lecture 14 - Fei-Fei Li & Andrej Karpathy & Justin Johnson 29 Feb 2016
Generative Adversarial Nets: Simplifying
118
Radford et al, ICLR 2016
Interpolating between random points in latent space
![Page 119: Fei-Fei Li & Andrej Karpathy & Justin Johnson …cs231n.stanford.edu/slides/2016/winter1516_lecture14.pdfFei-Fei Li & Andrej Karpathy & Justin Johnson Lecture 14 - 7 29 Feb 2016 Dense](https://reader030.vdocuments.net/reader030/viewer/2022040103/5e95274febc0745c0f695241/html5/thumbnails/119.jpg)
Lecture 14 - Fei-Fei Li & Andrej Karpathy & Justin Johnson 29 Feb 2016
Generative Adversarial Nets: Vector Math
119
Smiling woman Neutral woman Neutral man
Samples from the model
Radford et al, ICLR 2016
![Page 120: Fei-Fei Li & Andrej Karpathy & Justin Johnson …cs231n.stanford.edu/slides/2016/winter1516_lecture14.pdfFei-Fei Li & Andrej Karpathy & Justin Johnson Lecture 14 - 7 29 Feb 2016 Dense](https://reader030.vdocuments.net/reader030/viewer/2022040103/5e95274febc0745c0f695241/html5/thumbnails/120.jpg)
Lecture 14 - Fei-Fei Li & Andrej Karpathy & Justin Johnson 29 Feb 2016
Generative Adversarial Nets: Vector Math
120
Smiling woman Neutral woman Neutral man
Samples from the model
Average Z vectors, do arithmetic
Radford et al, ICLR 2016
![Page 121: Fei-Fei Li & Andrej Karpathy & Justin Johnson …cs231n.stanford.edu/slides/2016/winter1516_lecture14.pdfFei-Fei Li & Andrej Karpathy & Justin Johnson Lecture 14 - 7 29 Feb 2016 Dense](https://reader030.vdocuments.net/reader030/viewer/2022040103/5e95274febc0745c0f695241/html5/thumbnails/121.jpg)
Lecture 14 - Fei-Fei Li & Andrej Karpathy & Justin Johnson 29 Feb 2016
Generative Adversarial Nets: Vector Math
121
Smiling woman Neutral woman Neutral man
Smiling ManSamples from the model
Average Z vectors, do arithmetic
Radford et al, ICLR 2016
![Page 122: Fei-Fei Li & Andrej Karpathy & Justin Johnson …cs231n.stanford.edu/slides/2016/winter1516_lecture14.pdfFei-Fei Li & Andrej Karpathy & Justin Johnson Lecture 14 - 7 29 Feb 2016 Dense](https://reader030.vdocuments.net/reader030/viewer/2022040103/5e95274febc0745c0f695241/html5/thumbnails/122.jpg)
Lecture 14 - Fei-Fei Li & Andrej Karpathy & Justin Johnson 29 Feb 2016
Generative Adversarial Nets: Vector Math
122
Radford et al, ICLR 2016
Glasses man No glasses man No glasses woman
![Page 123: Fei-Fei Li & Andrej Karpathy & Justin Johnson …cs231n.stanford.edu/slides/2016/winter1516_lecture14.pdfFei-Fei Li & Andrej Karpathy & Justin Johnson Lecture 14 - 7 29 Feb 2016 Dense](https://reader030.vdocuments.net/reader030/viewer/2022040103/5e95274febc0745c0f695241/html5/thumbnails/123.jpg)
Lecture 14 - Fei-Fei Li & Andrej Karpathy & Justin Johnson 29 Feb 2016
Generative Adversarial Nets: Vector Math
123
Radford et al, ICLR 2016
Glasses man No glasses man No glasses woman
Woman with glasses
![Page 124: Fei-Fei Li & Andrej Karpathy & Justin Johnson …cs231n.stanford.edu/slides/2016/winter1516_lecture14.pdfFei-Fei Li & Andrej Karpathy & Justin Johnson Lecture 14 - 7 29 Feb 2016 Dense](https://reader030.vdocuments.net/reader030/viewer/2022040103/5e95274febc0745c0f695241/html5/thumbnails/124.jpg)
Lecture 14 - Fei-Fei Li & Andrej Karpathy & Justin Johnson 29 Feb 2016
Putting everything together
124
x
z Σzz
x
xxΣx
VariationalAutoencoder
Pixel loss
Dosovitskiy and Brox, “Generating Images with Perceptual Similarity Metrics based on Deep Networks”, arXiv 2016
![Page 125: Fei-Fei Li & Andrej Karpathy & Justin Johnson …cs231n.stanford.edu/slides/2016/winter1516_lecture14.pdfFei-Fei Li & Andrej Karpathy & Justin Johnson Lecture 14 - 7 29 Feb 2016 Dense](https://reader030.vdocuments.net/reader030/viewer/2022040103/5e95274febc0745c0f695241/html5/thumbnails/125.jpg)
Lecture 14 - Fei-Fei Li & Andrej Karpathy & Justin Johnson 29 Feb 2016
Putting everything together
125
x
z Σzz
x
xxΣx
y
VariationalAutoencoder
Discriminator network
Dosovitskiy and Brox, “Generating Images with Perceptual Similarity Metrics based on Deep Networks”, arXiv 2016
Real or Generated
Pixel loss
![Page 126: Fei-Fei Li & Andrej Karpathy & Justin Johnson …cs231n.stanford.edu/slides/2016/winter1516_lecture14.pdfFei-Fei Li & Andrej Karpathy & Justin Johnson Lecture 14 - 7 29 Feb 2016 Dense](https://reader030.vdocuments.net/reader030/viewer/2022040103/5e95274febc0745c0f695241/html5/thumbnails/126.jpg)
Lecture 14 - Fei-Fei Li & Andrej Karpathy & Justin Johnson 29 Feb 2016
Putting everything together
126
x
z Σzz
x
xxΣx
y
VariationalAutoencoder
Discriminator network
Pretrained AlexNet
Dosovitskiy and Brox, “Generating Images with Perceptual Similarity Metrics based on Deep Networks”, arXiv 2016
Real or Generated
Pixel loss
![Page 127: Fei-Fei Li & Andrej Karpathy & Justin Johnson …cs231n.stanford.edu/slides/2016/winter1516_lecture14.pdfFei-Fei Li & Andrej Karpathy & Justin Johnson Lecture 14 - 7 29 Feb 2016 Dense](https://reader030.vdocuments.net/reader030/viewer/2022040103/5e95274febc0745c0f695241/html5/thumbnails/127.jpg)
Lecture 14 - Fei-Fei Li & Andrej Karpathy & Justin Johnson 29 Feb 2016
Putting everything together
127
x
z Σzz
x
xxΣx
y
VariationalAutoencoder
Discriminator network
Pretrained AlexNet
xf xxfFeatures of real image
Features of reconstructed image
Dosovitskiy and Brox, “Generating Images with Perceptual Similarity Metrics based on Deep Networks”, arXiv 2016
Real or Generated
Pixel loss
![Page 128: Fei-Fei Li & Andrej Karpathy & Justin Johnson …cs231n.stanford.edu/slides/2016/winter1516_lecture14.pdfFei-Fei Li & Andrej Karpathy & Justin Johnson Lecture 14 - 7 29 Feb 2016 Dense](https://reader030.vdocuments.net/reader030/viewer/2022040103/5e95274febc0745c0f695241/html5/thumbnails/128.jpg)
Lecture 14 - Fei-Fei Li & Andrej Karpathy & Justin Johnson 29 Feb 2016
Putting everything together
128
x
z Σzz
x
xxΣx
y
VariationalAutoencoder
Discriminator network
Pretrained AlexNet
xf xxfFeatures of real image
Features of reconstructed image
L2 loss
Dosovitskiy and Brox, “Generating Images with Perceptual Similarity Metrics based on Deep Networks”, arXiv 2016
Real or Generated
Pixel loss
![Page 129: Fei-Fei Li & Andrej Karpathy & Justin Johnson …cs231n.stanford.edu/slides/2016/winter1516_lecture14.pdfFei-Fei Li & Andrej Karpathy & Justin Johnson Lecture 14 - 7 29 Feb 2016 Dense](https://reader030.vdocuments.net/reader030/viewer/2022040103/5e95274febc0745c0f695241/html5/thumbnails/129.jpg)
Lecture 14 - Fei-Fei Li & Andrej Karpathy & Justin Johnson 29 Feb 2016
Putting everything together
129
Dosovitskiy and Brox, “Generating Images with Perceptual Similarity Metrics based on Deep Networks”, arXiv 2016
Samples from the model, trained on ImageNet
![Page 130: Fei-Fei Li & Andrej Karpathy & Justin Johnson …cs231n.stanford.edu/slides/2016/winter1516_lecture14.pdfFei-Fei Li & Andrej Karpathy & Justin Johnson Lecture 14 - 7 29 Feb 2016 Dense](https://reader030.vdocuments.net/reader030/viewer/2022040103/5e95274febc0745c0f695241/html5/thumbnails/130.jpg)
Lecture 14 - Fei-Fei Li & Andrej Karpathy & Justin Johnson 29 Feb 2016
Recap
● Videos● Unsupervised learning
○ Autoencoders: Traditional / variational○ Generative Adversarial Networks
● Next time: Guest lecture from Jeff Dean
130