![Page 1: Deep Learning for Computer Visionintrotodeeplearning.com/2017/lectures/6S191-Deep...Intro to Deep Learning Convolutional Neural Networks: Layers • INPUT [32x32x3] will hold the raw](https://reader035.vdocuments.net/reader035/viewer/2022071505/6125538e192bc138cf03b5f5/html5/thumbnails/1.jpg)
Lex Fridman: [email protected]
January 2017
Course 6.S191: Intro to Deep Learning
Deep Learning for Computer Vision
Lex Fridman
![Page 2: Deep Learning for Computer Visionintrotodeeplearning.com/2017/lectures/6S191-Deep...Intro to Deep Learning Convolutional Neural Networks: Layers • INPUT [32x32x3] will hold the raw](https://reader035.vdocuments.net/reader035/viewer/2022071505/6125538e192bc138cf03b5f5/html5/thumbnails/2.jpg)
Supervised Learning
Unsupervised Learning
Semi-Supervised Learning
Reinforcement Learning
Computer Vision is Machine Learning
References: [81]
Computer Vision
![Page 3: Deep Learning for Computer Visionintrotodeeplearning.com/2017/lectures/6S191-Deep...Intro to Deep Learning Convolutional Neural Networks: Layers • INPUT [32x32x3] will hold the raw](https://reader035.vdocuments.net/reader035/viewer/2022071505/6125538e192bc138cf03b5f5/html5/thumbnails/3.jpg)
Images are Numbers
References: [89]
• Regression: The output variable takes continuous values
• Classification: The output variable takes class labels
  • Underneath it may still produce continuous values such as the probability of belonging to a particular class.
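A minimal sketch of both points on this slide, assuming NumPy and a made-up 4x4 grayscale image: an image is an array of numbers, and a classifier can produce continuous per-class probabilities before a class label is picked. The class names and scores are hypothetical.

```python
import numpy as np

# A tiny grayscale "image": each pixel is an intensity in 0..255.
image = np.array([
    [  0,  50, 100, 150],
    [ 50, 100, 150, 200],
    [100, 150, 200, 250],
    [150, 200, 250, 255],
], dtype=np.uint8)

def softmax(scores):
    """Turn real-valued scores into probabilities that sum to 1."""
    shifted = scores - np.max(scores)  # subtract the max for numerical stability
    exp = np.exp(shifted)
    return exp / exp.sum()

# Hypothetical raw scores a model might assign to three classes.
class_names = ["cat", "dog", "ship"]
scores = np.array([2.0, 1.0, 0.1])

probs = softmax(scores)                     # continuous values underneath
label = class_names[int(np.argmax(probs))]  # discrete class label on top
```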
![Page 4: Deep Learning for Computer Visionintrotodeeplearning.com/2017/lectures/6S191-Deep...Intro to Deep Learning Convolutional Neural Networks: Layers • INPUT [32x32x3] will hold the raw](https://reader035.vdocuments.net/reader035/viewer/2022071505/6125538e192bc138cf03b5f5/html5/thumbnails/4.jpg)
Human Vision Seems Easy. Why: Data
References: [6, 7, 11]
Hans Moravec (CMU) Rodney Brooks (MIT) Marvin Minsky (MIT)
“Encoded in the large, highly evolved sensory and motor portions of the human brain is a billion years of experience about the nature of the world and how to survive in it. … Abstract thought, though, is a new trick, perhaps less than 100 thousand years old. We have not yet mastered it. It is not all that intrinsically difficult; it just seems so when we do it.” - Hans Moravec, Mind Children (1988)
Visual perception: 540 million years of data
Bipedal movement: 230+ million years of data
Abstract thought: 100 thousand years of data
![Page 5: Deep Learning for Computer Visionintrotodeeplearning.com/2017/lectures/6S191-Deep...Intro to Deep Learning Convolutional Neural Networks: Layers • INPUT [32x32x3] will hold the raw](https://reader035.vdocuments.net/reader035/viewer/2022071505/6125538e192bc138cf03b5f5/html5/thumbnails/5.jpg)
Human Vision
Its structure is instructive and inspiring!
References: [118]
Thalamocortical System Simulation: 8 million cortical neurons + 2 billion synapses:
![Page 6: Deep Learning for Computer Visionintrotodeeplearning.com/2017/lectures/6S191-Deep...Intro to Deep Learning Convolutional Neural Networks: Layers • INPUT [32x32x3] will hold the raw](https://reader035.vdocuments.net/reader035/viewer/2022071505/6125538e192bc138cf03b5f5/html5/thumbnails/6.jpg)
Visual Cortex (Its Structure is Instructive and Inspiring)
Reference: https://www.youtube.com/watch?v=_33K1zTtoow
![Page 7: Deep Learning for Computer Visionintrotodeeplearning.com/2017/lectures/6S191-Deep...Intro to Deep Learning Convolutional Neural Networks: Layers • INPUT [32x32x3] will hold the raw](https://reader035.vdocuments.net/reader035/viewer/2022071505/6125538e192bc138cf03b5f5/html5/thumbnails/7.jpg)
Computer Vision is Hard
References: [66, 69, 89]
![Page 8: Deep Learning for Computer Visionintrotodeeplearning.com/2017/lectures/6S191-Deep...Intro to Deep Learning Convolutional Neural Networks: Layers • INPUT [32x32x3] will hold the raw](https://reader035.vdocuments.net/reader035/viewer/2022071505/6125538e192bc138cf03b5f5/html5/thumbnails/8.jpg)
Image Classification Pipeline
References: [81, 89]
![Page 9: Deep Learning for Computer Visionintrotodeeplearning.com/2017/lectures/6S191-Deep...Intro to Deep Learning Convolutional Neural Networks: Layers • INPUT [32x32x3] will hold the raw](https://reader035.vdocuments.net/reader035/viewer/2022071505/6125538e192bc138cf03b5f5/html5/thumbnails/9.jpg)
Famous Computer Vision Datasets
References: [90, 91, 92, 93]
• MNIST: handwritten digits
• ImageNet: WordNet hierarchy
• CIFAR-10(0): tiny images
• Places: natural scenes
![Page 10: Deep Learning for Computer Visionintrotodeeplearning.com/2017/lectures/6S191-Deep...Intro to Deep Learning Convolutional Neural Networks: Layers • INPUT [32x32x3] will hold the raw](https://reader035.vdocuments.net/reader035/viewer/2022071505/6125538e192bc138cf03b5f5/html5/thumbnails/10.jpg)
Let’s Build an Image Classifier for CIFAR-10
References: [89, 91]
![Page 11: Deep Learning for Computer Visionintrotodeeplearning.com/2017/lectures/6S191-Deep...Intro to Deep Learning Convolutional Neural Networks: Layers • INPUT [32x32x3] will hold the raw](https://reader035.vdocuments.net/reader035/viewer/2022071505/6125538e192bc138cf03b5f5/html5/thumbnails/11.jpg)
Let’s Build an Image Classifier for CIFAR-10
References: [89, 91]
Accuracy
• Random: 10%
• Our image-diff (with L1): 38.6%
• Our image-diff (with L2): 35.4%
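The image-diff classifier can be sketched roughly as follows, assuming it labels a test image with the label of the closest training image under an L1 or L2 pixel distance. The function name `image_diff_predict` and the toy 2x2 data are illustrative stand-ins, not the CIFAR-10 setup behind the numbers above.

```python
import numpy as np

def image_diff_predict(train_x, train_y, test_x, metric="l1"):
    """Label each test image with the label of its closest training image."""
    preds = []
    for x in test_x:
        # Cast to float so uint8 subtraction cannot wrap around.
        diff = train_x.astype(np.float64) - x.astype(np.float64)
        if metric == "l1":
            dists = np.abs(diff).sum(axis=1)          # sum of absolute differences
        elif metric == "l2":
            dists = np.sqrt((diff ** 2).sum(axis=1))  # Euclidean distance
        else:
            raise ValueError("metric must be 'l1' or 'l2'")
        preds.append(train_y[np.argmin(dists)])
    return np.array(preds)

# Toy stand-in data: flattened 2x2 "images", one dark (class 0), one bright (class 1).
train_x = np.array([[0, 0, 0, 0], [255, 255, 255, 255]], dtype=np.uint8)
train_y = np.array([0, 1])
test_x = np.array([[10, 0, 5, 0], [250, 240, 255, 255]], dtype=np.uint8)

pred_l1 = image_diff_predict(train_x, train_y, test_x, metric="l1")
pred_l2 = image_diff_predict(train_x, train_y, test_x, metric="l2")
```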
![Page 12: Deep Learning for Computer Visionintrotodeeplearning.com/2017/lectures/6S191-Deep...Intro to Deep Learning Convolutional Neural Networks: Layers • INPUT [32x32x3] will hold the raw](https://reader035.vdocuments.net/reader035/viewer/2022071505/6125538e192bc138cf03b5f5/html5/thumbnails/12.jpg)
K-Nearest Neighbors: Generalizing the Image-Diff Classifier
References: [89]
Tuning (hyper)parameters:
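One common way to tune k, sketched under the assumption of a held-out validation split and L1 distance: predict by majority vote among the k nearest training points, then keep the k with the best validation accuracy. The helper names and the synthetic two-cluster data are hypothetical.

```python
import numpy as np

def knn_predict(train_x, train_y, query_x, k):
    """Majority vote among the k nearest training points (L1 distance)."""
    preds = []
    for x in query_x:
        dists = np.abs(train_x - x).sum(axis=1)
        nearest = np.argsort(dists)[:k]          # indices of the k closest points
        votes = np.bincount(train_y[nearest])    # count labels among neighbors
        preds.append(np.argmax(votes))
    return np.array(preds)

def pick_k(train_x, train_y, val_x, val_y, candidates):
    """Return the candidate k with the highest validation accuracy."""
    best_k, best_acc = None, -1.0
    for k in candidates:
        acc = np.mean(knn_predict(train_x, train_y, val_x, k) == val_y)
        if acc > best_acc:
            best_k, best_acc = k, acc
    return best_k, best_acc

# Synthetic stand-in data: two well-separated clusters in 4 dimensions.
rng = np.random.default_rng(0)

def make_split(n):
    x = np.vstack([rng.normal(0.0, 0.5, (n, 4)), rng.normal(5.0, 0.5, (n, 4))])
    y = np.array([0] * n + [1] * n)
    return x, y

train_x, train_y = make_split(20)
val_x, val_y = make_split(5)
best_k, best_acc = pick_k(train_x, train_y, val_x, val_y, candidates=[1, 3, 5, 7])
```

The test set stays untouched during tuning; it is only used once, after k is fixed.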
![Page 13: Deep Learning for Computer Visionintrotodeeplearning.com/2017/lectures/6S191-Deep...Intro to Deep Learning Convolutional Neural Networks: Layers • INPUT [32x32x3] will hold the raw](https://reader035.vdocuments.net/reader035/viewer/2022071505/6125538e192bc138cf03b5f5/html5/thumbnails/13.jpg)
K-Nearest Neighbors: Generalizing the Image-Diff Classifier
References: [89, 94]
Accuracy
• Random: 10%
• Training and testing on the same data: 35.4%
• 7-Nearest Neighbors: ~30%
• Human: ~94%
• …
• Convolutional Neural Networks: ~95%
![Page 14: Deep Learning for Computer Visionintrotodeeplearning.com/2017/lectures/6S191-Deep...Intro to Deep Learning Convolutional Neural Networks: Layers • INPUT [32x32x3] will hold the raw](https://reader035.vdocuments.net/reader035/viewer/2022071505/6125538e192bc138cf03b5f5/html5/thumbnails/14.jpg)
Reminder: Weighing the Evidence
References: [78]
Evidence
Decisions
![Page 15: Deep Learning for Computer Visionintrotodeeplearning.com/2017/lectures/6S191-Deep...Intro to Deep Learning Convolutional Neural Networks: Layers • INPUT [32x32x3] will hold the raw](https://reader035.vdocuments.net/reader035/viewer/2022071505/6125538e192bc138cf03b5f5/html5/thumbnails/15.jpg)
Reminder: “Learning” is Optimization of a Function
References: [63, 80]
Ground truth for “6”:
“Loss” function:
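The formulas on this slide did not survive extraction. As a hedged illustration only, the sketch below assumes a one-hot ground-truth vector for the digit 6 and a quadratic loss; the loss actually shown on the slide may differ. The output vectors are made up.

```python
import numpy as np

def one_hot(digit, num_classes=10):
    """Ground truth as a vector: 1 at the true digit, 0 elsewhere."""
    target = np.zeros(num_classes)
    target[digit] = 1.0
    return target

def quadratic_loss(output, target):
    """Assumed loss: half the squared distance between output and target."""
    return 0.5 * np.sum((output - target) ** 2)

target = one_hot(6)

# Two hypothetical network outputs: one favoring "6", one favoring "3".
favors_six = np.full(10, 0.01)
favors_six[6] = 0.91
favors_three = np.full(10, 0.01)
favors_three[3] = 0.91

loss_right = quadratic_loss(favors_six, target)
loss_wrong = quadratic_loss(favors_three, target)
```

Learning then means adjusting the network's parameters so that this number goes down across the training set.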
![Page 16: Deep Learning for Computer Visionintrotodeeplearning.com/2017/lectures/6S191-Deep...Intro to Deep Learning Convolutional Neural Networks: Layers • INPUT [32x32x3] will hold the raw](https://reader035.vdocuments.net/reader035/viewer/2022071505/6125538e192bc138cf03b5f5/html5/thumbnails/16.jpg)
Classify an Image of a Number
References: [80]
Input: (28x28)
Network:
![Page 17: Deep Learning for Computer Visionintrotodeeplearning.com/2017/lectures/6S191-Deep...Intro to Deep Learning Convolutional Neural Networks: Layers • INPUT [32x32x3] will hold the raw](https://reader035.vdocuments.net/reader035/viewer/2022071505/6125538e192bc138cf03b5f5/html5/thumbnails/17.jpg)
Convolutional Neural Networks
References: [95]
Regular neural network (fully connected):
Convolutional neural network:
Each layer takes a 3D volume and produces a 3D volume using a differentiable function that may or may not have parameters.
![Page 18: Deep Learning for Computer Visionintrotodeeplearning.com/2017/lectures/6S191-Deep...Intro to Deep Learning Convolutional Neural Networks: Layers • INPUT [32x32x3] will hold the raw](https://reader035.vdocuments.net/reader035/viewer/2022071505/6125538e192bc138cf03b5f5/html5/thumbnails/18.jpg)
Convolutional Neural Networks: Layers
• INPUT [32x32x3] will hold the raw pixel values of the image, in this case an image of width 32, height 32, and with three color channels R,G,B.
• CONV layer will compute the output of neurons that are connected to local regions in the input, each computing a dot product between their weights and a small region they are connected to in the input volume. This may result in volume such as [32x32x12] if we decided to use 12 filters.
• RELU layer will apply an elementwise activation function, such as the max(0,x) thresholding at zero. This leaves the size of the volume unchanged ([32x32x12]).
• POOL layer will perform a downsampling operation along the spatial dimensions (width, height), resulting in volume such as [16x16x12].
• FC (i.e. fully-connected) layer will compute the class scores, resulting in a volume of size [1x1x10], where each of the 10 numbers corresponds to a class score, such as among the 10 categories of CIFAR-10. As with ordinary Neural Networks and as the name implies, each neuron in this layer will be connected to all the numbers in the previous volume.
References: [95]
Layers highlighted in blue have learnable parameters (CONV and FC); RELU and POOL apply fixed functions.
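The volume sizes in the list above can be checked with a naive NumPy forward pass. This is a sketch under assumptions the slide does not state: 3x3 filters, stride 1, zero-padding of 1 for CONV, and 2x2 max pooling with stride 2. The weights are random, so only the shapes are meaningful.

```python
import numpy as np

def conv_layer(x, filters, bias, pad=1):
    """Naive stride-1 CONV. x: (H, W, C); filters: (F, F, C, K); bias: (K,)."""
    f, _, _, k = filters.shape
    h, w, _ = x.shape
    xp = np.pad(x, ((pad, pad), (pad, pad), (0, 0)))  # zero-pad height and width
    out_h = h + 2 * pad - f + 1
    out_w = w + 2 * pad - f + 1
    out = np.zeros((out_h, out_w, k))
    for i in range(out_h):
        for j in range(out_w):
            patch = xp[i:i + f, j:j + f, :]  # local region of the input
            # Dot product between the patch and each of the K filters.
            out[i, j, :] = np.tensordot(patch, filters, axes=3) + bias
    return out

def relu(x):
    """Elementwise max(0, x); the volume size is unchanged."""
    return np.maximum(0, x)

def max_pool(x, size=2):
    """Downsample width and height by taking the max over size x size blocks."""
    h, w, c = x.shape
    x = x[:h - h % size, :w - w % size, :]
    return x.reshape(h // size, size, w // size, size, c).max(axis=(1, 3))

def fc_layer(x, weights, bias):
    """Every output is connected to every number in the input volume."""
    return (x.reshape(-1) @ weights + bias).reshape(1, 1, -1)

rng = np.random.default_rng(0)
image = rng.random((32, 32, 3))                        # INPUT [32x32x3]
filters = rng.standard_normal((3, 3, 3, 12)) * 0.1     # 12 filters (learnable)
fc_weights = rng.standard_normal((16 * 16 * 12, 10)) * 0.01  # learnable

conv_out = conv_layer(image, filters, np.zeros(12))    # [32x32x12]
relu_out = relu(conv_out)                              # [32x32x12]
pool_out = max_pool(relu_out)                          # [16x16x12]
scores = fc_layer(pool_out, fc_weights, np.zeros(10))  # [1x1x10]
```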
![Page 19: Deep Learning for Computer Visionintrotodeeplearning.com/2017/lectures/6S191-Deep...Intro to Deep Learning Convolutional Neural Networks: Layers • INPUT [32x32x3] will hold the raw](https://reader035.vdocuments.net/reader035/viewer/2022071505/6125538e192bc138cf03b5f5/html5/thumbnails/19.jpg)
Dealing with Images: Local Connectivity
Same neuron. Just more focused (narrow “receptive field”).
The parameters of each filter are spatially “shared” (if a feature is useful in one place, it’s useful elsewhere).
References: [95]
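To make the effect of local connectivity and sharing concrete, the sketch below counts parameters for mapping a 32x32x3 input to a 32x32x12 output in two ways. The 3x3 filter size is an assumption for illustration; the helper names are hypothetical.

```python
def fully_connected_params(in_h, in_w, in_c, out_h, out_w, out_c):
    """Every output neuron has its own weight per input value, plus a bias."""
    inputs = in_h * in_w * in_c
    outputs = out_h * out_w * out_c
    return inputs * outputs + outputs

def shared_conv_params(filter_size, in_c, num_filters):
    """Each filter has one small weight set, reused at every spatial position."""
    return (filter_size * filter_size * in_c + 1) * num_filters

dense = fully_connected_params(32, 32, 3, 32, 32, 12)
shared = shared_conv_params(3, 3, 12)
```

Here `dense` is 37,761,024 and `shared` is 336, which is why sharing makes image-sized inputs tractable.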
![Page 20: Deep Learning for Computer Visionintrotodeeplearning.com/2017/lectures/6S191-Deep...Intro to Deep Learning Convolutional Neural Networks: Layers • INPUT [32x32x3] will hold the raw](https://reader035.vdocuments.net/reader035/viewer/2022071505/6125538e192bc138cf03b5f5/html5/thumbnails/20.jpg)
ConvNets: Spatial Arrangement of Output Volume
• Depth: number of filters
• Stride: filter step size (when we “slide” it)
• Padding: zero-pad the input
References: [95]
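Stride and padding together determine the spatial size of the output through the standard relation (W - F + 2P) / S + 1, where W is the input size, F the filter size, P the padding, and S the stride. The formula is not printed on the slide; the helper below is an illustrative sketch of it.

```python
def conv_output_size(input_size, filter_size, stride=1, padding=0):
    """Spatial output size of a convolution along one dimension."""
    span = input_size - filter_size + 2 * padding
    if span < 0 or span % stride != 0:
        raise ValueError("filter, stride, and padding do not tile the input")
    return span // stride + 1

same_size = conv_output_size(32, 3, stride=1, padding=1)  # padding keeps 32
shrunk = conv_output_size(32, 5, stride=1, padding=0)     # no padding: 28
strided = conv_output_size(7, 3, stride=2, padding=0)     # stride 2: 3
```

The depth of the output volume is not given by this formula; it equals the number of filters.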
![Page 30: Deep Learning for Computer Visionintrotodeeplearning.com/2017/lectures/6S191-Deep...Intro to Deep Learning Convolutional Neural Networks: Layers • INPUT [32x32x3] will hold the raw](https://reader035.vdocuments.net/reader035/viewer/2022071505/6125538e192bc138cf03b5f5/html5/thumbnails/30.jpg)
Convolution
References: [124]
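As a rough single-channel illustration of the operation, the sketch below slides a hand-written vertical-edge filter over a toy image and records the response at each position. As in most ConvNet libraries, the kernel is not flipped, so strictly this is cross-correlation. The image and kernel values are made up.

```python
import numpy as np

def slide_filter(image, kernel):
    """Slide the kernel over the image (no padding, stride 1)."""
    kh, kw = kernel.shape
    out_h = image.shape[0] - kh + 1
    out_w = image.shape[1] - kw + 1
    out = np.zeros((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            # Elementwise product of the window and the kernel, summed up.
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out

# Toy image: dark left half, bright right half.
image = np.zeros((5, 6))
image[:, 3:] = 1.0

# Vertical-edge detector: responds where brightness rises left to right.
kernel = np.array([[-1.0, 0.0, 1.0]] * 3)

response = slide_filter(image, kernel)
```

The response is large only in the columns that straddle the edge and zero in the flat regions. In a ConvNet the kernel values are learned rather than written by hand.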
![Page 31: Deep Learning for Computer Visionintrotodeeplearning.com/2017/lectures/6S191-Deep...Intro to Deep Learning Convolutional Neural Networks: Layers • INPUT [32x32x3] will hold the raw](https://reader035.vdocuments.net/reader035/viewer/2022071505/6125538e192bc138cf03b5f5/html5/thumbnails/31.jpg)
Convolution
References: [124]
![Page 32: Deep Learning for Computer Visionintrotodeeplearning.com/2017/lectures/6S191-Deep...Intro to Deep Learning Convolutional Neural Networks: Layers • INPUT [32x32x3] will hold the raw](https://reader035.vdocuments.net/reader035/viewer/2022071505/6125538e192bc138cf03b5f5/html5/thumbnails/32.jpg)
Convolution: Representation Learning
References: [124]
![Page 33: Deep Learning for Computer Visionintrotodeeplearning.com/2017/lectures/6S191-Deep...Intro to Deep Learning Convolutional Neural Networks: Layers • INPUT [32x32x3] will hold the raw](https://reader035.vdocuments.net/reader035/viewer/2022071505/6125538e192bc138cf03b5f5/html5/thumbnails/33.jpg)
ConvNets: Pooling
References: [95]
![Page 34: Deep Learning for Computer Visionintrotodeeplearning.com/2017/lectures/6S191-Deep...Intro to Deep Learning Convolutional Neural Networks: Layers • INPUT [32x32x3] will hold the raw](https://reader035.vdocuments.net/reader035/viewer/2022071505/6125538e192bc138cf03b5f5/html5/thumbnails/34.jpg)
Same Architecture, Many Applications
This part might look different for:
• Different image classification domains
• Image captioning with recurrent neural networks
• Image object localization with bounding box
• Image segmentation with fully convolutional networks
• Image segmentation with deconvolution layers
![Page 35: Deep Learning for Computer Visionintrotodeeplearning.com/2017/lectures/6S191-Deep...Intro to Deep Learning Convolutional Neural Networks: Layers • INPUT [32x32x3] will hold the raw](https://reader035.vdocuments.net/reader035/viewer/2022071505/6125538e192bc138cf03b5f5/html5/thumbnails/35.jpg)
References: [4]
Object Recognition
Case Study: ImageNet
![Page 36: Deep Learning for Computer Visionintrotodeeplearning.com/2017/lectures/6S191-Deep...Intro to Deep Learning Convolutional Neural Networks: Layers • INPUT [32x32x3] will hold the raw](https://reader035.vdocuments.net/reader035/viewer/2022071505/6125538e192bc138cf03b5f5/html5/thumbnails/36.jpg)
What is ImageNet?
• ImageNet: dataset of 14+ million images (21,841 categories)
  • Links to images, not the images themselves
• Let’s take the high-level category of fruit as an example:
  • Total 188,000 images of fruit
  • There are 1206 Granny Smith apples:
References: [90]
![Page 37: Deep Learning for Computer Visionintrotodeeplearning.com/2017/lectures/6S191-Deep...Intro to Deep Learning Convolutional Neural Networks: Layers • INPUT [32x32x3] will hold the raw](https://reader035.vdocuments.net/reader035/viewer/2022071505/6125538e192bc138cf03b5f5/html5/thumbnails/37.jpg)
What is ImageNet?
• Dataset
  • ImageNet: dataset of 14+ million images
• Competition
  • ILSVRC: ImageNet Large Scale Visual Recognition Challenge
• Networks
  • AlexNet (2012)
  • ZFNet (2013)
  • VGGNet (2014)
  • GoogLeNet (2014)
  • ResNet (2015)
  • CUImage (2016)
References: [90]
![Page 38: Deep Learning for Computer Visionintrotodeeplearning.com/2017/lectures/6S191-Deep...Intro to Deep Learning Convolutional Neural Networks: Layers • INPUT [32x32x3] will hold the raw](https://reader035.vdocuments.net/reader035/viewer/2022071505/6125538e192bc138cf03b5f5/html5/thumbnails/38.jpg)
ILSVRC Challenge Evaluation for Classification
• Top 5 error rate:
  • You get 5 guesses to get the correct label
• ~20% reduction in accuracy for Top 1 vs Top 5
  • Example: In 2012 AlexNet achieved
• Human annotation is a binary task: “apple” or “not apple”
References: [123]
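The top-5 error rate can be computed as sketched below: a prediction counts as correct when the true label is among the five highest-scoring classes. The function name `top_k_error` and the tiny 6-class score matrix are illustrative only; ILSVRC classification uses 1000 classes.

```python
import numpy as np

def top_k_error(scores, labels, k=5):
    """Fraction of examples whose true label is not among the k top scores."""
    top_k = np.argsort(scores, axis=1)[:, -k:]         # indices of k best classes
    hits = np.any(top_k == labels[:, None], axis=1)    # true label among them?
    return 1.0 - hits.mean()

# Hypothetical scores for 3 examples over 6 classes.
scores = np.array([
    [0.05, 0.40, 0.20, 0.15, 0.12, 0.08],  # true class 1: best guess is right
    [0.30, 0.25, 0.20, 0.15, 0.07, 0.03],  # true class 2: right within 5 guesses
    [0.30, 0.25, 0.20, 0.15, 0.07, 0.03],  # true class 5: missed even with 5
])
labels = np.array([1, 2, 5])

top1 = top_k_error(scores, labels, k=1)
top5 = top_k_error(scores, labels, k=5)
```

On this toy data the top-1 error is 2/3 and the top-5 error is 1/3, showing how the extra guesses lower the reported error.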
![Page 39: Deep Learning for Computer Visionintrotodeeplearning.com/2017/lectures/6S191-Deep...Intro to Deep Learning Convolutional Neural Networks: Layers • INPUT [32x32x3] will hold the raw](https://reader035.vdocuments.net/reader035/viewer/2022071505/6125538e192bc138cf03b5f5/html5/thumbnails/39.jpg)
• AlexNet (2012): First CNN (15.4%)
  • 8 layers
  • 61 million parameters
• ZFNet (2013): 15.4% to 11.2%
  • 8 layers
  • More filters. Denser stride.
• VGGNet (2014): 11.2% to 7.3%
  • Beautifully uniform: 3x3 conv, stride 1, pad 1, 2x2 max pool
  • 16 layers
  • 138 million parameters
• GoogLeNet (2014): 11.2% to 6.7%
  • Inception modules
  • 22 layers
  • 5 million parameters (throw away fully connected layers)
• ResNet (2015): 6.7% to 3.57%
  • More layers = better performance
  • 152 layers
• CUImage (2016): 3.57% to 2.99%
  • Ensemble of 6 models
References: [90]
![Page 40: Deep Learning for Computer Visionintrotodeeplearning.com/2017/lectures/6S191-Deep...Intro to Deep Learning Convolutional Neural Networks: Layers • INPUT [32x32x3] will hold the raw](https://reader035.vdocuments.net/reader035/viewer/2022071505/6125538e192bc138cf03b5f5/html5/thumbnails/40.jpg)
References: [4]
Krizhevsky et al. "Imagenet classification with deep convolutional neural networks." Advances in neural information processing systems. 2012.
![Page 41: Deep Learning for Computer Visionintrotodeeplearning.com/2017/lectures/6S191-Deep...Intro to Deep Learning Convolutional Neural Networks: Layers • INPUT [32x32x3] will hold the raw](https://reader035.vdocuments.net/reader035/viewer/2022071505/6125538e192bc138cf03b5f5/html5/thumbnails/41.jpg)
References: [128]
Simonyan et al. "Very deep convolutional networks for large-scale image recognition." 2014.
![Page 42: Deep Learning for Computer Visionintrotodeeplearning.com/2017/lectures/6S191-Deep...Intro to Deep Learning Convolutional Neural Networks: Layers • INPUT [32x32x3] will hold the raw](https://reader035.vdocuments.net/reader035/viewer/2022071505/6125538e192bc138cf03b5f5/html5/thumbnails/42.jpg)
References: [129]
Szegedy et al. "Going deeper with convolutions." Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2015.
![Page 43: Deep Learning for Computer Visionintrotodeeplearning.com/2017/lectures/6S191-Deep...Intro to Deep Learning Convolutional Neural Networks: Layers • INPUT [32x32x3] will hold the raw](https://reader035.vdocuments.net/reader035/viewer/2022071505/6125538e192bc138cf03b5f5/html5/thumbnails/43.jpg)
References: [130]
He, Kaiming, et al. "Deep residual learning for image recognition." Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2016.
![Page 44: Deep Learning for Computer Visionintrotodeeplearning.com/2017/lectures/6S191-Deep...Intro to Deep Learning Convolutional Neural Networks: Layers • INPUT [32x32x3] will hold the raw](https://reader035.vdocuments.net/reader035/viewer/2022071505/6125538e192bc138cf03b5f5/html5/thumbnails/44.jpg)
Same Architecture, Many Applications
This part might look different for:
• Different image classification domains
• Image captioning with recurrent neural networks
• Image object localization with bounding box
• Image segmentation with fully convolutional networks
• Image segmentation with deconvolution layers
![Page 45: Deep Learning for Computer Visionintrotodeeplearning.com/2017/lectures/6S191-Deep...Intro to Deep Learning Convolutional Neural Networks: Layers • INPUT [32x32x3] will hold the raw](https://reader035.vdocuments.net/reader035/viewer/2022071505/6125538e192bc138cf03b5f5/html5/thumbnails/45.jpg)
Segmentation
Panels: Original, Ground Truth, FCN-8
References: [96]
![Page 46: Deep Learning for Computer Visionintrotodeeplearning.com/2017/lectures/6S191-Deep...Intro to Deep Learning Convolutional Neural Networks: Layers • INPUT [32x32x3] will hold the raw](https://reader035.vdocuments.net/reader035/viewer/2022071505/6125538e192bc138cf03b5f5/html5/thumbnails/46.jpg)
Object Detection
References: [97]
![Page 47: Deep Learning for Computer Visionintrotodeeplearning.com/2017/lectures/6S191-Deep...Intro to Deep Learning Convolutional Neural Networks: Layers • INPUT [32x32x3] will hold the raw](https://reader035.vdocuments.net/reader035/viewer/2022071505/6125538e192bc138cf03b5f5/html5/thumbnails/47.jpg)
Applications: Image Caption Generation
References: [43 – Fang et al. 2015]
![Page 48: Deep Learning for Computer Visionintrotodeeplearning.com/2017/lectures/6S191-Deep...Intro to Deep Learning Convolutional Neural Networks: Layers • INPUT [32x32x3] will hold the raw](https://reader035.vdocuments.net/reader035/viewer/2022071505/6125538e192bc138cf03b5f5/html5/thumbnails/48.jpg)
Applications: Image Question Answering
References: [40]
Ren et al. "Exploring models and data for image question answering." 2015.
Code: https://github.com/renmengye/imageqa-public
![Page 49: Deep Learning for Computer Visionintrotodeeplearning.com/2017/lectures/6S191-Deep...Intro to Deep Learning Convolutional Neural Networks: Layers • INPUT [32x32x3] will hold the raw](https://reader035.vdocuments.net/reader035/viewer/2022071505/6125538e192bc138cf03b5f5/html5/thumbnails/49.jpg)
Applications: Video Description Generation
References: [41, 42]
Venugopalan et al. "Sequence to sequence - video to text." 2015.
Code: https://vsubhashini.github.io/s2vt.html
![Page 50: Deep Learning for Computer Visionintrotodeeplearning.com/2017/lectures/6S191-Deep...Intro to Deep Learning Convolutional Neural Networks: Layers • INPUT [32x32x3] will hold the raw](https://reader035.vdocuments.net/reader035/viewer/2022071505/6125538e192bc138cf03b5f5/html5/thumbnails/50.jpg)
Applications: Modeling Attention Steering
References: [35, 36]
Jimmy Ba, Volodymyr Mnih, and Koray Kavukcuoglu. "Multiple object recognition with visual attention." (2014).
![Page 51: Deep Learning for Computer Visionintrotodeeplearning.com/2017/lectures/6S191-Deep...Intro to Deep Learning Convolutional Neural Networks: Layers • INPUT [32x32x3] will hold the raw](https://reader035.vdocuments.net/reader035/viewer/2022071505/6125538e192bc138cf03b5f5/html5/thumbnails/51.jpg)
Application: Audio Classification
![Page 52: Deep Learning for Computer Visionintrotodeeplearning.com/2017/lectures/6S191-Deep...Intro to Deep Learning Convolutional Neural Networks: Layers • INPUT [32x32x3] will hold the raw](https://reader035.vdocuments.net/reader035/viewer/2022071505/6125538e192bc138cf03b5f5/html5/thumbnails/52.jpg)
Driving Scene Segmentation
References: [127]
![Page 53: Deep Learning for Computer Visionintrotodeeplearning.com/2017/lectures/6S191-Deep...Intro to Deep Learning Convolutional Neural Networks: Layers • INPUT [32x32x3] will hold the raw](https://reader035.vdocuments.net/reader035/viewer/2022071505/6125538e192bc138cf03b5f5/html5/thumbnails/53.jpg)
References: http://cars.mit.edu/deeptesla
End-to-End Learning of the Driving Task
![Page 54: Deep Learning for Computer Visionintrotodeeplearning.com/2017/lectures/6S191-Deep...Intro to Deep Learning Convolutional Neural Networks: Layers • INPUT [32x32x3] will hold the raw](https://reader035.vdocuments.net/reader035/viewer/2022071505/6125538e192bc138cf03b5f5/html5/thumbnails/54.jpg)
Computer Vision for Intelligent Systems
References: [120]
![Page 55: Deep Learning for Computer Visionintrotodeeplearning.com/2017/lectures/6S191-Deep...Intro to Deep Learning Convolutional Neural Networks: Layers • INPUT [32x32x3] will hold the raw](https://reader035.vdocuments.net/reader035/viewer/2022071505/6125538e192bc138cf03b5f5/html5/thumbnails/55.jpg)
Open Problem: Robustness
>99.6% Confidence in the Wrong Answer
References: [67]
Nguyen et al. "Deep neural networks are easily fooled: High confidence predictions for unrecognizable images." 2015.
![Page 56: Deep Learning for Computer Visionintrotodeeplearning.com/2017/lectures/6S191-Deep...Intro to Deep Learning Convolutional Neural Networks: Layers • INPUT [32x32x3] will hold the raw](https://reader035.vdocuments.net/reader035/viewer/2022071505/6125538e192bc138cf03b5f5/html5/thumbnails/56.jpg)
Open Problem: Robustness
Fooled by a Little Distortion
References: [68]
Szegedy et al. "Intriguing properties of neural networks." 2013.
![Page 57: Deep Learning for Computer Visionintrotodeeplearning.com/2017/lectures/6S191-Deep...Intro to Deep Learning Convolutional Neural Networks: Layers • INPUT [32x32x3] will hold the raw](https://reader035.vdocuments.net/reader035/viewer/2022071505/6125538e192bc138cf03b5f5/html5/thumbnails/57.jpg)
Object Category Recognition
![Page 58: Deep Learning for Computer Visionintrotodeeplearning.com/2017/lectures/6S191-Deep...Intro to Deep Learning Convolutional Neural Networks: Layers • INPUT [32x32x3] will hold the raw](https://reader035.vdocuments.net/reader035/viewer/2022071505/6125538e192bc138cf03b5f5/html5/thumbnails/58.jpg)
Object Category Recognition
![Page 59: Deep Learning for Computer Visionintrotodeeplearning.com/2017/lectures/6S191-Deep...Intro to Deep Learning Convolutional Neural Networks: Layers • INPUT [32x32x3] will hold the raw](https://reader035.vdocuments.net/reader035/viewer/2022071505/6125538e192bc138cf03b5f5/html5/thumbnails/59.jpg)
Object Category Recognition
![Page 60: Deep Learning for Computer Visionintrotodeeplearning.com/2017/lectures/6S191-Deep...Intro to Deep Learning Convolutional Neural Networks: Layers • INPUT [32x32x3] will hold the raw](https://reader035.vdocuments.net/reader035/viewer/2022071505/6125538e192bc138cf03b5f5/html5/thumbnails/60.jpg)
Object Category Recognition
![Page 61: Deep Learning for Computer Visionintrotodeeplearning.com/2017/lectures/6S191-Deep...Intro to Deep Learning Convolutional Neural Networks: Layers • INPUT [32x32x3] will hold the raw](https://reader035.vdocuments.net/reader035/viewer/2022071505/6125538e192bc138cf03b5f5/html5/thumbnails/61.jpg)
Object Category Recognition
References: [121]
![Page 62: Deep Learning for Computer Visionintrotodeeplearning.com/2017/lectures/6S191-Deep...Intro to Deep Learning Convolutional Neural Networks: Layers • INPUT [32x32x3] will hold the raw](https://reader035.vdocuments.net/reader035/viewer/2022071505/6125538e192bc138cf03b5f5/html5/thumbnails/62.jpg)
References
All references cited in this presentation are listed in the following Google Sheets file:
https://goo.gl/9Xhp2t