![Page 1: Andrej Karpathy - 텐서 플로우 블로그 (Tensor · ConvNets are everywhere… Whale recognition, Kaggle Challenge Satellite image analysis Mnih and Hinton, 2010 Galaxy Challenge](https://reader033.vdocuments.net/reader033/viewer/2022050108/5f46325ee93c5d62ff41fb3c/html5/thumbnails/1.jpg)
Andrej Karpathy
Bay Area Deep Learning School, 2016
![Page 2: Andrej Karpathy - 텐서 플로우 블로그 (Tensor · ConvNets are everywhere… Whale recognition, Kaggle Challenge Satellite image analysis Mnih and Hinton, 2010 Galaxy Challenge](https://reader033.vdocuments.net/reader033/viewer/2022050108/5f46325ee93c5d62ff41fb3c/html5/thumbnails/2.jpg)
So far...
![Page 3: Andrej Karpathy - 텐서 플로우 블로그 (Tensor · ConvNets are everywhere… Whale recognition, Kaggle Challenge Satellite image analysis Mnih and Hinton, 2010 Galaxy Challenge](https://reader033.vdocuments.net/reader033/viewer/2022050108/5f46325ee93c5d62ff41fb3c/html5/thumbnails/3.jpg)
So far...
Some input vector (very few assumptions made).
![Page 4: Andrej Karpathy - 텐서 플로우 블로그 (Tensor · ConvNets are everywhere… Whale recognition, Kaggle Challenge Satellite image analysis Mnih and Hinton, 2010 Galaxy Challenge](https://reader033.vdocuments.net/reader033/viewer/2022050108/5f46325ee93c5d62ff41fb3c/html5/thumbnails/4.jpg)
In many real-world applications input vectors have structure.
Spectrograms
Images
Text
![Page 5: Andrej Karpathy - 텐서 플로우 블로그 (Tensor · ConvNets are everywhere… Whale recognition, Kaggle Challenge Satellite image analysis Mnih and Hinton, 2010 Galaxy Challenge](https://reader033.vdocuments.net/reader033/viewer/2022050108/5f46325ee93c5d62ff41fb3c/html5/thumbnails/5.jpg)
Convolutional Neural Networks: A pinch of history
![Page 6: Andrej Karpathy - 텐서 플로우 블로그 (Tensor · ConvNets are everywhere… Whale recognition, Kaggle Challenge Satellite image analysis Mnih and Hinton, 2010 Galaxy Challenge](https://reader033.vdocuments.net/reader033/viewer/2022050108/5f46325ee93c5d62ff41fb3c/html5/thumbnails/6.jpg)
Hubel & Wiesel
1959: RECEPTIVE FIELDS OF SINGLE NEURONES IN THE CAT'S STRIATE CORTEX
1962: RECEPTIVE FIELDS, BINOCULAR INTERACTION AND FUNCTIONAL ARCHITECTURE IN THE CAT'S VISUAL CORTEX
1968: ...
![Page 7: Andrej Karpathy - 텐서 플로우 블로그 (Tensor · ConvNets are everywhere… Whale recognition, Kaggle Challenge Satellite image analysis Mnih and Hinton, 2010 Galaxy Challenge](https://reader033.vdocuments.net/reader033/viewer/2022050108/5f46325ee93c5d62ff41fb3c/html5/thumbnails/7.jpg)
A bit of history:
Neocognitron [Fukushima 1980]
“sandwich” architecture (SCSCSC…)
- simple cells: modifiable parameters
- complex cells: perform pooling
![Page 8: Andrej Karpathy - 텐서 플로우 블로그 (Tensor · ConvNets are everywhere… Whale recognition, Kaggle Challenge Satellite image analysis Mnih and Hinton, 2010 Galaxy Challenge](https://reader033.vdocuments.net/reader033/viewer/2022050108/5f46325ee93c5d62ff41fb3c/html5/thumbnails/8.jpg)
Gradient-based learning applied to document recognition [LeCun, Bottou, Bengio, Haffner 1998]
LeNet-5
![Page 9: Andrej Karpathy - 텐서 플로우 블로그 (Tensor · ConvNets are everywhere… Whale recognition, Kaggle Challenge Satellite image analysis Mnih and Hinton, 2010 Galaxy Challenge](https://reader033.vdocuments.net/reader033/viewer/2022050108/5f46325ee93c5d62ff41fb3c/html5/thumbnails/9.jpg)
car 99%
Computer Vision 2011
![Page 10: Andrej Karpathy - 텐서 플로우 블로그 (Tensor · ConvNets are everywhere… Whale recognition, Kaggle Challenge Satellite image analysis Mnih and Hinton, 2010 Galaxy Challenge](https://reader033.vdocuments.net/reader033/viewer/2022050108/5f46325ee93c5d62ff41fb3c/html5/thumbnails/10.jpg)
Computer Vision 2011
Page 1
![Page 11: Andrej Karpathy - 텐서 플로우 블로그 (Tensor · ConvNets are everywhere… Whale recognition, Kaggle Challenge Satellite image analysis Mnih and Hinton, 2010 Galaxy Challenge](https://reader033.vdocuments.net/reader033/viewer/2022050108/5f46325ee93c5d62ff41fb3c/html5/thumbnails/11.jpg)
Computer Vision 2011
Page 2
![Page 12: Andrej Karpathy - 텐서 플로우 블로그 (Tensor · ConvNets are everywhere… Whale recognition, Kaggle Challenge Satellite image analysis Mnih and Hinton, 2010 Galaxy Challenge](https://reader033.vdocuments.net/reader033/viewer/2022050108/5f46325ee93c5d62ff41fb3c/html5/thumbnails/12.jpg)
Computer Vision 2011
Page 3
+ code complexity :(
![Page 13: Andrej Karpathy - 텐서 플로우 블로그 (Tensor · ConvNets are everywhere… Whale recognition, Kaggle Challenge Satellite image analysis Mnih and Hinton, 2010 Galaxy Challenge](https://reader033.vdocuments.net/reader033/viewer/2022050108/5f46325ee93c5d62ff41fb3c/html5/thumbnails/13.jpg)
ImageNet Classification with Deep Convolutional Neural Networks [Krizhevsky, Sutskever, Hinton, 2012]
“AlexNet”
Deng et al.
Russakovsky et al.
NVIDIA et al.
![Page 14: Andrej Karpathy - 텐서 플로우 블로그 (Tensor · ConvNets are everywhere… Whale recognition, Kaggle Challenge Satellite image analysis Mnih and Hinton, 2010 Galaxy Challenge](https://reader033.vdocuments.net/reader033/viewer/2022050108/5f46325ee93c5d62ff41fb3c/html5/thumbnails/14.jpg)
(slide from Kaiming He’s recent presentation)
![Page 15: Andrej Karpathy - 텐서 플로우 블로그 (Tensor · ConvNets are everywhere… Whale recognition, Kaggle Challenge Satellite image analysis Mnih and Hinton, 2010 Galaxy Challenge](https://reader033.vdocuments.net/reader033/viewer/2022050108/5f46325ee93c5d62ff41fb3c/html5/thumbnails/15.jpg)
“What I learned from competing against a ConvNet on ImageNet” (karpathy.github.io)
![Page 16: Andrej Karpathy - 텐서 플로우 블로그 (Tensor · ConvNets are everywhere… Whale recognition, Kaggle Challenge Satellite image analysis Mnih and Hinton, 2010 Galaxy Challenge](https://reader033.vdocuments.net/reader033/viewer/2022050108/5f46325ee93c5d62ff41fb3c/html5/thumbnails/16.jpg)
“What I learned from competing against a ConvNet on ImageNet” (karpathy.github.io)
TLDR: Human top-5 error is somewhere around 2-5% (depending on how much training or how little life you have).
![Page 17: Andrej Karpathy - 텐서 플로우 블로그 (Tensor · ConvNets are everywhere… Whale recognition, Kaggle Challenge Satellite image analysis Mnih and Hinton, 2010 Galaxy Challenge](https://reader033.vdocuments.net/reader033/viewer/2022050108/5f46325ee93c5d62ff41fb3c/html5/thumbnails/17.jpg)
[224x224x3] → Feature Extraction → vector describing various image statistics → f → 1000 numbers, indicating class scores (training)
[224x224x3] → f → 1000 numbers, indicating class scores (training)
![Page 18: Andrej Karpathy - 텐서 플로우 블로그 (Tensor · ConvNets are everywhere… Whale recognition, Kaggle Challenge Satellite image analysis Mnih and Hinton, 2010 Galaxy Challenge](https://reader033.vdocuments.net/reader033/viewer/2022050108/5f46325ee93c5d62ff41fb3c/html5/thumbnails/18.jpg)
“Run the image through 20 layers of 3x3 convolutions and train the filters with SGD.”*
* to the first order
![Page 19: Andrej Karpathy - 텐서 플로우 블로그 (Tensor · ConvNets are everywhere… Whale recognition, Kaggle Challenge Satellite image analysis Mnih and Hinton, 2010 Galaxy Challenge](https://reader033.vdocuments.net/reader033/viewer/2022050108/5f46325ee93c5d62ff41fb3c/html5/thumbnails/19.jpg)
Transfer Learning
1. Train on ImageNet
2. Small dataset: feature extractor. Freeze the pretrained layers, train the classifier on top.
3. Medium dataset: finetuning. Freeze the lower layers, train the upper ones. More data = retrain more of the network (or all of it).
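The "freeze these, train this" recipe can be sketched without any deep learning framework. In the illustration below, a fixed random projection stands in for the pretrained layers (a hypothetical placeholder, not a real ImageNet model), the dataset is synthetic, and only a logistic-regression head on top is trained.

```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-in for pretrained layers: a fixed random projection followed by ReLU.
# (Hypothetical; a real feature extractor would be a ConvNet trained on ImageNet.)
W_frozen = rng.standard_normal((20, 64))
W_before = W_frozen.copy()

def features(x):
    return np.maximum(0.0, x @ W_frozen)      # frozen: never updated below

# Small labeled dataset for the new task (synthetic, two classes).
X = rng.standard_normal((100, 20))
y = (X[:, 0] > 0).astype(float)

# Trainable head: logistic regression on top of the frozen features.
w_head = np.zeros(64)
b_head = 0.0
F = features(X)

for _ in range(200):
    p = 1.0 / (1.0 + np.exp(-(F @ w_head + b_head)))   # predicted probabilities
    grad_w = F.T @ (p - y) / len(y)                    # gradient of the mean log loss
    grad_b = np.mean(p - y)
    w_head -= 0.01 * grad_w                            # only the head is updated
    b_head -= 0.01 * grad_b

accuracy = np.mean((F @ w_head + b_head > 0) == (y > 0.5))
print(accuracy)
```

Finetuning would differ only in also updating some or all of the pretrained weights, typically with a small learning rate.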
![Page 20: Andrej Karpathy - 텐서 플로우 블로그 (Tensor · ConvNets are everywhere… Whale recognition, Kaggle Challenge Satellite image analysis Mnih and Hinton, 2010 Galaxy Challenge](https://reader033.vdocuments.net/reader033/viewer/2022050108/5f46325ee93c5d62ff41fb3c/html5/thumbnails/20.jpg)
Transfer Learning
CNN Features off-the-shelf: an Astounding Baseline for Recognition [Razavian et al., 2014]
DeCAF: A Deep Convolutional Activation Feature for Generic Visual Recognition [Donahue*, Jia*, et al., 2013]
![Page 21: Andrej Karpathy - 텐서 플로우 블로그 (Tensor · ConvNets are everywhere… Whale recognition, Kaggle Challenge Satellite image analysis Mnih and Hinton, 2010 Galaxy Challenge](https://reader033.vdocuments.net/reader033/viewer/2022050108/5f46325ee93c5d62ff41fb3c/html5/thumbnails/21.jpg)
e.g. with keras.io
The power is easily accessible.
![Page 22: Andrej Karpathy - 텐서 플로우 블로그 (Tensor · ConvNets are everywhere… Whale recognition, Kaggle Challenge Satellite image analysis Mnih and Hinton, 2010 Galaxy Challenge](https://reader033.vdocuments.net/reader033/viewer/2022050108/5f46325ee93c5d62ff41fb3c/html5/thumbnails/22.jpg)
ConvNets are everywhere…
e.g. Google Photos search
Face Verification, Taigman et al. 2014 (FAIR)
Self-driving cars
[Goodfellow et al. 2014]
Ciresan et al. 2013
Turaga et al 2010
![Page 23: Andrej Karpathy - 텐서 플로우 블로그 (Tensor · ConvNets are everywhere… Whale recognition, Kaggle Challenge Satellite image analysis Mnih and Hinton, 2010 Galaxy Challenge](https://reader033.vdocuments.net/reader033/viewer/2022050108/5f46325ee93c5d62ff41fb3c/html5/thumbnails/23.jpg)
ConvNets are everywhere…
Whale recognition, Kaggle Challenge
Satellite image analysis, Mnih and Hinton, 2010
Galaxy Challenge, Dieleman et al. 2015
WaveNet, van den Oord et al. 2016
Image captioning, Vinyals et al. 2015
![Page 24: Andrej Karpathy - 텐서 플로우 블로그 (Tensor · ConvNets are everywhere… Whale recognition, Kaggle Challenge Satellite image analysis Mnih and Hinton, 2010 Galaxy Challenge](https://reader033.vdocuments.net/reader033/viewer/2022050108/5f46325ee93c5d62ff41fb3c/html5/thumbnails/24.jpg)
ATARI game playing, Mnih 2013
ConvNets are everywhere…
AlphaGo, Silver et al 2016
VizDoom StarCraft
….
![Page 25: Andrej Karpathy - 텐서 플로우 블로그 (Tensor · ConvNets are everywhere… Whale recognition, Kaggle Challenge Satellite image analysis Mnih and Hinton, 2010 Galaxy Challenge](https://reader033.vdocuments.net/reader033/viewer/2022050108/5f46325ee93c5d62ff41fb3c/html5/thumbnails/25.jpg)
ConvNets are everywhere…
DeepDream reddit.com/r/deepdream
NeuralStyle, Gatys et al. 2015
deepart.io, Prisma, etc.
![Page 26: Andrej Karpathy - 텐서 플로우 블로그 (Tensor · ConvNets are everywhere… Whale recognition, Kaggle Challenge Satellite image analysis Mnih and Hinton, 2010 Galaxy Challenge](https://reader033.vdocuments.net/reader033/viewer/2022050108/5f46325ee93c5d62ff41fb3c/html5/thumbnails/26.jpg)
Deep Neural Networks Rival the Representation of Primate IT Cortex for Core Visual Object Recognition [Cadieu et al., 2014]
ConvNets ←→ Visual Cortex
![Page 27: Andrej Karpathy - 텐서 플로우 블로그 (Tensor · ConvNets are everywhere… Whale recognition, Kaggle Challenge Satellite image analysis Mnih and Hinton, 2010 Galaxy Challenge](https://reader033.vdocuments.net/reader033/viewer/2022050108/5f46325ee93c5d62ff41fb3c/html5/thumbnails/27.jpg)
Convolutional Neural Networks
</history></context>
<explanation>
![Page 28: Andrej Karpathy - 텐서 플로우 블로그 (Tensor · ConvNets are everywhere… Whale recognition, Kaggle Challenge Satellite image analysis Mnih and Hinton, 2010 Galaxy Challenge](https://reader033.vdocuments.net/reader033/viewer/2022050108/5f46325ee93c5d62ff41fb3c/html5/thumbnails/28.jpg)
[224x224x3] → f → 1000 numbers, indicating class scores (training)
Only two basic operations are involved throughout:
1. Dot products w^T x
2. Max operations max(.)
![Page 29: Andrej Karpathy - 텐서 플로우 블로그 (Tensor · ConvNets are everywhere… Whale recognition, Kaggle Challenge Satellite image analysis Mnih and Hinton, 2010 Galaxy Challenge](https://reader033.vdocuments.net/reader033/viewer/2022050108/5f46325ee93c5d62ff41fb3c/html5/thumbnails/29.jpg)
[224x224x3] → f → 1000 numbers, indicating class scores (training)
Only two basic operations are involved throughout:
1. Dot products w^T x
2. Max operations max(.)
parameters (~10M of them)
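A minimal sketch of the two operations combined into a single unit: a dot product followed by max(0, .). The weights, inputs and the optional bias argument are illustrative values chosen here, not taken from the slides.

```python
import numpy as np

def relu_neuron(w, x, b=0.0):
    """One unit: a dot product w^T x (plus bias) followed by max(0, .)."""
    return max(0.0, float(np.dot(w, x) + b))

w = np.array([1.0, -2.0, 0.5])
x = np.array([2.0, 1.0, 4.0])
print(relu_neuron(w, x))     # 1*2 - 2*1 + 0.5*4 = 2.0
print(relu_neuron(-w, x))    # negative pre-activation is clipped to 0.0
```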
![Page 30: Andrej Karpathy - 텐서 플로우 블로그 (Tensor · ConvNets are everywhere… Whale recognition, Kaggle Challenge Satellite image analysis Mnih and Hinton, 2010 Galaxy Challenge](https://reader033.vdocuments.net/reader033/viewer/2022050108/5f46325ee93c5d62ff41fb3c/html5/thumbnails/30.jpg)
preview:
e.g. 200K numbers e.g. 10 numbers
![Page 31: Andrej Karpathy - 텐서 플로우 블로그 (Tensor · ConvNets are everywhere… Whale recognition, Kaggle Challenge Satellite image analysis Mnih and Hinton, 2010 Galaxy Challenge](https://reader033.vdocuments.net/reader033/viewer/2022050108/5f46325ee93c5d62ff41fb3c/html5/thumbnails/31.jpg)
Convolution Layer
32x32x3 image (width 32, height 32, depth 3)
![Page 32: Andrej Karpathy - 텐서 플로우 블로그 (Tensor · ConvNets are everywhere… Whale recognition, Kaggle Challenge Satellite image analysis Mnih and Hinton, 2010 Galaxy Challenge](https://reader033.vdocuments.net/reader033/viewer/2022050108/5f46325ee93c5d62ff41fb3c/html5/thumbnails/32.jpg)
Convolution Layer
32x32x3 image, 5x5x3 filter
Convolve the filter with the image, i.e. “slide over the image spatially, computing dot products”
![Page 33: Andrej Karpathy - 텐서 플로우 블로그 (Tensor · ConvNets are everywhere… Whale recognition, Kaggle Challenge Satellite image analysis Mnih and Hinton, 2010 Galaxy Challenge](https://reader033.vdocuments.net/reader033/viewer/2022050108/5f46325ee93c5d62ff41fb3c/html5/thumbnails/33.jpg)
Convolution Layer
32x32x3 image, 5x5x3 filter
Convolve the filter with the image, i.e. “slide over the image spatially, computing dot products”
Filters always extend the full depth of the input volume
![Page 34: Andrej Karpathy - 텐서 플로우 블로그 (Tensor · ConvNets are everywhere… Whale recognition, Kaggle Challenge Satellite image analysis Mnih and Hinton, 2010 Galaxy Challenge](https://reader033.vdocuments.net/reader033/viewer/2022050108/5f46325ee93c5d62ff41fb3c/html5/thumbnails/34.jpg)
Convolution Layer
32x32x3 image, 5x5x3 filter
1 number: the result of taking a dot product between the filter and a small 5x5x3 chunk of the image (i.e. 5*5*3 = 75-dimensional dot product + bias)
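As a sketch of that single number, the snippet below takes the top-left 5x5x3 chunk of a random image and a random filter (both made up for illustration, as is the bias value) and checks that "elementwise multiply and sum" equals a 75-dimensional dot product.

```python
import numpy as np

rng = np.random.default_rng(0)
image = rng.standard_normal((32, 32, 3))   # height x width x depth
filt = rng.standard_normal((5, 5, 3))      # filter spans the full depth
bias = 0.1                                 # illustrative value

chunk = image[0:5, 0:5, :]                 # top-left 5x5x3 chunk
value = np.sum(chunk * filt) + bias        # elementwise multiply, sum, add bias

# The same number written as a 75-dimensional dot product.
same = np.dot(chunk.ravel(), filt.ravel()) + bias
print(chunk.size, np.isclose(value, same))
```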
![Page 35: Andrej Karpathy - 텐서 플로우 블로그 (Tensor · ConvNets are everywhere… Whale recognition, Kaggle Challenge Satellite image analysis Mnih and Hinton, 2010 Galaxy Challenge](https://reader033.vdocuments.net/reader033/viewer/2022050108/5f46325ee93c5d62ff41fb3c/html5/thumbnails/35.jpg)
Convolution Layer
32x32x3 image, 5x5x3 filter
Convolve (slide) over all spatial locations to get a 28x28x1 activation map
![Page 36: Andrej Karpathy - 텐서 플로우 블로그 (Tensor · ConvNets are everywhere… Whale recognition, Kaggle Challenge Satellite image analysis Mnih and Hinton, 2010 Galaxy Challenge](https://reader033.vdocuments.net/reader033/viewer/2022050108/5f46325ee93c5d62ff41fb3c/html5/thumbnails/36.jpg)
Convolution Layer
32x32x3 image, 5x5x3 filter
Consider a second, green filter: convolve (slide) over all spatial locations to get a second 28x28x1 activation map
![Page 37: Andrej Karpathy - 텐서 플로우 블로그 (Tensor · ConvNets are everywhere… Whale recognition, Kaggle Challenge Satellite image analysis Mnih and Hinton, 2010 Galaxy Challenge](https://reader033.vdocuments.net/reader033/viewer/2022050108/5f46325ee93c5d62ff41fb3c/html5/thumbnails/37.jpg)
Convolution Layer
For example, if we had six 5x5 filters, we’d get six separate activation maps, each 28x28.
We stack these up to get a “new image” of size 28x28x6!
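A naive sketch of the whole layer, assuming stride 1 and no padding as in the slides. The loops are written for clarity rather than speed, and the array layout (height, width, depth) is a choice made here.

```python
import numpy as np

def conv_forward(image, filters, biases):
    """Naive convolution layer, stride 1, no padding.

    image:   (H, W, D)
    filters: (K, F, F, D), each filter spans the full input depth
    biases:  (K,)
    returns: (H - F + 1, W - F + 1, K), one activation map per filter
    """
    H, W, D = image.shape
    K, F, _, _ = filters.shape
    out = np.zeros((H - F + 1, W - F + 1, K))
    for k in range(K):
        for i in range(H - F + 1):
            for j in range(W - F + 1):
                chunk = image[i:i + F, j:j + F, :]
                out[i, j, k] = np.sum(chunk * filters[k]) + biases[k]
    return out

rng = np.random.default_rng(0)
image = rng.standard_normal((32, 32, 3))
filters = rng.standard_normal((6, 5, 5, 3))
activations = conv_forward(image, filters, np.zeros(6))
print(activations.shape)   # (28, 28, 6)
```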
![Page 38: Andrej Karpathy - 텐서 플로우 블로그 (Tensor · ConvNets are everywhere… Whale recognition, Kaggle Challenge Satellite image analysis Mnih and Hinton, 2010 Galaxy Challenge](https://reader033.vdocuments.net/reader033/viewer/2022050108/5f46325ee93c5d62ff41fb3c/html5/thumbnails/38.jpg)
Convolution Layer
We processed a [32x32x3] volume into a [28x28x6] volume.
Q: How many parameters would this be if we used a fully connected layer instead?
![Page 39: Andrej Karpathy - 텐서 플로우 블로그 (Tensor · ConvNets are everywhere… Whale recognition, Kaggle Challenge Satellite image analysis Mnih and Hinton, 2010 Galaxy Challenge](https://reader033.vdocuments.net/reader033/viewer/2022050108/5f46325ee93c5d62ff41fb3c/html5/thumbnails/39.jpg)
Convolution Layer
We processed a [32x32x3] volume into a [28x28x6] volume.
Q: How many parameters would this be if we used a fully connected layer instead?
A: (32*32*3)*(28*28*6) = 14.5M parameters, ~14.5M multiplies
![Page 40: Andrej Karpathy - 텐서 플로우 블로그 (Tensor · ConvNets are everywhere… Whale recognition, Kaggle Challenge Satellite image analysis Mnih and Hinton, 2010 Galaxy Challenge](https://reader033.vdocuments.net/reader033/viewer/2022050108/5f46325ee93c5d62ff41fb3c/html5/thumbnails/40.jpg)
Convolution Layer
We processed a [32x32x3] volume into a [28x28x6] volume.
Q: How many parameters are used instead?
![Page 41: Andrej Karpathy - 텐서 플로우 블로그 (Tensor · ConvNets are everywhere… Whale recognition, Kaggle Challenge Satellite image analysis Mnih and Hinton, 2010 Galaxy Challenge](https://reader033.vdocuments.net/reader033/viewer/2022050108/5f46325ee93c5d62ff41fb3c/html5/thumbnails/41.jpg)
Convolution Layer
We processed a [32x32x3] volume into a [28x28x6] volume.
Q: How many parameters are used instead? And how many multiplies?
A: (5*5*3)*6 = 450 parameters
![Page 42: Andrej Karpathy - 텐서 플로우 블로그 (Tensor · ConvNets are everywhere… Whale recognition, Kaggle Challenge Satellite image analysis Mnih and Hinton, 2010 Galaxy Challenge](https://reader033.vdocuments.net/reader033/viewer/2022050108/5f46325ee93c5d62ff41fb3c/html5/thumbnails/42.jpg)
Convolution Layer
We processed a [32x32x3] volume into a [28x28x6] volume.
Q: How many parameters are used instead? And how many multiplies?
A: (5*5*3)*6 = 450 parameters, (5*5*3)*(28*28*6) = ~350K multiplies
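The counts above can be reproduced with plain arithmetic. Biases are left out, matching the figures on the slides.

```python
in_h, in_w, in_d = 32, 32, 3       # input volume
out_h, out_w, out_d = 28, 28, 6    # output volume
f = 5                              # filter size

# Fully connected: every output connects to every input.
fc_params = (in_h * in_w * in_d) * (out_h * out_w * out_d)

# Convolution: 6 filters of 5x5x3 weights, shared across all positions.
conv_params = (f * f * in_d) * out_d
conv_multiplies = (f * f * in_d) * (out_h * out_w * out_d)

print(fc_params)         # 14450688
print(conv_params)       # 450
print(conv_multiplies)   # 352800
```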
![Page 43: Andrej Karpathy - 텐서 플로우 블로그 (Tensor · ConvNets are everywhere… Whale recognition, Kaggle Challenge Satellite image analysis Mnih and Hinton, 2010 Galaxy Challenge](https://reader033.vdocuments.net/reader033/viewer/2022050108/5f46325ee93c5d62ff41fb3c/html5/thumbnails/43.jpg)
example 5x5 filters (32 total)
We call the layer convolutional because it is related to convolution of two signals:
elementwise multiplication and sum of a filter and the signal (image)
one filter => one activation map
![Page 44: Andrej Karpathy - 텐서 플로우 블로그 (Tensor · ConvNets are everywhere… Whale recognition, Kaggle Challenge Satellite image analysis Mnih and Hinton, 2010 Galaxy Challenge](https://reader033.vdocuments.net/reader033/viewer/2022050108/5f46325ee93c5d62ff41fb3c/html5/thumbnails/44.jpg)
Preview: ConvNet is a sequence of Convolution Layers, interspersed with activation functions
[32x32x3] → CONV, ReLU (e.g. 6 filters of size 5x5x3) → [28x28x6]
![Page 45: Andrej Karpathy - 텐서 플로우 블로그 (Tensor · ConvNets are everywhere… Whale recognition, Kaggle Challenge Satellite image analysis Mnih and Hinton, 2010 Galaxy Challenge](https://reader033.vdocuments.net/reader033/viewer/2022050108/5f46325ee93c5d62ff41fb3c/html5/thumbnails/45.jpg)
Preview: ConvNet is a sequence of Convolutional Layers, interspersed with activation functions
[32x32x3] → CONV, ReLU (e.g. 6 filters of size 5x5x3) → [28x28x6] → CONV, ReLU (e.g. 10 filters of size 5x5x6) → [24x24x10] → CONV, ReLU → ….
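A short sketch of how the volume shape changes through the two layers shown, assuming stride 1 and no padding: each 5x5 filter trims the spatial size by 4, and the new depth equals the number of filters.

```python
size, depth = 32, 3                    # input volume 32x32x3
layers = [(6, 5), (10, 5)]             # (number of filters, filter size)

for num_filters, f in layers:
    size = size - f + 1                # stride 1, no padding
    depth = num_filters                # one activation map per filter
    print(f"{size}x{size}x{depth}")    # 28x28x6, then 24x24x10
```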
![Page 46: Andrej Karpathy - 텐서 플로우 블로그 (Tensor · ConvNets are everywhere… Whale recognition, Kaggle Challenge Satellite image analysis Mnih and Hinton, 2010 Galaxy Challenge](https://reader033.vdocuments.net/reader033/viewer/2022050108/5f46325ee93c5d62ff41fb3c/html5/thumbnails/46.jpg)
two more layers to go: POOL/FC
![Page 47: Andrej Karpathy - 텐서 플로우 블로그 (Tensor · ConvNets are everywhere… Whale recognition, Kaggle Challenge Satellite image analysis Mnih and Hinton, 2010 Galaxy Challenge](https://reader033.vdocuments.net/reader033/viewer/2022050108/5f46325ee93c5d62ff41fb3c/html5/thumbnails/47.jpg)
Pooling layer
- makes the representations smaller and more manageable
- operates over each activation map independently
![Page 48: Andrej Karpathy - 텐서 플로우 블로그 (Tensor · ConvNets are everywhere… Whale recognition, Kaggle Challenge Satellite image analysis Mnih and Hinton, 2010 Galaxy Challenge](https://reader033.vdocuments.net/reader033/viewer/2022050108/5f46325ee93c5d62ff41fb3c/html5/thumbnails/48.jpg)
MAX POOLING
Single depth slice (x horizontal, y vertical):
1 1 2 4
5 6 7 8
3 2 1 0
1 2 3 4
max pool with 2x2 filters and stride 2:
6 8
3 4
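The example above can be reproduced in a few lines. This sketch handles a single depth slice with even height and width; a full pooling layer would apply it to each activation map independently.

```python
import numpy as np

def max_pool_2x2(x):
    """2x2 max pooling with stride 2 on a single (H, W) depth slice."""
    h, w = x.shape
    assert h % 2 == 0 and w % 2 == 0
    # Group pixels into non-overlapping 2x2 blocks, then take each block's max.
    return x.reshape(h // 2, 2, w // 2, 2).max(axis=(1, 3))

x = np.array([[1, 1, 2, 4],
              [5, 6, 7, 8],
              [3, 2, 1, 0],
              [1, 2, 3, 4]])
print(max_pool_2x2(x))
# [[6 8]
#  [3 4]]
```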
![Page 49: Andrej Karpathy - 텐서 플로우 블로그 (Tensor · ConvNets are everywhere… Whale recognition, Kaggle Challenge Satellite image analysis Mnih and Hinton, 2010 Galaxy Challenge](https://reader033.vdocuments.net/reader033/viewer/2022050108/5f46325ee93c5d62ff41fb3c/html5/thumbnails/49.jpg)
Fully Connected Layer (FC layer)
- Contains neurons that connect to the entire input volume, as in ordinary Neural Networks
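As a sketch, an FC layer flattens the input volume and applies one weight vector per neuron. The 6x6x256 input and the 10 output neurons are illustrative sizes picked here.

```python
import numpy as np

rng = np.random.default_rng(0)
volume = rng.standard_normal((6, 6, 256))      # e.g. a pooled activation volume
x = volume.ravel()                             # flatten to a 9216-vector

num_neurons = 10                               # illustrative size
W = rng.standard_normal((num_neurons, x.size)) * 0.01
b = np.zeros(num_neurons)

scores = W @ x + b                             # every neuron sees the entire input
print(scores.shape, W.size)
```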
![Page 50: Andrej Karpathy - 텐서 플로우 블로그 (Tensor · ConvNets are everywhere… Whale recognition, Kaggle Challenge Satellite image analysis Mnih and Hinton, 2010 Galaxy Challenge](https://reader033.vdocuments.net/reader033/viewer/2022050108/5f46325ee93c5d62ff41fb3c/html5/thumbnails/50.jpg)
http://cs.stanford.edu/people/karpathy/convnetjs/demo/cifar10.html
[ConvNetJS demo: training on CIFAR-10]
![Page 51: Andrej Karpathy - 텐서 플로우 블로그 (Tensor · ConvNets are everywhere… Whale recognition, Kaggle Challenge Satellite image analysis Mnih and Hinton, 2010 Galaxy Challenge](https://reader033.vdocuments.net/reader033/viewer/2022050108/5f46325ee93c5d62ff41fb3c/html5/thumbnails/51.jpg)
Visualizing Activations
http://yosinski.com/deepvis
YouTube video (4min): https://www.youtube.com/watch?v=AgkfIQ4IGaM
![Page 52: Andrej Karpathy - 텐서 플로우 블로그 (Tensor · ConvNets are everywhere… Whale recognition, Kaggle Challenge Satellite image analysis Mnih and Hinton, 2010 Galaxy Challenge](https://reader033.vdocuments.net/reader033/viewer/2022050108/5f46325ee93c5d62ff41fb3c/html5/thumbnails/52.jpg)
Convolutional Neural Networks: Case Study
![Page 53: Andrej Karpathy - 텐서 플로우 블로그 (Tensor · ConvNets are everywhere… Whale recognition, Kaggle Challenge Satellite image analysis Mnih and Hinton, 2010 Galaxy Challenge](https://reader033.vdocuments.net/reader033/viewer/2022050108/5f46325ee93c5d62ff41fb3c/html5/thumbnails/53.jpg)
Case Study: AlexNet [Krizhevsky et al. 2012]
Input: 227x227x3 images
First layer (CONV1): 96 11x11 filters applied at stride 4
Q: What is the output volume size? Hint: (227-11)/4+1 = 55
![Page 54: Andrej Karpathy - 텐서 플로우 블로그 (Tensor · ConvNets are everywhere… Whale recognition, Kaggle Challenge Satellite image analysis Mnih and Hinton, 2010 Galaxy Challenge](https://reader033.vdocuments.net/reader033/viewer/2022050108/5f46325ee93c5d62ff41fb3c/html5/thumbnails/54.jpg)
Case Study: AlexNet [Krizhevsky et al. 2012]
Input: 227x227x3 images
First layer (CONV1): 96 11x11 filters applied at stride 4
=> Output volume [55x55x96]
Q: What is the total number of parameters in this layer?
![Page 55: Andrej Karpathy - 텐서 플로우 블로그 (Tensor · ConvNets are everywhere… Whale recognition, Kaggle Challenge Satellite image analysis Mnih and Hinton, 2010 Galaxy Challenge](https://reader033.vdocuments.net/reader033/viewer/2022050108/5f46325ee93c5d62ff41fb3c/html5/thumbnails/55.jpg)
Case Study: AlexNet [Krizhevsky et al. 2012]
Input: 227x227x3 images
First layer (CONV1): 96 11x11 filters applied at stride 4
=> Output volume [55x55x96]
Parameters: (11*11*3)*96 = 35K
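The CONV1 numbers follow from the output-size hint and the filter shape. A quick check, counting weights only:

```python
n, f, stride, depth, num_filters = 227, 11, 4, 3, 96

out = (n - f) // stride + 1               # (227 - 11) / 4 + 1
params = (f * f * depth) * num_filters    # weights only, biases not counted

print(f"[{out}x{out}x{num_filters}]")     # [55x55x96]
print(params)                             # 34848, i.e. about 35K
```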
![Page 56: Andrej Karpathy - 텐서 플로우 블로그 (Tensor · ConvNets are everywhere… Whale recognition, Kaggle Challenge Satellite image analysis Mnih and Hinton, 2010 Galaxy Challenge](https://reader033.vdocuments.net/reader033/viewer/2022050108/5f46325ee93c5d62ff41fb3c/html5/thumbnails/56.jpg)
Case Study: AlexNet [Krizhevsky et al. 2012]
Input: 227x227x3 images
After CONV1: 55x55x96
Second layer (POOL1): 3x3 filters applied at stride 2
Q: What is the output volume size? Hint: (55-3)/2+1 = 27
![Page 57: Andrej Karpathy - 텐서 플로우 블로그 (Tensor · ConvNets are everywhere… Whale recognition, Kaggle Challenge Satellite image analysis Mnih and Hinton, 2010 Galaxy Challenge](https://reader033.vdocuments.net/reader033/viewer/2022050108/5f46325ee93c5d62ff41fb3c/html5/thumbnails/57.jpg)
Case Study: AlexNet [Krizhevsky et al. 2012]
Input: 227x227x3 images
After CONV1: 55x55x96
Second layer (POOL1): 3x3 filters applied at stride 2
Output volume: 27x27x96
Q: What is the number of parameters in this layer?
![Page 58: Andrej Karpathy - 텐서 플로우 블로그 (Tensor · ConvNets are everywhere… Whale recognition, Kaggle Challenge Satellite image analysis Mnih and Hinton, 2010 Galaxy Challenge](https://reader033.vdocuments.net/reader033/viewer/2022050108/5f46325ee93c5d62ff41fb3c/html5/thumbnails/58.jpg)
Case Study: AlexNet [Krizhevsky et al. 2012]
Input: 227x227x3 images
After CONV1: 55x55x96
Second layer (POOL1): 3x3 filters applied at stride 2
Output volume: 27x27x96
Parameters: 0!
![Page 59: Andrej Karpathy - 텐서 플로우 블로그 (Tensor · ConvNets are everywhere… Whale recognition, Kaggle Challenge Satellite image analysis Mnih and Hinton, 2010 Galaxy Challenge](https://reader033.vdocuments.net/reader033/viewer/2022050108/5f46325ee93c5d62ff41fb3c/html5/thumbnails/59.jpg)
Case Study: AlexNet [Krizhevsky et al. 2012]
Input: 227x227x3 images
After CONV1: 55x55x96
After POOL1: 27x27x96
...
![Page 60: Andrej Karpathy - 텐서 플로우 블로그 (Tensor · ConvNets are everywhere… Whale recognition, Kaggle Challenge Satellite image analysis Mnih and Hinton, 2010 Galaxy Challenge](https://reader033.vdocuments.net/reader033/viewer/2022050108/5f46325ee93c5d62ff41fb3c/html5/thumbnails/60.jpg)
Case Study: AlexNet [Krizhevsky et al. 2012]
Full (simplified) AlexNet architecture:
[227x227x3] INPUT
[55x55x96] CONV1: 96 11x11 filters at stride 4, pad 0
[27x27x96] MAX POOL1: 3x3 filters at stride 2
[27x27x96] NORM1: Normalization layer
[27x27x256] CONV2: 256 5x5 filters at stride 1, pad 2
[13x13x256] MAX POOL2: 3x3 filters at stride 2
[13x13x256] NORM2: Normalization layer
[13x13x384] CONV3: 384 3x3 filters at stride 1, pad 1
[13x13x384] CONV4: 384 3x3 filters at stride 1, pad 1
[13x13x256] CONV5: 256 3x3 filters at stride 1, pad 1
[6x6x256] MAX POOL3: 3x3 filters at stride 2
[4096] FC6: 4096 neurons
[4096] FC7: 4096 neurons
[1000] FC8: 1000 neurons (class scores)
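The volume sizes in this listing can be checked by applying the output-size formula layer by layer. In this sketch the helper name `out_size` is made up here, and the NORM layers are skipped because they leave the shape unchanged.

```python
def out_size(n, f, stride, pad=0):
    """Spatial size after a CONV or POOL layer."""
    return (n - f + 2 * pad) // stride + 1

# (name, filter size, stride, pad, number of filters; None for POOL)
layers = [
    ("CONV1", 11, 4, 0, 96),
    ("POOL1", 3, 2, 0, None),
    ("CONV2", 5, 1, 2, 256),
    ("POOL2", 3, 2, 0, None),
    ("CONV3", 3, 1, 1, 384),
    ("CONV4", 3, 1, 1, 384),
    ("CONV5", 3, 1, 1, 256),
    ("POOL3", 3, 2, 0, None),
]

size, depth = 227, 3
shapes = {}
for name, f, stride, pad, k in layers:
    size = out_size(size, f, stride, pad)
    depth = k if k is not None else depth   # pooling keeps the depth
    shapes[name] = (size, size, depth)
    print(f"[{size}x{size}x{depth}] {name}")
```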
![Page 61: Andrej Karpathy - 텐서 플로우 블로그 (Tensor · ConvNets are everywhere… Whale recognition, Kaggle Challenge Satellite image analysis Mnih and Hinton, 2010 Galaxy Challenge](https://reader033.vdocuments.net/reader033/viewer/2022050108/5f46325ee93c5d62ff41fb3c/html5/thumbnails/61.jpg)
Case Study: AlexNet [Krizhevsky et al. 2012]
Compared to LeCun 1998:
1 DATA:
- More data: 10^6 vs. 10^3
2 COMPUTE:
- GPU (~20x speedup)
3 ALGORITHM:
- Deeper: More layers (8 weight layers)
- Fancy regularization (dropout)
- Fancy non-linearity (ReLU)
4 INFRASTRUCTURE:
- CUDA
![Page 62: Andrej Karpathy - 텐서 플로우 블로그 (Tensor · ConvNets are everywhere… Whale recognition, Kaggle Challenge Satellite image analysis Mnih and Hinton, 2010 Galaxy Challenge](https://reader033.vdocuments.net/reader033/viewer/2022050108/5f46325ee93c5d62ff41fb3c/html5/thumbnails/62.jpg)
Case Study: AlexNet [Krizhevsky et al. 2012]
Details/Retrospectives:
- first use of ReLU
- used Norm layers (not common anymore)
- heavy data augmentation
- dropout 0.5
- batch size 128
- SGD Momentum 0.9
- Learning rate 1e-2, reduced by 10 manually when val accuracy plateaus
- L2 weight decay 5e-4
- 7 CNN ensemble: 18.2% -> 15.4%
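To make the optimizer settings concrete, here is a sketch of one SGD step with momentum 0.9, L2 weight decay 5e-4 and learning rate 1e-2. It uses a common textbook formulation, which may differ in detail from the exact update rule in the paper, and the weights and gradient are made-up values.

```python
import numpy as np

def sgd_momentum_step(w, grad, v, lr=1e-2, momentum=0.9, weight_decay=5e-4):
    """One SGD step with momentum and L2 weight decay (a common formulation)."""
    grad = grad + weight_decay * w     # L2 penalty adds weight_decay * w to the gradient
    v = momentum * v - lr * grad       # velocity accumulates past gradients
    return w + v, v

w = np.array([1.0, -1.0])
v = np.zeros_like(w)
w, v = sgd_momentum_step(w, grad=np.array([0.5, 0.5]), v=v)
print(w)

lr = 1e-2
lr = lr / 10    # applied by hand when validation accuracy plateaus
```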
![Page 63: Andrej Karpathy - 텐서 플로우 블로그 (Tensor · ConvNets are everywhere… Whale recognition, Kaggle Challenge Satellite image analysis Mnih and Hinton, 2010 Galaxy Challenge](https://reader033.vdocuments.net/reader033/viewer/2022050108/5f46325ee93c5d62ff41fb3c/html5/thumbnails/63.jpg)
Case Study: ZFNet [Zeiler and Fergus, 2013]
AlexNet but:
- CONV1: change from (11x11 stride 4) to (7x7 stride 2)
- CONV3,4,5: instead of 384, 384, 256 filters use 512, 1024, 512
ImageNet top 5 error: 15.4% -> 14.8%
![Page 64: Andrej Karpathy - 텐서 플로우 블로그 (Tensor · ConvNets are everywhere… Whale recognition, Kaggle Challenge Satellite image analysis Mnih and Hinton, 2010 Galaxy Challenge](https://reader033.vdocuments.net/reader033/viewer/2022050108/5f46325ee93c5d62ff41fb3c/html5/thumbnails/64.jpg)
Case Study: VGGNet [Simonyan and Zisserman, 2014]
best model
Only 3x3 CONV stride 1, pad 1 and 2x2 MAX POOL stride 2
11.2% top 5 error in ILSVRC 2013 -> 7.3% top 5 error
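With these two building blocks, the spatial size is easy to track: a 3x3 CONV at stride 1, pad 1 keeps it, and a 2x2 MAX POOL at stride 2 halves it. The sketch below assumes a 224x224 input and five pooling stages, as in the 16-layer configuration.

```python
def conv3x3(n):
    """3x3 CONV, stride 1, pad 1."""
    return (n - 3 + 2 * 1) // 1 + 1

def pool2x2(n):
    """2x2 MAX POOL, stride 2."""
    return (n - 2) // 2 + 1

size = 224
sizes = [size]
for _ in range(5):                   # five CONV/POOL stages (assumed here)
    size = pool2x2(conv3x3(size))    # CONV keeps the size, POOL halves it
    sizes.append(size)
print(sizes)   # [224, 112, 56, 28, 14, 7]
```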
![Page 65: Andrej Karpathy - 텐서 플로우 블로그 (Tensor · ConvNets are everywhere… Whale recognition, Kaggle Challenge Satellite image analysis Mnih and Hinton, 2010 Galaxy Challenge](https://reader033.vdocuments.net/reader033/viewer/2022050108/5f46325ee93c5d62ff41fb3c/html5/thumbnails/65.jpg)
INPUT: [224x224x3] memory: 224*224*3=150K params: 0
CONV3-64: [224x224x64] memory: 224*224*64=3.2M params: (3*3*3)*64 = 1,728
CONV3-64: [224x224x64] memory: 224*224*64=3.2M params: (3*3*64)*64 = 36,864
POOL2: [112x112x64] memory: 112*112*64=800K params: 0
CONV3-128: [112x112x128] memory: 112*112*128=1.6M params: (3*3*64)*128 = 73,728
CONV3-128: [112x112x128] memory: 112*112*128=1.6M params: (3*3*128)*128 = 147,456
POOL2: [56x56x128] memory: 56*56*128=400K params: 0
CONV3-256: [56x56x256] memory: 56*56*256=800K params: (3*3*128)*256 = 294,912
CONV3-256: [56x56x256] memory: 56*56*256=800K params: (3*3*256)*256 = 589,824
CONV3-256: [56x56x256] memory: 56*56*256=800K params: (3*3*256)*256 = 589,824
POOL2: [28x28x256] memory: 28*28*256=200K params: 0
CONV3-512: [28x28x512] memory: 28*28*512=400K params: (3*3*256)*512 = 1,179,648
CONV3-512: [28x28x512] memory: 28*28*512=400K params: (3*3*512)*512 = 2,359,296
CONV3-512: [28x28x512] memory: 28*28*512=400K params: (3*3*512)*512 = 2,359,296
POOL2: [14x14x512] memory: 14*14*512=100K params: 0
CONV3-512: [14x14x512] memory: 14*14*512=100K params: (3*3*512)*512 = 2,359,296
CONV3-512: [14x14x512] memory: 14*14*512=100K params: (3*3*512)*512 = 2,359,296
CONV3-512: [14x14x512] memory: 14*14*512=100K params: (3*3*512)*512 = 2,359,296
POOL2: [7x7x512] memory: 7*7*512=25K params: 0
FC: [1x1x4096] memory: 4096 params: 7*7*512*4096 = 102,760,448
FC: [1x1x4096] memory: 4096 params: 4096*4096 = 16,777,216
FC: [1x1x1000] memory: 1000 params: 4096*1000 = 4,096,000
(not counting biases)
![Page 66: Andrej Karpathy - 텐서 플로우 블로그 (Tensor · ConvNets are everywhere… Whale recognition, Kaggle Challenge Satellite image analysis Mnih and Hinton, 2010 Galaxy Challenge](https://reader033.vdocuments.net/reader033/viewer/2022050108/5f46325ee93c5d62ff41fb3c/html5/thumbnails/66.jpg)
TOTAL memory: 24M * 4 bytes ~= 93MB / image (forward pass only; roughly 2x when the backward pass is included)
TOTAL params: 138M parameters
![Page 67: Andrej Karpathy - 텐서 플로우 블로그 (Tensor · ConvNets are everywhere… Whale recognition, Kaggle Challenge Satellite image analysis Mnih and Hinton, 2010 Galaxy Challenge](https://reader033.vdocuments.net/reader033/viewer/2022050108/5f46325ee93c5d62ff41fb3c/html5/thumbnails/67.jpg)
Note:
Most memory is in the early CONV layers.
Most params are in the late FC layers.
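The per-layer arithmetic in the table above can be tallied with a short script. This is a minimal sketch: the `LAYERS` list and the `tally` helper are illustrative names, and it assumes every CONV layer is 3x3 with size-preserving padding and every POOL halves the spatial size, as the table implies. The parameter total reproduces the 138M figure. The direct sum of the listed activations comes out lower than the quoted 24M, so that number is best read as a rough estimate.

```python
# Layer spec for the VGG-style network in the table above.
# ("conv", C): 3x3 conv with C output channels, spatial size preserved.
# ("pool",):   2x2 pooling, spatial size halved.
# ("fc", N):   fully connected layer with N outputs.
LAYERS = [
    ("conv", 64), ("conv", 64), ("pool",),
    ("conv", 128), ("conv", 128), ("pool",),
    ("conv", 256), ("conv", 256), ("conv", 256), ("pool",),
    ("conv", 512), ("conv", 512), ("conv", 512), ("pool",),
    ("conv", 512), ("conv", 512), ("conv", 512), ("pool",),
    ("fc", 4096), ("fc", 4096), ("fc", 1000),
]


def tally(layers, size=224, channels=3):
    """Return (activation_count, param_count); biases are not counted."""
    activations = size * size * channels  # the input volume itself
    params = 0
    for layer in layers:
        kind = layer[0]
        if kind == "conv":
            out = layer[1]
            params += 3 * 3 * channels * out
            channels = out
        elif kind == "pool":
            size //= 2
        elif kind == "fc":
            out = layer[1]
            params += size * size * channels * out
            size, channels = 1, out
        activations += size * size * channels  # output volume of this layer
    return activations, params


total_activations, total_params = tally(LAYERS)
fc_params = 7 * 7 * 512 * 4096 + 4096 * 4096 + 4096 * 1000

print(f"activations: {total_activations:,}")
print(f"params:      {total_params:,}")
print(f"share of params in FC layers: {fc_params / total_params:.1%}")
```

The last line makes the note concrete: the three FC layers hold the large majority of the parameters, while the first two CONV layers hold the largest activation volumes.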
![Page 68: Andrej Karpathy - 텐서 플로우 블로그 (Tensor · ConvNets are everywhere… Whale recognition, Kaggle Challenge Satellite image analysis Mnih and Hinton, 2010 Galaxy Challenge](https://reader033.vdocuments.net/reader033/viewer/2022050108/5f46325ee93c5d62ff41fb3c/html5/thumbnails/68.jpg)
Case Study: GoogLeNet [Szegedy et al., 2014]
Inception module
ILSVRC 2014 winner (6.7% top 5 error)
![Page 69: Andrej Karpathy - 텐서 플로우 블로그 (Tensor · ConvNets are everywhere… Whale recognition, Kaggle Challenge Satellite image analysis Mnih and Hinton, 2010 Galaxy Challenge](https://reader033.vdocuments.net/reader033/viewer/2022050108/5f46325ee93c5d62ff41fb3c/html5/thumbnails/69.jpg)
Case Study: GoogLeNet
Fun features:
- Only 5 million params! (Removes FC layers completely)
Compared to AlexNet:
- 12x fewer params
- 2x more compute
- 6.67% top-5 error (vs. 16.4%)
![Page 70: Andrej Karpathy - 텐서 플로우 블로그 (Tensor · ConvNets are everywhere… Whale recognition, Kaggle Challenge Satellite image analysis Mnih and Hinton, 2010 Galaxy Challenge](https://reader033.vdocuments.net/reader033/viewer/2022050108/5f46325ee93c5d62ff41fb3c/html5/thumbnails/70.jpg)
Slide from Kaiming He’s recent presentation https://www.youtube.com/watch?v=1PGLj-uKT1w
Case Study: ResNet [He et al., 2015]
ILSVRC 2015 winner (3.6% top 5 error)
![Page 71: Andrej Karpathy - 텐서 플로우 블로그 (Tensor · ConvNets are everywhere… Whale recognition, Kaggle Challenge Satellite image analysis Mnih and Hinton, 2010 Galaxy Challenge](https://reader033.vdocuments.net/reader033/viewer/2022050108/5f46325ee93c5d62ff41fb3c/html5/thumbnails/71.jpg)
(slide from Kaiming He’s recent presentation)
![Page 72: Andrej Karpathy - 텐서 플로우 블로그 (Tensor · ConvNets are everywhere… Whale recognition, Kaggle Challenge Satellite image analysis Mnih and Hinton, 2010 Galaxy Challenge](https://reader033.vdocuments.net/reader033/viewer/2022050108/5f46325ee93c5d62ff41fb3c/html5/thumbnails/72.jpg)
![Page 73: Andrej Karpathy - 텐서 플로우 블로그 (Tensor · ConvNets are everywhere… Whale recognition, Kaggle Challenge Satellite image analysis Mnih and Hinton, 2010 Galaxy Challenge](https://reader033.vdocuments.net/reader033/viewer/2022050108/5f46325ee93c5d62ff41fb3c/html5/thumbnails/73.jpg)
Case Study: ResNet [He et al., 2015]
224x224x3
spatial dimension only 56x56!
![Page 74: Andrej Karpathy - 텐서 플로우 블로그 (Tensor · ConvNets are everywhere… Whale recognition, Kaggle Challenge Satellite image analysis Mnih and Hinton, 2010 Galaxy Challenge](https://reader033.vdocuments.net/reader033/viewer/2022050108/5f46325ee93c5d62ff41fb3c/html5/thumbnails/74.jpg)
Identity Mappings in Deep Residual Networks, He et al. 2016
![Page 75: Andrej Karpathy - 텐서 플로우 블로그 (Tensor · ConvNets are everywhere… Whale recognition, Kaggle Challenge Satellite image analysis Mnih and Hinton, 2010 Galaxy Challenge](https://reader033.vdocuments.net/reader033/viewer/2022050108/5f46325ee93c5d62ff41fb3c/html5/thumbnails/75.jpg)
Deep Networks with Stochastic Depth, Huang et al., 2016
“We start with very deep networks but during training, for each mini-batch, randomly drop a subset of layers and bypass them with the identity function.”
Think of layers more like vector fields, nudging the input x toward the label y.
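The quoted idea of dropping layers and bypassing them with the identity can be sketched in a few lines of NumPy. This is an illustration under assumptions, not the paper's implementation: `residual_branch` is a hypothetical one-layer stand-in for a real residual block, a single constant `survival_prob` is used for every layer, and the test-time scaling of each branch by its survival probability is the convention assumed here.

```python
import numpy as np


def residual_branch(x, weight):
    # Hypothetical residual branch: one linear map followed by ReLU.
    return np.maximum(0.0, x @ weight)


def forward(x, weights, survival_prob=0.8, training=True, rng=None):
    """Run a stack of residual layers with stochastic depth."""
    if rng is None:
        rng = np.random.default_rng()
    for w in weights:
        if training:
            # Keep the layer with probability survival_prob;
            # otherwise x passes through unchanged (identity bypass).
            if rng.random() < survival_prob:
                x = x + residual_branch(x, w)
        else:
            # At test time every layer is used, scaled by its survival probability.
            x = x + survival_prob * residual_branch(x, w)
    return x


rng = np.random.default_rng(0)
weights = [rng.standard_normal((16, 16)) * 0.1 for _ in range(5)]
x = rng.standard_normal((4, 16))
out = forward(x, weights, survival_prob=0.8, training=True, rng=rng)
print(out.shape)
```

Because every layer only adds to `x`, skipping a layer leaves a valid input for the next one, which is what makes the random bypass possible.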
![Page 76: Andrej Karpathy - 텐서 플로우 블로그 (Tensor · ConvNets are everywhere… Whale recognition, Kaggle Challenge Satellite image analysis Mnih and Hinton, 2010 Galaxy Challenge](https://reader033.vdocuments.net/reader033/viewer/2022050108/5f46325ee93c5d62ff41fb3c/html5/thumbnails/76.jpg)
Wide Residual Networks, Zagoruyko and Komodakis, 2016
- wide networks with only 16 layers can significantly outperform 1000-layer deep networks
- main power of residual networks is in residual blocks, and not in extreme depth
- wide residual networks are several times faster to train
Swapout: Learning an ensemble of deep architectures, Singh et al., 2016
- 32 layer wider model performs similar to a 1001 layer ResNet model
FractalNet: Ultra-Deep Neural Networks without Residuals, Larsson et al. 2016
![Page 77: Andrej Karpathy - 텐서 플로우 블로그 (Tensor · ConvNets are everywhere… Whale recognition, Kaggle Challenge Satellite image analysis Mnih and Hinton, 2010 Galaxy Challenge](https://reader033.vdocuments.net/reader033/viewer/2022050108/5f46325ee93c5d62ff41fb3c/html5/thumbnails/77.jpg)
Still an active area of research...
Densely Connected Convolutional Networks, Huang et al.
ResNet in ResNet, Targ et al.
Deeply-Fused Nets, Wang et al.
Weighted Residuals for Very Deep Networks, Shen et al.
Residual Networks of Residual Networks: Multilevel Residual Networks, Zhang et al.
...
In large part likely due to open source code available, e.g.:
![Page 78: Andrej Karpathy - 텐서 플로우 블로그 (Tensor · ConvNets are everywhere… Whale recognition, Kaggle Challenge Satellite image analysis Mnih and Hinton, 2010 Galaxy Challenge](https://reader033.vdocuments.net/reader033/viewer/2022050108/5f46325ee93c5d62ff41fb3c/html5/thumbnails/78.jpg)
ASIDE: arxiv-sanity.com plug
![Page 79: Andrej Karpathy - 텐서 플로우 블로그 (Tensor · ConvNets are everywhere… Whale recognition, Kaggle Challenge Satellite image analysis Mnih and Hinton, 2010 Galaxy Challenge](https://reader033.vdocuments.net/reader033/viewer/2022050108/5f46325ee93c5d62ff41fb3c/html5/thumbnails/79.jpg)
Addressing other tasks...
![Page 80: Andrej Karpathy - 텐서 플로우 블로그 (Tensor · ConvNets are everywhere… Whale recognition, Kaggle Challenge Satellite image analysis Mnih and Hinton, 2010 Galaxy Challenge](https://reader033.vdocuments.net/reader033/viewer/2022050108/5f46325ee93c5d62ff41fb3c/html5/thumbnails/80.jpg)
Addressing other tasks...
image (224x224x3) -> CNN (a block of compute with a few million parameters) -> features (7x7x512)
![Page 81: Andrej Karpathy - 텐서 플로우 블로그 (Tensor · ConvNets are everywhere… Whale recognition, Kaggle Challenge Satellite image analysis Mnih and Hinton, 2010 Galaxy Challenge](https://reader033.vdocuments.net/reader033/viewer/2022050108/5f46325ee93c5d62ff41fb3c/html5/thumbnails/81.jpg)
predicted thing
desired thing
![Page 82: Andrej Karpathy - 텐서 플로우 블로그 (Tensor · ConvNets are everywhere… Whale recognition, Kaggle Challenge Satellite image analysis Mnih and Hinton, 2010 Galaxy Challenge](https://reader033.vdocuments.net/reader033/viewer/2022050108/5f46325ee93c5d62ff41fb3c/html5/thumbnails/82.jpg)
this part changes from task to task
![Page 83: Andrej Karpathy - 텐서 플로우 블로그 (Tensor · ConvNets are everywhere… Whale recognition, Kaggle Challenge Satellite image analysis Mnih and Hinton, 2010 Galaxy Challenge](https://reader033.vdocuments.net/reader033/viewer/2022050108/5f46325ee93c5d62ff41fb3c/html5/thumbnails/83.jpg)
Image Classification
thing = a vector of probabilities for different classes
image (224x224x3) -> CNN -> features (7x7x512) -> fully connected layer -> e.g. vector of 1000 numbers giving probabilities for different classes.
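A fully connected classification head on top of the 7x7x512 features can be sketched as follows. This is a minimal illustration: `classify` and `softmax` are hypothetical helper names, the features and weights are random stand-ins, and the demo uses 10 classes instead of 1000 only to keep the weight matrix small.

```python
import numpy as np


def softmax(z):
    z = z - z.max()  # shift for numerical stability
    e = np.exp(z)
    return e / e.sum()


def classify(features, W, b):
    """Flatten the feature volume and apply one fully connected layer."""
    return softmax(features.reshape(-1) @ W + b)


rng = np.random.default_rng(0)
num_classes = 10  # 1000 on the slide; smaller here to keep the demo light
features = rng.standard_normal((7, 7, 512))  # stand-in for CNN features
W = rng.standard_normal((7 * 7 * 512, num_classes)) * 0.01
b = np.zeros(num_classes)

probs = classify(features, W, b)
print(probs.shape, probs.sum())
```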
![Page 84: Andrej Karpathy - 텐서 플로우 블로그 (Tensor · ConvNets are everywhere… Whale recognition, Kaggle Challenge Satellite image analysis Mnih and Hinton, 2010 Galaxy Challenge](https://reader033.vdocuments.net/reader033/viewer/2022050108/5f46325ee93c5d62ff41fb3c/html5/thumbnails/84.jpg)
Image Captioning
image (224x224x3) -> CNN -> features (7x7x512) -> RNN -> a sequence of 10,000-dimensional vectors giving probabilities of different words in the caption.
![Page 85: Andrej Karpathy - 텐서 플로우 블로그 (Tensor · ConvNets are everywhere… Whale recognition, Kaggle Challenge Satellite image analysis Mnih and Hinton, 2010 Galaxy Challenge](https://reader033.vdocuments.net/reader033/viewer/2022050108/5f46325ee93c5d62ff41fb3c/html5/thumbnails/85.jpg)
Localization
image (224x224x3) -> CNN -> features (7x7x512) -> fully connected layer -> two outputs:
- Class probabilities (as before)
- 4 numbers: X coord, Y coord, Width, Height
![Page 86: Andrej Karpathy - 텐서 플로우 블로그 (Tensor · ConvNets are everywhere… Whale recognition, Kaggle Challenge Satellite image analysis Mnih and Hinton, 2010 Galaxy Challenge](https://reader033.vdocuments.net/reader033/viewer/2022050108/5f46325ee93c5d62ff41fb3c/html5/thumbnails/86.jpg)
Reinforcement Learning
image (160x210x3) -> CNN -> features -> fully connected -> e.g. vector of 8 numbers giving the probability of wanting to take each of the 8 possible ATARI actions.
Mnih et al. 2015
![Page 87: Andrej Karpathy - 텐서 플로우 블로그 (Tensor · ConvNets are everywhere… Whale recognition, Kaggle Challenge Satellite image analysis Mnih and Hinton, 2010 Galaxy Challenge](https://reader033.vdocuments.net/reader033/viewer/2022050108/5f46325ee93c5d62ff41fb3c/html5/thumbnails/87.jpg)
Segmentation
image (224x224x3) -> CNN -> features (7x7x512) -> deconv layers -> image class “map” (224x224x20): array of class probabilities at each pixel.
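The final step of the segmentation pipeline, turning a 224x224x20 array of scores into class probabilities at each pixel, can be sketched like this. The deconv layers themselves are not modelled: `scores` is a random stand-in for their output, and `pixelwise_probabilities` is a hypothetical helper name.

```python
import numpy as np


def pixelwise_probabilities(scores):
    """Apply a softmax over the class axis independently at every pixel."""
    shifted = scores - scores.max(axis=-1, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=-1, keepdims=True)


rng = np.random.default_rng(0)
scores = rng.standard_normal((224, 224, 20))  # stand-in for deconv output

probs = pixelwise_probabilities(scores)  # 224x224x20, sums to 1 per pixel
class_map = probs.argmax(axis=-1)        # 224x224 map of most likely classes
print(probs.shape, class_map.shape)
```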
![Page 88: Andrej Karpathy - 텐서 플로우 블로그 (Tensor · ConvNets are everywhere… Whale recognition, Kaggle Challenge Satellite image analysis Mnih and Hinton, 2010 Galaxy Challenge](https://reader033.vdocuments.net/reader033/viewer/2022050108/5f46325ee93c5d62ff41fb3c/html5/thumbnails/88.jpg)
Autoencoders
image (224x224x3) -> CNN -> features (7x7x512) -> deconv layers -> original image (224x224x3)
![Page 89: Andrej Karpathy - 텐서 플로우 블로그 (Tensor · ConvNets are everywhere… Whale recognition, Kaggle Challenge Satellite image analysis Mnih and Hinton, 2010 Galaxy Challenge](https://reader033.vdocuments.net/reader033/viewer/2022050108/5f46325ee93c5d62ff41fb3c/html5/thumbnails/89.jpg)
Variational Autoencoders
image (224x224x3) -> CNN -> features (7x7x512) -> reparameterization layer -> deconv layers -> original image (224x224x3)
[Kingma et al.], [Rezende et al.], [Salimans et al.]
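The reparameterization layer can be sketched as below, assuming the common setup in which the encoder outputs a mean `mu` and a log-variance `log_var` for a diagonal Gaussian. The function names and the 32-dimensional latent size are illustrative choices, not taken from the slides.

```python
import numpy as np


def reparameterize(mu, log_var, rng):
    """Sample z = mu + sigma * eps with eps ~ N(0, I).

    The randomness lives in eps, so z stays a differentiable
    function of mu and log_var.
    """
    eps = rng.standard_normal(mu.shape)
    return mu + np.exp(0.5 * log_var) * eps


def kl_to_standard_normal(mu, log_var):
    """KL divergence from N(mu, sigma^2) to N(0, I), summed over dimensions."""
    return -0.5 * np.sum(1.0 + log_var - mu**2 - np.exp(log_var))


rng = np.random.default_rng(0)
mu = rng.standard_normal(32)   # hypothetical encoder outputs
log_var = np.full(32, -1.0)

z = reparameterize(mu, log_var, rng)
print(z.shape, kl_to_standard_normal(mu, log_var))
```

The sample `z` is what the deconv (decoder) layers would consume in place of the raw features.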
![Page 90: Andrej Karpathy - 텐서 플로우 블로그 (Tensor · ConvNets are everywhere… Whale recognition, Kaggle Challenge Satellite image analysis Mnih and Hinton, 2010 Galaxy Challenge](https://reader033.vdocuments.net/reader033/viewer/2022050108/5f46325ee93c5d62ff41fb3c/html5/thumbnails/90.jpg)
Detection
image (224x224x3) -> CNN -> features (7x7x512) -> 1x1 CONV -> 7x7x(5*B+C)
For each of 7x7 locations:
- [x,y,width,height,confidence]*B
- class
E.g. YOLO: You Only Look Once (Demo: http://pjreddie.com/darknet/yolo/)
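The 7x7x(5*B+C) output layout can be unpacked as follows. This is a sketch under stated assumptions: the channel order (all B boxes first, then C class scores) follows the order in which the slide lists them and may differ from any particular implementation, `split_predictions` is a hypothetical helper, and B=2, C=20 are example values.

```python
import numpy as np


def split_predictions(output, num_boxes, num_classes):
    """Split a (S, S, 5*B + C) array into boxes (S, S, B, 5) and classes (S, S, C).

    Assumed channel order: B groups of [x, y, width, height, confidence],
    followed by C class scores.
    """
    rows, cols, depth = output.shape
    if depth != 5 * num_boxes + num_classes:
        raise ValueError("depth does not match 5*B + C")
    boxes = output[..., : 5 * num_boxes].reshape(rows, cols, num_boxes, 5)
    class_scores = output[..., 5 * num_boxes:]
    return boxes, class_scores


B, C = 2, 20  # example values
output = np.arange(7 * 7 * (5 * B + C), dtype=float).reshape(7, 7, 5 * B + C)

boxes, class_scores = split_predictions(output, B, C)
print(boxes.shape, class_scores.shape)
```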
![Page 91: Andrej Karpathy - 텐서 플로우 블로그 (Tensor · ConvNets are everywhere… Whale recognition, Kaggle Challenge Satellite image analysis Mnih and Hinton, 2010 Galaxy Challenge](https://reader033.vdocuments.net/reader033/viewer/2022050108/5f46325ee93c5d62ff41fb3c/html5/thumbnails/91.jpg)
Dense Image Captioning
image (224x224x3) -> CNN -> features (7x7x512) -> 1x1 CONV -> 7x7x(5*B+[C,..])
For each of 7x7 locations:
- x,y,width,height,confidence
- sequence of words
DenseCap: Fully Convolutional Localization Networks for Dense Captioning, Johnson et al. 2016
![Page 92: Andrej Karpathy - 텐서 플로우 블로그 (Tensor · ConvNets are everywhere… Whale recognition, Kaggle Challenge Satellite image analysis Mnih and Hinton, 2010 Galaxy Challenge](https://reader033.vdocuments.net/reader033/viewer/2022050108/5f46325ee93c5d62ff41fb3c/html5/thumbnails/92.jpg)
Practical considerations when applying ConvNets
![Page 93: Andrej Karpathy - 텐서 플로우 블로그 (Tensor · ConvNets are everywhere… Whale recognition, Kaggle Challenge Satellite image analysis Mnih and Hinton, 2010 Galaxy Challenge](https://reader033.vdocuments.net/reader033/viewer/2022050108/5f46325ee93c5d62ff41fb3c/html5/thumbnails/93.jpg)
What hardware do I use?
Buy your own machine:
- NVIDIA DIGITS DevBox (TITAN X GPUs)
- NVIDIA DGX-1 (P100 GPUs)
Build your own machine:
https://graphific.github.io/posts/building-a-deep-learning-dream-machine/
GPUs in the cloud:
- Amazon AWS (GRID K520 :( )
- Microsoft Azure (soon); 4x K80 GPUs
- Cirrascale (“rent-a-box”)
![Page 94: Andrej Karpathy - 텐서 플로우 블로그 (Tensor · ConvNets are everywhere… Whale recognition, Kaggle Challenge Satellite image analysis Mnih and Hinton, 2010 Galaxy Challenge](https://reader033.vdocuments.net/reader033/viewer/2022050108/5f46325ee93c5d62ff41fb3c/html5/thumbnails/94.jpg)
What framework do I use?
Caffe
Torch
Theano
Lasagne
Keras
TensorFlow
Mxnet
chainer
Nervana’s Neon
Microsoft’s CNTK
Deeplearning4j
...
![Page 95: Andrej Karpathy - 텐서 플로우 블로그 (Tensor · ConvNets are everywhere… Whale recognition, Kaggle Challenge Satellite image analysis Mnih and Hinton, 2010 Galaxy Challenge](https://reader033.vdocuments.net/reader033/viewer/2022050108/5f46325ee93c5d62ff41fb3c/html5/thumbnails/95.jpg)
![Page 96: Andrej Karpathy - 텐서 플로우 블로그 (Tensor · ConvNets are everywhere… Whale recognition, Kaggle Challenge Satellite image analysis Mnih and Hinton, 2010 Galaxy Challenge](https://reader033.vdocuments.net/reader033/viewer/2022050108/5f46325ee93c5d62ff41fb3c/html5/thumbnails/96.jpg)
Q: How do I know what architecture to use?
![Page 97: Andrej Karpathy - 텐서 플로우 블로그 (Tensor · ConvNets are everywhere… Whale recognition, Kaggle Challenge Satellite image analysis Mnih and Hinton, 2010 Galaxy Challenge](https://reader033.vdocuments.net/reader033/viewer/2022050108/5f46325ee93c5d62ff41fb3c/html5/thumbnails/97.jpg)
A: don’t be a hero.
1. Take whatever works best on ILSVRC (latest ResNet)
2. Download a pretrained model
3. Potentially add/delete some parts of it
4. Finetune it on your application.
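One common form of this recipe is to keep the pretrained network frozen and train only a new output layer on the target task. The NumPy sketch below illustrates that idea on toy data. Everything in it is a stand-in: `W_pretrained` is random rather than downloaded, `backbone` is a hypothetical one-layer substitute for a real CNN, and the labels are synthetic.

```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-in for pretrained weights; in practice these would be downloaded.
W_pretrained = rng.standard_normal((32, 64)) / np.sqrt(32)


def backbone(x):
    """Frozen feature extractor (hypothetical one-layer stand-in for a CNN)."""
    return np.maximum(0.0, x @ W_pretrained)


def finetune_head(x, labels, num_classes, steps=200, lr=0.05):
    """Train a new softmax head on frozen features with gradient descent."""
    feats = backbone(x)  # the backbone is never updated
    n = len(labels)
    W_head = np.zeros((feats.shape[1], num_classes))
    losses = []
    for _ in range(steps):
        logits = feats @ W_head
        logits -= logits.max(axis=1, keepdims=True)
        probs = np.exp(logits)
        probs /= probs.sum(axis=1, keepdims=True)
        losses.append(-np.log(probs[np.arange(n), labels]).mean())
        grad = probs.copy()
        grad[np.arange(n), labels] -= 1.0  # d(loss)/d(logits), before the 1/n factor
        W_head -= lr * feats.T @ grad / n
    return W_head, losses


# Toy "application" data: two classes decided by the sign of one input feature.
x = rng.standard_normal((200, 32))
labels = (x[:, 0] > 0).astype(int)

W_head, losses = finetune_head(x, labels, num_classes=2)
print(f"loss before: {losses[0]:.3f}, after: {losses[-1]:.3f}")
```

With more data on the target task, the same loop could also update some or all of the backbone weights, usually with a smaller learning rate.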
![Page 98: Andrej Karpathy - 텐서 플로우 블로그 (Tensor · ConvNets are everywhere… Whale recognition, Kaggle Challenge Satellite image analysis Mnih and Hinton, 2010 Galaxy Challenge](https://reader033.vdocuments.net/reader033/viewer/2022050108/5f46325ee93c5d62ff41fb3c/html5/thumbnails/98.jpg)
Q: How do I know what hyperparameters to use?
![Page 99: Andrej Karpathy - 텐서 플로우 블로그 (Tensor · ConvNets are everywhere… Whale recognition, Kaggle Challenge Satellite image analysis Mnih and Hinton, 2010 Galaxy Challenge](https://reader033.vdocuments.net/reader033/viewer/2022050108/5f46325ee93c5d62ff41fb3c/html5/thumbnails/99.jpg)
A: don’t be a hero.
- Use whatever is reported to work best on ILSVRC.
- Play with the regularization strength (dropout rates)
![Page 100: Andrej Karpathy - 텐서 플로우 블로그 (Tensor · ConvNets are everywhere… Whale recognition, Kaggle Challenge Satellite image analysis Mnih and Hinton, 2010 Galaxy Challenge](https://reader033.vdocuments.net/reader033/viewer/2022050108/5f46325ee93c5d62ff41fb3c/html5/thumbnails/100.jpg)
ConvNets in practice: Distributed training
VGG: ~2-3 weeks training with 4 GPUs
ResNet 101: 2-3 weeks with 4 GPUs
~$1K each
![Page 101: Andrej Karpathy - 텐서 플로우 블로그 (Tensor · ConvNets are everywhere… Whale recognition, Kaggle Challenge Satellite image analysis Mnih and Hinton, 2010 Galaxy Challenge](https://reader033.vdocuments.net/reader033/viewer/2022050108/5f46325ee93c5d62ff41fb3c/html5/thumbnails/101.jpg)
ConvNets in practice: Distributed training
Model parallelism
Data parallelism
[Large Scale Distributed Deep Networks, Jeff Dean et al., 2013]
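Data parallelism can be illustrated with a toy example: each worker computes a gradient on its own shard of the mini-batch, and the shard gradients are averaged before a single weight update. The sketch below simulates the workers sequentially in NumPy on a linear least-squares model, which is an assumption made purely for brevity. Real systems run the shards on separate devices and communicate the gradients.

```python
import numpy as np


def gradient(w, x, y):
    # Gradient of 0.5 * mean((x @ w - y)**2) with respect to w.
    return x.T @ (x @ w - y) / len(y)


def data_parallel_step(w, x, y, num_workers, lr=0.1):
    """One SGD step where each simulated worker handles one shard of the batch."""
    x_shards = np.array_split(x, num_workers)
    y_shards = np.array_split(y, num_workers)
    grads = [gradient(w, xs, ys) for xs, ys in zip(x_shards, y_shards)]
    sizes = [len(ys) for ys in y_shards]
    # Average weighted by shard size, so the result equals the full-batch gradient.
    avg_grad = sum(n * g for n, g in zip(sizes, grads)) / sum(sizes)
    return w - lr * avg_grad


rng = np.random.default_rng(0)
x = rng.standard_normal((64, 8))
y = rng.standard_normal(64)
w = np.zeros(8)

w_parallel = data_parallel_step(w, x, y, num_workers=4)
w_single = w - 0.1 * gradient(w, x, y)
print(np.allclose(w_parallel, w_single))
```

Model parallelism, by contrast, splits the layers or weights of one model across devices rather than splitting the data.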
![Page 102: Andrej Karpathy - 텐서 플로우 블로그 (Tensor · ConvNets are everywhere… Whale recognition, Kaggle Challenge Satellite image analysis Mnih and Hinton, 2010 Galaxy Challenge](https://reader033.vdocuments.net/reader033/viewer/2022050108/5f46325ee93c5d62ff41fb3c/html5/thumbnails/102.jpg)
ConvNets in practice: pre-fetching threads
CPU-disk bottleneck
Hard disk is slow to read from => pre-processed images stored contiguously in files, read as raw byte stream from SSD disk
CPU-GPU bottleneck
CPU data prefetch+augment thread running while GPU performs forward/backward pass
Moving parts lol
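The prefetch idea, where a CPU thread prepares the next batches while the GPU is busy, maps onto a producer/consumer queue. The sketch below uses only the Python standard library. `load_and_augment` and `train_step` are hypothetical stand-ins that sleep instead of doing real I/O or GPU work.

```python
import queue
import threading
import time


def load_and_augment(index):
    # Hypothetical stand-in for reading and augmenting one batch on the CPU.
    time.sleep(0.01)
    return [index] * 4


def train_step(batch):
    # Hypothetical stand-in for the GPU forward/backward pass.
    time.sleep(0.01)
    return sum(batch)


def producer(num_batches, out_queue):
    for i in range(num_batches):
        out_queue.put(load_and_augment(i))  # blocks while the queue is full
    out_queue.put(None)  # sentinel: no more batches


def run(num_batches=8, prefetch_depth=2):
    batches = queue.Queue(maxsize=prefetch_depth)
    worker = threading.Thread(target=producer, args=(num_batches, batches), daemon=True)
    worker.start()
    results = []
    while True:
        batch = batches.get()  # usually ready already, thanks to prefetching
        if batch is None:
            break
        results.append(train_step(batch))
    worker.join()
    return results


print(run(num_batches=4))
```

The bounded queue keeps the loader at most `prefetch_depth` batches ahead, so memory use stays fixed while loading and compute overlap.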
![Page 103: Andrej Karpathy - 텐서 플로우 블로그 (Tensor · ConvNets are everywhere… Whale recognition, Kaggle Challenge Satellite image analysis Mnih and Hinton, 2010 Galaxy Challenge](https://reader033.vdocuments.net/reader033/viewer/2022050108/5f46325ee93c5d62ff41fb3c/html5/thumbnails/103.jpg)
Learn more!
CS231n
- lecture videos on YouTube
- slides
- notes
- assignments
cs231n.stanford.edu
![Page 104: Andrej Karpathy - 텐서 플로우 블로그 (Tensor · ConvNets are everywhere… Whale recognition, Kaggle Challenge Satellite image analysis Mnih and Hinton, 2010 Galaxy Challenge](https://reader033.vdocuments.net/reader033/viewer/2022050108/5f46325ee93c5d62ff41fb3c/html5/thumbnails/104.jpg)
Thank you!