MobileNet - PR044
TRANSCRIPT
![Page 1: MobileNet - PR044](https://reader034.vdocuments.net/reader034/viewer/2022051710/5a650cd17f8b9aa2548b64b5/html5/thumbnails/1.jpg)
MobileNets: Efficient Convolutional Neural Networks for Mobile Vision Applications
29th October, 2017 - PR12 Paper Review
Jinwon Lee, Samsung Electronics
![Page 2: MobileNet - PR044](https://reader034.vdocuments.net/reader034/viewer/2022051710/5a650cd17f8b9aa2548b64b5/html5/thumbnails/2.jpg)
MobileNets
![Page 3: MobileNet - PR044](https://reader034.vdocuments.net/reader034/viewer/2022051710/5a650cd17f8b9aa2548b64b5/html5/thumbnails/3.jpg)
Key Requirements for Commercial Computer Vision Usage
• Data centers (clouds)
  - Rarely safety-critical
  - Low power is nice to have
  - Real-time is preferable
• Gadgets - smartphones, self-driving cars, drones, etc.
  - Usually safety-critical (except smartphones)
  - Low power is a must-have
  - Real-time is required
Slide Credit: Small Deep Neural Networks - Their Advantages, and Their Design by Forrest Iandola
![Page 4: MobileNet - PR044](https://reader034.vdocuments.net/reader034/viewer/2022051710/5a650cd17f8b9aa2548b64b5/html5/thumbnails/4.jpg)
What's the "Right" Neural Network for Use in a Gadget?
• Desirable properties
  - Sufficiently high accuracy
  - Low computational complexity
  - Low energy usage
  - Small model size
Slide Credit : Small Deep Neural Networks - Their Advantages, and Their Design by Forrest Iandola
![Page 5: MobileNet - PR044](https://reader034.vdocuments.net/reader034/viewer/2022051710/5a650cd17f8b9aa2548b64b5/html5/thumbnails/5.jpg)
Why Small Deep Neural Networks?
• Small DNNs train faster on distributed hardware
• Small DNNs are more deployable on embedded processors
• Small DNNs are easily updatable over-the-air (OTA)
Slide Credit : Small Deep Neural Networks - Their Advantages, and Their Design by Forrest Iandola
![Page 6: MobileNet - PR044](https://reader034.vdocuments.net/reader034/viewer/2022051710/5a650cd17f8b9aa2548b64b5/html5/thumbnails/6.jpg)
Techniques for Small Deep Neural Networks
• Remove Fully-Connected Layers
• Kernel Reduction (3x3 → 1x1)
• Channel Reduction
• Evenly Spaced Downsampling
• Depthwise Separable Convolutions
• Shuffle Operations
• Distillation & Compression
Slide Credit : Small Deep Neural Networks - Their Advantages, and Their Design by Forrest Iandola
![Page 7: MobileNet - PR044](https://reader034.vdocuments.net/reader034/viewer/2022051710/5a650cd17f8b9aa2548b64b5/html5/thumbnails/7.jpg)
Key Idea : Depthwise Separable Convolution!
![Page 8: MobileNet - PR044](https://reader034.vdocuments.net/reader034/viewer/2022051710/5a650cd17f8b9aa2548b64b5/html5/thumbnails/8.jpg)
Recap - Convolution Operation
(The slide shows a worked example: a 5x5 input with 3 channels is convolved with 2 filters, each of size 3x3x3, producing one 3x3 output map per filter.)
Input channels: 3, Output channels: 2, # of filters: 2
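The worked example above can be sketched in plain Python (my own illustration, not code from the slides): each of the N filters spans all input channels, so every output value sums over C x K x K products.

```python
# Minimal sketch of a standard multi-channel convolution (valid padding,
# stride 1). Shapes follow the slide's example: a 5x5 input with 3 channels
# and 2 filters of size 3x3x3 produce a 3x3 output map per filter.

def conv2d(inputs, filters):
    """inputs: [C][H][W]; filters: [N][C][K][K] -> output: [N][H-K+1][W-K+1]."""
    C, H, W = len(inputs), len(inputs[0]), len(inputs[0][0])
    N, K = len(filters), len(filters[0][0])
    out = [[[0.0] * (W - K + 1) for _ in range(H - K + 1)] for _ in range(N)]
    for n in range(N):                  # each filter yields one output channel
        for y in range(H - K + 1):
            for x in range(W - K + 1):
                s = 0.0
                for c in range(C):      # a standard filter spans ALL input channels
                    for i in range(K):
                        for j in range(K):
                            s += inputs[c][y + i][x + j] * filters[n][c][i][j]
                out[n][y][x] = s
    return out
```

With an all-ones 5x5x3 input and two all-ones 3x3x3 filters, every output value is 27 (= 3x3x3 products), in a 2x3x3 output.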
![Page 9: MobileNet - PR044](https://reader034.vdocuments.net/reader034/viewer/2022051710/5a650cd17f8b9aa2548b64b5/html5/thumbnails/9.jpg)
Recap - VGG, Inception-v3
• VGG - uses only 3x3 convolutions
  - A stack of 3x3 conv layers has the same effective receptive field as a 5x5 or 7x7 conv layer
  - Deeper means more non-linearities
  - Fewer parameters: 2 x (3 x 3 x C) vs. (5 x 5 x C) - a regularization effect
• Inception-v3 - factorization of filters
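The VGG parameter argument can be checked with a small sketch (my own, not from the slides), assuming C input and C output channels per layer: two stacked 3x3 layers cover a 5x5 receptive field with 18C² weights versus 25C² for a single 5x5 layer.

```python
# Parameter counts for the VGG-style comparison, ignoring biases.
# Assumes every layer maps C input channels to C output channels.

def stacked3x3_params(C, depth=2):
    # each 3x3 layer has C filters of size 3x3xC
    return depth * 3 * 3 * C * C

def single_conv_params(k, C):
    # one kxk layer with C filters of size kxkxC
    return k * k * C * C
```

For C = 64: 2 x 9 x 64² = 73,728 parameters for the stacked pair versus 25 x 64² = 102,400 for a single 5x5 layer.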
![Page 10: MobileNet - PR044](https://reader034.vdocuments.net/reader034/viewer/2022051710/5a650cd17f8b9aa2548b64b5/html5/thumbnails/10.jpg)
Why should we always consider all channels?
![Page 11: MobileNet - PR044](https://reader034.vdocuments.net/reader034/viewer/2022051710/5a650cd17f8b9aa2548b64b5/html5/thumbnails/11.jpg)
Standard Convolution
Depthwise convolution
Figures from http://machinethink.net/blog/googles-mobile-net-architecture-on-iphone/
![Page 12: MobileNet - PR044](https://reader034.vdocuments.net/reader034/viewer/2022051710/5a650cd17f8b9aa2548b64b5/html5/thumbnails/12.jpg)
Depthwise Separable Convolution
• Depthwise Convolution + Pointwise Convolution (1x1 convolution)
Depthwise convolution → Pointwise convolution
Figures from http://machinethink.net/blog/googles-mobile-net-architecture-on-iphone/
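The two stages can be sketched in plain Python (my own illustration, mirroring the figures): the depthwise stage filters each channel independently, and the pointwise (1x1) stage mixes channels.

```python
# Depthwise separable convolution as two stages (valid padding, stride 1).

def depthwise_conv(inputs, dw_filters):
    """inputs: [C][H][W]; dw_filters: [C][K][K] -> [C][H-K+1][W-K+1]."""
    C, H, W = len(inputs), len(inputs[0]), len(inputs[0][0])
    K = len(dw_filters[0])
    out = []
    for c in range(C):  # one KxK filter per channel — no cross-channel mixing
        plane = [[sum(inputs[c][y + i][x + j] * dw_filters[c][i][j]
                      for i in range(K) for j in range(K))
                  for x in range(W - K + 1)] for y in range(H - K + 1)]
        out.append(plane)
    return out

def pointwise_conv(inputs, pw_weights):
    """inputs: [C][H][W]; pw_weights: [N][C] (1x1 filters) -> [N][H][W]."""
    C, H, W = len(inputs), len(inputs[0]), len(inputs[0][0])
    N = len(pw_weights)
    return [[[sum(pw_weights[n][c] * inputs[c][y][x] for c in range(C))
              for x in range(W)] for y in range(H)] for n in range(N)]
```

Composing the two on an all-ones 5x5x3 input with all-ones 3x3 depthwise filters gives 9 per position, and a 1x1 filter with weights (1, 1, 1) then sums the three channels to 27 — the same value the standard convolution would produce here, but computed with far fewer multiplies.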
![Page 13: MobileNet - PR044](https://reader034.vdocuments.net/reader034/viewer/2022051710/5a650cd17f8b9aa2548b64b5/html5/thumbnails/13.jpg)
Standard Convolution vs Depthwise Separable Convolution
![Page 14: MobileNet - PR044](https://reader034.vdocuments.net/reader034/viewer/2022051710/5a650cd17f8b9aa2548b64b5/html5/thumbnails/14.jpg)
Standard Convolution vs Depthwise Separable Convolution
• Standard convolutions have a computational cost of
  DK x DK x M x N x DF x DF
• Depthwise separable convolutions cost
  DK x DK x M x DF x DF + M x N x DF x DF
• Reduction in computation: 1/N + 1/DK²
• If we use 3x3 depthwise separable convolutions, we get 8 to 9 times fewer computations
DK: width/height of filters; DF: width/height of feature maps; M: number of input channels; N: number of output channels (number of filters)
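The two cost formulas above can be compared directly (a sketch of my own; the layer sizes below are illustrative, not from the paper):

```python
def standard_cost(DK, M, N, DF):
    # DK·DK·M·N·DF·DF multiply-adds for a standard convolution layer
    return DK * DK * M * N * DF * DF

def separable_cost(DK, M, N, DF):
    # depthwise part (DK·DK·M·DF·DF) + pointwise 1x1 part (M·N·DF·DF)
    return DK * DK * M * DF * DF + M * N * DF * DF
```

Dividing the two costs, the DF² and M factors cancel, leaving exactly 1/N + 1/DK². For DK = 3 and, say, N = 256, that is about 0.115, i.e. roughly 8.7x fewer operations — the "8 to 9 times" figure on the slide.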
![Page 15: MobileNet - PR044](https://reader034.vdocuments.net/reader034/viewer/2022051710/5a650cd17f8b9aa2548b64b5/html5/thumbnails/15.jpg)
Depthwise Separable Convolutions
![Page 16: MobileNet - PR044](https://reader034.vdocuments.net/reader034/viewer/2022051710/5a650cd17f8b9aa2548b64b5/html5/thumbnails/16.jpg)
Model Structure
![Page 17: MobileNet - PR044](https://reader034.vdocuments.net/reader034/viewer/2022051710/5a650cd17f8b9aa2548b64b5/html5/thumbnails/17.jpg)
Width Multiplier & Resolution Multiplier
• Width Multiplier - Thinner Models
  - For a given layer and width multiplier α, the number of input channels M becomes αM and the number of output channels N becomes αN, where α ∈ (0, 1] with typical settings of 1, 0.75, 0.5 and 0.25
• Resolution Multiplier - Reduced Representation
  - The second hyper-parameter to reduce the computational cost of a neural network is a resolution multiplier ρ
  - 0 < ρ ≤ 1, typically set implicitly so that the input resolution of the network is 224, 192, 160 or 128 (ρ = 1, 0.857, 0.714, 0.571)
• Computational cost:
  DK x DK x αM x ρDF x ρDF + αM x αN x ρDF x ρDF
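The multiplied-out cost formula can be sketched as a function (my own helper, with illustrative layer sizes): the resolution multiplier scales the cost exactly by ρ², while the width multiplier scales it roughly by α², since the pointwise term, which dominates, is quadratic in α.

```python
def mobilenet_layer_cost(DK, M, N, DF, alpha=1.0, rho=1.0):
    # Cost of one depthwise separable layer with width multiplier alpha
    # and resolution multiplier rho, per the slide's formula:
    # DK·DK·(αM)·(ρDF)² + (αM)·(αN)·(ρDF)²
    M_, N_, DF_ = alpha * M, alpha * N, rho * DF
    return DK * DK * M_ * DF_ * DF_ + M_ * N_ * DF_ * DF_
```

For example, with DK = 3, M = N = 512, DF = 14, setting α = 0.5 cuts the cost to just over a quarter of the baseline (not exactly a quarter, because the depthwise term scales only linearly in α), while ρ = 0.5 cuts it to exactly a quarter.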
![Page 18: MobileNet - PR044](https://reader034.vdocuments.net/reader034/viewer/2022051710/5a650cd17f8b9aa2548b64b5/html5/thumbnails/18.jpg)
Width Multiplier & Resolution Multiplier
![Page 19: MobileNet - PR044](https://reader034.vdocuments.net/reader034/viewer/2022051710/5a650cd17f8b9aa2548b64b5/html5/thumbnails/19.jpg)
Experiments – Model Choices
![Page 20: MobileNet - PR044](https://reader034.vdocuments.net/reader034/viewer/2022051710/5a650cd17f8b9aa2548b64b5/html5/thumbnails/20.jpg)
Model Shrinking Hyperparameters
![Page 21: MobileNet - PR044](https://reader034.vdocuments.net/reader034/viewer/2022051710/5a650cd17f8b9aa2548b64b5/html5/thumbnails/21.jpg)
Model Shrinking Hyperparameters
![Page 22: MobileNet - PR044](https://reader034.vdocuments.net/reader034/viewer/2022051710/5a650cd17f8b9aa2548b64b5/html5/thumbnails/22.jpg)
Results
![Page 23: MobileNet - PR044](https://reader034.vdocuments.net/reader034/viewer/2022051710/5a650cd17f8b9aa2548b64b5/html5/thumbnails/23.jpg)
Results
PlaNet: 52M parameters, 5.74B mult-adds
MobileNet: 13M parameters, 0.58B mult-adds
![Page 24: MobileNet - PR044](https://reader034.vdocuments.net/reader034/viewer/2022051710/5a650cd17f8b9aa2548b64b5/html5/thumbnails/24.jpg)
Results
![Page 25: MobileNet - PR044](https://reader034.vdocuments.net/reader034/viewer/2022051710/5a650cd17f8b9aa2548b64b5/html5/thumbnails/25.jpg)
Results
![Page 26: MobileNet - PR044](https://reader034.vdocuments.net/reader034/viewer/2022051710/5a650cd17f8b9aa2548b64b5/html5/thumbnails/26.jpg)
TensorFlow Implementation
https://github.com/Zehaos/MobileNet/blob/master/nets/mobilenet.py