spp-netimlab.postech.ac.kr/dkim/class/csed514_2019s/sppnet.pdf · - 20-60x faster than r-cnn, as...
SPP-net: Spatial Pyramid Pooling in Deep Convolutional Networks
Highlights
• ILSVRC 2014 (all provided-data tracks)
• DET - 2nd
• CLS - 3rd
• LOC - 5th
• ECCV 2014 paper
• Published 2 months ago (arXiv:1406.4729v1, June 18)
• Details disclosed (arXiv:1406.4729v2)
Overview
• SPP-net - a new network structure
• Classification - improves all CNNs
• Detection - 20-60x faster than R-CNN, as accurate
Spatial Pyramid Matching
• SPM: very successful in traditional computer vision
  • [Grauman & Darrell, ICCV 2005] “The Pyramid Match Kernel: Discriminative Classification with Sets of Image Features”
  • [Lazebnik et al, CVPR 2006] “Beyond Bags of Features: Spatial Pyramid Matching for Recognizing Natural Scene Categories”
[Figure: traditional pipeline: dense SIFT → encoded (VQ, SC, FV) → SPM → SVM → prediction]
[Figure: CNN counterparts: “conv layers” → simply pooling? → “fc layers”]
SPP-net: SPM in CNN
[Figure: traditional CNN: fixed size → conv → fc (4096, 4096, 1000)]
[Figure: SPP-net: any size → conv → spatial pyramid pooling → fc (4096, 4096, 1000)]
• Fix bin numbers
• DO NOT fix bin size
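The two bullets above can be sketched in a few lines of plain Python. This is a minimal illustration, not the authors' implementation: the function name `spp_max_pool`, the pyramid levels `(1, 2, 4)`, and the floor/ceil bin boundaries are assumptions chosen for clarity. The number of bins per level is fixed, while each bin's size scales with the feature map.

```python
import math

def spp_max_pool(fmap, levels=(1, 2, 4)):
    """Max-pool a feature map into a fixed-length vector.

    fmap: list of channels, each a 2D list (H x W) of numbers.
    levels: assumed pyramid levels; level n uses n x n bins.
    """
    out = []
    for n in levels:                      # fixed number of bins per level
        for channel in fmap:
            h, w = len(channel), len(channel[0])
            for i in range(n):
                # bin size is NOT fixed: boundaries scale with h and w
                r0, r1 = (i * h) // n, math.ceil((i + 1) * h / n)
                for j in range(n):
                    c0, c1 = (j * w) // n, math.ceil((j + 1) * w / n)
                    out.append(max(channel[r][c]
                                   for r in range(r0, r1)
                                   for c in range(c0, c1)))
    return out

# Two single-channel feature maps of different sizes
small = [[[float(r + c) for c in range(7)] for r in range(10)]]
large = [[[float(r * 13 + c) for c in range(13)] for r in range(13)]]

# Both vectors have 1 + 4 + 16 = 21 entries, so the same fc layers fit either
print(len(spp_max_pool(small)), len(spp_max_pool(large)))
```

Because the output length depends only on the levels and the channel count, the fully connected layers that follow never see the input size.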
SPP-net
• variable input size/scale
  • multi-size training
  • multi-scale testing
  • full-image view
• multi-level pooling
  • robust to deformation
• operates on feature maps
  • pooling in regions
[Figure: input image → conv layers → conv feature maps → spatial pyramid pooling layer (concatenate) → fc layers]
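As a worked example of the concatenation step, the length of the pooled vector depends only on the pyramid levels and the number of filters in the last conv layer. The values below (a 3-level pyramid with 4×4, 2×2 and 1×1 bins, and k = 256 filters) are assumptions taken from the paper's illustration, not from this slide:

```latex
\dim(\text{SPP output}) = k \sum_{l} n_l^{2} = 256 \times (4^2 + 2^2 + 1^2) = 256 \times 21 = 5376
```

The input height and width do not appear in the formula, which is why the fc layers can stay fixed while the input size varies.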
ILSVRC top-5 val error (%), 10-view, 4 architectures
[Figure: bar chart, one group per architecture; bar labels as extracted]

| | ZF-5 | Convnet*-5 | Overfeat-5 | Overfeat-7 |
|---|---|---|---|---|
| no-SPP baselines | 14.76 | 13.92 | 13.52 | 11.97 |
| multi-level pooling | 14.14 | 13.54 | 12.80 | 11.12 |
| + multi-size training | 13.64 | 13.33 | 12.33 | 10.95 |

All CNNs improved!
ILSVRC 2014 CLS Results
• “shallow”
  • 7-conv, 1 Titan GPU, 3 weeks
• but potential
  • SPP can improve deeper nets: >1% gain post-competition

| team | top-5 test error (%) |
|---|---|
| GoogLeNet | 6.66 |
| Oxford VGG | 7.32 |
| ours | 8.06 |
| Howard | 8.11 |
| DeeperVision | 9.50 |
| NUS-BST | 9.79 |
| TTIC_ECP | 10.22 |
| … | … |

• 7-conv SPP-net, 10-view: 10.95%
• 7-conv SPP-net, multi-scale 96-view + 2 full-view: 9.08%
• multiple SPP-nets: 8.06%
Detection: SPP on Regions
[Figure: input image → conv layers → conv feature maps → region → SPP → fc layers]
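The figure above can be sketched as follows: the conv layers run once per image, and each candidate region is pooled from the shared feature map. This is a rough illustration under stated assumptions, not the authors' code. The stride of 16, the floor/ceil projection in `project_box`, the single 2×2 pooling level, and the toy 8×8 feature map are all hypothetical choices.

```python
import math

def project_box(box, stride=16):
    """Map an image-space box (x0, y0, x1, y1) to a half-open
    feature-map window. The stride and rounding are assumptions."""
    x0, y0, x1, y1 = box
    fx0, fy0 = x0 // stride, y0 // stride
    fx1 = max(fx0 + 1, math.ceil(x1 / stride))
    fy1 = max(fy0 + 1, math.ceil(y1 / stride))
    return fx0, fy0, fx1, fy1

def pool_window(channel, window, n=2):
    """Max-pool one feature-map window into an n x n grid of bins."""
    fx0, fy0, fx1, fy1 = window
    h, w = fy1 - fy0, fx1 - fx0
    out = []
    for i in range(n):
        r0 = fy0 + (i * h) // n
        r1 = fy0 + math.ceil((i + 1) * h / n)
        for j in range(n):
            c0 = fx0 + (j * w) // n
            c1 = fx0 + math.ceil((j + 1) * w / n)
            out.append(max(channel[r][c]
                           for r in range(r0, r1)
                           for c in range(c0, c1)))
    return out

# Stand-in for ONE conv pass over the full image: a toy 8x8 feature map
fmap = [[r * 8 + c for c in range(8)] for r in range(8)]

# Many regions reuse the same feature map; no per-region conv pass
regions = [(0, 0, 64, 64), (32, 16, 128, 96)]
features = [pool_window(fmap, project_box(b)) for b in regions]
print(features)
```

Each region yields a vector of the same length regardless of its size, so the same fc layers can score every region.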
R-CNN vs. SPP
• image regions vs. feature map regions
[Figure: R-CNN: image → 2000 nets on image regions → one feature per region]
[Figure: SPP-net: image → 1 net on full image → feature map → one feature per region]
• With regional features, we can do everything R-CNN does
  • fine-tune, SVM, bbox regression…
• similar accuracy, much faster
| VOC2007 | SPP-net 1-scale | SPP-net 5-scale | R-CNN |
|---|---|---|---|
| mAP | 58.0 | 59.2 | 58.5 |
| GPU time / img | 0.14s | 0.38s | 9s |
| speed-up | 64x | 24x | - |
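The speed-up figures follow directly from the per-image GPU times. A quick check, using only the numbers on the slide (the variable names are illustrative):

```python
# GPU time per image in seconds, as reported on the slide (VOC2007)
rcnn_time = 9.0
spp_times = {"SPP-net 1-scale": 0.14, "SPP-net 5-scale": 0.38}

# Speed-up = R-CNN time divided by SPP-net time
speedups = {name: rcnn_time / t for name, t in spp_times.items()}

for name, s in speedups.items():
    print(f"{name}: {s:.1f}x")
```

Rounded to whole numbers, the two ratios give the 64x and 24x quoted in the table.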
ILSVRC 2014 DET Results
“provided data” track

| team | mAP (%) |
|---|---|
| NUS | 37.2 |
| ours, multi SPP-nets | 35.1 |
| UvA | 32.0 |
| ours, 1 SPP-net | 31.8 |
| Southeast-CASIA | 30.4 |
| 1-HKUST | 28.8 |
| CASIA_CRIPAC_2 | 28.6 |

Cost of a single model:

| | SPP-net | R-CNN |
|---|---|---|
| GPU time / img | 0.6s | 32s |
| 40k test imgs | 8 hours | 15 days |
• Conclusion
  • SPM in CNNs
  • CLS: improve all CNNs in the literature
  • DET: practical, fast, and accurate
• Future work
  • SPP on advanced networks
• Resources
  • code, config, tech report… http://research.microsoft.com/en-us/um/people/kahe/
• Acknowledgement
  • We thank NVIDIA for the GPU donation.