Fast R-CNN (web.cs.ucdavis.edu/~yjlee/teaching/ecs289g-fall2015/charlie1.pdf)
TRANSCRIPT
Fast R-CNN
Author: Ross Girshick
Speaker: Charlie Liu
Date: Oct 13th
Girshick, R. (2015). Fast R-CNN. arXiv preprint arXiv:1504.08083. ECS 289G 001 Paper Presentation, Prof. Lee
Result¹
[Bar chart: Accuracy (mAP, axis 60%–67%), R-CNN vs. Fast R-CNN]
Girshick. Fast R-CNN; Girshick et al. Rich feature hierarchies for accurate object detection and semantic segmentation
1. Dataset: PASCAL VOC 2012
[Bar chart: Training Speed-up (axis 0–10x), R-CNN vs. Fast R-CNN]
[Bar chart: Testing Speed-up (axis 0–250x), R-CNN vs. Fast R-CNN]
Object Detection after R-CNN
R-CNN
• 66.0% mAP¹
• 1x test speed (baseline)
SPPnet
• 63.1% mAP
• 24x faster than R-CNN
Fast R-CNN
• 66.6% mAP
• 10x faster than SPPnet
Girshick. Fast R-CNN; He et al. Spatial pyramid pooling in deep convolutional networks for visual recognition; Girshick et al. Rich feature hierarchies for accurate object detection and semantic segmentation
1. mAP based on PASCAL VOC 2007, results from Girshick
R-CNN
Girshick et al. Rich feature hierarchies for accurate object detection and semantic segmentation
R-CNN Limitations
• Too slow
• 13s/image on a GPU
• 53s/image on a CPU
• 7x slower with VGG-Net
Girshick et al. Rich feature hierarchies for accurate object detection and semantic segmentation
R-CNN Limitations
• Too slow
• Proposals need to be warped to a fixed size
• Potential loss of accuracy
Girshick et al. Rich feature hierarchies for accurate object detection and semantic segmentation
R-CNN Limitations
• Cropping may lose the object’s information
• Warping may change the object’s appearance
He et al. Spatial Pyramid Pooling in Deep Convolutional Networks for Visual Recognition
[Pipeline: Image → Crop/Warp → Conv layers → FC layers → Output]
SPPnet: Motivation
• SPP: Spatial Pyramid Pooling
He et al. Spatial Pyramid Pooling in Deep Convolutional Networks for Visual Recognition
[Pipeline: Image → Conv layers → SPP layer → FC layers → Output]
SPPnet: Motivation
He et al. Spatial Pyramid Pooling in Deep Convolutional Networks for Visual Recognition
[Pipeline: Image → Conv layers → SPP layer → FC layers → Output]
• SPP: Spatial Pyramid Pooling
• Arbitrary-size input
• Fixed-size output
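As a concrete illustration (a minimal NumPy sketch, not the authors' code): whatever the spatial size of the conv feature map, max-pooling over a pyramid of grids yields a vector of fixed length.

```python
import numpy as np

def spp(feature_map, levels=(1, 2, 4)):
    """Spatial pyramid pooling: max-pool a (C, H, W) feature map over an
    n x n grid for each pyramid level n, then concatenate. The output
    length depends only on C and the levels, never on H or W."""
    C, H, W = feature_map.shape
    pooled = []
    for n in levels:
        ys = np.linspace(0, H, n + 1).astype(int)  # row bin edges
        xs = np.linspace(0, W, n + 1).astype(int)  # column bin edges
        for i in range(n):
            for j in range(n):
                cell = feature_map[:, ys[i]:ys[i + 1], xs[j]:xs[j + 1]]
                pooled.append(cell.max(axis=(1, 2)))
    return np.concatenate(pooled)

# Different input sizes, identical output length: C * (1 + 4 + 16)
a = spp(np.random.rand(256, 13, 13))
b = spp(np.random.rand(256, 24, 31))
assert a.shape == b.shape == (256 * 21,)
```

The level sizes `(1, 2, 4)` are illustrative; the point is only that the concatenated length is a function of the pyramid, so the FC layers downstream see a fixed-size input.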
SPPnet: Motivation
• R-CNN: 2000 nets on 1 image vs. SPPnet: 1 net on 1 image
He et al. Spatial Pyramid Pooling in Deep Convolutional Networks for Visual Recognition
[Pipeline: Image → Conv layers → SPP layer → FC layers → Output]
SPPnet: Motivation
He et al. Spatial Pyramid Pooling in Deep Convolutional Networks for Visual Recognition
[Pipeline: Image → Conv layers → SPP layer → FC layers → Output]
• SPP Detection
SPPnet: Result
VOC 2007¹

|                | R-CNN | SPPnet 1-scale | SPPnet 5-scale |
|----------------|-------|----------------|----------------|
| mAP            | 58.5  | 58.0           | 59.2           |
| GPU time / img | 9 s   | 0.14 s         | 0.38 s         |
| Speed-up²      | 1x    | 64x            | 24x            |

He et al. Spatial Pyramid Pooling in Deep Convolutional Networks for Visual Recognition
1. mAP based on PASCAL VOC 2007, results from He et al.
2. Speed-up refers to testing speed.
SPPnet: Limitations
• High memory consumption
[Diagram: Conv5 feature maps are computed once per image and stored for reuse]
He et al. Spatial Pyramid Pooling in Deep Convolutional Networks for Visual Recognition; Liliang Zhang, Detection: From R-CNN to Fast R-CNN
SPPnet: Limitations
• High memory consumption
• Training is inefficient
• Inefficient back-propagation through SPP
He et al. Spatial Pyramid Pooling in Deep Convolutional Networks for Visual Recognition
SPPnet: Limitations
• High memory consumption
• Training is inefficient
• Inefficient back-propagation through SPP
• Multi-stage pipeline
He et al. Spatial Pyramid Pooling in Deep Convolutional Networks for Visual Recognition
[Pipeline: Conv layers → FC layers → Classifier / Regressor, trained in separate stages]
Fast R-CNN: What’s New
Girshick et al. Rich feature hierarchies for accurate object detection and semantic segmentation; Girshick. Fast R-CNN
Fast R-CNN: What’s New
Girshick. Fast R-CNN
• RoI Pooling Layer
• ≈ a one-scale SPP layer
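Following the "≈ one-scale SPP" remark, a NumPy sketch of RoI pooling (illustrative, not the paper's implementation): a single-level SPP grid applied to one region of the shared feature map.

```python
import numpy as np

def roi_pool(feature_map, roi, out_hw=(7, 7)):
    """RoI pooling: max-pool one region of a shared (C, H, W) feature map
    into a fixed out_hw grid -- effectively a one-scale SPP layer applied
    to just that region. roi = (x0, y0, x1, y1) in feature-map coords."""
    x0, y0, x1, y1 = roi
    region = feature_map[:, y0:y1, x0:x1]
    C, h, w = region.shape
    oh, ow = out_hw
    ys = np.linspace(0, h, oh + 1).astype(int)  # row bin edges
    xs = np.linspace(0, w, ow + 1).astype(int)  # column bin edges
    out = np.empty((C, oh, ow))
    for i in range(oh):
        for j in range(ow):
            out[:, i, j] = region[:, ys[i]:ys[i + 1],
                                     xs[j]:xs[j + 1]].max(axis=(1, 2))
    return out

# RoIs of different sizes all map to the same fixed 7x7 grid:
fm = np.random.rand(256, 40, 60)
assert roi_pool(fm, (5, 10, 33, 38)).shape == (256, 7, 7)
assert roi_pool(fm, (0, 0, 60, 40)).shape == (256, 7, 7)
```

Because every RoI comes out the same size, all proposals in an image can share one conv forward pass and still feed the same FC layers.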
Fast R-CNN: What’s New
Girshick. Fast R-CNN
• Sibling Classifier & Regressor Layers
Fast R-CNN: What’s New
Girshick. Fast R-CNN; Liliang Zhang, Detection: From R-CNN to Fast R-CNN
• Joint training – a one-stage framework
• Trains the feature extractor, classifier, and regressor jointly
Fast R-CNN: Advantages
• One fine-tuning stage
• Jointly optimizes the classifier and bbox regressor
• Loss function: L = L_classifier + λ·L_regressor
Girshick. Fast R-CNN
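The combined loss can be sketched in a few lines of NumPy (a minimal version; the smooth-L1 box term and the background convention follow the paper, but the function names and inputs here are illustrative):

```python
import numpy as np

def smooth_l1(x):
    """Smooth L1: quadratic near zero, linear beyond |x| = 1."""
    ax = np.abs(x)
    return np.where(ax < 1, 0.5 * x ** 2, ax - 0.5).sum()

def multitask_loss(cls_probs, u, bbox_pred, bbox_target, lam=1.0):
    """L = L_classifier + lam * [u >= 1] * L_regressor, where u is the
    true class (0 = background, which gets no box-regression loss)."""
    l_cls = -np.log(cls_probs[u])  # log loss on the softmax output
    l_reg = smooth_l1(bbox_pred - bbox_target) if u >= 1 else 0.0
    return l_cls + lam * l_reg

# A confident, correctly localized RoI incurs almost no loss:
probs = np.array([0.05, 0.9, 0.05])    # softmax over 3 classes
t = np.array([0.1, 0.2, 0.0, -0.1])    # predicted box offsets
loss = multitask_loss(probs, u=1, bbox_pred=t, bbox_target=t)
# here loss == -log(0.9), since the box term vanishes when pred == target
```

Because both terms are differentiable functions of the shared features, one SGD step updates the classifier, the regressor, and the conv layers together, which is what makes the single fine-tuning stage possible.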
Fast R-CNN: Advantages
• One fine-tuning stage
• Fast training
• Uses mini-batch stochastic gradient descent
• E.g. 2 images × 64 RoIs each
• 64x faster than 128 images × 1 RoI each (the sampling strategy of R-CNN and SPPnet)
Girshick. Fast R-CNN
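The mini-batch construction above can be sketched as follows (illustrative names throughout: `dataset` is assumed to map image ids to lists of (roi, label) pairs, and the 25% foreground fraction matches the paper's setting):

```python
import random

def sample_minibatch(dataset, n_images=2, rois_per_image=64, fg_fraction=0.25):
    """Hierarchical sampling: draw a few images, then many RoIs from each,
    so all RoIs from one image share a single conv forward/backward pass."""
    batch = []
    for img_id in random.sample(sorted(dataset), n_images):
        fg = [r for r in dataset[img_id] if r[1] >= 1]  # object RoIs
        bg = [r for r in dataset[img_id] if r[1] == 0]  # background RoIs
        n_fg = min(int(rois_per_image * fg_fraction), len(fg))
        rois = random.sample(fg, n_fg) + random.sample(bg, rois_per_image - n_fg)
        batch.append((img_id, rois))
    return batch
```

With 2 images × 64 RoIs, the 128 training RoIs need only 2 conv passes instead of 128, which is where the claimed ~64x saving in conv computation comes from.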
• One fine-tuning stage
• Fast training
• Efficient back-propagation
Girshick. Fast R-CNN; Liliang Zhang, Detection: From R-CNN to Fast R-CNN
Fast R-CNN: Advantages
Fast R-CNN: Advantages
• One fine-tuning stage
• Fast training
• Efficient back-propagation
• Scale invariance
• In practice, a single scale is good enough
• Single scale: 10x faster than SPPnet
Girshick. Fast R-CNN; Liliang Zhang, Detection: From R-CNN to Fast R-CNN
[Diagram: Conv5 feature maps computed once per image and shared]
Fast R-CNN: Advantages
• One fine-tuning stage
• Fast training
• Efficient back-propagation
• Scale invariance
• Fast detection
• Truncated SVD¹: W ≈ U·Σ_t·V^T
• A single FC layer → two FC layers
• Reduces parameters² from uv to t(u + v)
Girshick. Fast R-CNN; Liliang Zhang, Detection: From R-CNN to Fast R-CNN
1. U: size (u × t); Σ_t: size (t × t); V: size (v × t)
2. In practice, t ≪ min(u, v)
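The factorization can be demonstrated with NumPy (sizes here are small and illustrative, not the network's actual FC dimensions):

```python
import numpy as np

def compress_fc(W, t):
    """Truncated SVD: W (u x v) ~= U_t @ Sigma_t @ V_t^T. The single FC
    layer y = W x becomes two smaller layers: first (Sigma_t V_t^T) of
    shape t x v, then U_t of shape u x t."""
    U, s, Vt = np.linalg.svd(W, full_matrices=False)
    W1 = np.diag(s[:t]) @ Vt[:t]  # first layer weights, t x v
    W2 = U[:, :t]                 # second layer weights, u x t
    return W1, W2

u, v, t = 300, 500, 40
W = np.random.rand(u, v)
W1, W2 = compress_fc(W, t)
assert W1.size + W2.size == t * (u + v)  # uv = 150000 -> t(u+v) = 32000
x = np.random.rand(v)
y = W2 @ (W1 @ x)                        # two cheap matmuls replace W @ x
assert y.shape == (u,)
```

Since detection time is dominated by the FC layers once conv features are shared, shrinking uv to t(u + v) is what makes per-image test time drop.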
Fast R-CNN: Result¹
[Bar chart: Training Speed-up (axis 0–10x), R-CNN vs. SPPnet vs. Fast R-CNN]

|                  | R-CNN | SPPnet | Fast R-CNN |
|------------------|-------|--------|------------|
| Train time (h)   | 84    | 25     | 9.5        |
| Train speed-up   | 1x    | 3.4x   | 8.8x       |
| Test rate (s/im) | 47.0  | 2.3    | 0.22       |
| Test speed-up    | 1x    | 20x    | 213x       |
| VOC07 mAP        | 66.0% | 63.1%  | 66.6%      |

Results using VGG16
Girshick. Fast R-CNN
1. Results from Girshick
[Bar chart: Testing Speed-up (axis 0–250x), R-CNN vs. SPPnet vs. Fast R-CNN]
[Bar chart: Accuracy (mAP, axis 61%–67%), R-CNN vs. SPPnet vs. Fast R-CNN]
Fast R-CNN: Discussions
• Which layers to fine-tune?
• Fine-tuning conv1: overruns GPU memory.
• Fine-tuning from conv2: mAP +0.3%, 1.3x slower.
• Fine-tune conv3 and up.
Girshick. Fast R-CNN
Fast R-CNN: Discussions
• Which layers to fine-tune? conv3 and up.
• Does multi-task training help?
• Multi-task training has the potential to improve results
• Tasks influence each other through a shared representation
• mAP +0.8% to +1.1%
Girshick. Fast R-CNN
Fast R-CNN: Discussions
• Which layers to fine-tune? conv3 and up.
• Does multi-task training help? Yes, mAP +0.8% to 1.1%.
• Scale invariance: single scale or multi-scale?
• Single scale: faster
• Multi-scale: more accurate
• Best trade-off: single scale
Girshick. Fast R-CNN
Fast R-CNN: Discussions
• Which layers to fine-tune? conv3 and up.
• Does multi-task training help? Yes, mAP +0.8% to 1.1%.
• Scale invariance: single scale or multi scale? Single.
• Do we need more training data?
• VOC07 mAP: 66.9% → 70% when augmented with the VOC12 dataset.
Girshick. Fast R-CNN
Fast R-CNN: Discussions
• Which layers to fine-tune? conv3 and up.
• Does multi-task training help? Yes, mAP +0.8% to 1.1%.
• Scale invariance: single scale or multi scale? Single.
• Do we need more training data? Yes.
• Do SVMs outperform softmax?
• SVM: per-class yes/no; softmax: one vs. all.
• Softmax: mAP +0.1% to +0.8%, with competition between classes.
Girshick. Fast R-CNN
Fast R-CNN: Discussions
• Which layers to fine-tune? conv3 and up.
• Does multi-task training help? Yes, mAP +0.8% to 1.1%.
• Scale invariance: single scale or multi scale? Single.
• Do we need more training data? Yes.
• Do SVMs outperform softmax? Softmax, mAP +0.1% to 0.8%.
• Are more proposals always better?
Girshick. Fast R-CNN
Fast R-CNN: Discussions
• Which layers to fine-tune? conv3 and up.
• Does multi-task training help? Yes, mAP +0.8% to 1.1%.
• Scale invariance: single scale or multi scale? Single.
• Do we need more training data? Yes.
• Do SVMs outperform softmax? Softmax, mAP +0.1% to 0.8%.
• Are more proposals always better? No.
Girshick. Fast R-CNN
Questions? Thanks!
Girshick, R. (2015). Fast R-CNN. arXiv preprint arXiv:1504.08083. ECS 289G 001 Paper Presentation, Oct 13th