Adaptive Object Detection Using Adjacency and Zoom Prediction
Yongxi Lu, Tara Javidi, Svetlana Lazebnik
Slides by Míriam Bellver
Computer Vision Group Reading Group, June 21st, 2016
Introduction
Object detection algorithm
Region proposals: used to reduce the number of regions evaluated by the detector
Detector: labels the proposed regions
Efficient region proposals: learned end-to-end with a DNN (e.g., Faster R-CNN)
1) Train class-independent regressors on a small set of predefined anchors.
MultiBox: 800 anchors from clustering; YOLO: 7x7 grid; RPN: overlapping sliding windows
2) Each anchor decides whether there is an object and predicts a bounding box.
Limitation: test-time anchors are not adaptive to the actual content of the images.
Target: adaptive search strategy
ADAPTIVE ANCHORS
1. Start with the entire image.
2. Recursively divide the image into subregions until a given region is unlikely to enclose small objects. The decision is based on the features of the current region.
Anchors? All visited regions serve as anchors and are used to predict bounding boxes.
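The two steps above can be sketched as a simple recursive search. This is only an illustrative sketch under stated assumptions: the quadrant split in `subdivide`, the `predict` callback (standing in for the network), the threshold values, and `max_depth` are all hypothetical and are not taken from the slides.

```python
from typing import Callable, List, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2)
Scored = Tuple[Box, float]               # (box, confidence score)


def subdivide(region: Box) -> List[Box]:
    """Split a region into four equal quadrants (assumed subdivision scheme)."""
    x1, y1, x2, y2 = region
    xm, ym = (x1 + x2) / 2, (y1 + y2) / 2
    return [(x1, y1, xm, ym), (xm, y1, x2, ym),
            (x1, ym, xm, y2), (xm, ym, x2, y2)]


def adaptive_search(
    image_box: Box,
    predict: Callable[[Box], Tuple[float, List[Scored]]],
    zoom_threshold: float = 0.5,   # hypothetical value
    score_threshold: float = 0.5,  # hypothetical value
    max_depth: int = 3,            # hypothetical safety limit
) -> Tuple[List[Scored], List[Box]]:
    """Return (proposals, visited anchors), starting from the whole image."""
    proposals: List[Scored] = []
    anchors: List[Box] = []
    stack = [(image_box, 0)]
    while stack:
        region, depth = stack.pop()
        anchors.append(region)  # every visited region acts as an anchor
        zoom, candidates = predict(region)
        # Keep the boxes predicted from this anchor that score high enough.
        proposals.extend((b, s) for b, s in candidates if s > score_threshold)
        # Zoom in only where small objects are likely to be enclosed.
        if zoom > zoom_threshold and depth < max_depth:
            stack.extend((sub, depth + 1) for sub in subdivide(region))
    return proposals, anchors


def toy_predict(region: Box) -> Tuple[float, List[Scored]]:
    """Toy stand-in for the network: zoom only on regions at least 100 wide."""
    x1, _, x2, _ = region
    zoom = 0.9 if (x2 - x1) >= 100 else 0.1
    return zoom, [(region, 0.8)]


proposals, anchors = adaptive_search((0, 0, 100, 100), toy_predict)
```

With the toy predictor, only the whole image triggers a subdivision, so the search visits the image plus its four quadrants and stops.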
Object proposals
AZ-Net
Contributions
● Adaptively focuses computational resources on the objects in the image
● Evaluated on PASCAL VOC 2007 and MSCOCO, with performance similar to Fast R-CNN and Faster R-CNN while using fewer anchors
Accuracy: same as Faster R-CNN
Regions analyzed: Two orders of magnitude fewer anchors on average
Previous Work
● Adaptive Object Detection
e.g., Active Object Localization with Deep Reinforcement Learning
● Use of anchor regions for proposal generation or detection
1. A regression technique generates bounding boxes from anchors
2. Compared to other approaches, regions are generated adaptively
3. They compare to Faster R-CNN
Comparison to Faster R-CNN
Design of the algorithm
Adaptive Search (AZ-Net): produces class-independent object proposals
Object Detector (Fast R-CNN detector): produces class-wise detections
AZ-Net
Feature extraction of the region seen
Adjacency predictions with score: score > threshold ---- > OBJECT PROPOSALS
Zoom indicator: indicator > threshold ---- > SUBDIVIDE REGION
AZ-Net: Zoom indicator
Reasoning: we should zoom in to a region when doing so substantially increases the chance of detection
AZ-Net: Adjacency Predictions
The predictions are based on sub-region priors
Implementation
We input 11 adjacency predictions per anchor:
whole image + adjacency predictions
1) Sample regions from the image
2) The region samples should contain hard positives and hard negatives
3) The sample-label pairs are used to train the network with SGD
Region sampling and Labeling
11 prior regions that cover the full ground truth are computed per object
Training of the AZ-Net
● Zoom prediction is a mid-level step needed to work with adjacency regions
Zoom prediction ---- > Zoom indicator label, in order to make the training converge
● Noise added to the zoom labels
Problem: the network could overfit ---- > Noise is added to the zoom label by flipping the ground truth with a probability of 0.3
● Data augmentation
Horizontally flipped images are added to the dataset
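The label noise and the horizontal-flip augmentation above can be sketched as follows. This is a minimal illustration, assuming binary zoom labels (0 or 1) and boxes stored as `(x1, y1, x2, y2)` in pixel coordinates; the function names are hypothetical, and only the flip probability of 0.3 comes from the slides.

```python
import random


def noisy_zoom_label(label: int, flip_prob: float = 0.3, rng=None) -> int:
    """Flip a binary zoom label with probability flip_prob (0.3 on the slide)."""
    rng = rng or random
    return 1 - label if rng.random() < flip_prob else label


def hflip_box(box, image_width):
    """Mirror a box (x1, y1, x2, y2) horizontally, matching a flipped image."""
    x1, y1, x2, y2 = box
    return (image_width - x2, y1, image_width - x1, y2)


# Example: roughly 30% of positive labels end up flipped to 0.
rng = random.Random(0)
flipped_fraction = sum(noisy_zoom_label(1, rng=rng) == 0 for _ in range(10000)) / 10000
```

When an image is flipped, its ground-truth boxes have to be mirrored with it, which is what `hflip_box` does.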
Loss function
Multitask loss function:
● binary cross-entropy for the zoom indicator
● element-wise cross-entropy for the score output
● L1-loss for the bounding box output
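As a rough illustration of how the three terms might combine, here is a small NumPy sketch. It rests on several assumptions not stated on the slides: the terms are summed with equal weights, the scores are already probabilities in (0, 1), and the box term is counted only for sub-region priors whose score label is positive. The function names are hypothetical.

```python
import numpy as np


def binary_cross_entropy(p, y, eps=1e-7):
    """Element-wise binary cross-entropy between probabilities p and labels y."""
    p = np.clip(p, eps, 1 - eps)
    return -(y * np.log(p) + (1 - y) * np.log(1 - p))


def multitask_loss(zoom_p, zoom_y, score_p, score_y, box_pred, box_target):
    """Sum of zoom, score, and box terms with equal weights (an assumption)."""
    score_y = np.asarray(score_y, dtype=float)
    # Zoom indicator: a single binary cross-entropy term.
    zoom_loss = float(binary_cross_entropy(np.asarray(zoom_p, dtype=float),
                                           np.asarray(zoom_y, dtype=float)))
    # Scores: element-wise cross-entropy, summed over the priors.
    score_loss = float(binary_cross_entropy(np.asarray(score_p, dtype=float),
                                            score_y).sum())
    # Boxes: L1 distance per prior, counted only for positive priors (assumed).
    l1 = np.abs(np.asarray(box_pred, dtype=float)
                - np.asarray(box_target, dtype=float)).sum(axis=1)
    box_loss = float((score_y * l1).sum())
    return zoom_loss + score_loss + box_loss


loss = multitask_loss(
    zoom_p=0.5, zoom_y=1,
    score_p=[0.5, 0.5], score_y=[1, 0],
    box_pred=[[0, 0, 0, 0], [1, 1, 1, 1]],
    box_target=[[1, 0, 0, 0], [0, 0, 0, 0]],
)
```

In this toy call each cross-entropy term contributes ln 2 and the single positive prior contributes an L1 error of 1, so the total is 3 ln 2 + 1.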
Fast R-CNN Detector
Results
Qualitative Results
Experiments
PASCAL VOC 2007
Quality of Region Proposals
AZ-Net proposals are more accurate
Proposals matched to Ground Truth
Recall for number of region proposals
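Recall as a function of the number of proposals is typically computed by matching proposals to ground-truth boxes with an IoU threshold. The sketch below shows one common way to do this; the 0.5 threshold, the function names, and the assumption that proposals are already sorted by descending score are illustrative and not taken from the slides.

```python
def iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    union = ((a[2] - a[0]) * (a[3] - a[1])
             + (b[2] - b[0]) * (b[3] - b[1]) - inter)
    return inter / union if union > 0 else 0.0


def recall_at_k(proposals, gt_boxes, k, iou_threshold=0.5):
    """Fraction of ground-truth boxes matched by any of the top-k proposals."""
    if not gt_boxes:
        return 0.0
    top = proposals[:k]  # assumes proposals are sorted by descending score
    matched = sum(
        any(iou(gt, p) >= iou_threshold for p in top) for gt in gt_boxes
    )
    return matched / len(gt_boxes)


gt = [(0, 0, 10, 10), (20, 20, 30, 30)]
props = [(0, 0, 10, 10), (100, 100, 110, 110), (21, 21, 30, 30)]
recall_1 = recall_at_k(props, gt, k=1)
recall_3 = recall_at_k(props, gt, k=3)
```

Sweeping `k` over a range of values gives a recall-versus-number-of-proposals curve.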
Efficient Adaptive Search
mAP on MSCOCO 15
Conclusions
- Accuracy: same as Faster R-CNN
- Regions analyzed: Two orders of magnitude fewer anchors on average
Thank you for your attention! Questions?