Adaptive Object Detection Using Adjacency and Zoom Prediction

Yongxi Lu, Tara Javidi, Svetlana Lazebnik

Slides by Míriam Bellver

Computer Vision Group Reading Group, June 21st, 2016

[arxiv] http://arxiv.org/pdf/1512.07711v2.pdf

[code] https://github.com/luyongxi/az-net

https://github.com/imatge-upc/readcv

Introduction

Object detection algorithm

Region proposals: used to reduce the number of regions evaluated by the detector

Detector: labels the regions

Efficient region proposals: learned end-to-end with a DNN (e.g. Faster R-CNN)

1) Train class-independent regressors on a small set of predefined anchors.

MultiBox: 800 anchors from clustering; YOLO: 7x7 grid; RPN: overlapping sliding windows

Test-time anchors are not adaptive to the actual content of the images.

2) Each anchor decides whether there is an object and predicts a bounding box.

Target: adaptive search strategy

ADAPTIVE ANCHORS

1. Start from the entire image.

2. Divide the image into subregions recursively, until the current region is unlikely to enclose small objects. The decision is based on the features of the current region.

Anchors? All visited regions; they are used to predict bounding boxes.

Object proposals

AZ-Net
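The adaptive search strategy can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the `predict` callable stands in for AZ-Net, and the quadrant split in `split_into_quadrants`, both thresholds, and `max_depth` are hypothetical choices made for the example. `toy_predict` is a hand-written stub with one fake object.

```python
from typing import Callable, List, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2)
# Hypothetical predictor: region -> (zoom indicator, [(box, score), ...])
Predictor = Callable[[Box], Tuple[float, List[Tuple[Box, float]]]]


def split_into_quadrants(region: Box) -> List[Box]:
    """Assumed subdivision rule: four equal, non-overlapping quadrants."""
    x1, y1, x2, y2 = region
    xm, ym = (x1 + x2) / 2, (y1 + y2) / 2
    return [(x1, y1, xm, ym), (xm, y1, x2, ym),
            (x1, ym, xm, y2), (xm, ym, x2, y2)]


def adaptive_search(image_box: Box, predict: Predictor,
                    zoom_thresh: float = 0.5, score_thresh: float = 0.5,
                    max_depth: int = 3):
    """Return (proposals, visited regions); every visited region is an anchor."""
    proposals: List[Tuple[Box, float]] = []
    visited: List[Box] = []
    stack = [(image_box, 0)]  # start from the entire image
    while stack:
        region, depth = stack.pop()
        visited.append(region)
        zoom, candidates = predict(region)
        # Keep confident bounding-box predictions as object proposals.
        proposals.extend((b, s) for b, s in candidates if s > score_thresh)
        # Subdivide only where the zoom indicator says it is worthwhile.
        if zoom > zoom_thresh and depth < max_depth:
            stack.extend((sub, depth + 1) for sub in split_into_quadrants(region))
    return proposals, visited


# Toy stand-in for the network: one object in a 100x100 image.
OBJ: Box = (5.0, 5.0, 20.0, 20.0)
OBJ_AREA = (OBJ[2] - OBJ[0]) * (OBJ[3] - OBJ[1])


def toy_predict(region: Box):
    x1, y1, x2, y2 = region
    contains = x1 <= OBJ[0] and y1 <= OBJ[1] and x2 >= OBJ[2] and y2 >= OBJ[3]
    # "Zoom in" while the object is small relative to the region.
    small = contains and (x2 - x1) * (y2 - y1) > 4 * OBJ_AREA
    zoom = 1.0 if small else 0.0
    candidates = [(OBJ, 0.9)] if contains and not small else []
    return zoom, candidates


proposals, visited = adaptive_search((0.0, 0.0, 100.0, 100.0), toy_predict)
print(len(visited), proposals)
```

Only the branch of the image that contains the object is refined, so the number of visited regions (anchors) depends on the image content rather than on a fixed grid.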

Contributions

● Adaptively focuses computational resources on the objects in the image

● Evaluated on PASCAL VOC 2007 and MSCOCO, with similar performance and fewer anchors compared to Fast R-CNN and Faster R-CNN

Accuracy: same as Faster R-CNN

Regions analyzed: Two orders of magnitude fewer anchors on average

Previous Work

● Adaptive Object Detection

e.g. Active Object Localization with Deep Reinforcement Learning

● Use of anchor regions for proposal generation or detection

1. A regression technique generates bounding boxes from anchors.

2. Compared to other approaches, regions are generated adaptively.

3. The authors compare against Faster R-CNN.

Comparison to Faster R-CNN

Design of the algorithm

Adaptive Search (AZ-Net) → class-independent object proposals → Object Detector (Fast R-CNN detector) → class-wise detections

AZ-Net

Feature extraction of the region seen

zoom indicator: indicator > threshold → SUBDIVIDE REGION

adjacency predictions with score: score > threshold → OBJECT PROPOSALS

AZ-Net: Zoom indicator

Reasoning: we should zoom in to a region when doing so substantially increases the chance of detection.

AZ-Net: Adjacency Predictions

The predictions are based on sub-region priors.

Implementation

We input 11 adjacency predictions per anchor:

whole image + adjacency predictions

1) Regions are sampled from the image.

2) The region samples should contain hard positives and hard negatives.

3) The sample-label pairs are used to train the network with SGD.

Region sampling and labeling

11 prior regions that cover the full ground truth are computed per object.

Training of the AZ-Net

● Zoom prediction is a mid-level step to work with adjacency regions

Zoom prediction ----> trained with a zoom indicator label, in order to make the training converge

● Noise added to the zoom labels

Problem: the network could overfit ----> noise is added to the zoom labels by flipping the ground truth with a probability of 0.3

● Data augmentation

Horizontally flipped images are added to the dataset
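The two training tricks listed above (label noise on the zoom indicator and horizontal flipping) might look roughly like this in Python. It is a sketch, not the authors' training code: the function names are hypothetical, images are plain nested lists, and boxes are assumed to be `(x1, y1, x2, y2)` edge coordinates in the range `[0, width]`. Only the flip probability of 0.3 comes from the slides.

```python
import random
from typing import List, Optional, Tuple

Box = Tuple[int, int, int, int]  # (x1, y1, x2, y2), edge coordinates (assumed)


def noisy_zoom_label(label: int, flip_prob: float = 0.3,
                     rng: Optional[random.Random] = None) -> int:
    """Flip a binary zoom label (0 or 1) with probability flip_prob."""
    rng = rng or random
    return 1 - label if rng.random() < flip_prob else label


def hflip_sample(image: List[List[int]], boxes: List[Box]):
    """Mirror an image left-to-right and move its boxes accordingly."""
    width = len(image[0])
    flipped_image = [row[::-1] for row in image]
    # x-coordinates are mirrored; the left and right edges swap roles.
    flipped_boxes = [(width - x2, y1, width - x1, y2)
                     for x1, y1, x2, y2 in boxes]
    return flipped_image, flipped_boxes


def augment(dataset):
    """Return the dataset plus a horizontally flipped copy of every sample."""
    return dataset + [hflip_sample(img, boxes) for img, boxes in dataset]
```

With `flip_prob=0.3`, roughly three in ten zoom labels are inverted during training, which acts as a regularizer against overfitting the zoom indicator.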

Loss function

Multitask loss function:

binary cross-entropy for the zoom indicator

L1 loss for the bounding box output

element-wise cross-entropy for the score output
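As a rough illustration of how the three terms of the multitask loss combine, here is a plain-Python sketch for a single anchor. Several details are assumptions that the slides do not state: the terms are summed with hypothetical weights `lambda_score` and `lambda_box`, the box term is only counted for adjacency slots whose score label is positive, and a plain (not smoothed) L1 distance is used.

```python
import math
from typing import Sequence


def binary_cross_entropy(p: float, y: int, eps: float = 1e-7) -> float:
    """Cross-entropy between a predicted probability p and a 0/1 label y."""
    p = min(max(p, eps), 1 - eps)  # clamp to avoid log(0)
    return -(y * math.log(p) + (1 - y) * math.log(1 - p))


def multitask_loss(zoom_pred: float, zoom_label: int,
                   score_preds: Sequence[float], score_labels: Sequence[int],
                   box_preds: Sequence[Sequence[float]],
                   box_targets: Sequence[Sequence[float]],
                   lambda_score: float = 1.0, lambda_box: float = 1.0) -> float:
    # Binary cross-entropy on the zoom indicator.
    zoom_loss = binary_cross_entropy(zoom_pred, zoom_label)
    # Element-wise cross-entropy over the adjacency scores.
    score_loss = sum(binary_cross_entropy(p, y)
                     for p, y in zip(score_preds, score_labels))
    # L1 loss on box coordinates, counted only for positive slots (assumption).
    box_loss = sum(y * sum(abs(a - b) for a, b in zip(pred, target))
                   for y, pred, target in zip(score_labels, box_preds, box_targets))
    return zoom_loss + lambda_score * score_loss + lambda_box * box_loss
```

For example, a zoom prediction of 0.5 with label 1, one adjacency score of 0.5 with label 1, and a box that is off by one unit in a single coordinate give a total of 2·ln 2 + 1.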

Fast R-CNN Detector

Results

Qualitative Results

Experiments

PASCAL VOC 2007

Quality of Region Proposals

AZ-Net proposals are more accurate

Proposals matched to Ground Truth

Recall for number of region proposals
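Recall as a function of the number of proposals is typically computed by matching proposals to ground-truth boxes with intersection-over-union (IoU). The sketch below shows one common way to do this; the IoU threshold of 0.5 and the function names are assumptions, not values taken from the slides.

```python
from typing import List, Sequence, Tuple

Box = Sequence[float]  # (x1, y1, x2, y2)


def iou(a: Box, b: Box) -> float:
    """Intersection-over-union of two axis-aligned boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def recall_at_k(proposals: List[Tuple[Box, float]], gt_boxes: List[Box],
                k: int, iou_thresh: float = 0.5) -> float:
    """Fraction of ground-truth boxes matched by one of the top-k proposals."""
    if not gt_boxes:
        return 0.0
    ranked = sorted(proposals, key=lambda p: p[1], reverse=True)[:k]
    matched = sum(1 for gt in gt_boxes
                  if any(iou(gt, box) >= iou_thresh for box, _ in ranked))
    return matched / len(gt_boxes)
```

Sweeping `k` over a range of values and plotting `recall_at_k` gives a recall-versus-proposals curve of the kind shown on the slide.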

Efficient Adaptive Search

mAP on MSCOCO 15

Conclusions

- Accuracy: same as Faster R-CNN

- Regions analyzed: Two orders of magnitude fewer anchors on average

Thank you for your attention! Questions?
