Object Bank
Presenter: Liu Changyu
Advisor: Prof. Alex Hauptmann
Interest: Multimedia Analysis
April 4th, 2013



CMU - Language Technologies Institute

Contents

Introduction
Model
Algorithm
Experiment
Conclusion


1. Research Question

1) Understanding the meanings and contents of images remains one of the most challenging problems in machine intelligence and statistical learning.

2) Current low-level image features are sufficient for a variety of visual recognition tasks, but they fall short for visual tasks that carry semantic meaning, so efficient high-level image features are often needed.

Introduction


2. What is Object Bank?

Object Bank is a novel image representation for high-level visual tasks that encodes semantic and spatial information about the objects within an image. In Object Bank, an image is represented as a collection of scale-invariant response maps of a large number of pre-trained generic object detectors.



3. Why use it?

Fig. 1 illustrates the gradient-based GIST features and the texture-based Spatial Pyramid representation of two different scenes (forested mountain vs. city street). Such schemes often fail to offer sufficient discriminative power, as one can see from the very similar image statistics of the examples in this figure.


Fig. 1: (Best viewed in color and with magnification.) Comparison of the OB representation with GIST and SIFT-SPM for mountain vs. city street.


4. What is it used for?

Our main goals in using Object Bank are to:
1) optimize the Object Bank detection code;
2) extend Object Bank to incorporate more objects.




Fig. 2 Object Bank Model

Model: Object Bank

A large number of object detectors are first applied to an input image at multiple scales. For each object at each scale, a three-level spatial pyramid representation of the resulting object filter map is used; the maximum response for each object in each grid cell is then computed, resulting in a feature vector of length equal to the number of objects for each grid cell. Concatenating the features of all grid cells yields the OB descriptor of the image.
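The descriptor computation above can be sketched as follows. This is a minimal NumPy sketch that assumes the detector response maps are already given; the function and variable names are illustrative, not the released Object Bank code:

```python
import numpy as np

def pyramid_max_pool(response_map, levels=(1, 2, 4)):
    """Max-pool one object's response map over a spatial pyramid.

    For each level, the map is split into an n x n grid and the maximum
    response inside each cell is kept, giving 1 + 4 + 16 = 21 values.
    """
    h, w = response_map.shape
    feats = []
    for n in levels:
        for i in range(n):
            for j in range(n):
                cell = response_map[i * h // n:(i + 1) * h // n,
                                    j * w // n:(j + 1) * w // n]
                feats.append(cell.max())
    return np.array(feats)

def object_bank_descriptor(response_maps):
    """Concatenate pyramid-pooled features over all detector/scale maps."""
    return np.concatenate([pyramid_max_pool(m) for m in response_maps])

# Toy example: 2 detectors x 3 scales = 6 response maps.
maps = [np.random.rand(40, 40) for _ in range(6)]
descriptor = object_bank_descriptor(maps)
print(descriptor.shape)  # (126,) = 6 maps x 21 pyramid cells
```

With the full Object Bank (hundreds of detectors, 12 response maps per detector), the same concatenation produces the high-dimensional OB feature used for classification.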



1) Let X ∈ R^(N×J) represent the design matrix built on the J-dimensional Object Bank representations of N images;

2) Let y ∈ {−1, +1}^N denote the binary classification labels of the N samples.

3) This leads to the following learning problem:

min_β L(X, y; β) + λ R(β)    (1)

where L(·) is some non-negative, convex loss and R(β) is a regularizer that avoids overfitting.

Algorithm

According to paper [1], the Object Bank learning algorithm is formulated as above.
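A minimal sketch of the learning problem (1), using the logistic loss with a plain L2 regularizer and gradient descent. This is only illustrative: the paper's actual regularizers are structured-sparsity norms, and all names here are hypothetical:

```python
import numpy as np

def fit_linear_classifier(X, y, lam=0.1, lr=0.1, steps=500):
    """Minimize sum_i log(1 + exp(-y_i * x_i.beta)) + lam * ||beta||^2.

    X: (N, J) design matrix of Object Bank features.
    y: (N,) labels in {-1, +1}.
    """
    n, j = X.shape
    beta = np.zeros(j)
    for _ in range(steps):
        margins = y * (X @ beta)
        # Gradient of the logistic loss plus the L2 penalty.
        grad = -(X.T @ (y / (1.0 + np.exp(margins)))) + 2 * lam * beta
        beta -= lr * grad / n
    return beta

# Toy example: two linearly separable clusters in J = 5 dimensions.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(-2, 1, (20, 5)), rng.normal(2, 1, (20, 5))])
y = np.array([-1] * 20 + [1] * 20)
beta = fit_linear_classifier(X, y)
acc = np.mean(np.sign(X @ beta) == y)
```

Swapping the L2 penalty for an L1 or group norm recovers the feature-sparsification variants discussed in [1].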



We want to extend the original Object Bank approach and run the related experiments in the following steps:

1) List and number the objects needed in our experiment:

Object names:
10200 - clock
10302 - goggles
10306 - spectacles
10477 - knife
10572 - key
10577 - keyboard
10638 - desktop computer
10658 - computer
1074 - dog
10790 - printer
10887 - faucet
…

Experiment


2) Download the related bounding boxes from ImageNet.

3) Resize the original image. The image is resized as follows: first get the image dimensions (a, b); the scaling ratio is ratio = 400 / min(a, b), so the smaller axis of the image is converted to 400 pixels. The following example illustrates this:


Fig. 3 Resizing Step
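The resizing rule can be sketched in a few lines (an illustrative helper, `resized_dims`, that only computes the new dimensions; the actual pixel resampling is left to an image library):

```python
def resized_dims(a, b, target=400):
    """Scale so the smaller image axis becomes `target` pixels.

    ratio = target / min(a, b); both axes are scaled by the same
    ratio, so the aspect ratio is preserved.
    """
    ratio = target / min(a, b)
    return round(a * ratio), round(b * ratio)

print(resized_dims(800, 600))  # smaller axis 600 -> 400, giving (533, 400)
```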


4) Getting HOG features at different scales:

After this rescaling, HOG features are computed at different scales of the image. Although HOG features are obtained for more scales, only six of them are used. These are the ratios used for resizing the image (already resized in the previous step):

Ratios: 1 (the image obtained from the previous step), 0.707, 0.5, 0.3535, 0.25, 0.17677, 0.125



After resizing the images, the HOG features are calculated for every image. These HOG features are used to obtain the response for every object. An example of the HOG features calculated for one image:


Fig. 4 Example of HOG feature
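A stripped-down sketch of the idea behind a HOG feature map: per-cell histograms of gradient orientations weighted by gradient magnitude. The real HOG descriptor additionally normalizes over blocks and interpolates between bins; this sketch only shows the core computation:

```python
import numpy as np

def gradient_orientation_histograms(img, cell=8, bins=9):
    """Per-cell histograms of gradient orientations (0-180 degrees),
    weighted by gradient magnitude -- the core idea behind HOG."""
    gy, gx = np.gradient(img.astype(float))
    mag = np.hypot(gx, gy)
    ang = np.rad2deg(np.arctan2(gy, gx)) % 180.0
    h, w = img.shape
    hists = np.zeros((h // cell, w // cell, bins))
    for i in range(h // cell):
        for j in range(w // cell):
            win = np.s_[i * cell:(i + 1) * cell, j * cell:(j + 1) * cell]
            # Bin each pixel's orientation, accumulating its magnitude.
            idx = np.minimum((ang[win] / (180.0 / bins)).astype(int), bins - 1)
            for b in range(bins):
                hists[i, j, b] = mag[win][idx == b].sum()
    return hists

img = np.random.rand(32, 32)
H = gradient_orientation_histograms(img)
print(H.shape)  # (4, 4, 9): 4x4 cells, 9 orientation bins each
```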


5) Getting the response for each object:

After getting the HOG features, we apply an object-specific filter to them. Each root filter has two components, and each component works at a different scale. As a result, we have 12 detection scales: we obtained 6 image scales in the previous step, and every filter works at 2 scales, so 6 × 2 = 12. These filter responses are stored in matrices following the same layout as the HOG features in panel (d) of the previous figure. Thus, for the HOG features obtained at each ratio we have two filter responses, giving 12 responses in total.



6) Getting the spatial pyramids:

We have 3 spatial pyramid levels, which are applied to the 12 responses for one object. The value for each box is the maximum response of the filter inside that box. For instance, at the second level the filter response is split with a 2×2 grid, and the maximum response inside every box is picked. This gives 21 (1 + 2×2 + 4×4) values for each of the 12 filter responses for one object, resulting in a 12 × 21 = 252-dimensional vector for every object.



7) Getting the feature vector for one object:

Now I will describe the layout of the 252-dimensional feature vector for one object. We start with the different scales (remember that our "original" image is the one obtained in the first step).



Each scale's chunk is divided into two pieces, because the root filter used in Object Bank has two components that work at different scales. So the 42 dimensions for every scale are split into two pieces of 21 dimensions each.
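Assuming this layout (scales outermost, then the two filter components, then the 21 pyramid values), the start index of each 21-dimensional chunk can be computed as follows; the exact ordering in the released Object Bank code may differ:

```python
def chunk_offset(scale, component, dims_per_chunk=21, components=2):
    """Start index of the 21-dim pyramid chunk for a given image scale
    (0..5) and root-filter component (0..1): 42 dims per scale,
    split into two 21-dim pieces."""
    return (scale * components + component) * dims_per_chunk

print(chunk_offset(0, 0), chunk_offset(0, 1), chunk_offset(5, 1))
# 0 21 231 -- and 231 + 21 = 252, the full per-object vector
```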



Finally, this is the layout of those 21 dimensions.




1) It is a feasible method. The authors used several experiments to demonstrate that the Object Bank representation, which carries rich semantic-level image information, is more powerful on scene classification tasks than other popular methods.

2) We can use and extend this approach, as the situation requires, to carry out the remaining experiments in the near future.

Conclusion


References

[1] Li-Jia Li, Hao Su, Eric P. Xing and Li Fei-Fei. Object Bank: A High-Level Image Representation for Scene Classification and Semantic Feature Sparsification. Proceedings of the Neural Information Processing Systems (NIPS), 2010.
[2] Li-Jia Li, Hao Su, Yongwhan Lim and Li Fei-Fei. Objects as Attributes for Scene Classification. Proceedings of the 12th European Conference on Computer Vision (ECCV), 1st International Workshop on Parts and Attributes, 2010.
[3] Sreemanananth Sadanand and Jason J. Corso. Action Bank: A High-Level Representation of Activity in Video. CVPR, 2012.
[4] Pedro Felzenszwalb et al. A Discriminatively Trained, Multiscale, Deformable Part Model. CVPR, 2008.


Thank you!