Learning Mid-Level Features For Recognition
Y-Lan Boureau, Francis Bach, Yann LeCun and Jean Ponce
Presented by Bo Chen, August 20, 2010
Published in CVPR 2010
Outline
1. Classification system
2. Brief introduction of each step
3. Systematic evaluation of unsupervised mid-level features
4. Learning discriminative dictionaries
5. Average and max pooling
6. Conclusions
System Flow Chart
1. Patches: subsample sliding patches to cover all of the details.
2. SIFT: robust low-level features, invariant to some imaging conditions.
3. Coding: generate mid-level features, e.g. by sparse coding, vector quantization, or a deep network.
4. Pooling: max pooling or average pooling, over the cells of a spatial pyramid (SPM, spatial pyramid model).
5. Classifier: linear or nonlinear SVM.
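As a rough illustration, the whole chain can be sketched in a few lines of NumPy. Everything below is a toy assumption (random descriptors in place of SIFT, a random 64-atom codebook, a 16x16 grid, a 2-level pyramid), not the authors' actual configuration:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy stand-ins: a 16x16 grid of 128-D "SIFT" descriptors and a
# 64-atom codebook (both random here; extracted/learned in practice).
H = W = 16
desc = rng.normal(size=(H, W, 128))
D = rng.normal(size=(128, 64))
D /= np.linalg.norm(D, axis=0)          # unit-norm atoms

# Coding: hard vector quantization, one-hot code per grid cell.
flat = desc.reshape(-1, 128)
nearest = np.argmin(((flat[:, None, :] - D.T[None]) ** 2).sum(-1), axis=1)
codes = np.eye(64)[nearest].reshape(H, W, 64)

# Pooling: max pooling over a 2-level spatial pyramid (1x1 and 2x2 grids).
feats = [codes.max(axis=(0, 1))]                 # level 0: whole image
for bi in range(2):
    for bj in range(2):
        cell = codes[bi * 8:(bi + 1) * 8, bj * 8:(bj + 1) * 8]
        feats.append(cell.max(axis=(0, 1)))      # level 1: quadrants
feature = np.concatenate(feats)                  # input to a (linear) SVM
```

With hard VQ and max pooling the final feature is binary, which is one of the observations the talk returns to later.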
Scale-Invariant Feature Transform (D. Lowe, IJCV, 2004)
Motivations: image matching; invariance to scale, rotation, illumination, and viewpoint.
Figures from David Lee’s ppt
Calculate SIFT Descriptors
Figures from Jason Clemons’s ppt
Divide a 16x16 patch into a 4x4 grid of subregions and compute an 8-bin orientation histogram in each, which yields a 4x4x8 = 128-dimensional vector. (low-level feature)
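A minimal sketch of this computation, omitting SIFT's Gaussian weighting, trilinear interpolation, and keypoint detection (the helper name is hypothetical):

```python
import numpy as np

def sift_like_descriptor(patch):
    """Toy 128-D SIFT-like descriptor: 4x4 grid of cells over a
    16x16 patch, 8-bin gradient-orientation histogram per cell,
    weighted by gradient magnitude and L2-normalized."""
    assert patch.shape == (16, 16)
    gy, gx = np.gradient(patch.astype(float))
    mag = np.hypot(gx, gy)
    ang = np.arctan2(gy, gx) % (2 * np.pi)          # orientation in [0, 2pi)
    bins = (ang / (2 * np.pi) * 8).astype(int) % 8  # 8 orientation bins
    desc = np.zeros((4, 4, 8))
    for i in range(16):
        for j in range(16):
            desc[i // 4, j // 4, bins[i, j]] += mag[i, j]
    desc = desc.ravel()
    return desc / (np.linalg.norm(desc) + 1e-8)     # L2 normalize

d = sift_like_descriptor(np.random.rand(16, 16))    # d has 4*4*8 = 128 dims
```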
Notations
Question: How can we represent each region?
Figure from S. Lazebnik et al., CVPR 2006
Coding and Pooling
• Vector quantization (bag-of-features), or
• Sparse coding
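The two coding schemes can be sketched as follows. The dictionary and input are random stand-ins, and the sparse code is obtained with plain ISTA, one standard solver for the l1-regularized objective min_a 0.5*||x - Da||^2 + lam*||a||_1 (the paper does not prescribe this particular solver):

```python
import numpy as np

rng = np.random.default_rng(0)
D = rng.normal(size=(128, 256))
D /= np.linalg.norm(D, axis=0)        # dictionary: 256 unit-norm atoms
x = rng.normal(size=128)              # one descriptor to encode

# Hard vector quantization: one-hot code at the nearest atom.
k = np.argmin(np.linalg.norm(D - x[:, None], axis=0))
alpha_vq = np.zeros(256)
alpha_vq[k] = 1.0

# Sparse coding via ISTA: gradient step + soft threshold.
lam = 0.1
L = np.linalg.norm(D, 2) ** 2         # Lipschitz constant of the gradient
a = np.zeros(256)
for _ in range(200):
    g = D.T @ (D @ a - x)
    a = a - g / L
    a = np.sign(a) * np.maximum(np.abs(a) - lam / L, 0.0)
```

VQ yields exactly one active atom per descriptor; sparse coding yields a small set of active atoms whose weighted sum reconstructs the descriptor.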
Systematic evaluation of unsupervised mid-level features
Macrofeatures and Denser SIFT Sampling
Parameterizations:
1. SIFT sampling density
2. Macrofeature side length
3. Subsampling parameter

Results (density, side length, subsampling): Caltech-101: 75.7% with (4, 2, 4); Scenes: 84.3% with (8, 2, 1).
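A sketch of macrofeature construction under one plausible reading of these parameters (the helper and grid here are hypothetical): jointly encode a side x side neighborhood of adjacent low-level descriptors by concatenation, sampling neighborhoods every `sub` grid cells:

```python
import numpy as np

def build_macrofeatures(grid, side, sub):
    """Concatenate each side x side neighborhood of descriptors into
    one macrofeature, taking a neighborhood every `sub` cells.
    `grid` has shape (H, W, d): a dense grid of d-dim descriptors."""
    H, W, d = grid.shape
    feats = []
    for i in range(0, H - side + 1, sub):
        for j in range(0, W - side + 1, sub):
            feats.append(grid[i:i + side, j:j + side].ravel())
    return np.array(feats)

grid = np.random.rand(10, 10, 128)            # toy dense SIFT grid
mf = build_macrofeatures(grid, side=2, sub=1) # 9*9 features of 2*2*128 dims
```

The macrofeatures are then fed to the coding step in place of single descriptors.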
Results
Discriminative Dictionaries
Algorithm: stochastic gradient descent.
Cons: high computational complexity.
Solutions:
1. Approximate z(n) by pooling over a random sample of ten locations of the image.
2. Update only a random subset of coordinates at each iteration.
Scenes dataset
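The second speed-up can be illustrated on a toy least-squares problem; this stands in for the supervised dictionary objective, which is not reproduced here. Each SGD step touches only a random subset of coordinates:

```python
import numpy as np

rng = np.random.default_rng(0)
# Toy objective 0.5*||A w - b||^2 standing in for the supervised loss.
A = rng.normal(size=(200, 50))
b = rng.normal(size=200)
w = np.zeros(50)
lr = 1e-3

for t in range(500):
    g = A.T @ (A @ w - b)                         # gradient
    idx = rng.choice(50, size=10, replace=False)  # random coordinate subset
    w[idx] -= lr * g[idx]                         # update only that subset

err0 = float(np.linalg.norm(b))                   # loss at w = 0
err = float(np.linalg.norm(A @ w - b))            # loss after training
```

Updating a subset trades a little progress per step for a much cheaper step, which is the point when the full gradient is expensive.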
Average and Max Pooling
• Why pooling? Pooling is used to achieve invariance to image transformations, more compact representations, and better robustness to noise and clutter. It should preserve important information while discarding irrelevant detail; the crux of the matter is determining what falls in which category.
• Max pooling vs. average pooling: The authors show that max pooling over hard vector-quantized features in a spatial pyramid brings the performance of linear classification to the level obtained by Lazebnik et al. (2006) with an intersection kernel, even though the resulting feature is binary.
• Our feeling: Pooling helps keep the learned codes sparse, in line with human visual function. For convolutional deep networks in particular, pooling appears essential, since neighboring contents are correlated.
Part conclusions from Y-Lan Boureau et al., ICML 2010
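The observation that max-pooled hard-VQ features are binary is easy to see in code; the sizes below are arbitrary toy values:

```python
import numpy as np

rng = np.random.default_rng(0)
K, N = 16, 100                             # K atoms, N patches in a region
assignments = rng.integers(0, K, size=N)   # hard VQ: nearest atom per patch
codes = np.eye(K)[assignments]             # one-hot code per patch

avg_pool = codes.mean(axis=0)              # histogram (bag-of-features)
max_pool = codes.max(axis=0)               # binary "is atom present" vector
```

Average pooling of one-hot codes recovers the usual bag-of-features histogram; max pooling collapses counts into a 0/1 presence indicator per atom.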
Theoretical Comparison of Average and Max Pooling
Experimental methodology: binary classification (positive vs. negative).
Conclusions
1. A comprehensive and systematic comparison across each step of mid-level feature extraction, covering several coding modules (hard and soft vector quantization, sparse coding) and pooling schemes (average or maximum), which obtains state-of-the-art performance or better on several recognition benchmarks.
2. A supervised dictionary-learning method for sparse coding.
3. Theoretical and empirical insight into the remarkable performance of max pooling.