perception gap with mid- bridging the robot level visioncli53/papers/chi_isrr15_slides.pdfcopyright...

Copyright © 2015 G.D. Hager

Bridging the Robot Perception Gap With Mid-Level Vision

Chi Li, Jonathan Bohren, Gregory D. Hager
Laboratory for Computational Sensing and Robotics
The Johns Hopkins University


State of the Art Vision

Convolutional architectures such as deep CNNs are designed for object classification in natural images

- Large number of classes

- Large variations in scale, appearance and background

But …

- Minor occlusions

- Sensitive to 3D rotations [1]

- Detection, but not pose

[1] Li, C., Reiter, A., Hager, G.D.: Beyond Spatial Pooling: Fine-Grained Representation Learning in Multiple Domains. In: CVPR, 2015.


Vision Meets Manipulation

- Textureless objects

- Objects in contact

- Need accurate pose

But …

- Small number of classes (often task-directed)

- Consistent environment


Our Prior Work on RGB-D Instance Recognition



UW-RGBD Dataset

JHU Tools Dataset


This Paper: Making it Work for Robots


Semantic Segmentation 1: Extract local features (e.g. CSHOT) and encode them using a learned dictionary. Feature codes are pooled in color domains to form higher-level representations.
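The encode-then-pool step can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes hard assignment of each local descriptor to its nearest dictionary atom, a single hue value in [0, 1] per feature as the color domain, and max pooling inside each hue bin. The names `nearest_codeword`, `pool_in_color_domain` and `n_bins` are hypothetical.

```python
def nearest_codeword(descriptor, dictionary):
    """Return the index of the dictionary atom closest to the descriptor
    (squared Euclidean distance). Hard assignment is an assumption."""
    best_index, best_dist = 0, float("inf")
    for k, atom in enumerate(dictionary):
        dist = sum((a - b) ** 2 for a, b in zip(descriptor, atom))
        if dist < best_dist:
            best_index, best_dist = k, dist
    return best_index


def pool_in_color_domain(descriptors, hues, dictionary, n_bins=4):
    """Encode each local feature as a one-hot code, then max-pool the codes
    of all features that fall into the same hue bin.

    Returns one vector of length n_bins * len(dictionary): the pooled
    codes of every bin, concatenated in bin order.
    """
    n_atoms = len(dictionary)
    pooled = [[0.0] * n_atoms for _ in range(n_bins)]
    for descriptor, hue in zip(descriptors, hues):
        # Clamp so that hue == 1.0 lands in the last bin.
        bin_index = min(int(hue * n_bins), n_bins - 1)
        code_index = nearest_codeword(descriptor, dictionary)
        # Max pooling over one-hot codes: the entry is 1 if any feature
        # in this bin was assigned to this atom.
        pooled[bin_index][code_index] = 1.0
    return [value for row in pooled for value in row]


# Toy example: 2 atoms, 3 features, 2 hue bins.
toy_dictionary = [[0.0, 0.0], [1.0, 1.0]]
toy_descriptors = [[0.1, 0.0], [0.9, 1.0], [0.2, 0.1]]
toy_hues = [0.1, 0.1, 0.9]
toy_representation = pool_in_color_domain(
    toy_descriptors, toy_hues, toy_dictionary, n_bins=2
)
```

Pooling per color bin instead of per spatial cell keeps the representation independent of where a feature sits on the object, which is one plausible reading of "pooled in color domains".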


Semantic Segmentation 2: Build an integral image for each pooling region and compute the pooled features of each sliding window for subsequent semantic classification.
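The integral-image trick can be sketched as below for a single feature channel. It is a minimal sketch under stated assumptions: the input is a 2D grid of per-pixel code values for one pooling region, and pooling is a sum (an integral image yields sums, so an average follows by dividing by the window area). The function names are hypothetical.

```python
def integral_image(grid):
    """Build a summed-area table S with one extra row and column of zeros,
    so that S[r][c] is the sum of grid[0:r][0:c]."""
    rows, cols = len(grid), len(grid[0])
    table = [[0.0] * (cols + 1) for _ in range(rows + 1)]
    for r in range(rows):
        row_sum = 0.0
        for c in range(cols):
            row_sum += grid[r][c]
            table[r + 1][c + 1] = table[r][c + 1] + row_sum
    return table


def window_sum(table, top, left, height, width):
    """Sum of the window in O(1) using four table lookups."""
    bottom, right = top + height, left + width
    return (table[bottom][right] - table[top][right]
            - table[bottom][left] + table[top][left])


def sliding_window_sums(grid, height, width, stride=1):
    """Sum-pooled value of every sliding window over the grid."""
    table = integral_image(grid)
    rows, cols = len(grid), len(grid[0])
    return [
        [window_sum(table, r, c, height, width)
         for c in range(0, cols - width + 1, stride)]
        for r in range(0, rows - height + 1, stride)
    ]


# Toy 3x3 channel with 2x2 windows.
toy_grid = [[1, 2, 3],
            [4, 5, 6],
            [7, 8, 9]]
toy_sums = sliding_window_sums(toy_grid, 2, 2)
```

The table is built once per channel, after which each window costs four lookups regardless of its size. That is what makes dense sliding-window classification affordable.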


Filter on Pose in Each Class

Papazov and Burschka. An efficient RANSAC for 3D object recognition in noisy and occluded scenes. In ACCV, 2010.


Testing Methodology

- Developed the LN-66 dataset, which contains 66 scenes with various complex configurations of the “link” and “node” textureless objects.

- 614 testing frames in total

- Background has been subtracted by plane removal and pass-through filtering in each frame.

- Object model (SVM) is trained over the corresponding partial views in JHUIT-50.
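The background subtraction step above can be sketched in plain Python. This is an illustrative sketch only, not the pipeline used for LN-66: it assumes points are `(x, y, z)` tuples in metres, that the pass-through filter keeps a depth range along one axis, and that the supporting plane is the dominant plane found by a basic RANSAC fit. All function names, thresholds and ranges are hypothetical.

```python
import math
import random


def pass_through(points, axis=2, lo=0.3, hi=1.5):
    """Keep only points whose coordinate along `axis` lies in [lo, hi]."""
    return [p for p in points if lo <= p[axis] <= hi]


def plane_from_points(a, b, c):
    """Plane (unit normal n, offset d) with n.p + d = 0 through three points,
    or None if the points are (nearly) collinear."""
    u = [b[i] - a[i] for i in range(3)]
    v = [c[i] - a[i] for i in range(3)]
    n = [u[1] * v[2] - u[2] * v[1],
         u[2] * v[0] - u[0] * v[2],
         u[0] * v[1] - u[1] * v[0]]
    norm = math.sqrt(sum(x * x for x in n))
    if norm < 1e-12:
        return None
    n = [x / norm for x in n]
    d = -sum(n[i] * a[i] for i in range(3))
    return n, d


def remove_dominant_plane(points, threshold=0.01, iterations=200, seed=0):
    """RANSAC: find the plane with the most inliers and drop its inliers."""
    if len(points) < 3:
        return list(points)
    rng = random.Random(seed)
    best_inliers = set()
    for _ in range(iterations):
        plane = plane_from_points(*rng.sample(points, 3))
        if plane is None:
            continue
        n, d = plane
        inliers = {
            i for i, p in enumerate(points)
            if abs(n[0] * p[0] + n[1] * p[1] + n[2] * p[2] + d) <= threshold
        }
        if len(inliers) > len(best_inliers):
            best_inliers = inliers
    return [p for i, p in enumerate(points) if i not in best_inliers]


# Toy scene: a flat table at z = 1.0, a few object points above it
# (closer to the camera), and one far-away background point.
table = [(x * 0.1, y * 0.1, 1.0) for x in range(10) for y in range(10)]
objects = [(0.45, 0.45, 0.90), (0.50, 0.45, 0.88), (0.45, 0.50, 0.86),
           (0.50, 0.50, 0.84), (0.47, 0.47, 0.80)]
cloud = table + objects + [(0.5, 0.5, 3.0)]

in_range = pass_through(cloud, axis=2, lo=0.3, hi=1.5)
foreground = remove_dominant_plane(in_range, threshold=0.01)
```

In the toy scene the pass-through filter discards the distant point and the plane fit removes the table, leaving only the object points for classification.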


Qualitative Examples


Effect of Segmentation


Quantitative Analysis

NS: no segmentation; S: our segmentation; GS: ground-truth segmentation

B: standard ObjRecRANSAC; GB: greedy-batch variant; GO: greedy-one variant

Panels: No segmentation; Our segmentation; Ground truth segmentation


Running Time

- Semantic segmentation takes around 1 s

- CPU-based ObjRecRANSAC takes 1-10 s, whereas the CUDA-based implementation of ObjRecRANSAC runs at 4-5 Hz

- A full CUDA implementation would take << 1 s
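To put the figures above on a common scale, the short calculation below converts the 4-5 Hz rate into a per-frame time and adds it to the segmentation time. It is a back-of-the-envelope sketch: it assumes the two stages run strictly one after the other and treats "around 1 s" as exactly 1.0 s. The function names are hypothetical.

```python
def period_from_rate(hz):
    """Seconds per frame for a stage running at `hz` frames per second."""
    return 1.0 / hz


def pipeline_latency(segmentation_s, pose_s):
    """Total per-frame time, assuming the stages run sequentially."""
    return segmentation_s + pose_s


SEGMENTATION_S = 1.0  # "around 1 s", taken as exact for this estimate

# (best case, worst case) pose-estimation time in seconds.
cpu_pose = (1.0, 10.0)
cuda_pose = (period_from_rate(5.0), period_from_rate(4.0))

cpu_total = tuple(pipeline_latency(SEGMENTATION_S, t) for t in cpu_pose)
cuda_total = tuple(pipeline_latency(SEGMENTATION_S, t) for t in cuda_pose)
```

Under these assumptions the CUDA-based ObjRecRANSAC takes 0.2-0.25 s per frame, so segmentation dominates the total of roughly 1.2-1.25 s, compared with 2-11 s when ObjRecRANSAC runs on the CPU.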


Current Progress & Future Work

- Background/Foreground classification from training data

- More advanced features to distinguish objects with similar appearances

- A more effective method to collect sub-global patterns (supervoxels and their higher order sets)

- CUDA-based implementation


Questions?

This work is supported by the National Science Foundation under Grant No. NRI-1227277. Bohren is supported by a NASA graduate fellowship.
