TRANSCRIPT
Copyright © 2015 G.D. Hager
Bridging the Robot Perception Gap With Mid-Level Vision
Chi Li, Jonathan Bohren, Gregory D. Hager
Laboratory for Computational Sensing and Robotics
The Johns Hopkins University
State of the Art Vision
Convolutional architectures such as deep CNNs are designed for object classification in natural images:
- Large number of classes
- Large variations in scale, appearance, and background
But …
- Only minor occlusions
- Sensitive to 3D rotations [1]
- Detection, but not pose
[1] Li, C., Reiter, A., Hager, G.D.: Beyond Spatial Pooling: Fine-Grained Representation Learning in Multiple Domains. In: CVPR, 2015.
Vision Meets Manipulation
- Textureless objects
- Objects in contact
- Need accurate pose
But …
- Small number of classes (often task-directed)
- Consistent environment
Our Prior Work on RGB-D Instance Recognition
[1] Li, C., Reiter, A., Hager, G.D.: Beyond Spatial Pooling: Fine-Grained Representation Learning in Multiple Domains. In: CVPR, 2015.
UW-RGBD Dataset
JHU Tools Dataset
This Paper: Making it Work for Robots
Semantic Segmentation 1: Extract local features (e.g., CSHOT) and encode them using a learned dictionary. Feature codes are pooled over color domains to form higher-level representations.
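As a rough illustration only (not the paper's implementation), the NumPy sketch below shows the general idea of hard-assignment dictionary encoding followed by pooling over color-defined regions; the function names, the random codebook, and the color bins are all made up for the example.

```python
import numpy as np

def encode_hard(descriptors, codebook):
    """Hard-assign each local descriptor to its nearest codeword.

    descriptors: (N, D) local features (e.g., CSHOT vectors)
    codebook:    (K, D) learned dictionary
    returns:     (N, K) one-hot code matrix
    """
    d2 = ((descriptors[:, None, :] - codebook[None, :, :]) ** 2).sum(axis=2)
    nearest = d2.argmin(axis=1)
    codes = np.zeros((descriptors.shape[0], codebook.shape[0]))
    codes[np.arange(descriptors.shape[0]), nearest] = 1.0
    return codes

def pool_region(codes, member_mask):
    """Max-pool feature codes over one pooling region (e.g., a color bin)."""
    selected = codes[member_mask]
    if selected.shape[0] == 0:
        return np.zeros(codes.shape[1])
    return selected.max(axis=0)

# Toy usage: 500 descriptors of dimension 32, a 64-word codebook,
# and 4 color-domain pooling regions defined by per-point color bins.
rng = np.random.default_rng(0)
desc = rng.normal(size=(500, 32))
codebook = rng.normal(size=(64, 32))
color_bin = rng.integers(0, 4, size=500)   # which color region each point falls in

codes = encode_hard(desc, codebook)
feature = np.concatenate([pool_region(codes, color_bin == b) for b in range(4)])
print(feature.shape)  # (256,) = 4 regions x 64 codewords
```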
Semantic Segmentation 2: Build an integral image for each pooling region and compute pooled features for each sliding window for subsequent semantic classification.
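The following is a minimal sketch of the integral-image (summed-area table) idea assumed here: build the table once per pooling channel, then query the pooled sum of any sliding window in constant time. Names and sizes are illustrative, not the paper's code.

```python
import numpy as np

def build_integral(channel):
    """Summed-area table with a zero row/column so window queries need no edge cases."""
    ii = np.zeros((channel.shape[0] + 1, channel.shape[1] + 1))
    ii[1:, 1:] = channel.cumsum(axis=0).cumsum(axis=1)
    return ii

def window_sum(ii, top, left, height, width):
    """Pooled sum of the window [top, top+height) x [left, left+width) in O(1)."""
    return (ii[top + height, left + width] - ii[top, left + width]
            - ii[top + height, left] + ii[top, left])

# Toy usage: one per-pixel code channel, pooled over a 32x32 sliding window.
rng = np.random.default_rng(0)
channel = rng.random((480, 640))
ii = build_integral(channel)
assert np.isclose(window_sum(ii, 100, 200, 32, 32), channel[100:132, 200:232].sum())
```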
Filter on Pose in Each Class
Papazov, C., Burschka, D.: An efficient RANSAC for 3D object recognition in noisy and occluded scenes. In: ACCV, 2010.
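The actual pose estimation uses ObjRecRANSAC (Papazov and Burschka, above), applied to the points assigned to each class by the semantic segmentation. The sketch below is only a generic RANSAC-style stand-in, not the ObjRecRANSAC API: it hypothesizes rigid transforms from sampled model-scene correspondences (Kabsch alignment) and keeps the hypothesis with the most inliers.

```python
import numpy as np

def rigid_from_correspondences(src, dst):
    """Least-squares rigid transform (R, t) aligning src to dst (Kabsch)."""
    src_c, dst_c = src.mean(axis=0), dst.mean(axis=0)
    H = (src - src_c).T @ (dst - dst_c)
    U, _, Vt = np.linalg.svd(H)
    R = Vt.T @ U.T
    if np.linalg.det(R) < 0:                 # avoid reflections
        Vt[-1] *= -1
        R = Vt.T @ U.T
    t = dst_c - R @ src_c
    return R, t

def ransac_pose(model_pts, scene_pts, iters=500, inlier_thresh=0.005, seed=0):
    """Pick the pose hypothesis that explains the most tentative correspondences.

    Assumes model_pts[i] tentatively corresponds to scene_pts[i].
    """
    rng = np.random.default_rng(seed)
    best_R, best_t, best_inliers = np.eye(3), np.zeros(3), -1
    for _ in range(iters):
        idx = rng.choice(len(model_pts), size=3, replace=False)
        R, t = rigid_from_correspondences(model_pts[idx], scene_pts[idx])
        err = np.linalg.norm((model_pts @ R.T + t) - scene_pts, axis=1)
        inliers = int((err < inlier_thresh).sum())
        if inliers > best_inliers:
            best_R, best_t, best_inliers = R, t, inliers
    return best_R, best_t, best_inliers
```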
Testing Methodology
- Developed the LN-66 dataset, which contains 66 scenes with various complex configurations of the “link” and “node” textureless objects.
- 614 testing frames in total.
- The background is subtracted by plane removal and pass-through filtering in each frame (a sketch of these two steps follows this list).
- The object model (SVM) is trained over the corresponding partial views in JHUIT-50.
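A minimal NumPy sketch of the two preprocessing steps above, assuming a pass-through filter on depth followed by RANSAC plane removal; thresholds and function names are illustrative, not the actual experiment code (which would normally use a library such as PCL).

```python
import numpy as np

def pass_through(points, axis=2, lo=0.3, hi=1.2):
    """Keep points whose coordinate along `axis` (e.g., depth z) lies in [lo, hi]."""
    keep = (points[:, axis] >= lo) & (points[:, axis] <= hi)
    return points[keep]

def remove_plane(points, iters=200, dist_thresh=0.01, seed=0):
    """Fit the dominant plane with RANSAC and return the points that are not on it."""
    rng = np.random.default_rng(seed)
    best_mask = np.zeros(len(points), dtype=bool)
    for _ in range(iters):
        p0, p1, p2 = points[rng.choice(len(points), size=3, replace=False)]
        n = np.cross(p1 - p0, p2 - p0)
        norm = np.linalg.norm(n)
        if norm < 1e-9:                      # degenerate (collinear) sample
            continue
        n /= norm
        dist = np.abs((points - p0) @ n)     # point-to-plane distances
        mask = dist < dist_thresh
        if mask.sum() > best_mask.sum():
            best_mask = mask
    return points[~best_mask]

# Toy usage: a flat "table" at z = 0.5 plus a small object cluster above it.
rng = np.random.default_rng(1)
table = np.column_stack([rng.random((4000, 2)),
                         np.full(4000, 0.5) + rng.normal(0, 0.002, 4000)])
obj = rng.normal([0.5, 0.5, 0.6], 0.02, size=(500, 3))
cloud = np.vstack([table, obj])
foreground = remove_plane(pass_through(cloud, lo=0.3, hi=1.2))
print(len(foreground))   # mostly the object points remain
```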
Qualitative Examples
Effect of Segmentation
Quantitative Analysis
NS: no segmentation; S: our segmentation; GS: ground-truth segmentation; B: standard ObjRecRANSAC; GB: greedy-batch variant; GO: greedy-one variant.
Results are grouped by segmentation condition: no segmentation, our segmentation, and ground-truth segmentation.
Running Time
- Semantic segmentation runs in around 1 s.
- CPU-based ObjRecRANSAC takes 1-10 s; however, the CUDA-based implementation of ObjRecRANSAC runs at 4-5 Hz.
- A full CUDA implementation would run in << 1 s.
Current Progress & Future Work
- Background/Foreground classification from training data
- More advanced features to distinguish objects with similar appearances
- A more effective method to collect sub-global patterns (supervoxels and their higher-order sets)
- CUDA-based implementation
Questions?
This work is supported by the National Science Foundation under Grant No. NRI-1227277. Bohren is supported by a NASA graduate fellowship.