Copyright © 2015 G.D. Hager

Bridging the Robot Perception Gap With Mid-Level Vision

Chi Li, Jonathan Bohren, Gregory D. Hager
Laboratory for Computational Sensing and Robotics
The Johns Hopkins University

State of the Art Vision

Convolutional architectures such as deep CNNs are designed for object classification in natural images:

- Large number of classes

- Large variations in scale, appearance and background

But …

- Minor occlusions

- Sensitive to 3D rotations [1]

- Detection, but not pose

[1] Li, C., Reiter, A., Hager, G.D.: Beyond Spatial Pooling: Fine-Grained Representation Learning in Multiple Domains. In: CVPR, 2015.

Vision Meets Manipulation

- Textureless objects

- Objects in contact

- Need accurate pose

But …

- Small number of classes (often task-directed)

- Consistent environment

Our Prior Work on RGB-D Instance Recognition

[1] Li, C., Reiter, A., Hager, G.D.: Beyond Spatial Pooling: Fine-Grained Representation Learning in Multiple Domains. In: CVPR, 2015.

UW-RGBD Dataset

JHU Tools Dataset

This Paper: Making it Work for Robots

Semantic Segmentation 1

Extract local features (e.g. CSHOT) and encode them using a learned dictionary. Feature codes are pooled in color domains to form higher-level representations.
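The encode-then-pool step can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the soft-assignment encoder, the max-pooling operator, the uniform binning of a single color value in [0, 1], and the function names `encode` and `pool_in_color_domain` are all assumptions made for the example.

```python
import numpy as np

def encode(descriptors, dictionary, temperature=1.0):
    """Soft-assign each descriptor (N x D) to dictionary atoms (K x D).

    Assumption: soft assignment stands in for whatever encoder the
    learned dictionary is actually used with.
    """
    # Squared Euclidean distance between every descriptor and every atom.
    d2 = ((descriptors[:, None, :] - dictionary[None, :, :]) ** 2).sum(axis=2)
    logits = -d2 / temperature
    logits -= logits.max(axis=1, keepdims=True)  # numerical stability
    codes = np.exp(logits)
    return codes / codes.sum(axis=1, keepdims=True)  # each row sums to 1

def pool_in_color_domain(codes, color, n_bins=4):
    """Max-pool codes (N x K) inside uniform bins of a color value in [0, 1].

    Assumption: one scalar color value per point and uniform bins; the
    pooling regions are defined in color space rather than image space.
    """
    bins = np.minimum((color * n_bins).astype(int), n_bins - 1)
    pooled = np.zeros((n_bins, codes.shape[1]))
    for b in range(n_bins):
        mask = bins == b
        if mask.any():
            pooled[b] = codes[mask].max(axis=0)
    return pooled.ravel()  # concatenated per-bin vectors, length n_bins * K
```

Pooling over color bins instead of a spatial grid means the resulting vector does not depend on where in the window a point lies, only on what color it has.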

Semantic Segmentation 2

Build an integral image for each pooling region and compute the pooled features of each sliding window for subsequent semantic classification.
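A rough sketch of the integral-image trick is given below. It assumes the feature codes live on an organized H x W grid, that each pixel carries a pooling-region id, and that pooling is a sum (integral images give window sums in constant time; the slides do not state the pooling operator). All function names are hypothetical.

```python
import numpy as np

def integral_image(feature_map):
    """feature_map: H x W x K. Returns a zero-padded (H+1) x (W+1) x K table."""
    h, w, k = feature_map.shape
    ii = np.zeros((h + 1, w + 1, k))
    ii[1:, 1:] = feature_map.cumsum(axis=0).cumsum(axis=1)
    return ii

def window_sum(ii, top, left, height, width):
    """Sum of the features inside a window, using four table lookups."""
    bottom, right = top + height, left + width
    return ii[bottom, right] - ii[top, right] - ii[bottom, left] + ii[top, left]

def sliding_window_features(codes, region_ids, n_regions, win, stride):
    """codes: H x W x K; region_ids: H x W integers in [0, n_regions).

    One integral image is built per pooling region; each window's feature
    is the concatenation of its per-region sums.
    """
    tables = [integral_image(codes * (region_ids == r)[..., None])
              for r in range(n_regions)]
    h, w, _ = codes.shape
    feats = {}
    for top in range(0, h - win + 1, stride):
        for left in range(0, w - win + 1, stride):
            feats[(top, left)] = np.concatenate(
                [window_sum(ii, top, left, win, win) for ii in tables])
    return feats
```

Each window then costs four lookups per region regardless of window size, which is what makes dense sliding-window classification affordable.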

Filter on Pose in Each Class

Papazov, C., Burschka, D.: An Efficient RANSAC for 3D Object Recognition in Noisy and Occluded Scenes. In: ACCV, 2010.

Testing Methodology

- Developed the LN-66 dataset, which contains 66 scenes with various complex configurations of the “link” and “node” textureless objects.

- 614 testing frames in total.

- Background has been subtracted by plane removal and pass-through filtering in each frame.

- Object model (SVM) is trained over the corresponding partial views in JHUIT-50.
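The background-subtraction step mentioned above can be illustrated with a small NumPy sketch. The slides do not say how the plane is found, so the RANSAC plane fit here is an assumption (a common choice), and the thresholds, depth limits, and function names are hypothetical.

```python
import numpy as np

def pass_through(points, axis=2, lo=0.3, hi=1.5):
    """Keep points (N x 3) whose coordinate along `axis` lies in [lo, hi]."""
    keep = (points[:, axis] >= lo) & (points[:, axis] <= hi)
    return points[keep]

def remove_dominant_plane(points, threshold=0.01, iterations=200, seed=0):
    """Fit the largest plane with a basic RANSAC loop and drop its inliers."""
    rng = np.random.default_rng(seed)
    best_inliers = np.zeros(len(points), dtype=bool)
    for _ in range(iterations):
        # Three random points define a candidate plane.
        p0, p1, p2 = points[rng.choice(len(points), size=3, replace=False)]
        normal = np.cross(p1 - p0, p2 - p0)
        norm = np.linalg.norm(normal)
        if norm < 1e-12:  # degenerate (collinear) sample
            continue
        normal = normal / norm
        # Point-to-plane distance for every point in the cloud.
        dist = np.abs((points - p0) @ normal)
        inliers = dist < threshold
        if inliers.sum() > best_inliers.sum():
            best_inliers = inliers
    return points[~best_inliers]
```

In a tabletop scene the dominant plane is typically the supporting surface, so what remains after both steps is mostly the objects of interest.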

Qualitative Examples

Effect of Segmentation

Quantitative Analysis

NS: no segmentation; S: our segmentation; GS: ground-truth segmentation; B: standard ObjRecRANSAC; GB: greedy-batch variant; GO: greedy-one variant

Figure panels: No segmentation; Our segmentation; Ground-truth segmentation

Running Time

- Semantic segmentation runs in around 1 s

- CPU-based ObjRecRANSAC takes 1-10 s. However, the CUDA-based implementation of ObjRecRANSAC runs at 4-5 Hz (roughly 0.2-0.25 s per frame).

- A full CUDA implementation would be << 1 s

Current Progress & Future Work

- Background/Foreground classification from training data

- More advanced features to distinguish objects with similar appearances

- A more effective method to collect sub-global patterns (supervoxels and their higher order sets)

- CUDA-based implementation

Questions?

This work is supported by the National Science Foundation under Grant No. NRI-1227277. Bohren is supported by a NASA graduate fellowship.