Mid and high-level features for dense monocular SLAM
Javier Civera Qualcomm Augmented Reality Lecture Series
Nov. 19th, 2015
Index
Introduction/motivation
Point-based monocular SLAM
Keypoint-based monocular SLAM
Dense monocular SLAM
Mid-level features
Superpixels
Data-driven primitives
High-level features
Room Layout
Objects
Robotic Vision
• Robotic Vision is making a robot “see”. **
• Now… what does it mean for a robot to see?
• Data input:
  • Image sequences.
  • Multi-sensor.
  • Active sensing.
• Problem constraints:
  • Real-time.
  • Hardware limits.
• Goals:
  • Self-localization.
  • 3D scene models.
  • Temporal models.
  • Local short-term accuracy.
  • Long-term models.
  • Semantics.
** Paraphrasing Olivier Faugeras in Hartley & Zisserman’s book
Other applications
• The robotics constraints are shared with other applications:
  • AR/VR.
  • Wearable/mobile devices.
  • Laparoscopic surgery.
  • …
Grasa et al., Visual SLAM for Hand-Held Monocular Endoscope, IEEE TMI, 2014
Point-based features (low-level)
• Point-based features are accurate in high-texture image regions and for high-parallax motions.
• The typical approach has been to use salient point features, discarding low-texture parts.
• SfM and Visual SLAM datasets are biased to high-parallax motions.
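Camera Geometry
• A camera is a bearing-only sensor: it only measures angles.
• The depth of the scene is estimated by triangulation.
• The depth estimation is based on the parallax angle.
• The larger the parallax, the more accurate the depth estimation.
[Figure: two camera centres C1 and C2, separated by the translation t_c1c2, observe the scene point p_i; the angle between the two viewing rays is the PARALLAX ANGLE. Axes X, Y, Z.]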
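The link between parallax and depth can be illustrated with a minimal sketch. The function name `parallax_angle` and the example camera positions are illustrative assumptions, not taken from the talk. The sketch computes the angle, measured at the scene point, between the rays towards the two camera centres, and shows how it shrinks for a distant point seen with the same baseline.

```python
import math

def parallax_angle(c1, c2, p):
    """Angle in radians, measured at scene point p, between the rays to camera centres c1 and c2."""
    r1 = [a - b for a, b in zip(c1, p)]
    r2 = [a - b for a, b in zip(c2, p)]
    dot = sum(a * b for a, b in zip(r1, r2))
    cross = (
        r1[1] * r2[2] - r1[2] * r2[1],
        r1[2] * r2[0] - r1[0] * r2[2],
        r1[0] * r2[1] - r1[1] * r2[0],
    )
    cross_norm = math.sqrt(sum(c * c for c in cross))
    # atan2 stays well conditioned for the very small angles of low-parallax points.
    return math.atan2(cross_norm, dot)

# Hypothetical setup: 10 cm baseline, one point at 1 m and one at 100 m.
c1 = (0.0, 0.0, 0.0)
c2 = (0.1, 0.0, 0.0)
near_angle = parallax_angle(c1, c2, (0.0, 0.0, 1.0))
far_angle = parallax_angle(c1, c2, (0.0, 0.0, 100.0))
print(math.degrees(near_angle), math.degrees(far_angle))
```

With the same camera translation, the distant point subtends a much smaller parallax angle, which is why its depth is poorly constrained.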
Low-Parallax Points
• Low parallax is due to:
  • Distant points.
  • Small camera translation.
• Depth cannot be estimated for zero-parallax points...
• ... but they provide rich orientation information.
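Inverse Depth Point Initialization
[Figure: inverse depth parametrization. The camera pose in the world frame W is $(\mathbf{r}^{WC}, \mathbf{q}^{WC})$. Scene point i is coded by the camera position $(x_i, y_i, z_i)$ at its first observation, the ray direction $\mathbf{m}(\theta_i, \phi_i)$ and the depth $d_i = 1/\rho_i$ along that ray; the parallax angle is marked at the scene point.]
Inverse depth feature: $\mathbf{y}_i = (x_i, y_i, z_i, \theta_i, \phi_i, \rho_i)^\top$
Scene point i: $(x_i, y_i, z_i)^\top + \frac{1}{\rho_i}\,\mathbf{m}(\theta_i, \phi_i)$
New points are added from their first observation:
1) {x, y, z, θ, φ} are initialized from the first observation and the state vector.
2) ρ0 and its uncertainty σρ0 are initialized so that [ρ0 − 2σρ0, ρ0 + 2σρ0] includes infinity (ρ = 0).
[Figure: the initial uncertainty region of the point shown in INVERSE DEPTH SPACE and in EUCLIDEAN SPACE; d_min marks the minimum depth.]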
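The inverse depth coding can be sketched in a few lines. This is an illustration under stated assumptions: the ray direction formula for the azimuth θ and elevation φ follows the usual inverse depth convention, which the slide itself does not spell out, and the function names and the numeric values of ρ0 and σρ0 are hypothetical.

```python
import math

def ray_direction(theta, phi):
    """Unit ray m(theta, phi); assumed azimuth/elevation convention, not given on the slide."""
    return (
        math.cos(phi) * math.sin(theta),
        -math.sin(phi),
        math.cos(phi) * math.cos(theta),
    )

def inverse_depth_to_euclidean(x, y, z, theta, phi, rho):
    """Scene point = first-observation camera position + (1 / rho) * m(theta, phi)."""
    if rho <= 0.0:
        raise ValueError("rho must be positive to obtain a finite Euclidean point")
    m = ray_direction(theta, phi)
    return tuple(c + mi / rho for c, mi in zip((x, y, z), m))

def interval_includes_infinity(rho0, sigma_rho0):
    """True if [rho0 - 2 sigma, rho0 + 2 sigma] contains rho = 0, i.e. infinite depth."""
    return rho0 - 2.0 * sigma_rho0 <= 0.0 <= rho0 + 2.0 * sigma_rho0

# Hypothetical feature first seen from the origin along the optical axis, 2 m away.
point = inverse_depth_to_euclidean(0.0, 0.0, 0.0, 0.0, 0.0, 0.5)
wide_prior = interval_includes_infinity(0.1, 0.5)
narrow_prior = interval_includes_infinity(1.0, 0.1)
print(point, wide_prior, narrow_prior)
```

The wide prior covers ρ = 0, so a newly initialized point may still turn out to be at infinity; the narrow one rules that out.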
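Inverse Depth Point Measurement
[Figure: inverse depth geometry. Camera pose $(\mathbf{r}^{WC}, \mathbf{q}^{WC})$ in the world frame W, ray direction $\mathbf{m}(\theta_i, \phi_i)$ and scene point i at depth $d_i = 1/\rho_i$, with the parallax angle marked. Inverse depth feature: $\mathbf{y}_i = (x_i, y_i, z_i, \theta_i, \phi_i, \rho_i)^\top$.]
Projection Model
• Camera Reference Frame: $\mathbf{h}^C = \mathbf{R}^{CW}\left(\rho_i\left((x_i, y_i, z_i)^\top - \mathbf{r}^{WC}\right) + \mathbf{m}(\theta_i, \phi_i)\right)$
• Pinhole Camera Model: the ray $\mathbf{h}^C = (h_x, h_y, h_z)^\top$ is projected to the undistorted image point $(u_u, v_u)$ using the focal length $f$ and the principal point $(C_u, C_v)$.
• Two Parameters Radial Distortion: the undistorted point $(u_u, v_u)$ is related to the distorted point $(u_d, v_d)$ through the radial distance $r_d$ and its second and fourth powers.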
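As a sketch of the measurement step, the code below maps an inverse depth feature into the camera frame and applies a plain pinhole projection. It is an illustration, not the talk's exact model: the ray direction convention, the sign convention of the pinhole equations, the focal length in pixels and the omission of radial distortion are all assumptions, and every name is hypothetical.

```python
import math

def ray_direction(theta, phi):
    """Unit ray m(theta, phi); assumed azimuth/elevation convention."""
    return (
        math.cos(phi) * math.sin(theta),
        -math.sin(phi),
        math.cos(phi) * math.cos(theta),
    )

def project_inverse_depth(feature, r_wc, R_cw, f, cu, cv):
    """Project y_i = (x, y, z, theta, phi, rho) into a camera at r_wc with rotation R_cw."""
    x, y, z, theta, phi, rho = feature
    m = ray_direction(theta, phi)
    # h^C = R^CW (rho * ((x, y, z) - r^WC) + m); well defined even for rho = 0.
    v = [rho * (p - r) + mi for p, r, mi in zip((x, y, z), r_wc, m)]
    h = [sum(R_cw[i][j] * v[j] for j in range(3)) for i in range(3)]
    if h[2] <= 0.0:
        return None  # the point is behind the camera
    # Undistorted pinhole projection (assumed sign convention, no distortion).
    return (cu + f * h[0] / h[2], cv + f * h[1] / h[2])

identity = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
# Point 2 m in front of the first camera, observed after moving 1 m to the right.
finite_pixel = project_inverse_depth((0.0, 0.0, 0.0, 0.0, 0.0, 0.5), (1.0, 0.0, 0.0), identity, 500.0, 320.0, 240.0)
# Point at infinity (rho = 0): its image does not move under pure translation.
infinite_pixel = project_inverse_depth((0.0, 0.0, 0.0, 0.0, 0.0, 0.0), (5.0, 0.0, 0.0), identity, 500.0, 320.0, 240.0)
print(finite_pixel, infinite_pixel)
```

Multiplying the translation term by ρ instead of dividing the ray by ρ is what lets the same expression handle points at infinity.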
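Standard RANSAC: 1D example
1) RANDOM SAMPLES
2) PARTIAL UPDATE
3) RESCUE INLIERS
Number of random hypotheses: $n = \frac{\log(1 - P)}{\log\left(1 - (1 - \epsilon)^m\right)}$ with $m = 2$, where $P$ is the probability of drawing at least one outlier-free sample, $\epsilon$ is the outlier ratio and $m$ is the sample size.
[Figure: three hypotheses receive 10 votes, 1 vote and 8 votes; one data point is marked "Outlier!!".]
1-Point RANSAC: 1D example
1) RANDOM SAMPLES
2) PARTIAL UPDATE
3) RESCUE INLIERS
Number of random hypotheses: $n = \frac{\log(1 - P)}{\log\left(1 - (1 - \epsilon)^m\right)}$ with $m = 1$: lower $n$, fewer samples!
[Figure: three hypotheses receive 11 votes, 3 votes and 8 votes; a point with high innovation is marked. Legend: Outlier, Inlier.]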
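The effect of the sample size on the number of hypotheses can be checked numerically. The sketch below evaluates the standard RANSAC bound for several sample sizes; the function name and the chosen values (99% success probability, 50% outliers) are illustrative assumptions.

```python
import math

def ransac_hypotheses(p_success, outlier_ratio, sample_size):
    """Hypotheses needed so that, with probability p_success, one sample is outlier-free."""
    if not 0.0 < p_success < 1.0:
        raise ValueError("p_success must lie strictly between 0 and 1")
    if not 0.0 <= outlier_ratio < 1.0:
        raise ValueError("outlier_ratio must lie in [0, 1)")
    all_inliers = (1.0 - outlier_ratio) ** sample_size
    if all_inliers >= 1.0:
        return 1  # no outliers: a single sample is enough
    return math.ceil(math.log(1.0 - p_success) / math.log(1.0 - all_inliers))

n_one_point = ransac_hypotheses(0.99, 0.5, 1)
n_two_point = ransac_hypotheses(0.99, 0.5, 2)
n_five_point = ransac_hypotheses(0.99, 0.5, 5)
print(n_one_point, n_two_point, n_five_point)
```

The count grows quickly with the sample size, which is the motivation for using prior information to reduce the minimal sample to a single point.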
Experimental Results for Large Trajectories
• 650 metres trajectory; 24180 images.
• Error: ~1% of the trajectory length.
• RAWSEEDS datasets: http://www.rawseeds.org
• Camera + wheel odometry, 1310 metres, 54000 frames (~30 min video).
Feature-based stereo SLAM
• SPTAM: Stereo Parallel Tracking and Mapping.
• ~1.35% translation error.
• 10th position in the KITTI ranking (small differences with respect to the methods ranked above).
• Highest-ranked stereo method with code available.
Taihú Pire, Thomas Fischer, Javier Civera, Pablo de Cristóforis, Julio César Jacobo Berlles, Stereo Parallel Tracking and Mapping for Robot Localization, IROS 2015. CODE AVAILABLE AT https://github.com/lrse/sptam
How useful is a sparse map for a robot?
Not enough for navigation
Not enough for high-level tasks. E.g., “bring me a book from Henry’s table”
At least I have an accurate robot motion…
Dense mapping: RGB-D sensors
But…
• RGB-D sensors do not work in direct sunlight.
• RGB-D sensors do not work on every surface.
• Minimum distance (~0.5 metres) and maximum distance (4-8 metres).
• Size, weight, power consumption…
Dense monocular mapping
• Minimize the photometric error and a regularization term.
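A toy version of this cost makes the two terms concrete. The sketch below is a deliberately simplified assumption, not the formulation used in the talk: it works on a single scanline, uses an integer disparity (proportional to inverse depth for a sideways camera translation) instead of a full inverse depth map, and uses an absolute-difference photometric term with a total-variation regularizer weighted by a hypothetical `lam`.

```python
def dense_energy(ref, other, disparity, lam):
    """Photometric error plus lam times the total variation of an integer disparity map."""
    photometric = 0.0
    for x, d in enumerate(disparity):
        xs = x - d  # where pixel x of the reference scanline should appear in the other view
        if 0 <= xs < len(other):  # pixels that leave the image are skipped
            photometric += abs(ref[x] - other[xs])
    total_variation = sum(abs(b - a) for a, b in zip(disparity, disparity[1:]))
    return photometric + lam * total_variation

# Hypothetical scanlines: the second view is the first one shifted by two pixels.
ref = [0, 0, 1, 5, 2, 0, 0, 0]
other = [1, 5, 2, 0, 0, 0, 0, 0]
e_true = dense_energy(ref, other, [2] * 8, lam=0.5)
e_noisy = dense_energy(ref, other, [2, 2, 2, 3, 2, 2, 2, 2], lam=0.5)
e_zero = dense_energy(ref, other, [0] * 8, lam=0.5)
print(e_true, e_noisy, e_zero)
```

The correct, smooth disparity has the lowest energy. In textureless regions the photometric term is flat, so the regularizer alone decides the solution.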
Dense monocular mapping
[Table: rows Keypoint-based, Dense; columns Accuracy, Density, Cost for High Texture and for Low Texture.]
Dense Mapping: High Texture
[Table: row Dense; columns Accuracy, Density, Cost for High Texture and for Low Texture.]
Dense Mapping: Low Texture
[Table: row Dense; columns Accuracy, Density, Cost for High Texture and for Low Texture.]
Superpixels (mid-level)
• Image segmentation based on color and 2D distance.
• Decent features for textureless areas.
• We assume that homogeneous color regions are almost planar.
[Table: rows Keypoint-based, Dense, Superpixels, Dense + Sup.; columns Accuracy, Density, Cost for High Texture and for Low Texture.]
Pedro F. Felzenszwalb and Daniel P. Huttenlocher. Efficient graph-based image segmentation. International Journal of Computer Vision, 59(2):167–181, 2004.
Dense Mapping: Low Texture
[Table: row Dense; columns Accuracy, Density, Cost for High Texture and for Low Texture.]
Keypoint-Based Mapping: Low Texture
[Table: row Keypoint-based; columns Accuracy, Density, Cost for High Texture and for Low Texture.]
Superpixels: Low Texture
[Table: row Superpixels; columns Accuracy, Density, Cost for High Texture and for Low Texture.]
Pedro F. Felzenszwalb and Daniel P. Huttenlocher. Efficient graph-based image segmentation. International Journal of Computer Vision, 59(2):167–181, 2004.
Superpixel Initialization
[Figure: a superpixel related across two views by the homography H.]
Alejo Concha and Javier Civera. Using Superpixels in Monocular SLAM. ICRA 2014
Multiview model: Homography (h)
Error: Contour reprojection error (ɛ)
Monte Carlo initialization: for every superpixel we create reasonable hypotheses h and rank them by their error.
Superpixel Mapping
[Figure: a superpixel related across two views by the homography H.]
Alejo Concha and Javier Civera. Using Superpixels in Monocular SLAM. ICRA 2014
Multiview model: Homography (h)
Error: Contour reprojection error (ɛ)
Mapping: Minimize the reprojection error.
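The homography model and the hypothesis ranking can be sketched as follows. This is a simplified illustration under several assumptions that are not stated in the talk: normalized image coordinates (identity calibration), a plane written as n·X = d in the first camera frame, a second camera with X2 = R X1 + t, a mean point-to-point distance as the contour error, and a small fixed set of plane hypotheses instead of random sampling. All names are hypothetical.

```python
import math

def plane_homography(R, t, n, d):
    """Plane-induced homography H = R + t n^T / d for the plane n.X = d and X2 = R X1 + t."""
    return [[R[i][j] + t[i] * n[j] / d for j in range(3)] for i in range(3)]

def apply_homography(H, point):
    x, y = point
    v = [H[i][0] * x + H[i][1] * y + H[i][2] for i in range(3)]
    return (v[0] / v[2], v[1] / v[2])

def contour_reprojection_error(H, contour_1, contour_2):
    """Mean distance between the transferred contour of view 1 and the contour seen in view 2."""
    total = 0.0
    for p1, p2 in zip(contour_1, contour_2):
        q = apply_homography(H, p1)
        total += math.hypot(q[0] - p2[0], q[1] - p2[1])
    return total / len(contour_1)

# Hypothetical scene: fronto-parallel superpixel at depth 2, camera moved 0.2 to the right.
R = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
t = (-0.2, 0.0, 0.0)
n = (0.0, 0.0, 1.0)
contour_1 = [(0.0, 0.0), (0.2, 0.0), (0.2, 0.1), (0.0, 0.1)]
contour_2 = [(x - 0.1, y) for x, y in contour_1]

# Rank a few plane-distance hypotheses by their contour reprojection error.
hypotheses = [1.0, 2.0, 4.0]
ranked = sorted(
    hypotheses,
    key=lambda d: contour_reprojection_error(plane_homography(R, t, n, d), contour_1, contour_2),
)
best_error = contour_reprojection_error(plane_homography(R, t, n, ranked[0]), contour_1, contour_2)
print(ranked, best_error)
```

The best-ranked hypothesis would then serve as the starting point for the minimization in the mapping step.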
Superpixels in low-textured areas
[Table: row Superpixels; columns Accuracy, Density, Cost for High Texture and for Low Texture.]
Alejo Concha and Javier Civera. Using Superpixels in Monocular SLAM. ICRA 2014
Using Superpixels in Monocular SLAM
Alejo Concha and Javier Civera. Using Superpixels in Monocular SLAM. ICRA 2014
Dense + Superpixels
Alejo Concha, Wajahat Hussain, Luis Montano and Javier Civera, Manhattan and Piecewise-Planar Constraints for Dense Monocular Mapping, RSS 2014.
Dense + Superpixels
[Table: row Dense + Sup.; columns Accuracy, Density, Cost for High Texture and for Low Texture.]
Alejo Concha, Wajahat Hussain, Luis Montano and Javier Civera, Manhattan and Piecewise-Planar Constraints for Dense Monocular Mapping, RSS 2014.
Dense + Superpixels
[Figure panels: Video (input); PMVS (high-gradient pixels); Dense (TV-regularization); Superpixels; PMVS + Superpixels; Dense + Superpixels.]
Alejo Concha and Javier Civera. Using Superpixels in Monocular SLAM. ICRA 2014
Yasutaka Furukawa and Jean Ponce. Accurate, dense, and robust multiview stereopsis. IEEE Transactions on Pattern Analysis and Machine Intelligence, 32(8):1362–1376, 2010.
Richard A. Newcombe, Steven J. Lovegrove, and Andrew J. Davison. DTAM: Dense tracking and mapping in real-time. In Computer Vision (ICCV), 2011 IEEE International Conference on, pages 2320–2327. IEEE, 2011.
Alejo Concha, Wajahat Hussain, Luis Montano and Javier Civera, Manhattan and Piecewise-Planar Constraints for Dense Monocular Mapping, RSS 2014.
Semidense mapping + superpixels
• TV-regularization is expensive; a GPU might be needed for real-time operation.
• Semidense mapping plus superpixels is a reasonable option: cheaper than TV-regularization (it runs on a CPU), with a small loss in density.
• Given a semidense map, superpixels can be initialized via SVD more accurately and at a lower cost.
Alejo Concha, Javier Civera, DPPTAM: Dense Piecewise Planar Tracking and Mapping from a Monocular Sequence, IROS 2015. Code to be released soon! https://github.com/alejocb/dpptam
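A plane fit by SVD, as mentioned for the superpixel initialization, can be sketched with NumPy. This is a generic least-squares plane fit and an assumption about the general idea, not the released DPPTAM code; the function name and the sample points are hypothetical.

```python
import numpy as np

def fit_plane_svd(points):
    """Least-squares plane n.X = d through 3D points, using the SVD of the centred points."""
    pts = np.asarray(points, dtype=float)
    centroid = pts.mean(axis=0)
    # The right singular vector of the smallest singular value is the plane normal.
    _, _, vt = np.linalg.svd(pts - centroid, full_matrices=False)
    normal = vt[-1]
    distance = float(normal @ centroid)
    if distance < 0.0:  # fix the sign so that the distance is non-negative
        normal, distance = -normal, -distance
    return normal, distance

# Hypothetical semidense points inside one superpixel, all lying on the plane z = 2.
samples = [(0.0, 0.0, 2.0), (1.0, 0.0, 2.0), (0.0, 1.0, 2.0), (1.0, 1.0, 2.0), (0.5, 0.2, 2.0)]
normal, distance = fit_plane_svd(samples)
print(normal, distance)
```

A single SVD replaces the sampling of many plane hypotheses, which is consistent with the lower cost claimed above.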
Semidense mapping + superpixels
• The SVD superpixels are more accurate than the triangulated ones.
• The SVD superpixels are as accurate as the semidense map.
• Large errors in dense reconstructions!!
• Superpixels improve the error of dense reconstructions.
• A reasonable solution is to filter out low parallax points.
Legend: [3] refers to Alejo Concha and Javier Civera, Using Superpixels in Monocular SLAM, ICRA 2014; (ours) refers to Alejo Concha and Javier Civera, DPPTAM: Dense Piecewise Planar Tracking and Mapping from a Monocular Sequence, IROS 2015.
Monocular – Inertial Dense SLAM
• Integrating the inertial measurements gives the real scale of the reconstruction.
ICRA 2016 submission!
Now, how useful is this dense map for a robot?
Good enough for navigation
Not enough for high-level tasks. E.g., “bring me a book from Henry’s table”
We are more resilient to low texture, but we still need parallax…
Data-driven primitives (mid-level)
David F. Fouhey, Abhinav Gupta, and Martial Hebert. Data-driven 3D primitives for single image understanding. ICCV, 2013.
Feature discovery on RGB-D training data.
Extracts patterns that are consistent in depth (D) and discriminative in appearance (RGB).
At test time, mid-level depth patterns can be predicted from a single RGB view.
Multiview Layout (high-level)
(a) Sparse/semidense reconstruction.
(b) Plane normals from 3D vanishing points (image VP, backprojection, 3D clustering).
(c) Plane distances from a sparse/semidense multiview reconstruction.
(d) Superpixel segmentation, geometric and photometric feature extraction.
(e), (f) Classification (AdaBoost).
Alejo Concha, Wajahat Hussain, Luis Montano and Javier Civera, Manhattan and Piecewise-Planar Constraints for Dense Monocular Mapping, RSS 2014.
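The backprojection in step (b) can be illustrated with a short sketch. It assumes an undistorted pinhole camera with hypothetical intrinsics `fx`, `fy`, `cx`, `cy`, and covers only the backprojection of a finite image vanishing point to a 3D direction, not the detection of vanishing points or the 3D clustering.

```python
import math

def backproject_vanishing_point(u, v, fx, fy, cx, cy):
    """Unit 3D direction, in the camera frame, whose image vanishing point is (u, v)."""
    d = ((u - cx) / fx, (v - cy) / fy, 1.0)
    norm = math.sqrt(sum(c * c for c in d))
    return tuple(c / norm for c in d)

# Hypothetical intrinsics and two vanishing points of a Manhattan scene.
fx, fy, cx, cy = 500.0, 500.0, 320.0, 240.0
dir_a = backproject_vanishing_point(820.0, 240.0, fx, fy, cx, cy)
dir_b = backproject_vanishing_point(-180.0, 240.0, fx, fy, cx, cy)
dot_ab = sum(a * b for a, b in zip(dir_a, dir_b))
print(dir_a, dir_b, dot_ab)
```

In a Manhattan scene the backprojected directions are mutually orthogonal, and they can be used as candidate plane normals for the layout.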
Superpixels and Layout
Alejo Concha, Wajahat Hussain, Luis Montano and Javier Civera, Manhattan and Piecewise-Planar Constraints for Dense Monocular Mapping, RSS 2014.
Superpixels, Data-Driven Primitives and Layout
Alejo Concha, Wajahat Hussain, Luis Montano and Javier Civera, Incorporating Scene Priors to Dense Monocular Mapping, Autonomous Robots 2015.
• NYU dataset, high-parallax sequences
Superpixels, Data-Driven Primitives and Layout
Alejo Concha, Wajahat Hussain, Luis Montano and Javier Civera, Incorporating Scene Priors to Dense Monocular Mapping, Autonomous Robots 2015.
• NYU dataset, low-parallax sequences
The layout can prevent tracking loss!
Marta Salas, Wajahat Hussain, Alejo Concha, Luis Montano, Javier Civera, J. M. M. Montiel, Layout Aware Visual Tracking and Mapping, IROS 2015.
Object features (high-level)
Conclusions: vSLAM features and performance
Point-based features (low-level)
High accuracy if high texture and high parallax.
Superpixels (mid-level)
High accuracy if low texture and high parallax.
Data-driven primitives (mid-level)
Decent accuracy even for low texture and low parallax.
The patterns should be discovered in the training data.
Layout (high-level)
Decent accuracy even for low texture and low parallax.
The layout patterns should appear in the image.
Objects (high-level)
High accuracy for object instances, decent accuracy for object categories.
The object should appear in the image.
Acknowledgments
J. M. M. Montiel, Andrew J. Davison, Alejo Concha, Wajahat Hussain, L. Montano, L. Montesano, J. Sola, T. Vidal-Calleja, A. C. Murillo, O. G. Grasa, D. R. Bueno, A. Agudo, D. Galvez-Lopez, L. Riazuelo, Taihú Pire, Jorge Romeo, J. D. Tardos, J. Neira, J. A. Castellanos, Marta Salas, A. Argiles, Chema Fácil, Jesús Oliva, Vittorio Ferrari, Alessandro Prest, Christian Leistner, Cordelia Schmid, Ian Reid, Brian Williams, Margarita Chli, Paulo Drews Jr, Mario Campos, Martial Hebert, Javier Mínguez, María López, Roboearth Consortium (TU/e, Philips, Universität Stuttgart, ETHZ, TUM), IGLU consortium (Univ. Montreal, Inria Bordeaux, Univ. Mons, KTH, Univ. Lille)…
Funding: CICYT DPI2003-07986, DPI2006-13578, DPI2009-07130, DPI2012-32168, PCIN-2015-122, EU RAWSEEDS project FP6-045144, EU RoboEarth project FP7-248942, DGA-CAI IT12-06, DGA-CAI IT 26/10, SNSF IZK0Z2-136096.
Thank you!
Javier Civera, (+34) 876 55 55 54
https://plus.google.com/+JavierCivera
http://www.youtube.com/user/jciveravision
https://twitter.com/jcivera
http://www.linkedin.com/in/jcivera
http://webdiis.unizar.es/~jcivera/