Mid and high-level features for dense monocular SLAM

Javier Civera Qualcomm Augmented Reality Lecture Series

Nov. 19th, 2015


Index

• Introduction/motivation
• Point-based monocular SLAM
• Keypoint-based monocular SLAM
• Dense monocular SLAM
• Mid-level features
  • Superpixels
  • Data-driven primitives
• High-level features
  • Room layout
  • Objects

Robotic Vision

• Robotic vision is making a robot “see”. **
• Now… what does it mean for a robot to see?
• Data input:
  • Image sequences.
  • Multi-sensor.
  • Active sensing.
• Problem constraints:
  • Real-time.
  • Hardware limits.
• Goals:
  • Self-localization.
  • 3D scene models.
  • Temporal models.
  • Local short-term accuracy.
  • Long-term models.
  • Semantics.

** Paraphrasing Olivier Faugeras in Hartley & Zisserman’s book

Other applications

• The robotics constraints are shared with other applications.

• AR/VR.
• Wearable/mobile devices.
• Laparoscopic surgery.
• …

Grasa et al., Visual SLAM for Hand-Held Monocular Endoscope, IEEE TMI, 2014

videos/project_tango/tango.mp4

videos/Place IKEA furniture in your home with augmented reality.mp4

videos/garcia_etal_TMI13.mp4

Point-based features (low-level)

• Point-based features are accurate in high-texture image regions and for high-parallax motions.

• The typical approach has been to use salient point features, discarding low-texture parts.

• SfM and Visual SLAM datasets are biased to high-parallax motions.

Camera Geometry

• A camera is a bearing-only sensor: it only measures angles.

• The depth of the scene is estimated by triangulation.

• The depth estimation is based on the parallax angle.

• The larger the parallax, the more accurate the depth estimation.

[Figure: a scene point p_i observed from two camera positions C1 and C2, separated by the translation t_c1c2; the PARALLAX ANGLE is the angle between the two viewing rays.]
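The link between parallax and depth accuracy can be illustrated with a small planar example. This is only a sketch under assumed conditions: two cameras on the X axis looking along Z, a point centred between them, and a fixed angular error on one bearing. The function names are hypothetical and not part of any SLAM library.

```python
import math

def triangulate_depth(baseline, bearing1, bearing2):
    """Depth Z of a point seen from C1 = (0, 0) and C2 = (baseline, 0).

    Bearings are angles measured from the optical (Z) axis, in radians.
    From x = Z*tan(bearing1) and x - baseline = Z*tan(bearing2).
    """
    return baseline / (math.tan(bearing1) - math.tan(bearing2))

def depth_error(baseline, depth, angle_noise=1e-3):
    """Return (parallax angle, depth error) when one bearing is off by angle_noise."""
    x = baseline / 2.0                      # assumed: point centred between the cameras
    b1 = math.atan2(x, depth)               # bearing from C1
    b2 = math.atan2(x - baseline, depth)    # bearing from C2
    parallax = b1 - b2                      # angle between the two viewing rays
    z_noisy = triangulate_depth(baseline, b1, b2 + angle_noise)
    return parallax, abs(z_noisy - depth)

if __name__ == "__main__":
    for depth in (2.0, 20.0):
        parallax, err = depth_error(0.2, depth)
        print(f"depth={depth:5.1f} m  parallax={math.degrees(parallax):.2f} deg  error={err:.3f} m")
```

With the same baseline and the same angular noise, the distant point has a smaller parallax angle and a much larger depth error than the close one.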

Low-Parallax Points

• Low parallax is due to:
  • Distant points.
  • Small camera translation.

• Depth cannot be estimated for zero-parallax points...
• ... but they provide rich orientation information.

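[Figure: inverse depth parameterization. Scene point i is coded by the camera position at its first observation (x_i, y_i, z_i), the ray direction m(θ_i, φ_i) and the depth d_i = 1/ρ_i along that ray, all in the world frame W. The current camera C has pose (r^{WC}, q^{WC}); the parallax angle is the angle between the first-observation ray and the current ray.]

$$\mathbf{y}_i = \begin{pmatrix} x_i & y_i & z_i & \theta_i & \phi_i & \rho_i \end{pmatrix}^\top$$

$$\text{scene point } i:\quad \begin{pmatrix} x_i \\ y_i \\ z_i \end{pmatrix} + \frac{1}{\rho_i}\,\mathbf{m}(\theta_i, \phi_i), \qquad d_i = \frac{1}{\rho_i}$$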
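As a rough illustration of the inverse depth coding, the sketch below converts a six-parameter feature (x, y, z, θ, φ, ρ) into a Euclidean point. The azimuth-elevation convention used for m(θ, φ) is an assumption taken from the usual inverse depth formulation, and the function names are hypothetical.

```python
import math

def direction(theta, phi):
    """Unit ray direction m(theta, phi) from azimuth theta and elevation phi.

    Assumed convention: m = (cos(phi) sin(theta), -sin(phi), cos(phi) cos(theta)).
    """
    return (math.cos(phi) * math.sin(theta),
            -math.sin(phi),
            math.cos(phi) * math.cos(theta))

def inverse_depth_to_xyz(x, y, z, theta, phi, rho):
    """Euclidean point: first-observation camera centre plus (1/rho) * m(theta, phi)."""
    if rho <= 0.0:
        raise ValueError("rho must be positive to obtain a finite point")
    m = direction(theta, phi)
    return (x + m[0] / rho, y + m[1] / rho, z + m[2] / rho)

if __name__ == "__main__":
    # A point seen straight ahead from the origin, at depth 1/0.5 = 2.
    print(inverse_depth_to_xyz(0.0, 0.0, 0.0, 0.0, 0.0, 0.5))
```

A point at infinity corresponds to ρ = 0 and has no finite Euclidean position, which is why the sketch rejects it; in the inverse depth state it is still a perfectly valid feature.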

Inverse Depth Point Initialization

New points are added from their 1st observation:

1) {x, y, z, θ, φ} are initialized from the 1st observation and the state vector.
2) ρ0 and its uncertainty σρ0 are initialized so that [ρ0 − 2σρ0, ρ0 + 2σρ0] includes infinite depth (ρ = 0).

$$\rho_0 = \frac{1}{2\,d_{min}}$$

[Figure: the initial uncertainty of the point shown in INVERSE DEPTH SPACE and in EUCLIDEAN SPACE.]
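The initialization rule can be checked numerically with a short sketch. The specific values ρ0 = 1/(2 d_min) and σρ0 = 1/(4 d_min) are an assumption here (they make the 2σ interval span exactly from ρ = 0 to ρ = 1/d_min); the function names are hypothetical.

```python
import math

def init_inverse_depth(d_min):
    """Assumed prior: rho0 = 1/(2 d_min), sigma = 1/(4 d_min)."""
    rho0 = 1.0 / (2.0 * d_min)
    sigma = 1.0 / (4.0 * d_min)
    return rho0, sigma

def depth_interval(rho0, sigma, k=2.0):
    """Depth range covered by [rho0 - k*sigma, rho0 + k*sigma].

    If the lower inverse depth bound reaches zero (or below), the far
    end of the depth range is infinity.
    """
    rho_lo = rho0 - k * sigma
    rho_hi = rho0 + k * sigma
    d_near = 1.0 / rho_hi
    d_far = math.inf if rho_lo <= 0.0 else 1.0 / rho_lo
    return d_near, d_far

if __name__ == "__main__":
    rho0, sigma = init_inverse_depth(d_min=1.0)
    print(rho0, sigma, depth_interval(rho0, sigma))
```

The same Gaussian that is compact in inverse depth covers everything from d_min to infinity in depth, which is the reason for initializing in inverse depth space.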

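Inverse Depth Point Measurement

[Figure: inverse depth parameterization geometry: world frame W, camera C with pose (r^{WC}, q^{WC}), first-observation position (x_i, y_i, z_i), ray direction m(θ_i, φ_i), depth d_i = 1/ρ_i and parallax angle.]

Projection Model

1) Camera reference frame:

$$\mathbf{h}^C = \begin{pmatrix} h_x^C \\ h_y^C \\ h_z^C \end{pmatrix} = \mathbf{R}^{CW}\left( \rho_i \left( \begin{pmatrix} x_i \\ y_i \\ z_i \end{pmatrix} - \mathbf{r}^{WC} \right) + \mathbf{m}(\theta_i, \phi_i) \right)$$

2) Pinhole camera model:

$$\mathbf{h}_u = \begin{pmatrix} u_u \\ v_u \end{pmatrix} = \begin{pmatrix} C_x - f\,\dfrac{h_x^C}{h_z^C} \\[2mm] C_y - f\,\dfrac{h_y^C}{h_z^C} \end{pmatrix}$$

3) Two-parameter radial distortion:

$$u_u = C_x + (u_d - C_x)\left(1 + \kappa_1 r_d^2 + \kappa_2 r_d^4\right)$$

$$v_u = C_y + (v_d - C_y)\left(1 + \kappa_1 r_d^2 + \kappa_2 r_d^4\right)$$

$$r_d = \sqrt{\left(d_x (u_d - C_x)\right)^2 + \left(d_y (v_d - C_y)\right)^2}$$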
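The first two steps of the projection model (camera reference frame and pinhole projection, without distortion) can be sketched as follows. The sign convention of the pinhole step, the direction convention for m(θ, φ), and all function and parameter names are assumptions made for illustration.

```python
import numpy as np

def direction(theta, phi):
    """Assumed convention: m = (cos(phi) sin(theta), -sin(phi), cos(phi) cos(theta))."""
    return np.array([np.cos(phi) * np.sin(theta),
                     -np.sin(phi),
                     np.cos(phi) * np.cos(theta)])

def project_inverse_depth(feature, r_wc, R_cw, f, cx, cy):
    """Project an inverse depth feature (x, y, z, theta, phi, rho) into the image.

    r_wc: camera position in the world frame; R_cw: world-to-camera rotation;
    f: focal length in pixels; (cx, cy): principal point.
    """
    x, y, z, theta, phi, rho = feature
    # Ray in the camera frame, scaled by rho so that rho = 0 (infinity) is valid.
    h_c = R_cw @ (rho * (np.array([x, y, z]) - r_wc) + direction(theta, phi))
    u = cx - f * h_c[0] / h_c[2]
    v = cy - f * h_c[1] / h_c[2]
    return np.array([u, v])

if __name__ == "__main__":
    R = np.eye(3)
    cam = np.array([0.1, 0.0, 0.0])
    far = (0.0, 0.0, 0.0, 0.0, 0.0, 0.0)    # point at infinity
    near = (0.0, 0.0, 0.0, 0.0, 0.0, 0.5)   # point at depth 2
    print(project_inverse_depth(far, cam, R, 500.0, 320.0, 240.0))
    print(project_inverse_depth(near, cam, R, 500.0, 320.0, 240.0))
```

Because the ray is multiplied by ρ rather than divided by it, the measurement equation stays well defined for points at infinity: they project to the same pixel regardless of the camera translation.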

Inverse Depth Parameterization

[Figure panels: Feature 3; Feature 11.]

videos/03 - civera_tro08_outdoors.avi

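Standard RANSAC: 1D example

1) RANDOM SAMPLES
2) PARTIAL UPDATE
3) RESCUE INLIERS

Number of hypotheses n, for a success probability P, an outlier ratio ε and a sample size m:

$$n = \frac{\log(1 - P)}{\log\left(1 - (1 - \epsilon)^m\right)}, \qquad m = 2$$

[Figure: three hypotheses receiving 10 votes, 1 vote and 8 votes; one data point is marked "Outlier!!".]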

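1-Point RANSAC: 1D example

1) RANDOM SAMPLES
2) PARTIAL UPDATE
3) RESCUE INLIERS

Number of hypotheses n, for a success probability P, an outlier ratio ε and a sample size m:

$$n = \frac{\log(1 - P)}{\log\left(1 - (1 - \epsilon)^m\right)}, \qquad m = 1$$

m = 1: lower n, fewer samples!

[Figure: three hypotheses receiving 11 votes, 3 votes and 8 votes; data points are labelled Inlier, Outlier and High innovation.]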
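The effect of the sample size on the number of RANSAC hypotheses can be checked with a few lines of Python. This is a sketch of the standard formula; the success probability and outlier ratio below are illustrative assumptions, and the function name is hypothetical.

```python
import math

def ransac_iterations(p_success, outlier_ratio, sample_size):
    """Hypotheses needed so that, with probability p_success, at least one
    random sample of sample_size points is free of outliers."""
    all_inliers = (1.0 - outlier_ratio) ** sample_size
    return math.ceil(math.log(1.0 - p_success) / math.log(1.0 - all_inliers))

if __name__ == "__main__":
    for m in (1, 2, 5):
        n = ransac_iterations(p_success=0.99, outlier_ratio=0.5, sample_size=m)
        print(f"sample size m={m}: {n} hypotheses")
```

The count grows quickly with the sample size, which is why using a prior to reduce the minimal sample to a single point cuts the number of hypotheses so sharply.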

Experimental Results for Large Trajectories

• 650 metres trajectory; 24180 images.
• Error: ~1% of the trajectory length.

RAWSEEDS datasets: http://www.rawseeds.org

• Camera + wheel odometry, 1310 metres, 54000 frames (~30 min video).

videos/04 - jfr10_monocular_plus_odometry.avi

Feature-based stereo SLAM

• S-PTAM: Stereo Parallel Tracking and Mapping.
• ~1.35% translation error.
• 10th position in the KITTI ranking (small differences with respect to the higher-ranked methods).
• 1st among the stereo methods with code available.

Taihú Pire, Thomas Fischer, Javier Civera, Pablo de Cristóforis, Julio César Jacobo Berlles, Stereo Parallel Tracking and Mapping for Robot Localization, IROS 2015. CODE AVAILABLE AT https://github.com/lrse/sptam

videos/Stereo Parallel Tracking and Mapping (S-PTAM) in the KITTI dataset (sequence 00).mp4

How useful is a sparse map for a robot?

Not enough for navigation

Not enough for high-level tasks. E.g., “bring me a book from Henry’s table”

At least I have an accurate robot motion…

Dense mapping: RGB-D sensors

But…

• RGB-D sensors do not work in direct sunlight.

• RGB-D sensors do not work on every surface.

• Minimum distance (~0.5 metres) and maximum distance (4-8 metres).

• Size, weight, power consumption…

Dense monocular mapping

• Minimize the photometric error and a regularization term.

videos/DTAM_ Dense Tracking and Mapping in Real-Time.mp4

[Table: accuracy, density and cost of keypoint-based and dense mapping, for high texture and for low texture.]
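The idea of minimizing a photometric term plus a regularizer can be sketched as an energy over an inverse depth map. This is a simplified illustration, not the DTAM implementation: the photometric cost per pixel is assumed to be precomputed, the regularizer is an anisotropic Huber penalty on finite differences, and all names and parameters are hypothetical.

```python
import numpy as np

def huber(x, eps):
    """Huber norm: quadratic for |x| <= eps, linear beyond."""
    a = np.abs(x)
    return np.where(a <= eps, 0.5 * a ** 2 / eps, a - 0.5 * eps)

def dense_energy(inv_depth, photometric_cost, lam=1.0, eps=1e-2):
    """Energy of an inverse depth map.

    inv_depth:        (H, W) inverse depth per pixel.
    photometric_cost: (H, W) photometric error of each pixel at its current
                      inverse depth (assumed to be computed elsewhere).
    lam:              weight of the data term against the regularizer.
    """
    gx = np.diff(inv_depth, axis=1)   # horizontal finite differences
    gy = np.diff(inv_depth, axis=0)   # vertical finite differences
    regularizer = huber(gx, eps).sum() + huber(gy, eps).sum()
    return lam * photometric_cost.sum() + regularizer

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    flat = np.full((4, 5), 0.5)
    noisy = flat + 0.1 * rng.standard_normal(flat.shape)
    cost = np.zeros_like(flat)
    print(dense_energy(flat, cost), dense_energy(noisy, cost))
```

In low-texture regions the photometric term is nearly flat, so the regularizer dominates and the solution tends towards smooth surfaces whether or not they are correct.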

Dense Mapping: High Texture


videos/alejo/media2.avi


Dense Mapping: Low Texture






Pedro F. Felzenszwalb and Daniel P. Huttenlocher. Efficient graph-based image segmentation. International Journal of Computer Vision, 59(2):167-181, 2004.

Superpixels (mid-level)



• Image segmentation based on color and 2D distance.

• Decent features for textureless areas.

• We assume that homogeneous color regions are almost planar.

[Table rows: Keypoint-based; Dense; Superpixels; Dense + Sup.]



Dense Mapping: Low Texture

Keypoint-Based Mapping: Low Texture

Superpixels: Low Texture

Pedro F. Felzenszwalb and Daniel P. Huttenlocher. Efficient graph-based image segmentation. International Journal of Computer Vision, 59(2):167-181, 2004.

Superpixel Initialization

Alejo Concha and Javier Civera. Using Superpixels in Monocular SLAM. ICRA 2014

Multiview model: Homography (h)

Error: Contour reprojection error (ɛ)

Monte Carlo initialization: for every superpixel we create a set of reasonable hypotheses for h and rank them by their error ɛ.
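A Monte Carlo ranking of plane hypotheses can be sketched as below. This is an illustration under several assumptions, not the method of the cited paper: hypotheses are drawn uniformly at random, the camera motion (R, t) is known, the plane is written as n·X = d in the first camera frame with X2 = R X1 + t, and the contour error is a simple nearest-point distance. All names are hypothetical.

```python
import numpy as np

def plane_homography(K, R, t, n, d):
    """Homography from view 1 to view 2 induced by the plane n.X = d
    (X in view-1 camera coordinates), assuming X2 = R X1 + t."""
    return K @ (R + np.outer(t, n) / d) @ np.linalg.inv(K)

def warp(H, pts):
    """Apply a homography to an (N, 2) array of pixel coordinates."""
    ph = np.hstack([pts, np.ones((len(pts), 1))]) @ H.T
    return ph[:, :2] / ph[:, 2:3]

def contour_error(H, contour1, contour2):
    """Mean distance from each warped contour point to its nearest point of contour2."""
    w = warp(H, contour1)
    dists = np.linalg.norm(w[:, None, :] - contour2[None, :, :], axis=2)
    return dists.min(axis=1).mean()

def rank_hypotheses(K, R, t, contour1, contour2, n_hyp=200,
                    depth_range=(0.5, 5.0), seed=0):
    """Sample random plane hypotheses and sort them by contour error."""
    rng = np.random.default_rng(seed)
    hyps = []
    for _ in range(n_hyp):
        n = rng.normal(size=3)
        n /= np.linalg.norm(n)
        if n[2] < 0:                 # keep normals on the camera-facing side
            n = -n
        d = rng.uniform(*depth_range)
        err = contour_error(plane_homography(K, R, t, n, d), contour1, contour2)
        if not np.isfinite(err):     # degenerate hypothesis
            err = np.inf
        hyps.append((err, n, d))
    hyps.sort(key=lambda h: h[0])    # best (lowest error) first
    return hyps

if __name__ == "__main__":
    K = np.array([[500.0, 0.0, 320.0], [0.0, 500.0, 240.0], [0.0, 0.0, 1.0]])
    R, t = np.eye(3), np.array([0.2, 0.0, 0.0])
    s = np.linspace(-50.0, 50.0, 11)
    square = np.array([[320 + a, 240 + b] for a in s for b in (-50.0, 50.0)]
                      + [[320 + a, 240 + b] for b in s for a in (-50.0, 50.0)])
    H_true = plane_homography(K, R, t, np.array([0.0, 0.0, 1.0]), 2.0)
    observed = warp(H_true, square)
    best_err, best_n, best_d = rank_hypotheses(K, R, t, square, observed)[0]
    print(best_err, best_n, best_d)
```

The best-ranked hypothesis would then serve as the starting point for a subsequent refinement that minimizes the same error.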

Superpixel Mapping


Multiview model: Homography (h)

Error: Contour reprojection error (ɛ)

Mapping: Minimize the reprojection error.


Superpixels in low-textured areas





Using Superpixels in Monocular SLAM


videos/ICRA14_0321_VI_i.mp4

Dense + Superpixels

Alejo Concha, Wajahat Hussain, Luis Montano and Javier Civera, Manhattan and Piecewise-Planar Constraints for Dense Monocular Mapping, RSS 2014.

[Figure panels: Video (input); PMVS (high-gradient pixels); Dense (TV-regularization); Superpixels; PMVS + Superpixels; Dense + Superpixels.]

Yasutaka Furukawa and Jean Ponce. Accurate, dense, and robust multiview stereopsis. IEEE Transactions on Pattern Analysis and Machine Intelligence, 32(8):1362-1376, 2010.

Richard A. Newcombe, Steven J. Lovegrove, and Andrew J. Davison. DTAM: Dense tracking and mapping in real-time. In IEEE International Conference on Computer Vision (ICCV), pages 2320-2327, 2011.

videos/concha_etal_rss14.mp4



Semidense mapping + superpixels

• TV-regularization is expensive; a GPU might be needed for real-time operation.
• Semidense mapping plus superpixels is a reasonable option: cheaper than TV-regularization (it runs on a CPU), with a small loss in density.
• Given a semidense map, superpixels can be initialized via SVD more accurately and at a lower cost.
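Fitting a plane to 3D points with an SVD can be sketched as follows. This is a generic least-squares plane fit, shown as an assumption of what "initialized via SVD" involves; selecting which semidense points belong to a superpixel is outside the sketch, and the function names are hypothetical.

```python
import numpy as np

def fit_plane_svd(points):
    """Least-squares plane through an (N, 3) array of 3D points.

    Returns (normal, d) with the plane written as normal . X = d.
    The normal is the right singular vector with the smallest singular value.
    """
    centroid = points.mean(axis=0)
    _, _, vt = np.linalg.svd(points - centroid, full_matrices=False)
    normal = vt[-1]
    return normal, float(normal @ centroid)

def point_plane_distances(points, normal, d):
    """Unsigned distance of each point to the plane normal . X = d."""
    return np.abs(points @ normal - d)

if __name__ == "__main__":
    xs, ys = np.meshgrid(np.linspace(-1, 1, 5), np.linspace(-1, 1, 5))
    zs = 0.5 * xs - 0.2 * ys + 2.0           # synthetic planar patch
    pts = np.column_stack([xs.ravel(), ys.ravel(), zs.ravel()])
    normal, d = fit_plane_svd(pts)
    print(normal, d, point_plane_distances(pts, normal, d).max())
```

The residual distances give a quick planarity check, so a superpixel whose points fit poorly could be discarded instead of being added to the map.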

Alejo Concha, Javier Civera, DPPTAM: Dense Piecewise Planar Tracking and Mapping from a Monocular Sequence, IROS 2015. Code to be released soon! https://github.com/alejocb/dpptam

videos/iros15_video.mp4

Semidense mapping + superpixels

• The SVD superpixels are more accurate than the triangulated ones.

• The SVD superpixels are as accurate as the semidense map.

• Large errors in dense reconstructions!!

• Superpixels improve the error of dense reconstructions.

• A reasonable solution is to filter out low parallax points.

[3]: Alejo Concha and Javier Civera. Using Superpixels in Monocular SLAM. ICRA 2014.

(ours): Alejo Concha and Javier Civera. DPPTAM: Dense Piecewise Planar Tracking and Mapping from a Monocular Sequence. IROS 2015.

Monocular – Inertial Dense SLAM

• Integrating the inertial measurements gives the real scale of the reconstruction.

ICRA 2016 submission!

videos/ICRA16 1479 VI i.mp4

Now, how useful is this dense map for a robot?

Good enough for navigation

Not enough for high-level tasks. E.g., “bring me a book from Henry’s table”

We are more resilient to low texture, but we still need parallax…

Data-driven primitives (mid-level)

David F. Fouhey, Abhinav Gupta, and Martial Hebert. Data-driven 3D primitives for single image understanding. ICCV, 2013.

Feature discovery on RGB-D training data.

Extracts patterns that are consistent in D and discriminative in RGB

At test time, from a single RGB view we can predict mid-level depth patterns.

Multiview Layout (high-level)

(a) Sparse/semidense reconstruction.
(b) Plane normals from 3D vanishing points (image VP, backprojection, 3D clustering).
(c) Plane distances from a sparse/semidense multiview reconstruction.
(d) Superpixel segmentation, geometric and photometric feature extraction.
(e), (f) Classification (AdaBoost).
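The backprojection part of step (b) can be sketched in a few lines: an image vanishing point maps to a 3D direction through the inverse of the calibration matrix. This is a minimal sketch assuming a known pinhole calibration K and directions expressed in the camera frame; vanishing point detection and the 3D clustering are not shown, and the function names are hypothetical.

```python
import numpy as np

def vp_to_direction(K, vp):
    """Backproject an image vanishing point (u, v) to a unit 3D direction
    in the camera frame."""
    d = np.linalg.inv(K) @ np.array([vp[0], vp[1], 1.0])
    return d / np.linalg.norm(d)

def direction_to_vp(K, direction):
    """Image vanishing point of a 3D direction (camera frame, non-zero z)."""
    p = K @ direction
    return p[:2] / p[2]

if __name__ == "__main__":
    K = np.array([[500.0, 0.0, 320.0], [0.0, 500.0, 240.0], [0.0, 0.0, 1.0]])
    d = np.array([1.0, 0.0, 1.0]) / np.sqrt(2.0)
    vp = direction_to_vp(K, d)
    print(vp, vp_to_direction(K, vp))
```

To compare directions across frames they would first be rotated into the world frame with the estimated camera orientation; the dominant clusters then give candidate plane normals for the layout.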


Superpixels and Layout


Superpixels, Data-Driven Primitives and Layout

Alejo Concha, Wajahat Hussain, Luis Montano and Javier Civera, Incorporating Scene Priors to Dense Monocular Mapping, Autonomous Robots 2015.

• NYU dataset, high-parallax sequences

• NYU dataset, low-parallax sequences

The layout can prevent tracking loss!

Marta Salas, Wajahat Hussain, Alejo Concha, Luis Montano, Javier Civera, J. M. M. Montiel, Layout Aware Visual Tracking and Mapping, IROS 2015.

videos/IROS_Final.mp4

Object features (high-level)

videos/11 - iros11_desktop.avi

Conclusions: vSLAM features and performance

Point-based features (low-level)

High accuracy if high texture and high parallax.

Superpixels (mid-level)

High accuracy if low texture and high parallax.

Data-driven primitives (mid-level)

Decent accuracy even for low texture and low parallax.

The patterns should be discovered in the training data.

Layout (high-level)

Decent accuracy even for low texture and low parallax.

The layout patterns should appear in the image.

Objects (high-level)

High accuracy for object instances, decent accuracy for object categories.

The object should appear in the image.

Acknowledgments

J. M. M. Montiel, Andrew J. Davison, Alejo Concha, Wajahat Hussain, L. Montano, L. Montesano, J. Sola, T. Vidal-Calleja, A. C. Murillo, O. G. Grasa, D. R. Bueno, A. Agudo, D. Galvez-Lopez, L. Riazuelo, Taihú Pire, Jorge Romeo, J. D. Tardos, J. Neira, J. A. Castellanos, Marta Salas, A. Argiles, Chema Fácil, Jesús Oliva, Vittorio Ferrari, Alessandro Prest, Christian Leistner, Cordelia Schmid, Ian Reid, Brian Williams, Margarita Chli, Paulo Drews Jr, Mario Campos, Martial Hebert, Javier Mínguez, María López, Roboearth Consortium (TU/e, Philips, Universität Stuttgart, ETHZ, TUM), IGLU consortium (Univ. Montreal, Inria Bordeaux, Univ. Mons, KTH, Univ. Lille)…

Funding: CICYT DPI2003-07986, DPI2006-13578, DPI2009-07130, DPI2012-32168, PCIN-2015-122, EU RAWSEEDS project FP6-045144, EU RoboEarth project FP7-248942, DGA-CAI IT12-06, DGA-CAI IT 26/10, SNSF IZK0Z2-136096.

Thank you!

Javier Civera (+34) 876 55 55 54 [email protected]

https://plus.google.com/+JavierCivera http://www.youtube.com/user/jciveravision

https://twitter.com/jcivera http://www.linkedin.com/in/jcivera http://webdiis.unizar.es/~jcivera/

