TRANSCRIPT

Robust Scale Estimation in Real-Time Monocular SFM for Autonomous Driving
Shiyu Song and Manmohan Chandraker, NEC Laboratories America
Method
Scale Drift Correction using Ground Plane Estimation

Fig. The height of the camera above the ground plane is used to resolve the scale s.

Monocular SFM can compute both 3D points and camera motion (R, s t), but only up to an unknown scale factor s. The known height of the camera above the ground is used to resolve s and correct scale drift.
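In code, resolving the scale amounts to rescaling the reconstruction so that the estimated ground height agrees with the known, calibrated camera height. A minimal sketch, assuming `h_est` comes from a ground-plane estimator and `h_calib` is the calibrated camera height (function and argument names are illustrative, not from the poster):

```python
import numpy as np

def correct_scale(t, h_est, h_calib):
    """Rescale the scale-ambiguous camera translation t so that the
    estimated ground height h_est agrees with the known height h_calib."""
    s = h_calib / h_est  # scale factor relating SFM units to metric units
    return s * np.asarray(t, dtype=float), s
```

For example, `correct_scale([0.0, 0.0, 2.0], h_est=0.5, h_calib=1.5)` rescales the translation by s = 3.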
Approach: Adaptive Cue Fusion

Cue 1: Dense Stereo
The ground plane, with unit normal n = (n1, n2, n3)^T at height h below the camera, induces a homography mapping of road pixels from frame k to frame k+1. The ground height is estimated as the value that maximizes the matching score (1 − ρ_SAD) of the homography-warped road region, i.e. minimizes the SAD error.
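The plane-induced homography can be sketched as follows. For a plane {x : nᵀx = h}, calibration K, and motion (R, t), the standard pixel homography is G = K (R + t nᵀ / h) K⁻¹, up to the sign convention chosen for the plane equation (the function names here are illustrative):

```python
import numpy as np

def ground_homography(K, R, t, n, h):
    """Pixel homography induced by the plane {x : n^T x = h} under motion (R, t)."""
    return K @ (R + np.outer(t, n) / h) @ np.linalg.inv(K)
```

With zero translation the mapping reduces to the identity, which gives a quick sanity check.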
Cue 2: 3D Points
Sparse 3D points triangulated on the road, together with the relative motion (R, T) between frames k and k+1, also constrain the ground plane n = (n1, n2, n3)^T. Each point i votes for a ground height h_i, and the height is scored robustly with a kernel sum,

    q = max_h Σ_{i∈S} exp(−μ ‖h − h_i‖²),

so that outlier points off the road contribute little to the score.
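The kernel-sum score above can be evaluated over a grid of candidate heights. A sketch of the robust voting idea; μ and the grid resolution are illustrative choices, not from the poster:

```python
import numpy as np

def height_from_points(point_heights, mu=10.0, grid=None):
    """Kernel-voting ground-height estimate: score(h) = sum_i exp(-mu*(h - h_i)^2),
    maximized over a grid of candidate heights h."""
    h_i = np.asarray(point_heights, dtype=float)
    if grid is None:
        grid = np.linspace(h_i.min(), h_i.max(), 201)
    scores = np.exp(-mu * (grid[:, None] - h_i[None, :]) ** 2).sum(axis=1)
    k = int(np.argmax(scores))
    return grid[k], scores[k]
```

Because each point contributes at most 1 to the score, a few gross outliers cannot outvote a consistent cluster of road points.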
Cue 3: Object Detection
A detected object of known height class (e.g. a car) provides a third cue: its bounding box, back-projected through the ground plane, yields an object height estimate ĥ. The plane orientation n3 is then refined by object height estimation,

    min_{n3} (h − ĥ)².
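The object cue rests on the pinhole relation between 3D height and bounding-box height: an object of height H at depth Z images to roughly f·H/Z pixels. A minimal sketch of inverting that relation, assuming a fronto-parallel object (names illustrative):

```python
def object_height_from_bbox(f, bbox_h_px, depth):
    """Invert the pinhole relation bbox_h_px ~= f * H / Z
    to estimate the 3D object height H from its 2D bounding box."""
    return bbox_h_px * depth / f
```

With the depth supplied by intersecting the bounding-box foot with the current ground-plane hypothesis, a wrong plane yields an implausible car height, which is exactly the error the minimization above penalizes.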
Highlights
• Highly accurate monocular SFM, with accuracy comparable to stereo.
• Ground plane estimation from multiple cues: dense stereo, 3D points, and object detection.
• A novel data-driven framework that adaptively combines the cues, weighted by per-frame observation covariances estimated from rigorously trained models.
• Scale drift is corrected using the optimally fused estimates, yielding high accuracy and robustness.
Comparison of LIDAR, stereo, and monocular systems:

                 LIDAR      Stereo Camera   Monocular Camera
  Cost           $70,000    $1,000          $100
  Maintenance    Hard       Hard            Easy
KITTI Visual Odometry Benchmark
Fig. Examples of 1D Gaussian fits for the feature matching and triangulation cue: (a) histogram of data; (b) learned linear model relating the observation variance to the underlying variable q.
The per-cue models are trained by fitting Gaussians to histograms of the observed scores α_i:

    min_{A_k, b_k, Σ_k}  Σ_{i=1}^{F} ( A_k exp(−½ (q_i − b_k)^T Σ_k^{−1} (q_i − b_k)) − α_i )²,

with a diagonal covariance

    Σ_k = diag(σ_x², σ_y², σ_{w_k}², σ_{h_k}²).
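A 1D instance of this fit can be done in closed form by a parabola fit in log space, since log of a Gaussian is quadratic in q. A sketch under the assumption that the histogram values α are positive (function name illustrative):

```python
import numpy as np

def fit_gaussian_1d(q, alpha):
    """Fit alpha ~= A * exp(-(q - b)^2 / (2*s2)) by fitting a parabola
    to log(alpha): log a = c2*q^2 + c1*q + c0, then recovering (A, b, s2)."""
    w = alpha > 0                       # log is only defined on positive bins
    c2, c1, c0 = np.polyfit(q[w], np.log(alpha[w]), 2)
    s2 = -1.0 / (2.0 * c2)              # variance from the quadratic term
    b = c1 * s2                         # mean from the linear term
    A = np.exp(c0 + b * b / (2.0 * s2)) # amplitude from the constant term
    return A, b, s2
```

In practice a weighted or iterative least-squares fit (as in the objective above) is more robust to noisy histogram tails, but the closed form shows the structure of the problem.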
Adaptive fusion pipeline: each cue is passed through its trained model to produce per-frame estimates of the ground height h and the plane orientation entries n1, n3, each with an observation variance:

  Dense Stereo      → Trained Model 1
  3D Points         → Trained Model 2
  Object Detection  → Trained Model 3

The per-cue estimates (h, n1, n3) and their variances are then fused into a single ground-plane estimate.
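For a single scalar parameter, the fusion step is the standard inverse-variance (maximum-likelihood) combination of the per-cue estimates. This is a sketch of that one-step combination; the system itself runs a filter over time, of which this is the single-frame measurement-update analogue:

```python
import numpy as np

def fuse(estimates, variances):
    """Inverse-variance weighted fusion: cues with smaller observation
    variance (more reliable this frame) receive proportionally more weight."""
    w = 1.0 / np.asarray(variances, dtype=float)
    fused_var = 1.0 / w.sum()
    fused = fused_var * (w * np.asarray(estimates, dtype=float)).sum()
    return fused, fused_var
```

Because the variances come from the trained per-frame models rather than being fixed, a cue that is unreliable in the current frame (e.g. dense stereo on textureless road) is automatically down-weighted.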
Fig. (a) Histogram of data. (b) Learned linear model relating the observation variance to the underlying variable of the object cue.
Top Monocular Performance Summary

Fig. Comparison of stereo systems, our monocular system, and prior monocular systems.

Challenge: Scale Ambiguity
Fig. A 2D image fixes each 3D point only up to the ray through the camera center and the image plane, so the 3D points, and hence the reconstruction scale, are ambiguous.
Fig. Height error relative to ground truth. The effectiveness of our data fusion is shown by fewer spikes in the filter output and a far lower error.
Fig. (a) Average error in ground plane estimation. (b) Percentage of frames where the height error is below 7%.
Ground Height and Object Localization

Fig. Comparison of 3D object localization errors for calibrated ground, stereo cue only, fixed-covariance fusion, and adaptive-covariance fusion of stereo and detection cues.