Data-Driven 3D Primitives for Single Image Understanding
David Fouhey, Abhinav Gupta, Martial Hebert
Qualitative Results
Quantitative Results
Cross-Dataset Results
Input Ground-Truth
                Manhattan-World Techniques          Non-Manhattan-World Techniques
                Lee ’09     Hedau ’10   3DP         Karsch ’12  Saxena ’08  Hoiem ’07   Singh ’12   3DP
Mean (°)        44.9        41.2        33.5        40.8        47.1        41.2        35.0        33.0
Median (°)      34.6        25.5        18.0        37.8        42.3        34.8        32.4        28.3
RMSE (°)        54.8        55.1        46.6        46.9        56.3        49.3        40.6        40.0
Pct. < 11.25°   24.8        33.2        34.7        7.9         11.2        9.0         11.2        18.8
Pct. < 22.5°    40.5        47.7        55.0        25.8        28.0        31.7        32.1        40.7
Pct. < 30°      46.7        53.0        61.2        38.2        37.4        43.9        45.8        52.4
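The six rows of the table are summary statistics over per-pixel angular errors. A minimal sketch of how such a summary could be computed is given here; the function name, the dictionary keys, and the exact RMSE definition (root of the mean squared error in degrees) are assumptions for illustration, not taken from the authors' code.

```python
import math
from statistics import mean, median


def summarize_angular_errors(errors_deg, thresholds=(11.25, 22.5, 30.0)):
    """Summarize per-pixel angular errors (in degrees).

    Returns mean, median, RMSE, and the percentage of pixels whose
    error falls below each threshold.
    """
    if not errors_deg:
        raise ValueError("need at least one per-pixel error")
    n = len(errors_deg)
    summary = {
        "mean": mean(errors_deg),
        "median": median(errors_deg),
        # Assumed definition: square root of the mean squared error.
        "rmse": math.sqrt(sum(e * e for e in errors_deg) / n),
    }
    for t in thresholds:
        # Percentage of pixels with error strictly below the threshold.
        summary[f"pct_below_{t}"] = 100.0 * sum(e < t for e in errors_deg) / n
    return summary


# Toy errors, not data from the poster.
stats = summarize_angular_errors([5.0, 10.0, 20.0, 25.0, 40.0])
```

For the first three statistics a lower value is better; for the percentage rows a higher value is better.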
PETS (No ground truth normals)
UIUC (No ground truth normals)
B3DO (State of the art performance)
Higher is better (Pct. rows); lower is better (Mean, Median, RMSE rows)
Task: 3D Understanding
Train on NYU, test on other data with identical settings.
Code Available!
Input, Sparse, Dense (shown for three examples)
4-way split on NYU Depth v2
Per-pixel evaluation criterion: angular error
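Angular error at a pixel is the angle between the predicted and ground-truth surface normals. The sketch below shows one way to compute it; the function name and the tuple representation of normals are assumptions made for illustration.

```python
import math


def angular_error_deg(predicted, ground_truth):
    """Angle in degrees between two 3D normals (need not be unit length)."""
    dot = sum(p * g for p, g in zip(predicted, ground_truth))
    norm_p = math.sqrt(sum(p * p for p in predicted))
    norm_g = math.sqrt(sum(g * g for g in ground_truth))
    if norm_p == 0.0 or norm_g == 0.0:
        raise ValueError("normals must be non-zero")
    # Clamp to [-1, 1] so rounding error cannot push acos out of its domain.
    cosine = max(-1.0, min(1.0, dot / (norm_p * norm_g)))
    return math.degrees(math.acos(cosine))


# A normal facing the camera versus one tilted halfway toward the y axis.
example_error = angular_error_deg((0.0, 0.0, 1.0), (0.0, 1.0, 1.0))
```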
Performance (Error) vs. Coverage (% Pixels Predicted)
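One plausible way to trace an error-versus-coverage curve is to rank predicted pixels by a confidence score and report the mean error over the most confident fraction. The poster does not say how its curves were produced, so the ranking by confidence, the function name, and the coverage levels below are all assumptions.

```python
def error_vs_coverage(errors_deg, confidences, coverages=(0.25, 0.5, 1.0)):
    """Mean angular error over the most confident fraction of pixels.

    Returns a list of (coverage, mean_error) pairs, one per coverage level.
    """
    if len(errors_deg) != len(confidences) or not errors_deg:
        raise ValueError("need equally sized, non-empty inputs")
    # Sort errors by descending confidence (hypothetical per-pixel score).
    ranked = [
        err
        for _, err in sorted(
            zip(confidences, errors_deg), key=lambda pair: pair[0], reverse=True
        )
    ]
    curve = []
    for coverage in coverages:
        # Keep at least one pixel at every coverage level.
        k = max(1, round(coverage * len(ranked)))
        curve.append((coverage, sum(ranked[:k]) / k))
    return curve


# Toy values, not data from the poster.
curve = error_vs_coverage([10.0, 40.0, 20.0, 30.0], [0.9, 0.1, 0.8, 0.3])
```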
3D Primitives
Sparse Results 3D Primitives RF+SIFT Karsch et al. Hoiem et al.
Dense Results
Qualitative Comparison
Trihedral Primitives
tinyurl.com/3DPrimitives
(NYU Depth v2 Dataset)
Mean Error
% Pixels < 22.5° (Precision)
What are the right primitives?
Our answer: any region that is visually discriminative and geometrically informative
Dihedral Primitives
Planar Primitives
Many-plane Primitives
Objects and Parts
Geometric Consistency Enforces: Informative
Misclassification Loss Enforces: Discriminative
Hard to optimize directly: use an iterative approach that alternates between a discriminative step (learning the detector) and an informative step (updating the canonical form)
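The alternation can be sketched as a short loop over three pluggable steps. Everything here is a toy illustration under stated assumptions: `discover_primitive` and the three `toy_*` helpers are hypothetical names, patches are reduced to single numbers, and the "detector" is just the instance mean rather than a linear SVM.

```python
def discover_primitive(patches, instances, average, train_detector, search, rounds=3):
    """Alternate the informative and discriminative updates for one primitive."""
    canonical, detector = None, None
    for _ in range(rounds):
        canonical = average(patches, instances)        # informative: update N
        detector = train_detector(patches, instances)  # discriminative: update w
        instances = search(patches, detector)          # re-select instances y
    return detector, canonical, instances


def toy_average(patches, instances):
    # Stand-in for averaging surface normals: mean of the selected values.
    return sum(patches[i] for i in instances) / len(instances)


def toy_train(patches, instances):
    # Stand-in for SVM training (assumption): the detector is the instance mean.
    return toy_average(patches, instances)


def toy_search(patches, detector, top_k=3):
    # Stand-in for scanning the training set: keep the top_k closest patches.
    ranked = sorted(range(len(patches)), key=lambda i: abs(patches[i] - detector))
    return ranked[:top_k]


patches = [1.0, 1.2, 0.9, 5.0, 5.1]
detector, canonical, instances = discover_primitive(
    patches, [0, 3], toy_average, toy_train, toy_search
)
```

Starting from a mixed initialization, the toy loop settles on the three mutually similar patches, which is the kind of behaviour the alternation is meant to encourage.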
Learning
Inference
Sparse: Transfer canonical form
Dense: Transfer patch context
Detections Averaged patch contexts Results
Regularization
Detections Primitives Results
Goal: Discover primitives in large-scale RGBD Data
Components: Detector (w), Canonical Form (N), Primitive Instances (y)
Initialization (y): Cluster hundreds of thousands of random patches in normal and HOG space.
Formulation
Search: Scan training set for top detections
Average: Average per-pixel surface normals
SVM: Train a linear SVM to detect instances
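The Average step can be illustrated with a small sketch that averages per-pixel surface normals across the current instances of a primitive. Renormalizing each averaged normal to unit length is an assumption, as are the function name and the list-of-tuples representation of a normal map.

```python
import math


def average_normals(instance_normal_maps):
    """Average per-pixel normals across instances, then renormalize.

    Each instance is a list of (x, y, z) normals, one per pixel, and all
    instances must have the same number of pixels.
    """
    if not instance_normal_maps:
        raise ValueError("need at least one instance")
    canonical = []
    # zip(*maps) groups the normals that share a pixel position.
    for pixel_normals in zip(*instance_normal_maps):
        summed = [sum(axis) for axis in zip(*pixel_normals)]
        length = math.sqrt(sum(c * c for c in summed))
        if length == 0.0:
            # Opposing normals cancel out; leave the pixel undefined as zeros.
            canonical.append((0.0, 0.0, 0.0))
        else:
            # Dividing by the count is unnecessary since we renormalize anyway.
            canonical.append(tuple(c / length for c in summed))
    return canonical


# Two toy instances with two pixels each.
map_a = [(0.0, 0.0, 1.0), (1.0, 0.0, 0.0)]
map_b = [(0.0, 0.0, 1.0), (0.0, 1.0, 0.0)]
canonical_form = average_normals([map_a, map_b])
```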
Iterative Solution
Lee et al.
Results
Average (N), SVM (w), Search (y)
Already in use at CMU as a feature!
(NYU v2)