data-driven 3d primitives for single image...

Data-Driven 3D Primitives for Single Image Understanding

David Fouhey, Abhinav Gupta, Martial Hebert

Qualitative Results

Quantitative Results

Cross-Dataset Results

Input Ground-Truth

                Manhattan-World Techniques     Non-Manhattan-World Techniques
                Lee '09  Hedau '10  3DP        Karsch '12  Saxena '08  Hoiem '07  Singh '12  3DP
Mean (°)        44.9     41.2       33.5       40.8        47.1        41.2       35.0       33.0
Median (°)      34.6     25.5       18.0       37.8        42.3        34.8       32.4       28.3
RMSE (°)        54.8     55.1       46.6       46.9        56.3        49.3       40.6       40.0
Pct. < 11.25°   24.8     33.2       34.7       7.9         11.2        9.0        11.2       18.8
Pct. < 22.5°    40.5     47.7       55.0       25.8        28.0        31.7       32.1       40.7
Pct. < 30°      46.7     53.0       61.2       38.2        37.4        43.9       45.8       52.4

PETS (No ground truth normals)

UIUC (No ground truth normals)

B3DO (State of the art performance)

Lower is better for the error rows (Mean, Median, RMSE); higher is better for the Pct. rows.

Task: 3D Understanding

Train on NYU, test on other data with identical settings.

Code Available!

Panels for each dataset: Input, Sparse, Dense

4-way split on NYU Depth v2

Per-pixel evaluation criterion: angular error
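
The per-pixel angular error and the summary statistics reported in the quantitative table (mean, median, RMSE, and percentage of pixels under 11.25°, 22.5° and 30°) can be sketched as follows. This is a minimal illustration, not the authors' evaluation code; the function names `angular_error_deg` and `summarize` are hypothetical, and it assumes predicted and ground-truth normals are stored as arrays whose last axis has length 3.

```python
import numpy as np

def angular_error_deg(pred, gt):
    """Angle in degrees between predicted and ground-truth normals.

    pred, gt: arrays of shape (..., 3); they need not be unit length.
    """
    pred = pred / np.linalg.norm(pred, axis=-1, keepdims=True)
    gt = gt / np.linalg.norm(gt, axis=-1, keepdims=True)
    # Clip guards against round-off pushing the cosine outside [-1, 1].
    cos = np.clip(np.sum(pred * gt, axis=-1), -1.0, 1.0)
    return np.degrees(np.arccos(cos))

def summarize(errors, thresholds=(11.25, 22.5, 30.0)):
    """Summary statistics over a set of per-pixel angular errors."""
    errors = np.asarray(errors, dtype=float).ravel()
    stats = {
        "mean": float(errors.mean()),
        "median": float(np.median(errors)),
        "rmse": float(np.sqrt(np.mean(errors ** 2))),
    }
    for t in thresholds:
        # Percentage of pixels whose error falls below the threshold.
        stats[f"pct<{t}"] = float(100.0 * np.mean(errors < t))
    return stats
```

Only pixels with valid ground truth would normally be passed to `summarize`; how invalid pixels are masked is left out of this sketch.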

Performance (Error) vs. Coverage (% Pixels Predicted)

3D Primitives

Sparse Results: 3D Primitives, RF+SIFT, Karsch et al., Hoiem et al.

Dense Results

Qualitative Comparison

Trihedral Primitives

tinyurl.com/3DPrimitives

(NYU Depth v2 Dataset)

Plot axes: Mean Error; % Pixels < 22.5° (Precision)

What are the right primitives?

Our answer: any region that is visually discriminative and geometrically informative.

Dihedral Primitives

Planar Primitives

Many-plane Primitives

Objects and Parts

Geometric Consistency Enforces: Informative

Misclassification Loss Enforces: Discriminative

Hard to optimize directly, so use an iterative approach that alternates between a discriminative step (learning the detector) and an informative step (updating the canonical form).

Learning

Inference

Sparse: Transfer canonical form

Dense: Transfer patch context
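
A minimal sketch of the sparse case (transferring each primitive's canonical form to its detection window) is given below. The poster does not say how overlapping detections are combined, so the score-weighted averaging here is an assumption, as are the function name `transfer_sparse` and the `(row, col, N, score)` detection tuple. Windows are assumed to lie inside the image and scores to be positive. The dense case (transferring patch context) is not shown.

```python
import numpy as np

def transfer_sparse(shape, detections):
    """Paste canonical forms at detection windows and average overlaps.

    shape: (H, W) of the image.
    detections: iterable of (row, col, N, score), where N is a (ph, pw, 3)
        canonical form of unit normals and (row, col) is the window's
        top-left corner. Score weighting is an assumption of this sketch.
    Returns (normals, covered): (H, W, 3) unit normals, zero where nothing
    was detected, and an (H, W) boolean coverage mask.
    """
    H, W = shape
    acc = np.zeros((H, W, 3))
    weight = np.zeros((H, W))
    for r, c, N, score in detections:
        ph, pw = N.shape[:2]
        acc[r:r + ph, c:c + pw] += score * N
        weight[r:r + ph, c:c + pw] += score
    covered = weight > 0
    normals = np.zeros_like(acc)
    v = acc[covered]
    # Renormalize the accumulated vectors so each pixel holds a unit normal.
    normals[covered] = v / (np.linalg.norm(v, axis=1, keepdims=True) + 1e-8)
    return normals, covered
```

Pixels outside every detection stay unpredicted, which is why sparse results trade coverage for accuracy.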

Figure panels: Detections; Averaged patch contexts; Results

Regularization

Detections Primitives Results

Goal: Discover primitives in large-scale RGBD Data

Components: Detector (w), Canonical Form (N), Primitive Instances (y)

Initialization (y): Cluster hundreds of thousands of random patches in normal and HOG space.
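
The initialization can be illustrated with a small clustering sketch. The poster only says that random patches are clustered in normal and HOG space; the choice of k-means, the concatenation and standardization of the two feature types, the farthest-point seeding, and the name `init_clusters` are all assumptions made for this example.

```python
import numpy as np

def init_clusters(hog, normals, k, n_iter=20, seed=0):
    """Cluster patches on concatenated [HOG, flattened normals] features.

    hog: (n, d) appearance descriptors; normals: (n, p, 3) per-patch normals.
    Returns an (n,) array of cluster labels in [0, k).
    """
    rng = np.random.default_rng(seed)
    feats = np.hstack([hog, normals.reshape(len(normals), -1)])
    # Standardize each dimension so neither feature space dominates.
    feats = (feats - feats.mean(0)) / (feats.std(0) + 1e-8)
    # Farthest-point seeding: one random patch, then the patch farthest
    # from all centers chosen so far.
    centers = [feats[rng.integers(len(feats))]]
    for _ in range(1, k):
        d2 = np.min([((feats - c) ** 2).sum(1) for c in centers], axis=0)
        centers.append(feats[d2.argmax()])
    centers = np.array(centers)
    for _ in range(n_iter):
        # Assign every patch to its nearest center, then recompute centers.
        d2 = ((feats[:, None, :] - centers[None, :, :]) ** 2).sum(-1)
        labels = d2.argmin(1)
        for c in range(k):
            if np.any(labels == c):
                centers[c] = feats[labels == c].mean(0)
    return labels
```

The all-pairs distance matrix in this sketch would not scale to hundreds of thousands of patches; a batched or approximate clustering routine would be needed at that size.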

Formulation

Search: Scan training set for top detections

Average: Average per-pixel surface normals

SVM: Train a linear SVM to detect instances
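
The three steps can be sketched as one loop for a single primitive. This is an illustrative toy, not the released code: the hinge-loss trainer `train_linear_svm` is a small stand-in for a real linear SVM solver, and the names `refine_primitive` and `top_k`, the fixed number of rounds, and the use of every non-instance patch as a negative are assumptions.

```python
import numpy as np

def train_linear_svm(X, y, lam=0.01, lr=0.1, epochs=200):
    """Tiny hinge-loss linear SVM (batch subgradient descent); y in {-1, +1}."""
    w = np.zeros(X.shape[1])
    b = 0.0
    for _ in range(epochs):
        active = y * (X @ w + b) < 1  # samples violating the margin
        grad_w = lam * w - (y[active][:, None] * X[active]).sum(0) / len(X)
        grad_b = -y[active].sum() / len(X)
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b

def unit(v):
    """Normalize vectors along the last axis."""
    return v / (np.linalg.norm(v, axis=-1, keepdims=True) + 1e-8)

def refine_primitive(feats, normals, members, n_rounds=3, top_k=10):
    """Alternate Average -> SVM -> Search for one primitive.

    feats: (n, d) appearance features; normals: (n, p, 3) per-patch normals;
    members: index array of the current primitive instances (y).
    """
    for _ in range(n_rounds):
        # Average: canonical form N is the mean per-pixel normal of instances.
        N = unit(normals[members].mean(0))
        # SVM: detector (w, b) with instances as positives, the rest negatives.
        labels = -np.ones(len(feats))
        labels[members] = 1.0
        w, b = train_linear_svm(feats, labels)
        # Search: the top-scoring patches become the new instances.
        scores = feats @ w + b
        members = np.argsort(-scores)[:top_k]
    return w, b, N, members
```

On real data the Search step would scan whole training images rather than a fixed pool of precomputed patches.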

Iterative Solution

Lee et al.

Results

Loop diagram: Average updates N, SVM updates w, Search updates y.

Already in use at CMU as a feature!

(NYU v2)
