Recovering Surface Layout from a Single Image
D. Hoiem, A.A. Efros, M. Hebert
Robotics Institute, CMU
Presenter: Derek Hoiem
CS 598, Spring 2009
Jan 29, 2009
Why worry about 3d scenes?
Reason 1: We may want to interact with the scene
Navigation Manipulation
Reason 2: We need context
2D Object Detection
What the 2D Detector Sees
Computers need context too
True Detections
Missed
False Detections
Local Detector: [Dalal-Triggs 2005]
Context in Image Space
[Kumar Hebert 2005][Torralba Murphy Freeman 2004]
[He Zemel Carreira-Perpiñán 2004]
We need 3d info to reason about 3d relationships
Close
Not Close
How to represent scene space?
Holistic Scene Space: “Gist”
Oliva & Torralba 2001
Torralba & Oliva 2002
How to represent scene space?
Depth Map
Saxena, Chung & Ng 2005, 2007
Gibson’s Surface Layout
slide from Aude Oliva
• Gibson: “The elementary impressions of a visual world are those of surface and edge.” The Perception of the Visual World (1950)
• Focus on texture gradients
Surface Layout (Gibson cont.)
slide from Aude Oliva
Marr’s 2½-D Sketch
Figs from Aude Oliva slide
Surface Layout (this paper)
Goal: Label image into 7 Geometric Classes:
• Support
• Vertical
– Planar: facing Left, Center, Right
– Non-planar: Solid (X), Porous or wiry (O)
• Sky
Our Main Challenge
• Recovering 3D geometry from single 2D projection
• Infinite number of possible solutions!
…
Our World is Structured
Abstract World Our World
Image Credit (left): F. Cunin and M.J. Sailor, UCSD
Most Early Work Tried to Manually Specify the Structure
• Hansen & Riseman 1978 (VISIONS)• Barrow & Tenenbaum 1978 (Intrinsic Images)• Brooks 1979 (ACRONYM)• Marr 1982 (2½ D Sketch)
Ohta & Kanade 1978Guzman 1968
Learn the Structure of the World
…
Infer Most Likely Scene
Unlikely Likely
1. Use All Available Cues
Vanishing points, lines
Color, texture, image location
Texture gradient
Use All Available Cues
2. Get Good Spatial Support
50x50 Patch
Image Segmentation
• Single segmentation won’t work
• Solution: multiple segmentations
…
…
…
For each segment:
- Get P(good segment | data) P(label | good segment, data)
Labeling Segments
Image Labeling
…
Labeled Segmentations
Labeled Pixels
$$P(\text{label} \mid \text{data}) = \sum_{\text{segments}} P(\text{good segment} \mid \text{data}) \, P(\text{label} \mid \text{good segment}, \text{data})$$
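The labeling rule above sums, for each pixel, the label likelihoods of every segment that contains it, weighted by how likely each segment is to be a good one. A minimal Python sketch of that sum follows. The function name, the dictionary layout, and the final normalization step are assumptions made for illustration, not the authors' implementation.

```python
from collections import defaultdict

def pixel_label_confidences(segments):
    """Combine segment-level estimates into per-pixel label confidences.

    segments: list of dicts (hypothetical layout) with keys
      'pixels'  - iterable of pixel ids covered by the segment
      'p_good'  - P(good segment | data)
      'p_label' - dict mapping label -> P(label | good segment, data)
    """
    scores = defaultdict(lambda: defaultdict(float))
    for seg in segments:
        for px in seg["pixels"]:
            for label, p in seg["p_label"].items():
                # One term of the sum over segments containing this pixel.
                scores[px][label] += seg["p_good"] * p
    result = {}
    for px, dist in scores.items():
        total = sum(dist.values())
        # Normalizing so confidences sum to 1 is an assumption of this sketch.
        result[px] = {lab: v / total for lab, v in dist.items()} if total > 0 else dict(dist)
    return result

# Toy example: pixel 0 lies in one segment from each of two segmentations.
example_segments = [
    {"pixels": [0, 1], "p_good": 0.9, "p_label": {"support": 0.8, "vertical": 0.2}},
    {"pixels": [0],    "p_good": 0.3, "p_label": {"support": 0.4, "vertical": 0.6}},
]
confidences = pixel_label_confidences(example_segments)
# confidences[0]["support"] is approximately 0.7: (0.72 + 0.12) / 1.2
```

The unreliable second segment (low P(good segment | data)) still contributes, but it cannot outvote the confident first one, which is the point of using multiple segmentations.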
…
Example decision tree 1 (Yes/No branches): Gray? High in Image? Many Long Lines?
Example decision tree 2 (Yes/No branches): Very High Vanishing Point? High in Image? Smooth? Green? Blue?
Decision Trees + AdaBoost
Ground Vertical Sky
Collins et al. 2002
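To make the boosted-tree idea concrete, here is a minimal sketch of discrete AdaBoost over one-split decision trees (stumps) on a toy "sky vs. not sky" problem. It is a simplification under stated assumptions: the slides use deeper trees and cite Collins et al. 2002, whereas this sketch uses plain AdaBoost with stumps. The two features (blueness, height in image) and all values are invented for illustration.

```python
import math

def stump_predict(stump, row):
    """A stump asks one yes/no question: is feature f above the threshold?"""
    _, f, thr, pol = stump
    return pol if row[f] > thr else -pol

def best_stump(X, y, w):
    """Return (weighted error, feature, threshold, polarity) with the lowest error."""
    best = None
    for f in range(len(X[0])):
        for thr in sorted({row[f] for row in X}):
            for pol in (1, -1):
                preds = [pol if row[f] > thr else -pol for row in X]
                err = sum(wi for wi, p, yi in zip(w, preds, y) if p != yi)
                if best is None or err < best[0]:
                    best = (err, f, thr, pol)
    return best

def adaboost(X, y, rounds=5):
    """Train a weighted vote of stumps; labels must be +1 or -1."""
    n = len(X)
    w = [1.0 / n] * n
    ensemble = []
    for _ in range(rounds):
        stump = best_stump(X, y, w)
        err = min(max(stump[0], 1e-10), 1 - 1e-10)  # keep the log finite
        alpha = 0.5 * math.log((1 - err) / err)
        ensemble.append((alpha, stump))
        # Raise the weight of misclassified examples, lower the rest.
        w = [wi * math.exp(-alpha * yi * stump_predict(stump, row))
             for wi, yi, row in zip(w, y, X)]
        total = sum(w)
        w = [wi / total for wi in w]
    return ensemble

def predict(ensemble, row):
    score = sum(alpha * stump_predict(s, row) for alpha, s in ensemble)
    return 1 if score >= 0 else -1

# Hypothetical features per segment: [blueness, height in image], both in [0, 1].
X = [[0.9, 0.9], [0.8, 0.8], [0.7, 0.95], [0.2, 0.1], [0.3, 0.2], [0.85, 0.15]]
y = [1, 1, 1, -1, -1, -1]  # +1 = sky, -1 = not sky (last row: blue but low in image)
ensemble = adaboost(X, y)
```

On this toy data the "High in Image?" question separates the classes, while "Blue?" alone is fooled by the blue region near the bottom of the image.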
Surface Confidence Maps
P(Support) P(Vertical) P(Sky)
P(Planar Left) P(Planar Center) P(Planar Right)
P(Non-Planar Porous) P(Non-Planar Solid)
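One simple way to turn confidence maps like these into a labeling is to take the most confident main class per pixel and, for vertical pixels only, the most confident subclass. The sketch below assumes that rule and a hypothetical dictionary-per-pixel layout. The class names mirror the maps listed above.

```python
MAIN_CLASSES = ("support", "vertical", "sky")
SUBCLASSES = ("planar_left", "planar_center", "planar_right",
              "nonplanar_porous", "nonplanar_solid")

def label_pixel(p_main, p_sub):
    """Return (main class, subclass or None) for one pixel.

    p_main: dict main class -> confidence
    p_sub:  dict vertical subclass -> confidence
    """
    main = max(MAIN_CLASSES, key=lambda c: p_main[c])
    if main != "vertical":
        return main, None  # subclasses only apply to vertical surfaces
    return main, max(SUBCLASSES, key=lambda c: p_sub[c])

# Toy confidences for a pixel on a building facade (values are invented).
facade = label_pixel(
    {"support": 0.1, "vertical": 0.8, "sky": 0.1},
    {"planar_left": 0.6, "planar_center": 0.2, "planar_right": 0.1,
     "nonplanar_porous": 0.05, "nonplanar_solid": 0.05},
)
```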
Test Image
Experiments: Input Image
Experiments: Ground Truth
Experiments: Our Result
Surface Estimates: Outdoor
Input Image Ground Truth Our Result
Avg. Accuracy
Main Class: 88%
Subclass: 62%
Surface Estimates: Paintings
Input Image Our Result
Surface Estimates: Indoor
Avg. Accuracy
Main Class: 93%
Subclass: 76%
Input Image Ground Truth Our Result
Failures: Reflections and Shadows
Input Image Our Result
Average Accuracy
Main Class: 88%
Subclasses: 61%
Importance of Many Cues
           All   Position Only   Color Only   Texture Only   Perspective Only
Main       88%   83%             72%          80%            68%
Subclass   61%   43%             43%          55%            52%

Importance of Many Cues
           All   All But Position   All But Color   All But Texture   All But Perspective
Main       88%   84%                87%             87%               88%
Subclass   61%   60%                60%             58%               57%
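The two cue tables above are easier to compare as differences from the all-cues accuracy. This short sketch copies the slide's figures and computes how many percentage points are lost when each cue is removed. The variable and function names are illustrative.

```python
CUES = ("position", "color", "texture", "perspective")
ALL_CUES = {"main": 88, "subclass": 61}
# Accuracy (%) with each cue removed, in the order of CUES.
WITHOUT_CUE = {"main": (84, 87, 87, 88), "subclass": (60, 60, 58, 57)}

def drop_when_removed(task):
    """Percentage points lost relative to using all cues."""
    return {cue: ALL_CUES[task] - acc for cue, acc in zip(CUES, WITHOUT_CUE[task])}

main_drop = drop_when_removed("main")
subclass_drop = drop_when_removed("subclass")
# main_drop     -> {'position': 4, 'color': 1, 'texture': 1, 'perspective': 0}
# subclass_drop -> {'position': 1, 'color': 1, 'texture': 3, 'perspective': 4}
```

Read this way, position matters most for the main classes, while perspective and texture matter most for the vertical subclasses.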
Spatial Support Matters
Automatic Photo Popup
Labeled Image
Fit Ground-Vertical Boundary with Line Segments
Form Segments into Polylines
Cut and Fold
Final Pop-up Model
[Hoiem Efros Hebert 2005]
video
Surfaces Not Enough – Need Occlusion Reasoning
Image Surface Labels 3D Model
Surfaces + Occlusions + Objects = Better 3D Models
Surfaces Occlusions
Objects and Viewpoint
Diagram labels: Support; Surface Maps; Depth, Boundaries; Boundaries; Horizon, Object Maps
Viewpoint/Size Reasoning
video 2
Contributions
• General principles
– Learn the structure of the world
– Use all available cues
– Spatial support matters
– Use redundancy to deal with unreliable processes (segmentation)
• Results include entire spread of failure and success
• First work to convincingly demonstrate single-view reconstruction
Criticisms
• Still just 2D pattern recognition?
• Not clear how to generalize to arbitrary 3d angles
• Restricted to visible portion of scene
• Coarse layout: not clear if applicable to personal space or object shapes
Ideas for improvement
• Try improving features (e.g., add bag of words)
• Extend to characterize object shapes?
• Combine this surface-based layout with depth estimates from Saxena et al.
Discussion
• Use for context (Eamon)
• Multiple segmentations (Duan, Sanketh)
• Subcategories (Duan, Sanketh)
• Global info, use of object knowledge (Binbin)
• Combination with multiview cues (Mani)
• Landmarks (Gang)
Thank you
Things to cover when you present
• Background
• Overview of method
• Results
• Things you like
• Things you don’t
• Ideas for improvement
• Address bulletin board postings