cs5670: computer vision · what is stereo vision? • generic problem formulation: given several...

Multi-view stereo CS5670: Computer Vision Noah Snavely / Zhengqi Li


Post on 30-Dec-2020


Page 1: CS5670: Computer Vision · What is stereo vision? • Generic problem formulation: given several images of the same object or scene, compute a representation of its 3D shape. •

Multi-view stereo

CS5670: Computer Vision

Noah Snavely / Zhengqi Li

Page 2:

Recommended Reading

Szeliski Chapter 11.6

Multi-View Stereo: A Tutorial

Furukawa and Hernandez, 2015

http://www.cse.wustl.edu/~furukawa/papers/fnt_mvs.pdf

Page 3:

Multi-view Stereo

Stereo Multi-view stereo

What is stereo vision?

Generic problem formulation: given several images of the same object or scene, compute a representation of its 3D shape

Page 4:

Multi-view Stereo

CMU’s 3D Room

Point Grey’s Bumblebee XB3

Point Grey’s ProFusion 25

Page 5:

Multi-view Stereo

Figures by Carlos Hernandez

Input: calibrated images from several viewpoints

Output: 3D object model

Page 6:

What is stereo vision?

• Generic problem formulation: given several images of the same object or scene, compute a representation of its 3D shape.

• “Images of the same object or scene”

• Arbitrary number of images (from two to thousands)

• Arbitrary camera positions (camera network or video sequence)

• Calibration may be initially unknown

• “Representation of 3D shape”

• Depth maps

• Meshes

• Point clouds

• Patch clouds

• Volumetric models

• Layered models

Page 7:

[Figure: timeline of multi-view stereo results]

• 1995: Fua

• 1997: Seitz, Dyer

• 1998: Narayanan, Rander, Kanade

• 1998: Faugeras, Keriven

• 2004: Hernandez, Schmitt

• 2005: Pons, Keriven, Faugeras

• 2006: Furukawa, Ponce

• 2007: Goesele et al.

• 2010: Furukawa et al.

Page 8:

https://www.youtube.com/watch?v=ofHFOr2nRxU

https://www.youtube.com/watch?v=NdeD4cjLI0c&t=64s

Page 9:

Applications

Page 16:

https://code.facebook.com/posts/1755691291326688/introducing-facebook-surround-360-an-open-high-quality-3d-360-video-capture-system?hc_location=ufi

Page 17:

Multi-view stereo: Basic idea

Source: Y. Furukawa

Page 23:

There are many variations on this problem.

Page 24:

Why MVS?

• Different points on the object’s surface will be more clearly visible in some subset of cameras

– Could have high-res close-ups of some regions

– Some surfaces are foreshortened from certain views

Page 25:

[Figure: cameras 1, 2, and 3 viewing surface point p]

Cameras 2 and 3 can more clearly see point p.

Page 26:

[Figure: cameras 1, 2, and 3 viewing surface point q]

Cameras 1 and 2 can more clearly see point q.

Page 27:

Why MVS?

• Different points on the object’s surface will be more clearly visible in some subset of cameras

– Could have high res close-ups of some regions

– Some surfaces are foreshortened from certain views

• Some points may be occluded entirely in certain views

Page 28:

[Figure: cameras 1 through 5 viewing surface point r]

Camera 5 can’t see point r.

Page 29:

[Figure: cameras 1 through 5 viewing surface point s]

Camera 1 can’t see point s.

Page 30:

Why MVS?

• Different points on the object’s surface will be more clearly visible in some subset of cameras

– Could have high res closeups of some regions

– Some surfaces are foreshortened from certain views

• Some points may be occluded entirely in certain views

• More measurements per point can reduce error

Pages 31-35:

[Figure: points estimated from cameras 1 and 2]

Estimated points contain some error.

Page 36:

[Figure: points estimated from cameras 1, 2, 3, and 4]

Additional views reduce error.
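The claim that more measurements reduce error can be illustrated with a deliberately simplified simulation. It assumes each view contributes an independent depth measurement with zero-mean Gaussian noise, which is an assumption for illustration and not a model of real triangulation. The names `simulate_rms_error`, `true_depth`, and `noise_sigma` are hypothetical.

```python
import math
import random


def simulate_rms_error(num_views, true_depth=10.0, noise_sigma=0.5,
                       trials=5000, seed=0):
    """RMS error of the mean of `num_views` noisy depth measurements.

    Assumption: each view yields an independent measurement of the same
    depth, corrupted by Gaussian noise with standard deviation noise_sigma.
    """
    rng = random.Random(seed)
    squared_error = 0.0
    for _ in range(trials):
        measurements = [rng.gauss(true_depth, noise_sigma)
                        for _ in range(num_views)]
        estimate = sum(measurements) / num_views
        squared_error += (estimate - true_depth) ** 2
    return math.sqrt(squared_error / trials)


if __name__ == "__main__":
    for n in (2, 4, 8):
        print(n, "views: RMS error", round(simulate_rms_error(n), 3))
```

Under this noise model the error of the averaged estimate shrinks roughly as one over the square root of the number of views. Real systems also gain from better viewing geometry, which this sketch ignores.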

Page 37:

Depth map reconstruction

• Multiple-baseline stereo

– Rectification of several cameras onto a common plane

– Problems with wide baselines and distortions after rectification

• Plane sweep stereo

– Choose a reference view

– Sweep a family of planes at different depths with respect to the reference camera

http://www.uio.no/studier/emner/matnat/its/UNIK4690/v16/forelesninger/lecture_8_3_multiple_view_stereo.pdf
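For reference, the warp applied for each swept plane can be written as a plane-induced homography. The notation here is an assumption and is not taken from the slides: $K_{\mathrm{ref}}$ and $K_i$ are camera intrinsics, a point $X$ in reference-camera coordinates maps to $X_i = R_i X + t_i$ in camera $i$, and the swept plane is $n^\top X = d$ in reference coordinates.

```latex
% Points on the plane satisfy n^T X / d = 1, so
% X_i = R_i X + t_i (n^T X / d) = (R_i + t_i n^T / d) X.
H_i(d) = K_i \left( R_i + \frac{t_i \, n^\top}{d} \right) K_{\mathrm{ref}}^{-1},
\qquad
\tilde{x}_i \sim H_i(d) \, \tilde{x}_{\mathrm{ref}}
```

For fronto-parallel planes, $n = (0, 0, 1)^\top$ and $d$ is the depth from the reference camera. Other sign conventions for the plane and the pose appear in the literature, so the formula should be checked against whichever convention a given codebase uses.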

Page 38:

Choosing the stereo baseline

What’s the optimal baseline?

– Too small: large depth error

– Too large: difficult search problem

[Figure: large baseline vs. small baseline. Annotations: “width of a pixel”; “all of these points project to the same pair of pixels”]

Page 39:

The Effect of Baseline on Depth Estimation

• Pick a reference image, and slide the corresponding window along the corresponding epipolar lines of all other images, using inverse depth relative to the first image as the search parameter

Page 40:

Multiple-baseline stereo

[Figure: pixel matching score versus z; annotation: “width of a pixel”]

M. Okutomi and T. Kanade, “A Multiple-Baseline Stereo System,” IEEE Trans. on Pattern Analysis and Machine Intelligence, 15(4):353-363 (1993).

• For larger baselines, must search a larger area in the second image

• For short baselines, estimated depth will be less precise due to narrow triangulation
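The trade-off in these two bullets can be made concrete for the rectified two-view case, where depth is z = f·b / d for focal length f (in pixels), baseline b, and disparity d. The sketch below is an illustration under that assumption; the function names and the example numbers are hypothetical.

```python
def depth_from_disparity(focal_px, baseline, disparity_px):
    """Depth of a point in a rectified pair: z = f * b / d."""
    return focal_px * baseline / disparity_px


def depth_error(focal_px, baseline, depth, disparity_error_px=1.0):
    """First-order depth uncertainty: |dz| ~ z^2 / (f * b) * |dd|.

    A shorter baseline gives a larger depth error for the same
    disparity (matching) error.
    """
    return depth ** 2 / (focal_px * baseline) * disparity_error_px


def disparity_search_range(focal_px, baseline, z_near, z_far):
    """Number of pixels to search to cover depths in [z_near, z_far].

    A longer baseline gives a larger search range.
    """
    return focal_px * baseline * (1.0 / z_near - 1.0 / z_far)


if __name__ == "__main__":
    for b in (0.1, 0.5):
        print("baseline", b,
              "depth error at z=5:", depth_error(1000.0, b, 5.0),
              "search range:", disparity_search_range(1000.0, b, 1.0, 10.0))
```

Both quantities scale with the baseline in opposite directions, which is the motivation for combining several baselines rather than choosing a single one.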

Page 41:

Multiple-baseline stereo

Page 42:

I1 I2 I10

Multiple-baseline stereo results

M. Okutomi and T. Kanade, “A Multiple-Baseline Stereo System,” IEEE Trans. on Pattern Analysis and Machine Intelligence, 15(4):353-363 (1993).

Page 43:

Multibaseline Stereo

Basic Approach

– Choose a reference view

– Use your favorite stereo algorithm, BUT

• replace two-view SSD with SSSD (sum of SSDs) over all baselines

• SSSD: the SSD values are first computed for each stereo pair, and then summed over all stereo pairs.
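A minimal sketch of the SSSD idea on 1-D scanlines follows. It assumes rectified views in which the disparity of a point in view k is its baseline times a shared parameter proportional to inverse depth, and it uses integer shifts only. The names `sssd_search`, `shift_scanline`, and `demo` are hypothetical, and the data is synthetic.

```python
import math
import random


def ssd(window_a, window_b):
    """Sum of squared differences between two equal-length windows."""
    return sum((a - b) ** 2 for a, b in zip(window_a, window_b))


def shift_scanline(scanline, shift):
    """Synthetic view: pixel x of the reference appears at x - shift."""
    n = len(scanline)
    return [scanline[j + shift] if j + shift < n else 0 for j in range(n)]


def sssd_search(ref, others, baselines, x, half_window, candidates):
    """Return the candidate (disparity per unit baseline) minimising SSSD."""
    window = ref[x - half_window:x + half_window + 1]
    best_candidate, best_cost = None, math.inf
    for c in candidates:
        total = 0
        in_bounds = True
        for image, baseline in zip(others, baselines):
            center = x - baseline * c  # disparity scales with the baseline
            lo, hi = center - half_window, center + half_window + 1
            if lo < 0 or hi > len(image):
                in_bounds = False
                break
            total += ssd(window, image[lo:hi])  # add this pair's SSD
        if in_bounds and total < best_cost:
            best_candidate, best_cost = c, total
    return best_candidate


def demo():
    rng = random.Random(1)
    ref = [rng.randint(0, 255) for _ in range(60)]
    baselines = [1, 2, 3]
    true_c = 2  # ground-truth disparity per unit baseline
    others = [shift_scanline(ref, b * true_c) for b in baselines]
    return sssd_search(ref, others, baselines, x=30, half_window=3,
                       candidates=range(5))


if __name__ == "__main__":
    print("recovered disparity per unit baseline:", demo())
```

Because every stereo pair votes on the same inverse-depth parameter, the summed cost tends to have a sharper and less ambiguous minimum than any single pair's SSD.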

Limitations

– Only gives a depth map (not an “object model”)

– Won’t work for widely distributed views.

Page 44:

Problem: visibility

Some Solutions

• Match only nearby photos [Narayanan 98]

• Use NCC instead of SSD, and ignore NCC values > threshold [Hernandez & Schmitt 03]

Page 45:

Photo-consistency measures (matching score)

http://carlos-hernandez.org/papers/fnt_mvs_2015.pdf

Page 46:

Popular matching scores

• SSD (Sum of Squared Differences)

$$\mathrm{SSD}(f, g) = \sum_{i} (f_i - g_i)^2$$

• SAD (Sum of Absolute Differences)

$$\mathrm{SAD}(f, g) = \sum_{i} |f_i - g_i|$$

• ZNCC (Zero-mean Normalized Cross-Correlation)

$$\mathrm{ZNCC}(f, g) = \frac{\sum_{i} (f_i - \bar{f})(g_i - \bar{g})}{\sqrt{\sum_{i} (f_i - \bar{f})^2 \, \sum_{i} (g_i - \bar{g})^2}}$$

– where $f$ and $g$ are the pixel values in the two windows being compared, and $\bar{f}$, $\bar{g}$ are their means

– what advantages might NCC have?
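One way to explore the closing question is to compute the three scores on a window and on a copy of it with a brightness gain and offset applied. The sketch below uses the standard textbook definitions on flat lists of pixel values. Returning 0.0 for a textureless window is a convention chosen here, not something stated in the slides.

```python
import math


def ssd(f, g):
    """Sum of squared differences (0 means identical windows)."""
    return sum((a - b) ** 2 for a, b in zip(f, g))


def sad(f, g):
    """Sum of absolute differences (0 means identical windows)."""
    return sum(abs(a - b) for a, b in zip(f, g))


def zncc(f, g):
    """Zero-mean normalized cross-correlation, in [-1, 1] (1 is best)."""
    n = len(f)
    mean_f = sum(f) / n
    mean_g = sum(g) / n
    num = sum((a - mean_f) * (b - mean_g) for a, b in zip(f, g))
    den = math.sqrt(sum((a - mean_f) ** 2 for a in f)
                    * sum((b - mean_g) ** 2 for b in g))
    if den == 0.0:
        return 0.0  # textureless window: score undefined, 0.0 by convention
    return num / den


if __name__ == "__main__":
    patch = [10, 20, 30, 40, 55]
    brighter = [2 * v + 7 for v in patch]  # same texture, gain and offset
    print("SSD :", ssd(patch, brighter))
    print("SAD :", sad(patch, brighter))
    print("ZNCC:", zncc(patch, brighter))
```

SSD and SAD penalise the exposure change heavily, while ZNCC is unaffected by a positive gain and an offset, which matters when photos are taken with different cameras or lighting.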

Page 47:

Summary

http://carlos-hernandez.org/papers/fnt_mvs_2015.pdf

Page 48:

Plane sweep stereo

D. Gallup, J.-M. Frahm, P. Mordohai, Q. Yang and M. Pollefeys, Real-Time Plane-Sweeping Stereo with Multiple Sweeping Directions, CVPR 2007

Page 49:

Plane sweep stereo

Page 50:

Plane sweep through oriented planes

Page 51:

Plane Sweep: Enhanced Robustness through oriented planes

https://demuc.de/tutorials/cvpr2017/dense-modeling.pdf

Page 52:

Streetside reconstructions by plane sweeping stereo

http://carlos-hernandez.org/papers/fnt_mvs_2015.pdf

Page 53:

Single depth map often isn’t enough

Page 54:

Really want full coverage

Page 55:

Idea: Combine many depth maps

Many depth maps, each with error. How can we fuse these?

Page 56:

Merging depth maps

• Given a group of images, choose each one as reference and compute a depth map w.r.t. that view using a multi-baseline approach

• Merge multiple depth maps to a volume or a mesh (see, e.g., Curless and Levoy 96)

[Figure: Map 1, Map 2, Merged]

Page 57:

Volumetric fusion

A common world-space coordinate system.
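As a rough illustration of volumetric fusion in the spirit of Curless and Levoy, the sketch below averages truncated signed distances from several depth measurements along a single ray and then locates the zero crossing. It is a 1-D simplification with assumed conventions (signed distance is positive in front of the surface, and voxels far behind a measurement are treated as unobserved). The names `fuse_tsdf_along_ray` and `zero_crossing` are hypothetical.

```python
def fuse_tsdf_along_ray(voxel_depths, measured_depths, truncation,
                        weights=None):
    """Weighted average of truncated signed distances at each voxel."""
    if weights is None:
        weights = [1.0] * len(measured_depths)
    sum_wd = [0.0] * len(voxel_depths)
    sum_w = [0.0] * len(voxel_depths)
    for surface_depth, w in zip(measured_depths, weights):
        for k, x in enumerate(voxel_depths):
            sd = surface_depth - x  # positive in front of the surface
            if sd < -truncation:
                continue  # far behind the observed surface: unobserved
            sum_wd[k] += w * min(sd, truncation)
            sum_w[k] += w
    return [wd / w if w > 0 else None for wd, w in zip(sum_wd, sum_w)]


def zero_crossing(voxel_depths, tsdf):
    """Depth where the fused distance first changes from + to -."""
    for k in range(len(tsdf) - 1):
        a, b = tsdf[k], tsdf[k + 1]
        if a is None or b is None:
            continue
        if a > 0 >= b:
            t = a / (a - b)  # linear interpolation between voxels
            return voxel_depths[k] + t * (voxel_depths[k + 1] - voxel_depths[k])
    return None


if __name__ == "__main__":
    voxels = [0.1 * k for k in range(31)]
    noisy_depths = [1.9, 2.0, 2.1]  # three depth maps, same ray
    tsdf = fuse_tsdf_along_ray(voxels, noisy_depths, truncation=0.3)
    print("fused surface depth:", zero_crossing(voxels, tsdf))
```

In a full system the same accumulation runs over a 3-D voxel grid in the common world coordinate system, and the surface is extracted as an isosurface of the fused field.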

Page 59:

Volumetric stereo

• In plane sweep stereo, the sampling of the scene depends on the reference view

• We can use a voxel volume to get a view-independent representation

[Figure: input images (calibrated) and discretized scene volume V]

Goal: assign RGB values to voxels in V that are photo-consistent with the images
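A simple stand-in for a photo-consistency test is to compare the colors that a voxel projects to in the images that see it, and accept the voxel when they agree. The sketch below uses color variance with a threshold, which is one common choice and not necessarily the measure used in the lecture. It assumes visibility is already known, which in practice is the hard part. The name `voxel_color` and the threshold value are hypothetical.

```python
def color_variance(colors):
    """Mean RGB and total variance of a list of (r, g, b) samples."""
    n = len(colors)
    mean = [sum(c[i] for c in colors) / n for i in range(3)]
    variance = sum(sum((c[i] - mean[i]) ** 2 for i in range(3))
                   for c in colors) / n
    return mean, variance


def voxel_color(colors, threshold):
    """Return the mean RGB if the samples are photo-consistent, else None.

    `colors` holds the pixel colors the voxel projects to in the images
    where it is assumed to be visible.
    """
    if len(colors) < 2:
        return None  # assumption: one view is not enough evidence
    mean, variance = color_variance(colors)
    if variance > threshold:
        return None
    return tuple(mean)


if __name__ == "__main__":
    on_surface = [(200, 10, 10), (198, 12, 9), (202, 11, 11)]
    off_surface = [(200, 10, 10), (10, 200, 10)]
    print(voxel_color(on_surface, threshold=25.0))
    print(voxel_color(off_surface, threshold=25.0))
```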

Page 62:

Iso surface

Page 63:

Newcombe et al., ISMAR 2011.

Page 64:

https://www.youtube.com/watch?v=quGhaggn3cQ

Page 65:

Questions?

Page 67:

Multi-view stereo from Internet Collections

[Goesele, Snavely, Curless, Hoppe, Seitz, ICCV 2007]

Page 68:

Stereo from community photo collections

• Need structure from motion to recover unknown camera parameters

• Need view selection to find good groups of images on which to run dense stereo

Page 69:

Challenges

• appearance variation

• resolution

• massive collections

Page 70:

206 Flickr images taken by 92 photographers

Law of Large Image Collections

Page 71:

Local view selection

• Automatically select neighboring views for each point in the image

• Desiderata: good matches AND good baselines

[Figure: reference view and its 4 best neighboring views]
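The two desiderata can be combined in a single score per candidate view. The sketch below counts the 3D feature points a candidate shares with the reference (good matches) and down-weights points seen under a small triangulation angle (poor baseline). It is loosely modelled on the angle weighting described by Goesele et al., but the exact form, the 10-degree default, and all names here should be treated as assumptions for illustration.

```python
import math


def triangulation_angle_deg(point, center_a, center_b):
    """Angle at `point` between the rays to two camera centers."""
    ray_a = [c - p for c, p in zip(center_a, point)]
    ray_b = [c - p for c, p in zip(center_b, point)]
    dot = sum(a * b for a, b in zip(ray_a, ray_b))
    norm_a = math.sqrt(sum(a * a for a in ray_a))
    norm_b = math.sqrt(sum(b * b for b in ray_b))
    cosine = max(-1.0, min(1.0, dot / (norm_a * norm_b)))
    return math.degrees(math.acos(cosine))


def view_score(shared_points, ref_center, cand_center, alpha_max_deg=10.0):
    """Sum over shared points of a weight that penalises small angles."""
    score = 0.0
    for point in shared_points:
        alpha = triangulation_angle_deg(point, ref_center, cand_center)
        score += min((alpha / alpha_max_deg) ** 2, 1.0)
    return score


def select_neighbors(ref_center, candidates, k=4):
    """candidates: dict of name -> (camera_center, shared_points)."""
    ranked = sorted(
        candidates,
        key=lambda name: view_score(candidates[name][1], ref_center,
                                    candidates[name][0]),
        reverse=True)
    return ranked[:k]


if __name__ == "__main__":
    points = [(0.0, 0.0, 10.0), (1.0, 0.0, 10.0), (-1.0, 0.0, 10.0)]
    candidates = {
        "near": ((0.1, 0.0, 0.0), points),      # many matches, tiny baseline
        "good": ((2.0, 0.0, 0.0), points),      # many matches, good baseline
        "few": ((2.0, 0.0, 0.0), points[:1]),   # good baseline, one match
    }
    print(select_neighbors((0.0, 0.0, 0.0), candidates, k=2))
```

A per-pixel (local) selection would additionally test photometric agreement between the reference and each candidate around the pixel in question, which this global sketch leaves out.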


Page 74:

Results

• St. Peter: 151 images, 50 photographers

• Trevi Fountain: 106 images, 51 photographers

• Mt. Rushmore: 160 images, 60 photographers

Page 75:

Notre Dame de Paris

653 images

313 photographers

Page 78:

129 Flickr images taken by 98 photographers

Page 79:

merged model of Venus de Milo

Page 80:

56 Flickr images taken by 8 photographers

Page 81:

merged model of Pisa Cathedral

Page 82:

Accuracy compared to laser scanned model:

90% of points within 0.25% of ground truth