introduction 3d reconstruction c classical …ocho.uwaterloo.ca/research/posters/icip00_tj.pdf ·...



❖INTRODUCTION ❖

GOAL: To recover the structure of a rigid object using a sequence of stereo images, for aerospace applications such as autonomous precision landing, satellite servicing and retrieving payloads.

BASIC ASSUMPTIONS:

1. Localised point features such as corners are readily available.

2. Structure is represented by a collection of 3D points.

3. There is a single, unknown rigid motion between the cameras and the object.

RESULT: An integrated framework for reconstructing an incrementally accurate and dense representation of a rigid object.

❖3D RECONSTRUCTION ❖

PROBLEM FORMULATION:

$f$: frame number in image sequence
$c$: camera viewpoint
$P$: perspective projection function

The structure is a set of geometric or textural features, represented as 3D points $\{\mathbf{X}_j(f)\}$. Their 2D projections onto image $I^c(f)$ are

$$\mathbf{x}^c_j(f) = P[c, \mathbf{X}_j(f)]$$

OBJECTIVE: Find $\{\mathbf{X}_j(f)\}$ given a set of images $\{I^c(f)\}$ varying in $f$ and/or $c$.

CHALLENGES:

1. Feature correspondence: how to locate the projections of a physical 3D point on two different images?

♦ search problem

♦ ill-posed, often ambiguous

2. Structure estimation: how to recover the depth information from feature correspondences, and how accurate are the estimates?

♦ want to reduce as much as possible the sensitivity to noise and outliers in correspondences

3. Implementation: how to lower computational complexity and data storage requirements?

❖CLASSICAL APPROACHES ❖

STEREO

Set-up:

♦ spatially varying images

♦ large baseline (a)

♦ usually known stereo geometry

Method:

♦ area-based or feature-based stereo matching

♦ reconstruction by triangulation

Properties:

♦ large baseline: accurate depth estimates, but correspondence difficult due to geometric distortion, occlusion, changes in specular reflection, etc.

(a) baseline = separation between two images in terms of relative distance between camera positions
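The link between baseline and depth accuracy can be illustrated with the standard triangulation relation for a rectified stereo pair, depth = focal length × baseline / disparity. The sketch below is only an illustration: the function names, the focal length in pixels, the two baselines and the one-pixel disparity error are assumed values, not figures from the poster.

```python
def depth_from_disparity(focal_px, baseline_mm, disparity_px):
    """Depth (mm) of a point seen by a rectified stereo pair."""
    if disparity_px <= 0:
        raise ValueError("disparity must be positive for a point in front of the cameras")
    return focal_px * baseline_mm / disparity_px


def depth_error_for_pixel_error(focal_px, baseline_mm, true_depth_mm, pixel_error=1.0):
    """Absolute depth error (mm) caused by a disparity error of `pixel_error` pixels."""
    true_disparity = focal_px * baseline_mm / true_depth_mm
    noisy_depth = depth_from_disparity(focal_px, baseline_mm, true_disparity - pixel_error)
    return abs(noisy_depth - true_depth_mm)


# Assumed values, for illustration only.
FOCAL_PX = 800.0
TRUE_DEPTH_MM = 2500.0

small_baseline_error = depth_error_for_pixel_error(FOCAL_PX, 20.0, TRUE_DEPTH_MM)
large_baseline_error = depth_error_for_pixel_error(FOCAL_PX, 200.0, TRUE_DEPTH_MM)
```

Under these assumed numbers the same one-pixel error costs far more depth accuracy with the 20 mm baseline than with the 200 mm one, which is the sensitivity the stereo and motion set-ups trade against ease of correspondence.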

MOTION

Set-up:

♦ temporally varying images

♦ small baseline

♦ known or unknown motion

Method:

♦ correspondence by optical flow or feature tracking

♦ motion estimation complementary to shape recovery

♦ recursive or batch processing of long sequences

Properties:

♦ small baseline: easy correspondence, but depth estimates sensitive to error in 2D feature positions

COMBINED STEREO AND MOTION

Set-up:

♦ two consecutive pairs of stereo images or a long stereo sequence

♦ known stereo geometry

♦ known or unknown motion

Method: Adaptations/extensions of existing stereo and/or motion techniques, e.g.,

♦ refine depth estimates for known initial structure

♦ known motion to constrain stereo matching

♦ extend optical flow to stereo pairs

Properties:

♦ stereo and motion complement each other to overcome individual weaknesses

♦ lack of unified framework to address all of feature correspondence, motion and structure estimation

❖PROPOSED APPROACH ❖

BASIC IDEA:

♦ Feature matching, 3D reconstruction, feature tracking and motion estimation bootstrap each other;

♦ Initially unambiguous stereo correspondences provide 3D points for unique determination of motion estimates;

♦ Ambiguities do not need to be resolved immediately at each frame. Matching candidates are treated as hypotheses to be tested in future frames;

♦ Motion estimates give additional constraints for feature tracking and stereo matching, which may resolve previous matching ambiguities and generate more 3D points for more accurate motion estimation
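The idea of deferring ambiguous matches can be sketched as follows: each candidate stereo match is kept as a hypothesis, and a hypothesis survives to the next frame only if observed features fall near its predicted locations in both images. This is a minimal sketch rather than the poster's implementation; the `Hypothesis` class, the `gate_px` radius and the sample coordinates are all assumptions made for illustration.

```python
from dataclasses import dataclass
import math


@dataclass
class Hypothesis:
    """Candidate stereo match with predicted 2D locations for the next frame."""
    left_id: int
    right_id: int
    predicted_left: tuple   # (x, y) predicted in the left image
    predicted_right: tuple  # (x, y) predicted in the right image


def has_feature_within(point, features, gate_px):
    """True if any observed feature lies within gate_px of the predicted point."""
    return any(math.dist(point, feat) <= gate_px for feat in features)


def prune_hypotheses(hypotheses, left_features, right_features, gate_px=3.0):
    """Keep only hypotheses supported by observations in both images."""
    return [
        h for h in hypotheses
        if has_feature_within(h.predicted_left, left_features, gate_px)
        and has_feature_within(h.predicted_right, right_features, gate_px)
    ]


# Left feature 0 has two candidate matches in the right image (ids 0 and 1).
hypotheses = [
    Hypothesis(0, 0, (101.0, 50.0), (81.0, 50.0)),
    Hypothesis(0, 1, (101.0, 50.0), (60.0, 50.0)),
]
left_next = [(100.5, 50.2)]
right_next = [(80.6, 49.9)]
survivors = prune_hypotheses(hypotheses, left_next, right_next)
```

In this toy case the ambiguity on left feature 0 is resolved one frame later, because only one of its two candidate matches is supported by the new right-image observation.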

NOTATION:

$\mathbf{z}^L_i(f)$: image feature $i$ extracted from $I^L(f)$
$\mathbf{z}^R_j(f)$: image feature $j$ extracted from $I^R(f)$
$H(f, i, j)$: hypothesis that $\mathbf{z}^L_i(f)$ and $\mathbf{z}^R_j(f)$ are stereo correspondences
$\mathbf{x}^L_i(f)$: true projection of a 3D feature on $I^L(f)$
$\mathbf{x}^R_j(f)$: true projection of a 3D feature on $I^R(f)$
$\hat{\mathbf{X}}_{ij}(f)$: 3D point reconstructed from $\hat{\mathbf{x}}^L_i(f)$ and $\hat{\mathbf{x}}^R_j(f)$

MOTION AND MEASUREMENT MODELS

2D MOTION MODEL: Initially, a second order motion estimator is used for each 2D feature point in both left and right images:

$$\mathbf{x}(f+1) = \begin{bmatrix} 3 & -3 & 1 \end{bmatrix} \begin{bmatrix} \mathbf{x}(f) \\ \mathbf{x}(f-1) \\ \mathbf{x}(f-2) \end{bmatrix} + \mathbf{G}(f)\,\mathbf{w}(f)$$

where $\mathbf{w}(f) \sim \mathcal{N}(\mathbf{0}, \mathbf{I})$ and $\mathbf{G}(f)$ models the process noise.
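One common form of a second order (constant-acceleration) predictor extrapolates the next position from the last three with weights 3, −3 and 1. Assuming that form, the noise-free part of the prediction can be sketched as below; the function name and the sample track are hypothetical.

```python
def predict_next(current, previous, before_previous):
    """Constant-acceleration extrapolation of a 2D point from its last three positions.

    Implements x(f+1) = 3*x(f) - 3*x(f-1) + x(f-2) per coordinate (noise-free part only).
    """
    return tuple(
        3.0 * c - 3.0 * p + b
        for c, p, b in zip(current, previous, before_previous)
    )


# A feature moving with constant acceleration: x = t**2, y = 2*t for t = 0, 1, 2.
track = [(0.0, 0.0), (1.0, 2.0), (4.0, 4.0)]
predicted = predict_next(track[2], track[1], track[0])
```

For a track that really does have constant acceleration, the extrapolation reproduces the next sample exactly; the process noise term accounts for departures from that assumption.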

2D MEASUREMENT MODEL: Feature extraction errors are modelled as $\mathbf{v}(f) \sim \mathcal{N}(\mathbf{0}, \Sigma)$:

$$\mathbf{z}(f) = \mathbf{x}(f) + \mathbf{v}(f) = P[c, \mathbf{X}(f)] + \mathbf{v}(f)$$

3D MOTION MODEL: After 3D motion estimates become available, the rigidity constraint for the whole object is enforced using a single consistent motion:

$$\mathbf{X}(f+1) = \mathbf{R}(f)\,\mathbf{X}(f) + \mathbf{T}(f)$$

$\mathbf{R}(f)$ is a $3 \times 3$ rotation matrix and $\mathbf{T}(f)$ is a translation vector.
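A rigid motion of this kind applies the same rotation and translation to every 3D point, so distances between points are preserved. The following is a small sketch under assumed values: the helper names, the choice of a rotation about the z axis and the sample points are illustrative only.

```python
import math


def rotation_about_z(angle_rad):
    """3x3 rotation matrix (row-major nested lists) for a rotation about the z axis."""
    c, s = math.cos(angle_rad), math.sin(angle_rad)
    return [[c, -s, 0.0],
            [s, c, 0.0],
            [0.0, 0.0, 1.0]]


def apply_rigid_motion(rotation, translation, point):
    """Return R * X + T for a single 3D point X."""
    return tuple(
        sum(rotation[row][col] * point[col] for col in range(3)) + translation[row]
        for row in range(3)
    )


# Assumed motion: 90 degree rotation about z, then a 100 mm shift along z.
R = rotation_about_z(math.pi / 2)
T = (0.0, 0.0, 100.0)
points = [(100.0, 0.0, 2500.0), (0.0, 50.0, 2600.0)]
moved = [apply_rigid_motion(R, T, p) for p in points]
```

Because one motion is shared by all points, every reconstructed point constrains the same small set of parameters, which is what makes the rigidity constraint useful for rejecting inconsistent matches.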

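3D MEASUREMENT MODEL: The measurement vector now consists of the extracted features on both left and right images:

$$\begin{bmatrix} \mathbf{z}^L(f) \\ \mathbf{z}^R(f) \end{bmatrix} = \begin{bmatrix} \mathbf{x}^L(f) \\ \mathbf{x}^R(f) \end{bmatrix} + \begin{bmatrix} \mathbf{v}(f) \\ \mathbf{v}(f) \end{bmatrix} = \begin{bmatrix} P[L, \mathbf{X}(f)] \\ P[R, \mathbf{X}(f)] \end{bmatrix} + \begin{bmatrix} \mathbf{v}(f) \\ \mathbf{v}(f) \end{bmatrix}$$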
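The stacked left/right measurement can be sketched for an idealised stereo rig. The example assumes two identical pinhole cameras with parallel optical axes, the right camera displaced along x by the baseline, and independent Gaussian noise on every coordinate; the focal length, baseline, noise level and function names are hypothetical and are not the poster's calibration.

```python
import random


def project(point, focal_px, camera_x_mm=0.0):
    """Pinhole projection of a 3D point (mm) for a camera at (camera_x_mm, 0, 0) looking along +z."""
    x, y, z = point
    if z <= 0:
        raise ValueError("point must be in front of the camera")
    return (focal_px * (x - camera_x_mm) / z, focal_px * y / z)


def stereo_measurement(point, focal_px, baseline_mm, noise_sigma_px, rng):
    """Stacked [left; right] measurement: true projections plus Gaussian extraction noise."""
    left = project(point, focal_px, 0.0)
    right = project(point, focal_px, baseline_mm)
    noisy = [coord + rng.gauss(0.0, noise_sigma_px) for coord in left + right]
    return tuple(noisy)


rng = random.Random(0)  # fixed seed so the example is repeatable
X = (100.0, -50.0, 2500.0)
noise_free = stereo_measurement(X, 800.0, 200.0, 0.0, rng)
noisy = stereo_measurement(X, 800.0, 200.0, 0.5, rng)
```

With zero noise the two x coordinates differ by the disparity focal length × baseline / depth, so a single 3D point ties the left and right measurements together.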

ALGORITHM:

1. $\forall i, j$, if $\mathbf{z}^L_i(f)$ and $\mathbf{z}^R_j(f)$ satisfy the set of epipolar and minimum/maximum depth constraints:

♦ Create $\{H(f, i, j)\}$

♦ Reconstruct $\{\hat{\mathbf{X}}_{ij}(f)\}$

2. For each $H$, generate predictions $\hat{\mathbf{z}}^L_i(f+1 \mid f)$ and $\hat{\mathbf{z}}^R_j(f+1 \mid f)$.

3. Match image features at frame $f+1$ with predictions.

If $\mathbf{z}^L_m(f+1)$ is matched with $\hat{\mathbf{z}}^L_i(f+1 \mid f)$, and $\mathbf{z}^R_n(f+1)$ is matched with $\hat{\mathbf{z}}^R_j(f+1 \mid f)$, and $\{\mathbf{z}^L_m(f+1), \mathbf{z}^R_n(f+1)\}$ satisfy epipolar constraints:

♦ Create $H(f+1, m, n)$

♦ Update $\hat{\mathbf{X}}_{mn}(f+1)$

If $\mathbf{z}^L_m(f+1)$ has only one stereo matching candidate:

♦ $\hat{\mathbf{X}}_{mn}(f+1) \in \mathcal{S}(f+1)$

4. Estimate new 3D motion parameters $\mathbf{R}(f)$ and $\mathbf{T}(f)$ using $\mathcal{S}(f)$ and $\mathcal{S}(f+1)$.

5. Repeat from 1.

[Figure: The Incremental Reconstruction Algorithm. Block labels: 2D left image features; 2D right image features; multiple hypothesis tracking and stereo matching; validated stereo correspondences; validated motion correspondences; 3D reconstruction; motion estimation; 3D structure representation; motion parameters.]

[Figure: Multiple hypothesis tracking and stereo matching. Block labels: image features; matching; generate new hypotheses; stereo match hypotheses at frame f; stereo match hypotheses at frame f+1; delay; hypothesis management (pruning, merging); for each hypothesis, generate predictions; predicted feature locations; validated stereo & motion correspondences; motion parameters, 3D structure.]

[Figure: Without 3D motion parameters. Labels: frame f; frame f+1; left image; right image; stereo match hypothesis; predicted feature locations; 2D dynamics (in each image).]

[Figure: With 3D motion parameters. Labels: frame f; frame f+1; left image; right image; stereo match hypothesis; predicted feature locations; 3D dynamics; projection.]

❖RESULTS ❖

SYNTHETIC PROBLEM

♦ Thirty 3D data points randomly generated on synthetic model

♦ Simulated stereo set-up and motion to create a stereo image sequence

♦ Occlusion not modelled

♦ Random noise with distribution … added to simulate feature extraction noise
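A synthetic test set of this kind can be sketched in a few lines. The example below is a simplification, not the poster's set-up: it draws points in a box rather than on a model surface, and the box dimensions, the focal length, the Gaussian noise model and its 0.5 pixel standard deviation are all assumptions.

```python
import random


def make_synthetic_points(count, rng, half_width_mm=500.0, depth_range_mm=(2100.0, 3000.0)):
    """Draw `count` random 3D points (mm) inside an axis-aligned box in front of the cameras."""
    return [
        (rng.uniform(-half_width_mm, half_width_mm),
         rng.uniform(-half_width_mm, half_width_mm),
         rng.uniform(*depth_range_mm))
        for _ in range(count)
    ]


def add_extraction_noise(image_points, sigma_px, rng):
    """Perturb each 2D feature with independent zero-mean Gaussian noise."""
    return [(x + rng.gauss(0.0, sigma_px), y + rng.gauss(0.0, sigma_px))
            for x, y in image_points]


rng = random.Random(42)  # fixed seed so the data set is repeatable
model_points = make_synthetic_points(30, rng)
# Left-camera pinhole projection with an assumed focal length of 800 pixels.
left_features = [(800.0 * x / z, 800.0 * y / z) for x, y, z in model_points]
noisy_left_features = add_extraction_noise(left_features, sigma_px=0.5, rng=rng)
```

Keeping the noise-free projections alongside the noisy ones gives the ground truth needed to count mismatched points and to measure reconstruction error.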

SUMMARY OF RESULTS:

♦ Increased number of reconstructed points and decreased number of stereo matching hypotheses over the first few frames

♦ 3D motion estimates incorporated after frame 6; lost track of some features, but reconstruction accuracy improved

[Figure: number of points (0 to 60) versus frame number (0 to 20). Series: active hypotheses, reconstructed points, mismatched points, visible features.]

REAL IMAGE SEQUENCE

♦ 30 corner features extracted from each image in sequence

♦ Many disappearing features due to lighting changes

♦ New features at each frame are added to list of hypotheses

[Figure: one sample pair of images from the real sequence (left image, right image). Extracted features are shown as white points.]

SUMMARY OF RESULTS:

♦ Results not as satisfactory as synthetic problem

♦ No ground truth to assess accuracy

♦ Many ambiguities unresolved by motion and epipolar constraint alone

♦ Motion estimates affected by outliers

[Figure: number of points (0 to 60) versus frame number (0 to 20). Series: active hypotheses, reconstructed points, visible (existing) features.]

❖CONCLUSIONS ❖

♦ presented incremental 3D reconstruction using a stereo image sequence

♦ feature matching, tracking, motion and structure estimation all integrated into a single framework

♦ demonstrated potential in a synthetic problem

♦ motion and epipolar constraints alone not sufficient for real sequence

♦ future work includes: occlusion modelling, robust motion estimation, integrating other stereo matching techniques

Acknowledgments:

Research in this paper is funded in part by the Natural Sciences and Engineering Research Council of Canada. Images are courtesy of Macdonald Dettwiler Space and Advanced Robotics Ltd.


Results of Reconstruction

[Figures: at each of four frames, a front view (X (mm) versus Y (mm)) and a top view (X (mm) versus Z (mm)) comparing ground truth with reconstruction.]

Frame 1: all the points that initially have unambiguous stereo matches are reconstructed.

Frame 5: more points are reconstructed as some of the previous ambiguities are resolved.

Frame 10: 3D motion estimates have been incorporated. The accuracy of the depth estimates improved.

Frame 20: the depth estimates of some of the points become even more accurate.