
Page 1: A spatial-temporal analysis of matching in feature-based ...summit.sfu.ca/system/files/iritems1/5792/b15284670.pdf · A SPATIAL-TEMPORAL ANALYSIS OF MATCHING IN ... A Spatial-Temporal

A SPATIAL-TEMPORAL ANALYSIS OF MATCHING IN

FEATURE-BASED MOTION STEREO

Hong Wai Chin

B.Sc., Brandon University, 1987

THESIS SUBMITTED IN PARTIAL FULFILLMENT OF

THE REQUIREMENTS FOR THE DEGREE OF

MASTER OF SCIENCE

in the School

of

Computing Science

© Hong Wai Chin 1993

SIMON FRASER UNIVERSITY

December 1993

All rights reserved. This work may not be reproduced in whole or in part, by photocopy

or other means, without permission of the author.


Approval

Name:

Degree:

Title of thesis:

Hong Wai Chin

M.Sc.

A Spatial-Temporal Analysis of Matching in Feature

Based Motion Stereo

Examining Committee: Dr. Fred Popowich, Chairman

Date Approved:

Dr. Ze-Nian Li Senior Supervisor

Dr. Brian Funt Supervisor

Dr. John Ens External Examiner


PARTIAL COPYRIGHT LICENSE

I hereby grant to Simon Fraser University the right to lend my thesis, project or extended

essay (the title of which is shown below) to users of the Simon Fraser University Library,

and to make partial or single copies only for such users or in response to a request from the

library of any other university, or other educational institution, on its own behalf or for one

of its users. I further agree that permission for multiple copying of this work for scholarly

purposes may be granted by me or the Dean of Graduate Studies. It is understood that copying

or publication of this work for financial gain shall not be allowed without my written permission.

Title of Thesis/Project/Extended Essay

A Spatial-Temporal Analysis of Matching in Feature Based Motion Stereo.

Author:

(s ignature)

Hong Wai Chin

(name)

(date)


Abstract

Motion stereo is a stereo image acquisition method which takes successive images

of a scene from a moving camera. The important assumption is that the motion is a

simple translation and its parameters are known. Matching in motion stereo is

fundamentally similar to conventional binocular stereo matching, where the major issue is

the correspondence problem. With few constraints, stereo matching results are inherently

noisy, multiple-valued, and ambiguous. Most previous motion stereo approaches have explored

various constraints in an ad hoc manner. Moreover, only two image frames are processed

at any given time. After initial matching between the first pair, the third and subsequent

images are merely used as a confirmation or refinement of the previously matched result.

This thesis presents a new matching approach which integrates multiple evidence

and obtains matching correspondences from the entire motion stereo sequence. Our

approach is based on the observation that on the Epipolar Plane Image (EPI)

corresponding feature points lie along the same linear path, known as the EPI path. A voting

scheme is developed to collect evidence. Each possible corresponding pair of feature

points on the EPI image forms a hypothetical line, which suggests a potential match with

an associated disparity value d. Accordingly, a vote is cast to a 3-D (xyd) voting space.

Peaks can be detected in the voting space if many votes are correctly registered to their

EPI path. A cooperative algorithm based on the disparity gradient is presented for

filtering out false peaks in the voting space. Weighted support and suppression factors

are used in the neighborhood of a peak in the xyd space. Since the inclusion of the

entire sequence of images in motion stereo exacerbates surface occlusion, a

cooperative algorithm based on the analysis of several occlusion models, taking

vote-counts and occlusion patterns into account, is proposed.


To my father

who spent all his life working for the family

and always reminds me of the importance of education

and the meaning of life

To my wife and my family



Acknowledgments

I would like to thank my supervisor Dr. Ze-Nian Li for his supervision,

encouragement, and advice. I am particularly grateful for his constant support during my

thesis writing when I felt lost and low in confidence. I would also like to extend my

greatest appreciation to Frank Tong for his valuable advice and help. I would like to

thank my wife Christine Yip Yit Lai who has provided me the motivation and love

needed for me to keep going, and my family (especially my little sister, Nee, who has

been writing to me) who has provided me the needed support. I would like to thank Dr.

John Ens who came all the way from Richmond on a rainy Vancouver day for my

defense. I would also like to thank Vincent Ng and Honman Wong who have been giving

me so much help over the years.

I would like to extend my appreciation to the people in the Department of Chemistry

for the wonderful working environment. I would particularly like to thank Dr. Dale

Treleaven, Dr. Andy Bennett, Dr. Guy Lamoureux for their help in correcting my thesis.

I would also like to thank Dr. Ken Stuart and Dr. Ralph Korteling for their understanding

and help. Also, many thanks to Dr. Fred Einstein who kept me going, and going, ..., and

going. I would also like to extend my special appreciation to Winnie Chu Chu-Hui for all

her help.


Table of Contents

Approval
Abstract
Dedication
Acknowledgments
Table of Contents
List of Figures
Chapter 1 Introduction
  1.1 The Stereo Correspondence Problem
  1.2 Constraints for Stereo Correspondence
  1.3 Feature-Based Stereo Algorithms
  1.4 Motion Stereo Algorithms
  1.5 Occlusion Recovery
  1.6 Proposed Work
Chapter 2 Cooperative Motion Stereo Algorithm
  2.1 Basics of Motion Stereo
  2.2 Spatial-Temporal Analysis
  2.3 Multiple Voting Scheme
  2.4 Disparity Gradient Based Relaxation Algorithm
    2.4.1 Basics of Disparity Gradient
    2.4.2 Forbidden Zones in XYZ and xyd Voting Space
    2.4.3 The Relaxation Algorithm
Chapter 3 Occlusion Recovery in Motion Stereo
  3.1 The Study of Two Simple Occlusion Cases
    3.1.1 Analysis of Occlusion on EPI Plane
    3.1.2 Vote Estimation
    3.1.3 Disparity Gradient Based Algorithm with Occlusion Recovery
  3.2 The Study of Other Occlusion Cases
Chapter 4 Experimental Results
  4.1 Experimental Results without Occlusion Recovery
    4.1.1 Random-dot Stereogram
    4.1.2 Rubik Cube on a Moving Belt
  4.2 Comparative Results for Occlusion Recovery
Chapter 5 Conclusion
  5.1 Summary
  5.2 Discussion and Future Work
References


List of Figures

Figure 1.1: Ambiguity in Correspondence: possible matchings between points from the left and the right image
Figure 2.1: Camera geometry in binocular stereo
Figure 2.2: Geometrical model of a motion stereo sequence
Figure 2.3: Movement of an edge point along an epipolar line of a motion stereo sequence with a constant baseline interval
Figure 2.4: Spatial-temporal space: xyt space, Epipolar Plane Image and EPI path
Figure 2.5: A view of the epipolar plane image: a projection line intercepting a projection image at Pp
Figure 2.6: A graphical view of the voting process
Figure 2.7: An example of false vote occurrence in the voting process
Figure 2.8: Defining disparity gradient in stereo vision
Figure 2.9: The forbidden zone of point P in 3-D space
Figure 2.10: The forbidden zone of point P in disparity space
Figure 3.1: Case 1 of surface occlusion with a relatively wide occluding surface
Figure 3.2: Case 2 of surface occlusion with a relatively wide occluding surface
Figure 3.3: EPI plane view of false target and true match
Figure 3.4: EPI path of the leading edge and trailing edge of occluding surface
Figure 3.5: Case 3 of surface occlusion where the occluding surface occludes and then disoccludes an edge point during a motion stereo sequence
Figure 4.1: Random Dot Stereogram of a Hemisphere
Figure 4.2: Gray level coded disparity map of the hemisphere at different iterations
Figure 4.3: Layers of V(x, y, d) for hemisphere at iterations 0, 12, 24, 45
Figure 4.4: Motion stereo intensity images and edge maps of Rubik cube
Figure 4.5: Gray level coded disparity map of Rubik cube
Figure 4.6: Layers of xyd voting space for the Rubik cube at iterations 0, 12, 24, 90
Figure 4.7: The random dot stereograms of a synthetic motion stereo sequence
Figure 4.8: The original synthetic terrain data in the reference frame
Figure 4.9: Results without occlusion compensation
Figure 4.10: Results with occlusion compensation


Chapter 1 Introduction

Depth recovery is one of the important mechanisms for a vision system to interact

with the outside world. It enables the vision system to intelligently locate, track, describe,

classify, and inspect objects within a scene. An obvious application of a robot vision

system with such abilities is in a manufacturing environment, where manufactured goods

may be located, inspected, and classified for quality control purposes. There are several

depth recovery methods that will lead to the construction of a three-dimensional

geometrical model of a visible scene; they range from active methods such as ultrasonic,

structured lighting, and laser range scanning to relatively passive methods such as

focusing, texture gradient, shading, stereo disparity and motion analysis. Jarvis [Jarvis83]

and Nitzan [Nitzan88] present a general survey of these different approaches to depth

recovery. One of the commonly used techniques is stereo disparity analysis. This

technique explores the triangulation properties of camera geometry and the

transformation between coordinate systems in stereo imagery.

1.1 The Stereo Correspondence Problem

Stereo disparity analysis is a process of finding correspondences between stereo

images. Since depth is inversely proportional to the disparity, the process of depth

recovery becomes straightforward once the correspondence is established and disparity

values are calculated. The correspondence problem [Marr76], however, is difficult,

because it is essentially a matching problem. Due to the enormous diversity of objects

and the unlimited number of ways in which they can be geometrically arranged in scenes,

it is generally very difficult to obtain an unambiguous matching. Additional factors, such

as the inherent ambiguity of images due to surface reflectance, noise during image

capturing and the problem of quantization, can also make stereo matching unreliable. As

illustrated in Figure 1.1, of the sixteen possible matches only four are correct; the rest are

called "false targets" [Marr76]. Without further constraints based on a more global

consideration, the ambiguity in correspondence between two stereo images cannot be

resolved.


Figure 1.1: Ambiguity in Correspondence: possible matchings between points from the left and the right image. Back projections are made from images to the three dimensional space.
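The inverse depth-disparity relation that motivates this analysis can be illustrated numerically; the focal length and baseline below are arbitrary illustrative values, not parameters from this thesis.

```python
def depth_from_disparity(f, b, d):
    """Triangulation in a rectified stereo pair: depth Z = f * b / d,
    with focal length f (pixels), baseline b (scene units) and
    disparity d (pixels).  Larger disparity means a nearer point."""
    if d <= 0:
        raise ValueError("disparity must be positive for a finite depth")
    return f * b / d

# Doubling the disparity halves the recovered depth.
near = depth_from_disparity(f=700.0, b=0.1, d=10.0)  # 7.0
far = depth_from_disparity(f=700.0, b=0.1, d=5.0)    # 14.0
```

Once a correct correspondence (and hence d) is known, depth recovery is this single division; the difficulty lies entirely in establishing the correspondence.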


1.2 Constraints for Stereo Correspondence

In order to alleviate the stereo correspondence problem, numerous constraints

have been studied and employed.

One of the most widely adopted constraints is the epipolar constraint [Baker81].

In binocular stereo, cameras are assumed to be separated laterally along a straight line

while acquiring images. This camera configuration ensures that corresponding points lie

on a straight line known as the epipolar line. As a result, the search for corresponding

points is limited to areas on the same straight line.

Inspired by human neurophysiological vision, Marr and Poggio [Marr76] propose

a cooperative stereo algorithm using compatibility, uniqueness, and continuity constraints

for solving the stereo correspondence. The first constraint (compatibility) suggests that a

feature in one image can only be matched to features with similar characteristics in the

other image. The second constraint suggests that each feature can be matched to only one

feature, and hence there is a unique disparity value for each pixel in the image. The third

constraint suggests that disparity varies smoothly almost everywhere. Basically, their

cooperative algorithm defines a set of inhibitory and excitatory neighborhoods in a

connectionist network. Neighboring nodes of similar characteristics (compatibility) and

disparity (continuity) have excitatory interactions. Neighbors that would be matched to

the same point have inhibitory interactions (uniqueness). The algorithm can be

represented by the following iterative process:

~ " ' ( x , y, d) = 0 St(x', y', & ) - E . ~ S ' ( X ' , y', dt)+SO(x, y, d) x', y', d ' W x , y, d ) x', y', d ' d ( x , y, d )

3

Page 14: A spatial-temporal analysis of matching in feature-based ...summit.sfu.ca/system/files/iritems1/5792/b15284670.pdf · A SPATIAL-TEMPORAL ANALYSIS OF MATCHING IN ... A Spatial-Temporal

where St ( x , y , d ) denotes the state of a pixel at position (x, y) having disparity d at

iteration t. E(x, y, z) is the local excitatory neighborhood, and I(x, y, z) is the inhibitoly

neighborhood. E is an inhibition constant and o is a threshold function. It has been

shown that this algorithm is successful in matching random-dot stereograms [Marr76].
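A toy 1-D version of such a cooperative update can be sketched as follows; the neighborhood shapes, the inhibition constant, and the step threshold are illustrative choices, not Marr and Poggio's exact parameters.

```python
def cooperative_update(S, S0, eps=2.0, theta=1.0):
    """One synchronous iteration of a toy cooperative update on an
    (x, d) state grid.  Excitation: same disparity at neighboring x
    (continuity).  Inhibition: other disparities at the same x
    (uniqueness).  The threshold function is a step at theta."""
    X, D = len(S), len(S[0])
    out = [[0.0] * D for _ in range(X)]
    for x in range(X):
        for d in range(D):
            excite = sum(S[x2][d] for x2 in (x - 1, x + 1) if 0 <= x2 < X)
            inhibit = sum(S[x][d2] for d2 in range(D) if d2 != d)
            val = excite - eps * inhibit + S0[x][d]
            out[x][d] = 1.0 if val >= theta else 0.0
    return out

# Seed: correct matches at disparity 1 everywhere, one false match at (x=2, d=0).
S0 = [[0, 1, 0], [0, 1, 0], [1, 1, 0], [0, 1, 0], [0, 1, 0]]
S = S0
for _ in range(3):
    S = cooperative_update(S, S0)
# The unsupported false match dies out; the consistent disparity layer survives.
```

The false match receives no excitatory support from its neighbors but is inhibited by the competing match at the same pixel, so it is switched off in the first iteration.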

Mayhew and Frisby introduced the concept of figural continuity [Mayhew81].

They point out that the concept of surface continuity is often violated in a more

complicated scene, especially when there are multiple objects and occlusions. The

continuity of disparity values is only guaranteed along the figures (contours) of an object.

Since the figural continuity is not dependent on the surface continuity and it is more

applicable to contour-based matching, it has been viewed as a more acceptable

alternative. However, as pointed out by Pollard et al. in [Pollard85], the figural

continuity constraint may be ineffective when there is a figural distortion.

The disparity gradient constraint is based on the observation that objects occupy a

well-defined three-dimensional volume. Surface points of an object usually have

relatively similar depths, and thus relatively similar disparities. In their study, Burt and

Julesz [Burt80] show that human observers have a disparity gradient limit of

approximately one. Order reversal is observed when the disparity gradient is greater than

or equal to two. The non-reversal ordering constraint [Yuille84] states that two points

must be in the same relative order in the left and right images. It is based on the

assumption that no narrow occluding objects exist in the scene. Trivedi and Lloyd

[Trivedi85] prove that ordering is preserved when a disparity gradient limit of less than

two is imposed. The PMF algorithm by Pollard, Mayhew and Frisby [Pollard85] is one

of the early stereo matching algorithms that exploit the constraint of the disparity gradient

limit.

Pollard, Mayhew and Frisby [Pollard85] propose a simple relaxation algorithm

based on the disparity gradient limit, uniqueness and epipolar constraints. They discuss

the inherent relationship between the constraint of disparity gradient limit and other

stereo matching constraints such as surface continuity, figural continuity, and non-reversal

ordering constraints. Pollard et al. [Pollard86] discuss the relationship between

the disparity gradient and the surface orientation and depth in the 3-D space. The

relationship constitutes a good basis for a disparity gradient based algorithm. Because the

probability of a pair of matches satisfying the condition that their disparity gradient ≤ 1

increases with the cyclopean distance between the pair of matches, a weighted support

that is inversely proportional to the distance is used.
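The disparity gradient and a distance-weighted support of this kind can be sketched as follows; the (x, y, d) match representation and the inverse-distance weight are illustrative assumptions, not the exact PMF formulation.

```python
import math

def disparity_gradient(m1, m2):
    """Disparity gradient of two candidate matches, each given as
    (x, y, d): cyclopean image position and disparity.
    DG = |d1 - d2| / (cyclopean separation of the two points)."""
    (x1, y1, d1), (x2, y2, d2) = m1, m2
    return abs(d1 - d2) / math.hypot(x1 - x2, y1 - y2)

def pmf_support(m1, m2, limit=1.0):
    """Distance-weighted support in the spirit of PMF: pairs within the
    disparity gradient limit support each other, with a weight falling
    off as the inverse of their separation (an illustrative weighting)."""
    (x1, y1, _), (x2, y2, _) = m1, m2
    sep = math.hypot(x1 - x2, y1 - y2)
    return 0.0 if disparity_gradient(m1, m2) > limit else 1.0 / sep

dg = disparity_gradient((0, 0, 2.0), (4, 3, 1.0))  # 0.2
```

A pair whose gradient exceeds the limit contributes no support at all, while nearby compatible pairs contribute the most, matching the motivation quoted above.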

Stewart and Dyer [Stewart88] propose a trinocular support function whose value

is inversely proportional to the product of the cyclopean distance and the disparity

difference between a pair of matches. The iterative algorithm is implemented using a

connectionist network model. The support value of each node is based on the value of

the previous support, a decay rate, and the support from the neighboring connections.

Binocular and trinocular disparity gradient constraints are applied directly on the

connections between candidate matching nodes.

More recently, Stewart [Stewart91] proposed a new support function based on the

analysis of the probability of disparity changes and depth changes. Given a pair of

matches m and m' which correspond to points P and P' in 3-D space, respectively, the

support function is proportional to the probability that P and P' are on the same surface.


This probability is the product of the probability that the surface is continuous and the

probability that the surface, if continuous, passes through P'. The first probability term in

the product is derived from the probability density function of the disparity gradient, and

the second probability is a linearly decreasing function of the distance between m and m'.

Overall, most previous approaches treat the matching constraints in an ad hoc

manner. There is a need to develop a unified algorithm and to provide a systematic way

to determine the various heuristic parameter values.

1.3 Feature-Based Stereo Algorithms

Feature-based matching algorithms can be divided into two levels of abstraction.

Low level abstraction involves matching pixels of features [Marr79, Grimson85, Ohta88,

Stewart88, Okutomi93, Ens93], while high level abstraction involves matching

geometrical structures, such as line segments, curves, junctions, and even surfaces and

volumes [Medioni85, Ayache87, McIntosh88, Lim88, Kim88, Li89, Horaud88,

Zhang93]. The advantage of low level matching is its independence from scene features;

the algorithm is not tied to the representation of a particular feature as opposed to the

higher level matching. However, the higher level abstraction reduces the matching search

space by representing multiple pixels as a single feature for matching.

Marr and Poggio [Marr79] propose a coarse-to-fine algorithm using the Laplacian

of Gaussian operator. Their approach first convolves stereo images with mask operators

of various sizes and extracts zero-crossings of the images. The matching process

proceeds by using matches obtained at a coarser level to constrain the matching at

subsequently finer levels of detail. An implementation of the algorithm was first reported

by Grimson [Grimson81].
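A 1-D analogue of the zero-crossing feature extraction can be sketched as follows; the kernel sampling and the step-edge signal are illustrative, not the Marr-Poggio mask sizes.

```python
import math

def log_kernel_1d(sigma, radius):
    """Sampled second derivative of a Gaussian - the 1-D analogue of the
    Laplacian-of-Gaussian mask - shifted to zero mean so that a flat
    signal gives zero response."""
    k = [(x * x / sigma**4 - 1.0 / sigma**2) * math.exp(-x * x / (2 * sigma**2))
         for x in range(-radius, radius + 1)]
    mean = sum(k) / len(k)
    return [v - mean for v in k]

def convolve(signal, kernel):
    """Direct convolution with the border clamped to the edge values."""
    r, n = len(kernel) // 2, len(signal)
    return [sum(w * signal[min(max(i + j - r, 0), n - 1)]
                for j, w in enumerate(kernel)) for i in range(n)]

def zero_crossings(values, eps=1e-6):
    """Indices where the filtered signal crosses zero: candidate edges."""
    return [i for i in range(1, len(values))
            if (values[i - 1] < -eps and values[i] > eps)
            or (values[i - 1] > eps and values[i] < -eps)]

# A step edge at index 10 produces a single zero crossing there.
edges = zero_crossings(convolve([0.0] * 10 + [1.0] * 10, log_kernel_1d(1.0, 3)))
```

Running the same extraction with several values of sigma gives the coarse and fine feature sets that the coarse-to-fine matching operates on.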

Grimson [Grimson85] incorporates the figural continuity constraint into the Marr-

Poggio algorithm. The figural continuity is used as a complementary constraint along

occluding boundaries, as it has been observed that, in the previous implementation of

Marr-Poggio algorithm [Grimson81], the continuity constraint fails to resolve the

matching ambiguity along occluding boundaries.

Baker and Binford [Baker81] present a dynamic programming method of stereo

matching. They use the uniqueness constraint to enforce intra-scanline searches. Ohta

and Kanade [Ohta85] improve this method by adding an inter-scanline search along edges

to enforce the figural continuity constraint. Li [Li89] extends the dynamic programming

method into the Hough space, thus significantly reducing the search space.
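A toy intra-scanline dynamic programme in this spirit can be sketched as follows; the (position, contrast) feature representation and the occlusion penalty are illustrative assumptions, not Baker and Binford's exact cost model.

```python
def match_scanline(left, right, occ=1.0):
    """Align edge features on corresponding epipolar lines by dynamic
    programming.  Matching cost is the contrast difference; skipping a
    feature (occlusion) costs `occ`.  Ordering and uniqueness are
    implicit in the monotone alignment."""
    n, m = len(left), len(right)
    INF = float("inf")
    cost = [[INF] * (m + 1) for _ in range(n + 1)]
    back = [[None] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(n + 1):
        for j in range(m + 1):
            if cost[i][j] == INF:
                continue
            if i < n and j < m:
                c = cost[i][j] + abs(left[i][1] - right[j][1])
                if c < cost[i + 1][j + 1]:
                    cost[i + 1][j + 1], back[i + 1][j + 1] = c, (i, j, "match")
            if i < n and cost[i][j] + occ < cost[i + 1][j]:
                cost[i + 1][j], back[i + 1][j] = cost[i][j] + occ, (i, j, "skipL")
            if j < m and cost[i][j] + occ < cost[i][j + 1]:
                cost[i][j + 1], back[i][j + 1] = cost[i][j] + occ, (i, j, "skipR")
    # Trace back the optimal alignment; report (x_left, disparity) pairs.
    pairs, i, j = [], n, m
    while back[i][j]:
        pi, pj, op = back[i][j]
        if op == "match":
            pairs.append((left[pi][0], left[pi][0] - right[pj][0]))
        i, j = pi, pj
    return cost[n][m], list(reversed(pairs))

cost, matches = match_scanline([(10, 0.9), (20, 0.5), (30, 0.8)],
                               [(7, 0.9), (17, 0.5), (27, 0.8)])
# Every feature matches its counterpart at disparity 3.
```

The inter-scanline refinement of Ohta and Kanade would add a second cost term tying matches on adjacent scanlines together along edges; it is omitted here for brevity.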

Line segments, junctions, etc. are often used in higher level feature matching.

Some matching approaches use various similarity measurements such as edge length,

contrast, and the gradient angle [Medioni85, Ayache87, McIntosh88, Horaud88]. Kim

and Bovik [Kim88] present a stereo matching algorithm which first matches higher level

features such as junctions and end points, then uses the figural continuity to match the

remainder of the edges along the contour. Lim and Binford [Lim88] take the idea further

by constructing a hierarchical representation of an image using bodies, surfaces,

junctions, curves, and edge segments. Low level structures, such as edge segments, are

matched first. Higher level structures are then used to resolve ambiguity of the matches

from the low level.


The main advantage of feature-based matching over area-based matching is that

the former does not rely on correlation methods, which are often very time consuming.

However, feature-based stereo matching algorithms are not necessarily more reliable.

Due to noise, image resolution or occlusion, features may not appear in both images.

Moreover, the depth map obtained from feature-based matching is sometimes sparse. To

derive a complete depth map, surface interpolation is necessary.

1.4 Motion Stereo Algorithms

The term motion stereo is used by Nevatia in [Nevatia76]. In motion stereo,

successive images of a scene are taken by a moving observer over a time frame. For

example, a flying aircraft taking successive pictures of a scene as it flies by can be

considered to be acquiring a motion stereo sequence. Alternatively, a stationary camera can be

used to take pictures of moving objects. For example, in the manufacturing environment,

a camera is used to take snapshots of industrial parts on a moving conveyer belt. One

important assumption of this method is that the motion is a simple translation and its

parameters are known to the observer. As a result, disparity only occurs along the

epipolar lines whose direction is known. For simplicity, in this thesis, we will assume the

epipolar lines are along the x-dimension.

The major advantage of acquiring more than two stereo images in motion stereo is

data redundancy. This results in a reduced error rate by allowing repeated confirmation

against false matches. It also increases the precision of the disparity (up to subpixel

precision), because once a correct match is made, the disparity value can be derived by

dividing the total disparity by N, where N+1 is the number of images in the motion stereo

sequence. Additionally, a higher density disparity map is achieved due to the relatively

shorter baseline between consecutive images [Okutomi93]. Occlusion recovery can also

be facilitated because a longer and possibly more revealing sequence is employed. The

major drawback of this technique is the increase in computational time required to

process the massive amount of data. Thus, one of the goals of this thesis is to explore a

way of utilizing the advantages of motion sequence while maintaining the potential

parallelism in the algorithm.
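The sub-pixel refinement obtained by dividing the total disparity by N can be sketched numerically; the pixel values below are illustrative.

```python
def per_frame_disparity(x_first, x_last, num_images):
    """Sub-pixel disparity from a motion stereo sequence: the feature
    moves x_last - x_first pixels over num_images - 1 equal baseline
    intervals, so the per-frame disparity is the total divided by
    N = num_images - 1."""
    n_intervals = num_images - 1
    return (x_last - x_first) / n_intervals

# A feature drifting 17 pixels across a 9-image sequence (8 intervals)
# yields a per-frame disparity of 2.125 pixels, finer than the integer
# quantization available from any single image pair.
d = per_frame_disparity(x_first=40, x_last=57, num_images=9)  # 2.125
```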

Nevatia [Nevatia76] proposes an early motion stereo algorithm based on a

regional correlation technique using a mean square difference measure of the region.

Disparities of the whole motion sequence are determined by chaining disparities in the

intermediate matches. The problem of broken chains in this algorithm is due to the

presence of occlusion, image noise, and erroneous matches. Nevatia suggests using a

global correspondence method such as the cooperative algorithm as proposed in

[Julesz71] for removing ambiguity.

Moravec [Moravec79] built a mobile robot equipped with a slider camera. Nine

images are taken at a constant distance interval as the camera slides along a track. A

coarse-to-fine matching method is used for matching features from the reference image to

each of the eight other images. Once the algorithm detects a feature's position in the nine

images, it derives the 3-D depth value from each of the 36 possible image pairings by

treating each pair as in a binocular stereo model. A histogram is plotted based on these

distance values. A peak can be detected in the histogram if the match is correct. It is

claimed that a peak in the histogram is sufficient to stand out against a large number of

false estimates even when there are only two or three correct correlations.
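The histogram voting Moravec describes can be sketched as follows; the bin width and depth values are illustrative, not taken from [Moravec79].

```python
def histogram_depth_vote(depths, bin_width=0.5):
    """Moravec-style depth voting: each pairwise depth estimate casts
    one vote into a coarse histogram; the modal bin gives the depth.
    The bin width is an illustrative choice."""
    votes = {}
    for z in depths:
        b = int(z // bin_width)
        votes[b] = votes.get(b, 0) + 1
    best = max(votes, key=votes.get)
    return (best + 0.5) * bin_width, votes[best]

# Three consistent estimates near 4.1 outvote five scattered false ones.
depth, count = histogram_depth_vote([4.1, 1.3, 4.2, 7.8, 2.6, 4.05, 9.9, 6.1])
```

False estimates scatter over many bins while correct ones pile into one, which is why even a handful of correct correlations produces a detectable peak.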


Bridwell and Huang [Bridwell83] present a method of generating a complete

minimum depth surface description of a scene. A set of depth planes is used to

approximate a three dimensional volume. If a cell in the plane of depth Z is occupied, it

indicates that there may be an object with minimum depth Z in the scene. All cells are

initially occupied. A cell is cleared if it is not occluded in two consecutive image frames

and the intensity is mismatched in the projecting pair of image pixels. This simple

algorithm generates a simple sparse depth map which can be used to guide a second pass

matching algorithm.

Bolles, Baker, and Marimont [Bolles87] take Moravec's idea one step further to

deal with the problem of general motion analysis. Hundreds of images (instead of nine

images) are employed. They propose a technique of epipolar-plane image (EPI) analysis

in which spatial and temporal information are integrated. Images are taken in rapid

succession to obtain a solid block of image data such that the temporal continuity

between images is comparable to the spatial continuity within an image. The advantage

of the EPI analysis is that it converts a three-dimensional problem into a simpler two-

dimensional one. The correspondence process is accomplished by searching for collinear

points in the EPI. Due to the closeness of the points, occluded areas can also be

recovered by joining broken line segments on the EPI. It is observed that occlusions

always occur at junction points on the EPI plane.

Peng and Medioni [Peng88] report a similar motion sequence analysis for motion

estimation in the spatial-temporal space. By taking a 2W by L window slice in the dense

spatial-temporal space, a path of an edge point is traced. A motion speed V0 is estimated

using the slope of the path. Additional speeds are computed by rotating the slice around

the time-axis direction. These speeds form a constraint line in a velocity space, which


can then determine the normal velocity for the edge point. The real velocity field is

estimated using the normal velocity obtained.

Okutomi and Kanade [Okutomi93] propose a method of combining stereo pairs of

different baselines in a motion sequence using a function called the SSSD-in-inverse-

distance. They observe that short baseline matches have less precise distance estimation,

while longer baseline matches have better precision but more false matches. Thus, they

develop a multi-baseline matching algorithm which computes the sum of squared

differences (SSD) in inverse distance for each pair of images with a different baseline

length. All SSD's are then added together to produce the function known as the sum of

SSD-in-inverse-distance (SSSD). They show that this function produces an unambiguous

and sharper minimum value at the correct matching position. Combining results from

different baseline matches, they are able to combine the low ambiguity of short baseline

matches with the precision of long baseline matches. Since this technique is based on

a particular reference frame, it is unlikely that occluded or newly emerging features can

be recovered without computing another set of images based on a different reference

frame. In addition, if a feature is not visible in the reference frame due to noise, then

depth cannot be recovered at this feature point.
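The SSSD computation can be sketched as follows. This is a minimal Python illustration, not Okutomi and Kanade's implementation: the function name, window handling, and integer disparity rounding are our assumptions, and the images are assumed pre-rectified so that corresponding points share a scanline.

```python
import numpy as np

def sssd_in_inverse_distance(ref, others, baselines, x, y, win, F, inv_depths):
    """Sum of SSD-in-inverse-distance over several baselines (illustrative sketch).

    ref        -- reference image (2-D array)
    others     -- images taken at the given baselines from ref
    baselines  -- baseline lengths B for each image in others
    (x, y)     -- pixel whose depth is sought
    win        -- half-width of the correlation window
    F          -- focal length in pixels
    inv_depths -- candidate inverse depths 1/Z to evaluate
    """
    patch = ref[y - win:y + win + 1, x - win:x + win + 1].astype(float)
    sssd = np.zeros(len(inv_depths))
    for img, B in zip(others, baselines):
        for k, iz in enumerate(inv_depths):
            d = int(round(B * F * iz))          # disparity predicted for this baseline
            cand = img[y - win:y + win + 1,
                       x + d - win:x + d + win + 1].astype(float)
            sssd[k] += np.sum((patch - cand) ** 2)   # SSD, accumulated over baselines
    return inv_depths[int(np.argmin(sssd))]          # minimum of the combined SSSD
```

Because the per-baseline SSD curves all have their true minimum at the same inverse depth, summing them sharpens the correct minimum while false minima, which differ across baselines, tend to cancel.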

Ens and Li [Ens93] report a real-time multi-scale motion stereo algorithm. The

algorithm is implemented in a tight pipeline fashion on an interesting hybrid pyramid

vision machine. Initially, a reference frame (first image) is matched in the lowest

resolution with the second frame as it is acquired. This matched result is then used to

guide the matching between the reference frame and the third frame at a higher

resolution. As a new image comes in, the matching is processed to higher resolution

using the previous lower resolution result as its guide. The uniqueness and compatibility


constraints are enforced. Due to its chaining constraint, broken chains are simply

discarded. No attempt is made to recover partially occluded edge pixels in the algorithm.

Similar to Okutomi and Kanade [Okutomi93], all matching is based on one particular reference

frame, thus it could be difficult to recover occluded or newly emerging edge points.

Similarly, depth of invisible features caused by noise in the reference frame would not be

recovered as well.

Zhang [Zhang93] presents a line-based matching algorithm in a new parameter

space. Lines from a sequence of motion stereo images are extracted using a fast

pyramidal line detection algorithm, and represented as points in an M1-M2 parameter

space. It has been observed that if the M1-M2 arrays are stacked to form an M1-M2-T

space, then the points representing matching lines can be connected by a straight line in

the M1-M2-T space. Collinear points are connected by tracing the corresponding points

through a small search window. The advantage of Zhang's algorithm is its efficiency

obtained by using the line representation and exploiting the small M1-M2-T search space.

However, the matching is limited to line features only.

1.5 Occlusion Recovery

Psychophysical evidence suggests that the human visual system exploits occlusion

as a positive cue to depth recovery [Marr76, Nakayama90]. Occlusion boundaries are

usually coincident with object boundaries, thus are an important clue for object

segmentation. Occlusion may also be used as a guide for planning the next view in active

vision. The occlusion identification problem has usually been considered to be a higher

level of processing in computational vision than the correspondence problem. Most


approaches to the problem of occlusion identification are based on results from the early

correspondence process. However, this approach presents a circular argument since prior

knowledge of occlusion appears essential to identifying occluding boundaries in the

matching process. Most existing approaches [Marr79, Baker81, Ohta85, Ens92,

Okutomi93, Zhang93] to stereo correspondence either ignore the unmatchable points or

attempt to remove them in a second pass. Attempts are made in [Bolles87, Peng88] to

identify junctions of occlusion in the spatial-temporal space.

Toh and Forrest [Toh90] report an early occlusion detection method based on the

binocular fixation on possible occluding boundary edges. Occlusion occurs when similar

measurements on two sides of an edge produce differing results. This method seems

more applicable in the field of active vision where edges of the targeted object are

studied.

Balasubramanyam and Weiss [Balasubra90] propose an early detection method by

comparing a flow disparity estimate error (FDEE). Pixels in a reference frame are

matched independently to pixels in two different frames. Using the results from these

two matches, the algorithm predicts two possible locations where the pixel in the

reference frame would lie in the fourth image. FDEE is the sum of squared differences between

these two estimated points on the fourth image. If the FDEE is large, the point will be

identified as on the occluding boundary. The crucial point of the algorithm is the

matching process from the two pairs of stereo images. If false matches occurred, FDEE

would be large and the algorithm would fail.

Dhond and Aggarwal [Dhond92] present a dynamic disparity search (DDS)

framework which performs matching in both foreground and background pools. This


method is able to handle scenes with narrowly occluded objects using spatial hierarchy

mechanisms and a new mechanism based on disparity hierarchy.

A two-pass algorithm for locating disparity discontinuities is presented by Little

and Gillett [Little90], using the Drumheller and Poggio [Drumheller86] cooperative

algorithm. They observe that, near occlusion boundaries, pixels receive conflicting

support from inhibitory and excitatory neighbors at two sides of the boundaries. As a

result, these pixels become "weak-winners" at the end of the first pass. Given evidence of

these weak-winners, a second pass of the algorithm will not seek support across the

boundaries, and hence generate clear winners. Consequently, the occlusion boundaries

can be better recovered.

1.6 Proposed Work

Many previous methods in motion stereo analysis [Nevatia76, Moravec79,

Bridwell83, Ohta87, Ens93] are fundamentally similar to the conventional binocular

stereo matching technique. At any given time, there are only two image frames being

processed. The other image frames are usually used as a confirmation or refinement to

the previous result. As pointed out in [Tsai83, Bolles87, Okutomi93], the intermediate

decisions on correspondences are inherently noisy, ambiguous, and multiple. Finding the

correct combination requires sophisticated consistency checks and filtering. Furthermore,

except in [Bolles87], most algorithms depend on a particular reference frame; i.e., if a

matching feature is not visible in the reference frame, depth value will not be recovered

for this feature. We argue that motion stereo analysis should integrate all motion images;

instead of analyzing them sequentially, one should consider combining information


contained in all motion images. We also argue that the inclusion of multiple images in

motion stereo inevitably creates more opportunities for occlusion. A study on the surface

occlusion models in motion stereo and occlusion recovery algorithm is thus needed.

This thesis proposes an algorithm which attempts to utilize fully the advantages

offered by motion stereo as opposed to binocular stereo, i.e., higher precision and lower

error rate. Edge pixels are used as initial matching features in the algorithm. A voting

scheme for accumulating multiple evidence is developed, which utilizes the linear

property of the path in the epipolar plane images (EPIs). A cooperative relaxation

algorithm is used to remove ambiguous matches based on an analysis of the disparity

gradient. The algorithm is shown to be applicable to both dense and sparse feature

images. Occlusion models will be developed based on the analysis of vote-counts and

occlusion patterns. The cooperative relaxation algorithm will be enhanced by

incorporating additional parameters derived from the analysis of the occlusion models.

The organization of this thesis is as follows. Chapter 2 presents a cooperative

motion stereo matching algorithm using a disparity gradient. Chapter 3 incorporates an

occlusion recovery feature into the cooperative algorithm. Chapter 4 presents some

results using random-dot stereograms and real images. Section 4.1 shows results without

considering occlusion recovery. Section 4.2 compares some results for occlusion

recovery. And finally, Chapter 5 discusses the strengths and weaknesses of the voting

scheme and disparity-based cooperative matching algorithm, and the future direction of

our work.


Chapter 2 Cooperative Motion Stereo Algorithm

In this chapter, we present a feature-based motion stereo matching algorithm.

Initially, a multiple voting scheme using a linear projection method is developed to

collect evidence for potential matches. Then a cooperative relaxation algorithm is

proposed which is based on the properties of the disparity gradient to enhance and detect

peaks in the xyd voting space.

2.1 Basics of Motion Stereo

Figure 2.1: Camera geometry in binocular stereo.

In binocular stereo (Figure 2.1), consider a point p0(x0, y0) and its match point p1(x1, y1).


Assuming an epipolar constraint is applied to the camera model, i.e., y0 = y1, the

disparity is defined as

d = x1 - x0

The disparity is related to the depth Z by

d = BF / Z    (2.1)

where B is the camera separation along the baseline, and F is the focal length of the

camera lenses.

In a simple motion stereo configuration, a series of n+l camera positions are

arranged laterally from right to left along the baseline (see Figure 2.2). The camera in

the motion sequence is moving on a straight baseline with a constant B distance between

consecutive camera positions, and the viewing axis of the camera is perpendicular to the

baseline. This geometrical layout satisfies the epipolar constraint. Each camera image

and its immediate left neighbor in the motion sequence can be treated as a binocular

stereo model of baseline B; i.e., each pair of images at times t0 and t1, at t1 and t2, ..., and

at t(n-1) and tn follows the binocular model. Let (tn - t0) = n, and let p0, p1, ..., pn be

corresponding points along the motion sequence, and their corresponding point P in the

XYZ space has depth Z. Let pt and pt+1 be the two image points of P

at times tt and tt+1 in the motion sequence, and let dt be the disparity of this binocular

pair. Using Equation 2.1, we get

dt = BF / Z    (2.2)

Figure 2.2: Geometrical model of a motion stereo sequence.

Figure 2.3 depicts the movement of an edge point along an epipolar line during

the motion stereo sequence of a constant baseline interval. It is obvious that the

disparity from an edge point p0 at t0 to its corresponding point pn at tn in the motion

sequence is the sum of all intermediate disparities obtained from each of the individual

binocular models mentioned in the previous paragraph. Thus, we write the disparity of p0

in the motion stereo sequence as

d = d0 + d1 + ... + d(n-1)    (2.3)

Figure 2.3: Movement of an edge point along an epipolar line of a motion stereo sequence with a constant baseline interval.

Since B and Z do not change for any of the binocular stereo pairs, from Equation 2.2 the dt of

each stereo pair are equal. Thus Equation 2.3 becomes

d = n · dt    (2.4)

From Equation 2.4, we can infer that with a constant baseline distance B between

consecutive images in a motion stereo sequence, the disparity of a point pi at ti and its

corresponding point pj at tj in the sequence is just a scaled value of the total disparity d

in the motion stereo sequence. Assuming a subsequence starting at ti and ending

with tj, based on Equation 2.3, we get

dij = (tj - ti) · dt

Substituting dt = d/n from Equation 2.4,

d = (n / (tj - ti)) · dij    (2.5)

Thus, given the disparity dij of part of a motion sequence, we can derive the disparity d of the

whole motion sequence.
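The scaling in Equations 2.4 and 2.5 is easy to check numerically. The sketch below (the function name is ours) assumes frame indices ti, tj measured in baseline units:

```python
def total_disparity(d_ij, t_i, t_j, n):
    """Derive the full-sequence disparity d from a partial disparity d_ij
    measured between frames t_i and t_j (Equation 2.5)."""
    return n * d_ij / (t_j - t_i)
```

For example, with a per-frame disparity of 1.5 over n = 8 intervals, the disparity between frames 2 and 5 is 4.5, and scaling it recovers the total d = 12 = n · dt.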

Figure 2.4: Spatial-temporal space: xyt space, Epipolar Plane Image and EPI path.


2.2 Spatial-Temporal Analysis

By arranging the image sequence over a spatial-temporal (xyt) space (see Figure

2.4, where n = 2), corresponding points p0, p1, ..., pn within an Epipolar Plane Image

(EPI) can be joined by a hypothetical line. As observed by Bolles et al. in [Bolles87],

when the sampling rate of the sequence is high, a continuous line of corresponding points

can be traced from an edge point p0 at time t = t0 to the last corresponding point pn at t =

tn on the EPI plane. This path is called the EPI path. Because any points which lie on

the same EPI path are corresponding matching points, each EPI path can be associated

with a depth value Z which in turn is associated with a disparity value d (see Equation

2.4). Figure 2.5 gives a view of an Epipolar Plane Image. Given any two points pi and

pj at times ti and tj respectively, on an EPI path passing through point pp at time tp, the

disparity d from point p0 to point pn in a motion stereo sequence of n+1 images can be derived as

follows.

Let the disparity between pi and pj be

dij = Δx = Δt · tan(θ), where Δt = tj - ti    (2.6)

Since tn - t0 = n, substituting Equation 2.6 into Equation 2.5 gives

d = n · tan(θ)    (2.7)


From Equation 2.7, we can see that the disparity value d is proportional to the

slope tan(θ) of the EPI path. When the slope tan(θ) of an EPI path is large, d becomes

large, which means the corresponding points on the EPI path are shifting along the x-axis

quickly. By Equation 2.1, the corresponding 3-D point of the EPI path is close to the

camera.
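The relations in Equations 2.2 and 2.7 can be written down directly. The following sketch (function names are ours) converts an EPI-path slope into the total disparity and into depth, assuming B, F, and the slope are in consistent units:

```python
import math

def disparity_from_slope(theta, n):
    """Total disparity over an (n+1)-image sequence from the EPI-path
    angle theta relative to the t-axis (Equation 2.7: d = n * tan(theta))."""
    return n * math.tan(theta)

def depth_from_slope(theta, B, F):
    """Depth of the 3-D point: the per-frame disparity is d_t = tan(theta),
    so by Equation 2.2, Z = B * F / d_t."""
    return B * F / math.tan(theta)
```

A steep path (large tan(θ)) yields a large disparity and, correspondingly, a small depth Z, matching the observation above that fast-shifting points are close to the camera.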

Figure 2.5: A view of an epipolar plane image: a projection line intercepting a projection plane at Pp.


2.3 Multiple Voting Scheme

Most previous matching techniques for motion stereo treat the matching problem

as a process which merges matched results from multiple binocular stereo pairs.

Matching evidence that appears in some of the intermediate results is often lost during the

merging process. In this section, we will show a multiple voting scheme which

integrates all correspondence evidence and provides a more complete representation of

matching evidence from the whole motion sequence.

We derive a voting method for finding the "true" EPI path of the corresponding

points on the EPI plane. It is based on the fact that a hypothetical line formed by two

corresponding edge points lying on the same EPI path will have the same slope tan(θ)

with respect to the t-axis as the EPI path. Therefore, from Equation 2.7, all the hypothetical

lines on the same EPI path can be associated with a single disparity value d.

Let pi and pj be two points forming a hypothetical line on an EPI plane at time t =

ti and time t = tj, where ti < tj (see Figure 2.5). The line forms an angle θ relative to the

t-axis. A point in the xyd voting space will be called V(x, y, d), where x and y are the

image coordinates in a projection plane and d is the disparity value. Let the point on the

projection plane be Proj(x, y). If the hypothetical line is extended, joining pi and pj and

passing through the projection plane at point Proj(x, y), a vote is cast at V(x, y, d), where

d can be obtained from Equation 2.7. Because we only consider a path within an EPI

plane, all projections should have the same y value. If all pairings of points pi and pj on

the EPI path have correctly registered their votes at V(x, y, d), a high vote-count at V(x, y,

d) should be observed at the end of the voting process. Figure 2.6 shows how pairs

pi and pj of different separations on the same EPI path vote for V(x, y, d) in the xyd voting space.


Figure 2.6: A graphical view of the voting process.


Let Images(x, y, t) be the motion stereo sequence of images. The following

algorithm describes the voting process:

PROCEDURE VOTE (V, Proj, Images)

  For each edge pixel Images(x, y, ti) at time t = ti

    For each edge pixel Images(x', y', tj) at time t = tj, where tj > ti and y' = y

      Let d = x' - x

      If (tj - ti) * MinDisparity < d < (tj - ti) * MaxDisparity,

          where MinDisparity and MaxDisparity define the search range

          from time t to time (t + 1)

        If Images(x, y, ti) and Images(x', y', tj) have compatible intensity and

            edge orientation

          Join pixels Images(x, y, ti) and Images(x', y', tj) with a line and

          project the line through Proj(xp, yp)

          Compute dp = n * d / (tj - ti)

          Cast a vote at V(xp, yp, dp)

END PROCEDURE VOTE
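A minimal Python sketch of this voting loop for a single epipolar plane is given below. The edge representation, the projection plane at t = n/2, and the quantization of (xp, dp) are our assumptions, and the intensity and orientation compatibility tests of the full procedure are omitted for brevity:

```python
from collections import defaultdict

def vote(edges, n, min_disp, max_disp):
    """Multiple voting scheme on one EPI (one scanline y) -- illustrative sketch.

    edges    -- list of (x, t) edge-pixel positions on this epipolar plane
    n        -- t_n - t_0 for the (n+1)-image sequence
    min_disp, max_disp -- per-frame disparity search range
    Returns a dict mapping (xp, dp) -> vote count, where xp is the x position
    on the projection plane at t = n/2 and dp is the full-sequence disparity.
    """
    votes = defaultdict(int)
    for xi, ti in edges:
        for xj, tj in edges:
            if tj <= ti:
                continue
            d = xj - xi
            if not ((tj - ti) * min_disp < d < (tj - ti) * max_disp):
                continue
            slope = d / (tj - ti)              # tan(theta) of the hypothetical line
            dp = n * slope                     # Equation 2.7
            xp = xi + slope * (n / 2 - ti)     # intercept with the projection plane
            votes[(round(xp), round(dp, 3))] += 1
    return votes
```

All pairings drawn from the same EPI path project to the same (xp, dp) cell, so a correct match accumulates a quadratic number of votes, while accidental pairings scatter.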

A peak value V(xp, yp, dp) in the voting space may indicate two possible voting

results. The first obvious possibility is that all votes are contributed from the

hypothetical lines which lie on the same EPI path. The second possibility is that votes

are contributed from hypothetical lines which span across several correct EPI paths. An


example is shown in Figure 2.7. Votes of this type are false targets. The next

section will discuss how to filter out such false targets in the voting space.

Figure 2.7: An example of false vote occurrence in the voting process.

2.4 Disparity Gradient Based Relaxation Algorithm

In an ideal situation, a peak in the voting space can be detected simply by finding

the local maximum vote value. However, false peaks can spread over the entire voting

space, especially when a relatively large search range is allowed. Moreover, due to


quantization errors and noise, votes from points which are on the same EPI path are not

registered onto the same point V(xp, yp, dp) in the xyd voting space, and are therefore

scattered over an area V(xp±ε, yp, dp), where ε is small. We will use a relaxation

algorithm which is based on the analysis of the disparity gradient to eliminate false peaks

and to enhance the weak true peaks.

2.4.1 Basics of Disparity Gradient

Let us take two points P and Q in the 3-D XYZ space. Let pl and ql be their

projection points on the left image L, pr and qr their projection points on the right image R,

and pv and qv their projection points on a virtual cyclopean image V (see Figure 2.8). The

disparity gradient (dg) between P and Q is defined as the difference of their disparities

divided by their cyclopean separation rxy, where the cyclopean separation is the average

distance between the image point pairs pl, ql and pr, qr [Pollard85]:

dg = |dp - dq| / rxy    (2.8)

where dp and dq are the disparities of points p and q respectively. Because the epipolar

constraint is assumed, corresponding points have equal y coordinates in the left and right

images. Suppose a virtual camera V is placed in the middle of the left and right cameras.

Since each cyclopean x coordinate is the average of the left and right x coordinates, i.e.,

xv = (xl + xr) / 2, the cyclopean separation rxy can be computed directly from pv and qv.

Figure 2.8: Defining disparity gradient in stereo vision.
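Equation 2.8 can be computed directly from a pair of left-right matches. The sketch below (function name ours) assumes rectified images, so each matched pair shares a y coordinate:

```python
import math

def disparity_gradient(pl, pr, ql, qr):
    """Disparity gradient (Equation 2.8) between two matches p = (pl, pr)
    and q = (ql, qr), each given as (x, y) points under the epipolar
    constraint (left and right y coordinates are equal)."""
    dp = pr[0] - pl[0]                      # disparity of p
    dq = qr[0] - ql[0]                      # disparity of q
    # cyclopean (virtual middle-camera) projections
    pv = ((pl[0] + pr[0]) / 2, pl[1])
    qv = ((ql[0] + qr[0]) / 2, ql[1])
    r_xy = math.hypot(qv[0] - pv[0], qv[1] - pv[1])   # cyclopean separation
    return math.inf if r_xy == 0 else abs(dp - dq) / r_xy
```

Note that two matches sharing the same left-image point give dg = 2 exactly, which is why the uniqueness constraint corresponds to dg ≠ 2 in the discussion below.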

2.4.2 Forbidden Zones in XYZ and xyd Voting Space

In Figure 2.9, imagine P as the current fixation point in the XYZ space. Any

other points P' which violate the non-reversal order against point P (i.e., dg > 2 as pointed

out in [Burt80]) are in one of the two oblique cones tipped at point P. These two cones

are called the "forbidden zone" of P in the XYZ space because of the fact that any point

inside the forbidden zone always violates the non-reversal order. Points outside of the


forbidden zone (dg < 2) satisfy not only the non-reversal ordering constraint, but also the

uniqueness constraint (dg ≠ 2), and sometimes the disparity gradient limit (dg < 1.1) and

the continuity and figural continuity constraints (dg << 1).

Figure 2.9: The forbidden zone of point P in 3-D space.

As shown in Figure 2.10, a vote peak p (corresponding to a potential match) at

V(x, y, d) in the xyd voting space forms two similar cones tipped at the point p. The

disparity gradient between p and p' can be derived from Equation 2.8. The surface of the

oblique cone where dg = 2 in XYZ space corresponds to a cone surface in the xyd space

with slope = 2 with respect to the xy plane. When dg = 0, the surface is an XY plane of

depth Z in the XYZ space. In the xyd space, dg = 0 corresponds to an xy plane of


disparity d. We observe that the shape of the forbidden zone in the xyd space does not

change and has a simple symmetrical property as opposed to the one in the XYZ space.

Figure 2.10: The forbidden zone of point P in disparity space.

From Figure 2.9, it can be observed that the shape of the forbidden zone in the

XYZ space varies depending on the point's 3-D location; the shape of the cone surface

skews toward one side when the point is off-centered from the Z-axis. In addition, the

forbidden zone is wider when the point is closer to the camera, i.e. where Z is smaller.

However, in the xyd space, the shape of the forbidden zone is constant, due to the fact

that dg is defined as the slope with respect to the xy plane (Equation 2.8). Thus, dg = 2

always occurs on the cone surface of slope = 2. Furthermore, the line of sight (dg = ∞, or

rxy = 0) is always aligned with the d-axis in the xyd voting space. In addition, rxy is not

direction dependent. Thus, the shape of the forbidden zone is always symmetric and

equal in the xyd voting space. It is important to note that the disparity gradient in

Equation 2.8 is derived based on the cyclopean separation; i.e. the xy plane of the xyd

voting space is the cyclopean image. If the disparity space is derived from one of the base

images other than the cyclopean image, the forbidden zone would be more difficult to

analyze, because it would not be as symmetrical around the line of sight as the forbidden

zone based on the cyclopean image.
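Testing whether a candidate lies in another candidate's forbidden zone in the xyd space therefore reduces to a single slope comparison, which is exactly the simplification the cyclopean representation buys. A sketch, with the function name ours:

```python
import math

def in_forbidden_zone(p, q):
    """True if point q = (x, y, d) in the xyd voting space lies inside the
    forbidden zone of p, i.e. inside the double cone of slope 2 (dg > 2)."""
    r_xy = math.hypot(q[0] - p[0], q[1] - p[1])   # cyclopean separation
    if r_xy == 0:
        return q[2] != p[2]       # on the line of sight: dg = infinity
    return abs(q[2] - p[2]) / r_xy > 2
```

The test is identical for every point p, regardless of its position or disparity, whereas the equivalent test in the XYZ space would require a skewed, depth-dependent cone.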

2.4.3 The Relaxation Algorithm

The framework of our relaxation algorithm is adopted from Marr and Poggio

[Marr76]. A connectionist network of nodes is constructed. Each node corresponds to a

disparity value d at location (x, y). At each node, the neighborhood is examined and the

confidence measure of the node is increased or decreased depending on the support

provided by the neighboring nodes. The process is executed iteratively. In Marr and

Poggio's algorithm, a neighboring node provides positive (excitatory) support if it has the

same disparity d and negative (inhibitory) support if the neighbor associates the same

location (x,y) with a different disparity value. In this sense, the support function is ad

hoc, and it only covers a small range of disparity gradient values.

We will be using the disparity gradient as the basis of our stereo matching

algorithm as in [Pollard85]. Support from the neighboring nodes is defined as a function

of the disparity gradient f(dg), whose value can be either positive or negative. This


neighborhood support function (NSF) is based on Li's systematic analysis [Li94] of

various NSFs in the cooperative stereo matching process.

As mentioned in the previous section, the disparity gradient is defined based on

the cyclopean image of the binocular stereo model. In order to comply with the definition

of the disparity gradient, we will treat the whole motion stereo sequence as a

pseudo-binocular model; i.e. the leftmost image in the sequence is treated as the left

image of the binocular model, and the rightmost image in the sequence is treated as the

right image of the binocular model. Thus, given a motion stereo sequence t0, t1, ..., tn of

n+l images, the projection plane in the multiple voting procedure must be an image at

tn/2.

The multiple voting scheme introduced in the previous section casts votes of

potential matches onto a point V(x, y, d) in the xyd voting space. Good matches would

appear as peaks in the voting space. The voting process would likely produce a voting

space with many false peaks which associate more than one disparity value with an edge

point in the reference frame. Each candidate disparity d (potential match) is assigned an

initial confidence measure S0 based on the number of vote counts received. At iteration

k+1, the new confidence measure Sk+1 of match candidate p is based on the amount of

support received from all its neighbors p' in the xyd space and the old confidence measure

Sk. The support is inversely proportional to the cyclopean distance rxy between p and

p', and proportional to the NSF f(dg).

Special handling is given to nodes with rxy = 0, where two points are aligned

along the line of sight. Since dg = |d' - d| / 0 = ∞ and f(∞) is usually equal to -1 [Li94], a

relatively large coefficient q, e.g. q = 8, is used to yield a significantly negative support.


The following relaxation procedure is outlined in [Li94].

PROCEDURE RELAXATION

  Initialization: Potential candidates in the xyd space are given a confidence measure S0

  Repeat until no change

    For each p at V(x, y, d)

      For each of its neighbors N at V(x', y', d')

        Determine the new confidence measure Sk+1(p) at iteration k+1 from the

        support contributed by the neighbors N in R(p),

        where rxy = sqrt((x' - x)^2 + (y' - y)^2) and dg = |d' - d| / rxy

      If Sk+1(p) > Positive Threshold

        then Sk+1(N) = 0 for all N in the line of sight of p

END RELAXATION

R(p) is a region of points that are within a spherical distance of radius rxy centered at

V(x, y, d) satisfying dg < 2. It turns out that the cylindrical radius rxy is also the

cyclopean distance from p to the neighboring point N, since the xyd space is

based on a cyclopean image. When the confidence measure of a point p is greater than a

threshold, we consider that the point is saturated. All other potential matches within the

line of sight of p are suppressed at this point, since neither any object in front of the point


nor any point behind the matched point p is allowed in the line of sight, i.e. the uniqueness constraint is enforced.
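The relaxation step above can be illustrated concretely in Python on a small xyd strength array. This is a minimal sketch, not the thesis implementation: the weight f(dg) = (2 - dg)/2 is a stand-in (any decreasing f with f(0) = 1 works similarly), and the neighborhood bounds, q, and saturation threshold are illustrative values.

```python
import numpy as np

def relaxation_step(S, a=2, b=2, c=1, q=8.0, sat=5.0):
    """One iteration of the disparity-gradient relaxation.  S is a 3-D
    confidence array indexed [x, y, d]; a, b, c bound the support
    neighborhood R1; q penalizes rivals on the same line of sight (R2)."""
    X, Y, D = S.shape
    S_new = S.copy()
    for x in range(X):
        for y in range(Y):
            for d in range(D):
                if S[x, y, d] == 0:
                    continue  # not a candidate
                support = 0.0
                for dx in range(-a, a + 1):
                    for dy in range(-b, b + 1):
                        if dx == 0 and dy == 0:
                            continue  # same line of sight: handled as R2
                        xp, yp = x + dx, y + dy
                        if not (0 <= xp < X and 0 <= yp < Y):
                            continue
                        r = np.hypot(dx, dy)  # cyclopean distance r_xy
                        for dd in range(-c, c + 1):
                            dp = d + dd
                            if not (0 <= dp < D):
                                continue
                            dg = abs(dd) / r          # disparity gradient
                            f = (2.0 - dg) / 2.0      # stand-in for f(dg)
                            support += f / r * S[xp, yp, dp]
                # R2: competing disparities at the same (x, y)
                rivals = S[x, y, :].sum() - S[x, y, d]
                S_new[x, y, d] = S[x, y, d] + support - q * rivals
    # uniqueness: a saturated winner suppresses its line-of-sight rivals
    for x in range(X):
        for y in range(Y):
            d_best = np.argmax(S_new[x, y, :])
            if S_new[x, y, d_best] > sat:
                keep = S_new[x, y, d_best]
                S_new[x, y, :] = 0.0
                S_new[x, y, d_best] = keep
    return S_new
```

A candidate whose neighbors lie at the same disparity (dg = 0) gains support, while an isolated rival at the same (x, y) is driven down by the q term, mirroring the cooperative behavior described above.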


Chapter 3 Occlusion Recovery In Motion Stereo

An object can easily be occluded by some other objects in a portion of the motion

stereo sequence. Hence, the vote count received by a match point that is occluded for

some time during the sequence can be significantly weakened. As a result, the vote count

of the true match may not be distinguishable from those of the false targets. The

algorithm introduced in the previous chapter may not be able to derive the correct

disparity of a match because of the close competition with neighboring false targets and

weak support from the neighborhood. In this chapter, a method of occlusion recovery in

motion stereo through vote-counts and occlusion pattern is described. A new cooperative

algorithm that incorporates occlusion recovery is presented.

As shown in the previous chapter, the hypothetical lines are projected onto the

projection plane which is in turn used as the reference frame in the xyd space. The

projection plane does not necessarily coincide with one of the actual images; for example, it could be at t = n/2 = 3.5. In this sense, it is virtual. The advantage of our multiple

voting scheme using the virtual reference frame is that it does not rely on the actual

existence of a feature point on the reference frame. Therefore, it is more tolerant to

missing feature points due to noise, and more robust than the chaining method [Ens93].

It also provides more opportunities for occlusion recovery, since it is not required that an

entire EPI path be intact. On the other hand, when a terrain map (or disparity map) is

generated, disparity values must be associated with individual locations. If a point of


Surface A on the projection plane is occluded by Surface B, then only the disparity of

Surface B at this point will be shown.

3.1 The Study of Two Simple Occlusion Cases

In this section, we will study two cases where a surface occludes an edge point at

the beginning of a motion stereo sequence or at the end of the sequence. An algorithm is

proposed to deal with these two cases of occlusion.

3.1.1 Analysis of Occlusion on EPI Plane

Figure 3.1(a) shows a motion stereo sequence t0, t1, ..., tn where an occluding surface (the darker left side surface of the image) moves on top of another surface (the lighter right side surface of the image). Figure 3.1(b) depicts the EPI path of an edge point p1 which is occluded by the occluding surface in a motion stereo sequence. An edge point p2 is on the leading edge of the occluding surface. The EPI path of p1 can be traced until it meets the EPI path of edge point p2. The edge point p1 then disappears in the rest of the sequence. D1 is the distance between the occluded edge point and the leading edge point of the occluding surface at the projection frame at t = 0. Let θ be the angle between the EPI path of p2 and the t-axis, with tn = n, and δ be the angle for p1. Then

    tan(δ) = d1/n  and  tan(θ) = d2/n        (3.1)


where d1 and d2 are the disparities of points p1 and p2 respectively, 0 ≤ δ < π/2, 0 ≤ θ < π/2, and θ > δ. Let α be the length of time before the EPI path of p1 is occluded. The two EPI paths intersect when α tan(θ) = D1 + α tan(δ), and by substitution

    α = D1 / (tan(θ) - tan(δ))        (3.4)

In order to include the case in which p2 does not occlude p1 during the motion sequence, i.e., α > n, the equation is slightly modified as

    α = min { D1 / (tan(θ) - tan(δ)), n }        (3.5)

This modification means that if the length of time needed for p2 to start occluding p1 is longer than the motion sequence, then α returns n, which means that the path of p1 is not occluded from time t = 0 to t = n.
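In code, Equation 3.5 is a clamped ratio. A minimal sketch (the argument names are ours; disparities are given as totals over the n-frame sequence, so tan(δ) = d1/n and tan(θ) = d2/n):

```python
def occlusion_onset(D1, d1, d2, n):
    """Equation 3.5: the time alpha at which the EPI path of p1 meets the
    leading-edge path of p2, clamped to n when p1 is never occluded.
    D1 is the distance between p1 and p2 in the frame at t = 0; d2 > d1."""
    h = (d2 - d1) / n        # tan(theta) - tan(delta)
    return min(D1 / h, n)    # alpha = n means p1 stays visible throughout
```

For example, with n = 8 frames, total disparities d1 = 8 and d2 = 16, and D1 = 6 pixels, the occluding edge gains one pixel per frame on p1 and overtakes it at α = 6.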


. - - - - Occluded EPI path

Non-occluded EPI path

Figure 3.1: Case 1 of surface occlusion with a relatively wide occluding surface. (a) the image sequence, (b) the corresponding EPI path.

A second case of surface occlusion is shown in Figure 3.2(a). In this case, point p1 is occluded at the beginning of the motion sequence and becomes visible when the trailing edge of the occluding surface leaves p1. This occlusion is exactly the opposite of the first


case, where the edge point p1 is occluded later in the sequence. These two cases are similar to the examples of left and right occlusion in the binocular stereo model [Little90], where point p1 is occluded in the leftmost image (at time tn) in the first case, and in the rightmost image (at time t0) in the second case. The length of time β during which p1 is visible in Case 2 is equal to the total time n minus the time α when p1 is invisible, and α can be derived from Equation 3.5 for the first case. Thus,

    β = n - min { D1 / (tan(θ) - tan(δ)), n }        (3.6)

It may have been noticed that both of the above analyses were based on the leftmost image of the motion sequence: the distance D1 between points p1 and p2 was measured in the left image. However, the projection plane as discussed in the last chapter is at the cyclopean image, i.e. the image at t = n/2. Given the distance D'1 in the cyclopean projection plane, Equation 3.5 becomes

    α = min { D'1 / (tan(θ) - tan(δ)) + n/2, n }        (3.7)

Let h = tan(θ) - tan(δ). Equation 3.6 can then be rewritten as

    β = n - min { D'1 / h + n/2, n }        (3.8)


- - - - - Occluded EPI path

Non-occluded EPI path

Figure 3.2: Case 2 of surface occlusion with a relatively wide occluding surface. (a) the image sequence, (b) the corresponding EPI path.


3.1.2 Vote Estimation

In this section, given the disparities of the occluded and occluding edge points, we will derive a formula for estimating the vote count. We begin with a case where there is no occlusion.

Let edge points p0, p1, p2, ..., pn be corresponding points in frames t0, t1, t2, ..., tn, respectively. The hypothetical lines formed by (p0, p1), (p1, p2), ..., (p_{n-1}, p_n), (p0, p2), (p1, p3), (p2, p4), ..., (p0, pn) will each contribute a vote. Let l be the length of a hypothetical line. For example, (p0, p1) and (p_{n-1}, p_n) both have l = 1. The following are the votes, grouped by length l, over all possible hypothetical lines.

l = 1    votes = n

l = 2    votes = n - 1

l = 3    votes = n - 2

...

l = n    votes = 1

Therefore, we can get the total votes (v) by

    v = Σ_{l=1}^{n} (n - l + 1) = n(n + 1) / 2

In order to reduce the number of false targets, a minimum length l > lmin is usually required in voting. It follows that

    v = Σ_{i=1}^{n - lmin} i = (n - lmin)(n - lmin + 1) / 2


Now, consider occlusion Case 1, where an edge is not occluded initially and is then occluded for the rest of the motion sequence. Let m be the number of frames in which the occluded edge point is visible; votes can only be contributed by these m corresponding points. Thus, the total votes in this case is

    v = Σ_{i=1}^{(m-1) - lmin} i        (3.9)

The value of m can be derived from Equation 3.7 for occlusion Case 1, or Equation 3.8 for occlusion Case 2.
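The two vote-count formulas can be sketched directly (a small Python illustration; `lmin` is the minimum hypothetical-line length used in voting):

```python
def total_votes(n, lmin=0):
    """Votes for an unoccluded point seen in all frames t0..tn: a line of
    length l contributes one vote and there are n - l + 1 of them,
    counting only lengths l > lmin."""
    return sum(n - l + 1 for l in range(lmin + 1, n + 1))

def occluded_votes(m, lmin=0):
    """Equation 3.9: expected votes when only m corresponding points are
    visible, so path lengths run up to m - 1."""
    return sum(range(1, (m - 1) - lmin + 1))
```

For n = 8 and lmin = 0 this gives n(n + 1)/2 = 36 votes; a point visible in only m frames accumulates correspondingly fewer, which is what lets the expected vote count of an occluded candidate be predicted.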

3.1.3 Disparity Gradient Based Algorithm with Occlusion Recovery

The cooperative algorithm presented in Chapter 2 does not consider the possible impact of occlusions. Thus, when a false target accumulates as many votes as a partially occluded matching candidate does, the result may be a false match, because the false target may get more support from its neighbors.

An example of a case where a false match receives more support from its neighbors than a neighboring true match is shown in Figure 3.3. Assume that some false votes happen to project onto a point p(x, y) in the projection plane. Then, along with the true votes, there exist two potential matching candidates at V(x, y, d1) and V(x, y, d2) in the xyd voting space, where d1 is the true disparity and d2 is the false disparity. Suppose that the vote count at V(x, y, d1) is low due to occlusion by another object surface, and the false target also has a low vote count (this is true since false targets are mainly made up


of random votes from various false EPI paths). Suppose that V(x, y, d1) and V(x, y, d2) both receive approximately the same number of votes during the initial voting stage. Let p'(x', y') be a neighboring edge point between V(x, y, d1) and V(x, y, d2) with a disparity of approximately d2. According to the disparity gradient function f(dg), V(x', y', d2) tends to give more support to V(x, y, d2), because of the smaller disparity gradient. Therefore, the false target will emerge faster than the true matching candidate, and end up as the winner.

- - - - . Occluded EPI path

Non-occluded EPI path - - - False target path

Figure 3.3: EPI plane view of false target and true match.

An occlusion support function, OCCL(), is added to the cooperative algorithm to compensate for the vote loss caused by occlusions. This support function determines whether a weak vote value at V(x, y, d) is a result of occlusion, and increases the support accordingly. Such a weak vote peak is called an occluded candidate. The purpose of the OCCL() function is to provide the occluded candidates extra support as compared to the


support given to false targets. Given a vote peak at V(x, y, d) in the disparity space, the OCCL() function searches for a neighborhood point V(x', y, d') which occludes V(x, y, d) for some time during the motion sequence. The question of whether V(x, y, d) is an occluded candidate can be determined by Equations 3.7 and 3.8: if d' > d, 0 < α < n, and β < n, then V(x', y, d') has occluded V(x, y, d). Once α or β is obtained, an expected vote count at V(x, y, d) can be calculated from Equation 3.9. If the initial vote count from the voting process at V(x, y, d) is close to the expected vote, then it is confirmed that V(x', y, d') has occluded V(x, y, d) for a certain period of time in the motion sequence. Thereafter, some additional support will be given to V(x, y, d).

Some inter-scanline support can also be incorporated into OCCL(). Up to this point, the intra-scanline discussion has been based on the support within an epipolar plane, i.e. y = y'. If the intra-scanline neighbor V(x', y, d') occludes V(x, y, d), then based on the figural continuity property, it is very likely that the figural neighbors of V(x', y, d') at scanlines y ± ε would occlude the figural neighbors of V(x, y, d). The addition of inter-scanline support also ensures that a significant amount of compensation will be provided to V(x, y, d) only when V(x', y, d') is not a false target.

The OCCL() function presented below is based on the two occlusion cases presented in Section 3.1.1. We only attempt to recover occluded candidates which are not occluded at the projection plane, i.e. at time t = n/2. Thus, given an occluded candidate p(x, y) on the projection plane, an edge point p1(x', y), where x' > x, must be a trailing edge of a surface (Case 2) if p1 occludes p (Figure 3.4). Similarly, an edge point p2(x'', y), where x'' < x, must be a leading edge of a surface (Case 1) if p2 occludes p.


The following is the modified algorithm with occlusion compensation:

PROCEDURE OCCL-RELAXATION

Initialization: Potential candidates in xyd space are given a confidence measure S^0

Repeat until no change

    For each p at V(x, y, d)

        For each of its neighbors N at V(x', y', d')

            Determine the new confidence measure, such that at the k+1 iteration

            S^(k+1)(p) = S^k(p) + Σ_{N ∈ R1(p)} [f(dg) / r_xy] S^k(N)
                         - q Σ_{N ∈ R2(p)} S^k(N) + Σ_{N ∈ R3(p)} OCCL(N)

            where r_xy = sqrt((x' - x)^2 + (y' - y)^2),  dg = |d' - d| / r_xy,

            R1(p) = {(x', y', d') | |x' - x| ≤ a, |y' - y| ≤ b, |d' - d| ≤ c, ¬(x' = x, y' = y)}

            R2(p) = {(x', y', d') | x' = x, y' = y, d' ≠ d}

            R3(p) = {(x', y', d') | |x' - x| ≤ a', |y' - y| ≤ b', d' > d, ¬(x' = x, y' = y)}

    If S^(k+1)(p) > Positive Threshold

        then S^(k+1)(N) = 0  ∀ N ∈ R2(p)

END OCCL-RELAXATION


and the occlusion support function is:

FUNCTION OCCL(N)

/* V(x, y, d) may be occluded sometime after time n/2 (Case 1) */

    m = min { (x - x') / h + n/2, n }

/* V(x, y, d) may be occluded sometime before time n/2 (Case 2) */

    m = n - min { (x' - x) / h + n/2, n }

If m = n  /* no occlusion */

    Then return 0;

Else  /* V(x', y', d') occluded V(x, y, d) sometime in the motion sequence */

    V_est = Σ_{i=1}^{(m-1) - lmin} i   /* lmin is the minimum projection path length */

    Return h1(|y' - y|) · h2(|V0 - V_est|) · (V - V_est) · S^k(N)

h1, h2 are exponential functions with decaying factors φ1, φ2;

V0 is the initial vote count at V(x, y, d), and V is the vote count expected without occlusion.

END OCCL()


Match candidate
- - - Trailing edge of a surface

Leading edge of a surface

Figure 3.4: EPI path of the leading edge and trailing edge of occluding surface

The exponential function h1(|y' - y|) determines the inter-scanline support weight, and h2(|V0 - V_est|) determines the intra-scanline support weight. If V(x', y', d') is the edge of the occluding surface, then the V_est value should be approximately equal to the initial vote count received at V(x, y, d). The third term in the OCCL() return expression is the estimated vote count which is lost due to the occlusion. The last term S^k(N) is a weight factor indicating whether the support from V(x', y', d') is coming from a correct match; if S^k(N) is high, then V(x', y', d') is less likely to be a false target.
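Putting Equations 3.7-3.9 together, the occlusion support can be sketched as a stand-alone function. This is a simplified rendering under our own argument names, not the thesis code: h stands for tan(θ) - tan(δ), v0 for the candidate's initial vote count, and the exponential decay factors are illustrative.

```python
import math

def occl_support(x, y, v0, xp, yp, Sk_N, h, n,
                 lmin=1, phi1=1.0, phi2=1.0, leading=True):
    """Extra support for an occluded candidate at cyclopean (x, y) from an
    occluding edge N at (xp, yp) with confidence Sk_N.  leading=True is
    Case 1 (occluded after t = n/2), leading=False is Case 2 (before)."""
    if leading:
        m = min((x - xp) / h + n / 2.0, n)        # visible frames, Eq. 3.7
    else:
        m = n - min((xp - x) / h + n / 2.0, n)    # visible frames, Eq. 3.8
    if m >= n:
        return 0.0                                 # never occluded
    m = int(m)
    v_est = sum(range(1, (m - 1) - lmin + 1))      # expected votes, Eq. 3.9
    v_full = sum(range(1, n - lmin + 1))           # votes without occlusion
    h1 = math.exp(-phi1 * abs(yp - y))             # inter-scanline weight
    h2 = math.exp(-phi2 * abs(v0 - v_est))         # vote-agreement weight
    return h1 * h2 * (v_full - v_est) * Sk_N       # compensates lost votes
```

The compensation is largest when the candidate's observed vote count agrees with the occlusion prediction (h2 near 1), the occluding edge lies on the same scanline (h1 near 1), and the occluder itself is a confident match (Sk_N high).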


3.2 The Study of Other Occlusion Cases

The two occlusion cases presented in the previous section assume that an

occluded edge is either gradually being occluded or disoccluded by a surface. However,

there are several other cases of occlusion. For example, an edge point may be temporarily occluded by a narrow surface. The term "narrow" is relative to the baseline length

between consecutive images.

Figure 3.5 shows an example of this third occlusion case. In this case, an occluding surface bounded by edge points P2 and P3 occludes the edge point P1 during a portion of the motion sequence and then moves away from the edge point. At the beginning of the motion sequence, we observe that the EPI path of point P1 continues along the motion sequence and then disappears. Later, it reappears after the trailing edge point P3 of the occluding surface leaves. It can be shown that the length of time when P1 is not occluded by the occluding surface bounded by edge points P2 and P3 is

    m = α + β        (3.12)

where α is the length of time P1 is visible before it is occluded by the surface's leading edge point P2, and β is the length of time P1 is visible after the surface's trailing edge point P3 leaves P1. Given the disparity of P1 = d1, the disparity of P2 = d2, and the disparity of P3 = d3, we know that

    tan(δ) = d1/n,  tan(θ) = d2/n,  and  tan(φ) = d3/n


The length of time α required for the EPI path of P2 to occlude P1 can be obtained from Equation 3.5:

    α = min { D1 / (tan(θ) - tan(δ)), n }        (3.13)

Based on Equation 3.6, we obtain the length of time β during which P1 is visible after P3 passes, as

    β = n - min { D2 / (tan(φ) - tan(δ)), n }        (3.15)

where D2 is the distance between P1 and the trailing edge P3 at t = 0. Equation 3.15 will hold even when edge P1 never leaves the occluding surface at the end of the motion sequence.

Let h = tan(θ) - tan(δ) and h' = tan(φ) - tan(δ). Equation 3.12 can then be rewritten as

    m = min { D1/h, n } + n - min { D2/h', n }


Figure 3.5: Case 3 of surface occlusion where the occluding surface occludes and then disoccludes an edge point during a motion stereo sequence. (a) the image sequence, (b) the corresponding EPI path.

From the previous three cases, we can establish the following table. Table 3.1 summarizes all possible situations which may occur in a motion sequence, under the assumption that an edge point can only be occluded by one surface and the occluded point is visible at time t = n/2. It shows the state transitions of an edge point over the durations t0 to ti, ti to tn/2, tn/2 to tj, and tj to tn.


Table 3.1: Possible Occlusion Cases

    Case | t = 0 | t = i | t = n/2 | t = j | t = n
     0*  |   N   |   N   |    N    |   N   |   N
     1   |   N   |   N   |    N    |   N   |   O
     1   |   N   |   N   |    N    |   O   |   O
     2   |   O   |   N   |    N    |   N   |   N
     2   |   O   |   O   |    N    |   N   |   N
     3   |   N   |   O   |    N    |   N   |   N
     3   |   N   |   N   |    N    |   O   |   N
     3'  |   O   |   -   |    N    |   -   |   O

N denotes "not occluded", O denotes "occluded", - denotes a "don't care" situation; * no occlusion, a special case of Case 1 or Case 2.

It may seem that all possible situations have been handled. However, the difficulty in Case 3 is far more complex than in the first two cases. The introduction of a trailing edge to an occluding surface increases the search complexity dramatically. Case 3' is a case in which the leading and trailing edges belong to two different surfaces or to a concave surface. The OCCL() support function would have to consider the possibility that the surface bounded by a leading and a trailing edge actually corresponds to a real surface. Thus, Cases 3 and 3' are left for future research.


Chapter 4 Experimental Results

This chapter presents some experimental results of the cooperative motion stereo

matching algorithm. It starts with results from a random-dot stereogram (RDS) and a

motion stereo sequence of real images. These results will be followed by comparative

results from a synthetic motion stereo sequence which consists of RDS images with

surface occlusions. The effect of occlusion recovery will be demonstrated.

4.1 Experimental Results without Occlusion Recovery

In this section, experimental results of the disparity gradient based cooperative algorithm are shown. The occlusion recovery function is not involved.

4.1.1 Random-dot Stereogram

A random-dot stereogram (RDS) is used to demonstrate the cooperative nature of

the algorithm and its effectiveness.

Figure 4.1 shows the RDS of a hemisphere. The resolution of the image is 128 x

128. The hemisphere sits at the center of the images on a flat background; it has a radius

of 54 pixels. The disparity at the top of the hemisphere is 13, and the disparity of the

background is 1. The cooperative algorithm is run for 45 iterations. A disparity map is

generated at every third iteration, where the gray level is used to depict the disparity (the

brighter the pixel, the larger the disparity). Since there are usually multiple disparity


values associated with each (x, y) position before the final convergence, the displayed disparity is actually a weighted average which takes into account the strength S of each contending disparity. Figure 4.2 depicts the iterative results in a row-major arrangement for iterations 0, 3, 6, ..., 45. Initially, since there are many unsettled peaks at each point (x, y) in the xyd space, the weighted average value tends to be gray. As weak points are eliminated or weakened and correct matching points gain more support, the disparity values of the correct matches slowly emerge as winners.
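The weighted-average display can be sketched in a few lines (a minimal sketch, assuming S holds the per-disparity strengths with suppressed candidates at or below zero):

```python
import numpy as np

def display_disparity(S):
    """Strength-weighted mean disparity per (x, y) pixel.  With several
    contending peaks the average sits mid-range (gray); as relaxation
    saturates one peak, the pixel converges to that disparity."""
    w = np.clip(S, 0.0, None)               # ignore suppressed candidates
    d = np.arange(S.shape[2], dtype=float)  # disparity value of each layer
    total = w.sum(axis=2)
    return np.where(total > 0,
                    (w * d).sum(axis=2) / np.maximum(total, 1e-12),
                    0.0)
```

A pixel with equal-strength peaks at d = 1 and d = 3 displays as 2, exactly the mid-gray effect described above; once one peak wins, the displayed value equals its disparity.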

Figure 4.3(a) shows the initial strength S^0 at each of the 15 disparity layers (d = 0..14) in the xyd space, again in a row-major arrangement. Gray value here is used to show the strength S at each (x, y) pixel. The initial strength S^0 is set at an intermediate (non-saturated) level. As expected, the votes are randomly distributed over all layers. However, the initial matching procedure is able to cast a vote consistently for every

correct match disparity, because there is always a match at the correct disparity in the

RDS. Thus, gray regions (most of them rings) of pixels with correct disparity can be seen

on each layer of disparity. The task of the relaxation algorithm is to enhance the strength

of the correct matches and to suppress the numerous random matches. Figure 4.3(b), (c)

and (d) show the changes on each disparity layer in the xyd voting space after 12, 24, and

45 iterations. Pixels in the correct gray regions continue to receive steady support from

their neighborhoods, as more false targets are eliminated due to reduced support from their own xyd neighborhoods. After iteration 45, most pixels are saturated at their correct disparity

values, except the pixels of the thin rings at disparity 2, 3, and 4. The errors at these thin

rings can be attributed to the limitation of the relaxation algorithm. A calculation based

on the geometry of the sphere also shows that the first two rings are less than one pixel

wide and the third ring is only one pixel wide, thus they present a very difficult challenge.


Figure 4.1: Random Dot Stereogram of a Hemisphere (resolution 128 x 128).

Figure 4.2: Gray level coded disparity map of the hemisphere at different iterations.


(a) Iteration = 0

(b) Iteration = 12

Figure 4.3: Layers of the xyd voting space for hemisphere at iterations 0, 12, 24,45.


(c) Iteration = 24

(d) Iteration = 45

Figure 4.3: (con't)



4.1.2 Rubik Cube on a Moving Belt

The second experiment is a motion stereo sequence of a Rubik cube (Figure 4.4).

This is to show that the cooperative algorithm not only works on images with dense

features (e.g., the RDS), but also works on more common images in which features are

relatively sparse. The images are taken by a stationary camera, which takes nine snapshots of

the Rubik cube on a moving belt. The cube is propped up at one corner. The resolution

of the image is 256 x 256. The disparity range for the cube pixels is between 42 and 56

pixels (i.e. the disparity between consecutive frames is approximately between 5.3 and 7).

The sensors and wires have disparity d = 0 since they are not moving. A Sobel operator is used to obtain the edge maps (Figure 4.4). Initially, edge features of similar angles are matched. A relatively large disparity range, from 0 to 13, is allowed for matching edge points to test the robustness of the algorithm¹.

Figure 4.5 shows the gray level coded disparity values after every third iteration

starting from iteration 0. The false matches inside the checker board surfaces of the

Rubik cube are cleanly eliminated after several iterations.

Figure 4.6 (a) shows the result of the initial voting process in 15 disparity planes

in the xyd space. Figure 4.6(b), (c) and (d) show the results after 12, 24, and 90 iterations

respectively. Clearly, most cube edge points have gradually converged to their correct

disparity values 5, 6, and 7.

¹In this subsection and the next section, the disparity d of the xyd voting space is the disparity between consecutive images, not the total disparity as previously used.


Notice that in Figure 4.6 (a), the initial votes on the horizontal edges of the

moving belt span across multiple disparities. That is due to the ambiguity of edges

aligning along the epipolar line. Most of these edges of the belt later disappear because

the contending peaks have the same initial strength S^0 and equal support from their

neighborhood. Since these peaks also suppress each other during the relaxation, their

vote strength tends to drop quickly. As a result, most of them are never able to emerge

as winners.


Figure 4.4: Motion stereo intensity images and edge maps of Rubik cube


Figure 4.5: Gray level coded disparity map of Rubik cube.


(a) Iteration = 0

(b) Iteration = 12

Figure 4.6: Layers of xyd voting space for the Rubik cube at iterations 0, 12, 24,90.


(c) Iteration = 24

(d) Iteration = 90

Figure 4.6: (Con't)



4.2 Comparative Results for Occlusion Recovery

This section presents some results of the experiments on recovering occlusions in

motion stereo. A sequence of synthetic images is generated to simulate an occlusion

scenario. A total of 9 frames are used in the experiment, and Figure 4.7 shows the first,

the middle and the last frame of the sequence. The images are of size 64 x 64.

Figure 4.7: The random dot stereograms of a synthetic motion stereo sequence. (a) The first frame, (b) The 5th frame, (c) The 9th frame.

There are three object surfaces at different distances in the scene. The near surface has a disparity of 24 pixels (3 pixels per frame) and is located at the left side of the scene. The middle surface has a disparity of 16 pixels (2 pixels per frame) and is located at the right side of the scene. The farthest surface, which constitutes the background, has a disparity of 8 pixels (1 pixel per frame) and is located in between the other two surfaces. Figure 4.8 shows the gray level coded disparity map of the surfaces and their positions as seen in the cyclopean reference frame.


Figure 4.8: The original synthetic terrain data in the reference frame. The near plane is shown in the light shade and the farther planes are shown in progressively darker shades.

This implementation is based on the occlusion model for the two cases, Case 1 and Case 2, as studied in the previous chapter. Case 3 has not been implemented in this thesis. As discussed before, the neighborhood support of this relaxation scheme is based on considerations of the disparity gradient constraint and surface occlusion. For comparison purposes, two runs have been performed: one with the relaxation algorithm from Chapter 2 (without occlusion handling) and one with the occlusion model of Chapter 3 implemented.

Figure 4.9(a) shows the disparities and 4.9(b) shows the error map. The errors are disseminated to 8 different values (0.5, 1, 1.5, ..., 4) and are obtained by taking the difference between the correct disparity values and the matching results. It can be seen that all the errors are committed at the occlusion boundaries. Figure 4.9(c) disseminates the errors into the different error values for examination. It is found that pixels at the boundary of the occluded surface are always drawn to the wrong disparity values imposed by the strong vote peaks from the occluding surface. The errors of values 1 and 2 in Figure 4.9(c) occur because of this.


Figure 4.9: Results without occlusion compensation. (a) The recovered disparities shown in the gray-level coded display, (b) The composite error map, (c) The errors are disseminated to 8 different values (0.5, 1, 1.5, ..., 4).


Figure 4.10 displays the results from the occlusion algorithm. The display organization is the same as in Figure 4.9. With the occlusion compensation incorporated into the neighborhood support function of the relaxation algorithm, the occluding surface effectively gives additional support to the vote peaks of the occluded surface, which were weakened by the occlusion. The pixels at the occlusion boundary are no longer overwhelmingly drawn to the occluding surface; instead, they settle on the correct disparity values. The effectiveness of the occlusion algorithm is readily appreciated when one compares the errors of values 1 and 2 in Figures 4.9(c) and 4.10(c).

It is quite obvious from Figure 4.10(b) that the algorithm fails to rectify the errors at the Y-junction where all three object surfaces meet. This is because the occlusion there is of the Case 3' type, which is not implemented in this thesis.


Figure 4.10: Results with occlusion compensation. (a) The recovered disparities shown in a gray-level coded display. (b) The composite error map. (c) The errors separated into 8 different error values (0.5, 1, 1.5, ..., 4).


Chapter 5 Conclusion

5.1 Summary

This thesis describes work on cooperative motion stereo. It deals with the issue of stereo correspondence. A disparity gradient based relaxation algorithm is employed to eliminate the ambiguity in stereo matching. The exploitation of multiple images in the motion stereo sequence reduces the rate of erroneous matching while increasing the precision of the disparity. The inclusion of the whole sequence of images, however, also creates a more complicated situation for occlusion analysis. Several models of surface occlusion are therefore studied, and a modified cooperative algorithm is presented to handle the exacerbated occlusion problem.

The main contributions of this thesis work are:

(a) Through the analysis of the EPI path in the spatial-temporal space, we derive a multiple voting method to collect matching evidence from the motion sequence. Using linear interpolation, the votes are collected into the voting space based on a projection plane located at the cyclopean image.

(b) Exploiting the properties of the disparity gradient, and the relationship between the XYZ physical space and the xyd voting space, we are able to use the disparity gradient as the basis of our relaxation algorithm to eliminate false targets. The cooperative algorithm is shown to work well on both random dot stereograms and normal motion stereo images where edge features are relatively sparse.

(c) We explore some simple occlusion cases in our study of occlusion recovery. It is discovered that the support based on the disparity gradient cannot distinguish false targets from occluded edges in a motion stereo sequence, because their vote strengths may be very similar. By assuming that an edge can be occluded by only one surface in a sequence and that the occluding surface is not narrow, we derive an occlusion recovery function. Extra support is given to an occluded point based on the estimated number of votes lost due to occlusion. Evidence of improvement in the cooperative algorithm is observed in the experiment for recovering occlusion boundaries.
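To make the disparity-gradient support of contribution (b) concrete, the gradient between two candidate matches in the xyd voting space can be computed as below. This is a generic sketch rather than the thesis's exact support weighting; the names and the limit value of 1.0 are illustrative assumptions.

```python
import numpy as np

def disparity_gradient(m1, m2):
    """Disparity gradient between two candidate matches, each given as
    (x, y, d) in the cyclopean xyd voting space: the disparity
    difference divided by the cyclopean separation of the matches."""
    (x1, y1, d1), (x2, y2, d2) = m1, m2
    return abs(d2 - d1) / np.hypot(x2 - x1, y2 - y1)

def lends_support(m1, m2, limit=1.0):
    """A neighbouring match supports a candidate only when the
    disparity gradient between them stays within the limit; false
    targets tend to produce gradients that violate it."""
    return disparity_gradient(m1, m2) <= limit
```

Summing such binary (or weighted) support over a neighborhood gives the kind of cooperative vote reinforcement the relaxation algorithm relies on.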

The cooperative algorithm in this thesis can be viewed as an example of the

connectionist approach. It relies only on local neighborhood support and is hence

inherently suitable for massive parallelism.

5.2 Discussion and Future Work

Feature-based stereo differs from area-based stereo in that the former usually does

not employ correlation methods. Nevertheless, there are various levels of abstraction in

the feature-based approach, ranging from pixels (as in the RDS) to very high level

geometric structures. The cooperative approach adopted in this thesis is shown to be


applicable to feature-based stereo matching at both pixel and contour levels. We choose not to rely on higher-level features; as a result, the algorithm can be applied to a wider range of real-world images. In this way, it is an advance over the work on line-based motion stereo. In our experiments, relatively large search ranges of disparity values have been tested. Despite a slight increase of matching errors for large search ranges, it is encouraging to see that the algorithm has consistently yielded satisfactory results, which indicates that only limited prior knowledge is required in estimating the depth range of the objects in the scene.

The motion stereo approach to stereo matching provides higher precision and fewer errors than the binocular stereo approach. Our approach integrates these advantages through a multiple voting method. High-precision matching evidence from a long baseline and low-error matching evidence from a short baseline within motion stereo are combined to produce complete evidence for matching. Since we do not base our evidence on any particular image in the motion sequence, evidence from all pairs within a motion sequence is collected. The voting scheme is also more robust than the practice of chaining, because the reference frame is actually a virtual frame (the projection plane) located at the position of the cyclopean image. The generation of the hypothetical votes does not depend on the apparent existence of a feature in any particular reference frame. As a result, our algorithm is more tolerant of noise.
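The multiple voting scheme can be sketched as follows. This is a simplified one-dimensional illustration under assumed conventions (unit-spaced frames, disparity normalized per unit baseline, a unit-cell vote grid); the thesis's actual imaging geometry may differ.

```python
import numpy as np
from itertools import combinations

def cast_vote(votes, x_i, x_j, gap):
    """Cast one match hypothesis between two frames `gap` frames apart
    into the (x, d) voting space on the cyclopean projection plane.
    Fractional positions are spread over the four neighbouring cells by
    linear interpolation, since no real frame need exist at the
    cyclopean position."""
    x = 0.5 * (x_i + x_j)        # cyclopean x position
    d = (x_i - x_j) / gap        # disparity per unit baseline
    x0, d0 = int(np.floor(x)), int(np.floor(d))
    fx, fd = x - x0, d - d0
    for dx, wx in ((0, 1 - fx), (1, fx)):
        for dd, wd in ((0, 1 - fd), (1, fd)):
            if 0 <= x0 + dx < votes.shape[0] and 0 <= d0 + dd < votes.shape[1]:
                votes[x0 + dx, d0 + dd] += wx * wd

def collect(frames, shape):
    """Accumulate evidence from every frame pair in the sequence, so
    long- and short-baseline hypotheses land in one common space."""
    votes = np.zeros(shape)
    for i, j in combinations(range(len(frames)), 2):
        for x_i in frames[i]:
            for x_j in frames[j]:
                cast_vote(votes, x_i, x_j, j - i)
    return votes
```

Because every pair votes into the same normalized (x, d) space, a feature that drifts steadily across the sequence reinforces a single peak, which is the behavior the relaxation stage then exploits.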

From our study, we find that motion stereo exacerbates the occlusion problem. Instead of a simple left or right occlusion, as in binocular stereo, motion stereo exhibits a subtle combination of both due to its use of continuous evidence among images in the sequence. The analysis of the surface occlusion models and occlusion recovery functions in this thesis is quite preliminary. The exclusion of multiple and/or narrow occluding surfaces is apparently too restricted. A study of occlusion recovery algorithms and functions for solving Case 3 and Case 3' would be a natural next step.
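In the spirit of the recovery function discussed above, one minimal form, assuming a single occluding surface, scales the neighborhood support by the estimated fraction of votes lost. The names and the linear form here are illustrative assumptions, not the exact function derived in Chapter 3.

```python
def occlusion_compensated_support(raw_support, votes_expected, votes_seen):
    """Boost the support for a candidate that appears weak only because
    part of its evidence was hidden by an occluding surface.

    votes_expected: votes a fully visible edge would cast over the
                    whole motion sequence.
    votes_seen:     votes actually collected for this candidate.
    The boost is linear in the estimated fraction of votes lost; a
    fully visible candidate is left unchanged."""
    lost = max(votes_expected - votes_seen, 0.0)
    return raw_support * (1.0 + lost / votes_expected)
```

A more faithful version would estimate `votes_expected` from the occlusion geometry of Cases 1 and 2 rather than treating it as a given constant.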

Currently, the algorithm is implemented on Sun workstations. A complete run often takes more than an hour. Most of the computing time is spent in the relaxation process; there is no significant difference in running time between the algorithm with the occlusion support function and the algorithm without it. Many factors may have contributed to the slow relaxation process. In our experiments, all runs are required to reach over ninety percent saturation. To increase the processing speed, a threshold method may be applied after a certain number of iterations or after a certain percentage of saturation is reached. Another factor which may have slowed down the relaxation process is the convergence rate of the relaxation; a study could be done to evaluate the effect of different convergence rates on computing time and on the quality of the results. Since the algorithm can be readily parallelized, a natural extension of the current work would be the conversion of the program to a parallel version.
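The saturation-threshold speed-up suggested above could take the following shape. The update rule, gain, and saturation test are placeholders; only the early-exit logic is the point of the sketch.

```python
import numpy as np

def relax(votes, support, saturation_target=0.9, max_iters=100, gain=0.25):
    """Cooperative relaxation with an early exit: stop as soon as a
    given fraction of cells has saturated to 0 or 1, instead of always
    iterating to (near) full convergence.

    support: any callable mapping the current state to per-cell
             neighbourhood support in [0, 1] (in the thesis this is the
             disparity-gradient-based support)."""
    state = np.clip(np.asarray(votes, float).copy(), 0.0, 1.0)
    for it in range(1, max_iters + 1):
        # Push each cell toward 0 or 1 according to its support.
        state = np.clip(state + gain * (support(state) - 0.5), 0.0, 1.0)
        saturated = np.mean((state == 0.0) | (state == 1.0))
        if saturated >= saturation_target:
            break
    return state, it
```

Lowering `saturation_target` or raising `gain` trades iterations for quality, which is exactly the convergence-rate study proposed above.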
