
On the Impact of the Error Measure Selection in Evaluating Disparity Maps
Ivan Cabezas, Victor Padilla, Maria Trujillo and Margaret Florian
[email protected]
June 27th 2012, World Automation Congress, ISIAC, Puerto Vallarta, Mexico


DESCRIPTION

A quantitative evaluation methodology for disparity maps includes the selection of an error measure. Among existing measures, the percentage of bad matched pixels is commonly used. Nevertheless, it requires an error threshold, so a score of zero bad matched pixels does not necessarily imply that a disparity map is free of errors. Moreover, we have not found publications in which the evaluation process is conducted using different error measures. In this paper, error measures are characterised in order to provide a basis for selecting a measure during the evaluation process. Based on this characterisation, an analysis of how the choice of error measure affects the evaluation of disparity maps is conducted. The evaluation results showed a lack of consistency among the results achieved with different error measures, which affects the interpretation of the accuracy of stereo correspondence algorithms.

TRANSCRIPT

Page 1: On the Impact of the Error Measure Selection in Evaluating Disparity Maps

On the Impact of the Error Measure Selection in Evaluating Disparity Maps
Ivan Cabezas, Victor Padilla, Maria Trujillo and Margaret Florian
[email protected]
June 27th 2012, World Automation Congress, ISIAC, Puerto Vallarta, Mexico


Slide 2

Multimedia and Vision Laboratory

MMV is a research group of the Universidad del Valle in Cali, Colombia

Ivan, Maria, Victor, Margaret

On the Impact of the Error Measure Selection in Evaluating Disparity maps, WAC – ISIAC, 2012, Puerto Vallarta, Mexico

Multimedia and Vision Laboratory Research: http://mmv-lab.univalle.edu.co

[Diagram: a camera system projects the 3D world onto 2D images (the optics problem); recovering the 3D world from 2D images is the inverse problem]


Slide 3

Content

Stereo Vision
Application Domains
The Impact of Inaccurate Disparity Estimation
Quantitative Evaluation
Commonly Used Evaluation Measures
Error Measure Function
Error Measures Purpose and Meaning
Research Problem
Comparative Performance Scenario
Middlebury's Evaluation Model
A* Evaluation Model
Research Questions
Algorithm to Measure the Consistency
Consistency According to Evaluation Models
Conclusions


Slide 4

Stereo Vision

The stereo vision problem is to recover the 3D structure of a scene

[Diagram: left and right stereo images → correspondence algorithm → disparity map → reconstruction algorithm → 3D model]

Yang Q. et al., Stereo Matching with Colour-Weighted Correlation, Hierarchical Belief Propagation, and Occlusion Handling, IEEE PAMI 2009
Scharstein D. and Szeliski R., High-accuracy Stereo Depth Maps using Structured Light, CVPR 2003

[Diagrams: disparity as a mapping d: P_L → {0, 1, ..., d_max} assigning a disparity value to each point p_1, ..., p_n of the left image; epipolar geometry of a 3D point P projected to p_l and p_r on image planes π_l and π_r of cameras C_l and C_r, with baseline B, focal length f and depth Z]


Slide 5

Applications Domains

3D recovery has multiple application domains


Whitehorn M., Vincent T., Debrunner C. and Steele J., Stereo Vision in LHD Automation, IEEE Trans. on Industry Applications, 2003
Van der Mark W. and Gavrila D., Real-Time Dense Stereo for Intelligent Vehicles, IEEE Trans. on Intelligent Transportation Systems, 2006
Point Grey Research Inc., www.ptgrey.com


Slide 6

The Impact of Inaccurate Disparity Estimation

Disparity is the distance between corresponding points

Trucco, E. and Verri A., Introductory Techniques for 3D Computer Vision, Prentice Hall 1998

[Diagrams: with accurate disparity estimation, the match (p_l, p_r) reconstructs point P at its true depth Z; with inaccurate estimation, matching p_l to a wrong point p_r' reconstructs a point P' at an erroneous depth Z']
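The geometry above determines how a disparity error propagates to depth: for a rectified pair with baseline B and focal length f, depth is Z = fB/d (standard triangulation; the formula is implied by the figure but not written on the slide). A minimal sketch with illustrative camera parameters:

```python
def depth_from_disparity(d, focal_length, baseline):
    """Depth from disparity in a rectified stereo pair: Z = f * B / d."""
    if d <= 0:
        raise ValueError("disparity must be positive")
    return focal_length * baseline / d

# The same 1-pixel disparity error causes a larger depth error for far
# (small-disparity) points than for near ones:
z_true = depth_from_disparity(10, focal_length=500, baseline=0.1)  # 5.0
z_est  = depth_from_disparity(9,  focal_length=500, baseline=0.1)  # ~5.56
```

Because of this inverse relation, an error of fixed size in disparity space corresponds to very different errors in depth space.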


Slide 7

Quantitative Evaluation

Szeliski R., Prediction Error as a Quality Metric for Motion and Stereo, ICCV 2000
Kostliva J., Cech J. and Sara R., Feasibility Boundary in Dense and Semi-Dense Stereo Matching, CVPR 2007
Tombari F., Mattoccia S. and Di Stefano L., Stereo for Robots: Quantitative Evaluation of Efficient and Low-memory Dense Stereo Algorithms, ICARCV 2010
Cabezas I. and Trujillo M., A Non-Linear Quantitative Evaluation Approach for Disparity Estimation, VISAPP 2011
Cabezas I., Trujillo M. and Florian M., An Evaluation Methodology for Stereo Correspondence Algorithms, VISAPP 2012

The use of a methodology makes it possible to:

Assess specific components and procedures

Tune algorithm's parameters

Measure the progress in the field


Slide 8

Commonly Used Evaluation Measures

There are different evaluation measures


Sigma Z Error, SZE

Cabezas, I., Padilla, V., and Trujillo M., A Measure for Accuracy Disparity Maps Evaluation, CIARP 2011
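As a reference for the acronyms above, here is a sketch using the standard definitions of MAE, MSE, MRE and BMP; the SZE formula shown, accumulating depth discrepancies via Z = fB/d, is an assumption based on the measure's name and the CIARP 2011 reference, and f and B are illustrative values:

```python
import numpy as np

def error_measures(d_est, d_gt, delta=1.0, f=500.0, B=0.1):
    """Error measures between estimated and ground-truth disparity maps.

    delta: BMP error threshold; f, B: focal length and baseline (SZE only).
    """
    diff = np.abs(d_est - d_gt)
    return {
        "MAE": diff.mean(),                    # mean absolute error
        "MSE": (diff ** 2).mean(),             # mean squared error
        "MRE": (diff / d_gt).mean(),           # mean relative error (d_gt > 0)
        "BMP": 100.0 * (diff > delta).mean(),  # % of bad matched pixels
        # assumed SZE: accumulated absolute depth difference, Z = f * B / d
        "SZE": np.abs(f * B / d_est - f * B / d_gt).sum(),
    }

scores = error_measures(np.array([10.0, 9.0, 20.0]), np.array([10.0, 10.0, 18.0]))
# With delta=1.0, BMP counts only the third pixel (error 2.0) as bad; the
# second pixel's error of exactly 1.0 stays below the threshold -- the
# threshold effect noted in the description.
```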


Slide 9

Error Measure Function


Yang Q. et al., Stereo Matching with Colour-Weighted Correlation, Hierarchical Belief Propagation, and Occlusion Handling, IEEE PAMI 2009
Scharstein D. and Szeliski R., High-accuracy Stereo Depth Maps using Structured Light, CVPR 2003

Measure   nonocc     all    disc
MAE         0.41    1.48    0.70
MSE         1.48   33.97    4.25
MRE         0.01    0.03    0.02
BMP         2.90    8.78    7.79
SZE        71.39  341.55   37.86

[Diagram: an error measure function takes estimated and ground-truth disparities from a test bed, together with an error criterion, and produces the evaluation scores]
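The nonocc, all and disc columns restrict which pixels enter a measure: non-occluded pixels, all pixels with ground truth, and pixels near depth discontinuities, respectively. A sketch of criterion-masked evaluation (the masks here are illustrative inputs; in the Middlebury test bed they are supplied as images):

```python
import numpy as np

def bmp_over_region(d_est, d_gt, mask, delta=1.0):
    """Percentage of bad matched pixels restricted to an error-criterion mask."""
    diff = np.abs(d_est[mask] - d_gt[mask])
    return 100.0 * (diff > delta).mean()

d_est = np.array([[10.0, 9.0], [20.0, 5.0]])
d_gt  = np.array([[10.0, 12.0], [18.0, 5.0]])
nonocc = np.array([[True, False], [True, True]])  # one occluded pixel excluded
all_px = np.full_like(nonocc, True)

bmp_nonocc = bmp_over_region(d_est, d_gt, nonocc)  # 1 bad pixel out of 3
bmp_all    = bmp_over_region(d_est, d_gt, all_px)  # 2 bad pixels out of 4
```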


Slide 10

Error Measures Purpose and Meaning


In practice, different error measures are used for the same purpose: to find a distance between estimated and ground-truth disparity data

They have different meanings, as well as different properties


Slide 11

Research Problem


The use of different error measures may produce contradictory error scores

Scharstein D. and Szeliski R., A Taxonomy and Evaluation of Dense Two-Frame Stereo Correspondence Algorithms, IJCV 2002
Scharstein D. and Szeliski R., http://vision.middlebury.edu/stereo/eval/, 2012

[Figure: disparity maps produced by the RDP and ADCensus algorithms on the Teddy and Cones stereo pairs]


Slide 12

Comparative Performance Scenario


Four stereo image pairs: Tsukuba, Venus, Teddy, Cones

Three error criteria: nonocc, all, disc

112 Stereo Correspondence Algorithms

Two evaluation models: Middlebury and A*

k: a threshold for determining the top-performing algorithms in the Middlebury evaluation model


Slide 13

Middlebury's Evaluation Model

Compute Error Measures

Algorithm        nonocc    all     disc
ObjectStereo       2.20    6.99    6.36
GC+SegmBorder      4.99    5.78    8.66
PUTv3              2.40    9.11    6.56
PatchMatch         2.47    7.80    7.11
ImproveSubPix      2.96    8.22    8.55

Apply Evaluation Model

Per-criterion ranks (in parentheses):

Algorithm        nonocc      all         disc
ObjectStereo       2.20 (1)   6.99 (2)    6.36 (1)
GC+SegmBorder      4.99 (5)   5.78 (1)    8.66 (5)
PUTv3              2.40 (2)   9.11 (5)    6.56 (2)
PatchMatch         2.47 (3)   7.80 (3)    7.11 (3)
ImproveSubPix      2.96 (4)   8.22 (4)    8.55 (4)

Algorithm        Average Rank   Final Rank
ObjectStereo         1.33           1
PatchMatch           3.00           2
PUTv3                3.33           3
GC+SegmBorder        3.66           4
ImproveSubPix        4.00           5

Scharstein, D. and Szeliski, R., http://vision.middlebury.edu/stereo/eval/, 2012
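The two steps above can be sketched as follows, reading the table directly (tie handling on the full Middlebury table may differ, which can explain small differences in the averages):

```python
def middlebury_ranks(scores):
    """Rank algorithms per error criterion, then average the ranks.

    scores: {algorithm: (nonocc, all, disc)} error scores, lower is better.
    Returns (algorithms sorted by average rank, average ranks).
    """
    algos = list(scores)
    n_criteria = len(next(iter(scores.values())))
    avg = {a: 0.0 for a in algos}
    for c in range(n_criteria):
        ordered = sorted(algos, key=lambda a: scores[a][c])
        for rank, a in enumerate(ordered, start=1):
            avg[a] += rank / n_criteria
    return sorted(algos, key=lambda a: avg[a]), avg

scores = {
    "ObjectStereo":  (2.20, 6.99, 6.36),
    "GC+SegmBorder": (4.99, 5.78, 8.66),
    "PUTv3":         (2.40, 9.11, 6.56),
    "PatchMatch":    (2.47, 7.80, 7.11),
    "ImproveSubPix": (2.96, 8.22, 8.55),
}
order, avg = middlebury_ranks(scores)
# ObjectStereo comes first, with average rank (1 + 2 + 1) / 3 ≈ 1.33
```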


Slide 14

A* Evaluation Model

The A* evaluation model performs a partitioning of the stereo algorithms under evaluation, based on the Pareto Dominance relation

Compute Error Measures

Algorithm        nonocc    all     disc
ObjectStereo       2.20    6.99    6.36
GC+SegmBorder      4.99    5.78    8.66
PUTv3              2.40    9.11    6.56
PatchMatch         2.47    7.80    7.11
ImproveSubPix      2.96    8.22    8.55

Apply Evaluation Model

Algorithm        nonocc    all     disc    Set
ObjectStereo       2.20    6.99    6.36    A*
GC+SegmBorder      4.99    5.78    8.66    A*
PUTv3              2.40    9.11    6.56    A'
PatchMatch         2.47    7.80    7.11    A'
ImproveSubPix      2.96    8.22    8.55    A'

[Diagram: ObjectStereo and GC+SegmBorder are mutually non-dominated and form A*; PUTv3, PatchMatch and ImproveSubPix are dominated]
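The Pareto-dominance partition shown in the table can be sketched as:

```python
def dominates(u, v):
    """u Pareto-dominates v if u is no worse on every criterion and
    strictly better on at least one (lower error is better)."""
    return all(a <= b for a, b in zip(u, v)) and any(a < b for a, b in zip(u, v))

def a_star_partition(scores):
    """Split algorithms into the non-dominated set A* and the dominated set A'."""
    a_star, a_prime = [], []
    for name, s in scores.items():
        if any(dominates(t, s) for other, t in scores.items() if other != name):
            a_prime.append(name)
        else:
            a_star.append(name)
    return a_star, a_prime

scores = {
    "ObjectStereo":  (2.20, 6.99, 6.36),
    "GC+SegmBorder": (4.99, 5.78, 8.66),
    "PUTv3":         (2.40, 9.11, 6.56),
    "PatchMatch":    (2.47, 7.80, 7.11),
    "ImproveSubPix": (2.96, 8.22, 8.55),
}
a_star, a_prime = a_star_partition(scores)
# a_star == ["ObjectStereo", "GC+SegmBorder"], matching the table's Set column
```

GC+SegmBorder survives despite weak nonocc and disc scores because no other algorithm beats it on the all criterion, which is exactly the behaviour that distinguishes the A* model from rank averaging.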


Slide 15

Research Questions


What is the impact of using one error measure instead of another?

Different evaluation results are obtained using different evaluation measures

Scharstein, D. and Szeliski, R., A Taxonomy and Evaluation of Dense Two-Frame Stereo Correspondence Algorithms, IJCV 2002Scharstein, D. and Szeliski, R., http://vision.middlebury.edu/stereo/eval/, 2012

[Charts: evaluation results per error measure under the Middlebury model and under the A* model]


Slide 16

Research Questions (ii)

How should an error measure be chosen?

A characterisation of error measures may serve as selection criteria. An error measure should be:

Automatic: computed without human intervention

Reliable: operates deterministically, without being influenced by external factors

Meaningful: intended for a particular purpose, with a concise interpretation that does not lead to ambiguous results

Unbiased: capable of accomplishing the measurements for which it was conceived; its use allows impartial comparisons

Consistent: the scores it produces should be compatible with the scores produced by another error measure with the same purpose


Slide 17

Algorithm to Measure the Consistency


Consistency is measured by determining the percentage of agreement in the obtained results when the error measure used is varied
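The slide states the idea but not the paper's algorithm; as an illustration only (the helper and numbers below are hypothetical, not taken from the paper), pairwise consistency between two measures can be taken as the percentage of algorithm pairs ordered the same way by both:

```python
from itertools import combinations

def consistency(scores_a, scores_b):
    """Percentage of algorithm pairs ranked in the same relative order
    by two error measures (an illustration of rank agreement)."""
    algos = list(scores_a)
    agree = total = 0
    for x, y in combinations(algos, 2):
        total += 1
        # Same sign of the score difference means both measures agree
        # on which of the two algorithms is better.
        agree += (scores_a[x] - scores_a[y]) * (scores_b[x] - scores_b[y]) > 0
    return 100.0 * agree / total

bmp = {"ObjectStereo": 2.20, "GC+SegmBorder": 4.99, "PUTv3": 2.40}
mse = {"ObjectStereo": 1.48, "GC+SegmBorder": 33.97, "PUTv3": 2.10}
pct = consistency(bmp, mse)  # 100.0: all three pairs are ordered the same way
```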


Slide 18

Consistency According to Evaluation Models


The MRE, followed by the MSE, showed the highest percentages of consistency using the Middlebury model

The SZE, followed by the MRE, showed the highest percentages of consistency using the A* model

[Charts: consistency percentages per error measure under the Middlebury model and under the A* model]


Slide 19

Conclusions


Using the Middlebury evaluation model, the MRE and the MSE showed a high consistency

Using the A* evaluation model, the SZE and the MRE showed a high consistency

The BMP showed a low consistency under both evaluation models

A characterisation of error measures was presented in order to support the selection of an error measure

It includes the following attributes: automatic, reliable, meaningful, unbiased, and consistent

Experimental evaluation was focused on measuring consistency

The selection of an error measure is not a trivial issue, since it impacts the results obtained during a disparity map evaluation process

Page 20: On the Impact of the Error Measure Selection in Evaluating Disparity Maps

On the Impact of the Error Measure Selection in Evaluating Disparity Maps
Ivan Cabezas, Victor Padilla, Maria Trujillo and Margaret Florian
[email protected]
June 27th 2012, World Automation Congress, ISIAC, Puerto Vallarta, Mexico