
Supplementary Material:
Displets: Resolving Stereo Ambiguities using Object Knowledge

Fatma Güney
MPI Tübingen
[email protected]

Andreas Geiger
MPI Tübingen
[email protected]

Abstract

In this supplementary document, we first show how disparities relate to the 3D planes used in our representation. Next, we provide an illustration of the sampled displets for a single test image and visualize the influence of the displets with respect to the associated superpixels. For reproducibility, we also list the learned parameter settings which we used throughout all our experiments and show the change in performance with respect to these parameters. Finally, we show additional qualitative results on the KITTI validation set. On our project website¹ we also provide a supplementary video demonstrating the semi-convex hull optimization process which we use to simplify the CAD models.

1. Plane Representation

In our formulation, each superpixel is represented as a random variable $n_i \in \mathbb{R}^3$ describing a plane in 3D ($n_i^T x = 1$ for $x \in \mathbb{R}^3$ on the plane). We denote the disparity of plane $n_i$ at pixel $p = (u, v)^T$ by $\omega(n_i, p)$. In the following, we show how $\omega(n_i, p)$ can be derived for a (rectified) pinhole stereo camera with known intrinsics and extrinsics:

$$
\begin{aligned}
u &= \frac{f x}{z} + c_u, \qquad x = (u - c_u)\,\frac{z}{f} \\
v &= \frac{f y}{z} + c_v, \qquad y = (v - c_v)\,\frac{z}{f} \\
d &= \frac{f L}{z} \\
1 &= x n_x + y n_y + z n_z \\
1 &= (u - c_u)\,\frac{z}{f}\, n_x + (v - c_v)\,\frac{z}{f}\, n_y + z n_z \\
\frac{1}{z} &= (u - c_u)\,\frac{n_x}{f} + (v - c_v)\,\frac{n_y}{f} + n_z \\
d &= (u - c_u)\, L n_x + (v - c_v)\, L n_y + f L n_z \\
d &= L n_x\, u + L n_y\, v + f L n_z - c_u L n_x - c_v L n_y \qquad \text{(1)} \\
d &= a u + b v + c \qquad \text{(2)}
\end{aligned}
$$

Here, $f$, $c_u$, $c_v$ denote the intrinsic camera calibration parameters and $L$ is the baseline.
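To make the mapping concrete, the following minimal Python sketch evaluates the plane-induced disparity of Eq. (1)/(2). This is not the authors' code; the function and variable names are our own, and the calibration values are only KITTI-like placeholders used for illustration.

```python
import numpy as np

def plane_disparity(n, u, v, f, c_u, c_v, L):
    """Disparity of the 3D plane n = (n_x, n_y, n_z), with n^T x = 1, at pixel (u, v).

    f, c_u, c_v: intrinsic calibration (focal length and principal point)
    L: stereo baseline
    Implements d = a*u + b*v + c from Eq. (1)/(2).
    """
    n_x, n_y, n_z = n
    a = L * n_x
    b = L * n_y
    c = f * L * n_z - c_u * L * n_x - c_v * L * n_y
    return a * u + b * v + c

# Example: a fronto-parallel plane at depth z0 = 10 m has n = (0, 0, 1/z0),
# so the disparity is constant over the image and equals f*L/z0.
f, c_u, c_v, L = 721.5, 609.6, 172.9, 0.54   # assumed KITTI-like calibration
print(plane_disparity((0.0, 0.0, 0.1), 500.0, 200.0, f, c_u, c_v, L))  # ~38.96
```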

2. Illustration of Displets

Fig. 1 visualizes a subset of the sampled displets for a single test image which serve as input to our CRF. Wrong displet proposals are eliminated during inference if they do not agree with the observations or overlap with other displets.

¹ http://www.cvlibs.net/projects/displets/


Figure 1: Illustration of Displets. This figure shows a subset of sampled displets for an image by overlaying the corresponding disparity maps on top of the image regions they have been proposed for.

3. Displet Parameters

The influence of a displet on its associated superpixels is determined by a penalty function $\lambda_{ki}$. We take $\lambda_{ki}$ as a weighted sigmoid function of the distance transform

$$\lambda_{ki} = \lambda_\theta \, \frac{1}{1 + \exp(\lambda_a - \lambda_b \, \mathrm{DT}_{ki})}$$

where $\mathrm{DT}_{ki}$ denotes the Euclidean distance transform of superpixel $i$ with respect to the boundary of displet $k$. The model parameters $\lambda_\theta$, $\lambda_a$ and $\lambda_b$ are learned from training data as described in the next section. In Fig. 2, we visualize $\lambda_{ki}$ for a couple of displets and the associated superpixels. Less transparency indicates a higher penalty, i.e., we allow more deviation at the displet boundaries. In contrast, $\lambda_{ki}$ takes a very large value ($\lambda_\theta$) in the center of the displet (see Section 4), effectively approximating a hard constraint ($\lambda_{ki} = \infty$). Furthermore, we take the fitness score $\kappa_k$ as the displet log likelihood ($-E_\Omega$) in order to increase the plausibility of high quality displets. In practice, we subtract the largest fitness score of all displets originating from the same object proposal region $O$ in order to calibrate for the scores of different regions.
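As an illustration of how such a penalty could be evaluated, the sketch below plugs the parameter values from Table 1 into the sigmoid above. The use of SciPy's Euclidean distance transform on a toy mask is our own assumption about how $\mathrm{DT}_{ki}$ might be computed and is not taken from the paper's implementation.

```python
import numpy as np
from scipy import ndimage

def displet_penalty(dt_ki, lam_theta=6e4, lam_a=7.90, lam_b=0.82):
    """Weighted sigmoid penalty lambda_ki as a function of the distance transform DT_ki.

    dt_ki: Euclidean distance of superpixel i to the boundary of displet k
    lam_theta, lam_a, lam_b: learned model parameters (values from Table 1)
    """
    return lam_theta / (1.0 + np.exp(lam_a - lam_b * dt_ki))

# Toy displet mask: the distance transform grows towards the displet interior,
# so superpixels deep inside the displet receive a near-hard penalty while
# superpixels at the boundary are allowed to deviate more.
mask = np.zeros((60, 100), dtype=bool)
mask[20:50, 30:80] = True
dt = ndimage.distance_transform_edt(mask)   # 0 outside the displet
print(displet_penalty(dt[35, 55]))           # center: close to lam_theta
print(displet_penalty(dt[21, 31]))           # boundary: small penalty
```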

4. Parameter Settings

Table 1 lists the parameter values we used throughout our experiments. The parameters in the left table are obtained using block coordinate descent on 50 randomly selected images from the KITTI training set as described in the paper submission. The values of the parameters in the right table have been determined empirically. Furthermore, Fig. 3 shows the sensitivity when varying the main parameters in our model for both CNN and SGM features. For this experiment, we vary one parameter at a time while setting all other parameters to their optimal values listed in Table 1. We observe that our method is not very sensitive to small changes in most of the parameters. The performance in reflective regions is highly dependent on the displet parameters, as the induced sigmoid function defines the contribution of the inferred displets to the individual superpixels.
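The sensitivity experiment can be summarized as a one-parameter-at-a-time sweep. The sketch below is schematic: `evaluate` stands for a hypothetical function that runs the full inference pipeline on the validation split and returns the outlier rate, and the parameter names are our own; it is not the authors' code.

```python
from copy import deepcopy

# Defaults taken from Table 1 (subset shown)
DEFAULTS = {
    "unary_threshold": 6.98,
    "pairwise_boundary_threshold": 3.40,
    "pairwise_normal_threshold": 0.06,
    "displet_weight": 1.53,
    "num_particles": 30,
}

def sensitivity_sweep(evaluate, param, values, defaults=DEFAULTS):
    """Vary one parameter over `values` while keeping all others at their defaults.

    `evaluate(params)` is assumed to run inference and return the error (%)
    at a 3-pixel threshold, as plotted in Fig. 3.
    """
    errors = []
    for v in values:
        params = deepcopy(defaults)
        params[param] = v
        errors.append(evaluate(params))
    return errors

# Example: sweep the unary threshold over the range shown in Fig. 3.
# errors = sensitivity_sweep(run_displet_inference, "unary_threshold",
#                            [0.48, 3.08, 5.68, 8.28, 10.88, 13.48])
```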


Figure 2: Illustration of Displet Influence $\lambda_{ki}$. We penalize superpixels which do not agree ($n_i \neq n_{k,z_i}$) with active displets ($d_k = 1$). Less transparency indicates a higher penalty, i.e., we allow more deviation at the displet boundaries.

Parameter                                 Value
Unary Threshold (τ1)                      6.98
Pairwise Boundary Threshold (τ2)          3.40
Pairwise Normal Threshold (τ3)            0.06
Pairwise Boundary Weight (θ1)             1.50
Pairwise Normal Weight (θ2)               586.87
Displet Weight (θ3)                       1.53
Occlusion Weight                          1.00
Displet Consistency Weight (λθ)           6 × 10^4
Displet Consistency Sigmoid (λa, λb)      [7.90 0.82]

Parameter                                 Value
Number of Particles                       30
Number of Superpixels                     1000
Number of Iterations                      40
TRW-S Number of Iterations                200
Particle Standard Deviations              [0.5 0.5 5]

Table 1: Model Parameters. Here, we show the values of the learned parameters (top) and the empirically determined parameters (bottom) used in our model.


[Figure 3(a): plots of Error (%) versus Unary Threshold, Pairwise Boundary Threshold, Pairwise Normal Threshold, Pairwise Boundary Weight, Pairwise Normal Weight, Occlusion Weight, Displet Weight, Displet Consistency Weight (×10^4), Displet Consistency Sigmoid (a), Displet Consistency Sigmoid (b), and Number of Particles, each showing the CNN-Out-All, CNN-Out-Noc, SGM-Out-All, and SGM-Out-Noc curves.]

(a) Reflective Regions. See text for details.

[Figure 3(b): the same plots of Error (%) versus each model parameter, for the same four settings as in (a).]

(b) All Regions. See text for details.

Figure 3: Sensitivity to Parameters. This figure shows the change in performance while varying one parameter at a time and keeping all others fixed at their optimal values. Errors are with respect to an error threshold of 3 pixels.


5. Results with Degraded Input Disparity Maps

As our sampling procedure is stochastic, it can handle degraded input disparity maps to some extent. Fig. 4 illustrates this on very challenging scenes for which the initial disparity maps contain a large number of outliers.

Figure 4: Effect of Input Maps on Results. Each subfigure shows: the input image, the superpixels, and the semantic segments (first row), followed by the input disparity maps and our result obtained using them as input (second row).


6. Additional Qualitative Results

In this section, we show additional qualitative results on randomly sampled images from the KITTI validation set. For each scene, we show the input image, the superpixels, and the semantic segments which our method takes as input, as well as our inference results without and with displets and the corresponding error images, encoded from black (≤ 1 pixel error) to white (≥ 5 pixels error).

Figure 5: Qualitative Results. Each subfigure shows, from top to bottom: the input image, the superpixels, and the semantic segments which our method takes as input (first row), our inference results without and with displets (second row), and the corresponding error maps from ≤ 1 pixel error in black to ≥ 5 pixels error in white (third row).


Figure 6: Qualitative Results. Each subfigure shows, from top to bottom: the input image, the superpixels, and the semantic segments which our method takes as input (first row), our inference results without and with displets (second row), and the corresponding error maps from ≤ 1 pixel error in black to ≥ 5 pixels error in white (third row).


Figure 7: Qualitative Results. Each subfigure shows, from top to bottom: the input image, the superpixels, and the semantic segments which our method takes as input (first row), our inference results without and with displets (second row), and the corresponding error maps from ≤ 1 pixel error in black to ≥ 5 pixels error in white (third row).


Figure 8: Qualitative Results. Each subfigure shows, from top to bottom: the input image, the superpixels, and the semantic segments which our method takes as input (first row), our inference results without and with displets (second row), and the corresponding error maps from ≤ 1 pixel error in black to ≥ 5 pixels error in white (third row).


Figure 9: Qualitative Results. Each subfigure shows, from top to bottom: the input image, the superpixels, and the semantic segments which our method takes as input (first row), our inference results without and with displets (second row), and the corresponding error maps from ≤ 1 pixel error in black to ≥ 5 pixels error in white (third row).


Figure 10: Qualitative Results. Each subfigure shows, from top to bottom: the input image, the superpixels, and the semantic segments which our method takes as input (first row), our inference results without and with displets (second row), and the corresponding error maps from ≤ 1 pixel error in black to ≥ 5 pixels error in white (third row).


Figure 11: Qualitative Results. Each subfigure shows, from top to bottom: the input image, the superpixels, and the semantic segments which our method takes as input (first row), our inference results without and with displets (second row), and the corresponding error maps from ≤ 1 pixel error in black to ≥ 5 pixels error in white (third row).


Figure 12: Qualitative Results. Each subfigure shows, from top to bottom: the input image, the superpixels, and the semantic segments which our method takes as input (first row), our inference results without and with displets (second row), and the corresponding error maps from ≤ 1 pixel error in black to ≥ 5 pixels error in white (third row).


Figure 13: Qualitative Results. Each subfigure shows, from top to bottom: the input image, the superpixels, and the semantic segments which our method takes as input (first row), our inference results without and with displets (second row), and the corresponding error maps from ≤ 1 pixel error in black to ≥ 5 pixels error in white (third row).


Figure 14: Qualitative Results. Each subfigure shows, from top to bottom: the input image, the superpixels, and the semantic segments which our method takes as input (first row), our inference results without and with displets (second row), and the corresponding error maps from ≤ 1 pixel error in black to ≥ 5 pixels error in white (third row).
