

IEEE 2017 Conference on Computer Vision and Pattern Recognition

End-to-End Training of Hybrid CNN-CRF Models for Stereo

Patrick Knöbelreiter¹  Christian Reinbacher¹  Alexander Shekhovtsov¹  Thomas Pock¹·²
{knoebelreiter, reinbacher, shekhovtsov, pock}@icg.tugraz.at


Bi-level Optimization Problem

  min𝜃 𝑙(𝑥, 𝑥∗)   s.t.   𝑥 ∈ argmin𝑥∈𝑋 𝑓(𝑥)

“Learn the parameters 𝜃 of the CNNs such that the minimizer of the CRF model minimizes a certain loss function.”

• Directly computing a gradient is not possible, because argmin is not differentiable
• Instead, compute a gradient based on the structured-output support vector machine (SSVM) [4]
⇒ Minimize an upper bound of the loss
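The SSVM bound can be made concrete on a toy model. The sketch below is my own illustration, not the authors' code: it assumes a unary-only model and a Hamming task loss, and `ssvm_hinge` is a hypothetical helper name.

```python
import numpy as np

def ssvm_hinge(unary, x_true):
    """Margin-rescaled SSVM upper bound for a unary-only toy model.

    unary[i, k]: cost f_i(k) of assigning label k to pixel i.
    Task loss: Hamming, Delta(x, x*) = sum_i [x_i != x*_i].
    Returns hinge = f(x*) - min_x (f(x) - Delta(x, x*)) and the
    most violated labeling.
    """
    n, K = unary.shape
    # Loss-augmented inference: make every wrong label cheaper by 1,
    # so that low-cost wrong labelings become the "most violated" ones.
    aug = unary - (np.arange(K)[None, :] != x_true[:, None])
    x_viol = aug.argmin(axis=1)                 # most violated constraint
    f_true = unary[np.arange(n), x_true].sum()  # energy of the true labeling
    f_viol = aug[np.arange(n), x_viol].sum()    # loss-augmented energy
    return f_true - f_viol, x_viol
```

Because the inner minimization ranges over all labelings including 𝑥∗ itself, the hinge is non-negative and upper-bounds the Hamming loss of the energy minimizer.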

End-to-End Training

[Figure: robust pairwise penalty 𝜌 over the label difference 𝑥𝑖 − 𝑥𝑗, with the two thresholds P1 and P2]


Conclusion

• Principled approach without post-processing: only optimization
• Accurate and fast
• Exploits a GPU-based CRF solver
• Learns the unary and pairwise terms such that the CRF works best
• All parameters learned from data
• Code online: https://github.com/VLOGroup/cnn-crf-stereo

References

[1] Zbontar, J. et al. Stereo matching by training a convolutional neural network to compare patches. JMLR 2016.
[2] Luo, W. et al. Efficient deep learning for stereo matching. CVPR 2016.
[3] Shekhovtsov, A. et al. Solving dense image matching in real-time using discrete-continuous optimization. CVWW 2016.
[4] Taskar, B. et al. Max-margin Markov networks. MIT Press 2003.
[5] Barron, J. T. et al. The fast bilateral solver. ECCV 2016.
[6] Mayer, N. et al. A large dataset to train convolutional networks for disparity, optical flow, and scene flow estimation. CVPR 2016.

Stereo

Find the disparity for each pixel in an image.

Standard approach:
1. Densely extract features (engineered or learned) from 𝐼0 and 𝐼1
2. Match features for a range of disparities
3. Apply (engineered) post-processing
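For intuition, the first two steps can be sketched with raw intensities in place of learned features. This is a deliberately naive stand-in of my own, not part of the poster's method; `disparity_wta` is a hypothetical helper name.

```python
import numpy as np

def disparity_wta(I0, I1, max_disp):
    """Toy version of steps 1-2: per-pixel absolute-difference matching
    cost for each candidate disparity, then winner-take-all.
    Real systems aggregate costs, use learned features, and add step 3
    (post-processing)."""
    H, W = I0.shape
    cost = np.full((max_disp + 1, H, W), np.inf)
    for d in range(max_disp + 1):
        # pixel (y, x) in the left image matches (y, x - d) in the right,
        # so the comparison is only valid for columns x >= d
        cost[d, :, d:] = np.abs(I0[:, d:] - I1[:, :W - d])
    return cost.argmin(axis=0)  # disparity map D
```

On a right image that is an exact 2-pixel shift of the left one, the winner-take-all result recovers disparity 2 everywhere the shift is defined.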

[Figure 1: unary costs 𝑓𝑖(𝑥𝑖) over the labels, showing the margin between the true label and the most violated constraint]

Energy and Inference

  𝑓𝑖(𝑥𝑖 = 𝑑) = −𝜎(⟨𝜙𝑖⁰, 𝜙𝑖−𝑑¹⟩)   “costs for candidate disparities”
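Assuming ⟨·,·⟩ is a dot product between per-pixel feature vectors (the correlation layer of the model), the unary cost volume could be computed as below; this is a sketch with 𝜎 taken as the identity, and `unary_costs` is a hypothetical name.

```python
import numpy as np

def unary_costs(phi0, phi1, max_disp):
    """Correlation-based unary costs, f_i(x_i = d) = -<phi0_i, phi1_{i-d}>.

    phi0, phi1: (H, W, C) feature maps produced by the (shared) unary CNN
    for the left and right image. Returns a (max_disp+1, H, W) cost volume;
    entries without a valid match (x < d) stay at +inf."""
    H, W, C = phi0.shape
    f = np.full((max_disp + 1, H, W), np.inf)
    for d in range(max_disp + 1):
        # dot product of left features at column x with right features at x - d
        f[d, :, d:] = -np.einsum('ywc,ywc->yw', phi0[:, d:], phi1[:, :W - d])
    return f
```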

Benchmark Results

Middlebury 2014
  Method   Ø RMS   Ø bad2   Time/MP   Parameters   Post-Processing
  [1]      21.3    8.29     112s      830k         CA, SGM, SE, MF, BF
  [5]      15.0    8.62     140s      830k         CA, SGM, SE, MF, BF, RBS
  Ours     14.4    12.5     3.69s     281k         –

Kitti 2015
  Method   Non-occ   All    Time     Parameters   Post-Processing
  [1]      3.33      3.89   67s      830k         CA, SGM, SE, MF, BF
  [2]      4.00      4.54   1s       700k         CA, SGM, LR, SE, MF, BF, RBS
  [6]      4.32      4.34   0.06s    42M          –
  Ours     4.84      5.50   1.3s     281k         –

CA…Cost Aggregation, SGM…Semi-Global Matching, SE…Sublabel Enhancement, LR…Left-Right Check, MF…Median Filtering, BF…Bilateral Filtering, RBS…Robust Bilateral Solver
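As a reference for the table columns: the Middlebury "bad2" metric is the percentage of pixels whose disparity error exceeds 2 pixels. A sketch of that idea (ignoring the benchmark's exact occlusion and validity masking, which the official evaluation applies):

```python
import numpy as np

def bad_x(disp, gt, thresh=2.0, mask=None):
    """Percentage of (valid) pixels with |disp - gt| > thresh pixels.

    bad2 corresponds to thresh=2.0. By default, pixels with non-finite
    ground truth are excluded."""
    err = np.abs(disp - gt)
    if mask is None:
        mask = np.isfinite(gt)
    return 100.0 * np.mean(err[mask] > thresh)
```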

[Figure: Kitti closeup – left image (𝐼0), right image (𝐼1), and estimated disparity map (𝐷)]

Model Overview

[Diagram: 𝐼0 and 𝐼1 are each passed through a Unary CNN; the resulting features are combined by a Correlation layer and, together with a contrast-sensitive Pairwise CNN, feed the CRF, which outputs the disparity map 𝐷]

Unary CNN
• Generalizes photometric/engineered features
• Automatically learns suitable features for matching from data

Pairwise CNN
• Generalizes commonly used contrast-sensitive pairwise weights
• Automatically learns suitable pairwise weights for the CRF

Correlation
• Fixed matching function
• Matches the learned features

CRF
• Global energy
• Only data and smoothness costs

[Figure: pairwise weights – contrast-sensitive vs. learned]

[Figure: qualitative results on Middlebury 2014 and Kitti 2015 – reference image, disparity map, and error map for Ours, [2], and [6]; error values 19.5, 3.66, and 2.11]

The total CRF energy combines unary and pairwise costs:

  min𝑥∈𝑋 𝑓(𝑥) ≔ Σ𝑖∈𝑉 𝑓𝑖(𝑥𝑖) + Σ𝑖𝑗∈𝐸 𝑤𝑖𝑗 𝜌(𝑥𝑖, 𝑥𝑗)

Unary cost: 𝑓𝑖(𝑥𝑖).  Pairwise cost: 𝑓𝑖𝑗(𝑥𝑖, 𝑥𝑗) = 𝑤𝑖𝑗 𝜌(𝑥𝑖, 𝑥𝑗).

[Figure: 4-connected grid 𝑥1, 𝑥2, …, 𝑥𝑁 with labels 𝑘 = 1, 2, 3, decomposed into horizontal and vertical chains]

[Figure 2: unary costs 𝑓𝑖(𝑥𝑖) over the labels after the training update]
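The CRF energy on the 4-connected grid can be evaluated directly. The sketch below assumes a truncated-linear penalty 𝜌 with thresholds P1/P2, as suggested by the pairwise-cost figure; the function name and the (K, H, W) cost-volume layout are my own conventions, not the authors' code.

```python
import numpy as np

def crf_energy(x, unary, w_h, w_v, P1=1.0, P2=2.0):
    """Evaluate f(x) = sum_i f_i(x_i) + sum_ij w_ij * rho(x_i, x_j)
    on a 4-connected grid.

    x:     (H, W) integer labeling
    unary: (K, H, W) unary costs f_i(k)
    w_h:   (H, W-1) weights of horizontal edges; w_v: (H-1, W) vertical
    rho(a, b): 0 if a == b, P1 if |a - b| == 1, else P2 (assumption)."""
    H, W = x.shape
    e = unary[x, np.arange(H)[:, None], np.arange(W)[None, :]].sum()

    def rho(a, b):
        d = np.abs(a - b)
        return np.where(d == 0, 0.0, np.where(d == 1, P1, P2))

    e += (w_h * rho(x[:, :-1], x[:, 1:])).sum()   # horizontal edges
    e += (w_v * rho(x[:-1, :], x[1:, :])).sum()   # vertical edges
    return e
```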

Motivation

Observation:
• Recent learning-based approaches rely heavily on engineered post-processing
  ⇒ a lot of (hyper-)parameters to tune by hand
• Deep models with many parameters

Idea:
• No engineered post-processing
• End-to-end learning of the complete model
  ⇒ directly optimize for the final output
• Compact model

Training Procedure

(1) Find the most violated constraint
(2a) Decrease the energy of the true label
(2b) Increase the energy of the most violated constraint

Interpretation: illustrated in figures (1) and (2).

• Optimize the total of unary and pairwise costs on a 4-connected grid
• Dual Minorize-Maximize (DMM) ⇒ solve a sum of chain sub-problems [3]
• Lagrange multipliers ensure consistent solutions of the sub-problems
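Each chain sub-problem can be minimized exactly by min-sum dynamic programming (Viterbi). The sketch below handles a single chain, again assuming a truncated-linear 𝜌; the actual DMM solver of [3] additionally couples the horizontal and vertical chains through Lagrange multipliers, which this toy version omits.

```python
import numpy as np

def solve_chain(unary, w, P1=1.0, P2=2.0):
    """Exactly minimize a chain CRF: sum_i unary[i, x_i] plus
    w[i] * rho(x_i, x_{i+1}) between neighbors, by min-sum DP.

    unary: (N, K) costs, w: (N-1,) edge weights.
    Returns the optimal labeling and its energy."""
    N, K = unary.shape
    labels = np.arange(K)
    # rho(a, b) for all label pairs (truncated linear, as an assumption)
    d = np.abs(labels[:, None] - labels[None, :])
    rho = np.where(d == 0, 0.0, np.where(d == 1, P1, P2))

    M = unary[0].copy()                  # best cost of a prefix ending in label k
    back = np.zeros((N, K), dtype=int)   # argmin pointers for backtracking
    for i in range(1, N):
        trans = M[:, None] + w[i - 1] * rho   # (prev label, current label)
        back[i] = trans.argmin(axis=0)
        M = trans.min(axis=0) + unary[i]

    x = np.empty(N, dtype=int)           # backtrack the optimal labeling
    x[-1] = M.argmin()
    for i in range(N - 1, 0, -1):
        x[i - 1] = back[i, x[i]]
    return x, M.min()
```

With weak smoothing the chain follows the unaries; with strong smoothing it prefers a constant labeling, which is exactly the trade-off the learned pairwise weights 𝑤𝑖𝑗 control.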