semantic image segmentation via deep parsing...

Post on 08-May-2020

18 Views

Category:

Documents

0 Downloads

Preview:

Click to see full reader

TRANSCRIPT

Semantic Image Segmentation via Deep Parsing Network

Ziwei Liu*, Xiaoxiao Li*, Ping Luo, Chen Change Loy, Xiaoou Tang

Multimedia Lab, The Chinese University of Hong Kong

Problem

Problem

Person

Table

TV

Plant

Background

Previous Attempts

SVM SVM + MRF

CNN CNN + MRF ?

State-of-the-arts

Fully Convolutional Network [Long et al. CVPR 2015]

Learned Features ✓ Pairwise Relations ✗ Joint Training - # Iterations -

State-of-the-arts

DeepLab [Chen et al. ICLR 2015]

Learned Features ✓ Pairwise Relations ✓ Joint Training ✗ # Iterations 10

State-of-the-arts

CRF as RNN [Zheng et al. ICCV 2015]

Learned Features ✓ Pairwise Relations ✓ Joint Training ✓ # Iterations 10

State-of-the-arts

Deep Parsing Network (DPN)

Learned Features ✓ Pairwise Relations ✓ Joint Training ✓ # Iterations 1

Contributions

• Extend MRF to incorporate richer relationships • Formulate mean field inference of high-order MRF as CNN • Capable of joint training and one-pass inference

Revisit MRF

𝑝𝑝𝑖𝑖 𝑙𝑙𝑙𝑙𝑙𝑙𝑙𝑙𝑙𝑙 =′ 𝑡𝑡𝑙𝑙𝑙𝑙𝑙𝑙𝑙𝑙𝑡 = 0.8

𝑈𝑈𝑈𝑈𝑙𝑙𝑈𝑈𝑈𝑈 = −� ln 𝑝𝑝𝑖𝑖(𝑙𝑙𝑙𝑙𝑙𝑙𝑙𝑙𝑙𝑙)𝑖𝑖

Unary Term 𝑖𝑖

min 𝐸𝐸 = 𝑈𝑈𝑈𝑈𝑙𝑙𝑈𝑈𝑈𝑈 + 𝑃𝑃𝑙𝑙𝑖𝑖𝑈𝑈

Energy Function

Revisit MRF

𝑈𝑈𝑈𝑈𝑙𝑙𝑈𝑈𝑈𝑈 = −� ln 𝑝𝑝𝑖𝑖(𝑙𝑙𝑙𝑙𝑙𝑙𝑙𝑙𝑙𝑙)𝑖𝑖

Unary Term

𝑃𝑃𝑙𝑙𝑖𝑖𝑈𝑈 = �𝑐𝑐𝑜𝑜𝑜𝑜𝑡𝑡(𝑖𝑖) ∗ 𝑑𝑑𝑖𝑖𝑜𝑜𝑜𝑜(𝑖𝑖, 𝑗𝑗)𝑖𝑖,𝑗𝑗

Pairwise Term

𝑖𝑖 𝑗𝑗

min 𝐸𝐸 = 𝑈𝑈𝑈𝑈𝑙𝑙𝑈𝑈𝑈𝑈 + 𝑃𝑃𝑙𝑙𝑖𝑖𝑈𝑈

Energy Function 𝑑𝑑𝑖𝑖𝑜𝑜𝑜𝑜 𝑖𝑖, 𝑗𝑗 = , = 0.8 𝑖𝑖 𝑗𝑗

Appearance Consistency

Revisit MRF

𝑈𝑈𝑈𝑈𝑙𝑙𝑈𝑈𝑈𝑈 = −� ln 𝑝𝑝𝑖𝑖(𝑙𝑙𝑙𝑙𝑙𝑙𝑙𝑙𝑙𝑙)𝑖𝑖

Unary Term

𝑃𝑃𝑙𝑙𝑖𝑖𝑈𝑈 = �𝑐𝑐𝑜𝑜𝑜𝑜𝑡𝑡(𝑖𝑖) ∗ 𝑑𝑑𝑖𝑖𝑜𝑜𝑜𝑜(𝑖𝑖, 𝑗𝑗)𝑖𝑖,𝑗𝑗

Pairwise Term

𝑖𝑖 𝑗𝑗

min 𝐸𝐸 = 𝑈𝑈𝑈𝑈𝑙𝑙𝑈𝑈𝑈𝑈 + 𝑃𝑃𝑙𝑙𝑖𝑖𝑈𝑈

Energy Function 𝑐𝑐𝑜𝑜𝑜𝑜𝑡𝑡 𝑖𝑖; 𝑙𝑙𝑙𝑙𝑙𝑙𝑙𝑙𝑙𝑙 =′ 𝑡𝑡𝑙𝑙𝑙𝑙𝑙𝑙𝑙𝑙𝑡 = 0.1

Label Consistency

Richer Relationships in DPN

𝑈𝑈𝑈𝑈𝑙𝑙𝑈𝑈𝑈𝑈 = −� ln 𝑝𝑝𝑖𝑖(𝑙𝑙𝑙𝑙𝑙𝑙𝑙𝑙𝑙𝑙)𝑖𝑖

Unary Term

𝑃𝑃𝑙𝑙𝑖𝑖𝑈𝑈 = �𝑐𝑐𝑜𝑜𝑜𝑜𝑡𝑡(𝑖𝑖) ∗ 𝑑𝑑𝑖𝑖𝑜𝑜𝑜𝑜(𝑖𝑖, 𝑗𝑗)𝑖𝑖,𝑗𝑗

Pairwise Term

𝑖𝑖 𝑗𝑗

min 𝐸𝐸 = 𝑈𝑈𝑈𝑈𝑙𝑙𝑈𝑈𝑈𝑈 + 𝑃𝑃𝑙𝑙𝑖𝑖𝑈𝑈

Energy Function

Richer Relationships in DPN

𝑃𝑃𝑙𝑙𝑖𝑖𝑈𝑈 = �𝑐𝑐𝑜𝑜𝑜𝑜𝑡𝑡(𝑖𝑖) ∗ 𝑑𝑑𝑖𝑖𝑜𝑜𝑜𝑜(𝑖𝑖, 𝑗𝑗)𝑖𝑖,𝑗𝑗

Pairwise Term

𝑖𝑖 𝑗𝑗

( , )

Triple Penalty

Richer Relationships in DPN

𝑃𝑃𝑙𝑙𝑖𝑖𝑈𝑈 = �𝑐𝑐𝑜𝑜𝑜𝑜𝑡𝑡(𝑖𝑖) ∗ 𝑑𝑑𝑖𝑖𝑜𝑜𝑜𝑜(𝑖𝑖, 𝑗𝑗)𝑖𝑖,𝑗𝑗

Pairwise Term

𝑖𝑖 𝑗𝑗

( , , ) …

𝑧𝑧1

𝑧𝑧𝑛𝑛 𝑧𝑧1

𝑧𝑧𝑛𝑛

Triple Penalty

Richer Relationships in DPN

𝑃𝑃𝑙𝑙𝑖𝑖𝑈𝑈 = �𝑐𝑐𝑜𝑜𝑜𝑜𝑡𝑡(𝑖𝑖) ∗�𝑑𝑑𝑖𝑖𝑜𝑜𝑜𝑜(𝑖𝑖, 𝑗𝑗; 𝑧𝑧)𝑧𝑧𝑖𝑖,𝑗𝑗

Pairwise Term

𝑖𝑖 𝑗𝑗

( , , ) …

𝑧𝑧1

Triple Penalty

𝑧𝑧𝑛𝑛 𝑧𝑧1

𝑧𝑧𝑛𝑛

Triple Penalty

Richer Relationships in DPN

𝑖𝑖

0.8

0.6

𝑐𝑐𝑜𝑜𝑜𝑜𝑡𝑡 = 0.7

𝑡𝑡𝑙𝑙𝑙𝑙𝑙𝑙𝑙𝑙

𝑙𝑙𝑏𝑏𝑜𝑜

𝑃𝑃𝑙𝑙𝑖𝑖𝑈𝑈 = �𝑐𝑐𝑜𝑜𝑜𝑜𝑡𝑡(𝑖𝑖) ∗�𝑑𝑑𝑖𝑖𝑜𝑜𝑜𝑜(𝑖𝑖, 𝑗𝑗; 𝑧𝑧)𝑧𝑧𝑖𝑖,𝑗𝑗

Pairwise Term Mixture of Label Contexts

𝑖𝑖

Richer Relationships in DPN

𝑃𝑃𝑙𝑙𝑖𝑖𝑈𝑈 = �𝑐𝑐𝑜𝑜𝑜𝑜𝑡𝑡(𝑖𝑖, 𝑗𝑗) ∗�𝑑𝑑𝑖𝑖𝑜𝑜𝑜𝑜(𝑖𝑖, 𝑗𝑗; 𝑧𝑧)𝑧𝑧𝑖𝑖,𝑗𝑗

Pairwise Term

𝑖𝑖 𝑗𝑗

0.8 𝑡𝑡𝑙𝑙𝑙𝑙𝑙𝑙𝑙𝑙

0.6 𝑝𝑝𝑙𝑙𝑈𝑈𝑜𝑜𝑜𝑜𝑈𝑈

𝑐𝑐𝑜𝑜𝑜𝑜𝑡𝑡 , = 0.2

Mixture of Label Contexts

𝑖𝑖 𝑗𝑗

Richer Relationships in DPN

𝑃𝑃𝑙𝑙𝑖𝑖𝑈𝑈 = �𝑐𝑐𝑜𝑜𝑜𝑜𝑡𝑡(𝑖𝑖, 𝑗𝑗) ∗�𝑑𝑑𝑖𝑖𝑜𝑜𝑜𝑜(𝑖𝑖, 𝑗𝑗; 𝑧𝑧)𝑧𝑧𝑖𝑖,𝑗𝑗

Pairwise Term

𝑖𝑖 𝑗𝑗

0.8 𝑡𝑡𝑙𝑙𝑙𝑙𝑙𝑙𝑙𝑙

0.6 𝑝𝑝𝑙𝑙𝑈𝑈𝑜𝑜𝑜𝑜𝑈𝑈

𝑐𝑐𝑜𝑜𝑜𝑜𝑡𝑡 , = 0.2

Mixture of Label Contexts

𝑖𝑖 𝑗𝑗

Richer Relationships in DPN

𝑃𝑃𝑙𝑙𝑖𝑖𝑈𝑈 = �𝑐𝑐𝑜𝑜𝑜𝑜𝑡𝑡(𝑖𝑖, 𝑗𝑗) ∗�𝑑𝑑𝑖𝑖𝑜𝑜𝑜𝑜(𝑖𝑖, 𝑗𝑗; 𝑧𝑧)𝑧𝑧𝑖𝑖,𝑗𝑗

Pairwise Term

𝑗𝑗

0.6

0.8

𝑝𝑝𝑙𝑙𝑈𝑈𝑜𝑜𝑜𝑜𝑈𝑈

𝑐𝑐𝑜𝑜𝑜𝑜𝑡𝑡 , = 0.8

𝑡𝑡𝑙𝑙𝑙𝑙𝑙𝑙𝑙𝑙

𝑖𝑖

Spatial Order

Mixture of Label Contexts

𝑗𝑗 𝑖𝑖

Richer Relationships in DPN

𝑃𝑃𝑙𝑙𝑖𝑖𝑈𝑈 = �𝑐𝑐𝑜𝑜𝑜𝑜𝑡𝑡(𝑖𝑖, 𝑗𝑗) ∗�𝑑𝑑𝑖𝑖𝑜𝑜𝑜𝑜(𝑖𝑖, 𝑗𝑗; 𝑧𝑧)𝑧𝑧𝑖𝑖,𝑗𝑗

Pairwise Term

𝑖𝑖 𝑗𝑗

0.8 𝑡𝑡𝑙𝑙𝑙𝑙𝑙𝑙𝑙𝑙

0.6 𝑝𝑝𝑙𝑙𝑈𝑈𝑜𝑜𝑜𝑜𝑈𝑈

𝑐𝑐𝑜𝑜𝑜𝑜𝑡𝑡 , = 0.2

Mixture of Label Contexts

𝑖𝑖 𝑗𝑗

Richer Relationships in DPN

𝑖𝑖 𝑗𝑗

0.8 𝑡𝑡𝑙𝑙𝑙𝑙𝑙𝑙𝑙𝑙

0.6 𝑝𝑝𝑙𝑙𝑈𝑈𝑜𝑜𝑜𝑜𝑈𝑈

𝑐𝑐𝑜𝑜𝑜𝑜𝑡𝑡 , = 0.2

𝑃𝑃𝑙𝑙𝑖𝑖𝑈𝑈 = ��𝑐𝑐𝑜𝑜𝑜𝑜𝑡𝑡𝑘𝑘(𝑖𝑖, 𝑗𝑗)𝑘𝑘

∗�𝑑𝑑𝑖𝑖𝑜𝑜𝑜𝑜(𝑖𝑖, 𝑗𝑗; 𝑧𝑧)𝑧𝑧𝑖𝑖,𝑗𝑗

Pairwise Term

𝑘𝑘

Mixture of Label Contexts

Mixture of Label Contexts

𝑖𝑖 𝑗𝑗

Solve High-order MRF as Convolution

𝑃𝑃𝑙𝑙𝑖𝑖𝑈𝑈 = ��𝑐𝑐𝑜𝑜𝑜𝑜𝑡𝑡𝑘𝑘(𝑖𝑖, 𝑗𝑗)𝑘𝑘

∗�𝑑𝑑𝑖𝑖𝑜𝑜𝑜𝑜(𝑖𝑖, 𝑗𝑗; 𝑧𝑧)𝑧𝑧𝑖𝑖,𝑗𝑗

Pairwise Term

𝑝𝑝𝑖𝑖 ∝ 𝑙𝑙𝑒𝑒𝑝𝑝 − 𝑈𝑈𝑈𝑈𝑙𝑙𝑈𝑈𝑈𝑈𝑖𝑖 + �𝑃𝑃𝑙𝑙𝑖𝑖𝑈𝑈𝑖𝑖,𝑗𝑗 ∗ 𝑝𝑝𝑗𝑗𝑗𝑗

Mean Field Solver 𝑖𝑖

𝑗𝑗

Solve High-order MRF as Convolution

Iterative Updating Formula

𝑖𝑖 𝑗𝑗

Convolution Summation

𝑝𝑝𝑖𝑖 ∝ 𝑙𝑙𝑒𝑒𝑝𝑝 − 𝑈𝑈𝑈𝑈𝑙𝑙𝑈𝑈𝑈𝑈𝑖𝑖 + �𝑃𝑃𝑙𝑙𝑖𝑖𝑈𝑈𝑖𝑖,𝑗𝑗 ∗ 𝑝𝑝𝑗𝑗𝑗𝑗

𝑃𝑃𝑙𝑙𝑖𝑖𝑈𝑈𝑖𝑖,𝑗𝑗 : Different Types of Local and Global Filters

Deep Parsing Network

Max Pooling

Deconvolution Local Convolution Convolution

Unary Term Pairwise Term

Triple Penalty Label Contexts

Min Pooling

Deep Parsing Network Unary Term

Fine-tuned VGG-16 Network

Max Pooling Deconvolution Convolution

Deep Parsing Network

Original Image Unary Term Ground Truth

Deep Parsing Network

Unary Term Pairwise Term

Triple Penalty Label Contexts

Max Pooling

Deconvolution Local Convolution Convolution

Min Pooling

�𝑑𝑑𝑖𝑖𝑜𝑜𝑜𝑜 𝑗𝑗; 𝑧𝑧 ∗ 𝑝𝑝𝑧𝑧𝑧𝑧

Deep Parsing Network Triple Penalty

𝑃𝑃𝑙𝑙𝑖𝑖𝑈𝑈 = ��𝑐𝑐𝑜𝑜𝑜𝑜𝑡𝑡𝑘𝑘 𝑖𝑖, 𝑗𝑗𝑘𝑘

∗�𝑑𝑑𝑖𝑖𝑜𝑜𝑜𝑜 𝑖𝑖, 𝑗𝑗; 𝑧𝑧 ∗ 𝑝𝑝𝑧𝑧𝑧𝑧𝑖𝑖,𝑗𝑗

�𝑑𝑑𝑖𝑖𝑜𝑜𝑜𝑜 𝑖𝑖, 𝑗𝑗; 𝑧𝑧 ∗ 𝑝𝑝𝑧𝑧𝑧𝑧

Deep Parsing Network Triple Penalty

j z

Unary Term

# classes

�𝑑𝑑𝑖𝑖𝑜𝑜𝑜𝑜 𝑗𝑗; 𝑧𝑧 ∗ 𝑝𝑝𝑧𝑧𝑧𝑧

𝑝𝑝𝑧𝑧 𝑑𝑑𝑖𝑖𝑜𝑜𝑜𝑜 𝑗𝑗; 𝑧𝑧

# classes

Local Conv

j

Deep Parsing Network

Original Image Unary Term Ground Truth

Triple Penalty

Deep Parsing Network

Unary Term Pairwise Term

Triple Penalty Label Contexts

Max Pooling

Deconvolution Local Convolution Convolution

Min Pooling

Deep Parsing Network Mixture of Label Contexts

𝑃𝑃𝑙𝑙𝑖𝑖𝑈𝑈 = ��𝑐𝑐𝑜𝑜𝑜𝑜𝑡𝑡𝑘𝑘 𝑖𝑖, 𝑗𝑗 ∗�𝑑𝑑𝑖𝑖𝑜𝑜𝑡𝑡 𝑖𝑖, 𝑗𝑗; 𝑧𝑧 ∗ 𝑝𝑝𝑧𝑧𝑧𝑧𝑘𝑘𝑖𝑖,𝑗𝑗

��𝑐𝑐𝑜𝑜𝑜𝑜𝑡𝑡𝑘𝑘 𝑖𝑖, 𝑗𝑗𝑘𝑘𝑖𝑖,𝑗𝑗

Deep Parsing Network Mixture of Label Contexts

Triple Penalty Result

# classes

𝑡𝑡𝑈𝑈𝑖𝑖

i j

𝑐𝑐𝑜𝑜𝑜𝑜𝑡𝑡𝑘𝑘 𝑖𝑖, 𝑗𝑗

i

�𝑐𝑐𝑜𝑜𝑜𝑜𝑡𝑡𝑘𝑘 𝑖𝑖, 𝑗𝑗 ∗ 𝑡𝑡𝑈𝑈𝑖𝑖(𝑗𝑗)𝑗𝑗

Min Pooling

��𝑐𝑐𝑜𝑜𝑜𝑜𝑡𝑡𝑘𝑘 𝑖𝑖, 𝑗𝑗 ∗ 𝑡𝑡𝑈𝑈𝑖𝑖(𝑗𝑗)𝑘𝑘𝑗𝑗

class 1 class 1

Deep Parsing Network Mixture of Label Contexts

Triple Penalty Result

# classes

𝑡𝑡𝑈𝑈𝑖𝑖

i j

𝑐𝑐𝑜𝑜𝑜𝑜𝑡𝑡𝑘𝑘 𝑖𝑖, 𝑗𝑗

i

�𝑐𝑐𝑜𝑜𝑜𝑜𝑡𝑡𝑘𝑘 𝑖𝑖, 𝑗𝑗 ∗ 𝑡𝑡𝑈𝑈𝑖𝑖(𝑗𝑗)𝑗𝑗

Min Pooling

��𝑐𝑐𝑜𝑜𝑜𝑜𝑡𝑡𝑘𝑘 𝑖𝑖, 𝑗𝑗 ∗ 𝑡𝑡𝑈𝑈𝑖𝑖(𝑗𝑗)𝑘𝑘𝑗𝑗

class 2 class 2

Deep Parsing Network Mixture of Label Contexts

Triple Penalty Result

# classes

𝑡𝑡𝑈𝑈𝑖𝑖

i j

𝑐𝑐𝑜𝑜𝑜𝑜𝑡𝑡𝑘𝑘 𝑖𝑖, 𝑗𝑗

i

�𝑐𝑐𝑜𝑜𝑜𝑜𝑡𝑡𝑘𝑘 𝑖𝑖, 𝑗𝑗 ∗ 𝑡𝑡𝑈𝑈𝑖𝑖(𝑗𝑗)𝑗𝑗

Min Pooling

��𝑐𝑐𝑜𝑜𝑜𝑜𝑡𝑡𝑘𝑘 𝑖𝑖, 𝑗𝑗 ∗ 𝑡𝑡𝑈𝑈𝑖𝑖(𝑗𝑗)𝑘𝑘𝑗𝑗

class 3 class 3

# classes

Deep Parsing Network Mixture of Label Contexts

Min Pooling

��𝑐𝑐𝑜𝑜𝑜𝑜𝑡𝑡𝑘𝑘 𝑖𝑖, 𝑗𝑗 ∗ 𝑡𝑡𝑈𝑈𝑖𝑖(𝑗𝑗)𝑘𝑘𝑗𝑗

Deep Parsing Network

Original Image Unary Term Ground Truth

Triple Penalty Label Contexts

Deep Parsing Network Joint Tuning

Unary Term Pairwise Term

Triple Penalty Label Contexts

Deep Parsing Network

Original Image Unary Term Ground Truth

Triple Penalty Label Contexts Joint Tuning

FCN 62.2 DeepLab† 73.9

CRFasRNN† 74.7 BoxSup† 75.2

DPN† 77.5

(PASCAL VOC 2012 Challenge test set)

Overall Performance (Published Results)

Label Contexts Learned

favor

penalty

bkg

areo

bi

ke

bird

bo

at

bott

le

bus

car

cat

chai

r co

w

tabl

e do

g ho

rse

mbi

ke

pers

on

plan

t sh

eep

sofa

tr

ain

tv

bkg areo bike

tv train sofa

bird boat

bottle bus car cat

chair cow

table dog

horse

sheep

mbike person

plant

mbi

ke

bike

person

Label Contexts Learned

person : mbike chair : person

favor

penalty

Original Image Ground Truth FCN

DPN DeepLab CRFasRNN

Challenging Case

Original Image Ground Truth Our Result

Failure Case

car

Conclusions

• General framework of one-pass CNN to model high-order MRF

• Various types of pairwise terms are formulated as local and

global filters • High performance and easy to be speeded up

Semantic Image Segmentation via Deep Parsing Network

Thanks!

Project Page: http://personal.ie.cuhk.edu.hk/~lz013/projects/DPN.html

top related