Lecture 7: Semantic Segmentation
Bohyung Han, Computer Vision Lab, POSTECH ([email protected])
CSED703R: Deep Learning for Visual Recognition (2016S)
Semantic Segmentation
• Segmenting an image into regions according to their semantic meaning
Supervised Learning
Fully Convolutional Network
• Network architecture [Long15]
• End-to-end CNN architecture for semantic segmentation
• Interprets fully connected layers as convolutional layers

[Long15] J. Long, E. Shelhamer, and T. Darrell: Fully Convolutional Networks for Semantic Segmentation. CVPR 2015
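The "interpret fully connected layers as convolutional layers" step can be sketched numerically. This is a toy sketch with made-up sizes (not VGG's, and not the authors' code): a fully connected layer applied to a flattened feature map is equivalent to a convolution whose kernel covers the entire map.

```python
import numpy as np

rng = np.random.default_rng(0)
C, H, W, D = 3, 2, 2, 5            # toy channel/spatial/output sizes
x = rng.standard_normal((C, H, W))
W_fc = rng.standard_normal((D, C * H * W))   # fully connected weights

# Fully connected: flatten the feature map, then matrix-multiply.
y_fc = W_fc @ x.reshape(-1)

# "Convolutionalized": the same weights viewed as D kernels of shape
# CxHxW, each correlated with the (same-sized) input map.
W_conv = W_fc.reshape(D, C, H, W)
y_conv = np.array([(k * x).sum() for k in W_conv])

assert np.allclose(y_fc, y_conv)   # identical outputs
```

Because the "kernel" covers the whole map, applying it to a larger input produces a spatial grid of outputs instead of a single vector, which is what enables dense prediction.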
Deconvolution

[Figure: a 500x500x3 input image produces a coarse 16x16x21 class score map, which is upsampled by deconvolution]
Deconvolution Filter
• Bilinear interpolation filter: the same filter for every class, and no filter learning!
• How does this deconvolution work? The deconvolution layer is fixed; only the convolutional layers of the network are fine-tuned with segmentation ground truth.

[Figure: fixed 64x64 bilinear interpolation filter; convolutional layers pretrained on ImageNet and fine-tuned for segmentation]
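The fixed bilinear filter can be constructed as follows; this is a sketch of the standard bilinear-kernel initializer for upsampling filters (`bilinear_kernel` is a name chosen here, not from the source).

```python
import numpy as np

def bilinear_kernel(size):
    """2-D bilinear interpolation kernel of shape (size, size): the
    fixed, class-independent filter used for upsampling."""
    factor = (size + 1) // 2
    center = factor - 1 if size % 2 == 1 else factor - 0.5
    og = np.ogrid[:size, :size]          # row and column index grids
    return ((1 - abs(og[0] - center) / factor)
            * (1 - abs(og[1] - center) / factor))

k = bilinear_kernel(4)                   # e.g. a 4x4 kernel for 2x upsampling
assert np.allclose(k, k.T)               # symmetric tent-shaped weights
assert np.isclose(k[1, 1], 0.5625)       # peak near the kernel center
```

Since the kernel is deterministic, every class channel shares it, which is exactly why no filter learning is needed in this design.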
DeconvNet
• Learning a deep deconvolution network
• Conceptually more reasonable
• Better at identifying fine structures of objects
• Designed to generate outputs from a larger solution space
• Capable of predicting dense output scores
• Instance-wise training and prediction
• Difficult to learn: memory intensive, large number of parameters

[Noh15] H. Noh, S. Hong, B. Han: Learning Deconvolution Network for Semantic Segmentation. ICCV 2015
Operations in Deconvolution Network
• Unpooling: places activations back at the pooled locations, preserving the structure of the activations
• Deconvolution: densifies sparse activations; filters act as bases to reconstruct object shape
• ReLU: same as in the convolution network
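The unpooling operation above can be sketched for a single 2x2 window; this is a toy NumPy sketch of the idea (max pooling records a "switch", unpooling places the value back), not the DeconvNet implementation.

```python
import numpy as np

# One 2x2 pooling window with its activations.
x = np.array([[1., 3.],
              [4., 2.]])

flat_idx = x.argmax()            # switch: remember where the max came from
pooled = x.flat[flat_idx]        # 2x2 max pooling -> single value

# Unpooling: put the pooled activation back at the recorded location,
# leaving every other position zero (a sparse but structured map).
unpooled = np.zeros_like(x)
unpooled.flat[flat_idx] = pooled

assert pooled == 4.0
assert unpooled[1, 0] == 4.0 and unpooled.sum() == 4.0
```

The subsequent deconvolution layers then densify this sparse map, as the slide describes.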
How Does the Deconvolution Network Work?
• Visualization of activations
[Figure: activations visualized at successive layers: Deconv 14x14, Unpool 28x28, Deconv 28x28, Unpool 56x56, Deconv 56x56, Unpool 112x112, Deconv 112x112]
Training and Inference
• Instance-wise training
• Data augmentation: object proposals, random cropping, flipping
• Two-stage training: binary segmentation with ground truth, then full segmentation with object proposals
• Batch normalization
• Instance-wise prediction
• Each class corresponds to one of the channels in the output layer; the label of a pixel is given by a max operation over all channels.
• Aggregation of 50 object proposals: max operation over all proposals
[Figure: pipeline: 1. input image, 2. object proposals, 3. prediction by DeconvNet and aggregation, 4. results]
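The prediction-and-aggregation step can be sketched with random score maps standing in for the per-proposal network outputs (all shapes here are illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)
C, H, W, n_proposals = 21, 4, 4, 50
# One CxHxW class score map per proposal, already placed back into
# image coordinates (random values stand in for DeconvNet outputs).
proposal_scores = rng.standard_normal((n_proposals, C, H, W))

aggregated = proposal_scores.max(axis=0)   # pixel-wise max over proposals
labels = aggregated.argmax(axis=0)         # per-pixel max over class channels

assert aggregated.shape == (C, H, W)
assert labels.shape == (H, W) and labels.max() < C
```

Both aggregation steps are simple max reductions, which is why instance-wise predictions combine cheaply into a full-image result.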
Results
Semi‐Supervised Learning
Motivation
• Challenges in existing supervised learning approaches
• Heavy labeling effort in semantic segmentation: pixel-wise segmentation labels are much more expensive to obtain than other kinds of labels
• Difficult to extend to other classes and to handle more classes
Problem Definition
• Weakly supervised learning with hybrid annotations
• Many weak annotations: image-level object class labels
• Few strong annotations: full segmentation labels

[Figure: example images with image-level labels: person/bike, person/horse, boat/person/dining table/potted plant/tv-monitor]
DecoupledNet
• Architecture
• Classification network
• Segmentation network
• Bridging layers
• Characteristics
• Decouples classification and segmentation
• Train the classification network first, then learn the rest of the network

[Hong15] S. Hong*, H. Noh*, B. Han: Decoupled Deep Neural Network for Semi-Supervised Semantic Segmentation. NIPS 2015
Classification Network
• Specification
• Input: image x
• Output: 20-dimensional class label vector f(x; θ_c) ∈ R^20
• Construction
• Fine-tuning from the VGG 16-layer net
• Transferable from any other existing classification network

min_{θ_c} Σ_i e_c(f(x_i; θ_c), y_i), where y_i ∈ {0,1}^20 is the ground-truth label vector.
Segmentation Network
• Specification
• Input: class-specific activation map g of the input image
• Output: two-channel class-specific binary segmentation map f(g; θ_s)
• Construction
• Adopts DeconvNet, customized for binary segmentation

min_{θ_s} Σ_i e_s(f(g_i; θ_s), z_i), where z_i is the binary ground-truth segmentation mask.
Bridging Layers
• Specification
• Input: concatenation, in the channel direction, of the pool5 feature maps and the class-specific saliency maps
• Output: class-specific activation map
• Construction
• Fully connected layers
• Spatial information: pool5 feature maps
• Class-specific information: obtained by backpropagating the class score down to pool5
Class‐Specific Information
• Class-specific saliency map [Simonyan14]
• Given an image, pixels related to a specific class can be identified by computing the gradient of the class score with respect to the image:

w = ∂S_c/∂x, computed by backpropagation through the network (the chain rule over all layers).

[Simonyan14] K. Simonyan, A. Vedaldi, A. Zisserman: Deep Inside Convolutional Networks: Visualising Image Classification Models and Saliency Maps. ICLR Workshop, 2014.
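For a toy linear scorer, the gradient-based saliency map can be computed and verified directly; this is a sketch under the assumption of a linear "network" (a real CNN would require autograd), with all sizes made up.

```python
import numpy as np

rng = np.random.default_rng(1)
n_pixels, n_classes = 6, 3
W = rng.standard_normal((n_classes, n_pixels))  # linear class scorers
x = rng.standard_normal(n_pixels)               # flattened "image"

c = (W @ x).argmax()           # predicted class
# For S_c(x) = w_c . x, the gradient dS_c/dx is just w_c; large
# magnitudes mark pixels the class score is most sensitive to.
saliency = np.abs(W[c])

# Finite-difference check of the gradient at pixel 0.
eps = 1e-6
x2 = x.copy(); x2[0] += eps
fd = ((W @ x2)[c] - (W @ x)[c]) / eps
assert np.isclose(fd, W[c, 0], atol=1e-4)
```

In the actual method the same gradient is obtained by one backward pass of the classification network.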
Segmentation Maps
Inference

• Iterations are needed: a segmentation map is computed for each identified label, using the same segmentation network with different class-specific information.
• For each identified label l, the bridging layers produce a class-specific activation map g_l, and the segmentation network outputs a binary map f(g_l; θ_s).
• The final label of each pixel is given by the max operation over the foreground channels of all class-specific segmentation maps.
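The per-label inference loop and the final max aggregation can be sketched as follows; random maps stand in for the segmentation network's foreground outputs, and the label set is a toy assumption.

```python
import numpy as np

rng = np.random.default_rng(2)
H, W = 4, 4
identified = [3, 7, 12]        # labels found by the classification network

# One foreground score map per identified label, produced by running the
# SAME segmentation network with different class-specific inputs.
fg_maps = {l: rng.standard_normal((H, W)) for l in identified}

stacked = np.stack([fg_maps[l] for l in identified])   # (3, H, W)
winner = stacked.argmax(axis=0)                        # index into identified
labels = np.array(identified)[winner]                  # per-pixel class label

assert labels.shape == (H, W)
assert set(np.unique(labels)) <= set(identified)
```

Only the classes the classifier actually detected are ever considered, which is what keeps the number of iterations small.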
Qualitative Results
Quantitative Results
[Table: comparison to other algorithms on the PASCAL VOC 2012 validation set]

[Table: per-class accuracy on the PASCAL VOC 2012 test set]
Weakly‐Supervised Learning
Problem Definition
• Semantic segmentation by weakly supervised learning
• Image-level object class labels only
• Bounding boxes (and corresponding labels) only
• Scribbles (and corresponding labels) only
• Approaches
• Constrained optimization
• Iterative optimization
• Transfer learning
Multiple Instance Learning

• Training

[Figure: input image and Overfeat features]

P. O. Pinheiro, R. Collobert: From Image-level to Pixel-level Labeling with Convolutional Networks. CVPR 2015
Multiple Instance Learning

• Inference

P. O. Pinheiro, R. Collobert: From Image-level to Pixel-level Labeling with Convolutional Networks. CVPR 2015
Constrained Convolutional Neural Network
• With image-level class labels only
• Define the objective function with constraints
• Estimate a latent probability distribution for optimization

find θ subject to A Q(θ) ≥ b

min_{θ,P} D(P || Q(θ)) subject to A P ≥ b, Σ_X P(X) = 1

D. Pathak, P. Krähenbühl, T. Darrell: Constrained Convolutional Neural Networks for Weakly Supervised Segmentation. ICCV 2015
Constraints
• Suppression constraint: suppress all labels not in the image.
• Foreground constraint: make positive labels visible.
• Background constraint: define lower and upper bounds on the background area.
• Size constraint: define an upper bound on the area of a class.

Σ_i P_i(l) = 0 ∀ l ∉ L (suppression); Σ_i P_i(l) ≥ α ∀ l ∈ L (foreground); lower and upper bounds on Σ_i P_i(bg) (background)
Optimization
• Iterative method with a slack variable

Find P: argmin_P D(P || Q(θ)) subject to A P ≥ b, Σ_X P(X) = 1

Find θ: minimize the cross-entropy −Σ_X P(X) log Q_X(θ) by SGD, where Q(θ) ∝ exp(f(θ)) is the network's output distribution
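A highly simplified stand-in for the "Find P" projection step, enforcing only the suppression constraint by zeroing absent labels and renormalizing (the paper performs a proper constrained KL projection; sizes here are toy):

```python
import numpy as np

rng = np.random.default_rng(3)
n_pixels, n_labels = 5, 4
Q = rng.random((n_pixels, n_labels))        # network output distribution
Q /= Q.sum(axis=1, keepdims=True)

present = np.array([True, False, True, False])   # image-level label set

# Crude projection: zero the probability of labels absent from the
# image, then renormalize each pixel's distribution.
P = Q * present
P /= P.sum(axis=1, keepdims=True)

assert np.allclose(P.sum(axis=1), 1.0)      # still a valid distribution
assert (P[:, ~present] == 0).all()          # suppression satisfied
```

The θ update then treats this projected P as the target of a standard cross-entropy loss.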
Results
[Figure: qualitative results: original image, ground truth, with labels, with labels + tags]
BoxSup

• With bounding box annotations only: candidate segments (object proposals) overlapping each box serve as estimated segmentation masks, with candidates penalized in proportion to (1 − IoU) between segment and box.

J. Dai, K. He, J. Sun: BoxSup: Exploiting Bounding Boxes to Supervise Convolutional Networks for Semantic Segmentation. ICCV 2015
Semantic Segmentation by Transfer Learning
• Data
• Source domain: image-level class labels and pixel-wise segmentation annotations
• Target domain: image-level class labels only
• Source and target domains are composed of exclusive sets of categories.
• Goal: semantic segmentation of target-domain images by transferring segmentation knowledge from source-domain data
• Impact: scalability to datasets with a large number of classes with minimal human supervision
Network Architecture
• Components
• Encoder: pre-trained VGG-16
• Decoder: deconvolution network
• Classifier: two fully connected layers
• Attention model: multiplicative interactions; attention weights are a softmax over spatial locations, α_i = exp(s_i) / Σ_j exp(s_j), combined with the encoder features by element-wise multiplication (⊙)

[Hong15] Seunghoon Hong, Junhyuk Oh, Bohyung Han, Honglak Lee: Learning Transferrable Knowledge for Semantic Segmentation with Deep Convolutional Neural Network. CVPR 2016
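The multiplicative attention can be sketched as a spatial softmax followed by element-wise weighting; shapes and the scoring inputs below are toy assumptions, not the paper's configuration.

```python
import numpy as np

rng = np.random.default_rng(4)
C, H, W = 8, 3, 3
feats = rng.standard_normal((C, H, W))     # encoder feature map
scores = rng.standard_normal((H, W))       # per-location attention scores

# Softmax over all spatial locations (numerically stabilized).
e = np.exp(scores - scores.max())
alpha = e / e.sum()

# Multiplicative interaction: element-wise (⊙) weighting of features,
# broadcast over the channel dimension.
attended = feats * alpha

assert np.isclose(alpha.sum(), 1.0)
assert attended.shape == (C, H, W)
```

Because the weights sum to one over locations, the attended map concentrates the decoder's input on class-relevant regions.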
Network Architecture
• Training strategy
• With segmentation annotations (source domain): train both the decoder and the attention model
• With image-level class labels (target domain): train both the classifier and the attention model
• Encoder is fixed: VGG 16-layer net

• Loss function: minimize, over the decoder, classifier, and attention parameters, the segmentation loss summed over source-domain images plus the classification loss summed over source- and target-domain images
Attention
• Attention vs. densified attention

[Figure: attention map compared with densified attention map]
Results
[Figure: input image, ground truth, densified attention, BaselineNet, TransferNet, TransferNet+CRF]
Results
DeepLab with Various Supervisions
DeepLab
• Same as the FCN approach, but with higher classification resolution
• Hole (atrous) algorithm
• Makes the feature map denser
• Reuses the existing CNN architecture

L.-C. Chen, G. Papandreou, I. Kokkinos, K. Murphy, A. L. Yuille: Semantic Image Segmentation with Deep Convolutional Nets and Fully Connected CRFs. ICLR 2015

Smaller stride but a larger (dilated) filter for both pooling and convolution!
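The hole (atrous) trick can be illustrated by dilating a kernel with zeros; `dilate_kernel` is a name chosen here for illustration (real implementations skip input samples rather than materializing zeros).

```python
import numpy as np

def dilate_kernel(k, rate):
    """Insert (rate-1) zeros between kernel taps: the effective
    receptive field grows without adding any parameters."""
    n = k.shape[0]
    out = np.zeros((rate * (n - 1) + 1,) * 2, dtype=k.dtype)
    out[::rate, ::rate] = k
    return out

k = np.ones((3, 3))
k2 = dilate_kernel(k, 2)      # effective 5x5 support, still only 9 weights

assert k2.shape == (5, 5)
assert k2.sum() == k.sum()    # same taps, no new parameters
```

Combined with a reduced stride, this keeps the pretrained filters usable while producing a denser score map.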
Fully Connected Conditional Random Field
E(x) = Σ_i θ_i(x_i) + Σ_{i,j} θ_ij(x_i, x_j),

where the unary term θ_i comes from the DCNN scores and the pairwise term θ_ij couples every pair of pixels through Gaussian kernels on position and color.

L.-C. Chen, G. Papandreou, I. Kokkinos, K. Murphy, A. L. Yuille: Semantic Image Segmentation with Deep Convolutional Nets and Fully Connected CRFs. ICLR 2015
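The Gaussian pairwise potentials can be sketched for a single pixel pair: an appearance kernel (position + color) plus a smoothness kernel (position only). The weights and bandwidths below are illustrative placeholders, not the tuned values from the paper.

```python
import numpy as np

def pairwise_weight(p_i, p_j, I_i, I_j,
                    w1=1.0, w2=1.0, s_alpha=60., s_beta=10., s_gamma=3.):
    """Pairwise coupling between pixels i and j: appearance kernel
    (position + RGB) plus smoothness kernel (position only)."""
    dp = np.sum((p_i - p_j) ** 2)    # squared position distance
    dI = np.sum((I_i - I_j) ** 2)    # squared color distance
    appearance = w1 * np.exp(-dp / (2 * s_alpha**2) - dI / (2 * s_beta**2))
    smoothness = w2 * np.exp(-dp / (2 * s_gamma**2))
    return appearance + smoothness

w_near = pairwise_weight(np.array([0, 0]), np.array([1, 0]),
                         np.array([100., 100., 100.]),
                         np.array([101., 100., 100.]))
w_far = pairwise_weight(np.array([0, 0]), np.array([200, 0]),
                        np.array([100., 100., 100.]),
                        np.array([0., 0., 0.]))
assert w_near > w_far   # nearby, similar-color pixels couple more strongly
```

Because every pixel pair is coupled, inference relies on efficient mean-field approximation rather than evaluating these weights pair by pair.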
Results by DeepLab‐CRF
WSSL
• Baseline: semantic segmentation with pixel-level annotations

• Goals: estimate pixel-level labels by learning a CNN based on
• Image-level annotations
• Bounding box annotations
• Hybrid annotations: many image-level annotations and few segmentation annotations

max_θ Σ log P(y | x; θ), where P(y | x; θ) ∝ Π_m exp(f(y_m | x; θ)); f is the output of the DCNN, θ the model parameters, and y_m the label of pixel m.

G. Papandreou, L.-C. Chen, K. P. Murphy, A. L. Yuille: Weakly- and Semi-Supervised Learning of a Deep Convolutional Network for Semantic Image Segmentation. ICCV 2015
EM using Weak Annotations
• E-step: estimate the latent segmentation
ŷ = argmax_y log P(y | x; θ) + log P(z | y), where z is the weak (image-level) annotation
• M-step: maximize the log-likelihood log P(ŷ | x; θ) by SGD
• EM-Fixed: add a constant bias to the scores of the classes present in the image,
f'(y_m = l) = f(y_m = l | x; θ) + b_l, with b_l = b_fg if label l is present (b_bg for background) and b_l = 0 otherwise, then take ŷ_m = argmax_l f'(y_m = l)
• EM-Adapt: the biases are set adaptively; log P(z | y) takes the form of a cardinality potential.
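The EM-Fixed E-step can be sketched as follows; random scores stand in for the DCNN output, and the bias values and label set are toy choices, not the paper's settings.

```python
import numpy as np

rng = np.random.default_rng(5)
H, W, L = 3, 3, 5
f = rng.standard_normal((L, H, W))   # DCNN scores f(y_m = l | x; theta)
present = {0, 2}                     # background (0) plus one image label
b_fg, b_bg = 10.0, 8.0               # fixed biases (toy magnitudes)

# Add the fixed bias only to classes present in the image.
biased = f.copy()
for l in range(L):
    if l == 0:
        biased[l] += b_bg
    elif l in present:
        biased[l] += b_fg

y_hat = biased.argmax(axis=0)        # E-step: latent per-pixel labels

# Absent classes cannot win once present ones are strongly biased.
assert set(np.unique(y_hat)) <= present
```

The M-step then runs ordinary SGD treating ŷ as if it were ground truth.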
With image‐level class labels
With bounding box annotations
• Estimate pixel-level annotations from bounding boxes using a fully connected CRF
• Apply the EM-Fixed algorithm
With Hybrid Annotation
• Few segmentation annotations
• Many image-level class annotations