Source: valser.org/webinar/slide/slides/20190529/2019.05.29 俞刚.pdf
Gang Yu
Megvii Research (旷视研究院)
Context For Semantic Segmentation
Collaborators
Chao Peng, Jingbo Wang, Changqian Yu, Changxin Gao, Xiangyu Zhang, Gang Yu, Jian Sun, Nong Sang
Outline
• Revisit Semantic Segmentation
• Context for Semantic Segmentation
  • Backbone
  • Head
  • Loss
• Conclusion
What is Semantic Segmentation?
• Classification + Localization
• Visual Recognition
  • Classification
  • Semantic Segmentation
  • Instance Segmentation
  • Panoptic Segmentation
  • Detection
  • Keypoint Detection
Pipeline
• Backbone: VGG16, ResNet, ResNeXt, …
• Head: U-Shape, 4/8-Sampling + Dilation, …
• Loss: Softmax, L2, …
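The Softmax loss at the end of the pipeline can be illustrated with a minimal sketch in plain Python. It assumes the head outputs one score per class for every pixel and that the loss is the mean softmax cross-entropy over labeled pixels. The function names and the `ignore_label=255` convention are assumptions made for this example, not details from the slides.

```python
import math

def softmax(logits):
    """Numerically stable softmax over one pixel's class scores."""
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]

def pixel_cross_entropy(logit_map, label_map, ignore_label=255):
    """Mean softmax cross-entropy over all labeled pixels.

    logit_map: H x W x C nested lists of class scores (the head's output).
    label_map: H x W nested lists of integer class ids.
    ignore_label: pixels with this id are skipped (assumed convention).
    """
    total, count = 0.0, 0
    for logit_row, label_row in zip(logit_map, label_map):
        for logits, label in zip(logit_row, label_row):
            if label == ignore_label:
                continue
            total += -math.log(softmax(logits)[label])
            count += 1
    return total / count if count else 0.0

# Toy 1 x 2 image with 3 classes.
logits = [[[2.0, 0.0, 0.0], [0.0, 0.0, 0.0]]]
labels = [[0, 2]]
loss = pixel_cross_entropy(logits, labels)
```

Because every pixel contributes its own classification term, the loss is what ties per-pixel accuracy to the classification ability of the backbone and head.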
Challenges in Semantic Segmentation?
• Speed
• Performance
  • Per-pixel Accuracy
  • Boundary
What is Context?
• According to the dictionary:
  • the parts of a discourse that surround a word or passage and can throw light on its meaning
[Figure: example image labeled Sports ball, Grass, Play Fields, Person]
Context in Backbone
• Motivation
  • Traditional Backbone is designed for Classification
    • Large Receptive field by compromising spatial resolution
  • Segmentation requires both Classification & Localization
    • Maintain both Receptive Field (context) & Spatial resolution
  • Computational cost?
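The trade-off between receptive field and spatial resolution can be made concrete with a small calculation. The sketch below uses the standard recurrence for the receptive field of a stack of convolutions. The layer configurations are hypothetical toy stacks chosen for illustration, not any backbone from the slides.

```python
def receptive_field(layers):
    """Receptive field and output stride of a stack of conv layers.

    layers: list of (kernel, stride, dilation) tuples, input to output.
    Returns (receptive_field, total_stride), both in input pixels.
    """
    rf, jump = 1, 1  # jump = distance between adjacent outputs, in input pixels
    for kernel, stride, dilation in layers:
        effective = dilation * (kernel - 1) + 1  # extent of the dilated kernel
        rf += (effective - 1) * jump
        jump *= stride
    return rf, jump

# Classification-style stack: four 3x3 convs, each with stride 2.
cls_style = [(3, 2, 1)] * 4

# Segmentation-style stack: the last two strides are removed, and the layer
# after the first removed stride is dilated to compensate.
seg_style = [(3, 2, 1), (3, 2, 1), (3, 1, 1), (3, 1, 2)]

# No downsampling at all: resolution is kept but the receptive field is small.
plain = [(3, 1, 1)] * 4

for name, cfg in [("cls", cls_style), ("seg", seg_style), ("plain", plain)]:
    print(name, receptive_field(cfg))
```

In this toy setting the dilated stack reaches the same receptive field as the strided one while keeping a 4x higher resolution along each axis. The later layers then run on larger feature maps, which is where the computational cost question comes from.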
Context in Backbone - BiSeNet
BiSeNet: Bilateral Segmentation Network for Real-time Semantic Segmentation, Changqian Yu, Jingbo Wang, Chao Peng, Changxin Gao, Gang Yu, Nong Sang, ECCV, 2018
• BiSeNet: Bilateral Segmentation Network
• Pipeline
• Results
• Ablation Results
• Speed
• Summary
  • Two paths in backbone: Spatial path + Context path
  • Context is implicitly encoded in receptive field
  • Efficient speed
  • Code: https://github.com/ycszen/TorchSeg
• Context:
  • A branch encodes semantic meaning with large receptive field?
• Related work:
  • ICNet for Real-Time Semantic Segmentation on High-Resolution Images, Hengshuang Zhao, Xiaojuan Qi, Xiaoyong Shen, Jianping Shi, Jiaya Jia, ECCV2018
  • Stacked Hourglass Networks for Human Pose Estimation, Alejandro Newell, Kaiyu Yang, Jia Deng, ECCV2016
Context in Head
• Motivation
  • Large Receptive field without compromising boundary results
• Why work on the Head?
  • Efficient speed
  • Obvious gain from increasing the receptive field
  • Simple to implement
Context in Head – Large Kernel
• Receptive Field vs Valid Receptive Field
Large Kernel Matters -- Improve Semantic Segmentation by Global Convolutional Network, Chao Peng, Xiangyu Zhang, Gang Yu, Guiming Luo, Jian Sun, CVPR, 2017
• Large Kernel Matters
  • Why Boundary Refinement?
    • Large receptive field will blur the object boundary
  • Ablation: Why Boundary Refinement?
  • Ablation: Different kernel size?
  • Ablation: Are more parameters helpful?
  • Ablation: GCN vs. Stack of small convolutions
  • Ablation: GCN in Backbone
  • Illustrative examples
• Summary
  • Global Convolutional Network to increase the receptive field
  • Large separable convolution is an efficient implementation
• Context
  • Large receptive field?
• Related work
  • PSPNet: Pyramid Scene Parsing Network, Hengshuang Zhao, Jianping Shi, Xiaojuan Qi, Xiaogang Wang, Jiaya Jia, CVPR2017
  • DeeplabV3: Rethinking Atrous Convolution for Semantic Image Segmentation, Liang-Chieh Chen, George Papandreou, Florian Schroff, Hartwig Adam
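Why a large separable convolution is an efficient implementation can be seen from a rough parameter count. The sketch below assumes the GCN block sums two branches, one applying k×1 then 1×k and the other 1×k then k×1, and compares that with a dense k×k convolution. Biases are ignored, and the channel sizes in the example are hypothetical values picked for illustration.

```python
def dense_kxk_params(k, c_in, c_out):
    """Weights of a single dense k x k convolution (bias ignored)."""
    return k * k * c_in * c_out

def gcn_params(k, c_in, c_out):
    """Weights of a two-branch separable block (bias ignored).

    Each branch is a pair of 1-D convolutions: the first maps
    c_in -> c_out, the second maps c_out -> c_out.
    """
    one_branch = k * c_in * c_out + k * c_out * c_out
    return 2 * one_branch

# Hypothetical setting: k = 15, 2048 input channels, 21 output classes.
k, c_in, c_out = 15, 2048, 21
dense = dense_kxk_params(k, c_in, c_out)
separable = gcn_params(k, c_in, c_out)
print(dense, separable, round(dense / separable, 1))
```

The dense count grows with k squared while the separable count grows linearly in k, so the gap widens as the kernel gets larger.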
Context in Head – DFN
• Motivation:
  • Large kernel (GCN) is computationally intensive
    • Global pooling is efficient to compute and can obtain the global context
  • Large receptive field does not equal good context
    • Attention strategy to adaptively aggregate the features
Learning a Discriminative Feature Network for Semantic Segmentation, Changqian Yu, Jingbo Wang, Chao Peng, Changxin Gao, Gang Yu, Nong Sang, CVPR, 2018
• DFN: Pipeline
• DFN: Ablation
• DFN: Results
• Summary
  • Global pooling is efficient and effective to capture the long-range context
  • Attention for adaptively adjusting feature weights
  • Code: https://github.com/ycszen/TorchSeg/
• Context
  • Receptive field & feature aggregation?
• Related work
  • Non-local Neural Networks, Xiaolong Wang, Ross Girshick, Abhinav Gupta, Kaiming He, CVPR2018
  • CCNet: Criss-Cross Attention for Semantic Segmentation, Zilong Huang, Xinggang Wang, Lichao Huang, Chang Huang, Yunchao Wei, Wenyu Liu
  • PSANet: Point-wise Spatial Attention Network for Scene Parsing, Hengshuang Zhao*, Yi Zhang*, Shu Liu, Jianping Shi, Chen Change Loy, Dahua Lin, Jiaya Jia, ECCV2018
  • OCNet: Object Context Network for Scene Parsing, Yuhui Yuan, Jingdong Wang
  • ParseNet: Looking Wider to See Better, Wei Liu, Andrew Rabinovich, Alexander C. Berg
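The combination of global pooling and attention can be sketched in a few lines of plain Python. This is a simplified illustration, not DFN's actual block: it pools each channel to a single value, passes the pooled vector through one linear map (standing in for a 1×1 convolution) and a sigmoid, and rescales the channels with the resulting gates. The function names, `weights`, and `biases` are hypothetical.

```python
import math

def global_average_pool(features):
    """features: C x H x W nested lists -> list of C channel means."""
    return [sum(sum(row) for row in ch) / (len(ch) * len(ch[0]))
            for ch in features]

def channel_attention(features, weights, biases):
    """Rescale each channel by a gate computed from global context.

    weights: C x C matrix and biases: length-C vector of the linear map
    applied to the pooled vector (assumed parameters for this sketch).
    """
    pooled = global_average_pool(features)
    gates = []
    for w_row, b in zip(weights, biases):
        z = sum(w * p for w, p in zip(w_row, pooled)) + b
        gates.append(1.0 / (1.0 + math.exp(-z)))  # sigmoid gate in (0, 1)
    return [[[g * v for v in row] for row in ch]
            for ch, g in zip(features, gates)]

# Toy feature map: 2 channels, 1 x 2 spatial size.
features = [[[1.0, 3.0]], [[2.0, 2.0]]]
weights = [[0.0, 0.0], [0.0, 0.0]]
biases = [0.0, 0.0]
attended = channel_attention(features, weights, biases)
```

Every gate depends on the pooled statistics of the whole feature map, so each channel is reweighted using global context at the cost of one pooling pass and one small linear map.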
Context in Loss
• Motivation
  • “Thing” may be important for stuff prediction
COCO2018 Panoptic Segmentation Challenge, http://presentations.cocodataset.org/ECCV18/COCO18-Panoptic-Megvii.pdf
[Figure: example image labeled Sports ball, Grass, Play Fields, Person]
[Figure: Multi Types Context diagram with Objects, Semantic, and Stuff branches; legend: Encoder, Res-Block, Train/Inference, Train Supervision, Inference Merge]
• Pipeline
• COCO2018 Panoptic Segmentation Challenge
[Figure: bar chart, Results of Stuff Regions on COCO2018 Panoptic Segmentation Validation Dataset. Metric: Mean IoU (%). Bar labels: Res50, +Encoder, +Extra Res Blocks, +Multi Context, +Huge Backbone, +Multi-Scale Flip Test. Values as extracted: 49.3, 49.6, 54.1, 54.5, 50.8]
Finally, we ensembled three models and achieved 55.9% mIoU on this dataset.
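For reference, the Mean IoU metric can be computed as in the minimal sketch below. It averages the per-class intersection over union and skips classes that appear in neither the prediction nor the ground truth. That skipping rule is one common convention and is an assumption here, since the slides do not spell out the challenge's exact evaluation protocol.

```python
def mean_iou(pred, target, num_classes):
    """Mean intersection-over-union between two H x W label maps."""
    inter = [0] * num_classes
    union = [0] * num_classes
    for p_row, t_row in zip(pred, target):
        for p, t in zip(p_row, t_row):
            if p == t:
                inter[p] += 1
                union[p] += 1
            else:
                # A mismatched pixel enlarges the union of both classes.
                union[p] += 1
                union[t] += 1
    ious = [i / u for i, u in zip(inter, union) if u > 0]
    return sum(ious) / len(ious) if ious else 0.0

# Toy 2 x 2 example with two classes.
pred = [[0, 0], [1, 1]]
target = [[0, 1], [1, 1]]
score = mean_iou(pred, target, num_classes=2)
```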
• Summary
  • “Thing” and “stuff” are complementary
  • Loss is a good approach to encode the context
    • Better feature representation
• Context
  • A loss to encode the semantic meaning?
• Related work
  • Context Encoding for Semantic Segmentation, Hang Zhang, Kristin Dana, Jianping Shi, Zhongyue Zhang, Xiaogang Wang, Ambrish Tyagi, Amit Agrawal, CVPR2018
Conclusion
• Context in different parts
  • Backbone, Head, Loss
• What is Context?
  • Large receptive field?
  • A semantic branch?
  • Spatial/feature aggregation?
• Future work
  • Explicitly show what is a context
  • Panoptic seg: Stuff vs Thing
Reference
• Pyramid Scene Parsing Network, Hengshuang Zhao, Jianping Shi, Xiaojuan Qi, Xiaogang Wang, Jiaya Jia, CVPR2017
• ICNet for Real-Time Semantic Segmentation on High-Resolution Images, Hengshuang Zhao, Xiaojuan Qi, Xiaoyong Shen, Jianping Shi, Jiaya Jia, ECCV2018
• Context Encoding for Semantic Segmentation, Hang Zhang, Kristin Dana, Jianping Shi, Zhongyue Zhang, Xiaogang Wang, Ambrish Tyagi, Amit Agrawal, CVPR2018
• Encoder-Decoder with Atrous Separable Convolution for Semantic Image Segmentation, Liang-Chieh Chen, Yukun Zhu, George Papandreou, Florian Schroff, Hartwig Adam, ECCV2018
• Large Kernel Matters -- Improve Semantic Segmentation by Global Convolutional Network, Chao Peng, Xiangyu Zhang, Gang Yu, Guiming Luo, Jian Sun, CVPR, 2017
• Learning a Discriminative Feature Network for Semantic Segmentation, Changqian Yu, Jingbo Wang, Chao Peng, Changxin Gao, Gang Yu, Nong Sang, CVPR, 2018
• BiSeNet: Bilateral Segmentation Network for Real-time Semantic Segmentation, Changqian Yu, Jingbo Wang, Chao Peng, Changxin Gao, Gang Yu, Nong Sang, ECCV, 2018