![Page 1: Deep Watershed Transform for Instance Segmentationon-demand.gputechconf.com/gtc/2017/presentation/s7588... · 2017-05-14 · Deep Watershed Transform for Instance Segmentation Min](https://reader034.vdocuments.net/reader034/viewer/2022050402/5f805e18e55f3158ae3f0d57/html5/thumbnails/1.jpg)
Deep Watershed Transform for Instance Segmentation
Min Bai & Raquel Urtasun
To appear at IEEE CVPR 2017 in Hawaii
Presented at NVIDIA GTC 2017
![Page 2: Deep Watershed Transform for Instance Segmentationon-demand.gputechconf.com/gtc/2017/presentation/s7588... · 2017-05-14 · Deep Watershed Transform for Instance Segmentation Min](https://reader034.vdocuments.net/reader034/viewer/2022050402/5f805e18e55f3158ae3f0d57/html5/thumbnails/2.jpg)
Semantic Segmentation
● Input: RGB Image
● Output at each pixel:
  ○ Semantic label
![Page 3: Deep Watershed Transform for Instance Segmentationon-demand.gputechconf.com/gtc/2017/presentation/s7588... · 2017-05-14 · Deep Watershed Transform for Instance Segmentation Min](https://reader034.vdocuments.net/reader034/viewer/2022050402/5f805e18e55f3158ae3f0d57/html5/thumbnails/3.jpg)
Instance Segmentation
● Input: RGB Image
● Output at each pixel:
  ○ Semantic label
  ○ Instance label
    ■ Same for each px in object
    ■ Different among objects
  ○ Difficulty: How to phrase the problem?
![Page 4: Deep Watershed Transform for Instance Segmentationon-demand.gputechconf.com/gtc/2017/presentation/s7588... · 2017-05-14 · Deep Watershed Transform for Instance Segmentation Min](https://reader034.vdocuments.net/reader034/viewer/2022050402/5f805e18e55f3158ae3f0d57/html5/thumbnails/4.jpg)
Applications
● Object tracking
Image credit: Davi Frossard
![Page 5: Deep Watershed Transform for Instance Segmentationon-demand.gputechconf.com/gtc/2017/presentation/s7588... · 2017-05-14 · Deep Watershed Transform for Instance Segmentation Min](https://reader034.vdocuments.net/reader034/viewer/2022050402/5f805e18e55f3158ae3f0d57/html5/thumbnails/5.jpg)
Applications
● Interacting with the environment
Image credit: http://www.rethinkrobotics.com/build-a-bot/
![Page 6: Deep Watershed Transform for Instance Segmentationon-demand.gputechconf.com/gtc/2017/presentation/s7588... · 2017-05-14 · Deep Watershed Transform for Instance Segmentation Min](https://reader034.vdocuments.net/reader034/viewer/2022050402/5f805e18e55f3158ae3f0d57/html5/thumbnails/6.jpg)
Applications
● Provides useful information for other algorithms, such as optical flow
Image credit: Shenlong Wang
![Page 7: Deep Watershed Transform for Instance Segmentationon-demand.gputechconf.com/gtc/2017/presentation/s7588... · 2017-05-14 · Deep Watershed Transform for Instance Segmentation Min](https://reader034.vdocuments.net/reader034/viewer/2022050402/5f805e18e55f3158ae3f0d57/html5/thumbnails/7.jpg)
Semantic Segmentation
● Semantic segmentation is a well-studied problem
  ○ Our instance segmentation method leverages an existing technique
  ○ H. Zhao et al., Pyramid Scene Parsing Network, https://arxiv.org/abs/1612.01105
Image credit: H. Zhao et al.
![Page 8: Deep Watershed Transform for Instance Segmentationon-demand.gputechconf.com/gtc/2017/presentation/s7588... · 2017-05-14 · Deep Watershed Transform for Instance Segmentation Min](https://reader034.vdocuments.net/reader034/viewer/2022050402/5f805e18e55f3158ae3f0d57/html5/thumbnails/8.jpg)
Watershed Transform
● Classical image segmentation technique
Image (left) credit: Adrian Fisher
![Page 9: Deep Watershed Transform for Instance Segmentationon-demand.gputechconf.com/gtc/2017/presentation/s7588... · 2017-05-14 · Deep Watershed Transform for Instance Segmentation Min](https://reader034.vdocuments.net/reader034/viewer/2022050402/5f805e18e55f3158ae3f0d57/html5/thumbnails/9.jpg)
Scalar Field and Gradient
Image source: Wikipedia, by Vivekj78 - Own work, CC BY-SA 3.0, https://commons.wikimedia.org/w/index.php?curid=15346899
● Scalar field: single number at each pixel
● Gradient: vector at each pixel, pointing in the direction of steepest ascent
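
The two definitions above can be illustrated with a small sketch. It assumes the scalar field is stored as a list of rows with at least two rows and two columns, and it approximates the gradient with finite differences; the function name `gradient` and the test field are hypothetical, chosen only for illustration.

```python
def gradient(field):
    """Finite-difference gradient of a 2D scalar field (list of rows).

    Returns a grid of (d/dx, d/dy) tuples. Central differences are used
    in the interior and one-sided differences at the borders.
    Assumes the field has at least 2 rows and 2 columns.
    """
    h, w = len(field), len(field[0])
    grad = []
    for y in range(h):
        row = []
        for x in range(w):
            # Clamp neighbour indices so border pixels use one-sided differences.
            x0, x1 = max(x - 1, 0), min(x + 1, w - 1)
            y0, y1 = max(y - 1, 0), min(y + 1, h - 1)
            gx = (field[y][x1] - field[y][x0]) / (x1 - x0)
            gy = (field[y1][x] - field[y0][x]) / (y1 - y0)
            row.append((gx, gy))
        grad.append(row)
    return grad


# Scalar field: one number per pixel, here f(x, y) = x + 2*y on a 4x4 grid.
field = [[x + 2 * y for x in range(4)] for y in range(4)]

# Gradient: one vector per pixel. For this linear field it is (1.0, 2.0)
# everywhere, i.e. the field rises fastest toward increasing x and y.
grad = gradient(field)
print(grad[0][0], grad[2][3])
```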
![Page 10: Deep Watershed Transform for Instance Segmentationon-demand.gputechconf.com/gtc/2017/presentation/s7588... · 2017-05-14 · Deep Watershed Transform for Instance Segmentation Min](https://reader034.vdocuments.net/reader034/viewer/2022050402/5f805e18e55f3158ae3f0d57/html5/thumbnails/10.jpg)
Overview of Approach
Figure panels: Input Image, Semantic Segmentation, Gradient of Energy Landscape, Energy Landscape, Predicted Instances
![Page 11: Deep Watershed Transform for Instance Segmentationon-demand.gputechconf.com/gtc/2017/presentation/s7588... · 2017-05-14 · Deep Watershed Transform for Instance Segmentation Min](https://reader034.vdocuments.net/reader034/viewer/2022050402/5f805e18e55f3158ae3f0d57/html5/thumbnails/11.jpg)
![Page 12: Deep Watershed Transform for Instance Segmentationon-demand.gputechconf.com/gtc/2017/presentation/s7588... · 2017-05-14 · Deep Watershed Transform for Instance Segmentation Min](https://reader034.vdocuments.net/reader034/viewer/2022050402/5f805e18e55f3158ae3f0d57/html5/thumbnails/12.jpg)
Why Predict Direction First?
Figure panels: Input Image, Energy Landscape, Direction of Gradient
Much sharper difference in the direction label at the boundary!
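
A one-dimensional toy example may make this concrete. The sketch below assumes the energy at a pixel is its distance to the nearest instance boundary and the direction label is the sign of the energy's slope; both are simplifying assumptions for illustration, and `energy_and_direction` is a hypothetical helper, not the network's actual targets.

```python
def energy_and_direction(labels):
    """For a 1D row of instance labels, return per-pixel energy and direction.

    energy[i]    = distance (in pixels) to the nearest pixel outside the instance
    direction[i] = +1 if energy increases to the right, -1 if to the left,
                   0 if the pixel is equally far from both boundaries
    """
    n = len(labels)
    energy, direction = [], []
    for i, lab in enumerate(labels):
        left = 1
        while i - left >= 0 and labels[i - left] == lab:
            left += 1
        right = 1
        while i + right < n and labels[i + right] == lab:
            right += 1
        energy.append(min(left, right))
        # Ascent points away from the nearer boundary.
        direction.append((left < right) - (left > right))
    return energy, direction


# Two touching objects of six pixels each; the boundary lies between pixels 5 and 6.
labels = [1] * 6 + [2] * 6
energy, direction = energy_and_direction(labels)

print(energy)     # [1, 2, 3, 3, 2, 1, 1, 2, 3, 3, 2, 1]
print(direction)  # [1, 1, 1, -1, -1, -1, 1, 1, 1, -1, -1, -1]

# Across the boundary the energy does not change at all,
# while the direction flips from -1 to +1.
energy_jump = abs(energy[6] - energy[5])           # 0
direction_jump = abs(direction[6] - direction[5])  # 2
```

In this toy row the energy gives no signal at the boundary between the two objects, whereas the direction label changes by the largest possible amount.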
![Page 13: Deep Watershed Transform for Instance Segmentationon-demand.gputechconf.com/gtc/2017/presentation/s7588... · 2017-05-14 · Deep Watershed Transform for Instance Segmentation Min](https://reader034.vdocuments.net/reader034/viewer/2022050402/5f805e18e55f3158ae3f0d57/html5/thumbnails/13.jpg)
Overall Network
![Page 14: Deep Watershed Transform for Instance Segmentationon-demand.gputechconf.com/gtc/2017/presentation/s7588... · 2017-05-14 · Deep Watershed Transform for Instance Segmentation Min](https://reader034.vdocuments.net/reader034/viewer/2022050402/5f805e18e55f3158ae3f0d57/html5/thumbnails/14.jpg)
Direction Prediction Network
Ground Truth Directions
Predicted Directions
Input Image
Semantic Segmentation
![Page 15: Deep Watershed Transform for Instance Segmentationon-demand.gputechconf.com/gtc/2017/presentation/s7588... · 2017-05-14 · Deep Watershed Transform for Instance Segmentation Min](https://reader034.vdocuments.net/reader034/viewer/2022050402/5f805e18e55f3158ae3f0d57/html5/thumbnails/15.jpg)
Energy Prediction Network
Ground Truth Energy
Predicted Energy
Ground Truth Instances
Predicted Instances
![Page 16: Deep Watershed Transform for Instance Segmentationon-demand.gputechconf.com/gtc/2017/presentation/s7588... · 2017-05-14 · Deep Watershed Transform for Instance Segmentation Min](https://reader034.vdocuments.net/reader034/viewer/2022050402/5f805e18e55f3158ae3f0d57/html5/thumbnails/16.jpg)
Training and Inference
● Pre-train both networks
● End-to-end fine-tuning
● Network trained on NVIDIA DGX-1
  ○ Approximately 25 hours total for training on one GP100 core
  ○ ~0.1s per image for forward pass
  ○ Thank you NVIDIA for the generous gift!
Image source: www.nvidia.com
![Page 17: Deep Watershed Transform for Instance Segmentationon-demand.gputechconf.com/gtc/2017/presentation/s7588... · 2017-05-14 · Deep Watershed Transform for Instance Segmentation Min](https://reader034.vdocuments.net/reader034/viewer/2022050402/5f805e18e55f3158ae3f0d57/html5/thumbnails/17.jpg)
Cityscapes Dataset
● 2975 training / 500 validation / 1525 testing images
● Instances: car, truck, bus, train, person, rider, motorcycle, bicycle
![Page 18: Deep Watershed Transform for Instance Segmentationon-demand.gputechconf.com/gtc/2017/presentation/s7588... · 2017-05-14 · Deep Watershed Transform for Instance Segmentation Min](https://reader034.vdocuments.net/reader034/viewer/2022050402/5f805e18e55f3158ae3f0d57/html5/thumbnails/18.jpg)
![Page 19: Deep Watershed Transform for Instance Segmentationon-demand.gputechconf.com/gtc/2017/presentation/s7588... · 2017-05-14 · Deep Watershed Transform for Instance Segmentation Min](https://reader034.vdocuments.net/reader034/viewer/2022050402/5f805e18e55f3158ae3f0d57/html5/thumbnails/19.jpg)
Cityscapes Instance Segmentation Leaderboard
* Average Precision (AP): higher is better
| Method | AP* | AP* @ 50% | AP* @ 50m | AP* @ 100m |
|---|---|---|---|---|
| van den Brand et al. | 2.3% | 3.7% | 3.9% | 4.9% |
| Cordts et al. | 4.6% | 12.9% | 7.7% | 10.3% |
| Uhrig et al. | 8.9% | 21.1% | 15.3% | 16.7% |
| Ours | 19.4% | 35.3% | 31.4% | 36.8% |
Recently, new approaches have achieved even higher performance.
![Page 20: Deep Watershed Transform for Instance Segmentationon-demand.gputechconf.com/gtc/2017/presentation/s7588... · 2017-05-14 · Deep Watershed Transform for Instance Segmentation Min](https://reader034.vdocuments.net/reader034/viewer/2022050402/5f805e18e55f3158ae3f0d57/html5/thumbnails/20.jpg)
Sample Output
Figure panels: Input RGB, Semantic Segmentation, Direction Prediction, Energy Prediction, Predicted Instances, Ground Truth Instances
![Page 21: Deep Watershed Transform for Instance Segmentationon-demand.gputechconf.com/gtc/2017/presentation/s7588... · 2017-05-14 · Deep Watershed Transform for Instance Segmentation Min](https://reader034.vdocuments.net/reader034/viewer/2022050402/5f805e18e55f3158ae3f0d57/html5/thumbnails/21.jpg)
Sample Output
Figure panels: Input RGB, Semantic Segmentation, Direction Prediction, Energy Prediction, Predicted Instances, Ground Truth Instances
![Page 22: Deep Watershed Transform for Instance Segmentationon-demand.gputechconf.com/gtc/2017/presentation/s7588... · 2017-05-14 · Deep Watershed Transform for Instance Segmentation Min](https://reader034.vdocuments.net/reader034/viewer/2022050402/5f805e18e55f3158ae3f0d57/html5/thumbnails/22.jpg)
Sample Output
Figure panels: Input RGB, Semantic Segmentation, Direction Prediction, Energy Prediction, Predicted Instances, Ground Truth Instances
![Page 23: Deep Watershed Transform for Instance Segmentationon-demand.gputechconf.com/gtc/2017/presentation/s7588... · 2017-05-14 · Deep Watershed Transform for Instance Segmentation Min](https://reader034.vdocuments.net/reader034/viewer/2022050402/5f805e18e55f3158ae3f0d57/html5/thumbnails/23.jpg)
Sample Output
Figure panels: Input RGB, Semantic Segmentation, Direction Prediction, Energy Prediction, Predicted Instances, Ground Truth Instances
![Page 24: Deep Watershed Transform for Instance Segmentationon-demand.gputechconf.com/gtc/2017/presentation/s7588... · 2017-05-14 · Deep Watershed Transform for Instance Segmentation Min](https://reader034.vdocuments.net/reader034/viewer/2022050402/5f805e18e55f3158ae3f0d57/html5/thumbnails/24.jpg)
Sample Output
Figure panels: Input RGB, Semantic Segmentation, Direction Prediction, Energy Prediction, Predicted Instances, Ground Truth Instances
![Page 25: Deep Watershed Transform for Instance Segmentationon-demand.gputechconf.com/gtc/2017/presentation/s7588... · 2017-05-14 · Deep Watershed Transform for Instance Segmentation Min](https://reader034.vdocuments.net/reader034/viewer/2022050402/5f805e18e55f3158ae3f0d57/html5/thumbnails/25.jpg)
Sample Output
Figure panels: Input RGB, Semantic Segmentation, Direction Prediction, Energy Prediction, Predicted Instances, Ground Truth Instances
![Page 26: Deep Watershed Transform for Instance Segmentationon-demand.gputechconf.com/gtc/2017/presentation/s7588... · 2017-05-14 · Deep Watershed Transform for Instance Segmentation Min](https://reader034.vdocuments.net/reader034/viewer/2022050402/5f805e18e55f3158ae3f0d57/html5/thumbnails/26.jpg)
Preliminary TorontoCity Aerial Instance Segmentation
Figure panels: Input RGB, Semantic Segmentation (ResNet), Predicted Building Instances
![Page 27: Deep Watershed Transform for Instance Segmentationon-demand.gputechconf.com/gtc/2017/presentation/s7588... · 2017-05-14 · Deep Watershed Transform for Instance Segmentation Min](https://reader034.vdocuments.net/reader034/viewer/2022050402/5f805e18e55f3158ae3f0d57/html5/thumbnails/27.jpg)
Preliminary TorontoCity Aerial Instance Segmentation
| Method | Weighted Coverage* | AP* | Recall* @ 50% | Precision* @ 50% |
|---|---|---|---|---|
| FCN-8 | 41.92% | 11.37% | 21.50% | 36.00% |
| ResNet-56 | 40.65% | 12.13% | 18.90% | 45.36% |
| Ours | 56.22% | 21.22% | 67.16% | 63.67% |
* higher is better
![Page 28: Deep Watershed Transform for Instance Segmentationon-demand.gputechconf.com/gtc/2017/presentation/s7588... · 2017-05-14 · Deep Watershed Transform for Instance Segmentation Min](https://reader034.vdocuments.net/reader034/viewer/2022050402/5f805e18e55f3158ae3f0d57/html5/thumbnails/28.jpg)
In Summary...
● Simple technique for instance segmentation
● Encodes object instances as energy map
● Predicts gradient direction as intermediate task for better supervision
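
As a rough illustration of how an energy map can encode instances, the sketch below cuts a toy energy map at a fixed level and labels the connected components that remain. It assumes energy is low near instance boundaries and higher toward object interiors; the function name `instances_from_energy`, the cut level, and the toy values are hypothetical and are not taken from the presentation.

```python
from collections import deque


def instances_from_energy(energy, level):
    """Label 4-connected components of pixels whose energy is >= level.

    Returns (labels, count): labels is a grid of ints where 0 means
    "below the cut" and 1..count are instance ids.
    """
    h, w = len(energy), len(energy[0])
    labels = [[0] * w for _ in range(h)]
    count = 0
    for sy in range(h):
        for sx in range(w):
            if energy[sy][sx] < level or labels[sy][sx]:
                continue
            # Start a new instance and flood-fill it breadth-first.
            count += 1
            labels[sy][sx] = count
            queue = deque([(sy, sx)])
            while queue:
                y, x = queue.popleft()
                for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                    if (0 <= ny < h and 0 <= nx < w
                            and energy[ny][nx] >= level
                            and not labels[ny][nx]):
                        labels[ny][nx] = count
                        queue.append((ny, nx))
    return labels, count


# Toy energy map: two touching objects in a single row of pixels.
energy = [[1, 2, 3, 2, 1, 1, 2, 3, 2, 1]]

_, merged = instances_from_energy(energy, level=1)   # objects stay connected
labels, split = instances_from_energy(energy, level=2)

print(merged)     # 1
print(split)      # 2
print(labels[0])  # [0, 1, 1, 1, 0, 0, 2, 2, 2, 0]
```

Cutting at level 1 keeps the two touching objects in a single component, while cutting at level 2 removes the low-energy boundary pixels and separates them. The pixels dropped by the cut would still need to be reassigned to an instance in a later step, which this sketch leaves out.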