uc berkeley fully convolutional networks for semantic segmentation jonathan long* evan shelhamer*...

27
UC Berkeley Networks for Semantic Segmentation Jonathan Long* Evan Shelhamer* Trevor Darrell 1 Presented by: Gordon Christie ide credit: Jonathan Long

Upload: maria-cross

Post on 17-Jan-2016

250 views

Category:

Documents


2 download

TRANSCRIPT

Page 1: UC Berkeley Fully Convolutional Networks for Semantic Segmentation Jonathan Long* Evan Shelhamer* Trevor Darrell 1 Presented by: Gordon Christie Slide

UC Berkeley

Fully Convolutional Networksfor Semantic Segmentation

Jonathan Long* Evan Shelhamer* Trevor Darrell

1Presented by: Gordon Christie

Slide credit: Jonathan Long

Page 2: UC Berkeley Fully Convolutional Networks for Semantic Segmentation Jonathan Long* Evan Shelhamer* Trevor Darrell 1 Presented by: Gordon Christie Slide

Overview

• Reinterpret standard classification convnets as “Fully convolutional” networks (FCN) for semantic segmentation

• Use AlexNet, VGG, and GoogleNet in experiments• Novel architecture: combine information from different

layers for segmentation• State-of-the-art segmentation for PASCAL VOC

2011/2012, NYUDv2, and SIFT Flow at the time• Inference less than one fifth of a second for a typical

image

2Slide credit: Jonathan Long

Page 3: UC Berkeley Fully Convolutional Networks for Semantic Segmentation Jonathan Long* Evan Shelhamer* Trevor Darrell 1 Presented by: Gordon Christie Slide

pixels in, pixels outmonocular depth estimation (Liu et al. 2015)

boundary prediction (Xie & Tu 2015)

semanticsegmentation

3Slide credit: Jonathan Long

Page 4: UC Berkeley Fully Convolutional Networks for Semantic Segmentation Jonathan Long* Evan Shelhamer* Trevor Darrell 1 Presented by: Gordon Christie Slide

4

“tabby cat”

1000-dim vector

< 1 millisecond

convnets perform classification

end-to-end learning

Slide credit: Jonathan Long

Page 5: UC Berkeley Fully Convolutional Networks for Semantic Segmentation Jonathan Long* Evan Shelhamer* Trevor Darrell 1 Presented by: Gordon Christie Slide

R-CNN

5

many seconds

“cat”

“dog”

R-CNN does detection

Slide credit: Jonathan Long

Page 6: UC Berkeley Fully Convolutional Networks for Semantic Segmentation Jonathan Long* Evan Shelhamer* Trevor Darrell 1 Presented by: Gordon Christie Slide

6

R-CNN

figure: Girshick et al.

Slide credit: Jonathan Long

Page 7: UC Berkeley Fully Convolutional Networks for Semantic Segmentation Jonathan Long* Evan Shelhamer* Trevor Darrell 1 Presented by: Gordon Christie Slide

7

< 1/5 second

end-to-end learning

???

Slide credit: Jonathan Long

Page 8: UC Berkeley Fully Convolutional Networks for Semantic Segmentation Jonathan Long* Evan Shelhamer* Trevor Darrell 1 Presented by: Gordon Christie Slide

“tabby cat”

8

a classification network

Slide credit: Jonathan Long

Page 9: UC Berkeley Fully Convolutional Networks for Semantic Segmentation Jonathan Long* Evan Shelhamer* Trevor Darrell 1 Presented by: Gordon Christie Slide

9

becoming fully convolutional

Slide credit: Jonathan Long

Page 10: UC Berkeley Fully Convolutional Networks for Semantic Segmentation Jonathan Long* Evan Shelhamer* Trevor Darrell 1 Presented by: Gordon Christie Slide

10

becoming fully convolutional

Slide credit: Jonathan Long

Page 11: UC Berkeley Fully Convolutional Networks for Semantic Segmentation Jonathan Long* Evan Shelhamer* Trevor Darrell 1 Presented by: Gordon Christie Slide

11

upsampling output

Slide credit: Jonathan Long

Page 12: UC Berkeley Fully Convolutional Networks for Semantic Segmentation Jonathan Long* Evan Shelhamer* Trevor Darrell 1 Presented by: Gordon Christie Slide

conv, pool,nonlinearity

upsampling

pixelwiseoutput + loss

end-to-end, pixels-to-pixels network

12Slide credit: Jonathan Long

Page 13: UC Berkeley Fully Convolutional Networks for Semantic Segmentation Jonathan Long* Evan Shelhamer* Trevor Darrell 1 Presented by: Gordon Christie Slide

Dense Predictions

• Shift-and-stitch: trick that yields dense predictions without interpolation

• Upsampling via deconvolution• Shift-and-stitch used in preliminary experiments, but not

included in final model• Upsampling found to be more effective and efficient

13

Page 14: UC Berkeley Fully Convolutional Networks for Semantic Segmentation Jonathan Long* Evan Shelhamer* Trevor Darrell 1 Presented by: Gordon Christie Slide

Classifier to Dense FCN

• Convolutionalize proven classification architectures: AlexNet, VGG, and GoogLeNet (reimplementation)

• Remove classification layer and convert all fully connected layers to convolutions

• Append 1x1 convolution with channel dimensions and predict scores at each of the coarse output locations (21 categories + background for PASCAL)

14

Page 15: UC Berkeley Fully Convolutional Networks for Semantic Segmentation Jonathan Long* Evan Shelhamer* Trevor Darrell 1 Presented by: Gordon Christie Slide

Classifier to Dense FCN

Cast ILSVRC classifiers into FCNs and compare performance on validation set of PASCAL 2011

15

Page 16: UC Berkeley Fully Convolutional Networks for Semantic Segmentation Jonathan Long* Evan Shelhamer* Trevor Darrell 1 Presented by: Gordon Christie Slide

spectrum of deep features

combine where (local, shallow) with what (global, deep)

fuse features into deep jet

(cf. Hariharan et al. CVPR15 “hypercolumn”) 16Slide credit: Jonathan Long

Page 17: UC Berkeley Fully Convolutional Networks for Semantic Segmentation Jonathan Long* Evan Shelhamer* Trevor Darrell 1 Presented by: Gordon Christie Slide

skip layers

skip to fuse layers!

interp + sum

interp + sum

dense output 17

end-to-end, joint learningof semantics and location

Slide credit: Jonathan Long

Page 18: UC Berkeley Fully Convolutional Networks for Semantic Segmentation Jonathan Long* Evan Shelhamer* Trevor Darrell 1 Presented by: Gordon Christie Slide

skip layers

18

Page 19: UC Berkeley Fully Convolutional Networks for Semantic Segmentation Jonathan Long* Evan Shelhamer* Trevor Darrell 1 Presented by: Gordon Christie Slide

Comparison of skip FCNs

19

Results on subset of validation set of PASCAL VOC 2011

Page 20: UC Berkeley Fully Convolutional Networks for Semantic Segmentation Jonathan Long* Evan Shelhamer* Trevor Darrell 1 Presented by: Gordon Christie Slide

stride 32

no skips

stride 16

1 skip

stride 8

2 skips

ground truthinput image

skip layer refinement

20Slide credit: Jonathan Long

Page 21: UC Berkeley Fully Convolutional Networks for Semantic Segmentation Jonathan Long* Evan Shelhamer* Trevor Darrell 1 Presented by: Gordon Christie Slide

training + testing- train full image at a time without patch sampling - reshape network to take input of any size- forward time is ~150ms for 500 x 500 x 21 output

21Slide credit: Jonathan Long

Page 22: UC Berkeley Fully Convolutional Networks for Semantic Segmentation Jonathan Long* Evan Shelhamer* Trevor Darrell 1 Presented by: Gordon Christie Slide

Results – PASCAL VOC 2011/12

VOC 2011: 8498 training images (from additional labeled data

22

Page 23: UC Berkeley Fully Convolutional Networks for Semantic Segmentation Jonathan Long* Evan Shelhamer* Trevor Darrell 1 Presented by: Gordon Christie Slide

Results – NYUDv2

23

1449 RGB-D images with pixelwise labels 40 categories

Page 24: UC Berkeley Fully Convolutional Networks for Semantic Segmentation Jonathan Long* Evan Shelhamer* Trevor Darrell 1 Presented by: Gordon Christie Slide

Results – SIFT Flow2688 images with pixel labels

33 semantic categories, 3 geometric categoriesLearn both label spaces jointly

learning and inference have similar performance and computation as independent models

24

Page 25: UC Berkeley Fully Convolutional Networks for Semantic Segmentation Jonathan Long* Evan Shelhamer* Trevor Darrell 1 Presented by: Gordon Christie Slide

FCN SDS* Truth Input

25

Relative to prior state-of-the-art SDS:

­ 20% relative improvementfor mean IoU

­ 286× faster

*Simultaneous Detection and Segmentation Hariharan et al. ECCV14Slide credit: Jonathan Long

Page 26: UC Berkeley Fully Convolutional Networks for Semantic Segmentation Jonathan Long* Evan Shelhamer* Trevor Darrell 1 Presented by: Gordon Christie Slide

leaderboard

== segmentation with Caffe

26

FCNFCNFCNFCNFCNFCNFCNFCNFCNFCNFCN

FCNFCNFCN

FCN

Slide credit: Jonathan Long

Page 27: UC Berkeley Fully Convolutional Networks for Semantic Segmentation Jonathan Long* Evan Shelhamer* Trevor Darrell 1 Presented by: Gordon Christie Slide

conclusion

fully convolutional networks are fast, end-to-end models for pixelwise problems

- code in Caffe branch (merged soon)- models for PASCAL VOC, NYUDv2,

SIFT Flow, PASCAL-Context

27

caffe.berkeleyvision.org

github.com/BVLC/caffefcn.berkeleyvision.org

Slide credit: Jonathan Long