Image Segmentation and Visualization
2021. 6. 7.
Xiaolong Wang
This Class
• Naïve FCN model for Image Segmentation
• Transpose Convolution
• Skip Connection, Hypercolumn
• Visualizing Deep Networks
Slides partially from: http://cs231n.stanford.edu/ https://slazebni.cs.illinois.edu/fall20/
Segmentation Problem and FCN
The problem
[Figure: four recognition tasks: image classification, object detection, semantic segmentation, instance segmentation]
The problem
Semantic Segmentation
Simply staring at one pixel, it is impossible to do the classification
Let’s put in some context!
Semantic Segmentation
Time Consuming!
Can we process the whole image at one time?
AlexNet input: 227 x 227 x 3
AlexNet Conv5: 13 x 13 x 128
Output: 13 x 13 x 21
Output is too small!
Can we process the whole image at one time?
AlexNet input: 227 x 227 x 3
AlexNet Conv5: 13 x 13 x 128
Output: 227 x 227 x 21
FC + reshape
Huge number of parameters for the FC layer: 13 × 13 × 128 × 227 × 227 × 21 ≈ 2.3 × 10¹⁰
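A quick check of that count (the 13 × 13 × 128 and 227 × 227 × 21 shapes are taken from the slide):

```python
# Parameter count for a fully connected layer mapping the conv5
# feature map (13 x 13 x 128) to a per-pixel 21-class output (227 x 227 x 21).
in_features = 13 * 13 * 128        # flattened conv5 activations
out_features = 227 * 227 * 21      # one score per pixel per class
params = in_features * out_features
print(f"{params:,} parameters (~{params / 1e9:.1f} billion)")   # ~23.4 billion
```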
Fully Convolutional Network
Convolution at original image resolution has high computation cost.
Fully Convolutional Network
D₁ × H/4 × W/4 → D₂ × H/8 × W/8 → D₁ × H/4 × W/4
Making the feature map smaller increases the receptive field
Making the feature map larger again restores the resolution
Transpose Convolution
The FCN paper
Long et al. 2015
1 x 1 convolutions
The FCN paper
The upsampling
• Upsampling Layer
• Deconvolution Layer
• Transpose Convolution Layer
Transpose Convolution
Recall Convolution: 3 × 3 convolution with stride 1 and padding 1
Recall Convolution: 3 × 3 convolution with stride 2 and padding 1
Transpose Convolution: 3 × 3 transpose convolution, stride 2 and padding 1
Sum over the overlapping region
1D Transpose Convolution Example
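The 1D case can be sketched in a few lines of NumPy (a minimal illustration, not from the slides): each input element scatters a scaled copy of the filter into the output, and overlapping regions are summed.

```python
import numpy as np

def transpose_conv1d(z, w, stride=2):
    """1D transpose convolution: each input element scatters a scaled
    copy of the filter into the output; overlaps are summed."""
    k = len(w)
    out = np.zeros(stride * (len(z) - 1) + k)
    for i, zi in enumerate(z):
        out[i * stride : i * stride + k] += zi * w  # sum over overlapping region
    return out

z = np.array([1.0, 2.0])         # 2-element input
w = np.array([1.0, 1.0, 1.0])    # 3-tap filter
print(transpose_conv1d(z, w))    # [1. 1. 3. 2. 2.]: the 3 is the overlap sum
```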
2D Convolution: regular convolution (stride 1, pad 0), 4×4 input, 2×2 output

Input $x \in \mathbb{R}^{4\times 4}$, kernel $w \in \mathbb{R}^{3\times 3}$, output $z \in \mathbb{R}^{2\times 2}$: $z = x * w$.

Matrix-vector form: flatten $x$ row-wise into a 16-vector; then $z = Cx$, where

$$C = \begin{pmatrix}
w_{11} & w_{12} & w_{13} & 0 & w_{21} & w_{22} & w_{23} & 0 & w_{31} & w_{32} & w_{33} & 0 & 0 & 0 & 0 & 0 \\
0 & w_{11} & w_{12} & w_{13} & 0 & w_{21} & w_{22} & w_{23} & 0 & w_{31} & w_{32} & w_{33} & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & w_{11} & w_{12} & w_{13} & 0 & w_{21} & w_{22} & w_{23} & 0 & w_{31} & w_{32} & w_{33} & 0 \\
0 & 0 & 0 & 0 & 0 & w_{11} & w_{12} & w_{13} & 0 & w_{21} & w_{22} & w_{23} & 0 & w_{31} & w_{32} & w_{33}
\end{pmatrix}$$
Transpose Convolution: 2×2 input, 4×4 output

$x = C^{\top} z$: the $16 \times 4$ matrix $C^{\top}$ maps the flattened 2×2 input $z$ to a 16-vector, which is reshaped into the 4×4 output $x$.

Not an inverse of the original convolution operation, it simply reverses the dimension change!
Transpose Convolution

Reading off $x = C^{\top} z$ entry by entry (equivalently: convolve the input with the flipped filter):

$x_{11} = w_{11}z_{11}$
$x_{12} = w_{12}z_{11} + w_{11}z_{12}$
$x_{13} = w_{13}z_{11} + w_{12}z_{12}$
$x_{14} = w_{13}z_{12}$
$x_{21} = w_{21}z_{11} + w_{11}z_{21}$
$x_{22} = w_{22}z_{11} + w_{21}z_{12} + w_{12}z_{21} + w_{11}z_{22}$
$x_{23} = w_{23}z_{11} + w_{22}z_{12} + w_{13}z_{21} + w_{12}z_{22}$
$\;\vdots$

Convolve input with flipped filter
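The matrix view can be checked numerically. The sketch below (an illustration, using the same 4×4-input / 3×3-kernel setup as the slides) builds $C$, applies $C^{\top}$, and confirms the result equals a full convolution with the flipped filter:

```python
import numpy as np

def conv_matrix(w, in_h=4, in_w=4):
    """Build C so that z = C @ x.flatten() is a valid (stride 1, pad 0)
    convolution of a 4x4 input with the 3x3 kernel w."""
    kh, kw = w.shape
    out_h, out_w = in_h - kh + 1, in_w - kw + 1
    C = np.zeros((out_h * out_w, in_h * in_w))
    for oi in range(out_h):
        for oj in range(out_w):
            for ki in range(kh):
                for kj in range(kw):
                    C[oi * out_w + oj, (oi + ki) * in_w + (oj + kj)] = w[ki, kj]
    return C

rng = np.random.default_rng(0)
w = rng.standard_normal((3, 3))
z = rng.standard_normal((2, 2))   # input to the transpose convolution

# Transpose convolution as a matrix product: 4-vector in, 16-vector out
x_mat = (conv_matrix(w).T @ z.flatten()).reshape(4, 4)

# Same thing as a "full" convolution of z with the flipped filter
z_pad = np.pad(z, 2)              # zero-pad so the filter slides fully past z
w_flip = w[::-1, ::-1]
x_conv = np.array([[np.sum(z_pad[i:i + 3, j:j + 3] * w_flip)
                    for j in range(4)] for i in range(4)])

assert np.allclose(x_mat, x_conv)
print(x_mat[0, 0], w[0, 0] * z[0, 0])   # x11 = w11 * z11, as on the slide
```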
Transpose Convolution

[Figure: transpose convolution layers upsample in the decoder: D₁ × H/4 × W/4 → D₂ × H/8 × W/8 → D₁ × H/4 × W/4]
Advanced Techniques in Segmentation
U-Net
Ronneberger et al. 2015
U-Net
https://phillipi.github.io/pix2pix/
U-Net and FPN
Lin et al. 2017
Hypercolumn and Skip Connection
Hariharan et al. 2015
Hypercolumn and Skip Connection
Bansal et al. 2017
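A minimal sketch of the hypercolumn idea (toy, hypothetical sizes; nearest-neighbor upsampling is chosen here for simplicity, the papers' interpolation may differ):

```python
import numpy as np

def upsample_nn(f, factor):
    """Nearest-neighbor upsampling of a (C, H, W) feature map."""
    return np.kron(f, np.ones((1, factor, factor)))

def hypercolumn(feature_maps, out_size):
    """Upsample every feature map to the output resolution and stack
    them along the channel axis: one long descriptor per pixel."""
    cols = [upsample_nn(f, out_size // f.shape[-1]) for f in feature_maps]
    return np.concatenate(cols, axis=0)

# Toy coarse-to-fine pyramid for an 8x8 "image" (hypothetical sizes)
rng = np.random.default_rng(0)
f_deep = rng.random((16, 2, 2))     # deep layer: coarse, many channels
f_mid = rng.random((8, 4, 4))
f_shallow = rng.random((4, 8, 8))   # shallow layer: fine, few channels
hc = hypercolumn([f_deep, f_mid, f_shallow], out_size=8)
print(hc.shape)                     # (28, 8, 8): a 28-dim hypercolumn per pixel
```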
Dilated convolutions
• Idea: instead of reducing spatial resolution of feature maps, use a large sparse filter
[Figure: filters with dilation factor 1, 2, and 3]
Dilated convolutions
• Increase the receptive field
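A 1D sketch (illustrative, not from the slides) of how dilation spaces the filter taps apart, growing the receptive field without adding parameters:

```python
import numpy as np

def dilated_conv1d(x, w, dilation=1):
    """1D 'valid' cross-correlation with a dilated filter: taps are
    spaced `dilation` apart, so the receptive field grows without
    adding parameters."""
    k = len(w)
    span = dilation * (k - 1) + 1            # receptive field of one output
    return np.array([
        sum(w[j] * x[i + j * dilation] for j in range(k))
        for i in range(len(x) - span + 1)
    ])

x = np.arange(8.0)
w = np.array([1.0, 1.0, 1.0])
print(dilated_conv1d(x, w, dilation=1))      # receptive field 3
print(dilated_conv1d(x, w, dilation=2))      # receptive field 5
```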
Visualizing Deep Networks using "Saliency Maps"
CAM: Class Activation Maps
Zhou et al. 2016
https://www.youtube.com/watch?v=fZvOy0VXWAI&ab_channel=BoleiZhou
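A sketch of the CAM computation (toy, hypothetical sizes): the heatmap for a class is a weighted sum of the final conv feature maps, with the class's linear-classifier weights as the weighting.

```python
import numpy as np

def class_activation_map(features, fc_weights, class_idx):
    """CAM (Zhou et al. 2016): weight the final conv feature maps
    (C, H, W) by the linear classifier weights of one class. Assumes
    global average pooling followed by a single FC layer."""
    w = fc_weights[class_idx]                  # (C,)
    return np.tensordot(w, features, axes=1)   # (H, W) heatmap

# Toy, hypothetical sizes: 512 maps at 13x13, 21 classes
rng = np.random.default_rng(0)
features = rng.random((512, 13, 13))
fc_weights = rng.standard_normal((21, 512))
cam = class_activation_map(features, fc_weights, class_idx=3)
print(cam.shape)    # (13, 13); upsample to image size to overlay it
```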
Visualizing Deep Networks by maximizing activation
Visualization by optimization
• We can synthesize images that maximize activation of a given neuron.
• Find image $x$ maximizing target activation $f(x)$ subject to a natural-image regularization penalty $R(x)$:

$$x^* = \arg\max_x \; f(x) - \lambda R(x)$$
Visualization by optimization
• Maximize $f(x) - \lambda R(x)$
• $f(x)$ is the score for a category before the softmax
• $R(x)$ is an L2 regularizer
• Perform gradient ascent starting with a zero image; add the dataset mean to the result
Simonyan et al. 2014
Visualization by optimization
Compute the loss
Backpropagate the gradients, but do not update the network weights
Keep accumulating the gradients into the image under the regularizer $R(x)$
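The procedure can be sketched on a toy problem (a linear score $f(x) = w \cdot x$ is assumed here purely so the optimum is known in closed form):

```python
import numpy as np

# Gradient-ascent sketch of x* = argmax_x f(x) - lam * R(x).
# A linear score f(x) = w . x and R(x) = ||x||^2 are assumed so the
# optimum is known in closed form: x* = w / (2 * lam).
rng = np.random.default_rng(0)
w = rng.standard_normal(16)      # stands in for the class-score gradient
lam, lr = 0.5, 0.1

x = np.zeros(16)                 # start from a zero "image"
for _ in range(200):
    grad = w - 2 * lam * x       # gradient of f(x) - lam * ||x||^2
    x += lr * grad               # ascent step; network weights stay fixed

print(np.allclose(x, w / (2 * lam), atol=1e-4))   # True: converged
```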
Visualization by optimization
Google DeepDream: amplify one whole layer instead of just one neuron.
Choose an image and a layer in a CNN; repeat:
1. Forward: compute activations at the chosen layer
2. Set the gradient of the chosen layer equal to its activation (equivalent to maximizing $\sum_i f_i^2(x)$)
3. Backward: compute the gradient w.r.t. the image
4. Update the image (with some tricks)
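A toy sketch of that loop (a single linear layer $f(x) = Wx$ is assumed as a stand-in for the chosen CNN layer):

```python
import numpy as np

# DeepDream sketch on a toy one-layer "network" f(x) = W @ x.
# Setting the layer's gradient to its own activation backpropagates
# d/dx (0.5 * sum_i f_i^2) = W.T @ f, amplifying whatever fires most.
rng = np.random.default_rng(0)
W = rng.standard_normal((8, 16))
x = rng.standard_normal(16) * 0.1          # toy "image"
before = float(np.sum((W @ x) ** 2))       # sum_i f_i^2 at the start

for _ in range(10):
    f = W @ x                              # 1. forward to the chosen layer
    grad_layer = f                         # 2. gradient at layer := activation
    grad_x = W.T @ grad_layer              # 3. backward to the image
    x += 0.01 * grad_x / (np.abs(grad_x).mean() + 1e-8)   # 4. normalized step

after = float(np.sum((W @ x) ** 2))
print(before, "->", after)                 # the layer's response is amplified
```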
Next Class
Visualizing Deep Networks