image domain transfer - mit deep learning...

68
Jan Kautz, VP of Learning and Perception Research Image Domain Transfer

Upload: others

Post on 04-Jun-2020

8 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: Image Domain Transfer - MIT Deep Learning 6.S191introtodeeplearning.com/2019/materials/2019_6S191_L9.pdfImage Domain Transfer : enabling machines to have human -like imagination abilities!"#$%&'()*+

Jan Kautz, VP of Learning and Perception Research

Image Domain Transfer

Page 2: Image Domain Transfer - MIT Deep Learning 6.S191introtodeeplearning.com/2019/materials/2019_6S191_L9.pdfImage Domain Transfer : enabling machines to have human -like imagination abilities!"#$%&'()*+

Image Domain Transfer: enabling machines to have human-like imagination abilities

Input image Domain transferred image

F

This image is generated by our method.

Page 3: Image Domain Transfer - MIT Deep Learning 6.S191introtodeeplearning.com/2019/materials/2019_6S191_L9.pdfImage Domain Transfer : enabling machines to have human -like imagination abilities!"#$%&'()*+

Example use cases

Low-res to high-res Blurry to sharp

Noisy to clean

LDR to HDR Thermal to color

Image to painting

Day to night Summer to winter

Synthetic to real

Page 4: Image Domain Transfer - MIT Deep Learning 6.S191introtodeeplearning.com/2019/materials/2019_6S191_L9.pdfImage Domain Transfer : enabling machines to have human -like imagination abilities!"#$%&'()*+

Two Approaches

• Example-based§ Non-parametric model§ The transfer function F is defined by an example image.

• Learning-based§ Parametric model§ The transfer function is learned via fitting a training dataset.

Input image Domain transferred image

F

F ( |Training Dataset)

F ( |xexample)

Page 5: Image Domain Transfer - MIT Deep Learning 6.S191introtodeeplearning.com/2019/materials/2019_6S191_L9.pdfImage Domain Transfer : enabling machines to have human -like imagination abilities!"#$%&'()*+

Example-based Image Domain Transfer

F ( |xexample)

Page 6: Image Domain Transfer - MIT Deep Learning 6.S191introtodeeplearning.com/2019/materials/2019_6S191_L9.pdfImage Domain Transfer : enabling machines to have human -like imagination abilities!"#$%&'()*+

Example-based image domain transfer

Stylized content photoContent photoStyle photo

xoutput = F (xinput|xexample)

Often referred to as Style Transfer

Page 7: Image Domain Transfer - MIT Deep Learning 6.S191introtodeeplearning.com/2019/materials/2019_6S191_L9.pdfImage Domain Transfer : enabling machines to have human -like imagination abilities!"#$%&'()*+

Example-based image domain transfer• Artistic Style Transfer• Content: real photo; • Style: painting• Gatys et. al. , Johnson

et. al., Li et al., Huang et. al.

• Photo Style Transfer• Content: real photo; • Style: real photo• Luan et. al., Pitie et.

al., Reinhard et. al.

Style (painting) Content Output

Style Content Output

Page 8: Image Domain Transfer - MIT Deep Learning 6.S191introtodeeplearning.com/2019/materials/2019_6S191_L9.pdfImage Domain Transfer : enabling machines to have human -like imagination abilities!"#$%&'()*+

FastPhotoStyle

• ”A Closed-form Solution to Photorealistic Image Stylization” by Yijun Li, Ming-Yu Liu, Xueting Li, Ming-Hsuan Yang, Jan Kautz,ECCV 2018• Code: https://github.com/NVIDIA/FastPhotoStyle

Page 9: Image Domain Transfer - MIT Deep Learning 6.S191introtodeeplearning.com/2019/materials/2019_6S191_L9.pdfImage Domain Transfer : enabling machines to have human -like imagination abilities!"#$%&'()*+

Fast Photo Style

Style

Content

Stylization step Photo smoothing step

We model photo style transfer as a close-form function mapping given by

Content image

Style image

Page 10: Image Domain Transfer - MIT Deep Learning 6.S191introtodeeplearning.com/2019/materials/2019_6S191_L9.pdfImage Domain Transfer : enabling machines to have human -like imagination abilities!"#$%&'()*+

Stylization Step

VGG19 up to conv4_1

Assumption: Covariance matrix of deep features encodes the style information.

HCHTC = EC⇤CE

TC

HSHTS = ES⇤SE

TS

Page 11: Image Domain Transfer - MIT Deep Learning 6.S191introtodeeplearning.com/2019/materials/2019_6S191_L9.pdfImage Domain Transfer : enabling machines to have human -like imagination abilities!"#$%&'()*+

Stylization StepDecoder

whitening coloring

Architecture is similar to Li et al., but uses unpooling.

Page 12: Image Domain Transfer - MIT Deep Learning 6.S191introtodeeplearning.com/2019/materials/2019_6S191_L9.pdfImage Domain Transfer : enabling machines to have human -like imagination abilities!"#$%&'()*+

When semantic label maps are availableContent Style

Page 13: Image Domain Transfer - MIT Deep Learning 6.S191introtodeeplearning.com/2019/materials/2019_6S191_L9.pdfImage Domain Transfer : enabling machines to have human -like imagination abilities!"#$%&'()*+

ContentStyle Output

Page 14: Image Domain Transfer - MIT Deep Learning 6.S191introtodeeplearning.com/2019/materials/2019_6S191_L9.pdfImage Domain Transfer : enabling machines to have human -like imagination abilities!"#$%&'()*+

Photo Smoothing

Assumption: If we can compute a new image where the image pixel values resemble those in the intermediate image but the similarities between neighboring pixels resemble those in the content image, then we have a photorealistic stylization outputs.

Style

Content

Stylization step Photo smoothing step

Intermediate image pixel values

Similarity between neighboring pixels(Gaussian/matting affinity)

R⇤ = (1� ↵)(I � ↵D� 12WD� 1

2 )�1YClose-form solution:

Page 15: Image Domain Transfer - MIT Deep Learning 6.S191introtodeeplearning.com/2019/materials/2019_6S191_L9.pdfImage Domain Transfer : enabling machines to have human -like imagination abilities!"#$%&'()*+

ContentStyle Output

Page 16: Image Domain Transfer - MIT Deep Learning 6.S191introtodeeplearning.com/2019/materials/2019_6S191_L9.pdfImage Domain Transfer : enabling machines to have human -like imagination abilities!"#$%&'()*+

ContentStyle Output

Page 17: Image Domain Transfer - MIT Deep Learning 6.S191introtodeeplearning.com/2019/materials/2019_6S191_L9.pdfImage Domain Transfer : enabling machines to have human -like imagination abilities!"#$%&'()*+

Comparison

Page 18: Image Domain Transfer - MIT Deep Learning 6.S191introtodeeplearning.com/2019/materials/2019_6S191_L9.pdfImage Domain Transfer : enabling machines to have human -like imagination abilities!"#$%&'()*+

Comparison

Page 19: Image Domain Transfer - MIT Deep Learning 6.S191introtodeeplearning.com/2019/materials/2019_6S191_L9.pdfImage Domain Transfer : enabling machines to have human -like imagination abilities!"#$%&'()*+

Quantitative results

Page 20: Image Domain Transfer - MIT Deep Learning 6.S191introtodeeplearning.com/2019/materials/2019_6S191_L9.pdfImage Domain Transfer : enabling machines to have human -like imagination abilities!"#$%&'()*+

Learning-based Image Domain Transfer

F ( |Training Dataset)

Page 21: Image Domain Transfer - MIT Deep Learning 6.S191introtodeeplearning.com/2019/materials/2019_6S191_L9.pdfImage Domain Transfer : enabling machines to have human -like imagination abilities!"#$%&'()*+

Generative Adversarial Networks (GANs)

Generator Discriminator

Discriminator

z

True

False

p(G(z)) ! p(X)

Page 22: Image Domain Transfer - MIT Deep Learning 6.S191introtodeeplearning.com/2019/materials/2019_6S191_L9.pdfImage Domain Transfer : enabling machines to have human -like imagination abilities!"#$%&'()*+

Supervised Unsupervised

Supervised vs Unsupervised

Page 23: Image Domain Transfer - MIT Deep Learning 6.S191introtodeeplearning.com/2019/materials/2019_6S191_L9.pdfImage Domain Transfer : enabling machines to have human -like imagination abilities!"#$%&'()*+

Unimodal vs Multimodal

F ( ) =

F ( ) =

Unimodal

Multimodal

p(Y |X) = �(F (X))

p(Y |X) = F (X,S)

Page 24: Image Domain Transfer - MIT Deep Learning 6.S191introtodeeplearning.com/2019/materials/2019_6S191_L9.pdfImage Domain Transfer : enabling machines to have human -like imagination abilities!"#$%&'()*+

Categorization

Supervised Unsupervised

Unimodal pix2pix, CRN, SRGAN UNIT, Coupled GAN, DTN, DiscoGAN, CycleGAN, DualGAN, StarGAN

Multimodal pix2pixHD, vid2vid, BiCycleGAN

MUNIT

Page 25: Image Domain Transfer - MIT Deep Learning 6.S191introtodeeplearning.com/2019/materials/2019_6S191_L9.pdfImage Domain Transfer : enabling machines to have human -like imagination abilities!"#$%&'()*+

Categorization

Supervised Unsupervised

Unimodal pix2pix, CRN, SRGAN UNIT, Coupled GAN, DTN, DiscoGAN, CycleGAN, DualGAN, StarGAN

Multimodal pix2pixHD, vid2vid, BiCycleGAN

MUNIT

Page 26: Image Domain Transfer - MIT Deep Learning 6.S191introtodeeplearning.com/2019/materials/2019_6S191_L9.pdfImage Domain Transfer : enabling machines to have human -like imagination abilities!"#$%&'()*+

pix2pixHD: Supervised and multimodalimage domain transfer• ”High-Resolution Image Synthesis and Semantic Manipulation with

Conditional GANs” by Ting-Chun Wang, Ming-Yu Liu, Jun-Yan Zhu, Andrew Tao, Jan Kautz, Bryan Catanzaro, CVPR 2018

• Code: https://github.com/NVIDIA/pix2pixHD

Page 27: Image Domain Transfer - MIT Deep Learning 6.S191introtodeeplearning.com/2019/materials/2019_6S191_L9.pdfImage Domain Transfer : enabling machines to have human -like imagination abilities!"#$%&'()*+

pix2pixHD

Page 28: Image Domain Transfer - MIT Deep Learning 6.S191introtodeeplearning.com/2019/materials/2019_6S191_L9.pdfImage Domain Transfer : enabling machines to have human -like imagination abilities!"#$%&'()*+

pix2pix

Generator Discriminator

True

False

Discriminator

p(G(S), S) ! p(X,S)

Page 29: Image Domain Transfer - MIT Deep Learning 6.S191introtodeeplearning.com/2019/materials/2019_6S191_L9.pdfImage Domain Transfer : enabling machines to have human -like imagination abilities!"#$%&'()*+

pix2pixHD

Generator Discriminator False

FeatureEncoder

Low-dimensional feature map

InstancePooling

Once we finish training, we extract the instance-pooled feature maps from all the training images and fit a mixture of Gaussian model to features belong to the same semantic class.

During inference, we sample from the mixture of Gaussian distributions for achieving multimodal image translation.

Page 30: Image Domain Transfer - MIT Deep Learning 6.S191introtodeeplearning.com/2019/materials/2019_6S191_L9.pdfImage Domain Transfer : enabling machines to have human -like imagination abilities!"#$%&'()*+

D3

D2

Multi-scale Discriminators Robust Objective(GAN + discriminator feature matching loss)

D1

Coarse-to-fine Generator

... Residual blocks

Residual blocks

...

2real synthesized

Di Di

real synthesized

match

match

real

Page 31: Image Domain Transfer - MIT Deep Learning 6.S191introtodeeplearning.com/2019/materials/2019_6S191_L9.pdfImage Domain Transfer : enabling machines to have human -like imagination abilities!"#$%&'()*+

pix2pixHD multimodal results

Page 32: Image Domain Transfer - MIT Deep Learning 6.S191introtodeeplearning.com/2019/materials/2019_6S191_L9.pdfImage Domain Transfer : enabling machines to have human -like imagination abilities!"#$%&'()*+

pix2pixHD label changes

Page 33: Image Domain Transfer - MIT Deep Learning 6.S191introtodeeplearning.com/2019/materials/2019_6S191_L9.pdfImage Domain Transfer : enabling machines to have human -like imagination abilities!"#$%&'()*+

Comparison

Page 34: Image Domain Transfer - MIT Deep Learning 6.S191introtodeeplearning.com/2019/materials/2019_6S191_L9.pdfImage Domain Transfer : enabling machines to have human -like imagination abilities!"#$%&'()*+

Categorization

Supervised Unsupervised

Unimodal pix2pix, CRN, SRGAN UNIT, Coupled GAN, DTN, DiscoGAN, CycleGAN, DualGAN, StarGAN

Multimodalpix2pixHD, vid2vid, BiCycleGAN

MUNIT

Page 35: Image Domain Transfer - MIT Deep Learning 6.S191introtodeeplearning.com/2019/materials/2019_6S191_L9.pdfImage Domain Transfer : enabling machines to have human -like imagination abilities!"#$%&'()*+

vid2vid: Video-to-Video Synthesis

• ”Video-to-Video Synthesis” by Ting-Chun Wang, Ming-Yu Liu, Jun-Yan Zhu, Guilin Liu, Andrew Tao, Jan Kautz, Bryan Catanzaro, NIPS 2018

• Code: https://github.com/NVIDIA/vid2vid

Page 36: Image Domain Transfer - MIT Deep Learning 6.S191introtodeeplearning.com/2019/materials/2019_6S191_L9.pdfImage Domain Transfer : enabling machines to have human -like imagination abilities!"#$%&'()*+

Motivation

Page 37: Image Domain Transfer - MIT Deep Learning 6.S191introtodeeplearning.com/2019/materials/2019_6S191_L9.pdfImage Domain Transfer : enabling machines to have human -like imagination abilities!"#$%&'()*+

Using pix2pixHD

Page 38: Image Domain Transfer - MIT Deep Learning 6.S191introtodeeplearning.com/2019/materials/2019_6S191_L9.pdfImage Domain Transfer : enabling machines to have human -like imagination abilities!"#$%&'()*+

...

Spatially progressive

Temporally progressive

Spatio-temporally Progressive Training

...

Residual blocks

Alternating training

T

T T

SS S

Image Discriminator Video Discriminator

D1

D2

D3

Final Image

Interm

ediate

Flow map

Sequential Generator

W

Warped

Semantic

maps

Past im

ages

Multi-scale Discriminators

Page 39: Image Domain Transfer - MIT Deep Learning 6.S191introtodeeplearning.com/2019/materials/2019_6S191_L9.pdfImage Domain Transfer : enabling machines to have human -like imagination abilities!"#$%&'()*+

Results

Page 40: Image Domain Transfer - MIT Deep Learning 6.S191introtodeeplearning.com/2019/materials/2019_6S191_L9.pdfImage Domain Transfer : enabling machines to have human -like imagination abilities!"#$%&'()*+

Results

Page 41: Image Domain Transfer - MIT Deep Learning 6.S191introtodeeplearning.com/2019/materials/2019_6S191_L9.pdfImage Domain Transfer : enabling machines to have human -like imagination abilities!"#$%&'()*+
Page 42: Image Domain Transfer - MIT Deep Learning 6.S191introtodeeplearning.com/2019/materials/2019_6S191_L9.pdfImage Domain Transfer : enabling machines to have human -like imagination abilities!"#$%&'()*+
Page 43: Image Domain Transfer - MIT Deep Learning 6.S191introtodeeplearning.com/2019/materials/2019_6S191_L9.pdfImage Domain Transfer : enabling machines to have human -like imagination abilities!"#$%&'()*+

AI Rendered Game

Page 44: Image Domain Transfer - MIT Deep Learning 6.S191introtodeeplearning.com/2019/materials/2019_6S191_L9.pdfImage Domain Transfer : enabling machines to have human -like imagination abilities!"#$%&'()*+

Categorization

Supervised Unsupervised

Unimodal pix2pix, CRN, SRGAN, … UNIT, Coupled GAN, DTN, DiscoGAN, CycleGAN, DualGAN, StarGAN

Multimodal pix2pixHD, vid2vid, BiCycleGAN

MUNIT

Page 45: Image Domain Transfer - MIT Deep Learning 6.S191introtodeeplearning.com/2019/materials/2019_6S191_L9.pdfImage Domain Transfer : enabling machines to have human -like imagination abilities!"#$%&'()*+

UNIT: Unsupervised and unimodal image domain transfer• ”Unsupervised Image-to-image Translation Networks” by

Ming-Yu Liu, Thomas Breuel, Jan Kautz, NIPS 2017• Code: https://github.com/mingyuliutw/UNIT

Page 46: Image Domain Transfer - MIT Deep Learning 6.S191introtodeeplearning.com/2019/materials/2019_6S191_L9.pdfImage Domain Transfer : enabling machines to have human -like imagination abilities!"#$%&'()*+

Supervised Unsupervised

Supervised vs Unsupervised

Page 47: Image Domain Transfer - MIT Deep Learning 6.S191introtodeeplearning.com/2019/materials/2019_6S191_L9.pdfImage Domain Transfer : enabling machines to have human -like imagination abilities!"#$%&'()*+

Generator 1 Discriminator 1

True

False

Discriminator 1

Generator 2 Discriminator 2

True

False

Discriminator 2

p(G1(X1)|X1) ! p(X2)

p(G2(X2)|X2) ! p(X1)

p(G1(X1)|X1) 9 p(X2|X1)But

p(G2(X2)|X2) 9 p(X1|X2)But

Page 48: Image Domain Transfer - MIT Deep Learning 6.S191introtodeeplearning.com/2019/materials/2019_6S191_L9.pdfImage Domain Transfer : enabling machines to have human -like imagination abilities!"#$%&'()*+

UNIT assumption: Shared Latent Space

Coupling the mapping

function via weight-sharing

Domain 2 GAN Discriminator

Page 49: Image Domain Transfer - MIT Deep Learning 6.S191introtodeeplearning.com/2019/materials/2019_6S191_L9.pdfImage Domain Transfer : enabling machines to have human -like imagination abilities!"#$%&'()*+

Encoder 1 Discriminator 1

True

False

Discriminator 1

Discriminator 2

True

False

Discriminator 2

Decoder 2

Encoder 2

Decoder 1

Shared latent space

Page 50: Image Domain Transfer - MIT Deep Learning 6.S191introtodeeplearning.com/2019/materials/2019_6S191_L9.pdfImage Domain Transfer : enabling machines to have human -like imagination abilities!"#$%&'()*+

Encoder 1

Decoder 2

Encoder 2

Decoder 1

Page 51: Image Domain Transfer - MIT Deep Learning 6.S191introtodeeplearning.com/2019/materials/2019_6S191_L9.pdfImage Domain Transfer : enabling machines to have human -like imagination abilities!"#$%&'()*+

Day to Night Translation

Input Translated Input Translated

Resolution640x480

Page 52: Image Domain Transfer - MIT Deep Learning 6.S191introtodeeplearning.com/2019/materials/2019_6S191_L9.pdfImage Domain Transfer : enabling machines to have human -like imagination abilities!"#$%&'()*+

Input Translated Input Translated

Snowy to Summery TranslationResolution640x480

Page 53: Image Domain Transfer - MIT Deep Learning 6.S191introtodeeplearning.com/2019/materials/2019_6S191_L9.pdfImage Domain Transfer : enabling machines to have human -like imagination abilities!"#$%&'()*+

Input Translated Input Translated

Sunny to Rainy TranslationResolution640x480

Page 54: Image Domain Transfer - MIT Deep Learning 6.S191introtodeeplearning.com/2019/materials/2019_6S191_L9.pdfImage Domain Transfer : enabling machines to have human -like imagination abilities!"#$%&'()*+

Categorization

Supervised Unsupervised

Unimodal pix2pix, CRN, SRGAN UNIT, Coupled GAN, DTN, DiscoGAN, CycleGAN, DualGAN, StarGAN

Multimodal pix2pixHD, BiCycleGAN MUNIT

Page 55: Image Domain Transfer - MIT Deep Learning 6.S191introtodeeplearning.com/2019/materials/2019_6S191_L9.pdfImage Domain Transfer : enabling machines to have human -like imagination abilities!"#$%&'()*+

MUNIT: Unsupervised and multimodalimage domain transfer• ”Multimodal Unsupervised Image-to-image Translation” by

Xun Huang, Ming-Yu Liu, Serge Belongie, Jan Kautz, ECCV 2018

• Code: https://github.com/NVlabs/MUNIT

Page 56: Image Domain Transfer - MIT Deep Learning 6.S191introtodeeplearning.com/2019/materials/2019_6S191_L9.pdfImage Domain Transfer : enabling machines to have human -like imagination abilities!"#$%&'()*+

MUNIT assumption: Partially Shared Latent Space

Page 57: Image Domain Transfer - MIT Deep Learning 6.S191introtodeeplearning.com/2019/materials/2019_6S191_L9.pdfImage Domain Transfer : enabling machines to have human -like imagination abilities!"#$%&'()*+

Content Encoder

1Discriminator 1

True

False

Discriminator 1

Discriminator 2

True

False

Discriminator 2

Decoder 2

Content Encoder

2

Decoder 1

Shared contentlatent space

Style Encoder

2

Style Encoder

1

Page 58: Image Domain Transfer - MIT Deep Learning 6.S191introtodeeplearning.com/2019/materials/2019_6S191_L9.pdfImage Domain Transfer : enabling machines to have human -like imagination abilities!"#$%&'()*+

MUNIT network

Page 59: Image Domain Transfer - MIT Deep Learning 6.S191introtodeeplearning.com/2019/materials/2019_6S191_L9.pdfImage Domain Transfer : enabling machines to have human -like imagination abilities!"#$%&'()*+

MUNIT results

Page 60: Image Domain Transfer - MIT Deep Learning 6.S191introtodeeplearning.com/2019/materials/2019_6S191_L9.pdfImage Domain Transfer : enabling machines to have human -like imagination abilities!"#$%&'()*+
Page 61: Image Domain Transfer - MIT Deep Learning 6.S191introtodeeplearning.com/2019/materials/2019_6S191_L9.pdfImage Domain Transfer : enabling machines to have human -like imagination abilities!"#$%&'()*+
Page 62: Image Domain Transfer - MIT Deep Learning 6.S191introtodeeplearning.com/2019/materials/2019_6S191_L9.pdfImage Domain Transfer : enabling machines to have human -like imagination abilities!"#$%&'()*+
Page 63: Image Domain Transfer - MIT Deep Learning 6.S191introtodeeplearning.com/2019/materials/2019_6S191_L9.pdfImage Domain Transfer : enabling machines to have human -like imagination abilities!"#$%&'()*+

MUNIT Style TransferContent Encoder

1

Decoder 2

Style Encoder

2

Page 64: Image Domain Transfer - MIT Deep Learning 6.S191introtodeeplearning.com/2019/materials/2019_6S191_L9.pdfImage Domain Transfer : enabling machines to have human -like imagination abilities!"#$%&'()*+
Page 65: Image Domain Transfer - MIT Deep Learning 6.S191introtodeeplearning.com/2019/materials/2019_6S191_L9.pdfImage Domain Transfer : enabling machines to have human -like imagination abilities!"#$%&'()*+

Styl

e T

ransf

er

Com

par

ison

Page 66: Image Domain Transfer - MIT Deep Learning 6.S191introtodeeplearning.com/2019/materials/2019_6S191_L9.pdfImage Domain Transfer : enabling machines to have human -like imagination abilities!"#$%&'()*+

Conclusion• Example-based image domain transfer• Learning-based image domain transfer

Learning-based Unimodal MultimodalSupervised

Unsupervised

Page 67: Image Domain Transfer - MIT Deep Learning 6.S191introtodeeplearning.com/2019/materials/2019_6S191_L9.pdfImage Domain Transfer : enabling machines to have human -like imagination abilities!"#$%&'()*+

77

100 Best Companies to Work For— Fortune

Employees’ Choice: Highest Rated CEOs— Glassdoor

Most Innovative Companies— Fast Company

World’s Most Admired Companies— Fortune

50 Smartest Companies— MIT Tech Review

JOIN NVIDIA

YOUR LIFE’S WORK STARTS HERE

INTERESTED? Email: [email protected]

Page 68: Image Domain Transfer - MIT Deep Learning 6.S191introtodeeplearning.com/2019/materials/2019_6S191_L9.pdfImage Domain Transfer : enabling machines to have human -like imagination abilities!"#$%&'()*+