Deep Learning for Computer Vision Fall 2020 http://vllab.ee.ntu.edu.tw/dlcv.html (Public website) https://cool.ntu.edu.tw/courses/3368 (NTU COOL; for grade, etc.) Yu-Chiang Frank Wang 王鈺強, Professor Dept. Electrical Engineering, National Taiwan University 2020/11/10


Page 1:

Deep Learning for Computer VisionFall 2020

http://vllab.ee.ntu.edu.tw/dlcv.html (Public website)

https://cool.ntu.edu.tw/courses/3368 (NTU COOL; for grade, etc.)

Yu-Chiang Frank Wang 王鈺強, Professor

Dept. Electrical Engineering, National Taiwan University

2020/11/10

Page 2:

Week Date Topic Remarks

1 9/15 Course Logistics

2 9/22 Machine Learning 101

3 9/29 Intro to Neural Networks; Convolutional Neural Network (I) HW #1 out

4 10/6 Convolutional Neural Network (II): Visualization & Extensions of CNN

5 10/13 Tutorials on Python, Github, etc. (by TAs) HW #1 due

6 10/20 Visualization of CNN (II); Object Detection & Segmentation HW #2 out

7 10/27 Image Segmentation; Generative Models

8 11/3 Generative Adversarial Network (GAN)

9 11/10 Transfer Learning for Visual Classification & Synthesis; Representation Disentanglement HW #2 due; HW #3 out

10 11/17 TBD (CVPR Week)

11 11/24 Recurrent Neural Networks & Transformer

12 12/1 Meta-Learning; Few-Shot and Zero-Shot Classification (I) HW #3 due

13 12/8 Meta-Learning; Few-Shot and Zero-Shot Classification (II) HW #4 out

14 12/15 From Domain Adaptation to Domain Generalization Team-up for Final Projects

15 12/22 Beyond 2D vision (3D and Depth)

16 12/29 Image Inpainting and Outpainting; Guest Lecture HW #4 due

17 1/5 Guest Lectures

1/18-22 Presentation for Final Projects TBD

Page 3:

What to Cover Today…

• Transfer Learning for Visual Classification & Synthesis
  • Visual Classification: Domain Adaptation & Adversarial Learning
  • Visual Synthesis: Style Transfer
• Representation Disentanglement
  • Supervised vs. unsupervised feature disentanglement

Many slides from Richard Turner, Fei-Fei Li, Yaser Sheikh, Simon Lucey, Kaiming He, and J.-B. Huang

Page 4:

Revisit of CNN for Visual Classification

LeCun & Ranzato, Deep Learning Tutorial, ICML 2013

Page 5:

(Traditional) Machine Learning vs. Transfer Learning

• Machine Learning: collecting/annotating data is typically expensive.

Image credit: A. Karpathy

Page 6:

(Traditional) Machine Learning vs. Transfer Learning (cont'd)

• Transfer Learning: improved learning & understanding in the (target) domain of interest, by leveraging knowledge from a different source domain.

Page 7:

Transfer Learning: What, When, and Why?

• A More Practical Example: self-driving cars

https://techcrunch.com/2017/02/08/udacity-open-sources-its-self-driving-car-simulator-for-anyone-to-use/
https://googleblog.blogspot.tw/2014/04/the-latest-chapter-for-self-driving-car.html

Page 8:

Transfer Learning

• Inductive Transfer Learning (labeled data are available in the target domain)
  • Case 1: no labeled data in the source domain → Self-taught Learning
  • Case 2: labeled data are available in the source domain, and source and target tasks are learned simultaneously → Multi-task Learning
• Transductive Transfer Learning (labeled data are available only in the source domain)
  • Different domains but a single task → Domain Adaptation
  • Single domain and single task assumed → Sample Selection Bias / Covariate Shift
• Unsupervised Transfer Learning (no labeled data in either the source or the target domain)

S. J. Pan and Q. Yang, "A survey on transfer learning," IEEE TKDE, 2010.

Page 9:

Domain Adaptation in Transfer Learning

• What's DA?
  • Leveraging info from source to target domains, so that the same learning task across domains can be addressed.
  • Typically all the source-domain data are labeled, while the target-domain data are partially labeled or fully unlabeled.
• Settings
  • Semi-supervised/unsupervised DA: few/no target-domain data are with labels.
  • Imbalanced DA: fewer classes of interest in the target domain.
  • Open/closed-set/universal DA: overlapping label space or not.
  • Homogeneous vs. heterogeneous DA: same/distinct feature types across domains.

Page 10:

Deep Feature is Sufficiently Promising

• DeCAF
  • Leveraging an auxiliary large dataset to train a CNN.
  • The resulting features exhibit sufficient representation ability.
  • Supporting results on Office+Caltech datasets, etc.

Donahue et al., DeCAF: A Deep Convolutional Activation Feature for Generic Visual Recognition, ICML 2014

Page 11:

Recent Deep Learning Methods for DA

• Deep Domain Confusion (DDC)

• Domain-Adversarial Training of Neural Networks (DANN)

• Adversarial Discriminative Domain Adaptation (ADDA)

• Domain Separation Network (DSN)

• Unsupervised Pixel-Level Domain Adaptation with Generative Adversarial Networks (PixelDA)

• No More Discrimination: Cross City Adaptation of Road Scene Segmenters


Method | Shared weights | Adaptation loss | Generative model
DDC | ✓ | MMD | ✘
DANN | ✓ | Adversarial | ✘
ADDA | ✘ | Adversarial | ✘
DSN | partially shared | MMD / Adversarial | ✘
PixelDA | ✘ | Adversarial | ✓

Page 12:

Deep Domain Confusion (DDC)

• Deep Domain Confusion: Maximizing for Domain Invariance
• Tzeng et al., arXiv:1412.3474, 2014

Page 13:

Deep Domain Confusion (DDC)

• Shared weights between the source and target CNN streams
• Minimize the classification loss, together with the MMD-based domain confusion loss
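DDC's adaptation loss is the maximum mean discrepancy (MMD) between source and target features. A minimal pure-Python sketch of the empirical (squared) MMD with an RBF kernel, using scalar stand-ins for CNN features (the kernel choice and `gamma` are illustrative assumptions, not the paper's exact setup):

```python
import math

def rbf(x, y, gamma=1.0):
    """RBF kernel between two scalar 'features' (gamma is illustrative)."""
    return math.exp(-gamma * (x - y) ** 2)

def mmd_squared(src, tgt, gamma=1.0):
    """Empirical squared MMD between source and target feature samples."""
    def mean_k(a, b):
        return sum(rbf(u, v, gamma) for u in a for v in b) / (len(a) * len(b))
    return mean_k(src, src) + mean_k(tgt, tgt) - 2.0 * mean_k(src, tgt)

aligned = mmd_squared([0.0, 0.1, -0.1], [0.05, -0.05, 0.0])
shifted = mmd_squared([0.0, 0.1, -0.1], [3.0, 3.1, 2.9])
print(aligned < shifted)  # True: matched domains give a smaller MMD
```

Minimizing this term alongside the classification loss pushes the two feature distributions together.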

Page 14:

Domain Confusion by Domain-Adversarial Training

• Domain-Adversarial Training of Neural Networks (DANN)
  • Y. Ganin et al., ICML 2015
  • Maximize domain confusion = maximize the domain classification loss
  • Minimize the source-domain data classification loss
  • The derived feature f can be viewed as a disentangled & domain-invariant feature.
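DANN implements "maximize domain confusion" with a gradient reversal layer between the feature extractor and the domain classifier: identity on the forward pass, gradient multiplied by -λ on the backward pass. A toy numeric sketch with a scalar feature and hand-written backprop (`LAMBDA` and the learning rate are illustrative):

```python
# Toy sketch of DANN's gradient reversal layer (GRL).
LAMBDA = 1.0  # illustrative trade-off weight

def grl_forward(feature):
    return feature  # identity: the domain classifier sees the feature as-is

def grl_backward(grad_from_domain_classifier, lam=LAMBDA):
    return -lam * grad_from_domain_classifier  # flip the gradient's sign

# One scalar "feature"; the domain classifier's gradient asks for +0.5:
f = 2.0
g = grl_backward(0.5)     # the feature extractor instead receives -0.5
f_updated = f - 0.1 * g   # gradient-descent step with lr = 0.1
print(f_updated)          # moved *against* the domain classifier's wish
```

Because the sign is flipped, the feature extractor is updated to increase the domain loss, i.e., toward domain-invariant features.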

Page 15:

Beyond Domain Confusion

• Domain Separation Network (DSN)
  • Bousmalis et al., NIPS 2016
  • Separate encoders for domain-invariant and domain-specific features
  • Private/common features are disentangled from each other.

Page 16:

Beyond Domain Confusion

• Domain Separation Network, NIPS 2016
• Example results: reconstructions of a source-domain image x_s and a target-domain image x_t
  • From private + shared features: D(E_c(x_s) + E_p(x_s))
  • From the shared feature only: D(E_c(x_s))
  • From the private feature only: D(E_p(x_s))

Page 17:

Beyond Domain Confusion

• Domain Separation Network, NIPS 2016
• More example results, for a source-domain image x_s and a target-domain image x_t

Page 18:

What to Cover Today…

• Transfer Learning for Visual Classification & Synthesis
  • Visual Classification: Domain Adaptation & Adversarial Learning
  • Visual Synthesis: Style Transfer
• Representation Disentanglement
  • Supervised vs. unsupervised feature disentanglement

Many slides from Richard Turner, Fei-Fei Li, Yaser Sheikh, Simon Lucey, Kaiming He, and J.-B. Huang

Page 19:

Transfer Learning for Manipulating Data?

• TL not only addresses cross-domain classification tasks; let's see how we can synthesize and manipulate data across domains.
• As computer vision people, let's focus on visual data in this lecture…

(Source domain → target domain)

Page 20:

Transfer Learning for Image Synthesis

• Cross-Domain Image Translation
  • Pix2pix (CVPR'17): pairwise cross-domain training data
  • CycleGAN/DualGAN/DiscoGAN: unpaired cross-domain training data
  • UNIT (NIPS'17): learning cross-domain image representation (with unpaired training data)
  • DTN (ICLR'17): learning cross-domain image representation (with unpaired training data)
  • Beyond image translation

Page 21:

A Super Brief Review of Generative Adversarial Networks (GAN)

• Architecture of GAN
• Loss (with generator G, discriminator D, noise x, and real data y):

\mathcal{L}_{GAN}(G, D) = \mathbb{E}[\log(1 - D(G(x)))] + \mathbb{E}[\log D(y)]

Goodfellow et al., Generative Adversarial Nets, NIPS, 2014
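The value function above can be checked numerically. A small pure-Python sketch with scalar discriminator outputs (the batches are made-up numbers):

```python
import math

def gan_value(d_fake, d_real):
    """V(G, D) = E[log(1 - D(G(x)))] + E[log D(y)] with scalar D outputs."""
    term_fake = sum(math.log(1.0 - d) for d in d_fake) / len(d_fake)
    term_real = sum(math.log(d) for d in d_real) / len(d_real)
    return term_fake + term_real

# A discriminator that is confident and correct:
strong_d = gan_value(d_fake=[0.1, 0.2], d_real=[0.9, 0.8])
# A fully confused discriminator (D = 0.5 everywhere), which is the
# state the generator drives D toward:
chance_d = gan_value(d_fake=[0.5, 0.5], d_real=[0.5, 0.5])
print(strong_d > chance_d)  # True: D maximizes V, G minimizes it
```

At the confusion point D = 0.5 everywhere, the value is 2 log(1/2), matching the GAN equilibrium analysis.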

Page 22:

Pix2pix

• Image-to-image translation with conditional adversarial networks (CVPR'17)
• Can be viewed as image style transfer (e.g., sketch → photo)

Isola et al., "Image-to-image translation with conditional adversarial networks." CVPR 2017.

Page 23:

Pix2pix

• Goal / Problem Setting
  • Image translation across two distinct domains (e.g., sketch vs. photo)
  • Pairwise training data
• Method: Conditional GAN (example: sketch → photo)
  • Generator: input is a sketch, output is a photo
  • Discriminator: input is the concatenation of the input (sketch) and the synthesized/real (photo) image; output is real or fake

Isola et al., "Image-to-image translation with conditional adversarial networks." CVPR 2017.

Page 24:

Pix2pix

• Learning the model

Conditional GAN loss (fake/generated vs. real, with concatenated inputs):

\mathcal{L}_{cGAN}(G, D) = \mathbb{E}_{x}[\log(1 - D(x, G(x)))] + \mathbb{E}_{x,y}[\log D(x, y)]

Reconstruction (L1) loss:

\mathcal{L}_{L1}(G) = \mathbb{E}_{x,y}[\|y - G(x)\|_1]

Overall objective function:

G^* = \arg\min_G \max_D \mathcal{L}_{cGAN}(G, D) + \mathcal{L}_{L1}(G)

Isola et al., "Image-to-image translation with conditional adversarial networks." CVPR 2017.
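In the paper the L1 term carries a weight λ (100 in the authors' experiments), which the slide omits. A toy sketch of the combined objective, treating images as short pixel lists (all numbers are made up):

```python
import math

def l1_loss(y, g_x):
    """L_L1(G) = E[|y - G(x)|_1] over pixel lists (toy 'images')."""
    return sum(abs(a - b) for a, b in zip(y, g_x)) / len(y)

def cgan_loss(d_fake, d_real):
    """L_cGAN(G, D) = E[log(1 - D(x, G(x)))] + E[log D(x, y)]."""
    return (sum(math.log(1.0 - d) for d in d_fake) / len(d_fake)
            + sum(math.log(d) for d in d_real) / len(d_real))

def pix2pix_objective(d_fake, d_real, y, g_x, lam=100.0):
    # lam weights the L1 term; 100 follows the paper's experiments.
    return cgan_loss(d_fake, d_real) + lam * l1_loss(y, g_x)

target = [0.2, 0.8, 0.5]   # the paired ground-truth photo
close = [0.25, 0.75, 0.5]  # generator output near the target
far = [0.9, 0.1, 0.0]      # generator output far from the target
print(pix2pix_objective([0.3], [0.7], target, close)
      < pix2pix_objective([0.3], [0.7], target, far))  # True
```

The L1 term is what exploits the paired data: it anchors G(x) to the specific ground-truth target, while the cGAN term keeps outputs sharp.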

Page 25:

Pix2pix


• Experiment results

Demo page: https://affinelayer.com/pixsrv/

Isola et al. " Image-to-image translation with conditional adversarial networks." CVPR 2017.

Page 26:

Transfer Learning for Image Synthesis

• Cross-Domain Image Translation
  • Pix2pix (CVPR'17): pairwise cross-domain training data
  • CycleGAN/DualGAN/DiscoGAN: unpaired cross-domain training data
  • UNIT (NIPS'17): learning cross-domain image representation (with unpaired training data)
  • DTN (ICLR'17): learning cross-domain image representation (with unpaired training data)
  • Beyond image translation

Page 27:

CycleGAN/DiscoGAN/DualGAN

• CycleGAN (ICCV'17): Unpaired Image-to-Image Translation using Cycle-Consistent Adversarial Networks
  • Unpaired training data: no 1-to-1 correspondence required (unlike the paired setting)
  • Easier to collect training data
  • More practical

Zhu et al., "Unpaired Image-To-Image Translation Using Cycle-Consistent Adversarial Networks." ICCV 2017.

Page 28:

CycleGAN

• Goal / Problem Setting
  • Image translation across two distinct domains
  • Unpaired training data (e.g., photos and paintings)
• Idea
  • Autoencoding-like image translation
  • Cycle consistency between the two domains: Photo → Painting → Photo, and Painting → Photo → Painting

Zhu et al., "Unpaired Image-To-Image Translation Using Cycle-Consistent Adversarial Networks." ICCV 2017.

Page 29:

CycleGAN

• Method (example: Photo & Painting), based on 2 GANs
  • First GAN (G1, D1): Photo → Painting; D1 judges generated vs. real paintings
  • Second GAN (G2, D2): Painting → Photo; D2 judges generated vs. real photos
• Cycle Consistency
  • Photo consistency
  • Painting consistency

Zhu et al., "Unpaired Image-To-Image Translation Using Cycle-Consistent Adversarial Networks." ICCV 2017.

Page 30:

CycleGAN

• Method (example: Photo vs. Painting), based on 2 GANs
  • First GAN (G1, D1): Photo → Painting
  • Second GAN (G2, D2): Painting → Photo
• Cycle Consistency
  • Photo consistency: Photo → (G1) → Painting → (G2) → Photo
  • Painting consistency: Painting → (G2) → Photo → (G1) → Painting

Zhu et al., "Unpaired Image-To-Image Translation Using Cycle-Consistent Adversarial Networks." ICCV 2017.

Page 31:

CycleGAN

• Learning: Adversarial Loss (photo x, painting y)

First GAN (G1, D1):

\mathcal{L}_{GAN}(G_1, D_1) = \mathbb{E}[\log(1 - D_1(G_1(x)))] + \mathbb{E}[\log D_1(y)]

Second GAN (G2, D2):

\mathcal{L}_{GAN}(G_2, D_2) = \mathbb{E}[\log(1 - D_2(G_2(y)))] + \mathbb{E}[\log D_2(x)]

Overall objective function:

G_1^*, G_2^* = \arg\min_{G_1, G_2} \max_{D_1, D_2} \mathcal{L}_{GAN}(G_1, D_1) + \mathcal{L}_{GAN}(G_2, D_2) + \mathcal{L}_{cyc}(G_1, G_2)

Zhu et al., "Unpaired Image-To-Image Translation Using Cycle-Consistent Adversarial Networks." ICCV 2017.

Page 32:

CycleGAN

• Learning: Consistency Loss (photo and painting consistency)

\mathcal{L}_{cyc}(G_1, G_2) = \mathbb{E}[\|G_2(G_1(x)) - x\|_1] + \mathbb{E}[\|G_1(G_2(y)) - y\|_1]

Overall objective function:

G_1^*, G_2^* = \arg\min_{G_1, G_2} \max_{D_1, D_2} \mathcal{L}_{GAN}(G_1, D_1) + \mathcal{L}_{GAN}(G_2, D_2) + \mathcal{L}_{cyc}(G_1, G_2)

Zhu et al., "Unpaired Image-To-Image Translation Using Cycle-Consistent Adversarial Networks." ICCV 2017.
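The cycle-consistency term is easy to sanity-check: if G2 exactly inverts G1, the loss is zero. A toy sketch with hypothetical stand-in generators (simple brightness shifts, not learned networks):

```python
def l1(a, b):
    return sum(abs(x - y) for x, y in zip(a, b)) / len(a)

# Hypothetical stand-ins for the two generators:
g1 = lambda img: [p + 0.25 for p in img]  # photo -> painting: brighten
g2 = lambda img: [p - 0.25 for p in img]  # painting -> photo: exact inverse

def cycle_loss(x_photo, y_painting):
    """L_cyc = E[|G2(G1(x)) - x|_1] + E[|G1(G2(y)) - y|_1]"""
    return l1(g2(g1(x_photo)), x_photo) + l1(g1(g2(y_painting)), y_painting)

# Perfect round trips give zero cycle loss:
print(cycle_loss([0.125, 0.5], [0.75, 0.25]))  # 0.0
```

In the unpaired setting this term replaces pix2pix's L1 loss: with no ground-truth target available, the round trip back to the input is the only reconstruction signal.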

Page 33:

CycleGAN

Zhu et al., "Unpaired Image-To-Image Translation Using Cycle-Consistent Adversarial Networks." ICCV 2017.

• Example results

Project Page: https://junyanz.github.io/CycleGAN/

Page 34:

Image Translation Using Unpaired Training Data

• CycleGAN (ICCV'17), DiscoGAN (ICML'17), and DualGAN (ICCV'17)

Zhu et al., "Unpaired Image-To-Image Translation Using Cycle-Consistent Adversarial Networks." ICCV 2017.
Kim et al., "Learning to Discover Cross-Domain Relations with Generative Adversarial Networks." ICML 2017.
Yi et al., "DualGAN: Unsupervised dual learning for image-to-image translation." ICCV 2017.

Page 35:

Transfer Learning for Image Synthesis

• Cross-Domain Image Translation
  • Pix2pix (CVPR'17): pairwise cross-domain training data
  • CycleGAN/DualGAN/DiscoGAN: unpaired cross-domain training data
  • UNIT (NIPS'17): learning cross-domain image representation (with unpaired training data)
  • DTN (ICLR'17): learning cross-domain image representation (with unpaired training data)
  • Beyond image translation

Page 36:

UNIT

• Unsupervised Image-to-Image Translation Networks (NIPS'17)
• Image translation via learning a cross-domain joint representation
  • Stage 1: encode images x_1 ∈ X_1 and x_2 ∈ X_2 into a joint latent space Z with shared code z
  • Stage 2: generate cross-domain images (e.g., Day ↔ Night) from the joint representation

Liu et al., "Unsupervised image-to-image translation networks." NIPS 2017.

Page 37:

UNIT

• Goal / Problem Setting
  • Image translation across two distinct domains
  • Unpaired training image data
• Idea
  • Based on two parallel VAE-GAN models
  • Learn a joint representation across image domains, then generate cross-domain images from it

Liu et al., "Unsupervised image-to-image translation networks." NIPS 2017.


Page 40:

Variational Autoencoder & Adversarial Losses

• Learning: two VAE-GAN branches (E_1, G_1, D_1) and (E_2, G_2, D_2) sharing the latent code z

VAE loss:

\mathcal{L}_{VAE}(E_1, G_1, E_2, G_2) = \mathbb{E}[\|G_1(E_1(x_1)) - x_1\|^2] + \mathbb{E}[KL(q_1(z) \,\|\, p(z))] + \mathbb{E}[\|G_2(E_2(x_2)) - x_2\|^2] + \mathbb{E}[KL(q_2(z) \,\|\, p(z))]

Adversarial loss (generated G_i(z) vs. real y_i):

\mathcal{L}_{GAN}(G_1, D_1, G_2, D_2) = \mathbb{E}[\log(1 - D_1(G_1(z)))] + \mathbb{E}[\log D_1(y_1)] + \mathbb{E}[\log(1 - D_2(G_2(z)))] + \mathbb{E}[\log D_2(y_2)]

Overall objective function:

G^* = \arg\min_G \max_D \mathcal{L}_{VAE}(E_1, G_1, E_2, G_2) + \mathcal{L}_{GAN}(G_1, D_1, G_2, D_2)
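With Gaussian encoder posteriors q_i(z) and a standard normal prior p(z), the KL terms in the VAE loss have a closed form. A minimal sketch for the 1-D case (the standard-normal prior is the usual VAE assumption, not spelled out on the slide):

```python
import math

def kl_to_standard_normal(mu, sigma):
    """Closed-form KL( N(mu, sigma^2) || N(0, 1) ): the KL(q_i(z) || p(z))
    term of the VAE loss for a 1-D Gaussian encoder posterior."""
    return 0.5 * (mu ** 2 + sigma ** 2 - 1.0 - math.log(sigma ** 2))

print(kl_to_standard_normal(0.0, 1.0))  # 0.0: encoder already matches prior
print(kl_to_standard_normal(2.0, 1.0))  # 2.0: penalized for drifting away
```

Because both encoders are pulled toward the same prior, their codes land in a shared latent space, which is what lets G_1 and G_2 decode the same z into either domain.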


Page 42:

UNIT

• Example results: Sunny → Rainy, Rainy → Sunny, real street-view → synthetic street-view, and synthetic street-view → real street-view

Liu et al., "Unsupervised image-to-image translation networks." NIPS 2017.
GitHub page: https://github.com/mingyuliutw/UNIT

Page 43:

Transfer Learning for Image Synthesis

• Cross-Domain Image Translation
  • Pix2pix (CVPR'17): pairwise cross-domain training data
  • CycleGAN/DualGAN/DiscoGAN: unpaired cross-domain training data
  • UNIT (NIPS'17): learning cross-domain image representation (with unpaired training data)
  • DTN (ICLR'17): learning cross-domain image representation (with unpaired training data)
  • Beyond image translation

Page 44:

Domain Transfer Networks

• Unsupervised Cross-Domain Image Generation (ICLR'17)
• Goal / Problem Setting
  • Image translation across two domains
  • One-way translation only
  • Unpaired training data
• Idea
  • Apply a unified model to learn a joint representation across domains.

Taigman et al., "Unsupervised cross-domain image generation." ICLR 2017.

Page 45:

Domain Transfer Networks

• Unsupervised Cross-Domain Image Generation (ICLR'17)
• Goal / Problem Setting
  • Image translation across two domains
  • One-way translation only
  • Unpaired training data
• Idea
  • Apply a unified model to learn a joint representation across domains.
  • Enforce consistency in both the image space and the feature space.

Taigman et al., "Unsupervised cross-domain image generation." ICLR 2017.

Page 46:

Domain Transfer Networks

• Learning, with G = {f, g}: f encodes an image into a feature, and g generates a target-domain image from that feature
  • Unified model to translate across domains
  • Consistency in the feature and image spaces
  • Adversarial loss

Image consistency loss (target-domain images y are fixed points of G):

\mathcal{L}_{img}(G) = \mathbb{E}[\|g(f(y)) - y\|^2]

Feature consistency loss (translation preserves the source image's f-features):

\mathcal{L}_{feat}(G) = \mathbb{E}[\|f(g(f(x))) - f(x)\|^2]

Adversarial loss (D treats both G(x) and G(y) as fake):

\mathcal{L}_{GAN}(G, D) = \mathbb{E}[\log(1 - D(G(x)))] + \mathbb{E}[\log(1 - D(G(y)))] + \mathbb{E}[\log D(y)]

Overall objective function:

G^* = \arg\min_G \max_D \mathcal{L}_{img}(G) + \mathcal{L}_{feat}(G) + \mathcal{L}_{GAN}(G, D)
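The two consistency terms can be sanity-checked with hypothetical stand-ins for f and g: when g exactly inverts f, both losses vanish. A toy pure-Python sketch (the linear maps are illustrative, not learned networks):

```python
# Hypothetical stand-ins for DTN's G = {f, g}: f maps an image to a
# feature, g decodes a feature back into a target-domain image.
f = lambda img: [p * 2 for p in img]    # illustrative "encoder"
g = lambda feat: [p / 2 for p in feat]  # "decoder", exact inverse of f here

def l2(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

def img_consistency(y):
    """L_img(G) = E[|g(f(y)) - y|^2]: target images should be fixed points."""
    return l2(g(f(y)), y)

def feat_consistency(x):
    """L_feat(G) = E[|f(g(f(x))) - f(x)|^2]: translation preserves f-features."""
    fx = f(x)
    return l2(f(g(fx)), fx)

print(img_consistency([0.25, 0.5]), feat_consistency([0.25, 0.5]))  # 0.0 0.0
```

Note the asymmetry: image consistency is enforced on target-domain samples y, feature consistency on source-domain samples x, which is what makes the translation one-way.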


Page 50:

DTN

• Example results: SVHN → MNIST and Photo → Emoji

Taigman et al., "Unsupervised cross-domain image generation." ICLR 2017.

Page 51:

What to Cover Today…

• Transfer Learning for Visual Classification & Synthesis
  • Visual Classification: Domain Adaptation & Adversarial Learning
  • Visual Synthesis: Style Transfer
• Representation Disentanglement
  • Supervised vs. unsupervised feature disentanglement

Many slides from Richard Turner, Fei-Fei Li, Yaser Sheikh, Simon Lucey, Kaiming He, and J.-B. Huang

Page 52:

Beyond Image Style Transfer: Learning Interpretable Deep Representations

• FaceApp: putting a smile on your face! (Input: Mr. Takeshi Kaneshiro)
• Deep learning for representation disentanglement
  • Interpretable deep feature representation

Page 53:

Recall: Generative Adversarial Networks (GAN)

• Architecture of GAN
• Loss (with generator G, discriminator D, noise x, and real data y):

\mathcal{L}_{GAN}(G, D) = \mathbb{E}[\log(1 - D(G(x)))] + \mathbb{E}[\log D(y)]

Goodfellow et al., Generative Adversarial Nets, NIPS, 2014

Page 54:

Representation Disentanglement

• Goal
  • Interpretable deep feature representation
  • Disentangle the attribute of interest c from the derived latent representation z
  • Possible solutions: VAE, GAN, or a mix of them…

(The generator G takes an uninterpretable latent feature z plus an interpretable factor c, e.g., season.)

Page 55:

Representation Disentanglement

• Goal
  • Interpretable deep feature representation
  • Disentangle the attribute of interest c from the derived latent representation z
• Supervised setting: from VAE to conditional VAE

Page 56:

Representation Disentanglement

• Conditional VAE
  • Given training data x and the attribute of interest c, we model the conditional distribution p_\theta(x|c).

https://zhuanlan.zhihu.com/p/25518643

Page 57:

Representation Disentanglement

• Conditional VAE: example results

Page 58:

Representation Disentanglement

• Conditional GAN
  • Interpretable latent factor c
  • Latent representation z

https://arxiv.org/abs/1411.1784

Page 59:

Representation Disentanglement

• Goal
  • Interpretable deep feature representation
  • Disentangle attribute of interest c from the derived latent representation z
• Unsupervised: InfoGAN
• Supervised: AC-GAN

InfoGAN: Chen et al., NIPS ’16
AC-GAN: Odena et al., ICML ’17

Chen et al., InfoGAN: Interpretable Representation Learning by Information Maximizing Generative Adversarial Nets. NIPS 2016.
Odena et al., Conditional Image Synthesis with Auxiliary Classifier GANs. ICML 2017.
59

Page 60:

AC-GAN

Odena et al., Conditional image synthesis with auxiliary classifier GANs. ICML’17

• Supervised Disentanglement

• Learning
  • Overall objective function:
    G* = arg min_G max_D L_GAN(G, D) + L_cls(G, D)
  • Adversarial loss:
    L_GAN(G, D) = E[log(1 − D(G(z, c)))] + E[log D(y)]
  • Disentanglement loss:
    L_cls(G, D) = E[−log D_cls(c′|y)] + E[−log D_cls(c|G(z, c))]
    (real data y is classified w.r.t. its domain label c′; generated data G(z, c) is classified w.r.t. the assigned label c)

(Figure: G maps latent z and label c to G(z, c); D distinguishes G(z, c) from real data y and also classifies the domain label.)

60
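As a sketch of how the two AC-GAN terms are computed, the NumPy snippet below evaluates L_GAN and L_cls on a toy batch (the probability arrays are made-up placeholders for the outputs of D and D_cls; no real networks are involved):

```python
import numpy as np

def acgan_losses(d_real, d_fake, p_cls_real_true, p_cls_fake_assigned):
    """AC-GAN objective terms on a toy batch of discriminator outputs.

    d_real / d_fake: D's probability that real data y / generated data
        G(z, c) is real.
    p_cls_real_true: probability D_cls assigns to the true label c' of
        each real sample.
    p_cls_fake_assigned: probability D_cls assigns to the label c that
        was fed to the generator.
    """
    # L_GAN(G, D) = E[log(1 - D(G(z, c)))] + E[log D(y)]
    l_gan = np.mean(np.log(1.0 - d_fake)) + np.mean(np.log(d_real))
    # L_cls(G, D) = E[-log D_cls(c'|y)] + E[-log D_cls(c|G(z, c))]
    l_cls = (np.mean(-np.log(p_cls_real_true))
             + np.mean(-np.log(p_cls_fake_assigned)))
    return l_gan, l_cls

# Toy batch: D is fairly confident on real data, unsure on fakes.
l_gan, l_cls = acgan_losses(
    d_real=np.array([0.9, 0.8]),
    d_fake=np.array([0.3, 0.4]),
    p_cls_real_true=np.array([0.7, 0.9]),
    p_cls_fake_assigned=np.array([0.6, 0.5]),
)
```

In training, D maximizes both terms while G minimizes the adversarial term and helps minimize the classification error on its own samples, which is what ties the assigned label c to the generated content.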

Page 61:

AC-GAN

Odena et al., Conditional image synthesis with auxiliary classifier GANs. ICML’17

• Supervised Disentanglement

(Figure: the supervised setup with G(z, c) vs. real y, and example samples generated for different c values.)

61

Page 62:

InfoGAN

• Unsupervised Disentanglement

• Learning
  • Overall objective function:
    G* = arg min_G max_D L_GAN(G, D) + L_cls(G, D)
  • Adversarial loss:
    L_GAN(G, D) = E[log(1 − D(G(z, c)))] + E[log D(y)]
  • Disentanglement loss (generated data only, w.r.t. the assigned label c):
    L_cls(G, D) = E[−log D_cls(c|G(z, c))]

Chen et al., InfoGAN: Interpretable Representation Learning by Information Maximizing Generative Adversarial Nets. NIPS 2016.
62
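The only change relative to the AC-GAN disentanglement term is that the real-data term drops out, since no labels are available: the code c is sampled by the model itself and merely has to be recoverable from the generated sample. A one-line sketch with toy probabilities:

```python
import numpy as np

def infogan_cls_loss(p_cls_fake_assigned):
    """InfoGAN disentanglement term: unlike AC-GAN, there is no
    E[-log D_cls(c'|y)] term on real data, because no labels exist.
    Only the sampled code c must be recovered from G(z, c)."""
    return np.mean(-np.log(p_cls_fake_assigned))

# Toy probabilities D_cls assigns to the sampled code for each sample.
loss = infogan_cls_loss(np.array([0.6, 0.5, 0.8]))
```

If D_cls recovered the code perfectly (probability 1 on every sample), this term would be exactly zero, which is why minimizing it encourages c to control some visible factor of the output.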

Page 63:

InfoGAN

• Unsupervised Disentanglement

Chen et al., InfoGAN: Interpretable representation learning by information maximizing generative adversarial nets., NIPS 2016.

• No guarantee of disentangling particular semantics: varying c may, e.g., end up controlling rotation angle or stroke width.

(Figure: samples generated with different c values at several points of the training process; loss vs. training time.)

63

Page 64:

What to Cover Today…
• Transfer Learning for Visual Classification & Synthesis
  • Visual Classification
    • Domain Adaptation & Adversarial Learning
  • Visual Synthesis
    • Style Transfer
• Representation Disentanglement
  • Supervised vs. unsupervised feature disentanglement
  • Joint style transfer & feature disentanglement

Many slides from Richard Turner, Fei-Fei Li, Yaser Sheikh, Simon Lucey, Kaiming He, and J.-B. Huang
64

Page 65:

StarGAN

• Goal
  • Unified GAN for multi-domain image-to-image translation

Choi et al. "StarGAN: Unified Generative Adversarial Networks for Multi-Domain Image-to-Image Translation." CVPR 2018

(Figure: traditional cross-domain models vs. the unified multi-domain model, StarGAN.)

65

Page 66:

StarGAN

• Goal
  • Unified GAN for multi-domain image-to-image translation

Choi et al. "StarGAN: Unified Generative Adversarial Networks for Multi-Domain Image-to-Image Translation." CVPR 2018

(Figure: traditional cross-domain models need one generator per domain pair, e.g., G_12, G_13, G_14, G_23, G_24, G_34, whereas StarGAN replaces them all with a single unified G.)

66

Page 67:

StarGAN

• Goal / Problem Setting
  • Single image translation model across multiple domains
  • Unpaired training data

67

Page 68:

StarGAN

• Goal / Problem Setting
  • Single image translation model across multiple domains
  • Unpaired training data
• Idea
  • Concatenate image and target domain label as input of the generator
  • Auxiliary domain classifier on the discriminator

(Figure: the target domain label is concatenated with the input image before being fed to G.)

68
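The label-concatenation idea can be sketched in a few lines of NumPy (shapes here are illustrative toy values): each entry of the one-hot target label is tiled into a constant H×W plane and stacked with the image channels, so the generator sees (C + num_domains) input channels.

```python
import numpy as np

def concat_label(image, target_label):
    """Concatenate an image with its target domain label, StarGAN-style.

    image: (C, H, W) array; target_label: (num_domains,) one-hot vector.
    Each label entry becomes a constant HxW plane stacked under the
    image channels, giving a (C + num_domains, H, W) generator input.
    """
    c, h, w = image.shape
    planes = np.broadcast_to(
        target_label[:, None, None], (target_label.shape[0], h, w)
    )
    return np.concatenate([image, planes], axis=0)

img = np.zeros((3, 4, 4))            # toy 3-channel 4x4 "image"
label = np.array([0.0, 1.0, 0.0])    # target domain 1 of 3
g_input = concat_label(img, label)
```

Because the label enters only as extra input channels, the same generator weights serve every source/target domain pair; changing the translation target is just changing which plane is set to one.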

Page 69:

StarGAN

• Goal / Problem Setting
  • Single image translation model across multiple domains
  • Unpaired training data
• Idea
  • Concatenate image and target domain label as input of the generator
  • Auxiliary domain classifier on the discriminator

69

Page 70:

StarGAN

• Goal / Problem Setting
  • Single image translation model across multiple domains
  • Unpaired training data
• Idea
  • Concatenate image and target domain label as input of the generator
  • Auxiliary domain classifier on the discriminator
  • Cycle consistency across domains

70

Page 71:

StarGAN
• Goal / Problem Setting
  • Single image translation model across multiple domains
  • Unpaired training data
• Idea
  • Auxiliary domain classifier on the discriminator
  • Concatenate image and target domain label as input
  • Cycle consistency across domains

71

Page 72:

StarGAN

• Learning
  • Overall objective function:
    G* = arg min_G max_D L_GAN(G, D) + L_cls(G, D) + L_cyc(G)

72

Page 73:

StarGAN

• Learning
  • Overall objective function:
    G* = arg min_G max_D L_GAN(G, D) + L_cls(G, D) + L_cyc(G)
  • Adversarial loss:
    L_GAN(G, D) = E[log(1 − D(G(x, c)))] + E[log D(y)]

(Figure: G translates input image x with target domain label c into G(x, c); D distinguishes G(x, c) from real data y.)

73

Page 74:

StarGAN

• Learning
  • Overall objective function:
    G* = arg min_G max_D L_GAN(G, D) + L_cls(G, D) + L_cyc(G)
  • Adversarial loss:
    L_GAN(G, D) = E[log(1 − D(G(x, c)))] + E[log D(y)]
  • Domain classification loss (disentanglement):
    L_cls(G, D) = E[−log D_cls(c′|y)] + E[−log D_cls(c|G(x, c))]

74

Page 75:

StarGAN

• Learning
  • Overall objective function:
    G* = arg min_G max_D L_GAN(G, D) + L_cls(G, D) + L_cyc(G)
  • Adversarial loss:
    L_GAN(G, D) = E[log(1 − D(G(x, c)))] + E[log D(y)]
  • Domain classification loss (disentanglement):
    L_cls(G, D) = E[−log D_cls(c′|y)] + E[−log D_cls(c|G(x, c))]
    (first term: real data y is classified w.r.t. its domain label c′ via D_cls(c′|y))

75

Page 76:

StarGAN

• Learning
  • Overall objective function:
    G* = arg min_G max_D L_GAN(G, D) + L_cls(G, D) + L_cyc(G)
  • Adversarial loss:
    L_GAN(G, D) = E[log(1 − D(G(x, c)))] + E[log D(y)]
  • Domain classification loss (disentanglement):
    L_cls(G, D) = E[−log D_cls(c′|y)] + E[−log D_cls(c|G(x, c))]
    (second term: generated data G(x, c) is classified w.r.t. the assigned label c via D_cls(c|G(x, c)))

76

Page 77:

StarGAN

• Learning
  • Overall objective function:
    G* = arg min_G max_D L_GAN(G, D) + L_cls(G, D) + L_cyc(G)
  • Adversarial loss:
    L_GAN(G, D) = E[log(1 − D(G(x, c)))] + E[log D(y)]
  • Domain classification loss (disentanglement):
    L_cls(G, D) = E[−log D_cls(c′|y)] + E[−log D_cls(c|G(x, c))]
  • Cycle consistency loss:
    L_cyc(G) = E[‖G(G(x, c), c_x) − x‖_1]
    (translate x to target domain c, then back using the original domain label c_x; the round trip G(G(x, c), c_x) should recover x)

77

Page 78:

StarGAN

• Learning
  • Overall objective function:
    G* = arg min_G max_D L_GAN(G, D) + L_cls(G, D) + L_cyc(G)
  • Adversarial loss:
    L_GAN(G, D) = E[log(1 − D(G(x, c)))] + E[log D(y)]
  • Domain classification loss:
    L_cls(G, D) = E[−log D_cls(c′|y)] + E[−log D_cls(c|G(x, c))]
  • Cycle consistency loss:
    L_cyc(G) = E[‖G(G(x, c), c_x) − x‖_1]

78
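Putting the three terms together, the toy NumPy sketch below evaluates the full StarGAN objective on made-up scalar outputs (in the paper the classification and reconstruction terms carry weights λ_cls and λ_rec; they are omitted, i.e., set to 1, here for brevity):

```python
import numpy as np

def cycle_loss(x, x_reconstructed):
    """L_cyc(G) = E[ || G(G(x, c), c_x) - x ||_1 ]: translating to the
    target domain c and back to the original domain c_x should recover x."""
    return np.mean(np.abs(x_reconstructed - x))

def stargan_objective(d_real, d_fake, p_cls_real_true, p_cls_fake_assigned,
                      x, x_cyc):
    """Sum of the three StarGAN terms on toy discriminator outputs
    (unweighted; the paper scales L_cls and L_cyc with lambdas)."""
    # L_GAN = E[log(1 - D(G(x, c)))] + E[log D(y)]
    l_gan = np.mean(np.log(1.0 - d_fake)) + np.mean(np.log(d_real))
    # L_cls = E[-log D_cls(c'|y)] + E[-log D_cls(c|G(x, c))]
    l_cls = (np.mean(-np.log(p_cls_real_true))
             + np.mean(-np.log(p_cls_fake_assigned)))
    l_cyc = cycle_loss(x, x_cyc)
    return l_gan + l_cls + l_cyc

x = np.array([0.2, 0.5, 0.8])
x_cyc = np.array([0.25, 0.45, 0.8])   # imperfect round-trip translation
total = stargan_objective(
    d_real=np.array([0.9]), d_fake=np.array([0.4]),
    p_cls_real_true=np.array([0.8]), p_cls_fake_assigned=np.array([0.7]),
    x=x, x_cyc=x_cyc,
)
```

Note that a perfect round trip makes L_cyc exactly zero, which is what lets the adversarial and classification terms change the attribute without destroying the rest of the image content.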

Page 79:

StarGAN
• Example results
• StarGAN can thus also be viewed as a representation disentanglement model, not merely an image-to-image translation one.

Choi et al. "StarGAN: Unified Generative Adversarial Networks for Multi-Domain Image-to-Image Translation." CVPR 2018


Github Page: https://github.com/yunjey/StarGAN

79

Page 80:

What to Cover Today…
• Transfer Learning for Visual Classification & Synthesis
  • Visual Classification
    • Domain Adaptation & Adversarial Learning
  • Visual Synthesis
    • Style Transfer
• Representation Disentanglement
  • Supervised vs. unsupervised feature disentanglement
  • Joint style transfer & feature disentanglement

Many slides from Richard Turner, Fei-Fei Li, Yaser Sheikh, Simon Lucey, Kaiming He, and J.-B. Huang
80

Page 81:

A Unified Feature Disentangler for Multi-Domain Image Translation and Manipulation

• Learning interpretable representations

81

Page 82:

A Unified Feature Disentangler for Multi-Domain Image Translation and Manipulation

• Learning interpretable representations

82

Page 83:

A Unified Feature Disentangler for Multi-Domain Image Translation and Manipulation

• Learning interpretable representations

83

Page 84:

Example Results

• Face image translation

84

Page 85:

Example Results

• Multi-attribute image translation

85

Page 86:

Next Week

86

Guest Lectures:
1. “The Paradigm Shift in AI”
   - 2:20pm–3:10pm
   - Dr. Trista Chen, Chief Scientist of Machine Learning, Inventec Corp.
2. “Sharing on AI Healthcare Entrepreneurship” (AI醫療創業分享)
   - 3:30pm–4:20pm
   - David Chou, Founder & CEO of Deep01