Deep Learning for Computer Vision Fall 2020 http://vllab.ee.ntu.edu.tw/dlcv.html (Public website) https://cool.ntu.edu.tw/courses/3368 (NTU COOL; for grade, etc.) Yu-Chiang Frank Wang 王鈺強, Professor Dept. Electrical Engineering, National Taiwan University 2020/11/10


Page 1:

Deep Learning for Computer VisionFall 2020

http://vllab.ee.ntu.edu.tw/dlcv.html (Public website)

https://cool.ntu.edu.tw/courses/3368 (NTU COOL; for grade, etc.)

Yu-Chiang Frank Wang 王鈺強, Professor

Dept. Electrical Engineering, National Taiwan University

2020/11/10

Page 2:

Week Date Topic Remarks

1 9/15 Course Logistics

2 9/22 Machine Learning 101

3 9/29 Intro to Neural Networks; Convolutional Neural Network (I) HW #1 out

4 10/6 Convolutional Neural Network (II): Visualization & Extensions of CNN

5 10/13 Tutorials on Python, Github, etc. (by TAs) HW #1 due

6 10/20 Visualization of CNN (II); Object Detection & Segmentation HW #2 out

7 10/27 Image Segmentation; Generative Models

8 11/3 Generative Adversarial Network (GAN)

9 11/10 Transfer Learning for Visual Classification & Synthesis; Representation Disentanglement HW #2 due; HW #3 out

10 11/17 TBD (CVPR Week)

11 11/24 Recurrent Neural Networks & Transformer

12 12/1 Meta-Learning; Few-Shot and Zero-Shot Classification (I) HW #3 due

13 12/8 Meta-Learning; Few-Shot and Zero-Shot Classification (II) HW #4 out

14 12/15 From Domain Adaptation to Domain Generalization Team-up for Final Projects

15 12/22 Beyond 2D vision (3D and Depth)

16 12/29 Image Inpainting and Outpainting; Guest Lecture HW #4 due

17 1/5 Guest Lectures

1/18-22 Presentation for Final Projects TBD

Page 3:

What to Cover Today…

• Transfer Learning for Visual Classification & Synthesis
  • Visual Classification: Domain Adaptation & Adversarial Learning
  • Visual Synthesis: Style Transfer
• Representation Disentanglement
  • Supervised vs. unsupervised feature disentanglement

Many slides from Richard Turner, Fei-Fei Li, Yaser Sheikh, Simon Lucey, Kaiming He, and J.-B. Huang

Page 4:

Revisit of CNN for Visual Classification

LeCun & Ranzato, Deep Learning Tutorial, ICML 2013

Page 5:

(Traditional) Machine Learning vs. Transfer Learning

• Machine Learning: collecting/annotating data is typically expensive.

Image credit: A. Karpathy

Page 6:

(Traditional) Machine Learning vs. Transfer Learning (cont'd)

• Transfer Learning: improved learning & understanding in the (target) domain of interest, by leveraging knowledge from a different source domain.

Page 7:

Transfer Learning: What, When, and Why?

• A More Practical Example: self-driving cars

https://techcrunch.com/2017/02/08/udacity-open-sources-its-self-driving-car-simulator-for-anyone-to-use/
https://googleblog.blogspot.tw/2014/04/the-latest-chapter-for-self-driving-car.html

Page 8:

Transfer Learning

• Inductive Transfer Learning (labeled data are available in the target domain)
  • Case 1: no labeled data in the source domain → Self-taught Learning
  • Case 2: labeled data are available in the source domain, and source and target tasks are learned simultaneously → Multi-task Learning
• Transductive Transfer Learning (labeled data are available only in the source domain)
  • Different domains but a single task → Domain Adaptation
  • Single domain and single task assumed → Sample Selection Bias / Covariate Shift
• Unsupervised Transfer Learning (no labeled data in either the source or the target domain)

S. J. Pan and Q. Yang, "A survey on transfer learning," IEEE TKDE, 2010.

Page 9:

Domain Adaptation in Transfer Learning

• What's DA?
  • Leveraging info from source to target domains, so that the same learning task across domains can be addressed.
  • Typically all the source-domain data are labeled, while the target-domain data are partially labeled or fully unlabeled.
• Settings
  • Semi-supervised/unsupervised DA: few/no target-domain data are with labels.
  • Imbalanced DA: fewer classes of interest in the target domain.
  • Open/closed-set/universal DA: overlapping label space or not.
  • Homogeneous vs. heterogeneous DA: same/distinct feature types across domains.

Page 10:

Deep Feature is Sufficiently Promising

• DeCAF
  • Leveraging an auxiliary large dataset to train a CNN.
  • The resulting features exhibit sufficient representation ability.
  • Supporting results on Office+Caltech datasets, etc.

Donahue et al., DeCAF: A Deep Convolutional Activation Feature for Generic Visual Recognition, ICML 2014

Page 11:

Recent Deep Learning Methods for DA

• Deep Domain Confusion (DDC)

• Domain-Adversarial Training of Neural Networks (DANN)

• Adversarial Discriminative Domain Adaptation (ADDA)

• Domain Separation Network (DSN)

• Unsupervised Pixel-Level Domain Adaptation with Generative Adversarial Networks (PixelDA)

• No More Discrimination: Cross City Adaptation of Road Scene Segmenters


Method | Shared weights | Adaptation loss | Generative model
DDC | ✓ | MMD | ✘
DANN | ✓ | Adversarial | ✘
ADDA | ✘ | Adversarial | ✘
DSN | partially shared | MMD / Adversarial | ✘
PixelDA | ✘ | Adversarial | ✓

Page 12:

Deep Domain Confusion (DDC)

• Deep Domain Confusion: Maximizing for Domain Invariance
• Tzeng et al., arXiv:1412.3474, 2014

Page 13:

Deep Domain Confusion (DDC)

• Shared weights between the source and target CNN streams
• Minimize the classification loss, together with the MMD-based domain confusion loss
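DDC's adaptation loss is the maximum mean discrepancy (MMD) between source and target features. A minimal pure-Python sketch of the empirical (squared) MMD with an RBF kernel, using scalar stand-ins for CNN features (the kernel choice and `gamma` are illustrative assumptions, not the paper's exact setup):

```python
import math

def rbf(x, y, gamma=1.0):
    """RBF kernel between two scalar 'features' (gamma is illustrative)."""
    return math.exp(-gamma * (x - y) ** 2)

def mmd_squared(src, tgt, gamma=1.0):
    """Empirical squared MMD between source and target feature samples."""
    def mean_k(a, b):
        return sum(rbf(u, v, gamma) for u in a for v in b) / (len(a) * len(b))
    return mean_k(src, src) + mean_k(tgt, tgt) - 2.0 * mean_k(src, tgt)

aligned = mmd_squared([0.0, 0.1, -0.1], [0.05, -0.05, 0.0])
shifted = mmd_squared([0.0, 0.1, -0.1], [3.0, 3.1, 2.9])
print(aligned < shifted)  # True: matched domains give a smaller MMD
```

Minimizing this term alongside the classification loss pushes the two feature distributions together.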

Page 14:

Domain Confusion by Domain-Adversarial Training

• Domain-Adversarial Training of Neural Networks (DANN)
  • Y. Ganin et al., ICML 2015
  • Maximize domain confusion = maximize the domain classification loss
  • Minimize the source-domain data classification loss
  • The derived feature f can be viewed as a disentangled & domain-invariant feature.
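DANN implements "maximize domain confusion" with a gradient reversal layer between the feature extractor and the domain classifier: identity on the forward pass, gradient multiplied by -λ on the backward pass. A toy numeric sketch with a scalar feature and hand-written backprop (`LAMBDA` and the learning rate are illustrative):

```python
# Toy sketch of DANN's gradient reversal layer (GRL).
LAMBDA = 1.0  # illustrative trade-off weight

def grl_forward(feature):
    return feature  # identity: the domain classifier sees the feature as-is

def grl_backward(grad_from_domain_classifier, lam=LAMBDA):
    return -lam * grad_from_domain_classifier  # flip the gradient's sign

# One scalar "feature"; the domain classifier's gradient asks for +0.5:
f = 2.0
g = grl_backward(0.5)     # the feature extractor instead receives -0.5
f_updated = f - 0.1 * g   # gradient-descent step with lr = 0.1
print(f_updated)          # moved *against* the domain classifier's wish
```

Because the sign is flipped, the feature extractor is updated to increase the domain loss, i.e., toward domain-invariant features.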

Page 15:

Beyond Domain Confusion

• Domain Separation Network (DSN)
  • Bousmalis et al., NIPS 2016
  • Separate encoders for domain-invariant and domain-specific features
  • Private/common features are disentangled from each other.

Page 16:

Beyond Domain Confusion

• Domain Separation Network, NIPS 2016
• Example results: reconstructions of a source-domain image x_s and a target-domain image x_t
  • From private + shared features: D(E_c(x_s) + E_p(x_s))
  • From the shared feature only: D(E_c(x_s))
  • From the private feature only: D(E_p(x_s))

Page 17:

Beyond Domain Confusion

• Domain Separation Network, NIPS 2016
• More example results, for a source-domain image x_s and a target-domain image x_t

Page 18:

What to Cover Today…

• Transfer Learning for Visual Classification & Synthesis
  • Visual Classification: Domain Adaptation & Adversarial Learning
  • Visual Synthesis: Style Transfer
• Representation Disentanglement
  • Supervised vs. unsupervised feature disentanglement

Many slides from Richard Turner, Fei-Fei Li, Yaser Sheikh, Simon Lucey, Kaiming He, and J.-B. Huang

Page 19:

Transfer Learning for Manipulating Data?

• TL not only addresses cross-domain classification tasks; let's see how we can synthesize and manipulate data across domains.
• As computer vision people, let's focus on visual data in this lecture…

(Source domain → target domain)

Page 20:

Transfer Learning for Image Synthesis

• Cross-Domain Image Translation
  • Pix2pix (CVPR'17): pairwise cross-domain training data
  • CycleGAN/DualGAN/DiscoGAN: unpaired cross-domain training data
  • UNIT (NIPS'17): learning cross-domain image representation (with unpaired training data)
  • DTN (ICLR'17): learning cross-domain image representation (with unpaired training data)
  • Beyond image translation

Page 21:

A Super Brief Review of Generative Adversarial Networks (GAN)

• Architecture of GAN
• Loss (with generator G, discriminator D, noise x, and real data y):

\mathcal{L}_{GAN}(G, D) = \mathbb{E}[\log(1 - D(G(x)))] + \mathbb{E}[\log D(y)]

Goodfellow et al., Generative Adversarial Nets, NIPS, 2014
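The value function above can be checked numerically. A small pure-Python sketch with scalar discriminator outputs (the batches are made-up numbers):

```python
import math

def gan_value(d_fake, d_real):
    """V(G, D) = E[log(1 - D(G(x)))] + E[log D(y)] with scalar D outputs."""
    term_fake = sum(math.log(1.0 - d) for d in d_fake) / len(d_fake)
    term_real = sum(math.log(d) for d in d_real) / len(d_real)
    return term_fake + term_real

# A discriminator that is confident and correct:
strong_d = gan_value(d_fake=[0.1, 0.2], d_real=[0.9, 0.8])
# A fully confused discriminator (D = 0.5 everywhere), which is the
# state the generator drives D toward:
chance_d = gan_value(d_fake=[0.5, 0.5], d_real=[0.5, 0.5])
print(strong_d > chance_d)  # True: D maximizes V, G minimizes it
```

At the confusion point D = 0.5 everywhere, the value is 2 log(1/2), matching the GAN equilibrium analysis.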

Page 22:

Pix2pix

• Image-to-image translation with conditional adversarial networks (CVPR'17)
• Can be viewed as image style transfer (e.g., sketch → photo)

Isola et al., "Image-to-image translation with conditional adversarial networks." CVPR 2017.

Page 23:

Pix2pix

• Goal / Problem Setting
  • Image translation across two distinct domains (e.g., sketch vs. photo)
  • Pairwise training data
• Method: Conditional GAN (example: sketch → photo)
  • Generator: input is a sketch, output is a photo
  • Discriminator: input is the concatenation of the input (sketch) and the synthesized/real (photo) image; output is real or fake

Isola et al., "Image-to-image translation with conditional adversarial networks." CVPR 2017.

Page 24:

Pix2pix

• Learning the model

Conditional GAN loss (fake/generated vs. real, with concatenated inputs):

\mathcal{L}_{cGAN}(G, D) = \mathbb{E}_{x}[\log(1 - D(x, G(x)))] + \mathbb{E}_{x,y}[\log D(x, y)]

Reconstruction (L1) loss:

\mathcal{L}_{L1}(G) = \mathbb{E}_{x,y}[\|y - G(x)\|_1]

Overall objective function:

G^* = \arg\min_G \max_D \mathcal{L}_{cGAN}(G, D) + \mathcal{L}_{L1}(G)

Isola et al., "Image-to-image translation with conditional adversarial networks." CVPR 2017.
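In the paper the L1 term carries a weight λ (100 in the authors' experiments), which the slide omits. A toy sketch of the combined objective, treating images as short pixel lists (all numbers are made up):

```python
import math

def l1_loss(y, g_x):
    """L_L1(G) = E[|y - G(x)|_1] over pixel lists (toy 'images')."""
    return sum(abs(a - b) for a, b in zip(y, g_x)) / len(y)

def cgan_loss(d_fake, d_real):
    """L_cGAN(G, D) = E[log(1 - D(x, G(x)))] + E[log D(x, y)]."""
    return (sum(math.log(1.0 - d) for d in d_fake) / len(d_fake)
            + sum(math.log(d) for d in d_real) / len(d_real))

def pix2pix_objective(d_fake, d_real, y, g_x, lam=100.0):
    # lam weights the L1 term; 100 follows the paper's experiments.
    return cgan_loss(d_fake, d_real) + lam * l1_loss(y, g_x)

target = [0.2, 0.8, 0.5]   # the paired ground-truth photo
close = [0.25, 0.75, 0.5]  # generator output near the target
far = [0.9, 0.1, 0.0]      # generator output far from the target
print(pix2pix_objective([0.3], [0.7], target, close)
      < pix2pix_objective([0.3], [0.7], target, far))  # True
```

The L1 term is what exploits the paired data: it anchors G(x) to the specific ground-truth target, while the cGAN term keeps outputs sharp.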

Page 25:

Pix2pix


• Experiment results

Demo page: https://affinelayer.com/pixsrv/

Isola et al. " Image-to-image translation with conditional adversarial networks." CVPR 2017.

Page 26:

Transfer Learning for Image Synthesis

• Cross-Domain Image Translation
  • Pix2pix (CVPR'17): pairwise cross-domain training data
  • CycleGAN/DualGAN/DiscoGAN: unpaired cross-domain training data
  • UNIT (NIPS'17): learning cross-domain image representation (with unpaired training data)
  • DTN (ICLR'17): learning cross-domain image representation (with unpaired training data)
  • Beyond image translation

Page 27:

CycleGAN/DiscoGAN/DualGAN

• CycleGAN (ICCV'17): Unpaired Image-to-Image Translation using Cycle-Consistent Adversarial Networks
  • Unpaired training data: no 1-to-1 correspondence required (unlike the paired setting)
  • Easier to collect training data
  • More practical

Zhu et al., "Unpaired Image-To-Image Translation Using Cycle-Consistent Adversarial Networks." ICCV 2017.

Page 28:

CycleGAN

• Goal / Problem Setting
  • Image translation across two distinct domains
  • Unpaired training data (e.g., photos and paintings)
• Idea
  • Autoencoding-like image translation
  • Cycle consistency between the two domains: Photo → Painting → Photo, and Painting → Photo → Painting

Zhu et al., "Unpaired Image-To-Image Translation Using Cycle-Consistent Adversarial Networks." ICCV 2017.

Page 29:

CycleGAN

• Method (example: Photo & Painting), based on 2 GANs
  • First GAN (G1, D1): Photo → Painting; D1 judges generated vs. real paintings
  • Second GAN (G2, D2): Painting → Photo; D2 judges generated vs. real photos
• Cycle Consistency
  • Photo consistency
  • Painting consistency

Zhu et al., "Unpaired Image-To-Image Translation Using Cycle-Consistent Adversarial Networks." ICCV 2017.

Page 30:

CycleGAN

• Method (example: Photo vs. Painting), based on 2 GANs
  • First GAN (G1, D1): Photo → Painting
  • Second GAN (G2, D2): Painting → Photo
• Cycle Consistency
  • Photo consistency: Photo → (G1) → Painting → (G2) → Photo
  • Painting consistency: Painting → (G2) → Photo → (G1) → Painting

Zhu et al., "Unpaired Image-To-Image Translation Using Cycle-Consistent Adversarial Networks." ICCV 2017.

Page 31:

CycleGAN

• Learning: Adversarial Loss (photo x, painting y)

First GAN (G1, D1):

\mathcal{L}_{GAN}(G_1, D_1) = \mathbb{E}[\log(1 - D_1(G_1(x)))] + \mathbb{E}[\log D_1(y)]

Second GAN (G2, D2):

\mathcal{L}_{GAN}(G_2, D_2) = \mathbb{E}[\log(1 - D_2(G_2(y)))] + \mathbb{E}[\log D_2(x)]

Overall objective function:

G_1^*, G_2^* = \arg\min_{G_1, G_2} \max_{D_1, D_2} \mathcal{L}_{GAN}(G_1, D_1) + \mathcal{L}_{GAN}(G_2, D_2) + \mathcal{L}_{cyc}(G_1, G_2)

Zhu et al., "Unpaired Image-To-Image Translation Using Cycle-Consistent Adversarial Networks." ICCV 2017.

Page 32:

CycleGAN

• Learning: Consistency Loss (photo and painting consistency)

\mathcal{L}_{cyc}(G_1, G_2) = \mathbb{E}[\|G_2(G_1(x)) - x\|_1] + \mathbb{E}[\|G_1(G_2(y)) - y\|_1]

Overall objective function:

G_1^*, G_2^* = \arg\min_{G_1, G_2} \max_{D_1, D_2} \mathcal{L}_{GAN}(G_1, D_1) + \mathcal{L}_{GAN}(G_2, D_2) + \mathcal{L}_{cyc}(G_1, G_2)

Zhu et al., "Unpaired Image-To-Image Translation Using Cycle-Consistent Adversarial Networks." ICCV 2017.
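The cycle-consistency term is easy to sanity-check: if G2 exactly inverts G1, the loss is zero. A toy sketch with hypothetical stand-in generators (simple brightness shifts, not learned networks):

```python
def l1(a, b):
    return sum(abs(x - y) for x, y in zip(a, b)) / len(a)

# Hypothetical stand-ins for the two generators:
g1 = lambda img: [p + 0.25 for p in img]  # photo -> painting: brighten
g2 = lambda img: [p - 0.25 for p in img]  # painting -> photo: exact inverse

def cycle_loss(x_photo, y_painting):
    """L_cyc = E[|G2(G1(x)) - x|_1] + E[|G1(G2(y)) - y|_1]"""
    return l1(g2(g1(x_photo)), x_photo) + l1(g1(g2(y_painting)), y_painting)

# Perfect round trips give zero cycle loss:
print(cycle_loss([0.125, 0.5], [0.75, 0.25]))  # 0.0
```

In the unpaired setting this term replaces pix2pix's L1 loss: with no ground-truth target available, the round trip back to the input is the only reconstruction signal.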

Page 33:

CycleGAN

Zhu et al., "Unpaired Image-To-Image Translation Using Cycle-Consistent Adversarial Networks." ICCV 2017.

• Example results

Project Page: https://junyanz.github.io/CycleGAN/

Page 34:

Image Translation Using Unpaired Training Data

• CycleGAN (ICCV'17), DiscoGAN (ICML'17), and DualGAN (ICCV'17)

Zhu et al., "Unpaired Image-To-Image Translation Using Cycle-Consistent Adversarial Networks." ICCV 2017.
Kim et al., "Learning to Discover Cross-Domain Relations with Generative Adversarial Networks." ICML 2017.
Yi et al., "DualGAN: Unsupervised dual learning for image-to-image translation." ICCV 2017.

Page 35:

Transfer Learning for Image Synthesis

• Cross-Domain Image Translation
  • Pix2pix (CVPR'17): pairwise cross-domain training data
  • CycleGAN/DualGAN/DiscoGAN: unpaired cross-domain training data
  • UNIT (NIPS'17): learning cross-domain image representation (with unpaired training data)
  • DTN (ICLR'17): learning cross-domain image representation (with unpaired training data)
  • Beyond image translation

Page 36:

UNIT

• Unsupervised Image-to-Image Translation Networks (NIPS'17)
• Image translation via learning a cross-domain joint representation
  • Stage 1: encode images x_1 ∈ X_1 and x_2 ∈ X_2 into a joint latent space Z with shared code z
  • Stage 2: generate cross-domain images (e.g., Day ↔ Night) from the joint representation

Liu et al., "Unsupervised image-to-image translation networks." NIPS 2017.

Page 37:

UNIT

• Goal / Problem Setting
  • Image translation across two distinct domains
  • Unpaired training image data
• Idea
  • Based on two parallel VAE-GAN models
  • Learn a joint representation across image domains, then generate cross-domain images from it

Liu et al., "Unsupervised image-to-image translation networks." NIPS 2017.


Page 40:

Variational Autoencoder & Adversarial Losses

• Learning: two VAE-GAN branches (E_1, G_1, D_1) and (E_2, G_2, D_2) sharing the latent code z

VAE loss:

\mathcal{L}_{VAE}(E_1, G_1, E_2, G_2) = \mathbb{E}[\|G_1(E_1(x_1)) - x_1\|^2] + \mathbb{E}[KL(q_1(z) \,\|\, p(z))] + \mathbb{E}[\|G_2(E_2(x_2)) - x_2\|^2] + \mathbb{E}[KL(q_2(z) \,\|\, p(z))]

Adversarial loss (generated G_i(z) vs. real y_i):

\mathcal{L}_{GAN}(G_1, D_1, G_2, D_2) = \mathbb{E}[\log(1 - D_1(G_1(z)))] + \mathbb{E}[\log D_1(y_1)] + \mathbb{E}[\log(1 - D_2(G_2(z)))] + \mathbb{E}[\log D_2(y_2)]

Overall objective function:

G^* = \arg\min_G \max_D \mathcal{L}_{VAE}(E_1, G_1, E_2, G_2) + \mathcal{L}_{GAN}(G_1, D_1, G_2, D_2)
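With Gaussian encoder posteriors q_i(z) and a standard normal prior p(z), the KL terms in the VAE loss have a closed form. A minimal sketch for the 1-D case (the standard-normal prior is the usual VAE assumption, not spelled out on the slide):

```python
import math

def kl_to_standard_normal(mu, sigma):
    """Closed-form KL( N(mu, sigma^2) || N(0, 1) ): the KL(q_i(z) || p(z))
    term of the VAE loss for a 1-D Gaussian encoder posterior."""
    return 0.5 * (mu ** 2 + sigma ** 2 - 1.0 - math.log(sigma ** 2))

print(kl_to_standard_normal(0.0, 1.0))  # 0.0: encoder already matches prior
print(kl_to_standard_normal(2.0, 1.0))  # 2.0: penalized for drifting away
```

Because both encoders are pulled toward the same prior, their codes land in a shared latent space, which is what lets G_1 and G_2 decode the same z into either domain.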


Page 42:

UNIT

• Example results: Sunny → Rainy, Rainy → Sunny, real street-view → synthetic street-view, and synthetic street-view → real street-view

Liu et al., "Unsupervised image-to-image translation networks." NIPS 2017.
GitHub page: https://github.com/mingyuliutw/UNIT

Page 43:

Transfer Learning for Image Synthesis

• Cross-Domain Image Translation
  • Pix2pix (CVPR'17): pairwise cross-domain training data
  • CycleGAN/DualGAN/DiscoGAN: unpaired cross-domain training data
  • UNIT (NIPS'17): learning cross-domain image representation (with unpaired training data)
  • DTN (ICLR'17): learning cross-domain image representation (with unpaired training data)
  • Beyond image translation

Page 44:

Domain Transfer Networks

• Unsupervised Cross-Domain Image Generation (ICLR'17)
• Goal / Problem Setting
  • Image translation across two domains
  • One-way translation only
  • Unpaired training data
• Idea
  • Apply a unified model to learn a joint representation across domains.

Taigman et al., "Unsupervised cross-domain image generation." ICLR 2017.

Page 45:

Domain Transfer Networks

• Unsupervised Cross-Domain Image Generation (ICLR'17)
• Goal / Problem Setting
  • Image translation across two domains
  • One-way translation only
  • Unpaired training data
• Idea
  • Apply a unified model to learn a joint representation across domains.
  • Enforce consistency in both the image space and the feature space.

Taigman et al., "Unsupervised cross-domain image generation." ICLR 2017.

Page 46:

Domain Transfer Networks

• Learning, with G = {f, g}: f encodes an image into a feature, and g generates a target-domain image from that feature
  • Unified model to translate across domains
  • Consistency in the feature and image spaces
  • Adversarial loss

Image consistency loss (target-domain images y are fixed points of G):

\mathcal{L}_{img}(G) = \mathbb{E}[\|g(f(y)) - y\|^2]

Feature consistency loss (translation preserves the source image's f-features):

\mathcal{L}_{feat}(G) = \mathbb{E}[\|f(g(f(x))) - f(x)\|^2]

Adversarial loss (D treats both G(x) and G(y) as fake):

\mathcal{L}_{GAN}(G, D) = \mathbb{E}[\log(1 - D(G(x)))] + \mathbb{E}[\log(1 - D(G(y)))] + \mathbb{E}[\log D(y)]

Overall objective function:

G^* = \arg\min_G \max_D \mathcal{L}_{img}(G) + \mathcal{L}_{feat}(G) + \mathcal{L}_{GAN}(G, D)
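The two consistency terms can be sanity-checked with hypothetical stand-ins for f and g: when g exactly inverts f, both losses vanish. A toy pure-Python sketch (the linear maps are illustrative, not learned networks):

```python
# Hypothetical stand-ins for DTN's G = {f, g}: f maps an image to a
# feature, g decodes a feature back into a target-domain image.
f = lambda img: [p * 2 for p in img]    # illustrative "encoder"
g = lambda feat: [p / 2 for p in feat]  # "decoder", exact inverse of f here

def l2(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

def img_consistency(y):
    """L_img(G) = E[|g(f(y)) - y|^2]: target images should be fixed points."""
    return l2(g(f(y)), y)

def feat_consistency(x):
    """L_feat(G) = E[|f(g(f(x))) - f(x)|^2]: translation preserves f-features."""
    fx = f(x)
    return l2(f(g(fx)), fx)

print(img_consistency([0.25, 0.5]), feat_consistency([0.25, 0.5]))  # 0.0 0.0
```

Note the asymmetry: image consistency is enforced on target-domain samples y, feature consistency on source-domain samples x, which is what makes the translation one-way.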


Page 50:

DTN

• Example results: SVHN → MNIST and Photo → Emoji

Taigman et al., "Unsupervised cross-domain image generation." ICLR 2017.

Page 51:

What to Cover Today…

• Transfer Learning for Visual Classification & Synthesis
  • Visual Classification: Domain Adaptation & Adversarial Learning
  • Visual Synthesis: Style Transfer
• Representation Disentanglement
  • Supervised vs. unsupervised feature disentanglement

Many slides from Richard Turner, Fei-Fei Li, Yaser Sheikh, Simon Lucey, Kaiming He, and J.-B. Huang

Page 52:

Beyond Image Style Transfer: Learning Interpretable Deep Representations

• FaceApp: putting a smile on your face! (Input: Mr. Takeshi Kaneshiro)
• Deep learning for representation disentanglement
  • Interpretable deep feature representation

Page 53:

Recall: Generative Adversarial Networks (GAN)

• Architecture of GAN
• Loss (with generator G, discriminator D, noise x, and real data y):

\mathcal{L}_{GAN}(G, D) = \mathbb{E}[\log(1 - D(G(x)))] + \mathbb{E}[\log D(y)]

Goodfellow et al., Generative Adversarial Nets, NIPS, 2014

Page 54:

Representation Disentanglement

• Goal
  • Interpretable deep feature representation
  • Disentangle the attribute of interest c from the derived latent representation z
  • Possible solutions: VAE, GAN, or a mix of them…

(The generator G takes an uninterpretable latent feature z plus an interpretable factor c, e.g., season.)

Page 55:

Representation Disentanglement

• Goal
  • Interpretable deep feature representation
  • Disentangle the attribute of interest c from the derived latent representation z
• Supervised setting: from VAE to conditional VAE

Page 56:

Representation Disentanglement

• Conditional VAE
  • Given training data x and the attribute of interest c, we model the conditional distribution p_\theta(x|c).

https://zhuanlan.zhihu.com/p/25518643

Page 57:

Representation Disentanglement

• Conditional VAE: example results

Page 58:

Representation Disentanglement

• Conditional GAN
  • Interpretable latent factor c
  • Latent representation z

https://arxiv.org/abs/1411.1784

Page 59:

Representation Disentanglement

• Goal
  • Interpretable deep feature representation
  • Disentangle attribute of interest c from the derived latent representation z
• Unsupervised: InfoGAN
• Supervised: AC-GAN

InfoGAN: Chen et al., NIPS ’16
AC-GAN: Odena et al., ICML ’17

Chen et al., InfoGAN: Interpretable Representation Learning by Information Maximizing Generative Adversarial Nets. NIPS 2016.
Odena et al., Conditional Image Synthesis with Auxiliary Classifier GANs. ICML 2017.
59

Page 60:

AC-GAN

Odena et al., Conditional image synthesis with auxiliary classifier GANs. ICML’17

• Supervised Disentanglement

• Learning
  • Overall objective function:
    G* = arg min_G max_D L_GAN(G, D) + L_cls(G, D)
  • Adversarial loss:
    L_GAN(G, D) = E[log(1 − D(G(z, c)))] + E[log D(y)]
  • Disentanglement loss:
    L_cls(G, D) = E[−log D_cls(c′|y)] + E[−log D_cls(c|G(z, c))]
    (real data y is classified w.r.t. its domain label c′; generated data G(z, c) is classified w.r.t. the assigned label c)

(Figure: G maps latent z and label c to G(z, c); D distinguishes G(z, c) from real data y and also classifies the domain label.)

60
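As a sketch of how the two AC-GAN terms are computed, the NumPy snippet below evaluates L_GAN and L_cls on a toy batch (the probability arrays are made-up placeholders for the outputs of D and D_cls; no real networks are involved):

```python
import numpy as np

def acgan_losses(d_real, d_fake, p_cls_real_true, p_cls_fake_assigned):
    """AC-GAN objective terms on a toy batch of discriminator outputs.

    d_real / d_fake: D's probability that real data y / generated data
        G(z, c) is real.
    p_cls_real_true: probability D_cls assigns to the true label c' of
        each real sample.
    p_cls_fake_assigned: probability D_cls assigns to the label c that
        was fed to the generator.
    """
    # L_GAN(G, D) = E[log(1 - D(G(z, c)))] + E[log D(y)]
    l_gan = np.mean(np.log(1.0 - d_fake)) + np.mean(np.log(d_real))
    # L_cls(G, D) = E[-log D_cls(c'|y)] + E[-log D_cls(c|G(z, c))]
    l_cls = (np.mean(-np.log(p_cls_real_true))
             + np.mean(-np.log(p_cls_fake_assigned)))
    return l_gan, l_cls

# Toy batch: D is fairly confident on real data, unsure on fakes.
l_gan, l_cls = acgan_losses(
    d_real=np.array([0.9, 0.8]),
    d_fake=np.array([0.3, 0.4]),
    p_cls_real_true=np.array([0.7, 0.9]),
    p_cls_fake_assigned=np.array([0.6, 0.5]),
)
```

In training, D maximizes both terms while G minimizes the adversarial term and helps minimize the classification error on its own samples, which is what ties the assigned label c to the generated content.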

Page 61:

AC-GAN

Odena et al., Conditional image synthesis with auxiliary classifier GANs. ICML’17

• Supervised Disentanglement

(Figure: the supervised setup with G(z, c) vs. real y, and example samples generated for different c values.)

61

Page 62:

InfoGAN

• Unsupervised Disentanglement

• Learning
  • Overall objective function:
    G* = arg min_G max_D L_GAN(G, D) + L_cls(G, D)
  • Adversarial loss:
    L_GAN(G, D) = E[log(1 − D(G(z, c)))] + E[log D(y)]
  • Disentanglement loss (generated data only, w.r.t. the assigned label c):
    L_cls(G, D) = E[−log D_cls(c|G(z, c))]

Chen et al., InfoGAN: Interpretable Representation Learning by Information Maximizing Generative Adversarial Nets. NIPS 2016.
62
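The only change relative to the AC-GAN disentanglement term is that the real-data term drops out, since no labels are available: the code c is sampled by the model itself and merely has to be recoverable from the generated sample. A one-line sketch with toy probabilities:

```python
import numpy as np

def infogan_cls_loss(p_cls_fake_assigned):
    """InfoGAN disentanglement term: unlike AC-GAN, there is no
    E[-log D_cls(c'|y)] term on real data, because no labels exist.
    Only the sampled code c must be recovered from G(z, c)."""
    return np.mean(-np.log(p_cls_fake_assigned))

# Toy probabilities D_cls assigns to the sampled code for each sample.
loss = infogan_cls_loss(np.array([0.6, 0.5, 0.8]))
```

If D_cls recovered the code perfectly (probability 1 on every sample), this term would be exactly zero, which is why minimizing it encourages c to control some visible factor of the output.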

Page 63:

InfoGAN

• Unsupervised Disentanglement

Chen et al., InfoGAN: Interpretable representation learning by information maximizing generative adversarial nets., NIPS 2016.

• No guarantee of disentangling particular semantics: varying c may, e.g., end up controlling rotation angle or stroke width.

(Figure: samples generated with different c values at several points of the training process; loss vs. training time.)

63

Page 64:

What to Cover Today…
• Transfer Learning for Visual Classification & Synthesis
  • Visual Classification
    • Domain Adaptation & Adversarial Learning
  • Visual Synthesis
    • Style Transfer
• Representation Disentanglement
  • Supervised vs. unsupervised feature disentanglement
  • Joint style transfer & feature disentanglement

Many slides from Richard Turner, Fei-Fei Li, Yaser Sheikh, Simon Lucey, Kaiming He, and J.-B. Huang
64

Page 65:

StarGAN

• Goal
  • Unified GAN for multi-domain image-to-image translation

Choi et al. "StarGAN: Unified Generative Adversarial Networks for Multi-Domain Image-to-Image Translation." CVPR 2018

(Figure: traditional cross-domain models vs. the unified multi-domain model, StarGAN.)

65

Page 66:

StarGAN

• Goal
  • Unified GAN for multi-domain image-to-image translation

Choi et al. "StarGAN: Unified Generative Adversarial Networks for Multi-Domain Image-to-Image Translation." CVPR 2018

(Figure: traditional cross-domain models need one generator per domain pair, e.g., G_12, G_13, G_14, G_23, G_24, G_34, whereas StarGAN replaces them all with a single unified G.)

66

Page 67:

StarGAN

• Goal / Problem Setting
  • Single image translation model across multiple domains
  • Unpaired training data

67

Page 68:

StarGAN

• Goal / Problem Setting
  • Single image translation model across multiple domains
  • Unpaired training data
• Idea
  • Concatenate image and target domain label as input of the generator
  • Auxiliary domain classifier on the discriminator

(Figure: the target domain label is concatenated with the input image before being fed to G.)

68
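The label-concatenation idea can be sketched in a few lines of NumPy (shapes here are illustrative toy values): each entry of the one-hot target label is tiled into a constant H×W plane and stacked with the image channels, so the generator sees (C + num_domains) input channels.

```python
import numpy as np

def concat_label(image, target_label):
    """Concatenate an image with its target domain label, StarGAN-style.

    image: (C, H, W) array; target_label: (num_domains,) one-hot vector.
    Each label entry becomes a constant HxW plane stacked under the
    image channels, giving a (C + num_domains, H, W) generator input.
    """
    c, h, w = image.shape
    planes = np.broadcast_to(
        target_label[:, None, None], (target_label.shape[0], h, w)
    )
    return np.concatenate([image, planes], axis=0)

img = np.zeros((3, 4, 4))            # toy 3-channel 4x4 "image"
label = np.array([0.0, 1.0, 0.0])    # target domain 1 of 3
g_input = concat_label(img, label)
```

Because the label enters only as extra input channels, the same generator weights serve every source/target domain pair; changing the translation target is just changing which plane is set to one.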

Page 69:

StarGAN

• Goal / Problem Setting
  • Single image translation model across multiple domains
  • Unpaired training data
• Idea
  • Concatenate image and target domain label as input of the generator
  • Auxiliary domain classifier on the discriminator

69

Page 70:

StarGAN

• Goal / Problem Setting
  • Single image translation model across multiple domains
  • Unpaired training data
• Idea
  • Concatenate image and target domain label as input of the generator
  • Auxiliary domain classifier on the discriminator
  • Cycle consistency across domains

70

Page 71:

StarGAN
• Goal / Problem Setting
  • Single image translation model across multiple domains
  • Unpaired training data
• Idea
  • Auxiliary domain classifier on the discriminator
  • Concatenate image and target domain label as input
  • Cycle consistency across domains

71

Page 72:

StarGAN

• Learning
  • Overall objective function:
    G* = arg min_G max_D L_GAN(G, D) + L_cls(G, D) + L_cyc(G)

72

Page 73:

StarGAN

• Learning
  • Overall objective function:
    G* = arg min_G max_D L_GAN(G, D) + L_cls(G, D) + L_cyc(G)
  • Adversarial loss:
    L_GAN(G, D) = E[log(1 − D(G(x, c)))] + E[log D(y)]

(Figure: G translates input image x with target domain label c into G(x, c); D distinguishes G(x, c) from real data y.)

73

Page 74:

StarGAN

• Learning
  • Overall objective function:
    G* = arg min_G max_D L_GAN(G, D) + L_cls(G, D) + L_cyc(G)
  • Adversarial loss:
    L_GAN(G, D) = E[log(1 − D(G(x, c)))] + E[log D(y)]
  • Domain classification loss (disentanglement):
    L_cls(G, D) = E[−log D_cls(c′|y)] + E[−log D_cls(c|G(x, c))]

74

Page 75:

StarGAN

• Learning
  • Overall objective function:
    G* = arg min_G max_D L_GAN(G, D) + L_cls(G, D) + L_cyc(G)
  • Adversarial loss:
    L_GAN(G, D) = E[log(1 − D(G(x, c)))] + E[log D(y)]
  • Domain classification loss (disentanglement):
    L_cls(G, D) = E[−log D_cls(c′|y)] + E[−log D_cls(c|G(x, c))]
    (first term: real data y is classified w.r.t. its domain label c′ via D_cls(c′|y))

75

Page 76:

StarGAN

• Learning
  • Overall objective function:
    G* = arg min_G max_D L_GAN(G, D) + L_cls(G, D) + L_cyc(G)
  • Adversarial loss:
    L_GAN(G, D) = E[log(1 − D(G(x, c)))] + E[log D(y)]
  • Domain classification loss (disentanglement):
    L_cls(G, D) = E[−log D_cls(c′|y)] + E[−log D_cls(c|G(x, c))]
    (second term: generated data G(x, c) is classified w.r.t. the assigned label c via D_cls(c|G(x, c)))

76

Page 77:

StarGAN

• Learning
  • Overall objective function:
    G* = arg min_G max_D L_GAN(G, D) + L_cls(G, D) + L_cyc(G)
  • Adversarial loss:
    L_GAN(G, D) = E[log(1 − D(G(x, c)))] + E[log D(y)]
  • Domain classification loss (disentanglement):
    L_cls(G, D) = E[−log D_cls(c′|y)] + E[−log D_cls(c|G(x, c))]
  • Cycle consistency loss:
    L_cyc(G) = E[‖G(G(x, c), c_x) − x‖_1]
    (translate x to target domain c, then back using the original domain label c_x; the round trip G(G(x, c), c_x) should recover x)

77

Page 78:

StarGAN

• Learning
  • Overall objective function:
    G* = arg min_G max_D L_GAN(G, D) + L_cls(G, D) + L_cyc(G)
  • Adversarial loss:
    L_GAN(G, D) = E[log(1 − D(G(x, c)))] + E[log D(y)]
  • Domain classification loss:
    L_cls(G, D) = E[−log D_cls(c′|y)] + E[−log D_cls(c|G(x, c))]
  • Cycle consistency loss:
    L_cyc(G) = E[‖G(G(x, c), c_x) − x‖_1]

78
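Putting the three terms together, the toy NumPy sketch below evaluates the full StarGAN objective on made-up scalar outputs (in the paper the classification and reconstruction terms carry weights λ_cls and λ_rec; they are omitted, i.e., set to 1, here for brevity):

```python
import numpy as np

def cycle_loss(x, x_reconstructed):
    """L_cyc(G) = E[ || G(G(x, c), c_x) - x ||_1 ]: translating to the
    target domain c and back to the original domain c_x should recover x."""
    return np.mean(np.abs(x_reconstructed - x))

def stargan_objective(d_real, d_fake, p_cls_real_true, p_cls_fake_assigned,
                      x, x_cyc):
    """Sum of the three StarGAN terms on toy discriminator outputs
    (unweighted; the paper scales L_cls and L_cyc with lambdas)."""
    # L_GAN = E[log(1 - D(G(x, c)))] + E[log D(y)]
    l_gan = np.mean(np.log(1.0 - d_fake)) + np.mean(np.log(d_real))
    # L_cls = E[-log D_cls(c'|y)] + E[-log D_cls(c|G(x, c))]
    l_cls = (np.mean(-np.log(p_cls_real_true))
             + np.mean(-np.log(p_cls_fake_assigned)))
    l_cyc = cycle_loss(x, x_cyc)
    return l_gan + l_cls + l_cyc

x = np.array([0.2, 0.5, 0.8])
x_cyc = np.array([0.25, 0.45, 0.8])   # imperfect round-trip translation
total = stargan_objective(
    d_real=np.array([0.9]), d_fake=np.array([0.4]),
    p_cls_real_true=np.array([0.8]), p_cls_fake_assigned=np.array([0.7]),
    x=x, x_cyc=x_cyc,
)
```

Note that a perfect round trip makes L_cyc exactly zero, which is what lets the adversarial and classification terms change the attribute without destroying the rest of the image content.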

Page 79:

StarGAN
• Example results
• StarGAN can thus also be viewed as a representation disentanglement model, not merely an image-to-image translation one.

Choi et al. "StarGAN: Unified Generative Adversarial Networks for Multi-Domain Image-to-Image Translation." CVPR 2018


Github Page: https://github.com/yunjey/StarGAN

79

Page 80:

What to Cover Today…
• Transfer Learning for Visual Classification & Synthesis
  • Visual Classification
    • Domain Adaptation & Adversarial Learning
  • Visual Synthesis
    • Style Transfer
• Representation Disentanglement
  • Supervised vs. unsupervised feature disentanglement
  • Joint style transfer & feature disentanglement

Many slides from Richard Turner, Fei-Fei Li, Yaser Sheikh, Simon Lucey, Kaiming He, and J.-B. Huang
80

Page 81:

A Unified Feature Disentangler for Multi-Domain Image Translation and Manipulation

• Learning interpretable representations

81

Page 82:

A Unified Feature Disentangler for Multi-Domain Image Translation and Manipulation

• Learning interpretable representations

82

Page 83:

A Unified Feature Disentangler for Multi-Domain Image Translation and Manipulation

• Learning interpretable representations

83

Page 84:

Example Results

• Face image translation

84

Page 85:

Example Results

• Multi-attribute image translation

85

Page 86:

Next Week

86

Guest Lectures:
1. “The Paradigm Shift in AI”
   - 2:20pm–3:10pm
   - Dr. Trista Chen, Chief Scientist of Machine Learning, Inventec Corp.
2. “Sharing on AI Healthcare Entrepreneurship” (AI醫療創業分享)
   - 3:30pm–4:20pm
   - David Chou, Founder & CEO of Deep01