ttic.uchicago.edu/~dmcallester/DeepClass/GANs.pdf
TTIC 31230, Fundamentals of Deep Learning
David McAllester, April 2017
Generative Adversarial Networks (GANs)
The Generator and The Discriminator
A GAN consists of two networks: a generator P^gen_Θ(x) and a discriminator P^disc_Ψ(y|x).

Θ* = argmax_Θ min_Ψ E_{(x,y) ∼ (D ⊎ P^gen_Θ)} [ log 1/P^disc_Ψ(y|x) ]

Here x is drawn from the data distribution D or the generator distribution P^gen_Θ with equal probability, and y = 1 if x is drawn from D and y = −1 if x is drawn from P^gen_Θ.

The discriminator tries to determine which source x came from, and the generator tries to fool the discriminator.
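As a toy numerical check of the objective (the two distributions below are made up purely for illustration), we can evaluate E[log 1/P^disc_Ψ(y|x)] for discrete x: x comes from D with probability 1/2 (labeled y = 1) and from P^gen with probability 1/2 (labeled y = −1).

```python
import math

# Hypothetical data and generator distributions over three discrete points.
D    = {0: 0.5, 1: 0.3, 2: 0.2}   # data distribution D(x)
Pgen = {0: 0.2, 1: 0.3, 2: 0.5}   # generator distribution P_gen(x)

def gan_objective(p_disc):
    """E_{(x,y)~(D ⊎ P_gen)}[log 1/P_disc(y|x)]:
    x is drawn from D (y = 1) or P_gen (y = -1), each with probability 1/2.
    p_disc[x] is the discriminator's probability that x is real."""
    total = 0.0
    for x in D:
        total += 0.5 * D[x]    * math.log(1.0 / p_disc[x])        # y = 1
        total += 0.5 * Pgen[x] * math.log(1.0 / (1 - p_disc[x]))  # y = -1
    return total

# An uninformative discriminator (always 50/50) incurs exactly log 2 loss.
uninformative = {x: 0.5 for x in D}
print(gan_objective(uninformative))  # = log 2 ≈ 0.6931
```

The discriminator minimizes this quantity, so any informative discriminator achieves a loss below log 2; the generator then pushes in the opposite direction.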
Consistency
If the discriminator is perfect, then the only way to fool it is to exactly copy the data distribution.

Consistency Theorem: If P^gen_Θ(x) and P^disc_Ψ(y|x) are both universally expressive (any distribution can be represented), then P^gen_{Θ*} = D.
DC GANs, Radford, Metz and Chintala, ICLR 2016
The Generator
Generated Bedrooms
Interpolated Faces
[Ayan Chakrabarti]
Conditional Distribution Modeling
All distribution modeling methods apply to conditional distributions.

For conditional GANs we allow the generator to take x as an input and generate a conditional value c.

Θ* = argmax_Θ min_Ψ E_{x ∼ D, (c,y) ∼ (D(c|x) ⊎ P^gen_Θ(c|x))} [ log 1/P^disc_Ψ(y|c,x) ]

Here y = 1 if c is drawn from D(c|x) and y = −1 if c is drawn from P^gen_Θ(c|x).
The Case of Imperfect Generation
Θ* = argmax_Θ min_Ψ E_{(x,y) ∼ (D ⊎ P^gen_Θ)} [ log 1/P^disc_Ψ(y|x) ]

Ψ*(Θ) = argmin_Ψ E_{(x,y) ∼ (D ⊎ P^gen_Θ)} [ log₂ 1/P_Ψ(y|x) ]

P^disc_{Ψ*(Θ)}(y = 1|x) = P(x, y = 1)/P(x) = D(x)/(D(x) + P^gen_Θ(x))
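The closed form for the optimal discriminator can be sanity-checked numerically (again with made-up discrete distributions): the discriminator P(y = 1|x) = D(x)/(D(x) + P^gen_Θ(x)) should achieve a lower log loss than any perturbation of it.

```python
import math

D    = {0: 0.5, 1: 0.3, 2: 0.2}   # hypothetical data distribution
Pgen = {0: 0.2, 1: 0.3, 2: 0.5}   # hypothetical generator distribution

def disc_loss(p):
    """Discriminator log loss (base 2) under the equal-probability mixture."""
    return sum(0.5 * D[x]    * math.log2(1.0 / p[x])
             + 0.5 * Pgen[x] * math.log2(1.0 / (1.0 - p[x])) for x in D)

# Closed-form optimum: P(y = 1|x) = D(x) / (D(x) + P_gen(x)).
p_star = {x: D[x] / (D[x] + Pgen[x]) for x in D}

# Perturbing the optimal discriminator can only increase the loss.
for eps in (0.05, -0.05):
    p_pert = {x: min(max(p_star[x] + eps, 1e-6), 1 - 1e-6) for x in D}
    assert disc_loss(p_pert) >= disc_loss(p_star)
print(disc_loss(p_star))
```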
Θ* = argmax_Θ E_{(x,y) ∼ (D ⊎ P^gen_Θ)} [ −log₂ P^disc_{Ψ*(Θ)}(y|x) ]

= argmax_Θ (1/2) E_{x ∼ D} [ log₂ (D(x) + P^gen_Θ(x))/D(x) ] + (1/2) E_{x ∼ P^gen_Θ} [ log₂ (D(x) + P^gen_Θ(x))/P^gen_Θ(x) ]

= argmax_Θ 1 − (1/2) KL(D, A) − (1/2) KL(P^gen_Θ, A)

A(x) = (1/2)(D(x) + P^gen_Θ(x))
Jensen-Shannon Divergence (JSD)
We have arrived at the Jensen-Shannon divergence.
Θ* = argmin_Θ JSD(D, P^gen_Θ)

JSD(P, Q) = (1/2) KL(P, (P + Q)/2) + (1/2) KL(Q, (P + Q)/2)

0 ≤ JSD(P, Q) = JSD(Q, P) ≤ 1
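These properties of the JSD (measured in bits, so the upper bound is 1) are easy to verify on small discrete distributions; the numbers below are arbitrary examples.

```python
import math

def kl2(p, q):
    """KL divergence in bits between two discrete distributions."""
    return sum(pi * math.log2(pi / qi) for pi, qi in zip(p, q) if pi > 0)

def jsd(p, q):
    """Jensen-Shannon divergence: average KL to the midpoint mixture."""
    m = [(pi + qi) / 2 for pi, qi in zip(p, q)]
    return 0.5 * kl2(p, m) + 0.5 * kl2(q, m)

p = [0.5, 0.3, 0.2]
q = [0.2, 0.3, 0.5]

print(jsd(p, q))            # strictly between 0 and 1
print(jsd(p, p))            # 0.0: identical distributions
print(jsd([1, 0], [0, 1]))  # 1.0: disjoint supports hit the upper bound
```

The disjoint-support case is exactly the regime discussed next: when the generator and data distributions do not overlap, the JSD saturates at its maximum.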
The Discriminator Tends to Win
If the discriminator “wins”, the discriminator log loss goes to zero (becomes exponentially small) and there is no gradient to guide the generator.

In this case learning stops and the generator is blocked from minimizing JSD(D, P^gen_Θ).
The Standard Fix
The standard fix is to replace the loss
ℓ = −log P^disc_Ψ(y|x)

with

ℓ̃ = −y log P^disc_Ψ(1|x)

These two loss functions agree when y = 1 (the case where x is drawn from D) but are very different when x is drawn from the generator (y = −1) and P^disc_Ψ(1|x) is exponentially close to zero.
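A quick numerical comparison makes the difference concrete (the probability value below is an arbitrary example of a discriminator that has nearly “won”):

```python
import math

def loss_original(p1, y):
    """ℓ = -log P(y|x), with P(-1|x) = 1 - P(1|x)."""
    return -math.log(p1 if y == 1 else 1.0 - p1)

def loss_fixed(p1, y):
    """The standard fix: ℓ̃ = -y log P(1|x)."""
    return -y * math.log(p1)

p1 = 1e-6  # discriminator is nearly certain a generated x is fake
# For a generated sample (y = -1) the original loss is nearly zero ...
print(loss_original(p1, y=-1))  # ≈ 1e-6: almost no gradient signal
# ... while the fixed loss is large in magnitude, so the generator
# still receives a strong gradient pushing P(1|x) upward.
print(loss_fixed(p1, y=-1))     # ≈ -13.8

# The two losses agree exactly when y = 1 (x drawn from the data).
assert loss_original(0.3, 1) == loss_fixed(0.3, 1)
```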
A Margin Interpretation of the Standard Fix
The standard fix can be interpreted in terms of the “margin” of binary classification.

For y ∈ {−1, 1} we typically have s_Ψ(1|x) = −s_Ψ(−1|x), and softmax over 1 and −1 gives

P_Ψ(y|x) = 1/(1 + e^{−m})

where the margin is m = 2 y s_Ψ(x).

The margin is large when the prediction is confidently correct.
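One can check directly that the two-way softmax over scores s and −s collapses to a sigmoid of the margin m = 2ys (the score values below are arbitrary):

```python
import math

def softmax_prob(y, s):
    """Two-way softmax with scores s(1|x) = s and s(-1|x) = -s."""
    return math.exp(y * s) / (math.exp(s) + math.exp(-s))

def margin_prob(y, s):
    """P(y|x) = 1 / (1 + e^{-m}) with margin m = 2*y*s."""
    return 1.0 / (1.0 + math.exp(-2 * y * s))

# The two forms agree for any score and label.
for s in (-2.0, -0.3, 0.0, 1.5):
    for y in (1, -1):
        assert abs(softmax_prob(y, s) - margin_prob(y, s)) < 1e-12
print(margin_prob(1, 1.5))  # large positive margin -> confident correct
```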
A Margin Interpretation of the Standard Fix
In the standard fix we (essentially) take the loss to be the margin of the discriminator.

The generator wants to reduce the discriminator's margin.

The direction of the update is the same, but the step is much larger under the margin loss for generated inputs and large discriminator margins.
END