temperature is all you need - centre de recherches ... · temperature is all you need massimo...
TRANSCRIPT
Temperature Is All You Need
Massimo Caccia*, Lucas Caccia*, William Fedus,Hugo Larochelle, Joelle Pineau, Laurent Charlin
MILA, Universite de Montreal, RLlab, McGill University & Google Brain
September 1, 2018
Massimo Caccia*, Lucas Caccia*, William Fedus,Hugo Larochelle, Joelle Pineau, Laurent Charlin (MILA, Universite de Montreal, RLlab, McGill University & Google Brain)Temperature Is All You Need September 1, 2018 1 / 19
tl;dr
(lower is better for both metrics)
Massimo Caccia*, Lucas Caccia*, William Fedus,Hugo Larochelle, Joelle Pineau, Laurent Charlin (MILA, Universite de Montreal, RLlab, McGill University & Google Brain)Temperature Is All You Need September 1, 2018 2 / 19
tl;dr
(lower is better for both metrics)
Massimo Caccia*, Lucas Caccia*, William Fedus,Hugo Larochelle, Joelle Pineau, Laurent Charlin (MILA, Universite de Montreal, RLlab, McGill University & Google Brain)Temperature Is All You Need September 1, 2018 2 / 19
tl;dr
The most effective way to evaluate NLG models is to comparethem in quality/diversity space w.r.t multiple temperatures
Maximum-likelihood training is superior (for now) to TextualGANs
Massimo Caccia*, Lucas Caccia*, William Fedus,Hugo Larochelle, Joelle Pineau, Laurent Charlin (MILA, Universite de Montreal, RLlab, McGill University & Google Brain)Temperature Is All You Need September 1, 2018 3 / 19
tl;dr
The most effective way to evaluate NLG models is to comparethem in quality/diversity space w.r.t multiple temperatures
Maximum-likelihood training is superior (for now) to TextualGANs
Massimo Caccia*, Lucas Caccia*, William Fedus,Hugo Larochelle, Joelle Pineau, Laurent Charlin (MILA, Universite de Montreal, RLlab, McGill University & Google Brain)Temperature Is All You Need September 1, 2018 3 / 19
tl;dr
The most effective way to evaluate NLG models is to comparethem in quality/diversity space w.r.t multiple temperatures
Maximum-likelihood training is superior (for now) to TextualGANs
Massimo Caccia*, Lucas Caccia*, William Fedus,Hugo Larochelle, Joelle Pineau, Laurent Charlin (MILA, Universite de Montreal, RLlab, McGill University & Google Brain)Temperature Is All You Need September 1, 2018 3 / 19
MLE trained models generate poor samples
(1) Researchers are expected to comment on where ascheme is sold , but it is no longer this big name at thispoint .
(2) We know you ’ re going to build the kind of homeyou ’ re going to be expecting it can give us a betterunderstanding of what ground test we ’ re on this year, he explained .
Table 1: MLE samples lack in global coherence
It is hypothesized that this is in-part because of the well-known issuesof exposure bias [1], [2].
Massimo Caccia*, Lucas Caccia*, William Fedus,Hugo Larochelle, Joelle Pineau, Laurent Charlin (MILA, Universite de Montreal, RLlab, McGill University & Google Brain)Temperature Is All You Need September 1, 2018 4 / 19
MLE trained models generate poor samples
(1) Researchers are expected to comment on where ascheme is sold , but it is no longer this big name at thispoint .
(2) We know you ’ re going to build the kind of homeyou ’ re going to be expecting it can give us a betterunderstanding of what ground test we ’ re on this year, he explained .
Table 1: MLE samples lack in global coherence
It is hypothesized that this is in-part because of the well-known issuesof exposure bias [1], [2].
Massimo Caccia*, Lucas Caccia*, William Fedus,Hugo Larochelle, Joelle Pineau, Laurent Charlin (MILA, Universite de Montreal, RLlab, McGill University & Google Brain)Temperature Is All You Need September 1, 2018 4 / 19
GAN generate great samples (in images)
Figure 1: GAN generated images from [3]
Massimo Caccia*, Lucas Caccia*, William Fedus,Hugo Larochelle, Joelle Pineau, Laurent Charlin (MILA, Universite de Montreal, RLlab, McGill University & Google Brain)Temperature Is All You Need September 1, 2018 5 / 19
GAN generate great samples (in images)
Figure 1: GAN generated images from [3]
Massimo Caccia*, Lucas Caccia*, William Fedus,Hugo Larochelle, Joelle Pineau, Laurent Charlin (MILA, Universite de Montreal, RLlab, McGill University & Google Brain)Temperature Is All You Need September 1, 2018 5 / 19
GAN visualized
Figure 2: GAN [4] Framework visualized
Massimo Caccia*, Lucas Caccia*, William Fedus,Hugo Larochelle, Joelle Pineau, Laurent Charlin (MILA, Universite de Montreal, RLlab, McGill University & Google Brain)Temperature Is All You Need September 1, 2018 6 / 19
non-differentiable in output space
Difficult to backpropagate through discrete variables
Straight-through estimator [5]
Gumbel softmax [6]
Reinforcement Learning (in particular REINFORCE [7])
Massimo Caccia*, Lucas Caccia*, William Fedus,Hugo Larochelle, Joelle Pineau, Laurent Charlin (MILA, Universite de Montreal, RLlab, McGill University & Google Brain)Temperature Is All You Need September 1, 2018 7 / 19
non-differentiable in output space
Difficult to backpropagate through discrete variables
Straight-through estimator [5]
Gumbel softmax [6]
Reinforcement Learning (in particular REINFORCE [7])
Massimo Caccia*, Lucas Caccia*, William Fedus,Hugo Larochelle, Joelle Pineau, Laurent Charlin (MILA, Universite de Montreal, RLlab, McGill University & Google Brain)Temperature Is All You Need September 1, 2018 7 / 19
Recent advancements in Sequential GANs
SeqGAN [8]: First sequential GAN (trained with REINFORCE)
Figure 3: A cartoon of SeqGAN
Massimo Caccia*, Lucas Caccia*, William Fedus,Hugo Larochelle, Joelle Pineau, Laurent Charlin (MILA, Universite de Montreal, RLlab, McGill University & Google Brain)Temperature Is All You Need September 1, 2018 8 / 19
Recent advancements in Sequential GANs
SeqGAN [8]: First sequential GAN (trained with REINFORCE)
Figure 3: A cartoon of SeqGAN
Massimo Caccia*, Lucas Caccia*, William Fedus,Hugo Larochelle, Joelle Pineau, Laurent Charlin (MILA, Universite de Montreal, RLlab, McGill University & Google Brain)Temperature Is All You Need September 1, 2018 8 / 19
Recent advancements in Sequential GANs
SeqGAN [8]: First sequential GAN (trained with REINFORCE)
Figure 3: A cartoon of SeqGAN
Massimo Caccia*, Lucas Caccia*, William Fedus,Hugo Larochelle, Joelle Pineau, Laurent Charlin (MILA, Universite de Montreal, RLlab, McGill University & Google Brain)Temperature Is All You Need September 1, 2018 8 / 19
Recent advancements in Sequential GANs
MaliGAN [9]: rescale the reward to alleviate the vanishing gradient
RankGAN [10]: replace binary classification with ranking score
LeakGAN [11]: Discriminator leaks information to the Generator
TextGAN [12]: MMD loss + latent regularization
a lot more: DPGAN, GSGAN, IRLGAN, etc
Massimo Caccia*, Lucas Caccia*, William Fedus,Hugo Larochelle, Joelle Pineau, Laurent Charlin (MILA, Universite de Montreal, RLlab, McGill University & Google Brain)Temperature Is All You Need September 1, 2018 9 / 19
Recent advancements in Sequential GANs
MaliGAN [9]: rescale the reward to alleviate the vanishing gradient
RankGAN [10]: replace binary classification with ranking score
LeakGAN [11]: Discriminator leaks information to the Generator
TextGAN [12]: MMD loss + latent regularization
a lot more: DPGAN, GSGAN, IRLGAN, etc
Massimo Caccia*, Lucas Caccia*, William Fedus,Hugo Larochelle, Joelle Pineau, Laurent Charlin (MILA, Universite de Montreal, RLlab, McGill University & Google Brain)Temperature Is All You Need September 1, 2018 9 / 19
Recent advancements in Sequential GANs
MaliGAN [9]: rescale the reward to alleviate the vanishing gradient
RankGAN [10]: replace binary classification with ranking score
LeakGAN [11]: Discriminator leaks information to the Generator
TextGAN [12]: MMD loss + latent regularization
a lot more: DPGAN, GSGAN, IRLGAN, etc
Massimo Caccia*, Lucas Caccia*, William Fedus,Hugo Larochelle, Joelle Pineau, Laurent Charlin (MILA, Universite de Montreal, RLlab, McGill University & Google Brain)Temperature Is All You Need September 1, 2018 9 / 19
Recent advancements in Sequential GANs
MaliGAN [9]: rescale the reward to alleviate the vanishing gradient
RankGAN [10]: replace binary classification with ranking score
LeakGAN [11]: Discriminator leaks information to the Generator
TextGAN [12]: MMD loss + latent regularization
a lot more: DPGAN, GSGAN, IRLGAN, etc
Massimo Caccia*, Lucas Caccia*, William Fedus,Hugo Larochelle, Joelle Pineau, Laurent Charlin (MILA, Universite de Montreal, RLlab, McGill University & Google Brain)Temperature Is All You Need September 1, 2018 9 / 19
Recent advancements in Sequential GANs
MaliGAN [9]: rescale the reward to alleviate the vanishing gradient
RankGAN [10]: replace binary classification with ranking score
LeakGAN [11]: Discriminator leaks information to the Generator
TextGAN [12]: MMD loss + latent regularization
a lot more: DPGAN, GSGAN, IRLGAN, etc
Massimo Caccia*, Lucas Caccia*, William Fedus,Hugo Larochelle, Joelle Pineau, Laurent Charlin (MILA, Universite de Montreal, RLlab, McGill University & Google Brain)Temperature Is All You Need September 1, 2018 9 / 19
Recent advancements in Sequential GANs
MaliGAN [9]: rescale the reward to alleviate the vanishing gradient
RankGAN [10]: replace binary classification with ranking score
LeakGAN [11]: Discriminator leaks information to the Generator
TextGAN [12]: MMD loss + latent regularization
a lot more: DPGAN, GSGAN, IRLGAN, etc
Massimo Caccia*, Lucas Caccia*, William Fedus,Hugo Larochelle, Joelle Pineau, Laurent Charlin (MILA, Universite de Montreal, RLlab, McGill University & Google Brain)Temperature Is All You Need September 1, 2018 9 / 19
flawed evaluation protocol
BLEU [13] is an n-gram overlap metric most popular in MT
To advertise the Textual GANs, corpus-level BLEU was designed
corpus-level BLEU (alone) is really bad
Massimo Caccia*, Lucas Caccia*, William Fedus,Hugo Larochelle, Joelle Pineau, Laurent Charlin (MILA, Universite de Montreal, RLlab, McGill University & Google Brain)Temperature Is All You Need September 1, 2018 10 / 19
flawed evaluation protocol
BLEU [13] is an n-gram overlap metric most popular in MT
To advertise the Textual GANs, corpus-level BLEU was designed
corpus-level BLEU (alone) is really bad
Massimo Caccia*, Lucas Caccia*, William Fedus,Hugo Larochelle, Joelle Pineau, Laurent Charlin (MILA, Universite de Montreal, RLlab, McGill University & Google Brain)Temperature Is All You Need September 1, 2018 10 / 19
flawed evaluation protocol
BLEU [13] is an n-gram overlap metric most popular in MT
To advertise the Textual GANs, corpus-level BLEU was designed
corpus-level BLEU (alone) is really bad
Massimo Caccia*, Lucas Caccia*, William Fedus,Hugo Larochelle, Joelle Pineau, Laurent Charlin (MILA, Universite de Montreal, RLlab, McGill University & Google Brain)Temperature Is All You Need September 1, 2018 10 / 19
flawed evaluation protocol
BLEU [13] is an n-gram overlap metric most popular in MT
To advertise the Textual GANs, corpus-level BLEU was designed
corpus-level BLEU (alone) is really bad
Massimo Caccia*, Lucas Caccia*, William Fedus,Hugo Larochelle, Joelle Pineau, Laurent Charlin (MILA, Universite de Montreal, RLlab, McGill University & Google Brain)Temperature Is All You Need September 1, 2018 10 / 19
quick fix
Self-BLEU [14]: BLEU score between generated sentences
Figure 4: Results taken from [15]. Lower is better for both metrics.
Massimo Caccia*, Lucas Caccia*, William Fedus,Hugo Larochelle, Joelle Pineau, Laurent Charlin (MILA, Universite de Montreal, RLlab, McGill University & Google Brain)Temperature Is All You Need September 1, 2018 11 / 19
quick fix
Self-BLEU [14]: BLEU score between generated sentences
Figure 4: Results taken from [15]. Lower is better for both metrics.
Massimo Caccia*, Lucas Caccia*, William Fedus,Hugo Larochelle, Joelle Pineau, Laurent Charlin (MILA, Universite de Montreal, RLlab, McGill University & Google Brain)Temperature Is All You Need September 1, 2018 11 / 19
temperature tuning
G(yt|y1:t−1) = softmax(ot ·W/α).
ot: generator’s pre-logits activation at tW : word embedding matrixα: temperature parameter
Massimo Caccia*, Lucas Caccia*, William Fedus,Hugo Larochelle, Joelle Pineau, Laurent Charlin (MILA, Universite de Montreal, RLlab, McGill University & Google Brain)Temperature Is All You Need September 1, 2018 12 / 19
temperature tuning
G(yt|y1:t−1) = softmax(ot ·W/α).
ot: generator’s pre-logits activation at tW : word embedding matrixα: temperature parameter
Massimo Caccia*, Lucas Caccia*, William Fedus,Hugo Larochelle, Joelle Pineau, Laurent Charlin (MILA, Universite de Montreal, RLlab, McGill University & Google Brain)Temperature Is All You Need September 1, 2018 12 / 19
temperature tuning
α
2 (1) If you go at watch crucial characters putting awareness in Washington , forgetthere are now unique developments organized personally then why charge .
(2) Front wants zero house blood number places than above spin 5 provide schoolprojects which youth particularly teenager temporary dollars plenty of investors enjoyheaded Japan about if federal assets own , at 41 .
0.7 (1) The other witnesses are believed to have been injured , the police said in a state-ment , adding that there was no immediate threat to any other witnesses .
(2) The company ’ s net income fell to 5 . 29 billion , or 2 cents per share , on thesame period last year .
0 (1) The company ’ s shares rose 1 . 5 percent to 1 . 81 percent , the highest since theend of the year .
(2) The company ’ s shares rose 1 . 5 percent to 1 . 81 percent , the highest since theend of the year .
Table 2: Effect of varying the temperature of the softmax layer in anautoregressive language model
Massimo Caccia*, Lucas Caccia*, William Fedus,Hugo Larochelle, Joelle Pineau, Laurent Charlin (MILA, Universite de Montreal, RLlab, McGill University & Google Brain)Temperature Is All You Need September 1, 2018 13 / 19
Quality/Diversity space w.r.t temperatures
Massimo Caccia*, Lucas Caccia*, William Fedus,Hugo Larochelle, Joelle Pineau, Laurent Charlin (MILA, Universite de Montreal, RLlab, McGill University & Google Brain)Temperature Is All You Need September 1, 2018 14 / 19
Quality/Diversity space w.r.t temperatures
Massimo Caccia*, Lucas Caccia*, William Fedus,Hugo Larochelle, Joelle Pineau, Laurent Charlin (MILA, Universite de Montreal, RLlab, McGill University & Google Brain)Temperature Is All You Need September 1, 2018 14 / 19
New (global) metrics in town
Language Model score (quality): likelihood of the generatedsentences under a LM
Reverse LM score [16] (diversity + quality)
Massimo Caccia*, Lucas Caccia*, William Fedus,Hugo Larochelle, Joelle Pineau, Laurent Charlin (MILA, Universite de Montreal, RLlab, McGill University & Google Brain)Temperature Is All You Need September 1, 2018 15 / 19
New (global) metrics in town
Language Model score (quality): likelihood of the generatedsentences under a LM
Reverse LM score [16] (diversity + quality)
Massimo Caccia*, Lucas Caccia*, William Fedus,Hugo Larochelle, Joelle Pineau, Laurent Charlin (MILA, Universite de Montreal, RLlab, McGill University & Google Brain)Temperature Is All You Need September 1, 2018 15 / 19
New (global) metrics in town
Language Model score (quality): likelihood of the generatedsentences under a LM
Reverse LM score [16] (diversity + quality)
Massimo Caccia*, Lucas Caccia*, William Fedus,Hugo Larochelle, Joelle Pineau, Laurent Charlin (MILA, Universite de Montreal, RLlab, McGill University & Google Brain)Temperature Is All You Need September 1, 2018 15 / 19
Quality/Diversity space w.r.t temperatures (global)
Figure 5: lower is better for both metrics
Massimo Caccia*, Lucas Caccia*, William Fedus,Hugo Larochelle, Joelle Pineau, Laurent Charlin (MILA, Universite de Montreal, RLlab, McGill University & Google Brain)Temperature Is All You Need September 1, 2018 16 / 19
Quality/Diversity space w.r.t temperatures (global)
Figure 5: lower is better for both metrics
Massimo Caccia*, Lucas Caccia*, William Fedus,Hugo Larochelle, Joelle Pineau, Laurent Charlin (MILA, Universite de Montreal, RLlab, McGill University & Google Brain)Temperature Is All You Need September 1, 2018 16 / 19
Conclusion
most effective way to evaluate NLG models is in quality/diversityspace w.r.t. multiples temperatures
MLE training is superior (for now) to adversarial training
Massimo Caccia*, Lucas Caccia*, William Fedus,Hugo Larochelle, Joelle Pineau, Laurent Charlin (MILA, Universite de Montreal, RLlab, McGill University & Google Brain)Temperature Is All You Need September 1, 2018 17 / 19
Conclusion
most effective way to evaluate NLG models is in quality/diversityspace w.r.t. multiples temperatures
MLE training is superior (for now) to adversarial training
Massimo Caccia*, Lucas Caccia*, William Fedus,Hugo Larochelle, Joelle Pineau, Laurent Charlin (MILA, Universite de Montreal, RLlab, McGill University & Google Brain)Temperature Is All You Need September 1, 2018 17 / 19
Conclusion
most effective way to evaluate NLG models is in quality/diversityspace w.r.t. multiples temperatures
MLE training is superior (for now) to adversarial training
Massimo Caccia*, Lucas Caccia*, William Fedus,Hugo Larochelle, Joelle Pineau, Laurent Charlin (MILA, Universite de Montreal, RLlab, McGill University & Google Brain)Temperature Is All You Need September 1, 2018 17 / 19
Future Work
What about adversarial latent space regularization ?
Temperature Is All You Need for Image Generation ?
Massimo Caccia*, Lucas Caccia*, William Fedus,Hugo Larochelle, Joelle Pineau, Laurent Charlin (MILA, Universite de Montreal, RLlab, McGill University & Google Brain)Temperature Is All You Need September 1, 2018 18 / 19
Future Work
What about adversarial latent space regularization ?
Temperature Is All You Need for Image Generation ?
Massimo Caccia*, Lucas Caccia*, William Fedus,Hugo Larochelle, Joelle Pineau, Laurent Charlin (MILA, Universite de Montreal, RLlab, McGill University & Google Brain)Temperature Is All You Need September 1, 2018 18 / 19
Future Work
What about adversarial latent space regularization ?
Temperature Is All You Need for Image Generation ?
Massimo Caccia*, Lucas Caccia*, William Fedus,Hugo Larochelle, Joelle Pineau, Laurent Charlin (MILA, Universite de Montreal, RLlab, McGill University & Google Brain)Temperature Is All You Need September 1, 2018 18 / 19
References
Marc’Aurelio Ranzato, Sumit Chopra, Michael Auli, and Wojciech Zaremba.
Sequence level training with recurrent neural networks.arXiv preprint arXiv:1511.06732, 2015.
Samy Bengio, Oriol Vinyals, Navdeep Jaitly, and Noam Shazeer.
Scheduled sampling for sequence prediction with recurrent neural networks.In Advances in Neural Information Processing Systems, pages 1171–1179, 2015.
Tero Karras, Timo Aila, Samuli Laine, and Jaakko Lehtinen.
Progressive growing of gans for improved quality, stability, and variation.arXiv preprint arXiv:1710.10196, 2017.
Ian Goodfellow, Jean Pouget-Abadie, Mehdi Mirza, Bing Xu, David Warde-Farley, Sherjil Ozair,
Aaron Courville, and Yoshua Bengio.Generative adversarial nets.In Advances in neural information processing systems, pages 2672–2680, 2014.
Yoshua Bengio, Nicholas Leonard, and Aaron Courville.
Estimating or propagating gradients through stochastic neurons for conditional computation.arXiv preprint arXiv:1308.3432, 2013.
Eric Jang, Shixiang Gu, and Ben Poole.
Categorical reparameterization with gumbel-softmax.arXiv preprint arXiv:1611.01144, 2016.
Ronald J Williams.
Simple statistical gradient-following algorithms for connectionist reinforcement learning.Machine learning, 8(3-4):229–256, 1992.
Lantao Yu, Weinan Zhang, Jun Wang, and Yong Yu.
Seqgan: Sequence generative adversarial nets with policy gradient.2017.
Tong Che, Yanran Li, Ruixiang Zhang, R Devon Hjelm, Wenjie Li, Yangqiu Song, and Yoshua
Bengio.Maximum-likelihood augmented discrete generative adversarial networks.arXiv preprint arXiv:1702.07983, 2017.
Kevin Lin, Dianqi Li, Xiaodong He, Zhengyou Zhang, and Ming-Ting Sun.
Adversarial ranking for language generation.In Advances in Neural Information Processing Systems, pages 3155–3165, 2017.
Jiaxian Guo, Sidi Lu, Han Cai, Weinan Zhang, Yong Yu, and Jun Wang.
Long text generation via adversarial training with leaked information.arXiv preprint arXiv:1709.08624, 2017.
Yizhe Zhang, Zhe Gan, Kai Fan, Zhi Chen, Ricardo Henao, Dinghan Shen, and Lawrence Carin.
Adversarial feature matching for text generation.arXiv preprint arXiv:1706.03850, 2017.
Kishore Papineni, Salim Roukos, Todd Ward, and Wei-Jing Zhu.
Bleu: a method for automatic evaluation of machine translation.In Proceedings of the 40th annual meeting on association for computational linguistics, pages 311–318.Association for Computational Linguistics, 2002.
Yaoming Zhu, Sidi Lu, Lei Zheng, Jiaxian Guo, Weinan Zhang, Jun Wang, and Yong Yu.
Texygen: A benchmarking platform for text generation models.SIGIR, 2018.
Sidi Lu, Yaoming Zhu, Weinan Zhang, Jun Wang, and Yong Yu.
Neural text generation: Past, present and beyond.arXiv preprint arXiv:1803.07133, 2018.
Junbo Jake Zhao, Yoon Kim, Kelly Zhang, Alexander M Rush, and Yann LeCun.
Adversarially regularized autoencoders for generating discrete structures.CoRR, abs/1706.04223, 2017.
Massimo Caccia*, Lucas Caccia*, William Fedus,Hugo Larochelle, Joelle Pineau, Laurent Charlin (MILA, Universite de Montreal, RLlab, McGill University & Google Brain)Temperature Is All You Need September 1, 2018 19 / 19