Efficient Saddle Point Optimization for Modern Machine Learning
Prédoc III - Gauthier Gidel
Jury:
President: Yoshua Bengio
Member: Ioannis Mitliagkas
Supervisor: Simon Lacoste-Julien
Outline
1. Introduction to Saddle Point Optimization, Games and Variational Inequalities.
2. Frank-Wolfe Algorithm for Saddle Point problems.
3. Negative Momentum for improved game dynamics.
4. A Variational inequality perspective on GANs.
5. Future Work.
NB: All the citations in this talk are at the end of the slides.
Slides available on my website: http://gauthiergidel.github.io
Saddle point optimization, Games and Variational Inequalities.
Based on [Gidel et al. 2017], [Gidel et al. 2018a] and [Gidel et al. 2018b]
Game dynamics are ~~weird~~ fascinating
Start with optimization dynamics
Optimization
Smooth, differentiable cost function L
→ Looking for stationary (fixed) points (gradient is 0)
→ Gradient descent
Gauthier Gidel, Predoc III , November 28, 2018
Conservative vector field → Gradient-based dynamics
Saddle point problems
Smooth, differentiable cost function
→ Looking for stationary (fixed) points (gradients are 0)
→ Gradient descent method.
Non-conservative vector field → Gradient-based dynamics
Minmax training is ~~hard~~ different!
(You can replace “minmax” with “two-player games”.)
“Minmax Training is Hard ...”
Example: WGAN [Arjovsky et al. 2017] with linear discriminator and generator
Bilinear saddle point = Linear in 𝜃 and 𝜙 ⇒ “Cycling behavior” (see right).
Dynamics:
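The cycling can be checked by hand on the simplest bilinear objective. The derivation below is a sketch under assumptions that are not taken from the slide: scalar parameters, objective 𝜃·𝜙 (minimized in 𝜃, maximized in 𝜙), simultaneous gradient steps, and a step size η.

```latex
\begin{aligned}
\theta_{t+1} &= \theta_t - \eta\,\phi_t, \qquad
\phi_{t+1} = \phi_t + \eta\,\theta_t, \\
\theta_{t+1}^2 + \phi_{t+1}^2
  &= (\theta_t - \eta\phi_t)^2 + (\phi_t + \eta\theta_t)^2
   = (1 + \eta^2)\,(\theta_t^2 + \phi_t^2).
\end{aligned}
```

Under these assumptions the distance to the equilibrium (0, 0) grows by a factor of √(1 + η²) at every step: the iterates rotate around the solution and slowly spiral outwards instead of converging.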
Multi-player Games
Two-player Games
Two players: Player 1 and Player 2.
Zero-sum game if the two players' losses sum to zero; also called a Saddle Point (SP) problem.
Example: WGAN formulation [Arjovsky et al. 2017]
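For reference, one common way to write these definitions is sketched below. The notation is an assumption, since the slide's own symbols were not recoverable: 𝜃 and 𝜙 are the two players' parameters, with losses 𝓛^(𝜃) and 𝓛^(𝜙) and domains Θ and Φ; G_𝜃 and D_𝜙 are the WGAN generator and 1-Lipschitz critic.

```latex
\begin{aligned}
&\text{Two-player game:} &&
\theta^* \in \arg\min_{\theta \in \Theta} \mathcal{L}^{(\theta)}(\theta, \phi^*)
\quad\text{and}\quad
\phi^* \in \arg\min_{\phi \in \Phi} \mathcal{L}^{(\phi)}(\theta^*, \phi) \\
&\text{Zero-sum (saddle point):} &&
\mathcal{L}^{(\theta)} = -\mathcal{L}^{(\phi)} \\
&\text{WGAN:} &&
\min_{\theta}\ \max_{\phi\,:\,D_\phi \text{ 1-Lipschitz}}\
\mathbb{E}_{x \sim p_{\text{data}}}\big[D_\phi(x)\big]
- \mathbb{E}_{z \sim p_z}\big[D_\phi(G_\theta(z))\big]
\end{aligned}
```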
Non-zero-sum game if the two players' losses do not sum to zero.
Example: non-saturating GAN [Goodfellow et al. 2014], where the generator and the discriminator each have their own loss.
● In games we want to converge to the saddle point.
● This differs from single-objective minimization, where we want to avoid saddle points.
● Saddle point → zero-sum game (or minmax).
Variational Inequality Problem (VIP)
Variational Inequality Problem
- Based on stationary conditions.
- Relates to a vast literature with standard algorithms.
Nash-Equilibrium: no player can improve its cost.
Stationary Conditions: the players' domains can be constraint sets.
Nash-Equilibrium and Stationary Conditions: same problem (under convexity assumption) but different perspective.
Joint minimization vs. stationary point.
The Stationary Conditions of both players can be written as a single inequality:
𝜔* solves the Variational Inequality
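A standard way to write this, following the formulation in [Gidel et al. 2018a], is sketched below. The symbols are assumptions, as the slide's equations were not recoverable: 𝜔 = (𝜃, 𝜙) stacks both players' parameters, Ω = Θ × Φ is the joint constraint set, and F stacks each player's gradient of its own loss.

```latex
\begin{aligned}
&\text{Stationary conditions:} &&
\nabla_\theta \mathcal{L}^{(\theta)}(\omega^*)^\top (\theta - \theta^*) \ge 0
\ \text{ and }\
\nabla_\phi \mathcal{L}^{(\phi)}(\omega^*)^\top (\phi - \phi^*) \ge 0
\quad \forall\, (\theta, \phi) \in \Theta \times \Phi \\
&\text{Variational inequality:} &&
F(\omega^*)^\top (\omega - \omega^*) \ge 0 \quad \forall\, \omega \in \Omega,
\qquad
F(\omega) :=
\begin{pmatrix}
\nabla_\theta \mathcal{L}^{(\theta)}(\omega) \\
\nabla_\phi \mathcal{L}^{(\phi)}(\omega)
\end{pmatrix}
\end{aligned}
```

When there are no constraints, or when 𝜔* lies in the interior of Ω, the inequality reduces to F(𝜔*) = 0.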
Stationary Conditions, in two cases (figures from [Dunn 1979]):
- Unconstrained (or 𝜔* in the interior).
- Constrained and 𝜔* on the boundary.
Techniques to optimize VIP (Batch setting)
Standard Algorithms from Variational Inequality
Method 1: Averaging
- Converges even for “cycling behavior”.
- Easy to implement (out of the training loop).
- Can be combined with any method.
Averaging schemes can be efficiently implemented in an online fashion:
General Online averaging:
Example 1: Uniform averaging
Example 2: Exponential moving averaging (EMA)
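A minimal sketch of online averaging, assuming the update avg_t = (1 − ρ_t)·avg_{t−1} + ρ_t·w_t, where ρ_t = 1/t gives uniform averaging and a constant ρ_t = 1 − β gives EMA. The function names (`online_average`, `uniform`, `ema`) and the cosine test sequence are hypothetical and chosen only for illustration.

```python
import math


def online_average(iterates, rho):
    """Running average: avg_t = (1 - rho(t)) * avg_{t-1} + rho(t) * w_t."""
    avg = 0.0
    for t, w in enumerate(iterates, start=1):
        weight = 1.0 if t == 1 else rho(t)  # the first iterate initializes the average
        avg = (1.0 - weight) * avg + weight * w
    return avg


def uniform(t):
    # rho_t = 1/t reproduces the plain mean of the first t iterates
    return 1.0 / t


def ema(beta):
    # a constant rho_t = 1 - beta gives an exponential moving average
    return lambda t: 1.0 - beta


# A cycling scalar iterate: it never converges, but its uniform average does.
cycle = [math.cos(0.1 * t) for t in range(1, 10001)]
avg_cycle = online_average(cycle, uniform)
```

Only the current average is stored, so the averaging can run outside the training loop without keeping past iterates.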
Method 2: Extragradient
- Step 1 (extrapolation): 𝜔_{t+1/2} = P_Ω[𝜔_t − 𝜂 F(𝜔_t)]
- Step 2 (update): 𝜔_{t+1} = P_Ω[𝜔_t − 𝜂 F(𝜔_{t+1/2})]
(P_Ω is the projection onto the constraint set Ω and 𝜂 is the step size.)
Intuition:
1. Game perspective: look one step into the future and anticipate the next move of the adversary.
- Standard in the literature.
- Does not require averaging.
- Theoretically and empirically faster.
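The two steps can be sketched on a toy problem. Everything specific below is an assumption made for illustration: the scalar bilinear game min over theta, max over phi of theta·phi, no constraint set (hence no projection step), the step size 0.2, and the helper names `F`, `gradient_step`, `extragradient_step` and `run`.

```python
import math


def F(theta, phi):
    # Vector field of the assumed game min_theta max_phi theta * phi:
    # gradient for the minimizing player, negated gradient for the maximizing one.
    return phi, -theta


def gradient_step(theta, phi, eta):
    g_theta, g_phi = F(theta, phi)
    return theta - eta * g_theta, phi - eta * g_phi


def extragradient_step(theta, phi, eta):
    # Step 1: extrapolate to a look-ahead point.
    g_theta, g_phi = F(theta, phi)
    theta_half, phi_half = theta - eta * g_theta, phi - eta * g_phi
    # Step 2: update the current point with the gradient taken at the look-ahead point.
    g_theta, g_phi = F(theta_half, phi_half)
    return theta - eta * g_theta, phi - eta * g_phi


def run(step, eta=0.2, n_steps=500, start=(1.0, 1.0)):
    theta, phi = start
    for _ in range(n_steps):
        theta, phi = step(theta, phi, eta)
    return math.hypot(theta, phi)  # distance to the equilibrium (0, 0)


norm_gd = run(gradient_step)
norm_eg = run(extragradient_step)
```

On this toy game the plain gradient iterates move away from the equilibrium, while the extragradient iterates contract towards it for step sizes below 1.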
Frank-Wolfe Algorithm for Saddle Point Problems
Based on an AISTATS paper [Gidel et al. 2017]. Joint work with Tony Jebara and Simon Lacoste-Julien
Saddle point problems
Smooth, differentiable cost function
→ Compact constraint sets.
→ Looking for stationary (fixed) points
→ Gradient descent method.
Need to project?
Projection-free Method
Projection may be challenging.
Figure from [Dunn 1979]
(Extra-)Gradient method:
- Requires projection.
- Each projection is a quadratic problem.
- Might be too expensive if the constraint set is structured.
Projection-free methods may be used instead:
- Frank-Wolfe is projection-free.
- It only requires solving a linear problem.
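To make "only requires solving a linear problem" concrete, here is a minimal sketch of plain Frank-Wolfe for minimization over the probability simplex. It is not the saddle-point variant of [Gidel et al. 2017]; the quadratic objective, the `target` vector, the step size 2/(t+2) and the function names are assumptions chosen for illustration.

```python
def lmo_simplex(gradient):
    """Linear minimization oracle on the simplex: the vertex minimizing <s, gradient>."""
    best = min(range(len(gradient)), key=lambda i: gradient[i])
    return [1.0 if i == best else 0.0 for i in range(len(gradient))]


def frank_wolfe(grad, x0, n_steps=200):
    x = list(x0)
    for t in range(n_steps):
        s = lmo_simplex(grad(x))   # linear problem, no projection needed
        gamma = 2.0 / (t + 2.0)    # standard open-loop step size
        # A convex combination of feasible points stays feasible.
        x = [(1.0 - gamma) * xi + gamma * si for xi, si in zip(x, s)]
    return x


# Assumed objective: f(x) = 0.5 * ||x - target||^2, with target inside the simplex.
target = [0.2, 0.5, 0.3]


def grad(x):
    return [xi - bi for xi, bi in zip(x, target)]


x_fw = frank_wolfe(grad, [1.0, 0.0, 0.0])
```

On the simplex the linear problem is solved by picking the coordinate with the smallest gradient entry, which is much cheaper than a Euclidean projection; for structured constraint sets the gap can be larger.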
Example of problem with expensive projection:
Projection-free Method for Saddle Point
Theoretical Contributions
[Hammond 1984]
Negative Momentum for Improved Game Dynamics
Based on an AISTATS submission [Gidel et al. 2018b]. Joint work with Reyhane Askari Hemmat, Mohammad Pezeshki, Rémi Le Priol, Gabriel Huang, Simon Lacoste-Julien and Ioannis Mitliagkas
Two-player Games
Nash Equilibrium
Smooth, differentiable L
→ Looking for local Nash equilibria.
→ Gradient method:
   → Simultaneous
   → Alternating
Simultaneous Updates:
Alternating Updates:
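The difference between the two update rules can be sketched on a toy problem. The scalar bilinear game min over theta, max over phi of theta·phi, the step size 0.1 and the function names are assumptions made for illustration, not the slide's setup.

```python
import math


def simultaneous_step(theta, phi, eta):
    # Both players respond to the opponent's *current* iterate.
    return theta - eta * phi, phi + eta * theta


def alternating_step(theta, phi, eta):
    # Player 1 moves first; player 2 responds to the *updated* theta.
    theta_next = theta - eta * phi
    phi_next = phi + eta * theta_next
    return theta_next, phi_next


def final_norm(step, eta=0.1, n_steps=1000, start=(1.0, 0.0)):
    theta, phi = start
    for _ in range(n_steps):
        theta, phi = step(theta, phi, eta)
    return math.hypot(theta, phi)  # distance to the equilibrium (0, 0)


norm_sim = final_norm(simultaneous_step)
norm_alt = final_norm(alternating_step)
```

On this toy game the simultaneous iterates spiral away from the equilibrium, whereas the alternating iterates stay on a bounded orbit around it; neither converges on its own.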
First contribution: Bilinear game
“Proof by picture”
Gradient descent
→ Simultaneous
→ Alternating
Momentum
→ Positive
→ Negative
Second contribution: Game dynamics
Game dynamics under gradient descent
Jacobian is non-symmetric, with complex eigenvalues → Rotations in decision space
Momentum can manipulate the eigenvalues of the Jacobian.
Can momentum help or hurt?
Spoiler
Positive momentum can be bad for adversarial games
Using positive momentum was very common practice when GANs were first invented.
→ Recent work reduced the momentum parameter.
→ Not an accident.
Momentum on games
Recall Polyak’s momentum (on top of simultaneous gradient descent):
The fixed point operator requires a state augmentation (because we need the previous iterate):
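The effect of the sign of the momentum can be sketched on a toy problem. Note the assumptions, which differ from the slide: the sketch uses alternating rather than simultaneous updates, applies momentum to the first player only, and runs on the scalar bilinear game min over theta, max over phi of theta·phi with step size 0.5. The values of `beta` and the function name are illustrative, not the tuned settings of [Gidel et al. 2018b].

```python
import math


def alternating_momentum(beta, eta=0.5, n_steps=500, start=(1.0, 1.0)):
    theta, phi = start
    theta_prev = theta  # state augmentation: the previous iterate is kept
    for _ in range(n_steps):
        # Heavy-ball step for player 1: gradient step plus beta * (theta_t - theta_{t-1}).
        theta_next = theta - eta * phi + beta * (theta - theta_prev)
        theta_prev, theta = theta, theta_next
        # Player 2 uses no momentum and responds to the updated theta.
        phi = phi + eta * theta
    return math.hypot(theta, phi)  # distance to the equilibrium (0, 0)


norm_negative = alternating_momentum(beta=-0.5)
norm_zero = alternating_momentum(beta=0.0)
norm_positive = alternating_momentum(beta=0.5)
```

With these assumed values, negative momentum drives the iterates to the equilibrium, zero momentum keeps them on a bounded orbit, and positive momentum makes them diverge.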
A Variational Inequality Perspective on GANs
Based on an ICLR submission [Gidel et al. 2018a]. Joint work with Hugo Berard, Gaëtan Vignoud, Pascal Vincent, Simon Lacoste-Julien
Quick recap on Generative Adversarial Networks (GANs)
(and two-player games)
Generative Adversarial Networks (GANs)
[Figure: Noise → Generator → Fake Data; Fake Data and True Data → Discriminator → Fake or Real] [Goodfellow et al. NIPS 2014]
Two players: Discriminator and Generator [Goodfellow et al. NIPS 2014]
If D is non-parametric:
Non-saturating GAN: “much stronger gradient in early learning”
Loss of Generator, Loss of Discriminator
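For reference, the objectives from [Goodfellow et al. 2014] can be written as below. The notation is an assumption (G for the generator, D for the discriminator, p_data and p_z for the data and noise distributions, p_G for the distribution of generated samples, 𝓛_G and 𝓛_D for the two losses), since the slide's formulas were not recoverable.

```latex
\begin{aligned}
&\text{Minimax GAN:} &&
\min_G \max_D\
\mathbb{E}_{x \sim p_{\text{data}}}[\log D(x)]
+ \mathbb{E}_{z \sim p_z}[\log(1 - D(G(z)))] \\
&\text{Optimal non-parametric } D: &&
D^*(x) = \frac{p_{\text{data}}(x)}{p_{\text{data}}(x) + p_G(x)}
\ \Rightarrow\
\text{objective} = 2\,\mathrm{JSD}(p_{\text{data}} \,\|\, p_G) - \log 4 \\
&\text{Non-saturating GAN:} &&
\mathcal{L}_G = -\,\mathbb{E}_{z \sim p_z}[\log D(G(z))], \qquad
\mathcal{L}_D = -\,\mathbb{E}_{x \sim p_{\text{data}}}[\log D(x)]
- \mathbb{E}_{z \sim p_z}[\log(1 - D(G(z)))]
\end{aligned}
```

In the non-saturating variant the two losses no longer sum to zero, so the game is not zero-sum.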
GANs as a Variational Inequality
Takeaways:
- GANs can be formulated as a Variational Inequality.
- This encompasses most GAN formulations.
- Standard algorithms from Variational Inequality can be used for GANs.
- Theoretical guarantees (for convex and stochastic cost functions).
Standard Algorithms from Variational Inequality
Method 1: Averaging
Simple Minmax problem:
Method 2: Extragradient
Intuition:
1. Game perspective: look one step into the future and anticipate the next move of the adversary.
2. Euler’s method: extrapolation is close to an implicit method.
The implicit update is unknown: computing it requires solving a non-linear system.
The extrapolated point and the implicit update are almost the same.
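The "almost the same" claim can be made precise with a short derivation. It is a sketch under assumptions not stated on the slide: the unconstrained case, a vector field F that is L-Lipschitz, a step size η, and 𝜔_{t+1/2} denoting the extrapolated point.

```latex
\begin{aligned}
\text{Implicit method:}\quad
& \omega_{t+1} = \omega_t - \eta\, F(\omega_{t+1}) \\
\text{Extragradient:}\quad
& \omega_{t+1/2} = \omega_t - \eta\, F(\omega_t), \qquad
  \omega_{t+1} = \omega_t - \eta\, F(\omega_{t+1/2}) \\
\text{Gap:}\quad
& \|\omega_{t+1/2} - \omega_{t+1}\|
  = \eta\, \|F(\omega_{t+1/2}) - F(\omega_t)\|
  \le \eta L\, \|\omega_{t+1/2} - \omega_t\|
  = \eta^2 L\, \|F(\omega_t)\|
\end{aligned}
```

The extragradient update therefore evaluates F at a point that is within O(η²) of its own output, which is what an implicit step would do exactly.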
Extrapolation from the past: Re-using the gradients
Problem: Extragradient requires computing two gradients at each step.
Solution: Extrapolation from the past, which re-uses a gradient.
- Step 1: 𝜔_{t+1/2} = P_Ω[𝜔_t − 𝜂 F(𝜔_{t−1/2})], re-using the gradient from the previous iteration.
- Step 2: 𝜔_{t+1} = P_Ω[𝜔_t − 𝜂 F(𝜔_{t+1/2})] (same as extragradient).
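A minimal sketch of the idea on a toy problem, assuming the scalar bilinear game min over theta, max over phi of theta·phi, no constraint set (so no projection), a step size of 0.2, and a stored gradient initialized at the starting point. The function names and the gradient counter are hypothetical and only there for illustration.

```python
import math


def F(theta, phi):
    # Vector field of the assumed game min_theta max_phi theta * phi.
    return phi, -theta


def extrapolation_from_the_past(eta=0.2, n_steps=1000, start=(1.0, 1.0)):
    theta, phi = start
    # Assumption: the "past" gradient of the very first step is taken at the start point.
    g_theta, g_phi = F(theta, phi)
    n_gradients = 1
    for _ in range(n_steps):
        # Step 1: extrapolate with the stored gradient (no new gradient computed).
        theta_half, phi_half = theta - eta * g_theta, phi - eta * g_phi
        # Step 2: one new gradient at the extrapolated point, used for the update
        # and stored for the next extrapolation.
        g_theta, g_phi = F(theta_half, phi_half)
        n_gradients += 1
        theta, phi = theta - eta * g_theta, phi - eta * g_phi
    return math.hypot(theta, phi), n_gradients


final_norm_past, gradients_used = extrapolation_from_the_past()
```

Each iteration costs one gradient evaluation instead of the two needed by extragradient, and on this toy game the iterates still contract towards the equilibrium for step sizes below 0.5.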
New method! Related to [Daskalakis et al., 2018].
[Figure: two panels, step-size = 0.2 and step-size = 0.5]
Experimental Results
Bilinear Stochastic Objective:
Experimental Results: WGAN (DCGAN) on CIFAR10
[Figures: Inception Score on CIFAR10; Inception Score vs. number of generator updates. Panels: Extragradient Methods, Averaging]
Extrapolation (Adam style)
Update (Adam style)
Experimental Results: WGAN-GP (ResNet) on CIFAR10
[Figure panels: Extragradient Methods, Averaging. Axis label: Inception Score vs Number of]
Conclusion
- Training of adversarial formulations has been a recurrent issue in modern ML.
- The impact of non-convexity and stochasticity is less understood than in single-objective minimization.
- A better understanding of this framework is key to designing new optimization algorithms.
- We provided tools to better understand saddle point problems, multi-player games and, more generally, variational inequalities.
- However, we have only scratched the surface.
Gauthier Gidel, Mila Tea Talk, October 26, 2018
Thank you!
Gaëtan Vignoud, Hugo Berard, Pascal Vincent, Simon Lacoste-Julien, Tony Jebara, Reyhane Askari Hemmat, Gabriel Huang, Rémi Le Priol, Ioannis Mitliagkas, Mohammad Pezeshki
Bibliography
- Arjovsky, Martin, Soumith Chintala, and Léon Bottou. "Wasserstein GAN." In ICML 2017.
- Dunn, Joseph C. "Rates of convergence for conditional gradient algorithms near singular and nonsingular extremals." SIAM Journal on Control and Optimization, 1979.
- Goodfellow, Ian, Jean Pouget-Abadie, Mehdi Mirza, Bing Xu, David Warde-Farley, Sherjil Ozair, Aaron Courville, and Yoshua Bengio. "Generative adversarial nets." In Advances in Neural Information Processing Systems, 2014.
- Gidel, Gauthier, Tony Jebara, and Simon Lacoste-Julien. "Frank-Wolfe algorithms for saddle point problems." In AISTATS 2017.
- Gidel, Gauthier, Hugo Berard, Gaëtan Vignoud, Pascal Vincent, and Simon Lacoste-Julien. "A Variational Inequality Perspective on Generative Adversarial Nets." arXiv preprint arXiv:1802.10551 (2018).
- Gidel, Gauthier, Reyhane Askari Hemmat, Mohammad Pezeshki, Rémi Le Priol, Gabriel Huang, Simon Lacoste-Julien, and Ioannis Mitliagkas. "Negative momentum for improved game dynamics." arXiv preprint arXiv:1807.04740 (2018).
- Hammond, Janice H. "Solving asymmetric variational inequality problems and systems of equations with generalized nonlinear programming algorithms." PhD diss., Massachusetts Institute of Technology, 1984.