mini-batching, sampling, momentum and other tricks · 2021. 6. 25. · how to choose the minibatch...
TRANSCRIPT
Optimization for Data Science
Mini-batching, sampling, momentum and other tricks
Lecturer: Robert M. Gower & Alexandre Gramfort
Tutorials: Quentin Bertrand, Nidham Gazagnadou
The Stochastic Gradient Method
To minimize $f(w) = \frac{1}{n}\sum_{i=1}^{n} f_i(w)$:
$$ w^{t+1} = w^t - \gamma\, \nabla f_{j_t}(w^t), $$
where $j_t \in \{1,\dots,n\}$ is sampled i.i.d. and $\gamma > 0$ is the step size (learning rate).
What about mini-batching?
The Stochastic Gradient Method (mini-batch)
Sample a mini-batch $B_t \subseteq \{1,\dots,n\}$ with $|B_t| = b$ and update
$$ w^{t+1} = w^t - \frac{\gamma}{b}\sum_{i \in B_t} \nabla f_i(w^t). $$
What should $b$ and $\gamma$ be? How does $b$ influence the step size $\gamma$? How does the data influence the best mini-batch size and step size?
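A minimal sketch of the mini-batch update above, assuming scalar parameters and a hypothetical per-example gradient callback `grad_i(i, w)`. The mini-batch is drawn uniformly without replacement; `b = 1` recovers plain SGD and `b = n` recovers gradient descent. The toy objective `f_i(w) = 0.5 * (w - y_i)^2` is an illustrative assumption.

```python
import random
from typing import Callable


def minibatch_sgd(grad_i: Callable[[int, float], float], n: int, w0: float,
                  gamma: float, b: int, iters: int, seed: int = 0) -> float:
    """Mini-batch SGD: w <- w - (gamma / b) * sum_{i in B_t} grad_i(i, w)."""
    if not 1 <= b <= n:
        raise ValueError("need 1 <= b <= n")
    rng = random.Random(seed)
    w = w0
    for _ in range(iters):
        batch = rng.sample(range(n), b)  # b distinct indices, uniformly at random
        g = sum(grad_i(i, w) for i in batch) / b
        w = w - gamma * g
    return w


# Toy problem: f_i(w) = 0.5 * (w - y_i)^2, so grad f_i(w) = w - y_i and w* = mean(y) = 2.5.
y = [1.0, 2.0, 3.0, 4.0]
w_full = minibatch_sgd(lambda i, w: w - y[i], n=4, w0=0.0, gamma=1.0, b=4, iters=1)
w_small = minibatch_sgd(lambda i, w: w - y[i], n=4, w0=0.0, gamma=0.1, b=2, iters=500)
print(w_full, w_small)
```

With a constant step size, small mini-batches only reach a neighbourhood of $w^*$ whose size depends on both $b$ and $\gamma$, which is why the pair $(b, \gamma)$ has to be chosen jointly.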
How to choose the minibatch size?
[Figure: grid search over minibatch size and step size; each pair is coloured by its cross validation score/objective function, from good to bad, and the best parameters are marked.]
Linear Scaling Rule: When the mini-batch size is multiplied by k, multiply the learning rate by k.
(Accurate, Large Minibatch SGD: Training ImageNet in 1 Hour, Goyal et al., CoRR 2017)
[Figure: the same grid of minibatch size vs. step size, coloured by cross validation score from good to bad; the pairs selected by the linear scaling rule missed the best one.]
Need to figure out the functional relationship between minibatch size and step size.
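The Linear Scaling Rule can be written as a one-line function. This is a sketch; the function name and the reference point (learning rate 0.1 tuned at batch size 256) are illustrative assumptions, not values from the slides.

```python
def linear_scaling_lr(base_lr: float, base_batch: int, batch: int) -> float:
    """Linear Scaling Rule: batch size multiplied by k -> learning rate multiplied by k."""
    if base_batch <= 0 or batch <= 0:
        raise ValueError("batch sizes must be positive")
    k = batch / base_batch
    return base_lr * k


# Hypothetical reference point: learning rate 0.1 tuned at batch size 256.
for b in (256, 1024, 8192):
    print(b, linear_scaling_lr(0.1, 256, b))
```

The rule turns the two-dimensional search over $(b, \gamma)$ into a search along a single line; as the figure indicates, the best pair need not lie on that line.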
Stochastic Reformulation of Finite sum problems
Original finite sum problem:
$$ \min_{w \in \mathbb{R}^d} f(w) := \frac{1}{n}\sum_{i=1}^{n} f_i(w) $$
Random sampling vector: $v = (v_1,\dots,v_n) \sim \mathcal{D}$ with $\mathbb{E}_{\mathcal{D}}[v_i] = 1$ for $i = 1,\dots,n$.
Simple Stochastic Reformulation: minimizing the expectation of random linear combinations of the original functions,
$$ \min_{w \in \mathbb{R}^d} \mathbb{E}_{v \sim \mathcal{D}}\left[f_v(w)\right], \qquad f_v(w) := \frac{1}{n}\sum_{i=1}^{n} v_i f_i(w). $$
Because $\mathbb{E}_{\mathcal{D}}[v_i] = 1$, we have $\mathbb{E}_{\mathcal{D}}[f_v(w)] = f(w)$, so the reformulation has exactly the same solutions as the original problem.
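To make the identity $\mathbb{E}_{\mathcal{D}}[f_v(w)] = f(w)$ concrete, the sketch below represents a finitely supported distribution $\mathcal{D}$ as a hypothetical list of `(probability, v)` pairs and computes the expectation exactly. The toy functions $f_i(w) = (w - y_i)^2$ and the helper names are assumptions for illustration.

```python
from typing import Callable, List, Sequence, Tuple

Outcome = Tuple[float, List[float]]  # (probability, sampling vector v)


def f_v(fs: Sequence[Callable[[float], float]], v: Sequence[float], w: float) -> float:
    """f_v(w) = (1/n) * sum_i v_i * f_i(w)."""
    n = len(fs)
    return sum(v_i * fi(w) for v_i, fi in zip(v, fs)) / n


def expected_f_v(fs: Sequence[Callable[[float], float]], dist: Sequence[Outcome], w: float) -> float:
    """Exact E_{v ~ D}[f_v(w)] for a finitely supported distribution D."""
    return sum(p * f_v(fs, v, w) for p, v in dist)


# Toy finite sum: f_i(w) = (w - y_i)^2.
ys = [1.0, 2.0, 4.0]
fs = [lambda w, y=y: (w - y) ** 2 for y in ys]
n = len(fs)


def f(w: float) -> float:
    """Original finite-sum objective (1/n) * sum_i f_i(w)."""
    return sum(fi(w) for fi in fs) / n


gd_dist = [(1.0, [1.0] * n)]  # v = (1, ..., 1) with probability one
single_dist = [(1.0 / n, [float(n) if i == j else 0.0 for i in range(n)]) for j in range(n)]
for dist in (gd_dist, single_dist):
    print(expected_f_v(fs, dist, 2.0), f(2.0))
```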
SGD with arbitrary sampling
Sample $v^t \sim \mathcal{D}$ i.i.d. and update
$$ w^{t+1} = w^t - \gamma\, \nabla f_{v^t}(w^t). $$
By design we have that $\mathbb{E}_{\mathcal{D}}[\nabla f_v(w)] = \nabla f(w)$, i.e. the stochastic gradient is unbiased.
The distribution $\mathcal{D}$ encodes any form of i.i.d. mini-batching or non-uniform sampling.
Example: Gradient descent. If $v = (1,\dots,1)$ with probability one, then $f_v = f$ and the update is exactly gradient descent.
Saves time for theorists: one representation for all forms of sampling.
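A sketch of SGD with arbitrary sampling, assuming scalar parameters, a list `grads` of per-function gradient callables, and a hypothetical `sampler(rng)` that returns one sampling vector $v \sim \mathcal{D}$. Swapping the sampler is the only change needed to move between GD, single element SGD and mini-batching; the point-mass sampler below gives gradient descent.

```python
import random
from typing import Callable, List, Sequence

Sampler = Callable[[random.Random], List[float]]


def grad_f_v(grads: Sequence[Callable[[float], float]], v: Sequence[float], w: float) -> float:
    """grad f_v(w) = (1/n) * sum_i v_i * grad f_i(w); terms with v_i = 0 are skipped."""
    n = len(grads)
    return sum(v_i * g(w) for v_i, g in zip(v, grads) if v_i != 0.0) / n


def sgd_arbitrary_sampling(grads: Sequence[Callable[[float], float]], sampler: Sampler,
                           w0: float, gamma: float, iters: int, seed: int = 0) -> float:
    """w^{t+1} = w^t - gamma * grad f_{v^t}(w^t), with v^t ~ D drawn i.i.d. by `sampler`."""
    rng = random.Random(seed)
    w = w0
    for _ in range(iters):
        v = sampler(rng)
        w = w - gamma * grad_f_v(grads, v, w)
    return w


def gd_sampler(n: int) -> Sampler:
    """D = point mass at v = (1, ..., 1): the method reduces to gradient descent."""
    return lambda rng: [1.0] * n


ys = [1.0, 2.0, 3.0, 4.0]
grads = [lambda w, y=y: w - y for y in ys]  # f_i(w) = 0.5 * (w - y_i)^2
print(sgd_arbitrary_sampling(grads, gd_sampler(len(ys)), w0=0.0, gamma=1.0, iters=1))
```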
Examples of arbitrary sampling: uniform single element
Random set: $S = \{j\}$ with $\mathbb{P}[j = i] = \frac{1}{n}$ for each $i$, and $v = n\, e_j$, so that $\mathbb{E}[v_i] = 1$ and $f_v = f_j$.
Single element SGD: $w^{t+1} = w^t - \gamma\, \nabla f_j(w^t)$.
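A small sketch of the uniform single element sampling vector $v = n\,e_j$, plus an exact check that $\mathbb{E}[v_i] = 1$ by enumerating its $n$ outcomes. The function names are hypothetical helpers for illustration.

```python
import random
from typing import Iterable, Iterator, List, Tuple


def single_element_sample(n: int, rng: random.Random) -> List[float]:
    """Uniform single element: S = {j} with j uniform, and v = n * e_j."""
    j = rng.randrange(n)
    v = [0.0] * n
    v[j] = float(n)
    return v


def single_element_outcomes(n: int) -> Iterator[Tuple[float, List[float]]]:
    """All outcomes (probability 1/n, v = n * e_j) of the same distribution."""
    for j in range(n):
        v = [0.0] * n
        v[j] = float(n)
        yield 1.0 / n, v


def expected_v(outcomes: Iterable[Tuple[float, List[float]]], n: int) -> List[float]:
    """Exact E[v] by summing p * v over the outcomes."""
    ev = [0.0] * n
    for p, v in outcomes:
        for i in range(n):
            ev[i] += p * v[i]
    return ev


print(expected_v(single_element_outcomes(5), 5))  # every entry equals 1 up to rounding
print(single_element_sample(5, random.Random(0)))
```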
Examples of arbitrary sampling: uniform mini-batching
Random set: $S \subseteq \{1,\dots,n\}$ chosen uniformly among all subsets of size $b$, so $\mathbb{P}[S = B] = 1/\binom{n}{b}$ whenever $|B| = b$, and $v = \frac{n}{b}\sum_{i \in S} e_i$, so that $f_v = \frac{1}{b}\sum_{i \in S} f_i$.
Mini-batch SGD without replacement: $w^{t+1} = w^t - \frac{\gamma}{b}\sum_{i \in S_t} \nabla f_i(w^t)$.
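The uniform mini-batching sampling vector can be checked the same way: enumerate all $\binom{n}{b}$ subsets, confirm $\mathbb{E}[v_i] = 1$, and confirm that $f_v$ is the plain mini-batch average. This is a sketch with made-up function values `fvals` standing in for $f_i(w)$ at some fixed $w$.

```python
from itertools import combinations
from math import comb
from typing import Iterator, List, Tuple


def b_nice_outcomes(n: int, b: int) -> Iterator[Tuple[float, List[float]]]:
    """S uniform over subsets of size b; v = (n / b) * 1_S, each with probability 1 / C(n, b)."""
    p = 1.0 / comb(n, b)
    for S in combinations(range(n), b):
        v = [0.0] * n
        for i in S:
            v[i] = n / b
        yield p, v


n, b = 5, 2
ev = [0.0] * n
for p, v in b_nice_outcomes(n, b):
    for i in range(n):
        ev[i] += p * v[i]
print(ev)  # every entry equals 1 up to rounding

# f_v reduces to the mini-batch average (1/b) * sum_{i in S} f_i.
fvals = [3.0, 1.0, 4.0, 1.0, 5.0]  # hypothetical values of f_i(w) at a fixed w
for p, v in b_nice_outcomes(n, b):
    S = [i for i in range(n) if v[i] != 0.0]
    fv = sum(v_i * fi for v_i, fi in zip(v, fvals)) / n
    assert abs(fv - sum(fvals[i] for i in S) / b) < 1e-12
```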
Examples of arbitrary sampling: non-uniform mini-batching
Random set: $S \subseteq \{1,\dots,n\}$ with $p_i := \mathbb{P}[i \in S] > 0$ for every $i$, and $v = \sum_{i \in S} \frac{1}{p_i} e_i$, so that $\mathbb{E}[v_i] = p_i \cdot \frac{1}{p_i} = 1$ and $f_v = \frac{1}{n}\sum_{i \in S} \frac{f_i}{p_i}$.
Arbitrary sampling SGD: $w^{t+1} = w^t - \frac{\gamma}{n}\sum_{i \in S_t} \frac{1}{p_i}\nabla f_i(w^t)$.
Richtárik and Takáč (arXiv:1310.3438; Opt Letters 2016)
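As one concrete, hypothetical instance of a non-uniform random set, the sketch below lets each index enter $S$ independently with its own probability $p_i$ (independent sampling is an assumption chosen for illustration; the slides allow any distribution with $p_i > 0$). Enumerating all $2^n$ subsets confirms that $v_i = \mathbb{1}[i \in S]/p_i$ has $\mathbb{E}[v_i] = 1$.

```python
from itertools import product
from typing import Iterator, List, Sequence, Tuple


def independent_outcomes(p: Sequence[float]) -> Iterator[Tuple[float, List[float]]]:
    """Each i enters S independently with probability p_i; v_i = 1[i in S] / p_i."""
    n = len(p)
    for mask in product((0, 1), repeat=n):
        prob = 1.0
        v = [0.0] * n
        for i, in_S in enumerate(mask):
            if in_S:
                prob *= p[i]
                v[i] = 1.0 / p[i]
            else:
                prob *= 1.0 - p[i]
        yield prob, v


p = [0.9, 0.5, 0.2]  # hypothetical inclusion probabilities
ev = [0.0] * len(p)
total = 0.0
for prob, v in independent_outcomes(p):
    total += prob
    for i in range(len(p)):
        ev[i] += prob * v[i]
print(total, ev)  # total probability 1 and E[v_i] = 1 for every i, up to rounding
```

Rarely sampled indices get large weights $1/p_i$, which keeps the stochastic gradient unbiased at the price of extra variance.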
SGD with arbitrary sampling
Includes all forms of SGD (including GD).
How to analyse this general SGD? Look at the extremes: GD and single element SGD.
Assumptions and convergence of Gradient Descent and SGD
Reminder: convergence of GD for strongly convex and smooth functions
Now smoothness gives
Assumptions and Convergence of Gradient Descent
Smoothness constant $L$: $f(y) \le f(w) + \langle \nabla f(w), y - w\rangle + \frac{L}{2}\|y - w\|^2$ for all $w, y$.
Quasi strong convexity constant $\mu$: $f(w^*) \ge f(w) + \langle \nabla f(w), w^* - w\rangle + \frac{\mu}{2}\|w^* - w\|^2$ for all $w$.
Iteration complexity of gradient descent: with step size $\gamma = 1/L$,
$$ t \ge \frac{L}{\mu}\log\frac{1}{\epsilon} \quad\Longrightarrow\quad \|w^t - w^*\|^2 \le \epsilon\, \|w^0 - w^*\|^2. $$
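A numerical illustration of the iteration complexity of gradient descent, assuming a diagonal quadratic $f(w) = \frac{1}{2}\sum_i \lambda_i w_i^2$ (strongly convex, hence quasi strongly convex, with $\mu = \min_i \lambda_i$, $L = \max_i \lambda_i$, $w^* = 0$). The helper names are hypothetical; the check is that $t = \lceil \frac{L}{\mu}\log\frac{1}{\epsilon} \rceil$ steps of size $1/L$ shrink the squared error by at least a factor $\epsilon$.

```python
import math
from typing import List, Sequence


def gd_iteration_complexity(L: float, mu: float, eps: float) -> int:
    """Smallest integer t with t >= (L / mu) * log(1 / eps)."""
    return math.ceil((L / mu) * math.log(1.0 / eps))


def gd_diag_quadratic(lams: Sequence[float], w0: Sequence[float], iters: int) -> List[float]:
    """GD with step 1/L on f(w) = 0.5 * sum_i lam_i * w_i^2 (w* = 0, L = max lam)."""
    L = max(lams)
    w = list(w0)
    for _ in range(iters):
        w = [wi - (lam * wi) / L for wi, lam in zip(w, lams)]
    return w


lams, w0, eps = [1.0, 10.0], [1.0, 1.0], 1e-6
t = gd_iteration_complexity(max(lams), min(lams), eps)
wt = gd_diag_quadratic(lams, w0, t)
err0 = sum(x * x for x in w0)
err_t = sum(x * x for x in wt)
print(t, err_t / err0)  # the ratio should be at most eps
```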
Assumptions and Convergence of Stochastic Gradient Descent
Definition: each $f_i$ is $L_i$-smooth, and $L_{\max} := \max_{i=1,\dots,n} L_i$.
Bigger smoothness constant / stronger assumption: since $f = \frac{1}{n}\sum_i f_i$ is $\frac{1}{n}\sum_i L_i$-smooth, $L \le L_{\max}$.
Iteration complexity of SGD (Needell, Srebro & Ward, Math. Prog. 2016)
Informal comparison between GD and SGD iteration complexity
GD vs. SGD: how do they compare, in general and when n is big?
Need a new "interpolating" notion of smoothness, ranging from $L$ (GD) to $L_{\max}$ (single element SGD).
Key constant: Expected smoothness
Ass: Expected Smoothness. We write $(f, \mathcal{D}) \sim ES(\mathcal{L})$ when
$$ \mathbb{E}_{\mathcal{D}}\left[\|\nabla f_v(w) - \nabla f_v(w^*)\|^2\right] \le 2\mathcal{L}\,(f(w) - f(w^*)) \quad \text{for all } w. $$
The expected smoothness constant $\mathcal{L}$ depends on both $v$ (through $\mathcal{D}$) and $f$.
Lemma: If every $f_v$ is convex and $L_v$-smooth, then $(f, \mathcal{D}) \sim ES(\mathcal{L})$ with $\mathcal{L} \le \max_v L_v$. Rough estimate (we can do better).
Definition: Gradient noise $\sigma^2 := \mathbb{E}_{\mathcal{D}}\left[\|\nabla f_v(w^*)\|^2\right]$.
Generalization of $\frac{1}{n}\sum_{i=1}^{n}\|\nabla f_i(w^*)\|^2$, which is the gradient noise of uniform single element sampling.
RMG, Richtárik and Bach (arXiv:1805.02632, 2018)
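A sketch that evaluates the gradient noise $\sigma^2 = \mathbb{E}_{\mathcal{D}}\|\nabla f_v(w^*)\|^2$ exactly for a finitely supported $\mathcal{D}$, assuming scalar gradients and a hypothetical `(probability, v)` list representation. For GD the noise is zero (because $\nabla f(w^*) = 0$); for uniform single element sampling it is $\frac{1}{n}\sum_i \|\nabla f_i(w^*)\|^2$.

```python
from typing import List, Sequence, Tuple


def gradient_noise(grads_at_opt: Sequence[float], dist: Sequence[Tuple[float, List[float]]]) -> float:
    """sigma^2 = E_{v ~ D} |grad f_v(w*)|^2 with grad f_v(w*) = (1/n) * sum_i v_i * grad f_i(w*)."""
    n = len(grads_at_opt)
    total = 0.0
    for p, v in dist:
        g = sum(v_i * gi for v_i, gi in zip(v, grads_at_opt)) / n
        total += p * g * g
    return total


# Toy problem: f_i(w) = 0.5 * (w - y_i)^2 with y = (1, 2, 3, 4), so w* = 2.5 and grad f_i(w*) = 2.5 - y_i.
ys = [1.0, 2.0, 3.0, 4.0]
n = len(ys)
g_opt = [2.5 - y for y in ys]
gd = [(1.0, [1.0] * n)]
single = [(1.0 / n, [float(n) if i == j else 0.0 for i in range(n)]) for j in range(n)]
print(gradient_noise(g_opt, gd))      # GD: zero, since grad f(w*) = 0
print(gradient_noise(g_opt, single))  # (1/n) * sum_i |grad f_i(w*)|^2
```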
Example of Expected Smoothness
S is chosen uniformly at random from all subsets of size $b$, with $v = \frac{n}{b}\sum_{i \in S} e_i$. Then
$$ \mathcal{L}(b) = \frac{n(b-1)}{b(n-1)}\,L + \frac{n-b}{b(n-1)}\,L_{\max}, \qquad \sigma^2(b) = \frac{n-b}{b(n-1)}\cdot\frac{1}{n}\sum_{i=1}^{n}\|\nabla f_i(w^*)\|^2, $$
so $\mathcal{L}(1) = L_{\max}$ (single element SGD) and $\mathcal{L}(n) = L$ (GD).
[Figure: expected smoothness plotted against batch size $b = 1,\dots,13$.]
Measures how much the model fits the data.
EXE: In your list!
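The two formulas for uniform mini-batching (b-nice sampling) are easy to tabulate. The sketch below uses hypothetical constants `L = 10`, `L_max = 100`, `n = 13` (not read off the slide's plot), checks the endpoints $\mathcal{L}(1) = L_{\max}$ and $\mathcal{L}(n) = L$, and compares $\sigma^2(b)$ with a brute-force average over all subsets for a toy set of scalar gradients at $w^*$.

```python
from itertools import combinations
from math import comb
from typing import Sequence


def expected_smoothness_b_nice(n: int, b: int, L: float, L_max: float) -> float:
    """L(b) = n(b-1)/(b(n-1)) * L + (n-b)/(b(n-1)) * L_max for b-nice sampling (n >= 2)."""
    return n * (b - 1) / (b * (n - 1)) * L + (n - b) / (b * (n - 1)) * L_max


def gradient_noise_b_nice(n: int, b: int, h: float) -> float:
    """sigma^2(b) = (n-b)/(b(n-1)) * h, with h = (1/n) * sum_i |grad f_i(w*)|^2."""
    return (n - b) / (b * (n - 1)) * h


def gradient_noise_by_enumeration(g_opt: Sequence[float], b: int) -> float:
    """E |(1/b) * sum_{i in S} grad f_i(w*)|^2 over all subsets S of size b (scalar gradients)."""
    n = len(g_opt)
    total = sum((sum(g_opt[i] for i in S) / b) ** 2 for S in combinations(range(n), b))
    return total / comb(n, b)


n, L, L_max = 13, 10.0, 100.0  # hypothetical constants
print([round(expected_smoothness_b_nice(n, b, L, L_max), 2) for b in range(1, n + 1)])

g_opt = [1.5, 0.5, -0.5, -1.5]  # toy grad f_i(w*); they sum to zero because grad f(w*) = 0
h = sum(g * g for g in g_opt) / len(g_opt)
for b in range(1, len(g_opt) + 1):
    print(b, gradient_noise_b_nice(len(g_opt), b, h), gradient_noise_by_enumeration(g_opt, b))
```

Both quantities decrease as $b$ grows, which is what lets the step size grow with the mini-batch size.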
Expected smoothness gives awesome bound on 2nd moment
Assumption: There exists $B > 0$ such that $\mathbb{E}_{\mathcal{D}}\left[\|\nabla f_v(w)\|^2\right] \le B^2$ for all $w$.
Normally a bound on the gradient is an assumption (Niu, Recht, Ré & Wright, Hogwild!, NeurIPS 2011; Hazan & Kale, JMLR 2014; Rakhlin, Shamir & Sridharan, ICML 2012; Shamir & Zhang, ICML 2013).
Lemma: If $(f, \mathcal{D}) \sim ES(\mathcal{L})$, then
$$ \mathbb{E}_{\mathcal{D}}\left[\|\nabla f_v(w)\|^2\right] \le 4\mathcal{L}\,(f(w) - f(w^*)) + 2\sigma^2 \quad \text{for all } w. $$
(Proof: apply $\|a\|^2 \le 2\|a - b\|^2 + 2\|b\|^2$ with $a = \nabla f_v(w)$ and $b = \nabla f_v(w^*)$, take expectations, then use expected smoothness and the definition of $\sigma^2$.)
Expected smoothness gives an awesome bound on the 2nd moment
Normally, a bound on the gradient is an assumption:
Assumption: there exists $B>0$ such that $\mathbb{E}\|\nabla f_v(x)\|^2 \le B^2$ for all $x$.
Niu, Recht, Ré & Wright, Hogwild!, NeurIPS 2011.
Hazan & Kale, JMLR 2014.
Rakhlin, Shamir & Sridharan, ICML 2012.
Shamir & Zhang, ICML 2013.
With expected smoothness, the bound is instead a consequence:
Lemma: $\mathbb{E}\|\nabla f_v(x)\|^2 \le 4\mathcal{L}\,(f(x)-f(x^*)) + 2\sigma^2$.
Informative: it holds under realistic assumptions.
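A quick numerical check of the Lemma, as a sketch under stated assumptions. It uses a toy 1-D least-squares problem $f_i(x) = \frac{1}{2}(a_i x - y_i)^2$ with made-up data, and uniform single-index sampling ($b = 1$). For that sampling, with convex $L_i$-smooth $f_i$, the expected smoothness constant can be taken as $L_{\max} = \max_i a_i^2$. The second moment grows without bound in $|x|$, so no finite $B$ works on all of $\mathbb{R}$. The Lemma's bound, by contrast, holds everywhere.

```python
a = [1.0, 2.0, 0.5, 1.5]     # made-up data
y = [1.0, 3.0, 0.0, 2.0]
n = len(a)

def f(x):
    return sum(0.5 * (ai * x - yi) ** 2 for ai, yi in zip(a, y)) / n

def grad_i(i, x):
    return a[i] * (a[i] * x - y[i])

x_star = sum(ai * yi for ai, yi in zip(a, y)) / sum(ai * ai for ai in a)
L_max = max(ai * ai for ai in a)                              # expected smoothness for b = 1 (assumed)
sigma2 = sum(grad_i(i, x_star) ** 2 for i in range(n)) / n    # gradient noise at x*

def second_moment(x):
    """E_i |grad f_i(x)|^2 when one index is sampled uniformly."""
    return sum(grad_i(i, x) ** 2 for i in range(n)) / n

def lemma_bound(x):
    return 4 * L_max * (f(x) - f(x_star)) + 2 * sigma2

for x in [-5.0, 0.0, x_star, 2.0, 10.0]:
    print(f"x={x:+.3f}  E|g|^2={second_moment(x):.4f}  bound={lemma_bound(x):.4f}")
```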
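Main Theorem (Linear convergence to a neighborhood)
Theorem (fixed stepsize): if $f$ is $\mu$-quasi-strongly convex and satisfies expected smoothness with constant $\mathcal{L}$ for the chosen sampling, and $\gamma \le \frac{1}{2\mathcal{L}}$, then
$$\mathbb{E}\|x^k-x^*\|^2 \le (1-\gamma\mu)^k\|x^0-x^*\|^2 + \frac{2\gamma\sigma^2}{\mu}.$$
This theorem and its corollary save time for theorists. They include GD ($b = n$, so $\sigma^2 = 0$ and $\mathcal{L} = L$) and SGD ($b = 1$) as special cases, and they are also tighter!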
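A minimal simulation of the theorem, under assumptions. The toy problem is $f_i(x) = \frac{1}{2}(x - c_i)^2$ with made-up $c_i$, so $L_i = L = \mu = 1$ and the expected smoothness constant equals 1 for every $b$. Sampling is b-nice, and the fixed stepsize is $\gamma = 1/(2\mathcal{L}) = 1/2$. The code compares a Monte Carlo estimate of $\mathbb{E}|x^k - x^*|^2$ with the bound $(1-\gamma\mu)^k|x^0-x^*|^2 + 2\gamma\sigma^2/\mu$. With $b = n$ (plain GD, $\sigma^2 = 0$) the iterates converge to $x^*$ itself rather than to a neighborhood.

```python
import random

C = [0.0, 1.0, 3.0, 4.0, 7.0]   # toy data: f_i(x) = 0.5 * (x - C[i])**2
N = len(C)
X_STAR = sum(C) / N             # minimizer of f = (1/N) sum_i f_i
MU = 1.0                        # strong convexity constant of f
GAMMA = 0.5                     # 1 / (2 * expected smoothness), which is 1 here

def gradient_noise(b):
    """sigma^2(b) for b-nice sampling: (n-b)/(b(n-1)) * mean_i |grad f_i(x*)|^2."""
    h = sum((X_STAR - c) ** 2 for c in C) / N
    return (N - b) / (b * (N - 1)) * h

def minibatch_sgd(b, x0, iters, rng):
    x = x0
    for _ in range(iters):
        batch = rng.sample(range(N), b)            # S uniform over subsets of size b
        grad = x - sum(C[i] for i in batch) / b    # (1/b) sum_{i in S} grad f_i(x)
        x -= GAMMA * grad
    return x

def mean_sq_error(b, x0=10.0, iters=30, runs=1000, seed=0):
    rng = random.Random(seed)
    return sum((minibatch_sgd(b, x0, iters, rng) - X_STAR) ** 2 for _ in range(runs)) / runs

def theorem_bound(b, x0=10.0, iters=30):
    return (1 - GAMMA * MU) ** iters * (x0 - X_STAR) ** 2 + 2 * GAMMA * gradient_noise(b) / MU

for b in (1, 2, N):
    print(b, mean_sq_error(b), theorem_bound(b))
```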
Proof is SUPER EASY: expand $\|x^{k+1}-x^*\|^2$, take the expectation with respect to $v_k$, use quasi-strong convexity and the Lemma, then take the total expectation.
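The proof steps can be written out as follows. This is a reconstruction under the theorem's assumptions ($\gamma \le 1/(2\mathcal{L})$, $f$ $\mu$-quasi-strongly convex), not a copy of the slide. $\mathbb{E}_k$ denotes expectation with respect to $v_k$.

```latex
\begin{align*}
\|x^{k+1}-x^*\|^2 &= \|x^k-x^*\|^2 - 2\gamma\langle \nabla f_{v_k}(x^k), x^k-x^*\rangle + \gamma^2\|\nabla f_{v_k}(x^k)\|^2 \\
\mathbb{E}_k\|x^{k+1}-x^*\|^2 &= \|x^k-x^*\|^2 - 2\gamma\langle \nabla f(x^k), x^k-x^*\rangle + \gamma^2\,\mathbb{E}_k\|\nabla f_{v_k}(x^k)\|^2
  && \text{(expectation w.r.t. } v_k)\\
&\le (1-\gamma\mu)\|x^k-x^*\|^2 - 2\gamma\,(f(x^k)-f(x^*)) + \gamma^2\,\mathbb{E}_k\|\nabla f_{v_k}(x^k)\|^2
  && \text{(quasi-strong convexity)}\\
&\le (1-\gamma\mu)\|x^k-x^*\|^2 + 2\gamma(2\gamma\mathcal{L}-1)(f(x^k)-f(x^*)) + 2\gamma^2\sigma^2
  && \text{(Lemma)}\\
&\le (1-\gamma\mu)\|x^k-x^*\|^2 + 2\gamma^2\sigma^2
  && (\gamma \le 1/(2\mathcal{L})).
\end{align*}
Taking total expectation and unrolling the recursion,
\[
\mathbb{E}\|x^{k}-x^*\|^2 \le (1-\gamma\mu)^k\|x^0-x^*\|^2 + 2\gamma^2\sigma^2\sum_{j=0}^{k-1}(1-\gamma\mu)^j
\le (1-\gamma\mu)^k\|x^0-x^*\|^2 + \frac{2\gamma\sigma^2}{\mu}.
\]
```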
Exercises on Sampling, Expected Smoothness + gradient noise
Optimal mini-batch sizes
Total complexity for mini-batch SGD
Corollary: the total complexity is a simple function of the mini-batch size b.
Total complexity = (number of stochastic gradients calculated in each iteration) × (number of iterations) = b × iteration complexity.
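Written out, this sketch assumes the fixed-stepsize Main Theorem with $\gamma = \min\{1/(2\mathcal{L}),\ \varepsilon\mu/(4\sigma^2)\}$. The slide's exact constants are not recoverable from this transcript. Requiring each of the two terms in the Theorem's bound to be at most $\varepsilon/2$ gives:

```latex
\[
k \ge \max\left\{\frac{2\mathcal{L}}{\mu},\ \frac{4\sigma^2}{\varepsilon\mu^2}\right\}\log\left(\frac{2\|x^0-x^*\|^2}{\varepsilon}\right)
\;\Longrightarrow\; \mathbb{E}\|x^k-x^*\|^2 \le \varepsilon,
\]
and since each iteration computes $b$ stochastic gradients,
\[
C(b) = b\,k(b) = \max\left\{\frac{2\,b\,\mathcal{L}(b)}{\mu},\ \frac{4\,b\,\sigma^2(b)}{\varepsilon\mu^2}\right\}\log\left(\frac{2\|x^0-x^*\|^2}{\varepsilon}\right).
\]
```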
Optimal mini-batch size
The total complexity is the maximum of two terms in b: a linearly increasing term and a linearly decreasing term. The optimal b is where they cross, or b = 1 if the increasing term already dominates there.
The stepsize increases with b.
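The recipe can be sketched as follows, assuming the b-nice closed forms for $\mathcal{L}(b)$ and $\sigma^2(b)$ and a total complexity of $C(b) = b\cdot\max\{2\mathcal{L}(b)/\mu,\ 4\sigma^2(b)/(\varepsilon\mu^2)\}\cdot\log(2\|x^0-x^*\|^2/\varepsilon)$. That is the form you get by requiring both terms of the Main Theorem's bound to be at most $\varepsilon/2$; the slide's exact constants are an assumption here. All problem constants below ($L$, $L_{\max}$, $\mu$, $h^* = \frac{1}{n}\sum_i\|\nabla f_i(x^*)\|^2$, $\varepsilon$) are hypothetical inputs. Since $C(b)$ is a maximum of two linear functions of $b$, a brute-force search over $b$ is cheap. Setting $h^* = 0$ models data interpolation.

```python
import math

def expected_smoothness(b, n, L, L_max):
    return n * (b - 1) / (b * (n - 1)) * L + (n - b) / (b * (n - 1)) * L_max

def gradient_noise(b, n, h_star):
    # h_star = (1/n) * sum_i |grad f_i(x*)|^2; it is 0 when the model interpolates the data
    return (n - b) / (b * (n - 1)) * h_star

def stepsize(b, n, L, L_max, mu, h_star, eps):
    """gamma(b) = min{1/(2 L(b)), eps*mu/(4 sigma^2(b))}; second term dropped when sigma^2(b) = 0."""
    gamma = 1.0 / (2.0 * expected_smoothness(b, n, L, L_max))
    sigma2 = gradient_noise(b, n, h_star)
    if sigma2 > 0:
        gamma = min(gamma, eps * mu / (4.0 * sigma2))
    return gamma

def total_complexity(b, n, L, L_max, mu, h_star, eps, r0_sq):
    """b * (iterations needed so that E|x^k - x*|^2 <= eps)."""
    iters = max(2.0 * expected_smoothness(b, n, L, L_max) / mu,
                4.0 * gradient_noise(b, n, h_star) / (eps * mu ** 2)) * math.log(2.0 * r0_sq / eps)
    return b * iters

def optimal_batch_size(n, L, L_max, mu, h_star, eps, r0_sq):
    return min(range(1, n + 1),
               key=lambda b: total_complexity(b, n, L, L_max, mu, h_star, eps, r0_sq))

n, L, L_max, mu, eps, r0_sq = 100, 1.0, 10.0, 0.1, 0.01, 1.0   # hypothetical problem constants
b_noisy = optimal_batch_size(n, L, L_max, mu, 1.0, eps, r0_sq)   # h* > 0
b_interp = optimal_batch_size(n, L, L_max, mu, 0.0, eps, r0_sq)  # interpolation: sigma^2 = 0
print(b_noisy, b_interp)
print([round(stepsize(b, n, L, L_max, mu, 1.0, eps), 5) for b in (1, 10, 50, 100)])
```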
Optimal mini-batch size for models that interpolate data
If the model interpolates the data, then $\nabla f_i(x^*) = 0$ for every $i$, so the gradient noise vanishes. Only the linearly increasing term remains, and the optimal mini-batch size is b = 1, even though the stepsize still increases with b.
Are all gains in mini-batching then due to multi-threading and cache memory?
[Figure: Stochastic Gradient Descent with stepsize γ = 0.2.]
Learning schedule: constant & decreasing step sizes
Theorem: learning rate with a switch point, from a constant stepsize to a decreasing one.
The switch point depends on $\mathcal{K} = \mathcal{L}/\mu$, a stochastic condition number.
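Here is a sketch of a learning rate with a switch point. The slide's formula is not in the transcript, so this assumes the schedule from Gower et al. (ICML 2019) as we recall it: constant $\gamma_k = 1/(2\mathcal{L})$ for $k \le 4\lceil\mathcal{K}\rceil$, then $\gamma_k = (2k+1)/((k+1)^2\mu)$, with $\mathcal{K} = \mathcal{L}/\mu$. It runs on a toy problem $f_i(x) = \frac{1}{2}(x - c_i)^2$ with made-up $c_i$ and single-index sampling, so $\mathcal{L} = \mu = 1$. A constant step alone would stall in a neighborhood of $x^*$. After the switch, the error keeps shrinking.

```python
import math
import random

def switch_stepsize(k, L_exp, mu):
    """Constant 1/(2*L_exp) up to the switch point 4*ceil(K), then (2k+1)/((k+1)^2 mu), with K = L_exp/mu."""
    switch = 4 * math.ceil(L_exp / mu)
    if k <= switch:
        return 1.0 / (2.0 * L_exp)
    return (2 * k + 1) / ((k + 1) ** 2 * mu)

C = [0.0, 1.0, 3.0, 4.0, 7.0]   # toy data: f_i(x) = 0.5 * (x - C[i])**2
X_STAR = sum(C) / len(C)

def sgd_with_switch(iters, rng, x0=10.0, L_exp=1.0, mu=1.0):
    x = x0
    for k in range(iters):
        c = C[rng.randrange(len(C))]                 # sample one index uniformly (b = 1)
        x -= switch_stepsize(k, L_exp, mu) * (x - c)
    return x

rng = random.Random(0)
mse = sum((sgd_with_switch(1000, rng) - X_STAR) ** 2 for _ in range(100)) / 100
print(mse)
```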
[Figure: Stochastic Gradient Descent with switch to decreasing stepsizes; the switch point is marked.]
Stochastic variance reduced methods
Stochastic Reformulation
Original finite sum problem: $\min_x f(x) = \frac{1}{n}\sum_{i=1}^n f_i(x)$.
Simple stochastic reformulation: $\min_x \mathbb{E}_v\left[f_v(x)\right]$, where $f_v(x) = \frac{1}{n}\sum_{i=1}^n v_i f_i(x)$ and $v$ is a random sampling vector with $\mathbb{E}[v_i] = 1$.
That is, we minimize the expectation of random linear combinations of the original functions.
What to do about the variance?
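To make the reformulation concrete, here is a sketch that assumes one common choice of sampling vector for b-nice sampling: $v_i = n/b$ if $i \in S$ and $0$ otherwise. Other samplings give other $v$. Exact enumeration with fractions confirms $\mathbb{E}[v_i] = 1$ and hence $\mathbb{E}[f_v(x)] = f(x)$. The component functions are made-up toy quadratics.

```python
from fractions import Fraction
from itertools import combinations

def b_nice_sampling_vectors(n, b):
    """All equally likely sampling vectors: v_i = n/b on a size-b subset S, 0 elsewhere."""
    return [[Fraction(n, b) if i in S else Fraction(0) for i in range(n)]
            for S in combinations(range(n), b)]

def expected_v(n, b):
    vs = b_nice_sampling_vectors(n, b)
    return [sum(v[i] for v in vs) / len(vs) for i in range(n)]

def f_v(v, fs, x):
    """Random linear combination f_v(x) = (1/n) sum_i v_i f_i(x)."""
    return sum(vi * fi(x) for vi, fi in zip(v, fs)) / len(fs)

# Toy components f_i(x) = (x - c_i)^2 with integer c_i, so Fractions stay exact.
fs = [lambda x, c=c: (x - c) ** 2 for c in (0, 1, 3, 4, 7)]
x = Fraction(2)
vs = b_nice_sampling_vectors(5, 2)
mean_fv = sum(f_v(v, fs, x) for v in vs) / len(vs)   # E_v[f_v(x)] by enumeration
f_x = sum(fi(x) for fi in fs) / len(fs)              # original objective f(x)
print(expected_v(5, 2), mean_fv, f_x)
```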
Controlled Stochastic Reformulation
Use covariates to control the variance. Starting from the original finite sum problem, solve
$\min_x \mathbb{E}_v\left[f_v(x) - z_v(x) + \mathbb{E}[z_v(x)]\right]$,
where $z_v$ is a covariate. The covariate terms cancel out in expectation, so the problem is unchanged.
Variance reduction with arbitrary sampling
SGD on the controlled reformulation uses the gradient estimate $g^k = \nabla f_{v_k}(x^k) - \nabla z_{v_k}(x^k) + \mathbb{E}[\nabla z_v(x^k)]$ and the update $x^{k+1} = x^k - \gamma g^k$.
By design we have that $\mathbb{E}[g^k] = \nabla f(x^k)$.
How to choose the covariate $z_v$?
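The "by design" claim is easy to check exactly, as a sketch under assumptions. It uses uniform single-index sampling (one instance of arbitrary sampling) on a made-up 1-D least-squares problem. The estimate $g = \nabla f_i(x) - \nabla z_i(x) + \mathbb{E}_j[\nabla z_j(x)]$ averages to $\nabla f(x)$ for any covariate. The covariate only changes the variance. The reference point `W` is a hypothetical choice. A covariate close to $f_i$ (its linear approximation at `W`) gives a much smaller variance than no covariate at all.

```python
A = [1.0, 2.0, 0.5, 1.5]      # made-up data for f_i(x) = 0.5 * (A[i] * x - Y[i])**2
Y = [1.0, 3.0, 0.0, 2.0]
N = len(A)

def grad_i(i, x):
    return A[i] * (A[i] * x - Y[i])

def full_grad(x):
    return sum(grad_i(i, x) for i in range(N)) / N

def estimates(x, cov_grad):
    """All N equally likely values of g = grad f_i(x) - grad z_i(x) + E_j[grad z_j(x)]."""
    mean_cov = sum(cov_grad(j, x) for j in range(N)) / N
    return [grad_i(i, x) - cov_grad(i, x) + mean_cov for i in range(N)]

def mean_and_var(values):
    m = sum(values) / len(values)
    return m, sum((v - m) ** 2 for v in values) / len(values)

W = 1.0                                   # hypothetical reference point

def no_covariate(i, x):
    return 0.0                            # z_i = 0 recovers the plain SGD estimate

def linear_covariate(i, x):
    return grad_i(i, W)                   # gradient of z_i(x) = f_i(W) + grad f_i(W) * (x - W)

x = 0.9
for name, cov in [("no covariate", no_covariate), ("linear covariate", linear_covariate)]:
    mean, var = mean_and_var(estimates(x, cov))
    print(f"{name}: mean={mean:.6f} var={var:.6f} full grad={full_grad(x):.6f}")
```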
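Choosing the covariate
We would like $z_v(x) \approx f_v(x)$, so that $f_v - z_v$ has small variance.
Linear approximation around a reference point / snapshot $w$: $z_v(x) = f_v(w) + \langle \nabla f_v(w), x - w\rangle$.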
SVRG: Stochastic Variance Reduced Gradients (Johnson & Zhang, NIPS 2013)
Reference point: every m iterations, set $w = x^k$ and compute the full gradient $\nabla f(w)$.
Sample: draw $v_k$ (for example, a mini-batch).
Grad. estimate: $g^k = \nabla f_{v_k}(x^k) - \nabla f_{v_k}(w) + \nabla f(w)$, then $x^{k+1} = x^k - \gamma g^k$.
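A minimal mini-batch SVRG sketch, under assumptions. It uses a made-up 1-D least-squares problem $f_i(x) = \frac{1}{2}(a_i x - y_i)^2$. The stepsize $\gamma = 1/(10L_{\max})$ and inner-loop length $m = 100$ are picked by hand, not taken from the theorem. The last inner iterate becomes the next reference point, which is one common variant. Unlike constant-step SGD, the iterates converge to $x^*$ itself, because the estimate's variance vanishes as $x^k$ and $w$ approach $x^*$.

```python
import random

A = [1.0, 2.0, 0.5, 1.5]      # made-up data for f_i(x) = 0.5 * (A[i] * x - Y[i])**2
Y = [1.0, 3.0, 0.0, 2.0]
N = len(A)

def grad_i(i, x):
    return A[i] * (A[i] * x - Y[i])

def full_grad(x):
    return sum(grad_i(i, x) for i in range(N)) / N

def svrg(x0, gamma, epochs, m, b, seed=0):
    rng = random.Random(seed)
    x = x0
    for _ in range(epochs):
        w = x                                   # reference point (snapshot)
        g_w = full_grad(w)                      # full gradient at the reference point
        for _ in range(m):
            batch = rng.sample(range(N), b)     # sample a mini-batch of size b
            # grad. estimate: grad f_S(x) - grad f_S(w) + grad f(w)
            g = sum(grad_i(i, x) - grad_i(i, w) for i in batch) / b + g_w
            x -= gamma * g
    return x

x_star = sum(a * y for a, y in zip(A, Y)) / sum(a * a for a in A)
L_max = max(a * a for a in A)
x_hat = svrg(x0=10.0, gamma=1.0 / (10 * L_max), epochs=20, m=100, b=1)
print(x_hat, x_star)
```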
Iteration complexity for SVRG and SAGA for arbitrary sampling
Theorem for SVRG: stepsize and iteration complexity (Sebbouh, Gazagnadou, Jelassi, Bach & Gower, 2019).
Theorem for SAGA (and the JacSketch family of methods): stepsize and iteration complexity (Gower, Richtárik & Bach, 2018).
Some details are missing because they require extra definitions.
Total Complexity of mini-batch SVRG (Sebbouh, Gazagnadou, Jelassi, Bach & Gower, 2019)
The total complexity is the maximum of a non-linearly increasing term and a linearly decreasing term in b.
The stepsize increases with b.
Total Complexity of mini-batch SAGA (Gazagnadou, Gower & Salmon, ICML 2019)
Again the maximum of a linearly increasing and a linearly decreasing term in b.
Always smaller than 25% of the data.
Total Complexity of mini-batch SAGA
Real-sim data: (n, d) = (72’309, 20’958)
The theory predicts a good total complexity.
Total Complexity of mini-batch SAGA
Slice data: (n, d) = (53’500, 386)
The theory predicts a good total complexity.
Take home message so far
Stochastic reformulations allow us to view all variants as simple SGD.
All forms of sampling can be analysed through expected smoothness.
How to calculate the optimal mini-batch size of SGD, SAGA and SVRG.
The stepsize increases by orders of magnitude when the mini-batch size increases.
Momentum
Issue with Gradient Descent
Gradient descent: $x^{k+1} = x^k - \gamma\nabla f(x^k)$, where $\gamma$ is the step size / learning rate.
![Page 145: Mini-batching, sampling, momentum and other tricks · 2021. 6. 25. · How to choose the minibatch size? good bad Need to figure out functional relationship between minibatch size](https://reader035.vdocuments.net/reader035/viewer/2022081623/6147209af4263007b1359ee8/html5/thumbnails/145.jpg)
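GD is the “steepest descent” method.
Local rate of change of $f$ at $w$ along a unit direction $d$: $\lim_{s \to 0^+} \frac{f(w + s d) - f(w)}{s} = \langle \nabla f(w), d \rangle$
Max local rate of decrease: by Cauchy-Schwarz, $\min_{\|d\| = 1} \langle \nabla f(w), d \rangle = -\|\nabla f(w)\|$, attained at $d = -\nabla f(w) / \|\nabla f(w)\|$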
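A quick numerical check of the steepest-descent claim, as a minimal sketch: the test function $f(w) = w_1^2 + 10 w_2^2$, the point $w = (1, 1)$ and the random sampling of unit directions are illustrative assumptions, not taken from the slides. No sampled direction should have a more negative local rate of change than the normalised negative gradient.

```python
import math
import random

def grad(w):
    # Gradient of the assumed test function f(w) = w[0]**2 + 10 * w[1]**2
    return [2.0 * w[0], 20.0 * w[1]]

def directional_derivative(g, d):
    # Local rate of change of f along the unit direction d: <grad f(w), d>
    return sum(gi * di for gi, di in zip(g, d))

def random_unit_direction(rng):
    angle = rng.uniform(0.0, 2.0 * math.pi)
    return [math.cos(angle), math.sin(angle)]

w = [1.0, 1.0]
g = grad(w)
norm_g = math.sqrt(sum(gi * gi for gi in g))
steepest = [-gi / norm_g for gi in g]  # normalised negative gradient

rng = random.Random(0)
best_random = min(
    directional_derivative(g, random_unit_direction(rng)) for _ in range(10_000)
)
print(directional_derivative(g, steepest), -norm_g)  # the two values coincide
print(best_random)  # never below the steepest-descent value
```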
Issue: GD gets stuck in “flat” valleys
Solution: give it momentum to keep going
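Adding some Momentum to GD
Heavy Ball Method: $w^{t+1} = w^t - \gamma \nabla f(w^t) + \beta (w^t - w^{t-1})$, with momentum parameter $\beta \in [0, 1)$
The term $\beta (w^t - w^{t-1})$ adds “inertia” to the update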
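To illustrate the “flat” valley issue, here is a minimal sketch assuming a hypothetical quadratic $f(w) = \frac{1}{2}(\mu w_1^2 + L w_2^2)$ with a flat direction ($\mu = 0.01$) and a steep one ($L = 1$). The values $\gamma = 1/L$ and $\beta = 0.9$ are illustrative choices, and the Heavy Ball update used is $w^{t+1} = w^t - \gamma \nabla f(w^t) + \beta (w^t - w^{t-1})$ started from $w^{-1} = w^0$.

```python
import math

MU, L = 0.01, 1.0  # assumed curvatures: flat direction (MU) and steep direction (L)

def grad(w):
    # Gradient of f(w) = 0.5 * (MU * w[0]**2 + L * w[1]**2); the minimiser is w* = 0
    return [MU * w[0], L * w[1]]

def gradient_descent(w0, gamma, iters):
    w = list(w0)
    for _ in range(iters):
        w = [wi - gamma * gi for wi, gi in zip(w, grad(w))]
    return w

def heavy_ball(w0, gamma, beta, iters):
    # w^{t+1} = w^t - gamma * grad f(w^t) + beta * (w^t - w^{t-1}), starting with w^{-1} = w^0
    w_prev, w = list(w0), list(w0)
    for _ in range(iters):
        w_next = [wi - gamma * gi + beta * (wi - wpi)
                  for wi, gi, wpi in zip(w, grad(w), w_prev)]
        w_prev, w = w, w_next
    return w

def dist_to_opt(w):
    return math.sqrt(sum(wi * wi for wi in w))

w0 = [1.0, 1.0]
gd_error = dist_to_opt(gradient_descent(w0, gamma=1.0 / L, iters=200))
hb_error = dist_to_opt(heavy_ball(w0, gamma=1.0 / L, beta=0.9, iters=200))
print(f"GD error: {gd_error:.3e}, heavy ball error: {hb_error:.3e}")
```

With these values GD shrinks the flat coordinate by only a factor $1 - \gamma\mu = 0.99$ per step, whereas the Heavy Ball iteration contracts both coordinates at rate $\sqrt{\beta} \approx 0.95$.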
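GD with momentum (GDm): $m^t = \beta m^{t-1} + \nabla f(w^t)$, $\quad w^{t+1} = w^t - \gamma m^t$
The momentum $m^t$ accumulates past gradients, adding “momentum” to the update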
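GDm and Heavy Ball Equivalence
GD with momentum: $m^t = \beta m^{t-1} + \nabla f(w^t)$, $\quad w^{t+1} = w^t - \gamma m^t$
Substituting the first update into the second: $w^{t+1} = w^t - \gamma \nabla f(w^t) - \gamma \beta m^{t-1}$
From the previous step, $w^t = w^{t-1} - \gamma m^{t-1}$, so $-\gamma m^{t-1} = w^t - w^{t-1}$
Hence $w^{t+1} = w^t - \gamma \nabla f(w^t) + \beta (w^t - w^{t-1})$, which is the Heavy Ball method with the same $\gamma$ and $\beta$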
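Since the equivalence is purely algebraic, it can be checked numerically. A minimal sketch, assuming a hypothetical non-quadratic test function $f(w) = \sum_i \log\cosh(w_i) + 0.05 w_i^2$, GDm $m^t = \beta m^{t-1} + \nabla f(w^t)$, $w^{t+1} = w^t - \gamma m^t$ started from $m^{-1} = 0$, and Heavy Ball started from $w^{-1} = w^0$ (both choices make the first step a plain GD step):

```python
import math

def grad(w):
    # Gradient of the assumed test function f(w) = sum_i log(cosh(w_i)) + 0.05 * w_i**2
    return [math.tanh(wi) + 0.1 * wi for wi in w]

def gdm(w0, gamma, beta, iters):
    # GDm: m^t = beta * m^{t-1} + grad f(w^t),  w^{t+1} = w^t - gamma * m^t,  with m^{-1} = 0
    w, m = list(w0), [0.0] * len(w0)
    path = [list(w)]
    for _ in range(iters):
        m = [beta * mi + gi for mi, gi in zip(m, grad(w))]
        w = [wi - gamma * mi for wi, mi in zip(w, m)]
        path.append(list(w))
    return path

def heavy_ball(w0, gamma, beta, iters):
    # Heavy ball: w^{t+1} = w^t - gamma * grad f(w^t) + beta * (w^t - w^{t-1}),  with w^{-1} = w^0
    w_prev, w = list(w0), list(w0)
    path = [list(w)]
    for _ in range(iters):
        w_next = [wi - gamma * gi + beta * (wi - wpi)
                  for wi, gi, wpi in zip(w, grad(w), w_prev)]
        w_prev, w = w, w_next
        path.append(list(w))
    return path

path_gdm = gdm([2.0, -1.5], gamma=0.5, beta=0.8, iters=50)
path_hb = heavy_ball([2.0, -1.5], gamma=0.5, beta=0.8, iters=50)
max_gap = max(abs(a - b) for u, v in zip(path_gdm, path_hb) for a, b in zip(u, v))
print(max_gap)  # zero up to floating-point rounding
```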
Convergence of Gradient Descent with Momentum
Theorem (Polyak 1964). Let $f$ be twice continuously differentiable with $\mu I \preceq \nabla^2 f(w) \preceq L I$, and let $\kappa = L/\mu$. With
stepsize $\gamma = \frac{4}{(\sqrt{L} + \sqrt{\mu})^2}$ and
momentum parameter $\beta = \left(\frac{\sqrt{L} - \sqrt{\mu}}{\sqrt{L} + \sqrt{\mu}}\right)^2$,
GDm converges linearly to the minimizer $w^*$, with a rate per iteration approaching $\sqrt{\beta} = \frac{\sqrt{\kappa} - 1}{\sqrt{\kappa} + 1}$ (globally when $f$ is quadratic, locally near $w^*$ in general).
Corollary: GDm reaches accuracy $\epsilon$ in $O(\sqrt{\kappa} \log(1/\epsilon))$ iterations, compared with $O(\kappa \log(1/\epsilon))$ for GD.
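A minimal sketch comparing iteration counts, assuming a hypothetical diagonal quadratic with eigenvalues $\{1, 10, 100\}$ (so $\mu = 1$, $L = 100$, $\kappa = 100$). GDm uses $\gamma = \frac{4}{(\sqrt{L}+\sqrt{\mu})^2}$ and $\beta = \left(\frac{\sqrt{L}-\sqrt{\mu}}{\sqrt{L}+\sqrt{\mu}}\right)^2$; GD uses the stepsize $2/(L + \mu)$, a standard choice for quadratics that is an assumption here, not taken from the slides. Setting $\beta = 0$ in the Heavy Ball update recovers GD.

```python
import math

def polyak_parameters(mu, L):
    # Stepsize and momentum parameter of Polyak's theorem
    root_L, root_mu = math.sqrt(L), math.sqrt(mu)
    gamma = 4.0 / (root_L + root_mu) ** 2
    beta = ((root_L - root_mu) / (root_L + root_mu)) ** 2
    return gamma, beta

def iterations_to_tol(eigs, gamma, beta, tol=1e-6, max_iter=100_000):
    """Heavy ball / GDm on f(w) = 0.5 * sum_i eigs[i] * w_i**2, whose minimiser is w* = 0.
    beta = 0 gives plain GD. Returns the first t with ||w^t|| <= tol * ||w^0||."""
    w_prev = [1.0] * len(eigs)  # w^{-1} = w^0
    w = [1.0] * len(eigs)
    norm0 = math.sqrt(len(eigs))
    for t in range(max_iter):
        if math.sqrt(sum(wi * wi for wi in w)) <= tol * norm0:
            return t
        w_next = [wi - gamma * lam * wi + beta * (wi - wpi)
                  for wi, lam, wpi in zip(w, eigs, w_prev)]
        w_prev, w = w, w_next
    return max_iter

mu, L = 1.0, 100.0
eigs = [mu, 10.0, L]
gd_iters = iterations_to_tol(eigs, gamma=2.0 / (L + mu), beta=0.0)
gamma, beta = polyak_parameters(mu, L)
gdm_iters = iterations_to_tol(eigs, gamma, beta)
print(f"GD: {gd_iters} iterations, GDm: {gdm_iters} iterations")
```

The gap reflects the corollary: the GD contraction factor here is $\frac{\kappa-1}{\kappa+1} = 99/101$, while GDm contracts at roughly $\frac{\sqrt{\kappa}-1}{\sqrt{\kappa}+1} = 9/11$.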
Proof sketch: GDm convergence
Fundamental Theorem of Calculus: since $\nabla f(w^*) = 0$,
$\nabla f(w^t) = \nabla f(w^t) - \nabla f(w^*) = \int_0^1 \nabla^2 f\big(w^* + s (w^t - w^*)\big)\, ds \, (w^t - w^*) =: H_t (w^t - w^*)$
Substituting into GDm: $w^{t+1} - w^* = (I - \gamma H_t)(w^t - w^*) - \gamma \beta m^{t-1}$,
where $m^{t-1} = \sum_{k=0}^{t-1} \beta^{t-1-k} \nabla f(w^k)$ (taking $m^{-1} = 0$) depends on the whole past. Difficult recurrence
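The Fundamental Theorem of Calculus step, $\nabla f(w) - \nabla f(w^*) = \int_0^1 \nabla^2 f(w^* + s(w - w^*))\, ds \, (w - w^*)$, can be checked numerically. A minimal sketch, assuming the hypothetical separable function $f(w) = \sum_i \log\cosh(w_i)$ (minimised at $w^* = 0$, with diagonal Hessian entries $1/\cosh^2(w_i)$) and approximating the integral with the midpoint rule:

```python
import math

# Assumed test function f(w) = sum_i log(cosh(w_i)), minimised at w* = 0:
# its gradient is tanh(w_i) and its Hessian is diagonal with entries 1 / cosh(w_i)**2.
def grad(w):
    return [math.tanh(wi) for wi in w]

def hessian_diag(w):
    return [1.0 / math.cosh(wi) ** 2 for wi in w]

def averaged_hessian_diag(w, w_star, n=2000):
    # Midpoint-rule approximation of H = int_0^1 Hess f(w* + s (w - w*)) ds (diagonal here)
    acc = [0.0] * len(w)
    for k in range(n):
        s = (k + 0.5) / n
        point = [ws + s * (wi - ws) for wi, ws in zip(w, w_star)]
        acc = [a + h / n for a, h in zip(acc, hessian_diag(point))]
    return acc

w_star = [0.0, 0.0]
w_t = [1.5, -0.7]
H_t = averaged_hessian_diag(w_t, w_star)
lhs = [g - gs for g, gs in zip(grad(w_t), grad(w_star))]      # grad f(w^t) - grad f(w*)
rhs = [h * (wi - ws) for h, wi, ws in zip(H_t, w_t, w_star)]  # H (w^t - w*)
gap = max(abs(a - b) for a, b in zip(lhs, rhs))
print(gap)  # only quadrature error remains
```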
Proof: Convergence of Heavy Ball
Write $\nabla f(w^t) = H_t (w^t - w^*)$ with $H_t = \int_0^1 \nabla^2 f\big(w^* + s (w^t - w^*)\big)\, ds$. The Heavy Ball update then gives
$w^{t+1} - w^* = \big((1 + \beta) I - \gamma H_t\big)(w^t - w^*) - \beta (w^{t-1} - w^*)$
Stacking two consecutive errors gives a simple recurrence!
$\begin{pmatrix} w^{t+1} - w^* \\ w^t - w^* \end{pmatrix} = \begin{pmatrix} (1 + \beta) I - \gamma H_t & -\beta I \\ I & 0 \end{pmatrix} \begin{pmatrix} w^t - w^* \\ w^{t-1} - w^* \end{pmatrix}$
EXE on Eigenvalues: for each eigenvalue $\lambda \in [\mu, L]$ of $H_t$, the $2 \times 2$ block $\begin{pmatrix} 1 + \beta - \gamma \lambda & -\beta \\ 1 & 0 \end{pmatrix}$ has eigenvalues $z$ solving $z^2 - (1 + \beta - \gamma \lambda) z + \beta = 0$. Show that, with the stepsize and momentum parameter of Polyak's theorem, both roots satisfy $|z| = \sqrt{\beta}$.
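The eigenvalue exercise can be sanity-checked numerically. A minimal sketch, assuming illustrative curvature bounds $\mu = 1$ and $L = 100$: it sweeps $\lambda$ over $[\mu, L]$ and computes both roots of $z^2 - (1 + \beta - \gamma\lambda) z + \beta = 0$ with $\gamma = \frac{4}{(\sqrt{L}+\sqrt{\mu})^2}$ and $\beta = \left(\frac{\sqrt{L}-\sqrt{\mu}}{\sqrt{L}+\sqrt{\mu}}\right)^2$.

```python
import cmath
import math

def block_eigenvalues(lam, gamma, beta):
    # Eigenvalues of [[1 + beta - gamma*lam, -beta], [1, 0]], i.e. the roots of
    # z**2 - (1 + beta - gamma*lam) * z + beta = 0
    b = 1.0 + beta - gamma * lam
    disc = cmath.sqrt(b * b - 4.0 * beta)
    return (b + disc) / 2.0, (b - disc) / 2.0

mu, L = 1.0, 100.0  # assumed curvature bounds
gamma = 4.0 / (math.sqrt(L) + math.sqrt(mu)) ** 2
beta = ((math.sqrt(L) - math.sqrt(mu)) / (math.sqrt(L) + math.sqrt(mu))) ** 2

lams = [mu + (L - mu) * k / 100 for k in range(101)]
moduli = [abs(z) for lam in lams for z in block_eigenvalues(lam, gamma, beta)]
print(max(moduli), min(moduli), math.sqrt(beta))
```

All moduli equal $\sqrt{\beta} = 9/11$ up to rounding, which is the rate $\frac{\sqrt{\kappa}-1}{\sqrt{\kappa}+1}$ for $\kappa = 100$: the discriminant is non-positive on $[\mu, L]$, so the roots are complex conjugates whose product is $\beta$.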
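Adding Momentum to SGD
Stochastic Heavy Ball Method: $w^{t+1} = w^t - \gamma \nabla f_{j_t}(w^t) + \beta (w^t - w^{t-1})$
Adds “inertia” to the update
SGD with momentum (SGDm): $m^t = \beta m^{t-1} + \nabla f_{j_t}(w^t)$, $\quad w^{t+1} = w^t - \gamma m^t$
where $j_t \in \{1, \dots, n\}$ is sampled i.i.d.
Rumelhart, Hinton & Williams, 1986, Nature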
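A minimal SGDm sketch, assuming a hypothetical least-squares problem $f(w) = \frac{1}{n}\sum_{i=1}^n \frac{1}{2}(a_i^\top w - b_i)^2$ with $n = 4$ consistent equations, so that every $f_i$ is minimised at the same point $w^* = (1, -2)$. The values $\gamma = 0.1$, $\beta = 0.5$ and the seed are illustrative. Each step samples $j_t$ uniformly and i.i.d. and applies $m^t = \beta m^{t-1} + \nabla f_{j_t}(w^t)$, $w^{t+1} = w^t - \gamma m^t$.

```python
import math
import random

def sgdm_least_squares(A, b, gamma, beta, iters, seed=0):
    # SGDm on f(w) = (1/n) sum_i 0.5 * (a_i . w - b_i)**2, sampling j_t uniformly and i.i.d.
    rng = random.Random(seed)
    n, d = len(A), len(A[0])
    w, m = [0.0] * d, [0.0] * d  # w^0 = 0, m^{-1} = 0
    for _ in range(iters):
        j = rng.randrange(n)
        residual = sum(a * wi for a, wi in zip(A[j], w)) - b[j]
        g = [residual * a for a in A[j]]               # grad f_j(w^t)
        m = [beta * mi + gi for mi, gi in zip(m, g)]   # m^t = beta m^{t-1} + grad f_j(w^t)
        w = [wi - gamma * mi for wi, mi in zip(w, m)]  # w^{t+1} = w^t - gamma m^t
    return w

# Hypothetical consistent system: b = A w_true with w_true = (1, -2)
A = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [1.0, -1.0]]
w_true = [1.0, -2.0]
b = [sum(a * wt for a, wt in zip(row, w_true)) for row in A]
w_hat = sgdm_least_squares(A, b, gamma=0.1, beta=0.5, iters=2000)
error = math.sqrt(sum((wh - wt) ** 2 for wh, wt in zip(w_hat, w_true)))
print(w_hat, error)
```

Because the equations are consistent, every stochastic gradient vanishes at $w^*$ and constant-stepsize SGDm can converge exactly; on noisy data it would instead stall in a neighbourhood of the solution.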
SGDm and Averaging
Unrolling the SGDm recursion $m^t = \beta m^{t-1} + \nabla f_{j_t}(w^t)$ from $m^{-1} = 0$ gives
$m^t = \sum_{k=0}^{t} \beta^{t-k} \nabla f_{j_k}(w^k)$,
an exponentially weighted sum of all past stochastic gradients.
Acts like an approximate variance reduction since averaging stochastic gradients reduces their variance; the reduction is only approximate because the past gradients were computed at the past iterates $w^k$ rather than at $w^t$.
http://fa.bianp.net/teaching/2018/COMP-652/stochastic_gradient.html
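Two quick checks, as a minimal sketch: first, that the recursion $m^t = \beta m^{t-1} + g_t$ (with $m^{-1} = 0$) agrees with the unrolled sum $m^t = \sum_{k=0}^{t} \beta^{t-k} g_k$; second, an idealised illustration of the variance reduction in which every stochastic gradient is the scalar $1$ plus unit Gaussian noise. That idealisation (scalar gradients, all taken at the same point, $\beta = 0.9$) is an assumption that ignores the drift of $w^k$, which is exactly why the reduction is only approximate in practice.

```python
import random
import statistics

def momentum_recursion(grads, beta):
    # m^t = beta * m^{t-1} + g_t with m^{-1} = 0; returns the final m^t
    m = 0.0
    for g in grads:
        m = beta * m + g
    return m

def momentum_unrolled(grads, beta):
    # Closed form m^t = sum_{k=0}^{t} beta**(t - k) * g_k
    t = len(grads) - 1
    return sum(beta ** (t - k) * g for k, g in enumerate(grads))

rng = random.Random(0)
beta = 0.9
single, averaged = [], []
for _ in range(2000):
    # Idealised assumption: true gradient 1.0 plus unit Gaussian noise at every step
    grads = [1.0 + rng.gauss(0.0, 1.0) for _ in range(200)]
    single.append(grads[-1])
    # (1 - beta) * m^t uses weights (1 - beta) * beta**(t - k), which sum to almost 1
    averaged.append((1.0 - beta) * momentum_recursion(grads, beta))

print(statistics.variance(single), statistics.variance(averaged))
```

With weights $(1 - \beta)\beta^{t-k}$ and i.i.d. noise of variance $\sigma^2$, the averaged estimate has variance close to $\frac{1-\beta}{1+\beta}\sigma^2 \approx 0.05\,\sigma^2$ for $\beta = 0.9$.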
R. M. Gower, P. Richtárik, F. Bach (2018). Stochastic quasi-gradient methods: Variance reduction via Jacobian sketching. Preprint online.
N. Gazagnadou, R. M. Gower, J. Salmon (2019). Optimal mini-batch and step sizes for SAGA. ICML 2019.
R. M. Gower, N. Loizou, X. Qian, A. Sailanbayev, E. Shulgin, P. Richtárik (2019). SGD: general analysis and improved rates. ICML 2019.
O. Sebbouh, N. Gazagnadou, S. Jelassi, F. Bach, R. M. Gower (2019). Towards closing the gap between the theory and practice of SVRG. NeurIPS 2019, preprint online.