Optimization for Datascience
Proximal operator and proximal gradient methods
Lecturer: Robert M. Gower & Alexandre Gramfort
Tutorials: Quentin Bertrand, Nidham Gazagnadou
Master 2 Data Science, Institut Polytechnique de Paris (IPP)
References for today's class
Amir Beck and Marc Teboulle (2009), SIAM J. Imaging Sciences, A Fast Iterative Shrinkage-Thresholding Algorithm for Linear Inverse Problems.

Sébastien Bubeck (2015), Convex Optimization: Algorithms and Complexity, Chapter 1 and Section 5.1.
A Datum Function

Finite Sum Training Problem:

min_{w in R^d} f(w) = (1/n) Σ_{i=1}^n f_i(w)

Each f_i is a datum function: the loss of the model on the i-th data point. The optimization objective is a sum of terms, one per data point.
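As a concrete sketch of the finite-sum structure (a hypothetical least-squares choice of datum functions; all names here are illustrative, not from the slides):

```python
import numpy as np

# Finite-sum training problem: f(w) = (1/n) * sum_i f_i(w),
# sketched with least-squares datum functions f_i(w) = 0.5*(x_i^T w - y_i)^2.
rng = np.random.default_rng(0)
n, d = 50, 3
X = rng.standard_normal((n, d))
y = rng.standard_normal(n)

def f_i(w, i):
    """Loss on the i-th data point."""
    return 0.5 * (X[i] @ w - y[i]) ** 2

def f(w):
    """Full objective: average of the datum functions."""
    return np.mean([f_i(w, i) for i in range(n)])

def grad_f(w):
    """Full gradient: average of the per-datum gradients."""
    return X.T @ (X @ w - y) / n
```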
The Training Problem
Convergence of GD I

Theorem. Let f be convex and L-smooth, and run gradient descent w^{t+1} = w^t - (1/L) ∇f(w^t). Then

f(w^t) - f(w*) ≤ L ||w^0 - w*||^2 / (2t),

where w* is a minimizer of f.

Is f always differentiable? Not true for many problems.
Change of notation: keep the loss and the regularizer separate

The Training problem: min_w L(w) + R(w), where L is the loss function and R is the regularizer.

If L or R is not differentiable (and the other one is), then L + R is not differentiable.
Likewise, if L or R is not smooth, then L + R is not smooth.
Non-smooth Example

The example does not fit the previous theory: it is not smooth. That is the problem, since our convergence analysis of gradient descent assumed smoothness. We need more tools.
Assumptions for this class

The Training problem: min_w L(w) + R(w). What does this mean? We assume the proximal subproblem for R, min_w (1/2)||w - x||^2 + αR(w), is easy to solve.
Examples

SVM with soft margin: not smooth.
Lasso: not smooth, but the prox is easy.
Low Rank Matrix Recovery: not smooth, but the prox is easy.
Convexity: Subgradient

A vector g is a subgradient of a convex function f at w if

f(y) ≥ f(w) + ⟨g, y - w⟩  for all y.

The set of all subgradients at w is the subdifferential ∂f(w). At a minimizer, the flat supporting line g = 0 works: 0 ∈ ∂f(w*).
Examples: L1 norm

For f(w) = |w| on R: ∂f(w) = {sign(w)} if w ≠ 0, and ∂f(0) = [-1, 1]. For the L1 norm ||w||_1 = Σ_i |w_i|, the subdifferential is built coordinate-wise from these sets.
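One valid subgradient of the L1 norm can be computed coordinate-wise; a sketch (at zero we pick 0, one valid choice from the interval [-1, 1]):

```python
import numpy as np

def subgrad_l1(w):
    """Return one subgradient of ||w||_1: sign(w_i) where w_i != 0,
    and 0 (a valid choice from [-1, 1]) where w_i == 0."""
    return np.sign(w)

# Subgradient inequality: ||y||_1 >= ||w||_1 + <g, y - w> for all y.
```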
Optimality conditions

The Training problem: min_w L(w) + R(w). A point w* is optimal if and only if

0 ∈ ∇L(w*) + ∂R(w*).
Working example: Lasso

Lasso: min_w (1/2)||Xw - y||^2 + λ||w||_1. The optimality condition is the inclusion

0 ∈ X^T(Xw* - y) + λ ∂||w*||_1.

A difficult inclusion: solve it iteratively.
Proximal method I

Assume the loss L is smooth with smoothness constant M (writing M for the constant, since L already denotes the loss). For every y,

L(w) ≤ L(y) + ⟨∇L(y), w - y⟩ + (M/2)||w - y||^2.

The w that minimizes this upper bound gives gradient descent. But what about R(w)? Adding R(w) to both sides of the upper bound:

L(w) + R(w) ≤ L(y) + ⟨∇L(y), w - y⟩ + (M/2)||w - y||^2 + R(w).

Can we minimize the right-hand side?
Proximal method: iteratively minimize an upper bound

Minimizing the right-hand side (M is the smoothness constant of the loss): set y = w^t and minimize in w:

w^{t+1} = argmin_w ⟨∇L(w^t), w - w^t⟩ + (M/2)||w - w^t||^2 + R(w)
        = argmin_w (M/2)||w - (w^t - (1/M)∇L(w^t))||^2 + R(w)
        = prox_{R/M}(w^t - (1/M)∇L(w^t)).

This suggests an iterative method. What is this prox operator?
The proximal operator of a convex function R with parameter α > 0 is

prox_{αR}(x) := argmin_w { (1/2)||w - x||^2 + αR(w) }.

EXE: Let R ≡ 0. Show that gradient descent can be written using the proximal map: w^{t+1} = prox_{αR}(w^t - α∇L(w^t)) = w^t - α∇L(w^t). A gradient step is also a proximal step.
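A quick numerical sanity check of this exercise (illustrative names, not from the slides): with R ≡ 0 the prox is the identity, so a prox step reduces to a gradient step; for contrast, the prox of R(w) = (λ/2)||w||^2 has the closed form x / (1 + αλ).

```python
import numpy as np

def prox_zero(x, alpha):
    """prox of R = 0: argmin_w 0.5*||w - x||^2 = x (the identity map)."""
    return x

def prox_sq_l2(x, alpha, lam):
    """prox of R(w) = (lam/2)*||w||^2; setting the gradient of
    0.5*||w - x||^2 + alpha*(lam/2)*||w||^2 to zero gives x/(1+alpha*lam)."""
    return x / (1.0 + alpha * lam)

# With R = 0: prox_zero(w - alpha*grad, alpha) == w - alpha*grad,
# i.e. a gradient step is also a proximal step.
```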
Proximal Operator: a well-defined inclusion

EXE: Is this proximal operator well defined? Is it even a function? The objective w ↦ (1/2)||w - x||^2 + αR(w) is strongly convex (a strongly convex quadratic plus a convex function), so its minimizer exists and is unique: prox_{αR} is a well-defined function. Its optimality condition is the inclusion

0 ∈ prox_{αR}(x) - x + α ∂R(prox_{αR}(x)).

Rearranging:

x - prox_{αR}(x) ∈ α ∂R(prox_{αR}(x)).
Proximal Method: a fixed-point viewpoint

The Training problem: min_w L(w) + R(w), with optimality condition 0 ∈ ∇L(w*) + ∂R(w*). Multiply by -α and add w* to both sides:

w* - α∇L(w*) ∈ w* + α ∂R(w*).

This is exactly the optimality condition characterizing the prox, so

w* = prox_{αR}(w* - α∇L(w*)).

The optimal point is a fixed point of the prox-gradient map, matching the upper-bound viewpoint.
Proximal Operator: Properties
Exe:
Proximal Operator: Soft thresholding

Exe: Show that the prox of the L1 norm, prox_{τ||·||_1}(x), is the soft-thresholding operator, applied coordinate-wise:

[prox_{τ||·||_1}(x)]_i = sign(x_i) · max(|x_i| - τ, 0).
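A sketch of the soft-thresholding operator, checked against a brute-force minimization of the prox objective in the test below:

```python
import numpy as np

def soft_threshold(x, tau):
    """prox of tau*||.||_1, applied coordinate-wise:
    sign(x_i) * max(|x_i| - tau, 0)."""
    return np.sign(x) * np.maximum(np.abs(x) - tau, 0.0)
```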
Proximal Operator: Singular value thresholding

Similarly, the prox of the nuclear norm for matrices soft-thresholds the singular values: if W = U diag(σ) V^T is the SVD, then

prox_{τ||·||_*}(W) = U diag(max(σ - τ, 0)) V^T.
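Singular value thresholding can be sketched with NumPy's SVD (illustrative helper name):

```python
import numpy as np

def svt(W, tau):
    """prox of tau*||.||_* (nuclear norm): soft-threshold the
    singular values of W and reassemble the matrix."""
    U, s, Vt = np.linalg.svd(W, full_matrices=False)
    return U @ np.diag(np.maximum(s - tau, 0.0)) @ Vt
```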
The Proximal Gradient Method

Making an iterative method based on the upper-bound minimization:

w^{t+1} = prox_{αR}(w^t - α∇L(w^t)),

with step size α = 1/M, where M is the smoothness constant of the loss.
Example of prox gradient: Iterative Soft-Thresholding Algorithm (ISTA)

Lasso: min_w (1/2)||Xw - y||^2 + λ||w||_1.

ISTA: w^{t+1} = S_{αλ}(w^t - αX^T(Xw^t - y)), where S_τ is the soft-thresholding operator.
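ISTA for the Lasso can be sketched as follows (synthetic data; step size 1/M with M the largest eigenvalue of X^T X; all names are illustrative):

```python
import numpy as np

def ista(X, y, lam, n_iters=500):
    """Iterative Soft-Thresholding Algorithm for
    min_w 0.5*||Xw - y||^2 + lam*||w||_1."""
    M = np.linalg.eigvalsh(X.T @ X).max()   # smoothness constant of the loss
    alpha = 1.0 / M
    w = np.zeros(X.shape[1])
    for _ in range(n_iters):
        g = X.T @ (X @ w - y)               # gradient of the smooth part
        z = w - alpha * g                   # gradient step
        w = np.sign(z) * np.maximum(np.abs(z) - alpha * lam, 0.0)  # prox step
    return w

rng = np.random.default_rng(0)
X = rng.standard_normal((40, 10))
w_true = np.zeros(10)
w_true[:3] = [2.0, -1.0, 0.5]
y = X @ w_true
w_hat = ista(X, y, lam=0.1)
```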
Convergence of Prox-GD

Theorem (Beck & Teboulle 2009). Let the loss L be convex and M-smooth and let R be convex. Then the prox-gradient iterates with step size α = 1/M satisfy

F(w^t) - F(w*) ≤ M ||w^0 - w*||^2 / (2t),

where F = L + R and w* is a minimizer of F.

The same O(1/t) rate as gradient descent, despite the non-smooth term. Can we do better?
The FISTA Method

FISTA (Fast ISTA) adds momentum to the prox-gradient step. With t_1 = 1 and y^1 = w^0:

w^k = prox_{R/M}(y^k - (1/M)∇L(y^k))
t_{k+1} = (1 + sqrt(1 + 4 t_k^2)) / 2
y^{k+1} = w^k + ((t_k - 1)/t_{k+1}) (w^k - w^{k-1})

Weird, but it works.
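The momentum recursion above turns ISTA into FISTA with only a few extra lines; a sketch on a hypothetical Lasso problem (illustrative names; `v` plays the role of the extrapolated point y^k):

```python
import numpy as np

def fista(X, y, lam, n_iters=500):
    """FISTA for min_w 0.5*||Xw - y||^2 + lam*||w||_1 (Beck & Teboulle 2009)."""
    M = np.linalg.eigvalsh(X.T @ X).max()
    alpha = 1.0 / M
    w = w_prev = np.zeros(X.shape[1])
    v = w.copy()                            # extrapolated point y^k
    t = 1.0
    for _ in range(n_iters):
        g = X.T @ (X @ v - y)
        z = v - alpha * g
        w = np.sign(z) * np.maximum(np.abs(z) - alpha * lam, 0.0)  # prox step
        t_next = (1.0 + np.sqrt(1.0 + 4.0 * t * t)) / 2.0
        v = w + ((t - 1.0) / t_next) * (w - w_prev)                # momentum
        w_prev, t = w, t_next
    return w

rng = np.random.default_rng(0)
X = rng.standard_normal((40, 10))
y = X @ np.concatenate([[2.0, -1.0, 0.5], np.zeros(7)])
w_hat = fista(X, y, lam=0.1)
```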
Convergence of FISTA

Theorem (Beck & Teboulle 2009). Then

F(w^t) - F(w*) ≤ 2M ||w^0 - w*||^2 / (t + 1)^2,

where the w^t are given by the FISTA algorithm.

Is this as good as it gets?
Lab Session 30.09

Room C129 and C130

Bring your laptop. Please install: Python, matplotlib, scipy and numpy.
Introduction to Stochastic Gradient Descent
Recap

The Training Problem has two parts: min_w L(w) + R(w).

General methods:
● Gradient Descent
● Proximal gradient (ISTA)
● Fast proximal gradient (FISTA)
A Datum Function

Finite Sum Training Problem: min_w f(w) = (1/n) Σ_{i=1}^n f_i(w), an optimization over a sum of terms.

Can we use this sum structure?
The Training Problem
Stochastic Gradient Descent

Is it possible to design a method that uses only the gradient of a single data function at each iteration?

Unbiased Estimate. Let j be a random index sampled from {1, …, n} uniformly at random. Then

E_j[∇f_j(w)] = (1/n) Σ_{i=1}^n ∇f_i(w) = ∇f(w).

EXE:
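The unbiasedness claim is easy to verify numerically: averaging the per-datum gradients over all indices recovers the full gradient (illustrative least-squares setup, not from the slides):

```python
import numpy as np

rng = np.random.default_rng(0)
n, d = 30, 4
X = rng.standard_normal((n, d))
y = rng.standard_normal(n)
w = rng.standard_normal(d)

def grad_i(w, i):
    """Gradient of the i-th datum function f_i(w) = 0.5*(x_i^T w - y_i)^2."""
    return (X[i] @ w - y[i]) * X[i]

# E_j[grad f_j(w)] with j uniform on {1,...,n} is exactly the average:
expected = np.mean([grad_i(w, i) for i in range(n)], axis=0)
full = X.T @ (X @ w - y) / n                # gradient of f = (1/n) sum_i f_i
```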
Stochastic Gradient Descent

SGD: sample j ∈ {1, …, n} uniformly at random, then update

w^{t+1} = w^t - α ∇f_j(w^t).

(Figure: the iterates and the optimal point.)
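SGD itself is only a few lines; a sketch on a hypothetical noiseless least-squares problem (so the true w is optimal; the step size α and all names are illustrative):

```python
import numpy as np

def sgd(X, y, alpha=0.05, n_iters=2000, seed=0):
    """SGD: at each step sample j uniformly at random and take
    w <- w - alpha * grad f_j(w), for f_j(w) = 0.5*(x_j^T w - y_j)^2."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    w = np.zeros(d)
    for _ in range(n_iters):
        j = rng.integers(n)                     # single random data point
        w = w - alpha * (X[j] @ w - y[j]) * X[j]
    return w

rng = np.random.default_rng(0)
X = rng.standard_normal((100, 3))
w_star = np.array([1.0, -2.0, 0.5])
y = X @ w_star                                  # noiseless: w_star is optimal
w_hat = sgd(X, y, alpha=0.05)
```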
Assumptions for Convergence

Strong Convexity: there exists μ > 0 such that, for all x and y,

f(y) ≥ f(x) + ⟨∇f(x), y - x⟩ + (μ/2)||y - x||^2.

Expected Bounded Stochastic Gradients: there exists B such that, for all w,

E_j ||∇f_j(w)||^2 ≤ B^2.
Complexity / Convergence

Theorem. Let f be μ-strongly convex with E_j ||∇f_j(w)||^2 ≤ B^2. Then SGD with a constant step size α ≤ 1/(2μ) satisfies

E||w^t - w*||^2 ≤ (1 - 2αμ)^t ||w^0 - w*||^2 + αB^2/(2μ).
Proof: expand the SGD update,

||w^{t+1} - w*||^2 = ||w^t - w*||^2 - 2α⟨∇f_j(w^t), w^t - w*⟩ + α^2 ||∇f_j(w^t)||^2.

Taking expectation with respect to j uses the unbiased estimator; strong convexity bounds the inner-product term from below, and the bounded stochastic gradients bound the last term. Taking total expectation and unrolling the recursion finishes the proof.
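The chain of bounds on this slide can be written out in full (a reconstruction consistent with the stated assumptions):

```latex
\begin{align*}
\|w^{t+1}-w^*\|^2
  &= \|w^t - \alpha \nabla f_j(w^t) - w^*\|^2 \\
  &= \|w^t-w^*\|^2
     - 2\alpha \langle \nabla f_j(w^t),\, w^t-w^*\rangle
     + \alpha^2 \|\nabla f_j(w^t)\|^2 .
\end{align*}
% Expectation w.r.t. j (unbiased estimator), then strong convexity
% <grad f(w), w - w*> >= mu ||w - w*||^2 and E_j ||grad f_j||^2 <= B^2:
\[
\mathbb{E}_j \|w^{t+1}-w^*\|^2
  \le (1-2\alpha\mu)\,\|w^t-w^*\|^2 + \alpha^2 B^2 .
\]
% Taking total expectation and unrolling the geometric recursion
% gives the theorem's bound (1-2*alpha*mu)^t * ||w^0-w*||^2 + alpha*B^2/(2*mu).
```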
(Figures: Stochastic Gradient Descent runs with α = 0.01, 0.1, 0.2, 0.5.)