![Page 1: Lecture 18 Optimization approaches to Sparse …Lecture 18 Optimization approaches to Sparse Regularized Regression TexPoint fonts used in EMF. Read the TexPoint manual before you](https://reader034.vdocuments.net/reader034/viewer/2022050404/5f815a1b1b018d4ef135d35c/html5/thumbnails/1.jpg)
Lecture 18 Optimization approaches to Sparse Regularized
Regression
TexPoint fonts used in EMF. Read the TexPoint manual before you delete this box.: AAAAAAAAAAAA
![Page 2: Lecture 18 Optimization approaches to Sparse …Lecture 18 Optimization approaches to Sparse Regularized Regression TexPoint fonts used in EMF. Read the TexPoint manual before you](https://reader034.vdocuments.net/reader034/viewer/2022050404/5f815a1b1b018d4ef135d35c/html5/thumbnails/2.jpg)
Least Squares Linear Regression
![Page 3: Lecture 18 Optimization approaches to Sparse …Lecture 18 Optimization approaches to Sparse Regularized Regression TexPoint fonts used in EMF. Read the TexPoint manual before you](https://reader034.vdocuments.net/reader034/viewer/2022050404/5f815a1b1b018d4ef135d35c/html5/thumbnails/3.jpg)
Lasso
Primal-Dual pair of problems
![Page 4: Lecture 18 Optimization approaches to Sparse …Lecture 18 Optimization approaches to Sparse Regularized Regression TexPoint fonts used in EMF. Read the TexPoint manual before you](https://reader034.vdocuments.net/reader034/viewer/2022050404/5f815a1b1b018d4ef135d35c/html5/thumbnails/4.jpg)
Optimality Conditions
![Page 5: Lecture 18 Optimization approaches to Sparse …Lecture 18 Optimization approaches to Sparse Regularized Regression TexPoint fonts used in EMF. Read the TexPoint manual before you](https://reader034.vdocuments.net/reader034/viewer/2022050404/5f815a1b1b018d4ef135d35c/html5/thumbnails/5.jpg)
An active set approach
![Page 6: Lecture 18 Optimization approaches to Sparse …Lecture 18 Optimization approaches to Sparse Regularized Regression TexPoint fonts used in EMF. Read the TexPoint manual before you](https://reader034.vdocuments.net/reader034/viewer/2022050404/5f815a1b1b018d4ef135d35c/html5/thumbnails/6.jpg)
Optimality Conditions
![Page 7: Lecture 18 Optimization approaches to Sparse …Lecture 18 Optimization approaches to Sparse Regularized Regression TexPoint fonts used in EMF. Read the TexPoint manual before you](https://reader034.vdocuments.net/reader034/viewer/2022050404/5f815a1b1b018d4ef135d35c/html5/thumbnails/7.jpg)
Active set approach
![Page 8: Lecture 18 Optimization approaches to Sparse …Lecture 18 Optimization approaches to Sparse Regularized Regression TexPoint fonts used in EMF. Read the TexPoint manual before you](https://reader034.vdocuments.net/reader034/viewer/2022050404/5f815a1b1b018d4ef135d35c/html5/thumbnails/8.jpg)
Active set approach
![Page 9: Lecture 18 Optimization approaches to Sparse …Lecture 18 Optimization approaches to Sparse Regularized Regression TexPoint fonts used in EMF. Read the TexPoint manual before you](https://reader034.vdocuments.net/reader034/viewer/2022050404/5f815a1b1b018d4ef135d35c/html5/thumbnails/9.jpg)
Checking optimality, choosing next nonzero element
![Page 10: Lecture 18 Optimization approaches to Sparse …Lecture 18 Optimization approaches to Sparse Regularized Regression TexPoint fonts used in EMF. Read the TexPoint manual before you](https://reader034.vdocuments.net/reader034/viewer/2022050404/5f815a1b1b018d4ef135d35c/html5/thumbnails/10.jpg)
Least angle regression
AT
A
x
b
![Page 11: Lecture 18 Optimization approaches to Sparse …Lecture 18 Optimization approaches to Sparse Regularized Regression TexPoint fonts used in EMF. Read the TexPoint manual before you](https://reader034.vdocuments.net/reader034/viewer/2022050404/5f815a1b1b018d4ef135d35c/html5/thumbnails/11.jpg)
Least angle regression
AT
A
x
b
![Page 12: Lecture 18 Optimization approaches to Sparse …Lecture 18 Optimization approaches to Sparse Regularized Regression TexPoint fonts used in EMF. Read the TexPoint manual before you](https://reader034.vdocuments.net/reader034/viewer/2022050404/5f815a1b1b018d4ef135d35c/html5/thumbnails/12.jpg)
Least angle regression
AT
(A(p,n)x(p,n)-b)
![Page 13: Lecture 18 Optimization approaches to Sparse …Lecture 18 Optimization approaches to Sparse Regularized Regression TexPoint fonts used in EMF. Read the TexPoint manual before you](https://reader034.vdocuments.net/reader034/viewer/2022050404/5f815a1b1b018d4ef135d35c/html5/thumbnails/13.jpg)
Least angle regression
AT
(A(p,n)x(p,n)-b)=r
residual
If all angles are “big” (defined by ¸) or r is small then we are done!
![Page 14: Lecture 18 Optimization approaches to Sparse …Lecture 18 Optimization approaches to Sparse Regularized Regression TexPoint fonts used in EMF. Read the TexPoint manual before you](https://reader034.vdocuments.net/reader034/viewer/2022050404/5f815a1b1b018d4ef135d35c/html5/thumbnails/14.jpg)
![Page 15: Lecture 18 Optimization approaches to Sparse …Lecture 18 Optimization approaches to Sparse Regularized Regression TexPoint fonts used in EMF. Read the TexPoint manual before you](https://reader034.vdocuments.net/reader034/viewer/2022050404/5f815a1b1b018d4ef135d35c/html5/thumbnails/15.jpg)
Computing regularization path
![Page 16: Lecture 18 Optimization approaches to Sparse …Lecture 18 Optimization approaches to Sparse Regularized Regression TexPoint fonts used in EMF. Read the TexPoint manual before you](https://reader034.vdocuments.net/reader034/viewer/2022050404/5f815a1b1b018d4ef135d35c/html5/thumbnails/16.jpg)
Per iteration cost
Update factorization of A(p,n)TA(p.n) at each step - O(mk)
Memory – O(k2)
Compute AT(A(p,n)x(p,n)): O(nm) (improved by “sifting”)
Can be too costly to compute and to store.
![Page 17: Lecture 18 Optimization approaches to Sparse …Lecture 18 Optimization approaches to Sparse Regularized Regression TexPoint fonts used in EMF. Read the TexPoint manual before you](https://reader034.vdocuments.net/reader034/viewer/2022050404/5f815a1b1b018d4ef135d35c/html5/thumbnails/17.jpg)
Coordinate descent
![Page 18: Lecture 18 Optimization approaches to Sparse …Lecture 18 Optimization approaches to Sparse Regularized Regression TexPoint fonts used in EMF. Read the TexPoint manual before you](https://reader034.vdocuments.net/reader034/viewer/2022050404/5f815a1b1b018d4ef135d35c/html5/thumbnails/18.jpg)
Coordinate descent
Soft-thresholding operator
![Page 19: Lecture 18 Optimization approaches to Sparse …Lecture 18 Optimization approaches to Sparse Regularized Regression TexPoint fonts used in EMF. Read the TexPoint manual before you](https://reader034.vdocuments.net/reader034/viewer/2022050404/5f815a1b1b018d4ef135d35c/html5/thumbnails/19.jpg)
![Page 20: Lecture 18 Optimization approaches to Sparse …Lecture 18 Optimization approaches to Sparse Regularized Regression TexPoint fonts used in EMF. Read the TexPoint manual before you](https://reader034.vdocuments.net/reader034/viewer/2022050404/5f815a1b1b018d4ef135d35c/html5/thumbnails/20.jpg)
Given the scaled gradient
Can choose coordinate to update by:
• Simply cycle through all coordinates
• Update all at once
• Choose the one with largest gradient component
• Choose the one with largest obj. function improvement
• Choose coordinate at random
![Page 21: Lecture 18 Optimization approaches to Sparse …Lecture 18 Optimization approaches to Sparse Regularized Regression TexPoint fonts used in EMF. Read the TexPoint manual before you](https://reader034.vdocuments.net/reader034/viewer/2022050404/5f815a1b1b018d4ef135d35c/html5/thumbnails/21.jpg)
![Page 22: Lecture 18 Optimization approaches to Sparse …Lecture 18 Optimization approaches to Sparse Regularized Regression TexPoint fonts used in EMF. Read the TexPoint manual before you](https://reader034.vdocuments.net/reader034/viewer/2022050404/5f815a1b1b018d4ef135d35c/html5/thumbnails/22.jpg)
![Page 23: Lecture 18 Optimization approaches to Sparse …Lecture 18 Optimization approaches to Sparse Regularized Regression TexPoint fonts used in EMF. Read the TexPoint manual before you](https://reader034.vdocuments.net/reader034/viewer/2022050404/5f815a1b1b018d4ef135d35c/html5/thumbnails/23.jpg)
First order methods
![Page 24: Lecture 18 Optimization approaches to Sparse …Lecture 18 Optimization approaches to Sparse Regularized Regression TexPoint fonts used in EMF. Read the TexPoint manual before you](https://reader034.vdocuments.net/reader034/viewer/2022050404/5f815a1b1b018d4ef135d35c/html5/thumbnails/24.jpg)
• Consider:
• Linear lower approximation
• Quadratic upper approximation
First-order proximal gradient methods
![Page 25: Lecture 18 Optimization approaches to Sparse …Lecture 18 Optimization approaches to Sparse Regularized Regression TexPoint fonts used in EMF. Read the TexPoint manual before you](https://reader034.vdocuments.net/reader034/viewer/2022050404/5f815a1b1b018d4ef135d35c/html5/thumbnails/25.jpg)
• Minimize quadratic upper approximation on each iteration
• If µ· 1/L then
First-order proximal gradient method
![Page 26: Lecture 18 Optimization approaches to Sparse …Lecture 18 Optimization approaches to Sparse Regularized Regression TexPoint fonts used in EMF. Read the TexPoint manual before you](https://reader034.vdocuments.net/reader034/viewer/2022050404/5f815a1b1b018d4ef135d35c/html5/thumbnails/26.jpg)
• Minimize upper approximation at an intermediate point.
• If µ· 1/L then
Accelerated first-order method Nesterov, ’83, ‘00s,
Beck&Teboulle ‘09
![Page 27: Lecture 18 Optimization approaches to Sparse …Lecture 18 Optimization approaches to Sparse Regularized Regression TexPoint fonts used in EMF. Read the TexPoint manual before you](https://reader034.vdocuments.net/reader034/viewer/2022050404/5f815a1b1b018d4ef135d35c/html5/thumbnails/27.jpg)
• Minimize upper approximation at an intermediate point.
• If µ· 1/L then in iterations finds solution
Complexity of accelerated first-order method Nesterov, ’83, ‘00s,
Beck&Teboulle ‘09
This method is optimal if only gradient information is used.
![Page 28: Lecture 18 Optimization approaches to Sparse …Lecture 18 Optimization approaches to Sparse Regularized Regression TexPoint fonts used in EMF. Read the TexPoint manual before you](https://reader034.vdocuments.net/reader034/viewer/2022050404/5f815a1b1b018d4ef135d35c/html5/thumbnails/28.jpg)
• Consider:
• Quadratic upper approximation
Prox method with nonsmooth term
Assume that g(y) is such that the above function is easy to optimize over y
![Page 29: Lecture 18 Optimization approaches to Sparse …Lecture 18 Optimization approaches to Sparse Regularized Regression TexPoint fonts used in EMF. Read the TexPoint manual before you](https://reader034.vdocuments.net/reader034/viewer/2022050404/5f815a1b1b018d4ef135d35c/html5/thumbnails/29.jpg)
l Minimize quadratic upper relaxation on each iteration
l If µ· 1/L then in O(1/²) iterations finds solution
Beck&Teboulle, Tseng, Auslender&Teboulle, 2008
First-order method for nonsmooth functions
![Page 30: Lecture 18 Optimization approaches to Sparse …Lecture 18 Optimization approaches to Sparse Regularized Regression TexPoint fonts used in EMF. Read the TexPoint manual before you](https://reader034.vdocuments.net/reader034/viewer/2022050404/5f815a1b1b018d4ef135d35c/html5/thumbnails/30.jpg)
l Minimize a upper approximation at an intermediate point.
l If µ· 1/L then in iterations finds solution
Fast-first order methods
Beck&Teboulle, Tseng, 2008
![Page 31: Lecture 18 Optimization approaches to Sparse …Lecture 18 Optimization approaches to Sparse Regularized Regression TexPoint fonts used in EMF. Read the TexPoint manual before you](https://reader034.vdocuments.net/reader034/viewer/2022050404/5f815a1b1b018d4ef135d35c/html5/thumbnails/31.jpg)
g(y) in sparse regression
![Page 32: Lecture 18 Optimization approaches to Sparse …Lecture 18 Optimization approaches to Sparse Regularized Regression TexPoint fonts used in EMF. Read the TexPoint manual before you](https://reader034.vdocuments.net/reader034/viewer/2022050404/5f815a1b1b018d4ef135d35c/html5/thumbnails/32.jpg)
l Minimize upper approximation function Qf(x,y) on each iteration
Example 1 (Lasso)
Closed form solution!
O(n) effort
![Page 33: Lecture 18 Optimization approaches to Sparse …Lecture 18 Optimization approaches to Sparse Regularized Regression TexPoint fonts used in EMF. Read the TexPoint manual before you](https://reader034.vdocuments.net/reader034/viewer/2022050404/5f815a1b1b018d4ef135d35c/html5/thumbnails/33.jpg)
Gradient method for Lasso
2 matrix/vector multiplications + shrinkage operator per iteration
O(1/²) iteration bound
![Page 34: Lecture 18 Optimization approaches to Sparse …Lecture 18 Optimization approaches to Sparse Regularized Regression TexPoint fonts used in EMF. Read the TexPoint manual before you](https://reader034.vdocuments.net/reader034/viewer/2022050404/5f815a1b1b018d4ef135d35c/html5/thumbnails/34.jpg)
Sparse logistic regression
A gradient computation + shrinkage operator per iteration
O(1/²) iteration bound
![Page 35: Lecture 18 Optimization approaches to Sparse …Lecture 18 Optimization approaches to Sparse Regularized Regression TexPoint fonts used in EMF. Read the TexPoint manual before you](https://reader034.vdocuments.net/reader034/viewer/2022050404/5f815a1b1b018d4ef135d35c/html5/thumbnails/35.jpg)
l Very similar to the previous case, but with ||.|| instead of |.|
Example 2 (Group Lasso)
Closed form solution!
O(n) effort
![Page 36: Lecture 18 Optimization approaches to Sparse …Lecture 18 Optimization approaches to Sparse Regularized Regression TexPoint fonts used in EMF. Read the TexPoint manual before you](https://reader034.vdocuments.net/reader034/viewer/2022050404/5f815a1b1b018d4ef135d35c/html5/thumbnails/36.jpg)
SIMPLIFIED ACTIVE SET (EXTRA SLIDES)
![Page 37: Lecture 18 Optimization approaches to Sparse …Lecture 18 Optimization approaches to Sparse Regularized Regression TexPoint fonts used in EMF. Read the TexPoint manual before you](https://reader034.vdocuments.net/reader034/viewer/2022050404/5f815a1b1b018d4ef135d35c/html5/thumbnails/37.jpg)
Optimality Conditions
![Page 38: Lecture 18 Optimization approaches to Sparse …Lecture 18 Optimization approaches to Sparse Regularized Regression TexPoint fonts used in EMF. Read the TexPoint manual before you](https://reader034.vdocuments.net/reader034/viewer/2022050404/5f815a1b1b018d4ef135d35c/html5/thumbnails/38.jpg)
Active set approach
![Page 39: Lecture 18 Optimization approaches to Sparse …Lecture 18 Optimization approaches to Sparse Regularized Regression TexPoint fonts used in EMF. Read the TexPoint manual before you](https://reader034.vdocuments.net/reader034/viewer/2022050404/5f815a1b1b018d4ef135d35c/html5/thumbnails/39.jpg)
Least angle regression
AT
A
x
b
![Page 40: Lecture 18 Optimization approaches to Sparse …Lecture 18 Optimization approaches to Sparse Regularized Regression TexPoint fonts used in EMF. Read the TexPoint manual before you](https://reader034.vdocuments.net/reader034/viewer/2022050404/5f815a1b1b018d4ef135d35c/html5/thumbnails/40.jpg)
Least angle regression
AT
A
x
b
![Page 41: Lecture 18 Optimization approaches to Sparse …Lecture 18 Optimization approaches to Sparse Regularized Regression TexPoint fonts used in EMF. Read the TexPoint manual before you](https://reader034.vdocuments.net/reader034/viewer/2022050404/5f815a1b1b018d4ef135d35c/html5/thumbnails/41.jpg)
Least angle regression
AT
(ABxB-b)
![Page 42: Lecture 18 Optimization approaches to Sparse …Lecture 18 Optimization approaches to Sparse Regularized Regression TexPoint fonts used in EMF. Read the TexPoint manual before you](https://reader034.vdocuments.net/reader034/viewer/2022050404/5f815a1b1b018d4ef135d35c/html5/thumbnails/42.jpg)
Least angle regression
AT
(ABxB-b)=r
residual
![Page 43: Lecture 18 Optimization approaches to Sparse …Lecture 18 Optimization approaches to Sparse Regularized Regression TexPoint fonts used in EMF. Read the TexPoint manual before you](https://reader034.vdocuments.net/reader034/viewer/2022050404/5f815a1b1b018d4ef135d35c/html5/thumbnails/43.jpg)
Per iteration cost
Update factorization of ABTAB: O(mk) if AB2 Rm x k
Compute AT(ABxB): O(nm) (can be improved in practice)
Can be too costly to compute and to store. We’ll now see how to avoid any matrix factorizations