Transcript
Page 1:

Lecture 18 Optimization approaches to Sparse Regularized Regression

Page 2:

Least Squares Linear Regression

Page 3:

Lasso

Primal-Dual pair of problems
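
For reference, one standard way to write the Lasso and its Lagrangian dual is sketched below. The slide's own formulas did not survive in this transcript, so the scaling (the factor 1/2), the symbol λ for the regularization weight, and the dual variable θ are assumptions rather than the lecture's exact notation.

```latex
\begin{align*}
\text{(P)}\quad & \min_{x \in \mathbb{R}^n} \; \tfrac12 \|Ax - b\|_2^2 + \lambda \|x\|_1 \\
\text{(D)}\quad & \max_{\theta \in \mathbb{R}^m} \; -\tfrac12 \|\theta\|_2^2 - b^T \theta
  \quad \text{s.t.} \quad \|A^T \theta\|_\infty \le \lambda
\end{align*}
```

Under this form, an optimal pair satisfies θ = Ax - b, so the dual constraint bounds every component of A^T(Ax - b) by λ in absolute value.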

Page 4:

Optimality Conditions

Page 5:

An active set approach

Page 6:

Optimality Conditions

Page 7:

Active set approach

Page 8:

Active set approach

Page 9:

Checking optimality, choosing next nonzero element
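
A minimal sketch of this step, assuming the Lasso objective 0.5*||Ax - b||^2 + lam*||x||_1 (the slide's exact formulation is not preserved). The function name `check_optimality` and the tolerance `tol` are hypothetical. It measures how far each coordinate is from satisfying the optimality conditions and returns the worst offender as a candidate for the next nonzero element.

```python
import numpy as np

def check_optimality(A, b, x, lam, tol=1e-8):
    """Return (is_optimal, index_of_worst_violation) for the assumed Lasso objective."""
    g = A.T @ (A @ x - b)              # gradient of the smooth least-squares term
    viol = np.zeros_like(x, dtype=float)
    nz = x != 0
    # Nonzero coordinates must satisfy g_i = -lam * sign(x_i).
    viol[nz] = np.abs(g[nz] + lam * np.sign(x[nz]))
    # Zero coordinates must satisfy |g_i| <= lam.
    viol[~nz] = np.maximum(np.abs(g[~nz]) - lam, 0.0)
    j = int(np.argmax(viol))
    return bool(viol[j] <= tol), j
```

When the first return value is false, the returned index is a natural candidate to add to the active set.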

Page 10:

Least angle regression

[Figure with labels A^T, A, x, b]

Page 11:

Least angle regression

[Figure with labels A^T, A, x, b]

Page 12:

Least angle regression

A^T (A_{(p,n)} x_{(p,n)} - b)

Page 13:

Least angle regression

A^T (A_{(p,n)} x_{(p,n)} - b), where r = A_{(p,n)} x_{(p,n)} - b is the residual

If all angles are “big” (as defined by λ) or r is small, then we are done!

Page 14:

Page 15:

Computing regularization path

Page 16:

Per iteration cost

Update factorization of A_{(p,n)}^T A_{(p,n)} at each step: O(mk)

Memory: O(k^2)

Compute A^T (A_{(p,n)} x_{(p,n)}): O(nm) (improved by “sifting”)

Can be too costly to compute and to store.

Page 17:

Coordinate descent

Page 18:

Coordinate descent

Soft-thresholding operator
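
A minimal sketch of the soft-thresholding operator and the single-coordinate update it yields, assuming the Lasso objective 0.5*||Ax - b||^2 + lam*||x||_1 and nonzero columns of A. The function names are hypothetical and the slide's own formula is not preserved in this transcript.

```python
import numpy as np

def soft_threshold(z, t):
    """Shrink z toward zero by t; entries with |z| <= t become zero."""
    return np.sign(z) * np.maximum(np.abs(z) - t, 0.0)

def coordinate_update(A, b, x, i, lam):
    """Minimize the assumed Lasso objective over x[i] with all other coordinates fixed."""
    a_i = A[:, i]
    r_i = b - A @ x + a_i * x[i]       # residual with coordinate i's contribution removed
    return soft_threshold(a_i @ r_i, lam) / (a_i @ a_i)
```

For example, with A the identity, b = (3, 0.5) and lam = 1, updating coordinate 0 from x = 0 shrinks 3 by 1 and returns 2.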

Page 19:

Page 20:

Given the scaled gradient

Can choose coordinate to update by:

• Simply cycle through all coordinates

• Update all at once

• Choose the one with largest gradient component

• Choose the one with largest objective function improvement

• Choose coordinate at random
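
Three of the selection rules above can be sketched as follows, again assuming the Lasso objective 0.5*||Ax - b||^2 + lam*||x||_1 with nonzero columns of A. The function name `lasso_cd` and the rule names are hypothetical. The "greedy" rule picks the coordinate with the largest violation of the optimality conditions, which stands in for the slide's "largest gradient component". The "update all at once" and "largest improvement" rules are not covered.

```python
import numpy as np

def lasso_cd(A, b, lam, rule="cyclic", n_iter=200, seed=0):
    """Coordinate descent sketch for the assumed Lasso objective."""
    n = A.shape[1]
    rng = np.random.default_rng(seed)
    col_sq = (A ** 2).sum(axis=0)      # squared column norms, assumed nonzero
    x = np.zeros(n)
    r = b - A @ x                      # running residual b - Ax
    for k in range(n_iter):
        if rule == "cyclic":
            i = k % n
        elif rule == "random":
            i = int(rng.integers(n))
        elif rule == "greedy":
            g = -(A.T @ r)             # gradient of the smooth term
            viol = np.where(x != 0,
                            np.abs(g + lam * np.sign(x)),
                            np.maximum(np.abs(g) - lam, 0.0))
            i = int(np.argmax(viol))
        else:
            raise ValueError(f"unknown rule: {rule}")
        z = A[:, i] @ r + col_sq[i] * x[i]
        x_new = np.sign(z) * max(abs(z) - lam, 0.0) / col_sq[i]
        r -= A[:, i] * (x_new - x[i])  # keep the residual in sync with x
        x[i] = x_new
    return x
```

Maintaining the residual incrementally keeps each update at O(m) work, apart from the greedy rule, which recomputes the full gradient at O(nm) per step.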

Page 21:

Page 22:

Page 23:

First order methods

Page 24:

•  Consider:

•  Linear lower approximation

•  Quadratic upper approximation

First-order proximal gradient methods
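
The two approximations named on this slide are commonly written as below. This is a sketch under the assumptions that f is convex, that its gradient is Lipschitz continuous with constant L, and that μ denotes the step parameter used on the following slides. The slide's own formulas are not preserved.

```latex
\begin{align*}
f(y) &\ge f(x) + \nabla f(x)^T (y - x)
  && \text{(linear lower approximation)} \\
f(y) &\le f(x) + \nabla f(x)^T (y - x) + \frac{1}{2\mu} \|y - x\|_2^2
  && \text{(quadratic upper approximation, valid for } \mu \le 1/L \text{)}
\end{align*}
```

Minimizing the quadratic upper approximation over y gives the gradient step y = x - μ∇f(x).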

Page 25:

•  Minimize quadratic upper approximation on each iteration

•  If μ ≤ 1/L then

First-order proximal gradient method

Page 26:

•  Minimize upper approximation at an intermediate point.

•  If μ ≤ 1/L then

Accelerated first-order method (Nesterov ’83, ’00s; Beck & Teboulle ’09)

Page 27:

•  Minimize upper approximation at an intermediate point.

•  If μ ≤ 1/L then in O(1/√ε) iterations finds an ε-optimal solution

Complexity of accelerated first-order method (Nesterov ’83, ’00s; Beck & Teboulle ’09)

This method is optimal if only gradient information is used.

Page 28:

•  Consider:

•  Quadratic upper approximation

Prox method with nonsmooth term

Assume that g(y) is such that the above function is easy to optimize over y

Page 29:

•  Minimize quadratic upper relaxation on each iteration

•  If μ ≤ 1/L then in O(1/ε) iterations finds solution

Beck & Teboulle, Tseng, Auslender & Teboulle, 2008

First-order method for nonsmooth functions

Page 30:

•  Minimize an upper approximation at an intermediate point.

•  If μ ≤ 1/L then in O(1/√ε) iterations finds an ε-optimal solution

Fast first-order methods

Beck & Teboulle, Tseng, 2008
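
A minimal sketch of an accelerated proximal gradient iteration with the momentum sequence used in FISTA (Beck & Teboulle). The function name `fista` and the callables `grad_f` and `prox_g` are hypothetical interfaces chosen for illustration, and the step size `mu` is assumed to satisfy mu <= 1/L.

```python
import numpy as np

def fista(grad_f, prox_g, x0, mu, n_iter=100):
    """Accelerated proximal gradient sketch.

    grad_f(y)    -> gradient of the smooth term at y
    prox_g(v, t) -> argmin_u g(u) + ||u - v||^2 / (2 t)
    mu           -> step size, assumed to satisfy mu <= 1/L
    """
    x = np.asarray(x0, dtype=float).copy()
    y = x.copy()                       # intermediate (extrapolated) point
    t = 1.0
    for _ in range(n_iter):
        # Minimize the upper approximation built at the intermediate point y.
        x_next = prox_g(y - mu * grad_f(y), mu)
        t_next = 0.5 * (1.0 + np.sqrt(1.0 + 4.0 * t * t))
        y = x_next + ((t - 1.0) / t_next) * (x_next - x)
        x, t = x_next, t_next
    return x
```

Compared with the plain proximal gradient step, the only extra work per iteration is the extrapolation that forms the intermediate point.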

Page 31:

g(y) in sparse regression

Page 32:

•  Minimize upper approximation function Q_f(x,y) on each iteration

Example 1 (Lasso)

Closed form solution!

O(n) effort

Page 33:

Gradient method for Lasso

2 matrix/vector multiplications + shrinkage operator per iteration

O(1/ε) iteration bound
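
A minimal sketch of this gradient method, assuming the Lasso objective 0.5*||Ax - b||^2 + lam*||x||_1 and the step size 1/L with L = ||A||_2^2. The function name `ista_lasso` is hypothetical. Each iteration performs the two matrix/vector products (A times x, then A^T times the residual) followed by the shrinkage operator.

```python
import numpy as np

def ista_lasso(A, b, lam, n_iter=500):
    """Proximal gradient (ISTA-style) sketch for the assumed Lasso objective."""
    L = np.linalg.norm(A, 2) ** 2      # Lipschitz constant of the gradient
    mu = 1.0 / L                       # step size satisfying mu <= 1/L
    x = np.zeros(A.shape[1])
    for _ in range(n_iter):
        grad = A.T @ (A @ x - b)       # the two matrix/vector multiplications
        v = x - mu * grad              # gradient step
        x = np.sign(v) * np.maximum(np.abs(v) - mu * lam, 0.0)  # shrinkage
    return x
```

Computing the spectral norm is itself costly for large A. In practice a cheaper upper bound on L or a backtracking line search could be substituted.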

Page 34:

Sparse logistic regression

A gradient computation + shrinkage operator per iteration

O(1/ε) iteration bound
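
A minimal sketch of one way to realize this, assuming labels y_i in {-1, +1} and the objective sum_i log(1 + exp(-y_i a_i^T x)) + lam*||x||_1. The slide's formulation is not preserved, so both the objective and the function name `sparse_logreg` are assumptions. The step size uses the bound L <= ||A||_2^2 / 4 on the Lipschitz constant of the logistic loss gradient.

```python
import numpy as np

def sparse_logreg(A, y, lam, n_iter=500):
    """Proximal gradient sketch for l1-regularized logistic regression."""
    L = 0.25 * np.linalg.norm(A, 2) ** 2   # upper bound on the Lipschitz constant
    mu = 1.0 / L
    x = np.zeros(A.shape[1])
    for _ in range(n_iter):
        margins = y * (A @ x)
        # Gradient of sum_i log(1 + exp(-margin_i)) with respect to x.
        grad = -A.T @ (y / (1.0 + np.exp(margins)))
        v = x - mu * grad
        x = np.sign(v) * np.maximum(np.abs(v) - mu * lam, 0.0)  # shrinkage
    return x
```

The only change from the least-squares case is the gradient computation; the shrinkage step is identical.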

Page 35:

•  Very similar to the previous case, but with ||·|| instead of |·|

Example 2 (Group Lasso)

Closed form solution!

O(n) effort
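
The closed-form step can be sketched as a group-wise shrinkage, assuming non-overlapping groups and the penalty t * sum_g ||x_g||_2. The function name `group_soft_threshold` and the `groups` argument (a list of index lists) are hypothetical.

```python
import numpy as np

def group_soft_threshold(v, groups, t):
    """Shrink the Euclidean norm of each group of v by t (zero if norm <= t)."""
    v = np.asarray(v, dtype=float)
    x = np.zeros_like(v)
    for idx in groups:                 # idx: indices belonging to one group
        norm = np.linalg.norm(v[idx])
        if norm > t:
            x[idx] = (1.0 - t / norm) * v[idx]
    return x
```

Each entry is touched once, which matches the O(n) effort stated on the slide. With singleton groups this reduces to the scalar soft-thresholding used for the Lasso.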

Page 36:

SIMPLIFIED ACTIVE SET (EXTRA SLIDES)

Page 37:

Optimality Conditions

Page 38:

Active set approach

Page 39:

Least angle regression

[Figure with labels A^T, A, x, b]

Page 40:

Least angle regression

[Figure with labels A^T, A, x, b]

Page 41:

Least angle regression

A^T (A_B x_B - b)

Page 42:

Least angle regression

A^T (A_B x_B - b), where r = A_B x_B - b is the residual

Page 43:

Per iteration cost

Update factorization of A_B^T A_B: O(mk) if A_B ∈ R^{m×k}

Compute A^T (A_B x_B): O(nm) (can be improved in practice)

Can be too costly to compute and to store. We’ll now see how to avoid any matrix factorizations.

