On Nesterov’s Nonsmooth Chebyshev-Rosenbrock Functions
Michael L. Overton
Courant Institute of Mathematical Sciences
New York University
Les Houches, 8 February 2016
Yurii Nesterov
Outline: Yurii Nesterov · Introduction · Some Nonsmooth Analysis · Nesterov’s Chebyshev-Rosenbrock Functions · Other Examples of Behavior of BFGS on Nonsmooth Functions
■ It seems we first met in 1988 at the Tokyo ISMP. We don’t have a proof of this, but we do have a proof that we were both at the meeting: we both used the beautiful gray bag with the Samurai warrior design for many years, bringing it to other conferences long after everyone else abandoned theirs!
■ We definitely met in 1994 at the Ann Arbor ISMP, where I learned about the Nesterov-Todd primal-dual interior-point algorithm for SDP.
■ We met again on many subsequent occasions, most notably during very enjoyable extended visits to Louvain-la-Neuve in 2004 and 2008.
■ Always a great pleasure to interact with this brilliant but modest colleague!
Introduction
Section outline: Nonsmooth, Nonconvex Optimization · Example · Methods Suitable for Nonsmooth Functions · Failure of Steepest Descent: Simpler Example · The BFGS Method (“Full” Version) · BFGS for Nonsmooth Optimization · With BFGS
Nonsmooth, Nonconvex Optimization
Problem: find $x$ that locally minimizes $f$, where $f : \mathbb{R}^n \to \mathbb{R}$ is
■ Continuous
■ Not differentiable everywhere, in particular often not differentiable at local minimizers
■ Not convex
■ Usually, but not always, locally Lipschitz
Lots of interesting applications
Any locally Lipschitz function is differentiable almost everywhere on its domain. So, with high probability (whp), we can evaluate the gradient at any given point.
What happens if we simply use steepest descent (gradient descent) with a standard line search?
Example
$f(x) = 10\,|x_2 - x_1^2| + (1 - x_1)^2$
[Figure: steepest descent iterates; both axes range from −2 to 2]
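The experiment behind this figure can be sketched roughly as follows. This is a minimal illustration, not the code used for the talk: the starting point, the Armijo parameter `c`, the step-halving rule, and the iteration count are all assumptions, and `grad` simply picks one side of the kink if an iterate lands exactly on the curve $x_2 = x_1^2$.

```python
def f(x):
    # Nonsmooth Rosenbrock-like function from the slide; minimizer at (1, 1).
    return 10.0 * abs(x[1] - x[0] ** 2) + (1.0 - x[0]) ** 2

def grad(x):
    # Gradient wherever x2 != x1^2 (almost everywhere).
    # On the kink itself this arbitrarily uses the "below the curve" side.
    s = 1.0 if x[1] > x[0] ** 2 else -1.0
    return [-20.0 * s * x[0] - 2.0 * (1.0 - x[0]), 10.0 * s]

def steepest_descent(x, iters=200, c=1e-4):
    # Gradient descent with a backtracking (step-halving) Armijo line search.
    # c and the halving rule are assumed values for illustration.
    values = [f(x)]
    for _ in range(iters):
        g = grad(x)
        gg = g[0] ** 2 + g[1] ** 2
        t = 1.0
        while t > 1e-16 and f([x[0] - t * g[0], x[1] - t * g[1]]) > f(x) - c * t * gg:
            t *= 0.5
        if t <= 1e-16:
            break  # line search failed to find an acceptable step
        x = [x[0] - t * g[0], x[1] - t * g[1]]
        values.append(f(x))
    return x, values

# Hypothetical starting point, chosen only for illustration.
x_final, values = steepest_descent([-0.5, 1.0])
print(x_final, values[-1])
```

Printing `x_final` next to the known minimizer $(1, 1)$ is one way to see whether the iterates have stalled along the curve $x_2 = x_1^2$, which is the behavior the slide illustrates.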
Methods Suitable for Nonsmooth Functions
In fact, it’s been known for several decades that at any given iterate, one should exploit the gradient information obtained at several points, not just at one point. Some such methods:
■ Bundle methods (C. Lemaréchal, K.C. Kiwiel, etc.): extensive practical use and theoretical analysis, but complicated in nonconvex case
■ Gradient sampling: an easily stated method with nice convergence theory (J.V. Burke, A.S. Lewis, M.L.O., 2005; K.C. Kiwiel, 2007), but computationally intensive
■ BFGS: traditional workhorse for smooth optimization, works amazingly well for nonsmooth optimization too, but very limited convergence theory
A completely different approach using randomized gradient-free methods: the first complexity result for nonsmooth, nonconvex optimization (Y. Nesterov and V. Spokoiny, JFoCM, 2015).
Failure of Steepest Descent: Simpler Example
Let $f(x) = 6|x_1| + 3x_2$. Note that $f$ is polyhedral and convex.
On this function, using a bisection-based backtracking line search with “Armijo” parameter in $[0, \tfrac{1}{3}]$ and starting at $\begin{bmatrix} 2 \\ 3 \end{bmatrix}$, steepest descent generates the sequence
$$2^{-k} \begin{bmatrix} 2(-1)^k \\ 3 \end{bmatrix}, \quad k = 1, 2, \ldots,$$
converging to $\begin{bmatrix} 0 \\ 0 \end{bmatrix}$.
In contrast, BFGS with the same line search rapidly reduces the function value towards $-\infty$ (arbitrarily far, in exact arithmetic) (A.S. Lewis and S. Zhang, 2010).
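The steepest-descent sequence above can be reproduced with a few lines of Python. This is a sketch under assumptions: the Armijo parameter is set to `c = 0.25` (one value inside the stated interval), the initial trial step is 1, and "bisection-based backtracking" is taken to mean repeated halving. All quantities involved are small multiples of powers of two, so the floating-point iterates match the formula exactly.

```python
def f(x):
    # Polyhedral convex function from the slide; unbounded below as x2 -> -inf.
    return 6.0 * abs(x[0]) + 3.0 * x[1]

def grad(x):
    # Gradient wherever x1 != 0 (the iterates below never hit x1 == 0).
    return [6.0 if x[0] > 0 else -6.0, 3.0]

def steepest_descent(x, steps, c=0.25):
    # Backtracking by halving from t = 1 until the Armijo condition
    # f(x - t g) <= f(x) - c t ||g||^2 holds. c = 0.25 is an assumed value.
    iterates = [list(x)]
    for _ in range(steps):
        g = grad(x)
        gg = g[0] * g[0] + g[1] * g[1]
        t = 1.0
        while f([x[0] - t * g[0], x[1] - t * g[1]]) > f(x) - c * t * gg:
            t *= 0.5
        x = [x[0] - t * g[0], x[1] - t * g[1]]
        iterates.append(x)
    return iterates

for k, xk in enumerate(steepest_descent([2.0, 3.0], 5)):
    print(k, xk, f(xk))
```

The function values shrink toward 0 while the iterates zigzag across the line $x_1 = 0$, even though $f$ takes arbitrarily negative values as $x_2 \to -\infty$, so the limit point is not a minimizer.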
The BFGS Method (“Full” Version)
Broyden, Fletcher, Goldfarb, Shanno independently, 1970
Choose line search parameters 0 < β < γ < 1
Initialize iterate x and positive-definite symmetric matrix H
(which is supposed to approximate the inverse Hessian of f)
Repeat
■ Set d = −H∇f(x). Let α = ∇f(x)T d < 0■ Armijo-Wolfe line search: find t so that f(x+ td) < f(x) + βtα
and ∇f(x+ td)T d > γα
■ Set s = td, y = ∇f(x+ td)−∇f(x)■ Replace x by x+ td
■ Replace H by V HV T + 1
sT yssT , where V = I − 1
sT ysyT
Note that H can be computed in O(n2) operations since V is arank one perturbation of the identityThe Armijo condition ensures “sufficient decrease” in f
![Page 36: On Nesterov’s Nonsmooth Chebyshev-Rosenbrock Functionsaspremon/Houches/talks/lesHouches... · 2016. 3. 6. · Nonsmooth Optimization With BFGS Some Nonsmooth Analysis Nesterov’s](https://reader035.vdocuments.net/reader035/viewer/2022071505/6125d87d77b3002268280c74/html5/thumbnails/36.jpg)
The BFGS Method (“Full” Version)
Broyden, Fletcher, Goldfarb, Shanno independently, 1970
Choose line search parameters $0 < \beta < \gamma < 1$
Initialize iterate $x$ and positive-definite symmetric matrix $H$ (which is supposed to approximate the inverse Hessian of $f$)
Repeat
■ Set $d = -H \nabla f(x)$. Let $\alpha = \nabla f(x)^T d < 0$
■ Armijo-Wolfe line search: find $t$ so that $f(x + td) < f(x) + \beta t \alpha$ and $\nabla f(x + td)^T d > \gamma \alpha$
■ Set $s = td$, $y = \nabla f(x + td) - \nabla f(x)$
■ Replace $x$ by $x + td$
■ Replace $H$ by $V H V^T + \frac{1}{s^T y} s s^T$, where $V = I - \frac{1}{s^T y} s y^T$
Note that $H$ can be computed in $O(n^2)$ operations since $V$ is a rank one perturbation of the identity.
The Armijo condition ensures “sufficient decrease” in $f$.
The Wolfe condition ensures that the directional derivative along the line increases algebraically, which guarantees that $s^T y > 0$ and that the new $H$ is positive definite.
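The iteration above can be sketched in a few lines of NumPy. This is only an illustrative sketch under stated assumptions, not the speaker's implementation: the function names (`armijo_wolfe`, `bfgs_update`, `bfgs`), the bracketing strategy (halve or double the step), the defaults $\beta = 10^{-4}$ and $\gamma = 0.9$, the gradient-norm stopping test, and the quadratic test problem are all choices made here for illustration.

```python
import numpy as np

def armijo_wolfe(f, grad, x, d, beta=1e-4, gamma=0.9, max_iter=60):
    """Search for t with f(x+td) < f(x) + beta*t*alpha and grad(x+td)^T d > gamma*alpha."""
    f0, alpha = f(x), grad(x) @ d      # alpha < 0 for a descent direction
    lo, hi, t = 0.0, np.inf, 1.0
    for _ in range(max_iter):
        if f(x + t * d) >= f0 + beta * t * alpha:
            hi = t                     # Armijo fails: step too long
        elif grad(x + t * d) @ d <= gamma * alpha:
            lo = t                     # Wolfe fails: step too short
        else:
            return t                   # both conditions hold
        t = 0.5 * (lo + hi) if np.isfinite(hi) else 2.0 * lo
    return t                           # fallback if no acceptable step was found

def bfgs_update(H, s, y):
    """Expanded O(n^2) form of V H V^T + rho s s^T, V = I - rho s y^T, rho = 1/(s^T y).

    Assumes H is symmetric, so that y^T H = (H y)^T.
    """
    rho = 1.0 / (s @ y)
    Hy = H @ y
    return (H - rho * (np.outer(Hy, s) + np.outer(s, Hy))
            + (rho**2 * (y @ Hy) + rho) * np.outer(s, s))

def bfgs(f, grad, x0, tol=1e-6, max_iter=200):
    x = np.asarray(x0, dtype=float)
    H = np.eye(x.size)                 # initial inverse Hessian approximation
    for _ in range(max_iter):
        g = grad(x)
        if np.linalg.norm(g) <= tol:   # assumed stopping test (smooth case only)
            break
        d = -H @ g
        t = armijo_wolfe(f, grad, x, d)
        s = t * d
        y = grad(x + s) - g
        x = x + s
        if s @ y > 0:                  # holds whenever the Wolfe condition was met
            H = bfgs_update(H, s, y)
    return x, H

# Usage on a smooth convex quadratic (hypothetical test problem).
A = np.diag([1.0, 10.0])
c = np.array([1.0, -2.0])
f = lambda x: 0.5 * (x - c) @ A @ (x - c)
grad = lambda x: A @ (x - c)
x_min, H = bfgs(f, grad, np.zeros(2))
print(x_min)
```

The update is written in expanded form so that only matrix-vector and outer products appear, which is where the $O(n^2)$ cost comes from; forming $V H V^T$ with dense matrix products would cost $O(n^3)$.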
BFGS for Nonsmooth Optimization
In 1982, C. Lemarechal observed that quasi-Newton methods can be effective for nonsmooth optimization, but dismissed them as there was no theory behind them and no good way to terminate them.
Otherwise, there is not much in the literature on the subject until A.S. Lewis and M.L.O. (Math. Prog., 2013): we address both issues in detail, but our convergence results are limited to very special cases.
Key point: use the original Armijo-Wolfe line search. Do not insist on reducing the magnitude of the directional derivative along the line!
In the nonsmooth case, BFGS builds a very ill-conditioned inverse “Hessian” approximation, with some tiny eigenvalues converging to zero, corresponding to “infinitely large” curvature in the directions defined by the associated eigenvectors.
Remarkably, the condition number of the inverse Hessian approximation typically reaches $10^{16}$ before the method breaks down.
We have never seen convergence to non-stationary points that cannot be explained by numerical difficulties.
Convergence rate of BFGS is typically linear (not superlinear) in the nonsmooth case.
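The key point about the line search can be checked on a one-variable example. The sketch below is an illustration added here, not material from the talk: it uses $f(x) = |x|$, starts at $x = 1$ with direction $d = -1$, and assumes $\beta = 10^{-4}$, $\gamma = 0.9$. Along the line the derivative is $\pm 1$ wherever it exists, so a "strong" Wolfe test that asks for a smaller derivative magnitude can never pass, while the Armijo condition and the one-sided Wolfe condition used above hold together on an interval of steps.

```python
import numpy as np

beta, gamma = 1e-4, 0.9          # assumed values with 0 < beta < gamma < 1
f = abs                          # f(x) = |x|
df = np.sign                     # derivative of |x| wherever x != 0

x, d = 1.0, -1.0
alpha = df(x) * d                # directional derivative at t = 0, equals -1

def armijo(t):
    return f(x + t * d) < f(x) + beta * t * alpha

def weak_wolfe(t):
    # one-sided condition: the derivative along the line has increased enough
    return df(x + t * d) * d > gamma * alpha

def strong_wolfe(t):
    # two-sided condition: the derivative magnitude has been reduced
    return abs(df(x + t * d) * d) <= gamma * abs(alpha)

# Sample steps, skipping the kink at t = 1 where the derivative is undefined.
steps = [t for t in np.linspace(0.05, 3.0, 60) if abs(t - 1.0) > 1e-9]
ok_weak = [t for t in steps if armijo(t) and weak_wolfe(t)]
ok_strong = [t for t in steps if armijo(t) and strong_wolfe(t)]
print("Armijo + weak Wolfe:", len(ok_weak), "acceptable steps")
print("Armijo + strong Wolfe:", len(ok_strong), "acceptable steps")
```

Every step accepted by the weak test lies just beyond the kink, between $t = 1$ and $t = 2/(1+\beta)$, which is how the line search ends up sampling gradients on both sides of the nonsmoothness.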
With BFGS
[Figure: steepest descent, gradient sampling and BFGS iterates for $f(x) = 10\,|x_2 - x_1^2| + (1 - x_1)^2$ on $[-2, 2] \times [-2, 2]$. Legend: contours of $f$, starting point, optimal point, steepest descent, gradient sampling (1st phase), gradient sampling (2nd phase), BFGS.]
Some Nonsmooth Analysis
■ The Clarke Subdifferential
■ Note that $0 \in \partial_C f(x)$ at $x = [1; 1]^T$
■ Regularity
■ Partly Smooth Functions
■ Illustration of U and V-spaces on Same Example
The Clarke Subdifferential
Assume $f : \mathbb{R}^n \to \mathbb{R}$ is locally Lipschitz, and let $D = \{x \in \mathbb{R}^n : f \text{ is differentiable at } x\}$.
Rademacher’s Theorem: $\mathbb{R}^n \setminus D$ has measure zero.
The Clarke subdifferential of $f$ at $\bar{x}$ is
$$\partial_C f(\bar{x}) = \operatorname{conv} \left\{ \lim_{x \to \bar{x},\; x \in D} \nabla f(x) \right\}.$$
F.H. Clarke, 1973 (he used the name “generalized gradient”).
If $f$ is continuously differentiable at $x$, then $\partial_C f(x) = \{\nabla f(x)\}$.
If $f$ is convex, $\partial_C f$ is the subdifferential of convex analysis.
We say $x$ is Clarke stationary for $f$ if $0 \in \partial_C f(x)$.
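As a minimal worked example (the function $f(x) = |x|$ on $\mathbb{R}$ is chosen here for illustration and is not taken from the talk), the definition gives:

$$
D = \mathbb{R} \setminus \{0\}, \qquad \nabla f(x) = \operatorname{sign}(x) \ \text{ for } x \in D,
$$

$$
\partial_C f(0) = \operatorname{conv}\{-1, +1\} = [-1, 1], \qquad \partial_C f(x) = \{\operatorname{sign}(x)\} \ \text{ for } x \neq 0 .
$$

Since $0 \in [-1, 1]$, the point $x = 0$ is Clarke stationary, which agrees with it being the minimizer of $|x|$.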
Note that $0 \in \partial_C f(x)$ at $x = [1; 1]^T$
[Figure: steepest descent, gradient sampling and BFGS iterates for $f(x) = 10\,|x_2 - x_1^2| + (1 - x_1)^2$ on $[-2, 2] \times [-2, 2]$. Legend: contours of $f$, starting point, optimal point, steepest descent, gradient sampling (1st phase), gradient sampling (2nd phase), BFGS.]
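The stationarity claim in the slide title can be checked numerically. The sketch below is an illustration under assumptions made here: the helper name `grad_f` and the offset `eps` are hypothetical, and the gradient formula is derived by hand for points with $x_2 \neq x_1^2$. It evaluates the gradient just above and just below the parabola $x_2 = x_1^2$ at $[1; 1]^T$ and verifies that an equal-weight convex combination of the two limiting gradients is zero.

```python
import numpy as np

def grad_f(x):
    """Gradient of f(x) = 10*|x2 - x1^2| + (1 - x1)^2 at points where x2 != x1^2."""
    x1, x2 = x
    sigma = np.sign(x2 - x1**2)
    return (10.0 * sigma * np.array([-2.0 * x1, 1.0])
            + np.array([-2.0 * (1.0 - x1), 0.0]))

eps = 1e-8                                      # small offset off the parabola
g_above = grad_f(np.array([1.0, 1.0 + eps]))    # side with x2 > x1^2
g_below = grad_f(np.array([1.0, 1.0 - eps]))    # side with x2 < x1^2
combo = 0.5 * g_above + 0.5 * g_below           # a point in the convex hull

print("gradient above:", g_above)
print("gradient below:", g_below)
print("convex combination:", combo)
```

The two limiting gradients are nonzero and opposite, so neither side alone signals stationarity; only their convex hull contains zero.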
Regularity
A locally Lipschitz, directionally differentiable function $f$ is (Clarke) regular near a point $x$ when its directional derivative $x \mapsto f'(x; d)$ is upper semicontinuous near $x$ for every fixed direction $d$.
In this case $0 \in \partial_C f(x)$ is equivalent to the first-order optimality condition $f'(x; d) \ge 0$ for all directions $d$.
■ All convex functions are regular
■ All smooth functions are regular
■ Nonsmooth concave functions are not regular
Example: $f(x) = -|x|$
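The example can be worked out directly from the definitions (a short sketch added here; the computation is standard but is not spelled out on the slide). For $f(x) = -|x|$ at $x = 0$:

$$
f'(0; d) = \lim_{t \downarrow 0} \frac{-|t d| - 0}{t} = -|d| < 0 \quad \text{for all } d \neq 0,
$$

$$
\partial_C f(0) = \operatorname{conv}\{+1, -1\} = [-1, 1] \ni 0 .
$$

So $x = 0$ is Clarke stationary even though every nonzero direction is a descent direction. The equivalence between $0 \in \partial_C f(x)$ and $f'(x; d) \ge 0$ fails here, consistent with $f$ not being regular at $0$.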
Partly Smooth Functions
A regular function $f$ is partly smooth at $x$ relative to a manifold $M$ containing $x$ (A.S. Lewis 2003) if
■ its restriction to $M$ is twice continuously differentiable near $x$
■ the Clarke subdifferential $\partial_C f$ is continuous on $M$ near $x$
■ par $\partial_C f(x)$, the subspace parallel to the affine hull of the subdifferential of $f$ at $x$, is exactly the subspace normal to $M$ at $x$.
We refer to par $\partial_C f(x)$ as the V-space for $f$ at $x$ (with respect to $M$), and to its orthogonal complement, the subspace tangent to $M$ at $x$, as the U-space for $f$ at $x$.
For nonzero $y$ in the V-space, the mapping $t \mapsto f(x + ty)$ is necessarily nonsmooth at $t = 0$, while for nonzero $y$ in the U-space, $t \mapsto f(x + ty)$ is differentiable at $t = 0$ as long as $f$ is locally Lipschitz.
Illustration of U and V-spaces on Same Example
[Figure: steepest descent, gradient sampling and BFGS iterates for $f(x) = 10\,|x_2 - x_1^2| + (1 - x_1)^2$ on $[-2, 2] \times [-2, 2]$. Legend: contours of $f$, starting point, optimal point, steepest descent, gradient sampling (1st phase), gradient sampling (2nd phase), BFGS.]
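For this example the U and V directions at the minimizer can be probed numerically. The sketch below rests on assumptions made here rather than on the slide: the manifold is taken to be the parabola $M = \{x : x_2 = x_1^2\}$, so at $x = [1; 1]^T$ the normal direction is $v = [-2; 1]^T$ and the tangent direction is $u = [1; 2]^T$; the helper name `one_sided_slopes` and the step `t` are hypothetical. One-sided difference quotients show a kink along $v$ and matching slopes along $u$.

```python
import numpy as np

def f(x):
    return 10.0 * abs(x[1] - x[0]**2) + (1.0 - x[0])**2

x = np.array([1.0, 1.0])
v = np.array([-2.0, 1.0])    # assumed V-space direction: normal to x2 = x1^2 at x
u = np.array([1.0, 2.0])     # assumed U-space direction: tangent to x2 = x1^2 at x

def one_sided_slopes(direction, t=1e-6):
    """Backward and forward difference quotients of s -> f(x + s*direction) at s = 0."""
    right = (f(x + t * direction) - f(x)) / t
    left = (f(x) - f(x - t * direction)) / t
    return left, right

left_v, right_v = one_sided_slopes(v)
left_u, right_u = one_sided_slopes(u)
print("V direction: left slope %.3f, right slope %.3f" % (left_v, right_v))
print("U direction: left slope %.3e, right slope %.3e" % (left_u, right_u))
```

Along $v$ the two slopes are close to $-50$ and $+50$, while along $u$ both are close to zero, matching the description of nonsmooth behavior in the V-space and smooth behavior in the U-space.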
![Page 66: On Nesterov’s Nonsmooth Chebyshev-Rosenbrock Functionsaspremon/Houches/talks/lesHouches... · 2016. 3. 6. · Nonsmooth Optimization With BFGS Some Nonsmooth Analysis Nesterov’s](https://reader035.vdocuments.net/reader035/viewer/2022071505/6125d87d77b3002268280c74/html5/thumbnails/66.jpg)
Nesterov’s Chebyshev-Rosenbrock Functions
![Page 73: On Nesterov’s Nonsmooth Chebyshev-Rosenbrock Functionsaspremon/Houches/talks/lesHouches... · 2016. 3. 6. · Nonsmooth Optimization With BFGS Some Nonsmooth Analysis Nesterov’s](https://reader035.vdocuments.net/reader035/viewer/2022071505/6125d87d77b3002268280c74/html5/thumbnails/73.jpg)
Nesterov’s First Chebyshev-Rosenbrock Function
Nesterov (2008, private comm.): consider the function
$$N_p(x) = \frac{1}{4}(x_1 - 1)^2 + \sum_{i=1}^{n-1} |x_{i+1} - 2x_i^2 + 1|^p, \quad \text{where } p \in [1, 2].$$
The unique minimizer is $x^* = [1, 1, \ldots, 1]^T$ with $N_p(x^*) = 0$.
Define $\hat{x} = [-1, 1, 1, \ldots, 1]^T$ with $N_p(\hat{x}) = 1$ and the manifold
$$M_N = \{x : x_{i+1} = 2x_i^2 - 1, \; i = 1, \ldots, n-1\}.$$
For $x \in M_N$, e.g. $x = x^*$ or $x = \hat{x}$, the 2nd term of $N_p$ is zero. Starting at $\hat{x}$, BFGS needs to approximately follow $M_N$ to reach $x^*$ (unless it “gets lucky”).
When $p = 2$: $N_2$ is smooth but not convex. Starting at $\hat{x}$:
■ $n = 5$: BFGS needs 370 iterations to reduce $N_2$ below $10^{-15}$
■ $n = 10$: BFGS needs about 50,000 iterations to reduce $N_2$ below $10^{-15}$
even though $N_2$ is smooth!
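To make the definitions concrete, here is a minimal sketch of $N_p$ and of the manifold condition $x_{i+1} = 2x_i^2 - 1$ in plain Python. It only evaluates the function at the minimizer $[1, \ldots, 1]^T$ and at the starting point $[-1, 1, \ldots, 1]^T$; it does not reproduce the BFGS experiments. The names `nesterov_cr`, `on_manifold` and `manifold_point` are hypothetical helpers introduced for illustration.

```python
def nesterov_cr(x, p=2.0):
    """N_p(x) = (1/4)(x_1 - 1)^2 + sum_{i=1}^{n-1} |x_{i+1} - 2 x_i^2 + 1|^p."""
    first = 0.25 * (x[0] - 1.0) ** 2
    second = sum(
        abs(x[i + 1] - 2.0 * x[i] ** 2 + 1.0) ** p for i in range(len(x) - 1)
    )
    return first + second

def on_manifold(x, tol=1e-12):
    """True if x satisfies x_{i+1} = 2 x_i^2 - 1 for all i (up to tol)."""
    return all(
        abs(x[i + 1] - (2.0 * x[i] ** 2 - 1.0)) <= tol for i in range(len(x) - 1)
    )

def manifold_point(x1, n):
    """The point of M_N in R^n whose first coordinate is x1."""
    x = [x1]
    for _ in range(n - 1):
        x.append(2.0 * x[-1] ** 2 - 1.0)
    return x

n = 5
x_star = [1.0] * n                  # the minimizer
x_start = [-1.0] + [1.0] * (n - 1)  # the starting point [-1, 1, ..., 1]^T

print(nesterov_cr(x_star), nesterov_cr(x_start))
print(on_manifold(x_star), on_manifold(x_start))
```

On the manifold the sum vanishes, so the function value reduces to $\frac{1}{4}(x_1 - 1)^2$, which is why moving $x_1$ from $-1$ to $1$ along $M_N$ decreases $N_p$ from 1 to 0.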
![Page 83: On Nesterov’s Nonsmooth Chebyshev-Rosenbrock Functionsaspremon/Houches/talks/lesHouches... · 2016. 3. 6. · Nonsmooth Optimization With BFGS Some Nonsmooth Analysis Nesterov’s](https://reader035.vdocuments.net/reader035/viewer/2022071505/6125d87d77b3002268280c74/html5/thumbnails/83.jpg)
Why BFGS Takes So Many Iterations to Minimize N2
Let $T_i(x)$ denote the $i$th Chebyshev polynomial. For $x \in M_N$,
$$x_{i+1} = 2x_i^2 - 1 = T_2(x_i) = T_2(T_2(x_{i-1})) = T_2(T_2(\ldots T_2(x_1) \ldots)) = T_{2^i}(x_1).$$
To move from $\hat{x} = [-1, 1, \ldots, 1]^T$ to $x^*$ along the manifold $M_N$ exactly requires
■ $x_1$ to change from $-1$ to $1$
■ $x_2 = 2x_1^2 - 1$ to trace the graph of $T_2(x_1)$ on $[-1, 1]$
■ $x_3 = T_2(T_2(x_1))$ to trace the graph of $T_4(x_1)$ on $[-1, 1]$
■ $x_n = T_{2^{n-1}}(x_1)$ to trace the graph of $T_{2^{n-1}}(x_1)$ on $[-1, 1]$, which has $2^{n-1} - 1$ extrema in $(-1, 1)$.
Even though BFGS will not track the manifold $M_N$ exactly, it will follow it approximately. So, since the manifold is highly oscillatory, BFGS must take relatively short steps to obtain reduction in $N_2$ in the line search, and hence it takes many iterations!
At the very end, since $N_2$ is smooth, BFGS is superlinearly convergent!
Newton’s method is not much faster, although it converges quadratically at the end.
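The composition identity and the extrema count can be checked numerically. The sketch below is illustrative only: it iterates $T_2(x) = 2x^2 - 1$, compares the result with the closed form $T_k(x) = \cos(k \arccos x)$ on $[-1, 1]$, and counts direction changes of $x_n$ as a function of $x_1$ on a grid. The grid (uniform in $\theta = \arccos x_1$, with an arbitrary 0.3 offset so that grid points do not sit symmetrically around an extremum) and the function names are assumptions of this sketch.

```python
import math

def t2(x):
    """Second Chebyshev polynomial T_2(x) = 2x^2 - 1."""
    return 2.0 * x * x - 1.0

def iterate_t2(x1, i):
    """Apply T_2 to x1 a total of i times, giving x_{i+1} on the manifold."""
    x = x1
    for _ in range(i):
        x = t2(x)
    return x

def chebyshev(k, x):
    """T_k(x) = cos(k * arccos(x)), valid for x in [-1, 1]."""
    return math.cos(k * math.acos(x))

def count_interior_extrema(n, per_lobe=8):
    """Count direction changes of x_n = T_{2^(n-1)}(x_1) for x_1 in (-1, 1)."""
    k = 2 ** (n - 1)
    m = per_lobe * k
    ys = [iterate_t2(math.cos(math.pi * (j + 0.3) / m), n - 1) for j in range(m)]
    changes, last_sign = 0, 0
    for a, b in zip(ys, ys[1:]):
        d = b - a
        if d == 0.0:
            continue  # ignore flat steps
        sign = 1 if d > 0 else -1
        if last_sign != 0 and sign != last_sign:
            changes += 1
        last_sign = sign
    return changes

for n in (2, 3, 5):
    print(n, count_interior_extrema(n), 2 ** (n - 1) - 1)
```

The count doubles (plus one) with each added variable, which is the oscillation that forces short steps when an iterate tries to follow the manifold.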
![Page 84: On Nesterov’s Nonsmooth Chebyshev-Rosenbrock Functionsaspremon/Houches/talks/lesHouches... · 2016. 3. 6. · Nonsmooth Optimization With BFGS Some Nonsmooth Analysis Nesterov’s](https://reader035.vdocuments.net/reader035/viewer/2022071505/6125d87d77b3002268280c74/html5/thumbnails/84.jpg)
Length of a Piecewise Linear Descent Path
F. Jarre (2013): if the second term (the sum) in Nesterov’s smooth Chebyshev-Rosenbrock function $N_2$ is weighted by 400, any continuous piecewise linear descent path starting at $\hat{x} = [-1, 1, \ldots, 1]^T$ and leading to the global minimizer $x^*$ has at least $1.618^n$ linear segments.
![Page 91: On Nesterov’s Nonsmooth Chebyshev-Rosenbrock Functionsaspremon/Houches/talks/lesHouches... · 2016. 3. 6. · Nonsmooth Optimization With BFGS Some Nonsmooth Analysis Nesterov’s](https://reader035.vdocuments.net/reader035/viewer/2022071505/6125d87d77b3002268280c74/html5/thumbnails/91.jpg)
Nesterov’s First C-R Function: Nonsmooth Case
$$N_1(x) = \frac{1}{4}(x_1 - 1)^2 + \sum_{i=1}^{n-1} |x_{i+1} - 2x_i^2 + 1|$$
$N_1$ is nonsmooth (though locally Lipschitz) as well as nonconvex. The second term is still zero on the manifold $M_N$, but $N_1$ is not differentiable on $M_N$.
However, $N_1$ is regular at $x \in M_N$ and partly smooth at $x$ w.r.t. $M_N$, and $x^* = [1, 1, \ldots, 1]^T$ is its only stationary point.
We cannot initialize BFGS at $\hat{x} = [-1, 1, \ldots, 1]^T \in M_N$, where $N_1$ is not differentiable, so starting at normally distributed random points:
■ $n = 5$: BFGS reduces $N_1$ only to about $5 \times 10^{-3}$ in 1000 iterations
■ $n = 10$: BFGS reduces $N_1$ only to about $2 \times 10^{-2}$ in 1000 iterations
The method appears to be converging, very slowly, but may be having numerical difficulties.
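The reason a gradient-based method cannot be started at $[-1, 1, \ldots, 1]^T$ can be seen in a small sketch of $N_1$ and its gradient. This is an illustration under stated assumptions, not the code used for the reported experiments: the function `n1_grad` is a hypothetical helper that returns the gradient where every residual $x_{i+1} - 2x_i^2 + 1$ is nonzero and raises an error when a residual is exactly zero, which is the convention chosen here to flag a kink. The random start uses a fixed seed purely for reproducibility.

```python
import random

def n1(x):
    """N_1(x) = (1/4)(x_1 - 1)^2 + sum_i |x_{i+1} - 2 x_i^2 + 1|."""
    return 0.25 * (x[0] - 1.0) ** 2 + sum(
        abs(x[i + 1] - 2.0 * x[i] ** 2 + 1.0) for i in range(len(x) - 1)
    )

def n1_grad(x):
    """Gradient of N_1 where it exists; raises ValueError at a kink."""
    g = [0.0] * len(x)
    g[0] = 0.5 * (x[0] - 1.0)
    for i in range(len(x) - 1):
        r = x[i + 1] - 2.0 * x[i] ** 2 + 1.0
        if r == 0.0:
            raise ValueError("N_1 is not differentiable: residual %d is zero" % (i + 1))
        s = 1.0 if r > 0.0 else -1.0
        g[i + 1] += s              # derivative of |r| w.r.t. x_{i+1}
        g[i] -= 4.0 * x[i] * s     # derivative of |r| w.r.t. x_i
    return g

n = 5
x_hat = [-1.0] + [1.0] * (n - 1)   # lies on M_N, so every residual is zero
try:
    n1_grad(x_hat)
    differentiable_at_x_hat = True
except ValueError:
    differentiable_at_x_hat = False

rng = random.Random(0)             # fixed seed, an arbitrary choice
x0 = [rng.gauss(0.0, 1.0) for _ in range(n)]  # normally distributed start
g0 = n1_grad(x0)                   # well defined with probability one

print(differentiable_at_x_hat, n1(x0), g0)
```

A normally distributed starting point avoids the manifold with probability one, so the gradient is defined there, which matches the choice of random starting points described above.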
![Page 92: On Nesterov’s Nonsmooth Chebyshev-Rosenbrock Functionsaspremon/Houches/talks/lesHouches... · 2016. 3. 6. · Nonsmooth Optimization With BFGS Some Nonsmooth Analysis Nesterov’s](https://reader035.vdocuments.net/reader035/viewer/2022071505/6125d87d77b3002268280c74/html5/thumbnails/92.jpg)
Nesterov’s Second Nonsmooth C-R Function
$$\hat{N}_1(x) = \frac{1}{4}|x_1 - 1| + \sum_{i=1}^{n-1} \big|\, x_{i+1} - 2|x_i| + 1 \,\big|.$$
Again, the unique global minimizer is $x^*$. The second term is zero on the set
$$S = \{x : x_{i+1} = 2|x_i| - 1, \; i = 1, \ldots, n-1\},$$
but $S$ is not a manifold: it has “corners”.
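A short sketch can illustrate the second nonsmooth variant and the "corners" of the set where its second term vanishes. The code below is a hedged illustration with hypothetical helper names (`n1_hat`, `point_on_s`, `in_s`). For $n = 2$ the set is the graph of $x_2 = 2|x_1| - 1$, and the sketch compares the slope of $x_2$ just to the left and just to the right of $x_1 = 0$, using an arbitrary step `h`.

```python
def n1_hat(x):
    """Second nonsmooth variant: (1/4)|x_1 - 1| + sum_i |x_{i+1} - 2|x_i| + 1|."""
    first = 0.25 * abs(x[0] - 1.0)
    second = sum(
        abs(x[i + 1] - 2.0 * abs(x[i]) + 1.0) for i in range(len(x) - 1)
    )
    return first + second

def point_on_s(x1, n):
    """The point of S in R^n whose first coordinate is x1."""
    x = [x1]
    for _ in range(n - 1):
        x.append(2.0 * abs(x[-1]) - 1.0)
    return x

def in_s(x, tol=1e-12):
    """True if x satisfies x_{i+1} = 2|x_i| - 1 for all i (up to tol)."""
    return all(
        abs(x[i + 1] - (2.0 * abs(x[i]) - 1.0)) <= tol for i in range(len(x) - 1)
    )

# Slopes of x_2 along S on either side of x_1 = 0, for n = 2.
h = 1e-3
left_slope = (point_on_s(0.0, 2)[1] - point_on_s(-h, 2)[1]) / h
right_slope = (point_on_s(h, 2)[1] - point_on_s(0.0, 2)[1]) / h

print(n1_hat([1.0, 1.0]), n1_hat([0.0, -1.0]))
print(left_slope, right_slope)
```

The slope jumps from about $-2$ to about $2$ at $x_1 = 0$, so $S$ has a corner at $[0, -1]^T$ when $n = 2$, where the function value is $\frac{1}{4}$ rather than the minimum value 0.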
![Page 94: On Nesterov’s Nonsmooth Chebyshev-Rosenbrock Functionsaspremon/Houches/talks/lesHouches... · 2016. 3. 6. · Nonsmooth Optimization With BFGS Some Nonsmooth Analysis Nesterov’s](https://reader035.vdocuments.net/reader035/viewer/2022071505/6125d87d77b3002268280c74/html5/thumbnails/94.jpg)
Contour Plots of the Nonsmooth Variants for n = 2
[Figure: two contour plots over x1, x2 ∈ [−2, 2], titled “Nesterov-Chebyshev-Rosenbrock, first variant” (left) and “Nesterov-Chebyshev-Rosenbrock, second variant” (right).]
Contour plots of nonsmooth Chebyshev-Rosenbrock functions N1, first variant (left) and second variant (right), with n = 2, with iterates generated by BFGS initialized at 7 different randomly generated points. On the left, always get convergence to x∗ = [1, 1]^T. On the right, most runs converge to [1, 1]^T but some go to x = [0, −1]^T.
![Page 97: On Nesterov’s Nonsmooth Chebyshev-Rosenbrock Functionsaspremon/Houches/talks/lesHouches... · 2016. 3. 6. · Nonsmooth Optimization With BFGS Some Nonsmooth Analysis Nesterov’s](https://reader035.vdocuments.net/reader035/viewer/2022071505/6125d87d77b3002268280c74/html5/thumbnails/97.jpg)
Properties of the Second Nonsmooth Variant N1
When n = 2, the point x = [0, −1]^T is Clarke stationary for the second nonsmooth variant N1. We can see this because zero is in the convex hull of the gradient limits for N1 at the point x.
However, x = [0, −1]^T is not a local minimizer, because d = [1, 2]^T is a direction of linear descent: N1′(x, d) < 0.
These two properties mean that N1 is not regular at [0, −1]^T.
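Both claims for n = 2 can be checked by hand or numerically. The following is a minimal sketch, not the speaker's code: the helper names, the sampling offsets and the convex weights are all choices made here for illustration. It evaluates the gradient in each of the four smooth regions meeting at [0, −1]^T, exhibits explicit convex weights that combine those gradient limits to zero, and estimates the one-sided directional derivative along d = [1, 2]^T.

```python
def n1_second_2d(x1, x2):
    """Second nonsmooth variant for n = 2."""
    return 0.25 * abs(x1 - 1.0) + abs(x2 - 2.0 * abs(x1) + 1.0)


def sign(t):
    return (t > 0) - (t < 0)


def gradient_2d(x1, x2):
    """Gradient of the n = 2 second variant at a point where it is differentiable."""
    s = sign(x2 - 2.0 * abs(x1) + 1.0)
    return (0.25 * sign(x1 - 1.0) - 2.0 * s * sign(x1), float(s))


x_bar = (0.0, -1.0)
eps = 1e-3
# One sample point in each of the four smooth regions that meet at x_bar.
offsets = [(1, 3), (-1, 3), (1, -1), (-1, -1)]
gradient_limits = [gradient_2d(x_bar[0] + eps * a, x_bar[1] + eps * b)
                   for a, b in offsets]

# Hand-picked convex weights (nonnegative, summing to 1) giving the zero vector.
weights = [7 / 16, 1 / 16, 8 / 16, 0.0]
combination = tuple(sum(w * g[k] for w, g in zip(weights, gradient_limits))
                    for k in range(2))

# One-sided difference quotient along d = [1, 2]^T.
t = 1e-6
d = (1.0, 2.0)
quotient = (n1_second_2d(x_bar[0] + t * d[0], x_bar[1] + t * d[1])
            - n1_second_2d(*x_bar)) / t

print(gradient_limits)
print(combination, quotient)
```

Along d the second term stays at zero while the first term decreases, which is why the quotient is close to −1/4.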
![Page 100: On Nesterov’s Nonsmooth Chebyshev-Rosenbrock Functionsaspremon/Houches/talks/lesHouches... · 2016. 3. 6. · Nonsmooth Optimization With BFGS Some Nonsmooth Analysis Nesterov’s](https://reader035.vdocuments.net/reader035/viewer/2022071505/6125d87d77b3002268280c74/html5/thumbnails/100.jpg)
The Mordukhovich Subdifferential
B.S. Mordukhovich (1976), R.T. Rockafellar and R. J.-B. Wets (1998)
Consider a continuous function f : R^n → R (not necessarily Lipschitz) and a point x ∈ R^n. A vector v ∈ R^n is a regular subgradient of f at x (written v ∈ ∂f(x)) if
$$\liminf_{z \to x,\ z \neq x} \frac{f(z) - f(x) - \langle v, z - x \rangle}{|z - x|} \ge 0.$$
A vector v ∈ R^n is a Mordukhovich subgradient of f at x (written v ∈ ∂_M f(x)) if there exist sequences {x_k} and {v_k} in R^n satisfying
$$x_k \to x, \qquad v_k \in \partial f(x_k), \qquad v_k \to v.$$
We say f is Mordukhovich stationary at x if 0 ∈ ∂_M f(x).
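A one-dimensional worked example may help fix the definitions. It is not from the talk; the function f(x) = −|x| is chosen here only because every set involved can be computed by hand.

```latex
% Worked example (an illustration, not from the slides): f(x) = -|x| on R, at x = 0.
% A regular subgradient v at 0 must satisfy
\[
\liminf_{z \to 0,\ z \neq 0} \frac{-|z| - v z}{|z|}
  = \min\{-1 - v,\ -1 + v\} \ge 0 ,
\]
% which would need v <= -1 and v >= 1 at once, so f has no regular subgradient at 0.
% For x \neq 0, f is differentiable and its only regular subgradient is -sign(x).
% Taking limits along sequences x_k -> 0 from the right and from the left gives
\[
\partial_M f(0) = \{-1,\ 1\}, \qquad
\operatorname{conv}\{-1,\ 1\} = [-1,\ 1] \ni 0 .
\]
% So 0 lies in the convex hull of the gradient limits (f is Clarke stationary at 0),
% but 0 is not in \partial_M f(0): f is not Mordukhovich stationary at 0.
```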
![Page 102: On Nesterov’s Nonsmooth Chebyshev-Rosenbrock Functionsaspremon/Houches/talks/lesHouches... · 2016. 3. 6. · Nonsmooth Optimization With BFGS Some Nonsmooth Analysis Nesterov’s](https://reader035.vdocuments.net/reader035/viewer/2022071505/6125d87d77b3002268280c74/html5/thumbnails/102.jpg)
Relationship Between ∂Cf and ∂Mf
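For a locally Lipschitz function f, we have
$$\partial_C f(x) = \operatorname{conv}\, \partial_M f(x),$$
and, if f is regular,
$$\partial_C f(x) = \partial_M f(x).$$
Example: let g(x) = |x_1| − |x_2|, x ∈ R^2. Then
$$\partial_C g(0) = [-1, 1] \times [-1, 1] \quad\text{and}\quad \partial_M g(0) = [-1, 1] \times \{-1, 1\},$$
so g is not regular.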
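The regular-subgradient inequality for g can be probed numerically. The sketch below is a sampled check on a small circle, so it can refute a candidate v but cannot prove membership; the function `min_quotient`, the radius and the candidate vectors are assumptions made here for illustration. It shows that v = 0 fails the inequality at the origin, while near the origin at [0, 0.5]^T the vector [0.3, −1]^T passes and [0.3, 1]^T fails, consistent with the second component of a Mordukhovich subgradient at 0 being ±1.

```python
import math


def g(x1, x2):
    return abs(x1) - abs(x2)


def min_quotient(f, x, v, radius=1e-3, n_dirs=360):
    """Smallest sampled value of [f(z) - f(x) - <v, z - x>] / |z - x|
    over n_dirs points z on a circle of the given radius around x."""
    worst = math.inf
    for k in range(n_dirs):
        angle = 2.0 * math.pi * k / n_dirs
        d1, d2 = radius * math.cos(angle), radius * math.sin(angle)
        numerator = (f(x[0] + d1, x[1] + d2) - f(x[0], x[1])
                     - (v[0] * d1 + v[1] * d2))
        worst = min(worst, numerator / radius)
    return worst


# At the origin, v = 0 violates the inequality (move along the x2 axis).
at_origin = min_quotient(g, (0.0, 0.0), (0.0, 0.0))
# At (0, 0.5): v = (0.3, -1) satisfies the sampled inequality, v = (0.3, 1) does not.
on_axis_ok = min_quotient(g, (0.0, 0.5), (0.3, -1.0))
on_axis_bad = min_quotient(g, (0.0, 0.5), (0.3, 1.0))

print(at_origin, on_axis_ok, on_axis_bad)
```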
![Page 107: On Nesterov’s Nonsmooth Chebyshev-Rosenbrock Functionsaspremon/Houches/talks/lesHouches... · 2016. 3. 6. · Nonsmooth Optimization With BFGS Some Nonsmooth Analysis Nesterov’s](https://reader035.vdocuments.net/reader035/viewer/2022071505/6125d87d77b3002268280c74/html5/thumbnails/107.jpg)
Back to Nesterov’s Second Nonsmooth C-R Function
Theorem. For n ≥ 2:
■ N1 has 2^{n−1} Clarke stationary points
■ N1 has exactly one Mordukhovich stationary point, the global minimizer x∗
■ its only local minimizer is the global minimizer x∗
M. Gurbuzbalaban and M.L.O., SIOPT, 2012.
Furthermore, starting from enough randomly generated starting points, BFGS finds all 2^{n−1} Clarke stationary points!
![Page 108: On Nesterov’s Nonsmooth Chebyshev-Rosenbrock Functionsaspremon/Houches/talks/lesHouches... · 2016. 3. 6. · Nonsmooth Optimization With BFGS Some Nonsmooth Analysis Nesterov’s](https://reader035.vdocuments.net/reader035/viewer/2022071505/6125d87d77b3002268280c74/html5/thumbnails/108.jpg)
Behavior of BFGS on the Second Nonsmooth Variant
[Figure: two plots of the sorted final value of f (vertical axis, 0 to 0.5) against different starting points (horizontal axis, 0 to 1000), titled “Nesterov-Chebyshev-Rosenbrock, n=5” (left) and “Nesterov-Chebyshev-Rosenbrock, n=6” (right).]
Left: sorted final values of N1 for 1000 randomly generated starting points, when n = 5: BFGS finds all 16 Clarke stationary points. Right: same with n = 6: BFGS finds all 32 Clarke stationary points.
![Page 112: On Nesterov’s Nonsmooth Chebyshev-Rosenbrock Functionsaspremon/Houches/talks/lesHouches... · 2016. 3. 6. · Nonsmooth Optimization With BFGS Some Nonsmooth Analysis Nesterov’s](https://reader035.vdocuments.net/reader035/viewer/2022071505/6125d87d77b3002268280c74/html5/thumbnails/112.jpg)
Convergence to Non-Locally-Minimizing Points
When f is smooth, convergence of methods such as BFGS to non-locally-minimizing stationary points or local maxima is possible but not likely, because of the line search, and such convergence will not be stable under perturbation.
However, this kind of convergence is what we are seeing for the non-regular, non-smooth Nesterov Chebyshev-Rosenbrock example, and it is stable under perturbation. The same behavior occurs for gradient sampling or bundle methods.
Kiwiel (private communication): the Nesterov example is the first he had seen which causes his bundle code to have this behavior.
Nonetheless, we don’t know whether, in exact arithmetic, the methods would actually generate sequences converging to the nonminimizing Clarke stationary points. Experiments by Kaku (2011) suggest that the higher the precision used, the more likely BFGS is to eventually move away from such a point.
![Page 117: On Nesterov’s Nonsmooth Chebyshev-Rosenbrock Functionsaspremon/Houches/talks/lesHouches... · 2016. 3. 6. · Nonsmooth Optimization With BFGS Some Nonsmooth Analysis Nesterov’s](https://reader035.vdocuments.net/reader035/viewer/2022071505/6125d87d77b3002268280c74/html5/thumbnails/117.jpg)
Experiments using BFGS with Extended Precision
M.S. thesis by A. Kaku experimenting with Sherry Li’s “double double” C++ package.
“double double” is not the same as quadruple precision: each number is represented as the sum of two ordinary double precision numbers.
Thus, 1 + 10^{−30} and 1 + 10^{−300} are both valid “double double” numbers.
In practice, it is just a convenient, inexpensive software implementation that approximates quadruple precision (approximately 32 decimal digits of accuracy instead of 16).
Show plots from Kaku’s thesis.
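The "sum of two doubles" representation can be illustrated in a few lines. This is a minimal sketch of the idea in Python, not the C++ package mentioned above: the pair layout `(hi, lo)` and the helper `dd_add` are assumptions made here, and `two_sum` is Knuth's standard error-free addition.

```python
def two_sum(a, b):
    """Error-free addition: returns (s, e) with s = fl(a + b) and s + e = a + b exactly."""
    s = a + b
    bb = s - a
    e = (a - (s - bb)) + (b - bb)
    return s, e


def dd_add(x, y):
    """Add two double-double numbers stored as (hi, lo) pairs (simple variant)."""
    s, e = two_sum(x[0], y[0])
    e += x[1] + y[1]
    return two_sum(s, e)


one = (1.0, 0.0)
tiny = (1e-30, 0.0)

plain = 1.0 + 1e-30                        # ordinary double: the small term is lost
total = dd_add(one, tiny)                  # double double: kept in the low part
recovered = dd_add(total, (-1.0, 0.0))     # subtracting 1 brings it back
also_valid = dd_add(one, (1e-300, 0.0))    # 1 + 10^-300 is representable too

print(plain, total, recovered, also_valid)
```

The low part is free to be far smaller than the last bit of the high part, which is why 1 + 10^{−300} is representable even though the format only guarantees roughly 32 digits in general.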
![Page 119: On Nesterov’s Nonsmooth Chebyshev-Rosenbrock Functionsaspremon/Houches/talks/lesHouches... · 2016. 3. 6. · Nonsmooth Optimization With BFGS Some Nonsmooth Analysis Nesterov’s](https://reader035.vdocuments.net/reader035/viewer/2022071505/6125d87d77b3002268280c74/html5/thumbnails/119.jpg)
An Approach using Automatic Differentiation
Recent work by A. Griewank on automatic differentiation for nonsmooth optimization: leads to a more efficient method for optimization of Nesterov’s second nonsmooth Chebyshev-Rosenbrock function, since it is able to efficiently exploit the piecewise-linearity of the function.
Starting at x, it visits all 2^{n−1} Clarke stationary points, but it does not get stuck at any of them because it repeatedly solves LPs that define the piecewise linear path leading to the global minimum.
![Page 120: On Nesterov’s Nonsmooth Chebyshev-Rosenbrock Functionsaspremon/Houches/talks/lesHouches... · 2016. 3. 6. · Nonsmooth Optimization With BFGS Some Nonsmooth Analysis Nesterov’s](https://reader035.vdocuments.net/reader035/viewer/2022071505/6125d87d77b3002268280c74/html5/thumbnails/120.jpg)
Other Examples of Behavior of BFGS
on Nonsmooth Functions
![Page 122: On Nesterov’s Nonsmooth Chebyshev-Rosenbrock Functionsaspremon/Houches/talks/lesHouches... · 2016. 3. 6. · Nonsmooth Optimization With BFGS Some Nonsmooth Analysis Nesterov’s](https://reader035.vdocuments.net/reader035/viewer/2022071505/6125d87d77b3002268280c74/html5/thumbnails/122.jpg)
Minimizing a Product of Eigenvalues
Yurii Nesterov
Introduction
Some NonsmoothAnalysis
Nesterov’sChebyshev-RosenbrockFunctions
Other Examples ofBehavior of BFGSon NonsmoothFunctionsMinimizing aProduct ofEigenvalues
BFGS from 10Randomly GeneratedStarting Points
Evolution ofEigenvalues ofA ◦ XEvolution ofEigenvalues of H
Why Did 44Eigenvalues of HConverge to Zero?
Variation of f fromMinimizer, alongEigVecs of H
Minimizing theSpectral Radius
Nonsmooth Analysisof the Spectral
33 / 48
Let SN denote the space of real symmetric N ×N matrices, and
λ1(X) ≥ λ2(X) ≥ · · ·λN (X)
denote the eigenvalues of X ∈ SN . We wish to minimize
f(X) = log
N/2∏
i=1
λi(A ◦X)
where A ∈ SN is fixed and ◦ is the Hadamard (componentwise)matrix product, subject to the constraints that X is positivesemidefinite and has diagonal entries equal to 1.
![Page 123: On Nesterov’s Nonsmooth Chebyshev-Rosenbrock Functionsaspremon/Houches/talks/lesHouches... · 2016. 3. 6. · Nonsmooth Optimization With BFGS Some Nonsmooth Analysis Nesterov’s](https://reader035.vdocuments.net/reader035/viewer/2022071505/6125d87d77b3002268280c74/html5/thumbnails/123.jpg)
Minimizing a Product of Eigenvalues
Yurii Nesterov
Introduction
Some NonsmoothAnalysis
Nesterov’sChebyshev-RosenbrockFunctions
Other Examples ofBehavior of BFGSon NonsmoothFunctionsMinimizing aProduct ofEigenvalues
BFGS from 10Randomly GeneratedStarting Points
Evolution ofEigenvalues ofA ◦ XEvolution ofEigenvalues of H
Why Did 44Eigenvalues of HConverge to Zero?
Variation of f fromMinimizer, alongEigVecs of H
Minimizing theSpectral Radius
Nonsmooth Analysisof the Spectral
33 / 48
Let SN denote the space of real symmetric N ×N matrices, and
λ1(X) ≥ λ2(X) ≥ · · ·λN (X)
denote the eigenvalues of X ∈ SN . We wish to minimize
f(X) = log
N/2∏
i=1
λi(A ◦X)
where A ∈ SN is fixed and ◦ is the Hadamard (componentwise)matrix product, subject to the constraints that X is positivesemidefinite and has diagonal entries equal to 1.
If we replace∏
by∑
we would have a semidefinite program.
![Page 124: On Nesterov’s Nonsmooth Chebyshev-Rosenbrock Functionsaspremon/Houches/talks/lesHouches... · 2016. 3. 6. · Nonsmooth Optimization With BFGS Some Nonsmooth Analysis Nesterov’s](https://reader035.vdocuments.net/reader035/viewer/2022071505/6125d87d77b3002268280c74/html5/thumbnails/124.jpg)
Minimizing a Product of Eigenvalues
Yurii Nesterov
Introduction
Some NonsmoothAnalysis
Nesterov’sChebyshev-RosenbrockFunctions
Other Examples ofBehavior of BFGSon NonsmoothFunctionsMinimizing aProduct ofEigenvalues
BFGS from 10Randomly GeneratedStarting Points
Evolution ofEigenvalues ofA ◦ XEvolution ofEigenvalues of H
Why Did 44Eigenvalues of HConverge to Zero?
Variation of f fromMinimizer, alongEigVecs of H
Minimizing theSpectral Radius
Nonsmooth Analysisof the Spectral
33 / 48
Let SN denote the space of real symmetric N ×N matrices, and
λ1(X) ≥ λ2(X) ≥ · · ·λN (X)
denote the eigenvalues of X ∈ SN . We wish to minimize
f(X) = log
N/2∏
i=1
λi(A ◦X)
where A ∈ SN is fixed and ◦ is the Hadamard (componentwise)matrix product, subject to the constraints that X is positivesemidefinite and has diagonal entries equal to 1.
If we replace∏
by∑
we would have a semidefinite program.
Since f is not convex, may as well replace X by Y Y T whereY ∈ R
N×N : eliminates psd constraint, and then also easy toeliminate diagonal constraint.
![Page 125: On Nesterov’s Nonsmooth Chebyshev-Rosenbrock Functionsaspremon/Houches/talks/lesHouches... · 2016. 3. 6. · Nonsmooth Optimization With BFGS Some Nonsmooth Analysis Nesterov’s](https://reader035.vdocuments.net/reader035/viewer/2022071505/6125d87d77b3002268280c74/html5/thumbnails/125.jpg)
Minimizing a Product of Eigenvalues
Yurii Nesterov
Introduction
Some NonsmoothAnalysis
Nesterov’sChebyshev-RosenbrockFunctions
Other Examples ofBehavior of BFGSon NonsmoothFunctionsMinimizing aProduct ofEigenvalues
BFGS from 10Randomly GeneratedStarting Points
Evolution ofEigenvalues ofA ◦ XEvolution ofEigenvalues of H
Why Did 44Eigenvalues of HConverge to Zero?
Variation of f fromMinimizer, alongEigVecs of H
Minimizing theSpectral Radius
Nonsmooth Analysisof the Spectral
33 / 48
Let $S^N$ denote the space of real symmetric $N \times N$ matrices, and

$$\lambda_1(X) \ge \lambda_2(X) \ge \cdots \ge \lambda_N(X)$$

denote the eigenvalues of $X \in S^N$. We wish to minimize

$$f(X) = \log \prod_{i=1}^{N/2} \lambda_i(A \circ X)$$

where $A \in S^N$ is fixed and $\circ$ is the Hadamard (componentwise) matrix product, subject to the constraints that $X$ is positive semidefinite and has diagonal entries equal to 1.
If we replace $\prod$ by $\sum$ we would have a semidefinite program.
Since $f$ is not convex, may as well replace $X$ by $YY^T$ where $Y \in \mathbb{R}^{N \times N}$: eliminates psd constraint, and then also easy to eliminate diagonal constraint.
Application: entropy minimization in an environmental application (K.M. Anstreicher and J. Lee, 2004)
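The reparametrization above can be sketched in a few lines of NumPy. This is only an illustration under stated assumptions: the slides do not say how the diagonal constraint is eliminated, so the row normalization of Y used here is one plausible choice, and the matrix A, the random seed and the helper name `log_eig_product` are hypothetical.

```python
import numpy as np

def log_eig_product(Y, A):
    """Evaluate f = log prod_{i=1}^{N/2} lambda_i(A o X) for X built from Y.

    Assumption (not from the slides): rows of Y are scaled to unit norm,
    so X = Yn Yn^T is positive semidefinite with unit diagonal.
    """
    Yn = Y / np.linalg.norm(Y, axis=1, keepdims=True)
    X = Yn @ Yn.T
    # '*' on arrays is the Hadamard (componentwise) product;
    # eigvalsh returns eigenvalues of a symmetric matrix in ascending order.
    eigs = np.linalg.eigvalsh(A * X)
    largest = eigs[::-1][: A.shape[0] // 2]
    return float(np.sum(np.log(largest))), X

rng = np.random.default_rng(0)
N = 20                                  # matches N=20, n = N*N = 400 variables
B = rng.standard_normal((N, N))
A = B @ B.T + N * np.eye(N)             # hypothetical positive definite A
Y = rng.standard_normal((N, N))
f_val, X = log_eig_product(Y, A)
print(f_val)
```

Working with Y instead of X leaves an unconstrained problem in the 400 entries of Y, which is the form a method such as BFGS can be applied to directly.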
![Page 126: On Nesterov’s Nonsmooth Chebyshev-Rosenbrock Functionsaspremon/Houches/talks/lesHouches... · 2016. 3. 6. · Nonsmooth Optimization With BFGS Some Nonsmooth Analysis Nesterov’s](https://reader035.vdocuments.net/reader035/viewer/2022071505/6125d87d77b3002268280c74/html5/thumbnails/126.jpg)
BFGS from 10 Randomly Generated Starting Points
[Figure: semilog plot of $f - f_{\text{opt}}$ versus iteration (0 to 1400), one curve per starting point. Log eigenvalue product, N=20, n=400, $f_{\text{opt}}$ = −4.37938e+000.]
$f - f_{\text{opt}}$, where $f_{\text{opt}}$ is least value of f found over all runs
![Page 127: On Nesterov’s Nonsmooth Chebyshev-Rosenbrock Functionsaspremon/Houches/talks/lesHouches... · 2016. 3. 6. · Nonsmooth Optimization With BFGS Some Nonsmooth Analysis Nesterov’s](https://reader035.vdocuments.net/reader035/viewer/2022071505/6125d87d77b3002268280c74/html5/thumbnails/127.jpg)
Evolution of Eigenvalues of A ◦X
[Figure: eigenvalues of $A \circ X$ versus iteration (0 to 1400). Log eigenvalue product, N=20, n=400, $f_{\text{opt}}$ = −4.37938e+000.]
Note that $\lambda_6(A \circ X), \ldots, \lambda_{14}(A \circ X)$ coalesce.
![Page 129: On Nesterov’s Nonsmooth Chebyshev-Rosenbrock Functionsaspremon/Houches/talks/lesHouches... · 2016. 3. 6. · Nonsmooth Optimization With BFGS Some Nonsmooth Analysis Nesterov’s](https://reader035.vdocuments.net/reader035/viewer/2022071505/6125d87d77b3002268280c74/html5/thumbnails/129.jpg)
Evolution of Eigenvalues of H
[Figure: semilog plot of the eigenvalues of H versus iteration (0 to 1400). Log eigenvalue product, N=20, n=400, $f_{\text{opt}}$ = −4.37938e+000.]
44 eigenvalues of H converge to zero...why???
![Page 131: On Nesterov’s Nonsmooth Chebyshev-Rosenbrock Functionsaspremon/Houches/talks/lesHouches... · 2016. 3. 6. · Nonsmooth Optimization With BFGS Some Nonsmooth Analysis Nesterov’s](https://reader035.vdocuments.net/reader035/viewer/2022071505/6125d87d77b3002268280c74/html5/thumbnails/131.jpg)
Why Did 44 Eigenvalues of H Converge to Zero?
The eigenvalue product is partly smooth with respect to the manifold of matrices with an eigenvalue with given multiplicity.
Recall that at the computed minimizer,

$$\lambda_6(A \circ X) \approx \cdots \approx \lambda_{14}(A \circ X).$$

Matrix theory says that imposing multiplicity $m$ on an eigenvalue of a matrix in $S^N$ is $\frac{m(m+1)}{2} - 1$ conditions, or 44 when $m = 9$, so the dimension of the V-space at this minimizer is 44.
And tiny eigenvalues of the BFGS matrix H approximating the “inverse Hessian” correspond to “infinite curvature”: nonsmoothness in the V-space.
Thus BFGS automatically detected the U and V space partitioning without knowing anything about the mathematical structure of f!
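The number 44 can be recovered by a standard dimension count, which the slides do not spell out. Here $m = 14 - 6 + 1 = 9$ eigenvalues coalesce. Restricted to their invariant subspace, the matrix is a symmetric $m \times m$ block, and the eigenvalue has multiplicity $m$ exactly when that block is a multiple of the identity:

```latex
\[
\underbrace{\frac{m(m+1)}{2}}_{\dim S^m}
\;-\;
\underbrace{1}_{\dim \{\lambda I_m \,:\, \lambda \in \mathbb{R}\}}
\;=\;
\frac{9 \cdot 10}{2} - 1
\;=\; 44
\qquad (m = 9).
\]
```

The first term counts the free entries of a symmetric $m \times m$ block and the second counts the single parameter $\lambda$ that remains free once the block is forced to equal $\lambda I_m$.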
![Page 135: On Nesterov’s Nonsmooth Chebyshev-Rosenbrock Functionsaspremon/Houches/talks/lesHouches... · 2016. 3. 6. · Nonsmooth Optimization With BFGS Some Nonsmooth Analysis Nesterov’s](https://reader035.vdocuments.net/reader035/viewer/2022071505/6125d87d77b3002268280c74/html5/thumbnails/135.jpg)
Variation of f from Minimizer, along EigVecs of H
[Figure: $f(x_{\text{opt}} + t w) - f_{\text{opt}}$ versus $t \in [-10, 10]$. Log eigenvalue product, N=20, n=400, $f_{\text{opt}}$ = −4.37938e+000. One curve for each choice of w: the eigenvector for eigenvalue 10, 20, 30, 40, 50 and 60 of the final H.]
Eigenvalues of H numbered smallest to largest
![Page 136: On Nesterov’s Nonsmooth Chebyshev-Rosenbrock Functionsaspremon/Houches/talks/lesHouches... · 2016. 3. 6. · Nonsmooth Optimization With BFGS Some Nonsmooth Analysis Nesterov’s](https://reader035.vdocuments.net/reader035/viewer/2022071505/6125d87d77b3002268280c74/html5/thumbnails/136.jpg)
Minimizing the Spectral Radius
Given the discrete-time dynamical system with control input and measured output

$$z(k+1) = F z(k) + G u(k), \qquad y(k) = H z(k)$$

where $F \in \mathbb{R}^{n \times n}$, $G \in \mathbb{R}^{n \times p}$, $H \in \mathbb{R}^{m \times n}$, the static output feedback problem is to find a controller $X \in \mathbb{R}^{p \times m}$ so that, setting $u(k) = X y(k)$, all solutions of

$$z(k+1) = (F + GXH)\, z(k)$$

converge to zero, that is all eigenvalues of $F + GXH$ are inside the unit disk (Schur stable), or prove that this is not possible.
Pose as optimization problem:

$$\min_{X \in \mathbb{R}^{p \times m}} \rho(F + GXH)$$

where $\rho$ is spectral radius.
NP-hard if add bounds on entries of X (V. Blondel and J. Tsitsiklis, 1996).
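As a small illustration of the objective, the sketch below evaluates $\rho(F + GXH)$ and simulates the closed-loop recursion. The 2×2 system, the controller X and the helper names are hypothetical and chosen only so the effect of the feedback is easy to see by hand; they are not taken from the talk.

```python
import numpy as np

def spectral_radius(M):
    """Largest modulus among the eigenvalues of M."""
    return float(np.max(np.abs(np.linalg.eigvals(M))))

def closed_loop(F, G, H, X):
    """Closed-loop matrix F + G X H for static output feedback u = X y."""
    return F + G @ X @ H

# Hypothetical toy data: F is unstable (eigenvalue 1.2 outside the unit disk).
F = np.array([[1.2, 0.0],
              [0.0, 0.5]])
G = np.array([[1.0],
              [0.0]])          # n x p with n = 2, p = 1
H = np.array([[1.0, 0.0]])     # m x n with m = 1
X = np.array([[-1.0]])         # p x m controller

rho_open = spectral_radius(F)
rho_closed = spectral_radius(closed_loop(F, G, H, X))

# Simulate z(k+1) = (F + G X H) z(k); the state decays when rho_closed < 1.
z = np.array([1.0, 1.0])
for _ in range(50):
    z = closed_loop(F, G, H, X) @ z

print(rho_open, rho_closed, np.linalg.norm(z))
```

Here the feedback moves the eigenvalue 1.2 to 0.2, so the closed-loop matrix is diag(0.2, 0.5) and every trajectory converges to zero.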
![Page 139: On Nesterov’s Nonsmooth Chebyshev-Rosenbrock Functionsaspremon/Houches/talks/lesHouches... · 2016. 3. 6. · Nonsmooth Optimization With BFGS Some Nonsmooth Analysis Nesterov’s](https://reader035.vdocuments.net/reader035/viewer/2022071505/6125d87d77b3002268280c74/html5/thumbnails/139.jpg)
Nonsmooth Analysis of the Spectral Radius
The spectral radius ρ is not locally Lipschitz at matrices with multiple active eigenvalues (those attaining the maximal modulus).
Nonsmooth analysis of ρ in this case, deriving ∂Mρ, was given by J.V. Burke and M.L.O. (2001), J.V. Burke, A.S. Lewis and M.L.O. (2005), etc.
But to apply BFGS, we assume that everywhere we evaluate ρ at A(X) = F + GXH, there is just one active real eigenvalue or active conjugate pair with multiplicity one, and break any “ties” arbitrarily.
![Page 142: On Nesterov’s Nonsmooth Chebyshev-Rosenbrock Functionsaspremon/Houches/talks/lesHouches... · 2016. 3. 6. · Nonsmooth Optimization With BFGS Some Nonsmooth Analysis Nesterov’s](https://reader035.vdocuments.net/reader035/viewer/2022071505/6125d87d77b3002268280c74/html5/thumbnails/142.jpg)
Gradient of the Spectral Radius
Gradient of the spectral radius in real matrix space:

$$\nabla \rho(A) = \mathrm{Re}\left( \frac{\mu}{|\mu|} \, \frac{1}{v^* u} \, v u^* \right)$$

where v and u are right and left eigenvectors for the relevant active eigenvalue μ of A, which is assumed to be simple and have nonnegative imaginary part.
Gradients may be arbitrarily large for μ nearly a multiple eigenvalue: spectral functions are not locally Lipschitz at an active multiple eigenvalue.
Break ties for active eigenvalue arbitrarily.
Since A is real, take Im μ ≥ 0 wlog.
Defining A(X) = F + GXH, use ordinary chain rule to obtain gradients of ρ(A(X)) in the X space.
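A minimal NumPy sketch of this gradient computation follows, with a finite-difference check. It fixes one convention, which is an assumption of the sketch: gradients are taken with respect to the inner product $\langle G, \Delta \rangle = \mathrm{tr}(G^T \Delta)$, and left eigenvectors are read off as rows of $V^{-1}$, so that $u^* v = 1$. How the outer product is ordered or transposed depends on such conventions, which is why the code is checked numerically rather than against the formula symbol by symbol. The test matrices and function names are hypothetical.

```python
import numpy as np

def rho_and_grad(A):
    """Spectral radius of real A and its gradient, assuming a simple active eigenvalue."""
    w, V = np.linalg.eig(A)
    W = np.linalg.inv(V)             # row i is a left eigenvector u_i^*, with u_i^* v_i = 1
    i = int(np.argmax(np.abs(w)))    # argmax breaks ties arbitrarily
    mu = w[i]
    # d(mu)/dA[j, k] = W[i, j] * V[k, i], and d|mu| = Re(conj(mu) / |mu| * d(mu)).
    grad = np.real(np.conj(mu) / abs(mu) * np.outer(W[i, :], V[:, i]))
    return float(abs(mu)), grad

def rho_and_grad_X(F, G, H, X):
    """Chain rule for A(X) = F + G X H: gradient in X space is G^T (grad rho) H^T."""
    rho, grad_A = rho_and_grad(F + G @ X @ H)
    return rho, G.T @ grad_A @ H.T

# Hypothetical data with one well-separated dominant eigenvalue.
F = np.array([[2.0, 0.2, 0.0],
              [0.0, 0.5, 0.2],
              [0.1, 0.0, -0.4]])
G = np.array([[1.0], [0.0], [1.0]])
H = np.array([[1.0, 0.0, 0.0],
              [0.0, 1.0, 0.0]])
X = np.array([[0.05, -0.03]])

rho, grad_X = rho_and_grad_X(F, G, H, X)

# Central finite difference along a fixed direction D.
D = np.array([[1.0, -2.0]])
h = 1e-6
rho_plus, _ = rho_and_grad_X(F, G, H, X + h * D)
rho_minus, _ = rho_and_grad_X(F, G, H, X - h * D)
fd = (rho_plus - rho_minus) / (2 * h)
analytic = float(np.sum(grad_X * D))
print(rho, fd, analytic)
```

The check is only meaningful where the active eigenvalue is simple. Near a multiple active eigenvalue the gradient can be arbitrarily large and a finite-difference comparison is not reliable.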
![Page 147: On Nesterov’s Nonsmooth Chebyshev-Rosenbrock Functionsaspremon/Houches/talks/lesHouches... · 2016. 3. 6. · Nonsmooth Optimization With BFGS Some Nonsmooth Analysis Nesterov’s](https://reader035.vdocuments.net/reader035/viewer/2022071505/6125d87d77b3002268280c74/html5/thumbnails/147.jpg)
Numerical Results for some SOF Problems
Let F be an n× n Toeplitz matrix whose nonzeros are 0.5 onthe main diagonal and first three superdiagonals and and thenumber −0.5 on the first subdiagonal. Not Schur stable.
First set of experiments: set n = 8 and optimize over $X \in \mathbb{R}^{p \times m}$ with p = 1 (setting $G = [1, \ldots, 1]^T$), and consider m ranging from 0 to 8 (setting H to the matrix whose rows are the first m rows of the identity matrix).
For each m, run BFGS from 100 randomly generated starting points to search for local minimizers of $\rho(F + GXH)$ over X, and plot the eigenvalues of F + GXH for the best X found.
Second set of experiments: n = 15, p = 2, with G having a second column $[1, -1, 1, -1, \ldots, 1]^T$.
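The setup described above can be sketched in a few lines of NumPy. This is only an illustration of the problem data as stated on the slide, not the code used for the experiments; the function names `make_problem`, `spectral_radius` and `closed_loop_rho` are hypothetical.

```python
import numpy as np

def make_problem(n, p, m):
    """Build F (n x n), G (n x p) and H (m x n) as described on the slide.

    Assumption: only p = 1 and p = 2 are supported, matching the two
    sets of experiments.
    """
    F = np.zeros((n, n))
    for k in range(4):                  # main diagonal and first three superdiagonals
        F += 0.5 * np.eye(n, k=k)
    F -= 0.5 * np.eye(n, k=-1)          # first subdiagonal
    G = np.ones((n, 1))                 # first column: [1, ..., 1]^T
    if p == 2:
        alternating = np.array([(-1.0) ** i for i in range(n)]).reshape(n, 1)
        G = np.hstack([G, alternating]) # second column: [1, -1, 1, -1, ...]^T
    H = np.eye(n)[:m, :]                # first m rows of the identity
    return F, G, H

def spectral_radius(A):
    """Largest eigenvalue modulus of A."""
    return np.max(np.abs(np.linalg.eigvals(A)))

def closed_loop_rho(X, F, G, H):
    """Objective rho(F + G X H) as a function of the p x m matrix X."""
    return spectral_radius(F + G @ X @ H)

# With m = 0 there are no free variables, so the objective is just rho(F).
F, G, H = make_problem(n=8, p=1, m=0)
print(closed_loop_rho(np.zeros((1, 0)), F, G, H))
```

Since the slide states that F is not Schur stable, the printed value for m = 0 should be at least 1. For m > 0, `closed_loop_rho` is the function one would hand to a BFGS implementation, started from randomly generated X.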
![Page 151: On Nesterov’s Nonsmooth Chebyshev-Rosenbrock Functionsaspremon/Houches/talks/lesHouches... · 2016. 3. 6. · Nonsmooth Optimization With BFGS Some Nonsmooth Analysis Nesterov’s](https://reader035.vdocuments.net/reader035/viewer/2022071505/6125d87d77b3002268280c74/html5/thumbnails/151.jpg)
Optimized Eigenvalues: n = 8, p = 1
[Figure: nine panels, m = 0 to m = 8, each showing the eigenvalues of F + GXH for the best X found. Axis limits shrink from ±1 (m = 0, 1) through ±0.5 (m = 2 to 6) and ±0.2 (m = 7) to ±0.05 (m = 8).]
’*’ : known optimal value for m = 7 and m = 8
![Page 152: On Nesterov’s Nonsmooth Chebyshev-Rosenbrock Functionsaspremon/Houches/talks/lesHouches... · 2016. 3. 6. · Nonsmooth Optimization With BFGS Some Nonsmooth Analysis Nesterov’s](https://reader035.vdocuments.net/reader035/viewer/2022071505/6125d87d77b3002268280c74/html5/thumbnails/152.jpg)
Sorted Final Values of ρ for 100 Runs of BFGS
[Figure: nine panels, m = 0 to m = 8, each plotting the sorted final values of ρ against run index (0 to 100). Vertical axis ticks span 1.0413 to 1.0414 for m = 0; 1.2 to 1.6 for m = 1; 1 to 1.6 for m = 2; 0.95 to 1.15 for m = 3; 0.95 to 1.1 for m = 4; 0.85 to 1 for m = 5; 0.8 to 1.1 for m = 6; 0.4 to 1 for m = 7; 0.2 to 1.2 for m = 8.]
![Page 153: On Nesterov’s Nonsmooth Chebyshev-Rosenbrock Functionsaspremon/Houches/talks/lesHouches... · 2016. 3. 6. · Nonsmooth Optimization With BFGS Some Nonsmooth Analysis Nesterov’s](https://reader035.vdocuments.net/reader035/viewer/2022071505/6125d87d77b3002268280c74/html5/thumbnails/153.jpg)
Optimized Eigenvalues: n = 15, p = 2
[Figure: nine panels, m = 0 to m = 8, each showing the eigenvalues of F + GXH for the best X found. Axis limits are ±1 for m = 0 to 4 and ±0.5 for m = 5 to 8.]
’*’ : known optimal value for m = 8
![Page 154: On Nesterov’s Nonsmooth Chebyshev-Rosenbrock Functionsaspremon/Houches/talks/lesHouches... · 2016. 3. 6. · Nonsmooth Optimization With BFGS Some Nonsmooth Analysis Nesterov’s](https://reader035.vdocuments.net/reader035/viewer/2022071505/6125d87d77b3002268280c74/html5/thumbnails/154.jpg)
Sorted Final Values of ρ for 100 Runs of BFGS
[Figure: nine panels, m = 0 to m = 8, each plotting the sorted final values of ρ against run index (0 to 100). Vertical axis ticks span 1.1011 to 1.1013 for m = 0; 1.08 to 1.1 for m = 1; 1.1 to 1.3 for m = 2; 1.05 to 1.2 for m = 3; 1.1 to 1.3 for m = 4; 1 to 1.2 for m = 5 and m = 6; 0.9 to 1.1 for m = 7 and m = 8.]
![Page 155: On Nesterov’s Nonsmooth Chebyshev-Rosenbrock Functionsaspremon/Houches/talks/lesHouches... · 2016. 3. 6. · Nonsmooth Optimization With BFGS Some Nonsmooth Analysis Nesterov’s](https://reader035.vdocuments.net/reader035/viewer/2022071505/6125d87d77b3002268280c74/html5/thumbnails/155.jpg)
Challenge: Convergence of BFGS in Nonsmooth Case
Assume f is locally Lipschitz with bounded level sets and is semi-algebraic.
Assume the initial x and H are generated randomly (e.g. from normal and Wishart distributions).
Prove or disprove that the following hold with probability one:
1. BFGS generates an infinite sequence $\{x\}$ with f differentiable at all iterates.
2. Any cluster point $\bar{x}$ is Clarke stationary.
3. The sequence of function values generated (including all of the line search iterates) converges to $f(\bar{x})$ R-linearly.
4. If $\{x\}$ converges to $\bar{x}$ where f is “partly smooth” w.r.t. a manifold M, then the subspace defined by the eigenvectors corresponding to eigenvalues of H converging to zero converges to the “V-space” of f w.r.t. M at $\bar{x}$.
A.S. Lewis and M.L.O., Math Programming, 2013.
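To make the setting of the challenge concrete, here is a minimal sketch of BFGS with a bisection-style weak Wolfe line search, started from a normally distributed x and a Wishart-distributed inverse Hessian approximation H. Everything specific in it is an assumption made for illustration: the test function f(x) = 2|x₁| + x₂², the line search parameters, the iteration limits and the function names. It is not the implementation used in the talk.

```python
import numpy as np

def f_and_grad(x):
    """Hypothetical test function f(x) = 2|x1| + x2^2, nonsmooth on x1 = 0."""
    f = 2.0 * abs(x[0]) + x[1] ** 2
    g = np.array([2.0 * np.sign(x[0]), 2.0 * x[1]])
    return f, g

def weak_wolfe(fg, x, f0, g0, p, c1=1e-4, c2=0.9, max_iter=50):
    """Bracket and bisect until both weak Wolfe conditions hold.

    Returns (t, f_new, g_new), or None if no acceptable step is found.
    """
    d0 = g0 @ p                          # directional derivative along p
    if not d0 < 0:                       # not a descent direction (or NaN)
        return None
    lo, hi, t = 0.0, np.inf, 1.0
    for _ in range(max_iter):
        f1, g1 = fg(x + t * p)
        if not (f1 <= f0 + c1 * t * d0): # sufficient decrease fails: step too long
            hi = t
        elif g1 @ p < c2 * d0:           # curvature condition fails: step too short
            lo = t
        else:
            return t, f1, g1
        t = 0.5 * (lo + hi) if np.isfinite(hi) else 2.0 * t
    return None

def bfgs(fg, x, H, iters=60):
    """BFGS iteration on the inverse Hessian approximation H."""
    f, g = fg(x)
    identity = np.eye(len(x))
    for _ in range(iters):
        p = -H @ g
        step = weak_wolfe(fg, x, f, g, p)
        if step is None:
            break
        t, f_new, g_new = step
        s, y = t * p, g_new - g
        sy = s @ y                       # positive when the curvature condition holds
        if sy > 0:
            r = 1.0 / sy
            V = identity - r * np.outer(s, y)
            H = V @ H @ V.T + r * np.outer(s, s)
        x, f, g = x + s, f_new, g_new
    return x, f, H

rng = np.random.default_rng(0)
x0 = rng.standard_normal(2)              # normally distributed starting point
W = rng.standard_normal((2, 2))
H0 = W @ W.T                             # Wishart-distributed, positive definite a.s.
x, f, H = bfgs(f_and_grad, x0, H0)
print(f, np.linalg.eigvalsh(H))
```

For this particular test function the minimizer is the origin and the nonsmoothness lies along the x₁ direction, so item 4 would suggest that one eigenvalue of H shrinks toward zero while the other stays bounded away from it. The sketch prints the final eigenvalues so that this can be inspected; it does not prove anything about the conjecture.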
![Page 163: On Nesterov’s Nonsmooth Chebyshev-Rosenbrock Functionsaspremon/Houches/talks/lesHouches... · 2016. 3. 6. · Nonsmooth Optimization With BFGS Some Nonsmooth Analysis Nesterov’s](https://reader035.vdocuments.net/reader035/viewer/2022071505/6125d87d77b3002268280c74/html5/thumbnails/163.jpg)
And Finally
Happy Birthday Yurii!