Analysis of Optimization Algorithms via Sum-of-Squares
TRANSCRIPT
Analysis of Optimization Algorithms via Sum-of-Squares
Sandra S. Y. Tan, Antonios Varvitsiotis (SUTD), Vincent Y. F. Tan
Motivation
First-order optimization algorithms

Iterative algorithms using only (sub)gradient information
• Low computational complexity
• Ideal for large-scale problems with low-accuracy requirements

min_{x ∈ R^n} f(x)

GD: xk+1 = xk − γk ∇f(xk)

[Figure 1: Gradient Descent (GD) — iterates x0, x1, x2, x3, x4 converging toward the minimizer x∗]

Vincent Y. F. Tan 1
Worst-case convergence bounds

Goal: systematically bound the convergence rate of an algorithm A with respect to a function class F.
Find the minimum contraction factor t ∈ (0, 1) such that

Performance Metric(xk+1) ≤ t · Performance Metric(xk)

holds for all f ∈ F and all {xk}k≥1 generated by A.

Performance metrics
• Objective function accuracy f(xk) − f(x∗), also denoted fk − f∗
• Squared distance to optimality ‖xk − x∗‖²
• Squared residual gradient norm ‖∇f(xk)‖², also denoted ‖gk‖²
Example

An L-smooth, µ-strongly convex ((µ, L)-smooth) function satisfies
• ‖∇f(x) − ∇f(y)‖ ≤ L‖x − y‖ for all x, y ∈ R^n
• f(x) − (µ/2)‖x‖² is convex

Theorem (de Klerk et al., 2017). Given a (µ, L)-smooth function, apply GD with exact line search:

fk+1 − f∗ ≤ ((L − µ)/(L + µ))² (fk − f∗)

A sum-of-squares (SOS) proof:

((L − µ)/(L + µ))² (fk − f∗) − (fk+1 − f∗) ≥ (µ/4) ( ‖q1‖²/(1 + √(µ/L)) + ‖q2‖²/(1 − √(µ/L)) ) ≥ 0

where q1 and q2 are linear functions of x∗, xk, xk+1, gk, gk+1.
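The contraction bound in the theorem can be checked numerically. Below is a minimal sketch (not from the talk) that runs GD with exact line search on a hypothetical diagonal quadratic with x∗ = 0 and f∗ = 0, where the exact step has the closed form γ = gᵀg / gᵀHg:

```python
mu, L = 1.0, 10.0
d = [mu, 4.0, L]                      # Hessian eigenvalues of f(x) = 0.5 * x^T diag(d) x
x = [1.0, -2.0, 0.5]                  # starting point; here x* = 0 and f* = 0

f = lambda x: 0.5 * sum(di * xi * xi for di, xi in zip(d, x))
grad = lambda x: [di * xi for di, xi in zip(d, x)]

rate = ((L - mu) / (L + mu)) ** 2     # contraction factor from the theorem

for _ in range(20):
    g = grad(x)
    # exact line search on a quadratic: gamma = g^T g / g^T H g
    gamma = sum(gi * gi for gi in g) / sum(di * gi * gi for di, gi in zip(d, g))
    x_next = [xi - gamma * gi for xi, gi in zip(x, g)]
    assert f(x_next) <= rate * f(x) + 1e-12   # the per-step bound from the theorem
    x = x_next
print("bound held for 20 iterations; final f =", f(x))
```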
More SOS proofs

GD with constant step size, γ ∈ (0, 2/L) (Lessard et al., 2016):

ρ²‖xk − x∗‖² − ‖xk+1 − x∗‖² ≥
  γ(2 − γ(L + µ))/(L − µ) · ‖gk − µ(xk − x∗)‖²,  γ ∈ (0, 2/(L + µ)]
  γ(γ(L + µ) − 2)/(L − µ) · ‖gk − L(xk − x∗)‖²,  γ ∈ [2/(L + µ), 2/L)

where ρ := max{|1 − γµ|, |1 − γL|}

GD with the Armijo rule:

(1 − 4µε(1 − ε)/(ηL)) (fk − f∗) − (fk+1 − f∗) ≥ 2ε(1 − ε)/(η(L − µ)) · ‖gk + µ(x∗ − xk)‖²

where ε ∈ (0, 1) and η > 1 are the parameters of the Armijo rule.
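A quick numerical illustration of the constant-step-size result: on a quadratic with Hessian eigenvalues in [µ, L], the distance to x∗ contracts by at least the factor ρ per step. The specific eigenvalues and starting point below are illustrative assumptions, not from the slides:

```python
import math

mu, L = 1.0, 8.0
gamma = 2.0 / (L + mu)                # a step size in (0, 2/L)
rho = max(abs(1 - gamma * mu), abs(1 - gamma * L))
d = [mu, 3.0, L]                      # Hessian eigenvalues; x* = 0
x = [1.0, 1.0, 1.0]

norm = lambda v: math.sqrt(sum(vi * vi for vi in v))

for _ in range(30):
    g = [di * xi for di, xi in zip(d, x)]            # gradient of 0.5 * x^T diag(d) x
    x_next = [xi - gamma * gi for xi, gi in zip(x, g)]
    assert norm(x_next) <= rho * norm(x) + 1e-12     # per-step contraction by rho
    x = x_next
print("rho =", rho, "; final distance to x* =", norm(x))
```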
More SOS proofs: Proximal Gradient Method (PGM)

PGM with exact line search (Taylor et al., 2018):

((L − µ)/(L + µ))² (fk − f∗) − (fk+1 − f∗) ≥ (1; z)ᵀ Q∗ (1; z)

where z = (f∗, fk, fk+1, x∗, xk, xk+1, …) and Q∗ is a PSD matrix.

PGM with constant step size, γ ∈ (0, 2/L) (Taylor et al., 2018):

ρ²‖rk + sk‖² − ‖rk+1 + s̄k+1‖² ≥
  ρ²‖q1‖² + (2 − γ(L + µ))/(γ(L − µ)) · ‖q2‖²,  γ ∈ (0, 2/(L + µ)]
  ρ²‖q1‖² + (γ(L + µ) − 2)/(γ(L − µ)) · ‖q3‖²,  γ ∈ [2/(L + µ), 2/L)

where ρ := max{|1 − γµ|, |1 − γL|} and q1, q2, q3 are linear functions of sk, s̄k+1, rk, rk+1.
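For context, PGM minimizes a composite objective f + g by alternating a gradient step on the smooth part f with the proximal operator of g. A minimal sketch with the common choice g(x) = λ‖x‖₁, whose prox is soft-thresholding; the quadratic f, step size, and λ here are hypothetical, not quantities from the slides:

```python
def soft_threshold(v, tau):
    # prox of tau * ||.||_1, applied componentwise
    return [max(abs(vi) - tau, 0.0) * (1.0 if vi > 0 else -1.0 if vi < 0 else 0.0)
            for vi in v]

def pgm_step(x, grad_f, gamma, lam):
    # gradient step on the smooth part f, then prox of gamma * lam * ||.||_1
    y = [xi - gamma * gi for xi, gi in zip(x, grad_f(x))]
    return soft_threshold(y, gamma * lam)

b = [3.0, 0.2, 0.0]
grad_f = lambda x: [xi - bi for xi, bi in zip(x, b)]   # f(x) = 0.5 * ||x - b||^2

x = [0.0, 0.0, 0.0]
for _ in range(50):
    x = pgm_step(x, grad_f, gamma=1.0, lam=0.5)
print(x)   # the minimizer of f + 0.5*||.||_1 is soft_threshold(b, 0.5) = [2.5, 0.0, 0.0]
```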
Related Work
Performance Estimation Problem (PEP)

Framework to derive worst-case bounds (Drori and Teboulle, 2014):

max  fN − f∗
s.t. f ∈ F
     xk+1 = A(xk, fk, gk),  k = 0, …, N − 1
     f0 − f∗ ≤ R

• GD with exact line search (de Klerk et al., 2017)
• Proximal Gradient Method (PGM) with constant step size (Taylor et al., 2018)
• Proximal Point Algorithm, Conditional Gradient Method, … (Taylor et al., 2017a)
• Formulated as an SDP (exact under certain conditions)
• Inspiration for our current work
Integral Quadratic Constraints (IQCs)

Views optimization algorithms as linear dynamical systems (LDS) with nonlinear feedback (Lessard et al., 2016)

[Block diagram: linear system with state ξk in feedback with the nonlinearity φ mapping yk to uk]

State:    ξk+1 = Aξk + Buk
Output:   yk = Cξk + Duk
Feedback: uk = φ(yk)

• For first-order methods, uk = φ(yk) := ∇f(yk)
• Convergence of the algorithm ≡ stability of the dynamical system
• Replace φ with quadratic constraints on all instances of (yk, uk)
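The state-space form above can be made concrete for plain GD: taking A = I, B = −γI, C = I, D = 0 recovers xk+1 = xk − γ∇f(xk). A sketch on an assumed toy objective f(y) = ‖y‖² (not an example from the talk):

```python
gamma = 0.1
grad_f = lambda y: [2.0 * yi for yi in y]   # hypothetical objective f(y) = ||y||^2

xi = [5.0, -3.0]                            # state xi_k; for GD the state is the iterate
for _ in range(50):
    y = xi[:]                               # output:   y_k = C xi_k with C = I, D = 0
    u = grad_f(y)                           # feedback: u_k = phi(y_k) = grad f(y_k)
    xi = [s - gamma * ui for s, ui in zip(xi, u)]   # state update: A = I, B = -gamma * I
print(xi)   # each coordinate shrinks by the factor (1 - 2*gamma) = 0.8 per step
```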
The Sum-of-Squares Approach
SDP formulation of sum-of-squares optimization

Sum-of-squares of degree-2 polynomials. For z = (x, y) and d = 2,

p(z) = (x² − 2)² + (x − 3y)²
     = x⁴ − 3x² − 6xy + 9y² + 4
     = v(z)ᵀ Q v(z),  where v(z) = (1, x, y, x², xy, y²)ᵀ and Q ⪰ 0

• Match coefficients of 1, x, y, x², …, y⁴ ⟹ affine constraints, e.g.
    coefficient of 1:  4 = Q1,1
    coefficient of x:  0 = Q1,2 + Q2,1 = 2Q1,2
• Affine constraints + the condition Q ⪰ 0 ⟹ SDP feasibility problem:

    min 0  s.t.  Q ⪰ 0, affine constraints
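The Gram-matrix identity on this slide can be verified numerically: reading the two squares off p(z) gives one PSD choice of Q. A sketch (the random sampling is just a consistency check, not part of the slides):

```python
import random

# Q is indexed over the monomial vector v(z) = (1, x, y, x^2, xy, y^2)
Q = [[0.0] * 6 for _ in range(6)]
# (x^2 - 2)^2 = (x^2)^2 - 4 * x^2 + 4 lives on the (1, x^2) block
Q[0][0], Q[0][3], Q[3][0], Q[3][3] = 4.0, -2.0, -2.0, 1.0
# (x - 3y)^2 lives on the (x, y) block
Q[1][1], Q[1][2], Q[2][1], Q[2][2] = 1.0, -3.0, -3.0, 9.0

def p(x, y):
    # expanded form from the slide
    return x**4 - 3 * x**2 - 6 * x * y + 9 * y**2 + 4

for _ in range(100):
    x, y = random.uniform(-2, 2), random.uniform(-2, 2)
    v = [1.0, x, y, x * x, x * y, y * y]
    gram = sum(v[i] * Q[i][j] * v[j] for i in range(6) for j in range(6))
    assert abs(gram - p(x, y)) < 1e-9
print("v(z)^T Q v(z) matches p(z) on 100 random points")
```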
Certifying constrained nonnegativity

Closed semi-algebraic set

K = {z : hi(z) ≥ 0, vj(z) = 0 ∀ i, j}

for some polynomials hi(z) and vj(z).

"Constrained" nonnegativity. To certify that p(z) ≥ 0 for all z ∈ K (Parrilo, 2003; Lasserre, 2007), write

p(z) = σ0(z) + ∑i σi(z) hi(z) + ∑j θj(z) vj(z)

where σ0(z) and the σi(z) are sum-of-squares (SOS) polynomials, i.e., of the form σ(z) = ∑k qk(z)², and the θj(z) are arbitrary polynomials. On K the hi(z) are ≥ 0 and the vj(z) vanish, so the right-hand side is nonnegative.

• Fix the degrees of the σi(z) and θj(z)
• Construct and solve the corresponding SDP feasibility problem
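A toy instance of this certificate form (hypothetical example, not from the talk): to certify p(z) = z ≥ 0 on K = {z : z − 1 ≥ 0}, the polynomial identity z = 1 + 1·(z − 1) works, with σ0 = σ1 = 1 both trivially SOS:

```python
import random

p      = lambda z: z
h1     = lambda z: z - 1.0
sigma0 = lambda z: 1.0          # SOS: 1 = 1^2
sigma1 = lambda z: 1.0          # SOS: 1 = 1^2

# The identity p = sigma0 + sigma1 * h1 holds for ALL z; since sigma0 and
# sigma1 are SOS, it certifies p(z) >= 0 wherever h1(z) >= 0, i.e. on K.
for _ in range(100):
    z = random.uniform(-10, 10)
    assert abs(p(z) - (sigma0(z) + sigma1(z) * h1(z))) < 1e-12
print("certificate identity verified on 100 random points")
```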
Analyzing optimization algorithms

min  t
s.t. t(fk − f∗) − (fk+1 − f∗) ≥ 0
     t ∈ (0, 1)
     f ∈ F
     xk+1 = A(xk, fk, gk)

where fk = f(xk) and gk ∈ ∂f(xk) for all k
First relaxation step
min t
s.t. t(fk − f∗)− (fk+1 − f∗) ≥ 0
t ∈ (0, 1)
hi (z) ≥ 0, ∀ivj(z) = 0, ∀j
}K
where z = (f∗, fk , fk+1, x∗, xk , xk+1, g∗, gk , gk+1) ∈ R6n+3
1. Identify polynomial inequalities hi (z) ≥ 0 and equalities vj(z) = 0necessarily satisfied given F and A
E.g. f is L-smooth =⇒ ‖∇f (x)−∇f (y)‖ ≤ L‖x− y‖ ∀ x, y=⇒ h1(z) = L2‖xk − x∗‖2 − ‖gk − g∗‖2 ≥ 0
Vincent Y. F. Tan 11
First relaxation step

min  t
s.t. t(fk − f∗) − (fk+1 − f∗) ≥ 0  ∀ z ∈ K
     t ∈ (0, 1)

where K = {z : hi(z) ≥ 0, vj(z) = 0 ∀ i, j}
Second relaxation step

min  t
s.t. p(z) = σ0(z) + ∑i σi(z) hi(z) + ∑j θj(z) vj(z)
     t ∈ (0, 1)

where K = {z : hi(z) ≥ 0, vj(z) = 0 ∀ i, j}

2. Relax the constraint that p(z) := t(fk − f∗) − (fk+1 − f∗) is nonnegative over K to the existence of an SOS certificate.
Obtaining the bound

1. Identify polynomial (in)equalities necessarily satisfied given F and A:

   f ∈ F,  xk+1 = A(xk, fk, gk)   ⟹   hi(z) ≥ 0 ∀ i,  vj(z) = 0 ∀ j

2. Relax the nonnegativity constraint to an SOS certificate, i.e., relax p(z) ≥ 0 ∀ z ∈ K to

   p(z) = σ0(z) + ∑i σi(z) hi(z) + ∑j θj(z) vj(z)

   for SOS polynomials σi(z) and arbitrary polynomials θj(z)

3. Solve the SDP multiple times for the n = 1 case and 'guess' the analytic expression of the optimal contraction factor

   t∗ = min {t : t ∈ (0, 1), SOS constraints}

4. Generalize this to a feasible SDP solution for the multivariate case
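Step 3 is typically run by bisecting on t, solving one SDP feasibility problem per candidate value. The sketch below shows only the bisection shell; the SDP oracle is replaced by a stand-in predicate built from the known answer for GD with exact line search, purely for illustration — a real implementation would call an SDP solver there:

```python
mu, L = 1.0, 10.0
t_star_known = ((L - mu) / (L + mu)) ** 2

def sos_sdp_is_feasible(t):
    # Stand-in for: assemble the SOS/SDP with contraction factor t and solve it.
    # Here we hard-code the known answer for GD with exact line search.
    return t >= t_star_known

lo, hi = 0.0, 1.0
for _ in range(50):                   # bisect the feasibility threshold
    mid = 0.5 * (lo + hi)
    if sos_sdp_is_feasible(mid):
        hi = mid                      # feasible: try a smaller contraction factor
    else:
        lo = mid                      # infeasible: a larger factor is needed
print("estimated t* =", hi)
```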
Example
Function class

Class of (µ, L)-smooth functions, denoted Fµ,L(R^n)

Theorem (Fµ,L-interpolability (Taylor et al., 2017b)). Given a set {(xi, fi, gi)}i∈I, there exists a (µ, L)-smooth function f with fi = f(xi) and gi ∈ ∂f(xi) for all i iff

fi − fj − gjᵀ(xi − xj) ≥ L/(2(L − µ)) [ (1/L)‖gi − gj‖² + µ‖xi − xj‖² − (2µ/L)(gj − gi)ᵀ(xj − xi) ]   ∀ i ≠ j

E.g., setting i = k and j = k + 1:

fk − fk+1 − gk+1ᵀ(xk − xk+1) − L/(2(L − µ)) [ (1/L)‖gk − gk+1‖² + µ‖xk − xk+1‖² − (2µ/L)(gk+1 − gk)ᵀ(xk+1 − xk) ] ≥ 0
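As a sanity check on the inequality, points sampled from an explicit (µ, L)-smooth function must satisfy it pairwise. The diagonal quadratic below is a hypothetical test instance, not from the slides:

```python
import random

mu, L = 1.0, 10.0
d = [mu, 5.0, L]                      # Hessian eigenvalues, all in [mu, L]
f = lambda x: 0.5 * sum(di * xi * xi for di, xi in zip(d, x))
g = lambda x: [di * xi for di, xi in zip(d, x)]

def dot(u, v): return sum(a * b for a, b in zip(u, v))
def sub(u, v): return [a - b for a, b in zip(u, v)]

pts = [[random.uniform(-3, 3) for _ in range(3)] for _ in range(8)]
for x_i in pts:
    for x_j in pts:
        if x_i is x_j:
            continue
        g_i, g_j = g(x_i), g(x_j)
        dx, dg = sub(x_i, x_j), sub(g_i, g_j)
        lhs = f(x_i) - f(x_j) - dot(g_j, dx)
        # note (g_j - g_i)^T (x_j - x_i) = dg^T dx
        rhs = L / (2 * (L - mu)) * (
            dot(dg, dg) / L + mu * dot(dx, dx) - (2 * mu / L) * dot(dg, dx)
        )
        assert lhs >= rhs - 1e-9      # the interpolability inequality
print("all sampled pairs satisfy the interpolability inequality")
```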
Algorithm

Gradient Descent (GD) with exact line search (ELS):

γk = argmin_{γ>0} f(xk − γ gk)
xk+1 = xk − γk gk

where gk = ∇f(xk)

Derived constraints

gk+1ᵀ(xk+1 − xk) = 0
gk+1ᵀ gk = 0
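The two derived constraints follow from first-order optimality of the line search, and are easy to confirm numerically on a quadratic (a hypothetical instance, where the exact step is γ = gᵀg / gᵀHg):

```python
d = [1.0, 4.0, 9.0]                   # f(x) = 0.5 * x^T diag(d) x, so grad f(x) = diag(d) x
grad = lambda x: [di * xi for di, xi in zip(d, x)]
dot = lambda u, v: sum(a * b for a, b in zip(u, v))

x = [1.0, 1.0, 1.0]
for _ in range(5):
    gk = grad(x)
    gamma = dot(gk, gk) / dot(gk, grad(gk))   # exact line search on a quadratic
    x_next = [xi - gamma * gi for xi, gi in zip(x, gk)]
    gk1 = grad(x_next)
    step = [a - b for a, b in zip(x_next, x)]
    assert abs(dot(gk1, step)) < 1e-9         # g_{k+1}^T (x_{k+1} - x_k) = 0
    assert abs(dot(gk1, gk)) < 1e-9           # g_{k+1}^T g_k = 0
    x = x_next
print("both derived constraints hold at every iteration")
```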
Applying the SOS approach

Necessary constraints
• 6 Fµ,L-interpolability constraints corresponding to k, k + 1, ∗ (the hi(z) ≥ 0)
• 2 algorithmic constraints (the vj(z) = 0):

  gk+1ᵀ(xk+1 − xk) = 0
  gk+1ᵀ gk = 0

The SDP

min  t
s.t. p(z) = σ0(z) + ∑i σi(z) hi(z) + ∑j θj(z) vj(z)
     t ∈ (0, 1)

σ0(z): SOS of linear polynomials, i.e., σ0(z) = (1; z)ᵀ Q (1; z) with Q ⪰ 0
σi(z), i ≥ 1: SOS of degree 0 (nonnegative scalars); θj(z): degree 0 (scalars)
![Page 58: Analysis of Optimization Algorithms via Sum-of-Squares](https://reader031.vdocuments.net/reader031/viewer/2022012100/6169e83111a7b741a34caeef/html5/thumbnails/58.jpg)
Applying the SOS approach
Necessary constraints
• 6 Fµ,L-interpolability constraints corresponding to k, k + 1, ∗(hi (z) ≥ 0)
• 2 algorithmic constraints (vj(z) = 0)
The SDP
min t
s.t. p(z) = σ0(z) +∑i
σi (z)hi (z) +∑j
θj(z)vj(z)
t ∈ (0, 1)
σ0(z) : SOS of linear polynomials
σi (z), i ≥ 1 : SOS of degree 0, θj(z) : degree 0
Vincent Y. F. Tan 17
![Page 59: Analysis of Optimization Algorithms via Sum-of-Squares](https://reader031.vdocuments.net/reader031/viewer/2022012100/6169e83111a7b741a34caeef/html5/thumbnails/59.jpg)
Applying the SOS approach
Necessary constraints
• 6 Fµ,L-interpolability constraints corresponding to k, k + 1, ∗(hi (z) ≥ 0)
• 2 algorithmic constraints (vj(z) = 0)
The SDP
min t
s.t. p(z) = σ0(z) +∑i
σi (z)hi (z) +∑j
θj(z)vj(z)
t ∈ (0, 1)
σ0(z) : SOS of linear polynomials
σi (z), i ≥ 1 : SOS of degree 0, θj(z) : degree 0
Vincent Y. F. Tan 17
![Page 60: Analysis of Optimization Algorithms via Sum-of-Squares](https://reader031.vdocuments.net/reader031/viewer/2022012100/6169e83111a7b741a34caeef/html5/thumbnails/60.jpg)
Applying the SOS approach
Necessary constraints
• 6 Fµ,L-interpolability constraints corresponding to k, k + 1, ∗(hi (z) ≥ 0)
• 2 algorithmic constraints (vj(z) = 0)
The SDP
min t
s.t. p(z) =
(1z
)>Q
(1z
)+∑i
σi (z)hi (z) +∑j
θj(z)vj(z)
t ∈ (0, 1)
Q � 0
σi (z), i ≥ 1 : SOS of degree 0, θj(z) : degree 0
Vincent Y. F. Tan 17
![Page 61: Analysis of Optimization Algorithms via Sum-of-Squares](https://reader031.vdocuments.net/reader031/viewer/2022012100/6169e83111a7b741a34caeef/html5/thumbnails/61.jpg)
Applying the SOS approach
Necessary constraints
• 6 Fµ,L-interpolability constraints corresponding to k, k + 1, ∗(hi (z) ≥ 0)
• 2 algorithmic constraints (vj(z) = 0)
The SDP
min t
s.t. p(z) =
(1z
)>Q
(1z
)+∑i
σi (z)hi (z) +∑j
θj(z)vj(z)
t ∈ (0, 1)
Q � 0
σi (z), i ≥ 1 : SOS of degree 0, θj(z) : degree 0
Vincent Y. F. Tan 17
![Page 62: Analysis of Optimization Algorithms via Sum-of-Squares](https://reader031.vdocuments.net/reader031/viewer/2022012100/6169e83111a7b741a34caeef/html5/thumbnails/62.jpg)
Applying the SOS approach
Necessary constraints
• 6 Fµ,L-interpolability constraints corresponding to k, k + 1, ∗(hi (z) ≥ 0)
• 2 algorithmic constraints (vj(z) = 0)
The SDP
min t
s.t. p(z) =
(1z
)>Q
(1z
)+∑i
σihi (z) +∑j
θj(z)vj(z)
t ∈ (0, 1)
Q � 0
σi ≥ 0, i ≥ 1, θj(z) : degree 0
Vincent Y. F. Tan 17
![Page 63: Analysis of Optimization Algorithms via Sum-of-Squares](https://reader031.vdocuments.net/reader031/viewer/2022012100/6169e83111a7b741a34caeef/html5/thumbnails/63.jpg)
Applying the SOS approach
Necessary constraints
• 6 Fµ,L-interpolability constraints corresponding to k, k + 1, ∗(hi (z) ≥ 0)
• 2 algorithmic constraints (vj(z) = 0)
The SDP
min t
s.t. p(z) =
(1z
)>Q
(1z
)+∑i
σihi (z) +∑j
θj(z)vj(z)
t ∈ (0, 1)
Q � 0
σi ≥ 0, i ≥ 1, θj(z) : degree 0
Vincent Y. F. Tan 17
![Page 64: Analysis of Optimization Algorithms via Sum-of-Squares](https://reader031.vdocuments.net/reader031/viewer/2022012100/6169e83111a7b741a34caeef/html5/thumbnails/64.jpg)
Applying the SOS approach
Necessary constraints
• 6 Fµ,L-interpolability constraints corresponding to k, k + 1, ∗(hi (z) ≥ 0)
• 2 algorithmic constraints (vj(z) = 0)
The SDP
min t
s.t. p(z) =
(1z
)>Q
(1z
)+∑i
σihi (z) +∑j
θjvj(z)
t ∈ (0, 1)
Q � 0
σi ≥ 0, i ≥ 1, θj ∈ R
Vincent Y. F. Tan 17
![Page 65: Analysis of Optimization Algorithms via Sum-of-Squares](https://reader031.vdocuments.net/reader031/viewer/2022012100/6169e83111a7b741a34caeef/html5/thumbnails/65.jpg)
Deriving the affine constraints

t(fk − f∗) − (fk+1 − f∗) = (1, z)ᵀ Q (1, z) + Σi σi hi(z) + Σj θj vj(z)

Here, e.g., the constraint h2(z) ≥ 0 is

fk − f∗ − (L/(2(L − µ))) [ (1/L)‖gk‖² + µ‖xk − x∗‖² + (2µ/L) gkᵀ(x∗ − xk) ] ≥ 0

Matching the coefficients of each monomial on both sides yields one affine constraint per monomial:

1 :    0 = Q1,1
f∗ :   1 − t = 2Q1,2 − σ2 − σ4 + σ5 + σ6
fk :   t = 2Q1,3 + σ1 + σ2 − σ3 − σ5
fk+1 : −1 = 2Q1,4 − σ1 + σ3 + σ4 − σ6
...
Vincent Y. F. Tan 19
![Page 72: Analysis of Optimization Algorithms via Sum-of-Squares](https://reader031.vdocuments.net/reader031/viewer/2022012100/6169e83111a7b741a34caeef/html5/thumbnails/72.jpg)
Solving the SDP

min t
s.t. p(z) = (1, z)ᵀ Q (1, z) + Σi σi hi(z) + Σj θj vj(z)
t ∈ (0, 1)
Q ⪰ 0
σi ≥ 0, θj ∈ R

Guessing the optimal solution

t∗ = ((L − µ)/(L + µ))²,  Q∗ = ( 04×4  04×5 ; 05×4  Q∗5×5 ),  σ∗1 = (L − µ)/(L + µ), . . .

where Q∗5×5 is an explicit symmetric matrix whose entries are rational functions of L and µ (see the arXiv paper for the full expression).
Vincent Y. F. Tan 20
![Page 74: Analysis of Optimization Algorithms via Sum-of-Squares](https://reader031.vdocuments.net/reader031/viewer/2022012100/6169e83111a7b741a34caeef/html5/thumbnails/74.jpg)
Multivariate case

Recall z = (f∗, fk, fk+1, x∗, xk, xk+1, g∗, gk, gk+1)

Let z = (z0, z1, . . . , zn) where

z0 = (f∗, fk, fk+1) and zℓ = (x∗(ℓ), xk(ℓ), xk+1(ℓ), g∗(ℓ), gk(ℓ), gk+1(ℓ))

Require the polynomials p(z), hi(z) and vj(z) to satisfy

pol(z) = pol0(z0) + Σℓ pol1(zℓ), ℓ = 1, . . . , n

• Fulfilled when the polynomials are linear in the f's and in the inner products of the x's and g's, e.g.

fk − f∗ − (L/(2(L − µ))) [ (1/L)‖gk‖² + µ‖xk − x∗‖² + (2µ/L) gkᵀ(x∗ − xk) ] ≥ 0
Vincent Y. F. Tan 21
![Page 77: Analysis of Optimization Algorithms via Sum-of-Squares](https://reader031.vdocuments.net/reader031/viewer/2022012100/6169e83111a7b741a34caeef/html5/thumbnails/77.jpg)
Multivariate case

Univariate case (n = 1):

p(z) = (1, z0, z1)ᵀ blkdiag(Q0, Q1) (1, z0, z1) + Σi σi hi(z) + Σj θj vj(z)

Multivariate case:

p(z) = (1, z0, z1, . . . , zn)ᵀ blkdiag(Q0, Q1 ⊗ In) (1, z0, z1, . . . , zn) + Σi σi hi(z) + Σj θj vj(z)

with the block Q0 acting on (1, z0) and Q1 (resp. Q1 ⊗ In) on the remaining coordinates

SOS certificate for the univariate case =⇒ SOS certificate for the multivariate case
Vincent Y. F. Tan 22
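The lifting step rests on the fact that Q1 ⪰ 0 implies Q1 ⊗ In ⪰ 0 (the Kronecker product's eigenvalues are products of the factors' eigenvalues). A small pure-Python illustration; `kron` and the Cholesky-style pivot test `is_psd` are helpers written for this sketch, adequate for the full-rank examples used here.

```python
def kron(A, B):
    # Kronecker product of dense list-of-lists matrices
    rb, cb = len(B), len(B[0])
    return [[A[i // rb][j // cb] * B[i % rb][j % cb]
             for j in range(len(A[0]) * cb)]
            for i in range(len(A) * rb)]

def is_psd(M, tol=1e-12):
    # Cholesky-style pivot test; sufficient for the full-rank examples below.
    n = len(M)
    C = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = M[i][j] - sum(C[i][k] * C[j][k] for k in range(j))
            if i == j:
                if s < -tol:
                    return False  # negative pivot: not PSD
                C[i][i] = max(s, 0.0) ** 0.5
            else:
                C[i][j] = s / C[j][j] if C[j][j] > tol else 0.0
    return True

I2 = [[1.0, 0.0], [0.0, 1.0]]
Q1 = [[2.0, 1.0], [1.0, 2.0]]   # positive definite (eigenvalues 1 and 3)
B = [[1.0, 2.0], [2.0, 1.0]]    # indefinite (eigenvalues 3 and -1)
print(is_psd(Q1), is_psd(kron(Q1, I2)))  # True True
print(is_psd(B), is_psd(kron(B, I2)))    # False False
```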
![Page 80: Analysis of Optimization Algorithms via Sum-of-Squares](https://reader031.vdocuments.net/reader031/viewer/2022012100/6169e83111a7b741a34caeef/html5/thumbnails/80.jpg)
Obtaining the SOS certificate

t∗(fk − f∗) − (fk+1 − f∗)
(1)= (1, z)ᵀ Q∗ (1, z) + Σi σ∗i hi(z) + Σj θ∗j vj(z)
(2)≥ (1, z)ᵀ Q∗ (1, z)
(3)= (µ/4) ( ‖q1‖²/(1 + √(µ/L)) + ‖q2‖²/(1 − √(µ/L)) )

1. Verifying the solution to be feasible
2. Since σ∗i ≥ 0, hi(z) ≥ 0 and vj(z) = 0
3. Expanding the quadratic term into its SOS decomposition
Vincent Y. F. Tan 23
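The certified contraction can also be observed empirically. A sketch on the quadratic f(x, y) = (µ/2)x² + (L/2)y² with its closed-form exact line search (constants illustrative); the certificate guarantees every per-step ratio is at most t∗ = ((L − µ)/(L + µ))².

```python
MU, L = 1.0, 10.0                      # example constants
T_STAR = ((L - MU) / (L + MU)) ** 2    # certified contraction factor for GD + ELS

def f(x):
    # f(x, y) = (MU/2)x^2 + (L/2)y^2, minimized at the origin with f* = 0
    return 0.5 * MU * x[0] ** 2 + 0.5 * L * x[1] ** 2

def els_step(x):
    # exact line search on a quadratic: gamma = g^T g / (g^T H g)
    g = (MU * x[0], L * x[1])
    gamma = (g[0] ** 2 + g[1] ** 2) / (MU * g[0] ** 2 + L * g[1] ** 2)
    return (x[0] - gamma * g[0], x[1] - gamma * g[1])

x = (0.7, -0.3)
ratios = []
for _ in range(30):
    x_next = els_step(x)
    ratios.append(f(x_next) / f(x))  # per-step contraction of f_k - f*
    x = x_next
print(max(ratios) <= T_STAR + 1e-12)  # True: no step exceeds t*
```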
![Page 83: Analysis of Optimization Algorithms via Sum-of-Squares](https://reader031.vdocuments.net/reader031/viewer/2022012100/6169e83111a7b741a34caeef/html5/thumbnails/83.jpg)
Revisiting the PEP
![Page 84: Analysis of Optimization Algorithms via Sum-of-Squares](https://reader031.vdocuments.net/reader031/viewer/2022012100/6169e83111a7b741a34caeef/html5/thumbnails/84.jpg)
The PEP as an SDP

max fN − f∗
s.t. ∃ f ∈ F s.t. Of(xi) = {fi, gi} ∀ i
xk+1 = A(xk, fk, gk), k = 0, . . . , N − 1
f0 − f∗ ≤ R

which is relaxed to the polynomial formulation

max fN − f∗
s.t. hi(z) ≥ 0 ∀ i
vj(z) = 0 ∀ j
f0 − f∗ ≤ R

• Assume a black-box model using the oracle Of
• Identify polynomial inequalities hi(z) ≥ 0 and equalities vj(z) = 0 necessarily satisfied given F and A
• Linear in the f's and in the inner products of the x's and g's =⇒ Gram matrices
• Formulated as an SDP

Vincent Y. F. Tan 24
![Page 91: Analysis of Optimization Algorithms via Sum-of-Squares](https://reader031.vdocuments.net/reader031/viewer/2022012100/6169e83111a7b741a34caeef/html5/thumbnails/91.jpg)
Relation between PEP and SOS

(PEP)  max fN − f∗  s.t. {xi} generated by A, f ∈ F, f0 − f∗ ≤ R
  ↓
N-step PEP-SDP
  ↓
1-step PEP-SDP

(SOS)  min t  s.t. t(fk − f∗) − (fk+1 − f∗) ≥ 0 for all {xi} generated by A and all f ∈ F
  ↓
Deg-d SOS-SDP
  ↓
Deg-1 SOS-SDP

1-step PEP-SDP ⇐= Dual =⇒ Deg-1 SOS-SDP
Vincent Y. F. Tan 25
![Page 98: Analysis of Optimization Algorithms via Sum-of-Squares](https://reader031.vdocuments.net/reader031/viewer/2022012100/6169e83111a7b741a34caeef/html5/thumbnails/98.jpg)
Results
![Page 99: Analysis of Optimization Algorithms via Sum-of-Squares](https://reader031.vdocuments.net/reader031/viewer/2022012100/6169e83111a7b741a34caeef/html5/thumbnails/99.jpg)
Summary of Results

Form of bounds: Performance metric(xk+1) ≤ t∗ Performance metric(xk)

Let ε and η be parameters of the Armijo rule, and ργ := max{|1 − γµ|, |1 − γL|}

| Method | Step size γ | Metric | Rate t∗ | Known? |
|---|---|---|---|---|
| GD | ELS | fk − f∗ | ((L − µ)/(L + µ))² | de Klerk et al. (2017) |
| GD | γ ∈ (0, 2/L) | ‖xk − x∗‖² | ργ² | Lessard et al. (2016) |
| GD | Armijo rule | fk − f∗ | 1 − 4µε(1 − ε)/(ηL) | New |
Vincent Y. F. Tan 26
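The Armijo-rule entry can be exercised with a backtracking implementation. The parameterization below (ε as the sufficient-decrease fraction, η > 1 as the backtracking ratio) is an assumption chosen to echo the slides, not necessarily the talk's exact protocol; all constants are illustrative.

```python
MU, L = 1.0, 10.0        # example constants
EPS, ETA = 0.25, 1.5     # assumed roles: sufficient decrease / backtracking ratio

def f(x):
    return 0.5 * MU * x[0] ** 2 + 0.5 * L * x[1] ** 2  # f* = 0

def grad(x):
    return (MU * x[0], L * x[1])

def armijo_gd_step(x, gamma0=1.0):
    g = grad(x)
    gg = g[0] ** 2 + g[1] ** 2
    gamma = gamma0
    # Backtrack until the Armijo sufficient-decrease condition holds:
    #   f(x - gamma*g) <= f(x) - EPS * gamma * ||g||^2
    while f((x[0] - gamma * g[0], x[1] - gamma * g[1])) > f(x) - EPS * gamma * gg:
        gamma /= ETA
    return (x[0] - gamma * g[0], x[1] - gamma * g[1])

x = (1.0, 1.0)
for _ in range(500):
    x = armijo_gd_step(x)
print(f(x) < 1e-6)  # True: linear convergence at a rate of the form 1 - c*mu/L
```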
![Page 102: Analysis of Optimization Algorithms via Sum-of-Squares](https://reader031.vdocuments.net/reader031/viewer/2022012100/6169e83111a7b741a34caeef/html5/thumbnails/102.jpg)
GD with Armijo-terminated Line Search

Our result tnew vs. the known contraction factors tLY (Luenberger and Ye, 2016) and tnemi (Nemirovski, 1999):

tnew = 1 − 4ε(1 − ε)/(ηκ)
tLY = 1 − 2ε/(ηκ)
tnemi = (κ − (2 − 1/ε)(1 − ε)/η) / (κ + (1/ε − 1)/η)

where κ = L/µ is the condition number.

Figure 2: tnew and tLY vs. κ for ε = 0.25, η = 1.001
Figure 3: tnew and tnemi vs. κ for ε = 0.5, η = 1.001
Vincent Y. F. Tan 27
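The relation between tnew and tLY can be checked directly: since 4ε(1 − ε) > 2ε exactly when ε < 1/2, the new bound is strictly sharper in that regime. A short sketch (parameter grids are illustrative):

```python
def t_new(eps, eta, kappa):
    return 1 - 4 * eps * (1 - eps) / (eta * kappa)

def t_ly(eps, eta, kappa):
    return 1 - 2 * eps / (eta * kappa)

eta = 1.001  # value used in the talk's figures
for eps in (0.1, 0.25, 0.4):              # any eps in (0, 1/2)
    for kappa in (2.0, 5.0, 10.0, 100.0):
        # 4*eps*(1-eps) > 2*eps whenever eps < 1/2, so t_new < t_ly
        assert t_new(eps, eta, kappa) < t_ly(eps, eta, kappa)
print("t_new < t_ly whenever eps < 1/2")
```

At ε = 1/2 the two factors coincide, which is why Figure 3 compares against tnemi instead.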
![Page 105: Analysis of Optimization Algorithms via Sum-of-Squares](https://reader031.vdocuments.net/reader031/viewer/2022012100/6169e83111a7b741a34caeef/html5/thumbnails/105.jpg)
A new function class: Composite functions

Composite convex minimization problem

min_{x∈Rn} { f(x) := a(x) + b(x) },

where a ∈ Fµ,L and b is closed, convex and proper.

Further assumptions

Assume that the proximal operator of b,

proxγb(x) := argmin_{y∈Rn} { γ b(y) + (1/2)‖x − y‖² },

exists at every x
Vincent Y. F. Tan 28
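For b(x) = λ|x| the proximal operator has the classical soft-thresholding form; the sketch below checks it against brute-force grid minimization of the prox objective (λ, γ and the grid are illustrative):

```python
LAM, GAMMA = 0.5, 0.2   # illustrative values

def prox_abs(x):
    # prox_{GAMMA * LAM * |.|}(x): soft-thresholding by t = GAMMA * LAM
    t = GAMMA * LAM
    if x > t:
        return x - t
    if x < -t:
        return x + t
    return 0.0

def prox_brute(x, lo=-5.0, hi=5.0, steps=200001):
    # argmin_y GAMMA*LAM*|y| + 0.5*(x - y)^2 over a fine grid
    best_y, best_v = lo, float("inf")
    for i in range(steps):
        y = lo + (hi - lo) * i / (steps - 1)
        v = GAMMA * LAM * abs(y) + 0.5 * (x - y) ** 2
        if v < best_v:
            best_y, best_v = y, v
    return best_y

for x in (-2.0, -0.05, 0.03, 1.7):
    assert abs(prox_abs(x) - prox_brute(x)) < 1e-4
print("soft-thresholding matches the brute-force prox")
```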
![Page 107: Analysis of Optimization Algorithms via Sum-of-Squares](https://reader031.vdocuments.net/reader031/viewer/2022012100/6169e83111a7b741a34caeef/html5/thumbnails/107.jpg)
Algorithm

Proximal gradient method (PGM) with constant step size γ

xk+1 = proxγb(xk − γ∇a(xk)),

where 0 < γ ≤ 2/L

PGM with exact line search

γk = argmin_{γ>0} f(proxγb(xk − γ∇a(xk)))
xk+1 = proxγb(xk − γk∇a(xk))
Vincent Y. F. Tan 29
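A minimal constant-step PGM run on a 1-D composite instance, with a(x) = (L/2)(x − 1)² and b(x) = λ|x| (all constants illustrative); the objective decreases monotonically and the iterates reach the minimizer x∗ = 1 − λ/L:

```python
MU = 1.0                          # a is MU-strongly convex for any MU <= L
L, LAM, GAMMA = 10.0, 0.5, 0.08   # GAMMA <= 1/L ensures monotone descent

def f(x):
    # composite objective: a(x) = (L/2)(x-1)^2, b(x) = LAM*|x|
    return 0.5 * L * (x - 1.0) ** 2 + LAM * abs(x)

def a_grad(x):
    return L * (x - 1.0)

def prox_b(x):
    t = GAMMA * LAM   # soft-thresholding = prox of GAMMA*LAM*|.|
    if x > t:
        return x - t
    if x < -t:
        return x + t
    return 0.0

x = -3.0
vals = [f(x)]
for _ in range(100):
    x = prox_b(x - GAMMA * a_grad(x))  # one PGM step
    vals.append(f(x))

assert all(v2 <= v1 + 1e-12 for v1, v2 in zip(vals, vals[1:]))
print(abs(x - (1 - LAM / L)) < 1e-6)  # True: converged to x* = 1 - LAM/L
```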
![Page 109: Analysis of Optimization Algorithms via Sum-of-Squares](https://reader031.vdocuments.net/reader031/viewer/2022012100/6169e83111a7b741a34caeef/html5/thumbnails/109.jpg)
Summary of Results

Form of bounds: Performance metric(xk+1) ≤ t∗ Performance metric(xk)

Let gk denote a (sub)gradient of f at xk, and ργ := max{|1 − γµ|, |1 − γL|}

| Method | Step size γ | Metric | Rate t∗ | Known? |
|---|---|---|---|---|
| PGM | γ ∈ (0, 2/L) | ‖gk‖² | ργ² | Taylor et al. (2018) |
| PGM | ELS | fk − f∗ | ((L − µ)/(L + µ))² | Taylor et al. (2018) |
Vincent Y. F. Tan 30
![Page 111: Analysis of Optimization Algorithms via Sum-of-Squares](https://reader031.vdocuments.net/reader031/viewer/2022012100/6169e83111a7b741a34caeef/html5/thumbnails/111.jpg)
Conclusion
![Page 112: Analysis of Optimization Algorithms via Sum-of-Squares](https://reader031.vdocuments.net/reader031/viewer/2022012100/6169e83111a7b741a34caeef/html5/thumbnails/112.jpg)
Conclusion

Introduced an SDP hierarchy for bounding convergence rates via SOS certificates
⇒ First level coincides with the PEP

Future work

• Other function classes
• Other algorithms
• Understanding when SOS certificates cannot be found

Links

arXiv paper: http://arxiv.org/abs/1906.04648
GitHub code: https://github.com/sandratsy/SumsOfSquares
Vincent Y. F. Tan 31
![Page 114: Analysis of Optimization Algorithms via Sum-of-Squares](https://reader031.vdocuments.net/reader031/viewer/2022012100/6169e83111a7b741a34caeef/html5/thumbnails/114.jpg)
Thank you!
Vincent Y. F. Tan 31
![Page 115: Analysis of Optimization Algorithms via Sum-of-Squares](https://reader031.vdocuments.net/reader031/viewer/2022012100/6169e83111a7b741a34caeef/html5/thumbnails/115.jpg)
References I

de Klerk, E., Glineur, F., and Taylor, A. B. (2017). On the worst-case complexity of the gradient method with exact line search for smooth strongly convex functions. Optimization Letters, 11(7):1185–1199.

Drori, Y. and Teboulle, M. (2014). Performance of first-order methods for smooth convex minimization: A novel approach. Mathematical Programming, 145(1):451–482.

Lasserre, J. (2007). A sum of squares approximation of nonnegative polynomials. SIAM Review, 49(4):651–669.

Lessard, L., Recht, B., and Packard, A. (2016). Analysis and design of optimization algorithms via integral quadratic constraints. SIAM Journal on Optimization, 26(1):57–95.
![Page 116: Analysis of Optimization Algorithms via Sum-of-Squares](https://reader031.vdocuments.net/reader031/viewer/2022012100/6169e83111a7b741a34caeef/html5/thumbnails/116.jpg)
References II
Luenberger, D. G. and Ye, Y. (2016). Linear and Nonlinear Programming. Springer International Publishing, 4th edition.

Nemirovski, A. (1999). Optimization II: Numerical methods for nonlinear continuous optimization. https://www2.isye.gatech.edu/~nemirovs/Lect_OptII.pdf.

Parrilo, P. A. (2003). Semidefinite programming relaxations for semialgebraic problems. Mathematical Programming, 96(2):293–320.

Taylor, A., Hendrickx, J., and Glineur, F. (2017a). Exact worst-case performance of first-order methods for composite convex optimization. SIAM Journal on Optimization, 27(3):1283–1313.

Taylor, A. B., Hendrickx, J. M., and Glineur, F. (2017b). Smooth strongly convex interpolation and exact worst-case performance of first-order methods. Mathematical Programming, 161(1):307–345.
![Page 117: Analysis of Optimization Algorithms via Sum-of-Squares](https://reader031.vdocuments.net/reader031/viewer/2022012100/6169e83111a7b741a34caeef/html5/thumbnails/117.jpg)
References III
Taylor, A. B., Hendrickx, J. M., and Glineur, F. (2018). Exact worst-case convergence rates of the proximal gradient method for composite convex minimization. Journal of Optimization Theory and Applications, 178(2):455–476.