Constrained optimization:
indirect methods
Jussi Hakanen
Post-doctoral researcher [email protected]
spring 2014 TIES483 Nonlinear optimization
On constrained optimization
We have seen how to characterize optimal solutions in constrained optimization
– KKT optimality conditions include the balance of forces (−∇f(x*), ∇g_i(x*) for i ∈ I, and ∇h_j(x*)) and the complementarity conditions (μ_i g_i(x*) = 0 ∀i)
– Regularity of x* needs to be assumed
Now, we are interested in how to find such solutions
Methods for constrained optimization
Many methods utilize knowledge about the constraints
– Linear inequalities or equalities
– Nonlinear inequalities or equalities
For example, if a linear constraint is active at some point, then taking steps along the direction of the constraint keeps it active
For nonlinear constraints, there is no such direction
Methods for constrained optimization can be characterized based on how they treat constraints
Classification of the methods
Indirect methods: the constrained problem is converted into a sequence of unconstrained problems whose solutions approach the solution of the constrained problem; the intermediate solutions need not be feasible
Direct methods: the constraints are taken into account explicitly; intermediate solutions are feasible
Transforming the optimization problem
Constraints of the problem can be transformed if needed
g_i(x) ≤ 0 ⟺ g_i(x) + y_i² = 0, where y_i is a slack variable; the constraint is active if y_i = 0
– By adding y_i², there is no need to require y_i ≥ 0
– If g_i(x) is linear, linearity is preserved by g_i(x) + y_i = 0, y_i ≥ 0
g_i(x) ≥ 0 ⟺ −g_i(x) ≤ 0
h_i(x) = 0 ⟺ h_i(x) ≤ 0 and −h_i(x) ≤ 0
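As a quick numerical check, the slack-variable transformation can be verified directly; a minimal sketch (the function names g and residual are illustrative, not from the lecture):

```python
# Sketch of the slack-variable transformation g(x) <= 0  <=>  g(x) + y^2 = 0.

def g(x):
    # example inequality constraint: g(x) = x - 3 <= 0
    return x - 3.0

def residual(x, y):
    # equality form with slack variable y
    return g(x) + y**2

# at the strictly feasible point x = 1, y = sqrt(-g(1)) satisfies the equality
y = (-g(1.0)) ** 0.5
print(abs(residual(1.0, y)) < 1e-12)  # True
# the constraint is active exactly when y = 0, here at x = 3
print(residual(3.0, 0.0) == 0.0)      # True
```

Squaring the slack removes the need for the extra bound y ≥ 0, at the cost of making the equality nonlinear in y.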
Examples of indirect methods
Penalty function methods
Lagrangian methods
Penalty function methods
Include the constraints in the objective function with the help of penalty functions that penalize constraint violations, or even penalize approaching the boundary of S
Different types
– Penalty function: penalizes constraint violations
– Barrier function: prevents leaving the feasible region
– Exact penalty function
The resulting unconstrained problems can be solved by using the methods presented earlier in the course
Penalty function methods
Generate a sequence of points that approach the feasible region from outside
The constrained problem is converted into
min_{x∈R^n} f(x) + r α(x),
where α(x) is a penalty function and r is a penalty parameter
Requirements: α(x) ≥ 0 ∀x ∈ R^n and α(x) = 0 if and only if x ∈ S
On convergence
When r → ∞, the solutions x_r of the penalty function problems converge to a constrained minimizer (x_r → x* and r α(x_r) → 0)
– All the functions should be continuous
– For each r, a solution of the penalty function problem should exist, and {x_r} should belong to a compact subset of R^n
Examples of penalty functions
Can you give an example of a penalty function α(x)?
For equality constraints
– h_i(x) = 0 ⟶ α(x) = Σ_{i=1}^l h_i(x)² or α(x) = Σ_{i=1}^l |h_i(x)|^p, p ≥ 2
For inequality constraints
– g_i(x) ≤ 0 ⟶ α(x) = Σ_{i=1}^m max[0, g_i(x)] or α(x) = Σ_{i=1}^m max[0, g_i(x)]^p, p ≥ 2
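These penalty functions are straightforward to implement; a minimal sketch in Python (the names alpha_eq and alpha_ineq are illustrative), assuming the constraint values at a point are given as NumPy arrays:

```python
import numpy as np

def alpha_eq(h_vals, p=2):
    # penalty for equality constraints h_i(x) = 0: sum_i |h_i(x)|^p, p >= 2
    return float(np.sum(np.abs(h_vals) ** p))

def alpha_ineq(g_vals, p=2):
    # penalty for inequality constraints g_i(x) <= 0: sum_i max(0, g_i(x))^p
    return float(np.sum(np.maximum(0.0, g_vals) ** p))

# alpha = 0 exactly on the feasible set, positive otherwise
print(alpha_ineq(np.array([-1.0, -0.5])))  # 0.0: feasible point, no penalty
print(alpha_ineq(np.array([0.5, -1.0])))   # 0.25: violation of 0.5, squared
```

Note that both functions vanish exactly when all constraints are satisfied, as the requirements on α demand.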
How to choose r?
Should be large enough for the solutions to be close enough to the feasible region
If r is too large, there can be numerical problems in solving the penalty problems
For large values of r, the emphasis is on finding feasible solutions and, thus, the solution can be feasible but far from the optimum
Typically r is updated iteratively
Different parameters can be used for different constraints (e.g. g_i ⟶ r_i, g_j ⟶ r_j)
– For the sake of simplicity, the same parameter is used here for all the constraints
Algorithm
1) Choose the final tolerance ϵ > 0 and a starting point x¹. Choose r¹ > 0 (not too large) and set h = 1.
2) Solve min_{x∈R^n} f(x) + r^h α(x) with some method for unconstrained problems (x^h as a starting point). Let the solution be x^{h+1} = x(r^h).
3) Test optimality: If r^h α(x^{h+1}) < ϵ, stop. The solution x^{h+1} is close enough to the optimum. Otherwise, set r^{h+1} > r^h (e.g. r^{h+1} = κ r^h, where κ can be e.g. 10). Set h = h + 1 and go to 2).
Example
min x s.t. −x + 2 ≤ 0
Let α(x) = (max[0, −x + 2])²
Then α(x) = 0 if x ≥ 2, and α(x) = (−x + 2)² if x < 2
Minimum of f + rα is at 2 − 1/(2r)
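Steps 1)–3) of the algorithm can be sketched on this example (assuming SciPy is available; minimize_scalar stands in for "some method for unconstrained problems", and the warm start from x^h is omitted for brevity):

```python
from scipy.optimize import minimize_scalar

# min x  s.t. -x + 2 <= 0, penalized with alpha(x) = max(0, -x + 2)^2
def penalized(x, r):
    return x + r * max(0.0, -x + 2.0) ** 2

r, x, eps, kappa = 1.0, 0.0, 1e-6, 10.0
for _ in range(30):
    x = minimize_scalar(lambda t: penalized(t, r)).x  # analytic minimizer: 2 - 1/(2r)
    if r * max(0.0, -x + 2.0) ** 2 < eps:             # step 3: optimality test
        break
    r *= kappa                                        # increase the penalty parameter

print(round(x, 3))  # approaches the constrained minimizer x* = 2 from outside
```

The iterates 2 − 1/(2r) are infeasible for every finite r, which illustrates why penalty methods are classified as indirect.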
From Miettinen: Nonlinear optimization, 2007 (in Finnish)
Barrier function method
Prevents leaving the feasible region
Suitable only for problems with inequality constraints
– The set {x | g_i(x) < 0 ∀i} should not be empty
The problem to be solved is
min_r Θ(r) s.t. r ≥ 0,
where Θ(r) = inf_x {f(x) + r β(x) | g_i(x) < 0 ∀i}
β is a barrier function: β(x) ≥ 0 when g_i(x) < 0 ∀i, and β(x) → ∞ when x approaches the boundary of S
The constraints g_i(x) < 0 can be omitted since β → ∞ on the boundary of S
On convergence
Denote Θ(r) = f(x_r) + r β(x_r)
Under some assumptions, the solutions x_r of the barrier problems converge to a constrained minimizer (x_r → x* and r β(x_r) → 0) when r → 0⁺
– All functions should be continuous
– {x | g_i(x) < 0 ∀i} ≠ ∅
Properties of barrier functions
Nonnegative and continuous in {x | g_i(x) < 0 ∀i}
Approaches ∞ when the boundary of the feasible region is approached from inside
Ideally: β = 0 in {x | g_i(x) < 0 ∀i} and β = ∞ on the boundary
– Guarantees staying in the feasible region
– This kind of discontinuity causes problems for any numerical method
Examples of barrier functions
– β(x) = −Σ_{i=1}^m 1/g_i(x)
– β(x) = −Σ_{i=1}^m ln(min[1, −g_i(x)])
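A minimal sketch of these barrier functions in Python (illustrative names; the inputs are the constraint values g_i(x) at a strictly feasible point):

```python
import numpy as np

def beta_inverse(g_vals):
    # beta(x) = -sum_i 1/g_i(x), defined for g_i(x) < 0
    return float(-np.sum(1.0 / g_vals))

def beta_log(g_vals):
    # beta(x) = -sum_i ln(min[1, -g_i(x)])
    return float(-np.sum(np.log(np.minimum(1.0, -g_vals))))

g = np.array([-0.5, -2.0])   # constraint values at a strictly feasible point
print(beta_inverse(g))       # 2.5; blows up as some g_i -> 0-
print(beta_log(g))           # 0.693..., i.e. -ln(0.5); the -2.0 term contributes 0
```

Both blow up as any g_i approaches 0 from below, which is exactly what keeps the iterates in the interior of the feasible region.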
Algorithm
1) Choose the final tolerance ϵ > 0 and a starting point x¹ s.t. g_i(x¹) < 0 ∀i. Choose r¹ > 0, not too small (and a parameter 0 < τ < 1 for reducing r). Set h = 1.
2) Solve min_x f(x) + r^h β(x) s.t. g_i(x) < 0 ∀i by using the starting point x^h. Let the solution be x^{h+1}.
3) Test optimality: If r^h β(x^{h+1}) < ϵ, stop. The solution x^{h+1} is close enough to the optimum. Otherwise, set r^{h+1} < r^h (e.g. r^{h+1} = τ r^h). Set h = h + 1 and go to 2).
Example
From Miettinen: Nonlinear optimization, 2007 (in Finnish)
min x s.t. −x + 1 ≤ 0
Let β(x) = −1/(−x + 1) when x ≠ 1
Minimum of f(x) + rβ(x) = x + r(x − 1)⁻¹ is at 1 + √r
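The minimizer 1 + √r (set the derivative 1 − r(x − 1)⁻² to zero) can be checked numerically; a sketch assuming SciPy, where the bracket endpoints are ad hoc choices inside the strict interior x > 1:

```python
from math import sqrt
from scipy.optimize import minimize_scalar

# min x  s.t. -x + 1 <= 0, with beta(x) = -1/(-x + 1) = 1/(x - 1)
def barrier_obj(x, r):
    return x + r / (x - 1.0) if x > 1.0 else 1e12  # large value outside the interior

for r in [1.0, 1e-2, 1e-4]:
    x = minimize_scalar(lambda t: barrier_obj(t, r),
                        bracket=(1.0 + 1e-9, 2.0)).x
    print(r, round(x, 4), round(1.0 + sqrt(r), 4))  # numeric vs analytic minimizer
```

As r → 0⁺, the minimizers 1 + √r approach the constrained minimizer x* = 1 from inside the feasible region, the mirror image of the penalty method's approach from outside.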
Summary: penalty and barrier function methods
Penalty and barrier functions are usually differentiable
The minimum is obtained in a limit
– Penalty function: r^h → ∞
– Barrier function: r^h → 0
Choosing the sequence r^h is essential for convergence
– If r^h → ∞ or r^h → 0 too slowly, a large number of unconstrained problems need to be solved
– If r^h → ∞ or r^h → 0 too fast, the solutions of successive unconstrained problems are far from each other and the solution time increases
Exact penalty function
The idea is to have a method where the solution can be found with a small number of iterations
Suitable for both equality and inequality constraints
An exact penalty function problem is e.g. of the form
min_{x∈R^n} f(x) + r(Σ_{i=1}^m max[0, g_i(x)] + Σ_{i=1}^l |h_i(x)|)
Exact penalty function method
Theorem: Consider a point x̄ where the necessary KKT conditions hold. Let the corresponding Lagrange multipliers be μ̄ and ν̄. Assume that the objective and inequality constraint functions are convex and the equality constraint functions are affine. Then x̄ is a solution of the exact penalty function problem with r ≥ max[μ̄_i, i = 1, …, m, |ν̄_i|, i = 1, …, l]
The solution can be obtained with a finite value of the penalty parameter r
The algorithm is similar to the penalty function method except that r^h is increased only if necessary
– E.g. when the feasible region is not approached fast enough
Properties of exact penalty function
Not differentiable at points x where g_i(x) = 0 or h_i(x) = 0
– Gradient-based methods are not suitable
If r and the starting point could be chosen appropriately, only one minimization would in principle be required
– If r is too large and the starting point is not close enough to the optimum, minimizing the exact penalty function problem can become difficult
Example
min f(x) = x₁² + x₂² s.t. x₁ + x₂ − 1 = 0
Optimal solution is x* = (1/2, 1/2)^T, ν* = −2x₁* = −2x₂* = −1
Exact penalty function problem: min_{x∈R^n} x₁² + x₂² + r|x₁ + x₂ − 1|
Solution: x* = (r/2, r/2)^T when 0 ≤ r < 1 and x* = (1/2, 1/2)^T when r ≥ 1
– (obtained by using the KKT conditions of an equivalent differentiable problem where the absolute value term is replaced with a new variable and two inequality constraints)
Thus, the solution can be found with r ≥ 1 (= |ν*|)
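This behavior can be checked numerically; a sketch assuming SciPy, where Nelder-Mead is used because the |·| term makes gradient-based methods unsuitable:

```python
import numpy as np
from scipy.optimize import minimize

# exact penalty problem for: min x1^2 + x2^2  s.t.  x1 + x2 - 1 = 0
def exact_penalty(x, r):
    return x[0]**2 + x[1]**2 + r * abs(x[0] + x[1] - 1.0)

for r in [0.5, 2.0]:
    res = minimize(exact_penalty, x0=[0.0, 0.0], args=(r,), method='Nelder-Mead',
                   options={'xatol': 1e-10, 'fatol': 1e-10})
    print(r, np.round(res.x, 3))
# r = 0.5 < 1 gives (0.25, 0.25) = (r/2, r/2), still infeasible;
# r = 2 >= 1 = |nu*| recovers the exact solution x* = (0.5, 0.5)
```

Unlike the quadratic penalty, the constrained minimizer is reached exactly at a finite r, not only in the limit r → ∞.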
Example: barrier function
min f(x) = x₁x₂² s.t. x₁² + x₂² − 2 ≤ 0
x* = (−0.8165, −1.1547)^T, the constraint is active at x*
(a) level curves of f(x) and the boundary of S
Logarithmic barrier function: (b) r = 0.2, (c) r = 0.001
From Miettinen: Nonlinear optimization, 2007 (in Finnish)
Example: penalty function
min f(x) = x₁x₂² s.t. x₁² + x₂² − 2 ≤ 0
x* = (−0.8165, −1.1547)^T, the constraint is active at x*
Quadratic penalty function: (a) r = 1, (b) r = 100
From Miettinen: Nonlinear optimization, 2007 (in Finnish)
Example: exact penalty function
min f(x) = x₁x₂² s.t. x₁² + x₂² − 2 ≤ 0
x* = (−0.8165, −1.1547)^T, the constraint is active at x*, μ* = 0.8165
Exact penalty function: (a) r = 1.2, (b) r = 5, (c) r = 100
From Miettinen: Nonlinear optimization, 2007 (in Finnish)
Lagrangian function
Consider the problem
min f(x) s.t. h_i(x) = 0, i = 1, …, l
Lagrangian function
L(x, ν) = f(x) + Σ_{i=1}^l ν_i h_i(x)
KKT conditions
∇f(x) + Σ_{i=1}^l ν_i ∇h_i(x) = 0
h_i(x) = 0, i = 1, …, l
Let x* be a minimizer and ν* the corresponding Lagrange multiplier
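For the earlier equality-constrained example min x₁² + x₂² s.t. x₁ + x₂ − 1 = 0, these KKT conditions can be verified directly:

```python
import numpy as np

# KKT check for: min f(x) = x1^2 + x2^2  s.t.  h(x) = x1 + x2 - 1 = 0
x_star, nu_star = np.array([0.5, 0.5]), -1.0

grad_f = 2.0 * x_star          # gradient of f at x*
grad_h = np.array([1.0, 1.0])  # gradient of h (constant, since h is affine)

# stationarity: grad f(x*) + nu* * grad h(x*) = 0, and feasibility: h(x*) = 0
print(grad_f + nu_star * grad_h)     # [0. 0.]
print(x_star[0] + x_star[1] - 1.0)   # 0.0
```

The stationarity equation is exactly ∇ₓL(x*, ν*) = 0, i.e. x* is a critical point of the Lagrangian.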
Properties of Lagrangian
KKT conditions: x* is a critical point of the Lagrangian function
– x* is not necessarily a minimizer of L(x, ν*)
Thus, minimizing the Lagrangian function does not necessarily give a minimum of f(x)
– The Hessian ∇²ₓₓL(x*, ν*) may be indefinite → a saddle point
Improve the Lagrangian function!
Augmented Lagrangian function
Augmented Lagrangian function:
L_A(x, ν, ϱ) = f(x) + Σ_{i=1}^l ν_i h_i(x) + (ϱ/2) Σ_{i=1}^l h_i(x)², ϱ > 0
– Lagrangian function + quadratic penalty function
A point (x*, ν*) is a critical point of the augmented Lagrangian
– ∇ₓL_A(x*, ν*, ϱ) = 0 and (ϱ/2) Σ_{i=1}^l h_i(x*)² = 0
Hessian: ∇²ₓₓL_A(x*, ν*, ϱ) = ∇²ₓₓL(x*, ν*) + ϱ ∇h(x*)^T ∇h(x*)
It can be shown that for ϱ > ϱ̄, ∇²ₓₓL_A(x*, ν*, ϱ) is positive definite → x* is a local minimizer of L_A(x, ν*, ϱ)
Need to know ν*
Properties of L_A(x, ν, ϱ)
Differentiable if the original functions are
x* is a minimizer of L_A(x, ν*, ϱ) for finite ϱ
Lagrangian function + quadratic penalty function
Algorithm
1) Choose the final tolerance ϵ > 0. Choose x¹, ν_i¹ (i = 1, …, l) and ϱ. Set h = 1.
2) Test optimality: if the optimality conditions are satisfied, stop. The solution is x^h.
3) Solve (with a suitable method) min_{x∈R^n} L_A(x, ν^h, ϱ) by using x^h as a starting point. Let the solution be x^{h+1}.
4) Update the Lagrange multipliers: e.g. ν^{h+1} = ν^h + ϱ h(x^{h+1}).
5) Increase ϱ if necessary: e.g. if ‖h(x^h)‖ − ‖h(x^{h+1})‖ < ϵ.
6) Set h = h + 1 and go to 2).
Note: x^h → x* only if ν^h → ν*
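Steps 1)–6) can be sketched on the earlier equality-constrained example min x₁² + x₂² s.t. x₁ + x₂ − 1 = 0 (assuming SciPy; the optimality test is simplified to checking |h(x)|, and ϱ is simply doubled each round):

```python
import numpy as np
from scipy.optimize import minimize

def h(x):
    # equality constraint h(x) = x1 + x2 - 1 = 0
    return x[0] + x[1] - 1.0

def L_A(x, nu, rho):
    # augmented Lagrangian: f + nu*h + (rho/2)*h^2
    return x[0]**2 + x[1]**2 + nu * h(x) + 0.5 * rho * h(x)**2

x, nu, rho = np.array([0.0, 0.0]), 0.0, 1.0
for _ in range(40):
    x = minimize(lambda z: L_A(z, nu, rho), x).x  # step 3: inner minimization, warm-started
    nu = nu + rho * h(x)                          # step 4: multiplier update
    if abs(h(x)) < 1e-8:                          # simplified optimality test
        break
    rho *= 2.0                                    # step 5 (simplified): increase rho

print(np.round(x, 4), round(nu, 4))  # converges to x* = (0.5, 0.5), nu* = -1
```

Note that ϱ stays finite here: the multiplier update does the convergence work, in contrast to the pure penalty method where r must grow without bound.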
Example
min f(x) = x₁x₂² s.t. x₁² + x₂² − 2 ≤ 0
x* = (−0.8165, −1.1547)^T, the constraint is active at x*
Lagrangian function: saddle point at x*
From Miettinen: Nonlinear optimization, 2007 (in Finnish)
Example (cont.)
Augmented Lagrangian function
(a) ϱ = 0.075, (b) ϱ = 0.2, (c) ϱ = 100
From Miettinen: Nonlinear optimization, 2007 (in Finnish)
Example (cont.)
Augmented Lagrangian function
ν* = 0.8165, ϱ = 0.2
(a) ν = 0.5, (b) ν = 0.9, (c) ν = 1.0
From Miettinen: Nonlinear optimization, 2007 (in Finnish)
Topic of the lectures next week
Mon, Feb 10th: Constrained optimization: gradient projection, active set method
Wed, Feb 12th: Constrained optimization: SQP method & Matlab
Study this before the lecture!
Questions to be considered
– What is the basic idea of gradient projection?
– What is the basic idea of active set methods?
– What is the basic idea of Sequential Quadratic Programming (SQP)?