Constrained optimization:
indirect methods
Jussi Hakanen
Post-doctoral researcher [email protected]
spring 2014 TIES483 Nonlinear optimization
On constrained optimization
We have seen how to characterize optimal solutions in constrained optimization
– KKT optimality conditions include the balance of forces (−∇f(x*), ∇g_i(x*) for i ∈ I, and ∇h_j(x*)) and the complementarity conditions (μ_i g_i(x*) = 0 ∀i)
– Regularity of x* needs to be assumed
Now, we are interested in how to find such solutions
Methods for constrained optimization
Many methods utilize knowledge about the constraints
– Linear inequalities or equalities
– Nonlinear inequalities or equalities
For example, if a linear constraint is active at some point, then taking steps along the direction of the constraint keeps it active
For nonlinear constraints, there is no such direction
Methods for constrained optimization can be characterized based on how they treat constraints
Classification of the methods
Indirect methods: the constrained problem is converted into a sequence of unconstrained problems whose solutions approach the solution of the constrained problem; the intermediate solutions need not be feasible
Direct methods: the constraints are taken into account explicitly; intermediate solutions are feasible
Transforming the optimization problem
Constraints of the problem can be transformed if needed
g_i(x) ≤ 0 ⟺ g_i(x) + y_i² = 0, where y_i is a slack variable; the constraint is active if y_i = 0
– By adding y_i², there is no need to require y_i ≥ 0
– If g_i(x) is linear, linearity is preserved by g_i(x) + y_i = 0, y_i ≥ 0
g_i(x) ≥ 0 ⟺ −g_i(x) ≤ 0
h_i(x) = 0 ⟺ h_i(x) ≤ 0 and −h_i(x) ≤ 0
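As a quick numerical check, the slack-variable transformation can be verified directly; a minimal sketch (the function names g and residual are illustrative, not from the lecture):

```python
# Sketch of the slack-variable transformation g(x) <= 0  <=>  g(x) + y^2 = 0.

def g(x):
    # example inequality constraint: g(x) = x - 3 <= 0
    return x - 3.0

def residual(x, y):
    # equality form with slack variable y
    return g(x) + y**2

# at the strictly feasible point x = 1, y = sqrt(-g(1)) satisfies the equality
y = (-g(1.0)) ** 0.5
print(abs(residual(1.0, y)) < 1e-12)  # True
# the constraint is active exactly when y = 0, here at x = 3
print(residual(3.0, 0.0) == 0.0)      # True
```

Squaring the slack removes the need for the extra bound y ≥ 0, at the cost of making the equality nonlinear in y.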
Examples of indirect methods
Penalty function methods
Lagrangian methods
Penalty function methods
Include the constraints in the objective function with the help of penalty functions that penalize constraint violations, or even penalize approaching the boundary of S
Different types
– Penalty function: penalizes constraint violations
– Barrier function: prevents leaving the feasible region
– Exact penalty function
The resulting unconstrained problems can be solved by using the methods presented earlier in the course
Penalty function methods
Generate a sequence of points that approach the feasible region from outside
The constrained problem is converted into
min_{x∈R^n} f(x) + r α(x),
where α(x) is a penalty function and r is a penalty parameter
Requirements: α(x) ≥ 0 ∀x ∈ R^n and α(x) = 0 if and only if x ∈ S
On convergence
When r → ∞, the solutions x_r of the penalty function problems converge to a constrained minimizer (x_r → x* and r α(x_r) → 0)
– All the functions should be continuous
– For each r, a solution of the penalty function problem should exist, and {x_r} should belong to a compact subset of R^n
Examples of penalty functions
Can you give an example of a penalty function α(x)?
For equality constraints
– h_i(x) = 0 ⟶ α(x) = Σ_{i=1}^l h_i(x)² or α(x) = Σ_{i=1}^l |h_i(x)|^p, p ≥ 2
For inequality constraints
– g_i(x) ≤ 0 ⟶ α(x) = Σ_{i=1}^m max[0, g_i(x)] or α(x) = Σ_{i=1}^m max[0, g_i(x)]^p, p ≥ 2
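These penalty functions are straightforward to implement; a minimal sketch in Python (the names alpha_eq and alpha_ineq are illustrative), assuming the constraint values at a point are given as NumPy arrays:

```python
import numpy as np

def alpha_eq(h_vals, p=2):
    # penalty for equality constraints h_i(x) = 0: sum_i |h_i(x)|^p, p >= 2
    return float(np.sum(np.abs(h_vals) ** p))

def alpha_ineq(g_vals, p=2):
    # penalty for inequality constraints g_i(x) <= 0: sum_i max(0, g_i(x))^p
    return float(np.sum(np.maximum(0.0, g_vals) ** p))

# alpha = 0 exactly on the feasible set, positive otherwise
print(alpha_ineq(np.array([-1.0, -0.5])))  # 0.0: feasible point, no penalty
print(alpha_ineq(np.array([0.5, -1.0])))   # 0.25: violation of 0.5, squared
```

Note that both functions vanish exactly when all constraints are satisfied, as the requirements on α demand.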
How to choose r?
Should be large enough for the solutions to be close enough to the feasible region
If r is too large, there can be numerical problems in solving the penalty problems
For large values of r, the emphasis is on finding feasible solutions and, thus, the solution can be feasible but far from the optimum
Typically r is updated iteratively
Different parameters can be used for different constraints (e.g. g_i ⟶ r_i, g_j ⟶ r_j)
– For the sake of simplicity, the same parameter is used here for all the constraints
Algorithm
1) Choose the final tolerance ϵ > 0 and a starting point x¹. Choose r¹ > 0 (not too large) and set h = 1.
2) Solve min_{x∈R^n} f(x) + r^h α(x) with some method for unconstrained problems (x^h as a starting point). Let the solution be x^{h+1} = x(r^h).
3) Test optimality: If r^h α(x^{h+1}) < ϵ, stop. The solution x^{h+1} is close enough to the optimum. Otherwise, set r^{h+1} > r^h (e.g. r^{h+1} = κ r^h, where κ can be e.g. 10). Set h = h + 1 and go to 2).
Example
min x s.t. −x + 2 ≤ 0
Let α(x) = (max[0, −x + 2])²
Then α(x) = 0 if x ≥ 2, and α(x) = (−x + 2)² if x < 2
Minimum of f + rα is at 2 − 1/(2r)
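Steps 1)–3) of the algorithm can be sketched on this example (assuming SciPy is available; minimize_scalar stands in for "some method for unconstrained problems", and the warm start from x^h is omitted for brevity):

```python
from scipy.optimize import minimize_scalar

# min x  s.t. -x + 2 <= 0, penalized with alpha(x) = max(0, -x + 2)^2
def penalized(x, r):
    return x + r * max(0.0, -x + 2.0) ** 2

r, x, eps, kappa = 1.0, 0.0, 1e-6, 10.0
for _ in range(30):
    x = minimize_scalar(lambda t: penalized(t, r)).x  # analytic minimizer: 2 - 1/(2r)
    if r * max(0.0, -x + 2.0) ** 2 < eps:             # step 3: optimality test
        break
    r *= kappa                                        # increase the penalty parameter

print(round(x, 3))  # approaches the constrained minimizer x* = 2 from outside
```

The iterates 2 − 1/(2r) are infeasible for every finite r, which illustrates why penalty methods are classified as indirect.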
From Miettinen: Nonlinear optimization, 2007 (in Finnish)
Barrier function method
Prevents leaving the feasible region
Suitable only for problems with inequality constraints
– The set {x | g_i(x) < 0 ∀i} should not be empty
The problem to be solved is
min_r Θ(r) s.t. r ≥ 0,
where Θ(r) = inf_x {f(x) + r β(x) | g_i(x) < 0 ∀i}
β is a barrier function: β(x) ≥ 0 when g_i(x) < 0 ∀i, and β(x) → ∞ when x approaches the boundary of S
The constraints g_i(x) < 0 can be omitted since β → ∞ on the boundary of S
On convergence
Denote Θ(r) = f(x_r) + r β(x_r)
Under some assumptions, the solutions x_r of the barrier problems converge to a constrained minimizer (x_r → x* and r β(x_r) → 0) when r → 0⁺
– All functions should be continuous
– {x | g_i(x) < 0 ∀i} ≠ ∅
Properties of barrier functions
Nonnegative and continuous in {x | g_i(x) < 0 ∀i}
Approaches ∞ when the boundary of the feasible region is approached from inside
Ideally: β = 0 in {x | g_i(x) < 0 ∀i} and β = ∞ on the boundary
– Guarantees staying in the feasible region
– This kind of discontinuity causes problems for any numerical method
Examples of barrier functions
– β(x) = −Σ_{i=1}^m 1/g_i(x)
– β(x) = −Σ_{i=1}^m ln(min[1, −g_i(x)])
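A minimal sketch of these barrier functions in Python (illustrative names; the inputs are the constraint values g_i(x) at a strictly feasible point):

```python
import numpy as np

def beta_inverse(g_vals):
    # beta(x) = -sum_i 1/g_i(x), defined for g_i(x) < 0
    return float(-np.sum(1.0 / g_vals))

def beta_log(g_vals):
    # beta(x) = -sum_i ln(min[1, -g_i(x)])
    return float(-np.sum(np.log(np.minimum(1.0, -g_vals))))

g = np.array([-0.5, -2.0])   # constraint values at a strictly feasible point
print(beta_inverse(g))       # 2.5; blows up as some g_i -> 0-
print(beta_log(g))           # 0.693..., i.e. -ln(0.5); the -2.0 term contributes 0
```

Both blow up as any g_i approaches 0 from below, which is exactly what keeps the iterates in the interior of the feasible region.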
Algorithm
1) Choose the final tolerance ϵ > 0 and a starting point x¹ s.t. g_i(x¹) < 0 ∀i. Choose r¹ > 0, not too small (and a parameter 0 < τ < 1 for reducing r). Set h = 1.
2) Solve min_x f(x) + r^h β(x) s.t. g_i(x) < 0 ∀i by using the starting point x^h. Let the solution be x^{h+1}.
3) Test optimality: If r^h β(x^{h+1}) < ϵ, stop. The solution x^{h+1} is close enough to the optimum. Otherwise, set r^{h+1} < r^h (e.g. r^{h+1} = τ r^h). Set h = h + 1 and go to 2).
Example
From Miettinen: Nonlinear optimization, 2007 (in Finnish)
min x s.t. −x + 1 ≤ 0
Let β(x) = −1/(−x + 1) when x ≠ 1
Minimum of f(x) + rβ(x) = x + r(x − 1)⁻¹ is at 1 + √r
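The minimizer 1 + √r (set the derivative 1 − r(x − 1)⁻² to zero) can be checked numerically; a sketch assuming SciPy, where the bracket endpoints are ad hoc choices inside the strict interior x > 1:

```python
from math import sqrt
from scipy.optimize import minimize_scalar

# min x  s.t. -x + 1 <= 0, with beta(x) = -1/(-x + 1) = 1/(x - 1)
def barrier_obj(x, r):
    return x + r / (x - 1.0) if x > 1.0 else 1e12  # large value outside the interior

for r in [1.0, 1e-2, 1e-4]:
    x = minimize_scalar(lambda t: barrier_obj(t, r),
                        bracket=(1.0 + 1e-9, 2.0)).x
    print(r, round(x, 4), round(1.0 + sqrt(r), 4))  # numeric vs analytic minimizer
```

As r → 0⁺, the minimizers 1 + √r approach the constrained minimizer x* = 1 from inside the feasible region, the mirror image of the penalty method's approach from outside.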
Summary: penalty and barrier function methods
Penalty and barrier functions are usually differentiable
The minimum is obtained in a limit
– Penalty function: r^h → ∞
– Barrier function: r^h → 0
Choosing the sequence r^h is essential for convergence
– If r^h → ∞ or r^h → 0 too slowly, a large number of unconstrained problems need to be solved
– If r^h → ∞ or r^h → 0 too fast, the solutions of successive unconstrained problems are far from each other and the solution time increases
Exact penalty function
The idea is to have a method where the solution can be found with a small number of iterations
Suitable for both equality and inequality constraints
An exact penalty function problem is e.g. of the form
min_{x∈R^n} f(x) + r(Σ_{i=1}^m max[0, g_i(x)] + Σ_{i=1}^l |h_i(x)|)
Exact penalty function method
Theorem: Consider a point x̄ where the necessary KKT conditions hold. Let the corresponding Lagrange multipliers be μ̄ and ν̄. Assume that the objective and inequality constraint functions are convex and the equality constraint functions are affine. Then x̄ is a solution of the exact penalty function problem with r ≥ max[μ̄_i, i = 1, …, m, |ν̄_i|, i = 1, …, l]
The solution can be obtained with a finite value of the penalty parameter r
The algorithm is similar to the penalty function method except that r^h is increased only if necessary
– E.g. when the feasible region is not approached fast enough
Properties of exact penalty function
Not differentiable at points x where g_i(x) = 0 or h_i(x) = 0
– Gradient-based methods are not suitable
If r and the starting point could be chosen appropriately, only one minimization would in principle be required
– If r is too large and the starting point is not close enough to the optimum, minimizing the exact penalty function problem can become difficult
Example
min f(x) = x₁² + x₂² s.t. x₁ + x₂ − 1 = 0
Optimal solution is x* = (1/2, 1/2)^T, ν* = −2x₁* = −2x₂* = −1
Exact penalty function problem: min_{x∈R^n} x₁² + x₂² + r|x₁ + x₂ − 1|
Solution: x* = (r/2, r/2)^T when 0 ≤ r < 1 and x* = (1/2, 1/2)^T when r ≥ 1
– (obtained by using the KKT conditions of an equivalent differentiable problem where the absolute value term is replaced with a new variable and two inequality constraints)
Thus, the solution can be found with r ≥ 1 (= |ν*|)
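This behavior can be checked numerically; a sketch assuming SciPy, where Nelder-Mead is used because the |·| term makes gradient-based methods unsuitable:

```python
import numpy as np
from scipy.optimize import minimize

# exact penalty problem for: min x1^2 + x2^2  s.t.  x1 + x2 - 1 = 0
def exact_penalty(x, r):
    return x[0]**2 + x[1]**2 + r * abs(x[0] + x[1] - 1.0)

for r in [0.5, 2.0]:
    res = minimize(exact_penalty, x0=[0.0, 0.0], args=(r,), method='Nelder-Mead',
                   options={'xatol': 1e-10, 'fatol': 1e-10})
    print(r, np.round(res.x, 3))
# r = 0.5 < 1 gives (0.25, 0.25) = (r/2, r/2), still infeasible;
# r = 2 >= 1 = |nu*| recovers the exact solution x* = (0.5, 0.5)
```

Unlike the quadratic penalty, the constrained minimizer is reached exactly at a finite r, not only in the limit r → ∞.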
Example: barrier function
min f(x) = x₁x₂² s.t. x₁² + x₂² − 2 ≤ 0
x* = (−0.8165, −1.1547)^T, the constraint is active at x*
(a) level curves of f(x) and the boundary of S
Logarithmic barrier function: (b) r = 0.2, (c) r = 0.001
From Miettinen: Nonlinear optimization, 2007 (in Finnish)
Example: penalty function
min f(x) = x₁x₂² s.t. x₁² + x₂² − 2 ≤ 0
x* = (−0.8165, −1.1547)^T, the constraint is active at x*
Quadratic penalty function: (a) r = 1, (b) r = 100
From Miettinen: Nonlinear optimization, 2007 (in Finnish)
Example: exact penalty function
min f(x) = x₁x₂² s.t. x₁² + x₂² − 2 ≤ 0
x* = (−0.8165, −1.1547)^T, the constraint is active at x*, μ* = 0.8165
Exact penalty function: (a) r = 1.2, (b) r = 5, (c) r = 100
From Miettinen: Nonlinear optimization, 2007 (in Finnish)
Lagrangian function
Consider the problem
min f(x) s.t. h_i(x) = 0, i = 1, …, l
Lagrangian function
L(x, ν) = f(x) + Σ_{i=1}^l ν_i h_i(x)
KKT conditions
∇f(x) + Σ_{i=1}^l ν_i ∇h_i(x) = 0
h_i(x) = 0, i = 1, …, l
Let x* be a minimizer and ν* the corresponding Lagrange multiplier
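For the earlier equality-constrained example min x₁² + x₂² s.t. x₁ + x₂ − 1 = 0, these KKT conditions can be verified directly:

```python
import numpy as np

# KKT check for: min f(x) = x1^2 + x2^2  s.t.  h(x) = x1 + x2 - 1 = 0
x_star, nu_star = np.array([0.5, 0.5]), -1.0

grad_f = 2.0 * x_star          # gradient of f at x*
grad_h = np.array([1.0, 1.0])  # gradient of h (constant, since h is affine)

# stationarity: grad f(x*) + nu* * grad h(x*) = 0, and feasibility: h(x*) = 0
print(grad_f + nu_star * grad_h)     # [0. 0.]
print(x_star[0] + x_star[1] - 1.0)   # 0.0
```

The stationarity equation is exactly ∇ₓL(x*, ν*) = 0, i.e. x* is a critical point of the Lagrangian.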
Properties of Lagrangian
KKT conditions: x* is a critical point of the Lagrangian function
– x* is not necessarily a minimizer of L(x, ν*)
Thus, minimizing the Lagrangian function does not necessarily give a minimum of f(x)
– The Hessian ∇²ₓₓL(x*, ν*) may be indefinite → a saddle point
Improve the Lagrangian function!
Augmented Lagrangian function
Augmented Lagrangian function:
L_A(x, ν, ϱ) = f(x) + Σ_{i=1}^l ν_i h_i(x) + (ϱ/2) Σ_{i=1}^l h_i(x)², ϱ > 0
– Lagrangian function + quadratic penalty function
A point (x*, ν*) is a critical point of the augmented Lagrangian
– ∇ₓL_A(x*, ν*, ϱ) = 0 and (ϱ/2) Σ_{i=1}^l h_i(x*)² = 0
Hessian: ∇²ₓₓL_A(x*, ν*, ϱ) = ∇²ₓₓL(x*, ν*) + ϱ ∇h(x*)^T ∇h(x*)
It can be shown that for ϱ > ϱ̄, ∇²ₓₓL_A(x*, ν*, ϱ) is positive definite → x* is a local minimizer of L_A(x, ν*, ϱ)
Need to know ν*
Properties of L_A(x, ν, ϱ)
Differentiable if the original functions are
x* is a minimizer of L_A(x, ν*, ϱ) for finite ϱ
Lagrangian function + quadratic penalty function
Algorithm
1) Choose the final tolerance ϵ > 0. Choose x¹, ν_i¹ (i = 1, …, l) and ϱ. Set h = 1.
2) Test optimality: if the optimality conditions are satisfied, stop. The solution is x^h.
3) Solve (with a suitable method) min_{x∈R^n} L_A(x, ν^h, ϱ) by using x^h as a starting point. Let the solution be x^{h+1}.
4) Update the Lagrange multipliers: e.g. ν^{h+1} = ν^h + ϱ h(x^{h+1}).
5) Increase ϱ if necessary: e.g. if ‖h(x^h)‖ − ‖h(x^{h+1})‖ < ϵ.
6) Set h = h + 1 and go to 2).
Note: x^h → x* only if ν^h → ν*
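Steps 1)–6) can be sketched on the earlier equality-constrained example min x₁² + x₂² s.t. x₁ + x₂ − 1 = 0 (assuming SciPy; the optimality test is simplified to checking |h(x)|, and ϱ is simply doubled each round):

```python
import numpy as np
from scipy.optimize import minimize

def h(x):
    # equality constraint h(x) = x1 + x2 - 1 = 0
    return x[0] + x[1] - 1.0

def L_A(x, nu, rho):
    # augmented Lagrangian: f + nu*h + (rho/2)*h^2
    return x[0]**2 + x[1]**2 + nu * h(x) + 0.5 * rho * h(x)**2

x, nu, rho = np.array([0.0, 0.0]), 0.0, 1.0
for _ in range(40):
    x = minimize(lambda z: L_A(z, nu, rho), x).x  # step 3: inner minimization, warm-started
    nu = nu + rho * h(x)                          # step 4: multiplier update
    if abs(h(x)) < 1e-8:                          # simplified optimality test
        break
    rho *= 2.0                                    # step 5 (simplified): increase rho

print(np.round(x, 4), round(nu, 4))  # converges to x* = (0.5, 0.5), nu* = -1
```

Note that ϱ stays finite here: the multiplier update does the convergence work, in contrast to the pure penalty method where r must grow without bound.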
Example
min f(x) = x₁x₂² s.t. x₁² + x₂² − 2 ≤ 0
x* = (−0.8165, −1.1547)^T, the constraint is active at x*
Lagrangian function: saddle point at x*
From Miettinen: Nonlinear optimization, 2007 (in Finnish)
Example (cont.)
Augmented Lagrangian function
(a) ϱ = 0.075, (b) ϱ = 0.2, (c) ϱ = 100
From Miettinen: Nonlinear optimization, 2007 (in Finnish)
Example (cont.)
Augmented Lagrangian function
ν* = 0.8165, ϱ = 0.2
(a) ν = 0.5, (b) ν = 0.9, (c) ν = 1.0
From Miettinen: Nonlinear optimization, 2007 (in Finnish)
Topic of the lectures next week
Mon, Feb 10th: Constrained optimization: gradient projection, active set method
Wed, Feb 12th: Constrained optimization: SQP method & Matlab
Study this before the lecture!
Questions to be considered
– What is the basic idea of gradient projection?
– What is the basic idea of active set methods?
– What is the basic idea of Sequential Quadratic Programming (SQP)?