Introduction to Optimization
Basic concepts
Instructor: Wotao Yin, Department of Mathematics, UCLA
Spring 2015
based on Chong-Zak, 4th Ed.
Goals of this lecture
The general form of optimization:
minimize f (x)
subject to x ∈ Ω
We study the following topics:
• terminology
• types of minimizers
• optimality conditions
Unconstrained vs constrained optimization
minimize f (x)
subject to x ∈ Ω
Suppose x ∈ Rn. The set Ω ⊆ Rn is called the feasible set.
• if Ω = Rn , then the problem is called unconstrained.
• otherwise, the problem is called constrained.
In general, more sophisticated techniques are needed to solve constrained
problems.
Types of solutions
• x∗ is a local minimizer if there is ε > 0 such that f (x) ≥ f (x∗) for all
x ∈ Ω \ {x∗} with ‖x − x∗‖ < ε.
• x∗ is a global minimizer if f (x) ≥ f (x∗) for all x ∈ Ω \ {x∗}.
• If “≥” is replaced with “>”, then x∗ is a strict local minimizer or a strict
global minimizer, respectively.
x1: strict global minimizer; x2: strict local minimizer; x3: local minimizer
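As an illustration, local and global minimizers can be located numerically on a grid. The function below is a hypothetical one-dimensional example (not from the lecture) with two basins: a strict global minimizer on the left and a strict local minimizer on the right.

```python
import numpy as np

# A hypothetical 1D function with two basins of attraction
f = lambda x: x**4 - 3*x**2 + x

xs = np.linspace(-2.0, 2.0, 40001)
ys = f(xs)

x_global = xs[np.argmin(ys)]                # global minimizer over the grid
right = xs > 0
x_local = xs[right][np.argmin(ys[right])]   # minimizer of the right basin only

print(f"global minimizer ~ {x_global:.3f}, f = {f(x_global):.3f}")
print(f"local  minimizer ~ {x_local:.3f}, f = {f(x_local):.3f}")
```

Both points are local minimizers; only the left one attains the smallest objective value over the whole domain.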
Convexity and global minimizers
• A set Ω is convex if λx + (1− λ)y ∈ Ω for any x, y ∈ Ω and λ ∈ [0, 1].
• A function is convex if
f (λx + (1− λ)y) ≤ λf (x) + (1− λ)f (y)
for any x, y ∈ Ω and λ ∈ [0, 1].
A function is convex if and only if its epigraph (the region on or above its graph) is a convex set.
• An optimization problem is convex if both the objective function and
feasible set are convex.
• Theorem: Any local minimizer of a convex optimization problem is a
global minimizer.
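The defining inequality can be checked numerically on random samples. The sketch below (our own illustration, using the convex function f(x) = ‖x‖²) verifies that the convexity inequality holds for many random pairs and weights.

```python
import numpy as np

rng = np.random.default_rng(0)
f = lambda x: np.sum(x**2)   # a convex function: sum of squares

# Convexity: f(lam*x + (1-lam)*y) <= lam*f(x) + (1-lam)*f(y)
# must hold for every pair x, y and every lam in [0, 1].
ok = True
for _ in range(1000):
    x, y = rng.normal(size=3), rng.normal(size=3)
    lam = rng.uniform()
    ok &= f(lam*x + (1-lam)*y) <= lam*f(x) + (1-lam)*f(y) + 1e-12
print("convexity inequality held on all samples:", bool(ok))
```

Random sampling cannot prove convexity, but a single failing sample would disprove it.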
Derivatives
• First-order derivative of f : Df , a row vector of partial derivatives.
• Gradient of f : ∇f = (Df )T , which is a column vector.
• The gradient represents the slope of the tangent to the graph of the
function. It gives the linear approximation of f at a point and points in
the direction of the greatest rate of increase.
• Hessian (i.e., second derivative) of f : the symmetric matrix F(x) with
entries (F(x))ij = ∂²f/∂xi∂xj = ∂²f/∂xj∂xi.
• For a one-dimensional function f (x) with x ∈ R, the Hessian reduces to f ′′(x).
• F(x) is the Jacobian of ∇f (x), that is, F(x) = J (∇f (x)).
• Alternative notation: H (x) and ∇2f (x) are also used for Hessian.
• A Hessian gives a quadratic approximation of f at a point.
• Gradient and Hessian are local properties that help us recognize local
solutions and determine a direction to move toward the next point.
Example
Consider
f (x1, x2) = x1³ + x1² − x1x2 + x2² + 5x1 + 8x2 + 4
Then,
∇f (x) = [3x1² + 2x1 − x2 + 5,  −x1 + 2x2 + 8]T ∈ R2
and
F(x) = [6x1 + 2  −1; −1  2] ∈ R2×2.
Observation: if f is a quadratic function (remove the x1³ term in the above
example), then ∇f (x) is an affine function of x and F(x) is a constant
symmetric matrix.
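The hand-computed gradient and Hessian above can be verified with central finite differences, a quick NumPy sketch:

```python
import numpy as np

def f(x):
    x1, x2 = x
    return x1**3 + x1**2 - x1*x2 + x2**2 + 5*x1 + 8*x2 + 4

def grad(x):                       # the gradient derived in the example
    x1, x2 = x
    return np.array([3*x1**2 + 2*x1 - x2 + 5, -x1 + 2*x2 + 8])

def hess(x):                       # the Hessian derived in the example
    x1, _ = x
    return np.array([[6*x1 + 2.0, -1.0], [-1.0, 2.0]])

x = np.array([1.5, -0.7])          # an arbitrary test point
h = 1e-6

# central differences of f approximate the gradient
fd_grad = np.array([(f(x + h*e) - f(x - h*e)) / (2*h) for e in np.eye(2)])
# central differences of the gradient approximate its Jacobian, the Hessian
fd_hess = np.column_stack([(grad(x + h*e) - grad(x - h*e)) / (2*h)
                           for e in np.eye(2)])

print("gradient match:", np.allclose(fd_grad, grad(x), atol=1e-4))
print("Hessian  match:", np.allclose(fd_hess, hess(x), atol=1e-4))
```

This also illustrates the remark below that F(x) is the Jacobian of ∇f (x).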
Taylor expansion
Suppose φ ∈ Cm (m times continuously differentiable). The Taylor expansion
of φ at a point a is
φ(a + h) = φ(a) + φ′(a)h + (φ′′(a)/2!)h² + ∙ ∙ ∙ + (φ⁽m⁾(a)/m!)hᵐ + o(hᵐ).¹
There are other ways to write the last two terms.
Example: Consider x, d ∈ Rn and f ∈ C2. Define φ(α) = f (x + αd). Then,
φ′(α) = ∇f (x + αd)T d
φ′′(α) = dT F(x + αd)d
Hence, expanding around α = 0,
f (x + αd) = f (x) + (∇f (x)T d)α + o(α)
= f (x) + (∇f (x)T d)α + (dT F(x)d/2)α² + o(α²).
¹ o(α) collects the terms that are “asymptotically smaller than α” near 0, that is, o(α)/α → 0 as α ↓ 0.
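The o(α²) remainder can be observed numerically: for the cubic example function above, the gap between f (x + αd) and its second-order model shrinks faster than α². A small sketch:

```python
import numpy as np

def f(x):
    x1, x2 = x
    return x1**3 + x1**2 - x1*x2 + x2**2 + 5*x1 + 8*x2 + 4

def grad(x):
    x1, x2 = x
    return np.array([3*x1**2 + 2*x1 - x2 + 5, -x1 + 2*x2 + 8])

def hess(x):
    x1, _ = x
    return np.array([[6*x1 + 2.0, -1.0], [-1.0, 2.0]])

x = np.array([1.0, 2.0])                    # an arbitrary base point
d = np.array([1.0, -1.0]) / np.sqrt(2.0)    # a unit direction

# second-order Taylor model of phi(a) = f(x + a d) around a = 0
taylor2 = lambda a: f(x) + a * (grad(x) @ d) + 0.5 * a**2 * (d @ hess(x) @ d)

# o(a^2) means |remainder| / a^2 -> 0 as a -> 0
ratios = []
for a in [1e-1, 1e-2, 1e-3]:
    ratios.append(abs(f(x + a*d) - taylor2(a)) / a**2)
    print(f"a = {a:.0e}   |remainder| / a^2 = {ratios[-1]:.2e}")
```

The printed ratios decrease with α, consistent with a cubic remainder term.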
Feasible direction
• A vector d ∈ Rn is a feasible direction at x ∈ Ω if d ≠ 0 and x + αd ∈ Ω
for all sufficiently small α > 0.
(It is possible that d is an infeasible step, that is, x + d ∉ Ω. But if there is
some room in Ω to move from x in the direction d, then d is a feasible direction.)
d1 is feasible, d2 is infeasible
• If Ω = Rn or x lies in the interior of Ω, then any d ∈ Rn \ {0} is a feasible
direction.
• Feasible directions are introduced to establish optimality conditions,
especially for points on the boundary of the feasible set of a constrained problem.
First-order necessary condition
Let C1 be the set of continuously differentiable functions.
Theorem 6.1 (FONC): If x∗ is a local minimizer of f ∈ C1 over Ω, then
dT∇f (x∗) ≥ 0 for every feasible direction d at x∗.
Proof: Let d be any feasible direction. First-order Taylor expansion:
f (x∗ + αd) = f (x∗) + αdT∇f (x∗) + o(α).
If dT∇f (x∗) < 0, which does not depend on α, then f (x∗ + αd) < f (x∗) for
all sufficiently small α > 0 (that is, all α ∈ (0, ᾱ) for some ᾱ > 0). This
contradicts the assumption that x∗ is a local minimizer.
Corollary (interior case): If x∗ is an interior point of Ω and a local minimizer
of f ∈ C1, then ∇f (x∗) = 0.
Proof: Since any d ∈ Rn \ {0} is a feasible direction, we can set
d = −∇f (x∗). From Theorem 6.1, we have dT∇f (x∗) = −‖∇f (x∗)‖² ≥ 0.
Since ‖∇f (x∗)‖² ≥ 0, we have ‖∇f (x∗)‖² = 0 and thus ∇f (x∗) = 0.
Comment: This condition also reduces the unconstrained problem
minimize f (x)
to solving the equation
∇f (x∗) = 0.
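For the quadratic variant of the earlier example (the x1³ term dropped), the equation ∇f (x) = Qx − b = 0 is linear, so the FONC can be solved directly. A minimal NumPy sketch:

```python
import numpy as np

# Quadratic variant of the earlier example (x1^3 term dropped):
# f(x) = x1^2 - x1 x2 + x2^2 + 5 x1 + 8 x2 + 4 = (1/2) x^T Q x - b^T x + 4
Q = np.array([[2.0, -1.0], [-1.0, 2.0]])
b = np.array([-5.0, -8.0])

f = lambda x: 0.5 * x @ Q @ x - b @ x + 4

# FONC: grad f(x) = Qx - b = 0, a linear system
x_star = np.linalg.solve(Q, b)
print("x* =", x_star)
print("||grad f(x*)|| =", np.linalg.norm(Q @ x_star - b))
print("Hessian eigenvalues:", np.linalg.eigvalsh(Q))   # both positive
```

Since Q is positive definite, the stationary point found this way is in fact the unique global minimizer (see the quiz questions at the end).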
x1 fails to satisfy the FONC; x2 satisfies the FONC
Second-order necessary condition
The FONC dT∇f (x∗) ≥ 0 leaves two possibilities:
• dT∇f (x∗) > 0;
• dT∇f (x∗) = 0.
In the first case, f (x∗ + αd) > f (x∗) for all sufficiently small α > 0.
In the second case, the vanishing dT∇f (x∗) allows us to check higher-order
derivatives.
Let C2 be the set of twice continuously differentiable functions.
Theorem 6.2 (SONC): If x∗ is a local minimizer of f ∈ C2 over Ω and d is a
feasible direction at x∗ with dT∇f (x∗) = 0, then dT F(x∗)d ≥ 0.
Proof: Assume that ∃ a feasible direction d with dT∇f (x∗) = 0 and
dT F(x∗)d < 0. By 2nd-order Taylor expansion (with a vanishing 1st-order
term), we have
f (x∗ + αd) = f (x∗) + (dT F(x∗)d/2)α² + o(α²),
where by our assumption dT F(x∗)d < 0. Hence, for all sufficiently small
α > 0, we have f (x∗ + αd) < f (x∗), which contradicts that x∗ is a local
minimizer.
The necessary conditions are not sufficient
Counter examples
f (x) = x³: f ′(x) = 3x², f ′′(x) = 6x. At x = 0 both the FONC and the SONC
hold (f ′(0) = 0, f ′′(0) = 0 ≥ 0), yet 0 is not a local minimizer, since
f (x) < 0 for all x < 0.
f (x) = x1² − x2²:
0 is a saddle point: ∇f (0) = 0 but
neither a local minimizer nor maximizer.
By SONC, 0 is not a local minimizer! (Take d = (0, 1): dT F(0)d = −2 < 0.)
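Both counterexamples can be checked numerically, a short sketch:

```python
import numpy as np

# f(x) = x^3: at 0, FONC (f'(0) = 0) and SONC (f''(0) = 0 >= 0) both hold,
# yet 0 is not a local minimizer: f is negative just to the left of 0.
fp  = lambda x: 3*x**2
fpp = lambda x: 6*x
print("f'(0) =", fp(0), "  f''(0) =", fpp(0))
print("f(-1e-3) =", (-1e-3)**3, "< f(0) = 0")

# f(x) = x1^2 - x2^2: grad f(0) = 0, but the Hessian is indefinite,
# so the SONC fails along d = (0, 1) and 0 is a saddle point.
H = np.array([[2.0, 0.0], [0.0, -2.0]])
eigs = np.linalg.eigvalsh(H)
print("Hessian eigenvalues at 0:", eigs)   # one negative, one positive
```

An indefinite Hessian at a stationary point is exactly the numerical signature of a saddle.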
Second-order sufficient condition
Theorem (SOSC, interior case): Suppose f ∈ C2 and x∗ is an interior point
of Ω. If (1) ∇f (x∗) = 0 and (2) F(x∗) > 0, then x∗ is a strict local minimizer.
Comments:
• part 2 states F(x∗) is positive definite: xT F(x∗)x > 0 for all x ≠ 0.
• the condition is not necessary for a strict local minimizer (e.g., f (x) = x⁴
has a strict local minimizer at 0, yet f ′′(0) = 0).
Proof: For any d with ‖d‖ = 1, we have dT F(x∗)d ≥ λmin(F(x∗)) > 0.
Use the 2nd-order Taylor expansion:
f (x∗ + αd) = f (x∗) + (α²/2)dT F(x∗)d + o(α²) ≥ f (x∗) + (α²/2)λmin(F(x∗)) + o(α²).
Then ∃ ᾱ > 0, independent of d, such that f (x∗ + αd) > f (x∗) for all α ∈ (0, ᾱ).
Graph of f (x) = x1² + x2²
The point 0 satisfies the SOSC.
Roles of optimality conditions
• Recognize a solution: given a candidate solution, check optimality
conditions to verify it is a solution.
• Measure the quality of an approximate solution: measure how “close” a
point is to being a solution.
• Develop algorithms: reduce an optimization problem to solving a
(nonlinear) equation (finding a root of the gradient).
Later, we will see other forms of optimality conditions and how they lead
to equivalent subproblems, as well as algorithms.
Quiz questions
1. Show that for Ω = {x ∈ Rn : Ax = b}, d ≠ 0 is a feasible direction at
x ∈ Ω if and only if Ad = 0.
2. Show that for any unconstrained quadratic program, which has the form
minimize f (x) := (1/2)xT Qx − bT x,
if x∗ satisfies the second-order necessary condition, then x∗ is a global
minimizer.
3. Show that for any unconstrained quadratic program with Q ≥ 0 (Q is
symmetric and positive semi-definite), x∗ is a global minimizer if and only
if x∗ satisfies the first-order necessary condition. That is, the problem is
equivalent to solving Qx = b.
4. Consider minimize cT x, subject to x ∈ Ω. Suppose that c 6= 0 and the
problem has a global minimizer. Can the minimizer lie in the interior of Ω?
1. Show that for Ω = {x ∈ Rn : Ax = b}, d ≠ 0 is a feasible direction at
x ∈ Ω if and only if Ad = 0.
Proof: d ≠ 0 is a feasible direction ⇐⇒ ∃ α > 0 such that A(x + αd) = b
⇐⇒ αAd = 0 ⇐⇒ Ad = 0 (since Ax = b and α > 0).
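This can be demonstrated with a random instance: any null-space vector of A keeps x + αd feasible for every step size α. A sketch with a hypothetical constraint matrix:

```python
import numpy as np

rng = np.random.default_rng(1)
A = rng.normal(size=(2, 4))          # a hypothetical 2x4 constraint matrix
b = rng.normal(size=2)

# a feasible point: the minimum-norm solution of Ax = b
x = np.linalg.lstsq(A, b, rcond=None)[0]

# a null-space direction: the last right-singular vector satisfies Ad = 0
_, _, Vt = np.linalg.svd(A)
d = Vt[-1]
print("||A d|| =", np.linalg.norm(A @ d))

# feasibility is preserved along d, for any alpha
alpha = 0.37
print("||A(x + alpha d) - b|| =", np.linalg.norm(A @ (x + alpha*d) - b))
```

Both residuals are at machine-precision level, matching the characterization Ad = 0.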
2. Show that for any unconstrained quadratic program, which has the form
minimize f (x) := (1/2)xT Qx − bT x,
if x∗ satisfies the second-order necessary condition, x∗ is a global minimizer.
Proof: Since the problem is unconstrained, any point is an interior point (of
the feasible set Rn). By SONC (Corollary 6.2), ∇f (x∗) = Qx∗ − b = 0 and
F(x∗) = Q ≥ 0, that is, dT Qd ≥ 0 for any d ∈ Rn.
Pick any x ∈ Rn \ {x∗} and set d = x − x∗. Substituting b = Qx∗,
f (x) − f (x∗) = ((1/2)xT Qx − bT x) − ((1/2)x∗T Qx∗ − bT x∗)
= ((1/2)xT Qx − x∗T Qx) − ((1/2)x∗T Qx∗ − x∗T Qx∗)
= (1/2)xT Qx − x∗T Qx + (1/2)x∗T Qx∗
= (1/2)dT Qd ≥ 0.
Therefore, x∗ is a global minimizer (not necessarily strict).
3. Show that for any unconstrained quadratic program with Q ≥ 0 (Q is
symmetric and positive semi-definite), x∗ is a global minimizer if and only if x∗
satisfies the first-order necessary condition. That is, the problem is equivalent
to solving Qx = b.
Proof: The problem is unconstrained, so the FONC is ∇f (x∗) = Qx∗ − b = 0.
“⇐=” The assumption Q ≥ 0 along with the FONC means that x∗ satisfies
the SONC. By the last quiz question, x∗ is a global minimizer.
“=⇒” As a global minimizer, x∗ is automatically a local minimizer and thus
must satisfy the first-order necessary condition.
Three possible cases:
• if Qx = b has a unique solution x∗, it is also the strict global minimizer;
• if Qx = b has infinitely many solutions, they are all global minimizers
(they achieve the same optimal objective value);
• if Qx = b has no solution, then the optimization problem is unbounded
below (the objective can be made arbitrarily small).
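Two of the three cases can be seen concretely with small hypothetical matrices: a positive-definite Q gives a unique minimizer, while a singular Q ≥ 0 with b outside its range gives an objective that is unbounded below.

```python
import numpy as np

f = lambda Q, b, x: 0.5 * x @ Q @ x - b @ x

# Case 1: Q > 0, so Qx = b has a unique solution, the strict global minimizer
Q1, b1 = np.array([[2.0, 0.0], [0.0, 1.0]]), np.array([2.0, 1.0])
x1 = np.linalg.solve(Q1, b1)
print("case 1 minimizer:", x1)

# Case 3: Q >= 0 singular and Qx = b unsolvable -> unbounded below.
# Here f(x) = (1/2) x1^2 - x2, which tends to -inf as x2 grows.
Q3, b3 = np.array([[1.0, 0.0], [0.0, 0.0]]), np.array([0.0, 1.0])
for t in [1.0, 10.0, 100.0]:
    print(f"f(0, {t:5.0f}) = {f(Q3, b3, np.array([0.0, t])):8.1f}")
```

The middle case (infinitely many solutions) occurs when b lies in the range of a singular Q ≥ 0; all solutions of Qx = b then share the same objective value.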
4. Consider minimize cT x, subject to x ∈ Ω. Suppose that c 6= 0 and the
problem has a global minimizer. Can the minimizer lie in the interior of Ω?
Answer: No.
Proof: (By contradiction.) Suppose the global minimizer x∗ (which exists by
assumption) lies in the interior of Ω. Then there exists a ball
B = {x : ‖x − x∗‖ < ε} ⊆ Ω, for some ε > 0, such that cT x ≥ cT x∗ for all
x ∈ B. However, pick x = x∗ − (ε/2)·c/‖c‖ ∈ B; then
cT x = cT x∗ − (ε/2)‖c‖ < cT x∗ since c ≠ 0. A contradiction is reached.