lecture: duality of lp, socp and sdp -...
TRANSCRIPT
Lecture: Duality of LP, SOCP and SDP
Zaiwen Wen
Beijing International Center For Mathematical ResearchPeking University
http://bicmr.pku.edu.cn/~wenzw/[email protected]
Acknowledgement: this slides is based on Prof. Lieven Vandenberghe’s lecture notes
1/33
2/33
Lagrangian
standard form problem (not necessarily convex)
min f0(x)
s.t. fi(x) ≤ 0, i = 1, . . . ,m
hi(x) = 0, i = 1, . . . , p
variable x ∈ Rn, domain D, optimal value p∗.Lagrangian: L : Rn × Rm × Rp → R, with domL = D × Rm × Rp
L(x, λ, ν) = f0(x) +m∑
i=1
λifi(x) +p∑
i=1
νihi(x)
weighted sum of objective and constraint functions
λi is the Lagrange multiplier associated with fi(x) ≤ 0
νi is the Lagrange multiplier associated with hi(x) = 0
3/33
Lagrange dual function
Lagrange dual function: g : Rm × Rp → R
g(λ, ν) = infx∈D
L(x, λ, ν)
= infx∈D
(f0(x) +
m∑i=1
λifi(x) +p∑
i=1
νihi(x)
)
g is concave, can be −∞ for some λ, νlower bound property: if λ ≥ 0, then g(λ, ν) ≤ p∗
Proof: If x̃ is feasible and λ ≥ 0, then
f0(x̃) ≥ L(x̃, λ, ν) ≥ infx∈DL(x, λ, ν) = g(λ, ν).
minimizing over all feasible x̃ gives p∗ ≥ g(λ, ν).
4/33
The dual problem
Lagrange dual problemmax g(λ, ν)
s.t. λ ≥ 0
finds best lower bound on p∗, obtained from Lagrange dualfunction
a convex optimization problem; optimal value denoted d∗
λ, ν are dual feasible if λ ≥ 0, (λ, ν) ∈ domg
often simplified by making implicit constraint (λ, ν) ∈ domgexplicit
5/33
Standard form LP
(P) min c>x
s.t. Ax = b
x ≥ 0
(D) max b>y
s.t. A>y + s = c
s ≥ 0
Lagrangian is
L(x, λ, ν) = c>x + ν>(Ax− b)− λ>x = −b>ν + (c + A>v− λ)>x
L is affine in x, hence
g(λ, ν) = infxL(x, λ, ν) =
{−b>ν, A>ν − λ+ c = 0−∞ otherwise
lower bound property: p∗ ≥ −b>v if A>v + c ≥ 0
Check the dual of (D)
6/33
Inequality form LP
min c>x
s.t. Ax ≤ b
Lagrangian is
L(x, λ, ν) = c>x + ν>(Ax− b) = −b>ν + (c + A>v)>x
L is affine in x, hence
g(λ, ν) = infxL(x, λ, ν) =
{−b>ν, A>ν + c = 0−∞ otherwise
Dual problemmax b>v
s.t. A>v = c, v ≤ 0
7/33
Equality constrained norm minimization
min ‖x‖s.t. Ax = b
max b>v
s.t. ‖A>v‖∗ ≤ 1
Dual function
g(ν) = infx(‖x‖ − ν>Ax + b>ν) =
{b>ν ‖A>ν‖∗ ≤ 1−∞ otherwise,
where ‖v‖∗ = sup‖u‖≤1 u>v is the dual norm of ‖ · ‖Proof: follows from infx(‖x‖ − y>x) = 0 if ‖y‖∗ ≤ 1, −∞ otherwise
if ‖y‖∗ ≤ 1, then ‖x‖ − y>x ≥ 0 for all x, with equality if x = 0
if ‖y‖∗ > 1, then choose x = tu, where ‖u‖ ≤ 1, u>y = ‖y‖∗ > 1:
‖x‖ − y>x = t(‖u‖ − ‖y‖∗)→ −∞ as t→∞
for all x, with equality if x = 0
8/33
LP with box constraints
min c>x
s.t. Ax = b
− 1 ≤ x ≤ 1
max − b>v− 1>λ1 − 1>λ2
s.t. c + A>v + λ1 − λ2 = 0
λ1 ≥ 0, λ2 ≥ 0
reformulation with box constraints made implicit
min f0(x) =
{c>x −1 ≤ x ≤ 1∞ otherwise
s.t. Ax = b
Dual function
g(ν) = inf−1≤x≤1
(c>x + ν>(Ax− b)) = −b>ν − ‖A>v + c‖1
Dual problem:maxν
−b>ν − ‖A>v + c‖1
9/33
Lagrange dual and conjugate function
min f0(x)
s.t. Ax ≤ b
Cx = d
dual function
g(λ, ν) = infx∈domf0
(f0(x) + (A>λ+ C>v)>x− b>λ− d>ν)
= −f ∗(−A>λ− C>v)− b>λ− d>ν
recall definition of conjugate f ∗(y) = supx∈domf (y>x− f (x))
simplifies derivation of dual if conjugate of f0 is known
10/33
Conjugate function
the conjugate of a function f is
f ∗(y) = supx∈dom f
(yTx− f (x))
f ∗ is closed and
convex even if f is not Fenchel’s inequality
f (x) + f (y) ≥ xTy ∀x, y
(extends inequality xTx/2 + yTy/2 ≥ xTy to non-quadratic convex f )
11/33
Quadratic function
f (x) =12
xTAx + bTx + c
strictly convex case (A � 0)
f ∗(y) =12(y− b)TA−1(y− b)− c
general convex case (A � 0)
f ∗(y) =12(y− b)TA†(y− b)− c, dom f ∗ = range(A) + b
12/33
Negative entropy and negative logarithm
negative entropy
f (x) =n∑
i=1
xi log xi f ∗(y) =n∑
i=1
eyi−1
negative logarithm
f (x) = −n∑
i=1
log xi f ∗(y) = −n∑
i=1
log(−yi)− n
matrix logarithm
f (x) = − log det X (dom f = Sn++) f ∗(Y) = − log det(−Y)− n
13/33
Indicator function and norm
indicator of convex set C: conjugate is support function of C
f (x) ={
0, x ∈ C+∞, x 6∈ C
f ∗(y) = supx∈C
yTx
norm: conjugate is indicator of unit dual norm ball
f (x) = ||x|| f ∗(y) ={
0, ||y||∗ ≤ 1+∞, ||y||∗ > 1
(see next page)
14/33
proof: recall the definition of dual norm:
||y||∗ = sup||x||≤1
xTy
to evaluate f ∗(y) = supx(yTx− ||x||) we distinguish two cases
if ||y||∗ ≤ 1, then (by definition of dual norm)
yTx ≤ ||x|| ∀x
and equality holds if x = 0; therefore supx(yTx− ||x||) = 0
if ||y||∗ > 1, there exists an x with ||x|| ≤ 1, xTy > 1; then
f ∗(y) ≥ yT(tx)− ||tx|| = t(yTx− ||x||)
and r.h.s. goes to infinity if t→∞
15/33
The second conjugate
f ∗∗(x) = supy∈dom f ∗
(xTy− f ∗(y))
f ∗∗(x) is closed and convex
from Fenchel’s inequality xTy− f ∗(y) ≤ f (x) for all y and x):
f ∗∗ ≤ f (x) ∀x
equivalently, epi f ⊆ epi f ∗∗ (for any f )
if f is closed and convex, then
f ∗∗(x) = f (x) ∀x
equivalently, epi f = epi f ∗∗ (if f is closed convex); proof on nextpage
16/33
Conjugates and subgradients
if f is closed and convex, then
y ∈ ∂f (x) ⇔ x ∈ ∂f ∗(x) ⇔ xTy = f (x) + f ∗(y)
proof: if y ∈ ∂f (x), then f ∗(y) = supu(yTu− f (u)) = yTx− f (x)
f ∗(v) = supu(vTu− f (u)
≥ vTx− f (x)
= xT(v− y)− f (x) + yTx
= f ∗(y) + xT(v− y)
(1)
for all v; therefore, x is a subgradient of f ∗ at y (x ∈ ∂f ∗(y))reverse implication x ∈ ∂f ∗(y)⇒ y ∈ ∂f (x) follows from f ∗∗ = f
17/33
Some calculus rules
separable sum
f (x1, x2) = g(x1) + h(x2) f ∗(y1, y2) = g∗(y1) + h∗(y2)
scalar multiplication: (for α > 0)
f (x) = αg(x) f ∗(y) = αg∗(y/α)
addition to affine function
f (x) = g(x) + aTx + b f ∗(y) = g∗(y− a)− b
infimal convolution
f (x) = infu+v=x
(g(u) + h(v)) f ∗(y) = g∗(y) + h∗(y)
18/33
Duality and problem reformulations
equivalent formulations of a problem can lead to very differentduals
reformulating the primal problem can be useful when the dual isdifficult to derive, or uninteresting
Common reformulationsintroduce new variables and equality constraints
make explicit constraints implicit or vice-versa
transform objective or constraint functionse.g., replace f0(x) by φ(f0(x)) with φ convex, increasing
19/33
Introducing new variables and equality constraints
min f0(Ax + b)
dual function is constant: g = infx L(x) = infx f0(Ax + b) = p∗
we have strong duality, but dual is quite useless
reformulated problem and its dual
(P) min f0(y)
s.t. Ax + b = y
(D) max b>y− f ∗0 (ν)
s.t. A>ν = 0
dual functions follows from
g(ν) = infx,y
f0(y)− ν>y + ν>Ax + b>ν
=
{−f ∗0 (ν) + b>ν, A>ν = 0−∞ otherwise
20/33
Norm approximation problem
min ‖Ax− b‖ ⇐⇒min ‖y‖s.t. Ax− b = y
can look up conjugate of ‖ · ‖, or derive dual directly
g(ν) = infx,y
‖y‖+ ν>y− ν>Ax + b>ν
=
{b>ν + infy ‖y‖+ ν>y, A>ν = 0−∞, otherwise
=
{b>ν, A>ν = 0, ‖v‖∗ ≤ 1−∞, otherwise
Dual problem ismax b>ν
s.t. A>ν = 0, ‖v‖∗ ≤ 1
21/33
Weak and strong duality
weak duality: d∗ ≤ p∗
always holds (for convex and nonconvex problems)
can be used to find nontrivial lower bounds for difficult problemsfor example, solving the maxcut SDP
strong duality: d∗ = p∗
does not hold in general
(usually) holds for convex problems
conditions that guarantee strong duality in convex problems arecalled constraint qualifications
22/33
Slater’s constraint qualification
strong duality holds for a convex problem
min f0(x)
s.t. fi(x) ≤ 0, i = 1, . . . ,m
Ax = b
If it is strictly feasible, i.e.,
∃x ∈ intD : fi(x) < 0, i = 1, . . . ,m, Ax = b
also guarantees that the dual optimum is attained (if p∗ > −∞)can be sharpened: e.g., can replace intD with relintD (interiorrelative to affine hull); linear inequalities do not need to hold withstrict inequality, . . .there exist many other types of constraint qualifications
23/33
Complementary slackness
assume strong duality holds, x∗ is primal optimal, (λ∗, ν∗) is dualfeasible
f0(x∗) = g(λ∗, ν∗) = infx∈D
(f0(x) +
m∑i=1
λ∗i fi(x) +p∑
i=1
ν∗i hi(x)
)
≤ f0(x∗) +m∑
i=1
λ∗i fi(x∗) +p∑
i=1
ν∗i hi(x∗)
≤ f0(x∗)
hence, the two inequalities hold with equalityx∗ minimizes L(x, λ∗, ν∗)
λ∗i fi(x∗) = 0 for i = 1, . . . ,m (known as complementaryslackness):
λ∗i > 0 =⇒ fi(x∗) = 0, fi(x∗) < 0 =⇒ λ∗i = 0
24/33
Karush-Kuhn-Tucker (KKT) conditions
the following four conditions are called KKT conditions (for a problemwith differentiable fi, hi):
1 primal constraints: fi(x) ≤ 0, i = 1, . . . ,m, hi(x) = 0, i = 1, . . . , p
2 dual constraints: λ ≥ 0
3 complementary slackness: λifi(x) = 0, i = 1, . . . ,m
4 gradient of Lagrangian with respect to x vanishes:
∇f0(x) +m∑
i=1
λi∇fi(x) +p∑
i=1
νi∇hi(x) = 0
If strong duality holds and x, λ, ν are optimal, then they must satisfythe KKT conditions
25/33
KKT conditions for convex problem
If x̃, λ̃, ν̃ satisfy KKT for a convex problem, then they are optimalfrom complementary slackness: f0(x̃) = L(x̃, λ̃, ν̃)
from 4th condition (and convexity): g(λ̃, ν̃) = L(x̃, λ̃, ν̃)Hence, f0(x̃) = g(λ̃, ν̃)
if Slater’s condition is satisfied:x is optimal if and only if there exist λ, ν that satisfy KKT conditions
recall that Slater implies strong duality, and dual optimum isattained
generalizes optimality condition ∇f0(x) = 0 for unconstrainedproblem
26/33
LP Duality
Strong duality: If a LP has an optimal solution, so does its dual, andtheir objective fun. are equal.
PPPPPPPPdualprimal finite unbounded infeasible
finite√
× ×unbounded × ×
√
infeasible ×√ √
If p∗ = −∞, then d∗ ≤ p∗ = −∞, hence dual is infeasible
If d∗ = +∞, then +∞ = d∗ ≤ p∗, hence primal is infeasible
min x1 + 2x2
s.t. x1 + x2 = 1
2x1 + 2x2 = 3
max p1 + 3p2
s.t. p1 + 2p2 = 1
p1 + 2p2 = 2
27/33
Problems with generalized inequalities
min f0(x)
s.t. fi(x) �Ki 0, i = 1, . . . ,m
hi(x) = 0, i = 1, . . . , p
fi(x) � Ki means −fi(x) ∈ Ki.
Lagrangian: 〈·, ·〉Kiinner product in Ki
L(x, λ1, . . . , λm, ν) = f0(x) +m∑
i=1
〈λi, fi(x)〉Ki+
p∑i=1
νihi(x)
dual function
g(λ1, . . . , λm, ν) = infx∈D
L(x, λ1, . . . , λm, ν)
28/33
lower bound property: if λi ∈ K∗i , then g(λ1, . . . , λm, ν) ≤ p∗
proof: If x̃ is feasible and λ �K∗i
0, then
f0(x̃) ≥ f0(x̃) +m∑
i=1
〈λi, fi(x̃)〉Ki+
p∑i=1
νihi(x̃)
≥ infx∈D
L(x, λ1, . . . , λm, ν)
= g(λ1, . . . , λm, ν)
minimizing over all feasible x̃ gives g(λ1, . . . , λm, ν) ≤ p∗.Dual problem
max g(λ1, . . . , λm, ν)
s.t. λi �K∗i
0, i = 1, . . . ,m
weak duality: p∗ ≥ d∗
strong duality: p∗ = d∗ for convex problem with constraintqualification (for example, Slater’s: primal problem is strictlyfeasible)
29/33
Semidefinite program
min c>x
s.t. x1F1 + . . .+ xnFn � G
Fi,G ∈ Sk, multiplier is matrix Z ∈ Sk
Lagrangian L(x,Z) = c>x + 〈Z, x1F1 + . . .+ xnFn − G〉
dual function
g(Z) = infxL(x,Z) =
{−〈G,Z〉 , 〈Fi,Z〉+ ci = 0−∞ otherwise
Dual SDPmin − 〈G,Z〉s.t. 〈Fi,Z〉+ ci = 0,Z � 0
p∗ = d∗ if primal SDP is strictly feasible.
30/33
SDP Relaxtion of Maxcut
min x>Wx
s.t. x2i = 1
=⇒max − 1>ν
s.t. W + diag(ν) � 0⇐⇒
min Tr(WX)
s.t. Xii = 1
X � 0
a nonconvex problem; feasible set contains 2n discrete points
interpretation: partition {1, . . . , n} in two sets; Wij is cost ofassigning i, j to the same set;; −Wij is cost of assigning todifferent sets
dual function
g(ν) = infx
(x>Wx +
∑i
νi(x2i − 1)
)= inf
xx>(W + diag(ν))x− 1>ν
=
{−1>ν W + diag(ν) � 0−∞ otherwise
31/33
SOCP/SDP Duality
(P) min c>x
s.t. Ax = b, xQ � 0
(D) max b>y
s.t. A>y + s = c, sQ � 0
(P) min 〈C,X〉s.t. 〈A1,X〉 = b1
. . .
〈Am,X〉 = bm
X � 0
(D) max b>y
s.t.∑
i
yiAi + S = C
S � 0
Strong dualityIf p∗ > −∞, (P) is strictly feasible, then (D) is feasible andp∗ = d∗
If d∗ < +∞, (D) is strictly feasible, then (P) is feasible andp∗ = d∗
If (P) and (D) has strictly feasible solutions, then both haveoptimal solutions.
32/33
Failure of SOCP Duality
inf (1,−1, 0)x
s.t. (0, 0, 1)x = 1
xQ � 0
sup y
s.t. (0, 0, 1)>y + z = (1,−1, 0)>
zQ � 0
primal: min x0 − x1, s.t. x0 ≥√
x21 + 1; It holds x0 − x1 > 0 and
x0 − x1 → 0 if x0 =√
x21 + 1→∞. Hence, p∗ = 0, no finite
solutiondual: sup y s.t. 1 ≥
√1 + y2. Hence, y = 0
p∗ = d∗ but primal is not attainable.
33/33
Failure of SDP Duality
Consider
min
⟨0 0 00 0 00 0 1
,X
⟩
s.t.
⟨1 0 00 0 00 0 0
,X
⟩= 0
⟨0 1 00 0 01 0 2
,X
⟩= 2
X � 0
max 2y2
s.t.
1 0 00 0 00 0 0
y1 +
0 1 00 0 01 0 2
y2 �
0 0 00 0 00 0 1
primal: X∗ =
0 0 00 0 00 0 1
, p∗ = 1
dual: y∗ = (0, 0). Hence, d∗ = 0
Both problems have finite optimal values, but p∗ 6= d∗