Lecture: Duality of LP, SOCP and SDP

Zaiwen Wen

Beijing International Center For Mathematical Research, Peking University

http://bicmr.pku.edu.cn/~wenzw/

[email protected]

Acknowledgement: these slides are based on Prof. Lieven Vandenberghe’s lecture notes

http://bicmr.pku.edu.cn/~wenzw/bigdata2017.html


Lagrangian

standard form problem (not necessarily convex)

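$$
\begin{aligned}
\min\quad & f_0(x) \\
\text{s.t.}\quad & f_i(x) \le 0, \quad i = 1, \dots, m \\
& h_i(x) = 0, \quad i = 1, \dots, p
\end{aligned}
$$

variable $x \in \mathbb{R}^n$, domain $D$, optimal value $p^*$

Lagrangian: $L : \mathbb{R}^n \times \mathbb{R}^m \times \mathbb{R}^p \to \mathbb{R}$, with $\operatorname{dom} L = D \times \mathbb{R}^m \times \mathbb{R}^p$

$$
L(x, \lambda, \nu) = f_0(x) + \sum_{i=1}^m \lambda_i f_i(x) + \sum_{i=1}^p \nu_i h_i(x)
$$

weighted sum of objective and constraint functions

$\lambda_i$ is the Lagrange multiplier associated with $f_i(x) \le 0$

$\nu_i$ is the Lagrange multiplier associated with $h_i(x) = 0$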

Lagrange dual function

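Lagrange dual function: $g : \mathbb{R}^m \times \mathbb{R}^p \to \mathbb{R}$

$$
g(\lambda, \nu) = \inf_{x \in D} L(x, \lambda, \nu)
= \inf_{x \in D} \Big( f_0(x) + \sum_{i=1}^m \lambda_i f_i(x) + \sum_{i=1}^p \nu_i h_i(x) \Big)
$$

$g$ is concave, can be $-\infty$ for some $\lambda, \nu$

lower bound property: if $\lambda \ge 0$, then $g(\lambda, \nu) \le p^*$

Proof: if $\tilde{x}$ is feasible and $\lambda \ge 0$, then

$$
f_0(\tilde{x}) \ge L(\tilde{x}, \lambda, \nu) \ge \inf_{x \in D} L(x, \lambda, \nu) = g(\lambda, \nu)
$$

minimizing over all feasible $\tilde{x}$ gives $p^* \ge g(\lambda, \nu)$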
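A small numeric illustration of the lower bound property, as a sketch only. The one-dimensional problem min x^2 s.t. 1 - x <= 0 is an assumption chosen for this example (it is not from the slides); it has p* = 1 at x = 1, and its dual function works out to g(lam) = lam - lam^2/4.

```python
# Toy problem (hypothetical, chosen for illustration):
#   min x^2  s.t.  1 - x <= 0,   so p* = 1 at x = 1.

def lagrangian(x, lam):
    """L(x, lam) = f0(x) + lam * f1(x) with f0(x) = x^2 and f1(x) = 1 - x."""
    return x * x + lam * (1.0 - x)

def dual_function(lam):
    """g(lam) = inf_x L(x, lam); the convex quadratic in x is minimized at x = lam / 2."""
    return lagrangian(lam / 2.0, lam)

p_star = 1.0
lams = [0.0, 0.5, 1.0, 2.0, 3.0, 4.0]
values = [dual_function(lam) for lam in lams]

# Every lam >= 0 yields a lower bound on p*.
assert all(g <= p_star + 1e-12 for g in values)
for lam, g in zip(lams, values):
    print(f"lam = {lam:.1f}   g(lam) = {g:.4f}")
```

In this toy case the bound is tight at lam = 2, where g(2) = 1 = p*.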

The dual problem

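Lagrange dual problem

$$
\begin{aligned}
\max\quad & g(\lambda, \nu) \\
\text{s.t.}\quad & \lambda \ge 0
\end{aligned}
$$

finds best lower bound on $p^*$, obtained from Lagrange dual function

a convex optimization problem; optimal value denoted $d^*$

$\lambda, \nu$ are dual feasible if $\lambda \ge 0$, $(\lambda, \nu) \in \operatorname{dom} g$

often simplified by making the implicit constraint $(\lambda, \nu) \in \operatorname{dom} g$ explicit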

Standard form LP

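$$
\text{(P)}\quad
\begin{aligned}
\min\quad & c^T x \\
\text{s.t.}\quad & Ax = b \\
& x \ge 0
\end{aligned}
\qquad\qquad
\text{(D)}\quad
\begin{aligned}
\max\quad & b^T y \\
\text{s.t.}\quad & A^T y + s = c \\
& s \ge 0
\end{aligned}
$$

Lagrangian is

$$
L(x, \lambda, \nu) = c^T x + \nu^T (Ax - b) - \lambda^T x = -b^T \nu + (c + A^T \nu - \lambda)^T x
$$

$L$ is affine in $x$, hence

$$
g(\lambda, \nu) = \inf_x L(x, \lambda, \nu) =
\begin{cases}
-b^T \nu, & A^T \nu - \lambda + c = 0 \\
-\infty, & \text{otherwise}
\end{cases}
$$

lower bound property: $p^* \ge -b^T \nu$ if $A^T \nu + c \ge 0$

substituting $y = -\nu$ and $s = \lambda$ gives (D)

Exercise: check the dual of (D)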
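The pair (P)/(D) can be checked numerically on a tiny instance. The sketch below uses hypothetical data (one equality constraint, two variables) and verifies that the duality gap c^T x - b^T y equals s^T x for feasible pairs, which is nonnegative.

```python
# Toy standard-form LP (hypothetical data):
#   (P) min x1 + 2*x2   s.t. x1 + x2 = 1, x >= 0
#   (D) max y           s.t. y + s1 = 1, y + s2 = 2, s >= 0
A = [[1.0, 1.0]]
b = [1.0]
c = [1.0, 2.0]

def dot(u, v):
    return sum(ui * vi for ui, vi in zip(u, v))

def primal_feasible(x, tol=1e-12):
    rows_ok = all(abs(dot(row, x) - bi) <= tol for row, bi in zip(A, b))
    return rows_ok and all(xi >= -tol for xi in x)

def dual_slack(y):
    """s = c - A^T y."""
    return [c[j] - sum(A[i][j] * y[i] for i in range(len(y))) for j in range(len(c))]

def dual_feasible(y, tol=1e-12):
    return all(sj >= -tol for sj in dual_slack(y))

def duality_gap(x, y):
    """c^T x - b^T y, which equals s^T x >= 0 for feasible pairs."""
    return dot(c, x) - dot(b, y)

x_feas, y_feas = [0.5, 0.5], [0.0]   # feasible but not optimal
x_opt, y_opt = [1.0, 0.0], [1.0]     # optimal pair: the gap closes
for x, y in [(x_feas, y_feas), (x_opt, y_opt)]:
    assert primal_feasible(x) and dual_feasible(y)
    print("gap =", duality_gap(x, y), "  s^T x =", dot(dual_slack(y), x))
```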

Inequality form LP

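$$
\begin{aligned}
\min\quad & c^T x \\
\text{s.t.}\quad & Ax \le b
\end{aligned}
$$

Lagrangian (with multiplier $\lambda \ge 0$ for $Ax \le b$) is

$$
L(x, \lambda) = c^T x + \lambda^T (Ax - b) = -b^T \lambda + (c + A^T \lambda)^T x
$$

$L$ is affine in $x$, hence

$$
g(\lambda) = \inf_x L(x, \lambda) =
\begin{cases}
-b^T \lambda, & A^T \lambda + c = 0 \\
-\infty, & \text{otherwise}
\end{cases}
$$

Dual problem (written in terms of $v = -\lambda$)

$$
\begin{aligned}
\max\quad & b^T v \\
\text{s.t.}\quad & A^T v = c, \quad v \le 0
\end{aligned}
$$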
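As a worked instance (the data are an assumption made for illustration), take n = 1, c = -1, A = (1, -1)^T, b = (1, 0)^T, i.e. minimize -x over 0 <= x <= 1. Writing lambda >= 0 for the multiplier of Ax <= b, the dual is max -b^T lambda s.t. A^T lambda + c = 0, lambda >= 0, and both optimal values can be computed by hand:

```latex
\begin{aligned}
p^* &= \min\{-x : 0 \le x \le 1\} = -1 \quad (x^* = 1),\\
d^* &= \max\{-\lambda_1 : \lambda_1 - \lambda_2 - 1 = 0,\ \lambda \ge 0\}
     = \max\{-(1 + \lambda_2) : \lambda_2 \ge 0\} = -1 \quad (\lambda^* = (1, 0)).
\end{aligned}
```

The multiplier of the inactive constraint -x <= 0 is zero, and the multiplier of the active constraint x <= 1 is positive.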

Equality constrained norm minimization

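$$
\begin{aligned}
\min\quad & \|x\| \\
\text{s.t.}\quad & Ax = b
\end{aligned}
\qquad\qquad
\begin{aligned}
\max\quad & b^T \nu \\
\text{s.t.}\quad & \|A^T \nu\|_* \le 1
\end{aligned}
$$

Dual function

$$
g(\nu) = \inf_x \big( \|x\| - \nu^T A x + b^T \nu \big) =
\begin{cases}
b^T \nu, & \|A^T \nu\|_* \le 1 \\
-\infty, & \text{otherwise}
\end{cases}
$$

where $\|v\|_* = \sup_{\|u\| \le 1} u^T v$ is the dual norm of $\|\cdot\|$

Proof: follows from $\inf_x (\|x\| - y^T x) = 0$ if $\|y\|_* \le 1$, $-\infty$ otherwise

if $\|y\|_* \le 1$, then $\|x\| - y^T x \ge 0$ for all $x$, with equality if $x = 0$

if $\|y\|_* > 1$, then choose $x = tu$, where $\|u\| \le 1$, $u^T y = \|y\|_* > 1$:

$$
\|x\| - y^T x = t(\|u\| - \|y\|_*) \to -\infty \quad \text{as } t \to \infty
$$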
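For the Euclidean norm (which is its own dual norm) and a single equality constraint a^T x = b, both problems can be solved in closed form. The sketch below uses hypothetical data and compares the two optimal values.

```python
import math

# Hypothetical data: one equality constraint a^T x = b in R^3, Euclidean norm.
a = [3.0, 0.0, 4.0]
b = 10.0

norm_a = math.sqrt(sum(ai * ai for ai in a))

# Primal: the minimum-norm solution of a^T x = b is x = a * b / ||a||^2.
x_star = [ai * b / norm_a**2 for ai in a]
p_star = math.sqrt(sum(xi * xi for xi in x_star))

# Dual: max b*nu  s.t.  ||a * nu||_2 = |nu| * ||a||_2 <= 1,
# solved by nu = sign(b) / ||a||_2.
nu_star = math.copysign(1.0, b) / norm_a
d_star = b * nu_star

print(p_star, d_star)
```

Both values equal |b| / ||a||_2, so the lower bound from the dual is tight in this instance.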

LP with box constraints

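$$
\begin{aligned}
\min\quad & c^T x \\
\text{s.t.}\quad & Ax = b \\
& -\mathbf{1} \le x \le \mathbf{1}
\end{aligned}
\qquad\qquad
\begin{aligned}
\max\quad & -b^T \nu - \mathbf{1}^T \lambda_1 - \mathbf{1}^T \lambda_2 \\
\text{s.t.}\quad & c + A^T \nu + \lambda_1 - \lambda_2 = 0 \\
& \lambda_1 \ge 0, \quad \lambda_2 \ge 0
\end{aligned}
$$

reformulation with box constraints made implicit

$$
\begin{aligned}
\min\quad & f_0(x) =
\begin{cases}
c^T x, & -\mathbf{1} \le x \le \mathbf{1} \\
\infty, & \text{otherwise}
\end{cases} \\
\text{s.t.}\quad & Ax = b
\end{aligned}
$$

Dual function

$$
g(\nu) = \inf_{-\mathbf{1} \le x \le \mathbf{1}} \big( c^T x + \nu^T (Ax - b) \big) = -b^T \nu - \|A^T \nu + c\|_1
$$

Dual problem:

$$
\max_{\nu}\ -b^T \nu - \|A^T \nu + c\|_1
$$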
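The closed form of g for the box-constrained LP can be compared against a brute-force minimization of the Lagrangian over the box. This is a sketch with hypothetical data (three variables, one equality constraint).

```python
from itertools import product

# Hypothetical data for the box-constrained LP (n = 3 variables, 1 equality).
A = [[1.0, -2.0, 0.5]]
b = [0.3]
c = [1.0, -1.0, 2.0]

def reduced_cost(nu):
    """A^T nu + c."""
    return [c[j] + sum(A[i][j] * nu[i] for i in range(len(nu))) for j in range(len(c))]

def g_closed_form(nu):
    """g(nu) = -b^T nu - ||A^T nu + c||_1."""
    return -sum(bi * ni for bi, ni in zip(b, nu)) - sum(abs(r) for r in reduced_cost(nu))

def g_brute_force(nu):
    """Minimize c^T x + nu^T (A x - b) over the box. A linear function attains
    its minimum over [-1, 1]^n at a vertex, so enumerating vertices suffices."""
    r = reduced_cost(nu)
    const = -sum(bi * ni for bi, ni in zip(b, nu))
    return min(const + sum(rj * xj for rj, xj in zip(r, x))
               for x in product((-1.0, 1.0), repeat=len(c)))

for nu in ([0.0], [1.0], [-0.7], [2.5]):
    print(nu, g_closed_form(nu), g_brute_force(nu))
```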

Lagrange dual and conjugate function

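$$
\begin{aligned}
\min\quad & f_0(x) \\
\text{s.t.}\quad & Ax \le b \\
& Cx = d
\end{aligned}
$$

dual function

$$
\begin{aligned}
g(\lambda, \nu) &= \inf_{x \in \operatorname{dom} f_0} \big( f_0(x) + (A^T \lambda + C^T \nu)^T x - b^T \lambda - d^T \nu \big) \\
&= -f_0^*(-A^T \lambda - C^T \nu) - b^T \lambda - d^T \nu
\end{aligned}
$$

recall definition of conjugate $f^*(y) = \sup_{x \in \operatorname{dom} f} (y^T x - f(x))$

simplifies derivation of dual if conjugate of $f_0$ is known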

Conjugate function

the conjugate of a function $f$ is

$$
f^*(y) = \sup_{x \in \operatorname{dom} f} (y^T x - f(x))
$$

$f^*$ is closed and convex even if $f$ is not

Fenchel’s inequality

$$
f(x) + f^*(y) \ge x^T y \quad \forall x, y
$$

(extends inequality $x^T x / 2 + y^T y / 2 \ge x^T y$ to non-quadratic convex $f$)

Quadratic function

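$$
f(x) = \tfrac{1}{2} x^T A x + b^T x + c
$$

strictly convex case ($A \succ 0$)

$$
f^*(y) = \tfrac{1}{2} (y - b)^T A^{-1} (y - b) - c
$$

general convex case ($A \succeq 0$)

$$
f^*(y) = \tfrac{1}{2} (y - b)^T A^{\dagger} (y - b) - c, \qquad \operatorname{dom} f^* = \operatorname{range}(A) + b
$$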
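In the scalar case the strictly convex formula can be checked directly, since the supremum in the definition of the conjugate is attained at x = (y - b) / a. The coefficients below are hypothetical; the sketch also evaluates the gap in Fenchel's inequality.

```python
# Scalar strictly convex quadratic f(x) = 0.5*a*x^2 + b*x + c with a > 0
# (the coefficients below are hypothetical).
a, b, c = 2.0, -1.0, 0.5

def f(x):
    return 0.5 * a * x * x + b * x + c

def f_conj(y):
    """Closed form f*(y) = 0.5*(y - b)^2 / a - c."""
    return 0.5 * (y - b) ** 2 / a - c

def f_conj_by_sup(y):
    """Evaluate sup_x (y*x - f(x)) at its maximizer x = (y - b) / a."""
    x = (y - b) / a
    return y * x - f(x)

def fenchel_gap(x, y):
    """f(x) + f*(y) - x*y, nonnegative by Fenchel's inequality."""
    return f(x) + f_conj(y) - x * y

for y in (-3.0, 0.0, 1.5):
    print(y, f_conj(y), f_conj_by_sup(y))
```

The gap `fenchel_gap(x, y)` vanishes exactly when x = (y - b) / a, i.e. when y = f'(x).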

Negative entropy and negative logarithm

negative entropy

$$
f(x) = \sum_{i=1}^n x_i \log x_i, \qquad f^*(y) = \sum_{i=1}^n e^{y_i - 1}
$$

negative logarithm

$$
f(x) = -\sum_{i=1}^n \log x_i, \qquad f^*(y) = -\sum_{i=1}^n \log(-y_i) - n
$$

matrix logarithm

$$
f(X) = -\log\det X \quad (\operatorname{dom} f = S^n_{++}), \qquad f^*(Y) = -\log\det(-Y) - n
$$
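The negative entropy formula can be verified in one dimension: setting the derivative of y*x - x*log(x) to zero gives x = exp(y - 1), and substituting back gives exp(y - 1). A minimal sketch (the value y = 0.7 is an arbitrary choice):

```python
import math

def neg_entropy(x):
    """f(x) = x*log(x) for scalar x > 0."""
    return x * math.log(x)

def neg_entropy_conj(y):
    """Closed form f*(y) = exp(y - 1)."""
    return math.exp(y - 1.0)

def objective(x, y):
    """The function x -> y*x - f(x) whose supremum over x > 0 defines f*(y)."""
    return y * x - neg_entropy(x)

y = 0.7
x_hat = math.exp(y - 1.0)   # stationary point: y - log(x) - 1 = 0
print(objective(x_hat, y), neg_entropy_conj(y))

# Nearby points give smaller values, consistent with x_hat being the maximizer.
print(objective(0.9 * x_hat, y), objective(1.1 * x_hat, y))
```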

Indicator function and norm

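indicator of convex set $C$: conjugate is support function of $C$

$$
f(x) =
\begin{cases}
0, & x \in C \\
+\infty, & x \notin C
\end{cases}
\qquad
f^*(y) = \sup_{x \in C} y^T x
$$

norm: conjugate is indicator of unit dual norm ball

$$
f(x) = \|x\|, \qquad
f^*(y) =
\begin{cases}
0, & \|y\|_* \le 1 \\
+\infty, & \|y\|_* > 1
\end{cases}
$$

(see next page)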

proof: recall the definition of dual norm:

$$
\|y\|_* = \sup_{\|x\| \le 1} x^T y
$$

to evaluate $f^*(y) = \sup_x (y^T x - \|x\|)$ we distinguish two cases

if $\|y\|_* \le 1$, then (by definition of dual norm)

$$
y^T x \le \|x\| \quad \forall x
$$

and equality holds if $x = 0$; therefore $\sup_x (y^T x - \|x\|) = 0$

if $\|y\|_* > 1$, there exists an $x$ with $\|x\| \le 1$, $x^T y > 1$; then

$$
f^*(y) \ge y^T (tx) - \|tx\| = t (y^T x - \|x\|)
$$

and the right-hand side goes to infinity as $t \to \infty$

The second conjugate

$$
f^{**}(x) = \sup_{y \in \operatorname{dom} f^*} (x^T y - f^*(y))
$$

$f^{**}$ is closed and convex

from Fenchel’s inequality ($x^T y - f^*(y) \le f(x)$ for all $y$ and $x$):

$$
f^{**}(x) \le f(x) \quad \forall x
$$

equivalently, $\operatorname{epi} f \subseteq \operatorname{epi} f^{**}$ (for any $f$)

if $f$ is closed and convex, then

$$
f^{**}(x) = f(x) \quad \forall x
$$

equivalently, $\operatorname{epi} f = \operatorname{epi} f^{**}$ (if $f$ is closed convex)

Conjugates and subgradients

if $f$ is closed and convex, then

$$
y \in \partial f(x) \iff x \in \partial f^*(y) \iff x^T y = f(x) + f^*(y)
$$

proof: if $y \in \partial f(x)$, then $f^*(y) = \sup_u (y^T u - f(u)) = y^T x - f(x)$; hence

$$
\begin{aligned}
f^*(v) &= \sup_u (v^T u - f(u)) \\
&\ge v^T x - f(x) \\
&= x^T (v - y) - f(x) + y^T x \\
&= f^*(y) + x^T (v - y)
\end{aligned}
\tag{1}
$$

for all $v$; therefore, $x$ is a subgradient of $f^*$ at $y$ ($x \in \partial f^*(y)$)

reverse implication $x \in \partial f^*(y) \Rightarrow y \in \partial f(x)$ follows from $f^{**} = f$
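A one-dimensional check of the equivalence, as a sketch: for f(x) = exp(x) the conjugate is f*(y) = y*log(y) - y for y > 0 (a standard example, not taken from these slides). With y = f'(x), Fenchel's inequality should hold with equality, and x should equal the derivative of f* at y.

```python
import math

def f(x):
    return math.exp(x)

def f_conj(y):
    """Conjugate of exp: f*(y) = y*log(y) - y for y > 0."""
    return y * math.log(y) - y

x = 0.3
y = math.exp(x)            # y = f'(x), the unique subgradient of f at x

# Fenchel's inequality holds with equality at the pair (x, y) ...
print(x * y - (f(x) + f_conj(y)))

# ... and x is the derivative of f* at y, since (f*)'(y) = log(y).
print(math.log(y) - x)
```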

Some calculus rules

separable sum

$$
f(x_1, x_2) = g(x_1) + h(x_2), \qquad f^*(y_1, y_2) = g^*(y_1) + h^*(y_2)
$$

scalar multiplication (for $\alpha > 0$)

$$
f(x) = \alpha g(x), \qquad f^*(y) = \alpha g^*(y / \alpha)
$$

addition to affine function

$$
f(x) = g(x) + a^T x + b, \qquad f^*(y) = g^*(y - a) - b
$$

infimal convolution

$$
f(x) = \inf_{u + v = x} (g(u) + h(v)), \qquad f^*(y) = g^*(y) + h^*(y)
$$
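Each rule follows directly from the definition of the conjugate. As an illustration, the scalar multiplication rule for $\alpha > 0$ can be derived by factoring $\alpha$ out of the supremum:

```latex
\begin{aligned}
f^*(y) &= \sup_x \big( y^T x - \alpha g(x) \big) \\
       &= \alpha \sup_x \big( (y/\alpha)^T x - g(x) \big) \\
       &= \alpha\, g^*(y/\alpha).
\end{aligned}
```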

Duality and problem reformulations

equivalent formulations of a problem can lead to very different duals

reformulating the primal problem can be useful when the dual is difficult to derive, or uninteresting

Common reformulations

introduce new variables and equality constraints

make explicit constraints implicit or vice-versa

transform objective or constraint functions, e.g., replace $f_0(x)$ by $\phi(f_0(x))$ with $\phi$ convex, increasing

Introducing new variables and equality constraints

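$$
\min\ f_0(Ax + b)
$$

dual function is constant: $g = \inf_x L(x) = \inf_x f_0(Ax + b) = p^*$

we have strong duality, but dual is quite useless

reformulated problem and its dual

$$
\text{(P)}\quad
\begin{aligned}
\min\quad & f_0(y) \\
\text{s.t.}\quad & Ax + b = y
\end{aligned}
\qquad\qquad
\text{(D)}\quad
\begin{aligned}
\max\quad & b^T \nu - f_0^*(\nu) \\
\text{s.t.}\quad & A^T \nu = 0
\end{aligned}
$$

dual function follows from

$$
\begin{aligned}
g(\nu) &= \inf_{x, y} \big( f_0(y) - \nu^T y + \nu^T A x + b^T \nu \big) \\
&=
\begin{cases}
-f_0^*(\nu) + b^T \nu, & A^T \nu = 0 \\
-\infty, & \text{otherwise}
\end{cases}
\end{aligned}
$$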

Norm approximation problem

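$$
\min\ \|Ax - b\|
\qquad \Longleftrightarrow \qquad
\begin{aligned}
\min\quad & \|y\| \\
\text{s.t.}\quad & Ax - b = y
\end{aligned}
$$

can look up conjugate of $\|\cdot\|$, or derive dual directly

$$
\begin{aligned}
g(\nu) &= \inf_{x, y} \big( \|y\| + \nu^T y - \nu^T A x + b^T \nu \big) \\
&=
\begin{cases}
b^T \nu + \inf_y (\|y\| + \nu^T y), & A^T \nu = 0 \\
-\infty, & \text{otherwise}
\end{cases} \\
&=
\begin{cases}
b^T \nu, & A^T \nu = 0,\ \|\nu\|_* \le 1 \\
-\infty, & \text{otherwise}
\end{cases}
\end{aligned}
$$

Dual problem is

$$
\begin{aligned}
\max\quad & b^T \nu \\
\text{s.t.}\quad & A^T \nu = 0, \quad \|\nu\|_* \le 1
\end{aligned}
$$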
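For the Euclidean norm, a dual optimal point can be read off the least-squares residual. The sketch below uses hypothetical data in which A is a single column a, so x is a scalar; it builds nu from the normalized residual and checks that it is dual feasible and attains the primal optimal value.

```python
import math

# Hypothetical data: A is a single column a in R^3, so x is a scalar.
a = [1.0, 1.0, 0.0]
b = [2.0, 0.0, 1.0]

def dot(u, v):
    return sum(ui * vi for ui, vi in zip(u, v))

def norm2(u):
    return math.sqrt(dot(u, u))

# Primal: least squares, x* = a^T b / a^T a, with residual r = a*x* - b.
x_star = dot(a, b) / dot(a, a)
r = [ai * x_star - bi for ai, bi in zip(a, b)]
p_star = norm2(r)

# Dual candidate: nu = -r / ||r||_2, which satisfies a^T nu = 0 and ||nu||_2 = 1
# (this assumes r != 0, i.e. b is not in the range of A).
nu = [-ri / p_star for ri in r]
d_value = dot(b, nu)

print(p_star, d_value, dot(a, nu))
```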

Weak and strong duality

weak duality: $d^* \le p^*$

always holds (for convex and nonconvex problems)

can be used to find nontrivial lower bounds for difficult problems, for example, by solving the maxcut SDP

strong duality: $d^* = p^*$

does not hold in general

(usually) holds for convex problems

conditions that guarantee strong duality in convex problems are called constraint qualifications

Slater’s constraint qualification

strong duality holds for a convex problem

$$
\begin{aligned}
\min\quad & f_0(x) \\
\text{s.t.}\quad & f_i(x) \le 0, \quad i = 1, \dots, m \\
& Ax = b
\end{aligned}
$$

if it is strictly feasible, i.e.,

$$
\exists x \in \operatorname{int} D : \quad f_i(x) < 0, \quad i = 1, \dots, m, \quad Ax = b
$$

also guarantees that the dual optimum is attained (if $p^* > -\infty$)

can be sharpened: e.g., can replace $\operatorname{int} D$ with $\operatorname{relint} D$ (interior relative to affine hull); linear inequalities do not need to hold with strict inequality, ...

there exist many other types of constraint qualifications

Complementary slackness

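assume strong duality holds, $x^*$ is primal optimal, $(\lambda^*, \nu^*)$ is dual optimal

$$
\begin{aligned}
f_0(x^*) = g(\lambda^*, \nu^*) &= \inf_{x \in D} \Big( f_0(x) + \sum_{i=1}^m \lambda_i^* f_i(x) + \sum_{i=1}^p \nu_i^* h_i(x) \Big) \\
&\le f_0(x^*) + \sum_{i=1}^m \lambda_i^* f_i(x^*) + \sum_{i=1}^p \nu_i^* h_i(x^*) \\
&\le f_0(x^*)
\end{aligned}
$$

hence, the two inequalities hold with equality

$x^*$ minimizes $L(x, \lambda^*, \nu^*)$

$\lambda_i^* f_i(x^*) = 0$ for $i = 1, \dots, m$ (known as complementary slackness):

$$
\lambda_i^* > 0 \implies f_i(x^*) = 0, \qquad f_i(x^*) < 0 \implies \lambda_i^* = 0
$$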

Karush-Kuhn-Tucker (KKT) conditions

the following four conditions are called KKT conditions (for a problem with differentiable $f_i$, $h_i$):

1. primal constraints: $f_i(x) \le 0$, $i = 1, \dots, m$, $h_i(x) = 0$, $i = 1, \dots, p$

2. dual constraints: $\lambda \ge 0$

3. complementary slackness: $\lambda_i f_i(x) = 0$, $i = 1, \dots, m$

4. gradient of Lagrangian with respect to $x$ vanishes:

$$
\nabla f_0(x) + \sum_{i=1}^m \lambda_i \nabla f_i(x) + \sum_{i=1}^p \nu_i \nabla h_i(x) = 0
$$

If strong duality holds and $x, \lambda, \nu$ are optimal, then they must satisfy the KKT conditions

KKT conditions for convex problem

If $\tilde{x}, \tilde{\lambda}, \tilde{\nu}$ satisfy KKT for a convex problem, then they are optimal

from complementary slackness: $f_0(\tilde{x}) = L(\tilde{x}, \tilde{\lambda}, \tilde{\nu})$

from 4th condition (and convexity): $g(\tilde{\lambda}, \tilde{\nu}) = L(\tilde{x}, \tilde{\lambda}, \tilde{\nu})$

Hence, $f_0(\tilde{x}) = g(\tilde{\lambda}, \tilde{\nu})$

if Slater’s condition is satisfied: $x$ is optimal if and only if there exist $\lambda, \nu$ that satisfy KKT conditions

recall that Slater implies strong duality, and dual optimum is attained

generalizes optimality condition $\nabla f_0(x) = 0$ for unconstrained problem
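The four KKT conditions can be checked mechanically for a candidate pair. The sketch below does so for a hypothetical one-dimensional convex problem, min (x - 2)^2 s.t. x - 1 <= 0, whose solution is x = 1 with multiplier lam = 2.

```python
# Toy convex problem (hypothetical): min (x - 2)^2  s.t.  x - 1 <= 0.

def kkt_residuals(x, lam):
    """Return the violation of each of the four KKT conditions."""
    f1 = x - 1.0                                   # inequality constraint value
    grad_lagrangian = 2.0 * (x - 2.0) + lam * 1.0  # d/dx of (x-2)^2 + lam*(x-1)
    return {
        "primal": max(f1, 0.0),          # violation of f1(x) <= 0
        "dual": max(-lam, 0.0),          # violation of lam >= 0
        "comp_slack": abs(lam * f1),     # |lam * f1(x)|
        "stationarity": abs(grad_lagrangian),
    }

def satisfies_kkt(x, lam, tol=1e-12):
    return all(v <= tol for v in kkt_residuals(x, lam).values())

print(satisfies_kkt(1.0, 2.0))   # constrained minimizer with its multiplier
print(satisfies_kkt(2.0, 0.0))   # unconstrained minimizer, but infeasible
```

Because the toy problem is convex, a pair that passes this check is optimal; the second candidate fails the primal constraint.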

LP Duality

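Strong duality: If an LP has an optimal solution, so does its dual, and their optimal objective values are equal.

                    primal finite   primal unbounded   primal infeasible
dual finite               √                ×                   ×
dual unbounded            ×                ×                   √
dual infeasible           ×                √                   √

If $p^* = -\infty$, then $d^* \le p^* = -\infty$, hence dual is infeasible

If $d^* = +\infty$, then $+\infty = d^* \le p^*$, hence primal is infeasible

Example in which both primal and dual are infeasible:

$$
\begin{aligned}
\min\quad & x_1 + 2 x_2 \\
\text{s.t.}\quad & x_1 + x_2 = 1 \\
& 2 x_1 + 2 x_2 = 3
\end{aligned}
\qquad\qquad
\begin{aligned}
\max\quad & p_1 + 3 p_2 \\
\text{s.t.}\quad & p_1 + 2 p_2 = 1 \\
& p_1 + 2 p_2 = 2
\end{aligned}
$$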

Problems with generalized inequalities

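$$
\begin{aligned}
\min\quad & f_0(x) \\
\text{s.t.}\quad & f_i(x) \preceq_{K_i} 0, \quad i = 1, \dots, m \\
& h_i(x) = 0, \quad i = 1, \dots, p
\end{aligned}
$$

$f_i(x) \preceq_{K_i} 0$ means $-f_i(x) \in K_i$.

Lagrangian ($\langle \cdot, \cdot \rangle_{K_i}$ is the inner product in $K_i$):

$$
L(x, \lambda_1, \dots, \lambda_m, \nu) = f_0(x) + \sum_{i=1}^m \langle \lambda_i, f_i(x) \rangle_{K_i} + \sum_{i=1}^p \nu_i h_i(x)
$$

dual function

$$
g(\lambda_1, \dots, \lambda_m, \nu) = \inf_{x \in D} L(x, \lambda_1, \dots, \lambda_m, \nu)
$$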

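lower bound property: if $\lambda_i \in K_i^*$, then $g(\lambda_1, \dots, \lambda_m, \nu) \le p^*$

proof: if $\tilde{x}$ is feasible and $\lambda_i \succeq_{K_i^*} 0$, then

$$
\begin{aligned}
f_0(\tilde{x}) &\ge f_0(\tilde{x}) + \sum_{i=1}^m \langle \lambda_i, f_i(\tilde{x}) \rangle_{K_i} + \sum_{i=1}^p \nu_i h_i(\tilde{x}) \\
&\ge \inf_{x \in D} L(x, \lambda_1, \dots, \lambda_m, \nu) \\
&= g(\lambda_1, \dots, \lambda_m, \nu)
\end{aligned}
$$

minimizing over all feasible $\tilde{x}$ gives $g(\lambda_1, \dots, \lambda_m, \nu) \le p^*$.

Dual problem

$$
\begin{aligned}
\max\quad & g(\lambda_1, \dots, \lambda_m, \nu) \\
\text{s.t.}\quad & \lambda_i \succeq_{K_i^*} 0, \quad i = 1, \dots, m
\end{aligned}
$$

weak duality: $p^* \ge d^*$

strong duality: $p^* = d^*$ for convex problem with constraint qualification (for example, Slater’s: primal problem is strictly feasible)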

Semidefinite program

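$$
\begin{aligned}
\min\quad & c^T x \\
\text{s.t.}\quad & x_1 F_1 + \dots + x_n F_n \preceq G
\end{aligned}
$$

$F_i, G \in S^k$, multiplier is matrix $Z \in S^k$

Lagrangian $L(x, Z) = c^T x + \langle Z, x_1 F_1 + \dots + x_n F_n - G \rangle$

dual function

$$
g(Z) = \inf_x L(x, Z) =
\begin{cases}
-\langle G, Z \rangle, & \langle F_i, Z \rangle + c_i = 0, \quad i = 1, \dots, n \\
-\infty, & \text{otherwise}
\end{cases}
$$

Dual SDP

$$
\begin{aligned}
\max\quad & -\langle G, Z \rangle \\
\text{s.t.}\quad & \langle F_i, Z \rangle + c_i = 0, \quad i = 1, \dots, n \\
& Z \succeq 0
\end{aligned}
$$

$p^* = d^*$ if primal SDP is strictly feasible.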

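SDP Relaxation of Maxcut

$$
\begin{aligned}
\min\quad & x^T W x \\
\text{s.t.}\quad & x_i^2 = 1
\end{aligned}
\quad \Longrightarrow \quad
\begin{aligned}
\max\quad & -\mathbf{1}^T \nu \\
\text{s.t.}\quad & W + \operatorname{diag}(\nu) \succeq 0
\end{aligned}
\quad \Longleftrightarrow \quad
\begin{aligned}
\min\quad & \operatorname{Tr}(WX) \\
\text{s.t.}\quad & X_{ii} = 1 \\
& X \succeq 0
\end{aligned}
$$

a nonconvex problem; feasible set contains $2^n$ discrete points

interpretation: partition $\{1, \dots, n\}$ in two sets; $W_{ij}$ is cost of assigning $i, j$ to the same set; $-W_{ij}$ is cost of assigning to different sets

dual function

$$
\begin{aligned}
g(\nu) &= \inf_x \Big( x^T W x + \sum_i \nu_i (x_i^2 - 1) \Big) = \inf_x\ x^T (W + \operatorname{diag}(\nu)) x - \mathbf{1}^T \nu \\
&=
\begin{cases}
-\mathbf{1}^T \nu, & W + \operatorname{diag}(\nu) \succeq 0 \\
-\infty, & \text{otherwise}
\end{cases}
\end{aligned}
$$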
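Any dual feasible nu gives a lower bound on the nonconvex problem without solving an SDP. The sketch below, on a hypothetical 4-node weight matrix, uses a simple (generally loose) choice: nu_i = sum over j != i of |W_ij| minus W_ii, which makes W + diag(nu) diagonally dominant with nonnegative diagonal and therefore positive semidefinite. The best bound would require maximizing over nu, which this example does not attempt.

```python
from itertools import product

# Hypothetical symmetric weight matrix for n = 4 nodes.
W = [[0.0, 1.0, -2.0, 0.5],
     [1.0, 0.0, 1.5, -1.0],
     [-2.0, 1.5, 0.0, 2.0],
     [0.5, -1.0, 2.0, 0.0]]
n = len(W)

def quad_form(x):
    return sum(W[i][j] * x[i] * x[j] for i in range(n) for j in range(n))

# Primal optimum by enumerating all 2^n sign vectors.
p_star = min(quad_form(x) for x in product((-1.0, 1.0), repeat=n))

# A dual feasible point: W + diag(nu) is symmetric and diagonally dominant
# with nonnegative diagonal, hence positive semidefinite.
nu = [sum(abs(W[i][j]) for j in range(n) if j != i) - W[i][i] for i in range(n)]
lower_bound = -sum(nu)

print(lower_bound, p_star)
```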

SOCP/SDP Duality

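$$
\text{(P)}\quad
\begin{aligned}
\min\quad & c^T x \\
\text{s.t.}\quad & Ax = b, \quad x \succeq_Q 0
\end{aligned}
\qquad\qquad
\text{(D)}\quad
\begin{aligned}
\max\quad & b^T y \\
\text{s.t.}\quad & A^T y + s = c, \quad s \succeq_Q 0
\end{aligned}
$$

$$
\text{(P)}\quad
\begin{aligned}
\min\quad & \langle C, X \rangle \\
\text{s.t.}\quad & \langle A_1, X \rangle = b_1 \\
& \dots \\
& \langle A_m, X \rangle = b_m \\
& X \succeq 0
\end{aligned}
\qquad\qquad
\text{(D)}\quad
\begin{aligned}
\max\quad & b^T y \\
\text{s.t.}\quad & \sum_i y_i A_i + S = C \\
& S \succeq 0
\end{aligned}
$$

Strong duality

If $p^* > -\infty$ and (P) is strictly feasible, then (D) is feasible and $p^* = d^*$

If $d^* < +\infty$ and (D) is strictly feasible, then (P) is feasible and $p^* = d^*$

If (P) and (D) both have strictly feasible solutions, then both have optimal solutions.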
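Weak duality for the SOCP pair rests on the second-order cone being self-dual: for primal feasible x and dual feasible (y, s), c^T x - b^T y = s^T x, and s^T x >= 0 whenever x and s both lie in Q. The sketch below samples random points of Q (the sampling scheme and dimension are arbitrary choices made for this illustration) and checks the sign of the inner product.

```python
import math
import random

def in_soc(x, tol=1e-12):
    """Membership in the second-order cone Q = {(x0, xbar) : x0 >= ||xbar||_2}."""
    return x[0] >= math.sqrt(sum(v * v for v in x[1:])) - tol

def random_soc_point(rng, dim):
    """Sample xbar freely, then lift x0 to at least ||xbar||_2."""
    xbar = [rng.uniform(-1.0, 1.0) for _ in range(dim - 1)]
    x0 = math.sqrt(sum(v * v for v in xbar)) + rng.random()
    return [x0] + xbar

rng = random.Random(0)
pairs = [(random_soc_point(rng, 4), random_soc_point(rng, 4)) for _ in range(1000)]

# For primal feasible x and dual feasible (y, s): c^T x - b^T y = s^T x,
# so weak duality follows from s^T x >= 0 for all x, s in Q.
min_inner = min(sum(xi * si for xi, si in zip(x, s)) for x, s in pairs)
print(min_inner)
```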

Failure of SOCP Duality

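$$
\begin{aligned}
\inf\quad & (1, -1, 0)\, x \\
\text{s.t.}\quad & (0, 0, 1)\, x = 1 \\
& x \succeq_Q 0
\end{aligned}
\qquad\qquad
\begin{aligned}
\sup\quad & y \\
\text{s.t.}\quad & (0, 0, 1)^T y + z = (1, -1, 0)^T \\
& z \succeq_Q 0
\end{aligned}
$$

primal: $\min\ x_0 - x_1$ s.t. $x_0 \ge \sqrt{x_1^2 + 1}$. It holds that $x_0 - x_1 > 0$ and $x_0 - x_1 \to 0$ if $x_0 = \sqrt{x_1^2 + 1} \to \infty$. Hence, $p^* = 0$, with no finite solution

dual: $\sup\ y$ s.t. $1 \ge \sqrt{1 + y^2}$. Hence, $y = 0$

$p^* = d^*$ but the primal optimum is not attained.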
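The primal behaviour can be observed numerically. On the boundary x0 = sqrt(x1^2 + 1) the objective x0 - x1 equals 1 / (x0 + x1), because x0^2 - x1^2 = 1; the sketch below uses that form to avoid floating-point cancellation (the grid of x1 values is an arbitrary choice).

```python
import math

def primal_objective(x1):
    """x0 - x1 on the boundary x0 = sqrt(x1^2 + 1), written as 1 / (x0 + x1)
    to avoid cancellation for large x1 (valid because x0^2 - x1^2 = 1)."""
    x0 = math.sqrt(x1 * x1 + 1.0)
    return 1.0 / (x0 + x1)

values = [primal_objective(10.0 ** k) for k in range(0, 7)]
for k, v in enumerate(values):
    print(f"x1 = 1e{k}:  x0 - x1 = {v:.3e}")
```

The values stay strictly positive and decrease toward 0, matching the claim that the infimum 0 is approached but never attained.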

Failure of SDP Duality

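Consider

$$
\begin{aligned}
\min\quad & \left\langle \begin{pmatrix} 0 & 0 & 0 \\ 0 & 0 & 0 \\ 0 & 0 & 1 \end{pmatrix}, X \right\rangle \\
\text{s.t.}\quad & \left\langle \begin{pmatrix} 1 & 0 & 0 \\ 0 & 0 & 0 \\ 0 & 0 & 0 \end{pmatrix}, X \right\rangle = 0 \\
& \left\langle \begin{pmatrix} 0 & 1 & 0 \\ 0 & 0 & 0 \\ 1 & 0 & 2 \end{pmatrix}, X \right\rangle = 2 \\
& X \succeq 0
\end{aligned}
$$

$$
\begin{aligned}
\max\quad & 2 y_2 \\
\text{s.t.}\quad & \begin{pmatrix} 1 & 0 & 0 \\ 0 & 0 & 0 \\ 0 & 0 & 0 \end{pmatrix} y_1 + \begin{pmatrix} 0 & 1 & 0 \\ 0 & 0 & 0 \\ 1 & 0 & 2 \end{pmatrix} y_2 \preceq \begin{pmatrix} 0 & 0 & 0 \\ 0 & 0 & 0 \\ 0 & 0 & 1 \end{pmatrix}
\end{aligned}
$$

primal: $X^* = \begin{pmatrix} 0 & 0 & 0 \\ 0 & 0 & 0 \\ 0 & 0 & 1 \end{pmatrix}$, $p^* = 1$

dual: $y^* = (0, 0)$. Hence, $d^* = 0$

Both problems have finite optimal values, but $p^* \ne d^*$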
