ANDREW TULLOCH

CONVEX OPTIMIZATION

TRINITY COLLEGE
THE UNIVERSITY OF CAMBRIDGE
Contents

1 Introduction
1.1 Setup
1.2 Convexity

2 Convexity

3 Cones and Generalized Inequalities

4 Conjugate Functions
4.1 The Legendre-Fenchel Transform
4.2 Duality Correspondences

5 Duality in Optimization

6 First-order Methods

7 Interior Point Methods

8 Support Vector Machines
8.1 Machine Learning
8.2 Linear Classifiers
8.3 Kernel Trick

9 Total Variation
9.1 Meyer's G-norm
9.2 Non-local Regularization
9.2.1 How to choose w(x, y)?

10 Relaxation
10.1 Mumford-Shah model

11 Bibliography
1
Introduction
1.1 Setup
The setup is as follows: given an energy function f, an input image I, and a reconstructed candidate image u ∈ U, find a minimizer of the problem

min_{u ∈ U} f(u, I)   (1.1)

where f is typically of the form

f(u, I) = l(u, I) + r(u)   (1.2)

with l the data (cost) term and r the regulariser.
Example 1.1 (Rudin-Osher-Fatemi).

min_u ½ ∫_Ω (u − I)² dx + ∫_Ω ‖∇u‖ dx   (1.3)
• Will reduce contrast
• Will not introduce new jumps
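As a concrete illustration (my addition, not part of the notes), the sketch below minimizes a smoothed 1-D analogue of the ROF energy, ½∑(u − I)² + λ∑√((Du)² + ε²), by plain gradient descent. The smoothing parameter ε, the weight λ, the step size, and the finite-difference operator D are all assumptions made for this toy example.

import numpy as np

def denoise_rof_1d(I, lam=0.5, eps=1e-3, tau=0.1, iters=2000):
    """Gradient descent on the smoothed ROF energy
       f(u) = 0.5*sum((u - I)**2) + lam*sum(sqrt((u[i+1]-u[i])**2 + eps**2))."""
    u = I.copy()
    for _ in range(iters):
        du = np.diff(u)                        # forward differences D u
        w = du / np.sqrt(du ** 2 + eps ** 2)   # derivative of the smoothed absolute value
        grad_tv = np.zeros_like(u)
        grad_tv[:-1] -= w                      # d/du_i of phi(u_{i+1} - u_i)
        grad_tv[1:] += w                       # d/du_{i+1} of phi(u_{i+1} - u_i)
        u -= tau * ((u - I) + lam * grad_tv)
    return u

# toy usage: a noisy step signal
rng = np.random.default_rng(0)
I = np.concatenate([np.zeros(50), np.ones(50)]) + 0.1 * rng.standard_normal(100)
u = denoise_rof_1d(I)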
Example 1.2 (Total variation (TV) L1).

min_u ½ ∫_Ω |u − I| dx + λ ∫_Ω ‖∇u‖ dx   (1.4)
• In general does not cause contrast loss.
• Can show that if I is the characteristic function of a ball B_r(0), then

u = I if r ≥ λ/2, and u is constant if r < λ/2.   (1.5)
Example 1.3 (Inpainting).

min_u ½ ∫_{Ω\A} |u − I| dx + λ ∫_Ω ‖∇u‖ dx   (1.6)
1.2 Convexity
Theorem 1.4
If a function f is convex, every local optimizer is a global optimizer.
Theorem 1.5
If a function f is convex, it can (very often) be efficiently optimized (in time polynomial in the number of bits of the input).
In computer vision, problems are:
• Usually large-scale (10^5 – 10^7 variables)
• Usually non-differentiable
Definition 1.6. f is lower semicontinuous if

f(x′) ≤ lim inf_{x→x′} f(x) = min{α ∈ R | ∃ (x_k) → x′ with f(x_k) → α}   (1.7)

Theorem 1.7. Let f : Rn → R. Then the following are equivalent:

(i) f is lower semicontinuous,

(ii) epi f is closed in Rn × R,

(iii) lev_{≤α} f = {x ∈ Rn | f(x) ≤ α} is closed for all α ∈ R.

Proof. ((i) → (ii)) Take (x_k, α_k) ∈ epi f with (x_k, α_k) → (x, α) ∈ Rn × R. Then, by lower semicontinuity, f(x) ≤ lim inf_{k→∞} f(x_k) ≤ lim_{k→∞} α_k = α, hence (x, α) ∈ epi f.

((ii) → (iii)) epi f closed implies epi f ∩ (Rn × {α′}) is closed for all α′ ∈ R, which holds if and only if

{x ∈ Rn | f(x) ≤ α′}   (1.8)

is closed for all α′ ∈ R. If α′ = +∞, lev_{≤+∞} f = Rn. If α′ = −∞, lev_{≤−∞} f = ∩_{k∈N} lev_{≤−k} f, which is an intersection of closed sets and hence closed.

((iii) → (i)) For x_j → x′, take a subsequence x_k → x′ with

f(x_k) → lim inf_{x→x′} f(x) = c.   (1.9)

If c ∈ R, then for every ε > 0 there exists K′ = K(ε) such that f(x_k) ≤ c + ε for all k > K′. Then x_k ∈ lev_{≤c+ε} f, and since this level set is closed, x′ ∈ lev_{≤c+ε} f. As ε > 0 was arbitrary, x′ ∈ lev_{≤c} f, so f(x′) ≤ c = lim inf_{x→x′} f(x). If c = +∞, we are done: f(x′) ≤ +∞ = lim inf. If c = −∞, the same argument applies with f(x_k) ≤ −k, k ∈ N.
Example 1.8. f (x) = x is lower semi-continuous, but does not have a
minimizer.
Definition 1.9. f : Rn → R is level-bounded if and only if lev≤α f is
bounded for all α ∈ R.
This is also known as coercivity.
Example 1.10. f(x) = x² is level-bounded. f(x) = 1/|x| is not level-bounded.

Theorem 1.11. f : Rn → R is level-bounded if and only if f(x_k) → ∞ whenever ‖x_k‖ → ∞.

Proof. Exercise.

Theorem 1.12. Let f : Rn → R be lower semicontinuous, level-bounded, and proper. Then inf_x f(x) ∈ (−∞, +∞) and arg min f = {x | f(x) ≤ inf f} is nonempty and compact.

Proof.

arg min f = {x | f(x) ≤ inf_x f(x)}   (1.10)
          = {x | f(x) ≤ inf f + 1/k ∀k ∈ N}   (1.11)
          = ∩_{k∈N} lev_{≤ inf f + 1/k} f   (1.12)

If inf f is −∞, just replace inf f + 1/k by α_k with α_k → −∞, e.g. α_k = −k, k ∈ N.
These level sets are bounded (by level-boundedness), closed (since f is lower semicontinuous), and non-empty (since 1/k > 0). Each of them is therefore compact; picking a point from each and extracting a convergent subsequence (possible by compactness) gives a limit point contained in every level set, so the intersection arg min f is non-empty and compact.

We still need to show that inf f ≠ −∞. If inf f = −∞, then there exists x ∈ arg min f with f(x) = −∞. Since f is proper, such an x cannot exist. Thus inf f ≠ −∞.
Remark 1.13. For Theorem 1.12, it suffices to have lev≤α f bounded and
non-empty for at least one α ∈ R.
Proposition 1.14. We have the following properties for lower semicontinuity.

(i) If f, g are lower semicontinuous, then f + g is lower semicontinuous.

(ii) If f is lower semicontinuous and λ ≥ 0, then λf is lower semicontinuous.

(iii) If f : Rn → R is lower semicontinuous and g : Rm → Rn is continuous, then f ∘ g is lower semicontinuous.
2
Convexity
Definition 2.1. (i) f : Rn → R is convex if
f ((1− τ)x + τy) ≤ (1− τ) f (x) + τ f (y) (2.1)
for all x, y ∈ Rn, τ ∈ (0, 1).
(ii) A set C ⊆ Rn is convex if and only if I(C) is convex.
(iii) f is strictly convex if and only if (2.1) holds strictly whenever
x 6= y and f (x), f (y) ∈ R.
Remark 2.2. C ⊆ Rn is convex if and only if for all x, y ∈ C, the connect-
ing line segment is contained in C.
Exercise 2.3. Show that

{x | aᵀx + b ≥ 0}   (2.2)

is convex, and that

f(x) = aᵀx + b   (2.3)

is convex.

Definition 2.4. Let x₀, …, x_m ∈ Rn and λ₀, …, λ_m ≥ 0 with ∑_{i=0}^m λ_i = 1. Then ∑_i λ_i x_i is a convex combination of the x_i.
Theorem 2.5. f : Rn → R is convex if and only if

f(∑_{i=1}^n λ_i x_i) ≤ ∑_{i=1}^n λ_i f(x_i)   (2.4)

for all convex combinations ∑_{i=1}^n λ_i x_i. C ⊆ Rn is convex if and only if C contains all convex combinations of its elements.

Proof. Write

∑_{i=1}^m λ_i x_i = λ_m x_m + (1 − λ_m) ∑_{i=1}^{m−1} (λ_i / (1 − λ_m)) x_i,   (2.5)

which is a convex combination of two points (x_m and a convex combination of m − 1 points); proceed by induction on m.

The set version is proven by applying this to I(C).
Proposition 2.6. f : Rn → R is convex implies that the domain of f is
convex.
Proposition 2.7. f : Rn → R is convex if and only if epi f is convex in Rn × R, if and only if

{(x, α) ∈ Rn × R | f(x) < α}   (2.6)

is convex in Rn × R.

Proof. epi f is convex if and only if for all τ ∈ (0, 1), all x, y, and all α ≥ f(x), β ≥ f(y) (that is, (x, α), (y, β) ∈ epi f), we have

f((1 − τ)x + τy) ≤ (1 − τ)α + τβ   (2.7)
⟺ ∀τ ∈ (0, 1), ∀x, y : f((1 − τ)x + τy) ≤ (1 − τ)f(x) + τf(y)   (2.8)
⟺ f is convex.   (2.9)

Proposition 2.8. f : Rn → R convex implies lev_{≤α} f is convex for all α ∈ R.

Proof. For x, y ∈ lev_{≤α} f and τ ∈ (0, 1),

f((1 − τ)x + τy) ≤ (1 − τ)f(x) + τf(y) ≤ α,

which implies lev_{≤α} f is convex. If α = ∞, then lev_{≤α} f = Rn, which is convex.
Theorem 2.9. Let f : Rn → R be convex. Then

(i) arg min f is convex,

(ii) if x is a local minimizer of f, then x is a global minimizer of f,

(iii) if f is strictly convex and proper, then f has at most one global minimizer.

Proof. (i) f ≡ ∞ ⇒ arg min f = ∅, which is convex. f ≢ ∞ ⇒ arg min f = lev_{≤ inf f} f, which is convex by the previous proposition.

(ii) Assume x is a local minimizer and there exists y with f(y) < f(x). Then

f((1 − τ)x + τy) ≤ (1 − τ)f(x) + τf(y) < f(x)

for every τ ∈ (0, 1). Taking τ → 0 shows that x cannot be a local minimizer, and thus no such y can exist.

(iii) Assume x ≠ y both minimize f, which implies f(x) = f(y). Then

f(½x + ½y) < ½f(x) + ½f(y) = f(x) = f(y),

which contradicts x, y being global minimizers.
Proposition 2.10. To construct convex functions:

(i) Let f_i, i ∈ I, be convex; then

f(x) = sup_{i∈I} f_i(x)   (2.10)

is convex.

(ii) Let f_i, i ∈ I, be strictly convex with I finite; then

sup_{i∈I} f_i   (2.11)

is strictly convex.

(iii) Let C_i, i ∈ I, be convex sets; then

∩_{i∈I} C_i   (2.12)

is convex.

(iv) Let f_k, k ∈ N, be convex; then

lim sup_{k→∞} f_k(x)   (2.13)

is convex.
Proof. Exercise.
Example 2.11. (i) C, D convex does not imply C ∪ D is convex (e.g. C, D disjoint).

(ii) f : Rn → R convex and C convex implies f + I(C) is convex.

(iii) f(x) = |x| = max{x, −x} is convex.

(iv) f(x) = ‖x‖_p, p ≥ 1, is convex, as

‖·‖_p = sup_{‖y‖_q = 1} ⟨·, y⟩,  1/p + 1/q = 1.   (2.14)
Theorem 2.12. Let C ⊆ Rn be open and convex. Let f : C → R be differentiable. Then the following are equivalent:

(i) f is [strictly] convex;

(ii) ⟨y − x, ∇f(y) − ∇f(x)⟩ ≥ 0 for all x, y ∈ C [with > 0 for x ≠ y];

(iii) f(x) + ⟨∇f(x), y − x⟩ ≤ f(y) for all x, y ∈ C [with < for x ≠ y];

(iv) if f is additionally twice differentiable: ∇²f(x) is positive semidefinite for all x ∈ C.

Condition (ii) is monotonicity of ∇f.

Proof. Exercise (reduce to n = 1, using that a function is convex on Rn if and only if it is convex along every line).

Remark 2.13. Note that (iv) has no strict counterpart: f(x) = x⁴ is strictly convex, yet ∇²f(0) = 0 is not positive definite.
Proposition 2.14. The following results hold for convex functions.

(i) f₁, …, f_m : Rn → R convex and λ₁, …, λ_m ≥ 0; then

f = ∑_i λ_i f_i   (2.15)

is convex, and strictly convex if there exists i such that λ_i > 0 and f_i is strictly convex.

(ii) f_i : Rn → R convex implies f(x₁, …, x_m) = ∑_i f_i(x_i) is convex, and strictly convex if all f_i are strictly convex.

(iii) f : Rn → R convex, A ∈ Rn×n, b ∈ Rn implies g(x) = f(Ax + b) is convex.
Remark 2.15. The operator norm

‖M‖ = sup_{‖x‖ ≤ 1} ‖Mx‖

is convex (as a supremum of convex functions of M).

Proposition 2.16. (i) C₁, …, C_m convex implies C₁ × ··· × C_m is convex.

(ii) C ⊆ Rn convex, A ∈ Rm×n, b ∈ Rm implies L(C) is convex, where L(x) = Ax + b.

(iii) C ⊆ Rm convex implies L⁻¹(C) is convex.

(iv) f, g convex implies f + g is convex.

(v) f convex, λ ≥ 0 implies λf is convex.
Definition 2.17. For S ⊆ Rn and y ∈ Rn, define the projection of y onto S as

Π_S(y) = arg min_{x∈S} ‖x − y‖₂   (2.16)

Proposition 2.18. If C ⊆ Rn is convex, closed, and non-empty, then Π_C is single-valued - that is, the projection is unique.

Proof. Write

Π_C(y) = arg min_x f(x),   f(x) = ½‖x − y‖²₂ + I_C(x).   (2.17)

To show uniqueness: ½‖x − y‖² is strictly convex, since ∇²(½‖x − y‖²) = I > 0, so f is strictly convex. f is proper (as C ≠ ∅), and thus f has at most one minimizer.

To show existence: f is proper, lower semicontinuous (the quadratic part is continuous, and I_C is lower semicontinuous since C is closed), and level-bounded, as ½‖x − y‖²₂ → ∞ when ‖x‖₂ → ∞ and I_C ≥ 0.

Thus arg min f ≠ ∅.
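A small numerical illustration (my addition): for the closed convex sets C = [lo, hi] (a box) and C = {x : ‖x‖₂ ≤ r} (a ball), the projection Π_C has a closed form and, consistent with Proposition 2.18, is single-valued. The set choices and helper names below are mine.

import numpy as np

def project_box(y, lo, hi):
    """Projection onto the box {x : lo <= x <= hi} (componentwise clipping)."""
    return np.clip(y, lo, hi)

def project_ball(y, r=1.0):
    """Projection onto the Euclidean ball {x : ||x||_2 <= r}."""
    n = np.linalg.norm(y)
    return y if n <= r else (r / n) * y

y = np.array([2.0, -3.0, 0.5])
print(project_box(y, -1.0, 1.0))   # [ 1. -1.  0.5]
print(project_ball(y, r=1.0))      # y rescaled to unit norm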
Definition 2.19. Let S ⊆ Rn be arbitrary. Then

con S = ∩_{C convex, C ⊇ S} C   (2.18)

is the convex hull of S.

Remark 2.20. con S is the smallest convex set that contains S.

Theorem 2.21. Let S ⊆ Rn. Then

con S = {∑_{i=0}^n λ_i x_i | x_i ∈ S, λ_i ≥ 0, ∑_{i=0}^n λ_i = 1}   (2.19)

Proof. Denote the right-hand side of (2.19) by D. D is convex and contains S: if x, y ∈ D, then (1 − τ)x + τy ∈ D. Thus con S ⊆ D.

From the previous theorem, con S convex implies con S contains all convex combinations of points in S. Thus con S ⊇ D.

Thus con S = D.
Definition 2.22. For a set C ⊆ Rn, define cl C as

cl C = {x ∈ Rn | for all open neighborhoods N of x, N ∩ C ≠ ∅},   (2.20)

int C as

int C = {x ∈ Rn | there exists an open neighborhood N of x with N ⊆ C},   (2.21)

and ∂C (the boundary) as

∂C = cl C \ int C.   (2.22)

Remark 2.23.

cl G = ∩_{S closed, S ⊇ G} S   (2.23)
3
Cones and Generalized Inequalities
Definition 3.1. K ⊆ Rn is a cone if and only if 0 ∈ K and λx ∈ K for
all x ∈ K, λ ≥ 0.
Definition 3.2. A cone K is pointed if and only if, for x₁, …, x_n ∈ K,

x₁ + ··· + x_n = 0 ⟺ x₁ = ··· = x_n = 0.   (3.1)

Example 3.3. (i) Rn is a cone, but it is not pointed.

(ii) {(x₁, x₂) ∈ R² | x₂ > 0} is not a cone.

(iii) {(x₁, x₂) ∈ R² | x₂ ≥ 0} is a cone, but is not pointed.

(iv) {x ∈ Rn | x_i ≥ 0, i ∈ {1, …, n}} is a pointed cone.

Proposition 3.4. Let K ⊆ Rn be any set. Then the following are equivalent:

(i) K is a convex cone.

(ii) K is a cone and K + K ⊆ K.

(iii) K ≠ ∅ and ∑_{i=0}^n α_i x_i ∈ K for all x_i ∈ K, α_i ≥ 0.

Proof. Exercise.

Proposition 3.5. If K is a convex cone, then K is pointed if and only if K ∩ (−K) = {0}.

Proof. Exercise.

Definition 3.6. The subdifferential of f at x is

∂f(x) = {v ∈ Rn | f(x) + ⟨v, y − x⟩ ≤ f(y) ∀y},   (3.2)

and

f′(x) = lim_{t→0} (f(x + t) − f(x)) / t.   (3.3)
Proposition 3.7. Let f, g : Rn → R be convex. Then

(i) f differentiable at x implies ∂f(x) = {∇f(x)}.

(ii) f differentiable at x and g(x) ∈ R implies

∂(f + g)(x) = ∇f(x) + ∂g(x)   (3.4)

Proof. We have:

(i) Equivalent to (ii) with g = 0.

(ii) To show LHS ⊇ RHS: if v ∈ ∂g(x), then g(x) + ⟨v, y − x⟩ ≤ g(y), which by Theorem 3.12 in the notes, i.e.

⟨∇f(x), y − x⟩ + f(x) ≤ f(y),   (3.5)

gives

⟨v + ∇f(x), y − x⟩ + (f + g)(x) ≤ (f + g)(y)   (3.6)
⟺ v + ∇f(x) ∈ ∂(f + g)(x).   (3.7)

For the other direction, let v ∈ ∂(f + g)(x). Then

lim inf_{z→x} [f(z) + g(z) − f(x) − g(x) − ⟨v, z − x⟩] / ‖z − x‖ ≥ 0   (3.8)

⇒ lim inf_{z→x} [g(z) − g(x) − ⟨v − ∇f(x), z − x⟩] / ‖z − x‖ ≥ 0  (by differentiability of f at x)   (3.9)

⇒ lim inf_{t↓0} [g(z(t)) − g(x) − ⟨v − ∇f(x), t(y − x)⟩] / (t‖y − x‖) ≥ 0,  with z(t) = (1 − t)x + ty   (3.10)

⇒ lim inf_{t↓0} [t(g(y) − g(x)) − t⟨v − ∇f(x), y − x⟩] / (t‖y − x‖) ≥ 0  (by convexity of g)   (3.11)

⇒ g(y) − g(x) − ⟨v − ∇f(x), y − x⟩ ≥ 0 ⇒ v − ∇f(x) ∈ ∂g(x)   (3.12)

⇒ v ∈ ∇f(x) + ∂g(x).   (3.13)
Theorem 3.8. Let f : Rn → R be proper. Then

x ∈ arg min f ⟺ 0 ∈ ∂f(x)   (3.14)

Proof.

0 ∈ ∂f(x) ⟺ ⟨0, y − x⟩ + f(x) ≤ f(y) ∀y ⟺ f(x) ≤ f(y) ∀y.   (3.15)
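As a worked instance of Theorem 3.8 (my addition): for f(x) = ½(x − z)² + λ|x|, the condition 0 ∈ ∂f(x) = x − z + λ∂|x| yields the soft-thresholding formula, and the sketch below checks it against a brute-force grid minimization. The values of z and λ are arbitrary.

import numpy as np

def soft_threshold(z, lam):
    """The unique x with 0 in the subdifferential of f(x) = 0.5*(x - z)**2 + lam*|x|."""
    return np.sign(z) * np.maximum(np.abs(z) - lam, 0.0)

z, lam = 0.7, 0.5
xs = np.linspace(-2, 2, 200001)
f = 0.5 * (xs - z) ** 2 + lam * np.abs(xs)
x_grid = xs[np.argmin(f)]
print(soft_threshold(z, lam), x_grid)   # both approximately 0.2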
Definition 3.9. For a convex set C ⊆ Rn and a point x ∈ C,

N_C(x) = {v ∈ Rn | ⟨v, y − x⟩ ≤ 0 ∀y ∈ C}   (3.16)

is the "normal cone" of C at x. By convention, N_C(x) = ∅ for all x ∉ C.

Proposition 3.10. Let C be convex and C ≠ ∅. Then

∂I_C(x) = N_C(x)   (3.17)

Proof. For x ∈ C, we have

∂I_C(x) = {v | I_C(x) + ⟨v, y − x⟩ ≤ I_C(y) ∀y ∈ C} = N_C(x).   (3.18)

For x ∉ C: [Fill in proof here]

Proposition 3.11. Let C ⊆ Rn be closed, convex, and non-empty, and let x ∈ Rn. Then

y = Π_C(x) ⟺ x − y ∈ N_C(y)   (3.19)

Proof.

y = Π_C(x) ⟺ y minimizes g(y′) = ½‖y′ − x‖² + I_C(y′),   (3.20)

if and only if 0 ∈ ∂g(y), if and only if 0 ∈ y − x + ∂I_C(y) = y − x + N_C(y), i.e. x − y ∈ N_C(y).

Proposition 3.12. Let f : Rn → R be convex and proper. Then

∂f(x) = ∅ if x ∉ dom f,  and  ∂f(x) = {v ∈ Rn | (v, −1) ∈ N_{epi f}(x, f(x))} if x ∈ dom f.   (3.21)

If x ∈ dom f,

N_{dom f}(x) = {v ∈ Rn | (v, 0) ∈ N_{epi f}(x, f(x))}   (3.22)

Example 3.13. The subdifferential of f : R → R, f(x) = |x|, is

∂f(x) = {sign x} if x ≠ 0,  ∂f(0) = [−1, 1].   (3.23)
Definition 3.14. For C ⊆ Rn, define the affine hull as

aff(C) = ∩_{A affine, C ⊆ A} A   (3.24)

and the relative interior as

rint(C) = {x ∈ Rn | there exists an open neighborhood N of x such that N ∩ aff(C) ⊆ C}.   (3.25)

Example 3.15.

aff(Rn) = Rn   (3.26)
rint([0, 1]²) = (0, 1)²   (3.28)
rint([0, 1] × {0}) = (0, 1) × {0}   (3.29)

Exercise 3.16. We know that

int(A ∩ B) ⊆ int A ∩ int B.   (3.30)

Does

rint(A ∩ B) ⊆ rint A ∩ rint B   (3.31)

hold?
Proposition 3.17. Let f : Rn → R be convex. Then:

(i) g(x) = f(x + y) ⇒ ∂g(x) = ∂f(x + y);
    g(x) = f(λx) ⇒ ∂g(x) = λ∂f(λx);
    g(x) = λf(x) ⇒ ∂g(x) = λ∂f(x).

(ii) Let f : Rn → R be proper and convex, and A ∈ Rn×m such that

{Ay | y ∈ Rm} ∩ rint dom f ≠ ∅.   (3.32)

Then for x ∈ dom(f ∘ A) we have

∂(f ∘ A)(x) = Aᵀ∂f(Ax)   (3.33)

(iii) Let f₁, …, f_m : Rn → R be proper and convex with rint dom f₁ ∩ ··· ∩ rint dom f_m ≠ ∅. Then, for f = f₁ + ··· + f_m,

∂f(x) = ∂f₁(x) + ··· + ∂f_m(x)   (3.34)
4
Conjugate Functions
4.1 The Legendre-Fenchel Transform
Definition 4.1. For f : Rn → R, define

con f(x) = sup_{g ≤ f, g convex} g(x);   (4.1)

con f is the convex hull of f.

Proposition 4.2. con f is the greatest convex function majorized by f.

Definition 4.3. For f : Rn → R, the (lower) closure of f is defined as

cl f(x) = lim inf_{y→x} f(y)   (4.2)
Proposition 4.4. For f : Rn → R,
epi(cl f ) = cl(epi f ) (4.3)
In particular, if f is convex, then cl f is convex.
Proof. Exercise.
Proposition 4.5. If f : Rn → R, then

(cl f)(x) = sup_{g ≤ f, g lsc} g(x)   (4.4)

Proof. [Fill in]

Theorem 4.6. Let C ⊆ Rn be closed and convex. Then

C = ∩_{(b,β) s.t. C ⊆ H_{b,β}} H_{b,β}   (4.5)

where

H_{b,β} = {x ∈ Rn | ⟨x, b⟩ − β ≤ 0}   (4.6)
Proof.
Theorem 4.7. Let f : Rn → R be proper, lsc, and convex. Then

f(x) = sup_{g ≤ f, g affine} g(x)   (4.7)

Proof. This is quite an involved proof.

Definition 4.8. For f : Rn → R, let f* : Rn → R be defined by

f*(v) = sup_{x∈Rn} {⟨v, x⟩ − f(x)};   (4.8)

f* is the conjugate of f. The mapping f ↦ f* is the Legendre-Fenchel transform.
Remark 4.9. For v ∈ Rn,

f*(v) = sup_{x∈Rn} {⟨v, x⟩ − f(x)}   (4.9)
⇒ f*(v) ≥ ⟨v, x⟩ − f(x) ∀x ∈ Rn ⟺ f(x) ≥ ⟨v, x⟩ − f*(v) ∀x ∈ Rn.   (4.10)

Thus x ↦ ⟨v, x⟩ − f*(v) is the largest affine function with gradient v that is majorized by f.
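A quick numerical sanity check of Definition 4.8 (my addition): for f(x) = ½ a x² the conjugate is f*(v) = v²/(2a), and a brute-force supremum over a grid reproduces it. The grid and the value of a are arbitrary choices.

import numpy as np

def conjugate_on_grid(f, xs, v):
    """Brute-force Legendre-Fenchel transform f*(v) = sup_x <v, x> - f(x) over a grid xs."""
    return np.max(v * xs - f(xs))

a = 2.0
f = lambda x: 0.5 * a * x ** 2
xs = np.linspace(-10, 10, 20001)
for v in [-1.0, 0.0, 2.5]:
    print(v, conjugate_on_grid(f, xs, v), v ** 2 / (2 * a))   # numerical vs closed form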
Theorem 4.10. Assume f : Rn → R. Then

(i) f* = (con f)* = (cl f)* = (cl con f)*, and f ≥ f** = (f*)*, the biconjugate of f.

(ii) If con f is proper, then f*, f** are proper, lower semicontinuous, and convex, and f** = cl con f.

(iii) If f : Rn → R is proper, lower semicontinuous, and convex, then

f** = f   (4.11)
Proof. We have

(v, β) ∈ epi f* ⟺ β ≥ ⟨v, x⟩ − f(x) ∀x ∈ Rn ⟺ f(x) ≥ ⟨v, x⟩ − β ∀x ∈ Rn.   (4.12)

We claim that for an affine function h we have h ≤ f ⟺ h ≤ con f ⟺ h ≤ cl f ⟺ h ≤ cl con f. This holds because con f is the largest convex function less than or equal to f and h is convex; the same argument applies to cl, cl con, etc.

Thus in (4.12) we can replace f by con f, cl f, or cl con f, which gives the first part of (i).

We also have

f**(y) = sup_{v∈Rn} {⟨v, y⟩ − sup_{x∈Rn} {⟨v, x⟩ − f(x)}}   (4.13)
       ≤ sup_{v∈Rn} {⟨v, y⟩ − ⟨v, y⟩ + f(y)}   (4.14)
       = f(y).   (4.15)

For the second part, we have that con f is proper, and claim that cl con f is proper, lower semicontinuous, and convex. Lower semicontinuity is immediate. Convexity is given by the previous proposition, that f convex implies cl f convex. Properness is to be shown in an exercise.

Applying the previous theorem,

cl con f(x) = sup_{g ≤ cl con f, g affine} g(x)   (4.16)
            = sup_{(v,β) ∈ epi (cl con f)*} ⟨v, x⟩ − β   (4.17)
            = sup_{(v,β) ∈ epi f*} ⟨v, x⟩ − β   (4.18)
            = sup_{v ∈ dom f*} ⟨v, x⟩ − f*(v)   (4.19)
            = sup_{v∈Rn} ⟨v, x⟩ − f*(v) = f**(x),   (4.20)

using that an affine g(x) = ⟨v, x⟩ − β satisfies g ≤ cl con f ⟺ (v, β) ∈ epi(cl con f)* ⟺ (v, β) ∈ epi f*.

To show f* is proper, lower semicontinuous, and convex: epi f* is, by (4.12), an intersection of closed convex sets (one closed half-space per x), and is therefore closed and convex; hence f* is lower semicontinuous and convex.

To show properness: con f proper implies there exists x ∈ Rn with con f(x) < ∞, and then f*(v) = (con f)*(v) ≥ ⟨v, x⟩ − con f(x) > −∞ for every v. If f* ≡ +∞, then cl con f = f** = sup_v ⟨v, x⟩ − f*(v) ≡ −∞, contradicting that cl con f is proper; hence f* is proper, lower semicontinuous, and convex. Applying the same argument to f* (con f* = f* is proper by the previous result), f** is also proper, lower semicontinuous, and convex.

For part (iii), apply part (ii): f is convex, which implies con f = f, and con f is proper (as f is proper); since f is lower semicontinuous and convex,

f** = cl con f = f.
4.2 Duality Correspondences
Theorem 4.11. Let f : Rn → R be proper, lower semicontinuous, and convex. Then:

(i) ∂f* = (∂f)⁻¹.

(ii) v ∈ ∂f(x) ⟺ f(x) + f*(v) = ⟨v, x⟩ ⟺ x ∈ ∂f*(v).

(iii)

∂f(x) = arg max_{v′} {⟨v′, x⟩ − f*(v′)}   (4.21)
∂f*(v) = arg max_{x′} {⟨v, x′⟩ − f(x′)}   (4.22)

Proof. (i) This is immediate from (ii).

(ii)

f(x) + f*(v) = ⟨v, x⟩   (4.24)
⟺ f*(v) = ⟨v, x⟩ − f(x)   (4.25)
⟺ x ∈ arg max_{x′} {⟨v, x′⟩ − f(x′)}   (4.26)
…   (4.27)

[Finish off proof]
Proposition 4.12. Let f : Rn → R be proper, lower semicontinuous, and convex. Then

(f(·) − ⟨a, ·⟩)* = f*(· + a)   (4.28)
(f(· + b))* = f*(·) − ⟨·, b⟩   (4.29)
(f(·) + c)* = f*(·) − c   (4.30)
(λf(·))* = λf*(·/λ),  λ > 0   (4.31)
(λf(·/λ))* = λf*(·)   (4.32)

Proof. Exercise.

Proposition 4.13. Let f_i : Rn → R, i = 1, …, m, be proper, and f(x₁, …, x_m) = ∑_i f_i(x_i). Then f*(v₁, …, v_m) = ∑_i f_i*(v_i).

Definition 4.14. For any set S ⊆ Rn, define the support function

G_S(v) = sup_{x∈S} ⟨v, x⟩ = (δ_S)*(v)   (4.33)

Definition 4.15. A function h : Rn → R is positively homogeneous if 0 ∈ dom h and h(λx) = λh(x) for all λ > 0, x ∈ Rn.

Proposition 4.16. The set of positively homogeneous proper lower semicontinuous convex functions and the set of closed convex nonempty sets are in one-to-one correspondence through the Legendre-Fenchel transform:

δ_C ↔ G_C   (4.34)

with

x ∈ ∂G_C(v) ⟺ x ∈ C, G_C(v) = ⟨v, x⟩   (4.35)
⟺ v ∈ N_C(x) = ∂δ_C(x).   (4.36)

The set of closed convex cones is in one-to-one correspondence with itself:

δ_K ↔ δ_{K*}   (4.37)
K* = {v ∈ Rn | ⟨v, x⟩ ≤ 0 ∀x ∈ K}   (4.38)
5
Duality in Optimization
Definition 5.1. For f : Rn × Rm → R proper, lower semicontinuous, and convex, we define the primal and dual problems

inf_{x∈Rn} φ(x),   φ(x) = f(x, 0)   (5.1)
sup_{y∈Rm} ψ(y),   ψ(y) = −f*(0, y)   (5.2)

and the inf-projections

p(u) = inf_{x∈Rn} f(x, u)   (5.3)
q(v) = inf_{y∈Rm} f*(v, y)   (5.4)

f is the perturbation function for φ, and p is the associated projection function.

Consider the problem

inf_x ½‖x − z‖² + δ_{≥0}(Ax − b) = inf_x φ(x)   (5.5)

and the perturbed problem

f(x, u) = ½‖x − z‖² + δ_{≥0}(Ax − b + u)   (5.6)
Proposition 5.2. Assume f satisfies the assumptions in Definition 5.1. Then

(i) φ, −ψ are convex and lower semicontinuous,

(ii) p, q are convex,

(iii) p(0) = inf_x φ(x),

(iv) p**(0) = sup_y ψ(y),

(v) inf_x φ(x) < ∞ ⟺ 0 ∈ dom p,

(vi) sup_y ψ(y) > −∞ ⟺ 0 ∈ dom q.

Proof. (i) φ is clearly convex and lower semicontinuous. For ψ: f* is lower semicontinuous and convex, which implies −ψ = f*(0, ·) is lower semicontinuous and convex.

(ii) Look at the strict epigraph of p:

E = {(u, α) ∈ Rm × R | p(u) < α}   (5.7)
  = {(u, α) ∈ Rm × R | ∃x : f(x, u) < α}   (5.8)
  = A({(u, α, x) ∈ Rm × R × Rn | f(x, u) < α}),   (5.9)

where A is the linear map (u, α, x) ↦ (u, α),   (5.10)

so E is the image of a convex set under a linear map. As the strict epigraph E is convex, p must be convex. Similarly for q.

(iii) For p(0), this holds by definition. For (iv),

p*(y) = sup_u {⟨y, u⟩ − p(u)}   (5.11)
      = sup_{u,x} {⟨y, u⟩ − f(x, u)}   (5.12)
      = f*(0, y)   (5.13)

and

p**(0) = sup_y {⟨0, y⟩ − p*(y)}   (5.14)
       = sup_y −f*(0, y)   (5.15)
       = sup_y ψ(y).   (5.16)

(v) By definition, 0 ∈ dom p ⟺ p(0) = inf_x φ(x) < ∞.

[Complete proof]
Theorem 5.3. Let f be as in Definition 5.1. Then weak duality holds:

p(0) = inf_x φ(x) ≥ sup_y ψ(y) = p**(0),   (5.17)

and under certain conditions the inf and sup are equal and finite (strong duality):

p(0) ∈ R and p is lower semicontinuous at 0 if and only if inf φ(x) = sup ψ(y) ∈ R.

Definition 5.4.

inf φ − sup ψ   (5.18)

is the duality gap.

Proof. (⇐) If inf φ = sup ψ ∈ R, then p(0) = p**(0) ∈ R, and p** ≤ cl p ≤ p ⇒ cl p(0) = p(0), i.e. p is lower semicontinuous at 0.

(⇒) cl p is lower semicontinuous and convex, and cl p(0) = p(0) ∈ R implies cl p is proper, so

sup ψ = (p*)*(0) = (cl p)**(0) = cl p(0) = p(0) = inf φ.
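A tiny numerical check of Theorem 5.3 (my addition), using the 1-D special case of the perturbed problem (5.5)-(5.6) with A = I, b = 0: the primal is inf_x ½(x − z)² + δ_{≥0}(x), and working out ψ(y) = −f*(0, y) by hand gives the dual sup_{y≤0} zy − ½y². The grid search below confirms that the two optimal values agree (zero duality gap).

import numpy as np

z = -1.0
xs = np.linspace(-5, 5, 200001)
ys = np.linspace(-5, 0, 100001)

phi = 0.5 * (xs - z) ** 2 + np.where(xs >= 0, 0.0, np.inf)   # primal objective phi(x)
psi = z * ys - 0.5 * ys ** 2                                  # dual objective psi(y) on y <= 0

print(phi.min(), psi.max())   # both approximately 0.5: no duality gap for this instance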
Proposition 5.5.

[Fill in notes from lecture]
6
First-order Methods
The idea is to do gradient descent on our objective function f, with step size τ_k.

Definition 6.1. Let f : Rn → R. Then

(i)

F_{τ_k} f(x_k) = x_k − τ_k ∂f(x_k) = (I − τ_k ∂f)(x_k)   (6.1)

(ii)

B_{τ_k} f(x_k) = (I + τ_k ∂f)⁻¹(x_k)   (6.2)

so

…   (6.3)

[Fill in]

Proposition 6.2. If f : Rn → R is proper, lower semicontinuous, and convex, then for τ_k > 0,

B_{τ_k} f(x) = arg min_y { 1/(2τ_k) ‖y − x‖²₂ + f(y) }   (6.4)

Proof.

y ∈ arg min_y {…} ⟺ 0 ∈ (1/τ_k)(y − x) + ∂f(y)   (6.5)
⟺ x ∈ (I + τ_k ∂f)(y) ⟺ y = B_{τ_k} f(x).   (6.6)
[Missed lecture material]
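To make the forward and backward operators concrete (my addition; the notes stop here): for a sum g + h with smooth g(x) = ½‖Ax − b‖² and nonsmooth h(x) = λ‖x‖₁, alternating a forward (gradient) step on g with a backward (proximal) step on h gives the classical forward-backward / ISTA scheme sketched below. The matrix A, the data b, λ, and the step-size rule are illustrative choices.

import numpy as np

def prox_l1(x, t):
    """Backward step for the l1 term with threshold t = tau*lam: soft-thresholding."""
    return np.sign(x) * np.maximum(np.abs(x) - t, 0.0)

def ista(A, b, lam, iters=500):
    """Forward-backward splitting on 0.5*||Ax - b||^2 + lam*||x||_1."""
    tau = 1.0 / np.linalg.norm(A, 2) ** 2           # step size 1/L with L = ||A^T A||
    x = np.zeros(A.shape[1])
    for _ in range(iters):
        grad = A.T @ (A @ x - b)                    # forward (explicit) step on the smooth part
        x = prox_l1(x - tau * grad, tau * lam)      # backward (implicit) step on the l1 part
    return x

rng = np.random.default_rng(0)
A = rng.standard_normal((40, 100))
x_true = np.zeros(100)
x_true[[3, 17, 42]] = [1.0, -2.0, 0.5]
b = A @ x_true
print(ista(A, b, lam=0.1)[[3, 17, 42]])             # roughly recovers the nonzero entries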
7
Interior Point Methods
Theorem 7.1 (Problem). Consider the problem

inf_x ⟨c, x⟩ + δ_K(Ax − b)   (7.1)

with K a closed convex cone - thus Ax ≥_K b.

The idea is to replace δ_K by a smooth approximation F, with F(x) → ∞ as x → boundary of K, and solve inf ⟨c, x⟩ + (1/t)F(x), equivalent to solving inf t⟨c, x⟩ + F(x).
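A minimal sketch of this idea (my addition) for K the nonnegative orthant, with the log-barrier F(s) = −∑_i log s_i: the inner problem inf_x t⟨c, x⟩ + F(Ax − b) is solved here by gradient descent with backtracking, and t is increased multiplicatively. A real interior point method would use Newton steps; the toy LP and all parameters are assumptions for illustration.

import numpy as np

def barrier_method(c, A, b, x0, mu=5.0, outer=12, inner=200):
    """Minimize <c, x> s.t. Ax >= b via the log-barrier F(s) = -sum(log s)."""
    def obj(x, t):
        s = A @ x - b
        if np.any(s <= 0):
            return np.inf                           # outside the interior
        return t * (c @ x) - np.sum(np.log(s))
    x, t = x0.copy(), 1.0                           # x0 must satisfy A x0 > b strictly
    for _ in range(outer):
        for _ in range(inner):                      # inner loop: gradient descent with backtracking
            s = A @ x - b
            g = t * c - A.T @ (1.0 / s)
            step = 1.0
            while obj(x - step * g, t) > obj(x, t) - 0.5 * step * (g @ g):
                step *= 0.5
                if step < 1e-12:
                    break
            x = x - step * g
        t *= mu                                     # move further along the central path
    return x

# toy LP: minimize x1 + x2 subject to x >= 0 and x1 + x2 >= 1
c = np.array([1.0, 1.0])
A = np.vstack([np.eye(2), np.ones((1, 2))])
b = np.array([0.0, 0.0, 1.0])
print(barrier_method(c, A, b, x0=np.array([2.0, 2.0])))   # approaches the facet x1 + x2 = 1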
Proposition 7.2. Let F be the canonical barrier for K. Then F is smooth on dom F = int K and strictly convex, with

F(tx) = F(x) − Θ_F ln t,   (7.2)

and

(i) −∇F(x) ∈ dom F,

(ii) ⟨∇F(x), x⟩ = −Θ_F,

(iii) −∇F(−∇F(x)) = x,

(iv) ∇F(tx) = (1/t) ∇F(x).
For the problem

inf_x ⟨c, x⟩   (7.3)

such that Ax ≥_K b, with K a closed, convex, self-dual cone, we have

inf_x ⟨c, x⟩ + δ_K(Ax − b)   (7.4)
⇒ sup_y −⟨b, y⟩ − h*(−Aᵀy − c) − k*(y)   (7.5)
⟺ sup_y ⟨b, y⟩ s.t. y ≥_K 0, Aᵀy = c   (7.6)

and, with the barrier,

⟺ inf_x ⟨c, x⟩ + F(Ax − b)   (7.7)
⟺ sup_y ⟨b, y⟩ − F(y) − δ_{Aᵀy=c}(y)   (7.8)

[More conditions on, etc]
Proposition 7.3. For t > 0, define

x(t) = arg min_x t⟨c, x⟩ + F(Ax − b)   (7.9)
y(t) = arg min_y −t⟨b, y⟩ + F(y) + δ_{Aᵀy=c}(y)   (7.10)
z(t) = (x(t), y(t))   (7.11)

These paths exist, are unique, and

(x, y) = (x(t), y(t)) ⟺ Aᵀy = c and ty + ∇F(Ax − b) = 0.   (7.12)

Proof. The first part follows by definition.

For (7.12), the primal optimality condition is 0 = tc + Aᵀ∇F(Ax − b). The dual optimality condition is 0 ∈ −tb + ∇F(y) + N_{{z | Aᵀz = c}}(y), which is

⟺ 0 ∈ −tb + ∇F(y) + (∅ if Aᵀy ≠ c, range A if Aᵀy = c)   (7.13)
⟺ 0 ∈ −tb + ∇F(y) + range A, Aᵀy = c.   (7.14)

Assume x satisfies the primal optimality condition and set y = −(1/t)∇F(Ax − b). Then

tc + Aᵀ∇F(Ax − b) = 0 ⟺ tc + Aᵀ(−ty) = 0   (7.15)
⟺ Aᵀy = c.   (7.16)

[A few more steps...]

Proposition 7.4. If x, y are feasible, then

φ(x) − ψ(y) = ⟨y, Ax − b⟩.   (7.17)

Moreover, for (x(t), y(t)) on the central path,

φ(x(t)) − ψ(y(t)) = Θ_F / t.   (7.18)

Proof.

φ(x) − ψ(y) = ⟨c, x⟩ − ⟨b, y⟩   (7.19)
            = ⟨Aᵀy, x⟩ − ⟨b, y⟩   (7.20)
            = ⟨Ax − b, y⟩   (7.21)

For the second part, we have

φ(x(t)) − ψ(y(t)) = ⟨y(t), Ax(t) − b⟩   (7.22)
                  = ⟨−(1/t) ∇F(Ax(t) − b), Ax(t) − b⟩   (7.23)
                  = Θ_F / t,   (7.24)

using property (ii) of Proposition 7.2.

[Fill in from lecture notes]
8
Support Vector Machines
8.1 Machine Learning
Given X = {x₁, …, x_n}, a sample set with x_i ∈ F ⊆ Rm, F our feature space, and C a set of classes, we seek to find

h_θ : F → C,  θ ∈ Θ,   (8.1)

with Θ our parameter space. Our task is to find θ such that h_θ is the "best" mapping from X into C.

(i) Unsupervised learning (only X is known, usually |C| not known):

• Clustering

• Outlier detection

• Mapping to a lower-dimensional subspace

(ii) Supervised learning (|C| known). We have training data T = {(x₁, y₁), …, (x_n, y_n)}, with y_i ∈ C.

We seek to find

θ = arg min_{θ∈Θ} f(θ, T) = ∑_{i=1}^n g(y_i, h_θ(x_i)) + R(θ)   (8.2)
8.2 Linear Classifiers

The idea is to consider

h_θ : Rm → {−1, 1},  θ = (w, b),   (8.3)
h_θ(x) = sign(⟨w, x⟩ + b).   (8.4)

We want to consider maximum-margin classifiers, satisfying

max_{w,b, w≠0} min_i y_i (⟨w/‖w‖, x_i⟩ + b/‖w‖),   (8.5)

which can be rewritten as

max_{w,b} c   (8.6)

such that

c ≤ y_i (⟨w/‖w‖, x_i⟩ + b/‖w‖)   (8.7)
⟺ ‖w‖ c ≤ y_i (⟨w, x_i⟩ + b),   (8.8)

or just

min_{w,b} ½‖w‖²   (8.10)

such that

1 ≤ y_i (⟨w, x_i⟩ + b).   (8.11)
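As an illustration (my addition, not the derivation in the notes): the hard-margin problem above is commonly relaxed to the soft-margin hinge-loss form min_{w,b} ½‖w‖² + C ∑_i max(0, 1 − y_i(⟨w, x_i⟩ + b)), which a plain subgradient method can minimize. The data, C, the step sizes, and the iteration count below are arbitrary choices for this sketch.

import numpy as np

def svm_subgradient(X, y, C=1.0, iters=2000):
    """Subgradient descent on 0.5*||w||^2 + C * sum_i max(0, 1 - y_i*(<w, x_i> + b))."""
    n, m = X.shape
    w, b = np.zeros(m), 0.0
    for k in range(1, iters + 1):
        margins = y * (X @ w + b)
        active = margins < 1.0                        # points violating the margin
        gw = w - C * (y[active, None] * X[active]).sum(axis=0)
        gb = -C * y[active].sum()
        step = 1.0 / k                                # diminishing step size
        w -= step * gw
        b -= step * gb
    return w, b

# linearly separable toy data
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(-2, 1, (50, 2)), rng.normal(2, 1, (50, 2))])
y = np.hstack([-np.ones(50), np.ones(50)])
w, b = svm_subgradient(X, y)
print(np.mean(np.sign(X @ w + b) == y))               # training accuracy, close to 1.0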
Definition 8.1. In standard form,

inf_{w,b} k(w, b) + h(M(w, b) − e)   (8.12)

The conjugates are

k(w, b) = ½‖w‖²   (8.13)
k*(u, c) = ½‖u‖² + δ_{{0}}(c)   (8.14)
h(z) = δ_{≥0}(z),  h*(v) = δ_{≤0}(v).   (8.15)

In saddle-point form,

inf_{w,b} sup_z ½‖w‖²₂ + ⟨M(w, b) − e, z⟩ − δ_{≤0}(z)   (8.16)

The dual problem is

sup_z −⟨e, z⟩ − δ_{≤0}(z) − ½ ‖∑_{i=1}^n y_i x_i z_i‖²₂ − δ_{{0}}(⟨y, z⟩)   (8.17)

and thus

inf_z ½ ‖∑_i y_i x_i z_i‖²₂ + ⟨e, z⟩   (8.18)

such that z ≤ 0, ⟨y, z⟩ = 0.

The optimality conditions are

…   (8.19)

[Fill in rest of optimality conditions]

We use the fact that if k(x) + h(Ax − b) is our primal, then the dual is sup_y −⟨b, y⟩ − k*(−Aᵀy) − h*(y).

[Explanation of support vectors]
8.3 Kernel Trick
The idea is to embed our features into a higher-dimensional space via a mapping function φ. Then our decision function takes the form

h_θ(x) = sign(⟨φ(x), w⟩ + b)   (8.20)
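A short sketch (my addition) of how the kernel trick is used in practice: instead of computing φ explicitly, one works with the Gram matrix K_ij = k(x_i, x_j) and a decision function h(x) = sign(∑_i α_i y_i k(x_i, x) + b), here with a Gaussian (RBF) kernel. The coefficients alpha below are hypothetical placeholders that would normally come from solving the dual problem.

import numpy as np

def rbf_kernel(X, Z, gamma=0.5):
    """Gram matrix K[i, j] = exp(-gamma * ||X_i - Z_j||^2), without forming phi explicitly."""
    sq = ((X[:, None, :] - Z[None, :, :]) ** 2).sum(axis=-1)
    return np.exp(-gamma * sq)

def decision_function(X_train, y_train, alpha, b, X_test, gamma=0.5):
    """h(x) = sign(sum_i alpha_i y_i k(x_i, x) + b)."""
    K = rbf_kernel(X_test, X_train, gamma)
    return np.sign(K @ (alpha * y_train) + b)

# usage sketch: alpha would come from the dual SVM; here it is a dummy placeholder
X_train = np.array([[0.0, 0.0], [1.0, 1.0]])
y_train = np.array([-1.0, 1.0])
alpha = np.array([1.0, 1.0])                 # hypothetical dual variables
print(decision_function(X_train, y_train, alpha, 0.0, np.array([[0.9, 0.9]])))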
9
Total Variation
9.1 Meyer’s G-norm
Idea: a regulariser that favors textured regions.

‖u‖_G = inf {‖v‖_∞ | div v = u, v ∈ L∞(Rd, Rd)}   (9.1)

Discretized, we have

‖u‖_G = inf_v δ*_C(v) + δ_{{−Gᵀv = u}}(v),   (9.2)

C = {v | ∑_x ‖v(x)‖₂ ≤ 1}   (9.3)

= inf_v sup_w δ*_C(v) + ⟨w, Gᵀv + u⟩   (9.4)
= sup_w −(sup_v ⟨v, −Gw⟩ − δ*_C(v)) + ⟨w, u⟩   (9.5)
= sup_w −δ_C(−Gw) + ⟨w, u⟩   (9.6)
= sup_w ⟨w, u⟩ − δ_C(−Gw)   (9.7)
= sup {⟨w, u⟩ | TV(w) ≤ 1},   (9.8)

and so ‖·‖_G is the dual to TV:

‖·‖_G = δ*_{B_TV}   (9.9)

where B_TV = {u | TV(u) ≤ 1}. Similarly,

sup {⟨u, w⟩ | ‖w‖_G ≤ 1}   (9.10)
= sup {⟨u, w⟩ | ∃v : w = −Gᵀv, ‖v‖_∞ ≤ 1}   (9.11)
= sup {⟨u, −Gᵀv⟩ | ‖v‖_∞ ≤ 1}   (9.12)
= TV(u).   (9.13)

Why is ‖·‖_G good in separating noise?

arg min_u ½‖u − g‖²₂ s.t. ‖u‖_G ≤ λ   (9.14)
= Π_{λB_{‖·‖_G}}(g) = B(δ_{λB_{‖·‖_G}})(g) = g − B(λTV)(g),   (9.15)

i.e. the constrained projection returns g minus the TV-denoised (ROF) solution, which is exactly the noise/texture component.
9.2 Non-local Regularization
In real-world images, large ‖∇u‖ does not always mean noise.

Definition 9.1. Let Ω = {1, …, n}. Given u ∈ Rn, x, y ∈ Ω, and w : Ω² → R_{≥0}, define

∂_y u(x) = (u(y) − u(x)) w(x, y)   (9.16)
∇_w u(x) = (∂_y u(x))_{y∈Ω}   (9.17)

A suitable divergence is div_w v(x) = ∑_{y∈Ω} (v(x, y) − v(y, x)) w(x, y), adjoint to ∇_w with respect to the Euclidean inner products:

⟨−div_w v, u⟩ = ⟨v, ∇_w u⟩   (9.18)

Non-local regularisers are

J(u) = ∑_{x∈Ω} g(∇_w u(x))   (9.19)

with

TV^g_NL(u) = ∑_{x∈Ω} ‖∇_w u(x)‖₂   (9.20)
TV^d_NL(u) = ∑_{x∈Ω} ∑_{y∈Ω} |∂_y u(x)|   (9.21)

This reduces to classical TV if the weights are chosen as constant (w(x, y) = 1/h).
9.2.1 How to choose w(x, y)?
(i) Large if the neighborhoods of x and y are similar with respect to a distance metric.

(ii) Sparse, otherwise we have n² terms in the regulariser (n is the number of pixels).

A possible choice:

d_u(x, y) = ∫_Ω K_G(t) (u(y + t) − u(x + t))² dt   (9.22)

A(x) = arg min_A {∑_{y∈A} d_u(x, y) | A ⊆ S(x), |A| = k}   (9.23)

w(x, y) = 1 if y ∈ A(x) and x ∈ A(y), 0 otherwise   (9.24)

with K_G a Gaussian kernel of variance G².
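A sketch (my own, following the construction just described) of computing sparse non-local weights for a 1-D signal: for each pixel x, patch distances d_u(x, y) are computed with a Gaussian-weighted window, the k nearest neighbours form A(x), and w(x, y) = 1 when the relationship is mutual. The window size, k, and the search region S(x) = all pixels are illustrative choices.

import numpy as np

def nonlocal_weights(u, patch_radius=2, k=5, sigma=1.0):
    """Sparse weights w(x, y) from Gaussian-weighted patch distances on a 1-D signal."""
    n = len(u)
    t = np.arange(-patch_radius, patch_radius + 1)
    kern = np.exp(-t ** 2 / (2 * sigma ** 2))                    # K_G(t)
    up = np.pad(u, patch_radius, mode="edge")
    patches = np.stack([up[i:i + 2 * patch_radius + 1] for i in range(n)])
    # d_u(x, y) = sum_t K_G(t) * (u(y + t) - u(x + t))**2
    d = ((patches[:, None, :] - patches[None, :, :]) ** 2 * kern).sum(axis=-1)
    np.fill_diagonal(d, np.inf)                                  # exclude y = x
    A = np.argsort(d, axis=1)[:, :k]                             # k nearest neighbours A(x)
    member = np.zeros((n, n), dtype=bool)
    member[np.arange(n)[:, None], A] = True
    return (member & member.T).astype(float)                     # w(x, y) = 1 iff y in A(x) and x in A(y)

rng = np.random.default_rng(0)
u = np.sin(np.linspace(0, 6 * np.pi, 120)) + 0.1 * rng.standard_normal(120)
w = nonlocal_weights(u)
print(w.sum(axis=1)[:5])    # number of mutual neighbours per pixel (at most k)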
10
Relaxation
Example 10.1 (Segmentation). Find C ⊆ Ω that fits the given data and
prior knowledge about typical shapes.
Theorem 10.2 (Chan-Vese). Given g : Ω → R, Ω ⊆ Rd, consider

inf_{C⊆Ω, c₁,c₂∈R} f_CV(C, c₁, c₂)   (10.1)

f_CV(C, c₁, c₂) = ∫_C (g − c₁)² dx + ∫_{Ω\C} (g − c₂)² dx + λ H^{d−1}(∂C)   (10.2)

[Confirm form of the regulariser]

Thus we fit c₁ to the shape, c₂ to the outside of the shape, and C is the region of the shape.
10.1 Mumford-Shah model
inf_{K ⊆ Ω closed, u ∈ C∞(Ω\K)} f(K, u)   (10.3)

f(K, u) = ∫_Ω (g − u)² dx + λ ∫_{Ω\K} ‖∇u‖²₂ dx + ν H^{d−1}(∂K).   (10.4)

Chan-Vese is a special case of this, obtained by forcing u = c₁ I(C) + c₂ (1 − I(C)).
inf_{C⊆Ω, c₁,c₂∈R} ∫_C (g − c₁)² dx + ∫_{Ω\C} (g − c₂)² dx + λ H^{d−1}(∂C)   (10.5)

inf_{u∈BV(Ω,{0,1}), c₁,c₂∈R} ∫_Ω u (g − c₁)² + (1 − u)(g − c₂)² dx + λ TV(u)   (10.6)

Fix c₁, c₂. Then

inf_{u∈BV(Ω,{0,1})} ∫_Ω u S(·) dx + λ TV(u),   S(·) = (g − c₁)² − (g − c₂)².   (C)

Replacing {0, 1} by its convex hull, we obtain

inf_{u∈BV(Ω,[0,1])} ∫_Ω u · S dx + λ TV(u)   (R)

This is a "convex relaxation":

(i) Replace the non-convex energy by a convex approximation.

(ii) Replace the non-convex f by con f.

Can the minima of (R) have values ∉ {0, 1}? Assume u₁, u₂ ∈ BV(Ω, {0, 1}) both minimize (R) and (C), and u₁ ≠ u₂. Then

½ u₁ + ½ u₂ ∉ BV(Ω, {0, 1})   (10.7)

is a minimizer of (R).
Proposition 10.3. Let c₁, c₂ be fixed. If u minimizes (R) and u ∈ BV(Ω, {0, 1}), then u minimizes (C) and (with u = 1_C) C minimizes f_CV(·, c₁, c₂).

Proof. Let C′ ⊆ Ω. Then 1_{C′} ∈ BV(Ω, [0, 1]), hence

f(1_{C′}) ≥ f(1_C)   (10.8)

for all C′ ⊆ Ω.

Definition 10.4. Let L = BV(Ω, [0, 1]). For u ∈ L and α ∈ [0, 1],

u_α(x) = I(u > α)(x) = 1 if u(x) > α, 0 if u(x) ≤ α.   (10.9)

Then f : L → R satisfies the general coarea condition (gcc) if and only if

f(u) = ∫₀¹ f(u_α) dα.   (10.10)

[Upper bound?]
Proposition 10.5. Let s ∈ L∞(Ω), Ω bounded. Then f as in (R) satisfies the gcc.

Proof. If f₁, f₂ satisfy the gcc, then f₁ + f₂ satisfies the gcc. The term λTV satisfies the gcc by the coarea formula for total variation.

For the linear term, we have

∫_Ω u(x) S(x) dx = ∫_Ω (∫₀¹ I(u(x) > α) S(x) dα) dx   (10.11)
                 = ∫₀¹ ∫_Ω I(u(x) > α) S(x) dx dα.   (10.12)
Theorem 10.6. Assume f : BV(Ω, [0, 1]) → R satisfies the gcc and

u* ∈ arg min_{u∈BV(Ω,[0,1])} f(u).   (10.13)

Then for almost any α ∈ [0, 1], u*_α is a minimizer of f over BV(Ω, {0, 1}), with

u*_α ∈ arg min_{u∈BV(Ω,{0,1})} f(u).   (10.14)
Proof. Let

S = {α | f(u*_α) ≠ f(u*)}.   (10.15)

If L(S) = 0, then for a.e. α,

f(u*_α) = f(u*) = inf_{u∈BV(Ω,[0,1])} f(u).   (10.16)

If L(S) > 0, then there exists ε > 0 such that

L(S_ε) > 0,   S_ε = {α | f(u*_α) ≥ f(u*) + ε},   (10.17)

which implies

f(u*) = ∫_{[0,1]\S_ε} f(u*) dα + ∫_{S_ε} f(u*) dα   (10.18)
      ≤ ∫₀¹ f(u*_α) dα − ε L(S_ε)   (10.19)
      < ∫₀¹ f(u*_α) dα,   (10.20)

which contradicts the gcc.
Remark 10.7. As a discretization problem, consider

min_{u∈Rn} f(u),   f(u) = ⟨s, u⟩ + λ ∑_i ‖(Gu)_i‖₂,   such that 0 ≤ u ≤ 1,   (10.21)

versus

min_{u∈Rn} f(u),   f(u) = ⟨s, u⟩ + λ ∑_i ‖(Gu)_i‖₂,   such that u ∈ {0, 1}^n.   (10.22)

If u* solves the relaxation, does u*_α solve the combinatorial problem? Only if the discretized energy f satisfies the gcc.
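To illustrate Remark 10.7 (my own sketch): solve the relaxed discretized problem min_{0≤u≤1} ⟨s, u⟩ + λ∑_i |u_{i+1} − u_i| for a 1-D signal by projected subgradient descent, then threshold u* at α = ½. The forward-difference operator, step sizes, and data are illustrative choices; as the remark notes, the thresholded u*_α is only guaranteed to solve the combinatorial problem when the discrete energy satisfies the gcc.

import numpy as np

def relax_and_threshold(s, lam=1.0, iters=5000, alpha=0.5):
    """Projected subgradient descent on f(u) = <s, u> + lam*sum|u_{i+1} - u_i| over [0, 1]^n,
       then threshold the relaxed minimizer at level alpha."""
    u = np.full(len(s), 0.5)
    for k in range(1, iters + 1):
        du = np.diff(u)
        g_tv = np.zeros_like(u)
        g_tv[:-1] -= np.sign(du)                    # subgradient of the discrete TV term
        g_tv[1:] += np.sign(du)
        u = np.clip(u - (0.5 / np.sqrt(k)) * (s + lam * g_tv), 0.0, 1.0)   # project onto [0, 1]
    return u, (u > alpha).astype(float)

# toy 1-D "segmentation": s = (g - c1)^2 - (g - c2)^2 with c1 = 1, c2 = 0
rng = np.random.default_rng(0)
g = np.concatenate([np.ones(40), np.zeros(40)]) + 0.2 * rng.standard_normal(80)
s = (g - 1.0) ** 2 - g ** 2
u_relaxed, u_binary = relax_and_threshold(s, lam=0.5)
print(u_binary[:5], u_binary[-5:])                  # ~1 on the left half, ~0 on the right half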