ANDREW TULLOCH

CONVEX OPTIMIZATION

TRINITY COLLEGE
THE UNIVERSITY OF CAMBRIDGE
Contents

1 Introduction
1.1 Setup
1.2 Convexity

2 Convexity

3 Cones and Generalized Inequalities

4 Conjugate Functions
4.1 The Legendre-Fenchel Transform
4.2 Duality Correspondences

5 Duality in Optimization

6 First-order Methods

7 Interior Point Methods

8 Support Vector Machines
8.1 Machine Learning
8.2 Linear Classifiers
8.3 Kernel Trick

9 Total Variation
9.1 Meyer's G-norm
9.2 Non-local Regularization
9.2.1 How to choose w(x, y)?

10 Relaxation
10.1 Mumford-Shah model

11 Bibliography
1
Introduction
1.1 Setup
The setup is as follows: given an energy function f, an input image I, and a reconstructed candidate image u ∈ U, find a minimizer of the problem

min_{u ∈ U} f(u, I)   (1.1)

where f is typically of the form

f(u, I) = l(u, I) + r(u)   (1.2)

with l the data (cost) term and r the regulariser.
Example 1.1 (Rudin-Osher-Fatemi).

min_u ½ ∫_Ω (u − I)² dx + ∫_Ω ‖∇u‖ dx   (1.3)
• Will reduce contrast
• Will not introduce new jumps
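As a concrete illustration (my addition, not part of the notes), the sketch below minimizes a smoothed 1-D analogue of the ROF energy, ½∑(u − I)² + λ∑√((Du)² + ε²), by plain gradient descent. The smoothing parameter ε, the weight λ, the step size, and the finite-difference operator D are all assumptions made for this toy example.

import numpy as np

def denoise_rof_1d(I, lam=0.5, eps=1e-3, tau=0.1, iters=2000):
    """Gradient descent on the smoothed ROF energy
       f(u) = 0.5*sum((u - I)**2) + lam*sum(sqrt((u[i+1]-u[i])**2 + eps**2))."""
    u = I.copy()
    for _ in range(iters):
        du = np.diff(u)                        # forward differences D u
        w = du / np.sqrt(du ** 2 + eps ** 2)   # derivative of the smoothed absolute value
        grad_tv = np.zeros_like(u)
        grad_tv[:-1] -= w                      # d/du_i of phi(u_{i+1} - u_i)
        grad_tv[1:] += w                       # d/du_{i+1} of phi(u_{i+1} - u_i)
        u -= tau * ((u - I) + lam * grad_tv)
    return u

# toy usage: a noisy step signal
rng = np.random.default_rng(0)
I = np.concatenate([np.zeros(50), np.ones(50)]) + 0.1 * rng.standard_normal(100)
u = denoise_rof_1d(I)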
Example 1.2 (Total variation (TV) L1).

min_u ½ ∫_Ω |u − I| dx + λ ∫_Ω ‖∇u‖ dx   (1.4)
• In general does not cause contrast loss.
• Can show that if I is the characteristic function of a ball B_r(0), then

u = I if r ≥ λ/2, and u is constant if r < λ/2.   (1.5)
Example 1.3 (Inpainting).

min_u ½ ∫_{Ω\A} |u − I| dx + λ ∫_Ω ‖∇u‖ dx   (1.6)
1.2 Convexity
Theorem 1.4
If a function f is convex, every local optimizer is a global optimizer.
Theorem 1.5
If a function f is convex, it can (very often) be efficiently optimized (in time polynomial in the number of bits of the input).
In computer vision, problems are:
• Usually large-scale (10^5 – 10^7 variables)
• Usually non-differentiable
Definition 1.6. f is lower semicontinuous if

f(x′) ≤ lim inf_{x→x′} f(x) = min{α ∈ R | ∃ (x_k) → x′ with f(x_k) → α}   (1.7)

Theorem 1.7. Let f : Rn → R. Then the following are equivalent:

(i) f is lower semicontinuous,

(ii) epi f is closed in Rn × R,

(iii) lev_{≤α} f = {x ∈ Rn | f(x) ≤ α} is closed for all α ∈ R.

Proof. ((i) → (ii)) Take (x_k, α_k) ∈ epi f with (x_k, α_k) → (x, α) ∈ Rn × R. Then, by lower semicontinuity, f(x) ≤ lim inf_{k→∞} f(x_k) ≤ lim_{k→∞} α_k = α, hence (x, α) ∈ epi f.

((ii) → (iii)) epi f closed implies epi f ∩ (Rn × {α′}) is closed for all α′ ∈ R, which holds if and only if

{x ∈ Rn | f(x) ≤ α′}   (1.8)

is closed for all α′ ∈ R. If α′ = +∞, lev_{≤+∞} f = Rn. If α′ = −∞, lev_{≤−∞} f = ∩_{k∈N} lev_{≤−k} f, which is an intersection of closed sets and hence closed.

((iii) → (i)) For x_j → x′, take a subsequence x_k → x′ with

f(x_k) → lim inf_{x→x′} f(x) = c.   (1.9)

If c ∈ R, then for every ε > 0 there exists K′ = K(ε) such that f(x_k) ≤ c + ε for all k > K′. Then x_k ∈ lev_{≤c+ε} f, and since this level set is closed, x′ ∈ lev_{≤c+ε} f. As ε > 0 was arbitrary, x′ ∈ lev_{≤c} f, so f(x′) ≤ c = lim inf_{x→x′} f(x). If c = +∞, we are done: f(x′) ≤ +∞ = lim inf. If c = −∞, the same argument applies with f(x_k) ≤ −k, k ∈ N.
Example 1.8. f (x) = x is lower semi-continuous, but does not have a
minimizer.
Definition 1.9. f : Rn → R is level-bounded if and only if lev≤α f is
bounded for all α ∈ R.
This is also known as coercivity.
Example 1.10. f(x) = x² is level-bounded. f(x) = 1/|x| is not level-bounded.

Theorem 1.11. f : Rn → R is level-bounded if and only if f(x_k) → ∞ whenever ‖x_k‖ → ∞.

Proof. Exercise.

Theorem 1.12. Let f : Rn → R be lower semicontinuous, level-bounded, and proper. Then inf_x f(x) ∈ (−∞, +∞) and arg min f = {x | f(x) ≤ inf f} is nonempty and compact.

Proof.

arg min f = {x | f(x) ≤ inf_x f(x)}   (1.10)
          = {x | f(x) ≤ inf f + 1/k ∀k ∈ N}   (1.11)
          = ∩_{k∈N} lev_{≤ inf f + 1/k} f   (1.12)

If inf f is −∞, just replace inf f + 1/k by α_k with α_k → −∞, e.g. α_k = −k, k ∈ N.
These level sets are bounded (by level-boundedness), closed (since f is lower semicontinuous), and non-empty (since 1/k > 0). Each of them is therefore compact; picking a point from each and extracting a convergent subsequence (possible by compactness) gives a limit point contained in every level set, so the intersection arg min f is non-empty and compact.

We still need to show that inf f ≠ −∞. If inf f = −∞, then there exists x ∈ arg min f with f(x) = −∞. Since f is proper, such an x cannot exist. Thus inf f ≠ −∞.
Remark 1.13. For Theorem 1.12, it suffices to have lev≤α f bounded and
non-empty for at least one α ∈ R.
Proposition 1.14. We have the following properties for lower semicontinuity.

(i) If f, g are lower semicontinuous, then f + g is lower semicontinuous.

(ii) If f is lower semicontinuous and λ ≥ 0, then λf is lower semicontinuous.

(iii) If f : Rn → R is lower semicontinuous and g : Rm → Rn is continuous, then f ∘ g is lower semicontinuous.
2
Convexity
Definition 2.1. (i) f : Rn → R is convex if
f ((1− τ)x + τy) ≤ (1− τ) f (x) + τ f (y) (2.1)
for all x, y ∈ Rn, τ ∈ (0, 1).
(ii) A set C ⊆ Rn is convex if and only if I(C) is convex.
(iii) f is strictly convex if and only if (2.1) holds strictly whenever
x 6= y and f (x), f (y) ∈ R.
Remark 2.2. C ⊆ Rn is convex if and only if for all x, y ∈ C, the connect-
ing line segment is contained in C.
Exercise 2.3. Show that

{x | aᵀx + b ≥ 0}   (2.2)

is convex, and that

f(x) = aᵀx + b   (2.3)

is convex.

Definition 2.4. Let x₀, …, x_m ∈ Rn and λ₀, …, λ_m ≥ 0 with ∑_{i=0}^m λ_i = 1. Then ∑_i λ_i x_i is a convex combination of the x_i.
Theorem 2.5. f : Rn → R is convex if and only if

f(∑_{i=1}^n λ_i x_i) ≤ ∑_{i=1}^n λ_i f(x_i)   (2.4)

for all convex combinations ∑_{i=1}^n λ_i x_i. C ⊆ Rn is convex if and only if C contains all convex combinations of its elements.

Proof. Write

∑_{i=1}^m λ_i x_i = λ_m x_m + (1 − λ_m) ∑_{i=1}^{m−1} (λ_i / (1 − λ_m)) x_i,   (2.5)

which is a convex combination of two points (x_m and a convex combination of m − 1 points); proceed by induction on m.

The set version is proven by applying this to I(C).
Proposition 2.6. f : Rn → R is convex implies that the domain of f is
convex.
Proposition 2.7. f : Rn → R is convex if and only if epi f is convex in Rn × R, if and only if

{(x, α) ∈ Rn × R | f(x) < α}   (2.6)

is convex in Rn × R.

Proof. epi f is convex if and only if for all τ ∈ (0, 1), all x, y, and all α ≥ f(x), β ≥ f(y) (that is, (x, α), (y, β) ∈ epi f), we have

f((1 − τ)x + τy) ≤ (1 − τ)α + τβ   (2.7)
⟺ ∀τ ∈ (0, 1), ∀x, y : f((1 − τ)x + τy) ≤ (1 − τ)f(x) + τf(y)   (2.8)
⟺ f is convex.   (2.9)

Proposition 2.8. f : Rn → R convex implies lev_{≤α} f is convex for all α ∈ R.

Proof. For x, y ∈ lev_{≤α} f and τ ∈ (0, 1),

f((1 − τ)x + τy) ≤ (1 − τ)f(x) + τf(y) ≤ α,

which implies lev_{≤α} f is convex. If α = ∞, then lev_{≤α} f = Rn, which is convex.
Theorem 2.9. Let f : Rn → R be convex. Then

(i) arg min f is convex,

(ii) if x is a local minimizer of f, then x is a global minimizer of f,

(iii) if f is strictly convex and proper, then f has at most one global minimizer.

Proof. (i) f ≡ ∞ ⇒ arg min f = ∅, which is convex. f ≢ ∞ ⇒ arg min f = lev_{≤ inf f} f, which is convex by the previous proposition.

(ii) Assume x is a local minimizer and there exists y with f(y) < f(x). Then

f((1 − τ)x + τy) ≤ (1 − τ)f(x) + τf(y) < f(x)

for every τ ∈ (0, 1). Taking τ → 0 shows that x cannot be a local minimizer, and thus no such y can exist.

(iii) Assume x ≠ y both minimize f, which implies f(x) = f(y). Then

f(½x + ½y) < ½f(x) + ½f(y) = f(x) = f(y),

which contradicts x, y being global minimizers.
Proposition 2.10. To construct convex functions:

(i) Let f_i, i ∈ I, be convex; then

f(x) = sup_{i∈I} f_i(x)   (2.10)

is convex.

(ii) Let f_i, i ∈ I, be strictly convex with I finite; then

sup_{i∈I} f_i   (2.11)

is strictly convex.

(iii) Let C_i, i ∈ I, be convex sets; then

∩_{i∈I} C_i   (2.12)

is convex.

(iv) Let f_k, k ∈ N, be convex; then

lim sup_{k→∞} f_k(x)   (2.13)

is convex.
Proof. Exercise.
Example 2.11. (i) C, D convex does not imply C ∪ D is convex (e.g. C, D disjoint).

(ii) f : Rn → R convex and C convex implies f + I(C) is convex.

(iii) f(x) = |x| = max{x, −x} is convex.

(iv) f(x) = ‖x‖_p, p ≥ 1, is convex, as

‖·‖_p = sup_{‖y‖_q = 1} ⟨·, y⟩,  1/p + 1/q = 1.   (2.14)
Theorem 2.12. Let C ⊆ Rn be open and convex. Let f : C → R be differentiable. Then the following are equivalent:

(i) f is [strictly] convex;

(ii) ⟨y − x, ∇f(y) − ∇f(x)⟩ ≥ 0 for all x, y ∈ C [with > 0 for x ≠ y];

(iii) f(x) + ⟨∇f(x), y − x⟩ ≤ f(y) for all x, y ∈ C [with < for x ≠ y];

(iv) if f is additionally twice differentiable: ∇²f(x) is positive semidefinite for all x ∈ C.

Condition (ii) is monotonicity of ∇f.

Proof. Exercise (reduce to n = 1, using that a function is convex on Rn if and only if it is convex along every line).

Remark 2.13. Note that (iv) has no strict counterpart: f(x) = x⁴ is strictly convex, yet ∇²f(0) = 0 is not positive definite.
Proposition 2.14. The following results hold for convex functions.

(i) f₁, …, f_m : Rn → R convex and λ₁, …, λ_m ≥ 0; then

f = ∑_i λ_i f_i   (2.15)

is convex, and strictly convex if there exists i such that λ_i > 0 and f_i is strictly convex.

(ii) f_i : Rn → R convex implies f(x₁, …, x_m) = ∑_i f_i(x_i) is convex, and strictly convex if all f_i are strictly convex.

(iii) f : Rn → R convex, A ∈ Rn×n, b ∈ Rn implies g(x) = f(Ax + b) is convex.
Remark 2.15. The operator norm

‖M‖ = sup_{‖x‖ ≤ 1} ‖Mx‖

is convex (as a supremum of convex functions of M).

Proposition 2.16. (i) C₁, …, C_m convex implies C₁ × ··· × C_m is convex.

(ii) C ⊆ Rn convex, A ∈ Rm×n, b ∈ Rm implies L(C) is convex, where L(x) = Ax + b.

(iii) C ⊆ Rm convex implies L⁻¹(C) is convex.

(iv) f, g convex implies f + g is convex.

(v) f convex, λ ≥ 0 implies λf is convex.
Definition 2.17. For S ⊆ Rn and y ∈ Rn, define the projection of y onto S as

Π_S(y) = arg min_{x∈S} ‖x − y‖₂   (2.16)

Proposition 2.18. If C ⊆ Rn is convex, closed, and non-empty, then Π_C is single-valued - that is, the projection is unique.

Proof. Write

Π_C(y) = arg min_x f(x),   f(x) = ½‖x − y‖²₂ + I_C(x).   (2.17)

To show uniqueness: ½‖x − y‖² is strictly convex, since ∇²(½‖x − y‖²) = I > 0, so f is strictly convex. f is proper (as C ≠ ∅), and thus f has at most one minimizer.

To show existence: f is proper, lower semicontinuous (the quadratic part is continuous, and I_C is lower semicontinuous since C is closed), and level-bounded, as ½‖x − y‖²₂ → ∞ when ‖x‖₂ → ∞ and I_C ≥ 0.

Thus arg min f ≠ ∅.
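A small numerical illustration (my addition): for the closed convex sets C = [lo, hi] (a box) and C = {x : ‖x‖₂ ≤ r} (a ball), the projection Π_C has a closed form and, consistent with Proposition 2.18, is single-valued. The set choices and helper names below are mine.

import numpy as np

def project_box(y, lo, hi):
    """Projection onto the box {x : lo <= x <= hi} (componentwise clipping)."""
    return np.clip(y, lo, hi)

def project_ball(y, r=1.0):
    """Projection onto the Euclidean ball {x : ||x||_2 <= r}."""
    n = np.linalg.norm(y)
    return y if n <= r else (r / n) * y

y = np.array([2.0, -3.0, 0.5])
print(project_box(y, -1.0, 1.0))   # [ 1. -1.  0.5]
print(project_ball(y, r=1.0))      # y rescaled to unit norm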
Definition 2.19. Let S ⊆ Rn be arbitrary. Then

con S = ∩_{C convex, C ⊇ S} C   (2.18)

is the convex hull of S.

Remark 2.20. con S is the smallest convex set that contains S.

Theorem 2.21. Let S ⊆ Rn. Then

con S = {∑_{i=0}^n λ_i x_i | x_i ∈ S, λ_i ≥ 0, ∑_{i=0}^n λ_i = 1}   (2.19)

Proof. Denote the right-hand side of (2.19) by D. D is convex and contains S: if x, y ∈ D, then (1 − τ)x + τy ∈ D. Thus con S ⊆ D.

From the previous theorem, con S convex implies con S contains all convex combinations of points in S. Thus con S ⊇ D.

Thus con S = D.
Definition 2.22. For a set C ⊆ Rn, define cl C as

cl C = {x ∈ Rn | for all open neighborhoods N of x, N ∩ C ≠ ∅},   (2.20)

int C as

int C = {x ∈ Rn | there exists an open neighborhood N of x with N ⊆ C},   (2.21)

and ∂C (the boundary) as

∂C = cl C \ int C.   (2.22)

Remark 2.23.

cl G = ∩_{S closed, S ⊇ G} S   (2.23)
3
Cones and Generalized Inequalities
Definition 3.1. K ⊆ Rn is a cone if and only if 0 ∈ K and λx ∈ K for
all x ∈ K, λ ≥ 0.
Definition 3.2. A cone K is pointed if and only if, for x₁, …, x_n ∈ K,

x₁ + ··· + x_n = 0 ⟺ x₁ = ··· = x_n = 0.   (3.1)

Example 3.3. (i) Rn is a cone, but it is not pointed.

(ii) {(x₁, x₂) ∈ R² | x₂ > 0} is not a cone.

(iii) {(x₁, x₂) ∈ R² | x₂ ≥ 0} is a cone, but is not pointed.

(iv) {x ∈ Rn | x_i ≥ 0, i ∈ {1, …, n}} is a pointed cone.

Proposition 3.4. Let K ⊆ Rn be any set. Then the following are equivalent:

(i) K is a convex cone.

(ii) K is a cone and K + K ⊆ K.

(iii) K ≠ ∅ and ∑_{i=0}^n α_i x_i ∈ K for all x_i ∈ K, α_i ≥ 0.

Proof. Exercise.

Proposition 3.5. If K is a convex cone, then K is pointed if and only if K ∩ (−K) = {0}.

Proof. Exercise.

Definition 3.6. The subdifferential of f at x is

∂f(x) = {v ∈ Rn | f(x) + ⟨v, y − x⟩ ≤ f(y) ∀y},   (3.2)

and

f′(x) = lim_{t→0} (f(x + t) − f(x)) / t.   (3.3)
Proposition 3.7. Let f, g : Rn → R be convex. Then

(i) f differentiable at x implies ∂f(x) = {∇f(x)}.

(ii) f differentiable at x and g(x) ∈ R implies

∂(f + g)(x) = ∇f(x) + ∂g(x)   (3.4)

Proof. We have:

(i) Equivalent to (ii) with g = 0.

(ii) To show LHS ⊇ RHS: if v ∈ ∂g(x), then g(x) + ⟨v, y − x⟩ ≤ g(y), which by Theorem 3.12 in the notes, i.e.

⟨∇f(x), y − x⟩ + f(x) ≤ f(y),   (3.5)

gives

⟨v + ∇f(x), y − x⟩ + (f + g)(x) ≤ (f + g)(y)   (3.6)
⟺ v + ∇f(x) ∈ ∂(f + g)(x).   (3.7)

For the other direction, let v ∈ ∂(f + g)(x). Then

lim inf_{z→x} [f(z) + g(z) − f(x) − g(x) − ⟨v, z − x⟩] / ‖z − x‖ ≥ 0   (3.8)

⇒ lim inf_{z→x} [g(z) − g(x) − ⟨v − ∇f(x), z − x⟩] / ‖z − x‖ ≥ 0  (by differentiability of f at x)   (3.9)

⇒ lim inf_{t↓0} [g(z(t)) − g(x) − ⟨v − ∇f(x), t(y − x)⟩] / (t‖y − x‖) ≥ 0,  with z(t) = (1 − t)x + ty   (3.10)

⇒ lim inf_{t↓0} [t(g(y) − g(x)) − t⟨v − ∇f(x), y − x⟩] / (t‖y − x‖) ≥ 0  (by convexity of g)   (3.11)

⇒ g(y) − g(x) − ⟨v − ∇f(x), y − x⟩ ≥ 0 ⇒ v − ∇f(x) ∈ ∂g(x)   (3.12)

⇒ v ∈ ∇f(x) + ∂g(x).   (3.13)
Theorem 3.8. Let f : Rn → R be proper. Then

x ∈ arg min f ⟺ 0 ∈ ∂f(x)   (3.14)

Proof.

0 ∈ ∂f(x) ⟺ ⟨0, y − x⟩ + f(x) ≤ f(y) ∀y ⟺ f(x) ≤ f(y) ∀y.   (3.15)
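As a worked instance of Theorem 3.8 (my addition): for f(x) = ½(x − z)² + λ|x|, the condition 0 ∈ ∂f(x) = x − z + λ∂|x| yields the soft-thresholding formula, and the sketch below checks it against a brute-force grid minimization. The values of z and λ are arbitrary.

import numpy as np

def soft_threshold(z, lam):
    """The unique x with 0 in the subdifferential of f(x) = 0.5*(x - z)**2 + lam*|x|."""
    return np.sign(z) * np.maximum(np.abs(z) - lam, 0.0)

z, lam = 0.7, 0.5
xs = np.linspace(-2, 2, 200001)
f = 0.5 * (xs - z) ** 2 + lam * np.abs(xs)
x_grid = xs[np.argmin(f)]
print(soft_threshold(z, lam), x_grid)   # both approximately 0.2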
Definition 3.9. For a convex set C ⊆ Rn and a point x ∈ C,

N_C(x) = {v ∈ Rn | ⟨v, y − x⟩ ≤ 0 ∀y ∈ C}   (3.16)

is the "normal cone" of C at x. By convention, N_C(x) = ∅ for all x ∉ C.

Proposition 3.10. Let C be convex and C ≠ ∅. Then

∂I_C(x) = N_C(x)   (3.17)

Proof. For x ∈ C, we have

∂I_C(x) = {v | I_C(x) + ⟨v, y − x⟩ ≤ I_C(y) ∀y ∈ C} = N_C(x).   (3.18)

For x ∉ C: [Fill in proof here]

Proposition 3.11. Let C ⊆ Rn be closed, convex, and non-empty, and let x ∈ Rn. Then

y = Π_C(x) ⟺ x − y ∈ N_C(y)   (3.19)

Proof.

y = Π_C(x) ⟺ y minimizes g(y′) = ½‖y′ − x‖² + I_C(y′),   (3.20)

if and only if 0 ∈ ∂g(y), if and only if 0 ∈ y − x + ∂I_C(y) = y − x + N_C(y), i.e. x − y ∈ N_C(y).

Proposition 3.12. Let f : Rn → R be convex and proper. Then

∂f(x) = ∅ if x ∉ dom f,  and  ∂f(x) = {v ∈ Rn | (v, −1) ∈ N_{epi f}(x, f(x))} if x ∈ dom f.   (3.21)

If x ∈ dom f,

N_{dom f}(x) = {v ∈ Rn | (v, 0) ∈ N_{epi f}(x, f(x))}   (3.22)

Example 3.13. The subdifferential of f : R → R, f(x) = |x|, is

∂f(x) = {sign x} if x ≠ 0,  ∂f(0) = [−1, 1].   (3.23)
Definition 3.14. For C ⊆ Rn, define the affine hull as

aff(C) = ∩_{A affine, C ⊆ A} A   (3.24)

and the relative interior as

rint(C) = {x ∈ Rn | there exists an open neighborhood N of x such that N ∩ aff(C) ⊆ C}.   (3.25)

Example 3.15.

aff(Rn) = Rn   (3.26)
rint([0, 1]²) = (0, 1)²   (3.28)
rint([0, 1] × {0}) = (0, 1) × {0}   (3.29)

Exercise 3.16. We know that

int(A ∩ B) ⊆ int A ∩ int B.   (3.30)

Does

rint(A ∩ B) ⊆ rint A ∩ rint B   (3.31)

hold?
Proposition 3.17. Let f : Rn → R be convex. Then:

(i) g(x) = f(x + y) ⇒ ∂g(x) = ∂f(x + y);
    g(x) = f(λx) ⇒ ∂g(x) = λ∂f(λx);
    g(x) = λf(x) ⇒ ∂g(x) = λ∂f(x).

(ii) Let f : Rn → R be proper and convex, and A ∈ Rn×m such that

{Ay | y ∈ Rm} ∩ rint dom f ≠ ∅.   (3.32)

Then for x ∈ dom(f ∘ A) we have

∂(f ∘ A)(x) = Aᵀ∂f(Ax)   (3.33)

(iii) Let f₁, …, f_m : Rn → R be proper and convex with rint dom f₁ ∩ ··· ∩ rint dom f_m ≠ ∅. Then, for f = f₁ + ··· + f_m,

∂f(x) = ∂f₁(x) + ··· + ∂f_m(x)   (3.34)
4
Conjugate Functions
4.1 The Legendre-Fenchel Transform
Definition 4.1. For f : Rn → R, define

con f(x) = sup_{g ≤ f, g convex} g(x);   (4.1)

con f is the convex hull of f.

Proposition 4.2. con f is the greatest convex function majorized by f.

Definition 4.3. For f : Rn → R, the (lower) closure of f is defined as

cl f(x) = lim inf_{y→x} f(y)   (4.2)
Proposition 4.4. For f : Rn → R,
epi(cl f ) = cl(epi f ) (4.3)
In particular, if f is convex, then cl f is convex.
Proof. Exercise.
Proposition 4.5. If f : Rn → R, then

(cl f)(x) = sup_{g ≤ f, g lsc} g(x)   (4.4)

Proof. [Fill in]

Theorem 4.6. Let C ⊆ Rn be closed and convex. Then

C = ∩_{(b,β) s.t. C ⊆ H_{b,β}} H_{b,β}   (4.5)

where

H_{b,β} = {x ∈ Rn | ⟨x, b⟩ − β ≤ 0}   (4.6)
Proof.
Theorem 4.7. Let f : Rn → R be proper, lsc, and convex. Then

f(x) = sup_{g ≤ f, g affine} g(x)   (4.7)

Proof. This is quite an involved proof.

Definition 4.8. For f : Rn → R, let f* : Rn → R be defined by

f*(v) = sup_{x∈Rn} {⟨v, x⟩ − f(x)};   (4.8)

f* is the conjugate of f. The mapping f ↦ f* is the Legendre-Fenchel transform.
Remark 4.9. For v ∈ Rn,

f*(v) = sup_{x∈Rn} {⟨v, x⟩ − f(x)}   (4.9)
⇒ f*(v) ≥ ⟨v, x⟩ − f(x) ∀x ∈ Rn ⟺ f(x) ≥ ⟨v, x⟩ − f*(v) ∀x ∈ Rn.   (4.10)

Thus x ↦ ⟨v, x⟩ − f*(v) is the largest affine function with gradient v that is majorized by f.
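A quick numerical sanity check of Definition 4.8 (my addition): for f(x) = ½ a x² the conjugate is f*(v) = v²/(2a), and a brute-force supremum over a grid reproduces it. The grid and the value of a are arbitrary choices.

import numpy as np

def conjugate_on_grid(f, xs, v):
    """Brute-force Legendre-Fenchel transform f*(v) = sup_x <v, x> - f(x) over a grid xs."""
    return np.max(v * xs - f(xs))

a = 2.0
f = lambda x: 0.5 * a * x ** 2
xs = np.linspace(-10, 10, 20001)
for v in [-1.0, 0.0, 2.5]:
    print(v, conjugate_on_grid(f, xs, v), v ** 2 / (2 * a))   # numerical vs closed form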
Theorem 4.10. Assume f : Rn → R. Then

(i) f* = (con f)* = (cl f)* = (cl con f)*, and f ≥ f** = (f*)*, the biconjugate of f.

(ii) If con f is proper, then f*, f** are proper, lower semicontinuous, and convex, and f** = cl con f.

(iii) If f : Rn → R is proper, lower semicontinuous, and convex, then

f** = f   (4.11)
Proof. We have

(v, β) ∈ epi f* ⟺ β ≥ ⟨v, x⟩ − f(x) ∀x ∈ Rn ⟺ f(x) ≥ ⟨v, x⟩ − β ∀x ∈ Rn.   (4.12)

We claim that for an affine function h we have h ≤ f ⟺ h ≤ con f ⟺ h ≤ cl f ⟺ h ≤ cl con f. This holds because con f is the largest convex function less than or equal to f and h is convex; the same argument applies to cl, cl con, etc.

Thus in (4.12) we can replace f by con f, cl f, or cl con f, which gives the first part of (i).

We also have

f**(y) = sup_{v∈Rn} {⟨v, y⟩ − sup_{x∈Rn} {⟨v, x⟩ − f(x)}}   (4.13)
       ≤ sup_{v∈Rn} {⟨v, y⟩ − ⟨v, y⟩ + f(y)}   (4.14)
       = f(y).   (4.15)

For the second part, we have that con f is proper, and claim that cl con f is proper, lower semicontinuous, and convex. Lower semicontinuity is immediate. Convexity is given by the previous proposition, that f convex implies cl f convex. Properness is to be shown in an exercise.

Applying the previous theorem,

cl con f(x) = sup_{g ≤ cl con f, g affine} g(x)   (4.16)
            = sup_{(v,β) ∈ epi (cl con f)*} ⟨v, x⟩ − β   (4.17)
            = sup_{(v,β) ∈ epi f*} ⟨v, x⟩ − β   (4.18)
            = sup_{v ∈ dom f*} ⟨v, x⟩ − f*(v)   (4.19)
            = sup_{v∈Rn} ⟨v, x⟩ − f*(v) = f**(x),   (4.20)

using that an affine g(x) = ⟨v, x⟩ − β satisfies g ≤ cl con f ⟺ (v, β) ∈ epi(cl con f)* ⟺ (v, β) ∈ epi f*.

To show f* is proper, lower semicontinuous, and convex: epi f* is, by (4.12), an intersection of closed convex sets (one closed half-space per x), and is therefore closed and convex; hence f* is lower semicontinuous and convex.

To show properness: con f proper implies there exists x ∈ Rn with con f(x) < ∞, and then f*(v) = (con f)*(v) ≥ ⟨v, x⟩ − con f(x) > −∞ for every v. If f* ≡ +∞, then cl con f = f** = sup_v ⟨v, x⟩ − f*(v) ≡ −∞, contradicting that cl con f is proper; hence f* is proper, lower semicontinuous, and convex. Applying the same argument to f* (con f* = f* is proper by the previous result), f** is also proper, lower semicontinuous, and convex.

For part (iii), apply part (ii): f is convex, which implies con f = f, and con f is proper (as f is proper); since f is lower semicontinuous and convex,

f** = cl con f = f.
4.2 Duality Correspondences
Theorem 4.11. Let f : Rn → R be proper, lower semicontinuous, and convex. Then:

(i) ∂f* = (∂f)⁻¹.

(ii) v ∈ ∂f(x) ⟺ f(x) + f*(v) = ⟨v, x⟩ ⟺ x ∈ ∂f*(v).

(iii)

∂f(x) = arg max_{v′} {⟨v′, x⟩ − f*(v′)}   (4.21)
∂f*(v) = arg max_{x′} {⟨v, x′⟩ − f(x′)}   (4.22)

Proof. (i) This is immediate from (ii).

(ii)

f(x) + f*(v) = ⟨v, x⟩   (4.24)
⟺ f*(v) = ⟨v, x⟩ − f(x)   (4.25)
⟺ x ∈ arg max_{x′} {⟨v, x′⟩ − f(x′)}   (4.26)
…   (4.27)

[Finish off proof]
Proposition 4.12. Let f : Rn → R be proper, lower semicontinuous, and convex. Then

(f(·) − ⟨a, ·⟩)* = f*(· + a)   (4.28)
(f(· + b))* = f*(·) − ⟨·, b⟩   (4.29)
(f(·) + c)* = f*(·) − c   (4.30)
(λf(·))* = λf*(·/λ),  λ > 0   (4.31)
(λf(·/λ))* = λf*(·)   (4.32)

Proof. Exercise.

Proposition 4.13. Let f_i : Rn → R, i = 1, …, m, be proper, and f(x₁, …, x_m) = ∑_i f_i(x_i). Then f*(v₁, …, v_m) = ∑_i f_i*(v_i).

Definition 4.14. For any set S ⊆ Rn, define the support function

G_S(v) = sup_{x∈S} ⟨v, x⟩ = (δ_S)*(v)   (4.33)

Definition 4.15. A function h : Rn → R is positively homogeneous if 0 ∈ dom h and h(λx) = λh(x) for all λ > 0, x ∈ Rn.

Proposition 4.16. The set of positively homogeneous proper lower semicontinuous convex functions and the set of closed convex nonempty sets are in one-to-one correspondence through the Legendre-Fenchel transform:

δ_C ↔ G_C   (4.34)

with

x ∈ ∂G_C(v) ⟺ x ∈ C, G_C(v) = ⟨v, x⟩   (4.35)
⟺ v ∈ N_C(x) = ∂δ_C(x).   (4.36)

The set of closed convex cones is in one-to-one correspondence with itself:

δ_K ↔ δ_{K*}   (4.37)
K* = {v ∈ Rn | ⟨v, x⟩ ≤ 0 ∀x ∈ K}   (4.38)
5
Duality in Optimization
Definition 5.1. For f : Rn × Rm → R proper, lower semicontinuous, and convex, we define the primal and dual problems

inf_{x∈Rn} φ(x),   φ(x) = f(x, 0)   (5.1)
sup_{y∈Rm} ψ(y),   ψ(y) = −f*(0, y)   (5.2)

and the inf-projections

p(u) = inf_{x∈Rn} f(x, u)   (5.3)
q(v) = inf_{y∈Rm} f*(v, y)   (5.4)

f is the perturbation function for φ, and p is the associated projection function.

Consider the problem

inf_x ½‖x − z‖² + δ_{≥0}(Ax − b) = inf_x φ(x)   (5.5)

and the perturbed problem

f(x, u) = ½‖x − z‖² + δ_{≥0}(Ax − b + u)   (5.6)
Proposition 5.2. Assume f satisfies the assumptions in Definition 5.1. Then

(i) φ, −ψ are convex and lower semicontinuous,

(ii) p, q are convex,

(iii) p(0) = inf_x φ(x),

(iv) p**(0) = sup_y ψ(y),

(v) inf_x φ(x) < ∞ ⟺ 0 ∈ dom p,

(vi) sup_y ψ(y) > −∞ ⟺ 0 ∈ dom q.

Proof. (i) φ is clearly convex and lower semicontinuous. For ψ: f* is lower semicontinuous and convex, which implies −ψ = f*(0, ·) is lower semicontinuous and convex.

(ii) Look at the strict epigraph of p:

E = {(u, α) ∈ Rm × R | p(u) < α}   (5.7)
  = {(u, α) ∈ Rm × R | ∃x : f(x, u) < α}   (5.8)
  = A({(u, α, x) ∈ Rm × R × Rn | f(x, u) < α}),   (5.9)

where A is the linear map (u, α, x) ↦ (u, α),   (5.10)

so E is the image of a convex set under a linear map. As the strict epigraph E is convex, p must be convex. Similarly for q.

(iii) For p(0), this holds by definition. For (iv),

p*(y) = sup_u {⟨y, u⟩ − p(u)}   (5.11)
      = sup_{u,x} {⟨y, u⟩ − f(x, u)}   (5.12)
      = f*(0, y)   (5.13)

and

p**(0) = sup_y {⟨0, y⟩ − p*(y)}   (5.14)
       = sup_y −f*(0, y)   (5.15)
       = sup_y ψ(y).   (5.16)

(v) By definition, 0 ∈ dom p ⟺ p(0) = inf_x φ(x) < ∞.

[Complete proof]
Theorem 5.3. Let f be as in Definition 5.1. Then weak duality holds:

p(0) = inf_x φ(x) ≥ sup_y ψ(y) = p**(0),   (5.17)

and under certain conditions the inf and sup are equal and finite (strong duality):

p(0) ∈ R and p is lower semicontinuous at 0 if and only if inf φ(x) = sup ψ(y) ∈ R.

Definition 5.4.

inf φ − sup ψ   (5.18)

is the duality gap.

Proof. (⇐) If inf φ = sup ψ ∈ R, then p(0) = p**(0) ∈ R, and p** ≤ cl p ≤ p ⇒ cl p(0) = p(0), i.e. p is lower semicontinuous at 0.

(⇒) cl p is lower semicontinuous and convex, and cl p(0) = p(0) ∈ R implies cl p is proper, so

sup ψ = (p*)*(0) = (cl p)**(0) = cl p(0) = p(0) = inf φ.
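A tiny numerical check of Theorem 5.3 (my addition), using the 1-D special case of the perturbed problem (5.5)-(5.6) with A = I, b = 0: the primal is inf_x ½(x − z)² + δ_{≥0}(x), and working out ψ(y) = −f*(0, y) by hand gives the dual sup_{y≤0} zy − ½y². The grid search below confirms that the two optimal values agree (zero duality gap).

import numpy as np

z = -1.0
xs = np.linspace(-5, 5, 200001)
ys = np.linspace(-5, 0, 100001)

phi = 0.5 * (xs - z) ** 2 + np.where(xs >= 0, 0.0, np.inf)   # primal objective phi(x)
psi = z * ys - 0.5 * ys ** 2                                  # dual objective psi(y) on y <= 0

print(phi.min(), psi.max())   # both approximately 0.5: no duality gap for this instance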
Proposition 5.5.

[Fill in notes from lecture]
6
First-order Methods
The idea is to do gradient descent on our objective function f, with step size τ_k.

Definition 6.1. Let f : Rn → R. Then

(i)

F_{τ_k} f(x_k) = x_k − τ_k ∂f(x_k) = (I − τ_k ∂f)(x_k)   (6.1)

(ii)

B_{τ_k} f(x_k) = (I + τ_k ∂f)⁻¹(x_k)   (6.2)

so

…   (6.3)

[Fill in]

Proposition 6.2. If f : Rn → R is proper, lower semicontinuous, and convex, then for τ_k > 0,

B_{τ_k} f(x) = arg min_y { 1/(2τ_k) ‖y − x‖²₂ + f(y) }   (6.4)

Proof.

y ∈ arg min_y {…} ⟺ 0 ∈ (1/τ_k)(y − x) + ∂f(y)   (6.5)
⟺ x ∈ (I + τ_k ∂f)(y) ⟺ y = B_{τ_k} f(x).   (6.6)
[Missed lecture material]
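To make the forward and backward operators concrete (my addition; the notes stop here): for a sum g + h with smooth g(x) = ½‖Ax − b‖² and nonsmooth h(x) = λ‖x‖₁, alternating a forward (gradient) step on g with a backward (proximal) step on h gives the classical forward-backward / ISTA scheme sketched below. The matrix A, the data b, λ, and the step-size rule are illustrative choices.

import numpy as np

def prox_l1(x, t):
    """Backward step for the l1 term with threshold t = tau*lam: soft-thresholding."""
    return np.sign(x) * np.maximum(np.abs(x) - t, 0.0)

def ista(A, b, lam, iters=500):
    """Forward-backward splitting on 0.5*||Ax - b||^2 + lam*||x||_1."""
    tau = 1.0 / np.linalg.norm(A, 2) ** 2           # step size 1/L with L = ||A^T A||
    x = np.zeros(A.shape[1])
    for _ in range(iters):
        grad = A.T @ (A @ x - b)                    # forward (explicit) step on the smooth part
        x = prox_l1(x - tau * grad, tau * lam)      # backward (implicit) step on the l1 part
    return x

rng = np.random.default_rng(0)
A = rng.standard_normal((40, 100))
x_true = np.zeros(100)
x_true[[3, 17, 42]] = [1.0, -2.0, 0.5]
b = A @ x_true
print(ista(A, b, lam=0.1)[[3, 17, 42]])             # roughly recovers the nonzero entries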
7
Interior Point Methods
Theorem 7.1 (Problem). Consider the problem

inf_x ⟨c, x⟩ + δ_K(Ax − b)   (7.1)

with K a closed convex cone - thus Ax ≥_K b.

The idea is to replace δ_K by a smooth approximation F, with F(x) → ∞ as x → boundary of K, and solve inf ⟨c, x⟩ + (1/t)F(x), equivalent to solving inf t⟨c, x⟩ + F(x).
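A minimal sketch of this idea (my addition) for K the nonnegative orthant, with the log-barrier F(s) = −∑_i log s_i: the inner problem inf_x t⟨c, x⟩ + F(Ax − b) is solved here by gradient descent with backtracking, and t is increased multiplicatively. A real interior point method would use Newton steps; the toy LP and all parameters are assumptions for illustration.

import numpy as np

def barrier_method(c, A, b, x0, mu=5.0, outer=12, inner=200):
    """Minimize <c, x> s.t. Ax >= b via the log-barrier F(s) = -sum(log s)."""
    def obj(x, t):
        s = A @ x - b
        if np.any(s <= 0):
            return np.inf                           # outside the interior
        return t * (c @ x) - np.sum(np.log(s))
    x, t = x0.copy(), 1.0                           # x0 must satisfy A x0 > b strictly
    for _ in range(outer):
        for _ in range(inner):                      # inner loop: gradient descent with backtracking
            s = A @ x - b
            g = t * c - A.T @ (1.0 / s)
            step = 1.0
            while obj(x - step * g, t) > obj(x, t) - 0.5 * step * (g @ g):
                step *= 0.5
                if step < 1e-12:
                    break
            x = x - step * g
        t *= mu                                     # move further along the central path
    return x

# toy LP: minimize x1 + x2 subject to x >= 0 and x1 + x2 >= 1
c = np.array([1.0, 1.0])
A = np.vstack([np.eye(2), np.ones((1, 2))])
b = np.array([0.0, 0.0, 1.0])
print(barrier_method(c, A, b, x0=np.array([2.0, 2.0])))   # approaches the facet x1 + x2 = 1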
Proposition 7.2. Let F be the canonical barrier for K. Then F is smooth on dom F = int K and strictly convex, with

F(tx) = F(x) − Θ_F ln t,   (7.2)

and

(i) −∇F(x) ∈ dom F,

(ii) ⟨∇F(x), x⟩ = −Θ_F,

(iii) −∇F(−∇F(x)) = x,

(iv) ∇F(tx) = (1/t) ∇F(x).
For the problem

inf_x ⟨c, x⟩   (7.3)

such that Ax ≥_K b, with K a closed, convex, self-dual cone, we have

inf_x ⟨c, x⟩ + δ_K(Ax − b)   (7.4)
⇒ sup_y −⟨b, y⟩ − h*(−Aᵀy − c) − k*(y)   (7.5)
⟺ sup_y ⟨b, y⟩ s.t. y ≥_K 0, Aᵀy = c   (7.6)

and, with the barrier,

⟺ inf_x ⟨c, x⟩ + F(Ax − b)   (7.7)
⟺ sup_y ⟨b, y⟩ − F(y) − δ_{Aᵀy=c}(y)   (7.8)

[More conditions on, etc]
Proposition 7.3. For t > 0, define

x(t) = arg min_x t⟨c, x⟩ + F(Ax − b)   (7.9)
y(t) = arg min_y −t⟨b, y⟩ + F(y) + δ_{Aᵀy=c}(y)   (7.10)
z(t) = (x(t), y(t))   (7.11)

These paths exist, are unique, and

(x, y) = (x(t), y(t)) ⟺ Aᵀy = c and ty + ∇F(Ax − b) = 0.   (7.12)

Proof. The first part follows by definition.

For (7.12), the primal optimality condition is 0 = tc + Aᵀ∇F(Ax − b). The dual optimality condition is 0 ∈ −tb + ∇F(y) + N_{{z | Aᵀz = c}}(y), which is

⟺ 0 ∈ −tb + ∇F(y) + (∅ if Aᵀy ≠ c, range A if Aᵀy = c)   (7.13)
⟺ 0 ∈ −tb + ∇F(y) + range A, Aᵀy = c.   (7.14)

Assume x satisfies the primal optimality condition and set y = −(1/t)∇F(Ax − b). Then

tc + Aᵀ∇F(Ax − b) = 0 ⟺ tc + Aᵀ(−ty) = 0   (7.15)
⟺ Aᵀy = c.   (7.16)

[A few more steps...]

Proposition 7.4. If x, y are feasible, then

φ(x) − ψ(y) = ⟨y, Ax − b⟩.   (7.17)

Moreover, for (x(t), y(t)) on the central path,

φ(x(t)) − ψ(y(t)) = Θ_F / t.   (7.18)

Proof.

φ(x) − ψ(y) = ⟨c, x⟩ − ⟨b, y⟩   (7.19)
            = ⟨Aᵀy, x⟩ − ⟨b, y⟩   (7.20)
            = ⟨Ax − b, y⟩   (7.21)

For the second part, we have

φ(x(t)) − ψ(y(t)) = ⟨y(t), Ax(t) − b⟩   (7.22)
                  = ⟨−(1/t) ∇F(Ax(t) − b), Ax(t) − b⟩   (7.23)
                  = Θ_F / t,   (7.24)

using property (ii) of Proposition 7.2.

[Fill in from lecture notes]
8
Support Vector Machines
8.1 Machine Learning
Given X = {x₁, …, x_n}, a sample set with x_i ∈ F ⊆ Rm, F our feature space, and C a set of classes, we seek to find

h_θ : F → C,  θ ∈ Θ,   (8.1)

with Θ our parameter space. Our task is to find θ such that h_θ is the "best" mapping from X into C.

(i) Unsupervised learning (only X is known, usually |C| not known):

• Clustering

• Outlier detection

• Mapping to a lower-dimensional subspace

(ii) Supervised learning (|C| known). We have training data T = {(x₁, y₁), …, (x_n, y_n)}, with y_i ∈ C.

We seek to find

θ = arg min_{θ∈Θ} f(θ, T) = ∑_{i=1}^n g(y_i, h_θ(x_i)) + R(θ)   (8.2)
8.2 Linear Classifiers

The idea is to consider

h_θ : Rm → {−1, 1},  θ = (w, b),   (8.3)
h_θ(x) = sign(⟨w, x⟩ + b).   (8.4)

We want to consider maximum-margin classifiers, satisfying

max_{w,b, w≠0} min_i y_i (⟨w/‖w‖, x_i⟩ + b/‖w‖),   (8.5)

which can be rewritten as

max_{w,b} c   (8.6)

such that

c ≤ y_i (⟨w/‖w‖, x_i⟩ + b/‖w‖)   (8.7)
⟺ ‖w‖ c ≤ y_i (⟨w, x_i⟩ + b),   (8.8)

or just

min_{w,b} ½‖w‖²   (8.10)

such that

1 ≤ y_i (⟨w, x_i⟩ + b).   (8.11)
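As an illustration (my addition, not the derivation in the notes): the hard-margin problem above is commonly relaxed to the soft-margin hinge-loss form min_{w,b} ½‖w‖² + C ∑_i max(0, 1 − y_i(⟨w, x_i⟩ + b)), which a plain subgradient method can minimize. The data, C, the step sizes, and the iteration count below are arbitrary choices for this sketch.

import numpy as np

def svm_subgradient(X, y, C=1.0, iters=2000):
    """Subgradient descent on 0.5*||w||^2 + C * sum_i max(0, 1 - y_i*(<w, x_i> + b))."""
    n, m = X.shape
    w, b = np.zeros(m), 0.0
    for k in range(1, iters + 1):
        margins = y * (X @ w + b)
        active = margins < 1.0                        # points violating the margin
        gw = w - C * (y[active, None] * X[active]).sum(axis=0)
        gb = -C * y[active].sum()
        step = 1.0 / k                                # diminishing step size
        w -= step * gw
        b -= step * gb
    return w, b

# linearly separable toy data
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(-2, 1, (50, 2)), rng.normal(2, 1, (50, 2))])
y = np.hstack([-np.ones(50), np.ones(50)])
w, b = svm_subgradient(X, y)
print(np.mean(np.sign(X @ w + b) == y))               # training accuracy, close to 1.0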
Definition 8.1. In standard form,

inf_{w,b} k(w, b) + h(M(w, b) − e)   (8.12)

The conjugates are

k(w, b) = ½‖w‖²   (8.13)
k*(u, c) = ½‖u‖² + δ_{{0}}(c)   (8.14)
h(z) = δ_{≥0}(z),  h*(v) = δ_{≤0}(v).   (8.15)

In saddle-point form,

inf_{w,b} sup_z ½‖w‖²₂ + ⟨M(w, b) − e, z⟩ − δ_{≤0}(z)   (8.16)

The dual problem is

sup_z −⟨e, z⟩ − δ_{≤0}(z) − ½ ‖∑_{i=1}^n y_i x_i z_i‖²₂ − δ_{{0}}(⟨y, z⟩)   (8.17)

and thus

inf_z ½ ‖∑_i y_i x_i z_i‖²₂ + ⟨e, z⟩   (8.18)

such that z ≤ 0, ⟨y, z⟩ = 0.

The optimality conditions are

…   (8.19)

[Fill in rest of optimality conditions]

We use the fact that if k(x) + h(Ax − b) is our primal, then the dual is sup_y −⟨b, y⟩ − k*(−Aᵀy) − h*(y).

[Explanation of support vectors]
8.3 Kernel Trick
The idea is to embed our features into a higher-dimensional space via a mapping function φ. Then our decision function takes the form

h_θ(x) = sign(⟨φ(x), w⟩ + b)   (8.20)
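A short sketch (my addition) of how the kernel trick is used in practice: instead of computing φ explicitly, one works with the Gram matrix K_ij = k(x_i, x_j) and a decision function h(x) = sign(∑_i α_i y_i k(x_i, x) + b), here with a Gaussian (RBF) kernel. The coefficients alpha below are hypothetical placeholders that would normally come from solving the dual problem.

import numpy as np

def rbf_kernel(X, Z, gamma=0.5):
    """Gram matrix K[i, j] = exp(-gamma * ||X_i - Z_j||^2), without forming phi explicitly."""
    sq = ((X[:, None, :] - Z[None, :, :]) ** 2).sum(axis=-1)
    return np.exp(-gamma * sq)

def decision_function(X_train, y_train, alpha, b, X_test, gamma=0.5):
    """h(x) = sign(sum_i alpha_i y_i k(x_i, x) + b)."""
    K = rbf_kernel(X_test, X_train, gamma)
    return np.sign(K @ (alpha * y_train) + b)

# usage sketch: alpha would come from the dual SVM; here it is a dummy placeholder
X_train = np.array([[0.0, 0.0], [1.0, 1.0]])
y_train = np.array([-1.0, 1.0])
alpha = np.array([1.0, 1.0])                 # hypothetical dual variables
print(decision_function(X_train, y_train, alpha, 0.0, np.array([[0.9, 0.9]])))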
9
Total Variation
9.1 Meyer’s G-norm
Idea: a regulariser that favors textured regions.

‖u‖_G = inf {‖v‖_∞ | div v = u, v ∈ L∞(Rd, Rd)}   (9.1)

Discretized, we have

‖u‖_G = inf_v δ*_C(v) + δ_{{−Gᵀv = u}}(v),   (9.2)

C = {v | ∑_x ‖v(x)‖₂ ≤ 1}   (9.3)

= inf_v sup_w δ*_C(v) + ⟨w, Gᵀv + u⟩   (9.4)
= sup_w −(sup_v ⟨v, −Gw⟩ − δ*_C(v)) + ⟨w, u⟩   (9.5)
= sup_w −δ_C(−Gw) + ⟨w, u⟩   (9.6)
= sup_w ⟨w, u⟩ − δ_C(−Gw)   (9.7)
= sup {⟨w, u⟩ | TV(w) ≤ 1},   (9.8)

and so ‖·‖_G is the dual to TV:

‖·‖_G = δ*_{B_TV}   (9.9)

where B_TV = {u | TV(u) ≤ 1}. Similarly,

sup {⟨u, w⟩ | ‖w‖_G ≤ 1}   (9.10)
= sup {⟨u, w⟩ | ∃v : w = −Gᵀv, ‖v‖_∞ ≤ 1}   (9.11)
= sup {⟨u, −Gᵀv⟩ | ‖v‖_∞ ≤ 1}   (9.12)
= TV(u).   (9.13)

Why is ‖·‖_G good in separating noise?

arg min_u ½‖u − g‖²₂ s.t. ‖u‖_G ≤ λ   (9.14)
= Π_{λB_{‖·‖_G}}(g) = B(δ_{λB_{‖·‖_G}})(g) = g − B(λTV)(g),   (9.15)

i.e. the constrained projection returns g minus the TV-denoised (ROF) solution, which is exactly the noise/texture component.
9.2 Non-local Regularization
In real-world images, large ‖∇u‖ does not always mean noise.

Definition 9.1. Let Ω = {1, …, n}. Given u ∈ Rn, x, y ∈ Ω, and w : Ω² → R_{≥0}, define

∂_y u(x) = (u(y) − u(x)) w(x, y)   (9.16)
∇_w u(x) = (∂_y u(x))_{y∈Ω}   (9.17)

A suitable divergence is div_w v(x) = ∑_{y∈Ω} (v(x, y) − v(y, x)) w(x, y), adjoint to ∇_w with respect to the Euclidean inner products:

⟨−div_w v, u⟩ = ⟨v, ∇_w u⟩   (9.18)

Non-local regularisers are

J(u) = ∑_{x∈Ω} g(∇_w u(x))   (9.19)

with

TV^g_NL(u) = ∑_{x∈Ω} ‖∇_w u(x)‖₂   (9.20)
TV^d_NL(u) = ∑_{x∈Ω} ∑_{y∈Ω} |∂_y u(x)|   (9.21)

This reduces to classical TV if the weights are chosen as constant (w(x, y) = 1/h).
9.2.1 How to choose w(x, y)?
(i) Large if the neighborhoods of x and y are similar with respect to a distance metric.

(ii) Sparse, otherwise we have n² terms in the regulariser (n is the number of pixels).

A possible choice:

d_u(x, y) = ∫_Ω K_G(t) (u(y + t) − u(x + t))² dt   (9.22)

A(x) = arg min_A {∑_{y∈A} d_u(x, y) | A ⊆ S(x), |A| = k}   (9.23)

w(x, y) = 1 if y ∈ A(x) and x ∈ A(y), 0 otherwise   (9.24)

with K_G a Gaussian kernel of variance G².
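A sketch (my own, following the construction just described) of computing sparse non-local weights for a 1-D signal: for each pixel x, patch distances d_u(x, y) are computed with a Gaussian-weighted window, the k nearest neighbours form A(x), and w(x, y) = 1 when the relationship is mutual. The window size, k, and the search region S(x) = all pixels are illustrative choices.

import numpy as np

def nonlocal_weights(u, patch_radius=2, k=5, sigma=1.0):
    """Sparse weights w(x, y) from Gaussian-weighted patch distances on a 1-D signal."""
    n = len(u)
    t = np.arange(-patch_radius, patch_radius + 1)
    kern = np.exp(-t ** 2 / (2 * sigma ** 2))                    # K_G(t)
    up = np.pad(u, patch_radius, mode="edge")
    patches = np.stack([up[i:i + 2 * patch_radius + 1] for i in range(n)])
    # d_u(x, y) = sum_t K_G(t) * (u(y + t) - u(x + t))**2
    d = ((patches[:, None, :] - patches[None, :, :]) ** 2 * kern).sum(axis=-1)
    np.fill_diagonal(d, np.inf)                                  # exclude y = x
    A = np.argsort(d, axis=1)[:, :k]                             # k nearest neighbours A(x)
    member = np.zeros((n, n), dtype=bool)
    member[np.arange(n)[:, None], A] = True
    return (member & member.T).astype(float)                     # w(x, y) = 1 iff y in A(x) and x in A(y)

rng = np.random.default_rng(0)
u = np.sin(np.linspace(0, 6 * np.pi, 120)) + 0.1 * rng.standard_normal(120)
w = nonlocal_weights(u)
print(w.sum(axis=1)[:5])    # number of mutual neighbours per pixel (at most k)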
10
Relaxation
Example 10.1 (Segmentation). Find C ⊆ Ω that fits the given data and
prior knowledge about typical shapes.
Theorem 10.2 (Chan-Vese). Given g : Ω → R, Ω ⊆ Rd, consider

inf_{C⊆Ω, c₁,c₂∈R} f_CV(C, c₁, c₂)   (10.1)

f_CV(C, c₁, c₂) = ∫_C (g − c₁)² dx + ∫_{Ω\C} (g − c₂)² dx + λ H^{d−1}(∂C)   (10.2)

[Confirm form of the regulariser]

Thus we fit c₁ to the shape, c₂ to the outside of the shape, and C is the region of the shape.
10.1 Mumford-Shah model
inf_{K ⊆ Ω closed, u ∈ C∞(Ω\K)} f(K, u)   (10.3)

f(K, u) = ∫_Ω (g − u)² dx + λ ∫_{Ω\K} ‖∇u‖²₂ dx + ν H^{d−1}(∂K).   (10.4)

Chan-Vese is a special case of this, obtained by forcing u = c₁ I(C) + c₂ (1 − I(C)).
inf_{C⊆Ω, c₁,c₂∈R} ∫_C (g − c₁)² dx + ∫_{Ω\C} (g − c₂)² dx + λ H^{d−1}(∂C)   (10.5)

inf_{u∈BV(Ω,{0,1}), c₁,c₂∈R} ∫_Ω u (g − c₁)² + (1 − u)(g − c₂)² dx + λ TV(u)   (10.6)

Fix c₁, c₂. Then

inf_{u∈BV(Ω,{0,1})} ∫_Ω u S(·) dx + λ TV(u),   S(·) = (g − c₁)² − (g − c₂)².   (C)

Replacing {0, 1} by its convex hull, we obtain

inf_{u∈BV(Ω,[0,1])} ∫_Ω u · S dx + λ TV(u)   (R)

This is a "convex relaxation":

(i) Replace the non-convex energy by a convex approximation.

(ii) Replace the non-convex f by con f.

Can the minima of (R) have values ∉ {0, 1}? Assume u₁, u₂ ∈ BV(Ω, {0, 1}) both minimize (R) and (C), and u₁ ≠ u₂. Then

½ u₁ + ½ u₂ ∉ BV(Ω, {0, 1})   (10.7)

is a minimizer of (R).
Proposition 10.3. Let c₁, c₂ be fixed. If u minimizes (R) and u ∈ BV(Ω, {0, 1}), then u minimizes (C) and (with u = 1_C) C minimizes f_CV(·, c₁, c₂).

Proof. Let C′ ⊆ Ω. Then 1_{C′} ∈ BV(Ω, [0, 1]), hence

f(1_{C′}) ≥ f(1_C)   (10.8)

for all C′ ⊆ Ω.

Definition 10.4. Let L = BV(Ω, [0, 1]). For u ∈ L and α ∈ [0, 1],

u_α(x) = I(u > α)(x) = 1 if u(x) > α, 0 if u(x) ≤ α.   (10.9)

Then f : L → R satisfies the general coarea condition (gcc) if and only if

f(u) = ∫₀¹ f(u_α) dα.   (10.10)

[Upper bound?]
Proposition 10.5. Let s ∈ L∞(Ω), Ω bounded. Then f as in (R) satisfies the gcc.

Proof. If f₁, f₂ satisfy the gcc, then f₁ + f₂ satisfies the gcc. The term λTV satisfies the gcc by the coarea formula for total variation.

For the linear term, we have

∫_Ω u(x) S(x) dx = ∫_Ω (∫₀¹ I(u(x) > α) S(x) dα) dx   (10.11)
                 = ∫₀¹ ∫_Ω I(u(x) > α) S(x) dx dα.   (10.12)
Theorem 10.6. Assume f : BV(Ω, [0, 1]) → R satisfies the gcc and

u* ∈ arg min_{u∈BV(Ω,[0,1])} f(u).   (10.13)

Then for almost any α ∈ [0, 1], u*_α is a minimizer of f over BV(Ω, {0, 1}), with

u*_α ∈ arg min_{u∈BV(Ω,{0,1})} f(u).   (10.14)
Proof. Let

S = {α | f(u*_α) ≠ f(u*)}.   (10.15)

If L(S) = 0, then for a.e. α,

f(u*_α) = f(u*) = inf_{u∈BV(Ω,[0,1])} f(u).   (10.16)

If L(S) > 0, then there exists ε > 0 such that

L(S_ε) > 0,   S_ε = {α | f(u*_α) ≥ f(u*) + ε},   (10.17)

which implies

f(u*) = ∫_{[0,1]\S_ε} f(u*) dα + ∫_{S_ε} f(u*) dα   (10.18)
      ≤ ∫₀¹ f(u*_α) dα − ε L(S_ε)   (10.19)
      < ∫₀¹ f(u*_α) dα,   (10.20)

which contradicts the gcc.
Remark 10.7. As a discretization problem, consider

min_{u∈Rn} f(u),   f(u) = ⟨s, u⟩ + λ ∑_i ‖(Gu)_i‖₂,   such that 0 ≤ u ≤ 1,   (10.21)

versus

min_{u∈Rn} f(u),   f(u) = ⟨s, u⟩ + λ ∑_i ‖(Gu)_i‖₂,   such that u ∈ {0, 1}^n.   (10.22)

If u* solves the relaxation, does u*_α solve the combinatorial problem? Only if the discretized energy f satisfies the gcc.
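To illustrate Remark 10.7 (my own sketch): solve the relaxed discretized problem min_{0≤u≤1} ⟨s, u⟩ + λ∑_i |u_{i+1} − u_i| for a 1-D signal by projected subgradient descent, then threshold u* at α = ½. The forward-difference operator, step sizes, and data are illustrative choices; as the remark notes, the thresholded u*_α is only guaranteed to solve the combinatorial problem when the discrete energy satisfies the gcc.

import numpy as np

def relax_and_threshold(s, lam=1.0, iters=5000, alpha=0.5):
    """Projected subgradient descent on f(u) = <s, u> + lam*sum|u_{i+1} - u_i| over [0, 1]^n,
       then threshold the relaxed minimizer at level alpha."""
    u = np.full(len(s), 0.5)
    for k in range(1, iters + 1):
        du = np.diff(u)
        g_tv = np.zeros_like(u)
        g_tv[:-1] -= np.sign(du)                    # subgradient of the discrete TV term
        g_tv[1:] += np.sign(du)
        u = np.clip(u - (0.5 / np.sqrt(k)) * (s + lam * g_tv), 0.0, 1.0)   # project onto [0, 1]
    return u, (u > alpha).astype(float)

# toy 1-D "segmentation": s = (g - c1)^2 - (g - c2)^2 with c1 = 1, c2 = 0
rng = np.random.default_rng(0)
g = np.concatenate([np.ones(40), np.zeros(40)]) + 0.2 * rng.standard_normal(80)
s = (g - 1.0) ** 2 - g ** 2
u_relaxed, u_binary = relax_and_threshold(s, lam=0.5)
print(u_binary[:5], u_binary[-5:])                  # ~1 on the left half, ~0 on the right half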