convex optimization a journey of 60 yearsdimitrib/convex_opt_paths_ahead_s… · lids paths ahead...

Convex OptimizationA Journey of 60 Years

Dimitri P. Bertsekas

Department of Electrical Engineering and Computer ScienceMassachusetts Institute of Technology

LIDS Paths Ahead Symposium, 2009

History and Prehistory

Prehistory: Early 1900s - 1949.Caratheodory, Minkowski, Steinitz, Farkas.Properties of convex sets and functions.

Fenchel - Rockafellar era: 1949 - mid 1980s.Duality theory.Minimax/game theory (von Neumann).(Sub)differentiability, optimality conditions, sensitivity.

Modern era - Paradigm shift: Mid 1980s - present.Nonsmooth analysis (a theoretical/esoteric direction).Algorithms (a practical/high impact direction).A change in the assumptions underlying the field.

Duality

Two different views of the same object.

Example: Dual description of signals.

A union of points An intersection of hyperplanesTime domain Frequency domain

f�2,Xk

(−λ)

x1 x2 x3 x4 Slope: xk+1 Slope: xi, i ≤ k

f�(λ)

Constant− f�1 (λ) f�

2 (−λ) F ∗2,k(−λ)F ∗

k (λ)

�(g(x), f(x)) | x ∈ X

M =�(u, w) | there exists x ∈ X

Outer Linearization of f

F (x) H(y) y h(y)

supz∈Z

infx∈X

φ(x, z) ≤ supz∈Z

infx∈X

φ(x, z) = q∗ = p(0) ≤ p(0) = w∗ = infx∈X

supz∈Z

φ(x, z)

Shapley-Folkman Theorem: Let S = S1 + · · · + Sm with Si ⊂ �n,i = 1, . . . ,mIf s ∈ conv(S) then s = s1 + · · · + sm wheresi ∈ conv(Si) for all i = 1, . . . ,m,si ∈ Si for at least m− n− 1 indices i.

The sum of a large number of convex sets is almost convexNonconvexity of the sum is caused by a small number (n + 1) of sets

f(x) = (cl )f(x)

q∗ = (cl )p(0) ≤ p(0) = w∗

Duality Gap DecompositionConvex and concave part can be estimated separatelyq is closed and concaveMin Common ProblemMax Crossing ProblemWeak Duality q∗ ≤ w∗

minimize w

subject to (0, w) ∈ M,

f�2,Xk

(−λ)

f�(λ)

2 (−λ) F ∗2,k(−λ)F ∗

k (λ)

�(g(x), f(x)) | x ∈ X

M =�(u,w) | there exists x ∈ X

F (x) H(y) y h(y)

supz∈Z

infx∈X

φ(x, z) = q∗ = p(0) ≤ p(0) = w∗ = infx∈X

supz∈Z

φ(x, z)

f(x) = (cl )f(x)

q∗ = (cl )p(0) ≤ p(0) = w∗

minimize w

Dual description of closed convex sets

f�2,Xk

(−λ)

f�(λ)

2 (−λ) F ∗2,k(−λ)F ∗

k (λ)

�(g(x), f(x)) | x ∈ X

F (x) H(y) y h(y)

supz∈Z

infx∈X

φ(x, z) = q∗ = p(0) ≤ p(0) = w∗ = infx∈X

supz∈Z

φ(x, z)

f(x) = (cl )f(x)

q∗ = (cl )p(0) ≤ p(0) = w∗

minimize w

A union of its points An intersection of halfspacesAbstract Min-Common/Max-Crossing TheoremsMinimax Duality (minmax=maxmin)Constrained Optimization DualityTheorems of the Alternative etcTime domain Frequency domain

f�2,Xk

(−λ)

f�(λ)

2 (−λ) F ∗2,k(−λ)F ∗

k (λ)

�(g(x), f(x)) | x ∈ X

F (x) H(y) y h(y)

supz∈Z

infx∈X

φ(x, z) = q∗ = p(0) ≤ p(0) = w∗ = infx∈X

supz∈Z

φ(x, z)

f(x) = (cl )f(x)

q∗ = (cl )p(0) ≤ p(0) = w∗

Duality Gap DecompositionConvex and concave part can be estimated separatelyq is closed and concaveMin Common Problem

Dual Description of Convex Functions

Define a closed convex function by its epigraph.Describe the epigraph by hyperplanes.Associate hyperplanes with crossing points (the conjugate function).

Polyhedral Convexity Template

infx!"n!f(x)! x#y

"= !h(y) (y, 1) Slope = y x

epi(f) w (µ, 1) q(µ)w$ is uniformly distributed in the interval [!1, 1]! ! f!(!) X = x Measurement

(µ, ")

3 5 9 11 13 4 10 1/6

Mean SquaredLeast squares estimate

E[! | X = x]

X = ! + W

M M = epi(p) (0, w$) epi(p)E[!] var(!) Hyperplane {x | y#x = 0}cone({a1, . . . , ar})u v M

(µ, 1)

infx!"n!f(x)! x#y

"= !h(y) (y, 1) Slope = y x

f(x) = (c/2)x2

f(x) = |x|

f(x) = !! "

h(y) = (1/2c)y2

epi(f) w (µ, 1) q(µ)w$ is uniformly distributed in the interval [!1, 1]! # f!(#) X = x Measurement

(µ, ")

3 5 9 11 13 4 10 1/6

Mean SquaredLeast squares estimate

E[! | X = x]

f(x) = supy!"n

!y#x! h(y)

infx!"n

!f(x)! x#y

"= !h(y) (y, 1) Slope = ! x 0

x#y ! h(y)

f(x) = (c/2)x2

f(x) = |x|

f(x) = !x ! "

" ! !1 1epi(f) w (µ, 1) q(µ)w$ is uniformly distributed in the interval [!1, 1]! # f!(#) X = x Measurement

(µ, ")

3 5 9 11 13 4 10 1/6

Indicator FunctionsSupport Functions

(!x, 1)

(!y, 1)

rf (x)

f(x) = supy!"n

!y#x! h(y)

y#x! h(y)

infx!"n!f(x)! x#y

"= !h(y)

(y, 1)

Slope = 2

Slope = 1/2

Indicator FunctionsSupport Functions

(!x, 1)

(!y, 1)

rf (x)

f(x) = supy!"n

!y#x! h(y)

y#x! h(y)

infx!"n!f(x)! x#y

"= !h(y)

(y, 1)

Slope = 2

Slope = 1/2

infx!"n

{f(x)! x#y} = !f!(y),

State sampling according to Markov chain P

(µ, 0)(a) (b) (c)

(0, 0) X (0, 1) cone(X) conv(X)

x" = !x+(1!!)x C x ! " x S S" x4 f(x) f(z) !f(x)+ (1!!)f(y)0 !"

dom(f)

f!!x + (1! !)y

"C x y z x1 x2 x3 x4 f(x) f(z) !f(x) + (1! !)f(y) 0

!f(x) + (1! !)f(y) C x y f(x) f(z) z = !x + (1 ! !)y

f(z) + (y ! z)#"f(z) f(z) + (x! z)#"f(z)

#x | f(x) # #

x + !(z ! x)

f(x) +f!x + !(z ! x)

"! f(x)

e1 e2 e3 e4 yk zk xk xk+1 0

Primal DescriptionDual DescriptionValues f(x) Crossing points f∗(y)A union of points An intersection of halfspacesAbstract Min-Common/Max-Crossing TheoremsMinimax Duality (minmax=maxmin)Constrained Optimization DualityTheorems of the Alternative etcTime domain Frequency domain

f�2,Xk

(−λ)

f�(λ)

2 (−λ) F ∗2,k(−λ)F ∗

k (λ)

�(g(x), f(x)) | x ∈ X

F (x) H(y) y h(y)

supz∈Z

infx∈X

φ(x, z) = q∗ = p(0) ≤ p(0) = w∗ = infx∈X

supz∈Z

φ(x, z)

f(x) = (cl )f(x)

q∗ = (cl )p(0) ≤ p(0) = w∗

Duality Gap Decomposition

Primal DescriptionDual DescriptionValues f(x) Crossing points f∗(y)A union of points An intersection of halfspacesAbstract Min-Common/Max-Crossing TheoremsMinimax Duality (minmax=maxmin)Constrained Optimization DualityTheorems of the Alternative etcTime domain Frequency domain

f�2,Xk

(−λ)

f�(λ)

2 (−λ) F ∗2,k(−λ)F ∗

k (λ)

�(g(x), f(x)) | x ∈ X

F (x) H(y) y h(y)

supz∈Z

infx∈X

φ(x, z) = q∗ = p(0) ≤ p(0) = w∗ = infx∈X

supz∈Z

φ(x, z)

f(x) = (cl )f(x)

q∗ = (cl )p(0) ≤ p(0) = w∗

Duality Gap Decomposition

Primal Description

Dual Description

Values f(x) Crossing points f∗(y)

A union of points An intersection of halfspacesAbstract Min-Common/Max-Crossing TheoremsMinimax Duality (minmax=maxmin)Constrained Optimization DualityTheorems of the Alternative etcTime domain Frequency domain

f�2,Xk

(−λ)

f�(λ)

2 (−λ) F ∗2,k(−λ)F ∗

k (λ)

�(g(x), f(x)) | x ∈ X

F (x) H(y) y h(y)

supz∈Z

infx∈X

φ(x, z) = q∗ = p(0) ≤ p(0) = w∗ = infx∈X

supz∈Z

φ(x, z)

f(x) = (cl )f(x)

Primal Description

Dual Description

A union of points An intersection of halfspacesAbstract Min-Common/Max-Crossing TheoremsMinimax Duality (minmax=maxmin)Constrained Optimization DualityTheorems of the Alternative etcTime domain Frequency domain

f�2,Xk

(−λ)

f�(λ)

2 (−λ) F ∗2,k(−λ)F ∗

k (λ)

�(g(x), f(x)) | x ∈ X

F (x) H(y) y h(y)

supz∈Z

infx∈X

φ(x, z) = q∗ = p(0) ≤ p(0) = w∗ = infx∈X

supz∈Z

φ(x, z)

f(x) = (cl )f(x)

Fenchel Duality Framework - The Primal Problem

a1 a2 x 0

(a) (b) (c) Slope λ Slope λ∗ f1(x) −f2(x) h1(λ) + h2(−λ) x∗ f∗ − q∗

cone�{a1, a2, a3}

�{x | a�

jx ≤ 0, j = 1, 2, 3}

D C C∗ y z x H P P C ∩H1 C ∩H1 ∩H2

{y | y�a1 ≤ 0} {y | y�a2 ≤ 0}(0, w∗) w a1 a2 a3 a4 a5 c1 c2 v1 v2 v3

c = µ∗1a1 + µ∗

β α −1 1 0 N(A) R(A�) D = P ∩ aff(C) M = H ∩ aff(C)

�(g(x), f(x)) | x ∈ X

C = aff(C)∩ (Closed Halfspace Containing C)

M =�(u, w) | there exists x ∈ X such that g(x) ≤ u, f(x) ≤ w

M =�(u, w) | g(x) ≤ u, f(x) ≤ w for some x ∈ C

Separating Hyperplane H that Properly Separates C and D C and Pβ α −1 1 0

h(y) = (1/2c)y2

h(y) =�

0 if |y| ≤ 1∞ if |y| > 1

h(y) =�

β if y = α∞ if y �= α

epi(f) w (µ, 1) q(µ)3 5 9 11 1

3 4 10 1/6cone({a1, . . . , ar})u v M a

(µ, 1)

a1 a2 x 0

cone�{a1, a2, a3}

�{x | a�

jx ≤ 0, j = 1, 2, 3}

c = µ∗1a1 + µ∗

�(g(x), f(x)) | x ∈ X

h(y) = (1/2c)y2

h(y) =�

0 if |y| ≤ 1∞ if |y| > 1

h(y) =�

epi(f) w (µ, 1) q(µ)3 5 9 11 1

3 4 10 1/6cone({a1, . . . , ar})u v M a

(µ, 1)

a1 a2 x 0

cone�{a1, a2, a3}

�{x | a�

jx ≤ 0, j = 1, 2, 3}

c = µ∗1a1 + µ∗

�(g(x), f(x)) | x ∈ X

h(y) = (1/2c)y2

h(y) =�

0 if |y| ≤ 1∞ if |y| > 1

h(y) =�

epi(f) w (µ, 1) q(µ)3 5 9 11 1

3 4 10 1/6cone({a1, . . . , ar})u v M a

(µ, 1)

a1 a2 x 0

cone�{a1, a2, a3}

�{x | a�

jx ≤ 0, j = 1, 2, 3}

c = µ∗1a1 + µ∗

�(g(x), f(x)) | x ∈ X

h(y) = (1/2c)y2

h(y) =�

0 if |y| ≤ 1∞ if |y| > 1

h(y) =�

epi(f) w (µ, 1) q(µ)3 5 9 11 1

3 4 10 1/6cone({a1, . . . , ar})u v M a

(µ, 1)

Primal Description

Dual Description

A union of points An intersection of halfspaces minx

�f1(x) + f2(x)

Abstract Min-Common/Max-Crossing TheoremsMinimax Duality (minmax=maxmin)Constrained Optimization DualityTheorems of the Alternative etcTime domain Frequency domain

f�2,Xk

(−λ)

f�(λ)

2 (−λ) F ∗2,k(−λ)F ∗

k (λ)

�(g(x), f(x)) | x ∈ X

F (x) H(y) y h(y)

supz∈Z

infx∈X

φ(x, z) = q∗ = p(0) ≤ p(0) = w∗ = infx∈X

supz∈Z

φ(x, z)

f(x) = (cl )f(x)

Fenchel Primal and Dual Problem Descriptions

a1 a2 x 0

cone�{a1, a2, a3}

�{x | a�

jx ≤ 0, j = 1, 2, 3}

c = µ∗1a1 + µ∗

�(g(x), f(x)) | x ∈ X

h(y) = (1/2c)y2

h(y) =�

0 if |y| ≤ 1∞ if |y| > 1

h(y) =�

epi(f) w (µ, 1) q(µ)3 5 9 11 1

3 4 10 1/6cone({a1, . . . , ar})u v M a

(µ, 1)

a1 a2 x 0

cone�{a1, a2, a3}

�{x | a�

jx ≤ 0, j = 1, 2, 3}

c = µ∗1a1 + µ∗

�(g(x), f(x)) | x ∈ X

h(y) = (1/2c)y2

h(y) =�

0 if |y| ≤ 1∞ if |y| > 1

h(y) =�

epi(f) w (µ, 1) q(µ)3 5 9 11 1

3 4 10 1/6cone({a1, . . . , ar})u v M a

(µ, 1)

a1 a2 x 0

cone�{a1, a2, a3}

�{x | a�

jx ≤ 0, j = 1, 2, 3}

c = µ∗1a1 + µ∗

�(g(x), f(x)) | x ∈ X

h(y) = (1/2c)y2

h(y) =�

0 if |y| ≤ 1∞ if |y| > 1

h(y) =�

epi(f) w (µ, 1) q(µ)3 5 9 11 1

3 4 10 1/6cone({a1, . . . , ar})u v M a

(µ, 1)

a1 a2 x 0

cone�{a1, a2, a3}

�{x | a�

jx ≤ 0, j = 1, 2, 3}

c = µ∗1a1 + µ∗

�(g(x), f(x)) | x ∈ X

h(y) = (1/2c)y2

h(y) =�

0 if |y| ≤ 1∞ if |y| > 1

h(y) =�

epi(f) w (µ, 1) q(µ)3 5 9 11 1

3 4 10 1/6cone({a1, . . . , ar})u v M a

(µ, 1)

Primal Description

Dual Description

−f∗1 (y) f∗1 (y) + f∗2 (−y) f∗2 (−y)

Slope y∗ Slope y

�f1(x) + f2(x)

f�2,Xk

(−λ)

f�(λ)

2 (−λ) F ∗2,k(−λ)F ∗

k (λ)

�(g(x), f(x)) | x ∈ X

F (x) H(y) y h(y)

supz∈Z

infx∈X

φ(x, z) = q∗ = p(0) ≤ p(0) = w∗ = infx∈X

supz∈Z

φ(x, z)

Primal Description

Dual Description

−f∗1 (y) f∗1 (y) + f∗2 (−y) f∗2 (−y)

Slope y∗ Slope y

�f1(x) + f2(x)

f�2,Xk

(−λ)

f�(λ)

2 (−λ) F ∗2,k(−λ)F ∗

k (λ)

�(g(x), f(x)) | x ∈ X

F (x) H(y) y h(y)

supz∈Z

infx∈X

φ(x, z) = q∗ = p(0) ≤ p(0) = w∗ = infx∈X

supz∈Z

φ(x, z)

Primal Description

Dual Description

−f∗1 (y) f∗1 (y) + f∗2 (−y) f∗2 (−y)

Slope y∗ Slope y

�f1(x) + f2(x)

f�2,Xk

(−λ)

f�(λ)

2 (−λ) F ∗2,k(−λ)F ∗

k (λ)

�(g(x), f(x)) | x ∈ X

F (x) H(y) y h(y)

supz∈Z

infx∈X

φ(x, z) = q∗ = p(0) ≤ p(0) = w∗ = infx∈X

supz∈Z

φ(x, z)

Primal Description

Dual Description

−f∗1 (y) f∗1 (y) + f∗2 (−y) f∗2 (−y)

Slope y∗ Slope y

�f1(x) + f2(x)

f�2,Xk

(−λ)

f�(λ)

2 (−λ) F ∗2,k(−λ)F ∗

k (λ)

�(g(x), f(x)) | x ∈ X

F (x) H(y) y h(y)

supz∈Z

infx∈X

φ(x, z) = q∗ = p(0) ≤ p(0) = w∗ = infx∈X

supz∈Z

φ(x, z)

Primal Description

Dual Description

Vertical Distances

Crossing Point Differentials

−f∗1 (y) f∗1 (y) + f∗2 (−y) f∗2 (−y)

Slope y∗ Slope y

A union of points An intersection of halfspaces

�f1(x) + f2(x)

�= maxy

�f∗1 (y) + f∗2 (−y)

Abstract Min-Common/Max-Crossing Theorems

Minimax Duality (minmax=maxmin)Constrained Optimization DualityTheorems of the Alternative etcTime domain Frequency domain

f�2,Xk

(−λ)

f�(λ)

2 (−λ) F ∗2,k(−λ)F ∗

k (λ)

�(g(x), f(x)) | x ∈ X

F (x) H(y) y h(y)

supz∈Z

infx∈X

φ(x, z) = q∗ = p(0) ≤ p(0) = w∗ = infx∈X

supz∈Z

φ(x, z)

Primal Description

Dual Description

Vertical Distances

−f∗1 (y) f∗1 (y) + f∗2 (−y) f∗2 (−y)

Slope y∗ Slope y

�f1(x) + f2(x)

�= maxy

�f∗1 (y) + f∗2 (−y)

f�2,Xk

(−λ)

f�(λ)

2 (−λ) F ∗2,k(−λ)F ∗

k (λ)

�(g(x), f(x)) | x ∈ X

F (x) H(y) y h(y)

supz∈Z

infx∈X

φ(x, z) = q∗ = p(0) ≤ p(0) = w∗ = infx∈X

supz∈Z

φ(x, z)

Primal Description

Dual Description

Vertical Distances

−f∗1 (y) f∗1 (y) + f∗2 (−y) f∗2 (−y)

Slope y∗ Slope y

�f1(x) + f2(x)

�= maxy

�f∗1 (y) + f∗2 (−y)

f�2,Xk

(−λ)

f�(λ)

2 (−λ) F ∗2,k(−λ)F ∗

k (λ)

�(g(x), f(x)) | x ∈ X

F (x) H(y) y h(y)

supz∈Z

infx∈X

φ(x, z) = q∗ = p(0) ≤ p(0) = w∗ = infx∈X

supz∈Z

φ(x, z)

Primal Description

Dual Description

Vertical Distances

−f∗1 (y) f∗1 (y) + f∗2 (−y) f∗2 (−y)

Slope y∗ Slope y

�f1(x) + f2(x)

�= maxy

�f∗1 (y) + f∗2 (−y)

f�2,Xk

(−λ)

f�(λ)

2 (−λ) F ∗2,k(−λ)F ∗

k (λ)

�(g(x), f(x)) | x ∈ X

F (x) H(y) y h(y)

supz∈Z

infx∈X

φ(x, z) = q∗ = p(0) ≤ p(0) = w∗ = infx∈X

supz∈Z

φ(x, z)

Fenchel Duality

a1 a2 x 0

cone�{a1, a2, a3}

�{x | a�

jx ≤ 0, j = 1, 2, 3}

c = µ∗1a1 + µ∗

�(g(x), f(x)) | x ∈ X

h(y) = (1/2c)y2

h(y) =�

0 if |y| ≤ 1∞ if |y| > 1

h(y) =�

epi(f) w (µ, 1) q(µ)3 5 9 11 1

3 4 10 1/6cone({a1, . . . , ar})u v M a

(µ, 1)

a1 a2 x 0

cone�{a1, a2, a3}

�{x | a�

jx ≤ 0, j = 1, 2, 3}

c = µ∗1a1 + µ∗

�(g(x), f(x)) | x ∈ X

h(y) = (1/2c)y2

h(y) =�

0 if |y| ≤ 1∞ if |y| > 1

h(y) =�

epi(f) w (µ, 1) q(µ)3 5 9 11 1

3 4 10 1/6cone({a1, . . . , ar})u v M a

(µ, 1)

a1 a2 x 0

cone�{a1, a2, a3}

�{x | a�

jx ≤ 0, j = 1, 2, 3}

c = µ∗1a1 + µ∗

�(g(x), f(x)) | x ∈ X

h(y) = (1/2c)y2

h(y) =�

0 if |y| ≤ 1∞ if |y| > 1

h(y) =�

epi(f) w (µ, 1) q(µ)3 5 9 11 1

3 4 10 1/6cone({a1, . . . , ar})u v M a

(µ, 1)

a1 a2 x 0

cone�{a1, a2, a3}

�{x | a�

jx ≤ 0, j = 1, 2, 3}

c = µ∗1a1 + µ∗

�(g(x), f(x)) | x ∈ X

h(y) = (1/2c)y2

h(y) =�

0 if |y| ≤ 1∞ if |y| > 1

h(y) =�

epi(f) w (µ, 1) q(µ)3 5 9 11 1

3 4 10 1/6cone({a1, . . . , ar})u v M a

(µ, 1)

Primal Description

Dual Description

−f∗1 (y) f∗1 (y) + f∗2 (−y) f∗2 (−y)

Slope y∗ Slope y

�f1(x) + f2(x)

f�2,Xk

(−λ)

f�(λ)

2 (−λ) F ∗2,k(−λ)F ∗

k (λ)

�(g(x), f(x)) | x ∈ X

F (x) H(y) y h(y)

supz∈Z

infx∈X

φ(x, z) = q∗ = p(0) ≤ p(0) = w∗ = infx∈X

supz∈Z

φ(x, z)

Primal Description

Dual Description

−f∗1 (y) f∗1 (y) + f∗2 (−y) f∗2 (−y)

Slope y∗ Slope y

�f1(x) + f2(x)

f�2,Xk

(−λ)

f�(λ)

2 (−λ) F ∗2,k(−λ)F ∗

k (λ)

�(g(x), f(x)) | x ∈ X

F (x) H(y) y h(y)

supz∈Z

infx∈X

φ(x, z) = q∗ = p(0) ≤ p(0) = w∗ = infx∈X

supz∈Z

φ(x, z)

Primal Description

Dual Description

−f∗1 (y) f∗1 (y) + f∗2 (−y) f∗2 (−y)

Slope y∗ Slope y

�f1(x) + f2(x)

f�2,Xk

(−λ)

f�(λ)

2 (−λ) F ∗2,k(−λ)F ∗

k (λ)

�(g(x), f(x)) | x ∈ X

F (x) H(y) y h(y)

supz∈Z

infx∈X

φ(x, z) = q∗ = p(0) ≤ p(0) = w∗ = infx∈X

supz∈Z

φ(x, z)

Primal Description

Dual Description

−f∗1 (y) f∗1 (y) + f∗2 (−y) f∗2 (−y)

Slope y∗ Slope y

�f1(x) + f2(x)

f�2,Xk

(−λ)

f�(λ)

2 (−λ) F ∗2,k(−λ)F ∗

k (λ)

�(g(x), f(x)) | x ∈ X

F (x) H(y) y h(y)

supz∈Z

infx∈X

φ(x, z) = q∗ = p(0) ≤ p(0) = w∗ = infx∈X

supz∈Z

φ(x, z)

Primal Description

Dual Description

−f∗1 (y) f∗1 (y) + f∗2 (−y) f∗2 (−y)

Slope y∗ Slope y

�f1(x) + f2(x)

f�2,Xk

(−λ)

f�(λ)

2 (−λ) F ∗2,k(−λ)F ∗

k (λ)

�(g(x), f(x)) | x ∈ X

F (x) H(y) y h(y)

supz∈Z

infx∈X

φ(x, z) = q∗ = p(0) ≤ p(0) = w∗ = infx∈X

supz∈Z

φ(x, z)

�f1(x) + f2(x)

�= max

�− f�

1 (y)− f�2 (−y)

f�2,Xk

(−λ)

�B(x) ��B(x) S �� < �

Boundary of S

f�(λ)

2 (−λ) F ∗2,k(−λ)F ∗

k (λ)

�(g(x), f(x)) | x ∈ X

F (x) H(y) y h(y)

supz∈Z

infx∈X

φ(x, z) = q∗ = p(0) ≤ p(0) = w∗ = infx∈X

supz∈Z

φ(x, z)

f(x) = (cl )f(x)

q∗ = (cl )p(0) ≤ p(0) = w∗

Duality Gap DecompositionConvex and concave part can be estimated separatelyq is closed and concaveMin Common ProblemMax Crossing Problem

A More Abstract View of Duality

Despite its elegance, the Fenchel framework is somewhat indirect.From duality of set descriptions, to

duality of functional descriptions, toduality of problem descriptions.

A more direct approach:Start with a set, then to

two simple prototype problems dual to each other.

Avoid functional descriptions (a simpler, less constrained framework).

Min-Common/Max-Crossing Duality

Negative Halfspace {x | a!x ! b}Positive Halfspace {x | a!x " b}

a!(C) C C # S" d z x

Hyperplane {x | a!x = b} = {x | a!x = a!x}

x# x f!!x# + (1 $ !)x

x0 $ d x1 x2 x x4 $ d x5 $ d d

x0 x1 x2 x3

a0 a1 a2 a3

X 0 u w (µ, ") (u, w)µ

!u + w

#X(y)/%y%

x Wk Nk Wk y C2 C C2k+1 yk AC

C = C + S"

Nonvertical Vertical

Hyperplane

Level Sets of f Constancy Space Lf #$k=0Ck Rf

Level Sets of f " ! $1 1(µ, 0) cl(C)

Min Common Point w*

Max Crossing Point q*

Min Common Point w*w w

MMax Crossing Point q*

Min Common Point w*

x# x f!!x# + (1 $ !)x

x0 $ d x1 x2 x x4 $ d x5 $ d d

x0 x1 x2 x3

a0 a1 a2 a3

X 0 u w (µ, ") (u, w)µ

!u + w

#X(y)/%y%

C = C + S"

Hyperplane

x# x f!!x# + (1 $ !)x

x0 $ d x1 x2 x x4 $ d x5 $ d d

x0 x1 x2 x3

a0 a1 a2 a3

X 0 u w (µ, ") (u, w)µ

!u + w

#X(y)/%y%

C = C + S"

Hyperplane

x# x f!!x# + (1 $ !)x

x0 $ d x1 x2 x x4 $ d x5 $ d d

x0 x1 x2 x3

a0 a1 a2 a3

X 0 u w (µ, ") (u, w)µ

!u + w

#X(y)/%y%

C = C + S"

Hyperplane

x# x f!!x# + (1 $ !)x

x0 $ d x1 x2 x x4 $ d x5 $ d d

x0 x1 x2 x3

a0 a1 a2 a3

X 0 u w (µ, ") (u, w)µ

!u + w

#X(y)/%y%

C = C + S"

Hyperplane

x# x f!!x# + (1 $ !)x

x0 $ d x1 x2 x x4 $ d x5 $ d d

x0 x1 x2 x3

a0 a1 a2 a3

X 0 u w (µ, ") (u, w)µ

!u + w

#X(y)/%y%

C = C + S"

Hyperplane

x# x f!!x# + (1 $ !)x

x0 $ d x1 x2 x x4 $ d x5 $ d d

x0 x1 x2 x3

a0 a1 a2 a3

X 0 u w (µ, ") (u, w)µ

!u + w

#X(y)/%y%

C = C + S"

Hyperplane

x# x f!!x# + (1 $ !)x

x0 $ d x1 x2 x x4 $ d x5 $ d d

x0 x1 x2 x3

a0 a1 a2 a3

X 0 u w (µ, ") (u, w)µ

!u + w

#X(y)/%y%

C = C + S"

Hyperplane

x# x f!!x# + (1 $ !)x

x0 $ d x1 x2 x x4 $ d x5 $ d d

x0 x1 x2 x3

a0 a1 a2 a3

X 0 u w (µ, ") (u, w)µ

!u + w

#X(y)/%y%

C = C + S"

Hyperplane

x# x f!!x# + (1 $ !)x

x0 $ d x1 x2 x x4 $ d x5 $ d d

x0 x1 x2 x3

a0 a1 a2 a3

X 0 u w (µ, ") (u, w)µ

!u + w

#X(y)/%y%

C = C + S"

Hyperplane

x# x f!!x# + (1 $ !)x

x0 $ d x1 x2 x x4 $ d x5 $ d d

x0 x1 x2 x3

a0 a1 a2 a3

X 0 u w (µ, ") (u, w)µ

!u + w

#X(y)/%y%

x M M Wk y C2 C C2k+1 yk AC

C = C + S"

Hyperplane

x# x f!!x# + (1 $ !)x

x0 $ d x1 x2 x x4 $ d x5 $ d d

x0 x1 x2 x3

a0 a1 a2 a3

X 0 u w (µ, ") (u, w)µ

!u + w

#X(y)/%y%

C = C + S"

Hyperplane

x# x f!!x# + (1 $ !)x

x0 $ d x1 x2 x x4 $ d x5 $ d d

x0 x1 x2 x3

a0 a1 a2 a3

X 0 u w (µ, ") (u, w)µ

!u + w

#X(y)/%y%

C = C + S"

Hyperplane

x# x f!!x# + (1 $ !)x

x0 $ d x1 x2 x x4 $ d x5 $ d d

x0 x1 x2 x3

a0 a1 a2 a3

X 0 u w (µ, ") (u, w)µ

!u + w

#X(y)/%y%

C = C + S"

Hyperplane

x# x f!!x# + (1 $ !)x

x0 $ d x1 x2 x x4 $ d x5 $ d d

x0 x1 x2 x3

a0 a1 a2 a3

X 0 u w (µ, ") (u, w)µ

!u + w

#X(y)/%y%

C = C + S"

Hyperplane

x# x f!!x# + (1 $ !)x

x0 $ d x1 x2 x x4 $ d x5 $ d d

x0 x1 x2 x3

a0 a1 a2 a3

X 0 u w (µ, ") (u, w)µ

!u + w

#X(y)/%y%

C = C + S"

Min Common Point w#

Hyperplane

x# x f!!x# + (1 $ !)x

x0 $ d x1 x2 x x4 $ d x5 $ d d

x0 x1 x2 x3

a0 a1 a2 a3

X 0 u w (µ, ") (u, w)µ

!u + w

#X(y)/%y%

C = C + S"

Min Common Point w#

Hyperplane

x# x f!!x# + (1 $ !)x

x0 $ d x1 x2 x x4 $ d x5 $ d d

x0 x1 x2 x3

a0 a1 a2 a3

X 0 u w (µ, ") (u, w)µ

!u + w

#X(y)/%y%

C = C + S"

Min Common Point w#

Hyperplane

x# x f!!x# + (1 $ !)x

x0 $ d x1 x2 x x4 $ d x5 $ d d

x0 x1 x2 x3

a0 a1 a2 a3

X 0 u w (µ, ") (u, w)µ

!u + w

#X(y)/%y%

C = C + S"

Min Common Point w#

Hyperplane

x# x f!!x# + (1 $ !)x

x0 $ d x1 x2 x x4 $ d x5 $ d d

x0 x1 x2 x3

a0 a1 a2 a3

X 0 u w (µ, ") (u, w)µ

!u + w

#X(y)/%y%

C = C + S"

Min Common Point w#

Hyperplane

x# x f!!x# + (1 $ !)x

x0 $ d x1 x2 x x4 $ d x5 $ d d

x0 x1 x2 x3

a0 a1 a2 a3

X 0 u w (µ, ") (u, w)µ

!u + w

#X(y)/%y%

C = C + S"

Min Common Point w#

Hyperplane

x# x f!!x# + (1 $ !)x

x0 $ d x1 x2 x x4 $ d x5 $ d d

x0 x1 x2 x3

a0 a1 a2 a3

X 0 u w (µ, ") (u, w)µ

!u + w

#X(y)/%y%

C = C + S"

Min Common Point w#

Max Crossing Point q#

x# x f!!x# + (1 $ !)x

x0 $ d x1 x2 x x4 $ d x5 $ d d

x0 x1 x2 x3

a0 a1 a2 a3

X 0 u w (µ, ") (u, w)µ

!u + w

#X(y)/%y%

C = C + S"

Min Common Point w#

x# x f!!x# + (1 $ !)x

x0 $ d x1 x2 x x4 $ d x5 $ d d

x0 x1 x2 x3

a0 a1 a2 a3

X 0 u w (µ, ") (u, w)µ

!u + w

#X(y)/%y%

C = C + S"

Min Common Point w#

x# x f!!x# + (1 $ !)x

x0 $ d x1 x2 x x4 $ d x5 $ d d

x0 x1 x2 x3

a0 a1 a2 a3

X 0 u w (µ, ") (u, w)µ

!u + w

#X(y)/%y%

C = C + S"

Min Common Point w#

x# x f!!x# + (1 $ !)x

x0 $ d x1 x2 x x4 $ d x5 $ d d

x0 x1 x2 x3

a0 a1 a2 a3

X 0 u w (µ, ") (u, w)µ

!u + w

#X(y)/%y%

C = C + S"

Min Common Point w#

x# x f!!x# + (1 $ !)x

x0 $ d x1 x2 x x4 $ d x5 $ d d

x0 x1 x2 x3

a0 a1 a2 a3

X 0 u w (µ, ") (u, w)µ

!u + w

#X(y)/%y%

C = C + S"

Min Common Point w#

w(µ + !µ, !")

(a) (b) (c)

(µ, 1)

w(µ + !µ, !")

(a) (b) (c)

(µ, 1)

w(µ + !µ, !")

(a) (b) (c)

(µ, 1)

Abstract Framework for Duality Analysis

A union of points An intersection of hyperplanesAbstract Min-Common/Max-Crossing TheoremsMinimax Duality (minmax=maxmin)Constrained Optimization DualityTheorems of the Alternative etcTime domain Frequency domain

f�2,Xk

(−λ)

f�(λ)

2 (−λ) F ∗2,k(−λ)F ∗

k (λ)

�(g(x), f(x)) | x ∈ X

F (x) H(y) y h(y)

supz∈Z

infx∈X

φ(x, z) = q∗ = p(0) ≤ p(0) = w∗ = infx∈X

supz∈Z

φ(x, z)

f(x) = (cl )f(x)

q∗ = (cl )p(0) ≤ p(0) = w∗

f�2,Xk

(−λ)

f�(λ)

2 (−λ) F ∗2,k(−λ)F ∗

k (λ)

�(g(x), f(x)) | x ∈ X

F (x) H(y) y h(y)

supz∈Z

infx∈X

φ(x, z) = q∗ = p(0) ≤ p(0) = w∗ = infx∈X

supz∈Z

φ(x, z)

f(x) = (cl )f(x)

q∗ = (cl )p(0) ≤ p(0) = w∗

f�2,Xk

(−λ)

f�(λ)

2 (−λ) F ∗2,k(−λ)F ∗

k (λ)

�(g(x), f(x)) | x ∈ X

F (x) H(y) y h(y)

supz∈Z

infx∈X

φ(x, z) = q∗ = p(0) ≤ p(0) = w∗ = infx∈X

supz∈Z

φ(x, z)

f(x) = (cl )f(x)

q∗ = (cl )p(0) ≤ p(0) = w∗

f�2,Xk

(−λ)

f�(λ)

2 (−λ) F ∗2,k(−λ)F ∗

k (λ)

�(g(x), f(x)) | x ∈ X

F (x) H(y) y h(y)

supz∈Z

infx∈X

φ(x, z) = q∗ = p(0) ≤ p(0) = w∗ = infx∈X

supz∈Z

φ(x, z)

f(x) = (cl )f(x)

q∗ = (cl )p(0) ≤ p(0) = w∗

f�2,Xk

(−λ)

f�(λ)

2 (−λ) F ∗2,k(−λ)F ∗

k (λ)

�(g(x), f(x)) | x ∈ X

F (x) H(y) y h(y)

supz∈Z

infx∈X

φ(x, z) = q∗ = p(0) ≤ p(0) = w∗ = infx∈X

supz∈Z

φ(x, z)

f(x) = (cl )f(x)

q∗ = (cl )p(0) ≤ p(0) = w∗

f�2,Xk

(−λ)

f�(λ)

2 (−λ) F ∗2,k(−λ)F ∗

k (λ)

�(g(x), f(x)) | x ∈ X

F (x) H(y) y h(y)

supz∈Z

infx∈X

φ(x, z) = q∗ = p(0) ≤ p(0) = w∗ = infx∈X

supz∈Z

φ(x, z)

f(x) = (cl )f(x)

q∗ = (cl )p(0) ≤ p(0) = w∗

f�2,Xk

(−λ)

f�(λ)

2 (−λ) F ∗2,k(−λ)F ∗

k (λ)

�(g(x), f(x)) | x ∈ X

F (x) H(y) y h(y)

supz∈Z

infx∈X

φ(x, z) = q∗ = p(0) ≤ p(0) = w∗ = infx∈X

supz∈Z

φ(x, z)

f(x) = (cl )f(x)

q∗ = (cl )p(0) ≤ p(0) = w∗

LP CONVEX NLP

Simplex

Gradient/Newton

Duality

Subgradient Cutting plane Interior point Subgradient

LPs are solved by simplex method

NLPs are solved by gradient/Newton methods.

Convex programs are special cases of NLPs.

Modern view: Post 1990s

LPs are often best solved by nonsimplex/convex methods.

Convex problems are often solved by the same methods as LPs.

Nondifferentiability and piecewise linearity are common features.Primal Description

Dual Description

Vertical Distances

−f∗1 (y) f∗

1 (y) + f∗2 (−y) f∗

2 (−y)

Slope y∗ Slope y

�f1(x) + f2(x)

�= maxy

�f∗1 (y) + f∗

2 (−y)�

Minimax Duality ( MinMax = MaxMin )

Constrained Optimization Duality

Theorems of the Alternative etc

LP CONVEX NLP

Simplex

Gradient/Newton

Duality

Dual Description

Vertical Distances

−f∗1 (y) f∗

1 (y) + f∗2 (−y) f∗

2 (−y)

Slope y∗ Slope y

�f1(x) + f2(x)

�= maxy

�f∗1 (y) + f∗

2 (−y)�

Abstract Geometric Framework

Special choices of M

f!2,Xk

x1 x2 x3 x4 Slope: xk+1 Slope: xi, i " k

Constant! f!1 (!) f!

2 (!!) F !2,k(!!)F !

!(g(x), f(x)) | x # X

M =!(u, w) | there exists x # X

F (x) H(y) y h(y)

supz"Z

infx"X

"(x, z) " supz"Z

infx"X

"(x, z) = q! = p(0) " p(0) = w! = infx"X

supz"Z

"(x, z)

Shapley-Folkman Theorem: Let S = S1 + · · · + Sm with Si $ %n,i = 1, . . . , mIf s # conv(S) then s = s1 + · · · + sm wheresi # conv(Si) for all i = 1, . . . , m,si # Si for at least m! n! 1 indices i.

f(x) = (cl )f(x)

q! = (cl )p(0) " p(0) = w!

Duality Gap DecompositionConvex and concave part can be estimated separatelyq is closed and concaveMin Common ProblemMax Crossing ProblemWeak Duality q! " w!

minimize w

subject to (0, w) # M,

Special choices of M

f!2,Xk

x1 x2 x3 x4 Slope: xk+1 Slope: xi, i " k

Constant! f!1 (!) f!

2 (!!) F !2,k(!!)F !

!(g(x), f(x)) | x # X

M =!(u, w) | there exists x # X

F (x) H(y) y h(y)

supz"Z

infx"X

"(x, z) " supz"Z

infx"X

"(x, z) = q! = p(0) " p(0) = w! = infx"X

supz"Z

"(x, z)

Shapley-Folkman Theorem: Let S = S1 + · · · + Sm with Si $ %n,i = 1, . . . , mIf s # conv(S) then s = s1 + · · · + sm wheresi # conv(Si) for all i = 1, . . . , m,si # Si for at least m! n! 1 indices i.

f(x) = (cl )f(x)

q! = (cl )p(0) " p(0) = w!

Duality Gap DecompositionConvex and concave part can be estimated separatelyq is closed and concaveMin Common ProblemMax Crossing ProblemWeak Duality q! " w!

minimize w

subject to (0, w) # M,

LP CONVEX NLP

Simplex

Gradient/Newton

Duality

Nondi!erentiability and piecewise linearity are common features.Primal Description

Dual Description

Vertical Distances

Crossing Point Di!erentials

Values f(x) Crossing points f!(y)

!f!1 (y) f!

1 (y) + f!2 (!y) f!

2 (!y)

Slope y! Slope y

!f1(x) + f2(x)

"= maxy

!f!1 (y) + f!

2 (!y)"

Abstract Geometric Framework (Set M)

The Modern Era: Duality Coupled with Algorithms

Traditional view: Pre 1990sLPs are solved by simplex method (G. Dantzig view).NLPs are solved by gradient/Newton methods (M. Powell view).Convex programs are special cases of NLPs.

LP CONVEX NLP

Traditional view: Pre 1990s

Dual Description

Vertical Distances

−f∗1 (y) f∗

1 (y) + f∗2 (−y) f∗

2 (−y)

Slope y∗ Slope y

�f1(x) + f2(x)

�= maxy

�f∗1 (y) + f∗

2 (−y)�

LP CONVEX NLP

Dual Description

Vertical Distances

−f∗1 (y) f∗

1 (y) + f∗2 (−y) f∗

2 (−y)

Slope y∗ Slope y

�f1(x) + f2(x)

�= maxy

�f∗1 (y) + f∗

2 (−y)�

LP CONVEX NLP

Dual Description

Vertical Distances

−f∗1 (y) f∗

1 (y) + f∗2 (−y) f∗

2 (−y)

Slope y∗ Slope y

�f1(x) + f2(x)

�= maxy

�f∗1 (y) + f∗

2 (−y)�

LP CONVEX NLP

Simplex

Gradient/Newton

Duality

Dual Description

Vertical Distances

−f∗1 (y) f∗

1 (y) + f∗2 (−y) f∗

2 (−y)

Slope y∗ Slope y

�f1(x) + f2(x)

�= maxy

�f∗1 (y) + f∗

2 (−y)�

LP CONVEX NLP

Simplex

Gradient/Newton

Duality

Dual Description

Vertical Distances

−f∗1 (y) f∗

1 (y) + f∗2 (−y) f∗

2 (−y)

Slope y∗ Slope y

�f1(x) + f2(x)

�= maxy

�f∗1 (y) + f∗

2 (−y)�

LP CONVEX NLP

Simplex

Gradient/Newton

Duality

Dual Description

Vertical Distances

−f∗1 (y) f∗

1 (y) + f∗2 (−y) f∗

2 (−y)

Slope y∗ Slope y

�f1(x) + f2(x)

�= maxy

�f∗1 (y) + f∗

2 (−y)�

Modern view: Post 1990sLPs are often solved by nonsimplex/convex methods.Convex problems are often solved by the same methods as LPs.“Key distinction is not Linear-Nonlinear but Convex-Nonconvex" (Rockafellar)

LP CONVEX NLP

Dual Description

Vertical Distances

−f∗1 (y) f∗

1 (y) + f∗2 (−y) f∗

2 (−y)

Slope y∗ Slope y

�f1(x) + f2(x)

�= maxy

�f∗1 (y) + f∗

2 (−y)�

LP CONVEX NLP

Dual Description

Vertical Distances

−f∗1 (y) f∗

1 (y) + f∗2 (−y) f∗

2 (−y)

Slope y∗ Slope y

�f1(x) + f2(x)

�= maxy

�f∗1 (y) + f∗

2 (−y)�

LP CONVEX NLP

Dual Description

Vertical Distances

−f∗1 (y) f∗

1 (y) + f∗2 (−y) f∗

2 (−y)

Slope y∗ Slope y

�f1(x) + f2(x)

�= maxy

�f∗1 (y) + f∗

2 (−y)�

LP CONVEX NLP

Simplex

Gradient/Newton

Duality

Subgradient Cutting plane Interior point

Dual Description

Vertical Distances

−f∗1 (y) f∗

1 (y) + f∗2 (−y) f∗

2 (−y)

Slope y∗ Slope y

�f1(x) + f2(x)

�= maxy

�f∗1 (y) + f∗

2 (−y)�

LP CONVEX NLP

Simplex

Gradient/Newton

Duality

Dual Description

Vertical Distances

−f∗1 (y) f∗

1 (y) + f∗2 (−y) f∗

2 (−y)

Slope y∗ Slope y

�f1(x) + f2(x)

�= maxy

�f∗1 (y) + f∗

2 (−y)�

LP CONVEX NLP

Simplex

Gradient/Newton

Duality

Dual Description

Vertical Distances

−f∗1 (y) f∗

1 (y) + f∗2 (−y) f∗

2 (−y)

Slope y∗ Slope y

�f1(x) + f2(x)

�= maxy

�f∗1 (y) + f∗

2 (−y)�

LP CONVEX NLP

Simplex

Gradient/Newton

Duality

Dual Description

Vertical Distances

−f∗1 (y) f∗

1 (y) + f∗2 (−y) f∗

2 (−y)

Slope y∗ Slope y

�f1(x) + f2(x)

�= maxy

�f∗1 (y) + f∗

2 (−y)�

LP CONVEX NLP

Simplex

Gradient/Newton

Duality

Dual Description

Vertical Distances

−f∗1 (y) f∗

1 (y) + f∗2 (−y) f∗

2 (−y)

Slope y∗ Slope y

�f1(x) + f2(x)

�= maxy

�f∗1 (y) + f∗

2 (−y)�

LP CONVEX NLP

Simplex

Gradient/Newton

Duality

Dual Description

Vertical Distances

−f∗1 (y) f∗

1 (y) + f∗2 (−y) f∗

2 (−y)

Slope y∗ Slope y

�f1(x) + f2(x)

�= maxy

�f∗1 (y) + f∗

2 (−y)�

Methodological Trends

Convex programs and LPs connect around duality and large-scalepiecewise linear problems.

New methods, renewed interest in old methods.Interior point methodsSubgradient methodsPolyhedral approximation/cutting plane methodsRegularization/proximal methods

Renewed emphasis on complexity analysis.Nesterov, Nemirovski, and others ...“Optimal algorithms" (e.g., extrapolated gradient methods)

Synergy Between Duality, Algorithms, and Applications

Duality-based decomposition.Large-scale resource allocationLagrangian relaxation, discrete optimizationStochastic programming

Conic programming.Robust optimizationSemidefinite programming

Machine learning.Support vector machinesl1 regularization/Robust regression/Compressed sensingIncremental methods

Speculation - What’s Next?

“It is hard to predict, especially about the future" (N. Bohr)

Very large problems/new applications (?)Problems with network overlays (e.g., smart grids).Huge data sets in machine learning.

New approaches to large size and complexity (?)Approximate dynamic programming paradigm (e.g., LP-based dynamicprogramming).Reduced space approximations.Sampling mechanisms.

Better hardware/better algorithms multiplier effect (?)

A new paradigm shift (?)

convex optimization a journey of 60 yearsdimitrib/convex_opt_paths_ahead_s… · lids paths ahead...

Documents

cups & drinkware - bidfood · cold cups & lids hot cups &...

rural life: early 1900s

japan meiji 1700s-1900s

women - early 1900s

australia in 1900s

life in 1850s – 1900s

truck caps & lids

(1850s – early 1900s)

early 1900s

chennai in early 1900s

huckleberry finn country, early 1900s

history of russia (until 1900s)

ottoman & qing (1750-1900s)

20th century egypt - 1900s

the polynomial caratheodory-fejer approximation method...

manual basico lids al rescate

b3 55 hood lids

proof of the caratheodory conjecture

america in the 1900s

ottoman & qing 1750 1900s