CVPR 2010: Higher Order Models in Computer Vision: Part 3
Sections: Belief Propagation · Relaxations · MAP-MRF LP · Lagrangian Relaxation · Dual Decomposition · Polyhedral · Higher-order Interactions
Higher Order Models in Computer Vision: Inference
Carsten Rother, Sebastian Nowozin
Machine Learning and Perception Group, Microsoft Research Cambridge
San Francisco, 18th June 2010
Belief Propagation
Belief Propagation Reminder
- Sum-product belief propagation
- Max-product / max-sum / min-sum belief propagation

Uses:
- Messages: vectors at the factor graph edges
  - q_{Yi→F} ∈ R^{Yi}, the variable-to-factor message, and
  - r_{F→Yi} ∈ R^{Yi}, the factor-to-variable message.
- Updates: iteratively recomputing these vectors

[Figure: factor F and variable Yi exchanging the messages q_{Yi→F} and r_{F→Yi}]
Belief Propagation Reminder: Min-Sum Updates (cont)
Variable-to-factor message:

q_{Yi→F}(yi) = Σ_{F′ ∈ M(i)\{F}} r_{F′→Yi}(yi)

Factor-to-variable message:

r_{F→Yi}(yi) = min_{y′F ∈ YF : y′i = yi} [ E_F(y′F) + Σ_{j ∈ N(F)\{i}} q_{Yj→F}(y′j) ]
[Figures: variable Yi collects r_{A→Yi}, r_{B→Yi} from factors A, B and sends q_{Yi→F} to factor F; factor F collects q_{Yj→F}, q_{Yk→F} from Yj, Yk and sends r_{F→Yi} to Yi]
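As a concrete illustration, the two updates can be run on a tiny chain. The sketch below uses made-up energy tables (the unary factors A, B, the pairwise factor F, and binary variables are assumptions of the example, not part of the slides); on this tree the resulting beliefs are exact min-marginals.

```python
import numpy as np

# Toy chain A - Y1 - F - Y2 - B with illustrative (made-up) energies.
E_A = np.array([0.0, 1.0])            # unary factor A on Y1
E_B = np.array([0.5, 0.0])            # unary factor B on Y2
E_F = np.array([[0.0, 2.0],           # pairwise factor F: E_F(y1, y2)
                [2.0, 0.0]])

# Variable-to-factor messages: sum of incoming factor-to-variable
# messages from all other factors (here only the unary factor).
q_Y1_F = E_A.copy()                   # r_{A->Y1}(y1) = E_A(y1)
q_Y2_F = E_B.copy()

# Factor-to-variable messages: minimize E_F plus the other incoming
# variable-to-factor message over the other variable's states.
r_F_Y1 = np.min(E_F + q_Y2_F[None, :], axis=1)
r_F_Y2 = np.min(E_F + q_Y1_F[:, None], axis=0)

# On a tree the beliefs are exact min-marginals of the energy.
belief_Y1 = E_A + r_F_Y1
belief_Y2 = E_B + r_F_Y2

# Brute-force check over all four labelings.
energies = E_A[:, None] + E_F + E_B[None, :]
assert np.allclose(belief_Y1, energies.min(axis=1))
assert np.allclose(belief_Y2, energies.min(axis=0))
```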
Higher-order Factors
Factor-to-variable message:

r_{F→Yi}(yi) = min_{y′F ∈ YF : y′i = yi} [ E_F(y′F) + Σ_{j ∈ N(F)\{i}} q_{Yj→F}(y′j) ]

- The minimization becomes intractable for general E_F
- Tractable for special structure of E_F
- See (Felzenszwalb and Huttenlocher, CVPR 2004), (Potetz, CVPR 2007), and (Tarlow et al., AISTATS 2010)
[Figure: higher-order factor F over variables Yi, Yj, Yk, Yl, Ym, Yn sending the message r_{F→Yi}]
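A brute-force implementation makes the cost explicit: each message entry minimizes over all joint states of the other variables, which is what becomes intractable for general E_F as the factor order grows. The factor size, energy table, and incoming messages below are made up for illustration.

```python
import itertools
import numpy as np

# Made-up dense higher-order factor over 4 binary variables: each message
# entry requires a minimization over |Y|^(n-1) = 8 joint states.
rng = np.random.default_rng(0)
E_F = rng.normal(size=(2, 2, 2, 2))     # E_F(y0, y1, y2, y3)
q = rng.normal(size=(4, 2))             # incoming messages q_{Yj->F}(y_j)

def factor_to_variable(i):
    # r_{F->Yi}(y_i) = min over the other variables of
    #                  E_F(y) + sum_{j != i} q_{Yj->F}(y_j)
    r = np.full(2, np.inf)
    for y in itertools.product([0, 1], repeat=4):
        val = E_F[y] + sum(q[j, y[j]] for j in range(4) if j != i)
        r[y[i]] = min(r[y[i]], val)
    return r

r0 = factor_to_variable(0)   # enumeration is exponential in the factor order
```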
Relaxations
Problem Relaxations

- Optimization problems (minimizing g : G → R over Y ⊆ G) can become easier if
  - the feasible set is enlarged, and/or
  - the objective function is replaced with a bound.

[Figure: g over the feasible set Y and a lower bound h over the enlarged set Z ⊇ Y]
Definition (Relaxation (Geoffrion, 1974))
Given two optimization problems (g, Y, G) and (h, Z, G), the problem (h, Z, G) is said to be a relaxation of (g, Y, G) if
1. Z ⊇ Y, i.e. the feasible set of the relaxation contains the feasible set of the original problem, and
2. ∀y ∈ Y : h(y) ≤ g(y), i.e. over the original feasible set the objective function h achieves no larger values than the objective function g. (Minimization version)
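A one-dimensional toy instance of this definition (the function and sets are made up for illustration; here h = g and only the feasible set is enlarged):

```python
import numpy as np

# Minimize g(y) = y^2 - 3y over the integer set Y = {0, 1, 2, 3}.
# Relaxation: keep h = g, enlarge the feasible set to Z = [0, 3];
# both conditions of the definition then hold trivially.
def g(y):
    return y * y - 3.0 * y

Y = np.arange(4)
orig_opt = min(g(y) for y in Y)     # best integral value: g(1) = g(2) = -2
relaxed_opt = g(1.5)                # the parabola's minimum 1.5 lies in Z

# A relaxation can only lower the optimal value (minimization version).
assert relaxed_opt <= orig_opt
```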
Relaxations

- The relaxed solution z∗ provides a bound:
  h(z∗) ≥ g(y∗) (maximization: upper bound),
  h(z∗) ≤ g(y∗) (minimization: lower bound)
- The relaxation is typically tractable
- Evidence that relaxations are "better" for learning (Kulesza and Pereira, 2007), (Finley and Joachims, 2008), (Martins et al., 2009)
- Drawback: z∗ ∈ Z \ Y is possible:
  - the solution is not integral (but fractional), or
  - the solution violates constraints (primal infeasible).
- To obtain z′ ∈ Y from z∗: rounding, heuristics
Constructing Relaxations

Principled methods to construct relaxations:
- Linear programming relaxations
- Lagrangian relaxation
- Lagrangian/dual decomposition
MAP-MRF LP
MAP-MRF Linear Programming Relaxation
Simple two-variable pairwise MRF: variables Y1 and Y2 with unary potentials θ1, θ2 and a pairwise potential θ1,2 over Y1 × Y2.

[Factor graph figure]
Example labeling: y1 = 2, y2 = 3, i.e. (y1, y2) = (2, 3).

- Energy: E(y; x) = θ1(y1; x) + θ1,2(y1, y2; x) + θ2(y2; x)
- Probability: p(y|x) = (1/Z(x)) exp{−E(y; x)}
- MAP prediction: argmax_{y∈Y} p(y|x) = argmin_{y∈Y} E(y; x)
Indicator variables µ1 ∈ {0,1}^{Y1}, µ1,2 ∈ {0,1}^{Y1×Y2}, µ2 ∈ {0,1}^{Y2}, with

µ1(y1) = Σ_{y2∈Y2} µ1,2(y1, y2),
Σ_{y1∈Y1} µ1,2(y1, y2) = µ2(y2),
Σ_{y1∈Y1} µ1(y1) = 1,
Σ_{y2∈Y2} µ2(y2) = 1,
Σ_{(y1,y2)∈Y1×Y2} µ1,2(y1, y2) = 1.

- The energy is now linear: E(y; x) = ⟨θ, µ⟩
MAP-MRF Linear Programming Relaxation (cont)
min_µ  Σ_{i∈V} Σ_{yi∈Yi} θi(yi) µi(yi) + Σ_{{i,j}∈E} Σ_{(yi,yj)∈Yi×Yj} θi,j(yi, yj) µi,j(yi, yj)
s.t.   Σ_{yi∈Yi} µi(yi) = 1, ∀i ∈ V,
       Σ_{yj∈Yj} µi,j(yi, yj) = µi(yi), ∀{i,j} ∈ E, ∀yi ∈ Yi,
       µi(yi) ∈ {0,1}, ∀i ∈ V, ∀yi ∈ Yi,
       µi,j(yi, yj) ∈ {0,1}, ∀{i,j} ∈ E, ∀(yi, yj) ∈ Yi × Yj.

→ an NP-hard integer linear program
MAP-MRF Linear Programming Relaxation (cont)
min_µ  Σ_{i∈V} Σ_{yi∈Yi} θi(yi) µi(yi) + Σ_{{i,j}∈E} Σ_{(yi,yj)∈Yi×Yj} θi,j(yi, yj) µi,j(yi, yj)
s.t.   Σ_{yi∈Yi} µi(yi) = 1, ∀i ∈ V,
       Σ_{yj∈Yj} µi,j(yi, yj) = µi(yi), ∀{i,j} ∈ E, ∀yi ∈ Yi,
       µi(yi) ∈ [0, 1], ∀i ∈ V, ∀yi ∈ Yi,
       µi,j(yi, yj) ∈ [0, 1], ∀{i,j} ∈ E, ∀(yi, yj) ∈ Yi × Yj.

→ a linear program
→ the LOCAL polytope
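For a tiny model the relaxed LP can be written down explicitly and handed to a generic solver. The sketch below assumes scipy is available and uses made-up potentials for the two-variable binary MRF; since the factor graph is a tree, the relaxation is tight and the LP optimum matches the brute-force MAP energy.

```python
import numpy as np
from scipy.optimize import linprog

# Made-up potentials for a two-variable binary MRF.
theta1 = np.array([0.0, 1.0])
theta2 = np.array([0.5, 0.0])
theta12 = np.array([[0.0, 2.0],
                    [2.0, 0.0]])

# LP variables, in order:
# mu1(0), mu1(1), mu2(0), mu2(1), mu12(0,0), mu12(0,1), mu12(1,0), mu12(1,1)
c = np.concatenate([theta1, theta2, theta12.ravel()])

A_eq = np.array([
    [1, 1, 0, 0, 0, 0, 0, 0],    # sum_y mu1(y) = 1
    [0, 0, 1, 1, 0, 0, 0, 0],    # sum_y mu2(y) = 1
    [-1, 0, 0, 0, 1, 1, 0, 0],   # sum_{y2} mu12(0, y2) = mu1(0)
    [0, -1, 0, 0, 0, 0, 1, 1],   # sum_{y2} mu12(1, y2) = mu1(1)
    [0, 0, -1, 0, 1, 0, 1, 0],   # sum_{y1} mu12(y1, 0) = mu2(0)
    [0, 0, 0, -1, 0, 1, 0, 1],   # sum_{y1} mu12(y1, 1) = mu2(1)
])
b_eq = np.array([1, 1, 0, 0, 0, 0])

res = linprog(c, A_eq=A_eq, b_eq=b_eq, bounds=[(0, 1)] * 8)

# Brute force over the 4 integral labelings; on a tree the LP is tight.
energies = theta1[:, None] + theta12 + theta2[None, :]
assert np.isclose(res.fun, energies.min())
```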
MAP-MRF LP Relaxation, Works
MAP-MRF LP analysis:
- (Wainwright et al., IEEE Trans. Inf. Theory, 2005), (Weiss et al., UAI 2007), (Werner, PAMI 2005)
- TRW-S (Kolmogorov, PAMI 2006)

Improving tightness:
- (Komodakis and Paragios, ECCV 2008), (Kumar and Torr, ICML 2008), (Sontag and Jaakkola, NIPS 2007), (Sontag et al., UAI 2008), (Werner, CVPR 2008), (Kumar et al., NIPS 2007)

Algorithms based on the LP:
- (Globerson and Jaakkola, NIPS 2007), (Kumar and Torr, ICML 2008), (Sontag et al., UAI 2008)
MAP-MRF LP Relaxation, General case
Σ_{yA∈YA} µA(yA) = 1,
Σ_{yA∈YA : [yA]_B = yB} µA(yA) = µB(yB), ∀yB ∈ YB.

[Figure: factor A containing subset B]

Generalization to higher-order factors:
- For factors B ⊆ A: marginalization constraints
- Defines the LOCAL polytope (Wainwright and Jordan, 2008)
MAP-MRF LP Relaxation, Known results
1. LOCAL is tight iff the factor graph is a forest (acyclic)
2. All labelings are vertices of LOCAL (Wainwright and Jordan, 2008)
3. For cyclic graphs there are additional fractional vertices
4. If all factors have regular energies, the fractional solutions are never optimal (Wainwright and Jordan, 2008)
5. For models with only binary states: half-integrality; integral variables are certain

See some examples later...
Lagrangian Relaxation
Lagrangian Relaxation, Motivation
Σ_{yA∈YA : [yA]_B = yB} µA(yA) = µB(yB), ∀yB ∈ YB.

The MAP-MRF LP with higher-order factors is problematic:
- The number of LP variables grows with |YA| → exponential in the number of MRF variables
- But: the number of constraints stays small (|YB|)

Lagrangian relaxation allows one to solve the LP by treating the constraints implicitly.
Lagrangian Relaxation
min_y g(y)
s.t. y ∈ D, y ∈ C.

Assumption:
- Optimizing g(y) over y ∈ D is "easy".
- Optimizing over y ∈ D ∩ C is hard.

High-level idea:
- Incorporate the y ∈ C constraint into the objective function.

For an excellent introduction, see (Guignard, "Lagrangean Relaxation", TOP 2003) and (Lemarechal, "Lagrangian Relaxation", 2001).
Lagrangian Relaxation (cont)
Recipe:

1. Express C in terms of equalities and inequalities:

   C = {y ∈ G : ui(y) = 0, ∀i = 1,...,I;  vj(y) ≤ 0, ∀j = 1,...,J},

   where the ui : G → R are differentiable, typically affine, and the vj : G → R are differentiable, typically convex.

2. Introduce Lagrange multipliers, yielding

   min_y g(y)
   s.t. y ∈ D,
        ui(y) = 0 : λi, i = 1,...,I,
        vj(y) ≤ 0 : µj, j = 1,...,J.
3. Build the partial Lagrangian:

   min_y g(y) + λ⊤u(y) + µ⊤v(y)   (1)
   s.t. y ∈ D.
Theorem (Weak Duality of Lagrangian Relaxation)
For differentiable functions ui : G → R and vj : G → R, any λ ∈ R^I, and any non-negative µ ∈ R^J (µ ≥ 0), problem (1) is a relaxation of the original problem: its optimal value is lower than or equal to the optimal value of the original problem.
As a function of the multipliers, the minimum in (1) defines the Lagrangian dual function

q(λ, µ) := min_{y∈D} g(y) + λ⊤u(y) + µ⊤v(y).
4. Maximize the lower bound with respect to λ and µ:

   max_{λ,µ} q(λ, µ)
   s.t. µ ≥ 0

→ efficiently solvable if q(λ, µ) can be evaluated
Optimizing Lagrangian Dual Functions

4. Maximize the lower bound with respect to λ and µ:

   max_{λ,µ} q(λ, µ)   (2)
   s.t. µ ≥ 0
Theorem (Lagrangian Dual Function)
1. q is concave in λ and µ; problem (2) is a concave maximization problem.
2. If q is unbounded above, then the original problem is infeasible.
3. For any λ and any µ ≥ 0, let

   y(λ, µ) = argmin_{y∈D} g(y) + λ⊤u(y) + µ⊤v(y).

   Then a supergradient of q can be constructed by evaluating the constraint functions at y(λ, µ):

   u(y(λ, µ)) ∈ ∂q(λ, µ)/∂λ, and v(y(λ, µ)) ∈ ∂q(λ, µ)/∂µ.
Subgradients, Supergradients

[Figure: a convex function f(x) with an affine minorant touching it at x]

- Subgradients for convex functions: global affine underestimators,
  f(y) ≥ f(x) + g⊤(y − x), ∀y
- Supergradients for concave functions: global affine overestimators
- Optimality condition: 0 ∈ ∂f(x)/∂x
Example behaviour of objectives
(from a 10-by-10 pairwise binary MRF relaxation, submodular)
[Plot: primal and dual objective values over 250 iterations; the dual objective lower-bounds the primal objective]
Primal Solution Recovery
Assume we have solved the dual problem for (λ∗, µ∗).
1. Can we obtain a primal solution y∗?
2. Can we say something about the relaxation quality?

Theorem (Sufficient Optimality Conditions)
If for a given λ and µ ≥ 0 we have u(y(λ, µ)) = 0 and v(y(λ, µ)) ≤ 0 (primal feasibility), and further

λ⊤u(y(λ, µ)) = 0, and µ⊤v(y(λ, µ)) = 0

(complementary slackness), then
- y(λ, µ) is an optimal primal solution to the original problem, and
- (λ, µ) is an optimal solution to the dual problem.
Primal Solution Recovery (cont)

But,
- we might never see a solution satisfying the optimality conditions
- this happens only if there is no duality gap, i.e. q(λ∗, µ∗) = g(y∗)

Special case: integer linear programs
- we can always reconstruct a primal solution to

  min_y g(y)   (3)
  s.t. y ∈ conv(D), y ∈ C.

- For example, for subgradient method updates it is known that (Anstreicher and Wolsey, MathProg 2009)

  lim_{T→∞} (1/T) Σ_{t=1}^T y(λt, µt) → y∗ of (3).
Application to higher-order factors
Σ_{yA∈YA : [yA]_B = yB} µA(yA) = µB(yB), ∀yB ∈ YB.

- (Komodakis and Paragios, CVPR 2009) propose a MAP inference algorithm suitable for higher-order factors
- It can be seen as Lagrangian relaxation of the marginalization constraints of the MAP-MRF LP
- See also (Johnson et al., ACCC 2007), (Schlesinger and Giginyak, CSC 2007), (Wainwright and Jordan, 2008, Sect. 4.1.3), (Werner, PAMI 2005), (Werner, TechReport 2009), (Werner, CVPR 2008)
Komodakis MAP-MRF Decomposition
min_µ  Σ_{i∈V} Σ_{yi∈Yi} θi(yi) µi(yi) + Σ_{{i,j}∈E} Σ_{(yi,yj)∈Yi×Yj} θi,j(yi, yj) µi,j(yi, yj)
s.t.   Σ_{yi∈Yi} µi(yi) = 1, ∀i ∈ V,
       Σ_{(yi,yj)∈Yi×Yj} µi,j(yi, yj) = 1, ∀{i,j} ∈ E,
       Σ_{yj∈Yj} µi,j(yi, yj) = µi(yi), ∀{i,j} ∈ E, ∀yi ∈ Yi,
       µi(yi) ∈ {0,1}, ∀i ∈ V, ∀yi ∈ Yi,
       µi,j(yi, yj) ∈ {0,1}, ∀{i,j} ∈ E, ∀(yi, yj) ∈ Yi × Yj.
In inner-product notation:

min_µ  Σ_{i∈V} ⟨θi, µi⟩ + Σ_{{i,j}∈E} ⟨θi,j, µi,j⟩
s.t.   the same normalization, marginalization, and integrality constraints as above.
Adding a higher-order factor over a clique A:

min_µ  Σ_{i∈V} ⟨θi, µi⟩ + Σ_{{i,j}∈E} ⟨θi,j, µi,j⟩ + Σ_{yA∈YA} θA(yA) µA(yA)
s.t.   the pairwise constraints as above, and additionally
       Σ_{yA∈YA : [yA]_B = yB} µA(yA) = µB(yB), ∀B ⊂ A, ∀yB ∈ YB,
       Σ_{yA∈YA} µA(yA) = 1,
       µA(yA) ∈ {0,1}, ∀yA ∈ YA.
Attach a Lagrange multiplier φA→B(yB) to each marginalization constraint coupling µA and µB:

Σ_{yA∈YA : [yA]_B = yB} µA(yA) = µB(yB) : φA→B(yB), ∀B ⊂ A, ∀yB ∈ YB.
Relaxing these constraints into the objective yields the dual function:

q(φ) := min_µ  Σ_{i∈V} ⟨θi, µi⟩ + Σ_{{i,j}∈E} ⟨θi,j, µi,j⟩ + Σ_{yA∈YA} θA(yA) µA(yA)
               + Σ_{B⊂A, yB∈YB} φA→B(yB) ( Σ_{yA∈YA : [yA]_B = yB} µA(yA) − µB(yB) )
s.t.   Σ_{yi∈Yi} µi(yi) = 1, ∀i ∈ V,
       Σ_{(yi,yj)∈Yi×Yj} µi,j(yi, yj) = 1, ∀{i,j} ∈ E,
       Σ_{yA∈YA} µA(yA) = 1,
       Σ_{yj∈Yj} µi,j(yi, yj) = µi(yi), ∀{i,j} ∈ E, ∀yi ∈ Yi,
       µi(yi), µi,j(yi, yj), µA(yA) ∈ {0,1}.
Expanding the product and regrouping, the objective splits into a pairwise part with modified unary terms and a separate higher-order part:

q(φ) := min_µ  [ Σ_{i∈V} ⟨θi, µi⟩ + Σ_{{i,j}∈E} ⟨θi,j, µi,j⟩ − Σ_{B⊂A, yB∈YB} φA→B(yB) µB(yB) ]
               + [ Σ_{yA∈YA} θA(yA) µA(yA) + Σ_{B⊂A, yB∈YB} φA→B(yB) Σ_{yA∈YA : [yA]_B = yB} µA(yA) ]

subject to the same constraints as before; the two bracketed subproblems can now be minimized independently.
Message passing
[Figure: higher-order factor A connected to unary factors B, C, D, E via messages φA→B, φA→C, φA→D, φA→E]

- Maximizing q(φ) corresponds to message passing from the higher-order factor to the unary factors
The higher-order subproblem
min_{µA}  Σ_{B⊂A, yB∈YB} φA→B(yB) Σ_{yA∈YA : [yA]_B = yB} µA(yA) + Σ_{yA∈YA} θA(yA) µA(yA)
s.t.      Σ_{yA∈YA} µA(yA) = 1,
          µA(yA) ∈ {0,1}, ∀yA ∈ YA.

- Simple structure: select one element
- Simplifies to an optimization problem over yA ∈ YA
The higher-order subproblem
min_{yA∈YA}  θA(yA) + Σ_{B⊂A, yB∈YB} I([yA]_B = yB) φA→B(yB)

- Solvability depends on the structure of θA as a function of yA
- (For now) assume we can solve for y∗A ∈ YA
- How can we use this to solve the original large LP?
Maximizing the dual function: Subgradient Method
Maximizing q(φ):

1. Initialize φ
2. Iterate:
   2.1 Given φ, solve the pairwise MAP subproblem, obtaining µ∗
   2.2 Given φ, solve the higher-order MAP subproblem, obtaining y∗A
   2.3 Update φ

- Supergradient: for each B ⊂ A and yB ∈ YB,
  ∂q/∂φA→B(yB) ∋ I([y∗A]_B = yB) − µ∗B(yB)
- Supergradient ascent with step size αt > 0:
  φA→B(yB) ← φA→B(yB) + αt (I([y∗A]_B = yB) − µ∗B(yB))

The step size αt must satisfy the diminishing step size conditions: (i) lim_{t→∞} αt = 0, and (ii) Σ_{t=0}^∞ αt = ∞.
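The supergradient ascent loop can be sketched on a made-up single-constraint toy problem (one multiplier in place of the vector φ); the same evaluate-minimizer / step pattern applies.

```python
import numpy as np

# Made-up toy: minimize g(y) = cost[y] over D = {0, 1, 2, 3} subject to
# the single complicating constraint u(y) = y - 2 = 0.
cost = np.array([0.0, 1.0, 5.0, 2.0])
u = np.arange(4) - 2.0

lam, best_q = 0.0, -np.inf
for t in range(1, 201):
    # Evaluate q(lam) = min_{y in D} g(y) + lam * u(y); keep a minimizer.
    vals = cost + lam * u
    y = int(vals.argmin())
    best_q = max(best_q, vals[y])
    # u(y(lam)) is a supergradient of q at lam; ascend with step 1/t,
    # which satisfies the diminishing step size conditions.
    lam += (1.0 / t) * u[y]

# Weak duality: the dual value never exceeds the constrained optimum g(2) = 5.
assert best_q <= 5.0
```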
Subgradient Method (cont)
Typical step size choices:

1. Simple diminishing step size:

   αt = (1 + m) / (t + m),

   where m > 0 is an arbitrary constant.

2. Polyak's step size:

   αt = βt (q̄ − q(λt, µt)) / (‖u(y(λt, µt))‖² + ‖v(y(λt, µt))‖²),

   where
   - 0 < βt < 2 is a diminishing step size, and
   - q̄ ≥ q(λ∗, µ∗) is an upper bound on the optimal dual objective.
Subgradient Method (cont)
Primal solution recovery:

- Uniform average:

  lim_{T→∞} (1/T) Σ_{t=1}^T y(λt, µt) → y∗.

- Geometric series (volume algorithm):

  ȳt = γ y(λt, µt) + (1 − γ) ȳ_{t−1},

  where 0 < γ < 1 is a small constant such as γ = 0.1; ȳt might be slightly primal infeasible.
Pattern-based higher order potentials
Example (Pattern-based potentials)
Given a small set of patterns P = {y1, y2, ..., yK}, define

θA(yA) = C_{yA} if yA ∈ P, and C_max otherwise.

- (Rother et al., CVPR 2009), (Komodakis and Paragios, CVPR 2009)
- Generalizes the P^n-Potts model (Kohli et al., CVPR 2007)
- "Patterns have low energies": C_y ≤ C_max for all y ∈ P
Can solve

min_{yA∈YA} θA(yA) + ∑_{B⊂A, yB∈YB} I([yA]B = yB) φA→B(yB)

by
1. testing all yA ∈ P, and
2. fixing θA(yA) = Cmax and solving for the minimum of the second term.
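A minimal sketch of this two-case minimization, assuming for simplicity that the subsets B are the singletons {i}, so the message term decomposes per variable (all names are illustrative):

```python
def minimize_pattern_potential(patterns, C, C_max, phi):
    """patterns: list of label tuples y_A in P; C: dict mapping pattern -> C_y;
    phi: phi[i][s] is the message value for assigning state s to variable i."""
    n = len(phi)
    # Case 1: test all patterns y_A in P at their pattern cost C_y.
    best = min((C[p] + sum(phi[i][p[i]] for i in range(n)), p) for p in patterns)
    # Case 2: pay C_max and minimize the message term independently per variable.
    free = tuple(min(range(len(phi[i])), key=lambda s: phi[i][s]) for i in range(n))
    free_val = C_max + sum(min(phi[i]) for i in range(n))
    return min(best, (free_val, free))
```

Because C_y ≤ Cmax for all patterns, case 1 already dominates whenever the unconstrained minimizer happens to be a pattern, so taking the minimum of the two cases is exact.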
Dual Decomposition
Constructing Relaxations
Principled methods to construct relaxations
I Linear Programming Relaxations
I Lagrangian relaxation
I Lagrangian/Dual decomposition
Dual/Lagrangian Decomposition
Additive structure

min_y ∑_{k=1}^{K} gk(y)
sb.t. y ∈ Yk, ∀k = 1, . . . , K,

such that ∑_{k=1}^{K} gk(y) is hard, but for any k,

min_y gk(y)
sb.t. y ∈ Yk

is tractable.
I (Guignard, “Lagrangean Decomposition”, MathProg 1987)
I (Conejo et al., “Decomposition Techniques in Mathematical Programming”, 2006)
Dual/Lagrangian Decomposition
Idea
1. Selectively duplicate variables (“variable splitting”)
min_{y1,...,yK, y} ∑_{k=1}^{K} gk(yk)
sb.t. y ∈ Y′, yk ∈ Yk, k = 1, . . . , K,
      y = yk : λk, k = 1, . . . , K,   (4)

where Y′ ⊇ ⋂_{k=1,...,K} Yk.
2. Add coupling equality constraints between duplicated variables
3. Apply Lagrangian relaxation to the coupling constraint
Also known as dual decomposition and variable splitting.
Dual/Lagrangian Decomposition (cont)
min_{y1,...,yK, y} ∑_{k=1}^{K} gk(yk) + ∑_{k=1}^{K} λk⊤(y − yk)
sb.t. y ∈ Y′, yk ∈ Yk, k = 1, . . . , K.
Dual/Lagrangian Decomposition (cont)
min_{y1,...,yK, y} ∑_{k=1}^{K} (gk(yk) − λk⊤yk) + (∑_{k=1}^{K} λk)⊤ y
sb.t. y ∈ Y′, yk ∈ Yk, k = 1, . . . , K.
I Problem is decomposed
I There can be multiple possible decompositions of different quality
Dual/Lagrangian Decomposition (cont)
Dual function
q(λ) := min_{y1,...,yK, y} ∑_{k=1}^{K} (gk(yk) − λk⊤yk) + (∑_{k=1}^{K} λk)⊤ y
        sb.t. y ∈ Y′, yk ∈ Yk, k = 1, . . . , K.
Dual maximization problem
max_λ q(λ)
sb.t. ∑_{k=1}^{K} λk = 0,

where {λ | ∑_{k=1}^{K} λk = 0} is the domain on which q(λ) > −∞.
Dual/Lagrangian Decomposition (cont)
Projected subgradient method, using

∂q/∂λk ∋ (y − yk),

projected onto the constraint ∑_{k} λk = 0:

(y − yk) − (1/K) ∑_{ℓ=1}^{K} (y − yℓ) = (1/K) ∑_{ℓ=1}^{K} yℓ − yk.
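One iteration of this scheme can be sketched as follows (the solver interface and toy vectors are assumptions): each `solvers[k](lam_k)` is taken to return a minimizer of gk(y) − λk⊤y over Yk, and the projected update keeps ∑k λk = 0.

```python
def dual_decomposition_step(lams, solvers, alpha):
    """One projected subgradient step on the decomposition dual."""
    K = len(solvers)
    # Solve all subproblems at the current multipliers.
    ys = [solvers[k](lams[k]) for k in range(K)]
    avg = [sum(y[i] for y in ys) / K for i in range(len(ys[0]))]
    # lam_k += alpha * ((1/K) sum_l y_l - y_k); the sum over k stays zero.
    new_lams = [[l + alpha * (a - yi) for l, a, yi in zip(lams[k], avg, ys[k])]
                for k in range(K)]
    return new_lams, ys
```

When all subproblem solutions agree (y1 = · · · = yK), every subgradient component is zero and the multipliers stop changing.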
Example (Connectivity, Vicente et al., CVPR 2008)
min_y E(y) + C(y),

where E(y) is a normal binary pairwise segmentation energy, and

C(y) = { 0  if the marked points are connected,
       { ∞  otherwise.
Example (Connectivity, Vicente et al., CVPR 2008)
min_{y,z} E(y) + C(z)
sb.t. y = z.
min_{y,z} E(y) + C(z)
sb.t. y = z : λ.
min_{y,z} E(y) + C(z) + λ⊤(y − z)
q(λ) := min_{y,z} [E(y) + λ⊤y] + [C(z) − λ⊤z]
                  (Subproblem 1)  (Subproblem 2)
I Subproblem 1: normal binary image segmentation energy
I Subproblem 2: shortest path problem (no pairwise terms)
I → Maximize q(λ) to find the best lower bound
I The above derivation is simplified; see (Vicente et al., CVPR 2008) for a more sophisticated decomposition using three subproblems
Other examples
Plenty of applications of decomposition:
I (Komodakis et al., ICCV 2007)
I (Strandmark and Kahl, CVPR 2010)
I (Torresani et al., ECCV 2008)
I (Werner, TechReport 2009)
I (Kumar and Koller, CVPR 2010)
I (Woodford et al., ICCV 2009)
I (Vicente et al., ICCV 2009)
Geometric Interpretation
Theorem (Lagrangian Decomposition Primal)
Let Yk be finite sets and g(y) = c⊤y a linear objective function. Then the optimal solution of the Lagrangian decomposition attains the value of the following relaxed primal optimization problem.
min_y ∑_{k=1}^{K} gk(y)
sb.t. y ∈ conv(Yk), ∀k = 1, . . . , K.
Therefore, optimizing the Lagrangian decomposition dual is equivalent to
I optimizing the primal objective,
I on the intersection of the convex hulls of the individual feasible sets.
Geometric Interpretation (cont)
Figure: conv(Y1), conv(Y2) and their intersection conv(Y1) ∩ conv(Y2); the decomposition optimum yD and the true optimum y∗ along the objective direction c⊤y.
Lagrangian Relaxation vs. Decomposition
Lagrangian Relaxation
I Needs explicit complicating constraints
Lagrangian/Dual Decomposition
I Can work with black-box solver for subproblem (implicit)
I Can yield stronger bounds
Figure: the relaxation optimum yR, obtained over Y′1 ⊇ conv(Y1), can be weaker than the decomposition optimum yD over conv(Y1) ∩ conv(Y2); y∗ is the true optimum along c⊤y.
Polyhedral Higher-order Interactions
Polyhedral View on Higher-order Interactions
Idea
I MAP-MRF LP relaxation: LOCAL polytope
I Incorporate higher-order interactions directly by constraining LOCAL
I Techniques from polyhedral combinatorics
I See (Werner, CVPR 2008), (Nowozin and Lampert, CVPR 2009), (Lempitsky et al., ICCV 2009)
Representation of Polytopes
Convex polytopes have two equivalent representations
I As a convex combination of extreme points
I As a set of facet-defining linear inequalities
A linear inequality with respect to a polytope can be
I valid, does not cut off the polytope,
I representing a face, valid and touching,
I facet-defining, representing a face of dimension one less than the polytope.
Figure: a polytope Z with three valid inequalities d1⊤y ≤ 1, d2⊤y ≤ 1, d3⊤y ≤ 1.
Intersecting Polytopes
Construction
1. Consider the marginal polytope M(G) of a discrete graphical model
2. Vertices of M(G) are feasible labelings
3. Add linear inequalities d⊤µ ≤ b that cut off vertices
4. Optimize over the resulting intersection M′(G),

EA(µA) = { 0  if µ is valid,
         { ∞  otherwise.

Figure: the marginal polytope M(G); cutting planes d⊤µ ≤ b remove infeasible vertices µ, leaving M′(G).
Defining the Inequalities
Questions
1. Where do the inequalities come from?
2. Is this construction always possible?
3. How to optimize over M′(G )?
1. Deriving Combinatorial Polytopes
I “Foreground mask must be connected”
I Global constraint on labelings
Constructing the Inequalities
y0 + y2 − y1 ≤ 1
Generalization and Intuition
The following inequalities are violated by unconnected subgraphs:

yi + yj − ∑_{k∈S} yk ≤ 1,  ∀(i, j) ∉ E, ∀S ∈ S(i, j)

Intuition: “If two vertices i and j are selected (yi = yj = 1, shown in black), then any set of vertices which separates them (set S) must contain at least one selected vertex.”
Figure: Vertices i and j and one vertex separator set S ∈ S(i, j).
Derivation, Formal Steps
Steps from intuition to inequalities:
1. Identify invariants that hold for any “good” solution
2. Formulate the invariants as linear inequalities
3. Prove that the inequalities are facet-defining
4. Prove that together with integrality they are a formulation
The final result is a statement like:

Theorem
For a given graph G = (V, E), the set of all induced subgraphs that are connected can be described exactly by the following constraint set.

yi + yj − ∑_{k∈S} yk ≤ 1,  ∀(i, j) ∉ E, ∀S ∈ S(i, j),
yi ∈ {0, 1},  i ∈ V.
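A result of this form can be sanity-checked by brute force on a toy instance (the 3-vertex path graph below is an illustrative assumption, not from the slides): a 0/1 labeling induces a connected subgraph exactly when it satisfies every separator inequality.

```python
import itertools
from collections import deque

EDGES = {(0, 1), (1, 2)}       # path graph 0 - 1 - 2
SEPARATORS = {(0, 2): [{1}]}   # the only non-adjacent pair; S(0, 2) = {{1}}

def induced_connected(y):
    """BFS check: do the selected vertices induce a connected subgraph?"""
    sel = [i for i in range(3) if y[i]]
    if len(sel) <= 1:
        return True
    seen, todo = {sel[0]}, deque([sel[0]])
    while todo:
        u = todo.popleft()
        for v in sel:
            if v not in seen and tuple(sorted((u, v))) in EDGES:
                seen.add(v)
                todo.append(v)
    return len(seen) == len(sel)

def satisfies_inequalities(y):
    """y_i + y_j - sum_{k in S} y_k <= 1 for all pairs and their separators."""
    return all(y[i] + y[j] - sum(y[k] for k in S) <= 1
               for (i, j), seps in SEPARATORS.items() for S in seps)

# The two characterizations agree on all 2^3 labelings.
for y in itertools.product([0, 1], repeat=3):
    assert induced_connected(y) == satisfies_inequalities(y)
```

For example, the labeling (1, 0, 1) selects both endpoints but not the separator vertex 1, and it is exactly the labeling cut off by y0 + y2 − y1 ≤ 1.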
2. Is this always possible?
Yes, but...
I identifying and proving the inequalities is not straightforward,
I even if all facet-defining inequalities are known, the problem can remain intractable.
Further reading
I Polyhedral combinatorics literature (Schrijver, 2003), (Ziegler, 1994)
I There exists software to help identify facet-defining inequalities, see e.g. polymake
3. How to optimize over M′(G)?
I LOCAL has a small number of inequalities,
I M′(G) typically has an exponential number of inequalities,
I Optimization over M′(G) is still tractable if the separation problem can be solved.
Definition (Separation problem)
Given a point y, prove that y ∈ M′(G) or produce a valid inequality that is strictly violated.
Example Separation for Connectivity
Need to show that for all (i, j) ∉ E with yi > 0, yj > 0:

yi + yj − ∑_{k∈S} yk − 1 ≤ 0,  ∀S ∈ S(i, j).

Equivalently, in variational form:

max_{S∈S(i,j)} ( yi + yj − ∑_{k∈S} yk − 1 ) ≤ 0.   (5)

If (5) is satisfied, no violated inequality for (i, j) exists. Otherwise, the set corresponding to the most violated inequality is

S∗(i, j) = argmin_{S∈S(i,j)} ∑_{k∈S} yk.   (6)
Example Separation for Connectivity (cont)
Need to solve

S∗(i, j) = argmin_{S∈S(i,j)} ∑_{k∈S} yk

I Transform G into a directed edge-weighted graph
I Solve the minimum (i, j)-cut problem
I Equivalently: solve a linear max-flow problem (efficient)
It follows that,
I by construction, a finite-capacity (i, j)-cut always exists,
I the cut found corresponds to a violated inequality or proves its absence.
Figure: the transformed graph for the pair (i, j); internal vertices a, b, c become arcs with capacities ya, yb, yc, and the original edges receive capacity ∞.
Example
Figure: X pattern (left) and noisy X pattern (right), 30 × 30 images.
Example
Figure: MRF/MRFcomp/CMRF results: E = −985.61, E = −974.16, E = −984.21
Figure: MRF/MRFcomp/CMRF results: E = −980.13, E = −974.03, E = −976.83
Bounding-box Prior
Example (Bounding Box Prior (Lempitsky et al., ICCV 2009))
I Prior: the segmentation should be tight with respect to the bounding box
I Exponential number of linear inequalities
Relaxation and Decomposition: Summary
Lagrangian relaxation
I requires: complicating constraints
I widely applicable
Problem decomposition
I requires: multiple tractable subproblems
I can make use of efficient algorithms for the subproblems
I widely applicable
Polyhedral higher-order interactions
I requires: combinatorial understanding of constraint set
I universal: can represent any constraint
I tractability depends on a tractable separation oracle