CVPR 2010: Higher Order Models in Computer Vision: Part 3
Sections: Belief Propagation · Relaxations · MAP-MRF LP · Lagrangian Relaxation · Dual Decomposition · Polyhedral · Higher-order Interactions
Higher Order Models in Computer Vision: Inference
Carsten Rother, Sebastian Nowozin
Machine Learning and Perception Group, Microsoft Research Cambridge
San Francisco, 18th June 2010
Belief Propagation
Belief Propagation Reminder
- Sum-product belief propagation
- Max-product / max-sum / min-sum belief propagation

Uses:
- Messages: vectors at the factor graph edges
  - q_{Yi→F} ∈ R^{Yi}, the variable-to-factor message, and
  - r_{F→Yi} ∈ R^{Yi}, the factor-to-variable message.
- Updates: iteratively recomputing these vectors

[Figure: factor F and variable Yi exchanging the messages q_{Yi→F} and r_{F→Yi}]
Belief Propagation Reminder: Min-Sum Updates (cont)
Variable-to-factor message:

q_{Yi→F}(yi) = Σ_{F′ ∈ M(i)\{F}} r_{F′→Yi}(yi)

Factor-to-variable message:

r_{F→Yi}(yi) = min_{y′F ∈ YF : y′i = yi} [ E_F(y′F) + Σ_{j ∈ N(F)\{i}} q_{Yj→F}(y′j) ]
[Figures: variable Yi collects r_{A→Yi}, r_{B→Yi} from factors A, B and sends q_{Yi→F} to factor F; factor F collects q_{Yj→F}, q_{Yk→F} from Yj, Yk and sends r_{F→Yi} to Yi]
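As a concrete illustration, the two updates can be run on a tiny chain. The sketch below uses made-up energy tables (the unary factors A, B, the pairwise factor F, and binary variables are assumptions of the example, not part of the slides); on this tree the resulting beliefs are exact min-marginals.

```python
import numpy as np

# Toy chain A - Y1 - F - Y2 - B with illustrative (made-up) energies.
E_A = np.array([0.0, 1.0])            # unary factor A on Y1
E_B = np.array([0.5, 0.0])            # unary factor B on Y2
E_F = np.array([[0.0, 2.0],           # pairwise factor F: E_F(y1, y2)
                [2.0, 0.0]])

# Variable-to-factor messages: sum of incoming factor-to-variable
# messages from all other factors (here only the unary factor).
q_Y1_F = E_A.copy()                   # r_{A->Y1}(y1) = E_A(y1)
q_Y2_F = E_B.copy()

# Factor-to-variable messages: minimize E_F plus the other incoming
# variable-to-factor message over the other variable's states.
r_F_Y1 = np.min(E_F + q_Y2_F[None, :], axis=1)
r_F_Y2 = np.min(E_F + q_Y1_F[:, None], axis=0)

# On a tree the beliefs are exact min-marginals of the energy.
belief_Y1 = E_A + r_F_Y1
belief_Y2 = E_B + r_F_Y2

# Brute-force check over all four labelings.
energies = E_A[:, None] + E_F + E_B[None, :]
assert np.allclose(belief_Y1, energies.min(axis=1))
assert np.allclose(belief_Y2, energies.min(axis=0))
```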
Higher-order Factors
Factor-to-variable message:

r_{F→Yi}(yi) = min_{y′F ∈ YF : y′i = yi} [ E_F(y′F) + Σ_{j ∈ N(F)\{i}} q_{Yj→F}(y′j) ]

- The minimization becomes intractable for general E_F
- Tractable for special structure of E_F
- See (Felzenszwalb and Huttenlocher, CVPR 2004), (Potetz, CVPR 2007), and (Tarlow et al., AISTATS 2010)
[Figure: higher-order factor F over variables Yi, Yj, Yk, Yl, Ym, Yn sending the message r_{F→Yi}]
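A brute-force implementation makes the cost explicit: each message entry minimizes over all joint states of the other variables, which is what becomes intractable for general E_F as the factor order grows. The factor size, energy table, and incoming messages below are made up for illustration.

```python
import itertools
import numpy as np

# Made-up dense higher-order factor over 4 binary variables: each message
# entry requires a minimization over |Y|^(n-1) = 8 joint states.
rng = np.random.default_rng(0)
E_F = rng.normal(size=(2, 2, 2, 2))     # E_F(y0, y1, y2, y3)
q = rng.normal(size=(4, 2))             # incoming messages q_{Yj->F}(y_j)

def factor_to_variable(i):
    # r_{F->Yi}(y_i) = min over the other variables of
    #                  E_F(y) + sum_{j != i} q_{Yj->F}(y_j)
    r = np.full(2, np.inf)
    for y in itertools.product([0, 1], repeat=4):
        val = E_F[y] + sum(q[j, y[j]] for j in range(4) if j != i)
        r[y[i]] = min(r[y[i]], val)
    return r

r0 = factor_to_variable(0)   # enumeration is exponential in the factor order
```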
Relaxations
Problem Relaxations

- Optimization problems (minimizing g : G → R over Y ⊆ G) can become easier if
  - the feasible set is enlarged, and/or
  - the objective function is replaced with a bound.

[Figure: g over the feasible set Y and a lower bound h over the enlarged set Z ⊇ Y]
Definition (Relaxation (Geoffrion, 1974))
Given two optimization problems (g, Y, G) and (h, Z, G), the problem (h, Z, G) is said to be a relaxation of (g, Y, G) if
1. Z ⊇ Y, i.e. the feasible set of the relaxation contains the feasible set of the original problem, and
2. ∀y ∈ Y : h(y) ≤ g(y), i.e. over the original feasible set the objective function h achieves no larger values than the objective function g. (Minimization version)
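A one-dimensional toy instance of this definition (the function and sets are made up for illustration; here h = g and only the feasible set is enlarged):

```python
import numpy as np

# Minimize g(y) = y^2 - 3y over the integer set Y = {0, 1, 2, 3}.
# Relaxation: keep h = g, enlarge the feasible set to Z = [0, 3];
# both conditions of the definition then hold trivially.
def g(y):
    return y * y - 3.0 * y

Y = np.arange(4)
orig_opt = min(g(y) for y in Y)     # best integral value: g(1) = g(2) = -2
relaxed_opt = g(1.5)                # the parabola's minimum 1.5 lies in Z

# A relaxation can only lower the optimal value (minimization version).
assert relaxed_opt <= orig_opt
```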
Relaxations

- The relaxed solution z∗ provides a bound:
  h(z∗) ≥ g(y∗) (maximization: upper bound),
  h(z∗) ≤ g(y∗) (minimization: lower bound)
- The relaxation is typically tractable
- Evidence that relaxations are "better" for learning (Kulesza and Pereira, 2007), (Finley and Joachims, 2008), (Martins et al., 2009)
- Drawback: z∗ ∈ Z \ Y is possible:
  - the solution is not integral (but fractional), or
  - the solution violates constraints (primal infeasible).
- To obtain z′ ∈ Y from z∗: rounding, heuristics
Constructing Relaxations

Principled methods to construct relaxations:
- Linear programming relaxations
- Lagrangian relaxation
- Lagrangian/dual decomposition
MAP-MRF LP
MAP-MRF Linear Programming Relaxation
Simple two-variable pairwise MRF: variables Y1 and Y2 with unary potentials θ1, θ2 and a pairwise potential θ1,2 over Y1 × Y2.

[Factor graph figure]
Example labeling: y1 = 2, y2 = 3, i.e. (y1, y2) = (2, 3).

- Energy: E(y; x) = θ1(y1; x) + θ1,2(y1, y2; x) + θ2(y2; x)
- Probability: p(y|x) = (1/Z(x)) exp{−E(y; x)}
- MAP prediction: argmax_{y∈Y} p(y|x) = argmin_{y∈Y} E(y; x)
Indicator variables µ1 ∈ {0,1}^{Y1}, µ1,2 ∈ {0,1}^{Y1×Y2}, µ2 ∈ {0,1}^{Y2}, with

µ1(y1) = Σ_{y2∈Y2} µ1,2(y1, y2),
Σ_{y1∈Y1} µ1,2(y1, y2) = µ2(y2),
Σ_{y1∈Y1} µ1(y1) = 1,
Σ_{y2∈Y2} µ2(y2) = 1,
Σ_{(y1,y2)∈Y1×Y2} µ1,2(y1, y2) = 1.

- The energy is now linear: E(y; x) = ⟨θ, µ⟩
MAP-MRF Linear Programming Relaxation (cont)
min_µ  Σ_{i∈V} Σ_{yi∈Yi} θi(yi) µi(yi) + Σ_{{i,j}∈E} Σ_{(yi,yj)∈Yi×Yj} θi,j(yi, yj) µi,j(yi, yj)
s.t.   Σ_{yi∈Yi} µi(yi) = 1, ∀i ∈ V,
       Σ_{yj∈Yj} µi,j(yi, yj) = µi(yi), ∀{i,j} ∈ E, ∀yi ∈ Yi,
       µi(yi) ∈ {0,1}, ∀i ∈ V, ∀yi ∈ Yi,
       µi,j(yi, yj) ∈ {0,1}, ∀{i,j} ∈ E, ∀(yi, yj) ∈ Yi × Yj.

→ an NP-hard integer linear program
MAP-MRF Linear Programming Relaxation (cont)
min_µ  Σ_{i∈V} Σ_{yi∈Yi} θi(yi) µi(yi) + Σ_{{i,j}∈E} Σ_{(yi,yj)∈Yi×Yj} θi,j(yi, yj) µi,j(yi, yj)
s.t.   Σ_{yi∈Yi} µi(yi) = 1, ∀i ∈ V,
       Σ_{yj∈Yj} µi,j(yi, yj) = µi(yi), ∀{i,j} ∈ E, ∀yi ∈ Yi,
       µi(yi) ∈ [0, 1], ∀i ∈ V, ∀yi ∈ Yi,
       µi,j(yi, yj) ∈ [0, 1], ∀{i,j} ∈ E, ∀(yi, yj) ∈ Yi × Yj.

→ a linear program
→ the LOCAL polytope
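For a tiny model the relaxed LP can be written down explicitly and handed to a generic solver. The sketch below assumes scipy is available and uses made-up potentials for the two-variable binary MRF; since the factor graph is a tree, the relaxation is tight and the LP optimum matches the brute-force MAP energy.

```python
import numpy as np
from scipy.optimize import linprog

# Made-up potentials for a two-variable binary MRF.
theta1 = np.array([0.0, 1.0])
theta2 = np.array([0.5, 0.0])
theta12 = np.array([[0.0, 2.0],
                    [2.0, 0.0]])

# LP variables, in order:
# mu1(0), mu1(1), mu2(0), mu2(1), mu12(0,0), mu12(0,1), mu12(1,0), mu12(1,1)
c = np.concatenate([theta1, theta2, theta12.ravel()])

A_eq = np.array([
    [1, 1, 0, 0, 0, 0, 0, 0],    # sum_y mu1(y) = 1
    [0, 0, 1, 1, 0, 0, 0, 0],    # sum_y mu2(y) = 1
    [-1, 0, 0, 0, 1, 1, 0, 0],   # sum_{y2} mu12(0, y2) = mu1(0)
    [0, -1, 0, 0, 0, 0, 1, 1],   # sum_{y2} mu12(1, y2) = mu1(1)
    [0, 0, -1, 0, 1, 0, 1, 0],   # sum_{y1} mu12(y1, 0) = mu2(0)
    [0, 0, 0, -1, 0, 1, 0, 1],   # sum_{y1} mu12(y1, 1) = mu2(1)
])
b_eq = np.array([1, 1, 0, 0, 0, 0])

res = linprog(c, A_eq=A_eq, b_eq=b_eq, bounds=[(0, 1)] * 8)

# Brute force over the 4 integral labelings; on a tree the LP is tight.
energies = theta1[:, None] + theta12 + theta2[None, :]
assert np.isclose(res.fun, energies.min())
```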
MAP-MRF LP Relaxation, Works
MAP-MRF LP analysis:
- (Wainwright et al., IEEE Trans. Inf. Theory, 2005), (Weiss et al., UAI 2007), (Werner, PAMI 2005)
- TRW-S (Kolmogorov, PAMI 2006)

Improving tightness:
- (Komodakis and Paragios, ECCV 2008), (Kumar and Torr, ICML 2008), (Sontag and Jaakkola, NIPS 2007), (Sontag et al., UAI 2008), (Werner, CVPR 2008), (Kumar et al., NIPS 2007)

Algorithms based on the LP:
- (Globerson and Jaakkola, NIPS 2007), (Kumar and Torr, ICML 2008), (Sontag et al., UAI 2008)
MAP-MRF LP Relaxation, General case
Σ_{yA∈YA} µA(yA) = 1,
Σ_{yA∈YA : [yA]_B = yB} µA(yA) = µB(yB), ∀yB ∈ YB.

[Figure: factor A containing subset B]

Generalization to higher-order factors:
- For factors B ⊆ A: marginalization constraints
- Defines the LOCAL polytope (Wainwright and Jordan, 2008)
MAP-MRF LP Relaxation, Known results
1. LOCAL is tight iff the factor graph is a forest (acyclic)
2. All labelings are vertices of LOCAL (Wainwright and Jordan, 2008)
3. For cyclic graphs there are additional fractional vertices
4. If all factors have regular energies, the fractional solutions are never optimal (Wainwright and Jordan, 2008)
5. For models with only binary states: half-integrality; integral variables are certain

See some examples later...
Lagrangian Relaxation
Lagrangian Relaxation, Motivation
Σ_{yA∈YA : [yA]_B = yB} µA(yA) = µB(yB), ∀yB ∈ YB.

The MAP-MRF LP with higher-order factors is problematic:
- The number of LP variables grows with |YA| → exponential in the number of MRF variables
- But: the number of constraints stays small (|YB|)

Lagrangian relaxation allows one to solve the LP by treating the constraints implicitly.
Lagrangian Relaxation
min_y g(y)
s.t. y ∈ D, y ∈ C.

Assumption:
- Optimizing g(y) over y ∈ D is "easy".
- Optimizing over y ∈ D ∩ C is hard.

High-level idea:
- Incorporate the y ∈ C constraint into the objective function.

For an excellent introduction, see (Guignard, "Lagrangean Relaxation", TOP 2003) and (Lemarechal, "Lagrangian Relaxation", 2001).
Lagrangian Relaxation (cont)
Recipe:

1. Express C in terms of equalities and inequalities:

   C = {y ∈ G : ui(y) = 0, ∀i = 1,...,I;  vj(y) ≤ 0, ∀j = 1,...,J},

   where the ui : G → R are differentiable, typically affine, and the vj : G → R are differentiable, typically convex.

2. Introduce Lagrange multipliers, yielding

   min_y g(y)
   s.t. y ∈ D,
        ui(y) = 0 : λi, i = 1,...,I,
        vj(y) ≤ 0 : µj, j = 1,...,J.
3. Build the partial Lagrangian:

   min_y g(y) + λ⊤u(y) + µ⊤v(y)   (1)
   s.t. y ∈ D.
Theorem (Weak Duality of Lagrangian Relaxation)
For differentiable functions ui : G → R and vj : G → R, any λ ∈ R^I, and any non-negative µ ∈ R^J (µ ≥ 0), problem (1) is a relaxation of the original problem: its optimal value is lower than or equal to the optimal value of the original problem.
As a function of the multipliers, the minimum in (1) defines the Lagrangian dual function

q(λ, µ) := min_{y∈D} g(y) + λ⊤u(y) + µ⊤v(y).
4. Maximize the lower bound with respect to λ and µ:

   max_{λ,µ} q(λ, µ)
   s.t. µ ≥ 0

→ efficiently solvable if q(λ, µ) can be evaluated
Optimizing Lagrangian Dual Functions

4. Maximize the lower bound with respect to λ and µ:

   max_{λ,µ} q(λ, µ)   (2)
   s.t. µ ≥ 0
Theorem (Lagrangian Dual Function)
1. q is concave in λ and µ; problem (2) is a concave maximization problem.
2. If q is unbounded above, then the original problem is infeasible.
3. For any λ and any µ ≥ 0, let

   y(λ, µ) = argmin_{y∈D} g(y) + λ⊤u(y) + µ⊤v(y).

   Then a supergradient of q can be constructed by evaluating the constraint functions at y(λ, µ):

   u(y(λ, µ)) ∈ ∂q(λ, µ)/∂λ, and v(y(λ, µ)) ∈ ∂q(λ, µ)/∂µ.
Subgradients, Supergradients

[Figure: a convex function f(x) with an affine minorant touching it at x]

- Subgradients for convex functions: global affine underestimators,
  f(y) ≥ f(x) + g⊤(y − x), ∀y
- Supergradients for concave functions: global affine overestimators
- Optimality condition: 0 ∈ ∂f(x)/∂x
Example behaviour of objectives
(from a 10-by-10 pairwise binary MRF relaxation, submodular)
[Plot: primal and dual objective values over 250 iterations; the dual objective lower-bounds the primal objective]
Primal Solution Recovery
Assume we have solved the dual problem for (λ∗, µ∗).
1. Can we obtain a primal solution y∗?
2. Can we say something about the relaxation quality?

Theorem (Sufficient Optimality Conditions)
If for a given λ and µ ≥ 0 we have u(y(λ, µ)) = 0 and v(y(λ, µ)) ≤ 0 (primal feasibility), and further

λ⊤u(y(λ, µ)) = 0, and µ⊤v(y(λ, µ)) = 0

(complementary slackness), then
- y(λ, µ) is an optimal primal solution to the original problem, and
- (λ, µ) is an optimal solution to the dual problem.
Primal Solution Recovery (cont)

But,
- we might never see a solution satisfying the optimality conditions
- this happens only if there is no duality gap, i.e. q(λ∗, µ∗) = g(y∗)

Special case: integer linear programs
- we can always reconstruct a primal solution to

  min_y g(y)   (3)
  s.t. y ∈ conv(D), y ∈ C.

- For example, for subgradient method updates it is known that (Anstreicher and Wolsey, MathProg 2009)

  lim_{T→∞} (1/T) Σ_{t=1}^T y(λt, µt) → y∗ of (3).
Application to higher-order factors
Σ_{yA∈YA : [yA]_B = yB} µA(yA) = µB(yB), ∀yB ∈ YB.

- (Komodakis and Paragios, CVPR 2009) propose a MAP inference algorithm suitable for higher-order factors
- It can be seen as Lagrangian relaxation of the marginalization constraints of the MAP-MRF LP
- See also (Johnson et al., ACCC 2007), (Schlesinger and Giginyak, CSC 2007), (Wainwright and Jordan, 2008, Sect. 4.1.3), (Werner, PAMI 2005), (Werner, TechReport 2009), (Werner, CVPR 2008)
Komodakis MAP-MRF Decomposition
min_µ  Σ_{i∈V} Σ_{yi∈Yi} θi(yi) µi(yi) + Σ_{{i,j}∈E} Σ_{(yi,yj)∈Yi×Yj} θi,j(yi, yj) µi,j(yi, yj)
s.t.   Σ_{yi∈Yi} µi(yi) = 1, ∀i ∈ V,
       Σ_{(yi,yj)∈Yi×Yj} µi,j(yi, yj) = 1, ∀{i,j} ∈ E,
       Σ_{yj∈Yj} µi,j(yi, yj) = µi(yi), ∀{i,j} ∈ E, ∀yi ∈ Yi,
       µi(yi) ∈ {0,1}, ∀i ∈ V, ∀yi ∈ Yi,
       µi,j(yi, yj) ∈ {0,1}, ∀{i,j} ∈ E, ∀(yi, yj) ∈ Yi × Yj.
In inner-product notation:

min_µ  Σ_{i∈V} ⟨θi, µi⟩ + Σ_{{i,j}∈E} ⟨θi,j, µi,j⟩
s.t.   the same normalization, marginalization, and integrality constraints as above.
Adding a higher-order factor over a clique A:

min_µ  Σ_{i∈V} ⟨θi, µi⟩ + Σ_{{i,j}∈E} ⟨θi,j, µi,j⟩ + Σ_{yA∈YA} θA(yA) µA(yA)
s.t.   the pairwise constraints as above, and additionally
       Σ_{yA∈YA : [yA]_B = yB} µA(yA) = µB(yB), ∀B ⊂ A, ∀yB ∈ YB,
       Σ_{yA∈YA} µA(yA) = 1,
       µA(yA) ∈ {0,1}, ∀yA ∈ YA.
Attach a Lagrange multiplier φA→B(yB) to each marginalization constraint coupling µA and µB:

Σ_{yA∈YA : [yA]_B = yB} µA(yA) = µB(yB) : φA→B(yB), ∀B ⊂ A, ∀yB ∈ YB.
Relaxing these constraints into the objective yields the dual function:

q(φ) := min_µ  Σ_{i∈V} ⟨θi, µi⟩ + Σ_{{i,j}∈E} ⟨θi,j, µi,j⟩ + Σ_{yA∈YA} θA(yA) µA(yA)
               + Σ_{B⊂A, yB∈YB} φA→B(yB) ( Σ_{yA∈YA : [yA]_B = yB} µA(yA) − µB(yB) )
s.t.   Σ_{yi∈Yi} µi(yi) = 1, ∀i ∈ V,
       Σ_{(yi,yj)∈Yi×Yj} µi,j(yi, yj) = 1, ∀{i,j} ∈ E,
       Σ_{yA∈YA} µA(yA) = 1,
       Σ_{yj∈Yj} µi,j(yi, yj) = µi(yi), ∀{i,j} ∈ E, ∀yi ∈ Yi,
       µi(yi), µi,j(yi, yj), µA(yA) ∈ {0,1}.
Expanding the product and regrouping, the objective splits into a pairwise part with modified unary terms and a separate higher-order part:

q(φ) := min_µ  [ Σ_{i∈V} ⟨θi, µi⟩ + Σ_{{i,j}∈E} ⟨θi,j, µi,j⟩ − Σ_{B⊂A, yB∈YB} φA→B(yB) µB(yB) ]
               + [ Σ_{yA∈YA} θA(yA) µA(yA) + Σ_{B⊂A, yB∈YB} φA→B(yB) Σ_{yA∈YA : [yA]_B = yB} µA(yA) ]

subject to the same constraints as before; the two bracketed subproblems can now be minimized independently.
Message passing
[Figure: higher-order factor A connected to unary factors B, C, D, E via messages φA→B, φA→C, φA→D, φA→E]

- Maximizing q(φ) corresponds to message passing from the higher-order factor to the unary factors
The higher-order subproblem
min_{µA}  Σ_{B⊂A, yB∈YB} φA→B(yB) Σ_{yA∈YA : [yA]_B = yB} µA(yA) + Σ_{yA∈YA} θA(yA) µA(yA)
s.t.      Σ_{yA∈YA} µA(yA) = 1,
          µA(yA) ∈ {0,1}, ∀yA ∈ YA.

- Simple structure: select one element
- Simplifies to an optimization problem over yA ∈ YA
The higher-order subproblem
min_{yA∈YA}  θA(yA) + Σ_{B⊂A, yB∈YB} I([yA]_B = yB) φA→B(yB)

- Solvability depends on the structure of θA as a function of yA
- (For now) assume we can solve for y∗A ∈ YA
- How can we use this to solve the original large LP?
Maximizing the dual function: Subgradient Method
Maximizing q(φ):

1. Initialize φ
2. Iterate:
   2.1 Given φ, solve the pairwise MAP subproblem, obtaining µ∗
   2.2 Given φ, solve the higher-order MAP subproblem, obtaining y∗A
   2.3 Update φ

- Supergradient: for each B ⊂ A and yB ∈ YB,
  ∂q/∂φA→B(yB) ∋ I([y∗A]_B = yB) − µ∗B(yB)
- Supergradient ascent with step size αt > 0:
  φA→B(yB) ← φA→B(yB) + αt (I([y∗A]_B = yB) − µ∗B(yB))

The step size αt must satisfy the diminishing step size conditions: (i) lim_{t→∞} αt = 0, and (ii) Σ_{t=0}^∞ αt = ∞.
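The supergradient ascent loop can be sketched on a made-up single-constraint toy problem (one multiplier in place of the vector φ); the same evaluate-minimizer / step pattern applies.

```python
import numpy as np

# Made-up toy: minimize g(y) = cost[y] over D = {0, 1, 2, 3} subject to
# the single complicating constraint u(y) = y - 2 = 0.
cost = np.array([0.0, 1.0, 5.0, 2.0])
u = np.arange(4) - 2.0

lam, best_q = 0.0, -np.inf
for t in range(1, 201):
    # Evaluate q(lam) = min_{y in D} g(y) + lam * u(y); keep a minimizer.
    vals = cost + lam * u
    y = int(vals.argmin())
    best_q = max(best_q, vals[y])
    # u(y(lam)) is a supergradient of q at lam; ascend with step 1/t,
    # which satisfies the diminishing step size conditions.
    lam += (1.0 / t) * u[y]

# Weak duality: the dual value never exceeds the constrained optimum g(2) = 5.
assert best_q <= 5.0
```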
Subgradient Method (cont)
Typical step size choices:

1. Simple diminishing step size:

   αt = (1 + m) / (t + m),

   where m > 0 is an arbitrary constant.

2. Polyak's step size:

   αt = βt (q̄ − q(λt, µt)) / (‖u(y(λt, µt))‖² + ‖v(y(λt, µt))‖²),

   where
   - 0 < βt < 2 is a diminishing step size, and
   - q̄ ≥ q(λ∗, µ∗) is an upper bound on the optimal dual objective.
Subgradient Method (cont)
Primal solution recovery:

- Uniform average:

  lim_{T→∞} (1/T) Σ_{t=1}^T y(λt, µt) → y∗.

- Geometric series (volume algorithm):

  ȳt = γ y(λt, µt) + (1 − γ) ȳ_{t−1},

  where 0 < γ < 1 is a small constant such as γ = 0.1; ȳt might be slightly primal infeasible.
Pattern-based higher order potentials
Example (Pattern-based potentials)
Given a small set of patterns P = {y1, y2, ..., yK}, define

θA(yA) = C_{yA} if yA ∈ P, and C_max otherwise.

- (Rother et al., CVPR 2009), (Komodakis and Paragios, CVPR 2009)
- Generalizes the P^n-Potts model (Kohli et al., CVPR 2007)
- "Patterns have low energies": C_y ≤ C_max for all y ∈ P
Can solve

min_{yA∈YA} θA(yA) + ∑_{B⊂A, yB∈YB} I([yA]B = yB) φA→B(yB)

by
1. testing all yA ∈ P, and
2. fixing θA(yA) = Cmax and solving for the minimum of the second term.
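A minimal sketch of this two-case minimization, assuming for simplicity that the subsets B are the singletons {i}, so the message term decomposes per variable (all names are illustrative):

```python
def minimize_pattern_potential(patterns, C, C_max, phi):
    """patterns: list of label tuples y_A in P; C: dict mapping pattern -> C_y;
    phi: phi[i][s] is the message value for assigning state s to variable i."""
    n = len(phi)
    # Case 1: test all patterns y_A in P at their pattern cost C_y.
    best = min((C[p] + sum(phi[i][p[i]] for i in range(n)), p) for p in patterns)
    # Case 2: pay C_max and minimize the message term independently per variable.
    free = tuple(min(range(len(phi[i])), key=lambda s: phi[i][s]) for i in range(n))
    free_val = C_max + sum(min(phi[i]) for i in range(n))
    return min(best, (free_val, free))
```

Because C_y ≤ Cmax for all patterns, case 1 already dominates whenever the unconstrained minimizer happens to be a pattern, so taking the minimum of the two cases is exact.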
Dual Decomposition
Constructing Relaxations
Principled methods to construct relaxations
I Linear Programming Relaxations
I Lagrangian relaxation
I Lagrangian/Dual decomposition
Dual/Lagrangian Decomposition
Additive structure

min_y ∑_{k=1}^{K} gk(y)
sb.t. y ∈ Yk, ∀k = 1, . . . , K,

such that ∑_{k=1}^{K} gk(y) is hard, but for any k,

min_y gk(y)
sb.t. y ∈ Yk

is tractable.
I (Guignard, “Lagrangean Decomposition”, MathProg 1987)
I (Conejo et al., “Decomposition Techniques in Mathematical Programming”, 2006)
Dual/Lagrangian Decomposition
Idea
1. Selectively duplicate variables (“variable splitting”)
min_{y1,...,yK, y} ∑_{k=1}^{K} gk(yk)
sb.t. y ∈ Y′, yk ∈ Yk, k = 1, . . . , K,
      y = yk : λk, k = 1, . . . , K,   (4)

where Y′ ⊇ ⋂_{k=1,...,K} Yk.
2. Add coupling equality constraints between duplicated variables
3. Apply Lagrangian relaxation to the coupling constraint
Also known as dual decomposition and variable splitting.
Dual/Lagrangian Decomposition (cont)
min_{y1,...,yK, y} ∑_{k=1}^{K} gk(yk) + ∑_{k=1}^{K} λk⊤(y − yk)
sb.t. y ∈ Y′, yk ∈ Yk, k = 1, . . . , K.
Dual/Lagrangian Decomposition (cont)
min_{y1,...,yK, y} ∑_{k=1}^{K} (gk(yk) − λk⊤yk) + (∑_{k=1}^{K} λk)⊤ y
sb.t. y ∈ Y′, yk ∈ Yk, k = 1, . . . , K.
I Problem is decomposed
I There can be multiple possible decompositions of different quality
Dual/Lagrangian Decomposition (cont)
Dual function
q(λ) := min_{y1,...,yK, y} ∑_{k=1}^{K} (gk(yk) − λk⊤yk) + (∑_{k=1}^{K} λk)⊤ y
        sb.t. y ∈ Y′, yk ∈ Yk, k = 1, . . . , K.
Dual maximization problem
max_λ q(λ)
sb.t. ∑_{k=1}^{K} λk = 0,

where {λ | ∑_{k=1}^{K} λk = 0} is the domain on which q(λ) > −∞.
Dual/Lagrangian Decomposition (cont)
Projected subgradient method, using

∂q/∂λk ∋ (y − yk),

projected onto the constraint ∑_{k} λk = 0:

(y − yk) − (1/K) ∑_{ℓ=1}^{K} (y − yℓ) = (1/K) ∑_{ℓ=1}^{K} yℓ − yk.
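One iteration of this scheme can be sketched as follows (the solver interface and toy vectors are assumptions): each `solvers[k](lam_k)` is taken to return a minimizer of gk(y) − λk⊤y over Yk, and the projected update keeps ∑k λk = 0.

```python
def dual_decomposition_step(lams, solvers, alpha):
    """One projected subgradient step on the decomposition dual."""
    K = len(solvers)
    # Solve all subproblems at the current multipliers.
    ys = [solvers[k](lams[k]) for k in range(K)]
    avg = [sum(y[i] for y in ys) / K for i in range(len(ys[0]))]
    # lam_k += alpha * ((1/K) sum_l y_l - y_k); the sum over k stays zero.
    new_lams = [[l + alpha * (a - yi) for l, a, yi in zip(lams[k], avg, ys[k])]
                for k in range(K)]
    return new_lams, ys
```

When all subproblem solutions agree (y1 = · · · = yK), every subgradient component is zero and the multipliers stop changing.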
Example (Connectivity, Vicente et al., CVPR 2008)
min_y E(y) + C(y),

where E(y) is a normal binary pairwise segmentation energy, and

C(y) = { 0  if the marked points are connected,
       { ∞  otherwise.
Example (Connectivity, Vicente et al., CVPR 2008)
min_{y,z} E(y) + C(z)
sb.t. y = z.
min_{y,z} E(y) + C(z)
sb.t. y = z : λ.
min_{y,z} E(y) + C(z) + λ⊤(y − z)
q(λ) := min_{y,z} [E(y) + λ⊤y] + [C(z) − λ⊤z]
                  (Subproblem 1)  (Subproblem 2)
I Subproblem 1: normal binary image segmentation energy
I Subproblem 2: shortest path problem (no pairwise terms)
I → Maximize q(λ) to find the best lower bound
I The above derivation is simplified; see (Vicente et al., CVPR 2008) for a more sophisticated decomposition using three subproblems
Other examples
Plenty of applications of decomposition:
I (Komodakis et al., ICCV 2007)
I (Strandmark and Kahl, CVPR 2010)
I (Torresani et al., ECCV 2008)
I (Werner, TechReport 2009)
I (Kumar and Koller, CVPR 2010)
I (Woodford et al., ICCV 2009)
I (Vicente et al., ICCV 2009)
Geometric Interpretation
Theorem (Lagrangian Decomposition Primal)
Let Yk be finite sets and g(y) = c⊤y a linear objective function. Then the optimal solution of the Lagrangian decomposition attains the value of the following relaxed primal optimization problem.
min_y ∑_{k=1}^{K} gk(y)
sb.t. y ∈ conv(Yk), ∀k = 1, . . . , K.
Therefore, optimizing the Lagrangian decomposition dual is equivalent to
I optimizing the primal objective,
I on the intersection of the convex hulls of the individual feasible sets.
Geometric Interpretation (cont)
Figure: conv(Y1), conv(Y2) and their intersection conv(Y1) ∩ conv(Y2); the decomposition optimum yD and the true optimum y∗ along the objective direction c⊤y.
Lagrangian Relaxation vs. Decomposition
Lagrangian Relaxation
I Needs explicit complicating constraints
Lagrangian/Dual Decomposition
I Can work with black-box solver for subproblem (implicit)
I Can yield stronger bounds
Figure: the relaxation optimum yR, obtained over Y′1 ⊇ conv(Y1), can be weaker than the decomposition optimum yD over conv(Y1) ∩ conv(Y2); y∗ is the true optimum along c⊤y.
Polyhedral Higher-order Interactions
Polyhedral View on Higher-order Interactions
Idea
I MAP-MRF LP relaxation: LOCAL polytope
I Incorporate higher-order interactions directly by constraining LOCAL
I Techniques from polyhedral combinatorics
I See (Werner, CVPR 2008), (Nowozin and Lampert, CVPR 2009), (Lempitsky et al., ICCV 2009)
Representation of Polytopes
Convex polytopes have two equivalent representations
I As a convex combination of extreme points
I As a set of facet-defining linear inequalities
A linear inequality with respect to a polytope can be
I valid, does not cut off the polytope,
I representing a face, valid and touching,
I facet-defining, representing a face of dimension one less than the polytope.
Figure: a polytope Z with three valid inequalities d1⊤y ≤ 1, d2⊤y ≤ 1, d3⊤y ≤ 1.
Intersecting Polytopes
Construction
1. Consider the marginal polytope M(G) of a discrete graphical model
2. Vertices of M(G) are feasible labelings
3. Add linear inequalities d⊤µ ≤ b that cut off vertices
4. Optimize over the resulting intersection M′(G),

EA(µA) = { 0  if µ is valid,
         { ∞  otherwise.

Figure: the marginal polytope M(G); cutting planes d⊤µ ≤ b remove infeasible vertices µ, leaving M′(G).
Defining the Inequalities
Questions
1. Where do the inequalities come from?
2. Is this construction always possible?
3. How to optimize over M′(G )?
1. Deriving Combinatorial Polytopes
I “Foreground mask must be connected”
I Global constraint on labelings
Constructing the Inequalities
y0 + y2 − y1 ≤ 1
Generalization and Intuition
The following inequalities are violated by unconnected subgraphs:

yi + yj − ∑_{k∈S} yk ≤ 1,  ∀(i, j) ∉ E, ∀S ∈ S(i, j)

Intuition: “If two vertices i and j are selected (yi = yj = 1, shown in black), then any set of vertices which separates them (set S) must contain at least one selected vertex.”
Figure: Vertices i and j and one vertex separator set S ∈ S(i, j).
Derivation, Formal Steps
Steps from intuition to inequalities:
1. Identify invariants that hold for any “good” solution
2. Formulate the invariants as linear inequalities
3. Prove that the inequalities are facet-defining
4. Prove that together with integrality they are a formulation
The final result is a statement like:

Theorem
For a given graph G = (V, E), the set of all induced subgraphs that are connected can be described exactly by the following constraint set.

yi + yj − ∑_{k∈S} yk ≤ 1,  ∀(i, j) ∉ E, ∀S ∈ S(i, j),
yi ∈ {0, 1},  i ∈ V.
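A result of this form can be sanity-checked by brute force on a toy instance (the 3-vertex path graph below is an illustrative assumption, not from the slides): a 0/1 labeling induces a connected subgraph exactly when it satisfies every separator inequality.

```python
import itertools
from collections import deque

EDGES = {(0, 1), (1, 2)}       # path graph 0 - 1 - 2
SEPARATORS = {(0, 2): [{1}]}   # the only non-adjacent pair; S(0, 2) = {{1}}

def induced_connected(y):
    """BFS check: do the selected vertices induce a connected subgraph?"""
    sel = [i for i in range(3) if y[i]]
    if len(sel) <= 1:
        return True
    seen, todo = {sel[0]}, deque([sel[0]])
    while todo:
        u = todo.popleft()
        for v in sel:
            if v not in seen and tuple(sorted((u, v))) in EDGES:
                seen.add(v)
                todo.append(v)
    return len(seen) == len(sel)

def satisfies_inequalities(y):
    """y_i + y_j - sum_{k in S} y_k <= 1 for all pairs and their separators."""
    return all(y[i] + y[j] - sum(y[k] for k in S) <= 1
               for (i, j), seps in SEPARATORS.items() for S in seps)

# The two characterizations agree on all 2^3 labelings.
for y in itertools.product([0, 1], repeat=3):
    assert induced_connected(y) == satisfies_inequalities(y)
```

For example, the labeling (1, 0, 1) selects both endpoints but not the separator vertex 1, and it is exactly the labeling cut off by y0 + y2 − y1 ≤ 1.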
2. Is this always possible?
Yes, but...
I identifying and proving the inequalities is not straightforward,
I even if all facet-defining inequalities are known, the problem can remain intractable.
Further reading
I Polyhedral combinatorics literature (Schrijver, 2003), (Ziegler, 1994)
I There exists software to help identify facet-defining inequalities, see e.g. polymake
3. How to optimize over M′(G)?
I LOCAL has a small number of inequalities,
I M′(G) typically has an exponential number of inequalities,
I Optimization over M′(G) is still tractable if the separation problem can be solved.
Definition (Separation problem)
Given a point y, prove that y ∈ M′(G) or produce a valid inequality that is strictly violated.
Example Separation for Connectivity
Need to show that for all (i, j) ∉ E with yi > 0, yj > 0:

yi + yj − ∑_{k∈S} yk − 1 ≤ 0,  ∀S ∈ S(i, j).

Equivalently, in variational form:

max_{S∈S(i,j)} ( yi + yj − ∑_{k∈S} yk − 1 ) ≤ 0.   (5)

If (5) is satisfied, no violated inequality for (i, j) exists. Otherwise, the set corresponding to the most violated inequality is

S∗(i, j) = argmin_{S∈S(i,j)} ∑_{k∈S} yk.   (6)
Example Separation for Connectivity (cont)
Need to solve

S∗(i, j) = argmin_{S∈S(i,j)} ∑_{k∈S} yk

I Transform G into a directed edge-weighted graph
I Solve the minimum (i, j)-cut problem
I Equivalently: solve a linear max-flow problem (efficient)
It follows that,
I by construction, a finite-capacity (i, j)-cut always exists,
I the cut found corresponds to a violated inequality or proves its absence.
Figure: the transformed graph for the pair (i, j); internal vertices a, b, c become arcs with capacities ya, yb, yc, and the original edges receive capacity ∞.
Example
Figure: X pattern (left) and noisy X pattern (right), 30 × 30 images.
Example
Figure: MRF/MRFcomp/CMRF results: E = −985.61, E = −974.16, E = −984.21
Figure: MRF/MRFcomp/CMRF results: E = −980.13, E = −974.03, E = −976.83
Bounding-box Prior
Example (Bounding Box Prior (Lempitsky et al., ICCV 2009))
I Prior: the segmentation should be tight with respect to the bounding box
I Exponential number of linear inequalities
Relaxation and Decomposition: Summary
Lagrangian relaxation
I requires: complicating constraints
I widely applicable
Problem decomposition
I requires: multiple tractable subproblems
I can make use of efficient algorithms for the subproblems
I widely applicable
Polyhedral higher-order interactions
I requires: combinatorial understanding of constraint set
I universal: can represent any constraint
I tractability depends on a tractable separation oracle