an introduction to submodular functions and optimization
TRANSCRIPT
![Page 1: An Introduction to Submodular Functions and Optimization](https://reader031.vdocuments.net/reader031/viewer/2022020703/61fb2ac12e268c58cd5af1df/html5/thumbnails/1.jpg)
An Introduction to Submodular
Functions and Optimization
Maurice Queyranne
University of British Columbia,
and IMA Visitor (Fall 2002)
IMA, November 4, 2002
![Page 2: An Introduction to Submodular Functions and Optimization](https://reader031.vdocuments.net/reader031/viewer/2022020703/61fb2ac12e268c58cd5af1df/html5/thumbnails/2.jpg)
1. Submodular set functions
• Generalization to lattice functions
• Why are submodular functions interesting?
• Example: joint replenishment functions
in supply chain management
• Example: forests and trees
• Matroids rank functions (optional)
• Example: cut functions
2. Submodular polyhedra and greedy
algorithms
• Submodular polyhedra
• Submodular polyhedron greedy algorithm
• Example: the core of convex games
• Example: scheduling polyhedra 3. Minimiz-
ing submodular functions
• Ellipsoid approaches
• The Edmonds-Cunningham approach
• Column generation
• A least squares approach
• Recent, more combinatorial algorithms
• Open questions
1
![Page 3: An Introduction to Submodular Functions and Optimization](https://reader031.vdocuments.net/reader031/viewer/2022020703/61fb2ac12e268c58cd5af1df/html5/thumbnails/3.jpg)
1. Submodular set functions:
Let N be a finite set (the ground set2N denotes the set of all subsets of N
A set function f : 2N 7→ R is• submodular iff for all A,B ⊆ N
f(A ∪B) + f(A ∩B) ≤ f(A) + f(B)
• supermodular iff −f is submodular• modular iff both sub- and supermodular
Generalization to lattice functions:
A lattice L is a a partially ordered set in whichany two elements a, b ∈ L have• a least common upper bound (join) a ∨ band• a largest common lower bound (meet) a ∧ b
A function f : L 7→ R is submodular ifffor all a, b ∈ L,
f(a ∨ b) + f(a ∧ b) ≤ f(a) + f(b)
2
![Page 4: An Introduction to Submodular Functions and Optimization](https://reader031.vdocuments.net/reader031/viewer/2022020703/61fb2ac12e268c58cd5af1df/html5/thumbnails/4.jpg)
Why are submodular functions interesting?
Role somewhat similar to that played by
convex/concave functions in continuous opti-
mization:
• arise in many applications
• preserved by some useful operations
• lead to nice theory and structural results
• some related optimization problems can
be solved efficiently
• nice parametric and postoptimality properties
(not discussed in this talk)
3
![Page 5: An Introduction to Submodular Functions and Optimization](https://reader031.vdocuments.net/reader031/viewer/2022020703/61fb2ac12e268c58cd5af1df/html5/thumbnails/5.jpg)
Lemma: When N is finite, the submodularityof f : 2N 7→ R is equivalent to:
• ∀A ⊂ B ⊂ N, ∀j ∈ N \B,f(A+ j)− f(A) ≥ f(B + j)− f(B)
(nonincreasing first differences,economies of scale, economies of scope),
and also to:
• ∀A ⊂ N, ∀i, j ∈ N \A,f(A+ j)− f(A) ≥ f(A+ i+ j)− f(A+ i)
(nonpositive second discrete differences,local submodularity)
where f(A+ j) stands for f(A ∪ {j})(assuming j 6∈ A).
Corollary (cardinality functions):If f(A) = g(|A|) for all A ⊆ N where g : N 7→ R
then f is submodular iff g is concave
4
![Page 6: An Introduction to Submodular Functions and Optimization](https://reader031.vdocuments.net/reader031/viewer/2022020703/61fb2ac12e268c58cd5af1df/html5/thumbnails/6.jpg)
Example: joint replenishment functions insupply chain management
Ground set N is the set of all items beingstocked (stock keeping units, or skus)
Managing the inventory of these items impliesfinding an optimum tradeoff between• ordering costs and economies of scale
(which favor large, infrequent replenishments);and• holding costs and service requirements
(which favor small, frequent replenishments)
Ordering costs often include fixed costs,which depend only on the subset of jointlyordered items (and not their order quantities).
Economies of scope in• procurement,• order processing,• transportation,often imply that these joint replenishmentfixed costs are submodular set functions.
5
![Page 7: An Introduction to Submodular Functions and Optimization](https://reader031.vdocuments.net/reader031/viewer/2022020703/61fb2ac12e268c58cd5af1df/html5/thumbnails/7.jpg)
Example: Forests and Trees
In a given graph G = (V,E), a subgraphS = (V, F ) is a forest iff it contains no cycle.
A (spanning) tree is a forest (V, F )with |F | = |V | − 1
Graphic rank function:• ground set N = E
• the rank of A ⊆ E isr(A) = max{|F | : F ⊆ A and (V, F ) is a forest}
Lemma: This rank function is submodular.
This graphic rank function arises informulating connectivity constraintsfor network design problems
The number of connected components in thesubgraph (V,A) is: nc(A) = |V | − r(A)⇒ it is a supermodular function
6
![Page 8: An Introduction to Submodular Functions and Optimization](https://reader031.vdocuments.net/reader031/viewer/2022020703/61fb2ac12e268c58cd5af1df/html5/thumbnails/8.jpg)
Matroid rank functions (optional material)
A set system (N,F) is defined by• a finite ground set N , and• a family F ⊆ 2N of independent sets.
A set system (N,F) is a matroid iff(M1) ∅ ∈ F, and(M2) if X ⊆ Y ∈ F then X ∈ F(M3) if X,Y ∈ F and |X| > |Y |
then ∃e ∈ X \ Y such that Y + e ∈ F
Define the rank function of a set system (N,F):∀X ⊆ N, r(X) = max{|F | : X ⊇ F ∈ F}
Theorem: Let r : 2N 7→ N.The following are equivalent:(i) F = {F ⊆ N : r(F ) = |F |} defines a matroid
(N,F) and r is its rank function(ii) r satisfies, for all X,Y ⊆ N
(R1) r(X) ≤ |X|;(R2) if X ⊆ Y then r(X) ≤ r(Y ); and(R3) r is submodular.
7
![Page 9: An Introduction to Submodular Functions and Optimization](https://reader031.vdocuments.net/reader031/viewer/2022020703/61fb2ac12e268c58cd5af1df/html5/thumbnails/9.jpg)
Example: Cut functionsLet G = (V,A) be a digraph,
with given arc capacities c(a), ∀a ∈ AHere, the ground set N = V .
• the cut (or coboundary) δ+(S) of S ⊆ V is:δ+(S) = {a = (i, j) ∈ A : i ∈ S and j 6∈ S}
• global cut function f : 2V 7→ R is defined byf(S) = c(δ+(S)) =
∑a∈δ+(S) c(a) ∀S ⊆ V
• given s 6= t ∈ V , the s, t-cut functionfst : 2Vst 7→ R is defined byfst(S) = f(S + s) ∀S ⊆ Vst = V \ {s, t}
Lemma: If c ≥ 0 then these cut functionsf and fst are submodular.
Proof: f(S) + f(T )− f(S ∩ T )− f(S ∪ T )= c(A(S : T )) + c(A(T : S))
where A(X : Y ) = {(i, j) ∈ A : i ∈ X, j ∈ Y }.QED
All these also apply to the undirected case.
8
![Page 10: An Introduction to Submodular Functions and Optimization](https://reader031.vdocuments.net/reader031/viewer/2022020703/61fb2ac12e268c58cd5af1df/html5/thumbnails/10.jpg)
Cut functions arise in connection with many
combinatorial optimization models, in particu-
lar:
• network flows
• optimum selection of contingent investments
• design of an open pit mine
• precedence-constrained scheduling
Cut functions are also used in formulating
connectivity constraints, in particular for
• vehicle routing (e.g., TSP)
• network design
9
![Page 11: An Introduction to Submodular Functions and Optimization](https://reader031.vdocuments.net/reader031/viewer/2022020703/61fb2ac12e268c58cd5af1df/html5/thumbnails/11.jpg)
2. Submodular polyhedra and Greedy al-gorithms
Given a finite set N and f : 2N 7→ R, let
P (f) = {x ∈ RN : x(A) ≤ f(A) ∀A ⊆ N}where x(A) =
∑{x(e) : e ∈ A}.
Remark: P (f) 6= ∅ iff f(∅) ≥ 0
Define P (f) to be a submodular polyhedroniff f is submodular.
A linear programming problem:Given:• a finite set N ,• a set function f : 2N 7→ R with f(∅) ≥ 0, and• a weight vector w ∈ RN ,solve
(LP) max{wx : x ∈ P (f)}
Remark: (LP) has bounded optimum iff w ≥ 0.
10
![Page 12: An Introduction to Submodular Functions and Optimization](https://reader031.vdocuments.net/reader031/viewer/2022020703/61fb2ac12e268c58cd5af1df/html5/thumbnails/12.jpg)
Submodular Polyhedron Greedy Algorithm1. Sort the elements of N as
w(e1) ≥ w(e2) ≥ · · · ≥ w(en)2. Let V0 = ∅.
For i = 1 to n letVi = Vi−1+ei and xG(ei) = f(Vi)−f(Vi−1).
Theorem (Edmonds, 1971; Lovasz, 1983):The Submodular Polyhedron Greedy Algorithmsolves (LP) for all weight vectors w ≥ 0iff f is submodular.
• If, in addition, f is integral, thenthe greedy solution xG is integral
• An optimal dual solution is
y(A) =
w(ei)− w(ei+1) if A = Vi,
0 otherwise
where w(en+1) = 0.
⇒ when f is submodular, the constraintsdefining P (f) are totally dual integral.
11
![Page 13: An Introduction to Submodular Functions and Optimization](https://reader031.vdocuments.net/reader031/viewer/2022020703/61fb2ac12e268c58cd5af1df/html5/thumbnails/13.jpg)
Corollary:When f is submodular, the base polytope
B(f) = P (f) ∩ {x ∈ RN : x(N) = f(N)}is non-empty.
Example: The core of convex gamesCooperative games:cost (or profit) allocation example:• how to share the cost of (or profit from)
a facility serving different “players”?
A (characteristic-function) game is a pair (N, f)where:• N is a set of players• f : 2N 7→ R is a set function such that f ≥ 0
and f(∅) = 0
• a subset S ⊂ N (S 6= ∅) is a possible coalition• f(S) is the cost if the facility is built
(and/or operated) only for the players in S• N is the grand coalition
• A game is convex if f is submodular12
![Page 14: An Introduction to Submodular Functions and Optimization](https://reader031.vdocuments.net/reader031/viewer/2022020703/61fb2ac12e268c58cd5af1df/html5/thumbnails/14.jpg)
A cost allocation is a vector x ∈ RN such thatx(N) = f(N).
It is efficient (or stable) if no coalition hasany advantage to seceding:
x(S) ≤ f(S) for all S ⊂ N(subgroup rationality ; the special case whereS = {i} is individual rationality for player i)
The core is the set of all efficient allocations:
core(N, f) = {x ∈ RN : x(N) = f(N),
x(S) ≤ f(S) ∀S ⊂ N}= B(f)
Theorem (Shapley, 1971)• The core of a convex game is a nonempty
polytope.• Its extreme points are obtained by the
Submodular Polyhedron Greedy Algorithm.
Many other related solution concepts exist.
13
![Page 15: An Introduction to Submodular Functions and Optimization](https://reader031.vdocuments.net/reader031/viewer/2022020703/61fb2ac12e268c58cd5af1df/html5/thumbnails/15.jpg)
Example: Scheduling PolyhedraGiven a machine, which can process at mostone task at a time; andn jobs J1, . . . , Jn to be processed, each• available at date 0,• with given processing time pj > 0(and possibly other constraints)
A feasible schedule S induces a vector
CS = (CS1 , CS2 , . . . , C
Sn) ∈ RN
where Cj is the completion time of job Jj in S.
The performance region is the set ofall such feasible completion time vectors CS.• a non-connected set (e.g., without
additional constraints, it is the unionof n! disjoint affine cones)
We are often interested in minimizing a linear(or concave) function of CS
⇒ it suffices to consider the convex hull ofthe performance region.
14
![Page 16: An Introduction to Submodular Functions and Optimization](https://reader031.vdocuments.net/reader031/viewer/2022020703/61fb2ac12e268c58cd5af1df/html5/thumbnails/16.jpg)
The ground set is the job set N .For A ⊆ N , define
g(A) =1
2
∑j∈A
pj
2
+1
2
∑j∈A
p2j
Lemma: g is a supermodular set function
Theorem (Wolsey, 1985; Q, 1993):(i) The supermodular polyhedron
Q(g) = {C ∈ RN :∑j∈A
pjCj ≥ g(A) for all A ⊆ N}
is the convex hull of the performance region.
(ii) Its base polytope
Q(g) ∩ {C ∈ RN :∑j∈N
pjCj = g(N)}
is that of the performance region restrictedto all feasible schedules without idle time.
Remark: These results apply to bothpreemptive and nonpreemptive schedules.
15
![Page 17: An Introduction to Submodular Functions and Optimization](https://reader031.vdocuments.net/reader031/viewer/2022020703/61fb2ac12e268c58cd5af1df/html5/thumbnails/17.jpg)
3. Minimizing Submodular Functions
minimize{f(S) : S ⊆ N}where f : 2N 7→ R is submodular
• equivalently, we may want to max imizea supermodular function
• w.l.o.g., we may assume that f is normalizedi.e., f(∅) = 0.
Example: the separation problem fora submodular polyhedron
Given• a submodular set function f : 2N 7→ R
• a point x ∈ RNdecide whether or not x ∈ P (f)and, if not, find a violated inequality.
A most violated inequality is defined by S ⊆ Nmaximizing the violation vx(x) = x(S)− f(S)• vx is a supermodular function
16
![Page 18: An Introduction to Submodular Functions and Optimization](https://reader031.vdocuments.net/reader031/viewer/2022020703/61fb2ac12e268c58cd5af1df/html5/thumbnails/18.jpg)
How to input a submodular set function?
Assume that f is given by a value oracle, i.e.,a “black box” which, for any input subset Areturns the value f(A)
• let FE be an upper bound on the time neededfor such a function evaluation
Theorem (Grotschel, Lovasz & Schrijver, 1981)A submodular set function can be minimizedin strongly polynomial time(i.e., in time polynomial in |N | and FE)
Remark: related problems can be difficult, e.g.,• max imizing a submodular function f , or• minimizing a submodular function f , subject
to a cardinality constraint, such as|A| = b, or |A| ≥ b, or |A| ≤ b
are NP-hard when f is a cut function.Applications: • facilities location
• entropy maximization
17
![Page 19: An Introduction to Submodular Functions and Optimization](https://reader031.vdocuments.net/reader031/viewer/2022020703/61fb2ac12e268c58cd5af1df/html5/thumbnails/19.jpg)
Extensions
A submodular function on a product lattice
(or a sublattice thereof) can be minimized
using a pseudopolynomial number of
submodular set function minimizations
(Q and Tardella, 1992)
• pseudopolynomial is best possible for
such problems
The discrete submodular resource allocation
problem with separable concave profit
function,
(RA) max{n∑
j=1
gj(xj) : x ∈ P (f) ∩ NN}
can be solved in polynomial time
(Federgruen and Groenevelt, 1986)
Other types of discretely convex functions can
also be minimized in polynomial time
(Favati and Tardella; Murota, Shioura)
18
![Page 20: An Introduction to Submodular Functions and Optimization](https://reader031.vdocuments.net/reader031/viewer/2022020703/61fb2ac12e268c58cd5af1df/html5/thumbnails/20.jpg)
Ellipsoid approaches(Grotschel, Lovasz & Schrijver, 1981, 1988)
1. (Polynomial equivalence of Optimizationand Separation problems):
The Optimization problem on submodular poly-hedra can be solved in P-time (Greedy Alg.)⇒ the Separation problem can be solved
in P-time (by the Ellipsoid method)
Let α ≤ 0 be a trial value for f(S∗) = min f(S)and define fα : 2N 7→ R by
fα(A) =
f(A)− α if A 6= ∅;0 otherwise
Lemma: fα is submodular when α ≤ 0
• Solve the Separation problem for P (fα)with x = 0. (Note: 0 ∈ P (fα) iff f(S∗) ≥ α)
• Use binary search for the value α = f(S∗)over an interval [LB, 0]where LB is any lower bound on f(S∗)e.g., LB =
∑e [f(N)− f(N \ {e})]
19
![Page 21: An Introduction to Submodular Functions and Optimization](https://reader031.vdocuments.net/reader031/viewer/2022020703/61fb2ac12e268c58cd5af1df/html5/thumbnails/21.jpg)
2. Lovasz’s extension f of f to RN+:
f(w) = max{wx : x ∈ P (f)} for all w ∈ RN , w ≥ 0}(Note: non-standard definition!)
For any set function f , the function f• is piecewise linear convex• is (positively) homogeneous:
f(λw) = λf(w) for all λ ≥ 0 (λ ∈ R)
Lemma: When f is submodular, f• is evaluated in P-time (Greedy Alg.)• coincides with f at all 0-1 points:
f(S) = f(χS) for all S ⊆ N• has its minimum over the unit cube [0,1]E
attained at a vertex
⇒ f can be minimized over the unit cubein P-time (by the Ellipsoid method)
3. Strongly P-time algorithm in GLS (1988)uses O(|N |4) f-function evaluations
How about a P-time combinatorial algorithm?
20
![Page 22: An Introduction to Submodular Functions and Optimization](https://reader031.vdocuments.net/reader031/viewer/2022020703/61fb2ac12e268c58cd5af1df/html5/thumbnails/22.jpg)
The Edmonds-Cunningham approach
Lemma: Let u∗ be an optimum solution tothe LP
z∗ = min∑S⊆N
f(S) uS
s.t.∑
S:e∈SuS ≤ 1 ∀e ∈ N
u ≥ 0
If f is submodular and normalized thenS∗ =
⋃{S : u∗S > 0} minimizes f(S).
The dual problem is:
(S − LP ) max∑e∈N
xe
s.t. x(S) ≤ f(S) ∀S ⊆ Nx ≤ 0
Cunningham (1984, 1985):• a pseudopolynomial time algorithm
polynomial in |N | and f(S∗)
⇒ strongly P-time for matroid rank functions
21
![Page 23: An Introduction to Submodular Functions and Optimization](https://reader031.vdocuments.net/reader031/viewer/2022020703/61fb2ac12e268c58cd5af1df/html5/thumbnails/23.jpg)
Column generation approachThe extreme points b1, . . . , bk of P (f) can begenerated by the Greedy Algorithm, and
P (f) = conv{b1, . . . , bk}+ RN−
Applying Dantzig-Wolfe decomposition:
(DW ) max∑u∈N
xv
s.t. x ≤k∑i=1
λibi
k∑i=1
λi = 1
λ ≥ 0, x ≤ 0
• the column generation subproblem is solvedby the Greedy Algorithm
• “greedy” column generation is equivalentto Successive Linear Programming (SLP)applied to the Lovasz extension problem
min{f(x) : 0 ≤ x ≤ 1}22
![Page 24: An Introduction to Submodular Functions and Optimization](https://reader031.vdocuments.net/reader031/viewer/2022020703/61fb2ac12e268c58cd5af1df/html5/thumbnails/24.jpg)
A least squares approach
Theorem (Fujishige, 1984)
If x∗ solves the least squares problem
(LS) max∑e∈N
x2e
s.t. x(S) ≤ f(S) ∀S ⊆ Nx ≤ 0
then
• f(S∗) = x∗(N)
• S∗ is the largest tight set
i.e., the largest set S such that x∗(S) = f(S)
23
![Page 25: An Introduction to Submodular Functions and Optimization](https://reader031.vdocuments.net/reader031/viewer/2022020703/61fb2ac12e268c58cd5af1df/html5/thumbnails/25.jpg)
Recent, more combinatorial algorithms
Recent primal feasible algorithms for (S-LP):
• a convex representation of current solution x
• augmenting paths, and
• exchange capacities in strongly P-time
Schrijver (1999):
• “short” augm. paths in a layered network
• lower bounds on exchange capacities
• removing selected arcs from augm. network
Iwata, Fleischer and Fujishige (IFF, 1999-2000):
• scaling approach
• relaxation of f with a scaled penalty
• “fat” augmenting path analysis
(see Fleischer, OPTIMA 64, Sept. 2000; and
McCormick’s chapter to appear in Handbook
of Discrete Optimization)
24
![Page 26: An Introduction to Submodular Functions and Optimization](https://reader031.vdocuments.net/reader031/viewer/2022020703/61fb2ac12e268c58cd5af1df/html5/thumbnails/26.jpg)
Iwata (2000):a “fully combinatorial” variant of IFF• uses only additions, subtractions and
comparisons (no multiplications or divisions)• O((n9 log2 n)FE + n11 log2 n) running time
Iwata (2001): Hybrid algorithm• combines ideas from Schrijver’s algorithm
into IFF• O(n8FE + n9) running time
Open questions
How low can we reduce the running time?
Can we find (nontrivial) lower bounds• on the running time of any submodular
minimization algorithm?• on the number of function evaluations
needed to minimize a submodular functiongiven by a value oracle?
25
![Page 27: An Introduction to Submodular Functions and Optimization](https://reader031.vdocuments.net/reader031/viewer/2022020703/61fb2ac12e268c58cd5af1df/html5/thumbnails/27.jpg)
A few selected references
The following textbooks have a chapter on submodular set func-tions and optimization:
B. Korte & J. Vygen, Combinatorial Optimization: Theory andAlgorithms, Springer, 2002.
G.L. Nemhauser & L.A. Wolsey, Integer and Combinatorial Opti-mization, Wiley 1988, 1999.
Monographs on sub/supermodularity:
S. Fujishige, Submodular Functions and Optimization, North-Holland,1991.
H. Narayanan, Submodular Functions and Electrical Networks, North-Holland, 1997.
D.M. Topkis, Supermodularity and Complementarity, PrincetonUniv. Press, 2001.
Recent reviews on submodular set function minimization:
L. Fleischer, “Recent Progress in Submodular Function Minimiza-tion”, OPTIMA 64 (September 2000) 1–11.http://www.ise.ufl.edu/∼optima/optima64.pdf
S.T. McCormick, “Submodular Function Minimization”, to appearin Handbook of Combinatorial Optimization.([email protected])
A recent application in statistical mechanics:
J.-Ch. Angles d’Auriac, F. Igloi, M. Preissmann & A. Sebo, “Opti-mal cooperation and submodularity for computing Potts’ partitionfunctions with a large number of states”, J. Phys. A: Math. Gen.35 (2002) 6973–6983.
26