OPTIMIZATION UNDER UNCERTAINTY:
BOUNDING THE CORRELATION GAP
A DISSERTATION
SUBMITTED TO THE DEPARTMENT OF COMPUTER SCIENCE
AND THE COMMITTEE ON GRADUATE STUDIES
OF STANFORD UNIVERSITY
IN PARTIAL FULFILLMENT OF THE REQUIREMENTS
FOR THE DEGREE OF
DOCTOR OF PHILOSOPHY
Shipra Agrawal
May 2011
http://creativecommons.org/licenses/by-nc/3.0/us/
This dissertation is online at: http://purl.stanford.edu/kb071qr2204
© 2011 by Shipra Agrawal. All Rights Reserved.
Re-distributed by Stanford University under license with the author.
This work is licensed under a Creative Commons Attribution-Noncommercial 3.0 United States License.
I certify that I have read this dissertation and that, in my opinion, it is fully adequate in scope and quality as a dissertation for the degree of Doctor of Philosophy.
Yinyu Ye, Primary Adviser
I certify that I have read this dissertation and that, in my opinion, it is fully adequate in scope and quality as a dissertation for the degree of Doctor of Philosophy.
Timothy Roughgarden, Co-Adviser
I certify that I have read this dissertation and that, in my opinion, it is fully adequate in scope and quality as a dissertation for the degree of Doctor of Philosophy.
Ashish Goel
I certify that I have read this dissertation and that, in my opinion, it is fully adequate in scope and quality as a dissertation for the degree of Doctor of Philosophy.
Amin Saberi
Approved for the Stanford University Committee on Graduate Studies.
Patricia J. Gumport, Vice Provost Graduate Education
This signature page was generated electronically upon submission of this dissertation in electronic format. An original signed hard copy of the signature page is on file in University Archives.
Preface
Modern decision models increasingly involve parameters that are unknown or uncertain. Uncertainty is typically modeled by a probability distribution over possible realizations of some random parameters. In the presence of high-dimensional multivariate random variables, estimating the joint probability distribution is difficult, and optimization models are often simplified by assuming that the random variables are independent. Although popular, the effect of this heuristic on solution quality was little understood. This thesis centers around the following question:
“How much can the expected cost increase if the
random variables are arbitrarily correlated?”
We introduce the new concept of Correlation Gap to quantify this increase. For given marginal distributions, the Correlation Gap compares the expected value of a function under the worst-case (expectation-maximizing) joint distribution to its expected value under the independent (product) distribution.
The correlation gap captures the "Price of Correlations" in stochastic optimization: using a distributionally robust stochastic programming model, we show that a small correlation gap implies that the efficient heuristic of assuming independence is actually robust against any adversarial correlations, while a large correlation gap suggests that it is important to invest more in data collection and in learning correlations. Apart from decision making under uncertainty, we show that our upper bounds on the correlation gap are also useful for solving many deterministic optimization problems, like welfare maximization, k-dimensional matching, and transportation problems, for which it captures the performance of randomized algorithmic techniques like independent random selection and independent randomized rounding.
Our main technical results include upper and lower bounds on the correlation gap based on properties of the cost function. We demonstrate that monotonicity and submodularity of a function imply a small correlation gap. Further, we employ techniques of cross-monotonic cost-sharing schemes from game theory in a novel manner to provide a characterization of non-submodular functions with small correlation gap. Our results include small constant bounds for cost functions arising in many popular applications, such as stochastic facility location, Steiner tree network design, minimum spanning tree, minimum makespan scheduling, and single-source rent-or-buy network design. Notably, we show that for many interesting functions the correlation gap is bounded irrespective of the dimension of the problem or the type of marginal distributions. Additionally, we demonstrate the tightness of our characterization: a small correlation gap of a function implies the existence of an "approximate" cross-monotonic cost-sharing scheme. This observation could also be useful for enhancing the understanding of such schemes, and may be of independent interest.
Acknowledgments
I am deeply indebted to my advisor Prof. Yinyu Ye, for his enthusiasm, inspiration
and direction. This thesis would not have been possible without his support and
encouragement. Besides my advisor, my sincere thanks also go to Prof. Amin Saberi and Nimrod Megiddo for their mentoring and guidance during my doctoral studies.
I would like to thank the rest of my dissertation reading and orals committee:
Prof. Tim Roughgarden, Prof. Ashish Goel, and Prof. Persi Diaconis, for their time,
insightful questions, and constructive comments.
I thank my friends, colleagues, and coauthors: Benjamin Armbruster, Erick De-
lage, Yichuan Ding, Anthony Man-Cho So, and Zizhuo Wang, for stimulating discus-
sions and valuable collaborations.
I wish to thank my entire family, my parents, sisters, parents-in-law, brothers-in-law, and sisters-in-law, for providing a loving and encouraging environment for me.
Last but not least, I wish to thank my husband and best friend Piyush, for helping me get through the difficult times and make the most of the good times. This
journey would not have been so rewarding without his love and support. To him I
dedicate this thesis.
Contents
Preface iv
Acknowledgments vi
1 Introduction 1
1.1 Correlation Gap . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2
1.2 Price of Correlations (POC ) in Stochastic Optimization . . . . . . . . 3
1.3 Bounding POC via Correlation Gap . . . . . . . . . . . . . . . . . . 6
2 Upper bounds 8
2.1 Results . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 8
2.1.1 Submodularity . . . . . . . . . . . . . . . . . . . . . . . . . . 8
2.1.2 Cross-monotone cost-sharing property . . . . . . . . . . . . . 10
2.1.3 Cross-monotone cost-sharing with a prefix property . . . . . . 14
2.2 Proofs . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 16
2.2.1 Proof for binary random variables . . . . . . . . . . . . . . . . 16
2.2.2 Proof for finite domains . . . . . . . . . . . . . . . . . . . . . 22
2.2.3 Proof for countably infinite domains . . . . . . . . . . . . . . 24
2.2.4 Proof for the uncountable domains Ω ⊆ Rn . . . . . . . . . . . 28
3 Applications 32
3.1 Approximation of Distributionally Robust Stochastic Optimization . 32
3.1.1 Stochastic Uncapacitated Facility Location (SUFL) . . . . . . 35
3.1.2 Stochastic Steiner Tree (SST) . . . . . . . . . . . . . . . . . . 36
3.1.3 Stochastic bottleneck matching . . . . . . . . . . . . . . . . . 37
3.2 Deterministic Optimization . . . . . . . . . . . . . . . . . . . . . . . 37
3.2.1 d-dimensional maximum matching . . . . . . . . . . . . . . . 37
3.2.2 Welfare maximization . . . . . . . . . . . . . . . . . . . . . . . 40
4 Lower bounds 44
4.1 Lower bounds by examples . . . . . . . . . . . . . . . . . . . . . . . . 45
4.1.1 Supermodular functions . . . . . . . . . . . . . . . . . . . . . 45
4.1.2 Subadditive functions . . . . . . . . . . . . . . . . . . . . . . . 46
4.1.3 Submodular functions . . . . . . . . . . . . . . . . . . . . . . 47
4.1.4 Uncapacitated metric facility location . . . . . . . . . . . . . . 48
4.1.5 Steiner forest . . . . . . . . . . . . . . . . . . . . . . . . . . . 49
4.2 Tightness of cost-sharing condition . . . . . . . . . . . . . . . . . . . 50
5 Conclusions 53
Bibliography 54
A Proof of Theorem 4 59
B Proof for binary random variables (details) 62
B.1 Properties of Split Operation . . . . . . . . . . . . . . . . . . . . . . 62
B.2 Handling irrational probabilities . . . . . . . . . . . . . . . . . . . . . 67
C Proof for finite domains (details) 69
D Maximum of Poisson Random Variables 73
List of Tables
List of Figures
4.1 Example with an exponential correlation gap . . . . . . . . . . . . . . 45
Chapter 1
Introduction
In many planning problems, it is crucial to consider correlations¹ among individual
events. For example, an emergency service (medical services, fire rescue, etc.) plan-
ner needs to carefully locate emergency service stations and determine the number
of emergency vehicles that need to be maintained in order to dispatch vehicles to the
call points in time. If the planner assumes emergency calls are rare and independent
events, he simply needs to make sure that every potential call point is in the service range of at least one station; however, there might exist certain kinds of dependence between those rare events, so that the planner cannot ignore the chance of simultaneous occurrences of those emergency events. The underlying correlations, possibly caused by
some common trigger factors (e.g., weather, festivals), are often difficult to predict or
analyze, which makes the planning problem complicated. Other examples include the
portfolio selection problem, in which the risk averse investor has to take into account
the correlations among multiple risky assets as well as their individual performances,
and the stochastic facility location problem, in which the supplier needs to consider
the correlations between demands from different retailers.
As these examples illustrate, information about correlations can be crucial for
operational planning, especially for large-system planning. However, estimating the joint distribution in the presence of correlations is usually difficult, and much harder than,
¹Here, "correlation" refers to any departure of two or more random variables from probabilistic independence.
for example, estimating marginal distributions. Reasons for this include the huge sample size required to characterize the joint distribution accurately, and the practical difficulty of retrieving centralized information; e.g., the retailers may only be able to provide statistics about the demand for their own products. In the presence of only marginal distribution information, a common heuristic is to assume that the involved random variables are independent, and thus substitute the joint distribution by the independent (product) distribution with the given marginals. Such an assumption not only simplifies the task of sampling and estimation, but also enriches the structure of optimization problems under uncertainty, leading to efficient solution techniques (e.g., see Kleinberg et al. (1997), Mohring et al. (1999)). However, the effect of such a heuristic on the solution quality is little understood.
In this work, we evaluate the effectiveness of the independence assumption by introducing the new concepts of "Correlation Gap" and "Price of Correlations". For given marginal distributions, the Correlation Gap compares the expected value of a function under the independent distribution to its expected value under the worst-case (expectation-maximizing) distribution. The Price of Correlations quantifies the robustness of a solution, obtained assuming independence, for an optimization problem under uncertainty. A small Correlation Gap of the objective function for all fixed decisions implies that the Price of Correlations is small, i.e., the decision obtained assuming independence is almost as robust as the most robust solution for the optimization problem. Below, we give precise definitions.
1.1 Correlation Gap
Let ξ = (ξ1, . . . , ξn) denote an n-dimensional random vector taking values in Ω =
Ω1 ×· · ·×Ωn. For each i, we are given pi, a probability measure over Ωi. In practice,
for large domains the marginal distributions could be available explicitly in closed
parametric form or as a black box sampling oracle.
Denote by P the collection of all multivariate probability measures p over Ω such that each component ξi has the fixed marginal distribution pi. That is,

    P = { p : ∫_Ω I(ξi = θi) dp(ξ) = pi(θi),  ∀ θi ∈ Ωi, i = 1, . . . , n },        (1.1)

where I(·) denotes the indicator function. Let p̄ ∈ P denote the product distribution, i.e., p̄(ξ) = ∏_i pi(ξi), ∀ ξ.
Definition 1. The Correlation Gap of an instance (f, Ω, {pi}) is defined as the ratio

    κ = sup_{p∈P} Ep[f(ξ)] / Ep̄[f(ξ)],

where P is the set of distributions given by Equation (1.1), and p̄ is the product distribution with the given marginals. We redefine κ = 1 if Ep̄[f(ξ)] = sup_{p∈P} Ep[f(ξ)] = 0, or if Ep̄[f(ξ)] = ∞.
Definition 2. The correlation gap of a function f : Ω → R+ is κf if the correlation gap of every instance (f, Ω, {pi}) is bounded by κf. That is,

    κf = sup_{pi} sup_{p∈P({pi})} Ep[f(ξ)] / Ep̄[f(ξ)].
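To make Definition 1 concrete, the ratio can be computed by brute force on a toy instance (an illustrative sketch of ours, not an example from this thesis): for f(ξ1, ξ2) = max(ξ1, ξ2) with Bernoulli(1/2) marginals, the joints on {0, 1}² with these marginals form a one-parameter family, and the product distribution is the member with t = 1/4.

```python
# Toy correlation-gap computation (illustrative sketch, not from the thesis).
# Joints on {0,1}^2 with both marginals equal to 1/2 form the family
#   P(1,1) = t, P(1,0) = P(0,1) = 1/2 - t, P(0,0) = t,  t in [0, 1/2].

def f(x1, x2):
    return max(x1, x2)

def expected_f(t):
    return t * f(1, 1) + (0.5 - t) * (f(1, 0) + f(0, 1)) + t * f(0, 0)

e_indep = expected_f(0.25)                                # product distribution: t = 1/4
e_worst = max(expected_f(k / 4000) for k in range(2001))  # grid over t in [0, 1/2]
kappa = e_worst / e_indep                                 # correlation gap ≈ 4/3
```

Here e_indep = 3/4, while the worst case (t = 0, perfectly anti-correlated variables) gives e_worst = 1, so κ = 4/3; since max is monotone and submodular, this is consistent with the e/(e − 1) ≈ 1.58 bound of Theorem 1 in Chapter 2.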
1.2 Price of Correlations (POC) in Stochastic Optimization
We define the Price of Correlations (POC) to quantify the robustness of the independence assumption in optimization under uncertainty. Decision making under uncertainty is usually investigated in the context of Stochastic Programming (SP) (e.g., see Ruszczynski and Shapiro (2003) and references therein). In SP, the decision maker optimizes the expected value of an objective function that involves random parameters. In general, a stochastic program is expressed as

    (SP)  minimize_{x∈C} E[h(x, ξ)],        (1.2)
where x is the decision variable constrained to lie in set C, and the random variable
ξ ∈ Ω cannot be observed before the decision x is made. The cost function h(x, ξ)
depends on both the decision x ∈ C and the random vector ξ ∈ Ω. If the underlying distribution of the random variables is unknown, then the decision maker needs to estimate it either via a parametric approach, which assumes the distribution has a certain closed form and fits its parameters to empirical data, or via a non-parametric approach, e.g., the Sample Average Approximation (SAA) method (e.g., Ahmed and Shapiro (2002), Ruszczynski and Shapiro (2003), Swamy and Shmoys (2005), Charikar et al. (2005)), which optimizes the average objective value over a set of samples. However, these models are suitable only when one has access to a significant amount of reliable, time-invariant statistical information. If the available samples are insufficient to fit the parameters of the distribution or to accurately estimate the expected value of the cost function, then SP fails to address the problem.
One alternate approach is to instead optimize the worst-case outcome, which is
usually easier to characterize than estimating the joint distribution. That is,
    (RO)  minimize_{x∈C} maximize_{ξ∈Ω} h(x, ξ).        (1.3)
Such a method is termed Robust Optimization (RO) following the recent literature
(e.g., Ben-Tal and Nemirovski (1998, 2000), Ben-Tal (2001)). However, such a robust
solution is often too pessimistic compared to SP (e.g., see Ben-Tal and Nemirovski
(2000), Bertsimas and Sim (2004), Chen et al. (2007)) because the worst-case scenario
can be very unlikely. In particular, this model does not utilize any available or easy-to-estimate information about the distribution, such as the marginal distributions of the random variables.
An intermediate approach that may address the limitations of SP and RO is
distributionally-robust stochastic programming (DRSP). In this approach, one min-
imizes the expected cost over the worst joint distribution among all probability dis-
tributions consistent with the available information. That is,
    (DRSP)  minimize_{x∈C} maximize_{p∈P} Ep[h(x, ξ)],        (1.4)
where P is the collection of possible probability distributions on Ω consistent with
the marginal distribution information (refer to Equation (1.1)), and for any x ∈ C,
Ep[h(x, ξ)] denotes the expected value of h(x, ξ) over a distribution p on ξ. The
DRSP model can be interpreted as a two-person game. The decision maker chooses a decision x hoping to minimize the expected cost, while nature adversarially chooses a distribution p from the collection P to maximize the expected cost of the
decision.
Our model for characterizing the price of correlations is based on this distributionally robust model of optimization. Given a problem instance (h, Ω, {pi}), let xI be the optimal solution of the stochastic program (1.2) assuming the independent (product) distribution, and xR be the optimal decision for the DRSP problem (1.4). Then, the Price of Correlations (POC) compares the performance of xI to that of xR.
Definition 3. Given a problem instance (h, Ω, {pi}), where h : C × Ω → R+, for any x ∈ C define

    g(x) = sup_{p∈P} Ep[h(x, ξ)].        (1.5)

Let xI = arg min_{x∈C} Ep̄[h(x, ξ)] and xR = arg min_{x∈C} g(x). Then the Price of Correlations (POC) is defined as:

    POC = g(xI) / g(xR).        (1.6)

We redefine POC = 1 if g(xI) = g(xR) = 0, or if g(xR) = ∞.
Here R+ denotes the set of non-negative real numbers. Note that POC ≥ 1,
and POC = 1 corresponds to the case where a stochastic program with product
distribution yields the same result as the DRSP or minimax approach. A small
upper bound on POC would indicate that the optimal solution obtained assuming
independence is almost as robust as the most robust solution.
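To illustrate Definition 3, consider a hypothetical mini-newsvendor instance (an assumed example of ours, not from the thesis): two Bernoulli(1/2) demands with total s, units bought upfront at cost 1 each, and shortfall covered at cost 3 per unit. The feasible joints again form a one-parameter family in t, with t = 1/4 giving the product distribution.

```python
# POC on a toy instance (assumed example, not from the thesis).
def h(x, s):                          # x units upfront, total demand s, penalty 3
    return x + 3 * max(0, s - x)

# Joints with both marginals 1/2: P(s=2)=t, P(s=1)=1-2t, P(s=0)=t, t in [0,1/2].
def exp_h(x, t):
    return t * h(x, 2) + (1 - 2 * t) * h(x, 1) + t * h(x, 0)

ts = [k / 2000 for k in range(1001)]                        # grid over t
g = {x: max(exp_h(x, t) for t in ts) for x in (0, 1, 2)}    # worst-case cost g(x)
e_ind = {x: exp_h(x, 0.25) for x in (0, 1, 2)}              # product distribution

x_I = min(e_ind, key=e_ind.get)       # optimum assuming independence
x_R = min(g, key=g.get)               # distributionally robust optimum
poc = g[x_I] / g[x_R]
```

Independence favors x_I = 1 (expected cost 1.75), but under the comonotone worst case (t = 1/2) that decision costs 2.5, while the robust choice x_R = 2 always costs 2; hence POC = 1.25. Correlations change the optimal decision here, yet the independent solution stays within 25% of the robust one.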
In many real data collection scenarios, practical constraints can make it very difficult (or costly) to learn complete information about the correlations in the data. In the absence of sufficient data, a widely adopted strategy in practice is to use the independent distribution as a simple substitute for the joint distribution. DRSP provides an alternate, optimization-based approach to this problem, promoting the worst-case distribution under the given marginals as a substitute for the joint distribution. However, a practitioner may suspect that the worst-case distribution is too pessimistic for the problem at hand.
We believe that POC provides a conceptual understanding of the value of correla-
tions in a decision problem involving uncertainties. It quantifies the gap between the
two approaches of assuming no correlations and assuming worst case correlations,
providing a guiding principle for the decision maker. A small POC indicates that the
solution obtained assuming independence is reasonably robust against correlations.
On the other hand, if POC is very high, and from experience a practitioner expects the involved random variables not to be very correlated, she may decide that for this problem the DRSP approach is indeed very pessimistic, and that it is essential to invest in learning the joint distribution. In this case, even poor estimates of the correlations may be better than ignoring the correlations or assuming the worst case.
1.3 Bounding POC via Correlation Gap
It is easy to show that a uniform bound on the correlation gap for all x bounds POC. Let κ(x) denote the correlation gap of the function h(x, ξ) at x, i.e.,

    κ(x) = sup_{p∈P} Ep[h(x, ξ)] / Ep̄[h(x, ξ)],

and suppose that for all feasible x,

    κ(x) ≤ κ.

Then,

    g(xI) = sup_{p∈P} Ep[h(xI, ξ)],
    g(xR) = sup_{p∈P} Ep[h(xR, ξ)] ≥ Ep̄[h(xR, ξ)] ≥ Ep̄[h(xI, ξ)],

and hence,

    POC = g(xI) / g(xR) ≤ sup_{p∈P} Ep[h(xI, ξ)] / Ep̄[h(xI, ξ)] = κ(xI) ≤ κ.
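The chain of inequalities above can be verified numerically on a small assumed instance (our own toy example: total demand s of two Bernoulli(1/2) variables, cost h(x, s) = x + 3·max(0, s − x), joints P(s=2)=t, P(s=1)=1−2t, P(s=0)=t with t = 1/4 the product distribution); the sketch checks POC ≤ κ(xI) ≤ κ.

```python
# Numeric check of POC <= kappa(x_I) <= kappa (assumed toy instance).
def h(x, s):
    return x + 3 * max(0, s - x)

def exp_h(x, t):                      # expectation under the joint indexed by t
    return t * h(x, 2) + (1 - 2 * t) * h(x, 1) + t * h(x, 0)

ts = [k / 2000 for k in range(1001)]
X = (0, 1, 2)
g = {x: max(exp_h(x, t) for t in ts) for x in X}   # g(x) = worst-case cost
ind = {x: exp_h(x, 0.25) for x in X}               # product-distribution cost
kappa_x = {x: g[x] / ind[x] for x in X}            # correlation gap at each x

x_I = min(ind, key=ind.get)
x_R = min(g, key=g.get)
poc = g[x_I] / g[x_R]
kappa = max(kappa_x.values())                      # uniform bound over x
assert poc <= kappa_x[x_I] + 1e-9 and kappa_x[x_I] <= kappa + 1e-9
```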
The rest of the thesis is organized as follows. In Chapter 2, we give formal state-
ments of our main results on upper bounding correlation gap (and POC ), and provide
rigorous proofs. In Chapter 3, we discuss applications of our upper bounds for stochas-
tic and deterministic optimization problems. In Chapter 4, we provide lower bounds
to demonstrate tightness of our upper bounds.
Chapter 2
Upper bounds
2.1 Results
In this section, we present upper bounds on the correlation gap based on properties of the function f. Upper bounds on POC will be derived as corollaries for the corresponding stochastic optimization problems.
We assume that there is a complete ordering, denoted by ≤, on each Ωi, i = 1, . . . , n. We also use ≤ to denote the induced product order on Ω. That is, for any ξ, θ ∈ Ω, ξ ≤ θ iff ξi ≤ θi for all i. Also, for any ξ, θ ∈ Ω, let sup(ξ, θ) and inf(ξ, θ) denote the supremum and infimum, respectively, of the two vectors taken with respect to the partial order ≤ on Ω. That is, (sup(ξ, θ))i = max{ξi, θi} and (inf(ξ, θ))i = min{ξi, θi}, for all i.
2.1.1 Submodularity
Our first result is that the Correlation Gap (and POC) has a small upper bound for monotone submodular functions. Submodular functions are defined as follows:
Definition 4. A function f : Ω → R is submodular iff

    f(sup(ξ, θ)) + f(inf(ξ, θ)) ≤ f(ξ) + f(θ),   ∀ ξ, θ ∈ Ω.        (2.1)
Property (2.1) also appears as the Monge property for matrices in the literature. An n-dimensional matrix M with k columns in each dimension is a Monge matrix iff the function f : {1, . . . , k}^n → R, defined to take values in the corresponding cells of the matrix M, is submodular. For functions of binary variables f : 2^V → R, where V = {1, . . . , n}, the submodularity condition is equivalent to

    f(S ∪ {i}) − f(S) ≥ f(T ∪ {i}) − f(T),   ∀ S ⊆ T ⊆ V, i ∉ T.        (2.2)

A twice continuously differentiable function f : R^n → R is submodular iff

    ∇²ij f(ξ) ≤ 0,   ∀ i ≠ j,        (2.3)

where ∇²f denotes the Hessian matrix of f.
Some popular examples of submodular functions of binary variables are the rank functions of matroids, and information measures like entropy, symmetric mutual information, and the information gain measure in machine learning. Examples of continuous functions that satisfy this property are f(ξ) = maxi ξi, and the Lq norm f(ξ) = ||ξ||q, when q ≥ 1 and ξ ∈ R^n_+.
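For binary variables, condition (2.2) can be checked exhaustively on small ground sets. A minimal sketch (our own illustration; the helper name is ours, not from the thesis):

```python
# Brute-force check of submodularity condition (2.2) for set functions (sketch).
from itertools import combinations

def is_submodular(f, n):
    """Check f(S ∪ {i}) − f(S) >= f(T ∪ {i}) − f(T) for all S ⊆ T ⊆ V, i ∉ T."""
    subsets = [frozenset(c) for k in range(n + 1)
               for c in combinations(range(n), k)]
    for S in subsets:
        for T in subsets:
            if S <= T:
                for i in range(n):
                    if i not in T and f(S | {i}) - f(S) < f(T | {i}) - f(T) - 1e-12:
                        return False
    return True

w = [1.0, 2.0, 3.0]
f_max = lambda S: max((w[i] for i in S), default=0.0)   # submodular (weighted max)
f_sq = lambda S: float(len(S) ** 2)                      # supermodular, not submodular
```

The weighted max passes the check (its marginal contributions shrink as the set grows), while squared cardinality fails it.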
Intuitively, submodularity of a utility function corresponds to decreasing marginal
utilities – the marginal utility of an item decreases as the set of other items increases.
The notion of submodularity is very similar to the property of gross-substitutes, which
is however a stronger property in the sense that a utility function that satisfies gross
substitutes property is always submodular (see Gul and Stacchetti (1999), Kelso Jr
and Crawford (1982)).
We will prove the following theorem:
Theorem 1. Correlation gap of a non-negative, monotone, and submodular function
f is at most e/(e − 1).
Corollary 1. If the cost function h(x, ξ) is non-negative, monotone, and submodular
in ξ for all feasible x, then for any instance (h, Ω, pi), POC ≤ e/(e − 1).
For the special case of Ω = {0, 1}^n, the above result for submodular functions can also be derived from a result in Calinescu et al. (2007). However, our more general result makes no assumption on the Ωi's, and holds for general domains, e.g., the continuous case Ω = R^n and the discrete case Ω = N^n. To our understanding, the technique in Calinescu et al. (2007) cannot be easily applied to prove these general cases.
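A quick sketch of how close the e/(e − 1) factor can get (an illustration of ours, with assumed numerics): f(S) = min(|S|, 1) is monotone and submodular; with n items each present independently with probability 1/n, the product distribution gives E[f] = 1 − (1 − 1/n)^n, while the joint placing mass 1/n on each singleton {i} has the same marginals and E[f] = 1.

```python
# Gap of f(S) = min(|S|, 1) with marginals 1/n each (assumed example).
import math

def gap(n):
    e_indep = 1.0 - (1.0 - 1.0 / n) ** n   # product distribution: P(S nonempty)
    e_worst = 1.0                           # joint: exactly one item, uniformly
    return e_worst / e_indep

bound = math.e / (math.e - 1)               # ≈ 1.5820
```

Here gap(2) = 4/3 and gap(n) increases toward e/(e − 1) as n grows, while always staying below it, so Theorem 1 cannot be improved for this class.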
2.1.2 Cross-monotone cost-sharing property
Unfortunately, the condition of submodularity can be quite restrictive in practice.
Many popular applications such as stochastic facility location, stochastic Steiner tree
network design, stochastic scheduling, involve cost functions that are subadditive but
not submodular in the random variables. And, we demonstrate in the Section 4 that
there exist examples in the class of monotone subadditive (or fractionally subaddi-
tive) functions such that correlation gap (and POC ) can be arbitrarily large for large
n. Therefore, it is apparent that in order to obtain interesting upper bounds on cor-
relation gap we need a different characterization of functions; a property that relaxes
submodularity but is more restrictive than subadditivity or fractional subadditivity.
We will derive our characterization using the concept of cross-monotonic cost-sharing schemes from game theory. Before we describe this property, we further motivate the need for this characterization by demonstrating that other natural relaxations of submodularity are bound to fail for this problem. For simplicity, consider functions of binary variables, f : 2^V → R, V = {1, . . . , n}. Submodularity is equivalently defined for such functions by either of the following two inequalities:

    f(A ∪ B) + f(A ∩ B) ≤ f(A) + f(B),   ∀ A, B ⊆ V        (2.4)

    f(T ∪ {i}) − f(T) ≤ f(S ∪ {i}) − f(S),   ∀ S ⊆ T ⊆ V, i ∉ T        (2.5)
First, consider defining a notion of “approximately submodular” functions as func-
tions that satisfy the inequality in Equation (2.4) within a factor of β. We would hope
to obtain a class of functions with small correlation gap as the class of functions that
have small constant value of β. However, note that for any monotone subadditive
function,

    f(A ∪ B) + f(A ∩ B) ≤ f(A) + f(B) + f(A)/2 + f(B)/2 = (3/2)(f(A) + f(B)),

where the first inequality uses subadditivity (f(A ∪ B) ≤ f(A) + f(B)) and monotonicity (f(A ∩ B) ≤ min{f(A), f(B)}).
Therefore, β ≤ 3/2 for all monotone subadditive functions. If the correlation gap could be bounded by O(β) for this class of functions, then all monotone subadditive functions would have a constant upper bound on correlation gap, which is a contradiction. Thus, this relaxation is too loose.
Alternatively, consider relaxing the inequality in Equation (2.5) by a factor of
β. We show that this relaxation is not loose enough to include many interesting
non-submodular functions. In particular, it is easy to construct instances of facility location and network design problems where β can be arbitrarily large. As an example, consider a facility location instance with two facilities F1, F2. Assume that both facilities are extremely expensive, so that an optimal solution will always contain only one of the two facilities. Let F1, F2 be at distance L from each other. Let c1, c2 be clients located very close (distance ε) to facility F1, and c3, c4 be clients located very close (distance ε) to facility F2. Let S = {c1}, T = {c3, c4}. Then f(S ∪ {c2}) − f(S) = ε, while f(T ∪ {c2}) − f(T) ≈ L, so that β ≈ L/ε, which can be made arbitrarily large by increasing L or decreasing ε.
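The instance above can be made concrete with numbers (the specific parameter values and the single-open-facility simplification are our assumptions, not from the thesis):

```python
# Numeric version of the two-facility example (assumed parameter values).
M, L, eps = 1e6, 100.0, 0.01   # opening cost, inter-facility distance, client distance

# d[c] = (distance to F1, distance to F2); c1, c2 near F1; c3, c4 near F2.
d = {"c1": (eps, L + eps), "c2": (eps, L + eps),
     "c3": (L + eps, eps), "c4": (L + eps, eps)}

def f(S):
    """Serve clients S by opening exactly one facility (M is too large to open both)."""
    if not S:
        return 0.0
    return M + min(sum(d[c][j] for c in S) for j in (0, 1))

S, T = {"c1"}, {"c3", "c4"}
m_S = f(S | {"c2"}) - f(S)     # marginal of c2 w.r.t. the small set: eps
m_T = f(T | {"c2"}) - f(T)     # marginal of c2 w.r.t. the larger set: about L
beta = m_T / m_S               # grows like L/eps, unbounded
```

With these values beta is roughly 10^4, and raising L (or shrinking eps) inflates it without limit, confirming that relaxing (2.5) by a constant factor excludes this function.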
The above discussion demonstrates that directly relaxing the conditions for submodularity is not likely to yield a useful notion of "approximate submodularity" for our
purpose. We propose using the concept of cross-monotone cost-sharing to extend the
correlation gap bounds to a large and interesting class of non-submodular functions.
A cost-sharing scheme refers to a scheme for dividing the cost f(ξ) among the n
components.
Definition 5. Given total cost f(ξ) of servicing ξ ∈ Ω, a cost allocation is a function
Ψ : [n] → R+, which for every i = 1, . . . , n specifies the share Ψ(i) of i in the total cost
f(ξ). A cost-sharing scheme χ : [n] × Ω → R+ is a collection of cost allocations
for all ξ ∈ Ω.
We will often use a more compact notation χi(ξ) to denote the cost-share χ(i, ξ)
of i in f(ξ).
Ideally, we want the cost-sharing scheme (and the corresponding cost allocations) to be budget balanced, that is, ∑_{i=1}^n χi(ξ) = f(ξ) for all ξ. However, it is not always possible to achieve budget balance in combination with other properties. So, a relaxed notion of a β-budget-balanced cost-sharing scheme is often considered in the literature.
Definition 6. A cost-sharing scheme χ is β-budget-balanced if

    f(ξ)/β ≤ ∑_{i=1}^n χi(ξ) ≤ f(ξ),   ∀ ξ ∈ Ω.
Since we are only interested in expected values, we will use a further relaxation of this concept, where we require the upper bound in the budget-balance condition to hold only in expectation. We define this property as follows.
Definition 7. A cost-sharing scheme χ is β-budget-balanced in expectation with respect to a distribution p over Ω if

    f(ξ)/β ≤ ∑_{i=1}^n χi(ξ),  ∀ ξ ∈ Ω,   and   Ep[∑_{i=1}^n χi(ξ)] ≤ Ep[f(ξ)].
For our characterization of functions with small correlation gap, we are interested in cost-sharing schemes with the additional property of cross-monotonicity. Cross-monotonicity (or population monotonicity) was studied by Moulin (1999) and Moulin
and Shenker (2001) in order to design group-strategyproof mechanisms, and has re-
cently received considerable attention in the computer science literature (see, for
example, Pal and Tardos (2003), Mahdian and Pal (2003), Konemann et al. (2005),
Immorlica et al. (2008), Nisan et al. (2007) and references therein). This property
captures the notion that an agent should not be penalized as the demands of other
agents grow.
Definition 8. A cost-sharing scheme χ is cross-monotonic if for all i, ξi ∈ Ωi, and ξ−i, θ−i ∈ ∏_{j≠i} Ωj,

    χi(ξi, ξ−i) ≥ χi(ξi, θ−i),   if ξ−i ≤ θ−i.
We summarize the required cost-sharing properties of a function as the β-cost-sharing property.
Definition 9. A function f : ∏_i Ωi → R satisfies the β-cost-sharing property if, given any product distribution p̄ on ∏_i Ωi, there exists a cost-sharing scheme for f that is (a) cross-monotonic and (b) β-budget-balanced in expectation with respect to the distribution p̄.
Theorem 2. If a function f is non-negative, monotone, and satisfies the β-cost-sharing property with β < ∞, then the correlation gap of f is at most 2β.
Remark 1. Note that any function f that has a β-budget-balanced cross-monotonic
cost-sharing scheme satisfies β-cost-sharing property. The property of β-budget-balance
“in expectation” is a relaxation, relevance of which will become clear when we discuss
tightness of our upper bound condition in Section 4.2.
Remark 2. A monotone submodular function f has the 1-cost-sharing property. An example of a cross-monotonic 1-budget-balanced scheme for such a function is the incremental cost-sharing scheme, defined as

    χi(ξ) = f(ξ1, . . . , ξi, 0, . . . , 0) − f(ξ1, . . . , ξi−1, 0, . . . , 0),

where 0 denotes the smallest element of the domain Ωj, ∀j. However, the above theorem only bounds the correlation gap of such a function by 2, and thus does not achieve the bound of e/(e − 1) presented in the previous subsection. In the next subsection, we will provide a stricter (but more complicated and harder to achieve) property that achieves the e/(e − 1) bound for submodular functions.
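The incremental scheme of Remark 2 can be sketched for set functions (our own naming and example function; the telescoping sum gives exact budget balance, and submodularity gives cross-monotonicity):

```python
# Incremental cost-sharing for a set function over V = {0, ..., n-1} (sketch):
# chi_i(S) = f(S ∩ {0..i}) − f(S ∩ {0..i-1}); zero for i not in S.
from itertools import combinations

def incremental_shares(f, S, n):
    return {i: f(S & set(range(i + 1))) - f(S & set(range(i))) for i in range(n)}

w = [3.0, 1.0, 2.0]
f = lambda S: max((w[i] for i in S), default=0.0)   # monotone submodular (weighted max)

n = 3
for k in range(n + 1):
    for c in combinations(range(n), k):
        S = set(c)
        chi = incremental_shares(f, S, n)
        assert abs(sum(chi.values()) - f(S)) < 1e-12        # 1-budget-balanced
        for j in range(n):                                  # cross-monotonicity:
            if j not in S:                                  # shares shrink as S grows
                grown = incremental_shares(f, S | {j}, n)
                assert all(grown[i] <= chi[i] + 1e-12 for i in S)
```

The exhaustive check passes for this submodular f: shares always sum exactly to f(S), and no participant's share increases when another joins.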
Corollary 2. If for every feasible x, the cost function h(x, ξ) is non-negative and
monotone in ξ, and satisfies β-cost-sharing property for some β < ∞, then for any
instance (h, Ω, pi), POC ≤ 2β.
Note that the above corollary requires that a β-cost-sharing scheme exist for every feasible x, where β should not depend on x.
The above results connecting cross-monotone cost-sharing schemes to the correlation gap are particularly interesting since they allow us to use the existing non-trivial work on designing β-budget-balanced cross-monotone cost-sharing schemes for non-submodular cost functions, such as those resulting from facility location, Steiner forest network design, single-source rent-or-buy, minimum makespan scheduling, etc. (see Pal and Tardos (2003), Konemann et al. (2005), Gupta et al. (2007), Leonardi and Schaefer (2004), Bleischwitz and Monien (2009)). The following are some results that we obtain as direct corollaries, using this existing literature on cross-monotone cost-sharing schemes.
Corollary 3. Let f(S) denote the makespan of the optimal assignment of jobs in set S ⊆ V to m machines. Then the correlation gap of f is at most 4m/(m + 1) if either all the jobs have identical workloads or all the machines are identical. The correlation gap is at most 4d if there are d different workloads.
Corollary 4. Let f(S) denote the minimum cost of (metric) uncapacitated facility location for serving cities in set S ⊆ V . Then the correlation gap of f is at most 6, and POC ≤ 6 for the corresponding two-stage stochastic (metric) uncapacitated facility location problem.
Corollary 5. Let f(S) denote the minimum cost of a Steiner forest network connecting terminals in set S ⊆ V . Then the correlation gap of f is at most 4, and POC ≤ 4 for the corresponding two-stage stochastic Steiner forest problem.
We will discuss the formulations of two-stage stochastic problems further in Chap-
ter 3. In the following subsections, we provide rigorous proofs for the upper bounds
given by Theorem 1 and Theorem 2.
2.1.3 Cross-monotone cost-sharing with a prefix property
In this section, we consider functions of binary variables that have a cross-monotone cost-sharing scheme with an additional prefix property. This property will allow us to obtain a strict extension of the results based on submodularity. Since we are considering a binary random vector ξ, we will equivalently represent it by the corresponding random subset S of the set V = {1, . . . , n}. That is, Ω = 2^V , the set of all subsets of V . And, for each i, pi is the probability that i appears in the random set S.
To define the prefix property of cost-sharing, we need to make explicit the notion of an
order-specific cost-sharing scheme.
Definition 10. For any set S ⊆ V, we denote by χ(i, S, σS) the order-specific cost-share
of i ∈ S, given an ordering σS on the elements of S. An example of an order-specific
cost-sharing scheme is the incremental cost-share

χ(iℓ, S, σS) = f(Sℓ) − f(Sℓ−1), (2.6)

where iℓ denotes the ℓth element, and Sℓ denotes the set of elements of rank 1 to ℓ
in S, according to the ordering σS.
Definition 11. An order-specific cost-sharing scheme has the prefix property iff for all
S, σS,

χ(iℓ, S, σS) = χ(iℓ, Sℓ, σSℓ),

where σSℓ is the restriction of the ordering σS to the elements of Sℓ. For example, the
incremental cost-sharing scheme in Equation (2.6) has the prefix property.
Definition 12. We say that a function f : 2^V → R has the β-prefix-cost-sharing property
if for every product distribution p, f has an order-specific cost-sharing scheme
χ(i, S, σS) with
1. Expected β-budget balance: For all S and orderings σS on S:

f(S)/β ≤ ∑_{i∈S} χ(i, S, σS), and Ep[∑_{i∈S} χ(i, S, σS)] ≤ Ep[f(S)].
2. Cross-monotonicity: For all i ∈ S, S ⊆ T , σS ⊆ σT :
χ(i, S, σS) ≥ χ(i, T, σT )
Here, σS ⊆ σT means that the ordering of the elements of S is the same in σS and σT,
i.e., σS is the restriction of the ordering σT to the subset S.
3. Prefix property.
For submodular functions, the incremental cost-sharing scheme χ(iℓ, S, σS) =
f(Sℓ) − f(Sℓ−1) discussed earlier is a β-prefix-cost-sharing scheme with β = 1. An-
other example is summable cost-sharing schemes (see Roughgarden and Sundararajan
(2006)). We will bound correlation gap for functions of binary variables that have
β-prefix-cost-sharing property.
Theorem 3. If a function f : 2^V → R is non-negative, monotone, and satisfies the
β-prefix-cost-sharing property with β < ∞, then its correlation gap is at most βe/(e − 1).
Note that the result in Theorem 1 for binary random variables can be derived as
a corollary of Theorem 3, since β = 1 for monotone submodular functions.
2.2 Proofs
2.2.1 Proof for binary random variables
In this subsection, we equivalently represent the binary random vector ξ by a
random subset S of the set V = {1, . . . , n}. That is, Ω = 2^V, the set of all subsets of
V, and, for each i, the marginal probability pi denotes the probability that i appears
in the random set S.
For submodular functions of binary variables, a bound of e/(e − 1) on the correlation
gap can be derived from results (Lemma 4 and Lemma 5) in Calinescu et al. (2007).

Theorem (Lemma 4 and Lemma 5 in Calinescu et al. (2007)): For any
instance (f, 2^V, {pi}), if f is non-negative, non-decreasing and submodular, the
correlation gap is bounded above by e/(e − 1).
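To see that this bound is essentially tight, consider f(S) = min(|S|, 1) with marginals pi = 1/n (a standard illustrative choice of ours, not an example computed in the text): the independent distribution gives expected value 1 − (1 − 1/n)^n, while the partition into singletons, a feasible correlated distribution, gives 1. A quick numerical sketch:

```python
import math

# Illustrative check (our assumption, not from the dissertation): the e/(e-1)
# bound is asymptotically tight for f(S) = min(|S|, 1) with marginals p_i = 1/n.
def gap_lower_bound(n):
    # Independent distribution: E[f(S)] = P(S nonempty) = 1 - (1 - 1/n)^n.
    independent = 1.0 - (1.0 - 1.0 / n) ** n
    # Partition-type distribution: S = {i} with probability 1/n each (the
    # marginals match), so E[f(S)] = 1. Being feasible, it lower-bounds the
    # worst case expectation, so the ratio lower-bounds the correlation gap.
    return 1.0 / independent

# The ratio stays below e/(e-1) and approaches it as n grows.
assert gap_lower_bound(1000) < math.e / (math.e - 1)
```

Since (1 − 1/n)^n → e^{−1}, the ratio tends to e/(e − 1) from below, matching the bound above.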
At the end of this section, we prove the bound of βe/(e − 1) for functions of binary
variables with β-prefix-cost-sharing. The corresponding result for submodular
functions follows as a corollary by substituting β = 1.
First, we present the proof of the correlation gap bound of 2β for functions of binary
random variables that admit a β-cost-sharing scheme.
Lemma 1. For any instance (f, 2^V, {pi}), if f is non-negative, non-decreasing, and
satisfies the β-cost-sharing property, the correlation gap is bounded above by 2β.
Proof. First, we consider a simplified problem. We assume that (a) all pi are equal
to 1/K for some finite integer K > 0, and (b) the worst case distribution is a "K-partition-type"
distribution; that is, it is supported on K disjoint sets A1, . . . , AK that
form a partition of V, each occurring with probability 1/K. Let us call such instances
(f, 2^V, 1/K) "nice" instances. Here, we show that the correlation gap is bounded by
2β for all "nice" instances. In Lemma 2, we show that it suffices to consider only the
"nice" instances, which completes the proof.
For any set S ⊆ V, denote S∩k = S ∩ Ak and S−k = S\Ak, for k = 1, . . . , K. Let
χ be the β-cost-sharing scheme for the function f, as per the assumptions of the lemma.
Also, for any subset T of S, denote χ(T, S) := ∑_{i∈T} χi(S). Then, by the budget
balance property of χ, the expected value under the independent distribution satisfies

ES[f(S)] ≥ ES[∑_{k=1}^{K} χ(S∩k, S)]. (2.7)
Note that under the independent distribution, the marginal probability that an
element i ∈ Ak appears in the random set S∩k is 1/K. Using this observation along with
the cross-monotonicity of the cost-sharing scheme χ and the properties of the independent
distribution, we can derive that for any k,
ES[χ(S∩k, S)] ≥ ES[χ(S∩k, S ∪ Ak)]
= ES[∑_{i∈Ak} I(i ∈ S∩k) χ(i, S−k ∪ Ak)]
= ES−k[∑_{i∈Ak} ES∩k[I(i ∈ S∩k) χ(i, S−k ∪ Ak) | S−k]]
= (1/K) E[∑_{i∈Ak} χ(i, S ∪ Ak)]
= (1/K) E[χ(Ak, S ∪ Ak)]. (2.8)
Here, I(·) denotes the indicator function. Apply the above inequality to a γ = 1/(2 − 1/K)
fraction of each term χ(S∩k, S) in (2.7) to obtain
ES[f(S)] ≥ ES[∑_{k=1}^{K} ((1 − γ) χ(S∩k, S) + γ (1/K) χ(Ak, S ∪ Ak))]
= ES[∑_{k=1}^{K} (((1 − γ)/(K − 1)) ∑_{j≠k} χ(S∩j, S) + γ (1/K) χ(Ak, S ∪ Ak))]
= (1/(2K − 1)) ES[∑_{k=1}^{K} (∑_{j≠k} χ(S∩j, S) + χ(Ak, S ∪ Ak))]
(using cross-monotonicity of χ) ≥ (1/(2K − 1)) ES[∑_{k=1}^{K} (∑_{j≠k} χ(S∩j, S ∪ Ak) + χ(Ak, S ∪ Ak))]
(using β-budget balance) ≥ (1/((2K − 1)β)) ES[∑_{k=1}^{K} f(S ∪ Ak)]
(using monotonicity of f) ≥ (1/((2 − 1/K)β)) ((1/K) ∑_{k=1}^{K} f(Ak)).
Under the assumption of a "nice" instance, the expected value under the worst case
distribution is (1/K) ∑_{k=1}^{K} f(Ak). Therefore, the correlation gap is bounded by 2β
for nice instances. Lemma 2 shows that it is sufficient to consider only the nice instances,
and completes the proof.
Lemma 2. For every instance (f, 2^V, {pi}) such that f is non-decreasing and satisfies
the β-cost-sharing property, there exists a nice instance (f′, 2^{V′}, 1/K) for some integer
K > 0 such that f′ is non-decreasing and satisfies the β-cost-sharing property, and the
correlation gap of the instance (f′, 2^{V′}, 1/K) is at least as large as that of (f, 2^V, {pi}).
We make the technical assumption that all pi are rational and non-zero.
Proof. We use the following split operation.
Split: Given a problem instance (f, 2^V, {pi}) and integers mi ≥ 1, i ∈ V, the
split operation defines a new instance (f′, 2^{V′}, {p′r}) as follows: split each item i ∈ V
into mi copies C^i_1, C^i_2, . . . , C^i_{mi}, and assign a marginal probability
p′_{C^i_j} = pi/mi to each copy. Let V′ denote the new ground set of size ∑_i mi that
contains all the duplicates. Define the new cost function f′ : 2^{V′} → R as:

f′(S′) = f(Π(S′)), for all S′ ⊆ V′, (2.9)

where Π(S′) ⊆ V is the set of original elements with at least one duplicate appearing in S′, i.e.,
Π(S′) = {i ∈ V : C^i_j ∈ S′ for some j ∈ {1, 2, . . . , mi}}.
We claim that splitting an instance
• does not change the expected value under the worst case distribution;
• can only decrease the expected value under the independent distribution;
• preserves the monotonicity and the β-cost-sharing property of the function.
The proofs of these claims appear in Appendix B.1.
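The second claim can be checked numerically on a toy instance. The sketch below (the function and marginals are our own hypothetical choices, not from the text) splits one item into two copies and verifies that the independent-distribution expectation does not increase:

```python
from itertools import product as iproduct

# Sketch of the split operation of Lemma 2 on a tiny hypothetical instance;
# f below is an arbitrary monotone submodular function chosen for illustration.
V = [1, 2]
p = {1: 0.5, 2: 0.5}
f = lambda S: min(len(set(S)), 1) + 0.5 * len(set(S))

def independent_expectation(items, marginals, cost):
    """E[cost(S)] when each item appears independently with its marginal."""
    total = 0.0
    for bits in iproduct([0, 1], repeat=len(items)):
        S = [i for i, b in zip(items, bits) if b]
        prob = 1.0
        for i, b in zip(items, bits):
            prob *= marginals[i] if b else 1 - marginals[i]
        total += prob * cost(S)
    return total

# Split item 1 into two copies with marginal probability 0.25 each.
V2 = [(1, "a"), (1, "b"), (2, None)]
p2 = {(1, "a"): 0.25, (1, "b"): 0.25, (2, None): 0.5}
proj = lambda S2: {i for i, _ in S2}   # the map Pi(S') of Equation (2.9)
f2 = lambda S2: f(proj(S2))            # f'(S') = f(Pi(S'))

before = independent_expectation(V, p, f)
after = independent_expectation(V2, p2, f2)
# Splitting can only decrease the independent-distribution expectation.
assert after <= before + 1e-12
```

Intuitively, after the split the original item is "present" more often with smaller marginal contribution, which can only lower the expectation of a submodular cost.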
Thus, the new instance generated by splitting has correlation gap at least as
large as the original instance. The remainder of the proof uses the split operation to
reduce any given instance to a "nice" instance. Let p be the worst case distribution
for the instance (f, 2^V, {pi}). The set of distributions P is defined by a rational linear
program with |V| constraints, so P is compact and p exists. Suppose that p is
not a partition-type distribution. Then split any element i that appears in two
different sets in the support of the worst case distribution. Simultaneously, split the
distribution by assigning probability p′(S′) = p(Π(S′)) to each set S′ that contains
exactly one copy of i, and probability 0 to all other sets. Since each set in the support
of the new distribution contains exactly one copy of every element i, by the definition of
the function f′, the expected value of f′ under p′ is the same as that of f under p. By
the properties of the split operation, the worst case expected values of the two instances
(before and after splitting) must be equal, so the distribution p′ is a worst
case distribution for the new instance. Repeat the splitting until the distribution
becomes a partition-type distribution. Then further split each element (and
simultaneously the distribution) until the marginal probability of each new element
is 1/K for some large enough integer K. Such a finite K always exists
since the pi are rational and non-zero.
Lemma 3. For any instance (f, 2^V, {pi}), if f is non-negative, non-decreasing, and
satisfies the β-prefix-cost-sharing property, the correlation gap is bounded above by βe/(e − 1).
Proof. In this proof, we will first assume “nice instances” as in the proof for functions
with β-cost-sharing.
Let the K-partition corresponding to the worst case distribution be
A1, A2, . . . , AK. Assume w.l.o.g. that f(A1) ≥ f(A2) ≥ · · · ≥ f(AK). Also
assume w.l.o.g. that the elements of Ak come before those of Ak−1, i.e.,
AK = {1, . . . , |AK|}, AK−1 = {|AK| + 1, . . . , |AK| + |AK−1|}, and so on.
Let χ(i, S, σS) be an order-specific β-prefix-cost-sharing scheme for the function f, as
per the assumptions of the lemma. We are only interested in orderings σS obtained
by restricting the fixed ordering 1, . . . , n to the elements of S, so we abbreviate χ(i, S, σS)
to χ(i, S) for simplicity of notation in the remaining proof. For any subset S ⊆ V,
let Sl denote the set of the l smallest elements of S, and let il denote
the lth element of S.
Then, by the prefix property and expected budget balance of χ with respect to the product
distribution ∏_i pi:

ES⊆V [f(S)] ≥ ES⊆V [∑_{l=1}^{|S|} χ(il, Sl)], (2.10)

where the expected value is taken over the product distribution.

Denote φ(V) := ES⊆V [∑_{l=1}^{|S|} χ(il, Sl)]. Let q = 1/K. We will show that

φ(V) ≥ (1 − q) φ(V\A1) + (1/β) q f(A1).
Recursively applying this inequality will prove the result. To prove the inequality, denote
S−1 = S ∩ (V\A1) and S1 = S ∩ A1, for any S ⊆ V. Since the elements of A1 come after
the elements of V\A1, note that for any l ≤ |S−1| we have Sl ⊆ S−1, and for l > |S−1|, il ∈ S1.

φ(V) = ES[∑_{l=1}^{|S−1|} χ(il, Sl)] + ES[∑_{l=|S−1|+1}^{|S|} χ(il, Sl)]. (2.11)

Since Sl ⊆ S ∪ A1, using cross-monotonicity of χ, the second term above can be
bounded as:

ES[∑_{l=|S−1|+1}^{|S|} χ(il, Sl)] ≥ ES[∑_{l=|S−1|+1}^{|S|} χ(il, S ∪ A1)]. (2.12)
Because S−1 and S1 are mutually independent, for any fixed S−1 each i ∈ A1 has
the same conditional probability q = 1/K of appearing in S1. Therefore,

ES[∑_{l=|S−1|+1}^{|S|} χ(il, S ∪ A1)] = ES−1[ES1[∑_{l=|S−1|+1}^{|S|} χ(il, S−1 ∪ A1) | S−1]]
= q ES−1[∑_{i∈A1} χ(i, S−1 ∪ A1)]. (2.13)
Again using independence and cross-monotonicity, the first term on the
right hand side of (2.11) can be analyzed as:

ES[∑_{l=1}^{|S−1|} χ(il, Sl)] = ES−1[∑_{l=1}^{|S−1|} χ(il, Sl)]
≥ (1 − q) ES−1[∑_{l=1}^{|S−1|} χ(il, Sl)] + q ES−1[∑_{l=1}^{|S−1|} χ(il, S−1 ∪ A1)]
= (1 − q) φ(V\A1) + q ES−1[∑_{l=1}^{|S−1|} χ(il, S−1 ∪ A1)]. (2.14)
Based on (2.11), (2.13) and (2.14), and the fact that the cost-sharing scheme χ is
β-budget balanced, we deduce

φ(V) ≥ (1 − q) φ(V\A1) + q ES−1[∑_{l=1}^{|S−1|} χ(il, S−1 ∪ A1) + ∑_{i∈A1} χ(i, S−1 ∪ A1)]
≥ (1 − q) φ(V\A1) + (1/β) q ES−1[f(S−1 ∪ A1)]
≥ (1 − q) φ(V\A1) + (1/β) q f(A1). (2.15)
The last inequality follows from the monotonicity of f. Expanding the above recursive
inequality for A2, . . . , AK, we get

φ(V) ≥ (1/β) q ∑_{k=1}^{K} (1 − q)^{k−1} f(Ak). (2.16)

Since f(Ak) is decreasing in k and q = 1/K, simple arithmetic shows

φ(V) ≥ (1/β) (∑_{k=1}^{K} q f(Ak)) (∑_{k=1}^{K} (1 − q)^{k−1}) / K ≥ (1/β) (1 − 1/e) ∑_{k=1}^{K} q f(Ak).

Since ES[f(S)] ≥ φ(V) by (2.10), and ∑_{k=1}^{K} q f(Ak) is the expected value under the
worst case distribution, this gives

κ ≤ βe/(e − 1).
Next, we show that it is sufficient to consider nice instances. We use the "Split"
operation as in Lemma 2 to reduce any instance to a nice instance. As in Lemma 2,
we use the observation that for any monotone function, monotonicity is preserved
and the correlation gap can only become larger on splitting (see Properties 1, 4, 5 in
Appendix B.1).
It remains to show that the new "nice" instance also has the required cost-sharing
property. Given that the function f in the original instance has the β-prefix-cost-sharing
property, we show that, for any fixed ordering consistent with the ordering AK, . . . , A1
on the items in the new (nice) instance and the product distribution with marginals 1/K,
there exists a cost-sharing method χ′ such that χ′ (a) is expected β-budget balanced,
(b) has the prefix property, and (c) is cross-monotone in the following weaker sense: χ′ is
cross-monotone for any S′ ⊆ T′ such that S′ is a partial prefix of T′, that is, for some
k ∈ {1, . . . , K}, S′ ⊆ AK ∪ · · · ∪ Ak and T′\S′ ⊆ Ak ∪ · · · ∪ A1.
We note that this weaker version of cross-monotonicity for the new instance is
in fact sufficient for the above proof: cross-monotonicity is used only in Equations (2.12)
and (2.14), and in both places the partial-prefix property is satisfied.
The construction of this cost-sharing scheme χ′ is given in Appendix B.1, Property
6.
Remark 3. In the above proofs, we assumed for simplicity that all pi are rational and
non-zero. In Appendix B.2, we show that the results hold even if the pi are not
rational. The assumption that the pi are non-zero is without loss of generality, since an
item in V whose probability of appearing in a random set is 0 can simply be removed
from the problem.
2.2.2 Proof for finite domains
Lemma 4. Consider any instance (f, Ω, {pi}), f : Ω → R+, where Ω = ∏_i Ωi and
|Ωi| < ∞ for all i. Then the correlation gap of the instance (f, Ω, {pi}) is bounded by
• e/(e − 1), if f(ξ) is monotone and submodular.
• 2β, if f(ξ) is monotone and satisfies β-cost-sharing property.
Proof. We prove this lemma by first reducing the problem to one with binary
random variables only. The results of the previous subsection, bounding the
correlation gap for instances with binary random variables, then complete the proof.
W.l.o.g., let Ωi = {0, . . . , Ki}, where Ki < ∞. Given an instance (f, Ω, {pi}), we
create a new instance (f′, Ω′, {p′ij}) with binary variables only, as follows. For every
variable ξi in the original instance, we create Ki new binary variables {ξ′_{ij}}_{j=1}^{Ki} in the
new instance, and set the marginal probability that ξ′_{ij} takes value 1 as p′_{ij} = pi(j).
Also, given f : ∏_i Ωi → R+, we define a new function f′ : ∏_{i=1}^{n} {0, 1}^{Ki} → R as

f′(ξ′) := f(Θ(ξ′)),

where for i = 1, . . . , n, the value of Θ(ξ′)i is given by the largest j for which ξ′_{ij} is
non-zero. That is, for all i,

Θ(ξ′)i = max{j : ξ′_{ij} = 1},

with the convention that the maximum is 0 if none of ξ′_{ij}, j = 1, . . . , Ki, equals 1.
Next, we compare the problem instance (f, ∏_i Ωi, {pi}) to the reduced 0-1 instance
(f′, ∏_{i=1}^{n} {0, 1}^{Ki}, {p′_{ij}}). We show that this reduction preserves
monotonicity, submodularity, β-cost-sharing, and the expected value over the
worst case distribution, while the expected value over the independent distribution can
only decrease. The proofs of these claims use ideas very similar to those in the proof of
Lemma 2; see Appendix C for details.
Thus, we get a new instance (f′, Ω′, {p′r}) where Ω′ = {0, 1}^{n′}, and f′ is monotone
and has a β-cost-sharing scheme (or is submodular). The correlation gap of the new
instance bounds the correlation gap of the original instance. Since the correlation gap of
the new 0-1 instance is bounded as required by the results of the previous subsection,
this completes the proof of the lemma.
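The map Θ used in this reduction is easy to implement; the sketch below (with a hypothetical monotone f of our choosing, not one from the text) checks that f′ = f ∘ Θ inherits monotonicity:

```python
from itertools import product as iproduct

# Sketch of the reduction in Lemma 4: a variable with domain {0, ..., K} is
# encoded by K binary variables via Theta. f below is a hypothetical
# monotone example chosen for illustration.
K = 3

def theta(bits):
    """Theta(xi')_i = max{j : xi'_{ij} = 1}, or 0 if all bits are 0."""
    js = [j for j, b in enumerate(bits, start=1) if b == 1]
    return max(js) if js else 0

f = lambda x: x ** 0.5             # monotone on {0, ..., K}
f_prime = lambda bits: f(theta(bits))

# f' is monotone: flipping any bit from 0 to 1 cannot decrease f'.
for bits in iproduct([0, 1], repeat=K):
    for j in range(K):
        if bits[j] == 0:
            flipped = bits[:j] + (1,) + bits[j + 1:]
            assert f_prime(flipped) >= f_prime(bits)
```

Monotonicity carries over because setting an extra bit can only increase the maximum index, and f is non-decreasing.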
2.2.3 Proof for countably infinite domains
Lemma 5. For any instance (f, Ω, {pi}), f : Ω → R+, where Ω = Ω1 × · · · × Ωn is a
countable (possibly infinite) set, the correlation gap is bounded by
• e/(e − 1), if f(ξ) is monotone and submodular;
• 2β, if f(ξ) is monotone and satisfies the β-cost-sharing property.
In this section, we prove the above lemma using the corresponding result for finite
domains (Lemma 4).
Symbols and Notation W.l.o.g., let Ωi = {0, 1, 2, . . .}, with 0 the smallest
element. Consider truncated sets Ωti, formed by restricting each Ωi to its smallest t + 1
elements, i.e., Ωti = {0, . . . , t}. Denote Ωt = Ωt1 × · · · × Ωtn. Throughout the proof, we
use ξ to denote a random vector from the infinite space Ω, and θ to denote a
random vector from the truncated domain Ωt. We also define a truncation operator
τi : Ωi → Ωti, which replaces every ξi greater than t by 0:

τi(ξi) = 0 if ξi > t, and τi(ξi) = ξi otherwise, for all ξi ∈ Ωi.

Note that this operator always replaces an element by a smaller (or equal)
element. The corresponding generalization to n-dimensional vectors is τ : Ω → Ωt,

τ(ξ) = (τ1(ξ1), . . . , τn(ξn)), ∀ξ ∈ Ω.

Now, we define the truncated marginal probability distribution over each Ωti as:

pti(θi) = ∫_{Ωi} I(τi(ξi) = θi) dpi(ξi), ∀θi ∈ Ωti.

Note that under this definition, the probability mass for ξi > t is shifted to
ξi = 0 in the new distribution pti.
Let p∗ denote the worst case joint distribution with marginals {pi}. That is,

p∗ = arg max_{p∈P} Ep[f(ξ)].

For finite domains, it is easy to show that the set of distributions P is compact, and
hence p∗ exists. In general, one may take p∗ to be a distribution in P such that
Ep∗[f(ξ)] approximates sup_{p∈P} Ep[f(ξ)] arbitrarily well.
Next, we define three joint distributions (p∗)t, (pt)∗, and pt on Ωt, all with marginal
distributions pti.
• (p∗)t is the truncated version of p∗, defined as

(p∗)t(θ) = ∫_Ω I(τ(ξ) = θ) dp∗(ξ), ∀θ ∈ Ωt.

It is easy to check that (p∗)t has marginals pti.
• (pt)∗ is defined as the worst case (expectation maximizing) distribution over Ωt
such that the marginal distributions are given by pti.
• pt is the independent (product) distribution with marginals pti.
Also, define a truncated function f t by setting its value at ξ to 0 if any ξi is
greater than t, i.e.,

f t(ξ) = f(ξ) if τ(ξ) = ξ, and f t(ξ) = 0 otherwise.
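The truncation of marginals can be illustrated numerically; the sketch below uses an assumed geometric marginal (our own choice, not from the text) and checks that the mass above t is shifted to 0 while the total mass stays 1:

```python
# Sketch of the truncation tau_i and the truncated marginal p^t_i of Lemma 5,
# for a hypothetical geometric marginal p_i(k) = (1/2)^(k+1).
T_SUPPORT = 60   # large finite horizon standing in for the countable domain

def p_i(k):
    return 0.5 ** (k + 1)

def tau_i(xi, t):
    """Replace every value greater than t by 0."""
    return 0 if xi > t else xi

def truncated_marginal(t):
    """p^t_i(theta) = total p_i mass over {xi : tau_i(xi) = theta}."""
    pt = {theta: 0.0 for theta in range(t + 1)}
    for xi in range(T_SUPPORT + 1):
        pt[tau_i(xi, t)] += p_i(xi)
    return pt

pt = truncated_marginal(3)
# All mass above t is shifted to 0, so p^t_i(0) grows and the total stays 1.
assert abs(sum(pt.values()) - 1.0) < 1e-12
assert pt[0] > p_i(0)
```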
Proof Outline For the truncated domain Ωt, Lemma 4 for finite domains already
proves that

E(pt)∗[f(θ)] / Ept[f(θ)] ≤ κ, ∀t, (2.17)

where κ = e/(e − 1) for monotone submodular functions, and κ = 2β < ∞ for
functions with a β-cost-sharing scheme. We want to prove that as t → ∞, both
infinite domain Ω, and furthermore, for the worst case distribution, we want to show
that the expected value over infinite domain is finite (for the product distribution,
we have assumed that the expected value is finite). Specifically, we aim to prove the
following two limits.
lim_{t→∞} Ept[f(θ)] = Ep[f(ξ)] (2.18)

and

lim_{t→∞} E(pt)∗[f(θ)] = Ep∗[f(ξ)] < ∞. (2.19)

Once (2.18) and (2.19) are proved, the result in (2.17) for the finite domain extends
to the infinite domain:

lim_{t→∞} E(pt)∗[f(θ)] / Ept[f(θ)] = (lim_{t→∞} E(pt)∗[f(θ)]) / (lim_{t→∞} Ept[f(θ)]) = Ep∗[f(ξ)] / Ep[f(ξ)] ≤ κ. (2.20)
We will first prove (2.18), and then use similar techniques to prove (2.19).
Proof of Equation (2.18) It is difficult to prove (2.18) directly, because the
truncated probability distribution pt changes with t. Instead, we prove the convergence
using the squeeze theorem. Specifically, we will show that

Ep[f t(ξ)] ≤ Ept[f(θ)] ≤ Ep[f(ξ)]. (2.21)

Then, since f t is non-decreasing in t and f t → f pointwise, the Lebesgue monotone
convergence theorem gives Ep[f t(ξ)] → Ep[f(ξ)] as t → ∞, and (2.18) follows
from (2.21) by the squeeze theorem (also known as the sandwich theorem) for limits.
To prove (2.21), observe that, based on our definitions of f t and the truncation
operator τ,

Ep[f t(ξ)] = ∫_Ω f t(ξ) dp(ξ)
= ∫_Ω I(τ(ξ) = ξ) f(ξ) dp(ξ)
≤ ∫_Ω I(τ(ξ) = ξ) f(ξ) dp(ξ) + ∫_Ω I(τ(ξ) ≠ ξ) f(τ(ξ)) dp(ξ)
= ∫_Ω f(τ(ξ)) dp(ξ)
= Ept[f(θ)],

and

Ep[f(ξ)] = ∫_Ω f(ξ) dp(ξ) ≥ ∫_Ω f(τ(ξ)) dp(ξ) = ∫_{Ωt} f(θ) dpt(θ) = Ept[f(θ)].

Thus, (2.21) is proved and the result in (2.18) follows from the squeeze theorem.
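The sandwich (2.21) can also be checked numerically. Below is a minimal sketch with an assumed geometric marginal and f(k) = k + 1 (our illustrative choices, not from the text), truncating the countable domain at a large finite horizon:

```python
# Numeric sanity check (hypothetical marginal of our choosing) of the
# sandwich E_p[f^t] <= E_{p^t}[f] <= E_p[f] in (2.21), for n = 1 with a
# geometric marginal p(k) = (1/2)^(k+1) and f(k) = k + 1.
HORIZON = 200  # finite stand-in for the countably infinite domain

p = lambda k: 0.5 ** (k + 1)
f = lambda k: float(k + 1)
tau = lambda k, t: 0 if k > t else k   # truncation operator

E_p_f = sum(p(k) * f(k) for k in range(HORIZON))  # E_p[f], equals 2 here
for t in range(1, 6):
    # E_p[f^t]: f zeroed out above the truncation level t.
    E_ft = sum(p(k) * (f(k) if k <= t else 0.0) for k in range(HORIZON))
    # E_{p^t}[f]: mass above t is shifted to 0, where f(0) = 1 > 0.
    E_pt = sum(p(k) * f(tau(k, t)) for k in range(HORIZON))
    assert E_ft <= E_pt + 1e-12 <= E_p_f + 1e-12
```

Both bounding sequences increase toward E_p[f] as t grows, which is exactly the monotone convergence used in the proof.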
Proof of Equation (2.19) We similarly use the squeeze theorem to prove
(2.19), by showing that

Ep∗[f t(ξ)] ≤ E(p∗)t[f(θ)] ≤ E(pt)∗[f(θ)] ≤ Ep∗[f(ξ)] < ∞. (2.22)

We observe that

Ep∗[f t(ξ)] = ∫_Ω f t(ξ) dp∗(ξ)
= ∫_Ω I(τ(ξ) = ξ) f(ξ) dp∗(ξ)
≤ ∫_Ω f(τ(ξ)) dp∗(ξ)
= ∫_{Ωt} f(θ) (∫_Ω I(τ(ξ) = θ) dp∗(ξ)) dθ
= ∫_{Ωt} f(θ) d(p∗)t(θ)
= E(p∗)t[f(θ)],
which proves the first inequality in (2.22) from the left.
The second inequality in (2.22), i.e., E(p∗)t [f(θ)] ≤ E(pt)∗ [f(θ)], follows simply
from the fact that both (p∗)t and (pt)∗ have marginal distributions pti, and (pt)∗ is
defined as the expectation maximizing distribution with these marginals.
For the third inequality, given the distribution (pt)∗ on the truncated domain Ωt,
we define a distribution p̄ on Ω such that the probability on {ξ : τ(ξ) = θ} sums to
(pt)∗(θ). For any ξ such that τ(ξ) = θ, we assign p̄(ξ) = (pt)∗(θ) ∏_i pi(ξi) / ∏_i pti(θi).
It is easy to verify that p̄ has marginals {pi}. Then

E(pt)∗[f(θ)] = ∫_{Ωt} f(θ) d(pt)∗(θ)
= ∫_Ω f(τ(ξ)) dp̄(ξ)
≤ ∫_Ω f(ξ) dp̄(ξ)
≤ Ep∗[f(ξ)],

where the last inequality holds because p̄ ∈ P and p∗ is the worst case distribution in P.
It remains to show that all these integrals are finite and that
Ep∗[f t(ξ)] → Ep∗[f(ξ)]. First observe that, by (2.17) and (2.18), E(pt)∗[f(θ)] is
uniformly upper bounded by the expected value of f over the product distribution on
the countably infinite domain Ω, which is finite by our assumption.
Specifically,

E(pt)∗[f(θ)] ≤ κ Ept[f(θ)] → κ Ep[f(ξ)] < ∞. (2.23)

Now, since f t is monotone in t and f t → f pointwise, the Lebesgue monotone
convergence theorem gives Ep∗[f t(ξ)] → Ep∗[f(ξ)]. Applying the squeeze theorem to (2.22)
then implies E(pt)∗[f(θ)] → Ep∗[f(ξ)], and from (2.23), Ep∗[f(ξ)] < ∞. This completes the
proof.
2.2.4 Proof for uncountable domains Ω ⊆ R^n
Lemma 6. For any instance (f, Ω, {pi}), f : Ω → R+, where Ω = Ω1 × · · · × Ωn ⊆ R^n,
the correlation gap is bounded by
• e/(e − 1), if f(ξ) is monotone and submodular.
• 2β, if f(ξ) is monotone and has a β-cost-sharing scheme.
We also make the technical assumption that f has a finite expected value under the
product distribution.
Proof. First, let us assume that Ω = (0, 1)^n and that each pi is the uniform distribution on
the interval (0, 1). Then, since the function f : Ω → R+ is monotone, the set of all points
of discontinuity of f has Lebesgue measure 0 (Lavric (1993)). That is,

∫_{discont(f)} dξ = 0.

Therefore, for any joint probability distribution p with uniform marginals {pi},

∫_{discont(f)} dp(ξ) ≤ ∫_{discont(f)} p1(ξ1) dξ = ∫_{discont(f)} dξ = 0.
Therefore f is continuous almost everywhere with respect to any joint distribution p
with marginals pi, which includes in particular the worst case joint distribution p∗
and the independent distribution p.
Let us now define a lattice ΓL = {(ℓi 2^{−L})_{i=1}^{n} : ℓi ∈ {1, . . . , 2^L − 1}, ∀i}, and a
corresponding lattice function fL(ξ) = f(max{θ ∈ ΓL : θ ≤ ξ}). Note that since f
is a non-decreasing function, fL(ξ) is non-decreasing in L. By our construction of
ΓL, there is a sequence of lattice points θL → ξ with θL ∈ ΓL. So, if f is
continuous at ξ, then fL(ξ) → f(ξ) as L → ∞. Therefore, fL converges
to f almost everywhere (with respect to the probability measures p∗ and p). So, by the
Lebesgue monotone convergence theorem,

Ep∗[fL] → Ep∗[f], (2.24)
Ep[fL] → Ep[f]. (2.25)
Next, for a fixed L, we can define truncated probability distributions (with
support restricted to the lattice ΓL) pLi, (p∗)L, (pL)∗, and pL, in a manner similar to the
previous subsection on countable domains. The truncation operator is now defined
as

τ(ξ) = max{θ ∈ ΓL : θ ≤ ξ}.

The result for each fixed lattice size then follows from the result for finite domains.
That is, by Lemma 4,

E(pL)∗[f(θ)] / EpL[f(θ)] ≤ κ,

where κ = e/(e − 1) if f is monotone and submodular, and κ = 2β if f is monotone
and has a β-cost-sharing scheme. Convergence to the infinite lattice can then be proven
in a manner similar to Subsection 2.2.3, using (2.24) and (2.25) along with the
squeeze theorem.
The remainder of the proof demonstrates that w.l.o.g. we can assume that the
marginal distribution of each variable is the uniform distribution on (0, 1). Otherwise,
the following simple transformation makes each transformed marginal uniform; the
idea is to replace each variable by its cumulative probability. More precisely, let Fi(ξi)
denote the cumulative distribution function corresponding to pi. Replace each random
variable ξi by a variable ξ′i that is uniformly distributed on (0, 1), and replace the
function f by f′ defined as f′(ξ′) = f(ξ), where

ξi = Fi^{−1}(ξ′i) = inf{r ∈ Ωi : Fi(r) ≥ ξ′i}.

It is easy to verify that monotonicity and the cost-sharing/submodularity properties are
preserved by this transformation (consider the new cost-shares χ′i(ξ′) = χi(ξ)). Also,
observe that for any joint distribution p′ on (0, 1)^n with uniform marginals, there
exists a corresponding distribution p on Ω that has marginals {pi} and the same
expected value, and vice versa. For example, for any given p′, consider

p(ξ) = ∫_{(0,1)^n} I(Fi^{−1}(ξ′i) = ξi, ∀i) p′(ξ′) dξ′.
And, for any given p, consider any distribution that spreads the mass at ξ uniformly
over the set {ξ′ : Fi^{−1}(ξ′i) = ξi, ∀i}. Therefore, the expected value under the worst
case distribution does not change as a result of this transformation. Also, as shown below,
the expected value under the independent distribution does not change:

∫_{(0,1)^n} f′(ξ′) dξ′ = ∫_Ω f(ξ) (∫_{(0,1)^n} I(F^{−1}(ξ′) = ξ) dξ′) dξ
= ∫_Ω f(ξ) (∏_{i=1}^{n} ∫_{(0,1)} I(Fi^{−1}(ξ′i) = ξi) dξ′i) dξ
= ∫_Ω f(ξ) ∏_{i=1}^{n} pi(ξi) dξ
= Ep[f(ξ)].
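The inverse-CDF transformation above can be sanity-checked numerically. The sketch below uses a hypothetical one-dimensional marginal of our choosing (not an example from the dissertation) and verifies that replacing ξ by its cumulative probability preserves the expected value of f:

```python
import bisect

# Sketch of the uniform-marginal transformation: replace xi by its cumulative
# probability. Hypothetical discrete marginal on {0, 1, 2}.
values = [0, 1, 2]
probs = [0.2, 0.5, 0.3]
cdf = [0.2, 0.7, 1.0]

def F_inv(u):
    """F^{-1}(u) = inf{r : F(r) >= u}."""
    return values[bisect.bisect_left(cdf, u)]

f = lambda x: x * x  # any cost function of the original variable

direct = sum(p * f(v) for p, v in zip(probs, values))

# Riemann-sum approximation of the expectation of f(F^{-1}(u)) for u uniform
# on (0, 1); a midpoint grid avoids landing exactly on CDF jump points.
N = 100000
via_uniform = sum(f(F_inv((k + 0.5) / N)) for k in range(N)) / N

assert abs(direct - via_uniform) < 1e-3
```

The same check works for any marginal, which is why the transformation loses no generality in the proof.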
Chapter 3
Applications
3.1 Approximation of Distributionally Robust Stochastic Optimization
The DRSP model, also known as a minimax stochastic program, was proposed as early as
1958 by Scarf (Scarf (1958)), and later by Zackova (1966), for modeling robust decision
making under limited distribution information.
In this approach, one minimizes the expected cost over the worst joint distribution
among all probability distributions consistent with the available information. That
is,
(DRSP)   minimize_{x∈C} maximize_{p∈P} Ep[h(x, ξ)], (3.1)
where P is a collection of possible probability distributions on Ω, and for any x ∈ C,
Ep[h(x, ξ)] denotes the expected value of h(x, ξ) over a distribution p on ξ. The
DRSP model can be interpreted as a two-person game: the decision maker chooses
a decision x to minimize the expected cost, while nature adversarially chooses a
distribution p from the collection P to maximize the expected cost of that decision.
Since its introduction, the DRSP model has attracted extensive interest (e.g., see Dupacova (1987),
Shapiro and Kleywegt (2002), Dupacova (2001), and references therein). The inner
maximization problem has also been studied as the moment problem (e.g., in Rogosinski
(1958) and in Landau (1987)). The applicability of various existing results depends
on the assumed form of set P and the properties of objective function h. In Lagoa and
Barmish (2002) and in Shapiro (2006), the authors consider a set containing unimodal
distributions that satisfy some given support constraints. Under some conditions on
h(x, ξ), they characterize the worst distribution as being the uniform distribution.
The most popular type of distributional set P imposes linear constraints on moments
of the distribution, as considered in Scarf (1958), in Dupacova (1987), in Prekopa
(1995), in Bertsimas et al. (2000) and in Bertsimas and Popescu (2005). Scarf (1958)
exploited the fact that for the newsvendor problem the worst distribution of demand
with given mean and variance could be chosen to be one with all of its weight on two
points. This idea was reused in Yue et al. (2006), and in Popescu (2007), although for
more general forms of objective function. More recently, Delage and Ye (2010) showed
that if the distributions have a fixed mean, bounded (by the positive semidefinite
partial order) covariance matrix, and are supported on a closed convex set, then
DRSP can be reformulated as a semidefinite program, solvable in polynomial time.
Goh and Sim (2010) developed tractable approximations to the DRSP problem by
restricting attention to linear decision rules. Global optimization methods for computing
the worst case distribution have been suggested in Ermoliev et al. (1985) and
Gaivoronski (1991). Shapiro and Ahmed (2004) use duality to reduce any minimax program with
given moment constraints to a minimization type stochastic program, and suggest
using Sample Average Approximation (SAA) method to solve this stochastic program.
They do not explicitly derive polynomial time bounds on the sample size required by
the SAA method, which may depend on the objective function and constraints of the
formulated stochastic program.
In this work, we are interested in the Distributionally Robust Stochastic Program
under given marginal distribution constraints. That is, P denotes the set of all joint
distributions on Ω = Ω1 × · · · × Ωn, such that marginal distribution on each Ωi is pi
(refer to Equation (1.1) in Chapter 1).
The DRSP models closest to ours are those considered in Klein Haneveld (1986)
and Bertsimas et al. (2005). Klein Haneveld (1986) considers the problem of finding
the worst case distribution for a PERT-type project planning problem under given
marginal distributions. Due to the special structure of this application, the dual problem
can be reduced to a finite-dimensional convex program. Bertsimas et al. (2005) study
the worst case expectation of optimal value of generic combinatorial optimization
problems with random objective coefficients, under limited marginal distribution in-
formation. They show that tight upper and lower bounds on this worst case expected
value can be computed in polynomial time, under certain conditions. They also an-
alyze the asymptotic behavior of this expected value under knowledge of complete
marginal distributions.
In general, computing the worst case joint distribution or the corresponding ex-
pected value is difficult. The DRSP problem cannot be solved to optimality in poly-
nomial time, especially when the support of the distributions is restricted. Bertsimas
and Popescu (2005) show that the problem is NP-hard if moments of third or higher
order are given, or if moments of second order are given and the domain of each random
variable is restricted to the non-negative reals. For binary random variables,
i.e., if the domain is restricted to {0, 1}, the problem of finding the worst case distribution
with given marginals is equivalent to finding the worst case distribution with given mean,
or first-order moments. We show that this problem is NP-hard even when restricted
to objective functions that are monotone and submodular in the random variable. In
fact, we prove a stronger result that this problem is hard to approximate within any
reasonable factor even with specific assumptions on the objective function.
Theorem 4. Given a function of n binary random variables, the problem of computing
its expected value under the worst case distribution with given mean is NP-hard,
even when restricted to functions that are monotone and submodular in the random
variables. The problem cannot be approximated within a factor better than O(1/√n) in
polynomial time for some monotone and subadditive functions.
The proof of NP-hardness is based on the observation that even though the problem
of finding the worst case distribution with given mean can be formulated as a linear
program (with an exponential number of variables), the problem of computing a
separating hyperplane for this linear program is at least as hard as the MAX-CUT problem.
The proof appears in Appendix A.
Considering the difficulty of computing the worst case distribution, a natural
question is how much risk is involved in simply ignoring the correlations and, given the
marginal distributions {pi}, minimizing the expected cost under the independent or
product distribution

p(ξ) = ∏_i pi(ξi)

instead of the worst case distribution. In other words, how well does the stochastic
optimization model with the independent distribution approximate the robust DRSP
model? The "price of correlations" defined in this work quantifies exactly this
approximation factor. Given a problem instance (h, Ω, {pi}), let xI be the optimal decision of
the stochastic optimization problem with objective h under the independent (product)
distribution. Then, by definition, the price of correlations (POC) is the approximation
factor that xI achieves for the corresponding DRSP problem (refer to Equation 1.6).
And a stochastic optimization problem with independent (product) distribution is
often relatively easier to solve, either by sampling or by other algorithmic techniques
(e.g., see Kleinberg et al. (1997), Mohring et al. (1999)). Thus, a small upper bound
on POC would yield a simple approximation technique for the DRSP problem proven
earlier to be difficult to solve.
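The sampling route just mentioned can be sketched as a generic sample-average-approximation loop; the tiny two-stage instance below is a hypothetical stand-in, not one from this thesis.

```python
import random

# A minimal sample-average-approximation (SAA) sketch, assuming binary random
# parameters with marginals p_i: draw scenarios from the product distribution
# and pick the decision minimizing the empirical average of h(x, S).
random.seed(0)

def saa(decisions, h, p, samples=20000):
    n = len(p)
    scenarios = [frozenset(i for i in range(n) if random.random() < p[i])
                 for _ in range(samples)]
    return min(decisions, key=lambda x: sum(h(x, S) for S in scenarios) / samples)

# Toy two-stage instance (hypothetical): prepay for x units at cost 1 each,
# then pay 3 per realized demand beyond x.
n = 4
p = [0.5] * n
h = lambda x, S: x + 3 * max(len(S) - x, 0)
best = saa(range(n + 1), h, p)
print(best)
```

With enough samples the empirical minimizer coincides with the true minimizer of the expected cost; the guarantee it then enjoys against the worst case distribution is exactly what POC measures.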
3.1.1 Stochastic Uncapacitated Facility Location (SUFL)
In the two-stage stochastic facility location problem, any facility j ∈ F can be bought at a low cost w^I_j in the first stage, or at a higher cost w^II_j > w^I_j in the second stage, that is, after the random set S ⊆ V of cities to be served is revealed. The decision maker's problem is to decide x ∈ {0, 1}^{|F|}, the facilities to be built in the first stage, so that the total expected cost E[h(x, S)] of facility location is minimized (refer to Swamy and Shmoys (2005) for further details on the problem definition).
Proposition 1. For the two-stage stochastic metric uncapacitated facility location
(SUFL) problem, POC ≤ 6.
Proof. Given a first stage decision x, the cost function is h(x, S) = w^I · x + c(x, S), where c(x, S) is the cost of the deterministic UFL problem for the set S ⊆ V of customers and the set F of facilities, such that the facilities x already bought in the first stage are available freely at no cost, while any other facility j costs w^II_j. For deterministic metric UFL there exists a cross-monotonic, 3-budget-balanced cost-sharing scheme (Pal and Tardos (2003)). Therefore, using Theorem 2, we know that the POC for stochastic metric UFL has an upper bound of 2β = 6.
The above proposition reduces the distributionally robust facility location problem to the well-studied (e.g., see Swamy and Shmoys (2005)) stochastic UFL problem under a known (independent Bernoulli) distribution, at the expense of a 6-approximation factor.
3.1.2 Stochastic Steiner Tree (SST)
In the two-stage stochastic Steiner tree problem, we are given a graph G = (V, E). An edge e ∈ E can be bought at cost w^I_e in the first stage. A random set S ⊆ V of terminal nodes to be connected is revealed in the second stage. More edges may be bought at a higher cost w^II_e, e ∈ E, in the second stage after observing the actual set of terminals. Here, the decision variable x is the set of edges to be bought in the first stage, and the cost function is h(x, S) = w^I · x + c(x, S), where c(x, S) is the deterministic Steiner tree cost for connecting the nodes in S, given that the edges in x are already bought.
Proposition 2. For the two-stage stochastic Steiner tree (SST) problem, POC ≤ 4.
Proof. Since a 2-budget-balanced cross-monotonic cost-sharing scheme is known for the deterministic Steiner tree problem (see Konemann et al. (2005)), we can use Theorem 2 to conclude that for this problem POC ≤ 2β = 4.
The above proposition reduces the distributionally robust stochastic Steiner tree problem to the well-studied (e.g., see Gupta et al. (2004)) SST problem under a known (independent Bernoulli) distribution, at the expense of a 4-approximation factor.
3.1.3 Stochastic bottleneck matching
Consider a graph G = (V, E), and let M denote the set of all perfect matchings in G. Associate a cost ξ_ij ∈ R_+ with every edge (i, j) ∈ E. The bottleneck matching problem (Derigs (1980)) is to find a perfect matching of minimum cost, where the cost of a matching is determined by its most expensive edge. Formally,

minimize_{σ ∈ M}  maximize_{(i,j) ∈ σ}  ξ_ij.

In the stochastic version of this problem, the edge costs ξ_ij are random variables, and the objective is to find the matching σ that minimizes the expected cost E[h(σ, ξ)], where h(σ, ξ) = max_{(i,j) ∈ σ} ξ_ij. It is easy to verify that for every fixed matching σ, the objective function h(σ, ξ) is non-decreasing and submodular in ξ. Therefore, applying Theorem 1 for submodular functions of continuous random variables (Ω = R^n), we obtain the following result.
Proposition 3. For stochastic bottleneck matching, POC ≤ e/(e − 1).
Thus, if correlations are unknown, the random variables ξ_ij can be assumed to be independent to get an e/(e − 1)-approximation for the corresponding DRSP model.
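As a quick numerical sketch of this bound (with assumed Bernoulli edge costs for a single fixed matching, not a construction from the text), one can compare the product distribution against a correlated distribution with the same marginals:

```python
import random

# Monte Carlo sketch: fixed matching sigma with m edges; each edge cost is 1
# with probability p = 1/m and 0 otherwise (an assumed toy marginal).
random.seed(0)
m = 8
p = 1.0 / m
trials = 200000

def bottleneck(costs):
    return max(costs)          # h(sigma, xi) for the fixed matching

# Independent (product) distribution.
indep = sum(bottleneck([1.0 if random.random() < p else 0.0 for _ in range(m)])
            for _ in range(trials)) / trials

# A correlated distribution with the same marginals: exactly one edge, chosen
# uniformly, is expensive; every marginal is still Bernoulli(1/m).
def correlated_sample():
    j = random.randrange(m)
    return bottleneck([1.0 if e == j else 0.0 for e in range(m)])

corr = sum(correlated_sample() for _ in range(trials)) / trials
ratio = corr / indep
print(indep, corr, ratio)      # the ratio stays below e/(e-1) ~ 1.582
```

Here the independent expectation is 1 − (1 − 1/m)^m, so the empirical ratio approaches e/(e − 1) from below as m grows.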
3.2 Deterministic Optimization
3.2.1 d-dimensional maximum matching
A d-dimensional matching is a generalization of bipartite matching (a.k.a. 2-dimensional matching) to d-uniform hypergraphs. A d-uniform hypergraph consists of d disjoint sets of vertices V_1, . . . , V_d. The set of hyperedges is given by E = V_1 × V_2 × · · · × V_d, and each edge is associated with a non-negative weight given by a d-dimensional weight matrix W. For an edge (i_1, . . . , i_d) ∈ E, we denote its weight by W[i_1, . . . , i_d]. A set of edges M ⊆ E forms a d-dimensional matching if every vertex appears in at most one edge of M. The d-dimensional maximum matching problem is to find a matching M of maximum weight. Assuming w.l.o.g. that |V_1| = · · · = |V_d| = T, the problem is formulated as

max_x    ∑_{1 ≤ i_1,...,i_d ≤ T} W[i_1, . . . , i_d] x[i_1, . . . , i_d]
s.t.     ∑_{i_1,...,i_d : i_j = t} x[i_1, . . . , i_d] = 1    for 1 ≤ j ≤ d, 1 ≤ t ≤ T
         x[i_1, . . . , i_d] ∈ {0, 1}    for all i_1, . . . , i_d,        (3.2)

where W[i_1, . . . , i_d] denotes the weight of hyperedge (i_1, . . . , i_d), and x[i_1, . . . , i_d] denotes the decision whether to include the hyperedge (i_1, . . . , i_d) in the matching. Every node must be included in exactly one hyperedge of the matching.
d-dimensional maximum matching is a notoriously hard problem. To date, the best approximation known for general d is 2/d (Hurkens and Schrijver (1989), Berman (2000)). Also, it is known that there is no polynomial time algorithm that achieves an approximation factor of Ω(ln(d)/d) unless P = NP (Hazan et al. (2006)).
Our result on the correlation gap provides a (1 − 1/e)-approximate greedy algorithm for this problem when the weight matrix W is monotone and satisfies the "Monge property".
Definition 13. A d-dimensional matrix W is said to be a "Monge matrix" if for any pair of index vectors (i_1, . . . , i_d), (j_1, . . . , j_d),

W[i_1, . . . , i_d] + W[j_1, . . . , j_d] ≥ W[max{i_1, j_1}, . . . , max{i_d, j_d}] + W[min{i_1, j_1}, . . . , min{i_d, j_d}].

For a 2-dimensional matrix, this simply means that for any 2 × 2 submatrix, the sum of the main diagonal entries is less than or equal to the sum of the off-diagonal entries. Also, observe that the condition is the same as the condition for submodularity of the function W : {1, . . . , T}^d → R.
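Definition 13 can be checked mechanically on small instances; the sketch below tests the inequality over all index pairs, with two assumed example matrices.

```python
import itertools

def entry(W, idx):
    # Read W[idx] from a nested-list representation of a d-dimensional matrix.
    for k in idx:
        W = W[k]
    return W

def is_monge(W, dims):
    indices = list(itertools.product(*(range(T) for T in dims)))
    for i in indices:
        for j in indices:
            hi = tuple(max(a, b) for a, b in zip(i, j))
            lo = tuple(min(a, b) for a, b in zip(i, j))
            if entry(W, i) + entry(W, j) < entry(W, hi) + entry(W, lo) - 1e-12:
                return False
    return True

# A concave function of the index sum is monotone and Monge (submodular) ...
W_monge = [[(i + j) ** 0.5 for j in range(4)] for i in range(4)]
# ... whereas i*j is strictly supermodular, hence not Monge.
W_not = [[i * j for j in range(4)] for i in range(4)]
print(is_monge(W_monge, (4, 4)), is_monge(W_not, (4, 4)))
```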
The Monge property has been studied extensively for the d-dimensional minimum matching and transportation problems (e.g., in Burkard et al. (1996), Bein et al. (1995)), where it characterizes precisely the instances of the minimization problem that are solvable by a simple greedy algorithm. However, in the case of maximum matching, it is easy to show that the approximation factor of the natural greedy algorithm, which at every step adds the maximum weight edge compatible with the matching created so far, can be as bad as O(1/n) even under the Monge property. To our knowledge, ours is the first result showing that the maximum matching problem admits a constant-factor approximation under monotonicity and the Monge property. We show interesting applications in display advertising where the weight matrix satisfies this property.
We first show that the expected weight of a uniformly random matching is away from the weight of the optimal matching by at most a factor equal to the correlation gap of the weight function. A greedy algorithm is then developed by derandomizing this randomized algorithm.
Proposition 4. The expected weight of a uniformly random matching is at least 1/κ of the maximum d-dimensional matching, where κ is the correlation gap of the function W(i_1, . . . , i_d). Moreover, κ ≤ e/(e − 1) if the matrix W is monotone in i_1, . . . , i_d and satisfies the Monge property.
Proof. Observe that on relaxing the integrality constraints of (3.2) and scaling the variables by T, we get exactly the problem of finding the worst case joint distribution for the random vector (i_1, . . . , i_d), given that the marginal distribution of each variable i_j is the uniform distribution on {1, . . . , T}. Therefore, a solution generated by sampling from the product of uniform distributions gives a 1/κ-approximation to this problem. Now, since the probability of any edge (i_1, . . . , i_d) under the product of uniform distributions is the same as the probability of that edge appearing in a uniformly random matching, the claim follows.

When W is a monotone Monge matrix, the function W(i_1, . . . , i_d) is monotone and submodular in (i_1, . . . , i_d), and therefore κ ≤ e/(e − 1).
A greedy algorithm (Algorithm 1) can be obtained by derandomizing this random
choice.
Proposition 5. The weight of the matching produced by greedy Algorithm 1 is at least 1/κ of the maximum matching.
Proof. We show that the weight of the matching produced by the greedy algorithm is greater than or equal to the expected weight of a uniformly random matching. Note that
Algorithm 1 A greedy algorithm for d-dimensional matching

1. Initialize matching M = ∅. Let V̄_j denote the set of nodes in V_j not matched by any edge in the matching constructed so far. Initialize V̄_j = V_j for all j = 1, . . . , d.

2. For t = 1, . . . , T:

   • Find the edge (i*_1, . . . , i*_d) ∈ V̄_1 × · · · × V̄_d that maximizes a combination of its own weight and the average weight of the remaining edges. That is,

     (i*_1, . . . , i*_d) = arg max_{(i_1,...,i_d) ∈ V̄_1 × ··· × V̄_d}  W[i_1, . . . , i_d] + (T − t) · AVG(V̄_1 − {i_1}, . . . , V̄_d − {i_d}),

     where AVG(V_1, . . . , V_d) denotes the average weight of edges in V_1 × · · · × V_d.

   • Add edge (i*_1, . . . , i*_d) to the matching M.

   • Update V̄_j = V̄_j − {i*_j}, j = 1, . . . , d.

3. Output the matching M.
the expected weight of a uniformly random matching is T · AVG(V_1, . . . , V_d). The claim then follows from the observation that for any V_1, . . . , V_d with |V_1| = · · · = |V_d| = r,

max_{i_1,...,i_d} { W[i_1, . . . , i_d] + (r − 1) · AVG(V_1 − {i_1}, . . . , V_d − {i_d}) }  ≥  r · AVG(V_1, . . . , V_d),

since the right hand side equals the average of the bracketed quantity over a uniformly random edge (i_1, . . . , i_d). The claim in Proposition 4 completes the proof.
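For small instances, Algorithm 1 admits a direct, if exponential-in-d, implementation; the weight matrix below is an assumed monotone Monge example (a concave function of the index sum), not one from the text.

```python
import itertools

def avg_weight(W, parts):
    edges = list(itertools.product(*parts))
    return sum(W[e] for e in edges) / len(edges)

def greedy_matching(W, d, T):
    # remaining[j] plays the role of \bar V_j: unmatched nodes on side j
    remaining = [list(range(T)) for _ in range(d)]
    matching = []
    for t in range(1, T + 1):
        best, best_val = None, float("-inf")
        for e in itertools.product(*remaining):
            rest = [[v for v in remaining[j] if v != e[j]] for j in range(d)]
            # own weight plus (T - t) times the average weight of what remains
            val = W[e] + ((T - t) * avg_weight(W, rest) if t < T else 0.0)
            if val > best_val:
                best, best_val = e, val
        matching.append(best)
        remaining = [[v for v in remaining[j] if v != best[j]] for j in range(d)]
    return matching

# Monotone Monge weights: a concave function of the index sum (assumed example).
d, T = 3, 4
W = {e: (sum(e) + 1) ** 0.5 for e in itertools.product(range(T), repeat=d)}
M = greedy_matching(W, d, T)
print(M, sum(W[e] for e in M))
```

By Proposition 5, the returned matching weight is at least (1 − 1/e) times the optimum for monotone Monge weights, which can be verified by brute force at this size.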
3.2.2 Welfare maximization
Consider the problem of maximizing the total utility achieved by partitioning a set V of n goods among K players, all with identical utility function f(S) for a subset S of goods. The optimal welfare OPT is given by the following integer program:

max_α    ∑_S α_S f(S)
s.t.     ∑_{S: i ∈ S} α_S = 1,    ∀i ∈ V
         ∑_S α_S = K
         α_S ∈ {0, 1},    ∀S ⊆ V.        (3.3)
The welfare maximization problem has recently attracted significant attention in the computer science literature (e.g., Feige (2006), Mirrokni et al. (2008), Vondrak (2008), Nisan et al. (2007) and references therein). A more general formulation of this problem that is often considered in the literature allows non-identical utility functions for different players. The problem is known to be NP-hard even for the case of identical monotone utility functions. Moreover, the problem cannot be approximated in polynomial time within a factor better than 1 − 1/e + ε for identical submodular utility functions, and 1/√n for identical subadditive utility functions¹ (Mirrokni et al. (2008)). A (1 − 1/e)-approximation is known for non-decreasing and submodular utility functions (Calinescu et al. (2007), Vondrak (2008)). In this section, we show that a simple greedy algorithm gives an approximation factor bounded by the correlation gap of the function f for the welfare maximization problem with identical utility functions f. This matches the known approximation result for identical submodular utilities and gives new approximation results for some interesting non-submodular utility functions.
Observe that on relaxing the integrality constraints on α and scaling it by 1/K, the problem in (3.3) reduces to that of finding the worst case distribution α* such that the marginal probability ∑_{S: i ∈ S} α_S of each element i ∈ V is 1/K. Therefore,

OPT ≤ E_{α*}[K f(S)].

Consequently, our correlation gap bounds lead to the following corollary for welfare maximization problems:
Proposition 6. For the welfare maximization problem (3.3) with identical utility functions f, the expected utility achieved by the randomized algorithm that assigns each good to a player chosen uniformly at random is at least 1/κ of the optimal, where κ is the correlation gap of f.
Derandomizing the above randomized algorithm leads to the following simple greedy algorithm.

¹Under the assumption of a "demand oracle", that is, assuming that the NP-hard problem max_S f(S) − ∑_{i ∈ S} λ_i can be solved exactly for any fixed λ, a 2-approximation is known for non-decreasing subadditive functions (Feige (2006)).
Algorithm 2 Greedy algorithm for welfare maximization with identical utilities

1. Let A_k denote the goods assigned to player k so far. Initialize A_k = ∅ for all k = 1, . . . , K.

2. For i = 1, . . . , n:

   • Assign good i to the player j who gives the maximum expected increment in utility, given the already existing assignment, i.e.,

     j = arg max_k  E[f(S ∪ A_k ∪ {i}) − f(S ∪ A_k)],

     where S is a random subset of the remaining goods {i + 1, . . . , n}, and each remaining good appears in S independently with probability 1/K.

   • Update A_j = A_j ∪ {i}.

3. Output the partition A_1, . . . , A_K.
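A Monte Carlo rendering of Algorithm 2, with the expected increments estimated by sampling; the capped-cardinality utility below is an assumed example, not one from the text.

```python
import random

# Sketch of Algorithm 2: the expected increments E[f(S ∪ A_k ∪ {i}) -
# f(S ∪ A_k)] are estimated by sampling the random remainder set S.
random.seed(1)

def greedy_welfare(f, n, K, samples=300):
    A = [set() for _ in range(K)]                 # A_k: goods of player k
    for i in range(n):
        remaining = range(i + 1, n)
        incr = [0.0] * K
        for _ in range(samples):
            S = {g for g in remaining if random.random() < 1.0 / K}
            for k in range(K):
                incr[k] += f(S | A[k] | {i}) - f(S | A[k])
        j = max(range(K), key=lambda k: incr[k])  # best expected increment
        A[j].add(i)
    return A

cap = 3
f = lambda S: min(len(S), cap)    # monotone submodular (assumed example)
A = greedy_welfare(f, n=9, K=3)
print(A, sum(f(Ak) for Ak in A))
```

For this utility the optimum is a balanced partition; the sampled greedy recovers (essentially) the same welfare, consistent with the (1 − 1/e) guarantee of Corollary 6 below.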
Proposition 7. For the welfare maximization problem (3.3) with identical utility functions f, the total utility of the partition formed by greedy Algorithm 2 is at least 1/κ of the optimal, where κ is the correlation gap of f.
Proof. Let S^i_k denote the random subset of the goods {i + 1, . . . , n} that player k gets in a random partition formed by assigning each good independently to one of the K players with probability 1/K. Let A^i_1, . . . , A^i_K denote the partition of goods {1, . . . , i} after step i of the greedy algorithm. We prove that after step i, ∑_k E[f(S^i_k ∪ A^i_k)] is greater than or equal to the expected utility achieved by the randomized algorithm that assigns goods uniformly at random. The claim will then follow from Proposition 6.

Clearly, in the beginning, when A^0_k = ∅, the claim is trivially true. Assume that it is true at step i − 1. Then, conditioning on which player receives good i in the random assignment,

∑_k E[f(S^{i−1}_k ∪ A^{i−1}_k)] = ∑_{j=1}^{K} (1/K) ( ∑_{k ≠ j} E[f(S^i_k ∪ A^{i−1}_k)] + E[f(S^i_j ∪ A^{i−1}_j ∪ {i})] )
    ≤ ∑_{k ≠ j*} E[f(S^i_k ∪ A^{i−1}_k)] + E[f(S^i_{j*} ∪ A^{i−1}_{j*} ∪ {i})]
    = ∑_k E[f(S^i_k ∪ A^i_k)],

where A^i_{j*} = A^{i−1}_{j*} ∪ {i}, A^i_k = A^{i−1}_k for k ≠ j*, and the choice of j* is given by

j* = arg max_j  ∑_{k ≠ j} E[f(S^i_k ∪ A^{i−1}_k)] + E[f(S^i_j ∪ A^{i−1}_j ∪ {i})]
   = arg max_j  E[f(S^i_j ∪ A^{i−1}_j ∪ {i}) − f(S^i_j ∪ A^{i−1}_j)]
   = arg max_j  E[f(S ∪ A^{i−1}_j ∪ {i}) − f(S ∪ A^{i−1}_j)],

where S denotes a random subset of the goods {i + 1, . . . , n} with each good appearing in S independently with probability 1/K. The last equality follows from the observation that for any fixed k and any fixed subset R ⊆ {i + 1, . . . , n}, Pr(S^i_k = R) = Pr(S = R). Since the above choice of j* is exactly the choice of j made in our greedy algorithm, this completes the proof.
For submodular functions, our correlation gap bounds lead to a (1 − 1/e)-approximation factor that matches the approximation factor proven earlier in the literature (Calinescu et al. (2007), Vondrak (2008)) for the case of identical monotone submodular functions. Further, the cost-sharing based criterion extends the result to problems with non-submodular functions not previously studied in the literature.
Corollary 6. For welfare maximization with identical non-decreasing submodular
utility functions, greedy Algorithm 2 is a (1 − 1/e)-approximate algorithm.
Chapter 4
Lower bounds
In this chapter, we demonstrate the tightness of our upper bounds on correlation gap
and POC .
In the first section, we provide examples of functions that are monotone super-
modular and monotone subadditive respectively in the random variables, but could
still have arbitrarily large correlation gap if n is large. This illustrates the importance
of characterization using techniques like cost-sharing, in order to get upper bounds
as in Theorem 2 that do not depend on n. Also, we provide examples of instances
with correlation gap close to the upper bounds provided by our upper bound theo-
rem for cost functions like facility location, and Steiner forest that have approximate
cross-monotonic cost-sharing schemes. We also show tightness of our upper bound
for submodular functions via a counter-example.
In the second section, we prove a tightness result of a stronger nature: we prove that any problem instance with correlation gap β admits a β-cost-sharing scheme. Thus, our cost-sharing based condition is a near-tight characterization of functions with small correlation gap.
Figure 4.1: Example with an exponential correlation gap (a single source s, an intermediate node u, and sinks t_1, t_2, . . . , t_n).
4.1 Lower bounds by examples
In this section, we provide examples of instances with large correlation gap. Note
that since POC is bounded by correlation gap, a lower bound on POC implies cor-
responding lower bound on correlation gap, and it is sufficient to provide examples
for the former.
4.1.1 Supermodular functions
Lemma 7. There exists an instance (h, 2^V, {p_i}) with a function h(x, S) that is non-decreasing and supermodular in S, and POC ≥ Ω(2^n). Here n = |V|.
Proof. Consider a two-stage minimum cost flow problem as in Figure 4.1. There is a single source s and n sinks t_1, t_2, . . . , t_n. Each sink t_i has a probability p_i = 1/2 to demand a flow, in which case a unit flow has to be sent from s to t_i. Each edge (u, t_i) has a fixed capacity 1, but the capacity of edge (s, u) needs to be purchased. The cost of capacity x on edge (s, u) is c^I(x) in the first stage, and c^II(x) in the second stage after the set of demands is revealed, defined as

c^I(x) = { x,  x ≤ n − 1;   n + 1,  x = n },        c^II(x) = 2^n x.
Given a first stage decision x, the total cost of the edges that need to be bought in order to serve a set S of requests is h(x, S) = c^I(x) + c^II((|S| − x)^+) = c^I(x) + 2^n (|S| − x)^+. It is easy to check that h(x, S) is non-decreasing and supermodular in S for any given x, i.e., h(x, S ∪ {i}) − h(x, S) ≥ h(x, T ∪ {i}) − h(x, T) for any S ⊇ T. The objective is to minimize the total expected cost E[h(x, S)]. If the decision maker assumes independent demands from the sinks, then x^I = n − 1 minimizes the expected cost, and the expected cost is n; however, under the worst case distribution the expected cost of this decision is g(x^I) = 2^{n−1} + n − 1 (when Pr({1, . . . , n}) = Pr(∅) = 1/2 and all other scenarios have zero probability).

Hence, the correlation gap at x^I is exponentially high. A risk-averse strategy is to instead use the robust solution x^R = n, which leads to a cost g(x^R) = n + 1. Thus, POC = g(x^I)/g(x^R) = Ω(2^n).
We may remark here that although the independent distribution does not appear to be a good substitute for the worst case distribution for supermodular functions, the worst case joint distribution actually has a nice closed form in this case, and the DRSP model is directly solvable in polynomial time. Refer to Agrawal et al. (2010) for details.
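The numbers in this proof can be reproduced exactly for a small n; everything below follows the construction of Figure 4.1.

```python
from math import comb

# Exact costs for the flow example at a small n: the decision x = n - 1 looks
# optimal under independence but blows up on the worst case two-point
# distribution, while the robust decision x = n costs only n + 1.
n = 10

def c1(x):                      # first-stage capacity cost c^I
    return x if x <= n - 1 else n + 1

def cost(x, s):                 # h(x, S) depends on S only through s = |S|
    return c1(x) + (2 ** n) * max(s - x, 0)

# Independent demands, p_i = 1/2, so |S| ~ Binomial(n, 1/2).
indep_cost = sum(comb(n, s) * cost(n - 1, s) for s in range(n + 1)) / 2 ** n

# Worst case: S = V or S = empty set, each with probability 1/2.
worst_cost = 0.5 * cost(n - 1, n) + 0.5 * cost(n - 1, 0)
robust_cost = cost(n, n)        # x = n serves every scenario in stage one
print(indep_cost, worst_cost, robust_cost)   # 10.0, 521.0, 11
```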
4.1.2 Subadditive functions
Lemma 8. There exists an instance (h, 2^V, {p_i}) with a function h(x, S) that is non-decreasing and (fractionally) subadditive¹ in S, and POC ≥ Ω(√n · log log n / log n). Here n = |V|.
Proof. Consider a set cover problem with elements V = {1, . . . , n}. Each item i ∈ V has a marginal probability of 1/K to appear in the random set S, where K = √n. The covering sets are defined as follows. Consider a partition of V into K sets A_1, . . . , A_K, each containing K elements. The covering sets are all the sets in the Cartesian product A_1 × · · · × A_K. Each set has unit cost. Then, the cost of covering a set S is given by the subadditive (in fact, fractionally subadditive) function

c(S) = max_{k=1,...,K} |S ∩ A_k|,    ∀S ⊆ V.

¹A function f : 2^V → R is subadditive iff f(S ∪ T) ≤ f(S) + f(T), ∀S, T ⊆ V. f is fractionally subadditive iff f(S) ≤ ∑_j a_j f(T_j) for all S ⊆ V and every collection {T_j} of subsets of S that forms a fractional cover of S, i.e., ∑_{j: i ∈ T_j} a_j ≥ 1 for all i ∈ S.
The worst case distribution with marginal probabilities p_i = 1/K is one where Pr(S) = 1/K for S = A_k, k = 1, 2, . . . , K, and Pr(S) = 0 otherwise. The expected value of c(S) under this distribution is K = √n. Under the independent distribution, c(S) = max_{k=1,...,K} ζ_k, where the ζ_k = |S ∩ A_k| are independent Binomial(K, 1/K) random variables. Using known statistical results from Kimber (1983), it can be shown that as K = √n approaches ∞, E[c(S)] approaches Θ(log n / log log n). See Appendix D for details.

So, the correlation gap of the cost function c(S) is bounded below by Ω(√n · log log n / log n). To get a corresponding lower bound on POC, consider a two-stage stochastic set cover problem where sets cost slightly more than Ω(log n / (√n log log n)) in the first stage, and 1 in the second stage. Then, assuming the independent distribution, the optimal decision is to buy no or very few sets in the first stage. However, under the worst case distribution, the expected second stage cost of this decision will be √n. On the other hand, a robust solution considering the worst case distribution is to cover all the elements in the first stage, costing O(log n / log log n), and nothing in the second stage. Thus, POC ≥ Ω(√n · log log n / log n).
4.1.3 Submodular functions
Lemma 9. There exists an instance (h, 2^V, {p_i}) with a function h(x, S) that is non-decreasing and submodular in S, and POC = e/(e − 1). Thus, the upper bound given by Theorem 1 is tight.

Proof. Let V := {1, 2, . . . , n}, and define the submodular function f(S) = 1 if S ≠ ∅, and f(∅) = 0. Let each item i ∈ V have marginal probability p_i = 1/n to appear in the random set S. The worst case distribution that maximizes E[f(S)] is the one with Pr({i}) = 1/n for all i ∈ V, with expected value 1. Under the independent distribution with the same marginals, f has expected value 1 − (1 − 1/n)^n → 1 − 1/e as n → ∞. Thus, the correlation gap is e/(e − 1).
To obtain a corresponding lower bound on POC, we can extend this example to a stochastic decision problem with two possible decisions x_1, x_2, as follows. Define h(x_1, S) = f(S), ∀S, and h(x_2, S) = 1 − 1/e + ε, ∀S, for some arbitrarily small ε > 0. Then, assuming the independent distribution, x_1 appears to be the optimal decision; however, it has expected cost 1 under the worst case distribution. On the other hand, decision x_2 costs 1 − 1/e + ε in the worst case, giving POC arbitrarily close to e/(e − 1) as ε → 0.
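The ratio in the proof can be tabulated exactly for increasing n:

```python
import math

# The tight instance of Lemma 9, computed exactly: the worst case expectation
# is 1, the independent expectation is 1 - (1 - 1/n)^n, and the ratio
# increases toward e/(e - 1) as n grows.
gaps = {n: 1.0 / (1.0 - (1.0 - 1.0 / n) ** n) for n in (10, 100, 10000)}
print(gaps)
```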
4.1.4 Uncapacitated metric facility location
Lemma 10. There exists an instance of the facility location problem with correlation gap at least 3, and of two-stage stochastic facility location with POC ≥ 3.
Proof. Consider the following two-stage stochastic metric facility location instance with n cities. There is a partition of the n cities into √n disjoint sets A_1, . . . , A_{√n}, containing √n cities each. Corresponding to each set B of cities in the Cartesian product B = A_1 × · · · × A_{√n}, there is a facility F_B with connection cost 1 to each city in B. The remaining connection costs are defined by extending the metric, that is, the cost of connecting any city i to a facility F_B such that i ∉ B is 3. Assume that each city has a marginal probability of 1/√n to appear in the random demand set S. Each facility costs w^I = 3 log n/√n in the first stage, and w^II = 3 in the second stage.

First, consider the independent distribution case. Regardless of how many facilities are opened in the first stage, the expected cost in the second stage will be no more than 3 E[max_k |A_k ∩ S|] + √n. E[max_k |A_k ∩ S|] asymptotically reaches O(log n / log log n) = o(log n) for large n. Therefore, for any ε > 0 and large enough n, E[max_k |A_k ∩ S|] < ε log n. As a result, if the decision maker assumes the independent distribution, she will never buy more than ε√n facilities in the first stage, which would cost her 3ε log n. However, if the distribution turns out to be of the form Pr(A_k) = 1/√n, k = 1, . . . , √n, then such a strategy induces an expected cost g(x^I) ≥ 3(1 − ε)√n + 3ε log n. A robust solution is to instead build √n facilities in the first stage, corresponding to a collection of √n disjoint sets in the collection B. These facilities cover every city with a connection cost of 1. Thus, the worst case expected cost of the robust solution is g(x^R) ≤ 3 log n + √n. This shows g(x^I) ≥ (3 − ε) g(x^R) for any ε > 0.

Since POC is bounded above by the correlation gap, the above example also implies a lower bound of 3 on the correlation gap.
4.1.5 Steiner forest
Lemma 11. There exists an instance of Steiner forest network design with correlation gap at least 2, and of two-stage stochastic Steiner forest with POC ≥ 2.
Proof. The following example shows an instance of two-stage stochastic Steiner tree with POC ≥ 2. The construction is very similar to the example used in the previous subsection to show a lower bound on POC for stochastic facility location.

Consider the following instance of the two-stage stochastic Steiner tree problem with n terminal nodes. There is a partition of the n terminal nodes into √n disjoint sets A_1, . . . , A_{√n}, containing √n nodes each. Corresponding to each set B in the Cartesian product B = A_1 × · · · × A_{√n}, there is a (non-terminal) node v_B in the graph, connected directly via an edge to each terminal node in B. Assume that each terminal node has a marginal probability of 1/√n to appear in the demand set S. Each edge e ∈ E costs w^I_e = log n/√n in the first stage, and w^II_e = 1 in the second stage.

Then, in the optimal decision made using the independent distribution, at most ε√n edges will be bought in the first stage, which can make available at most ε√n non-terminal nodes. Since no two nodes in any A_k are directly connected to each other or to any common non-terminal node, these ε√n non-terminal nodes are directly connected to at most ε√n nodes in a set A_k. Each of the remaining nodes in A_k requires at least two edges in order to be connected to the Steiner tree. Therefore, if the distribution is of the form Pr(A_k) = 1/√n, k = 1, . . . , √n, then the expected cost of this decision will be g(x^I) ≥ 2√n(1 − ε) + ε log n. A robust solution is to instead buy enough edges in the first stage so that a set of √n non-terminal nodes v_B, corresponding to a collection of √n disjoint sets in B, are connected to each other. By construction, any two non-terminal nodes are connected to each other by a path of length at most 3, so this requires buying at most 3√n edges in the first stage, costing at most 3 log n. Also, for any k, each node in A_k is connected directly to one of these non-terminal nodes. Therefore, the worst case expected cost of this solution is g(x^R) ≤ 3 log n + √n. This shows g(x^I) ≥ (2 − ε) g(x^R) for any ε > 0.

Since POC is bounded above by the correlation gap, the above example also implies a lower bound of 2 on the correlation gap.
4.2 Tightness of cost-sharing condition
In this section, we show that our upper bound condition based on the β-cost-sharing property is a tight (within a factor of 2) characterization of functions with small correlation gap.
Theorem 5. For any function f : Ω → R_+ with |Ω| < ∞, if the correlation gap of f is upper bounded by β, then f satisfies the β-cost-sharing property.
We prove a slightly stronger version of the above theorem.
Lemma 12. For any instance (f, Ω, {p_i}) with |Ω| < ∞, if the correlation gap is upper bounded by β, then there exists a cross-monotonic cost-sharing scheme for f that is β-budget-balanced in expectation with respect to the product distribution ∏_i p_i.
Proof. Let b_ξ denote the product distribution with the given marginals, i.e., b_ξ = ∏_i p_i(ξ_i). Then, the expected value under the worst case (expectation-maximizing) distribution with marginals {p_i} is given by

max_α    ∑_ξ α_ξ f(ξ)
s. t.    ∑_{ξ: ξ_i = t} α_ξ ≤ ∑_{ξ: ξ_i = t} b_ξ    ∀i, t ∈ Ω_i
         ∑_ξ α_ξ ≤ 1
         α_ξ ≥ 0    ∀ξ.
If the correlation gap is bounded by β, then the optimal value of the above program is at most β ∑_ξ b_ξ f(ξ). Adding more constraints can only decrease the optimal value; therefore,

β ∑_ξ b_ξ f(ξ)  ≥    max_α    ∑_ξ α_ξ f(ξ)
                     s. t.    ∑_{ξ ≤ θ: ξ_i = t} α_ξ ≤ ∑_{ξ ≤ θ: ξ_i = t} b_ξ    ∀i, t ∈ Ω_i, θ ∈ Ω
                              ∑_ξ α_ξ ≤ 1
                              α_ξ ≥ 0    ∀ξ.
Let us consider the Lagrangian dual of the above linear program. Let γ_{i,t,θ} denote the dual variable corresponding to the constraint for i, t ∈ Ω_i, θ ∈ Ω, and δ the dual variable of the constraint ∑_ξ α_ξ ≤ 1. Then, by strong duality,

β ∑_ξ b_ξ f(ξ)  ≥    min_{γ,δ}    ∑_ξ b_ξ (∑_i ∑_{θ ≥ ξ} γ_{i,ξ_i,θ}) + δ
                     s. t.        ∑_i ∑_{θ ≥ ξ} γ_{i,ξ_i,θ} + δ ≥ f(ξ)    ∀ξ ∈ Ω
                                  γ_{i,t,θ} ≥ 0    ∀i, t ∈ Ω_i, θ ∈ Ω
                                  δ ≥ 0.
Let γ*_{i,t,θ}, δ* be an optimal dual solution. Then, consider the cost sharing scheme χ defined as

χ_i(ξ) = ∑_{θ ≥ ξ} γ*_{i,ξ_i,θ} + δ*/n.

We have:
• Cross-monotonicity: For every ϑ ≥ ξ such that ϑ_i = ξ_i, we have

χ_i(ξ) = ∑_{θ ≥ ξ} γ*_{i,ξ_i,θ} + δ*/n ≥ ∑_{θ ≥ ϑ} γ*_{i,ξ_i,θ} + δ*/n = ∑_{θ ≥ ϑ} γ*_{i,ϑ_i,θ} + δ*/n = χ_i(ϑ).
• (Expected) β-budget balance: Due to the constraint in the dual program, for every ξ ∈ Ω,

∑_i χ_i(ξ) = ∑_i ∑_{θ ≥ ξ} γ*_{i,ξ_i,θ} + δ* ≥ f(ξ).

And,

∑_ξ b_ξ ∑_i χ_i(ξ) = ∑_ξ b_ξ (∑_i ∑_{θ ≥ ξ} γ*_{i,ξ_i,θ} + δ*) ≤ β ∑_ξ b_ξ f(ξ),

where the last inequality uses ∑_ξ b_ξ = 1, so the left hand side equals the optimal dual objective value, which is bounded above as shown.
Observe that the above is not simply a proof by example: it shows that if any function has a small correlation gap, then it has the β-cost-sharing property with small β. However, there is a gap of a factor of 2 between our upper bound and lower bound results. Below is an example of a function that admits a 1-cost-sharing scheme but whose correlation gap is arbitrarily close to 2. This example, constructed and communicated to us in private correspondence by Jan Vondrak, proves that this factor of 2 cannot be eliminated using only the cross-monotonic cost-sharing criterion.
Example of a function with 1-cost-sharing and correlation gap 2: Consider the function f : 2^V → R, for V = {1, . . . , n}, defined as:

f(S) = 0 for |S| = 0;    f(S) = k for 0 < |S| < k;    f(S) = |S| for |S| ≥ k.

The function f is not submodular, but admits a cross-monotonic cost sharing scheme in which each element of a set has the same share: k/|S| in a set of size |S| < k, and 1 for |S| ≥ k. These shares are non-increasing with set size.
Now consider a distribution p which generates the whole ground set with probability k/n, and otherwise a random singleton, i.e., each singleton with probability (1 − k/n)/n. The expected value of f over this distribution is

E_{S∼p}[f(S)] = n · (k/n) + k(1 − k/n) = 2k − k²/n.

Let us now make n large compared to k, for example n = k³; then E_{S∼p}[f(S)] = 2k − 1/k. The marginal probability of each element i ∈ V under p is p_i = k/n + (1 − k/n)/n = (k + 1 − k/n)/n. For large n, when sampling the n elements independently with this probability, the size of the random set will be very close to k + 1 − k/n (by a Chernoff bound), and the expected value of f on such a random set will be between k and k + 1. Therefore, the correlation gap of f is arbitrarily close to 2.
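A numerical check of this example (with k = 5 assumed, and the independent expectation estimated by sampling):

```python
import random

# f admits a 1-budget-balanced cross-monotonic cost sharing, yet its
# correlation gap approaches 2 as k grows; here k = 5, n = k^3.
random.seed(3)
k = 5
n = k ** 3

def f(size):                      # f depends on S only through |S|
    if size == 0:
        return 0
    return k if size < k else size

# E_p[f(S)] for the two-part distribution: whole set w.p. k/n, else a singleton.
E_p = (k / n) * f(n) + (1 - k / n) * f(1)          # = 2k - k^2/n

# Independent distribution with the same marginals p_i = (k + 1 - k/n)/n.
pi = (k + 1 - k / n) / n
trials = 20000
E_ind = sum(f(sum(1 for _ in range(n) if random.random() < pi))
            for _ in range(trials)) / trials
print(E_p, E_ind, E_p / E_ind)    # the ratio tends to 2 as k grows
```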
Chapter 5
Conclusions
In this thesis, we investigated how effective the simple approach of ignoring correlations is for optimization under uncertainty. We introduced a new concept, the Correlation Gap, to quantify the deterioration in the performance of a decision obtained assuming independence, in the presence of arbitrary correlations. We believe this concept is especially attractive because it characterizes the cases in which the seemingly pessimistic worst case joint distribution is close to the more natural independent distribution, in the sense that the former can be substituted by the latter. By proving upper and lower bounds on the correlation gap for a wide range of problems, our research offers important insights into when correlations can be ignored in practice. We also show that many deterministic optimization problems involving matching or partitioning constraints can be formulated as the problem of computing the worst case distribution with given marginals; hence, our results provide approximation algorithms for those problems as well. Finally, our methodology of bounding the correlation gap using cost-sharing schemes is a novel application of these algorithmic game theory techniques and deserves further study.
Some directions for future research include reducing the gap when the available information can be strengthened to include some partial knowledge about correlations. Such partial information may be available in the form of partial or complete knowledge of the covariance matrix, or partial knowledge of the dependence structure in the form of edges in graphical models such as Markov random fields and Bayesian networks.
Bibliography
Shipra Agrawal, Yichuan Ding, Amin Saberi, and Yinyu Ye. Correlation Robust Stochastic Optimization. In SODA '10: Proceedings of the twenty-first annual ACM-SIAM symposium on Discrete algorithms, 2010.
Shabbir Ahmed and Alexander Shapiro. The Sample Average Approximation Method for
Stochastic Programs with Integer Recourse. SIAM Journal on Optimization, 12:479–
502, 2002.
Wolfgang W. Bein, Peter Brucker, James K. Park, and Pramod K. Pathak. A Monge Prop-
erty for the d-Dimensional Transportation Problem. Discrete Applied Mathematics, 58
(2):97–109, 1995.
A. Ben-Tal and A. Nemirovski. Robust Convex Optimization. Mathematics of Operations
Research, 23(4):769–805, 1998. ISSN 0364-765X.
Aharon Ben-Tal. Robust Optimization - Methodology and Applications. Mathematical
Programming, 92(3):453–480, 2001.
Aharon Ben-Tal and Arkadi Nemirovski. Robust solutions of Linear Programming problems
contaminated with uncertain data. Mathematical Programming, 88:411–424, 2000.
Piotr Berman. A d/2 approximation for maximum weight independent set in d-claw free
graphs. Nordic Journal of Computing, 7:178–184, September 2000. ISSN 1236-6064.
D. Bertsimas and I. Popescu. Optimal Inequalities in Probability Theory: A Convex Opti-
mization Approach. SIAM Journal on Optimization, 15(3):780–804, 2005.
D. Bertsimas, I. Popescu, and J. Sethuraman. Moment Problems and Semidefinite Pro-
gramming. Kluwer Academic Publishers, 2000.
Dimitris Bertsimas and Melvyn Sim. The Price of Robustness. Operations Research, 52(1):
35–53, 2004. ISSN 0030-364X.
Dimitris Bertsimas, Karthik Natarajan, and Chung-Piaw Teo. Probabilistic Combinatorial
Optimization: Moments, Semidefinite Programming, and Asymptotic Bounds. SIAM
Journal on Optimization, 15:185–209, January 2005.
Yvonne Bleischwitz and Burkhard Monien. Fair cost-sharing methods for scheduling jobs on
parallel machines. Journal of Discrete Algorithms, 7:280–290, September 2009. ISSN
1570-8667.
R. E. Burkard, B. Klinz, and R. Rudolf. Perspectives of Monge properties in optimization.
Discrete Applied Mathematics, 70(2):95–161, 1996.
Gruia Calinescu, Chandra Chekuri, Martin Pal, and Jan Vondrak. Maximizing a Submod-
ular Set Function Subject to a Matroid Constraint (Extended Abstract). In IPCO,
pages 182–196, 2007.
Moses Charikar, Chandra Chekuri, and Martin Pal. Sampling Bounds for Stochastic Opti-
mization. In APPROX-RANDOM, pages 257–269, 2005.
Xin Chen, Melvyn Sim, and Peng Sun. A Robust Optimization Perspective on Stochastic
Programming. Operations Research, 55(6):1058–1071, 2007.
Erick Delage and Yinyu Ye. Distributionally Robust Optimization Under Moment Uncer-
tainty with Application to Data-Driven Problems. Operations Research, 58:595–612,
May 2010.
Ulrich Derigs. On two methods for solving the bottleneck matching problem. In Lecture
Notes in Control and Information Sciences, volume 23. Springer Berlin / Heidelberg,
1980.
J. Dupacova. The Minimax Approach to Stochastic Programming and an Illustrative Ap-
plication. Stochastics, 20(1):73–88, 1987.
J. Dupacova. Stochastic Programming: Minimax Approach. Encyclopedia of Optimization,
5:327–330, 2001.
Y. Ermoliev, A. Gaivoronski, and C. Nedeva. Stochastic optimization problems with in-
complete information on distribution functions. SIAM Journal on Control and Opti-
mization, 23:697, 1985.
Uriel Feige. On maximizing welfare when utility functions are subadditive. In Proceedings
of the thirty-eighth annual ACM symposium on Theory of computing, STOC ’06, pages
41–50, 2006.
A.A. Gaivoronski. A numerical method for solving stochastic programming problems with
moment constraints on a distribution function. Annals of Operations Research, 31(1):
347–369, 1991.
Joel Goh and Melvyn Sim. Distributionally Robust Optimization and Its Tractable Ap-
proximations. Operations Research, 58(4-Part-1):902–917, 2010.
M. Grotschel, L. Lovasz, and A. Schrijver. Geometric algorithms and combinatorial opti-
mization. Springer Berlin, 1988.
F. Gul and E. Stacchetti. Walrasian Equilibrium with Gross Substitutes. Journal of Eco-
nomic Theory, 87(1):95–124, 1999.
Anupam Gupta, Martin Pal, R. Ravi, and Amitabh Sinha. Boosted sampling: Approximation algorithms for stochastic optimization. In Proceedings of the 36th Annual ACM Symposium on Theory of Computing, pages 417–426, 2004.
Anupam Gupta, Aravind Srinivasan, and Eva Tardos. Cost-sharing mechanisms for network
design. Algorithmica, 50:98–119, December 2007. ISSN 0178-4617.
Elad Hazan, Shmuel Safra, and Oded Schwartz. On the complexity of approximating k-set
packing. Computational Complexity, 15(1):20–39, 2006. ISSN 1016-3328.
C.A.J. Hurkens and A. Schrijver. On the size of systems of sets every t of which have an
SDR, with an application to the worst-case ratio of heuristics for packing problems.
SIAM Journal on Discrete Mathematics, 2(1):68–72, 1989.
Nicole Immorlica, Mohammad Mahdian, and Vahab S. Mirrokni. Limitations of Cross-
monotonic Cost-sharing Schemes. ACM Transactions on Algorithms, 4(2):1–25, 2008.
ISSN 1549-6325.
A.S. Kelso Jr and V.P. Crawford. Job matching, coalition formation, and gross substitutes.
Econometrica: Journal of the Econometric Society, 50(6):1483–1504, 1982.
A. C. Kimber. A note on Poisson maxima. Probability Theory and Related Fields, 63:
551–552, 1983.
W. Klein Haneveld. Robustness against dependence in PERT: An application of duality and
distributions with known marginals. Mathematical Programming Studies, 27:153–182,
1986.
Jon Kleinberg, Yuval Rabani, and Eva Tardos. Allocating bandwidth for bursty connections.
In Proceedings of the twenty-ninth annual ACM symposium on Theory of computing,
STOC ’97, pages 664–673, 1997.
Jochen Konemann, Stefano Leonardi, and Guido Schafer. A group-strategyproof mechanism
for Steiner forests. In Proceedings of the sixteenth annual ACM-SIAM symposium on
Discrete algorithms, SODA ’05, pages 612–619, 2005.
C. M. Lagoa and B. R. Barmish. Distributionally robust Monte Carlo simulation: a tutorial
survey. In Proceedings of the IFAC World Congress, pages 1–12, 2002.
H.J. Landau. Moments in Mathematics: Lecture Notes Prepared for the AMS Short Course.
American Mathematical Society, 1987.
Boris Lavric. Continuity of Monotone Functions. Archivum Mathematicum, 29:1–4, 1993.
ISSN 0044-8753.
Stefano Leonardi and Guido Schaefer. Cross-monotonic cost-sharing methods for connected
facility location games. In EC ’04: Proceedings of the 5th ACM conference on Elec-
tronic commerce, pages 242–243, 2004.
Mohammad Mahdian and Martin Pal. Universal facility location. In Proceedings of ESA '03, pages 409–421, 2003.
V. Mirrokni, M. Schapira, and J. Vondrak. Tight information-theoretic lower bounds for
welfare maximization in combinatorial auctions. In Proceedings of the 9th ACM con-
ference on Electronic commerce, pages 70–77. ACM, 2008.
Rolf H. Mohring, Andreas S. Schulz, and Marc Uetz. Approximation in stochastic schedul-
ing: the power of LP-based priority policies. Journal of the ACM, 46(6):924–942, 1999.
ISSN 0004-5411.
Herve Moulin. Incremental cost sharing: Characterization by coalition strategy-proofness.
Social Choice and Welfare, 16:279–320, 1999.
Herve Moulin and Scott Shenker. Strategyproof sharing of submodular costs: budget bal-
ance versus efficiency. Economic Theory, 18:511–533, 2001.
Noam Nisan, Tim Roughgarden, Eva Tardos, and Vijay V. Vazirani. Algorithmic Game
Theory. Cambridge University Press, New York, NY, USA, 2007. ISBN 0521872820.
Martin Pal and Eva Tardos. Group Strategyproof Mechanisms via Primal-Dual Algorithms.
In Proceedings of the 44th Annual IEEE Symposium on Foundations of Computer
Science, FOCS ’03, pages 584–593, 2003.
I. Popescu. Robust Mean-Covariance Solutions for Stochastic Optimization. Operations
Research, 55(1):98, 2007.
A. Prekopa. Stochastic programming. Kluwer Academic Publishers, 1995.
W. W. Rogosinski. Moments of Non-negative Mass. Proceedings of the Royal Society of
London. Series A, Mathematical and Physical Sciences, 245(1240):1–27, 1958.
Tim Roughgarden and Mukund Sundararajan. New trade-offs in cost-sharing mechanisms.
In STOC ’06: Proceedings of the thirty-eighth annual ACM symposium on Theory of
computing, pages 79–88, New York, NY, USA, 2006. ACM. ISBN 1-59593-134-1.
A. Ruszczynski and A. Shapiro, editors. Stochastic Programming, volume 10 of Handbooks
in Operations Research and Management Science. Elsevier, 2003.
Herbert E. Scarf. A min-max solution of an inventory problem. Studies in The Mathematical
Theory of Inventory and Production, pages 201–209, 1958.
A. Shapiro. Worst-case distribution analysis of stochastic programs. Mathematical Pro-
gramming, 107(1):91–96, 2006.
A. Shapiro and S. Ahmed. On a class of minimax stochastic programs. SIAM Journal on
Optimization, 14(4):1237–1252, 2004.
A. Shapiro and A. Kleywegt. Minimax analysis of stochastic problems. Optimization Meth-
ods and Software, 17(3):523–542, 2002.
Chaitanya Swamy and David B. Shmoys. Sampling-based approximation algorithms for multistage stochastic optimization. In Proceedings of the 46th Annual IEEE Symposium on Foundations of Computer Science, pages 357–366, 2005.
Jan Vondrak. Optimal approximation for the submodular welfare problem in the value
oracle model. In STOC ’08: Proceedings of the 40th annual ACM symposium on
Theory of computing, pages 67–74, 2008. ISBN 978-1-60558-047-0.
J. Yue, B. Chen, and M.C. Wang. Expected Value of Distribution Information for the
Newsvendor Problem. Operations research, 54(6):1128, 2006.
J. Žáčková. On Minimax Solutions of Stochastic Linear Programming Problems. Časopis pro pěstování matematiky, 91(4):423–430, 1966.
Appendix A
Proof of Theorem 4
We assume that the random variable ξ is a vector of n binary variables; that is, it represents a random subset S of a ground set V = {1, . . . , n}. Consider a function f : 2^V → R. The marginal probability that each i ∈ V appears in the random set S is given by p_i. Then the worst case distribution under given marginals {p_i}_{i=1}^n is given by an optimal solution of the following linear program:
max_α   Σ_{S⊆V} α_S f(S)
s.t.    Σ_{S: i∈S} α_S = p_i,   ∀i ∈ V,
        Σ_{S⊆V} α_S = 1,
        α_S ≥ 0.                                (A.1)
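For intuition, program (A.1) can be explored by hand on a toy instance. The instance below (f(S) = √|S| with marginals (0.5, 0.4, 0.3)) is illustrative data, not from the text; the sketch exhibits one feasible α, verifies its marginal constraints, and checks that its expected value exceeds that of the independent distribution with the same marginals, so the LP optimum exceeds it as well.

```python
import math
from itertools import combinations

# Toy instance (illustrative data): V = {0,1,2}, f(S) = sqrt(|S|)
# (monotone submodular), marginals p = (0.5, 0.4, 0.3).
p = (0.5, 0.4, 0.3)
f = lambda S: math.sqrt(len(S))

# One feasible solution alpha of (A.1): it matches the marginals exactly.
alpha = {frozenset({0, 1}): 0.2, frozenset({0}): 0.3,
         frozenset({1}): 0.2, frozenset({2}): 0.3}
assert abs(sum(alpha.values()) - 1.0) < 1e-12
for i in range(3):
    assert abs(sum(a for S, a in alpha.items() if i in S) - p[i]) < 1e-12

corr = sum(a * f(S) for S, a in alpha.items())

# Expected value under the independent distribution with the same marginals.
indep = 0.0
for r in range(4):
    for S in combinations(range(3), r):
        prob = 1.0
        for i in range(3):
            prob *= p[i] if i in S else 1 - p[i]
        indep += prob * f(set(S))

print(round(corr, 4), round(indep, 4))  # feasible correlated value vs independent
```

Note that the gap corr/indep stays below e/(e − 1), consistent with the bound for monotone submodular functions.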
By the equivalence of separation and optimization (Grotschel et al. (1988)), solving the above problem is equivalent to solving the separation problem

max_S  f(S) − Σ_{i∈S} λ_i,                      (A.2)

for any given λ ∈ R^n_+. We show that there exists a non-negative monotone submodular function f such that this separation problem is at least as hard as the MAX-CUT problem.
Definition 14. Given an undirected graph G = (V,E), a cut in G is a subset S ⊆ V. Let S̄ = V \ S, and let E(S, S̄) denote the set of edges with one vertex in S and one vertex in S̄. The MAX-CUT problem is to find a cut S that maximizes |E(S, S̄)|. The MAX-CUT problem is NP-hard.
Claim 1. Problem (A.2) for monotone submodular functions f is at least as hard as the MAX-CUT problem.

Proof. Consider a graph G = (V,E). For any set S ⊆ V, define f(S) as two times the number of edges that have at least one endpoint in S. Observe that f is monotone and submodular. Define λ_i as the number of edges incident on vertex i. The proof is then completed by observing that

|E(S, S̄)| = f(S) − Σ_{i∈S} λ_i.

The observation holds because Σ_{i∈S} λ_i counts every edge with both endpoints in S twice and every edge going from S to S̄ once, while f(S) counts each of these edges twice.
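The identity used in the proof can be verified exhaustively on a small example graph (chosen arbitrarily here):

```python
from itertools import combinations

# Verify |E(S, S-bar)| = f(S) - sum_{i in S} lambda_i on a small graph,
# where f(S) = 2 * #edges touching S and lambda_i = deg(i).
V = range(5)
E = [(0, 1), (0, 2), (1, 2), (1, 3), (2, 4), (3, 4)]
deg = {i: sum(1 for e in E if i in e) for i in V}

def f(S):
    return 2 * sum(1 for (u, v) in E if u in S or v in S)

ok = True
for r in range(len(V) + 1):
    for S in map(set, combinations(V, r)):
        cut = sum(1 for (u, v) in E if (u in S) != (v in S))
        ok &= (cut == f(S) - sum(deg[i] for i in S))
print(ok)
```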
This proves that finding the worst case distribution is NP-hard even when the function f is restricted to be monotone and submodular. To prove the n^{−1/2} approximation hardness for monotone submodular functions, we use the connection of our problem to the welfare maximization problem discussed in Section 3.2.2 (refer to equation (3.3)). In Mirrokni et al. (2008), the authors proved the following hardness result.

Theorem (Mirrokni et al. (2008)): For any ε > 0, achieving an approximation ratio of n^{−1/2+ε} for the welfare maximization problem (3.3) with identical monotone subadditive utility functions f(S) requires an exponential number of function evaluations.

Thus, assuming each function evaluation takes at least constant time, the problem cannot be approximated within a factor better than n^{−1/2+ε} in polynomial time.
In fact, they prove this bound for the more restrictive class of fractionally subadditive functions. On the other hand, Feige (2006) proved that for any subadditive (in fact, fractionally subadditive) function, one can round any solution to the LP relaxation of (3.3) in polynomial time to obtain a feasible allocation for the welfare maximization problem with value at least 1/2 of the value of the LP solution.
As we observed in Section 3.2.2, the LP relaxation of (3.3) is equivalent to the problem of finding the worst case distribution with given marginal probabilities. Now, suppose there were a polynomial time algorithm solving the worst case distribution problem within an approximation factor better than 2n^{−1/2+ε}. Then, using the polynomial-time rounding technique of Feige (2006) on the obtained LP solution, we could obtain an approximation factor better than n^{−1/2+ε} for the welfare maximization problem, contradicting the above theorem of Mirrokni et al. (2008). This completes the proof.
Appendix B
Proof for binary random variables
(details)
B.1 Properties of Split Operation
In this section, we prove the properties of the Split operation used in the proof of Lemma 2. For an instance (f, Ω, {p_i}), we will use L(f, Ω, {p_i}) and I(f, Ω, {p_i}) to denote the expected value under the worst case (expectation-maximizing) distribution and the independent distribution, respectively. Given a function f(S), we use f′(S′) to denote the function obtained after splitting.
Property 1. If f is a non-decreasing function, then so is f′.

Proof. Monotonicity holds since for any S′ ⊆ T′ ⊆ V′ we have Π(S′) ⊆ Π(T′), and hence f′(S′) = f(Π(S′)) ≤ f(Π(T′)) = f′(T′).
Property 2. If f has the β-cost sharing property, then so does f′.

Proof. We construct a cost-sharing scheme χ′ for f′ from a cost-sharing scheme χ for f as follows. Consider an arbitrary but fixed ordering on the elements of V′. The scheme χ′ coincides with the original scheme χ for sets without duplicates; for a set with duplicates, it assigns the cost-share solely to the copy with the smallest index (as per the fixed ordering). That is, for any S′ ⊆ V′ and item C_ij (the j-th copy of item i) in S′,

χ′_ij(S′) = χ_i(S)  if j = min{h : C_ih ∈ S′},  and  χ′_ij(S′) = 0  otherwise,       (B.1)

where S = Π(S′), and the min computes the lowest index with respect to the fixed ordering on elements.
To ensure that the new cost-sharing scheme χ′ has expected β-budget-balance with respect to the product distribution for the given marginals {p′_ij}, we need to choose the original cost-sharing scheme χ carefully. Pick χ as the cost-sharing scheme that satisfies expected β-budget balance with respect to the product distribution with marginals {p_i}, where p_i is defined as

p_i = Pr(C_ih ∈ S′ for some h = 1, . . . , m_i),

when S′ is distributed according to the product distribution with marginals {p′_ij}. Since f has a cross-monotonic, expected β-budget balanced scheme with respect to any product distribution, such a scheme χ exists. Now, let p denote the product distribution with marginals {p_i}, and p′ the product distribution on S′ with marginals {p′_ij}. Then,

E_{p′}[ Σ_{ij: C_ij ∈ S′} χ′_ij(S′) ] = E_p[ Σ_{i∈S} χ_i(S) ] ≤ E_p[f(S)] = E_{p′}[f′(S′)].
Also, for any S′, with S = Π(S′),

Σ_{ij: C_ij ∈ S′} χ′_ij(S′) = Σ_{i∈S} χ_i(S) ≥ f(S)/β = f′(S′)/β.
To see that cross-monotonicity holds, consider S′ ⊆ T′. For any C_ij ∈ S′, if j is not the lowest indexed copy of i in T′, then χ′_ij(T′) = 0, so the condition is automatically satisfied. If j is the lowest indexed copy of i in T′, then it must also be the lowest indexed copy in S′, since S′ is a subset of T′. Then, by cross-monotonicity of χ,

χ′_ij(T′) = χ_i(T) ≤ χ_i(S) = χ′_ij(S′),

where S = Π(S′), T = Π(T′).
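Construction (B.1) can be exercised on a toy instance. The sketch below uses the hypothetical example f(S) = min(|S|, 2) with egalitarian shares χ_i(S) = f(S)/|S| (an exactly budget-balanced, cross-monotonic scheme with β = 1), splits one item into two copies, and checks budget balance and cross-monotonicity of the resulting χ′ by brute force:

```python
from itertools import combinations

# Toy check of construction (B.1): original items {'a','b'},
# f(S) = min(|S|, 2), egalitarian shares chi_i(S) = f(S)/|S| (beta = 1).
def f(S):
    return min(len(S), 2)

def chi(i, S):
    return f(S) / len(S) if i in S else 0.0

copies = ['a1', 'a2', 'b1']              # item 'a' split into two copies
proj = {'a1': 'a', 'a2': 'a', 'b1': 'b'}

def chi_prime(c, Sp):
    S = frozenset(proj[x] for x in Sp)
    lowest = min(x for x in Sp if proj[x] == proj[c])  # fixed lexicographic order
    return chi(proj[c], S) if c == lowest else 0.0

subsets = [frozenset(c) for r in range(4) for c in combinations(copies, r)]
for Sp in subsets:
    if Sp:   # budget balance: shares sum to f'(S') = f(Pi(S'))
        assert abs(sum(chi_prime(c, Sp) for c in Sp)
                   - f({proj[x] for x in Sp})) < 1e-12
for Sp in subsets:
    for Tp in subsets:
        if Sp and Sp <= Tp:  # cross-monotonicity: shares shrink as sets grow
            assert all(chi_prime(c, Tp) <= chi_prime(c, Sp) + 1e-12 for c in Sp)
print("B.1 construction verified on the toy instance")
```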
Property 3. If f is submodular, then so is f′.

Proof. For submodularity, consider any S′, T′, with S = Π(S′), T = Π(T′). Then observe that Π(S′ ∪ T′) = S ∪ T and Π(S′ ∩ T′) ⊆ S ∩ T. Therefore, by monotonicity and submodularity of f,

f′(S′ ∪ T′) + f′(S′ ∩ T′) ≤ f(S ∪ T) + f(S ∩ T) ≤ f(S) + f(T) = f′(S′) + f′(T′).
Property 4. If f(S) is non-decreasing in S, then after splitting,

L(f′, 2^{V′}, {p′_r}) = L(f, 2^V, {p_i}).

Proof. Suppose item 1 is split into m_1 pieces, and each piece is assigned probability p_1/m_1. Let {α_S} denote the worst case distribution with given marginals for the instance (f, 2^V, {p_i}), where α_S denotes the probability of set S. Then we can construct a distribution for the new instance (f′, 2^{V′}, {p′_r}) with the same objective value by assigning non-zero probabilities only to sets with no duplicates. That is, for all S′ ⊆ V′, define

α′_{S′} = α_{Π(S′)}  if S′ contains no copy of item 1,
α′_{S′} = (1/m_1) α_{Π(S′)}  if S′ contains exactly one copy of item 1,
α′_{S′} = 0  otherwise.

One can verify that α′ is a feasible distribution for the new instance (f′, 2^{V′}, {p′_r}), i.e., it satisfies the marginal distribution constraints, and it has the same objective value as L(f, 2^V, {p_i}). Hence, L(f, 2^V, {p_i}) ≤ L(f′, 2^{V′}, {p′_r}).
For the other direction, consider a worst case distribution α′ for the new instance. It is easy to see that there exists an optimal distribution with α′_{S′} = 0 for all S′ that contain more than one copy of item 1. To see this, assume for contradiction that some set with non-zero probability contains two copies of item 1. By the definition of f′, removing one copy does not decrease the function value, and by monotonicity of f′ we can move that copy to another set T′ in the support that has no copy of item 1 without decreasing the objective. Such a T′ always exists since the probabilities of the copies of item 1 must sum to p_1 ≤ 1. So we can assume that in the optimal distribution α′_{S′} = 0 for any set S′ containing more than one copy. Then we can set α_S = Σ_{S′: Π(S′)=S} α′_{S′} to obtain a feasible distribution for the original instance with the same objective value as L(f′, 2^{V′}, {p′_r}). Thus, L(f′, 2^{V′}, {p′_r}) ≤ L(f, 2^V, {p_i}).

We can apply this argument recursively to all the items to prove the lemma.
Next, we prove that the expected value under the independent distribution can only decrease upon splitting.

Property 5. If f(S) is non-decreasing in S, then after splitting,

I(f′, 2^{V′}, {p′_r}) ≤ I(f, 2^V, {p_i}).

Proof. Let (f′, 2^{V′}, {p′_r}) denote the new instance obtained by splitting item 1 into m_1 pieces. Denote

Λ := {S′ ⊆ V′ : S′ contains at least one copy of item 1},

and let π = Pr(S′ ∈ Λ). Consider the expected value under the independent Bernoulli distribution. By independence,

I(f′, 2^{V′}, {p′_r}) = E_{S′}[f′(S′) I(S′ ∈ Λ)] + E_{S′}[f′(S′) I(S′ ∉ Λ)]
= π E_{S⊆V\{1}}[f(S ∪ {1})] + (1 − π) E_{S⊆V\{1}}[f(S)]
≤ p_1 E_{S⊆V\{1}}[f(S ∪ {1})] + (1 − p_1) E_{S⊆V\{1}}[f(S)]
= I(f, 2^V, {p_i}).

The inequality holds because π = 1 − (1 − p_1/m_1)^{m_1} ≤ p_1, and f(S) ≤ f(S ∪ {1}) by monotonicity. Repeating this argument for all items completes the proof.
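Property 5 can also be checked numerically on a small instance. The data below (f(S) = min(|S|, 2), marginals 0.6, 0.5, 0.4, item 1 split into three copies) are illustrative assumptions:

```python
from itertools import product

# Numeric check of Property 5: f(S) = min(|S|, 2) on V = {1,2,3};
# split item 1 into m1 = 3 copies, each with probability p_1/m1.
p = {1: 0.6, 2: 0.5, 3: 0.4}
m1 = 3

def f(S):
    return min(len(S), 2)

def indep_expectation(marginals, project):
    # exact expectation of f(project(S)) under independent Bernoulli sampling
    items = list(marginals)
    total = 0.0
    for bits in product([0, 1], repeat=len(items)):
        prob = 1.0
        for b, i in zip(bits, items):
            prob *= marginals[i] if b else 1 - marginals[i]
        total += prob * f(project({i for b, i in zip(bits, items) if b}))
    return total

before = indep_expectation(p, lambda S: S)
split = {('1', j): p[1] / m1 for j in range(m1)}
split.update({2: p[2], 3: p[3]})
# copies ('1', j) all project back to the single item '1'
after = indep_expectation(split, lambda S: {x[0] if isinstance(x, tuple) else x
                                            for x in S})
print(round(before, 4), round(after, 4))
```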
Property 6. If in the instance (f, Ω, {p_i}) the function f has the β-prefix-cost-sharing property, then in the nice instance (f′, Ω′, {1/K}) obtained by splitting, f′ has an order-specific cost-sharing scheme with respect to any fixed ordering consistent with the partial order given by A_K, . . . , A_1, such that it satisfies

(a) (expected) β-budget balance on the product distribution with marginals 1/K,
(b) the prefix property,
(c) cross-monotonicity for any S′ ⊆ T′ such that S′ is a partial prefix of T′ with respect to A_K, . . . , A_1.

Here A_K, . . . , A_1 denote the disjoint sets in the support of the worst case distribution for the nice instance (f′, Ω′, {1/K}).
Proof. Let σ′ denote a fixed ordering consistent with the partial ordering A_K, . . . , A_1, and let σ′_{S′} denote the restriction of σ′ to the elements of S′. We construct a cost-sharing scheme χ′ for the nice instance as follows:

χ′(C_ij, S′, σ′_{S′}) = χ(i, S, σ_S)  if j = min{h : C_ih ∈ S′},  and  0  otherwise,       (B.2)

where σ_S is the ordering of the lowest indexed copies in σ′_{S′}, S = Π(S′), the min is taken with respect to the ordering σ′_{S′}, and χ is a β-budget balanced cost-sharing scheme with the prefix property with respect to the ordering σ_S. Moreover, χ is a cross-monotonic cost-sharing scheme for f that is expected β-budget balanced with respect to the product distribution with marginals {p_i}, defined as

p_i = Pr(C_ih ∈ S′ for some h = 1, . . . , m_i),

when S′ is distributed according to the product distribution with marginals 1/K.

Then χ′ satisfies (expected) β-budget-balance with respect to the product distribution with marginals 1/K; the proof is the same as in the proof of Property 2.
Let S(i) denote the items in S preceding and including i with respect to the given ordering. Then, due to the prefix property of χ and the definition of χ′,

χ′(C_ij, S′, σ′_{S′}) = χ(i, S, σ_S) = χ(i, S(i), σ_S) = χ′(C_ij, S′(C_ij), σ′_{S′}).

Therefore χ′ has the prefix property with respect to σ′.
For cross-monotonicity, consider S′ ⊆ T′ such that S′ is a "partial prefix" of T′. For any i′ ∈ S′, if i′ is not a lowest indexed copy in T′, then χ′(i′, T′, σ′_{T′}) = 0, so the condition is automatically satisfied. If i′ is one of the lowest indexed copies in T′, then it must also be a lowest indexed copy in S′, since S′ is a subset of T′. Thus,

χ′(i′, T′, σ′_{T′}) = χ(i, T, σ_T) ≤ χ(i, S, σ_S) = χ′(i′, S′, σ′_{S′}),

where S = Π(S′), T = Π(T′), and σ_S, σ_T are the orderings of the lowest indexed copies in S′, T′ respectively. Note that the inequality above uses cross-monotonicity of χ, which inherently assumes that the two cost-sharing schemes, for the orderings σ_S and σ_T, coincide. This is true if σ_S ⊆ σ_T, that is, if the ordering of the elements of S is the same in σ_S and σ_T. We show that this holds given the assumption that σ′_{S′}, σ′_{T′} respect the partial ordering A_K, . . . , A_1, and S′ is a "partial prefix" of T′; that is, S′ ⊆ A_K ∪ · · · ∪ A_k and T′ \ S′ ⊆ A_k ∪ · · · ∪ A_1 for some k.

To see this, observe that the splitting was performed so that at most one copy of any element appears in each A_k. So, among the newly added copies T′ \ S′, any copy of an element of S can occur only in T′ ∩ A_k or later. Since S′ ⊆ A_K ∪ · · · ∪ A_k, this means that for any element i ∈ S, the newly added copies occur only later in the ordering, and they cannot alter the order of the lowest indexed copies of the elements of S. This proves that σ_S ⊆ σ_T.
B.2 Handling irrational probabilities
In this section, we show that the bounds on the correlation gap in Section 2.2.1 for binary random variables continue to hold even if the probabilities p_i are irrational.

For any irrational vector {p_i}, there exists a nondecreasing sequence {p_i^ℓ} of rational vectors such that

{p_i^ℓ} → {p_i}  as  ℓ → ∞.

Then, for each instance (f, Ω, {p_i^ℓ}), since the marginals are rational, we have

E_{(p^ℓ)*}[f(S)] / E_{p^ℓ}[f(S)] ≤ κ,

where (p^ℓ)* and p^ℓ are the worst case and independent joint distributions, respectively, for the instance (f, Ω, {p_i^ℓ}); κ = e/(e − 1) if f is monotone and submodular, and κ = 2β if f is monotone and has a β-cost-sharing scheme.

Let p* and p denote the worst case and independent distributions, respectively, for the instance (f, Ω, {p_i}). Now, observe that due to the monotonicity of f, E_{p^ℓ}[f(S)] is non-decreasing in ℓ and upper-bounded by E_p[f(S)] < ∞ (by assumption). Therefore, by the monotone convergence theorem, E_{p^ℓ}[f(S)] converges to sup_ℓ E_{p^ℓ}[f(S)] = E_p[f(S)]. Also, E_{(p^ℓ)*}[f(S)] is non-decreasing in ℓ and upper-bounded by κ E_{p^ℓ}[f(S)] < ∞; therefore, by the monotone convergence theorem, E_{(p^ℓ)*}[f(S)] converges to sup_ℓ E_{(p^ℓ)*}[f(S)] = E_{p*}[f(S)]. Hence,

E_{(p^ℓ)*}[f(S)] / E_{p^ℓ}[f(S)]  →  E_{p*}[f(S)] / E_p[f(S)] ≤ κ.
Appendix C
Proof for finite domains (details)
In this section, we rigorously prove the properties of the reduction in Lemma 4. As before, for an instance (f, Ω, {p_i}), we will use L(f, Ω, {p_i}) and I(f, Ω, {p_i}) to denote the expected value under the worst case (expectation-maximizing) distribution and the independent distribution, respectively.

We assume w.l.o.g. that f(0) = 0. Otherwise, we could instead consider the function f(ξ) − f(0), and the approximation factor proven would hold for f(ξ).
Claim 2. The reduction preserves the monotonicity and β-cost-sharing/submodularity
of the function.
Proof. Consider any ξ′, θ′ ∈ Ω′, with ξ = Θ(ξ′), θ = Θ(θ′). By definition, if ξ′ ≥ θ′, then ξ ≥ θ. Therefore, monotonicity of f′ follows directly from monotonicity of f:

f′(ξ′) = f(ξ) ≥ f(θ) = f′(θ′).

We define a cost-sharing scheme χ′ for f′ using a cost-sharing scheme χ for f, as follows:

χ′_ij(ξ′) = χ_i(ξ)  if ξ_i = j,  and  0  otherwise.
Given a set of marginals {p′_ij} for the new instance, to ensure that χ′ has expected β-budget-balance with respect to the product distribution ∏_{ij} p′_ij, we need to pick χ carefully. We pick χ as a cross-monotonic cost-sharing scheme for f that satisfies expected β-budget balance with respect to the product distribution with marginals {p_i}, defined by the probabilities

p_i(j) = Pr(Θ(ξ′)_i = j),

when ξ′ is distributed as the product distribution with marginals {p′_ij}. Since f has a cross-monotonic, expected β-budget balanced scheme with respect to any product distribution, such a scheme χ exists. Now, let p denote the product distribution with marginals {p_i}, and p′ the product distribution with marginals {p′_ij}. Then,
f′(ξ′)/β = f(ξ)/β ≤ Σ_{i=1}^n χ_i(ξ) = Σ_{i, j=ξ_i} χ′_ij(ξ′) = Σ_{i,j} χ′_ij(ξ′),

E_{p′}[f′(ξ′)] = E_p[f(ξ)] ≥ E_p[ Σ_{i=1}^n χ_i(ξ) ] = E_{p′}[ Σ_{i,j} χ′_ij(ξ′) ].
So χ′ is expected β-budget-balanced with respect to the product distribution p′. For cross-monotonicity, given i, j, if χ′_ij(ξ′) = 0, then the condition is automatically satisfied. Otherwise, j = max{r : ξ′_ir = 1}, and

χ′_ij(ξ′_ij, ξ′_{−ij}) = χ_i(j, ξ_{−i}).

Now, for θ′_{−ij} ≥ ξ′_{−ij}, assume j remains the maximum index (otherwise χ′_ij(ξ′_ij, θ′_{−ij}) = 0, and the condition is automatically satisfied); then χ′_ij(ξ′_ij, θ′_{−ij}) = χ_i(Θ(ξ′_ij, θ′_{−ij})). Therefore,

χ′_ij(ξ′_ij, θ′_{−ij}) = χ_i(j, Θ(ξ′_ij, θ′_{−ij})_{−i}) ≤ χ_i(j, Θ(ξ′_ij, ξ′_{−ij})_{−i}) = χ_i(j, ξ_{−i}) = χ′_ij(ξ′_ij, ξ′_{−ij}).

The inequality follows from the cross-monotonicity of χ, since

Θ(ξ′_ij, θ′_{−ij})_{−i} ≥ Θ(ξ′_ij, ξ′_{−ij})_{−i}.
For submodularity, consider any ξ′, θ′, with ξ = Θ(ξ′), θ = Θ(θ′). Then observe that Θ(max{ξ′, θ′}) = max{ξ, θ} and Θ(min{ξ′, θ′}) ≤ min{ξ, θ}. Therefore, by monotonicity and submodularity of f,

f′(max{ξ′, θ′}) + f′(min{ξ′, θ′}) ≤ f(max{ξ, θ}) + f(min{ξ, θ}) ≤ f(ξ) + f(θ) = f′(ξ′) + f′(θ′).
Claim 3. The reduction does not change the expected value under the worst case distribution.

Proof. It suffices to observe that for any distribution in the original instance we can construct a distribution with the same expected value in the new instance, and vice versa. For the first direction, one can simply construct a new distribution by replacing each ξ in the support of the original distribution with the ξ′ whose only non-zero components are ξ′_{iξ_i} = 1 for each i with ξ_i ≠ 0. For the other direction, suppose there were a ξ′ in the support of the worst case distribution with multiple non-zero components among ξ′_i = {ξ′_ij}_{j=1}^{K_i} for some i. Then we could move all the non-zero components except one to some other ξ′′ whose block ξ′′_i has no non-zero components. Due to the marginal distribution constraints, such a ξ′′ exists in the support of the worst case distribution, and due to the monotonicity of f, this move cannot decrease the expected value.
Claim 4. The reduction can only decrease the expected value under the independent distribution.

Proof. For simplicity, let us start by comparing the original instance (f, Ω, {p_i}) to the new instance (f′, Ω′, {p′_ij}) formed by replacing only the first random variable ξ_1 by K_1 binary random variables ξ′_11, . . . , ξ′_{1K_1}, while keeping the remaining variables intact. That is, Ω′ = {0, 1}^{K_1} × ∏_{i≠1} Ω_i. For any ξ′ ∈ Ω′, ξ = Θ(ξ′) is given by ξ_1 = max{j : ξ′_1j = 1} and ξ_i = ξ′_i for i ≠ 1. Also, p′_1j = p_1(j) for j = 1, . . . , K_1, and p′_i = p_i for i ≠ 1.
Now, given the independent distribution over Ω′, denote by π_j the probability that ξ_1 = Θ(ξ′)_1 takes value j. Then the expected value over the independent distribution satisfies

I(f′, Ω′, {p′_ij}) = Σ_{j=0}^{K_1} E[f′(ξ′) I(Θ(ξ′)_1 = j)]
= Σ_{j=1}^{K_1} π_j E[f(j, ξ_{−1})] + (1 − Σ_{j≠0} π_j) E[f(0, ξ_{−1})]
≤ Σ_{j=1}^{K_1} p′_1j E[f(j, ξ_{−1})] + (1 − Σ_{j≠0} p′_1j) E[f(0, ξ_{−1})]
= Σ_{j=1}^{K_1} p_1(j) E[f(j, ξ_{−1})] + (1 − Σ_{j≠0} p_1(j)) E[f(0, ξ_{−1})]
= I(f, Ω, {p_i}).

The inequality holds because π_j ≤ p′_1j for each j ≥ 1, and f(0, ξ_{−1}) ≤ f(j, ξ_{−1}) by monotonicity. Repeating this argument for all the components i = 1, . . . , n completes the proof.
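Claim 4 can be checked numerically for one component. The toy data below (ξ_1 ∈ {0, 1, 2}, a single additional binary component, and f(ξ) = min(ξ_1 + ξ_2, 2)) are illustrative assumptions:

```python
from itertools import product

# Toy check of Claim 4: xi_1 in {0,1,2} with p_1(1)=0.3, p_1(2)=0.2,
# xi_2 in {0,1} with Pr(xi_2=1)=0.5, f monotone.
def f(x1, x2):
    return min(x1 + x2, 2)

p1 = {0: 0.5, 1: 0.3, 2: 0.2}
p2 = 0.5

# Original instance: independent xi_1, xi_2.
orig = sum(p1[x1] * (p2 if x2 else 1 - p2) * f(x1, x2)
           for x1 in p1 for x2 in (0, 1))

# New instance: xi_1 replaced by binary xi'_11, xi'_12 with p'_1j = p_1(j),
# and Theta(xi')_1 = max{j : xi'_1j = 1} (0 if none is set).
new = 0.0
for b1, b2, x2 in product((0, 1), repeat=3):
    prob = (p1[1] if b1 else 1 - p1[1]) * (p1[2] if b2 else 1 - p1[2]) \
           * (p2 if x2 else 1 - p2)
    theta1 = 2 if b2 else (1 if b1 else 0)
    new += prob * f(theta1, x2)

print(round(orig, 4), round(new, 4))  # the new independent value is no larger
```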
Appendix D
Maximum of Poisson Random
Variables
In this section, we show that the expected value of the maximum of M independent, identically distributed Poisson random variables can be bounded as O(log M/ log log M) for large M.

Let λ denote the mean and F the distribution function of the i.i.d. Poisson variables X_i. Define G = 1 − F, and define the continuous extension of G,

G_c(x) = exp(−λ) Σ_{j=1}^∞ λ^{x+j}/Γ(x + j + 1).

Note that G(k) = G_c(k) for any non-negative integer k. Let the sequence {a_k}_{k=1}^∞ be defined by G_c(a_k) = 1/k, and define the continuous function L(x) = log(x)/ log log(x). Then, in Kimber (1983), it is shown that a_k ∼ L(k) for large k.
We use these asymptotic results to derive a bound on the expectation of Z = max_{i=1,...,M} X_i for large M.
E[Z] = Σ_{k=0}^∞ Pr(Z > k)
= Σ_{k=0}^{⌈L(M²)⌉} Pr(Z > k) + Σ_{k=⌈L(M²)⌉+1}^∞ Pr(Z > k)
≤ L(M²) + 1 + ∫_{x=L(M²)}^∞ Pr(Z > x) dx.                  (D.1)
Next, we show that the integral term on the right hand side is bounded by a constant for large M. Substituting x = L(y) in the integral, we get

∫_{x=L(M²)}^∞ Pr(Z > x) dx = ∫_{L(y)=L(M²)}^∞ Pr(Z > L(y)) L′(y) dy ≤ ∫_{y=M²}^∞ Pr(Z > L(y)) (1/y) dy,

where L′(y) denotes the derivative of L(y). The last step follows because L′(y) ≤ 1/y for large enough y (i.e., if log log y ≥ 1). Further, since Pr(Z > L(k))/k is a decreasing function of k, it follows that

∫_{y=M²}^∞ Pr(Z > L(y))/y dy ≤ Σ_{k=M²}^∞ Pr(Z > L(k))/k.

Now, for large k, L(k) ∼ a_k, and

Pr(Z > a_k) ≤ 1 − (1 − G_c(a_k))^M = 1 − (1 − 1/k)^M.
Therefore, for large M ,
∞∑
k=M2
Pr(Z > L(k))
k≤
∞∑
k=M2
1
k− 1
k
(
1 − 1
k
)M
≤∞∑
k=M2
2M
k2
≤ 1 .
This proves that the integral term on the right hand side of (D.1) is bounded by a constant, and thus, for large M,

E[Z] ≤ L(M²) + 2 = O(log M/ log log M).
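The bound can be illustrated numerically: computing E[Z] = Σ_{k≥0}(1 − F(k)^M) exactly for λ = 1 shows it growing in step with L(M) = log M / log log M. The script below is an illustrative check, not part of the proof:

```python
import math

def poisson_cdf(k, lam):
    # P(X <= k) for X ~ Poisson(lam), summed term by term
    term = math.exp(-lam)
    total = term
    for j in range(1, k + 1):
        term *= lam / j
        total += term
    return min(total, 1.0)   # clamp against floating-point overshoot

def expected_max(M, lam=1.0, kmax=200):
    # E[Z] = sum_{k >= 0} Pr(Z > k), where Z = max of M i.i.d. Poisson(lam)
    return sum(1.0 - poisson_cdf(k, lam) ** M for k in range(kmax))

def L(x):
    return math.log(x) / math.log(math.log(x))

for M in (10**2, 10**4, 10**6):
    print(M, round(expected_max(M), 2), round(L(M), 2))
```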