
OPTIMIZATION UNDER UNCERTAINTY:

BOUNDING THE CORRELATION GAP

A DISSERTATION

SUBMITTED TO THE DEPARTMENT OF COMPUTER SCIENCE

AND THE COMMITTEE ON GRADUATE STUDIES

OF STANFORD UNIVERSITY

IN PARTIAL FULFILLMENT OF THE REQUIREMENTS

FOR THE DEGREE OF

DOCTOR OF PHILOSOPHY

Shipra Agrawal

May 2011


http://creativecommons.org/licenses/by-nc/3.0/us/

This dissertation is online at: http://purl.stanford.edu/kb071qr2204

© 2011 by Shipra Agrawal. All Rights Reserved.

Re-distributed by Stanford University under license with the author.

This work is licensed under a Creative Commons Attribution-Noncommercial 3.0 United States License.


I certify that I have read this dissertation and that, in my opinion, it is fully adequate in scope and quality as a dissertation for the degree of Doctor of Philosophy.

Yinyu Ye, Primary Adviser

I certify that I have read this dissertation and that, in my opinion, it is fully adequate in scope and quality as a dissertation for the degree of Doctor of Philosophy.

Timothy Roughgarden, Co-Adviser

I certify that I have read this dissertation and that, in my opinion, it is fully adequate in scope and quality as a dissertation for the degree of Doctor of Philosophy.

Ashish Goel

I certify that I have read this dissertation and that, in my opinion, it is fully adequate in scope and quality as a dissertation for the degree of Doctor of Philosophy.

Amin Saberi

Approved for the Stanford University Committee on Graduate Studies.

Patricia J. Gumport, Vice Provost Graduate Education

This signature page was generated electronically upon submission of this dissertation in electronic format. An original signed hard copy of the signature page is on file in University Archives.


Preface

Modern decision models increasingly involve parameters that are unknown or uncertain. Uncertainty is typically modeled by a probability distribution over possible realizations of some random parameters. In the presence of high-dimensional multivariate random variables, estimating the joint probability distribution is difficult, and optimization models are often simplified by assuming that the random variables are independent. Although popular, the effect of this heuristic on solution quality was little understood. This thesis centers around the following question:

“How much can the expected cost increase if the random variables are arbitrarily correlated?”

We introduce a new concept of Correlation Gap to quantify this increase. For given marginal distributions, the Correlation Gap compares the expected value of a function on the worst-case (expectation-maximizing) joint distribution to its expected value on the independent (product) distribution.

The Correlation Gap captures the “Price of Correlations” in stochastic optimization: using a distributionally robust stochastic programming model, we show that a small correlation gap implies that the efficient heuristic of assuming independence is actually robust against any adversarial correlations, while a large correlation gap suggests that it is important to invest more in data collection and in learning correlations. Apart from decision making under uncertainty, we show that our upper bounds on the correlation gap are also useful for solving many deterministic optimization problems, such as welfare maximization, k-dimensional matching, and transportation problems, for which the correlation gap captures the performance of randomized algorithmic techniques like independent random selection and independent randomized rounding.


Our main technical results include upper and lower bounds on the correlation gap based on properties of the cost function. We demonstrate that monotonicity and submodularity of the function imply a small correlation gap. Further, we employ techniques of cross-monotonic cost-sharing schemes from game theory in a novel manner to provide a characterization of non-submodular functions with small correlation gap. Our results include small constant bounds for cost functions arising in many popular applications, such as stochastic facility location, Steiner tree network design, minimum spanning tree, minimum makespan scheduling, and single-source rent-or-buy network design. Notably, we show that for many interesting functions the correlation gap is bounded irrespective of the dimension of the problem or the type of marginal distributions. Additionally, we demonstrate the tightness of our characterization; that is, a small correlation gap of a function implies the existence of an “approximate” cross-monotonic cost-sharing scheme. This observation could also be useful for enhancing the understanding of such schemes, and may be of independent interest.


Acknowledgments

I am deeply indebted to my advisor, Prof. Yinyu Ye, for his enthusiasm, inspiration, and direction. This thesis would not have been possible without his support and encouragement. Besides my advisor, my sincere thanks also go to Prof. Amin Saberi and Nimrod Megiddo for their mentoring and guidance during my doctoral studies.

I would like to thank the rest of my dissertation reading and orals committee, Prof. Tim Roughgarden, Prof. Ashish Goel, and Prof. Persi Diaconis, for their time, insightful questions, and constructive comments.

I thank my friends, colleagues, and coauthors Benjamin Armbruster, Erick Delage, Yichuan Ding, Anthony Man-Cho So, and Zizhuo Wang for stimulating discussions and valuable collaborations.

I wish to thank my entire family: my parents, sisters, parents-in-law, brothers-in-law, and sisters-in-law, for providing a loving and encouraging environment for me.

Last but not least, I wish to thank my husband and best friend, Piyush, for helping me get through the difficult times and make the most of the good times. This journey would not have been so rewarding without his love and support. To him I dedicate this thesis.


Contents

Preface

Acknowledgments

1 Introduction
  1.1 Correlation Gap
  1.2 Price of Correlations (POC) in Stochastic Optimization
  1.3 Bounding POC via Correlation Gap

2 Upper bounds
  2.1 Results
    2.1.1 Submodularity
    2.1.2 Cross-monotone cost-sharing property
    2.1.3 Cross-monotone cost-sharing with a prefix property
  2.2 Proofs
    2.2.1 Proof for binary random variables
    2.2.2 Proof for finite domains
    2.2.3 Proof for countably infinite domains
    2.2.4 Proof for the uncountable domains Ω ⊆ R^n

3 Applications
  3.1 Approximation of Distributionally Robust Stochastic Optimization
    3.1.1 Stochastic Uncapacitated Facility Location (SUFL)
    3.1.2 Stochastic Steiner Tree (SST)
    3.1.3 Stochastic bottleneck matching
  3.2 Deterministic Optimization
    3.2.1 d-dimensional maximum matching
    3.2.2 Welfare maximization

4 Lower bounds
  4.1 Lower bounds by examples
    4.1.1 Supermodular functions
    4.1.2 Subadditive functions
    4.1.3 Submodular functions
    4.1.4 Uncapacitated metric facility location
    4.1.5 Steiner forest
  4.2 Tightness of cost-sharing condition

5 Conclusions

Bibliography

A Proof of Theorem 4

B Proof for binary random variables (details)
  B.1 Properties of Split Operation
  B.2 Handling irrational probabilities

C Proof for finite domains (details)

D Maximum of Poisson Random Variables

List of Figures

4.1 Example with an exponential correlation gap

Chapter 1

Introduction

In many planning problems, it is crucial to consider correlations¹ among individual events. For example, an emergency services (medical services, fire rescue, etc.) planner needs to carefully locate emergency service stations and determine the number of emergency vehicles to maintain in order to dispatch vehicles to call points in time. If the planner assumes emergency calls are rare and independent events, he simply needs to make sure that every potential call point is in the service range of at least one station; however, there may exist certain kinds of dependence between those rare events, so that the planner cannot ignore the chance of simultaneous occurrences of those emergency events. The underlying correlations, possibly caused by common trigger factors (e.g., weather, festivals), are often difficult to predict or analyze, which makes the planning problem complicated. Other examples include the portfolio selection problem, in which a risk-averse investor has to take into account the correlations among multiple risky assets as well as their individual performances, and the stochastic facility location problem, in which the supplier needs to consider the correlations between demands from different retailers.

As these examples illustrate, information about correlations can be crucial for operational planning, especially for large-system planning. However, estimating the joint distribution in the presence of correlations is usually difficult, and much harder than, for example, estimating marginal distributions. Reasons for this include the huge sample size required to characterize the joint distribution accurately, and the practical difficulty of retrieving centralized information; e.g., the retailers may only be able to provide statistics about the demand for their own products. In the presence of only marginal distribution information, a common heuristic is to assume that the involved random variables are independent, and thus substitute the joint distribution by the independent (product) distribution with the given marginals. Such an assumption not only simplifies the task of sampling and estimation, but also enriches the structure of optimization problems under uncertainty, leading to efficient solution techniques (e.g., see Kleinberg et al. (1997), Mohring et al. (1999)). However, the effect of such a heuristic on solution quality is little understood.

In this work, we evaluate the effectiveness of the independence assumption by introducing the new concepts of “Correlation Gap” and “Price of Correlations”. For given marginal distributions, the Correlation Gap compares the expected value of a function on the worst-case (expectation-maximizing) distribution to its expected value on the independent distribution. The Price of Correlations quantifies the robustness of the solution to an optimization problem under uncertainty that was obtained assuming independence. A small Correlation Gap of the objective function for all fixed decisions implies that the Price of Correlations is small, i.e., the decision obtained assuming independence is almost as robust as the most robust solution for the optimization problem. Below, we give precise definitions.

¹Here, “correlation” refers to any departure of two or more random variables from probabilistic independence.

1.1 Correlation Gap

Let ξ = (ξ1, . . . , ξn) denote an n-dimensional random vector taking values in Ω = Ω1 × · · · × Ωn. For each i, we are given pi, a probability measure over Ωi. In practice, for large domains the marginal distributions could be available explicitly in closed parametric form or as a black-box sampling oracle.

Denote by P the collection of all multivariate probability measures p over Ω such that each component ξi has the fixed marginal distribution pi. That is,

P = { p : ∫_Ω I(ξi = θi) dp(ξ) = pi(θi), ∀ θi ∈ Ωi, i = 1, . . . , n },  (1.1)

where I(·) denotes the indicator function. Let p̄ ∈ P denote the product distribution, i.e., p̄(ξ) = ∏i pi(ξi) for all ξ.

Definition 1. The Correlation Gap of an instance (f, Ω, {pi}) is defined as the ratio

κ = sup_{p∈P} E_p[f(ξ)] / E_p̄[f(ξ)],

where P is the set of distributions given by Equation (1.1) and p̄ is the product distribution with the given marginals. We redefine κ = 1 if sup_{p∈P} E_p[f(ξ)] = E_p̄[f(ξ)] = 0, or if E_p̄[f(ξ)] = ∞.

Definition 2. The correlation gap of a function f : Ω → R+ is κ_f if the correlation gap of every instance (f, Ω, {pi}) is bounded by κ_f. That is,

κ_f = sup_{pi} sup_{p∈P({pi})} E_p[f(ξ)] / E_p̄[f(ξ)].
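To make the definition concrete, here is a small computational sketch (my own illustration, not from the thesis). For n = 2 binary random variables with marginals p1 = P(ξ1 = 1) and p2 = P(ξ2 = 1), every joint distribution in P is determined by the single parameter q = P(ξ1 = 1, ξ2 = 1), which ranges over the Fréchet interval [max(0, p1 + p2 − 1), min(p1, p2)]. Since E_p[f(ξ)] is linear in q, the supremum in Definition 1 is attained at an endpoint, so the correlation gap of such an instance can be computed exactly:

```python
def correlation_gap_2binary(f, p1, p2):
    """Correlation gap of f : {0,1}^2 -> R+ with Bernoulli marginals p1, p2.

    A joint distribution with these marginals is determined by
    q = P(xi1 = 1, xi2 = 1); the Frechet bounds give its feasible range.
    """
    def expect(q):
        probs = {(1, 1): q, (1, 0): p1 - q, (0, 1): p2 - q,
                 (0, 0): 1.0 - p1 - p2 + q}
        return sum(pr * f(xi) for xi, pr in probs.items())

    q_lo, q_hi = max(0.0, p1 + p2 - 1.0), min(p1, p2)
    worst = max(expect(q_lo), expect(q_hi))  # E is linear in q: check endpoints
    indep = expect(p1 * p2)                  # product (independent) distribution
    return worst / indep

gap = correlation_gap_2binary(lambda xi: max(xi), 0.5, 0.5)
print(gap)  # 1.3333... = 4/3, within the e/(e-1) ~ 1.58 bound of Theorem 1
```

Here the worst case is the anti-correlated coupling (q = 0), under which max(ξ1, ξ2) = 1 with probability 1, against 3/4 under independence.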

1.2 Price of Correlations (POC) in Stochastic Optimization

We define the Price of Correlations (POC) to quantify the robustness of the independence assumption in optimization under uncertainty. Decision making under uncertainty is usually investigated in the context of Stochastic Programming (SP) (e.g., see Ruszczynski and Shapiro (2003) and references therein). In SP, the decision maker optimizes the expected value of an objective function that involves random parameters. In general, a stochastic program is expressed as

(SP)  minimize_{x∈C} E[h(x, ξ)],  (1.2)


where x is the decision variable constrained to lie in a set C, and the random variable ξ ∈ Ω cannot be observed before the decision x is made. The cost function h(x, ξ) depends on both the decision x ∈ C and the random vector ξ ∈ Ω. If the underlying distribution of the random variables is unknown, then the decision maker needs to estimate it either via a parametric approach, which assumes the distribution has a certain closed form and fits its parameters to empirical data, or via a non-parametric approach, e.g., the Sample Average Approximation (SAA) method (e.g., Ahmed and Shapiro (2002), Ruszczynski and Shapiro (2003), Swamy and Shmoys (2005), Charikar et al. (2005)), which optimizes the average objective value over a set of samples. However, these models are suitable only when one has access to a significant amount of reliable time-invariant statistical information. If the available samples are insufficient to fit the parameters of the distribution or to accurately estimate the expected value of the cost function, then SP fails to address the problem.

One alternative approach is to instead optimize the worst-case outcome, which is usually easier to characterize than estimating the joint distribution. That is,

(RO)  minimize_{x∈C} maximize_{ξ∈Ω} h(x, ξ).  (1.3)

Such a method is termed Robust Optimization (RO) following the recent literature (e.g., Ben-Tal and Nemirovski (1998, 2000), Ben-Tal (2001)). However, such a robust solution is often too pessimistic compared to SP (e.g., see Ben-Tal and Nemirovski (2000), Bertsimas and Sim (2004), Chen et al. (2007)) because the worst-case scenario can be very unlikely. In particular, this model does not utilize any available or easy-to-estimate information about the distribution, such as the marginal distributions of the random variables.

An intermediate approach that may address the limitations of SP and RO is distributionally robust stochastic programming (DRSP). In this approach, one minimizes the expected cost over the worst joint distribution among all probability distributions consistent with the available information. That is,

(DRSP)  minimize_{x∈C} maximize_{p∈P} E_p[h(x, ξ)],  (1.4)


where P is the collection of possible probability distributions on Ω consistent with the marginal distribution information (refer to Equation (1.1)), and for any x ∈ C, E_p[h(x, ξ)] denotes the expected value of h(x, ξ) over a distribution p on ξ. The DRSP model can be interpreted as a two-person game: the decision maker chooses a decision x hoping to minimize the expected cost, while nature adversarially chooses a distribution p from the collection P to maximize the expected cost of that decision.

Our model for characterizing the price of correlations is based on this distributionally robust model of optimization. Given a problem instance (h, Ω, {pi}), let xI be the optimal solution of the stochastic program (1.2) assuming the independent (product) distribution, and let xR be the optimal decision for the DRSP problem (1.4). Then, the Price of Correlations (POC) compares the performance of xI to that of xR.

Definition 3. Given a problem instance (h, Ω, {pi}), where h : C × Ω → R+, for any x ∈ C define

g(x) = sup_{p∈P} E_p[h(x, ξ)].  (1.5)

Let xI = arg min_{x∈C} E_p̄[h(x, ξ)] and xR = arg min_{x∈C} g(x). Then the Price of Correlations (POC) is defined as

POC = g(xI) / g(xR).  (1.6)

We redefine POC = 1 if g(xI) = g(xR) = 0, or if g(xR) = ∞.

Here R+ denotes the set of non-negative real numbers. Note that POC ≥ 1, and POC = 1 corresponds to the case where the stochastic program with the product distribution yields the same result as the DRSP or minimax approach. A small upper bound on POC would indicate that the optimal solution obtained assuming independence is almost as robust as the most robust solution.

In many real data collection scenarios, practical constraints can make it very difficult (or costly) to learn complete information about the correlations in the data. In the absence of sufficient data, a widely adopted strategy in practice is to use the independent distribution as a simple substitute for the joint distribution. DRSP provides an alternative, optimization-based approach to this problem, using the worst-case distribution under the given marginals as a substitute for the joint distribution. However, a practitioner may suspect that the worst-case distribution is very pessimistic for the problem at hand. We believe that POC provides a conceptual understanding of the value of correlations in a decision problem involving uncertainty. It quantifies the gap between the two approaches of assuming no correlations and assuming worst-case correlations, providing a guiding principle for the decision maker. A small POC indicates that the solution obtained assuming independence is reasonably robust against correlations. On the other hand, if POC is very high, and from experience a practitioner expects the involved random variables to be not very correlated, she may decide that for this problem the DRSP approach is indeed very pessimistic, and that it is essential to invest in learning the joint distribution. In this case, even getting poor estimates of the correlations may be a better approach than ignoring the correlations or assuming the worst case.

1.3 Bounding POC via Correlation Gap

It is easy to show that a uniform bound on the correlation gap for all x bounds POC. Let κ(x) denote the correlation gap of the function h(x, ξ) at x, i.e.,

κ(x) = sup_{p∈P} E_p[h(x, ξ)] / E_p̄[h(x, ξ)],

and suppose that κ(x) ≤ κ for all feasible x. Then

g(xI) = sup_{p∈P} E_p[h(xI, ξ)],

g(xR) = sup_{p∈P} E_p[h(xR, ξ)] ≥ E_p̄[h(xR, ξ)] ≥ E_p̄[h(xI, ξ)],

where the last inequality holds because xI minimizes E_p̄[h(x, ξ)] over C. Hence,

POC = g(xI) / g(xR) ≤ sup_{p∈P} E_p[h(xI, ξ)] / E_p̄[h(xI, ξ)] = κ(xI) ≤ κ.
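This bound can be checked end-to-end on a toy instance (my own construction, with illustrative numbers, not from the thesis). Below, h(x, ξ) = x + 3·max(0, ξ1 + ξ2 − x) is a small newsvendor-style cost with binary demands and a capacity decision x ∈ {0, 1, 2}; the worst-case expectation over P is again computed via the Fréchet endpoints:

```python
# Toy instance: two binary demands with P(xi_i = 1) = 0.5, capacity
# decision x, unit capacity cost 1, unmet-demand penalty PEN per unit.
P1 = P2 = 0.5
PEN = 3.0
DECISIONS = (0, 1, 2)

def h(x, xi):
    return x + PEN * max(0, sum(xi) - x)

def joint(q):  # joint with P(xi1 = 1, xi2 = 1) = q and the fixed marginals
    return {(1, 1): q, (1, 0): P1 - q, (0, 1): P2 - q, (0, 0): 1.0 - P1 - P2 + q}

def expect(dist, x):
    return sum(pr * h(x, xi) for xi, pr in dist.items())

indep = joint(P1 * P2)
# E_p[h] is linear in q, so the sup over P is attained at a Frechet endpoint.
extremes = [joint(max(0.0, P1 + P2 - 1.0)), joint(min(P1, P2))]

def g(x):  # worst-case expected cost, as in Equation (1.5)
    return max(expect(d, x) for d in extremes)

x_I = min(DECISIONS, key=lambda x: expect(indep, x))  # optimum under independence
x_R = min(DECISIONS, key=g)                           # distributionally robust optimum
poc = g(x_I) / g(x_R)
kappa = max(g(x) / expect(indep, x) for x in DECISIONS)  # uniform correlation-gap bound
print(x_I, x_R, poc, kappa)  # 1 2 1.25 ~1.4286: POC <= kappa, as derived
```

On this instance the independence-optimal capacity x_I = 1 is hurt by the comonotone (both-demands-together) distribution, yet its worst-case cost exceeds the robust optimum by only the factor POC = 1.25, within the uniform correlation-gap bound κ.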


The rest of the thesis is organized as follows. In Chapter 2, we give formal statements of our main results on upper bounding the correlation gap (and POC), and provide rigorous proofs. In Chapter 3, we discuss applications of our upper bounds to stochastic and deterministic optimization problems. In Chapter 4, we provide lower bounds to demonstrate the tightness of our upper bounds.


Chapter 2

Upper bounds

2.1 Results

In this section, we present upper bounds on the correlation gap based on properties of the function f. Upper bounds on POC will be derived as corollaries for the corresponding stochastic optimization problems.

We assume that there is a complete ordering, denoted ≤, on each Ωi, i = 1, . . . , n. We also use ≤ to denote the induced product order on Ω; that is, for any ξ, θ ∈ Ω, ξ ≤ θ iff ξi ≤ θi for all i. Also, for any ξ, θ ∈ Ω, let sup{ξ, θ} and inf{ξ, θ} denote the supremum and infimum, respectively, of the two vectors taken with respect to the partial order ≤ on Ω. That is, (sup{ξ, θ})i = max{ξi, θi} and (inf{ξ, θ})i = min{ξi, θi} for all i.

2.1.1 Submodularity

Our first result is that the Correlation Gap (and POC) has a small upper bound for monotone submodular functions. Submodular functions are defined as follows:

Definition 4. A function f : Ω → R is submodular iff

f(sup{ξ, θ}) + f(inf{ξ, θ}) ≤ f(ξ) + f(θ), ∀ ξ, θ ∈ Ω.  (2.1)


Property (2.1) also appears as the Monge property for matrices in the literature. An n-dimensional matrix M with k columns in each dimension is a Monge matrix iff the function f : {1, . . . , k}^n → R defined to take values in the corresponding cells of the matrix M is submodular. For functions of binary variables f : 2^V → R, where V = {1, . . . , n}, the submodularity condition is equivalent to

f(S ∪ {i}) − f(S) ≥ f(T ∪ {i}) − f(T), ∀ S ⊆ T ⊆ V, i ∉ T.  (2.2)

A twice continuously differentiable function f : R^n → R is submodular iff

∇²ij f(ξ) ≤ 0, ∀ i ≠ j,  (2.3)

where ∇²f denotes the Hessian matrix of f.

Some popular examples of submodular functions of binary variables are the rank functions of matroids, and information measures like entropy, symmetric mutual information, and the information gain measure used in machine learning. Examples of continuous functions that satisfy this property are f(ξ) = maxi ξi, and the Lq norm f(ξ) = ||ξ||q for q ≥ 1 and ξ ∈ R^n_+.

Intuitively, submodularity of a utility function corresponds to decreasing marginal utilities: the marginal utility of an item decreases as the set of other items increases. The notion of submodularity is very similar to the gross-substitutes property, which is however stronger, in the sense that a utility function that satisfies the gross-substitutes property is always submodular (see Gul and Stacchetti (1999), Kelso Jr and Crawford (1982)).
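Condition (2.2) can be verified by brute force on small ground sets. The sketch below (my own illustration, not from the text) checks it for a coverage function, a standard example of a monotone submodular set function:

```python
from itertools import chain, combinations

def subsets(v):
    """All subsets of the ground set v, as tuples (including the empty set)."""
    return chain.from_iterable(combinations(v, r) for r in range(len(v) + 1))

def is_submodular(f, v):
    """Brute-force check of condition (2.2): decreasing marginal values."""
    for s in subsets(v):
        for t in subsets(v):
            if not set(s) <= set(t):
                continue
            for i in set(v) - set(t):
                fs, ft = set(s), set(t)
                if f(fs | {i}) - f(fs) < f(ft | {i}) - f(ft):
                    return False
    return True

# Coverage function: each element covers part of a universe, f(S) = |covered(S)|.
covers = {1: {'a', 'b'}, 2: {'b', 'c'}, 3: {'c', 'd'}}
f_cov = lambda s: len(set().union(*(covers[i] for i in s)))
V = (1, 2, 3)
print(is_submodular(f_cov, V))  # True: coverage functions are submodular
```

By contrast, a function with increasing marginals, such as f(S) = |S|², fails the same check.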

We will prove the following theorem:

Theorem 1. The correlation gap of a non-negative, monotone, and submodular function f is at most e/(e − 1).

Corollary 1. If the cost function h(x, ξ) is non-negative, monotone, and submodular in ξ for all feasible x, then for any instance (h, Ω, {pi}), POC ≤ e/(e − 1).

For the special case of Ω = {0, 1}^n, the above result for submodular functions can also be derived from a result in Calinescu et al. (2007). However, our more general result makes no assumption on the Ωi, and holds for general domains, e.g., the continuous case Ω = R^n and the discrete case Ω = N^n. To our understanding, the technique in Calinescu et al. (2007) cannot be easily applied to prove these general cases.

2.1.2 Cross-monotone cost-sharing property

Unfortunately, the condition of submodularity can be quite restrictive in practice. Many popular applications, such as stochastic facility location, stochastic Steiner tree network design, and stochastic scheduling, involve cost functions that are subadditive but not submodular in the random variables. Moreover, we demonstrate in Chapter 4 that there exist examples in the class of monotone subadditive (or fractionally subadditive) functions for which the correlation gap (and POC) can be arbitrarily large for large n. Therefore, it is apparent that in order to obtain interesting upper bounds on the correlation gap, we need a different characterization of functions: a property that relaxes submodularity but is more restrictive than subadditivity or fractional subadditivity.

We will derive our characterization using the concept of cross-monotone cost-sharing schemes from game theory. Before we describe this property, we further motivate the need for this characterization by demonstrating that other natural relaxations of submodularity are bound to fail for this problem. For simplicity, consider functions of binary variables, f : 2^V → R, V = {1, . . . , n}. Submodularity is equivalently defined for such functions by either of the following two inequalities:

f(A ∪ B) + f(A ∩ B) ≤ f(A) + f(B), ∀ A, B ⊆ V  (2.4)

f(T ∪ {i}) − f(T) ≤ f(S ∪ {i}) − f(S), ∀ S ⊆ T ⊆ V, i ∉ T  (2.5)

First, consider defining a notion of “approximately submodular” functions as functions that satisfy the inequality in Equation (2.4) within a factor of β. We would hope to obtain a class of functions with small correlation gap as the class of functions with a small constant value of β. However, note that for any monotone subadditive

function,

f(A ∪ B) + f(A ∩ B) ≤ f(A) + f(B) + f(A)/2 + f(B)/2 = (3/2)(f(A) + f(B)),

since f(A ∪ B) ≤ f(A) + f(B) by subadditivity, and f(A ∩ B) ≤ min{f(A), f(B)} ≤ (f(A) + f(B))/2 by monotonicity. Therefore, β ≤ 3/2 for all monotone subadditive functions. If the correlation gap could be bounded by O(β) for this class of functions, then all monotone subadditive functions would have a constant upper bound on correlation gap, which is a contradiction. Thus, this relaxation is too loose.

Alternatively, consider relaxing the inequality in Equation (2.5) by a factor of β. We show that this relaxation is not loose enough to include many interesting non-submodular functions. In particular, it is easy to construct instances of facility location and network design problems where β can be arbitrarily large. As an example, consider a facility location instance with two facilities F1, F2. Assume that both facilities are extremely expensive, so that an optimal solution will always contain only one of the two facilities. Let F1, F2 be at distance L from each other. Let c1, c2 be clients located very close (distance ε) to facility F1, and c3, c4 be clients located very close (distance ε) to facility F2. Let S = {c1}, T = {c3, c4}. Then f(S ∪ {c2}) − f(S) = ε and f(T ∪ {c2}) − f(T) = L, so that β = L/ε, which can be made arbitrarily large by increasing L or decreasing ε.
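This instance is easy to check numerically; the sketch below (with illustrative numbers of my own choosing) evaluates f as opening the single cheaper facility plus connection costs:

```python
# Facility location instance from the example above: two expensive
# facilities at distance L; c1, c2 sit at distance EPS from F1, and
# c3, c4 at distance EPS from F2 (hence distance ~L from the other facility).
FACILITY_COST = 1000.0  # large, so only one facility is ever opened
L, EPS = 10.0, 0.01
DIST = {  # client -> (distance to F1, distance to F2)
    'c1': (EPS, L), 'c2': (EPS, L),
    'c3': (L, EPS), 'c4': (L, EPS),
}

def f(clients):
    """Cost of serving the client set: open one facility plus connections."""
    if not clients:
        return 0.0
    return FACILITY_COST + min(sum(DIST[c][j] for c in clients) for j in (0, 1))

S, T = {'c1'}, {'c3', 'c4'}
beta = (f(T | {'c2'}) - f(T)) / (f(S | {'c2'}) - f(S))
print(beta)  # ~1000 = L / EPS: the relaxed inequality (2.5) fails badly
```

Raising L or shrinking EPS makes beta grow without bound, exactly as the argument above predicts.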

The above discussion demonstrates that directly relaxing the conditions for submodularity is not likely to yield a useful notion of “approximate submodularity” for our purpose. We propose using the concept of cross-monotone cost-sharing to extend the correlation gap bounds to a large and interesting class of non-submodular functions. A cost-sharing scheme refers to a scheme for dividing the cost f(ξ) among the n components.

Definition 5. Given the total cost f(ξ) of servicing ξ ∈ Ω, a cost allocation is a function Ψ : [n] → R+, which for every i = 1, . . . , n specifies the share Ψ(i) of i in the total cost f(ξ). A cost-sharing scheme χ : [n] × Ω → R+ is a collection of cost allocations for all ξ ∈ Ω.

We will often use the more compact notation χi(ξ) to denote the cost-share χ(i, ξ) of i in f(ξ).


Ideally, we want the cost-sharing scheme (and the corresponding cost allocations) to be budget balanced, that is, ∑i χi(ξ) = f(ξ) for all ξ, with the sum running over i = 1, . . . , n. However, it is not always possible to achieve budget balance in combination with other properties, so a relaxed notion of a β-budget-balanced cost-sharing scheme is often considered in the literature.

Definition 6. A cost-sharing scheme χ is β-budget-balanced if

f(ξ)/β ≤ ∑i χi(ξ) ≤ f(ξ), ∀ ξ ∈ Ω.

Since we are only interested in expected values, we will use a further relaxation of this concept, where we require the upper bound in the budget-balance condition to hold only in expectation. We define this property as follows.

Definition 7. A cost-sharing scheme χ is β-budget-balanced in expectation with respect to a distribution p over Ω if

f(ξ)/β ≤ ∑_{i=1}^{n} χi(ξ), ∀ξ ∈ Ω, and Ep[∑_{i=1}^{n} χi(ξ)] ≤ Ep[f(ξ)].

For our characterization of functions with small correlation gap, we are interested in cost-sharing schemes with the additional property of cross-monotonicity. Cross-monotonicity (or population monotonicity) was studied by Moulin (1999) and Moulin and Shenker (2001) in order to design group-strategyproof mechanisms, and has recently received considerable attention in the computer science literature (see, for example, Pal and Tardos (2003), Mahdian and Pal (2003), Konemann et al. (2005), Immorlica et al. (2008), Nisan et al. (2007) and references therein). This property captures the notion that an agent should not be penalized as the demands of other agents grow.

Definition 8. A cost-sharing scheme χ is cross-monotonic if for all i, ξi ∈ Ωi, and ξ−i, θ−i ∈ ∏_{j≠i} Ωj,

χi(ξi, ξ−i) ≥ χi(ξi, θ−i), if ξ−i ≤ θ−i.

We summarize the required cost-sharing properties of a function as the β-cost-sharing property.


Definition 9. A function f : ∏_i Ωi → R satisfies the β-cost-sharing property if, given any product distribution p on ∏_i Ωi, there exists a cost-sharing scheme for f that is (a) cross-monotonic and (b) β-budget-balanced in expectation with respect to the distribution p.

Theorem 2. If a function f is non-negative, monotone, and satisfies the β-cost-sharing property with β < ∞, then the correlation gap of f is at most 2β.

Remark 1. Note that any function f that has a β-budget-balanced cross-monotonic cost-sharing scheme satisfies the β-cost-sharing property. The property of β-budget-balance "in expectation" is a relaxation, the relevance of which will become clear when we discuss tightness of our upper bound condition in Section 4.2.

Remark 2. A monotone submodular function f has the 1-cost-sharing property. An example of a cross-monotonic 1-budget-balanced scheme for such a function is the incremental cost-sharing scheme, defined as

χi(ξ) = f(ξ1, . . . , ξi, 0, . . . , 0) − f(ξ1, . . . , ξi−1, 0, . . . , 0),

where 0 denotes the smallest element in the domain Ωj, for all j. However, the above theorem only bounds the correlation gap of such a function by 2, and thus does not achieve the bound of e/(e − 1) presented in the previous subsection. In the next subsection, we will provide a stricter (but more complicated and harder to achieve) property that will achieve the e/(e − 1) bound for submodular functions.
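As an illustration, the incremental scheme can be run on a tiny coverage function (a hypothetical instance; coverage functions are monotone submodular, so the scheme is exactly budget balanced):

```python
# Incremental cost-sharing on a small coverage function (monotone submodular).
# Hypothetical instance: element i covers COVER[i]; f(S) = |union of covered sets|.
COVER = {1: {"a", "b"}, 2: {"b", "c"}, 3: {"c"}}

def f(S):
    return len(set().union(*[COVER[i] for i in S])) if S else 0

def incremental_shares(S, order=(1, 2, 3)):
    """chi_i(S) = f(prefix up to i) - f(prefix before i), in the given order."""
    shares, prefix = {}, []
    for i in order:
        if i in S:
            before = f(prefix)
            prefix.append(i)
            shares[i] = f(prefix) - before
    return shares

shares = incremental_shares({1, 2, 3})
assert sum(shares.values()) == f({1, 2, 3})   # exactly budget balanced
# Cross-monotonic here: 2's share does not increase when 1 is added.
assert incremental_shares({2, 3})[2] >= shares[2]
```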

Corollary 2. If for every feasible x the cost function h(x, ξ) is non-negative, monotone in ξ, and satisfies the β-cost-sharing property for some β < ∞, then for any instance (h, Ω, {pi}), POC ≤ 2β.

Note that the above corollary requires that a β-cost-sharing scheme exist for every feasible x, where β does not depend on x. The above results connecting cross-monotone cost-sharing schemes to the correlation gap are particularly interesting since they allow us to use the existing non-trivial work on designing β-budget-balanced cross-monotone cost-sharing schemes for non-submodular cost functions such as those resulting from facility location, Steiner forest


network design, single-source rent-or-buy, minimum makespan scheduling, etc. (see Pal and Tardos (2003), Konemann et al. (2005), Gupta et al. (2007), Leonardi and Schaefer (2004), Bleischwitz and Monien (2009)). The following are some results that we can obtain as direct corollaries, using this existing literature on cross-monotonic cost-sharing schemes.

Corollary 3. Let f(S) denote the makespan of the optimal assignment of jobs in set S ⊆ V to m machines. Then the correlation gap of f is at most 4m/(m + 1) if either all the jobs have identical workloads or all the machines are identical. The correlation gap is at most 4d if there are d different workloads.

Corollary 4. Let f(S) denote the minimum cost of (metric) uncapacitated facility location for serving cities in set S ⊆ V. Then the correlation gap of f is at most 6, and POC ≤ 6 for the corresponding two-stage stochastic (metric) uncapacitated facility location problem.

Corollary 5. Let f(S) denote the minimum cost of a Steiner forest network connecting terminals in set S ⊆ V. Then the correlation gap of f is at most 4, and POC ≤ 4 for the corresponding two-stage stochastic Steiner forest problem.

We will discuss the formulations of two-stage stochastic problems further in Chapter 3. In the following subsections, we provide rigorous proofs for the upper bounds given by Theorem 1 and Theorem 2.

2.1.3 Cross-monotone cost-sharing with a prefix property

In this section, we consider functions of binary variables that have a cross-monotone cost-sharing scheme with an additional prefix property. This property will allow us to obtain a strict extension of the results based on submodularity. Since we are considering a binary random vector ξ, we will equivalently represent it by the corresponding random subset S of the set V = {1, . . . , n}. That is, Ω = 2^V, the set of all subsets of V, and for each i, pi is the probability that i appears in the random set S.

To define the prefix property of cost-sharing, we need to make explicit the concept of order-specific cost-sharing.


Definition 10. For any set S ⊆ V, we denote by χ(i, S, σS) the order-specific cost-share of i ∈ S, given an ordering σS on the elements of S. An example of an order-specific cost-sharing scheme is the incremental cost-share,

χ(iℓ, S, σS) = f(Sℓ) − f(Sℓ−1), (2.6)

where iℓ denotes the ℓth element, and Sℓ denotes the set of elements of rank 1 to ℓ in S, according to the ordering σS.

Definition 11. An order-specific cost-sharing scheme has the prefix property iff for all S, σS,

χ(iℓ, S, σS) = χ(iℓ, Sℓ, σSℓ),

where σSℓ is the restriction of the ordering σS to the elements of Sℓ. For example, the incremental cost-sharing scheme in Equation (2.6) has the prefix property.
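The prefix property of the incremental scheme can be verified mechanically on a toy monotone submodular function (hypothetical instance; `chi` below is the incremental cost-share of Equation (2.6)):

```python
from itertools import combinations

# Check Definition 11 for the incremental scheme on a toy monotone submodular
# function f(S) = min(|S|, 2), a hypothetical instance on V = {1, 2, 3}.
def f(S):
    return min(len(S), 2)

def chi(i, S, order):
    """Incremental cost-share chi(i_l, S, sigma_S) = f(S_l) - f(S_{l-1})."""
    ranked = [j for j in order if j in S]
    l = ranked.index(i) + 1
    return f(set(ranked[:l])) - f(set(ranked[:l - 1]))

V, order = {1, 2, 3}, (1, 2, 3)
for r in range(1, 4):
    for S in map(set, combinations(sorted(V), r)):
        ranked = [j for j in order if j in S]
        for l, i in enumerate(ranked, start=1):
            Sl = set(ranked[:l])   # prefix of S under sigma_S
            assert chi(i, S, order) == chi(i, Sl, order)  # prefix property
```

The check passes because the incremental share of iℓ depends only on the prefix Sℓ, which is exactly what the prefix property demands.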

Definition 12. We say that a function f : 2^V → R has the β-prefix-cost-sharing property if for every product distribution p, f has an order-specific cost-sharing scheme χ(i, S, σS) with

1. Expected β-budget balance: for all S and orderings σS on S,

f(S)/β ≤ ∑_{i∈S} χ(i, S, σS), and Ep[∑_{i∈S} χ(i, S, σS)] ≤ Ep[f(S)].

2. Cross-monotonicity: for all i ∈ S, S ⊆ T, σS ⊆ σT,

χ(i, S, σS) ≥ χ(i, T, σT).

Here, σS ⊆ σT means that the ordering of the elements of S is the same in σS and σT, i.e., σS is the restriction of the ordering σT to the subset S.

3. Prefix property.

For submodular functions, the incremental cost-sharing scheme χ(iℓ, S, σS) = f(Sℓ) − f(Sℓ−1) discussed earlier is a β-prefix-cost-sharing scheme with β = 1. Another example is the class of summable cost-sharing schemes (see Roughgarden and Sundararajan


(2006)). We will bound the correlation gap for functions of binary variables that have the β-prefix-cost-sharing property.

Theorem 3. If a function f : 2^V → R is non-negative, monotone, and satisfies the β-prefix-cost-sharing property with β < ∞, then the correlation gap is at most βe/(e − 1).

Note that the result in Theorem 1 for binary random variables can be derived as

a corollary of Theorem 3, since β = 1 for monotone submodular functions.

2.2 Proofs

2.2.1 Proof for binary random variables

In this subsection, we will equivalently represent a binary random vector ξ by the corresponding random subset S of the set V = {1, . . . , n}. That is, Ω = 2^V, the set of all subsets of V, and for each i, the marginal probability pi will denote the probability that i appears in the random set S.

For submodular functions on binary variables, a bound of e/(e − 1) on the correlation gap can be derived from results (Lemma 4 and Lemma 5) in Calinescu et al. (2007).

Theorem (Lemma 4 and Lemma 5 in Calinescu et al. (2007)): For any instance (f, 2^V, {pi}), if f is non-negative, non-decreasing and submodular, the correlation gap is bounded above by e/(e − 1).
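For intuition on why e/(e − 1) is the right constant, a standard example (not from the text above, but classical) is f(S) = 1 if S ≠ ∅ and 0 otherwise, with uniform marginals 1/K; a partition into singletons drives the ratio toward e/(e − 1):

```python
import math

# f(S) = 1 if S is nonempty, else 0 (monotone submodular), marginals p_i = 1/K.
# Correlated "partition" distribution: S = {k} with probability 1/K each.
K = 20
p = 1.0 / K
worst_case = 1.0                      # every support set of the partition is nonempty
independent = 1.0 - (1.0 - p) ** K    # P(S nonempty) under the product distribution
gap = worst_case / independent

assert gap <= math.e / (math.e - 1) + 1e-9
assert gap > 1.55                     # already close to e/(e-1) ~ 1.582 at K = 20
```

As K grows, (1 − 1/K)^K → 1/e and the ratio tends to e/(e − 1), showing the bound cannot be improved.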

At the end of this section, we prove the bound of βe/(e − 1) for functions on binary variables with β-prefix-cost-sharing. The corresponding result for submodular functions will also follow as a corollary of this result by substituting β = 1. First, we present the proof of the correlation gap bound of 2β for functions on binary random variables that admit a β-cost-sharing scheme.

Lemma 1. For any instance (f, 2^V, {pi}), if f is non-negative, non-decreasing, and satisfies the β-cost-sharing property, the correlation gap is bounded above by 2β.


Proof. First, we consider a simplified problem. We assume that (a) all pi are equal to 1/K for some finite integer K > 0, and (b) the worst case distribution is a "K-partition-type" distribution. That is, the worst case distribution has support on K disjoint sets {A1, . . . , AK} that form a partition of V, and each Ak occurs with probability 1/K. Let us call such instances (f, 2^V, {1/K}) "nice" instances. Here, we show that the correlation gap is bounded by 2β for all "nice" instances. In Lemma 2, we show that it is sufficient to consider only the "nice" instances, which completes the proof.

For any set S ⊆ V, denote S∩k = S ∩ Ak and S−k = S\Ak, for k = 1, . . . , K. Let χ be the β-cost-sharing scheme for the function f, as per the assumptions of the lemma. Also, for any subset T of S, denote χ(T, S) := ∑_{i∈T} χi(S). Then, by the budget balance property of χ, the expected value under the independent distribution satisfies

ES[f(S)] ≥ ES[∑_{k=1}^{K} χ(S∩k, S)]. (2.7)

Note that under the independent distribution, the marginal probability that an element i ∈ Ak appears in the random set S∩k is 1/K. Using this observation along with cross-monotonicity of the cost-sharing scheme χ and properties of the independent distribution, we can derive that for any k,

ES[χ(S∩k, S)] ≥ ES[χ(S∩k, S ∪ Ak)]
= ES[∑_{i∈Ak} I(i ∈ S∩k) χ(i, S−k ∪ Ak)]
= ES−k[∑_{i∈Ak} ES∩k[I(i ∈ S∩k) χ(i, S−k ∪ Ak) | S−k]]
= (1/K) E[∑_{i∈Ak} χ(i, S ∪ Ak)]
= (1/K) E[χ(Ak, S ∪ Ak)]. (2.8)

Here, I(·) denotes the indicator function. Apply the above inequality to a γ = 1/(2 − 1/K) fraction of each term χ(S∩k, S) in (2.7) to obtain

ES[f(S)] ≥ ES[∑_{k=1}^{K} ((1 − γ) χ(S∩k, S) + γ (1/K) χ(Ak, S ∪ Ak))]
= ES[∑_{k=1}^{K} ((1 − γ)/(K − 1)) (∑_{j≠k} χ(S∩j, S)) + γ (1/K) χ(Ak, S ∪ Ak)]
= (1/(2K − 1)) ES[∑_{k=1}^{K} (∑_{j≠k} χ(S∩j, S)) + χ(Ak, S ∪ Ak)]
≥ (1/(2K − 1)) ES[∑_{k=1}^{K} (∑_{j≠k} χ(S∩j, S ∪ Ak)) + χ(Ak, S ∪ Ak)]   (using cross-monotonicity of χ)
≥ (1/((2K − 1)β)) ES[∑_{k=1}^{K} f(S ∪ Ak)]   (using β-budget balance)
≥ (1/((2 − 1/K)β)) ((1/K) ∑_{k=1}^{K} f(Ak))   (using monotonicity of f).

Under the assumption of a "nice" instance, the expected value under the worst case distribution is given by (1/K) ∑_{k=1}^{K} f(Ak). Therefore, the correlation gap is bounded by 2β for nice instances. Lemma 2 shows that it is sufficient to consider only the nice instances, and completes the proof.

Lemma 2. For every instance (f, 2^V, {pi}) such that f is non-decreasing and satisfies the β-cost-sharing property, there exists a nice instance (f′, 2^{V′}, {1/K}) for some integer K > 0 such that f′ is non-decreasing and satisfies the β-cost-sharing property, and the correlation gap of the instance (f′, 2^{V′}, {1/K}) is at least as large as that of (f, 2^V, {pi}). We make a technical assumption that all pi are rational and non-zero.

Proof. We use the following split operation.

Split: Given a problem instance (f, 2^V, {pi}) and integers mi ≥ 1, i ∈ V, the split operation defines a new instance (f′, 2^{V′}, {p′r}) as follows: split each item i ∈ V into mi copies C^i_1, C^i_2, . . . , C^i_{mi}, and assign a marginal probability of p′_{C^i_j} = pi/mi to each copy. Let V′ denote the new ground set of size ∑_i mi that contains all the duplicates. Define the new cost function f′ : 2^{V′} → R as:

f′(S′) = f(Π(S′)), for all S′ ⊆ V′, (2.9)

where Π(S′) ⊆ V is the set of original elements whose duplicates appear in S′, i.e., Π(S′) = {i ∈ V | C^i_j ∈ S′ for some j ∈ {1, 2, . . . , mi}}.
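A minimal sketch of the split operation, assuming a dictionary of marginals `p` and copy counts `m` (hypothetical instance):

```python
# Sketch of the split operation: each item i becomes m[i] copies, each with
# marginal p_i / m_i, and f'(S') = f(Pi(S')), where Pi maps copies back to
# their original items. The instance below is hypothetical.
def split_instance(f, p, m):
    copies = [(i, j) for i in p for j in range(m[i])]           # ground set V'
    p_new = {c: p[c[0]] / m[c[0]] for c in copies}              # p'_{C^i_j}
    def f_new(S_prime):
        Pi = {i for (i, _) in S_prime}       # originals with some copy in S'
        return f(Pi)
    return f_new, p_new

f = lambda S: len(S)                         # hypothetical modular cost
f2, p2 = split_instance(f, {1: 0.5, 2: 0.25}, m={1: 2, 2: 1})
assert f2({(1, 0), (1, 1), (2, 0)}) == f({1, 2})
assert abs(sum(p2.values()) - 0.75) < 1e-12  # total marginal mass preserved
```

Because f′ only looks at which originals have at least one copy present, duplicates of the same item are interchangeable, which is what the claims below exploit.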


We claim that splitting an instance

• does not change the expected value under worst case distribution.

• can only decrease the expected value under independent distribution.

• preserves the monotonicity and β-cost-sharing property of the function.

The proofs of these claims appear in Appendix B.1.

Thus, the new instance generated by splitting has correlation gap at least as large as the original instance. The remainder of the proof uses the split operation to reduce any given instance to a "nice" instance. Let p be the worst case distribution for the instance (f, 2^V, {pi}). The set of distributions P is defined by a linear program with rational data and |V| constraints, so P is compact and p exists. Suppose that p is not a partition-type distribution. Then, split any element i that appears in two different sets in the support of the worst case distribution. Simultaneously, split the distribution by assigning probability p′(S′) = p(Π(S′)) to each set S′ that contains exactly one copy of i, and probability 0 to all other sets. Since each set in the support of the new distribution contains exactly one copy of every element i, by the definition of the function f′, the expected value of f′ under p′ is the same as that of f under p. By the properties of the split operation, the worst case expected values for the two instances (before and after splitting) must be the same, so the distribution p′ forms a worst case distribution for the new instance. Repeat the splitting until the distribution becomes a partition-type distribution. Then, we further split each element (and simultaneously the distribution) until the marginal probability of each new element is 1/K for some large enough integer K. Note that such a finite K always exists assuming the pi are rational and non-zero.

Lemma 3. For any instance (f, 2^V, {pi}), if f is non-negative, non-decreasing, and satisfies the β-prefix-cost-sharing property, the correlation gap is bounded above by βe/(e − 1).

Proof. In this proof, we first assume "nice" instances, as in the proof for functions with β-cost-sharing. Let the optimal K-partition corresponding to the worst case distribution be {A1, A2, . . . , AK}. Assume w.l.o.g. that f(A1) ≥ f(A2) ≥ . . . ≥ f(AK). Also assume w.l.o.g. that the elements of Ak come before those of Ak−1, i.e., AK = {1, . . . , |AK|}, AK−1 = {|AK| + 1, . . . , |AK| + |AK−1|}, and so on.

Let χ(i, S, σS) be an order-specific β-prefix-cost-sharing scheme for the function f, as per the assumptions of the lemma. We will only be interested in orderings σS obtained by restricting the fixed ordering 1, . . . , n to the elements of S, so we abbreviate χ(i, S, σS) to χ(i, S) for simplicity of notation in the remaining proof. For any subset S ⊆ V, let Sl be the restriction of S to its smallest l elements, and let il denote the lth element of S.

Then, by the prefix property and the expected budget-balance of χ with respect to the product distribution ∏_i pi:

ES⊆V[f(S)] ≥ ES⊆V[∑_{l=1}^{|S|} χ(il, Sl)], (2.10)

where the expected value is taken over the product distribution.

Denote φ(V) := ES⊆V[∑_{l=1}^{|S|} χ(il, Sl)]. Let q = 1/K. We will show that

φ(V) ≥ (1 − q) φ(V\A1) + (1/β) q f(A1).

Using this inequality recursively will prove the result. To prove this inequality, denote S−1 = S ∩ (V\A1) and S1 = S ∩ A1, for any S ⊆ V. Since the elements of A1 come after the elements of V\A1, note that for any ℓ ≤ |S−1|, Sℓ ⊆ S−1, and for ℓ > |S−1|, iℓ ∈ S1.

φ(V) = ES[∑_{l=1}^{|S−1|} χ(il, Sl)] + ES[∑_{l=|S−1|+1}^{|S|} χ(il, Sl)]. (2.11)

Since Sℓ ⊆ S ∪ A1, using cross-monotonicity of χ, the second term above can be bounded as:

ES[∑_{l=|S−1|+1}^{|S|} χ(il, Sl)] ≥ ES[∑_{l=|S−1|+1}^{|S|} χ(il, S ∪ A1)]. (2.12)

Because S−1 and S1 are mutually independent, for any fixed S−1, each i ∈ A1 has the same conditional probability q = 1/K of appearing in S1. Therefore,

ES[∑_{l=|S−1|+1}^{|S|} χ(il, S ∪ A1)] = ES−1[ES1[∑_{l=|S−1|+1}^{|S|} χ(il, S−1 ∪ A1) | S−1]]
= q ES−1[∑_{i∈A1} χ(i, S−1 ∪ A1)]. (2.13)

Again, using independence and cross-monotonicity, we analyze the first term on the right hand side of (2.11):

ES[∑_{l=1}^{|S−1|} χ(il, Sl)] = ES−1[∑_{l=1}^{|S−1|} χ(il, Sl)]
≥ (1 − q) ES−1[∑_{l=1}^{|S−1|} χ(il, Sl)] + q ES−1[∑_{l=1}^{|S−1|} χ(il, S−1 ∪ A1)]
= (1 − q) φ(V\A1) + q ES−1[∑_{l=1}^{|S−1|} χ(il, S−1 ∪ A1)]. (2.14)

Based on (2.11), (2.13) and (2.14), and the fact that the cost-sharing scheme χ is β-budget balanced, we deduce

φ(V) ≥ (1 − q) φ(V\A1) + q ES−1[∑_{l=1}^{|S−1|} χ(il, S−1 ∪ A1) + ∑_{i∈A1} χ(i, S−1 ∪ A1)]
≥ (1 − q) φ(V\A1) + (1/β) q ES−1[f(S−1 ∪ A1)]
≥ (1 − q) φ(V\A1) + (1/β) q f(A1). (2.15)

The last inequality follows from the monotonicity of f. Expanding the above recursive inequality for A2, . . . , AK, we get

φ(V) ≥ (1/β) q ∑_{k=1}^{K} (1 − q)^{k−1} f(Ak). (2.16)

Since f(Ak) is decreasing in k and q = 1/K, by simple arithmetic one can show

φ(V) ≥ (1/β) · (∑_{k=1}^{K} q f(Ak)) · (∑_{k=1}^{K} (1 − q)^{k−1})/K ≥ (1/β) · (1 − 1/e) · ∑_{k=1}^{K} q f(Ak).

By the definition of φ(V) and (2.10), this gives

κ ≤ βe/(e − 1),

where κ denotes the correlation gap of the nice instance.
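The "simple arithmetic" here rests on the identity q ∑_{k=1}^{K} (1 − q)^{k−1} = 1 − (1 − q)^K and the bound (1 − 1/K)^K ≤ 1/e; a quick numerical check (not part of the proof):

```python
import math

# Verify q * sum_{k=1}^{K} (1-q)^(k-1) = 1 - (1-q)^K >= 1 - 1/e for q = 1/K.
for K in (1, 2, 5, 50, 1000):
    q = 1.0 / K
    s = q * sum((1 - q) ** (k - 1) for k in range(1, K + 1))
    assert abs(s - (1 - (1 - q) ** K)) < 1e-9   # geometric series sum
    assert s >= 1 - 1 / math.e - 1e-12          # since (1 - 1/K)^K <= 1/e
```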


Next, we show that it is sufficient to consider nice instances. We use the "Split" operation as in Lemma 2 to reduce any instance to a nice instance. As in Lemma 2, we use the observation that for any monotone function, monotonicity is preserved and the correlation gap can only become larger on splitting (see Properties 1, 4, 5 in Appendix B.1).

It remains to show that the new "nice" instance also has the required cost-sharing property. Given that the function f in the original instance has the β-prefix-cost-sharing property, we show that for any fixed ordering consistent with the ordering AK, . . . , A1 on the items in the new (nice) instance and the product distribution with marginals 1/K, there exists a cost-sharing method χ′ such that χ′ (a) is expected β-budget balanced, (b) has the prefix property, and (c) is cross-monotone in the following weaker sense: χ′ is cross-monotone for any S′ ⊆ T′ such that S′ is a partial-prefix of T′, that is, for some k ∈ {1, . . . , K}, S′ ⊆ AK ∪ · · · ∪ Ak and T′\S′ ⊆ Ak ∪ · · · ∪ A1.

We note that this weaker version of cross-monotonicity for the new instance is actually sufficient for the above proof. To see this, observe that cross-monotonicity is used only in Equations (2.12) and (2.14), and at both of these places the partial-prefix property is satisfied.

The construction of this cost-sharing scheme χ′ is given in Appendix B.1, Property 6.

Remark 3. In the above proofs, we assumed for simplicity that all pi are rational and non-zero. In Appendix B.2, we show that the results hold even if the pi are not rational. The assumption that each pi is non-zero is without loss of generality, since we could simply remove an item in V from the problem if its probability of appearing in a random set is 0.

2.2.2 Proof for finite domains

Lemma 4. Consider any instance (f, Ω, {pi}), f : Ω → R+, where Ω = ∏_i Ωi and for all i, |Ωi| < ∞. Then, the correlation gap of the instance (f, Ω, {pi}) is bounded by

• e/(e − 1), if f(ξ) is monotone and submodular,

• 2β, if f(ξ) is monotone and satisfies the β-cost-sharing property.

Proof. We prove this lemma by first reducing the problem to a problem with binary random variables only. Then, the results in the previous subsection bounding the correlation gap for instances with binary random variables complete the proof.

W.l.o.g., let Ωi = {0, . . . , Ki}, where Ki < ∞. Given an instance (f, Ω, {pi}), we create a new instance (f′, Ω′, {p′ij}) with binary variables only as follows. For every variable ξi in the original instance, we create Ki new binary variables {ξ′ij}_{j=1}^{Ki} in the new instance, and set the marginal probability of ξ′ij taking value 1 as p′ij = pi(j). Also, given f : ∏_i Ωi → R+, we define a new function f′ : ∏_{i=1}^{n} {0, 1}^{Ki} → R as

f′(ξ′) := f(Θ(ξ′)),

where for i = 1, . . . , n, the value of Θ(ξ′)i is given by the largest j for which ξ′ij is non-zero. That is, for all i,

Θ(ξ′)i = max{j : ξ′ij = 1},

with the convention that max{j : ξ′ij = 1} returns 0 if none of ξ′ij, j = 1, . . . , Ki, is 1.
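The map Θ can be sketched directly from its definition (hypothetical indices; `K[i]` is the number of levels of coordinate i):

```python
# Theta maps a binary vector xi' (indexed by pairs (i, j)) back to the
# multi-valued vector xi, taking the largest "on" level j per coordinate
# (0 if no level is on).
def theta(xi_prime, K):
    return tuple(
        max((j for j in range(1, K[i] + 1) if xi_prime[(i, j)] == 1), default=0)
        for i in range(len(K))
    )

K = [3, 2]   # Omega_1 = {0, ..., 3}, Omega_2 = {0, ..., 2}
xi_prime = {(0, 1): 1, (0, 2): 0, (0, 3): 1, (1, 1): 0, (1, 2): 0}
assert theta(xi_prime, K) == (3, 0)
```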

Next, we compare the problem instance (f, ∏_i Ωi, {pi}) to the reduced 0-1 instance (f′, ∏_{i=1}^{n} {0, 1}^{Ki}, {p′ij}). We show that this reduction preserves monotonicity, submodularity, β-cost-sharing, and the expected value over the worst case distribution, while the expected value over the independent distribution can only decrease. The proofs of these claims use very similar ideas as in the proof of Lemma 2; see Appendix C for details.

Thus, we get a new instance (f′, Ω′, {p′r}) where Ω′ = {0, 1}^{n′}, f′ is monotone and has a β-cost-sharing scheme (or is submodular), and the correlation gap of the new instance bounds the correlation gap of the original instance. Since the correlation gap of the new 0-1 instance is bounded as required by the results in the previous subsection, this completes the proof of the lemma.


2.2.3 Proof for countably infinite domains

Lemma 5. For any instance (f, Ω, {pi}), f : Ω → R+, where Ω = Ω1 × · · · × Ωn is a countable (possibly infinite) set, the correlation gap is bounded by

• e/(e − 1), if f(ξ) is monotone and submodular,

• 2β, if f(ξ) is monotone and satisfies the β-cost-sharing property.

In this section, we prove the above lemma using the corresponding result in Lemma 4 for finite domains.

Symbols and Notations. W.l.o.g., let Ωi = {0, 1, 2, . . .}, 0 being the smallest element. Consider truncated sets Ω^t_i, formed by truncating each Ωi to its smallest t + 1 elements, i.e., Ω^t_i = {0, . . . , t}. Denote Ω^t = Ω^t_1 × · · · × Ω^t_n. Throughout the proof, we will use ξ to denote a random vector from the infinite space Ω, and θ to denote a random vector from the truncated domain Ω^t. We also define a truncation operator τi : Ωi → Ω^t_i, which replaces every ξi greater than t by 0, i.e.,

τi(ξi) = 0 if ξi > t, and τi(ξi) = ξi otherwise, ∀ξi ∈ Ωi.

Note that this operator always replaces an element by a smaller (or equal) element. The corresponding generalization to n-dimensional vectors is τ : Ω → Ω^t,

τ(ξ) = (τ1(ξ1), . . . , τn(ξn)), ∀ξ ∈ Ω.

Now, we define the truncated marginal probability distribution over each Ω^t_i as:

p^t_i(θi) = ∫_{Ωi} I(τi(ξi) = θi) dpi(ξi), ∀θi ∈ Ω^t_i.

Note that by this definition, the probability mass for ξi > t is shifted to ξi = 0 in the new distribution p^t_i.
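The truncation operator is straightforward to sketch (hypothetical values):

```python
# tau_i sends any value above the truncation level t to 0; tau applies it
# coordinate-wise, so tau(xi) <= xi in every coordinate.
def tau(xi, t):
    return tuple(x if x <= t else 0 for x in xi)

assert tau((0, 3, 7, 5), t=5) == (0, 3, 0, 5)
assert all(a <= b for a, b in zip(tau((9, 1), t=2), (9, 1)))  # never increases
```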


Let p∗ denote the worst case joint distribution with marginals {pi}. That is,

p∗ = arg max_{p∈P} Ep[f(ξ)].

For finite domains, it is easy to show that the set of distributions P is compact, and hence p∗ exists. In general, one may consider p∗ to be a distribution in P such that Ep∗[f(ξ)] approximates sup_{p∈P} Ep[f(ξ)] arbitrarily well.

Next, we define three joint distributions (p∗)^t, (p^t)∗, and p^t on Ω^t, all with marginal distributions {p^t_i}.

• (p∗)^t is the truncated version of p∗, defined as

(p∗)^t(θ) = ∫_{Ω} I(τ(ξ) = θ) dp∗(ξ), ∀θ ∈ Ω^t.

It is easy to check that (p∗)^t has marginals p^t_i.

• (p^t)∗ is defined as the worst case (expectation maximizing) distribution over Ω^t such that the marginal distributions are given by {p^t_i}.

• p^t is the independent (product) distribution with marginals p^t_i.

Also, define a truncated function f^t by setting its value at ξ to 0 if any ξi is greater than t, i.e.,

f^t(ξ) = f(ξ) if τ(ξ) = ξ, and f^t(ξ) = 0 otherwise.

Proof Outline. For the truncated domain Ω^t, Lemma 4 for finite domains already proves that

E_{(p^t)∗}[f(θ)] / E_{p^t}[f(θ)] ≤ κ, ∀t, (2.17)

where κ = e/(e − 1) for monotone submodular functions, and κ = 2β < ∞ for functions with a β-cost-sharing scheme. We want to prove that when t → ∞, both the numerator and the denominator converge to the corresponding expected values on the infinite domain Ω; furthermore, for the worst case distribution, we want to show that the expected value over the infinite domain is finite (for the product distribution, we have assumed that the expected value is finite). Specifically, we aim to prove the following two limits:

lim_{t→∞} E_{p^t}[f(θ)] = Ep[f(ξ)] (2.18)

and

lim_{t→∞} E_{(p^t)∗}[f(θ)] = Ep∗[f(ξ)] < ∞. (2.19)

Once (2.18) and (2.19) are proved, the result in (2.17) for the finite domain can be extended to the infinite domain, i.e.,

lim_{t→∞} E_{(p^t)∗}[f(θ)] / E_{p^t}[f(θ)] = (lim_{t→∞} E_{(p^t)∗}[f(θ)]) / (lim_{t→∞} E_{p^t}[f(θ)]) = Ep∗[f(ξ)] / Ep[f(ξ)] ≤ κ. (2.20)

We will first prove (2.18), and then use similar techniques to prove (2.19).

Proof of Equation (2.18). It is difficult to prove (2.18) directly, because the truncated probability distribution p^t changes with t. Instead, we will prove the convergence using the squeeze theorem. Specifically, we will show that

Ep[f^t(ξ)] ≤ E_{p^t}[f(θ)] ≤ Ep[f(ξ)]. (2.21)

Then, since f^t is non-decreasing in t and f^t → f pointwise, by the Lebesgue monotone convergence theorem we have that Ep[f^t(ξ)] → Ep[f(ξ)] as t → ∞. Then, (2.18) follows from (2.21) using the squeeze theorem (also known as the sandwich theorem) for limits.

To prove (2.21), observe that based on our definition of f^t and the truncation operator τ, it follows that

Ep[f^t(ξ)] = ∫_{Ω} f^t(ξ) dp(ξ)
= ∫_{Ω} I(τ(ξ) = ξ) f(ξ) dp(ξ)
≤ ∫_{Ω} I(τ(ξ) = ξ) f(ξ) dp(ξ) + ∫_{Ω} I(τ(ξ) ≠ ξ) f(τ(ξ)) dp(ξ)
= ∫_{Ω} f(τ(ξ)) dp(ξ)
= E_{p^t}[f(θ)],

and

Ep[f(ξ)] = ∫_{Ω} f(ξ) dp(ξ) ≥ ∫_{Ω} f(τ(ξ)) dp(ξ) = ∫_{Ω^t} f(θ) dp^t(θ) = E_{p^t}[f(θ)].

Thus, (2.21) is proved, and the result in (2.18) follows from the squeeze theorem.
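The sandwich (2.21) can also be checked numerically on a toy two-dimensional instance (hypothetical: geometric-like marginals on {0, . . . , N} and f(ξ) = ξ1 + ξ2):

```python
# Numeric check of the sandwich (2.21) in two dimensions for f(xi) = xi_1 + xi_2
# with truncated-geometric marginals (a hypothetical example).
N, t, q = 40, 3, 0.5
Z = sum(q ** k for k in range(N + 1))
pmf = lambda x: q ** x / Z                       # marginal on {0, ..., N}
trunc = lambda x: x if x <= t else 0             # tau_i

lhs = mid = rhs = 0.0
for x1 in range(N + 1):
    for x2 in range(N + 1):
        w = pmf(x1) * pmf(x2)                    # product distribution
        fx = x1 + x2
        ft = fx if (x1 <= t and x2 <= t) else 0  # f^t(xi)
        f_tau = trunc(x1) + trunc(x2)            # f(tau(xi)): pushforward to p^t
        lhs += w * ft
        mid += w * f_tau
        rhs += w * fx

assert lhs <= mid <= rhs   # E_p[f^t] <= E_{p^t}[f] <= E_p[f]
```

Pointwise, f^t(ξ) ≤ f(τ(ξ)) ≤ f(ξ) (the middle term is exactly the pushforward under τ), which is why the three expectations are ordered.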

Proof of Equation (2.19). Similarly, we use the squeeze theorem to prove Equation (2.19). We show that

Ep∗[f^t(ξ)] ≤ E_{(p∗)^t}[f(θ)] ≤ E_{(p^t)∗}[f(θ)] ≤ Ep∗[f(ξ)] < ∞. (2.22)

We observe that

Ep∗[f^t(ξ)] = ∫_{Ω} f^t(ξ) dp∗(ξ)
= ∫_{Ω} I(τ(ξ) = ξ) f(ξ) dp∗(ξ)
≤ ∫_{Ω} f(τ(ξ)) dp∗(ξ)
= ∫_{Ω^t} f(θ) (∫_{Ω} I(τ(ξ) = θ) dp∗(ξ))
= ∫_{Ω^t} f(θ) d(p∗)^t(θ)
= E_{(p∗)^t}[f(θ)],

which proves the first inequality in (2.22) from the left.

The second inequality in (2.22), i.e., E_{(p∗)^t}[f(θ)] ≤ E_{(p^t)∗}[f(θ)], follows simply from the fact that both (p∗)^t and (p^t)∗ have marginal distributions {p^t_i}, and (p^t)∗ is defined as the expectation maximizing distribution with these marginals.

For the third inequality, given the distribution (p^t)∗ on the truncated domain Ω^t, we define a distribution p̄ on Ω such that the probability on {ξ : τ(ξ) = θ} sums to (p^t)∗(θ): for any ξ such that τ(ξ) = θ, we assign p̄(ξ) = (p^t)∗(θ) ∏_i pi(ξi) / ∏_i p^t_i(θi). It is easy to verify that p̄ has marginals {pi}. Then

E_{(p^t)∗}[f(θ)] = ∫_{Ω^t} f(θ) d(p^t)∗(θ)
= ∫_{Ω} f(τ(ξ)) dp̄(ξ)
≤ ∫_{Ω} f(ξ) dp̄(ξ)
≤ Ep∗[f(ξ)],

where the last inequality holds because p̄ has marginals {pi} and p∗ maximizes the expected value over all such distributions.

We are left to show that all these integrals have finite values and that Ep∗[f^t(ξ)] → Ep∗[f(ξ)]. We first observe that by (2.17) and (2.18), E_{(p^t)∗}[f(θ)] is uniformly upper bounded via the expected value of f over the product distribution on the countably infinite domain Ω, and this expected value is finite by our assumption. Specifically,

E_{(p^t)∗}[f(θ)] ≤ κ E_{p^t}[f(θ)] → κ Ep[f(ξ)] < ∞. (2.23)

Now, since f^t is monotone in t and f^t → f pointwise, by the Lebesgue monotone convergence theorem Ep∗[f^t(ξ)] → Ep∗[f(ξ)]. Applying the squeeze theorem to (2.22), this implies E_{(p^t)∗}[f(θ)] → Ep∗[f(ξ)], and from (2.23), Ep∗[f(ξ)] < ∞. This completes the proof.

2.2.4 Proof for uncountable domains Ω ⊆ R^n

Lemma 6. For any instance (f, Ω, {pi}), f : Ω → R+, where Ω = Ω1 × · · · × Ωn ⊆ R^n, the correlation gap is bounded by

• e/(e − 1), if f(ξ) is monotone and submodular,

• 2β, if f(ξ) is monotone and has a β-cost-sharing scheme.

We also make the technical assumption that f has a finite expected value under the product distribution.

Proof. First, let us assume that Ω = (0, 1)^n and each pi is the uniform distribution on the interval (0, 1). Then, since the function f : Ω → R+ is monotone, the set of all points of discontinuity of f has Lebesgue measure 0 (Lavric (1993)). That is,

∫_{discont(f)} dξ = 0.

Therefore, for any joint probability distribution p with uniform marginals {pi},

∫_{discont(f)} dp(ξ) ≤ ∫_{discont(f)} p1(ξ1) dξ = ∫_{discont(f)} dξ = 0.

Therefore, f is continuous almost everywhere with respect to any joint distribution p with marginals {pi}, which includes in particular the worst case joint distribution p∗ and the independent distribution p.

Let us now define a lattice ΓL = ℓi2−Ln

i=1, ℓi = 1, . . . , 2L − 1,∀i, and a cor-

responding lattice function fL(ξ) = f(maxθ ∈ ΓL, θ ≤ ξ). Note that since f

is a non-decreasing function, fL(ξ) is non-decreasing in L. By our construction of

ΓL, there will be a sequence of lattice points θL → ξ, where θL ∈ ΓL. So, if f is

continuous at ξ, then fL(ξ) → f(ξ) as L → ∞. Therefore, we have that fL converges

to limit f almost everywhere (with respect to probability measures p∗ and p). So, by

the Lebesgue monotone convergence theorem,

E_{p∗}[f_L] → E_{p∗}[f],    (2.24)

E_p[f_L] → E_p[f].    (2.25)

Next, for a fixed L, we can define truncated probability distributions (with support restricted to the lattice Γ_L) p_i^L, (p∗)^L, (p^L)∗, p^L, in a manner similar to the previous section on countable domains. The truncation operator is now defined as

τ(ξ) = max{θ ∈ Γ_L : θ ≤ ξ}.

The result for each fixed lattice size then follows from the result for finite domains. That is, by Lemma 4,

E_{(p^L)∗}[f(θ)] / E_{p^L}[f(θ)] ≤ κ,

where κ = e/(e − 1) if f is monotone and submodular, and κ = 2β if f is monotone and has a β-cost-sharing scheme. Convergence to the infinite lattice can then be proven in a manner similar to Subsection 2.2.3, using (2.24) and (2.25) along with the squeeze theorem.
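As a quick numerical illustration of the lattice truncation above (a toy sketch; the test function, the boundary handling at 0, and all names are illustrative choices, not from the thesis):

```python
import math

def truncate(xi, L):
    # tau(xi): round each coordinate of xi down to the dyadic grid of
    # spacing 2^-L (boundary handling at 0 is ignored in this toy sketch)
    step = 2.0 ** -L
    return tuple(math.floor(x / step) * step for x in xi)

def f(xi):
    # an illustrative monotone (and submodular) test function on (0, 1)^n
    return max(xi)

def f_L(xi, L):
    return f(truncate(xi, L))

xi = (0.37, 0.81)
vals = [f_L(xi, L) for L in range(1, 12)]
assert all(a <= b for a, b in zip(vals, vals[1:]))  # f_L non-decreasing in L
assert abs(vals[-1] - f(xi)) < 1e-3                 # f_L(xi) -> f(xi)
```

Since the dyadic floor of each coordinate only increases as L grows and f is non-decreasing, the sequence f_L(ξ) increases toward f(ξ) at every continuity point, mirroring the monotone convergence argument.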

The remainder of the proof demonstrates that, w.l.o.g., we can assume that the marginal distribution of each variable is uniform on (0, 1). Otherwise, the following simple transformation can be made so that each transformed marginal variable has a uniform distribution. The idea is to replace each variable by its cumulative probability. More precisely, let F_i(ξ_i) denote the cumulative distribution function corresponding to p_i. Replace each random variable ξ_i by ξ'_i uniformly distributed on (0, 1), and replace the function f by f' defined as f'(ξ') = f(ξ), where

ξ_i = F_i^{−1}(ξ'_i) = inf{r ∈ Ω_i : F_i(r) ≥ ξ'_i}.

It is easy to verify that monotonicity and cost-sharing/submodularity properties are preserved by this transformation (consider the new cost-shares χ'_i(ξ') = χ_i(ξ)). Also, observe that for any joint distribution p' on (0, 1)^n with uniform marginals, there exists a corresponding distribution p on Ω that has marginals p_i and the same expected value, and vice versa. For example, for any given p', consider

p(ξ) = ∫_{(0,1)^n} I(F_i^{−1}(ξ'_i) = ξ_i, ∀i) p'(ξ') dξ'.

And, for any given p, consider any distribution that spreads the mass at ξ uniformly over the set {ξ' : F_i^{−1}(ξ'_i) = ξ_i, ∀i}. Therefore, the expected value under the worst case distribution does not change as a result of this transformation. Below, we show that the expected value under the independent distribution does not change either:

∫_{(0,1)^n} f'(ξ') dξ' = ∫_Ω f(ξ) ( ∫_{(0,1)^n} I(F^{−1}(ξ') = ξ) dξ' )

= ∫_Ω f(ξ) ( ∏_{i=1}^{n} ∫_{(0,1)} I(F_i^{−1}(ξ'_i) = ξ_i) dξ'_i )

= ∫_Ω f(ξ) ∏_{i=1}^{n} p_i(ξ_i) dξ

= E_p[f(ξ)].


Chapter 3

Applications

3.1 Approximation of Distributionally Robust Stochastic Optimization

The DRSP model, also known as a minimax stochastic program, was proposed as early as 1958 by Scarf (Scarf (1958)) and later by Zackova (1966) for modeling robust decision making under limited distribution information.

In this approach, one minimizes the expected cost over the worst joint distribution

among all probability distributions consistent with the available information. That

is,

(DRSP)    minimize_{x∈C} maximize_{p∈P} E_p[h(x, ξ)],    (3.1)

where P is a collection of possible probability distributions on Ω, and for any x ∈ C, E_p[h(x, ξ)] denotes the expected value of h(x, ξ) over a distribution p on ξ. The DRSP model can be interpreted as a two-person game: the decision maker chooses a decision x hoping to minimize the expected cost, while nature adversarially chooses a distribution p from the collection P to maximize the expected cost of that decision.

Since its introduction, the model has attracted extensive interest (e.g., see Dupacova (1987), Shapiro and Kleywegt (2002), Dupacova (2001) and references therein). The inner maximization problem has also been studied as a moment problem (e.g., in Rogosinski (1958) and in Landau (1987)). The applicability of various existing results depends

on the assumed form of set P and the properties of objective function h. In Lagoa and

Barmish (2002) and in Shapiro (2006), the authors consider a set containing unimodal

distributions that satisfy some given support constraints. Under some conditions on

h(x, ξ), they characterize the worst distribution as being the uniform distribution.

The most popular type of distributional set P imposes linear constraints on moments

of the distribution, as considered in Scarf (1958), in Dupacova (1987), in Prekopa

(1995), in Bertsimas et al. (2000) and in Bertsimas and Popescu (2005). Scarf (1958)

exploited the fact that for the newsvendor problem the worst distribution of demand

with given mean and variance could be chosen to be one with all of its weight on two

points. This idea was reused in Yue et al. (2006), and in Popescu (2007), although for

more general forms of objective function. More recently, Delage and Ye (2010) showed

that if the distributions have a fixed mean, bounded (by the positive semidefinite

partial order) covariance matrix, and are supported on a closed convex set, then

DRSP can be reformulated as a semidefinite program, solvable in polynomial time.

Goh and Sim (2010) developed tractable approximations to the DRSP problem by restricting to linear decision rules. Global optimization methods for computing the worst case distribution have been suggested in Ermoliev et al. (1985) and in Gaivoronski (1991). Shapiro and Ahmed (2004) use duality to reduce any minimax program with

given moment constraints to a minimization-type stochastic program, and suggest using the Sample Average Approximation (SAA) method to solve this stochastic program.

They do not explicitly derive polynomial time bounds on the sample size required by

the SAA method, which may depend on the objective function and constraints of the

formulated stochastic program.

In this work, we are interested in the Distributionally Robust Stochastic Program

under given marginal distribution constraints. That is, P denotes the set of all joint distributions on Ω = Ω_1 × · · · × Ω_n such that the marginal distribution on each Ω_i is p_i (refer to Equation (1.1) in Chapter 1).
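For intuition, in the simplest case of two binary random variables the set P of joint distributions with fixed marginals is a one-parameter family (a Fréchet class), so the inner maximization can be scanned directly. The following is only an illustrative sketch (the test function and marginals are my choices, not from the thesis):

```python
import math

def worst_case_expectation(f, p1, p2, grid=10001):
    """Maximize E_p[f(xi1, xi2)] over joint distributions of two Bernoulli
    variables with marginals P(xi_i = 1) = p_i.  Any such joint is pinned
    down by q = P(xi1 = 1, xi2 = 1), which ranges over the Frechet interval
    [max(0, p1 + p2 - 1), min(p1, p2)]."""
    lo, hi = max(0.0, p1 + p2 - 1.0), min(p1, p2)
    best = -float("inf")
    for k in range(grid):
        q = lo + (hi - lo) * k / (grid - 1)
        e = (q * f(1, 1) + (p1 - q) * f(1, 0)
             + (p2 - q) * f(0, 1) + (1.0 - p1 - p2 + q) * f(0, 0))
        best = max(best, e)
    return best

# monotone submodular test function: f counts whether any variable is 1
f = lambda a, b: min(a + b, 1)
worst = worst_case_expectation(f, 0.5, 0.5)                 # attained at q = 0
indep = sum(f(a, b) for a in (0, 1) for b in (0, 1)) / 4.0  # product distribution
assert worst <= math.e / (math.e - 1) * indep + 1e-9        # correlation gap bound
```

For this submodular f the worst case is anti-correlated (q = 0) and the gap between the worst-case and product-distribution expectations stays within the e/(e − 1) bound proved in Chapter 2.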

The DRSP models closest to ours are those considered in Klein Haneveld (1986)

and Bertsimas et al. (2005). Klein Haneveld (1986) considers the problem of finding


the worst case distribution for a PERT-type project planning problem under given

marginal distributions. Due to the special structure of this application, the dual problem

can be reduced to a finite dimensional convex program. Bertsimas et al. (2005) study

the worst case expectation of optimal value of generic combinatorial optimization

problems with random objective coefficients, under limited marginal distribution in-

formation. They show that tight upper and lower bounds on this worst case expected

value can be computed in polynomial time, under certain conditions. They also an-

alyze the asymptotic behavior of this expected value under knowledge of complete

marginal distributions.

In general, computing the worst case joint distribution or the corresponding ex-

pected value is difficult. The DRSP problem cannot be solved to optimality in poly-

nomial time, especially when the support of the distributions is restricted. Bertsimas

and Popescu (2005) show that the problem is NP-hard if moments of third or higher

order are given, or if moments of second order are given and the domain of each ran-

dom variable is restricted to non-negative real numbers. For binary random variables,

i.e., if the domain is restricted to {0, 1}, the problem of finding the worst case distribution

with given marginals is equivalent to finding worst case distribution with given mean

or first order moments. We show that this problem is NP-hard even when restricted

to objective functions that are monotone and submodular in the random variable. In

fact, we prove a stronger result that this problem is hard to approximate within any

reasonable factor even with specific assumptions on the objective function.

Theorem 4. Given a function of n binary random variables, the problem of com-

puting its expected value under worst case distribution with given mean is NP-hard,

even when restricted to functions that are monotone and submodular in the random

variables. The problem cannot be approximated within a factor better than O(1/√n) in

polynomial time for some monotone and subadditive functions.

The proof of NP-hardness is based on the observation that even though the prob-

lem of finding worst case distribution with given mean can be formulated as a linear

program (with exponential number of variables), the problem of computing a sepa-

rating hyperplane for this linear program is at least as hard as MAX-CUT problem.


The proof appears in Appendix A.

Considering the difficulty of computing the worst case distribution, a natural

question is how much risk it involves to simply ignore the correlations and, given

marginal distributions pi, minimize the expected cost under the independent or

product distribution

p(ξ) = ∏_i p_i(ξ_i)

instead of the worst case distribution. Or, in other words, how well does the stochastic optimization model with the independent distribution approximate the robust DRSP model? The “price of correlations” defined in this work quantifies exactly this approximation factor. Given a problem instance (h, Ω, {p_i}), let x^I be the optimal decision of

stochastic optimization problem with objective h, assuming independent (product)

distribution. Then, by definition, Price of Correlations (POC ) is the approximation

factor that x^I achieves for the corresponding DRSP problem (refer to Equation (1.6)).

A stochastic optimization problem with the independent (product) distribution is often relatively easy to solve, either by sampling or by other algorithmic techniques

(e.g., see Kleinberg et al. (1997), Mohring et al. (1999)). Thus, a small upper bound

on POC would yield a simple approximation technique for the DRSP problem proven

earlier to be difficult to solve.

3.1.1 Stochastic Uncapacitated Facility Location (SUFL)

In the two-stage stochastic facility location problem, any facility j ∈ F can be bought at a low cost w^I_j in the first stage, or at a higher cost w^II_j > w^I_j in the second stage, that is, after the random set S ⊆ V of cities to be served is revealed. The decision maker’s problem is to decide x ∈ {0, 1}^{|F|}, the facilities to be built in the first stage, so that

the total expected cost E[h(x, S)] of facility location is minimized (refer to Swamy

and Shmoys (2005) for further details on the problem definition).

Proposition 1. For the two-stage stochastic metric uncapacitated facility location

(SUFL) problem, POC ≤ 6.

Proof. Given a first stage decision x, the cost function is h(x, S) = w^I · x + c(x, S),


where c(x, S) is the cost of the deterministic UFL for the set S ⊆ V of customers and the set F of facilities, such that the facilities x already bought in the first stage are available freely at no cost, while any other facility j costs w^II_j. For deterministic metric UFL there

exists a cross-monotonic, 3-budget balanced, cost-sharing scheme (Pal and Tardos

(2003)). Therefore, using Theorem 2, we know that the POC for stochastic metric

UFL has an upper bound of 2β = 6.

The above proposition reduces the distributionally robust facility location problem to

the well-studied (e.g., see Swamy and Shmoys (2005)) stochastic UFL problem un-

der known (independent Bernoulli) distribution at the expense of a 6-approximation

factor.

3.1.2 Stochastic Steiner Tree (SST)

In the two-stage stochastic Steiner tree problem, we are given a graph G = (V,E).

An edge e ∈ E can be bought at cost w^I_e in the first stage. A random set S ⊆ V of

terminal nodes to be connected are revealed in the second stage. More edges may be

bought at a higher cost w^II_e, e ∈ E, in the second stage after observing the actual set

of terminals. Here, decision variable x is the edges to be bought in the first stage,

and the cost function is h(x, S) = w^I · x + c(x, S), where c(x, S) is the deterministic Steiner

tree cost for connecting nodes in set S, given that the edges in x are already bought.

Proposition 2. For the two-stage stochastic Steiner tree (SST) problem, POC ≤ 4.

Proof. Since a 2-budget balanced cross-monotonic cost-sharing scheme is known for

deterministic Steiner tree (see Konemann et al. (2005)), we can use Theorem 2 to

conclude that for this problem POC ≤ 2β = 4.

The above proposition reduces the distributionally robust stochastic Steiner tree problem to the well-studied (for example, see Gupta et al. (2004)) SST problem under

known (independent Bernoulli) distribution at the expense of a 4-approximation fac-

tor.


3.1.3 Stochastic bottleneck matching

Consider a graph G = (V, E) and let M denote the set of all perfect matchings in G. Associate a cost ξ_ij ∈ R_+ with every edge (i, j) ∈ E. The bottleneck matching problem (Derigs (1980)) is to find a perfect matching of minimum cost, where the cost of a matching is determined by the most expensive edge in the matching. Formally,

minimize_{σ∈M} maximize_{(i,j)∈σ} ξ_ij.

In the stochastic version of this problem, the edge costs ξij are random variables,

and the objective is to find the matching σ that minimizes the expected cost E[h(σ, ξ)], where h(σ, ξ) = max_{(i,j)∈σ} ξ_ij. It is easy to verify that for every fixed matching σ, the

objective function h(σ, ξ) is non-decreasing and submodular in ξ. Therefore, applying

Theorem 1 for submodular functions of continuous random variables (Ω = R^n), we

obtain the following result.

Proposition 3. For stochastic bottleneck matching, POC ≤ e/(e − 1).

Thus, if correlations are unknown, the random variables ξ_ij can be assumed to be independent to get an e/(e − 1) approximation for the corresponding (DRSP) model.

3.2 Deterministic Optimization

3.2.1 d-dimensional maximum matching

A d-dimensional matching is a generalization of bipartite matching (a.k.a. 2-dimensional

matching) to d-uniform hypergraphs. A d-uniform hypergraph consists of d disjoint

sets of vertices V_1, . . . , V_d. The set of hyperedges is given by E = V_1 × V_2 × · · · × V_d, and

each edge is associated with a non-negative weight given by a d-dimensional weight matrix W. For an edge (i_1, . . . , i_d) ∈ E, we denote its weight by W[i_1, . . . , i_d]. A set

of edges M ⊆ E forms a d-dimensional matching if every vertex appears in at most

one edge in M. The d-dimensional maximum matching problem is to find a matching M

of maximum weight. Assuming w.l.o.g. that |V1| = · · · = |Vd| = T , the problem is


formulated as,

max_x Σ_{1≤i_1,...,i_d≤T} W[i_1, . . . , i_d] x[i_1, . . . , i_d]

s.t. Σ_{i_1,...,i_d : i_j = t} x[i_1, . . . , i_d] = 1, for 1 ≤ j ≤ d, 1 ≤ t ≤ T,

x[i_1, . . . , i_d] ∈ {0, 1}, for all i_1, . . . , i_d,    (3.2)

where W [i1, . . . , id] denotes the weight of hyperedge (i1, . . . , id), and x[i1, . . . , id] de-

notes the decision whether to include the hyperedge (i1, . . . , id) in the matching.

Every node should be included in exactly one hyperedge in a matching.

d-dimensional maximum matching is a notoriously hard problem. To date the best

approximation known for general d is 2/d (Hurkens and Schrijver (1989), Berman

2000)). Also, it is known that no polynomial time algorithm can achieve an approximation factor of Ω(ln(d)/d) unless P = NP (Hazan et al. (2006)).

Our result on correlation gap will provide a (1 − 1/e)-approximate greedy algo-

rithm for this problem when the weight matrix W is monotone and satisfies “Monge

property”.

Definition 13. A d-dimensional matrix W is said to be a “Monge matrix” if for any two sets of indices (i_1, . . . , i_d), (j_1, . . . , j_d),

W[i_1, . . . , i_d] + W[j_1, . . . , j_d] ≥ W[max{i_1, j_1}, . . . , max{i_d, j_d}] + W[min{i_1, j_1}, . . . , min{i_d, j_d}].

For a 2-dimensional matrix, this simply means that for any 2 × 2 submatrix, the sum of the main diagonal entries is less than or equal to the sum of the off-diagonal entries. Also, observe that the condition is the same as the condition for submodularity of the function W : {1, . . . , T}^d → R.
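Definition 13 is easy to check mechanically for small matrices. Here is a toy verifier for the 2-dimensional case (an illustrative helper, not from the thesis):

```python
from itertools import product

def is_monge(W):
    # checks Definition 13 for a 2-dimensional matrix W (list of lists):
    # W[i1][j1] + W[i2][j2] >= W[max(i),max(j)] + W[min(i),min(j)] for all pairs
    rows, cols = len(W), len(W[0])
    for i1, i2 in product(range(rows), repeat=2):
        for j1, j2 in product(range(cols), repeat=2):
            hi = W[max(i1, i2)][max(j1, j2)]
            lo = W[min(i1, i2)][min(j1, j2)]
            if W[i1][j1] + W[i2][j2] < hi + lo:
                return False
    return True

assert is_monge([[0, 1], [1, 2]])      # W[i][j] = i + j is modular, hence Monge
assert not is_monge([[0, 0], [0, 1]])  # main diagonal 0 + 1 > off-diagonal 0 + 0
```

The second matrix is W[i][j] = i·j, which is supermodular; it violates the 2 × 2 diagonal condition exactly as described above.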

The Monge property has been studied extensively for the d-dimensional minimum matching and transportation problems (e.g., in Burkard et al. (1996), Bein et al. (1995)), in which case it characterizes precisely the instances of the minimization problem that are solvable by the simple greedy algorithm that at every step adds the minimum weight edge compatible with the matching created so far. However, in the case


of maximum matching, it is easy to show that the approximation factor of the greedy algorithm can be as bad as O(1/n) even under the Monge property. To our knowledge, ours is the first result showing that the maximum matching problem has a constant-factor approximation under monotonicity and the Monge property. We show interesting applications

in display advertising where the weight matrix satisfies this property.

We first show that the expected weight of a uniformly random matching is smaller than the weight of the optimal matching by at most a factor equal to the correlation gap of the weight function. A greedy algorithm will then be developed by derandomizing this randomized algorithm.

Proposition 4. The expected weight of a uniformly random matching is at least 1/κ of the maximum d-dimensional matching, where κ is the correlation gap of the function W(i_1, . . . , i_d). In particular, κ ≤ e/(e − 1) if the matrix W is monotone in (i_1, . . . , i_d) and satisfies the Monge property.

Proof. Observe that on relaxing the integrality constraints of (3.2) and scaling the variables by T, we get exactly the problem of finding the worst case joint distribution for the random vector (i_1, . . . , i_d), given that the marginal distribution of each variable i_j is the uniform distribution on {1, . . . , T}. Therefore, a solution generated by sampling from the product of uniform distributions gives a 1/κ approximation to this problem. Now, since the probability of any edge (i_1, . . . , i_d) under the product of uniform distributions is the same as the probability of that edge in a uniformly random matching, the claim follows.

When W is a monotone Monge matrix, the function W(i_1, . . . , i_d) is monotone and submodular in (i_1, . . . , i_d); therefore κ ≤ e/(e − 1).

A greedy algorithm (Algorithm 1) can be obtained by derandomizing this random

choice.

Proposition 5. The weight of the matching produced by greedy Algorithm 1 is at least 1/κ of the maximum matching.

Proof. We show that the weight of the matching produced by the greedy algorithm is greater than or equal to the expected weight of a uniformly random matching. Note that


Algorithm 1 A greedy algorithm for d-dimensional matching

1. Initialize the matching M = ∅. Let V̂_j denote the set of nodes not matched by any edge in the matching constructed so far. Initialize V̂_j = V_j for all j = 1, . . . , d.

2. For t = 1, . . . , T:

• Find the edge (i∗_1, . . . , i∗_d) ∈ V̂_1 × · · · × V̂_d that maximizes a combination of its own weight and the average weight of the remaining edges. That is,

(i∗_1, . . . , i∗_d) = arg max_{(i_1,...,i_d) ∈ V̂_1×···×V̂_d} W[i_1, . . . , i_d] + (T − t) · AVG(V̂_1 − {i_1}, . . . , V̂_d − {i_d}),

where AVG(V̂_1, . . . , V̂_d) denotes the average weight of the edges in V̂_1 × · · · × V̂_d.

• Add the edge (i∗_1, . . . , i∗_d) to the matching M.

• Update V̂_j = V̂_j − {i∗_j}, j = 1, . . . , d.

3. Output the matching M.
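Algorithm 1 can be transcribed directly for tiny instances (this sketch recomputes the average by brute force at every step, so it is exponential in d; all names are illustrative):

```python
from itertools import product
from statistics import mean

def avg_weight(W, parts):
    # AVG(V^_1, ..., V^_d): average weight over the remaining edge set
    edges = list(product(*parts))
    return mean(W[e] for e in edges) if edges else 0.0

def greedy_matching(W, T, d):
    remaining = [list(range(T)) for _ in range(d)]  # the sets V^_j
    matching = []
    for t in range(1, T + 1):
        def score(e):
            rest = [[v for v in remaining[j] if v != e[j]] for j in range(d)]
            return W[e] + (T - t) * avg_weight(W, rest)
        best = max(product(*remaining), key=score)
        matching.append(best)
        remaining = [[v for v in remaining[j] if v != best[j]] for j in range(d)]
    return matching

# toy monotone Monge instance: W[e] = sum(e) is modular, hence submodular
T, d = 3, 2
W = {e: float(sum(e)) for e in product(range(T), repeat=d)}
M = greedy_matching(W, T, d)
assert all(len({e[j] for e in M}) == T for j in range(d))  # perfect matching
assert sum(W[e] for e in M) == 6.0  # every perfect matching has weight 6 here
```

On the modular instance every perfect matching has the same weight, so the sketch only checks feasibility and the bookkeeping of the V̂_j sets.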

the expected weight of a uniformly random matching is T · AVG(V_1, . . . , V_d). The claim then follows from the observation that for any V̂_1, . . . , V̂_d with |V̂_1| = · · · = |V̂_d| = r,

max_{i_1,...,i_d} { W[i_1, . . . , i_d] + (r − 1) · AVG(V̂_1 − {i_1}, . . . , V̂_d − {i_d}) } ≥ r · AVG(V̂_1, . . . , V̂_d),

since the left hand side is at least the average of the bracketed quantity over a uniformly random edge (i_1, . . . , i_d), which equals the right hand side. The claim in Proposition 4 completes the proof.

3.2.2 Welfare maximization

Consider the problem of maximizing the total utility achieved by partitioning a set V of n goods among K players, all with identical utility function f(S) for a subset S of goods. The optimal welfare OPT is obtained by the following integer program:

max_α Σ_S α_S f(S)

s.t. Σ_{S : i∈S} α_S = 1, ∀i ∈ V,

Σ_S α_S = K,

α_S ∈ {0, 1}, ∀S ⊆ V.    (3.3)


The welfare maximization problem recently attracted significant attention in com-

puter science literature (e.g., Feige (2006), Mirrokni et al. (2008), Vondrak (2008),

Nisan et al. (2007) and references therein). A more general formulation of this prob-

lem that is often considered in the literature allows non-identical utility functions for

different players. The problem is known to be NP-hard even for the case of identical

monotone utility functions. Also, the problem cannot be approximated in polynomial time within a factor better than 1 − 1/e + ε for identical submodular utility functions, and 1/√n for identical subadditive utility functions¹ (Mirrokni et al. (2008)). A

1− 1/e approximation is known for non-decreasing and submodular utility functions

(Calinescu et al. (2007), Vondrak (2008)). In this section, we show that a simple

greedy algorithm gives an approximation factor bounded by correlation gap of func-

tion f for the welfare maximization problem with identical utility functions f . This

will match the known approximation result for identical submodular utilities and give

new approximation results for some interesting non-submodular utility functions.

Observe that on relaxing the integrality constraints on α and scaling it by 1/K, the problem in (3.3) reduces to that of finding the worst-case distribution α∗ such that the marginal probability Σ_{S : i∈S} α_S of each element i ∈ V is 1/K. Therefore,

OPT ≤ E_{α∗}[K f(S)].

Consequently, our correlation gap bounds lead to the following corollary for welfare

maximization problems:

Proposition 6. For the welfare maximization problem (3.3) with identical utility functions f, the expected utility achieved by the randomized algorithm that assigns each good to a player chosen uniformly at random is at least 1/κ of the optimal, where κ is the correlation gap of f.

Derandomizing the above randomized algorithm leads to the following simple greedy algorithm.

¹Under the assumption of a “demand oracle”, that is, assuming that the NP-hard problem max_S f(S) − Σ_{i∈S} λ_i can be solved exactly for any fixed λ, a 2-approximation is known for non-decreasing subadditive functions (Feige (2006)).


Algorithm 2 Greedy algorithm for welfare maximization with identical utility

1. Let A_k denote the goods assigned to player k so far. Initialize A_k = ∅ for all k = 1, . . . , K.

2. For i = 1, . . . , n:

• Assign good i to the player j who gives the maximum expected increment in utility, given the already existing assignment, i.e.,

j = arg max_k E[f(S ∪ A_k ∪ {i}) − f(S ∪ A_k)],

where S is a random subset of the remaining goods {i + 1, . . . , n}, and each remaining good appears in S independently with probability 1/K.

• Update A_j = A_j ∪ {i}.

3. Output the partition A_1, . . . , A_K.

Proposition 7. For the welfare maximization problem (3.3) with identical utility functions f, the total utility of the partition formed by greedy Algorithm 2 is at least 1/κ of the optimal, where κ is the correlation gap of f.

Proof. Let S^i_k denote the random subset of goods among {i + 1, . . . , n} that player k gets in a random partition formed by assigning each good independently to one of the K players with probability 1/K. Let A^i_1, . . . , A^i_K denote the partition of goods {1, . . . , i} after step i of the greedy algorithm. We prove that after step i, Σ_k E[f(S^i_k ∪ A^i_k)] is greater than or equal to the expected utility achieved by the randomized algorithm that assigns goods uniformly at random. The claim will then follow from Proposition 6.

Clearly, in the beginning, when A^0_k = ∅, the claim is trivially true. Assume that it is true at step i − 1. Then,


Σ_k E[f(S^{i−1}_k ∪ A^{i−1}_k)] = Σ_{j=1}^{K} (1/K) ( Σ_{k≠j} E[f(S^i_k ∪ A^{i−1}_k)] + E[f(S^i_j ∪ A^{i−1}_j ∪ {i})] )

≤ Σ_{k≠j∗} E[f(S^i_k ∪ A^{i−1}_k)] + E[f(S^i_{j∗} ∪ A^{i−1}_{j∗} ∪ {i})]

= Σ_k E[f(S^i_k ∪ A^i_k)],

where A^i_{j∗} = A^{i−1}_{j∗} ∪ {i}, A^i_k = A^{i−1}_k for k ≠ j∗, and the choice of j∗ is given by

j∗ = arg max_j ( Σ_{k≠j} E[f(S^i_k ∪ A^{i−1}_k)] + E[f(S^i_j ∪ A^{i−1}_j ∪ {i})] )

= arg max_j E[f(S^i_j ∪ A^{i−1}_j ∪ {i}) − f(S^i_j ∪ A^{i−1}_j)]

= arg max_j E[f(S ∪ A^{i−1}_j ∪ {i}) − f(S ∪ A^{i−1}_j)],

where S denotes a random subset of goods {i + 1, . . . , n} with each good appearing in S independently with probability 1/K. The last equality follows from the observation that for any fixed k and any fixed subset R ⊆ {i + 1, . . . , n}, Pr(S^i_k = R) = Pr(S = R). Since the above choice of j∗ is exactly the choice of j made in our greedy algorithm, this completes the proof.

For submodular functions, our correlation gap bounds lead to a (1 − 1/e) approximation factor that matches the approximation factor proven earlier in the literature

(Calinescu et al. (2007), Vondrak (2008)) for the case of identical monotone submodu-

lar functions. Further, the cost-sharing based criterion extends the result to problems

with non-submodular functions not previously studied in the literature.

Corollary 6. For welfare maximization with identical non-decreasing submodular

utility functions, greedy Algorithm 2 is a (1 − 1/e)-approximate algorithm.


Chapter 4

Lower bounds

In this chapter, we demonstrate the tightness of our upper bounds on the correlation gap and POC.

In the first section, we provide examples of functions that are monotone super-

modular and monotone subadditive respectively in the random variables, but could

still have arbitrarily large correlation gap if n is large. This illustrates the importance

of characterization using techniques like cost-sharing, in order to get upper bounds

as in Theorem 2 that do not depend on n. Also, we provide examples of instances

with correlation gap close to the upper bounds provided by our upper bound theo-

rem for cost functions like facility location, and Steiner forest that have approximate

cross-monotonic cost-sharing schemes. We also show tightness of our upper bound

for submodular functions via a counter-example.

In the second section, we prove a tightness result of a stronger nature: we prove that any problem instance that has correlation gap β will have a β-cost-sharing scheme.

Thus, our cost-sharing based condition is a near-tight characterization of functions

with small correlation gap.


Figure 4.1: Example with an exponential correlation gap (a single source s connected to a node u, with unit-capacity edges from u to the sinks t_1, t_2, . . . , t_n).

4.1 Lower bounds by examples

In this section, we provide examples of instances with large correlation gap. Note

that since POC is bounded by correlation gap, a lower bound on POC implies cor-

responding lower bound on correlation gap, and it is sufficient to provide examples

for the former.

4.1.1 Supermodular functions

Lemma 7. There exists an instance (h, 2^V, {p_i}) with a function h(x, S) that is non-decreasing and supermodular in S, and POC ≥ Ω(2^n). Here n = |V|.

Proof. Consider a two-stage minimum cost flow problem as in Figure 4.1. There is a single source s and n sinks t_1, t_2, . . . , t_n. Each sink t_i demands a flow with probability p_i = 1/2, in which case a unit flow has to be sent from s to t_i. Each edge (u, t_i) has a fixed capacity 1, but the capacity of edge (s, u) needs to be purchased. The cost of capacity x on edge (s, u) is c^I(x) in the first stage, and c^II(x) in the second stage after the set of demands is revealed, defined as

c^I(x) = x for x ≤ n − 1, c^I(n) = n + 1;    c^II(x) = 2^n x.

Given a first stage decision x, the total cost of edges that need to be bought in order to serve a set S of requests is given by h(x, S) = c^I(x) + c^II((|S| − x)^+) = c^I(x) + 2^n (|S| − x)^+. It is easy to check that h(x, S) is non-decreasing and supermodular in S for any given x, i.e., h(x, S ∪ {i}) − h(x, S) ≥ h(x, T ∪ {i}) − h(x, T) for any S ⊇ T.

The objective is to minimize the total expected cost E[h(x, S)]. If the decision maker assumes independent demands from the sinks, then x^I = n − 1 minimizes the expected cost, and the expected cost is n; however, for the worst case distribution the expected cost of this decision will be g(x^I) = 2^{n−1} + n − 1 (when Pr({1, . . . , n}) = Pr(∅) = 1/2 and all other scenarios have zero probability).

Hence, the correlation gap at x^I is exponentially high. A risk-averse strategy is to instead use the robust solution x^R = n, which leads to a cost g(x^R) = n + 1. Thus, POC = g(x^I)/g(x^R) = Ω(2^n).
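The numbers in this example are easy to verify by enumeration (a sketch; the helper names are illustrative, not from the thesis):

```python
from itertools import product

def cost(x, demand, n):
    # h(x, S) = c^I(x) + 2^n * (|S| - x)^+, with demand = |S|
    cI = x if x <= n - 1 else n + 1
    return cI + (2 ** n) * max(demand - x, 0)

def expected_cost_independent(x, n):
    # each sink demands independently with probability 1/2
    return sum(cost(x, sum(S), n) for S in product([0, 1], repeat=n)) / 2 ** n

n = 6
xI, xR = n - 1, n
assert expected_cost_independent(xI, n) == n               # E[cost] = n at x^I
worst_xI = 0.5 * cost(xI, n, n) + 0.5 * cost(xI, 0, n)     # two-point worst case
worst_xR = 0.5 * cost(xR, n, n) + 0.5 * cost(xR, 0, n)
assert worst_xI == 2 ** (n - 1) + n - 1                    # g(x^I) = 37 for n = 6
assert worst_xR == n + 1                                   # g(x^R) = 7
```

Even at n = 6 the decision tuned to the independent distribution already pays a factor of more than 5 over the robust decision under the two-point distribution.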

We may remark here that although the independent distribution does not appear to be a good substitute for the worst case distribution for supermodular functions, the worst case joint distribution actually has a nice closed form in this case, and the (DRSP) model is directly solvable in polynomial time. Refer to Agrawal et al. (2010) for details.

4.1.2 Subadditive functions

Lemma 8. There exists an instance (h, 2^V, {p_i}) with a function h(x, S) that is non-decreasing and (fractionally) subadditive¹ in S, and POC ≥ Ω(√n · log log n / log n). Here n = |V|.

Proof. Consider a set cover problem with elements V = {1, . . . , n}. Each item i ∈ V has a marginal probability 1/K of appearing in the random set S, where K = √n. The covering sets are defined as follows. Consider a partition of V into K sets A_1, . . . , A_K, each containing K elements. The covering sets are all the sets in the Cartesian product A_1 × · · · × A_K. Each set has unit cost. Then, the cost of covering a set S is given by the subadditive (in fact, fractionally subadditive) function

c(S) = max_{k=1,...,K} |S ∩ A_k|, ∀S ⊆ V.

¹A function f : 2^V → R is subadditive iff f(S ∪ T) ≤ f(S) + f(T) for all S, T ⊆ V. f is fractionally subadditive iff f(S) ≤ ∑_j a_j f(T_j) for all S ⊆ V and every collection {T_j} of subsets of S that forms a fractional cover of S, i.e., ∑_{j : i ∈ T_j} a_j ≥ 1 for all i ∈ S.


The worst case distribution with marginal probabilities p_i = 1/K is the one with Pr(S) = 1/K for S = A_k, k = 1, 2, . . . , K, and Pr(S) = 0 otherwise. The expected value of c(S) under this distribution is K = √n. Under the independent distribution, c(S) = max_{k=1,...,K} ζ_k, where the ζ_k = |S ∩ A_k| are independent Binomial(K, 1/K) random variables. Using known statistical results from Kimber (1983), it can be shown that as K = √n approaches ∞, E[c(S)] approaches Θ(log n / log log n). See Appendix D for details.

So the correlation gap of the cost function c(S) is bounded below by Ω(√n log log n / log n). To get the corresponding lower bound on POC, consider a two-stage stochastic set cover problem where each set costs slightly more than log n/(√n log log n) in the first stage, and 1 in the second stage. Then, assuming the independent distribution, the optimal decision is to buy no or very few sets in the first stage. However, under the worst case distribution, the expected second stage cost of this decision will be √n. On the other hand, a robust solution considering the worst case distribution is to cover all the elements in the first stage, costing O(log n / log log n), and buy nothing in the second stage. Thus, POC ≥ Ω(√n log log n / log n).
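The gap between the worst case value K and the independent value E[max_k ζ_k] can be illustrated numerically. The sketch below is an illustration, not the Kimber (1983) derivation; it computes E[max of K iid Binomial(K, 1/K)] exactly via P(max > m) = 1 − F(m)^K and shows the ratio K / E[max] growing with K = √n (the particular values of K are arbitrary).

```python
from math import comb

def expected_max_binomial(K):
    """Exact E[max of K iid Binomial(K, 1/K)], via P(max > m) = 1 - F(m)^K."""
    p = 1.0 / K
    pmf = [comb(K, j) * p**j * (1 - p)**(K - j) for j in range(K + 1)]
    cdf, acc = [], 0.0
    for v in pmf:
        acc += v
        cdf.append(acc)
    # E[max] = sum over m >= 0 of P(max > m); the max is at most K
    return sum(1.0 - cdf[m]**K for m in range(K))

# worst case distribution (S = A_k with probability 1/K each) gives E[c(S)] = K,
# while the independent distribution gives E[c(S)] = E[max_k zeta_k]
gaps = {K: K / expected_max_binomial(K) for K in (4, 16, 64, 256)}
assert gaps[4] < gaps[16] < gaps[64] < gaps[256]   # the gap grows with K = sqrt(n)
```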

4.1.3 Submodular functions

Lemma 9. There exists an instance (h, 2^V, {p_i}) with a function h(x, S) that is non-decreasing and submodular in S, and POC = e/(e − 1). Thus, the upper bound given by Theorem 1 is tight.

Proof. Let V := {1, 2, . . . , n}, and define the submodular function f by f(S) = 1 if S ≠ ∅, and f(∅) = 0. Let each item i ∈ V have marginal probability p_i = 1/n of appearing in the random set S. The worst case distribution that maximizes E[f(S)] is the one with Pr(S = {i}) = 1/n for all i ∈ V, with expected value 1. Under the independent distribution with the same marginals, f has expected value 1 − (1 − 1/n)^n → 1 − 1/e as n → ∞. Thus, the correlation gap is e/(e − 1).
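A quick numerical check of this limit (an illustrative sketch; the particular values of n are arbitrary):

```python
import math

def correlation_gap(n):
    # worst case E[f] = 1; independent marginals 1/n give E[f] = 1 - (1 - 1/n)^n
    return 1.0 / (1.0 - (1.0 - 1.0 / n) ** n)

limit = math.e / (math.e - 1)            # about 1.58198
assert correlation_gap(2) < correlation_gap(100) < limit   # increases toward e/(e-1)
assert abs(correlation_gap(10**6) - limit) < 1e-4
```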

To obtain the corresponding lower bound on POC, we can extend this example to a stochastic decision problem with two possible decisions x_1, x_2 as follows. Define h(x_1, S) = f(S) for all S, and h(x_2, S) = 1 − 1/e + ǫ for all S, for some arbitrarily small ǫ > 0. Then, assuming the independent distribution, x_1 appears to be the optimal decision; however, it has expected cost 1 under the worst case distribution. On the other hand, decision x_2 costs 1 − 1/e + ǫ in the worst case, giving POC = e/(e − 1) − ǫ.

4.1.4 Uncapacitated metric facility location

Lemma 10. There exists an instance of the facility location problem with correlation gap at least 3, and a two-stage stochastic facility location instance with POC ≥ 3.

Proof. Consider the following two-stage stochastic metric facility location instance with n cities. There is a partition of the n cities into √n disjoint sets A_1, . . . , A_√n containing √n cities each. Corresponding to each set B of cities in the Cartesian product B = A_1 × · · · × A_√n, there is a facility F_B with connection cost 1 to each city in B. The remaining connection costs are defined by extending the metric, that is, the cost of connecting any city i to a facility F_B such that i ∉ B is 3. Assume that each city has a marginal probability of 1/√n of appearing in the random demand set S. Each facility costs w^I = 3 log n/√n in the first stage, and w^II = 3 in the second stage.

First, consider the independent distribution case. Regardless of how many facilities are opened in the first stage, the expected cost in the second stage will be no more than 3E[max_k |A_k ∩ S|] + √n. E[max_k |A_k ∩ S|] asymptotically approaches O(log n / log log n) = o(log n) for large n. Therefore, for any ǫ > 0 and large enough n, E[max_k |A_k ∩ S|] < ǫ log n. As a result, if the decision maker assumes the independent distribution, she will never buy more than ǫ√n facilities in the first stage, which would cost her 3ǫ log n. However, if the distribution turns out to be of the form Pr(S = A_k) = 1/√n, k = 1, . . . , √n, then such a strategy incurs an expected cost g(xI) ≥ 3(1 − ǫ)√n + 3ǫ log n. A robust solution is to instead build √n facilities in the first stage, corresponding to a collection of √n disjoint sets in the collection B. These facilities cover every city with a connection cost of 1. Thus, the worst case expected cost of the robust solution is g(xR) ≤ 3 log n + √n. This shows g(xI) ≥ (3 − ǫ)g(xR) for any ǫ > 0.

Since a bound on the correlation gap would imply a corresponding bound on POC, the above example also implies a lower bound of 3 on the correlation gap.
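The robust first stage can be built explicitly for a small instance. The sketch below is illustrative (n = 16 is an arbitrary choice): it constructs √n pairwise-disjoint transversals of the partition, each a member of A_1 × · · · × A_√n, and checks that together they cover every city at connection cost 1, with first-stage cost exactly 3 log n.

```python
import math
from itertools import combinations

n, K = 16, 4                                              # n cities, K = sqrt(n) blocks
A = [list(range(k * K, (k + 1) * K)) for k in range(K)]   # partition A_1, ..., A_K

# robust first stage: K disjoint transversals; the j-th picks the j-th city
# of every block, so each transversal is an element of A_1 x ... x A_K
B = [frozenset(A[k][j] for k in range(K)) for j in range(K)]

assert all(len(b) == K for b in B)                        # one city per block
assert frozenset().union(*B) == frozenset(range(n))       # every city at distance 1

wI = 3 * math.log(n) / math.sqrt(n)                       # first-stage facility cost
first_stage = K * wI                                      # = 3 log n in total
g_robust_bound = first_stage + math.sqrt(n)               # plus worst case connections
```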


4.1.5 Steiner forest

Lemma 11. There exists an instance of Steiner forest network design with correlation gap at least 2, and a two-stage stochastic Steiner forest instance with POC ≥ 2.

Proof. The following example shows an instance of two-stage stochastic Steiner tree with POC ≥ 2. The construction is very similar to the example used in the previous subsection to show a lower bound on POC for stochastic facility location.

Consider the following instance of the two-stage stochastic Steiner tree problem with n terminal nodes. There is a partition of the n terminal nodes into √n disjoint sets A_1, . . . , A_√n containing √n nodes each. Corresponding to each set B in the Cartesian product B = A_1 × · · · × A_√n, there is a (non-terminal) node v_B in the graph which is connected directly via an edge to each terminal node in B. Assume that each terminal node has a marginal probability of 1/√n of appearing in the demand set S. Each edge e ∈ E costs w^I_e = log n/√n in the first stage, and w^II_e = 1 in the second stage.

Then, in the optimal decision made using the independent distribution, at most ǫ√n edges will be bought in the first stage, which can make available at most ǫ√n non-terminal nodes. Since no two nodes in any A_k are directly connected to each other or to any common non-terminal node, these ǫ√n non-terminal nodes are directly connected to at most ǫ√n nodes in a set A_k. Each of the remaining nodes in A_k will require at least two edges in order to be connected to the Steiner tree. Therefore, if the distribution is of the form Pr(S = A_k) = 1/√n, k = 1, . . . , √n, then the expected cost of this decision will be g(xI) ≥ 2√n(1 − ǫ) + ǫ log n. A robust solution is to instead buy enough edges in the first stage so that a set of √n non-terminal nodes v_B, corresponding to a collection of √n disjoint sets in B, are connected to each other. By construction, any two non-terminal nodes are connected to each other by a path of length at most 3, so this requires buying at most 3√n edges in the first stage, costing at most 3 log n. Also, for any k, each node in A_k is connected directly to one of these non-terminal nodes. Therefore, the worst case expected cost of this solution is g(xR) ≤ 3 log n + √n. This shows g(xI) ≥ (2 − ǫ)g(xR) for any ǫ > 0.

Since a bound on the correlation gap would imply a corresponding bound on POC, the above example also implies a lower bound of 2 on the correlation gap.


4.2 Tightness of cost-sharing condition

In this section, we show that our upper bound condition based on β-cost-sharing

property is a tight (within a factor of 2) characterization of functions with small

correlation gap.

Theorem 5. For any function f : Ω → R_+ with |Ω| < ∞, if the correlation gap of f is upper bounded by β, then f satisfies the β-cost-sharing property.

We prove a slightly stronger version of the above theorem.

Lemma 12. For any instance (f, Ω, {p_i}) with |Ω| < ∞, if the correlation gap is upper bounded by β, then there exists a cross-monotonic cost-sharing scheme for f that is β-budget balanced in expectation with respect to the product distribution ∏_i p_i.

Proof. Let b_ξ denote the product distribution with the given marginals, i.e., b_ξ = ∏_i p_i(ξ_i). Then the expected value under the worst case (expectation-maximizing) distribution with marginals {p_i} is given by

    max_α   ∑_ξ α_ξ f(ξ)
    s.t.    ∑_{ξ : ξ_i = t} α_ξ ≤ ∑_{ξ : ξ_i = t} b_ξ    ∀i, t ∈ Ω_i
            ∑_ξ α_ξ ≤ 1
            α_ξ ≥ 0    ∀ξ.

If the correlation gap is bounded by β, then the optimal value of the above program is at most β ∑_ξ b_ξ f(ξ). Adding more constraints can only decrease the optimal value; therefore,

    β ∑_ξ b_ξ f(ξ)  ≥   max_α   ∑_ξ α_ξ f(ξ)
                        s.t.    ∑_{ξ ≤ θ : ξ_i = t} α_ξ ≤ ∑_{ξ ≤ θ : ξ_i = t} b_ξ    ∀i, t ∈ Ω_i, θ ∈ Ω
                                ∑_ξ α_ξ ≤ 1
                                α_ξ ≥ 0    ∀ξ.

Let us consider the Lagrangian dual of the above linear program. Let γ_{i,t,θ} denote the dual variable corresponding to the constraint for i, t ∈ Ω_i, θ ∈ Ω. Then, by strong duality,

    β ∑_ξ b_ξ f(ξ)  ≥   min_{γ,δ}   ∑_ξ b_ξ (∑_i ∑_{θ ≥ ξ} γ_{i,ξ_i,θ}) + δ
                        s.t.        ∑_i ∑_{θ ≥ ξ} γ_{i,ξ_i,θ} + δ ≥ f(ξ)    ∀ξ ∈ Ω
                                    γ_{i,t,θ} ≥ 0    ∀i, t ∈ Ω_i, θ ∈ Ω
                                    δ ≥ 0.

Let γ*_{i,t,θ}, δ* be the optimal dual solution. Then consider the cost sharing scheme χ defined as

    χ_i(ξ) = ∑_{θ ≥ ξ} γ*_{i,ξ_i,θ} + δ*/n.

We have:

• Cross-monotonicity: For every ϑ ≥ ξ such that ϑ_i = ξ_i, we have

    χ_i(ξ) = ∑_{θ ≥ ξ} γ*_{i,ξ_i,θ} + δ*/n ≥ ∑_{θ ≥ ϑ} γ*_{i,ξ_i,θ} + δ*/n = ∑_{θ ≥ ϑ} γ*_{i,ϑ_i,θ} + δ*/n = χ_i(ϑ),

where the inequality holds because {θ : θ ≥ ϑ} ⊆ {θ : θ ≥ ξ} and γ* ≥ 0.

• (Expected) β-budget-balance: Due to the constraint in the dual program, for every ξ ∈ Ω,

    ∑_i χ_i(ξ) = ∑_i ∑_{θ ≥ ξ} γ*_{i,ξ_i,θ} + δ* ≥ f(ξ).

And, since ∑_ξ b_ξ = 1, the expected total cost share equals the optimal dual objective value, so

    ∑_ξ b_ξ ∑_i χ_i(ξ) = ∑_ξ b_ξ (∑_i ∑_{θ ≥ ξ} γ*_{i,ξ_i,θ}) + δ* ≤ β ∑_ξ b_ξ f(ξ).

Observe that the above is not simply a proof by example: it shows that if any function has a small correlation gap, then it has the β-cost-sharing property with a small β. However, there is a gap of a factor of 2 between our upper bound and lower bound results. Below is an example of a function which has a 1-cost-sharing scheme but whose correlation gap is arbitrarily close to 2. This example, constructed and communicated to us in private correspondence by Jan Vondrak, proves that this factor of 2 cannot be eliminated using only the cross-monotone cost-sharing criterion.

Example of a function with 1-cost-sharing and correlation gap of 2: Consider the function f : 2^V → R, for V = {1, . . . , n}, defined as

    f(S) = 0      for |S| = 0,
    f(S) = k      for 0 < |S| < k,
    f(S) = |S|    for |S| ≥ k.

The function f is not submodular, but it admits a cross-monotonic cost sharing scheme in which each element of a set has the same share. The shares are k/|S| in a set of size |S| < k, and 1 for |S| ≥ k; these are non-increasing with set size.

Now consider a distribution p which generates the whole ground set with probability k/n, and otherwise a random singleton, i.e., each singleton with probability (1 − k/n)/n. The expected value of f over this distribution is

    E_{S∼p}[f(S)] = (k/n)·n + k(1 − k/n) = 2k − k²/n.

Let us now make n large compared to k, for example n = k³; then E_{S∼p}[f(S)] = 2k − 1/k. The marginal probability of each element i ∈ V under p is p_i = k/n + (1 − k/n)/n = (k + 1 − k/n)/n. For large n, when each of the n elements is included independently with this probability, the size of the random set concentrates around its mean k + 1 − k/n (by a Chernoff bound). The expected value of f on such a random set is k + o(k). Therefore, as k grows, the correlation gap of f is arbitrarily close to 2.
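Since f depends only on |S|, and |S| is Binomial(n, p_i) under independence, the gap in this example can be computed exactly. The sketch below is illustrative (the values of k are arbitrary); it shows the ratio approaching 2 from below.

```python
from math import comb

def gap(k):
    n = k ** 3
    e_p = (k / n) * n + k * (1 - k / n)     # = 2k - k^2/n under distribution p
    p = (k + 1 - k / n) / n                  # marginal probability of each element
    mu = n * p
    # under independence |S| ~ Binomial(n, p) and f(S) = max(|S|, k) for S nonempty;
    # only the pmf at m < k is needed for an exact expectation
    pmf = [comb(n, m) * p**m * (1 - p)**(n - m) for m in range(k)]
    e_indep = k * (sum(pmf) - pmf[0]) + (mu - sum(m * pmf[m] for m in range(k)))
    return e_p / e_indep

g10, g20, g40 = gap(10), gap(20), gap(40)
assert g10 < g20 < g40 < 2.0                 # the correlation gap approaches 2
```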


Chapter 5

Conclusions

In this thesis, we investigated how effective the simple approach of ignoring correlations is for optimization under uncertainty. We introduced the new concept of the correlation gap to quantify the deterioration in the performance of a decision obtained assuming independence, in the presence of arbitrary correlations. We believe this concept is especially attractive because it characterizes the cases in which the seemingly pessimistic worst case joint distribution is close to the more natural independent distribution, in the sense that the former can be substituted by the latter. By proving upper and lower bounds on the correlation gap for a wide range of problems, our research sheds light on when correlations can be ignored in practice. We also showed that many deterministic optimization problems involving matching or partitioning constraints can be formulated as the problem of computing the worst case distribution with given marginals; hence, our results provide approximation algorithms for those problems as well. Finally, our methodology of bounding the correlation gap using cost-sharing schemes is a novel application of these algorithmic game theory techniques and deserves further study.

Some directions for future research include reducing the gap when the available information can be strengthened to include some partial knowledge of correlations. Such partial information may be available in the form of partial or complete knowledge of the covariance matrix, or partial knowledge of the dependence structure in the form of edges in graphical models such as Markov random fields and Bayesian networks.


Bibliography

Shipra Agrawal, Yichuan Ding, Amin Saberi, and Yinyu Ye. Correlation Robust Stochastic Optimization. In SODA '10: Proceedings of the Twenty-First Annual ACM-SIAM Symposium on Discrete Algorithms, 2010.

Shabbir Ahmed and Alexander Shapiro. The Sample Average Approximation Method for Stochastic Programs with Integer Recourse. SIAM Journal on Optimization, 12:479–502, 2002.

Wolfgang W. Bein, Peter Brucker, James K. Park, and Pramod K. Pathak. A Monge Property for the d-Dimensional Transportation Problem. Discrete Applied Mathematics, 58(2):97–109, 1995.

A. Ben-Tal and A. Nemirovski. Robust Convex Optimization. Mathematics of Operations Research, 23(4):769–805, 1998.

Aharon Ben-Tal. Robust Optimization: Methodology and Applications. Mathematical Programming, 92(3):453–480, 2001.

Aharon Ben-Tal and Arkadi Nemirovski. Robust solutions of Linear Programming problems contaminated with uncertain data. Mathematical Programming, 88:411–424, 2000.

Piotr Berman. A d/2 approximation for maximum weight independent set in d-claw free graphs. Nordic Journal of Computing, 7:178–184, 2000.

D. Bertsimas and I. Popescu. Optimal Inequalities in Probability Theory: A Convex Optimization Approach. SIAM Journal on Optimization, 15(3):780–804, 2005.

D. Bertsimas, I. Popescu, and J. Sethuraman. Moment Problems and Semidefinite Programming. Kluwer Academic Publishers, 2000.

Dimitris Bertsimas and Melvyn Sim. The Price of Robustness. Operations Research, 52(1):35–53, 2004.


Dimitris Bertsimas, Karthik Natarajan, and Chung-Piaw Teo. Probabilistic Combinatorial Optimization: Moments, Semidefinite Programming, and Asymptotic Bounds. SIAM Journal on Optimization, 15:185–209, 2005.

Yvonne Bleischwitz and Burkhard Monien. Fair cost-sharing methods for scheduling jobs on parallel machines. Journal of Discrete Algorithms, 7:280–290, 2009.

R. E. Burkard, B. Klinz, and R. Rudolf. Perspectives of Monge properties in optimization. Discrete Applied Mathematics, 70(2):95–161, 1996.

Gruia Calinescu, Chandra Chekuri, Martin Pal, and Jan Vondrak. Maximizing a Submodular Set Function Subject to a Matroid Constraint (Extended Abstract). In IPCO, pages 182–196, 2007.

Moses Charikar, Chandra Chekuri, and Martin Pal. Sampling Bounds for Stochastic Optimization. In APPROX-RANDOM, pages 257–269, 2005.

Xin Chen, Melvyn Sim, and Peng Sun. A Robust Optimization Perspective on Stochastic Programming. Operations Research, 55(6):1058–1071, 2007.

Erick Delage and Yinyu Ye. Distributionally Robust Optimization Under Moment Uncertainty with Application to Data-Driven Problems. Operations Research, 58:595–612, 2010.

Ulrich Derigs. On two methods for solving the bottleneck matching problem. In Lecture Notes in Control and Information Sciences, volume 23. Springer, Berlin / Heidelberg, 1980.

J. Dupacova. The Minimax Approach to Stochastic Programming and an Illustrative Application. Stochastics, 20(1):73–88, 1987.

J. Dupacova. Stochastic Programming: Minimax Approach. Encyclopedia of Optimization, 5:327–330, 2001.

Y. Ermoliev, A. Gaivoronski, and C. Nedeva. Stochastic optimization problems with incomplete information on distribution functions. SIAM Journal on Control and Optimization, 23:697, 1985.

Uriel Feige. On maximizing welfare when utility functions are subadditive. In Proceedings of the Thirty-Eighth Annual ACM Symposium on Theory of Computing, STOC '06, pages 41–50, 2006.


A. A. Gaivoronski. A numerical method for solving stochastic programming problems with moment constraints on a distribution function. Annals of Operations Research, 31(1):347–369, 1991.

Joel Goh and Melvyn Sim. Distributionally Robust Optimization and Its Tractable Approximations. Operations Research, 58(4):902–917, 2010.

M. Grotschel, L. Lovasz, and A. Schrijver. Geometric algorithms and combinatorial optimization. Springer, Berlin, 1988.

F. Gul and E. Stacchetti. Walrasian Equilibrium with Gross Substitutes. Journal of Economic Theory, 87(1):95–124, 1999.

Anupam Gupta, Martin Pal, R. Ravi, and Amitabh Sinha. Boosted sampling: Approximation algorithms for stochastic optimization. In Proceedings of the 36th Annual ACM Symposium on Theory of Computing, pages 417–426, 2004.

Anupam Gupta, Aravind Srinivasan, and Eva Tardos. Cost-sharing mechanisms for network design. Algorithmica, 50:98–119, 2007.

Elad Hazan, Shmuel Safra, and Oded Schwartz. On the complexity of approximating k-set packing. Computational Complexity, 15(1):20–39, 2006.

C. A. J. Hurkens and A. Schrijver. On the size of systems of sets every t of which have an SDR, with an application to the worst-case ratio of heuristics for packing problems. SIAM Journal on Discrete Mathematics, 2(1):68–72, 1989.

Nicole Immorlica, Mohammad Mahdian, and Vahab S. Mirrokni. Limitations of Cross-monotonic Cost-sharing Schemes. ACM Transactions on Algorithms, 4(2):1–25, 2008.

A. S. Kelso Jr and V. P. Crawford. Job matching, coalition formation, and gross substitutes. Econometrica, 50(6):1483–1504, 1982.

A. C. Kimber. A note on Poisson maxima. Probability Theory and Related Fields, 63:551–552, 1983.

W. Klein Haneveld. Robustness against dependence in PERT: An application of duality and distributions with known marginals. Mathematical Programming Studies, 27:153–182, 1986.

Jon Kleinberg, Yuval Rabani, and Eva Tardos. Allocating bandwidth for bursty connections. In Proceedings of the Twenty-Ninth Annual ACM Symposium on Theory of Computing, STOC '97, pages 664–673, 1997.


Jochen Konemann, Stefano Leonardi, and Guido Schafer. A group-strategyproof mechanism for Steiner forests. In Proceedings of the Sixteenth Annual ACM-SIAM Symposium on Discrete Algorithms, SODA '05, pages 612–619, 2005.

C. M. Lagoa and B. R. Barmish. Distributionally robust Monte Carlo simulation: a tutorial survey. In Proceedings of the IFAC World Congress, pages 1–12, 2002.

H. J. Landau. Moments in Mathematics: Lecture Notes Prepared for the AMS Short Course. American Mathematical Society, 1987.

Boris Lavric. Continuity of Monotone Functions. Archivum Mathematicum, 29:1–4, 1993.

Stefano Leonardi and Guido Schaefer. Cross-monotonic cost-sharing methods for connected facility location games. In EC '04: Proceedings of the 5th ACM Conference on Electronic Commerce, pages 242–243, 2004.

Mohammad Mahdian and Martin Pal. Universal facility location. In Proceedings of ESA '03, pages 409–421, 2003.

V. Mirrokni, M. Schapira, and J. Vondrak. Tight information-theoretic lower bounds for welfare maximization in combinatorial auctions. In Proceedings of the 9th ACM Conference on Electronic Commerce, pages 70–77, 2008.

Rolf H. Mohring, Andreas S. Schulz, and Marc Uetz. Approximation in stochastic scheduling: the power of LP-based priority policies. Journal of the ACM, 46(6):924–942, 1999.

Herve Moulin. Incremental cost sharing: Characterization by coalition strategy-proofness. Social Choice and Welfare, 16:279–320, 1999.

Herve Moulin and Scott Shenker. Strategyproof sharing of submodular costs: budget balance versus efficiency. Economic Theory, 18:511–533, 2001.

Noam Nisan, Tim Roughgarden, Eva Tardos, and Vijay V. Vazirani. Algorithmic Game Theory. Cambridge University Press, New York, NY, USA, 2007.

Martin Pal and Eva Tardos. Group Strategyproof Mechanisms via Primal-Dual Algorithms. In Proceedings of the 44th Annual IEEE Symposium on Foundations of Computer Science, FOCS '03, pages 584–593, 2003.

I. Popescu. Robust Mean-Covariance Solutions for Stochastic Optimization. Operations Research, 55(1):98, 2007.


A. Prekopa. Stochastic Programming. Kluwer Academic Publishers, 1995.

W. W. Rogosinski. Moments of Non-negative Mass. Proceedings of the Royal Society of London, Series A, 245(1240):1–27, 1958.

Tim Roughgarden and Mukund Sundararajan. New trade-offs in cost-sharing mechanisms. In STOC '06: Proceedings of the Thirty-Eighth Annual ACM Symposium on Theory of Computing, pages 79–88, 2006.

A. Ruszczynski and A. Shapiro, editors. Stochastic Programming, volume 10 of Handbooks in Operations Research and Management Science. Elsevier, 2003.

Herbert E. Scarf. A min-max solution of an inventory problem. Studies in the Mathematical Theory of Inventory and Production, pages 201–209, 1958.

A. Shapiro. Worst-case distribution analysis of stochastic programs. Mathematical Programming, 107(1):91–96, 2006.

A. Shapiro and S. Ahmed. On a class of minimax stochastic programs. SIAM Journal on Optimization, 14(4):1237–1252, 2004.

A. Shapiro and A. Kleywegt. Minimax analysis of stochastic problems. Optimization Methods and Software, 17(3):523–542, 2002.

Chaitanya Swamy and David B. Shmoys. Sampling-based approximation algorithms for multistage stochastic optimization. In Proceedings of the 46th Annual IEEE Symposium on Foundations of Computer Science, pages 357–366, 2005.

Jan Vondrak. Optimal approximation for the submodular welfare problem in the value oracle model. In STOC '08: Proceedings of the 40th Annual ACM Symposium on Theory of Computing, pages 67–74, 2008.

J. Yue, B. Chen, and M. C. Wang. Expected Value of Distribution Information for the Newsvendor Problem. Operations Research, 54(6):1128, 2006.

J. Zackova. On Minimax Solutions of Stochastic Linear Programming Problems. Casopis pro pestovani matematiky, 91(4):423–430, 1966.


Appendix A

Proof of Theorem 4

We assume that the random variable ξ is a vector of n binary variables; that is, it represents a random subset S of a ground set V = {1, . . . , n}. Consider a function f : 2^V → R. The marginal probability that each i ∈ V appears in the random set S is given by p_i. Then the worst case distribution under the given marginals {p_i}_{i=1}^n is given by the optimal solution of the following linear program:

    max_α   ∑_{S ⊆ V} α_S f(S)
    s.t.    ∑_{S : i ∈ S} α_S = p_i    ∀i ∈ V
            ∑_S α_S = 1
            α_S ≥ 0.                                        (A.1)

By the equivalence of separation and optimization (Grotschel et al. (1988)), solving the above problem is equivalent to solving the separation problem

    max_S   f(S) − ∑_{i ∈ S} λ_i,                            (A.2)

for any given λ ∈ R^n_+. We show that there exists a non-negative monotone submodular function f such that the above separation problem is at least as hard as the MAX-CUT problem.

Definition 14. Given an undirected graph G = (V, E), a cut in G is a subset S ⊆ V. Let S̄ = V \ S, and let E(S, S̄) denote the set of edges with one vertex in S and one vertex in S̄. The MAX-CUT problem is to find the cut S that maximizes |E(S, S̄)|. The MAX-CUT problem is NP-hard.

Claim 1. Problem (A.2) for monotone submodular functions f is at least as hard as

the MAX-CUT problem.

Proof. Consider a graph G = (V, E). For any set S ⊆ V, define f(S) to be two times the number of edges that have at least one of their endpoints in S. Observe that the function f(S) is monotone and submodular. Define λ_i as the number of edges incident on vertex i. Then the proof is completed by observing that |E(S, S̄)| = f(S) − ∑_{i ∈ S} λ_i. The identity holds because ∑_{i ∈ S} λ_i counts every edge with both endpoints in S twice, but each edge going from S to S̄ only once, while f(S) counts each of these edges twice.
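The identity in the proof is easy to verify exhaustively on a small graph. The sketch below is illustrative (the particular graph, a 4-cycle plus one chord, is an arbitrary choice); it checks |E(S, S̄)| = f(S) − ∑_{i ∈ S} λ_i for every S, and notes that maximizing the separation objective indeed recovers the maximum cut.

```python
from itertools import combinations

# small graph: a 4-cycle plus the chord (0, 2)
V = [0, 1, 2, 3]
E = [(0, 1), (1, 2), (2, 3), (3, 0), (0, 2)]

def f(S):            # twice the number of edges with at least one endpoint in S
    return 2 * sum(1 for (u, v) in E if u in S or v in S)

lam = {i: sum(1 for e in E if i in e) for i in V}   # lambda_i = degree of i

cuts = {}
for r in range(len(V) + 1):
    for S in map(frozenset, combinations(V, r)):
        cut = sum(1 for (u, v) in E if (u in S) != (v in S))
        assert cut == f(S) - sum(lam[i] for i in S)  # |E(S, S-bar)| identity
        cuts[S] = cut

max_cut = max(cuts.values())   # maximizing the separation objective solves MAX-CUT
```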

This proves that finding the worst case distribution is NP-hard even when the function f is restricted to be monotone and submodular. To prove the approximation hardness of O(n^{−1/2}) for monotone submodular functions, we use the connection of our problem to the welfare maximization problem discussed in Section 3.2.2 (refer to equation (3.3)). In Mirrokni et al. (2008), the authors proved the following hardness result.

Theorem (Mirrokni et al. (2008)): For any ǫ > 0, achieving an approximation ratio of n^{−1/2+ǫ} for the welfare maximization problem (3.3) with identical monotone subadditive utility functions f(S) requires an exponential number of function evaluations.

Thus, assuming each function evaluation takes at least constant time, the problem cannot be approximated within a factor better than n^{−1/2+ǫ} in polynomial time. In fact, they prove this bound for the more restrictive class of fractionally subadditive functions. On the other hand, Feige (2006) proved that for any subadditive (in fact, fractionally subadditive) function, one can round any solution to the LP relaxation of (3.3) in polynomial time to obtain a feasible allocation for the welfare maximization problem of value at least 1/2 of the value of the LP solution.


As we observed in Section 3.2.2, the LP relaxation of (3.3) is equivalent to the problem of finding the worst case distribution with given marginal probabilities. Now suppose that there were a polynomial time algorithm solving the worst case distribution problem within an approximation factor better than 2n^{−1/2+ǫ}. Then, using the polynomial-time rounding technique of Feige (2006) on the obtained LP solution, we could obtain an approximation factor better than n^{−1/2+ǫ} for the welfare maximization problem, thus contradicting the above theorem of Mirrokni et al. (2008). This completes the proof.


Appendix B

Proof for binary random variables (details)

B.1 Properties of Split Operation

In this section, we prove the properties of the Split operation used in the proof of Lemma 2. For an instance (f, Ω, {p_i}), we will use L(f, Ω, {p_i}) and I(f, Ω, {p_i}) to denote the expected value under the worst case (expectation-maximizing) distribution and the independent distribution, respectively. Given a function f(S), we use f′(S′) to denote the function obtained after splitting.

Property 1. If f is a non-decreasing function, then so is f ′.

Proof. Monotonicity holds since for any S′ ⊆ T′ ⊆ V′, Π(S′) ⊆ Π(T′), and f′(S′) = f(Π(S′)) ≤ f(Π(T′)) = f′(T′).

Property 2. If f has β-cost sharing property, then so does f ′.

Proof. We construct cost-sharing scheme χ′ for f ′ from a cost-sharing scheme χ for f

as follows. Consider an arbitrary but fixed ordering on elements of V ′. Cost-sharing


scheme χ′ coincides with the original scheme χ for sets without duplicates, but for a set with duplicates it assigns the cost share solely to the copy with the smallest index (as per the fixed ordering). That is, for any S′ ⊆ V′ and item C_ij (the j-th copy of item i) in S′,

    χ′_ij(S′) = χ_i(S)  if j = min{h : C_ih ∈ S′},  and  χ′_ij(S′) = 0  otherwise,      (B.1)

where S = Π(S′), and min computes the lowest index with respect to the fixed ordering on elements.

To make sure that the new cost-sharing scheme χ′ has expected β-budget-balance

with respect to product distribution for given marginals p′ij, we need to choose

the original cost-sharing scheme χ carefully. Pick χ as the cost-sharing scheme that

satisfied expected β-budget balance with respect to the product distribution with

marginals pi, where pi is defined as

pi = Pr(Cih ∈ S ′ for some h = 1, . . . ,mi),

when S′ is distributed according to the product distribution with marginals {p′_{ij}}. Since f has a cross-monotonic, expected β-budget-balanced scheme with respect to any product distribution, such a scheme χ exists. Now, let p denote the product distribution with marginals {p_i}, and p′ the product distribution on S′ with marginals {p′_{ij}}. Then,

\[
\mathbb{E}_{p'}\Big[\sum_{ij:\, C_{ij} \in S'} \chi'_{ij}(S')\Big] = \mathbb{E}_{p}\Big[\sum_{i \in S} \chi_i(S)\Big] \le \mathbb{E}_{p}[f(S)] = \mathbb{E}_{p'}[f'(S')].
\]

Also, for any S′ with S = Π(S′),

\[
\sum_{ij:\, C_{ij} \in S'} \chi'_{ij}(S') = \sum_{i \in S} \chi_i(S) \ge \frac{f(S)}{\beta} = \frac{f'(S')}{\beta}.
\]

To observe that cross-monotonicity holds, consider S′ ⊆ T′. For any C_{ij} ∈ S′, if j is not the lowest-indexed copy of i in T′, then χ′_{ij}(T′) = 0, so the condition is automatically satisfied. If j is the lowest-indexed copy of i in T′, then it must


have been the lowest-indexed copy in S′, since S′ is a subset of T′. Then, by cross-monotonicity of χ:

\[
\chi'_{ij}(T') = \chi_i(T) \le \chi_i(S) = \chi'_{ij}(S'),
\]

where S = Π(S ′), T = Π(T ′).

Property 3. If f is submodular, then so is f ′.

Proof. For submodularity, consider any S′, T′, with S = Π(S′), T = Π(T′). Observe that Π(S′ ∪ T′) = S ∪ T and Π(S′ ∩ T′) ⊆ S ∩ T. Therefore, by the monotonicity and submodularity of f,
\[
f'(S' \cup T') + f'(S' \cap T') \le f(S \cup T) + f(S \cap T) \le f(S) + f(T) = f'(S') + f'(T').
\]
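Property 3 can also be checked exhaustively on a small example. The sketch below is illustrative, not from the text: it assumes a hypothetical ground set in which item 1 is split into two copies and a toy monotone submodular function f(S) = min(|S|, 2), and verifies the submodular inequality for f′ over all subset pairs.

```python
from itertools import combinations

V_prime = [(1, 1), (1, 2), (2, 1)]           # copies: item 1 split into two
f = lambda S: min(len(S), 2)                  # toy monotone submodular function
fp = lambda Sp: f({i for (i, _) in Sp})       # f'(S') = f(Pi(S'))

subsets = [frozenset(c) for r in range(len(V_prime) + 1)
           for c in combinations(V_prime, r)]
for A in subsets:
    for B in subsets:
        # submodular inequality for the split function f'
        assert fp(A | B) + fp(A & B) <= fp(A) + fp(B)
print("f' is submodular on all subset pairs")
```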

Property 4. If f(S) is non-decreasing in S, then after splitting,
\[
\mathcal{L}(f', 2^{V'}, \{p'_r\}) = \mathcal{L}(f, 2^V, \{p_i\}).
\]

Proof. Suppose item 1 is split into m₁ pieces, and each piece is assigned probability p₁/m₁. Let {α_S} denote the worst-case distribution with the given marginals for the instance (f, 2^V, {p_i}), where α_S denotes the probability of set S. Then we

can construct a distribution for the new instance (f′, 2^{V′}, {p′_r}) that has the same objective value, by assigning non-zero probabilities only to those sets that contain no duplicates. That is, define

\[
\forall S' \subseteq V', \quad \alpha'_{S'} =
\begin{cases}
\alpha_{\Pi(S')}, & \text{if } S' \text{ contains no copy of item } 1, \\
\frac{1}{m_1}\,\alpha_{\Pi(S')}, & \text{if } S' \text{ contains exactly one copy of item } 1, \\
0, & \text{otherwise.}
\end{cases}
\]

One can verify that {α′_{S′}} is a feasible distribution for the new instance (f′, 2^{V′}, {p′_r}), i.e., it satisfies the marginal distribution constraints, and it has the same objective value as L(f, 2^V, {p_i}). Hence, L(f, 2^V, {p_i}) ≤ L(f′, 2^{V′}, {p′_r}).


For the other direction, consider a worst-case distribution {α′_{S′}} for the new instance. It is easy to see that there exists an optimal distribution {α′_{S′}} such that α′_{S′} = 0 for all S′ that contain more than one copy of item 1. To see this, assume for contradiction that some set with non-zero probability has two copies of item 1. By the definition of f′, removing one copy does not decrease the function value. Then, by the monotonicity of f′, we can move the removed copy to another set T that has no copy of item 1; such a T always exists, since the probabilities of the copies of item 1 must sum to p₁ ≤ 1. So we may assume that in the optimal distribution, α′_{S′} = 0

for any set S′ containing more than one copy. Then we can set
\[
\alpha_S = \sum_{S' :\, \Pi(S') = S} \alpha'_{S'}
\]
to form a feasible distribution for the original instance with the same objective value as L(f′, 2^{V′}, {p′_r}). Thus, L(f′, 2^{V′}, {p′_r}) ≤ L(f, 2^V, {p_i}). We can apply this argument recursively over all the items to prove the lemma.

Next, we prove that the expected cost under independent distribution can only

decrease on splitting.

Property 5. If f(S) is non-decreasing in S, then after splitting,
\[
\mathcal{I}(f', 2^{V'}, \{p'_r\}) \le \mathcal{I}(f, 2^V, \{p_i\}).
\]

Proof. Let (f′, 2^{V′}, {p′_r}) denote the new instance obtained by splitting item 1 into m₁ pieces. Denote
\[
\Lambda := \{S' \subseteq V' \mid S' \text{ contains at least one copy of item } 1\},
\]

and denote π = Pr(S′ ∈ Λ). Consider the expected cost under the independent Bernoulli distribution. By independence,
\[
\begin{aligned}
\mathcal{I}(f', 2^{V'}, \{p'_r\}) &= \mathbb{E}_{S'}[f'(S')\, \mathbb{I}(S' \in \Lambda)] + \mathbb{E}_{S'}[f'(S')\, \mathbb{I}(S' \notin \Lambda)] \\
&= \pi\, \mathbb{E}_{S \subseteq V \setminus \{1\}}[f(S \cup \{1\})] + (1 - \pi)\, \mathbb{E}_{S \subseteq V \setminus \{1\}}[f(S)] \\
&\le p_1\, \mathbb{E}_{S \subseteq V \setminus \{1\}}[f(S \cup \{1\})] + (1 - p_1)\, \mathbb{E}_{S \subseteq V \setminus \{1\}}[f(S)] \\
&= \mathcal{I}(f, 2^V, \{p_i\}).
\end{aligned}
\]

The inequality holds because π = 1 − (1 − p₁/m₁)^{m₁} ≤ p₁, and f(S) ≤ f(S ∪ {1}) by monotonicity. Repeating this argument for all items completes the proof.
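The key inequality π = 1 − (1 − p₁/m₁)^{m₁} ≤ p₁ is easy to confirm numerically; the values of p₁ and m below are arbitrary illustrative choices.

```python
# Splitting an item into m copies, each present independently with probability
# p1/m, can only lower the chance that at least one copy appears.
p1 = 0.3
for m in (1, 2, 5, 100, 10**6):
    pi = 1 - (1 - p1 / m) ** m
    assert pi <= p1 + 1e-12      # pi = 1 - (1 - p1/m)^m <= p1
print("pi <= p1 holds for all tested m")
```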

Property 6. If in instance (f, Ω, {p_i}) the function f has the β-prefix-cost-sharing property, then in the nice instance (f′, Ω′, {1/K}) obtained by splitting, f′ has an order-specific cost-sharing scheme, with respect to any fixed ordering consistent with the partial order given by A_K, …, A₁, that satisfies

(a) (expected) β-budget balance on the product distribution with marginals 1/K,

(b) the prefix property,

(c) cross-monotonicity for any S′ ⊆ T′ such that S′ is a partial prefix of T′ with respect to A_K, …, A₁.

Here A_K, …, A₁ denote the disjoint sets in the support of the worst-case distribution for the nice instance (f′, Ω′, {1/K}).

Proof. Let σ′ denote a fixed ordering consistent with the partial ordering A_K, …, A₁, and σ′_{S′} denote the restriction of σ′ to the elements of S′. We construct the cost-sharing scheme χ′ for the nice instance as follows:
\[
\chi'(C_{ij}, S', \sigma'_{S'}) =
\begin{cases}
\chi(i, S, \sigma_S), & \text{if } j = \min\{h : C_{ih} \in S'\}, \\
0, & \text{otherwise},
\end{cases}
\tag{B.2}
\]

where σ_S is the ordering of the lowest-indexed copies in σ′_{S′}, S = Π(S′), and the min is taken with respect to the ordering σ′_{S′}. Here χ is a cross-monotonic, β-budget-balanced cost-sharing scheme for f with the prefix property with respect to the ordering σ_S, which is expected β-budget balanced with respect to product distributions with marginals {p_i}, defined as
\[
p_i = \Pr(C_{ih} \in S' \text{ for some } h = 1, \ldots, m_i),
\]
when S′ is distributed according to the product distribution with marginals 1/K. Then χ′ satisfies (expected) β-budget balance with respect to the product distribution with marginals 1/K; the proof is the same as that of Property 2.


Let S(i) denote the items in S preceding and including i, with respect to the given ordering. Then, due to the prefix property of χ and the definition of χ′,
\[
\chi'(C_{ij}, S', \sigma'_{S'}) = \chi(i, S, \sigma_S) = \chi(i, S(i), \sigma_S) = \chi'(C_{ij}, S'(C_{ij}), \sigma'_{S'}).
\]
Therefore χ′ has the prefix property with respect to σ′.

For cross-monotonicity, consider S′ ⊆ T′ such that S′ is a "partial prefix" of T′. For any i′ ∈ S′, if i′ is not a lowest-indexed copy in T′, then χ′(i′, T′, σ′_{T′}) = 0, so the condition is automatically satisfied. If i′ is one of the lowest-indexed copies in T′, then it must have been a lowest-indexed copy in S′, since S′ is a subset of T′. Thus,

\[
\chi'(i', T', \sigma'_{T'}) = \chi(i, T, \sigma_T) \le \chi(i, S, \sigma_S) = \chi'(i', S', \sigma'_{S'}),
\]

where S = Π(S′), T = Π(T′), and σ_S, σ_T are the orderings of the lowest-indexed copies in S′, T′ respectively. Note that the inequality above uses the cross-monotonicity of χ, which implicitly assumes that the two cost-sharing schemes (for the orderings σ_S and σ_T) coincide. This is true if σ_S ⊆ σ_T, that is, if the ordering of the elements of S is the same in σ_S and σ_T. We show that this holds under the assumption that σ_{S′}, σ_{T′} respect the partial ordering A_K, …, A₁, and S′ is a "partial prefix" of T′; that is, S′ ⊆ A_K ∪ · · · ∪ A_k and T′ \ S′ ⊆ A_k ∪ · · · ∪ A₁ for some k.

To see this, observe that the splitting was performed so that at most one copy of any element appears in each A_k. So, among the newly added copies T′ \ S′, any copy of an element of S can occur only in T′ ∩ A_{k+1} or later. Since S′ ⊆ A₁ ∪ · · · ∪ A_k, this means that for any element i ∈ S, the newly added copies occur only later in the ordering, and they cannot alter the order of the lowest-indexed copies of the elements of S. This proves that σ_S ⊆ σ_T.

B.2 Handling irrational probabilities

In this section, we show that the bounds on the correlation gap in Section 2.2.1 for binary random variables continue to hold even if the probabilities p_i are irrational.


For any irrational vector {p_i}, there exists a non-decreasing sequence {p^ℓ_i} of rational vectors such that
\[
p^\ell_i \to p_i \quad \text{as } \ell \to \infty.
\]

Then, for each instance (f, Ω, {p^ℓ_i}), since the marginals are rational, we have
\[
\frac{\mathbb{E}_{(p^\ell)^*}[f(S)]}{\mathbb{E}_{p^\ell}[f(S)]} \le \kappa,
\]
where (p^ℓ)* and p^ℓ are the worst-case and independent joint distributions, respectively, for the instance (f, Ω, {p^ℓ_i}); κ = e/(e − 1) if f is monotone and submodular, and κ = 2β if f is monotone and has a β-cost-sharing scheme.

Let p* and p denote the worst-case and independent distributions, respectively, for the instance (f, Ω, {p_i}). Observe that, due to the monotonicity of f, E_{p^ℓ}[f(S)] is non-decreasing in ℓ and upper-bounded by E_p[f(S)] < ∞ (by assumption). Therefore, by the monotone convergence theorem, E_{p^ℓ}[f(S)] converges to sup_ℓ E_{p^ℓ}[f(S)] = E_p[f(S)]. Also, E_{(p^ℓ)*}[f(S)] is non-decreasing in ℓ and upper-bounded by κ E_{p^ℓ}[f(S)] < ∞. Therefore, again by the monotone convergence theorem, E_{(p^ℓ)*}[f(S)] converges to sup_ℓ E_{(p^ℓ)*}[f(S)] = E_{p*}[f(S)].

And hence,
\[
\frac{\mathbb{E}_{(p^\ell)^*}[f(S)]}{\mathbb{E}_{p^\ell}[f(S)]} \to \frac{\mathbb{E}_{p^*}[f(S)]}{\mathbb{E}_{p}[f(S)]} \le \kappa.
\]


Appendix C

Proof for finite domains (details)

In this section, we rigorously prove the properties of the reduction in Lemma 4. As before, for an instance (f, Ω, {p_i}), we will use L(f, Ω, {p_i}) and I(f, Ω, {p_i}) to denote the expected value under the worst-case (expectation-maximizing) distribution and the independent distribution, respectively.

We assume w.l.o.g. that f(0) = 0. Otherwise, we could instead consider the function f(ξ) − f(0), and the approximation factor proven would hold for f(ξ).

Claim 2. The reduction preserves the monotonicity and β-cost-sharing/submodularity

of the function.

Proof. Consider any ξ′, θ′ ∈ Ω′, with ξ = Θ(ξ′), θ = Θ(θ′). By definition, if ξ′ ≥ θ′, then ξ ≥ θ. Therefore, the monotonicity of f′ follows directly from the monotonicity of f:
\[
f'(\xi') = f(\xi) \ge f(\theta) = f'(\theta').
\]

We define the cost-sharing scheme χ′ for f′ using a cost-sharing scheme χ for f, as follows:
\[
\chi'_{ij}(\xi') =
\begin{cases}
\chi_i(\xi), & \text{if } \xi_i = j, \\
0, & \text{otherwise.}
\end{cases}
\]

Given a set of marginals {p′_{ij}} for the new instance, to ensure that χ′ has expected β-budget balance with respect to the product distribution ∏_{ij} p′_{ij}, we need to pick χ carefully. We pick χ as a cross-monotonic cost-sharing scheme for f that satisfies expected β-budget balance with respect to the product distribution with marginals {p_i}, defined as the probabilities
\[
p_i(j) = \Pr(\Theta(\xi')_i = j),
\]
when ξ′ is distributed as the product distribution with marginals {p′_{ij}}. Since f has a cross-monotonic, expected β-budget-balanced scheme with respect to any product distribution, such a scheme χ exists. Now, let p denote the product distribution with marginals {p_i}, and p′ the product distribution with marginals {p′_{ij}}. Then,

\[
\frac{f'(\xi')}{\beta} = \frac{f(\xi)}{\beta} \le \sum_{i=1}^{n} \chi_i(\xi) = \sum_{i,\, j = \xi_i} \chi'_{ij}(\xi') = \sum_{i,j} \chi'_{ij}(\xi').
\]
Also,
\[
\mathbb{E}_{p'}[f'(\xi')] = \mathbb{E}_{p}[f(\xi)] \ge \mathbb{E}_{p}\Big[\sum_{i=1}^{n} \chi_i(\xi)\Big] = \mathbb{E}_{p'}\Big[\sum_{i,j} \chi'_{ij}(\xi')\Big].
\]

So, χ′ is expected β-budget-balanced with respect to the product distribution p′. For cross-monotonicity, given i, j, if χ′_{ij}(ξ′) = 0, then the condition is automatically satisfied. Otherwise, j = max{r : ξ′_{ir} = 1}, and
\[
\chi'_{ij}(\xi'_{ij}, \xi'_{-ij}) = \chi_i(j, \xi_{-i}).
\]

Now, for θ′_{−ij} ≥ ξ′_{−ij}, assume j remains the maximum index (otherwise χ′_{ij}(ξ′_{ij}, θ′_{−ij}) = 0, and the condition is automatically satisfied); then χ′_{ij}(ξ′_{ij}, θ′_{−ij}) = χ_i(j, Θ(ξ′_{ij}, θ′_{−ij})_{−i}). Therefore,
\[
\chi'_{ij}(\xi'_{ij}, \theta'_{-ij}) = \chi_i(j, \Theta(\xi'_{ij}, \theta'_{-ij})_{-i}) \le \chi_i(j, \Theta(\xi'_{ij}, \xi'_{-ij})_{-i}) = \chi_i(j, \xi_{-i}) = \chi'_{ij}(\xi'_{ij}, \xi'_{-ij}).
\]

The inequality follows from the cross-monotonicity of χ, since Θ(ξ′_{ij}, θ′_{−ij})_{−i} ≥ Θ(ξ′_{ij}, ξ′_{−ij})_{−i}.


For submodularity, consider any ξ′, θ′, with ξ = Θ(ξ′), θ = Θ(θ′). Observe that Θ(max{ξ′, θ′}) = max(ξ, θ) and Θ(min{ξ′, θ′}) ≤ min(ξ, θ). Therefore, by the monotonicity and submodularity of f,
\[
f'(\max\{\xi', \theta'\}) + f'(\min\{\xi', \theta'\}) \le f(\max\{\xi, \theta\}) + f(\min\{\xi, \theta\}) \le f(\xi) + f(\theta) = f'(\xi') + f'(\theta').
\]

Claim 3. The reduction does not change the expected value under the worst-case distribution.

Proof. To show that the expected value over the worst-case distribution does not change, it suffices to observe that for any distribution in the original instance, we can construct a distribution with the same expected value in the new instance, and vice versa. For the first direction, one can simply construct a new distribution by replacing each ξ in the support of the original distribution with the ξ′ whose only non-zero components are ξ′_{iξ_i} = 1 for all i. For the other direction, suppose that there were a ξ′ in the support of the worst-case distribution with multiple non-zero components among ξ′_i = {ξ′_{ij}}_{j=1}^{K_i}, for some i. Then we could move all the non-zero components, except one, to some other ξ′′ with no non-zero components in ξ′′_i. Due to the marginal distribution constraints, such a ξ′′ would exist in the support of the worst-case distribution, and due to the monotonicity of f′, this move cannot decrease the expected value.

Claim 4. The reduction can only decrease the expected value under the independent distribution.

Proof. For simplicity, let us start by comparing the original instance (f, Ω, {p_i}) to the new instance (f, Ω′, {p′_{ij}}) formed by replacing only the first random variable ξ₁ by K₁ binary random variables ξ′_{11}, …, ξ′_{1K₁}, while keeping the remaining variables intact. That is, Ω′ = {0, 1}^{K₁} × ∏_{i ≠ 1} Ω_i. For any ξ′ ∈ Ω′, ξ = Θ(ξ′) is given by ξ₁ = max{j : ξ′_{1j} = 1} and ξ_i = ξ′_i for i ≠ 1. Also, p′_{1j} = p₁(j) for j = 1, …, K₁, and p′_i = p_i for i ≠ 1.


Now, given the independent distribution over Ω′, denote by π_j the probability that ξ₁ = Θ(ξ′)₁ takes value j. Then the expected value over the independent distribution satisfies
\[
\begin{aligned}
\mathcal{I}(f', \Omega', \{p'_{ij}\}) &= \sum_{j=0}^{K_1} \mathbb{E}\big[f'(\xi')\, \mathbb{I}(\Theta(\xi')_1 = j)\big] \\
&= \sum_{j=1}^{K_1} \pi_j\, \mathbb{E}[f(j, \xi_{-1})] + \Big(1 - \sum_{j \ne 0} \pi_j\Big)\, \mathbb{E}[f(0, \xi_{-1})] \\
&\le \sum_{j=1}^{K_1} p'_{1j}\, \mathbb{E}[f(j, \xi_{-1})] + \Big(1 - \sum_{j \ne 0} p'_{1j}\Big)\, \mathbb{E}[f(0, \xi_{-1})] \\
&= \sum_{j=1}^{K_1} p_1(j)\, \mathbb{E}[f(j, \xi_{-1})] + \Big(1 - \sum_{j \ne 0} p_1(j)\Big)\, \mathbb{E}[f(0, \xi_{-1})] \\
&= \mathcal{I}(f, \Omega, \{p_i\}).
\end{aligned}
\]

The inequality above holds because π_j ≤ p′_{1j}, and f(0, ξ_{−1}) ≤ f(j, ξ_{−1}) by monotonicity. Repeating this argument for all the components i = 1, …, n completes the proof.
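The step π_j ≤ p′_{1j} can be verified by brute force over all realizations of the binary variables ξ′_{11}, …, ξ′_{1K₁}; the marginals below are hypothetical illustrative values.

```python
from itertools import product

p = [0.5, 0.3, 0.2]            # hypothetical marginals p'_{11}, p'_{12}, p'_{13}
K = len(p)
pi = [0.0] * (K + 1)           # pi[j] = Pr(Theta(xi')_1 = j); pi[0] is the all-zero case
for bits in product([0, 1], repeat=K):
    w = 1.0                    # probability of this realization under independence
    for b, q in zip(bits, p):
        w *= q if b else (1 - q)
    j = max((r + 1 for r in range(K) if bits[r]), default=0)   # Theta: highest set index
    pi[j] += w

for j in range(1, K + 1):
    # Theta(xi')_1 = j requires xi'_{1j} = 1, hence pi_j <= p'_{1j}
    assert pi[j] <= p[j - 1] + 1e-12
print("pi_j <= p'_1j for all j")
```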


Appendix D

Maximum of Poisson Random

Variables

In this section, we show that the expected value of the maximum of M independent, identically distributed Poisson random variables is O(log M / log log M) for large M.

Let λ denote the mean and F the distribution of the i.i.d. Poisson variables X_i. Define G = 1 − F, and define the continuous extension of G:
\[
G_c(x) = e^{-\lambda} \sum_{j=1}^{\infty} \frac{\lambda^{x+j}}{\Gamma(x + j + 1)}.
\]

Note that G(k) = G_c(k) for any non-negative integer k. Let the sequence {a_k}_{k=1}^{∞} be defined by G_c(a_k) = 1/k, and define the continuous function L(x) = log(x)/log log(x). Then Kimber (1983) shows that a_k ∼ L(k) for large k.

We use these asymptotic results to derive a bound on the expectation of Z = max_{i=1,…,M} X_i


for large M .

\[
\begin{aligned}
\mathbb{E}[Z] &= \sum_{k=0}^{\infty} \Pr(Z > k) \\
&= \sum_{k=0}^{\lceil L(M^2) \rceil} \Pr(Z > k) + \sum_{k=\lceil L(M^2) \rceil + 1}^{\infty} \Pr(Z > k) \\
&\le L(M^2) + 1 + \int_{x=L(M^2)}^{\infty} \Pr(Z > x)\, dx. 
\end{aligned}
\tag{D.1}
\]

Next, we show that the integral term on the right-hand side is bounded by a constant for large M. Substituting x = L(y) in the integral on the right-hand side, we get

\[
\int_{x=L(M^2)}^{\infty} \Pr(Z > x)\, dx = \int_{L(y)=L(M^2)}^{\infty} \Pr(Z > L(y))\, L'(y)\, dy \le \int_{y=M^2}^{\infty} \Pr(Z > L(y))\, \frac{1}{y}\, dy.
\]

Here L′(y) denotes the derivative of the function L(y). The last step follows because L′(y) ≤ 1/y for large enough y (i.e., if log log y ≥ 1). Further, since Pr(Z > L(k))/k is a decreasing function of k, it follows that:

\[
\int_{y=M^2}^{\infty} \frac{\Pr(Z > L(y))}{y}\, dy \le \sum_{k=M^2}^{\infty} \frac{\Pr(Z > L(k))}{k}.
\]

Now, for large k, L(k) ∼ a_k, and
\[
\Pr(Z > a_k) \le 1 - (1 - G_c(a_k))^M = 1 - \Big(1 - \frac{1}{k}\Big)^M.
\]


Therefore, for large M,
\[
\sum_{k=M^2}^{\infty} \frac{\Pr(Z > L(k))}{k} \le \sum_{k=M^2}^{\infty} \left[ \frac{1}{k} - \frac{1}{k}\Big(1 - \frac{1}{k}\Big)^M \right] \le \sum_{k=M^2}^{\infty} \frac{2M}{k^2} \le 1.
\]

This proves that the integral term on the right-hand side of (D.1) is bounded by a constant, and thus, for large M,
\[
\mathbb{E}[Z] \le L(M^2) + 2 = O(\log M / \log\log M).
\]
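As an illustrative numerical check (not part of the proof), the exact expectation E[Z] = Σ_k (1 − F(k)^M) can be computed from the Poisson CDF and compared against L(M²) + 2. The helper names and the choices λ = 1, M = 10⁶ are assumptions made here; since the bound is asymptotic, a single M only illustrates it.

```python
import math

def expected_max_poisson(M, lam, k_max=100):
    """E[max of M i.i.d. Poisson(lam)] = sum_{k>=0} Pr(Z > k), truncated at k_max."""
    total, cdf, pmf = 0.0, 0.0, math.exp(-lam)   # pmf at k = 0
    for k in range(k_max):
        cdf += pmf
        total += 1.0 - cdf ** M                  # Pr(Z > k) = 1 - F(k)^M
        pmf *= lam / (k + 1)                     # advance pmf to k + 1
    return total

def L(x):
    return math.log(x) / math.log(math.log(x))

M, lam = 10**6, 1.0
ez = expected_max_poisson(M, lam)
print(ez, "<=", L(M**2) + 2)
```

For these parameters the computed expectation stays comfortably below L(M²) + 2, consistent with the bound.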