
Robust Convex Optimization:

A New Perspective That Unifies And Extends

Dimitris Bertsimas∗ Dick den Hertog† Jean Pauphilet‡ Jianzhe Zhen§

Abstract

Robust convex constraints are difficult to handle, since finding the worst-case scenario is equivalent to maximizing a convex function. In this paper, we propose a new approach to deal with such constraints that unifies approaches known in the literature and extends them in a significant way. The extension either obtains better solutions than the ones proposed in the literature, or obtains solutions for classes of problems unaddressed by previous approaches. Our solution is based on an extension of the Reformulation-Linearization-Technique, and can be applied to general convex inequalities and general convex uncertainty sets. It generates a sequence of conservative approximations which can be used to obtain both upper and lower bounds for the optimal objective value. We illustrate the numerical benefit of our approach on a robust control and a robust geometric optimization example.

1 Introduction.

In this paper, we consider a general hard robust constraint

h (A(x)z + b(x)) ≤ 0, ∀z ∈ Z, (1)

where h : R^m → [−∞,+∞] is a proper, closed and convex function, A : R^{n_x} → R^{m×L} and b : R^{n_x} → R^m are affine, and z ∈ R^L. Unlike inequalities that are concave in the uncertain parameters [1], a tractable equivalent reformulation for the robust constraint (1) is out of reach in general.

When h(·) is (conic) quadratic, several exact reformulations have been proposed for specific uncertainty sets. We refer to [2, Chapter 6] for an overview of these methods. In the general case, generic approximate methods have been proposed for safely approximating (1). For homogeneous convex functions h(·) and symmetric norm-based uncertainty sets Z, Bertsimas and Sim [8] propose a computationally tractable safe approximation together with probabilistic guarantees. Zhen et al. [19] develop safe approximations for conic quadratic and SDP problems with polyhedral uncertainty sets. They first reformulate the robust inequality into an equivalent set of linear adaptive robust constraints, which can then be approximated by, e.g., linear decision rules. Roos et al. [17] extend their approach to general convex functions h(·), but still with polyhedral uncertainty sets.

In this paper, we propose two new general safe approximation procedures for the robust convex constraint (1). Our approach is 'general' in the sense that it can be applied to any convex function h(·) and convex uncertainty set Z. Using the Reformulation-Perspectification-Technique (RPT) as a unifying lens, our approach not only unifies previous work from the literature but also extends it in a significant way. First, it is superior on special cases with homogeneous functions h(·) and/or polyhedral uncertainty sets. Second, it is applicable to any convex function h(·) and convex uncertainty set.

The first approach is based on the 'perspectification' of the function h(·) and makes the additional assumption that the uncertainty set lies in the positive orthant. The latter assumption is often satisfied in practice after lifting the uncertainty set [8, 9]. For homogeneous functions h(·), it coincides with the approximation of Bertsimas and Sim [8], and thus can be considered as an extension of Bertsimas and Sim [8] to generic convex functions.

∗ Operations Research Center, Massachusetts Institute of Technology, United States
† Department of Econometrics and Operations Research, Tilburg University, The Netherlands
‡ London Business School, London, United Kingdom
§ ETH Zürich, Switzerland


Moreover, we show that the perspectification technique is equivalent to the approach developed in Zhen et al. [19] and Roos et al. [17] with static decision rules.

We then develop our second and main approximation for a generic class of uncertainty sets, described as the intersection of a polyhedron and a convex set. We first prove that the original uncertain convex inequality (1) is equivalent to an uncertain linear constraint with bilinear uncertainty. In other words, we reformulate (1) as a linear robust constraint where the new uncertainty set Θ is non-convex. We then use the recently developed Reformulation-Perspectification-Technique (RPT) [20] to obtain a hierarchy of safe approximations, i.e., we construct convex sets Θᵢ, i = 0, 1, 2, 3, such that Θ ⊆ Θ₃ ⊆ Θ₂ ⊆ Θ₁ ⊆ Θ₀. We show that this approach unifies all methods known in the literature, and extends them in a significant way. When the uncertainty set lies in the positive orthant, we show that the first-level approximation Θ₁ coincides with the perspectification approach described previously. Consequently, our approach generalizes the work of Bertsimas and Sim [8] and Roos et al. [17] to a broader class of uncertainty sets, and improves upon them by proposing tighter approximations, Θ₂ and Θ₃. For polyhedral uncertainty sets, our RPT-based approach with Θ₁ coincides with the approximation from Roos et al. [17] with linear decision rules [17, Theorem 2].

The paper is structured as follows. We develop a safe approximation via perspectification in Section 2. In Section 3, we present our RPT-based hierarchy of safe approximations and connect it with existing results from the literature. Our approach can also be used to derive lower bounds on the objective value, as described in Section 4. Finally, we assess the numerical performance of our technique in Section 5, and conclude in Section 6.

1.1 Examples.

In this section, we present a few examples of robust convex constraints (1) often encountered in the literature and summarize them in Table 1.

Quadratic Optimization (QO) We consider the general quadratic constraint

z^T F(x)^T F(x) z + f(x)^T z ≤ g(x), ∀z ∈ Z,

where F : R^{n_x} → R^{m×L}, f : R^{n_x} → R^L and g : R^{n_x} → R are affine in x. The constraint is of the form (1) with

h(y, ȳ) := y^T y + ȳ,  A(x) := [F(x); f(x)^T],  and  b(x) := [0; −g(x)].

Alternatively, the constraint can be represented as the second-order cone constraint

‖( F(x)z ; (1 + f(x)^T z − g(x))/2 )‖₂ ≤ (1 − f(x)^T z + g(x))/2, ∀z ∈ Z,

which is also of the form (1) for

h(y, ȳ₁, ȳ₂) := √(y^T y + ȳ₁²) + ȳ₂,  A(x) := [F(x); ½f(x)^T; ½f(x)^T],  and  b(x) := [0; (1 − g(x))/2; −(1 + g(x))/2].

Piecewise linear constraints (PWL) We consider the piecewise-linear convex constraint

max_{k∈[m]} { a_k(x)^T z + b_k(x) } ≤ 0, ∀z ∈ Z,

where a_k : R^{n_x} → R^L and b_k : R^{n_x} → R are affine functions of x. Such a constraint is a special case of (1) with

h(y) := max_{k∈[m]} y_k,  A(x) := [a_k(x)^T]_k,  and  b(x) := [b_k(x)]_k.

Sum-of-max of linear constraints (SML) The sum-of-max of linear constraints is defined through h(y) := Σ_k max_{i∈[I_k]} y_i, for any y ∈ R^{n_y}.


Geometric optimization (GO) In geometric optimization, the function h in (1) is the log-sum-exp function, defined by h(y) := log(e^{y_1} + · · · + e^{y_{n_y}}) for any y ∈ R^{n_y}.

Sum-of-convex constraints (SC) The sum of general convex functions is defined as h(y) := Σ_{i∈[I]} h_i(y), where h_i : R^m → [−∞,+∞], i ∈ [I], are proper, closed and convex with ∩_{i∈[I]} ri(dom h_i) ≠ ∅.

Table 1: Examples of functions h(·) and their corresponding conjugates, defined as h*(w) := sup_{y∈dom h} {y^T w − h(y)}.

Type | h | dom h* | h*
QO 1 | h(y, ȳ) := y^T y + ȳ | {(w, w̄) : w̄ = 1} | h*(w, w̄) = ¼ w^T w
QO 2 | h(y, ȳ) := ‖y‖₂ + ȳ | {(w, w̄) : ‖w‖₂ ≤ 1, w̄ = 1} | h*(w, w̄) = 0
PWL | h(y) := max_i y_i | {w : w ≥ 0, Σ_i w_i = 1} | h*(w) = 0
SML | h(y) := Σ_k max_{i∈I_k} y_i | {(w^k)_k : w^k ≥ 0, Σ_i w^k_i = 1, ∀k} | h*(w) = 0
GO | h(y) := log(Σ_i e^{y_i}) | {w : w ≥ 0, Σ_i w_i = 1} | h*(w) = Σ_i w_i log w_i
SC | h(y) := Σ_{i∈[I]} h_i(y) | {w : Σ_i w_i = w, w_i ∈ dom h_i*, ∀i} | h*(w) = min_{{w_i}_i : Σ_i w_i = w} Σ_i h_i*(w_i)
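As an illustrative aside (not part of the original development), the GO row of Table 1 can be checked numerically: the conjugate of the log-sum-exp function, evaluated at an interior point w of the simplex, should equal the negative entropy Σ_i w_i log w_i. A minimal Python sketch, assuming only numpy and scipy:

```python
# Quick numerical check of the GO row of Table 1: for h(y) = log(sum_i exp(y_i)),
# the conjugate h*(w) on the simplex equals sum_i w_i log w_i.
import numpy as np
from scipy.optimize import minimize
from scipy.special import logsumexp

rng = np.random.default_rng(0)
w = rng.random(5); w /= w.sum()              # a point in the interior of the simplex

# h*(w) = sup_y { w^T y - h(y) }, computed numerically (concave maximization).
neg_obj = lambda y: -(w @ y - logsumexp(y))
sup_val = -minimize(neg_obj, np.zeros(5)).fun

print(sup_val, "vs", np.sum(w * np.log(w)))  # the two values should match
```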

In this paper, we purposely study robust convex constraints, and not only robust conic constraints. Although it is generally believed that most convex problems can be modeled in terms of the five basic cones (linear, quadratic, semi-definite, power, and exponential cones), we do not restrict our attention to robust conic optimization for the following reason: although conic optimization is a powerful modeling framework, the uncertainty is often split over multiple constraints, and the robust version of the conic representation is not equivalent to the original one. For example, while the sum-of-max of linear functions is linear-cone representable, its robust counterpart is not necessarily equivalent to the robust version of the original constraint. The conclusion is that the right order is to first develop the robust counterpart of the nonlinear inequality and then reformulate it into a conic representation, instead of the other way around.

1.2 Notations.

We use non-bold characters (x) to denote scalars, lowercase bold characters (x) to denote vectors, uppercase bold characters (X) to denote matrices, and bold calligraphic characters such as X to denote sets. We denote by e_i the unit vector with 1 at the i-th coordinate and zero elsewhere, with dimension implied by the context.

We use [n] to denote the finite index set [n] = {1, . . . , n} with cardinality |[n]| = n. Whenever the domain of an optimization variable is omitted, it is understood to be the entire space (whose definition will be clear from the context).

The function δ*(x|S) denotes the support function of the set S evaluated at x, i.e.,

δ*(x|S) = sup_{y∈S} y^T x.

A function f : R^{n_y} → [−∞,+∞] is said to be proper if there exists at least one vector y such that f(y) < +∞ and, for any y ∈ R^{n_y}, f(y) > −∞. For a proper, closed and convex function f, we define its conjugate as f*(w) = sup_{y∈dom f} { w^T y − f(y) }. When f is closed and convex, f** = f. The perspective function of f is defined for all y ∈ R^{n_y} and t ∈ R₊ as

(y, t) ↦ t f(y/t) if t > 0,  and  f∞(y) if t = 0.

If the function f : R^{n_y} → [−∞,+∞] is proper, closed and convex, then f∞(y) is called the recession function, which can be equivalently defined as

f∞(y) = lim_{t→0} t f(y/t) = δ*(y | dom f*).


Among others, the perspective and conjugate functions of f satisfy the relationship

t f(y/t) = sup_{w∈dom f*} { y^T w − t f*(w) }  for t > 0.

For ease of exposition, we simply use t f(y/t) to denote the perspective function of f.

2 The perspectification approach.

In this section, we propose an extension of the approach proposed by Bertsimas and Sim [8] to general convex constraints and general convex uncertainty sets. Although this approach is dominated by our main RPT-based approach in Section 3, it has the advantage that probabilistic guarantees can be derived for the solutions obtained (see Section 2.4 and Appendix A).

The analysis in this section holds under the assumption that the uncertainty set lies in the positive orthant. Formally, we make the following assumption:

Assumption 1. The uncertainty set Z = {z ∈ R^L_+ | c_k(z) ≤ 0, k ∈ [K]} is full dimensional and bounded, where c_k : R^L → [−∞,+∞] is proper, closed and convex for each k ∈ [K].

The non-negativity assumption on Z holds without loss of generality and can always be satisfied by properly lifting or shifting the uncertainty set. Indeed, if Z ⊄ R^L_+, one can decompose z into z = z⁺ − z⁻ with z⁺, z⁻ ≥ 0. Then,

A(x)z + b(x) = A(x)z⁺ − A(x)z⁻ + b(x) = Ā(x) [z⁺; z⁻] + b(x),

where Ā(x) = [A(x), −A(x)] is linear in x, and our analysis applies to the lifted uncertainty set

Z′ = { (z⁺, z⁻) ∈ R^L_+ × R^L_+ : z⁺ − z⁻ ∈ Z }.

However, the lifted uncertainty set Z′ defined above might be unbounded. As a result, Bertsimas and Sim [8] and Chen and Zhang [9] proposed to incorporate a non-convex lifting for norm-based uncertainty sets. For instance, if Z = {z ∈ R^L : ‖z‖∞ ≤ 1, ‖z‖₂ ≤ γ}, Bertsimas and Sim [8] consider the following lifted uncertainty set:

Z⁺ = { (z⁺, z⁻) ∈ R^L_+ × R^L_+ : z⁺ − z⁻ ∈ Z, ‖z⁺ + z⁻‖∞ ≤ 1, ‖z⁺ + z⁻‖₂ ≤ γ }.

Alternatively, we can shift the uncertainty set by some constant vector z₀ such that z + z₀ ∈ z₀ + Z ⊆ R^L_+. Hence,

A(x)z + b(x) = A(x)(z + z₀) − A(x)z₀ + b(x),

where b̄(x) := −A(x)z₀ + b(x) is affine in x. We will discuss the implications of these two alternatives on the numerical behavior of our proposed approximations.

2.1 Safe approximations via perspectification.

The perspectification approach is based on the following perspectification lemma [10]:

Lemma 1 (Perspectification of Convex Functions). If the function g : R^m → [−∞,+∞] is proper, closed and convex, then for any y_1, . . . , y_m ∈ R^m and strictly positive weights α_1, . . . , α_m > 0, there exists t ≥ 0 with Σ_{i=1}^m α_i t_i = 1 such that

g( Σ_{i=1}^m α_i y_i ) ≤ Σ_{i=1}^m α_i t_i g( y_i / t_i ). (2)


Remark 1. If g is positively homogeneous, i.e., g(λy) = λ g(y) for any vector y ∈ R^m and any positive scalar λ > 0, then (2) reduces to

g( Σ_{i=1}^m α_i y_i ) ≤ Σ_{i=1}^m α_i g(y_i),

and the additional variables t_i in (2) need not be introduced. In this case, Lemma 1 simply states that g is sub-additive.
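As an illustrative check (not part of the original development), Lemma 1 can be verified numerically for a convex, non-homogeneous function such as g(y) = ‖y‖₂²: the perspectified bound (2) holds for any t ≥ 0 with Σ_i α_i t_i = 1. A short Python sketch with random placeholder data:

```python
# Numerical illustration of Lemma 1 for the convex, non-homogeneous function
# g(y) = ||y||_2^2: the perspectified upper bound (2) holds whenever
# sum_i alpha_i * t_i = 1 with t >= 0.
import numpy as np

rng = np.random.default_rng(0)
m = 4
g = lambda y: float(y @ y)

Y = rng.normal(size=(m, m))          # rows y_1, ..., y_m
alpha = rng.random(m) + 0.1          # strictly positive weights
s = rng.random(m); s /= s.sum()      # a point in the simplex
t = s / alpha                        # so that sum_i alpha_i * t_i = 1

lhs = g(alpha @ Y)                   # g(sum_i alpha_i y_i)
rhs = sum(alpha[i] * t[i] * g(Y[i] / t[i]) for i in range(m))
assert lhs <= rhs + 1e-9
print(lhs, "<=", rhs)
```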

We use this lemma to develop a safe approximation of (1), i.e., a sufficient condition for x to satisfy (1).

Proposition 1 (Safe Approximation via Perspectification). Under Assumption 1, a safe approximation of (1) is:

∀z ∈ Z, ∃(u, ū) ∈ R^{L+1}_+ such that

Σ_{ℓ=1}^L z_ℓ u_ℓ h( A(x)e_ℓ / u_ℓ ) + ū h( b(x) / ū ) ≤ 0, (3a)
Σ_{ℓ=1}^L z_ℓ u_ℓ + ū = 1, (3b)

where A(x)e_ℓ selects the ℓ-th column of the matrix A(x) for each ℓ ∈ [L].

Proof of Proposition 1. For any z ∈ Z, there exists a vector (u, ū) ∈ R^{L+1}_+ such that

h(A(x)z + b(x)) = h( Σ_{ℓ=1}^L z_ℓ A(x)e_ℓ + b(x) ) ≤ Σ_{ℓ=1}^L z_ℓ u_ℓ h( A(x)e_ℓ / u_ℓ ) + ū h( b(x) / ū ),

where the inequality follows from Lemma 1 under the conditions

Σ_{ℓ=1}^L z_ℓ u_ℓ + ū = 1, and u_1, . . . , u_L, ū ≥ 0. □

Proposition 1 approximates the robust convex constraint (1) via a set of adaptive robust constraints that depend linearly on the uncertainty z. Note, however, that even in the fully adaptive case, where (u, ū) can depend arbitrarily on the uncertain parameter z, Proposition 1 is only a valid approximation of the robust constraint (1), not an equivalent reformulation.

Remark 2. The safe approximation in Proposition 1 can be concisely written as

sup_{z∈Z}  inf_{(u,ū)∈R^{L+1}_+ : z^T u + ū = 1}  Σ_{ℓ=1}^L z_ℓ u_ℓ h( A(x)e_ℓ / u_ℓ ) + ū h( b(x) / ū ) ≤ 0.

Expressing h(·) as the conjugate of its conjugate, we have

z_ℓ u_ℓ h( A(x)e_ℓ / u_ℓ ) = sup_{v_ℓ : v_ℓ/z_ℓ ∈ dom h*} { e_ℓ^T A(x)^T v_ℓ − z_ℓ u_ℓ h*( v_ℓ / z_ℓ ) },
ū h( b(x) / ū ) = sup_{w ∈ dom h*} { b(x)^T w − ū h*(w) }.

5

Page 6: Robust Convex Optimization: A New ... - Dimitris Bertsimas

Finally, for a fixed z ∈ Z, we can interchange the inf and sup operators, take the dual with respect to (u, ū), and reformulate the safe approximation in Proposition 1 as

sup { Σ_{ℓ=1}^L e_ℓ^T A(x)^T v_ℓ + b(x)^T w − w₀ } ≤ 0,

where the supremum is taken over z ∈ Z, w ∈ dom h*, v_ℓ/z_ℓ ∈ dom h*, and w₀ such that h*(w) ≤ w₀ and z_ℓ h*(v_ℓ/z_ℓ) ≤ z_ℓ w₀, and where w₀ is the dual variable associated with the linear constraint z^T u + ū = 1. However, the maximization problem above is not convex in (z, {v_ℓ}_ℓ, w, w₀) due to the product of variables z_ℓ w₀ in the constraint z_ℓ h*( v_ℓ / z_ℓ ) ≤ z_ℓ w₀ for every ℓ = 1, . . . , L.

Since the safe approximation in Proposition 1 is an adaptive robust linear constraint, we can then derive tractable reformulations by restricting our attention to specific adaptive policies. For instance, if we consider only static policies, we obtain the following safe approximation of the inequalities (3a) and (3b):

Corollary 1. Under Assumption 1, a safe approximation of (1) is:

z^T h∞(A(x)) + h(b(x)) ≤ 0, ∀z ∈ Z, (4)

where h∞(A(x)) := (h∞(A(x)e₁), . . . , h∞(A(x)e_L))^T.

Proof of Corollary 1. Restricting (u, ū) in Proposition 1 to static policies yields

∃(u, ū) ∈ R^{L+1}_+ such that
Σ_{ℓ=1}^L z_ℓ u_ℓ h( A(x)e_ℓ / u_ℓ ) + ū h( b(x) / ū ) ≤ 0, ∀z ∈ Z,
z^T u + ū = 1, ∀z ∈ Z.

Since Z is full dimensional by Assumption 1, the equality constraint implies u = 0 and ū = 1. Plugging u = 0 and ū = 1 into the first inequality and invoking the definition of the recession function yields (4). □

Remark 3. For the safe approximation (4) to be meaningful, we need the recession function h∞(·) of h to be finite. If this is not the case (if h is quadratic, for instance), then, by decomposing A(x)z + b(x) into Σ_ℓ z_ℓ A(x)e_ℓ + b(x) + 0 and applying the same line of reasoning as for Proposition 1, we can show that

∃(u, ū) ∈ R^{L+1}_+ such that
Σ_{ℓ=1}^L z_ℓ u_ℓ h( A(x)e_ℓ / u_ℓ ) + ū h( b(x) / ū ) ≤ 0, ∀z ∈ Z,
z^T u + ū ≤ 1, ∀z ∈ Z,

also constitutes a safe approximation of (1), under the additional assumption that h(0) = 0.

Remark 4. Under Assumption 1, the uncertain parameter z appears linearly in (4) and Z admits a Slater point; thus, the tractable robust counterpart of (4) can be derived using standard robust optimization techniques:

∃λ ∈ R^K_+, {v_k}_k ⊂ R^L such that
Σ_{k=1}^K λ_k c_k*( v_k / λ_k ) + h(b(x)) ≤ 0,
h∞(A(x)) ≤ Σ_{k=1}^K v_k.


2.2 Connection with the safe approximation of Bertsimas and Sim [8].

In this subsection, we show that Proposition 1 extends the approach of Bertsimas and Sim [8] to proper, closed and convex functions h(·) that are not necessarily homogeneous.

Indeed, Bertsimas and Sim [8] construct a safe approximation for the robust constraint (1) in the special case where the function h(·) is homogeneous. In this case, they observe that h is sub-additive and use sub-additivity to derive a safe approximation. As we previously observed, the perspectification Lemma 1 is a natural generalization of sub-additivity to general convex functions, so that our approach can be seen as an extension of Bertsimas and Sim [8]. Under Assumption 1, if h(·) is further assumed to be homogeneous, then Proposition 1 leads to the following safe approximation of constraint (1):

Σ_{ℓ=1}^L z_ℓ h( A(x)e_ℓ ) + h( b(x) ) ≤ 0, ∀z ∈ Z,

which coincides with the approximation proposed by Bertsimas and Sim [8]. Note that in this case, where the function h(·) is homogeneous, Proposition 1 does not require the introduction of the difficult adaptive decision variables (u, ū). Besides the safe approximation, Bertsimas and Sim [8] also propose a non-convex lifting procedure to transform any norm-based set into a lifted uncertainty set that lies in the positive orthant and satisfies Assumption 1. In that regard, our proposal is agnostic to the procedure used in practice to ensure Assumption 1 holds.
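As a minimal numerical illustration (not part of the original development), the homogeneous bound above can be verified for h = ‖·‖₂, which is positively homogeneous: for z ≥ 0, the triangle inequality directly gives h(A(x)z + b(x)) ≤ Σ_ℓ z_ℓ h(A(x)e_ℓ) + h(b(x)). A Python sketch with random placeholder data:

```python
# Sanity check of the homogeneous case: for h = ||.||_2 and z >= 0,
# sub-additivity plus positive homogeneity give the Bertsimas-Sim bound
# ||A z + b|| <= sum_l z_l ||A e_l|| + ||b||.
import numpy as np

rng = np.random.default_rng(0)
m, L = 5, 7
A, b = rng.normal(size=(m, L)), rng.normal(size=m)
z = rng.random(L)                                  # a nonnegative scenario

lhs = np.linalg.norm(A @ z + b)
rhs = sum(z[l] * np.linalg.norm(A[:, l]) for l in range(L)) + np.linalg.norm(b)
assert lhs <= rhs + 1e-9
print(lhs, "<=", rhs)
```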

2.3 Connection with the adaptive approach from Roos et al. [17].

In this subsection, we show that Corollary 1 coincides with the safe approximation of (1) from Roos et al. [17] with static policies.

Roos et al. [17] propose to first reformulate (1) as an adaptive robust nonlinear constraint.

Proposition 2 (Roos et al. [17], Theorem 6). Under Assumption 1, the robust convex constraint (1) is equivalent to the set of adaptive robust constraints

∀w ∈ dom h*, ∃λ ≥ 0, {u_k}_k such that
Σ_{k=1}^K λ_k c_k*( u_k / λ_k ) + b(x)^T w − h*(w) ≤ 0,
A(x)^T w ≤ Σ_{k=1}^K u_k. (5)

Similarly as in Corollary 1, Roos et al. [17] then safely approximate the adaptive robust nonlinear inequalities (5) via static policies:

∃λ ≥ 0, {u_k}_k such that ∀w ∈ dom h*,
Σ_{k=1}^K λ_k c_k*( u_k / λ_k ) + b(x)^T w − h*(w) ≤ 0,
A(x)^T w ≤ Σ_{k=1}^K u_k,

which admits a tractable robust counterpart [17, Theorem 7] that coincides with that of (4) in Remark 4.

2.4 Probabilistic guarantees.

The main advantage of the safe approximation obtained with Corollary 1 is that the resulting constraint is a robust constraint with the same uncertainty set Z, for which probabilistic guarantees can be derived.


Indeed, probabilistic guarantees require distributional assumptions on the original uncertain parameter, so keeping the uncertainty set intact is a welcome property.

Assumption 1 requires the uncertainty set Z to lie in the positive orthant, an assumption which is often satisfied in practice once the original uncertain parameter is shifted or lifted. Accordingly, we derive probabilistic guarantees for these two cases. For concision, we only report the main results in this section and refer to Appendix A.1 and A.2, respectively, for the proofs.

Shifted uncertainty set

Assumption 2. We assume that

(i) the uncertainty set Z ⊆ R^L_+ can be decomposed into Z = z₀ + U, where U is full dimensional and contains 0 in its interior;

(ii) the random vector z can be decomposed into z = z₀ + u, where z₀ is a deterministic vector and u is a random vector whose coordinates are L independent sub-Gaussian random variables with parameter 1 (see Definition 1 in Appendix A.1).

In terms of distribution, Assumption 2 requires the coordinates of u to be sub-Gaussian or, equivalently, to have Gaussian-type tail probabilities, which is a fairly general assumption. In particular, standard Gaussian random variables, or random variables with mean 0 that are bounded between −1 and 1, are special cases of sub-Gaussian random variables with parameter 1. Under this set of assumptions, we derive the following probabilistic guarantees.

Proposition 3. Under Assumptions 1 and 2, if x satisfies

z^T h∞(A(x)) + h(b(x)) ≤ 0, ∀z ∈ Z,

with h∞(A(x)) = (h∞(A(x)e₁), . . . , h∞(A(x)e_L))^T, then

P( h(A(x)z + b(x)) > 0 ) ≤ exp( −δ*( h∞(A(x)) | U )² / (2 ‖h∞(A(x))‖₂²) ) + p₀ ≤ exp( −ρ(U)²/2 ) + p₀,

where

a) δ*(y|U) := max_{u∈U} y^T u is the support function of the set U,

b) ρ(U) := min_{y : ‖y‖₂=1} δ*(y|U) denotes its robust complexity [6],

c) and p₀ ≥ P(z ≱ 0).

Proposition 3 displays a posteriori and a priori probabilistic guarantees, i.e., a bound that depends on a specific robust solution x and one that only depends on the uncertainty set U. In both cases, it involves p₀ ≥ P(z ≱ 0) = P(u ≱ −z₀), i.e., the probability that the random uncertain parameter z does not belong to the positive orthant. In cases where z is known or assumed to have bounded support (for instance z ∈ [−1, 1]^L almost surely), z₀ can be chosen large enough so that P(z ≱ 0) = 0. Otherwise, for a general z₀ ≥ 0, a union bound yields

P(z ≱ 0) = P(u ≱ −z₀) ≤ Σ_{i=1}^L P( u_i < −z₀ᵢ ) ≤ Σ_{i=1}^L exp( −z₀ᵢ²/2 ),

where the last inequality follows from the fact that u_i ∼ subG(1). The a priori guarantee in Proposition 3 depends on the uncertainty set through its robust complexity, ρ(U), introduced by Bertsimas et al. [6]. We report closed-form expressions of ρ(U), or at least a lower bound, for commonly used uncertainty sets in Table 2.


Table 2: Valid lower bounds on the robust complexity of U, defined as ρ(U) := min_{y:‖y‖₂=1} max_{u∈U} y^T u, for some common uncertainty sets. Instances denoted by a * are valid under the assumption that the true uncertain parameter u satisfies ‖u‖∞ ≤ 1.

Uncertainty set | Definition | Lower bound on ρ(U)
Norm set | {u : ‖u‖_p ≤ Γ} | Γ if p ≥ 2; Γ L^{1/2−1/p} if p ≤ 2
Budget set* | {u : ‖u‖∞ ≤ 1, ‖u‖₁ ≤ Γ} | Γ/√L
Box-ellipsoidal set* | {u : ‖u‖∞ ≤ 1, ‖u‖₂ ≤ Γ} | Γ
ℓ∞+ℓ₁ set | {u = u₁ + u₂ : ‖u₁‖∞ ≤ ρ₁, ‖u₂‖₁ ≤ ρ₂} | ρ₁ + ρ₂/√L
ℓ∞+ℓ₂ set | {u = u₁ + u₂ : ‖u₁‖∞ ≤ ρ₁, ‖u₂‖₂ ≤ ρ₂} | ρ₁ + ρ₂
Polyhedral set | {u : Du ≤ d} | min_i d_i / ‖D^T e_i‖₂
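As an illustrative numerical aside (not from the paper), the budget-set row of Table 2 can be sanity-checked by sampling: the support function of U = {u : ‖u‖∞ ≤ 1, ‖u‖₁ ≤ Γ} admits a closed-form greedy evaluation, and the minimum of the sampled support values over unit directions should stay above the bound Γ/√L. A Python sketch, assuming numpy:

```python
# Sampled sanity check of the budget-set row of Table 2: the sample minimum is
# an upper estimate of rho(U), which should itself lie above Gamma / sqrt(L).
import numpy as np

def support_budget(y, Gamma):
    """delta*(y|U) for U = {||u||_inf <= 1, ||u||_1 <= Gamma}: greedily put mass
    1 on the largest |y_i| coordinates until the l1-budget Gamma is exhausted."""
    a = np.sort(np.abs(y))[::-1]
    k = int(np.floor(Gamma))
    return a[:k].sum() + (Gamma - k) * (a[k] if k < len(a) else 0.0)

rng = np.random.default_rng(0)
L, Gamma = 10, 3.0
ys = rng.normal(size=(20_000, L))
ys /= np.linalg.norm(ys, axis=1, keepdims=True)      # random unit directions
rho_estimate = min(support_budget(y, Gamma) for y in ys)
print(rho_estimate, ">=", Gamma / np.sqrt(L))        # consistent with Table 2
```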

Lifted uncertainty set We now consider the case where Z is a lifted version of an original uncertainty set, i.e.,

Assumption 3. We assume that

(i) the uncertainty set Z ⊆ R^{2L}_+ is of the form Z = {(z⁺, z⁻) ∈ R^L_+ × R^L_+ | ∃u ∈ U : z⁺ = max(u, 0), z⁻ = max(−u, 0)}, where U is full dimensional and contains 0 in its interior;

(ii) the random vector z can be decomposed into z = (max(u, 0), max(−u, 0)) ∈ R^{2L}_+, where the coordinates of u ∈ R^L are L independent sub-Gaussian random variables with parameter 1.

Despite additional technicalities, due to the fact that the coordinates of z± are defined as the positive (resp. negative) parts of sub-Gaussian random variables, we obtain similar guarantees.

Proposition 4. Define |U| := {|u| : u ∈ U}, g(A, x) := [max{h∞(A(x)e_ℓ), h∞(−A(x)e_ℓ)}]_{ℓ=1,...,L}, and α = √L. Consider a vector x satisfying

δ*( g(A, x) | |U| ) + h(b(x)) ≤ 0.

1. Under Assumption 3, if t := δ*( g(A, x) | |U| ) / (α ‖g(A, x)‖₂) > 1, then

P( h(A(x)u + b(x)) > 0 ) ≤ √e · t · exp( −t²/2 ).

2. Under Assumption 3, if ρ(|U|) > α, then

P( h(A(x)u + b(x)) > 0 ) ≤ (√e / α) · ρ(|U|) · exp( −ρ(|U|)² / (2α²) ).

Proposition 4 recovers exactly Bertsimas et al. [7, Theorem 2], under the weaker assumption that the u_j's are independent sub-Gaussian random variables (instead of independent Gaussian random variables) and for any uncertainty set U. In particular, our proof relies on a novel generalization of the Hanson-Wright inequality to sub-Gaussian random variables, which is of independent interest (see Appendix A, Theorem 1 and Corollary 2). However, the constant α in Proposition 4 is weaker than the ones obtained by Bertsimas et al. [7]. Indeed, while our bound is generic, they provide h-specific values for α, which we could use to tighten our bound as well.


3 The reformulation-perspectification approach.

In this section, we describe our main approach, which is based on an extension of the Reformulation-Linearization-Technique: the Reformulation-Perspectification-Technique [20]. We show that it unifies many existing robust convex optimization techniques, including the perspectification approach described in the previous section, and extends them in a significant way.

Our approach comprises three steps:

Step 1. Reformulate the robust convex constraint (1) into a robust linear optimization problem with bilinear uncertainty.

Step 2. Apply the Reformulation-Perspectification-Technique to get a safe approximation of the robust convex inequality that is concave in the uncertain parameters.

Step 3. Construct a computationally tractable robust counterpart of the approximation obtained in Step 2 by using the approach described in Ben-Tal et al. [1].

3.1 Bilinear reformulation.

We first describe Step 1 in more detail. The following proposition shows that the robust convex constraint (1) is equivalent to a robust linear constraint with disjoint bilinear uncertainty.

Proposition 5 (Bilinear Reformulation). The robust convex constraint (1) is equivalent to

sup_{(w,w₀,z,V,v₀)∈Θ} { Tr(A(x)^T V) + b(x)^T w − w₀ } ≤ 0, (6)

where Θ = { (w, w₀, z, V, v₀) | z ∈ Z, w ∈ dom h*, h*(w) ≤ w₀, V = wz^T, v₀ = w₀z }.

Proof of Proposition 5. Because h is closed and convex, we have

h(s) = h**(s) = sup_{w∈dom h*} { s^T w − h*(w) },

and thus

sup_{z∈Z} h(A(x)z + b(x)) = sup_{z∈Z} sup_{w∈dom h*} { z^T A(x)^T w + b(x)^T w − h*(w) } (7)
= sup_{z∈Z, w∈dom h*, h*(w)≤w₀} { Tr(A(x)^T wz^T) + b(x)^T w − w₀ }
= sup_{(w,w₀,z,V,v₀)∈Θ} { Tr(A(x)^T V) + b(x)^T w − w₀ },

for the properly defined set Θ. □

The complication in (6) is that the extended uncertainty set Θ is not convex due to the bilinear constraint V = wz^T. Therefore, in Step 2 we propose a safe approximation of Θ based on the Reformulation-Perspectification-Technique [20]. Formally, we provide tractable approximations of the robust convex constraint in (1) by exhibiting convex sets Φ such that Θ ⊆ Φ. Indeed, if Θ ⊆ Φ, then

sup_{(w,w₀,z,V,v₀)∈Φ} { Tr(A(x)^T V) + b(x)^T w − w₀ } ≤ 0

constitutes a safe approximation of (1). In addition, this safe approximation is linear in the uncertain parameter (w, w₀, z, V, v₀) and convex in the decision variable x. Hence, provided that the new uncertainty set Φ is convex, its robust counterpart can be derived in a tractable manner using techniques from Ben-Tal et al. [1] (Step 3).


3.2 Hierarchy of safe approximations.

To develop a hierarchy of safe approximations for the robust non-convex constraint (6), we assume that the uncertainty set Z is described through K₀ linear and K₁ convex inequalities.

Assumption 4. The uncertainty set Z = {z ∈ R^L | Dz ≤ d, c_k(z) ≤ 0, k ∈ [K₁]}, where D ∈ R^{K₀×L}, d ∈ R^{K₀}, and c_k : R^L → [−∞,+∞] is proper, closed and convex for each k ∈ [K₁].

Note that Assumption 4 is more general than Assumption 1, since non-negativity constraints are a special case of linear constraints. Furthermore, the decomposition of Z into linear and convex constraints in Assumption 4 is not unique. Indeed, affine functions are also convex, so one could assume K₀ = 0 without loss of generality. However, the presence of linear constraints in the definition of Z will be instrumental in deriving good approximations. Consequently, from a practical standpoint, we encourage including as many linear constraints as possible in the linear system of inequalities Dz ≤ d. We illustrate this modeling choice on a simple example.

Example 1. Consider the polytope Z = {z ∈ R^L | Σ_{ℓ=1}^L z_ℓ ≤ 1, z_ℓ ≥ 0, ℓ ∈ [L]}. Z satisfies Assumption 1 with K = 1 and c₁(z) = Σ_ℓ z_ℓ − 1. Therefore, Z also satisfies Assumption 4 with D = −I_L, d = 0_L, and c₁(z) = Σ_ℓ z_ℓ − 1, hence K₀ = L and K₁ = 1. Alternatively, Z satisfies Assumption 4 with K₀ = L + 1 and K₁ = 0 with

D = [−I_L; e^T], and d = [0_L; 1].

Similarly, we assume that dom h* can be described with J₀ linear and J₁ convex inequalities:

dom h* = { w | Fw ≤ f, g_j(w) ≤ 0, j ∈ [J₁] },

so that the non-convex uncertainty set Θ can be represented as

Θ = { (w, w₀, z, V, v₀) |  Dz ≤ d, c_k(z) ≤ 0, k ∈ [K₁]   (z ∈ Z)
                           Fw ≤ f, g_j(w) ≤ 0, j ∈ [J₁]   (w ∈ dom h*)
                           h*(w) ≤ w₀                      (epigraph)
                           V = wz^T, v₀ = w₀z }.

Proposition 6 (Convex Relaxation via RPT). For any (w, w₀, z, V, v₀) ∈ Θ, we have

a) (d_k w − V D^T e_k) / (d_k − e_k^T D z) ∈ dom h*, for all k ∈ [K₀],

b) (z f^T − V^T F^T) e_j / ((f − Fw)^T e_j) ∈ Z, for all j ∈ [J₀].

Each of these two set memberships can be expressed through constraints that are convex in (w, w₀, z, V, v₀). As a result, the three sets

Θ₀ = { (w, w₀, z, V, v₀) | z ∈ Z, w ∈ dom h*, h*(w) ≤ w₀ },

Θ₁ = { (w, w₀, z, V, v₀) | (w, w₀, z, V, v₀) ∈ Θ₀, (d_k w − V D^T e_k) / (d_k − e_k^T D z) ∈ dom h*, k ∈ [K₀] },

Θ₂ = { (w, w₀, z, V, v₀) | (w, w₀, z, V, v₀) ∈ Θ₁, (z f^T − V^T F^T) e_j / ((f − Fw)^T e_j) ∈ Z, j ∈ [J₀] }

are convex and satisfy Θ ⊆ Θ₂ ⊆ Θ₁ ⊆ Θ₀.


Proof.

a) Consider a linear constraint on z, e_k^T(d − Dz) ≥ 0, for some k ∈ [K₀]. Multiplying it by a generic convex constraint on w, g(w) ≤ 0, we obtain

e_k^T(d − Dz) g( (w d^T − V D^T) e_k / ((d − Dz)^T e_k) ) ≤ 0,

which is convex in ((d − Dz)^T e_k, (w d^T − V D^T) e_k), since the constraint function is the perspective of the convex function g. We apply this methodology to

• the linear constraints on w, g(w) = e_j^T(Fw − f), j ∈ [J₀], and obtain

F(w d^T − V D^T) ≤ f (d − Dz)^T, (8)

• the convex constraints g(w) = g_j(w), j ∈ [J₁], i.e.,

e_k^T(d − Dz) g_j( (w d^T − V D^T) e_k / ((d − Dz)^T e_k) ) ≤ 0, ∀j ∈ [J₁], (9)

• and the epigraph constraint h*(w) ≤ w₀, yielding

e_k^T(d − Dz) h*( (w d^T − V D^T) e_k / ((d − Dz)^T e_k) ) ≤ e_k^T(w₀ d − D v₀). (10)

Together, constraints (8)-(10) enforce that

(d_k w − V D^T e_k) / (d_k − e_k^T D z) ∈ dom h*.

b) We can also multiply every linear constraint in w, e_j^T(f − Fw) ≥ 0, j ∈ [J₀], with a generic convex constraint on z, c(z) ≤ 0, and obtain the new convex constraint

e_j^T(f − Fw) c( (z f^T − V^T F^T) e_j / ((f − Fw)^T e_j) ) ≤ 0. (11)

Only the cases c(z) = c_k(z), k ∈ [K₁], have not been covered previously. Eventually, we obtain constraints (8)-(11), enforcing that

(z f^T − V^T F^T) e_j / ((f − Fw)^T e_j) ∈ Z. □

Proposition 6 exhibits a hierarchy of approximations to convexify the non-convex set Θ. A safe approximation is then obtained by considering the robust linear constraint

sup_{(w,w₀,z,V,v₀)∈Φ} { Tr(A(x)^T V) + b(x)^T w − w₀ } ≤ 0,

with Φ = Θᵢ, i = 0, 1, 2. Observe that for Θ₀, the robust counterpart of the safe approximation is

A(x) = 0, h(b(x)) ≤ 0.

Accordingly, it is fair to admit that Θ₀ offers a poor approximation and that relevant safe approximations are to be found at higher orders of the hierarchy. A higher order in the hierarchy, however, does not necessarily


imply a tighter safe approximation. Indeed, if the description of Z involves no linear constraints (K₀ = 0), then Θ₀ = Θ₁. Similarly, Θ₁ = Θ₂ whenever J₀ = 0.

So far, we obtained convex constraints on (w, w₀, z, V, v₀) by multiplying a linear constraint on z (resp. w) with a convex constraint on w (resp. z). In many cases, however, multiplying two general convex constraints, e.g., multiplying c_k(z) ≤ 0 with g_j(w) ≤ 0, also yields valuable convex constraints. The resulting set, which we denote Θ₃ and concisely define as

Θ₃ = { (w, w₀, z, V, v₀) | (w, w₀, z, V, v₀) ∈ Θ₂, c_k(z) × g_j(w) ≤ 0, k ∈ [K₁], j ∈ [J₁] },

provides an even tighter safe approximation of (6). In the definition above, "c_k(z) × g_j(w) ≤ 0" denotes the valid convex inequalities obtained when multiplying "c_k(z) ≤ 0" with "g_j(w) ≤ 0". In the following example, we illustrate this approach when c_k and g_j are two conic quadratic functions, which typically occurs with ellipsoidal uncertainty sets and a conic quadratic function h. We refer to Zhen et al. [20] for a thorough analysis of all 15 possible multiplications of any two of the five basic convex cones (linear, conic quadratic, power, exponential, semi-definite).

Example 2. Consider two conic quadratic inequalities

b₁ − a₁^T w ≥ ‖D₁w + p₁‖,
b₂ − a₂^T z ≥ ‖D₂z + p₂‖,

the first one coming from the description of dom h*, and the second being one of the constraints that define Z. We now apply RPT to these inequalities: we multiply the left-hand sides of the two constraints with each other, we multiply the left-hand side of the first constraint with the second constraint, and we multiply the left-hand side of the second constraint with the first constraint. We obtain:

(b₁ − a₁^T w)(b₂ − a₂^T z) ≥ ‖D₁w + p₁‖ ‖D₂z + p₂‖,
(b₁ − a₁^T w)(b₂ − a₂^T z) ≥ (b₁ − a₁^T w) ‖D₂z + p₂‖,
(b₁ − a₁^T w)(b₂ − a₂^T z) ≥ (b₂ − a₂^T z) ‖D₁w + p₁‖,

which, since V = wz^T, is equivalent to

b₁b₂ − b₁a₂^T z − b₂a₁^T w + a₁^T V a₂ ≥ ‖D₁V D₂^T + p₁z^T D₂^T + D₁w p₂^T + p₁p₂^T‖₂,
b₁b₂ − b₁a₂^T z − b₂a₁^T w + a₁^T V a₂ ≥ ‖b₁D₂z + b₁p₂ − D₂V^T a₁ − (a₁^T w) p₂‖,
b₁b₂ − b₁a₂^T z − b₂a₁^T w + a₁^T V a₂ ≥ ‖b₂D₁w + b₂p₁ − D₁V a₂ − (a₂^T z) p₁‖.

Here, ‖·‖₂ denotes the 2-norm of a matrix, defined as its largest singular value, i.e., ‖M‖₂ := σ_max(M) = √(λ_max(MM^T)). In particular, we used the fact that, for rank-1 matrices M = uv^T, ‖M‖₂ = ‖u‖‖v‖. In practice, constraints of the form ‖M‖₂ ≤ λ can be modeled as semidefinite constraints using the Schur complement. To improve scalability, however, one can replace the 2-norm in the first constraint by the Frobenius norm of the matrix. Indeed, ‖M‖₂ ≤ ‖M‖_F, and equality holds for rank-1 and zero matrices, so replacing the first inequality by

b₁b₂ − b₁a₂^T z − b₂a₁^T w + a₁^T V a₂ ≥ ‖D₁V D₂^T + p₁z^T D₂^T + D₁w p₂^T + p₁p₂^T‖_F,

while keeping the second and third inequalities unchanged, yields valid, yet looser, constraints.
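As a quick numerical aside (not from the paper), the two matrix-norm facts invoked above are easy to check with numpy: the spectral norm of a rank-1 matrix uv^T equals ‖u‖‖v‖ (and coincides with its Frobenius norm), while ‖M‖₂ ≤ ‖M‖_F in general.

```python
# Check: ||u v^T||_2 = ||u|| ||v|| for rank-1 matrices (and equals the Frobenius
# norm in that case), while ||M||_2 <= ||M||_F for arbitrary matrices.
import numpy as np

rng = np.random.default_rng(0)
u, v = rng.normal(size=4), rng.normal(size=6)
M1 = np.outer(u, v)
print(np.linalg.norm(M1, 2),
      np.linalg.norm(u) * np.linalg.norm(v),
      np.linalg.norm(M1, 'fro'))                 # all three coincide for rank 1

M = rng.normal(size=(4, 6))
assert np.linalg.norm(M, 2) <= np.linalg.norm(M, 'fro') + 1e-12
```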

We could further tighten our approximations by applying RPT to z and to w separately. More specifically, we could multiply the linear and nonlinear constraints that define Z with each other and express them in terms of an additional variable defined as Z = zz^T. The same could be done for the constraints in dom h*, with W = ww^T. By doing so, all the nonlinear substitutions are concisely defined as

[W V; V^T Z] = [w; z] [w; z]^T.


In a relaxation, the equality = is replaced by ⪰ and, by using Schur complements, yields the semidefinite constraint

[W V w; V^T Z z; w^T z^T 1] ⪰ 0. (12)

Hence, we can add this LMI to any of the safe approximations Θᵢ obtained by RPT, and the resulting uncertainty set provides a safe approximation at least as good as the one obtained from Θᵢ. However, whether the extra computational burden outweighs the improvement of the approximation remains an open question, whose answer might depend on the specific problem. On a geometric optimization example in Section 5.2, for instance, we observed no numerical improvement from adding this LMI.

3.3 Description of the approach for conic constraints.

In this section, we describe our Reformulation-Perspectification-Technique for conic inequalities. We consider an uncertain conic constraint of the type

A(x)z + b(x) ⪯_K 0, ∀z ∈ Z, (13)

where K ⊆ R^m is a closed convex cone with nonempty relative interior. Examples of such cones are the linear cone, the second-order cone, the semi-definite cone, the power cone, and the exponential cone. We refer the reader to MOSEK [15] for an extensive treatment of these five cones. The dual cone K* of K is defined as

K* = { w ∈ R^m : w^T u ≥ 0, ∀u ∈ K },

and is again a convex cone. Furthermore, we have K = (K*)*, because the cone is closed and convex. Using these properties, we derive for (13)

∀z ∈ Z : A(x)z + b(x) ⪯_K 0
⇐⇒ ∀w ∈ K*, ∀z ∈ Z : w^T( A(x)z + b(x) ) ≤ 0
⇐⇒ max_{w∈K*} max_{z∈Z} w^T( A(x)z + b(x) ) ≤ 0
⇐⇒ max_{w∈K*} max_{z∈Z} Tr(A(x)^T wz^T) + w^T b(x) ≤ 0.

We can now directly apply our RPT approach to the last inequality, which is bilinear in the uncertain parameters w and z. Note that this conic case is a special case of (6) where h*(w) = 0 and dom h* = K*.

3.4 Connection with the perspectification approach.

In this subsection, we show that the safe approximation obtained from Θ₁ for (1) with Z = {z | z ≥ 0, c_k(z) ≤ 0, k ∈ [K]} coincides with that from the perspectification approach in Section 2.1; see Corollary 1.

To elicit this connection, we need the following identity, valid for any u ≥ 0:

u h(y/u) = sup_{w∈dom h*} { y^T w − u h*(w) }.

Denoting between brackets the dual variables associated with the constraints in the definition of Θ₁,

Θ₁ = { (w, w₀, z, V, v₀) |  z ∈ Z,
                            w ∈ dom h*, h*(w) ≤ w₀,  [ū]
                            V e_ℓ / z_ℓ ∈ dom h*, z_ℓ h*( V e_ℓ / z_ℓ ) ≤ v₀ℓ, ℓ ∈ [L],  [u_ℓ] },


we have

sup_{(w,w₀,z,V,v₀)∈Θ₁} { Tr(A(x)^T V) + b(x)^T w − w₀ }

= sup_{z∈Z, w₀, v₀, w∈dom h*, V e_ℓ/z_ℓ∈dom h*}  inf_{u,ū≥0}  { Tr(A(x)^T V) + b(x)^T w − w₀ − Σ_{ℓ=1}^L u_ℓ ( z_ℓ h*( V e_ℓ / z_ℓ ) − v₀ℓ ) − ū ( h*(w) − w₀ ) }

= sup_{z∈Z}  inf_{u,ū≥0}  sup_{w₀,v₀}  { Σ_ℓ u_ℓ z_ℓ h( A(x)e_ℓ / u_ℓ ) + ū h( b(x) / ū ) + u^T v₀ + (ū − 1) w₀ }

= sup_{z∈Z}  z^T h∞( A(x) ) + h( b(x) ).

Remark 5. As previously observed, Corollary 1 might be uninformative if the recession function of h is unbounded. In this case, we proposed an alternative to Corollary 1 for functions h satisfying h(0) = 0. Similarly, if we assume h(0) = 0, then w₀ ≥ h*(w) ≥ 0. In this case, we should define

Θ₀ = { (w, w₀, z, V, v₀) | z ∈ Z, w ∈ dom h*, w₀ ≥ 0, h*(w) ≤ w₀, v₀/w₀ ∈ Z }

in Proposition 6, where the membership v₀/w₀ ∈ Z is enforced by multiplying each constraint in the definition of Z by w₀ ≥ 0, and the connection between Corollary 1 and Θ₁ still holds.

3.5 Connection with Roos et al. [17] for polyhedral uncertainty sets.

In this subsection, we show that the safe approximation obtained from Θ₁ for (1) with Z = {z | Dz ≤ d}, where D ∈ R^{K₀×L} and d ∈ R^{K₀}, coincides with that of Roos et al. [17, Theorem 2].

It follows from Proposition 6 that the robust convex constraint (1) can be safely approximated by

sup_{(w,w₀,z,V,v₀)∈Θ₁} { Tr(A(x)^T V) + b(x)^T w − w₀ } ≤ 0,

where

Θ₁ = { (w, w₀, z, V, v₀) |  Dz ≤ d, h*(w) ≤ w₀,
                            (d_k − e_k^T Dz) h*( (w d_k − V D^T e_k) / (d_k − e_k^T Dz) ) ≤ d_k w₀ − e_k^T D v₀, k ∈ [K₀] }.

The corresponding tractable robust counterpart constitutes a safe approximation of (1), which coincides with the safe approximation of (1) proposed by Roos et al. [17, Theorem 2].

3.6 Connection to result for robust quadratic with an ellipsoid.

It is well known that robust quadratic inequalities with an ellipsoidal uncertainty set admit an exact SDP reformulation via the S-lemma (see [2]). Alternatively, we show that the same reformulation can be obtained via RPT. To this end, consider the following robust quadratic inequality,

z^T A(x) z + b(x)^T z ≤ g(x), ∀z ∈ { z ∈ R^L : z^T D z + d^T z ≤ c }, (14)

which is a special case of the bilinear reformulation in Proposition 5 with w = z, and which is equivalent to

max_{(z,Z)∈Θ_ell} { Tr(A(x)Z) + b(x)^T z } ≤ g(x),

where Θ_ell = { (z, Z) ∈ R^L × S^L | Tr(DZ) + d^T z ≤ c, Z = zz^T }. The set Θ_ell is non-convex because of the nonlinear equality Z = zz^T, and the outer approximations of Θ_ell proposed in Proposition 6 satisfy Θ_ell ⊆ Θ_ell,0 = Θ_ell,1 = Θ_ell,2. By relaxing the non-convex constraint Z = zz^T to Z ⪰ zz^T, and then using the Schur complement, we obtain:

Θ_ell,3 = { (z, Z) ∈ R^L × S^L | Tr(DZ) + d^T z ≤ c, [Z z; z^T 1] ⪰ 0 }.

The tractable robust counterpart of

max_{(z,Z)∈Θ_ell,3} { Tr(A(x)Z) + b(x)^T z } ≤ g(x)

coincides with the tractable robust counterpart of (14) obtained using the S-lemma. Moreover, if the uncertainty set is an intersection of ellipsoids, the safe approximation for the robust quadratic inequality obtained via RPT coincides with that from the approximate S-lemma [3]. In Zhen et al. [20], a similar relationship between the (approximate) S-lemma and RPT is established for QCQO.
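To make this concrete, the following self-contained sketch (not from the paper; all data are randomly generated placeholders) solves the inner maximization over Θ_ell,3 as an SDP with cvxpy and compares it against a Monte-Carlo estimate of the true worst case over the ellipsoid. Consistent with the exactness of the S-lemma for a single ellipsoid, the two values should (nearly) coincide.

```python
# Tightness check of Theta_ell,3 for a robust quadratic over one ellipsoid.
import cvxpy as cp
import numpy as np

rng = np.random.default_rng(1)
L = 3
A = rng.normal(size=(L, L)); A = (A + A.T) / 2        # random quadratic form
b = rng.normal(size=L)
M = rng.normal(size=(L, L)); D = M @ M.T + np.eye(L)  # ellipsoid {z : z^T D z <= 1}

# Inner maximization over Theta_ell,3: an SDP in (z, Z).
z = cp.Variable(L)
Z = cp.Variable((L, L), symmetric=True)
zc = cp.reshape(z, (L, 1))                            # column view of z for the LMI
cons = [cp.trace(D @ Z) <= 1,
        cp.bmat([[Z, zc], [zc.T, np.ones((1, 1))]]) >> 0]
sdp_val = cp.Problem(cp.Maximize(cp.trace(A @ Z) + b @ z), cons).solve()

# Monte-Carlo estimate of max_z { z^T A z + b^T z } over the ellipsoid
# (boundary points plus uniformly scaled interior points).
S = rng.normal(size=(100_000, L))
S /= np.linalg.norm(S, axis=1, keepdims=True)
r = rng.uniform(size=(100_000, 1)) ** (1 / L)
pts = np.vstack([S, S * r]) @ np.linalg.inv(np.linalg.cholesky(D))
vals = np.einsum('ij,jk,ik->i', pts, A, pts) + pts @ b
print(sdp_val, "vs sampled", vals.max())              # should (nearly) coincide
```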

4 Obtaining lower bounds.

One simple way of directly tackling (1) is to consider a finite subset of scenarios Ẑ sampled from the uncertainty set Z; the sampled version of (1) then constitutes a finite set of convex inequalities:

h( A(x)z + b(x) ) ≤ 0 ∀z ∈ Z  ⟹  max_{i∈[I]} h( A(x)z⁽ⁱ⁾ + b(x) ) ≤ 0  ⟺  h( A(x)z⁽ⁱ⁾ + b(x) ) ≤ 0, i ∈ [I], (15)

where {z⁽¹⁾, . . . , z⁽ᴵ⁾} = Ẑ ⊂ Z. Here, the point-wise maximum of convex functions in the sampled inequality can be reformulated into the finite set of convex inequalities (15), which provides a naive progressive approximation of (1). Alternatively, we can also sample scenarios from the dual formulation of (1); to this end, recall that (1) can be reformulated as the adaptive robust nonlinear inequality (5).

Similarly, consider a finite set Ŵ of scenarios from the set W := dom h*, and assign each scenario w⁽ⁱ⁾ its own recourse decision (λ⁽ⁱ⁾, {u_k⁽ⁱ⁾}_k), instead of determining complex (generally non-convex) decision rules that are feasible for all possible scenarios in W. The sampled dual (5) again constitutes a finite set of convex inequalities:

Σ_{k∈[K]} λ_k⁽ⁱ⁾ c_k*( u_k⁽ⁱ⁾ / λ_k⁽ⁱ⁾ ) + b(x)^T w⁽ⁱ⁾ − h*(w⁽ⁱ⁾) ≤ 0, i ∈ [I],
A(x)^T w⁽ⁱ⁾ ≤ Σ_{k∈[K]} u_k⁽ⁱ⁾, i ∈ [I], (16)

where {w⁽¹⁾, . . . , w⁽ᴵ⁾} = Ŵ ⊂ W. One can even combine the inequalities (15) and (16) to compute a possibly tighter approximation of (1) [5].

The question that remains is how to choose the finite set of scenarios. One simple way would be to consider all extreme points of Z, in which case the sampled version (15) is exact. However, the number of extreme points of Z can be large in general. Hadjiyiannis et al. [11] propose a way to obtain a small but effective finite set of scenarios for two-stage robust linear problems, which takes the scenarios that are binding for the model solved with affine policies. We adapt the method of Hadjiyiannis et al. [11] to select finite sets of scenarios from Z and W. To this end, we first approximate (3) with static policies, that is, (4), then fix the obtained solution in (3) and take the binding scenarios z⁽¹⁾ and z⁽²⁾ for the two inequalities; therefore, Ẑ = {z⁽¹⁾, z⁽²⁾}. Similarly, we can approximate (5) with static policies, then fix the obtained (x, λ, {u_k}_k) in (5) and take the binding scenario w⁽ⁱ⁾, i ∈ [n_x + 1], for each inequality to construct Ŵ. Finally, the inequalities in (15) with Ẑ and (16) with Ŵ provide a progressive approximation of (1).


We propose the following strategy to improve the obtained progressive approximation of (1); the idea is adopted from Zhen and de Ruiter (2019). Consider the following form of the bilinear reformulation (6) with a fixed x′ from a progressive approximation:

sup_{z∈Z, w∈dom h*} { w^T A(x′) z + b(x′)^T w − h*(w) }. (17)

The embedded maximization problem in (17) can be interpreted as a disjoint bilinear problem, that is, fixing either z or w, the embedded problem becomes a convex optimization problem. By exploiting this observation, for any given z′ ∈ Ẑ ⊆ Z, the maximizer

w′ = arg max_{w∈dom h*} w^T A(x′) z′ + b(x′)^T w − h*(w)

is a violating scenario if the corresponding optimal value is larger than 0, and w′ can be used to update Ŵ, that is, Ŵ′ = Ŵ ∪ {w′}. Subsequently, fixing the obtained w′ in (17), the maximizer

z′ = arg max_{z∈Z} w′^T A(x′) z + b(x′)^T w′ − h*(w′) (18)

is a violating scenario if the corresponding optimal value is larger than 0, and z′ can be used to update Ẑ, that is, Ẑ′ = Ẑ ∪ {z′}. We can repeat this iterative procedure until no improvement on the lower bound for (17) can be achieved. The enriched sets Ẑ′ and Ŵ′ can then be used to improve x′ by considering (15) with Ẑ′ and (16) with Ŵ′. The procedure can be repeated until no improvement can be achieved or a prescribed computational limit is met.
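For concreteness, the sketch below (a simplified illustration, not the paper's code) implements this alternating procedure for a two-term log-sum-exp constraint as in Section 5.2, where dom h* is the probability simplex, so that the w-step has a closed-form softmax solution; A_mat and b_vec are hypothetical stand-ins for A(x′) and b(x′) at the current solution, and cvxpy is assumed for the z-step over Z = {‖z‖∞ ≤ 1, ‖z‖₂ ≤ γ}.

```python
# Alternating (disjoint bilinear) scenario generation for h(y) = log(e^y1 + e^y2).
import cvxpy as cp
import numpy as np
from scipy.special import logsumexp, softmax

rng = np.random.default_rng(0)
m, L, gamma = 2, 6, 1.5
A_mat, b_vec = rng.normal(size=(m, L)), rng.normal(size=m)   # stand-ins for A(x'), b(x')

def best_w(z):
    """argmax_w { w^T (A z + b) - h*(w) } over the simplex: the softmax."""
    return softmax(A_mat @ z + b_vec)

def best_z(w):
    """argmax_z { w^T A z } over Z: a convex (here linear) problem in z."""
    z = cp.Variable(L)
    cp.Problem(cp.Maximize((w @ A_mat) @ z),
               [cp.norm(z, 'inf') <= 1, cp.norm(z, 2) <= gamma]).solve()
    return z.value

def violation(z):
    """Worst case over w at scenario z, i.e. h(A z + b), in closed form."""
    return logsumexp(A_mat @ z + b_vec)

z = np.zeros(L)                            # any feasible starting scenario
for _ in range(20):                        # alternate until the value stabilizes
    w = best_w(z)
    z_new = best_z(w)
    if violation(z_new) <= violation(z) + 1e-8:
        break
    z = z_new
print("critical scenario:", z, "value:", violation(z))
```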

Remark 6. One can improve the efficiency of the iterative procedure if A(x) = A in (1), that is, if A is independent of x. In this case, for a given w′, de Ruiter et al. [10] have shown that the maximizer z′ in (18) dominates w′, that is, the feasible region for x in (15) with Ẑ′ = {z′} is larger than or equal to the one in (16) with Ŵ′ = {w′}. Therefore, the inequalities involving w′ are redundant and can be omitted throughout the procedure to improve its efficiency.

5 Computational results.

In this section, we assess the numerical benefit of our RPT hierarchy on two examples: a robust control and a robust geometric optimization setting, respectively.

5.1 Constrained linear-quadratic control.

In this section, we consider an example from stochastic linear-quadratic control as in Bertsimas and Brown [4]. They consider a discrete-time stochastic system of the form

y_{k+1} = A_k y_k + B_k u_k + C_k z_k, k = 0, . . . , N − 1,

where y_k, u_k, and z_k are the state, control, and disturbance vectors, respectively. The objective is to control the system (find control vectors u_k) in order to minimize the cost function

J(y₀, u, z) = Σ_{k=1}^N ( y_k^T Q_k y_k + 2 q_k^T y_k ) + Σ_{k=0}^{N−1} ( u_k^T R_k u_k + 2 r_k^T u_k ),

under some uncertainty on the disturbance vectors z_k. After algebraic manipulations, and given a robust description of the uncertainty, the problem can be cast as

min_{x,t} t  s.t.  ‖x‖₂² + 2 (F^T x + h)^T z + z^T C z ≤ t, ∀z ∈ Z,

where Z = {z : ‖z‖₂ ≤ γ}, for a properly defined vector h and matrices F, C (C ⪰ 0). Note that x can be subject to other deterministic constraints. For instance, imposing u_k ≥ 0 yields linear constraints on x.


We refer the reader to Bertsimas and Brown [4] and references therein for problem motivation and details on the derivations.

Hence, we focus our attention on the robust convex constraint

2 (F^T x + h)^T z + z^T C z ≤ t − ‖x‖₂², ∀z ∈ Z. (19)

As discussed in Section 1.1, constraints of this form can be represented either as quadratic or as second-order cone constraints. In this section, we only consider implementations with h quadratic. In addition, note that Z is described using a single quadratic constraint. However, our approximations partly rely on the presence of linear constraints in the definition of the uncertainty set. Hence, one could consider a shifted uncertainty set instead, obtained by introducing redundant linear inequalities such as −γe ≤ z ≤ γe, or a non-convex lifting of Z as in Bertsimas and Sim [8]. In this section, we compare these modeling alternatives with three objectives in mind: (a) illustrate the benefit of our general RPT approach outlined in Section 3 over the perspectification approach from Section 2, which requires more stringent assumptions on the uncertainty set; (b) measure the relative benefit of shifting or lifting the uncertainty set; and (c) compare and assess the overall strength of our proposed safe approximations.

It is well known that the robust constraint (19) is equivalent to a semidefinite constraint, which we use as a benchmark to measure the suboptimality of our approach. In our numerical experiments, as in Bertsimas and Brown [4], we take y₀ = −1, A_k = B_k = C_k = 1, Q_k = R_k = β^k with β ∈ (0, 1), and q_k = r_k = 0.

Benefit from RPT over perspectification As we proved in Section 3.4, the safe approximation derived from Corollary 1 is equivalent to applying RPT up to Θ₁ in Proposition 6. In addition, here, there are no linear constraints in the description of dom h*, so Θ₁ = Θ₂, and the benefit of using RPT (Proposition 6) over perspectification (Corollary 1) might be unclear. Yet, as discussed in Section 3.2, RPT is more flexible and can allow for an arbitrary number of linear constraints, while perspectification can only account for non-negativity constraints on z. We illustrate this point on the present example.

To derive meaningful safe approximations, we need to enrich the definition of Z with linear constraints. A generic method for doing so is shifting the uncertainty set. For instance, any z with ‖z‖₂ ≤ γ satisfies −γe ≤ z ≤ γe. Accordingly, we can:

• consider the shifted uncertain vector z̄ = γe + z, z ∈ Z, so that z̄ ≥ 0, and apply Corollary 1. We refer to this method as [Cor. 1 LB];

• consider the shifted uncertainty set γe − Z ⊆ R^L_+ and apply Corollary 1 [Cor. 1 UB];

• add the 2L linear inequalities −γe ≤ z ≤ γe to the definition of Z and apply RPT with Θ₁ [Prop. 6, Lifted Θ₁].

Derivations of the resulting three safe approximations are provided in Appendix C. Figure 1 displays the sub-optimality gap of each alternative, compared with the exact SDP reformulation, as γ increases. [Prop. 6, Lifted Θ₁] provides a substantial improvement over both [Cor. 1 LB] and [Cor. 1 UB], hence illustrating the benefit of accounting for both lower and upper bounds on z when safely approximating the robust constraint. Corollary 1, however, is unable to leverage such information, for it requires exactly and only L non-negativity constraints. In addition, applying RPT opens the door to a hierarchy of approximations. For instance, by multiplying the two norm constraints in the definitions of Z and dom h* respectively, one can consider the third level of approximation Θ₃, which further reduces the optimality gap (see the performance of [Prop. 6, Lifted Θ₃] in Figure 1).

Benefit from lifting over shifting Another option to add linear constraints to the definition of the uncertainty set is to decompose z into z = z⁺ − z⁻ and consider the lifted uncertainty set {(z⁺, z⁻) : z⁺, z⁻ ≥ 0, ‖z⁺ + z⁻‖₂ ≤ γ}. Observe that this kind of non-convex lifting cannot be applied to any uncertainty set, for the constraint ‖z‖₂ ≤ γ is replaced by ‖z⁺ + z⁻‖₂ ≤ γ (instead of ‖z⁺ − z⁻‖₂ ≤ γ) and constitutes a valid lifting of Z for norm-sets only [8]. Yet, as demonstrated in Figure 2, such a non-convex lifting of the uncertainty set can lead to tighter safe approximations (see [Prop. 6, Lifted Θ₁] and [Prop. 6, Lifted Θ₃]) than shifting, especially for low to moderate values of γ.


Figure 1: Computational results on constrained linear-quadratic control. Evolution of the sub-optimality gap of the safe approximations as γ increases, for constrained linear-quadratic control examples with N = 20 periods and shifted uncertainty sets. Panels: (a) β = 0.2; (b) β = 0.7 with y_k ≥ 0.

We remark that the first level of approximation with the lifted uncertainty set [Prop. 6, Lifted Θ₁], which is equivalent to the approach outlined in Bertsimas and Sim [8], competes with and even surpasses the third level of approximation on the shifted one. However, we observe that the difference between shifting and lifting shrinks as γ increases. Also, we should emphasize that shifting is a very generic procedure that can be applied to any bounded uncertainty set, without further assumptions on its structure.

Figure 2: Computational results on constrained linear-quadratic control. Relative performance of shifting vs. lifting the uncertainty set as γ increases, for constrained linear-quadratic control examples with N = 20 periods. Performance is measured in terms of sub-optimality gap with respect to the SDP reformulation. Panels: (a) β = 0.2; (b) β = 0.7 with y_k ≥ 0.

Overall performance Figure 3 compares the performance of the SDP reformulation with the safe approximations obtained from Θ₃ in Proposition 6, with the original uncertainty set {z : ‖z‖₂ ≤ γ} and its shifted and lifted versions, respectively. In short, we observe that our approach successfully provides near-optimal solutions (within 5% optimality) at a fraction of the cost of solving the exact SDP reformulation.


Figure 3: Computational results on constrained linear-quadratic control. Numerical behavior of three level-3 safe approximations derived with RPT (Proposition 6) as the number of time periods N increases, compared with the exact SDP reformulation of the robust constraint. For this experiment, we set β = 1.0 and impose y_k ≥ 0. Panels: (a) sub-optimality gap; (b) computational time.

5.2 Robust geometric optimization.

We evaluate the performance of our proposed approach on geometric optimization instances that are ran-domly generated in the same way as in Hsiung et al. [13]. In particular, we consider geometric op-timization problems with a linear objective, and a system of two-term log-sum-exp robust inequalities

minx

c>x

s.t. log

(e

(−1+B

(1)i z

)>x

+ e

(−1+B

(2)i z

)>x)≤ 0, ∀z ∈ Z, i ∈ [I], (20)

where c = 1 ∈ R^{nx} is the all-ones vector, and B_i^{(1)}, B_i^{(2)} ∈ R^{nx×L} are randomly generated sparse matrices with sparsity density 0.1 whose nonzero elements are uniformly distributed on the interval [−1, 1]. The uncertainty set is assumed to be the intersection of a hypercube and a ball, that is,

Z = { z ∈ R^L | ‖z‖∞ ≤ 1, ‖z‖2 ≤ γ }.

We consider a set of instances with nx = I = 100 and L ∈ {6, 8, . . . , 20}. We select

γ = ( 2^L Γ(L/2 + 1) / π^{L/2} )^{1/L},

where Γ denotes the gamma function, which ensures that the volume of the ball {z | ‖z‖2 ≤ γ} coincides with the volume of the hypercube {z | ‖z‖∞ ≤ 1}. All reported numerical results are averages over 10 randomly generated instances.
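As a sanity check on this choice of radius, γ can be computed in log-space; a minimal Python sketch (the function name is ours, for illustration only):

```python
import numpy as np
from scipy.special import gammaln

def gamma_radius(L: int) -> float:
    """Radius for which the L-dimensional ball {||z||_2 <= gamma} has the
    same volume as the hypercube {||z||_inf <= 1}, whose volume is 2^L."""
    # gamma^L = 2^L * Gamma(L/2 + 1) / pi^(L/2), computed in log-space
    log_gamma_L = L * np.log(2.0) + gammaln(L / 2 + 1) - (L / 2) * np.log(np.pi)
    return float(np.exp(log_gamma_L / L))

for L in range(6, 21, 2):
    print(L, round(gamma_radius(L), 3))
```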

Consider the bilinear reformulation of the i-th constraint of (20),

sup_{(w, z, v1, v2, v0) ∈ Θ} { (−1 w1 + B_i^{(1)} v1)>x + (−1 w2 + B_i^{(2)} v2)>x − w0 },

with the non-convex uncertainty set

Θ = { (w, z, v1, v2, v0) |  ‖z‖∞ ≤ 1, ‖z‖2 ≤ γ   (z ∈ Z)
                            w1, w2 ≥ 0, w1 + w2 = 1   (w ∈ dom h∗)
                            w1 log w1 + w2 log w2 ≤ w0   (epigraph)
                            v1 = w1 z, v2 = w2 z, v0 = w0 z }.
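The epigraph constraint on (w, w0) in Θ encodes the convex conjugate of the two-term log-sum-exp function; a quick numpy check (ours) of the underlying identity log(e^{a1} + e^{a2}) = max over the simplex of w>a − Σ_i w_i log w_i:

```python
import numpy as np

# Conjugate identity behind the bilinear reformulation: for two-term
# log-sum-exp, the maximizer over the simplex is the softmax of a.
a = np.array([0.3, -1.2])
lse = np.log(np.exp(a).sum())
w = np.exp(a) / np.exp(a).sum()                  # softmax maximizer
val = w @ a - np.sum(w * np.log(w))              # w'a - sum_i w_i log w_i
assert np.isclose(lse, val)
print(lse, val)
```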


The set Θ is safely approximated by four convex sets that satisfy Θ ⊆ Θ3 ⊆ Θ2 ⊆ Θ1 ⊆ Θ1′, where Θ1 and Θ2 are obtained from applying Proposition 6 to Θ, while Θ3 is obtained by enriching Θ2 with the LMI in (12). In addition, we also compare the safe approximations from Θ1, Θ2 and Θ3 with Θ1′, where Θ1′ is the corresponding Θ1 if Zbox = {z | ‖z‖∞ ≤ 1} is considered instead of Z. Note that the obtained safe approximation from Θ1′ coincides with that of Roos et al. [17], as discussed in Section 3.5. We refer to Appendix D for a detailed representation of Θ1, Θ1′, Θ2 and Θ3. We resort to comparing our solutions' objective value to a lower bound. To this end, we compute a lower bound by using the optimal solution of the tightest obtained safe approximation to find potentially critical scenarios in the uncertainty set. The lower bound is then constructed by solving a model that only safeguards against this finite set of critical scenarios; see more detail in Section 4. The sketch below illustrates this lower-bounding idea on a toy problem.
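As a minimal illustration of the critical-scenario lower bound (a toy robust linear constraint, not the paper's actual experiment; assumes cvxpy with a default conic solver):

```python
import numpy as np
import cvxpy as cp

rng = np.random.default_rng(0)
n, L, gamma, d = 5, 3, 1.0, 1.0
c, a0, P = rng.normal(size=n), rng.normal(size=n), rng.normal(size=(n, L))
box = lambda v: [cp.norm(v, "inf") <= 10]     # keeps the toy problem bounded

# Exact robust counterpart of (a0 + P z)'x <= d for all ||z||_2 <= gamma.
x = cp.Variable(n)
exact = cp.Problem(cp.Minimize(c @ x),
                   [a0 @ x + gamma * cp.norm(P.T @ x, 2) <= d] + box(x))
exact.solve()

# Lower bound: safeguard only against a finite, growing set of critical
# scenarios, each the worst case z* = gamma*P'x/||P'x||_2 of the previous
# iterate (cf. Section 4).
scenarios, x_s = [np.zeros(L)], cp.Variable(n)
for _ in range(5):
    lb = cp.Problem(cp.Minimize(c @ x_s),
                    [(a0 + P @ z) @ x_s <= d for z in scenarios] + box(x_s))
    lb.solve()
    g = P.T @ x_s.value
    scenarios.append(gamma * g / np.linalg.norm(g))
print("scenario lower bound", lb.value, "<= robust optimum", exact.value)
```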

Figure 4: Computational results on robust geometric optimization. Numerical behavior of the safe approximations from Θ1′, Θ1 and Θ2 as the size of the instance, that is, L ∈ {6, 8, . . . , 20}, increases. Panels: (a) sub-optimality gap; (b) computational time.

Numerical performance From Figure 4 we observe that, while our proposed safe approximations from Θ1 and Θ2 require similar computational effort to that from Θ1′, the average optimality gaps from considering Θ1 and Θ2 are consistently smaller than those from considering Θ1′. For all considered instances, we observe that, despite the significant additional computational effort, the optimality gaps obtained from considering Θ3 coincide with the optimality gaps from considering Θ2. Therefore, we do not report the computation time for the safe approximations from Θ3. Moreover, we also consider the shifted uncertainty set, and observe that the obtained optimality gaps coincide with the optimality gaps when the original uncertainty set is considered.

Benefit from lifting Alternatively, we consider the following lifted uncertainty set of Z:

Z+ = { (z+, z−) ∈ R^L_+ × R^L_+ | ‖z+ + z−‖∞ ≤ 1, ‖z+ + z−‖2 ≤ γ }.

This non-convex projection was first proposed in Bertsimas and Sim [8] and then extended to solve adaptive robust linear problems in Chen and Zhang [9]. By substituting z with z+ − z−, the bilinear reformulation becomes

sup_{(w, z+, z−, v1+, v1−, v2+, v2−, v0+, v0−) ∈ Θ+} { (−1 w1 + B_i^{(1)} (v1+ − v1−))>x + (−1 w2 + B_i^{(2)} (v2+ − v2−))>x − w0 }


with the lifted non-convex uncertainty set

Θ+ = { (w, z+, z−, v1+, v1−, v2+, v2−, v0+, v0−) |
        ‖z+ + z−‖∞ ≤ 1, ‖z+ + z−‖2 ≤ γ   ((z+, z−) ∈ Z+)
        w1, w2 ≥ 0, w1 + w2 = 1   (w ∈ dom h∗)
        w1 log w1 + w2 log w2 ≤ w0   (epigraph)
        v1+ = w1 z+, v1− = w1 z−
        v2+ = w2 z+, v2− = w2 z−
        v0+ = w0 z+, v0− = w0 z−
        z+, z− ∈ R^L_+ },

which is safely approximated by four convex sets that satisfy Θ+ ⊆ Θ+2 ⊆ Θ+1 ⊆ Θ+1′ and Θ+ ⊆ Θ+2 ⊆ Θ+1 ⊆ ΘBS, where Θ+1 and Θ+2 are obtained from applying Proposition 6 to Θ+, and Θ+1′ is the corresponding Θ+1 if Z+box = {(z+, z−) ∈ R^L_+ × R^L_+ | ‖z+ + z−‖∞ ≤ 1} is considered instead of Z+. We define ΘBS in such a way that the obtained safe approximation coincides with that of the extended approach of Bertsimas and Sim [8]; see Sections 2.2 and 3.4. We refer to Appendix E for a detailed representation of ΘBS, Θ+1, Θ+1′ and Θ+2. We do not consider Θ+3, that is, Θ+2 with the additional LMI as in (12), because we already observed that Θ3 does not improve the approximation from Θ2 for the problem we consider here.
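To see why Z+ is a valid lifting for these norm-based sets, note that the complementary pair z+ = max(z, 0), z− = max(−z, 0) satisfies z+ + z− = |z|, so both norm constraints on z+ + z− reduce to the original ones on z. A quick numpy check (our own illustration):

```python
import numpy as np

rng = np.random.default_rng(1)
z = rng.uniform(-1, 1, size=6)
z_plus, z_minus = np.maximum(z, 0), np.maximum(-z, 0)

# For complementary pairs, z+ + z- = |z|, hence the lifted constraints
# ||z+ + z-|| <= r recover ||z|| <= r for both the 2- and inf-norms.
assert np.allclose(z_plus - z_minus, z)
assert np.isclose(np.linalg.norm(z_plus + z_minus, 2), np.linalg.norm(z, 2))
assert np.isclose(np.linalg.norm(z_plus + z_minus, np.inf),
                  np.linalg.norm(z, np.inf))
```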

Numerical performance with lifting From Figure 5 we observe that, while our proposed safe approximations from Θ+1 and Θ+2 require similar computational effort to that from ΘBS, the average optimality gaps from considering Θ+1 and Θ+2 are consistently smaller than those from considering ΘBS. For all considered instances, we observe that the optimality gaps obtained from considering Θ+1′ coincide with the optimality gaps from considering Θ1 (i.e., without lifting). Therefore, we do not report the computation time for the safe approximations from Θ+1′.

Figure 5: Computational results on robust geometric optimization. Numerical behavior of the safe approximations from Θ+1′, Θ+1 and Θ+2 as the size of the instance, that is, L ∈ {6, 8, . . . , 20}, increases. Panels: (a) sub-optimality gap; (b) computational time.

Overall performance Firstly, the average optimality gaps from Θ1′ (which has the same performance as Roos et al. [17]; see Section 3.5) and ΘBS (which has the same performance as the extended approach of Bertsimas and Sim [8]; see Sections 2.2 and 3.4) are very close to each other, while the computational effort required for computing the safe approximation from ΘBS is almost twice that from Θ1′. Our proposed safe approximations from Θ1, Θ2, Θ+1 and Θ+2 consistently outperform those from considering Θ1′ and ΘBS. Although the computational effort required for computing the safe approximations from Θ+1 and Θ+2 is more than twice that from Θ1′ and ΘBS, the average optimality gaps from Θ+1 and Θ+2 are half of the ones from Θ1′ and ΘBS.


6 Conclusion.

We propose a hierarchical safe approximation scheme for (1). Via numerical experiments, we demonstrate that our approach either coincides with or improves upon the existing approaches of Bertsimas and Brown [4] and Roos et al. [17]. Furthermore, our approach not only provides a trade-off between solution quality and computational effort, but also allows a direct integration with existing safe approximation schemes, e.g., the lifting technique proposed in Bertsimas and Sim [8] and Chen and Zhang [9], which is used to further improve the obtained safe approximations.


A Probabilistic guarantees.

In this section, we derive probabilistic guarantees for our safe approximation via perspectification without adaptive variables (Corollary 1). In order to provide such guarantees, distributional assumptions on the "true" uncertain parameter are needed. However, Assumption 1 requires the uncertainty set Z to lie in the positive orthant, an assumption which is often satisfied in practice once the original uncertain parameter is shifted or lifted. Hence, we need to specify how the uncertainty set Z relates to the true uncertain parameter. For clarity, we will refer to this original uncertain vector as u and identify random variables with tildes (˜·).

A.1 Shifted uncertainty set.

We first consider Assumption 2 and the case where the uncertainty set is shifted. In practice, the uncertain parameter z is often decomposed into z = z̄ + u, where z̄ is the nominal value and u the (uncertain) deviation from the nominal. Then, an uncertainty set U for u is built, and z is taken in the uncertainty set z̄ + U. Assumption 2 mirrors this modelling process with the additional caveat that the uncertainty set is further shifted by z0 − z̄ in order to guarantee that Z ⊆ R^L_+.

From a probability distribution standpoint, Assumption 2 requires the coordinates of ũ to be sub-Gaussian:

Definition 1 ([16, Definition 1.2]). A random variable u ∈ R is said to be sub-Gaussian with parameter σ², denoted u ∼ subG(σ²), if u is centered, i.e., E[u] = 0, and for all s ∈ R,

E[e^{su}] ≤ e^{s²σ²/2}.

Naturally, centered Gaussian random variables are also sub-Gaussian. Bounded random variables are also a special case of sub-Gaussian random variables, as a consequence of Hoeffding's inequality; see [16, Lemma 1.8]. From a bound on the moment generating function, one can derive large deviation bounds. More precisely, if u ∼ subG(σ²), then for any t > 0, P(u > t) ≤ exp(−t²/(2σ²)) [16, Lemma 1.3].

A key quantity in our result will be P(z̃ ≱ 0) = P(ũ ≱ −z0), i.e., the probability that a random uncertain parameter z̃ does not belong to the positive orthant. In cases where z̃ is known or assumed to have bounded support - for instance z̃ ∈ [−1, 1]^L almost surely - then z0 can be chosen large enough so that P(z̃ ≱ 0) = 0. Otherwise, for a general z0 ≥ 0, a union bound yields

P(z̃ ≱ 0) = P(ũ ≱ −z0) ≤ Σ_{i=1}^L P(ũi < −z0i) ≤ Σ_{i=1}^L exp(−z0i²/2),

where the last inequality follows from the fact that ũi ∼ subG(1). In the rest of this section, we denote by p0 the probability P(z̃ ≱ 0) (or a valid upper bound), which can be as low as 0 and in general depends on the shifting vector z0.
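Numerically, this union bound is immediate to evaluate; a minimal sketch (the function name is ours):

```python
import numpy as np

def p0_upper_bound(z0: np.ndarray) -> float:
    """Union-bound estimate of P(z~ not >= 0) for z~ = z0 + u~ with
    independent subG(1) coordinates: sum_i exp(-z0_i^2 / 2), capped at 1."""
    return float(min(1.0, np.sum(np.exp(-z0**2 / 2))))

# Shifting by z0 = 3 * ones(10) already makes the bound small:
print(p0_upper_bound(3 * np.ones(10)))   # 10 * exp(-4.5) ~= 0.111
```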

We now prove Proposition 3.

Proof of Proposition 3. We consider a robust solution x. With support function notation, x satisfies

δ∗(h∞(A(x)) | Z) + h(b(x)) ≤ 0.

We denote by A the event that z̃ ≥ 0. Hence, P(Ac) = P(z̃ ≱ 0) ≤ p0, by definition of p0. Conditioned on A, the assumptions of Lemma 1 are satisfied, leading to

h(A(x)z̃ + b(x)) ≤ z̃>h∞(A(x)) + h(b(x)),

and

P(h(A(x)z̃ + b(x)) > 0 | A) ≤ P( z̃>h∞(A(x)) > δ∗(h∞(A(x)) | Z) )
                            = P( ũ>h∞(A(x)) > δ∗(h∞(A(x)) | U) )
                            ≤ exp( −δ∗(h∞(A(x)) | U)² / (2‖h∞(A(x))‖2²) ),


where the last inequality follows from the fact that, by affine transformation of sub-Gaussian random variables, ũ>h∞(A(x)) is itself sub-Gaussian with parameter ‖h∞(A(x))‖2². By definition of the robust complexity, for any vector v,

δ∗(v | U) / ‖v‖2 ≥ min_{v : ‖v‖2 = 1} δ∗(v | U) = ρ(U),

so that we obtain

P(h(A(x)z̃ + b(x)) > 0 | A) ≤ exp( −1 / (2ρ(U)²) ).

All things considered,

P(h(A(x)z̃ + b(x)) > 0) = P(h(A(x)z̃ + b(x)) > 0 | A) P(A) + P(h(A(x)z̃ + b(x)) > 0 | Ac) P(Ac)
                        ≤ P(h(A(x)z̃ + b(x)) > 0 | A) + P(Ac),

and the result follows.

Remark 7. As previously suggested, if the recession function of h is not finite, we can further assume that h(0) = 0 and show that

∃(v, v̄) ∈ R^{L+1}_+ such that   Σ_{ℓ=1}^L z_ℓ v_ℓ h(A(x)e_ℓ / v_ℓ) + v̄ h(b(x) / v̄) ≤ 0, ∀z ∈ Z,
                                z>v + v̄ ≤ 1, ∀z ∈ Z,

constitutes a valid safe approximation of (1). If x satisfies such a safe approximation, then a similar line of reasoning yields (proof omitted)

P(h(A(x)z̃ + b(x)) > 0) ≤ exp( −δ∗(g(x, v) | U)² / (2‖g(x, v)‖2²) ) + exp( −δ∗(v | U)² / (2‖v‖2²) ) + p0
                        ≤ 2 exp( −1 / (2ρ(U)²) ) + p0,

with g(x, v)>e_ℓ = v_ℓ h(A(x)e_ℓ / v_ℓ).

A.2 Lifted uncertainty set.

We now consider the case where Z is a lifted version of an original uncertainty set, i.e., Assumption 3. Under this assumption, we can write Corollary 1 as:

(z+)>h∞(A(x)) + (z−)>h∞(−A(x)) + h(b(x)) ≤ 0, ∀(z+, z−) ∈ Z.

Since z+_ℓ and z−_ℓ cannot be simultaneously positive, the constraint above is equivalent to

Σ_ℓ |u_ℓ| max[h∞(±A(x)e_ℓ)] + h(b(x)) ≤ 0, ∀u ∈ U.

Before proving Proposition 4, we establish a technical lemma.

Lemma 2. For any x and z, we have

max( h∞(±A(x)z) ) ≤ α ‖H(A, x) z‖2,

for α = √L and H(A, x) the L × L diagonal matrix whose diagonal coefficients are max[h∞(±A(x)e_ℓ)].


Proof of Lemma 2. Since h∞ is positively homogeneous with h∞(0) = 0, it is sub-additive, and

h∞(A(x)z) ≤ Σ_ℓ h∞(z_ℓ A(x)e_ℓ)                                (sub-additivity)
          ≤ Σ_ℓ |z_ℓ| h∞(ε_ℓ A(x)e_ℓ)   with ε_ℓ = sign(z_ℓ)
          ≤ Σ_ℓ |z_ℓ| max[h∞(±A(x)e_ℓ)]
          ≤ √L ( Σ_ℓ max[h∞(±A(x)e_ℓ)]² z_ℓ² )^{1/2}           (Cauchy-Schwarz).
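For a concrete instance, take h∞ = ‖·‖1, which is positively homogeneous with h∞(0) = 0; Lemma 2 can then be checked numerically (our own illustration):

```python
import numpy as np

rng = np.random.default_rng(2)
m, L = 4, 6
A = rng.normal(size=(m, L))
z = rng.normal(size=L)

# Lemma 2 with h_inf = l1-norm: max(||A z||_1, ||-A z||_1) = ||A z||_1
# <= sqrt(L) * ||H z||_2, where H = diag(||A e_l||_1).
lhs = np.linalg.norm(A @ z, 1)
H = np.diag(np.linalg.norm(A, 1, axis=0))        # column-wise l1 norms
rhs = np.sqrt(L) * np.linalg.norm(H @ z, 2)
assert lhs <= rhs + 1e-12
print(lhs, "<=", rhs)
```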

Finally, let us prove the main result of this section, Proposition 4.

Proof of Proposition 4. 1. Using the safe approximation

h(A(x)ũ + b(x)) ≤ h∞(A(x)ũ) + h(b(x)) ≤ max[h∞(±A(x)ũ)] + h(b(x)),

together with

δ∗(g(A, x) | |U|) + h(b(x)) ≤ 0,

we have

P(h(A(x)ũ + b(x)) > 0) ≤ P( max[h∞(±A(x)ũ)] > δ∗(g(A, x) | |U|) ).

Applying Lemma 2, we eventually get

P(h(A(x)z̃ + b(x)) > 0) ≤ P( α‖H(A, x)ũ‖2 > δ∗(g(A, x) | |U|) ).

Both sides of the inequality being positive, we compare their squares and the result follows from the generalized Hanson-Wright inequality (Corollary 2) with t = δ∗(g(A, x) | |U|)² / α².

2. The function t ↦ √(et) exp(−t²/2) is monotonically decreasing over [1, ∞). Denoting t(x) = δ∗(g(A, x) | |U|) / (α‖g(A, x)‖2), we have t(x) ≥ ρ(|U|)/α.

Remark 8. A tighter, yet more cumbersome, bound can be obtained by invoking the generalized Hanson-Wright inequality stated in Theorem 1 instead of Corollary 2.

Remark 9. We recover exactly Bertsimas et al. [7, Theorem 2], under the weaker assumption that the uj's are independent sub-Gaussian random variables (instead of independent Gaussian random variables) and for any uncertainty set U. However, the generic constant α we obtain is weaker. Indeed, Bertsimas et al. [7] provide an h-specific proof of Lemma 2 and are able to derive tighter values of α in these special cases, which could in turn be used to tighten our bound as well.

B Generalization of the Hanson-Wright inequality.

The goal of this section is to derive large deviation bounds on quadratic transforms of sub-Gaussian random variables. In the Gaussian case, the chi-square distribution with n degrees of freedom is defined as the distribution of Σ_{i=1}^n z_i², where the z_i's are n independent Gaussian random variables with mean 0 and variance 1. In this section, we provide analogous results on the distribution of z_i² and Σ_{i=1}^n z_i² when the z_i's are sub-Gaussian random variables.

We start with a technical lemma regarding the moment generating function of z².


Lemma 3. Let z be a sub-Gaussian random variable with variance proxy 1, and let λ ≥ 0 be a non-negative scalar. Then, for any θ ∈ R and β > 2 such that βθλ ≤ 1, we have

E[exp(θλz²)] ≤ exp( (1/2) βθλ ln(β/(β−2)) ).

Proof of Lemma 3. We follow the same proof technique as Bertsimas and Sim [8]. Introducing θ, β ∈ R,

E[exp(θλz²)] = E[exp(βθλ z²/β)] = E[ (e^{z²/β})^{βθλ} ] ≤ E[e^{z²/β}]^{βθλ},

where the last inequality follows from Jensen's inequality under the condition that βθλ ≤ 1. Since z is sub-Gaussian with variance proxy 1, we have E[e^{sz²/2}] ≤ 1/√(1−s) for any s ∈ [0, 1) [18, Theorem 2.6]. Hence, if β > 2,

E[exp(θλz²)] ≤ exp( (1/2) βθλ ln(β/(β−2)) ).
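The bound can be checked on a simple sub-Gaussian variable. For a Rademacher variable (variance proxy 1, by Hoeffding's inequality) we have z² = 1, so the left-hand side is exact; a sketch of ours:

```python
import numpy as np

# Numeric check of Lemma 3 for Rademacher z: E[exp(theta*lam*z^2)]
# equals exp(theta*lam) exactly since z^2 = 1.
theta, lam, beta = 0.25, 1.0, 4.0        # beta > 2 and beta*theta*lam = 1
lhs = np.exp(theta * lam)                # ~ 1.284
rhs = np.exp(0.5 * beta * theta * lam * np.log(beta / (beta - 2)))
assert lhs <= rhs                        # rhs = sqrt(2) ~ 1.414
print(lhs, "<=", rhs)
```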

We now state a generalization of the Hanson-Wright inequality [12] to the case of sub-Gaussian random variables.

Theorem 1. Let z = (z1, . . . , zn) ∈ R^n be a random vector with independent sub-Gaussian coordinates z_j with variance proxy σ². Let A be an m × n matrix. Then, denoting λ the vector of eigenvalues of A>A, for every t > σ²‖λ‖1,

P(‖Az‖2² > t) ≤ [ √(et/(σ²‖λ‖1)) exp( −t/(2σ²‖λ‖1) ) ]^{‖λ‖1/‖λ‖∞}.

Remark 10. Hsu et al. [14] proved a similar tail bound for quadratic forms of sub-Gaussian vectors. Under the assumptions and notation of Theorem 1, for a sub-Gaussian random vector with variance proxy 1, they proved that ([14, Theorem 2.1])

∀s > 0, P( ‖Az‖2² > ‖λ‖1 + 2‖λ‖2√s + 2‖λ‖∞ s ) ≤ e^{−s}.

Setting t = ‖λ‖1 + 2‖λ‖2√s + 2‖λ‖∞ s > ‖λ‖1, we would similarly obtain a bound which decreases exponentially in t as exp(−t/(2‖λ‖∞)).

Proof of Theorem 1. Without loss of generality, we can prove the result for σ² = 1. The general result will follow after observing that

P(‖Az‖2² > t) = P(‖Ay‖2² > t/σ²),

where y := z/σ is a sub-Gaussian vector with variance proxy 1. The rest of the proof is divided into three steps.

Step 1: Decoupling. Our proof technique relies on a decoupling mechanism similar to Hsu et al. [14]. Let us introduce an m-dimensional Gaussian vector g = (g1, . . . , gm) ∼ N(0, Im). We fix some t > 0 and θ ∈ R. On the one side, we have

E[exp(θ g>Az)] = E[ E[exp(θ g>Az) | g] ] ≤ E[ exp( (θ²/2) ‖A>g‖2² ) ],


where the last inequality follows from the fact that z is sub-Gaussian with variance proxy 1. On the other side,

E[exp(θ g>Az)] ≥ E[ exp(θ g>Az) I(‖Az‖2² > t) ]
             = E[ E[exp(θ g>Az) I(‖Az‖2² > t) | z] ]
             = E[ exp( (θ²/2) ‖Az‖2² ) I(‖Az‖2² > t) ]
             ≥ exp(θ²t/2) P(‖Az‖2² > t).

All in all,

P(‖Az‖2² > t) ≤ exp(−θ²t/2) E[ exp( (θ²/2) ‖A>g‖2² ) ].

Step 2: Bounding the moment generating function of the Gaussian chaos. Consider the singular value decomposition of A, A = U>ΛV, where U and V are orthonormal matrices and Λ ∈ R^{m×n} is a diagonal matrix whose diagonal coefficients are √λ1, . . . , √λ_{min(m,n)}. By rotational invariance, y := Ug ∼ N(0, Im) and ‖A>g‖2 = ‖Λ>y‖2. Hence,

E[ exp( (θ²/2) ‖A>g‖2² ) ] = E[ exp( (θ²/2) ‖Λ>y‖2² ) ] = Π_{i=1}^{min(m,n)} E[ exp( (θ²/2) λi yi² ) ].

We now apply Lemma 3 to bound each moment generating function: given β > 2 such that (1/2)βθ²λi ≤ 1, we have

E[ exp( (θ²/2) ‖A>g‖2² ) ] ≤ exp( (1/4) βθ²‖λ‖1 ln(β/(β−2)) ),

where ‖λ‖1 = Σ_{i=1}^{min(m,n)} λi.

Step 3: Conclusion. Putting the pieces together, we finally obtain

P(‖Az‖2² > t) ≤ exp( −(1/2)θ²t + (1/4)βθ²‖λ‖1 ln(β/(β−2)) ).

We fix the value of θ such that (1/2)β‖λ‖∞θ² = 1. As a result, the condition (1/2)βλiθ² ≤ 1 is naturally satisfied for all i = 1, . . . , n, and

P(‖Az‖2² > t) ≤ exp( −t/(β‖λ‖∞) + (‖λ‖1/(2‖λ‖∞)) ln(β/(β−2)) ).

Taking derivatives and choosing the best β, we have

β = 2t/(t − ‖λ‖1) > 2 as long as t > ‖λ‖1.

Substituting and simplifying, we finally obtain, for t > ‖λ‖1 (recall that σ² = 1 here),

P(‖Az‖2² > t) ≤ exp( ‖λ‖1/(2‖λ‖∞) ) exp( (1/2)(‖λ‖1/‖λ‖∞) ln(t/‖λ‖1) ) exp( −t/(2‖λ‖∞) ).

We conclude this section with a weaker, yet simpler, version of the Hanson-Wright inequality of Theorem 1.


Corollary 2. Under the same assumptions as Theorem 1, for every t > σ²‖λ‖1,

P(‖Az‖2² > t) ≤ √(et/(σ²‖λ‖1)) exp( −t/(2σ²‖λ‖1) ).

Proof of Corollary 2. It follows from Theorem 1 after observing that ‖λ‖1/‖λ‖∞ ≥ 1.

Remark 11. Notice that the bound in Corollary 2 only involves ‖λ‖1, which can be computed directly from A as ‖λ‖1 = tr(A>A).
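This makes the bound cheap to evaluate from data; the following sketch (ours) compares it against a Monte Carlo estimate for a standard Gaussian vector, which is sub-Gaussian with variance proxy 1:

```python
import numpy as np

rng = np.random.default_rng(4)
m, n, sigma2 = 3, 5, 1.0
A = rng.normal(size=(m, n))
lam1 = np.trace(A.T @ A)                       # ||lambda||_1 = tr(A'A)

def corollary2_bound(t: float) -> float:
    """Tail bound of Corollary 2 on P(||Az||_2^2 > t), for t > sigma2*lam1."""
    s = sigma2 * lam1
    return float(np.sqrt(np.e * t / s) * np.exp(-t / (2 * s)))

t = 3.0 * lam1
z = rng.normal(size=(n, 200_000))              # Gaussian => subG(1)
emp = np.mean(np.sum((A @ z) ** 2, axis=0) > t)
print(f"empirical {emp:.4f} <= bound {corollary2_bound(t):.4f}")
```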

C Derivations for the robust control example.

Constraint (19) is of the form

h(A(x)z + b(x)) ≤ t − ‖x‖2², ∀z ∈ Z,

with

h([u; ū]) = u>u + ū,   A(x) = [C^{1/2}; 2x>F + 2h>],   b(x) = [0; 0].

In this case, since there are no linear constraints in the definition of dom h∗, Θ1 = Θ2 in the hierarchy of safe approximations presented in Proposition 6.

C.1 Ellipsoidal uncertainty set.

We first consider the raw description of the uncertainty set Z = {z : ‖z‖2 ≤ γ} and consider the bilinear reformulation (6)

sup_{(w, w̄, w0, z, V, v, v0) ∈ Θ} [ Tr(C^{1/2}V) + 2x>Fv + 2h>v − w0 ] ≤ t − ‖x‖2²,

with

Θ = { (w, w̄, w0, z, V, v, v0) |  ‖z‖2 ≤ γ   (z ∈ Z)
                                  w̄ = 1   ((w, w̄) ∈ dom h∗)
                                  0 ≤ w0, (1/4)‖w‖2² ≤ w0   (h∗(w, w̄) ≤ w0)
                                  V = wz>, v = w̄z, v0 = w0z }.

After careful inspection, we can eliminate the variables w̄ (= 1) and v (= z) and consider instead

sup_{(w, w0, z, V, v0) ∈ Θ} [ Tr(C^{1/2}V) + 2x>Fz + 2h>z − w0 ] ≤ t − ‖x‖2²,

with

Θ = { (w, w0, z, V, v0) |  ‖z‖2 ≤ γ, 0 ≤ w0, (1/4)‖w‖2² ≤ w0, V = wz>, v0 = w0z }.

However, in the absence of linear inequalities in the description of Z and dom h∗, we have Θ0 = Θ1 = Θ2 and the resulting safe approximations are uninformative. Yet, we can apply the third level of approximation,

Θ3 = { (w, w0, z, V, v0) |  ‖z‖2 ≤ γ, 0 ≤ w0, (1/4)‖w‖2² ≤ w0, (1/4)‖V‖2² ≤ γ²w0, ‖v0‖2 ≤ γw0 },


and eventually get the following robust counterpart:

2γ‖F>x + h‖2 + γ²‖C‖2 ≤ t − ‖x‖2².

This is equivalent to upper-bounding the strongly convex part in the original constraint by γ²‖C‖2 and taking the robust counterpart of the resulting robust linear constraint. This is the approach chosen by Bertsimas and Brown [4], although with different derivations.
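In a modeling language, this counterpart is a single second-order-cone constraint. A minimal sketch assuming cvxpy, with synthetic data, and reading ‖C‖2 as the spectral norm (an assumption on our part):

```python
import numpy as np
import cvxpy as cp

rng = np.random.default_rng(5)
n, L, gamma = 4, 3, 0.5
F, h = rng.normal(size=(n, L)), rng.normal(size=L)
C = np.eye(L)

# 2*gamma*||F'x + h||_2 + gamma^2*||C||_2 <= t - ||x||_2^2, written in DCP
# form with the quadratic term moved to the left-hand side.
x, t = cp.Variable(n), cp.Variable()
spec_C = np.linalg.norm(C, 2)                  # spectral norm of C
rc = (2 * gamma * cp.norm(F.T @ x + h, 2)
      + gamma**2 * spec_C + cp.sum_squares(x) <= t)
cp.Problem(cp.Minimize(t), [rc]).solve()
print(t.value)
```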

C.2 Shifted uncertainty set.

We enrich the definition of Z with redundant linear constraints, i.e., we consider

Z = { z : ‖z‖2 ≤ γ, z̲ ≤ z ≤ z̄ },

with z̄ = γe and z̲ = −γe. As a result, z − z̲ ≥ 0 and we can apply Corollary 1 for the shifted uncertainty set −z̲ + Z ⊆ R^I_+. Namely, we introduce additional decision variables u, ū ≥ 0 such that u>(z − z̲) + ū ≤ 1, ∀z ∈ Z, i.e., γ‖u‖2 − u>z̲ + ū ≤ 1, and such that

−Σ_i z̲_i C_ii / u_i + γ ‖ 2F>x + 2h + [C_ii/u_i]_i ‖2 + z̲>Cz̲ / ū ≤ t − ‖x‖2².

Alternatively, z̄ − z ≥ 0 and we can apply Corollary 1 for the shifted uncertainty set z̄ − Z ⊆ R^I_+. Namely, we introduce additional decision variables u, ū ≥ 0 such that u>(z̄ − z) + ū ≤ 1, ∀z ∈ Z, i.e., u>z̄ + γ‖u‖2 + ū ≤ 1, and such that

Σ_i z̄_i C_ii / u_i + γ ‖ 2F>x + 2h − [C_ii/u_i]_i ‖2 + z̲>Cz̲ / ū ≤ t − ‖x‖2².

Finally, we can also employ the bilinear reformulation (6):

sup_{(w, w0, z, V, v0) ∈ Θ} [ Tr(C^{1/2}V) + 2x>Fz + 2h>z − w0 ] ≤ t − ‖x‖2²,

with

Θ = { (w, w0, z, V, v0) |  z̲ ≤ z ≤ z̄, ‖z‖2 ≤ γ   (z ∈ Z)
                            (1/4)‖w‖2² ≤ w0   (h∗(w, w̄) ≤ w0)
                            V = wz>, v0 = w0z },

and derive the corresponding hierarchy (Proposition 6):

Θ0 = { (w, w0, z, V, v0) |  z̲ ≤ z ≤ z̄, ‖z‖2 ≤ γ   (z ∈ Z)
                             (1/4)‖w‖2² ≤ w0   (h∗(w, w̄) ≤ w0)
                             w0z̲ ≤ v0 ≤ w0z̄, ‖v0‖2 ≤ γw0 },

Θ1 = { (w, w0, z, V, v0) ∈ Θ0 |  (1/4)(z̄_k − z_k) ‖ (z̄_k w − V e_k)/(z̄_k − z_k) ‖2² ≤ w0z̄_k − v0k, ∀k,
                                  (1/4)(z_k − z̲_k) ‖ (V e_k − z̲_k w)/(z_k − z̲_k) ‖2² ≤ v0k − w0z̲_k, ∀k },

Θ2 = Θ1,

Θ3 = { (w, w0, z, V, v0) ∈ Θ2 |  (1/4)‖V‖F² ≤ γ²w0 }.


C.3 Lifted uncertainty set.

Another way to introduce meaningful constraints is to lift the uncertainty set into positive and negative parts, i.e., to consider {(z+, z−) ∈ R^{2L}_+ | ‖z+ + z−‖2 ≤ γ}. In this case, we can apply Corollary 1 for the lifted uncertainty set and introduce additional decision variables u+, u−, ū+, ū− ≥ 0 such that

γ‖max(u+, u−)‖2 + ū+ + ū− ≤ 1,

γ ‖ max( [C_ii/u+_i]_i + 2F>x + 2h, [C_ii/u−_i]_i − 2F>x − 2h ) ‖2 ≤ t − ‖x‖2².

We can also apply the bilinear formulation

sup_{(w, w0, z±, V±, v0±) ∈ Θ} [ Tr(C^{1/2}V+) − Tr(C^{1/2}V−) + 2(F>x + h)>(z+ − z−) − w0 ] ≤ t − ‖x‖2²,

with

Θ = { (w, w̄, w0, z±, V±, v0±) |  0 ≤ z±, ‖z+ + z−‖2 ≤ γ   ((z+, z−) ∈ Z)
                                   0 ≤ w0, (1/4)‖w‖2² ≤ w0   (h∗(w, w̄) ≤ w0)
                                   V± = wz±>, v0± = w0z± },

and the corresponding hierarchy is

Θ0 = { (w, w0, z±, V±, v0±) |  0 ≤ z±, ‖z+ + z−‖2 ≤ γ   ((z+, z−) ∈ Z)
                                0 ≤ w0, (1/4)‖w‖2² ≤ w0   (h∗(w, w̄) ≤ w0)
                                v0± ≥ 0, ‖v0+ + v0−‖2 ≤ γw0 },

Θ1 = { (w, w0, z±, V±, v0±) ∈ Θ0 |  (z±,k/4) ‖ (1/z±,k) V± e_k ‖2² ≤ v0,±,k, ∀k },

Θ2 = Θ1,

Θ3 = { (w, w0, z±, V±, v0±) ∈ Θ2 |  (1/4)‖V+ + V−‖F² ≤ γ²w0 }.

D Supplementary Material: Hierarchical safe approximations for robust geometric optimization.

The safe approximation at Level 1 with Zbox = {z | ‖z‖∞ ≤ 1} (instead of Z) is:

Θ1′ = { (w, z, v1, v2, v0) |  ‖z‖∞ ≤ 1
                               w1, w2 ≥ 0, w1 + w2 = 1
                               w1 log w1 + w2 log w2 ≤ w0
                               ‖v1‖∞ ≤ w1, ‖v2‖∞ ≤ w2, v1 + v2 = z
                               (w1 − v1ℓ) log( (w1 − v1ℓ)/(1 − zℓ) ) + (w2 − v2ℓ) log( (w2 − v2ℓ)/(1 − zℓ) ) ≤ w0 − v0ℓ, ℓ ∈ [L]
                               (w1 + v1ℓ) log( (w1 + v1ℓ)/(1 + zℓ) ) + (w2 + v2ℓ) log( (w2 + v2ℓ)/(1 + zℓ) ) ≤ w0 + v0ℓ, ℓ ∈ [L] }.

The safe approximation at Level 1 is:

Θ1 = { (w, z, v1, v2, v0) ∈ Θ1′ | ‖z‖2 ≤ γ }.

The safe approximation at Level 2 is:

Θ2 = { (w, z, v1, v2, v0) ∈ Θ1 | ‖v1‖2 ≤ w1γ, ‖v2‖2 ≤ w2γ }.

The safe approximation at Level 3 is:

Θ3 = { (w, z, v1, v2, v0) ∈ Θ2 |  ∃Z, W :  [ W   V   w
                                             V>  Z   z
                                             w>  z>  1 ]  ⪰ 0 },

where W ∈ R^{3×3}, V = [v0 v1 v2]> ∈ R^{3×L}, Z ∈ R^{L×L} and w = [w0 w1 w2]>.
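For concreteness, the LMI can be declared in a few lines of cvxpy (a sketch of ours showing the variable bookkeeping only; the remaining constraints of Θ2 would be added alongside):

```python
import cvxpy as cp

L = 4
# Declare the (L+4) x (L+4) moment-style matrix of Appendix D as one
# symmetric PSD variable and read off the blocks (W, V, w, Z, z).
M = cp.Variable((L + 4, L + 4), symmetric=True)
W, V, w = M[:3, :3], M[:3, 3:3 + L], M[:3, -1]
Z, z = M[3:3 + L, 3:3 + L], M[3:3 + L, -1]
constraints = [M >> 0, M[-1, -1] == 1]
# ... constraints of Theta_2 on (w, z, V) would be appended here.
cp.Problem(cp.Minimize(0), constraints).solve()
```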


E Supplementary Material: Hierarchical safe approximations for robust geometric optimization with lifting.

The safe approximation of the extended approach of Bertsimas and Sim [8] is:

ΘBS = { (w, z+, z−, v1+, v1−, v2+, v2−, v0+, v0−) |
         ‖z+ + z−‖∞ ≤ 1, ‖z+ + z−‖2 ≤ γ
         w1, w2 ≥ 0, w1 + w2 = 1
         w1 log w1 + w2 log w2 ≤ w0
         ‖v1+ + v1−‖∞ ≤ w1, ‖v2+ + v2−‖∞ ≤ w2
         v1+ + v2+ = z+, v1− + v2− = z−
         v1ℓ+ log(v1ℓ+/zℓ+) + v2ℓ+ log(v2ℓ+/zℓ+) ≤ v0ℓ+, ℓ ∈ [L]
         v1ℓ− log(v1ℓ−/zℓ−) + v2ℓ− log(v2ℓ−/zℓ−) ≤ v0ℓ−, ℓ ∈ [L]
         z+, z−, v1+, v1−, v2+, v2− ∈ R^L_+ }.

The safe approximation at Level 1 with Z+box = { (z+, z−) ∈ R^L_+ × R^L_+ | ‖z+ + z−‖∞ ≤ 1 } is:

Θ+1′ = { (w, z+, z−, v1+, v1−, v2+, v2−, v0+, v0−) |
          ‖z+ + z−‖∞ ≤ 1
          w1, w2 ≥ 0, w1 + w2 = 1
          w1 log w1 + w2 log w2 ≤ w0
          ‖v1+ + v1−‖∞ ≤ w1, ‖v2+ + v2−‖∞ ≤ w2
          v1+ + v2+ = z+, v1− + v2− = z−
          v1ℓ+ log(v1ℓ+/zℓ+) + v2ℓ+ log(v2ℓ+/zℓ+) ≤ v0ℓ+, ℓ ∈ [L]
          v1ℓ− log(v1ℓ−/zℓ−) + v2ℓ− log(v2ℓ−/zℓ−) ≤ v0ℓ−, ℓ ∈ [L]
          (w1 − v1ℓ+ − v1ℓ−) log( (w1 − v1ℓ+ − v1ℓ−)/(1 − zℓ+ − zℓ−) ) + (w2 − v2ℓ+ − v2ℓ−) log( (w2 − v2ℓ+ − v2ℓ−)/(1 − zℓ+ − zℓ−) ) ≤ w0 − v0ℓ+ − v0ℓ−, ℓ ∈ [L]
          z+, z−, v1+, v1−, v2+, v2− ∈ R^L_+ }.

The safe approximation at Level 1 is:

Θ+1 = { (w, z+, z−, v1+, v1−, v2+, v2−, v0+, v0−) ∈ Θ+1′ | ‖z+ + z−‖2 ≤ γ }.

The safe approximation at Level 2 is:

Θ+2 = { (w, z+, z−, v1+, v1−, v2+, v2−, v0+, v0−) ∈ Θ+1 | ‖v1+ + v1−‖2 ≤ w1γ, ‖v2+ + v2−‖2 ≤ w2γ }.


References

[1] Ben-Tal A, Den Hertog D, Vial JP (2015) Deriving robust counterparts of nonlinear uncertain inequalities. Mathematical Programming 149(1-2):265–299.

[2] Ben-Tal A, El Ghaoui L, Nemirovski A (2009) Robust Optimization, volume 28 (Princeton University Press).

[3] Ben-Tal A, Nemirovski A, Roos K (2002) Robust solutions of uncertain quadratic and conic-quadratic problems. SIAM Journal on Optimization 13(2):535–560.

[4] Bertsimas D, Brown DB (2007) Constrained stochastic LQC: A tractable approach. IEEE Transactions on Automatic Control 52(10):1826–1841.

[5] Bertsimas D, de Ruiter FJCT (2016) Duality in two-stage adaptive linear optimization: Faster computation and stronger bounds. INFORMS Journal on Computing 28(3):500–511.

[6] Bertsimas D, Den Hertog D, Pauphilet J (2019) Probabilistic guarantees in robust optimization. Available on Optimization Online.

[7] Bertsimas D, Pachamanova D, Sim M (2004) Robust linear optimization under general norms. Operations Research Letters 32(6):510–516.

[8] Bertsimas D, Sim M (2006) Tractable approximations to robust conic optimization problems. Mathematical Programming 107(1-2):5–36.

[9] Chen X, Zhang Y (2009) Uncertain linear programs: Extended affinely adjustable robust counterparts. Operations Research 57(6):1469–1482.

[10] de Ruiter FJCT, Zhen J, Den Hertog D (2019) Dual approach for two-stage robust nonlinear optimization. Available on Optimization Online.

[11] Hadjiyiannis MJ, Goulart PJ, Kuhn D (2011) A scenario approach for estimating the suboptimality of linear decision rules in two-stage robust optimization. 2011 50th IEEE Conference on Decision and Control and European Control Conference, 7386–7391 (IEEE).

[12] Hanson DL, Wright FT (1971) A bound on tail probabilities for quadratic forms in independent random variables. The Annals of Mathematical Statistics 42(3):1079–1083.

[13] Hsiung KL, Kim SJ, Boyd S (2008) Tractable approximate robust geometric programming. Optimization and Engineering 9(2):95–118.

[14] Hsu D, Kakade S, Zhang T (2012) A tail inequality for quadratic forms of subgaussian random vectors. Electronic Communications in Probability 17(52):1–6.

[15] MOSEK (2020) MOSEK Modeling Cookbook (MOSEK ApS).

[16] Rigollet P, Hütter JC (2015) High dimensional statistics. Lecture notes for course 18.S997.

[17] Roos E, Den Hertog D, Ben-Tal A, de Ruiter FJCT, Zhen J (2018) Approximation of hard uncertain convex inequalities. Available on Optimization Online.

[18] Wainwright MJ (2019) High-Dimensional Statistics: A Non-Asymptotic Viewpoint, volume 48 (Cambridge University Press).

[19] Zhen J, de Ruiter FJCT, Den Hertog D (2019) Robust optimization for models with uncertain SOC and SDP constraints. Available on Optimization Online.

[20] Zhen J, Den Hertog D, Kuhn D (2019) Reformulation-perspectification-technique: An extension of the reformulation-linearization-technique. In progress.
