arXiv:1601.04059v1 [cs.IT] 15 Jan 2016

Parallel and Distributed Methods for Nonconvex Optimization—Part II: Applications

Gesualdo Scutari, Francisco Facchinei, Lorenzo Lampariello, Peiran Song, and Stefania Sardellitti

Abstract—In Part I of this paper, we proposed and analyzed a novel algorithmic framework for the minimization of a nonconvex (smooth) objective function, subject to nonconvex constraints, based on inner convex approximations. This Part II is devoted to the application of the framework to some resource allocation problems in communication networks. In particular, we consider two non-trivial case-study applications, namely (generalizations of): i) the rate profile maximization in MIMO interference broadcast networks; and ii) the max-min fair multicast multigroup beamforming problem in a multi-cell environment. We develop a new class of algorithms enjoying the following distinctive features: i) they are distributed across the base stations (with limited signaling) and lead to subproblems whose solutions are computable in closed form; and ii) differently from current relaxation-based schemes (e.g., semidefinite relaxation), they are proved to always converge to d-stationary solutions of the aforementioned class of nonconvex problems. Numerical results show that the proposed (distributed) schemes achieve larger worst-case rates (resp. signal-to-noise interference ratios) than state-of-the-art centralized ones, while having comparable computational complexity.

I. INTRODUCTION

In Part I of this paper [3], we proposed a novel general algorithmic framework for the minimization of a nonconvex (smooth) objective function $U : \mathcal{K} \to \mathbb{R}$ subject to convex constraints $\mathcal{K}$ and nonconvex smooth ones $g_j(\mathbf{x}) \le 0$, with $g_j : \mathcal{K} \to \mathbb{R}$:

$$\min_{\mathbf{x}} \ U(\mathbf{x}) \quad \text{s.t.} \quad g_j(\mathbf{x}) \le 0, \ j = 1, \dots, m, \quad \mathbf{x} \in \mathcal{K}, \qquad (1)$$

whose feasible set is denoted by $\mathcal{X}$.
Building on the idea of inner convex approximation, our approach consists in solving a sequence of strongly convex inner approximations of (1): given $\mathbf{x}^\nu \in \mathcal{X}$,

$$\hat{\mathbf{x}}(\mathbf{x}^\nu) = \operatorname*{argmin}_{\mathbf{x}} \ \tilde{U}(\mathbf{x}; \mathbf{x}^\nu) \quad \text{s.t.} \quad \tilde{g}_j(\mathbf{x}; \mathbf{x}^\nu) \le 0, \ j = 1, \dots, m, \quad \mathbf{x} \in \mathcal{K}, \qquad (2)$$

where $\tilde{U}(\mathbf{x}; \mathbf{x}^\nu)$ is a strongly convex surrogate function of $U(\mathbf{x})$ and $\tilde{g}_j(\mathbf{x}; \mathbf{x}^\nu)$ is an upper convex approximant of $g_j(\mathbf{x})$, both depending on the current iterate $\mathbf{x}^\nu$; $\mathcal{X}(\mathbf{x}^\nu)$ denotes the feasible set of (2). Denoting by $\hat{\mathbf{x}}(\mathbf{x}^\nu)$ the unique solution of (2), the main iterate of the algorithm reads [3]: given $\mathbf{x}^\nu \in \mathcal{X}$,

$$\mathbf{x}^{\nu+1} = \mathbf{x}^\nu + \gamma^\nu \left( \hat{\mathbf{x}}(\mathbf{x}^\nu) - \mathbf{x}^\nu \right), \quad \nu \ge 0, \qquad (3)$$

where $\{\gamma^\nu\}$ is a step-size sequence. The proposed scheme represents a gamut of new algorithms, each of them corresponding to a specific choice of the surrogate functions $\tilde{U}$ and $\tilde{g}_j$, and of the step-size sequence $\{\gamma^\nu\}$.

Scutari is with the School of Industrial Engineering and the Cyber Center (Discovery Park), Purdue University, West Lafayette, IN, USA; email: [email protected]. Facchinei and Lampariello are with the Dept. of Computer, Control, and Management Engineering, University of Rome "La Sapienza", Rome, Italy; emails: <facchinei, lampariello>@diag.uniroma1.it. Song is with the School of Information and Communication Engineering, Beijing Information Science and Technology University, Beijing, China; email: [email protected]. Sardellitti is with the Dept. of Information, Electronics and Telecommunications, University of Rome "La Sapienza", Rome, Italy; email: [email protected]. The work of Scutari was supported by the USA National Science Foundation under Grants CMS 1218717, CCF 1564044, and CAREER Award 1254739. The work of Facchinei was partially supported by the MIUR project PLATINO, Grant n. PON01_01007. Part of this work appeared in [1, 2].
Several choices, offering great flexibility to control iteration complexity, communication overhead, and convergence speed while all guaranteeing convergence, are discussed in Part I of the paper; see [3, Th. 1]. Quite interestingly, the scheme leads to new efficient distributed algorithms even when customized to solve well-researched problems. Some examples include power control problems in cellular systems [4, 5, 6, 7], MIMO relay optimization [8], dynamic spectrum management in DSL systems [9, 10], sum-rate maximization, proportional-fairness and max-min optimization of SISO/MISO/MIMO ad-hoc networks [11, 12, 13, 14, 15, 16, 17, 18], robust optimization of CR networks [12, 19, 20, 21], transmit beamforming design for multiple co-channel multicast groups [22, 23, 24, 25], and cross-layer design of wireless networks [26, 27, 28].

Among the problems mentioned above, in this Part II we focus, as case studies, on two important resource allocation designs (and their generalizations) that are representative of some key challenges posed by next-generation communication networks, namely: 1) the rate profile maximization in MIMO interference broadcast networks; and 2) the max-min fair multicast multigroup beamforming problem in multi-cell systems. The interference management problem in 1) has become a compelling task in 5G densely deployed multi-cell networks, where interference seriously limits the achievable rate performance if not properly managed. Multicast beamforming as in 2) is part of the Evolved Multimedia Broadcast Multicast Service in the Long-Term Evolution standard for efficient audio and video streaming, and has received increasing attention from the research community due to the proliferation of multimedia services in next-generation wireless networks. Building on the framework developed in Part I, we propose a new class of convex approximation-based algorithms for the aforementioned two problems enjoying several desirable features.
First, they provide better performance guarantees than ad-hoc state-of-the-art schemes (e.g., [18, 22, 29]), both theoretically and numerically. Specifically, our algorithms achieve (d-)stationary solutions of the problems under consideration, whereas current relaxation-based algorithms (e.g., semidefinite relaxation) either converge just to feasible points or to stationary points of a related problem, which are not proved to be stationary for the original problems. Second, our schemes represent the first class of distributed algorithms in the literature for such problems: at each iteration, a convex



problem is solved, which naturally decomposes across the Base Stations (BSs), and thus is solvable in a distributed way (with limited signaling among the cells). Moreover, the solution of the subproblems is computable in closed form by each BS. We remark that the proposed parallel and distributed decomposition across the cells naturally matches modern multi-tier network architectures, wherein high-speed wired links are dedicated to coordination and data exchange among BSs. Third, our algorithms are quite flexible and applicable also to generalizations of the original formulations 1) and 2). For instance, i) one can readily add further constraints, including interference constraints, null constraints, per-antenna peak and average power constraints, and quality-of-service constraints; and/or ii) one can consider several other objective functions (rather than the max-min fairness), such as the weighted users' sum-rate or the rates' weighted geometric mean; all of this without affecting the convergence of the resulting algorithms. This is a major improvement on current solution methods, which are instead rigid ad-hoc schemes that are not applicable to other (even slightly different) formulations.

The rest of the paper is organized as follows. Sec. II focuses on the rate profile maximization in MIMO interference broadcast networks and its generalizations: after reviewing the state of the art (cf. Sec. II-B), we propose centralized and distributed schemes along with their convergence properties in Sec. II-C and Sec. II-D, respectively, while some experiments are reported in Sec. II-E. Sec. III studies the max-min fair multicast multigroup beamforming problem in multi-cell systems: the state of the art is summarized in Sec. III-B; centralized algorithms based on alternative convexifications are introduced in Sec. III-C, whereas distributed schemes are presented in Sec. III-D; finally, Sec. III-E presents some numerical results. Conclusions are drawn in Sec. IV.
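Before specializing the framework, the mechanics of the iterate (2)-(3) are easy to prototype. The sketch below is an illustrative toy instance (not one of the paper's applications): it minimizes the DC function $U(x) = x^4 - 3x^2$ by keeping the convex part, linearizing the concave part $-3x^2$ at the current iterate, and adding a proximal term, in the spirit of the surrogates used throughout this Part II.

```python
import numpy as np

# Minimize U(x) = x^4 - 3x^2 (nonconvex, DC) via inner convex approximation:
# surrogate U~(x; xv) = x^4 - 6*xv*x + (tau/2)*(x - xv)^2 + const (concave part linearized).
tau, x = 1.0, 0.5                       # proximal weight and starting point (toy choices)
for nu in range(100):
    # Unique minimizer of the strongly convex surrogate: root of 4x^3 + tau*x = (tau + 6)*xv
    roots = np.roots([4.0, 0.0, tau, -(tau + 6.0) * x])
    x_hat = float(np.real(roots[np.argmin(np.abs(np.imag(roots)))]))
    x = x + 1.0 * (x_hat - x)           # step (3), with constant gamma^nu = 1 for simplicity
# The iterates settle at a stationary point of U: U'(x) = 4x^3 - 6x = 0, i.e. |x| = sqrt(3/2).
assert abs(4 * x**3 - 6 * x) < 1e-8
```

With the exact surrogate minimizer, the map is a contraction near the stationary point, so even the constant unit step suffices here; the diminishing step-size rules of Part I cover the general case.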

II. INTERFERENCE BROADCAST NETWORKS

A. System model

Consider the Interference Broadcast Channel (IBC), modeling a cellular system composed of $K$ cells; each cell $k \in \mathcal{K}_{BS} \triangleq \{1, \dots, K\}$ contains one Base Station (BS) equipped with $T_k$ transmit antennas and serving $I_k$ Mobile Terminals (MTs); see Fig. 1. We denote by $i_k$ the $i$-th user in cell $k$, equipped with $M_{i_k}$ antennas; the set of users in cell $k$ and the set of all the users are denoted by $\mathcal{I}_k \triangleq \{i_k : 1 \le i \le I_k\}$ and $\mathcal{I} \triangleq \{i_k : i_k \in \mathcal{I}_k \text{ and } k \in \mathcal{K}_{BS}\}$, respectively. We denote by $\mathbf{Q}_k \triangleq (\mathbf{Q}_{i_k})_{i=1}^{I_k}$ the tuple of covariance matrices of the signals transmitted by each BS $k$ to the $I_k$ users in the cell, with each $\mathbf{Q}_{i_k} \in \mathbb{C}^{T_k \times T_k}$ being the covariance matrix of the information symbols of user $i_k$.

Each BSk is subject to power constraints in the form

Qk ,

Qk ∈ Wk : Qik 0, ∀ik ∈ Ik,∑Ik

i=1 tr(Qik) ≤ Pk

, (4)

where the setWk, assumed to be closed and convex (withnonempty relative interior) captures possibly additionalcon-straints, such as: i) per-antenna limits

∑Iki=1[Qik ]tt ≤ αt,

with t = 1, . . . , Tk; ii) null constraintsUHk

∑Iki=1 Qik =

0, where Uk is a given matrix whose columns containthe directions (angle, time-slot, or frequency bands) along

1

2

K

BS1

BS2

BSK

11

MT

12

MT 13

MT

3MT

K

23

MT22

MT21

MT

2MT

K

1MT

K

14

MT

Figure 1: Interference Broadcast network.

with BS k is not allowed to transmit; and iii) soft-shaping∑Iki=1 tr(GH

k QikGk) ≤ βk and peak-power constraints∑Iki=1 λmax(G

Hk QikGk) ≤ βk, which limit respectively the

total average and peak average power radiated along the rangespace of matrixGk, whereλmax(A) denotes the maximumeigenvalue of the Hermitian matrixA.

Treating the intra-cell and inter-cell interference at each MT as noise, the maximum achievable rate of each user $i_k$ is

$$R_{i_k}(\mathbf{Q}) \triangleq \log\det\left( \mathbf{I} + \mathbf{H}_{i_k k} \mathbf{Q}_{i_k} \mathbf{H}_{i_k k}^H \mathbf{R}_{i_k}(\mathbf{Q}_{-i_k})^{-1} \right), \qquad (5)$$

where $\mathbf{H}_{i_k l} \in \mathbb{C}^{M_{i_k} \times T_l}$ represents the channel matrix between BS $l$ and MT $i_k$;

$$\mathbf{R}_{i_k}(\mathbf{Q}_{-i_k}) \triangleq \sigma^2_{i_k} \mathbf{I} + \sum_{j \ne i} \mathbf{H}_{i_k k} \mathbf{Q}_{jk} \mathbf{H}_{i_k k}^H + \sum_{l \ne k} \sum_{j=1}^{I_l} \mathbf{H}_{i_k l} \mathbf{Q}_{jl} \mathbf{H}_{i_k l}^H$$

is the covariance matrix of the Gaussian thermal noise (assumed to be white w.l.o.g.; otherwise one can always pre-whiten the channel matrices) plus the intra-cell (second term) and inter-cell (last term) interference; and we denoted $\mathbf{Q}_{-i_k} \triangleq (\mathbf{Q}_{jl})_{(j,l) \ne (i,k)}$ and $\mathbf{Q} \triangleq (\mathbf{Q}_k)_{k=1}^K$. Accordingly, with slight abuse of notation, we will write $\mathbf{Q} \in \mathcal{Q}$ to mean $\mathbf{Q}_k \in \mathcal{Q}_k$ for all $k \in \mathcal{K}_{BS}$.
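The rate (5) and the interference-plus-noise covariance can be evaluated directly with NumPy; the dimensions below (2 cells, 2 antennas, one user per cell, so only inter-cell interference remains) are illustrative assumptions, not the paper's simulation setup.

```python
import numpy as np

rng = np.random.default_rng(0)
K, T, M, sigma2 = 2, 2, 2, 0.1  # cells, TX antennas, RX antennas, noise power (toy values)

# Channels H[(i, l)]: from BS l to the (single) user of cell i; covariances Q[l] with tr(Q) = 1.
H = {(i, l): rng.standard_normal((M, T)) + 1j * rng.standard_normal((M, T))
     for i in range(K) for l in range(K)}
Q = {l: np.eye(T, dtype=complex) / T for l in range(K)}

def interference_cov(i):
    """R_i(Q_{-i}): noise plus interference from the other cells' signals [cf. (5)]."""
    R = sigma2 * np.eye(M, dtype=complex)
    for l in range(K):
        if l != i:  # with one user per cell, the intra-cell term of (5) is empty
            R += H[(i, l)] @ Q[l] @ H[(i, l)].conj().T
    return R

def rate(i):
    """R_i(Q) = log det(I + H_ii Q_i H_ii^H R_i(Q_{-i})^{-1})."""
    S = H[(i, i)] @ Q[i] @ H[(i, i)].conj().T
    _, logdet = np.linalg.slogdet(np.eye(M) + S @ np.linalg.inv(interference_cov(i)))
    return float(logdet)

rates = [rate(i) for i in range(K)]
assert all(r > 0 for r in rates)  # achievable rates are strictly positive here
```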

Rate profile maximization. Max-min fairness has long been considered an important design criterion for wireless networks. Here we introduce the following more general rate-profile maximization: given the profile $\boldsymbol{\alpha} \triangleq (\alpha_{i_k})_{i_k \in \mathcal{I}}$, with each $\alpha_{i_k} > 0$ and $\sum_{i_k \in \mathcal{I}} \alpha_{i_k} = 1$, let

$$\max_{\mathbf{Q}} \ U(\mathbf{Q}) \triangleq \min_{i_k \in \mathcal{I}} \frac{R_{i_k}(\mathbf{Q})}{\alpha_{i_k}} \quad \text{s.t.} \quad \mathbf{Q} \in \mathcal{Q}. \qquad (\mathcal{P})$$

Special instances of this formulation have been proved to be NP-hard (see, e.g., [18]); therefore, in the following, our focus is on computing efficiently (d-)stationary solutions of $\mathcal{P}$.

Definition 1 (d-stationarity). Given $\mathbf{Q}$ and $\mathbf{D} \triangleq (\mathbf{D}_k)_{k=1}^K$, with $\mathbf{D}_k \triangleq (\mathbf{D}_{i_k})_{i=1}^{I_k}$ and $\mathbf{D}_{i_k} \in \mathbb{C}^{T_k \times T_k}$, let $U'(\mathbf{Q}; \mathbf{D})$ denote the directional derivative of $U$ at $\mathbf{Q} \in \mathcal{Q}$ in the direction $\mathbf{D}$, defined as

$$U'(\mathbf{Q}; \mathbf{D}) \triangleq \lim_{t \downarrow 0} \frac{U(\mathbf{Q} + t\mathbf{D}) - U(\mathbf{Q})}{t}.$$

A tuple $\mathbf{Q}^\star$ is a d-stationary solution of $\mathcal{P}$ if

$$U'(\mathbf{Q}^\star; \mathbf{Q} - \mathbf{Q}^\star) \le 0, \quad \forall \mathbf{Q} \in \mathcal{Q}. \qquad (6)$$

Of course, (local/global) optimal solutions of $\mathcal{P}$ satisfy (6).

Equivalent smooth reformulation: To compute d-stationary solutions of the nonconvex and nonsmooth problem $\mathcal{P}$, we preliminarily rewrite $\mathcal{P}$ in an equivalent smooth (still nonconvex) form: introducing the slack variable $R \ge 0$, we have

$$\max_{\mathbf{Q},\, R \ge 0} \ R \quad \text{s.t.} \quad \mathbf{Q} \in \mathcal{Q}, \quad R_{i_k}(\mathbf{Q}) \ge \alpha_{i_k} R, \ \forall i_k \in \mathcal{I}. \qquad (\mathcal{P}_s)$$


We denote by $\mathcal{Z}$ the feasible set of $\mathcal{P}_s$. Problems $\mathcal{P}$ and $\mathcal{P}_s$ are equivalent in the following sense.

Proposition 2. Given $\mathcal{P}$ and $\mathcal{P}_s$, the following hold:

(i) every feasible point of $\mathcal{P}_s$ is regular [i.e., it satisfies the Mangasarian-Fromovitz Constraint Qualification (MFCQ)];

(ii) $\mathbf{Q}^\star$ is a d-stationary solution of $\mathcal{P}$ if and only if there exists $R^\star$ such that $(\mathbf{Q}^\star, R^\star)$ is a stationary solution of $\mathcal{P}_s$.

Proof: See supporting material.

Proposition 2 opens the way to the computation of d-stationary solutions of $\mathcal{P}$ while designing algorithms for the smooth (nonconvex) formulation $\mathcal{P}_s$. We are not aware of any algorithm with provable convergence to d-stationary solutions of $\mathcal{P}$, as documented next.

B. Related works

Several resource allocation problems have been studied in the literature for the vector Gaussian Interference Channel (IC), modeling multiuser interference networks. Some representative examples, corresponding to different design criteria, are: i) the sum-rate maximization problem [11, 12, 13, 21]; ii) the minimization of the transmit power subject to QoS constraints [7, 27]; iii) the weighted Mean-Square-Error (MSE) minimization and the min-max MSE fairness design [30, 31]; and iv) the rate profile optimization over MISO/SIMO [14, 16, 17, 32] and MIMO (single-stream) [15, 33] ICs. Since the IC is a special case of the IBC model, the algorithms in [14, 15, 17], along with their convergence analysis, cannot be applied to Problem $\mathcal{P}_s$ (and thus $\mathcal{P}$). Moreover, they are all centralized.

Related to Problem $\mathcal{P}$ is the max-min fairness formulation recently considered in [18]. There are, however, several differences between [18] and our approach. First, the formulation in [18] is a special case of Problem $\mathcal{P}$, corresponding to equal $\alpha_{i_k}$ and standard power constraints (i.e., without the additional constraints $\mathcal{W}_k$). Hence, the algorithm in [18] and its convergence analysis do not apply to $\mathcal{P}$. Second, even in the simplified setting considered in [18], the algorithm therein is not proved to converge to a (d-)stationary solution of the max-min fairness formulation, but to critical points of an auxiliary smooth problem that, however, might not be stationary for the original problem. Our algorithmic framework, instead, is guaranteed to converge to stationary solutions of $\mathcal{P}_s$ and thus, by Proposition 2, to d-stationary solutions of $\mathcal{P}$. Third, the algorithm in [18] is centralized.

The analysis of the literature shows that Problem $\mathcal{P}$ (and $\mathcal{P}_s$) remains unexplored in its generality. The main contribution of this section is to propose the first (distributed) algorithmic framework with provable convergence to d-stationary solutions of $\mathcal{P}$. More specifically, building on the iNner cOnVex Approximation (NOVA) framework developed in Part I [3], we propose next three alternative convexifications of the nonconvex constraints in $\mathcal{P}_s$, which lead to different convex subproblems [cf. (2)] and algorithms [cf. (3)]. We start in Sec. II-C with a centralized instance, while two alternative distributed implementations are derived in Sec. II-D. We remark from the outset that all the convexifications we are going to introduce satisfy Assumptions 2 and 3 (or 3 and 4) in [3], implying [together with Proposition 2(ii)] convergence of our algorithms.

C. Centralized solution method

The nonconvexity of Problem $\mathcal{P}_s$ is due to the nonconvex rate constraints $R_{i_k}(\mathbf{Q}) \ge \alpha_{i_k} R$. Exploiting the concave-convex structure of the rate function

$$R_{i_k}(\mathbf{Q}) = f^+_{i_k}(\mathbf{Q}) - f^-_{i_k}(\mathbf{Q}_{-i_k}), \qquad (7)$$

where $f^+_{i_k}(\mathbf{Q}) = \log\det\big( \mathbf{R}_{i_k}(\mathbf{Q}_{-i_k}) + \mathbf{H}_{i_k k} \mathbf{Q}_{i_k} \mathbf{H}_{i_k k}^H \big)$ and $f^-_{i_k}(\mathbf{Q}_{-i_k}) = \log\det\big( \mathbf{R}_{i_k}(\mathbf{Q}_{-i_k}) \big)$ are concave functions, a tight concave lower bound of $R_{i_k}(\mathbf{Q})$ (satisfying Assumptions 2-4 in [3]) is naturally obtained by retaining in (7) the concave part $f^+_{i_k}$ and linearizing the convex function $-f^-_{i_k}$, which leads to the following rate approximation functions: given $\mathbf{Q}^\nu \triangleq (\mathbf{Q}^\nu_{i_k})_{i_k \in \mathcal{I}}$, with each $\mathbf{Q}^\nu_{i_k} \succeq \mathbf{0}$,

$$R_{i_k}(\mathbf{Q}) \ \ge\ \tilde{R}_{i_k}(\mathbf{Q}; \mathbf{Q}^\nu) \ \triangleq\ f^+_{i_k}(\mathbf{Q}) - \tilde{f}^-_{i_k}(\mathbf{Q}_{-i_k}; \mathbf{Q}^\nu), \qquad (8)$$

with

$$\tilde{f}^-_{i_k}(\mathbf{Q}_{-i_k}; \mathbf{Q}^\nu) \ \triangleq\ f^-_{i_k}(\mathbf{Q}^\nu_{-i_k}) + \sum_{(j,l) \ne (i,k)} \big\langle \boldsymbol{\Pi}^-_{i_k jl}(\mathbf{Q}^\nu_{-i_k}),\ \mathbf{Q}_{jl} - \mathbf{Q}^\nu_{jl} \big\rangle, \qquad (9)$$

where $\langle \mathbf{A}, \mathbf{B} \rangle \triangleq \mathrm{Re}\{\mathrm{tr}(\mathbf{A}^H \mathbf{B})\}$, and we denoted by $\boldsymbol{\Pi}^-_{i_k jl}(\mathbf{Q}^\nu_{-i_k}) \triangleq \nabla_{\mathbf{Q}^*_{jl}} f^-_{i_k}(\mathbf{Q}^\nu_{-i_k}) = \mathbf{H}_{i_k l}^H \mathbf{R}_{i_k}^{-1}(\mathbf{Q}^\nu_{-i_k}) \mathbf{H}_{i_k l}$ the conjugate gradient of $f^-_{i_k}$ w.r.t. $\mathbf{Q}_{jl}$ [34].

Given $\mathbf{Z}^\nu \triangleq (\mathbf{Q}^\nu, R^\nu) \in \mathcal{Z}$ and $\tau_R, \tau_Q > 0$, the convex approximation of Problem $\mathcal{P}_s$ [cf. (2)] reads

$$\hat{\mathbf{Z}}(\mathbf{Z}^\nu) \triangleq \operatorname*{argmax}_{\mathbf{Q},\, R \ge 0} \ R - \frac{\tau_R}{2}(R - R^\nu)^2 - \tau_Q \|\mathbf{Q} - \mathbf{Q}^\nu\|_F^2 \quad \text{s.t.} \quad \mathbf{Q} \in \mathcal{Q}, \quad \tilde{R}_{i_k}(\mathbf{Q}; \mathbf{Q}^\nu) \ge \alpha_{i_k} R, \ \forall i_k \in \mathcal{I}, \qquad (10)$$

where the quadratic terms in the objective function are added to make it strongly convex (see [3, Assumption B1]), and we denoted by $\hat{\mathbf{Z}}(\mathbf{Z}^\nu)$ the unique solution of (10). Stationary solutions of Problem $\mathcal{P}_s$ can then be computed by solving the sequence of convexified problems (10) via (3); the formal description of the scheme is given in Algorithm 1, whose convergence is stated in Theorem 3; the proof of the theorem follows readily from Proposition 2 and [3, Th. 2].
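The key property behind (8)-(9), namely that the surrogate lower-bounds the rate and is tight at $\mathbf{Q}^\nu$, can be verified numerically on a scalar toy instance; the SISO-style gains below are illustrative assumptions, not the paper's setup.

```python
import numpy as np

rng = np.random.default_rng(1)
sigma2, h11, h12 = 1.0, 0.8, 0.5   # illustrative noise power and "channel gains"

def f_plus(q1, q2):  return np.log(sigma2 + h12 * q2 + h11 * q1)
def f_minus(q2):     return np.log(sigma2 + h12 * q2)
def rate(q1, q2):    return f_plus(q1, q2) - f_minus(q2)   # scalar analogue of (7)

def rate_tilde(q1, q2, q2_nu):
    """Concave surrogate, cf. (8): keep f+ and linearize f- at the current iterate."""
    grad = h12 / (sigma2 + h12 * q2_nu)                    # scalar analogue of Pi in (9)
    return f_plus(q1, q2) - (f_minus(q2_nu) + grad * (q2 - q2_nu))

q1n, q2n = 0.7, 0.3
# Tight at the current iterate ...
assert abs(rate_tilde(q1n, q2n, q2n) - rate(q1n, q2n)) < 1e-12
# ... and a global lower bound elsewhere (the linearization of a concave f- majorizes it).
for q1, q2 in rng.uniform(0.0, 5.0, size=(100, 2)):
    assert rate_tilde(q1, q2, q2n) <= rate(q1, q2) + 1e-12
```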

Algorithm 1: NOVA Algorithm for Problem $\mathcal{P}_s$ (and $\mathcal{P}$)

Data: $\tau_R, \tau_Q > 0$, $\mathbf{Z}^0 \triangleq (\mathbf{Q}^0, R^0) \in \mathcal{Z}$; set $\nu = 0$.
(S.1) If $\mathbf{Z}^\nu$ is a stationary solution of $\mathcal{P}_s$: STOP.
(S.2) Compute $\hat{\mathbf{Z}}(\mathbf{Z}^\nu)$ by (10).
(S.3) Set $\mathbf{Z}^{\nu+1} = \mathbf{Z}^\nu + \gamma^\nu \big( \hat{\mathbf{Z}}(\mathbf{Z}^\nu) - \mathbf{Z}^\nu \big)$ for some $\gamma^\nu \in (0, 1]$.
(S.4) $\nu \leftarrow \nu + 1$ and go to step (S.1).

Theorem 3. Let $\{\mathbf{Z}^\nu\} \triangleq \{(\mathbf{Q}^\nu, R^\nu)\}$ be the sequence generated by Algorithm 1. Choose any $\tau_R, \tau_Q > 0$ and a step-size sequence $\{\gamma^\nu\}$ such that $\gamma^\nu \in (0, 1]$, $\gamma^\nu \to 0$, and $\sum_\nu \gamma^\nu = +\infty$. Then $\{\mathbf{Z}^\nu\}$ is bounded, and every one of its limit points $\mathbf{Z}^\infty \triangleq (\mathbf{Q}^\infty, R^\infty)$ is a stationary solution of Problem $\mathcal{P}_s$. Therefore, $\mathbf{Q}^\infty$ is d-stationary for Problem $\mathcal{P}$. Furthermore, if the algorithm does not stop after a finite number of steps, none of the $\mathbf{Q}^\infty$ above is a local minimum of $U$, and thus $R^\infty > 0$.

Theorem 3 offers some flexibility in the choice of the free parameters $(\tau_R, \tau_Q)$ and $\{\gamma^\nu\}$, while guaranteeing convergence of Algorithm 1. Some effective choices for $\{\gamma^\nu\}$ are discussed in Part I [3]. Note also that the theorem guarantees that Algorithm 1 does not remain trapped in $R^\infty = 0$, a "degenerate" stationary solution of $\mathcal{P}_s$ (the global minimizer of $\mathcal{P}_s$ and $\mathcal{P}$), at which some users do not receive any service.

Algorithm 1 is centralized because (10) cannot be decomposed across the base stations. This is due to the lack of separability of the rate constraints in (10): $f^+_{i_k}(\mathbf{Q})$ depends on the covariance matrices of all the users. We introduce next an alternative valid convex approximation of the nonconvex rate constraints leading to distributed schemes.

D. Distributed implementation

A centralized implementation might not be appealing in heterogeneous multi-cell systems, where global information is not available at each BS. Distributing the computation over the cells, as well as alleviating the communication overhead among the BSs, is thus mandatory. This subsection addresses this issue and is devoted to the design of a distributed algorithm converging to d-stationary solutions of Problem $\mathcal{P}$.

By keeping the concave part $f^+_{i_k}(\mathbf{Q})$ unaltered, the approximation $\tilde{R}_{i_k}$ in (8) has the desired property of preserving the structure of the original constraint function $R_{i_k}$ as much as possible. However, the structure of $\tilde{R}_{i_k}$ is not suited to be decomposed across the users, due to the nonadditive coupling among the variables $\mathbf{Q}_{i_k}$ in $f^+_{i_k}(\mathbf{Q})$. To cope with this issue, the proposed idea is to introduce in $\mathcal{P}_s$ slack variables whose purpose is to decouple, in each $f^+_{i_k}$, the covariance matrix $\mathbf{Q}_{i_k}$ of user $i_k$ from those of the other users (the interference term $\mathbf{R}_{i_k}(\mathbf{Q}_{-i_k})$). More specifically, introducing the slack variables $\mathbf{Y} \triangleq (\mathbf{Y}_k)_{k \in \mathcal{K}_{BS}}$, with $\mathbf{Y}_k = (\mathbf{Y}_{i_k})_{i_k \in \mathcal{I}_k}$, and setting $\mathbf{I}_{i_k}(\mathbf{Q}) \triangleq \sum_{l=1}^K \sum_{j=1}^{I_l} \mathbf{H}_{i_k l} \mathbf{Q}_{jl} \mathbf{H}_{i_k l}^H$, we can write $f^+_{i_k}(\mathbf{Q}) = f^+_{i_k}(\mathbf{Y}_{i_k})$, with $f^+_{i_k}(\mathbf{Y}_{i_k}) \triangleq \log\det(\sigma^2_{i_k} \mathbf{I} + \mathbf{Y}_{i_k})$ and $\mathbf{Y}_{i_k} = \mathbf{I}_{i_k}(\mathbf{Q})$. Then, in view of (7), Problem $\mathcal{P}_s$ can be rewritten in the following equivalent form:

$$\max_{\mathbf{Q},\, R \ge 0,\, \mathbf{Y}} \ R \quad \text{s.t.} \quad \mathbf{Q} \in \mathcal{Q}, \quad f^+_{i_k}(\mathbf{Y}_{i_k}) - f^-_{i_k}(\mathbf{Q}_{-i_k}) \ge \alpha_{i_k} R, \ \forall i_k \in \mathcal{I}, \quad \mathbf{0} \preceq \mathbf{Y}_{i_k} \preceq \mathbf{I}_{i_k}(\mathbf{Q}), \ \forall i_k \in \mathcal{I}. \qquad (\mathcal{P}'_s)$$

The next proposition states the formal connection between $\mathcal{P}'_s$ and $\mathcal{P}_s$; the proof is omitted because of space limitations.

Proposition 4. Given $\mathcal{P}_s$ and $\mathcal{P}'_s$, the following hold:

(i) every feasible point $(\mathbf{Q}, R, \mathbf{Y})$ of $\mathcal{P}'_s$ satisfies the MFCQ;

(ii) $(\mathbf{Q}^\star, R^\star)$ is a stationary solution of $\mathcal{P}_s$ if and only if there exists $\mathbf{Y}^\star$ such that $(\mathbf{Q}^\star, R^\star, \mathbf{Y}^\star)$ is a stationary solution of $\mathcal{P}'_s$.

It follows from Propositions 2 and 4 that, to compute d-stationary solutions of $\mathcal{P}$, we can focus w.l.o.g. on $\mathcal{P}'_s$. Using (9), we can minorize the left-hand side of the rate constraints in $\mathcal{P}'_s$ as

$$f^+_{i_k}(\mathbf{Y}_{i_k}) - f^-_{i_k}(\mathbf{Q}_{-i_k}) \ \ge\ \tilde{R}_{i_k}(\mathbf{Q}_{-i_k}, \mathbf{Y}_{i_k}; \mathbf{Q}^\nu) \ \triangleq\ f^+_{i_k}(\mathbf{Y}_{i_k}) - \tilde{f}^-_{i_k}(\mathbf{Q}_{-i_k}; \mathbf{Q}^\nu).$$

The approximation of Problem $\mathcal{P}'_s$ becomes [cf. (2)]: given a feasible $\mathbf{W}^\nu \triangleq (\mathbf{Q}^\nu, R^\nu, \mathbf{Y}^\nu)$ and any $\tau_Q, \tau_R, \tau_Y > 0$,

$$\hat{\mathbf{W}}(\mathbf{W}^\nu) \triangleq \operatorname*{argmax}_{\mathbf{Q},\, R \ge 0,\, \mathbf{Y}} \ R - \frac{\tau_R}{2}(R - R^\nu)^2 - \tau_Q \|\mathbf{Q} - \mathbf{Q}^\nu\|_F^2 - \tau_Y \|\mathbf{Y} - \mathbf{Y}^\nu\|_F^2 \quad \text{s.t.} \quad \mathbf{Q} \in \mathcal{Q}, \quad \tilde{R}_{i_k}(\mathbf{Q}_{-i_k}, \mathbf{Y}_{i_k}; \mathbf{Q}^\nu) \ge \alpha_{i_k} R, \ \forall i_k \in \mathcal{I}, \quad \mathbf{0} \preceq \mathbf{Y}_{i_k} \preceq \mathbf{I}_{i_k}(\mathbf{Q}), \ \forall i_k \in \mathcal{I}. \qquad (11)$$

To compute d-stationary solutions of Problem $\mathcal{P}$ via (11), we can invoke Algorithm 1 wherein the $\mathbf{Z}$ variables [resp. $\hat{\mathbf{Z}}(\mathbf{Z}^\nu)$ in Step 2] are replaced by the $\mathbf{W}$ ones [resp. $\hat{\mathbf{W}}(\mathbf{W}^\nu)$, defined in (11)]. Convergence of this new algorithm is still given by Theorem 3. The difference with the centralized approach in Sec. II-C is that now, thanks to the additively separable structure of the objective and constraint functions in (11), one can compute $\hat{\mathbf{W}}$ in a distributed way by leveraging standard dual decomposition techniques, as shown next.

With $\mathbf{W} \triangleq (\mathbf{Q}, R, \mathbf{Y})$, denoting by $\boldsymbol{\lambda} \triangleq (\lambda_{i_k})_{i_k \in \mathcal{I}}$ and $\boldsymbol{\Omega} \triangleq (\boldsymbol{\Omega}_{i_k})_{i_k \in \mathcal{I}}$ the multipliers associated with the rate constraints $\tilde{R}_{i_k}(\mathbf{Q}_{-i_k}, \mathbf{Y}_{i_k}; \mathbf{Q}^\nu) \ge \alpha_{i_k} R$ and the slack-variable constraints $\mathbf{Y}_{i_k} \preceq \mathbf{I}_{i_k}(\mathbf{Q})$, respectively, let us define the (partial) Lagrangian of (11) as

$$L(\mathbf{W}, \boldsymbol{\lambda}, \boldsymbol{\Omega}; \mathbf{W}^\nu) \triangleq L_R(R, \boldsymbol{\lambda}; R^\nu) + \sum_{k=1}^K L_{Q_k}(\mathbf{Q}_k, \boldsymbol{\lambda}, \boldsymbol{\Omega}; \mathbf{Q}^\nu) + \sum_{k=1}^K L_{Y_k}(\mathbf{Y}_k, \boldsymbol{\lambda}, \boldsymbol{\Omega}; \mathbf{Y}^\nu),$$

where

$$L_R(R, \boldsymbol{\lambda}; R^\nu) \triangleq -R + \frac{\tau_R}{2}(R - R^\nu)^2 + \boldsymbol{\lambda}^T \boldsymbol{\alpha}\, R;$$

$$L_{Q_k}(\mathbf{Q}_k, \boldsymbol{\lambda}, \boldsymbol{\Omega}; \mathbf{Q}^\nu) \triangleq \tau_Q \|\mathbf{Q}_k - \mathbf{Q}^\nu_k\|_F^2 + \sum_{i_k \in \mathcal{I}_k} \lambda_{i_k} f^-_{i_k}(\mathbf{Q}^\nu_{-i_k}) + \sum_{i_k \in \mathcal{I}_k} \sum_{(j,l) \ne (i,k)} \big\langle \lambda_{j_l} \boldsymbol{\Pi}^-_{j_l i_k}(\mathbf{Q}^\nu_{-j_l}),\ \mathbf{Q}_{i_k} - \mathbf{Q}^\nu_{i_k} \big\rangle - \sum_{i_k \in \mathcal{I}_k} \Big\langle \sum_{j_l \in \mathcal{I}} \mathbf{H}_{j_l k}^H \boldsymbol{\Omega}_{j_l} \mathbf{H}_{j_l k},\ \mathbf{Q}_{i_k} \Big\rangle;$$

$$L_{Y_k}(\mathbf{Y}_k, \boldsymbol{\lambda}, \boldsymbol{\Omega}; \mathbf{Y}^\nu) \triangleq \tau_Y \|\mathbf{Y}_k - \mathbf{Y}^\nu_k\|_F^2 + \sum_{i_k \in \mathcal{I}_k} \langle \boldsymbol{\Omega}_{i_k}, \mathbf{Y}_{i_k} \rangle - \sum_{i_k \in \mathcal{I}_k} \lambda_{i_k} \log\det(\sigma^2_{i_k} \mathbf{I} + \mathbf{Y}_{i_k}),$$

with $\boldsymbol{\alpha} \triangleq (\alpha_{i_k})_{i_k \in \mathcal{I}}$. The additively separable structure of $L(\mathbf{W}, \boldsymbol{\lambda}, \boldsymbol{\Omega}; \mathbf{W}^\nu)$ leads to the following decomposition of the dual function:

$$D(\boldsymbol{\lambda}, \boldsymbol{\Omega}; \mathbf{W}^\nu) \triangleq \min_{R \ge 0} L_R(R, \boldsymbol{\lambda}; R^\nu) + \sum_{k=1}^K \min_{\mathbf{Q}_k \in \mathcal{Q}_k} L_{Q_k}(\mathbf{Q}_k, \boldsymbol{\lambda}, \boldsymbol{\Omega}; \mathbf{Q}^\nu) + \sum_{k=1}^K \min_{(\mathbf{Y}_{i_k} \succeq \mathbf{0})_{i_k \in \mathcal{I}_k}} L_{Y_k}(\mathbf{Y}_k, \boldsymbol{\lambda}, \boldsymbol{\Omega}; \mathbf{Y}^\nu). \qquad (12)$$

The unique solutions of the above optimization problems are denoted by $R^\star(\boldsymbol{\lambda}; R^\nu) = \operatorname*{argmin}_{R \ge 0} L_R(R, \boldsymbol{\lambda}; R^\nu)$, $\mathbf{Q}^\star_k(\boldsymbol{\lambda}, \boldsymbol{\Omega}; \mathbf{Q}^\nu) = \operatorname*{argmin}_{\mathbf{Q}_k \in \mathcal{Q}_k} L_{Q_k}(\mathbf{Q}_k, \boldsymbol{\lambda}, \boldsymbol{\Omega}; \mathbf{Q}^\nu)$, and $\mathbf{Y}^\star_k(\boldsymbol{\lambda}, \boldsymbol{\Omega}; \mathbf{Y}^\nu) = \operatorname*{argmin}_{(\mathbf{Y}_{i_k} \succeq \mathbf{0})_{i_k \in \mathcal{I}_k}} L_{Y_k}(\mathbf{Y}_k, \boldsymbol{\lambda}, \boldsymbol{\Omega}; \mathbf{Y}^\nu). We show next that $R^\star$ and $\mathbf{Y}^\star_k$ have a closed-form expression, and so does $\mathbf{Q}^\star_k$ when the feasible sets $\mathcal{Q}_k$ contain only power budget constraints [i.e., there is no set $\mathcal{W}_k$ in (4)], a fact that will be tacitly assumed hereafter (the quite standard proof of Lemma 5 is omitted because of space limitations).

Lemma 5 (Closed form of $R^\star(\boldsymbol{\lambda}; R^\nu)$). The optimal solution $R^\star(\boldsymbol{\lambda}; R^\nu)$ has the following expression:

$$R^\star(\boldsymbol{\lambda}; R^\nu) = \left[ R^\nu - \frac{\boldsymbol{\lambda}^T \boldsymbol{\alpha} - 1}{\tau_R} \right]_+, \qquad (13)$$

where $[x]_+ \triangleq \max(0, x)$ (applied component-wise).
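The closed form (13) can be cross-checked against a brute-force minimization of $L_R$ on a fine grid; the multiplier values below are made up for illustration.

```python
import numpy as np

tau_R, R_nu = 2.0, 1.5
lam = np.array([0.4, 0.7]); alpha = np.array([0.5, 0.5])   # illustrative multipliers/profile
closed = max(0.0, R_nu - (lam @ alpha - 1.0) / tau_R)      # (13)

# Brute force over a fine grid of R >= 0: minimize L_R(R) = -R + tau_R/2*(R-R_nu)^2 + (lam'alpha)*R.
grid = np.linspace(0.0, 5.0, 200001)
L_R = -grid + 0.5 * tau_R * (grid - R_nu) ** 2 + (lam @ alpha) * grid
assert abs(grid[np.argmin(L_R)] - closed) < 1e-4
```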

Lemma 6 (Closed form of $\mathbf{Q}^\star_k(\boldsymbol{\lambda}, \boldsymbol{\Omega}; \mathbf{Q}^\nu)$). Let $\mathbf{U}^\nu_{i_k} \mathbf{D}^\nu_{i_k} \mathbf{U}^{\nu H}_{i_k}$ be the eigenvalue/eigenvector decomposition of

$$2\tau_Q \mathbf{Q}^\nu_{i_k} - \sum_{(j,l) \ne (i,k)} \lambda_{j_l} \boldsymbol{\Pi}^-_{j_l i_k}(\mathbf{Q}^\nu_{-j_l}) + \sum_{j_l \in \mathcal{I}} \mathbf{H}_{j_l k}^H \boldsymbol{\Omega}_{j_l} \mathbf{H}_{j_l k},$$

with $\mathbf{D}^\nu_{i_k} \triangleq \mathrm{Diag}(\mathbf{d}^\nu_{i_k})$. Let us partition the optimal solution $\mathbf{Q}^\star_k(\boldsymbol{\lambda}, \boldsymbol{\Omega}; \mathbf{Q}^\nu) \triangleq (\mathbf{Q}^\star_{i_k}(\boldsymbol{\lambda}, \boldsymbol{\Omega}; \mathbf{Q}^\nu))_{i_k \in \mathcal{I}_k}$. Then each $\mathbf{Q}^\star_{i_k}(\boldsymbol{\lambda}, \boldsymbol{\Omega}; \mathbf{Q}^\nu)$ has the following waterfilling-like expression [we omit the dependence on $(\boldsymbol{\lambda}, \boldsymbol{\Omega}; \mathbf{Q}^\nu)$]:

$$\mathbf{Q}^\star_{i_k} = \mathbf{U}^\nu_{i_k} \mathrm{Diag}\!\left( \left[ \frac{\mathbf{d}^\nu_{i_k} - \xi^\star_k \mathbf{1}}{2\tau_Q} \right]_+ \right) \mathbf{U}^{\nu H}_{i_k}, \quad \forall i_k \in \mathcal{I}_k, \qquad (14)$$

where $\xi^\star_k \ge 0$ is the water level, chosen to satisfy the power constraint $\sum_{i_k \in \mathcal{I}_k} \mathrm{tr}(\mathbf{Q}^\star_{i_k}) \le P_k$; it can be found either exactly by the finite-step hypothesis method (see Algorithm 4, Appendix A) or approximately via bisection on the interval $\big[0,\ \sum_{i_k \in \mathcal{I}_k} \mathrm{tr}(\mathrm{Diag}(\mathbf{d}^\nu_{i_k}))/(T_k I_k)\big]$.

Proof: See Appendix A.
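The bisection route for the water level is straightforward to implement, since the allocated power $\sum_i \max\{0, (d_i - \xi)/(2\tau_Q)\}$ is nonincreasing in $\xi$. The snippet below is a minimal sketch with made-up eigenvalues, using a simple $[0, \max_i d_i]$ bracket rather than the interval quoted in Lemma 6.

```python
import numpy as np

def water_level(d, P_k, tau_Q, tol=1e-10):
    """Find xi >= 0 such that sum_i max(0, (d_i - xi)/(2*tau_Q)) meets the budget P_k
    [cf. (14)]; returns 0 if the unconstrained minimizer is already feasible."""
    power = lambda xi: np.sum(np.maximum(0.0, (d - xi) / (2.0 * tau_Q)))
    if power(0.0) <= P_k:
        return 0.0
    lo, hi = 0.0, float(np.max(d))      # power(lo) > P_k and power(hi) = 0 <= P_k
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        lo, hi = (mid, hi) if power(mid) > P_k else (lo, mid)
    return 0.5 * (lo + hi)

d = np.array([3.0, 2.0, 0.5, -1.0])     # illustrative eigenvalues of the matrix in Lemma 6
xi = water_level(d, P_k=1.0, tau_Q=1.0)
q = np.maximum(0.0, (d - xi) / 2.0)     # resulting per-eigenmode powers, cf. (14)
assert abs(q.sum() - 1.0) < 1e-6        # budget met with equality (constraint active here)
```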

Lemma 7 (Closed form of $\mathbf{Y}^\star_k(\boldsymbol{\lambda}, \boldsymbol{\Omega}; \mathbf{Y}^\nu)$). Let $\mathbf{V}^\nu_{i_k} \mathbf{D}^\nu_{i_k} \mathbf{V}^{\nu H}_{i_k}$ be the eigenvalue/eigenvector decomposition of $2\tau_Y \mathbf{Y}^\nu_{i_k} - \boldsymbol{\Omega}_{i_k}$, with $\mathbf{D}^\nu_{i_k} = \mathrm{Diag}(\mathbf{d}^\nu_{i_k})$. Let us partition the optimal solution $\mathbf{Y}^\star_k(\boldsymbol{\lambda}, \boldsymbol{\Omega}; \mathbf{Y}^\nu) \triangleq (\mathbf{Y}^\star_{i_k}(\lambda_{i_k}, \boldsymbol{\Omega}_{i_k}; \mathbf{Y}^\nu))_{i_k \in \mathcal{I}_k}$. Then each $\mathbf{Y}^\star_{i_k}(\lambda_{i_k}, \boldsymbol{\Omega}_{i_k}; \mathbf{Y}^\nu)$ has the following expression [we omit the dependence on $(\lambda_{i_k}, \boldsymbol{\Omega}_{i_k}; \mathbf{Y}^\nu)$]:

$$\mathbf{Y}^\star_{i_k} = \mathbf{V}^\nu_{i_k} \mathrm{Diag}(\mathbf{y}^\nu_{i_k}) \mathbf{V}^{\nu H}_{i_k}, \quad \forall i_k \in \mathcal{I}_k, \qquad (15)$$

with

$$\mathbf{y}^\nu_{i_k} \triangleq \left[ -\frac{1}{2} \left( \sigma^2_{i_k} \mathbf{1} - \frac{\mathbf{d}^\nu_{i_k}}{2\tau_Y} \right) + \frac{1}{2} \sqrt{ \left( \sigma^2_{i_k} \mathbf{1} + \frac{\mathbf{d}^\nu_{i_k}}{2\tau_Y} \right)^2 + \frac{2\lambda_{i_k}}{\tau_Y} \mathbf{1} } \right]_+, \qquad (16)$$

where $\mathbf{1}$ denotes the vector of all ones (and the square is applied component-wise).

Proof: See Appendix B.
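Per eigenvalue, (15)-(16) solve a scalar strongly convex problem; the closed form can be cross-checked against its first-order condition $2\tau_Y (y - y^\nu) + \omega - \lambda/(\sigma^2 + y) = 0$, with $d = 2\tau_Y y^\nu - \omega$. The scalar values below are illustrative.

```python
import numpy as np

tau_Y, lam, sigma2 = 0.5, 0.8, 1.0     # illustrative proximal weight, multiplier, noise power
y_nu, omega = 2.0, 0.3                 # current slack eigenvalue and its multiplier
d = 2 * tau_Y * y_nu - omega           # eigenvalue of 2*tau_Y*Y^nu - Omega (Lemma 7)

# Closed form (16), per eigenvalue:
y = max(0.0, -0.5 * (sigma2 - d / (2 * tau_Y))
         + 0.5 * np.sqrt((sigma2 + d / (2 * tau_Y)) ** 2 + 2 * lam / tau_Y))

# It must satisfy the first-order condition of
#   min_{y >= 0}  tau_Y*(y - y_nu)^2 + omega*y - lam*log(sigma2 + y):
foc = 2 * tau_Y * (y - y_nu) + omega - lam / (sigma2 + y)
assert abs(foc) < 1e-12
```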

Finally, note that, since $L(\mathbf{W}, \boldsymbol{\lambda}, \boldsymbol{\Omega}; \mathbf{W}^\nu)$ is strongly convex for any given $\boldsymbol{\lambda} \ge \mathbf{0}$ and $\boldsymbol{\Omega} \triangleq (\boldsymbol{\Omega}_{i_k} \succeq \mathbf{0})_{i_k \in \mathcal{I}}$, the dual function $D(\boldsymbol{\lambda}, \boldsymbol{\Omega}; \mathbf{W}^\nu)$ in (12) is (R-)differentiable on $\mathbb{R}^{|\mathcal{I}|}_+ \times \prod_{i_k \in \mathcal{I}} \mathbb{C}^{M_{i_k} \times M_{i_k}}$, with (conjugate) gradient given by

$$\nabla_{\lambda_{i_k}} D(\boldsymbol{\lambda}, \boldsymbol{\Omega}; \mathbf{W}^\nu) = \alpha_{i_k} R^\star - \tilde{R}_{i_k}\big( \mathbf{Q}^\star_{-i_k}, \mathbf{Y}^\star_{i_k}; \mathbf{Q}^\nu \big), \qquad \nabla_{\boldsymbol{\Omega}^*_{i_k}} D(\boldsymbol{\lambda}, \boldsymbol{\Omega}; \mathbf{W}^\nu) = \mathbf{Y}^\star_{i_k} - \sum_{j_l \in \mathcal{I}} \mathbf{H}_{i_k l} \mathbf{Q}^\star_{j_l} \mathbf{H}_{i_k l}^H, \qquad (17)$$

where $R^\star$, $\mathbf{Q}^\star_{i_k}$, and $\mathbf{Y}^\star_{i_k}$ are given by (13), (14), and (15), respectively. Also, it can be shown that the dual function is $C^2$, with Lipschitz continuous (augmented) Hessian with respect to $\mathbf{W}^\nu$. Then, the dual problem

$$\max_{\boldsymbol{\lambda} \ge \mathbf{0},\ \boldsymbol{\Omega} = (\boldsymbol{\Omega}_{i_k} \succeq \mathbf{0})_{i_k \in \mathcal{I}}} D(\boldsymbol{\lambda}, \boldsymbol{\Omega}; \mathbf{W}^\nu) \qquad (18)$$

can be solved using either first- or second-order methods. A gradient-based scheme with diminishing step-size is given in Algorithm 2, whose convergence is stated in Theorem 8 (the proof of the theorem follows from classical arguments and thus is omitted). In Algorithm 2, $P_{\mathbb{H}_+}(\bullet)$ denotes the orthogonal projection onto the set of complex positive semidefinite (and thus Hermitian) matrices.
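The projection $P_{\mathbb{H}_+}(\cdot)$ has a standard closed form under the Frobenius norm: take the Hermitian part, eigendecompose, and clip the negative eigenvalues (a minimal sketch).

```python
import numpy as np

def proj_psd(A):
    """Orthogonal (Frobenius-norm) projection onto Hermitian PSD matrices."""
    A = 0.5 * (A + A.conj().T)          # nearest Hermitian part
    w, V = np.linalg.eigh(A)            # real eigenvalues, unitary eigenvectors
    return (V * np.maximum(w, 0.0)) @ V.conj().T   # V diag(max(w, 0)) V^H

rng = np.random.default_rng(2)
X = rng.standard_normal((4, 4)) + 1j * rng.standard_normal((4, 4))
P = proj_psd(X)
assert np.allclose(P, P.conj().T)                  # Hermitian
assert np.linalg.eigvalsh(P).min() >= -1e-12       # positive semidefinite
assert np.allclose(proj_psd(P), P)                 # idempotent on PSD input
```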

Algorithm 2: Distributed dual algorithm solving (18)

Data: $\boldsymbol{\lambda}^0 \triangleq (\lambda^0_{i_k})_{i_k \in \mathcal{I}} \ge \mathbf{0}$, $\boldsymbol{\Omega}^0 \triangleq (\boldsymbol{\Omega}^0_{i_k} \succeq \mathbf{0})_{i_k \in \mathcal{I}}$, $\mathbf{W}^\nu = (\mathbf{Q}^\nu, R^\nu, \mathbf{Y}^\nu)$. Set $n = 0$.
(S.1) If $\boldsymbol{\lambda}^n$ and $\boldsymbol{\Omega}^n$ satisfy a suitable termination criterion: STOP.
(S.2) Compute $R^\star(\boldsymbol{\lambda}^n; R^\nu)$, $\mathbf{Q}^\star_{i_k}(\boldsymbol{\lambda}^n, \boldsymbol{\Omega}^n; \mathbf{Q}^\nu)$, and $\mathbf{Y}^\star_{i_k}(\boldsymbol{\lambda}^n, \boldsymbol{\Omega}^n; \mathbf{Y}^\nu)$ [cf. (13)-(15)].
(S.3) Update $\boldsymbol{\lambda}$ and $\boldsymbol{\Omega}$ according to

$$\lambda^{n+1}_{i_k} = \big[ \lambda^n_{i_k} + \beta^n \nabla_{\lambda_{i_k}} D(\boldsymbol{\lambda}^n, \boldsymbol{\Omega}^n; \mathbf{W}^\nu) \big]_+, \qquad \boldsymbol{\Omega}^{n+1}_{i_k} = P_{\mathbb{H}_+}\big( \boldsymbol{\Omega}^n_{i_k} + \beta^n \nabla_{\boldsymbol{\Omega}^*_{i_k}} D(\boldsymbol{\lambda}^n, \boldsymbol{\Omega}^n; \mathbf{W}^\nu) \big), \quad \forall i_k \in \mathcal{I},$$

for some $\beta^n > 0$.
(S.4) $n \leftarrow n + 1$; go back to (S.1).

Theorem 8. Let {(λ^n,Ω^n)} be the sequence generated by Algorithm 2. Choose the step-size sequence {β^n} such that β^n > 0, β^n → 0, Σ_n β^n = +∞, and Σ_n (β^n)² < ∞. Then, {(λ^n,Ω^n)} converges to a solution of (18). Therefore, the sequence {(R⋆(λ^n;R^ν), Q⋆(λ^n,Ω^n;Q^ν), Y⋆(λ^n,Ω^n;Y^ν))} converges to the unique solution of (11).
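The step-size conditions of Theorem 8 are satisfied, e.g., by β^n = 1/(n+1). As an illustration only (the one-dimensional concave dual below is our toy stand-in for (18), not the paper's D), the resulting projected ascent can be sketched as:

```python
def dual_ascent(grad, lam0, iters=200):
    """Projected gradient ascent with diminishing steps beta_n = 1/(n+1),
    which satisfy beta_n > 0, beta_n -> 0, sum beta_n = inf, and
    sum beta_n^2 < inf (the conditions of Theorem 8)."""
    lam = lam0
    for n in range(iters):
        beta = 1.0 / (n + 1)
        lam = max(lam + beta * grad(lam), 0.0)   # [.]_+ projection onto lam >= 0
    return lam

# Toy concave dual D(lam) = -(lam - 3)^2 / 2, maximized at lam* = 3
lam_star = dual_ascent(lambda lam: -(lam - 3.0), 0.0)
```

In Algorithm 2 the scalar gradient oracle is replaced by the closed-form expressions (17), and the projection acts component-wise on λ and via P_{H+} on Ω.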

Algorithm 2 can be implemented in a fairly distributed way across the BSs, with limited communication overhead. More specifically, given the current value of (λ,Ω), each BS k updates individually the covariance matrices Q_k = (Q_{i_k})_{i_k∈I_k} of the users in cell k as well as the slack variables Y_k = (Y_{i_k})_{i_k∈I_k}, by computing the closed-form solutions Q⋆_{i_k}(λ^n,Ω^n;Q^ν) [cf. (14)] and Y⋆_{i_k}(λ^n,Ω^n;Y^ν) [cf. (15)]. The update of the R-variable [using R⋆(λ;R^ν)] and of the multipliers (λ,Ω) requires some coordination among the BSs: it can be carried out either by a BS header or locally by all the BSs, if a consensus-like scheme is employed to obtain locally the information required to compute R⋆(λ;R^ν) and the gradients in (17).

The dual problem (18) can also be solved using a second-order scheme, which in practice is expected to be faster than a gradient-based one. It suffices to replace Step 3 of Algorithm 2 with the following updating rules for the multipliers:

λ^{n+1} = λ^n + s^n([λ̄^{n+1}]_+ − λ^n),
Ω^{n+1} = Ω^n + s^n(P_{H+}(Ω̄^{n+1}) − Ω^n),   (19)

where λ̄^{n+1} and Ω̄^{n+1} can be read off Λ̄^{n+1} ≜ [(λ̄^{n+1})^T, vec(Ω̄^{n+1})^T]^T, with vec(Ω) ≜ (vec(Ω_{i_k}))_{i_k∈I}, which is updated according to

Λ̄^{n+1} ≜ Λ^n + (∇²_{λ,vec(Ω*)} D(λ^n,Ω^n;W^ν))^{-1} · ∇_{λ,vec(Ω*)} D(λ^n,Ω^n;W^ν).   (20)


Figure 2: Minimum rate (nats/s/Hz) vs. SNR (dB), for Max-Min WMMSE, WMMSE, NOVA (proposed approach), SJBR, GSJBR, and AS: the proposed algorithm reaches better solutions than the state of the art.

In Appendix C we provide the explicit expressions of the (augmented) Hessian matrices and gradients in (20).

E. Numerical results

In this section we present some experiments assessing the effectiveness of the proposed formulation and algorithms.

Example #1: Centralized algorithm. We start by comparing Algorithm 1 with three other approaches presented in the literature for resource allocation over IBCs, namely: 1) the Max-Min WMMSE [18], which aims at maximizing the minimum rate of the system (a special case of P_s); 2) the WMMSE algorithm [35] and the partial linearization-based algorithm (termed SJBR) [13], which consider the maximization of the system sum-rate; and 3) the partial linearization-based algorithm [13], applied to maximize the geometric mean of the rates (the proportional fairness utility function), termed GSJBR. As a benchmark, we also report the results achieved using the standard nonlinear programming solver in Matlab, specifically the active-set algorithm in 'fmincon' (among the options in fmincon, the active-set algorithm was the one that showed the best performance); we refer to it as the "AS" algorithm. To allow the comparison, we consider a special case of P_s as in [18]. We simulated a K = 4 cell IBC with I_k = 3 randomly placed active MTs per cell; the BSs and MTs are equipped with 4 antennas. Channels are Rayleigh fading, whose path-losses are generated using the 3GPP (TR 36.814) methodology. We assume white zero-mean Gaussian noise at each receiver, with variance σ², and the same power budget P for all the BSs; the signal-to-noise ratio is then SNR = P/σ². Algorithm 1 is simulated using τ_R = 1e−7 and the step-size rule γ^ν = γ^{ν−1}(1 − 10^{−3} γ^{ν−1}), with γ^0 = 1. The same step-size rule is used for SJBR and GSJBR. The same random feasible initialization is used for all the algorithms. All algorithms are terminated when the absolute value of the difference between two consecutive values of the objective function becomes smaller than 1e−3. In Fig. 2 we plot the minimum rate versus SNR achieved by the aforementioned algorithms. All results are averaged over 300 independent channel/topology realizations.
The figure shows that our algorithm yields substantially fairer rate allocations (larger minimum rates) than all the others. As expected, we observed that SJBR and WMMSE achieve higher sum-rates (not reported in the figure) while sacrificing fairness: SJBR and WMMSE can shut off some users (the associated minimum rate is zero). More specifically, for SNR ≈ 15 dB (resp. SNR ≈ 30 dB) we observed the following average losses in sum-rate w.r.t. SJBR (and WMMSE): ∼ 70% (resp.

Figure 3: Minimum rate vs. iterations. Left plot: Centralized algorithm vs. Distributed, first order; Right plot: Centralized algorithm vs. Distributed, second order.

∼ 45%) for our algorithm and Max-Min WMMSE; ∼ 50% (resp. ∼ 60%) for GSJBR; and ∼ 80% (resp. ∼ 80%) for AS. Between Algorithm 1 and the Max-Min WMMSE [18], the former provides better solutions, both in terms of minimum rate (cf. Fig. 2) and sum-rate (not reported). This might be due to the fact that the Max-Min WMMSE converges to stationary solutions of an auxiliary nonconvex problem (obtained by lifting P_s) that are not proved to be stationary also for the original formulation P; our algorithm instead converges to d-stationary solutions of P.

Example #2: Distributed algorithms. We now test the distributed algorithms in Sec. II-D and compare them with the centralized implementation. Specifically, we simulate: i) Algorithm 1 based on the solution Ŵ(W^ν) (termed Centralized algorithm); ii) the same algorithm as in i) but with Ŵ(W^ν) computed in a distributed way using Algorithm 2 (termed Distributed, first-order); and iii) the same algorithm as in i) but with Ŵ(W^ν) computed by solving the dual problem (18) using a second-order method (termed Distributed, second-order). The simulated scenario as well as the tuning of the algorithms is as in Example #1. In Algorithm 2, the step-size sequence β^n has been chosen as β^n = β^{n−1}(1 − 0.8 · β^{n−1}); also, the starting point (λ^0,Ω^0) is set equal to the optimal solution (λ⋆,Ω⋆) of the previous round. Fig. 3 shows the normalized rate evolution achieved by the aforementioned algorithms versus the iteration index n. For the distributed algorithms, the number of iterations counts both the inner and outer iterations. Note that all the algorithms converge to the same stationary point of Problem P, and they are quite fast. As expected, exploiting second-order information accelerates the practical convergence, but at the cost of extra signaling among the BSs.

III. MULTIGROUP MULTICAST BEAMFORMING

A. System model

We study the general Max-Min Fair (MMF) beamforming problem for multi-group multicasting [22], where different groups of subscribers request different data streams from the transmitter; see Fig. 4.

We assume that there are K BSs (one per cell), each of them equipped with N_t transmit antennas, and a total of I active users, each with a single receive antenna; let I ≜ {1,...,I}. For notational simplicity, we assume w.l.o.g. that each BS serves a single multicast group; let G_k denote the group of users served by the k-th BS, with k ∈ K_BS ≜ {1,...,K}; G_1,...,G_K is a partition of I. We will denote by i_k user i belonging to group G_k. The extension of the algorithm to


the multi-group case (i.e., multiple groups in each cell) is straightforward. Letting w_k ∈ C^{N_t} be the beamforming vector for transmission at BS k (to group G_k), the Max-Min Fair (MMF) beamforming problem reads

max_{w≜(w_k)_{k∈K_BS}}  U(w) ≜ min_{i_k∈G_k, k∈K_BS}  (w_k^H H_{i_k k} w_k) / (Σ_{ℓ≠k} w_ℓ^H H_{i_k ℓ} w_ℓ + σ²_{i_k})
s.t.  ‖w_k‖²_2 ≤ P_k, ∀ k ∈ K_BS,   (21)

where H_{i_k ℓ} is a positive semidefinite (not all zero) matrix modeling the channel between the ℓ-th BS and user i_k ∈ G_k; specifically, H_{i_k ℓ} = h_{i_k ℓ} h^H_{i_k ℓ} if instantaneous CSI is assumed, with h_{i_k ℓ} ∈ C^{N_t} being the frequency-flat quasi-static channel vector from BS ℓ to user i_k; and H_{i_k ℓ} = E(h_{i_k ℓ} h^H_{i_k ℓ}) represents the spatial correlation matrix if only long-term CSI is available (in the latter case, no special structure for H_{i_k ℓ} is assumed); σ²_{i_k} is the variance of the AWGN at receiver i_k; and P_k is the power budget of cell k. Note that different qualities of service among the users can be readily accommodated by multiplying each SINR in (21) by a predetermined positive factor, which we will tacitly assume to be absorbed into the channel matrices H_{i_k k}. We denote by W the (convex) feasible set of (21). We remark that, differently from the literature, one can add further (convex) constraints in W, such as per-antenna power, null, or interference constraints; the algorithmic framework we are going to introduce remains applicable.

A simplified instance of problem (21), where there exists only one cell (and multiple groups), i.e., K = 1, has been proved to be NP-hard [22]. Therefore, in the following, we aim at efficiently computing (d-)stationary solutions of (21), the definition of which is analogous to the previous case [cf. (6) for Problem P].

Equivalent reformulation: We start by rewriting the nonconvex and nonsmooth problem (21) in an equivalent smooth (still nonconvex) form: introducing the slack variables t ≥ 0 and β ≜ ((β_{i_k})_{i_k∈G_k})_{k∈K_BS}, we have

max_{t≥0, β, w}  t
s.t.  (a): g_{i_k}(t,β_{i_k},w_k) ≜ t · β_{i_k} − w_k^H H_{i_k k} w_k ≤ 0, ∀ i_k ∈ G_k, ∀ k ∈ K_BS,
      (b): Σ_{ℓ≠k} w_ℓ^H H_{i_k ℓ} w_ℓ + σ²_{i_k} ≤ β_{i_k}, ∀ i_k ∈ G_k, ∀ k ∈ K_BS,
      (c): β_{i_k} ≤ β^max_{i_k}, ∀ i_k ∈ G_k, ∀ k ∈ K_BS,
      (d): ‖w_k‖²_2 ≤ P_k, ∀ k ∈ K_BS,   (22)

where condition (c) is meant to bound each β_{i_k} by β^max_{i_k} ≜ max_{ℓ∈K_BS} λ_max(H_{i_k ℓ}) · Σ_{k=1}^K P_k + σ²_{i_k}, so that the feasible set of (22), which we will denote by Z, is compact (a condition that is desirable to strengthen the convergence properties of our algorithms, see [3, Th. 2]). Problems (21) and (22) are equivalent in the following sense.

Proposition 9. Given (21) and (22), the following hold:
(i) Every stationary solution (t⋆,β⋆,w⋆) of (22) satisfies the MFCQ;
(ii) w⋆ is a d-stationary solution of (21) if and only if there exist t⋆ and β⋆ such that (t⋆,β⋆,w⋆) is a stationary solution of (22).

Proof: See supporting material.

Figure 4: Multicell multi-group multicast network.

Therefore, we can focus w.l.o.g. on (22). We remark that the equivalence stated in Proposition 9 is a new result in the literature.

B. Related works

The multicast beamforming problem has been widely studied in the literature, under different channel models and settings (single-group vs. multi-group and single-cell vs. multi-cell). While the general formulation is nonconvex, special instances exhibit ad-hoc structure that allows them to be solved efficiently, leveraging equivalent (quasi-)convex reformulations; see, e.g., [36, 37, 38]. In the case of general channel vectors, however, the (single-cell) MMF beamforming problem [and thus also Problem (22)] was proved to be NP-hard [22]. This has motivated considerable interest in pursuing approximate solutions that approach optimal performance at moderate complexity. SemiDefinite Relaxations (SDR) followed by Gaussian randomization (SDR-G) have been extensively studied in the literature to obtain good suboptimal solutions [22, 23, 24, 25], with theoretical bound guarantees [39, 40]. For a large number of antennas or users, however, the quality of the approximation obtained by SDR-G methods deteriorates considerably. In fact, SDR-based approaches return feasible points that in general may not even be stationary for the original nonconvex problem. Moreover, in a multi-cell scenario, SDR-G is not suitable for a distributed implementation across the cells.

Two schemes based on heuristic convex approximations have been recently proposed in [41] and [42] (the latter based on earlier work [43]) for the single-cell multiple-group MMF beamforming problem. While extensive experiments show that these schemes achieve better solutions than SDR-G approaches, their theoretical convergence and guarantees remain an open question. Finally, we are not aware of any distributed scheme with provable convergence for the multi-cell MMF beamforming problem.

Leveraging our NOVA framework, we propose next a novel centralized algorithm and the first distributed algorithm, both


converging to d-stationary solutions of Problem (21). Numerical results (cf. Sec. III-E) show that our schemes reach better solutions than SDR-G approaches with high probability, while having comparable computational complexity.

C. Centralized solution method

Problem (22) is nonconvex due to the nonconvex constraint functions g_{i_k}(t,β_{i_k},w_k). Several valid convexifications of g_{i_k} are possible; two examples are given next.

Example #1: Note that g_{i_k} is the sum of a bilinear function and a concave one, namely g_{i_k}(t,β_{i_k},w_k) = g_{i_k,1}(t,β_{i_k}) + g_{i_k,2}(w_k), with

g_{i_k,1}(t,β_{i_k}) ≜ t · β_{i_k},  and  g_{i_k,2}(w_k) ≜ −w_k^H H_{i_k k} w_k.   (23)

A valid surrogate g̃_{i_k} can then be obtained as follows: i) linearize the concave function g_{i_k,2}(w_k) around w^ν_k, that is,

g̃_{i_k,2}(w_k; w^ν_k) ≜ −(w^ν_k)^H H_{i_k k} w^ν_k − ⟨H_{i_k k} w^ν_k, w_k − w^ν_k⟩ ≥ g_{i_k,2}(w_k),   (24)

with ⟨a,b⟩ ≜ 2 Re{a^H b}; and ii) upper bound g_{i_k,1}(t,β_{i_k}) around (t^ν,β^ν_{i_k}) ≠ (0,0) as

g̃_{i_k,1}(t,β_{i_k}; t^ν,β^ν_{i_k}) ≜ (1/2) ((β^ν_{i_k}/t^ν) t² + (t^ν/β^ν_{i_k}) β²_{i_k}) ≥ g_{i_k,1}(t,β_{i_k}).   (25)

Overall, this results in the following surrogate function, which satisfies [3, Assumptions 2-4]:

g̃_{i_k}(t,β_{i_k},w_k; t^ν,β^ν_{i_k},w^ν_k) ≜ g̃_{i_k,1}(t,β_{i_k}; t^ν,β^ν_{i_k}) + g̃_{i_k,2}(w_k; w^ν_k).   (26)
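The upper bound in (25) is just the arithmetic-geometric mean inequality applied to the bilinear term t·β: for any a > 0, a t² + β²/a ≥ 2tβ. A quick numerical sanity check (variable names are ours):

```python
import random

def g1(t, beta):
    """Bilinear term g_{ik,1}(t, beta) = t * beta from (23)."""
    return t * beta

def g1_tilde(t, beta, t_nu, beta_nu):
    """Surrogate (25): 0.5*((beta_nu/t_nu)*t^2 + (t_nu/beta_nu)*beta^2).
    By AM-GM it upper-bounds t*beta on t, beta >= 0, and it touches
    g1 (same value and gradient) at the base point (t_nu, beta_nu)."""
    return 0.5 * ((beta_nu / t_nu) * t ** 2 + (t_nu / beta_nu) * beta ** 2)

random.seed(0)
t_nu, beta_nu = 0.7, 1.3
ok = all(
    g1_tilde(t, b, t_nu, beta_nu) >= g1(t, b) - 1e-12
    for t, b in ((random.uniform(0, 5), random.uniform(0, 5)) for _ in range(1000))
)
```

The tangency at (t^ν, β^ν_{i_k}) is what makes (25) a valid inner-approximation ingredient in the sense of [3, Assumptions 2-4].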

Example #2: Another valid approximation can be readily obtained using a different bound for the bilinear term g_{i_k,1}(t,β_{i_k}) in (23). Rewriting g_{i_k,1}(t,β_{i_k}) as the difference of two convex functions, g_{i_k,1}(t,β_{i_k}) = (1/2)((t+β_{i_k})² − (t² + β²_{i_k})), the desired convex upper bound of g_{i_k,1}(t,β_{i_k}) can be obtained by linearizing the concave part around (t^ν,β^ν_{i_k}) while retaining the convex part, which leads to

g̃_{i_k,1}(t,β_{i_k}; t^ν,β^ν_{i_k}) ≜ (1/2)((t+β_{i_k})² − (t^ν)² − (β^ν_{i_k})²) − (t^ν (t − t^ν) + β^ν_{i_k} (β_{i_k} − β^ν_{i_k})).   (27)

The resulting valid surrogate function is then

g̃_{i_k}(t,β_{i_k},w_k; t^ν,β^ν_{i_k},w^ν_k) ≜ g̃_{i_k,1}(t,β_{i_k}; t^ν,β^ν_{i_k}) + g̃_{i_k,2}(w_k; w^ν_k).   (28)

The strongly convex inner approximation of (22) [cf. (2)] then reads: given a feasible z^ν ≜ (t^ν,β^ν,w^ν),

max_{t≥0, β, w}  t − (τ_t/2)(t − t^ν)² − τ_w ‖w − w^ν‖²_2 − (τ_β/2) ‖β − β^ν‖²_2
s.t.  g̃_{i_k}(t,β_{i_k},w_k; t^ν,β^ν_{i_k},w^ν_k) ≤ 0, ∀ i_k ∈ G_k, ∀ k ∈ K_BS,
      (b), (c), and (d) of (22);   (29)

where g̃_{i_k} is the surrogate defined in either (26) or (28). In the objective function of (29) we added a proximal regularization to make the subproblem strongly convex; therefore, problem (29) has a unique solution, which we denote by ẑ(z^ν).

Using (29), the NOVA algorithm based on (3) is described in Algorithm 3, whose convergence is established in Theorem 10. Note that t^ν > 0 for all ν ≥ 1 (provided that t^0 > 0), which guarantees that, if (26) is used in (29), then g̃_{i_k} is always well defined. Also, the algorithm will never converge to a degenerate stationary solution of (21) (U = 0, i.e., t^∞ = 0), at which some users would not receive any signal.

Algorithm 3: NOVA Algorithm for Problem (21)

Data: z^0 ≜ (t^0,β^0,w^0) ∈ Z, with t^0 > 0, and (τ_t, τ_w, τ_β) > 0. Set ν = 0.
(S.1) If z^ν is a stationary solution of (21): STOP.
(S.2) Compute ẑ(z^ν).
(S.3) Set z^{ν+1} = z^ν + γ^ν (ẑ(z^ν) − z^ν) for some γ^ν ∈ (0,1].
(S.4) ν ← ν + 1 and go to step (S.1).

Theorem 10. Let {z^ν = (t^ν,β^ν,w^ν)} be the sequence generated by Algorithm 3. Choose any τ_t, τ_w, τ_β > 0, and the step-size sequence {γ^ν} such that γ^ν ∈ (0,1], γ^ν → 0, and Σ_ν γ^ν = +∞. Then, {z^ν} is bounded (with t^ν > 0 for all ν ≥ 1), and every one of its limit points (t^∞,β^∞,w^∞) is a stationary solution of Problem (22), with t^∞ > 0. Therefore, w^∞ is d-stationary for Problem (21). Furthermore, if the algorithm does not stop after a finite number of steps, none of the w^∞ above is a local minimum of U.

Proof: See Appendix D.
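The step-size rule used in our experiments (Sec. II-E and III-E), γ^ν = γ^{ν−1}(1 − ε γ^{ν−1}) with γ^0 = 1, satisfies the conditions of Theorem 10 (γ^ν ∈ (0,1], monotonically decreasing to 0, non-summable). A minimal sketch generating it (the function name is ours):

```python
def stepsizes(eps=1e-3, n=1000):
    """Diminishing step-size rule gamma^nu = gamma^{nu-1}*(1 - eps*gamma^{nu-1}),
    gamma^0 = 1. Since 0 < eps*gamma < 1, every iterate stays in (0, 1]
    and decreases strictly; the sequence decays like 1/(eps*nu), hence
    sum_nu gamma^nu = +inf, as required by Theorem 10."""
    g, out = 1.0, [1.0]
    for _ in range(n - 1):
        g = g * (1.0 - eps * g)
        out.append(g)
    return out
```

The same recursion (with different ε) is used as the β^n rule for the inner dual loops in the numerical sections.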

D. Distributed implementation

Algorithm 3 is centralized, because the subproblems (29) do not decouple across the BSs. In this section, we develop a distributed solution method for (29), using the surrogate in Example #1 [cf. (26)]. We exploit the additive separability in the BSs' variables of the objective function and constraints in (29), as outlined next.

Denoting by λ ≜ (λ_k ≜ (λ_{i_k})_{i_k∈G_k})_{k∈K_BS} and η ≜ (η_k ≜ (η_{i_k})_{i_k∈G_k})_{k∈K_BS} the multipliers associated with the constraints g̃_{i_k} ≤ 0 and (b) in (29), respectively, and introducing σ² ≜ ((σ²_{i_k})_{i_k∈G_k})_{k∈K_BS}, β_k ≜ (β_{i_k})_{i_k∈G_k}, and β^max ≜ (β^max_k ≜ (β^max_{i_k})_{i_k∈G_k})_{k∈K_BS}, the (partial) Lagrangian of (29) can be shown to have the following structure:

L(t,β,w,λ,η; z^ν) = L_t(t,λ,η; t^ν,β^ν) + Σ_{k=1}^K L_{w_k}(w_k,λ,η; w^ν) + Σ_{k=1}^K L_{β_k}(β_k,λ,η; t^ν,β^ν),

where

L_t(t,λ,η; t^ν,β^ν) ≜ −t + (τ_t/2)(t − t^ν)² + (λ^T β^ν/(2t^ν)) t² + η^T σ²;

L_{w_k}(w_k,λ,η; w^ν) ≜ τ_w ‖w_k − w^ν_k‖²_2 − Σ_{i_k∈G_k} λ_{i_k} (w^ν_k)^H H_{i_k k} w^ν_k − Σ_{i_k∈G_k} λ_{i_k} ⟨H_{i_k k} w^ν_k, w_k − w^ν_k⟩ + Σ_{ℓ≠k} Σ_{i_ℓ∈G_ℓ} η_{i_ℓ} w_k^H H_{i_ℓ k} w_k;

L_{β_k}(β_k,λ,η; t^ν,β^ν) ≜ (τ_β/2) ‖β_k − β^ν_k‖²_2 − η_k^T β_k + Σ_{i_k∈G_k} λ_{i_k} (t^ν/(2β^ν_{i_k})) β²_{i_k}.

The above structure of the Lagrangian leads naturally to the


following decomposition of the dual function:

D(λ,η; z^ν) = min_{t≥0} L_t(t,λ,η; t^ν,β^ν) + Σ_{k∈K_BS} min_{‖w_k‖²_2≤P_k} L_{w_k}(w_k,λ,η; w^ν) + Σ_{k∈K_BS} min_{0≤β_k≤β^max_k} L_{β_k}(β_k,λ,η; t^ν,β^ν).   (30)

The unique solutions of the above optimization problems can be computed in closed form (the proof is omitted because of space limitations):

t̂(λ; t^ν,β^ν) ≜ argmin_{t≥0} L_t(t,λ,η; t^ν,β^ν) = [ (1 + τ_t t^ν) / (τ_t + λ^T β^ν/t^ν) ]_+,

β̂_k(λ,η; t^ν,β^ν) ≜ argmin_{0≤β_k≤β^max_k} L_{β_k}(β_k,λ,η; t^ν,β^ν) = ( [ (τ_β β^ν_{i_k} + η_{i_k}) / (τ_β + λ_{i_k} t^ν/β^ν_{i_k}) ]^{β^max_{i_k}}_0 )_{i_k∈G_k},

ŵ_k(λ,η; w^ν) ≜ argmin_{‖w_k‖²_2≤P_k} L_{w_k}(w_k,λ,η; w^ν) = (ξ⋆_k I + A_k)^{-1} b^ν_k,   (31)

where [x]^b_a ≜ min(b, max(a, x)), A_k and b^ν_k are defined as

A_k ≜ τ_w I + Σ_{ℓ≠k} Σ_{i_ℓ∈G_ℓ} η_{i_ℓ} H_{i_ℓ k},   b^ν_k ≜ (τ_w I + Σ_{i_k∈G_k} λ_{i_k} H_{i_k k}) w^ν_k,

and ξ⋆_k, which is such that 0 ≤ ξ⋆_k ⊥ ‖ŵ_k(λ,η; ξ⋆_k)‖²_2 − P_k ≤ 0, can be efficiently computed as follows. Denoting by U_k D_k U_k^H the eigendecomposition of A_k, we have

f_k(ξ_k) ≜ ‖ŵ_k(λ,η; w^ν)‖²_2 − P_k = Σ_{j=1}^{N_t} [U_k^H b^ν_k (b^ν_k)^H U_k]_{jj} / (ξ_k + [D_k]_{jj})² − P_k.

Therefore, ξ⋆_k = 0 if f_k(0) < 0; otherwise ξ⋆_k is such that f_k(ξ⋆_k) = 0, which can be computed using bisection on [0, √(tr(U_k^H b^ν_k (b^ν_k)^H U_k)/P_k) − min_j [D_k]_{jj}].
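The bisection just described can be sketched as follows (names are ours; `A` plays the role of A_k, `b` of b^ν_k, and the monotonically decreasing `f` is the one defined above):

```python
import numpy as np

def solve_wk(A, b, P, tol=1e-10):
    """Sketch of the w-update in (31): w = (xi*I + A)^{-1} b, with xi >= 0
    chosen so that ||w||^2 <= P holds with complementarity. We bisect on
    f(xi) = sum_j |u_j|^2 / (xi + d_j)^2 - P, where A = U diag(d) U^H
    and u = U^H b; f is strictly decreasing in xi."""
    d, U = np.linalg.eigh(A)                 # A Hermitian positive definite here
    u2 = np.abs(U.conj().T @ b) ** 2
    f = lambda xi: np.sum(u2 / (xi + d) ** 2) - P
    if f(0.0) < 0:
        xi = 0.0                             # power constraint inactive
    else:
        lo, hi = 0.0, np.sqrt(u2.sum() / P) - d.min()   # f(hi) <= 0
        while hi - lo > tol:
            xi = 0.5 * (lo + hi)
            lo, hi = (xi, hi) if f(xi) > 0 else (lo, xi)
        xi = 0.5 * (lo + hi)
    return np.linalg.solve(xi * np.eye(len(b)) + A, b), xi
```

When the unconstrained minimizer already satisfies the power budget, ξ⋆ = 0 and the update reduces to solving A_k w = b^ν_k.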

Finally, note that the dual function D(λ,η; z^ν) is differentiable on R^I_+ × R^I_+, with gradient given by

∇_{λ_{i_k}} D(λ,η; z^ν) = g̃_{i_k}(t̂(λ; t^ν,β^ν), β̂(λ,η; t^ν,β^ν), ŵ(λ,η; w^ν); z^ν),
∇_{η_{i_k}} D(λ,η; z^ν) = Σ_{ℓ≠k} ŵ_ℓ(λ,η; w^ν)^H H_{i_k ℓ} ŵ_ℓ(λ,η; w^ν) + σ²_{i_k} − β̂_{i_k}(λ,η; t^ν,β^ν).   (32)

Using (32), the dual problem max_{λ,η≥0} D(λ,η; z^ν) can be solved in a distributed way with convergence guarantees using, e.g., a gradient-based scheme with diminishing step-size; we omit further details. Overall, the proposed algorithm consists in updating z^ν via z^{ν+1} = z^ν + γ^ν (ẑ(z^ν) − z^ν) [cf. (3)], wherein ẑ(z^ν) is computed in a distributed way by solving the dual problem (30), e.g., using a first- or second-order method. The algorithm is thus a double-loop scheme. The inner loop deals with the update of the multipliers (λ,η), given z^ν; let (λ^∞,η^∞) be the limit point (within the desired accuracy) of the sequence {(λ^n,η^n)} generated by the algorithm solving the dual problem (30). In the outer loop, the BSs update locally their t, β_k's, and w_k's using the closed-form solutions ẑ(z^ν) = (t̂(λ^∞; t^ν,β^ν), β̂(λ^∞,η^∞; t^ν,β^ν), ŵ(λ^∞,η^∞; w^ν)) [cf. (31)]. The inner and outer updates can be performed in a fairly distributed way among the cells. Indeed, to compute the closed-form solutions ŵ_k(λ^n,η^n; w^ν) and β̂_k(λ^n,η^n; t^ν,β^ν), the BSs need only information within their own cell. The update of the t-variable [using t̂(λ^n; t^ν,β^ν)] and of the multipliers (λ^{n+1},η^{n+1}) requires some coordination among the BSs: it can be carried out either by a BS header or locally by all the BSs, if a consensus-like scheme is employed to obtain locally the information required to compute t̂(λ^n; t^ν,β^ν) and the gradients in (32).

E. Numerical Results

In this section, we present some numerical results validating the proposed approach and algorithmic framework.

Example #1: Centralized algorithm. We compare our NOVA algorithm with the well-known SDR-G scheme in [22]. For the NOVA algorithm, we considered two instances, corresponding to the two approximation strategies introduced in Sec. III-C (see Examples 1 and 2); we term them "NOVA1" and "NOVA2", respectively. The setup of our experiment is the following. We simulated a single-BS system; the transmitter is equipped with N_t = 8 transmit antennas and serves K = 2 multicast groups, each with I single-antenna users. Different numbers of users per group are considered, namely I = 12, 21, 30, 50, 100. The NOVA algorithms are simulated using the step-size rule γ^ν = γ^{ν−1}(1 − 10^{−2} γ^{ν−1}), with γ^0 = 1; the proximal gain is set to τ = 1e−5. The iterate is terminated when the absolute value of the difference of the objective function in two consecutive iterations is less than 1e−3. For the SDR-G in [22], 300 Gaussian samples are taken during the randomization phase, where the principal component of the relaxed SDP solution is also included as a candidate; the best value of the resulting objective function is denoted by t_SDR. To be fair, for our schemes, we considered 300 random feasible starting points and kept the best value of the objective function at convergence, denoted by t_NOVA. We then compared the performance of the two algorithms in terms of the ratio t_NOVA/t_SDR. As a benchmark, we also report the results achieved using the standard nonlinear programming solver in Matlab, specifically the active-set algorithm in 'fmincon'; we refer to it as the "AS" algorithm and denote by t_AS the best value of the objective function at convergence (obtained over the same random initializations as the NOVA schemes). In Fig. 5(a) we plot the probability that t_NOVA/AS/t_SDR ≥ α versus α, for different values of I (number of users per group), and SNR ≜ P/σ² = 3 dB; this probability is estimated over 300 independent channel realizations. The figures show a significant gain of the proposed NOVA methods. For instance, when I = 30, the minimum achieved SINR of all NOVA methods is at least three times and at most five times the one achieved by SDR-G, with probability one. The gap appears to grow in favor of the NOVA methods as the number of users increases. In Fig. 5(b) we plot the distribution of t_NOVA/AS/t_SDR. For instance, when I = 30, the minimum achieved SINR of all NOVA methods is on average about four times the one achieved by SDR-G; the variance is about 1.

In Fig. 6 we plot the average (normalized) distance of the objective value achieved at convergence by the aforementioned algorithms from the upper bound obtained by solving the SDP relaxation (denoted by t_SDP). More specifically, we plot the


average of 1 − t_approx/t_SDP versus I (the average is taken over 300 independent channel realizations), where t_approx = t_SDR for the SDR-G algorithm, t_approx = t_NOVA for our methods, and t_approx = t_AS for the AS algorithm in Matlab, with t_SDR, t_NOVA, and t_AS defined as in Fig. 5. The figure shows that the objective value reached by our methods is much closer to the SDP bound than the one obtained by SDR-G. For instance, when I = 30, the solutions of our NOVA methods are within 25% of the upper bound.

Example #2: Distributed algorithms. The previous example shows that the proposed schemes compare favorably with commercial off-the-shelf software and outperform SDR-based schemes (in terms of quality of the solution and convergence speed). However, differently from off-the-shelf software, our schemes allow for a distributed implementation in a multi-cell scenario with convergence guarantees. We test the distributed implementation of the algorithms, as described in Sec. III-D, and compare it with the centralized one. More specifically, we simulate: i) Algorithm 3 based on the solution ẑ(z^ν) (termed Centralized algorithm); ii) the same algorithm as in i) but with ẑ(z^ν) computed in a distributed way using the heavy-ball method (termed Distributed, first-order); and iii) the same algorithm as in i) but with ẑ(z^ν) computed by solving the dual problem max_{λ,η≥0} D(λ,η; z^ν) using the damped Newton method (termed Distributed, second-order). The simulated scenario of our experiment is the following. We simulated a system comprising K = 4 BSs, each equipped with N_t = 4 transmit antennas and serving G = 1 multicast group. Each group has I = 3 single-antenna users. In both loops (inner and outer), the iterate is terminated when the absolute value of the difference of the objective function in two consecutive iterations is less than 1e−2. Fig. 7 shows the evolution of the objective function t of (22) versus the iterations. For the distributed algorithms, the number of iterations counts both the inner and outer iterations. Note that all the algorithms converge to the same stationary point of Problem (21), and they are quite fast. As expected, exploiting second-order information accelerates the practical convergence, but at the cost of extra signaling among the BSs.

IV. CONCLUDING REMARKS

In this two-part paper we introduced and analyzed a new algorithmic framework for the distributed minimization of nonconvex functions, subject to nonconvex constraints. Part I [44] developed the general framework and studied its convergence properties. In this Part II, we customized our general results to two challenging and timely problems in communications, namely: 1) the rate profile maximization in MIMO IBCs; and 2) the max-min fair multicast multigroup beamforming problem in multi-cell systems. Our algorithms i) were proved to converge to d-stationary solutions of the aforementioned problems; ii) represent the first attempt to design distributed solution methods for 1) and 2); and iii) were shown numerically to reach better local optimal solutions than ad-hoc schemes proposed in the literature for special cases of the considered formulations.

We remark that, although we considered in detail only problems 1) and 2) above, the applicability of our framework

Figure 5: (a): Prob(t_NOVA/AS/t_SDR ≥ α) versus α, for I = 12, 21, 30, 50, 100; (b): Estimated p.d.f. of t_NOVA/t_SDR and t_AS/t_SDR.

Figure 6: Average normalized distance E_h[(t_SDP − t_approx)/t_SDP] of the achieved minimum SINR from the SDP upper bound versus the number of users per group I.

Figure 7: Minimum rate vs. iterations. Left plot: Centralized algorithm vs. Distributed, first order; Right plot: Centralized algorithm vs. Distributed, second order.

goes much beyond these two specific formulations and, more generally, applications in communications. Moreover, even within the network systems considered in 1) and 2), one can consider alternative objective functions and constraints. Two examples of unsolved problems to which our framework readily applies (with convergence guarantees) are: 1) the distributed minimization of the BSs' (weighted) transmit power over MIMO IBCs, subject to rate constraints; and 2) the maximization of the (weighted) sum of the multicast multigroup capacity in multi-cell systems.

Our framework also finds applications in other areas, such as signal processing, smart grids, and machine learning.


APPENDIX

A. Proof of Lemma 6

Let Q̄_{i_k} ≜ U^{νH}_{i_k} Q_{i_k} U^ν_{i_k}, and Q̄_k ≜ (Q̄_{i_k})_{i_k∈I_k}. Then, each problem min_{Q_k∈Q_k} L_{Q_k}(Q_k,λ,Ω; Q^ν) in (12) can be rewritten as

min_{Q̄_k=(Q̄_{i_k}⪰0)_{i_k∈I_k}}  Σ_{i_k∈I_k} tr((τ_Q Q̄^H_{i_k} − D^ν_{i_k}) Q̄_{i_k})
s.t.  Σ_{i_k∈I_k} tr(Q̄_{i_k}) ≤ P_k.   (33)

We claim that the optimal solution of (33) must be diagonal. Indeed, denoting by diag(Q̄_{i_k}) the diagonal matrix having the same diagonal entries of Q̄_{i_k}, and off(Q̄_{i_k}) ≜ Q̄_{i_k} − diag(Q̄_{i_k}), (each term in the sum of) the objective function of (33) can be lower bounded as

tr((τ_Q Q̄^H_{i_k} − D^ν_{i_k}) Q̄_{i_k}) = tr((τ_Q diag(Q̄_{i_k}) − D^ν_{i_k}) diag(Q̄_{i_k})) + tr(τ_Q off(Q̄_{i_k})^H off(Q̄_{i_k}))
≥ tr((τ_Q diag(Q̄_{i_k}) − D^ν_{i_k}) diag(Q̄_{i_k})).   (34)

The claim follows from the fact that the lower bound in (34) is achieved if and only if off(Q̄_{i_k}) = 0 and that the constraints in (33) depend only on diag(Q̄_{i_k}).

Setting Q̄_{i_k} = Diag(q_{i_k}) to be diagonal, with entries the elements of q_{i_k} ≜ (q_{i_k,t})_{t=1}^{T_k}, (33) becomes

min_{q_{i_k}≥0}  Σ_{i_k∈I_k} Σ_{t=1}^{T_k} (τ_Q q²_{i_k,t} − d^ν_{i_k,t} q_{i_k,t})
s.t.  Σ_{i_k∈I_k} Σ_{t=1}^{T_k} q_{i_k,t} ≤ P_k.   (35)

Problem (35) has a closed form solution q⋆_{i_k} (up to the multiplier ξ⋆_k), given by

q⋆_{i_k} = [ (d^ν_{i_k} − ξ⋆_k 1) / (2τ_Q) ]_+,   (36)

where ξ⋆_k needs to be chosen so that Σ_{i_k∈I_k} 1^T q⋆_{i_k} ≤ P_k. This can be done, e.g., using Algorithm 4, which converges in a finite number of steps.

Algorithm 4: Efficient computation of ξ⋆_k in (36)

Data: d^ν_k ≜ (d^ν_{k,j})_{j=1}^{T_k·I_k} = (d^ν_{i_k})_{i_k∈I_k} (arranged in decreasing order).
(S.0) Set J ≜ {j : d^ν_{k,j} > 0}.
(S.1) If Σ_{j∈J} d^ν_{k,j}/(2τ_Q) ≤ P_k: set ξ⋆_k = 0 and STOP.
(S.2) Repeat
  (a) Set ξ⋆_k = [ (Σ_{j∈J} d^ν_{k,j} − 2τ_Q P_k) / |J| ]_+;
  (b) If (d^ν_{k,j} − ξ⋆_k)/(2τ_Q) > 0, ∀ j ∈ J: STOP;
      else J ≜ J \ {j = |J|};
until |J| = 1.
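A direct transcription of Algorithm 4 (pure-Python sketch, names ours; `d` must be sorted in decreasing order, as required by the Data step):

```python
def water_level(d, tau_q, P):
    """Sketch of Algorithm 4: finite-step computation of the water-level xi
    in (36), so that the allocation q_j = [(d_j - xi)/(2*tau_q)]_+ meets
    the budget sum_j q_j <= P. Each pass either terminates or drops the
    smallest remaining index, so it finishes in at most len(d) passes."""
    J = [j for j in range(len(d)) if d[j] > 0]          # (S.0)
    if sum(d[j] for j in J) / (2.0 * tau_q) <= P:       # (S.1): budget slack
        return 0.0
    while True:                                          # (S.2)
        xi = max((sum(d[j] for j in J) - 2.0 * tau_q * P) / len(J), 0.0)
        if all((d[j] - xi) / (2.0 * tau_q) > 0 for j in J) or len(J) == 1:
            return xi
        J = J[:-1]          # drop the smallest remaining entry (j = |J|)
```

On the same data used for the bisection sketch in Sec. II (d = [4, 2, 1], τ_Q = 0.5, P_k = 2), the method returns ξ⋆ = 2 after two eliminations, matching the bisection result exactly.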

The optimal solution Q⋆_{i_k} is thus given by Q⋆_{i_k} ≜ U^ν_{i_k} Diag(q⋆_{i_k}) U^{νH}_{i_k}, with q⋆_{i_k} defined in (36), which completes the proof.

B. Proof of Lemma 7

Let V^ν_{i_k} D^ν_{i_k} V^{νH}_{i_k} be the eigenvalue/eigenvector decomposition of 2τ_Y Y^ν_{i_k} − Ω^ν_{i_k}, with D^ν_{i_k} = Diag(d^ν_{i_k}); define Ȳ_{i_k} ≜ V^{νH}_{i_k} Y_{i_k} V^ν_{i_k}. Each min_{(Y_{i_k}⪰0)_{i_k∈I_k}} L_{Y_k}(Y_k,λ,Ω; Y^ν) in (12) can be decomposed into I_k separate subproblems, the i_k-th of which is

min_{Ȳ_{i_k}⪰0}  tr((τ_Y Ȳ^H_{i_k} − D^ν_{i_k}) Ȳ_{i_k}) − λ_{i_k} log det(σ²_{i_k} I + Ȳ_{i_k}).   (37)

We claim that the optimal solution of (37) must be diagonal. This is a consequence of the following two inequalities. Denoting by diag(Ȳ_{i_k}) the diagonal matrix having the same diagonal entries of Ȳ_{i_k}, and off(Ȳ_{i_k}) ≜ Ȳ_{i_k} − diag(Ȳ_{i_k}), we have [note that diag(Ȳ_{i_k}) ⪰ 0]

tr((τ_Y Ȳ^H_{i_k} − D^ν_{i_k}) Ȳ_{i_k}) ≥ tr((τ_Y diag(Ȳ_{i_k}) − D^ν_{i_k}) diag(Ȳ_{i_k})),
log det(σ²_{i_k} I + Ȳ_{i_k}) ≤ log det(σ²_{i_k} I + diag(Ȳ_{i_k})),

where the first inequality was proved in (34), while the second one is Hadamard's inequality. Both inequalities are satisfied with equality if and only if Ȳ_{i_k} = diag(Ȳ_{i_k}).

Replacing Ȳ_{i_k} = diag(Ȳ_{i_k}) ≜ Diag(y_{i_k}) in (37) and checking that y_{i_k} given by (16) satisfies the KKT system of the resulting convex optimization problem, one gets the closed form expression (16).

C. Augmented Hessian and Gradient Expressions in (19)

In this section, we provide the closed-form expressions for the augmented Hessian matrices and gradients of the updating rules given in (19). More specifically, we have
\[
\nabla_{\boldsymbol{\lambda},\mathrm{vec}(\boldsymbol{\Omega}^*)} D(\boldsymbol{\lambda},\boldsymbol{\Omega};\mathbf{W}^\nu) \triangleq \left[ \nabla_{\boldsymbol{\lambda}} D(\boldsymbol{\lambda},\boldsymbol{\Omega};\mathbf{W}^\nu);\; \mathrm{vec}\!\left( \nabla_{\boldsymbol{\Omega}^*} D(\boldsymbol{\lambda},\boldsymbol{\Omega};\mathbf{W}^\nu) \right) \right]. \tag{38}
\]

Define
\[
\mathbf{W}^n \triangleq \left( \mathbf{R}^\star(\boldsymbol{\lambda}^n;\mathbf{R}^\nu),\; \mathbf{Q}^\star(\boldsymbol{\lambda}^n,\boldsymbol{\Omega}^n;\mathbf{Q}^\nu),\; \mathbf{Y}^\star(\boldsymbol{\lambda}^n,\boldsymbol{\Omega}^n;\mathbf{Y}^\nu) \right)
\]
and
\[
\nabla_{\lambda_{i_k}} D(\boldsymbol{\lambda}^n,\boldsymbol{\Omega}^n;\mathbf{W}^\nu) = h_{i_k}(\mathbf{W};\mathbf{Q}^\nu)\big|_{\mathbf{W}=\mathbf{W}^n},
\qquad
\nabla_{\boldsymbol{\Omega}^*_{i_k}} D(\boldsymbol{\lambda}^n,\boldsymbol{\Omega}^n;\mathbf{W}^\nu) = h_{i_k}(\mathbf{W})\big|_{\mathbf{W}=\mathbf{W}^n},
\]
where
\[
h_{i_k}(\mathbf{W};\mathbf{Q}^\nu) \triangleq \alpha_{i_k} R - R_{i_k}(\mathbf{Q}_{-i_k},\mathbf{Y}_{i_k};\mathbf{Q}^\nu),
\qquad
h_{i_k}(\mathbf{W}) \triangleq \mathbf{Y}_{i_k} - \mathbf{I}_{i_k}(\mathbf{Q}).
\]
Then, we have
\[
\mathbf{G} \triangleq \left[ \left( \mathrm{vec}\!\left( \nabla_{\mathbf{W}^*} h_{i_k}(\mathbf{W}^n;\mathbf{Q}^\nu) \right)^H \right)_{\forall i_k \in \mathcal{I}};\; \left( \nabla_{\mathbf{W}^*} h_{i_k}(\mathbf{W}^n;\mathbf{Q}^\nu) \right)_{\forall i_k \in \mathcal{I}} \right], \tag{39}
\]
with
\[
\nabla^2_{\boldsymbol{\lambda},\mathrm{vec}(\boldsymbol{\Omega}^*)} D(\boldsymbol{\lambda}^n,\boldsymbol{\Omega}^n;\mathbf{W}^\nu) = \mathbf{G} \cdot \mathbf{H}(\mathbf{W}^n;\mathbf{W}^\nu)^{-1} \cdot \mathbf{G}^H,
\]
and
\[
\mathbf{H}(\mathbf{W}^n;\mathbf{W}^\nu) = \mathrm{bdiag}\!\left( \nabla^2_{\mathbf{Q}^*} L,\; \frac{d^2 L}{dR^2},\; \nabla^2_{\mathbf{Y}^*} L \right),
\]


where, for notational simplicity, we set $L = L(\mathbf{W}^n,\boldsymbol{\lambda}^n,\boldsymbol{\Omega}^n;\mathbf{W}^\nu)$.

It follows that
\[
\nabla^2_{\mathbf{Q}^*} L = \mathrm{bdiag}\!\left( \left( \mathbf{H}^Q_{j_m}(\mathbf{W}^n,\boldsymbol{\lambda}^n,\boldsymbol{\Omega}^n;\mathbf{W}^\nu) \right)_{j_m \in \mathcal{I}} \right),
\quad
\nabla^2_{\mathbf{Y}^*} L = \mathrm{bdiag}\!\left( \left( \mathbf{H}^Y_{j_m}(\mathbf{W}^n,\boldsymbol{\lambda}^n,\boldsymbol{\Omega}^n;\mathbf{W}^\nu) \right)_{j_m \in \mathcal{I}} \right),
\quad
\frac{d^2 L}{dR^2} = \tau_R, \tag{40}
\]

with
\[
\mathbf{H}^Q_{j_m}(\mathbf{W}^n,\boldsymbol{\lambda}^n,\boldsymbol{\Omega}^n;\mathbf{W}^\nu) = 2\,\tau_Q \left( \mathbf{I}_{T_k} \otimes \mathbf{I}_{T_k} \right),
\]
\[
\mathbf{H}^Y_{j_m}(\mathbf{W}^n,\boldsymbol{\lambda}^n,\boldsymbol{\Omega}^n;\mathbf{W}^\nu) = 2\,\tau_Y \left( \mathbf{I}_{M_{i_k}} \otimes \mathbf{I}_{M_{i_k}} \right) + \lambda^n_{i_k} \cdot \left[ \left( (\sigma^2_{i_k}\mathbf{I} + \mathbf{Y}^n_{i_k})^{-1} \right)^{\!H} \otimes (\sigma^2_{i_k}\mathbf{I} + \mathbf{Y}^n_{i_k})^{-1} \right]. \tag{41}
\]

Finally, we obtain
\[
\mathrm{vec}\!\left( \nabla_{\mathbf{W}^*} h_{i_k}(\mathbf{W}^n;\mathbf{Q}^\nu) \right) = \left[ \left( \mathbf{M}^{i_k}_{j_l} \right)^H_{j_l \in \mathcal{I}},\; \alpha_{i_k},\; \left( \mathbf{N}^{i_k}_{j_l} \right)^H_{j_l \in \mathcal{I}} \right]^H,
\]
\[
\nabla_{\mathbf{W}^*} h_{i_k}(\mathbf{W}^n) = \left[ \left( \mathbf{F}^{i_k}_{j_l} \right)_{j_l \in \mathcal{I}},\; \mathbf{0}_{M^2_{i_k} \times 1},\; \left( \mathbf{G}^{i_k}_{j_l} \right)_{j_l \in \mathcal{I}} \right],
\]
where
\[
\mathbf{M}^{i_k}_{j_l} =
\begin{cases}
\left( \mathbf{I}_{T_k} \otimes \boldsymbol{\Pi}_{-i_k,j_l} \right) \mathrm{vec}(\mathbf{I}_{T_k}), & \text{if } j_l \neq i_k; \\
\left( \mathbf{0}_{T_k} \otimes \mathbf{0}_{T_k} \right) \mathrm{vec}(\mathbf{0}_{T_k}), & \text{otherwise};
\end{cases}
\]
\[
\mathbf{N}^{i_k}_{j_l} =
\begin{cases}
\left( \mathbf{I}_{M_{i_k}} \otimes (\sigma^2_{i_k}\mathbf{I} + \mathbf{Y}_{i_k})^{-1} \right) \mathrm{vec}(\mathbf{I}_{M_{i_k}}), & \text{if } j_l = i_k; \\
\left( \mathbf{0}_{M_{i_k}} \otimes \mathbf{0}_{M_{i_k}} \right) \mathrm{vec}(\mathbf{0}_{M_{i_k}}), & \text{otherwise};
\end{cases}
\]
\[
\mathbf{F}^{i_k}_{j_l} = -\left( \mathbf{H}^*_{i_k l} \otimes \mathbf{H}_{i_k l} \right), \quad \forall j_l \in \mathcal{I};
\]
\[
\mathbf{G}^{i_k}_{j_l} =
\begin{cases}
\mathbf{I}_{M_{i_k}} \otimes \mathbf{I}_{M_{i_k}}, & \text{if } j_l = i_k; \\
\mathbf{0}_{M_{i_k}} \otimes \mathbf{0}_{M_{i_k}}, & \text{otherwise}.
\end{cases}
\]
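The Kronecker-product blocks in (41) arise from differentiating the log-det term twice. The following sketch (a real symmetric toy case of our own construction, not the paper's complex setting) checks the underlying identity $d\,\mathrm{vec}(\mathbf{M}^{-1}) = -(\mathbf{M}^{-T} \otimes \mathbf{M}^{-1})\,\mathrm{vec}(d\mathbf{M})$ against finite differences:

```python
import numpy as np

# Finite-difference check of the Kronecker identity behind (41):
# the Jacobian of Y -> (s2 I + Y)^{-1} in vec (column-stacking) coordinates
# is -(M^{-T} kron M^{-1}), with M = s2 I + Y; at a symmetric Y this equals
# -(M^{-1} kron M^{-1}), i.e., the Hessian of log det(s2 I + Y).
rng = np.random.default_rng(1)
n, s2, eps = 3, 0.7, 1e-5
A = rng.standard_normal((n, n))
Y = A @ A.T                                   # symmetric base point

inv_map = lambda Z: np.linalg.inv(s2 * np.eye(n) + Z)
M_inv = inv_map(Y)
H_kron = -np.kron(M_inv, M_inv)               # closed-form Jacobian

H_num = np.zeros((n * n, n * n))
for j in range(n * n):
    E = np.zeros(n * n); E[j] = eps
    dY = E.reshape((n, n), order="F")         # perturb the j-th vec-entry
    H_num[:, j] = (inv_map(Y + dY) - inv_map(Y - dY)).ravel(order="F") / (2 * eps)

print(np.allclose(H_num, H_kron, atol=1e-6))  # True
```

The column-stacking (`order="F"`) convention matters here: it is the one under which $\mathrm{vec}(\mathbf{A}\mathbf{X}\mathbf{B}) = (\mathbf{B}^T \otimes \mathbf{A})\,\mathrm{vec}(\mathbf{X})$ holds.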

D. Proof of Theorem 10

The proof of convergence follows readily from Proposition 9 and [3, Th. 2], and thus is omitted; so does the proof that $t^\infty > 0$ when the algorithm does not converge in a finite number of steps. Then, we only need to prove that, if $t^0 > 0$, then $t(\boldsymbol{\lambda}; t^\nu,\boldsymbol{\beta}^\nu) > 0$, for all $\nu \geq 1$ and $\boldsymbol{\lambda} \geq \mathbf{0}$. Because of space limitations, we consider (29) with surrogate $\tilde{g}_i$ given by (26) only.

It is not difficult to see that, given the current iterate $(t^\nu,\boldsymbol{\beta}^\nu,\mathbf{w}^\nu)$, $t(\boldsymbol{\lambda}; t^\nu,\boldsymbol{\beta}^\nu)$ has the following expression:
\[
t(\boldsymbol{\lambda}; t^\nu,\boldsymbol{\beta}^\nu) = \frac{t^\nu + \tau_t \cdot (t^\nu)^2}{\tau_t \cdot t^\nu + \boldsymbol{\lambda}^T \boldsymbol{\beta}^\nu}, \tag{42}
\]
where $\boldsymbol{\lambda}$ and $\boldsymbol{\eta}$ are the (nonnegative) multipliers associated with the constraints $\tilde{g}_{i_k}(t,\beta_{i_k},\mathbf{w}_k;\, t^\nu,\beta^\nu_{i_k},\mathbf{w}^\nu_k) \leq 0$ and (b), respectively. It follows from (42) that, if $t^0 \neq 0$, then $t^1 = t^0 + \gamma^0 \left( t(\boldsymbol{\lambda}; t^0,\boldsymbol{\beta}^0) - t^0 \right) > 0$. Therefore, so is $t^\nu$, for all $\nu \geq 2$.

REFERENCES

[1] G. Scutari, F. Facchinei, L. Lampariello, and P. Song, "Parallel and distributed methods for nonconvex optimization," in Proc. of IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP 14), Florence, Italy, May 4-9 2014.

[2] P. Song, G. Scutari, F. Facchinei, and L. Lampariello, "D3M: Distributed multi-cell multigroup multicasting," in Proc. of IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP 16), Shanghai, China, March 20-25 2016.

[3] G. Scutari, F. Facchinei, L. Lampariello, and P. Song, "Parallel and distributed methods for nonconvex optimization, Part I: Theory," IEEE Trans. Signal Process., (submitted). [Online]. Available: arXiv:1410.4754

[4] K. T. Phan, S. A. Vorobyov, C. Tellambura, and T. Le-Ngoc, "Power control for wireless cellular systems via D.C. programming," in IEEE/SP 14th Workshop on Statistical Signal Processing, 2007, pp. 507–511.

[5] H. Al-Shatri and T. Weber, "Achieving the maximum sum rate using D.C. programming in cellular networks," IEEE Trans. Signal Process., vol. 60, no. 3, pp. 1331–1341, Mar. 2012.

[6] N. Vucic, S. Shi, and M. Schubert, "DC programming approach for resource allocation in wireless networks," in 8th International Symposium on Modeling and Optimization in Mobile, Ad Hoc and Wireless Networks (WiOpt), 2010, pp. 380–386.

[7] M. Chiang, C. W. Tan, D. P. Palomar, D. O'Neill, and D. Julian, "Power control by geometric programming," IEEE Trans. Wireless Commun., vol. 6, no. 7, pp. 2640–2651, Jul. 2007.

[8] A. Khabbazibasmenj, F. Roemer, S. A. Vorobyov, and M. Haardt, "Sum-rate maximization in two-way AF MIMO relaying: Polynomial time solutions to a class of DC programming problems," IEEE Trans. Signal Process., vol. 60, no. 10, pp. 5478–5493, 2012.

[9] Y. Xu, T. Le-Ngoc, and S. Panigrahi, "Global concave minimization for optimal spectrum balancing in multi-user DSL networks," IEEE Trans. Signal Process., vol. 56, no. 7, pp. 2875–2885, Jul. 2008.

[10] R. Cendrillon, J. Huang, M. Chiang, and M. Moonen, "Autonomous spectrum balancing for digital subscriber lines," IEEE Trans. Signal Process., vol. 55, no. 8, pp. 4241–4257, Aug. 2007.

[11] D. Schmidt, C. Shi, R. Berry, M. Honig, and W. Utschick, "Distributed resource allocation schemes: Pricing algorithms for power control and beamformer design in interference networks," IEEE Signal Process. Mag., vol. 26, no. 5, pp. 53–63, Sept. 2009.

[12] S.-J. Kim and G. B. Giannakis, "Optimal resource allocation for MIMO ad hoc cognitive radio networks," IEEE Trans. on Information Theory, vol. 57, no. 5, pp. 3117–3131, May 2011.

[13] G. Scutari, F. Facchinei, P. Song, D. P. Palomar, and J.-S. Pang, "Decomposition by partial linearization: Parallel optimization of multi-agent systems," IEEE Trans. Signal Process., vol. 62, no. 3, pp. 641–656, Feb. 2014. [Online]. Available: http://arxiv.org/abs/1302.0756

[14] R. Zhang and S. Cui, "Cooperative interference management with MISO beamforming," IEEE Trans. Signal Process., vol. 58, no. 10, pp. 5450–5458, Oct. 2010.

[15] R. Mochaourab, P. Cao, and E. Jorswieck, "Alternating rate profile optimization in single stream MIMO interference channels," IEEE Signal Process. Lett., vol. 21, Feb. 2014.

[16] J. Qiu, R. Zhang, Z.-Q. Luo, and S. Cui, "Optimal distributed beamforming for MISO interference channels," IEEE Trans. Signal Process., vol. 59, no. 11, pp. 5638–5643, Nov. 2011.

[17] L. Liu, R. Zhang, and K.-C. Chua, "Achieving global optimality for weighted sum-rate maximization in the K-user Gaussian interference channel with multiple antennas," IEEE Trans. Wireless Commun., vol. 11, no. 5, pp. 1933–1945, May 2012.

[18] M. Razaviyayn, M. Hong, and Z.-Q. Luo, "Linear transceiver design for a MIMO interfering broadcast channel achieving max-min fairness," Signal Processing, vol. 93, pp. 3327–3340, Dec. 2013.

[19] Y. Zhang, E. Dall'Anese, and G. B. Giannakis, "Distributed optimal beamformers for cognitive radios robust to channel uncertainties," IEEE Trans. Signal Process., vol. 60, no. 12, pp. 6495–6508, Dec. 2012.

[20] Y. Yang, G. Scutari, P. Song, and D. P. Palomar, "Robust MIMO cognitive radio systems under interference temperature constraints," IEEE J. Sel. Areas Commun., vol. 31, no. 11, pp. 2465–2482, Nov. 2013.

[21] F. Wang, M. Krunz, and S. Cui, "Price-based spectrum management in cognitive radio networks," IEEE J. Sel. Topics Signal Process., vol. 2, no. 1, pp. 74–87, Feb. 2008.

[22] E. Karipidis, N. D. Sidiropoulos, and Z.-Q. Luo, "Quality of service and max-min fair transmit beamforming to multiple cochannel multicast groups," IEEE Trans. on Signal Process., vol. 56, no. 3, pp. 1268–1279, Mar. 2008.

[23] D. Christopoulos, S. Chatzinotas, and B. Ottersten, "Weighted fair multicast multigroup beamforming under per-antenna power constraints," IEEE Trans. on Signal Process., vol. 62, no. 19, pp. 5132–5142, Oct. 2014.

[24] G.-W. Hsu, H.-H. Wang, H.-J. Su, and P. Lin, "Joint beamforming for multicell multigroup multicast with per-cell power constraints," in 2014 IEEE 25th Annual Int. Symposium on Personal, Indoor, and Mobile Radio Communication (PIMRC), Washington, DC, Sept. 2-5 2014, pp. 527–532.

[25] Z. Xiang, M. Tao, and X. Wang, "Coordinated multicast beamforming in multicell networks," IEEE Trans. on Wireless Commun., vol. 12, no. 1, pp. 12–21, Jan. 2013.

[26] M. Chiang, S. H. Low, A. R. Calderbank, and J. C. Doyle, "Layering as optimization decomposition: A mathematical theory of network architectures," Proc. IEEE, vol. 95, no. 1, pp. 255–312, Jan. 2007.

[27] M. Chiang, P. Hande, T. Lan, and C. W. Tan, Power Control in Wireless Cellular Networks. Foundations and Trends in Networking, Now Publishers, Jul. 2008, vol. 2, no. 4.

[28] D. P. Palomar and M. Chiang, "Alternative distributed algorithms for network utility maximization: Framework and applications," IEEE Trans. on Automatic Control, vol. 52, no. 12, pp. 2254–2269, Dec. 2007.

[29] N. D. Sidiropoulos, T. N. Davidson, and Z.-Q. Luo, "Transmit beamforming for physical-layer multicasting," IEEE Trans. on Signal Process., vol. 54, no. 6, pp. 2239–2251, Jun. 2006.

[30] H. Shen, B. Li, M. Tao, and X. Wang, "MSE-based transceiver designs for the MIMO interference channel," IEEE Trans. Wireless Commun., vol. 9, no. 11, pp. 3480–3489, Sept. 2010.

[31] C.-E. Chen and W.-H. Chung, "An iterative minmax per-stream MSE transceiver design for MIMO interference channel," IEEE Wireless Commun. Lett., vol. 1, no. 3, pp. 229–232, Apr. 2012.

[32] J. Qiu, R. Zhang, Z.-Q. Luo, and S. Cui, "Optimal distributed beamforming for MISO interference channels," IEEE Trans. Signal Process., vol. 59, no. 11, pp. 5638–5643, Jul. 2011.

[33] D. W. H. Cai, T. Q. S. Quek, and C. W. Tan, "A unified analysis of max-min weighted SINR for MIMO downlink system," IEEE Trans. on Signal Process., vol. 59, pp. 3850–3862, Aug. 2011.

[34] G. Scutari, F. Facchinei, J.-S. Pang, and D. P. Palomar, "Real and complex monotone communication games," IEEE Trans. on Information Theory, vol. 60, no. 7, pp. 4197–4231, July 2014.

[35] Q. Shi, M. Razaviyayn, Z.-Q. Luo, and C. He, "An iteratively weighted MMSE approach to distributed sum-utility maximization for a MIMO interfering broadcast channel," IEEE Trans. Signal Process., vol. 59, no. 9, pp. 4331–4340, Sept. 2011.

[36] B. Gopalakrishnan and N. D. Sidiropoulos, "High performance adaptive algorithms for single-group multicast beamforming," IEEE Trans. on Signal Process., vol. 63, no. 16, pp. 4373–4384, Aug. 2015.

[37] E. Karipidis, N. D. Sidiropoulos, and Z.-Q. Luo, "Far-field multicast beamforming for uniform linear antenna arrays," IEEE Trans. on Signal Process., vol. 55, no. 10, pp. 4916–4927, Oct. 2007.

[38] G. Dartmann and G. Ascheid, "Equivalent quasi-convex form of the multicast max-min beamforming problem," IEEE Trans. on Vehicular Technology, vol. 62, no. 9, pp. 4643–4648, Nov. 2013.

[39] Z.-Q. Luo, W.-K. Ma, A. M.-C. So, Y. Ye, and S. Zhang, "Semidefinite relaxation of quadratic optimization problems," IEEE Signal Process. Mag., vol. 27, no. 3, pp. 20–34, May 2010.

[40] T.-H. Chang, Z.-Q. Luo, and C.-Y. Chi, "Approximation bounds for semidefinite relaxation of max-min-fair multicast transmit beamforming problem," IEEE Trans. on Signal Process., vol. 56, no. 8, pp. 3932–3943, Aug. 2008.

[41] A. Schad and M. Pesavento, "Max-min fair transmit beamforming for multi-group multicasting," in 2012 International ITG Workshop on Smart Antennas (WSA), Dresden, Germany, Mar. 7-8 2012, pp. 115–118.

[42] D. Christopoulos, S. Chatzinotas, and B. Ottersten, "Multicast multigroup beamforming for per-antenna power constrained large-scale arrays," arXiv preprint, Mar. 2015.

[43] O. Mehanna, K. Huang, B. Gopalakrishnan, A. Konar, and N. D. Sidiropoulos, "Feasible point pursuit and successive approximation of non-convex QCQPs," IEEE Signal Process. Lett., vol. 22, no. 7, pp. 804–808, Jul. 2015.

[44] E. Larsson, E. Jorswieck, J. Lindblom, and R. Mochaourab, "Game theory and the flat-fading Gaussian interference channel," IEEE Signal Process. Mag., vol. 26, no. 5, pp. 18–27, Sept. 2009.

[45] B. S. Mordukhovich, Variational Analysis and Generalized Differentiation: Basic Theory. Springer, 2006.

[46] R. T. Rockafellar and R. J.-B. Wets, Variational Analysis. Springer, 1998.


E. Proof of Proposition 2

(i): The proof follows readily by inspection of the MFCQ, written for Problem $P_s$, and thus is omitted.

(ii): We prove the statement in a more general setting. Consider the following two optimization problems
\[
\min_{\mathbf{x} \in \mathcal{K}} \; F(\mathbf{x}) \triangleq \max \left\{ f_1(\mathbf{x}), \cdots, f_I(\mathbf{x}) \right\} \tag{43}
\]
and
\[
\min_{\mathbf{x},\, t \leq 0} \; t \quad \text{s.t.} \quad f_i(\mathbf{x}) \leq t,\; \forall i \in \mathcal{I}, \quad \mathbf{x} \in \mathcal{K}, \tag{44}
\]
where each $f_i : \mathcal{K} \to \mathbb{R}_-$ is a nonpositive (possibly nonconvex) differentiable function with Lipschitz gradient on $\mathcal{K}$; $\mathcal{K} \subseteq \mathbb{R}^n$ is a (nonempty) closed and convex set (with nonempty relative interior); and $\mathcal{I} \triangleq \{1, \cdots, I\}$. We also assume that (43) has a solution.

In order to prove Proposition 2(ii), it is sufficient to show the following equivalence between (43) and (44): $\mathbf{x}^\star$ is a d-stationary solution of (43) if and only if there exists a $t^\star$ such that $(\mathbf{x}^\star, t^\star)$ is a regular stationary point of (44).

We first introduce some preliminary results. Denoting by $\partial F(\mathbf{x})$ the set of subgradients of $F$ at $\mathbf{x}$ (see [45, Definition 1.77]), we recall that, since each $f_i$ is continuously differentiable, we can write [46, Corollary 10.51]
\[
\partial F(\mathbf{x}) = \mathrm{con} \left\{ \nabla f_i(\mathbf{x}) : i \in \mathcal{I}(\mathbf{x}) \right\},
\]
where $\mathcal{I}(\mathbf{x}) \triangleq \{ i \in \mathcal{I} : F(\mathbf{x}) = f_i(\mathbf{x}) \}$ and $\mathrm{con}(A)$ denotes the convex hull of the set $A$. Moreover, invoking again [46, Corollary 10.51] and thanks to [46, Theorem 9.16], the directional derivative of $F$ at $\mathbf{x}$ in the direction $\mathbf{d}$, $F'(\mathbf{x};\mathbf{d})$, exists for every $\mathbf{x} \in \mathcal{K}$ and $\mathbf{d} \in \mathbb{R}^n$, and it is given by
\[
F'(\mathbf{x};\mathbf{d}) = \max \left\{ \langle \boldsymbol{\xi}, \mathbf{d} \rangle : \boldsymbol{\xi} \in \partial F(\mathbf{x}) \right\}. \tag{45}
\]

Using the above definitions, if $\mathbf{x}^\star$ is a d-stationary solution of (43), there exists $\boldsymbol{\alpha}^\star \in \mathbb{R}^I_+$ such that the following holds:
\[
\begin{aligned}
& \sum_{i \in \mathcal{I}} \alpha^\star_i \nabla f_i(\mathbf{x}^\star)^T (\mathbf{x} - \mathbf{x}^\star) \geq 0, \quad \forall \mathbf{x} \in \mathcal{K}, \\
& \sum_{i \in \mathcal{I}} \alpha^\star_i = 1, \\
& \alpha^\star_i \geq 0, \quad \forall i \in \mathcal{I}(\mathbf{x}^\star), \\
& \alpha^\star_i = 0, \quad \forall i \in \mathcal{I} \setminus \mathcal{I}(\mathbf{x}^\star).
\end{aligned}
\tag{46}
\]

On the other hand, if $(\bar{\mathbf{x}}, \bar{t})$ is regular stationary for (44), there exists $\bar{\boldsymbol{\lambda}} \in \mathbb{R}^I_+$ such that
\[
\begin{aligned}
& \sum_{i \in \mathcal{I}} \bar{\lambda}_i \nabla f_i(\bar{\mathbf{x}})^T (\mathbf{x} - \bar{\mathbf{x}}) \geq 0, \quad \forall \mathbf{x} \in \mathcal{K}, \\
& 0 \leq \sum_{i \in \mathcal{I}} \bar{\lambda}_i - 1 \;\perp\; \bar{t} \leq 0, \\
& 0 \leq \bar{\lambda}_i \;\perp\; f_i(\bar{\mathbf{x}}) - \bar{t} \leq 0, \quad \forall i \in \mathcal{I}.
\end{aligned}
\tag{47}
\]

We are now ready to prove the desired result.

(⇒) Let $\mathbf{x}^\star$ be a d-stationary point of (43). By taking $(\bar{\mathbf{x}}, \bar{t}, \bar{\boldsymbol{\lambda}}) \triangleq (\mathbf{x}^\star, F(\mathbf{x}^\star), \boldsymbol{\alpha}^\star)$ and using (46), it is not difficult to check that the tuple $(\bar{\mathbf{x}}, \bar{t}, \bar{\boldsymbol{\lambda}})$ satisfies the KKT conditions (47). Therefore, $(\bar{\mathbf{x}}, \bar{t})$ is stationary for (44).

(⇐) Let $(\bar{\mathbf{x}}, \bar{t})$ be a regular stationary point of (44). Then, there exists $\bar{\boldsymbol{\lambda}} \in \mathbb{R}^I_+$ such that $(\bar{\mathbf{x}}, \bar{t}, \bar{\boldsymbol{\lambda}})$ satisfies the KKT conditions (47). Since $\sum_{i \in \mathcal{I}} \bar{\lambda}_i > 0$, it must be $\bar{t} = f_{i_0}(\bar{\mathbf{x}})$, for some $i_0 \in \mathcal{I}$. Therefore, $\bar{t} = F(\bar{\mathbf{x}})$. Furthermore, since $\mathcal{I}(\bar{\mathbf{x}}) = \{ i \in \mathcal{I} : \bar{t} = f_i(\bar{\mathbf{x}}) \}$, we have $\bar{\lambda}_j = 0$ for every $j \in \mathcal{I} \setminus \mathcal{I}(\bar{\mathbf{x}})$. Define $\tilde{\lambda}_i \triangleq \bar{\lambda}_i / \lambda_{\mathrm{ave}}$, for all $i \in \mathcal{I}$, where $\lambda_{\mathrm{ave}} \triangleq \sum_{i \in \mathcal{I}(\bar{\mathbf{x}})} \bar{\lambda}_i > 0$. It follows that $\sum_{i \in \mathcal{I}(\bar{\mathbf{x}})} \tilde{\lambda}_i = 1$, and
\[
\sum_{i \in \mathcal{I}} \tilde{\lambda}_i \nabla f_i(\bar{\mathbf{x}}) = \sum_{i \in \mathcal{I}(\bar{\mathbf{x}})} \tilde{\lambda}_i \nabla f_i(\bar{\mathbf{x}}) \in \partial F(\bar{\mathbf{x}}).
\]
Therefore, for every $\mathbf{x} \in \mathcal{K}$,
\[
\begin{aligned}
F'(\bar{\mathbf{x}}; \mathbf{x} - \bar{\mathbf{x}}) &= \max \left\{ \langle \boldsymbol{\xi}, \mathbf{x} - \bar{\mathbf{x}} \rangle : \boldsymbol{\xi} \in \partial F(\bar{\mathbf{x}}) \right\} \\
&\geq \sum_{i \in \mathcal{I}(\bar{\mathbf{x}})} \tilde{\lambda}_i \nabla f_i(\bar{\mathbf{x}})^T (\mathbf{x} - \bar{\mathbf{x}}) \\
&= \left( 1/\lambda_{\mathrm{ave}} \right) \cdot \sum_{i \in \mathcal{I}} \bar{\lambda}_i \nabla f_i(\bar{\mathbf{x}})^T (\mathbf{x} - \bar{\mathbf{x}}) \geq 0,
\end{aligned}
\]
where the last inequality follows from (47) and $\lambda_{\mathrm{ave}} > 0$. Thus, $\bar{\mathbf{x}}$ is d-stationary for (43). This completes the proof of statement (ii).
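The equivalence just established can be sanity-checked numerically. The toy instance below (the functions, the set $\mathcal{K}$, and the solvers are our illustrative choices; the functions are nonpositive on $\mathcal{K}$, as the statement requires) solves a two-function problem both in the min-max form (43) and in the epigraph form (44):

```python
from scipy.optimize import minimize

# Two nonpositive quadratics on K = [-2, 2]; min_x max(f1, f2) is attained
# at x = 0 with optimal value -13.
f1 = lambda x: (x[0] - 1.0) ** 2 - 14.0
f2 = lambda x: (x[0] + 1.0) ** 2 - 14.0

# Form (43): direct minimization of the (nonsmooth) pointwise maximum.
res_max = minimize(lambda x: max(f1(x), f2(x)), x0=[0.3],
                   bounds=[(-2.0, 2.0)], method="Powell")

# Form (44): epigraph variables (x, t), t <= 0, smooth constraints f_i(x) <= t.
res_epi = minimize(lambda z: z[1], x0=[0.3, 0.0],
                   constraints=[{"type": "ineq", "fun": lambda z: z[1] - f1(z[:1])},
                                {"type": "ineq", "fun": lambda z: z[1] - f2(z[:1])}],
                   bounds=[(-2.0, 2.0), (None, 0.0)], method="SLSQP")

print(round(float(res_max.fun), 3), round(float(res_epi.fun), 3))  # both near -13.0
```

Note that the epigraph form trades the nonsmooth objective for smooth constraints, which is exactly what allows gradient-based KKT arguments (and solvers such as SLSQP) to apply.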

F. Proof of Proposition 9

1) Preliminaries: Let us start by introducing some intermediate results needed to prove the proposition. For notational simplicity, define $y_{i_k}(\mathbf{w}) \triangleq \sum_{\ell \neq k} \mathbf{w}^H_\ell \mathbf{H}_{i_k \ell}\, \mathbf{w}_\ell + \sigma^2_{i_k}$, for all $i_k \in \mathcal{G}_k$ and $k \in \mathcal{K}_{\mathrm{BS}}$.

Following the same arguments as in Appendix E [cf. (46)], if $\mathbf{w}^\star$ is a d-stationary solution of (21), then there exists $\boldsymbol{\alpha}^\star \in \mathbb{R}^I_+$ such that $(\mathbf{w}^\star, \boldsymbol{\alpha}^\star)$ satisfies
\[
\begin{aligned}
\text{(a)}: \;& \sum_{k \in \mathcal{K}_{\mathrm{BS}}} \sum_{i_k \in \mathcal{G}_k} \alpha^\star_{i_k} \langle \nabla_{\mathbf{w}^*} u_{i_k}(\mathbf{w}^\star),\, \mathbf{w} - \mathbf{w}^\star \rangle \leq 0, \quad \forall \mathbf{w} \in \mathcal{W}, \\
\text{(b)}: \;& \sum_{k \in \mathcal{K}_{\mathrm{BS}}} \sum_{i_k \in \mathcal{G}_k} \alpha^\star_{i_k} = 1, \\
\text{(c)}: \;& \alpha^\star_{i_k} \geq 0, \quad \forall i_k \in \mathcal{I}(\mathbf{w}^\star), \\
\text{(d)}: \;& \alpha^\star_{i_k} = 0, \quad \forall i_k \notin \mathcal{I}(\mathbf{w}^\star),
\end{aligned}
\tag{48}
\]
where $u_{i_k}(\mathbf{w}) \triangleq \mathbf{w}^H_k \mathbf{H}_{i_k k} \mathbf{w}_k / y_{i_k}(\mathbf{w})$, $\nabla_{\mathbf{w}^*} u_{i_k}(\mathbf{w}) = (\nabla_{\mathbf{w}^*_\ell} u_{i_k}(\mathbf{w}))_{\ell \in \mathcal{K}_{\mathrm{BS}}}$, with
\[
\nabla_{\mathbf{w}^*_\ell} u_{i_k}(\mathbf{w}) \triangleq
\begin{cases}
\dfrac{1}{y_{i_k}(\mathbf{w})}\, \mathbf{H}_{i_k k} \mathbf{w}_k, & \ell = k; \\[2mm]
-\dfrac{\mathbf{w}^H_k \mathbf{H}_{i_k k} \mathbf{w}_k}{\left( y_{i_k}(\mathbf{w}) \right)^2}\, \nabla_{\mathbf{w}^*_\ell} y_{i_k}(\mathbf{w}), & \ell \neq k;
\end{cases}
\]
and $\mathcal{I}(\mathbf{w}^\star) \triangleq \{ i_k : U(\mathbf{w}^\star) = u_{i_k}(\mathbf{w}^\star) \}$, with $U$ defined in (21).

On the other hand, if $(\bar{t}, \bar{\boldsymbol{\beta}}, \bar{\mathbf{w}})$ is a regular stationary solution of (22), there exist multipliers $(\bar{\boldsymbol{\lambda}}, \bar{\boldsymbol{\eta}}, \bar{\boldsymbol{\rho}}) = ((\bar{\lambda}_{i_k}, \bar{\eta}_{i_k}, \bar{\rho}_{i_k})_{i_k \in \mathcal{G}_k})_{k \in \mathcal{K}_{\mathrm{BS}}}$ and $\bar{\zeta}$ such that the following KKT conditions (referred, to be precise, to problem (22) where both sides of constraints (a) are divided by $\beta_{i_k} > 0$) are satisfied:
\[
\begin{aligned}
\text{(a$'$)}: \;& \sum_{k \in \mathcal{K}_{\mathrm{BS}}} \sum_{i_k \in \mathcal{G}_k} \left\langle \nabla_{\mathbf{w}^*}\!\left( \frac{\bar{\lambda}_{i_k}}{\bar{\beta}_{i_k}}\, \mathbf{w}^H_k \mathbf{H}_{i_k k} \mathbf{w}_k - \bar{\eta}_{i_k}\, y_{i_k}(\mathbf{w}) \right),\, \mathbf{w} - \bar{\mathbf{w}} \right\rangle \leq 0, \quad \forall \mathbf{w} \in \mathcal{W}, \\
\text{(b$'$)}: \;& \sum_{k \in \mathcal{K}_{\mathrm{BS}}} \sum_{i_k \in \mathcal{G}_k} \bar{\lambda}_{i_k} = 1 + \bar{\zeta}, \\
\text{(c$'$)}: \;& \bar{\eta}_{i_k} = \frac{\bar{\lambda}_{i_k}\, \bar{\mathbf{w}}^H_k \mathbf{H}_{i_k k} \bar{\mathbf{w}}_k}{\bar{\beta}^2_{i_k}} + \bar{\rho}_{i_k}, \quad \forall i_k \in \mathcal{G}_k,\; k \in \mathcal{K}_{\mathrm{BS}}, \\
\text{(d$'$)}: \;& 0 \leq \bar{\lambda}_{i_k} \;\perp\; \left( \bar{t} - \frac{\bar{\mathbf{w}}^H_k \mathbf{H}_{i_k k} \bar{\mathbf{w}}_k}{\bar{\beta}_{i_k}} \right) \leq 0, \quad \forall i_k \in \mathcal{G}_k,\; k \in \mathcal{K}_{\mathrm{BS}}, \\
\text{(e$'$)}: \;& 0 \leq \bar{\eta}_{i_k} \;\perp\; \left( y_{i_k}(\bar{\mathbf{w}}) - \bar{\beta}_{i_k} \right) \leq 0, \quad \forall i_k \in \mathcal{G}_k,\; k \in \mathcal{K}_{\mathrm{BS}}, \\
\text{(f$'$)}: \;& 0 \leq \bar{\rho}_{i_k} \;\perp\; \left( \bar{\beta}_{i_k} - \beta^{\max}_{i_k} \right) \leq 0, \quad \forall i_k \in \mathcal{G}_k,\; k \in \mathcal{K}_{\mathrm{BS}}, \\
\text{(g$'$)}: \;& 0 \leq \bar{\zeta} \;\perp\; \bar{t} \geq 0,
\end{aligned}
\tag{49}
\]

where, with a slight abuse of notation, we denoted by $\nabla_{\mathbf{w}^*}(\mathbf{w}^H_k \mathbf{H}_{i_k k} \mathbf{w}_k)$ and $\nabla_{\mathbf{w}^*}(y_{i_k}(\mathbf{w}))$ the conjugate gradients of $\mathbf{w}^H_k \mathbf{H}_{i_k k} \mathbf{w}_k$ and $y_{i_k}(\mathbf{w})$, respectively, evaluated at $\mathbf{w} = \bar{\mathbf{w}}$.

Note that it must always be $\bar{\boldsymbol{\rho}} = \mathbf{0}$, as shown next. Suppose that there exists an $i_k$ such that $\bar{\rho}_{i_k} > 0$. Then, invoking the complementarity condition in (f$'$), we have $\bar{\beta}_{i_k} = \beta^{\max}_{i_k}$. Also, by the definition of $\beta^{\max}_{i_k}$, it must be $\bar{\beta}_{i_k} > y_{i_k}(\bar{\mathbf{w}}) > 0$, and thus $\bar{\eta}_{i_k} = 0$ [by complementarity in (e$'$)]. It follows from (c$'$) that $-\bar{\rho}_{i_k} = \bar{\lambda}_{i_k}\, \bar{\mathbf{w}}^H_k \mathbf{H}_{i_k k} \bar{\mathbf{w}}_k / \bar{\beta}^2_{i_k} < 0$, which contradicts the fact that $\bar{\lambda}_{i_k}\, \bar{\mathbf{w}}^H_k \mathbf{H}_{i_k k} \bar{\mathbf{w}}_k / \bar{\beta}^2_{i_k} \geq 0$ (recall that $\mathbf{H}_{i_k k}$ is a positive semidefinite matrix).

2) Proof of Proposition 9: We prove only statement (ii).

(⇒): Let $\mathbf{w}^\star$ be a d-stationary solution of (21). Partition the set of users $\mathcal{I}$ according to $\mathcal{I} = \mathcal{I}_a(\mathbf{w}^\star) \cup \bar{\mathcal{I}}_a(\mathbf{w}^\star)$, where $\mathcal{I}_a(\mathbf{w}^\star)$ is the set of active users at $\mathbf{w}^\star$ (i.e., the users served by a BS), defined as
\[
\mathcal{I}_a(\mathbf{w}^\star) \triangleq \left\{ i_k : \mathbf{H}_{i_k k} \mathbf{w}^\star_k \neq \mathbf{0} \right\}, \tag{50}
\]
and $\bar{\mathcal{I}}_a(\mathbf{w}^\star)$ is its complement. Note that, since all $\mathbf{H}_{i_k k}$ are nonzero and positive semidefinite, there must exist a feasible $\mathbf{w} = (\mathbf{w}_k)_{k \in \mathcal{K}_{\mathrm{BS}}}$ such that $\mathbf{H}_{i_k k} \mathbf{w}_k \neq \mathbf{0}$ for some $i_k \in \mathcal{G}_k$ and $k \in \mathcal{K}_{\mathrm{BS}}$. We distinguish the following two complementary cases: either 1) $\bar{\mathcal{I}}_a(\mathbf{w}^\star) \neq \emptyset$; or 2) $\bar{\mathcal{I}}_a(\mathbf{w}^\star) = \emptyset$. The former is a degenerate (undesired) case: the d-stationary point $\mathbf{w}^\star$ is a global minimizer of $U$, since $U(\mathbf{w}^\star) = 0$. The latter case, implying $U(\mathbf{w}^\star) > 0$, is instead the interesting one, to which the global optimal solution of (21) belongs.

Case 1: $\bar{\mathcal{I}}_a(\mathbf{w}^\star) \neq \emptyset$. Choose $(\bar{t}, \bar{\boldsymbol{\beta}}, \bar{\mathbf{w}}, \bar{\boldsymbol{\lambda}}, \bar{\boldsymbol{\eta}}, \bar{\boldsymbol{\rho}}, \bar{\zeta})$ as follows: $\bar{t} = 0$; $\bar{\boldsymbol{\beta}}$ is any vector $\bar{\boldsymbol{\beta}} \triangleq ((\bar{\beta}_{i_k})_{i_k \in \mathcal{G}_k})_{k \in \mathcal{K}_{\mathrm{BS}}}$ satisfying the feasibility conditions in (e$'$) and (f$'$), that is, $y_{i_k}(\mathbf{w}^\star) \leq \bar{\beta}_{i_k} \leq \beta^{\max}_{i_k}$, for all $i_k \in \mathcal{G}_k$ and $k \in \mathcal{K}_{\mathrm{BS}}$; $\bar{\mathbf{w}} = \mathbf{w}^\star$; $\bar{\boldsymbol{\lambda}} = ((\alpha^\star_{i_k})_{i_k \in \mathcal{G}_k})_{k \in \mathcal{K}_{\mathrm{BS}}}$; $\bar{\boldsymbol{\eta}} = \bar{\boldsymbol{\rho}} = \mathbf{0}$; and $\bar{\zeta} = 0$. We show next that such a tuple satisfies the KKT conditions (49).

Observe preliminarily that $u_{i_k}(\mathbf{w}^\star) = 0$ if and only if $i_k \in \bar{\mathcal{I}}_a(\mathbf{w}^\star)$, implying $U(\mathbf{w}^\star) = u_{i_k}(\mathbf{w}^\star) = 0$, for all $i_k \in \bar{\mathcal{I}}_a(\mathbf{w}^\star)$ [recall that all $u_{i_k}(\mathbf{w}^\star) \geq 0$]; therefore, it must be
\[
\mathcal{I}(\mathbf{w}^\star) \equiv \bar{\mathcal{I}}_a(\mathbf{w}^\star). \tag{51}
\]

Invoking now (d) in (48) and using (51), we have $\bar{\lambda}_{i_k} = 0$, for all $i_k \in \mathcal{I}_a(\mathbf{w}^\star)$. It follows that
\[
\bar{\lambda}_{i_k} \cdot \nabla_{\mathbf{w}^*}\!\left( \mathbf{w}^H_k \mathbf{H}_{i_k k} \mathbf{w}_k \right) = \mathbf{0}, \quad \forall i_k \in \mathcal{G}_k,\; \forall k \in \mathcal{K}_{\mathrm{BS}},
\]
which, together with $\bar{\boldsymbol{\eta}} = \mathbf{0}$, makes (a$'$) in (49) satisfied, with the LHS equal to zero.

Condition (b$'$) follows readily from $\sum_{k \in \mathcal{K}_{\mathrm{BS}}} \sum_{i_k \in \mathcal{G}_k} \bar{\lambda}_{i_k} = 1$ [cf. (b) in (48)] and $\bar{\zeta} = 0$.

Condition (c$'$) can be checked observing that $\bar{\lambda}_{i_k} \cdot \bar{\mathbf{w}}^H_k \mathbf{H}_{i_k k} \bar{\mathbf{w}}_k = 0$, for all $i_k \in \mathcal{G}_k$ and $k \in \mathcal{K}_{\mathrm{BS}}$.

Conditions (d$'$)-(g$'$) follow readily by inspection.

Case 2: $\bar{\mathcal{I}}_a(\mathbf{w}^\star) = \emptyset$. Choose now $(\bar{t}, \bar{\boldsymbol{\beta}}, \bar{\mathbf{w}}, \bar{\boldsymbol{\lambda}}, \bar{\boldsymbol{\eta}}, \bar{\boldsymbol{\rho}}, \bar{\zeta})$ as follows: $\bar{t} = U(\mathbf{w}^\star)$; $\bar{\boldsymbol{\beta}} = (y_{i_k}(\mathbf{w}^\star))_{i_k \in \mathcal{G}_k, k \in \mathcal{K}_{\mathrm{BS}}}$; $\bar{\mathbf{w}} = \mathbf{w}^\star$; $\bar{\boldsymbol{\lambda}} = ((\alpha^\star_{i_k})_{i_k \in \mathcal{G}_k})_{k \in \mathcal{K}_{\mathrm{BS}}}$; $\bar{\boldsymbol{\eta}} \triangleq (\bar{\eta}_{i_k})_{i_k \in \mathcal{G}_k, k \in \mathcal{K}_{\mathrm{BS}}}$, with each $\bar{\eta}_{i_k} = \bar{\lambda}_{i_k} \cdot u_{i_k}(\mathbf{w}^\star)/y_{i_k}(\mathbf{w}^\star)$; $\bar{\boldsymbol{\rho}} = \mathbf{0}$; and $\bar{\zeta} = 0$.

By substitution, one can see that
\[
\begin{aligned}
& \sum_{k \in \mathcal{K}_{\mathrm{BS}}} \sum_{i_k \in \mathcal{G}_k} \left\langle \nabla_{\mathbf{w}^*}\!\left( \frac{\bar{\lambda}_{i_k}}{\bar{\beta}_{i_k}}\, \mathbf{w}^H_k \mathbf{H}_{i_k k} \mathbf{w}_k - \bar{\eta}_{i_k}\, y_{i_k}(\mathbf{w}) \right),\, \mathbf{w} - \bar{\mathbf{w}} \right\rangle \\
& \qquad = \sum_{k \in \mathcal{K}_{\mathrm{BS}}} \sum_{i_k \in \mathcal{G}_k} \alpha^\star_{i_k} \langle \nabla_{\mathbf{w}^*} u_{i_k}(\mathbf{w}^\star),\, \mathbf{w} - \bar{\mathbf{w}} \rangle \leq 0, \quad \forall \mathbf{w} \in \mathcal{W},
\end{aligned}
\]
where the last inequality follows from (a) in (48); therefore, (a$'$) in (49) is satisfied.

Condition (b$'$) follows readily from $\sum_{k \in \mathcal{K}_{\mathrm{BS}}} \sum_{i_k \in \mathcal{G}_k} \bar{\lambda}_{i_k} = 1$ [cf. (b) in (48)] and $\bar{\zeta} = 0$.

Condition (c$'$) follows by inspection.

Condition (d$'$) can be checked as follows. Since by (b$'$) there exists a $\bar{\lambda}_{i_k} > 0$, it must be $\bar{t} = \bar{\mathbf{w}}^H_k \mathbf{H}_{i_k k} \bar{\mathbf{w}}_k / \bar{\beta}_{i_k}$, which, together with $\bar{\beta}_{i_k} = y_{i_k}(\mathbf{w}^\star)$, can be rewritten as $\bar{t} = U(\mathbf{w}^\star)$.

Finally, conditions (e$'$)-(g$'$) follow by inspection.

(⇐): Let us now prove the converse. Let $(\bar{t}, \bar{\boldsymbol{\beta}}, \bar{\mathbf{w}})$ be a regular stationary point of (22). One can show that, if $\bar{\mathcal{I}}_a(\bar{\mathbf{w}}) \neq \emptyset$, then $\bar{\mathbf{w}}$ is a d-stationary solution of (21) with $U'(\bar{\mathbf{w}}; \mathbf{w} - \bar{\mathbf{w}}) = 0$, for all $\mathbf{w} \in \mathcal{W}$; we omit further details.

Let us now consider the nondegenerate case $\bar{\mathcal{I}}_a(\bar{\mathbf{w}}) = \emptyset$. Let $(\bar{\boldsymbol{\lambda}}, \bar{\boldsymbol{\eta}}, \bar{\boldsymbol{\rho}}, \bar{\zeta})$ be the multipliers such that $(\bar{t}, \bar{\boldsymbol{\beta}}, \bar{\mathbf{w}}, \bar{\boldsymbol{\lambda}}, \bar{\boldsymbol{\eta}}, \bar{\boldsymbol{\rho}}, \bar{\zeta})$ satisfies (49). It follows from (b$'$) that there exists a $\bar{\lambda}_{i_k} > 0$. Then, we have the following chain of implications (each due to the condition reported on top of the arrow):
\[
\bar{\lambda}_{i_k} > 0 \;\overset{\text{(d$'$)}}{\Longrightarrow}\; \bar{t} = \frac{\bar{\mathbf{w}}^H_k \mathbf{H}_{i_k k} \bar{\mathbf{w}}_k}{\bar{\beta}_{i_k}} > 0 \;\overset{\text{(c$'$)}}{\Longrightarrow}\; \bar{\eta}_{i_k} > 0 \;\overset{\text{(e$'$)}}{\Longrightarrow}\; \bar{\beta}_{i_k} = y_{i_k}(\bar{\mathbf{w}}) > 0 \;\overset{\text{(d$'$)}}{\Longrightarrow}\; \bar{t} = u_{i_k}(\bar{\mathbf{w}}). \tag{52}
\]

Denoting $\mathcal{I}(\bar{\mathbf{w}}) \triangleq \{ i_k : \bar{t} = u_{i_k}(\bar{\mathbf{w}}) \}$ and $\alpha^\star_{i_k} \triangleq \frac{\bar{\lambda}_{i_k}}{1 + \bar{\zeta}}$, for all $i_k \in \mathcal{G}_k$ and $k \in \mathcal{K}_{\mathrm{BS}}$, it follows from (52) and (b$'$) that
\[
\begin{aligned}
& \sum_{k \in \mathcal{K}_{\mathrm{BS}}} \sum_{i_k \in \mathcal{G}_k} \alpha^\star_{i_k} = 1, \\
& \alpha^\star_{i_k} \geq 0, \quad \forall i_k \in \mathcal{I}(\bar{\mathbf{w}}), \\
& \alpha^\star_{i_k} = 0, \quad \forall i_k \notin \mathcal{I}(\bar{\mathbf{w}}).
\end{aligned}
\tag{53}
\]

Using (53), we finally have
\[
\sum_{k \in \mathcal{K}_{\mathrm{BS}}} \sum_{i_k \in \mathcal{G}_k} \alpha^\star_{i_k} \nabla_{\mathbf{w}^*}\!\left( -u_{i_k}(\bar{\mathbf{w}}) \right) = -\sum_{i_k \in \mathcal{I}(\bar{\mathbf{w}})} \alpha^\star_{i_k} \nabla_{\mathbf{w}^*} u_{i_k}(\bar{\mathbf{w}}) \in \partial\!\left( -U(\bar{\mathbf{w}}) \right),
\]


where $\partial(-U(\bar{\mathbf{w}}))$ is the set of (conjugate) subgradients of $-U$ at $\bar{\mathbf{w}}$. For every $\mathbf{w} \in \mathcal{W}$,
\[
\begin{aligned}
-U'(\bar{\mathbf{w}}; \mathbf{w} - \bar{\mathbf{w}}) &= \max \left\{ \langle \boldsymbol{\xi}, \mathbf{w} - \bar{\mathbf{w}} \rangle : \boldsymbol{\xi} \in \partial(-U(\bar{\mathbf{w}})) \right\} \\
&\geq \left\langle -\sum_{i_k \in \mathcal{I}(\bar{\mathbf{w}})} \alpha^\star_{i_k} \nabla_{\mathbf{w}^*} u_{i_k}(\bar{\mathbf{w}}),\; \mathbf{w} - \bar{\mathbf{w}} \right\rangle \\
&= \frac{-1}{1 + \bar{\zeta}} \cdot \sum_{k \in \mathcal{K}_{\mathrm{BS}}} \sum_{i_k \in \mathcal{G}_k} \left\langle \nabla_{\mathbf{w}^*}\!\left( \frac{\bar{\lambda}_{i_k}}{\bar{\beta}_{i_k}}\, \mathbf{w}^H_k \mathbf{H}_{i_k k} \mathbf{w}_k - \bar{\eta}_{i_k}\, y_{i_k}(\mathbf{w}) \right),\; \mathbf{w} - \bar{\mathbf{w}} \right\rangle \geq 0,
\end{aligned}
\]
where the last inequality is due to (a$'$). This proves that $\bar{\mathbf{w}}$ is d-stationary for (21).