
Differentially Private Distributed Convex Optimization via Functional Perturbation

Erfan Nozari, Pavankumar Tallapragada, Jorge Cortés

Abstract—We study a class of distributed convex constrained optimization problems where a group of agents aim to minimize the sum of individual objective functions while each desires to keep its function differentially private. We prove the impossibility of achieving differential privacy using strategies based on perturbing the inter-agent messages with noise when the underlying noise-free dynamics is asymptotically stable. This justifies our algorithmic solution based on the perturbation of individual functions with Laplace noise. To this end, we establish a general framework for differentially private handling of functional data. We further design post-processing steps that ensure the perturbed functions regain the smoothness and convexity properties of the original functions while preserving the differentially private guarantees of the functional perturbation step. This methodology allows us to use any distributed coordination algorithm to solve the optimization problem on the noisy functions. Finally, we explicitly bound the magnitude of the expected distance between the perturbed and true optimizers and characterize the privacy-accuracy trade-off. Simulations illustrate our results.

Index Terms—Networks of Autonomous Agents, Optimization, Distributed Algorithms/Control, Differential Privacy, Cyber-Physical Systems

I. INTRODUCTION

Privacy preservation is an increasingly critical issue that plays a key role in preventing catastrophic failures in physical infrastructure as well as easing the social adoption of new technology. Power networks, manufacturing systems, and smart transportation are just a few examples of cyberphysical applications in need of privacy-aware design of control and coordination strategies. In these scenarios, the problem of optimizing the operation of a group of networked resources is a common and important task, where the individual objective functions associated to the entities, the estimates of the optimizer, or even the constraints on the optimization might reveal sensitive information. Our work here is motivated by the goal of synthesizing distributed coordination algorithms that accurately solve networked optimization problems with privacy guarantees.

Literature review: Our work builds upon the existing literature of distributed convex optimization and differential privacy. In the area of networked systems, an increasing body of research, e.g., [1]–[5] and references therein, designs and analyzes algorithms for the problem of distributed convex optimization both in discrete and continuous time as well as in deterministic and stochastic scenarios. While these works

A preliminary version of this paper has been submitted to the 2016 American Control Conference, Boston, MA.

The authors are with the Department of Mechanical and Aerospace Engineering, University of California, San Diego, {enozari,ptallapragada,cortes}@ucsd.edu

consider an ambitious suite of topics related to convergence and performance under various constraints imposed by real-world applications, privacy is an aspect generally absent in their treatment. The concept of differential privacy [6], [7] was originally proposed for databases of individual records subject to public queries and has been extended to several areas thereafter. The recent work [8] provides a comprehensive recent account of this area. In machine learning, the problem of differentially private optimization has received attention, see e.g. [9]–[14], as an intermediate, usually centralized, step for solving other learning or statistical tasks. The common paradigm is having the sensitive information correspond to the entries of a finite database of records or training data that usually constitute the parameters of an objective function. In general, the proposed algorithms are not distributed and neither designed for nor capable of preserving the privacy of infinite-dimensional objective functions. Furthermore, the work in this area does not study the effect of added noise on the global optimizer or on the smoothness and convexity properties of the objective functions. Of more relevance to our paper are recent works [15]–[17] that study differentially private distributed optimization problems for multi-agent systems. These papers consider as private information, respectively, the objective functions, the optimization constraints, and the agents' states. The underlying commonality is the algorithm design approach based on the idea of message perturbation. This idea consists of adopting a standard distributed optimization algorithm and modifying it by having agents perturb the messages to their neighbors or a central coordinator with Laplace or Gaussian noise. This approach has the advantage of working with the original objective functions and thus is easy to implement. However, for fixed design parameters, the algorithm's output does not correspond to the true optimizer in the absence of noise, suggesting the presence of a steady-state accuracy error. The work [16] addresses this problem by adaptively choosing the design parameters based on the desired level of privacy. Nevertheless, for any fixed level of privacy, there exists an amount of bias in the algorithm's output which is not due to the added noise but to the lack of asymptotic stability of the underlying noiseless dynamics. To address this issue, our approach explores the use of functional perturbation to achieve differential privacy. The concept of functional differential privacy combines the benefits of metrics and adjacency relations. The work [18] also employs metrics instead of binary adjacency relations in the context of differential privacy. This approach has the advantage that the difference between the probabilities of events corresponding to any pair of data sets is bounded by a function of the distance between the data sets, eliminating the need for the computation of conservative sensitivity bounds.

Statement of contributions: We consider a group of agents that seek to minimize the sum of their individual objective functions over a communication network in a differentially private manner. Our first contribution is to show that coordination algorithms which rely on perturbing the agents' messages with noise cannot satisfy the requirements of differential privacy if the underlying noiseless dynamics is locally asymptotically stable. The presence of noise necessary to ensure differential privacy is known to affect the algorithm accuracy in solving the distributed convex optimization problem. However, this result explains why message-perturbing strategies incur additional inaccuracies that are present even if no noise is added. Our second contribution is motivated by the goal of guaranteeing that the algorithm accuracy is only affected by the presence of noise. We propose a general framework for functional differential privacy over Hilbert spaces and introduce a novel definition of adjacency using adjacency spaces. The latter notion is quite flexible and includes, as a special case, the conventional bounded-difference notion of adjacency. We carefully specify these adjacency spaces within the L2 space such that the requirement of differential privacy can be satisfied with bounded perturbations. Our third contribution builds on these results on functional perturbation to design a class of distributed, differentially private coordination algorithms. We let each agent perturb its own objective function based on its desired level of privacy, and then the group uses any provably correct distributed coordination algorithm to optimize the sum of the individual perturbed functions. Two challenges arise to successfully apply this strategy: the fact that the perturbed functions might lose the smoothness and convexity properties of the original functions and the need to characterize the effect of the added noise on the minimizer of the resulting problem. We address the first challenge using a cascade of smoothening and projection steps that maintain the differential privacy of the functional perturbation step. We address the second challenge by explicitly bounding the absolute expected deviation from the original optimizer using a novel Lipschitz characterization of the argmin map. By construction, the resulting coordination algorithms satisfy the requirement of recovering perfect accuracy in the absence of noise. Various simulations illustrate our results.

Organization: We introduce our notation and basic preliminaries in Section II and formulate the private distributed optimization problem in Section III. Section IV presents the rationale for our design strategy and Section V describes a general framework for functional differential privacy. We formulate our solution to the private distributed optimization problem in Section VI. We present simulations in Section VII and collect our conclusions and ideas for future work in Section VIII. Appendix A gathers our results on the Lipschitzness of the argmin map under suitable assumptions.

II. PRELIMINARIES

In this section, we introduce our notational conventions and some fundamental facts about Hilbert spaces and robust stability of discrete-time systems.

A. Notation

We use R, R_{>0}, Z_{≥0}, and N to denote the set of reals, positive reals, nonnegative integers, and positive integers, respectively. The space of scalar and n-vector valued infinite sequences are denoted by R^N and (R^n)^N, respectively. Given K ∈ N and an element η = {η(k)}_{k=0}^∞ of R^N or (R^n)^N, we use the shorthand notation η_K = {η(k)}_{k=0}^K. If the index of η starts at k = 1, with a slight abuse of notation we also denote {η(k)}_{k=1}^K by η_K. We denote by ℓ_2 ⊂ R^N the space of square-summable infinite sequences. We use |·|_p and ‖·‖_p for the p-norm in finite and infinite-dimensional normed vector spaces, respectively (we drop the index p for p = 2). We let B(c, r) denote the closed ball with center c and radius r in Euclidean space. For D ⊆ R^d, D° denotes its interior, and L_2(D) and C^2(D) denote the set of square-integrable measurable functions and the set of twice continuously differentiable functions over D, respectively. Throughout the paper, m(·) denotes the Lebesgue measure. If {E_k}_{k=1}^∞ is a sequence of subsets of Ω such that E_k ⊆ E_{k+1} and E = ⋃_k E_k, then we write E_k ↑ E as k → ∞. We say E_k ↓ E as k → ∞ if E_k^c ↑ E^c as k → ∞, where E^c = Ω \ E is the complement of E. Given any closed and convex subset S ⊆ H of a Hilbert space, we denote by proj_S the orthogonal projection operator onto S.

We denote by K the set of strictly increasing continuous functions α : [0,∞) → [0,∞) such that α(0) = 0. A function α belongs to K_∞ if α ∈ K and lim_{r→∞} α(r) = ∞. We denote by KL the set of functions β : [0,∞) × [0,∞) → [0,∞) such that, for each s ∈ [0,∞), r ↦ β(r, s) is nondecreasing and continuous and β(0, s) = 0 and, for each r ∈ [0,∞), s ↦ β(r, s) is monotonically decreasing with β(r, s) → 0 as s → ∞. A map M : X → Y between two normed vector spaces is K-Lipschitz if there exists κ ∈ K_∞ such that ‖M(x_1) − M(x_2)‖_Y ≤ κ(‖x_1 − x_2‖_X) for all x_1, x_2 ∈ X.

The (zero-mean) Laplace distribution with scale b ∈ R_{>0} is a continuous distribution with probability density function

L(x; b) = (1/(2b)) e^{−|x|/b}.

It is clear that L(x; b)/L(y; b) ≤ e^{|x−y|/b}. We use η ∼ Lap(b) to denote a random variable η with Laplace distribution. It is easy to see that if η ∼ Lap(b), then |η| has an exponential distribution with rate λ = 1/b. Similarly, we use the notation η ∼ N(µ, σ²) when η is normally distributed with mean µ and variance σ². The error function erf : R → R is defined as

erf(x) ≜ (1/√π) ∫_{−x}^{x} e^{−t²} dt ≥ 1 − e^{−x²}.

Therefore, P{|η| ≤ r} = erf(r/(√2 σ)) if η ∼ N(0, σ²). Given any random variable η and any convex function φ, Jensen's inequality states that E[φ(η)] ≥ φ(E[η]). The opposite inequality holds if φ is concave.

B. Hilbert Spaces and Orthonormal Bases

We review some basic facts on Hilbert spaces and refer the reader to [19] for a comprehensive treatment. A Hilbert space H is a complete inner-product space. A set {e_k}_{k∈I} ⊂ H is an orthonormal system if ⟨e_k, e_j⟩ = 0 for k ≠ j and ⟨e_k, e_k⟩ = ‖e_k‖² = 1 for all k ∈ I. If, in addition, the set of linear combinations of elements of {e_k}_{k∈I} is dense in H, then {e_k}_{k∈I} is an orthonormal basis. Here, I might be uncountable; however, if H is separable (i.e., it has a countable dense subset), then any orthonormal basis is countable. In this case, for any h ∈ H,

h = ∑_{k=1}^∞ ⟨h, e_k⟩ e_k.

We define the coefficient sequence θ ∈ R^N by θ_k = ⟨h, e_k⟩ for k ∈ N. Then, θ ∈ ℓ_2 and, by Parseval's identity, ‖h‖ = ‖θ‖. For ease of notation, we define Φ : ℓ_2 → H to be the linear bijection that maps the coefficient sequence θ to h. For an arbitrary D ⊆ R^d, L_p(D) is a Hilbert space if and only if p = 2, and the inner product is the integral of the product of functions. Moreover, L_2(D) is separable. In the remainder of the paper, we assume {e_k}_{k=1}^∞ is an orthonormal basis for L_2(D) and Φ : ℓ_2 → L_2(D) is the corresponding linear bijection between coefficient sequences and functions.
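For concreteness, the following Python sketch instantiates the coefficient map Φ^{−1} for an assumed orthonormal basis (normalized Legendre polynomials on L_2([−1, 1])) and checks that a truncated reconstruction approximates h; the basis choice, function names, and quadrature rule are illustrative assumptions, not the paper's.

```python
import numpy as np
from numpy.polynomial import legendre

# Sketch of Phi^{-1}: h -> theta with theta_k = <h, e_k>, using normalized
# Legendre polynomials as an orthonormal basis of L2([-1, 1]).

def legendre_basis(k):
    """k-th normalized Legendre polynomial e_k, with ||e_k|| = 1 in L2([-1, 1])."""
    coeffs = np.zeros(k + 1)
    coeffs[k] = 1.0
    norm = np.sqrt(2.0 / (2 * k + 1))  # L2 norm of the k-th Legendre polynomial
    return lambda x: legendre.legval(x, coeffs) / norm

def coefficients(h, K, num_points=2001):
    """First K coefficients theta_k = <h, e_k>, by a simple Riemann sum."""
    x = np.linspace(-1.0, 1.0, num_points)
    dx = x[1] - x[0]
    return np.array([np.sum(h(x) * legendre_basis(k)(x)) * dx for k in range(K)])

h = lambda x: np.exp(x)                 # example function in L2([-1, 1])
theta = coefficients(h, K=10)

# Truncated reconstruction h_K = sum_k theta_k e_k approximates h.
x = np.linspace(-1.0, 1.0, 5)
h_K = sum(theta[k] * legendre_basis(k)(x) for k in range(len(theta)))
print(np.max(np.abs(h_K - h(x))))       # small truncation/quadrature error
```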

C. Robust Stability of Discrete-Time Systems

We briefly present some definitions and results on robust stability of discrete-time systems following [20]. Given the vector field f : R^n × R^m → R^n, consider the associated dynamical system

x(k + 1) = f(x(k), η(k)),    (1)

with state x : Z_{≥0} → R^n and input η : Z_{≥0} → R^m. Given an equilibrium point x* ∈ R^n of the unforced system, we say that (1) is

(i) 0-input locally asymptotically stable (0-LAS) relative to x* if, by setting η = 0, there exist ρ > 0 and γ ∈ KL such that, for every initial condition x(0) ∈ B(x*, ρ), we have, for all k ∈ Z_{≥0},

|x(k) − x*| ≤ γ(|x(0) − x*|, k).

(ii) locally input-to-state stable (LISS) relative to x* if there exist ρ > 0, γ ∈ KL, and κ ∈ K such that, for every initial condition x(0) ∈ B(x*, ρ) and every input satisfying ‖η‖_∞ ≤ ρ, we have

|x(k) − x*| ≤ max{γ(|x(0) − x*|, k), κ(|η_{k−1}|_∞)},    (2)

for all k ∈ N. In this case, we refer to ρ as the LISS radius of (1) relative to x*.

By definition, if the system (1) is LISS, then it is also 0-LAS. The converse is also true, cf. [20, Theorem 1]. The following result is a local version of [21, Lemma 3.8] and states an important asymptotic behavior of LISS systems.

Proposition II.1. (Asymptotic gain of LISS systems): Assume system (1) is LISS relative to x* with associated LISS radius ρ. If x(0) ∈ B(x*, ρ) and ‖η‖_∞ ≤ min{κ^{−1}(ρ), ρ} (where κ^{−1}(ρ) = ∞ if ρ is not in the range of κ), then

lim sup_{k→∞} |x(k) − x*| ≤ κ(lim sup_{k→∞} |η(k)|).

In particular, x(k) → x* if η(k) → 0 as k → ∞.

Proof: From (2), we have

|x(k) − x*| ≤ max{γ(|x(0) − x*|, k), κ(‖η‖_∞)} ≤ max{γ(ρ, k), κ(‖η‖_∞)},    (3)

where we have used x(0) ∈ B(x*, ρ). Now, for each k ∈ N, let η[k] ∈ (R^n)^N be defined by η[k](ℓ) = η(k + ℓ) for all ℓ ∈ Z_{≥0}. If there exists k_0 such that ‖η[k_0]‖_∞ = 0, then we need to show that lim_{k→∞} |x(k) − x*| = 0. Since γ ∈ KL, there exists K ∈ Z_{≥0} such that γ(ρ, k) ≤ ρ for all k ≥ K, and since κ(‖η‖_∞) ≤ ρ as well, it follows from (3) that x(k) ∈ B(x*, ρ) for all k ≥ K. Let k̄ = max{k_0, K}. Using (2), we get

|x(k) − x*| ≤ γ(|x(k̄) − x*|, k − k̄),    ∀k > k̄,

and the result follows. Assume then that no k_0 exists such that ‖η[k_0]‖_∞ = 0. Let K_0 = 0 and, for each j ∈ N, let K_j be such that γ(ρ, k − K_{j−1}) ≤ κ(‖η[K_{j−1}]‖_∞) for all k ≥ K_j (this sequence is well-defined because γ ∈ KL). Since κ(‖η[K_{j−1}]‖_∞) ≤ κ(‖η‖_∞) ≤ ρ, (2) holds if we set the "initial" state to x(K_{j−1}), which implies that |x(k) − x*| ≤ κ(‖η[K_{j−1}]‖_∞) for all k ≥ K_j. Therefore,

lim sup_{k→∞} |x(k) − x*| ≤ κ(‖η[K_j]‖_∞),    ∀j ∈ Z_{≥0}.

The result follows by taking the limit of both sides as j → ∞.
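The following minimal Python sketch simulates an assumed scalar LISS system with an asymptotically vanishing Laplace input, illustrating the conclusion of Proposition II.1 that x(k) → x* when η(k) → 0; the dynamics and parameters are chosen only for illustration.

```python
import numpy as np

# Illustrative sketch: a scalar LISS system
# x(k+1) = 0.5*(x(k) - x_star) + x_star + eta(k),
# whose state converges to x_star whenever eta(k) -> 0 (Proposition II.1).

rng = np.random.default_rng(1)
x_star = 2.0
x = 10.0
for k in range(1, 200):
    eta = rng.laplace(scale=1.0 / k**1.5)   # decaying Laplace input
    x = 0.5 * (x - x_star) + x_star + eta

print(abs(x - x_star))  # close to zero since eta(k) -> 0 almost surely
```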

III. PROBLEM STATEMENT

Consider a group of n agents whose communication topology is described by a digraph G. Each agent i ∈ {1, . . . , n} has a local objective function f_i : D → R, where D ⊂ R^d is convex and compact and has nonempty interior. We assume that each f_i, i ∈ {1, . . . , n}, is convex and twice continuously differentiable, and use the shorthand notation F = {f_i}_{i=1}^n. Consider the following convex optimization problem

minimize_{x∈D}  f(x) ≜ ∑_{i=1}^n f_i(x)
subject to  G(x) ≤ 0,  Ax = b,

where the component functions of G : D → R^m are convex, A ∈ R^{s×d}, and b ∈ R^s. Denote by X ⊆ D the feasibility set. The optimization problem can be equivalently written as

minimize_{x∈X}  f(x).    (4)

We assume that X is a global piece of information known to all agents, while each function f_i is sensitive information known to agent i ∈ {1, . . . , n} and has to be kept private from other agents or possibly adversaries with access to the inter-agent messages.

The group objective is to solve the convex optimization problem (4) in a distributed and private way. By distributed, we mean that each agent can only interact with its neighbors in the graph G. In order to properly define privacy, let us first introduce the notion of adjacency. Given any normed vector space (V, ‖·‖_V) with V ⊆ L_2(D), two sets of functions F^(1), F^(2) ⊂ L_2(D) are V-adjacent if there exists i_0 ∈ {1, . . . , n} such that

f_i^(1) = f_i^(2) for i ≠ i_0,   and   f_{i_0}^(1) − f_{i_0}^(2) ∈ V.

The set V is a design choice that we specify later. Moreover, this definition can be readily extended to the case where V is any subset of another normed vector space W ⊆ L_2(D). With this generalization, the conventional bounded-difference notion of adjacency becomes a special case of the definition above, where V is a closed ball around the origin. We provide next a general definition of differential privacy for a map.

Definition III.1. (Differential Privacy): Let (Ω, Σ, P) be a probability space and consider a random map

M : L_2(D)^n × Ω → X

from the function space L_2(D)^n to an arbitrary set X. Given ε ∈ R^n_{>0}, the map M is ε-differentially private if, for any two V-adjacent sets of functions F^(1) and F^(2) that (at most) differ in their i_0'th element and any set O ⊆ X, one has

P{ω ∈ Ω | M(F^(2), ω) ∈ O} ≤ e^{ε_{i_0} ‖f_{i_0}^(1) − f_{i_0}^(2)‖_V} P{ω ∈ Ω | M(F^(1), ω) ∈ O}.    (5)    •

Essentially, this notion requires the statistics of the output of M to change only (relatively) slightly if the objective function of one agent changes (and the change is in V), making it hard for an "adversary" that observes the output of M to determine this change. In the case of an iterative asymptotic distributed optimization algorithm, one should think of M as representing the action (observed by the adversary) of the algorithm on the set of local functions F. In other words, M is the map (parameterized by the initial network condition) that assigns to F the whole sequence of messages transmitted over the network. In this case, (5) has to hold for all allowable values of the initial conditions. We are ready to formally state the network objective.

Problem 1. (Differentially private distributed optimization): Design a distributed and differentially private optimization algorithm whose guarantee on accuracy improves as the level of privacy decreases, leading to the exact optimizer of the aggregate objective function in the absence of privacy. •

The reason for the requirement of recovering the exact optimizer in the absence of privacy in Problem 1 is the following. It is well known in the literature of differential privacy that there always exists a cost for an algorithm to be differentially private, i.e., the algorithm inevitably suffers a performance loss that increases as the level of privacy increases. This phenomenon is a result of the noise added in the map M, whose variance increases as ε decreases. With this requirement on the noise-free behavior of the algorithm, we aim to make sure that the cause of this performance loss is only due to the added noise and not to any other factor.

IV. RATIONALE FOR DESIGN STRATEGY

In this section, we discuss two algorithm design strategies to solve Problem 1 based on the perturbation of either inter-agent messages or the local objective functions. We point out an important limitation of the former, and this provides justification for the ensuing design of our objective-perturbing algorithm based on functional differential privacy.

A. Limitations of Message-Perturbing Strategies

We use the term message-perturbing strategy to refer to the result of modifying any of the distributed optimization algorithms available in the literature by adding (Gaussian or Laplace) noise to the messages agents send to either neighbors or a central aggregator in order to preserve privacy. A generic message-perturbing distributed algorithm takes the form

x(k + 1) = a_I(x(k), ξ(k)),
ξ(k) = x(k) + η(k),    (6)

where ξ, η : Z_{≥0} → R^n represent the sequences of messages and perturbations, respectively, and a_I : R^n × R^n → R^n depends on the set of agents' sensitive information with associated optimizer x*_I. This formulation is quite general and can also encode algorithmic solutions for optimization problems other than the one in Section III. In the problem of interest here, we have I = F = {f_i}_{i=1}^n.
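To make the template (6) concrete, the sketch below simulates an assumed message-perturbing consensus-plus-gradient iteration with decaying Laplace noise; it is only an illustration of the form (6), not any specific algorithm from the literature, and all names and parameters are assumptions.

```python
import numpy as np

# Sketch of a message-perturbing strategy of the form (6): a consensus plus
# gradient step a_I(x, xi) driven by noisy messages xi(k) = x(k) + eta(k).

rng = np.random.default_rng(2)
n = 5
A = np.full((n, n), 1.0 / n)                 # doubly stochastic mixing matrix
minimizers = rng.uniform(-1, 1, size=n)      # f_i(x) = 0.5 * (x - m_i)^2
grad = lambda x: x - minimizers              # stacked local gradients

x = rng.uniform(-5, 5, size=n)
step = 0.1
for k in range(1, 500):
    eta = rng.laplace(scale=1.0 / k, size=n)  # decaying Laplace perturbation
    xi = x + eta                              # perturbed messages
    x = A @ xi - step * grad(x)               # a_I(x(k), xi(k))

# States settle near the average of the local minimizers (the aggregate
# optimizer), up to a bias on the order of the constant stepsize.
print(x, np.mean(minimizers))
```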

The following result provides conditions on the noise variance that ensure that the noise vanishes asymptotically almost surely and remains bounded with nonzero probability.

Lemma IV.1. (Convergence and boundedness of Laplace and normal random sequences with decaying variance): Let η be a sequence of independent random variables defined over the sample space Ω = R^N, with η(k) ∼ Lap(b(k)) or η(k) ∼ N(0, b(k)) for all k ∈ N. Given r > 0, consider the events

E = {η ∈ Ω | lim_{k→∞} η(k) = 0},
F_r = {η ∈ Ω | ∀k ∈ N, |η(k)| ≤ r}.

If b(k) is O(1/k^p) for some p > 0, then P(E) = 1 and P(F_r) = P(F_r ∩ E) > 0 for all r > 0.

Proof: First, consider the case where η(k) ∼ Lap(b(k)). By the independence of the random variables and the fact that |η(k)| is exponentially distributed with rate 1/b(k),

P(F_r) = ∏_{k=1}^∞ (1 − e^{−r/b(k)}).

By assumption, b(k) ≤ c/k^p for all k ∈ N and some p, c > 0. Therefore,

P(F_r) ≥ ∏_{k=1}^∞ (1 − e^{−(r/c) k^p}) > 0,

because the series ∑_{k=1}^∞ e^{−(r/c) k^p} converges [22, §1.14]. Next, let E_{ℓ,K} = {η ∈ Ω | ∀k ≥ K, |η(k)| < υ_ℓ}, where {υ_ℓ}_{ℓ=1}^∞ is a monotonically decreasing sequence that converges to zero as ℓ → ∞ (e.g., υ_ℓ = 1/ℓ). Note that

E = ⋂_{ℓ=1}^∞ ⋃_{K=1}^∞ E_{ℓ,K}.

E_{ℓ,K} ↑ ⋃_{K=1}^∞ E_{ℓ,K} for all ℓ ∈ N as K → ∞, and ⋃_{K=1}^∞ E_{ℓ,K} ↓ E as ℓ → ∞. Therefore,

P(E) = lim_{ℓ→∞} lim_{K→∞} P(E_{ℓ,K}) = lim_{ℓ→∞} lim_{K→∞} ∏_{k=K}^∞ (1 − e^{−υ_ℓ/b(k)}) ≥ lim_{ℓ→∞} lim_{K→∞} ∏_{k=K}^∞ (1 − e^{−(υ_ℓ/c) k^p}) = 1.

Then, P(F_r ∩ E) = P(F_r) − P(F_r ∩ E^c) = P(F_r) > 0. For the case of normal distribution of the random variables,

P{|η(k)| ≤ r} = erf(r/√(2 b(k))) ≥ 1 − e^{−r²/(2 b(k))},

and the result follows from the arguments above.

Note that Lemma IV.1 also ensures that the probability that the noise simultaneously converges to zero and remains bounded is nonzero. One might expect that Lemma IV.1 would hold if b(k) → 0 at any rate. However, this is not true. For instance, if b(k) = 1/log k, one can show that the probability that η(k) eventually remains bounded is zero for any bound r ≤ 1, so the probability that η(k) → 0 is zero as well.
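The finite-horizon Monte Carlo sketch below (an illustration, not a proof) contrasts the two decay rates discussed above: with b(k) = 1/k² a nonnegligible fraction of sample paths stays below the bound r = 1 over the whole horizon, whereas with b(k) = 1/log(k+1) essentially none do. Horizon length and trial count are arbitrary assumptions.

```python
import numpy as np

# Finite-horizon Monte Carlo illustration of Lemma IV.1 and the remark above.

rng = np.random.default_rng(3)
K, trials, r = 5000, 2000, 1.0
ks = np.arange(1, K + 1)

for label, b in [("1/k^2", 1.0 / ks**2), ("1/log(k+1)", 1.0 / np.log(ks + 1))]:
    eta = rng.laplace(scale=b, size=(trials, K))        # eta(k) ~ Lap(b(k))
    frac_bounded = np.mean(np.max(np.abs(eta), axis=1) <= r)
    print(label, frac_bounded)   # positive for 1/k^2, nearly zero for 1/log(k+1)
```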

The following result shows that a message-perturbing algorithm of the form (6) cannot achieve differential privacy if the underlying (noise-free) dynamics is asymptotically stable. For convenience, we employ the short-hand notation a_I(x(k), η(k)) = a_I(x(k), x(k) + η(k)) to refer to (6).

Proposition IV.2. (Impossibility result for 0-LAS message-perturbing algorithms): Consider the dynamics (6) with either η_i(k) ∼ Lap(b_i(k)) or η_i(k) ∼ N(0, b_i(k)). If a_I is 0-LAS relative to x*_I for two information sets I^(1) and I^(2) with different optimizers x*_{I^(1)} ≠ x*_{I^(2)} and associated LISS radii ρ^(ℓ), ℓ = 1, 2, b_i(k) is O(1/k^p) for all i ∈ {1, . . . , n} and some p > 0, and at least one of the following holds,

(i) x*_{I^(1)} is not an equilibrium point of x(k + 1) = a_{I^(2)}(x(k), 0) and a_{I^(2)} is continuous,
(ii) x*_{I^(1)} belongs to the interior of B(x*_{I^(2)}, ρ^(2)),

then the algorithm (6) does not preserve the ε-differential privacy of the information set I for any ε > 0.

Proof: For any fixed initial state x_0, if either of ξ or η is known, the other one can be uniquely determined from (6). Therefore, the mapping Ξ_{I,x_0} : (R^n)^N → (R^n)^N such that

Ξ_{I,x_0}(η) = ξ

is well-defined and bijective. For ℓ = 1, 2, let κ_{(ℓ)} ∈ K be the function κ in (2) corresponding to a_{I^(ℓ)}. Consider as initial condition x_0 = x*_{I^(1)} and define

R^(1) = {η ∈ Ω | ∀i ∈ {1, . . . , n}, lim_{k→∞} η_i(k) = 0 and |η_i(k)| ≤ min{κ_{(1)}^{−1}(ρ^(1)), ρ^(1)}, ∀k ∈ N}.

By Lemma IV.1, we have P(R^(1)) > 0. By Proposition II.1, since |x_0 − x*_{I^(1)}| = 0 ≤ ρ^(1) and ‖η^(1)‖_∞ ≤ min{κ_{(1)}^{−1}(ρ^(1)), ρ^(1)} for all η^(1) ∈ R^(1), the sequence Ξ_{I^(1),x_0}(η^(1)) converges to x*_{I^(1)}. Let O = Ξ_{I^(1),x_0}(R^(1)) and R^(2) = Ξ^{−1}_{I^(2),x_0}(O). Next, we show that no η^(2) ∈ R^(2) converges to 0 under either hypothesis (i) or (ii) of the statement.

Under (i), there exists a neighborhood of (x*_{I^(1)}, 0) ∈ R^{2n} in which the infimum of the absolute value of at least one of the components of a_{I^(2)}(x, η) is positive, so whenever (x, η) enters this neighborhood, it exits it in finite time. Therefore, given that any x ∈ O converges to x*_{I^(1)}, no η^(2) ∈ R^(2) can converge to zero. Under (ii), there exists a neighborhood of x*_{I^(1)} included in B(x*_{I^(2)}, ρ^(2)). Since Ξ_{I^(2),x_0}(η^(2)) → x*_{I^(1)}, there exists K ∈ N such that Ξ_{I^(2),x_0}(η^(2))(k) belongs to B(x*_{I^(2)}, ρ^(2)) for all k ≥ K. Therefore, if |η^(2)(k)| ≤ min{κ_{(2)}^{−1}(ρ^(2)), ρ^(2)} indefinitely after any point of time, Ξ_{I^(2),x_0}(η^(2)) → x*_{I^(2)} by Proposition II.1, which is a contradiction, so η^(2) cannot converge to zero. In both cases, by Lemma IV.1,

P(R^(2)) = 0,

which, together with P(R^(1)) > 0 and the definition of ε-differential privacy, cf. (5), implies the result.

Note that the hypotheses of Proposition IV.2 are mild and easily satisfied in most cases. In particular, the result holds if the dynamics is continuous and globally asymptotically stable relative to x*_I for two information sets. The main take-away message of this result is that a globally asymptotically stable distributed optimization algorithm cannot be made differentially private by perturbing the inter-agent messages with asymptotically vanishing noise. This observation is at the core of the design choices made in the literature regarding the use of stepsizes with finite sum to make the zero-input dynamics not asymptotically stable, thereby causing a steady-state error in accuracy which is present independently of the amount of noise injected for privacy. For instance, the algorithmic solution proposed in [15] chooses a finite-sum sequence of stepsizes that leads to a dynamical system which is not 0-GAS, see Figure 1. Similar observations can be made in the scenario considered in [16], where the agents' local constraints are the sensitive information (instead of the objective function). This algorithmic solution uses a constant-variance noise, which would make the dynamics unstable if executed over an infinite time horizon. This problem is circumvented by having the algorithm terminate after a finite number of steps, and optimizing this number offline as a function of the desired level of privacy ε.

Fig. 1. Privacy-accuracy trade-off for the algorithm proposed in [15] for a group of n = 10 agents. With that paper's notation, the algorithm's parameters are set to q = 0.1, p = 0.11, c = 0.5. The stepsize γ_t = c q^{t−1} has finite sum. The individual objective functions are two-dimensional quadratic functions defined on D = [−50, 50]² with minimizers chosen uniformly randomly on the unit circle X = S¹. The circles, dotted line, and solid line illustrate simulation results for 50 executions, their best linear fit in logarithmic scale, and the upper bound on accuracy provided in [15], respectively.

B. Algorithm Design via Objective Perturbation

To overcome the limitations of message-perturbing strategies, here we outline an alternative design strategy to solve Problem 1 based on the perturbation of the agents' objective functions. The basic idea is to have agents independently perturb their objective functions in a differentially private way and then have them participate in a distributed optimization algorithm with the perturbed objective functions instead of their original ones. The following result, which is a special case of [23, Theorem 1], ensures that the combination with the distributed optimization algorithm does not affect the differential privacy at the functional level.

Proposition IV.3. (Resilience to post-processing): Let M : L_2(D)^n × Ω → L_2(D)^n be an ε-differentially private map and F : L_2(D)^n → X, where (X, Σ_X) is an arbitrary measurable space. Then, F ∘ M : L_2(D)^n × Ω → X is ε-differentially private.

Proof: Consider the σ-algebra P(L_2(D)^n) on L_2(D)^n, where P denotes the power set. With the notation from [23, Theorem 1], the map M_2 = F ∘ M is a deterministic function of the output of the map M_1 = M. Then, it is easy to verify that, for any S ∈ Σ_X,

P(M_2(F) ∈ S | M_1(F)) = χ_S(F(M_1(F)))

(with χ the indicator function) is measurable as a function of M_1(F) (because F and S are trivially measurable) and defines a probability measure on (X, Σ_X) (associated to a singleton), so it is a probability kernel.

Our design strategy based on the perturbation of the individual objective functions requires solving the following challenges:

(i) establishing a differentially private procedure to perturb the individual objective functions;
(ii) ensuring that the resulting perturbed functions enjoy the smoothness and regularity properties required by distributed optimization algorithms to converge;
(iii) with (i) and (ii) in place, characterizing the accuracy of the resulting differentially private, distributed coordination algorithm.

Section V addresses (i) and Section VI deals with (ii) and (iii).

V. FUNCTIONAL DIFFERENTIAL PRIVACY

We explore here the concept of functional differential privacy to address the challenge (i) laid out in Section IV-B. The generality of this notion makes it amenable for problems where the sensitive information is a function or some of its attributes (e.g., sample points, optimizers, derivatives and integrals). For simplicity of exposition and without loss of generality, we limit our discussion to the privacy of a single function.

A. Functional Perturbation via Laplace Noise

Let f ∈ L_2(D) be a function whose differential privacy has to be preserved. With the notation of Section II, we decompose f into its coefficients Φ^{−1}(f) and perturb this sequence by adding noise to all of its elements. Specifically, we set

M(f, η) = Φ(Φ^{−1}(f) + η) = f + Φ(η),    (7)

where

η_k ∼ Lap(b_k),    (8)

for all k ∈ N. Clearly, for η to belong to ℓ_2 and for the series Φ(η) to converge, the scales {b_k}_{k=1}^∞ cannot be arbitrary. The next result addresses this issue.

Lemma V.1. (Sufficient condition for boundedness of perturbed functions): If there exists K ∈ N such that, for some p > 1/2 and s > 1,

b_k ≤ 1/(k^p log k^s),    ∀k ≥ K,    (9)

then η defined by (8) belongs to ℓ_2 with probability one. In particular, if for some p > 1/2 and γ > 0,

b_k ≤ γ/k^p,    ∀k ∈ N,    (10)

then η defined by (8) belongs to ℓ_2 with probability one.

Proof: Equation (9) can be equivalently written as e^{−1/(k^p b_k)} ≤ 1/k^s for k ≥ K. In particular, this implies that ∑_{k=1}^∞ e^{−1/(k^p b_k)} is convergent. Therefore [22, §1.14], ∏_{k=1}^∞ (1 − e^{−1/(k^p b_k)}) converges (i.e., the limit exists and is nonzero), so

1 = lim_{K→∞} ∏_{k=K}^∞ (1 − e^{−1/(k^p b_k)}) = lim_{K→∞} P(E_K),

where E_K = {η ∈ R^N | ∀k ≥ K, |η_k| ≤ 1/k^p} and we have used the fact that |η_k| is exponentially distributed with rate 1/b_k. Since E_K ↑ ⋃_{K=1}^∞ E_K as K → ∞, we have

1 = P(⋃_{K=1}^∞ E_K) = P{η ∈ R^N | ∃K ∈ N s.t. ∀k ≥ K, |η_k| ≤ 1/k^p} ≤ P{η ∈ ℓ_2},

as stated. If equation (10) holds, we define p̄ = (p + 1/2)/2 and equivalently write (10) as

b_k ≤ (1/k^p̄) · (γ/k^{p−p̄}),    ∀k ∈ N.

Since p − p̄ > 0, for any s > 1 there exists K ∈ N such that k^{p−p̄} ≥ γ log k^s for all k ≥ K, and the result follows.

Having established conditions on the noise variance under which the map (7) is well defined, we next turn our attention to establish its differentially private character.

B. Differential Privacy of Functional Perturbation

Here we establish the differential privacy of the map (7). In order to do so, we first specify our choice of adjacency space. Given q > 1, consider the weight sequence {k^q}_{k=1}^∞ and define the adjacency vector space to be the image of the resulting weighted ℓ_2 space under Φ, i.e.,

V_q = Φ({δ ∈ R^N | ∑_{k=1}^∞ (k^q δ_k)² < ∞}).

It is not difficult to see that V_q is a vector space. Moreover,

‖f‖_{V_q} ≜ (∑_{k=1}^∞ (k^q δ_k)²)^{1/2},  with δ = Φ^{−1}(f),

is a norm on V_q. The next result establishes the differential privacy of the map (7) for an appropriately chosen noise scale sequence b.

Theorem V.2. (Differential privacy of functional perturbation): Given q > 1, γ > 0 and p ∈ (1/2, q − 1/2), let

b_k = γ/k^p,  k ∈ N.    (11)

Then, the map (7) is ε-differentially private with

ε = (1/γ) √ζ(2(q − p)),    (12)

where ζ is the Riemann zeta function.

Proof: Note that (11) ensures, by Lemma V.1, that η belongs to ℓ_2 almost surely. Consider functions f^(1) and f^(2) with f^(1) − f^(2) ∈ V_q and an arbitrary set O ⊆ L_2(D). For ℓ = 1, 2, we have

P{f^(ℓ) + Φ(η^(ℓ)) ∈ O} = P{η^(ℓ) ∈ Φ^{−1}(O − f^(ℓ))} = lim_{K→∞} ∫_{Φ_K^{−1}(O−f^(ℓ))} L_K(η_K^(ℓ); b_K) dη_K^(ℓ),

where Φ_K^{−1}(·) returns the first K coefficients of Φ^{−1}(·),

L_K(η_K^(ℓ); b_K) = ∏_{k=1}^K L(η_k^(ℓ); b_k),

and Φ_K^{−1}(O − f^(ℓ)) denotes the inverse image of the set O − f^(ℓ) = {g ∈ L_2(D) | g + f^(ℓ) ∈ O}. By linearity of Φ_K, we have

Φ_K^{−1}(O − f^(2)) = Φ_K^{−1}(O − f^(1)) + δ_K,

where δ = Φ^{−1}(f^(1) − f^(2)). Therefore,

P{f^(2) + Φ(η^(2)) ∈ O} = lim_{K→∞} ∫_{Φ_K^{−1}(O−f^(1))} L_K(η_K^(1) + δ_K; b_K) dη_K^(1).

Note that

L_K(η_K^(1) + δ_K; b_K) / L_K(η_K^(1); b_K) = ∏_{k=1}^K L(η_k^(1) + δ_k; b_k) / L(η_k^(1); b_k) ≤ e^{∑_{k=1}^K |δ_k|/b_k}.

After multiplying both sides by L_K(η_K^(1); b_K), integrating over Φ_K^{−1}(O − f^(1)), and letting K → ∞, we have

P{f^(2) + Φ(η^(2)) ∈ O} ≤ e^{∑_{k=1}^∞ |δ_k|/b_k} P{f^(1) + Φ(η^(1)) ∈ O}.

Finally, the coefficient of the exponential can be upper bounded using Hölder's inequality with p = q = 2 as

∑_{k=1}^∞ |δ_k|/b_k = ∑_{k=1}^∞ (k^q |δ_k|)/(k^q b_k) ≤ (∑_{k=1}^∞ 1/(k^q b_k)²)^{1/2} (∑_{k=1}^∞ (k^q δ_k)²)^{1/2} = (1/γ) √ζ(2(q − p)) ‖f^(1) − f^(2)‖_{V_q},

which completes the proof.
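In practice, (12) can be used to calibrate the noise scale for a desired privacy level. The sketch below evaluates ε from (γ, p, q) and inverts the relation; scipy's zeta function is used for the Riemann zeta function, and the parameter values are only examples (matching the choice q = 2p = 1.1 used later in Section VII).

```python
import numpy as np
from scipy.special import zeta

# Sketch of the privacy calibration in Theorem V.2: with b_k = gamma / k^p and
# adjacency space V_q, the mechanism (7) is eps-DP with
# eps = sqrt(zeta(2*(q - p))) / gamma.

def epsilon_from_gamma(gamma, p, q):
    assert q > 1 and 0.5 < p < q - 0.5
    return np.sqrt(zeta(2.0 * (q - p))) / gamma

def gamma_for_epsilon(eps, p, q):
    """Invert (12): the gamma achieving a desired privacy level eps."""
    return np.sqrt(zeta(2.0 * (q - p))) / eps

q, p = 1.1, 0.55
eps_target = 1.0
gamma = gamma_for_epsilon(eps_target, p, q)
print(gamma, epsilon_from_gamma(gamma, p, q))  # recovers eps_target
```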

VI. DIFFERENTIALLY PRIVATE DISTRIBUTED OPTIMIZATION

In this section, we employ functional differential privacy to solve the differentially private distributed optimization problem formulated in Section III for a group of n ∈ N agents. For convenience, we introduce the shorthand notation S_0 = C²(D) ⊂ L_2(D) and, for given u > 0 and 0 < α < β,

S = {h ∈ S_0 | |∇h(x)| ≤ u, ∀x ∈ D, and αI_d ≤ ∇²h(x) ≤ βI_d, ∀x ∈ D°},

for twice continuously differentiable functions with bounded gradients and Hessians. In the rest of the paper, we assume that the agents' local objective functions f_1, . . . , f_n belong to S.

A. Smoothness and Regularity of the Perturbed Functions

We address here the challenge (ii) laid out in Section IV-B. To exploit the framework of functional differential privacy for optimization, we need to ensure that the perturbed functions have the smoothness and regularity properties required by the distributed coordination algorithm. In general, the output of (7) might neither be smooth nor convex. We detail next how to address these problems by defining appropriate maps that, when composed with M in (7), yield functions with the desired properties. Proposition IV.3 ensures that differential privacy is retained throughout this procedure.

1) Ensuring Smoothness: To ensure smoothness, we rely on the fact that S_0 is dense in L_2(D) and, therefore, given any function g in L_2(D), there exists a smooth function arbitrarily close to it, i.e.,

∀ε̄ > 0, ∃g^s ∈ S_0 such that ‖g − g^s‖ < ε̄.

Here, ε̄ is a design parameter and can be chosen sufficiently small (later, we show how to do this so that the accuracy of the coordination algorithm is not affected).

2) Ensuring Strong Convexity and Bounded Hessian: The next result ensures that the orthogonal projection from S_0 onto S is well defined, and can therefore be used to ensure strong convexity and bounded Hessian of the perturbed functions.

Proposition VI.1. (Convexity of S and closedness relative to S_0): The set S is convex and closed as a subset of S_0 under the 2-norm.

Proof: The set S is clearly convex because, if h_1, h_2 ∈ S and λ ∈ [0, 1], then for all x ∈ D°,

∇²((1 − λ)h_1(x) + λh_2(x)) = (1 − λ)∇²h_1(x) + λ∇²h_2(x) ≥ (1 − λ)αI_d + λαI_d = αI_d.

Similarly, ∇²((1 − λ)h_1(x) + λh_2(x)) ≤ βI_d. Also,

|∇((1 − λ)h_1(x) + λh_2(x))| ≤ (1 − λ)|∇h_1(x)| + λ|∇h_2(x)| ≤ (1 − λ)u + λu ≤ u,

for all x ∈ D. To establish closedness, let

S_1 = {h ∈ S_0 | αI_d ≤ ∇²h(x) ≤ βI_d, ∀x ∈ D°},
S_2 = {h ∈ S_0 | |∇h(x)| ≤ u, ∀x ∈ D}.

Since S = S_1 ∩ S_2, it is enough to show that S_1 and S_2 are both closed subsets of S_0.

To show that S_1 is closed, let {h_k}_{k=1}^∞ be a sequence of functions in S_1 such that h_k → h ∈ S_0 in the 2-norm. We show that h ∈ S_1. Since h_k − (α/2)|x|² → h − (α/2)|x|² in the 2-norm and L_2 convergence implies pointwise convergence of a subsequence almost everywhere, there exist {h_{k_ℓ}}_{ℓ=1}^∞ and Y ⊂ D such that D \ Y has zero (Lebesgue) measure and h_{k_ℓ}(x) − (α/2)|x|² → h(x) − (α/2)|x|² for all x ∈ Y. It is straightforward to verify that Y is dense in D and therefore Y ∩ D° is dense in D°. Then, by [24, Theorem 10.8], h − (α/2)|x|² is convex on D°, so αI_d ≤ ∇²h(x) for all x ∈ D°. Similarly, one can show that ∇²h(x) ≤ βI_d for all x ∈ D°. Therefore, h ∈ S_1.

Next, we prove the closedness of S_2 by contradiction. Assume that {h_k}_{k=1}^∞ is a sequence of functions in S_2 such that h_k → h ∈ S_0 in the 2-norm but h ∉ S_2. Therefore, there exists x_0 ∈ D° such that |∇h(x_0)| > u and, by continuity of ∇h, δ_0 > 0 and υ_0 > 0 such that

|∇h(x)| ≥ u + υ_0,  ∀x ∈ B(x_0, δ_0) ⊆ D.

Let u_0 = ∇h(x_0)/|∇h(x_0)|. By continuity of ∇h, for all υ_1 > 0 there exists δ_1 ∈ (0, δ_0] such that

∇h(x) · u_0 ≥ (1 − υ_1)|∇h(x)|,  ∀x ∈ B(x_0, δ_1).

As mentioned above, L_2 convergence implies pointwise convergence of a subsequence {h_{k_ℓ}}_{ℓ=1}^∞ almost everywhere. In turn, this subsequence converges to h almost uniformly, i.e., for all υ_2 > 0 and all υ_3 > 0, there exist E ⊂ D and L ∈ N such that m(E) < υ_2 and

|h_{k_ℓ}(x) − h(x)| < υ_3,  ∀x ∈ D \ E and ℓ ≥ L.    (13)

For ease of notation, let δ_2 = δ_1/2. Using the fundamental theorem of line integrals [25], for all x ∈ B(x_0, δ_2) \ E,

h(x + δ_2 u_0) − h(x) = ∫_x^{x+δ_2 u_0} ∇h · dr = ∫_x^{x+δ_2 u_0} ∇h · u_0 |dr| ≥ ∫_x^{x+δ_2 u_0} (1 − υ_1)|∇h| |dr| ≥ (1 − υ_1)(u + υ_0)δ_2.    (14)

Similarly, for all x ∈ B(x_0, δ_2) \ E and all ℓ ∈ N,

h_{k_ℓ}(x + δ_2 u_0) − h_{k_ℓ}(x) = ∫_x^{x+δ_2 u_0} ∇h_{k_ℓ} · dr ≤ ∫_x^{x+δ_2 u_0} |∇h_{k_ℓ}| |dr| ≤ u δ_2.    (15)

Putting (14), (15), and (13) together and choosing υ_3 = υ_1 δ_2 u, we have, for all x ∈ B(x_0, δ_2) \ E and all ℓ ≥ L,

h(x + δ_2 u_0) − h_{k_ℓ}(x + δ_2 u_0) ≥ h(x) − h_{k_ℓ}(x) + δ_2(1 − υ_1)(u + υ_0) − δ_2 u ≥ δ_2(1 − υ_1)(u + υ_0) − δ_2(1 + υ_1)u ≜ υ_4.    (16)

The quantity υ_4 can be made strictly positive by choosing υ_1 = υ_0/(4u + 3υ_0) > 0. Let E_+ = E + δ_2 u_0 and x_1 = x_0 + δ_2 u_0. Then, (16) can be rewritten as

h(x) − h_{k_ℓ}(x) ≥ υ_4,  ∀x ∈ B(x_1, δ_2) \ E_+ and ℓ ≥ L,

which, by choosing υ_2 = (1/2) m(B(x_1, δ_2)), implies

∫_{B(x_1, δ_2) \ E_+} |h(x) − h_{k_ℓ}(x)|² dx ≥ υ_4² · m(B(x_1, δ_2) \ E_+)  ⇒  ‖h − h_{k_ℓ}‖ ≥ υ_4 √(m(B(x_1, δ_2))/2) > 0,

contradicting h_{k_ℓ} → h in the 2-norm, so S_2 must be closed.

Given the result in Proposition VI.1, the best approximation in S of a function h ∈ S_0 is its unique projection onto S, proj_S(h). By definition, the projected function has bounded gradient and Hessian.

B. Algorithm Design and Analysis

We address here the challenge (iii) laid out in Section IV-B and put together the discussion above to propose a class of differentially private, distributed optimization algorithms that solve Problem 1. We require each agent i ∈ {1, . . . , n} to first compute

f̃_i = M(f_i, η_i) = f_i + Φ(η_i),    (17a)

where η_i is a sequence of Laplace noise generated by agent i according to (8) with the choice (11), then select f_i^s ∈ S_0 such that

‖f̃_i − f_i^s‖ < ε̄_i,    (17b)

and finally compute

f̂_i = proj_S(f_i^s).    (17c)

After this process, agents participate in any distributed optimization algorithm with the modified objective functions {f̂_i}_{i=1}^n. Let

x̂* = argmin_{x∈X} ∑_{i=1}^n f̂_i(x)   and   x* = argmin_{x∈X} ∑_{i=1}^n f_i(x)

denote, respectively, the output of the distributed algorithm and the optimizer for the original optimization problem (with objective functions {f_i}_{i=1}^n). The following result establishes the connection between the algorithm's accuracy and the design parameters.
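The following sketch traces steps (17a)–(17b) for a single agent on a truncated coefficient vector, with smoothing realized by truncation as done in Section VII; the projection (17c) is a convex program and is only indicated in a comment. All names, truncation orders, and parameter values are illustrative assumptions.

```python
import numpy as np

# Sketch of (17a)-(17b) for one agent in coefficient space: perturb the
# coefficients of f_i with Laplace noise (17a), then smooth by truncating the
# series to K_s < K terms, giving an f_i^s in S_0 whose distance to the
# perturbed function equals the norm of the discarded tail (17b).
# Step (17c), the projection onto S, is a convex program (solved numerically
# in Section VII) and is not implemented here.

rng = np.random.default_rng(5)
gamma, p = 0.5, 0.55                    # noise parameters, cf. (11)
K, K_s = 30, 15                         # ambient truncation and smoothing order

theta = 1.0 / np.arange(1, K + 1)**2    # example coefficients of f_i
b = gamma / np.arange(1, K + 1)**p      # b_k = gamma / k^p
theta_pert = theta + rng.laplace(scale=b)        # (17a) in coefficient space

theta_s = theta_pert.copy()
theta_s[K_s:] = 0.0                              # (17b): smoothing by truncation
smoothing_error = np.linalg.norm(theta_pert - theta_s)   # = norm of the tail
print(smoothing_error)
```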

Theorem VI.2. (Accuracy of a class of distributed, differentially private coordination algorithms): Consider a group of n agents which perturb their local objective functions according to (17) with Laplace noise (8) of variance (11), where q_i > 1, γ_i > 0, and p_i ∈ (1/2, q_i − 1/2) for all i ∈ {1, . . . , n}. Let the agents participate in any distributed coordination algorithm that asymptotically converges to the optimizer x̂* of the perturbed aggregate objective function. Then, ε_i-differential privacy of each agent i's original objective function is preserved with ε_i = √ζ(2(q_i − p_i))/γ_i and

|E[x̂*] − x*| ≤ ∑_{i=1}^n [κ_n(γ_i √ζ(2p_i)) + κ_n(ε̄_i)],

where the function κ_n ≡ κ_{nα,nβ} is defined in Proposition A.2.

Proof: Since the distributed algorithm is a post-processing step on the perturbed functions, privacy preservation of the objective functions follows from Theorem V.2 and Proposition IV.3. For convenience, let ∆ = |E[x̂* − x*]|. Note that

∆ ≤ E|x̂* − x*| = E|argmin_{x∈X} ∑_{i=1}^n f̂_i(x) − argmin_{x∈X} ∑_{i=1}^n f_i(x)|.

Since µ_{nα,nβ} is convex and belongs to class K_∞ (so is monotonically increasing), κ_n is concave and belongs to class K_∞ and so is subadditive. Therefore, using Proposition A.2,

∆ ≤ E[κ_n(‖∑_{i=1}^n f̂_i − ∑_{i=1}^n f_i‖)] ≤ E[κ_n(∑_{i=1}^n ‖f̂_i − f_i‖)] ≤ ∑_{i=1}^n E[κ_n(‖f̂_i − f_i‖)].

Then, by the non-expansiveness of projection, we have

∆ ≤ ∑_{i=1}^n E[κ_n(‖f_i^s − f_i‖)] ≤ ∑_{i=1}^n E[κ_n(‖f_i^s − f̃_i‖) + κ_n(‖f̃_i − f_i‖)] ≤ ∑_{i=1}^n (κ_n(ε̄_i) + E[κ_n(‖η_i‖)]).    (18)

By invoking Jensen's inequality twice, for all i ∈ {1, . . . , n},

E[κ_n(‖η_i‖)] ≤ κ_n(E[‖η_i‖]) = κ_n(E[√(‖η_i‖²)]) ≤ κ_n(√(E[‖η_i‖²])) = κ_n(√(∑_{k=1}^∞ b_{i,k}²)) = κ_n(γ_i √ζ(2p_i)).    (19)

The result follows from (18) and (19).

The following result describes the trade-off between accuracy and privacy. The proof follows by direct substitution.

Corollary VI.3. (Privacy-accuracy trade-off): Under the hypotheses of Theorem VI.2, if p_i = q_i/2 in (11) for all i, then

|E[x̂*] − x*| ≤ ∑_{i=1}^n [κ_n(ζ(q_i)/ε_i) + κ_n(ε̄_i)].    (20)

In Corollary VI.3, q_i and ε_i are chosen independently, which in turn determines the value of γ_i according to (12). Also, it is clear from (20) that, in order for the accuracy of the coordination algorithm not to be affected by the smoothening step, each agent i ∈ {1, . . . , n} has to take the value of ε̄_i sufficiently small so that it is negligible relative to ζ(2p_i)/ε_i. In particular, this procedure can be executed for any arbitrarily large value of ε_i, so that in case of no privacy requirements at all, perfect accuracy is recovered, as specified in Problem 1.

Remark VI.4. (Accuracy bound for sufficiently large domains): One can obtain a less conservative bound than (20) on the accuracy of the proposed class of algorithms if the minimizers of all the agents' objective functions are sufficiently far from the boundary of X. This can be made precise via Corollary A.3. If the aggregate objective function satisfies (25) and the amount of noise is also sufficiently small so that the minimizer of the sum of the perturbed objective functions satisfies this condition, then, invoking Corollary A.3, one can obtain

|E[x̂*] − x*| ≤ (L/n²) ∑_{i=1}^n [γ_i^{2/(d+4)} ζ(2p_i)^{1/(d+4)} + ε̄_i^{2/(d+4)}] = (L/n²) ∑_{i=1}^n [(ζ(q_i)/ε_i)^{2/(d+4)} + ε̄_i^{2/(d+4)}],

where the equality holds under the assumption that p_i = q_i/2 in (11) for all i ∈ {1, . . . , n}. •

VII. SIMULATIONS

In this section, we report simulation results for our algorithm design over a group of n = 10 agents. The individual objective functions are two-dimensional quadratic functions, defined on D = [−50, 50]², with minimizers chosen uniformly randomly on the unit circle X = S¹. The orthonormal basis of L_2(D) is constructed from the Gram–Schmidt orthogonalization of the Taylor functions and the series is truncated to the fourth order, resulting in a 15-dimensional coefficient space. This truncation also acts as the smoothening step described in Section VI-A1. We evaluate the projection operator in (17c) by numerically solving the convex optimization problem min_{f̂_i∈S} ‖f̂_i − f_i^s‖, where f_i^s is the result of the truncation. Rather than implementing any specific distributed coordination algorithm, we use an iterative interior-point algorithm on f̂ and f to find the perturbed x̂* and original x* optimizers, respectively (these points correspond to the asymptotic behavior of any provably correct distributed optimization algorithm with the perturbed and original functions, respectively).
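The basis construction just described can be reproduced numerically; the sketch below performs Gram–Schmidt orthonormalization of the 15 monomials of total degree at most four on D = [−50, 50]² using a quadrature grid. The grid size and all names are assumptions made for illustration.

```python
import numpy as np
from itertools import product

# Numerical sketch of the basis construction in Section VII: Gram-Schmidt
# orthonormalization (in L2(D), D = [-50, 50]^2) of the 15 two-dimensional
# monomials of total degree at most 4, evaluated on a quadrature grid.

N = 201
xs = np.linspace(-50.0, 50.0, N)
X, Y = np.meshgrid(xs, xs)
dA = (xs[1] - xs[0])**2                      # area element of the grid

def inner(f, g):
    """Approximate L2(D) inner product by a Riemann sum on the grid."""
    return np.sum(f * g) * dA

# Monomials x^a * y^b with a + b <= 4 (15 of them).
exponents = [(a, b) for a, b in product(range(5), repeat=2) if a + b <= 4]
monomials = [X**a * Y**b for a, b in exponents]

basis = []
for m in monomials:                          # Gram-Schmidt in L2(D)
    e = m - sum(inner(m, e_k) * e_k for e_k in basis)
    basis.append(e / np.sqrt(inner(e, e)))

print(len(basis))                            # 15-dimensional coefficient space
print(round(inner(basis[0], basis[3]), 6))   # approximately zero (orthogonality)
```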

The privacy levels are chosen the same for all agents, i.e., ε = (ε, ε, · · · , ε), and ε is swept logarithmically over [10⁻³, 10²]. The design parameters are set to α = 1, β = 4, q_i = 2p_i = 1.1, and γ_i = √ζ(2(q_i − p_i))/ε, for all i ∈ {1, . . . , n}. For each value of ε, the simulations are repeated 50 times to capture the stochasticity of the solutions. Figure 2 illustrates the error |x̂* − x*| as a function of ε, together with the best linear fit of log |x̂* − x*| against log ε, and the upper bound obtained in Corollary VI.3. The conservative nature of this upper bound can be explained by noting the approximations leading to the computation of L in Proposition A.2, suggesting there is room for refining this bound. Figure 2 shows that accuracy keeps improving as the privacy requirement is relaxed, in contrast with the behavior of message-perturbing algorithms, cf. Figure 1. It is important to mention that the respective error values for a fixed ε cannot be compared between Figures 1 and 2 because, in [15], ε is defined as the total exponent in (5), i.e., ε_{i_0}‖f_{i_0}^(1) − f_{i_0}^(2)‖_V. Nevertheless, if we divide all the values of ε in Figure 1 by the maximum of ‖f_{i_0}^(1) − f_{i_0}^(2)‖_V over S², which turns out to be about 10⁶ in our example, one can see that the algorithm in Figure 2 achieves better pointwise accuracy.

Fig. 2. Privacy-accuracy trade-off for the proposed class of distributed, differentially private coordination algorithms. The circles, dotted line, and solid line illustrate the simulation results, the best linear fit of the simulation results in logarithmic scale, and the upper bound of Corollary VI.3, respectively.

VIII. CONCLUSIONS AND FUTURE WORK

We have studied the distributed optimization of the sum of local strongly convex objective functions by a group of agents subject to the requirement of differential privacy. We have established the incompatibility between differential privacy and the asymptotic stability of the underlying (noise-free) dynamics for coordination strategies based on the perturbation of the messages among agents. This has motivated us to develop a new framework for differentially private release of functions from the L2 space that can be applied to arbitrary separable Hilbert spaces. We have also carefully described how perturbed functions with the desired regularity requirements can be provided to any distributed coordination algorithm while preserving privacy. Finally, we have characterized the accuracy of the resulting class of distributed, differentially private coordination algorithms. Future work will analyze how to relax the smoothness and convexity assumptions on the local objective functions and the compactness hypothesis on their domains, compare the numerical efficiency of different orthonormal bases for the L2 space, and investigate the least amount of noise required to satisfy the requirement of differential privacy for fixed values of the parameter ε.

REFERENCES

[1] D. P. Bertsekas and J. N. Tsitsiklis, Parallel and Distributed Computation: Numerical Methods. Athena Scientific, 1997.

[2] A. Nedic and A. Ozdaglar, "Distributed subgradient methods for multi-agent optimization," IEEE Transactions on Automatic Control, vol. 54, no. 1, pp. 48–61, 2009.

[3] M. Zhu and S. Martínez, "On distributed convex optimization under inequality and equality constraints," IEEE Transactions on Automatic Control, vol. 57, no. 1, pp. 151–164, 2012.

[4] B. Gharesifard and J. Cortés, "Distributed continuous-time convex optimization on weight-balanced digraphs," IEEE Transactions on Automatic Control, vol. 59, no. 3, pp. 781–786, 2014.

[5] J. C. Duchi, A. Agarwal, and M. J. Wainwright, "Dual averaging for distributed optimization: convergence analysis and network scaling," IEEE Transactions on Automatic Control, vol. 57, no. 3, pp. 592–606, 2012.

[6] C. Dwork, F. McSherry, K. Nissim, and A. Smith, "Calibrating noise to sensitivity in private data analysis," in Proceedings of the 3rd Theory of Cryptography Conference, New York, NY, Mar. 2006, pp. 265–284.

[7] C. Dwork, "Differential privacy," in Proceedings of the 33rd International Colloquium on Automata, Languages and Programming (ICALP), Venice, Italy, July 2006, pp. 1–12.

[8] C. Dwork and A. Roth, "The algorithmic foundations of differential privacy," Foundations and Trends in Theoretical Computer Science, vol. 9, no. 3-4, pp. 211–407, Aug. 2014.

[9] J. Zhang, Z. Zhang, X. Xiao, Y. Yang, and M. Winslett, "Functional mechanism: Regression analysis under differential privacy," Proceedings of the VLDB Endowment, vol. 5, no. 11, pp. 1364–1375, July 2012.

[10] R. Hall, A. Rinaldo, and L. Wasserman, "Differential privacy for functions and functional data," The Journal of Machine Learning Research, vol. 14, no. 1, pp. 703–727, Jan. 2013.

[11] K. Chaudhuri, C. Monteleoni, and A. D. Sarwate, "Differentially private empirical risk minimization," The Journal of Machine Learning Research, vol. 12, pp. 1069–1109, 2011.

[12] A. Rajkumar and S. Agarwal, "A differentially private stochastic gradient descent algorithm for multiparty classification," in Proceedings of the 15th International Conference on Artificial Intelligence and Statistics, vol. 22, 2012, pp. 933–941.

[13] S. Song, K. Chaudhuri, and A. D. Sarwate, "Stochastic gradient descent with differentially private updates," in Proceedings of the Global Conference on Signal and Information Processing. IEEE, Dec. 2013, pp. 245–248.

[14] K. Chaudhuri, D. J. Hsu, and S. Song, "The large margin mechanism for differentially private maximization," in Advances in Neural Information Processing Systems 27. Curran Associates, Inc., 2014, pp. 1287–1295.

[15] Z. Huang, S. Mitra, and N. Vaidya, "Differentially private distributed optimization," in Proceedings of the 2015 International Conference on Distributed Computing and Networking, Pilani, India, Jan. 2015.

[16] S. Han, U. Topcu, and G. J. Pappas, "Differentially private distributed constrained optimization," IEEE Transactions on Automatic Control, 2014, submitted. [Online]. Available: http://arxiv.org/abs/1411.4105

[17] M. T. Hale and M. Egerstedt, "Differentially private cloud-based multi-agent optimization with constraints," in American Control Conference, Chicago, IL, July 2015, pp. 1235–1240.

[18] K. Chatzikokolakis, M. E. Andrés, N. E. Bordenabe, and C. Palamidessi, "Broadening the scope of differential privacy using metrics," in Privacy Enhancing Technologies, ser. Lecture Notes in Computer Science. Springer Berlin Heidelberg, 2013, vol. 7981, pp. 82–102.

[19] E. Kreyszig, Introductory Functional Analysis with Applications. John Wiley & Sons, 1989.

[20] C. Cai and A. R. Teel, "Results on input-to-state stability for hybrid systems," in 44th IEEE Conference on Decision and Control and European Control Conference, Seville, Spain, Dec. 2005, pp. 5403–5408.

[21] Z.-P. Jiang and Y. Wang, "Input-to-state stability for discrete-time nonlinear systems," Automatica, vol. 37, no. 6, pp. 857–869, 2001.

[22] H. Jeffreys and B. S. Jeffreys, Methods of Mathematical Physics, 3rd ed. Cambridge University Press, 1999.

[23] J. Le Ny and G. J. Pappas, "Differentially private filtering," IEEE Transactions on Automatic Control, vol. 59, no. 2, pp. 341–354, 2014.

[24] R. T. Rockafellar, Convex Analysis. Princeton University Press, 1970.

[25] R. E. Williamson and H. F. Trotter, Multivariable Mathematics, 4th ed. Pearson Education, Inc., June 2003.

APPENDIX A
K-LIPSCHITZ PROPERTY OF THE argmin MAP

Here we establish the Lipschitzness of the argmin map under suitable assumptions. This is a strong result of independent interest given that argmin is not even continuous for arbitrary C2 functions. Our accuracy analysis for the proposed class of distributed, differentially private coordination algorithms in Section VI-B relies on this result. We begin with an auxiliary result stating a geometric property of balls contained in convex, compact domains.

Lemma A.1. (Minimum portion of balls contained in convex compact domains): Assume D ⊂ R^d is convex, compact, and has nonempty interior, and let rD > 0 denote its inradius. Then, there exists λD ∈ (0, 1) such that

m(B(x, r) ∩ D) ≥ λD m(B(x, r)),

for any x ∈ D and r ≤ rD.

Proof: Let B(cD, rD) be the inball of D, i.e., the largest ball contained in D. If this ball is not unique, we pick one arbitrarily. Since D° ≠ ∅, rD > 0. Let RD be the radius of the smallest ball centered at cD that contains D. Since D is compact, RD < ∞. For any x ∈ D that is on or outside of B(cD, rD), let Σ be the intersection of B(cD, rD) and the hyperplane passing through cD and perpendicular to cD − x. Consider the cone C = conv(Σ ∪ {x}), where conv denotes the convex hull. Since D is convex, C ⊆ D. Note that C has half angle θx = tan⁻¹(rD/|x − cD|), so the solid angle at its apex is

Ω_{θx} = (2π^{(d−1)/2} / Γ((d−1)/2)) ∫_0^{θx} sin^{d−2}(φ) dφ.   (21)

Therefore, for any r ≤ rD, the proportion Ω_{θx}/Ωd of B(x, r) is contained in D, where Ωd is the total d-dimensional solid angle given by

Ωd = 2π^{d/2} / Γ(d/2).

For any x inside B(cD, rD), the same argument holds with

θx = max_{|x−cD| ≥ rD} tan⁻¹(rD/|x − cD|) = π/4.

Therefore, for arbitrary x ∈ D, the statement holds with

λD = min_{x∈D} Ω_{θx}/Ωd = Ω_{tan⁻¹(rD/RD)} / Ωd.
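The following Python sketch illustrates Lemma A.1 numerically for a concrete planar example, the rectangle D = [−1, 1] × [−0.5, 0.5]; the domain and sample sizes are illustrative assumptions for this sketch, not a case treated in the paper. For d = 2, the formula above reduces to λD = tan⁻¹(rD/RD)/π.

```python
import numpy as np
from math import atan, pi

# Illustrative planar domain: D = [-1, 1] x [-0.5, 0.5], with inball center c_D = 0,
# inradius r_D = 0.5, and circumradius R_D = sqrt(1.25), so lambda_D = atan(r_D/R_D)/pi.
r_D, R_D = 0.5, np.sqrt(1.25)
lam_D = atan(r_D / R_D) / pi

rng = np.random.default_rng(1)

def in_D(pts):
    return (np.abs(pts[:, 0]) <= 1.0) & (np.abs(pts[:, 1]) <= 0.5)

worst_ratio = 1.0
for _ in range(300):
    x = rng.uniform([-1.0, -0.5], [1.0, 0.5])     # ball center inside D
    r = rng.uniform(1e-3, r_D)                    # radius at most the inradius
    # sample uniformly from the disk B(x, r) and estimate m(B(x, r) ∩ D) / m(B(x, r))
    u, ang = rng.uniform(0, 1, 20_000), rng.uniform(0, 2 * pi, 20_000)
    pts = x + r * np.sqrt(u)[:, None] * np.stack([np.cos(ang), np.sin(ang)], axis=1)
    worst_ratio = min(worst_ratio, in_D(pts).mean())

print(f"lambda_D = {lam_D:.3f}, smallest observed m(B ∩ D)/m(B) = {worst_ratio:.3f}")
```

As the lemma predicts, the smallest observed ratio (attained near the corners of the rectangle) stays above λD.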

We are now ready to establish the K-Lipschitzness of the argmin map.

Proposition A.2. (K-Lipschitzness of argmin): For any two functions f, g ∈ S,

| argmin_{x∈X} f − argmin_{x∈X} g | ≤ κ_{α,β}(‖f − g‖),   (22)

where κ_{α,β} ∈ K∞ is given by

κ⁻¹_{α,β}(r) = ( (α² π^{d/2} λD / (d 2^{d+3} Γ(d/2))) (rD/dD)^d r⁴ µ_{α,β}(r)^d )^{1/2},   ∀r ∈ [0,∞),

rD and λD are as in Lemma A.1, dD is the diameter of D, and µ_{α,β} ∈ K∞ is defined for all r ∈ [0,∞) by

µ_{α,β}(r) = α r² / ( 2 √( αβ r² + 2(β + α) u r + 4u² ) ).

Proof: We consider the case where a = argmin_{x∈X} f(x) ≠ argmin_{x∈X} g(x) = b, since the statement is trivial otherwise. Let ma = f(a), mb = g(b), m = ma − mb, ua = ∇f(a), and ub = ∇g(b). Without loss of generality, assume m ≥ 0. Define,

fl(x) = (α/2)|x − a|² + ua^T(x − a) + ma,

gu(x) = (β/2)|x − b|² + ub^T(x − b) + mb,

for all x ∈ D. Since f, g ∈ S, we can integrate ∇²f ≥ αId and ∇²g ≤ βId twice to get

∀x ∈ D:   fl(x) ≤ f(x) and g(x) ≤ gu(x).   (23)

It follows that, for all x ∈ D,

|f(x) − g(x)| ≥ [fl(x) − gu(x)]+ ≥ [fl(x) − gu(x) − m]+,

where [z]+ = max{z, 0} for any z ∈ R. After some computations, one can get

fl(x) − gu(x) − m = −((β − α)/2) (|x − c|² − r²),

where

c = (βb − αa + ua − ub) / (β − α),

r² = αβ|a − b|²/(β − α)² + |ua − ub|²/(β − α)² + 2(βua − αub)^T(b − a)/(β − α)².
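As a quick cross-check of this completing-the-square identity, the following sympy snippet verifies it symbolically in one dimension (the multivariate case is analogous); the snippet is only an illustration and not part of the original argument, and the symbols are those defined in the proof.

```python
import sympy as sp

# One-dimensional symbolic check that fl - gu - m = -(beta - alpha)/2 * (|x - c|^2 - r^2)
x, a, b, ua, ub, ma, mb, alpha, beta = sp.symbols('x a b u_a u_b m_a m_b alpha beta', real=True)
fl = alpha / 2 * (x - a)**2 + ua * (x - a) + ma
gu = beta / 2 * (x - b)**2 + ub * (x - b) + mb
m = ma - mb
c = (beta * b - alpha * a + ua - ub) / (beta - alpha)
r2 = (alpha * beta * (a - b)**2 + (ua - ub)**2
      + 2 * (beta * ua - alpha * ub) * (b - a)) / (beta - alpha)**2
diff = sp.simplify(fl - gu - m - (-(beta - alpha) / 2 * ((x - c)**2 - r2)))
print(diff)  # prints 0, confirming the displayed identity
```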

Therefore, the region where fl − gu − m ≥ 0 is B(c, r). Next, we seek to identify a subset inside this ball on which we can determine a strictly positive lower bound of fl − gu that depends on the difference |a − b|. To this effect, note that b ∈ B(c, r), since

r² − |c − b|² = (α/(β − α)) |a − b|² + (2/(β − α)) ua^T(b − a),

and, by the convexity of the problem, ua^T(b − a) ≥ 0. Let r̄ = r − |c − b| > 0 be the radius of the largest ball centered at b and contained in B(c, r). We have

r² − |c − b|² = (r − |c − b|)(r + |c − b|) ≥ (α/(β − α)) |a − b|²

⇒ r̄ ≥ (α/(β − α)) |a − b|² / (r + |c − b|) ≥ (α/(β − α)) |a − b|² / (2r) ≥ µ_{α,β}(|a − b|),

where in the last inequality we have used |ua|, |ub| ≤ u.
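For completeness, the last inequality can be made explicit as follows: bounding |ua − ub|² ≤ 4u² and 2(βua − αub)^T(b − a) ≤ 2(β + α)u|a − b| in the expression for r² gives
\[
r \;\le\; \frac{1}{\beta-\alpha}\sqrt{\alpha\beta\,|a-b|^2 + 2(\beta+\alpha)\,u\,|a-b| + 4u^2},
\]
so that
\[
\frac{\alpha\,|a-b|^2}{2(\beta-\alpha)\,r} \;\ge\; \frac{\alpha\,|a-b|^2}{2\sqrt{\alpha\beta\,|a-b|^2 + 2(\beta+\alpha)\,u\,|a-b| + 4u^2}} \;=\; \mu_{\alpha,\beta}(|a-b|).
\]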

Next, note that, for all x ∈ B(c, (r + |c − b|)/2),

fl(x) − gu(x) − m ≥ −((β − α)/2) ( (r² + |c − b|² + 2r|c − b|)/4 − r² ).   (24)

Using the bound 2r|c − b| ≤ r² + |c − b|², we get after some simplifications

(fl − gu)(x) − m ≥ (α/4)|a − b|² + (1/2) ua^T(b − a) ≥ (α/4)|a − b|²,


for all x ∈ B(b, r̄/2) ⊂ B(c, (r + |c − b|)/2). Therefore,

‖f − g‖² = ∫_D |f(x) − g(x)|² dx

 ≥ ∫_D ([fl(x) − gu(x) − m]+)² dx

 ≥ ∫_{B(b, r̄/2) ∩ D} (fl(x) − gu(x) − m)² dx

 ≥ (α²/16) |a − b|⁴ m(B(b, r̄/2) ∩ D)

 ≥ (α²/16) |a − b|⁴ m(B(b, µ_{α,β}(|a − b|)/2) ∩ D).

Now, we invoke Lemma A.1 to lower bound the last term. Note that µ_{α,β}(|a − b|) ≤ |a − b| ≤ dD for all a, b ∈ D. Therefore, (rD/dD) µ_{α,β}(|a − b|)/2 ≤ min{rD, µ_{α,β}(|a − b|)/2}, so by Lemma A.1,

‖f − g‖² ≥ (α²/16) |a − b|⁴ m( B(b, rD µ_{α,β}(|a − b|)/(2dD)) ∩ D )

 ≥ (α²/16) |a − b|⁴ λD (2π^{d/2}/(d Γ(d/2))) (rD^d/(2^d dD^d)) (µ_{α,β}(|a − b|))^d,

which yields (22).
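To see Proposition A.2 in action, the following Python sketch draws two random strongly convex quadratics on the square [−1, 1]², approximates ‖f − g‖ on a grid, and compares the distance between their constrained minimizers with κ_{α,β}(‖f − g‖), obtained by numerically inverting the expression for κ⁻¹_{α,β} reconstructed above. All concrete choices (domain, α, β, grid resolution) are illustrative assumptions for this sketch, not values from the paper.

```python
import numpy as np
from math import gamma, pi
from scipy.optimize import brentq, minimize

d = 2
alpha, beta = 1.0, 4.0
lo_, hi_ = -1.0, 1.0                              # D = X = [-1, 1]^2
r_D, R_D, d_D = 1.0, np.sqrt(2.0), 2.0 * np.sqrt(2.0)
lam_D = np.arctan(r_D / R_D) / pi                 # Lemma A.1 constant for d = 2

def make_quadratic(rng):
    """Random quadratic with Hessian eigenvalues in [alpha, beta] and minimizer in D."""
    Q, _ = np.linalg.qr(rng.standard_normal((d, d)))
    A = Q @ np.diag(rng.uniform(alpha, beta, d)) @ Q.T
    z = rng.uniform(lo_, hi_, d)
    return (lambda x: 0.5 * (x - z) @ A @ (x - z)), (lambda x: A @ (x - z))

rng = np.random.default_rng(0)
(f, grad_f), (g, grad_g) = make_quadratic(rng), make_quadratic(rng)

# grid approximation of ||f - g|| on D and of the gradient bound u
pts = np.stack(np.meshgrid(*[np.linspace(lo_, hi_, 60)] * d), axis=-1).reshape(-1, d)
cell = ((hi_ - lo_) / 59) ** d
l2 = np.sqrt(sum((f(x) - g(x)) ** 2 for x in pts) * cell)
u = max(max(np.linalg.norm(grad_f(x)), np.linalg.norm(grad_g(x))) for x in pts)

def argmin_on_X(fun):
    return minimize(fun, np.zeros(d), bounds=[(lo_, hi_)] * d).x

gap = np.linalg.norm(argmin_on_X(f) - argmin_on_X(g))

def mu(r):
    return alpha * r**2 / (2 * np.sqrt(alpha * beta * r**2 + 2 * (beta + alpha) * u * r + 4 * u**2))

def kappa_inv(r):  # reconstructed expression for kappa^{-1}_{alpha,beta}
    c = alpha**2 * pi**(d / 2) * lam_D / (d * 2**(d + 3) * gamma(d / 2))
    return np.sqrt(c * (r_D / d_D)**d * r**4 * mu(r)**d)

kappa_of_l2 = brentq(lambda r: kappa_inv(r) - l2, 1e-12, 1e6)  # kappa(||f - g||) by bisection
print(f"|argmin f - argmin g| = {gap:.4f} <= kappa(||f - g||) = {kappa_of_l2:.4f}")
```

The bound is conservative, so the printed gap is typically several orders of magnitude below κ(‖f − g‖).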

The next result shows that if the minimizers of f and g are sufficiently far from the boundary of D, then their gradients need not be uniformly bounded and yet one can obtain a less conservative characterization of the K-Lipschitz property of the argmin map.

Corollary A.3. (K-Lipschitzness of argmin for sufficiently large domains): If f and g belong to S1 = {h ∈ S0 | αId ≤ ∇²h(x) ≤ βId, ∀x ∈ D°} and

argmin_{x∈X} f(x), argmin_{x∈X} g(x) ∈ B(cD, r̃D) ∩ X°,   (25)

where r̃D = ((β − α)/(α + β + 2√(αβ))) rD and B(cD, rD) ⊂ D, then

| argmin_{x∈X} f − argmin_{x∈X} g | ≤ L ‖f − g‖^{2/(d+4)},

where

L = ( d(d + 2)(d + 4)(β − α)^{d+2} Γ(d/2) / (4(αβ)^{d/2+2} π^{d/2}) )^{1/(d+4)}.

Proof: The proof follows the same lines as the proof of Proposition A.2 (and we use here the same notation). Since the minimizers of f and g lie in the interior of X, ua = ub = 0. The main difference here is that, due to (25), we have for all x ∈ B(c, r) that

|x − cD| ≤ |x − c| + |c − b| + |b − cD| ≤ ((α + √(αβ))/(β − α)) 2r̃D + r̃D = rD,

so B(c, r) ⊂ D. Therefore, one can integrate (fl − gu − m)² on the whole ball B(c, r) instead of its lower bound (24) on the smaller ball B(c, (r + |c − b|)/2). To explicitly calculate the value of the resulting integral, one can use the change of variables xi = ci + r yi^{1/2}, i ∈ {1, . . . , d}, and then use the formula

∫_{Sd} y1^{a1} · · · yd^{ad} dy = Γ(a1 + 1) · · · Γ(ad + 1) / Γ(a1 + · · · + ad + d + 1),   ∀ai > −1,

where Sd = {y ∈ R^d | Σ_{i=1}^d yi ≤ 1 and yi ≥ 0, i ∈ {1, . . . , d}}. The result then follows from straightforward simplifications of the integral.
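As an illustration of this last step, the following Python sketch cross-checks, by Monte Carlo sampling, the closed form of the ball integral that these simplifications produce. The closed form in the comment is a standard spherical-coordinates computation consistent with the constant L above; it is an added cross-check, not a formula quoted from the paper.

```python
import numpy as np
from math import gamma, pi

# Cross-check of I(d, r) = \int_{B(0,r)} (r^2 - |x|^2)^2 dx
#                        = 16 pi^{d/2} r^{d+4} / (d (d+2) (d+4) Gamma(d/2)).

def ball_integral_closed_form(d, r):
    return 16 * pi**(d / 2) * r**(d + 4) / (d * (d + 2) * (d + 4) * gamma(d / 2))

def ball_integral_monte_carlo(d, r, n=200_000, seed=0):
    rng = np.random.default_rng(seed)
    x = rng.uniform(-r, r, size=(n, d))           # sample the bounding cube [-r, r]^d
    sq = np.sum(x**2, axis=1)
    vals = np.where(sq <= r**2, (r**2 - sq)**2, 0.0)
    return vals.mean() * (2 * r)**d               # cube volume times average integrand

for d in (1, 2, 3):
    r = 0.7
    print(d, ball_integral_closed_form(d, r), ball_integral_monte_carlo(d, r))
```

Substituting r = √(αβ)|a − b|/(β − α) into ‖f − g‖² ≥ ((β − α)/2)² I(d, r) and solving for |a − b| recovers the exponent 2/(d + 4) and the constant L stated above.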