
  • Systematic Analysis of Distributed Optimization Algorithms over Jointly-Connected Networks

    Bryan Van Scoy, Miami University

    Laurent Lessard, Northeastern University

    IEEE Conference on Decision and Control, December 16, 2020

  • minimize over x1, …, xn:   ∑_{i=1}^n fi(xi)

    subject to   x1 = x2 = ⋯ = xn

    [Figure: example network of five agents; agent i holds the local function fi and the local variable xi.]

    W = [ 1/3  1/3   0    0   1/3
           0   1/2   0   1/2   0
           0   1/6  2/3  1/6   0
          1/3   0   1/3   0   1/3
          1/3   0    0   1/3  1/3 ]

    Want each agent to compute the global optimizer x⋆ by communicating with local neighbors and performing local computations.

    1
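To make the setup concrete, here is a small Julia check (not part of the talk) that the example weight matrix above is doubly stochastic, and that for illustrative quadratic local functions fi(x) = (ai/2)(x − bi)² the consensus-constrained problem is solved by the minimizer of the sum; the curvatures ai and offsets bi are arbitrary choices.

```julia
using LinearAlgebra

# Example weight matrix from the slide above.
W = [ 1/3 1/3 0   0   1/3;
      0   1/2 0   1/2 0;
      0   1/6 2/3 1/6 0;
      1/3 0   1/3 0   1/3;
      1/3 0   0   1/3 1/3 ]

# W is doubly stochastic: every row and every column sums to one.
@assert all(isapprox.(sum(W, dims = 1), 1))   # column sums
@assert all(isapprox.(sum(W, dims = 2), 1))   # row sums

# Illustrative quadratic local functions fi(x) = (ai/2)(x − bi)^2 (not from the talk).
a = [1.0, 2.0, 5.0, 1.0, 10.0]
b = [1.0, -2.0, 0.5, 3.0, -1.0]

# Under the consensus constraint x1 = … = xn the problem reduces to minimizing
# ∑ fi(x) over a single x; for these quadratics the minimizer is a weighted average.
xstar = sum(a .* b) / sum(a)
@assert isapprox(sum(a .* (xstar .- b)), 0; atol = 1e-12)   # ∑ ∇fi(x⋆) = 0
println("x⋆ = ", xstar)
```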

  • DIGing

    xi(k+1) = ∑_{j=1}^n Wij(k) xj(k) − α yi(k)

    yi(k+1) = ∑_{j=1}^n Wij(k) yj(k) + ∇fi(xi(k+1)) − ∇fi(xi(k))

    • analyzed in (Nedić, Olshevsky, and Shi, 2017) and (Qu and Li, 2018)
    • linear convergence over time-varying networks: ‖xi(k) − x⋆‖ = O(ρ^k)

    iterations to converge ∝ −1/log ρ

    2
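As a sanity check of the update above, here is a minimal Julia simulation of DIGing (not the authors' code) on the 5-agent example with illustrative quadratic local functions fi(x) = (ai/2)(x − bi)² and a fixed weight matrix; the stepsize α = 0.005 is an arbitrary small value, not the optimized choice discussed later.

```julia
using LinearAlgebra

# 5-agent example weight matrix from the problem-setup slide (held fixed over time).
W = [ 1/3 1/3 0   0   1/3;
      0   1/2 0   1/2 0;
      0   1/6 2/3 1/6 0;
      1/3 0   1/3 0   1/3;
      1/3 0   0   1/3 1/3 ]

# Illustrative quadratic local functions fi(x) = (ai/2)(x − bi)^2 (not from the talk).
a = [1.0, 2.0, 5.0, 1.0, 10.0]
b = [1.0, -2.0, 0.5, 3.0, -1.0]
grad(i, x) = a[i] * (x - b[i])              # ∇fi(x)
xstar = sum(a .* b) / sum(a)                # global optimizer of ∑ fi

function diging(α, iters)
    n = length(a)
    x = zeros(n)                            # xi(0) = 0
    y = [grad(i, x[i]) for i in 1:n]        # yi(0) = ∇fi(xi(0))
    for _ in 1:iters
        xnew = W * x .- α .* y                                        # xi(k+1)
        y = W * y .+ [grad(i, xnew[i]) - grad(i, x[i]) for i in 1:n]  # gradient tracking
        x = xnew
    end
    return x
end

x = diging(0.005, 10_000)
println("max |xi − x⋆| = ", maximum(abs.(x .- xstar)))   # linear convergence: O(ρ^k)
```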

  • DIGing

    xi(k+1) = ∑_{j=1}^n Wij(k) xj(k) − α yi(k)

    yi(k+1) = ∑_{j=1}^n Wij(k) yj(k) + ∇fi(xi(k+1)) − ∇fi(xi(k))

    Issues:

    • the bound grows with the number of agents
    • the bound does not apply to large stepsizes
    • the analysis is specific to DIGing

    3

  • [Block diagram: the algorithm (matrices A, Bu, Bv; Cy, Dyu, Dyv; Cz, Dzu, Dzv; Fx, Fu), the local functions (condition ratio κ), and the communication network (σ, B) feed into the systematic analysis tool, an LMI, which outputs a worst-case performance bound ρ.]

    main contribution = sharp and systematic analysis

    4

  • [Same block diagram repeated.]

    main contribution = sharp and systematic analysis

    5

  • Function assumptions

    Each local objective function fi is L-smooth and m-strongly convex with respect to the global optimizer.

    [Figure: example functions fi(x) for condition ratios κ = 1, 2, 4, 8.]

    The condition ratio κ = L/m characterizes how much the curvature varies.

    For all x,   (∇fi(x) − ∇fi(x⋆) − m (x − x⋆))ᵀ (∇fi(x) − ∇fi(x⋆) − L (x − x⋆)) ≤ 0

    6
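The inequality above is easy to check numerically. Below is a small Julia sketch doing so for an illustrative scalar quadratic fi(x) = (ai/2)(x − bi)²; the constants m, L, ai, bi and the reference point are arbitrary choices, not values from the talk.

```julia
# Check the sector condition
#   (∇fi(x) − ∇fi(x⋆) − m(x − x⋆))ᵀ (∇fi(x) − ∇fi(x⋆) − L(x − x⋆)) ≤ 0
# for a scalar quadratic fi(x) = (ai/2)(x − bi)^2, whose gradient is ai(x − bi).
# Any curvature ai with m ≤ ai ≤ L satisfies it; these numbers are illustrative only.

m, L = 1.0, 10.0            # strong convexity and smoothness constants (κ = L/m = 10)
ai, bi = 4.0, 2.0           # curvature and minimizer of this local function
∇fi(x) = ai * (x - bi)

xref = 0.7                  # for a quadratic, the inequality holds for any reference point
sector(x) = (∇fi(x) - ∇fi(xref) - m * (x - xref)) * (∇fi(x) - ∇fi(xref) - L * (x - xref))

@assert all(sector(x) ≤ 0 for x in -10:0.1:10)
println("sector condition holds on the test grid")
```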

  • [Block diagram repeated: algorithm, local functions (κ), and communication network (σ, B) → LMI → worst-case performance bound ρ.]

    7

  • Spectral gap

    The spectral gap characterizes the connectivity of a graph.

    [Figure: five example graphs with spectral gaps 0.0, 0.667, 0.707, 0.809, and 1.0.]

    spectral gap = ‖(1/n) 1 1ᵀ − W‖₂

    8
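A one-line Julia implementation of this quantity (a sketch, not from the talk, using only LinearAlgebra), applied to the 5-agent weight matrix from the problem-setup slide and to two extreme cases:

```julia
using LinearAlgebra

# Spectral gap ‖(1/n)·11ᵀ − W‖₂ of an n×n weight matrix W, as defined on this slide.
spectral_gap(W) = opnorm(ones(size(W, 1), size(W, 1)) / size(W, 1) - W, 2)

# 5-agent example weight matrix from the problem-setup slide.
W = [ 1/3 1/3 0   0   1/3;
      0   1/2 0   1/2 0;
      0   1/6 2/3 1/6 0;
      1/3 0   1/3 0   1/3;
      1/3 0   0   1/3 1/3 ]

println("σ for the example W:      ", spectral_gap(W))
println("complete graph (uniform): ", spectral_gap(ones(4, 4) / 4))      # 0.0, best connectivity
println("no communication (W = I): ", spectral_gap(Matrix(1.0I, 4, 4)))  # 1.0, disconnected
```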

  • Network assumptions

    sparsity        Wij(k) = 0 if agent i does not receive information from agent j at time k

    balanced        W(k) 1 = W(k)ᵀ 1 = 1

    spectrum        the spectral gap of W(k) is less than or equal to one

    joint spectrum  for some fixed B, the spectral gap of W(k) W(k+1) ⋯ W(k+B−1), raised to the power 1/B, is less than or equal to σ < 1

    The spectral gap σ characterizes the connectivity of the time-varying network averaged over B iterations.

    9
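The joint-spectrum condition can also be checked numerically. The Julia sketch below (an illustration, not from the talk) uses a toy 3-agent sequence in which each individual W(k) is disconnected, yet the product over a window of B = 2 steps has spectral gap below one.

```julia
using LinearAlgebra

spectral_gap(W) = opnorm(ones(size(W, 1), size(W, 1)) / size(W, 1) - W, 2)

# Joint spectral gap over a window of B consecutive weight matrices:
# the spectral gap of W(k) W(k+1) ⋯ W(k+B−1), raised to the power 1/B.
joint_spectral_gap(Ws) = spectral_gap(reduce(*, Ws))^(1 / length(Ws))

# Toy jointly-connected sequence (illustrative): at even steps only agents {1,2}
# average, at odd steps only agents {2,3} average.
W_even = [ 1/2 1/2 0; 1/2 1/2 0; 0 0 1.0 ]
W_odd  = [ 1.0 0 0; 0 1/2 1/2; 0 1/2 1/2 ]

println(spectral_gap(W_even))                  # 1.0: disconnected at any single step
println(spectral_gap(W_odd))                   # 1.0
println(joint_spectral_gap([W_even, W_odd]))   # ≈ 0.71 < 1: connected over B = 2 steps
```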

  • [Block diagram repeated: algorithm, local functions (κ), and communication network (σ, B) → LMI → worst-case performance bound ρ.]

    10

  • Algorithm

    [ xi(k+1) ]   [ A   Bu  Bv  ] [ xi(k) ]
    [ yi(k)   ] = [ Cy  Dyu Dyv ] [ ui(k) ]          state update
    [ zi(k)   ]   [ Cz  Dzu Dzv ] [ vi(k) ]

    ui(k) = ∇fi(yi(k))                               gradient computation

    vi(k) = ∑_{j=1}^n Wij(k) zj(k)                   communication

    0 = ∑_{j=1}^n (Fx xj(k) + Fu uj(k))              invariant

    11
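The parameterization above is straightforward to simulate. The Julia sketch below (not from the talk, assuming a scalar decision variable and Dyu = Dzu = Dzv = 0) iterates any algorithm given in this form, and then instantiates it with matrices for DIGing derived by hand from the DIGing update shown earlier, reusing the quadratic test setup from the previous sketches; treat both as illustrations rather than transcriptions of the paper.

```julia
using LinearAlgebra

# Iterate an algorithm written in the canonical form of this slide:
#   ξi(k+1) = A ξi(k) + Bu ui(k) + Bv vi(k)
#   yi(k)   = Cy ξi(k) + Dyv vi(k)        (Dyu assumed 0)
#   zi(k)   = Cz ξi(k)                    (Dzu, Dzv assumed 0)
#   ui(k)   = ∇fi(yi(k)),   vi(k) = ∑j Wij zj(k)
# States are stored as columns of X, one column per agent; the decision variable is scalar.
function run_algorithm(A, Bu, Bv, Cy, Dyv, Cz, grads, W, X0; iters = 10_000)
    X = copy(X0)
    n = size(X, 2)
    for _ in 1:iters
        Z = Cz * X                  # zi(k)
        V = Z * W'                  # vi(k) = ∑j Wij zj(k)
        Y = Cy * X + Dyv * V        # yi(k): points where gradients are evaluated
        U = reshape([grads[i](Y[1, i]) for i in 1:n], 1, n)   # ui(k) = ∇fi(yi(k))
        X = A * X + Bu * U + Bv * V
    end
    return X
end

# DIGing in this form (derived by hand from the update on the DIGing slide):
# states per agent (xi, yi, gi) with gi storing ∇fi(xi(k)); communicated zi = (xi, yi).
α  = 0.005
A  = [0 -α 0; 0 0 -1; 0 0 0]
Bu = reshape([0.0, 1.0, 1.0], 3, 1)
Bv = [1.0 0; 0 1; 0 0]
Cy = [0 -α 0];  Dyv = [1.0 0]
Cz = [1.0 0 0; 0 1 0]

# Same quadratic test problem as in the earlier sketches.
W = [ 1/3 1/3 0 0 1/3; 0 1/2 0 1/2 0; 0 1/6 2/3 1/6 0; 1/3 0 1/3 0 1/3; 1/3 0 0 1/3 1/3 ]
a = [1.0, 2.0, 5.0, 1.0, 10.0];  b = [1.0, -2.0, 0.5, 3.0, -1.0]
grads = [x -> a[i] * (x - b[i]) for i in 1:5]

g0 = [grads[i](0.0) for i in 1:5]'          # ∇fi(0), used to initialize yi and gi
X  = run_algorithm(A, Bu, Bv, Cy, Dyv, Cz, grads, W, vcat(zeros(1, 5), g0, g0))
println("xi ≈ ", X[1, :], "   (x⋆ = ", sum(a .* b) / sum(a), ")")
```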

  • [Table: state-space parameters (A, Bu, Bv; Cy, Dyu, Dyv; Cz, Dzu, Dzv; Fx, Fu) for eight algorithms, arranged by number of state variables (two or three) and number of communicated variables (one or two): EXTRA, NIDS, DIGing, AugDGM, SVL, Exact Diffusion (ExDIFF), unified DIGing (uDIG), and unified EXTRA (uEXTRA).]

    12

  • [Block diagram repeated: algorithm, local functions (κ), and communication network (σ, B) → LMI → worst-case performance bound ρ.]

    13

  • LMI

    Find P ⪰ 0, Q ⪰ 0, R ⪰ 0, S ⪰ 0, and λ ≥ 0 such that

    0 ⪰ Ψᵀ ( ξ₊ᵀ P ξ₊ − ρ² (ξᵀ P ξ) + ∑_{ℓ=0}^{B−1} λ(ℓ) [y(ℓ); u(ℓ)]ᵀ [−2, κ+1; κ+1, −2κ] [y(ℓ); u(ℓ)] ) Ψ

    and

    0 ⪰ ξ₊ᵀ Q ξ₊ − ρ² (ξᵀ Q ξ) + ∑_{ℓ=0}^{B−1} λ(ℓ) [y(ℓ); u(ℓ)]ᵀ [−2, κ+1; κ+1, −2κ] [y(ℓ); u(ℓ)]
          + [w(0); w(B)]ᵀ ( [σ^{2B}, 0; 0, −1] ⊗ R ) [w(0); w(B)]
          + ∑_{ℓ=0}^{B−1} [z(ℓ); w(ℓ); v(ℓ); w(ℓ+1)]ᵀ ( [1, 0; 0, −1] ⊗ S(ℓ) ) [z(ℓ); w(ℓ); v(ℓ); w(ℓ+1)]

    Solved in Julia using Convex.jl and Mosek.

    14
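For a fixed rate ρ, the conditions above are a semidefinite feasibility problem, so the sharpest certified rate is found by an outer search over ρ. The Julia skeleton below sketches only that outer bisection; `lmi_feasible` is a hypothetical stand-in for the Convex.jl + Mosek feasibility check mentioned on the slide and is stubbed out with a toy rule here.

```julia
# Bisection on the convergence rate ρ: for fixed ρ the conditions on this slide are a
# semidefinite feasibility problem, so the sharpest certified rate is the smallest
# feasible ρ (assuming feasibility is monotone in ρ). `lmi_feasible(ρ)` is a
# hypothetical placeholder for the Convex.jl/Mosek check that builds P, Q, R, S, λ
# and tests the two matrix inequalities; here it is replaced by a toy rule.
function smallest_feasible_rate(lmi_feasible; lo = 0.0, hi = 1.0, tol = 1e-4)
    @assert lmi_feasible(hi) "no certificate found even at ρ = hi"
    while hi - lo > tol
        mid = (lo + hi) / 2
        lmi_feasible(mid) ? (hi = mid) : (lo = mid)
    end
    return hi                                 # certified worst-case rate ρ
end

iterations_to_converge(ρ) = -1 / log(ρ)       # the quantity plotted on the next slides

ρ = smallest_feasible_rate(r -> r >= 0.9)     # toy stub: "feasible" exactly when ρ ≥ 0.9
println("ρ ≈ ", ρ, ",  iterations to converge ≈ ", iterations_to_converge(ρ))
```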

  • [Block diagram repeated: algorithm, local functions (κ), and communication network (σ, B) → LMI → worst-case performance bound ρ.]

    15

  • DIGing

    [Plot: iterations to converge vs. (1 + σ)/(1 − σ), both on log scales, for κ = 10 and B = 1, with α optimizing the DIGing bound. Curves: n = 50, n = 10, n = 2 (dashed), all n (solid), and the lower bound.]

    • dashed lines are from (Nedić, Olshevsky, and Shi, 2017)
    • solid line is from our LMI
    • the lower bound corresponds to only optimization and only consensus

    16

  • DIGing

    [Plot: iterations to converge vs. (1 + σ)/(1 − σ) for κ = 10 and B = 1, with α optimizing the LMI. Curves: all n and the lower bound.]

    • the bound in (Nedić, Olshevsky, and Shi, 2017) is vacuous

    17

  • DIGing

    [Plot: iterations to converge vs. (1 + σ)/(1 − σ) for κ = 10, with α optimizing the LMI. Curves: B = 1, 2, 3, 4 and the lower bound.]

    18

  • Algorithm comparison

    [Plot: iterations to converge vs. (1 + σ)/(1 − σ) for κ = 10 and B = 1, with α optimizing the LMI. Curves: EXTRA, NIDS, DIGing, AugDGM, ExDIFF, uEXTRA, uDIG, SVL, and the lower bound.]

    SVL is designed to optimize the worst-case performance when B = 1.

    Sundararajan, Van Scoy, and Lessard (2020)

    19

  • Algorithm comparison

    [Plot: iterations to converge vs. (1 + σ)/(1 − σ) for κ = 10 and B = 2, with α optimizing the LMI. Curves: EXTRA, NIDS, DIGing, AugDGM, ExDIFF, uEXTRA, uDIG, SVL, and the lower bound.]

    20

  • Algorithm comparison

    [Plot: iterations to converge vs. (1 + σ)/(1 − σ) for κ = 10 and B = 3, with α optimizing the LMI. Curves: EXTRA, NIDS, DIGing, AugDGM, ExDIFF, uEXTRA, uDIG, SVL, and the lower bound.]

    21

  • Algorithm comparison

    [Plot: iterations to converge vs. (1 + σ)/(1 − σ) for κ = 10 and B = 4, with α optimizing the LMI. Curves: EXTRA, NIDS, DIGing, AugDGM, ExDIFF, uEXTRA, uDIG, SVL, and the lower bound.]

    22

  • Summary

    main contribution = sharp and systematic analysis

    open questions:

    • is SVL asymptotically optimal in the limit as σ → 1?
    • what is the optimal algorithm for B > 1?

    https://vanscoy.github.io

    23
