
  • Systematic Analysis of Distributed Optimization Algorithms over Jointly-Connected Networks

    Bryan Van Scoy, Miami University

    Laurent Lessard, Northeastern University

    IEEE Conference on Decision and Control, December 16, 2020

  • minimize over x1, …, xn:   ∑_{i=1}^n fi(xi)

    subject to   x1 = x2 = ⋯ = xn

    [Figure: example network of five agents; agent i holds the local function fi and the local variable xi.]

    W = [ 1/3  1/3   0    0   1/3
           0   1/2   0   1/2   0
           0   1/6  2/3  1/6   0
          1/3   0   1/3   0   1/3
          1/3   0    0   1/3  1/3 ]

    Want each agent to compute the global optimizer x⋆ by communicating with local neighbors and performing local computations.

    1
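To make the setup concrete, here is a small Julia check (not part of the talk) that the example weight matrix above is doubly stochastic, and that for illustrative quadratic local functions fi(x) = (ai/2)(x − bi)² the consensus-constrained problem is solved by the minimizer of the sum; the curvatures ai and offsets bi are arbitrary choices.

```julia
using LinearAlgebra

# Example weight matrix from the slide above.
W = [ 1/3 1/3 0   0   1/3;
      0   1/2 0   1/2 0;
      0   1/6 2/3 1/6 0;
      1/3 0   1/3 0   1/3;
      1/3 0   0   1/3 1/3 ]

# W is doubly stochastic: every row and every column sums to one.
@assert all(isapprox.(sum(W, dims = 1), 1))   # column sums
@assert all(isapprox.(sum(W, dims = 2), 1))   # row sums

# Illustrative quadratic local functions fi(x) = (ai/2)(x − bi)^2 (not from the talk).
a = [1.0, 2.0, 5.0, 1.0, 10.0]
b = [1.0, -2.0, 0.5, 3.0, -1.0]

# Under the consensus constraint x1 = … = xn the problem reduces to minimizing
# ∑ fi(x) over a single x; for these quadratics the minimizer is a weighted average.
xstar = sum(a .* b) / sum(a)
@assert isapprox(sum(a .* (xstar .- b)), 0; atol = 1e-12)   # ∑ ∇fi(x⋆) = 0
println("x⋆ = ", xstar)
```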

  • DIGing

    xi(k+1) = ∑_{j=1}^n Wij(k) xj(k) − α yi(k)

    yi(k+1) = ∑_{j=1}^n Wij(k) yj(k) + ∇fi(xi(k+1)) − ∇fi(xi(k))

    • analyzed in (Nedić, Olshevsky, and Shi, 2017) and (Qu and Li, 2018)
    • linear convergence over time-varying networks: ‖xi(k) − x⋆‖ = O(ρ^k)

    iterations to converge ∝ −1/log ρ

    2
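As a sanity check of the update above, here is a minimal Julia simulation of DIGing (not the authors' code) on the 5-agent example with illustrative quadratic local functions fi(x) = (ai/2)(x − bi)² and a fixed weight matrix; the stepsize α = 0.005 is an arbitrary small value, not the optimized choice discussed later.

```julia
using LinearAlgebra

# 5-agent example weight matrix from the problem-setup slide (held fixed over time).
W = [ 1/3 1/3 0   0   1/3;
      0   1/2 0   1/2 0;
      0   1/6 2/3 1/6 0;
      1/3 0   1/3 0   1/3;
      1/3 0   0   1/3 1/3 ]

# Illustrative quadratic local functions fi(x) = (ai/2)(x − bi)^2 (not from the talk).
a = [1.0, 2.0, 5.0, 1.0, 10.0]
b = [1.0, -2.0, 0.5, 3.0, -1.0]
grad(i, x) = a[i] * (x - b[i])              # ∇fi(x)
xstar = sum(a .* b) / sum(a)                # global optimizer of ∑ fi

function diging(α, iters)
    n = length(a)
    x = zeros(n)                            # xi(0) = 0
    y = [grad(i, x[i]) for i in 1:n]        # yi(0) = ∇fi(xi(0))
    for _ in 1:iters
        xnew = W * x .- α .* y                                        # xi(k+1)
        y = W * y .+ [grad(i, xnew[i]) - grad(i, x[i]) for i in 1:n]  # gradient tracking
        x = xnew
    end
    return x
end

x = diging(0.005, 10_000)
println("max |xi − x⋆| = ", maximum(abs.(x .- xstar)))   # linear convergence: O(ρ^k)
```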

  • DIGing

    xi(k+1) = ∑_{j=1}^n Wij(k) xj(k) − α yi(k)

    yi(k+1) = ∑_{j=1}^n Wij(k) yj(k) + ∇fi(xi(k+1)) − ∇fi(xi(k))

    Issues:

    • the bound grows with the number of agents
    • the bound does not apply to large stepsizes
    • the analysis is specific to DIGing

    3

  • [Block diagram: the algorithm (matrices A, Bu, Bv; Cy, Dyu, Dyv; Cz, Dzu, Dzv; Fx, Fu), the local functions (condition ratio κ), and the communication network (σ, B) feed into the systematic analysis tool, an LMI, which outputs a worst-case performance bound ρ.]

    main contribution = sharp and systematic analysis

    4

  • [Same block diagram repeated.]

    main contribution = sharp and systematic analysis

    5

  • Function assumptions

    Each local objective function fi is L-smooth and m-strongly convex with respect to the global optimizer.

    [Figure: example functions fi(x) for condition ratios κ = 1, 2, 4, 8.]

    The condition ratio κ = L/m characterizes how much the curvature varies.

    For all x,   (∇fi(x) − ∇fi(x⋆) − m (x − x⋆))ᵀ (∇fi(x) − ∇fi(x⋆) − L (x − x⋆)) ≤ 0

    6
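The inequality above is easy to check numerically. Below is a small Julia sketch doing so for an illustrative scalar quadratic fi(x) = (ai/2)(x − bi)²; the constants m, L, ai, bi and the reference point are arbitrary choices, not values from the talk.

```julia
# Check the sector condition
#   (∇fi(x) − ∇fi(x⋆) − m(x − x⋆))ᵀ (∇fi(x) − ∇fi(x⋆) − L(x − x⋆)) ≤ 0
# for a scalar quadratic fi(x) = (ai/2)(x − bi)^2, whose gradient is ai(x − bi).
# Any curvature ai with m ≤ ai ≤ L satisfies it; these numbers are illustrative only.

m, L = 1.0, 10.0            # strong convexity and smoothness constants (κ = L/m = 10)
ai, bi = 4.0, 2.0           # curvature and minimizer of this local function
∇fi(x) = ai * (x - bi)

xref = 0.7                  # for a quadratic, the inequality holds for any reference point
sector(x) = (∇fi(x) - ∇fi(xref) - m * (x - xref)) * (∇fi(x) - ∇fi(xref) - L * (x - xref))

@assert all(sector(x) ≤ 0 for x in -10:0.1:10)
println("sector condition holds on the test grid")
```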

  • [Block diagram repeated: algorithm, local functions (κ), and communication network (σ, B) → LMI → worst-case performance bound ρ.]

    7

  • Spectral gap

    The spectral gap characterizes the connectivity of a graph.

    [Figure: five example graphs with spectral gaps 0.0, 0.667, 0.707, 0.809, and 1.0.]

    spectral gap = ‖(1/n) 1 1ᵀ − W‖₂

    8
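A one-line Julia implementation of this quantity (a sketch, not from the talk, using only LinearAlgebra), applied to the 5-agent weight matrix from the problem-setup slide and to two extreme cases:

```julia
using LinearAlgebra

# Spectral gap ‖(1/n)·11ᵀ − W‖₂ of an n×n weight matrix W, as defined on this slide.
spectral_gap(W) = opnorm(ones(size(W, 1), size(W, 1)) / size(W, 1) - W, 2)

# 5-agent example weight matrix from the problem-setup slide.
W = [ 1/3 1/3 0   0   1/3;
      0   1/2 0   1/2 0;
      0   1/6 2/3 1/6 0;
      1/3 0   1/3 0   1/3;
      1/3 0   0   1/3 1/3 ]

println("σ for the example W:      ", spectral_gap(W))
println("complete graph (uniform): ", spectral_gap(ones(4, 4) / 4))      # 0.0, best connectivity
println("no communication (W = I): ", spectral_gap(Matrix(1.0I, 4, 4)))  # 1.0, disconnected
```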

  • Network assumptions

    sparsity        Wij(k) = 0 if agent i does not receive information from agent j at time k

    balanced        W(k) 1 = W(k)ᵀ 1 = 1

    spectrum        the spectral gap of W(k) is less than or equal to one

    joint spectrum  for some fixed B, the spectral gap of W(k) W(k+1) ⋯ W(k+B−1), raised to the power 1/B, is less than or equal to σ < 1

    The spectral gap σ characterizes the connectivity of the time-varying network averaged over B iterations.

    9
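The joint-spectrum condition can also be checked numerically. The Julia sketch below (an illustration, not from the talk) uses a toy 3-agent sequence in which each individual W(k) is disconnected, yet the product over a window of B = 2 steps has spectral gap below one.

```julia
using LinearAlgebra

spectral_gap(W) = opnorm(ones(size(W, 1), size(W, 1)) / size(W, 1) - W, 2)

# Joint spectral gap over a window of B consecutive weight matrices:
# the spectral gap of W(k) W(k+1) ⋯ W(k+B−1), raised to the power 1/B.
joint_spectral_gap(Ws) = spectral_gap(reduce(*, Ws))^(1 / length(Ws))

# Toy jointly-connected sequence (illustrative): at even steps only agents {1,2}
# average, at odd steps only agents {2,3} average.
W_even = [ 1/2 1/2 0; 1/2 1/2 0; 0 0 1.0 ]
W_odd  = [ 1.0 0 0; 0 1/2 1/2; 0 1/2 1/2 ]

println(spectral_gap(W_even))                  # 1.0: disconnected at any single step
println(spectral_gap(W_odd))                   # 1.0
println(joint_spectral_gap([W_even, W_odd]))   # ≈ 0.71 < 1: connected over B = 2 steps
```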

  • [Block diagram repeated: algorithm, local functions (κ), and communication network (σ, B) → LMI → worst-case performance bound ρ.]

    10

  • Algorithm

    [ xi(k+1) ]   [ A   Bu  Bv  ] [ xi(k) ]
    [ yi(k)   ] = [ Cy  Dyu Dyv ] [ ui(k) ]          state update
    [ zi(k)   ]   [ Cz  Dzu Dzv ] [ vi(k) ]

    ui(k) = ∇fi(yi(k))                               gradient computation

    vi(k) = ∑_{j=1}^n Wij(k) zj(k)                   communication

    0 = ∑_{j=1}^n (Fx xj(k) + Fu uj(k))              invariant

    11
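The parameterization above is straightforward to simulate. The Julia sketch below (not from the talk, assuming a scalar decision variable and Dyu = Dzu = Dzv = 0) iterates any algorithm given in this form, and then instantiates it with matrices for DIGing derived by hand from the DIGing update shown earlier, reusing the quadratic test setup from the previous sketches; treat both as illustrations rather than transcriptions of the paper.

```julia
using LinearAlgebra

# Iterate an algorithm written in the canonical form of this slide:
#   ξi(k+1) = A ξi(k) + Bu ui(k) + Bv vi(k)
#   yi(k)   = Cy ξi(k) + Dyv vi(k)        (Dyu assumed 0)
#   zi(k)   = Cz ξi(k)                    (Dzu, Dzv assumed 0)
#   ui(k)   = ∇fi(yi(k)),   vi(k) = ∑j Wij zj(k)
# States are stored as columns of X, one column per agent; the decision variable is scalar.
function run_algorithm(A, Bu, Bv, Cy, Dyv, Cz, grads, W, X0; iters = 10_000)
    X = copy(X0)
    n = size(X, 2)
    for _ in 1:iters
        Z = Cz * X                  # zi(k)
        V = Z * W'                  # vi(k) = ∑j Wij zj(k)
        Y = Cy * X + Dyv * V        # yi(k): points where gradients are evaluated
        U = reshape([grads[i](Y[1, i]) for i in 1:n], 1, n)   # ui(k) = ∇fi(yi(k))
        X = A * X + Bu * U + Bv * V
    end
    return X
end

# DIGing in this form (derived by hand from the update on the DIGing slide):
# states per agent (xi, yi, gi) with gi storing ∇fi(xi(k)); communicated zi = (xi, yi).
α  = 0.005
A  = [0 -α 0; 0 0 -1; 0 0 0]
Bu = reshape([0.0, 1.0, 1.0], 3, 1)
Bv = [1.0 0; 0 1; 0 0]
Cy = [0 -α 0];  Dyv = [1.0 0]
Cz = [1.0 0 0; 0 1 0]

# Same quadratic test problem as in the earlier sketches.
W = [ 1/3 1/3 0 0 1/3; 0 1/2 0 1/2 0; 0 1/6 2/3 1/6 0; 1/3 0 1/3 0 1/3; 1/3 0 0 1/3 1/3 ]
a = [1.0, 2.0, 5.0, 1.0, 10.0];  b = [1.0, -2.0, 0.5, 3.0, -1.0]
grads = [x -> a[i] * (x - b[i]) for i in 1:5]

g0 = [grads[i](0.0) for i in 1:5]'          # ∇fi(0), used to initialize yi and gi
X  = run_algorithm(A, Bu, Bv, Cy, Dyv, Cz, grads, W, vcat(zeros(1, 5), g0, g0))
println("xi ≈ ", X[1, :], "   (x⋆ = ", sum(a .* b) / sum(a), ")")
```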

  • [Table: state-space parameters (A, Bu, Bv; Cy, Dyu, Dyv; Cz, Dzu, Dzv; Fx, Fu) for eight algorithms, arranged by number of state variables (two or three) and number of communicated variables (one or two): EXTRA, NIDS, DIGing, AugDGM, SVL, Exact Diffusion (ExDIFF), unified DIGing (uDIG), and unified EXTRA (uEXTRA).]

    12

  • [Block diagram repeated: algorithm, local functions (κ), and communication network (σ, B) → LMI → worst-case performance bound ρ.]

    13

  • LMI

    Find P ⪰ 0, Q ⪰ 0, R ⪰ 0, S ⪰ 0, and λ ≥ 0 such that

    0 ⪰ Ψᵀ ( ξ₊ᵀ P ξ₊ − ρ² (ξᵀ P ξ) + ∑_{ℓ=0}^{B−1} λ(ℓ) [y(ℓ); u(ℓ)]ᵀ [−2, κ+1; κ+1, −2κ] [y(ℓ); u(ℓ)] ) Ψ

    and

    0 ⪰ ξ₊ᵀ Q ξ₊ − ρ² (ξᵀ Q ξ) + ∑_{ℓ=0}^{B−1} λ(ℓ) [y(ℓ); u(ℓ)]ᵀ [−2, κ+1; κ+1, −2κ] [y(ℓ); u(ℓ)]
          + [w(0); w(B)]ᵀ ( [σ^{2B}, 0; 0, −1] ⊗ R ) [w(0); w(B)]
          + ∑_{ℓ=0}^{B−1} [z(ℓ); w(ℓ); v(ℓ); w(ℓ+1)]ᵀ ( [1, 0; 0, −1] ⊗ S(ℓ) ) [z(ℓ); w(ℓ); v(ℓ); w(ℓ+1)]

    Solved in Julia using Convex.jl and Mosek.

    14
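For a fixed rate ρ, the conditions above are a semidefinite feasibility problem, so the sharpest certified rate is found by an outer search over ρ. The Julia skeleton below sketches only that outer bisection; `lmi_feasible` is a hypothetical stand-in for the Convex.jl + Mosek feasibility check mentioned on the slide and is stubbed out with a toy rule here.

```julia
# Bisection on the convergence rate ρ: for fixed ρ the conditions on this slide are a
# semidefinite feasibility problem, so the sharpest certified rate is the smallest
# feasible ρ (assuming feasibility is monotone in ρ). `lmi_feasible(ρ)` is a
# hypothetical placeholder for the Convex.jl/Mosek check that builds P, Q, R, S, λ
# and tests the two matrix inequalities; here it is replaced by a toy rule.
function smallest_feasible_rate(lmi_feasible; lo = 0.0, hi = 1.0, tol = 1e-4)
    @assert lmi_feasible(hi) "no certificate found even at ρ = hi"
    while hi - lo > tol
        mid = (lo + hi) / 2
        lmi_feasible(mid) ? (hi = mid) : (lo = mid)
    end
    return hi                                 # certified worst-case rate ρ
end

iterations_to_converge(ρ) = -1 / log(ρ)       # the quantity plotted on the next slides

ρ = smallest_feasible_rate(r -> r >= 0.9)     # toy stub: "feasible" exactly when ρ ≥ 0.9
println("ρ ≈ ", ρ, ",  iterations to converge ≈ ", iterations_to_converge(ρ))
```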

  • [Block diagram repeated: algorithm, local functions (κ), and communication network (σ, B) → LMI → worst-case performance bound ρ.]

    15

  • DIGing

    [Plot: iterations to converge vs. (1 + σ)/(1 − σ), both on log scales, for κ = 10 and B = 1, with α optimizing the DIGing bound. Curves: n = 50, n = 10, n = 2 (dashed), all n (solid), and the lower bound.]

    • dashed lines are from (Nedić, Olshevsky, and Shi, 2017)
    • solid line is from our LMI
    • the lower bound corresponds to only optimization and only consensus

    16

  • DIGing

    [Plot: iterations to converge vs. (1 + σ)/(1 − σ) for κ = 10 and B = 1, with α optimizing the LMI. Curves: all n and the lower bound.]

    • the bound in (Nedić, Olshevsky, and Shi, 2017) is vacuous

    17

  • DIGing

    [Plot: iterations to converge vs. (1 + σ)/(1 − σ) for κ = 10, with α optimizing the LMI. Curves: B = 1, 2, 3, 4 and the lower bound.]

    18

  • Algorithm comparison

    [Plot: iterations to converge vs. (1 + σ)/(1 − σ) for κ = 10 and B = 1, with α optimizing the LMI. Curves: EXTRA, NIDS, DIGing, AugDGM, ExDIFF, uEXTRA, uDIG, SVL, and the lower bound.]

    SVL is designed to optimize the worst-case performance when B = 1.

    Sundararajan, Van Scoy, and Lessard (2020)

    19

  • Algorithm comparison

    [Plot: iterations to converge vs. (1 + σ)/(1 − σ) for κ = 10 and B = 2, with α optimizing the LMI. Curves: EXTRA, NIDS, DIGing, AugDGM, ExDIFF, uEXTRA, uDIG, SVL, and the lower bound.]

    20

  • Algorithm comparison

    [Plot: iterations to converge vs. (1 + σ)/(1 − σ) for κ = 10 and B = 3, with α optimizing the LMI. Curves: EXTRA, NIDS, DIGing, AugDGM, ExDIFF, uEXTRA, uDIG, SVL, and the lower bound.]

    21

  • Algorithm comparison

    [Plot: iterations to converge vs. (1 + σ)/(1 − σ) for κ = 10 and B = 4, with α optimizing the LMI. Curves: EXTRA, NIDS, DIGing, AugDGM, ExDIFF, uEXTRA, uDIG, SVL, and the lower bound.]

    22

  • Summary

    main contribution = sharp and systematic analysis

    open questions:

    • is SVL asymptotically optimal in the limit as σ → 1?
    • what is the optimal algorithm for B > 1?

    https://vanscoy.github.io

    23
