TRANSCRIPT
1/54
MADMM: a generic algorithm for non-smooth optimization on manifolds
Michael Bronstein
Faculty of Informatics, University of Lugano, Switzerland
Perceptual Computing Group, Intel Corporation, Israel
Louvain-la-Neuve, 25 September 2015
2/54
[Diagram: application fields — image processing and analysis (2D); geometry processing and shape analysis (3D); computer vision and computer graphics; pattern recognition and machine learning; graph analysis and processing (nD)]
3/54
What is manifold optimization?
Manifold (or manifold-constrained) optimization problem
min_{X ∈ R^{n×m}} f(X)   s.t.   X ∈ M
f : R^{n×m} → R is a smooth function
M is a Riemannian submanifold of R^{n×m}
Absil et al. 2009
4/54
Applications
Sphere: principal geodesic analysis¹, 1-bit compressed sensing²
Stiefel manifold: eigenvalue, assignment, and Procrustes problems³, orthogonal dictionary learning⁴, binary coding⁵
Product of Stiefel manifolds: functional correspondence⁶, manifold learning⁷, structure-from-motion⁸, sensor localization⁹
Fixed-rank PSD: max-cut problems, sparse PCA¹⁰, matrix completion¹¹, multidimensional scaling¹²
Oblique: ICA¹³, blind source separation¹⁴
¹Zhang, Fletcher 2013; ²Boufounos, Baraniuk 2008; ³Ten Berge 1977; ⁴Sun et al. 2015; ⁵Xia et al. 2015; ⁶Kovnatsky et al. 2013; ⁷Eynard et al. 2015; ⁸Arie-Nachimson et al. 2012; ⁹Cucuringu et al. 2012; ¹⁰Journée et al. 2010; ¹¹Tan et al. 2014; ¹²Cayton, Dasgupta 2006; ¹³Absil, Gallivan 2006; ¹⁴Kleinsteuber, Shen 2012.
5/54
Toy example: eigenvalue problem
min_{x ∈ R^n} x⊺Ax   s.t.   x⊺x = 1
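For reference, the global solution of this toy problem is the eigenvector of the smallest eigenvalue of A; a minimal NumPy check (our own illustration, not part of the talk):

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((50, 50))
A = (A + A.T) / 2                    # symmetric test matrix

w, V = np.linalg.eigh(A)             # eigenvalues in ascending order
x_star = V[:, 0]                     # minimizer of x'Ax s.t. x'x = 1
assert np.isclose(x_star @ A @ x_star, w[0])
```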
6/54
Optimization on the manifold: main idea
min_{X ∈ M} f(X)
where f : M → R is a function on the manifold (a scalar field)
No global system of coordinates
Manifold M is locally homeomorphic to the tangent space T_X M
Intrinsic gradient ∇_M f : M → TM such that
f("X + dV") = f(X) + ⟨∇_M f(X), dV⟩_{T_X M} + O(∥dV∥²)
Intrinsic gradient = projection of the extrinsic gradient: ∇_M f(X) = P_{T_X M} ∇f(X)
Exponential map exp_X : T_X M → M
Moving vectors on M requires parallel transport
Absil et al. 2009
7/54
Optimization on the manifold: main idea
[Figure: at the iterate X(k) on M, the extrinsic gradient ∇f(X(k)) is projected by P_X(k) onto the tangent space T_X(k)M, scaled by the step size α(k), and mapped back onto the manifold as X(k+1) by the retraction R_X(k)]
Absil et al. 2009
8/54
Optimization on the manifold
Algorithm 1 Conceptual algorithm for smooth optimization on M
repeat
  Compute extrinsic gradient ∇f(X(k))
  Projection: ∇_M f(X(k)) = P_X(k)(∇f(X(k)))
  Compute step size α(k) along the descent direction −∇_M f(X(k))
  Retraction: X(k+1) = R_X(k)(−α(k) ∇_M f(X(k)))
  k ← k + 1
until convergence
Projection and retraction operators are manifold-dependent
Typically expressed in closed form
"Black box": need to provide only f(X) and gradient ∇f(X)
Absil et al. 2009; Boumal et al. 2014
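As an illustration of Algorithm 1, here is a minimal sketch (ours, in NumPy) for the toy eigenvalue problem, where M is the unit sphere: the projection is P_x(v) = v − (x⊺v)x and a simple retraction is renormalization. The fixed step size and iteration count are arbitrary and would need tuning; a Manopt-style solver with line search would be used in practice.

```python
import numpy as np

def sphere_gradient_descent(A, x0, alpha=0.01, iters=5000):
    """Algorithm 1 on the unit sphere for f(x) = x'Ax (toy eigenvalue problem).
    Projection: P_x(v) = v - (x'v) x; retraction: R_x(v) = (x + v)/||x + v||."""
    x = x0 / np.linalg.norm(x0)
    for _ in range(iters):
        g = 2 * A @ x                     # extrinsic gradient of x'Ax
        g_m = g - (x @ g) * x             # project onto tangent space T_x M
        x = x - alpha * g_m               # step along the descent direction
        x = x / np.linalg.norm(x)         # retract back onto the sphere
    return x

rng = np.random.default_rng(1)
A = rng.standard_normal((50, 50)); A = (A + A.T) / 2
x = sphere_gradient_descent(A, rng.standard_normal(50))
print(x @ A @ x, np.linalg.eigh(A)[0][0])  # the two values should be close
```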
9/54
Prototype problem
Non-smooth manifold optimization problem
min_{X ∈ M} f(X) + g(AX)
f : R^{n×m} → R is a smooth function
g : R^{k×m} → R is a non-smooth function
A is a k × n matrix
M is a Riemannian submanifold of R^{n×m}
Typical examples: g(X) = ∥X∥₁, ∥X∥_{2,1}, or ∥X∥_*
Existing approaches: smoothing (approximate), subgradient (problem-dependent), splitting (problem-dependent)
Smoothing: Chen 2012. Subgradient: Ferreira, Oliveira 1998; Ledyaev, Zhu 2007; Kleinsteuber, Shen 2012. Splitting: Lai, Osher 2014; Neumann et al. 2014; Rosman et al. 2014
Kovnatsky, B, Glashoff 2015
10/54
Manifold ADMM
11/54
Manifold ADMM
Non-smooth manifold optimization problem equivalently written as
min_{X ∈ M, Z} f(X) + g(Z)   s.t.   Z = AX
by introducing an artificial variable Z and a linear constraint
Apply the method of multipliers only to the constraint Z = AX:
min_{X ∈ M, Z} f(X) + g(Z) + ρ/2 ∥AX − Z + U∥²_F
Solve by alternating minimization w.r.t. X and Z, updating U ← U + AX − Z
The problem breaks into
a smooth manifold optimization sub-problem w.r.t. X, and
a non-smooth unconstrained sub-problem w.r.t. Z
Hestenes 1969; Powell 1969; Kovnatsky, Glashoff, B 2015
12/54
MADMM
Algorithm 2 MADMM for non-smooth optimization on manifold M
Initialize k ← 1, Z(1) = AX(1), U(1) = 0
repeat
  X-step: X(k+1) = argmin_{X ∈ M} f(X) + ρ/2 ∥AX − Z(k) + U(k)∥²_F
  Z-step: Z(k+1) = argmin_Z g(Z) + ρ/2 ∥AX(k+1) − Z + U(k)∥²_F
  Update U(k+1) = U(k) + AX(k+1) − Z(k+1)
  k ← k + 1
until convergence
Design choices: solver and number of inner iterations in the X- and Z-steps
In some problems the X- and Z-steps have a closed form
Parameter ρ > 0 can be fixed or adapted
Kovnatsky, Glashoff, B 2015
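A minimal skeleton of Algorithm 2 (a sketch with hypothetical helper names, not the authors' code): the user plugs in an approximate solver for the smooth manifold sub-problem and the proximal operator of g.

```python
import numpy as np

def madmm(f_step, prox_g, A, X0, rho=1.0, iters=100):
    """Skeleton of Algorithm 2 (a sketch; helper names are ours).

    f_step(X, Z, U, rho): approximately solves the smooth sub-problem
        argmin_{X in M} f(X) + rho/2 * ||A X - Z + U||_F^2,
        e.g. a few iterations of a Manopt-style manifold solver.
    prox_g(V, rho): solves argmin_Z g(Z) + rho/2 * ||V - Z||_F^2
        (the proximal operator of g)."""
    X = X0
    Z = A @ X                        # Z^(1) = A X^(1)
    U = np.zeros_like(Z)             # U^(1) = 0
    for _ in range(iters):
        X = f_step(X, Z, U, rho)     # X-step: smooth, manifold-constrained
        Z = prox_g(A @ X + U, rho)   # Z-step: non-smooth, unconstrained
        U = U + A @ X - Z            # dual variable update
    return X, Z
```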
13/54
Compressed modes
14/54
Laplacian eigenfunctions
The first k eigenfunctions of some Laplacian are used in...
Spectral clustering · Dimensionality reduction · Spectral distances
Ng et al. 2001; Belkin, Niyogi 2001; Coifman, Lafon 2006
15/54
Laplacian eigenfunctions
Find the first k eigenfunctions of an n × n Laplacian matrix ∆:
min_{Φ ∈ R^{n×k}} tr(Φ⊺∆Φ)   s.t.   Φ⊺Φ = I
tr(Φ⊺∆Φ) = Σ_{ij} w_ij ∥φ_i − φ_j∥², a.k.a. Dirichlet energy in physics
Many efficient solvers with global optimality guarantees
1D Euclidean Laplacian eigenfunctions = Fourier basis: ∆e^{−iωx} = −ω² e^{−iωx}
Globally supported!
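Computing the first k eigenfunctions is a standard sparse eigenproblem; a short SciPy sketch (ours) for a tridiagonal 1D Laplacian of the kind used in the following example:

```python
import numpy as np
from scipy.sparse import diags
from scipy.sparse.linalg import eigsh

n, k = 100, 6
# Tridiagonal (Dirichlet) 1D Laplacian: 2 on the diagonal, -1 off-diagonal
L = diags([2 * np.ones(n), -np.ones(n - 1), -np.ones(n - 1)], [0, -1, 1])
lam, Phi = eigsh(L.tocsc(), k=k, sigma=0)  # k smallest eigenpairs (shift-invert)
# columns of Phi are smooth, globally supported oscillations (Fourier-like)
```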
16/54
Laplacian eigenfunctions: 1D example
[Plots of φ₁–φ₆: first eigenfunctions of a 1D Euclidean Laplacian — globally supported oscillations]
17/54
Laplacian eigenfunctions: non-Euclidean example
[Figure: first eigenfunctions of a Laplacian on a triangular mesh, color-coded from min to max]
Neumann et al. 2014
18/54
Compressed modes
min_{Φ ∈ R^{n×k}} tr(Φ⊺∆Φ) + µ∥Φ∥₁   s.t.   Φ⊺Φ = I
Dirichlet energy = smoothness
L₁-norm = sparsity
Smoothness + sparsity = localization
Ozolins et al. 2013
19/54
Compressed modes: 1D example
[Plots of φ₁–φ₆: first compressed modes of a 1D Euclidean Laplacian — sparse, localized functions]
Ozolins et al. 2013
20/54
Compressed modes: non-Euclidean example
[Figure: first compressed modes vs. first Laplacian eigenfunctions on a triangular mesh, color-coded from min to max]
Neumann et al. 2014
21/54
Wannier functions
Maximally-localized Wannier functions in Si and GaAs crystals
Wannier 1937; Mostofi 2008
22/54
Splitting method for orthogonality constraints (SOC)
min_{Φ ∈ R^{n×k}} tr(Φ⊺∆Φ) + µ∥Φ∥₁   s.t.   Φ⊺Φ = I
split by introducing variables P, Q:
min_{Φ,P,Q ∈ R^{n×k}} tr(Φ⊺∆Φ) + µ∥Q∥₁   s.t.   P = Φ, Q = Φ, P⊺P = I
Algorithm 3 SOC method for computing compressed modes
Initialize k ← 1, Φ(1), P(1) = Q(1) = Φ(1), U(1) = V(1) = 0
repeat
  Φ(k+1) = argmin_Φ tr(Φ⊺∆Φ) + ρ/2 ∥Φ − Q(k) + U(k)∥²_F + ρ′/2 ∥Φ − P(k) + V(k)∥²_F
  Q(k+1) = argmin_Q µ∥Q∥₁ + ρ/2 ∥Φ(k+1) − Q + U(k)∥²_F
  P(k+1) = argmin_P ρ′/2 ∥Φ(k+1) − P + V(k)∥²_F   s.t.   P⊺P = I
  U(k+1) = U(k) + Φ(k+1) − Q(k+1)
  V(k+1) = V(k) + Φ(k+1) − P(k+1)
  k ← k + 1
until convergence
Lai, Osher 2014
23/54
Compressed modes as manifold optimization
min_{Φ ∈ S(n,k)} tr(Φ⊺∆Φ) + µ∥Φ∥₁
with MADMM splitting:
min_{Φ ∈ S(n,k), Z} tr(Φ⊺∆Φ) + µ∥Z∥₁ + ρ/2 ∥Φ − Z + U∥²_F
Stiefel manifold S(n,k) = {X ∈ R^{n×k} : X⊺X = I}
Sub-problem w.r.t. Φ: smooth manifold-constrained minimization of a quadratic function
min_{Φ ∈ S(n,k)} tr(Φ⊺∆Φ) + ρ/2 ∥Φ − Z + U∥²_F
Sub-problem w.r.t. Z: sparse coding (Lasso) problem
min_Z µ∥Z∥₁ + ρ/2 ∥Φ + U − Z∥²_F
Kovnatsky, Glashoff, B 2015; Chen et al. 1995; Tibshirani 1996
24/54
Compressed modes by MADMM
Algorithm 4 MADMM for computing compressed modes
Input: n × n Laplacian matrix ∆, parameter µ > 0
Output: the first k compressed modes of ∆
Initialize k ← 1, Φ(1) ← some orthonormal matrix, Z(1) = Φ(1), U(1) = 0
repeat
  Φ(k+1) = argmin_{Φ ∈ S(n,k)} tr(Φ⊺∆Φ) + ρ/2 ∥Φ − Z(k) + U(k)∥²_F
  Z(k+1) = Shrink_{µ/ρ}(Φ(k+1) + U(k))
  Update U(k+1) = U(k) + Φ(k+1) − Z(k+1)
  k ← k + 1
until convergence
where Shrink_α(x) = sign(x) max{0, |x| − α} is the (elementwise) shrinkage operator
Kovnatsky, Glashoff, B 2015
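A compact NumPy sketch of Algorithm 4 (our own illustration; function names and the step size τ are ours and need tuning): the Z-step is elementwise shrinkage, and the X-step is approximated by a few projected-gradient steps on the Stiefel manifold with a QR retraction, standing in for a Manopt-style trust-region or conjugate-gradient solver.

```python
import numpy as np

def shrink(X, alpha):
    """Elementwise shrinkage: sign(x) * max(0, |x| - alpha)."""
    return np.sign(X) * np.maximum(0.0, np.abs(X) - alpha)

def qf(M):
    """QR retraction onto the Stiefel manifold (Q factor, sign-fixed)."""
    Q, R = np.linalg.qr(M)
    return Q * np.where(np.diag(R) < 0, -1.0, 1.0)

def compressed_modes(Delta, k, mu, rho=2.0, iters=300, x_iters=10, tau=1e-2):
    n = Delta.shape[0]
    Phi = qf(np.random.default_rng(0).standard_normal((n, k)))
    Z, U = Phi.copy(), np.zeros((n, k))
    for _ in range(iters):
        for _ in range(x_iters):                       # approximate X-step
            G = 2 * Delta @ Phi + rho * (Phi - Z + U)  # Euclidean gradient
            G -= Phi @ (Phi.T @ G + G.T @ Phi) / 2     # project onto T_Phi M
            Phi = qf(Phi - tau * G)                    # retract onto Stiefel
        Z = shrink(Phi + U, mu / rho)                  # Z-step: shrinkage
        U += Phi - Z                                   # dual update
    return Phi
```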
25/54
Convergence
[Plot: cost vs. time (sec) — convergence of MADMM with different random initializations (compressed modes problem of size n = 500, k = 10)]
Kovnatsky, Glashoff, B 2015
26/54
Convergence
[Plot: cost vs. time (sec) — convergence of MADMM with different X-step solvers (trust regions vs. conjugate gradients, with varying numbers of inner iterations; compressed modes problem of size n = 500, k = 10)]
Kovnatsky, Glashoff, B 2015; Manopt: Boumal et al. 2014
27/54
Convergence
[Plot: cost vs. time (sec) — convergence of Lai & Osher, Neumann et al., and MADMM on a compressed modes problem of size n = 8 × 10³, k = 10]
Kovnatsky, Glashoff, B 2015; Lai, Osher 2014; Neumann et al. 2014
28/54
Scalability
[Plot: time/iteration (sec) vs. problem size n — complexity of Lai & Osher, Neumann et al., and MADMM on compressed modes problems, k = 10]
Kovnatsky, Glashoff, B 2015; Lai, Osher 2014; Neumann et al. 2014
29/54
Functional correspondence
30/54
Applications of shape correspondence
[Figures: texture mapping; pose transfer]
B2, Kimmel 2007; Sumner et al. 2004
31/54
Shape correspondence
[Figure: shapes S and Q, with points s, s′ ∈ S mapped to q, q′ ∈ Q]
Point-wise map t : S → Q, ideally of minimum distortion
Functional map T : F(S) → F(Q) — a linear map between function spaces
B2, Kimmel 2006; Ovsjanikov et al. 2012
32/54
Functional correspondence
[Figure: functions f on S and g on Q expanded in truncated eigenbases, f ≈ a₁φ₁ + a₂φ₂ + ⋯ + a_kφ_k and g ≈ b₁ψ₁ + b₂ψ₂ + ⋯ + b_kψ_k; the functional map T acts on the coefficients as the matrix C]
Representation in truncated Laplacian eigenbases: T ≈ ΨCΦ⊺
If T is area-preserving, C⊺C = I
Represent C = XY⊺; then T ≈ ΨCΦ⊺ = (ΨX)(ΦY)⊺ (a rotation of the bases)
Given known corresponding functions F = (f₁, …, f_q) and G = (g₁, …, g_q) with Fourier coefficients A = Φ⊺F and B = Ψ⊺G, find C by solving the linear system CA = B
Ovsjanikov et al. 2012
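In the least-squares setting, recovering C from CA = B is a single solve; a minimal NumPy sketch (ours):

```python
import numpy as np

def functional_map(A, B):
    """Least-squares functional map C from C A = B (A, B are k x q
    matrices of Fourier coefficients, A = Phi' F and B = Psi' G).
    C A = B is equivalent to A' C' = B', solved column by column."""
    return np.linalg.lstsq(A.T, B.T, rcond=None)[0].T
```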
33/54
Functional correspondence in shape collection
[Figure: shape collection S₁, S₂, …, S_L; each pair (S_i, S_j) contributes correspondence equations A_ij C_ij ≈ B_ij, parametrized as A_ij X_i ≈ B_ij X_j]
Kovnatsky, B2, Glashoff, Kimmel 2013; Kovnatsky, Glashoff, B 2015
34/54
Functional correspondence as manifold optimization
min_{(X₁,…,X_L) ∈ S^L(k,k)} Σ_{i≠j} ∥A_ij X_i − B_ij X_j∥_{2,1} + µ Σ_{i=1}^{L} tr(X_i⊺ Λ_i X_i)
where A_ij, B_ij are Fourier coefficients of given corresponding functions on shapes S_i, S_j, and Λ_i contains the first k eigenvalues of the Laplacian ∆_i
Joint diagonalization of Laplacians: find new bases Φ̂_i = Φ_i X_i that approximately diagonalize the ∆_i and are coupled
Optimization on a product of Stiefel manifolds S^L(k,k)
The L_{2,1}-norm copes with outliers in the correspondence data
X-step: manifold-constrained minimization of a quadratic function
Z-step: one iteration of row-wise shrinkage (see the sketch below)
Eynard, Kovnatsky, B2, Glashoff 2012; Kovnatsky, Glashoff, B 2015
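For the L_{2,1} term, the Z-step's shrinkage acts row-wise; a minimal sketch (ours) of the corresponding proximal operator:

```python
import numpy as np

def shrink_rows(V, alpha):
    """Proximal operator of alpha * ||.||_{2,1}: each row v of V is
    scaled by max(0, 1 - alpha / ||v||_2) (group shrinkage); rows with
    small norm are zeroed out, which suppresses outlier correspondences."""
    norms = np.maximum(np.linalg.norm(V, axis=1, keepdims=True), 1e-12)
    return np.maximum(0.0, 1.0 - alpha / norms) * V
```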
35/54
Correspondence data
Example of correspondence data (10% of outliers shown in red)
Kovnatsky, Glashoff, B 2015
36/54
Correspondence quality
[Figure: correspondences obtained with the robust (MADMM) formulation vs. least squares]
Kovnatsky, Glashoff, B 2015
37/54
Correspondence quality
[Plot: % of correspondences vs. % of geodesic diameter (Princeton protocol) — MADMM vs. least squares (LS)]
Kovnatsky, Glashoff, B 2015; data: B2, Kimmel 2008; benchmark: Kim et al. 2011
38/54
Convergence
[Plot: cost vs. time (sec) — convergence of the smoothing method vs. MADMM]
Kovnatsky, Glashoff, B 2015
39/54
Multimodal spectral clustering
[Figure: clustering results — uncoupled, no outliers: 100%; coupled (L₂), no outliers: 53%; coupled (L₂), 10% outliers: 72%; coupled (L_{2,1}), 10% outliers: 82%]
Kovnatsky, Glashoff, B 2015; Eynard, Kovnatsky, B2, Glashoff 2012
40/54
Multidimensional scaling
41/54
Multidimensional scaling
[Figure: a matrix D of pairwise distances and a corresponding point configuration X]
MDS problem: given an n × n (squared) distance matrix D, find a k-dimensional configuration of points X ∈ R^{n×k} such that ∥x_i − x_j∥² ≈ d_ij
Cayton, Dasgupta 2006
42/54
Similarity vs distance
Equivalence between (squared) distances (EDM cone) and similarities (PSD cone):
B = −½ HDH,   dist(B)_ij = b_ii + b_jj − 2b_ij
where H = I − (1/n)11⊺ is the double-centering matrix
Projection onto the PSD cone: B* = UΛ₊U⊺ (keep the positive eigenvalues)
Schoenberg 1938; Dattorro 2005; Cayton, Dasgupta 2006
43/54
Classical MDS
Algorithm 5 Classical MDS
Input: squared distance matrix D
Compute similarity by double centering: B = −½ HDH
Perform eigendecomposition B = UΛU⊺ and take the k largest positive eigenvalues Λ_k and corresponding eigenvectors U_k
Output: X = U_k Λ_k^{1/2}
Classical MDS as an optimization problem: minimize the strain
min_{X ∈ R^{n×k}} ∥B − XX⊺∥²_F,   B = −½ HDH
Young, Householder 1938
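Algorithm 5 in a few lines of NumPy (a sketch; D holds squared distances):

```python
import numpy as np

def classical_mds(D, k):
    """Algorithm 5: classical MDS from an n x n matrix D of squared distances."""
    n = D.shape[0]
    H = np.eye(n) - np.ones((n, n)) / n   # double-centering matrix
    B = -0.5 * H @ D @ H                  # similarity matrix
    w, U = np.linalg.eigh(B)              # eigenvalues in ascending order
    idx = np.argsort(w)[::-1][:k]         # k largest eigenvalues
    lam = np.maximum(w[idx], 0.0)         # keep only the positive part
    return U[:, idx] * np.sqrt(lam)       # X = U_k Lambda_k^{1/2}
```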
44/54
Sensitivity to outliers
Error dispersion by double centering
[Figure: a single error ε in the (squared) distance matrix is spread by B = −½HDH into errors of order ε, ε/n, and ε/n² across the similarity matrix]
Cayton, Dasgupta 2006
45/54
Sensitivity to outliers
[Figure: MDS embedding of distances between 10 US cities (Seattle, SF, LA, Denver, Chicago, Houston, Atlanta, Miami, WDC, NY), with the distance between NY and LA doubled — the single corrupted entry distorts the whole map]
Kruskal, Wish 1978; Cayton, Dasgupta 2006
46/54
Robust Euclidean embedding (REE)
Minimize a robust norm (instead of the Frobenius norm):
min_{D* ∈ EDM} ∥D − D*∥₁
and then recover a k-dimensional X from D* using classical MDS
Non-smooth
Can be formulated as a semidefinite program (SDP), or
solved by subgradient minimization
Cayton, Dasgupta 2006
47/54
REE as manifold optimization
min_{B ∈ S₊(n,k)} ∥D − dist(B)∥₁
Manifold of fixed-rank positive semidefinite matrices S₊(n,k) = {X ∈ R^{n×n} : X = X⊺ ⪰ 0, rank(X) = k}
Only the non-smooth term is present (f ≡ 0)
X-step: manifold-constrained minimization of a quadratic function
Z-step: one iteration of shrinkage
Kovnatsky, Glashoff, B 2015
48/54
REE by MADMM
Algorithm 6 MADMM for solving the REE problem
Input: squared distance matrix D
Initialize k ← 1, B(1), Z(1) = dist(B(1)) − D, U(1) = 0
repeat
  X-step: B(k+1) = argmin_{B ∈ S₊(n,k)} ∥dist(B) − D − Z(k) + U(k)∥²_F
  Z-step: Z(k+1) = Shrink_{1/ρ}(dist(B(k+1)) − D + U(k))
  Update U(k+1) = U(k) + dist(B(k+1)) − D − Z(k+1)
  k ← k + 1
until convergence
Kovnatsky, Glashoff, B 2015
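A sketch of Algorithm 6 (ours): dist(·) and its use in the iteration. Here the fixed-rank PSD constraint is handled by the common factorization B = YY⊺ (a surrogate for S₊(n,k) covering rank ≤ k), and the X-step is a few plain gradient steps on Y (with ρ/2 absorbed into the step size τ, which needs tuning), standing in for a proper manifold solver.

```python
import numpy as np

def dist_op(B):
    """dist(B)_ij = b_ii + b_jj - 2 b_ij: squared distances of a Gram matrix."""
    b = np.diag(B)
    return b[:, None] + b[None, :] - 2 * B

def shrink(X, alpha):
    return np.sign(X) * np.maximum(0.0, np.abs(X) - alpha)

def ree_madmm(D, k, rho=1.0, iters=500, y_iters=10, tau=1e-6):
    n = D.shape[0]
    Y = np.random.default_rng(0).standard_normal((n, k)) * 1e-2  # B = Y Y'
    Z = np.zeros((n, n))
    U = np.zeros((n, n))
    for _ in range(iters):
        for _ in range(y_iters):                      # approximate X-step over Y
            E = dist_op(Y @ Y.T) - D - Z + U          # residual of Z = dist(B) - D
            G = 8 * (np.diag(E.sum(axis=1)) - E) @ Y  # grad of ||E||_F^2 w.r.t. Y
            Y -= tau * G
        V = dist_op(Y @ Y.T) - D + U
        Z = shrink(V, 1.0 / rho)                      # Z-step: shrinkage
        U += dist_op(Y @ Y.T) - D - Z                 # dual update
    return Y @ Y.T
```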
49/54
Robust Euclidean embedding example
[Figure: ground truth vs. classical MDS vs. MADMM — embedding of distances between 500 US cities corrupted by sparse noise (doubling the distance between a few pairs of cities)]
Kovnatsky, Glashoff, B 2015
50/54
Scalability of REE
[Plot: time/iteration (sec) vs. problem size n — complexity of SDP, subgradient, and MADMM solvers for the REE problem]
Kovnatsky, Glashoff, B 2015; Cayton, Dasgupta 2006
51/54
Convergence
[Plot: stress vs. time (sec) — convergence of subgradient minimization vs. MADMM on an REE problem of size n = 500]
Kovnatsky, Glashoff, B 2015; Cayton, Dasgupta 2006
52/54
Conclusions
Non-smooth manifold optimization problems are ubiquitous in machine learning, pattern recognition, signal processing, and computer graphics applications
MADMM is a generic algorithm for such problems
Any manifold, any function
Very simple to implement
No parameters to tune
A. Kovnatsky, K. Glashoff, M. M. Bronstein, ‘MADMM: a generic algorithm fornon-smooth optimization on manifolds’, arXiv:1505.07676, May 2015
53/54
A. Kovnatsky
Funded by [funding-agency logos]
54/54
Thank you!