Massively scalable Sinkhorn distances via the Nyström method · 2019. 10. 25.


Page 1

Massively scalable Sinkhorn distances via the Nyström method

J. Altschuler, F. Bach, J. Niles-Weed, A. Rudi

Page 2

Optimal Transport

Increasingly popular tool in machine learning

Kusner et al. (2015)

[Figure: cell trajectories; labels: Day 0 (MEF), Day 18, Neural, Stromal, IPSC, Trophoblast; scale from low-probability to high-probability trajectory]

Schiebinger et al. (2018)

Feydy et al. (2017)

Gulrajani et al. (2017)

Page 3

Definitions

$p, q \in \Delta_n$: two probability distributions on $n$ points $x_1, \dots, x_n \in \mathbb{R}^d$, with $\|x_i\| \le R$ for all $i$.

[Figure: distributions $p$ and $q$, with points $x_i$ and $x_j$ marked]

Wasserstein distance between $p$ and $q$:

$$W(p, q) := \min_{P \in \mathcal{M}(p,q)} \sum_{i,j} P_{ij} \|x_i - x_j\|^2$$

Set of couplings between $p$ and $q$:

$$\mathcal{M}(p, q) := \{ P \in \mathbb{R}_+^{n \times n} : P\mathbf{1} = p,\ P^\top \mathbf{1} = q \}$$

Slow to compute
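To make the definitions above concrete, here is a minimal NumPy sketch. The point set, the weights and all variable names are illustrative assumptions, not taken from the slides. It builds the cost matrix $\|x_i - x_j\|^2$, checks that the independent coupling $pq^\top$ satisfies both marginal constraints, and evaluates its transport cost, which can only be an upper bound on $W(p, q)$.

```python
import numpy as np

rng = np.random.default_rng(0)
n, d = 5, 2
X = rng.normal(size=(n, d))          # illustrative points x_1, ..., x_n in R^d

# Two probability vectors p, q in the simplex Delta_n (illustrative values).
p = np.full(n, 1.0 / n)
q = rng.random(n)
q /= q.sum()

# Cost matrix C_ij = ||x_i - x_j||^2.
diff = X[:, None, :] - X[None, :, :]
C = np.sum(diff ** 2, axis=-1)

# The independent coupling P = p q^T is nonnegative and always feasible.
P = np.outer(p, q)
row_ok = np.allclose(P.sum(axis=1), p)   # P 1 = p
col_ok = np.allclose(P.sum(axis=0), q)   # P^T 1 = q

# Its cost upper-bounds W(p, q), because W minimizes over all couplings.
upper_bound = float(np.sum(P * C))
print(row_ok, col_ok, upper_bound)
```

Finding the minimizing coupling itself requires solving a linear program over all $n^2$ entries of $P$, which is why the exact distance is slow to compute.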

Page 4

Sinkhorn distance

Wasserstein distance between $p$ and $q$:

$$W(p, q) := \min_{P \in \mathcal{M}(p,q)} \sum_{i,j} P_{ij} \|x_i - x_j\|^2$$

Slow to compute: time $O(n^3)$

[Cuturi ’13]: Sinkhorn “distance”

$$W_\eta(p, q) := \min_{P \in \mathcal{M}(p,q)} \sum_{i,j} P_{ij} \|x_i - x_j\|^2 - \eta^{-1} H(P), \qquad H(P) = \sum_{ij} P_{ij} \log \frac{1}{P_{ij}}$$

Faster to compute: time $O(n^2)$

Can we do better?

Page 5

Our contribution

New algorithm (NYS-SINK) approximates Sinkhorn distance in $O(n)$ time and space

Guarantees automatically adaptive to low-dimensional structure

Outperforms standard approaches by orders of magnitude

Page 6

Why Sinkhorn distance?

Minimizer: unique matrix in $\mathcal{M}(p,q)$ of the form

$$D_1 \cdot e^{-\eta \|x_i - x_j\|^2} \cdot D_2 \in \mathcal{M}(p,q)$$

where $D_1, D_2$ are positive diagonal matrices and the exponential is taken entrywise.

Page 7

Sinkhorn scaling

Easy iterative algorithm to find minimizer

Goal: find (unique) $P^\eta = D_1 e^{-\eta \|x_i - x_j\|^2} D_2 \in \mathcal{M}(p,q)$

1. Rescale rows to match $p$
2. Rescale columns to match $q$
3. Repeat

Converges! [Sinkhorn ’67]

… in $O(n^2 \epsilon^{-2})$ time! [Altschuler, Weed, Rigollet ’17]
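The row and column rescaling steps above can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions: the function name `sinkhorn_scaling`, the stopping rule (`tol`, `max_iter`) and the random test data are all hypothetical choices, not the authors' implementation. The diagonal matrices $D_1, D_2$ are stored as the vectors `u` and `v`.

```python
import numpy as np

def sinkhorn_scaling(K, p, q, tol=1e-9, max_iter=10_000):
    """Alternately rescale rows and columns of a positive matrix K.

    Returns vectors u, v such that diag(u) @ K @ diag(v) has row sums
    close to p and column sums close to q.
    """
    u = np.ones_like(p)
    v = np.ones_like(q)
    for _ in range(max_iter):
        u = p / (K @ v)            # step 1: rescale rows to match p
        v = q / (K.T @ u)          # step 2: rescale columns to match q
        # Columns now match q exactly; stop once the rows also match p.
        row_err = np.abs(u * (K @ v) - p).sum()
        if row_err < tol:
            break
    return u, v

rng = np.random.default_rng(0)
n, eta = 50, 1.0                                   # illustrative sizes
X = rng.uniform(-1.0, 1.0, size=(n, 2))
sq_dists = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1)
K = np.exp(-eta * sq_dists)        # entrywise exponential e^{-eta ||x_i - x_j||^2}

p = np.full(n, 1.0 / n)
q = rng.random(n)
q /= q.sum()

u, v = sinkhorn_scaling(K, p, q)
P_eta = u[:, None] * K * v[None, :]   # D1 K D2 with D1 = diag(u), D2 = diag(v)
print(np.abs(P_eta.sum(axis=1) - p).max(), np.abs(P_eta.sum(axis=0) - q).max())
```

Each pass touches the kernel matrix only through matrix-vector products, which is the operation the later slides speed up.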

Page 8

Prior analysis [AWR ’17]

$O(1)$ iterations × $O(n^2)$ per iteration (matrix-vector product) = $O(n^2)$ time

[Diagram: (diagonal) · $e^{-\eta \|x_i - x_j\|^2}$ · (diagonal)]

Page 9

Idea #1

$O(1)$ iterations × $O(n)$ per iteration (matrix-vector product) = $O(n)$ time

[Diagram: (diagonal) · (low rank approximation, via Nyström method) · (diagonal)]
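As a rough illustration of why a low rank approximation makes each matrix-vector product cheap, the sketch below builds a Nyström factor `F` with `F @ F.T` approximating the Gaussian kernel matrix, then multiplies by a vector in $O(nr)$ operations. Uniform landmark sampling, the eigenvalue cutoff `rel_tol`, and the names `gaussian_kernel` and `nystrom_factor` are assumptions made for this example; the slides do not specify these details.

```python
import numpy as np

def gaussian_kernel(A, B, eta):
    """Entries e^{-eta ||a_i - b_j||^2} for rows a_i of A and b_j of B."""
    sq = (np.sum(A ** 2, axis=1)[:, None] + np.sum(B ** 2, axis=1)[None, :]
          - 2.0 * A @ B.T)
    return np.exp(-eta * np.maximum(sq, 0.0))

def nystrom_factor(X, eta, r, rng, rel_tol=1e-10):
    """Return F with at most r columns such that F @ F.T approximates K.

    Landmarks are sampled uniformly (an assumption of this sketch); tiny
    eigenvalues of the landmark block are dropped for numerical stability.
    """
    idx = rng.choice(X.shape[0], size=r, replace=False)
    C = gaussian_kernel(X, X[idx], eta)      # n x r block of K
    W = C[idx]                               # r x r landmark block
    evals, evecs = np.linalg.eigh(W)
    keep = evals > rel_tol * evals.max()
    return C @ (evecs[:, keep] / np.sqrt(evals[keep]))

rng = np.random.default_rng(0)
n, eta, r = 2000, 1.0, 60                    # illustrative sizes
X = rng.uniform(-1.0, 1.0, size=(n, 2))
F = nystrom_factor(X, eta, r, rng)

v = rng.random(n)
fast = F @ (F.T @ v)                         # O(n r): never forms the n x n matrix
exact = gaussian_kernel(X, X, eta) @ v       # O(n^2): built here for comparison only
rel_err = np.linalg.norm(fast - exact) / np.linalg.norm(exact)
print(F.shape, rel_err)
```

With the rank `r` held fixed, both the storage for `F` and the cost of `F @ (F.T @ v)` grow linearly in `n`.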

Page 10

Idea #1

$O(1)$ iterations × $O(n)$ per iteration (matrix-vector product) = $O(n)$ time

[Diagram: (diagonal) · (low rank approximation, via Nyström method) · (diagonal)]

Rigorous analysis needs new stability guarantees for Sinkhorn scaling
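A hedged sketch of how the two pieces might fit together: Sinkhorn scaling run directly on a Nyström factor, so that the $n \times n$ kernel matrix is never formed. This is not the NYS-SINK algorithm as analysed by the authors; it omits any safeguards the analysis may require and simply assumes the approximation is accurate enough that every product `F @ (F.T @ v)` stays positive. All function names, the fixed iteration count and the test data are hypothetical.

```python
import numpy as np

def gaussian_kernel(A, B, eta):
    """Entries e^{-eta ||a_i - b_j||^2} for rows a_i of A and b_j of B."""
    sq = (np.sum(A ** 2, axis=1)[:, None] + np.sum(B ** 2, axis=1)[None, :]
          - 2.0 * A @ B.T)
    return np.exp(-eta * np.maximum(sq, 0.0))

def nystrom_factor(X, eta, r, rng, rel_tol=1e-10):
    """Return F with at most r columns such that F @ F.T approximates K."""
    idx = rng.choice(X.shape[0], size=r, replace=False)  # uniform landmarks (assumption)
    C = gaussian_kernel(X, X[idx], eta)
    evals, evecs = np.linalg.eigh(C[idx])
    keep = evals > rel_tol * evals.max()
    return C @ (evecs[:, keep] / np.sqrt(evals[keep]))

def sinkhorn_low_rank(F, p, q, n_iter=200):
    """Sinkhorn scaling with K replaced by the symmetric matrix F @ F.T.

    Assumes F @ (F.T @ v) is entrywise positive for every positive v
    encountered, which needs the approximation to be accurate enough.
    """
    u = np.ones_like(p)
    v = np.ones_like(q)
    for _ in range(n_iter):
        u = p / (F @ (F.T @ v))    # rescale rows, O(n r)
        v = q / (F @ (F.T @ u))    # rescale columns, O(n r)
    return u, v

rng = np.random.default_rng(0)
n, eta, r = 1000, 1.0, 40                    # illustrative sizes
X = rng.uniform(-0.5, 0.5, size=(n, 2))
p = np.full(n, 1.0 / n)
q = rng.random(n)
q /= q.sum()

F = nystrom_factor(X, eta, r, rng)
u, v = sinkhorn_low_rank(F, p, q)

# Marginals of the implicit plan diag(u) (F F^T) diag(v), computed in O(n r).
row_sums = u * (F @ (F.T @ v))
col_sums = v * (F @ (F.T @ u))
print(np.abs(row_sums - p).sum(), np.abs(col_sums - q).sum())
```

The scaled plan matches the marginals of the approximate kernel, not the true one, which is one way to see why the slide asks for stability guarantees: the analysis has to control how the kernel approximation error propagates through the scaling.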

Page 11

Idea #1

[Diagram: (diagonal) · (low rank approximation, via Nyström method) · (diagonal)]

Works well, but error guarantee depends on ambient dimension (could be large)

Page 12

Idea #2

[Diagram: (diagonal) · (low rank approximation, via Nyström method) · (diagonal)]

Works well, but error guarantee depends on ambient dimension (could be large)

Choose rank adaptively and pay only for intrinsic dimension

Page 13

Idea #2

[Diagram: (diagonal) · (low rank approximation, via Nyström method) · (diagonal)]

Works well, but error guarantee depends on ambient dimension (could be large)

Choose rank adaptively and pay only for intrinsic dimension

Rigorous analysis needs new interpolation guarantees for Gaussian kernel
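One hypothetical way to picture "choose rank adaptively" is a doubling loop that grows the number of Nyström landmarks until a cheap error estimate falls below a tolerance. The sketch below is an illustrative heuristic only, not the rank-selection rule analysed by the authors. It relies on two facts: the Gaussian kernel has ones on its diagonal, and the Nyström residual `K - F @ F.T` is positive semidefinite, so its trace `n - sum(F**2)` bounds its operator norm and costs only $O(nr)$ to evaluate. The names `adaptive_nystrom`, `tol` and `r0`, and both data sets, are assumptions.

```python
import numpy as np

def gaussian_kernel(A, B, eta):
    """Entries e^{-eta ||a_i - b_j||^2} for rows a_i of A and b_j of B."""
    sq = (np.sum(A ** 2, axis=1)[:, None] + np.sum(B ** 2, axis=1)[None, :]
          - 2.0 * A @ B.T)
    return np.exp(-eta * np.maximum(sq, 0.0))

def nystrom_factor(X, eta, r, rng, rel_tol=1e-10):
    """Return F with at most r columns such that F @ F.T approximates K."""
    idx = rng.choice(X.shape[0], size=r, replace=False)  # uniform landmarks (assumption)
    C = gaussian_kernel(X, X[idx], eta)
    evals, evecs = np.linalg.eigh(C[idx])
    keep = evals > rel_tol * evals.max()
    return C @ (evecs[:, keep] / np.sqrt(evals[keep]))

def adaptive_nystrom(X, eta, tol, rng, r0=4):
    """Double the landmark count until trace(K - F F^T) <= tol * n."""
    n = X.shape[0]
    r = r0
    while True:
        r = min(r, n)
        F = nystrom_factor(X, eta, r, rng)
        # trace(K) = n for the Gaussian kernel; trace(F F^T) = sum of F**2.
        trace_err = n - np.sum(F ** 2)
        if trace_err <= tol * n or r == n:
            return F, r
        r *= 2

rng = np.random.default_rng(0)
n, d, eta, tol = 500, 50, 1.0, 1e-3          # illustrative sizes

# Data on a 2-dimensional subspace of R^50 (low intrinsic dimension).
Q, _ = np.linalg.qr(rng.normal(size=(d, 2)))
X_low = rng.uniform(-0.7, 0.7, size=(n, 2)) @ Q.T

# Data spread over all 50 dimensions, with norms close to 1.
X_high = rng.normal(size=(n, d)) / np.sqrt(d)

F_low, r_low = adaptive_nystrom(X_low, eta, tol, rng)
F_high, r_high = adaptive_nystrom(X_high, eta, tol, rng)
print(r_low, r_high)
```

Both data sets live in the same ambient dimension, yet the loop stops at a much smaller rank for the data confined to a 2-dimensional subspace, which is the behaviour the slide describes as paying only for intrinsic dimension.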

Page 14

Experiments

Comparison with existing approaches ($r$: approximation rank, $T$: iterations)

Page 15

Experiments

comparable results

Page 16

Experiments

orders-of-magnitude speedup

Page 17

Experiments

Comparison with Sinkhorn scaling ($\varepsilon$: desired accuracy)

Page 18

Experiments

linear vs. quadratic runtime

Page 19

Experiments

much larger problems solvable in memory