Optimal Transport vs. Fisher-Rao distance between Copulas
Introduction | Statistical distances
Optimal Transport vs. Fisher-Rao distance between Copulas
IEEE SSP 2016
G. Marti, S. Andler, F. Nielsen, P. Donnat
June 28, 2016
Gautier Marti Optimal Transport vs. Fisher-Rao distance between Copulas
Clustering of Time Series
We need a distance $D_{ij}$ between time series $x_i$ and $x_j$
If we look for ‘correlation’, $D_{ij}$ is a decreasing function of $\rho_{ij}$, a measure of ‘correlation’
Several choices are available for $\rho_{ij}$ ...
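As a minimal sketch of one such choice, the mapping $D_{ij} = \sqrt{2(1 - \rho_{ij})}$ turns a Pearson or Spearman correlation into a distance that decreases with $\rho_{ij}$. The mapping and the function name `correlation_distance` are illustrative assumptions; the slides do not commit to a specific formula.

```python
import numpy as np
from scipy.stats import spearmanr


def correlation_distance(x, y, kind="pearson"):
    """Distance decreasing in the correlation: D = sqrt(2 * (1 - rho)).

    Assumption: this mapping is one standard option, not necessarily
    the one used by the authors.
    """
    if kind == "pearson":
        rho = np.corrcoef(x, y)[0, 1]
    elif kind == "spearman":
        rho = spearmanr(x, y)[0]
    else:
        raise ValueError(f"unknown correlation kind: {kind}")
    return float(np.sqrt(max(0.0, 2.0 * (1.0 - rho))))


rng = np.random.default_rng(0)
x = rng.normal(size=1000)
y_close = x + 0.1 * rng.normal(size=1000)  # strongly correlated with x
y_far = rng.normal(size=1000)              # independent of x

d_close = correlation_distance(x, y_close)
d_far = correlation_distance(x, y_far, kind="spearman")
print(d_close, d_far)
```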
Copulas
Sklar’s Theorem:
$F(x_i, x_j) = C_{ij}(F_i(x_i), F_j(x_j))$
$C_{ij}$, the copula, encodes the dependence structure
Fréchet-Hoeffding bounds:
$\max\{u_i + u_j - 1, 0\} \le C_{ij}(u_i, u_j) \le \min\{u_i, u_j\}$
(left) lower-bound, (mid) independence, (right) upper-bound copulas
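The three copulas named in the caption have simple closed forms, so the bounds can be checked numerically. The sketch below evaluates them on a grid of $[0, 1]^2$ and verifies that the independence copula sits between the two bounds; the function names are illustrative.

```python
import numpy as np


def lower_bound(u, v):
    """Counter-monotonic copula W(u, v) = max(u + v - 1, 0)."""
    return np.maximum(u + v - 1.0, 0.0)


def independence(u, v):
    """Independence copula Pi(u, v) = u * v."""
    return u * v


def upper_bound(u, v):
    """Comonotonic copula M(u, v) = min(u, v)."""
    return np.minimum(u, v)


grid = np.linspace(0.0, 1.0, 101)
u, v = np.meshgrid(grid, grid)
w, pi, m = lower_bound(u, v), independence(u, v), upper_bound(u, v)

# Small tolerance for floating-point rounding on the boundary of the square.
bounds_hold = bool(np.all(w <= pi + 1e-12) and np.all(pi <= m + 1e-12))
print(bounds_hold)
```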
Copulas - Gaussian Example
Gaussian copula: $C^{\mathrm{Gauss}}_R(u_i, u_j) = \Phi_R(\Phi^{-1}(u_i), \Phi^{-1}(u_j))$
The distribution is parametrized by a correlation matrix $R$.
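A Gaussian copula can be sampled by drawing from $N(0, R)$ and pushing each coordinate through the standard normal CDF. The sketch below does this for an assumed correlation of 0.7 and compares the sample Kendall's tau with the known value $\frac{2}{\pi} \arcsin(\rho)$; the helper name `sample_gaussian_copula` is hypothetical.

```python
import numpy as np
from scipy.stats import kendalltau, norm


def sample_gaussian_copula(R, n, seed=0):
    """Draw n points from the Gaussian copula with correlation matrix R."""
    rng = np.random.default_rng(seed)
    z = rng.multivariate_normal(mean=np.zeros(len(R)), cov=R, size=n)
    return norm.cdf(z)  # each margin becomes uniform on [0, 1]


rho = 0.7  # illustrative value
R = np.array([[1.0, rho], [rho, 1.0]])
u = sample_gaussian_copula(R, n=20_000)

tau_sample = kendalltau(u[:, 0], u[:, 1])[0]
tau_theory = 2.0 / np.pi * np.arcsin(rho)
print(tau_sample, tau_theory)
```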
The Target/Forget (copula-based) Dependence Coefficient
Dependence is measured as the relative distance from independence to the nearest target-dependence: comonotonicity or counter-monotonicity
Which distances are appropriate between copulas for the task of clustering (copulas and time series)?
Definitions - Fisher-Rao geodesic distance
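Metrization of the parameter space $\{\theta \in \mathbb{R}^d \mid \int p(x; \theta)\,dx = 1\}$.
Consider the metric $g_{jk}(\theta) = -\int \frac{\partial^2 \log p(x; \theta)}{\partial\theta_j \partial\theta_k}\, p(x; \theta)\,dx$,
the infinitesimal length $ds(\theta) = \sqrt{(\nabla\theta)^\top G(\theta)\, \nabla\theta}$, where $G(\theta) = (g_{jk}(\theta))_{jk}$,
the Fisher-Rao geodesic distance
$FR(\theta_1, \theta_2) = \int_{\theta_1}^{\theta_2} ds(\theta)$.
$f$-divergences induce an infinitesimal length proportional to the Fisher-Rao infinitesimal length:
$D_f(\theta \,\|\, \theta + d\theta) = \frac{1}{2} (\nabla\theta)^\top G(\theta)\, \nabla\theta$.
Thus, they have the same local behaviour [1].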
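The shared local behaviour can be checked numerically on the one-parameter family of bivariate Gaussians with unit variances and correlation $\rho$. In the sketch below, the Kullback-Leibler divergence and the squared Hellinger distance (taken here as $1 - \int \sqrt{pq}$) are evaluated between $\rho$ and $\rho + d\rho$. After rescaling by their respective constants (2 and 8 for these two conventions), both should give nearly the same estimate of the metric $G(\rho)$. The step `d_rho` and the test correlations are illustrative choices.

```python
import numpy as np


def kl_gauss(rho1, rho2):
    """KL(N(0, R1) || N(0, R2)) for 2x2 correlation matrices R1, R2."""
    return ((1.0 - rho1 * rho2) / (1.0 - rho2**2) - 1.0
            + 0.5 * np.log((1.0 - rho2**2) / (1.0 - rho1**2)))


def hellinger2_gauss(rho1, rho2):
    """Squared Hellinger distance, 1 - Bhattacharyya coefficient."""
    det1, det2 = 1.0 - rho1**2, 1.0 - rho2**2
    det_mid = 1.0 - (0.5 * (rho1 + rho2)) ** 2  # det of (R1 + R2) / 2
    return 1.0 - (det1 * det2) ** 0.25 / np.sqrt(det_mid)


d_rho = 1e-4
estimates = []
for rho in (0.0, 0.5, 0.9):
    g_from_kl = 2.0 * kl_gauss(rho, rho + d_rho) / d_rho**2
    g_from_hellinger = 8.0 * hellinger2_gauss(rho, rho + d_rho) / d_rho**2
    estimates.append((rho, g_from_kl, g_from_hellinger))
    print(rho, g_from_kl, g_from_hellinger)
```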
Definitions - Optimal Transport distances
Wasserstein metric
$W_p(\mu, \nu)^p = \inf_{\gamma \in \Gamma(\mu, \nu)} \int_{M \times M} d(x, y)^p \, d\gamma(x, y)$
Image from Optimal Transport for Image Processing, Papadakis
Other transportation distances: regularized discrete optimal transport [3], Sinkhorn distances [2], ...
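For two empirical measures with the same number of equally weighted points, the infimum over couplings is attained by a permutation, so $W_p$ reduces to an assignment problem. The sketch below uses SciPy's `linear_sum_assignment` for this exact (unregularized) case; it is not the Sinkhorn or regularized variant cited above, and the sample sizes and shift are illustrative.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment
from scipy.spatial.distance import cdist


def empirical_wasserstein(x, y, p=2):
    """W_p between two uniform empirical measures with equal sample sizes."""
    cost = cdist(x, y, metric="euclidean") ** p  # d(x_i, y_j)^p
    rows, cols = linear_sum_assignment(cost)     # optimal one-to-one matching
    return float(cost[rows, cols].mean() ** (1.0 / p))


rng = np.random.default_rng(0)
x = rng.uniform(size=(200, 2))
y = x + np.array([0.3, 0.4])  # same cloud, translated by a vector of norm 0.5

w = empirical_wasserstein(x, y, p=2)
print(w)
```

Because the second cloud is a pure translation of the first, the optimal plan moves every point by the same vector and $W_2$ equals the norm of the shift.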
Geometry of covariances
Distances between Gaussian copulas
Copulas $C_1, C_2, C_3$ encoding a correlation of 0.5, 0.99, 0.9999 respectively.
Which pair of copulas is the nearest?
- For Fisher-Rao, Kullback-Leibler, Hellinger and related divergences: $D(C_1, C_2) \le D(C_2, C_3)$;
- For Wasserstein: $W_2(C_2, C_3) \le W_2(C_1, C_2)$
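Both orderings can be reproduced with a short sketch under one assumption: each Gaussian copula is represented by the zero-mean bivariate Gaussian with its correlation matrix. The Fisher-Rao length along the one-parameter family is approximated by summing $\sqrt{2\,\mathrm{KL}}$ over a fine partition of the path, which avoids relying on a closed form for the metric. $W_2$ uses the eigenvalue form valid for commuting covariance matrices. The function names and the number of steps are illustrative.

```python
import numpy as np


def kl_gauss(rho1, rho2):
    """KL(N(0, R1) || N(0, R2)) for 2x2 correlation matrices R1, R2."""
    return ((1.0 - rho1 * rho2) / (1.0 - rho2**2) - 1.0
            + 0.5 * np.log((1.0 - rho2**2) / (1.0 - rho1**2)))


def fisher_rao(rho_a, rho_b, n_steps=5000):
    """Approximate path length: sum of sqrt(2 * KL) between neighbouring points."""
    # Uniform grid in arctanh(rho) puts more points near |rho| = 1.
    t = np.linspace(np.arctanh(rho_a), np.arctanh(rho_b), n_steps + 1)
    rho = np.tanh(t)
    kl = np.maximum(kl_gauss(rho[:-1], rho[1:]), 0.0)  # guard against rounding
    return float(np.sum(np.sqrt(2.0 * kl)))


def wasserstein2(rho_a, rho_b):
    """W2 between N(0, R_a) and N(0, R_b); eigenvalues of R are 1 + rho, 1 - rho."""
    return float(np.hypot(np.sqrt(1.0 + rho_a) - np.sqrt(1.0 + rho_b),
                          np.sqrt(1.0 - rho_a) - np.sqrt(1.0 - rho_b)))


c1, c2, c3 = 0.5, 0.99, 0.9999
fr12, fr23 = fisher_rao(c1, c2), fisher_rao(c2, c3)
w12, w23 = wasserstein2(c1, c2), wasserstein2(c2, c3)
print("Fisher-Rao:", fr12, fr23)
print("W2:", w12, w23)
```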
Distances as a function of $(\rho_1, \rho_2)$
Distance heatmap and surface as a function of $(\rho_1, \rho_2)$, for Fisher-Rao and for Wasserstein $W_2$
Distances impact on clustering
Datasets of bivariate time series are generated from six Gaussian copulas with correlation .1, .2, .6, .7, .99, .9999
Distance heatmaps for Fisher-Rao (left), $W_2$ (right). Using Ward clustering, Fisher-Rao yields clusters of copulas with correlations {.1, .2, .6, .7}, {.99}, {.9999}; $W_2$ yields {.1, .2}, {.6, .7}, {.99, .9999}
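The two partitions can be reproduced with a small sketch that skips the simulation step and applies Ward clustering directly to exact parametric distances between the six correlations. This is a simplification of the experiment described above, which estimates copulas from generated time series. The distance helpers make the same assumptions as before (zero-mean bivariate Gaussians, Fisher-Rao length approximated by summed $\sqrt{2\,\mathrm{KL}}$), and all names are illustrative.

```python
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage
from scipy.spatial.distance import squareform

rhos = [0.1, 0.2, 0.6, 0.7, 0.99, 0.9999]


def kl_gauss(rho1, rho2):
    return ((1.0 - rho1 * rho2) / (1.0 - rho2**2) - 1.0
            + 0.5 * np.log((1.0 - rho2**2) / (1.0 - rho1**2)))


def fisher_rao(rho_a, rho_b, n_steps=5000):
    rho = np.tanh(np.linspace(np.arctanh(rho_a), np.arctanh(rho_b), n_steps + 1))
    return float(np.sum(np.sqrt(2.0 * np.maximum(kl_gauss(rho[:-1], rho[1:]), 0.0))))


def wasserstein2(rho_a, rho_b):
    return float(np.hypot(np.sqrt(1.0 + rho_a) - np.sqrt(1.0 + rho_b),
                          np.sqrt(1.0 - rho_a) - np.sqrt(1.0 - rho_b)))


def ward_partition(dist_fn, k=3):
    """Ward clustering on a precomputed distance matrix, cut into k clusters."""
    # Ordering the arguments keeps the matrix exactly symmetric.
    D = np.array([[dist_fn(min(a, b), max(a, b)) for b in rhos] for a in rhos])
    Z = linkage(squareform(D), method="ward")
    labels = fcluster(Z, t=k, criterion="maxclust")
    return sorted(sorted(r for r, l in zip(rhos, labels) if l == c)
                  for c in set(labels))


fr_clusters = ward_partition(fisher_rao)
w2_clusters = ward_partition(wasserstein2)
print("Fisher-Rao:", fr_clusters)
print("W2:", w2_clusters)
```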
Fisher metric and the Cramér–Rao lower bound
Cramér–Rao lower bound (CRLB)
The variance of any unbiased estimator $\hat{\theta}$ of $\theta$ is bounded by the reciprocal of the Fisher information $G(\theta)$:
$\mathrm{var}(\hat{\theta}) \ge \frac{1}{G(\theta)}$.
In the bivariate Gaussian copula case,
$\mathrm{var}(\hat{\rho}) \ge \frac{(\rho - 1)^2 (\rho + 1)^2}{\rho^2 + 1}$.
We consider the set of $2 \times 2$ correlation matrices $C = \begin{pmatrix} 1 & \theta \\ \theta & 1 \end{pmatrix}$ parameterized by $\theta$.
Let $x = \begin{pmatrix} x_1 \\ x_2 \end{pmatrix} \in \mathbb{R}^2$.
$f(x; \theta) = \frac{1}{2\pi\sqrt{1-\theta^2}} \exp\left(-\frac{1}{2} x^\top C^{-1} x\right) = \frac{1}{2\pi\sqrt{1-\theta^2}} \exp\left(-\frac{1}{2(1-\theta^2)} (x_1^2 + x_2^2 - 2\theta x_1 x_2)\right)$
$\log f(x; \theta) = -\log\left(2\pi\sqrt{1-\theta^2}\right) - \frac{1}{2(1-\theta^2)} (x_1^2 + x_2^2 - 2\theta x_1 x_2)$
$\frac{\partial^2 \log f(x; \theta)}{\partial\theta^2} = \frac{\theta^2+1}{(\theta^2-1)^2} - \frac{x_1^2}{2(\theta+1)^3} + \frac{x_1^2}{2(\theta-1)^3} - \frac{x_2^2}{2(\theta+1)^3} + \frac{x_2^2}{2(\theta-1)^3} - \frac{x_1 x_2}{(\theta+1)^3} - \frac{x_1 x_2}{(\theta-1)^3}$
Then, we compute $\int_{-\infty}^{\infty} \frac{\partial^2 \log f(x; \theta)}{\partial\theta^2} f(x; \theta)\,dx$.
Since $E[x_1] = E[x_2] = 0$, $E[x_1 x_2] = \theta$, $E[x_1^2] = E[x_2^2] = 1$, we get
$\int_{-\infty}^{\infty} \frac{\partial^2 \log f(x; \theta)}{\partial\theta^2} f(x; \theta)\,dx = \frac{\theta^2+1}{(\theta^2-1)^2} - \frac{1}{2(\theta+1)^3} + \frac{1}{2(\theta-1)^3} - \frac{1}{2(\theta+1)^3} + \frac{1}{2(\theta-1)^3} - \frac{\theta}{(\theta+1)^3} - \frac{\theta}{(\theta-1)^3} = -\frac{\theta^2+1}{(\theta-1)^2 (\theta+1)^2}$
Thus,
$G(\theta) = \frac{\theta^2 + 1}{(\theta - 1)^2 (\theta + 1)^2}$.
Recall that locally Fisher-Rao and the $f$-divergences are a quadratic form of the Fisher metric $(\nabla\theta)^\top G(\theta) \nabla\theta$. So, the discriminative power of these distances is well calibrated with respect to statistical uncertainty: the bound shrinks to 0 as $|\rho| \to 1$, so nearby strong correlations are statistically easier to tell apart than nearby weak ones. For this purpose, they induce the appropriate curvature on the parameter space.
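The Fisher information also equals the variance of the score $\partial \log f / \partial\theta$, which gives an independent numerical cross-check that can be compared against the closed form for $G(\theta)$. The sketch below estimates it by Monte Carlo for a few correlations and prints the implied Cramér–Rao bound $1 / G(\theta)$. The sample size, seed and test values are illustrative, and the score expression is obtained by differentiating $\log f(x; \theta)$ once.

```python
import numpy as np


def score(x1, x2, theta):
    """d/dtheta of log f(x; theta) for unit variances and correlation theta."""
    a = 1.0 - theta**2
    q = x1**2 + x2**2 - 2.0 * theta * x1 * x2
    return theta / a + (x1 * x2 * a - theta * q) / a**2


def fisher_information_mc(theta, n=400_000, seed=0):
    """Monte Carlo estimate of G(theta) = E[score^2]."""
    rng = np.random.default_rng(seed)
    cov = np.array([[1.0, theta], [theta, 1.0]])
    x = rng.multivariate_normal(np.zeros(2), cov, size=n)
    s = score(x[:, 0], x[:, 1], theta)
    return float(np.mean(s**2))


mc_estimates = {}
for theta in (0.0, 0.5, 0.9):
    g = fisher_information_mc(theta)
    mc_estimates[theta] = g
    print(f"theta={theta}: G ~ {g:.3f}, CRLB ~ {1.0 / g:.4f}")
```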
Properties of these distances
In addition, for clustering we prefer OT since:
- in a parametric setting:
  - Fisher-Rao and $f$-divergences are defined on density manifolds, but some important copulas (such as the Fréchet-Hoeffding upper bound) do not belong to these manifolds;
  - thus, in case of closed-form formulas (such as in the Gaussian case), they are ill-defined for these copulas (for perfect dependence, covariance is not invertible)
- in a non-parametric/empirical setting:
  - $f$-divergences are defined for absolutely continuous measures, thus require a pre-processing KDE
  - they are not aware of the support geometry, thus badly handle noise on the support
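The parametric point can be illustrated in the Gaussian case. As the target correlation approaches 1, the Kullback-Leibler divergence from a fixed copula grows without bound, while $W_2$ stays finite and has a well-defined value at $\rho = 1$. The sketch assumes the upper bound is represented by the degenerate Gaussian with a singular correlation matrix; the reference correlation 0.5 and the sequence of targets are illustrative.

```python
import numpy as np


def kl_gauss(rho1, rho2):
    """KL(N(0, R1) || N(0, R2)); undefined at rho2 = 1 (singular R2)."""
    return ((1.0 - rho1 * rho2) / (1.0 - rho2**2) - 1.0
            + 0.5 * np.log((1.0 - rho2**2) / (1.0 - rho1**2)))


def wasserstein2(rho_a, rho_b):
    """W2 between N(0, R_a) and N(0, R_b); also valid for singular matrices."""
    return float(np.hypot(np.sqrt(1.0 + rho_a) - np.sqrt(1.0 + rho_b),
                          np.sqrt(1.0 - rho_a) - np.sqrt(1.0 - rho_b)))


reference = 0.5
targets = (0.9, 0.99, 0.9999, 0.999999)
kl_values = [float(kl_gauss(reference, r)) for r in targets]
w2_values = [wasserstein2(reference, r) for r in targets]
w2_to_upper_bound = wasserstein2(reference, 1.0)  # finite at perfect dependence

for r, kl, w2 in zip(targets, kl_values, w2_values):
    print(f"rho={r}: KL={kl:.1f}, W2={w2:.4f}")
print("W2 to the upper bound:", w2_to_upper_bound)
```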
Barycenters
OT is defined for both discrete/empirical and continuous measures and is support-geometry aware:
[Figure: heatmaps of the 5 copulas on $[0, 1]^2$]
5 copulas describing the dependence between $X \sim U([0, 1])$ and $Y \sim (X \pm \varepsilon_i)^2$, where $\varepsilon_i$ is a constant noise specific for each distribution
[Figure: Wasserstein barycenter copula]
Barycenter of the 5 copulas for a divergence and OT
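The contrast between the two barycenters can be sketched in one dimension, which is a deliberate simplification of the two-dimensional copula example. Five narrow bumps differ only by a shift of their support. Averaging the densities pointwise (used here as a stand-in for a divergence-based barycenter) gives a wide, multi-modal mixture, whereas the $W_2$ barycenter, obtained in 1D by averaging quantile functions, is a single narrow bump at the mean location. The shifts, widths and sample size are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)
shifts = np.array([-2.0, -1.0, 0.0, 1.0, 2.0])  # hypothetical support noise
n = 2000

# Sorted samples play the role of empirical quantile functions.
samples = [np.sort(shift + 0.1 * rng.normal(size=n)) for shift in shifts]

# Pointwise average of densities: pooling the samples gives the mixture.
mixture = np.concatenate(samples)

# 1D W2 barycenter: average the quantile functions, order statistic by
# order statistic.
w2_barycenter = np.mean(samples, axis=0)

print("mixture std:", mixture.std())
print("W2 barycenter std:", w2_barycenter.std())
```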
Future Research
Develop further geometries of copulas
- using Optimal Transport: show that dependence-clustering of time series is improved over standard correlations
- using $f$-divergences: detect efficiently dependence-regime switching in multivariate time series (cf. Frédéric Barbaresco’s work on radar signal processing)
Numerical experiments and code:
https://www.datagrapple.com/Tech/fisher-vs-ot.html
[1] Shun-ichi Amari and Andrzej Cichocki. Information geometry of divergence functions. Bulletin of the Polish Academy of Sciences: Technical Sciences, 58(1):183–195, 2010.
[2] Marco Cuturi. Sinkhorn distances: Lightspeed computation of optimal transport. In Advances in Neural Information Processing Systems, pages 2292–2300, 2013.
[3] Sira Ferradans, Nicolas Papadakis, Julien Rabin, Gabriel Peyré, and Jean-François Aujol. Regularized discrete optimal transport. Springer, 2013.