Differential Stein operators for multivariate continuous distributions and applications
Gesine Reinert
A French/American Collaborative Colloquium on Concentration Inequalities, High Dimensional Statistics and Stein’s Method
July 4th, 2017
Joint work with Guillaume Mijoule and Yvik Swan (Liege)
Stein’s method
Outline
1 Stein’s method
2 The score function and the Stein kernel
3 Higher dimensions
4 Stein operators $\mathcal{T}_pF = \operatorname{div}(Fp)/p$
5 Last remarks
Stein’s method in a nutshell
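For $\mu$ a target distribution with support $I$:
1. Find a suitable operator $\mathcal{A}$ (called a Stein operator) and a wide class of functions $\mathcal{F}(\mathcal{A})$ (called a Stein class) such that $X \sim \mu$ if and only if, for all functions $f \in \mathcal{F}(\mathcal{A})$,
$$\mathbb{E}\,\mathcal{A}f(X) = 0.$$
2. Let $\mathcal{H}(I)$ be a measure-determining class on $I$. For each $h \in \mathcal{H}(I)$ find a solution $f = f_h \in \mathcal{F}(\mathcal{A})$ of the Stein equation
$$h(x) - \mathbb{E}h(X) = \mathcal{A}f(x),$$
where $X \sim \mu$. If the solution exists and is unique in $\mathcal{F}(\mathcal{A})$, then we can write
$$f(x) = \mathcal{A}^{-1}\big(h(x) - \mathbb{E}h(X)\big).$$
We call $\mathcal{A}^{-1}$ the inverse Stein operator (for $\mu$).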
Example: mean zero normal
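Stein (1972, 1986); see also Chen, Goldstein and Shao (2011). $Z \sim \mathcal{N}(0, \sigma^2)$ if and only if, for all smooth functions $f$,
$$\mathbb{E}[Zf(Z)] = \sigma^2\,\mathbb{E}f'(Z).$$
Given a test function $h$, let $Z \sim \mathcal{N}(0, \sigma^2)$; the Stein equation is
$$\sigma^2 f'(w) - wf(w) = h(w) - \mathbb{E}h(Z),$$
which has as unique bounded solution
$$f(y) = \frac{1}{\sigma^2}\, e^{y^2/2\sigma^2} \int_{-\infty}^{y} \big(h(x) - \mathbb{E}h(Z)\big)\, e^{-x^2/2\sigma^2}\, dx.$$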
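A minimal numerical sketch of this solution, assuming SciPy's `quad` for the integrals and a central finite difference for $f'$ (both our choices, not part of the original). It builds $f$ for a given $h$ and checks the residual of the Stein equation at a few points. For $h(x) = x^2$ the solution is $f(y) = -y$, which gives a second check.

```python
import numpy as np
from scipy.integrate import quad


def normal_stein_solution(h, sigma):
    """Return (f, E h(Z)) where f solves sigma^2 f'(w) - w f(w) = h(w) - E h(Z), Z ~ N(0, sigma^2)."""
    weight = lambda x: np.exp(-x**2 / (2 * sigma**2))
    density = lambda x: weight(x) / np.sqrt(2 * np.pi * sigma**2)
    eh, _ = quad(lambda x: h(x) * density(x), -np.inf, np.inf, epsabs=1e-12)

    def f(y):
        # f(y) = sigma^{-2} e^{y^2/2sigma^2} * int_{-inf}^y (h(x) - E h(Z)) e^{-x^2/2sigma^2} dx
        integral, _ = quad(lambda x: (h(x) - eh) * weight(x), -np.inf, y,
                           epsabs=1e-12, epsrel=1e-10)
        return integral / (sigma**2 * weight(y))

    return f, eh


def stein_residual(f, h, eh, sigma, w, step=1e-3):
    """sigma^2 f'(w) - w f(w) - (h(w) - E h(Z)), with f' by a central difference."""
    df = (f(w + step) - f(w - step)) / (2 * step)
    return sigma**2 * df - w * f(w) - (h(w) - eh)


if __name__ == "__main__":
    f, eh = normal_stein_solution(np.cos, sigma=1.0)
    for w in [-1.5, 0.0, 0.7, 2.0]:
        print(w, stein_residual(f, np.cos, eh, 1.0, w))
```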
Example: the sum of independent random variables
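$X_1, \dots, X_n$ independent with mean zero and $\operatorname{Var}(X_i) = \frac1n$; $W = \sum_{i=1}^n X_i$. Then, by Taylor expansion of $f(W)$ around $W - X_i$,
$$\begin{aligned}
\mathbb{E}f'(W) - \mathbb{E}Wf(W) &= \mathbb{E}f'(W) - \sum_{i=1}^n \mathbb{E}X_i f(W)\\
&= \mathbb{E}f'(W) - \sum_{i=1}^n \mathbb{E}X_i f(W - X_i) - \sum_{i=1}^n \mathbb{E}X_i^2 f'(W - X_i) + R\\
&= \frac1n \sum_{i=1}^n \big(\mathbb{E}f'(W) - \mathbb{E}f'(W - X_i)\big) + R,
\end{aligned}$$
where the last step uses independence: $\mathbb{E}X_i f(W - X_i) = 0$ and $\mathbb{E}X_i^2 f'(W - X_i) = \frac1n\mathbb{E}f'(W - X_i)$. Bounding this expression by Taylor expansion gives, for any smooth $h$,
$$|\mathbb{E}h(W) - \mathcal{N}h| \le \|h'\|\Big(\frac{2}{\sqrt n} + \sum_{i=1}^n \mathbb{E}|X_i|^3\Big),$$
where $\mathcal{N}h = \mathbb{E}h(Z)$ for $Z \sim \mathcal{N}(0,1)$.
Note: nothing goes to infinity; the bound holds for each fixed $n$.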
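The bound can be checked exactly in a simple case. The sketch below assumes Rademacher summands $X_i = \pm 1/\sqrt n$ (our choice) and the test function $h = \cos$, for which $\|h'\| = 1$, $\mathcal{N}h = e^{-1/2}$ and $\sum_i \mathbb{E}|X_i|^3 = n^{-1/2}$, so the right-hand side equals $3/\sqrt n$. Here $\mathbb{E}\cos W = \cos(1/\sqrt n)^n$, so the true gap is of order $1/n$, well inside the bound.

```python
import math


def exact_expectation_cos(n):
    """E cos(W) for W = sum of n independent signs +-1/sqrt(n), i.e. W = (2K - n)/sqrt(n), K ~ Bin(n, 1/2)."""
    return sum(math.comb(n, k) / 2**n * math.cos((2 * k - n) / math.sqrt(n)) for k in range(n + 1))


def stein_bound(n):
    # ||h'|| * (2/sqrt(n) + sum_i E|X_i|^3) with ||h'|| = 1 and E|X_i|^3 = n^{-3/2}
    return 2 / math.sqrt(n) + n * n ** -1.5


def normal_gap(n):
    # |E h(W) - N h| with N h = E cos(Z) = exp(-1/2) for Z ~ N(0, 1)
    return abs(exact_expectation_cos(n) - math.exp(-0.5))


if __name__ == "__main__":
    for n in [10, 100, 400]:
        print(n, normal_gap(n), stein_bound(n))
```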
Comparison of distributions
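Let $X$ and $Y$ have distributions $\mu_X$ and $\mu_Y$ with Stein operators $\mathcal{A}_X$ and $\mathcal{A}_Y$, such that $\mathcal{F}(\mathcal{A}_X) \cap \mathcal{F}(\mathcal{A}_Y) \neq \emptyset$, and choose $\mathcal{H}(I)$ such that all solutions $f$ of the Stein equation for $Y$ belong to this intersection. Then, since $\mathbb{E}\mathcal{A}_X f(X) = 0$,
$$\mathbb{E}h(X) - \mathbb{E}h(Y) = \mathbb{E}\mathcal{A}_Y f(X) = \mathbb{E}\mathcal{A}_Y f(X) - \mathbb{E}\mathcal{A}_X f(X)$$
and
$$\sup_{h \in \mathcal{H}(I)} |\mathbb{E}h(X) - \mathbb{E}h(Y)| \le \sup_{f \in \mathcal{F}(\mathcal{A}_X) \cap \mathcal{F}(\mathcal{A}_Y)} |\mathbb{E}\mathcal{A}_X f(X) - \mathbb{E}\mathcal{A}_Y f(X)|.$$
If $\mathcal{H}(I)$ is the set of all 1-Lipschitz functions, then the resulting distance is $d_W$, the Wasserstein distance. For examples see Holmes (2004), Eichelsbacher and R. (2008), Döbler (2012), and Ley, Swan and R. (2015, 2017).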
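In one dimension $d_W$ also has the closed form $\int_{\mathbb{R}}|F_X(t) - F_Y(t)|\,dt$ in terms of the distribution functions. A minimal sketch using that identity (our choice, not the Stein approach itself), checked on $X \sim \mathcal{N}(0,1)$ and $Y \sim \mathcal{N}(0,s^2)$, where $d_W = |s-1|\sqrt{2/\pi}$:

```python
import numpy as np
from scipy.integrate import quad
from scipy.stats import norm


def wasserstein_1d(cdf_x, cdf_y, split=0.0):
    """d_W(X, Y) = int |F_X(t) - F_Y(t)| dt, integrated in two halves around `split`.

    Splitting at a point where the CDFs cross (0 for two centred laws) puts the
    kink of |F_X - F_Y| at an endpoint, which helps the adaptive quadrature.
    """
    diff = lambda t: abs(cdf_x(t) - cdf_y(t))
    left, _ = quad(diff, -np.inf, split)
    right, _ = quad(diff, split, np.inf)
    return left + right


if __name__ == "__main__":
    s = 2.0
    approx = wasserstein_1d(norm(0, 1).cdf, norm(0, s).cdf)
    print(approx, abs(s - 1) * np.sqrt(2 / np.pi))
```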
The score function and the Stein kernel
A Stein operator for continuous real-valued variables
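Let $X$ be continuous with pdf $p$ and support $I = [a, b] \subset \mathbb{R}$.
The Stein class of $X$ is the class $\mathcal{F}(p)$ of functions $f: \mathbb{R} \to \mathbb{R}$ such that (i) $x \mapsto f(x)p(x)$ is differentiable on $\mathbb{R}$, and (ii) $(fp)'$ is integrable and $\int (fp)' = 0$.
To $p$ we associate the Stein operator $\mathcal{T}_p$:
$$\mathcal{T}_p f = \frac{(fp)'}{p}.$$
(Stein 1986, Stein et al. 2004, Ley and Swan 2013)
By the product rule, $(gfp)' = g'fp + g\,(fp)'$, and integrating gives
$$\mathbb{E}\big[g'(X)f(X)\big] = -\mathbb{E}\big[g(X)\,\mathcal{T}_p f(X)\big]$$
for all $f \in \mathcal{F}(p)$ and for all differentiable functions $g$ such that $\int (gfp)'\,dx = 0$ and $\int |g'fp|\,dx < \infty$; we say that $g \in \operatorname{dom}((\cdot)', p, f)$.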
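A minimal numerical check of this integration-by-parts identity. We assume (our choice) the standard logistic density, whose score is $(\log p)'(x) = -\tanh(x/2)$, and take $f \equiv 1$ (so $\mathcal{T}_p f = (\log p)'$) and $g = \sin$. Both sides should then equal $\mathbb{E}\cos X = \pi/\sinh\pi$, the logistic characteristic function at 1.

```python
import numpy as np
from scipy.integrate import quad


def logistic_pdf(x):
    # p(x) = e^{-x} / (1 + e^{-x})^2, written via |x| (p is even) to avoid overflow
    e = np.exp(-abs(x))
    return e / (1 + e) ** 2


def logistic_score(x):
    # (log p)'(x) = -tanh(x / 2)
    return -np.tanh(x / 2)


def stein_operator(f, df):
    """T_p f = (f p)' / p = f' + f (log p)'."""
    return lambda x: df(x) + f(x) * logistic_score(x)


def expect(func):
    val, _ = quad(lambda x: func(x) * logistic_pdf(x), -np.inf, np.inf)
    return val


f, df = (lambda x: 1.0), (lambda x: 0.0)
g, dg = np.sin, np.cos
Tf = stein_operator(f, df)

lhs = expect(lambda x: dg(x) * f(x))     # E[g'(X) f(X)]
rhs = -expect(lambda x: g(x) * Tf(x))    # -E[g(X) T_p f(X)]
```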
Stein characterisations
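Let $Y$ be continuous with density $q$ and the same support as $X$.
1. Suppose that $q/p$ is differentiable. Take $g \in \bigcap_{f \in \mathcal{F}(p)} \operatorname{dom}((\cdot)', p, f)$ such that $g$ is $p$-a.s. never 0 and $g\,q/p$ is differentiable. Then
$$Y \overset{\mathcal{D}}{=} X \iff \mathbb{E}\big[f(Y)g'(Y)\big] = -\mathbb{E}\big[g(Y)\,\mathcal{T}_p f(Y)\big] \quad \text{for all } f \in \mathcal{F}(p).$$
2. Let $f \in \mathcal{F}(p)$ be $p$-a.s. never zero and assume that $\operatorname{dom}((\cdot)', p, f)$ is dense in $L^1(p)$. Then
$$Y \overset{\mathcal{D}}{=} X \iff \mathbb{E}\big[f(Y)g'(Y)\big] = -\mathbb{E}\big[g(Y)\,\mathcal{T}_p f(Y)\big] \quad \text{for all } g \in \operatorname{dom}((\cdot)', p, f).$$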
The inverse Stein operator
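Let $\mathcal{F}^{(0)}(p)$ be the class of mean-zero smooth test functions; the inverse Stein operator $\mathcal{T}_p^{-1}: \mathcal{F}^{(0)}(p) \to \mathcal{F}(p)$ is
$$\mathcal{T}_p^{-1}h(x) = \frac{1}{p(x)}\int_a^x p(y)h(y)\,dy = -\frac{1}{p(x)}\int_x^b p(y)h(y)\,dy,$$
so that $\mathcal{T}_p\big(\mathcal{T}_p^{-1}h\big) = h$ (the two expressions agree because $\int_a^b ph = 0$).
The equation
$$h(x) - \mathbb{E}h(X) = f(x)g'(x) + g(x)\mathcal{T}_p f(x), \quad x \in I,$$
is a Stein equation for the target $p$. Solutions of this equation (for $h$ such that a solution exists) are pairs of functions $(f, g)$ such that
$$fg = \mathcal{T}_p^{-1}(h - \mathbb{E}_p h).$$
Although the product $fg$ is unique, the individual $f$ and $g$ are not.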
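A minimal sketch of $\mathcal{T}_p^{-1}$ by quadrature, assuming a standard normal target and the mean-zero test function $h(x) = x^2 - 1$ (our choices). It checks that $\mathcal{T}_p(\mathcal{T}_p^{-1}h) = h$, with $\mathcal{T}_p$ evaluated by a central difference of $up$, and that $\mathcal{T}_p^{-1}h(x) = -x$ here, since $\int_{-\infty}^x (y^2 - 1)p(y)\,dy = -xp(x)$.

```python
import numpy as np
from scipy.integrate import quad
from scipy.stats import norm


def inverse_stein(h, pdf, lower=-np.inf):
    """T_p^{-1} h(x) = (1/p(x)) * int_lower^x p(y) h(y) dy, for h with E_p h = 0."""
    def u(x):
        val, _ = quad(lambda y: pdf(y) * h(y), lower, x, epsabs=1e-12)
        return val / pdf(x)
    return u


def stein_operator(u, pdf, x, step=1e-4):
    """T_p u(x) = (u p)'(x) / p(x), with (u p)' by a central difference."""
    up = lambda y: u(y) * pdf(y)
    return (up(x + step) - up(x - step)) / (2 * step) / pdf(x)


h = lambda x: x**2 - 1.0   # mean zero under N(0, 1)
u = inverse_stein(h, norm.pdf)
```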
$f(x)g'(x) + g(x)\mathcal{T}_p f(x)$: Special Stein operators
Our general Stein operator is an operator on pairs of functions $(f, g)$:
$$\mathcal{A}(f, g)(x) = \mathcal{T}_p(fg)(x) = f(x)g'(x) + g(x)\mathcal{T}_p f(x).$$
Suppose that $1 \in \mathcal{F}(p)$. Then taking $f(x) = 1$ we get
$$\mathcal{A}_p g(x) = g'(x) + g(x)\rho(x) \quad \text{with} \quad \rho(x) = \mathcal{T}_p 1(x) = \frac{p'(x)}{p(x)},$$
the so-called “score function” of $p$; see for example Stein et al. (2004).
If $X$ has finite mean $\nu$, taking $f(x) = \mathcal{T}_p^{-1}(\nu - x)$ we get
$$\mathcal{A}_X g(x) = \tau(x)g'(x) + (\nu - x)g(x) \quad \text{with} \quad \tau = \mathcal{T}_p^{-1}(\nu - \mathrm{Id}),$$
the “Stein kernel” of $p$; see Stein (1986) and Cacoullos et al. (1992).
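A minimal sketch computing the Stein kernel $\tau = \mathcal{T}_p^{-1}(\nu - \mathrm{Id})$ by quadrature. The targets are our choices; for them the kernel is known in closed form: $\tau(x) = x(1-x)/2$ for $U[0,1]$, $\tau(x) = x$ for the rate-1 exponential, and $\tau \equiv \sigma^2$ for $\mathcal{N}(0, \sigma^2)$.

```python
import numpy as np
from scipy.integrate import quad
from scipy.stats import norm


def stein_kernel(pdf, mean, lower, x):
    """tau(x) = T_p^{-1}(nu - Id)(x) = (1/p(x)) * int_lower^x (nu - y) p(y) dy."""
    val, _ = quad(lambda y: (mean - y) * pdf(y), lower, x)
    return val / pdf(x)


uniform_pdf = lambda y: 1.0              # U[0, 1]: mean 1/2, support starts at 0
exponential_pdf = lambda y: np.exp(-y)   # Exp(1): mean 1, support starts at 0
sigma = 1.5
normal_pdf = norm(0, sigma).pdf          # N(0, sigma^2): mean 0, support R
```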
Example: Normal
In the example of a $\mathcal{N}(0, \sigma^2)$ random variable,
$$\mathcal{T}_N f(x) = f'(x) - \frac{1}{\sigma^2}\,xf(x),$$
which contrasts with $\sigma^2 f'(x) - xf(x)$, the standard Stein operator for this case. The score function is $-x/\sigma^2$. The Stein kernel is $\tau(x) = \sigma^2$, giving the standard Stein operator.
Higher dimensions
Notation
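Let $e_1, \dots, e_d$ be the canonical basis for Cartesian coordinates in $\mathbb{R}^d$.
The gradient of $\phi: \mathbb{R}^d \to \mathbb{R}$ is $\nabla\phi = \left(\frac{\partial\phi}{\partial x_1}, \dots, \frac{\partial\phi}{\partial x_d}\right)^T = \sum_{i=1}^d (\partial_i\phi)\,e_i$.
The gradient of a vector field $v: \mathbb{R}^d \to \mathbb{R}^r: x \mapsto (v_1(x), v_2(x), \dots, v_r(x))$ (a row vector) is the matrix
$$\nabla v = \begin{pmatrix} \nabla v_1 & \nabla v_2 & \cdots & \nabla v_r \end{pmatrix} = \left(\frac{\partial v_j}{\partial x_i}\right)_{1 \le i \le d,\ 1 \le j \le r}.$$
If $r = d$ then the divergence of $v$ is
$$\operatorname{div}(v) = \nabla \cdot v^T = \sum_{i=1}^d \frac{\partial v_i}{\partial x_i} = \operatorname{Tr}(\nabla v),$$
with $\operatorname{Tr}$ the trace operator and $x \cdot y = x^T y = \langle x, y \rangle$ the Euclidean scalar product between $x$ and $y$.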
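More generally, the divergence of a $q \times d$ tensor field
$$F: \mathbb{R}^d \to \mathbb{R}^{q \times d}: x \mapsto F(x) = \begin{pmatrix} F_1(x) \\ \vdots \\ F_q(x) \end{pmatrix} = \begin{pmatrix} F_{11}(x) & \cdots & F_{1d}(x) \\ \vdots & \ddots & \vdots \\ F_{q1}(x) & \cdots & F_{qd}(x) \end{pmatrix}$$
is
$$\operatorname{div}(F) := \nabla \cdot F = \begin{pmatrix} \operatorname{div}(F_1) \\ \vdots \\ \operatorname{div}(F_q) \end{pmatrix} = \begin{pmatrix} \nabla \cdot F_1^T \\ \vdots \\ \nabla \cdot F_q^T \end{pmatrix} = \begin{pmatrix} \sum_{i=1}^d \frac{\partial F_{1i}}{\partial x_i} \\ \vdots \\ \sum_{i=1}^d \frac{\partial F_{qi}}{\partial x_i} \end{pmatrix}.$$
The divergence maps matrix-valued functions $F: \mathbb{R}^d \to \mathbb{R}^{q \times d}$ onto vector-valued functions $\operatorname{div}(F): \mathbb{R}^d \to \mathbb{R}^q$.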
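The row-wise divergence is easy to evaluate numerically. A minimal NumPy sketch using central finite differences (our choice of method); the example field is hypothetical and its divergence is known in closed form, $(3x_1 + \cos x_3,\ e^{x_2} + 3x_3^2)$.

```python
import numpy as np


def divergence(F, x, step=1e-5):
    """Row-wise divergence of a q x d matrix field F at x in R^d.

    Returns (sum_i dF_{1i}/dx_i, ..., sum_i dF_{qi}/dx_i) in R^q via central differences.
    """
    x = np.asarray(x, dtype=float)
    d = x.size
    q = np.asarray(F(x)).shape[0]
    out = np.zeros(q)
    for i in range(d):
        e = np.zeros(d)
        e[i] = step
        # column i of F, differentiated in the direction x_i
        out += (np.asarray(F(x + e))[:, i] - np.asarray(F(x - e))[:, i]) / (2 * step)
    return out


def F_example(x):
    # a hypothetical 2 x 3 field on R^3
    return np.array([[x[0] ** 2, x[0] * x[1], np.sin(x[2])],
                     [x[1] * x[2], np.exp(x[1]), x[0] + x[2] ** 3]])
```

For $q = 1$ the same function returns the ordinary divergence of a vector field, i.e. the trace of its Jacobian.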
Product rule for divergence
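Let $F: \mathbb{R}^d \to \mathbb{R}^{q \times d}$ be a $q \times d$ tensor field and $\phi: \mathbb{R}^d \to \mathbb{R}$. Then, under appropriate regularity conditions,
$$\operatorname{div}(F\phi) = \operatorname{div}(F)\,\phi + F\nabla\phi.$$
Similarly, if $F$ is a $q \times d$ tensor field and $G$ is a $d \times d$ tensor field, then $FG$ is a $q \times d$ tensor field and
$$\big(\operatorname{div}(FG)\big)_j = F_j \operatorname{div}(G) + \operatorname{Tr}\big(\nabla(F_j)\,G\big)$$
for $j = 1, \dots, q$.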
What is known: multivariate normal
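$Y \in \mathbb{R}^d$ is multivariate normal $\mathrm{MVN}(0, \Sigma)$ if and only if
$$\mathbb{E}\,Y^t\nabla f(Y) = \mathbb{E}\,\nabla^t\Sigma\nabla f(Y) \quad \text{for all smooth } f: \mathbb{R}^d \to \mathbb{R}.$$
Assume that $h: \mathbb{R}^d \to \mathbb{R}$ has 3 bounded derivatives. Then, if $\Sigma \in \mathbb{R}^{d \times d}$ is symmetric and positive definite and $Z \sim \mathrm{MVN}(0, I_d)$, there is a solution $f: \mathbb{R}^d \to \mathbb{R}$ to the Stein equation
$$\nabla^t\Sigma\nabla f(w) - w^t\nabla f(w) = h(w) - \mathbb{E}h(\Sigma^{1/2}Z),$$
which holds for every $w \in \mathbb{R}^d$.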
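A seeded Monte Carlo sketch of the characterising identity, with $\Sigma$, the vector $a$ and the test function $f(y) = \cos(a \cdot y)$ as our illustrative choices. Since $a \cdot Y \sim \mathcal{N}(0, s^2)$ with $s^2 = a^T\Sigma a$, both sides equal $-s^2 e^{-s^2/2}$ exactly; note $\nabla^t\Sigma\nabla f = \operatorname{Tr}(\Sigma \operatorname{Hess} f)$.

```python
import numpy as np

rng = np.random.default_rng(0)
Sigma = np.array([[2.0, 0.6], [0.6, 1.0]])   # illustrative covariance
a = np.array([0.7, -0.4])                    # f(y) = cos(a . y)
Y = rng.multivariate_normal(np.zeros(2), Sigma, size=400_000)

u = Y @ a
grad_f = -np.sin(u)[:, None] * a              # grad f(Y), one row per sample
lhs = np.mean(np.sum(Y * grad_f, axis=1))     # E Y^t grad f(Y)

s2 = a @ Sigma @ a
rhs = np.mean(-s2 * np.cos(u))                # E tr(Sigma Hess f(Y)), Hess f = -a a^t cos(a . y)
exact = -s2 * np.exp(-s2 / 2)                 # common exact value of both sides
```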
The Mehler formula
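To solve $\nabla^t\Sigma\nabla f(w) - w^t\nabla f(w) = h(w) - \mathbb{E}h(\Sigma^{1/2}Z)$, $w \in \mathbb{R}^d$, with $Z \sim \mathrm{MVN}(0, I_d)$, for $t \in [0, 1]$ put
$$Z_{w,t} = \sqrt{t}\,w + \sqrt{1-t}\,\Sigma^{1/2}Z;$$
then
$$f(w) = -\int_0^1 \frac{1}{2t}\Big[\mathbb{E}h(Z_{w,t}) - \mathbb{E}h(\Sigma^{1/2}Z)\Big]\,dt$$
is a solution to the Stein equation. This solution $f$ satisfies the bounds
$$\left\| \frac{\partial^k f}{\prod_{j=1}^k \partial w_{i_j}} \right\|_\infty \le \frac{1}{k} \left\| \frac{\partial^k h}{\prod_{j=1}^k \partial w_{i_j}} \right\|_\infty.$$
(Barbour 1990, Götze 1993, Rinott and Rotar 1996, Goldstein and Rinott 1996, R. and Röllin 2007, Meckes 2009, Chen, Goldstein and Shao 2011)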
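A one-dimensional sketch ($d = 1$, $\Sigma = \sigma^2$, $h = \cos$; all our choices) of the Mehler solution. Since $\mathbb{E}\cos(\sqrt t\,w + \sqrt{1-t}\,\sigma Z) = \cos(\sqrt t\,w)\,e^{-(1-t)\sigma^2/2}$, differentiating under the integral gives $f'$ and $f''$ as one-dimensional integrals. The code checks the Stein equation $\sigma^2 f'' - wf' = h - \mathbb{E}h(\sigma Z)$ and the bounds $\|f'\|_\infty \le \|h'\|_\infty = 1$ and $\|f''\|_\infty \le \frac12\|h''\|_\infty = \frac12$.

```python
import numpy as np
from scipy.integrate import quad

SIGMA = 1.3  # illustrative; Sigma = SIGMA**2 in dimension d = 1


def damping(t):
    # E cos(c + sqrt(1 - t) * SIGMA * Z) = cos(c) * exp(-(1 - t) * SIGMA^2 / 2)
    return np.exp(-(1 - t) * SIGMA**2 / 2)


def f_prime(w):
    # f'(w) = int_0^1 sin(sqrt(t) w) exp(-(1-t) SIGMA^2/2) / (2 sqrt(t)) dt
    val, _ = quad(lambda t: np.sin(np.sqrt(t) * w) * damping(t) / (2 * np.sqrt(t)), 0, 1)
    return val


def f_second(w):
    # f''(w) = int_0^1 cos(sqrt(t) w) exp(-(1-t) SIGMA^2/2) / 2 dt
    val, _ = quad(lambda t: np.cos(np.sqrt(t) * w) * damping(t) / 2, 0, 1)
    return val


def residual(w):
    """SIGMA^2 f''(w) - w f'(w) - (cos(w) - E cos(SIGMA Z))."""
    return SIGMA**2 * f_second(w) - w * f_prime(w) - (np.cos(w) - np.exp(-SIGMA**2 / 2))
```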
What is known: strictly log-concave
(Mackey and Gorham 2016) For continuous $p$ on $\mathbb{R}^d$ such that $\log p \in C^4(\mathbb{R}^d)$ is $k$-strictly concave, the operator
$$\mathcal{A}f(w) = \frac12\langle\nabla f(w), \nabla\log p(w)\rangle + \frac12\Delta f(w)$$
is the generator of an overdamped Langevin diffusion. The Stein equation
$$\mathcal{A}f(w) = h(w) - \mathbb{E}_p h$$
is solved by
$$f(w) = \int_0^\infty \big[\mathbb{E}_p h - \mathbb{E}h(Z_{w,t})\big]\,dt,$$
with $(Z_{w,t})_{t \ge 0}$ the overdamped Langevin diffusion with generator $\mathcal{A}$ and $Z_{w,0} = w$.
The first three derivatives of $f$ can be bounded in terms of derivatives of $h$ of the same and lower order.
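A minimal sketch of the overdamped Langevin diffusion with generator $\mathcal{A}$, i.e. $dZ_t = \frac12\nabla\log p(Z_t)\,dt + dB_t$, simulated with an Euler-Maruyama scheme (our choice of discretisation, which introduces a small bias). The target $p(x) \propto e^{-x^2/2 - x^4/4}$ is our choice; it is 1-strictly log-concave. Applying $\mathcal{A}$ to $f(x) = x^2$ gives $\mathcal{A}f = 1 - x^2 - x^4$, so the Stein identity $\mathbb{E}_p\mathcal{A}f = 0$ predicts $\mathbb{E}_p[X^2 + X^4] = 1$.

```python
import numpy as np


def grad_log_p(x):
    # p(x) proportional to exp(-x^2/2 - x^4/4); (log p)'' <= -1, so p is 1-strictly log-concave
    return -x - x**3


def langevin_endpoints(n_chains=20_000, n_steps=2_000, dt=0.01, start=2.0, seed=1):
    """Euler-Maruyama for dZ = (1/2) grad log p(Z) dt + dB, run as parallel chains.

    The generator of this diffusion is A f = (1/2) f' (log p)' + (1/2) f''.
    """
    rng = np.random.default_rng(seed)
    z = np.full(n_chains, start)
    for _ in range(n_steps):
        z = z + 0.5 * grad_log_p(z) * dt + np.sqrt(dt) * rng.standard_normal(n_chains)
    return z


z = langevin_endpoints()
stein_check = np.mean(z**2 + z**4)   # Monte Carlo estimate of E_p[X^2 + X^4]
```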
What is known: Score functions
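(Nourdin et al. 2013, 2014) Let $X \in \mathbb{R}^d$ have mean 0 and p.d.f. $p: \mathbb{R}^d \to \mathbb{R}$. The score of $p$ is the random vector $\rho_p(X)$ in $\mathbb{R}^d$ which satisfies
$$\mathbb{E}\,\rho_p(X)\phi(X) = -\mathbb{E}\,\nabla\phi(X)$$
for all $\phi \in C_c^\infty(\mathbb{R}^d)$.
If $p$ has a score, then it is uniquely defined through $\rho_p(x) = \nabla\log p(x)$.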
What is known: Stein kernels
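(Nourdin et al. 2013, 2014) A random $d \times d$ matrix $\tau_p(X)$ such that
$$\mathbb{E}\,\tau_p(X)\nabla\phi(X) = \mathbb{E}\,X\phi(X)$$
for all $\phi \in C_c^\infty(\mathbb{R}^d)$ is called a strong Stein kernel for $p$.
Ledoux et al. (2015): $\tau_p(X)$ is a weak Stein kernel if, for all $\phi \in C_c^\infty(\mathbb{R}^d)$,
$$\mathbb{E}\operatorname{Tr}\big(\tau_p(X)\operatorname{Hess}(\phi)(X)^T\big) = \mathbb{E}\,X \cdot \nabla\phi(X).$$
There is no reason to assume uniqueness, or even existence, of the Stein kernel. If $\tau_1$ and $\tau_2$ are two Stein kernels for $p$, then for all $\phi \in C_c^\infty(\mathbb{R}^d)$,
$$\mathbb{E}\big(\tau_1(X) - \tau_2(X)\big)\nabla\phi(X) = 0;$$
then
$$\operatorname{div}\big(p(x)(\tau_1(x) - \tau_2(x))\big) = 0,$$
from which we get uniqueness only in the one-dimensional case.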
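For $d \ge 2$ the Stein kernel is indeed not unique. A minimal sketch, assuming (our choice) $X \sim \mathcal{N}(0, I_2)$: besides $\tau_1 = I_2$, the field $\tau_2(x) = I_2 + \begin{pmatrix} -x_2 & x_1 \\ 0 & 0 \end{pmatrix}$ is also a strong Stein kernel, because the first row of $p(\tau_2 - \tau_1)$ is $(\partial_2 p, -\partial_1 p)$, which is divergence-free. The code checks $\mathbb{E}\,\tau_i(X)\nabla\phi(X) = \mathbb{E}\,X\phi(X)$ for both kernels with a tensor-product Gauss-Hermite rule and an arbitrary smooth $\phi$.

```python
import numpy as np
from numpy.polynomial.hermite_e import hermegauss

# Tensor-product Gauss-Hermite rule for X ~ N(0, I_2)
nodes, weights = hermegauss(40)
weights = weights / np.sqrt(2 * np.pi)   # normalise to a probability measure
X1, X2 = np.meshgrid(nodes, nodes, indexing="ij")
W = np.outer(weights, weights)


def expect(values):
    return np.sum(W * values)


# Test function phi(x) = sin(x1 + x2/2) + x1 x2^2 and its partial derivatives
phi = np.sin(X1 + 0.5 * X2) + X1 * X2**2
d1 = np.cos(X1 + 0.5 * X2) + X2**2
d2 = 0.5 * np.cos(X1 + 0.5 * X2) + 2 * X1 * X2

target = np.array([expect(X1 * phi), expect(X2 * phi)])              # E X phi(X)
kernel_1 = np.array([expect(d1), expect(d2)])                        # tau_1 = I_2
kernel_2 = np.array([expect(d1 - X2 * d1 + X1 * d2), expect(d2)])    # tau_2 = I_2 + [[-x2, x1], [0, 0]]
```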
Stein operators $\mathcal{T}_pF = \operatorname{div}(Fp)/p$
The general multivariate density case
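Let $X \in \mathbb{R}^d$ have pdf $p: \mathbb{R}^d \to \mathbb{R}$ with respect to the Lebesgue measure on $\mathbb{R}^d$. Let $\Omega$ be the support of $p$.
1. Let $q \in \mathbb{N}_0$. The $q$-Stein class for $X$ is the class $\mathcal{F}_q(X)$ of all $F: \mathbb{R}^d \to \mathbb{R}^{q \times d}$ such that $pF$ (i) is differentiable in the sense that its gradient exists, (ii) has $\operatorname{div}(pF)$ integrable on $\Omega$, and (iii) satisfies $\int_\Omega \operatorname{div}(pF) = 0$.
2. We propose as Stein operator of $p$ the operator
$$\mathcal{T}_pF = \frac{\operatorname{div}(Fp)}{p},$$
acting on test functions $F: \mathbb{R}^d \to \mathbb{R}^{q \times d}$ in $\mathcal{F}_q(X)$.
If $F \in \mathcal{F}_q(X)$ then $\mathcal{T}_pF: \mathbb{R}^d \to \mathbb{R}^q$.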
Stein type integration by parts
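To each $F: \mathbb{R}^d \to \mathbb{R}^{q \times d}$ in $\mathcal{F}_q(p)$ we associate $\operatorname{dom}(\nabla, p, F)$, the vector space of functions $g: \mathbb{R}^d \to \mathbb{R}$ such that $Fg \in \mathcal{F}_q(p)$ and $F\nabla g \in L^1(p)$.
Proposition:
$$\mathbb{E}_p[F\nabla g] = -\mathbb{E}_p[(\mathcal{T}_pF)\,g]$$
for all $F \in \mathcal{F}_q(p)$ and all $g \in \operatorname{dom}(\nabla, p, F)$.
Proof: Apply the product rule for divergence,
$$\operatorname{div}(F\phi) = \operatorname{div}(F)\phi + F\nabla\phi,$$
to $(Fp)\phi$ with $\phi = g$, to show that for $\mathcal{T}_pF = \operatorname{div}(Fp)/p$,
$$\mathcal{T}_p(Fg) = (\mathcal{T}_pF)\,g + F\nabla g,$$
and then take expectations, using that $\int_\Omega \operatorname{div}(Fgp) = 0$ and hence the left-hand side has mean 0.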
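A minimal numerical check of the proposition, assuming (our choices) $p$ the standard bivariate normal density, the $2 \times 2$ field $F(x) = \begin{pmatrix} \sin x_2 & x_1x_2 \\ 1 & \cos x_1 \end{pmatrix}$ and $g(x) = x_1 + x_2^2$. $\mathcal{T}_pF = \operatorname{div}(Fp)/p$ is evaluated by central differences, using density ratios $p(x \pm he_i)/p(x)$ to avoid tiny numbers, and the expectations by Gauss-Hermite quadrature. For these choices both sides equal $(0, 1)$.

```python
import numpy as np
from numpy.polynomial.hermite_e import hermegauss


def log_p(x):
    # standard bivariate normal, up to an additive constant (cancels in ratios)
    return -0.5 * np.dot(x, x)


def F(x):
    return np.array([[np.sin(x[1]), x[0] * x[1]],
                     [1.0, np.cos(x[0])]])


def g(x):
    return x[0] + x[1] ** 2


def grad_g(x):
    return np.array([1.0, 2 * x[1]])


def stein_operator(F, x, step=1e-5):
    """T_p F(x) = div(F p)(x) / p(x), row-wise, by central differences."""
    x = np.asarray(x, dtype=float)
    out = np.zeros(np.asarray(F(x)).shape[0])
    for i in range(x.size):
        e = np.zeros(x.size)
        e[i] = step
        ratio_plus = np.exp(log_p(x + e) - log_p(x))
        ratio_minus = np.exp(log_p(x - e) - log_p(x))
        out += (F(x + e)[:, i] * ratio_plus - F(x - e)[:, i] * ratio_minus) / (2 * step)
    return out


nodes, weights = hermegauss(30)
weights = weights / np.sqrt(2 * np.pi)
lhs = np.zeros(2)   # E_p[F grad g]
rhs = np.zeros(2)   # -E_p[(T_p F) g]
for x1, w1 in zip(nodes, weights):
    for x2, w2 in zip(nodes, weights):
        x = np.array([x1, x2])
        lhs += w1 * w2 * F(x) @ grad_g(x)
        rhs -= w1 * w2 * stein_operator(F, x) * g(x)
```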
Stein operators
As in the one-dimensional case, our Stein operators depend on two test functions, $F$ and $g$, and are of the form
$$\mathcal{T}_p(Fg) = (\mathcal{T}_pF)\,g + F\nabla g,$$
obtained either by fixing $F$ and considering $g$ as the (scalar-valued) test function, or by fixing $g$ and considering $F$ as the (matrix-valued) test function.
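$\mathcal{T}_p(Fg) = (\mathcal{T}_pF)\,g + F\nabla g$: $F = I_d$ fixed
Suppose that the identity matrix $I_d \in \mathcal{F}_d(p)$ (e.g. if $p$ is log-concave and vanishes at $\partial\Omega$). Then
$$\mathcal{T}_p I_d = \nabla\log p = \rho_p,$$
and the Stein operator is $\mathcal{A}_p g: \mathbb{R}^d \to \mathbb{R}^d$,
$$\mathcal{A}_p g = \mathcal{T}_p(I_d\,g) = \nabla g + \rho_p\,g,$$
acting on $g: \mathbb{R}^d \to \mathbb{R}$ belonging to $\operatorname{dom}(\nabla, p, I_d)$.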
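$\mathcal{T}_p(Fg) = (\mathcal{T}_pF)\,g + F\nabla g$: $F = \tau_p$ fixed
Let $X$ have mean $\nu$ and suppose that there exists a $d \times d$ matrix-valued function $F = \tau_p$ (a Stein kernel) satisfying
$$\mathcal{T}_p(\tau_p)(x) = -(x - \nu)$$
at all $x$. Then $\mathcal{A}_p g: \mathbb{R}^d \to \mathbb{R}^d$,
$$\mathcal{A}_p g(x) = \mathcal{T}_p(\tau_p\,g)(x) = -(x - \nu)g(x) + \tau_p(x)\nabla g(x),$$
acting on differentiable functions $g: \mathbb{R}^d \to \mathbb{R}$ belonging to $\operatorname{dom}(\nabla, p, \tau_p)$.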
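$\mathcal{T}_p(Fg) = (\mathcal{T}_pF)\,g + F\nabla g$: $g = 1$ fixed
For $g: \mathbb{R}^d \to \mathbb{R}$, $g(x) = 1$, we obtain for $F \in \mathcal{F}_q(p)$
$$\mathcal{A}_pF(x) = \mathcal{T}_pF(x) \in \mathbb{R}^q,$$
which is vector-valued. The Stein equation for a zero-mean function $h: \mathbb{R}^d \to \mathbb{R}^q$ is then
$$\mathcal{A}_pF(x) = \frac{\operatorname{div}(Fp)(x)}{p(x)} = h(x),$$
which gives $\operatorname{div}(Fp)(x) = p(x)h(x)$.
The solution is not unique. If $q = d$ then we could choose a solution $F$ such that $F_{i,j} = 0$ for $i \neq j$.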
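A minimal sketch of such a diagonal solution. With $F = \operatorname{diag}(F_{11}, \dots, F_{dd})$ the equation $\operatorname{div}(Fp) = ph$ decouples into $\partial_i(F_{ii}p) = ph_i$, which suggests $F_{ii}(x) = \frac{1}{p(x)}\int_{-\infty}^{x_i}(ph_i)(x_1, \dots, y, \dots, x_d)\,dy$. We assume (our choices) the standard bivariate normal and $h(x) = (x_1^2 - 1, x_2^2 - 1)$; here every such line integral over $\mathbb{R}$ vanishes, so $F_{ii}p$ decays in both directions, and the construction gives $F_{ii}(x) = -x_i$. For general $p$ and $h$ the line integrals need not vanish, and this naive construction can fail.

```python
import numpy as np
from scipy.integrate import quad


def p(x):
    # standard bivariate normal density
    return np.exp(-0.5 * np.dot(x, x)) / (2 * np.pi)


def h(x):
    # zero-mean R^2-valued test function
    return x**2 - 1.0


def diagonal_entry(x, i):
    """F_ii(x) = (1/p(x)) int_{-inf}^{x_i} p(z) h_i(z) dy, where z equals x except z_i = y."""
    x = np.asarray(x, dtype=float)

    def integrand(y):
        z = x.copy()
        z[i] = y
        return p(z) * h(z)[i]

    val, _ = quad(integrand, -np.inf, x[i], epsabs=1e-12)
    return val / p(x)
```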
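Special case: $q = 1$
Let $v = (v_1, \dots, v_d): \mathbb{R}^d \to \mathbb{R}^d$ be a vector field in the 1-Stein class for $p: \mathbb{R}^d \to \mathbb{R}$. Then our Stein operator of $p$ is
$$\mathcal{T}_p v = \frac{(\nabla \cdot v)\,p + v \cdot \nabla p}{p} = \sum_{i=1}^d \frac{\partial v_i}{\partial x_i} + \sum_{i=1}^d v_i \frac{\partial_i p}{p}.$$
This is a function from $\mathbb{R}^d$ to $\mathbb{R}$.
Take as vector field $v = \nabla f$ for a smooth function $f: \mathbb{R}^d \to \mathbb{R}$. This choice gives
$$\mathcal{A}_p(f) = \mathcal{T}_p v = \Delta f + \langle\nabla\log p, \nabla f\rangle,$$
interpreted as an operator on $f$ rather than on $v$. This is the operator considered by Mackey and Gorham (2016), except for a factor $\frac12$.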
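The identity $\operatorname{div}(p\nabla f)/p = \Delta f + \langle\nabla\log p, \nabla f\rangle$ can be checked symbolically. A minimal SymPy sketch in $d = 2$ with generic (undefined) functions $p$ and $f$; SymPy is our choice of tool.

```python
import sympy as sp

x1, x2 = sp.symbols("x1 x2", real=True)
coords = (x1, x2)
p = sp.Function("p")(x1, x2)   # generic density, assumed positive
f = sp.Function("f")(x1, x2)   # generic smooth test function

v = [sp.diff(f, xi) for xi in coords]                                   # v = grad f
stein_op = sum(sp.diff(p * vi, xi) for vi, xi in zip(v, coords)) / p    # div(v p) / p
laplacian = sum(sp.diff(f, xi, 2) for xi in coords)
drift = sum(sp.diff(sp.log(p), xi) * sp.diff(f, xi) for xi in coords)   # <grad log p, grad f>

difference = sp.simplify(sp.expand(stein_op - laplacian - drift))
```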
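$\mathcal{T}_p(Fg) = (\mathcal{T}_pF)\,g + F\nabla g$: $g = p^{-1}$ fixed
For $g: \mathbb{R}^d \to \mathbb{R}$, $g(x) = 1/p(x)$, we obtain for $F \in \mathcal{F}_q(p)$
$$\mathcal{A}_pF = \frac{\operatorname{div}(Fp)}{p^2} + F\nabla(1/p) \in \mathbb{R}^q,$$
which is vector-valued. The Stein equation for a zero-mean function $h: \mathbb{R}^d \to \mathbb{R}^q$ is then
$$\frac{\operatorname{div}(Fp)}{p^2}(x) + F\nabla(1/p)(x) = h(x),$$
which gives $\operatorname{div}(F)(x) = p(x)h(x)$.
Again the solution is not unique. If $q = d$ then we could choose a solution $F$ such that $F_{i,j} = 0$ for $i \neq j$.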
Example: multivariate normal
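Consider $Z \sim \mathrm{MVN}_d(0, \Sigma)$. Then
$$\rho_p(x) = -\Sigma^{-1}x \quad \text{and} \quad \tau_p(z) = \Sigma$$
(linear score and constant Stein kernel). These lead to the Stein operator, for $g: \mathbb{R}^d \to \mathbb{R}$,
$$\mathcal{A}_p g(x) = \Sigma\nabla g(x) - g(x)\,x.$$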
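A seeded Monte Carlo sketch of $\mathbb{E}\mathcal{A}_p g(Z) = 0$ for this operator; $\Sigma$, $a$ and $g(z) = \sin(a \cdot z)$ are our illustrative choices. Gaussian integration by parts also gives $\mathbb{E}[g(Z)Z] = \Sigma a\,e^{-a^T\Sigma a/2}$, which the code checks as a non-trivial value.

```python
import numpy as np

rng = np.random.default_rng(42)
Sigma = np.array([[2.0, 0.6], [0.6, 1.0]])
a = np.array([0.7, -0.4])
Z = rng.multivariate_normal(np.zeros(2), Sigma, size=1_000_000)

u = Z @ a
g = np.sin(u)
grad_g = np.cos(u)[:, None] * a                  # rows: grad g(Z)
stein = grad_g @ Sigma.T - g[:, None] * Z        # rows: Sigma grad g(Z) - g(Z) Z
mean_stein = stein.mean(axis=0)                  # Monte Carlo estimate of E A_p g(Z)

s2 = a @ Sigma @ a
mean_gz = (g[:, None] * Z).mean(axis=0)
exact_gz = Sigma @ a * np.exp(-s2 / 2)
```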
Example: elliptical distributions
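A $d$-dimensional random vector has a multivariate elliptical distribution $E_d(\mu, \Sigma, \phi)$ if its density is given by
$$p(x) = \kappa|\Sigma|^{-1/2}\,\phi\big(Q(x)\big), \qquad Q(x) = \tfrac12(x - \mu)^T\Sigma^{-1}(x - \mu),$$
on $\mathbb{R}^d$, for $\phi$ a smooth function and $\kappa$ the normalising constant. Elliptical distributions have the score function
$$\rho_p(x) = \Sigma^{-1}(x - \mu)\,\frac{\phi'\big(Q(x)\big)}{\phi\big(Q(x)\big)},$$
and
$$\tau_p(x) = \left(\frac{1}{\phi\big(Q(x)\big)}\int_{Q(x)}^{+\infty}\phi(u)\,du\right)\Sigma$$
is a strong Stein kernel for $p$ (Landsman, Vanduffel and Yao 2014).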
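A minimal sketch computing the Stein kernel of an elliptical law by one-dimensional quadrature of the density generator. As checks we use (our choices) the Gaussian generator $\phi(u) = e^{-u}$, which gives $\tau_p = \Sigma$, and the multivariate Student $t$ generator $\phi(u) = (1 + 2u/\nu)^{-(\nu+d)/2}$, for which the scalar factor works out to $(\nu + 2Q)/(\nu + d - 2)$.

```python
import numpy as np
from scipy.integrate import quad


def kernel_factor(phi, q):
    """c(q) = (1/phi(q)) int_q^inf phi(s) ds, so that tau_p(x) = c(Q(x)) Sigma."""
    tail, _ = quad(phi, q, np.inf)
    return tail / phi(q)


def elliptical_stein_kernel(x, mu, Sigma, phi):
    diff = np.asarray(x, dtype=float) - np.asarray(mu, dtype=float)
    q = 0.5 * diff @ np.linalg.solve(Sigma, diff)   # Q(x) = (x - mu)^T Sigma^{-1} (x - mu) / 2
    return kernel_factor(phi, q) * Sigma


gaussian_phi = lambda s: np.exp(-s)


def student_phi(nu, d):
    # density generator of the multivariate Student t with nu degrees of freedom in R^d
    return lambda s: (1.0 + 2.0 * s / nu) ** (-(nu + d) / 2.0)
```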
Bounds on the solution of the Stein equation
So we have Stein equations, but when are their solutions well behaved?
In the multivariate normal case: the Mehler formula.
In the case of strictly log-concave distributions: the overdamped Langevin diffusion.
The bounds will be distribution-specific.
Bounds using a Poincaré constant
We say that $C_p$ is a Poincaré constant associated to $\mu_X$ if for every smooth function $\varphi \in L^2(\mu_X)$ such that $\mathbb{E}\varphi(X) = 0$,
$$\mathbb{E}\varphi^2(X) \le C_p\,\mathbb{E}|\nabla\varphi(X)|^2.$$
For example, when $X$ has a $k$-log-concave density, the law of $X$ satisfies a Poincaré inequality with $C_p = 1/k$.
Using the Lax-Milgram theorem we can show the following result.
Let $h$ be a smooth, 1-Lipschitz function. Let $X$ be a random vector with density $p$, and assume $C_p < \infty$ is a Poincaré constant for $p(x)\,dx$. Then there exists a weak solution $u$ to
$$\Delta u + \nabla\log p \cdot \nabla u = h - p(h),$$
where $p(h) = \int hp$, such that
$$\int|\nabla u|^2 p \le C_p^2.$$
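A one-dimensional sketch of this bound, assuming (our choice) $X \sim \mathcal{N}(0,1)$, which is 1-log-concave, so $C_p = 1$. In $d = 1$ the equation reads $(pu')' = (h - p(h))\,p$, so $u'$ can be computed by quadrature from the tail away from the origin. For $h(x) = x$ one gets $u' \equiv -1$ and equality $\int|u'|^2p = 1 = C_p^2$; for the smooth 1-Lipschitz $h(x) = \sqrt{1 + x^2}$ the energy should stay below 1.

```python
import numpy as np
from numpy.polynomial.hermite_e import hermegauss
from scipy.integrate import quad


def phi(x):
    # N(0, 1) density: 1-log-concave, so C_p = 1
    return np.exp(-x**2 / 2) / np.sqrt(2 * np.pi)


def solution_gradient(h):
    """u' solving (phi u')' = (h - E h(X)) phi, i.e. u'' - x u' = h - E h(X)."""
    eh, _ = quad(lambda y: h(y) * phi(y), -np.inf, np.inf, epsabs=1e-13)
    g = lambda y: (h(y) - eh) * phi(y)

    def du(x):
        # integrate over the tail away from the origin to limit cancellation
        if x <= 0:
            val, _ = quad(g, -np.inf, x, epsabs=1e-14, epsrel=1e-10)
        else:
            val, _ = quad(g, x, np.inf, epsabs=1e-14, epsrel=1e-10)
            val = -val
        return val / phi(x)

    return du


def dirichlet_energy(h, n_nodes=30):
    """int |u'|^2 p by Gauss-Hermite quadrature (probabilists' weight exp(-x^2/2))."""
    nodes, weights = hermegauss(n_nodes)
    weights = weights / np.sqrt(2 * np.pi)
    du = solution_gradient(h)
    return sum(w * du(x) ** 2 for x, w in zip(nodes, weights))
```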
Application: nested densities
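The Wasserstein distance between (the distributions of) $X$ and $Y$ is
$$d_W(X, Y) = \sup_{h \in \mathrm{Lip}(1)} |\mathbb{E}h(X) - \mathbb{E}h(Y)|.$$
We compare, in Wasserstein distance, $P_1$ and $P_2$ on $\mathbb{R}^d$ with densities $p_1$, assumed $k$-log-concave, and $p_2 = \pi_0 p_1$. Put
$$\mathcal{A}_1 u = \frac12\nabla\log p_1 \cdot \nabla u + \frac12\Delta u \quad \text{and} \quad \mathcal{A}_2 u = \frac12\nabla\log p_2 \cdot \nabla u + \frac12\Delta u.$$
Then
$$\mathcal{A}_2 u = \mathcal{A}_1 u + \frac12\nabla\log\pi_0 \cdot \nabla u.$$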
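Let $h: \mathbb{R}^d \to \mathbb{R}$ be a 1-Lipschitz function, and $u_h$ a solution to $\mathcal{A}_1 u_h = h - \int hp_1$. Let $X_1$ ($X_2$) have distribution $P_1$ ($P_2$). Then, as $\mathcal{A}_2 u = \mathcal{A}_1 u + \frac12\nabla\log\pi_0 \cdot \nabla u$ and $\mathbb{E}[\mathcal{A}_2 u_h(X_2)] = 0$,
$$\begin{aligned}
\mathbb{E}[h(X_2)] - \mathbb{E}[h(X_1)] &= \mathbb{E}[\mathcal{A}_1 u_h(X_2)]\\
&= \mathbb{E}\Big[\mathcal{A}_2 u_h(X_2) - \frac12\nabla\log\pi_0(X_2) \cdot \nabla u_h(X_2)\Big]\\
&= -\frac12\,\mathbb{E}\big[\nabla\log\pi_0(X_2) \cdot \nabla u_h(X_2)\big].
\end{aligned}$$
Using the Poincaré bounds we obtain
$$d_W(X_1, X_2) \le \frac1k\,\mathbb{E}\big[|\nabla\pi_0(X_1)|\big].$$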
Example: Copulas
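Let $(V_1, V_2)$ be a 2-dimensional random vector such that the marginals $V_1$ and $V_2$ have the uniform $U[0,1]$ distribution. The copula of $(V_1, V_2)$ is
$$C(x_1, x_2) = \mathbb{P}[V_1 \le x_1, V_2 \le x_2], \quad (x_1, x_2) \in [0,1]^2,$$
and we assume that the copula density $c = \partial^2_{x_1x_2}C$ exists.
Let $(U_1, U_2)$ be independent $U[0,1]$. The copula of $(U_1, U_2)$ is $(x_1, x_2) \mapsto x_1x_2$.
Payne 1960: by the optimal Poincaré inequality for convex domains (here of diameter $\sqrt2$), $C_p = 2/\pi^2$ is a Poincaré constant for $U[0,1]^2$.
Now we can show:
$$d_W\big[(V_1, V_2), (U_1, U_2)\big] \le \frac{2}{\pi^2}\sqrt{\int_{[0,1]^2}|\nabla c(x_1, x_2)|^2\,dx_1\,dx_2}.$$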
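A minimal sketch evaluating the right-hand side for the Farlie-Gumbel-Morgenstern copula (our choice of example), with density $c(x_1, x_2) = 1 + \theta(1 - 2x_1)(1 - 2x_2)$, $|\theta| \le 1$. Here $\int_{[0,1]^2}|\nabla c|^2 = 8\theta^2/3$, so the bound equals $\frac{2}{\pi^2}\sqrt{8/3}\,|\theta|$.

```python
import numpy as np
from scipy.integrate import dblquad

theta = 0.5   # FGM dependence parameter, |theta| <= 1


def grad_c(x1, x2):
    # gradient of c(x1, x2) = 1 + theta (1 - 2 x1)(1 - 2 x2)
    return np.array([-2 * theta * (1 - 2 * x2), -2 * theta * (1 - 2 * x1)])


# dblquad integrates func(y, x) with x in [0, 1] (outer) and y in [0, 1] (inner)
energy, _ = dblquad(lambda x2, x1: np.sum(grad_c(x1, x2) ** 2), 0, 1, lambda x: 0.0, lambda x: 1.0)
bound = 2 / np.pi**2 * np.sqrt(energy)
closed_form = 2 / np.pi**2 * np.sqrt(8 / 3) * abs(theta)
```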
Example: the effect of the prior on the posterior
Consider a normal model with mean $\theta \in \mathbb{R}^d$ and positive definite covariance matrix $\Sigma$. The likelihood of $\theta$ given a sample $(x_1, \dots, x_n)$ is
$$(2\pi)^{-nd/2}\det(\Sigma)^{-n/2}\exp\Big(-\frac12\sum_{i=1}^n (x_i - \theta)^T\Sigma^{-1}(x_i - \theta)\Big).$$
We want to compare the posterior distribution $P_1 = \mathcal{N}(\bar x, n^{-1}\Sigma)$ of $\theta$ under a uniform prior with the posterior $P_2$ under a normal prior with parameters $(\mu, \Sigma_2)$; $\Sigma_2$ is assumed positive definite.
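The operator norm of a matrix $A$ is $|||A||| = \sup_{\|x\| = 1}\|Ax\|$. The normal density $p_1$ is $n/|||\Sigma|||$-log-concave. Moreover $P_2 = \mathcal{N}(\bar\mu, \Sigma_n)$ with
$$\bar\mu = \mu + n\Sigma_n\Sigma^{-1}(\bar x - \mu), \qquad \Sigma_n = (\Sigma_2^{-1} + n\Sigma^{-1})^{-1}.$$
After some calculation we find
$$d_W(P_1, P_2) \le |||\Sigma|||\,|||(\Sigma + n\Sigma_2)^{-1}|||\,\|\bar x - \mu\| + \frac{\sqrt2\,\Gamma(d/2 + 1/2)}{\Gamma(d/2)}\,\frac{|||\Sigma|||}{n}\,|||(\Sigma_2 + n\Sigma_2\Sigma^{-1}\Sigma_2)^{-1/2}|||.$$
The closer $\bar x$ is to $\mu$, the smaller the bound. The influence of $\Sigma_2$ vanishes as $n \to \infty$.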
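A minimal NumPy sketch that evaluates the posterior parameters and the bound above for given inputs (the numbers in the test are illustrative). It uses $\sqrt2\,\Gamma(d/2 + 1/2)/\Gamma(d/2) = \mathbb{E}\|Z\|$ for $Z \sim \mathcal{N}(0, I_d)$, computed via log-gamma for stability. Since $d_W$ dominates the distance between the means, $\|\bar x - \bar\mu\|$ must not exceed the bound, which the test checks along with the monotone decrease in $n$.

```python
import numpy as np
from scipy.special import gammaln


def op_norm(A):
    # operator norm |||A||| = largest singular value
    return np.linalg.norm(A, 2)


def inv_sqrt_spd(A):
    # A^{-1/2} for a symmetric positive definite matrix A
    w, V = np.linalg.eigh(A)
    return V @ np.diag(w ** -0.5) @ V.T


def posterior_normal_prior(xbar, mu, Sigma, Sigma2, n):
    """Parameters of P2 = N(mu_bar, Sigma_n) for the normal prior N(mu, Sigma2)."""
    Sigma_n = np.linalg.inv(np.linalg.inv(Sigma2) + n * np.linalg.inv(Sigma))
    mu_bar = mu + n * Sigma_n @ np.linalg.solve(Sigma, xbar - mu)
    return mu_bar, Sigma_n


def prior_effect_bound(xbar, mu, Sigma, Sigma2, n):
    d = len(xbar)
    first = op_norm(Sigma) * op_norm(np.linalg.inv(Sigma + n * Sigma2)) * np.linalg.norm(xbar - mu)
    mean_norm_z = np.sqrt(2) * np.exp(gammaln((d + 1) / 2) - gammaln(d / 2))   # E||Z||
    M = Sigma2 + n * Sigma2 @ np.linalg.solve(Sigma, Sigma2)
    second = mean_norm_z * op_norm(Sigma) / n * op_norm(inv_sqrt_spd(M))
    return first + second
```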
Last remarks
Solving and bounding the Stein equation is crucial for applying the method. Our framework gives a large (indeed infinite) choice of Stein equations.
The effect of the prior on the posterior will be studied in more detail.
We are also thinking about the multivariate discrete case. Note that Barbour et al. (2017) give an approximation by a discretised multivariate normal, using Markov process arguments.