Stein's Method for Matrix Concentration

Lester Mackey†

Collaborators: Michael I. Jordan†, Richard Y. Chen∗, Brendan Farrell∗, and Joel A. Tropp∗

†University of California, Berkeley   ∗California Institute of Technology

BEARS 2012

Mackey (UC Berkeley)   Stein's Method   BEARS 2012   1 / 20

Motivation

Concentration Inequalities

Matrix concentration

$\mathbb{P}\{\|X - \mathbb{E}X\| \ge t\} \le \delta$

$\mathbb{P}\{\lambda_{\max}(X - \mathbb{E}X) \ge t\} \le \delta$

Non-asymptotic control of random matrices with complex distributions

Applications

Matrix estimation from sparse random measurements (Gross, 2011; Recht, 2009; Mackey, Talwalkar, and Jordan, 2011)

Randomized matrix multiplication and factorization (Drineas, Mahoney, and Muthukrishnan, 2008; Hsu, Kakade, and Zhang, 2011b)

Convex relaxation of robust or chance-constrained optimization (Nemirovski, 2007; So, 2011; Cheung, So, and Wang, 2011)

Random graph analysis (Christofides and Markström, 2008; Oliveira, 2009)

Concentration Inequalities

Matrix concentration

$\mathbb{P}\{\lambda_{\max}(X - \mathbb{E}X) \ge t\} \le \delta$

Difficulty: Matrix multiplication is not commutative

Past approaches (Oliveira, 2009; Tropp, 2011; Hsu, Kakade, and Zhang, 2011a)

Deep results from matrix analysis
Sums of independent matrices and matrix martingales

This work

Stein's method of exchangeable pairs (1972), as advanced by Chatterjee (2007) for scalar concentration

⇒ Improved exponential tail inequalities (Hoeffding, Bernstein)
⇒ Polynomial moment inequalities (Khintchine, Rosenthal)
⇒ Dependent sums and more general matrix functionals

Roadmap

1. Motivation
2. Stein's Method: Background and Notation
3. Exponential Tail Inequalities
4. Polynomial Moment Inequalities
5. Extensions

Background

Notation

Hermitian matrices: $\mathbb{H}^d = \{A \in \mathbb{C}^{d \times d} : A = A^*\}$
All matrices in this talk are Hermitian

Maximum eigenvalue: $\lambda_{\max}(\cdot)$

Trace: $\operatorname{tr} B$, the sum of the diagonal entries of $B$

Spectral norm: $\|B\|$, the maximum singular value of $B$

Schatten p-norm: $\|B\|_p = \left(\operatorname{tr}|B|^p\right)^{1/p}$ for $p \ge 1$
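
The notation above maps directly onto a few NumPy calls. The following is a minimal sketch, not part of the talk: the helper names and the example matrix `B` are illustrative assumptions.

```python
import numpy as np

def lambda_max(B):
    """Largest eigenvalue of a Hermitian matrix (eigvalsh sorts ascending)."""
    return float(np.linalg.eigvalsh(B)[-1])

def spectral_norm(B):
    """Largest singular value of B."""
    return float(np.linalg.norm(B, 2))

def schatten_norm(B, p):
    """Schatten p-norm (tr |B|^p)^(1/p), computed from the singular values."""
    s = np.linalg.svd(B, compute_uv=False)
    return float(np.sum(s ** p) ** (1.0 / p))

# Illustrative 2 x 2 Hermitian matrix (an arbitrary choice, not from the talk).
B = np.array([[2.0, 1.0j], [-1.0j, -3.0]])

print(lambda_max(B), spectral_norm(B), schatten_norm(B, 2))
```

For a Hermitian matrix the singular values are the absolute values of the eigenvalues, so the spectral norm can exceed the maximum eigenvalue, as it does for this `B`.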

Matrix Stein Pair

Definition (Exchangeable Pair)

$(Z, Z')$ is an exchangeable pair if $(Z, Z') \overset{d}{=} (Z', Z)$.

Definition (Matrix Stein Pair)

Let $(Z, Z')$ be an auxiliary exchangeable pair, and let $\Psi : \mathcal{Z} \to \mathbb{H}^d$ be a measurable function. Define the random matrices

\[ X = \Psi(Z) \quad\text{and}\quad X' = \Psi(Z'). \]

$(X, X')$ is a matrix Stein pair with scale factor $\alpha \in (0, 1]$ if

\[ \mathbb{E}[X' \mid Z] = (1 - \alpha) X. \]

Matrix Stein pairs are exchangeable pairs

Matrix Stein pairs always have zero mean
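
One concrete way to build a matrix Stein pair, sketched here as an illustration rather than taken from the slides: take a Rademacher series $X = \sum_k \varepsilon_k A_k$ with $n$ terms, pick an index $K$ uniformly at random, and replace $\varepsilon_K$ by a fresh independent sign to obtain $X'$. The code below checks by exact enumeration that this gives $\mathbb{E}[X' \mid Z] = (1 - \alpha)X$ with $\alpha = 1/n$. The coefficient matrices are random placeholders.

```python
import itertools
import numpy as np

def series(eps, A):
    """X = sum_k eps_k A_k for signs eps and Hermitian coefficients A[k]."""
    return sum(e * Ak for e, Ak in zip(eps, A))

def cond_mean_of_X_prime(eps, A):
    """E[X' | Z = eps]: average over the resampled index K and the fresh sign."""
    n = len(A)
    X = series(eps, A)
    total = np.zeros_like(X)
    for k in range(n):
        for fresh in (-1.0, 1.0):
            # Resampling coordinate k changes X by (fresh - eps_k) * A_k.
            total += X + (fresh - eps[k]) * A[k]
    return total / (2 * n)

# Placeholder coefficients: n random real symmetric d x d matrices.
rng = np.random.default_rng(0)
n, d = 4, 3
A = []
for _ in range(n):
    G = rng.standard_normal((d, d))
    A.append((G + G.T) / 2)
alpha = 1.0 / n

ok = all(
    np.allclose(cond_mean_of_X_prime(eps, A), (1 - alpha) * series(eps, A))
    for eps in itertools.product((-1.0, 1.0), repeat=n)
)
print("Stein pair condition holds for every sign pattern:", ok)
```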

The Conditional Variance

Definition (Conditional Variance)

Suppose that $(X, X')$ is a matrix Stein pair with scale factor $\alpha$, constructed from the exchangeable pair $(Z, Z')$. The conditional variance is the random matrix

\[ \Delta_X = \Delta_X(Z) = \frac{1}{2\alpha}\, \mathbb{E}\big[(X - X')^2 \mid Z\big]. \]

$\Delta_X$ is a stochastic estimate for the variance $\mathbb{E}X^2$

Control over $\Delta_X$ yields control over $\lambda_{\max}(X)$
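
To make the definition concrete, the sketch below computes the conditional variance exactly for an illustrative Stein pair that is not spelled out in the slides: a Rademacher series $X = \sum_k \varepsilon_k A_k$ in which one uniformly chosen sign is resampled, so $\alpha = 1/n$. Under that assumption the conditional variance turns out to be the deterministic matrix $\sum_k A_k^2$, which also equals $\mathbb{E}X^2$.

```python
import itertools
import numpy as np

def series(eps, A):
    """X = sum_k eps_k A_k."""
    return sum(e * Ak for e, Ak in zip(eps, A))

def conditional_variance(eps, A):
    """Delta_X(Z) = (1 / (2 alpha)) E[(X - X')^2 | Z] with alpha = 1/n."""
    n = len(A)
    total = np.zeros_like(A[0])
    for k in range(n):
        for fresh in (-1.0, 1.0):
            D = (eps[k] - fresh) * A[k]  # X - X' when coordinate k is resampled
            total += D @ D
    mean_sq = total / (2 * n)            # E[(X - X')^2 | Z]
    return mean_sq * n / 2               # multiply by 1 / (2 alpha) = n / 2

def second_moment(A):
    """E X^2, averaging over all 2^n sign patterns."""
    patterns = list(itertools.product((-1.0, 1.0), repeat=len(A)))
    return sum(series(e, A) @ series(e, A) for e in patterns) / len(patterns)

# Placeholder coefficients: random real symmetric matrices.
rng = np.random.default_rng(1)
n, d = 4, 3
A = []
for _ in range(n):
    G = rng.standard_normal((d, d))
    A.append((G + G.T) / 2)
sum_sq = sum(Ak @ Ak for Ak in A)

print(np.allclose(conditional_variance((1.0, -1.0, 1.0, 1.0), A), sum_sq))
print(np.allclose(second_moment(A), sum_sq))
```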

Exponential Tail Inequalities

Exponential Concentration for Random Matrices

Theorem (Mackey, Jordan, Chen, Farrell, and Tropp, 2012)

Let $(X, X')$ be a matrix Stein pair with $X \in \mathbb{H}^d$. Suppose that

\[ \Delta_X \preccurlyeq cX + vI \quad\text{almost surely for } c, v \ge 0. \]

Then, for all $t \ge 0$,

\[ \mathbb{P}\{\lambda_{\max}(X) \ge t\} \le d \cdot \exp\left\{\frac{-t^2}{2v + 2ct}\right\}. \]

Control over the conditional variance $\Delta_X$ yields a Gaussian tail for $\lambda_{\max}(X)$ for small $t$ and a Poisson tail for large $t$

When $d = 1$, reduces to scalar result of Chatterjee (2007)

The dimensional factor $d$ cannot be removed
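
The right-hand side of the theorem is easy to evaluate and to invert. The helpers below are a small sketch with hypothetical names; capping the bound at 1 is a convenience added here, not part of the statement.

```python
import math

def stein_tail_bound(t, d, v, c=0.0):
    """d * exp(-t^2 / (2v + 2ct)), capped at 1 (the cap is our addition)."""
    if t < 0 or d < 1 or v <= 0 or c < 0:
        raise ValueError("need t >= 0, d >= 1, v > 0, c >= 0")
    return min(1.0, d * math.exp(-t * t / (2.0 * v + 2.0 * c * t)))

def deviation_for_confidence(delta, d, v, c=0.0):
    """The t at which the uncapped bound equals delta, for 0 < delta <= d."""
    log_term = math.log(d / delta)
    # Solve t^2 = log_term * (2v + 2ct), a quadratic in t, for its nonnegative root.
    return c * log_term + math.sqrt((c * log_term) ** 2 + 2.0 * v * log_term)

# Example values (arbitrary): dimension 10, v = 1, c = 0.5, failure level 1%.
t = deviation_for_confidence(0.01, d=10, v=1.0, c=0.5)
print(t, stein_tail_bound(t, d=10, v=1.0, c=0.5))
```

With $c = 0$ the deviation reduces to $\sqrt{2v \log(d/\delta)}$, so the dimension enters only through a logarithm.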

Application: Matrix Hoeffding Inequality

Corollary (Mackey, Jordan, Chen, Farrell, and Tropp, 2012)

Let $(Y_k)_{k \ge 1}$ be independent matrices in $\mathbb{H}^d$ satisfying

\[ \mathbb{E}Y_k = 0 \quad\text{and}\quad Y_k^2 \preccurlyeq A_k^2 \]

for deterministic matrices $(A_k)_{k \ge 1}$. Define the variance parameter

\[ \sigma^2 = \frac{1}{2} \Big\| \sum\nolimits_k \big(A_k^2 + \mathbb{E}Y_k^2\big) \Big\|. \]

Then, for all $t \ge 0$,

\[ \mathbb{P}\Big\{\lambda_{\max}\Big(\sum\nolimits_k Y_k\Big) \ge t\Big\} \le d \cdot e^{-t^2/2\sigma^2}. \]

Improves upon the matrix Hoeffding inequality of Tropp (2011):
Optimal constant 1/2 in the exponent
Variance parameter $\sigma^2$ smaller than the bound $\|\sum_k A_k^2\|$

Tighter than classical Hoeffding inequality (1963) when $d = 1$
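
As a sanity check of the corollary, the sketch below takes the special case $Y_k = \varepsilon_k A_k$ with Rademacher signs (an illustrative choice, for which $Y_k^2 = A_k^2$ and $\mathbb{E}Y_k^2 = A_k^2$), computes the exact tail probability by enumerating all sign patterns, and compares it with the bound. The matrices and function names are assumptions made for the example.

```python
import itertools
import math
import numpy as np

def hoeffding_sigma2(A_sq, EY_sq):
    """sigma^2 = (1/2) * || sum_k (A_k^2 + E Y_k^2) || in spectral norm."""
    total = sum(a + e for a, e in zip(A_sq, EY_sq))
    return 0.5 * float(np.linalg.norm(total, 2))

def hoeffding_bound(t, d, sigma2):
    """d * exp(-t^2 / (2 sigma^2)), capped at 1 (the cap is our addition)."""
    return min(1.0, d * math.exp(-t * t / (2.0 * sigma2)))

def exact_tail(A, t):
    """P{lambda_max(sum_k eps_k A_k) >= t}, enumerating all 2^n sign patterns."""
    hits = 0
    for eps in itertools.product((-1.0, 1.0), repeat=len(A)):
        X = sum(e * Ak for e, Ak in zip(eps, A))
        hits += int(np.linalg.eigvalsh(X)[-1] >= t)
    return hits / 2 ** len(A)

# Placeholder coefficients: random real symmetric matrices.
rng = np.random.default_rng(2)
n, d = 6, 3
A = []
for _ in range(n):
    G = rng.standard_normal((d, d))
    A.append((G + G.T) / 2)
A_sq = [Ak @ Ak for Ak in A]
sigma2 = hoeffding_sigma2(A_sq, A_sq)  # E Y_k^2 = A_k^2 for Rademacher signs

for t in (1.0, 2.0, 4.0):
    print(t, exact_tail(A, t), hoeffding_bound(t, d, sigma2))
```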

Exponential Concentration: Proof Sketch

1. Matrix Laplace transform method (Ahlswede & Winter, 2002)

Relate tail probability to the trace of the mgf of $X$:

\[ \mathbb{P}\{\lambda_{\max}(X) \ge t\} \le \inf_{\theta > 0} e^{-\theta t} \cdot m(\theta), \quad\text{where } m(\theta) = \mathbb{E}\operatorname{tr} e^{\theta X}. \]

How to bound the trace mgf?

Past approaches: Golden-Thompson, Lieb's concavity theorem

Chatterjee's strategy for scalar concentration:

Control mgf growth by bounding the derivative

\[ m'(\theta) = \mathbb{E}\operatorname{tr} X e^{\theta X} \quad\text{for } \theta \in \mathbb{R} \]

Rewrite using exchangeable pairs
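
For reference, the Laplace transform bound in step 1 follows from a standard chain of inequalities, sketched here for any fixed $\theta > 0$ (the slide states only the result):

```latex
\begin{align*}
\mathbb{P}\{\lambda_{\max}(X) \ge t\}
  &= \mathbb{P}\{e^{\theta \lambda_{\max}(X)} \ge e^{\theta t}\} \\
  &\le e^{-\theta t}\, \mathbb{E}\, e^{\theta \lambda_{\max}(X)}
  && \text{(Markov's inequality)} \\
  &= e^{-\theta t}\, \mathbb{E}\, \lambda_{\max}\big(e^{\theta X}\big)
  && \text{(spectral mapping)} \\
  &\le e^{-\theta t}\, \mathbb{E} \operatorname{tr} e^{\theta X}
   = e^{-\theta t}\, m(\theta)
  && \text{(since } e^{\theta X} \succ 0\text{)}.
\end{align*}
```

Taking the infimum over $\theta > 0$ gives the stated bound. The last step replaces the largest eigenvalue by the sum of all eigenvalues, which is where the dimensional factor $d$ enters.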

Method of Exchangeable Pairs

Lemma

Suppose that $(X, X')$ is a matrix Stein pair with scale factor $\alpha$. Let $F : \mathbb{H}^d \to \mathbb{H}^d$ be a measurable function satisfying

\[ \mathbb{E}\|(X - X') F(X)\| < \infty. \]

Then

\[ \mathbb{E}[X\, F(X)] = \frac{1}{2\alpha}\, \mathbb{E}\big[(X - X')(F(X) - F(X'))\big]. \tag{1} \]

Intuition

Can characterize the distribution of a random matrix by integrating it against a class of test functions $F$

Eq. (1) allows us to estimate this integral using the smoothness properties of $F$ and the discrepancy $X - X'$
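
A short derivation of the lemma, given here as a sketch that uses only the Stein pair condition and exchangeability (the slide omits the proof):

```latex
\begin{align*}
\alpha X &= X - \mathbb{E}[X' \mid Z] = \mathbb{E}[X - X' \mid Z]
  && \text{(Stein pair condition)} \\
\alpha\, \mathbb{E}[X\, F(X)] &= \mathbb{E}[(X - X')\, F(X)]
  && \text{(} F(X) \text{ is a function of } Z \text{)} \\
\mathbb{E}[(X - X')\, F(X)] &= \mathbb{E}[(X' - X)\, F(X')]
   = -\,\mathbb{E}[(X - X')\, F(X')]
  && \text{(swap } Z \leftrightarrow Z' \text{)} \\
\alpha\, \mathbb{E}[X\, F(X)]
  &= \tfrac{1}{2}\, \mathbb{E}\big[(X - X')(F(X) - F(X'))\big]
  && \text{(average the two expressions)}.
\end{align*}
```

Dividing by $\alpha$ gives the identity in the lemma. The integrability condition is what justifies taking these expectations.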

Exponential Concentration: Proof Sketch

2. Method of Exchangeable Pairs

Rewrite the derivative of the trace mgf:

\[ m'(\theta) = \mathbb{E}\operatorname{tr} X e^{\theta X} = \frac{1}{2\alpha}\, \mathbb{E}\operatorname{tr}\big[(X - X')\big(e^{\theta X} - e^{\theta X'}\big)\big] \]

Goal: Use the smoothness of $F(X) = e^{\theta X}$ to bound the derivative
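
The rewritten derivative can be checked numerically. The sketch below assumes an illustrative Stein pair that the slides do not spell out (a Rademacher series with one uniformly chosen sign resampled, so $\alpha = 1/n$) and evaluates both sides of the identity exactly by enumeration.

```python
import itertools
import numpy as np

def expm_h(H):
    """Matrix exponential of a Hermitian matrix via its eigendecomposition."""
    w, Q = np.linalg.eigh(H)
    return (Q * np.exp(w)) @ Q.conj().T

def both_sides(A, theta):
    """Return (E tr X e^{theta X}, (1/(2 alpha)) E tr[(X - X')(e^{theta X} - e^{theta X'})])."""
    n = len(A)
    alpha = 1.0 / n
    lhs = rhs = 0.0
    patterns = list(itertools.product((-1.0, 1.0), repeat=n))
    for eps in patterns:
        X = sum(e * Ak for e, Ak in zip(eps, A))
        EX = expm_h(theta * X)
        lhs += np.trace(X @ EX).real
        for k in range(n):
            for fresh in (-1.0, 1.0):
                Xp = X + (fresh - eps[k]) * A[k]  # resample coordinate k
                rhs += np.trace((X - Xp) @ (EX - expm_h(theta * Xp))).real / (2 * n)
    lhs /= len(patterns)
    rhs /= len(patterns) * 2 * alpha
    return lhs, rhs

# Placeholder coefficients: random real symmetric matrices.
rng = np.random.default_rng(3)
A = []
for _ in range(3):
    G = rng.standard_normal((3, 3))
    A.append((G + G.T) / 2)

lhs, rhs = both_sides(A, theta=0.3)
print(lhs, rhs)
```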

Mean Value Trace Inequality

Lemma (Mackey, Jordan, Chen, Farrell, and Tropp, 2012)

Suppose that $g : \mathbb{R} \to \mathbb{R}$ is a weakly increasing function and that $h : \mathbb{R} \to \mathbb{R}$ is a function whose derivative $h'$ is convex. For all matrices $A, B \in \mathbb{H}^d$, it holds that

\[ \operatorname{tr}[(g(A) - g(B)) \cdot (h(A) - h(B))] \le \frac{1}{2} \operatorname{tr}[(g(A) - g(B)) \cdot (A - B) \cdot (h'(A) + h'(B))]. \]

Standard matrix functions: If $g : \mathbb{R} \to \mathbb{R}$, then

\[ g(A) = Q \begin{bmatrix} g(\lambda_1) & & \\ & \ddots & \\ & & g(\lambda_d) \end{bmatrix} Q^* \quad\text{when}\quad A = Q \begin{bmatrix} \lambda_1 & & \\ & \ddots & \\ & & \lambda_d \end{bmatrix} Q^*. \]

Inequality does not hold without the trace

For exponential concentration, we let $g(A) = A$ and $h(B) = e^{\theta B}$
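
The lemma can be spot-checked numerically for the choice used in the proof, $g(a) = a$ and $h(a) = e^{\theta a}$ with $\theta > 0$. The sketch below applies standard matrix functions through the eigendecomposition and evaluates the gap between the two sides on random Hermitian matrices. The helper names and the random test matrices are assumptions for illustration.

```python
import numpy as np

def matrix_function(f, H):
    """Standard matrix function: apply f to the eigenvalues of Hermitian H."""
    w, Q = np.linalg.eigh(H)
    return (Q * f(w)) @ Q.conj().T

def mean_value_gap(A, B, theta):
    """Right side minus left side of the lemma for g(a) = a, h(a) = exp(theta a)."""
    h = lambda w: np.exp(theta * w)
    dh = lambda w: theta * np.exp(theta * w)
    lhs = np.trace((A - B) @ (matrix_function(h, A) - matrix_function(h, B))).real
    rhs = 0.5 * np.trace(
        (A - B) @ (A - B) @ (matrix_function(dh, A) + matrix_function(dh, B))
    ).real
    return rhs - lhs

def random_hermitian(rng, d):
    """Random complex Hermitian d x d matrix (placeholder test input)."""
    G = rng.standard_normal((d, d)) + 1j * rng.standard_normal((d, d))
    return (G + G.conj().T) / 2

rng = np.random.default_rng(4)
gaps = [
    mean_value_gap(random_hermitian(rng, 4), random_hermitian(rng, 4), theta=0.7)
    for _ in range(50)
]
print("smallest gap over 50 random pairs:", min(gaps))
```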

Exponential Concentration: Proof Sketch

3. Mean Value Trace Inequality

Bound the derivative of the trace mgf:

\begin{align*}
m'(\theta) &= \frac{1}{2\alpha}\, \mathbb{E}\operatorname{tr}\big[(X - X')\big(e^{\theta X} - e^{\theta X'}\big)\big] \\
&\le \frac{\theta}{4\alpha}\, \mathbb{E}\operatorname{tr}\big[(X - X')^2 \cdot \big(e^{\theta X} + e^{\theta X'}\big)\big] \\
&= \theta \cdot \mathbb{E}\operatorname{tr}\big[\Delta_X\, e^{\theta X}\big].
\end{align*}

4. Conditional Variance Bound: $\Delta_X \preccurlyeq cX + vI$

Yields differential inequality

\[ m'(\theta) \le c\theta \cdot m'(\theta) + v\theta \cdot m(\theta). \]

Solve to bound $m(\theta)$ and thereby bound $\mathbb{P}\{\lambda_{\max}(X) \ge t\}$
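
One way to carry out the final "solve" step is sketched below, assuming $v > 0$ and reading $1/c$ as $+\infty$ when $c = 0$. Step 4 itself uses $\operatorname{tr}[\Delta_X e^{\theta X}] \le \operatorname{tr}[(cX + vI) e^{\theta X}]$, which holds because $e^{\theta X}$ is positive semidefinite.

```latex
\begin{align*}
(1 - c\theta)\, m'(\theta) &\le v\theta\, m(\theta)
  && \text{for } 0 \le \theta < 1/c, \\
\frac{d}{d\theta} \log m(\theta) &\le \frac{v\theta}{1 - c\theta}, \\
\log m(\theta) &\le \log d + \int_0^\theta \frac{v s}{1 - c s}\, ds
  \le \log d + \frac{v\theta^2}{2(1 - c\theta)}
  && \text{(since } m(0) = \operatorname{tr} I = d\text{)}, \\
\mathbb{P}\{\lambda_{\max}(X) \ge t\}
  &\le d \cdot \exp\Big\{-\theta t + \frac{v\theta^2}{2(1 - c\theta)}\Big\}
   = d \cdot \exp\Big\{\frac{-t^2}{2v + 2ct}\Big\}
  && \text{at } \theta = \frac{t}{v + ct}.
\end{align*}
```

The choice $\theta = t/(v + ct)$ satisfies $c\theta < 1$, and substituting it gives $-\theta t = -t^2/(v + ct)$ and $v\theta^2 / (2(1 - c\theta)) = t^2 / (2(v + ct))$, which sum to the exponent in the theorem.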

Polynomial Moment Inequalities

Polynomial Moments for Random Matrices

Theorem (Mackey, Jordan, Chen, Farrell, and Tropp, 2012)

Let $p = 1$ or $p \ge 1.5$. Suppose that $(X, X')$ is a matrix Stein pair where $\mathbb{E}\|X\|_{2p}^{2p} < \infty$. Then

\[ \big(\mathbb{E}\|X\|_{2p}^{2p}\big)^{1/2p} \le \sqrt{2p - 1} \cdot \big(\mathbb{E}\|\Delta_X\|_p^p\big)^{1/2p}. \]

Moral: The conditional variance controls the moments of $X$

Generalizes Chatterjee's version (2007) of the scalar Burkholder-Davis-Gundy inequality (Burkholder, 1973)

See also Pisier & Xu (1997), Junge & Xu (2003, 2008)

Proof techniques mirror those for exponential concentration

Also holds for infinite-dimensional Schatten-class operators

Application: Matrix Khintchine Inequality

Corollary (Mackey, Jordan, Chen, Farrell, and Tropp, 2012)

Let $(\varepsilon_k)_{k \ge 1}$ be an independent sequence of Rademacher random variables and $(A_k)_{k \ge 1}$ be a deterministic sequence of Hermitian matrices. Then, if $p = 1$ or $p \ge 1.5$,

\[ \Big(\mathbb{E}\Big\|\sum\nolimits_k \varepsilon_k A_k\Big\|_{2p}^{2p}\Big)^{1/2p} \le \sqrt{2p - 1} \cdot \Big\|\Big(\sum\nolimits_k A_k^2\Big)^{1/2}\Big\|_{2p}. \]

Noncommutative Khintchine inequality (Lust-Piquard, 1986; Lust-Piquard and Pisier, 1991) is a dominant tool in applied matrix analysis

e.g., Used in analysis of column sampling and projection for approximate SVD (Rudelson and Vershynin, 2007)

Stein's method offers an unusually concise proof

The constant $\sqrt{2p - 1}$ is within $\sqrt{e}$ of optimal
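
For a small number of terms, both sides of the matrix Khintchine inequality can be computed exactly by enumerating every sign pattern. The sketch below does this for placeholder coefficient matrices. It uses the identity $\|S^{1/2}\|_{2p} = \|S\|_p^{1/2}$ for positive semidefinite $S$ to avoid forming a matrix square root.

```python
import itertools
import numpy as np

def schatten_norm(B, p):
    """Schatten p-norm from the singular values."""
    s = np.linalg.svd(B, compute_uv=False)
    return float(np.sum(s ** p) ** (1.0 / p))

def khintchine_sides(A, p):
    """Exact left and right sides of the corollary for coefficients A and exponent p."""
    n = len(A)
    moments = [
        schatten_norm(sum(e * Ak for e, Ak in zip(eps, A)), 2 * p) ** (2 * p)
        for eps in itertools.product((-1.0, 1.0), repeat=n)
    ]
    lhs = float(np.mean(moments)) ** (1.0 / (2 * p))
    S = sum(Ak @ Ak for Ak in A)
    # ||S^{1/2}||_{2p} = (tr S^p)^{1/(2p)} = ||S||_p^{1/2} for S positive semidefinite.
    rhs = float(np.sqrt(2 * p - 1)) * schatten_norm(S, p) ** 0.5
    return lhs, rhs

# Placeholder coefficients: random real symmetric matrices.
rng = np.random.default_rng(6)
A = []
for _ in range(5):
    G = rng.standard_normal((3, 3))
    A.append((G + G.T) / 2)

for p in (1, 1.5, 2, 3):
    print(p, khintchine_sides(A, p))
```

At $p = 1$ the two sides agree, since $\mathbb{E}\operatorname{tr} X^2 = \operatorname{tr} \sum_k A_k^2$ for a Rademacher series.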

Extensions

Refined Exponential Concentration

Relate trace mgf of conditional variance to trace mgf of $X$
Yields matrix generalization of classical Bernstein inequality
Offers tool for unbounded random matrices

General Complex Matrices

Map any matrix $B \in \mathbb{C}^{d_1 \times d_2}$ to a Hermitian matrix via dilation

\[ \mathcal{D}(B) = \begin{bmatrix} 0 & B \\ B^* & 0 \end{bmatrix} \in \mathbb{H}^{d_1 + d_2} \]

Preserves spectral information: $\lambda_{\max}(\mathcal{D}(B)) = \|B\|$

Dependent Sequences

Sums of conditionally zero-mean random matrices
Combinatorial matrix statistics (e.g., sampling without replacement)
Matrix-valued functions satisfying a self-reproducing property

Yields a dependent bounded differences inequality for matrices
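
The Hermitian dilation used for general complex matrices is simple to implement. The sketch below builds it with `np.block` and checks on an arbitrary rectangular example that the largest eigenvalue of the dilation matches the spectral norm of the original matrix. The function name and the test matrix are illustrative.

```python
import numpy as np

def dilation(B):
    """Hermitian dilation [[0, B], [B*, 0]] of a d1 x d2 matrix B."""
    d1, d2 = B.shape
    return np.block([
        [np.zeros((d1, d1)), B],
        [B.conj().T, np.zeros((d2, d2))],
    ])

# Arbitrary complex 2 x 3 example.
rng = np.random.default_rng(7)
B = rng.standard_normal((2, 3)) + 1j * rng.standard_normal((2, 3))
D = dilation(B)

print(D.shape)
print(np.linalg.eigvalsh(D)[-1], np.linalg.norm(B, 2))
```

The eigenvalues of the dilation are plus and minus the singular values of $B$, padded with zeros, so tail bounds for $\lambda_{\max}$ of Hermitian matrices transfer to the spectral norm of rectangular ones.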

The End

Thanks

References

Ahlswede, R. and Winter, A. Strong converse for identification via quantum channels. IEEE Trans. Inform. Theory, 48(3):569–579, Mar. 2002.

Burkholder, D. L. Distribution function inequalities for martingales. Ann. Probab., 1:19–42, 1973. doi:10.1214/aop/1176997023.

Chatterjee, S. Stein's method for concentration inequalities. Probab. Theory Related Fields, 138:305–321, 2007.

Cheung, S.-S., So, A. Man-Cho, and Wang, K. Chance-constrained linear matrix inequalities with dependent perturbations: A safe tractable approximation approach. Available at http://www.se.cuhk.edu.hk/~manchoso/papers/cclmi_sta.pdf, 2011.

Christofides, D. and Markström, K. Expansion properties of random Cayley graphs and vertex transitive graphs via matrix martingales. Random Struct. Algorithms, 32(1):88–100, 2008.

Drineas, P., Mahoney, M. W., and Muthukrishnan, S. Relative-error CUR matrix decompositions. SIAM Journal on Matrix Analysis and Applications, 30:844–881, 2008.

Gross, D. Recovering low-rank matrices from few coefficients in any basis. IEEE Trans. Inform. Theory, 57(3):1548–1566, Mar. 2011.

Hoeffding, W. Probability inequalities for sums of bounded random variables. Journal of the American Statistical Association, 58(301):13–30, 1963.

Hsu, D., Kakade, S. M., and Zhang, T. Dimension-free tail inequalities for sums of random matrices. Available at arXiv:1104.1672, 2011a.

Hsu, D., Kakade, S. M., and Zhang, T. Dimension-free tail inequalities for sums of random matrices. arXiv:1104.1672v3 [math.PR], 2011b.

Junge, M. and Xu, Q. Noncommutative Burkholder/Rosenthal inequalities. Ann. Probab., 31(2):948–995, 2003.

Junge, M. and Xu, Q. Noncommutative Burkholder/Rosenthal inequalities II: Applications. Israel J. Math., 167:227–282, 2008.

Lust-Piquard, F. Inégalités de Khintchine dans $C_p$ ($1 < p < \infty$). C. R. Math. Acad. Sci. Paris, 303(7):289–292, 1986.

Lust-Piquard, F. and Pisier, G. Noncommutative Khintchine and Paley inequalities. Ark. Mat., 29(2):241–260, 1991.

Mackey, L., Talwalkar, A., and Jordan, M. I. Divide-and-conquer matrix factorization. In Advances in Neural Information Processing Systems 24, 2011.

Mackey, L., Jordan, M. I., Chen, R. Y., Farrell, B., and Tropp, J. A. Matrix concentration inequalities via the method of exchangeable pairs. Available at arXiv, Jan. 2012.

Nemirovski, A. Sums of random symmetric matrices and quadratic optimization under orthogonality constraints. Math. Program., 109:283–317, January 2007. ISSN 0025-5610. doi:10.1007/s10107-006-0033-0. URL http://dl.acm.org/citation.cfm?id=1229716.1229726.

Oliveira, R. I. Concentration of the adjacency matrix and of the Laplacian in random graphs with independent edges. Available at arXiv:0911.0600, Nov. 2009.

Pisier, G. and Xu, Q. Non-commutative martingale inequalities. Comm. Math. Phys., 189(3):667–698, 1997.

Recht, B. A simpler approach to matrix completion. arXiv:0910.0651v2 [cs.IT], 2009.

Rudelson, M. and Vershynin, R. Sampling from large matrices: An approach through geometric functional analysis. J. Assoc. Comput. Mach., 54(4):Article 21, 19 pp., Jul. 2007 (electronic).

So, A. Man-Cho. Moment inequalities for sums of random matrices and their applications in optimization. Math. Program., 130(1):125–151, 2011.

Stein, C. A bound for the error in the normal approximation to the distribution of a sum of dependent random variables. In Proc. 6th Berkeley Symp. Math. Statist. Probab., Berkeley, 1972. Univ. California Press.

Tropp, J. A. User-friendly tail bounds for sums of random matrices. Found. Comput. Math., August 2011.


eg Used in analysis of column sampling and projection forapproximate SVD (Rudelson and Vershynin 2007)

Steinrsquos method offers an unusually concise proof

The constantradic2pminus 1 is within

radice of optimal

Mackey (UC Berkeley) Steinrsquos Method BEARS 2012 16 20

Extensions

Extensions

Refined Exponential Concentration

Relate trace mgf of conditional variance to trace mgf of XYields matrix generalization of classical Bernstein inequalityOffers tool for unbounded random matrices

General Complex Matrices

Map any matrix B isin Cd1timesd2 to a Hermitian matrix via dilation

D(B) =

[

0 B

Blowast0

]

isin Hd1+d2

Preserves spectral information λmax(D(B)) = BDependent Sequences

Sums of conditionally zero-mean random matricesCombinatorial matrix statistics (eg sampling wo replacement)Matrix-valued functions satisfying a self-reproducing property

Yields a dependent bounded differences inequality for matricesMackey (UC Berkeley) Steinrsquos Method BEARS 2012 17 20

Extensions

The End

Thanks

Mackey (UC Berkeley) Steinrsquos Method BEARS 2012 18 20

Extensions

References IAhlswede R and Winter A Strong converse for identification via quantum channels IEEE Trans Inform Theory 48(3)

569ndash579 Mar 2002

Burkholder D L Distribution function inequalities for martingales Ann Probab 119ndash42 1973 doi101214aop1176997023

Chatterjee S Steinrsquos method for concentration inequalities Probab Theory Related Fields 138305ndash321 2007

Cheung S-S So A Man-Cho and Wang K Chance-constrained linear matrix inequalities with dependent perturbations Asafe tractable approximation approach Available at httpwwwsecuhkeduhk~manchosopaperscclmi_stapdf 2011

Christofides D and Markstrom K Expansion properties of random cayley graphs and vertex transitive graphs via matrixmartingales Random Struct Algorithms 32(1)88ndash100 2008

Drineas P Mahoney M W and Muthukrishnan S Relative-error CUR matrix decompositions SIAM Journal on Matrix

Analysis and Applications 30844ndash881 2008

Gross D Recovering low-rank matrices from few coefficients in any basis IEEE Trans Inform Theory 57(3)1548ndash1566 Mar2011

Hoeffding W Probability inequalities for sums of bounded random variables Journal of the American Statistical Association58(301)13ndash30 1963

Hsu D Kakade S M and Zhang T Dimension-free tail inequalities for sums of random matrices Available atarXiv11041672 2011a

Hsu D Kakade S M and Zhang T Dimension-free tail inequalities for sums of random matricesarXiv11041672v3[mathPR] 2011b

Junge M and Xu Q Noncommutative BurkholderRosenthal inequalities Ann Probab 31(2)948ndash995 2003

Junge M and Xu Q Noncommutative BurkholderRosenthal inequalities II Applications Israel J Math 167227ndash282 2008

Lust-Piquard F Inegalites de Khintchine dans Cp (1 lt p lt infin) C R Math Acad Sci Paris 303(7)289ndash292 1986

Lust-Piquard F and Pisier G Noncommutative Khintchine and Paley inequalities Ark Mat 29(2)241ndash260 1991

Mackey L Talwalkar A and Jordan M I Divide-and-conquer matrix factorization In Advances in Neural Information

Processing Systems 24 2011

Mackey (UC Berkeley) Steinrsquos Method BEARS 2012 19 20

Extensions

References II

Mackey L Jordan M I Chen R Y Farrell B and Tropp J A Matrix concentration inequalities via the method ofexchangeable pairs Available at arXiv Jan 2012

Nemirovski A Sums of random symmetric matrices and quadratic optimization under orthogonality constraints Math

Program 109283ndash317 January 2007 ISSN 0025-5610 doi 101007s10107-006-0033-0 URLhttpdlacmorgcitationcfmid=12297161229726

Oliveira R I Concentration of the adjacency matrix and of the Laplacian in random graphs with independent edges Availableat arXiv09110600 Nov 2009

Pisier G and Xu Q Non-commutative martingale inequalities Comm Math Phys 189(3)667ndash698 1997

Recht B A simpler approach to matrix completion arXiv09100651v2[csIT] 2009

Rudelson M and Vershynin R Sampling from large matrices An approach through geometric functional analysis J Assoc

Comput Mach 54(4)Article 21 19 pp Jul 2007 (electronic)

So A Man-Cho Moment inequalities for sums of random matrices and their applications in optimization Math Program 130(1)125ndash151 2011

Stein C A bound for the error in the normal approximation to the distribution of a sum of dependent random variables InProc 6th Berkeley Symp Math Statist Probab Berkeley 1972 Univ California Press

Tropp J A User-friendly tail bounds for sums of random matrices Found Comput Math August 2011

Mackey (UC Berkeley) Steinrsquos Method BEARS 2012 20 20

  • Motivation
  • Steins Method Background and Notation
  • Exponential Tail Inequalities
  • Polynomial Moment Inequalities
  • Extensions
Page 4: Stein's Method for Matrix Concentration · Sums of independent matrices and matrix martingales This work Stein’s method of exchangeable pairs (1972), as advanced by Chatterjee (2007)

Roadmap

1. Motivation

2. Stein's Method Background and Notation

3. Exponential Tail Inequalities

4. Polynomial Moment Inequalities

5. Extensions

Background

Notation

Hermitian matrices: $\mathbb{H}^d = \{A \in \mathbb{C}^{d \times d} : A = A^*\}$. All matrices in this talk are Hermitian.

Maximum eigenvalue: $\lambda_{\max}(\cdot)$

Trace: $\operatorname{tr} B$, the sum of the diagonal entries of $B$

Spectral norm: $\|B\|$, the maximum singular value of $B$

Schatten p-norm: $\|B\|_p = (\operatorname{tr} |B|^p)^{1/p}$ for $p \ge 1$
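The notation above maps directly onto standard numerical routines. The following is a minimal NumPy sketch, not part of the talk; the function names and the sample matrix `A` are made up for illustration.

```python
import numpy as np

def lambda_max(A):
    """Largest eigenvalue of a Hermitian matrix A."""
    # eigvalsh returns the eigenvalues of a Hermitian matrix in ascending order.
    return float(np.linalg.eigvalsh(A)[-1])

def spectral_norm(B):
    """Spectral norm: the largest singular value of B."""
    return float(np.linalg.norm(B, 2))

def schatten_norm(B, p):
    """Schatten p-norm (tr |B|^p)^(1/p): the l_p norm of the singular values."""
    s = np.linalg.svd(B, compute_uv=False)
    return float(np.sum(s ** p) ** (1.0 / p))

# Illustrative 2 x 2 Hermitian matrix (hypothetical example data).
A = np.array([[2.0, 1.0j], [-1.0j, -1.0]])
```

For a Hermitian matrix the singular values are the absolute values of the eigenvalues, so here `spectral_norm(A)` coincides with `lambda_max(A)` because the eigenvalue of largest magnitude is positive.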

Matrix Stein Pair

Definition (Exchangeable Pair)

$(Z, Z')$ is an exchangeable pair if $(Z, Z') \overset{d}{=} (Z', Z)$.

Definition (Matrix Stein Pair)

Let $(Z, Z')$ be an auxiliary exchangeable pair, and let $\Psi : \mathcal{Z} \to \mathbb{H}^d$ be a measurable function. Define the random matrices

$$X = \Psi(Z) \quad\text{and}\quad X' = \Psi(Z').$$

$(X, X')$ is a matrix Stein pair with scale factor $\alpha \in (0, 1]$ if

$$\mathbb{E}[X' \mid Z] = (1 - \alpha) X.$$

Matrix Stein pairs are exchangeable pairs

Matrix Stein pairs always have zero mean

The Conditional Variance

Definition (Conditional Variance)

Suppose that $(X, X')$ is a matrix Stein pair with scale factor $\alpha$, constructed from the exchangeable pair $(Z, Z')$. The conditional variance is the random matrix

$$\Delta_X = \Delta_X(Z) = \frac{1}{2\alpha} \mathbb{E}\big[ (X - X')^2 \mid Z \big].$$

$\Delta_X$ is a stochastic estimate for the variance $\mathbb{E} X^2$

Control over $\Delta_X$ yields control over $\lambda_{\max}(X)$
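To make the two definitions concrete, here is a small sketch for a Rademacher series $X = \sum_k \varepsilon_k A_k$. It assumes one common construction that this part of the talk does not spell out: $X'$ is obtained by picking an index $K$ uniformly at random and replacing $\varepsilon_K$ with an independent copy, which gives scale factor $\alpha = 1/n$. The coefficient matrices are random illustrative data. The code computes $\mathbb{E}[X' \mid Z]$ and $\Delta_X$ exactly by summing over the $2n$ equally likely resampling outcomes.

```python
import numpy as np

rng = np.random.default_rng(0)
n, d = 4, 3

def random_hermitian(dim):
    """Hypothetical helper: a random Hermitian test matrix."""
    G = rng.standard_normal((dim, dim)) + 1j * rng.standard_normal((dim, dim))
    return (G + G.conj().T) / 2

A = [random_hermitian(d) for _ in range(n)]     # fixed Hermitian coefficients
eps = rng.choice([-1.0, 1.0], size=n)           # one realisation of Z
X = sum(e * Ak for e, Ak in zip(eps, A))

alpha = 1.0 / n                                 # scale factor of this construction
cond_mean = np.zeros((d, d), dtype=complex)     # accumulates E[X' | Z]
cond_var = np.zeros((d, d), dtype=complex)      # accumulates E[(X - X')^2 | Z]
for k in range(n):                              # K is uniform on the n indices
    for new_sign in (-1.0, 1.0):                # the fresh sign is uniform on {-1, +1}
        prob = (1.0 / n) * 0.5
        X_prime = X + (new_sign - eps[k]) * A[k]
        cond_mean += prob * X_prime
        cond_var += prob * (X - X_prime) @ (X - X_prime)
delta_X = cond_var / (2 * alpha)                # conditional variance Delta_X

sum_of_squares = sum(Ak @ Ak for Ak in A)
```

Under this construction the conditional mean equals $(1 - \alpha) X$, and the conditional variance is the deterministic matrix $\sum_k A_k^2$ whatever the realised signs are.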

Exponential Tail Inequalities

Exponential Concentration for Random Matrices

Theorem (Mackey, Jordan, Chen, Farrell, and Tropp, 2012)

Let $(X, X')$ be a matrix Stein pair with $X \in \mathbb{H}^d$. Suppose that

$$\Delta_X \preccurlyeq cX + vI \quad\text{almost surely for } c, v \ge 0.$$

Then, for all $t \ge 0$,

$$\mathbb{P}\{\lambda_{\max}(X) \ge t\} \le d \cdot \exp\left\{ \frac{-t^2}{2v + 2ct} \right\}.$$

Control over the conditional variance $\Delta_X$ yields a Gaussian tail for $\lambda_{\max}(X)$ for small $t$ and a Poisson tail for large $t$

When $d = 1$, reduces to scalar result of Chatterjee (2007)

The dimensional factor $d$ cannot be removed
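The right-hand side of the theorem is easy to evaluate numerically. A minimal sketch follows; the function name `stein_tail_bound` is made up here, and capping the value at 1 is a convenience added because the quantity bounds a probability.

```python
import math

def stein_tail_bound(t, d, c, v):
    """Evaluate d * exp(-t^2 / (2v + 2ct)), capped at 1."""
    if t < 0 or c < 0 or v < 0:
        raise ValueError("t, c and v must be nonnegative")
    denom = 2.0 * v + 2.0 * c * t
    if denom == 0.0:
        # Only reachable when v = 0 and c * t = 0: the bound is d at t = 0
        # and tends to 0 for t > 0.
        return 1.0 if t == 0 else 0.0
    return min(1.0, d * math.exp(-t * t / denom))
```

With `c = 0` the exponent is $-t^2 / 2v$, the Gaussian regime. When `c * t` dominates `v`, the exponent behaves like $-t / 2c$, which is the slower decay for large $t$ mentioned on the slide.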

Application: Matrix Hoeffding Inequality

Corollary (Mackey, Jordan, Chen, Farrell, and Tropp, 2012)

Let $(Y_k)_{k \ge 1}$ be independent matrices in $\mathbb{H}^d$ satisfying

$$\mathbb{E} Y_k = 0 \quad\text{and}\quad Y_k^2 \preccurlyeq A_k^2$$

for deterministic matrices $(A_k)_{k \ge 1}$. Define the variance parameter

$$\sigma^2 = \frac{1}{2} \Big\| \sum_k \big( A_k^2 + \mathbb{E} Y_k^2 \big) \Big\|.$$

Then, for all $t \ge 0$,

$$\mathbb{P}\Big\{ \lambda_{\max}\Big( \sum_k Y_k \Big) \ge t \Big\} \le d \cdot e^{-t^2 / 2\sigma^2}.$$

Improves upon the matrix Hoeffding inequality of Tropp (2011):

Optimal constant 1/2 in the exponent

Variance parameter $\sigma^2$ smaller than the bound $\big\| \sum_k A_k^2 \big\|$

Tighter than classical Hoeffding inequality (1963) when $d = 1$
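As a sanity check of the corollary, the sketch below takes the special case $Y_k = \varepsilon_k A_k$ with Rademacher signs, so that $Y_k^2 = A_k^2 = \mathbb{E} Y_k^2$ and $\sigma^2 = \| \sum_k A_k^2 \|$. The coefficient matrices are random illustrative data, and the tail probability is computed exactly by enumerating all $2^n$ sign patterns rather than by simulation.

```python
import itertools
import math
import numpy as np

rng = np.random.default_rng(1)
n, d = 8, 3
A = []
for _ in range(n):
    G = rng.standard_normal((d, d))
    A.append((G + G.T) / 2)                     # real symmetric, hence Hermitian

# For Y_k = eps_k * A_k: sigma^2 = (1/2) || sum_k (A_k^2 + A_k^2) || = || sum_k A_k^2 ||.
sigma2 = float(np.linalg.norm(sum(Ak @ Ak for Ak in A), 2))

def exact_tail(t):
    """P{lambda_max(sum_k eps_k A_k) >= t}, by enumerating every sign pattern."""
    hits = 0
    for signs in itertools.product((-1.0, 1.0), repeat=n):
        X = sum(s * Ak for s, Ak in zip(signs, A))
        if np.linalg.eigvalsh(X)[-1] >= t:
            hits += 1
    return hits / 2 ** n

def hoeffding_bound(t):
    """Right-hand side d * exp(-t^2 / (2 sigma^2)) of the corollary."""
    return d * math.exp(-t * t / (2.0 * sigma2))
```

Comparing `exact_tail(t)` with `hoeffding_bound(t)` over a range of `t` shows how much slack the bound leaves for a particular choice of coefficients.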

Exponential Concentration: Proof Sketch

1. Matrix Laplace transform method (Ahlswede & Winter, 2002)

Relate tail probability to the trace of the mgf of $X$:

$$\mathbb{P}\{\lambda_{\max}(X) \ge t\} \le \inf_{\theta > 0} e^{-\theta t} \cdot m(\theta), \quad\text{where}\quad m(\theta) = \mathbb{E} \operatorname{tr} e^{\theta X}.$$

How to bound the trace mgf?

Past approaches: Golden-Thompson, Lieb's concavity theorem

Chatterjee's strategy for scalar concentration:

Control mgf growth by bounding derivative

$$m'(\theta) = \mathbb{E} \operatorname{tr} X e^{\theta X} \quad\text{for } \theta \in \mathbb{R}$$

Rewrite using exchangeable pairs
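For readers who want the omitted step, the Laplace transform bound can be derived in three lines. This is the standard argument, written out here as a sketch rather than quoted from the talk. For any fixed $\theta > 0$:

```latex
\begin{align*}
\mathbb{P}\{\lambda_{\max}(X) \ge t\}
  &= \mathbb{P}\{ e^{\theta \lambda_{\max}(X)} \ge e^{\theta t} \}
   \le e^{-\theta t} \, \mathbb{E}\, e^{\theta \lambda_{\max}(X)}
   && \text{(Markov's inequality)} \\
  &= e^{-\theta t} \, \mathbb{E}\, \lambda_{\max}\big( e^{\theta X} \big)
   && \text{(spectral mapping)} \\
  &\le e^{-\theta t} \, \mathbb{E} \operatorname{tr} e^{\theta X}
   = e^{-\theta t} \cdot m(\theta)
   && \text{($e^{\theta X}$ is positive definite)}
\end{align*}
```

Taking the infimum over $\theta > 0$ gives the stated inequality. The last step replaces the largest eigenvalue by the sum of all $d$ eigenvalues, which is where the dimensional factor in the final bounds originates.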

Method of Exchangeable Pairs

Lemma

Suppose that $(X, X')$ is a matrix Stein pair with scale factor $\alpha$. Let $F : \mathbb{H}^d \to \mathbb{H}^d$ be a measurable function satisfying

$$\mathbb{E} \| (X - X') F(X) \| < \infty.$$

Then

$$\mathbb{E}[X \, F(X)] = \frac{1}{2\alpha} \mathbb{E}\big[ (X - X')(F(X) - F(X')) \big]. \tag{1}$$

Intuition

Can characterize the distribution of a random matrix by integrating it against a class of test functions $F$

Eq. (1) allows us to estimate this integral using the smoothness properties of $F$ and the discrepancy $X - X'$
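The identity (1) follows from the two defining properties of a matrix Stein pair. The derivation below is a sketch of the usual argument, with the integrability hypothesis taken for granted so that every expectation exists:

```latex
\begin{align*}
\mathbb{E}\big[ (X - X') F(X) \big]
  &= \mathbb{E}\big[ \mathbb{E}[X - X' \mid Z] \, F(X) \big]
   = \alpha \, \mathbb{E}[X \, F(X)]
   && \text{(Stein pair: } \mathbb{E}[X' \mid Z] = (1 - \alpha) X \text{)} \\
\mathbb{E}\big[ (X - X') F(X) \big]
  &= \mathbb{E}\big[ (X' - X) F(X') \big]
   && \text{(exchangeability: swap } X \leftrightarrow X' \text{)} \\
\alpha \, \mathbb{E}[X \, F(X)]
  &= \tfrac{1}{2} \mathbb{E}\big[ (X - X') F(X) \big]
   + \tfrac{1}{2} \mathbb{E}\big[ (X' - X) F(X') \big]
   = \tfrac{1}{2} \mathbb{E}\big[ (X - X')(F(X) - F(X')) \big]
\end{align*}
```

Dividing by $\alpha$ gives (1). The order of the factors is kept fixed throughout, so the argument does not rely on commutativity.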

Exponential Concentration: Proof Sketch

2. Method of Exchangeable Pairs

Rewrite the derivative of the trace mgf:

$$m'(\theta) = \mathbb{E} \operatorname{tr} X e^{\theta X} = \frac{1}{2\alpha} \mathbb{E} \operatorname{tr}\Big[ (X - X') \big( e^{\theta X} - e^{\theta X'} \big) \Big]$$

Goal: Use the smoothness of $F(X) = e^{\theta X}$ to bound the derivative

Mean Value Trace Inequality

Lemma (Mackey, Jordan, Chen, Farrell, and Tropp, 2012)

Suppose that $g : \mathbb{R} \to \mathbb{R}$ is a weakly increasing function and that $h : \mathbb{R} \to \mathbb{R}$ is a function whose derivative $h'$ is convex. For all matrices $A, B \in \mathbb{H}^d$, it holds that

$$\operatorname{tr}\big[ (g(A) - g(B)) \cdot (h(A) - h(B)) \big] \le \frac{1}{2} \operatorname{tr}\big[ (g(A) - g(B)) \cdot (A - B) \cdot (h'(A) + h'(B)) \big].$$

Standard matrix functions: If $g : \mathbb{R} \to \mathbb{R}$, then

$$g(A) = Q \begin{bmatrix} g(\lambda_1) & & \\ & \ddots & \\ & & g(\lambda_d) \end{bmatrix} Q^* \quad\text{when}\quad A = Q \begin{bmatrix} \lambda_1 & & \\ & \ddots & \\ & & \lambda_d \end{bmatrix} Q^*.$$

Inequality does not hold without the trace

For exponential concentration, we let $g(A) = A$ and $h(B) = e^{\theta B}$
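The lemma can be spot-checked numerically in the case used for exponential concentration, $g(x) = x$ and $h(x) = e^{\theta x}$ with $\theta > 0$, so that $h'(x) = \theta e^{\theta x}$ is convex. The sketch below is illustrative only: the helper names are made up, and the standard matrix function is implemented through an eigendecomposition exactly as in the definition above.

```python
import numpy as np

def matrix_function(f, A):
    """Standard matrix function: apply f to the eigenvalues of Hermitian A."""
    lam, Q = np.linalg.eigh(A)
    # Q * f(lam) scales column i of Q by f(lam[i]), i.e. Q @ diag(f(lam)).
    return (Q * f(lam)) @ Q.conj().T

def trace_gap(A, B, theta):
    """Right-hand side minus left-hand side for g(x) = x, h(x) = exp(theta * x)."""
    exp_A = matrix_function(lambda x: np.exp(theta * x), A)
    exp_B = matrix_function(lambda x: np.exp(theta * x), B)
    D = A - B
    lhs = np.trace(D @ (exp_A - exp_B)).real
    rhs = 0.5 * theta * np.trace(D @ D @ (exp_A + exp_B)).real
    return rhs - lhs

def random_hermitian(rng, d):
    """Hypothetical helper: a random Hermitian test matrix."""
    G = rng.standard_normal((d, d)) + 1j * rng.standard_normal((d, d))
    return (G + G.conj().T) / 2
```

A nonnegative `trace_gap` for every sampled pair is consistent with the lemma; it is a check on examples, not a proof.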

Exponential Concentration: Proof Sketch

3. Mean Value Trace Inequality

Bound the derivative of the trace mgf:

$$\begin{aligned}
m'(\theta) &= \frac{1}{2\alpha} \mathbb{E} \operatorname{tr}\Big[ (X - X') \big( e^{\theta X} - e^{\theta X'} \big) \Big] \\
&\le \frac{\theta}{4\alpha} \mathbb{E} \operatorname{tr}\Big[ (X - X')^2 \cdot \big( e^{\theta X} + e^{\theta X'} \big) \Big] \\
&= \theta \cdot \mathbb{E} \operatorname{tr}\big[ \Delta_X \, e^{\theta X} \big]
\end{aligned}$$

4. Conditional Variance Bound: $\Delta_X \preccurlyeq cX + vI$

Yields differential inequality

$$m'(\theta) \le c\theta \cdot m'(\theta) + v\theta \cdot m(\theta)$$

Solve to bound $m(\theta)$ and thereby bound $\mathbb{P}\{\lambda_{\max}(X) \ge t\}$
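The final "solve" step can be spelled out as follows. This is a sketch of one way to finish the argument, assuming $v > 0$ and restricting to $0 \le \theta < 1/c$ (all $\theta \ge 0$ when $c = 0$); the choice of $\theta$ in the last line is picked for convenience and is not claimed to be optimal.

```latex
\begin{align*}
(1 - c\theta) \, m'(\theta) &\le v\theta \, m(\theta)
  && \text{(rearrange the differential inequality)} \\
\frac{d}{d\theta} \log m(\theta) &\le \frac{v\theta}{1 - c\theta}
  && \text{(divide by } (1 - c\theta) \, m(\theta) > 0 \text{)} \\
\log m(\theta) &\le \log d + \int_0^\theta \frac{v s}{1 - c s} \, ds
  \le \log d + \frac{v\theta^2}{2(1 - c\theta)}
  && \text{(since } m(0) = \operatorname{tr} I = d \text{)} \\
\mathbb{P}\{\lambda_{\max}(X) \ge t\}
  &\le d \cdot \exp\Big\{ -\theta t + \frac{v\theta^2}{2(1 - c\theta)} \Big\}
  && \text{(matrix Laplace transform bound)} \\
  &= d \cdot \exp\Big\{ \frac{-t^2}{2v + 2ct} \Big\}
  && \text{(at } \theta = t / (v + ct) \text{)}
\end{align*}
```

At $\theta = t/(v + ct)$ one has $1 - c\theta = v/(v + ct)$, so the exponent equals $-t^2/(v + ct) + t^2/(2(v + ct))$, which is the exponent in the exponential concentration theorem.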

Polynomial Moment Inequalities

Polynomial Moments for Random Matrices

Theorem (Mackey, Jordan, Chen, Farrell, and Tropp, 2012)

Let $p = 1$ or $p \ge 1.5$. Suppose that $(X, X')$ is a matrix Stein pair where $\mathbb{E} \|X\|_{2p}^{2p} < \infty$. Then

$$\big( \mathbb{E} \|X\|_{2p}^{2p} \big)^{1/2p} \le \sqrt{2p - 1} \cdot \big( \mathbb{E} \|\Delta_X\|_p^p \big)^{1/2p}.$$

Moral: The conditional variance controls the moments of $X$

Generalizes Chatterjee's version (2007) of the scalar Burkholder-Davis-Gundy inequality (Burkholder, 1973)

See also Pisier & Xu (1997); Junge & Xu (2003, 2008)

Proof techniques mirror those for exponential concentration

Also holds for infinite-dimensional Schatten-class operators

Application: Matrix Khintchine Inequality

Corollary (Mackey, Jordan, Chen, Farrell, and Tropp, 2012)

Let $(\varepsilon_k)_{k \ge 1}$ be an independent sequence of Rademacher random variables and $(A_k)_{k \ge 1}$ be a deterministic sequence of Hermitian matrices. Then, if $p = 1$ or $p \ge 1.5$,

$$\Big( \mathbb{E} \Big\| \sum_k \varepsilon_k A_k \Big\|_{2p}^{2p} \Big)^{1/2p} \le \sqrt{2p - 1} \cdot \Big\| \Big( \sum_k A_k^2 \Big)^{1/2} \Big\|_{2p}.$$

Noncommutative Khintchine inequality (Lust-Piquard, 1986; Lust-Piquard and Pisier, 1991) is a dominant tool in applied matrix analysis

e.g., used in analysis of column sampling and projection for approximate SVD (Rudelson and Vershynin, 2007)

Stein's method offers an unusually concise proof

The constant $\sqrt{2p - 1}$ is within $\sqrt{e}$ of optimal
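For a small number of terms, both sides of the Khintchine inequality can be computed exactly. The sketch below evaluates the left-hand side by enumerating every sign pattern; the helper names and the random coefficient matrices are illustrative assumptions, not material from the talk.

```python
import itertools
import numpy as np

def schatten_norm(B, p):
    """Schatten p-norm: the l_p norm of the singular values of B."""
    s = np.linalg.svd(B, compute_uv=False)
    return float(np.sum(s ** p) ** (1.0 / p))

def khintchine_sides(A, p):
    """Return (lhs, rhs) of the matrix Khintchine inequality for coefficients A."""
    n = len(A)
    moment = 0.0
    for signs in itertools.product((-1.0, 1.0), repeat=n):
        X = sum(s * Ak for s, Ak in zip(signs, A))
        moment += schatten_norm(X, 2 * p) ** (2 * p)
    lhs = (moment / 2 ** n) ** (1.0 / (2 * p))   # exact 2p-th moment, then the root
    S = sum(Ak @ Ak for Ak in A)                 # positive semidefinite
    lam, Q = np.linalg.eigh(S)
    sqrt_S = (Q * np.sqrt(np.clip(lam, 0.0, None))) @ Q.conj().T
    rhs = float(np.sqrt(2 * p - 1) * schatten_norm(sqrt_S, 2 * p))
    return lhs, rhs

# Illustrative data: six random real symmetric 3 x 3 coefficient matrices.
rng = np.random.default_rng(2)
A = []
for _ in range(6):
    G = rng.standard_normal((3, 3))
    A.append((G + G.T) / 2)
```

For $p = 1$ the two sides agree, because the cross terms $\mathbb{E}\, \varepsilon_j \varepsilon_k$ vanish and $\mathbb{E} \operatorname{tr} X^2 = \operatorname{tr} \sum_k A_k^2$. For larger $p$ the ratio `rhs / lhs` shows how much room the constant $\sqrt{2p - 1}$ leaves on a given example.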

Extensions

Refined Exponential Concentration

Relate trace mgf of conditional variance to trace mgf of $X$

Yields matrix generalization of classical Bernstein inequality

Offers tool for unbounded random matrices

General Complex Matrices

Map any matrix $B \in \mathbb{C}^{d_1 \times d_2}$ to a Hermitian matrix via dilation

$$\mathcal{D}(B) = \begin{bmatrix} 0 & B \\ B^* & 0 \end{bmatrix} \in \mathbb{H}^{d_1 + d_2}$$

Preserves spectral information: $\lambda_{\max}(\mathcal{D}(B)) = \|B\|$

Dependent Sequences

Sums of conditionally zero-mean random matrices

Combinatorial matrix statistics (e.g., sampling without replacement)

Matrix-valued functions satisfying a self-reproducing property

Yields a dependent bounded differences inequality for matrices
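The dilation is straightforward to build and check numerically. A minimal NumPy sketch (the function name `hermitian_dilation` is chosen here for illustration):

```python
import numpy as np

def hermitian_dilation(B):
    """Embed a d1 x d2 matrix B into the Hermitian matrix [[0, B], [B*, 0]]."""
    d1, d2 = B.shape
    top = np.hstack([np.zeros((d1, d1), dtype=B.dtype), B])
    bottom = np.hstack([B.conj().T, np.zeros((d2, d2), dtype=B.dtype)])
    return np.vstack([top, bottom])
```

The nonzero eigenvalues of the dilation are plus and minus the singular values of `B`, which is why its largest eigenvalue equals the spectral norm of `B`. Results stated for Hermitian matrices can therefore be applied to rectangular matrices at the cost of replacing the dimension $d$ by $d_1 + d_2$.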


The End

Thanks


References

Ahlswede, R. and Winter, A. Strong converse for identification via quantum channels. IEEE Trans. Inform. Theory, 48(3):569–579, Mar. 2002.

Burkholder, D. L. Distribution function inequalities for martingales. Ann. Probab., 1:19–42, 1973. doi: 10.1214/aop/1176997023.

Chatterjee, S. Stein's method for concentration inequalities. Probab. Theory Related Fields, 138:305–321, 2007.

Cheung, S.-S., So, A. Man-Cho, and Wang, K. Chance-constrained linear matrix inequalities with dependent perturbations: A safe tractable approximation approach. Available at http://www.se.cuhk.edu.hk/~manchoso/papers/cclmi_sta.pdf, 2011.

Christofides, D. and Markstrom, K. Expansion properties of random Cayley graphs and vertex transitive graphs via matrix martingales. Random Struct. Algorithms, 32(1):88–100, 2008.

Drineas, P., Mahoney, M. W., and Muthukrishnan, S. Relative-error CUR matrix decompositions. SIAM Journal on Matrix Analysis and Applications, 30:844–881, 2008.

Gross, D. Recovering low-rank matrices from few coefficients in any basis. IEEE Trans. Inform. Theory, 57(3):1548–1566, Mar. 2011.

Hoeffding, W. Probability inequalities for sums of bounded random variables. Journal of the American Statistical Association, 58(301):13–30, 1963.

Hsu, D., Kakade, S. M., and Zhang, T. Dimension-free tail inequalities for sums of random matrices. Available at arXiv:1104.1672, 2011a.

Hsu, D., Kakade, S. M., and Zhang, T. Dimension-free tail inequalities for sums of random matrices. arXiv:1104.1672v3 [math.PR], 2011b.

Junge, M. and Xu, Q. Noncommutative Burkholder/Rosenthal inequalities. Ann. Probab., 31(2):948–995, 2003.

Junge, M. and Xu, Q. Noncommutative Burkholder/Rosenthal inequalities II: Applications. Israel J. Math., 167:227–282, 2008.

Lust-Piquard, F. Inégalités de Khintchine dans C_p (1 < p < ∞). C. R. Math. Acad. Sci. Paris, 303(7):289–292, 1986.

Lust-Piquard, F. and Pisier, G. Noncommutative Khintchine and Paley inequalities. Ark. Mat., 29(2):241–260, 1991.

Mackey, L., Talwalkar, A., and Jordan, M. I. Divide-and-conquer matrix factorization. In Advances in Neural Information Processing Systems 24, 2011.

Mackey, L., Jordan, M. I., Chen, R. Y., Farrell, B., and Tropp, J. A. Matrix concentration inequalities via the method of exchangeable pairs. Available at arXiv, Jan. 2012.

Nemirovski, A. Sums of random symmetric matrices and quadratic optimization under orthogonality constraints. Math. Program., 109:283–317, January 2007. ISSN 0025-5610. doi: 10.1007/s10107-006-0033-0. URL http://dl.acm.org/citation.cfm?id=1229716.1229726.

Oliveira, R. I. Concentration of the adjacency matrix and of the Laplacian in random graphs with independent edges. Available at arXiv:0911.0600, Nov. 2009.

Pisier, G. and Xu, Q. Non-commutative martingale inequalities. Comm. Math. Phys., 189(3):667–698, 1997.

Recht, B. A simpler approach to matrix completion. arXiv:0910.0651v2 [cs.IT], 2009.

Rudelson, M. and Vershynin, R. Sampling from large matrices: An approach through geometric functional analysis. J. Assoc. Comput. Mach., 54(4):Article 21, 19 pp., Jul. 2007 (electronic).

So, A. Man-Cho. Moment inequalities for sums of random matrices and their applications in optimization. Math. Program., 130(1):125–151, 2011.

Stein, C. A bound for the error in the normal approximation to the distribution of a sum of dependent random variables. In Proc. 6th Berkeley Symp. Math. Statist. Probab., Berkeley, 1972. Univ. California Press.

Tropp, J. A. User-friendly tail bounds for sums of random matrices. Found. Comput. Math., August 2011.

Page 6: Stein's Method for Matrix Concentration · Sums of independent matrices and matrix martingales This work Stein’s method of exchangeable pairs (1972), as advanced by Chatterjee (2007)

Background

Matrix Stein Pair

Definition (Exchangeable Pair)

(ZZ prime) is an exchangeable pair if (ZZ prime)d= (Z prime Z)

Definition (Matrix Stein Pair)

Let (ZZ prime) be an auxiliary exchangeable pair and let Ψ Z rarr Hd

be a measurable function Define the random matrices

X = Ψ(Z) and X prime = Ψ(Z prime)

(XX prime) is a matrix Stein pair with scale factor α isin (0 1] if

E[X prime |Z] = (1minus α)X

Matrix Stein pairs are exchangeable pairs

Matrix Stein pairs always have zero mean

Mackey (UC Berkeley) Steinrsquos Method BEARS 2012 6 20

Background

The Conditional Variance

Definition (Conditional Variance)

Suppose that (XX prime) is a matrix Stein pair with scale factor αconstructed from the exchangeable pair (ZZ prime) The conditional

variance is the random matrix

∆X = ∆X(Z) =1

2αE[

(X minusX prime)2 |Z]

∆X is a stochastic estimate for the variance EX2

Control over ∆X yields control over λmax(X)

Mackey (UC Berkeley) Steinrsquos Method BEARS 2012 7 20

Exponential Tail Inequalities

Exponential Concentration for Random Matrices

Theorem (Mackey Jordan Chen Farrell and Tropp 2012)

Let (XX prime) be a matrix Stein pair with X isin Hd Suppose that

∆X 4 cX + v I almost surely for c v ge 0

Then for all t ge 0

Pλmax(X) ge t le d middot exp minust2

2v + 2ct

Control over the conditional variance ∆X yields

Gaussian tail for λmax(X) for small t Poisson tail for large t

When d = 1 reduces to scalar result of Chatterjee (2007)

The dimensional factor d cannot be removed

Mackey (UC Berkeley) Steinrsquos Method BEARS 2012 8 20

Exponential Tail Inequalities

Application Matrix Hoeffding Inequality

Corollary (Mackey Jordan Chen Farrell and Tropp 2012)

Let (Yk)kge1 be independent matrices in Hd satisfying

EYk = 0 and Y 2

k 4 A2

k

for deterministic matrices (Ak)kge1 Define the variance parameter

σ2 =1

2

sum

k

(

A2

k + EY 2

k

)

Then for all t ge 0

P

λmax

(

sum

kYk

)

ge t

le d middot eminust22σ2

Improves upon the matrix Hoeffding inequality of Tropp (2011)Optimal constant 12 in the exponentVariance parameter σ2 smaller than the bound

sum

k A2k

Tighter than classical Hoeffding inequality (1963) when d = 1


Exponential Concentration: Proof Sketch

1. Matrix Laplace transform method (Ahlswede & Winter, 2002)

Relate tail probability to the trace of the mgf of $X$

$$P\{\lambda_{\max}(X) \ge t\} \le \inf_{\theta > 0} e^{-\theta t} \cdot m(\theta), \quad \text{where } m(\theta) = E \operatorname{tr} e^{\theta X}$$

How to bound the trace mgf?

Past approaches: Golden-Thompson, Lieb's concavity theorem

Chatterjee's strategy for scalar concentration

Control mgf growth by bounding derivative

$$m'(\theta) = E \operatorname{tr} X e^{\theta X} \quad \text{for } \theta \in \mathbb{R}$$

Rewrite using exchangeable pairs
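The Laplace transform step can be checked exactly on a small example. The sketch below assumes a Rademacher series $X = \sum_k \varepsilon_k A_k$ (an illustrative choice, not from the slides) and enumerates all sign patterns, so that $m(\theta)$, $m'(\theta)$, and the tail probability are computed without sampling error. The infimum over $\theta$ is approximated on a finite grid, which can only make the bound larger.

```python
import itertools
import numpy as np

rng = np.random.default_rng(2)
n, d = 8, 3
A = []
for _ in range(n):
    G = rng.standard_normal((d, d))
    A.append((G + G.T) / 2)

# Eigenvalues of X for every sign pattern; each pattern has probability 2^-n.
eigs = np.array([
    np.linalg.eigvalsh(sum(e * Ak for e, Ak in zip(eps, A)))
    for eps in itertools.product((-1.0, 1.0), repeat=n)
])  # shape (2^n, d), eigenvalues in ascending order

def trace_mgf(theta):
    """m(theta) = E tr exp(theta X) = E sum_i exp(theta * lambda_i)."""
    return float(np.mean(np.sum(np.exp(theta * eigs), axis=1)))

def trace_mgf_derivative(theta):
    """m'(theta) = E tr[X exp(theta X)] = E sum_i lambda_i exp(theta * lambda_i)."""
    return float(np.mean(np.sum(eigs * np.exp(theta * eigs), axis=1)))

def laplace_bound(t, thetas):
    """min over the grid of exp(-theta t) * m(theta)."""
    return min(float(np.exp(-th * t)) * trace_mgf(th) for th in thetas)

def exact_tail(t):
    """Exact P{lambda_max(X) >= t} under the uniform law on sign patterns."""
    return float(np.mean(eigs[:, -1] >= t))

thetas = np.linspace(0.01, 3.0, 300)
ts = np.linspace(0.0, float(eigs[:, -1].max()), 25)
bound_holds = all(exact_tail(t) <= laplace_bound(t, thetas) + 1e-12 for t in ts)
print(bound_holds, trace_mgf(0.0))
```

Note that $m(0) = \operatorname{tr} I = d$, which is the initial condition used when the differential inequality for $m$ is solved later in the proof sketch.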


Method of Exchangeable Pairs

Lemma

Suppose that $(X, X')$ is a matrix Stein pair with scale factor $\alpha$. Let $F : \mathbb{H}^d \to \mathbb{H}^d$ be a measurable function satisfying

$$E \left\| (X - X') F(X) \right\| < \infty.$$

Then

$$E[X F(X)] = \frac{1}{2\alpha} E\big[(X - X')(F(X) - F(X'))\big]. \quad (1)$$

Intuition

Can characterize the distribution of a random matrix by integrating it against a class of test functions $F$

Eq. (1) allows us to estimate this integral using the smoothness properties of $F$ and the discrepancy $X - X'$
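A short derivation may help show where Eq. (1) comes from. The sketch below uses only the Stein pair condition $E[X' \mid Z] = (1 - \alpha) X$, the fact that $F(X)$ is a function of $Z$, and the exchangeability of $(X, X')$; it assumes the integrability condition of the lemma so that every expectation is finite.

```latex
\begin{align*}
E[X F(X)]
  &= \frac{1}{\alpha}\, E\big[(X - E[X' \mid Z])\, F(X)\big]
     && \text{since } \alpha X = X - E[X' \mid Z] \\
  &= \frac{1}{\alpha}\, E\big[(X - X')\, F(X)\big]
     && \text{since } F(X) \text{ is determined by } Z \\
  &= \frac{1}{2\alpha} \Big( E\big[(X - X')\, F(X)\big] + E\big[(X' - X)\, F(X')\big] \Big)
     && \text{by exchangeability of } (X, X') \\
  &= \frac{1}{2\alpha}\, E\big[(X - X')\,(F(X) - F(X'))\big].
\end{align*}
```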


Exponential Concentration: Proof Sketch

2. Method of Exchangeable Pairs

Rewrite the derivative of the trace mgf

$$m'(\theta) = E \operatorname{tr} X e^{\theta X} = \frac{1}{2\alpha} E \operatorname{tr}\left[ (X - X') \left( e^{\theta X} - e^{\theta X'} \right) \right]$$

Goal: Use the smoothness of $F(X) = e^{\theta X}$ to bound the derivative
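The rewritten derivative can be verified numerically on a small case. This sketch again assumes the illustrative Rademacher construction ($X = \sum_k \varepsilon_k A_k$, with $X'$ formed by redrawing one uniformly chosen sign, so $\alpha = 1/n$); the helper `expm_hermitian` is a hypothetical name for a matrix exponential computed through an eigendecomposition. Both sides are computed by exact enumeration.

```python
import itertools
import numpy as np

def expm_hermitian(M):
    """exp(M) for Hermitian M via M = Q diag(lam) Q^*."""
    lam, Q = np.linalg.eigh(M)
    return (Q * np.exp(lam)) @ Q.conj().T

rng = np.random.default_rng(3)
n, d, theta = 5, 3, 0.7
A = []
for _ in range(n):
    G = rng.standard_normal((d, d))
    A.append((G + G.T) / 2)
alpha = 1.0 / n   # scale factor of the assumed construction

lhs = 0.0  # E tr[X exp(theta X)]
rhs = 0.0  # (1 / (2 alpha)) E tr[(X - X')(exp(theta X) - exp(theta X'))]
p_eps = 2.0 ** (-n)
for eps in itertools.product((-1.0, 1.0), repeat=n):
    X = sum(e * Ak for e, Ak in zip(eps, A))
    EX = expm_hermitian(theta * X)
    lhs += p_eps * np.trace(X @ EX)
    for k in range(n):
        for new_sign in (-1.0, 1.0):
            Xp = X + (new_sign - eps[k]) * A[k]
            EXp = expm_hermitian(theta * Xp)
            w = p_eps / (2 * n)   # P(eps) * P(K = k) * P(new sign)
            rhs += w * np.trace((X - Xp) @ (EX - EXp)) / (2 * alpha)

print(lhs, rhs)
```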


Mean Value Trace Inequality

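Lemma (Mackey, Jordan, Chen, Farrell, and Tropp, 2012)

Suppose that $g : \mathbb{R} \to \mathbb{R}$ is a weakly increasing function and that $h : \mathbb{R} \to \mathbb{R}$ is a function whose derivative $h'$ is convex. For all matrices $A, B \in \mathbb{H}^d$ it holds that

$$\operatorname{tr}\big[(g(A) - g(B)) \cdot (h(A) - h(B))\big] \le \frac{1}{2} \operatorname{tr}\big[(g(A) - g(B)) \cdot (A - B) \cdot (h'(A) + h'(B))\big].$$

Standard matrix functions: If $g : \mathbb{R} \to \mathbb{R}$, then

$$g(A) = Q \begin{bmatrix} g(\lambda_1) & & \\ & \ddots & \\ & & g(\lambda_d) \end{bmatrix} Q^* \quad \text{when} \quad A = Q \begin{bmatrix} \lambda_1 & & \\ & \ddots & \\ & & \lambda_d \end{bmatrix} Q^*$$

Inequality does not hold without the trace

For exponential concentration we let $g(A) = A$ and $h(B) = e^{\theta B}$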
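As an illustration, the sketch below implements standard matrix functions through an eigendecomposition and evaluates both sides of the mean value trace inequality for the choice used in the proof, $g(s) = s$ and $h(s) = e^{\theta s}$ with $\theta > 0$ (so that $h'(s) = \theta e^{\theta s}$ is convex). The names `matrix_function` and `mean_value_gap` are hypothetical, and the random symmetric test matrices are an assumption made only to have data.

```python
import numpy as np

def matrix_function(f, A):
    """Standard matrix function: f(A) = Q diag(f(lam)) Q^* for Hermitian A."""
    lam, Q = np.linalg.eigh(A)
    return (Q * f(lam)) @ Q.conj().T

def mean_value_gap(A, B, theta):
    """Right side minus left side of the inequality, g(s) = s, h(s) = exp(theta s)."""
    h = lambda s: np.exp(theta * s)
    h_prime = lambda s: theta * np.exp(theta * s)
    g_diff = A - B                                   # g(A) - g(B) with g = identity
    lhs = np.trace(g_diff @ (matrix_function(h, A) - matrix_function(h, B)))
    rhs = 0.5 * np.trace(
        g_diff @ (A - B) @ (matrix_function(h_prime, A) + matrix_function(h_prime, B))
    )
    return float(np.real(rhs - lhs))

rng = np.random.default_rng(4)
d, theta = 4, 0.5
gaps = []
for _ in range(200):
    G, H = rng.standard_normal((2, d, d))
    A, B = (G + G.T) / 2, (H + H.T) / 2
    gaps.append(mean_value_gap(A, B, theta))

print(min(gaps))  # the lemma predicts a nonnegative value, up to rounding
```

When $A = B$ both sides vanish, so the gap is zero in that case.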


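Exponential Concentration: Proof Sketch

3. Mean Value Trace Inequality

Bound the derivative of the trace mgf

$$m'(\theta) = \frac{1}{2\alpha} E \operatorname{tr}\left[ (X - X') \left( e^{\theta X} - e^{\theta X'} \right) \right] \le \frac{\theta}{4\alpha} E \operatorname{tr}\left[ (X - X')^2 \cdot \left( e^{\theta X} + e^{\theta X'} \right) \right] = \theta \cdot E \operatorname{tr}\left[ \Delta_X e^{\theta X} \right]$$

4. Conditional Variance Bound: $\Delta_X \preceq cX + vI$

Yields differential inequality

$$m'(\theta) \le c\theta \cdot m'(\theta) + v\theta \cdot m(\theta)$$

Solve to bound $m(\theta)$ and thereby bound $P\{\lambda_{\max}(X) \ge t\}$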
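One way to carry out the final "solve" step is sketched below. It is a reconstruction under the assumptions $v > 0$, $0 \le \theta < 1/c$ (any $\theta \ge 0$ when $c = 0$), and $m(0) = \operatorname{tr} I = d$, and it may differ in detail from the argument in the paper.

```latex
\begin{align*}
(1 - c\theta)\, m'(\theta) \le v\theta\, m(\theta)
  \quad &\Longrightarrow \quad
  \frac{\mathrm{d}}{\mathrm{d}\theta} \log m(\theta) \le \frac{v\theta}{1 - c\theta}, \\
\log m(\theta)
  &\le \log d + \int_0^{\theta} \frac{v s}{1 - c s}\, \mathrm{d}s
  \le \log d + \frac{v\theta^2}{2(1 - c\theta)}, \\
P\{\lambda_{\max}(X) \ge t\}
  &\le e^{-\theta t}\, m(\theta)
  \le d \cdot \exp\left\{ -\theta t + \frac{v\theta^2}{2(1 - c\theta)} \right\}.
\end{align*}
% Choosing \theta = t / (v + ct) gives 1 - c\theta = v / (v + ct), so that
%   -\theta t + \frac{v\theta^2}{2(1 - c\theta)}
%     = -\frac{t^2}{v + ct} + \frac{t^2}{2(v + ct)}
%     = -\frac{t^2}{2v + 2ct},
% which is the exponent in the exponential concentration theorem.
```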


Polynomial Moment Inequalities

Polynomial Moments for Random Matrices

Theorem (Mackey, Jordan, Chen, Farrell, and Tropp, 2012)

Let $p = 1$ or $p \ge 1.5$. Suppose that $(X, X')$ is a matrix Stein pair where $E \|X\|_{2p}^{2p} < \infty$. Then

$$\left( E \|X\|_{2p}^{2p} \right)^{1/2p} \le \sqrt{2p - 1} \cdot \left( E \|\Delta_X\|_p^p \right)^{1/2p}$$

Moral: The conditional variance controls the moments of $X$

Generalizes Chatterjee's version (2007) of the scalar Burkholder-Davis-Gundy inequality (Burkholder, 1973)

See also Pisier & Xu (1997); Junge & Xu (2003, 2008)

Proof techniques mirror those for exponential concentration

Also holds for infinite-dimensional Schatten-class operators


Application: Matrix Khintchine Inequality

Corollary (Mackey, Jordan, Chen, Farrell, and Tropp, 2012)

Let $(\varepsilon_k)_{k \ge 1}$ be an independent sequence of Rademacher random variables and $(A_k)_{k \ge 1}$ be a deterministic sequence of Hermitian matrices. Then if $p = 1$ or $p \ge 1.5$,

$$\left( E \left\| \sum_k \varepsilon_k A_k \right\|_{2p}^{2p} \right)^{1/2p} \le \sqrt{2p - 1} \cdot \left\| \left( \sum_k A_k^2 \right)^{1/2} \right\|_{2p}$$

Noncommutative Khintchine inequality (Lust-Piquard, 1986; Lust-Piquard and Pisier, 1991) is a dominant tool in applied matrix analysis

e.g., Used in analysis of column sampling and projection for approximate SVD (Rudelson and Vershynin, 2007)

Stein's method offers an unusually concise proof

The constant $\sqrt{2p - 1}$ is within $\sqrt{e}$ of optimal
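The matrix Khintchine inequality can be checked exactly for a handful of small matrices by enumerating every sign pattern. In the sketch below the names `schatten_norm` and `khintchine_sides` are hypothetical, and the random symmetric $A_k$ are an assumption made only to have test data. The right-hand side uses $\|S^{1/2}\|_{2p} = (\operatorname{tr} S^p)^{1/2p}$ for positive semidefinite $S = \sum_k A_k^2$.

```python
import itertools
import numpy as np

def schatten_norm(M, q):
    """Schatten q-norm of a Hermitian matrix: (sum_i |lambda_i|^q)^(1/q)."""
    lam = np.linalg.eigvalsh(M)
    return float(np.sum(np.abs(lam) ** q) ** (1.0 / q))

def khintchine_sides(A, p):
    """Return (lhs, rhs) of the matrix Khintchine inequality, computed exactly."""
    n = len(A)
    moment = 0.0
    for eps in itertools.product((-1.0, 1.0), repeat=n):
        X = sum(e * Ak for e, Ak in zip(eps, A))
        moment += schatten_norm(X, 2 * p) ** (2 * p) / 2 ** n
    lhs = moment ** (1.0 / (2 * p))
    S = sum(Ak @ Ak for Ak in A)
    mu = np.clip(np.linalg.eigvalsh(S), 0.0, None)   # eigenvalues of sum_k A_k^2
    rhs = np.sqrt(2 * p - 1) * float(np.sum(mu ** p) ** (1.0 / (2 * p)))
    return lhs, rhs

rng = np.random.default_rng(5)
n, d = 7, 3
A = []
for _ in range(n):
    G = rng.standard_normal((d, d))
    A.append((G + G.T) / 2)

results = {p: khintchine_sides(A, p) for p in (1.0, 1.5, 2.0, 3.0)}
for p, (lhs, rhs) in results.items():
    print(p, lhs, rhs)
```

At $p = 1$ the two sides coincide, since $E \operatorname{tr} X^2 = \operatorname{tr} \sum_k A_k^2$ for a Rademacher series.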


Extensions

Refined Exponential Concentration

Relate trace mgf of conditional variance to trace mgf of $X$

Yields matrix generalization of classical Bernstein inequality

Offers tool for unbounded random matrices

General Complex Matrices

Map any matrix $B \in \mathbb{C}^{d_1 \times d_2}$ to a Hermitian matrix via dilation

$$D(B) = \begin{bmatrix} 0 & B \\ B^* & 0 \end{bmatrix} \in \mathbb{H}^{d_1 + d_2}$$

Preserves spectral information: $\lambda_{\max}(D(B)) = \|B\|$

Dependent Sequences

Sums of conditionally zero-mean random matrices

Combinatorial matrix statistics (e.g., sampling w/o replacement)

Matrix-valued functions satisfying a self-reproducing property

Yields a dependent bounded differences inequality for matrices
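The Hermitian dilation listed under General Complex Matrices is simple to build and test. The following sketch (the function name `hermitian_dilation` is hypothetical) constructs $D(B)$ for a rectangular complex matrix and compares its largest eigenvalue with the spectral norm of $B$.

```python
import numpy as np

def hermitian_dilation(B):
    """Return D(B) = [[0, B], [B^*, 0]] for a d1 x d2 matrix B."""
    B = np.asarray(B)
    d1, d2 = B.shape
    return np.block([
        [np.zeros((d1, d1)), B],
        [B.conj().T, np.zeros((d2, d2))],
    ])

rng = np.random.default_rng(6)
d1, d2 = 3, 5
B = rng.standard_normal((d1, d2)) + 1j * rng.standard_normal((d1, d2))

D = hermitian_dilation(B)
lam_max = np.linalg.eigvalsh(D)[-1]      # largest eigenvalue of the dilation
spec_norm = np.linalg.norm(B, 2)         # largest singular value of B
print(D.shape, lam_max, spec_norm)
```

Because the nonzero eigenvalues of $D(B)$ are plus and minus the singular values of $B$, bounds on $\lambda_{\max}$ of a Hermitian matrix transfer to the spectral norm of a general rectangular matrix.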


The End

Thanks


References

Ahlswede, R. and Winter, A. Strong converse for identification via quantum channels. IEEE Trans. Inform. Theory, 48(3):569–579, Mar. 2002.

Burkholder, D. L. Distribution function inequalities for martingales. Ann. Probab., 1:19–42, 1973. doi: 10.1214/aop/1176997023.

Chatterjee, S. Stein's method for concentration inequalities. Probab. Theory Related Fields, 138:305–321, 2007.

Cheung, S.-S., So, A. Man-Cho, and Wang, K. Chance-constrained linear matrix inequalities with dependent perturbations: A safe tractable approximation approach. Available at http://www.se.cuhk.edu.hk/~manchoso/papers/cclmi_sta.pdf, 2011.

Christofides, D. and Markström, K. Expansion properties of random Cayley graphs and vertex transitive graphs via matrix martingales. Random Struct. Algorithms, 32(1):88–100, 2008.

Drineas, P., Mahoney, M. W., and Muthukrishnan, S. Relative-error CUR matrix decompositions. SIAM Journal on Matrix Analysis and Applications, 30:844–881, 2008.

Gross, D. Recovering low-rank matrices from few coefficients in any basis. IEEE Trans. Inform. Theory, 57(3):1548–1566, Mar. 2011.

Hoeffding, W. Probability inequalities for sums of bounded random variables. Journal of the American Statistical Association, 58(301):13–30, 1963.

Hsu, D., Kakade, S. M., and Zhang, T. Dimension-free tail inequalities for sums of random matrices. Available at arXiv:1104.1672, 2011a.

Hsu, D., Kakade, S. M., and Zhang, T. Dimension-free tail inequalities for sums of random matrices. arXiv:1104.1672v3 [math.PR], 2011b.

Junge, M. and Xu, Q. Noncommutative Burkholder/Rosenthal inequalities. Ann. Probab., 31(2):948–995, 2003.

Junge, M. and Xu, Q. Noncommutative Burkholder/Rosenthal inequalities II: Applications. Israel J. Math., 167:227–282, 2008.

Lust-Piquard, F. Inégalités de Khintchine dans $C_p$ ($1 < p < \infty$). C. R. Math. Acad. Sci. Paris, 303(7):289–292, 1986.

Lust-Piquard, F. and Pisier, G. Noncommutative Khintchine and Paley inequalities. Ark. Mat., 29(2):241–260, 1991.

Mackey, L., Talwalkar, A., and Jordan, M. I. Divide-and-conquer matrix factorization. In Advances in Neural Information Processing Systems 24, 2011.

Mackey, L., Jordan, M. I., Chen, R. Y., Farrell, B., and Tropp, J. A. Matrix concentration inequalities via the method of exchangeable pairs. Available at arXiv, Jan. 2012.

Nemirovski, A. Sums of random symmetric matrices and quadratic optimization under orthogonality constraints. Math. Program., 109:283–317, January 2007. ISSN 0025-5610. doi: 10.1007/s10107-006-0033-0. URL http://dl.acm.org/citation.cfm?id=1229716.1229726.

Oliveira, R. I. Concentration of the adjacency matrix and of the Laplacian in random graphs with independent edges. Available at arXiv:0911.0600, Nov. 2009.

Pisier, G. and Xu, Q. Non-commutative martingale inequalities. Comm. Math. Phys., 189(3):667–698, 1997.

Recht, B. A simpler approach to matrix completion. arXiv:0910.0651v2 [cs.IT], 2009.

Rudelson, M. and Vershynin, R. Sampling from large matrices: An approach through geometric functional analysis. J. Assoc. Comput. Mach., 54(4):Article 21, 19 pp., Jul. 2007. (electronic).

So, A. Man-Cho. Moment inequalities for sums of random matrices and their applications in optimization. Math. Program., 130(1):125–151, 2011.

Stein, C. A bound for the error in the normal approximation to the distribution of a sum of dependent random variables. In Proc. 6th Berkeley Symp. Math. Statist. Probab., Berkeley, 1972. Univ. California Press.

Tropp, J. A. User-friendly tail bounds for sums of random matrices. Found. Comput. Math., August 2011.

Page 9: Stein's Method for Matrix Concentration · Sums of independent matrices and matrix martingales This work Stein’s method of exchangeable pairs (1972), as advanced by Chatterjee (2007)

Exponential Tail Inequalities

Application Matrix Hoeffding Inequality

Corollary (Mackey Jordan Chen Farrell and Tropp 2012)

Let (Yk)kge1 be independent matrices in Hd satisfying

EYk = 0 and Y 2

k 4 A2

k

for deterministic matrices (Ak)kge1 Define the variance parameter

σ2 =1

2

sum

k

(

A2

k + EY 2

k

)

Then for all t ge 0

P

λmax

(

sum

kYk

)

ge t

le d middot eminust22σ2

Improves upon the matrix Hoeffding inequality of Tropp (2011)Optimal constant 12 in the exponentVariance parameter σ2 smaller than the bound

sum

k A2k

Tighter than classical Hoeffding inequality (1963) when d = 1

Mackey (UC Berkeley) Steinrsquos Method BEARS 2012 9 20

Exponential Tail Inequalities

Exponential Concentration Proof Sketch

1 Matrix Laplace transform method (Ahlswede amp Winter 2002)

Relate tail probability to the trace of the mgf of X

Pλmax(X) ge t le infθgt0

eminusθt middotm(θ)

where m(θ) = E tr eθX

How to bound the trace mgf

Past approaches Golden-Thompson Liebrsquos concavity theorem

Chatterjeersquos strategy for scalar concentration

Control mgf growth by bounding derivative

mprime(θ) = E trXeθX for θ isin R

Rewrite using exchangeable pairs

Mackey (UC Berkeley) Steinrsquos Method BEARS 2012 10 20

Exponential Tail Inequalities

Method of Exchangeable Pairs

Lemma

Suppose that (XX prime) is a matrix Stein pair with scale factor α LetF Hd rarr Hd be a measurable function satisfying

E(X minusX prime)F (X) lt infin

Then

E[X F (X)] =1

2αE[(X minusX prime)(F (X)minus F (X prime))] (1)

Intuition

Can characterize the distribution of a random matrix byintegrating it against a class of test functions F

Eq 1 allows us to estimate this integral using the smoothnessproperties of F and the discrepancy X minusX prime

Mackey (UC Berkeley) Steinrsquos Method BEARS 2012 11 20

Exponential Tail Inequalities

Exponential Concentration Proof Sketch

2 Method of Exchangeable Pairs

Rewrite the derivative of the trace mgf

mprime(θ) = E trXeθX =1

2αE tr

[

(X minusX prime)(

eθX minus eθXprime)]

Goal Use the smoothness of F (X) = eθX to bound the derivative

Mackey (UC Berkeley) Steinrsquos Method BEARS 2012 12 20

Exponential Tail Inequalities

Mean Value Trace Inequality

Lemma (Mackey Jordan Chen Farrell and Tropp 2012)

Suppose that g R rarr R is a weakly increasing function and thath R rarr R is a function whose derivative hprime is convex For allmatrices AB isin Hd it holds that

tr[(g(A)minus g(B)) middot (h(A)minus h(B))] le1

2tr[(g(A)minus g(B)) middot (AminusB) middot (hprime(A) + hprime(B))]

Standard matrix functions If g R rarr R then

g(A) = Q

g(λ1)

g(λd)

Qlowast when A = Q

λ1

λd

Qlowast

Inequality does not hold without the trace

For exponential concentration we let g(A) = A and h(B) = eθB


Exponential Concentration Proof Sketch

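3. Mean Value Trace Inequality

Bound the derivative of the trace mgf

$$m'(\theta) = \frac{1}{2\alpha}\, \mathbb{E} \operatorname{tr}\big[(X - X')\big(e^{\theta X} - e^{\theta X'}\big)\big] \le \frac{\theta}{4\alpha}\, \mathbb{E} \operatorname{tr}\big[(X - X')^2 \cdot \big(e^{\theta X} + e^{\theta X'}\big)\big] = \theta \cdot \mathbb{E} \operatorname{tr}\big[\Delta_X\, e^{\theta X}\big]$$

4. Conditional Variance Bound: $\Delta_X \preccurlyeq cX + vI$

Yields differential inequality

$$m'(\theta) \le c\theta \cdot m'(\theta) + v\theta \cdot m(\theta)$$

Solve to bound $m(\theta)$ and thereby bound $\mathbb{P}\{\lambda_{\max}(X) \ge t\}$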
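One way to carry out the final "solve" step is sketched below. The restriction $0 \le \theta < 1/c$ (read as $\theta \ge 0$ when $c = 0$), the assumption $c, v \ge 0$, and the particular choice of $\theta$ at the end are assumptions of this sketch rather than statements taken from the slide.

```latex
\begin{align*}
(1 - c\theta)\, m'(\theta) &\le v\theta\, m(\theta)
  && \text{(rearrange the differential inequality)} \\
\frac{d}{d\theta} \log m(\theta) &\le \frac{v\theta}{1 - c\theta}
  && \text{(divide by } (1 - c\theta)\, m(\theta) > 0\text{)} \\
\log m(\theta) &\le \log d + \int_0^\theta \frac{v s}{1 - c s}\, ds
  \le \log d + \frac{v\theta^2}{2(1 - c\theta)}
  && \text{(since } m(0) = \operatorname{tr} I = d\text{)} \\
\mathbb{P}\{\lambda_{\max}(X) \ge t\} &\le e^{-\theta t} \cdot m(\theta)
  \le d \cdot \exp\!\Big( -\theta t + \frac{v\theta^2}{2(1 - c\theta)} \Big)
  && \text{(matrix Laplace transform method)}
\end{align*}
```

Choosing $\theta = t/(v + ct)$, which satisfies $\theta < 1/c$, gives $1 - c\theta = v/(v + ct)$ and hence the Bernstein-type bound $\mathbb{P}\{\lambda_{\max}(X) \ge t\} \le d \cdot \exp\big(-t^2/(2v + 2ct)\big)$.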


Polynomial Moment Inequalities

Polynomial Moments for Random Matrices

Theorem (Mackey, Jordan, Chen, Farrell, and Tropp 2012)

Let $p = 1$ or $p \ge 1.5$. Suppose that $(X, X')$ is a matrix Stein pair where $\mathbb{E}\,\|X\|_{2p}^{2p} < \infty$. Then

$$\big( \mathbb{E}\,\|X\|_{2p}^{2p} \big)^{1/(2p)} \le \sqrt{2p - 1} \cdot \big( \mathbb{E}\,\|\Delta_X\|_p^p \big)^{1/(2p)}$$

Moral: The conditional variance controls the moments of $X$

Generalizes Chatterjee's version (2007) of the scalar Burkholder-Davis-Gundy inequality (Burkholder 1973)

See also Pisier & Xu (1997), Junge & Xu (2003, 2008)

Proof techniques mirror those for exponential concentration

Also holds for infinite-dimensional Schatten-class operators


Application Matrix Khintchine Inequality

Corollary (Mackey, Jordan, Chen, Farrell, and Tropp 2012)

Let $(\varepsilon_k)_{k \ge 1}$ be an independent sequence of Rademacher random variables and $(A_k)_{k \ge 1}$ be a deterministic sequence of Hermitian matrices. Then, if $p = 1$ or $p \ge 1.5$,

$$\Big( \mathbb{E}\, \Big\| \sum_k \varepsilon_k A_k \Big\|_{2p}^{2p} \Big)^{1/(2p)} \le \sqrt{2p - 1} \cdot \Big\| \Big( \sum_k A_k^2 \Big)^{1/2} \Big\|_{2p}$$

Noncommutative Khintchine inequality (Lust-Piquard 1986; Lust-Piquard and Pisier 1991) is a dominant tool in applied matrix analysis

e.g., used in analysis of column sampling and projection for approximate SVD (Rudelson and Vershynin 2007)

Stein's method offers an unusually concise proof

The constant $\sqrt{2p - 1}$ is within $\sqrt{e}$ of optimal
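The corollary can be checked exactly for a short finite sequence of matrices by enumerating all sign patterns. The sketch below assumes real symmetric $A_k$ drawn at random purely for illustration; `schatten_norm` and `khintchine_sides` are hypothetical helper names, and the Schatten $q$-norm of a Hermitian matrix is computed as the $\ell_q$ norm of its eigenvalues.

```python
import itertools
import numpy as np

def schatten_norm(M, q):
    """Schatten q-norm of a Hermitian matrix: l_q norm of its eigenvalues."""
    lam = np.linalg.eigvalsh(M)
    return (np.abs(lam) ** q).sum() ** (1.0 / q)

def khintchine_sides(A, p):
    """Left and right sides of the matrix Khintchine inequality for a finite list A."""
    n = len(A)
    moments = [schatten_norm(sum(e * Ak for e, Ak in zip(signs, A)), 2 * p) ** (2 * p)
               for signs in itertools.product([-1, 1], repeat=n)]
    lhs = np.mean(moments) ** (1.0 / (2 * p))
    S = sum(Ak @ Ak for Ak in A)                      # sum_k A_k^2, positive semidefinite
    lam = np.clip(np.linalg.eigvalsh(S), 0.0, None)   # guard against tiny negative rounding
    # ||S^{1/2}||_{2p} = (sum_i lam_i^p)^{1/(2p)}
    rhs = np.sqrt(2 * p - 1) * (lam ** p).sum() ** (1.0 / (2 * p))
    return lhs, rhs

rng = np.random.default_rng(3)
A = [(G + G.T) / 2 for G in rng.standard_normal((5, 3, 3))]
results = {p: khintchine_sides(A, p) for p in (1, 1.5, 2, 3)}
for p, (lhs, rhs) in results.items():
    print(p, lhs, rhs)
```

For $p = 1$ the two sides coincide, since $\mathbb{E}\operatorname{tr}(\sum_k \varepsilon_k A_k)^2 = \operatorname{tr}\sum_k A_k^2$; for larger $p$ the right side should dominate.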


Extensions


Refined Exponential Concentration

Relate trace mgf of conditional variance to trace mgf of $X$

Yields matrix generalization of classical Bernstein inequality

Offers tool for unbounded random matrices

General Complex Matrices

Map any matrix $B \in \mathbb{C}^{d_1 \times d_2}$ to a Hermitian matrix via dilation

$$D(B) = \begin{bmatrix} 0 & B \\ B^* & 0 \end{bmatrix} \in \mathbb{H}^{d_1 + d_2}$$

Preserves spectral information: $\lambda_{\max}(D(B)) = \|B\|$

Dependent Sequences

Sums of conditionally zero-mean random matrices

Combinatorial matrix statistics (e.g., sampling w/o replacement)

Matrix-valued functions satisfying a self-reproducing property

Yields a dependent bounded differences inequality for matrices
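The Hermitian dilation described under General Complex Matrices is simple to build and test. The following is a minimal sketch using NumPy; `hermitian_dilation` is a hypothetical helper name, and the random rectangular matrix `B` is an arbitrary example.

```python
import numpy as np

def hermitian_dilation(B):
    """Return D(B) = [[0, B], [B^*, 0]], a Hermitian matrix of size d1 + d2."""
    d1, d2 = B.shape
    return np.block([[np.zeros((d1, d1)), B],
                     [B.conj().T, np.zeros((d2, d2))]])

rng = np.random.default_rng(4)
B = rng.standard_normal((2, 3)) + 1j * rng.standard_normal((2, 3))
D = hermitian_dilation(B)

lam_max = np.linalg.eigvalsh(D)[-1]     # largest eigenvalue of the dilation
spec_norm = np.linalg.norm(B, 2)        # spectral norm (largest singular value) of B
print(lam_max, spec_norm)
```

The nonzero eigenvalues of $D(B)$ are plus and minus the singular values of $B$, which is why the largest eigenvalue matches the spectral norm.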


The End

Thanks


References I

Ahlswede, R. and Winter, A. Strong converse for identification via quantum channels. IEEE Trans. Inform. Theory, 48(3):569–579, Mar. 2002.

Burkholder, D. L. Distribution function inequalities for martingales. Ann. Probab., 1:19–42, 1973. doi:10.1214/aop/1176997023.

Chatterjee, S. Stein's method for concentration inequalities. Probab. Theory Related Fields, 138:305–321, 2007.

Cheung, S.-S., So, A. Man-Cho, and Wang, K. Chance-constrained linear matrix inequalities with dependent perturbations: A safe tractable approximation approach. Available at http://www.se.cuhk.edu.hk/~manchoso/papers/cclmi_sta.pdf, 2011.

Christofides, D. and Markstrom, K. Expansion properties of random Cayley graphs and vertex transitive graphs via matrix martingales. Random Struct. Algorithms, 32(1):88–100, 2008.

Drineas, P., Mahoney, M. W., and Muthukrishnan, S. Relative-error CUR matrix decompositions. SIAM Journal on Matrix Analysis and Applications, 30:844–881, 2008.

Gross, D. Recovering low-rank matrices from few coefficients in any basis. IEEE Trans. Inform. Theory, 57(3):1548–1566, Mar. 2011.

Hoeffding, W. Probability inequalities for sums of bounded random variables. Journal of the American Statistical Association, 58(301):13–30, 1963.

Hsu, D., Kakade, S. M., and Zhang, T. Dimension-free tail inequalities for sums of random matrices. Available at arXiv:1104.1672, 2011a.

Hsu, D., Kakade, S. M., and Zhang, T. Dimension-free tail inequalities for sums of random matrices. arXiv:1104.1672v3 [math.PR], 2011b.

Junge, M. and Xu, Q. Noncommutative Burkholder/Rosenthal inequalities. Ann. Probab., 31(2):948–995, 2003.

Junge, M. and Xu, Q. Noncommutative Burkholder/Rosenthal inequalities II: Applications. Israel J. Math., 167:227–282, 2008.

Lust-Piquard, F. Inegalites de Khintchine dans C_p (1 < p < ∞). C. R. Math. Acad. Sci. Paris, 303(7):289–292, 1986.

Lust-Piquard, F. and Pisier, G. Noncommutative Khintchine and Paley inequalities. Ark. Mat., 29(2):241–260, 1991.

Mackey, L., Talwalkar, A., and Jordan, M. I. Divide-and-conquer matrix factorization. In Advances in Neural Information Processing Systems 24, 2011.


References II

Mackey, L., Jordan, M. I., Chen, R. Y., Farrell, B., and Tropp, J. A. Matrix concentration inequalities via the method of exchangeable pairs. Available at arXiv, Jan. 2012.

Nemirovski, A. Sums of random symmetric matrices and quadratic optimization under orthogonality constraints. Math. Program., 109:283–317, January 2007. ISSN 0025-5610. doi:10.1007/s10107-006-0033-0. URL http://dl.acm.org/citation.cfm?id=1229716.1229726.

Oliveira, R. I. Concentration of the adjacency matrix and of the Laplacian in random graphs with independent edges. Available at arXiv:0911.0600, Nov. 2009.

Pisier, G. and Xu, Q. Non-commutative martingale inequalities. Comm. Math. Phys., 189(3):667–698, 1997.

Recht, B. A simpler approach to matrix completion. arXiv:0910.0651v2 [cs.IT], 2009.

Rudelson, M. and Vershynin, R. Sampling from large matrices: An approach through geometric functional analysis. J. Assoc. Comput. Mach., 54(4):Article 21, 19 pp., Jul. 2007 (electronic).

So, A. Man-Cho. Moment inequalities for sums of random matrices and their applications in optimization. Math. Program., 130(1):125–151, 2011.

Stein, C. A bound for the error in the normal approximation to the distribution of a sum of dependent random variables. In Proc. 6th Berkeley Symp. Math. Statist. Probab., Berkeley, 1972. Univ. California Press.

Tropp, J. A. User-friendly tail bounds for sums of random matrices. Found. Comput. Math., August 2011.

Page 13: Stein's Method for Matrix Concentration · Sums of independent matrices and matrix martingales This work Stein’s method of exchangeable pairs (1972), as advanced by Chatterjee (2007)

Exponential Tail Inequalities

Mean Value Trace Inequality

Lemma (Mackey Jordan Chen Farrell and Tropp 2012)

Suppose that g R rarr R is a weakly increasing function and thath R rarr R is a function whose derivative hprime is convex For allmatrices AB isin Hd it holds that

tr[(g(A)minus g(B)) middot (h(A)minus h(B))] le1

2tr[(g(A)minus g(B)) middot (AminusB) middot (hprime(A) + hprime(B))]

Standard matrix functions If g R rarr R then

g(A) = Q

g(λ1)

g(λd)

Qlowast when A = Q

λ1

λd

Qlowast

Inequality does not hold without the trace

For exponential concentration we let g(A) = A and h(B) = eθB

Mackey (UC Berkeley) Steinrsquos Method BEARS 2012 13 20

Exponential Concentration: Proof Sketch

3. Mean Value Trace Inequality

Bound the derivative of the trace mgf

$$m'(\theta) = \frac{1}{2\alpha}\, \mathbb{E} \operatorname{tr}\Bigl[(X - X')\bigl(e^{\theta X} - e^{\theta X'}\bigr)\Bigr] \le \frac{\theta}{4\alpha}\, \mathbb{E} \operatorname{tr}\Bigl[(X - X')^2 \cdot \bigl(e^{\theta X} + e^{\theta X'}\bigr)\Bigr] = \theta \cdot \mathbb{E} \operatorname{tr}\bigl[\Delta_X\, e^{\theta X}\bigr]$$

4. Conditional Variance Bound: $\Delta_X \preccurlyeq cX + vI$

Yields the differential inequality

$$m'(\theta) \le c\theta \cdot m'(\theta) + v\theta \cdot m(\theta)$$

Solve to bound $m(\theta)$ and thereby bound $\mathbb{P}\{\lambda_{\max}(X) \ge t\}$
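One way the final "solve" step might be carried out is sketched below. It assumes that the trace mgf is $m(\theta) = \mathbb{E} \operatorname{tr} e^{\theta X}$ for $X \in \mathbb{H}^d$ (so that $m(0) = d$), that $c \ge 0$ and $v > 0$, and that $0 \le \theta < 1/c$ (any $\theta \ge 0$ when $c = 0$). The last line uses the Markov-type bound $\mathbb{P}\{\lambda_{\max}(X) \ge t\} \le e^{-\theta t}\, m(\theta)$ with the particular choice $\theta = t/(v + ct)$; none of these details appear on the slide.

```latex
\begin{align*}
(1 - c\theta)\, m'(\theta) &\le v\theta\, m(\theta)
  \quad\Longrightarrow\quad
  \frac{d}{d\theta} \log m(\theta) \le \frac{v\theta}{1 - c\theta}, \\
\log m(\theta) &\le \log d + \int_0^\theta \frac{v s}{1 - c s}\, ds
  \le \log d + \frac{v\theta^2}{2(1 - c\theta)}, \\
\mathbb{P}\{\lambda_{\max}(X) \ge t\}
  &\le d \cdot \exp\!\left(-\theta t + \frac{v\theta^2}{2(1 - c\theta)}\right)\Bigg|_{\theta = t/(v + ct)}
  = d \cdot \exp\!\left(\frac{-t^2}{2v + 2ct}\right).
\end{align*}
```

The second integral bound uses $1/(1 - cs) \le 1/(1 - c\theta)$ for $0 \le s \le \theta$. The resulting tail has the familiar Bernstein shape: Gaussian decay governed by $v$ for small $t$ and exponential decay governed by $c$ for large $t$.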

Polynomial Moment Inequalities

Polynomial Moments for Random Matrices

Theorem (Mackey, Jordan, Chen, Farrell, and Tropp, 2012)

Let $p = 1$ or $p \ge 1.5$. Suppose that $(X, X')$ is a matrix Stein pair where $\mathbb{E}\|X\|_{2p}^{2p} < \infty$. Then

$$\left(\mathbb{E}\|X\|_{2p}^{2p}\right)^{1/2p} \le \sqrt{2p - 1} \cdot \left(\mathbb{E}\|\Delta_X\|_p^p\right)^{1/2p}$$

Moral: The conditional variance controls the moments of $X$

Generalizes Chatterjee's version (2007) of the scalar Burkholder-Davis-Gundy inequality (Burkholder, 1973)

See also Pisier & Xu (1997); Junge & Xu (2003, 2008)

Proof techniques mirror those for exponential concentration

Also holds for infinite-dimensional Schatten-class operators

Application: Matrix Khintchine Inequality

Corollary (Mackey, Jordan, Chen, Farrell, and Tropp, 2012)

Let $(\varepsilon_k)_{k \ge 1}$ be an independent sequence of Rademacher random variables and $(A_k)_{k \ge 1}$ be a deterministic sequence of Hermitian matrices. Then if $p = 1$ or $p \ge 1.5$,

$$\left(\mathbb{E}\Bigl\|\sum\nolimits_k \varepsilon_k A_k\Bigr\|_{2p}^{2p}\right)^{1/2p} \le \sqrt{2p - 1} \cdot \Bigl\|\Bigl(\sum\nolimits_k A_k^2\Bigr)^{1/2}\Bigr\|_{2p}$$

The noncommutative Khintchine inequality (Lust-Piquard, 1986; Lust-Piquard and Pisier, 1991) is a dominant tool in applied matrix analysis

e.g., used in the analysis of column sampling and projection for approximate SVD (Rudelson and Vershynin, 2007)

Stein's method offers an unusually concise proof

The constant $\sqrt{2p - 1}$ is within $\sqrt{e}$ of optimal
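The corollary can be checked numerically on a small example. The sketch below is not from the talk: it assumes integer $p$ (so that $\|M\|_{2p}^{2p} = \operatorname{tr}(M^{2p})$ for Hermitian $M$ and no eigendecomposition is needed), uses three hypothetical $2 \times 2$ real symmetric matrices, and computes the expectation exactly by enumerating all sign patterns. The helper names `matmul`, `trace_power`, and `khintchine_sides` are illustrative.

```python
from itertools import product


def matmul(a, b):
    """Multiply two square matrices stored as lists of rows."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def trace_power(a, n):
    """Return tr(a^n) for an integer n >= 1."""
    result = a
    for _ in range(n - 1):
        result = matmul(result, a)
    return sum(result[i][i] for i in range(len(a)))


def khintchine_sides(mats, p):
    """Return (lhs, rhs) of the matrix Khintchine bound for integer p >= 1."""
    d = len(mats[0])
    # Left side: exact expectation over all 2^n Rademacher sign patterns.
    total = 0.0
    for signs in product((-1, 1), repeat=len(mats)):
        s = [[sum(e * a[i][j] for e, a in zip(signs, mats)) for j in range(d)]
             for i in range(d)]
        total += trace_power(s, 2 * p)
    lhs = (total / 2 ** len(mats)) ** (1 / (2 * p))
    # Right side: ||(sum_k A_k^2)^{1/2}||_{2p} = tr((sum_k A_k^2)^p)^{1/(2p)}.
    squares = [matmul(a, a) for a in mats]
    sum_sq = [[sum(sq[i][j] for sq in squares) for j in range(d)]
              for i in range(d)]
    rhs = (2 * p - 1) ** 0.5 * trace_power(sum_sq, p) ** (1 / (2 * p))
    return lhs, rhs


# Hypothetical 2x2 real symmetric test matrices (assumed for illustration).
A = [
    [[1, 0], [0, -1]],
    [[0, 1], [1, 0]],
    [[1, 1], [1, 0]],
]

for p in (1, 2, 3):
    lhs, rhs = khintchine_sides(A, p)
    print(f"p={p}: lhs={lhs:.4f}, rhs={rhs:.4f}, bound holds: {lhs <= rhs + 1e-9}")
```

For $p = 1$ the two sides coincide, because the cross terms $\varepsilon_j \varepsilon_k \operatorname{tr}(A_j A_k)$ average to zero; the constant $\sqrt{2p - 1}$ only matters for larger $p$.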

Extensions

Refined Exponential Concentration

  • Relate trace mgf of conditional variance to trace mgf of $X$
  • Yields matrix generalization of classical Bernstein inequality
  • Offers tool for unbounded random matrices

General Complex Matrices

  • Map any matrix $B \in \mathbb{C}^{d_1 \times d_2}$ to a Hermitian matrix via dilation

$$\mathcal{D}(B) = \begin{bmatrix} 0 & B \\ B^* & 0 \end{bmatrix} \in \mathbb{H}^{d_1 + d_2}$$

  • Preserves spectral information: $\lambda_{\max}(\mathcal{D}(B)) = \|B\|$

Dependent Sequences

  • Sums of conditionally zero-mean random matrices
  • Combinatorial matrix statistics (e.g., sampling without replacement)
  • Matrix-valued functions satisfying a self-reproducing property
  • Yields a dependent bounded differences inequality for matrices
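The dilation can be illustrated with a short sketch. The code below is an assumption-laden example rather than anything from the talk: it builds $\mathcal{D}(B)$ for a hypothetical $2 \times 3$ complex matrix using plain Python lists, then checks that $\mathcal{D}(B)$ is Hermitian and that $\mathcal{D}(B)^2$ is block diagonal with blocks $BB^*$ and $B^*B$. That block structure is why the nonzero eigenvalues of $\mathcal{D}(B)$ are plus and minus the singular values of $B$, which is consistent with $\lambda_{\max}(\mathcal{D}(B)) = \|B\|$.

```python
def conj_transpose(b):
    """Return the conjugate transpose B* of a matrix stored as a list of rows."""
    return [[b[i][j].conjugate() for i in range(len(b))]
            for j in range(len(b[0]))]


def matmul(a, b):
    """Multiply an (n x m) matrix by an (m x p) matrix."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))]
            for i in range(len(a))]


def dilation(b):
    """Build the Hermitian dilation D(B) = [[0, B], [B*, 0]] of a d1 x d2 matrix."""
    d1, d2 = len(b), len(b[0])
    b_star = conj_transpose(b)
    top = [[0j] * d1 + list(row) for row in b]           # [0, B]
    bottom = [list(row) + [0j] * d2 for row in b_star]   # [B*, 0]
    return top + bottom


# Hypothetical 2 x 3 complex example with integer-valued entries,
# so that every product below is exact in floating point.
B = [[1 + 2j, 0j, 3 + 0j],
     [0 - 1j, 2 + 0j, 1 + 1j]]
d1 = len(B)
D = dilation(B)
D2 = matmul(D, D)

is_hermitian = D == conj_transpose(D)
top_left = [row[:d1] for row in D2[:d1]]       # should equal B B*
bottom_right = [row[d1:] for row in D2[d1:]]   # should equal B* B
off_diagonal_zero = all(x == 0 for row in D2[:d1] for x in row[d1:])

print("Hermitian:", is_hermitian)
print("D^2 top-left equals B B*:", top_left == matmul(B, conj_transpose(B)))
print("D^2 bottom-right equals B* B:", bottom_right == matmul(conj_transpose(B), B))
print("D^2 off-diagonal block is zero:", off_diagonal_zero)
```

Because the dilation is linear in $B$, a sum of rectangular random matrices maps to a sum of Hermitian ones, which is what lets the Hermitian results be applied to general matrices.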


The End

Thanks


References I

Ahlswede, R. and Winter, A. Strong converse for identification via quantum channels. IEEE Trans. Inform. Theory, 48(3):569–579, Mar. 2002.

Burkholder, D. L. Distribution function inequalities for martingales. Ann. Probab., 1:19–42, 1973. doi:10.1214/aop/1176997023.

Chatterjee, S. Stein's method for concentration inequalities. Probab. Theory Related Fields, 138:305–321, 2007.

Cheung, S.-S., So, A. Man-Cho, and Wang, K. Chance-constrained linear matrix inequalities with dependent perturbations: A safe tractable approximation approach. Available at http://www.se.cuhk.edu.hk/~manchoso/papers/cclmi_sta.pdf, 2011.

Christofides, D. and Markström, K. Expansion properties of random Cayley graphs and vertex transitive graphs via matrix martingales. Random Struct. Algorithms, 32(1):88–100, 2008.

Drineas, P., Mahoney, M. W., and Muthukrishnan, S. Relative-error CUR matrix decompositions. SIAM Journal on Matrix Analysis and Applications, 30:844–881, 2008.

Gross, D. Recovering low-rank matrices from few coefficients in any basis. IEEE Trans. Inform. Theory, 57(3):1548–1566, Mar. 2011.

Hoeffding, W. Probability inequalities for sums of bounded random variables. Journal of the American Statistical Association, 58(301):13–30, 1963.

Hsu, D., Kakade, S. M., and Zhang, T. Dimension-free tail inequalities for sums of random matrices. Available at arXiv:1104.1672, 2011a.

Hsu, D., Kakade, S. M., and Zhang, T. Dimension-free tail inequalities for sums of random matrices. arXiv:1104.1672v3 [math.PR], 2011b.

Junge, M. and Xu, Q. Noncommutative Burkholder/Rosenthal inequalities. Ann. Probab., 31(2):948–995, 2003.

Junge, M. and Xu, Q. Noncommutative Burkholder/Rosenthal inequalities II: Applications. Israel J. Math., 167:227–282, 2008.

Lust-Piquard, F. Inégalités de Khintchine dans $C_p$ ($1 < p < \infty$). C. R. Math. Acad. Sci. Paris, 303(7):289–292, 1986.

Lust-Piquard, F. and Pisier, G. Noncommutative Khintchine and Paley inequalities. Ark. Mat., 29(2):241–260, 1991.

Mackey, L., Talwalkar, A., and Jordan, M. I. Divide-and-conquer matrix factorization. In Advances in Neural Information Processing Systems 24, 2011.

References II

Mackey, L., Jordan, M. I., Chen, R. Y., Farrell, B., and Tropp, J. A. Matrix concentration inequalities via the method of exchangeable pairs. Available at arXiv, Jan. 2012.

Nemirovski, A. Sums of random symmetric matrices and quadratic optimization under orthogonality constraints. Math. Program., 109:283–317, January 2007. ISSN 0025-5610. doi:10.1007/s10107-006-0033-0. URL http://dl.acm.org/citation.cfm?id=1229716.1229726.

Oliveira, R. I. Concentration of the adjacency matrix and of the Laplacian in random graphs with independent edges. Available at arXiv:0911.0600, Nov. 2009.

Pisier, G. and Xu, Q. Non-commutative martingale inequalities. Comm. Math. Phys., 189(3):667–698, 1997.

Recht, B. A simpler approach to matrix completion. arXiv:0910.0651v2 [cs.IT], 2009.

Rudelson, M. and Vershynin, R. Sampling from large matrices: An approach through geometric functional analysis. J. Assoc. Comput. Mach., 54(4):Article 21, 19 pp., Jul. 2007 (electronic).

So, A. Man-Cho. Moment inequalities for sums of random matrices and their applications in optimization. Math. Program., 130(1):125–151, 2011.

Stein, C. A bound for the error in the normal approximation to the distribution of a sum of dependent random variables. In Proc. 6th Berkeley Symp. Math. Statist. Probab., Berkeley, 1972. Univ. California Press.

Tropp, J. A. User-friendly tail bounds for sums of random matrices. Found. Comput. Math., August 2011.





  • Motivation
  • Steins Method Background and Notation
  • Exponential Tail Inequalities
  • Polynomial Moment Inequalities
  • Extensions
Page 18: Stein's Method for Matrix Concentration · Sums of independent matrices and matrix martingales This work Stein’s method of exchangeable pairs (1972), as advanced by Chatterjee (2007)

Extensions

The End

Thanks

Mackey (UC Berkeley) Steinrsquos Method BEARS 2012 18 20

Extensions

References IAhlswede R and Winter A Strong converse for identification via quantum channels IEEE Trans Inform Theory 48(3)

569ndash579 Mar 2002

Burkholder D L Distribution function inequalities for martingales Ann Probab 119ndash42 1973 doi101214aop1176997023

Chatterjee S Steinrsquos method for concentration inequalities Probab Theory Related Fields 138305ndash321 2007

Cheung S-S So A Man-Cho and Wang K Chance-constrained linear matrix inequalities with dependent perturbations Asafe tractable approximation approach Available at httpwwwsecuhkeduhk~manchosopaperscclmi_stapdf 2011

Christofides D and Markstrom K Expansion properties of random cayley graphs and vertex transitive graphs via matrixmartingales Random Struct Algorithms 32(1)88ndash100 2008

Drineas P Mahoney M W and Muthukrishnan S Relative-error CUR matrix decompositions SIAM Journal on Matrix

Analysis and Applications 30844ndash881 2008

Gross D Recovering low-rank matrices from few coefficients in any basis IEEE Trans Inform Theory 57(3)1548ndash1566 Mar2011

Hoeffding W Probability inequalities for sums of bounded random variables Journal of the American Statistical Association58(301)13ndash30 1963

Hsu D Kakade S M and Zhang T Dimension-free tail inequalities for sums of random matrices Available atarXiv11041672 2011a

Hsu D Kakade S M and Zhang T Dimension-free tail inequalities for sums of random matricesarXiv11041672v3[mathPR] 2011b

Junge M and Xu Q Noncommutative BurkholderRosenthal inequalities Ann Probab 31(2)948ndash995 2003

Junge M and Xu Q Noncommutative BurkholderRosenthal inequalities II Applications Israel J Math 167227ndash282 2008

Lust-Piquard F Inegalites de Khintchine dans Cp (1 lt p lt infin) C R Math Acad Sci Paris 303(7)289ndash292 1986

Lust-Piquard F and Pisier G Noncommutative Khintchine and Paley inequalities Ark Mat 29(2)241ndash260 1991

Mackey L Talwalkar A and Jordan M I Divide-and-conquer matrix factorization In Advances in Neural Information

Processing Systems 24 2011

Mackey (UC Berkeley) Steinrsquos Method BEARS 2012 19 20

Extensions

References II

Mackey L Jordan M I Chen R Y Farrell B and Tropp J A Matrix concentration inequalities via the method ofexchangeable pairs Available at arXiv Jan 2012

Nemirovski A Sums of random symmetric matrices and quadratic optimization under orthogonality constraints Math

Program 109283ndash317 January 2007 ISSN 0025-5610 doi 101007s10107-006-0033-0 URLhttpdlacmorgcitationcfmid=12297161229726

Oliveira R I Concentration of the adjacency matrix and of the Laplacian in random graphs with independent edges Availableat arXiv09110600 Nov 2009

Pisier G and Xu Q Non-commutative martingale inequalities Comm Math Phys 189(3)667ndash698 1997

Recht B A simpler approach to matrix completion arXiv09100651v2[csIT] 2009

Rudelson, M. and Vershynin, R. Sampling from large matrices: An approach through geometric functional analysis. J. Assoc. Comput. Mach., 54(4):Article 21, 19 pp., Jul. 2007 (electronic).

So, A. Man-Cho. Moment inequalities for sums of random matrices and their applications in optimization. Math. Program., 130(1):125–151, 2011.

Stein, C. A bound for the error in the normal approximation to the distribution of a sum of dependent random variables. In Proc. 6th Berkeley Symp. Math. Statist. Probab., Berkeley, 1972. Univ. California Press.

Tropp, J. A. User-friendly tail bounds for sums of random matrices. Found. Comput. Math., August 2011.

  • Motivation
  • Stein's Method: Background and Notation
  • Exponential Tail Inequalities
  • Polynomial Moment Inequalities
  • Extensions