Concentration inequalities and tail bounds
John Duchi
Outline
I. Basics and motivation
  1. Law of large numbers
  2. Markov inequality
  3. Chernoff bounds
II. Sub-Gaussian random variables
  1. Definitions
  2. Examples
  3. Hoeffding inequalities
III. Sub-exponential random variables
  1. Definitions
  2. Examples
  3. Chernoff/Bernstein bounds
Motivation
▶ Often in this class, the goal is to argue that a sequence of random variables (or vectors) X_1, X_2, ... satisfies
  \frac{1}{n} \sum_{i=1}^n X_i \overset{p}{\to} E[X].
▶ Law of large numbers: if E[\|X\|] < \infty, then
  P\left( \lim_{n \to \infty} \frac{1}{n} \sum_{i=1}^n X_i \neq E[X] \right) = 0.
Markov inequalities
Theorem (Markov's inequality)
Let X be a non-negative random variable. Then
  P(X \geq t) \leq \frac{E[X]}{t}.
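A minimal numerical sketch of the inequality (the exponential distribution, sample size, and thresholds here are arbitrary illustrative choices):

import numpy as np

# Monte Carlo sanity check of Markov's inequality for a non-negative
# variable: X ~ Exponential(1), so E[X] = 1 and P(X >= t) = e^{-t}.
rng = np.random.default_rng(0)
x = rng.exponential(scale=1.0, size=1_000_000)

for t in [1.0, 2.0, 5.0]:
    empirical = np.mean(x >= t)   # estimated P(X >= t)
    bound = np.mean(x) / t        # Markov bound E[X] / t
    print(f"t={t}: P(X >= t) ~ {empirical:.4f} <= bound {bound:.4f}")

The bound is loose here (it decays like 1/t while the true tail decays exponentially), which is exactly the gap the Chernoff technique below closes.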
Chebyshev inequalities
Theorem (Chebyshev's inequality)
Let X be a real-valued random variable with E[X^2] < \infty. Then
  P(|X - E[X]| \geq t) \leq \frac{E[(X - E[X])^2]}{t^2} = \frac{Var(X)}{t^2}.
Example: i.i.d. sampling
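One way to make the i.i.d. sampling example concrete (a sketch; the uniform distribution and the values of n and t are assumptions of this illustration): the variance of a sample mean of n i.i.d. draws is Var(X)/n, so Chebyshev gives P(|\bar{X}_n - \mu| \geq t) \leq Var(X)/(n t^2).

import numpy as np

# Chebyshev's inequality applied to an i.i.d. sample mean.
rng = np.random.default_rng(1)
n, reps = 100, 50_000
samples = rng.uniform(0.0, 1.0, size=(reps, n))  # X_i ~ Unif[0, 1]
means = samples.mean(axis=1)                     # one sample mean per replication
mu, var = 0.5, 1.0 / 12.0                        # mean and variance of Unif[0, 1]

for t in [0.05, 0.1]:
    empirical = np.mean(np.abs(means - mu) >= t)
    bound = var / (n * t ** 2)
    print(f"t={t}: P(|mean - mu| >= t) ~ {empirical:.4f} <= bound {bound:.4f}")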
Chernoff bounds
Moment generating function: for a random variable X, the MGF is
  M_X(\lambda) := E[e^{\lambda X}]
Example: Normally distributed random variables
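For the normal example the MGF has a closed form; completing the square in the Gaussian integral gives, for X \sim N(\mu, \sigma^2),
  M_X(\lambda) = \exp\left( \lambda \mu + \frac{\lambda^2 \sigma^2}{2} \right) for all \lambda \in R.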
Chernoff bounds
Theorem (Chernoff bound)
For any random variable X and t \geq 0,
  P(X - E[X] \geq t) \leq \inf_{\lambda \geq 0} M_{X - E[X]}(\lambda) e^{-\lambda t} = \inf_{\lambda \geq 0} E[e^{\lambda (X - E[X])}] e^{-\lambda t}.
(This is Markov's inequality applied to the non-negative variable e^{\lambda (X - E[X])}.)
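A short sketch evaluating the Chernoff bound numerically in the Gaussian case, using the closed-form MGF above (the value of t and the grid are illustrative choices):

import numpy as np

# For X ~ N(0, 1), M_X(lambda) = exp(lambda^2 / 2), so the Chernoff bound is
# inf over lambda >= 0 of exp(lambda^2 / 2 - lambda * t); the minimizer is
# lambda = t, giving exp(-t^2 / 2).
t = 2.0
lambdas = np.linspace(0.0, 10.0, 10_001)
bounds = np.exp(lambdas ** 2 / 2.0 - lambdas * t)
best = lambdas[np.argmin(bounds)]
print(f"argmin lambda ~ {best:.3f} (theory: {t})")
print(f"bound ~ {bounds.min():.4f} (theory: {np.exp(-t ** 2 / 2):.4f})")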
Sub-Gaussian random variables
Definition (Sub-Gaussianity)
A mean-zero random variable X is \sigma^2-sub-Gaussian if
  E\left[ e^{\lambda X} \right] \leq \exp\left( \frac{\lambda^2 \sigma^2}{2} \right) for all \lambda \in R
Example: X \sim N(0, \sigma^2)
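By the Gaussian MGF computed earlier, X \sim N(0, \sigma^2) satisfies E[e^{\lambda X}] = \exp(\lambda^2 \sigma^2 / 2) exactly, so the defining inequality holds with equality: Gaussians are the extremal case, and sub-Gaussianity asks that the MGF grow no faster than this.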
Properties of sub-Gaussians
Proposition (sums of sub-Gaussians)
Let X_i be independent, mean-zero \sigma_i^2-sub-Gaussian. Then \sum_{i=1}^n X_i is \sum_{i=1}^n \sigma_i^2-sub-Gaussian.
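The proof is one line, because independence factorizes the MGF:
  E\left[ e^{\lambda \sum_{i=1}^n X_i} \right] = \prod_{i=1}^n E\left[ e^{\lambda X_i} \right] \leq \prod_{i=1}^n \exp\left( \frac{\lambda^2 \sigma_i^2}{2} \right) = \exp\left( \frac{\lambda^2 \sum_{i=1}^n \sigma_i^2}{2} \right).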
Concentration inequalities
Theorem
Let X be \sigma^2-sub-Gaussian. Then for t \geq 0,
  P(X - E[X] \geq t) \leq \exp\left( -\frac{t^2}{2 \sigma^2} \right)
  P(X - E[X] \leq -t) \leq \exp\left( -\frac{t^2}{2 \sigma^2} \right)
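To see this, combine the Chernoff bound with the sub-Gaussian MGF bound and optimize:
  P(X - E[X] \geq t) \leq \inf_{\lambda \geq 0} \exp\left( \frac{\lambda^2 \sigma^2}{2} - \lambda t \right) = \exp\left( -\frac{t^2}{2 \sigma^2} \right),
with the infimum attained at \lambda = t / \sigma^2; the lower tail follows by applying the same argument to -X.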
Concentration: convergence of an independent sum
Corollary
Let X_i be independent \sigma_i^2-sub-Gaussian. Then for t \geq 0,
  P\left( \frac{1}{n} \sum_{i=1}^n X_i \geq t \right) \leq \exp\left( -\frac{n t^2}{2 \cdot \frac{1}{n} \sum_{i=1}^n \sigma_i^2} \right)
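This is the theorem applied to \frac{1}{n} \sum_{i=1}^n X_i: scaling a \sigma^2-sub-Gaussian variable by c makes it c^2 \sigma^2-sub-Gaussian, so the average is \frac{1}{n^2} \sum_{i=1}^n \sigma_i^2-sub-Gaussian, and the exponent becomes t^2 n^2 / (2 \sum_i \sigma_i^2) = n t^2 / (2 \cdot \frac{1}{n} \sum_i \sigma_i^2).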
Example: bounded random variables
Proposition
Let X \in [a, b], with E[X] = 0. Then
  E[e^{\lambda X}] \leq e^{\lambda^2 (b - a)^2 / 8}.
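A quick numerical check of this bound on the simplest bounded variable (a Rademacher sign, an example we add here): with X uniform on \{-1, +1\}, [a, b] = [-1, 1] and the bound reads \cosh(\lambda) \leq e^{\lambda^2 / 2}.

import numpy as np

# Verify E[exp(lam * X)] = cosh(lam) <= exp(lam^2 * (b - a)^2 / 8) = exp(lam^2 / 2)
# for X uniform on {-1, +1}, over a grid of lambda values.
lams = np.linspace(-5.0, 5.0, 1001)
mgf = np.cosh(lams)
bound = np.exp(lams ** 2 / 2.0)
print("bound holds everywhere on the grid:", bool(np.all(mgf <= bound)))  # True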
Maxima of sub-Gaussian random variables (in expectation)
For X_1, ..., X_n each \sigma^2-sub-Gaussian,
  E\left[ \max_{j \leq n} X_j \right] \leq \sqrt{2 \sigma^2 \log n}
Maxima of sub-Gaussian random variables (in probability)
For X_1, ..., X_n each \sigma^2-sub-Gaussian,
  P\left( \max_{j \leq n} X_j \geq \sqrt{2 \sigma^2 (\log n + t)} \right) \leq e^{-t}.
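A simulation of the expectation bound for standard normals (\sigma^2 = 1; the sample sizes and replication count are illustrative choices):

import numpy as np

# Compare the Monte Carlo estimate of E[max of n i.i.d. N(0, 1)] with the
# sub-Gaussian bound sqrt(2 * log n).
rng = np.random.default_rng(2)
for n in [10, 100, 1000]:
    z = rng.standard_normal(size=(5_000, n))
    emp = z.max(axis=1).mean()
    bnd = np.sqrt(2 * np.log(n))
    print(f"n={n}: E[max] ~ {emp:.3f} <= {bnd:.3f}")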
Hoeffding's inequality
If the X_i are independent and bounded in [a_i, b_i], then for t \geq 0,
  P\left( \frac{1}{n} \sum_{i=1}^n (X_i - E[X_i]) \geq t \right) \leq \exp\left( -\frac{2 n t^2}{\frac{1}{n} \sum_{i=1}^n (b_i - a_i)^2} \right)
  P\left( \frac{1}{n} \sum_{i=1}^n (X_i - E[X_i]) \leq -t \right) \leq \exp\left( -\frac{2 n t^2}{\frac{1}{n} \sum_{i=1}^n (b_i - a_i)^2} \right).
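A sketch instantiating the upper-tail bound for fair coins (our choice of distribution and constants): with X_i \sim Bernoulli(1/2) and [a_i, b_i] = [0, 1], the bound is P(\bar{X}_n - 1/2 \geq t) \leq e^{-2 n t^2}.

import numpy as np

# Empirical tail of the mean of n fair coins vs. the Hoeffding bound.
rng = np.random.default_rng(3)
n, reps = 200, 50_000
coins = rng.integers(0, 2, size=(reps, n))  # X_i in {0, 1}
dev = coins.mean(axis=1) - 0.5

for t in [0.05, 0.1]:
    empirical = np.mean(dev >= t)
    bound = np.exp(-2 * n * t ** 2)
    print(f"t={t}: P(mean - 1/2 >= t) ~ {empirical:.5f} <= bound {bound:.5f}")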
Equivalent definitions of sub-Gaussianity
Theorem
The following are equivalent (up to constants)
i. E[\exp(X^2 / \sigma^2)] \leq e
ii. E[|X|^k]^{1/k} \leq \sigma \sqrt{k} for all k \geq 1
iii. P(|X| \geq t) \leq \exp\left( -\frac{t^2}{2 \sigma^2} \right)
If in addition X is mean-zero, then i–iii are also equivalent to
iv. X is \sigma^2-sub-Gaussian
Sub-exponential random variables
Definition (Sub-exponential)
A mean-zero random variable X is (\tau^2, b)-sub-exponential if
  E[\exp(\lambda X)] \leq \exp\left( \frac{\lambda^2 \tau^2}{2} \right) for |\lambda| \leq \frac{1}{b}.
Example: exponential random variable, density p(x) = \lambda e^{-\lambda x} for x \geq 0
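The exponential example shows why the restriction |\lambda| \leq 1/b is needed (a standard computation; we write s for the MGF argument to avoid a clash with the rate \lambda):
  E[e^{sX}] = \frac{\lambda}{\lambda - s} for s < \lambda, and E[e^{sX}] = +\infty for s \geq \lambda,
so no bound of sub-Gaussian type can hold for all s; after centering by E[X] = 1/\lambda, a bound of the form \exp(s^2 \tau^2 / 2) is available only on a bounded interval of s, exactly as the definition permits.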
Sub-exponential random variables
Example: \chi^2 random variable. Let Z \sim N(0, \sigma^2) and X = Z^2. Then
  E[e^{\lambda X}] = \frac{1}{[1 - 2 \lambda \sigma^2]_+^{1/2}}.
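A Monte Carlo check of this closed form (our illustration; we keep \lambda well below 1/(2\sigma^2) so the empirical average is stable):

import numpy as np

# For Z ~ N(0, sigma^2) and X = Z^2, check E[exp(lam * X)] against
# (1 - 2 * lam * sigma^2)^(-1/2), which is finite when 2 * lam * sigma^2 < 1.
rng = np.random.default_rng(4)
sigma2 = 1.0
z = rng.standard_normal(2_000_000) * np.sqrt(sigma2)

for lam in [-1.0, 0.1, 0.2]:
    emp = np.mean(np.exp(lam * z ** 2))
    closed = (1 - 2 * lam * sigma2) ** -0.5
    print(f"lam={lam}: Monte Carlo {emp:.4f} vs closed form {closed:.4f}")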
Concentration of sub-exponentials
Theorem
Let X be (\tau^2, b)-sub-exponential. Then
  P(X \geq E[X] + t) \leq \begin{cases} e^{-t^2 / (2 \tau^2)} & \text{if } 0 \leq t \leq \tau^2 / b \\ e^{-t / (2b)} & \text{if } t > \tau^2 / b \end{cases}
    = \max\left\{ e^{-t^2 / (2 \tau^2)}, \, e^{-t / (2b)} \right\}.
Sums of sub-exponential random variables
Let X_i be independent (\tau_i^2, b_i)-sub-exponential random variables. Then \sum_{i=1}^n X_i is (\sum_{i=1}^n \tau_i^2, b_*)-sub-exponential, where b_* = \max_i b_i
Corollary: If the X_i satisfy the above, then
  P\left( \left| \frac{1}{n} \sum_{i=1}^n X_i - E[X_i] \right| \geq t \right) \leq 2 \exp\left( -\min\left\{ \frac{n t^2}{2 \cdot \frac{1}{n} \sum_{i=1}^n \tau_i^2}, \, \frac{n t}{2 b_*} \right\} \right).
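A sketch of the corollary for averages of \chi^2 variables, taking the standard sub-exponential parameters (\tau^2, b) = (4, 4) for Z^2 - 1 with Z \sim N(0, 1) as given (these parameters are an assumption of this illustration, not stated on the slide):

import numpy as np

# Empirical two-sided tail of the mean of n chi-squared(1) variables
# (E[X_i] = 1) vs. the corollary bound 2 * exp(-min(n t^2 / 8, n t / 8)).
rng = np.random.default_rng(5)
n, reps = 100, 50_000
x = rng.standard_normal(size=(reps, n)) ** 2
dev = np.abs(x.mean(axis=1) - 1.0)

for t in [0.3, 0.5]:
    empirical = np.mean(dev >= t)
    bound = 2 * np.exp(-min(n * t ** 2 / 8, n * t / 8))
    print(f"t={t}: P(|mean - 1| >= t) ~ {empirical:.5f} <= bound {bound:.5f}")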
Bernstein conditions and sub-exponentials
Suppose X is mean-zero with
  |E[X^k]| \leq \frac{1}{2} k! \sigma^2 b^{k-2} for k = 2, 3, ...
Then
  E[e^{\lambda X}] \leq \exp\left( \frac{\lambda^2 \sigma^2}{2 (1 - b |\lambda|)} \right) for |\lambda| < \frac{1}{b}
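The proof is a power-series expansion of the MGF plus a geometric series: since E[X] = 0,
  E[e^{\lambda X}] = 1 + \sum_{k=2}^{\infty} \frac{\lambda^k E[X^k]}{k!} \leq 1 + \frac{\lambda^2 \sigma^2}{2} \sum_{j=0}^{\infty} (b |\lambda|)^j = 1 + \frac{\lambda^2 \sigma^2}{2 (1 - b |\lambda|)} \leq \exp\left( \frac{\lambda^2 \sigma^2}{2 (1 - b |\lambda|)} \right),
using 1 + x \leq e^x and |\lambda| < 1/b for convergence of the series.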
Johnson-Lindenstrauss and high-dimensional embedding
Question: Let u_1, ..., u_m \in R^d be arbitrary. Can we find a mapping F : R^d \to R^n, n \ll d, such that
  (1 - \delta) \|u_i - u_j\|_2^2 \leq \|F(u_i) - F(u_j)\|_2^2 \leq (1 + \delta) \|u_i - u_j\|_2^2 ?
Theorem (Johnson-Lindenstrauss embedding)
For n \gtrsim \frac{1}{\delta^2} \log m such a mapping exists.
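A sketch of one standard random construction (the Gaussian projection F(u) = Xu / \sqrt{n}; the dimensions below are illustrative choices):

import numpy as np

# Random Johnson-Lindenstrauss map: F(u) = X u / sqrt(n) with X an n x d
# matrix of i.i.d. N(0, 1) entries. All pairwise squared-distance ratios
# should land close to 1.
rng = np.random.default_rng(6)
d, m, n = 2_000, 50, 1_000
U = rng.standard_normal(size=(m, d))   # m arbitrary points in R^d
X = rng.standard_normal(size=(n, d))
F = U @ X.T / np.sqrt(n)               # embedded points in R^n

ratios = []
for i in range(m):
    for j in range(i + 1, m):
        num = np.sum((F[i] - F[j]) ** 2)
        den = np.sum((U[i] - U[j]) ** 2)
        ratios.append(num / den)
print(f"distortion ratios in [{min(ratios):.3f}, {max(ratios):.3f}]")  # near 1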
Proof of Johnson-Lindenstrauss continued
For the random matrix X \in R^{n \times d} with i.i.d. N(0, 1) entries used in the construction, and any fixed u \neq 0,
  P\left( \left| \frac{\|Xu\|_2^2}{n \|u\|_2^2} - 1 \right| \geq t \right) \leq 2 \exp\left( -\frac{n t^2}{8} \right) for t \in [0, 1].
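To finish by the standard union-bound step: apply the display above with u = u_i - u_j and t = \delta for each of the at most m^2 pairs; the probability that any pair is distorted by more than \delta is at most 2 m^2 e^{-n \delta^2 / 8}, which is below 1 once n \gtrsim \delta^{-2} \log m, so a mapping with the desired property exists.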
Reading and bibliography
1. S. Boucheron, O. Bousquet, and G. Lugosi. Concentration inequalities. In O. Bousquet, U. von Luxburg, and G. Rätsch, editors, Advanced Lectures in Machine Learning, pages 208–240. Springer, 2004.
2. V. Buldygin and Y. Kozachenko. Metric Characterization of Random Variables and Random Processes, volume 188 of Translations of Mathematical Monographs. American Mathematical Society, 2000.
3. M. Ledoux. The Concentration of Measure Phenomenon. American Mathematical Society, 2001.
4. S. Boucheron, G. Lugosi, and P. Massart. Concentration Inequalities: A Nonasymptotic Theory of Independence. Oxford University Press, 2013.