http://statwww.epfl.ch
3. Discrete Random Variables
Contents
Idea of a random variable; probability mass function.
Types of discrete random variables: Bernoulli, indicator, binomial,
geometric, hypergeometric, Poisson.
Distribution function and its properties.
Expectation, variance, and other moments.
Conditional distributions.
Convergence of distributions, and moment generating function.
References: Ross (Chapter 4); Ben Arous notes (Chapters III, IV).
Exercises: 60–64, 66–78 of Recueil d’exercices.
Probabilité et Statistique I — Chapter 3
Petit Vocabulaire Probabiliste
Mathematics English Français
random experiment une expérience aléatoire
Ω sample space l'ensemble fondamental
ω outcome, elementary event une épreuve, un événement élémentaire
A, B, . . . event un événement
F event space l'espace des événements
sigma-algebra une tribu
P probability distribution/probability function une loi de probabilité
(Ω, F, P) probability space un espace de probabilité
inclusion-exclusion formulae formules d'inclusion-exclusion
P(A | B) probability of A given B la probabilité de A sachant B
independence indépendance
(mutually) independent events les événements (mutuellement) indépendants
pairwise independent events les événements indépendants deux à deux
conditionally independent events les événements conditionnellement indépendants
X, Y, . . . random variable une variable aléatoire
I indicator random variable une variable indicatrice
fX probability mass function fonction de masse
FX probability distribution function fonction de répartition
Random Variables
In applications we usually want to consider numerical random
quantities.
Example 3.1: A family has three children. Let X represent the
number of boys. Possible values for X are 0, 1, 2, 3. Find the
probabilities that X takes these values. •
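Assuming boys and girls are equally likely and the children's sexes are independent (an assumption the example leaves implicit), the probabilities can be checked by enumerating the eight equally likely birth sequences; a minimal Python sketch:

```python
from itertools import product

# Example 3.1: enumerate the sample space of 8 equally likely birth
# sequences and count the boys ('B') in each outcome.
outcomes = list(product("BG", repeat=3))          # sample space, |Omega| = 8
probs = {k: sum(1 for w in outcomes if w.count("B") == k) / 8
         for k in range(4)}
# P(X = 0) = 1/8, P(X = 1) = 3/8, P(X = 2) = 3/8, P(X = 3) = 1/8
```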
Definition: Let (Ω, F, P) be a probability space. A random
variable X : Ω → R is a mapping from the sample space Ω to the
real numbers R.
If the range of X,
D = {x ∈ R : ∃ω ∈ Ω such that X(ω) = x},
is countable, then X is called a discrete random variable. •
This induces probabilities on subsets S of the real line, given by
P(X ∈ S) = P({ω ∈ Ω : X(ω) ∈ S}).
In particular, we set Ax = {ω ∈ Ω : X(ω) = x}.
Example 3.2 (Coin): A coin is tossed repeatedly and
independently. Let X be the random variable representing the
number of tosses needed until the first head shows. Compute
P(X = 3), P(X = 15), P(X ≤ 3.5), P(X > 1.7), P(1.7 ≤ X ≤ 3.5).
•
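Assuming the coin is fair (p = 1/2, a choice the example leaves open), X has pmf P(X = x) = (1 − p)^(x−1) p for x = 1, 2, . . ., and the requested probabilities reduce to sums over the integers in each range; a sketch:

```python
# Example 3.2 with an assumed fair coin: X = number of tosses to the
# first head, so P(X = x) = (1 - p)**(x - 1) * p for x = 1, 2, ...
p = 0.5
f = lambda x: (1 - p) ** (x - 1) * p

p_eq_3 = f(3)                                # P(X = 3) = 1/8
p_le_3_5 = sum(f(x) for x in range(1, 4))    # P(X <= 3.5) = P(X <= 3) = 7/8
p_gt_1_7 = 1 - f(1)                          # P(X > 1.7) = P(X >= 2) = 1/2
p_between = f(2) + f(3)                      # P(1.7 <= X <= 3.5) = 3/8
```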
Example 3.3 (Dartboard): A natural set Ω when I play at darts
is the wall on which the dartboard hangs. The dart lands at a point
ω ∈ Ω ⊂ R^2. My score X(ω) takes values in D = {0, 1, . . . , 60}. •
Bernoulli Random Variable
Definition: A random variable that takes only the values 0 and 1 is
called an indicator random variable, or a Bernoulli random
variable, or sometimes a Bernoulli trial. We use I(C) to denote
the indicator of the event C.
Example 3.4 (Coins): Suppose that n identical coins are tossed
independently, let Hi be the event that the ith coin shows a head,
and let Ii = I(Hi) be the indicator of this event. Then
P(Ii = 1) = P(Hi) = p, P(Ii = 0) = P(Hi^c) = 1 − p,
where p is the probability that a coin shows a head. Write down the
sample space and the sets Ax when n = 3. What is the random
variable X = I1 + · · · + In? •
Probability Mass Function
We have already seen that a random variable X induces probabilities
on subsets of R. In particular when X is discrete, we have
Ax = {ω ∈ Ω : X(ω) = x}, and can define:
Definition: The probability mass function (pmf) (fonction de
masse) of X is the function
fX(x) = P(X = x) = P(Ax), x ∈ R.
The probability mass function has two key properties: (i) fX(x) ≥ 0,
and is positive only for x ∈ D, where D is the range of X, also called
the support of fX; (ii) the total probability is one:
∑_{i : xi ∈ D} fX(xi) = 1.
When there is no risk of confusion we write fX ≡ f .
Binomial Random Variable
Example 3.4 (ctd): Compute the probability mass functions and
support of Ii and of X . •
Definition: A binomial random variable X has pmf
fX(x) = (n choose x) p^x (1 − p)^(n−x), x = 0, 1, . . . , n, n ∈ N, 0 ≤ p ≤ 1.
We write X ∼ B(n, p), and call n the denominator and p the
success probability. •
Note: we use ∼ as shorthand for ‘has the distribution’.
The binomial model is used when considering the number of
“successes” occurring in a fixed number of independent trials, and
each trial has the same success probability.
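The pmf above is easy to tabulate directly; this sketch writes it with Python's `math.comb` and checks property (ii) of a mass function, that the probabilities sum to one over the support:

```python
from math import comb

# B(n, p) mass function: (n choose x) p^x (1 - p)^(n - x).
def binom_pmf(x, n, p):
    return comb(n, x) * p**x * (1 - p) ** (n - x)

# Property (ii): total probability over the support {0, ..., n} is one.
total = sum(binom_pmf(x, 10, 0.3) for x in range(11))
```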
Binomial Probability Mass Functions
[Figure: probability mass functions f(x) of B(10, 0.5), B(10, 0.3), B(20, 0.1), and B(40, 0.9), plotted against x.]
Example 3.5: Certain physical traits are determined by a pair of
genes, of which there are two types: dominant d and recessive r. A
person with genotype dd has pure dominance, one with dr is hybrid,
and one with rr is recessive. The genotypes dd and rd have the same
phenotype, so cannot be distinguished physically. A child receives a
gene from each parent. If the parents are both hybrid and have 4
children, what is the probability that just three of them show the
dominant trait? What is the probability that at most three of them
show the dominant trait? •
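Since each child of two hybrid parents shows the dominant trait with probability 3/4 (three of the four equally likely gene pairs dd, dr, rd, rr contain a d), the count among 4 children is B(4, 3/4); a short check of the two requested probabilities:

```python
from math import comb

# Example 3.5: number of children showing the dominant trait ~ B(4, 3/4).
f = lambda x: comb(4, x) * 0.75**x * 0.25 ** (4 - x)

p_exactly_3 = f(3)         # (4 choose 3)(3/4)^3(1/4) = 27/64
p_at_most_3 = 1 - f(4)     # 1 - (3/4)^4 = 175/256
```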
Theorem (Stability of binomial): Let X ∼ B(n, p) and
Y ∼ B(m, p) be independent binomial random variables. Then
X + Y ∼ B(n + m, p). •
Waiting Times
Definition: A geometric random variable X has pmf
fX(x) = p(1 − p)^(x−1), x = 1, 2, . . . , 0 ≤ p ≤ 1.
We write X ∼ Geom(p), and call p the success probability. •
This is used to model a waiting time to the first event in a series of
independent trials, each with the same success probability.
Example 3.6: To start a board game, players take it in turns to
throw a die. The first to obtain a six starts. What is the probability
that the 3rd player starts? What is the probability of waiting until at
least 6 throws before starting? •
Theorem (Lack of memory): If X ∼ Geom(p), then
P(X > n + m | X > m) = P(X > n). •
Geometric and Negative Binomial PMFs
[Figure: probability mass functions of Geom(0.5), Geom(0.1), NegBin(4, 0.5), and NegBin(6, 0.3), plotted against x.]
Definition: A negative binomial random variable X with
parameters n and p has pmf
fX(x) = (x − 1 choose n − 1) p^n (1 − p)^(x−n), x = n, n + 1, n + 2, . . . , 0 ≤ p ≤ 1.
We write X ∼ NegBin(n, p). When n = 1, X ∼ Geom(p). •
This is used to model a waiting time to the nth success in a series of
independent trials, each with the same success probability.
Example 3.7: Two players toss a fair coin successively. What is
the probability that 2 heads appear before 5 tails? •
Theorem (Stability of negative binomial): Let X1, . . . , Xn be
independent geometric random variables with success probability p.
Then X1 + · · · + Xn ∼ NegBin(n, p). •
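The stability theorem can be illustrated numerically: convolving n copies of the Geom(p) pmf (truncated at a large cutoff) reproduces the NegBin(n, p) mass function term by term. A sketch with arbitrary illustrative values p = 0.4, n = 3:

```python
from math import comb

# Convolve n = 3 Geom(p) pmfs and compare with NegBin(3, p).
p, n, cutoff = 0.4, 3, 60
geom = [0.0] + [p * (1 - p) ** (x - 1) for x in range(1, cutoff)]

conv = geom[:]                       # pmf of one geometric variable
for _ in range(n - 1):               # add another geometric each pass
    new = [0.0] * cutoff
    for i, a in enumerate(conv):
        for j, b in enumerate(geom):
            if i + j < cutoff:
                new[i + j] += a * b
    conv = new

negbin = lambda x: comb(x - 1, n - 1) * p**n * (1 - p) ** (x - n)
# conv[x] matches negbin(x) for x = n, n + 1, ... well below the cutoff
```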
Banach’s Match Problem
Example 3.8: A pipe-smoking mathematician carries a box of
matches in each of the pockets of his jacket, one on the right and one
on the left. Initially both boxes contain m matches. Each time he
lights his pipe, he chooses a box of matches at random, and throws
the spent match away. After a while he finds that the box he has
chosen is empty. What is then the distribution of the number of
matches in the other box? •
Hypergeometric Distribution
Example 3.9 (Capture-recapture): In order to estimate the
unknown number of fish N in a lake, we first catch r ≤ N fish, mark
them, and put them back. After waiting long enough for the fish
population to be well-mixed, we take a further sample of size s, of
which 0 ≤ m ≤ s are marked. Let M be the random variable
representing the number of marked fish in this sample. Show that
P(M = m) = (r choose m)(N − r choose s − m) / (N choose s), m ∈ {max(0, s + r − N), . . . , min(r, s)}.
This is the pmf of the hypergeometric distribution.
Show that the value of N that maximises this P(M = m) is ⌊rs/m⌋.
Compute the best estimate of N when s = 50, r = 40, and m = 4. •
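The maximisation can be checked by brute force; this sketch uses exact rational arithmetic to locate the maximising N and compares it with ⌊rs/m⌋ (here rs/m = 500 is an integer, so the likelihood ties at 499 and 500, and the stated estimate 500 attains the maximum):

```python
from fractions import Fraction
from math import comb

# Example 3.9: maximise P(M = m) over N by direct search.
r, s, m = 40, 50, 4

def pmf(N):
    # Exact hypergeometric probability as a Fraction, to detect ties.
    return Fraction(comb(r, m) * comb(N - r, s - m), comb(N, s))

best = max(pmf(N) for N in range(90, 2001))
argmax = [N for N in range(90, 2001) if pmf(N) == best]
# argmax == [499, 500]: floor(r*s/m) = 500 attains the maximum,
# with a tie at 499 because rs/m is an integer here.
```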
Example 3.10: An electrician buys components in packets of 10.
He examines three components chosen at random from a packet, and
accepts the packet only if the three chosen are faultless. If 30% of
packets contain 4 bad components and the other 70% contain just
one bad component, what proportion of packets does he reject? •
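The number of bad components among the three examined is hypergeometric, so P(all three good | k bad in the packet) = (10 − k choose 3)/(10 choose 3); combining the two packet types gives the rejection proportion. A short check:

```python
from math import comb

# Example 3.10: probability a packet with `bad` faulty components passes
# the inspection of 3 components drawn without replacement.
def p_accept(bad):
    return comb(10 - bad, 3) / comb(10, 3)

# 30% of packets have 4 bad components, 70% have 1.
p_reject = 0.3 * (1 - p_accept(4)) + 0.7 * (1 - p_accept(1))
# p_accept(4) = 20/120, p_accept(1) = 84/120, so p_reject = 0.46
```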
Distribution Function
Definition: Let X be a random variable. Its cumulative
distribution function (CDF) (fonction de repartition) is
FX(x) = P(X ≤ x), x ∈ R.
If X is discrete, this can be written as
FX(x) = ∑_{xi ∈ D : xi ≤ x} P(X = xi),
which is a step function with jumps at the support of fX, i.e.
{x ∈ R : fX(x) > 0}.
When there is no risk of confusion, we write F ≡ FX .
Example 3.11: Give the support, pmf, and distribution function
for a Bernoulli random variable. •
Example 3.12 (Die): Give the support, pmf, and distribution
function for the value obtained when a fair die is thrown. •
Definition: A discrete uniform random variable X has probability
mass function
fX(x) = 1/(b − a + 1), x = a, a + 1, . . . , b, a < b, a, b ∈ Z.
•
Definition: A Poisson random variable X has probability mass
function
fX(x) = λ^x e^(−λ)/x!, x = 0, 1, . . . , λ > 0.
We write X ∼ Pois(λ). •
Poisson Probability Mass Functions
[Figure: probability mass functions of Pois(0.5), Pois(1), Pois(4), and Pois(10), plotted against x.]
Properties of a Distribution Function
Theorem : Let (Ω, F, P) be a probability space and X : Ω → R a
random variable. Its cumulative distribution function FX satisfies:
(a) limx→−∞ FX(x) = 0;
(b) limx→∞ FX(x) = 1;
(c) FX is non-decreasing, that is, FX(x) ≤ FX(y) whenever x ≤ y;
(d) FX is continuous to the right, that is,
lim_{t↓0} FX(x + t) = FX(x), x ∈ R;
(e) P(X > x) = 1 − FX(x);
(f) if x < y, then P(x < X ≤ y) = FX(y) − FX(x).
Note: The pmf is obtained from the CDF by
f(x) = F(x) − lim_{y↑x} F(y).
In many cases X takes only integer values, and then
f(x) = F (x) − F (x − 1) for integer x.
Example 3.13 (Urn): An urn contains tickets labelled 1, . . . , n,
from which r are drawn at random. Let X be the largest number
removed if the tickets are replaced in the urn after each drawing, and
let Y be the largest number removed if the drawn tickets are not
replaced. Find fX(x), FX(x), fY (x), and FY (x). Show that
FY (k) < FX(k) for k = 1, . . . , n − 1. •
Example 3.14 (Poisson): Find FX(x) when X ∼ Pois(λ). •
Poisson Cumulative Distribution Functions
[Figure: cumulative distribution functions of Pois(0.5), Pois(1), Pois(4), and Pois(10), plotted against x.]
Transformations of Discrete Random Variables
Real-valued functions of random variables are themselves random
variables, so they too have probability mass functions.
Theorem : If X and Y are random variables such that Y = g(X),
then Y has probability mass function
fY(y) = ∑_{x : g(x) = y} fX(x). •
Example 3.15: Find the pmf of Y = I(X > 0) when X ∼ Pois(λ).
Example 3.16: Let Y be the remainder when the total from a throw
of two independent dice is divided by 4. Find the pmf of Y .
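Example 3.16 can be checked by pushing the pmf of the total through g(x) = x mod 4, i.e. summing fX(x) over {x : g(x) = y} as in the theorem above; a sketch in exact arithmetic:

```python
from itertools import product
from fractions import Fraction

# Example 3.16: pmf of Y = (total of two fair dice) mod 4, obtained by
# summing the probability 1/36 of each outcome into the class g(x) = y.
f_Y = {y: Fraction(0) for y in range(4)}
for d1, d2 in product(range(1, 7), repeat=2):
    f_Y[(d1 + d2) % 4] += Fraction(1, 36)
# f_Y = {0: 1/4, 1: 2/9, 2: 1/4, 3: 5/18}
```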
Mathematical Honesty
From now on we mostly ignore the underlying probability space
(Ω,F , P) when dealing with a random variable X and think in terms
of X , FX(x), and fX(x). It can be proved that this is
mathematically legitimate.
3.2 Expectation
Definition: Let X be a discrete random variable for which
∑_{x ∈ D} |x| fX(x) < ∞, where D is the support of fX. The
expectation (l'espérance) or mean of X is defined to be
E(X) = ∑_x x P(X = x) = ∑_{x ∈ D} x fX(x).
Note: E(X) is sometimes called the average value (la moyenne) of
X . We confine the use of ‘average’ to empirical quantities.
Example 3.17: Find the expected score on the throw of a fair die. •
Example 3.18: Find the means of the random variables with pmfs
fX(x) = 4/{x(x + 1)(x + 2)}, fY(x) = 1/{x(x + 1)}, x = 1, 2, . . . .
Example 3.19: Find the mean of a Bernoulli variable with
probability p. •
Example 3.20: Find the mean of X ∼ B(n, p). •
Theorem : Let X be a random variable with mass function f , and
let g be a real-valued function on R. Then
E{g(X)} = ∑_x g(x) f(x),
whenever ∑_x |g(x)| f(x) < ∞. •
Example 3.21: Let X ∼ Pois(λ). Find the expectations of
X, X(X − 1), X(X − 1) · · · (X − r + 1), cos(θX).
•
Note: Expectation is analogous to the idea from mechanics of the
centre of mass of an object whose mass is distributed according to fX .
Properties of Expectation
Theorem : Let X be a random variable with finite mean E(X),
and let a, b be constants. Then
(a) E(·) is a linear operator, i.e. E(aX + b) = aE(X) + b;
(b) if P(X = b) = 1, then E(X) = b;
(c) if P(a < X ≤ b) = 1, then a < E(X) ≤ b;
(d) if g(X) and h(X) have finite means, then
E{g(X) + h(X)} = E{g(X)} + E{h(X)};
(e) finally, {E(X)}^2 ≤ {E(|X|)}^2 ≤ E(X^2).
•
Note: The linearity of expectation is extremely useful in practice.
Example 3.22: Let X = I1 + · · · + In, where I1, . . . , In are
independent Bernoulli variables with probability p. Find E(X). Is
independence of the Ii needed? •
Example 2.16 (Matching, ctd): Show that the expected number
of men who leave with the correct hats is 1, for all n. •
Example 3.23 (Indicator random variables): Let I_A, I_B, . . .
denote indicators of events A, B, . . .. Show that
I_{A∩B} = I_A I_B, I_{A∪B} = 1 − (1 − I_A)(1 − I_B), E(I_A) = P(A),
and hence establish the inclusion-exclusion formulae. •
Moments of a Distribution
Definition: If X has a pmf f(x) such that ∑_x |x|^r f(x) < ∞, then
(a) the rth moment of X is E(X^r);
(b) the rth central moment of X is E[{X − E(X)}^r];
(c) the rth factorial moment of X is E{X(X − 1) · · · (X − r + 1)};
(d) the variance of X is var(X) = E[{X − E(X)}^2].
Note: Of these the mean and variance are the most important, as they
measure the location and spread of fX. The variance is analogous to
the moment of inertia in mechanics.
Example 3.24: Find the variance of the score when a die is cast. •
Properties of Variance
Theorem : Let X be a random variable whose variance exists, and
let a, b be constants. Then
var(X) = E(X^2) − {E(X)}^2 = E{X(X − 1)} + E(X) − {E(X)}^2;
var(aX + b) = a2var(X);
var(X) = 0 ⇒ X is constant with probability 1.
Example 3.25: Find the various moments of a Poisson random
variable. •
Theorem : If X takes values in {0, 1, . . .}, r ≥ 2, and E(X) < ∞,
then
E(X) = ∑_{x=1}^∞ P(X ≥ x),
E{X(X − 1) · · · (X − r + 1)} = r ∑_{x=r}^∞ (x − 1) · · · (x − r + 1) P(X ≥ x).
•
Example 3.26: Let X ∼ Geom(p). Find E(X) and var(X). •
Example 3.27 (Coupons): Each packet of some product is
equally likely to contain any one of n different types of coupon,
independently of every other packet. What is the expected number of
packets you must buy to obtain at least one of each type of coupon?•
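In Example 3.27, once i distinct coupon types have been collected, the wait for a new type is Geom((n − i)/n) with mean n/(n − i); by linearity of expectation the answer is n(1 + 1/2 + · · · + 1/n). A sketch with an illustrative n = 6:

```python
# Example 3.27 (coupon collector): sum the mean geometric waiting times
# for the 1st, 2nd, ..., n-th distinct coupon type.
n = 6
expected = sum(n / (n - i) for i in range(n))   # = n * (1 + 1/2 + ... + 1/n)
# for n = 6 this is 6 * 49/20 = 14.7 packets
```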
3.3 Conditional Distributions
Definition: Let (Ω,F , P) be a probability space, on which a
random variable X is defined, and let B ∈ F . Then the conditional
probability mass function of X given B is
fX(x | B) = P(X = x | B) = P(Ax ∩ B)/P(B),
where Ax = {ω ∈ Ω : X(ω) = x}.
Theorem : The function fX(x | B) satisfies
fX(x | B) ≥ 0, ∑_x fX(x | B) = 1,
and so is a well-defined probability mass function. •
Example 3.28: Find the conditional pmf of the result of tossing a
die, given that the result is odd. •
Example 3.29: Find the conditional pmf of X ∼ Geom(p), given
that X ≤ n. •
Definition: Suppose that ∑_x |g(x)| fX(x | B) < ∞. Then the
conditional expectation of g(X) given B is
E{g(X) | B} = ∑_x g(x) fX(x | B).
Theorem : Let X be a random variable with mean E(X) and let B
be an event with P(B), P(Bc) > 0. Then
E(X) = E(X | B)P(B) + E(X | Bc)P(Bc).
More generally, whenever {Bi}_{i=1}^∞ is a partition of Ω, P(Bi) > 0 for
all i, and the sum is absolutely convergent,
E(X) = ∑_{i=1}^∞ E(X | Bi) P(Bi).
Example 3.30: The truncated Poisson distribution is defined by
taking X ∼ Pois(λ) and B = {X > 0}. Find the conditional
probability mass function, mean, and variance for this distribution. •
Example 3.31: A coin is tossed repeatedly. Find the expected
numbers of tosses to the first head, and to the first two consecutive
heads. •
Example 3.32: Bilbo the hobbit and Smaug the dragon have b and
s gold coins respectively. They play a series of independent games in
which the loser gives the winner a gold coin, stopping when one of
them has no coins remaining. If Bilbo wins each game with
probability p (and p ≠ q = 1 − p), find the expected number of games
before they stop.
They then redivide the b + s coins by tossing them all. One player
gets those showing a head, and the other player gets the rest. Now
they play as before. What is the expected number of games until one
or other player has all the coins? •
3.4 Convergence of Distributions
In applications we often want to approximate one distribution by
another. The mathematical basis for doing so is provided by
convergence of distributions.
Example 3.33 (Law of small numbers): Let Xn ∼ B(n, p),
and suppose that np → λ > 0 as n → ∞. Show that the limiting
probability mass function of Xn is Pois(λ). •
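The convergence in Example 3.33 can be seen numerically: holding np fixed at an illustrative λ = 4 and letting n grow, the B(n, λ/n) pmf approaches the Pois(4) pmf pointwise; a sketch:

```python
from math import comb, exp, factorial

# Law of small numbers: B(n, lam/n) pmf vs Pois(lam) pmf for large n.
lam = 4.0

def binom(x, n):
    p = lam / n
    return comb(n, x) * p**x * (1 - p) ** (n - x)

pois = lambda x: lam**x * exp(-lam) / factorial(x)

# Maximum pointwise discrepancy over the bulk of the support.
err = max(abs(binom(x, 10_000) - pois(x)) for x in range(30))
# err is small (roughly of order lam**2 / n) and shrinks as n grows
```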
Example 3.34 (Matching, again): In Example 2.16 we saw that
the probability of exactly r fixed points in a random permutation of
n objects is
(1/r!) ∑_{k=0}^{n−r} (−1)^k/k! → e^{−1}/r! as n → ∞.
Thus the number of fixed points has a limiting Pois(1) distribution. •
Law of Small Numbers
[Figure: probability mass functions of B(10, 0.5), B(20, 0.25), B(50, 0.1), and the limiting Pois(5), plotted against x.]
Definition: Let f(x) be a probability mass function which is
non-zero for x ∈ D, and zero for x ∈ R\D = C. Let F (x) be the
corresponding distribution function
F(x) = ∑_{xi ≤ x} f(xi).
A sequence of distribution functions Fn(x) is said to converge to
F (x) if
Fn(x) → F (x) for x ∈ C as n → ∞.
The corresponding random variables Xn are then said to converge
in distribution (or in law) to a random variable X , that is,
Xn →D X, where Xn has distribution function Fn and X has
distribution function F . •
If D ⊂ Z, then Fn(x) → F (x) if fn(x) → f(x) for all x as n → ∞.
Example 3.35: Let XN have hypergeometric probability mass
function
P(XN = i) = (m choose i)(N − m choose n − i) / (N choose n), i = max(0, m + n − N), . . . , min(m, n).
This is the distribution of the number of white balls obtained when a
random sample of size n is taken without replacement from an urn
containing m white and N − m black balls. Show that as N, m → ∞
in such a way that m/N → p, where 0 < p < 1,
P(XN = i) → (n choose i) p^i (1 − p)^(n−i), i = 0, . . . , n.
Thus the limiting distribution of XN is B(n, p). •
Inequalities
Theorem (Basic inequality): If h(x) is a non-negative function,
then for a > 0,
P{h(X) ≥ a} ≤ E{h(X)}/a.
Theorem : Let a > 0 and let g be a convex function. Then:
P(|X| ≥ a) ≤ E(|X|)/a, (Markov's inequality)
P(|X| ≥ a) ≤ E(X^2)/a^2, (Chebyshov's inequality)
P{X − E(X) ≥ a} ≤ var(X)/{a^2 + var(X)}, (one-sided Chebyshov's inequality)
E{g(X)} ≥ g{E(X)}. (Jensen's inequality)
Theorem : If var(X) = 0, then X is constant with probability one.
3.5 Moment Generating Functions
Definition: The moment generating function of a random
variable X is defined as
MX(t) = E(etX),
for t ∈ R such that MX(t) < ∞.
Example 3.36: Find MX(t) when: (a) X is an indicator random
variable; (b) X ∼ B(n, p); (c) X ∼ Pois(λ).
Theorem : There is a one-one correspondence between distribution
functions FX(x) and moment generating functions MX(t).
Example 3.37: Let X ∼ B(n, p) and Y ∼ Pois(λ). Show that as
n → ∞, p → 0 in such a way that np → λ, X →D Y, that is, X
converges in distribution to Y.