http://statwww.epfl.ch
3. Discrete Random Variables
Contents
Idea of a random variable; probability mass function.
Types of discrete random variables: Bernoulli, indicator, binomial,
geometric, hypergeometric, Poisson.
Distribution function and its properties.
Expectation, variance, and other moments.
Conditional distributions.
Convergence of distributions, and moment generating function.
References: Ross (Chapter 4); Ben Arous notes (Chapters III, IV).
Exercises: 60–64, 66–78 of Recueil d’exercices.
Probabilité et Statistique I — Chapter 3
Petit Vocabulaire Probabiliste
Mathematics English Français
random experiment une expérience aléatoire
Ω sample space l'ensemble fondamental
ω outcome, elementary event une épreuve, un événement élémentaire
A, B, . . . event un événement
F event space l'espace des événements
sigma-algebra une tribu
P probability distribution/probability function une loi de probabilité
(Ω, F, P) probability space un espace de probabilité
inclusion-exclusion formulae formules d'inclusion-exclusion
P(A | B) probability of A given B la probabilité de A sachant B
independence indépendance
(mutually) independent events les événements (mutuellement) indépendants
pairwise independent events les événements indépendants deux à deux
conditionally independent events les événements conditionnellement indépendants
X, Y, . . . random variable une variable aléatoire
I indicator random variable une variable indicatrice
fX probability mass function fonction de masse
FX probability distribution function fonction de répartition
Random Variables
In applications we usually want to consider numerical random
quantities.
Example 3.1: A family has three children. Let X represent the
number of boys. Possible values for X are 0, 1, 2, 3. Find the
probabilities that X takes these values. •
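Assuming boys and girls are equally likely and the children's sexes are independent (an assumption the example leaves implicit), the probabilities can be checked by enumerating the eight equally likely birth sequences; a minimal Python sketch:

```python
from itertools import product

# Example 3.1: enumerate the sample space of 8 equally likely birth
# sequences and count the boys ('B') in each outcome.
outcomes = list(product("BG", repeat=3))          # sample space, |Omega| = 8
probs = {k: sum(1 for w in outcomes if w.count("B") == k) / 8
         for k in range(4)}
# P(X = 0) = 1/8, P(X = 1) = 3/8, P(X = 2) = 3/8, P(X = 3) = 1/8
```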
Definition: Let (Ω, F, P) be a probability space. A random
variable X : Ω → R is a mapping from the sample space Ω to the
real numbers R.
If the range of X,
D = {x ∈ R : ∃ω ∈ Ω such that X(ω) = x},
is countable, then X is called a discrete random variable. •
This induces probabilities on subsets S of the real line, given by
P(X ∈ S) = P({ω ∈ Ω : X(ω) ∈ S}).
In particular, we set Ax = {ω ∈ Ω : X(ω) = x}.
Example 3.2 (Coin): A coin is tossed repeatedly and
independently. Let X be the random variable representing the
number of tosses needed until the first head shows. Compute
P(X = 3), P(X = 15), P(X ≤ 3.5), P(X > 1.7), P(1.7 ≤ X ≤ 3.5).
•
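Assuming the coin is fair (p = 1/2, a choice the example leaves open), X has pmf P(X = x) = (1 − p)^(x−1) p for x = 1, 2, . . ., and the requested probabilities reduce to sums over the integers in each range; a sketch:

```python
# Example 3.2 with an assumed fair coin: X = number of tosses to the
# first head, so P(X = x) = (1 - p)**(x - 1) * p for x = 1, 2, ...
p = 0.5
f = lambda x: (1 - p) ** (x - 1) * p

p_eq_3 = f(3)                                # P(X = 3) = 1/8
p_le_3_5 = sum(f(x) for x in range(1, 4))    # P(X <= 3.5) = P(X <= 3) = 7/8
p_gt_1_7 = 1 - f(1)                          # P(X > 1.7) = P(X >= 2) = 1/2
p_between = f(2) + f(3)                      # P(1.7 <= X <= 3.5) = 3/8
```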
Example 3.3 (Dartboard): A natural set Ω when I play at darts
is the wall on which the dartboard hangs. The dart lands at a point
ω ∈ Ω ⊂ R^2. My score X(ω) takes values in D = {0, 1, . . . , 60}. •
Bernoulli Random Variable
Definition: A random variable that takes only the values 0 and 1 is
called an indicator random variable, or a Bernoulli random
variable, or sometimes a Bernoulli trial. We use I(C) to denote
the indicator of the event C.
Example 3.4 (Coins): Suppose that n identical coins are tossed
independently, let Hi be the event that the ith coin shows a head,
and let Ii = I(Hi) be the indicator of this event. Then
P(Ii = 1) = P(Hi) = p, P(Ii = 0) = P(Hi^c) = 1 − p,
where p is the probability that a coin shows a head. Write down the
sample space and the sets Ax when n = 3. What is the random
variable X = I1 + · · · + In? •
Probability Mass Function
We have already seen that a random variable X induces probabilities
on subsets of R. In particular when X is discrete, we have
Ax = {ω ∈ Ω : X(ω) = x}, and can define:
Definition: The probability mass function (pmf) (fonction de
masse) of X is the function
fX(x) = P(X = x) = P(Ax), x ∈ R.
The probability mass function has two key properties: (i) fX(x) ≥ 0,
and is positive only for x ∈ D, where D is the range of X, also called
the support of fX; (ii) the total probability is one:
∑_{i : xi ∈ D} fX(xi) = 1.
When there is no risk of confusion we write fX ≡ f .
Binomial Random Variable
Example 3.4 (ctd): Compute the probability mass functions and
support of Ii and of X . •
Definition: A binomial random variable X has pmf
fX(x) = (n choose x) p^x (1 − p)^(n−x), x = 0, 1, . . . , n, n ∈ N, 0 ≤ p ≤ 1.
We write X ∼ B(n, p), and call n the denominator and p the
success probability. •
Note: we use ∼ as shorthand for ‘has the distribution’.
The binomial model is used when considering the number of
“successes” occurring in a fixed number of independent trials, and
each trial has the same success probability.
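The pmf above is easy to tabulate directly; this sketch writes it with Python's `math.comb` and checks property (ii) of a mass function, that the probabilities sum to one over the support:

```python
from math import comb

# B(n, p) mass function: (n choose x) p^x (1 - p)^(n - x).
def binom_pmf(x, n, p):
    return comb(n, x) * p**x * (1 - p) ** (n - x)

# Property (ii): total probability over the support {0, ..., n} is one.
total = sum(binom_pmf(x, 10, 0.3) for x in range(11))
```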
Binomial Probability Mass Functions
[Figure: probability mass functions f(x) of B(10, 0.5), B(10, 0.3), B(20, 0.1), and B(40, 0.9), plotted against x.]
Example 3.5: Certain physical traits are determined by a pair of
genes, of which there are two types: dominant d and recessive r. A
person with genotype dd has pure dominance, one with dr is hybrid,
and one with rr is recessive. The genotypes dd and rd have the same
phenotype, so cannot be distinguished physically. A child receives a
gene from each parent. If the parents are both hybrid and have 4
children, what is the probability that just three of them show the
dominant trait? What is the probability that at most three of them
show the dominant trait? •
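Since each child of two hybrid parents shows the dominant trait with probability 3/4 (three of the four equally likely gene pairs dd, dr, rd, rr contain a d), the count among 4 children is B(4, 3/4); a short check of the two requested probabilities:

```python
from math import comb

# Example 3.5: number of children showing the dominant trait ~ B(4, 3/4).
f = lambda x: comb(4, x) * 0.75**x * 0.25 ** (4 - x)

p_exactly_3 = f(3)         # (4 choose 3)(3/4)^3(1/4) = 27/64
p_at_most_3 = 1 - f(4)     # 1 - (3/4)^4 = 175/256
```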
Theorem (Stability of binomial): Let X ∼ B(n, p) and
Y ∼ B(m, p) be independent binomial random variables. Then
X + Y ∼ B(n + m, p). •
Waiting Times
Definition: A geometric random variable X has pmf
fX(x) = p(1 − p)^(x−1), x = 1, 2, . . . , 0 ≤ p ≤ 1.
We write X ∼ Geom(p), and call p the success probability. •
This is used to model a waiting time to the first event in a series of
independent trials, each with the same success probability.
Example 3.6: To start a board game, players take it in turns to
throw a die. The first to obtain a six starts. What is the probability
that the 3rd player starts? What is the probability of waiting until at
least 6 throws before starting? •
Theorem (Lack of memory): If X ∼ Geom(p), then
P(X > n + m | X > m) = P(X > n). •
Geometric and Negative Binomial PMFs
[Figure: probability mass functions of Geom(0.5), Geom(0.1), NegBin(4, 0.5), and NegBin(6, 0.3), plotted against x.]
Definition: A negative binomial random variable X with
parameters n and p has pmf
fX(x) = (x − 1 choose n − 1) p^n (1 − p)^(x−n), x = n, n + 1, n + 2, . . . , 0 ≤ p ≤ 1.
We write X ∼ NegBin(n, p). When n = 1, X ∼ Geom(p). •
This is used to model a waiting time to the nth success in a series of
independent trials, each with the same success probability.
Example 3.7: Two players toss a fair coin successively. What is
the probability that 2 heads appear before 5 tails? •
Theorem (Stability of negative binomial): Let X1, . . . , Xn be
independent geometric random variables with success probability p.
Then X1 + · · · + Xn ∼ NegBin(n, p). •
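The stability theorem can be illustrated numerically: convolving n copies of the Geom(p) pmf (truncated at a large cutoff) reproduces the NegBin(n, p) mass function term by term. A sketch with arbitrary illustrative values p = 0.4, n = 3:

```python
from math import comb

# Convolve n = 3 Geom(p) pmfs and compare with NegBin(3, p).
p, n, cutoff = 0.4, 3, 60
geom = [0.0] + [p * (1 - p) ** (x - 1) for x in range(1, cutoff)]

conv = geom[:]                       # pmf of one geometric variable
for _ in range(n - 1):               # add another geometric each pass
    new = [0.0] * cutoff
    for i, a in enumerate(conv):
        for j, b in enumerate(geom):
            if i + j < cutoff:
                new[i + j] += a * b
    conv = new

negbin = lambda x: comb(x - 1, n - 1) * p**n * (1 - p) ** (x - n)
# conv[x] matches negbin(x) for x = n, n + 1, ... well below the cutoff
```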
Banach’s Match Problem
Example 3.8: A pipe-smoking mathematician carries a box of
matches in each of the pockets of his jacket, one on the right and one
on the left. Initially both boxes contain m matches. Each time he
lights his pipe, he chooses a box of matches at random, and throws
the spent match away. After a while he finds that the box he has
chosen is empty. What is then the distribution of the number of
matches in the other box? •
Hypergeometric Distribution
Example 3.9 (Capture-recapture): In order to estimate the
unknown number of fish N in a lake, we first catch r ≤ N fish, mark
them, and put them back. After waiting long enough for the fish
population to be well-mixed, we take a further sample of size s, of
which 0 ≤ m ≤ s are marked. Let M be the random variable
representing the number of marked fish in this sample. Show that
P(M = m) = (r choose m)(N − r choose s − m) / (N choose s), m ∈ {max(0, s + r − N), . . . , min(r, s)}.
This is the pmf of the hypergeometric distribution.
Show that the value of N that maximises this P(M = m) is ⌊rs/m⌋.
Compute the best estimate of N when s = 50, r = 40, and m = 4. •
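The maximisation can be checked by brute force; this sketch uses exact rational arithmetic to locate the maximising N and compares it with ⌊rs/m⌋ (here rs/m = 500 is an integer, so the likelihood ties at 499 and 500, and the stated estimate 500 attains the maximum):

```python
from fractions import Fraction
from math import comb

# Example 3.9: maximise P(M = m) over N by direct search.
r, s, m = 40, 50, 4

def pmf(N):
    # Exact hypergeometric probability as a Fraction, to detect ties.
    return Fraction(comb(r, m) * comb(N - r, s - m), comb(N, s))

best = max(pmf(N) for N in range(90, 2001))
argmax = [N for N in range(90, 2001) if pmf(N) == best]
# argmax == [499, 500]: floor(r*s/m) = 500 attains the maximum,
# with a tie at 499 because rs/m is an integer here.
```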
Example 3.10: An electrician buys components in packets of 10.
He examines three components chosen at random from a packet, and
accepts the packet only if the three chosen are faultless. If 30% of
packets contain 4 bad components and the other 70% contain just
one bad component, what proportion of packets does he reject? •
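The number of bad components among the three examined is hypergeometric, so P(all three good | k bad in the packet) = (10 − k choose 3)/(10 choose 3); combining the two packet types gives the rejection proportion. A short check:

```python
from math import comb

# Example 3.10: probability a packet with `bad` faulty components passes
# the inspection of 3 components drawn without replacement.
def p_accept(bad):
    return comb(10 - bad, 3) / comb(10, 3)

# 30% of packets have 4 bad components, 70% have 1.
p_reject = 0.3 * (1 - p_accept(4)) + 0.7 * (1 - p_accept(1))
# p_accept(4) = 20/120, p_accept(1) = 84/120, so p_reject = 0.46
```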
Distribution Function
Definition: Let X be a random variable. Its cumulative
distribution function (CDF) (fonction de repartition) is
FX(x) = P(X ≤ x), x ∈ R.
If X is discrete, this can be written as
FX(x) = ∑_{xi ∈ D : xi ≤ x} P(X = xi),
which is a step function with jumps at the support of fX, i.e.
{x ∈ R : fX(x) > 0}.
When there is no risk of confusion, we write F ≡ FX .
Example 3.11: Give the support, pmf, and distribution function
for a Bernoulli random variable. •
Example 3.12 (Die): Give the support, pmf, and distribution
function for the value obtained when a fair die is thrown. •
Definition: A discrete uniform random variable X has probability
mass function
fX(x) = 1/(b − a + 1), x = a, a + 1, . . . , b, a < b, a, b ∈ Z.
•
Definition: A Poisson random variable X has probability mass
function
fX(x) = λ^x e^(−λ)/x!, x = 0, 1, . . . , λ > 0.
We write X ∼ Pois(λ). •
Poisson Probability Mass Functions
[Figure: probability mass functions of Pois(0.5), Pois(1), Pois(4), and Pois(10), plotted against x.]
Properties of a Distribution Function
Theorem : Let (Ω, F, P) be a probability space and X : Ω → R a
random variable. Its cumulative distribution function FX satisfies:
(a) limx→−∞ FX(x) = 0;
(b) limx→∞ FX(x) = 1;
(c) FX is non-decreasing, that is, FX(x) ≤ FX(y) whenever x ≤ y;
(d) FX is continuous to the right, that is,
lim_{t↓0} FX(x + t) = FX(x), x ∈ R;
(e) P(X > x) = 1 − FX(x);
(f) if x < y, then P(x < X ≤ y) = FX(y) − FX(x).
Note: The pmf is obtained from the CDF by
f(x) = F(x) − lim_{y↑x} F(y).
In many cases X takes only integer values, and then
f(x) = F (x) − F (x − 1) for integer x.
Example 3.13 (Urn): An urn contains tickets labelled 1, . . . , n,
from which r are drawn at random. Let X be the largest number
removed if the tickets are replaced in the urn after each drawing, and
let Y be the largest number removed if the drawn tickets are not
replaced. Find fX(x), FX(x), fY (x), and FY (x). Show that
FY (k) < FX(k) for k = 1, . . . , n − 1. •
Example 3.14 (Poisson): Find FX(x) when X ∼ Pois(λ). •
Poisson Cumulative Distribution Functions
[Figure: cumulative distribution functions of Pois(0.5), Pois(1), Pois(4), and Pois(10), plotted against x.]
Transformations of Discrete Random Variables
Real-valued functions of random variables are themselves random
variables, so they too have probability mass functions.
Theorem : If X and Y are random variables such that Y = g(X),
then Y has probability mass function
fY(y) = ∑_{x : g(x) = y} fX(x). •
Example 3.15: Find the pmf of Y = I(X > 0) when X ∼ Pois(λ).
Example 3.16: Let Y be the remainder when the total from a throw
of two independent dice is divided by 4. Find the pmf of Y .
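Example 3.16 can be checked by pushing the pmf of the total through g(x) = x mod 4, i.e. summing fX(x) over {x : g(x) = y} as in the theorem above; a sketch in exact arithmetic:

```python
from itertools import product
from fractions import Fraction

# Example 3.16: pmf of Y = (total of two fair dice) mod 4, obtained by
# summing the probability 1/36 of each outcome into the class g(x) = y.
f_Y = {y: Fraction(0) for y in range(4)}
for d1, d2 in product(range(1, 7), repeat=2):
    f_Y[(d1 + d2) % 4] += Fraction(1, 36)
# f_Y = {0: 1/4, 1: 2/9, 2: 1/4, 3: 5/18}
```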
Mathematical Honesty
From now on we mostly ignore the underlying probability space
(Ω,F , P) when dealing with a random variable X and think in terms
of X , FX(x), and fX(x). It can be proved that this is
mathematically legitimate.
3.2 Expectation
Definition: Let X be a discrete random variable for which
∑_{x ∈ D} |x| fX(x) < ∞, where D is the support of fX. The
expectation (l'espérance) or mean of X is defined to be
E(X) = ∑_x x P(X = x) = ∑_{x ∈ D} x fX(x).
Note: E(X) is sometimes called the average value (la moyenne) of
X . We confine the use of ‘average’ to empirical quantities.
Example 3.17: Find the expected score on the throw of a fair die. •
Example 3.18: Find the means of the random variables with pmfs
fX(x) = 4/{x(x + 1)(x + 2)}, fY(x) = 1/{x(x + 1)}, x = 1, 2, . . . .
Example 3.19: Find the mean of a Bernoulli variable with
probability p. •
Example 3.20: Find the mean of X ∼ B(n, p). •
Theorem : Let X be a random variable with mass function f , and
let g be a real-valued function on R. Then
E{g(X)} = ∑_x g(x) f(x),
whenever ∑_x |g(x)| f(x) < ∞. •
Example 3.21: Let X ∼ Pois(λ). Find the expectations of
X, X(X − 1), X(X − 1) · · · (X − r + 1), cos(θX).
•
Note: Expectation is analogous to the idea from mechanics of the
centre of mass of an object whose mass is distributed according to fX .
Properties of Expectation
Theorem : Let X be a random variable with finite mean E(X),
and let a, b be constants. Then
(a) E(·) is a linear operator, i.e. E(aX + b) = aE(X) + b;
(b) if P(X = b) = 1, then E(X) = b;
(c) if P(a < X ≤ b) = 1, then a < E(X) ≤ b;
(d) if g(X) and h(X) have finite means, then
E{g(X) + h(X)} = E{g(X)} + E{h(X)};
(e) finally, {E(X)}^2 ≤ {E(|X|)}^2 ≤ E(X^2).
•
Note: The linearity of expectation is extremely useful in practice.
Example 3.22: Let X = I1 + · · · + In, where I1, . . . , In are
independent Bernoulli variables with probability p. Find E(X). Is
independence of the Ii needed? •
Example 2.16 (Matching, ctd): Show that the expected number
of men who leave with the correct hats is 1, for all n. •
Example 3.23 (Indicator random variables): Let I_A, I_B, . . .
denote indicators of events A, B, . . .. Show that
I_{A∩B} = I_A I_B, I_{A∪B} = 1 − (1 − I_A)(1 − I_B), E(I_A) = P(A),
and hence establish the inclusion-exclusion formulae. •
Moments of a Distribution
Definition: If X has a pmf f(x) such that ∑_x |x|^r f(x) < ∞, then
(a) the rth moment of X is E(X^r);
(b) the rth central moment of X is E[{X − E(X)}^r];
(c) the rth factorial moment of X is E{X(X − 1) · · · (X − r + 1)};
(d) the variance of X is var(X) = E[{X − E(X)}^2].
Note: Of these the mean and variance are the most important, as they
measure the location and spread of fX. The variance is analogous to
the moment of inertia in mechanics.
Example 3.24: Find the variance of the score when a die is cast. •
Properties of Variance
Theorem : Let X be a random variable whose variance exists, and
let a, b be constants. Then
var(X) = E(X^2) − {E(X)}^2 = E{X(X − 1)} + E(X) − {E(X)}^2;
var(aX + b) = a2var(X);
var(X) = 0 ⇒ X is constant with probability 1.
Example 3.25: Find the various moments of a Poisson random
variable. •
Theorem : If X takes values in {0, 1, . . .}, r ≥ 2, and E(X) < ∞,
then
E(X) = ∑_{x=1}^∞ P(X ≥ x),
E{X(X − 1) · · · (X − r + 1)} = r ∑_{x=r}^∞ (x − 1) · · · (x − r + 1) P(X ≥ x).
•
Example 3.26: Let X ∼ Geom(p). Find E(X) and var(X). •
Example 3.27 (Coupons): Each packet of some product is
equally likely to contain any one of n different types of coupon,
independently of every other packet. What is the expected number of
packets you must buy to obtain at least one of each type of coupon?•
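In Example 3.27, once i distinct coupon types have been collected, the wait for a new type is Geom((n − i)/n) with mean n/(n − i); by linearity of expectation the answer is n(1 + 1/2 + · · · + 1/n). A sketch with an illustrative n = 6:

```python
# Example 3.27 (coupon collector): sum the mean geometric waiting times
# for the 1st, 2nd, ..., n-th distinct coupon type.
n = 6
expected = sum(n / (n - i) for i in range(n))   # = n * (1 + 1/2 + ... + 1/n)
# for n = 6 this is 6 * 49/20 = 14.7 packets
```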
3.3 Conditional Distributions
Definition: Let (Ω,F , P) be a probability space, on which a
random variable X is defined, and let B ∈ F . Then the conditional
probability mass function of X given B is
fX(x | B) = P(X = x | B) = P(Ax ∩ B)/P(B),
where Ax = {ω ∈ Ω : X(ω) = x}.
Theorem : The function fX(x | B) satisfies
fX(x | B) ≥ 0, ∑_x fX(x | B) = 1,
and so is a well-defined probability mass function. •
Example 3.28: Find the conditional pmf of the result of tossing a
die, given that the result is odd. •
Example 3.29: Find the conditional pmf of X ∼ Geom(p), given
that X ≤ n. •
Definition: Suppose that ∑_x |g(x)| fX(x | B) < ∞. Then the
conditional expectation of g(X) given B is
E{g(X) | B} = ∑_x g(x) fX(x | B).
Theorem : Let X be a random variable with mean E(X) and let B
be an event with P(B), P(Bc) > 0. Then
E(X) = E(X | B)P(B) + E(X | Bc)P(Bc).
More generally, whenever {Bi}_{i=1}^∞ is a partition of Ω, P(Bi) > 0 for
all i, and the sum is absolutely convergent,
E(X) = ∑_{i=1}^∞ E(X | Bi) P(Bi).
Example 3.30: The truncated Poisson distribution is defined by
taking X ∼ Pois(λ) and B = {X > 0}. Find the conditional
probability mass function, mean, and variance for this distribution. •
Example 3.31: A coin is tossed repeatedly. Find the expected
numbers of tosses to the first head, and to the first two consecutive
heads. •
Example 3.32: Bilbo the hobbit and Smaug the dragon have b and
s gold coins respectively. They play a series of independent games in
which the loser gives the winner a gold coin, stopping when one of
them has no coins remaining. If Bilbo wins each game with
probability p (and p ≠ q = 1 − p), find the expected number of games
before they stop.
They then redivide the b + s coins by tossing them all. One player
gets those showing a head, and the other player gets the rest. Now
they play as before. What is the expected number of games until one
or other player has all the coins? •
3.4 Convergence of Distributions
In applications we often want to approximate one distribution by
another. The mathematical basis for doing so is provided by
convergence of distributions.
Example 3.33 (Law of small numbers): Let Xn ∼ B(n, p),
and suppose that np → λ > 0 as n → ∞. Show that the limiting
probability mass function of Xn is Pois(λ). •
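The convergence in Example 3.33 can be seen numerically: holding np fixed at an illustrative λ = 4 and letting n grow, the B(n, λ/n) pmf approaches the Pois(4) pmf pointwise; a sketch:

```python
from math import comb, exp, factorial

# Law of small numbers: B(n, lam/n) pmf vs Pois(lam) pmf for large n.
lam = 4.0

def binom(x, n):
    p = lam / n
    return comb(n, x) * p**x * (1 - p) ** (n - x)

pois = lambda x: lam**x * exp(-lam) / factorial(x)

# Maximum pointwise discrepancy over the bulk of the support.
err = max(abs(binom(x, 10_000) - pois(x)) for x in range(30))
# err is small (roughly of order lam**2 / n) and shrinks as n grows
```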
Example 3.34 (Matching, again): In Example 2.16 we saw that
the probability of exactly r fixed points in a random permutation of
n objects is
(1/r!) ∑_{k=0}^{n−r} (−1)^k/k! → e^{−1}/r! as n → ∞.
Thus the number of fixed points has a limiting Pois(1) distribution. •
Law of Small Numbers
[Figure: probability mass functions of B(10, 0.5), B(20, 0.25), B(50, 0.1), and the limiting Pois(5), plotted against x.]
Definition: Let f(x) be a probability mass function which is
non-zero for x ∈ D, and zero for x ∈ R\D = C. Let F (x) be the
corresponding distribution function
F(x) = ∑_{xi ≤ x} f(xi).
A sequence of distribution functions Fn(x) is said to converge to
F (x) if
Fn(x) → F (x) for x ∈ C as n → ∞.
The corresponding random variables Xn are then said to converge
in distribution (or in law) to a random variable X , that is,
Xn →D X, where Xn has distribution function Fn and X has
distribution function F . •
If D ⊂ Z, then Fn(x) → F (x) if fn(x) → f(x) for all x as n → ∞.
Example 3.35: Let XN have hypergeometric probability mass
function
P(XN = i) = (m choose i)(N − m choose n − i) / (N choose n), i = max(0, m + n − N), . . . , min(m, n).
This is the distribution of the number of white balls obtained when a
random sample of size n is taken without replacement from an urn
containing m white and N − m black balls. Show that as N, m → ∞
in such a way that m/N → p, where 0 < p < 1,
P(XN = i) → (n choose i) p^i (1 − p)^(n−i), i = 0, . . . , n.
Thus the limiting distribution of XN is B(n, p). •
Inequalities
Theorem (Basic inequality): If h(x) is a non-negative function,
then for a > 0,
P{h(X) ≥ a} ≤ E{h(X)}/a.
Theorem : Let a > 0 and let g be a convex function. Then:
P(|X| ≥ a) ≤ E(|X|)/a, (Markov's inequality)
P(|X| ≥ a) ≤ E(X^2)/a^2, (Chebyshov's inequality)
P{X − E(X) ≥ a} ≤ var(X)/{a^2 + var(X)}, (one-sided Chebyshov's inequality)
E{g(X)} ≥ g{E(X)}. (Jensen's inequality)
Theorem : If var(X) = 0, then X is constant with probability one.
3.5 Moment Generating Functions
Definition: The moment generating function of a random
variable X is defined as
MX(t) = E(etX),
for t ∈ R such that MX(t) < ∞.
Example 3.36: Find MX(t) when: (a) X is an indicator random
variable; (b) X ∼ B(n, p); (c) X ∼ Pois(λ).
Theorem : There is a one-one correspondence between distribution
functions FX(x) and moment generating functions MX(t).
Example 3.37: Let X ∼ B(n, p) and Y ∼ Pois(λ). Show that as
n → ∞, p → 0 in such a way that np → λ, X →D Y, that is, X
converges in distribution to Y.