
Page 1: BTRY 4080 / STSCI 4080, Fall 2009


BTRY 4080 / STSCI 4080, Fall 2009

Theory of Probability

Instructor: Ping Li

Department of Statistical Science

Cornell University

Page 2: BTRY 4080 / STSCI 4080, Fall 2009


General Information

• Lectures: Tue, Thu 10:10-11:25 am, Malott Hall 253

• Section 1: Mon 2:55 - 4:10 pm, Warren Hall 245

• Section 2: Wed 1:25 - 2:40 pm, Warren Hall 261

• Instructor: Ping Li, [email protected],

Office Hours: Tue, Thu 11:25 am - 12 pm, 1192 Comstock Hall

• TA: Xiao Luo, [email protected]. Office hours:

(1) Mon, 4:10 - 5:10pm Warren Hall 245;

(2) Wed, 2:40 - 3:40pm, Warren Hall 261.

• Prerequisites: Math 213 or 222 or equivalent; a course in statistical methods

is recommended

• Textbook: Ross, A First Course in Probability, 7th edition, 2006

Page 3: BTRY 4080 / STSCI 4080, Fall 2009


• Exams: Locations TBD

– Prelim 1: In Class, September 29, 2009

– Prelim 2: In Class, October 20, 2009

– Prelim 3: In Class, TBD

– Final Exam: Wed. December 16, 2009, from 2pm to 4:30pm.

– Policy: Closed book, closed notes

Page 4: BTRY 4080 / STSCI 4080, Fall 2009


• Homework: Weekly

– Please turn in your homework either in class or to BSCB front desk

(Comstock Hall, 1198).

– No late homework will be accepted.

– Before computing your overall homework grade, the assignment with the
lowest grade (if ≥ 25%) will be dropped; the one with the second-lowest
grade (if ≥ 50%) will also be dropped.

– It is the students’ responsibility to keep copies of the submitted homework.

Page 5: BTRY 4080 / STSCI 4080, Fall 2009


• Grading:

1. The homework: 30%

2. The prelim with the lowest grade: 5%

3. The other two prelims: 15% each

4. The final exam: 35%

• Course Letter Grade Assignment

A ≈ 90% (in the absolute scale)

C ≈ 60% (in the absolute scale)

In borderline cases, participation in section and class interactions will be used as

a determining factor.

Page 6: BTRY 4080 / STSCI 4080, Fall 2009


Material

Topic                                        Textbook
Combinatorics                                1.1 - 1.6
Axioms of Probability                        2.1 - 2.5
Conditional Probability and Independence     3.1 - 3.5
Discrete Random Variables                    4.1 - 4.9
Continuous Random Variables                  5.1 - 5.7
Jointly Distributed Random Variables         6.1 - 6.7
Expectation                                  7.1 - 7.9

As time permits:
Limit Theorems                               8.1 - 8.6
Poisson Processes and Markov Chains          9.1 - 9.2
Simulation                                   10.1 - 10.4

Page 7: BTRY 4080 / STSCI 4080, Fall 2009


Probability Is Increasingly Popular & Truly Useful

An Example: Probabilistic Counting

Consider two sets S1, S2 ⊂ Ω = {0, 1, 2, ..., D − 1}, for example

D = 2000

S1 = {0, 15, 31, 193, 261, 495, 563, 919}
S2 = {5, 15, 19, 193, 209, 325, 563, 1029, 1921}

Q: What is the size of the intersection, a = |S1 ∩ S2|?

A: Easy! a = |{15, 193, 563}| = 3.

Page 8: BTRY 4080 / STSCI 4080, Fall 2009


Challenges

• What if the size of the space is D = 2^64?

• What if there are 10 billion sets, instead of just 2?

• What if we only need approximate answers?

• What if we need the answers really quickly?

• What if we are mostly interested in sets that are highly similar?

• What if the sets are dynamic and frequently updated?

• ...

Page 9: BTRY 4080 / STSCI 4080, Fall 2009


Search Engines

Google/Yahoo!/Bing have each collected at least 10^10 (10 billion) web pages.

Two examples (among many tasks)

• (Very quickly) Determine if two Web pages are near-duplicates

A document can be viewed as a collection (set) of words or contiguous words.

This is a set-intersection counting problem.

• (Very quickly) Determine the number of co-occurrences of two words

A word can be viewed as the set of documents containing that word.

This is also a set-intersection counting problem.

Page 10: BTRY 4080 / STSCI 4080, Fall 2009


Term Document Matrix

[Figure: an n × m binary term-document matrix; rows are Word 1, Word 2, ..., Word n, columns are Doc 1, Doc 2, ..., Doc m, and each entry is 1 if the word occurs in the document, 0 otherwise.]

≥ 10^10 documents (Web pages).

≥ 10^5 common English words. (What about the total number of contiguous words?)

Page 11: BTRY 4080 / STSCI 4080, Fall 2009


One Method for Probabilistic Counting of Intersections

Consider two sets S1, S2 ⊂ Ω = {0, 1, 2, ..., D − 1}

S1 = {0, 15, 31, 193, 261, 495, 563, 919}
S2 = {5, 15, 19, 193, 209, 325, 563, 1029, 1921}

One can first generate a random permutation π:

π : Ω = {0, 1, 2, ..., D − 1} −→ {0, 1, 2, ..., D − 1}

Apply π to sets S1 and S2 (and all other sets). For example

π(S1) = {13, 139, 476, 597, 698, 1355, 1932, 2532}
π(S2) = {13, 236, 476, 683, 798, 1456, 1968, 2532, 3983}

Page 12: BTRY 4080 / STSCI 4080, Fall 2009


Store only the minimums

min π(S1) = 13,   min π(S2) = 13

Suppose we can theoretically calculate the probability

P = Pr (minπ (S1) = minπ (S2))

S1 and S2 are more similar =⇒ larger P (i.e., closer to 1).

Generate k (a small number of) random permutations and each time store the minimums.

Count the number of matches to estimate the similarity between S1 and S2.

Page 13: BTRY 4080 / STSCI 4080, Fall 2009


Later, we can easily show that

P = Pr(min π(S1) = min π(S2)) = |S1 ∩ S2| / |S1 ∪ S2| = a / (f1 + f2 − a)

where a = |S1 ∩ S2|, f1 = |S1|, f2 = |S2|.

Therefore, we can estimate P as a binomial probability from k permutations.
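For concreteness, a minimal Matlab sketch (added here, not from the slides; D = 2000 and the sets are from the earlier example) that estimates P by simulation:

D = 2000; k = 500;                         % k random permutations
S1 = [0 15 31 193 261 495 563 919];
S2 = [5 15 19 193 209 325 563 1029 1921];
matches = 0;
for t = 1:k
    perm = randperm(D);                    % random permutation of {1, ..., D}
    if min(perm(S1+1)) == min(perm(S2+1))  % map the (0-based) elements, keep minimums
        matches = matches + 1;
    end
end
Phat = matches / k                         % should be near a/(f1+f2-a) = 3/14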

Page 14: BTRY 4080 / STSCI 4080, Fall 2009


Combinatorial Analysis

Ross: Chapter 1, Sections 1.1 - 1.6

Page 15: BTRY 4080 / STSCI 4080, Fall 2009


Basic Principles of Counting, Section 1.2

1. The multiplicative principle

A project requires two sub-tasks, A and B.

There are m ways to complete A, and there are n ways to complete B.

Q: How many different ways to complete this project?

2. The additive principle

A project can be completed either by doing task A, or by doing task B.

There are m ways to complete A, and there are n ways to complete B.

Q: How many different ways to complete this project?

Page 16: BTRY 4080 / STSCI 4080, Fall 2009


The Multiplicative Principle

[Figure: Origin → stage A → stage B → Destination, with choices 1, 2, ..., m at stage A and choices 1, 2, ..., n at stage B.]

Q: How many different routes from the origin to reach the destination?

Route 1:   A:1 + B:1
Route 2:   A:1 + B:2
...
Route n:   A:1 + B:n
Route n+1: A:2 + B:1
...
Route ?:   A:m + B:n

Page 17: BTRY 4080 / STSCI 4080, Fall 2009


The Additive Principle

[Figure: from the Origin one can reach the Destination either via one of the routes A:1, A:2, ..., A:m or via one of the routes B:1, B:2, ..., B:n.]

Q: How many different routes from the origin to reach the destination?

Page 18: BTRY 4080 / STSCI 4080, Fall 2009


Example:

A college planning committee consists of 3 freshmen, 4 sophomores, 5 juniors,

and 2 seniors. A subcommittee of 4, consisting of 1 person from each class, is to

be chosen.

Q: How many different subcommittees are possible?

—————–

Multiplicative or additive?
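A (worked out; not on the slide): multiplicative — 3 × 4 × 5 × 2 = 120 possible subcommittees.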

Page 19: BTRY 4080 / STSCI 4080, Fall 2009


Example:

In a 7-place license plate, the first 3 places are to be occupied by letters and the

final 4 by numbers, e.g., ITH-0827.

Q: How many different 7-place license plates are possible?

—————–

Multiplicative or additive?
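A (worked out; not on the slide): multiplicative — 26 × 26 × 26 × 10 × 10 × 10 × 10 = 175,760,000 plates.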

Page 20: BTRY 4080 / STSCI 4080, Fall 2009


Example:

How many license plates would be possible if repetition among letters or numbers

were prohibited?
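A (worked out; not on the slide): 26 × 25 × 24 × 10 × 9 × 8 × 7 = 78,624,000 plates.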

Page 21: BTRY 4080 / STSCI 4080, Fall 2009


Permutations and Combinations, Section 1.3, 1.4

Need to select 3 letters from the 26 English letters to make a license plate.

Q: How many different arrangements are possible?

———————

The answer differs if

• Order of 3 letters matters =⇒ Permutation

e.g., ITH, TIH, HIT, are all different.

• Order of 3 letters does not matter =⇒ Combination

Page 22: BTRY 4080 / STSCI 4080, Fall 2009


Permutations

Basic formula :

Suppose we have n objects. How many different permutations are there?

n × (n − 1) × (n − 2) × ... × 1 = n!

Why?

Page 23: BTRY 4080 / STSCI 4080, Fall 2009


[Figure: positions 1, 2, 3, 4, ..., n, with n choices for position 1, n−1 choices for position 2, n−2 choices for position 3, ..., 1 choice for position n.]

Which counting principle? Multiplicative or additive?

Page 24: BTRY 4080 / STSCI 4080, Fall 2009


Example:

A baseball team consists of 9 players.

Q: How many different batting orders are possible?
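A (worked out; not on the slide): 9! = 362,880 batting orders.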

Page 25: BTRY 4080 / STSCI 4080, Fall 2009


Example:

A class consists of 6 men and 4 women. The students are ranked according to

their performance in the final, assuming no ties.

Q: How many different rankings are possible?

Q: If the women are ranked among themselves and the men among themselves

how many different rankings are possible?
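A (worked out; not on the slide): 10! = 3,628,800 overall rankings; if the women and men are ranked among themselves, 6! × 4! = 17,280.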

Page 26: BTRY 4080 / STSCI 4080, Fall 2009


Example:

How many different letter arrangements can be formed using P,E,P ?

The brute-force approach

• Step 1: Permutations assuming P1 and P2 are different.

P1 P2 E    P1 E P2
P2 P1 E    P2 E P1
E P1 P2    E P2 P1

• Step 2: Removing duplicates, assuming P1 and P2 are the same.

P P E    P E P    E P P

Q: How many duplicates?
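A (worked out; not on the slide): each arrangement appears 2! = 2 times, so there are 3!/2! = 3 distinct arrangements.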

Page 27: BTRY 4080 / STSCI 4080, Fall 2009


Example:

How many different letter arrangements can be formed using P,E,P,P,E,R ?

Answer:

6! / (3! × 2! × 1!) = 60

Why?

Page 28: BTRY 4080 / STSCI 4080, Fall 2009


A More General Example of Permutations:

How many different permutations are there of n objects, of which n1 are
identical, n2 are identical, ..., and nr are identical?

• Step 1: Assuming all n objects are distinct, there are n! permutations.

• Step 2: For each permutation, there are n1! × n2! × ... × nr! ways to

switch objects and still remain the same permutation.

• Step 3: The total number of (unique) permutations are

n! / (n1! × n2! × ... × nr!)

Page 29: BTRY 4080 / STSCI 4080, Fall 2009


1 1 1 2 3 4 5 6 7 8 9 10

How many ways to switch the elements and remain the same permutation?
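A (worked out; not on the slide): the three identical 1's can be switched among themselves in 3! = 6 ways.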

Page 30: BTRY 4080 / STSCI 4080, Fall 2009


Example:

A signal consists of 9 flags hung in a line, which can be made from a set of 4

white flags, 3 red flags, and 2 blue flags. All flags of the same color are identical.

Q: How many different signals are possible?
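A (worked out; not on the slide): 9! / (4! × 3! × 2!) = 1260 different signals.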

Page 31: BTRY 4080 / STSCI 4080, Fall 2009


Combinations

Basic question of combination: The number of different groups of r

objects that could be formed from a total of n objects.

Basic formula of combination:

C(n, r) = n! / ((n − r)! × r!)

Pronounced as “n-choose-r”

Page 32: BTRY 4080 / STSCI 4080, Fall 2009


Basic Formula of Combination

Step 1: When orders are relevant, the number of different groups of r objects
formed from n objects = n × (n − 1) × (n − 2) × ... × (n − r + 1)

[Figure: positions 1, 2, 3, 4, ..., r, with n choices, n−1 choices, n−2 choices, ..., n−r+1 choices.]

Step 2: When orders are NOT relevant, each group of r objects can be
formed in r! different ways.

Page 33: BTRY 4080 / STSCI 4080, Fall 2009


Step 3: Therefore, when orders are NOT relevant, the number of different groups

of r objects would be

n × (n − 1) × ... × (n − r + 1) / r!

= [n × (n − 1) × ... × (n − r + 1) × (n − r) × (n − r − 1) × ... × 1] / [r! × (n − r) × (n − r − 1) × ... × 1]

= n! / ((n − r)! × r!)

= C(n, r)

Page 34: BTRY 4080 / STSCI 4080, Fall 2009


Example:

A committee of 3 is to be formed from a group of 20 people.

Q: How many different committees are possible?
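A (worked out; not on the slide): C(20, 3) = 1140 committees.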

Page 35: BTRY 4080 / STSCI 4080, Fall 2009


Example:

How many different committees consisting of 2 women and 3 men can be formed,

from a group of 5 women and 7 men?
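A (worked out; not on the slide): C(5, 2) × C(7, 3) = 10 × 35 = 350 committees.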

Page 36: BTRY 4080 / STSCI 4080, Fall 2009


Example (1.4.4c):

Consider a set of n antennas of which m are defective and n−m are functional.

Q: How many linear orderings are there in which no two defectives are

consecutive?

A: C(n − m + 1, m).

Line up n − m functional antennas. Insert (at most) one defective antenna

between two functional antennas. Considering the boundaries, there are in total

n − m + 1 places to insert defective antennas.

Page 37: BTRY 4080 / STSCI 4080, Fall 2009


Useful Facts about the Binomial Coefficient C(n, r)

1. C(n, 0) = 1, and C(n, n) = 1. Recall 0! = 1.

2. C(n, r) = C(n, n − r)

3. C(n, r) = C(n − 1, r − 1) + C(n − 1, r)

Proof 1, by algebra:

C(n − 1, r − 1) + C(n − 1, r) = (n−1)!/((n−r)!(r−1)!) + (n−1)!/((n−r−1)!r!)

= (n−1)!(r + n − r)/((n−r)!r!) = n!/((n−r)!r!) = C(n, r)

Proof 2, by a combinatorial argument:

A selection of r objects either contains object 1 or does not contain object 1;
there are C(n − 1, r − 1) selections of the first kind and C(n − 1, r) of the second.

Page 38: BTRY 4080 / STSCI 4080, Fall 2009


4. Some computational tricks

• Avoid computing the factorial n! directly; it easily becomes very large: 170! ≈ 7.2574 × 10^306.

• Take advantage of C(n, r) = C(n, n − r).

• Convert products to additions (see the Matlab sketch below):

C(n, r) = n(n − 1)(n − 2)...(n − r + 1) / r! = exp( Σ_{j=0}^{r−1} log(n − j) − Σ_{j=1}^{r} log j )

• Use accurate approximations of the log factorial function (log n!). Check

http://en.wikipedia.org/wiki/Factorial.
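A minimal Matlab sketch of these tricks (an assumption of this note: Matlab's gammaln, with log n! = gammaln(n+1)):

lognck = @(n, k) gammaln(n+1) - gammaln(k+1) - gammaln(n-k+1);  % log C(n, k)
n = 1e6; r = 3;
C = exp(lognck(n, r))   % safe as long as the final value itself fits in a double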

Page 39: BTRY 4080 / STSCI 4080, Fall 2009


The Binomial Theorem

The Binomial Theorem:

(x + y)^n = Σ_{k=0}^{n} C(n, k) x^k y^(n−k)

Example for n = 2:

(x + y)^2 = x^2 + 2xy + y^2;   C(2, 0) = C(2, 2) = 1,   C(2, 1) = 2

Page 40: BTRY 4080 / STSCI 4080, Fall 2009


Proof of the Binomial Theorem by Induction

Two key components in proof by induction:

1. Verify a base case, e.g., n = 1.

2. Assume the case n − 1 is true, prove for the case n.

Proof:

The base case is trivially true (but always check!).

Assume

(x + y)^(n−1) = Σ_{k=0}^{n−1} C(n − 1, k) x^k y^(n−1−k)

Page 41: BTRY 4080 / STSCI 4080, Fall 2009


Then

Σ_{k=0}^{n} C(n, k) x^k y^(n−k)

= Σ_{k=1}^{n−1} C(n, k) x^k y^(n−k) + x^n + y^n

= Σ_{k=1}^{n−1} C(n − 1, k) x^k y^(n−k) + Σ_{k=1}^{n−1} C(n − 1, k − 1) x^k y^(n−k) + x^n + y^n

= y Σ_{k=0}^{n−1} C(n − 1, k) x^k y^(n−1−k) − y^n + Σ_{k=1}^{n−1} C(n − 1, k − 1) x^k y^(n−k) + x^n + y^n

= y(x + y)^(n−1) + Σ_{j=0}^{n−2} C(n − 1, j) x^(j+1) y^(n−1−j) + x^n   (let j = k − 1)

= y(x + y)^(n−1) + x Σ_{j=0}^{n−1} C(n − 1, j) x^j y^(n−1−j) − x^n + x^n

= y(x + y)^(n−1) + x(x + y)^(n−1) = (x + y)^n

Page 42: BTRY 4080 / STSCI 4080, Fall 2009


Multinomial Coefficients

A set of n distinct items is to be divided into r distinct groups of respective sizes
n1, n2, ..., nr, where Σ_{i=1}^{r} ni = n.

Q: How many different divisions are possible?

Answer:

C(n; n1, n2, ..., nr) = n! / (n1! n2! ... nr!)

Page 43: BTRY 4080 / STSCI 4080, Fall 2009


Proof

Step 1: There are C(n, n1) ways to form the first group of size n1.

Step 2: There are C(n − n1, n2) ways to form the second group of size n2.

...

Step r: There are C(n − n1 − n2 − ... − nr−1, nr) ways to form the last group of size nr.

Therefore, the total number of divisions should be

C(n, n1) × C(n − n1, n2) × ... × C(n − n1 − n2 − ... − nr−1, nr)

= n!/((n − n1)! n1!) × (n − n1)!/((n − n1 − n2)! n2!) × ... × (n − n1 − ... − nr−1)!/((n − n1 − ... − nr−1 − nr)! nr!)

= n! / (n1! n2! ... nr!)

Page 44: BTRY 4080 / STSCI 4080, Fall 2009


Example:

A police department in a small city consists of 10 officers. The policy is to have 5

patrolling the streets, 2 working full time at the station, and 3 on reserve at the

station.

Q: How many different divisions of 10 officers into 3 groups are possible?
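A (worked out; not on the slide): C(10; 5, 2, 3) = 10! / (5! 2! 3!) = 2520 divisions.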

Page 45: BTRY 4080 / STSCI 4080, Fall 2009


The Number of Integer Solutions of Equations

Suppose x1, x2, ..., xr are positive integers, and x1 + x2 + ... + xr = n.

Q: How many distinct vectors (x1, x2, ..., xr) satisfy the constraints?

A: C(n − 1, r − 1)

Line up n 1's. Choose r − 1 of the n − 1 gaps between them to draw dividing lines, forming r groups (the numbers).

Page 46: BTRY 4080 / STSCI 4080, Fall 2009


Suppose x1, x2, ..., xr are non-negative integers, and x1 + x2 + ... + xr = n.

Q: How many distinct vectors (x1, x2, ..., xr) satisfy the constraints?

A: C(n + r − 1, r − 1),

by padding r zeros after the n ones.

One can also let yi = xi + 1, i = 1 to r. Then the problem becomes finding

positive integers (y1, ..., yr) so that y1 + y2 + ... + yr = n + r.

Page 47: BTRY 4080 / STSCI 4080, Fall 2009


Example:

8 identical blackboards are to be divided among 4 schools.

Q (a): How many divisions are possible?

Q (b): How many, if each school must receive at least 1 blackboard?
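A (worked out; not on the slide): (a) C(8 + 4 − 1, 4 − 1) = C(11, 3) = 165; (b) C(8 − 1, 4 − 1) = C(7, 3) = 35.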

Page 48: BTRY 4080 / STSCI 4080, Fall 2009


More Examples for Chapter 1

From a group of 8 women and 6 men a committee consisting of 3 men and 3

women is to be formed.

(a): How many different committees are possible?

(b): How many, if 2 of the men refuse to serve together?

(c): How many, if 2 of the women refuse to serve together?

(d): How many, if 1 man and 1 woman refuse to serve together?

Page 49: BTRY 4080 / STSCI 4080, Fall 2009


Solution:

(a): How many different committees are possible?

C(8, 3) × C(6, 3) = 56 × 20 = 1120

(b): How many, if 2 of the men refuse to serve together?

C(8, 3) × [C(4, 3) + C(2, 1) C(4, 2)] = 56 × 16 = 896

(c): How many, if 2 of the women refuse to serve together?

[C(6, 3) + C(2, 1) C(6, 2)] × C(6, 3) = 50 × 20 = 1000

(d): How many, if 1 man and 1 woman refuse to serve together?

C(7, 3) C(5, 3) + C(7, 3) C(5, 2) + C(7, 2) C(5, 3) = 350 + 350 + 210 = 910

Or C(8, 3) C(6, 3) − C(7, 2) C(5, 2) = 910

Page 50: BTRY 4080 / STSCI 4080, Fall 2009


Fermat’s Combinatorial Identity

Theoretical Exercise 11.

C(n, k) = Σ_{i=k}^{n} C(i − 1, k − 1),   n ≥ k

There is an interesting proof (different from the book) using the result for the

number of integer solutions of equations, by a recursive argument.

Page 51: BTRY 4080 / STSCI 4080, Fall 2009


Proof:

Let f(m, r) be the number of positive integer solutions of

x1 + x2 + ... + xr = m.

We know f(m, r) = C(m − 1, r − 1).

Suppose m = n + 1 and r = k + 1. Then

f(n + 1, k + 1) = C(m − 1, r − 1) = C(n, k)

x1 can take values from 1 to (n + 1) − k, because xi ≥ 1, i = 1 to k + 1.

Decompose the problem into (n + 1) − k sub-problems by fixing a value for x1.

If x1 = 1, then x2 + x3 + ... + xk+1 = m − 1 = n.

The total number of positive integer solutions is f(n, k).

Page 52: BTRY 4080 / STSCI 4080, Fall 2009


If x1 = 1, there are f(n, k) possibilities, i.e., C(n − 1, k − 1).

If x1 = 2, there are f(n − 1, k) possibilities, i.e., C(n − 2, k − 1).

...

If x1 = (n + 1) − k, there are f(k, k) possibilities, i.e., C(k − 1, k − 1).

Thus, by the additive counting principle,

f(n + 1, k + 1) = Σ_{j=1}^{n+1−k} f(n + 1 − j, k) = Σ_{j=1}^{n+1−k} C(n − j, k − 1)

But, is this what we want? Yes!

Page 53: BTRY 4080 / STSCI 4080, Fall 2009


By the additive counting principle,

f(n + 1, k + 1) = Σ_{j=1}^{n+1−k} f(n + 1 − j, k)

= Σ_{j=1}^{n+1−k} C(n − j, k − 1)

= C(n − 1, k − 1) + C(n − 2, k − 1) + ... + C(k − 1, k − 1)

= C(k − 1, k − 1) + C(k, k − 1) + ... + C(n − 1, k − 1)

= Σ_{i=k}^{n} C(i − 1, k − 1)

Page 54: BTRY 4080 / STSCI 4080, Fall 2009


Example: A company has 50 employees, 10 of whom are in management

positions.

Q: How many ways are there to choose a committee consisting of 4 people, if

there must be at least one person of management rank?

A:

C(50, 4) − C(40, 4) = 138910 = Σ_{k=1}^{4} C(10, k) C(40, 4 − k)

———-

Can we also compute the answer using C(10, 1) × C(49, 3) = 184240? No.

It is not correct because the two sub-tasks, C(10, 1) and C(49, 3), are overlapping:
a committee with, say, two managers is counted twice, once for each manager singled
out in the first step. Hence we cannot simply apply the multiplicative principle.

Page 55: BTRY 4080 / STSCI 4080, Fall 2009


Axioms of Probability

Chapter 2, Sections 2.1 - 2.5, 2.7

Page 56: BTRY 4080 / STSCI 4080, Fall 2009


Section 2.1

In this chapter we introduce the concept of the probability of an event and then

show how these probabilities can be computed in certain situations. As a

preliminary, however, we need the concept of the sample space and the events of

an experiment.

Page 57: BTRY 4080 / STSCI 4080, Fall 2009


Sample Space and Events, Section 2.2

Definition of Sample Space:

The sample space of an experiment, denoted by S, is the set of all possible

outcomes of the experiment.

Example:

S = {girl, boy}, if the outcome of an experiment is the determination of the gender of a newborn.

Example:

If the experiment consists of flipping two coins (H/T), then the sample space is

S = {(H, H), (H, T), (T, H), (T, T)}

Page 58: BTRY 4080 / STSCI 4080, Fall 2009


Example:

If the outcome of an experiment is the order of finish in a race among 7 horses
having post positions 1, 2, 3, 4, 5, 6, 7, then

S = {all 7! permutations of (1, 2, 3, 4, 5, 6, 7)}

Example:

If the experiment consists of tossing two dice, then

S = {(i, j) : i, j = 1, 2, 3, 4, 5, 6}. What is the size |S|?

Example:

If the experiment consists of measuring (in hours) the lifetime of a transistor, then

S = {t : 0 ≤ t < ∞}.

Page 59: BTRY 4080 / STSCI 4080, Fall 2009


Definition of Events:

Any subset E of the sample space S is an event.

Example:

S = {(H, H), (H, T), (T, H), (T, T)},

E = {(H, H)} is an event. E = {(H, T), (T, T)} is also an event.

Example:

S = {t : 0 ≤ t < ∞},

E = {t : 1 ≤ t ≤ 5} is an event,

But E = {t : −10 ≤ t ≤ 5} is NOT an event.

Page 60: BTRY 4080 / STSCI 4080, Fall 2009


Set Operations on Events

Let A, B, C be events of the same sample space S

A ⊂ S, B ⊂ S, C ⊂ S. Notation for subset: ⊂

Set Union:

A ∪ B = the event which occurs if either A OR B occurs.

Example:

S = {1, 2, 3, 4, 5, ...} = the set of all positive integers.

A = {1}, B = {1, 3, 5}, C = {2, 4, 6},

A ∪ B = {1, 3, 5}, B ∪ C = {1, 2, 3, 4, 5, 6},

A ∪ B ∪ C = {1, 2, 3, 4, 5, 6}

Page 61: BTRY 4080 / STSCI 4080, Fall 2009


Set Intersection:

A ∩ B = AB = the event which occurs if both A AND B occur.

Example:

S = {1, 2, 3, 4, 5, ...} = the set of all positive integers.

A = {1}, B = {1, 3, 5}, C = {2, 4, 6},

A ∩ B = {1}, B ∩ C = ∅, A ∩ B ∩ C = ∅

Mutually Exclusive:

If A ∩ B = ∅, then A and B are mutually exclusive.

Page 62: BTRY 4080 / STSCI 4080, Fall 2009


Set Complement:

A^c = S − A = the event which occurs only if A does not occur.

Example:

S = {1, 2, 3, 4, 5, ...} = the set of all positive integers.

A = {1, 3, 5, ...} = the set of all positive odd integers.

B = {2, 4, 6, ...} = the set of all positive even integers.

A^c = S − A = B,   B^c = S − B = A.

Page 63: BTRY 4080 / STSCI 4080, Fall 2009


Laws of Set Operations

Commutative Laws:

A ∪ B = B ∪ A,   A ∩ B = B ∩ A

Associative Laws:

(A ∪ B) ∪ C = A ∪ (B ∪ C) = A ∪ B ∪ C

(A ∩ B) ∩ C = A ∩ (B ∩ C) = A ∩ B ∩ C = ABC

Distributive Laws:

(A ∪ B) ∩ C = (A ∩ C) ∪ (B ∩ C) = AC ∪ BC

(A ∩ B) ∪ C = (A ∪ C) ∩ (B ∪ C)

These laws can be verified by Venn diagrams.

Page 64: BTRY 4080 / STSCI 4080, Fall 2009


DeMorgan’s Laws

( ⋃_{i=1}^{n} Ei )^c = ⋂_{i=1}^{n} Ei^c

( ⋂_{i=1}^{n} Ei )^c = ⋃_{i=1}^{n} Ei^c

Page 65: BTRY 4080 / STSCI 4080, Fall 2009


Proof of ( ⋃_{i=1}^{n} Ei )^c = ⋂_{i=1}^{n} Ei^c

Let A = ( ⋃_{i=1}^{n} Ei )^c and B = ⋂_{i=1}^{n} Ei^c.

1. Prove A ⊆ B:

Suppose x ∈ A. =⇒ x does not belong to any Ei.
=⇒ x belongs to every Ei^c. =⇒ x ∈ ⋂_{i=1}^{n} Ei^c = B.

2. Prove B ⊆ A:

Suppose x ∈ B. =⇒ x belongs to every Ei^c.
=⇒ x does not belong to any Ei. =⇒ x ∈ ( ⋃_{i=1}^{n} Ei )^c = A.

3. Consequently, A = B.

Page 66: BTRY 4080 / STSCI 4080, Fall 2009


Proof of ( ⋂_{i=1}^{n} Ei )^c = ⋃_{i=1}^{n} Ei^c

Let Fi = Ei^c.

We have proved ( ⋃_{i=1}^{n} Fi )^c = ⋂_{i=1}^{n} Fi^c. Therefore,

( ⋂_{i=1}^{n} Ei )^c = ( ⋂_{i=1}^{n} Fi^c )^c = ( ( ⋃_{i=1}^{n} Fi )^c )^c = ⋃_{i=1}^{n} Fi = ⋃_{i=1}^{n} Ei^c

Page 67: BTRY 4080 / STSCI 4080, Fall 2009


Axioms of Probability, Section 2.3

For each event E in the sample space S, there exists a value P (E), referred to

as the probability of E.

P (E) satisfies three axioms.

Axiom 1: 0 ≤ P (E) ≤ 1

Axiom 2: P (S) = 1

Axiom 3: For any sequence of mutually exclusive events E1, E2, ... (that is, EiEj = ∅ whenever i ≠ j),

P( ⋃_{i=1}^{∞} Ei ) = Σ_{i=1}^{∞} P(Ei)

Page 68: BTRY 4080 / STSCI 4080, Fall 2009


Consequences of Three Axioms

Consequence 1: P (∅) = 0.

P (S) = P (S ∪ ∅) = P (S) + P (∅) =⇒ P (∅) = 0

Consequence 2:

If S = ⋃_{i=1}^{n} Ei and the mutually exclusive events Ei are equally likely, meaning

P(E1) = P(E2) = ... = P(En), then

P(Ei) = 1/n,   i = 1 to n

Page 69: BTRY 4080 / STSCI 4080, Fall 2009


Some Simple Propositions, Section 2.4

Proposition 1: P(E^c) = 1 − P(E)

Proof: S = E ∪ E^c =⇒ 1 = P(S) = P(E) + P(E^c)

Proposition 2: If E ⊂ F, then P(E) ≤ P(F)

Proof:

E ⊂ F =⇒ F = E ∪ (E^c ∩ F)

=⇒ P(F) = P(E) + P(E^c ∩ F) ≥ P(E)

Page 70: BTRY 4080 / STSCI 4080, Fall 2009


Proposition 3: P(E ∪ F) = P(E) + P(F) − P(EF)

Proof:

E ∪ F = (E − EF) ∪ (F − EF) ∪ EF, a union of three mutually exclusive sets.

=⇒ P(E ∪ F) = P(E − EF) + P(F − EF) + P(EF)

= P(E) − P(EF) + P(F) − P(EF) + P(EF)

= P(E) + P(F) − P(EF)

Note that E = EF ∪ (E − EF), a union of two mutually exclusive sets.

Page 71: BTRY 4080 / STSCI 4080, Fall 2009


A Generalization of Proposition 3:

P(E ∪ F ∪ G) = P(E) + P(F) + P(G) − P(EF) − P(EG) − P(FG) + P(EFG)

Proof:

Let B = F ∪ G. Then P(E ∪ B) = P(E) + P(B) − P(EB)

P(B) = P(F ∪ G) = P(F) + P(G) − P(FG)

P(EB) = P(E(F ∪ G)) = P(EF ∪ EG) = P(EF) + P(EG) − P(EF·EG) = P(EF) + P(EG) − P(EFG)

Page 72: BTRY 4080 / STSCI 4080, Fall 2009


Proposition 4: P (E1 ∪ E2 ∪ ... ∪ En) = ?

Read this proposition and its proof yourself.

We will temporarily skip this proposition and the relevant examples 5m, 5n, and 5o.

However, we will come back to that material at a later time.

Page 73: BTRY 4080 / STSCI 4080, Fall 2009


Example:

A total of 36 members of a club play tennis, 28 play squash, and 18 play

badminton. Furthermore, 22 of the members play both tennis and squash, 12

play both tennis and badminton, 9 play both squash and badminton, and 4 play all

three sports.

Q: How many members of this club play at least one of these sports?

Page 74: BTRY 4080 / STSCI 4080, Fall 2009


Solution:

A = members playing tennis
B = members playing squash
C = members playing badminton

|A ∪ B ∪ C| = ??
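A (worked out; not on the slide): by inclusion-exclusion, |A ∪ B ∪ C| = 36 + 28 + 18 − 22 − 12 − 9 + 4 = 43 members.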

Page 75: BTRY 4080 / STSCI 4080, Fall 2009


Sample Space Having Equally Likely Outcomes, Sec. 2.5

Assumption: In an experiment, all outcomes in the sample space S are equally

likely to occur.

For example, S = {1, 2, 3, ..., N}. Assuming equally likely outcomes, then

P({i}) = 1/N,   i = 1 to N

For any event E:

P(E) = (number of outcomes in E) / (number of outcomes in S) = |E| / |S|

Page 76: BTRY 4080 / STSCI 4080, Fall 2009


Example: If two dice are rolled, what is the probability that the sum of the

upturned faces will equal 7?

S = {(i, j) : i, j = 1, 2, 3, 4, 5, 6}

E = ??
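A (worked out; not on the slide): E = {(1, 6), (2, 5), (3, 4), (4, 3), (5, 2), (6, 1)}, so P(E) = 6/36 = 1/6.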

Page 77: BTRY 4080 / STSCI 4080, Fall 2009


Example: If 3 balls are randomly drawn from a bag containing 6 white and 5

black balls, what is the probability that one of the drawn balls is white and the

other two black?

S = ??

E = ??
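A (worked out; not on the slide): |S| = C(11, 3) = 165, |E| = C(6, 1) × C(5, 2) = 60, so P = 60/165 = 4/11.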

Page 78: BTRY 4080 / STSCI 4080, Fall 2009


Example: An urn contains n balls, of which one is special. If k of these balls
are withdrawn randomly, what is the probability that the special ball is chosen?
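A (worked out; not on the slide): P = C(n − 1, k − 1) / C(n, k) = k/n.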

Page 79: BTRY 4080 / STSCI 4080, Fall 2009


Revisit Probabilistic Counting of Set Intersections

Consider two sets S1, S2 ⊂ Ω = {0, 1, 2, ..., D − 1}. For example,

S1 = {0, 15, 31, 193, 261, 495, 563, 919}
S2 = {5, 15, 19, 193, 209, 325, 563, 1029, 1921}

One can first generate a random permutation π:

π : Ω = {0, 1, 2, ..., D − 1} −→ {0, 1, 2, ..., D − 1}

Apply π to sets S1 and S2. For example

π(S1) = {13, 139, 476, 597, 698, 1355, 1932, 2532}
π(S2) = {13, 236, 476, 683, 798, 1456, 1968, 2532, 3983}

Page 80: BTRY 4080 / STSCI 4080, Fall 2009


Store only the minimums

min π(S1) = 13,   min π(S2) = 13

Suppose we can theoretically calculate the probability

P = Pr (minπ (S1) = minπ (S2))

S1 and S2 are more similar =⇒ larger P (i.e., closer to 1).

Generate k (a small number of) random permutations and each time store the minimums.

Count the number of matches to estimate the similarity between S1 and S2.

Page 81: BTRY 4080 / STSCI 4080, Fall 2009


Now, we can easily (How?) show that

P = Pr(min π(S1) = min π(S2)) = |S1 ∩ S2| / |S1 ∪ S2| = a / (f1 + f2 − a)

where a = |S1 ∩ S2|, f1 = |S1|, f2 = |S2|.

Therefore, we can estimate P as a binomial probability from k permutations.

How many permutations (i.e., how large a k) do we need?

That depends on the variance of this binomial distribution.

Page 82: BTRY 4080 / STSCI 4080, Fall 2009


Poker Example (2.5.5f): A poker hand consists of 5 cards. If the cards have

distinct consecutive values and are not all of the same suit, we say that the hand

is a straight. What is the probability that one is dealt a straight?

A 2 3 4 5 6 7 8 9 10 J Q K A
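A (worked out; not on the slide): there are 10 possible runs of 5 consecutive values and 4^5 suit assignments for each, minus the 40 straight flushes: P = (10 × 4^5 − 40) / C(52, 5) = 10200 / 2598960 ≈ 0.0039.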

Page 83: BTRY 4080 / STSCI 4080, Fall 2009


Example: A 5-card poker hand is a full house if it consists of 3 cards of the

same denomination and 2 cards of the same denomination. (That is, a full house

is three of a kind plus a pair.) What is the probability that one is dealt a full house?
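A (worked out; not on the slide): P = 13 × C(4, 3) × 12 × C(4, 2) / C(52, 5) = 3744 / 2598960 ≈ 0.0014.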

Page 84: BTRY 4080 / STSCI 4080, Fall 2009


Example (The Birthday Attack): If n people are present in a room, what is the

probability that no two of them celebrate their birthday on the same day of the

year (365 days)? How large need n be so that this probability is less than 1/2?

Page 85: BTRY 4080 / STSCI 4080, Fall 2009


Example (The Birthday Attack): If n people are present in a room, what is the

probability that no two of them celebrate their birthday on the same day of the

year? How large need n be so that this probability is less than 1/2?

p(n) = (365 × 364 × ... × (365 − n + 1)) / 365^n

= (365/365) × (364/365) × ... × ((365 − n + 1)/365)

p(23) = 0.4927 < 1/2,   p(50) = 0.0296

Page 86: BTRY 4080 / STSCI 4080, Fall 2009


[Figure: plot of p(n) versus n, for n = 0 to 60 on the x-axis and p(n) from 0 to 1 on the y-axis; p(n) decreases from 1, crossing 1/2 near n = 23.]

Page 87: BTRY 4080 / STSCI 4080, Fall 2009


Matlab Code

n = 5:60;
for i = 1:length(n)
    p(i) = 1;
    for j = 2:n(i)
        p(i) = p(i) * (365 - j + 1) / 365;  % multiply in the j-th person's factor
    end
end
figure; plot(n, p, 'linewidth', 2); grid on; box on;
xlabel('n'); ylabel('p(n)');

How to improve the efficiency for computing many n’s?
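One possibility (a sketch, not from the slides): all n share the same prefix products, so cumprod computes every p(n) in one pass:

n = 5:60;
q = cumprod((365 - (1:max(n)) + 1) / 365);  % q(m) = prod_{j=1}^{m} (365-j+1)/365
p = q(n);                                   % p(n) = q(n); no double loop needed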

Page 88: BTRY 4080 / STSCI 4080, Fall 2009


Example: A football team consists of 20 offensive and 20 defensive players.

The players are to be paired into groups of 2 for the purpose of determining

roommates. The pairing is done at random.

(a) What is the probability that there are no offensive-defensive roommate pairs?

(b) What is the probability that there are 2i offensive-defensive roommate pairs?

Denote the answer to (b) by P2i. The answer to (a) is then P0.

Page 89: BTRY 4080 / STSCI 4080, Fall 2009


Solution Steps:

1. Total ways of dividing the 40 players into 20 unordered pairs.

2. Ways of pairing 2i offensive-defensive pairs.

3. Ways of pairing 20 − 2i remaining players within each group.

4. Applying multiplicative principle and equally likely outcome assumption.

5. Obtaining numerical probability values.

Page 90: BTRY 4080 / STSCI 4080, Fall 2009


Solution Steps:

1. Total ways of dividing the 40 players into 20 unordered pairs.

ordered: C(40; 2, 2, ..., 2) = 40! / (2!)^20

unordered: 40! / ((2!)^20 (20)!)

2. Ways of pairing 2i offensive-defensive pairs.

C(20, 2i) × C(20, 2i) × (2i)!

3. Ways of pairing the 20 − 2i remaining players within each group.

[ (20 − 2i)! / (2^(10−i) (10 − i)!) ]^2

Page 91: BTRY 4080 / STSCI 4080, Fall 2009


4. Applying the multiplicative principle and the equally likely outcome assumption.

P_2i = [C(20, 2i)]^2 (2i)! [ (20 − 2i)! / (2^(10−i) (10 − i)!) ]^2 / ( 40! / (2^20 20!) )

5. Obtaining numerical probability values.

Stirling’s approximation: n! ≈ n^(n+1/2) e^(−n) √(2π)

Be careful!

• Better approximations are available.

• n! could easily overflow, and so could C(n, k).

• The probability never overflows: P ∈ [0, 1].

• Look for possible cancelations.

• Rearrange terms in the numerator and denominator to avoid overflow (see the sketch below).
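A sketch of step 5 in Matlab (assuming the P_2i formula above; gammaln gives log n! = gammaln(n+1), so everything stays in log space):

lognck = @(n, k) gammaln(n+1) - gammaln(k+1) - gammaln(n-k+1);   % log C(n, k)
i = 0:10;
logP = 2*lognck(20, 2*i) + gammaln(2*i + 1) ...
     + 2*(gammaln(20 - 2*i + 1) - (10 - i)*log(2) - gammaln(10 - i + 1)) ...
     - (gammaln(41) - 20*log(2) - gammaln(21));
P2i = exp(logP);        % P2i(1) is P_0, the answer to (a); sum(P2i) should be 1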

Page 92: BTRY 4080 / STSCI 4080, Fall 2009


Conditional Probability and Independence

Chapter 3, Sections 3.1 - 3.5

Page 93: BTRY 4080 / STSCI 4080, Fall 2009


Introduction, Section 3.1

Conditional Probability: one of the most important concepts in probability:

• Calculating probabilities when partial (side) information is available.

• Computing desired probabilities may be easier.

Page 94: BTRY 4080 / STSCI 4080, Fall 2009


A Famous Brain-Teaser

• There are 3 doors.

• Behind one door is a prize.

• Behind the other 2 is nothing.

• First, you pick one of the 3 doors at random.

• One of the doors that you did not pick will be opened to reveal nothing.

• Q: Do you stick with the door you chose in the first place, or do you switch?
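A (the standard answer; not stated on the slide): switch — the first pick wins with probability 1/3, so switching wins with probability 2/3.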

Page 95: BTRY 4080 / STSCI 4080, Fall 2009


A Joke: Tom & Jerry Conversation

• Jerry: I’ve heard that you will take a plane next week. Are you afraid that
there might be terrorist activities?

• Tom: Yes. That’s exactly why I am prepared.

• Jerry: How?

• Tom: It is reported that the chance of having a bomb on a plane is 1/10^6.
According to what I learned in my probability class (4080), the chance of
having two bombs on the plane will be 1/10^12.

• Jerry: So?

• Tom: Therefore, to dramatically reduce the chance from 1/10^6 to 1/10^12, I will
bring one bomb on the plane myself!

• Jerry: AH...

Page 96: BTRY 4080 / STSCI 4080, Fall 2009


Section 3.2, Conditional Probabilities

Example: Suppose we toss a fair die twice.

• What is the probability that the sum of the 2 dice is 8?

Sample space S = ??
Event E = ??

• What if we know the first die is 3?

(Reduced) S = ??   (Reduced) E = ??

Page 97: BTRY 4080 / STSCI 4080, Fall 2009


The Basic Formula of Conditional Probability

Two events E, F ⊆ S.

P (E|F ): conditional probability that E occurs given that F occurs.

If P (F ) > 0, then

P(E|F) = P(EF) / P(F)

Page 98: BTRY 4080 / STSCI 4080, Fall 2009


The Basic Formula of Conditional Probability

P(E|F) = P(EF) / P(F)

How to understand this formula? — Venn diagram

Given that F occurs,

• What is the new (i.e., reduced) sample space S′?

• What is the new (i.e., reduced) event E′?

• P(E|F) = |E′| / |S′| = (|E′|/|S|) / (|S′|/|S|) = ??

Page 99: BTRY 4080 / STSCI 4080, Fall 2009


Example (3.2.2c):

In the card game bridge the 52 cards are dealt out equally to 4 players — called

East, West, North, and South. If North and South have a total of 8 spades among

them, what is the probability that East has 3 of the remaining 5 spades?

The reduced sample space S′ = ??
The reduced event E′ = ??

Page 100: BTRY 4080 / STSCI 4080, Fall 2009


A transformation of the basic formula

P(E|F) = P(EF) / P(F)

=⇒

P (EF ) = P (E|F ) × P (F ) = P (F |E) × P (E)

P (E|F ) may be easier to compute than P (F |E), or vice versa.

Page 101: BTRY 4080 / STSCI 4080, Fall 2009


Example:

A total of n balls are sequentially and randomly chosen, without replacement,

from an urn containing r red and b blue balls (n ≤ r + b).

Given that k of the n balls are blue, what is the probability that the first ball

chosen is blue?

Two solutions:

1. Working with the reduced sample space — Easy!

Basically a problem of choosing one ball from n balls randomly.

2. Working with the original sample space and basic formula — More difficult!

Page 102: BTRY 4080 / STSCI 4080, Fall 2009


Working with original sample space:

Event B: the first ball chosen is blue.

Event Bk: a total of k blue balls are chosen.

Desired probability :

P(B|Bk) = P(BBk) / P(Bk) = P(Bk|B) P(B) / P(Bk)

P(B) = ??   P(Bk) = ??   P(Bk|B) = ??

Page 103: BTRY 4080 / STSCI 4080, Fall 2009


Desired probability:

P(B|Bk) = P(BBk) / P(Bk) = P(Bk|B) P(B) / P(Bk)

P(B) = b / (r + b),   P(Bk) = C(b, k) C(r, n − k) / C(r + b, n),   P(Bk|B) = C(b − 1, k − 1) C(r, n − k) / C(r + b − 1, n − 1)

Therefore,

P(B|Bk) = P(Bk|B) P(B) / P(Bk)

= [ C(b − 1, k − 1) C(r, n − k) / C(r + b − 1, n − 1) ] × [ b / (r + b) ] × [ C(r + b, n) / ( C(b, k) C(r, n − k) ) ]

= k / n

Page 104: BTRY 4080 / STSCI 4080, Fall 2009


Example (3.2.2f):

Suppose an urn contains 8 red balls and 4 white balls. We draw 2 balls from the

urn without replacement. Assume that at each draw each ball in the urn is equally

likely to be chosen.

Q: What is the probability that both balls drawn are red?

Two solutions:

1. The direct approach

2. The conditional probability approach

Page 105: BTRY 4080 / STSCI 4080, Fall 2009


Two solutions:

1. The direct approach:

C(8, 2) / C(12, 2) = 14/33

2. The conditional probability approach:

(8/12) × (7/11) = 14/33

Page 106: BTRY 4080 / STSCI 4080, Fall 2009


The Multiplication Rule

An extremely important formula in science

P (E1E2E3...En) = P (E1)P (E2|E1)P (E3|E1E2)...P (En|E1E2...En−1)

Proof?
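A (one-line sketch; not on the slide): write each factor as a ratio, P(E1) × P(E1E2)/P(E1) × P(E1E2E3)/P(E1E2) × ...; the product telescopes to P(E1E2...En).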

Page 107: BTRY 4080 / STSCI 4080, Fall 2009


Example (3.2.2g):

An ordinary deck of 52 playing cards is randomly divided into 4 piles of 13 cards

each.

Q: Compute the probability that each pile has exactly 1 ace.

A: Two solutions:

1. Direct approach.

2. Conditional probability (as in the textbook).

Page 108: BTRY 4080 / STSCI 4080, Fall 2009


Solution by the direct approach

• The size of the sample space:

C(52, 13) C(39, 13) C(26, 13) = 52! / (13! × 13! × 13! × 13!) = 52! / (13!)^4

• Fix the 4 aces, one in each pile.

• Then we only need to select 12 more cards for each pile (from the 48 remaining cards):

C(48, 12) C(36, 12) C(24, 12) = 48! / (12!)^4

• Consider the 4! assignments of the aces to the piles.

• The desired probability:

[ C(48, 12) C(36, 12) C(24, 12) / ( C(52, 13) C(39, 13) C(26, 13) ) ] × 4! = (48! / 52!) × ( (13!)^4 / (12!)^4 ) × 4! = 0.1055

Page 109: BTRY 4080 / STSCI 4080, Fall 2009


Solution using the conditional probability

Define

• E1 = The ace of spades is in any one of the piles

• E2 = The ace of spades and the ace of hearts are in different piles

• E3 = The ace of spades, hearts, and diamonds are all in different piles

• E4 = All four aces are in different piles

The desired probability

P(E1E2E3E4) = P(E1) P(E2|E1) P(E3|E1E2) P(E4|E1E2E3)

Page 110: BTRY 4080 / STSCI 4080, Fall 2009


• P(E1) = 1

• P(E2|E1) = 39/51

• P(E3|E1E2) = 26/50

• P(E4|E1E2E3) = 13/49

• P(E1E2E3E4) = (39/51) × (26/50) × (13/49) = 0.1055.

Page 111: BTRY 4080 / STSCI 4080, Fall 2009


Section 3.3, Bayes’ Formula

The basic Bayes’ formula:

P(E) = P(E|F) P(F) + P(E|F^c) (1 − P(F))

Proof:

• By Venn diagram. Or

• By algebra using the conditional probability formula

Page 112: BTRY 4080 / STSCI 4080, Fall 2009


Example (3.3.3a):

An accident-prone person will have an accident within a fixed one-year period

with probability 0.4. This probability decreases to 0.2 for a non-accident-prone

person. Suppose 30% of the population is accident prone.

Q 1: What is the probability that a random person will have an accident within a

fixed one-year period?

Q 2: Suppose a person has an accident within a year. What is the probability that

he/she is accident prone?
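A (worked out; not on the slide): Q1: 0.4 × 0.3 + 0.2 × 0.7 = 0.26. Q2: (0.4 × 0.3) / 0.26 = 6/13 ≈ 0.46.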

Page 113: BTRY 4080 / STSCI 4080, Fall 2009


Example:

Let p be the probability that the student knows the answer and 1 − p the
probability that the student guesses. Assume that a student who guesses will be
correct with probability 1/m, where m is the total number of multiple-choice
alternatives.

Q: What is the conditional probability that a student knew the answer, given that

he/she answered it correctly?

Solution:

Let C = event that the student answers the question correctly.

Let K = event that he/she actually knows the answer.

P (K|C) =??

Page 114: BTRY 4080 / STSCI 4080, Fall 2009


Solution:

Let C = event that the student answers the question correctly.

Let K = event that he/she actually knows the answer.

P(K|C) = P(KC) / P(C) = P(C|K) P(K) / ( P(C|K) P(K) + P(C|K^c) P(K^c) )

= (1 × p) / ( 1 × p + (1/m)(1 − p) )

= mp / ( 1 + (m − 1)p )

————

What happens when m is large?
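A (worked out; not on the slide): as m grows, P(K|C) = mp/(1 + (m − 1)p) → 1 — with many alternatives, guessing correctly is so unlikely that a correct answer almost surely reflects knowledge.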

Page 115: BTRY 4080 / STSCI 4080, Fall 2009


Example (3.3.3d):

A lab blood test is 95% effective in detecting a certain disease when the disease is
present. However, the test also yields a false positive result for 1% of the
healthy people tested. Assume 0.5% of the population actually has the disease.

Q: What is the probability that a person has the disease given that the test result

is positive?

Solution:

Let D = event that the tested person has the disease.

Let E = event that the test result is positive.

P (D|E) = ??
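A (worked out; not on the slide): P(D|E) = (0.95 × 0.005) / (0.95 × 0.005 + 0.01 × 0.995) = 0.00475/0.0147 ≈ 0.323.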

Page 116: BTRY 4080 / STSCI 4080, Fall 2009


The Generalized Bayes’ Formula

Suppose F1, F2, ..., Fn are mutually exclusive events such that ⋃_{i=1}^{n} Fi = S. Then

P(E) = Σ_{i=1}^{n} P(E|Fi) P(Fi),

which is easy to understand from the Venn diagram.

P(Fj|E) = P(EFj) / P(E) = P(E|Fj) P(Fj) / Σ_{i=1}^{n} P(E|Fi) P(Fi)

Page 117: BTRY 4080 / STSCI 4080, Fall 2009


Example (3.3.3m):

A bin contains 3 different types of disposable flashlights. The probability that a

type 1 flashlight will give over 100 hours of use is 0.7, with the corresponding

probabilities for type 2 and type 3 flashlights being 0.4 and 0.3 respectively.

Suppose that 20% of the flashlights in the bin are type 1, and 30% are type 2,

and 50% are type 3.

Q 1: What is the probability that a randomly chosen flashlight will give more than

100 hours of use?

Q 2: Given the flashlight lasted over 100 hours, what is the conditional probability

that it was type j?

Page 118: BTRY 4080 / STSCI 4080, Fall 2009


Q 1: What is the probability that a randomly chosen flashlight will give more than

100 hours of use?

0.7 × 0.2 + 0.4 × 0.3 + 0.3 × 0.5 = 0.14 + 0.12 + 0.15 = 0.41

Q 2: Given the flashlight lasted over 100 hours, what is the conditional probability

that it was type j?

Type 1: 0.14/0.41 = 0.3415

Type 2: 0.12/0.41 = 0.2927

Type 3: 0.15/0.41 = 0.3659

Page 119: BTRY 4080 / STSCI 4080, Fall 2009


Independent Events

Definition:

Two events E and F are said to be independent if

P (EF ) = P (E)P (F )

Consequently, if E and F are independent, then

P (E|F ) = P (E), P (F |E) = P (F ).

Independence is different from being mutually exclusive!

Page 120: BTRY 4080 / STSCI 4080, Fall 2009


Check Independence by Verifying P (EF ) = P (E)P (F )

Example: Two coins are flipped and all 4 outcomes are equally likely. Let E be

the event that the first lands heads and F the event that the second lands tails.

Verify that E and F are independent.

Example: Suppose we toss 2 fair dice. Let E denote the event that the sum of

two dice is 6 and F denote the event that the first die equals 4.

Verify that E and F are not independent.
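A (worked out; not on the slide): first example: P(E) = P(F) = 1/2 and P(EF) = P({(H, T)}) = 1/4 = P(E)P(F). Second example: P(E) = 5/36, P(F) = 1/6, but P(EF) = P({(4, 2)}) = 1/36 ≠ (5/36)(1/6).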

Page 121: BTRY 4080 / STSCI 4080, Fall 2009


Proposition 4.1:

If E and F are independent, then so are E and F c.

Proof:

Given P (EF ) = P (E)P (F )

Need to show P (EF c) = P (E)P (F c).

Which formula contains both P (EF ) and P (EF c)?
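A (worked out; not on the slide): E = EF ∪ EF^c, so P(E) = P(EF) + P(EF^c). Hence P(EF^c) = P(E) − P(E)P(F) = P(E)(1 − P(F)) = P(E)P(F^c).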

Page 122: BTRY 4080 / STSCI 4080, Fall 2009


Independence of Three Events

Definition:

The three events E, F, and G are said to be independent if

P (EFG) = P (E)P (F )P (G)

P (EF ) = P (E)P (F )

P (EG) = P (E)P (G)

P (FG) = P (F )P (G)

Page 123: BTRY 4080 / STSCI 4080, Fall 2009


Independence of More Than Three Events

Definition:

The events Ei, i = 1 to n, are said to be independent if, for every subcollection
E1, E2, ..., Er, r ≤ n, of these events,

P(E1 E2 ... Er) = P(E1) P(E2) ... P(Er)

Page 124: BTRY 4080 / STSCI 4080, Fall 2009


Example (3.4.4f):

An infinite sequence of independent trials is to be performed. Each trial results in

a success with probability p and a failure with probability 1 − p.

Q: What is the probability that

1. at least one success occurs in the first n trials.

2. exactly k successes occur in the first n trials.

3. all trials result in successes?

Page 125: BTRY 4080 / STSCI 4080, Fall 2009


Solution:

Q: What is the probability that

1. at least one success occurs in the first n trials?

1 − (1 − p)^n

2. exactly k successes occur in the first n trials?

C(n, k) p^k (1 − p)^(n−k)

How is this formula related to the binomial theorem?

3. all trials result in successes?

p^n → 0 if p < 1.

Page 126: BTRY 4080 / STSCI 4080, Fall 2009


Example (3.4.4h):

Independent trials, consisting of rolling a pair of fair dice, are performed. What is

the probability that an outcome of 5 appears before an outcome of 7 when the

outcome is the sum of the dice?

Solution 1:

Let En denote the event that no 5 or 7 appears on the first n − 1 trials and a 5

appears on the nth trial, then the desired probability is

P( ⋃_{n=1}^{∞} En ) = Σ_{n=1}^{∞} P(En)

Why are the events Ei and Ej mutually exclusive?

Page 127: BTRY 4080 / STSCI 4080, Fall 2009


P(En) = P(no 5 or 7 on the first n − 1 trials AND a 5 on the nth trial)

By independence,

P(En) = P(no 5 or 7 on the first n − 1 trials) × P(a 5 on the nth trial)

P(a 5 on the nth trial) = P(a 5 on any trial)

P(no 5 or 7 on the first n − 1 trials) = [P(no 5 or 7 on any trial)]^(n−1)

P(no 5 or 7 on any trial) = 1 − P(either a 5 OR a 7 on any trial)

P(either a 5 OR a 7 on any trial) = P(a 5 on any trial) + P(a 7 on any trial)

Page 128: BTRY 4080 / STSCI 4080, Fall 2009


P(En) = P(no 5 or 7 on the first n − 1 trials) × P(a 5 on the nth trial)

P(En) = (1 − 4/36 − 6/36)^(n−1) × (4/36) = (1/9) (13/18)^(n−1)

The desired probability:

P( ⋃_{n=1}^{∞} En ) = Σ_{n=1}^{∞} P(En) = (1/9) × 1 / (1 − 13/18) = 2/5.

Page 129: BTRY 4080 / STSCI 4080, Fall 2009


Solution 2 (using a conditional argument):

Let E = the desired event.

Let F = the event that the first trial is a 5.

Let G = the event that the first trial is a 7.

Let H = the event that the first trial is neither a 5 nor 7.

P (E) = P (E|F )P (F ) + P (E|G)P (G) + P (E|H)P (H)

Page 130: BTRY 4080 / STSCI 4080, Fall 2009


A General Result:

If E and F are mutually exclusive events of an experiment, then when

independent trials of this experiment are performed, E will occur before F with

probability

P(E) / ( P(E) + P(F) )

Page 131: BTRY 4080 / STSCI 4080, Fall 2009


The Gambler’s Ruin Problem (Example 3.4.4k)

Two gamblers, A and B, bet on the outcomes of successive flips of a coin. On

each flip, if the coin comes up head, A collects 1 unit from B, whereas if it comes

up tail, A pays 1 unit to B. They continue until one of them runs out of money.

If it is assumed that the successive flips of the coin are independent and each flip

results in a head with probability p, what is the probability that A ends up with all

the money if he starts with i units and B starts with N − i units?

Intuition: What is this probability if the coin is fair (i.e., p = 1/2)?
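A (intuition; confirmed by the limit at the end of the solution): i/N when p = 1/2.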

Page 132: BTRY 4080 / STSCI 4080, Fall 2009


Solution:

E = event that A ends with all the money when he starts with i.

H = event that first flip lands heads.

P_i = P(E) = P(E|H) P(H) + P(E|H^c) P(H^c)

P_0 = ??

P_N = ??

P(E|H) = ??

P(E|H^c) = ??

Page 133: BTRY 4080 / STSCI 4080, Fall 2009


P_i = P(E) = P(E|H) P(H) + P(E|H^c) P(H^c)

= p P_{i+1} + (1 − p) P_{i−1} = p P_{i+1} + q P_{i−1}

=⇒ P_{i+1} = (P_i − q P_{i−1}) / p =⇒ P_{i+1} − P_i = (q/p)(P_i − P_{i−1})

P_2 − P_1 = (q/p) P_1

P_3 − P_2 = (q/p)(P_2 − P_1) = (q/p)^2 P_1

...

P_i − P_{i−1} = (q/p)^(i−1) P_1

Page 134: BTRY 4080 / STSCI 4080, Fall 2009


P_i = P_1 [ 1 + (q/p) + (q/p)^2 + ... + (q/p)^(i−1) ]

= P_1 (1 − (q/p)^i) / (1 − (q/p))   if q ≠ p

Because P_N = 1, we know

P_1 = (1 − (q/p)) / (1 − (q/p)^N)   if q ≠ p

P_i = (1 − (q/p)^i) / (1 − (q/p)^N)   if q ≠ p

(Do we really have to worry about q = p = 1/2?)
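A (limit check; not on the slide): let r = q/p → 1. Then P_i = (1 − r^i)/(1 − r^N) → i/N, so the fair-coin case follows as a limit and needs no separate derivation.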

Page 135: BTRY 4080 / STSCI 4080, Fall 2009


The Matching Problem (Example 3.5.5d)

At a party n men take off their hats. The hats are then mixed up, and each man

randomly selects one. We say that a match occurs if a man selects his own hat.

Q: What is the probability of

(a) no matches;

(b) exactly k matches?

Solution (a): using recursion and conditional probabilities.

1. Let E = event that no match occurs. Denote P_n = P(E).
   Let M = event that the first man selects his own hat.

2. P_n = P(E) = P(E|M)P(M) + P(E|M^c)P(M^c)

3. Express P_n as a function of P_{n−1}, P_{n−2}, etc.

4. Identify the boundary conditions, e.g., P_1 = ?, P_2 = ?

5. Prove P_n, e.g., by induction.

Solution (a): using recursion and conditional probabilities.

1. Let E = event that no match occurs. Denote P_n = P(E).
   Let M = event that the first man selects his own hat.

2. P_n = P(E) = P(E|M)P(M) + P(E|M^c)P(M^c)
   P(E|M) = 0, P(M^c) = (n−1)/n, so P_n = ((n−1)/n) P(E|M^c)

3. Express P_n as a function of P_{n−1}, P_{n−2}, etc.
   P(E|M^c) = (1/(n−1)) P_{n−2} + P_{n−1}, so P_n = ((n−1)/n) P_{n−1} + (1/n) P_{n−2}.

4. Identify the boundary conditions.
   P_1 = 0, P_2 = 1/2.

5. Prove P_n by induction.
   P_n = 1/2! − 1/3! + 1/4! − ... + (−1)^n/n!

(What is a good approximation of P_n?)
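A short Matlab check of the recursion, comparing with exp(−1), the natural candidate for the approximation asked about above:

% Iterate P_n = (n-1)/n * P_{n-1} + 1/n * P_{n-2}, with P_1 = 0, P_2 = 1/2.
nmax = 12;
P = zeros(1, nmax); P(2) = 1/2;
for n = 3:nmax
    P(n) = (n-1)/n * P(n-1) + 1/n * P(n-2);
end
fprintf('P_%d = %.6f, exp(-1) = %.6f\n', nmax, P(nmax), exp(-1));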


Solution (b):

1. Fix a group of k men.

2. The probability that these k men select their own hats.

3. The probability that none of the remaining n − k men select their own hats.

4. Multiplicative counting principle.

5. Adjustment for selecting k men.

Solution (b):

1. Fix a group of k men.

2. The probability that these k men select their own hats:
   (1/n)(1/(n−1))(1/(n−2)) ... (1/(n−k+1)) = (n−k)!/n!

3. The probability that none of the remaining n − k men select their own hats:
   P_{n−k}

4. Multiplicative counting principle:
   ((n−k)!/n!) × P_{n−k}

5. Adjustment for selecting the k men:
   ((n−k)!/n!) P_{n−k} C(n, k)

P(·|F) Is A Probability

Conditional probabilities satisfy all the properties of ordinary probabilities.

Proposition 5.1

• 0 ≤ P(E|F) ≤ 1

• P(S|F) = 1

• If E_i, i = 1, 2, ..., are mutually exclusive events, then

  P(∪_{i=1}^∞ E_i | F) = Σ_{i=1}^∞ P(E_i|F)

Conditionally Independent Events

Definition:

Two events, E_1 and E_2, are said to be conditionally independent given event F if

P(E_1|E_2F) = P(E_1|F)

or, equivalently,

P(E_1E_2|F) = P(E_1|F)P(E_2|F)

Laplace's Rule of Succession (Example 3.5.5e)

There are k + 1 coins in a box. The ith coin will, when flipped, turn up heads with probability i/k, i = 0, 1, ..., k. A coin is randomly selected from the box and is then repeatedly flipped.

Q: If the first n flips all result in heads, what is the conditional probability that the (n + 1)st flip will do likewise?

Key: Conditional on the ith coin being selected, the outcomes of the flips are conditionally independent.

Solution:

Let C_i = event that the ith coin is selected.
Let F_n = event that the first n flips all result in heads.
Let H = event that the (n + 1)st flip is a head.

The desired probability:

P(H|F_n) = Σ_{i=0}^k P(H|F_nC_i) P(C_i|F_n)

By conditional independence, P(H|F_nC_i) = ??

Solve P(C_i|F_n) using Bayes' formula.

The desired probability:

P(H|F_n) = Σ_{i=0}^k P(H|F_nC_i) P(C_i|F_n)

By conditional independence, P(H|F_nC_i) = P(H|C_i) = i/k.

Using Bayes' formula,

P(C_i|F_n) = P(C_iF_n)/P(F_n) = P(F_n|C_i)P(C_i) / Σ_{j=0}^k P(F_n|C_j)P(C_j)
           = (i/k)^n (1/(k+1)) / Σ_{j=0}^k (j/k)^n (1/(k+1))

The final answer:

P(H|F_n) = Σ_{j=0}^k (j/k)^{n+1} / Σ_{j=0}^k (j/k)^n
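The answer is easy to evaluate numerically; for large k it approaches Laplace's classical value (n+1)/(n+2). A sketch with arbitrary example values of k and n:

k = 1000; n = 10; j = 0:k;
PH = sum((j/k).^(n+1)) / sum((j/k).^n);
fprintf('P(H|F_n) = %.4f, (n+1)/(n+2) = %.4f\n', PH, (n+1)/(n+2));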

Chapter 4: (Discrete) Random Variables

Definition:

A random variable is a function from the sample space S to the real numbers.

A discrete random variable is a random variable that takes on only a finite (or at most a countably infinite) number of values.

Example (1a):

Suppose the experiment consists of tossing 3 fair coins. Let Y denote the number of heads appearing; then Y is a discrete random variable taking on one of the values 0, 1, 2, 3 with probabilities

P(Y = 0) = P{(T,T,T)} = 1/8
P(Y = 1) = P{(H,T,T), (T,H,T), (T,T,H)} = 3/8
P(Y = 2) = P{(T,H,H), (H,T,H), (H,H,T)} = 3/8
P(Y = 3) = P{(H,H,H)} = 1/8

Check: P(∪_{i=0}^3 {Y = i}) = Σ_{i=0}^3 P{Y = i} = 1.

Example (1c)

Independent trials, consisting of the flipping of a coin having probability p of coming up heads, are continuously performed until either a head occurs or a total of n flips is made. If we let X denote the number of times the coin is flipped, then X is a random variable taking on one of the values 1, 2, ..., n.

P(X = 1) = P{H} = p
P(X = 2) = P{(T,H)} = (1 − p)p
P(X = 3) = P{(T,T,H)} = (1 − p)^2 p
...
P(X = n − 1) = (1 − p)^{n−2} p
P(X = n) = (1 − p)^{n−1}

Check: Σ_{i=1}^n P(X = i) = 1??

Example (1d)

3 balls are randomly chosen from an urn containing 3 white, 3 red, and 5 black balls. Suppose that we win $1 for each white ball selected and lose $1 for each red ball selected. Let X denote our total winnings from the experiment.

Q 1: What possible values can X take on?

Q 2: What are the respective probabilities?

Solution 1: What possible values can X take on?

X ∈ {3, 2, 1, 0, −1, −2, −3}.

Solution 2: What are the respective probabilities?

P(X = 0) = [ C(5,3) + C(3,1)C(3,1)C(5,1) ] / C(11,3) = 55/165

P(X = 3) = P(X = −3) = C(3,3) / C(11,3) = 1/165

The Coupon Collection Problem (Example 1e)

There are N distinct types of coupons, and each coupon obtained is equally likely to be any one of the N types, independent of prior selections.

Let the random variable T = the number of coupons that need to be collected until one obtains a complete set of at least one of each type.

Q: What is P(T = n)?

Solution:

1. It suffices to study P(T > n), because P(T = n) = P(T > n − 1) − P(T > n).

2. Define the event A_j = {no type j coupon is collected among the first n}.

3. P(T > n) = P(∪_{j=1}^N A_j)

Then what?

P(∪_{j=1}^N A_j) = Σ_j P(A_j) − Σ_{j1<j2} P(A_{j1}A_{j2})
                 + Σ_{j1<j2<j3} P(A_{j1}A_{j2}A_{j3}) − ... + (−1)^{N+1} P(A_1A_2...A_N)

P(A_j) = ((N−1)/N)^n

P(A_{j1}A_{j2}) = ((N−2)/N)^n

...

P(A_{j1}A_{j2}...A_{jk}) = ((N−k)/N)^n

...

P(A_{j1}A_{j2}...A_{j_{N−1}}) = (1/N)^n

P(A_1A_2...A_N) = 0

P(T > n) = P(∪_{j=1}^N A_j)
         = Σ_j P(A_j) − Σ_{j1<j2} P(A_{j1}A_{j2}) + Σ_{j1<j2<j3} P(A_{j1}A_{j2}A_{j3}) − ... + (−1)^{N+1} P(A_1A_2...A_N)
         = N ((N−1)/N)^n − C(N,2) ((N−2)/N)^n + C(N,3) ((N−3)/N)^n − ... + (−1)^N C(N,N−1) (1/N)^n
         = Σ_{j=1}^{N−1} C(N,j) ((N−j)/N)^n (−1)^{j+1}
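A minimal check of the inclusion-exclusion formula against a direct coupon simulation (N and n are arbitrary example values):

N = 6; n = 15; j = 1:N-1;
Pform = sum((-1).^(j+1) .* arrayfun(@(jj) nchoosek(N, jj), j) .* ((N - j)/N).^n);
trials = 1e5; miss = 0;
for t = 1:trials
    miss = miss + (numel(unique(randi(N, 1, n))) < N);   % some type still missing
end
fprintf('formula %.4f, simulation %.4f\n', Pform, miss/trials);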

Cumulative Distribution Function

For a random variable X, its cumulative distribution function F is defined as

F(x) = P{X ≤ x}

F(x) is a non-decreasing function: if x ≤ y, then F(x) ≤ F(y).

Section 4.2: Discrete Random Variables

Discrete random variable: a random variable X that takes on at most a countable number of possible values.

The probability mass function: p(a) = P{X = a}.

If X takes one of the values x_1, x_2, ..., then

p(x_i) ≥ 0,   Σ_{i=1}^∞ p(x_i) = 1

Example (4.2.2a): The probability mass function of a random variable X is given by p(i) = c λ^i / i!, i = 0, 1, 2, ....

Q: Find

• the relation between c and λ;

• P{X = 0};

• P{X > 2}.

Solution:

• The relation between c and λ:

  Σ_{i=0}^∞ p(i) = 1 ⇒ c Σ_{i=0}^∞ λ^i/i! = 1 ⇒ c e^λ = 1 ⇒ c = e^{−λ}.

• P{X = 0} = c = e^{−λ}.

• P{X > 2} = 1 − P{X = 0} − P{X = 1} − P{X = 2}
           = 1 − c − cλ − cλ^2/2
           = 1 − e^{−λ} − λe^{−λ} − λ^2 e^{−λ}/2.

Section 4.3: Expected Value

Definition:

If X is a discrete random variable having a probability mass function p(x), the expected value of X is defined by

E(X) = Σ_{x: p(x)>0} x p(x)

The expected value is a weighted average of the possible values that X can take on.

Example: Find E(X), where X is the outcome when we roll a fair die.

E(X) = Σ_{i=1}^6 i · (1/6) = (1/6) Σ_{i=1}^6 i = 7/2

Example: Let I be the indicator variable for event A:

I = 1 if A occurs; 0 if A^c occurs.

E(I) = ??

Example (3d): A school class of 120 students is driven in 3 buses to a symphony. There are 36 students in one of the buses, 40 in another, and 44 in the third bus. When the buses arrive, one of the 120 students is randomly chosen. Let X denote the number of students on the bus of that randomly chosen student; find E(X).

Solution:

X = ??  P(X = x) = ??

Example (4a)

P{X = −1} = 0.2, P{X = 0} = 0.5, P{X = 1} = 0.3.

Find E(X^2).

Solution: Let Y = X^2.

P{Y = 0} = P{X = 0} = 0.5

P{Y = 1} = P{X = 1 OR X = −1} = P{X = 1} + P{X = −1} = 0.3 + 0.2 = 0.5

Therefore,

E(X^2) = E(Y) = 0 × 0.5 + 1 × 0.5 = 0.5

Verify:

Σ_i x_i^2 P{X = x_i} = (−1)^2 P{X = −1} + (0)^2 P{X = 0} + (1)^2 P{X = 1} = 0.5 ??= E(X^2)

Section 4.4: Expectation of a Function of a Random Variable

Proposition 4.1: If X is a discrete random variable that takes on one of the values x_i, i ≥ 1, with respective probabilities p(x_i), then for any real-valued function g,

E(g(X)) = Σ_i g(x_i) p(x_i)

Proof: Group the i's with the same g(x_i).

Proof:

Group the i's with the same g(x_i):

Σ_i g(x_i) p(x_i) = Σ_j Σ_{i: g(x_i)=y_j} g(x_i) p(x_i)
                  = Σ_j y_j Σ_{i: g(x_i)=y_j} p(x_i)
                  = Σ_j y_j P{g(X) = y_j}
                  = E(g(X)).

Corollary 4.1: If a and b are constants, then

E(aX + b) = aE(X) + b.

Proof: Use the definition of E(X).

The nth moment:

E(X^n) = Σ_{x: p(x)>0} x^n p(x)

Section 4.5: Variance

The expectation E(X) measures the average of the possible values of X.

The variance Var(X) measures the variation, or spread, of these values.

Definition: If X is a random variable with mean µ, then the variance of X is defined by

Var(X) = E[(X − µ)^2]

An alternative formula for Var(X):

Var(X) = E(X^2) − (E(X))^2 = E(X^2) − µ^2

Proof:

Var(X) = E[(X − µ)^2]
       = Σ_x (x − µ)^2 p(x)
       = Σ_x (x^2 − 2µx + µ^2) p(x)
       = Σ_x x^2 p(x) − 2µ Σ_x x p(x) + µ^2 Σ_x p(x)
       = E(X^2) − 2µ^2 + µ^2
       = E(X^2) − µ^2.

Two important identities:

If a and b are constants, then

Var(a) = Var(b) = 0

Var(aX + b) = a^2 Var(X)

Proof: Use the definitions.

E(aX + b) = aE(X) + b, so

Var(aX + b) = E((aX + b) − E(aX + b))^2
            = E((aX + b) − aE(X) − b)^2
            = E(aX − aE(X))^2
            = a^2 E(X − E(X))^2
            = a^2 Var(X)

Section 4.6: Bernoulli and Binomial Random Variables

Bernoulli random variable: X ~ Bernoulli(p)

P{X = 1} = p(1) = p,
P{X = 0} = p(0) = 1 − p.

Properties:

E(X) = p

Var(X) = p(1 − p)

Binomial random variable

Definition: A random variable X represents the total number of successes that occur in n independent trials, where each trial results in a success with probability p and a failure with probability 1 − p. Then

X ~ Binomial(n, p).

The probability mass function:

p(i) = P{X = i} = C(n, i) p^i (1 − p)^{n−i}

• Proof?

• Σ_{i=0}^n p(i) = 1?

A very useful representation:

X ~ Binomial(n, p) can be viewed as the sum of n independent Bernoulli random variables with parameter p:

X = Σ_{i=1}^n Y_i,   Y_i ~ Bernoulli(p)
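A quick illustration of this representation (the example values are my own): the empirical pmf of a sum of n Bernoulli(p) draws should match binopdf.

n = 5; p = 0.3; trials = 1e5;
X = sum(rand(n, trials) < p, 1);     % each column: n Bernoulli(p) trials
emp = histc(X, 0:n) / trials;        % empirical pmf of the sum on 0..n
disp([(0:n)' emp' binopdf(0:n, n, p)']);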

Example: Five fair coins are flipped. If the outcomes are assumed independent, find the probability mass function of the number of heads obtained.

Solution:

• X = number of heads obtained.

• X ~ Binomial(n, p). n = ??, p = ??

• P{X = i} = ?

Properties of X ~ Binomial(n, p)

Expectation: E(X) = np.

Variance: Var(X) = np(1 − p).

Proof: ??

E(X) = Σ_{i=0}^n i C(n,i) p^i (1−p)^{n−i}
     = Σ_{i=1}^n i · n!/(i!(n−i)!) · p^i (1−p)^{n−i}
     = np Σ_{i=1}^n (n−1)!/((i−1)!((n−1)−(i−1))!) · p^{i−1} (1−p)^{(n−1)−(i−1)}
     = np Σ_{j=0}^{n−1} (n−1)!/(j!((n−1)−j)!) · p^j (1−p)^{(n−1)−j}
     = np

E(X^2) = Σ_{i=0}^n i^2 C(n,i) p^i (1−p)^{n−i} = Σ_{i=1}^n i^2 · n!/(i!(n−i)!) · p^i (1−p)^{n−i}
       = np Σ_{i=1}^n i (n−1)!/((i−1)!((n−1)−(i−1))!) · p^{i−1} (1−p)^{(n−1)−(i−1)}
       = np Σ_{j=0}^{n−1} (j+1) (n−1)!/(j!((n−1)−j)!) · p^j (1−p)^{(n−1)−j}
       = np E(Y + 1),   Y ~ Binomial(n−1, p)
       = np((n−1)p + 1)

Var(X) = E(X^2) − (E(X))^2 = np((n−1)p + 1) − (np)^2 = np(1 − p)

What Happens When n → ∞ and np → λ?

X ~ Binomial(n, p), n → ∞ and np → λ:

P{X = i} = n!/((n−i)! i!) · p^i (1−p)^{n−i}
         = [(1−p)^n/i!] · [p^i/(1−p)^i] · n(n−1)(n−2)...(n−i+1)
         = [ (1−p)^n (np)^i / i! ] [ n(n−1)(n−2)...(n−i+1) / ((1−p)^i n^i) ]
         ≈ e^{−λ} λ^i / i!

lim_{n→∞} (1 − λ/n)^n = e^{−λ}

For fixed (small) i,

lim_{n→∞} n(n−1)(n−2)...(n−i+1) / (n − λ)^i = 1

Section 4.7: The Poisson Random Variable

Definition: A random variable X, taking on one of the values 0, 1, 2, ..., is a Poisson random variable with parameter λ if, for some λ > 0,

p(i) = P{X = i} = e^{−λ} λ^i / i!,   i = 0, 1, 2, 3, ...

Check:

Σ_{i=0}^∞ p(i) = Σ_{i=0}^∞ e^{−λ} λ^i / i! = 1??

Facts about the Poisson Distribution

• Poisson(λ) approximates Binomial(n, p) when n is large and np ≈ λ:

  C(n, i) p^i (1−p)^{n−i} ≈ e^{−λ} λ^i / i!

• Poisson is widely used for modeling rare (p ≈ 0) events (but not too rare):

  – The number of misprints on a page of a book

  – The number of people in a community living to 100 years of age

  – ...

Compare Binomial with Poisson

[Figure: two bar plots of p(i) vs. i overlaying the Binomial(n = 20, p = 0.2) and Poisson(λ = np = 4) probability mass functions, in both drawing orders; the two sets of bars nearly coincide.]

The Poisson approximation is good:

• n does not have to be too large.

• p does not have to be too small.

Matlab Code

function comp_binom_poisson(n,p);
% Overlay the Binomial(n,p) and Poisson(n*p) pmfs, in both drawing orders.
i = 0:n; P1 = binopdf(i,n,p); P2 = poisspdf(i,n*p);
figure; bar(i,P1,'g'); hold on; grid on;
bar(i,P2,'r'); legend('Binomial','Poisson');
xlabel('i'); ylabel('p(i)');
axis([-0.5 n 0 max(P1)+0.05]);
title(['n = ' num2str(n) '  p = ' num2str(p)]);
figure; bar(i,P2,'r'); hold on; grid on;
bar(i,P1,'g'); legend('Poisson','Binomial');
xlabel('i'); ylabel('p(i)');
axis([-0.5 n 0 max(P1)+0.05]);
title(['n = ' num2str(n) '  p = ' num2str(p)]);

% Example call: comp_binom_poisson(20, 0.2)

Example: Suppose that the number of typographical errors on a single page of a book has a Poisson distribution with parameter λ = 1/2.

Q: Calculate the probability that there is at least one error on this page.

P(X ≥ 1) = 1 − P(X = 0) = 1 − e^{−1/2} ≈ 0.393.

Expectation and Variance of Poisson

X ~ Poisson(λ), Y ~ Binomial(n, p)

X approximates Y when (1) n is large, (2) p is small, and (3) np ≈ λ.

E(Y) = np, Var(Y) = np(1 − p)

Guess: E(X) = ??, Var(X) = ??

X ~ Poisson(λ)

E(X) = Σ_{i=1}^∞ i e^{−λ} λ^i / i! = λ Σ_{i=1}^∞ e^{−λ} λ^{i−1}/(i−1)! = λ Σ_{j=0}^∞ e^{−λ} λ^j / j! = λ

E(X^2) = Σ_{i=1}^∞ i^2 e^{−λ} λ^i / i! = λ Σ_{i=1}^∞ i e^{−λ} λ^{i−1}/(i−1)!
       = λ Σ_{j=0}^∞ (j+1) e^{−λ} λ^j / j! = λ (E(X) + 1) = λ^2 + λ

Var(X) = λ^2 + λ − λ^2 = λ

Section 4.8.1: Geometric Random Variable

Definition: Suppose that independent trials, each having probability p of being a success, are performed until a success occurs. Let X be the number of trials required. Then

P{X = n} = (1 − p)^{n−1} p,   n = 1, 2, 3, ...

Check: Σ_{n=1}^∞ P{X = n} = Σ_{n=1}^∞ (1 − p)^{n−1} p = 1??

Expectation: X ~ Geometric(p).

E(X) = Σ_{n=1}^∞ n (1 − p)^{n−1} p   (q = 1 − p)
     = p + 2qp + 3q^2 p + 4q^3 p + 5q^4 p + ...
     = p (1 + 2q + 3q^2 + 4q^3 + 5q^4 + ...)
     = ??

We know

1 + q + q^2 + q^3 + q^4 + ... = 1/(1 − q)

How about

I = 1 + 2q + 3q^2 + 4q^3 + 5q^4 + ... = ??

I = 1 + 2q + 3q^2 + 4q^3 + 5q^4 + ... = ??

First trick:

q × I = q + 2q^2 + 3q^3 + 4q^4 + 5q^5 + ...

I − qI = 1 + q + q^2 + q^3 + q^4 + ... = 1/(1 − q)

⇒ I = 1/(1 − q)^2

⇒ E(X) = pI = p/(1 − q)^2 = 1/p

I = 1 + 2q + 3q^2 + 4q^3 + 5q^4 + ... = ??

Second trick:

I = d/dq (q + q^2 + q^3 + q^4 + q^5 + ...)

Since q + q^2 + q^3 + ... = 1/(1 − q) − 1,

I = d/dq [1/(1 − q)] = 1/(1 − q)^2

Variance: X ~ Geometric(p).

E(X^2) = Σ_{n=1}^∞ n^2 (1 − p)^{n−1} p   (q = 1 − p)
       = p (1 + 2^2 q + 3^2 q^2 + 4^2 q^3 + 5^2 q^4 + ...)

J = 1 + 2^2 q + 3^2 q^2 + 4^2 q^3 + 5^2 q^4 + ... = ??

J = 1 + 2^2 q + 3^2 q^2 + 4^2 q^3 + 5^2 q^4 + ... = ??

Trick 1:

qJ = q + 2^2 q^2 + 3^2 q^3 + 4^2 q^4 + 5^2 q^5 + ...

J(1 − q) = 1 + (2^2 − 1^2)q + (3^2 − 2^2)q^2 + (4^2 − 3^2)q^3 + ...
         = 1 + 3q + 5q^2 + 7q^3 + ...

J(1 − q)q = q + 3q^2 + 5q^3 + 7q^4 + ...

J(1 − q)^2 = 1 + 2q + 2q^2 + 2q^3 + ... = −1 + 2/(1 − q) = (1 + q)/(1 − q)

J = (1 + q)/(1 − q)^3

E(X^2) = pJ = p(2 − p)/p^3 = (2 − p)/p^2,   Var(X) = (1 − p)/p^2

J = 1 + 2^2 q + 3^2 q^2 + 4^2 q^3 + 5^2 q^4 + ... = ??

Trick 2:

J = Σ_{n=1}^∞ n^2 q^{n−1} = Σ_{n=1}^∞ d/dq (n q^n) = d/dq [ Σ_{n=1}^∞ n q^n ]
  = d/dq [ (q/p) E(X) ] = d/dq [ q/(1 − q)^2 ] = (1 + q)/(1 − q)^3

Section 4.8.2: Negative Binomial Distribution

Definition: Independent trials, each having probability p of being a success, are performed until a total of r successes is accumulated. Let X denote the number of trials required. Then X is a negative binomial random variable:

X ~ NB(r, p)

X ~ Geometric(p) is the same as X ~ NB(1, p).

What is the probability mass function of X ~ NB(r, p)?

P{X = n} = C(n−1, r−1) p^r (1 − p)^{n−r},   n = r, r+1, r+2, ...

E(X^k) = Σ_{n=r}^∞ n^k C(n−1, r−1) p^r (1 − p)^{n−r}
       = Σ_{n=r}^∞ n^k (n−1)!/((r−1)!(n−r)!) p^r (1 − p)^{n−r}

Now what?

E(X^k) = Σ_{n=r}^∞ n^{k−1} n!/((r−1)!(n−r)!) p^r (1 − p)^{n−r}
       = (r/p) Σ_{n=r}^∞ n^{k−1} n!/(r!(n−r)!) p^{r+1} (1 − p)^{n−r}   (let m = n + 1)
       = (r/p) Σ_{m=r+1}^∞ (m−1)^{k−1} (m−1)!/(r!(m−1−r)!) p^{r+1} (1 − p)^{m−1−r}
       = (r/p) E[(Y − 1)^{k−1}],   Y ~ NB(r+1, p)

E(X) = ??, E(X^2) = ??, Var(X) = ??

E(X^k) = (r/p) E[(Y − 1)^{k−1}],   Y ~ NB(r+1, p)

Let k = 1. Then

E(X) = (r/p) E[(Y − 1)^0] = r/p

Let k = 2. Then

E(X^2) = (r/p) E[Y − 1] = (r/p) ((r+1)/p − 1)

Var(X) = E(X^2) − (E(X))^2 = r(1 − p)/p^2


Example (4.8.8a) An urn contains N white and M black balls. Balls are

randomly selected, one at a time, until a black one is obtained. Each selected ball

is replaced before the next one is drawn.

Q(a): What is the probability that exactly n draws are needed?

Q(b): What is the expected number of draws needed?

Q(c): What is the probability that at least k draws are needed?

Solution: This is a geometric random variable with p = M/(M+N).

Q(a): What is the probability that exactly n draws are needed?

P(X = n) = (1 − p)^{n−1} p = (N/(M+N))^{n−1} · M/(M+N) = M N^{n−1} / (M+N)^n

Q(b): What is the expected number of draws needed?

E(X) = 1/p = (M+N)/M = 1 + N/M

Q(c): What is the probability that at least k draws are needed?

P(X ≥ k) = Σ_{n=k}^∞ (1 − p)^{n−1} p = p(1 − p)^{k−1} / (1 − (1 − p)) = (N/(M+N))^{k−1}

Example (4.8.8e): A pipe-smoking mathematician carries 2 matchboxes, 1 in his left-hand pocket and 1 in his right-hand pocket. Each time he needs a match he is equally likely to take it from either pocket. Consider the moment when the mathematician first discovers that one of his matchboxes is empty. If it is assumed that both matchboxes initially contained N matches, what is the probability that there are exactly k matches in the other box, k = 0, 1, ..., N?

Solution:

1. Assume the left-hand pocket is the one found empty.

2. X = total number of matches taken at the moment the left-hand pocket is found empty.

3. View X as a random variable X ~ NB(r, p). What are r, p?

4. Remember there are k matches remaining in the right-hand pocket.

5. The desired probability = P{X = n}. What is n?

6. Adjust for the fact that either the left-hand or the right-hand box can be the empty one.

Solution:

1. Assume the left-hand pocket is the one found empty.

2. X = total number of matches taken at the moment the left-hand pocket is found empty.

3. View X as a random variable X ~ NB(r, p). What are r, p?
   p = 1/2, r = N + 1.

4. Remember there are k matches remaining in the right-hand pocket.

5. The desired probability = P{X = n}. What is n?
   n = 2N − k + 1.

6. Adjust for the fact that either the left-hand or the right-hand box can be the empty one:
   2 C(2N−k, N) (1/2)^{2N−k+1}.
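A one-line check that the answer is a proper probability mass function over k = 0, ..., N (N below is an arbitrary example value):

N = 8; k = 0:N;
pk = 2 * arrayfun(@(kk) nchoosek(2*N - kk, N), k) .* (1/2).^(2*N - k + 1);
fprintf('sum over k: %.6f (should be 1)\n', sum(pk));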

Section 4.8.3: Hypergeometric Distribution

Definition: A sample of size n is chosen randomly without replacement from an urn containing N balls, of which m are white and N − m are black. Let X denote the number of white balls selected; then X ~ HG(N, m, n):

P{X = i} = C(m, i) C(N−m, n−i) / C(N, n),   i = 0, 1, ..., n

(implicitly, n − (N − m) ≤ i ≤ min(n, m)).

If the n balls were chosen with replacement, then X would be Binomial(n, p = m/N).

When m and N are large relative to n,

X ~ HG(N, m, n) ≈ Binomial(n, m/N)

E(X) ??≈ np = nm/N ??

Var(X) ??≈ np(1 − p) = (nm/N)(1 − m/N) ??

E(X^k) = Σ_{i=1}^n i^k C(m, i) C(N−m, n−i) / C(N, n)
       = Σ_{i=1}^n i^{k−1} · m!/((i−1)!(m−i)!) · (N−m)!/((n−i)!(N−m−n+i)!) · n!(N−n)!/N!
       = Σ_{j=0}^{n−1} (j+1)^{k−1} · m!/(j!(m−1−j)!) · (N−m)!/((n−1−j)!(N−m−n+1+j)!) · n!(N−n)!/N!
       = (nm/N) E[(Y + 1)^{k−1}],   Y ~ HG(N−1, m−1, n−1)

Section 4.9: Properties of the Cumulative Distribution Function

F(x) = P{X ≤ x}

1. F(x) is a non-decreasing function: if a < b, then F(a) ≤ F(b).

2. lim_{x→∞} F(x) = 1

3. lim_{x→−∞} F(x) = 0

4. F(x) is right continuous.

Chapter 5: Continuous Random Variables

Definition: X is a continuous random variable if there exists a non-negative function f, defined for all real x ∈ (−∞, ∞), having the property that for any set B of real numbers,

P{X ∈ B} = ∫_B f(x) dx

The function f is called the probability density function of the random variable X.

Some Facts

• P{X ∈ (−∞, ∞)} = ∫_{−∞}^∞ f(x) dx = 1

• P{X ∈ (a, b)} = P{a ≤ X ≤ b} = ∫_a^b f(x) dx

• P{X = a} = ∫_a^a f(x) dx = 0

• F(a) = P{X ≤ a} = P{X < a} = ∫_{−∞}^a f(x) dx

• F(∞) = 1, F(−∞) = 0

• f(x) = d/dx F(x)

Example 5.1.1a: Suppose that X is a continuous random variable whose probability density function is given by

f(x) = C(4x − 2x^2) if 0 < x < 2;   0 otherwise

Q(a): What is the value of C?

Q(b): Find P{X > 1}.

Example 5.1.1d: Denote by F_X and f_X the cumulative function and the density function of a continuous random variable X, respectively. Find the density function of Y = 2X.

Solution:

F_Y(y) = P{Y ≤ y} = P{2X ≤ y} = P{X ≤ y/2} = F_X(y/2)

f_Y(y) = d/dy F_Y(y) = d/dy F_X(y/2) = (1/2) f_X(y/2)

Derivatives of Common Functions

http://en.wikipedia.org/wiki/Derivative

• Exponential and logarithmic functions:
  d/dx e^x = e^x,  d/dx a^x = a^x log a   (Notation: log x = log_e x)
  d/dx log x = 1/x,  d/dx log_a x = 1/(x log a)

• Trigonometric functions:
  d/dx sin x = cos x,  d/dx cos x = −sin x,  d/dx tan x = sec^2 x

• Inverse trigonometric functions:
  d/dx arcsin x = 1/√(1−x^2),  d/dx arccos x = −1/√(1−x^2),  d/dx arctan x = 1/(1+x^2)

Integration by Parts

∫_a^b f(x) dx = f(x)·x |_a^b − ∫_a^b x f′(x) dx

∫_a^b f(x) d[g(x)] = f(x)g(x) |_a^b − ∫_a^b g(x) f′(x) dx

∫_a^b f(x)h(x) dx = ∫_a^b f(x) d[H(x)] = f(x)H(x) |_a^b − ∫_a^b H(x) f′(x) dx

where h(x) = H′(x).

Examples:

∫_a^b log x dx = x log x |_a^b − ∫_a^b x [log x]′ dx
              = b log b − a log a − ∫_a^b x · (1/x) dx
              = b log b − a log a − b + a.

∫_a^b x log x dx = (1/2) ∫_a^b log x d[x^2] = (1/2) x^2 log x |_a^b − (1/2) ∫_a^b x^2 [log x]′ dx
                = (1/2) b^2 log b − (1/2) a^2 log a − (1/2) ∫_a^b x dx
                = (1/2) b^2 log b − (1/2) a^2 log a − (1/4)(b^2 − a^2)

∫_a^b x^n e^x dx = ∫_a^b x^n d[e^x] = x^n e^x |_a^b − n ∫_a^b e^x x^{n−1} dx

∫_a^b x^n cos x dx = ∫_a^b x^n d[sin x] = x^n sin x |_a^b − n ∫_a^b sin x · x^{n−1} dx

∫_a^b sin^n x sin 2x dx = 2 ∫_a^b sin^n x sin x cos x dx = 2 ∫_a^b sin^{n+1} x d[sin x]
                       = (2/(n+2)) sin^{n+2} x |_a^b

Expectation

Definition: Given a continuous random variable X with density f(x), its expectation is defined as

E(X) = ∫_{−∞}^∞ x f(x) dx

Example: The density of X is given by

f(x) = 2x if 0 ≤ x ≤ 1;   0 otherwise

Find E[e^X].

Two solutions:

1. • Find the density function of Y = e^X, denoted by f_Y(y).
   • E(Y) = ∫_{−∞}^∞ y f_Y(y) dy.

2. Apply the formula

   E[g(X)] = ∫_{−∞}^∞ g(x) f(x) dx

Solution 1: Y = e^X.

F_Y(y) = P{Y ≤ y} = P{e^X ≤ y} = P{X ≤ log y} = ∫_0^{log y} 2x dx = (log y)^2

Therefore,

f_Y(y) = (2/y) log y,   1 ≤ y ≤ e

E(Y) = ∫_{−∞}^∞ y f_Y(y) dy = ∫_1^e 2 log y dy
     = 2y log y |_1^e − ∫_1^e 2y (1/y) dy = 2

Solution 2:

E[e^X] = ∫_0^1 2x e^x dx = ∫_0^1 2x d[e^x]
       = 2x e^x |_0^1 − 2 ∫_0^1 e^x dx
       = 2e − 2(e − 1) = 2

Lemma 5.2.2.1: For a non-negative random variable X,

E(X) = ∫_0^∞ P{X > x} dx

Proof:

∫_0^∞ P{X > x} dx = ∫_0^∞ ∫_x^∞ f_X(y) dy dx
                  = ∫_0^∞ ∫_0^y f_X(y) dx dy
                  = ∫_0^∞ f_X(y) (∫_0^y dx) dy
                  = ∫_0^∞ f_X(y) y dy = E(X)

X is a continuous random variable:

• E(aX + b) = aE(X) + b, where a and b are constants.

• Var(X) = E[(X − E(X))^2] = E(X^2) − (E(X))^2.

• Var(aX + b) = a^2 Var(X)

Section 5.3: The Uniform Random Variable

Definition: X is a uniform random variable on the interval (a, b) if its probability density function is given by

f(x) = 1/(b − a) if a < x < b;   0 otherwise

Denote X ~ Uniform(a, b), or simply X ~ U(a, b).

Cumulative function:

F(x) = 0 if x ≤ a;   (x − a)/(b − a) if a < x < b;   1 if x ≥ b

If X ~ U(0, 1), then F(x) = x for 0 ≤ x ≤ 1.

Expectation:

E(X) = (a + b)/2

Variance:

Var(X) = (b − a)^2 / 12

Section 5.4: Normal Random Variable

Definition: X is a normal random variable, X ~ N(µ, σ^2), if its density is given by

f(x) = (1/(√(2π) σ)) e^{−(x−µ)^2/(2σ^2)},   −∞ < x < ∞

• If X ~ N(µ, σ^2), then E(X) = µ and Var(X) = σ^2.

• The density function is a "bell-shaped" curve.

• It was first introduced by the French mathematician Abraham de Moivre in 1733, to approximate the binomial when n was large.

• The distribution became truly famous because of Karl F. Gauss, and hence its other common name is the "Gaussian distribution."

• It is called the "normal distribution" (a usage probably started by Karl Pearson) because "most people" believed that it was "normal" for any well-behaved data set to follow this distribution.

Check that

∫_{−∞}^∞ f(x) dx = ∫_{−∞}^∞ (1/(√(2π) σ)) e^{−(x−µ)^2/(2σ^2)} dx = 1.

Note that, by letting z = (x − µ)/σ,

∫_{−∞}^∞ (1/(√(2π) σ)) e^{−(x−µ)^2/(2σ^2)} dx = ∫_{−∞}^∞ (1/√(2π)) e^{−z^2/2} dz = I.

It suffices to show that I^2 = 1:

I^2 = ∫_{−∞}^∞ (1/√(2π)) e^{−z^2/2} dz × ∫_{−∞}^∞ (1/√(2π)) e^{−w^2/2} dw
    = ∫_{−∞}^∞ ∫_{−∞}^∞ (1/(2π)) e^{−(z^2+w^2)/2} dz dw
    = (1/(2π)) ∫_0^∞ ∫_0^{2π} e^{−r^2/2} r dθ dr   (polar coordinates)
    = (1/(2π)) ∫_0^{2π} [ ∫_0^∞ e^{−r^2/2} d(r^2/2) ] dθ = 1.

Expectation and Variance

If X ~ N(µ, σ^2), then E(X) = µ and Var(X) = σ^2.

E(X) = ∫_{−∞}^∞ x (1/(√(2π) σ)) e^{−(x−µ)^2/(2σ^2)} dx
     = ∫_{−∞}^∞ (x − µ) (1/(√(2π) σ)) e^{−(x−µ)^2/(2σ^2)} dx + ∫_{−∞}^∞ µ (1/(√(2π) σ)) e^{−(x−µ)^2/(2σ^2)} dx
     = ∫_{−∞}^∞ (x − µ) (1/(√(2π) σ)) e^{−(x−µ)^2/(2σ^2)} d[x − µ] + µ
     = ∫_{−∞}^∞ z (1/(√(2π) σ)) e^{−z^2/(2σ^2)} dz + µ
     = µ   (the remaining integrand is odd)

Var(X) = E[(X − µ)^2] = ∫_{−∞}^∞ (x − µ)^2 (1/(√(2π) σ)) e^{−(x−µ)^2/(2σ^2)} d[x − µ]
       = ∫_{−∞}^∞ z^2 (1/(√(2π) σ)) e^{−z^2/(2σ^2)} dz
       = σ^2 ∫_{−∞}^∞ (z^2/σ^2) (1/√(2π)) e^{−z^2/(2σ^2)} d[z/σ]
       = σ^2 ∫_{−∞}^∞ t^2 (1/√(2π)) e^{−t^2/2} dt
       = σ^2 ∫_{−∞}^∞ t (1/√(2π)) e^{−t^2/2} d[t^2/2]
       = σ^2 ∫_{−∞}^∞ −t (1/√(2π)) d[e^{−t^2/2}]
       = σ^2 ∫_{−∞}^∞ (1/√(2π)) e^{−t^2/2} dt − (σ^2/√(2π)) t e^{−t^2/2} |_{−∞}^∞ = σ^2

Why is lim_{t→∞} t e^{−t^2/2} = 0?

lim_{t→∞} t e^{−t^2/2} = lim_{t→∞} t/e^{t^2/2} = lim_{t→∞} [t]′/[e^{t^2/2}]′ = lim_{t→∞} 1/(t e^{t^2/2}) = 0

Normal Distribution Approximates Binomial

Suppose X ~ Binomial(n, p). For fixed p, as n → ∞,

Binomial(n, p) ≈ N(µ, σ^2),   µ = np, σ^2 = np(1 − p).

[Figures: bar plots of the Binomial(n, p = 0.2) probability mass function overlaid with the approximating normal density curve, for n = 10, 20, 50, 100, and 1000 (x vs. density (mass) function); the normal curve matches the bars increasingly well as n grows.]

Matlab code

function NormalApprBinomial(n,p);
% Plot the Binomial(n,p) pmf and overlay the approximating normal density.
mu = n*p; sigma2 = n*p*(1-p);
figure;
bar((0:n),binopdf(0:n,n,p),'g'); hold on; grid on;
x = mu - 3*sqrt(sigma2):0.001:mu + 3*sqrt(sigma2);   % mean +/- 3 standard deviations
plot(x,normpdf(x,mu,sqrt(sigma2)),'r-','linewidth',2);
xlabel('x'); ylabel('Density (mass) function');
title(['n = ' num2str(n) '  p = ' num2str(p)]);

% Example call: NormalApprBinomial(50, 0.2)

The Standard Normal Distribution

Z ~ N(0, 1) is called the standard normal.

• E(Z) = 0, Var(Z) = 1.

• If X ~ N(µ, σ^2), then Y = (X − µ)/σ ~ N(0, 1).

• For Z ~ N(0, 1), denote the density function f_Z(z) by φ(z) and the cumulative function F_Z(z) by Φ(z):

  φ(z) = (1/√(2π)) e^{−z^2/2},   Φ(z) = ∫_{−∞}^z (1/√(2π)) e^{−x^2/2} dx

• φ(−x) = φ(x), Φ(−x) = 1 − Φ(x). Numerical table on Page 222.

Example 5.4.4b: X ~ N(µ = 3, σ^2 = 9). Find

• P{2 < X < 5}
• P{X > 0}
• P{|X − 3| > 6}

Solution:

Let Y = (X − µ)/σ = (X − 3)/3. Then Y ~ N(0, 1).

P{2 < X < 5} = P{ (2−3)/3 < (X−3)/3 < (5−3)/3 }
             = P{ −1/3 < Y < 2/3 }
             = Φ(2/3) − Φ(−1/3)
             = Φ(0.6667) − 1 + Φ(0.3333)
             ≈ 0.7475 − 1 + 0.6306 ≈ 0.3781
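A quick check of the three probabilities with normcdf (the remaining two answers follow from the same standardization):

mu = 3; sigma = 3;
p1 = normcdf(5, mu, sigma) - normcdf(2, mu, sigma);          % P{2 < X < 5}
p2 = 1 - normcdf(0, mu, sigma);                              % P{X > 0}
p3 = normcdf(-3, mu, sigma) + (1 - normcdf(9, mu, sigma));   % P{|X - 3| > 6}
fprintf('%.4f  %.4f  %.4f\n', p1, p2, p3);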

Section 5.5: Exponential Random Variable

Definition: An exponential random variable X has the probability density function given by

f(x) = λe^{−λx} if x ≥ 0;   0 if x < 0   (λ > 0)

X ~ EXP(λ).

Cumulative function:

F(x) = ∫_0^x λe^{−λt} dt = 1 − e^{−λx}   (x ≥ 0)

F(∞) = ??

Moments:

E(X^n) = ∫_0^∞ x^n λe^{−λx} dx
       = −∫_0^∞ x^n d[e^{−λx}]
       = ∫_0^∞ e^{−λx} n x^{n−1} dx
       = (n/λ) E(X^{n−1})

E(X) = 1/λ,  E(X^2) = 2/λ^2,  Var(X) = 1/λ^2,  E(X^n) = n!/λ^n

Applications of the Exponential Distribution

• The exponential distribution often arises as the distribution of the amount of time until some specific event occurs.

• For instance, the amount of time (starting from now) until an earthquake occurs, or until a new war breaks out, etc.

• The exponential distribution is memoryless:

  P{X > s + t | X > t} = P{X > s},   for all s, t ≥ 0

  (since P{X > s + t | X > t} = e^{−λ(s+t)}/e^{−λt} = e^{−λs} = P{X > s})

Example 5b: Suppose that the length of a phone call in minutes is an exponential random variable with parameter λ = 1/10. If someone arrives immediately ahead of you at a public telephone booth, find the probability that you will have to wait

• (a) more than 10 minutes;

• (b) between 10 and 20 minutes.

Solution:

• P{X > 10} = e^{−10λ} = e^{−1} ≈ 0.368.

• P{10 < X < 20} = e^{−1} − e^{−2} ≈ 0.233

Section 5.6.1: The Gamma Distribution

Definition: A random variable has a gamma distribution with parameters (α, λ), where α, λ > 0, if its density function is given by

f(x) = λe^{−λx}(λx)^{α−1}/Γ(α) if x ≥ 0;   0 if x < 0

Denote X ~ Gamma(α, λ).

∫_0^∞ f(x) dx = 1??

The Gamma function is defined as

Γ(α) = ∫_0^∞ e^{−y} y^{α−1} dy = (α − 1) Γ(α − 1).

Γ(1) = 1.  Γ(0.5) = √π.  Γ(n) = (n − 1)!.

Connection to the exponential distribution:

• When α = 1, the gamma distribution reduces to an exponential distribution:

  Gamma(α = 1, λ) = EXP(λ).

• A gamma distribution Gamma(n, λ) often arises as the distribution of the amount of time one has to wait until a total of n events has occurred.

X ~ Gamma(α, λ).

Expectation and Variance:

E(X) = α/λ,  Var(X) = α/λ^2.

E(X) = ∫_0^∞ x λe^{−λx}(λx)^{α−1}/Γ(α) dx
     = ∫_0^∞ e^{−λx}(λx)^α / Γ(α) dx
     = (Γ(α+1)/Γ(α)) (1/λ) ∫_0^∞ λe^{−λx}(λx)^α / Γ(α+1) dx
     = (Γ(α+1)/Γ(α)) (1/λ)
     = α/λ.

Section 5.6.4: The Beta Distribution

Definition: A random variable X has a beta distribution if its density is given by

f(x) = (1/B(a, b)) x^{a−1}(1 − x)^{b−1} if 0 < x < 1;   0 otherwise

Denote X ~ Beta(a, b).

The Beta function:

B(a, b) = Γ(a)Γ(b)/Γ(a + b) = ∫_0^1 x^{a−1}(1 − x)^{b−1} dx

X ~ Beta(a, b).

Expectation and Variance:

E(X) = a/(a + b),

Var(X) = ab / ((a + b)^2 (a + b + 1))

Sec. 5.7: Distribution of a Function of a Random Variable

Given the density function f_X(x) and cumulative function F_X(x) of a random variable X, what is the general formula for the density function f_Y(y) of the random variable Y = g(X)?

Example 5.7.7a: X ~ Uniform(0, 1), Y = g(X) = X^n.

F_Y(y) = P{Y ≤ y} = P{X^n ≤ y} = P{X ≤ y^{1/n}} = F_X(y^{1/n}) = y^{1/n}

f_Y(y) = (1/n) y^{1/n − 1},   0 ≤ y ≤ 1
       = f_X(g^{−1}(y)) · d/dy g^{−1}(y)

y = g(x) = x^n ⇒ g^{−1}(y) = y^{1/n}.

Theorem 7.1: Let X be a continuous random variable having probability density function f_X. Suppose g(x) is a strictly monotone (increasing or decreasing), differentiable function of x. Then the random variable Y = g(X) has a probability density function given by

f_Y(y) = f_X(g^{−1}(y)) |d/dy g^{−1}(y)|   if y = g(x) for some x;
         0   if y ≠ g(x) for all x

Example 7b: Y = X^2

F_Y(y) = P{Y ≤ y} = P{X^2 ≤ y}
       = P{−√y ≤ X ≤ √y}
       = F_X(√y) − F_X(−√y)

f_Y(y) = (1/(2√y)) (f_X(√y) + f_X(−√y))

Does this example violate Theorem 7.1?

Example (Cauchy distribution): Suppose a random variable X has density function

f(x) = 1/(π(1 + x^2)),   −∞ < x < ∞

Q: Find the distribution of 1/X.

One can check ∫_{−∞}^∞ f(x) dx = 1, but E(X) does not exist (E|X| = ∞).

Solution: Denote the cumulative function of X by F_X(x).

Let Z = 1/X. Assume z > 0:

F_Z(z) = P{1/X ≤ z} = P{X < 0} + P{X ≥ 1/z}
       = 1/2 + ∫_{1/z}^∞ 1/(π(1 + x^2)) dx
       = 1/2 + (1/π) tan^{−1}(x) |_{1/z}^∞
       = 1/2 + (1/π) (π/2 − tan^{−1}(1/z))

Therefore, for z > 0,

f_Z(z) = (1/π) [π/2 − tan^{−1}(1/z)]′ = 1/(π(1 + z^2)).

Assume z < 0:

F_Z(z) = P{1/X ≤ z} = P{1/z ≤ X < 0}
       = ∫_{1/z}^0 1/(π(1 + x^2)) dx
       = (1/π) tan^{−1}(x) |_{1/z}^0
       = (1/π) (0 − tan^{−1}(1/z))

Therefore, for z < 0,

f_Z(z) = (1/π) [0 − tan^{−1}(1/z)]′ = 1/(π(1 + z^2)).

Thus, Z and X have the same density function.

Chapter 6: Jointly Distributed Random Variables

Definition: For any two random variables X and Y, the joint cumulative probability distribution is defined as

F(x, y) = P{X ≤ x, Y ≤ y},   −∞ < x, y < ∞

F(x, y) = P{X ≤ x, Y ≤ y},   −∞ < x, y < ∞

The distribution of X is

F_X(x) = P{X ≤ x} = P{X ≤ x, Y ≤ ∞} = F(x, ∞)

The distribution of Y is

F_Y(y) = P{Y ≤ y} = F(∞, y)

Example:

P{X > x, Y > y} = 1 − F_X(x) − F_Y(y) + F(x, y)

Jointly Discrete Random Variables

X and Y are both discrete random variables.

Joint probability mass function:

p(x, y) = P{X = x, Y = y},   Σ_x Σ_y p(x, y) = 1

Marginal probability mass functions:

p_X(x) = P{X = x} = Σ_{y: p(x,y)>0} p(x, y)

p_Y(y) = P{Y = y} = Σ_{x: p(x,y)>0} p(x, y)

Example 1a: Suppose that 3 balls are randomly selected from an urn containing 3 red, 4 white, and 5 blue balls. Let X and Y denote, respectively, the number of red and white balls chosen.

Q: The joint probability mass function p(x, y) = P{X = x, Y = y} = ??

p(0, 0) = C(5,3) / C(12,3) = 10/220

p(0, 1) = C(4,1) C(5,2) / C(12,3) = 40/220

p(0, 2) = C(4,2) C(5,1) / C(12,3) = 30/220

p(0, 3) = C(4,3) / C(12,3) = 4/220

p_X(0) = p(0,0) + p(0,1) + p(0,2) + p(0,3) = 84/220

Directly: p_X(0) = C(9,3) / C(12,3) = 84/220

Jointly Continuous Random Variables

X and Y are jointly continuous if there exists a function f(x, y), defined for all real x and y, having the property that for every set C of pairs of real numbers,

P{(X, Y) ∈ C} = ∫∫_{(x,y)∈C} f(x, y) dx dy

The function f(x, y) is called the joint probability density function.

The joint cumulative function:

F(x, y) = P{X ∈ (−∞, x], Y ∈ (−∞, y]} = ∫_{−∞}^y ∫_{−∞}^x f(u, v) du dv

The joint probability density function:

f(x, y) = ∂^2/∂x∂y F(x, y)

The marginal probability density functions:

f_X(x) = ∫_{−∞}^∞ f(x, y) dy,   f_Y(y) = ∫_{−∞}^∞ f(x, y) dx

Example 1c: The joint density function is given by

f(x, y) = Ce^{−x}e^{−2y}, 0 < x < ∞, 0 < y < ∞;   0 otherwise

• (a) Determine C.

• (b) P{X > 1, Y < 1}

• (c) P{X < Y}

• (d) P{X < a}

Solution (a):

1 = ∫_0^∞ ∫_0^∞ Ce^{−x}e^{−2y} dx dy
  = C ( ∫_0^∞ e^{−x} dx ) ( ∫_0^∞ e^{−2y} dy )
  = C · 1 · (1/2) = C/2

Therefore, C = 2.

Solution (b):

P{X > 1, Y < 1} = ∫_0^1 ∫_1^∞ 2e^{−x}e^{−2y} dx dy = e^{−1} (1 − e^{−2})

Solution (c):

P{X < Y} = ∫∫_{(x,y): x<y} 2e^{−x}e^{−2y} dx dy
         = ∫_0^∞ ∫_0^y 2e^{−x}e^{−2y} dx dy
         = ∫_0^∞ [ ∫_0^y e^{−x} dx ] 2e^{−2y} dy
         = ∫_0^∞ (1 − e^{−y}) 2e^{−2y} dy
         = ∫_0^∞ (2e^{−2y} − 2e^{−3y}) dy
         = 1 − 2/3 = 1/3

Solution (d):

P{X < a} = P{X < a, Y < ∞}
         = ∫_0^a ∫_0^∞ 2e^{−2y}e^{−x} dy dx
         = ∫_0^a e^{−x} dx
         = 1 − e^{−a}

Example 1e: The joint density is given by

f(x, y) = e^{−(x+y)}, 0 < x < ∞, 0 < y < ∞;   0 otherwise

Q: Find the density function of the random variable X/Y.

Solution: Let Z = X/Y, 0 < Z < ∞.

F_Z(z) = P{Z ≤ z} = P{X/Y ≤ z}
       = ∫∫_{x/y ≤ z} e^{−(x+y)} dx dy
       = ∫_0^∞ [ ∫_0^{yz} e^{−(x+y)} dx ] dy
       = ∫_0^∞ e^{−y} (1 − e^{−yz}) dy
       = ∫_0^∞ (e^{−y} − e^{−y−yz}) dy
       = 1 − 1/(z + 1)

Therefore, f_Z(z) = [1 − 1/(z+1)]′ = 1/(z+1)^2.

Alternatively, integrating over y first:

F_Z(z) = P{Z ≤ z} = P{X/Y ≤ z}
       = ∫∫_{x/y ≤ z} e^{−(x+y)} dx dy
       = ∫_0^∞ [ ∫_{x/z}^∞ e^{−(x+y)} dy ] dx
       = ∫_0^∞ e^{−x} e^{−x/z} dx
       = 1/(1 + 1/z) = 1 − 1/(z + 1)

Multiple Jointly Distributed Random Variables

Joint cumulative distribution function:

F(x_1, x_2, x_3, ..., x_n) = P{X_1 ≤ x_1, X_2 ≤ x_2, ..., X_n ≤ x_n}

Suppose X_1, X_2, ..., X_n are discrete random variables.

Joint probability mass function:

p(x_1, x_2, ..., x_n) = P{X_1 = x_1, X_2 = x_2, ..., X_n = x_n}

X_1, X_2, ..., X_n are jointly continuous if there exists a function f(x_1, x_2, ..., x_n) such that for any set C in n-dimensional space,

P{(X_1, X_2, ..., X_n) ∈ C} = ∫∫...∫_{(x_1,x_2,...,x_n)∈C} f(x_1, x_2, ..., x_n) dx_1 dx_2 ... dx_n

f(x_1, x_2, ..., x_n) is the joint probability density function.

Section 6.2: Independent Random Variables

Definition: The random variables X and Y are said to be independent if for any two sets of real numbers A and B,

P{X ∈ A, Y ∈ B} = P{X ∈ A} × P{Y ∈ B}

equivalently,

P{X ≤ x, Y ≤ y} = P{X ≤ x} × P{Y ≤ y}

or, equivalently,

p(x, y) = p_X(x) p_Y(y) for all x, y, when X and Y are discrete;
f(x, y) = f_X(x) f_Y(y) for all x, y, when X and Y are continuous.

Example 2b: Suppose the number of people that enter a post office on a given day is a Poisson random variable with parameter λ.

Each person who enters the post office is male with probability p and female with probability 1 − p.

Let X and Y, respectively, be the number of males and females that enter the post office.

• (a): Find the joint probability mass function P{X = i, Y = j}.

• (b): Find the marginal probability mass functions P{X = i}, P{Y = j}.

• (c): Show that X and Y are independent.

Key: Suppose we know that N = n people enter the post office. Then the number of males is a binomial with parameters (n, p):

[X | N = n] ~ Binomial(n, p),   N = X + Y ~ Poisson(λ).

Therefore,

P{X = i, Y = j} = Σ_{n=0}^∞ P{X = i, Y = j | N = n} P{N = n}
                = P{X = i, Y = j | N = i + j} P{N = i + j}

because P{X = i, Y = j | N ≠ i + j} = 0.

Solution (a): Conditioning on X + Y = i + j:

P{X = i, Y = j} = P{X = i, Y = j | X + Y = i + j} P{X + Y = i + j}
                = [ C(i+j, i) p^i (1−p)^j ] [ e^{−λ} λ^{i+j} / (i+j)! ]
                = p^i (1−p)^j e^{−λ} λ^i λ^j / (i! j!)
                = p^i (1−p)^j e^{−λ(p+(1−p))} λ^i λ^j / (i! j!)
                = [ e^{−λp} (λp)^i / i! ] [ e^{−λ(1−p)} (λ(1−p))^j / j! ]

Solution (b):

P{X = i} = Σ_{j=0}^∞ P{X = i, Y = j}
         = Σ_{j=0}^∞ [ e^{−λp} (λp)^i / i! ] [ e^{−λ(1−p)} (λ(1−p))^j / j! ]
         = [ e^{−λp} (λp)^i / i! ] Σ_{j=0}^∞ [ e^{−λ(1−p)} (λ(1−p))^j / j! ]
         = e^{−λp} (λp)^i / i!

So X ~ Poisson(λp), and similarly Y ~ Poisson(λ(1−p)). Since the joint mass function factors into the product of the marginals, X and Y are independent, which answers (c).

Example 2c: A man and a woman decide to meet at a certain location. Each person independently arrives at a time uniformly distributed between 12 noon and 1 pm.

Q: Find the probability that the first to arrive has to wait longer than 10 minutes.

Solution:

Let X = number of minutes by which one person arrives later than 12 noon.
Let Y = number of minutes by which the other person arrives later than 12 noon.

X and Y are independent.

The desired probability = P{|X − Y| ≥ 10}.

P{|X − Y| ≥ 10} = ∫∫_{|x−y|>10} f(x, y) dx dy
               = ∫∫_{|x−y|>10} f_X(x) f_Y(y) dx dy
               = 2 ∫∫_{y<x−10} (1/60)(1/60) dx dy
               = 2 ∫_{10}^{60} ∫_0^{x−10} (1/3600) dy dx
               = 25/36.
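A Monte Carlo check of the 25/36 answer:

trials = 1e6;
X = 60*rand(1, trials); Y = 60*rand(1, trials);   % arrival times in minutes
fprintf('simulation %.4f, exact %.4f\n', mean(abs(X - Y) >= 10), 25/36);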

A Necessary and Sufficient Condition for Independence

Proposition 2.1: The continuous (discrete) random variables X and Y are independent if and only if their joint probability density (mass) function can be expressed as

f_{X,Y}(x, y) = h(x) g(y),   −∞ < x, y < ∞

Proof: By definition, X and Y are independent if f_{X,Y}(x, y) = f_X(x) f_Y(y).

Example 2f:

(a): Two random variables X and Y with joint density given by

f(x, y) = 6e^{−2x}e^{−3y},   0 < x, y < ∞

are independent.

(b): Two random variables X and Y with joint density given by

f(x, y) = 24xy,   0 < x, y < 1, 0 < x + y < 1

are NOT independent. (Why?)

Independence of More Than Two Random Variables

Definition: The n random variables X_1, X_2, ..., X_n are said to be independent if, for all sets of real numbers A_1, A_2, ..., A_n,

P{X_1 ∈ A_1, X_2 ∈ A_2, ..., X_n ∈ A_n} = Π_{i=1}^n P{X_i ∈ A_i}

equivalently,

P{X_1 ≤ x_1, X_2 ≤ x_2, ..., X_n ≤ x_n} = Π_{i=1}^n P{X_i ≤ x_i},

for all x_1, x_2, ..., x_n.

Example 2h: Let X, Y, Z be independent and uniformly distributed over (0, 1). Compute P{X ≥ YZ}.

Solution:

P{X ≥ YZ} = ∫∫∫_{x ≥ yz} f(x, y, z) dx dy dz
          = ∫_0^1 ∫_0^1 ∫_{yz}^1 dx dy dz
          = ∫_0^1 ∫_0^1 (1 − yz) dy dz
          = ∫_0^1 (1 − z/2) dz
          = 3/4
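A Monte Carlo check of the 3/4 answer:

trials = 1e6;
X = rand(1, trials); Y = rand(1, trials); Z = rand(1, trials);
fprintf('simulation %.4f, exact 0.75\n', mean(X >= Y.*Z));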

Chapter 6.3: Sum of Independent Random Variables

Suppose that X and Y are independent. Let Z = X + Y.

F_Z(z) = P{X + Y ≤ z} = ∫∫_{x+y ≤ z} f_X(x) f_Y(y) dx dy
       = ∫_{−∞}^∞ ∫_{−∞}^{z−y} f_X(x) f_Y(y) dx dy
       = ∫_{−∞}^∞ f_Y(y) ∫_{−∞}^{z−y} f_X(x) dx dy
       = ∫_{−∞}^∞ f_Y(y) F_X(z − y) dy

f_Z(z) = ∂/∂z F_Z(z) = ∫_{−∞}^∞ f_Y(y) f_X(z − y) dy

Page 284: BTRY 4080 / STSCI 4080, Fall 2009

Cornell University, BTRY 4080 / STSCI 4080 Fall 2009 Instructor: Ping Li 284

The discrete case: X and Y are two independent discrete random variables,

then Z = X + Y has probability mass function:

pZ(z) =∑

j

pY (j)pX(z − j)

——————–

Example: Suppose X, Y ∼ Bernouli(p). Then

Z = X + Y ∼ Binomial(2, p)

Suppose X, Y ∼ Bernoulli(p) are independent. Then Z = X + Y ∼ Binomial(2, p):

p_Z(0) = \sum_j p_Y(j) p_X(0 − j) = p_Y(0) p_X(0) = (1 − p)^2 = \binom{2}{0} (1 − p)^2

p_Z(1) = \sum_j p_Y(j) p_X(1 − j) = p_Y(0) p_X(1) + p_Y(1) p_X(0) = (1 − p)p + p(1 − p) = \binom{2}{1} p(1 − p)

p_Z(2) = \sum_j p_Y(j) p_X(2 − j) = p_Y(1) p_X(1) = p^2 = \binom{2}{2} p^2

Suppose X ∼ Binomial(n, p) and Y ∼ Binomial(m, p) are independent. Then Z = X + Y ∼ Binomial(m + n, p).

p_Z(z) = \sum_j p_Y(j) p_X(z − j).

Note 0 ≤ j ≤ m and z − n ≤ j ≤ z. We can assume m ≤ n.

There are three cases to consider:

1. Suppose 0 ≤ z ≤ m; then 0 ≤ j ≤ z.
2. Suppose m < z ≤ n; then 0 ≤ j ≤ m.
3. Suppose n < z ≤ m + n; then z − n ≤ j ≤ m.

Suppose 0 ≤ z ≤ m; then 0 ≤ j ≤ z, and

p_Z(z) = \sum_j p_Y(j) p_X(z − j) = \sum_{j=0}^{z} p_Y(j) p_X(z − j)

= \sum_{j=0}^{z} \binom{m}{j} p^j (1 − p)^{m−j} \binom{n}{z−j} p^{z−j} (1 − p)^{n−z+j}

= p^z (1 − p)^{m+n−z} \sum_{j=0}^{z} \binom{m}{j} \binom{n}{z−j}

= \binom{m+n}{z} p^z (1 − p)^{m+n−z}

because we have seen (e.g., using a combinatorial argument) that

\sum_{j=0}^{z} \binom{m}{j} \binom{n}{z−j} = \binom{m+n}{z}.

Sum of Two Independent Uniform Random Variables

X ∼ U(0, 1) and Y ∼ U(0, 1) are independent. Let Z = X + Y.

f_Z(z) = \int_{−∞}^{∞} f_Y(y) f_X(z − y) dy = \int_0^1 f_X(z − y) dy

Must have 0 ≤ y ≤ 1 and 0 ≤ z − y ≤ 1, i.e., z − 1 ≤ y ≤ z.

Two cases to consider:

1. If 0 ≤ z ≤ 1, then 0 ≤ y ≤ z.
2. If 1 < z ≤ 2, then z − 1 ≤ y ≤ 1.

If 0 ≤ z ≤ 1, then 0 ≤ y ≤ z:

f_Z(z) = \int_0^1 f_X(z − y) dy = \int_0^z dy = z.

If 1 < z ≤ 2, then z − 1 ≤ y ≤ 1:

f_Z(z) = \int_0^1 f_X(z − y) dy = \int_{z−1}^1 dy = 2 − z.

f_Z(z) = { z, 0 < z ≤ 1;  2 − z, 1 < z < 2;  0, otherwise }
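A short empirical sketch (not from the notes): check the triangular density by comparing the empirical mass of an interval around the peak with the exact integral of f_Z.

```python
# X + Y for independent U(0,1); compare empirical mass of (0.9, 1.1] with the exact value.
import random

n = 1_000_000
z = [random.random() + random.random() for _ in range(n)]
emp = sum(0.9 < v <= 1.1 for v in z) / n
# exact: int_{0.9}^{1} z dz + int_{1}^{1.1} (2 - z) dz = 0.095 + 0.095 = 0.19
print(emp, 0.19)
```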


Sum of Two Gamma Random Variables

X ∼ Gamma(t, λ) and Y ∼ Gamma(s, λ) are independent.

Let Z = X + Y .

Show that Z ∼ Gamma(s + t, λ).

For an intuition, see notes page 246.

f_Z(z) = \int_{−∞}^{∞} f_Y(y) f_X(z − y) dy = \int_0^z f_Y(y) f_X(z − y) dy

= \int_0^z [ λe^{−λy} (λy)^{s−1} / Γ(s) ] [ λe^{−λ(z−y)} (λ(z − y))^{t−1} / Γ(t) ] dy

= ( e^{−λz} λ^2 λ^{s+t−2} / (Γ(s)Γ(t)) ) \int_0^z y^{s−1} (z − y)^{t−1} dy

= ( λ e^{−λz} λ^{s+t−1} / (Γ(s)Γ(t)) ) \int_0^z y^{s−1} (z − y)^{t−1} dy

= [ λe^{−λz} (λz)^{s+t−1} / Γ(s+t) ] [ ( Γ(s+t) / (z^{s+t−1} Γ(s)Γ(t)) ) \int_0^z y^{s−1} (z − y)^{t−1} dy ]

= [ λe^{−λz} (λz)^{s+t−1} / Γ(s+t) ] [ ( Γ(s+t) / (Γ(s)Γ(t)) ) \int_0^z (y/z)^{s−1} (1 − y/z)^{t−1} dy/z ]

= [ λe^{−λz} (λz)^{s+t−1} / Γ(s+t) ] [ ( Γ(s+t) / (Γ(s)Γ(t)) ) \int_0^1 θ^{s−1} (1 − θ)^{t−1} dθ ]   (substituting θ = y/z)

= f_W(z) [ ( Γ(s+t) / (Γ(s)Γ(t)) ) \int_0^1 θ^{s−1} (1 − θ)^{t−1} dθ ],

where W ∼ Gamma(s + t, λ). Because f_Z(z) is also a density function, we must have

( Γ(s+t) / (Γ(s)Γ(t)) ) \int_0^1 θ^{s−1} (1 − θ)^{t−1} dθ = 1

Recall the definition of the Beta function in Section 5.6.4,

B(s, t) = \int_0^1 θ^{s−1} (1 − θ)^{t−1} dθ,

and its property (Eq. (6.3) in Section 5.6.4),

B(s, t) = Γ(s)Γ(t) / Γ(s+t).

Thus,

( Γ(s+t) / (Γ(s)Γ(t)) ) \int_0^1 θ^{s−1} (1 − θ)^{t−1} dθ = ( Γ(s+t) / (Γ(s)Γ(t)) ) ( Γ(s)Γ(t) / Γ(s+t) ) = 1

Generalization to more than two gamma random variables

If X1, X2, ..., Xn are independent gamma random variables with respective parameters (s1, λ), (s2, λ), ..., (sn, λ), then

X1 + X2 + ... + Xn = \sum_{i=1}^{n} Xi ∼ Gamma( \sum_{i=1}^{n} si, λ )

Sum of Independent Normal Random Variables

Proposition 3.2: If Xi, i = 1 to n, are independent random variables that are normally distributed with respective parameters µi and σi^2, i = 1 to n, then

X1 + X2 + ... + Xn = \sum_{i=1}^{n} Xi ∼ N( \sum_{i=1}^{n} µi, \sum_{i=1}^{n} σi^2 )

————————

It suffices to show (and then apply induction)

X1 + X2 ∼ N( µ1 + µ2, σ1^2 + σ2^2 )

X ∼ N(µx, σx^2), Y ∼ N(µy, σy^2). Let Z = X + Y.

f_Z(z) = \int_{−∞}^{∞} f_X(z − y) f_Y(y) dy

= \int_{−∞}^{∞} [ (1/(\sqrt{2π} σx)) e^{−(z−y−µx)^2/(2σx^2)} ] [ (1/(\sqrt{2π} σy)) e^{−(y−µy)^2/(2σy^2)} ] dy

= (1/\sqrt{2π}) \int_{−∞}^{∞} (1/(\sqrt{2π} σx σy)) e^{−(z−y−µx)^2/(2σx^2) − (y−µy)^2/(2σy^2)} dy

= (1/\sqrt{2π}) \int_{−∞}^{∞} (1/(\sqrt{2π} σx σy)) e^{−[(z−y−µx)^2 σy^2 + (y−µy)^2 σx^2] / (2σx^2 σy^2)} dy

= (1/\sqrt{2π}) \int_{−∞}^{∞} (1/(\sqrt{2π} σx σy)) e^{−[y^2(σx^2+σy^2) − 2y(µy σx^2 − (µx−z) σy^2) + (µx−z)^2 σy^2 + µy^2 σx^2] / (2σx^2 σy^2)} dy

Let A = σx σy, B = σx^2 σy^2, C = σx^2 + σy^2,

D = −2( µy σx^2 − (µx − z) σy^2 ),

E = (µx − z)^2 σy^2 + µy^2 σx^2.

f_Z(z) = (1/\sqrt{2π}) \int_{−∞}^{∞} (1/(\sqrt{2π} A)) e^{−(Cy^2 + Dy + E)/(2B)} dy = (1/\sqrt{2π}) \sqrt{B/(CA^2)} e^{(D^2 − 4EC)/(8BC)}

because \int_{−∞}^{∞} (1/(\sqrt{2π} σ)) e^{−(y−µ)^2/(2σ^2)} dy = 1 for any µ, σ^2. In detail, completing the square:

\int_{−∞}^{∞} (1/(\sqrt{2π} A)) e^{−(Cy^2 + Dy + E)/(2B)} dy

= \int_{−∞}^{∞} (1/(\sqrt{2π} A)) e^{−[ y^2 + (D/C) y + E/C ] / (2B/C)} dy

= \int_{−∞}^{∞} (1/(\sqrt{2π} A)) e^{−[ (y + D/(2C))^2 − D^2/(4C^2) + E/C ] / (2B/C)} dy

= e^{−( −D^2/(4C^2) + E/C ) / (2B/C)} \int_{−∞}^{∞} (1/(\sqrt{2π} \sqrt{B/C})) ( \sqrt{B/C} / A ) e^{−(y + D/(2C))^2 / (2B/C)} dy

= e^{−( −D^2/(4C^2) + E/C ) / (2B/C)} \sqrt{B/(CA^2)} = \sqrt{B/(CA^2)} e^{(D^2 − 4EC)/(8BC)}

Therefore,

f_Z(z) = (1/\sqrt{2π}) \sqrt{B/(CA^2)} e^{(D^2 − 4EC)/(8BC)}

= (1/(\sqrt{2π} \sqrt{σx^2 + σy^2})) e^{[ (µy σx^2 − (µx−z) σy^2)^2 − ((µx−z)^2 σy^2 + µy^2 σx^2)(σx^2 + σy^2) ] / (2σx^2 σy^2 (σx^2 + σy^2))}

= (1/(\sqrt{2π} \sqrt{σx^2 + σy^2})) e^{−(z − µx − µy)^2 / (2(σx^2 + σy^2))}

Therefore, Z = X + Y ∼ N( µx + µy, σx^2 + σy^2 ).
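An empirical sketch of this result (not from the notes; parameter values are arbitrary):

```python
# Check that X + Y has mean mu_x + mu_y and variance sigma_x^2 + sigma_y^2.
import random, statistics

mx, sx, my, sy = 1.0, 2.0, -3.0, 0.5
z = [random.gauss(mx, sx) + random.gauss(my, sy) for _ in range(200_000)]
print(statistics.fmean(z), mx + my)           # mean ~ -2
print(statistics.variance(z), sx**2 + sy**2)  # variance ~ 4.25
```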

Sum of Independent Poisson Random Variables

If X1, X2, ..., Xn are independent Poisson random variables with respective parameters λi, i = 1 to n, then

X1 + X2 + ... + Xn = \sum_{i=1}^{n} Xi ∼ Poisson( \sum_{i=1}^{n} λi )

Sum of Independent Binomial Random Variables

If X1, X2, ..., Xn are independent binomial random variables with respective parameters (mi, p), i = 1 to n, then

X1 + X2 + ... + Xn = \sum_{i=1}^{n} Xi ∼ Binomial( \sum_{i=1}^{n} mi, p )

Sections 6.4, 6.5: Conditional Distributions

Recall P(E|F) = P(EF)/P(F).

If X and Y are discrete random variables, then

p_{X|Y}(x|y) = p(x, y) / p_Y(y)

If X and Y are continuous random variables, then

f_{X|Y}(x|y) = f(x, y) / f_Y(y)


Example 4b: Suppose X and Y are independent Poisson random variables

with respective parameters λx and λy .

Q: Calculate the conditional distribution of X , given that X + Y = n.

—————–

What is the distribution of X + Y ??

Solution:

P{X = k | X + Y = n} = P{X = k, X + Y = n} / P{X + Y = n}

= P{X = k, Y = n − k} / P{X + Y = n} = P{X = k} P{Y = n − k} / P{X + Y = n}

= [ e^{−λx} λx^k / k! ] [ e^{−λy} λy^{n−k} / (n − k)! ] [ n! / ( e^{−λx−λy} (λx + λy)^n ) ]

= \binom{n}{k} ( λx/(λx + λy) )^k ( 1 − λx/(λx + λy) )^{n−k}

Therefore, X | X + Y = n ∼ Binomial( n, p = λx/(λx + λy) ).
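A simulation sketch of this conditional-binomial fact (not from the notes; the inversion sampler below is a standard construction, adequate for small λ):

```python
# Conditional on X + Y = n, the mean of X should be n * lx / (lx + ly).
import math, random

def poisson(lam):
    # simple inversion sampler for Poisson(lam)
    x, p = 0, math.exp(-lam)
    s, u = p, random.random()
    while u > s:
        x += 1
        p *= lam / x
        s += p
    return x

lx, ly, n = 2.0, 3.0, 5
cond = [x for x, y in ((poisson(lx), poisson(ly)) for _ in range(500_000)) if x + y == n]
print(sum(cond) / len(cond), n * lx / (lx + ly))  # conditional mean ~ 2.0
```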

Example 5b: The joint density of X and Y is given by

f(x, y) = { e^{−x/y} e^{−y} / y, 0 < x, y < ∞;  0, otherwise }

Q: Find P{X > 1 | Y = y}.

————————–

P{X > 1 | Y = y} = \int_1^{∞} f_{X|Y}(x|y) dx

Solution:

f_{X|Y}(x|y) = f(x, y) / f_Y(y) = [ e^{−x/y} e^{−y} / y ] / \int_0^{∞} ( e^{−x/y} e^{−y} / y ) dx = e^{−x/y} / \int_0^{∞} e^{−x/y} dx = (1/y) e^{−x/y}

P{X > 1 | Y = y} = \int_1^{∞} f_{X|Y}(x|y) dx = \int_1^{∞} (1/y) e^{−x/y} dx = e^{−1/y}

The Bivariate Normal Distribution

The random variables X and Y have a bivariate normal distribution if, for constants µx, µy, σx, σy, −1 < ρ < 1, their joint density function is given, for all −∞ < x, y < ∞, by

f(x, y) = (1/(2π σx σy \sqrt{1 − ρ^2})) e^{ −(1/(2(1−ρ^2))) [ (x−µx)^2/σx^2 + (y−µy)^2/σy^2 − 2ρ(x−µx)(y−µy)/(σx σy) ] }

If X and Y are independent, then ρ = 0, and

f(x, y) = (1/(2π σx σy)) e^{ −(1/2) [ (x−µx)^2/σx^2 + (y−µy)^2/σy^2 ] }

= (1/(\sqrt{2π} σx)) e^{−(x−µx)^2/(2σx^2)} × (1/(\sqrt{2π} σy)) e^{−(y−µy)^2/(2σy^2)}

The marginal density:

f_X(x) = \int_{−∞}^{∞} f(x, y) dy

= \int_{−∞}^{∞} (1/(2π σx σy \sqrt{1 − ρ^2})) e^{ −(1/(2(1−ρ^2))) [ (x−µx)^2/σx^2 + (y−µy)^2/σy^2 − 2ρ(x−µx)(y−µy)/(σx σy) ] } dy

= \int_{−∞}^{∞} (1/(2π σx \sqrt{1 − ρ^2})) e^{ −(1/(2(1−ρ^2))) [ (x−µx)^2/σx^2 + t^2 − 2ρ(x−µx) t / σx ] } dt   (substituting t = (y − µy)/σy)

= \int_{−∞}^{∞} (1/(2π σx \sqrt{1 − ρ^2})) e^{ −[ (x−µx)^2 + t^2 σx^2 − 2ρ(x−µx) t σx ] / (2(1−ρ^2) σx^2) } dt

= \int_{−∞}^{∞} (1/(\sqrt{2π} A)) e^{−(Ct^2 + Dt + E)/(2B)} dt = \sqrt{B/(CA^2)} e^{(D^2 − 4EC)/(8BC)} = (1/(\sqrt{2π} σx)) e^{−(x−µx)^2/(2σx^2)}

with A = \sqrt{2π} \sqrt{1 − ρ^2} σx, B = (1 − ρ^2) σx^2, C = σx^2, D = −2ρ(x − µx) σx, E = (x − µx)^2.

Therefore,

X ∼ N(µx, σx^2), Y ∼ N(µy, σy^2)

The conditional density:

f_{X|Y}(x|y) = f(x, y) / f_Y(y)

A useful trick: only worry about the terms that are functions of x; the remaining terms are just normalizing constants.

Any (reasonable) non-negative function g(x) can essentially be viewed as a probability density function because one can always normalize it:

f(x) = g(x) / \int_{−∞}^{∞} g(x) dx = C g(x) ∝ g(x), where C = 1 / \int_{−∞}^{∞} g(x) dx,

is a probability density function. (One can verify \int_{−∞}^{∞} f(x) dx = 1.)

C is just a (normalizing) constant.

When y is treated as a constant, C(y) is still a constant.

X | Y = y means y is a constant.

For example, g(x) = e^{−(x−1)^2/2} is essentially the density function of N(1, 1).

Similarly, g(x) = e^{−(x^2−2x)/2} and g(x) = e^{−(x^2−2xy)/2} are both (essentially) the density functions of normals.

————–

1/C = \int_{−∞}^{∞} e^{−(x^2−2xy)/2} dx = \sqrt{2π} e^{y^2/2}

Therefore,

f(x) = C g(x) = (1/\sqrt{2π}) e^{−y^2/2} × g(x) = (1/\sqrt{2π}) e^{−(x−y)^2/2}

is the density function of N(y, 1), where y is merely a constant.

(X, Y) is a bivariate normal.

f_{X|Y}(x|y) = f(x, y) / f_Y(y) ∝ f(x, y)

∝ e^{ −(1/(2(1−ρ^2))) [ (x−µx)^2/σx^2 − 2ρ(x−µx)(y−µy)/(σx σy) ] }

∝ e^{ −[ (x−µx)^2 − 2ρ(x−µx)(y−µy) σx/σy ] / (2(1−ρ^2) σx^2) }

∝ e^{ −( x − µx − ρ(y−µy) σx/σy )^2 / (2(1−ρ^2) σx^2) }

Therefore,

X | Y ∼ N( µx + ρ(y − µy) σx/σy, (1 − ρ^2) σx^2 )

Y | X ∼ N( µy + ρ(x − µx) σy/σx, (1 − ρ^2) σy^2 )
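A sketch (not from the notes) that checks the conditional-mean formula empirically; the construction of correlated normals from two independent ones is a standard device, and all parameter values are arbitrary choices.

```python
# E[X | Y = y0] should be mx + rho * (y0 - my) * sx / sy.
import random

mx, my, sx, sy, rho = 0.0, 1.0, 2.0, 0.5, 0.6
pairs = []
for _ in range(400_000):
    z1, z2 = random.gauss(0, 1), random.gauss(0, 1)
    pairs.append((mx + sx * z1,
                  my + sy * (rho * z1 + (1 - rho**2) ** 0.5 * z2)))

y0 = 1.5
sel = [x for x, y in pairs if abs(y - y0) < 0.02]  # condition on Y close to y0
print(sum(sel) / len(sel), mx + rho * (y0 - my) * sx / sy)  # both ~ 1.2
```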


Example 5d: Consider n + m trials having a common probability of

success. Suppose, however, that this success probability is not fixed in advance

but is chosen from U(0, 1).

Q: What is the conditional distribution of this success probability given that the

n + m trials result in n successes?

Solution:

Let X = trial success probability. X ∼ U(0, 1).

Let N = total number of successes. N |X = x ∼ Binomial(n + m, x).

The desired probability: X |N = n.

Solution:

Let X = trial success probability. X ∼ U(0, 1).

Let N = total number of successes. N | X = x ∼ Binomial(n + m, x).

f_{X|N}(x|n) = P{N = n | X = x} f_X(x) / P{N = n}

= \binom{n+m}{n} x^n (1 − x)^m / P{N = n} ∝ x^n (1 − x)^m

Therefore, X | N = n ∼ Beta(n + 1, m + 1).

Here X ∼ U(0, 1) is the prior distribution.

If X | N ∼ Beta(n + 1, m + 1), then

E(X | N) = (n + 1) / ((n + 1) + (m + 1))

Suppose we do not have prior knowledge of the success probability X. We observe n successes out of n + m trials. The most intuitive guess (estimate) of X would be

X̂ = n / (n + m)

Assuming a uniform prior on X leads instead to the add-one smoothing estimate (n + 1)/(n + m + 2), as in the sketch below.
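A tiny illustration (not from the notes) contrasting the raw frequency estimate with the add-one smoothed posterior mean:

```python
# Posterior mean under a uniform prior (add-one / Laplace smoothing) vs raw frequency.
def estimate(successes: int, trials: int):
    raw = successes / trials                   # n / (n + m)
    smoothed = (successes + 1) / (trials + 2)  # (n + 1) / (n + m + 2)
    return raw, smoothed

print(estimate(0, 3))   # raw 0.0 vs smoothed 0.2: smoothing avoids probability 0
print(estimate(7, 10))  # (0.7, 0.75 -> actually 8/12 = 0.666...)
```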

Section 6.6: Order Statistics

Let X1, X2, ..., Xn be n independent and identically distributed continuous random variables having a common density f and distribution function F. Define

X(1) = smallest of X1, X2, ..., Xn
X(2) = second smallest of X1, X2, ..., Xn
...
X(j) = jth smallest of X1, X2, ..., Xn
...
X(n) = largest of X1, X2, ..., Xn

The ordered values X(1) ≤ X(2) ≤ ... ≤ X(n) are known as the order statistics corresponding to the random variables X1, X2, ..., Xn.

For example, the CDF of the largest (i.e., X(n)) should be

F_{X(n)}(x) = P(X(n) ≤ x) = P(X1 ≤ x, X2 ≤ x, ..., Xn ≤ x)

because, if the largest is smaller than x, then everyone is smaller than x. Then, by independence,

F_{X(n)}(x) = P(X1 ≤ x) P(X2 ≤ x) ... P(Xn ≤ x) = F(x)^n,

and the density function is simply

f_{X(n)}(x) = d/dx F^n(x) = n F^{n−1}(x) f(x)

The CDF of the smallest (i.e., X(1)) should be

F_{X(1)}(x) = P(X(1) ≤ x) = 1 − P(X(1) > x) = 1 − P(X1 > x, X2 > x, ..., Xn > x)

because, if the smallest is larger than x, then everyone is larger than x. Then, by independence,

F_{X(1)}(x) = 1 − P(X1 > x) P(X2 > x) ... P(Xn > x) = 1 − [1 − F(x)]^n

and the density function is simply

f_{X(1)}(x) = d/dx { 1 − [1 − F(x)]^n } = n [1 − F(x)]^{n−1} f(x)
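A simulation sketch (not from the notes) checking both CDFs for uniform samples, where F(x) = x:

```python
# For X_i ~ U(0,1): P{X_(n) <= x} = x^n and P{X_(1) <= x} = 1 - (1-x)^n.
import random

n, x, trials = 5, 0.7, 200_000
max_hits = min_hits = 0
for _ in range(trials):
    sample = [random.random() for _ in range(n)]
    max_hits += max(sample) <= x
    min_hits += min(sample) <= x
print(max_hits / trials, x**n)            # ~ 0.168
print(min_hits / trials, 1 - (1 - x)**n)  # ~ 0.998
```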

Joint density function of order statistics

For x1 ≤ x2 ≤ ... ≤ xn,

f_{X(1), X(2), ..., X(n)}(x1, x2, ..., xn) = n! f(x1) f(x2) ... f(xn).

Intuitively (and heuristically), for n = 2:

P{X(1) = x1, X(2) = x2} = P{X1 = x1, X2 = x2 OR X2 = x1, X1 = x2}
= P{X1 = x1, X2 = x2} + P{X2 = x1, X1 = x2}
= f(x1) f(x2) + f(x1) f(x2)
= 2! f(x1) f(x2)

Warning: this is a heuristic argument, not a rigorous proof. Please read the book.


Density function of the kth order statistics, X(k): fX(k)(x).

• Select 1 from the n values X1, X2, ..., Xn. Let it be equal to x.

• Select k − 1 from the remaining n − 1 values. Let them be smaller than x.

• Let the rest be larger than x.

• Total number of choices is ??.

• For a given choice, the probability should be ??

• Select 1 from the n values X1, X2, ..., Xn. Let it be equal to x.

• Select k − 1 from the remaining n − 1 values. Let them be smaller than x.

• Let the rest be larger than x.

• Total number of choices is the multinomial coefficient \binom{n}{1, k−1, n−k}.

• For a given choice, the probability should be f(x) [F(x)]^{k−1} [1 − F(x)]^{n−k}.

Therefore,

f_{X(k)}(x) = \binom{n}{1, k−1, n−k} f(x) [F(x)]^{k−1} [1 − F(x)]^{n−k}

Cumulative distribution function of the kth order statistic

• Via integrating the density function (substituting z = F(x), dz = f(x) dx):

F_{X(k)}(y) = \int_{−∞}^{y} f_{X(k)}(x) dx

= \int_{−∞}^{y} \binom{n}{1, k−1, n−k} f(x) [F(x)]^{k−1} [1 − F(x)]^{n−k} dx

= \binom{n}{1, k−1, n−k} \int_{−∞}^{y} f(x) [F(x)]^{k−1} [1 − F(x)]^{n−k} dx

= \binom{n}{1, k−1, n−k} \int_0^{F(y)} z^{k−1} (1 − z)^{n−k} dz

• Via a direct argument:

F_{X(k)}(y) = P{X(k) ≤ y}

= P{≥ k of the Xi's are ≤ y}

= \sum_{j=k}^{n} P{exactly j of the Xi's are ≤ y}

= \sum_{j=k}^{n} \binom{n}{j} [F(y)]^j [1 − F(y)]^{n−j}

• A more intuitive solution, by dividing the event into non-overlapping segments:

{X(k) ≤ y} = {y ∈ [X(n), ∞)} ∪ {y ∈ [X(n−1), X(n))} ∪ ... ∪ {y ∈ [X(k), X(k+1))}

P{y ∈ [X(n), ∞)} = \binom{n}{n} [F(y)]^n

P{y ∈ [X(n−1), X(n))} = \binom{n}{n−1} [F(y)]^{n−1} [1 − F(y)]

...

P{y ∈ [X(k), X(k+1))} = \binom{n}{k} [F(y)]^k [1 − F(y)]^{n−k}

Summing these probabilities gives F_{X(k)}(y) = \sum_{j=k}^{n} \binom{n}{j} [F(y)]^j [1 − F(y)]^{n−j}, as before.

An interesting identity, obtained by letting X ∼ U(0, 1) (so F(y) = y):

\sum_{j=k}^{n} \binom{n}{j} y^j (1 − y)^{n−j} = \binom{n}{1, k−1, n−k} \int_0^{y} x^{k−1} (1 − x)^{n−k} dx,  0 < y < 1.
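A numeric sketch of this identity (not from the notes), integrating the right-hand side with a simple midpoint rule; n = 7, k = 3, y = 0.4 are arbitrary test values.

```python
# Check: sum_{j=k}^{n} C(n,j) y^j (1-y)^{n-j} equals the multinomial-weighted integral.
from math import comb

def lhs(n, k, y):
    return sum(comb(n, j) * y**j * (1 - y)**(n - j) for j in range(k, n + 1))

def rhs(n, k, y, steps=100_000):
    coef = k * comb(n, k)  # multinomial coefficient n! / (1! (k-1)! (n-k)!)
    h = y / steps
    return coef * h * sum(((i + 0.5) * h)**(k - 1) * (1 - (i + 0.5) * h)**(n - k)
                          for i in range(steps))

print(lhs(7, 3, 0.4), rhs(7, 3, 0.4))  # both ~ 0.5801
```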

Joint Probability Distribution of Functions of Random Variables

X1 and X2 are jointly continuous random variables with probability density function f_{X1,X2}.

Y1 = g1(X1, X2), Y2 = g2(X1, X2).

The goal: determine the density function f_{Y1,Y2}.

The formula to be remembered:

f_{Y1,Y2}(y1, y2) = f_{X1,X2}(x1, x2) |J(x1, x2)|^{−1}

where the inverse functions are

x1 = h1(y1, y2), x2 = h2(y1, y2)

and the Jacobian is

J(x1, x2) = det [ ∂g1/∂x1, ∂g1/∂x2 ; ∂g2/∂x1, ∂g2/∂x2 ] = (∂g1/∂x1)(∂g2/∂x2) − (∂g2/∂x1)(∂g1/∂x2)

Two conditions:

• y1 = g1(x1, x2) and y2 = g2(x1, x2) can be uniquely solved: x1 = h1(y1, y2) and x2 = h2(y1, y2).

• The Jacobian J(x1, x2) ≠ 0 for all points (x1, x2).


Example 7a: Let X1 and X2 be jointly continuous random variables with

probability density fX1,X2 . Let Y1 = X1 + X2 and Y2 = X1 − X2.

Q: Find fY1,Y2 in terms of fX1,X2 .

Solution:

y1 = g1(x1, x2) = x1 + x2, y2 = g2(x1, x2) = x1 − x2

=⇒ x1 = h1(y1, y2) = (y1 + y2)/2, x2 = h2(y1, y2) = (y1 − y2)/2

J(x1, x2) = det [ 1, 1 ; 1, −1 ] = −2

Therefore,

f_{Y1,Y2}(y1, y2) = f_{X1,X2}(x1, x2) |J(x1, x2)|^{−1} = (1/2) f_{X1,X2}( (y1 + y2)/2, (y1 − y2)/2 )

Example 7b: The joint density in polar coordinates.

x = h1(r, θ) = r cos θ, y = h2(r, θ) = r sin θ, i.e.,

r = g1(x, y) = \sqrt{x^2 + y^2}, θ = g2(x, y) = tan^{−1}(y/x).

Q (a): Express f_{R,Θ}(r, θ) in terms of f_{X,Y}(x, y).

Q (b): What if X and Y are independent standard normal random variables?

Solution: First, compute the Jacobian J(x, y):

∂g1/∂x = x / \sqrt{x^2 + y^2} = cos θ

∂g1/∂y = y / \sqrt{x^2 + y^2} = sin θ

∂g2/∂x = ( 1/(1 + (y/x)^2) ) ( −y/x^2 ) = −y/(x^2 + y^2) = −sin θ / r

∂g2/∂y = ( 1/(1 + (y/x)^2) ) ( 1/x ) = x/(x^2 + y^2) = cos θ / r

J = (∂g1/∂x)(∂g2/∂y) − (∂g1/∂y)(∂g2/∂x) = cos^2 θ / r + sin^2 θ / r = 1/r

Therefore,

f_{R,Θ}(r, θ) = f_{X,Y}(x, y) |J|^{−1} = r f_{X,Y}(r cos θ, r sin θ),

where 0 < r < ∞, 0 < θ < 2π.

If X and Y are independent standard normals, then

f_{R,Θ}(r, θ) = r f_{X,Y}(r cos θ, r sin θ) = (r/(2π)) e^{−(x^2+y^2)/2} = [ r e^{−r^2/2} ] [ 1/(2π) ]

Since the joint density factors into a function of r times a function of θ, R and Θ are independent, with Θ ∼ U(0, 2π) and f_R(r) = r e^{−r^2/2}.
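A quick empirical sketch (not from the notes): for standard normal pairs, Θ should average π (the mean of U(0, 2π)) and E(R^2) = E(X^2 + Y^2) = 2.

```python
# Polar coordinates of independent standard normals: check two summary statistics.
import math, random, statistics

thetas, r2 = [], []
for _ in range(200_000):
    x, y = random.gauss(0, 1), random.gauss(0, 1)
    thetas.append(math.atan2(y, x) % (2 * math.pi))
    r2.append(x * x + y * y)
print(statistics.fmean(thetas), math.pi)  # ~ pi
print(statistics.fmean(r2), 2.0)          # E(R^2) = 2
```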


Example 7d: X1, X2, and X3 are independent standard normal random

variables. Let

Y1 = X1 + X2 + X3

Y2 = X1 − X2

Y3 = X1 − X3

Q: Compute the joint density function fY1,Y2,Y3(y1, y2, y3).

Solution:

Compute the Jacobian:

J = det [ 1, 1, 1 ; 1, −1, 0 ; 1, 0, −1 ] = 3

Solve for X1, X2, and X3:

X1 = (Y1 + Y2 + Y3)/3, X2 = (Y1 − 2Y2 + Y3)/3, X3 = (Y1 + Y2 − 2Y3)/3

f_{Y1,Y2,Y3}(y1, y2, y3) = f_{X1,X2,X3}(x1, x2, x3) |J|^{−1}

= (1/(3(2π)^{3/2})) e^{−(x1^2 + x2^2 + x3^2)/2} = (1/(3(2π)^{3/2})) e^{−(1/2) [ y1^2/3 + 2y2^2/3 + 2y3^2/3 − 2y2y3/3 ]}

More about Determinants

det [ a, b, c ; d, e, f ; g, h, i ] = aei + bfg + cdh − ceg − bdi − afh

For a triangular matrix with diagonal entries x1, x2, ..., xn (zeros on one side of the diagonal, arbitrary entries on the other), the determinant is the product of the diagonal entries:

det = x1 x2 x3 ... xn

Chapter 7: Properties of Expectation

If X is a discrete random variable with mass function p(x), then

E(X) = \sum_x x p(x)

If X is a continuous random variable with density f(x), then

E(X) = \int_{−∞}^{∞} x f(x) dx

If a ≤ X ≤ b, then a ≤ E(X) ≤ b.

Proposition 2.1:

If X and Y have a joint probability mass function p(x, y), then

E[g(X, Y)] = \sum_y \sum_x g(x, y) p(x, y)

If X and Y have a joint probability density function f(x, y), then

E[g(X, Y)] = \int_{−∞}^{∞} \int_{−∞}^{∞} g(x, y) f(x, y) dx dy

What if g(X, Y) = X + Y?

Example 2a: An accident occurs at a point X that is uniformly distributed on a road of length L. At the time of the accident an ambulance is at a location Y that is also uniformly distributed on the road. Assuming that X and Y are independent, find the expected distance between the ambulance and the point of the accident.

Solution:

X ∼ U(0, L), Y ∼ U(0, L). We want E(|X − Y|).

E(|X − Y|) = \iint |x − y| f(x, y) dx dy

E(|X − Y|) = \iint |x − y| f(x, y) dx dy

= (1/L^2) \int_0^{L} \int_0^{L} |x − y| dx dy

= (1/L^2) \int_0^{L} \int_0^{x} (x − y) dy dx + (1/L^2) \int_0^{L} \int_x^{L} (y − x) dy dx

= (1/L^2) \int_0^{L} [ xy − y^2/2 ]_0^x dx + (1/L^2) \int_0^{L} [ y^2/2 − xy ]_x^L dx

= (1/L^2) \int_0^{L} (x^2/2) dx + (1/L^2) \int_0^{L} ( L^2/2 − xL + x^2/2 ) dx

= L/3
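A one-line Monte Carlo sketch (not from the notes) of E|X − Y| = L/3:

```python
# Average distance between two independent U(0, L) points; expect L/3.
import random

L, n = 60.0, 500_000
est = sum(abs(random.uniform(0, L) - random.uniform(0, L)) for _ in range(n)) / n
print(est, L / 3)  # ~ 20
```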

Let X and Y be continuous random variables with density f(x, y), and let g(X, Y) = X + Y.

E[g(X, Y)] = E(X + Y)

= \int_{−∞}^{∞} \int_{−∞}^{∞} (x + y) f(x, y) dx dy

= \int_{−∞}^{∞} \int_{−∞}^{∞} x f(x, y) dx dy + \int_{−∞}^{∞} \int_{−∞}^{∞} y f(x, y) dx dy

= \int_{−∞}^{∞} x [ \int_{−∞}^{∞} f(x, y) dy ] dx + \int_{−∞}^{∞} y [ \int_{−∞}^{∞} f(x, y) dx ] dy

= \int_{−∞}^{∞} x f_X(x) dx + \int_{−∞}^{∞} y f_Y(y) dy

= E(X) + E(Y)

Linearity of Expectations

E(X + Y) = E(X) + E(Y)

E(X + Y + Z) = E((X + Y) + Z) = E(X + Y) + E(Z) = E(X) + E(Y) + E(Z)

E(X1 + X2 + ... + Xn) = E(X1) + E(X2) + ... + E(Xn)

Linearity is not affected by correlations among the Xi's. No assumption of independence is needed.

The Sample Mean

Let X1, X2, ..., Xn be identically distributed random variables having expected value µ. The quantity X̄, defined by

X̄ = (1/n) \sum_{i=1}^{n} Xi,

is called the sample mean.

E(X̄) = E[ (1/n) \sum_{i=1}^{n} Xi ] = (1/n) E[ \sum_{i=1}^{n} Xi ] = (1/n) \sum_{i=1}^{n} E[Xi] = (1/n) \sum_{i=1}^{n} µ = (1/n) nµ = µ

When µ is unknown, an unbiased estimator of µ is X̄.

Expectation of a Binomial Random Variable

X ∼ Binomial(n, p) can be written as

X = Z1 + Z2 + ... + Zn,

where the Zi, i = 1 to n, are independent Bernoulli with probability of success p. Because E(Zi) = p,

E(X) = E( \sum_{i=1}^{n} Zi ) = \sum_{i=1}^{n} E(Zi) = np

Expectation of a Hypergeometric Random Variable

Suppose n balls are randomly selected, without replacement, from an urn containing N balls of which m are white. Denote X ∼ HG(N, m, n).

Q: Find E(X).

Solution 1: Let X = X1 + X2 + ... + Xm, where

Xi = { 1 if the ith white ball is selected;  0 otherwise }

Note that the Xi's are not independent; but we only need the expectation.

E(Xi) = P{Xi = 1} = P{ith white ball is selected} = \binom{1}{1} \binom{N−1}{n−1} / \binom{N}{n} = n/N

Therefore,

E(X) = mn/N.

Solution 2: Let X = Y1 + Y2 + ... + Yn, where

Yi = { 1 if the ith ball selected is white;  0 otherwise }

E(Yi) = P{Yi = 1} = P{ith ball selected is white} = m/N

Therefore,

E(X) = mn/N.

Expected Number of Matches

A group of N people throw their hats into the center of a room. The hats are mixed up and each person randomly selects one.

Q: Find the expected number of people who select their own hat.

Solution: Let X denote the number of matches and let

X = X1 + X2 + ... + XN,

where

Xi = { 1 if the ith person selects his own hat;  0 otherwise }

E(Xi) = P{Xi = 1} = 1/N

Thus,

E(X) = \sum_{i=1}^{N} E(Xi) = (1/N) N = 1
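A simulation sketch (not from the notes): the mean number of matches stays at 1 regardless of N.

```python
# Hat-matching problem: average number of fixed points of a random permutation.
import random, statistics

def matches(N):
    hats = list(range(N))
    random.shuffle(hats)
    return sum(i == h for i, h in enumerate(hats))

for N in (3, 10, 100):
    print(N, statistics.fmean(matches(N) for _ in range(100_000)))  # all ~ 1.0
```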

Expectations May Be Undefined

A more rigorous definition: If X is a discrete random variable with mass function p(x), then

E(X) = \sum_x x p(x), provided \sum_x |x| p(x) < ∞

If X is a continuous random variable with density f(x), then

E(X) = \int_{−∞}^{∞} x f(x) dx, provided \int_{−∞}^{∞} |x| f(x) dx < ∞

Example (Cauchy Distribution):

f(x) = (1/π) (1/(x^2 + 1)), −∞ < x < ∞

E(X) = \int_{−∞}^{∞} (x/π) (1/(x^2 + 1)) dx = 0 ??? (Answer is NO)

because

\int_{−∞}^{∞} (|x|/π) (1/(x^2 + 1)) dx = \int_0^{∞} (2x/π) (1/(x^2 + 1)) dx

= \int_0^{∞} (1/π) (1/(x^2 + 1)) d[x^2 + 1]

= (1/π) log(x^2 + 1) |_0^{∞} = ∞
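A sketch (not from the notes) illustrating the consequence: running means of Cauchy samples never settle down, unlike samples from a distribution with a finite mean.

```python
# Running means of standard Cauchy draws keep jumping around as n grows.
import math, random

random.seed(0)
total = 0.0
for n in range(1, 10**6 + 1):
    total += math.tan(math.pi * (random.random() - 0.5))  # inverse-CDF Cauchy draw
    if n % 200_000 == 0:
        print(n, total / n)  # no convergence, in contrast to the law of large numbers
```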

Section 7.3: Moments of the Number of Events

Ii = { 1 if Ai occurs;  0 otherwise }

X = \sum_{i=1}^{n} Ii is the number of these events Ai that occur. We know

E(X) = \sum_{i=1}^{n} E(Ii) = \sum_{i=1}^{n} P(Ai)

Suppose we are interested in the number of pairs of events that occur, e.g., AiAj.

X(X − 1)/2 = \binom{X}{2} = number of pairs of events that occur

⇐⇒ \sum_{i<j} Ii Ij = number of pairs of events that occur, because

Ii Ij = { 1 if both Ai and Aj occur;  0 otherwise }

Therefore,

E( X(X − 1)/2 ) = \sum_{i<j} P(AiAj)

E(X^2) − E(X) = 2 \sum_{i<j} P(AiAj)

Second Moments of a Binomial Random Variable

n independent trials, with each trial being a success with probability p.

Ai = the event that trial i is a success.

If i ≠ j, then P(AiAj) = p^2.

E(X^2) − E(X) = 2 \sum_{i<j} P(AiAj) = 2 \binom{n}{2} p^2 = n(n − 1) p^2

E(X^2) = n^2 p^2 − n p^2 + np

Var(X) = [ n^2 p^2 − n p^2 + np ] − [np]^2 = np(1 − p)

Second Moments of a Hypergeometric Random Variable

n balls are randomly selected without replacement from an urn containing N balls, of which m are white.

Ai = event that the ith ball selected is white.

X = number of white balls selected. We have shown

E(X) = \sum_{i=1}^{n} P(Ai) = nm/N.

P(AiAj) = P(Ai) P(Aj|Ai) = (m/N) ((m − 1)/(N − 1)), i ≠ j

E(X^2) − E(X) = 2 \sum_{i<j} P(AiAj) = 2 \binom{n}{2} (m/N) ((m − 1)/(N − 1))

E(X^2) = n(n − 1) (m/N) ((m − 1)/(N − 1)) + nm/N

Var(X) = nm(N − n)(N − m) / (N^2 (N − 1)) = n (m/N) (1 − m/N) (N − n)/(N − 1)

One can view m/N = p. Note that (N − n)/(N − 1) ≈ 1 if N is much larger than n.

Second Moments of the Matching Problem

Let Ai = event that person i selects his/her own hat in the matching problem, i = 1 to N.

P(AiAj) = P(Ai) P(Aj|Ai) = (1/N) (1/(N − 1)), i ≠ j

X = number of people who select their own hat. We have shown E(X) = 1.

E(X^2) − E(X) = 2 \sum_{i<j} P(AiAj) = N(N − 1) (1/(N(N − 1))) = 1

Var(X) = E(X^2) − E^2(X) = 2 − 1 = 1

Section 7.4: Covariance and Correlation

Proposition 4.1: If X and Y are independent, then

E(XY) = E(X) E(Y).

More generally, for any functions h and g,

E[h(X) g(Y)] = E[h(X)] E[g(Y)].

Proof:

E[h(X) g(Y)] = \int_{−∞}^{∞} \int_{−∞}^{∞} h(x) g(y) f(x, y) dx dy

= \int_{−∞}^{∞} \int_{−∞}^{∞} h(x) g(y) f_X(x) f_Y(y) dx dy

= [ \int_{−∞}^{∞} h(x) f_X(x) dx ] [ \int_{−∞}^{∞} g(y) f_Y(y) dy ]

= E[h(X)] E[g(Y)]

Covariance

Definition: The covariance between X and Y is defined by

Cov(X, Y) = E[(X − E[X])(Y − E[Y])]

Equivalently,

Cov(X, Y) = E(XY) − E(X) E(Y)

Proof:

Cov(X, Y) = E[(X − E[X])(Y − E[Y])]

= E[ XY − E[X] Y − E[Y] X + E[X] E[Y] ]

= E[XY] − E[X] E[Y] − E[Y] E[X] + E[X] E[Y]

= E[XY] − E[X] E[Y]

Note that E(X) is a constant, and so is E(X) E(Y).

X and Y independent =⇒ Cov(X, Y) = E(XY) − E(X) E(Y) = 0.

However, the converse is false: that X and Y are uncorrelated, Cov(X, Y) = 0, does not imply that X and Y are independent. For example, if X takes the values −1, 0, 1 with equal probability and Y = X^2, then Cov(X, Y) = E(X^3) − E(X) E(X^2) = 0, yet Y is completely determined by X.

Properties of Covariance: Proposition 4.2

• Cov(X, Y) = Cov(Y, X)

• Cov(X, X) = Var(X)

• Cov(aX, Y) = a Cov(X, Y)

These follow directly from the definition Cov(X, Y) = E(XY) − E(X) E(Y). Also,

Cov( \sum_{i=1}^{n} Xi, \sum_{j=1}^{m} Yj ) = \sum_{i=1}^{n} \sum_{j=1}^{m} Cov(Xi, Yj)

Proof:

Cov( \sum_{i=1}^{n} Xi, \sum_{j=1}^{m} Yj )

= E[ ( \sum_{i=1}^{n} Xi )( \sum_{j=1}^{m} Yj ) ] − E( \sum_{i=1}^{n} Xi ) E( \sum_{j=1}^{m} Yj )

= E[ \sum_{i=1}^{n} \sum_{j=1}^{m} Xi Yj ] − \sum_{i=1}^{n} E(Xi) \sum_{j=1}^{m} E(Yj)

= \sum_{i=1}^{n} \sum_{j=1}^{m} E(Xi Yj) − \sum_{i=1}^{n} \sum_{j=1}^{m} E(Xi) E(Yj)

= \sum_{i=1}^{n} \sum_{j=1}^{m} [ E(Xi Yj) − E(Xi) E(Yj) ]

= \sum_{i=1}^{n} \sum_{j=1}^{m} Cov(Xi, Yj)

Cov( \sum_{i=1}^{n} Xi, \sum_{j=1}^{m} Yj ) = \sum_{i=1}^{n} \sum_{j=1}^{m} Cov(Xi, Yj)

If n = m and Xi = Yi, then

Var( \sum_{i=1}^{n} Xi ) = Cov( \sum_{i=1}^{n} Xi, \sum_{j=1}^{n} Xj ) = \sum_{i=1}^{n} \sum_{j=1}^{n} Cov(Xi, Xj)

= \sum_{i=1}^{n} Cov(Xi, Xi) + \sum_{i≠j} Cov(Xi, Xj)

= \sum_{i=1}^{n} Var(Xi) + \sum_{i≠j} Cov(Xi, Xj)

= \sum_{i=1}^{n} Var(Xi) + 2 \sum_{i<j} Cov(Xi, Xj)

If the Xi's are pairwise independent, then

Var( \sum_{i=1}^{n} Xi ) = \sum_{i=1}^{n} Var(Xi)

Example 4b: X ∼ Binomial(n, p) can be written as

X = X1 + X2 + ... + Xn,

where Xi ∼ Bernoulli(p) and the Xi's are independent. Therefore,

Var(X) = Var( \sum_{i=1}^{n} Xi ) = \sum_{i=1}^{n} Var(Xi) = \sum_{i=1}^{n} p(1 − p) = np(1 − p).

Example 4a: Let X1, X2, ..., Xn be independent and identically distributed random variables having expected value µ and variance σ^2. Let X̄ be the sample mean and S^2 be the sample variance, where

X̄ = (1/n) \sum_{i=1}^{n} Xi, S^2 = (1/(n − 1)) \sum_{i=1}^{n} (Xi − X̄)^2

Q: Find Var(X̄) and E(S^2).

Solution:

Var(X̄) = Var( (1/n) \sum_{i=1}^{n} Xi ) = (1/n^2) Var( \sum_{i=1}^{n} Xi ) = (1/n^2) \sum_{i=1}^{n} Var(Xi) = (1/n^2) \sum_{i=1}^{n} σ^2 = σ^2/n

We've already shown E(X̄) = µ. As the sample size n increases, the variance of X̄, the unbiased estimator of µ, decreases.

Cornell University, BTRY 4080 / STSCI 4080 Fall 2009 Instructor: Ping Li 369

The Less Clever Approach (different from the book)

E(S2) =E

(

1

n − 1

n∑

i=1

(Xi − X)2

)

=1

n − 1E

(

n∑

i=1

(

X2i + X2 − 2XXi

)

)

=1

n − 1E

(

n∑

i=1

X2i + nX2 − 2X

n∑

i=1

Xi

)

=1

n − 1E

(

n∑

i=1

X2i − nX2

)

=1

n − 1

(

n∑

i=1

E(X2i ) − nE

(

X2)

)

=n

n − 1

(

µ2 + σ2 − E(

X2))

Page 370: BTRY 4080 / STSCI 4080, Fall 2009

Cornell University, BTRY 4080 / STSCI 4080 Fall 2009 Instructor: Ping Li 370

E(X2) =1

n2E

(

n∑

i=1

Xi

)2

=1

n2E

n∑

i

X2i + 2

i<j

XiXj

=1

n2

n∑

i

E(X2i ) + 2

i<j

E(XiXj)

=1

n2

n∑

i

(µ2 + σ2) + 2∑

i<j

µ2

=1

n2

[

n(

µ2 + σ2)

+ n(n − 1)µ2]

=σ2

n+ µ2

Or, directly,

E(X2) =V ar(

X)

+ E2(

X)

=σ2

n+ µ2

Therefore,

E(S^2) = (n/(n − 1)) ( µ^2 + σ^2 − E(X̄^2) ) = (n/(n − 1)) ( µ^2 + σ^2 − σ^2/n − µ^2 ) = σ^2

The sample variance S^2 = (1/(n − 1)) \sum_{i=1}^{n} (Xi − X̄)^2 is an unbiased estimator of σ^2.

Why 1/(n − 1) instead of 1/n?

\sum_{i=1}^{n} (Xi − X̄)^2 is a sum of n terms with only n − 1 degrees of freedom, because X̄ = (1/n) \sum_{i=1}^{n} Xi.
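A simulation sketch (not from the notes) contrasting the unbiased 1/(n − 1) estimator with the biased 1/n version, whose expectation is (n − 1)σ^2/n:

```python
# Compare the averages of the 1/(n-1) and 1/n sample-variance estimators.
import random

n, mu, sigma, reps = 5, 0.0, 2.0, 200_000
s2_unbiased = s2_biased = 0.0
for _ in range(reps):
    x = [random.gauss(mu, sigma) for _ in range(n)]
    xbar = sum(x) / n
    ss = sum((v - xbar) ** 2 for v in x)
    s2_unbiased += ss / (n - 1)
    s2_biased += ss / n
print(s2_unbiased / reps, s2_biased / reps, sigma**2)  # ~ 4.0, ~ 3.2, 4
```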

The Standard Clever Technique for evaluating E(S^2) (as in the textbook):

(n − 1) S^2 = \sum_{i=1}^{n} (Xi − X̄)^2 = \sum_{i=1}^{n} (Xi − µ + µ − X̄)^2

= \sum_{i=1}^{n} (Xi − µ)^2 + \sum_{i=1}^{n} (X̄ − µ)^2 − 2 \sum_{i=1}^{n} (Xi − µ)(X̄ − µ)

= \sum_{i=1}^{n} (Xi − µ)^2 − n (X̄ − µ)^2

E[(n − 1) S^2] = (n − 1) E(S^2) = E[ \sum_{i=1}^{n} (Xi − µ)^2 − n (X̄ − µ)^2 ]

= n Var(Xi) − n Var(X̄) = n σ^2 − n (σ^2/n) = (n − 1) σ^2

The Correlation of Two Random Variables

Definition: The correlation of two random variables X and Y, denoted by ρ(X, Y), is defined, as long as Var(X) > 0 and Var(Y) > 0, by

ρ(X, Y) = Cov(X, Y) / \sqrt{Var(X) Var(Y)}

It can be shown that

−1 ≤ ρ(X, Y) ≤ 1

−1 ≤ ρ(X, Y) = Cov(X, Y) / \sqrt{Var(X) Var(Y)} ≤ 1

Proof: The basic trick: a variance is always non-negative, Var(·) ≥ 0.

Var( X/\sqrt{Var(X)} + Y/\sqrt{Var(Y)} ) = Var(X)/Var(X) + Var(Y)/Var(Y) + 2 Cov(X, Y)/\sqrt{Var(X) Var(Y)}

= 2 (1 + ρ(X, Y)) ≥ 0

Therefore, ρ(X, Y) ≥ −1.

Var( X/\sqrt{Var(X)} − Y/\sqrt{Var(Y)} ) = Var(X)/Var(X) + Var(Y)/Var(Y) − 2 Cov(X, Y)/\sqrt{Var(X) Var(Y)}

= 2 (1 − ρ(X, Y)) ≥ 0

Therefore, ρ(X, Y) ≤ 1.


• ρ(X, Y ) = +1 =⇒ Y = a + bX , and b > 0.

• ρ(X, Y ) = −1 =⇒ Y = a − bX , and b > 0.

• ρ(X, Y ) = 0 =⇒ X and Y are uncorrelated

• ρ(X, Y ) is close to +1 =⇒ X and Y are highly positively correlated.

• ρ(X, Y ) is close to −1 =⇒ X and Y are highly negatively correlated.

Section 7.5: Conditional Expectation

If X and Y are jointly discrete random variables,

E[X | Y = y] = \sum_x x P{X = x | Y = y} = \sum_x x p_{X|Y}(x|y)

If X and Y are jointly continuous random variables,

E[X | Y = y] = \int_{−∞}^{∞} x f_{X|Y}(x|y) dx

Example 5b: Suppose the joint density of X and Y is given by

f(x, y) = (1/y) e^{−x/y} e^{−y}, 0 < x, y < ∞

Q: Compute

E[X | Y = y] = \int_{−∞}^{∞} x f_{X|Y}(x|y) dx

What should the answer look like? (A function of y.)

Solution:

f_{X|Y}(x|y) = f(x, y) / f_Y(y) = f(x, y) / \int_{−∞}^{∞} f(x, y) dx

= [ (1/y) e^{−x/y} e^{−y} ] / [ e^{−y} \int_0^{∞} (1/y) e^{−x/y} dx ]

= [ (1/y) e^{−x/y} e^{−y} ] / e^{−y}

= (1/y) e^{−x/y}, y > 0

E[X | Y = y] = \int_{−∞}^{∞} x f_{X|Y}(x|y) dx = \int_0^{∞} (x/y) e^{−x/y} dx = y

(The last integral is the mean of an exponential distribution with mean y.)

Computing Expectations by Conditioning

Proposition 5.1: The tower property:

E(X) = E[E[X | Y]]

In some cases, computing E(X) directly is difficult, but computing E(X | Y) may be much more convenient.

Proof: Assume X and Y are continuous. (The book assumes discrete variables.)

E[E[X | Y]] = \int_{−∞}^{∞} E(X | Y = y) f_Y(y) dy

= \int_{−∞}^{∞} [ \int_{−∞}^{∞} x f_{X|Y}(x|y) dx ] f_Y(y) dy

= \int_{−∞}^{∞} \int_{−∞}^{∞} x f_{X|Y}(x|y) f_Y(y) dx dy

= \int_{−∞}^{∞} \int_{−∞}^{∞} x f(x, y) dx dy

= \int_{−∞}^{∞} x [ \int_{−∞}^{∞} f(x, y) dy ] dx

= \int_{−∞}^{∞} x f_X(x) dx = E(X)

Example 5d: Suppose N, the number of people entering a store on a given day, is a random variable with mean 50. Suppose further that the amounts of money spent by these customers are independent random variables having a common mean $8. Assume the amount of money spent by a customer is also independent of the total number of customers who enter the store.

Q: What is the expected amount of money spent in the store on a given day?

Solution:

N: the number of people entering the store. E(N) = 50.

Xi: the amount spent by the ith customer. E(Xi) = 8.

X: the total amount of money spent in the store on a given day. E(X) = ??

We cannot use linearity directly:

E(X) = E( \sum_{i=1}^{N} Xi ) = \sum_{i=1}^{N} E(Xi) = \sum_{i=1}^{N} 8 ???

because N, the number of terms in the summation, is also random. Instead, condition on N:

E(X) = E( \sum_{i=1}^{N} Xi ) = E[ E( \sum_{i=1}^{N} Xi | N ) ] = E[ \sum_{i=1}^{N} E(Xi) ] = E(8N) = 8 E(N) = 8 × 50 = $400
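A simulation sketch (not from the notes): the slide only specifies E(N) = 50 and E(Xi) = 8, so the concrete distributions below (N uniform on 0..100, exponential spending) are arbitrary stand-ins chosen to match those means.

```python
# Random-sum expectation: E(sum of N i.i.d. X_i) = E(N) * E(X).
import random

reps, total = 100_000, 0.0
for _ in range(reps):
    n = random.randint(0, 100)  # any nonnegative N with mean 50 works here
    total += sum(random.expovariate(1 / 8) for _ in range(n))  # each spend: mean $8
print(total / reps, 8 * 50)  # ~ 400
```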

Example 2e: A game is begun by rolling an ordinary pair of dice.

• If the sum of the dice is 2, 3, or 12, the player loses.

• If the sum is 7 or 11, the player wins.

• If the sum is any other number i, the player continues to roll the dice until the sum is either 7 or i.

  – If it is 7, the player loses;

  – If it is i, the player wins.

—————————–

Q: Compute E(R), where R = number of rolls in a game.

Conditioning on S, the initial sum,

E(R) = \sum_{i=2}^{12} E(R | S = i) P(S = i)

Let P(S = i) = Pi. Then

P2 = 1/36, P3 = 2/36, P4 = 3/36, P5 = 4/36, P6 = 5/36, P7 = 6/36,
P8 = 5/36, P9 = 4/36, P10 = 3/36, P11 = 2/36, P12 = 1/36

Equivalently,

Pi = P_{14−i} = (i − 1)/36, i = 2, 3, ..., 7

Obviously,

E(R | S = i) = { 1, if i = 2, 3, 7, 11, 12;  1 + E(Z), otherwise }

where Z = the number of additional rolls until the sum is either i or 7. Note that i ∉ {2, 3, 7, 11, 12} at this point.

Z is a geometric random variable with success probability Pi + P7. Thus E(Z) = 1/(Pi + P7).

Therefore,

E(R) = 1 + \sum_{i=4}^{6} Pi/(Pi + P7) + \sum_{i=8}^{10} Pi/(Pi + P7) = 3.376
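A simulation sketch (not from the notes) of the game, confirming E(R) ≈ 3.376:

```python
# Simulate the dice game and average the number of rolls per game.
import random, statistics

def rolls():
    s = random.randint(1, 6) + random.randint(1, 6)
    count = 1
    if s in (2, 3, 7, 11, 12):
        return count  # game decided on the first roll
    while True:
        count += 1
        if random.randint(1, 6) + random.randint(1, 6) in (s, 7):
            return count

print(statistics.fmean(rolls() for _ in range(500_000)))  # ~ 3.376
```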

Computing Probabilities by Conditioning

If Y is discrete, then

P(E) = \sum_y P(E | Y = y) P(Y = y)

If Y is continuous, then

P(E) = \int_{−∞}^{∞} P(E | Y = y) f_Y(y) dy


The Best Prize Problem: Deal? No Deal!

We are presented with n distinct prizes in sequence. After being presented with a

prize we must decide whether to accept it or to reject it and consider the next

prize. The objective is to maximize the probability of obtaining the best prize.

Assuming that all n! orderings are equally likely.

Q: How well can we do? What would be a good strategy?

—————————–

Many related applications: online algorithms, hiring decisions, dating, etc.

A plausible strategy: Fix a number k, 0 ≤ k ≤ n. Reject the first k prizes and then accept the first one that is better than all of those first k.

Let X = the location of the best prize, i.e., 1 ≤ X ≤ n.

Let Pk(best) = the probability that the best prize is selected using this strategy.

Pk(best) = \sum_{i=1}^{n} Pk(best | X = i) P(X = i) = (1/n) \sum_{i=1}^{n} Pk(best | X = i)

Obviously,

Pk(best | X = i) = 0, if i ≤ k

Assume i > k. The strategy picks the best prize exactly when none of the prizes in positions k + 1, ..., i − 1 is accepted first, i.e., when the best of the first i − 1 prizes falls among the first k:

Pk(best | X = i) = P{best of the first i − 1 is among the first k | X = i} = k/(i − 1)

Therefore,

Pk(best) = (1/n) \sum_{i=1}^{n} Pk(best | X = i) = (k/n) \sum_{i=k+1}^{n} 1/(i − 1)

Next question: which k to use? We choose the k that maximizes Pk(best). Exhaustive search is the best thing to do, since it only involves a linear scan. However, it is often useful to have an "analytical" (albeit approximate) expression.

A useful approximate formula:

\sum_{i=1}^{n} 1/i = 1 + 1/2 + 1/3 + ... + 1/n ≈ log n + γ

where Euler's constant

γ = 0.57721566490153286060651209008240243104215933593992...

The approximation is asymptotically (as n → ∞) exact.

Pk(best) = (k/n) \sum_{i=k+1}^{n} 1/(i − 1)

= (k/n) ( 1/k + 1/(k + 1) + ... + 1/(n − 1) )

= (k/n) ( \sum_{i=1}^{n−1} 1/i − \sum_{i=1}^{k−1} 1/i )

≈ (k/n) ( log(n − 1) − log(k − 1) )

= (k/n) log( (n − 1)/(k − 1) )

Pk(best) ≈ (k/n) log( (n − 1)/(k − 1) ) ≈ (k/n) log(n/k)

Treating k as continuous and differentiating,

[Pk(best)]' ≈ (1/n) log(n/k) − (k/n)(1/k) = 0 =⇒ log(n/k) = 1 =⇒ k = n/e

Also, when k = n/e, Pk(best) ≈ 1/e ≈ 0.36788, as n → ∞.
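A sketch (not from the notes) doing the exhaustive linear scan for a concrete n and comparing with the n/e and 1/e asymptotics:

```python
# Scan all cutoffs k exactly using Pk(best) = (k/n) * sum_{i=k+1}^{n} 1/(i-1).
import math

def p_best(n, k):
    if k == 0:
        return 1.0 / n  # k = 0 means "accept the first prize"
    return (k / n) * sum(1.0 / (i - 1) for i in range(k + 1, n + 1))

n = 100
best_k = max(range(n), key=lambda k: p_best(n, k))
print(best_k, n / math.e)             # 37 vs 36.8
print(p_best(n, best_k), 1 / math.e)  # ~ 0.371 vs 0.368
```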

Conditional Variance

Proposition 5.2:

Var(X) = E[Var(X | Y)] + Var(E[X | Y])

Proof (from the opposite direction, compared to the book):

E[Var(X | Y)] + Var(E[X | Y])

= E( E(X^2 | Y) − E^2(X | Y) ) + ( E(E^2(X | Y)) − E^2(E[X | Y]) )

= ( E(E(X^2 | Y)) − E^2(E[X | Y]) ) + ( −E(E^2(X | Y)) + E(E^2(X | Y)) )

= E(X^2) − E^2(X)

= Var(X)

Example 5o: Let X1, X2, ..., XN be a sequence of independent and identically distributed random variables, and let N be a non-negative integer-valued random variable independent of the Xi.

Q: Compute Var( \sum_{i=1}^{N} Xi ).

Solution:

Var( \sum_{i=1}^{N} Xi ) = E[ Var( \sum_{i=1}^{N} Xi | N ) ] + Var( E[ \sum_{i=1}^{N} Xi | N ] )

= E( N Var(X) ) + Var( N E(X) )

= Var(X) E(N) + E^2(X) Var(N)
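An empirical sketch of this formula (not from the notes); the choice N uniform on {0, ..., 10} and Xi ∼ U(0, 1) is arbitrary, giving E(N) = 5, Var(N) = 10, E(X) = 1/2, Var(X) = 1/12.

```python
# Random-sum variance: Var(sum_{i=1}^N X_i) = Var(X) E(N) + E(X)^2 Var(N).
import random, statistics

def one_sum():
    n = random.randint(0, 10)  # N uniform on {0,...,10}
    return sum(random.random() for _ in range(n))

vals = [one_sum() for _ in range(400_000)]
pred = (1 / 12) * 5 + (1 / 2) ** 2 * 10  # = 2.9167
print(statistics.variance(vals), pred)
```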