
Probability and Statistics (ENM 503)

Michael A. Carchidi

June 30, 2015

Chapter 10 - Simulation and Monte-Carlo Methods

The following notes are based on the textbook entitled: A First Course in Probability by Sheldon Ross (9th edition) and these notes can be viewed at

https://canvas.upenn.edu/

after you log in using your PennKey user name and password.

1. Introduction and Motivation for Simulation

In this chapter, we want to discuss simulation, and specifically Monte-Carlo methods, for computing probabilities. A more detailed discussion of simulation methods, of which Monte-Carlo methods are just one part, is given in the ESE 603 course that is offered during the spring semester; that course serves as a good continuation of the ENM 503 course.

Let us motivate the ideas behind simulation and Monte-Carlo methods in probability by considering the following geometric probability problem. Suppose that two coins of radii R1 and R2 are thrown on a rectangular sheet of paper having length L > 0 and width W > 0 so that the position of each coin’s center uniformly lands somewhere on the sheet of paper. Note that this does not require that the entire coin lands on the paper, only its center. Given these conditions, we would like to compute (in terms of the inputs L, W, R1 and R2) the probability that the two coins overlap. Such a problem is known as a geometric probability problem.

Page 2: notes#10

Without any loss of generality, we may assume that the rectangle is fixed in an xy plane as the region

R = {(x, y) | 0 ≤ x ≤ L, 0 ≤ y ≤ W}, (1)

which is shown in the following figure.

R = {(x, y) | 0 ≤ x ≤ L, 0 ≤ y ≤ W}

If (X1, Y1) give the coordinates of the center of coin 1 and if (X2, Y2) give the coordinates of the center of coin 2, then, under the conditions of the problem, we have X1 and X2, both independent and uniform random variables in the continuous interval [0, L), and we have Y1 and Y2, both independent and uniform random variables in the continuous interval [0, W), i.e.,

X1 ∼ U[0, L) , Y1 ∼ U[0, W) (2a)

and

X2 ∼ U[0, L) , Y2 ∼ U[0, W). (2b)

From the geometry of the problem, we then see that the two coins will overlap when the distance between their centers,

D = √((X2 − X1)² + (Y2 − Y1)²) (3)

is less than or equal to the sum of their radii, i.e., when

D ≤ R1 + R2, (4)


as illustrated in the following two figures.

Here we have D ≤ R1 + R2 and the two coins overlap

and

Here we have D > R1 + R2 and the two coins do not overlap

As mentioned earlier, only the centers of the coins are required to lie in the rectangular region, not the entire coins themselves, as illustrated in the next two figures.

Here we have D ≤ R1 + R2 and the two coins overlap

and

Here we have D > R1 + R2 and the two coins do not overlap

To compute the probability that the two coins overlap requires that we compute

P = Pr(D ≤ R1 + R2) (5)

where D is some random variable that could be as small as zero, when the two centers coincide, or as large as (L² + W²)^{1/2}, when the two centers are on opposite corners of the rectangle. This is somewhat difficult to compute analytically since the random variables X1 and X2 are from U[0, L) and the random variables Y1 and Y2 are from U[0, W), making it difficult to determine the random nature of the random variable D as defined in Equation (3), even though stating the range space of D as

0 ≤ D ≤ √(L² + W²)

is somewhat obvious.

We shall see that simulation offers a way to estimate the probability in Equation (5) using the computer, without requiring much more work than we have already done. Such an estimate will be provided in the last section of this chapter. Before we see how this is accomplished, it should first be noted that since

X1 ∼ U[0, L) , Y1 ∼ U[0, W)

and

X2 ∼ U[0, L) , Y2 ∼ U[0, W),

we have

X1 = LZ11 , Y1 = WZ12 , X2 = LZ21 , Y2 = WZ22

where Z11, Z12, Z21 and Z22 are all independent standard uniform random variables, U[0, 1). Then

P = Pr(D ≤ R1 + R2) = Pr(√((X2 − X1)² + (Y2 − Y1)²) ≤ R1 + R2),

becomes

P = Pr(√((LZ21 − LZ11)² + (WZ22 − WZ12)²) ≤ R1 + R2)
  = Pr(√((Z21 − Z11)² + (W/L)²(Z22 − Z12)²) ≤ (R1 + R2)/L)

or

P = Pr(√((Z21 − Z11)² + ω²(Z22 − Z12)²) ≤ ρ) (6a)

where

ρ = (R1 + R2)/L and ω = W/L, (6b)

thereby showing that P is not a function of the four parameters L, W, R1 and R2 separately, but is rather a function of only the two parameters ρ and ω, and in the special case when W = L (i.e., when ω = 1), P depends only on the single value of ρ. These results will serve as a way of checking the simulation for accuracy by seeing if P stays fixed when one changes the values of L, W, R1 and R2 in a way that keeps the values of ρ and ω fixed.
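Although P is hard to obtain analytically, Equation (5) is easy to estimate by simulation: drop the two centers uniformly at random many times and record the fraction of trials in which D ≤ R1 + R2. The sketch below does this in Python rather than in the Excel worksheet used later in these notes; the function name and the trial count are illustrative choices, not part of the original problem.

    import math
    import random

    def overlap_probability(L, W, R1, R2, trials=100_000):
        # Drop both centers uniformly on the L-by-W sheet and count how
        # often the center-to-center distance is at most R1 + R2.
        hits = 0
        for _ in range(trials):
            x1, y1 = L * random.random(), W * random.random()
            x2, y2 = L * random.random(), W * random.random()
            if math.hypot(x2 - x1, y2 - y1) <= R1 + R2:
                hits += 1
        return hits / trials

    # Scaling check: doubling every length keeps rho and omega fixed,
    # so the two estimates should agree to within sampling error.
    print(overlap_probability(10.0, 8.0, 1.0, 0.5))
    print(overlap_probability(20.0, 16.0, 2.0, 1.0))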

At the heart of all simulations are random numbers, so let us now discuss these. A more detailed discussion is found in the ESE 603 course.

2. The Definition of Random Numbers

A random number (denoted by R) is simply a sample from the standard uniform distribution U[0, 1), whose pdf and cdf are given by

f(x) = 0 for x < 0, f(x) = 1 for 0 ≤ x < 1, and f(x) = 0 for 1 ≤ x,

and

F(x) = 0 for x ≤ 0, F(x) = x for 0 ≤ x ≤ 1, and F(x) = 1 for 1 ≤ x,

respectively. It is easily seen that the mean and variance of the standard uniform distribution are given by

E(X) = ∫₀¹ x dx = 1/2 and V(X) = ∫₀¹ (x − 1/2)² dx = 1/12,

respectively, and these will help when it comes to checking a random sequence for accuracy.

Random numbers are a necessary basic ingredient in simulation because from a sample R ∼ U[0, 1), we shall see that, in theory, it is possible to generate a sample from any other random variable X. The reader may never have to write a computer program to generate random numbers because all well-written simulation software has built-in subroutines, objects, or functions that will generate random numbers. For example, Microsoft Excel, which we shall use later to solve the problem proposed in the introduction, has a routine called RAND() which generates a random number. However, it is still important to understand the basic ideas behind the generation and testing of random numbers.


3. Some Basic Properties of Random Numbers

Before we look at a common method for the generation of random numbers, let us first discuss some basic properties of random numbers that can be used in the testing of random numbers. Because a sequence of random numbers

{R1, R2, R3, . . . , RN}

is a sample from the standard uniform distribution U[0, 1), it must satisfy the following three important properties.

• Uniformity, which says that if the interval [0, 1) is partitioned into n classes, or subintervals of equal length, then the expected number of observations in each class must be N/n, where N is the total number of observations.

• Independence, which says that the probability of observing a value Rk in a particular interval must be independent of any of the earlier values R1, R2, . . . , Rk−1.

• Sample Mean and Variance, which says that the average sample value,

(R1 + R2 + R3 + · · · + RN)/N,

must approach 1/2 in the limit of large N, and the variance in this sample,

(R1² + R2² + R3² + · · · + RN²)/N − ((R1 + R2 + R3 + · · · + RN)/N)²,

must approach 1/12 in the limit of large N. In addition, these limits should be approached in an oscillatory (or non-monotonic) manner, so that the running values are sometimes too high and sometimes too low, and so on. (A quick numerical check of this property is sketched after this list.)
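The following Python sketch (illustrative, not part of the original notes) generates N standard uniform samples and compares the sample mean and variance with 1/2 and 1/12.

    import random

    N = 100_000
    sample = [random.random() for _ in range(N)]

    mean = sum(sample) / N
    variance = sum(r * r for r in sample) / N - mean ** 2

    print(mean)      # should be close to 1/2  = 0.5
    print(variance)  # should be close to 1/12 = 0.0833...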

4. Generation of Pseudo-Random Numbers

This section describes the common method for the generation of random numbers and some methods for testing these for “randomness”. Since a computer algorithm must be used to generate random numbers, they are “technically” not really random. For this reason, they are called pseudo-random, since the word pseudo implies that the very act of generating random numbers by any known method removes the potential for true randomness: if the method is known, the set of random numbers can be replicated over and over again. Therefore a philosophical argument could be made that it is impossible to construct a computer algorithm that generates “truly” random numbers.

Therefore, the real goal of any “random-number generation scheme” is to produce a sequence of numbers between zero and one which simulates (or mimics) the necessary properties of uniformity and independence as closely as possible, so that if just the sequence of numbers

{R1, R2, R3, . . . , RN}

is provided to a user, it should be virtually impossible for the user to reconstruct the computer algorithm that produced this sequence of numbers.

When generating pseudo-random numbers, certain problems or errors can occur which should be avoided by a good algorithm. Some of these errors (but certainly not all) include the following:

• the generated numbers may really not be uniformly distributed,

• the generated numbers may really be discrete-valued instead of continuous-valued,

• the sample mean of the generated numbers may be consistently above 1/2 or consistently below 1/2,

• the sample variance of the generated numbers may be consistently above 1/12 or consistently below 1/12, and

• the numbers may not really be independent, in that there may be dependence in any of the following ways:

• autocorrelation between numbers, e.g., every “fifth” number is larger than the mean of 1/2, and so on,

• numbers successively higher or lower than adjacent numbers,


• several numbers are found above the mean followed by several numbers below the mean, and so on.

Any departures from uniformity and independence for a particular generation scheme may be detected by tests such as those we shall describe later. Generators such as RAND() in Microsoft Excel have passed many of these tests, as well as more stringent ones, and so there is really no excuse for using a generator that has been found to be deficient.

In most cases, random numbers are generated as part of a subroutine (or function) for a given simulation, and most generators of random numbers should satisfy the following “practical” conditions:

• the generator routine should be fast, since “good” statistics requires a large sample size of random numbers,

• the generator routine should be portable to different computers, and ideally to different programming languages,

• the generator routine should have a long cycle length, or period, which is the length of the random-number sequence before previous numbers begin to repeat themselves (what this means will be discussed in more detail later),

• the random numbers generated should be replicable, so that it is possible to generate the same sequence of random numbers given the same starting point in the sequence, and

• the generated random numbers should closely approximate the ideal statistical properties of uniformity and independence.

Note that constructing algorithms that seem to generate random numbers is easy, but constructing algorithms that really do produce sequences of random numbers that are independent and uniformly distributed in the interval between 0 and 1 is much more difficult.

One purpose of this section is to discuss the central issues in random-number generation in order to enhance one’s understanding of the generation of random numbers and to show some of the techniques that are used to test a sequence of numbers for independence and uniformity.

First we discuss the techniques for generating random numbers, and then we shall discuss some tests used to see if these sequences are “random”.

A seemingly simple way to generate a sequence of N random numbers

{R0, R1, R2, R3, . . . , RN}

is to start with a continuous function f that maps the interval [0, 1) onto the interval [0, 1), i.e.,

f : [0, 1) → [0, 1).

Then an initial value (called the seed) R0 in the interval [0, 1) is chosen, and the iteration scheme Rn+1 = f(Rn), for n = 0, 1, 2, . . . , N − 1, is used to generate R1, R2, R3, . . . , RN. This is known as an iteration (or recursive) method. Let’s illustrate the idea with two examples.

Example #1: f(R) = 4R(1 − R)

The function f(R) = 4R(1 − R), which is plotted below,

Plot of f(R) = 4R(1 − R) for 0 ≤ R ≤ 1

does map the unit interval [0, 1) into itself, and if we let R0 = 0.6, then

Rn+1 = f(Rn) = 4Rn(1−Rn)


produces the sequence

{R0, R1, R2, R3, . . .} = {0.6, 0.96, 0.1536, 0.5200, . . .}

which may, on the surface, “look” random, but tests for uniformity will reveal that such a sequence is not very random at all. Choosing the seed R0 = 0.75 instead produces the sequence

{R0, R1, R2, R3, . . .} = {0.75, 0.75, 0.75, 0.75, . . .}

which is certainly not random, or choosing the seed

R0 = (5 − √5)/8 ≃ 0.34549

produces the sequence

{R0, R1, R2, R3, . . .} = {(5 − √5)/8, (5 + √5)/8, (5 − √5)/8, (5 + √5)/8, . . .}

or

{R0, R1, R2, R3, . . .} = {0.34549, 0.90451, 0.34549, 0.90451, . . .}

which is also certainly not random, showing that such a recursive method is very much dependent on the value of the seed R0. In addition, we should note that the sequence generated using R0 = 0.6 may never contain the numbers 0.75 or (5 ± √5)/8. □

Example #2: f(R) = R²

It should be noted that some choices of the “mapping function” f : [0, 1) → [0, 1) will never produce a sequence that looks random for any choice of R0. For example, the mapping function f(R) = R², which is plotted below,

Plot of f(R) = R² for 0 ≤ R ≤ 1

yields (using any value of 0 < R0 < 1) the monotonically decreasing sequence

{R0, R1, R2, R3, . . .} = {R0, R0², R0⁴, R0⁸, . . .}

which tends to zero and is definitely not random. □

Another method for possibly generating random numbers, which is a little more direct, is to let Rn equal some function f(n) which maps the positive integers (N) to the interval [0, 1). Let us illustrate this with an example.

Example #3: Using a Function f : N → [0, 1)

Consider the function f(n) = |sin(n)|, which maps the positive integers into the unit interval [0, 1), and let

Rn = f(n) = | sin(n)|

for n = 1, 2, 3, . . .. This is plotted in the following figure,

Plot of Rn = |sin(n)| versus n

and it produces the sequence

{R1, R2, R3, R4, R5, . . .} = {0.8415, 0.9093, 0.1411, 0.7568, 0.9589, . . .}

which may, on the surface, look random, but tests for uniformity will reveal that such a sequence is not very random at all. In addition, there is no seed R0 that can be adjusted, so the same sequence will result every time the algorithm is used. Of course, this problem can be removed by setting

Rn = |sin(nR0)|

for n = 1, 2, 3, . . ., where R0 is an adjustable parameter which “acts like” a seed. □

The biggest problem with both of the approaches,

f : [0, 1)→ [0, 1) and f : N → [0, 1),

is that they rely on real-number arithmetic, which can sometimes be “unpredictable” when performed by a computer. To illustrate this statement, the reader is directed to the 4R(1 − R) worksheet in the Microsoft Excel file that accompanies this chapter. This worksheet illustrates what is commonly known as the “butterfly effect”, which says that a small change at the beginning of an iteration scheme can very quickly propagate into a very large effect later on. It is sometimes dramatically worded to say that a single butterfly flapping its wings in South America could result in a tornado being formed in Texas. Specifically, this worksheet shows that the sequence generated using

R0 = 0.60000001 and Rn+1 = 4Rn(1− Rn)


is very different from the sequence generated using

R0 = 0.60000002 and Rn+1 = 4Rn(1− Rn)

even as soon as in the values of R18. This is mainly due to the limited storage capability of a computer; such effects sometimes cannot be avoided and are the subject of a branch of mathematics known as Chaos Theory. A better scheme, which uses mostly integer arithmetic (and hence avoids this type of chaotic behavior), is now described.
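Before turning to that scheme, here is a minimal Python sketch of the divergence just described (the exact digits printed depend on the platform’s floating-point arithmetic, but the two sequences visibly separate by around n = 18).

    # Iterate R_{n+1} = 4 R_n (1 - R_n) from two seeds differing by 1e-8.
    a, b = 0.60000001, 0.60000002
    for n in range(1, 26):
        a, b = 4.0 * a * (1.0 - a), 4.0 * b * (1.0 - b)
        if n >= 15:
            print(n, f"{a:.6f}", f"{b:.6f}", f"{abs(a - b):.6f}")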

Linear Congruential Method

The linear congruential method is the most widely used method for generating random numbers. The major advantage of this method is that it uses mostly integer arithmetic and hence can be implemented easily on a computer with very dependable outcomes. The linear congruential method first produces a sequence of integers

{X0, X1, X2, X3, . . . , Xn, . . .}

between 0 and m − 1 according to the following linear recursive relationship

Xn+1 = (aXn + c)mod(m) (7)

for n = 0, 1, 2, 3, . . .. The initial integer value X0 is called the seed, the integer a is called the constant multiplier, the integer c is the increment, and the integer m is the modulus, with m > 1. From this sequence of integers, the sequence of “random” numbers in the interval [0, 1),

{R0, R1, R2, R3, . . . , Rn, . . .},

is then computed using Rn = Xn/m for n = 0, 1, 2, 3, . . ., and hence involves a single division. This is the only real-number arithmetic needed; all other arithmetic is integer.
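A minimal Python sketch of the scheme (the function name is illustrative; the parameters shown are those of Example #4 below):

    def lcg(seed, a, c, m, n):
        # X_{k+1} = (a X_k + c) mod m, with R_k = X_k / m in [0, 1).
        x, out = seed, []
        for _ in range(n):
            out.append(x / m)    # the single real-number division
            x = (a * x + c) % m  # all other arithmetic is integer
        return out

    print(lcg(27, 17, 43, 100, 8))  # [0.27, 0.02, 0.77, 0.52, 0.27, ...]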

Modular Arithmetic

By definition, we say that a = b mod(m) when the integer a − b is evenly divisible by m. In fact, the notation b mod(m) is used to represent the remainder one gets when one divides b by m. For example, 7 mod(3) = 1 since 3 divided into 7 gives 2 with a remainder of 1. Also, −10 mod(3) = 2 since 3 divided into −10 gives −4 with a remainder of 2, and 15 mod(3) = 0 since 3 divided into 15 gives 5 with a remainder of 0. It should be clear that b mod(3) can equal any one of three possible values: 0, 1 or 2.
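Note that Python’s % operator already returns a remainder in the range 0 to m − 1, matching this definition, so the examples above can be checked directly:

    print(7 % 3)    # 1
    print(-10 % 3)  # 2
    print(15 % 3)   # 0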

It should be clear from the definition of mod(m) that each Xn computed in Equation (7) must equal an integer from 0 to m − 1, inclusive, and hence the sequence of integers

{X0, X1, X2, X3, . . . , Xn, . . .}

must eventually become repetitive, which then says that the resulting sequence of “random” numbers

{R0, R1, R2, R3, . . . , Rn, . . .},

with Rn = Xn/m for n = 0, 1, 2, 3, . . ., must also eventually become repetitive. The cycle length of the sequence

{X0, X1, X2, X3, . . . , Xn, . . .}, and hence of {R0, R1, R2, R3, . . . , Rn, . . .},

equals the number of entries in the repetitive part of the sequence. This would then suggest that the sequence

{R0, R1, R2, R3, . . . , Rn, . . .}

is really not random unless the cycle length of the sequence is large enough that the repetitive nature of the sequence is very well hidden. We shall see that the selection of the values for a, c, m, and X0 can drastically affect the cycle length, but first let’s look at an example.

Example #4: A Linear Congruence

Let us use the linear congruential method to generate a sequence of “random numbers” using a = 17, c = 43, m = 100, and X0 = 27 (along with X0 = 20) in the equation

Xi+1 = (aXi + c)mod(m) = (17Xi + 43)mod(100)

for i = 0, 1, 2, 3, . . .. Here the Xi’s will be integers from 0 to 99, inclusive, and so the Ri’s will be two-decimal-place random numbers between 0.00 and 0.99, inclusive. The following two tables of results (one using X0 = 27 and one using X0 = 20) are obtained.

i    Xi     Ri
0    27     0.27
1     2     0.02
2    77     0.77
3    52     0.52
4    (27)   0.27
5     2     0.02
6    77     0.77
7    52     0.52
8    27     0.27
9     2     0.02
10   77     0.77

and

i    Xi     Ri        i    Xi     Ri
0    20     0.20     11    21     0.21
1    83     0.83     12     0     0.00
2    54     0.54     13    43     0.43
3    61     0.61     14    74     0.74
4    80     0.80     15     1     0.01
5     3     0.03     16    60     0.60
6    94     0.94     17    63     0.63
7    41     0.41     18    14     0.14
8    40     0.40     19    81     0.81
9    23     0.23     20    (20)   0.20
10   34     0.34     21    83     0.83

The numbers in parentheses show where the sequence starts to repeat. Note that using the seed X0 = 27 gives the numbers 0.27, 0.02, 0.77, and 0.52, which continually repeat, resulting in a cycle length of 4. Using the seed X0 = 20 does a little better, resulting in a cycle length of 20, but note that the numbers 0.27, 0.02, 0.77 and 0.52 can never appear in this sequence of 20 numbers. □

One should note that the resulting sequence generated by

Xi+1 = (aXi + c)mod(m)

does depend on the seed X0, and the repetitive parts of two different sequences can have no elements in common.
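The cycle length can also be measured directly by iterating until a state recurs, as in this short Python sketch (function name illustrative); it reproduces the cycle lengths 4 and 20 found above.

    def cycle_length(seed, a, c, m):
        # Iterate X_{k+1} = (a X_k + c) mod m until a state repeats; the
        # cycle length is the distance back to its first occurrence.
        seen, x, k = {}, seed, 0
        while x not in seen:
            seen[x] = k
            x = (a * x + c) % m
            k += 1
        return k - seen[x]

    print(cycle_length(27, 17, 43, 100))  # 4, as in the first table
    print(cycle_length(20, 17, 43, 100))  # 20, as in the second table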

The ultimate test of the linear congruential method, as of any generation scheme, is how closely the generated numbers approximate uniformity and independence. Other important properties include maximum density and maximum period. By maximum density, it is meant that the values assumed by the Ri’s leave no large gaps on [0, 1).

Gaps

With regard to these gaps, note that the sequence of random numbers generated by the linear congruential method can only come from the set

{0, 1/m, 2/m, 3/m, · · · , (m − 1)/m},

which means that the Ri’s are discrete (not continuous) on the interval [0, 1) and the gap is no smaller than 1/m. However, all of this is of little consequence if the modulus m is very large. Values of m as large as

m = 2⁴⁸ = 281,474,976,710,656

are in common use these days, making

1/m ≃ 3.5527 × 10⁻¹⁵

so that the discreteness of such a sequence, and the resulting gap produced, iswell hidden.

Periods

With regard to the period, we again note that the sequence of random numbers generated by the linear congruential method can only come from the set

{0, 1/m, 2/m, 3/m, · · · , (m − 1)/m},

which means that the maximum period of the sequence of Ri’s can be no larger than m, and a maximum period equal to m can be achieved by proper choices of a, c and X0 (for a given value of m). Specifically, the following general results from number theory can be utilized to ensure maximum periods when m is either a power of 2 (which is good when it comes to computers) or when m is a prime number.

• For m a power of 2 and c ≠ 0, the longest possible period that can be achieved is m, and this is accomplished whenever c is odd and a = 1 mod(4). Furthermore, it should be obvious that this does not depend on the choice of seed X0, since every integer from 0 to m − 1, inclusive, will be represented somewhere in the sequence of Xi’s.


• A decrease in the number of operations in Xi+1 = (aXi + c)mod(m) can be accomplished by making c = 0, so that Xi+1 = (aXi)mod(m), and for m a power of 2 and c = 0, the longest possible period that can be achieved is m/4, and this is accomplished whenever the seed X0 is odd and the multiplier a satisfies a = 3 mod(8) or a = 5 mod(8).

• For m a prime number and c = 0, the longest possible period that can be achieved is m − 1, which is accomplished whenever the multiplier a has the property that the smallest integer k such that a^k = 1 mod(m) is k = m − 1. In other words, we want a so that a^k − 1 is not divisible by m for all values of k = 1, 2, 3, . . . , m − 2, yet a^k − 1 is divisible by m for k = m − 1. In addition, we must have X0 ≠ 0, since X0 = 0 generates only the sequence {0, 0, 0, . . . , 0, . . .}.

Example #5: A Linear Congruence in Which m is a Power of 2

Let’s assume that a = 13, m = 2⁶ = 64 (a power of 2), c = 0, and X0 = 1, 2, 3 and 4. Then

Xi+1 = (13Xi)mod(64)

and we generate the following table.

i    X0=1   X0=2   X0=3   X0=4        i    X0=1   X0=2   X0=3   X0=4
0      1      2      3      4         9     45     26      7     52
1     13     26     39     52        10      9     18     27     36
2     41     18     59     36        11     53     42     31     20
3     21     42     63     20        12     49     34     19      4
4     17     34     51     (4)       13     61     58     55     52
5     29     58     23     52        14     25     50     11     36
6     57     50     43     36        15      5     10     15     20
7     37     10     47     20        16     (1)     2     (3)     4
8     33     (2)    35      4        17     13     26     39     52

The numbers in parentheses show where the sequence starts to repeat. The maximum period of m/4 = 16 is achieved using X0 odd (1 or 3). Notice that a = 13 = 5 mod(8), as required to achieve the maximum period. Note also that when X0 = 1, the generated sequence assumes values (when ordered) in the set

{1, 5, 9, 13, 17, 21, 25, 29, 33, 37, 41, 45, 49, 53, 57, 61}


and the “gap” in this sequence of random numbers is equal to

5/64 − 1/64 = 1/16 = 0.0625,

which is large, and this leads one to be concerned about the density of the random numbers generated using the scheme in this example. Of course, this generator has a period that is too short and a density that is too sparse for it to be used to generate random numbers, but it does illustrate the importance of properly choosing a, c, m and X0. □

Example #6: A Linear Congruence in Which m is Prime

Let’s assume that a = 5, m = 7 (a prime) and c = 0. Then the following table shows that choosing a = 5 leads to a maximum period of m − 1 = 6 using Xi+1 = 5Xi mod(7).

k    a^k − 1           Comments              i    Xi
1    5¹ − 1 = 4        Not divisible by 7    0     3
2    5² − 1 = 24       Not divisible by 7    1     1
3    5³ − 1 = 124      Not divisible by 7    2     5
4    5⁴ − 1 = 624      Not divisible by 7    3     4
5    5⁵ − 1 = 3124     Not divisible by 7    4     6
6    5⁶ − 1 = 15624    Divisible by 7        5     2
−    −                 −                     6    (3)

The number in parentheses shows where the sequence of Xi’s starts to repeat. The maximum period of m − 1 = 6 is then achieved using any value of X0 not equal to zero, and the resulting sequence of random numbers

{3/7, 1/7, 5/7, 4/7, 6/7, 2/7, 3/7, . . .} → {1/7, 2/7, 3/7, 4/7, 5/7, 6/7, 1/7, . . .}

when ordered, produces a gap of 1/7. □

Once again we point out that using a large value of m, such as m = 2⁴⁸, and having a maximum period of m (when c ≠ 0) or m/4 (when c = 0), will result in a small gap and a large period for appropriately chosen values of a and c, and this will mask the discrete and repetitive nature of the numbers being generated.


If the reader is using a reliable random-number generator that is provided with a reliable software package, then the set of random numbers generated need not be tested for “randomness”. In this case, the reader may choose to skip the next section and proceed to Section 6.

5. Tests for Random Numbers - Optional

Even though a sequence of random numbers may “look” random, many statistical tests must be performed to ensure their randomness. The desirable properties of random numbers - uniformity and independence - were discussed earlier. To ensure that these desirable properties are achieved, a number of statistical tests can be performed on the generated numbers. Note that the appropriate statistical tests have already been performed on the generators used in most commercial simulation software because, without a dependable random-number generator, any simulation that uses it would not yield valid results. For this reason, this section is for informational purposes only. A more detailed discussion of these tests is covered in the ESE 603 course.

Some common tests for random numbers that should be performed if an “in-house” random number generator is constructed are as follows.

• Tests for Uniformity

• Frequency Test #1: Uses the Kolmogorov-Smirnov (KS) test to compare the sample cumulative distribution of the set of generated numbers to the theoretical standard uniform cumulative distribution for U[0, 1), given by

F(x) = 0 for x ≤ 0, F(x) = x for 0 ≤ x ≤ 1, and F(x) = 1 for 1 ≤ x.

• Frequency Test #2: Uses the chi-squared (χ²) statistic

χ² = Σ_{i=1}^{n} (Oi − Ei)²/Ei

to test whether the actual measured number of observations in a particular class (Oi) equals the expected number in that class (Ei), for all n classes, as predicted by the standard uniform distribution U[0, 1).

• Tests for Independence

• Runs Test: Tests the runs up and down, or the runs above and below the mean, by comparing the actual values to the expected values as predicted by the standard uniform distribution U[0, 1). The statistics for comparison are the standard normal distribution and the chi-squared distribution.

• Autocorrelation Test: Tests the correlation between the generated numbers and compares the sample correlation to the expected correlation of zero, as predicted by the standard uniform distribution U[0, 1). The statistic for comparison is the standard normal distribution.

• Gap Test: Counts the number of digits that appear between repetitions of a particular digit and then uses the Kolmogorov-Smirnov (KS) test to compare this with the expected size of gaps, as predicted by a geometric distribution. The statistic for comparison is the geometric distribution.

• Poker Test: Treats numbers grouped together as a “poker hand”. For example, the five-digit number 0.11433 can be thought of as a five-card poker hand having two pairs, and 0.22222 can be thought of as a five-card poker hand having five of a kind, and so on. Then a chi-squared (χ²) statistic is used to compare the frequency of these “poker hands” to what is expected based on a deck of 50 cards having five 0’s, five 1’s, five 2’s, and so on up to five 9’s (in the case of five-digit random numbers).

Hypothesis Testing for Uniformity

In testing for uniformity, the null hypothesis is

H0 : {R1, R2, R3, . . . , RN} are uniform on the interval [0, 1)

(neglecting the seed R0), and failure to reject this null hypothesis means that no evidence of non-uniformity has been detected on the basis of this test. Note that this does not imply that further testing of the generator for uniformity is unnecessary, because no test can ever “guarantee” that the generated numbers are distributed uniformly on the interval [0, 1).

Hypothesis Testing for Independence

In testing for independence, the null hypothesis is

H0 : {R1, R2, R3, . . . , RN} are independent on the interval [0, 1)

(neglecting the seed R0), and failure to reject this null hypothesis means that no evidence of dependency has been detected on the basis of this test.

Note that this does not imply that further testing of the generator for independence is unnecessary, because no test can ever “guarantee” that the generated numbers are independent.

Level of Significance

For each of the above tests, a level of significance α must be stated. This level of significance is the probability of rejecting the null hypothesis given that the null hypothesis is true; this is known as a Type I (α) error, i.e.,

α = Pr(Reject H0 | H0 is true), (8a)

which is often referred to as a false positive. The decision maker sets the value of α, and usually α is set to a small value such as 0.01 or 0.05. This then says that the probability that you reject the null hypothesis given that it is true (i.e., make a false positive) will be small. Note that a Type II (β) error involves the probability of accepting the null hypothesis given that the null hypothesis is false, and is defined as

β = Pr(Accept H0 |H0 is false), (8b)

and this is known as a false negative. Of course

Pr(Accept H0 |H0 is true) = 1− α & Pr(Reject H0 |H0 is false) = 1− β

are not considered errors.


Note that we can never choose to accept H0 with certainty. We can only choose to reject H0 (or accept H0) up to a certain significance level.

If several tests are made on a sequence of random numbers, the probability of rejecting the sequence (making a Type I (α) error) on at least one test, by chance alone, must increase. Similarly, if one test is conducted on many sets of random numbers, the probability of rejecting at least one set (making a Type I (α) error), by chance alone, must increase as well. For example, if 100 sets of numbers were subjected to a particular test with α = 0.05, it would be expected that (100)(0.05) = 5 of these sets would be rejected by chance alone, and if one set of numbers is subjected to 100 tests (all with the same level α), then this set of numbers is expected to fail (100)(0.05) = 5 of these tests by chance alone.

In general, if the number of rejections in N tests (all with the same level α) is close to the expected number, αN, then there is no compelling reason to discard the generator being tested, since αN rejections would normally occur by chance alone. In addition, if a set of random numbers passes all the tests, this is still no guarantee that the set is truly random, because it is always possible that some underlying pattern will go undetected.

Frequency Tests

Basic tests that should always be performed to validate a “new generator” of random numbers are tests for uniformity. At least two different methods of testing are readily available: the Kolmogorov-Smirnov (KS) and the chi-squared (χ²) tests. Both of these tests measure the degree of agreement between the distribution of a sample of generated random numbers and the results predicted by the theoretical uniform distribution U[0, 1), and both assume the null hypothesis of no significant difference between the sample distribution and the theoretical distribution.

The Kolmogorov-Smirnov (KS) Test

This test compares the empirical cdf SN(x), constructed from a sample of N random numbers, to the theoretical cdf F(x) of the standard uniform distribution.

For the standard uniform distribution U[0, 1), the theoretical cdf is given by

F(x) = 0 for x ≤ 0, F(x) = x for 0 ≤ x ≤ 1, and F(x) = 1 for 1 ≤ x,

which is plotted below.

Plot of the theoretical cdf for U[0, 1)

If a sample from a given random-number generator is {R1, R2, R3, . . . , RN} (not including the seed R0), then the empirical cdf of this sample is constructed using

SN(x) = (number of elements in {R1, R2, R3, . . . , RN} that are ≤ x) / N.

One would think that, if the null hypothesis

H0 : {R1, R2, R3, . . . , RN} ∼ U [0, 1)

is true, then SN(x) should be an approximation to F(x), especially as N becomes larger.

One measure of this approximation is the largest absolute deviation between F(x) and SN(x) over the range space of the random variable X ∼ U[0, 1), i.e.,

DN = max_{0≤x<1} |F(x) − SN(x)|. (9a)


The Kolmogorov-Smirnov test uses the sampling distribution of DN, and this is tabulated in many references for various values of α and N. In fact, Kolmogorov and Smirnov showed that

Pr(DN ≤ x/√N) ≃ 1 − 2 Σ_{k=1}^{∞} (−1)^{k−1} e^{−2k²x²} (9b)

for large values of N. For example, setting

Pr(DN ≤ Dα,N = x/√N) ≃ 1 − 2 Σ_{k=1}^{∞} (−1)^{k−1} e^{−2k²x²} = 1 − α

gives

Pr(DN ≤ 1.22/√N) ≃ 1 − 2 Σ_{k=1}^{∞} (−1)^{k−1} e^{−2k²(1.22)²} ≃ 0.90 = 1 − 0.10

and

Pr(DN ≤ 1.36/√N) ≃ 1 − 2 Σ_{k=1}^{∞} (−1)^{k−1} e^{−2k²(1.36)²} ≃ 0.95 = 1 − 0.05

and

Pr(DN ≤ 1.63/√N) ≃ 1 − 2 Σ_{k=1}^{∞} (−1)^{k−1} e^{−2k²(1.63)²} ≃ 0.99 = 1 − 0.01.
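The coefficients 1.22, 1.36 and 1.63 can be verified numerically by truncating the series in Equation (9b); a small Python sketch (100 terms is far more than needed, since the terms decay like e^{−2k²x²}):

    import math

    def ks_limit_cdf(x, terms=100):
        # Kolmogorov's limiting law, Equation (9b): Pr(sqrt(N) D_N <= x).
        return 1.0 - 2.0 * sum((-1.0) ** (k - 1) * math.exp(-2.0 * k * k * x * x)
                               for k in range(1, terms + 1))

    for x in (1.22, 1.36, 1.63):
        print(x, round(ks_limit_cdf(x), 4))  # about 0.90, 0.95 and 0.99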

When using the Kolmogorov-Smirnov test to check a random sequence {R1, R2, R3, . . . , RN} against the standard uniform cdf, the test procedure follows five steps, which can be easily performed using Microsoft Excel:

1.) Rank the sequence {R1, R2, R3, . . . , RN} from smallest to largest. Specifically, let R(i) denote the ith smallest observation, so that

R(1) ≤ R(2) ≤ R(3) ≤ · · · ≤ R(N).


2.) Compute

D⁺N = max_{1≤i≤N} { i/N − R(i) },

which is the largest deviation of SN(x) above F(x), and

D⁻N = max_{1≤i≤N} { R(i) − (i − 1)/N },

which is the largest deviation of SN(x) below F(x).

3.) Compute DN = max{D⁺N, D⁻N}, which is the largest absolute deviation between SN(x) and F(x).

4.) Determine the critical value, Dα,N, from a KS table for a specified significance level α and the given sample size N. Note that these values can also be computed using

Dα,N ≃ (1/√N) × 1.22 when α = 0.10, (1/√N) × 1.36 when α = 0.05, and (1/√N) × 1.63 when α = 0.01,

when the sample size N is larger than 35, which is usually the case.

5.) If the sample statistic DN is greater than the critical value Dα,N, the null hypothesis that the sample data are a sample from a standard uniform distribution is rejected. If DN ≤ Dα,N, we conclude that no difference has been detected between the true distribution of {R1, R2, R3, . . . , RN} and the standard uniform distribution U[0, 1).
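Steps 1-3 are mechanical, and the following Python sketch (function name illustrative) carries them out; it reproduces the value D = 0.26 computed by hand in Example #7 below.

    def ks_statistic(sample):
        # Steps 1-3: rank the sample, then compute D = max(D+, D-)
        # against the U[0, 1) cdf F(x) = x.
        r = sorted(sample)
        n = len(r)
        d_plus = max((i + 1) / n - r[i] for i in range(n))
        d_minus = max(r[i] - i / n for i in range(n))
        return max(d_plus, d_minus)

    print(ks_statistic([0.44, 0.81, 0.14, 0.05, 0.93]))  # 0.26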

Example #7: The Kolmogorov-Smirnov (KS) Test

Consider the small set of N = 5 “random” numbers

{0.44, 0.81, 0.14, 0.05, 0.93}.


From these we may generate the following table (a dash marks a negative difference that cannot be the maximum):

i    Ri     R(i)   i/N    i/N − R(i)   (i − 1)/N   R(i) − (i − 1)/N
1    0.44   0.05   0.20   0.15         0.00        0.05
2    0.81   0.14   0.40   0.26         0.20        −
3    0.14   0.44   0.60   0.16         0.40        0.04
4    0.05   0.81   0.80   −            0.60        0.21
5    0.93   0.93   1.00   0.07         0.80        0.13

                          D⁺ = 0.26                D⁻ = 0.21

which leads to D = max{D⁺, D⁻} = 0.26. The sample cdf using these N = 5 “random” numbers is shown in the figure below, along with the expected cdf F(x) = x for 0 ≤ x ≤ 1.

Plot of the theoretical (thin curve) and sample (thick curve) cdf

This also shows that D = 0.26, and for α = 0.05 and N = 5, we find from the KS tables that Dα,N = D5,0.05 = 0.565, showing that D < D5,0.05. Therefore, the hypothesis that the generated numbers come from the standard uniform distribution should not be rejected based on the KS test. □

The Chi-Squared (χ²) Test

The chi-squared test on N observations begins by partitioning the N observations into n disjoint classes and then uses the sample statistic

χ² = Σ_{i=1}^{n} (Oi − Ei)²/Ei (10a)


where Oi is the observed number of observations in the ith class, Ei is the expected number of observations in the ith class (based on the random variable one believes the observations come from), and n is the number of classes chosen. Of course, we must have

Σ_{i=1}^{n} Oi = N, (10b)

and for the uniform distribution, the expected number of observations in each class is Ei = N/n for equally-sized classes. We now present an intuitive argument showing that the χ² sampling distribution (for large values of N) is approximately the chi-squared distribution with n − 1 degrees of freedom. Tables of different percentage points of the chi-squared distribution with ν degrees of freedom for different values of α are also easily obtained.

An Intuitive Argument Showing that χ² ∼ χ²n−1 - Optional

The statistic

χ² = Σ_{i=1}^{n} (Oi − Ei)²/Ei, along with the constraint Σ_{i=1}^{n} Oi = N,

is known to have (approximately) a chi-squared distribution with n − 1 degrees of freedom. In this section, we want to motivate why this is so. Toward this end we will use the fact from probability which says that if Z1, Z2, Z3, . . . , Zn are independent random variables, each having a standard normal distribution N(0, 1), then the random variable

X = Z1² + Z2² + Z3² + · · · + Zn² (11)

is chi-squared with n degrees of freedom.

Suppose we define n disjoint classes C1, C2, C3, . . . , Cn, and suppose data points are coming in every second so that each data point can be placed into one and only one of these classes. For example, the classes might simply be defined by the weights of people (in pounds), with

C1 = {W | 0 ≤ W < 100 lbs} , C2 = {W | 100 lbs ≤ W < 150 lbs},
C3 = {W | 150 lbs ≤ W < 200 lbs} , C4 = {W | 200 lbs ≤ W < 250 lbs},
C5 = {W | 250 lbs ≤ W < 300 lbs} , C6 = {W | 300 lbs ≤ W},

and the data points coming in refer to customers entering a store. Then each customer can be placed into one of these six classes depending on the weight of the customer.

Now let Oi be the random variable on the number of data points coming in and placed in class Ci, for i = 1, 2, 3, . . . , n. This is a random variable, just like the number of customers coming into a store with weights less than 100 lbs is also a random variable. Let Ei = E(Oi) be the expected value of Oi based on some distribution. To determine the variance of Oi, we must make some assumption about the nature of Oi. As data points come into class Ci, suppose we make the reasonable assumption that they come in according to a Poisson process, which leads to a Poisson distribution with parameter λit (with t = 1 time unit). Before we continue, let us be reminded of the assumptions behind a Poisson process.

A Poisson Process - A Reminder

Consider a sequence of random events, such as the arrival of units at a shop or the arrival of data coming in as measurements. These events may be described by a counting function N(t) (defined for all 0 ≤ t), which equals the number of events that occur in the closed time interval [0, t]. We assume that t = 0 is the point at which the observations begin, whether or not an arrival occurs at that instant, and we note that N(t) is a random variable with possible values equal to the non-negative integers: 0, 1, 2, 3, . . .. Such an arrival process is called a Poisson process with mean rate (per unit time) λ if the following three reasonable assumptions are fulfilled.

A1: Arrivals occur one at a time: This implies that the probability of 2 or more arrivals in a very small (i.e., infinitesimal) time interval ∆t is zero compared to the probability of 1 or 0 arrivals occurring in the same time interval ∆t.

A2: N(t) has stationary increments: The distribution of the number of arrivals between t and t + ∆t depends only on the length of the interval ∆t and not on the starting point t. Thus, arrivals are completely random, without rush or slack periods. In addition, the probability that a single arrival occurs in a small time interval ∆t is proportional to ∆t and given by λ∆t, where λ is the mean arrival rate (per unit time).

A3: N(t) has independent increments: The numbers of arrivals during non-overlapping time intervals are independent random variables. Thus, a largeor small number of arrivals in one time interval has no effect on the numberof arrivals in subsequent time intervals. Future arrivals occur completely atrandom, independent of the number of arrivals in past time intervals.

Given that arrivals occur according to a Poisson process (i.e., meeting the three assumptions A1, A2, and A3), it is possible to derive an expression for the probability that n arrivals (n = 0, 1, 2, 3, . . .) occur in the time interval [0, t]. We shall denote this probability by Pn(t), and it can be shown that

Pn(t) = Pr(N(t) = n) = e^{−λt} (λt)^n / n!

for n = 0, 1, 2, 3, . . . and for all time t > 0. This is known as a Poisson distribution with parameter λt, and the mean and variance of such a distribution are

E(N(t)) = λt and V(N(t)) = λt = E(N(t)),

respectively. In one time unit (in which t = 1), we then have

E(N(1)) = λ and V(N(1)) = λ = E(N(1))

as the mean and variance in the arrival of the data that is being studied.

Back to the Intuitive Argument that χ² ∼ χ²n−1

Using this little reminder about the Poisson process, we then see from A1, A2 and A3 above that assuming the data comes in as one piece of data every time unit is reasonable, and under this assumption we find that

E(Oi) = Ei and V(Oi) = Ei, so that σ(Oi) = √Ei.

If we then define the random variable

Zi = (Oi − Ei)/√Ei = (Oi − E(Oi))/√V(Oi),

we see that E(Zi) = 0 and V(Zi) = 1.


Plots of the pdf for the Poisson distribution,

e^{−Ei} (Ei)^x / x!,

along with the normal distribution having the same mean and variance,

(1/√(2πEi)) e^{−(x − Ei)²/(2Ei)},

are shown in the figures below for means of 4, 5, 7, 9 and 11.

Plots of the Poisson distribution (thin) along with the normal distribution (thick) having the same mean of 4 and variance of 4

Plots of the Poisson distribution (thin) along with the normal distribution (thick) having the same mean of 5 and variance of 5

Plots of the Poisson distribution (thin) along with the normal distribution (thick) having the same mean of 7 and variance of 7

Plots of the Poisson distribution (thin) along with the normal distribution (thick) having the same mean of 9 and variance of 9

Plots of the Poisson distribution (thin) along with the normal distribution (thick) having the same mean of 11 and variance of 11

From these plots, we start to see the central limit theorem from probability at work, which says that Oi is approximately normal with mean Ei and variance Ei, i.e., N(Ei, Ei), so that

Zi = (Oi − Ei)/√Ei is approximately N(0, 1),

provided that Ei ≥ 5. The reason for choosing Ei ≥ 5, besides the visual indications in the above figures, is that if X ∼ N(µ, µ), then

Pr(X < 0) = Pr((X − µ)/√µ < (0 − µ)/√µ) = Φ(−√µ) = (1/√(2π)) ∫_{−∞}^{−√µ} e^{−x²/2} dx.

A plot of this probability, Φ(−√µ) versus µ is shown in the following figure

A plot of Φ(−√µ) versus µ

and it shows that Φ(−√µ) is very small; in fact,

Pr(X < 0) = Φ(−√µ) ≤ Φ(−√5) ≃ 0.0127

when µ ≥ 5. This shows that when N(µ, µ) is used to approximate the Poisson distribution, less than about 0.013 of the probability is in the “forbidden region” to the left of zero.

Now going back to Equation (11), we see that

χ² = Σ_{i=1}^{n} Zi² = Σ_{i=1}^{n} ((Oi − Ei)/√Ei)²

must be approximately chi-squared with n degrees of freedom. However, since we are fixing the total number of data points coming in to be N, so that

Σ_{i=1}^{n} Oi = N,

this then says (for example) that On is completely known once O1, O2, . . . , On−1 are known, which says that

χ² = Σ_{i=1}^{n−1} ((Oi − Ei)/√Ei)² + ((On − En)/√En)² = Σ_{i=1}^{n−1} ((Oi − Ei)/√Ei)² + constant

or

χ² = Σ_{i=1}^{n−1} Zi² + constant.


This now makes χ² approximately chi-squared with one less degree of freedom, since only the Zi’s for i = 1, 2, 3, . . . , n − 1 are independent standard normal random variables. This is why the statistic

χ² = Σ_{i=1}^{n} (Oi − Ei)²/Ei, along with the constraint Σ_{i=1}^{n} Oi = N,

is (approximately) a chi-squared distribution with n − 1 degrees of freedom. We now remind the reader of the form of a chi-squared distribution.

The Chi-Squared Distribution With ν Degrees of Freedom (χ²ν)

The chi-squared distribution with ν degrees of freedom (denoted by χ²ν) has a pdf given by

fν(x) = (1/(2^{ν/2} Γ(ν/2))) x^{ν/2 − 1} e^{−x/2} (12a)

for 0 ≤ x, and fν(x) = 0 for x < 0, where

Γ(z) = ∫₀^∞ t^{z−1} e^{−t} dt (12b)

is the gamma function evaluated at z, which equals (z − 1)! when z is a positive integer. The cdf of the χ²ν distribution is given by

Fν(x) = ∫₀^x fν(z) dz = (1/(2^{ν/2} Γ(ν/2))) ∫₀^x z^{ν/2 − 1} e^{−z/2} dz. (12c)

The mean and variance of the χ²ν distribution are given by E(X) = ν and V(X) = 2ν, respectively, and some typical plots of the pdf fν(x) versus x are shown in the figure below.

Plots of fν(x) for ν = 1, 2 and 3, showing f1(0) > f2(0) > f3(0)

Note that the mode of the χ²ν distribution is given by ν − 2 for 2 ≤ ν. Note also that the critical values in most percentage-point tables are computed using

Pr(X > χ²α,ν) = α or F(χ²α,ν) = Pr(X ≤ χ²α,ν) = 1 − α.

For example, if ν = 7 and α = 0.1, then F(χ²0.1,7) = 0.9 gives

∫₀^{χ²0.1,7} (1/(2^{7/2} Γ(7/2))) x^{7/2 − 1} e^{−x/2} dx = 0.9,

which has the solution χ²0.1,7 ≃ 12.

When using the chi-squared test statistic against a uniform cdf, the test procedure follows these steps, given that the sequence {R1, R2, R3, . . . , RN} has already been generated.

1.) Choose a value of n and partition the unit interval [0, 1) into the n disjoint classes C1, C2, C3, . . . , Cn,

C1 = [0, x1) , C2 = [x1, x2) , C3 = [x2, x3) , . . . , Cn = [xn−1, 1),

for chosen values of x0 ≡ 0 < x1 < x2 < x3 < · · · < xn−1 < 1 ≡ xn. It is recommended (but not necessary) that all n classes have the same size, by making xi = i/n for i = 1, 2, 3, . . . , n − 1, so that

C1 = [0, 1/n), C2 = [1/n, 2/n), C3 = [2/n, 3/n), . . . , Cn = [(n − 1)/n, 1).


2.) Compute Oi as

Oi = # of {R1, R2, R3, . . . , RN} in Ci

for each i = 1, 2, 3, . . . , n.

3.) Compute Ei using Ei = (xi − xi−1)N, as predicted by the standard uniform distribution U[0, 1), for each i = 1, 2, 3, . . . , n, and (as demonstrated earlier) it is recommended that the ith class be large enough so that Ei ≥ 5. When using classes of equal size, we have Ei = N/n for each value of i, and then Ei ≥ 5 says that we should choose n so that n ≤ N/5. In fact, it is usually best to choose n so that

√N ≤ n ≤ N/5

when N ≥ 25.

4.) Compute the sample statistic

χ² = Σ_{i=1}^{n} (Oi − Ei)²/Ei ∼ χ²n−1.

5.) Determine the critical value, χ²α,n−1, from either a chi-squared table or from the equation

Pr(X ≤ χ²α,n−1) = ∫₀^{χ²α,n−1} (1/(2^{(n−1)/2} Γ((n − 1)/2))) x^{(n−1)/2 − 1} e^{−x/2} dx = 1 − α,

for a specified significance level α and the value of n.

6.) If the sample statistic χ² is greater than the critical value χ²α,n−1, the null hypothesis that the sample data are a sample from a standard uniform distribution is rejected. If χ² ≤ χ²α,n−1, then we conclude that no difference has been detected between the true distribution of {R1, R2, R3, . . . , RN} and the standard uniform distribution.
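For classes of equal width, steps 1-4 reduce to a few lines of Python (an illustrative sketch; the resulting statistic is then compared with χ²α,n−1 from a table, e.g., 16.919 for α = 0.05 and 9 degrees of freedom, as in Example #8 below).

    import random

    def chi_squared_statistic(sample, n_classes):
        # Count the observations in each of n equal-width classes on
        # [0, 1) and form chi^2 = sum (O_i - E_i)^2 / E_i with E_i = N/n.
        N = len(sample)
        expected = N / n_classes
        observed = [0] * n_classes
        for r in sample:
            observed[min(int(r * n_classes), n_classes - 1)] += 1
        return sum((o - expected) ** 2 / expected for o in observed)

    print(chi_squared_statistic([random.random() for _ in range(100)], 10))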


Example #8: The Chi-Squared (χ²) Test

Consider the set of N = 100 “random” numbers in the following table.

0.34  0.90  0.25  0.89  0.87  0.44  0.12  0.21  0.46  0.67
0.83  0.76  0.79  0.64  0.70  0.81  0.94  0.74  0.22  0.74
0.96  0.99  0.77  0.67  0.56  0.41  0.52  0.73  0.99  0.02
0.47  0.31  0.17  0.82  0.56  0.05  0.45  0.31  0.78  0.05
0.79  0.71  0.23  0.19  0.82  0.93  0.65  0.37  0.39  0.42
0.99  0.17  0.99  0.46  0.05  0.66  0.10  0.42  0.18  0.49
0.37  0.51  0.54  0.01  0.81  0.28  0.69  0.34  0.75  0.49
0.72  0.43  0.56  0.97  0.30  0.94  0.96  0.58  0.73  0.05
0.06  0.39  0.84  0.24  0.40  0.64  0.40  0.19  0.79  0.62
0.18  0.26  0.97  0.88  0.64  0.47  0.60  0.11  0.29  0.78

Using the n (equal-length) classes

C1 = [0, 1/n), C2 = [1/n, 2/n), C3 = [2/n, 3/n), . . . , Cn = [(n − 1)/n, 1),

so that

Ei = (i/n − (i − 1)/n) N = N/n

for each class, we have (for n = 10) the 10 classes

C1 = [0, 0.1) , C2 = [0.1, 0.2) , C3 = [0.2, 0.3) , . . . , C10 = [0.9, 1),

and the expected value for each class is Ei = 100/10 = 10 ≥ 5. Using these 100 numbers we generate the next table.

Class   Oi    Ei    Oi − Ei   (Oi − Ei)²   (Oi − Ei)²/Ei
C1       7    10      −3          9           0.9
C2       9    10      −1          1           0.1
C3       8    10      −2          4           0.4
C4       9    10      −1          1           0.1
C5      14    10      +4         16           1.6
C6       7    10      −3          9           0.9
C7      10    10       0          0           0.0
C8      15    10      +5         25           2.5
C9       9    10      −1          1           0.1
C10     12    10      +2          4           0.4
−      100   100       0          −           χ² = 7.0


The results of this table show that χ² = 7.0, and since

∫₀^{χ²0.05,9} (1/(2^{9/2} Γ(9/2))) x^{9/2 − 1} e^{−x/2} dx = 1 − 0.05

leads to

(1/210) √(2/π) ∫₀^{χ²0.05,9} x^{7/2} e^{−x/2} dx = 0.95,

resulting in χ²0.05,9 ≃ 16.919, we see that χ² < χ²0.05,9, and so the null hypothesis that the 100 numbers come from a standard uniform distribution should not be rejected on the basis of this test and significance level. □

Both the Kolmogorov-Smirnov and the chi-squared tests are acceptable for testing the uniformity of a sample of data. The Kolmogorov-Smirnov test is the more powerful of the two, since it directly compares cdfs, and so it is the more recommended of the two. Furthermore, the Kolmogorov-Smirnov test can be applied to small sample sizes, whereas the chi-squared test is valid only for samples large enough that each Ei ≥ 5.

Testing for uniformity is certainly important, but it does not tell the whole story. It should be noted that the order in which the Ri’s are computed has no effect on the conclusions drawn from the Kolmogorov-Smirnov and chi-squared tests for uniformity, but the order in which the Ri’s are computed is certainly important from the perspective of giving the appearance of independence, as the next example shows.

Example #9: A Perfect Random-Number Sequence Or Not

Consider the sequence

{X1, X2, X3, . . . , XN}

generated using X0 = m − 1 and

Xi+1 = (Xi + 1)mod(m).

Such a sequence must always lead to

{X1, X2, X3, . . . , Xm} = {0, 1, 2, 3, . . . , m − 1, . . .},

which then repeats in a maximum cycle of length m. It is clear that the resulting “random numbers”

{R1, R2, R3, . . . , Rm} = {0, 1/m, 2/m, 3/m, . . . , (m − 1)/m, . . .}

would easily pass any Kolmogorov-Smirnov and chi-squared tests, since we would always find that Dm = 0 and χ² = 0 for any choice of classes C1, C2, C3, . . . , Cn. Yet such a sequence definitely does not look random. This set of numbers would pass all possible frequency tests with ease, but the ordering of the numbers produced by the generator is not random, and so these numbers would not pass any test for independence. □

In fact, in general, one can take any sequence of random numbers that would pass all possible frequency tests and simply rearrange them (e.g., in increasing order), and these same numbers would easily fail any type of independence test.

There are many tests for independence, including the Runs Test, the Autocorrelation Test, the Gap Test, and the Poker Test, exactly as described in the list at the beginning of this section.

These tests lie outside the scope of the ENM 503 course and are discussed in detail in the ESE 603 course. Now that we know how to generate a set of random numbers which is a sample from the standard uniform distribution U[0, 1), let us see how these can be converted into a sample from any random variable X.

6. Using Random Numbers to Generate Random Samples of X

This section deals with one common procedure for converting a set of random numbers

{R1, R2, R3, . . . , RN}

into a random sample

{X1, X2, X3, . . . , XN}

from a random variable X that has either a continuous or discrete distribution. Although many of the standard simulation programs generate these random variates (for many of the standard random variables discussed in probability and statistics) using subroutines and functions, it is still important to understand how random-variate generation occurs, just in case you are faced with a random variable X whose distribution is not covered by the standard simulation programs.

The method we shall discuss is called the inverse transform method. Other methods, such as the convolution method, the acceptance-rejection method, and the composition method, are also important, but these lie outside the scope of the ENM 503 course; they are covered in the ESE 603 course.

We assume from the start that a source of uniform random numbers

{R1, R2, R3, . . . , RN} ∼ U[0, 1)

is readily available, and throughout this section the symbol R and the set {R1, R2, R3, . . . , RN} represent random numbers uniformly distributed on [0, 1).


The Inverse Transform Technique

The inverse transform method is based on the fact that if f(x) and F (x) arethe pdf and cdf, respectively, of some random variable X, then Z = F (X) has auniform distribution on the interval [0, 1), i.e.,

Z = F (X) ∼ U [0, 1).

This is an incredibly simple yet powerful result. To prove this, we simply note that since 0 ≤ F(X) ≤ 1, we have 0 ≤ Z ≤ 1, and so the possible values of Z are between 0 and 1. If g(z) and G(z) are the pdf and cdf, respectively, for Z, then

G(z) = Pr(Z ≤ z) = Pr(F(X) ≤ F(x)).

But F(x) is a monotonically increasing function of x, and so F(X) ≤ F(x) implies X ≤ x, and hence

G(z) = Pr(F(X) ≤ F(x)) = Pr(X ≤ x) = F(x) = z,

since z = F(x) from the definition of Z. Then

g(z) = dG(z)/dz = 1,

and so we see that the pdf of Z = F(X) is

g(z) = 1 for 0 ≤ z < 1, and g(z) = 0 otherwise,

which is also the pdf of a uniform distribution on the interval [0, 1), and so we have shown that if F(x) is the cdf of some random variable X, then Z = F(X) ∼ U[0, 1).

This means that (in theory) if F(x) is the cdf of some random variable X, then R = F(X) has the continuous standard uniform distribution on the interval [0, 1), and X = F⁻¹(R), where F⁻¹ is the inverse function of F (which always exists since F(x) is a monotonically increasing function of x), has the distribution with pdf f(x) = F′(x). Therefore if

{R1, R2, R3, . . . , RN}

is a random sample from U[0, 1), then

{F⁻¹(R1), F⁻¹(R2), F⁻¹(R3), . . . , F⁻¹(RN)}

becomes a random sample from the random variable X having cdf F(x). Note that in practice it may be very difficult (if not impossible) to get a simple algebraic form for F(x), and even when that is possible, it may still be very difficult (if not impossible) to get a simple algebraic form for F⁻¹(R). For this reason, other methods such as the acceptance-rejection method have been developed; that method is discussed in detail in the ESE 603 course. Let us now look at a few examples.
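In code, the entire method fits in a few lines; here is a minimal Python sketch (the notes themselves use Excel, so this helper and its name are ours):

import random

def inverse_transform_sample(inv_cdf, n):
    """Generate n variates X = F^(-1)(R) from random numbers R ~ U[0, 1)."""
    return [inv_cdf(random.random()) for _ in range(n)]

Each of the following examples amounts to supplying a particular inverse cdf for inv_cdf.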

Example #10: Exponential Distribution with Parameter λ

The exponential distribution with parameter λ > 0 has pdf

f(x) = { λ e^(−λx), for 0 ≤ x
       { 0, for x < 0

and cdf given by

F(x) = ∫_{−∞}^{x} f(z) dz = { 1 − e^(−λx), for 0 ≤ x
                            { 0, for x < 0.

Then R = F(X) yields R = 1 − e^(−λX). Solving for X, we get

X = F⁻¹(R) = −(1/λ) ln(1 − R)

so that if {R1, R2, R3, . . . , RN} is a random sample from U[0, 1), then

{X1, X2, X3, . . . , XN} with Xi = −(1/λ) ln(1 − Ri)

for i = 1, 2, 3, . . . , N, becomes a random sample from an exponential distribution with parameter λ > 0. Note that since R and 1 − R are both uniform on the unit interval, we may just use

X = −(1/λ) ln(R) instead of X = −(1/λ) ln(1 − R).


This removes the need for the operation that subtracts each of the Ri’s from 1, which can result in considerable savings in computing time, especially if the value of N (i.e., the size of the random sample) is very large. In Excel, this allows one to use

−(1/λ) ln(RAND())

to generate samples of X ∼ Exp(λ). ∎
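A minimal Python version of this generator (our sketch, not the notes’):

import math, random

def exponential_sample(lam, n):
    """Return n samples of X ~ Exp(lam) via the inverse transform."""
    # Using 1 - R keeps the argument of log in (0, 1], avoiding log(0);
    # by the remark above, -log(R)/lam would work equally well in distribution.
    return [-math.log(1.0 - random.random()) / lam for _ in range(n)]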

Example #11: Uniform Distribution on the Interval [a, b]

The uniform distribution on the interval [a, b] has pdf

f(x) = { 1/(b − a), for a ≤ x ≤ b
       { 0, otherwise

and cdf given by

F(x) = ∫_{−∞}^{x} f(z) dz = { 0, for x ≤ a
                            { (x − a)/(b − a), for a ≤ x ≤ b
                            { 1, for b ≤ x.

Then R = F(X) gives R = (X − a)/(b − a). Solving for X yields

X = F⁻¹(R) = a + (b − a)R

so that if {R1, R2, R3, . . . , RN} is a random sample from U[0, 1), then

{X1, X2, X3, . . . , XN} with Xi = a + (b − a)Ri

for i = 1, 2, 3, . . . , N, is a random sample from U[a, b). In Excel, this allows one to use

a + (b − a) ∗ RAND()

to generate samples of X ∼ U[a, b). ∎
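The same one-liner in Python (our sketch):

import random

def uniform_sample(a, b, n):
    """Return n samples of X ~ U[a, b) via X = a + (b - a)R."""
    return [a + (b - a) * random.random() for _ in range(n)]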


Example #12: Triangular Distribution with Parameters a, b and c

The triangular distribution with parameters a, b and c (with a < b < c), where we set h = 2/(c − a), has pdf

f(x) = { 0, for x ≤ a
       { h(x − a)/(b − a), for a ≤ x ≤ b
       { h(c − x)/(c − b), for b ≤ x ≤ c
       { 0, elsewhere

and cdf given by

F(x) = ∫_{−∞}^{x} f(z) dz = { 0, for x ≤ a
                            { (1/2)h(x − a)²/(b − a), for a ≤ x ≤ b
                            { 1 − (1/2)h(c − x)²/(c − b), for b ≤ x ≤ c
                            { 1, for c ≤ x.

The shape of f, illustrated in the figure below, explains its name.

[Figure: plot of f(x) versus x using a = 5, b = 15 and c = 40]

To determine a sample of X from a random number R, we may use the inverse transform method and set R = F(X), yielding

R = h(X − a)²/(2(b − a)) when 0 = F(a) ≤ R ≤ F(b) = (1/2)h(b − a)

and

R = 1 − h(c − X)²/(2(c − b)) when (1/2)h(b − a) = F(b) ≤ R ≤ F(c) = 1.

Solving each of these for X leads to

X = F⁻¹(R) = a + √(2R(b − a)/h) when 0 ≤ R ≤ (1/2)h(b − a)

and

X = F⁻¹(R) = c − √(2(1 − R)(c − b)/h) when (1/2)h(b − a) ≤ R ≤ 1.

Replacing h by 2/(c − a), we then have

X = a + √((b − a)(c − a)R) when 0 ≤ R ≤ (b − a)/(c − a)

and

X = c − √((c − b)(c − a)(1 − R)) when (b − a)/(c − a) ≤ R ≤ 1.

Therefore if {R1, R2, R3, . . . , RN} is a random sample from U[0, 1), then

{X1, X2, X3, . . . , XN}

with

Xi = { a + √((b − a)(c − a)Ri), for 0 ≤ Ri ≤ (b − a)/(c − a)
     { c − √((c − b)(c − a)(1 − Ri)), for (b − a)/(c − a) ≤ Ri ≤ 1

for i = 1, 2, 3, . . . , N, is a random sample from a triangular distribution with parameters a, b and c. In Excel, this allows one to use either

a + √((b − a)(c − a) ∗ RAND()) or c − √((c − b)(c − a) ∗ (1 − RAND()))

to generate samples of X ∼ Tri(a, b, c), depending on where RAND() falls relative to the quantity (b − a)/(c − a). ∎
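In Python, the two branches combine into one small sampler (our sketch):

import math, random

def triangular_sample(a, b, c, n):
    """Return n samples of X ~ Tri(a, b, c) via the two-branch inverse cdf."""
    out = []
    for _ in range(n):
        r = random.random()
        if r <= (b - a) / (c - a):
            out.append(a + math.sqrt((b - a) * (c - a) * r))
        else:
            out.append(c - math.sqrt((c - b) * (c - a) * (1.0 - r)))
    return out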

Newton Iteration Method - Optional

While it might be possible to obtain a simple algebraic form for F(X), it may not be possible to obtain a simple algebraic form for F⁻¹(R). Under these conditions, Newton’s method of iteration may be used to solve R = F(X) for X given a value of R. For example, suppose that a continuous random variable X has pdf given by

f(x) = (10/3)(x − x⁴)

for 0 ≤ x ≤ 1, and zero otherwise. A plot of this pdf is shown in the figure below.

[Figure: plot of f(x) = 10(x − x⁴)/3 versus x for 0 ≤ x ≤ 1]

The cdf of X is given by

F(x) = (10/3) ∫_{0}^{x} (t − t⁴) dt = (5/3)x² − (2/3)x⁵

for 0 ≤ x ≤ 1, which is a simple algebraic form. However, solving for X given that

R = F(X) = (5/3)X² − (2/3)X⁵


is not very easy to do analytically. From Newton’s iteration in calculus, we know that one method for solving an equation of the form g(x) = 0 is to generate a sequence {x0, x1, x2, x3, . . . , xn, . . .} using an initial guess x0 and the recursion

x_{n+1} = x_n − g(x_n)/g′(x_n).

If the initial guess x0 is not too far away from a solution to g(x) = 0, then

lim_{n→∞} x_n = a solution to g(x) = 0.

Therefore, if we want to solve for X given that R = F(X), we let g(X) = F(X) − R and get g′(X) = F′(X) = f(X), where f is the pdf of X, so that

X_{n+1} = X_n − g(X_n)/g′(X_n) = X_n − (F(X_n) − R)/f(X_n)

(with an initial guess of 0 < X0 < 1) could generate a sequence of values that converges to the X for which R = F(X). For example, earlier we had f(x) = 10(x − x⁴)/3 and F(x) = 5x²/3 − 2x⁵/3, and so solving

R = F(X) = (5/3)X² − (2/3)X⁵

leads to

X_{n+1} = X_n − (F(X_n) − R)/f(X_n) = X_n − ((5/3)X_n² − (2/3)X_n⁵ − R)/((10/3)(X_n − X_n⁴)),

which reduces to

X_{n+1} = X_n − (1/10)(2X_n⁵ − 5X_n² + 3R)/(X_n(X_n³ − 1)).

For R = 0.3 and a starting guess of X0 = 0.5, we generate the following table.

n    X_n
0    0.5000000000
1    0.4342857143
2    0.4312449349
3    0.4312370715
4    0.4312370716
5    0.4312370715
6    0.4312370716


The iterates converge very quickly to X ≃ 0.43123707. Therefore the equation

R = F(X) = (5/3)X² − (2/3)X⁵ = 0.3

has X ≃ 0.43123707 as the unique solution between zero and one. A plot of F(x) versus x, along with the horizontal line at R = 0.3, is shown in the figure below; it shows the point of intersection at (X, R) ≃ (0.431, 0.3).

[Figure: plots of F(x) = 5x²/3 − 2x⁵/3 and R = 0.3, showing the point of intersection at x ≃ 0.4312]

This means that the random number R = 0.3 leads to the random variate X ≃ 0.4312.
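A minimal Python sketch of this Newton look-up (our function names; the pdf and cdf are the ones from this example):

def newton_inverse_cdf(r, f, F, x0=0.5, tol=1e-10, max_iter=50):
    """Solve R = F(X) for X by Newton iteration, using F' = f."""
    x = x0
    for _ in range(max_iter):
        step = (F(x) - r) / f(x)
        x -= step
        if abs(step) < tol:
            break
    return x

f = lambda x: (10.0 / 3.0) * (x - x**4)
F = lambda x: (5.0 / 3.0) * x**2 - (2.0 / 3.0) * x**5
print(newton_inverse_cdf(0.3, f, F))   # prints ~0.4312370716, as in the table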

Example #13: An Empirical Continuous Distribution

Consider an empirical continuous distribution with points

{α ≡ x0, x1, x2, x3, . . . , xn ≡ β}, ordered so that x0 ≤ x1 ≤ x2 ≤ x3 ≤ · · · ≤ xn,

with respective probabilities

{p1, p2, p3, . . . , pn}, where ∑_{k=1}^{n} pk = 1

and pi = Pr{xi−1 ≤ X ≤ xi} for i = 1, 2, 3, . . . , n. These can be organized in the following table,

Random Variable Intervals    Probabilities    Cumulative Intervals    Cumulative Probabilities
x0 ≤ X ≤ x1                  p1               X ≤ x1                  c1
x1 ≤ X ≤ x2                  p2               X ≤ x2                  c2
x2 ≤ X ≤ x3                  p3               X ≤ x3                  c3
...                          ...              ...                     ...
xn−1 ≤ X ≤ xn                pn               X ≤ xn                  cn

where c1 = p1 and ck = ck−1 + pk, for k = 2, 3, 4, . . . , n, are the cumulative probabilities. This says that the cumulative distribution function F(x) contains the points

{(x0, c0), (x1, c1), (x2, c2), (x3, c3), . . . , (xn, cn)}

where c0 ≡ 0, F(xk) = ck for k = 1, 2, 3, . . . , n, and cn = 1. Using linear interpolation, we may construct a continuous cumulative distribution function (cdf) by

F(x) = { 0, for x ≤ x0
       { c0 + m0(x − x0), for x0 ≤ x ≤ x1
       { c1 + m1(x − x1), for x1 ≤ x ≤ x2
       { c2 + m2(x − x2), for x2 ≤ x ≤ x3
       { . . .
       { cn−1 + mn−1(x − xn−1), for xn−1 ≤ x ≤ xn
       { 1, for xn ≤ x

where the slopes are given by

mi = (ci+1 − ci)/(xi+1 − xi)


for i = 0, 1, 2, . . . , n − 1. Then setting R = F(X) and solving for X yields

X = F⁻¹(R) = { x0 + (R − c0)/m0, for c0 ≤ R ≤ c1
             { x1 + (R − c1)/m1, for c1 ≤ R ≤ c2
             { x2 + (R − c2)/m2, for c2 ≤ R ≤ c3
             { . . .
             { xn−1 + (R − cn−1)/mn−1, for cn−1 ≤ R ≤ cn

so that if {R1, R2, R3, . . . , RN} is a random sample from U[0, 1), then

{X1, X2, X3, . . . , XN}

with

Xi = { x0 + (Ri − c0)/m0, for c0 ≤ Ri ≤ c1
     { x1 + (Ri − c1)/m1, for c1 ≤ Ri ≤ c2
     { x2 + (Ri − c2)/m2, for c2 ≤ Ri ≤ c3
     { . . .
     { xn−1 + (Ri − cn−1)/mn−1, for cn−1 ≤ Ri ≤ cn

for i = 1, 2, 3, . . . , N, is a random sample from the empirical distribution described by the above table. Note that a non-linear interpolation scheme (e.g., cubic) may be used if more smoothness is desired in the cdf. ∎
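This piecewise-linear inverse is exactly what numpy.interp computes when fed the points (ck, xk). A minimal sketch with made-up breakpoints and probabilities:

import numpy as np

x = np.array([0.0, 1.0, 3.0, 6.0])          # breakpoints x0..xn (illustrative)
p = np.array([0.5, 0.3, 0.2])               # interval probabilities p1..pn
c = np.concatenate(([0.0], np.cumsum(p)))   # cumulative probabilities c0..cn

rng = np.random.default_rng()
R = rng.random(10_000)
X = np.interp(R, c, x)                      # X = F^(-1)(R) by linear interpolation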

Example #14: Continuous Distributions Without a Closed-Form Cdf

A number of very useful distributions (such as the normal, gamma, and beta distributions) do not have simple closed forms for their cdf F(x), or for its inverse, and hence using the inverse transform method to generate random variates may be very difficult. For example, if X ∼ N(0, 1), then

F(x) = (1/√(2π)) ∫_{−∞}^{x} e^(−t²/2) dt ≡ Φ(x)


does not have a simple closed form. If we are willing to approximate the inverse of the cdf, then we may still be able to generate these random variates. For example, starting from F(x), we may choose a value of n and values x1, x2, x3, . . . , xn, and construct the following table,

Random Variable Values (Increasing Order)    Cumulative Probabilities
x1                                           c1 = Pr(X ≤ x1) = F(x1)
x2                                           c2 = Pr(X ≤ x2) = F(x2)
x3                                           c3 = Pr(X ≤ x3) = F(x3)
...                                          ...
xn                                           cn = Pr(X ≤ xn) = F(xn)

like that in the previous example, and then we may use the method described there to generate a random sample from X. This says that if {R1, R2, R3, . . . , RN} is a random sample from U[0, 1), then

{X1, X2, X3, . . . , XN}

with

Xi = { x0 + (Ri − F(x0))/m0, for F(x0) ≤ Ri ≤ F(x1)
     { x1 + (Ri − F(x1))/m1, for F(x1) ≤ Ri ≤ F(x2)
     { x2 + (Ri − F(x2))/m2, for F(x2) ≤ Ri ≤ F(x3)
     { . . .
     { xn−1 + (Ri − F(xn−1))/mn−1, for F(xn−1) ≤ Ri ≤ F(xn)

where F(x0) = 0, F(xn) = 1, and

mi = (F(xi+1) − F(xi))/(xi+1 − xi),

for i = 1, 2, 3, . . . , N, is an approximation to a random sample from the distribution having cdf F(x). The larger we make the value of n, and the smaller we make


the intervals [xi, xi+1], the better the approximation, but also the more computer work involved and so the slower the algorithm. ∎
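A minimal sketch for N(0, 1), assuming a truncation of the grid to [−4, 4] so that F(x0) ≈ 0 and F(xn) ≈ 1 (Φ(−4) ≈ 3 × 10⁻⁵, so the truncation error is tiny):

import numpy as np
from math import erf, sqrt

def phi(t):
    """Standard normal cdf via the error function."""
    return 0.5 * (1.0 + erf(t / sqrt(2.0)))

x = np.linspace(-4.0, 4.0, 201)             # grid x1..xn
c = np.array([phi(t) for t in x])           # tabulated cumulative probabilities

rng = np.random.default_rng()
R = rng.random(10_000)
X = np.interp(R, c, x)                      # approximate inverse-cdf look-up
print(X.mean(), X.var())                    # should be near 0 and 1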

Discrete Distributions

Samples from discrete distributions can also be generated using the inverse transform method, either numerically through a table look-up procedure or, in some cases, algebraically, with the final generation scheme written as a formula involving the ceiling and/or floor functions.

Note that for the sake of this discussion involving discrete distributions we shall assume that R ∼ U(0, 1], which includes 1 but not 0. This is done simply out of convenience, and since

U(0, 1) = U(0, 1] = U[0, 1) = U[0, 1]

as distributions, it really does not matter as long as N (the sample size) is large. After all, getting exactly R = 0 or R = 1 should be a very rare event. Let us illustrate the ideas with some examples.

Example #15: An Empirical Discrete Distribution

Suppose we have a random variable X with a discrete range space

RX = {x1, x2, x3, . . . , xn}

and corresponding probabilities {p1, p2, p3, . . . , pn}. Then we may construct the following table of cumulative probabilities.

Random Variable Values (Increasing Order)    Probabilities    Cumulative Probabilities
x1                                           p1               c1 = p1
x2                                           p2               c2 = c1 + p2
x3                                           p3               c3 = c2 + p3
...                                          ...              ...
xn                                           pn               cn = cn−1 + pn = 1


This says that the cdf of X is a step function defined by

F(x) = { c0 ≡ 0, for x < x1
       { c1, for x1 ≤ x < x2
       { c2, for x2 ≤ x < x3
       { c3, for x3 ≤ x < x4
       { . . .
       { cn−1, for xn−1 ≤ x < xn
       { cn ≡ 1, for xn ≤ x.

Note that F is not continuous, and it should not be made continuous using some interpolation scheme. When applying the inverse transform method in this case, we note that if

{R1, R2, R3, . . . , RN}

is a random sample from U(0, 1], then

{X1, X2, X3, . . . , XN}

with

Xi = F⁻¹(Ri) = { x1, for 0 = c0 < Ri ≤ c1
               { x2, for c1 < Ri ≤ c2
               { x3, for c2 < Ri ≤ c3
               { . . .
               { xn, for cn−1 < Ri ≤ cn = 1

is a random sample from the empirical distribution described by the above table. ∎
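Numerically, this table look-up is a search for the first ci with Ri ≤ ci, which numpy.searchsorted performs directly. A minimal sketch with made-up values and probabilities:

import numpy as np

x = np.array([1, 2, 5, 10])                 # values x1..xn (illustrative)
p = np.array([0.4, 0.3, 0.2, 0.1])          # probabilities p1..pn
c = np.cumsum(p)                            # cumulative probabilities c1..cn
c[-1] = 1.0                                 # guard against round-off in the last sum

rng = np.random.default_rng()
R = rng.random(10_000)
X = x[np.searchsorted(c, R)]                # first index i with R <= c_i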


Note that for generating discrete random variables, the inverse transform technique becomes a table look-up procedure, and, unlike the case of a continuous variable, interpolation should not be done. However, if the values of xi in the above table are such that xi+1 − xi is a constant (independent of i), then the ceiling and/or floor functions may be used along with the expression for F⁻¹(R) to generate a sample from a discrete distribution X, as we shall now demonstrate. But first, let us be reminded of the ceiling and floor functions.

The Ceiling (Round Up) and Floor (Round Down) Functions

If x is a real number, the ceiling or round up of x is denoted by ⌈x⌉ and defined by

⌈x⌉ = the smallest integer greater than or equal to x (13a)

and the floor or round down of x is denoted by ⌊x⌋ and defined by

⌊x⌋ = the largest integer less than or equal to x. (13b)

Plots of these are shown below.

[Figure: plot of ⌈x⌉ versus x]

[Figure: plot of ⌊x⌋ versus x]

Note that in general

⌊x⌋ ≤ x ≤ ⌈x⌉ (13c)

for any real number x, and

⌈x⌉ = 1 + ⌊x⌋ (13d)

for any non-integer real number x. Of course, if x is an integer, Equation (13d) is not valid and the inequalities in Equation (13c) become equalities. In addition, if x is an integer satisfying a ≤ x < a + 1, then we must have x = ⌈a⌉, and if x is an integer satisfying a < x ≤ a + 1, we must have x = ⌊a + 1⌋. As mentioned, the ceiling (round up) and floor (round down) functions can often be used along with F⁻¹(R) to construct a sample from a discrete random variable X when xi+1 − xi is a constant (independent of i), and let us now look at some examples.

Example #16a: A Discrete Uniform Distribution

Consider the discrete uniform distribution on the set

S = {a + b, a + 2b, a + 3b, . . . , a + kb}

(denoted by DU[a, b]) for fixed values of a and b > a, with pmf p(x) = 1/k for all x in S, and cdf given by

F(x) = { 0/k, for x < a + b
       { 1/k, for a + b ≤ x < a + 2b
       { 2/k, for a + 2b ≤ x < a + 3b
       { . . .
       { (k − 1)/k, for a + (k − 1)b ≤ x < a + kb
       { 1, for a + kb ≤ x.

Then we note that xi+1 − xi = b (independent of i) and

Xi = F⁻¹(Ri) = { a + b, for 0 < Ri ≤ 1/k
               { a + 2b, for 1/k < Ri ≤ 2/k
               { a + 3b, for 2/k < Ri ≤ 3/k
               { . . .
               { a + kb, for (k − 1)/k < Ri ≤ k/k,

which we may write as

Xi = F⁻¹(Ri) = { a + b, for 0 < kRi ≤ 1
               { a + 2b, for 1 < kRi ≤ 2
               { a + 3b, for 2 < kRi ≤ 3
               { . . .
               { a + kb, for k − 1 < kRi ≤ k.


Using the ceiling function, we may write all of this as simply

Xi = F⁻¹(Ri) = a + ⌈kRi⌉b.

Note that this simple form for Xi using the ceiling function is made possible because we are assuming in this section that R ∼ U(0, 1]. Thus we find that if

{R1, R2, R3, . . . , RN}

is a random sample from U(0, 1], then

{X1, X2, X3, . . . , XN} with Xi = a + ⌈kRi⌉b (14)

is a random sample of the discrete uniform distribution on the set

{a + b, a + 2b, a + 3b, . . . , a + kb}.

In Excel, this allows one to use

a + b ∗ CEILING(k ∗ RAND(), 1)

to generate samples of X ∼ DU[a, b]. ∎
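A Python version (our sketch). Note the 1 − R trick to map Python’s U[0, 1) generator onto the U(0, 1] assumption used above:

import math, random

def discrete_uniform_sample(a, b, k, n):
    """Return n samples from {a+b, a+2b, ..., a+kb} via X = a + ceil(kR)b."""
    out = []
    for _ in range(n):
        r = 1.0 - random.random()       # R ~ U(0, 1], so ceil(k*R) is in 1..k
        out.append(a + math.ceil(k * r) * b)
    return out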

Example #16b: A Discrete Uniform Distribution - Acceptance/Rejection

Consider again the discrete uniform random variable X defined on the set of k equally spaced values

S = {a + b, a + 2b, a + 3b, . . . , a + kb}

(denoted by DU[a, b]) for fixed values of a and b > a, with pmf p(x) = 1/k for all x in S. Consider also the iterative scheme in which

Z_{i+1} = (aZi + c) mod(m)

where m ≥ k, i = 0, 1, 2, . . . , M, and Z0 is some seed. (Here a and c are the generator’s multiplier and increment, not the distribution parameters.) Excluding the seed Z0, this will generate a sequence of M integers

{Z1, Z2, Z3, . . . , ZM}

that have values between 0 and m − 1, inclusive. Setting r = [m/k], the greatest integer less than or equal to m/k (i.e., the floor ⌊m/k⌋), let


us agree that all those values of Zi satisfying 0 ≤ Zi < r will be assigned the value Xi = a + b, all those values of Zi satisfying r ≤ Zi < 2r will be assigned the value Xi = a + 2b, all those values of Zi satisfying 2r ≤ Zi < 3r will be assigned the value Xi = a + 3b, and so on, up to all those values of Zi satisfying (k − 1)r ≤ Zi < kr, which will be assigned the value Xi = a + kb. In other words, all values of Zi satisfying

(j − 1)r ≤ Zi < jr → Xi = a + jb

for j = 1, 2, 3, . . . , k are accepted and assigned the value Xi = a + jb. Since

(j − 1)r ≤ Zi < jr ⇒ j − 1 ≤ Zi/r < j,

we see that

j = 1 + [Zi/r] and so Xi = a + (1 + [Zi/r])b

can be used to compute the value of Xi from the value of Zi whenever [Zi/r] ≤ k − 1 (equivalently, whenever Zi < kr), and any value of Zi ≥ kr is rejected. The number of acceptable values of Zi is then equal to kr = k[m/k], and the number of rejected values is m − kr = m − k[m/k]. It should be noted that

jr − (j − 1)r = r (a result not dependent on j)

of the Zi’s will be assigned the value Xi = a + jb, and since kr numbers are accepted, the fraction r/(kr) = 1/k of the numbers accepted are assigned the value Xi = a + jb for each j, showing that each value of Xi has the same probability 1/k of occurring, so the scheme is doing the right thing. This method has the advantage of using only integer arithmetic, but it does require that some numbers be discarded. The efficiency of this acceptance/rejection method can be measured by the quantity

c = kr/m = k[m/k]/m = [m/k]/(m/k),

so that the closer c is to 1, the better. A plot of [x]/x is shown in the figure below,

[Figure: plot of [x]/x versus x for x ≥ 1; the dotted curves are 1 and 1 − 1/x]

and since

1 − 1/x ≤ [x]/x ≤ 1

for all x > 1, we see that the efficiency of the method is better than 0.9 (90%) when m/k ≥ 10. ∎
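A minimal integer-only Python sketch of this scheme. The LCG constants below are illustrative (the well-known Numerical Recipes choice), not anything specified in the notes:

def lcg_discrete_uniform(a_dist, b_dist, k, n, seed=12345):
    """Sample from {a_dist + b_dist, ..., a_dist + k*b_dist} using an LCG
    together with the acceptance/rejection rule above."""
    m, a_lcg, c_lcg = 2**32, 1664525, 1013904223
    r = m // k                          # r = floor(m/k)
    z, out = seed, []
    while len(out) < n:
        z = (a_lcg * z + c_lcg) % m
        if z < k * r:                   # accept; otherwise reject and draw again
            out.append(a_dist + (1 + z // r) * b_dist)   # j = 1 + floor(z/r)
    return out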

Example #17: The Geometric Distribution

Consider the geometric distribution with pmf

p(x) = p(1 − p)^(x−1)

for x = 1, 2, 3, . . . . Its cdf is given by

F(x) = ∑_{k=1}^{x} p(1 − p)^(k−1) = p · (1 − (1 − p)^x)/(1 − (1 − p)) = 1 − (1 − p)^x

for x = 1, 2, 3, . . . , and we note that xi+1 − xi = 1 (independent of i). Using the inverse transform method, we see that if R is a random number satisfying

F(x − 1) < R ≤ F(x),

then X = x. Since F is a non-decreasing function, we may rewrite this set of inequalities as

x − 1 < F⁻¹(R) ≤ x, or simply F⁻¹(R) ≤ x < 1 + F⁻¹(R).


Using the ceiling function, this says that

X = ⌈F⁻¹(R)⌉.

In the case of the geometric distribution, we have F(X) = 1 − (1 − p)^X = R, so that

(1 − p)^X = 1 − R, or X = F⁻¹(R) = ln(1 − R)/ln(1 − p),

and hence

X = ⌈F⁻¹(R)⌉ = ⌈ln(1 − R)/ln(1 − p)⌉.

Therefore, if {R1, R2, R3, . . . , RN} is a random sample from U(0, 1], then

{X1, X2, X3, . . . , XN} with Xi = ⌈ln(1 − Ri)/ln(1 − p)⌉ (15)

is a random sample from a geometric distribution having parameter p and range space S = {1, 2, 3, . . .}. In Excel, this allows one to use

CEILING(LN(1 − RAND())/LN(1 − p), 1)

to generate samples of X ∼ Geometric(p). ∎
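A Python version of formula (15) (our sketch):

import math, random

def geometric_sample(p, n):
    """Return n samples of X ~ Geometric(p) on {1, 2, 3, ...}."""
    out = []
    for _ in range(n):
        r = random.random()             # R ~ U[0, 1)
        x = math.ceil(math.log(1.0 - r) / math.log(1.0 - p))
        out.append(max(x, 1))           # guard the rare R = 0 case, where x = 0
    return out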

In a more general setting, suppose that X is a discrete distribution with range space

RX = {a1, a2, a3, . . . , an, . . .}

where

a1 < a2 < a3 < · · · < an < · · · ,

and suppose that F(x) is the cdf of X. Then, for a given random number R, the value of X is chosen by the condition X = ak when F(ak−1) < R ≤ F(ak).

7. Using Simulation for Parameter Estimation

The heart of using simulation for parameter estimation lies in the Strong Law of Large Numbers, which is probably one of the best-known results in probability theory. It simply states that the sample average of a sequence of independent random variables X1, X2, X3, . . . , Xn having a common distribution will, with probability 1, converge to the mean µ = E(X) of that common distribution. In other words,

Pr( lim_{n→∞} (1/n) ∑_{i=1}^{n} Xi = µ ) = 1.

Therefore, simulation is ideal for approximating the average (or expected value) of a random variable: we simply compute

(1/N) ∑_{i=1}^{N} Xi

for a large number of samples N.

Example #18: Computing Averages Using Simulation

Suppose that a triangle is to be constructed from the three points (0, 0), (X, 0) and (Y, Z), as shown in the following figure,

[Figure: the triangle having vertices (0, 0) at lower left, (X, 0) at lower right and (Y, Z) on top]

where X, Y and Z are all independent, each coming from the standard uniform distribution U[0, 1). First we would like to analytically compute the average area of such a triangle. To solve this exactly, we first compute the area of one such triangle as

A = (1/2)XZ,

and then, since each of X, Y and Z comes from U[0, 1), the average area is

E(A) = ∫_{0}^{1} ∫_{0}^{1} ∫_{0}^{1} (1/2)xz dx dy dz = 1/8.

The “area” worksheet that accompanies this chapter shows the result obtained using a simulation with N = 5000 samples, and it agrees rather nicely with the result of 1/8.

A more difficult analytical calculation is to compute the average perimeter of such a triangle, since the perimeter of one such triangle is

P = X + √(Y² + Z²) + √((Y − X)² + Z²),

resulting in

E(P) = ∫_{0}^{1} ∫_{0}^{1} ∫_{0}^{1} (x + √(y² + z²) + √((y − x)² + z²)) dx dy dz,

which we may write as

E(P) = 1/2 + ∫_{0}^{1} ∫_{0}^{1} √(y² + z²) dy dz + ∫_{0}^{1} ∫_{0}^{1} ∫_{0}^{1} √((y − x)² + z²) dx dy dz.

Using numerical integration, we find that

∫_{0}^{1} ∫_{0}^{1} √(y² + z²) dy dz ≃ 0.7652

and

∫_{0}^{1} ∫_{0}^{1} ∫_{0}^{1} √((y − x)² + z²) dx dy dz ≃ 0.65176,

and so E(P) ≃ 0.5 + 0.7652 + 0.65176 = 1.917. The “perimeter” worksheet that accompanies this chapter shows the result obtained using a simulation with N = 5000 samples, and one should note that this calculation using simulation is really no more difficult than the area calculation using simulation. This is one big advantage of Monte-Carlo simulation. ∎
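A minimal Python sketch of both estimates together (our function; N = 5000 as in the worksheets):

import math, random

def triangle_averages(n=5000):
    """Monte-Carlo estimates of E(A) and E(P) for the random triangle
    with vertices (0,0), (X,0), (Y,Z), where X, Y, Z ~ U[0,1)."""
    area_sum = perim_sum = 0.0
    for _ in range(n):
        x, y, z = random.random(), random.random(), random.random()
        area_sum += 0.5 * x * z
        perim_sum += x + math.hypot(y, z) + math.hypot(y - x, z)
    return area_sum / n, perim_sum / n

print(triangle_averages())   # roughly (0.125, 1.917)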

We have seen how simulation can be used to estimate averages by using sample means

(1/N) ∑_{i=1}^{N} Xi

for a large number of samples N. We may also estimate kth moments using

(1/N) ∑_{i=1}^{N} Xi^k

for k = 1, 2, 3, . . . , variances using the statistic

(1/N) ∑_{i=1}^{N} Xi² − ( (1/N) ∑_{i=1}^{N} Xi )²,

and coefficients of variation using the statistic

√( [ (1/N) ∑_{i=1}^{N} Xi² ] / [ (1/N) ∑_{i=1}^{N} Xi ]² − 1 ),

all for a large number of samples N. Let us now see how we may use simulation to estimate probabilities.
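These four statistics in Python (our sketch):

def summary_stats(samples):
    """Sample mean, second moment, variance, and coefficient of variation."""
    n = len(samples)
    m1 = sum(samples) / n                   # (1/N) sum Xi
    m2 = sum(x * x for x in samples) / n    # (1/N) sum Xi^2
    var = m2 - m1 * m1
    cv = (m2 / (m1 * m1) - 1.0) ** 0.5
    return m1, m2, var, cv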

Computing Probabilities Using The Strong Law of Large Numbers

Using simulation to compute probabilities is an important application of the strong law of large numbers. It works by expressing probabilities as expected values. Toward this end, suppose that a sequence of independent trials of some experiment is performed, suppose that E is some fixed event of the experiment, and suppose that E occurs with probability Pr(E) on any particular trial. Defining the random variable X by

X = { 1, if E does occur on the ith trial
    { 0, if E does not occur on the ith trial,

for i = 1, 2, 3, . . . , we note that

E(X) = (1) Pr(E) + (0)(1 − Pr(E)) = Pr(E),


showing that the expected value of X is the same as the probability of the occurrence of E. Therefore, letting

Xi = { 1, if E does occur on the ith trial
     { 0, if E does not occur on the ith trial,

for i = 1, 2, 3, . . . , then by the strong law of large numbers,

lim_{n→∞} (1/n) ∑_{i=1}^{n} Xi = E(X) = Pr(E),

or

Pr(E) = lim_{n→∞} (1/n) ∑_{i=1}^{n} Xi, which we may also write as Pr(E) ≃ (1/N) ∑_{i=1}^{N} Xi

for large N. This result is very important in simulation because it allows us to compute probabilities using only expected values. In fact, we shall now end this chapter by using this idea to answer the problem that was proposed at the beginning of this chapter.
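In code, this recipe is just the sample mean of an indicator. A minimal Python sketch (the event used here, U1 + U2 ≤ 1 with probability 1/2, is our illustration):

import random

def estimate_probability(event_indicator, n):
    """Estimate Pr(E) as the fraction of n trials on which E occurs."""
    return sum(event_indicator() for _ in range(n)) / n

print(estimate_probability(lambda: random.random() + random.random() <= 1.0,
                           10_000))         # ~0.5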

8. A Solution to the Problem in Section 1

In Section 1 at the beginning of this chapter, the following problem was proposed. Suppose that two coins of radii R1 and R2 are thrown on a rectangular sheet of paper having length L and width W so that the position of each coin’s center uniformly lands somewhere on the sheet of paper. Given these conditions, we were asked to compute (in terms of the inputs L, W, R1 and R2) the probability that the two coins overlap. Placing the rectangle on the xy plane as the region

R = {(x, y) | 0 ≤ x ≤ L, 0 ≤ y ≤ W}

and letting (X1, Y1) be the coordinates of the center of coin 1 and (X2, Y2) be the coordinates of the center of coin 2, then, under the conditions of the problem, we have

X1 ∼ U[0, L) , Y1 ∼ U[0, W)

and

X2 ∼ U[0, L) , Y2 ∼ U[0, W),


and the two coins will overlap when the distance between their centers,

D = √((X2 − X1)² + (Y2 − Y1)²),

is less than or equal to the sum of their radii, i.e., when D ≤ R1 + R2. To compute the probability that the two coins overlap requires that we compute

P = Pr(D ≤ R1 + R2).

Using simulation to solve this problem, we first use our random number generator (RAND() in Microsoft Excel) to generate four independent random numbers R11, R12, R13 and R14, and we use a + (b − a)R to generate a sample from U[a, b). Thus we set

X11 = 0 + (L − 0)R11 = L R11 ,  Y11 = 0 + (W − 0)R12 = W R12

along with

X12 = 0 + (L − 0)R13 = L R13 ,  Y12 = 0 + (W − 0)R14 = W R14

to generate X11 ∼ U[0, L), X12 ∼ U[0, L), Y11 ∼ U[0, W) and Y12 ∼ U[0, W). Then we compute

D1 = √((X12 − X11)² + (Y12 − Y11)²)

and we set Z as the random variable defined by

Z = { 1, when D ≤ R1 + R2
    { 0, when D > R1 + R2,

so that

Z1 = { 1, when D1 ≤ R1 + R2
     { 0, when D1 > R1 + R2.

We then use our random number generator (RAND() in Microsoft Excel) to generate another four independent random numbers R21, R22, R23 and R24, and we set

X21 = 0 + (L − 0)R21 = L R21 ,  Y21 = 0 + (W − 0)R22 = W R22

along with

X22 = 0 + (L − 0)R23 = L R23 ,  Y22 = 0 + (W − 0)R24 = W R24.

Then we compute

D2 = √((X22 − X21)² + (Y22 − Y21)²)

and we set

Z2 = { 1, when D2 ≤ R1 + R2
     { 0, when D2 > R1 + R2.

Continuing this process and constructing Z3, Z4, . . . , we then use the fact that

Pr(D ≤ R1 + R2) = E(Z)

to get

Pr(D ≤ R1 + R2) = E(Z) = lim_{n→∞} (1/n) ∑_{i=1}^{n} Zi,

resulting in the estimate

P = Pr(D ≤ R1 + R2) ≃ (1/N) ∑_{i=1}^{N} Zi

for large values of N, which is known as the number of Monte-Carlo simulations. A simulation using N = 5000 is presented in the coin-problem worksheet that accompanies this chapter. Specifically, using the inputs L = 6, W = 3, R1 = 2 and R2 = 1, we find that P ≃ 0.7. The user is encouraged to change the inputs in the “yellow” cells and to show that the end result depends only on the ratios ω = W/L and ρ = (R1 + R2)/L, and that in the special case L = W, the value of P depends only on the ratio ρ = (R1 + R2)/L.
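A Python sketch of the whole Monte-Carlo computation (our function; the worksheet’s inputs are used as the example):

import math, random

def coin_overlap_probability(L, W, r1, r2, n=5000):
    """Estimate Pr(D <= r1 + r2) for two coin centers dropped uniformly
    on an L-by-W sheet of paper."""
    hits = 0
    for _ in range(n):
        x1, y1 = L * random.random(), W * random.random()
        x2, y2 = L * random.random(), W * random.random()
        if math.hypot(x2 - x1, y2 - y1) <= r1 + r2:
            hits += 1
    return hits / n

print(coin_overlap_probability(6, 3, 2, 1))   # ~0.7, as in the worksheet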
