Source: anurag-agarwal.com/.../ClassNotes/Chapter03Probability…

Chapter 3: Probability Distributions

3.1 Introduction

Remember one extremely important thing about Statistics: probability distributions are extremely important in Statistics. We will be talking about probability distributions a lot in the rest of the book. Some students get scared when they hear the term “probability distributions” and want nothing to do with them. But it is really not as difficult as it sounds. Although “distribution” is a relatively long word, it is definitely not hard to understand. When something is distributed, you want to understand how it is distributed. For example, when your grandfather wrote his will, your dad and uncles must have been interested in how grandpa distributed his wealth amongst them. You obviously hoped that your dad was grandpa’s favorite son. The word probability is not that hard to grasp either; we already studied probability in Chapter 2. So, if “probability” is not hard to understand, and we have established that “distribution” is not hard either, how hard can “probability distribution” be?

3.2 Example of Probability Distribution

Let us start with a very small example – the smallest possible example of a probability distribution. If you flip a fair coin, we know that there are two possible outcomes: Head or Tail. We also know that the probability of each of these two outcomes is 50%. Together, the probabilities of the two outcomes add up to 100%. So we can say that the 100% probability for all possible outcomes gets distributed as 50% and 50% between the two possible outcomes. This is similar to your grandpa distributing his wealth 50-50 amongst his two children – your dad and your uncle. Of course, if the coin was not fair, and let’s say it was biased towards heads so that 70 out of 100 flips came up heads, then for such a coin the probability distribution would be 70% for heads and 30% for tails. This would be the case if your dad was grandpa’s favorite son and grandpa willed 70% of his wealth to your dad. Of course your uncle would think that was unfair (just like the coin in our example was not fair).

So, whenever you distribute the 100% probability (of all possible outcomes combined) amongst each of the possible outcomes, you get a probability distribution. We can express a probability distribution as a table or as a chart; for the above example, the distribution is shown as a table in Figure 3.1a and as a chart in Figure 3.1b.

Outcome   Probability
Head      0.5
Tail      0.5
Total     1.0

Figure 3.1a: Probability distribution of outcomes on a flip of a coin (in table form)

[Chart omitted: “Probability Distribution” – x-axis: Outcome (Head, Tail); y-axis: Probability, 0 to 0.6]

Figure 3.1b: Probability distribution of outcomes on a flip of a coin (in chart form)

Note that the probabilities of all outcomes must add up to 1 (or 100%). Note also that in the chart, the line connecting the two points has no special meaning. It is just there to give an outline shape; it is the points that matter.

Now let us discuss a slightly more complex example. Let’s say we flip a fair coin two times and ask: how many heads will we get? If we get both tails, the number of heads is zero. If we get one head and one tail, the number of heads is one. And if we get both heads, the number of heads is two. So there are three possible outcomes – 0, 1 and 2. The probability that one of these three outcomes occurs is 100%. The question is: how is this 100% probability divided (or distributed) among the three outcomes? Figures 3.2a and 3.2b show the probability distribution for this example.

Outcome   Probability
0 Heads   0.25
1 Head    0.50
2 Heads   0.25
Total     1.00

Figure 3.2a: Probability distribution of outcomes on two flips of a coin (in table form)
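The table in Figure 3.2a can be checked by brute-force enumeration of the four equally likely outcomes of two flips; a minimal Python sketch:

```python
from itertools import product
from collections import Counter

# Enumerate all equally likely outcomes of two flips of a fair coin
# and count the number of heads in each: HH, HT, TH, TT.
outcomes = list(product("HT", repeat=2))
heads = Counter(seq.count("H") for seq in outcomes)

# Each outcome has probability 1/4, so divide the counts by 4.
distribution = {k: heads[k] / len(outcomes) for k in sorted(heads)}
print(distribution)  # {0: 0.25, 1: 0.5, 2: 0.25}
```

The same enumeration with `repeat=3` reproduces the three-flip table that follows.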

[Chart omitted: “Probability Distribution” – x-axis: Outcomes (Number of Heads in two flips of a coin); y-axis: Probability, 0 to 0.6]

Figure 3.2b: Probability distribution of outcomes on two flips of a coin (in chart form)

We will now make it even more complex. Let’s flip a fair coin three times this time. How many heads are possible in three flips? There can be no heads (if we get all three tails), or 1 head, or 2 heads, or 3 heads. So there are four possible outcomes (0, 1, 2 and 3). The question is: how is the 100% probability of having any one of these outcomes distributed? Figures 3.3a and 3.3b give the distribution for this example.

Number of Heads   Probability
0                 0.125
1                 0.375
2                 0.375
3                 0.125
Total             1.000

Figure 3.3a: Probability distribution of outcomes on three flips of a coin (in table form)

[Chart omitted: “Probability Distribution” – x-axis: Outcomes (Number of Heads); y-axis: Probability, 0 to 0.4]

Figure 3.3b: Probability distribution of outcomes on three flips of a coin (in chart form)

Let us fast forward to 10 flips (and save a few pages) and ask the same question: how many heads will show up? We can have zero heads (i.e., all ten tails), or 1 head, or 2, or 3, and so on up to 10 heads. So there are eleven possible outcomes, and the 100% probability gets divided among the eleven outcomes. The probability distribution is given in Figures 3.4a and 3.4b.

Number of Heads   Probability
0                 0.0010
1                 0.0098
2                 0.0439
3                 0.1172
4                 0.2051
5                 0.2461
6                 0.2051
7                 0.1172
8                 0.0439
9                 0.0098
10                0.0010
Total             1.0000

Figure 3.4a: Probability distribution of outcomes on ten flips of a coin (in table form)
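The values in Figure 3.4a can be generated rather than enumerated: the probability of k heads in n flips of a fair coin is C(n, k) / 2^n, the combinations formula (an aside here; the Binomial random variable behind it is introduced later in the chapter). A quick check in Python:

```python
from math import comb

n = 10
# P(k heads in n flips of a fair coin) = C(n, k) / 2**n,
# since all 2**n flip sequences are equally likely.
for k in range(n + 1):
    print(k, round(comb(n, k) / 2**n, 4))
```

Running this reproduces the probability column of Figure 3.4a, from 0.0010 for 0 heads up to 0.2461 for 5 heads and back down.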

[Chart omitted: “Probability Distribution” – x-axis: Outcomes (Number of Heads in ten flips of a coin); y-axis: Probability, 0 to 0.30]

Figure 3.4b: Probability distribution of outcomes on ten flips of a coin (in chart form)

So now we understand what a probability distribution is. It is basically a distribution of probabilities for each of the possible outcomes such that the probabilities of all outcomes add up to 1.0 or 100%.

The shape of a probability distribution tells quite a story. And remember, in Statistics we want to tell stories. If the shape is flat, like in Figure 3.1b, the probabilities are distributed uniformly across all outcomes. If it looks like a nice, symmetric, bell-shaped curve, like in Figure 3.4b, it tells the story that the probability of the middle value is the highest and the probability falls off symmetrically as we move away from the middle value in either direction.

3.3 The link between frequency, relative frequency and probability

It is important to understand the link between frequency, relative frequency and probability. The examples of flipping coins discussed above are theoretical in nature: for an experiment involving coins, we can calculate probabilities theoretically using the classical approach. But what about questions such as: what is the probability that upon graduation your salary in your first job will be over 80K per year? Or what is the probability that the salary will be between 50K and 60K per year? It is hard to come up with theoretical probabilities for these. So what do we do? We use the relative frequency approach. We collect data from previous years and use them as a proxy. Let’s say we collect the following data:

Salary Range   Frequency   Relative Frequency   Relative Frequency (as %)
< 30K          14          0.112                11.2%
30 - 40        24          0.192                19.2%
40 - 50        35          0.280                28.0%
50 - 60        29          0.232                23.2%
60 - 70        15          0.120                12.0%
70 - 80        6           0.048                4.8%
> 80           2           0.016                1.6%
Total          125         1.000                100%

Figure 3.5: Frequency Table for Salary of your First Job
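The relative-frequency columns of Figure 3.5 are just each frequency divided by the total; a small sketch (the range labels and counts are copied from the table above):

```python
# Reproduce the relative-frequency columns of Figure 3.5
# from the raw frequency counts.
ranges = ["< 30K", "30 - 40", "40 - 50", "50 - 60", "60 - 70", "70 - 80", "> 80"]
freqs = [14, 24, 35, 29, 15, 6, 2]

total = sum(freqs)  # 125 graduates in the sample
for rng, f in zip(ranges, freqs):
    rel = f / total
    # print range, frequency, relative frequency, and the same as a percent
    print(f"{rng:8} {f:3}  {rel:.3f}  {rel:.1%}")
```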

It is easy to see the relationship between frequency and relative frequency. You simply divide each frequency by the sum of the frequencies and you get the relative frequencies. Expressed as percentages, they are a little easier to read. Now suppose this data was collected for last year. Can you say, based on this data, that the probability that you will get a job paying over 80K is 0.016 (1.6%)? Sure. In a non-theoretical situation such as this, past data serves as a means to talk about probabilities for the future. Of course, you should realize that data based on one year can only act as an “estimate” for the future, and the exact values for next year will be different. In later chapters we will study more formally how to estimate values based on sample data.

As another example, let’s say 97% of all heart surgeries performed last year by a particular heart surgeon, for the same symptoms, were successful. One can interpret this relative frequency as an estimate of the probability for the future: given a surgery performed by the same doctor, there is roughly a 97% chance it will be successful. Note that this is just an estimate based on a sample of last year’s surgeries. In future chapters we will study how much confidence we can have in such estimates.

As yet another example, suppose one day, having nothing better to do, I sat down and flipped a coin three times, a hundred times over – that is, I flipped the coin 300 times and treated each set of three flips as one trial, giving 100 sets of three flips. I recorded the number of heads I got in each set of 3 flips, and got the following frequencies (and relative frequencies):

Outcome   Frequency   Relative Frequency   Theoretical Probability
0 Heads   12          0.12                 0.125
1 Head    35          0.35                 0.375
2 Heads   39          0.39                 0.375
3 Heads   14          0.14                 0.125
Total     100         1.00                 1.000

Figure 3.6: Relationship between Relative Frequency and Probability

Note that this was a real experiment and we find that the values in the relative frequency column are quite close to those in the theoretical probability column. No matter how many times I repeat sets of 3-flips of a coin, the relative frequency column will never be exactly equal to the theoretical probability column, but the more times I flip, the relative frequency column will get closer and closer to the theoretical probability column. For this reason, the relative frequency column is often used as a proxy for probability.
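The 100-set experiment above can be mimicked with a simulated coin. Results differ from run to run (and from the author’s real flips), but the relative frequencies stay in the neighborhood of the theoretical column, and get closer as the number of sets grows:

```python
import random
from collections import Counter

random.seed(1)  # arbitrary seed; any run gives similar (not identical) numbers

sets = 100
# Each trial: three flips of a fair coin (0 = tail, 1 = head), summed to count heads.
counts = Counter(sum(random.choice((0, 1)) for _ in range(3)) for _ in range(sets))
rel_freq = {k: counts[k] / sets for k in range(4)}

theory = {0: 0.125, 1: 0.375, 2: 0.375, 3: 0.125}
for k in range(4):
    print(k, rel_freq[k], theory[k])
```

Increasing `sets` to 100,000 brings every relative frequency within a few thousandths of the theoretical value, illustrating the convergence described above.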

Relationship between probability and cumulative probability

Number of Heads   Probability   Cumulative Probability
0                 0.0010        0.0010
1                 0.0098        0.0107
2                 0.0439        0.0547
3                 0.1172        0.1719
4                 0.2051        0.3770
5                 0.2461        0.6230
6                 0.2051        0.8281
7                 0.1172        0.9453
8                 0.0439        0.9893
9                 0.0098        0.9990
10                0.0010        1.0000
Total             1.0000

Figure 3.7: Cumulative Probability

Let’s look at the data in Figure 3.4a, which has been reproduced in Figure 3.7 with an added column called cumulative probability. As you can see, the cumulative probability column is fairly easy to calculate. On the first row, it is the same as the probability column. On the second row, it is the sum of the first two probabilities. On the third row, it is the sum of the first three probabilities, and so on. On the last row, it is the sum of all eleven probabilities, which must necessarily equal 1.0. So why are we talking about cumulative probabilities? Once we create the column for cumulative probabilities, which is really not hard to do at all, we can answer a lot of questions about this experiment of flipping a coin ten times. For example, what is the probability of getting fewer than 6 heads in 10 flips? Simply look at the cumulative probability table: the answer is 0.6230. What is the probability of getting 6 or fewer heads in 10 flips? The answer is 0.8281.

Now let’s ask a slightly more difficult question. What is the probability of getting 7 or more heads in 10 flips of a coin? If we had created the cumulative probability column bottom up, i.e. if the last row had been 0.0010, the second-to-last row 0.0107, and so on, we could answer this question directly. But we can still answer it without creating a second cumulative probability column. The probability of getting seven or more heads is 1 minus the probability of getting 6 or fewer heads, so the answer is 1 minus 0.8281, or 0.1719.
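The running sum behind Figure 3.7, and the three answers just discussed, can be sketched in a few lines:

```python
from itertools import accumulate
from math import comb

# Probabilities of 0..10 heads in 10 flips (the column of Figure 3.4a),
# then a running sum to get the cumulative column of Figure 3.7.
probs = [comb(10, k) / 2**10 for k in range(11)]
cum = list(accumulate(probs))

print(round(cum[5], 4))      # P(X <= 5), i.e. fewer than 6 heads -> 0.623
print(round(cum[6], 4))      # P(X <= 6) -> 0.8281
print(round(1 - cum[6], 4))  # P(X >= 7) = 1 - P(X <= 6) -> 0.1719
```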

As another example, recall that in Chapter 1 we looked at data like this: 2, 5, 7, 9, 5, 3, 1, 4, 6, 8, 14, 4, 20, 6, 10, 4, 6, 9, 11, 2, 6, 9, 4, 5, 13, 18, 7, 5, 9, 10, which we translated into a frequency table like this:

Number of Cars Sold   Count (or Frequency)   Cumulative Frequency   Relative Frequency   Cumulative Relative Frequency
1 – 3                 4                      4                      13.3%                13.3%
4 – 6                 12                     16                     40.0%                53.3%
7 – 9                 7                      23                     23.3%                76.7%
10 – 12               3                      26                     10.0%                86.7%
13 – 15               2                      28                     6.7%                 93.3%
16 – 18               1                      29                     3.3%                 96.7%
19 – 21               1                      30                     3.3%                 100%

Then, using the relative frequency column, we drew a frequency polygon that looked like this:

[Chart omitted: frequency polygon – x-axis: Number of Cars Sold (bins 1-3 through 19-21); y-axis: Relative Frequency, 0 to 0.45]

Figure 3.8: Probability distribution of the number of cars sold per day (frequency polygon)

In Chapter 1, we called this chart a frequency polygon. Now that we know about probability distributions, we can treat relative frequency as a proxy for probability, and treat the chart as a probability distribution. Note that the heights of the points in the polygon add up to exactly 100%. Also recall that the numbers in our dataset represented the number of cars sold per day in the month of April at a car dealership. By looking at this picture, we can make statements such as: if you randomly pick a day, there is a 40% chance that 4 to 6 cars were sold on that day. Or: the probability that on a randomly selected day the number of cars sold is up to 6 is 53.3%.

3.4 Random Variables

In statistics, we deal with variables whose values are not deterministic; their values depend upon chance. A variable whose value depends upon chance is called a Random Variable. So, in statistics, we only deal with random variables. When we talk about a probability distribution, we are really talking about the probability distribution of a random variable. For example, the number of cars sold in a day in April depends upon chance, so it is a random variable, and we can talk about its probability distribution. A random variable may be quantitative or qualitative. If it is quantitative, it can be discrete or continuous. The probability distributions of discrete variables are a little different from those of continuous variables. We will study these differences in the next section.

3.5 Probability Distribution of Discrete vs. Continuous Random Variables

So far, we have only discussed probability distributions of discrete random variables – for example, the number of heads in 10 flips of a coin, or the number of cars sold per day in April. If the value of a random variable is the result of some kind of measurement, as opposed to some kind of counting, then it is a continuous random variable. For example, the height of a person is a continuous random variable. Can we talk about probability distributions for continuous variables? Yes we can. But there are some fundamental differences that we should keep in mind. To understand the difference, you must understand one very important thing about probabilities of continuous variables. What we are going to tell you next will at first appear counterintuitive. Suppose there is a continuous variable, such as a person’s height, and suppose the question is: what is the probability that the height of a person is 68” (that is, 5 ft. 8 in.) or 70”? In other words, what is P(height = 68”)? Or what is P(height = 70”)? Or what is P(height = any other exact number)? The answer to all these questions is zero. Yes, you heard it right. The probability that the height is exactly 68” is zero. The probability that the height is exactly 70” is zero. You are wondering: how can that be true? If your height happens to be 68” or 70” and you were to believe us, then you would have to believe that you don’t exist! But since you are quite sure that you exist, you have no choice but to not believe us. So let us relieve you of your predicament. We will now tell you something so that your existence is not in danger and you can also believe us.

The key word is “exactly”. If you think your height is 68”, it is only because you have rounded your height to 68”. Remember if you round continuous numbers to the nearest integer, you are treating the variable as discrete. When we say exactly 68”, we mean exactly 68” without any rounding. We mean 68.00000000000000000000000… inches. If there was a measuring device that could measure height accurately up to thousands of decimal places, then what is the probability that your height was exactly 68.000000000000000000000000000000… for thousands of decimal places? The answer is zero. So if you thought your height was 68”, well it was only approximately 68” and you definitely exist. Hopefully you can also believe us that the probability of finding someone on this earth whose height is “exactly” 68” is as close to zero as one can imagine.

So why is this difference between discrete and continuous variables important? Well, if the probability of every possible value is zero, what does the probability distribution look like? You are probably thinking that the probability column has zeros throughout, and the graph is a straight horizontal line at a height of zero. That would not be a very interesting probability distribution, would it? And it wouldn’t be the right one either, because the sum of all the probabilities would not be 1.

Luckily, that is not the case. Luckily even for continuous variables, the probability distributions do look like bell-shaped curves and other types of curves. The reason is that even though the probability of an exact value of a continuous variable is zero, the probabilities of values ranging between two exact values are not zero but some finite positive number. So we can talk about the probability of your height being between 68” and 69” or between 68” and 68.3” or between 68.2567” and 68.5432”. We can talk about probability of your height being less than 68” or greater than 68”. We just cannot talk about the probability of exactly 68”.
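One way to see the “exactly 68” has probability zero” point numerically: the probability of landing inside a window around 68” shrinks as the window shrinks. A sketch, assuming heights are normal with mean 68” and standard deviation 4” (the values used in the example of Section 3.6.1):

```python
from math import erf, sqrt

def cdf(x, mu=68.0, sigma=4.0):
    # Cumulative probability of a normal variable: area to the left of x.
    return 0.5 * (1 + erf((x - mu) / (sigma * sqrt(2))))

# P(68 - eps < height < 68 + eps) shrinks toward 0 as the measurement
# window eps shrinks -- in the limit, P(height = exactly 68) is 0.
for eps in (1.0, 0.1, 0.01, 0.001):
    print(eps, cdf(68 + eps) - cdf(68 - eps))
```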

A more accurate description of the probability distribution of a continuous variable is probability density function. However, it is quite common to use the term probability distribution for both discrete and continuous variables.

So here is the main difference. If we are looking at the probability distribution chart of a discrete variable, the height of the curve is a measure of probability. When we are looking at the probability density function of a continuous variable, the area under the curve between two points is the measure of probability between those two points. The height of the probability density function of a continuous variable has no meaning on its own. So when reading the probability density function in Figure 3.9, which is for a continuous variable such as the height of a person, we cannot say that the probability of a height of 68” is roughly 0.08. But we can say that the probability of a height between 68” and 70” is the area under the curve between 68 and 70. Please note that the total area under the curve is 1, just as the sum of the probabilities of all possible outcomes in a discrete probability distribution was 1. We can also say that the probability of a height greater than 68” is the area under the curve to the right of 68”.

[Chart omitted: bell-shaped curve – x-axis: 48 to 88; y-axis: density, 0 to 0.09]

Figure 3.9: Continuous Probability Distribution

When you see a probability density function, remember, the area under the curve is a measure of probability and that the entire area under the curve is 1.
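Since the area under the density is what carries probability, a crude numeric check is possible: chop the axis into thin slices and add up the slice areas. A sketch with illustrative values (mean 68”, standard deviation 4”, matching the height example later in the chapter):

```python
from math import exp, pi, sqrt

mu, sigma = 68.0, 4.0  # illustrative values for the height example

def density(x):
    # Normal probability density function (its height is NOT a probability).
    return exp(-((x - mu) ** 2) / (2 * sigma ** 2)) / (sigma * sqrt(2 * pi))

# Approximate areas with a simple Riemann sum over a fine grid.
dx = 0.001
grid = [40 + i * dx for i in range(int(56 / dx))]  # 40" to 96" covers ~7 sigma

total_area = sum(density(x) * dx for x in grid)
area_68_to_70 = sum(density(x) * dx for x in grid if 68 <= x < 70)

print(round(total_area, 4))     # close to 1.0, as it must be
print(round(area_68_to_70, 4))  # P(68" < height < 70")
```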

3.6 Some Popular Discrete and Continuous Random Variables

Three popular discrete random variables are the Binomial, the Poisson and the Hypergeometric. Four very popular continuous random variables are the Uniform, the Normal, the t and the Exponential. These are all theoretical random variables. Whenever a random variable is theoretical, its probability distribution is well defined, and since its probability distribution is well defined, its story is also well defined. For example, the Normal random variable and the t random variable both have a nice, symmetric, bell-shaped probability distribution curve. So here is what statisticians do. If they have data for some random variable, such as the height of a person, which is not a theoretical variable, they check whether its shape resembles one of the theoretical variables. If it does, the storytelling becomes easier. Think of each theoretical probability distribution as a template for a story: if the probability distribution of a random variable fits a template, the story emerges easily. As an example, statisticians have discovered that the height (of a person) random variable does indeed closely resemble a normal random variable, especially when we have lots of data, i.e., data on the heights of many people. Therefore it becomes easy to tell the story hidden in height data, because we already know the story of a normal random variable.

In fact, statisticians have found that many continuous variables in nature resemble the normal distribution. It is therefore important to understand what a normal probability distribution looks like. The picture in Figure 3.9 shows exactly that: a normal probability distribution, or normal curve. Since a probability density function is basically a curve, statisticians usually use the term curve as short for probability density function.

Remember that a probability distribution can be expressed either as a table or as a curve. Now keep this in mind: when you have a discrete random variable, you can talk about the probability of each possible value of the outcome. When you have a continuous random variable, you cannot, because the probability of each possible value is 0 and there are an infinite number of possible values. So for continuous variables, it makes more sense to specify the cumulative probability. Of course, you can talk about the cumulative probability of a discrete random variable also (see Figure 3.7). And really it is the cumulative probability column that is more useful for telling a story and making statements about a random variable.

3.6.1 Normal Probability Distribution

As mentioned before, statisticians have found that many random variables in nature follow a normal distribution. It is therefore important to understand the normal distribution. We have also learnt that a distribution table for a discrete random variable is basically a table that shows the probabilities of the various outcomes. For continuous random variables, we deal in cumulative probabilities instead of probabilities, because the probabilities of specific outcomes are zero.

For the sake of simplicity, suppose we are looking at height as the continuous random variable, and assume the mean height is 68” and the standard deviation is 4”. Figures 3.10 and 3.11 show what its distribution looks like.

Height   Cum. Prob.    Height   Cum. Prob.    Height   Cum. Prob.
50.0     0.0000034     62.0     0.0668072     74.0     0.9331928
50.5     0.0000061     62.5     0.0845657     74.5     0.9479187
51.0     0.0000107     63.0     0.1056498     75.0     0.9599408
51.5     0.0000185     63.5     0.1302945     75.5     0.9696036
52.0     0.0000317     64.0     0.1586553     76.0     0.9772499
52.5     0.0000533     64.5     0.1907870     76.5     0.9832067
53.0     0.0000884     65.0     0.2266274     77.0     0.9877755
53.5     0.0001445     65.5     0.2659855     77.5     0.9912255
54.0     0.0002326     66.0     0.3085375     78.0     0.9937903
54.5     0.0003691     66.5     0.3538302     78.5     0.9956676
55.0     0.0005770     67.0     0.4012937     79.0     0.9970202
55.5     0.0008890     67.5     0.4502618     79.5     0.9979799
56.0     0.0013499     68.0     0.5000000     80.0     0.9986501
56.5     0.0020201     68.5     0.5497382     80.5     0.9991110
57.0     0.0029798     69.0     0.5987063     81.0     0.9994230
57.5     0.0043324     69.5     0.6461698     81.5     0.9996309
58.0     0.0062097     70.0     0.6914625     82.0     0.9997674
58.5     0.0087745     70.5     0.7340145     82.5     0.9998555
59.0     0.0122245     71.0     0.7733726     83.0     0.9999116
59.5     0.0167933     71.5     0.8092130     83.5     0.9999467
60.0     0.0227501     72.0     0.8413447     84.0     0.9999683
60.5     0.0303964     72.5     0.8697055     84.5     0.9999815
61.0     0.0400592     73.0     0.8943502     85.0     0.9999893
61.5     0.0520813     73.5     0.9154343     85.5     0.9999939
                                              86.0     0.9999966

Figure 3.10: Cum Probability of the Height Normal Random Variable with Mean of 68” and std. dev. of 4”
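The entries of Figure 3.10 can be reproduced from the cumulative distribution function of a normal variable, which Python’s `math.erf` makes available without any special libraries; a spot check of a few rows:

```python
from math import erf, sqrt

def cdf(x, mu=68.0, sigma=4.0):
    # Cumulative probability for a normal variable: area under the curve
    # to the left of x.
    return 0.5 * (1 + erf((x - mu) / (sigma * sqrt(2))))

# Spot-check a few rows of Figure 3.10 (mean 68", std. dev. 4").
for h in (60.0, 64.0, 68.0, 72.0, 76.0):
    print(h, round(cdf(h), 7))
```

The printed values (0.0227501, 0.1586553, 0.5, 0.8413447, 0.9772499) match the corresponding rows of the table.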

[Chart omitted: “Normal Probability Distribution for Height Random Variable, Mean 68" and standard deviation 4"” – x-axis: Height, 50.0 to 86.0; y-axis: f(height), 0 to 0.12]

Figure 3.11: Normal Probability Distribution for Height Random Variable, Mean 68, standard deviation of 4”

By looking at this table we can answer many questions, such as: what is the probability that a randomly picked person’s height is less than 5 ft. (60”)? Just look at the cumulative probability table: the answer is 0.02275. If the question is what is the probability that the height of a random person is greater than 60”, the answer is 1 minus 0.02275, which is 0.97725.

Question: P(height of a random person < 68”)? Answer: 0.50

The above answer should make sense: in this distribution the average height is 68”, so half the population must be below 68” and half above, hence the answer is 0.50. Note that since the area under the curve in Figure 3.11 is 1, the area under the curve to the left of 68” must be 0.50. So the answer to the question P(height < 68”), which is 0.5, can either be read from Figure 3.10 or seen in Figure 3.11 as the area under the curve to the left of 68”. Since it is easy to see that 68” divides the curve exactly in half, it was easy to read off the area. But the point we are trying to make is that the cumulative probability value for a point on the x-axis in Figure 3.11 corresponds to the area under the curve to the left of that point.

Question: P(height < 70”)? Answer: 0.69146

Question: P(height < 80”)? Answer: 0.99865

Question: P(height < 65”)? Answer: 0.22662

Question: P(65” < height < 71”)? Answer: 0.77337 minus 0.22662 = 0.54675

Please make sure you understand how we got the answer to the above question, P(65” < height < 71”). One way is to look at the curve: the answer is the area under the curve between heights of 65” and 71”, which is the same as the area to the left of 71” (which is 0.7733) minus the area to the left of 65” (which is 0.2266).

Question: P(60” < height < 70”)? Answer: 0.69146 minus 0.02275 = 0.66871

Question: P(height > 70”)? Answer: 1 minus 0.69146 = 0.30854

Please make sure you understand how we got the answer to the above question (i.e., P(height > 70”)?). The answer to this question is the area to the right of 70”. Since 0.69146 is the area to the left of 70” and since the total area under the curve is 1, therefore the area to the right of 70” is 1 minus 0.69146.

If you understand the above examples, then you can answer any question of the following three types:

P(height < some number)?; P(some number < height < some higher number)?; P(height > some number)?

If you can answer questions of the above three types then you can tell all kinds of stories using the table in Figure 3.10.
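These three question types do not require a printed table. As a sketch, Python's standard library provides `statistics.NormalDist`; the example below answers the same height questions for a normal population with mean 68” and standard deviation 4” (the variable names are just illustrative choices, not from the text):

```python
from statistics import NormalDist

# Normal population of heights: mean 68", standard deviation 4"
height = NormalDist(mu=68, sigma=4)

# Type 1: P(height < some number) -- the area to the left of that number
p_less = height.cdf(70)                       # about 0.69146

# Type 2: P(some number < height < some higher number)
p_between = height.cdf(70) - height.cdf(60)   # about 0.66871

# Type 3: P(height > some number) -- one minus the area to the left
p_greater = 1 - height.cdf(70)                # about 0.30854

print(round(p_less, 5), round(p_between, 5), round(p_greater, 5))
```

The three results match the answers read from Figure 3.10 above.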



There are an infinite number of Normal Distributions

The Normal distribution in Figures 3.10 and 3.11 is just one of an infinite number of normal distributions possible. This one happens to have a mean of 68 and a standard deviation of 4. If we take any combination of mean and standard deviation, we can construct a probability distribution for that combination. To construct a probability distribution of a normal random variable, we only need its mean and standard deviation. For example, if the mean was 66” and the standard deviation was 3”, then we would construct the probability distribution table for that combination and then we can answer all kinds of questions about a population whose mean was 66” and standard deviation was 3”.

3.6.2 A very special Normal Distribution – The Standard Normal Distribution

Of all the infinite number of normal distributions possible, there is one very special normal distribution. This special distribution has a mean of 0 and a standard deviation of 1. The reason this distribution is so special is that if you have this distribution, you can answer questions such as those in the previous section for any normal population with any combination of mean and standard deviation. Such a normal distribution is called a Standard Normal Distribution. Almost every book in Statistics includes a table on the Standard Normal Distribution. It is given in Figure 3.12 here:

z      Cum. Prob.    z      Cum. Prob.    z     Cum. Prob.    z     Cum. Prob.

-3.00  0.00135    -1.50  0.06681     0.00  0.50000     1.50  0.93319
-2.95  0.00159    -1.45  0.07353     0.05  0.51994     1.55  0.93943
-2.90  0.00187    -1.40  0.08076     0.10  0.53983     1.60  0.94520
-2.85  0.00219    -1.35  0.08851     0.15  0.55962     1.65  0.95053
-2.80  0.00256    -1.30  0.09680     0.20  0.57926     1.70  0.95543
-2.75  0.00298    -1.25  0.10565     0.25  0.59871     1.75  0.95994
-2.70  0.00347    -1.20  0.11507     0.30  0.61791     1.80  0.96407
-2.65  0.00402    -1.15  0.12507     0.35  0.63683     1.85  0.96784
-2.60  0.00466    -1.10  0.13567     0.40  0.65542     1.90  0.97128
-2.55  0.00539    -1.05  0.14686     0.45  0.67364     1.95  0.97441
-2.50  0.00621    -1.00  0.15866     0.50  0.69146     2.00  0.97725
-2.45  0.00714    -0.95  0.17106     0.55  0.70884     2.05  0.97982
-2.40  0.00820    -0.90  0.18406     0.60  0.72575     2.10  0.98214
-2.35  0.00939    -0.85  0.19766     0.65  0.74215     2.15  0.98422
-2.30  0.01072    -0.80  0.21186     0.70  0.75804     2.20  0.98610
-2.25  0.01222    -0.75  0.22663     0.75  0.77337     2.25  0.98778
-2.20  0.01390    -0.70  0.24196     0.80  0.78814     2.30  0.98928
-2.15  0.01578    -0.65  0.25785     0.85  0.80234     2.35  0.99061
-2.10  0.01786    -0.60  0.27425     0.90  0.81594     2.40  0.99180
-2.05  0.02018    -0.55  0.29116     0.95  0.82894     2.45  0.99286
-2.00  0.02275    -0.50  0.30854     1.00  0.84134     2.50  0.99379
-1.95  0.02559    -0.45  0.32636     1.05  0.85314     2.55  0.99461
-1.90  0.02872    -0.40  0.34458     1.10  0.86433     2.60  0.99534
-1.85  0.03216    -0.35  0.36317     1.15  0.87493     2.65  0.99598
-1.80  0.03593    -0.30  0.38209     1.20  0.88493     2.70  0.99653
-1.75  0.04006    -0.25  0.40129     1.25  0.89435     2.75  0.99702
-1.70  0.04457    -0.20  0.42074     1.30  0.90320     2.80  0.99744
-1.65  0.04947    -0.15  0.44038     1.35  0.91149     2.85  0.99781
-1.60  0.05480    -0.10  0.46017     1.40  0.91924     2.90  0.99813
-1.55  0.06057    -0.05  0.48006     1.45  0.92647     2.95  0.99841
                                                       3.00  0.99865

Figure 3.12: The Standard Normal Table


Figure 3.13: The Standard Normal Curve (bell-shaped density plotted for z from -3.00 to 3.00; vertical axis from 0 to 0.45)

Since you know how to read any normal table, you should have no difficulty reading the standard normal table in Figure 3.12. So you can answer questions like P(z < 1.10)= ?. The answer is 0.86433.

Question: What is P(z < 1.5) = ? Answer: 0.93319

Question: What is P(z > 1.1) = ? Answer: 1 minus 0.86433 = 0.13567

Question: What is P( -1 < z < 1)? Answer: 0.84134 – 0.15866 = 0.68268

We were able to answer all three types of questions above.

Now you must be wondering – we said that using Figure 3.12, we can answer questions about any normal random variable with any combination of mean and standard deviation. How is that possible? For example, what if the random variable was normally distributed with a mean of 68” and a standard deviation of 4”? It was easy to answer questions about this random variable using Figure 3.10, but how can we answer those questions using Figure 3.12?

Let’s do an example. Recall that using Figure 3.10, we answered the question P(height < 70”) = 0.69146

So how do we answer this question using Figure 3.12?

You perform a slight transformation and then you use Figure 3.12. The slight transformation is this. You convert the 70” into a number called the standardized value or the z-value. How to get a standardized value? You subtract the mean and divide by the standard deviation. So the standardized value of 70 in this example would be (70 – 68) / 4 or 2/4 or 0.50. So the standardized value of 70” is 0.50. Now look at Figure 3.12 and find P(z < 0.50) and you get the answer 0.69146 which is the same answer that you got from Figure 3.10. It is as simple as that.

Let’s do another example. What is P(height < 80”)? From Figure 3.10, we know the answer to be 0.99865. The standardized value of 80 is (80-68)/4 = 12/4 = 3. So from Figure 3.12, you can see that the answer is 0.99865.

What about the question: What is P(60 < height < 70)? We know from Figure 3.10 that the answer is 0.66871.

To use Figure 3.12, we standardize both 60 and 70 as (60 – 68)/4 and (70 – 68)/4 or -2 and 0.5. So the new question is what is P(-2 < z < 0.5). From Figure 3.12, the answer is 0.69146 – 0.02275 = 0.66871, which is the same answer we got from Figure 3.10.

If you understand the above examples, then you should be able to answer questions about any normal random variable, with any mean and any standard deviation, just by using Figure 3.12.
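The standardization step can also be verified directly: converting a height to a z-value and looking it up in the standard normal distribution gives the same answer as using the height distribution itself. A minimal sketch, again assuming Python's stdlib `statistics.NormalDist`:

```python
from statistics import NormalDist

mu, sigma = 68, 4
z = (70 - mu) / sigma              # standardized value: (70 - 68)/4 = 0.5

# P(z < 0.5) from the standard normal (mean 0, standard deviation 1) ...
p_std = NormalDist().cdf(z)
# ... equals P(height < 70) from the original distribution
p_raw = NormalDist(mu, sigma).cdf(70)

print(round(p_std, 5), round(p_raw, 5))   # both about 0.69146
```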

Some important facts about the Normal Curve:


1. The area under the curve between ± 1 std. dev. is about 68% (68.268% to be more exact)
2. The area under the curve between ± 2 std. dev. is about 95% (95.45% to be more exact)
3. The area under the curve between ± 3 std. dev. is about 99.7% (99.73% to be more exact)
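These three facts can be confirmed numerically from the standard normal cumulative distribution; a quick sketch using the stdlib `statistics.NormalDist`:

```python
from statistics import NormalDist

z = NormalDist()   # the standard normal: mean 0, standard deviation 1
for k in (1, 2, 3):
    # area between -k and +k standard deviations
    area = z.cdf(k) - z.cdf(-k)
    print(k, round(area * 100, 3), "%")
# prints roughly 68.269 %, 95.45 %, 99.73 %
```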

3.6.3 The t-Distribution

Just like the standard normal distribution, which has a bell-shaped curve, the t-distribution also has a bell-shaped curve. In fact, the t-curve resembles a z-curve a lot. The only difference is that the tails of a t-curve are a little bit thicker than those of the z-curve. Please remember that the standard normal distribution is for a random variable z, whose mean is 0 and standard deviation is 1. Similarly, t is also a random variable with a mean of 0 and a standard deviation of 1; however, its probability distribution depends upon another factor called the degrees of freedom.

To determine say P(z > 1.5), I can look up the table in Figure 3.12. But to determine P(t > 1.5), we will need one more piece of information - the degrees of freedom. For each different degree of freedom, the cumulative probabilities for the various t values are different. Figure 3.14 shows the cumulative probabilities of the t-variable for various degrees of freedom (5 through 40, and 10,000). The last column in Figure 3.14 shows the cumulative probabilities of the standard normal variable z. Note that the probability distribution of z is extremely close to that of the t random variable for a very high degree of freedom, such as 10,000. For smaller degrees of freedom, there are minor differences. For t of 0.0, all the values are the same.

                                   Degrees of Freedom
  t      5        10       15       20       25       30       35       40      10000    normal
-3.0  0.01505  0.00667  0.00449  0.00354  0.00302  0.00269  0.00247  0.00232  0.00135  0.00135
-2.5  0.02725  0.01572  0.01225  0.01062  0.00967  0.00906  0.00863  0.00831  0.00622  0.00621
-2.0  0.05097  0.03669  0.03197  0.02963  0.02824  0.02731  0.02665  0.02616  0.02276  0.02275
-1.5  0.09695  0.08225  0.07718  0.07462  0.07307  0.07203  0.07129  0.07073  0.06682  0.06681
-1.0  0.18161  0.17045  0.16659  0.16463  0.16345  0.16265  0.16209  0.16166  0.15867  0.15866
-0.5  0.31915  0.31395  0.31217  0.31127  0.31072  0.31036  0.31010  0.30991  0.30854  0.30854
 0.0  0.50000  0.50000  0.50000  0.50000  0.50000  0.50000  0.50000  0.50000  0.50000  0.50000
 0.5  0.68085  0.68605  0.68783  0.68873  0.68928  0.68964  0.68990  0.69009  0.69146  0.69146
 1.0  0.81839  0.82955  0.83341  0.83537  0.83655  0.83735  0.83791  0.83834  0.84133  0.84134
 1.5  0.90305  0.91775  0.92282  0.92538  0.92693  0.92797  0.92871  0.92927  0.93318  0.93319
 2.0  0.94903  0.96331  0.96803  0.97037  0.97176  0.97269  0.97335  0.97384  0.97724  0.97725
 2.5  0.97275  0.98428  0.98775  0.98938  0.99033  0.99094  0.99137  0.99169  0.99378  0.99379
 3.0  0.98495  0.99333  0.99551  0.99646  0.99698  0.99731  0.99753  0.99768  0.99865  0.99865

Figure 3.14: Cumulative Probabilities for the t-random variable for various degrees of freedom


                    Area under the t-curve, to the right of the t-value
Degrees of Freedom   0.2     0.1     0.05    0.02     0.01     0.005
        1           1.376   3.078   6.314  15.895   31.821   63.657
        2           1.061   1.886   2.920   4.849    6.965    9.925
        3           0.978   1.638   2.353   3.482    4.541    5.841
        4           0.941   1.533   2.132   2.999    3.747    4.604
        5           0.920   1.476   2.015   2.757    3.365    4.032
        6           0.906   1.440   1.943   2.612    3.143    3.707
        7           0.896   1.415   1.895   2.517    2.998    3.499
        8           0.889   1.397   1.860   2.449    2.896    3.355
        9           0.883   1.383   1.833   2.398    2.821    3.250
       10           0.879   1.372   1.812   2.359    2.764    3.169
       11           0.876   1.363   1.796   2.328    2.718    3.106
       12           0.873   1.356   1.782   2.303    2.681    3.055
       13           0.870   1.350   1.771   2.282    2.650    3.012
       14           0.868   1.345   1.761   2.264    2.624    2.977
       15           0.866   1.341   1.753   2.249    2.602    2.947
       16           0.865   1.337   1.746   2.235    2.583    2.921
       17           0.863   1.333   1.740   2.224    2.567    2.898
       18           0.862   1.330   1.734   2.214    2.552    2.878
       19           0.861   1.328   1.729   2.205    2.539    2.861
       20           0.860   1.325   1.725   2.197    2.528    2.845
       21           0.859   1.323   1.721   2.189    2.518    2.831
       22           0.858   1.321   1.717   2.183    2.508    2.819
       23           0.858   1.319   1.714   2.177    2.500    2.807
       24           0.857   1.318   1.711   2.172    2.492    2.797
       25           0.856   1.316   1.708   2.167    2.485    2.787
       26           0.856   1.315   1.706   2.162    2.479    2.779
       27           0.855   1.314   1.703   2.158    2.473    2.771
       28           0.855   1.313   1.701   2.154    2.467    2.763
       29           0.854   1.311   1.699   2.150    2.462    2.756
       30           0.854   1.310   1.697   2.147    2.457    2.750
       40           0.851   1.303   1.684   2.123    2.423    2.704
       50           0.849   1.299   1.676   2.109    2.403    2.678
       60           0.848   1.296   1.671   2.099    2.390    2.660
       70           0.847   1.294   1.667   2.093    2.381    2.648
       80           0.846   1.292   1.664   2.088    2.374    2.639
       90           0.846   1.291   1.662   2.084    2.368    2.632
      100           0.845   1.290   1.660   2.081    2.364    2.626
   Infinity         0.842   1.282   1.645   2.054    2.326    2.576

Figure 3.15: t-distribution

You can obtain these values in Excel® using this function: =T.DIST(t-value,dof,TRUE). So, for example, if you put =T.DIST(1.0,20,TRUE), you will get 0.83537.
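If Excel is not at hand, the same cumulative probability can be approximated by numerically integrating the t density. The sketch below is self-contained (Simpson's rule over the density; `t_cdf` and `t_pdf` are illustrative names, not standard library functions):

```python
import math

def t_pdf(x, dof):
    # Density of the t-distribution with dof degrees of freedom
    c = math.gamma((dof + 1) / 2) / (math.sqrt(dof * math.pi) * math.gamma(dof / 2))
    return c * (1 + x * x / dof) ** (-(dof + 1) / 2)

def t_cdf(x, dof, steps=2000):
    # P(t < x) = 0.5 + integral of the density from 0 to x (Simpson's rule)
    h = x / steps
    total = t_pdf(0, dof) + t_pdf(x, dof)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, dof)
    return 0.5 + total * h / 3

print(round(t_cdf(1.0, 20), 5))   # about 0.83537, matching =T.DIST(1.0,20,TRUE)
```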

In Figure 3.14, we have shown only a few values of t and only a few degrees of freedom. You can imagine the amount of paper needed if we showed these values for all possible degrees of freedom and all possible values of t. For this reason, textbooks show t-tables in a different format. Whereas in Figure 3.14 you find the cumulative probability for a given value of t and certain degrees of freedom, in the way t-tables are shown in textbooks, you read the value of t, given a certain cumulative probability for a


given degree of freedom. Figure 3.15 shows a typical t-table. How do we read the t-table of Figure 3.15? Let's say I want to know the value of t such that P(t > that value) = 0.05 for 10 degrees of freedom. We look at the column for 0.05 and the row for 10 and get the value 1.812. So for 10 degrees of freedom, P(t > 1.812) = 0.05. Similarly, for 20 degrees of freedom, P(t > 2.528) = 0.01.

3.7 Other Special Distributions

So far in this chapter, we studied the normal and the t-distributions, which are the two most well-known distributions for continuous variables. When we are telling a story about a continuous variable that follows a normal or a t-distribution, we make use of those distributions. When we have to tell a story about a discrete variable, we need distributions suitable for discrete variables. We cannot even use the normal distribution to tell the story of a continuous variable that is not normally distributed. For example, one type of continuous variable has a uniform distribution, and we use the uniform distribution to tell its story. In this part, we will study several more distributions. We will study the uniform distribution, which is the distribution of a uniform random variable. We will also study the distributions of some well-known discrete random variables, such as the Binomial, the Poisson and the Hypergeometric random variables.

3.7.1 The Uniform Distribution

Some continuous random variables have a uniform probability distribution within a range of values. This distribution is also called the rectangular distribution because of the shape of the distribution (Figure 3.16). If the uniform random variable varies between values of a and b, then the height of the distribution is 1/(b - a). The area of the rectangle is 1. If c is a value between a and b, then P(X < c) = (c - a)/(b - a); this is also the area under the rectangle to the left of c. If c and d are values between a and b and c < d, then P(c < X < d) = (d - c)/(b - a); this is the area under the rectangle between c and d. The expected value of X is (a + b)/2 and the variance of X is (b - a)^2/12.

Figure 3.16: A Uniform Distribution
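Under these formulas, uniform probabilities are simple arithmetic. A small sketch (the function name and the particular a, b values are illustrative choices):

```python
def uniform_prob(c, d, a, b):
    # P(c < X < d) for X uniform on (a, b): the rectangle's area between c and d
    return (d - c) / (b - a)

a, b = 10, 30                       # suppose X is uniform between 10 and 30
height = 1 / (b - a)                # height of the rectangle: 0.05
mean = (a + b) / 2                  # expected value: 20.0
variance = (b - a) ** 2 / 12        # variance: about 33.33

print(uniform_prob(15, 25, a, b))   # P(15 < X < 25) = 0.5
```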

3.7.2 The Binomial Distribution

A binomial variable is a discrete variable and its probability distribution is called a binomial distribution. We have already seen examples of a binomial variable in previous chapters without realizing that they were binomial. The example of getting a certain number of heads in a certain number of flips of a coin was an example of a binomial variable.

A Binomial Variable: In general, a binomial variable is of this type: Number of successes in n trials, where the probability of success in each trial is p and there are only two outcomes in each trial (success and failure) and each of the n trials is identical and independent of each other, i.e. on each trial, the probability of success is p and this probability does not change depending on the outcomes on previous trials. For example, if on each of the first two flips of a coin you get a “Head”, the probability of getting a “Head” in the third trial stays at 0.5.

In the coin-toss example, we can define success as obtaining a “Head” and failure as getting a “Tail”. For a fair coin, the probability of success is 0.5. If we toss the coin 3 times, then n is 3. At the end of the 3 trials, we will have either zero successes, 1 success, 2 successes or 3 successes. In general, in n trials, there are (n+1) possible outcomes, namely 0, 1, 2, …, or n successes.

For a Binomial variable, the probability of r successes in n trials, where the probability of success is p, is given by P(r successes in n trials) = nCr × p^r × (1-p)^(n-r).


In Excel, you can use the =BINOMDIST() function to calculate the probability of r successes in n trials where the probability of success is p. If we want the probability of r successes in n trials where the probability of success is p, the Excel function is =BINOMDIST(r,n,p,FALSE). If we want the cumulative probability of up to r successes in n trials, the Excel function is =BINOMDIST(r,n,p,TRUE).

Examples of a Binomial Variable

If a coin is loaded and biased towards “Heads” such that the probability of obtaining a “Head” is 0.7, what is the probability of obtaining 3 Heads in 4 flips of the coin? In this example, n = 4, r = 3 and p = 0.7. We can get the answer either by using the formula 4C3 × (0.7)^3 × (1-0.7)^1 or by using the Excel function =BINOMDIST(3,4,0.7,FALSE). The answer is 0.4116. Please verify this answer using both the formula and Excel.

Note: if the coin had been fair, the probability of obtaining 3 heads in 4 flips of coin would be =BINOMDIST(3,4,0.5,FALSE) = 0.25
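The binomial formula is also easy to reproduce in Python, where `math.comb` plays the role of nCr; this sketch mirrors Excel's =BINOMDIST(r,n,p,FALSE):

```python
from math import comb

def binom_pmf(r, n, p):
    # P(r successes in n trials) = nCr * p^r * (1-p)^(n-r)
    return comb(n, r) * p**r * (1 - p)**(n - r)

print(round(binom_pmf(3, 4, 0.7), 4))   # loaded coin: 0.4116
print(binom_pmf(3, 4, 0.5))             # fair coin: 0.25
```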

For any combination of n and p we can create a probability distribution for all possible values of r. Such a table would be called a Binomial Distribution.

Examples of a Binomial Distribution

Suppose n = 4 and p = 0.7. Since n is 4, there are 5 possible values of r. The Binomial distribution for all possible values of r is shown in Figure 3.17 below. Note that the sum of the probabilities is 1.

r  n  Probability
0  4  0.0081
1  4  0.0756
2  4  0.2646
3  4  0.4116
4  4  0.2401
Sum = 1.0000

Figure 3.17: Binomial Distribution for n = 4, p = 0.7 (bar chart of Probability vs. Number of Heads)

For n = 4 and p = 0.5, the binomial distribution is shown in Figure 3.18 below. Again note that the probabilities add up to 1. Also note that when p = 0.5, the distribution is symmetric.

r  n  Probability
0  4  0.0625
1  4  0.2500
2  4  0.3750
3  4  0.2500
4  4  0.0625
Sum = 1.0000

Figure 3.18: Binomial Distribution for n = 4, p = 0.5 (bar chart of Probability vs. Number of Heads)

When p is 0.5 and n is large, the Binomial Distribution starts resembling a normal distribution. For example, in Figure 3.19, n is 20 and p is 0.5. We can see how much this distribution resembles the bell-shaped normal curve.

 r  n   Probability  Cum. Prob.   1 - Cum. Prob.
 0  20  0.00000095   0.00000095   0.99999905
 1  20  0.00001907   0.00002003   0.99997997
 2  20  0.00018120   0.00020123   0.99979877
 3  20  0.00108719   0.00128841   0.99871159
 4  20  0.00462055   0.00590897   0.99409103
 5  20  0.01478577   0.02069473   0.97930527
 6  20  0.03696442   0.05765915   0.94234085
 7  20  0.07392883   0.13158798   0.86841202
 8  20  0.12013435   0.25172234   0.74827766
 9  20  0.16017914   0.41190147   0.58809853
10  20  0.17619705   0.58809853   0.41190147
11  20  0.16017914   0.74827766   0.25172234
12  20  0.12013435   0.86841202   0.13158798
13  20  0.07392883   0.94234085   0.05765915
14  20  0.03696442   0.97930527   0.02069473
15  20  0.01478577   0.99409103   0.00590897
16  20  0.00462055   0.99871159   0.00128841
17  20  0.00108719   0.99979877   0.00020123
18  20  0.00018120   0.99997997   0.00002003
19  20  0.00001907   0.99999905   0.00000095
20  20  0.00000095   1.00000000   0.00000000


Figure 3.19: Binomial Distribution for n = 20 and p = 0.5 (bar chart of Probability vs. Number of Heads)

The table in Figure 3.19 also has a cumulative probability column and a “1 minus Cum. Prob.” column, using which we can answer questions like the following:

P(X <= 15) = 0.9941

P(X < 15) = 0.9793

P(X > 15) = 0.0059

P(10 <= X <= 15) = 0.9941 - 0.4119 = 0.5822

Using Excel, you can create a Binomial Distribution for any n and any p with ease and answer any kind of question related to a Binomial variable.
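The same table-building can be sketched in Python. Here we rebuild the n = 20, p = 0.5 distribution of Figure 3.19 and answer the four questions above; the running sums play the role of the Cum. Prob. column:

```python
from math import comb

n, p = 20, 0.5
# probability of r heads in n flips, for every possible r
pmf = [comb(n, r) * p**r * (1 - p)**(n - r) for r in range(n + 1)]
# cumulative probabilities: P(X <= r)
cdf = [sum(pmf[:r + 1]) for r in range(n + 1)]

print(round(cdf[15], 4))            # P(X <= 15) -> 0.9941
print(round(cdf[14], 4))            # P(X < 15)  -> 0.9793
print(round(1 - cdf[15], 4))        # P(X > 15)  -> 0.0059
print(round(cdf[15] - cdf[9], 4))   # P(10 <= X <= 15) -> 0.5822
```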

Mean and Variance of a Binomial Random Variable

The mean or the expected value of a binomial random variable is np.

The variance of a binomial random variable is np(1-p).
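Both formulas can be checked against the distribution itself; for n = 20 and p = 0.5 the mean should come out to np = 10 and the variance to np(1-p) = 5. A quick numerical check, not a derivation:

```python
from math import comb

n, p = 20, 0.5
pmf = [comb(n, r) * p**r * (1 - p)**(n - r) for r in range(n + 1)]

# mean = sum of r * P(X = r); variance = sum of (r - mean)^2 * P(X = r)
mean = sum(r * pmf[r] for r in range(n + 1))
variance = sum((r - mean) ** 2 * pmf[r] for r in range(n + 1))

print(round(mean, 6), round(variance, 6))   # 10.0 and 5.0
```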

3.7.3 The Poisson Distribution

The Poisson random variable is a discrete random variable. Its probability distribution is called a Poisson distribution. What kind of variable is a Poisson variable? It is of this type: the number of occurrences of a certain type in a given period of time, or the number of occurrences over a distance, an area, or a volume. For example, the number of customers arriving at a bank in one hour is a Poisson random variable. The number of hurricanes in a year, the number of paint defects per square meter of a car surface, and the number of dust particles per cubic meter are all examples of Poisson variables.

Suppose on average 5 customers arrive at a bank every hour. What is the probability that only 3 will arrive in a given hour, that as many as 9 customers will arrive, or that fewer than 4 will arrive? These kinds of questions can be answered if we understand how a Poisson random variable behaves. Meteorologists are interested in calculating the probabilities of the number of hurricanes in a season. If the average number of category-4 hurricanes in a season over the past 50 seasons is, say, 3, what is the probability of getting exactly 2 hurricanes, or up to 2 hurricanes, of category-4 in a given season? These types of questions can be answered using Poisson distributions.

Poisson Probabilities

If the average number of occurrences per unit time is say λ, then the probability of x occurrences per unit time is given by P(x occurrences) = e^(-λ) × λ^x / x!

Using Excel, we can find Poisson probabilities using the function =POISSON(x,λ,FALSE). If we want to find the cumulative probability of up to x occurrences, then we use the function = POISSON(x,λ,TRUE).
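The Poisson formula is a one-liner in Python as well; this sketch mirrors =POISSON(x,λ,FALSE), and summing the terms gives the cumulative version (the function name is an illustrative choice):

```python
from math import exp, factorial

def poisson_pmf(x, lam):
    # P(x occurrences) = e^(-lam) * lam^x / x!
    return exp(-lam) * lam**x / factorial(x)

print(round(poisson_pmf(3, 5), 6))                          # 0.140374
# cumulative probability of up to 10 occurrences, like =POISSON(10,5,TRUE)
print(round(sum(poisson_pmf(x, 5) for x in range(11)), 6))  # 0.986305
```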

An Example of Poisson Probability Distribution


Figure 3.20 shows the Poisson probabilities for x = 0,1,2,3,…,10 for λ = 5. This table was created using Excel.

 x  λ  Poisson Prob.  Cum. Prob.  1 minus Cum. Prob.
 0  5  0.006738       0.006738    0.993262
 1  5  0.033690       0.040428    0.959572
 2  5  0.084224       0.124652    0.875348
 3  5  0.140374       0.265026    0.734974
 4  5  0.175467       0.440493    0.559507
 5  5  0.175467       0.615961    0.384039
 6  5  0.146223       0.762183    0.237817
 7  5  0.104445       0.866628    0.133372
 8  5  0.065278       0.931906    0.068094
 9  5  0.036266       0.968172    0.031828
10  5  0.018133       0.986305    0.013695
Sum =  0.986305

Figure 3.20: Poisson Probability Distribution for λ = 5

Note that the sum of the probabilities column is not 1. This is because we have not listed all possible outcomes. Theoretically, there are an infinite number of possible outcomes.

Figure 3.21: Chart of Poisson Probability Distribution for λ = 5 (bar chart of Probability vs. x)

Figure 3.21 shows the chart of the probability distribution for λ = 5. Please note that the Poisson distribution is not symmetrical.

Mean and Variance of Poisson Random Variable

The mean of a Poisson variable is λ.

Interestingly, the variance of a Poisson variable is also λ.


3.7.4 The Hypergeometric Distribution

A hypergeometric random variable works like this: there are N objects in a population. The population has only two types of objects: M objects of type A and N-M objects of type B. A random sample of size n is chosen, where n is less than N, without replacement, from this population. If we define our random variable as the number of objects of type A in the sample, such a random variable is a hypergeometric random variable.

If X is a hypergeometric random variable, then P(X = r) = [MCr × (N-M)C(n-r)] / NCn

This formula is obviously quite complicated but thankfully, Excel provides a function called =HYPGEOMDIST(r,n,M,N).

Mean and Variance of a Hypergeometric Random Variable

The mean of a hypergeometric random variable is nM/N.

The variance of a hypergeometric random variable is n(M/N)(1-M/N)((N-n)/(N-1)).

An Example

A bag has 12 balls, 5 of which are red and the remaining 7 are blue. If I randomly draw 4 balls, without replacement, what is the probability of getting 3 red balls? This kind of problem can be solved using the hypergeometric formula.

P(3 red balls) = 5C3 × 7C1 / 12C4 = COMBIN(5,3)*COMBIN(7,1)/COMBIN(12,4) = 0.1414

=HYPGEOMDIST(3,4,5,12) = 0.1414
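The same calculation in Python with `math.comb`, mirroring both the formula and =HYPGEOMDIST(3,4,5,12) (the function name is an illustrative choice):

```python
from math import comb

def hypergeom_pmf(r, n, M, N):
    # P(X = r): choose r of the M type-A objects and n-r of the N-M type-B objects
    return comb(M, r) * comb(N - M, n - r) / comb(N, n)

# 12 balls, 5 red; draw 4 without replacement; P(3 red)?
print(round(hypergeom_pmf(3, 4, 5, 12), 4))   # 0.1414
```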

Note: It is important to understand that in the above example, it is critical that we pick objects without replacement. If we replace each object before picking the next, the random variable becomes a binomial variable. This is so because if we replace objects each time, we are not changing the probability of success from one trial to the next.


3.8 Sampling Distribution and the Central Limit Theorem

Whenever students hear the term “Theorem” they don't want to have anything to do with it. For some reason, the term “theorem” paints a picture of complex and abstract mathematical symbols that have little, if anything, to do with the real world. But a theorem is essentially a result based on some astute observations and analysis by some very smart people. Not all theorems are scary or involve complex abstract mathematical symbols, and some can actually be quite useful in the real world. For example, there is a theorem that says that in a right triangle, the square of the hypotenuse equals the sum of the squares of the other two sides. You can think of this as a result, based on some keen observation and analysis of a right triangle, and this result is extremely useful for architects and engineers. Sometimes I wish they didn't call these observations by such a scary name as “a theorem”.

The central limit theorem is one such theorem. It could have been called a great concept, an interesting result, or an amazing observation, so that students of statistics would embrace it rather than distance themselves from it, because the central limit theorem is, in fact, a great concept, an interesting result, and an amazing observation made by some very intelligent statisticians.

Before we get into this amazing observation which we call the Central Limit Theorem, let’s first get our feet wet with the concepts of a sampling distribution. And before we talk about sampling distributions, let us test your intuition first. Suppose there are 100 students in a class. Suppose we treat this group of 100 students as our population. Let’s find the mean height of 100 students and the standard deviation of the height of these 100 students. Let’s represent the mean as µ and the standard deviation as σ. Now let’s make ten groups of two students each selected randomly from the population and let’s find the mean height of students in each group. So we have ten means. Now let’s find the mean and the standard deviation of these ten means and call them Mean(Means2) and s(Means2) respectively. Next, suppose that instead of ten groups of two students, we make ten groups of four students each. Again we find the mean height of students in each group, this time the mean is based on four students. Again we find the mean and the standard deviations of all ten means and call them Mean(Means4) and s(Means4). Let’s repeat this with ten groups of nine students instead of four students. We


will now have the means and standard deviation of ten means and we will call them Mean(Means9) and s(Means9).

If you have not understood the situation above, please go back and reread it because now we will ask you some questions. The first question is – which of the three means of means – Mean(Means2) or Mean(Means4) or Mean(Means9) will be the closest to µ? The second question is – which of the three sample standard deviations s(Means2) or s(Means4) or s(Means9) will be the smallest? The third question is – will s(Means2) or s(Means4) or s(Means9) be roughly equal to σ or smaller than σ or larger than σ? Please think about these three questions very carefully and think about their answers before reading any further.

If you have not thought of the answers, please do so before reading further. This exercise helps a lot in understanding inferential statistics. Let's discuss each of the three questions one by one. The first question asked was - which of the three means, viz., Mean(Means2), Mean(Means4) or Mean(Means9), will be the closest to µ? If you said Mean(Means2), then you are absolutely wrong. If you said Mean(Means9), then you are absolutely correct. This question was not that hard and we are sure you got it right. Basically it is saying that the larger the sample size, the better the estimate of the population mean. So a sample size of 9 will clearly provide a better estimate of the population mean than a sample size of 2, because in a sample size of 2, what if you got unlucky and got the two tallest or the two shortest students in the group? But in a sample size of 9, it is very unlikely that you will get the 9 tallest or the 9 shortest students in your sample.

Now let’s discuss the second question. Which of the three standard deviations s(Means2) or s(Means4) or s(Means9) will be the smallest? If you said s(Means2), then you are absolutely wrong. If you said s(Means9) then you are absolutely correct. This question is a little bit harder than the first. So if you got it right, go ahead and pat yourself on your back. Again, the intuition is that the averages of two large samples will be a lot closer to each other than the averages of two small samples. The extreme values in a small sample influence the average of the sample much more than in a large sample. So the standard deviation of the means of larger samples is smaller than that of the means of smaller samples.

Now let’s discuss the third question. Will s(Means2) or s(Means4) or s(Means9) be roughly equal to σ or smaller than σ or larger than σ? This is the hardest of the three questions. Note that σ is the standard deviation of the heights of all 100 students. Note that s(Means2) is the standard deviation of the means of ten groups of students where each group has two students. Similarly s(Means9) is the standard deviation of the means of ten groups of students where each group has nine students. If you said that s(Means2), s(Means4) and s(Means9) will be smaller than σ then you are absolutely correct. Note that in the second question we have already established that s(Means2) > s(Means4) > s(Means9). Now we are saying that s(Means2), s(Means4) and s(Means9) are all smaller than σ.

In fact, there is an amazing result that says that s(Meansn) = σ/sqrt(n), where s(Meansn) is the standard deviation of the means of samples of size n. So, in our example, let’s say σ was 3 inches. Then s(Means2) would be 3/sqrt(2) or 3/1.414 or 2.12, s(Means4) would be 3/sqrt(4) = 3/2 = 1.5, and s(Means9) would be 3/sqrt(9) = 3/3 = 1. If we had samples of size 25, then s(Means25) would be 3/sqrt(25) = 3/5 = 0.60.
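As a quick sanity check, the arithmetic above can be reproduced in a few lines (the values σ = 3 and n = 2, 4, 9, 25 are taken straight from the example):

```python
import math

sigma = 3.0  # population standard deviation from the example, in inches
for n in [2, 4, 9, 25]:
    # standard deviation of the means of samples of size n
    se = sigma / math.sqrt(n)
    print(f"s(Means{n}) = {sigma}/sqrt({n}) = {se:.2f}")
```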

The concepts discussed above are not yet the Central Limit Theorem; they are about Sampling Distributions. Note that in the above discussion we were dealing with the means and the standard deviation of the means of samples of a certain size; i.e., we were imagining lots of samples (ten in our example) and looking at the mean of the means of those samples. Basically we were talking about how the means of several samples are distributed. The distribution of the means of many samples is called the sampling distribution of the means. So essentially what we have said is that the mean of the sampling distribution of the means gets closer and closer to the population mean µ as n becomes larger - this was the intuition in the first question. We also said that the standard deviation of the sampling distribution of the means decreases as the sample size increases - this was the intuition in the second question. We also said that the standard deviation of the sampling distribution of the means is smaller than the standard deviation of the population values - the intuition in the third question. And then we quantified the standard deviation of the sampling distribution of the means in terms of the standard deviation of the population using this relationship:

Stdev(means of samples of size n) = Stdev(population)/sqrt(n)
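This relationship is easy to verify by simulation. The sketch below uses a made-up normal population of heights (the mean of 65 inches is an assumption; σ = 3 inches follows the example), draws thousands of samples of each size, computes the standard deviation of their means, and compares it with σ/sqrt(n):

```python
import random
import statistics

random.seed(42)
sigma = 3.0
# Large synthetic population of heights (mean 65 inches is an assumption)
population = [random.gauss(65.0, sigma) for _ in range(100_000)]

for n in [2, 4, 9, 25]:
    # Draw many samples of size n and record each sample's mean
    means = [statistics.mean(random.sample(population, n)) for _ in range(5000)]
    observed = statistics.stdev(means)
    predicted = sigma / n ** 0.5
    print(f"n={n:2d}: observed stdev of means = {observed:.2f}, "
          f"predicted sigma/sqrt(n) = {predicted:.2f}")
```

The observed and predicted values agree closely for every sample size.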

Now let me come to the central limit theorem. The central limit theorem talks about the sampling distribution of the mean, which is why it was necessary to learn a few things about the sampling distribution of the mean first. Before we discuss the amazing observation which we call the central limit theorem, let us test your intuition again. We have talked about the mean and the standard deviation of the sampling distribution of the mean, but what about its shape? Will it be rectangular, or bell shaped, or something else? Let’s suppose the original population has a normal distribution. It is easy to guess that the shape of the sampling distribution of the means will also be normal. But what if the original population has a rectangular distribution or a triangular distribution or a skewed bell-shaped distribution or some other non-normal distribution? What will be the shape of the sampling distribution of the means? Some very intelligent statisticians have made the observation that even if the original population is not normal, the shape of the sampling distribution of the means is approximately normal, and the larger the sample size, the more closely the sampling distribution of the means resembles a normal distribution.

This amazing observation is called the Central Limit Theorem. There are no abstract complex mathematical symbols involved, and it is very useful in the real world. Why is it so useful? It is useful because we are already very familiar with the behavior of a normal distribution. We know how to compute probabilities of various values for a normal distribution. Knowing that the sample means are approximately normally distributed, we can compute the probability that a sample mean will fall within a certain range of values.

So the following is a statement of the Central Limit Theorem: The sampling distribution of the mean of a random sample drawn from any population is approximately normal for a sufficiently large sample size. The larger the sample size, the more closely the sampling distribution of the means will resemble a normal distribution. Also, the sampling distribution of the mean of a random sample drawn from a normal population is normal, even for small sample sizes.
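One way to watch the Central Limit Theorem in action, without drawing any pictures, is to measure skewness: a normal distribution has skewness 0. The sketch below makes an illustrative choice of a strongly skewed, decidedly non-normal population (exponential) and shows the skewness of the sampling distribution of the means shrinking toward 0 as the sample size grows:

```python
import random
import statistics

random.seed(7)
# A strongly skewed, decidedly non-normal population (exponential, mean 1)
population = [random.expovariate(1.0) for _ in range(50_000)]

def skewness(xs):
    """Sample skewness; roughly 0 for a normal distribution."""
    m = statistics.mean(xs)
    s = statistics.pstdev(xs)
    return sum((x - m) ** 3 for x in xs) / (len(xs) * s ** 3)

for n in [2, 10, 30]:
    means = [statistics.mean(random.sample(population, n)) for _ in range(5000)]
    print(f"n={n:2d}: skewness of the sampling distribution = {skewness(means):.2f}")
```

For an exponential population the theoretical skewness of the sample means is 2/sqrt(n), so it falls from about 1.4 at n = 2 toward 0 as n grows - the distribution of the means looks more and more normal.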

Chapter Summary
1. A random variable is a variable whose value depends upon chance.
2. A random variable can be either quantitative or qualitative. A quantitative random variable can be either discrete or continuous.
3. Three well-known discrete random variables are the binomial, Poisson and hypergeometric.
4. Four well-known continuous random variables are the uniform, normal, t and exponential.
5. For a discrete random variable, when we specify the probability of each possible outcome we obtain the probability distribution of that variable.
6. The sum of the probabilities of all possible outcomes of a discrete variable is 1.
7. The probability of any one specific value of a continuous variable is 0.
8. For a continuous random variable, the distribution of probabilities is given by a probability density function.
9. For a continuous random variable, we talk about the probability of being less than a value, greater than a value, or between two values. These probabilities are given by the area under the probability density function.
10. The total area under a probability density function is 1.
11. The normal curve is symmetric and bell shaped. It is the most commonly found distribution for many continuous random variables.
12. A standard normal curve has a mean of 0 and a standard deviation of 1.
13. Given the mean and standard deviation of a normal distribution, we can find the probability for any range of values.
14. A t-distribution is also symmetric and bell shaped, with a mean of 0; its standard deviation is slightly larger than 1 and approaches 1 as the degrees of freedom increase. To find the probability for any range of values, in addition to knowing the mean and standard deviation, we also need to know the degrees of freedom.
15. A uniform variable is a continuous variable having a uniform distribution over a range of values, say a to b. The distribution is given by a rectangle of width (b-a) and height 1/(b-a).
16. A binomial variable is a discrete random variable defined as the number of successes in n trials, where each trial has only two outcomes - success and failure - and the probability of success in each trial is p. The probability of r successes in n trials is given by P(r successes in n trials) = nCr · p^r · (1-p)^(n-r).
17. A Poisson variable is a discrete random variable defined as the number of occurrences in a given interval of time, distance, area or volume. If λ is the average number of occurrences, then P(x occurrences) = e^(-λ) · λ^x / x!
18. A hypergeometric variable is a discrete random variable defined as the number of successes in n trials without replacement from a finite population of size N containing M successes.
19. The sampling distribution of the mean is the probability distribution of the means of many samples.
20. The standard deviation of the sampling distribution of the mean is the standard deviation of the underlying population divided by the square root of the sample size.



21. The sampling distribution of the mean is normal if the underlying population distribution is normal.

22. The sampling distribution of the mean is approximately normal even if the underlying distribution is not normal, as long as the sample size is large. This is the Central Limit Theorem.
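The binomial and Poisson formulas in items 16 and 17 translate directly into code. This is a minimal sketch; the function names are just illustrative:

```python
import math

def binomial_pmf(r, n, p):
    """Item 16: P(r successes in n trials) = nCr * p^r * (1-p)^(n-r)."""
    return math.comb(n, r) * p ** r * (1 - p) ** (n - r)

def poisson_pmf(x, lam):
    """Item 17: P(x occurrences) = e^(-lambda) * lambda^x / x!."""
    return math.exp(-lam) * lam ** x / math.factorial(x)

# Probability of exactly 2 heads in 3 fair coin flips: 3C2 * 0.5^2 * 0.5 = 0.375
print(binomial_pmf(2, 3, 0.5))
# The probabilities of all possible outcomes sum to 1 (item 6)
print(sum(binomial_pmf(r, 3, 0.5) for r in range(4)))
```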
