
Chapter 6 Notes and elaborations for Math 1125-Introductory Statistics

Assignment:

Chapter 6 is also pretty good, so I’ll be following the text pretty closely. It is of the utmost importance that you are

able to find areas under the normal curve. You will need to be able to do this for the remainder of the chapters.

!Section 6.2: p.322!

The bit on “determining normality” is just plain wrong. One can’t just assume that a distribution is normal because

it looks bell shaped. There are some difficult distributions out there that look very much like a normal

distribution. Normality itself is fairly difficult to verify, and doing so properly is beyond the scope of this class.

As an interesting footnote, it is speculated by some very bright econometricians that many of our basic assumptions

of normality in financial risk are not correct. We love to assume normality because we know so much about the

normal curve. But it appears as though many types of risk are actually distributed as some very nasty distributions,

e.g., Cauchy and Lévy distributions. They can look just like normals to the untrained eye, but they make for very

unstable and unreliable estimates.

You do not have to read about the finite population correction factor at the end of 6.3.

Do the following exercises:

6.1: 1-5 all, 7-13 odd, 19, 23, 27, 31, 35, 45, 47. (It’s a little overkill, but make sure you know what to do)

6.2: 3, 5, 9, 11, any of the odds from 13 to 23 are pretty good.

6.3: 1-7 all, 9-15 odd.

Chapter 6 concerns the normal distribution, one of the more famous continuous probability distributions. This

chapter is extremely important as we will be utilizing the concepts and mathematical procedures from this chapter

throughout the rest of the course. We’ve already discussed the normal distribution once when working with the

Empirical Rule (you should review this now, it’s very important for this section). I’m sure this is not the first time

many of you have heard of this particular distribution; we teach it in high school math almost everywhere now.

Before we learn to solve problems via properties of the normal distribution (which is the adventure upon which we

are about to embark), let’s ask why the normal distribution is so important. It’s true that many things we

study/research are normally distributed random variables, but it’s also true that many things we study/research are

not normally distributed random variables. Because of the beautiful properties of the Empirical Rule, we would

probably rather work with normal random variables than with non-normal random variables (remember the

Empirical Rule is based on the normal distribution), simply because the probabilities are easy to compute. Indeed,

people have already made tables of all the probabilities we’ll ever need to use to solve the problems we’ll look at,

but more on this in a bit.

So, we’d rather work with normal distributions. Well, as it turns out, we have a

fundamental theorem that says we can essentially tease a normal distribution out of almost

any population regardless of the population distribution. This theorem is called the

Central Limit Theorem (CLT), and it is a very powerful tool (using this theorem is kinda

like wielding a Jedi lightsaber).

We will begin these notes with a general discussion of the meaning of the CLT followed

by more than you probably ever wanted to know about the normal distribution, and we

will conclude with a formal statement of the CLT. Learn this material well because we

will be using it throughout the rest of this course.


____________________________________________

6.0 Introduction to the Central Limit Theorem (CLT)

The central limit theorem is a really big and important part of statistics. To make any sense of it, you need to

recognize that an average can be looked at as a random variable, i.e., for any experiment we may devise, we don’t

know the value of the average (if it exists - there are some distributions which have no mean) until after we’ve

performed the experiment. Thus, the average is also a random variable. The CLT concerns this new random

variable.

We will start with an example so you can get a feel for it and then discuss the CLT a bit more. We withhold a

formal statement of the theorem until post-discussion of the normal distribution.

_______________________________

Example 6.0.0

Suppose you have a fair die. Let X be the value of the roll of the die. So X is a discrete random variable. We're

going to be generating many trials (a trial is a roll) of this random variable, so we need to construct some notation.

If you roll the die 100 times, you can label each roll of the die like this:

X1 = whatever you roll the first time

X2 = whatever you roll the second time

X3 = whatever you roll the third time

...

X100 = whatever you roll the 100th time

What if we take the average of all of these values? (We will be doing this a lot for the rest of this class.) Realize

that the average is again a random variable. Let’s give this random variable a name.

Let Y = the average of the random variables X1 up to and including X100

= (X1 + X2 + ... + X100)/100 .

Then Y is a random variable. That is, we don't know what it is equal to unless we know the values of X1 to X100.

As with any random variable, however, we can consider all theoretical possibilities to obtain the distribution table

of Y (this is rather complicated and better left to computers). But let’s go through a sketch of the idea.

The rolls of a fair die are independent of each other, and each roll of the die is a random variable just like all the rest

of the rolls. So we give these “trials” a statistical name. We say the random variables X1 to X100 are independent

and identically distributed. Think about this: the process of rolling a die does not change from roll to roll and each

roll is independent of every other, i.e., the rolls are 100 independent, identical experiments. So, it should make

intuitive sense that they have identical distributions. We usually just abbreviate this to “iid.”

In this example, I would write: X1 to X100 are iid random variables, distributed like X (the value of a roll of a die).

And the random variable Y is the average as defined above.

You already know what the distribution of X looks like. Recall its distribution table is:

X      1     2     3     4     5     6

P(X)   1/6   1/6   1/6   1/6   1/6   1/6

[Figures, at right: Probability Histogram for X1 and Probability Histogram for Y1,2]

Its probability histogram looks flat on top (it’s actually what we

call a uniform random variable, but you do not need to know

this).

But when we start looking at averages of random variables identical to X, the distribution changes. For example,

if we let Y1,2 be the average of X1 and X2, we get the following distribution table:

Y1,2 1 1.5 2 2.5 3 3.5 4 4.5 5 5.5 6

P(Y1,2) 1/36 2/36 3/36 4/36 5/36 6/36 5/36 4/36 3/36 2/36 1/36

It would be a good exercise for you to try and recreate the previous table. (Hint: make a table like we did for sums

of two dice but fill it with averages instead.)
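If you want to check your work on that exercise, here is a minimal optional sketch in Python (my addition, not part of these notes; it uses only the Python 3 standard library). It lists all 36 equally likely pairs of rolls and tallies their averages.

from fractions import Fraction
from itertools import product
from collections import Counter

# Tally the average of every equally likely (first roll, second roll) pair.
counts = Counter(Fraction(x1 + x2, 2) for x1, x2 in product(range(1, 7), repeat=2))

for avg in sorted(counts):
    print(f"Y12 = {float(avg):<4}: probability {counts[avg]}/36")

The printout should match the distribution table above exactly.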

The probability histogram for Y1,2 is on the right. Notice that

the histogram is no longer flat on top. It is closer to bell-

shaped. If we looked at the probability histogram of the

average of X1, X2, and X3, we’d see an even better

approximation of a bell curve.

What do you think the probability histogram for the average of

50 rolls looks like? What do you think the probability

histogram for Y looks like? If you’re guessing that the

distribution of the average appears more and more normal (to

be discussed next) as the number of rolls increases, you’re

correct. This is the CLT at work.

_______________________________
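Before we generalize, here is an optional simulation sketch of Example 6.0.0 (again my addition, assuming Python with the numpy package installed). It repeats the 100-roll experiment many times and looks at how the resulting averages Y behave.

import numpy as np

rng = np.random.default_rng(seed=0)

# 10,000 repetitions of the experiment "roll a fair die 100 times".
rolls = rng.integers(1, 7, size=(10_000, 100))
Y = rolls.mean(axis=1)  # one average per repetition

print("mean of the averages:", round(Y.mean(), 3))   # close to E(X) = 3.5
print("SD of the averages:  ", round(Y.std(), 3))    # close to 1.708/sqrt(100) ≈ 0.171
# Empirical Rule check: if Y is roughly normal, about 68% of the averages
# should land within one SD of their mean.
print("within 1 SD:", round(np.mean(np.abs(Y - Y.mean()) <= Y.std()), 3))

A histogram of these averages would look very close to a bell curve, which is exactly the point of the example.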

Now, we generalize to n iid random variables. Suppose you have a “good” random variable, X (most random

variables you would encounter in the real world are “good.”). It doesn't really matter if it is discrete or continuous,

but you might find it easier to think of it as being discrete. Suppose we have n trials of this random variable. Let

Y be the average of all n iid random variables, X1 up to Xn. Symbolically, this looks like

Y = (X1 + X2 + ... + Xn)/n .

Then, the distribution of the random variable Y is almost normal provided n is large enough. Got it? That is the

central limit theorem. This theorem will pervade almost everything we do in the second half of this class.

The central limit theorem is exactly the reason why we study the normal distribution. Any random variable that is


really the right type of sum of other random variables tends to look like a normal distribution.

We will, in general, assume that when the sample size is larger than 30, the sample average is distributed like

a normal random variable via the CLT. It is simply for convenience. In many situations it isn’t a safe assumption,

and in some others 30 is complete overkill. We assume this for the sake of learning the associated concepts.

Here are some fairly cool simulators for more practice with the CLT. If we had actual class time, I’d probably walk

you through a similar lab, so please take the time to check these out and see what’s going on. Email or post any

questions.

http://www.stat.sc.edu/~west/javahtml/CLT.html

http://onlinestatbook.com/stat_sim/sampling_dist/index.html

http://elonen.iki.fi/articles/centrallimit/index.en.html#demo

___________________

6.1 The Normal Curve

To really own the normal curve, we should begin with a normal random variable.

Definition 6.1.0 A normal random variable is a continuous random variable whose probability distribution is

normal.

So, we need to know what it means for a distribution to be normal. It is no simple feat to properly define this

without the use of Calculus, but I will try and give you some intuition. We’ve studied the binomial distribution

already, so let’s briefly go over some relevant aspects of what we studied to see if we can make some analogies to

normal distributions. To that end, let’s recall the picture of the binomial distribution of Example 5.2.0:

Binomial(4, 1/5)

This is a picture of the probability distribution of a binomial random variable X with n = 4 and p = 1/5. The vertical

bars represent probabilities. For example, the probability that X is 0 is about 0.41, and the probability that X is 2

is about 0.15 (note that there is positive probability that X is 4, but it is so very close to 0 that the bar doesn’t show).

The fixed values n and p are parameters of the distribution of X. The parameters determine what the picture looks

like. We write X ~ Binomial(n, p), which we read “X is distributed binomial with parameters n and p.” In Example


5.2.0, X ~ Binomial(4, 1/5).

For example, if the quiz has 5 questions instead of 4, then X ~ Binomial(5, 1/5) and the picture looks as follows:

Binomial(5, 1/5)

Again, there are positive probabilities on 4 and 5, but they are so close to 0 that they do not show.

If the quiz has 10 questions, X ~ Binomial(10, 1/5) and the picture looks like this:

Binomial(10, 1/5)

And again, there are positive probabilities on 6, 7, 8, 9, and 10, but they are so close to zero that they do not show.

Try calculating them to see.
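If you want to try calculating them with a little help, here is a small optional Python snippet (mine, not the book's) that uses the binomial probability formula from Chapter 5 to compute the "invisible" bars of the Binomial(10, 1/5) picture.

from math import comb

n, p = 10, 1/5
for x in range(6, 11):
    # P(X = x) = C(n, x) * p^x * (1 - p)^(n - x)
    prob = comb(n, x) * p**x * (1 - p)**(n - x)
    print(f"P(X = {x}) = {prob:.7f}")   # positive, but far too small to see in the picture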


In the following example, we give you some pictures of the probability distributions of binomial random variables

with different parameters.

________________________________

Example 6.1.0

Binomial(40, 1/10) Binomial(40, 1/2)

Binomial(6, 1/3) Binomial(20, 1/100)

________________________________

The binomial distribution is a discrete probability distribution, i.e., a binomial random variable can take on a

countable number of values. Namely, a binomial random variable can take on the value of any integer between and

including 0 and n. As such, we can talk about the probability that X is equal to some value as well as the probability

that X is less than or equal to some value and the probability that X is greater than or equal to some value. Equation

5.2.0 is the general equation that gives the probability that X is equal to some value x; it describes all the above

pictures.

We are going to approach the normal distribution in much the same manner, but the normal distribution is a

continuous probability distribution, i.e., a normal random variable can take on an uncountable number of values.

Namely, a normal random variable can be any real number (you remember your real number line, right?). The

pictures of the probability distributions of normal random variables are smooth, bell-shaped curves described by

the following function (you do not have to know this function, just that it exists, i.e., we’ll be sketching lots of

these curves without ever mentioning the word function or the following expression ever again).


For x a real number, define a function f by

Equation 6.1.0 f(x) = (1/(σ√(2π))) e^(−(x − μ)²/(2σ²)) ,

where μ is a fixed real number, σ² is a fixed positive real number, and x is the variable that represents the realization

of a normal random variable X. The graph of Equation 6.1.0 is a picture of the probability distribution of X. Note

that μ and σ² are parameters of the distribution of X just as n and p are parameters of the distribution of a binomial

random variable. We generally call the graph of Equation 6.1.0 the normal curve.

The parameters of the normal distribution are the mean and variance of the normal random variable, i.e., for X a

normal random variable (so the picture of X’s probability distribution is given by the graph of Equation 6.1.0), we

have

Equation 6.1.1 E(X) = μ ,

and

Equation 6.1.2 V(X) = σ².

We write X ~ N(μ, σ²), which we read “X is distributed normal with mean mu and variance sigma-squared.”

Following are some pictures of normal curves with different parameters. Again, it is important to note that the total

area under each curve is 1. The graph gets closer and closer to the x-axis as x moves farther above or below the mean,

but it never actually touches it.
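Here is an optional numerical check (my own sketch, assuming Python with numpy) that the area under one of these curves really is 1. It evaluates the function in Equation 6.1.0 for the N(-6.2, 2) curve that appears first in Example 6.1.1 below, on a fine grid, and adds up the little rectangles.

import numpy as np

mu, var = -6.2, 2.0
sigma = np.sqrt(var)

def f(x):
    # Equation 6.1.0: the normal curve with mean mu and variance var
    return np.exp(-(x - mu) ** 2 / (2 * var)) / (sigma * np.sqrt(2 * np.pi))

dx = 0.001
x = np.arange(mu - 10 * sigma, mu + 10 * sigma, dx)
print(f(x).sum() * dx)   # a Riemann-sum approximation of the total area; prints about 1.0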

________________________________

Example 6.1.1

N(-6.2, 2) N(-6.2, 1/2)


N(π, 3) N(π, 1/3)

N(0, 6) N(0, 1/6)

________________________________

Hopefully you’re wondering how we use these pictures to find probabilities. Well, probabilities are given by areas

under the curves. For those of you versed in the Calculus, we’re talking integrals here (you do not need to know

this). The normal curve is symmetric about its mean, which means that if you take the graph of the normal curve,

and fold it along the line x = μ , then the two halves of the graph coincide perfectly. This is very nice for us, as

you’ll see throughout the rest of the course.

The following example gives you the idea of how we do it.

________________________________

Example 6.1.2

Let’s suppose X is a normal random variable with mean -6.2 and variance 2, i.e., X ~ N(-6.2, 2). So, the picture of

the probability distribution is the first picture in Example 6.1.1. I’m going to rescale this picture a bit for illustration

purposes.


Here’s the picture:

N(-6.2, 2)

Now, let’s ask some probability questions and see what we come up with for answers.

1. What is P(X ≤ -6.2)? P(X ≥ -6.2)?

Remember that this normal curve is symmetric about -6.2. Let’s investigate what this means with a picture.

To solve the types of problems we will be considering, we don’t really care about the values on the y-axis.

In fact, we usually don’t even draw the y-axis, but I’m too lazy to figure out how to entirely remove it from

my pictures, so it remains. Sorry.

Consider the following picture:

N(-6.2, 2)

The dashed line at the mean divides the area under this curve in half (because it’s symmetric!). That is, 50%

of the area under the curve lies to the left of the mean and 50% of the area under the curve lies to the right

of the mean. Consider the following pictures:


N(-6.2, 2) N(-6.2, 2)

In the picture on the left, the shaded area is the area under the curve to the left of the mean. In the picture

on the right, the shaded area is the area under the curve to the right of the mean. Recall that we’ve said a

few times now that probability is synonymous with area under the curve. So, let’s now reconsider the

questions.

We want to find P(X ≤ -6.2) and P(X ≥ -6.2). Let’s consider what outcomes satisfy the event X ≤ -6.2.

Well, any x that is smaller than or equal to -6.2 satisfies the event. So, we find P(X ≤ -6.2) by finding the

area under the curve to the left of -6.2. But we already know from above that this area is 50%. So, we have

P(X ≤ -6.2) = 0.5.

Similarly, we get

P(X ≥ -6.2) = 0.5.

You should note that P(X ≤ -6.2) + P(X ≥ -6.2) = 1.

2. What is P(X ≤ -6.2 + √2)? P(X ≤ -6.2 - √2)? P(X ≥ -6.2 + 2√2)? P(X ≥ -6.2 - 3√2)?

This probably looks scary, but it’s not so bad. We know that X ~ N(-6.2, 2), so the variance of X is 2, which

means the standard deviation of X is √2.

Now we just need to use the Empirical Rule.

The picture at the right is from the notes for

chapters 2 and 3. One more time, go back and

reread about the Empirical Rule if you have not

already done so. We’re going to draw a

corresponding picture of our distribution and

then it will be easy to answer the questions.


The figure to the right is the corresponding picture

for our distribution. Figure out where the numbers

came from!

Now we just use the areas given by the Empirical

Rule to answer our questions. Why? The areas are

probabilities! Here we go.

The first question was P(X ≤ -6.2 + √2). Notice

that we’re looking for the probability that X is to

the left of -6.2 + √2 = μ + σ. So we need the area

under the curve to the left of μ + σ. We know that

half the area under the curve lies to the left of the

mean, so we need to add to that the area under the

curve between μ and μ + σ. This is about 34.13%

according to the Empirical rule, so we have

P(X ≤ -6.2 + √2) ≈ 0.5 + 0.3413 = 0.8413.

Check out the picture on the right.

Similarly, we get the following results:

P(X ≤ -6.2 - √2) ≈ 0.5 - 0.3413 = 0.1587,

P(X ≥ -6.2 + 2√2) ≈ 1 - (0.5 + 0.3413 + 0.1359) = 0.0228, and

P(X ≥ -6.2 - 3√2) ≈ 1 - 0.0015 = 0.9985.

Here are the corresponding pictures.

We want the area to the left of μ-σ. We want the area to the right of μ+2σ. We want the area to the right of μ-3σ.

N(-6.2, 2)

We want the area to the left of μ+σ.


3. What is P(-6.2 - 3√2 ≤ X ≤ -6.2 - √2)? P(-6.2 - √2 ≤ X ≤ -6.2 + √2)?

For these types of problems, we’re looking for “chunks” of area, i.e., we’re looking for the area under the

curve between two numbers instead of area to the left of or to the right of a single number.

We want the area between μ-3σ and μ-σ. We want the area between μ-σ and μ+σ.

Using the empirical rule and our sketches above as guides, we get

P(-6.2 - 3√2 ≤ X ≤ -6.2 - √2) ≈ 0.5 - 0.3413 - 0.0015 = 0.1572, and

P(-6.2 - √2 ≤ X ≤ -6.2 + √2) ≈ 0.3413 × 2 = 0.6826.

4. What is P(-4 ≤ X ≤ 0)?

This question is not so easy. We cannot get the numbers -4 and 0 by adding or subtracting multiples of the

standard deviation to the mean. Luckily enough, people have already calculated entire tables of all the

probabilities we’ll ever need. However, these tables are based on the standard normal distribution, so we

must learn this before we can find P(-4 ≤ X ≤ 0) (by using the table). We will provide an answer to this

question in section 6.3.

________________________________

___________________

6.2 The Standard Normal Curve

The standard normal distribution is a normal distribution with mean of 0 and standard deviation of 1, i.e., if X is a

standard normal random variable, then X ~ N(0, 1). The standard normal curve (the picture/graph of the distribution)

has some very nice properties that we can utilize when we know we’re working with normally distributed variables,


since we can “standardize” any normally distributed random variable. That is, any normal random variable can

be turned into a standard normal random variable via a process we call standardization. And we like to do this so

we can use the table (next section).

We can think of the standardization of a normal random variable as a sort-of z-score transformation of the random

variable. Recall that to find the z-score of a data point, we first subtracted the mean from the data point and then

divided by the standard deviation. The formula was

z = (value − mean)/(standard deviation) .

Now suppose X ~ N(μ, σ²). If we apply the same process as in z-score transformations, we get a new random

variable, let’s call it Z. So, we find Z by subtracting the mean from X and then dividing by the standard deviation

as follows:

Z = (X − μ)/σ .

Then we have that Z ~ N(0, 1). We have standardized X to get the standard normal random variable Z.
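Here is an optional simulation sketch of that claim (mine, not the book's; it assumes Python with numpy): draw a large sample of values of a normal random variable, standardize each one, and check that the standardized values behave like N(0, 1).

import numpy as np

rng = np.random.default_rng(seed=1)
mu, sigma = -6.2, np.sqrt(2)      # the X ~ N(-6.2, 2) from Example 6.1.2

X = rng.normal(loc=mu, scale=sigma, size=100_000)
Z = (X - mu) / sigma              # the standardization step

print("mean of Z:", round(Z.mean(), 3))        # about 0
print("SD of Z:  ", round(Z.std(), 3))         # about 1
print("P(Z <= 1):", round(np.mean(Z <= 1), 3)) # about 0.8413, as the Empirical Rule says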

The picture of the probability distribution of a standard normal random variable is the graph of the function defined

by

Equation 6.1.3 f(x) = (1/√(2π)) e^(−x²/2) .

Notice this is just Equation 6.1.0 with μ = 0 and σ = 1.

Here’s a picture of the standard normal curve. Again, the area under this curve is 1.

When we apply the Empirical Rule to the standard normal curve, we get the following very nice picture. But,


you’ve seen this before because you reread the section on the Empirical Rule, right?

Now, we are not quite ready to completely answer question 4 of Example 6.1.2, but we can gather some more

information for it. Remember that I said people have made an entire table of all the probabilities we’ll ever need,

but they are based on the standard normal distribution? So, we need to convert P(-4 ≤ X ≤ 0) into a probability

involving a standard normal random variable. Then, in the next section, we’ll look up what we need to in the table

in order to give a final, complete answer.

________________________________

Example 6.2.0

Recall the formula to standardize a normal random variable: Z = (X − μ)/σ .

In Example 6.1.2, X ~ N(-6.2, 2). So, we have μ = -6.2 and σ = √2, and we get that Z = (X + 6.2)/√2 and Z ~ N(0, 1).

Consider that the statement -4 ≤ X ≤ 0 is equivalent to the statement

(-4 + 6.2)/√2 ≤ Z ≤ (0 + 6.2)/√2 .

Since we have

(-4 + 6.2)/√2 ≈ 1.56 and (0 + 6.2)/√2 ≈ 4.38 ,

we thus have

P(-4 ≤ X ≤ 0) ≈ P(1.56 ≤ Z ≤ 4.38).

Now we just need to find P(1.56 ≤ Z ≤ 4.38) from our handy-dandy table.

________________________________


________________________________________

6.3 Using Table E (pgs. 782-3 in your book)

Table E in your book is a table of probabilities of the standard normal distribution, i.e., it is a table of areas under

the standard normal curve. Note that this is a complete table, but since the distribution is symmetric, we could (and

often do) use only the table on pg. 783. So keep this in mind if you’re using a different table.

Reading the table may seem tricky, but it’s really not. The data values in the table are areas under the curve, i.e.,

they are probabilities, and the numbers on the left-hand side and along the top are the standardized scores (if the

standardized score is 1.36, go to the row 1.3 and over to the column 0.06). Let’s do some examples. We begin with

the completion of Example 6.2.0.

________________________________

Example 6.3.0

Recall Example 6.2.0. We need to find P(1.56 ≤ Z ≤ 4.38). Let’s begin by looking up the relevant numbers in our

table. The number in the table that corresponds to 1.56 is 0.9406. To find this, use the table on pg. 783. In the left-

hand column, go down to the row 1.5, and then go across to the column 0.06. The number you come to is 0.9406.

This means 94.06% of the area under the standard normal curve lies to the left of 1.56. Here’s a picture:

The number that corresponds to 4.38 is 0.9999. The table ends at 3.49, and it is common practice to take the

corresponding area for any number higher than 3.49 to be 0.9999. Here’s a picture:


So, to find P(1.56 ≤ Z ≤ 4.38), we need the area between 1.56 and 4.38. Here’s a picture:

We want the shaded area. We get that

P(1.56 ≤ Z ≤ 4.38) ≈ 0.9999 - 0.9406 = 0.0593.

So, we can finally answer the original question:

P(-4 ≤ X ≤ 0) ≈ 0.0593.

________________________________

It would be a good exercise for you to check the answers to the first three questions of Example 6.1.2 by

standardizing and using the table.
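If you would rather check those answers with software instead of the table, here is an optional sketch (my addition; it assumes Python with the scipy package, which is not required for this course). scipy's norm.cdf plays the role of Table E: it returns the area under the standard normal curve to the left of a value.

from math import sqrt
from scipy.stats import norm

mu, sigma = -6.2, sqrt(2)         # X ~ N(-6.2, 2) from Example 6.1.2

def area_left_of(x):
    # standardize, then look up the left-tail area (what Table E gives)
    return norm.cdf((x - mu) / sigma)

print(area_left_of(mu))                                          # question 1: 0.5
print(area_left_of(mu + sigma))                                  # question 2: about 0.8413
print(area_left_of(mu - sigma))                                  #             about 0.1587
print(1 - area_left_of(mu + 2 * sigma))                          #             about 0.0228
print(1 - area_left_of(mu - 3 * sigma))                          #             about 0.9987
print(area_left_of(mu - sigma) - area_left_of(mu - 3 * sigma))   # question 3: about 0.1573
print(area_left_of(mu + sigma) - area_left_of(mu - sigma))       #             about 0.6827
print(area_left_of(0) - area_left_of(-4))                        # question 4: about 0.0599

The small differences from the Empirical-Rule answers (0.9985, 0.1572, 0.6826) and from the table answer 0.0593 are just rounding; the table only carries standardized scores to two decimal places.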

Here’s a more general idea of how we will apply everything previously discussed in these notes.

________________________________

Example 6.3.1

Consider a random sample (n = 500) from a normal population with mean 500 and SD 100 (GRE scores).

(a) What percentage of the data points have values less than 380? This question could also be phrased as, if we

chose a data value at random, what’s the probability that it will be less than 380? In probability notation, this is

P(X < 380) = probability that a randomly chosen data value is less than 380,

where X ~ N(500, 100²).

To find this, we first standardize the value 380. In doing so, we get that the statement X < 380 is equivalent

to the statement


Z < (380 - 500)/100 = -1.20 .

So, we have that P(X < 380) ≈ P(Z < -1.20) and it remains to find the area under the standard normal curve

to the left of -1.2. On pg. 782, go down the z-column until you come to -1.2. Since the second decimal

place of our standardized score is 0, we use the number in the 0.00 column. So, we are in the -1.2 row and

the 0.00 column. The number there is 0.1151. This means that 11.51% of the area under the curve lies to

the left of -1.20 (the shaded area below). Here is a picture of the situation. Always sketch a

picture.

Standardized curve Original curve

Thus, we see that P(X < 380) ≈ 11.51%. Can you find P(X > 380)? How about P(380 < X < 500)?

(b) What percentage of data points have values between 375 and 735? or if we chose a data value at random,

what’s the probability that it will be between 375 and 735?

First, we need the corresponding standardized scores. The standardized score of 375 is -1.25 (375 is 1.25

SD’s below the mean), and the standardized score of 735 is 2.35 (735 is 2.35 SD’s above the mean).

Calculate these to check.

So we must find the total area under the standard normal curve between -1.25 and 2.35. Using the table, we

see that the number in the -1.2 row, 0.05 column is 0.1056. So, 10.56% of the area under the curve falls to

the left of -1.25. Now, we go to the 2.3 row, 0.05 column, and we see that 99.06% of the area under the

curve falls below 2.35. Here’s a picture of the situation. We want the shaded area.

Standardized curve Original curve

Do you see how to get the values for the shaded area? If not, figure it out or post discussions.


Subtracting, 0.9906 - 0.1056 = 0.8850, so 88.5% of the data values are between 375 and 735. With probability, this is

stated

P(375 < X < 735) ≈ 88.5%.

(c) Say our sample is a sample of 500 people who took the GRE. Say your university accepts people if they scored

at least 672 points on the GRE. About how many people from this sample would your university accept?

First, the standardized score of 672 is 1.72. Using the table in your book, we see that 95.73% of the data

lies to the left of 672. We want the percentage scoring above 672. Here is a picture of the situation. We

want the shaded area. Do you see what to do?

From this picture, we see that 4.27% of the area is to the right of 672, i.e., P(X > 672) ≈ 4.27%, which tells

us that approximately 4.27% of our sample will be accepted to the university. Since our sample size is 500,

we multiply 500 by 0.0427 to get that about 21 people from this sample would be accepted to your

university.
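Here is the same kind of optional software check (again my addition, assuming scipy is installed) for parts (a)-(c):

from scipy.stats import norm

X = norm(loc=500, scale=100)            # X ~ N(500, 100²)

print(X.cdf(380))                       # (a) P(X < 380) is about 0.1151
print(X.cdf(735) - X.cdf(375))          # (b) P(375 < X < 735) is about 0.885
p = 1 - X.cdf(672)                      # (c) P(X > 672) is about 0.0427
print(p, 500 * p)                       #     roughly 21 of the 500 people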

________________________________

Some very important standardized scores for us will be ± 1.645, ± 1.96, ± 2.33, and ± 2.575. Look at the

corresponding areas/probabilities and see if you can figure out why. (These will provide for good quiz questions.)
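One optional way to figure out why (a quick sketch, assuming scipy) is to compute the right-tail area that each score cuts off:

from scipy.stats import norm

for z in (1.645, 1.96, 2.33, 2.575):
    print(f"P(Z > {z}) = {1 - norm.cdf(z):.4f}")   # about 0.05, 0.025, 0.01, 0.005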

________________________________________

6.4 Formal statement of the CLT

The CLT (central limit theorem)

Let X be a random variable with expected value μ and standard deviation σ. Let X1 to Xn be independent and

identically distributed trials of X. Let Y be the average of X1 to Xn. Then, for most probability distributions, Y is

approximately a normal random variable. The bigger n is, the closer to being normal Y will be. The expected value

of Y is μ and the standard deviation of Y is σ/√n, i.e., Y ~ N(μ, σ²/n).
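To close, here is one more optional simulation sketch of the theorem as stated above (my addition, assuming Python with numpy, and using an exponential population chosen just for illustration): even though the population is clearly not normal, the averages have mean about μ and standard deviation about σ/√n.

import numpy as np

rng = np.random.default_rng(seed=2)
mu, sigma, n = 1.0, 1.0, 50        # an Exponential(1) population has mu = sigma = 1

samples = rng.exponential(scale=mu, size=(20_000, n))
Y = samples.mean(axis=1)           # 20,000 sample averages

print("mean of Y:", round(Y.mean(), 4))    # about mu = 1
print("SD of Y:  ", round(Y.std(), 4))     # about sigma / sqrt(50), roughly 0.1414
print("within 1 SD of mu:",
      round(np.mean(np.abs(Y - mu) <= sigma / np.sqrt(n)), 3))  # about 0.68 if Y is roughly normal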

We are now ready to begin the study of inferential statistics.