Normal Dist1 Continuous Probability Distributions: The Normal Distribution

Post on 20-Dec-2015


Continuous Probability Distributions:

The Normal Distribution


Towards the Meaning of Continuous Probability Distribution Functions:

When we introduced probabilities, we spoke of discrete events:

S = collection of all possible sample points $e_i$

$0 \le P(e_i) \le 1$: the probability of any event is between zero and one

$\sum_i P(e_i) = 1$: the probabilities of all elementary events sum to 1 (something happens)


In particular, for the binomial distribution:

For the random variable X:

• x stands for a particular value

The probability that the random variable X takes the value x is between 0 and 1, inclusive:

$0 \le P[X = x] \le 1$

The sum of the probabilities over all possible values of x is 1:

$\sum_{\text{all } x} P[X = x] = 1$
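As a quick numerical check of these two properties, the sketch below builds the binomial probabilities for an illustrative choice of n = 10 trials and success probability p = 0.3. Both values, and the helper name `binomial_pmf`, are assumptions made for this example and are not taken from the slides.

```python
from math import comb

def binomial_pmf(x, n, p):
    """P[X = x] for a binomial random variable with n trials and success probability p."""
    return comb(n, x) * p**x * (1 - p) ** (n - x)

n, p = 10, 0.3  # illustrative values (assumed for this example)
probs = [binomial_pmf(x, n, p) for x in range(n + 1)]

print(all(0 <= pr <= 1 for pr in probs))  # is every probability between 0 and 1?
print(sum(probs))                         # the probabilities should sum to 1 (up to rounding)
```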


A continuous variable has infinitely many possible values:

With infinitely many possible values, the probability of observing any one particular value is zero:

Pr(X = x) = 0, e.g., for x = 1.0 vs 1.02 vs 1.0195 vs 1.01947, …

Because Pr(X = x) is always zero for a continuous random variable, it tells us nothing useful. Instead, we consider a range of values for X:

$\Pr(a \le X \le b)$

We can make this range quite broad or very narrow


Comparing Probability Distributions for Discrete vs Continuous Random Variables

Discrete: List all possible sample points, e.g., S = {$e_i$}, i = 1 to k.

Continuous: State the range of possible values of X, e.g., $-\infty$ to $\infty$, or 0 to $\infty$.

We need new notation to describe probability distributions for continuous variables.

Note: $\infty$ is the symbol for 'infinity'.


For a continuous Random Variable, X,

• P(X=x) = 0

• Instead, we compute the probability of X within some interval:

$P[a \le X \le b] = \int_a^b f_X(x)\,dx$

The function $f_X(x)$ is the probability density function of X.

Don’t worry – if you don’t know or have forgotten calculus, I won’t be asking you to work with this notation.


Much of statistical inference is based upon a particular choice of probability density function, $f_X(x)$: the Normal distribution.

• This function is a mathematical model describing one particular pattern of variation of values.

• It is appropriate for continuous variables only.


Practically speaking, the normal distribution function is appropriate for:

• Many phenomena that occur naturally.

• Special cases of other phenomena, e.g., averages of phenomena that, individually, are not normally distributed.

For example, the sampling distribution of means may follow a normal distribution even when the underlying data do not.


The Normal Probability Density Function

$f_X(x) = \frac{1}{\sigma\sqrt{2\pi}}\, e^{-\frac{(x-\mu)^2}{2\sigma^2}}$

Features to note:

The range of X is $-\infty$ to $\infty$

$\pi$ is the mathematical constant 3.14159…

e is the mathematical constant 2.71828…


The Normal Probability Density Function

$f_X(x) = \frac{1}{\sigma\sqrt{2\pi}}\, e^{-\frac{(x-\mu)^2}{2\sigma^2}}$

Features to note:

$\mu$ is the mean of the distribution

$\sigma$ is the standard deviation of the distribution

$\sigma^2$ is the variance

$(x - \mu)^2$, the squared deviation from the mean, appears in the function
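To make the formula concrete, here is a minimal sketch that codes the density by hand and compares it with Python's standard-library `statistics.NormalDist`. The function name `normal_pdf` and the values mean 100 and standard deviation 15 are illustrative assumptions for this example.

```python
from math import exp, pi, sqrt
from statistics import NormalDist

def normal_pdf(x, mu, sigma):
    """Normal density with mean mu and standard deviation sigma."""
    return exp(-((x - mu) ** 2) / (2 * sigma**2)) / (sigma * sqrt(2 * pi))

mu, sigma = 100, 15  # illustrative values (assumed for this example)
for x in (70, 100, 130):
    # hand-coded formula next to the standard library's value
    print(x, normal_pdf(x, mu, sigma), NormalDist(mu, sigma).pdf(x))
```

Because only the squared deviation from the mean enters the formula, the values printed for 70 and 130 (each 30 units from the mean) should agree.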


Notation:

$X \sim N(\mu, \sigma^2)$

We say

“X follows a Normal Distribution with mean $\mu$ and variance $\sigma^2$”

or

“X is Normally distributed with mean $\mu$ and variance $\sigma^2$”


A Picture of the Normal Distribution

The infamous “Bell-shaped Curve”

[Figure: bell-shaped curve of $f_X(x)$ plotted against x]


There are infinitely many normal distributions, each determined by different values of $\mu$ and $\sigma^2$.

The Shape of the Normal Distribution is characteristically

• Smooth

• Defined everywhere on the real axis

• Bell-shaped

• Symmetric about the mean $\mu$ (it is defined in terms of deviations about the mean)


The area under the curve represents probability, and the total area under the curve = 1

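[Figure: the normal density $f_X(x)$ plotted against x]

$\Pr[-\infty < X < \infty] = \int_{-\infty}^{\infty} \frac{1}{\sigma\sqrt{2\pi}}\, e^{-\frac{(x-\mu)^2}{2\sigma^2}}\,dx = 1$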


[Figure: normal curve with the area to the left of x shaded, labeled Pr[X < x]]

The area under the curve up to the value x is often represented by the notation:

$\Phi(x) = \Pr[X \le x] = \Pr[X < x]$
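This cumulative area has no simple closed form, but it can be written in terms of the error function, which most languages provide. The sketch below is one way to compute it; the function name `normal_cdf` is an assumption for this example, and the result is compared with the standard library's `NormalDist.cdf`.

```python
from math import erf, sqrt
from statistics import NormalDist

def normal_cdf(x, mu=0.0, sigma=1.0):
    """Area under the normal curve to the left of x."""
    return 0.5 * (1 + erf((x - mu) / (sigma * sqrt(2))))

print(normal_cdf(0.0))                          # area below the mean of a standard normal
print(normal_cdf(1.0), NormalDist().cdf(1.0))   # hand-coded value next to the library value
```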


A Feeling for the Shape of the Normal distribution:

$\mu$ locates the center, and

$\sigma$ measures the spread


If $\mu$ alone is changed by adding a constant c,

• the entire curve is shifted in location

• but the shape remains the same.


If $\sigma$ alone is changed by multiplying by a constant c,

• the shape of the bell is changed

• a larger variance implies a wider spread (a flatter curve), because the area under the curve is always 1


Picturing the Normal Probability Density

As the variance, $\sigma^2$, increases:

• Bell flattens (gets wider)

• Values close to the mean are less likely

• Values farther from the mean are more likely.

As the variance decreases:

• Bell narrows

• Most values are close to the mean

• Values close to the mean are more likely


A Very Handy Rough Rule of Thumb:

If X follows a Normal Distribution

Then: ~68% of the values of X are in the interval $\mu \pm \sigma$


If X follows a Normal Distribution

Then: ~95% of the values of X are in the interval $\mu \pm 1.96\sigma$

~99% of the values of X are in the interval $\mu \pm 2.576\sigma$
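These rules of thumb can be checked numerically. The sketch below uses the standard normal from Python's `statistics` module; the helper name `central_area` is an assumption for this example. The same fractions hold for any normal distribution, since the intervals are stated in multiples of the standard deviation.

```python
from statistics import NormalDist

Z = NormalDist()  # standard normal: mean 0, standard deviation 1

def central_area(k):
    """Probability that a normal value falls within k standard deviations of the mean."""
    return Z.cdf(k) - Z.cdf(-k)

for k in (1, 1.96, 2.576):
    print(k, round(central_area(k), 4))
```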


Why is the Normal Distribution So Important?

There are two types of data that follow a normal distribution:

1. A number of naturally occurring phenomena:

For example:

• heights of men (or women)

• total blood cholesterol of adults

2. Special functions of some non-normally distributed phenomena, in particular sums and averages:

The sampling distribution of sample means tends to be ~ Normal.


Research often focuses on sample means

Example: Blood pressure can vary with time of day, stress, food, illness, etc. One reading may not be a good representation of the “typical” value.

The distribution of a single reading of blood pressure for an individual tends to be skewed, with a few high values.


To have a better gauge of an individual’s BP, we might use the average of 5 readings.

The sampling distribution of the mean of 5 readings for an individual tends to be approximately Normal, even when the original distribution is not.


A Feeling for the Central Limit Theorem.

• Shake a pair of dice.

• On each roll, note the total of the two faces.

• This total can range from 2 to 12.

• The most likely total is 7. (Why?)

• How often do the other totals arise?

[Figure: histogram of dice totals (2 to 12) for n=100 rolls of a pair of dice]


[Figure: histogram of dice totals (2 to 12) for n=1000 rolls of a pair of dice]

As the number of rolls n increases, the histogram of the sum of the 2 dice settles into a smooth, symmetric shape peaked at 7 that begins to look roughly normal.
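The dice experiment is easy to reproduce. The sketch below first counts how many of the 36 equally likely pairs give each total, which answers the "Why?" for the total 7, and then simulates 1000 rolls. The seed value is an arbitrary assumption, used only so that a run can be repeated.

```python
import random
from collections import Counter
from itertools import product

# Exact distribution: count the ways each total can arise from the 36 equally likely pairs.
ways = Counter(a + b for a, b in product(range(1, 7), repeat=2))

# Simulation: roll a pair of dice n times and tally the totals.
random.seed(1)  # fixed seed so the run is repeatable (arbitrary choice)
n = 1000
rolls = Counter(random.randint(1, 6) + random.randint(1, 6) for _ in range(n))

for total in range(2, 13):
    print(total, ways[total], "of 36 pairs;", rolls[total], "of", n, "simulated rolls")
```

A total of 7 can be made in six ways (1+6, 2+5, 3+4, 4+3, 5+2, 6+1), more than any other total.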


A Statement of the Central Limit Theorem:

For any population with

• mean $\mu$ and finite variance $\sigma^2$,

• the sampling distribution of means, $\bar{X}$,

• from samples of size n from this population,

• will be approximately normally distributed

• with mean $\mu$,

• and variance $\sigma^2/n$,

• for n large.

That is, for n large, and $X \sim\ ??(\mu, \sigma^2)$ (any distribution with this mean and variance),

then $\bar{X}_n \sim N(\mu, \sigma^2/n)$, approximately.


This is the main reason for our interest in the normal distribution:

• regardless of the underlying distribution

• if we take a large enough sample

• we can make probability statements about means from such samples

• based upon the normal distribution.

This is true, even when the underlying distribution is discrete.


Example: The Central Limit Theorem works even for VERY non-normal data:

A population has only 3 outcomes in it: X = 1, 2, or 9, each with P(X=x) = 1/3.

[Figure: bar chart of P(X=x) = 1/3 at x = 1, 2, and 9]

sum of {1, 2, 9} = 12

mean of {1, 2, 9}: $\mu$ = 4

standard deviation of {1, 2, 9}: $\sigma$ = 3.6


Experiment: Take a sample of size n with replacement. Compute the sum of all n values. Repeat…

Look at the Sampling Distribution of Sums

[Figure: histograms of the sampling distribution of sums for n=25, n=50, and n=100]
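The experiment can be sketched in a few lines. The number of repeated samples (2000), the seed, and the helper name `simulate_sums` are assumptions made for this example; the slides do not say how many repetitions were used.

```python
import random
from statistics import mean, pstdev

population = [1, 2, 9]
mu, sigma = mean(population), pstdev(population)  # population mean and standard deviation

random.seed(1)   # fixed seed so the run is repeatable (arbitrary choice)
repeats = 2000   # number of repeated samples (assumed value)

def simulate_sums(n):
    """Sums of `repeats` samples of size n drawn with replacement."""
    return [sum(random.choices(population, k=n)) for _ in range(repeats)]

for n in (25, 50, 100):
    sums = simulate_sums(n)
    # Compare simulated center and spread with n*mu and sqrt(n)*sigma.
    print(n, mean(sums), n * mu, pstdev(sums), sigma * n**0.5)
```

If the Central Limit Theorem is doing its work, the simulated sums should center near n times the population mean, with spread near the square root of n times the population standard deviation, and a histogram of `sums` should look increasingly bell-shaped as n grows.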


To compute probabilities for a normal distribution.

• Recall that we are looking at intervals of values of the random variable, X.

• The probability that X has a value in the interval between a and b is the area under the curve corresponding to that interval:

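[Figure: normal curve with the area between a and b shaded]

$\Pr(a \le X \le b) = \int_a^b f_X(x)\,dx$

Note: since Pr(X=a), or the probability of any exact value, is zero, this can be written as $\Pr(a \le X \le b)$ or $\Pr(a < X < b)$.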


The symmetry of the normal distribution can also help in computing probabilities.

• The normal distribution is symmetric about the mean µ.

• This tells us that the probability of a value less than the mean is .5 or 50%,

• and the probability of a value greater than the mean is also .5 or 50%.

[Figure: normal curve split at the mean, with area 0.5 on each side]

$\Pr(X \le \mu) = \int_{-\infty}^{\mu} f_X(x)\,dx = 0.5$


The Standard Normal Distribution

The standard normal distribution is just one of infinitely many possible normal distributions.

It has

mean: $\mu$ = 0

variance: $\sigma^2$ = 1

By convention we let the letter Z represent a random variable that is distributed Normally with $\mu$=0 and $\sigma^2$=1:

$Z \sim N(0,1)$

[Figure: standard normal curve with $\mu$=0 and $\sigma$=1]


The standard normal distribution is important for several reasons:

• Probabilities of Z within any interval have been computed and tabulated.

• It is possible to look up $\Pr(a \le Z \le b)$ for any values of a and b in such tables.

• Any other normal distribution can be transformed to a standard normal for computing probabilities.

• Distances from the mean are equivalent to number of standard deviations from the mean.

This last is perhaps of greatest interest to us, now that software does much of the transformation and computation for us.


Table 3 in the Appendix of Rosner gives areas under the normal curve, in 4 different ways:

• Column A gives areas between $-\infty$ and z, where z is a particular value of the standard normal distribution. (Note: Rosner uses X rather than Z.)

That is, column A gives values for

$\Pr(-\infty < Z \le z) = \Pr(Z \le z)$

z is also known as a standard normal deviate.

[Figure: standard normal curve with the area to the left of z shaded, Pr[Z < z]]


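Table 3 in the Appendix of Rosner:

• Column B gives areas between z and $\infty$: $\Pr(z \le Z < \infty) = \Pr(z \le Z) = \Pr(Z \ge z)$

• Column C gives areas between 0 and z: $\Pr(0 \le Z \le z)$

• Column D gives areas between -z and z: $\Pr(-z \le Z \le z)$

[Figures: standard normal curves with the shaded areas for Column B (above z), Column C (0 to z), and Column D (-z to z)]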


A probability calculation for any random variable, X ~ Normal($\mu$, $\sigma^2$), can be re-expressed as an equivalent probability calculation for a standard Normal (0,1).

This is nice because

• we have tables for probabilities of the Normal (0,1) distribution.

• We can interpret probabilities in terms of # of std deviations from the mean

Of course, we can also use computer programs to compute probabilities for any Normal Distribution – the program does the translation for us.


The Normal (0,1) or Standard Normal Table.

Positive values of z are read from the first column (under x in Rosner).

The shaded area, which is the probability of $Z \le z$, is shown under Col A of the table:

Pr(Z < 0.31) = .6217

z      A       B       C       D
0.0    .5000   .5000   .0      .0
0.01   .5040   .4960   .0040   .0080
…
0.30   .6179   .3821   .1179   .2358
0.31   .6217   .3783   .1217   .2434

[Figure: standard normal curve with the area to the left of z = 0.31 shaded, Pr[Z < 0.31]]

A check that this makes sense: any positive value of z is above the mean, so the area to its left should be > .5


Note that only positive values of z are tabulated.

We can take advantage of a few important features of the standard normal, to compute probabilities for values of z less than zero:

• Symmetry: $\Pr(Z \le -z) = \Pr(Z \ge z)$

• Zero is the median: $\Pr(Z \le 0) = \Pr(Z \ge 0) = .50$

• Total area is 1: $\Pr(Z \le z) + \Pr(Z \ge z) = 1$
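These three features can be checked directly in software. The sketch below uses z = 0.31, the value from the table excerpt, and Python's standard-library `NormalDist`; the variable names are chosen for this example.

```python
from statistics import NormalDist

Z = NormalDist()  # standard normal
z = 0.31

lower = Z.cdf(-z)      # Pr(Z <= -z)
upper = 1 - Z.cdf(z)   # Pr(Z >= z)

print(round(lower, 4), round(upper, 4))  # symmetry: the two tail areas match
print(Z.cdf(0))                          # zero is the median
print(Z.cdf(z) + upper)                  # the two pieces make up the total area
```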


For example, we cannot read Pr(Z < -0.31) directly from the tables.

We can, however, use the property of symmetry:

We can read this probability from Col B: Pr(Z > 0.31) = .3783

Use the property of symmetry to get this: Pr(Z < -0.31) = .3783

[Figure: standard normal curve with the tails below z = -0.31 and above z = 0.31 shaded]



Example Word Problem

What is the probability of a value of Z more than 1 standard deviation below the mean?

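Solution: Since $\mu$ = 0 and $\sigma$ = 1, 1 standard deviation below the mean is

z = $\mu - \sigma$ = 0 - 1 = -1

Pr(Z<-1) = 0.1587

[Figure: standard normal curve with the area to the left of -1 shaded]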

The probability of observing a value more than 1 standard deviation below the mean is .1587, or just under 16%.


[Figure: standard normal curve with the area between -1.50 and 1.50 shaded]

Example: What is the probability Z is between -1.5 and 1.5?

We can read this from Column D of the Table in Rosner:

$\Pr[-1.50 \le Z \le 1.50]$ from the table: 0.8664

Example: What is the probability of Z more than 1.5 standard deviations from the mean in either direction?

Since probabilities sum to 1:

$\Pr[Z \le -1.50 \text{ or } Z \ge 1.50]$ = 1 - 0.8664 = 0.1336

By symmetry, half of this, or 0.0668, lies at either end.


Exercise

Find the area under the standard normal curve between Z = +1 and Z = +2

Solution.

It helps to draw pictures!

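[Figure: three standard normal curves, showing the area between 1 and 2, the area below 2, and the area below 1]

Pr(1<Z<2) = Pr(Z<2) - Pr(Z<1)

= 0.9772 - 0.8413

= 0.1359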
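The table lookups in this exercise and in the two preceding examples can be reproduced with any program that provides the standard normal cumulative probability. Here is a sketch using Python's standard library; the variable names are chosen for this example.

```python
from statistics import NormalDist

Z = NormalDist()  # standard normal

below_minus_1 = Z.cdf(-1)               # Pr(Z < -1)
within_1_5 = Z.cdf(1.5) - Z.cdf(-1.5)   # Pr(-1.5 <= Z <= 1.5)
outside_1_5 = 1 - within_1_5            # both tails beyond 1.5 standard deviations
between_1_and_2 = Z.cdf(2) - Z.cdf(1)   # Pr(1 < Z < 2)

for label, p in [
    ("Pr(Z < -1)", below_minus_1),
    ("Pr(-1.5 <= Z <= 1.5)", within_1_5),
    ("Pr(|Z| >= 1.5)", outside_1_5),
    ("Pr(1 < Z < 2)", between_1_and_2),
]:
    print(label, round(p, 4))
```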


Notes on using Standard Normal Tables:

• These come in a variety of formats. The examples given here are for the version seen in Rosner, Table 3 in the Appendix.

• Look at the accompanying picture of the distribution to be clear what probability is listed in the body of the table.

• Draw a sketch (paper and pencil) when computing probabilities – it always helps you keep track of what you are doing.

• Minitab provides the same probabilities as Column A: Pr(X<x), when Cumulative Probability is selected


Using Minitab:

Calc → Probability Distributions → Normal

Select Cumulative Probability for Pr(Z<z) or Pr(X<x)

Enter the value of z (or x)


Finding Percentiles of the Normal Distribution

Example: What is the 75th percentile of N(0,1)?

Solution: Again, it helps to draw a picture!

We want the area under the curve to be 75%.

The value of z we want is the value below which 75% of values are found.

That is, find $z_{.75}$ so that $\Pr(Z < z_{.75}) = .75$

[Figure: standard normal curve with area 0.75 shaded to the left of $z_{.75}$]


Use the Inverse Cumulative option in Minitab and input the desired percentile:

Inverse Cumulative Distribution Function

Normal with mean = 0 and standard deviation = 1.00000

P( X <= x)        x
    0.7500   0.6745
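For readers without Minitab, the same inverse cumulative lookup can be sketched with Python's standard-library `NormalDist.inv_cdf`. This is offered as an equivalent calculation, not as the course's own tool.

```python
from statistics import NormalDist

Z = NormalDist()       # standard normal
z75 = Z.inv_cdf(0.75)  # value below which 75% of the distribution lies

print(round(z75, 4))
print(round(Z.cdf(z75), 4))  # check: the area below z75 should come back as 0.75
```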


Standardizing a Normal Random Variate:

From $N(\mu, \sigma^2)$ to N(0,1)

We can transform any Normal distribution to a standard normal by means of a simple transformation:

$X \sim N(\mu, \sigma^2) \;\Rightarrow\; Z = \frac{X - \mu}{\sigma} \sim N(0,1)$


Standardizing a Normal Random Variate:

From $N(\mu, \sigma^2)$ to N(0,1)

Adding a constant:

For $X \sim N(\mu, \sigma^2)$, $(X+b) \sim N(?, ?)$

The mean is shifted over ‘b’ units, but the variance or spread of the data is unchanged by adding a constant:

$(X+b) \sim N(\mu + b, \sigma^2)$


Multiplying by a constant:

For $X \sim N(\mu, \sigma^2)$, $(aX) \sim N(?, ?)$

The mean is adjusted to ‘a’ times the original mean, and the variance to $a^2$ times the original variance. This is a change of scale:

$(aX) \sim N(a\mu, a^2\sigma^2)$


Adding a constant, multiplying by a constant:

For $X \sim N(\mu, \sigma^2)$, $(aX+b) \sim N(?, ?)$

Both adjustments are made:

The mean is adjusted to ‘a’ times the original mean plus ‘b’, and the variance to $a^2$ times the original variance:

$(aX+b) \sim N(a\mu + b, a^2\sigma^2)$


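Now, let $a = \frac{1}{\sigma}$ and $b = -\frac{\mu}{\sigma}$.

Then

$Z = aX + b = \frac{1}{\sigma}X - \frac{\mu}{\sigma} = \frac{X - \mu}{\sigma}$

For $X \sim N(\mu, \sigma^2)$, $Z \sim N(?, ?)$

$\mu_Z = a\mu + b = \frac{1}{\sigma}\mu - \frac{\mu}{\sigma} = 0$

$\sigma_Z^2 = a^2\sigma^2 = \frac{1}{\sigma^2}\sigma^2 = 1$

Or $Z \sim N(0,1)$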
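The shift-and-scale rules, and standardization as a special case of them, can be illustrated with Python's `NormalDist`, which supports adding and multiplying by constants. The starting distribution (mean 10, standard deviation 2.5) and the constants a = 2 and b = 3 are illustrative assumptions for this example.

```python
from statistics import NormalDist

X = NormalDist(mu=10, sigma=2.5)  # illustrative starting distribution

# General rule: aX + b has mean a*mu + b and variance a^2 * sigma^2.
a, b = 2, 3
Y = X * a + b
print(Y.mean, Y.variance)

# Standardizing is the special case a = 1/sigma and b = -mu/sigma.
Z_dist = X * (1 / X.stdev) + (-X.mean / X.stdev)
print(Z_dist.mean, Z_dist.stdev)
```

With these values the first line should show a mean of 2(10) + 3 = 23 and a variance of 4(6.25) = 25, and the second a mean of 0 and a standard deviation of 1, up to floating-point rounding.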


We have transformed the original scale

• to units measured in multiples of standard deviations

• centered around zero

• A value of z=-1 means the value of x is 1 standard deviation below the mean

• A value of z=2.5 means the value of x is 2.5 standard deviations above the mean

$X \sim N(\mu, \sigma^2) \;\Rightarrow\; Z = \frac{X - \mu}{\sigma} \sim N(0,1)$


This transformation is also important, because if we want to know

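$\Pr(a \le X \le b)$

Then we can convert it to an equivalent calculation:

$\Pr(a \le X \le b) = \Pr\left(\frac{a-\mu}{\sigma} \le \frac{X-\mu}{\sigma} \le \frac{b-\mu}{\sigma}\right) = \Pr\left(\frac{a-\mu}{\sigma} \le Z \le \frac{b-\mu}{\sigma}\right)$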


Word Problem

The profit from the Massachusetts state lottery in any given week is distributed Normally with mean = 10.0 million dollars and variance = 6.25 (in squared units of millions of dollars).

What is the probability that this week’s profit is between 8 and 10.5 million?

Let X = weekly profit in millions

Then $X \sim N(\mu, \sigma^2)$

where $\mu$=10 and $\sigma^2$=6.25 ($\sigma$=2.5)

What is $\Pr(8 \le X \le 10.5)$?


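What is $\Pr(8 \le X \le 10.5)$?

Translate to Standard Normal:

$\Pr(8 \le X \le 10.5) = \Pr\left(\frac{8-\mu}{\sigma} \le \frac{X-\mu}{\sigma} \le \frac{10.5-\mu}{\sigma}\right)$

$= \Pr\left(\frac{8-10}{2.5} \le Z \le \frac{10.5-10}{2.5}\right)$

$= \Pr(-0.8 \le Z \le 0.2)$

[Figure: standard normal curve with the area between -.8 and .2 shaded]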


Read from Table 3 or use Minitab or another program:

$\Pr(-0.8 \le Z \le 0.2)$ = Pr(Z<0.2) - Pr(Z<-.8)

= 0.5793 - 0.2119

= 0.3674

[Figure: standard normal curve with the area between -.8 and .2 shaded]

The probability of a weekly profit between 8 and 10.5 million dollars is 36.74%.
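The lottery calculation can be sketched in software two ways: by standardizing the endpoints first, as in the worked solution, or by letting the program work on the original scale. Both routes should agree. The variable names below are chosen for this example.

```python
from math import sqrt
from statistics import NormalDist

mu, variance = 10.0, 6.25
sigma = sqrt(variance)  # standard deviation, in millions of dollars

# Route 1: standardize the endpoints, then use the standard normal.
z_low, z_high = (8 - mu) / sigma, (10.5 - mu) / sigma
Z = NormalDist()
p_standard = Z.cdf(z_high) - Z.cdf(z_low)

# Route 2: compute directly on the original scale.
X = NormalDist(mu, sigma)
p_direct = X.cdf(10.5) - X.cdf(8)

print(z_low, z_high)
print(round(p_standard, 4), round(p_direct, 4))
```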


Application of the Central Limit Theorem

• Means of samples of size n

• from a population with

• mean $\mu$ and variance $\sigma^2$

• follow, approximately, a normal distribution

• with mean $\mu$ and variance $\sigma^2/n$, for n large.

That is, for $X \sim\ ?(\mu, \sigma^2)$,

for n large,

$\bar{X} \sim N(\mu, \sigma^2/n)$


Example: Consider a population of families with $\mu$=3.4 children per family and $\sigma^2$=4.37.

What percentage of samples of size n=4 families will have means greater than 5 children per family?

Sample means from samples with n=4 follow (approximately) a normal distribution with

$\mu_{\bar{x}}$ = 3.4 and $\sigma_{\bar{x}}^2 = \sigma^2/n$ = 4.37/4 = 1.09.

Then $\sigma_{\bar{x}}$ = 1.045

We want: $\Pr(\bar{X} > 5)$, where $\bar{X} \sim N(3.4, 1.09)$


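$\Pr(\bar{X} > 5) = \Pr\left(\frac{\bar{X} - \mu_{\bar{x}}}{\sigma_{\bar{x}}} > \frac{5 - 3.4}{1.045}\right) = \Pr(Z > 1.53)$

Pr(Z > 1.53) = 0.06

[Figure: standard normal curve with the area to the right of 1.53 shaded]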

The probability of observing a sample with a mean of 5 children per family or larger, when n=4 is about 6%.
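A sketch of the same calculation in software follows. It takes the slide's working assumption that the sample mean is approximately normal even with n = 4; the variable names are chosen for this example.

```python
from math import sqrt
from statistics import NormalDist

mu, variance, n = 3.4, 4.37, 4
se = sqrt(variance / n)      # standard deviation of the sample mean
z = (5 - mu) / se            # how many standard deviations 5 lies above the mean
p = 1 - NormalDist().cdf(z)  # upper-tail area, Pr(Z > z)

print(round(se, 3), round(z, 2), round(p, 3))
```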


So far we have gone from

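• $X \sim N(\mu, \sigma^2) \;\Rightarrow\; Z \sim N(0,1)$: $Z = \frac{X - \mu}{\sigma}$

We may be interested in the reverse:

• $Z \sim N(0,1) \;\Rightarrow\; X \sim N(\mu, \sigma^2)$: $X = \sigma Z + \mu$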


Example:

The distribution of IQ scores is normal with a mean of 100 and a standard deviation of 15.

What is the 95th percentile of this distribution?

Step 1:

Find the 95th percentile of the standard normal. Use Minitab or another program to compute:

Inverse Cumulative Distribution Function

Normal with mean = 0 and standard deviation = 1.00000

P( X <= x)        x
    0.9500   1.6449

or $z_{.95}$ = 1.645


Step 2:

We know $X \sim N(100, 15^2)$, and $z_{.95}$ = 1.645

Using $X = \sigma Z + \mu$:

$x_{.95} = \sigma z_{.95} + \mu$

= (15)(1.645) + 100

= 124.7

The 95th percentile of the IQ distribution is 124.7
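The two steps can be sketched in Python as follows, with a direct lookup on the IQ scale as a cross-check. The variable names are chosen for this example.

```python
from statistics import NormalDist

# Step 1: 95th percentile of the standard normal.
z95 = NormalDist().inv_cdf(0.95)

# Step 2: convert back to the IQ scale with x = sigma * z + mu.
x95 = 15 * z95 + 100

# Cross-check: ask for the percentile of N(100, 15^2) directly.
direct = NormalDist(mu=100, sigma=15).inv_cdf(0.95)

print(round(z95, 3), round(x95, 1), round(direct, 1))
```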


Another Example:

Taking samples of size n=4 from the population of families with $\mu$=3.4 children per family and $\sigma^2$=4.37:

What is the middle 50% of the sampling distribution?

That is, find a and b so that $\Pr(a \le \bar{X} \le b) = .50$

a is the 25th percentile of the sampling distribution of $\bar{X}$

b is the 75th percentile of the sampling distribution of $\bar{X}$

[Figure: normal curve with the middle 50% between a and b and 25% in each tail]


Use Minitab to find the 25th and 75th percentiles of the standard normal:

Inverse Cumulative Distribution Function

P( X <= x)        x
    0.2500  -0.6745
    0.7500   0.6745

For $\bar{X} \sim N(\mu, \sigma^2/n)$ where $\mu$=3.4 and $\sigma^2/n$=1.09, convert z back to $\bar{x}$:

$\bar{x} = z\,\sigma_{\bar{x}} + \mu$

$\bar{x}_{.75}$ = .675 (1.045) + 3.4 = 4.11

$\bar{x}_{.25}$ = -.675 (1.045) + 3.4 = 2.69

$\Pr(2.69 < \bar{X} < 4.11) = .50$

50% of samples of size 4 from this population will have mean family size between 2.69 and 4.11 children per family.
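The same middle 50% can be found without the intermediate table lookup. This is a sketch with Python's `NormalDist`; because it carries full precision rather than the rounded values .675 and 1.045, its endpoints may differ from the hand calculation in the second decimal place.

```python
from math import sqrt
from statistics import NormalDist

# Sampling distribution of the mean for n = 4 (assumed approximately normal).
xbar = NormalDist(mu=3.4, sigma=sqrt(4.37 / 4))

a = xbar.inv_cdf(0.25)  # 25th percentile
b = xbar.inv_cdf(0.75)  # 75th percentile

print(round(a, 3), round(b, 3))
print(round(xbar.cdf(b) - xbar.cdf(a), 4))  # area between a and b
```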


Recap. . . Introduction to the Normal Distribution

For continuous variables,

• we speak of a probability density function

• we calculate the probabilities of intervals of values, not individual values

The normal distribution is a good description of

• many naturally occurring phenomena

• the average of non-normal phenomena

This last is particularly important since much statistical inference is based on the behavior of averages.


While there are infinitely many normal distributions, each determined by $\mu$ and $\sigma^2$,

• they can all be standardized by using the transformation $Z = \frac{X - \mu}{\sigma} \sim N(0,1)$

• We use the standardized form to compute probabilities for any normal distribution.

• In the standardized form, distance from the mean is in units of standard deviation.