Chapter 5. Stats


Upload: zsiddiqui

Post on 21-Jul-2016

Chapter 5

Density Curve

Normal Distribution

For continuous data we have seen the need to use grouped-data histograms. Another tool is the density curve, a theoretical curve that best matches the areas of the histogram. The curve superimposed over the histogram is one possible “best fit” representation. Rather than reading off the density at an individual score, we consider the area below the curve over a score interval.

Example: Suppose we wish to know the proportion of data points that lie between x = -2 and x = -0.5. This interval contains three bars. The sum of their areas (width × height) gives the proportion of data in the interval.

Bar (interval)    Width    Height    Area
-2 to -1.5        0.5      0.08      0.04
-1.5 to -1        0.5      0.14      0.07
-1 to -0.5        0.5      0.24      0.13

The total proportion of data points x such that -2 < x < -0.5 is the sum of these areas: 0.04 + 0.07 + 0.13 = 0.24 (or 24%).
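The bar-area calculation can be checked with a few lines of Python. This is only a sketch: the tuple layout and variable names are illustrative assumptions, and the numbers are copied from the table. Note that recomputing width × height for the third bar gives 0.12 rather than the listed 0.13, presumably because the printed height is rounded, so the two totals differ slightly.

```python
# Bars from the table: (left edge, right edge, height, area as listed).
bars = [
    (-2.0, -1.5, 0.08, 0.04),
    (-1.5, -1.0, 0.14, 0.07),
    (-1.0, -0.5, 0.24, 0.13),
]

# Proportion using the Area column exactly as printed in the table.
listed_total = sum(area for _, _, _, area in bars)

# Proportion recomputed as width x height for each bar.
computed_total = sum((hi - lo) * height for lo, hi, height, _ in bars)

print(f"sum of listed areas:   {listed_total:.2f}")
print(f"sum of width x height: {computed_total:.2f}")
```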


Density Curves

Recall that a random variable is continuous if it can take on any value in a given (possibly infinite) interval. A probability density function $f$ is a real-valued function that describes the distribution of probability for a continuous random variable and must satisfy:

(i) $f(x) \ge 0$ for all $x$.
(ii) The area under the curve $y = f(x)$ and above the x-axis is 1.

Point (ii) is conveniently written using notation from integral calculus:

$$\int_{-\infty}^{\infty} f(x)\,dx = 1.$$

By convention, we no longer say "the area bounded above by $f(x)$ and below by the x-axis". Instead we simply say "the area under the curve $f(x)$". If $f$ is the probability density function for a continuous random variable $X$, then the area under the probability density curve $y = f(x)$ that lies above $y = 0$ and between $x = a$ and $x = b$ represents the probability that a random value of $X$ lies between $a$ and $b$:

$$P(a \le X \le b) = \int_a^b f(x)\,dx$$

Note: Since the area enclosed by a vertical line segment is 0, we have

$$P(a \le X \le b) = P(a < X \le b) = P(a \le X < b) = P(a < X < b)$$
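As a rough numerical illustration of "probability = area under the density curve", the sketch below uses a hypothetical density, $f(x) = 2x$ on $[0, 1]$ (not taken from the text), and a simple midpoint rule. The function names `f` and `area` are assumptions made for this example.

```python
def f(x):
    """Hypothetical density: f(x) = 2x on [0, 1], and 0 elsewhere."""
    return 2.0 * x if 0.0 <= x <= 1.0 else 0.0


def area(g, a, b, n=10_000):
    """Midpoint-rule approximation of the area under g between a and b."""
    width = (b - a) / n
    return sum(g(a + (i + 0.5) * width) for i in range(n)) * width


total = area(f, 0.0, 1.0)   # should be 1 for a valid density
prob = area(f, 0.2, 0.5)    # P(0.2 <= X <= 0.5) = 0.5**2 - 0.2**2 = 0.21

print(f"total area = {total:.4f}, P(0.2 <= X <= 0.5) = {prob:.4f}")
```

Because a single point contributes zero area, the same number is obtained whether the endpoints are included or excluded.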

Measures for Probability Density Functions

The mean, variance, standard deviation, median, and percentiles can also be defined for continuous random variables. The mean is the balance point of the density curve: if the curve were a solid object, the mean would be the point at which you could balance it on your finger. The formula for the mean of a density curve $y = f(x)$ is

$$\mu = \int_{-\infty}^{\infty} x\,f(x)\,dx$$

The standard deviation of a continuous random variable $X$ measures the typical distance of a value of $X$ from the mean. It is the square root of the variance, which is the average squared distance from the mean, weighted by the density. The formulas for the variance and standard deviation of a density curve are

$$\sigma^2 = \int_{-\infty}^{\infty} (x - \mu)^2 f(x)\,dx$$

and

$$\sigma = \sqrt{\sigma^2}.$$
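The two integrals can be approximated numerically. The sketch below again assumes the hypothetical density $f(x) = 2x$ on $[0, 1]$ (an illustration, not an example from the text), for which the exact values are $\mu = 2/3$ and $\sigma^2 = 1/18$. The helper name `integrate` is an assumption.

```python
def f(x):
    """Hypothetical density: f(x) = 2x on [0, 1], and 0 elsewhere."""
    return 2.0 * x if 0.0 <= x <= 1.0 else 0.0


def integrate(g, a, b, n=100_000):
    """Midpoint-rule approximation of the integral of g over [a, b]."""
    width = (b - a) / n
    return sum(g(a + (i + 0.5) * width) for i in range(n)) * width


# The density is zero outside [0, 1], so integrating over [0, 1] is enough.
mean = integrate(lambda x: x * f(x), 0.0, 1.0)
variance = integrate(lambda x: (x - mean) ** 2 * f(x), 0.0, 1.0)
std_dev = variance ** 0.5

print(f"mean = {mean:.4f}, variance = {variance:.4f}, std dev = {std_dev:.4f}")
```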

The median of a density curve is the real number $m$ for which 50% of the area under $y = f(x)$ lies to the left of the line $x = m$. Thus, the median $m$ satisfies the equation

$$\int_{-\infty}^{m} f(x)\,dx = 0.5.$$

Similarly, the $p$th percentile of a density curve is the real number $x_p$ for which $p\%$ of the area under $y = f(x)$ lies to the left of the line $x = x_p$; that is,

$$\int_{-\infty}^{x_p} f(x)\,dx = \frac{p}{100}.$$
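One way to solve such an equation numerically is bisection on the left-tail area. The sketch below assumes the hypothetical density $f(x) = 2x$ on $[0, 1]$, whose left-tail area is $x^2$; the function names are illustrative, and bisection is a choice made here, not a method prescribed by the text.

```python
def left_area(x):
    """Area under the hypothetical density f(x) = 2x to the left of x, for 0 <= x <= 1."""
    return x * x


def percentile(p, lo=0.0, hi=1.0, tol=1e-10):
    """Find x with left_area(x) = p / 100 by bisection on [lo, hi]."""
    target = p / 100.0
    while hi - lo > tol:
        mid = (lo + hi) / 2.0
        if left_area(mid) < target:
            lo = mid   # too little area to the left: move right
        else:
            hi = mid   # enough area to the left: move left
    return (lo + hi) / 2.0


median = percentile(50)        # exact value: sqrt(0.5)
first_quartile = percentile(25)  # exact value: 0.5

print(f"median = {median:.4f}, 25th percentile = {first_quartile:.4f}")
```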


The Normal Distribution

The Normal Distribution, used for inferential statistics and advanced probability, is the most important probability distribution in statistics. The density function depends on the mean $\mu$ and standard deviation $\sigma$ and is given as

$$f(x) = \frac{1}{\sigma\sqrt{2\pi}}\, e^{-\frac{(x-\mu)^2}{2\sigma^2}}$$

where $\pi$ and $e$ are the familiar mathematical constants, $\pi \approx 3.141592\ldots$ and $e \approx 2.718281\ldots$

The probabilities $P(a \le X \le b)$ under a normal distribution are usually estimated numerically using a z-table. We will always use a z-table to compute these probabilities for normal distributions.

Notation: The normal distribution with mean $\mu$ and standard deviation $\sigma$ is denoted by $N(\mu, \sigma)$. If $X$ is a continuous random variable with the normal distribution $N(\mu, \sigma)$, then we write $X \sim N(\mu, \sigma)$.
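For readers who want to check table lookups by computer, the density formula translates directly into code. The sketch below is an aside rather than the z-table method used in this chapter: it computes left-tail areas with the error function `math.erf` from the Python standard library, and the function names are assumptions chosen for illustration.

```python
import math


def normal_pdf(x, mu, sigma):
    """Density of N(mu, sigma) at x, written exactly as in the formula."""
    return math.exp(-((x - mu) ** 2) / (2 * sigma ** 2)) / (sigma * math.sqrt(2 * math.pi))


def normal_cdf(x, mu, sigma):
    """Area under the N(mu, sigma) curve to the left of x, via the error function."""
    return 0.5 * (1 + math.erf((x - mu) / (sigma * math.sqrt(2))))


def normal_prob(a, b, mu, sigma):
    """P(a <= X <= b) for X ~ N(mu, sigma)."""
    return normal_cdf(b, mu, sigma) - normal_cdf(a, mu, sigma)


print(f"f(0) for N(0, 1)          = {normal_pdf(0, 0, 1):.4f}")
print(f"P(-1 <= X <= 1) for N(0, 1) = {normal_prob(-1, 1, 0, 1):.4f}")
```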

Properties of the Normal Curve $N(\mu, \sigma)$

(i) The normal curve is bell-shaped and symmetric about the line $x = \mu$.
(ii) The area under the normal curve is one.
(iii) The maximum value of the normal curve occurs at the point $\left(\mu, \frac{1}{\sigma\sqrt{2\pi}}\right)$.
(iv) The normal curve has two inflection points, at $\left(\mu \pm \sigma, \frac{1}{\sigma\sqrt{2\pi e}}\right)$.
(v) Fixing $\sigma$ and changing $\mu$ results in a horizontal shift.
(vi) The normal curve approaches the x-axis on either side of $\mu$; however, the curve never reaches the x-axis. (So we say that $y = 0$ is a horizontal asymptote of the normal curve.)
(vii) Fixing $\mu$ and changing $\sigma$ results in different max heights and longer/shorter tails.
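Several of these properties can be spot-checked numerically. The sketch below uses `statistics.NormalDist` from the Python standard library (Python 3.8+); the parameter values $\mu = 10$ and $\sigma = 2$ are hypothetical and chosen only for illustration.

```python
import math
from statistics import NormalDist

mu, sigma = 10.0, 2.0           # hypothetical parameters, for illustration only
curve = NormalDist(mu, sigma)

peak = curve.pdf(mu)                        # property (iii): height at x = mu
inflection_height = curve.pdf(mu + sigma)   # property (iv): height at x = mu + sigma
left, right = curve.pdf(mu - 1.3), curve.pdf(mu + 1.3)   # property (i): symmetry
far_tail = curve.pdf(mu + 10 * sigma)       # property (vi): tiny but still positive
wider_peak = NormalDist(mu, 2 * sigma).pdf(mu)   # property (vii): larger sigma, lower peak

print(f"peak = {peak:.5f}, expected {1 / (sigma * math.sqrt(2 * math.pi)):.5f}")
print(f"inflection height = {inflection_height:.5f}")
print(f"symmetric: {math.isclose(left, right)}, far tail > 0: {far_tail > 0}")
print(f"peak with doubled sigma = {wider_peak:.5f}")
```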

Calculating Probabilities for Standard Normal Distributions

The standard normal distribution is the normal curve with parameters $\mu = 0$ and $\sigma = 1$. A continuous random variable with the standard normal distribution is usually denoted by $Z$.

Example 5.1: The fact that the standard normal is symmetric about zero implies that 50% of the area falls on either side of zero, or $P(Z < 0) = 0.5$. Thus, zero is the median of the standard normal distribution.

Note: z-tables give the area under the curve between 0 and each positive z, as seen in the following graph.

Example 5.2: $P(0 < Z < 0.55)$ is equal to the area under the curve between zero (0) and 0.55, and that area is provided for us on the table. That is, $P(0 < Z < 0.55) = 0.2088$.

Example 5.3: $P(-1.65 < Z < 0)$. This probability is equal to the area under the graph between -1.65 and zero (0):

This area is obtained from the table: Area = 0.4505. Therefore, by symmetry, $P(-1.65 < Z < 0) = P(0 < Z < 1.65) = 0.4505$.

Example 5.4: $P(Z < a)$ where $a > 0$. For example, to calculate $P(Z < 0.95)$: the area that is equal to this probability is the shaded area of the following graph:

$$P(Z < 0.95) = 0.5 + P(0 < Z < 0.95) = 0.5 + 0.3289 = 0.8289$$

Example 5.5: $P(a < Z)$ where $a < 0$. For example, to calculate $P(-1.85 < Z)$: the area that is equal to this probability is the shaded area of the following graph:

$$P(Z > -1.85) = P(Z < 1.85) = 0.9678$$

Example 5.6: $P(a < Z < b)$, where $a < 0$ and $b > 0$. For example, $P(-1.90 < Z < 0.65)$. The area that is equal to this probability is the shaded area of the following graph:

We follow the same procedure as in the previous examples: find the area of the shaded region to the left of the y-axis and the area of the shaded region to the right of the y-axis, and add them to get the total area, which is the desired probability:

$$P(-1.90 < Z < 0.65) = P(-1.90 < Z < 0) + P(0 < Z < 0.65) = 0.4713 + 0.2422 = 0.7135$$

Equivalently, in terms of left-tail areas, $P(Z < 0.65) - P(Z < -1.90) = 0.7422 - 0.0287 = 0.7135$.

Example 5.7: $P(a < Z < b)$, where $a > 0$ and $b > 0$. For example, $P(1.20 < Z < 2.45)$. The area that is equal to this probability is the shaded area of the following graph:

$$P(1.20 < Z < 2.45) = P(0 < Z < 2.45) - P(0 < Z < 1.20) = 0.4929 - 0.3849 = 0.1080$$
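The table lookups in Examples 5.2 to 5.7 can be cross-checked in software. The sketch below uses `statistics.NormalDist` from the Python standard library (Python 3.8+) in place of a z-table; the helper `between` is a name introduced here for illustration. Because the table values are rounded to four decimals before being added or subtracted, the computed results may differ from the table-based answers in the fourth decimal place.

```python
from statistics import NormalDist

Z = NormalDist()  # standard normal: mean 0, standard deviation 1


def between(a, b):
    """P(a < Z < b), computed as a difference of left-tail areas."""
    return Z.cdf(b) - Z.cdf(a)


results = {
    "5.2  P(0 < Z < 0.55)": between(0, 0.55),
    "5.3  P(-1.65 < Z < 0)": between(-1.65, 0),
    "5.4  P(Z < 0.95)": Z.cdf(0.95),
    "5.5  P(Z > -1.85)": 1 - Z.cdf(-1.85),
    "5.6  P(-1.90 < Z < 0.65)": between(-1.90, 0.65),
    "5.7  P(1.20 < Z < 2.45)": between(1.20, 2.45),
}

for label, p in results.items():
    print(f"Example {label} = {p:.4f}")
```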

Probability Calculations with the Normal Distribution

If $X \sim N(\mu, \sigma)$, then the standardized variable $Z = \frac{X - \mu}{\sigma}$ has the standard normal distribution. Thus,

$$P(a < X < b) = P\left(\frac{a - \mu}{\sigma} < Z < \frac{b - \mu}{\sigma}\right)$$

where $Z \sim N(0, 1)$.

Example 5.8: Suppose $X \sim N(30, 2)$. Find $P(28 < X < 31)$.

Start by standardizing $X$: let $Z = \frac{X - 30}{2}$. Then

$$P(28 < X < 31) = P\left(\frac{28 - 30}{2} < Z < \frac{31 - 30}{2}\right) = P(-1 < Z < 0.5) = 0.3413 + 0.1915 = 0.5328$$
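The standardization step can be illustrated in code. The sketch below assumes the values of Example 5.8 as $X \sim N(30, 2)$ with the interval $28 < X < 31$, and compares the probability obtained from z-scores with the one computed directly on the original scale. The helper `standardize` is a name introduced here, and `statistics.NormalDist` stands in for the z-table.

```python
from statistics import NormalDist


def standardize(x, mu, sigma):
    """Return the z-score (x - mu) / sigma."""
    return (x - mu) / sigma


mu, sigma = 30, 2
z_lo = standardize(28, mu, sigma)   # (28 - 30) / 2
z_hi = standardize(31, mu, sigma)   # (31 - 30) / 2

# Probability computed on the standard normal scale.
std = NormalDist()
via_z = std.cdf(z_hi) - std.cdf(z_lo)

# Same probability computed directly from N(30, 2).
X = NormalDist(mu, sigma)
direct = X.cdf(31) - X.cdf(28)

print(f"z-scores: {z_lo}, {z_hi}")
print(f"via standardization: {via_z:.4f}, direct: {direct:.4f}")
```

Both routes give the same number, which is why a single standard normal table suffices for every normal distribution.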

Example 5.9: The air pressure in a randomly selected tire is normally distributed with mean 31 psi and standard deviation 0.2 psi. What is the probability that the pressure of a randomly selected tire is

(a) more than 30.5 psi? (b) between 30.5 and 31.5?

Let $X$ = the air pressure of a given tire. Then $X \sim N(31, 0.2)$, and hence

$$Z = \frac{X - 31}{0.2} \sim N(0, 1).$$

(a) $P(X > 30.5) = P\left(Z > \frac{30.5 - 31}{0.2}\right) = P(Z > -2.5) = P(Z < 2.5) = 0.9938$

So the probability that the pressure of a randomly selected tire exceeds 30.5 psi is 0.9938.

(b) $P(30.5 < X < 31.5) = P\left(\frac{30.5 - 31}{0.2} < Z < \frac{31.5 - 31}{0.2}\right) = P(-2.5 < Z < 2.5) = 2\,P(0 < Z < 2.5) = 2 \times 0.4938 = 0.9876$
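As a cross-check of the tire example, the two probabilities can be computed without a table. This is a sketch using `statistics.NormalDist` from the Python standard library; the variable names are illustrative.

```python
from statistics import NormalDist

# Tire pressure: normal with mean 31 psi and standard deviation 0.2 psi.
pressure = NormalDist(mu=31, sigma=0.2)

# (a) More than 30.5 psi: complement of the left-tail area.
p_more_than_30_5 = 1 - pressure.cdf(30.5)

# (b) Between 30.5 and 31.5 psi: difference of two left-tail areas.
p_between = pressure.cdf(31.5) - pressure.cdf(30.5)

print(f"(a) P(X > 30.5)        = {p_more_than_30_5:.4f}")
print(f"(b) P(30.5 < X < 31.5) = {p_between:.4f}")
```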

Percentiles of the Normal Distribution

Example 5.10: Suppose certain test scores are normally distributed with mean 300 and standard deviation 45.

(a) What percentage of people scored below 350?

(b) Find the 25th percentile for the scores.

Let $X$ = the score on a given test. Then

$$Z = \frac{X - 300}{45} \sim N(0, 1).$$

(a) $P(X < 350) = P\left(Z < \frac{350 - 300}{45}\right) = P(Z < 1.11) = 0.8665$

That is to say, 86.65% of the people scored below 350. Hence 350 is approximately the 86th percentile.

(b) We need to find $x$ such that

$$0.25 = P(X < x) = P\left(Z < \frac{x - 300}{45}\right).$$

From the table we see that $P(Z < -0.67) \approx 0.25$, which implies that $\frac{x - 300}{45} = -0.67$. Thus $x = 300 - 0.67 \times 45 = 269.85$. Hence the 25th percentile is 269.85.
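Percentiles can also be found by inverting the left-tail area in software. The sketch below uses `NormalDist.cdf` and `NormalDist.inv_cdf` from the Python standard library (Python 3.8+) as a stand-in for reading the table backwards. Because the table method rounds z to two decimals (1.11 and -0.67), the software values differ slightly from the table-based answers.

```python
from statistics import NormalDist

scores = NormalDist(mu=300, sigma=45)

# (a) Proportion of scores below 350 (no rounding of z).
below_350 = scores.cdf(350)

# (b) 25th percentile: the score with 25% of the area to its left.
q25_exact = scores.inv_cdf(0.25)

# Table-based version of (b), using the rounded z-value -0.67.
q25_table = 300 + 45 * (-0.67)

print(f"P(X < 350)                  = {below_350:.4f}")
print(f"25th percentile (inv_cdf)   = {q25_exact:.2f}")
print(f"25th percentile (z = -0.67) = {q25_table:.2f}")
```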