elementary statistics 3

86
Lesson 1: The Language and Terminology Introduction 1.1 Definitions & Concepts 1.2 Frequency Distribution 1.3 Pictorial Representation of Data Homework 1 - 3 Introduction Most people think of statistics as the study of the numerical features of a subject/population. It means the same to statisticians, but also emphasizes the methods of collecting data, summarizing and presenting data, and drawing inferences from data. We all see on TV how political pundits justify opposing points of view by presenting statistics from respectable sources. How could something be a science when it justifies two opposing points of view? The answer is that statistics has a scientific basis but it can be misrepresented in use. Example. During the saga of President Clinton's impeachment, we observed the following: 1. One pundit says that, according to statistics, the majority of Americans think that character matters. 2. The other pundit says, also according to statistics, that the majority of Americans think the president is doing a good job. The implication here is that one of them was "wrong." But the science of statistics says that both were correct. Data was collected and analyzed, and it was found that the majority of Americans think that character matters and that the majority of Americans think the president is doing a good job. It does not matter to the science of statistics which one of the statistically established facts you or I want to believe. Another point about the nature of statistics as a science is that it is not a deterministic science. It does not have laws like force is equal to mass times acceleration. Statements in statistics come with a probability (i.e., quantified chance) of being correct. When a weatherman says that it will rain today he means that there is, say, a ninety five percent chance that it will rain today. Roughly, this means that if he makes the same prediction one hundred times he will be correct 95 times, and it will not rain the other 5 days. The problem is that sometimes a weatherman will hide the information that there is a 95 percent chance only. Such information hiding is sometimes done for simplicity. Before I conclude this introduction, let me tell you an interesting anecdote about the development of this subject. When the proposal to establish the Indian Statistical Institute in Calcutta was considered by the government of India in the early part of the last century, some critics said, then why not an institute in astrology? At the inception of statistics as a science there was a lot of skepticism about its scientific validity. Those days are gone, and statistics is not likened to astrology any more! Statistics is a well-founded and precise science. It is a nondeterministic science in nature; it makes precise probabilistic statements only. In this course we will be talking about two branches of statistics. The first one is called descriptive statistics and deals with methods of processing, summarizing, and presenting data. The other part deals with the scientific methods of drawing inferences and forecasting from the data, and is called inferential or inductive statistics. In the rest of this lesson and the next we deal with descriptive statistics, which includes the presentation of data in the form of tables, graphs, and computations of various averages of data. 1.1 Basic Definitions and Concepts In statistics we use a small representative "sample" to study a big "population." The reason for this is the cost or even the impossibility of studying the whole population. Population and Sample Definitions. A complete collection of data on the group under study is called the population or the universe. A member of the population is called a sampling unit. Therefore, the population consists of all its sampling units. A Sample is a collection of sampling units selected from the population.

Upload: alecksander2005

Post on 24-Apr-2015

76 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: Elementary Statistics 3

Lesson 1: The Language and Terminology

Introduction 1.1 Definitions & Concepts 1.2 Frequency Distribution

1.3 Pictorial Representation of Data Homework 1 - 3

Introduction

Most people think of statistics as the study of the numerical features of a subject/population. It means the same to statisticians, but also emphasizes the methods of collecting data, summarizing and presenting data, and drawing inferences from data.

We all see on TV how political pundits justify opposing points of view by presenting statistics from respectable sources. How could something be a science when it justifies two opposing points of view? The answer is that statistics has a scientific basis but it can be misrepresented in use.

Example. During the saga of President Clinton's impeachment, we observed the following:

1.One pundit says that, according to statistics, the majority of Americans think that character matters.

2.The other pundit says, also according to statistics, that the majority of Americans think the president is doing a

good job.

The implication here is that one of them was "wrong." But the science of statistics says that both were correct. Data was collected and analyzed, and it was found that the majority of Americans think that character matters and that the majority of Americans think the president is doing a good job. It does not matter to the science of statistics which one of the statistically established facts you or I want to believe.

Another point about the nature of statistics as a science is that it is not a deterministic science. It does not have laws like force is equal to mass times acceleration. Statements in statistics come with a probability (i.e., quantified chance) of being correct. When a weatherman says that it will rain today he means that there is, say, a ninety five percent chance that it will rain today. Roughly, this means that if he makes the same prediction one hundred times he will be correct 95 times, and it will not rain the other 5 days. The problem is that sometimes a weatherman will hide the information that there is a 95 percent chance only. Such information hiding is sometimes done for simplicity.

Before I conclude this introduction, let me tell you an interesting anecdote about the development of this subject. When the proposal to establish the Indian Statistical Institute in Calcutta was considered by the government of India in the early part of the last century, some critics said, then why not an institute in astrology? At the inception of statistics as a science there was a lot of skepticism about its scientific validity. Those days are gone, and statistics is not likened to astrology any more! Statistics is a well-founded and precise science. It is a nondeterministic science in nature; it makes precise probabilistic statements only.

In this course we will be talking about two branches of statistics. The first one is called descriptive statistics and deals with methods of processing, summarizing, and presenting data. The other part deals with the scientific methods of drawing inferences and forecasting from the data, and is called inferential or inductive statistics.

In the rest of this lesson and the next we deal with descriptive statistics, which includes the presentation of data in the form of tables, graphs, and computations of various averages of data.

1.1 Basic Definitions and Concepts

In statistics we use a small representative "sample" to study a big "population." The reason for this is the cost or even the impossibility of studying the whole population.

Population and Sample

Definitions. A complete collection of data on the group under study is called the population or the universe.

A member of the population is called a sampling unit. Therefore, the population consists of all its sampling units.

A Sample is a collection of sampling units selected from the population.

Page 2: Elementary Statistics 3

Most often, we will work with numerical characteristics (like height, weight, and salary) of a group. So usually the population is a large collection of numbers and the sample is a small subset of the population.

Example. Suppose we are studying the daily rainfall in Lawrence. Since daily rainfall could be from 0 inches to anything above 0, the population here is all nonnegative numbers (i.e., the interval [0, ∞)). A sample from this population would be the observed amount of daily rainfall in Lawrence on some number of days. A sample of size 11 would be the observed daily rainfall in Lawrence on 11 days.

Variables

Many definitions of variables are available in standard textbooks. For our purpose the following definition will suffice.

Definition. A variable is a rule or a formula or a mechanism that associates a value with each member of the population. So, given a member w, a variable X assigns a valueX(w) to w. For us X(w) will be a characteristic (like height, weight, time, salary) of the population.

Example. Suppose we are studying the KU student population. The population is the whole collection of KU students. A KU student is a sample unit. If GPA is the "characteristic" that we are studying, then X = the GPA of a student is a variable. So, given a student, X has a value. For example:

X(Donald Smith) = 3.25, X(Sam Donaldson) = 3.11,

X(Karen Currie) = 3.89, X(King Who) = 2.13

On the other hand, if GENDER is the "characteristic" that we are studying, then Y = gender of a student is a variable. So, given a student, Y has a value. For example:

Y(Donald Smith) = Male, Y(Sam Donaldson) = Male,

Y(Karen Currie) = Female , Y(King Who) = Male

If HEIGHT is the characteristic that we are studying, then Z = height of students is a variable.

To give another example, if credit hours completed is the characteristic studied, T = the number of course credit hours completed so far by a student is a variable.

Similarly, given any other characteristic like weight, annual income, annual expenditure, you can construct a variable for this population.

A variable that takes numerical values is called a quantitative variable. So, the variables X, Z, and T above are quantitative variables, while Y is not. A variable that takes non-numerical values is called a qualitative variable. So, the variable Y above is a qualitative variable. We will mostly be concerned with quantitative variables.

We discuss two types of quantitative variables: continuous and discrete variables. A quantitative variable that can assume any numerical value over an interval is called acontinuous variable. Since Z above can (hypothetically) assume any value between 0 to 100 inches, Z is a continuous variable. T assumes only integer values and is therefore not a continuous variable.

A different way to understand a discrete variable is that the possible values of the variable can be written down (or can be counted) in a (finite or infinite) list. We say that the values of a discrete variable are countable.

A quantitative variable is called a discrete variable if its possible values consist of breaks between successive values. If a variable assumes only a finite number of values, then it is also called a finite variable. Otherwise the variable is called an infinite variable. A finite variable is definitely a discrete variable. The variable T above is a discrete variable.

Examples of Continuous and Discrete Variables

1. The examples of continuous variables are weight, length, volume, area, and time.2. For this course, examples of discrete variables are always the number of something—number of typos, number of

road accidents, number of phone calls.

Page 3: Elementary Statistics 3

Parameters and Statistics

Definition 1. Given a set of data, any numerical value computed from the data using a formula or a rule is called a quantitative measure of the data.

Definition 2. A quantitative measure of a population data is called a parameter. In other words, parameters belong to the whole population and are computed (if feasible) from the WHOLE population data. Examples: the average GPA of all KU students, the height of the tallest student in KU, the average income of the entire KU student population.

One way to study a population is to know some of the parameters of the population. Unfortunately, computing such parameters could be expensive or even impossible. Essentially, parameters are unknown and the main game of statistics is to try to estimate parameters on the basis of small samples collected from the population.

Definition 3. A quantitative measure of a sample data is called a statistic. So, any constant that we compute from a sample is a statistic. We use these statistics to estimate the parameters of the population. For example, the average height computed from a sample is a reasonable estimate for the (parameter) average height of the KU student population. Obviously, we do not expect the value of the statistic to be exactly equal to the parameter value. Hopefully, the error will be small or will exceed our tolerable limit very rarely (say once in a 100 trials).

Why do we need a statistic?

Sometimes it will be impossible to know the actual value of a parameter. For example, let μ be the mean length of the life of light bulbs produced by a company. In this case, the company cannot test all the bulbs it produces to find a mean length. So, the best it can do is to test a few bulbs, compute the sample mean length (a statistic) of the life of these bulbs and use it as an estimate for the mean length (parameter μ) of the life for all the bulbs it produces.

Definition 4. The data that has not been processed or organized in any form is called raw data. When the data is arranged in an increasing or decreasing order, then it is called an array. The range of the data is the difference between the largest and the smallest value of the data.

range = highest value - lowest value.

1.2 Frequency Distribution

In this section we talk about representation of data organized in tabular form. Such a representation is called a frequency distribution. We are mostly concerned with numerical data (i.e., quantititative data), but also consider some non-numerical data (i.e., qualitative data).

Example. (from Khazanie, p. 18) The following is data on the blood group of 36 patients in a hospital:

O A B O A A A O O

O A O A B O O O AB

B A A O O A A O AB

O A A B A O A O O

We have four types of blood groups, namely, O, A, B, AB. Each of these blood groups may be referred to as a "class." The frequency of a class is defined as the number of data members that belong to that class. For example, the frequency of the class O is 16; the frequency of class A is 14. A table that lists the classes and the corresponding frequency is called the frequency distribution of this qualitative data. Following is the frequency distribution of this data:

Blood Group Frequency

O 16

A 14

B 4

Page 4: Elementary Statistics 3

AB 2

Total 36

Ungrouped Data

For the quantitative data, we consider two types of frequency table. When we are working with a large set of data we group that data into a few classes and construct a "frequency table," which we will discuss later. If the data set is small or if the number of values that appear in the data is small we need not group the data. Instead, we make a list of all the data members and give the corresponding frequency for each data member in a table. The number of times a data member (i.e., value) appears in the data is called the frequency of the data member. A list that presents the data members and the corresponding frequency in a tabular form is called a frequency table orfrequency distribution. The relative frequency and percentage frequency of a data member x are defined as follows:

relative frequency of x =

frequency of x

total # of data points

and

percentage frequency of x =

frequency of x

total # of data points

· 100.

The frequency table may also contain the relative and percentage frequency. Since we did not group the data into a few classes, we call this the frequency distribution of the ungrouped data.

Example 1.2.1 To estimate the mean time taken to complete a three-mile drive by a race car, the race car did several time trials, and the following sample of times taken (in seconds) to complete the laps was collected:

50 48 49 46 54 53 52 51 47 56 52 51

51 53 50 49 48 54 53 51 52 54 54 53

55 48 51 50 52 49 51 53 55 54 50

Note that there are 35 observations here. So we say that the size of the sample (or data) is 35. Also the values present are 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56. Since there are only 11 distinct values present we can make a frequency table for the ungrouped data. The following is the frequency distribution of this ungrouped data:

Time(in seconds)

FrequencyRelative

FrequencyPercentage Frequency

46 1 1/35 2.86

47 1 1/35 2.86

48 3 3/35 8.57

49 3 3/35 8.57

50 4 4/35 11.43

51 6 6/35 17.14

52 4 4/35 11.43

53 5 5/35 14.29

54 5 5/35 14.29

Page 5: Elementary Statistics 3

55 2 2/35 5.71

56 1 1/35 2.86

Total 35 1 100

Grouped Data

When we are working with a large set of data that has too many distinct class member (i.e., values) then we group the whole set of data into a few class intervals and give the corresponding "frequency" of the class. When the data is presented in this way, the data is called grouped data. The number of data members that fall in a class interval is called the class frequency and the relative and percentage frequencies are computed by the same formula as above. A list that gives various class intervals and the corresponding class frequencies in a tabular form is called a class frequency table or class frequency distribution of the data. The frequency distribution may also include the relative and percentage frequencies.

Grouped Data and Loss of Information

Sometimes it is convenient or necessary to group data into class intervals and construct a class frequency distribution. This is the case when there are too many distinct numbers present in the data—too many even to fit into a simple table on a page for presentation. In such situations, we group the data in a few class intervals. While class frequency distribution is very good for presentation and convenient for other reasons, we lose a lot of information in this process. There is no way we can recover the original data from the class frequency distribution.

Given a set of data, a good question would be, How many class intervals should we have? The answer is that it should not be too few nor should it be too many. If we take too few (say one), then all the information will be lost. On the other hand, if we take too many, we will have the problem of having to work with ungrouped data. (In this course we will always tell you how many classes to take.) Although sometimes it may be necessary to take class intervals of varying width, in this course we only consider classes of equal class width.

Steps to Construct Frequency Distribution

1.Range: Pick a suitable number L less than or equal to the smallest value present in the data. Pick a suitable

number H greater than or equal to the highest value present in the data. The range R that we consider is R = H - L.

2.Number of Classes: Decide on a suitable number of classes. (In this course we will tell you the number of

classes.)

3.Class Width: We have

class width = w =

R

Number of classes

We will pick L, H, and the number of classes so that class width is a "round number."

4.Classes: We divide our interval [L:H] into subintervals, to be called classes, as

[L,L+w],[L+w,L+2w],[L+2w, L+3w], ...,[H-w,H]

Since this definition creates an ambiguous situation in which a data value may fall into two classes, we need a convention to address this situation.

Page 6: Elementary Statistics 3

5.Frequency: Find the frequency for each of the classes. You can use an advanced calculator or some software

(like Excel) to count frequencies.

A few more important definitions. The above intervals are called class intervals. The w above is called the class size or width. The lower end of the class is called lower limit and the upper end of the class is called upper limit. The class mark is the midpoint of the class, defined as follows:

class mark =

lower limit of class+ upper limit of class

2

.

A class limit is also called a class boundary. I took a slightly different approach when I defined the classes, so that for us class limits and class boundaries are the same. Although all the approaches are essentially the same, many slightly different approaches are possible depending on the situation.

Example 1.2.2 The following is the weight (in ounces), at birth, of a certain number of babies.

74 105 124 110 119 137 96 110 120 115 140

65 135 123 129 72 121 117 96 107 80 91

74 123 124 124 134 78 138 106 130 97 145

93 133 128 96 126 124 125 127 62 127 92

95 118 126 94 127 121 117 124 93 135 156

143 125 120 147 138 72 119 89 81 113 91

133 127 138 122 110 113 100 115 110 135 141

97 127 120 110 107 111 126 132 120 108 148

143 103 92 124 150 86 121 98 74 85 99

We will construct a class frequency table of this data by dividing the whole range of data into class intervals.

Solution: Note that the lowest value is 62 and the highest value is 156. We take L = 60, H = 160, so R = H-W = 100. We made such a choice of L and H, precisely so that R = 100 is a "nice" number. Now we decide to have 5 class intervals and so w = R/5 = 20. According to what I said above, our classes should be : [60, 80], [80,100], [100,120], [120,140], [140, 160]. But if we do so then there is a risk that some data members (like 80, 100, 120, 140) will fall in two classes. One way to avoid this is to add .5 to all the class boundaries. So, our classes are [60.5, 80.5], [80.5, 100.5], [100.5, 120.5], [120.5, 140.5], [140.5, 160.5].

So the frequency distribution is as follows:

Classes FrequencyRelative

FrequencyPercentage Frequency

60.5 - 80.5 9 9/99 9.09

80.5 - 100.5 20 20/99 20.20

100.5 - 120.5 25 25/99 25.26

120.5 - 140.5 37 37/99 37.38

140.5 - 160.5 8 8/99 8.08

Total 99 1 100

Page 7: Elementary Statistics 3

1.3 Pictorial Representation of Data

Another way to represent data is to use pictures and graphs. We see such pictorial representation in newspapers and other sources every day. Pictorial representation is particularly important when you have to represent data to people with limited technical background, like newspaper readers or a governmental or congressional body.

The Pie Chart

The pie chart is a commonly used pictorial representation of data. When you do your tax return every year, you find a few pie charts in the instruction book for form 1040. These charts show what proportion/percentage of each tax dollar goes for particular expenses. I reproduced the following pie charts from the 1040 instruction book of 1999.

Pie charts are self explanatory; we do not need to discuss them further.

The Histogram

Among pictorial representations, the most useful in this course is the histogram. The histogram of data is the graphical representation of the frequency distribution of the data, where we plot the variable on the horizontal axis and above each class interval, we erect a bar of the height equal to the frequency of the class. Such a histogram is called a frequency histogram.

If, instead, we erect bars of height equal to the relative frequency, then the graph is called a relative frequency histogram. Similarly, we can construct a percentage frequency histogram.

The following is a histogram.

Page 8: Elementary Statistics 3

We have decided to avoid unequal class lengths, which makes our discussion of the histogram fairly simple.

Remark. Take a look at the Stem and Leaf Diagram discussed in any textbook.

Example 1.3.1. Following is the frequency table of data on height (in inches) of some babies at birth. Sketch the histogram of the following data:

Height Frequency

16-17 3

17-18 8

18-19 34

19-20 60

20-21 72

21-22 18

The Cumulative Frequency Distributions

For a given value x of a variable, the cumulative frequency of the data, for x, is the number of data members that are less than or equal to x.

Definition. Given a frequency distribution of some data, for a class boundary x, the cumulative frequency is the sum of all the class frequenies less or equal to x. Thecumulative frequency distribution is a table that gives the cumulative frequencies against some x values (for us the class boundaries). We also define cumulative relative frequency and cumulative percentage frequency as follows:

cumulative relative frequency of x =cumulative frequency of x

total # of data points

cumulative percentage frequency of x=

cumulative frequency

total # of data points

×100

Example 1.3.2 Once again we consider the data on birth weight of babies in Example 1.2 that we discussed in the last section. A cumulative frequency distribution can be constructed from the frequency distribution.

Solution: We have seen the frequency distribution before. The following is the cumulative distributions:

Page 9: Elementary Statistics 3

WeightCumulative Frequency

Relative-Cumulative Frequency

CumulativePercentage Frequency

60.5 0 0 0

80.5 9 9/99 9.09

100.5 29 29/100 29.29

120.5 54 54/99 54.55

140.5 91 91/99 91.92

160.5 99 1 100

The Ogive

Definition. The ogive is a line graph, where we plot the variable on the horizontal axis and the cumulative frequency on the vertical axis. If we plot the cumulative relative frequency on the vertical axis, then the line graph is called the relative frequency ogive.

Use of Calculators

Because we will be using calculators (TI-83) extensively in this course, let me explain how you enter data in the TI-83.

Use of Calculators (TI-83):

Enter Your Data:

1. Press the button "stat."2. Select "Edit" in the Edit menu and enter.3. You will find 6 lists named L1, L2, L3, L4, L5, L6.4. Let's say you want to enter your data in L1. If L1 has some data, you clear it by pressing the stat button

and selecting ClrList in the Edit menu. ClrList appears then type L1 and hit enter. To type "L1" on your TI-83 simply press 2nd then 1.

5. Once L1 is cleared, you select Edit in the Edit menu and enter.6. Now type in your data; enter one by one.

Page 10: Elementary Statistics 3

It is not easy to construct a frequency table of a data set unless you are systematic. Traditionally, we used "tally marks" to count the frequency. Now you can use some software programs (e.g., Excel). Let me show you a method, using a calculator (TI-83).

1. Press "stat."2. To input data, enter "edit."3. Enter your data (say in L1).4. Press "stat."5. Enter "sortA" L1.6. Press "stat" and then enter "edit." On L1 you will see that the data is sorted in an increasing order.7. Now you can count the frequencies.

Problems on 1.2: Frequency Distribution

Exercise 1.2.1 To estimate the mean time taken to complete a three-mile drive by a race car, the race car did several time trials, and the following sample of times taken (in seconds) to complete the laps was collected:

50 48 49 46 54 53 52 51 47 56 52 51

51 53 50 49 48 54 53 51 52 54 54 53

55 48 51 50 52 49 51 53 55 54 50

The following is the frequency distribution of this ungrouped data:

Time(in seconds)

FrequencyRelative

FrequencyPercentage Frequency

46 1 1/35 2.86

47 1 1/35 2.86

48 3 3/35 8.57

49 3 3/35 8.57

50 4 4/35 11.43

51 6 6/35 17.14

52 4 4/35 11.43

53 5 5/35 14.29

54 5 5/35 14.29

55 2 2/35 5.71

56 1 1/35 2.86

Total 35 1 100

Construct a histogram.

Exercise 1.2.2. The following is the weight (in ounces), at birth, of 96 babies born in Lawrence Memorial Hospital in May 2000.

94 105 124 110 119 137 96 110 120 115 119

104 135 123 129 72 121 117 96 107 80 80

Page 11: Elementary Statistics 3

96 123 124 124 134 78 138 106 130 97 134

111 133 128 96 126 124 125 127 62 127 96

116 118 126 94 127 121 117 124 93 135 112

120 125 120 147 138 72 119 89 81 113 100

109 127 138 122 110 113 100 115 110 135 120

97 127 120 110 107 111 126 132 120 108 148

133 103 92 124 150 86 121 98

Construct a class frequency table of this data by dividing the the whole range of data into class intervals:

[60.5-70.5], [70.5-80.5], [80.5-90.5], [90.5-100.5], [100.5-110.5], [110.5-120.5], [120.5-130.5], [130.5-140.5], [140.5-150.5]

Solution

Exercise 1.2.3. The following are the length (in inches), at birth, of 96 babies born in Lawrence Memorial Hospital in May 2000.

18 18.5 19 18.5 19 21 18 19 20 20.5

19 19 21.5 19.5 20 17 20 20 19 20.5

18 18.5 20 19.5 20.75 20 21 18 20.5 20

21 19 20.5 19 20 19.5 17.75 20 19.5 20

20.5 17 21 18.5 20 20 20 18.5 19.5 19

18 20.5 18 20 19 19 19.5 20 20.75 21

17.75 19 18 19 20 18.5 20 19 21 19

19.5 20 20 19 19.5 20 19.5 18.5 20.5 19.5

20.25 20 19.5 19.5 20 20 20 21 20 19

18.5 20.5 21.5 18 19.5 18

Construct a frequency table for this data by dividing the whole range into class intervals:

[16-17], [17-18], [18-19], [19-20], [20-21], [21-22].

Note: If a data member falls on the boundary, count it in the right/upper class-interval.Solution

Exercise 1.2.4. The following data represents the number of typos in a sample of 30 books published by some publisher.

156 159 162 160 156 162

159 160 156 156 160 162

156 159 162 156 162 158

160 158 159 162 158 158

162 160 159 162 162 160

Construct a frequency table (by sorting in your calculator). Also construct a histogram.Solution

Exercise 1.2.5. Following is data on the hourly wages (paid only in whole dollars) in an industry.

9 11 8 9 10 11 7 10 12 13

Page 12: Elementary Statistics 3

7 11 8 11 14 9 10 9 11 7

13 13 14 12 9 8 12 14 15 9

9 7 12 7 12 7 7 11 13 9

11 9 9 9 10 14 11 12 14 7

Construct a frequency table (by sorting in your calculator). Also construct a histogram.Solution

Exercise 1.2.6. Following is data on the hourly wages (paid only in whole dollars) of 99 employees in an industry.

7 11 7 11 10 9 10 10 12 13

7 8 11 11 14 9 7 9 11 7

9 13 12 14 7 8 7 14 15 9

9 7 11 9 12 9 12 11 14 9

12 13 7 9 10 14 11 12 13 7

15 15 16 16 15 16 11 7 18 19

15 16 15 15 16 16 17 16 16 13

15 15 16 15 16 15 15 17 16 12

16 15 15 16 15 15 19 8 16 17

16 16 15 16 16 16 13 12 8

Construct a frequency table (by sorting in your calculator).

Lesson 2 : Measures of Central Tendency and Measures of Dispersion

Introduction 2.1 Measures of Central Tendency: Mean

2.2 Measures of Central Tendency: Mean, Median, Mode

2.3 Measures of Dispersion Homework 4 - 7

Introduction

In this lesson we talk about two types of constants that we compute from data:

1. measures of central tendency and

2. measures of dispersion.

A measure of central tendency represents an "average value." Mean, median, mode (if you already know these) are measures of central tendency. A measure of dispersion is a measure of how widely the data is scattered around.

2.1 Measure of Central Tendency: Mean

The most common measure of central tendencies is the mean or arithmetic mean.

Definition. The mean or the arithmetic mean of a set of data is given by

mean = sum of all the data values .

Page 13: Elementary Statistics 3

size of the data

If we denote a data value (i.e., the variable) by x and if n is the size of the data, then the above formula is written as

mean = x =

∑ x

n

.

OR

mean = x = ∑ x/n where ∑ denotes summation.

If the data is a sample, then the mean is called the sample mean. Again, if x denotes the variable, the data is sometimes denoted by x1,x2, ... ,xn and then

mean = x

=

n ∑ xii=1

n

.

OR

mean = x = n ∑ xi/n

i=1If you have not seen the notation ∑ before, it simply means summation. For example,

n ∑

i = 1

xi = x1+x2+ ... +xn

Weighted Mean

Sometimes, different values in data carry different weight. Let us consider the following data and the corresponding frequency distribution that we computed earlier:

Example 2.1.1 To estimate the mean time taken to complete a three-mile drive by a race car, the race car did several time trials. The following are sample times taken (in seconds) to complete the laps:

50 48 49 46 54 53 52 51 47 56 52 51

51 53 50 49 48 54 53 51 52 54 54 53

55 48 51 50 52 49 51 53 55 54 50

Following is the frequency distribution of this data:

Time (in seconds) 46 47 48 49 50 51 52 53 54 55 56

Frequency 1 1 3 3 4 6 4 5 5 2 1

Now we want to compute the mean time. So, we add all the data values and divide by the data size 35. We already have computed the frequency distribution which tells us that, in the data, 46 was present 1 time, 47 was present 1 time, 48 was present 3, times and so on. So, using the frequency distribution, we compute the mean as follows :

mean=x=(46x1+47x1+48x3+49x3+50x4+51x6+52x4+53x5+54x5+55x2+56x1)

(1+1+3+3+4+6+4+5+5+2+1)=1799/35=51.4

The mean of the original data is the weighted mean of the data values 46, 47, 48, 49, 50, 51, 52, 53, 54, 55 and 56 with the corresponding frequency as the weight. So, a new formula for the mean would be

Page 14: Elementary Statistics 3

x

= mean =

n ∑ xi fii=1

n

∑fii=1

OR

mean

= x =

n n ∑ fi xi / ∑fi

i=1 i=1

where fi is the frequency of xi. The weighted mean is defined in more general context as follows:

Definition. If x1, x2, ... , xn in a data set have different weights and the values xi has weight wi, then the weighted

mean is defined as

weighted mean =n

i=1wixi

n

i=1

wi

.

OR

weighted mean = x = ∑wixi / ∑wi

Properties of the Mean

1. Combining two means. Suppose we have two sets of data. The mean of the first set is x, and the size of the

first set is m; the mean of the second set is y, and size of the second set is n. The mean of the combined data is

Combined mean = (m x +ny)/(m+n)

This is the weighted mean of x, y with weight m,n respectively.

2. Effect of translation. Let x be the mean of x1, x2, ... , xn. Then the mean of y1 = x1+d, y2 = x2+d, ... ; yn =

xn+d is given by

y = x+d

3. Effect of multiplication by a constant. Let x be the mean of x1, ... , xn. Then the mean of

z1 = cx1, z2 = cx2, ... , zn = cxn

is given by

z = cx

Page 15: Elementary Statistics 3

Properties of Mean

Remark (effect of translation): Your teacher tells you that the mean score for the midterm in your class is 73. After you complained and requested a change, he agreed that all can add 7 points to their score. The new mean score is (old mean + 7) = 73 + 7 = 80. This is what we meant by "effect of translation."

Example (effect of multiplication by c): Suppose you have some data x1, x2, ..., xn on salaries in an industry in the

United States and the mean is $37000. On a certain day, 1 U.S. dollar = 1.4729 Canadian dollars (say c = 1.4729). So, in Canadian dollars the mean is 37000*c = 37000 x 1.4729. Similarly, the change of units (inches to feet or cm) are "multiplication by a constant c."

Example 2.1.2. A student took PHSX 115 (College Physics), PSYC 120 (Personality), FREN 110 (Elementary French), BUS 241 (Managerial Accounting), and MATH 365 (Elementary Statistics). The number of credit hours and the student's grade is given in the following table:

Course PHSX 115 PSYC 120 FREN 110 BUS 241 MATH 365

Grade (Points) B (3 points) A (4 points) B (3 points) C (2 points) B (3 points)

Credit Hours 4 3 5 3 3

What is the student's GPA?

Solution. The GPA is the weighted average of the points (corresponding to the grades), weight being the course-credit hours. So, the GPA = (3x4+4x3+3x5+2x3+3x3)/(4+3+5+3+3) = 54/18 = 3.

2.2 Measure of Central Tendency: Median, and Mode

The Median

The median represents the middle value of the data. Half the data will be less than or equal to the median, and half the data will be greater than or equal to the median. You are above the median American income if half the American population is making less than you make.

Definition. Suppose the data is arranged in an increasing order (i.e., in an array). If the size of data is ODD then the median is the middle value. If it is EVEN, then themedian is the mean of the middle two values.

The Percentiles

Definition. For a number p between 0 to 100, the pth percentile xp of the data is a number such that at least p percent of

the data members are below xp and at least (100 - p) percent of the data members are above xp.

1. The 25th percentile is called the first quartile Q1.

2. The median is the 50th percentile, also called the second quartile Q2.

3. The 75th percentile is called the third quartile Q3.

The Mode

There is one other measure of central tendencies that should be mentioned.

Definition. The MODE of the data is the value or values that have the highest frequency. For example, the mode of the set {1, 3, 5, 5, 7} is {5} because it has the highest frequency. The mode of {1, 1, 3, 5, 5, 7} is {1, 5} because 1 and 5 both have the highest frequency. Such a set is said to be bimodal.

Use of Calculators (TI-83):

Entering your data

1. Press the button stat.2. Select "Edit" in the Edit menu and enter.3. You will find six lists named L1, L2, L3, L4, L5, L6.

Page 16: Elementary Statistics 3

4. Let's say you want to enter your data in L1.5. If L1 has some data, clear it by pressing the stat button and selecting ClrList in the Edit

menu.6. Once L1 is cleared, select Edit in the Edit menu and enter.7. Now type in your data and enter one by one.

Sorting data and computing the median

1. Enter your data in a list, say L1.2. Select SortA in the Edit menu and enter.3. The calculator will ask for the list. Type in the list (L1), close the parentheses, and enter.4. The calculator will say Done.5. Press stat, select edit in the Edit menu, and enter.6. You will see that your data in L1 has been sorted in an increasing order.7. If the data size is odd, the median is the middle value.

If the data size is even, the median is the average of the middle two values.

Computing the mean if only raw data is given

1. Enter your data in a list, say L1.2. Select "1-Var Stats" in the CALC menu and enter.3. The calculator will ask for the list. Type in the list L1 and enter.

4. The calculator will give a list of numbers; x-bar is the mean x.

Computing the mean if the frequency table is given

1. Enter the frequency table in the calculator, say, x-values in L1 and frequencies in L2.2. Select "1-Var Stats" in the CALC menu and enter.3. The calculator will ask for the lists. Type in the list L1, L2 and enter.

4. The calculator will give a list of numbers; x-bar is the mean x.

Problems on 2.2: Mean and Median

Exercise 2.2.1. The following is the price (in dollars) of a stock (say, CISCO SYSTEMS) checked by a trader several times on a particular day.

138 142 127 137 148 130 142 133

Find the median price and mean price observed by the trader. Solution

Exercise 2.2.2. The following figures refer to the GPA of six students.

3.0 3.3 3.1 3.0 3.1 3.1

Find the median and mean GPA.

Exercise 2.2.3. The following data give the lifetime (in days) of light bulbs.

138 952 980 967 992 197 215 157

Page 17: Elementary Statistics 3

Find the mean and median lifetime of these bulbs. Solution

Exercise 2.2.4. An athlete ran an event 32 times. The following frequency table gives the time taken (in seconds) by the athlete to complete the events.

Time (in seconds) Frequency

26 3

27 6

28 5

29 6

30 9

31 3

Total 32

Compute the mean and median time taken by the athlete. Solution

Exercise 2.2.5. Following is data on the weight (in ounces), at birth, of 96 babies born in Lawrence Memorial Hospital in May 2000.

94 105 124 110 119 137 96 110 120 115 119

104 135 123 129 72 121 117 96 107 80 80

96 123 124 124 134 78 138 106 130 97 134

111 133 128 96 126 124 125 127 62 127 96

116 118 126 94 127 121 117 124 93 135 112

120 125 120 147 138 72 119 89 81 113 100

109 127 138 122 110 113 100 115 110 135 120

97 127 120 110 107 111 126 132 120 108 148

133 103 92 124 150 86 121 98

Compute the mean and median weight, at birth, of the babies. Solution

Exercise 2.2.6. Following is data on the hourly wages (paid only in whole dollars) of 99 employees in an industry.

7 11 7 11 10 9 10 10 12 13

7 8 11 11 14 9 7 9 11 7

9 13 12 14 7 8 7 14 15 9

9 7 11 9 12 9 12 11 14 9

12 13 7 9 10 14 11 12 13 7

15 15 16 16 15 16 11 7 18 19

15 16 15 15 16 16 17 16 16 13

15 15 16 15 16 15 15 17 16 12

16 15 15 16 15 15 19 8 16 17

16 16 15 16 16 16 13 12 8

Compute the mean and median hourly wage. Solution

Page 18: Elementary Statistics 3

Exercise 2.2.7. Following is the frequency table on the number of typos in a sample of 30 books published by a publisher.

No. of Typos 156 158 159 160 162

Frequency 6 4 5 6 9

Find the mean and median number of typos in a book. Solution

Exercise 2.2.8. Following is data on the length (in inches), at birth, of 96 babies born in Lawrence Memorial Hospital in May 2000.

18 18.5 19 18.5 19 21 18 19 20 20.5

19 19 21.5 19.5 20 17 20 20 19 20.5

18 18.5 20 19.5 20.75 20 21 18 20.5 20

21 19 20.5 19 20 19.5 17.75 20 19.5 20

20.5 17 21 18.5 20 20 20 18.5 19.5 19

18 20.5 18 20 19 19 19.5 20 20.75 21

17.75 19 18 19 20 18.5 20 19 21 19

19.5 20 20 19 19.5 20 19.5 18.5 20.5 19.5

20.25 20 19.5 19.5 20 20 20 21 20 19

18.5 20.5 21.5 18 19.5 18

Compute the mean and median length, at birth, of these babies. Solution

2.3 Measures of Dispersion

RangeClearly, the measures of central tendency—mean, median, mode—cannot tell us the "whole story" about the data.

Example 2.3.1. Suppose two sections of the statistics class have the following percentage score distribution at the end of the semester:

Section A 81 84 83 80 82

Section B 72 93 92 82 71

Both these sections have the same mean—82. But in Section A, everybody will get a B grade. In section B, we will have two C's, one B and two A's.

The measure of dispersion is a measure of how widely the data is scattered around. In section A, the data has a very small dispersion or variability, whereas section B has a large dispersion.

A very simple measure of dispersion is the range of the data as we have defined before:

range = largest value - smallest value.

Mean Deviation, Sample Variance, and Standard Deviation

We will discuss three more measures of dispersion.

Suppose we have a data set x1, x2, ... , xn of size n. We will denote the mean of the data by x. Three definitions follow:

Definition. The mean deviation of the data is defined as follows.

Page 19: Elementary Statistics 3

mean deviation = ( |x1- x | + ... + |xn- x |) / n

So, the mean deviation is the mean of the absolute deviations | xi -x | from the mean.

Definition. The sample variance s2 of the data is defined as follows:

s2 = ( (x1- x)2 + ... + (xn- x)2 ) / (n -1)

Remark.

1. Note that we denote the sample variance as the square of a number s.2. Also note that we divide by n-1, not by n. For some reason, dividing by n-1 works better.

3. We would like our measure of dispersion to have the same units as our data, but our formula involves squares (xi-

x)2, which means the unit of dispersion, s2, is the unit of the data squared. If the data is in feet, the variance is in square feet. To solve this problem we define another measure of dispersion, standard deviation denoted s.

Definition. The sample standard deviation s is defined as the square root of the sample variance s2. So, to compute the sample standard deviation, we have to compute the sample variance first.

If we simplify the definition of sample variance we get the following formula:

s2 =( (x12 + x2

2 + ... + xn2) - nx2)/(n - 1)

Let us quickly do some computation with the above example 2.3.1.

The mean deviation for section A = (1+2+1+2+0)/5= 6/5 and the mean deviation for section B = (10+11+10+0+11)/5= 42/5. Since the variability of section B was much higher, the mean deviation was very high.

Let us compute the the sample variances :

For section A the sample variance is

( (81-82)2+(84-82)2+(83-82)2+(80-82)2+(82-82)2 )/(5-1) = (1+4+1+4+0) /4= 10/4 = 2.5 .

For section B the sample variance is

( (72-82)2+(93-82)2+(92-82)2+(82-82)2+(71-82)2 )/(5-1) =(100+121+100+0+121) /4= 442/4.

Application of Standard deviation

The mean and the standard deviation tell us a lot about how the data is distributed.

Chebyshev's Rule. This rule applies for all kinds of data. Suppose x is the mean and s is the standard deviation of the data. Then we have the following:

1. At least 0 percent of the observations will fall within 1 standard deviation of the mean, i.e, within (x-s, x+s). This is clearly obvious.

2. At least 75 percent of the observations will fall within 2 standard deviations of the mean, i.e., within (x-2s, x+2s).3. At least 89 percent of the observations will fall within 3 standard deviations of the mean, i.e., within (x-

3s, x+3s).

4. More generally, at least 100(1 - 1/k2) percent of the data will be within k- standard deviations from the mean, i.e.

within (x-ks, x+ks).

Chebyshev's Rule makes no assumption about the data or the variable. If we make some assumptions about the data, then we can improve the above rule as follows.

The Empirical Rule: Suppose the histogram of the data is symmetric around the vertical line x = x as follows:

Page 20: Elementary Statistics 3

In other words, the histogram should fit into a bell-shaped curve.

Bell-shaped Curve

Click to see the Flash animation. Then we have the following:

1. Approximately 68.3 percent of the observations will fall in the interval (x-s, x+s).2. Approximately 95.4 percent of the observations will fall in the interval (x-2s, x+2s).3. Approximately 99.7 percent of the observations will fall within the interval (x-3s, x+3s).

Question: What does it mean when the variance or mean deviation of some data is zero? The answer is that all the data members are EQUAL!

Practice Problem. Consider the exercises 2.2.1 through 2.2.8. For each problem, compute the mean and standard deviation of the data and find what percentage of the data are within one, two, or three standard deviations from the mean.

Use of the Frequency Table

When a frequency table is given, we can use new formulas to compute the mean and variance of the data.

Formulas. Suppose the data consisting of n observations are given in a frequency table (ungrouped). Let xi denote the

values and fi be the frequency of xi. Then

1. the mean =

x

=

∑ fixi∑ fi

=

∑ fixin

,

Page 21: Elementary Statistics 3

2. the variance =

s2 =

∑ fi(xi - x)2n- 1 ,

3. A simplified formula for variance is

s2 =

1

n- 1

[∑ (fixi2) - n x2 ].

4. If the data is given in a frequency table of the grouped data, we use the same formula, with xi as the class

mark, which is the average of the class limits.

Example 2.3.2. The following table extends the frequency table of the time taken to complete a lap by a race car (example 2.1.1) to compute mean and variance using the above formulas.

Time x

Frequency f

fx fx2

46 1 46 2116

47 1 47 2209

48 3 144 6912

49 3 147 7203

50 4 200 10000

51 6 306 15606

52 4 208 10816

53 5 265 14045

54 5 270 14580

55 2 110 6050

56 1 56 3136

Total 35 1799 92673

So, the mean x = 1799/35 =51.4 and variance

s2 = (92673 - 35x 51.42)/(35-1) = 6.0118.

Example 2.3.3. Following is the class frequency distribution of the data on birth weight of some babies (exercise 1.2, Lesson 1):

ClassesFrequency

fClass Mark

xfx fx2

60.5-80.5 9 70.5 634.5 44732.25

80.5-100.5 20 90.5 1810 163805

100.5-120.5 25 110.5 2762.5 305256.25

120.5-140.5 37 130.5 4828.5 630119.25

140.5-160.5 8 150.5 1204 181202

Page 22: Elementary Statistics 3

Total 99 11239.5 1325114.75

We can use the above formula to compute (approximate) variance and the standard deviation of the birth weight.

So, the mean x = 11239.5/99 = 113.53 and variance

s2 = (1325114.75 - 99 x 113.532)/(99-1) = 500.997.

Remarks.

1. Note that we can only get an approximate mean and variance if we use the class mark and with the above formula. If you also use the original data you may notice a difference.

2. Because of the availability of computers, the importance of such approximations has declined.

Comment: We have had detailed discussions of various formulas for defining the mean, variance, and other constants. It is important to understand these concepts and formulas.

It is equally important to appreciate the value and necessity of using calculators or other available software (like Excel). It is almost impossible (and unnecessary) to compute these constants manually and correctly, unless one is specially gifted with numerical computations.

Use of Calculators (TI-83):

Computing the variance and standard deviation

1. Follow the same steps used for computing the mean (using either raw data or the frequency table).

2. The calculator will give a list of numbers; SX is the standard deviation.

3. The variance is the square of the standard deviation.

Problems on 2.3: Variance, Standard Deviation, and Use of the Frequency Table

Exercise 2.3.1. The following is the price (in dollars) of a stock (say, CISCO SYSTEMS) checked by a trader several times on a particular day.

138 142 127 137 148 130 142 133

Find the variance and standard deviation of the price. Solution

Exercise 2.3.2. The following figures refer to the GPA of six students.

3.0 3.3 3.1 3.0 3.1 3.1

Find the variance and standard deviation of GPA.

Exercise 2.3.3. The following data give the lifetime (in days) of certain light bulbs.

138 952 980 967 992 197 215 157

Find the variance and standard deviation of the lifetime of these bulbs. Solution

Page 23: Elementary Statistics 3

Exercise 2.3.4. An athlete ran an event 32 times. The following frequency table gives the time taken (in seconds) by the athlete to complete the events.

Time (in seconds) Frequency

15.6 3

15.7 6

15.8 5

15.9 6

16.0 9

16.1 3

Total 32

Compute the variance and standard deviation of time taken by the athlete. Solution

Exercise 2.3.5. Following is data on the weight (in ounces), at birth, of 96 babies born in Lawrence Memorial Hospital in May 2000.

94 105 124 110 119 137 96 110 120 115 119

104 135 123 129 72 121 117 96 107 80 80

96 123 124 124 134 78 138 106 130 97 134

111 133 128 96 126 124 125 127 62 127 96

116 118 126 94 127 121 117 124 93 135 112

120 125 120 147 138 72 119 89 81 113 100

109 127 138 122 110 113 100 115 110 135 120

97 127 120 110 107 111 126 132 120 108 148

133 103 92 124 150 86 121 98

Compute the variance and standard deviation of the weight, at birth, of these babies. Solution

Exercise 2.3.6. Following is data on the hourly wages (paid only in whole dollars) of 99 employees in an industry.

7 11 7 11 10 9 10 10 12 13

7 8 11 11 14 9 7 9 11 7

9 13 12 14 7 8 7 14 15 9

9 7 11 9 12 9 12 11 14 9

12 13 7 9 10 14 11 12 13 7

15 15 16 16 15 16 11 7 18 19

15 16 15 15 16 16 17 16 16 13

15 15 16 15 16 15 15 17 16 12

16 15 15 16 15 15 19 8 16 17

16 16 15 16 16 16 13 12 8

Compute the variance and standard deviation of the hourly wages. Solution

Exercise 2.3.7. Following is the frequency table on the number of typos in a sample of 30 books published by a publisher.

Page 24: Elementary Statistics 3

No. of Typos 156 158 159 160 162

Frequency 6 4 5 6 9

Find the mean number, variance, and standard deviation of typos in a book. Solution

Exercise 2.3.8. Following is data on the length (in inches), at birth, of 96 babies born in Lawrence Memorial Hospital in May 2000.

18 18.5 19 18.5 19 21 18 19 20 20.5

19 19 21.5 19.5 20 17 20 20 19 20.5

18 18.5 20 19.5 20.75 20 21 18 20.5 20

21 19 20.5 19 20 19.5 17.75 20 19.5 20

20.5 17 21 18.5 20 20 20 18.5 19.5 19

18 20.5 18 20 19 19 19.5 20 20.75 21

17.75 19 18 19 20 18.5 20 19 21 19

19.5 20 20 19 19.5 20 19.5 18.5 20.5 19.5

20.25 20 19.5 19.5 20 20 20 21 20 19

18.5 20.5 21.5 18 19.5 18

Compute the variance and standard deviation of the length, at birth, of these babies. Solution

Exercise 2.3.9. The following is the frequency table of weight (in pounds) of some salmon in a river. Find the variance and standard deviation.

Weight x 31 32 33 34 35 36 37

Frequency f 3 2 4 5 6 5 9

Find the variance and the standard deviation. Solution

Exercise 2.3.10. The following data represents the time (in minutes) taken by students to drive to campus.

23 17 19 24 42 33 20 22 15 9

26 37 29 19 35 18 30 21 11 23

13 27 32 32 23 35 25 33 24 23

Find the mean, variance, and the standard deviation of the data.

Lesson 3 : Probability

Introduction 3.1 Basic Concept of Probability 3.2 Sets and Subsets, Statistical Experiments, Sample Space, Events, Probability

3.3 Laws of Probability 3.4 Counting Techniques and Probability 3.5 Conditional Probability and Independent Events

Homework 8 - 11

Page 25: Elementary Statistics 3

Introduction

Probability to a statistician is the probability of the occurrence of an event. To an ordinary person it is the quantified chance of occurrence of that event.

Some of the early theory of probability originated in gambling and later theories developed in bioscience. We get very tempted when we see somebody win $1 million in a lottery, but lottery operators design their games and machines in such a way that they will make more money than they give, in the long run.

3.1 Basic Concept of Probability

We are all familiar with simple probabilistic statements. If you toss a coin the probability that the HEAD will show up is 1 out of 2. If you roll a die the probability of getting the face 5 is 1 out of 6. The probability of having an accident on a particular busy street on a particular day is 1 out of 100. (When a child says "probably we should invite Aaron for my birthday," however, the "probably" may have little to do with mathematics of probability, but shows the awareness of the concept of probability at a basic human level.)

When we toss a coin for a large number of times we find that essentially half the time the head shows up. As we continue to toss, we see that the ratio of the number of Heads to the number of tosses remains close to and moves around 1/2. So we say that if we toss a coin, the probability that the head will show up is .50. On the other hand, if this ratio remains close to and moves around .49 then we will say the probability of heads is .49. To understand the concept of probability empirically, we visit aflash animation of a coin tossing experiment.

We observe the accidents on a street over a long period of time and observe that on about one in a hundred days there is an accident. The longer we continue to observe, we see that the ratio of the number of days there is an accident to the number of days observed remains close to one to one hundred. So we say that probability of an accident on a day on that street is 1 percent.

These examples explain the basic notion of probability. The probability of an event is understood as the "relative frequency," the ratio of occurrences of the EVENT to the total number of times the EXPERIMENT is repeated.

3.2 Sets and Subsets, Statistical Experiments, Sample Space, Events, Probability

This section provides basic definitions that we will need for the rest of the course.

Sets and Subsets

Definition. By a set S we mean a collection of objects. The objects in this set S are also called elements of the set. A set E is said to be a subset of a set S if each element of E is also an element of S. We write

E S⊆

to mean that E is a subset of S. Obviously, a subset E of S is a smaller collection than or equal to S.

The following are some examples. We also explain the usage of braces to describe a set.

1. Let D = the collection of all 52 cards in a deck. Then D is a set. Let E be the collection of all the hearts in this deck. Then E is a subset of D. In brace notation

E={x in D : X is a Heart }

This is read "the set of x in D such that x is a heart"

Page 26: Elementary Statistics 3

2. Let T be the collection of all those who filed a tax return to the IRS for the year 1999. Then T is a set. Let L be the collection of those whose Adjusted Gross Income in the return was less or equal to $30,000. Then L is a subset of T. Let C be the collection of those who declared capital gains income. Then C is a subset of T. We write

L T⊆

C T⊆

In brace notation

L = {x T : the Adjusted Gross Income of x is less or equal to $30,000}.∈

The symbol means "an element of"∈ x T means x is an element of T∈

3. Let N be the collection of all integers, and let E be the collection of even integers. Then N, E are set and

E N⊆

In brace notation

N = {n : n is an integer} E = {n N : n is even}. ∈

4. Let R be the set of all (real) numbers. Let I be the set of all numbers between 0 and 1, not equal to 0,1. Then R,I are sets and I is a subset of R. In brace notation

R = {x : x is a real number} I = {x ∈ R : 0 < x < 1}.

5. S = {1,7,13,17,19} is a set.

6. Let S be the collection of you and your siblings, B be the collection of your brothers, and F be the collection of your sisters. Then S,B,F are sets and we have

F S⊆

B S.⊆

Statistical Experiments and Sample Space

Definitions.

1.A statistical experiment is a procedure that produces exactly one out of many possible outcomes. All the

possible outcomes are known, but which outcome will result when you perform the experiment is not known.

2.Given an experiment, the set of all possible outcomes is called the sample space.

3.Given an experiment, an outcome of the experiment is called a sample point. So, the sample space consists of

sample points.

Page 27: Elementary Statistics 3

Examples. The following are examples of some experiments and their sample spaces.

1.Suppose your experiment is tossing a coin. The outcomes are H (heads) and T (tails). So, the sample space is S =

{H,T}.

2.Suppose your experiment is tossing a coin twice. The sample points (or outcomes) are HH,HT,TH,TT and the

sample space is S = {HH,HT,TH.TT}.

3.Your experiment is rolling a die. The outcomes are 1,2,3,4,5,6 and the sample space is S = {1,2,3,4,5,6}.

4. Suppose that your experiment is rolling a die twice. Then the sample space is

S=

(1,1) (1,2) (1,3) (1,4) (1,5) (1,6)

(2,1) (2,2) (2,3) (2,4) (2,5) (2,6)

(3,1) (3,2) (3,3) (3,4) (3,5) (3,6)

(4,1) (4,2) (4,3) (4,4) (4,5) (4,6)

(5,1) (5,2) (5,3) (5,4) (5,5) (5,6)

(6,1) (6,2) (6,3) (6,4) (6,5) (6,6)

In brace notation, we can write

S = {(i,j) : i = 1,2,3,4,5,6 and j = 1,2,3,4,5,6}.

5. Suppose your experiment is to determine the number of road accidents in Lawrence on a particular day. So, the sample space is S = {0,1,2,3 ... }.

6. Suppose the experiment is to determine the sex of an unborn chlid. Then the sample space is S = {Female, Male}.

7. Suppose your experiment is to determine the blood group of a patient in a lab. Then the sample space is S = {O,A,B,AB}.

8. Suppose your experiment is to observe the annual wheat production in Kansas. Then the sample space is S={x : x is a nonnegative Number} = {x ∈ R : x ≥ 0} =[0, ∞).

Definition. The sample space S is called a finite sample space if S has only a finite number of outcomes. If S has infinite elements, it is called an infinite sample space. Note that examples 1, 2, 3, 4, 6, and 7 above have finite sample spaces, and 5 and 8 have infinite sample space.

Events

Definitions. Given an experiment and its sample space S, the following are important definitions.

1.A subset of the sample space S is called an event. So, an event E consists of outcomes, and we have

E S.⊆

2.An event that has no outcome and is called the∅ empty event or impossible event. The impossible event

consists of no outcome; if you perform the experiment, the impossible event will never occur.

Page 28: Elementary Statistics 3

3.Since S is also a subset of S, S is an event. This event S is called the sure event. If you perform the experiment,

this event is sure to occur.

4.A simple event consists of a single outcome.

Remark. Often, we will describe events in "English," and we may have to identify them as a subset of the sample space and also conversely.

Examples. The following are some examples of events.

1. Look at example 2 above—the experiment on the coin toss. Let E be the event that at least one of the tosses gave T, and let F be the event that both tosses gave the same face. Then

E = {HT, TH, TT} and F = {HH,TT}.

2. Look at example 4 above—the experiment on rolling a die. Let E5 be the event that first die showed 5. Then

E5 = {(5,1), (5,2), (5,3), (5,4), (5,5), (5,6)}.

Let T5 be the event that the sum of the two "rolls" is 5. Then

T5 = {(1,4), (2,3), (3,2), (4,1)}.

Let T1 be the event that the sum of the two rolls is 1. Because T1 has no outcome, it is an impossible event. Let

T13 be the event that the sum of the two rolls is 13. Then T13 is also an impossible event.

3. Look at the example 5 above—the experiment on road accidents. Let E be the event that there is no accident on that day. Then

E = {0}.

4. Look at example 8 above—the experiment on annual wheat production. Let E be the event that there will be more than 1000 units of wheat production in 1998. Then

E = (1000, ∞).

The Theory of Probability

Given a sample space S, in the MATHEMATICS of probability we have rules for how to compute the probability of an event E. Although the MATHEMATICS of probability was inspired by the empirical concept of probability, we do not derive anything from our intuitive ideas. We are guided by the precise rules and laws that we set up.

For now we will be dealing with finite sample spaces.

Definition. Let

Page 29: Elementary Statistics 3

S = { e1, e2, ... ,en }.

be a finite sample space. The probability of a simple event {e} is a number (possibly given) denoted by P({e}) which has the following properties:

1. 0 ≤ P({e}) ≤ 1.

2. The sum of the probabilities of all the simple events is 1:

P({e1}) + P({e2}) + ... + P({en}) = 1.

3. If E is an event, then the probability E, P(E), is defined as the sum of the probabilities of all the sample events in E:

P(E)= ∑ P({e})

e E∈

4. So, we also have

P(impossible Event)=P ( )=0∅

P(Sure Event)=P(S)=1

Remark. If we know the probabilities P({e}) of all the simple events {e}, we will be able to compute the probability of any event E using 3. The probabilities of the simple events will

1. either be given

2. or we will be given a rule how to compute it.

Probability with Equally Likely Outcomes

One of the most frequently used models to compute probabilities of simple events is called EQUALLY LIKELY OUTCOMES.

Definition. Let S = {e1, ... , eN} be a finite sample space. We say that all the outcomes are equally likely if all the

outcomes have the same probability. So, in this case, we have

P({e1}) = P({e2}) = … = P({eN}) = 1/N.

Also, in this case, for an event E

P(E) = ∑ P({e}) = ∑ 1/N

e E∈ e E∈

=(Number of Outcomes in E)/(Number of Outcomes in S)

If n(E) denotes the number of outcomes in E then

P(E) = n(E) .

Page 30: Elementary Statistics 3

n(S)

Problems on 3.2

Probability of Simple Events Given in a Table

Exercise 3.2.1. The following table gives the blood group distribution of a certain population.

Blood Group Distribution

Blood Group O A B AB

Percentage ofPopulation

47 42 8 3

Find the probability that a random sample of blood will be of Blood Group A or B or AB. (Here S={O, A, B, AB} and we want to compute the probability P(E) of the event E={A, B, AB}.Solution

Exercise 3.2.2. A student wants to pick a school based on its grade distribution. Following is the most recent grade distribution in a school:

Grade Distribution Unreal Data

Grades A B C D F

Percentage ofStudents

19 33 31 14 3

Find the probability that a randomly picked student will have at least a B average. Solution

Exercise 3.2.3. The following table gives the probability distribution of a loaded die.

Probability Distribution for a Die

Face 1 2 3 4 5 6

Probability 0.20 0.15 0.15 0.10 0.05 0.35

Find the probability that the face 2 or 3 or 6 will show up when you roll the die. Solution

Find the Probability with Equally Likely Outcomes

Exercise 3.2.4. An urn contains 7 apples and 3 oranges and 5 pears. One piece of fruit is picked at random. Find the probability that

1. the fruit is an apple,

Page 31: Elementary Statistics 3

2. the fruit is either an apple or a pear, and

3. the fruit is an orange.

Solution

Exercise 3.2.5. A die is rolled twice. Find the probability that

1. the sum is 8,

2. only 2 or 3 showed up in both the rolls, and

3. the first roll produced a bigger number.

Solution

Exercise 3.2.6. A letter is chosen at random from the letters of the English alphabet. Find the probability that

1. the letter is either I or U,

2. the letter is in the word ALWAYS, and

3. the letter is not in the word NEVER.

Solution

3.3 Laws of Probability

Notations from Set Theory

Following are a few notations from the set theory, which we will be using in the context of sample spaces and events.

Notations. Let S be a set and E, F be two subsets of S.

1.The union E F, of E and F is the set defined as follows:∪

E F = {x S : x E∪ ∈ ∈ or x F}.∈

So, if you put together the elements of E and F in a single collection, you get the union E F.∪

2.The intersection E F, of E and F is defined as follows:∩

E F = {x S : both x E and x F}.∩ ∈ ∈ ∈

So, if you take all the elements common to both E and F, you get the intersection of E and F.

3.The complement Ec, of E is defined as follows:

Page 32: Elementary Statistics 3

Ec = {x S : x E}.∈ ∉

So, the complement Ec of E is the collection of all the elements in S that are not in E.

Remark. If we can understand and interpret the above definitions in our context of sample spaces and events, that is adequate. For us, S will be a fixed sample space and E,F will be events.

1.E F is the event that consists of all outcomes that are either in E or in F (or both). So the occurrence of either E∪

or F is the same as the occurrence of E F. That is why some textbooks use the notation∪ (E or F) for E F. So,∪ notationally, as in some textbooks,

E F = E∪ or F.

2. E F is the event that consists of all the outcomes that are both in E and F. So the simultaneous occurrence of E∩ and F is the same as the occurrence of E F. That is why E F is denoted by (E and∩ ∩ F) in some texts. Notationally, as in some textbooks,

E F = E∩ and F.

3.Similarly, Ec is the event that consists of all the outcomes in S that are not in E. So, the occurrence of Ec is the

same as the nonoccurrence of E. Notationally, as in some textbooks,

E c = (not E)

Laws of Probability

Following are some of the laws of probability.

First, probability behaves like area and the laws of probability are like that of area.

Some formulas and definitions: Let S be sample space and let E and F be two events.

1. We have

P(E F) = P( E∪ or F) = P(E)+P(F)- P(E F)∩

= P(E) + P(F) -P(E and F)

We subtract P(E F) because we counted it twice: once in P(E) and once in P(F).∩

2.Definition. We say E and F are mutually exclusive if E F = , i.e., E and F have no outcome in common.∩ ∅

Since P( ) = 0, it follows from 1 that∅

if E and F are mutually exclusive then

Page 33: Elementary Statistics 3

P(E F) = P(E) + P(F)∪

3. We also have

P(Ec) = 1 - P(E).

4.Definition. Let E be an event. We say that the odds of an event E occuring are a to b if

P(E) = a/(a+b)

Remark: This concept of ODDS is used often in gambling. When the odds in favor of a horse are 2 to 3, essentially this means that the probability the horse will win is 2/5. We say "essentially" because in actual betting, the probability is actually slightly less than 2/3, so that in the long run the gambling establishment makes more money than it gives. (This instructor is not particularly experienced in such betting or horse races.)

Problems on 3.3: Laws of Probability

Exercise 3.3.1. Let E, F, G be three events. It is given

P(E)=0.3 P(F)=0.7 P(G)=0.6 P( E F) = 0.2∩ P( E G) = 0.7 ∪

Find the probability that

1. E or F occur,

2. both E and G occur, and

3. E does not occur.

Solution

Exercise 3.3.2. Let E, F, G be events .

1. If the odds in favor of E are 3 to 5, find the probability that E occurs.

2. If the odds against F are 3 to 4, find P(F).

3. If P(G) = 7/10, what are the odds in favor of G?

Exercise 3.3.3. The probability that a Christmas tree is taller than 6 feet is .30; the probability that a Christmas tree weighs more than sixty pounds is 0.25; and the probability that a Christmas tree is either taller than 6 feet or more than sixty pounds is .4.

1. Find the probability that a Christmas tree is both taller than 6 feet and weighs more than sixty pounds.

2. Find the probability that a Christmas tree is not taller than 6 feet.

Page 34: Elementary Statistics 3

3. Find the probability that a Christmas tree is either less than 6 feet tall or less than sixty pounds in weight.

4. Find the probability that a Christmas tree is neither taller than 6 feet nor heavier than sixty pounds.

Solution

Exercise 3.3.4. The probability that a student majors in liberal arts is .44; the probability that a student majors in business is .33; and the probability that a student majors in either liberal arts or business is .65. Find the probabilities

1. that a student majors in both liberal arts and business.

2. that a student majors in neither liberal arts nor business.

Solution

3.4 Counting Techniques and Probability

Counting techniques are important and useful to learn. You might like to know, for example,

1. the number of English words (formal) of 5 letters, (A formal word is any sequence of letters from the English alphabet. For example, eezq is a formal word.)

2. the number of ways you can deal a hand of 13 cards from a deck of 52 cards, or

3. the number of ways you can assign the first row of 11 seats to 231 guests.

Before we go further into counting, let us recall the factorial notation.

Notations. Let n be a positive integer. Then the n! (read as factorial n) is defined as

n!= 1 . 2 . … (n-2) . (n-1) n

0!=1.

Factorial n is the product of all integers from 1 up to n.

One of the main tools for such counting is the following principle:

The Basic Counting Principle. Suppose we have an experiment that is a combination of r sub-experiments, performed one after the other, such that

1. the first sub-experiment has n1 outcomes;

2. corresponding to each outcome of the first sub-experiment, the second sub-experiment has n2 outcomes;

3. corresponding to each outcome of the first and the second sub-experiments, the third sub-experiment has n3 outcomes;

Page 35: Elementary Statistics 3

r. corresponding to each outcome of each of the previous r-1 sub-experiments, the rth sub-experiment has nr outcomes.

Then our original experiment will have n1n2 ... nr outcomes.

Remark. Here we have used the word "experiment" in a slightly different sense than the statistical experiments. The basic counting principle will be used to count the number of outcomes in sample spaces and events.

Examples.

3.4.1. Count the number of words of length four that you can construct from the English alphabet. Answer: 26x 26x26x26

We use the counting principle by splitting this experiment into four sub-experiments:

Stage Job to do Number of Ways

1. Pick the first letter 26

2. Pick the 2nd letter 26

3. Pick the 3rd letter 26

4. Pick the 4th letter 26

Answer = Product = 456976

3.4.2. Count the number of ways you can assign the 11 seats in the first row in a concert hall to 231 guests.

Stage Job to do Number of Ways

1. Assign seat 1 231

2. Assign seat 2 230

3. Assign seat 3 229

4. Assign seat 4 228

5. Assign seat 5 227

6. Assign seat 6 226

7. Assign seat 7 225

8. Assign seat 8 224

9. Assign seat 9 223

10. Assign seat 10 222

11. Assign seat 11 221

Answer = Product = 221*222*...*230*231

3.4.3. Contrast: How many ways can you form a committee of 11 members from a group of 231 people? Unlike assigning seats, here the order of selection of the members will be ignored. The 11 members, when permuted around, will have different seat assignments but in the same committee. Forming the committee is a "combination" problem that comes below.

Remark. The difference between assigning 11 seats in a row and forming a committee of 11 is that in the first case the order of assignment is important. Assigning the first row to the same 11 guests in two different ways will count as

Page 36: Elementary Statistics 3

two different outcomes. When we form a committee, the order in which we pick 11 members does not make any difference.

Definition. Suppose we have n objects. We pick r of them one by one (without ever puttting them back) and arrange them in a row. Such an ordered arrangement will be called a permutation of n objects taken r at a time. The number of permutations of n objects taken r at a time is denoted by nPr. It follows from the basic counting principle that

nPr = n (n-1) (n-2) ... (n-r+1) = n!/(n- r)!

Number of permutations nPr = product of r integers starting from n downward.

In contrast, we can pick r objects from a collection of n objects one by one but place the object back in the collection before the next pick, and arrange all of them in a row. Such selection and arrangement is called picking with replacment. Constructing a formal word of length 4 is an experiment of picking with replacement.

Remark: Example 3.1 is a problem on picking with replacement because a letter can be selected more that once. Example 3.2 is a permutation problem.

Definition. Suppose we have n objects in a container. We pick r of them all at a time. In this case the order of selection does not come into consideration. Such a selection is called a combination of n objects taken r at a time. The number of combinations of n objects taken r at a time is denoted by nCr and is given by

nCr = n!

(r! (n-r)!)

Examples. 1. Count the number of ways you can form a committee of 11 from a group of 231 people. Answer: 231C112. Count the number of ways you can deal a hand of 13 cards from a deck of 52 cards. Answer: 52C13.

Problems on 3.4: Counting Techniques and Probability

Exercise 3.4.1. Find 5! Solution

Exercise 3.4.2. A homeowner would like to install a new storm door. The local store offers 2 brand names; each brand has 4 different styles and 3 colors. How many choices does the homeowner have?Solution

Exercise 3.4.3. Suppose in the World Cup soccer tournament, group A has 8 teams. Each team of group A has to play all

the other teams in the group. How many games will be played among the group A teams. Answer: 8C2

Exercise 3.4.4. How many ways can you deal a hand of 13 cards from a deck of 52 cards? Answer: 52C13

Exercise 3.4.5. How many ways can you deal a hand of 4 spades, 3 hearts, 3 diamonds, and 3 clubs?Solution Solution-variation

Exercise 3.4.6. We have 13 students in a class. How many ways can we assign the 4 seats in the first row? Solution

Page 37: Elementary Statistics 3

Exercise 3.4.7. Programming languages sometimes use a hexadecimal system (also called "hex") of numbers. In this system, 16 digits are used and denoted by 0,1,2,3,4,5,6,7,8,9,A,B,C,D,E,F. Suppose you form a 6-digit number in a hexadecimal system.

1. What is the probability that the number will start with a letter digit?

2. What is the probability that the number is divisible by 16 (i.e., ends with 0)?

SolutionHere the sample space is the collection of all the 5-digit hex numbers.

• Using the counting principle, the number of hex = n(S) = 166.

• Let E be the event that the number starts with a letter digit.

• Again, by the counting principle, the number of hex in E = n(E) = 6*165.

• So, P(E) = n(E)/n(S) = 6/16.• Let F be the event that the number is divisible by 16. Since a number is divisible by 16 means, in hex, the first

digit is 0.

• So, the number of hex in F = n(F) = 165*1 = 165.

• So, P(F) = 165/166 = 1/16.

Exercise 3.4.8. You are playing Bridge and you are dealt a hand of 13 cards.

1. What is the probability that you will get a hand of 4 spades, 3 hearts, 3 diamonds and 3 clubs?

2. What is the probability that you will get all 4 aces?

3. What is the probability that you will get all 13 spades?

Solution

Exercise 3.4.9. A committee of 9 is selected at random from a group of 11 students, 17 mothers and 13 fathers.

1. What is the probability that the committee has 3 students, 3 mothers, and 3 fathers, i. e., is a balanced committee?

2. What is the probability that the committee has 4 mothers and 5 fathers?

3. What is the probability that the committee has all students?

Solution

Exercise 3.4.10. Three scholarships of unequal value will be awarded from a group of 35 applicants. How many ways can such a selection be made? Solution

3.5 Conditional Probability and Independent Events

Sometimes when new information becomes available, the probability of an event may have to be reevaluated in light of this new information. Suppose we have a sample space S and an event E. Now suppose we have new information that an event C has occurred. We will have to reevaluate the conditional probability of E given that C has occurred. The conditional probability of E given that C has occurred is denoted by P(E|C). Clearly, P(E|C) may be different from P(E). In

Page 38: Elementary Statistics 3

fact, now that C has occurred, our old sample space is no longer relevant. And C assumes the role of the new sample space.

Example. Suppose we pick a KU student at random and let E be the event that the student is taller than 6 feet. Then we have the following observations.

1. The sample space S is the whole KU student population.

2. Since all the outcomes are equally likely, we have

P(E) =

number of KU students who are taller than 6 feet

Total number of KU students

=

n(E)

n(S)

.

3. Now suppose we know that the student selected is a male. Let us denote the event that the student is a male by C. The probability that the student is taller than 6 feet, given that the student is a male, is higher than "simple" P(E). In fact, our new sample space is C, which is the whole KU male student population, not S, which is the whole KU student population.

4. We now have the probability that the student is taller than 6 feet in height given that the student is a male

= P(E|C) =

number of MALE students who are taller than 6 feet

Total number of male KU- students

=

n(E C)∩

n(C)

.

Simple computations show that

P(E|C) =

n(E C)/n(S)∩

n(C)/n(S)

=

P(E C)∩

P(C)

.

Based on the above example, we give the following definition and formula.

Definition. Let S be a sample space and E, C be two events.

1. The conditional probability of E given that C has occurred is

P(E|C) =

P(E C)∩

P(C)

if P(C) ≠ 0.

2. We get the following formula

Page 39: Elementary Statistics 3

P(E C) = P(E|C)P(C).∩

Independent Events

If the conditional probability P(E|F) = P(E) the "simple" probability, then we say that E and F are independent. In this case,

P(E F) = P(E)P(F)∩ .

Definition. We say that two events E and F are independent if

P(E F) = P(E)P(F)∩ .

If two events are not independent, then they are said to be dependent.

Remark. Let us also describe what we mean by independence of 3 or more events. For events E1,E2, … , En, we say they

are independent if the "multiplication rule" applies. For example E,F,G,H are independent if all of the following holds:

2 events

P(E F) = P(E)P(F), ∩ P(E G) = P(E)P(G),∩

P(E H) = P(E)P(H), ∩ P(F G) = P(F)P(G),∩

P(F H) = P(F)P(H), ∩ P(G H) = P(G)P(H)∩

3 events

P(E F G) = P(E)P(F)P(G), ∩ ∩ P(E F H) = P(E)P(F)P(H), ∩ ∩

P(E G H) = P(E)P(G)P(H), ∩ ∩ P(F G H) = P(F)P(G)P(H) ∩ ∩

4 events

P(E F G H) = P(E)P(F)P(G)P(H)∩ ∩ ∩

Problems on 3.5: Conditional Probability and Independent Events

Exercise 3.5.1. Let A, B be two events. Given that

P(A) = .66 P(A B) = .11∩

Find P(B|A). Solution

Exercise 3.5.2. Given

P(A|B) = .8 P(B) = .1

Find P(A B).∩ Solution

Page 40: Elementary Statistics 3

Exercise 3.5.3. In a certain county, the probability that a person took a flu shot is .45 and the probability that a person will get flu, given that he/she took a flu shot is .06. What is the probability that a randomly selected person took a flu shot and will get flu? Solution

Exercise 3.5.4. Consider the following two circuit diagrams:

Circuit 1

Circuit 2

For each of the two circuits do the following:As you can see, current flows through two switches A and B to the radio and back to the battery. It is given that the probability that the switch A is closed is 0.91 and the probability that the switch B is closed is 0.83. Assume that the two switches function independently. Find the probability that the radio is playing. Solution

Exercise 3.5.5. An airplane has two engines. The probability that engine 1 fails is 0.023 and the probability that engine 2 fails is 0.06. Assume that the engines function independently.

1. What is the probability that both engines fail?

2. What is the probability that both will not fail?

3. What is the probability that neither will fail?

Solution

Exercise 3.5.6. Following are data from a hospital emergency room:

1. The probability that a patient in the emergency room will have health isurance is 0.75.

2. The probability that a patient in the emergency room will survive the treatment 0.85.

3. The probability that a patient in the emergency room will have health insurance and will also survive is 0.7.

What is the conditional probability that a patient in the emergency room will survive, given that he/she has health insurance. Solution

Exercise 3.5.7. The probability that you will receive a wrong number call this week is 0.3; the probability that you will receive a sales call this week is 0.8; and that the probability that you will receive a survey call this week is 0.5. What is the probability that you will receive one of each this week? (Assume that all these calls are independent.)

esson 4 : Random Variables

Page 41: Elementary Statistics 3

4.1 Random Variables 4.2 Probability Distribution 4.3 The Bernoulli and Binomial Experiments

Homework 12 and 13

4.1 Random Variables

Definition. Let S be a sample space. Then a random variable X assigns a numerical value X(w) to each outcome w in S.

Examples. Suppose we pick a KU student at random. Then our sample space S is the whole population of KU students.

1. Let X be the GPA of the student. If w is a student, X has a value X(w) which is the GPA of w.2. Define Y as follows :

Y(w) = 0 If w is Male Y(w) = 1 If w is Female

3. Let Z be the height of student w.4. Let T be the number of credit hours completed by w.5. Let W be the weight of w.6. Let D be the total expenses (rounded up to the nearest dollar) of w in 1997.

Then X,Y,Z,T,W,D are all random variables.

Definitions. A random variable X is said to be a discrete random variable if the values that X can assume can be written in a (possibly infinite) list

x1, x2, x3, ….

A random variable X is said to be a continuous random variable if X can assume any value in an interval.

Remark. In this course, examples of discrete random variables are always the number of something: number of typos, number of accidents on a street, number of defective items in a lot, and so on. Examples of continuous random variables are length, weight, and time.

So, Z,W are continuous random variables and X,Y,T,D are discrete random variables.

Examples.

1. Let X be the number of wrong number calls you receive in a day. Then X is a discrete random variable.

2. Let X be the waiting time before you receive the next wrong number call. Then X is a continuous random variable.

First, we will be concerned with the discrete random variables.

4.2 Probability Distribution

The probability distribution of a random variable X is a table or a rule or a method that answers probability-related questions regarding X.

Definition. Suppose X is a discrete random variable that assumes the values

x1,x2,….

The probability distribution of X can be described by giving

p(xi) = P(X = xi)

in a table or by a formula. This function p(xi) is called the probability function of X.

So, if the probability distribution of X is given in a table, then it looks like this:

Page 42: Elementary Statistics 3

Value x

Probability p(x)

x1 p(x1)

x2 p(x2)

x3 p(x3)

… …

Properties of Probability function. Suppose X is a discrete random variable that assumes value

x1, x2, x3, …

and let p(x) be the probability function. Then we have the following:

1. 0 ≤ p(xi) ≤ 1.

2. ∑ p(xi) = 1.

Definition. Let X be a discrete random variable that assumes the values

x1, x2, x3, …

Then the mean μ of X is defined as

μ=∑ xip(xi).

The mean μ is also called the expected value of X and is denoted by E(X). The mean μ is also called the population mean.

Example. Suppose you design a coin toss game. In this game, you give the opponent $3 if a head comes and you collect $1 if a tail comes. Let X be the money you receive. Then X assumes the values -3 and 1. You also have a loaded coin so that

P(H) = 1/9 P(T) = 8/9.

Then the probability distribution of X is given by

Value x

Probability p(x)

-3 1/9

1 8/9

So, the mean μ of X is given by

μ=∑ xip(xi)= (-3)(1/9)+1(8/9)=5/9.

Interpretation of mean μ of X. In this example, (see the first example in section 4.1), the mean μ tells us your average win per game if you play for a long time.

Similarly, if Z is the height then the mean μ = E(Z) is the actual mean height of the KU student population. If we take a large sample from the KU student population and compute the sample mean, it should approximate μ.

Definition. Let X be a discrete random variable that assumes values

x1,x2, x3, …

Then the variance σ2 of X is defined as

σ2= Variance(X)=∑ (xi-μ)2p(xi).

Some simplification will show

σ2= Variance(X)=∑ xi2p(xi)-μ

2.

Page 43: Elementary Statistics 3

The standard deviation σ of X is defined as the positive square root of the variance of X.

standard deviation of X= σ =√Variance(X)

The variance σ2 is also called the population variance. If we take a large sample and compute the sample variance

s2 then s2 will be an estimate for σ2. Similarly, σ is called the population standard deviation.

Problems on 4.2: Probability Distribution

Exercise 4.2.1. The number of passengers X in a car on a freeway has the following probability distribution.

X=x 1 2 3 4 5

p(x) 0.35 0.30 0.15 0.15 0.05

Find:

1. the expected number of passengers in a car;

2. the Variance σ2 of the number of passengers;

3. the probability that the number of passengers in a car is at least 3.

Solution

Exercise 4.2.2. Karin is a plumber who works for 3 different employers. Employer A pays her $120 a day, employer B pays her $70 dollars a day, and employer C pays her $180 a day. She works for whoever calls her first. The probability that employer A calls her first is 0.30; the probability that employer B calls first is .20; and the probability that employer C calls her first is 0.40 (the probability that no one calls is .10). What is the expected income and variance of Karin per day? Solution

Exercise 4.2.3. An insurance company sells a flight insurance policy at a flat rate of $500 per flight. If a policyholder dies in flight, the insurance company pays $100,000 to the survivors. The probability that a policyholder will die in flight is .003. What is the expected gain and variance of the company per sale? Solution

4.3 The Bernoulli and Binomial Experiments

There are many random variables that we encounter fairly often. The first one that we discuss is called a Bernoulli random variable.

Definition. There are many statistical experiments that have only two outcomes. In such cases, the outcomes may be called a success or a failure. So the sample space is

S={s,f}.

Here s means success and f means failure.

Such an experiment is called a Bernoulli trial. Given a Bernoulli trial, we can define a random variable as

X = 1 if successX = 0 if failure

If the probability P(success) = p then we have P(failure) = 1-p. So, the probability distribution of a Bernoulli random variable is given by

Value x

Probability p(x)

0 1-p

Page 44: Elementary Statistics 3

1 p

The mean of X is

μ = 0(1-p)+1p = p.

The variance of X is

σ2 = ∑ xi2p(xi) - μ2 = (0.(1-p)+1p) -p2 = p-p2 = p(1-p).

Binomial Random Variable

Definition. An interesting statistical experiment is a combination of n "identical and independent" Bernoulli trials. Such an experiment is called a binomial experiment. More formally, given a positive integer n and a number p with 0 ≤ p ≤ 1 a binomial(n,p) experiment (or B(n,p) experiment) is characterized as follows:

1. A binomial experiment consists of n identical and independent Bernoulli trials.

2. The probability of success in each trial remains fixed and is equal to p.

Definition. Given a B(n,p)-experiment, let

X = total number of successes in these n trials.

Then X is called a binomial (n,p)-random (or B(n,p)-random) variable. Following are some important facts about a B(n,p)-random variable X:

1. X can assume values 0,1,…,n. The probability distribution is given by

p(r) = P(X = r) = P(r success) = nCr pr(1-p)n-r

where r runs through 0,1,2,…,n.

2. The mean of X is

μ = E(X) = np.

3. The variance of X is

σ2 = Variance(X) = np(1-p).

Problems on 4.3: Binomial Experiments

Exercise 4.3.1. Let X be a B(6,.3)-random variable. Find P(X = 2). Also find the probability that X is at least 2. Solution

Exercise 4.3.2. According to a report entitled "Pediatric Nutrition Surveillance" published by Centers for Disease Control (CDC), 18 percent of children younger than 2 years had anemia in 1997. On a particular day, a pediatrician examined 11 children.

1. What is the probability that none will have anemia?2. What is the probability that exactly 5 will have anemia?3. What is the probability that all will have anemia?4. Compute the expectation and variance of the number of children with anemia.5. What is the probability that at least 7 will have anemia?

Page 45: Elementary Statistics 3

Solution

Exercise 4.3.3. A gardener planted 15 seeds. The probability that a seed will germinate is 0.1.

1. What is the probability that exactly 3 seeds will germinate?2. What is the probability that exactly 4 seeds will germinate?3. What is the probability that exactly 9 seeds will germinate?4. Compute the expected number of seeds that will germinate.5. Compute the standard deviation of the number of seeds that will germinate.6. What is the probability that at most 4 seeds will germinate?

Solution

Exercise 4.3.4. In a particular county, 60 percent of the population is Hispanic.

1. What is the probability that a jury of 12 will have exactly 6 Hispanic members?2. What is the probability that a jury of 12 will have more than 6 Hispanic members?

Solution

Exercise 4.3.5. From the hiring statistics of a corporation (say IBM), it is known that for every 4 interviews they give, they make 1 job offer. Suppose that the corporation interviews 8 candidates each time it comes to campus. What is the mean and standard deviation of the number of job offers made each time?

Lesson 5 : Continuous Random Variables

5.1 Probability Density Function (pdf) 5.2 The Normal Random Variable 5.3 Nomal Approximation to Binomial

Homework 14 - 16

5.1 Probability Density Function (pdf)

Given a sample space S, a continuous random variable was defined as a random variable X that can assume any value in an interval. The probability distribution of a continuous random variable is described very differently from that of a discrete random variable. We describe it as follows.

Definition. Let S be a sample space and X be a continuous random variable. Then there is a function f(x), of real numbers x, to be called the probability density function, abbreviated as pdf of X. This pdf f(x) has the following properties:

1. We have f(x) ≥ 0 for all real numbers x.2. For any two real numbers a ≤ b (also for a = -∞ and b = ∞) the probability that X will be between a and b is

given by the area under the graph of y = f(x), above the x-axis and between the vertical lines x = a and x = b. In mathematical notations we have

P(a ≤ X ≤ b) = P(a ≤ X < b) = P(a < X ≤ b) = P(a < X < b) =

the area under the graph of y = f(x), above x-axis, between the vertical lines x = a and x = b.

Page 46: Elementary Statistics 3

Look at the animations on

1. exponential probability.

2. normal probability.

3. If you had calculus, we have

P(a ≤ X ≤ b) = P(a ≤ X < b) = P(a < X ≤ b) = P(a < X < b) = a∫bf(x)dx

4. It follows that for any real number a

P(X = a) = 0.

This is very much in contrast with the discrete random variables.

5. The whole area under the graph of y = f(x) above the x-axis must be one.

Remark. Given a continuous random variable X, to get a model for f(x) we look at a large sample and look at the relative frequency histogram of the X-values.

Example. Let X have the following pdf:

f(x) =1 if 0 ≤ x ≤ 1

0 Otherwise

Then we say X is uniformly distributed between 0 and 1 because it has the same density everywhere between 0 and 1.

Similarly, Y is said to be uniformly distributed between -1 and 3 if the pdf of Y is given by

g(x) =1/4 if -1 ≤ x ≤ 3

0 Otherwise

The Mean and Variance

The mean μ, variance σ2 and standard deviation σ of continuous random variables X are interpreted as we did for discrete random variables. As before, the mean μ, which is also called the expectation E(X), represents the average value of X.

But the definitions involve some calculus, which we are trying to avoid. If you have had calculus, I am giving the following definitions.

Suppose f(x) is the pdf of a continuous random variable X. Then the mean of X is

Page 47: Elementary Statistics 3

μ =E(X)=∫

xf(x)dx-∞

and the variance of X is

σ 2 =Variance(X)=∫ ∞

(x- μ )2 f(x)dx-∞

and the standard deviation σ is the square root of the variance σ2.

Look at the following flash animations of graphs of some pdfs:

1. Example 1. Normal

2. Example 2. t Distribution

3. Example 3. Chi-square Distribution

5.2 The Normal Random Variable

The most commonly encountered random variable in nature is the normal random variable. As we have seen in the last section, the probability distribution of a random variable is determined by the pdf of the random variable. The pdf of a normal random variable is described below.

PDF of a Normal Random Variable: Suppose f(x) is the pdf of a normal random variable X. Then we have the following properties of f(x).

1. The graph of the pdf y = f(x) has a symmetric bell shape as illustrated below:

2. Look at the flash animation of the pdf of normal random variables.

3. The pdf f(x) is completely known if we know the mean μ and the standard deviation σ.

4. The graph is symmetric around the vertical line x = μ. The graph is also peaked at x = μ.

5. The graph approaches the x-axis at both ends of the x-axis.6. The larger the standard deviation σ is, the flatter the the graph of y = f(x) will be.

7. In fact,

f(x)= 1/[σ √(2 Π)] exp [-(x- μ)2/(2σ2)] for - ∞ < x < ∞ .

8. If X is a normal random variable, we say X is normally distributed, or X has normal distribution. We also

write X has N(μ,σ)-distribution.

Definition. A normal random variable is called a Standard Normal Random Variable if it has mean μ = 0 and standard deviation σ = 1. So, a N(0,1)-random variable is called a standard normal variable. In some textbooks the standard normal

Page 48: Elementary Statistics 3

random variable is denoted by Z. The GOOD NEWS is that a table is available to compute these probabilities. The following properties of Z will be useful.

1. The graph of the pdf y = f(x) of the standard random variable Z is symmetric around the y-axis.2. The total area under the graph above the x-axis is one.3. So, on each side of the y-axis, the area under the graph above the x-axis is .5.

4. Visit the flash animation on Standard Normal Probability to see illustrations of the above.

Using the Probability Tables: Tables are used widely to compute probability. However, due to the use of various software programs on probability, the importance of such tables has declined. In this chapter, we will use the Z-table to compute probability for the standard normal random variable. We note the following:

1. Tables are available in many different formats.

2. Visit the Z-table and try to understand it.

3. This table gives P(Z<z) for numbers z.4. The probability P(Z<z) is the area on the left side of z, under the bell curve.5. The number z is read from the left column and top. The probability P(Z<z) is given in the middle.6. So, P(a < Z < b) = P(Z < b) - P(Z<a) = the difference between the probability P(Z < b) and P(Z<a) that we read

from the table.

Inverse Probability: Sometimes we will be given the probability and asked to compute a "cut off" point.

1. Example: We may be given P(Z < c) = .975 and asked to compute c. You will see from the table P(Z<1.96) = .

975 and conclude that c=1.96.

2. Example: We may be given P(l < Z) = .005 and asked to compute l. P(l<Z) represents the area on the right side

of l, under the bell curve. So, P(Z < l) = 1 - .005 = .995. From the table P(Z<2.58) = .995 (actually .9951, but the exact match is not always expected). So, l=2.58.

3. Visit the animation on Inverse Z distribution to inspect a particular type of cut-off problem that we will use later.

Given a N(μ,σ)-random variable X, we can use the Z-table to compute probabilities for X because of the following theorem.

Theorem. Let X be a N(μ, σ)-random variable. Then Z = [(X-μ)/(σ)] is a standard random varable. So,

P(a < X < b) = P(

a- μ

σ

< Z <

b- μ

σ

)

OR

P(a < X < b) = P(A < Z < B)

where A= (a-μ)/σ and B= (b-μ)/σ).

Now we can use the Z-table.

Problem Solving: We will have two types of problems in this section—probability computation and problems of inverse probability (or cut-off points).

1. For a problem on normal random variables X with mean μ and standard deviation σ, the first step

is STANDARDIZATION.2. Then, we look at the Z-table.

3. Example: Suppose X is a N(2, .5) random variable and P(X<L) = .95, what is the cut-off L? First, we standardize

and we have P((X-μ)/σ < (L-μ)/σ) = P(Z < (L-μ)/σ ) = .95. From table, P(Z < 1.65) = .95 (approximately). So, L-μ/σ = 1.65 an L = μ+1.65σ = 2 + 1.65*.5 = 2.825.

Ubiquity of Normal Random Variables: Any random variable that we encounter in nature is, almost certainly, either normal or approximately normal. If there is one concept that you take from this course it is this: nature's random variables are normal or approximately normal. You will hear about normal random variables and the bell curve in your workplace or anywhere you may have to use statistics.

Page 49: Elementary Statistics 3

Problems on 5.2: the Normal Random Variable

Exercise 5.2.1. Let Z be the standard normal random variable.

1. Find the probability P(-1.1 < Z < 2.5).2. Find the probability P(Z < -2.1).3. Find the probability P(-2.1 < Z < -1.5).4. Find the probability P(1.5 < Z).

Experiment with the normal animation. Solution

Exercise 5.2.2. Let X be a normal random variable with mean μ = 3 and standard deviation σ = 1.5 .

1. Find the probability P(-1.1 < X < 2.5).2. Find the probability P(X < -2.1).3. Find the probability P(-1.2 < X < -0.5).4. Find the probability P(1.5 < X).

Experiment with the normal animation. Experiment with the Solution

Exercise 5.2.3. The length of life of some light bulbs produced in a factory is normally distributed with mean 8640 hours and standard deviation 1440 hours. Find the probability that a bulb will last

1. less than 5040 hours;2. between 5040 hours and 8640 hours.

Solution

Exercise 5.2.4. The length X of a fish in a lake has normal distribution with mean 67 cm and standard deviation 21 cm. What proportion (i.e, probability) of fish are between 44 cm and 110 cm long?Solution

Exercise 5.2.5. The diameter of the pumpkins in my patch has normal distribution with mean 13 inches and standard deviation 4.5 inches. What proportion (i.e., probability) of pumpkins is above 22 inches? Solution

Exercise 5.2.6. The annual expenditure X of a student is approximately normally distributed with mean μ = 11,000 dollars and standard deviation σ = 1500 dollars. What percent of students spend less than 10,000 dollars? Solution

Exercise 5.2.7. Suppose the annual production X of milk per cow is normally distributed with μ = 5500 liters and standard deviation σ = 150 liters. What percent of cows have annual yield less than 5155 liters? Solution

Exercise 5.2.8. The amount of vegetable oil X produced by a machine in a day is normally distributed with μ = 130 liters and standard deviation σ = 25 liters. What is the probability that a machine will produce between 120 liters and 150 liters on a day? Solution

Exercise 5.2.9. The weight X at birth of babies is normally distributed with mean μ = 114 oz and standard deviation σ = 18 oz. What percent of babies will have birth weight below 141 oz? Solution

Exercise 5.2.10. Let Z be the standard normal random variable.

1. Given that P(-1.1 < Z < c)=.6881, find c.2. Given that P(Z < c)=0.0222, find c.3. Given that P(c < Z < 1.5) = 0.0919, find c.4. Given that P(c < Z) = 0.102, find c.

Page 50: Elementary Statistics 3

Experiment with the normal animation. Solution

Exercise 5.2.11. The length X of a fish in a lake has normal distribution with mean 67 cm and standard deviation 21 cm. On a fishing trip to the lake, you are instructed to release those in the lower 33 percent in length. What is the cut-off length?Solution

Exercise 5.2.12. The telephone company's data shows that length X of their international calls has normal distribution with mean 11.5 minutes and standard deviation 4.3 minutes. The company decided to give a special rate for the longest 20 percent calls. What is the cut-off time length? Solution

Exercise 5.2.13. The weight X of babies (of a fixed age) is normally distributed with with mean μ = 212 oz and standard deviation σ = 25 oz. Doctors would be concerned (not necessarily alarmed) if a baby is among the lower 5.05 percent in weight. Find the cut-off weight L below which the doctors will be concerned. Solution

Exercise 5.2.14. Monthly water consumption X per household, in a subdivision in Kansas City, has normal distribution with mean 15000 gallons and standard deviation 3000 gallons. It has been decided that a surcharge will be imposed for those in the top 25 percent. Find the cut-off consumption U in gallons. Solution

5.3 Normal Approximation to Binomial

A wide range of random variables behave approximately like a normal random variable. One such example is binomial(n,p)-random variables.

Roughly, if X is a B(n,p) random variable, then X behaves approximately like a normal random variable with mean μ = np

and standard deviation σ = [np(1-p)]1/2.

As we know, a B(n,p) random variable X is discrete and

P(X=r) = nC rpr(1-p)n-r r=0,1,2,…,n.

On the other hand, if Y is a N(μ, σ) random variable then

P(Y = r) = 0.

Because of this, some correction needs to be done. The following theorem states how to use normal approximation to binomial random variables.

Theorem. Suppose X is a B(n,p) random variable. If n is large and p is not very close to 0 or 1, then X behaves, approximately, like a N(μ, σ) random variable where

μ = np and standard deviation σ = [np(1-p)]1/2.

We have, for r=0,1,…,n

P(X = r) = P(r-0.5 < X < r + .5) =P(L < Z < R)

where L=(r-0.5-μ)/σ and R=(r+0.5-μ)/σ.

More generally, for r,s=0,1,…,n

P(r ≤ X ≤ s) = P(r-0.5 < X < s + .5) =P(L < Z < R)

where L=(r-0.5-μ)/σ and R=(s+0.5-μ)/σ. Now use the Z table.

This adjustment by .5 on two sides is called continuity correction.

Page 51: Elementary Statistics 3

Problems on 5.3: Normal Approximation to Binomial

Exercise 5.3.1. A Lawrence bank knows that 35 percent of its customers will visit the drive-through window. If 400 customers visit the bank, what is the approximate probability that more than 120 will visit the drive-through window? Solution

Exercise 5.3.2. It is known that the probability that a household owns a food processor is 0.1. If 190 households are interviewed, find the approximate probability that

1. more than 26 households own a food processor;2. less than 30 households own a food processor.

Solution

Exercise 5.3.3. The campaign committee of a candidate claims that sixty percent of the voters are in favor of the candidate. You interview 150 voters. Assuming that the campaign committe's claim is accurate, what is the approximate probability that less than 77 will favor the candidate?Solution

Exercise 5.3.4. A technique is used to fertilize eggs in a fertility clinic laboratory. It is known that the probability that an egg will be fertilized by this technique is 0.1. If 500 eggs are treated, what is the probability that at least 60 eggs will be fertilized? Solution

Exercise 5.3.5. The probability that a computer chip produced in a factory is defective is is .2. If you have a sample of 60 chips, what is the probability that the number of defective chips will be less than 20? Solution

Exercise 5.3.6. The probability that a light bulb produced by a machine is defective is p = 0.2. Suppose a quality control inspector takes a sample of 120 bulbs. What is the probability that more than 30 bulbs will be defective? Solution

Exercise 5.3.7. Suppose the probability that a student has access to the Internet is p = 0.8. Suppose you interview 160 students. What is the probability that less than 120 students will have access to the Internet? Solution

Exercise 5.3.8. Suppose that the probability that a person favors medical use of marijuana is p = 0.6. If 780 individuals are interviewed, what is the probability that less than 450 will be in favor? Solution

Exercise 5.3.9. Suppose that the probability that a middle-income family invests in the stock market is p = 0.8. If we interview 880 middle-income families, what is the probability that more than 700 have invested in the stock market? Solution

Exercise 5.3.10. Suppose that an insurance company knows from experience that the probability that a life-insurance policyholder will survive another 10 years is p = 0.9. The company has 2280 policyholders. What is the probability that more than 2025 will survive another 10 years.

Lesson 6 : Sampling Distribution

Introduction 6.1 Central Limit Theorem and Sampling Distribution of the Proportion Homework 17

Introduction

The sample mean x that we have computed in the previous chapters is, in fact, the observed value of a random variable X.

Similarly, the sample variance s2 that we have computed before is the observed value of a random variable S2. Each time you collect a sample/data, the computed sample mean x is the value of the random variable X for this sample. This is explained in the following example.

Example. Suppose we want to study the height distribution of the U.S. population. We collect data of size n = 1713. We

Page 52: Elementary Statistics 3

shall consider that height xi of the ith individual in this sample is, in fact, the observed value of a random

variable Xi. Here Xi is the notation for height of the ith member of the sample, which could be the height of any person

from the whole U.S. population. When we finished collecting data we have n measurements

x1, x2, …, xn.

They are, respectively, the observed values of n random variables

X1, X2, …, Xn.

We (re)define the sample mean X as the random variable

X= X

=

X1+X2+…+Xn

n

.

We also (re)define sample variance S2 as the random variable

S2 =

1

n- 1

n ∑

i = 1

(Xi - X

)2.

So, the sample mean we computed before in Lesson 2 is a value of X.

We also say that X1, X2, … , Xn is a sample from the population X = height of an American. We assume that our sampling

was done with replacement. Such a sample has the following properties.

1. Let X = height of an American and let mean of X be μ and variance σ2. Then X is called the parent or

the population random variable. Also μ and σ2 are called the population mean and variance.

2. Then, each of the sample member Xi has the same distribution as X. So, mean of Xi is μ and variance of Xi is σ2.

3. The sample members X1,X2, …, Xn are all mutually independent.

4. The distribution of X is called the sampling distribution of X.

5. Theorem. The mean of the sample mean X is the population mean μ, that is

E(X) = E(X) = μ

The variance of the sample mean X is given by

Var(X) = σ2/n

So, the standard deviation of X, denoted by σ X, is given by

σX = σ/√n.

6. Definition. The standard deviation σX is also called standard error.

Remark. In the above discussion, we have assumed that the sampling was done with replacement. That means that each time a sample member is drawn, it is placed back before we select the next member. A member could, therefore, appear more than once. Although this may seem unnatural, when we are working with a large population this is not likely to happen and is most natural from the statistical point of view. (How often would one receive calls twice for the same poll?)

The type of sampling where we do not place back the item selected before we select the next one is called sampling without replacement. Although many textbooks have a lengthy discussion of this concept, we will not emphasize it. All our samples are drawn with replacement and have the above properties.

6.1 Central Limit Theorem and Sampling Distribution of the Proportion

Central Limit Theorem

Page 53: Elementary Statistics 3

Suppose X1,X2, …,Xn is a sample from a population X with mean μ and variance σ2.

Assume n is large.

1. Then the sample mean X is, approximately, distributed as

N(μ,σX)

where σX= σ/√n.

2. So, approximately,

P(a < X <b)=P(L < Z < R)

where L=(a-μ)/σ X and R=(b-μ)/σ X

OR

P(a < X

< b) = P

a- μ

σ/√n

< Z <

b-μ

σ/√n

.

3. If the parent population X is Normal, then 1) and 2) are exact.

Sampling Distribution of the Proportion

Suppose you are conducting a poll to determine the proportion p (or percentage) of people in favor of a certain presidential candidate. You interview a randomly selected sample of n voters. Then you let X be the number of people among these n voters who are in favor of the candidate. Then X/n is the proportion in this sample that are in favor of the candidate. We use this sample proportion X/n as an estimate for the proportion of the entire voter population that are in favor of the candidate. This is the number X/n that the pollsters report on TV every evening before the election.

Here p is the proportion of voters that are in favor of the candidate. So, X is a B(n,p) random variable. We have already seen (section 5.3 in lesson 5) that, approximately, X follows a N(μ, σ) distribution, where μ = np, σ = √(np(1-p)). From this it follows that the sample proportion X/n, approximately, has

N (p, σ) distribution

where σ =(p(1-p)/n)1/2.

In fact, the same could be derived from the central limit theorem. Let

Y=1 if successY=0 if failure

Here by "success" we mean that the voter is in favor of the candidate. Then Y is a Bernoulli(p) random variable and the mean of Y is p and the variance(Y) = p(1-p). The response of each voter in the sample could thus be represented as a random variable as follows

Xi=1 if ith sample is a success

Xi=0 if ith sample is a failure

Then X1,X2, … , Xn is a sample from the Y- population, and the sample proportion

X/n = X =(X1+X2+… +Xn)/n

is the sample mean. So, by CLT the sample proportion X=X/n, approximately, has

N(p,σ) distribution

where σ=(p(1-p)/n)1/2.

The final formulas regarding sample proportion X=X/n are as follows:

1. The mean μ and the standard deviation σ of X=X/n are given by

Page 54: Elementary Statistics 3

μ = p σ = (p(1-p)/n)1/2.

2. So, approximately,

P(a < X <b)=P(L < Z < R)

where L=(a-μ)/σ and R=(b-μ)/σ

OR

P (a < X/n < b ) = P

a- p

σ X/n

< Z <

b- p

σ X/n

.

Remark. The same thing applies when you are trying to estimate the proportion of success p. Some examples might be the proportion of defective items, the proportion of people in favor of capital punishment, the proportion of immigrants.

Remark. The normal approximation of the sample proportion given above is not really different from the normal approximation of the binomial random variable (section 5.3). The only difference is the way we use them. In section 5.3, we used continuity correction. For large n, continuity correction is, in fact, negligible and will not have any effect.

Problems on 6.1: Central Limit Theorem and Sampling Distribution of the Proportion

Problems on Central Limit Theorem:

Exercise 6.1.1. It is known that the tuition paid per semester by students in a university has a distribution with mean $2,050 and standard deviation $310. If 64 students are interviewed, what is the approximate probability that the sample mean tuition paid will be above $2,060? Solution

Exercise 6.1.2. The monthly water consumption X per household in a subdivision in Kansas City has normal distribution with mean 15000 gallons and standard deviation 3000 gallons. What is the probability that the mean consumption of the 44 households in the subdivision will exceed 16000 gallons?Solution

Exercise 6.1.3. According to some data, the annual Kansas wheat export X has a mean 733 million dollars and standard deviation 163 million dollars. What is the probability that over the next 10 years Kansas wheat exports will exceed 8040 million dollars?Solution

Problems on Population Proportion:

Exercise 6.1.4. According to a report entitled "Pediatric Nutrition Surveillance" published by Centers for Disease Control (CDC) 18 percent of the children younger than two had anemia in 1997. On a particular day in that year, a pediatrician examined 180 children.

1. What is the expected (sample) proportion of children with anemia?2. What is the variance of the sample proportion of children with anemia?3. What is the probability that the proportion will exceed 0.20?

Solution

Exercise 6.1.5. On one day during an impeachment hearing, it is claimed that 75 percent of eligible voters think the President should not be impeached. Suppose we interview 700 voters. Assuming the above, what is the probability that the sample proportion of voters who do not think the President should be impeached

1. is less than .73?2. is less than .70?3. is less than .60?

Page 55: Elementary Statistics 3

Lesson 7: Estimation

Introduction 7.1 Point and Interval Estimation 7.2 When σ Is Unknown

7.3 Confidence Interval for σ 2 7.4 About the Population Proportion Homework 18 - 24

Introduction

The name of the game in statistics is trying to understand the POPULATION on the basis of the information available in the SAMPLE. Part of what we mean by "understand" is estimating the values of the population parameters. The game here is to use suitable sample STATISTICS to estimate population parameters. For example, we may like to use the sample mean x as an estimate for the population mean μ.

We consider two methods of estimating parameters.

1. The first one is called point estimation. In point estimation, we give a number as an estimate for the parameter.

For example, if we are trying to estimate the mean height μ of the American population, we may take a sample of a certain size, compute the sample mean height x, and call it an estimate for μ.

2. The second one is called interval estimation. In interval estimation we give an interval (L, U) and say that the

parameter will be within this interval (with a certain level of confidence). For example, when estimating the mean height μ of the American population, we may take a sample, compute the sample mean x and say that the population mean μ is in the interval (x-1, x+1). Obviously, in interval estimation, the smaller the length, U-L, of the interval and the higher the level of confidence, the better the estimation is.

7.1 Point and Interval Estimation

As we have already mentioned, we use a statistic to estimate a parameter. The statistic T used to estimate a parameter θ is called an estimator of θ. The computed value t of T is called a point estimate or an estimate of θ. For example, the sample mean X is an estimator of μ and the computed value x is an estimate of μ. The estimator is a

sampling random variable. Similarly, the sample variance S2 is an estimator of the population variance σ2 and the

computed value s2 is an estimate of σ2.

It may be intuitively clear to you why X and S2 would be reasonable estimators, respectively, for μ and σ2. Mathematically, the reasons are as follows:

1. We have

E(X) = μ E(S2) = σ2.

For this reason we say X and S2 are unbiased estimators, respectively, for μ and σ2.

2.var(X) = σ2/n

is small if n is large. So, for large n, the standard deviation σX of X decreases. This means that values of X will be

close to the mean μ more frequently. This improves the level of confidence for X as an estimator of μ. View the animation on normal distribution to see how the probability mass concentrates around the mean μ as the standard deviation decreases.

Interval Estimation

We would almost never expect a point estimate t of a parameter θ to be exactly equal to the actual value of θ. This is why it is more reasonable to give an interval (L,U) and say that θ would be within this interval. Here L, U will be statistics. Since the computed values of L = l,U = u will depend on the sample, we do not expect that the value of θwill always be within this computed interval (l,u). We are happy as long as the true value of θ falls within the interval (l,u) most often (or often enough), allowing the possibility of being "wrong" a few times.

But how often is often enough? The probability P(L < θ < U) tells us how often the paramenter will fall within (l,u). So, it is also reasonable to give the probability P(L < θ < U) or P( θ (L,U)). This is what we do in∉ interval estimation, also called a confidence interval of θ.

Page 56: Elementary Statistics 3

Definition. Let θ be a population parameter. An interval estimate for θ provides the following:

1. It gives an interval (L,U) as an estimate for θ. Here L,U are statistics.2. It also gives the probability P(L < θ < U). This number

P(L < θ < U) = 1- α

is called the level of confidence. And (L,U) is said to be a (1-α)100 percent confidence interval of θ.3. In practice, α will be a small number, like, 0.1, 0.01, 0.05.

We need the following definition.

Definition: Given a number 0 < α < 1, the number zα is defined by the formula

P(Z > zα) = α.

View the animation on inverse Z-distribution to understand the numbers zα. As mentioned above, for us a will be a small

number .1, .01, .05 and so on. At the end of the Z-table is a list of the numbers zα that we may need frequently.

A (1-α)100 percent confidence interval for the mean μ:

Suppose X is a random variable with mean μ and variance σ2. We want to construct a confidence interval for μ.

We assume that σ is known. Let X1,X2, …, Xn be a sample from X. Note that from CLT we have, approximately,

P(-zα/2 < Z < zα/2 ) = 1 - α

where Z=(X-μ)√n/σ.

If we simplify, we get

P(X-E < μ < X+E)=1- α

where E=zα/2 σ/√n.

So we have the following theorem.

Theorem. Assume that σ is known. Then a (1-α)100 percent confidence interval for μ is given by

X-E < μ < X+E

where E=zα/2 σ/√n.

Remarks.

1. If you go on computing (1-α)100 percent confidence intervals on a regular basis, the true value of μ will not be within the confidence interval α100 percent times.

2. The confidence interval we computed above may also be called a (1-α)100 percent two sided confidence interval

for μ. There could be all kinds of confidence intervals. For example, if

P(L < μ < ∞) =1 - α.

then (L, ∞) will be a (1-α)100 percent one sided (upper) confidence interval for μ.

Definitions and Formulas:

1. The length l of this (1-α)100 percent confidence interval for μ is given by

l = 2zα/2σ/√n.

2. The margin of error E is defined as

E = zα/2σ/√n.

3. The sample size n needed for a (1-α)100 percent confidence interval to have a preassigned margin of error E is

given by

Page 57: Elementary Statistics 3

n = (zα/2σ/E)2.

To be sure, always round upward in this class. Also use the Z-table for online homework.

Use of Calculators (if you have a TI-83): Z-interval

1. Press stat and then select TESTS.

2. Select Z-interval and enter.

3. Input: you will have to select stats (not data) in this section.4. Feed in the values of σ, x, n and c-level.5. Select calculate and enter. It will give you the confidence interval.6. The margin of error = E = (width of the interval)/2.7. To compute the sample size, use the formula above.

Problems on 7.1: Point and Interval Estimation

Exercise 7.1.1. Assume that you have a normal population with mean μ and standard deviation σ = 15. Suppose you have collected a sample of size 25 and the sample mean X was found to be 81.

1. Find a 99 percent confidence interval for μ.

2. Find the margin of error at 99 percent level of confidence.

Solution

Exercise 7.1.2. Assume that you have a normal population with mean μ and standard deviation σ = 9.8. Suppose you have collected a sample of size 14 and the sample mean X was found to be 151.1.

1. Find a 99 percent confidence interval for μ.

2. Find the margin of error at 99 percent level of confidence.

Solution

Exercise 7.1.3. The time taken by an athlete to run an event is normally distributed with mean μ and known standard deviation σ = 3.5 seconds. To estimate the mean μ, he ran 16 times and the sample mean was found to be X = 33 seconds.

1. Find the margin of error in estimating the true mean μ with 95 percent level of confidence.

2. Find a 99 percent confidence interval for μ.

Solution

Exercise 7.1.4. A population has normal distribution with variance σ2 = 289. How large a sample do we need to estimate

the mean μ within 3 units from the true value of μ, with 90 percent confidence?Solution

Exercise 7.1.5. The tuition X paid by a student per semester in a university has a distribution with mean μ and σ = $416. How large a sample should you draw so that you are 95 percent sure that the true value of μ will be within $10 of the sample mean x?Solution

7.2 When σ Is Unknown

Let X be a normal random variable with mean μ and variance σ2. Unlike in the last section, in this section we assume that σ is not known, and we try to compute a confidence interval of μ. In the last section, the main tool (or fact) that we used was that

Z=(X-μ) √n/σ

has N(0,1) distribution. In this section, we use the distribution of

Page 58: Elementary Statistics 3

T=(X-μ) √n/S.

The distribution of T is known as t-distribution with degrees of freedom n-1, which we have not discussed. As we did for the N(0,1) random variable, we will now give the properties of t-distribution.

About t-distribution

Given a positive integer ν, there is a random variable T = tν that is said to have t-distribution with degrees of freedom ν.

The useful properties of t-distribution are listed below:

1. A t-random variable has degrees of freedom. If a random variable T has t-distribution with degrees of freedom ν then we say that T has tν distribution.

2. The t-random variables are continuous random variables.

3. The mean of a t-random variable is ZERO.

4. The graph of the pdf of a t-random variable is symmetric around the y-axis and has a bell shape.

a. Flash animation: t-distribution

b. Flash animation: probability computation.

5. For a T = tν random variable, if the degrees of freedom ν is large, then it can be approximated by a N(0,1)

random variable.

6. For a number 0 < α < 1 and any positive integer ν, we define a number tν, α by the equation

P(T > tν, α) = α

where T has t-distribution with degrees of freedom ν.View the animation on inverse-T distribution to undertand the numbers tν, α.

7. Tables are available, one for each degree of freedom ν, that can be used to compute the probability for T-random variables. We will need only some of the numbers tν, α. A table sufficient for us is provided at link for a table .

Theorem. Let X be a normal random variable with mean μ and standard deviation σ. Let X1,X2,…, Xn be a

sample of size n from the X population. Then

T=(X-μ) √n/S.

has t-distribution with degrees of freedom n-1.

So,

P(-tn-1,α/2 < (X-μ)√n/S < tn-1,α/2 ) = 1-α.

Page 59: Elementary Statistics 3

If we simplify, we get

P(X-E < μ < X+E)=1- α

where E=tn-1,α/2S/√n.

A (1-α)100 percent Confidence Interval for μ

Under the set up of the theorem, a (1-α)100 percent confidence interval for μ is given by

X-E < μ < X+E

where E=tn-1,α/2s/ √n

E is also called the margin or error.

A Frequently Asked Question:To estimate μ, when do we use the ZInterval and when do we use the TInterval? Answer: We use the TInterval only when σ is not known.

Use of Calculators (if you have a TI-83): T-interval

1. If we have raw data, enter the data into the Calculator.2. Press stat and then select TESTS.

3. Select T-interval and enter.

4. Input: you will have to select stats or data, depending on what is given.5. Feed in the values of sample standard deviation s, x, n or the List where you have the

data and c-level.6. Select calculate and enter. It will give you the confidence interval.7. The margin of error = E = (width of the interval)/2.

Problems on 7.2: When σ Is Unknown

Exercise 7.2.1. Assume that we have normal populations with mean μ and standard deviation σ. We have a sample of size n = 18 that has sample mean x = 170.5 and standard deviation s = 13.3. Find the margin of error and compute a 99 percent confidence interval for μ. Solution

Exercise 7.2.2. Suppose that the time taken to complete a problem in a Math 365 test is normally distributed with mean μ and standard deviation σ. A sample of size 23 was taken, and sample mean and standard deviation were found to be x = 4.7 and s = .47. Estimate the mean time μ taken to complete a problem using a 98 percent confidence interval. Solution

Exercise 7.2.3. It is assumed that the lifetime (in hours) of lightbulbs produced in a factory is normally distributed with mean μ and standard deviation σ. To estimate μ the following data was collected on the lifetime of bulbs.

5110 4671 6441 3331 5055 5270 5335 4973 1837

7783 4560 6074 4777 4707 5263 4978 5418 5123

Compute a 95 percent confidence interval for μ. Write down the formula for (1-α)100 percent confidence interval that you use here. Solution

Exercise 7.2.4. To estimate the mean weight (in pounds) of salmon in a river the following sample was collected:

34.7 33.8 38.2 20.3 27.8 45.3 43.1 37.3 32.5 32.3

31.8 41.5 44.5 29.2 25.3 29.6 39.5 29.1 37.3

Page 60: Elementary Statistics 3

Compute a 99 percent confidence interval for the sample mean μ. Write down the formula for (1-α)100 percent confidence interval that you use here. Solution

Exercise 7.2.5. Suppose we collect a sample from a normal population of size n = 40 with sample mean X = 18.6 and standard deviation s = 9.486. Construct a 95 percent confidence interval for mean μ. Solution

Exercise 7.2.6. The time taken by an athlete to run an event is normally distributed with mean μ and unknown standard deviation σ. To estimate the mean μ he ran 16 times and the sample mean was found to be X = 33 seconds and the sample standard deviation s = 3.5 seconds.

1. Find the margin of error in estimating the true mean μ with 99 percent level of confidence.

2. Find a 99 percent confidence interval for μ.

Solution

7.3 Confidence Interval for σ2

Let X be the normal random variable with mean μ and variance σ2. In this section, we will construct a confidence interval

for σ2. We will take a sample X1,X2, …, Xn of size n from the X population. Let X be the sample mean and let S2 be the

sample variance. To compute a confidence interval for σ2, we will be using the distribution of

U = (n-1)S2/σ2

The distribution of U is known as χ2 distribution with degrees of freedom n-1, which we have not discussed. Next we

will give the properties of a χ2 random variable.

About χ2-distribution

Given a positive integer ν, there is a random variable χ2ν that is said to have χ2 distribution with degrees of freedom ν.

The useful properties of χ2 distribution are listed below.

1. A χ2 random variable has a degree of freedom. If a random variable U has χ2 distribution with degrees of

freedom ν then we say that U has χ2ν-distribution.

2. The χ2 random variables are all continuous random variables.

3. A χ2 random variable is always nonnegative.

4. The graph of the pdf of a χ2 random variable is skewed to the right. If the degrees of freedom, ν, is large then it

can be approximated with a N(0,1) random variable. View the animations on pdf of Chi-Square random variable and probability distribution of Chi-Square.

5. If U is a χ2ν random variable then the mean of U is ν. (We will not need this.) This fact is reflected in the

animation above.

6. For a number 0 < α < 1 and any positive integer ν, we define a number χ2ν, α by the equation

P(U > χ2 ) = α

v,α

where U has χ2 distribution with degrees of freedom ν.

View the animation on inverse Chi-Square distribution to undertand the numbers χ2ν, α.

7. Tables are available, one for each degree of freedom ν, that can be used to compute probability for χ2-random

variables. For our purpose, only some of the numbersχ2ν, α will be needed. Here is a link for a table that will be

sufficient for us.

Page 61: Elementary Statistics 3

Theorem. Let X be a normal random variable with mean μ and variance σ2. Let X1,X2,…,Xn be a sample of size n from

the X population. Then

T = (n-1)S2/σ2

has χ2 distribution with degrees of freedom n-1.

So,

P(χ2 n-1,1-α/2 < (n-1)S2/ σ2 < χ2 n-1,α/2 ) = 1-α.

If we simplify, we get

P(L < σ2 < U) = 1 - α

where

L = (n-1)S2/χ2n-1,α/2

U = (n-1)S2/χ2n-1,1-α/2

Theorem. Under the same set-up as in the above theorem, a (1-α)100 percent confidence interval for the variance σ2 is given by

l < σ2 < u

where

l = (n-1)s2/χ2n-1,α/2

u = (n-1)s2/χ2n-1,1-α/2

OR

(n- 1)s2

χ2n- 1, [(α)/2]

< σ 2 <

(n- 1)s2

χ2n- 1, 1- [(α)/2]

.

Use of Calculators: The TI-83 will not compute the confidence interval for σ2. If data is given, it is important to use the

calculator to compute the sample variance s2.

Problems for 7.3: Confidence Interval for σ2

Exercise 7.3.1. Suppose that we have collected a sample of size n = 26 from a normal population with mean μ and

variance σ2. The sample variance was found to be s

2 = 26.7. Compute a 95 percent confidence interval for σ

2.

Solution

Exercise 7.3.2. The following is sample data on the amount (in 1000 bushels) of wheat harvested by Kansas farmers in 2002.

206 300 200 385 280

600 225 933 320 260

1. Compute a 99 percent confidence interval for the variance of harvest σ2.

Solution

Exercise 7.3.3. The following is data on monthly gas consumption (in ccf) during the winter months by a household.

Page 62: Elementary Statistics 3

154 222 264 257 127

228 240 393 278 140

1. Compute a 99 percent confidence interval for the variance σ2.

Solution

7.4 About the Population Proportion

Once again, let p be the population proportion of a certain attribute. We want to compute a confidence interval for p. We let

X = 1 if successX = 0 if failure

where "success" means that the sample has the attribute.

So, X is a Bernoulli(p) random variable. We draw a sample X1,X2,…, Xn from the X population, let

X = X1+…+Xn

be the total number of success and

X=X/n

be the sample proportion of success. We have seen that, approximately, the sample proportion X has

N(μX, σX)-distrubution

where μX = p and σX = √((p(1-p))/n).

Therefore,

P(-zα/2 < (X-p)/σX < zα/2 ) = 1-α.

In an attempt to compute a confidence interval for p we simplify and get

P(X-zα/2 σX < p < X+zα/2 σX) ) = 1-α.

Since p is unknown, this will not produce a confidence interval for p. But the sample proportion x of success is a point estimate of p. So we have an approximate (1-α)100 percent confidence interval for p given by

x-e < p < x+e

where

e = zα/2 √(x(1-x)/n)

Following are some of the useful formulas and definitions that we may need.

1. The margin of error e is defined as

e = zα/2 √(x(1-x)/n)

2. A conservative margin of error E is defined as

E = zα/2/√4n.

It can be checked that the margin of error e is always less or equal to the conservative margin of error E.

3. Theorem. For a (1-α)100 percent confidence interval for p, if we are given a preassigned conservative margin of

error E, then the sample size n that we need to take is given by

n = (zα/2/2E)2 , rounded to the higher integer.

Page 63: Elementary Statistics 3

Remark. In the days of Clinton's impeachment, we often heard TV newscasters read something like the following.

President Clinton has 64 percent approval rating. The poll has a margin of error plus or minus 3.1 percentage points. The poll surveyed 972 people.

They mean that the sample proportion x of people who "approve" President Clinton is 0.64. Normally they don't tell us the level of confidence they are using. Assuming that they are using a 95 percent confidence interval, they mean that

E = zα/2 /√4n = 1.96/√(4x972) = 0.031.

Use of Calculators (if you have a TI-83): 1-PropZint

1. Press stat and then select TESTS.

2. Select 1-PropZint and enter.

3. Feed in the values of number of success x, n and c-level.4. Select calculate and enter. It will give you the confidence interval.5. The margin of error = e = (width of the interval)/2.6. To compute the conservative margin of error, use the formula in the definition.7. To compute the sample size, use the formula above.

Problems on 7.4: About the Population Proportion

Exercise 7.4.1 In a sample of 197 apples from a lot, 19 were found to be sour. Set a 99 percent confidence interval for the proportion p of sour apples in the lot. Solution

Exercise 7.4.2. A new vaccine was tried on 147 randomly selected individuals, and it was determined that 97 of them developed immunity. Find a 95 percent confidence interval for the proportion p of individuals in the population for whom the vaccine would help. Solution

Exercise 7.4.3. Before a congressional election, a poll was conducted. Out of 887 randomly selected voters interviewed, 389 said that they would vote for Candidate A, and 359 said that they would vote for Candidate B.

1. Construct a 98 percent confidence interval for the proportion p of voters who would vote for A.2. Construct a 98 percent confidence interval for the proportion p of voters who would vote for B.3. What is the conservative margin of error for both?

Solution

Exercise 7.4.4. If a pollster wanted to estimate the proportion p of Americans who think that the President should not be impeached, how large a sample should he/she take so that the true value of p will be within .02 of the sample proportion, with 99 percent confidence? Solution

Exercise 7.4.5. The proportion p of defective lightbulbs produced by a machine needs to be estimated within .01 to determine whether the machine needs to be replaced. How large a sample should we take to do this with 90 percent confidence? Solution

Exercise 7.4.6. In a poll released on October 28,1998, it was revealed that 60 percent of Americans wanted President Clinton rebuked but not impeached. The poll was conducted among 1,013 adults, and it had a margin of error of 3 percentage points.

1. Can you relate the last two numbers?2. What is the level of confidence used here?

Solution: News media polls use 95 percent confidence intervals. When they say "margin of error," they mean "conservative margin of error." The conservative margin of error E and level of confidence 1 - α are related by the formula E = zα/2 /√4n. For this problem E = .03, 1 - α =.95, and n =1,013. We can check zα/2 /√4n = 1.96/√(4x1013) =

0.03079.

Page 64: Elementary Statistics 3

Lesson 8 : Comparing Two Populations

Introduction 8.1 Confidence Interval of μ 1- μ 2

8.2 When σ 1 and σ 2 are Unknown 8.3 Comparing Two Population Proportions

Homework 25 - 27

Introduction

In this lesson we try to compare two populations. We will consider the following:

1.Compute a confidence interval of the difference μ1- μ2 of the means of two populations. For example, we may like

to estimate the difference μ1 - μ2 between the mean μ1 = annual male income and the mean μ2 = annual

female income in the United States.

2.Compute a confidence interval of the difference p1-p2 of the proportions of an attribute present (or proportions of

"success") in two populations. For example, we may like to estimate the difference p1-p2 between p1 = the

proportion of defective items produced by the new machine and p2 = the proportion of defective items produced

by the old machine.

8.1 Confidence Interval of μ1- μ2

Suppose X, Y are two similar random variables. Let mean and standard deviation of X be, respectively, μ1 and σ1. Let

mean and standard deviation of Y be, respectively, μ2and σ2. We want to compute a confidence interval for the

difference μ1- μ2. So we do the following.

1.We draw a sample X1, X2, …, Xm, of size m, from the X population and we draw a sample Y1, Y2, …, Yn, of size n,

from the Y population. Let

X = (X1+X2+ … +Xm)/m

Y = (Y1+Y2+ … +Yn)/n

be the corresponding sample means.

2. BY CLT, we have that X has

N(μ1, σ1/√m )

distribution and Y has

N(μ2, σ2/√n )

distribution.

3.You would agree that X-Y is a natural estimator of μ1- μ2.

4. Now we assume that the X samples and Y samples are mutually independent. In that case, it follows that X-Y has

N(μ1 - μ2, σ) - distribution,

where

Page 65: Elementary Statistics 3

σ = √( σ12/m + σ2

2/n ).

5. It follows that

P(-zα/2 < ((X-Y) - (μ1 - μ2)) /σ < zα/2 ) = 1 - α.

where σ is as above in (4).

6. If we simplify, we get

P(X-Y -zα/2 σ < μ1 - μ2 < X-Y +zα/2 σ ) = 1 - α.

where σ is as above in (4).

7.Theorem. A (1-α)100 percent confidence interval for μ1- μ2 is given by

x-y -zα/2 σ < μ1 - μ2 < x-y +zα/2 σ

where σ is as above in (4).

This formula is usable if we know the values σ1 and σ2.

8.The margin of error is given by

E = zα/2 σ

where σ is as above in (4).

Use of Calculators (if you have a TI-83): 2-SampZinterval

1. Press stat and then select TESTS.

2. Select 2-SampZinterval and enter.

3. Input: you will have to select stats (not data) in this section.

4. Feed in the values of σ1, σ2, x,y, m, n and c-level.

5. Select calculate and enter. It will give you the confidence interval.6. The margin of error = E = (width of the interval)/2.

Problems on 8.1: Confidence Interval of μ1 - μ2

Exercise 8.1.1. Suppose we have two normal populations with means μ1, μ2 and standard deviation σ1, σ2 respectively.

It is known that σ1 = 8.1 and σ2 = 11.3. A sample of size m = 64 was collected from the first population, and the sample

mean was found to be x = 3.7. A sample of size n = 99 was collected from the second population, and the sample mean was found to be y = 4.1. Compute a 95 percent confidence interval for the difference of mean μ1- μ2.

Solution

Exercise 8.1.2. The birth weight of babies in developed and developing countries are normally distributed with mean μ1, μ2 and standard deviation σ1, σ2, respectively. (My data is not real.) Given σ1 = 2.3 pounds and σ2 = 2.9

pounds. A sample of size m = 35 babies from the developed nations were collected and the sample mean birth weight was found to be x = 8.9 pounds. A sample of size n = 48 babies from the developing nations was collected and the sample mean birth weight was found to be y = 7.1 pounds.

Page 66: Elementary Statistics 3

1.Compute a point estimate of the difference of mean birth weight μ1- μ2.

2.Determine the margin of error of the difference μ1- μ2 at the 95 percent level of confidence.

3.Construct a 95 percent confidence interval for μ1- μ2.

Solution

Exercise 8.1.3. African elephants and Indian elephants are different in height, weight, and length of ear and tusk. It is natural to assume that all these are normally distributed. The mean height and standard deviation of African elephants are μ1, σ1 = 1.2 feet, respectively. The mean height and standard deviation of Indian elephants are μ2, σ2 = 1.1 feet,

respectively. A sample of size 25 African elephants were collected and the sample mean height was found to be x = 10.9 feet. A sample of size 28 Indian elephants was collected and the sample mean height was found to be y = 9.1 feet.

1.Compute a point estimate of the difference of mean height μ1- μ2.

2.Determine the maximum error of the difference μ1- μ2 at the 99 percent level of confidence.

3.Construct a 99 percent confidence interval for μ1- μ2.

Solution

8.2 When σ1 and σ2 are Unknown

As in the last section, we have two populations X, Y. We assume that X has N(μ1, σ1) distribution and Y has N(μ2, σ2)

distribution. Unlike in the last section, we assume thatσ1, σ2 are unknown. We try to find a confidence interval

for μ1 - μ2.

We take a sample X1, X2, …, Xm of size m from the X population, and we take a sample Y1,Y2, …, Yn from the Y

population. Following are some facts and notations.

1.Assumptions: We make an important assumption that the variances σ12 and σ2

2 are equal. So, we write

σ1 = σ2 = σ.

And, we also assume that the X-sample and the Y-sample are mutually independent.

2Let X and S

X

2.be the sample mean and sample variance of the X-sample and let Y and SY2 be the sample mean and sample

variance of the Y-sample.

3.Definition. Define the pooled estimate Sp2 for σ2 as follows

Sp2 =

Page 67: Elementary Statistics 3

[(m-1)SX2+(n-1)SY

2 ]/ [m+n-2] =

[ ∑ (Xi-X)2 + ∑ (Yj-Y )2 ] / [m+n-2]

Although both SX2, SY

2 are estimators of σ2, Sp2 is a better estimator for σ2 because it uses both the samples.

One can see that Sp2 is a weighted average of SX

2and SY2.

4. It follows that

T = [ (X - Y) - (μ1 -μ2) ] / [Sp√(1/m + 1/n) ]

has a t-distribution with m+n-2 degrees of freedom.

5.Using the same kind of computations that we have done before, we see that a (1-α)100 percent confidence interval for μ1- μ2 is given by

x-y-E < μ1- μ2 < x-y+E

where

E=tm+n-2,α/2 Sp √(1/m + 1/n)

Use of Calculators (if you have a TI-83): 2-SampTint

1. If we have raw data, enter the data into the calculator in 2 lists (say L1,L2).2. Press stat and then select TESTS.

3. Select 2-SampTinterval and enter.

4. Input: you will have to select stats or data, depending on what is given.

5. Feed in the values of sample standard deviation s1, s2, x, y, m, n or the Lists where you have the data and c-

level.6. Select calculate and enter. It will give you the confidence interval and also the pooled estimate of the equal

standard deviation σ.7. The margin of error = E = (width of the interval)/2.

Problems on 8.2: When σ1 and σ2 Are Unknown

Exercise 8.2.1. Suppose that we are comparing two "similar" normal populations with means μ1, μ2 respectively and the

populations both have standard deviation σ. We collected a sample of size m = 11 from the first population that produced a sample mean x = 13.2 and sample standard deviation s1 = 2.33. A sample of size n = 13 was collected from the

second population that had sample mean y = 11.5 and sample variance s2 = 2.73.

1.Compute the pulled estimate sp for σ.

2. Find a point estimate for μ1- μ2.

3.Compute the margin of error in estimating μ1- μ2 at the 90 percent level of significance.

Page 68: Elementary Statistics 3

4.Compute a 90 percent confidence interval for μ1- μ2.

Solution

Exercise 8.2.2. Suppose we have two normal populations with means μ1, μ2 and equal standard deviation σ. A sample

of size m = 64 was collected from the first population and the sample mean and standard deviation were found to be x = 3.7, s1 = 9.2 . A sample of size n = 99 was collected from the second population and the sample mean and standard

deviation were y = 4.1, s2 = 8.7.

1.Compute the pulled estimate sp for σ.

2.Compute the margin of error for a 95 percent confidence interval for μ1- μ2.

3.Compute a 95 percent confidence interval for the difference of mean μ1- μ2.

Solution

Exercise 8.2.3. The birth weight of the babies in developed and developing countries are normally distributed with mean μ1, μ2 and equal standard deviation σ. (My data is not real.) Suppose the following data about the birth weight

from developed and developing nations were collected.

Developed

8.8 8.1 6.3 9.7 6.3

7.1 5.3 7.7 9.1 8.1

8.2 7.9 8.3 8.9 9.0

10.1 9.9 8.8 7.8 5.2

7.2

Developing

6.3 5.2 8.3 5.9 5.5

7.1 8.1 7.9 6.3 6.9

9.1 8.1 7.0 4.9 5.3

6.3 7.1 6.3 6.1 5.8

5.7 6.8 8.3 7.7

1.Compute a point estimate of the difference of mean birth weight μ1- μ2.

2. Compute the pulled estimate for σ.

3.Determine the maximum error of the difference μ1- μ2 at the 95 percent level of confidence.

4.Construct a 95 percent confidence interval for μ1- μ2.

Solution

Exercise 8.2.4. African elephants and Indian elephants are different in height, weight, and length of ear and tusk. It is natural to assume that all these are normally distributed. Assume that the height of African and Indian elephants have an equal mean σ. The mean heights of African elephants and Indian elephants are μ1, μ2, respectively. Suppose the

following data were collected on the height of elephants from the two continents (these are not real data).

African

10.9 11.7 9.3 9.9 11.5

8.8 12.9 11.7 9.1 11.1

9.1 8.7 10.5 11.3 12.3

13.1 12.9 9.5 10.7 11.3

Indian

7.1 8.3 8.2 9.1 10.3

9.3 9.7 8.9 8.8 9.1

7.9 9.9 9.2 8.8 8.1

8.7 8.8 9.3 10. 1 9.9

Page 69: Elementary Statistics 3

9.9

1. Compute a point estimate of the difference of mean height μ1- μ2.

2. Determine the maximum error of the difference μ1- μ2 at the 99 percent level of confidence.

3. Construct a 99 percent confidence interval for μ1- μ2.

Solution

8.3 Comparing Two Population Proportions

In this section, we compute a confidence interval for the difference p1-p2 of two population proportions. An example

follows.

Example. We would like to have an estimate for the difference between the proportion p1 of males who are making more

than fifty thousand dollars annually and the proportion p2 of females who are making more than fifty thousand dollars

annually. We construct a confidence interval for p1-p2.

Similarly, we might like to compare the proportion of defective items produced by an old machine and new machine in a factory.

Assume we have two populations. Let p1 be the proportion of Population 1 that has an attribute A and let p2 be the

proportion of Population 2 that has the attribute A. We want to compute a confidence interval for p1-p2.

So, we take a sample of size m from Population 1 and let X be the number of sample members that have the attribute A and X=X/m be the sample proportion that has the attribute A. ( We may say that X is the number of "success" in this sample from Population 1 and X=X/m is the proportion of "success".) We take a sample from Population 2 of size n, which is independent of the other sample. Let Y be the number of sample members that has attribute A and Y=Y/n be the sample proportion that has the attribute A. (So, Y=Y/n is the sample proportion of "success" from Population 2.)

(Let me explain the context of the example above. We interview m males and X would be the number of males in this sample who make more than fifty thousand annually and X=X/m would be the proportion of the males in this sample who make more than fifty thousand annually. Similarly, we interview n females and Y=Y/n would be the proportion of females in this sample who make more than fifty thousand.)

We develop a confidence interval for p1-p2 as follows.

1. Notation. For the sample proportions, we have the following notatons:

X=X/m

Y=Y/n

2. As we have seen before, by CLT, we have that X has N(p1,σ1) distribution where σ1 = √(p1(1-p1)

/m) and Y has N(p2,σ2) distribution where σ2 = √(p2(1-p2) /n).

3. You would agree that X-Y is a natural estimator of p1-p2.

4. As we have assumed that the X samples and Y samples are mutually independent, it follows that X-Y has N(p1-

p2,σ) distribution where σ = √ ( σ12 + σ2

2 ).

5. So, it follows that

P(-zα/2<( (X- Y)-(p1-p2))/σ < zα/2 ) = 1-α

6. If we simplify, we get P((X- Y) -zα/2σ < p1-p2< (X- Y) +zα/2σ) = 1-α

7. As in section 7.4, we use X as an estimate for p1 and Y as an estimate for p2 and get the following theorem.

Page 70: Elementary Statistics 3

Theorem. An approximate (1-α)100 percent confidence interval for p1-p2 is given by

X-Y -E < p1-p2 < X-Y+E

where

E= Zα/2√( X(1-X)/m + Y(1-Y)/n )

8. The E is called the margin of error.

Use of Calculators (if you have a TI-83): 2-PropZint

1. Press stat and then select TESTS.

2. Select 2-PropZint and enter.

3. Feed in the values of number of successes x, y, sample sizes n1, n2 and c-level.

4. Select calculate and enter. It will give you the confidence interval.5. The margin of error = E = (width of the interval)/2.

Problems on 8.3: Comparing Two Population Proportions.

Exercise 8.3.1. Suppose two independent samples were collected from two populations. We want to compare the proportions p1,p2 , respectively, of an attribute A present in these two populations. Use 95 percent confidence interval to

estimate p1-p2. We are given that x = 55 had the attribute A in a sample of size m = 117 from the first population and y

= 37 had the attribute A in a sample of size n = 79 from the second sample. Solution

Exercise 8.3.2. To compare the proportions p1,p2 of defective items produced by new and old machines, respectively,

samples were collected. In a sample of 57 items from the new machine, 6 were found to be defective; and in a sample of 41 items from the old, 9 were defective. Compute a 99 percent confidence interval for p1-p2

Solution

Exercise 8.3.3. To compare the proportions p1,p2 of men and women, respectively, who watch football, data was

collected. In a sample of 199 men, 83 said that they watch football; and in a sample of 161 women, 51 said they watch football. (These are not real data.) Construct a 99 percent confidence interval for p1-p2.

Lesson 9 :Testing Hypotheses

9.1 The Philosophy of Testing Hypotheses 9.2 Developing a Test 9.3 Testing on a Single Population

9.4 Testing Hypotheses on Variance σ 2 9.5 Population Proportion 9.6 Testing of Hypotheses to Compare

Two Populations

9.7 Comparing Means of Two Populations: σ 1, σ 2 Unknown

9.8 Comparing Proportions p1,

p2 of Two Populations Homework 28 - 32

9.1 The Philosophy of Testing Hypotheses

In this lesson we will test a hypothesis H0, called Null hypothesis, against hypothesis HA, called the alternative

hypothesis. Only one of these two hypotheses is true.Based on the collected sample and testing criterion that we will set up, we will accept only one of them and reject the other.

Example 1. Suppose we want to test the hypothesis that the disparity between the wages (annual income) of working men and women does not exist any more. Let μ1 be the mean annual income of men and μ2 be the mean annual income

of working women. So, our Null hypothesis H0 and the alternative hypothesis HA may be written as

H0 : μ1- μ2 > 0

HA : μ1- μ2 = 0

Page 71: Elementary Statistics 3

Example 2. A TV commentator mentions that only about 10 years ago the average life expectancy of a human being was 75, and now it has increased substantially. To test the claim of this commentator, we let μ be the average life expectancy of a human being. Then we set up our Null and alternative hypotheses as follows:

H0 : μ =75

HA : μ >75

1. Definition. A statistical hypothesis is a statement, claim, or proposition regarding a population. Most often, it is about the values of the population parameters. In the above two examples, H0 and HA are statistical

hypotheses.2. It is important to consider which is a Null hypothesis and which is an alternative hypothesis in a given context.

Essentially, one is the negation of the other.

3. The Null hypothesis H0 represents the status quo; it is something that you have believed for a long time, or it

is some assumption or method that has been working reliably for you for a long time. You want to hold on to the Null hypothesis unless there is very strong evidence, in the collected data, that the alternative hypothesis is better.

4. The alternative hypothesis represents a new claim or something out of the ordinary. It could be a

researcher's new technology or some sales person's claim that his/her product is better. We would be very skeptical about the alternative hypothesis and would accept it only if there is very strong evidence, in the collected data, in favor of it.

5. Given a Null hypothesis H0 and an alternative hypothesis HA, a test of hypothesis is a rule or a procedure to

decide, based on the collected sample, whether to accept H0 or HA.

Our test will be based on the value of a test statistic. The rule is also called the decision rule or a test of significance.

6. Two Types of errors. In this process of testing, we may commit two types of errors.

1. If we reject H0 when it is in fact true, then it is called a type one error.

2. If we accept H0 when it is in fact false, then it is called a type two error.

3. The probability of committing a type one error is called the level of significance and is, normally,

denoted by α. Usually, α will be a .1, .05, .01 or a small number.

9.2 Developing a Test

Let X be a random variable with mean μ and standard deviation σ. Some of our hypotheses testing will look like the following.

H0 : μ = 75

HA : μ ≠ 75

or

H0 : μ = 75

HA : μ > 75

or

H0 : μ = 75

HA : μ < 75

More generally, we test hypotheses like

H0 : μ = μ 0HA : μ ≠ μ 0

or

Page 72: Elementary Statistics 3

H0 : μ = μ 0HA : μ > μ 0

or

H0 : μ = μ 0HA : μ < μ 0

To Develop a test:

Suppose we have a random variable X with mean μ and standard deviation σ. We want to develop a test procedure for the following null and alternative hypotheses.

H0 : μ = μ 0HA : μ ≠ μ 0

We take a sample X1,X2, …, Xm of size m from the X population and let X be the sample mean.

1. We assume that sample size m is large enough, so we have by CLT that X has

N(μ, σX)

distribution, where

σX = σ/√m.

2. Both type one and type two errors can be controlled by increasing the sample size m. But once the sample size is

fixed, it is not possible to control both simultaneously. If you want to reduce the probability of type one error, the probability of type two error will go up. The converse is also true. Since we are more concerned about type one error, we will try to minimize the probability of type one error, which is also called the level of significance. So we want to develop a test at the level of significance α.

3. Since X is a good estimator for μ, and since the alternative hypothesis is

HA : μ ≠ μ0

we will reject our null hypothesis H0 only if X and μ0 are far apart, that is, if

| X - μ0| is large.

4. Also, if H0 is true, then μ = μ0 and

Z=(X-μ0) /σX

has N(0,1) distribution, where

σX = σ/√m.

Expression Z above will be called a test statistic and we will accept H0 if the observed (absolute) value |z| of |Z|

is small and reject H0 if the observed value |z| of |Z| is large.

5. If H0 is true, then

P(Z ∉ ( -zα /2, zα/2 )) = α

6. So, at the level of significance α, our decision rule is

Reject H0 if z ∉ ( -zα/2, zα/2 )where z = (x-μ0) /σX

Accept H0 otherwise.

7. The above decision rule works only if we know the value of σ.

Some Hypotheses and Decision Rules. We will assume that the value of σ is known.

Page 73: Elementary Statistics 3

1. Two-tail test: Suppose we are testing

H0 : μ = μ0HA : μ ≠ μ0

At the level of significance α, our decision rule is

Reject H0 if z ∉ ( -zα/2, zα/2 ) where z = (x-μ0) /σX

Accept H0 otherwise.

2. Left-tail test: Suppose we are testing

H0 : μ = μ0HA : μ < μ0

At the level of significance α, our decision rule is

Reject H0 if z < -zα where z = (x-μ0) /σX

Accept H0 otherwise.

3. Right-tail test: Suppose we are testing H0 : μ = μ0HA : μ > μ0

At the level of significance α, our decision rule is

Reject H0 if z > z α where z = (x-μ0) /σX

Accept H0 otherwise.

Definition. The set of values (that is, the intervals) that leads to the rejection of the Null hypothesis H0 is called

the rejection region or the critical region.

Definition. Suppose we have a test statistic T to test H0 against HA. Let the observed value of T = t. The P-value is

defined as the probability, assuming H0 is true, that T will take a value at least as extreme as t or worse. In the above

decision rules, our test statistic is

Z = (X-μ0) /σX

If Z = z is the observed value of Z, then we have the following.

1. For the two-tail test, the P-value is given by

p=P(Z ∉(-|z|,|z|))

2. For the left-tail test, the P-value is given by

p=P(Z < z)

3. For the right-tail test, the P-value is given by

p=P(Z > z)

Use of Calculators and P-values:

1. In the TI-83 menu the above test is called the Z-Test, which comes under TESTS.

2. When we use calculators (say TI-83) for testing hypotheses, the calculator will give us z-values and p-values.3. We can use the z-values with the above decision rules to test hypotheses.4. Alternately, at the level of significance α, if the P-value=p then

Reject H0 if p < α

Accept H0 otherwise.

Remark. For the rest of this chapter, we will test hypotheses for various parameters.

Page 74: Elementary Statistics 3

1. In each case, as above, we will have three tests—the two-tail test, the left-tail test, and the right-tail test.2. In each case, the calculator will give the value of the test statistic (as the z-value above) and the p-value.

3. If we use the p-value for a test, then the decision rule will remain the same for all the tests to come:

Reject H0 if p < α

Accept H0 otherwise.

Problems on 9.2: Developing a Test —σ Known

Exercise 9.2.1. Assume that you have a normal population with mean μ and standard deviation σ = 15. Suppose you have collected a sample of size 25 and the sample mean X was found to be 81. We want to test the null hypothesis

H0 : μ = 75

HA : μ ≠ 75

At the 5 percent level of significance will you reject or accept the null hypothesis? Solution

Exercise 9.2.2. (Change the level of significance.) Assume that you have a normal population with mean μ and standard deviation σ = 15. Suppose you have collected a sample of size 25 and the sample mean X was found to be 81. We want to test the null hypothesis

H0 : μ = 75

HA : μ ≠ 75

At the 1 percent level of significance will you reject or accept the null hypothesis?Solution : Same as 9.2.1

Exercise 9.2.3. (Change the alternative hypothesis) Assume that you have a normal population with mean μ and standard deviation σ = 15. Suppose you have collected a sample of size 25 and the sample mean X was found to be 81. We want to test the null hypothesis

H0 : μ = 75

HA : μ > 75

At the 5 percent level of significance will you reject or accept the null hypothesis? Solution

Exercise 9.2.4. The time taken by an athlete to run an event is normally distributed with mean μ and known standard deviation σ = 3.5 seconds. The coach believes that his mean has improved from last year's mean 34 seconds. To test, the athlete ran 16 times and the sample mean was found to be X = 31 seconds.

1. Formulate the null and the alternative hypotheses.2. At 5 percent level of significance, would the coach accept or reject his belief that the athlete has improved?

Solution

9.3 Testing on a Single Population

In this section, we assume that X is a N(μ,σ) random variable. In the last section, we assumed that σ was known; but in this section we assume that σ is not known. We will do all three tests as in the above section, but assume that the value of σ is not known.

Once again, we draw a sample X1,X2,…,X m of size m from the X population. Let X and S2 be the sample mean and

variance, respectively. The test statistic we use is

T=((X-μ0) √m) /S

Page 75: Elementary Statistics 3

If H0: μ = μ0 is true then T has t-distribution with degrees of freedom m-1. Using the same kind of arguments, we

formulate the following decision rules.

1. Two-tail test: Suppose we are testing

H0 : μ= μ0HA : μ ≠ μ0

At the level of significance α, our decision rule is

Reject H0 if t ( -t∉ m-1, α/2, tm-1, α/2 )where t = ((x-μ0) √m) /s

Accept H0 otherwise.

2. Left-tail test: Suppose we are testingH0 : μ = μ0HA : μ < μ0

At the level of significance α, our decision rule is

Reject H0 if t < -tm-1, α where t = ((x-μ0) √m) /s

Accept H0 otherwise.

3. Right-tail test: Suppose we are testing

H0 : μ = μ0HA : μ > μ0

At the level of significance α, our decision rule is

Reject H0 if t > tm-1, α where t = ((x-μ0) √m) /s

Accept H0 otherwise.

Use of Calculators and P-values:

1. In the TI-83 menu the above test is called the T-Test, which comes under TESTS. Use it when σ is not known.

2. The calculator will give us t-values and p-values.3. We can use the t-values with the above decision rules to test hypotheses.4. Alternately, at the level of significance α, if the P-value=p then

Reject H0 if p < α

Accept H0 otherwise.

Problems on 9.3: Testing on a Single Population —σ Unknown

Exercise 9.3.1. It is assumed that the lifetime (in hours) of light bulbs produced in a factory is normally distributed with mean μ and standard deviation σ. The mean lifetime for an average light bulb on the market is 6000 hours. To estimate μ, the following data was collected on the lifetime of light bulbs.

5110 4671 6441 3331 5055 5270 5335 4973 1837 5487

7783 4560 6074 4777 4707 5263 4978 5418 5123 5017

The producer claims that the mean life expectancy of the bulbs is more than the average bulbs on the market.

1. Formulate your null and alternative hypotheses.2. Write down your decision rule.

Page 76: Elementary Statistics 3

3. At one percent level of significance, what will you decide?

Solution

Exercise 9.3.2. To estimate the mean weight (in pounds) of salmon in a river, the following sample was collected.

34.7 33.8 38.2 20.3 27.8

45.3 43.1 37.3 32.5 32.3

31.8 41.5 44.5 29.2 25.3

29.6 39.5 29.1 37.3

Last year the mean weight was found to be 35 pounds. You want to test to determine if the mean weight has changed significantly this year.

1. Formulate your null and alternative hypotheses.2. Write down your decision rule.3. At one percent level of significance, what will you decide?

Solution

Exercise 9.3.3. A supplier of light bulbs claims that the mean lifetime of his bulbs is longer than that of the bulbs available on the market. It is known that the mean lifetime of the bulbs on the market is 3456 hours. To test the claim of the supplier, you test a sample of 26 bulbs and find the sample mean to be 3720 hours and the sample standard deviation to be s = 1152 hours. At 5 percent level of significance, would you accept the claim of the supplier? Solution

Exercise 9.3.4. It is believed that the mean length of babies at birth in the United States is higher than the world wide mean of 18.7 inches. A sample of 26 babies in the United States was collected, and the sample mean and standard deviation was found to be x = 19 inches, s = 1 inch. At 1 percent level of significance, do you believe that babies in the United States are longer? Solution

Exercise 9.3.5. A car manufacturer claims that a new model of car will get more mileage per gallon than the old model. The old model gets a mean mileage of 33 miles per gallon. To test the claim, 9 cars from the new model were tested and the sample mean was found to be x = 35 miles and standard deviation s = 2.2 miles. At 5 percent level of significance, would you accept the claim of this manufacturer? Solution

9.4 Testing Hypotheses on Variance σ2

Once again, let X be a N(μ, σ) random variable. We would like to test the Null hypothesis that

H0 : σ2 = σ20.

As usual we draw a sample X1,X2, …,Xm of size m from the X population. Let S2 be the sample variance. The test statistic

we use is

Y = (m-1)S2/σ02.

If H0 : σ2 = σ02 is true, then Y has χ2-distribution with degrees of freedom m-1. Using the same kind of arguments, we

formulate the following decision rules.

1. Two-tail test: Suppose we are testing

H0 : σ2 = σ02

HA : σ2 ≠ σ02

At the level of significance α, our decision rule is

Page 77: Elementary Statistics 3

Reject H0 if y (∉ χ2 m-1,1-α/2, χ2 m-1, α/2 ) where y = (m-1)s2/σ0

2

Accept H0 otherwise.

2. Left-tail test: Suppose we are testing

H0 : σ2 = σ02

HA : σ2 < σ02

At the level of significance α, our decision rule is

Reject H0 if y < χ2 m-1,1-αwhere y = (m-1)s2/σ02

Accept H0 otherwise.

3. Right-tail test: Suppose we are testing

H0 : σ2 = σ02

HA : σ2 > σ02

At the level of significance α, our decision rule is

Reject H0 if y > χ2 m-1,α where y = (m-1)s2/σ02

Accept H0 otherwise.

Remark. The TI-83 does not have a test for σ2. So, one has to use the above decision rules for this section.

Problems on 9.4: Testing Hypotheses on Variance σ2

Exercise 9.4.1 Suppose that we have collected a sample of size n = 23 from a normal population with mean μ and

variance σ2. The sample variance was found to be s2 = 46.7. At 5 percent level of significance, would you conclude

that σ2 is bigger than 25?Solution

Exercise 9.4.2 Following is data on the life expectancies of a group of people older than 75.

87 92 81 76 81

87 79 88 88 79

81 89 97 91 82

At one percent level of significance, would you conclude that the variance, σ2, of life expectancies is higher than 16? Solution

Exercise 9.4.3 Following is data on a household's monthly gas consumption (in ccf) during the winter months.

154 222 264 257 127

228 240 393 278 140

At 5 percent level of significance, would you conclude that the variance σ2 of gas consumption is less than 6400 ccf2? Solution

Page 78: Elementary Statistics 3

9.5 Population Proportion

Let p be the population proportion that has a particular attribute A. We want to test Null hypothesis

H0 : p = p0.

As usual, we draw (or interview) a sample of size m. Let X be the number of sample members that has this attribute and X = X/m be the sample proportion. (So, X is the sample proportion of "success.") The test statistic we use is

Z=(X-p0) /σX

where

σX = √[(p0(1-p0)) /m].

If H0 : p = p0 is true, then Z has approximately N(0,1) distribution. As before, our decision rules are

1. Two-tail test: Suppose we are testing

H0 : p = p0HA : p ≠ p0

At the level of significance α, our decision rule is

Reject H0 if z ( -z∉ α/2, zα/2 ) where z = (x-p0) /σX

Accept H0 otherwise.

2. Left-tail test: Suppose we are testing

H0 : p = p0HA : p < p0

At the level of significance α, our decision rule is

Reject H0 if z < -zα where z = (x-p0) /σX

Accept H0 otherwisep.

3. Right-tail test: Suppose we are testing

H0 : p = p0HA : p > p0

At the level of significance α, our decision rule is

Reject H0 if z > zα where z = (x-p0) /σX

Accept H0 otherwise.

Use of Calculators and P-values:

1. In the TI-83 menu the above test is called the 1-PropZTest, which comes under TESTS.

2. The calculator will ask for p0, the number of success x, and the sample size n.

3. The calculator will give us z-values and p-values; p-cap is, in fact, sample proportion of success x = x/n.4. We can use the z-values with the above decision rules to test hypotheses.5. Alternately, at the level of significance α, if the P-value=p then

Reject H0 if p < α

Accept H0 otherwise.

Page 79: Elementary Statistics 3

Problems on 9.5: Population Proportion

Exercise 9.5.1. In a sample of 197 apples from a lot, 19 were found to be sour.

1. At one percent level of significance, would you conclude that more than 10 percent of the apples are sour?2. At five percent level of significance, would you conclude that more than 10 percent of the apples are sour?3. At ten percent level of significance, would you conclude that more than 10 percent of the apples are sour?

Solution

Exercise 9.5.2. A new vaccine was tried on 147 randomly selected individuals, and it was determined that 61 of them got the virus. It is known that usually fifty percent of the population get the virus.

1. At one percent level of significance, would you conclude that the vaccine is effective?2. At five percent level of significance, would you conclude that the vaccine is effective?3. At ten percent level of significance, would you conclude that the vaccine is effective?

Solution

Exercise 9.5.3. Before an election for a congressional seat, a poll was conducted. Out of 887 randomly selected voters interviewed, 389 said that they would vote for Candidate A, and 359 said that they would vote for Candidate B.

1. At five percent level of significance, would you conclude that candidate A will receive more than 40 percent of the

vote? Solution

2. At ten percent level of significance, would you conclude that candidate A will receive more than 40 percent of the vote?

3. At ten percent level of significance, would you conclude that candidate B will receive more than 40 percent of the

vote? Solution

9.6 Testing of Hypotheses to Compare Two Populations

As we have computed confidence intervals to compare two populations, in this section we will do significance tests to compare two populations.

Let X be a random variable with mean μ1 and standard deviation σ1 and let Y be a random variable with mean μ2 and

standard deviation σ2. (For example, X could be the height of an American male and Y could be the height of an American

female.)

We may like to compare the equality (or inequality) of means μ1, μ2. So, our Null hypothesis is given by

H0 : μ1 = μ2

or equivalently

H0 : μ1- μ2 = 0.

So, as before we collect a sample X1,X2, …,Xm, of size m from the X-population and a sample Y1,Y2, …,Yn, of size n, from

the Y-population. Let X and S12 be the sample mean and variance, respectively, of the X-sample. Let Y and S2

2 be the

sample mean and variance, respectively, of the Y-sample.

First, assume that σ1, σ2 are known

If σ1, σ2 are known, then the test statistic that we use is

Z = (X-Y)/σd

where

Page 80: Elementary Statistics 3

σd = √( σ12 /m + σ2

2 /n )

If the Null hypothesis H0 : μ1- μ2 = 0 is true, then Z has N(0,1) distribution. As before, our decision rules are formulated

as follows.

1. Two-tail test: Suppose we are testing

H0 : μ1 - μ2= 0

HA : μ1 - μ2≠ 0

At the level of significance α, our decision rule is

Reject H0 if z ( -z∉ α/2, zα/2 ) where z = (x-y) /σd

Accept H0 otherwise.

2. Left-tail test: Suppose we are testing

H0 : μ1 - μ2 = 0

HA : μ1 - μ2 < 0

At the level of significance α, our decision rule is

Reject H0 if z < -zα where z = (x-y) /σd

Accept H0 otherwise.

3. Right-tail test: Suppose we are testing

H0 : μ1 - μ2 = 0

HA : μ1 - μ2 > 0

At the level of significance α, our decision rule is

Reject H0 if z > zα where z = (x-y) /σd

Accept H0 otherwise.

Remark. If sample sizes m,n are large, we can use S1, S2 as an estimate for σ1, σ2 in the above expression for Z. So, the

modified formula for Z would be :

Z = (X-Y)/sd

where

Sd = √( S12 /m + S2

2 /n )

Use of Calculators and P-values:

1. In the TI-83 menu the above test is called the 2-SampZTest, which comes under TESTS.

2. Use it when σ 1 and σ 2 are known.

3. The calculator will give us z-values and p-values.4. We can use the z-values with the above decision rules to test hypotheses.5. Alternately, at the level of significance α, if the P-value=p then

Reject H0 if p < α

Accept H0 otherwise.

Page 81: Elementary Statistics 3

Problems on 9.6: Testing of Hypotheses to Compare Two Populations — σ1, σ2 Known

Exercise 9.6.1. Suppose we have two normal populations with means μ1, μ2 and standard deviation σ1, σ2, respectively.

It is known that σ1 = 8.1 and σ2 = 11.3. A sample of size m = 64 was collected from the first population, and the sample

mean was found to be x = 3.7. A sample of size n = 99 was collected from the second population, and the sample mean was found to be y = 4.1. At 5 percent level of significance, would you conclude that μ1 ≠ μ2?

Solution

Exercise 9.6.2. Suppose the birth weight of babies in developed and developing countries are normally distributed with mean μ1, μ2 and standard deviation σ1, σ2, respectively. (My data is not real, as is often the case.) It is known the σ1 =

2.3 pounds and σ2 = 2.9 pounds. A sample of size m = 35 babies from the developed nations was collected, and the

sample mean birth weight was found to be X = 8.9 pounds. A sample of size n = 48 babies from the developing nations was collected, and the sample mean birth weight was found to be y = 7.6 pounds.

1. At 5 percent level of significance, would you conclude that the mean birth weight of babies in the developed nations is higher than that of the developing nations?

2. At 1 percent level of significance, would you conclude that the mean birth weight of babies in developed nations is higher than that of developing nations?

Solution

Exercise 9.6.3. African elephants and Indian elephants are different in height, weight, and length of ear and tusk. It is natural to assume that all these are normally distributed. The mean and standard deviation height of African elephants are μ1, σ1= 1.5 feet, respectively. The mean and standard deviation of the height of Indian elephants are μ2, σ2= 1.3

feet, respectively. A sample of size 25 African elephants was collected, and the sample mean height was found to be x = 10.9 feet. A sample of size 28 Indian elephants was collected, and the sample mean height was found to be y = 9.1 feet.

1. At 5 percent level of significance, would you conclude that the mean height of African elephants is higher than that of the Indian elephants?

2. At 1 percent level of significance, would you conclude that the mean height of African elephants is higher than that of the Indian elephants?

Solution

9.7 Comparing Means of Two Populations: σ1, σ2 Unknown

As we did with confidence intervals, we consider the case where σ1, σ2 are not known, but we assume that standard

deviations are equal:

σ1 = σ2 = σ.

In this case, we have the estimator Sp for σ given by

Sp = ( [(m-1)SX2+(n-1)SY

2 ]/ [m+n-2] )1/2

where SX and SY are the respective sample standard deviations of the corresponding samples. The test statistic that we

use is

T = (X-Y) /[Sp √( 1/m+1/n) ]

If the Null hypothesis H0 : μ1- μ2 = 0 is true, then T has a t-distribution with degrees of freedom m+n-2. We formulate

the test hypotheses and the decision rules as follows.

1. Two-tail test: Suppose we are testing

H0 : μ1 - μ2 = 0

HA : μ1 - μ2 ≠ 0

At the level of significance α, our decision rule is

Page 82: Elementary Statistics 3

Reject H0 if t ( -t∉ m+n-2,α/2, tm+n-2,α/2 ) where t = (x-y) / [sp √( 1/m + 1/n )]

Accept H0 otherwise.

2. Left-tail test : Suppose we are testing

H0 : μ1 - μ2 = 0

HA : μ1 - μ2 < 0

At the level of significance α, our decision rule is

Reject H0 if t < -tm+n-2,α where t = (x-y) / [sp √ ( 1/m + 1/n )]

Accept H0 otherwise.

3. Right-tail test: Suppose we are testing

H0 : μ1 - μ2 = 0

HA : μ1 - μ2 > 0

At the level of significance α, our decision rule is

Reject H0 if t > tm+n-2,α where t = (x-y) / [sp √ ( 1/m + 1/n )]

Accept H0 otherwise.

Use of Calculators and P-values:

1. In the TI-83 menu the above test is called the 2-SampTTest, which comes under TESTS.

2. Use it when σ 1 = σ 2= σ are UNKNOWN and equals. Either s1 and s2 will be given or raw data will be given.

3. Always use Pooled estimate of σ by selecting YES for "Pooled".

4. The calculator will give t-values and p-values and also the pooled estimate SXP.

5. We can use the t-values with the above decision rules to test hypotheses.6. Alternately, at the level of significance α, if the P-value=p then

Reject H0 if p < α

Accept H0 otherwise.

Problems on 9.7: Comparing Means of Two Populations — σ1, σ1 Unknown:

Exercise 9.7.1. Suppose that we are comparing two similar normal populations with means μ1, μ2, respectively, equal

standard deviation σ. We collected a sample of size m = 11 from the first population that produced a sample mean x = 13.2 and samples standard deviation s1 = 2.33. A sample of size n = 13 was collected from the second population that

had sample mean y = 11.5 and sample variance s2 = 2.73.

At 5 percent level of significance, would you conclude that μ1 ≠ μ2?

Solution

Exercise 9.7.2. Suppose we have two normal population with means μ1, μ2 and equal standard deviation σ. A sample of

size m = 64 was collected from the first population and the sample mean and standard deviation were found to be x = 3.1, s1 = 9.2 . A sample of size n = 99 was collected from the second population and the sample mean and standard deviation

were y = 4.4, s2 = 8.7. At 5 percent level of significance, would you conclude that μ1 ≠ μ2.

Solution

Page 83: Elementary Statistics 3

Exercise 9.7.3. Suppose the birth weight of babies in developed and developing countries are normally distributed with mean μ1, μ2 and equal standard deviation σ. (My data is not real, as is often the case.) The following data about birth

weight in developed and developing nations were collected.

8.8 8.1 6.3 9.7 6.3

7.1 5.3 7.7 9.1 8.1

8.2 7.9 8.3 8.9 9.0

10.1 9.9 8.8 7.8 5.2

7.2

6.3 5.2 8.3 5.9 5.5

7.1 8.1 7.9 6.3 6.9

9.1 8.1 7.0 4.9 5.3

6.3 7.1 6.3 6.1 5.8

5.7 6.8 8.3 7.7

1. At 5 percent level of significance, would you conclude that the mean birth weight of babies in the developed countries is higher than that in developing countries?

2. At 1 percent level of significance, would you conclude that the mean birth weight of babies in the developed countries is higher than that in developing countries?

Solution

Exercise 9.7.4. African elephants and Indian elephants are different in height, weight, and length of ear and tusk. It is natural to assume that all these are normally distributed. Assume that the height of Arican and Indian elephants have an equal standard deviation σ. The mean heights of the African elephants and Indian elephants areμ1, μ2, respectively. The

following data were collected on the height of the elephants from the two continents (these are not real data):

10.9 11.7 9.3 9.9 11.5

8.8 12.9 11.7 9.1 11.1

9.1 8.7 10.5 11.3 12.3

13.1 12.9 9.5 10.7 11.3

7.1 8.3 8.2 9.1 10.3

9.3 9.7 8.9 8.8 9.1

7.9 9.9 9.2 8.8 8.1

8.7 8.8 9.3 10. 1 9.9

9.9

1. At 5 percent level of significance, would you conclude that the mean height of African elephants is higher than that of Indian elephants?

2. At 1 percent level of significance, would you conclude that the mean height of African elephants is higher than that of Indian elephants?

Solution

9.7 Paired t-test

Once again, we are testing equality of means μ1, μ2 of two populations. So, our Null Hypothesis is

H0 : μ1- μ2 = 0.

We continue to denote the first population random variable by X and the second population random variable by Y. We also assume that X and Y have normal distribution, and that they are independent.

In certain situations, it is natural to collect samples in "pairs" (X,Y) from the two populations and consider the difference D = X-Y. So, D has mean

μD = μ1- μ2

and our Null hypothesis becomes

H0 : μD = 0.

Also D has

Page 84: Elementary Statistics 3

N(μD, σD )-distribution

where

σD = √( σ12 + σ2

2 ).

We will collect samples in pairs (X1,Y1), …,(Xn,Yn) and look at the corresponding D-sample:

D1 = X1-Y1, …, Dn = Xn-Yn.

Let

D = ( D1+…+Dn )/n

S2D = [∑ (Di-D)2] / (n-1)

be the sample mean and variance, respectively, of the D-sample.

The test statistic that we will use is

T = (D√n) /SD

If the Null hypothesis H0 : μD = μ1- μ2 = 0 is true, then T has a t-distribution with degrees of freedom n-1.

The following are decision rules for the Paired t-test.

1. Two-tail test: Suppose we are testing

H0 : μ1 - μ2 = 0

HA : μ1 - μ2 ≠ 0

At the level of significance α, our decision rule is

Reject H0 if t ( -t∉ n-1,α/2, tn-1,α/2 ) where t = (D√n) /SD

Accept H0 otherwise.

2. Left-tail test: Suppose we are testing

H0 : μ1 - μ2 = 0

HA : μ1 - μ2 < 0

At the level of significance α, our decision rule is

Reject H0 if t < -tn-1,α where t = (D√n) /SD

Accept H0 otherwise.

3. Right-tail test: Suppose we are testing

H0 : μ1 - μ2 = 0

HA : μ1 - μ2 > 0

At the level of significance α, our decision rule is

Reject H0 if t > tn-1,α where t = (D√n) /SD

Accept H0 otherwise.

Example. Suppose we are comparing two models of cars to see how fast they accelerate. In this case, to avoid any variation due to individual drivers, we take n drivers and let each driver drive one of each model of car. So, (xi,yi) are the

accelerations of the first and second model driven by driver 1. Thus, we will have n pairs of observations.

Page 85: Elementary Statistics 3

Remark. The same technique of paired t-test will give us that a (1-α)100 percent confidence interval for μD = μ1- μ2 is

d-tn-1,α/2sd < μ1- μ2 < d-tn-1,α/2sd

9.8 Comparing Proportions p1, p2 of Two Populations

Once again, we have two populations and let p1 be the proportion of Population 1 that has a certain attribute A and let p2 be the population proportion of Population 2 that has attribute A. We want to compare p1 and p2. We want to test the

equality of these two proportions. So,our Null hypothesis is

H0 : p1-p2 = 0.

We take a sample of size m from Population 1 and let X be the number of the sample members that have this attribute A, and X = X/m be the sample mean. Similarly, we take a sample (or interview) of size n and let Y be the number of the sample members that have this attribute A and Y = Y/n be the sample mean. (So, X, Y are proportion of "success" of the two samples.)

Write

P=(X+Y)/(m+n)

If the null hypothesis

H0 : p1 = p2

is true, then p is the natural estimate for p1 = p2.

The sample statistic we use here is

Z = (X-Y) /sD

where

sD = √ [P(1-P)(1/m + 1/n) ]

If H0 : p1-p2 = 0 is true, then Z has, approximately, N(0,1) distribution. Now our test hypotheses and the decision rules

are as follows.

1. Two-tail test: Suppose we are testing

H0 : p1 - p2 = 0

HA : p1 - p2 ≠ 0

At the level of significance α, our decision rule is

Reject H0 if z ( -z∉ α/2, zα/2 ) where z = (X-Y)/sD

Accept H0 otherwise.

2. Left-tail test: Suppose we are testing

HO : p1 - p2 = 0

HA : p1 - p2 < 0

At the level of significance α, our decision rule is as follows:

Reject H0 if z < -zα where z = (X-Y)/sD

Accept H0 otherwise.

3. Right-tail test: Suppose we are testing

Page 86: Elementary Statistics 3

H0 : p1 - p2 = 0

HA : p1 - p2 > 0

At the level of significance α, our decision rule is

Reject H0 if z > zα where z = (X-Y)/sD

Accept H0 otherwise.

Use of Calculators and P-values:

1. In the TI-83 menu the above test is called the 2PropZTest, which comes under TESTS.

2. The calculator will give us z-values and p-values. Also, in our notations, p1-cap = X, p2-cap = Y, p-cap = P

3. We can use the z-values with the above decision rules to test hypotheses.4. Alternately, at the level of significance α, if the P-value=p then

Reject H0 if p < α

Accept H0 otherwise.

9.8: Problems on Comparing Proportions p1, p2 of Two Populations

Exercise 9.8.1. Suppose two independent samples were collected from two populations. We want to compare the proportions p1,p2 , respectively, of an attribute A present in these two populations. We are given that x = 55 had the

attribute A in a sample of size m = 117 from the first population, and y = 37 had the attribute A is a sample of size n = 79 from the second sample.

At 1 percent level of significance, would you conclude that p1 > p2?

Solution

Exercise 9.8.2. To compare the proportions p1,p2 of defective items produced by new and old machines, respectively,

samples were collected. In a sample of 57 items from the new machine, 6 were found to be defective; and in a sample of 41 items from the old machine, 9 were defective.

At 5 percent level of significance, would you conclude that p1 < p2?

Solution

Exercise 9.8.3. Data was collected to compare the proportions p1,p2 of men and women, respectively, who watch

football. In a sample of 199 men, 83 said that they watch football; and in a sample of 161 women, 51 said they watch football. (These are not real data).

1. At 5 percent level of significance, would you conclude that the proportion of men who watch football is higher than the proportion of women who watch football?

2. At 1 percent level of significance, would you conclude that the proportion of men who watch football is higher than the proportion of women who watch football?