Chapter 7 Sampling Distributions Statistics for Business (Env) 1

Upload: jacob-mitchell

Post on 04-Jan-2016


Page 1: Chapter 7 Sampling Distributions Statistics for Business (Env) 1

Chapter 7

Sampling Distributions

Statistics for Business (Env)

Page 2: Chapter 7 Sampling Distributions Statistics for Business (Env) 1

Sampling Distributions

7.1 The Sampling Distribution of the Sample Mean

7.2 Central Limit Theorem

7.3 Standard Error and Statistical Inference

Page 3: Chapter 7 Sampling Distributions Statistics for Business (Env) 1

The sampling process

A sample should be representative of the entire population, yet it is not expected to be identical to the population.


Page 4: Chapter 7 Sampling Distributions Statistics for Business (Env) 1

Sampling distribution

• Suppose that we draw all possible samples of size n from a given population.
• Suppose further that we compute a statistic (e.g., a mean, IQR, standard deviation) for each sample.
• The probability distribution of this statistic is called a sampling distribution.

Page 5: Chapter 7 Sampling Distributions Statistics for Business (Env) 1

Sampling error is the discrepancy, or amount of error, between a sample statistic and its corresponding population parameter.

The distribution of sample means is the collection of sample means for all the possible random samples of a particular size (n) that can be obtained from a population.

Page 6: Chapter 7 Sampling Distributions Statistics for Business (Env) 1

The sampling distribution


Page 7: Chapter 7 Sampling Distributions Statistics for Business (Env) 1

Two different questions

Data distribution: $P(X > 70)$

Distribution of sample means: $P(\bar{X} > 70)$

Page 8: Chapter 7 Sampling Distributions Statistics for Business (Env) 1


The distribution of sample means is the collection of sample means for all the possible random samples of a particular size (n) that can be obtained from a population.

Page 9: Chapter 7 Sampling Distributions Statistics for Business (Env) 1

Variability of a Sampling Distribution

• The variability of a sampling distribution depends on three factors:
  – N: The number of objects in the population.
  – n: The number of objects in the sample.
  – The way that the random sample is chosen.

Page 10: Chapter 7 Sampling Distributions Statistics for Business (Env) 1

Sample without replacement

• If the population size is much larger than the sample size, then the sampling distribution has roughly the same sampling error whether we sample with replacement or without replacement (where each population element can be selected only once).
• On the other hand, if the sample represents a significant fraction (say, 1/10) of the population size, the sampling error will be noticeably smaller when we sample without replacement.

Page 11: Chapter 7 Sampling Distributions Statistics for Business (Env) 1

Methods of Probability Sampling

The sampling error is the difference between a sample statistic (e.g., $\bar{X}$) and its corresponding population parameter (e.g., $\mu$).

The sampling distribution of the sample mean is the probability distribution of the population of the sample means obtainable from all possible samples of size n from a population of size N.

Page 12: Chapter 7 Sampling Distributions Statistics for Business (Env) 1

Example 1:

A population that consists of only 4 scores: 2, 4, 6, 8.

Mean = 5

Page 13: Chapter 7 Sampling Distributions Statistics for Business (Env) 1

TABLE 7.1: All the possible samples of n = 2

Notice that the table lists random samples. This requires sampling with replacement, so it is possible to select the same score twice.

Page 14: Chapter 7 Sampling Distributions Statistics for Business (Env) 1


FIGURE 7.2

The distribution of sample means for n = 2. The distribution shows the 16 sample means from Table 7.1.

Mean of sample mean = 5
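The 16 samples behind Table 7.1 and Figure 7.2 can be enumerated directly. The following is a minimal sketch, assuming ordered pairs drawn with replacement from the four scores of Example 1; the variable names are illustrative and do not come from the slides.

```python
from itertools import product
from statistics import mean

population = [2, 4, 6, 8]  # the four scores from Example 1

# Sampling with replacement: every ordered pair is one possible sample of n = 2.
samples = list(product(population, repeat=2))
sample_means = [mean(s) for s in samples]

print(len(samples))        # number of possible samples
print(mean(sample_means))  # mean of the sample means
print(mean(population))    # population mean
```

The count and the two means can be compared with the values quoted on the slides (16 samples, mean of 5).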

Page 15: Chapter 7 Sampling Distributions Statistics for Business (Env) 1

Sampling without replacement: 4C2 = 4!/(2! 2!) = 6

Sample   #1   #2   Sample mean
1        2    4    3
2        2    6    4
3        2    8    5
4        4    6    5
5        4    8    6
6        6    8    7

[Figure: frequency (f) of each sample mean, horizontal axis from 3 to 7]

Mean of sample mean = 5

Page 16: Chapter 7 Sampling Distributions Statistics for Business (Env) 1

Example 2:

A law firm has five partners. At their weekly partners meeting each reported the number of hours they billed clients for their services last week.

Partner   Hours
Dunn      22
Hardy     26
Kiers     30
Malory    26
Tillman   22

If two partners are selected randomly, how many different samples are possible?

Page 17: Chapter 7 Sampling Distributions Statistics for Business (Env) 1

Example 2: Sampling without replacement

$$_5C_2 = \frac{5!}{2!\,(5-2)!} = 10$$

5 objects taken 2 at a time. A total of 10 different samples.

Partners   Total   Mean
1,2        48      24
1,3        52      26
1,4        48      24
1,5        44      22
2,3        56      28
2,4        52      26
2,5        48      24
3,4        56      28
3,5        52      26
4,5        48      24

Page 18: Chapter 7 Sampling Distributions Statistics for Business (Env) 1

Example 2: As a sampling distribution

Sample mean   Frequency   Probability
22            1           1/10
24            4           4/10
26            3           3/10
28            2           2/10

Page 19: Chapter 7 Sampling Distributions Statistics for Business (Env) 1

Compute the mean of the sample means. Compare it with the population mean.

The mean of the sample means:

$$\mu_{\bar{X}} = E(\bar{X}) = \frac{22(1) + 24(4) + 26(3) + 28(2)}{10} = 25.2$$

The population mean:

$$\mu = \frac{22 + 26 + 30 + 26 + 22}{5} = 25.2$$

Notice that the mean of the sample means is exactly equal to the population mean.
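The law-firm example can be reproduced in a few lines. This is a sketch rather than the slides' own procedure: it enumerates every pair of partners without replacement, tabulates the sample means, and compares their expected value with the population mean. Exact fractions are used so the comparison is not affected by rounding; the variable names are illustrative.

```python
from collections import Counter
from fractions import Fraction
from itertools import combinations

hours = {"Dunn": 22, "Hardy": 26, "Kiers": 30, "Malory": 26, "Tillman": 22}

# Sampling without replacement: every unordered pair of partners is one sample.
pairs = list(combinations(hours, 2))
pair_means = [Fraction(hours[a] + hours[b], 2) for a, b in pairs]

# Sampling distribution: each distinct sample mean with its probability.
counts = Counter(pair_means)
distribution = {m: Fraction(c, len(pairs)) for m, c in sorted(counts.items())}

expected_value = sum(m * p for m, p in distribution.items())
population_mean = Fraction(sum(hours.values()), len(hours))

print(len(pairs))
for m, p in distribution.items():
    print(float(m), p)  # probabilities print in lowest terms
print(float(expected_value), float(population_mean))
```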

Page 20: Chapter 7 Sampling Distributions Statistics for Business (Env) 1

Example 3

• Take another population: 3, 6, 9, 12, 15
• Population size N = 5, sample size n = 2, mean = 9, variance = 18, SD = 4.2426
• The number of possible samples which can be drawn without replacement is 5C2 = 10

Page 21: Chapter 7 Sampling Distributions Statistics for Business (Env) 1

Variance of the sample means = 6.75
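The value 6.75 can be checked by enumerating the 10 samples of Example 3. The sketch below also compares it with the finite population correction, $\frac{\sigma^2}{n}\cdot\frac{N-n}{N-1}$. That formula is a standard result for sampling without replacement; it is not stated on the slides and is included here as an assumption about where the number comes from.

```python
from itertools import combinations
from statistics import mean, pvariance

population = [3, 6, 9, 12, 15]
N, n = len(population), 2

# All samples of size n drawn without replacement, and their means.
sample_means = [mean(s) for s in combinations(population, n)]

pop_var = pvariance(population)         # population variance
var_of_means = pvariance(sample_means)  # variance of the 10 sample means

# Finite population correction (standard formula, not taken from the slides).
fpc_formula = pop_var / n * (N - n) / (N - 1)

print(len(sample_means), pop_var, var_of_means, fpc_formula)
```

With replacement the variance of the sample means would be 18/2 = 9, so the smaller value here illustrates the earlier point that sampling without replacement reduces sampling error when the sample is a large fraction of the population.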

Page 22: Chapter 7 Sampling Distributions Statistics for Business (Env) 1

Example 4: Sampling All Stocks

• Population of returns of all 1,815 stocks listed on NYSE for 1987
  – See Figure on next slide
  – The mean rate of return $\mu$ was –3.5% with a standard deviation $\sigma$ of 26%
• Draw all possible random samples of size n = 5 and calculate the sample mean return of each
  – Sample with a computer
  – See Figure on next slide

Page 23: Chapter 7 Sampling Distributions Statistics for Business (Env) 1

Example: Sampling All Stocks


Page 24: Chapter 7 Sampling Distributions Statistics for Business (Env) 1

Results from Sampling All Stocks

• Observations
  – Both histograms appear to be bell-shaped and centered over the same mean of –3.5%
  – The histogram of the sample mean returns looks less spread out than that of the individual returns
• Statistics
  – Mean of all sample means: $\mu_{\bar{X}} = \mu = -3.5\%$
  – Standard deviation of all possible means: $\sigma_{\bar{X}} = \frac{\sigma}{\sqrt{n}} = \frac{26}{\sqrt{5}} = 11.63\%$

Page 25: Chapter 7 Sampling Distributions Statistics for Business (Env) 1

Examples above demonstrate the construction of the distribution of sample means for a relatively simple, specific situation. In most cases, however, it will not be possible to list all the samples and compute all the possible sample means. Therefore, it is necessary to develop the general characteristics of the distribution of sample means that can be applied in any situation. Fortunately, these characteristics are specified in a mathematical proposition known as the central limit theorem. This important and useful theorem serves as a cornerstone for much of inferential statistics.

Page 26: Chapter 7 Sampling Distributions Statistics for Business (Env) 1

General Conclusions

1. If the population of individual items is normal, then the population of all sample means is also normal

2. Even if the population of individual items is not normal, there are circumstances when the population of all sample means is normal (Central Limit Theorem)


Page 27: Chapter 7 Sampling Distributions Statistics for Business (Env) 1

Central Limit Theorem: For any population with mean $\mu$ and standard deviation $\sigma$, the distribution of sample means for sample size n will have a mean of $\mu$ and a standard deviation of $\sigma/\sqrt{n}$ and will approach a normal distribution as n becomes sufficiently large.

The value of this theorem comes from two simple facts. First, it describes the distribution of sample means for any population, no matter what shape, or mean, or standard deviation. Second, the distribution of sample means “approaches” a normal distribution very rapidly. By the time the sample size reaches n = 30, the distribution is almost perfectly normal.

Page 28: Chapter 7 Sampling Distributions Statistics for Business (Env) 1

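Central Limit Theorem

If the sample size is large enough (n ≥ 30), then we can consider that the sample mean approximately follows a normal distribution with mean $\mu$ and standard deviation $\sigma/\sqrt{n}$:

$$\bar{X} \sim N\!\left(\mu, \frac{\sigma}{\sqrt{n}}\right)$$

The variance of the sample mean is the population variance divided by n. (This relation holds for any n when the observations are independent; it is the normal shape that requires large n.)

$$\mathrm{Var}(\bar{X}) = \sigma_{\bar{X}}^2 = \frac{\sigma^2}{n}$$

Averages are less variable than individual observations.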

Page 29: Chapter 7 Sampling Distributions Statistics for Business (Env) 1

Sample Means

Sample means follow the normal distribution under either of two conditions:

the population itself follows the normal distribution

OR

the sample size is large enough (n ≥ 30), in which case the normal distribution is an approximation.

Page 30: Chapter 7 Sampling Distributions Statistics for Business (Env) 1

Distribution of data (normal distribution)

Distribution of all possible sample means: $\sigma_{\bar{X}} = \sigma/\sqrt{n}$

The distribution of sample means is less spread out.

Page 31: Chapter 7 Sampling Distributions Statistics for Business (Env) 1

The standard deviation of the distribution of sample means is called the standard error of the mean, $\sigma_{\bar{X}}$.

The standard error measures the standard amount of difference between $\bar{X}$ and $\mu$ that is reasonable to expect simply by chance.

It should be intuitively reasonable that the size of a sample should influence how accurately the sample represents its population. Specifically, a large sample should be more accurate than a small sample. In general, as the sample size increases, the error between the sample mean and the population mean should decrease. This rule is also known as the law of large numbers.

Page 32: Chapter 7 Sampling Distributions Statistics for Business (Env) 1

The law of large numbers states that the larger the sample size (n), the more probable it is that the sample mean will be close to the population mean.

The standard error provides a way to measure the “average” or standard distance between a sample mean and the population mean.
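To make the effect of sample size concrete, the sketch below evaluates $\sigma/\sqrt{n}$ for a few sample sizes. The value of `sigma` and the sample sizes are illustrative assumptions, and `standard_error` is a hypothetical helper name, not something defined in the slides.

```python
from math import sqrt

sigma = 100  # illustrative population standard deviation (assumed value)

def standard_error(sigma, n):
    """Standard error of the mean for independent samples of size n."""
    return sigma / sqrt(n)

# The standard error shrinks as n grows: quadrupling n halves it.
for n in (1, 4, 25, 100):
    print(n, standard_error(sigma, n))
```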

Page 33: Chapter 7 Sampling Distributions Statistics for Business (Env) 1


The distribution of sample means for random samples as the size n increases

Page 34: Chapter 7 Sampling Distributions Statistics for Business (Env) 1

Example 5:

The population of scores on the SAT forms a normal distribution with mean = 500 and sd = 100. If you take a random sample of n = 25 students, what is the probability that the sample mean would be greater than $\bar{X}$ = 540?

You can restate this probability question as: Out of all the possible sample means, what proportion has values greater than 540?

Need to determine the distribution of the sample mean with n = 25. We know:
1. The distribution is normal because the population of SAT scores is normal.
2. The distribution has a mean of 500 because the population mean is 500.
3. The distribution has a standard error of 100/sqrt(25) = 20.

Page 35: Chapter 7 Sampling Distributions Statistics for Business (Env) 1

The distribution of sample means for n = 25. Samples were selected from a normal population with mean = 500 and sd = 100.

$$\sigma_{\bar{X}} = \frac{\sigma}{\sqrt{n}} = \frac{100}{\sqrt{25}} = 20$$

The next step is to use a z-score to locate the exact position of $\bar{X}$ = 540 in the distribution.

Page 36: Chapter 7 Sampling Distributions Statistics for Business (Env) 1

The value 540 is located above the mean by 40 points, which is exactly 2 standard deviations (in this case, exactly 2 standard errors). Thus, the z-score for 540 is 2.00.

Because this distribution of sample means is normal, you can use the unit normal table to find the probability associated with z > 2.00. The table indicates that 0.0228 of the distribution is located in the tail of the distribution beyond z = 2.00.

Our conclusion is that it is very unlikely, p = 0.0228 (2.28%), to obtain a random sample of n = 25 students with an average SAT score greater than 540.
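The table lookup in Example 5 can be cross-checked numerically. The sketch below uses `NormalDist` from Python's standard `statistics` module in place of the unit normal table; the variable names are illustrative.

```python
from math import sqrt
from statistics import NormalDist

mu, sigma, n = 500, 100, 25
standard_error = sigma / sqrt(n)

# Distribution of sample means: normal, centered on mu, with spread sigma / sqrt(n).
sampling_dist = NormalDist(mu, standard_error)

z = (540 - mu) / standard_error
p_greater = 1 - sampling_dist.cdf(540)  # upper-tail probability

print(z, round(p_greater, 4))
```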

Page 37: Chapter 7 Sampling Distributions Statistics for Business (Env) 1

Example 6

Suppose the mean selling price of a gallon of gasoline in the U.S. is $1.30. Further, assume the population standard deviation is $0.28. What is the probability that the mean of a sample of 35 gasoline stations is between $1.22 and $1.38?

Page 38: Chapter 7 Sampling Distributions Statistics for Business (Env) 1

Example 6 cont.

The z-values corresponding to $1.22 and $1.38 are -1.69 and 1.69.

From the table for standard normal distribution:

$$P(-1.69 < Z < 1.69) = 2(0.4545) = 0.9090$$

We would expect about 91% of the sample means to be within $0.08 of the population mean.
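The slide states the z-values for Example 6 without showing the intermediate step. A worked version, treating $0.28 as the population standard deviation and n = 35, might look like this:

```latex
\sigma_{\bar{X}} = \frac{\sigma}{\sqrt{n}} = \frac{0.28}{\sqrt{35}} \approx 0.0473

z_{\text{upper}} = \frac{1.38 - 1.30}{0.0473} \approx 1.69
\qquad
z_{\text{lower}} = \frac{1.22 - 1.30}{0.0473} \approx -1.69
```

Both limits sit the same distance ($0.08) from the mean, which is why the two z-values differ only in sign and the probability can be written as twice the table area for z = 1.69.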

Page 39: Chapter 7 Sampling Distributions Statistics for Business (Env) 1

Example 7

Assume that a school district has 10,000 sixth graders. In this district, the average weight of a sixth grader is 80 pounds, with a standard deviation of 20 pounds. Suppose you draw a random sample of 50 students. What is the probability that the average weight of a sampled student will be less than 75 pounds?


Page 40: Chapter 7 Sampling Distributions Statistics for Business (Env) 1

Example 7 cont.

• The standard deviation of the sampling distribution can be computed using the following formula: $\sigma_{\bar{X}} = \sigma/\sqrt{n}$

• $\sigma_{\bar{X}}$ = 20 * sqrt(1/50) = 20 * 0.1414 = 2.828

• The sampling distribution of the mean is approximately normal (n = 50 is large enough for the central limit theorem), with a mean of 80 and a standard deviation of 2.83.

• From the table: P(z < (75 - 80)/2.83) = P(z < -1.77) = 0.038
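Example 7 mentions a finite population of 10,000 students, although the calculation does not use it. The sketch below repeats the calculation and, as an assumption beyond the slides, also applies the standard finite population correction $\sqrt{(N-n)/(N-1)}$ to show that it barely changes the answer when the sample is a tiny fraction of the population.

```python
from math import sqrt
from statistics import NormalDist

N, n = 10_000, 50   # population size and sample size
mu, sigma = 80, 20  # population mean and standard deviation (pounds)

se = sigma / sqrt(n)
# Finite population correction (standard formula; not used on the slides).
se_fpc = se * sqrt((N - n) / (N - 1))

p = NormalDist(mu, se).cdf(75)          # P(sample mean < 75), no correction
p_fpc = NormalDist(mu, se_fpc).cdf(75)  # same probability with the correction

print(se, p)
print(se_fpc, p_fpc)
```

Because the unrounded z-score is used here, the computed probability differs slightly in the third decimal place from the table-based value on the slide.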

Page 41: Chapter 7 Sampling Distributions Statistics for Business (Env) 1

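The Central Limit Theorem

[Figure: a random sample (x1, x2, …, xn) is drawn from a right-skewed population distribution of X with parameters ($\mu$, $\sigma$). As n becomes large, the sampling distribution of the sample mean $\bar{x}$ is nearly normal, with $\mu_{\bar{x}} = \mu$ and $\sigma_{\bar{x}} = \sigma/\sqrt{n}$.]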

Page 42: Chapter 7 Sampling Distributions Statistics for Business (Env) 1

Example: Central Limit Theorem Simulation


Page 43: Chapter 7 Sampling Distributions Statistics for Business (Env) 1

Histogram of Population - Bimodal Distribution: population = 16,000; mean = 5.002; std dev = 4.242

Sampling Distribution (from a bimodal population) n = 2: number of samples = 4000; mean = 4.977; std dev = 3.017

Page 44: Chapter 7 Sampling Distributions Statistics for Business (Env) 1

Sampling Distribution (from a bimodal population) n = 3: number of samples = 4000; mean = 4.946; std dev = 2.425

Sampling Distribution n = 30: number of samples = 4000; mean = 5.032; std dev = 0.722
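A simulation along these lines can be sketched in a few lines of Python. The population below is a hypothetical bimodal mix of two normal clusters, not the data behind the slides, so the numbers will differ from those quoted above; the point is only that the standard deviation of the simulated sample means tracks $\sigma/\sqrt{n}$.

```python
import random
from math import sqrt
from statistics import mean, pstdev

random.seed(0)  # fixed seed so the run is repeatable

# Hypothetical bimodal population of 16,000 values (not the slides' data).
population = ([random.gauss(1, 1) for _ in range(8_000)]
              + [random.gauss(9, 1) for _ in range(8_000)])
mu, sigma = mean(population), pstdev(population)

def simulate_sample_means(n, num_samples=4_000):
    """Draw num_samples random samples of size n (with replacement); return their means."""
    return [mean(random.choices(population, k=n)) for _ in range(num_samples)]

for n in (2, 3, 30):
    means = simulate_sample_means(n)
    # Columns: n, mean of sample means, their std dev, theoretical sigma / sqrt(n)
    print(n, round(mean(means), 3), round(pstdev(means), 3), round(sigma / sqrt(n), 3))
```

Plotting a histogram of `means` for each n would show the two-peaked shape fading into a bell shape as n grows.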

Page 45: Chapter 7 Sampling Distributions Statistics for Business (Env) 1

STANDARD ERROR AND STATISTICAL INFERENCE: Standard error as a measure of chance

Most inferential statistics are used in the context of a research study. Typically, the researcher begins with a general question about how a treatment will affect the individuals in a population. For example:
• Will the drug affect blood pressure?
• Will the hormone affect growth?
• Will the special training affect students’ reading scores?

Page 46: Chapter 7 Sampling Distributions Statistics for Business (Env) 1


Page 47: Chapter 7 Sampling Distributions Statistics for Business (Env) 1

The question for the researcher is how to interpret the 4-point difference. Specifically, there are two possible explanations:

1. The treatment may have caused the scores in the sample to be 4 points higher.
2. The 4-point difference may be sampling error. Remember, a sample mean is not expected to be exactly the same as the population mean. Perhaps the treatment has no effect at all, and the 4-point difference has occurred just by chance.

The standard error can help the researcher decide between these two alternatives. In particular, the standard error tells exactly how much difference is reasonable to expect just by chance. For example, if the standard error is only 1 point, then the researcher could conclude that the observed difference (4 points) is much larger than would be expected by chance. In this case, it would be reasonable to conclude that the treatment has caused the difference.

Page 48: Chapter 7 Sampling Distributions Statistics for Business (Env) 1

The standard error is reported in scientific journals in two ways. It may be reported in a table along with the sample means (see Table 7.2). Alternatively, the standard error may be reported in graphs.

Page 49: Chapter 7 Sampling Distributions Statistics for Business (Env) 1


Figure 7.8 illustrates the use of a bar graph to display information about the sample mean and the standard error. Note that the mean is represented by the height of the bar, and the standard error is depicted on the graph by brackets at the top of each bar. Each bracket extends 1 standard error above and 1 standard error below the sample mean.

Page 50: Chapter 7 Sampling Distributions Statistics for Business (Env) 1


Figure 7.9 shows how sample means and standard error are displayed on a line graph.

Page 51: Chapter 7 Sampling Distributions Statistics for Business (Env) 1

Summary: Sampling Methods and the Central Limit Theorem

ONE Explain why sometimes sampling is the only feasible way to learn about a population.
TWO Define and construct a sampling distribution of the sample mean.
THREE Explain and apply the central limit theorem.
FOUR Standard error and statistical inference.