10.2 estimating a population mean when σ is unknown when we don’t know σ, we substitute the...

10.2 Estimating a Population Mean when σ is Unknown

When we don’t know σ, we substitute the standard error

of for its standard deviation. This is the standard error

of the sample mean

ns

x

The distribution of the resulting statistic, t, is not Normal. It’s a t distribution.

There is a different t distribution for each sample size n. We specify a particular t distribution by giving its degrees of freedom (df). When we perform inference about µ using a t distribution, the appropriate degrees of freedom is df = n – 1.

We write the t distribution with k degrees of freedom or t(k) for short. We refer to the standard Normal distribution as the z distribution.

While there is just a single Normal distribution for a given mean and standard deviation, there is a different t distribution for each n.

The table of Standard Normal Probabilities (Table A) is the table for Normal distribution, and the table of t Distribution Critical Values (Table C) is a snapshot of many t distributions.

Using the t procedure• Except in the case of small samples, the assumption that the data are an SRS or a random sample from the population of interest is more important than the assumption that the population distribution is Normal.• Sample size less than 15. Use t procedures if the data are close to Normal. If the data are clearly non- Normal or if outliers are present, do not use t procedures.• Sample size at least 15. The t procedures can be used except in the presence of outliers or strong skewness.• Large samples. The t procedures can be used even for clearly skewed distributions when the sample is large, say n ≥ 30.

Note:The points on the previous slide are when you an use the t procedures. Students commonly try to justify the use of z procedures based on large sample size rather than on knowledge of σ. You will not receive full credit for such reasoning. To be safe, the use of t procedures is warranted any time inference is done without knowledge of σ (which is almost always) – no matter how large n is.

Conditions

1. SRS – Random sample or SRS2. Normality – Observations from the population have

a Normal distribution with a mean, µ. And a standard deviation, σ. In practice, it is enough that the distribution be symmetric and single-peaked unless the sample is very small. Sample size should be ≥ 30. Both µ and σ are unknown.

3. Independence – To keep calculations reasonably accurate when we sample from a finite population, we should verify that the population size is at least 10 times the sample size.

Example 10.9 (from your book – pg 646)

Environmentalists, government officials, and vehicle manufacturers are all interested in studying the auto exhaust emissions produced by motor vehicles. The major pollutants in auto exhaust from gasoline engines are hydrocarbons, monoxide, and nitrogen oxides (NOX) The NOX levels (in grams per mile) are given for a random sample of light-duty engines of the same type.

1.28 1.17 1.16 1.08 0.60 1.32 1.24 0.71 0.490.95 2.20 1.78 1.83 1.26 1.73 1.31 1.80 1.151.31 1.45 1.22 1.32 1.47 1.44 0.51 1.49 1.332.27 1.87 2.94 1.16 1.45 1.51 1.47 1.06 2.011.38 1.20 0.78 0.97 1.12 0.72 0.86 0.57 1.791.49

Construct and interpret a 95% confidence interval for the mean amount of NOX emitted by light-duty engines of this type.

Step 1: Parameters – The population of interest is all light-duty engines of this type. We want to estimate µ, the mean amount of the pollutant NOX emitted, for all of these engines.

Step 2: Conditions – Since we do not know σ, we should construct a one-sample t interval for the mean NOX level µ if the conditions are satisfied.• SRS – the data come from a “random sample” of 46 engines from a population of all light-duty engines of this type. A random sample is sufficient.• Normality – Is the population distribution of NOX emission Normal? We don’t know from the problem statement, so examine the sample data. See the stem, box, and probability plots in the book. For t tests, nearly normal is sufficient.• Independence – We must assume that there are at least 460 light-duty engines of this type since we are sampling without replacement.

Step 3: Calculations

Put the info in your calculator and find the mean and the standard deviation of the sample. The mean is 1.329 and the standard deviation is 0.484.

n = 46 and df = 46 – 1 = 45.

There is no row corresponding to 45 degrees of freedom in Table C. When the actual df does not appear in Table C, use the greatest df available that is less than your desired df. So, in this case, use df = 40. This will yield a higher critical value t*, and thus a wider confidence interval.

46484.0

021.2329.1*

ns

tx

1.329 ± 0.144 = (1.185, 1.473)

Step 4: Interpretation

We are 95% confident that the true mean level of nitrogen oxides emitted by this type of light-duty engine is between 1.185 and 1.473 grams/mile.

Paired t Procedures

One sample inference is less common than comparative inference. A common design is to compare two treatments. To compare the responses to the two treatments in a matched pairs design or before-and-after measurements on the same subjects, apply one-sample t procedures to the observed differences in each pair.The proper analysis depends on the design used to produce the data.

The parameter µ in a paired t procedure is

• the mean difference in the responses to the two treatments within matched pairs of subjects in the entire population (when subjects are matched in pairs), or• the mean difference in response to the two treatments for individuals in the population (when the same subject receives both treatments), or• the mean difference between before-and-after measurements for all individuals in the population (for before-and-after observations on the same individuals).

Special Note

Random selection and random assignment serve different purposes. Experiments are rarely done on randomly selected subjects, although sample surveys often are, because the purpose is often to compare two treatments rather than to generalize to a larger population. Random selection allows us to generalize, and random assignment allow us to compare treatments.

Make sure you understand the different purposes of random selection and random assignment.

Do Now: ex 10.22 page 657 in your book

Do piano lessons improve the spatial-temporal reasoning of preschool children? Neurobiological arguments suggest that his may be true. A study designed to test this hypothesis measured the spatial-temporal reasoning of 34 preschool children before and after six months of piano lessons. (The study also included children who took computer lessons and a control group; but we are not concerned with those here.) The changes in the reasoning scores are:

2 5 7 -2 2 7 4 1 0 7 3 4 3 4 9 4 5 2 9 6 0 3 6 -1 3 4 6 7 -2 7 -3 3 4 4

a) Display the data and summarize the distribution.

b) Construct and interpret a 95% confidence interval for the mean improvements in reasoning scores.

c) Can we conclude that playing the piano causes an improvement in spatial-temporal reasoning in children? Justify your answer.

a) The distribution is roughly symmetric with mean = 3.62 and a standard deviation s = 3.055.

b) df = 30 (this is the closest to the sample size), t* = 2.042

x

34055.3

042.262.3* ns

tx

The interval is (2.548, 4.684). I am 95% confident that the mean change in reasoning score after six months of piano lessons for all preschool children is between 2.55 and 4.68 points.

c) No. We don’t know that students were assigned at random to the groups in this study. Also, some improvements could come with increased age.

10.2 estimating a population mean when σ is unknown when we don’t know σ, we substitute the...

Documents