Lecture 8: Estimation

Matt Golder & Sona Golder

Pennsylvania State University

Introduction

Introduction

Populations are characterized by numerical descriptive measures called parameters. These parameters that describe a population are fixed constants.

Important population parameters that we might be interested in are the population mean µ and variance σ². A parameter of interest is often called a target parameter.

Methods for making inferences about parameters fall into one of two categories:

1 We will estimate (predict) the value of the target parameter of interest. “What is the value of the population parameter?”

2 We will test a hypothesis about the value of the target parameter. “Is the parameter value equal to this specific value?”

We’ll use Greek letters for population parameters (like θ), and letters with “hats” (θ̂) for specific data-based estimates of those parameters.


Introduction

Suppose we are interested in estimating the mean waiting time µ in a supermarket. We can give our estimate in two forms.

1 Point estimate: A single value or point – say, 3 minutes – that we think is close to the unknown population mean µ.

2 Interval estimate: Two values that correspond to an interval – say, 2 and 4 minutes – that is intended to enclose the parameter of interest µ.

Estimation is accomplished by using an estimator for the target parameter.

Introduction

An estimator is a rule, often expressed as a formula, that tells us how to calculate the value of an estimate based on the measurements contained in a sample.

For example, the sample mean

X̄ = (1/N) Σ_{i=1}^{N} Xᵢ

is one possible point estimator of the population mean µ.

Recall that any estimate we make is itself a random variable.

Random Variables

One way to think of a random variable is that it is made up of two parts: a systematic component and a random part.

Xᵢ = µ + uᵢ

This implies something about u:

uᵢ = Xᵢ − µ


Random Variables

What is the expected value of u?

E(u) = E(X − µ)
     = E(X) − E(µ)
     = E(X) − µ
     = µ − µ
     = 0

The mean is a number such that, if it is subtracted from each value of X in the sample, the sum of those differences will be zero.

Random Variables

What about the variance of X and u?

Var(X) = E[(X − µ)²] = E[u²]

Var(u) = E[(u − E(u))²] = E[(u − 0)²] = E[u²]

If we define a random variable as composed of a fixed part and a random part, then:

The variable will have a population mean (i.e., E(X)) equal to µ, and

The variance of X is equal to the variance of u.

Estimates are Random Variables

Suppose that:

We want to know µ for the population, but

we only have data on a sample of N observations from the population, so

we use these data to estimate the mean.

X̄ = (1/N) Σ_{i=1}^{N} Xᵢ


Estimates are Random Variables

Recalling that each Xᵢ = µ + uᵢ, we can write:

X̄ = (1/N) Σ_{i=1}^{N} (µ + uᵢ)
   = (1/N) Σ_{i=1}^{N} µ + (1/N) Σ_{i=1}^{N} uᵢ
   = (1/N)(Nµ) + (1/N) Σ_{i=1}^{N} uᵢ
   = µ + ū

Estimates are Random Variables

This means that:

The estimate of the mean is itself a random variable.

For different samples, we’ll get different values of ū (the sample-based “average” of the stochastic component of X), and correspondingly different estimates of the mean.

All of this (stochastic) variation is due to the random component of X.

We could show in analogous fashion that the usual estimate of the variance σ² (s²) is also a random variable.

Properties of Estimators

In theory, there are many different estimators. The one(s) we will choose will depend on their properties.

There are two general types of properties of estimators:

Small-Sample Properties

These properties hold irrespective of the size of the sample on which the estimate is based.

In other words, in order for an estimator to have these properties, they must hold for all possible sample sizes.


Properties of Estimators

Large-Sample (Asymptotic) Properties

These are properties which hold only as the sample size increases to infinity.

In practical terms, it means that to receive the benefits of these properties, “more is better” (at least as far as sample size goes).

In what follows, we’ll consider an abstract population parameter θ (a mean, a correlation, etc.). We’ll assume that we estimate it with a sample of N observations. And we’ll call this generic estimator θ̂.

Unbiased Point Estimators

We generally prefer that estimators be “accurate”, i.e., that they reflect the population parameter as closely as possible:

E(θ̂) = θ

If this property holds, then we say an estimator is unbiased.

An unbiased estimator is one for which its expected value is the population parameter.

Unbiased Point Estimators

Definition: Let θ̂ be a point estimator for parameter θ. Then θ̂ is an unbiased estimator if E(θ̂) = θ. If E(θ̂) ≠ θ, then θ̂ is said to be biased. The bias of a point estimator θ̂ is given by B(θ̂) = E(θ̂) − θ.

Figure: Sampling Distribution for an Unbiased and Biased Estimator (densities f(θ̂₁) and f(θ̂₂), with E(θ̂₁) = θ and the bias E(θ̂₂) − θ marked)


Unbiased Point Estimators

We’ve already seen that the sample mean is an unbiased estimate of the population mean:

E(X̄) = E(µ + ū)
      = E(µ) + E(ū)
      = µ + 0
      = µ

But only if we have random sampling.

Unbiased Point Estimators and Random Sampling

Example: Suppose each of 200,000 people in a city under study has eaten X number of fast-food meals in the last week. However, a residential phone survey on a weekday afternoon misses those who are working – the very people most likely to eat fast food.

Table: Target Population and Biased Subpopulation

             Whole Target Population          Subpopulation Responding
X = Meals    Frequency   Relative Frequency   Frequency   Relative Frequency
0              100,000         0.50             38,000          0.76
1               40,000         0.20              6,000          0.12
2               40,000         0.20              4,000          0.08
3               20,000         0.10              2,000          0.04
Total          200,000         1.00             50,000          1.00

Unbiased Point Estimators and Random Sampling

Population mean: µ = 0(0.5) + 1(0.2) + 2(0.2) + 3(0.1) = 0.9.
Subpopulation mean: µ_R = 0(0.76) + 1(0.12) + 2(0.08) + 3(0.04) = 0.4.

A random sample of 200 phone calls during the week will bring about 50 responses, whose average R̄ will be used to estimate µ. What is the bias?

The sample mean R̄ has an obvious non-response bias.

Bias = E(R̄) − µ
     = µ_R − µ
     = 0.4 − 0.9 = −0.5

The bias is large and will lead researchers to underestimate the number of fast-food meals eaten in a week by 0.5.


Unbiased Point Estimators

So, how do we know if an estimator is unbiased?

As the example of the sample mean illustrates, we can sometimes prove it.

Other times, it can be difficult or impossible to show that an estimator is unbiased.

Moreover, there may be many, many unbiased estimators for a particular population parameter.

Unbiased Point Estimators

Example: Consider a sample of two observations X₁ and X₂, and a generalized estimator for the mean:

Z = λ₁X₁ + λ₂X₂

E(Z) = E(λ₁X₁ + λ₂X₂)
     = E(λ₁X₁) + E(λ₂X₂)
     = λ₁E(X₁) + λ₂E(X₂)
     = λ₁µ + λ₂µ
     = (λ₁ + λ₂)µ

Unbiased Point Estimators

So long as (λ₁ + λ₂) = 1.0, then E(Z) = µ and the estimator is unbiased.

This means that there are, in principle, an infinite number of unbiased estimators.

We could extend this to N observations: So long as the “weights” add up to 1.0, the estimate is unbiased.

So how do we choose which estimator to use?
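To make this concrete, here is a small simulation sketch (ours, not part of the original handout) in Python. It draws many samples of N = 2 and compares three weighted estimators whose weights sum to 1.0: all are unbiased, but their variances differ, which previews the answer.

import numpy as np

rng = np.random.default_rng(0)
mu, sigma, reps = 5.0, 2.0, 100_000

for l1, l2 in [(0.5, 0.5), (0.7, 0.3), (0.9, 0.1)]:
    X = rng.normal(mu, sigma, size=(reps, 2))   # many samples of N = 2
    Z = l1 * X[:, 0] + l2 * X[:, 1]             # weighted estimator
    print(f"weights ({l1}, {l2}): mean = {Z.mean():.3f}, var = {Z.var():.3f}")

Every average is close to µ = 5 (unbiasedness), but the variance grows as the weights move away from (0.5, 0.5), where Var(Z) = (λ₁² + λ₂²)σ² is smallest.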


Relative Efficiency of Point Estimators

If θ̂₁ and θ̂₂ denote two unbiased estimators for the same parameter θ, we prefer to use the estimator with the smaller variance, i.e., the estimator whose sampling distribution is more concentrated around the target parameter. This is the notion of efficiency.

Figure: Comparing the Efficiency of Estimators (densities f(θ̂₁) and f(θ̂₂), both centered on θ)

Relative Efficiency of Point Estimators

To compare the relative efficiency of two estimators, we examine the ratio of their variances.

Definition: Given two unbiased estimators θ̂₁ and θ̂₂ of a parameter θ, with variances Var(θ̂₁) and Var(θ̂₂), respectively, the efficiency of θ̂₁ relative to θ̂₂, denoted eff(θ̂₁, θ̂₂), is defined by the ratio

eff(θ̂₁, θ̂₂) ≡ Var(θ̂₂) / Var(θ̂₁)

If θ̂₁ and θ̂₂ are unbiased estimators for θ, eff(θ̂₁, θ̂₂) is greater than 1 only if Var(θ̂₂) > Var(θ̂₁). In this case, θ̂₁ is a better unbiased estimator than θ̂₂.

If eff(θ̂₁, θ̂₂) < 1, then θ̂₂ is preferred to θ̂₁.

Relative Efficiency of Point Estimators

Example: We saw a long time ago that when the population being sampled is exactly symmetric, its center can be estimated without bias by the sample mean X̄ and the sample median X_Med. But which is more efficient?

When we sample from a normal population, it can be shown that

Var(X_Med) ≈ 1.57 σ²/N.

And we already know that for a normal population, Var(X̄) = σ²/N.


Relative Efficiency of Point Estimators

This means that:

eff(X̄, X_Med) ≡ Var(X_Med) / Var(X̄)
              ≈ (1.57 σ²/N) / (σ²/N)
              = 1.57 = 157%

The sample mean X̄ is 57% more efficient than the sample median X_Med.

The sample median will yield as accurate an estimate as the sample mean only if we take a 57% larger sample.
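The 1.57 factor is easy to check by simulation. The sketch below (ours, for illustration only) compares the sampling variances of the mean and the median across repeated normal samples.

import numpy as np

rng = np.random.default_rng(1)
N, reps = 25, 200_000
X = rng.standard_normal((reps, N))

var_mean = X.mean(axis=1).var()
var_median = np.median(X, axis=1).var()
print(var_median / var_mean)   # close to pi/2, i.e., about 1.57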

Relative Efficiency of Point Estimators

Example: A distribution with thicker tails than the normal distribution is called the Laplace distribution. What is the efficiency of the sample mean relative to the sample median now?

Figure: Comparing the Standard Normal Distribution and Standard Laplace Distribution

Relative Efficiency of Point Estimators

In sampling from a Laplace distribution, Var(X_Med) ≈ 0.5 σ²/N.

And so, we have:

eff(X̄, X_Med) ≡ Var(X_Med) / Var(X̄)
              ≈ (0.5 σ²/N) / (σ²/N)
              = 0.5 = 50%

The sample mean is less efficient than the sample median in this case.

If a symmetric population has thick tails, so that outlying observations are likely to occur, then the sample mean has a larger variance. This is because it takes into account all observations, even the distant outliers that the sample median ignores.


Relative Efficiency of Point Estimators

Example: Consider our generalized estimator Z = λ₁X₁ + λ₂X₂, and consider the variance of Z:

Var(Z) = Var(λ₁X₁ + λ₂X₂) = (λ₁² + λ₂²)σ²

We want to know what combination minimizes this variance. Since we know that λ₁ + λ₂ = 1.0, we can rewrite:

λ₁² + λ₂² = λ₁² + (1 − λ₁)²
          = λ₁² + (1 − 2λ₁ + λ₁²)
          = 2λ₁² − 2λ₁ + 1

Relative Efficiency of Point Estimators

We then minimize this by taking the derivative with respect to λ₁ and setting it equal to zero:

4λ₁ − 2 = 0
λ₁ = 0.5

So the (equally-weighted) sample average has the smallest variance of all the possible unbiased estimators for the population mean.

Efficient Point Estimators

So far, we have talked about the relative efficiency of point estimators. However, we sometimes talk about an efficient estimator in absolute terms.

An efficient estimator is an unbiased estimator whose variance is equal to what is known as the Cramér-Rao lower bound, I(θ).

If we can show that an estimator’s variance is equal to the Cramér-Rao lower bound, then we know that we have the most efficient estimator.

Note that an efficient estimator must be unbiased. It is possible that a biased estimator has a smaller variance than I(θ), but that does not make it efficient.


Efficient Point Estimators

Let X₁, X₂, . . . , X_N denote a random sample from a probability density function f(x), which has unknown parameter θ. If θ̂ is an unbiased estimator of θ, then under very general conditions

Var(θ̂) ≥ I(θ)

where

I(θ) = ( N E[ −∂² ln f(x) / ∂θ² ] )⁻¹

This is known as the Cramér-Rao inequality.

If Var(θ̂) = I(θ), then the estimator θ̂ is said to be efficient.

Mean Squared Error

Clearly one would like to have unbiased and efficient estimators. If one had the choice of two unbiased estimators, one would choose the more efficient one.

But what if we are comparing both biased and unbiased estimators? It turns out that it may no longer be appropriate to select the estimator with the least variance or the estimator with the least bias.

Maybe we will want to “trade off” some bias in favor of gains in efficiency, or vice versa.

Mean Squared Error

How do we decide which estimator is closest to the target parameter θ overall?

Figure: Mean Squared Error (sampling distributions of three estimators θ̂₁, θ̂₂, θ̂₃ of a target parameter θ)

It turns out that we use something called the mean squared error (MSE).


Mean Squared Error

Definition: The mean squared error (MSE) of a point estimator θ̂ is

MSE(θ̂) = E[(θ̂ − θ)²]

The mean squared error is, therefore, the expected squared error of an estimator.

The MSE can be re-written as

MSE(θ̂) = Var(θ̂) + [B(θ̂)]²

This shows that the MSE reduces to the variance for unbiased estimators.
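The decomposition is easy to verify numerically. The sketch below (ours, not from the handout) estimates the bias, variance, and MSE of any estimator by simulation and confirms that MSE = Var + Bias².

import numpy as np

def mse_report(estimator, theta, sample_draws):
    est = np.array([estimator(s) for s in sample_draws])
    bias = est.mean() - theta
    var = est.var()
    mse = ((est - theta) ** 2).mean()
    print(f"bias={bias:.4f}  var={var:.4f}  mse={mse:.4f}  var+bias^2={var + bias**2:.4f}")

rng = np.random.default_rng(2)
samples = rng.normal(0.9, 1.0, size=(50_000, 10))
mse_report(np.mean, 0.9, samples)                     # unbiased: MSE ~ Var
mse_report(lambda s: 0.5 * np.mean(s), 0.9, samples)  # biased: MSE picks up Bias^2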

Mean Squared Error

The MSE can be regarded as a general kind of variance that applies to either unbiased or biased estimators.

This leads to the general definition of the relative efficiency of two estimators.

Definition: For any two estimators – whether biased or unbiased –

eff(θ̂₁, θ̂₂) ≡ MSE(θ̂₂) / MSE(θ̂₁)

With regard to our three hypothetical estimators shown earlier, θ̂₂ has the least mean squared error and is therefore the more “efficient” estimator.

If we choose this estimator, we’d be trading off slightly more bias for greater efficiency.

Mean Squared Error

Example: We want to estimate µ with a sample size of N. One estimator is the mean (X̄), which has:

B(X̄) = 0 (because the mean is an unbiased estimator of µ).

Var(X̄) = σ²/N, where σ² is the variance of X.

MSE(X̄) = σ²/N + (0)² = σ²/N.

An alternative estimator, λ, might be:

λ = 6

In other words, this estimator says that its guess of the expectation of X is always equal to six.


Mean Squared Error

The bias of λ, B(λ), is:

B(λ) = E(λ − µ) = E(6) − E(µ) = 6 − µ

The variance of λ is:

Var(λ) = Var(6) = 0

And the MSE of λ is:

MSE(λ) = Var(λ) + [B(λ)]²
       = 0 + (6 − µ)²
       = 36 − 12µ + µ²

Mean Squared Error

Figure: Mean Squared Error

The black line is the MSE of λ as a function of the “true” population mean µ.

The red lines are the MSEs for X̄, under the assumption that σ² = 10 and N = {20, 100, 1000}, respectively.

Mean Squared Error

There are several things to note:

The MSE of λ is quite good if µ ≈ 6. In some circumstances, the MSE for λ will be smaller than that for X̄, even though X̄ is both unbiased and a “better” estimator.

But the MSE of λ gets much worse as µ gets further away from six. Since we don’t know whether µ = 6 or not, this is not a desirable property.

Relatedly, our estimator λ doesn’t “improve” in MSE terms if we add more data to our sample (that is, as N → ∞).

In contrast, the MSE of X̄ drops considerably as N increases, and does so irrespective of the “true” value of µ.

This example illustrates that while MSE can be a good way to choose among estimators, it shouldn’t be applied uncritically.


Mean Squared Error

Example: Recall the phone survey of 50 responses from 200 calls that had a serious non-response bias. In addition, the average response R̄ has variability too. Calculate the MSE of R̄.

Table: Biased Subpopulation

r    f(r)    r·f(r)    r − µ_R    (r − µ_R)²    (r − µ_R)²·f(r)
0    0.76      0         −0.4        0.16           0.1216
1    0.12     0.12        0.6        0.36           0.0432
2    0.08     0.16        1.6        2.56           0.2048
3    0.04     0.12        2.6        6.76           0.2704

µ_R = 0.4                                          σ_R² = 0.64

Mean Squared Error

We saw from earlier that the bias was −0.5.

Var(R̄) = σ_R²/N = 0.64/50 = 0.013

MSE(R̄) = Var(R̄) + [Bias(R̄)]²
        = 0.013 + 0.25 = 0.263

Mean Squared Error

If we increase the sample size fivefold, how much would the MSE be reduced?


Mean Squared Error

Var(R̄) = σ_R²/N = 0.64/(5 × 50) = 0.003

The increase in sample size would not affect the bias, and so

MSE(R̄) = Var(R̄) + [Bias(R̄)]²
        = 0.003 + 0.25 = 0.253

Given that the main term in the MSE is the squared bias, and this has not been reduced, an increase in sample size does not affect the MSE that much.

Mean Squared Error

A second statistician takes a sample survey of only N = 20 phone calls, with persistent follow-up until he gets a response. Let this small but unbiased sample have a sample mean denoted by X̄. What is the MSE?

Table: Whole Population

x    f(x)    x·f(x)    x − µ    (x − µ)²    (x − µ)²·f(x)
0    0.50      0       −0.90      0.81         0.405
1    0.20     0.20      0.10      0.01         0.002
2    0.20     0.40      1.10      1.21         0.242
3    0.10     0.30      2.10      4.41         0.441

µ = 0.9                                       σ² = 1.09

Mean Squared Error

Var(X̄) = σ²/N = 1.09/20 = 0.055

MSE(X̄) = Var(X̄) + [Bias(X̄)]² = 0.055 + 0 = 0.055

The variance is larger due to the smaller sample size, but the mean squared error is much smaller.

In publishing his results, the second statistician is criticized for using a sample only 1/10 the size of the first statistician’s. What defense might he offer?


Mean Squared Error


MSE(X̄) = 0.055

MSE(R̄) = 0.253

His smaller but unbiased sample yields a far smaller mean squared error.

Large Sample Properties

Unbiasedness, relative efficiency, and efficiency are small-sample properties of estimators, which hold irrespective of sample size.

In contrast, large-sample properties are properties of estimators that hold only as the sample size increases without limit.

Note that this is dependent on sample size, not on the “number” of samples drawn.

Intuitively: what would you expect to happen as sample size gets larger?

The variance around the “true” value decreases (less possibility of drawing a “bad” sample).

Eventually the sample size = the population size, and the estimate “collapses” on the true value.

Consistent Estimators

In an informal sense, a consistent estimator is one that concentrates in a narrower and narrower band around its target as sample size N increases indefinitely.

Figure: Consistency (sampling distributions of θ̂ around θ for N = 5, 10, 50, and 200)

One of the conditions that makes an estimator consistent is if its bias and variance both approach zero as the sample size increases.


Consistent Estimators

Definition: The estimator θ̂_N is said to be a consistent estimator of θ if it converges in probability to its population value as N goes to infinity. We write this as:

lim_{N→∞} Pr(|θ̂_N − θ| ≤ ε) = 1

or equivalently

lim_{N→∞} Pr(|θ̂_N − θ| > ε) = 0

for an arbitrarily small ε > 0.

If we consider an estimator whose properties vary by sample size (say θ̂_N), then θ̂_N is consistent if E(θ̂_N) → θ and Var(θ̂_N) → 0 as N → ∞.

Consistent Estimators

Example: Is the sample mean X̄ a consistent estimator of the population mean µ?

We know that X̄ is unbiased and that Var(X̄) = σ²/N approaches zero as N increases.

As a result, X̄ is both an unbiased and consistent estimator of µ.

Consistent Estimators

The fact that the sample mean is consistent for the population mean, or converges in probability to the population mean, is sometimes referred to as a law of large numbers.

This provides the theoretical justification for the averaging process that many employ to obtain precision in measurements. For example, an experimenter may take the average of the weights of many animals to obtain a precise estimate of the average weight of animals in a species.


Consistent Estimators

Is P a consistent estimator of π? Is the average response in our fast-food example R̄ a consistent estimator of µ?


Because proportions are just disguised means, it follows that P is also an unbiased and consistent estimator of π.


In terms of our fast-food example, we saw that the estimator R̄ concentrated around µ_R = 0.40, which is far below the target µ = 0.90. Thus, R̄ is inconsistent.


Asymptotically Unbiased Estimators

An asymptotically unbiased estimator has a bias that tends to zero as sample size N increases. If its variance also tends to zero, then the estimator is consistent.

Although the MSD estimator is a biased estimator of the population variance σ², is it asymptotically unbiased?

Mean Squared Deviation = (1/N) Σ_{i=1}^{N} (Xᵢ − X̄)²

Recall from last time that the sample variance is an unbiased estimator of the population variance:

s² = (1/(N − 1)) Σ_{i=1}^{N} (Xᵢ − X̄)²

Asymptotically Unbiased Estimators

We can write the MSD in terms of the unbiased s²:

Mean Squared Deviation = ((N − 1)/N) s² = (1 − 1/N) s²

E(MSD) = (1 − 1/N) E(s²)
       = (1 − 1/N) σ² = σ² − (1/N)σ²

Since 1/N tends to zero as N increases, the bias tends to zero. As a result, the MSD is biased but asymptotically unbiased.

It can also be shown that the variance of the MSD approaches zero as the sample size increases. As a result, the MSD is a consistent estimator of the population variance.
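A short simulation sketch (ours) makes the shrinking bias visible: the average MSD across many samples approaches σ² as N grows.

import numpy as np

rng = np.random.default_rng(3)
sigma2 = 4.0
for N in (5, 50, 500):
    X = rng.normal(0.0, sigma2 ** 0.5, size=(100_000, N))
    msd = ((X - X.mean(axis=1, keepdims=True)) ** 2).mean(axis=1)
    print(f"N={N}: average MSD = {msd.mean():.3f}  (sigma^2 = {sigma2})")

The averages track σ²(N − 1)/N: about 3.2 at N = 5, 3.92 at N = 50, and 3.99 at N = 500.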

Asymptotic Efficiency

Asymptotic efficiency can be thought of as efficiency as N → ∞.

It is intuitive to think of this as the “speed” with which θ̂ “collapses” on θ.

All else equal, we prefer an estimator that does so faster (i.e., for smaller sample sizes) rather than more slowly.


General Issues

We prefer estimators that have desirable small-sample properties.

We prefer unbiased to consistent estimators, and

We prefer efficient to asymptotically efficient ones.

but...

We can’t always figure out the small-sample properties of certain estimators, and/or

Our estimators with desirable small-sample properties may have other problems (e.g., computational cost).

As a result, we often have to choose among estimators that differ in their degree of desirable properties.

Properties of Point Estimators

To go to Properties of Point Estimators applet, click here

Some Common Unbiased Point Estimators

Table: Expected Values and Standard Errors of Common Point Estimators

Target Parameter θ    Sample Size(s)    Point Estimator θ̂    E(θ̂)       Standard Error σ_θ̂
µ                     N                 X̄                    µ           σ/√N
π                     N                 P = X/N               π           √(π(1 − π)/N)
µ₁ − µ₂               N₁ and N₂         X̄₁ − X̄₂              µ₁ − µ₂     √(σ₁²/N₁ + σ₂²/N₂)
π₁ − π₂               N₁ and N₂         P₁ − P₂               π₁ − π₂     √(π₁(1 − π₁)/N₁ + π₂(1 − π₂)/N₂)

The difference in means and difference in proportions assume that the random samples are independent.

All four estimators in the table possess sampling distributions that are approximately normal for large samples.


Interval Estimators and Confidence Intervals

An interval estimator is a rule specifying the method for using the sample measurements to calculate two numbers that form the endpoints of an interval.

Ideally, the resulting interval will have two properties:

1 It should contain the target parameter θ.

2 It should be as narrow as possible.

The length and location of the interval are random variables, and we cannot be certain that a (fixed) target parameter will fall in the interval calculated from a single sample.

We want to find an interval estimator capable of generating narrow intervals that have a high probability of enclosing θ.

Interval Estimators and Confidence Intervals

Interval estimators are more commonly called confidence intervals.

The upper and lower end points of a confidence interval are called the upper and lower confidence limits (bounds).

The probability that a (random) confidence interval will enclose θ (a fixed quantity) is called the confidence coefficient.

The confidence coefficient identifies the fraction of the time, in repeated sampling, that the intervals constructed will contain the target parameter θ.

If the confidence coefficient associated with our estimator is high, then we can be highly confident that any confidence interval, constructed by using the results from a single sample, will enclose θ.

Interval Estimators and Confidence Intervals

Suppose that θ̂_L and θ̂_U are the (random) lower and upper confidence limits, respectively, for a parameter θ.

Then if

Pr(θ̂_L ≤ θ ≤ θ̂_U) = 1 − α

the probability (1 − α) is the confidence coefficient (or level of confidence).

The resulting random interval defined by [θ̂_L, θ̂_U] is called a two-sided confidence interval.

The value of 1 − α is something that is determined by the researcher, and is usually set with an eye to whether she is more concerned with the parameter θ being in the confidence interval, or with the relative precision of the interval estimate.


Interval Estimators and Confidence Intervals

It is also possible to form a lower one-sided confidence interval such that

Pr(θ̂_L ≤ θ) = 1 − α

The implied confidence interval here is [θ̂_L, ∞).

Similarly, we could have what is called an upper one-sided confidence interval such that

Pr(θ ≤ θ̂_U) = 1 − α

The implied confidence interval here is (−∞, θ̂_U].

Interval Estimators and Confidence Intervals

One method for finding confidence intervals is called the pivotal method.

To use this method, we must have a pivotal quantity that possesses two characteristics:

1 It is a function of the sample measurements and the unknown parameter θ, where θ is the only unknown quantity.

2 Its probability distribution does not depend on the parameter θ.

If an estimator has these characteristics, then (as we’ll discuss below) we can use simple linear transformations to construct confidence intervals.

Large-Sample Confidence Intervals

As we noted previously, the sampling distribution of a mean (or any sum of a sufficiently large number of independent random variables) follows a normal distribution.

Our typical estimator of µ, denoted X̄, can be thought of as being normally distributed:

X̄ ~ N(µ, σ_X̄²)

where σ_X̄² = σ²/N and σ² is just the variance of X.


Large-Sample Confidence Intervals

To use the pivotal method, we must have a pivotal quantity that possesses two characteristics:

1 It is a function of the sample measurements and the unknown parameter θ, where θ is the only unknown quantity.

2 Its probability distribution does not depend on the parameter θ.

With respect to these two criteria:

1 The sample mean X̄ depends only on the values of X in the sample, and on the value of µ.

2 The shape of its sampling distribution does not depend on µ, but only on other things (like the size of the sample).

Large-Sample Confidence Intervals

To construct a confidence interval, we can start with the sample statistic X̄.

Since we know that E(X̄) = µ, it makes sense to use the sample value X̄ as the “center” or “pivot” of our confidence interval.

Next, we choose a level of confidence – tradition suggests that we set 1 − α = 0.95 (a “95 percent level of confidence”), though there’s nothing special about this number.

This means that we want to create a confidence interval such that

Pr(X̄_L ≤ µ ≤ X̄_U) = 0.95

Large-Sample Confidence Intervals

One way of calculating the bounds of the confidence interval is to choose X̄_L and X̄_U so that

Pr(µ < X̄_L) = ∫_{−∞}^{X̄_L} φ_X̄(u) du = 0.025

and

Pr(µ > X̄_U) = ∫_{X̄_U}^{∞} φ_X̄(u) du = 0.025.

Since we know the parameters of φ_X̄ – that is, the distribution is N(µ, σ_X̄²) – calculating values for the upper and lower limits of a confidence interval is straightforward.


Large-Sample Confidence Intervals

More generally, for any sample statistic θ̂ which is an estimator of θ (where θ might be µ, π, µ₁ − µ₂, or π₁ − π₂) and whose sampling distribution is Normal (in large samples), the statistic

Z = (θ̂ − θ) / σ_θ̂

is distributed according to a standard normal distribution.

As a result, Z forms (at least approximately) a pivotal quantity – it is a function of the sample measurements θ̂ and a single unknown parameter θ, and the standard normal distribution does not depend on θ.

Large-Sample Confidence Intervals

We can consider two values in the tails of that standard normal distribution, −z_{α/2} and z_{α/2}, such that

Pr(−z_{α/2} ≤ Z ≤ z_{α/2}) = 1 − α.

We can rewrite this as

1 − α = Pr(−z_{α/2} ≤ (θ̂ − θ)/σ_θ̂ ≤ z_{α/2})
      = Pr(−z_{α/2}σ_θ̂ ≤ θ̂ − θ ≤ z_{α/2}σ_θ̂)
      = Pr(−θ̂ − z_{α/2}σ_θ̂ ≤ −θ ≤ −θ̂ + z_{α/2}σ_θ̂)
      = Pr(θ̂ − z_{α/2}σ_θ̂ ≤ θ ≤ θ̂ + z_{α/2}σ_θ̂)

Large-Sample Confidence Intervals

Figure: Location of −z_{α/2} and z_{α/2} (tail areas of α/2 each, central area 1 − α)

This means that a (1 − α) × 100-percent confidence interval for θ is given by

[θ̂_L, θ̂_U] = [θ̂ − z_{α/2}σ_θ̂, θ̂ + z_{α/2}σ_θ̂]


Large-Sample Confidence Intervals

Thus, constructing a confidence interval for a variable whose (asymptotic) sampling distribution is normal consists of five steps:

1 Select your level of confidence 1 − α.

2 Calculate the sample statistic θ̂.

3 Calculate the z-value associated with the 1 − α level of confidence.

4 Multiply that z-value by σ_θ̂, the standard error of the sampling statistic.

5 Construct the confidence interval according to

[θ̂_L, θ̂_U] = [θ̂ − z_{α/2}σ_θ̂, θ̂ + z_{α/2}σ_θ̂].
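The five steps translate directly into code. Here is a minimal sketch (ours, not the handout’s) using scipy’s normal quantile function; se stands for the standard error σ_θ̂ of the statistic.

from scipy.stats import norm

def z_interval(theta_hat, se, conf=0.95):
    z = norm.ppf(1 - (1 - conf) / 2)   # the z-value for the chosen confidence level
    return theta_hat - z * se, theta_hat + z * se

# The shopping-time example below: X-bar = 33, s = 16, N = 64, 90% confidence
print(z_interval(33, 16 / 64 ** 0.5, conf=0.90))   # approximately (29.71, 36.29)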

Large-Sample Confidence Intervals: Mean

Example (Mean): The shopping times of N = 64 randomly selected customers at a supermarket were recorded. The average and variance of the 64 shopping times were 33 and 256, respectively. Estimate µ, the true average shopping time per customer, with a confidence coefficient of 1 − α = 0.90, i.e., a 90% confidence interval.

In this case, we are interested in target parameter θ = µ. Thus, θ̂ = X̄ = 33 and s² = 256 for a sample of N = 64. The population variance σ² is unknown, so we will use s² as its estimated value.

The confidence interval

θ̂ ± z_{α/2}σ_θ̂

has the form

X̄ ± z_{α/2} (σ/√N) ≈ X̄ ± z_{α/2} (s/√N)

Large-Sample Confidence Intervals: Mean

If we use a standard normal distribution table, we can find that z_{α/2} = z_{0.05} = 1.645.

Thus, the confidence limits are

X̄ − z_{α/2} (s/√N) = 33 − 1.645 (16/8) = 29.71

X̄ + z_{α/2} (s/√N) = 33 + 1.645 (16/8) = 36.29

In other words, our confidence interval for µ is [29.71, 36.29].


Interpreting Confidence Intervals

Our confidence interval for µ is [29.71, 36.29]. What does this mean?

It is very important to remember that this 90% confidence interval does NOT mean that there is a 90% chance that the true population mean µ is in this interval.

The population mean is a fixed constant and is either in the confidence interval or it is not.

Interpreting Confidence Intervals

The correct interpretation is that over a large number of repeated samples, approximately 90% of all intervals of the form X̄ ± 1.645 (s/√N) will include µ, the true population mean.

Although we do not know whether the particular interval [29.71, 36.29] that we have calculated from our sample contains µ, the procedure that generated it yields intervals that do capture the true mean in approximately 90% of all instances where the procedure is used.

This is why we sometimes say that we are “90% confident” that the interval contains the target parameter.

Large-Sample Confidence Intervals: Mean

What if we wanted a confidence coefficient of 1 − α = 0.95, i.e., a 95% confidence interval?

If we use a standard normal distribution table, we can find that z_{α/2} = z_{0.025} = 1.96.

Thus, the confidence limits are

X̄ − z_{α/2} (s/√N) = 33 − 1.96 (16/8) = 29.08

X̄ + z_{α/2} (s/√N) = 33 + 1.96 (16/8) = 36.92

In other words, our 95% confidence interval for µ is [29.08, 36.92].


Large-Sample Confidence Intervals: Mean

What if we wanted a confidence coefficient of 1 − α = 0.99, i.e., a 99% confidence interval?

If we use a standard normal distribution table, we can find that z_{α/2} = z_{0.005} = 2.58.

Thus, the confidence limits are

X̄ − z_{α/2} (s/√N) = 33 − 2.58 (16/8) = 27.84

X̄ + z_{α/2} (s/√N) = 33 + 2.58 (16/8) = 38.16

In other words, our 99% confidence interval for µ is [27.84, 38.16].

Large-Sample Confidence Intervals: Proportions

As we noted previously, for π sufficiently different from either zero or one, and N sufficiently large, the sampling distribution of P is N(π, σ_P²).

That means that we can calculate confidence intervals for an estimated proportion as

P_L = P − z_{α/2} √(P(1 − P)/N)

and

P_U = P + z_{α/2} √(P(1 − P)/N)

Large-Sample Confidence Intervals: Proportions

Example: Suppose that we have a sample of size 20, and P = 0.390. The lower bound of the associated 95% confidence interval is

P_L = 0.390 − 1.96 √((0.39)(0.61)/20)
    = 0.390 − 0.214
    = 0.176

while the upper bound is

P_U = 0.390 + 1.96 √((0.39)(0.61)/20)
    = 0.390 + 0.214
    = 0.604
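The same arithmetic in a quick sketch (ours, for checking):

from scipy.stats import norm

P, N = 0.390, 20
z = norm.ppf(0.975)                 # 1.96
half = z * (P * (1 - P) / N) ** 0.5
print(P - half, P + half)           # approximately 0.176 and 0.604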


Large-Sample Confidence Intervals: Proportions

Figure: Confidence Intervals for P = π̂ for N = 20 (black dashes), N = 100 (red dashes), and N = 400 (green dashes)

The confidence interval for a proportion is a straightforward function of two quantities: the estimated proportion P = π̂, and the sample size N.

Large-Sample Confidence Intervals: Mean

To go to ConfidenceIntervalP under Estimation to illustrate how confidenceintervals work, click here

Difference in Proportions

Example (Difference in Proportions): Two brands of refrigerators, A and B, are each guaranteed for 1 year. In a random sample of 50 refrigerators of brand A, 12 were observed to fail before the guarantee period ended. An independent random sample of 60 brand B refrigerators also revealed 12 failures during the guarantee period. Estimate the true difference (π₁ − π₂) between the proportions of failures during the guarantee period, with a confidence coefficient of approximately 0.98.

The confidence interval

θ̂ ± z_{α/2}σ_θ̂

has the form

(P₁ − P₂) ± z_{α/2} √(P₁(1 − P₁)/N₁ + P₂(1 − P₂)/N₂)


Difference in Proportions

We have P₁ = 0.24, 1 − P₁ = 0.76, P₂ = 0.20, and 1 − P₂ = 0.80, and z_{0.01} = 2.33.

Thus, the desired 98% confidence interval is

(0.24 − 0.20) ± 2.33 √((0.24)(0.76)/50 + (0.20)(0.80)/60)

0.04 ± 0.1851 or [−0.1451, 0.2251]
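As a check, the same computation as a sketch (ours; scipy’s exact z-value 2.326 gives essentially the handout’s table-based answer):

from scipy.stats import norm

P1, N1, P2, N2 = 12 / 50, 50, 12 / 60, 60
z = norm.ppf(0.99)                                  # about 2.33
se = (P1 * (1 - P1) / N1 + P2 * (1 - P2) / N2) ** 0.5
print((P1 - P2) - z * se, (P1 - P2) + z * se)       # approximately -0.145 and 0.225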

Difference in Proportions

The 98% confidence interval is [−0.1451, 0.2251].

Notice that the confidence interval contains 0. Thus, a zero value for the difference in proportions is “believable” (at approximately the 98% level) on the basis of the observed data.

But of course the interval also contains the value 0.1, and so 0.1 represents another value for the difference in proportions that is “believable”, etc.

Selecting the Sample Size

There are two considerations in choosing the appropriate sample size for estimating µ using a confidence interval:

1 The tolerable error. This establishes the desired width of the confidence interval.

2 The confidence level that should be selected.

A wide confidence interval would not be very informative, but the cost of obtaining a narrow confidence interval could be quite large.

Similarly, too low a confidence level would mean that the stated confidence interval is likely to be in error, but obtaining a higher level of confidence might be quite expensive.


Selecting the Sample Size

Suppose we wish to estimate the average daily yield µ of a chemical, and we wish the error of estimation to be less than 5 tons with probability 0.95.

Because approximately 95% of the sample means will lie within 2σ_X̄ (really 1.96σ_X̄) of µ in repeated sampling, we are asking that 2σ_X̄ equal 5 tons:

2σ/√N = 5

N = 4σ²/25

Selecting the Sample Size

We cannot obtain an exact numerical value of N unless the population standard deviation σ is known.

We could use an estimate s obtained from a previous sample. Let’s say that σ = 21.

N = (4)(21)²/25 = 70.56 ≈ 71

Thus, using a sample size of N = 71, we can be 95% confident that our estimate will lie within 5 tons of the true average daily yield.
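The calculation generalizes to any tolerance and z-value. Here is a small helper sketch (ours), under the handout’s assumptions (normal sampling distribution, σ guessed from a previous sample).

import math

def n_for_tolerance(sigma, tol, z=1.96):
    # require z * sigma / sqrt(N) <= tol, i.e., N >= (z * sigma / tol)^2
    return math.ceil((z * sigma / tol) ** 2)

print(n_for_tolerance(sigma=21, tol=5, z=2))   # 71, matching the slide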

Small-Sample Confidence Intervals

The formula for calculating large-sample confidence intervals is

θ̂ ± z_{α/2}σ_θ̂

When θ = µ is the target parameter, then θ̂ = X̄ and σ_θ̂² = σ²/N, where σ² is the population variance.

If the true value of σ² is known, then this value should be used when calculating the confidence interval.

However, if σ² is unknown (as will almost always be the case) and N is large, then there is no real loss of accuracy if s² is substituted for σ² (recall that s² converges to σ² as N increases).

As a result, we can use the standard normal distribution in these circumstances as well.


Small-Sample Confidence Intervals

Problems only arise if σ² is unknown AND N is small.

In this case, we will need to calculate small-sample confidence intervals.

In effect, using s instead of σ introduces an additional source of unreliability into our calculations, and we must, therefore, widen the confidence intervals.

Small-Sample Confidence Intervals: Mean

In terms of the population mean, we have already seen that

Z = (X̄ − µ) / (σ/√N)

possesses approximately a standard normal distribution.

Well, if we substitute s for σ, we have

T = (X̄ − µ) / (s/√N)

which has a t distribution with (N − 1) degrees of freedom.

Small-Sample Confidence Intervals: Mean

The quantity T now serves as a pivotal quantity that we will use to form confidence intervals for µ.

We can use a t distribution table to find values −t_{N−1,α/2} and t_{N−1,α/2} so that

P(−t_{N−1,α/2} ≤ T ≤ t_{N−1,α/2}) = 1 − α

Thus, we will now construct our confidence intervals according to:

[X̄_L, X̄_U] = X̄ ± t_{N−1,α/2} (s/√N)
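As a sketch (ours; the handout’s own computations further below use Stata), the t-based interval in Python:

from scipy.stats import t

def t_interval(xbar, s, N, conf=0.95):
    tcrit = t.ppf(1 - (1 - conf) / 2, df=N - 1)   # t_{N-1, alpha/2}
    half = tcrit * s / N ** 0.5
    return xbar - half, xbar + half

# Muzzle-velocity example on a later slide: N = 8, X-bar = 2959, s = 39.1
print(t_interval(2959, 39.1, 8))   # approximately (2926.3, 2991.7)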


Student’s t-Distribution

Figure: Standard Normal and Student’s t-Distributions

The confidence intervals constructed using the z distribution and the t distribution are effectively the same when the degrees of freedom (N − 1) are greater than 120; they are also very close as soon as the degrees of freedom (N − 1) are greater than 30.

Student’s t-Distribution

To go to Comparison of Student’s t and Normal Distributions underDistributions Related to the Normal, click here

Small-Sample Confidence Intervals: Mean

Technically, the small-sample confidence intervals for the mean are based on the assumption that the sample is randomly drawn from a normal population.

However, experimental evidence has shown that the interval for a single mean is quite robust in relation to moderate departures from normality.


Small-Sample Confidence Intervals: Mean

Example: A manufacturer of gunpowder has developed a new powder, which was tested in eight shells. The resulting muzzle velocities were: 3005, 2925, 2935, 2965, 2995, 3005, 2937, 2905. Find a 95% confidence interval for the true average velocity µ for shells of this type. Assume that muzzle velocities are approximately normally distributed.

The confidence interval for µ is

X̄ ± t_{N−1,α/2} (s/√N)

Small-Sample Confidence Intervals: Mean

For the given data, X̄ = 2959 and s = 39.1.

Using the table for the t distribution, we have t_{7,0.025} = 2.365.

Thus, we have

2959 ± 2.365 (39.1/√8) or 2959 ± 32.7

as the observed confidence interval for µ.

Small-Sample Confidence Intervals: Mean

. sum muzzle_velocity

    Variable |       Obs        Mean    Std. Dev.       Min        Max
-------------+--------------------------------------------------------
muzzle_vel~y |         8        2959    39.08964       2905       3005

. ci muzzle_velocity, level(95)

    Variable |       Obs        Mean    Std. Err.    [95% Conf. Interval]
-------------+------------------------------------------------------------
muzzle_vel~y |         8        2959    13.82027       2926.32    2991.68

. ci muzzle_velocity, level(99)

    Variable |       Obs        Mean    Std. Err.    [99% Conf. Interval]
-------------+------------------------------------------------------------
muzzle_vel~y |         8        2959    13.82027      2910.636   3007.364


Small-Sample Confidence Intervals: Mean

Example: From a large class, a random sample of 4 grades was drawn: 64, 66, 89, and 77. Calculate a 95% confidence interval for the whole-class mean µ. Assume that the class grades are approximately normally distributed.

Table: Small-Sample Confidence Interval for a Mean

X     (X − X̄)    (X − X̄)²
64      −10         100
66       −8          64
89       15         225
77        3           9

X̄ = 296/4 = 74;  Σ(X − X̄) = 0;  s² = 398/3 = 132.7

Small-Sample Confidence Intervals: Mean

The confidence interval for µ is

X̄ ± t_{N−1,α/2} (s/√N)

For the given data, X̄ = 74 and s = √132.7.

In this example, we have N − 1 = 3 degrees of freedom.

Using the table for the t distribution, we have t_{3,0.025} = 3.18.

Small-Sample Confidence Intervals: Mean

Thus, we have

74 ± 3.18 (√132.7/√4) or 74 ± 18

as the observed confidence interval for µ.

That is, with 95% confidence, we can conclude that the mean grade of the whole class is between 56 and 92.


Difference in Means

Suppose we are interested in comparing the means of two normal populations, one with mean µ₁ and variance σ₁² and the other with mean µ₂ and variance σ₂².

If the samples are independent, then confidence intervals for µ₁ − µ₂ based on a t-distributed random variable can be constructed if we assume that the two populations have a common but unknown variance, σ₁² = σ₂² = σ².

If X̄₁ and X̄₂ are the two sample means, then the large-sample confidence interval for (µ₁ − µ₂) is developed by using

Z = ((X̄₁ − X̄₂) − (µ₁ − µ₂)) / √(σ₁²/N₁ + σ₂²/N₂)

as a pivotal quantity.

Small-Sample Confidence Intervals: Difference in Means

Using the assumption σ₁² = σ₂² = σ²,

Z = ((X̄₁ − X̄₂) − (µ₁ − µ₂)) / (σ √(1/N₁ + 1/N₂))

Because σ is unknown, though, we need to find an estimator for the common variance σ² so that we can construct a quantity with a t distribution.

Small-Sample Confidence Intervals: Difference in Means

Let X₁₁, X₁₂, . . . , X₁N₁ denote the random sample of size N₁ from the first population and let X₂₁, X₂₂, . . . , X₂N₂ denote an independent random sample of size N₂ from the second population. Then we have

X̄₁ = (1/N₁) Σ_{i=1}^{N₁} X₁ᵢ

and

X̄₂ = (1/N₂) Σ_{i=1}^{N₂} X₂ᵢ


Difference in Means

The usual unbiased estimator of the common variance σ² is obtained by pooling the sample data to obtain the pooled estimator s_p²:

s_p² = [Σ_{i=1}^{N₁} (X₁ᵢ − X̄₁)² + Σ_{i=1}^{N₂} (X₂ᵢ − X̄₂)²] / [(N₁ − 1) + (N₂ − 1)]
     = [(N₁ − 1)s₁² + (N₂ − 1)s₂²] / (N₁ + N₂ − 2)

where s_i² is the sample variance from the ith sample, i = 1, 2.

Notice that if N₁ = N₂, then s_p² is just the average of s₁² and s₂².

If N₁ ≠ N₂, then s_p² is the weighted average of s₁² and s₂², with larger weight given to the sample variance associated with the larger sample size.

Difference in Means

From all of this we can calculate the following pivotal quantity:

T = ((X̄₁ − X̄₂) − (µ₁ − µ₂)) / (s_p √(1/N₁ + 1/N₂))

This quantity has a t distribution with (N₁ + N₂ − 2) degrees of freedom.

If we use the pivotal method, we find that the small-sample confidence interval for (µ₁ − µ₂) is just

(X̄₁ − X̄₂) ± t_{N₁+N₂−2,α/2} × s_p √(1/N₁ + 1/N₂)
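A sketch (ours) of the pooled interval, computing s_p² from raw data exactly as in the formula above:

import numpy as np
from scipy.stats import t

def pooled_t_interval(x1, x2, conf=0.95):
    x1, x2 = np.asarray(x1, float), np.asarray(x2, float)
    n1, n2 = len(x1), len(x2)
    sp2 = ((n1 - 1) * x1.var(ddof=1) + (n2 - 1) * x2.var(ddof=1)) / (n1 + n2 - 2)
    half = t.ppf(1 - (1 - conf) / 2, n1 + n2 - 2) * (sp2 * (1 / n1 + 1 / n2)) ** 0.5
    diff = x1.mean() - x2.mean()
    return diff - half, diff + half

# Training example on the next slide: standard vs. new training times
standard = [32, 37, 35, 28, 41, 44, 35, 31, 34]
new = [35, 31, 29, 25, 34, 40, 27, 32, 31]
print(pooled_t_interval(standard, new))   # approximately (-1.05, 8.38)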

Difference in Means

Technically, the small-sample confidence intervals for the difference in two means are based on the assumptions that the samples are randomly drawn from two independent and normal populations with equal variances.

Experimental evidence has shown that these intervals are robust to moderate departures from normality and from the assumption of equal population variances if N₁ ≈ N₂.

As N₁ and N₂ become dissimilar, the assumption of equal population variances becomes more crucial.


Difference in Means

Example: Suppose we want to compare two methods for training people. At the end of the training, two groups of nine employees are timed at some task. The nine people who had the standard training had times of 32, 37, 35, 28, 41, 44, 35, 31, and 34. The nine people who had the new training had times of 35, 31, 29, 25, 34, 40, 27, 32, and 31. Estimate the true mean difference (µ₁ − µ₂) with confidence coefficient 0.95.

Assume that the times are approximately normally distributed, that the variances of the times are approximately equal for the two methods, and that the samples are independent.

Difference in Means

For the standard training method, we have sample mean X̄₁ = 35.22 and sample variance s₁² = Σ_{i=1}^{9} (X₁ᵢ − X̄₁)²/(N − 1) = 195.56/8 = 24.445.

For the new training method, we have sample mean X̄₂ = 31.56 and sample variance s₂² = Σ_{i=1}^{9} (X₂ᵢ − X̄₂)²/(N − 1) = 160.22/8 = 20.027.

As a result, we have

s_p² = (8(24.445) + 8(20.027)) / (9 + 9 − 2) = (195.56 + 160.22)/16 = 22.236

s_p = 4.716

Difference in Means

Since t_{16,0.025} = 2.120, the observed confidence interval is

(X̄₁ − X̄₂) ± t_{N₁+N₂−2,α/2} s_p √(1/N₁ + 1/N₂)

(35.22 − 31.56) ± (2.120)(4.716) √(1/9 + 1/9)

3.66 ± 4.71

This confidence interval can be written as [−1.05, 8.37].

Since the interval contains both positive and negative numbers, we cannot say that the new training method differs from the other at our given level of confidence.


Difference in Means

Example: From a large class, a sample of 4 grades was drawn: 64, 66, 89, and 77. From a second large class, an independent sample of 3 grades was drawn: 56, 71, and 53. Calculate the 95% confidence interval for the difference between the two class means, µ₁ − µ₂. Assume that the grades from both classes are approximately normally distributed and that the variances of the grades are approximately equal for the two classes.

Difference in Means

Table: Difference in Two Means (Independent Samples): Class 1

X₁     (X₁ − X̄₁)    (X₁ − X̄₁)²
64       −10           100
66        −8            64
89        15           225
77         3             9

X̄₁ = 296/4 = 74;  Σ(X₁ − X̄₁) = 0;  s₁² = 398/3 = 132.7

Table: Difference in Two Means (Independent Samples): Class 2

X₂     (X₂ − X̄₂)    (X₂ − X̄₂)²
56        −4            16
71        11           121
53        −7            49

X̄₂ = 180/3 = 60;  Σ(X₂ − X̄₂) = 0;  s₂² = 186/2 = 93

Difference in Means

Class 1: The sample mean is X̄₁ = 74 and the sample variance is s₁² = Σ_{i=1}^{4} (X₁ᵢ − X̄₁)²/(N − 1) = 398/3 = 132.7.

Class 2: The sample mean is X̄₂ = 60 and the sample variance is s₂² = Σ_{i=1}^{3} (X₂ᵢ − X̄₂)²/(N − 1) = 186/2 = 93.

s_p² = (3(132.7) + 2(93)) / (4 + 3 − 2) = (398 + 186)/5 ≈ 117

s_p = √117


Difference in Means

We can find that t_{5,0.025} = 2.57. The observed confidence interval is therefore

(X̄₁ − X̄₂) ± t_{N₁+N₂−2,α/2} s_p √(1/N₁ + 1/N₂)

(74 − 60) ± (2.57) √117 √(1/4 + 1/3)

14 ± 21

This confidence interval can be written as [−7, 35].

Since the interval contains both positive and negative numbers, we cannot say that the mean grades differ from one class to the other at our given level of confidence.

Difference in Means (Dependent or Matched Samples)

We might also want to compare means across dependent samples. Dependent samples are sometimes called matched or paired samples.

Suppose that we want to compare the fall grades and spring grades for the same students.

Table: Difference in Two Means (Dependent Samples)

            Observed Grades      Difference
Name        X₁      X₂      D = X₁ − X₂    D − D̄    (D − D̄)²
Trimble     64      57           7           −4         16
Wilde       66      57           9           −2          4
Giannos     89      73          16            5         25
Ames        77      65          12            1          1

D̄ = 44/4 = 11;  Σ(D − D̄) = 0;  s_D² = 46/3 = 15.3

Difference in Means (Dependent or Matched Samples)

We can use the sample of differences D to construct a confidence interval for the average population difference ∆.

The confidence interval for ∆ in a matched-pair sample is

D̄ ± t_{N−1,α/2} (s_D/√N)

Suppose we want to construct a 95% confidence interval.


Difference in Means (Dependent or Matched Samples)

D̄ = 11 and s_D = √15.3.

Using the table for the t distribution, we have t_{3,0.025} = 3.18.

Thus, we have

11 ± 3.18 (√15.3/√4) or 11 ± 6

as the observed confidence interval for ∆.
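A sketch (ours) of the matched-pair interval: take the differences, then apply the one-sample t interval to them.

import numpy as np
from scipy.stats import t

fall = np.array([64, 66, 89, 77])
spring = np.array([57, 57, 73, 65])
D = fall - spring                               # 7, 9, 16, 12
half = t.ppf(0.975, len(D) - 1) * D.std(ddof=1) / len(D) ** 0.5
print(D.mean() - half, D.mean() + half)         # approximately (4.8, 17.2), the 11 ± 6 interval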

Difference in Means (Dependent or Matched Samples)

We are estimating the same parameter (the difference in two population means) with the dependent samples as we did with the independent samples.

The matched-pair approach is much better because it has a smaller confidence interval. Why?

The independent-samples confidence interval was [−7, 35].

The dependent-samples confidence interval for the same data was just [5, 17].


Essentially, pairing achieves a match that keeps constant many of the extraneous variables that might affect our results.


Overview

Example: To measure the effect of a fitness campaign, a ski club randomly sampled five members before the campaign and another five afterwards. The weights were as follows:

Before: JH 168, KL 195, MM 155, TR 183, MT 169

After: LW 183, VG 177, EP 148, JC 162, MW 180

Calculate a 95% confidence interval for (i) the mean weight before the campaign, (ii) the mean weight after the campaign, and (iii) the mean weight loss during the campaign.

Overview

Table: Small-Sample Confidence Interval for Difference in Two Means (Independent Samples)

            Before                                After
X₁     (X₁ − X̄₁)   (X₁ − X̄₁)²       X₂     (X₂ − X̄₂)   (X₂ − X̄₂)²
168       −6            36           183       13           169
195       21           441           177        7            49
155      −19           361           148      −22           484
183        9            81           162       −8            64
169       −5            25           180       10           100

X̄₁ = 870/5 = 174;  Σ = 944           X̄₂ = 850/5 = 170;  Σ = 866

Overview

µ₁: 174 ± 2.78 (√(944/4)/√5) = 174 ± 19

µ₂: 170 ± 2.78 (√(866/4)/√5) = 170 ± 18

µ₁ − µ₂: (174 − 170) ± 2.31 √((944 + 866)/(4 + 4)) × √(1/5 + 1/5) = 4 ± 22


Overview

It was then decided that a better sampling design would be to measure the same people after as before.

After: KL 194, MT 160, TR 177, MM 147, JH 157

Table: Difference in Two Means (Dependent Samples)

            Weights            Difference
Name       X₁      X₂     D = X₁ − X₂    D − D̄    (D − D̄)²
JH         168     157         11           4         16
KL         195     194          1          −6         36
MM         155     147          8           1          1
TR         183     177          6          −1          1
MT         169     160          9           2          4

D̄ = 35/5 = 7;  Σ(D − D̄) = 0;  s_D² = 58/4 = 14.5

∆: 7 ± 2.78 (√14.5/√5) or 7 ± 5

Confidence Interval for Population Variance σ²

As we’ve seen before, s² = Σ_{i=1}^{N} (Xᵢ − X̄)²/(N − 1) is an unbiased estimator of σ².

Given that it is distributed according to a gamma distribution, it can be difficult to determine the probability of it lying in a specific interval.

But we can transform it (as we did before) into a quantity that has a χ² distribution with N − 1 degrees of freedom:

χ² = (N − 1)s²/σ²

Confidence Interval for Population Variance σ²

As we saw before, this can be written as:

χ²_{N−1} = (N − 1)s²/σ² = Σ_{i=1}^{N} (Xᵢ − X̄)²/σ²

This quantity becomes the pivotal quantity that allows us to calculate confidence intervals for the population variance σ².

In effect, we want to find two numbers χ²_L and χ²_U such that

P( χ²_L ≤ Σ_{i=1}^{N} (Xᵢ − X̄)²/σ² ≤ χ²_U ) = 1 − α

for any confidence coefficient (1 − α).


Confidence Interval for Population Variance σ²

Figure: χ² Distribution with (N − 1) = 3 Degrees of Freedom

We would like to find the shortest interval that includes σ² with probability (1 − α). This is difficult and requires trial and error.

Typically, we compromise by choosing points that cut off equal tail areas.

Confidence Interval for Population Variance σ²

Given the choice to cut off equal tail areas, we obtain

P( χ²_{N−1,1−(α/2)} ≤ Σ(Xi − X̄)²/σ² ≤ χ²_{N−1,α/2} ) = 1 − α

When we reorder the inequality, we get

P( (N − 1)s²/χ²_{N−1,α/2} ≤ σ² ≤ (N − 1)s²/χ²_{N−1,1−(α/2)} ) = 1 − α

Thus, the 100(1 − α)% confidence interval for σ² is

[ (N − 1)s²/χ²_{N−1,α/2} , (N − 1)s²/χ²_{N−1,1−(α/2)} ]
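
In Stata, the two χ² cutoffs come from invchi2tail() and invchi2(), so the interval above translates into two display lines. A minimal do-file sketch, with N, s2, and alpha as hypothetical placeholder values:

local N = 10        // sample size (placeholder)
local s2 = 4.2      // sample variance (placeholder)
local alpha = 0.05
display (`N'-1)*`s2'/invchi2tail(`N'-1, `alpha'/2)    // lower limit for sigma^2
display (`N'-1)*`s2'/invchi2(`N'-1, `alpha'/2)        // upper limit for sigma^2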

Confidence Interval for Population Variance σ²

Technically, the small-sample confidence intervals for the population variance σ² assume that the sampled population is normally distributed.

Unlike small-sample confidence intervals for the population mean or difference in population means, which are reasonably robust to deviations from normality, experimental evidence suggests that small-sample confidence intervals for the population variance can be quite misleading if the sampled population is not normally distributed.


Confidence Interval for Population Variance σ²

Example: Suppose we have a sample of observations with values 4.1, 5.2, and 10.2. Estimate σ² with confidence coefficient 0.90. Assume normality.

For the data, we have s² = 10.57.

We can see from the χ² distribution table that χ²_{2,0.95} = 0.103 and χ²_{2,0.05} = 5.991.

Confidence Interval for Population Variance σ²

Thus, the 90% confidence interval for σ² is

[ (N − 1)s²/χ²_{2,0.05} , (N − 1)s²/χ²_{2,0.95} ]

= [ (2)(10.57)/5.991 , (2)(10.57)/0.103 ]

= [3.53, 205.24]

This confidence interval is very wide. Why?
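
As a check, the cutoffs and the interval can be pulled from Stata rather than from the table (small differences reflect the rounding of the tabled 0.103):

display invchi2(2, 0.05), invchi2tail(2, 0.05)    // about 0.103 and 5.991
display (2*10.57)/invchi2tail(2, 0.05)            // lower limit, about 3.53
display (2*10.57)/invchi2(2, 0.05)                // upper limit, about 205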

Comparing σ² in Two Populations

What if we want to compare σ² in two populations?

Rather than look at s1² − s2², which has a complicated sampling distribution, we look at s1²/s2².

Let two independent random samples of sizes N1 and N2 be drawn from two normal populations with variances σ1² and σ2².

Let the variances of the random samples be s1² and s2².


Comparing σ² in Two Populations

Notice that (N1 − 1)s1²/σ1² and (N2 − 1)s2²/σ2² are both independent χ² random variables.

The ratio of two independent χ² random variables, each divided by its degrees of freedom, is distributed according to an F distribution; here, with (N1 − 1) and (N2 − 1) degrees of freedom, i.e.,

F_{N1−1,N2−1} = [ (N1 − 1)s1² / ((N1 − 1)σ1²) ] / [ (N2 − 1)s2² / ((N2 − 1)σ2²) ] = s1²σ2² / (s2²σ1²)

which has an F distribution with (N1 − 1) numerator degrees of freedom and (N2 − 1) denominator degrees of freedom.

This quantity acts as a pivotal quantity.

Comparing σ² in Two Populations

And so, we want to find:

P( F_{N1−1,N2−1,α/2} ≤ s1²σ2²/(s2²σ1²) ≤ F_{N1−1,N2−1,1−(α/2)} ) = 1 − α

When we reorder the inequalities, we have

1/F_{N1−1,N2−1,1−(α/2)} × s1²/s2² ≤ σ1²/σ2² ≤ 1/F_{N1−1,N2−1,α/2} × s1²/s2²

Thus, if we were to construct a 90% confidence interval for the ratio of two population variances based on two sample variances where N1 = 10 and N2 = 8, we would have

1/F_{9,7,0.95} × s1²/s2² ≤ σ1²/σ2² ≤ 1/F_{9,7,0.05} × s1²/s2²

Comparing σ² in Two Populations

But how do you find 1/F_{9,7,0.95} and 1/F_{9,7,0.05}?

Most F distribution tables will only give you information related to the right-hand tail of the distribution.

Thus, it is relatively straightforward to find that 1/F_{9,7,0.95} = 1/3.68.

But how do we find what 1/F_{9,7,0.05} is?


Comparing σ² in Two Populations

Recall from our discussion of probability distributions that if W1 and W2 are independent and ∼ χ²_k and χ²_ℓ, respectively, then

(W1/k) / (W2/ℓ) ∼ F_{k,ℓ}

In other words, the ratio of two chi-squared variables, each divided by its degrees of freedom, is distributed as F with d.f. equal to the numbers of d.f. in the numerator and denominator variables.

We saw that this implied that: if X ∼ F(k, ℓ), then 1/X ∼ F(ℓ, k) (because 1/X = 1/[(W1/k)/(W2/ℓ)] = (W2/ℓ)/(W1/k)).

Comparing σ² in Two Populations

Well, it follows from this that:

F_{N2−1,N1−1,1−(α/2)} = 1 / F_{N1−1,N2−1,α/2}

Given this, we have

F_{9,7,0.05} = 1/F_{7,9,0.95} = 1/3.29 = 0.3

As a result, we can write the 90% confidence interval as

1/3.68 × s1²/s2² ≤ σ1²/σ2² ≤ 1/0.3 × s1²/s2²
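
Stata's inverse-F functions make the reciprocal trick easy to verify: invFtail() returns right-tail cutoffs (the usual table values), while invF() works from the left tail. A small sketch:

display invFtail(9, 7, 0.05)      // F_{9,7,0.95}, about 3.68
display 1/invFtail(7, 9, 0.05)    // F_{9,7,0.05} via the reciprocal trick, about 0.30
display invF(9, 7, 0.05)          // the same left-tail cutoff obtained directly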

Comparing σ² in Two Populations

Example: Two samples of sizes 16 and 10 are drawn at random from two normal populations. Suppose their sample variances are 25.2 and 20, respectively. Find the (i) 98% and (ii) 90% confidence limits for the ratio of the variances.

We need to calculate 1/F_{15,9,0.99} and 1/F_{15,9,0.01}.

Looking at the back of the book, we find that F_{15,9,0.99} = 4.96.

We also know that F_{15,9,0.01} = 1/F_{9,15,0.99} = 1/3.89.


Comparing σ² in Two Populations

Given this, we find for the 98% confidence interval that we have

1/4.96 × 25.2/20.0 ≤ σ1²/σ2² ≤ 3.89 × 25.2/20.0

0.254 ≤ σ1²/σ2² ≤ 4.90

Comparing σ² in Two Populations

We now need to find 1/F_{15,9,0.95} and 1/F_{15,9,0.05}.

Looking at the back of the book, we find that F_{15,9,0.95} = 3.01.

We also know that F_{15,9,0.05} = 1/F_{9,15,0.95} = 1/2.59.

Given this, we find for the 90% confidence interval that we have

1/3.01 × 25.2/20.0 ≤ σ1²/σ2² ≤ 2.59 × 25.2/20.0

0.4186 ≤ σ1²/σ2² ≤ 3.263
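
Both sets of limits can be checked with the same functions (a sketch; small discrepancies with the hand calculations come from table rounding):

display (25.2/20)/invFtail(15, 9, 0.01)    // 98% lower limit, about 0.25
display (25.2/20)*invFtail(9, 15, 0.01)    // 98% upper limit, about 4.90
display (25.2/20)/invFtail(15, 9, 0.05)    // 90% lower limit, about 0.42
display (25.2/20)*invFtail(9, 15, 0.05)    // 90% upper limit, about 3.26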

Comparing σ² in Two Populations

Example: Find the 98% and 90% confidence limits for the ratio of the standard deviations in the previous example.

By taking square roots of the inequalities in the previous example, we find the98% confidence limits are

√0.254 ≤ σ1/σ2 ≤ √4.90

0.50 ≤ σ1/σ2 ≤ 2.21

and that the 90% confidence limits are

√0.4186 ≤ σ1/σ2 ≤ √3.263

0.65 ≤ σ1/σ2 ≤ 1.81


Stata

Let’s return to Zorn’s Warren & Burger Court data from last time.

One variable – constit – was coded one if the case was decided on constitutional grounds, and zero otherwise.

The true “population” proportion is π = 0.2536; we’ll use this as an example of how we can learn about that parameter through the use of confidence intervals.

We’ll begin by considering a rather small random sample of cases (N = 20), and calculating the confidence interval for π based on that sample.

Stata

. use WarrenBurger

. su

    Variable |       Obs        Mean    Std. Dev.       Min        Max
-------------+--------------------------------------------------------
          us |         0
          id |      7161        3581    2067.347          1       7161
       amrev |      7161    .4319229    1.342633          0         33
       amaff |      7161    .4099986    1.302139          0         37
       sumam |      7161    .8419215    2.189712          0         39
-------------+--------------------------------------------------------
      fedpet |      7161     .173998    .3791343          0          1
     constit |      7161    .2535959    .4350993          0          1
        sgam |      7161    .0786203     .269164          0          1

. sample 20, count
(7141 observations deleted)

Stata

. ci constit, level(95)

    Variable |       Obs        Mean    Std. Err.       [95% Conf. Interval]
-------------+---------------------------------------------------------------
     constit |        20          .2    .0917663        .0079309    .3920691

The confidence interval for this sample is [0.008, 0.392], which means that in repeated random samples from this population, we would expect the “true” population parameter to be contained in an interval constructed in this way 95% of the time.
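
One caveat: ci here treats constit as continuous and reports the Normal-approximation (Wald) interval, which is also the interval the simulation code below constructs by hand. Stata can report an exact binomial interval instead; a sketch of the option (output omitted), which will generally differ somewhat in samples this small:

. ci constit, binomial level(95)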


Stata

To illustrate the idea of a confidence interval, we can do what we just did, say, 100 times, and then see how many of the resulting confidence intervals contain the true population value 0.2536.

program define CI20, rclass
    version 10
    use WarrenBurger, clear
    sample 20, count
    tempvar z
    gen `z' = constit
    summarize `z'
    return scalar mean = r(mean)
    return scalar ub = r(mean) + 1.96*sqrt((r(mean) * (1-r(mean)))/20)
    return scalar lb = r(mean) - 1.96*sqrt((r(mean) * (1-r(mean)))/20)
end

. set seed 11101968

. simulate pihat=r(mean) ub=r(ub) lb=r(lb), reps(100): CI20, nodots

Stata

. su

    Variable |       Obs        Mean    Std. Dev.        Min        Max
-------------+--------------------------------------------------------
       pihat |       100       .2335    .0901892        .05         .5
          ub |       100    .4125235    .1171615   .1455186   .7191347
          lb |       100    .0544765    .0641941  -.0455186   .2808653

. tab pihat

    r(mean) |      Freq.     Percent        Cum.
------------+-----------------------------------
        .05 |          4        4.00        4.00
         .1 |          7        7.00       11.00
        .15 |         14       14.00       25.00
         .2 |         24       24.00       49.00
        .25 |         21       21.00       70.00
         .3 |         10       10.00       80.00
        .35 |         16       16.00       96.00
         .4 |          3        3.00       99.00
         .5 |          1        1.00      100.00
------------+-----------------------------------
      Total |        100      100.00

Stata

Figure: 100 Confidence Intervals for π_constit for N = 20

[Figure omitted: the 100 estimates π̂ plotted with their 95% CIs (y-axis “CI Range”, x-axis “Value of Pi”), with a dashed kernel density of the estimates on the secondary axis and a horizontal line at the true π = 0.2536.]

Note that we have four samples with π̂ = 0.05, seven with π̂ = 0.10, and one with π̂ = 0.50, all of which have calculated confidence intervals that do not include the “true” value π = 0.2536. That’s 12/100, or α = 0.12, which is quite different from α = 0.05.
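
Rather than counting misses off the figure, the coverage can be tallied directly from the simulated results. A small sketch (the variable name covered is invented here):

gen covered = (lb <= .2536) & (.2536 <= ub)
tab covered    // the share of 1s is the empirical coverage rate (here, 88/100)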


Stata

We can modify the code slightly to do the same thing with 100 samples of N  = 100:

program define CI100, rclass
    version 10
    use WarrenBurger, clear
    sample 100, count
    tempvar z
    gen `z' = constit
    summarize `z'
    return scalar mean = r(mean)
    return scalar ub = r(mean) + 1.96*sqrt((r(mean) * (1-r(mean)))/100)
    return scalar lb = r(mean) - 1.96*sqrt((r(mean) * (1-r(mean)))/100)
end

. simulate pihat=r(mean) ub=r(ub) lb=r(lb), reps(100): CI100, nodots

Stata

. tab pihat

    r(mean) |      Freq.     Percent        Cum.
------------+-----------------------------------
        .16 |          3        3.00        3.00
        .17 |          1        1.00        4.00
        .18 |          2        2.00        6.00
        .19 |          3        3.00        9.00
         .2 |          5        5.00       14.00
        .21 |          7        7.00       21.00
        .22 |          6        6.00       27.00
        .23 |         10       10.00       37.00
        .24 |         13       13.00       50.00
        .25 |         14       14.00       64.00
        .26 |          8        8.00       72.00
        .27 |          8        8.00       80.00
        .28 |          4        4.00       84.00
        .29 |          1        1.00       85.00
         .3 |          7        7.00       92.00
        .31 |          3        3.00       95.00
        .32 |          2        2.00       97.00
        .33 |          1        1.00       98.00
        .35 |          1        1.00       99.00
        .37 |          1        1.00      100.00
------------+-----------------------------------
      Total |        100      100.00

Stata

Figure: 100 Confidence Intervals for π_constit for N = 100

[Figure omitted: same layout as the previous figure, for samples of N = 100.]

Only six samples out of 100 (three with π̂ = 0.16, one with π̂ = 0.17, one with π̂ = 0.35, and one with π̂ = 0.37) have confidence intervals that do not include π now.

That’s much closer to α = 0.05, as we expect it to be. What we say here is that the “coverage probabilities” are getting better as the size of the sample increases.


Stata

If we do the same for 100 samples, each with N = 400, the coverage gets even better:

. simulate pihat=r(mean) ub=r(ub) lb=r(lb), reps(100): CI400, nodots

      command:  CI400, nodots
        pihat:  r(mean)
           ub:  r(ub)
           lb:  r(lb)

Simulations (100)
----+--- 1 ---+--- 2 ---+--- 3 ---+--- 4 ---+--- 5
..................................................    50
..................................................   100

. su

    Variable |       Obs        Mean    Std. Dev.       Min        Max
-------------+--------------------------------------------------------
       pihat |       100      .24975    .0213984         .2      .3025
          ub |       100    .2921026    .0226069      .2392   .3475154
          lb |       100    .2073974    .0201902      .1608   .2574846

Stata

Figure: 100 Confidence Intervals for π_constit for N = 400

[Figure omitted: same layout as the previous figures, for samples of N = 400.]

While the individual misses are not shown here, the coverage probability is more or less perfect (96/100). In each figure, the overlaid density plot shows the distribution of the estimated proportions (the π̂s). Note that they look increasingly Normal, and that their range / standard deviation declines, as the sample sizes increase.

Stata

This is the code for the figures.

. twoway (scatter pihat pihat, mcolor(black) msymbol(circle))              ///
      (rcap ub lb pihat, lcolor(black) lwidth(vthin) msize(small))         ///
      (kdensity pihat, yaxis(2) lcolor(gs8) lpattern(dash)),               ///
      ytitle(CI Range) yline(.2536, lpattern(longdash) lcolor(cranberry))  ///
      ytitle((Density of Estimates of pi), axis(2))                        ///
      xtitle(Value of Pi) legend(off) aspectratio(1, placement(center))
