Chapters 8 - 9

Estimation

Estimator and Estimate

An estimator of a population parameter is a random variable that depends on the sample information and whose value provides an approximation to this unknown parameter. A specific value of that random variable is called an estimate.

Point Estimator and Point Estimate

Let $\theta$ represent a population parameter (such as the population mean $\mu$ or the population proportion $\pi$). A point estimator, $\hat{\theta}$, of a population parameter, $\theta$, is a function of the sample information that yields a single number called a point estimate. For example, the sample mean, $\bar{X}$, is a point estimator of the population mean $\mu$, and the value that $\bar{X}$ assumes for a given set of data is called the point estimate.

Unbiasedness

The point estimator $\hat{\theta}$ is said to be an unbiased estimator of the parameter $\theta$ if the expected value, or mean, of the sampling distribution of $\hat{\theta}$ is $\theta$; that is,

$$E(\hat{\theta}) = \theta$$

Probability Density Functions for Unbiased and Biased Estimators (Figure 8.1)

[Figure 8.1: probability density functions of two estimators, labelled 1 and 2 in the figure, one unbiased and one biased.]

Bias

Let $\hat{\theta}$ be an estimator of $\theta$. The bias in $\hat{\theta}$ is defined as the difference between its mean and $\theta$; that is,

$$\mathrm{Bias}(\hat{\theta}) = E(\hat{\theta}) - \theta$$

It follows that the bias of an unbiased estimator is 0.
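The definition of bias can be checked exactly on a toy case. The sketch below is only an illustration: the three-value population, the sampling scheme (ordered samples of size 2 drawn with replacement) and all function names are assumptions, not part of the slides. It averages two variance estimators over every possible sample and subtracts the true population variance.

```python
from fractions import Fraction
from itertools import product


def exact_expectation(estimator, population, n):
    """Average an estimator over every equally likely ordered sample of size n
    drawn with replacement from a small finite population."""
    samples = list(product(population, repeat=n))
    return sum(estimator(s) for s in samples) / len(samples)


def variance_divisor_n(sample):
    """Variance estimator that divides the sum of squares by n."""
    m = Fraction(sum(sample), len(sample))
    return sum((x - m) ** 2 for x in sample) / len(sample)


def variance_divisor_n_minus_1(sample):
    """Variance estimator that divides the sum of squares by n - 1."""
    m = Fraction(sum(sample), len(sample))
    return sum((x - m) ** 2 for x in sample) / (len(sample) - 1)


population = [1, 2, 3]  # hypothetical toy population
mu = Fraction(sum(population), len(population))
sigma2 = sum((x - mu) ** 2 for x in population) / len(population)

# Bias = E(estimator) - true parameter value
bias_n = exact_expectation(variance_divisor_n, population, 2) - sigma2
bias_n_minus_1 = exact_expectation(variance_divisor_n_minus_1, population, 2) - sigma2
print(bias_n, bias_n_minus_1)
```

In this toy setting the divisor-n estimator has a nonzero bias, while the divisor-(n - 1) estimator has bias 0.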

Most Efficient Estimator and Relative Efficiency

Suppose there are several unbiased estimators of $\theta$. Then the unbiased estimator with the smallest variance is said to be the most efficient estimator or to be the minimum variance unbiased estimator of $\theta$. Let $\hat{\theta}_1$ and $\hat{\theta}_2$ be two unbiased estimators of $\theta$, based on the same number of sample observations. Then,

a) $\hat{\theta}_1$ is said to be more efficient than $\hat{\theta}_2$ if $\mathrm{Var}(\hat{\theta}_1) < \mathrm{Var}(\hat{\theta}_2)$.
b) The relative efficiency of $\hat{\theta}_1$ with respect to $\hat{\theta}_2$ is the ratio of their variances; that is,

$$\text{Relative Efficiency} = \frac{\mathrm{Var}(\hat{\theta}_2)}{\mathrm{Var}(\hat{\theta}_1)}$$
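As a hedged illustration of the ratio, the sketch below compares the sample mean with the sample median for normal data. It assumes the standard large-sample approximation that the variance of the median is about $(\pi/2)\sigma^2/n$; that approximation, the numeric values and the function name are assumptions and are not taken from the slides.

```python
import math


def relative_efficiency(var_1, var_2):
    """Relative efficiency of estimator 1 with respect to estimator 2,
    computed as Var(estimator 2) / Var(estimator 1)."""
    return var_2 / var_1


sigma2, n = 4.0, 25  # hypothetical population variance and sample size

var_mean = sigma2 / n
# Assumed large-sample approximation for the median of normal data
var_median = (math.pi / 2) * sigma2 / n

eff = relative_efficiency(var_mean, var_median)
print(round(eff, 2))
```

A ratio above 1 means the first estimator (here the sample mean) has the smaller variance.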

Point Estimators of Selected Population Parameters (Table 8.1)

Population Parameter | Point Estimator | Properties
Mean, $\mu$ | $\bar{X}$ | Unbiased, Most Efficient (assuming normality)
Mean, $\mu$ | $X_m$ | Unbiased (assuming normality), but not most efficient
Proportion, $\pi$ | $p$ | Unbiased, Most Efficient
Variance, $\sigma^2$ | $s^2$ | Unbiased, Most Efficient (assuming normality)

Confidence Interval Estimator

A confidence interval estimator for a population parameter is a rule for determining (based on sample information) a range, or interval, that is likely to include the parameter. The corresponding estimate is called a confidence interval estimate.

Confidence Interval and Confidence Level

Let $\theta$ be an unknown parameter. Suppose that on the basis of sample information, random variables A and B are found such that $P(A < \theta < B) = 1 - \alpha$, where $\alpha$ is any number between 0 and 1. If the specific sample values of A and B are a and b, then the interval from a to b is called a $100(1 - \alpha)\%$ confidence interval of $\theta$. The quantity $(1 - \alpha)$ is called the confidence level of the interval.

If the population were repeatedly sampled a very large number of times, the true value of the parameter $\theta$ would be contained in $100(1 - \alpha)\%$ of intervals calculated this way. The confidence interval calculated in this manner is written as $a < \theta < b$ with $100(1 - \alpha)\%$ confidence.
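The repeated-sampling interpretation can be illustrated with a small simulation. This is a sketch under stated assumptions: the data are simulated normal values, the interval rule is the known-variance rule $\bar{X} \pm Z_{\alpha/2}\,\sigma/\sqrt{n}$, and the parameter values, seed and function name are all hypothetical.

```python
import random
from statistics import NormalDist, fmean


def coverage(mu, sigma, n, alpha, repetitions, seed=0):
    """Fraction of simulated samples whose interval contains the true mean."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    z = NormalDist().inv_cdf(1 - alpha / 2)
    half_width = z * sigma / n ** 0.5
    hits = 0
    for _ in range(repetitions):
        xbar = fmean(rng.gauss(mu, sigma) for _ in range(n))
        if xbar - half_width < mu < xbar + half_width:
            hits += 1
    return hits / repetitions


rate = coverage(mu=50.0, sigma=10.0, n=25, alpha=0.05, repetitions=4000)
print(rate)
```

With many repetitions the observed rate should settle near the confidence level $1 - \alpha$, here 0.95.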

P(-1.96 < Z < 1.96) = 0.95, where Z is a Standard Normal Variable (Figure 8.3)

[Figure 8.3: standard normal density with central area 0.95 = P(-1.96 < Z < 1.96) between -1.96 and 1.96, and area 0.025 in each tail.]

Notation

Let $Z_{\alpha/2}$ be the number for which

$$P(Z > Z_{\alpha/2}) = \frac{\alpha}{2}$$

where the random variable Z follows a standard normal distribution.

Selected Values $Z_{\alpha/2}$ from the Standard Normal Distribution Table (Table 8.2)

$\alpha$ | 0.01 | 0.02 | 0.05 | 0.10
$Z_{\alpha/2}$ | 2.58 | 2.33 | 1.96 | 1.645
Confidence Level | 99% | 98% | 95% | 90%
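The tabulated values can be reproduced without a printed table. The sketch below uses `statistics.NormalDist` from the Python standard library; the helper name `z_critical` is an assumption introduced for illustration.

```python
from statistics import NormalDist


def z_critical(alpha):
    """Z_{alpha/2}: the point with upper-tail probability alpha/2
    under the standard normal distribution."""
    return NormalDist().inv_cdf(1 - alpha / 2)


table = {alpha: round(z_critical(alpha), 3) for alpha in (0.01, 0.02, 0.05, 0.10)}
print(table)
```

The table above rounds the 99% and 98% values to two decimals, so small differences in the third decimal are expected.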

Confidence Intervals for the Mean of a Population that is Normally Distributed: Population Variance Known

Consider a random sample of n observations from a normal distribution with mean $\mu$ and variance $\sigma^2$. If the sample mean is $\bar{X}$, then a $100(1 - \alpha)\%$ confidence interval for the population mean with known variance is given by

$$\bar{X} - Z_{\alpha/2}\frac{\sigma}{\sqrt{n}} < \mu < \bar{X} + Z_{\alpha/2}\frac{\sigma}{\sqrt{n}}$$

or equivalently,

$$\bar{X} \pm B$$

where the margin of error (also called the sampling error, the bound, or the interval half width) is given by

$$B = Z_{\alpha/2}\frac{\sigma}{\sqrt{n}}$$
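A minimal sketch of the known-variance interval follows. The function name and the numeric inputs are hypothetical; only the formula $\bar{X} \pm Z_{\alpha/2}\,\sigma/\sqrt{n}$ comes from the text.

```python
from statistics import NormalDist


def mean_ci_known_variance(xbar, sigma, n, alpha):
    """Return (LCL, UCL, B) for the population mean when sigma is known."""
    z = NormalDist().inv_cdf(1 - alpha / 2)
    bound = z * sigma / n ** 0.5  # margin of error B
    return xbar - bound, xbar + bound, bound


# Hypothetical sample summary: mean 19.8, known sigma 1.2, n = 16, 90% level
lcl, ucl, bound = mean_ci_known_variance(xbar=19.8, sigma=1.2, n=16, alpha=0.10)
print(round(lcl, 3), round(ucl, 3), round(bound, 3))
```

The width of the interval is `ucl - lcl`, which equals twice the bound.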

Basic Terminology for Confidence Interval for a Population Mean with Known Population Variance (Table 8.3)

Terms | Symbol | To Obtain
Standard Error of the Mean | $\sigma_{\bar{X}}$ | $\sigma/\sqrt{n}$
Z Value (also called the Reliability Factor) | $Z_{\alpha/2}$ | Use Standard Normal Distribution Table
Margin of Error | $B$ | $B = Z_{\alpha/2}\,\sigma/\sqrt{n}$
Lower Confidence Limit | LCL | $LCL = \bar{X} - Z_{\alpha/2}\,\sigma/\sqrt{n}$
Upper Confidence Limit | UCL | $UCL = \bar{X} + Z_{\alpha/2}\,\sigma/\sqrt{n}$
Width (width is twice the bound) | $w$ | $w = 2B = 2Z_{\alpha/2}\,\sigma/\sqrt{n}$

Student’s t Distribution

Given a random sample of n observations, with mean $\bar{X}$ and standard deviation s, from a normally distributed population with mean $\mu$, the variable t follows the Student’s t distribution with (n - 1) degrees of freedom and is given by

$$t = \frac{\bar{X} - \mu}{s/\sqrt{n}}$$

Notation

A random variable having the Student’s t distribution with v degrees of freedom will be denoted $t_v$. Then $t_{v,\alpha/2}$ is defined as the number for which

$$P(t_v > t_{v,\alpha/2}) = \frac{\alpha}{2}$$

Confidence Intervals for the Mean of a Normal Population: Population Variance Unknown

Suppose there is a random sample of n observations from a normal distribution with mean $\mu$ and unknown variance. If the sample mean and standard deviation are, respectively, $\bar{X}$ and s, then a $100(1 - \alpha)\%$ confidence interval for the population mean, variance unknown, is given by

$$\bar{X} - t_{n-1,\alpha/2}\frac{s}{\sqrt{n}} < \mu < \bar{X} + t_{n-1,\alpha/2}\frac{s}{\sqrt{n}}$$

or equivalently,

$$\bar{X} \pm B$$

where the margin of error, the sampling error, or bound, B, is given by

$$B = t_{n-1,\alpha/2}\frac{s}{\sqrt{n}}$$

and $t_{n-1,\alpha/2}$ is the number for which

$$P(t_{n-1} > t_{n-1,\alpha/2}) = \frac{\alpha}{2}$$

The random variable $t_{n-1}$ has a Student’s t distribution with v = (n - 1) degrees of freedom.
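The unknown-variance interval can be sketched as below. The Python standard library has no Student’s t quantile function, so the critical value is passed in as a parameter and is assumed to come from a t table; the data and the function name are hypothetical.

```python
from statistics import mean, stdev


def mean_ci_unknown_variance(data, t_crit):
    """Return (LCL, UCL) using xbar +/- t_crit * s / sqrt(n).

    t_crit must be t_{n-1, alpha/2}, looked up by the caller.
    """
    n = len(data)
    xbar = mean(data)
    s = stdev(data)  # sample standard deviation (divisor n - 1)
    bound = t_crit * s / n ** 0.5
    return xbar - bound, xbar + bound


sample = [18.6, 18.4, 19.2, 20.8, 19.4, 20.5]  # hypothetical data, n = 6
# t_{5, 0.025} = 2.571 from a t table, giving a 95% interval
lcl, ucl = mean_ci_unknown_variance(sample, t_crit=2.571)
print(round(lcl, 3), round(ucl, 3))
```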

Confidence Intervals for Population Proportion (Large Samples)

Let p denote the observed proportion of “successes” in a random sample of n observations from a population with a proportion $\pi$ of successes. Then, if n is large enough that $n\pi(1 - \pi) > 9$, a $100(1 - \alpha)\%$ confidence interval for the population proportion is given by

$$p - Z_{\alpha/2}\sqrt{\frac{p(1-p)}{n}} < \pi < p + Z_{\alpha/2}\sqrt{\frac{p(1-p)}{n}}$$

or equivalently,

$$p \pm B$$

where the margin of error, the sampling error, or bound, B, is given by

$$B = Z_{\alpha/2}\sqrt{\frac{p(1-p)}{n}}$$

and $Z_{\alpha/2}$ is the number for which a standard normal variable Z satisfies

$$P(Z > Z_{\alpha/2}) = \frac{\alpha}{2}$$
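A hedged sketch of the large-sample proportion interval is given below. Because the population proportion is unknown in practice, the size check in the code substitutes the observed p; that substitution, the function name and the example numbers are assumptions.

```python
from statistics import NormalDist


def proportion_ci(p, n, alpha):
    """Return (LCL, UCL) for p +/- Z_{alpha/2} * sqrt(p(1-p)/n)."""
    # Assumption: the observed p stands in for the unknown population
    # proportion when checking the large-sample condition.
    if n * p * (1 - p) <= 9:
        raise ValueError("sample too small for the normal approximation")
    z = NormalDist().inv_cdf(1 - alpha / 2)
    bound = z * (p * (1 - p) / n) ** 0.5
    return p - bound, p + bound


# Hypothetical survey: 40% successes among 400 observations, 95% level
lcl, ucl = proportion_ci(p=0.4, n=400, alpha=0.05)
print(round(lcl, 4), round(ucl, 4))
```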

Notation

A random variable having the chi-square distribution with v = n - 1 degrees of freedom will be denoted by $\chi^2_v$ or simply $\chi^2_{n-1}$. Define as $\chi^2_{n-1,\alpha}$ the number for which

$$P(\chi^2_{n-1} > \chi^2_{n-1,\alpha}) = \alpha$$

The Chi-Square Distribution (Figure 8.17)

[Figure 8.17: chi-square density on an axis starting at 0, showing the area $1 - \alpha$ and the point $\chi^2_{n-1,\alpha}$.]

The Chi-Square Distribution for n - 1 Degrees of Freedom and $100(1 - \alpha)\%$ Confidence Level (Figure 8.18)

[Figure 8.18: chi-square density with area $\alpha/2$ in each tail and area $1 - \alpha$ between the points $\chi^2_{n-1,1-\alpha/2}$ and $\chi^2_{n-1,\alpha/2}$.]

Confidence Intervals for the Variance of a Normal Population

Suppose there is a random sample of n observations from a normally distributed population with variance $\sigma^2$. If the observed variance is $s^2$, then a $100(1 - \alpha)\%$ confidence interval for the population variance is given by

$$\frac{(n-1)s^2}{\chi^2_{n-1,\alpha/2}} < \sigma^2 < \frac{(n-1)s^2}{\chi^2_{n-1,1-\alpha/2}}$$

where $\chi^2_{n-1,\alpha/2}$ is the number for which

$$P(\chi^2_{n-1} > \chi^2_{n-1,\alpha/2}) = \frac{\alpha}{2}$$

and $\chi^2_{n-1,1-\alpha/2}$ is the number for which

$$P(\chi^2_{n-1} < \chi^2_{n-1,1-\alpha/2}) = \frac{\alpha}{2}$$

and the random variable $\chi^2_{n-1}$ follows a chi-square distribution with (n - 1) degrees of freedom.
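The variance interval can be sketched as follows. The standard library has no chi-square quantile function, so both critical values are parameters assumed to come from a chi-square table; the function name and the sample values are hypothetical.

```python
def variance_ci(s2, n, chi2_upper, chi2_lower):
    """Return (LCL, UCL) for the population variance.

    chi2_upper is chi^2_{n-1, alpha/2} (upper-tail area alpha/2);
    chi2_lower is chi^2_{n-1, 1 - alpha/2} (lower-tail area alpha/2).
    """
    lcl = (n - 1) * s2 / chi2_upper  # larger divisor gives the lower limit
    ucl = (n - 1) * s2 / chi2_lower
    return lcl, ucl


# Hypothetical sample: n = 15, s^2 = 5.0, 95% level.
# Table values for 14 degrees of freedom: 26.119 and 5.629.
lcl, ucl = variance_ci(s2=5.0, n=15, chi2_upper=26.119, chi2_lower=5.629)
print(round(lcl, 3), round(ucl, 3))
```

Note that the interval is not symmetric around $s^2$, because the chi-square distribution is skewed.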

Confidence Intervals for Two Means: Matched Pairs

Suppose that there is a random sample of n matched pairs of observations from normal distributions with means $\mu_X$ and $\mu_Y$. That is, $x_1, x_2, \ldots, x_n$ denotes the values of the observations from the population with mean $\mu_X$, and $y_1, y_2, \ldots, y_n$ the matched sampled values from the population with mean $\mu_Y$. Let $\bar{d}$ and $s_d$ denote the observed sample mean and standard deviation for the n differences $d_i = x_i - y_i$. If the population distribution of the differences is assumed to be normal, then a $100(1 - \alpha)\%$ confidence interval for the difference between means ($\mu_d = \mu_X - \mu_Y$) is given by

$$\bar{d} - t_{n-1,\alpha/2}\frac{s_d}{\sqrt{n}} < \mu_d < \bar{d} + t_{n-1,\alpha/2}\frac{s_d}{\sqrt{n}}$$

or equivalently,

$$\bar{d} \pm B$$

Confidence Intervals for Two Means: Matched Pairs (continued)

where the margin of error, the sampling error or the bound, B, is given by

$$B = t_{n-1,\alpha/2}\frac{s_d}{\sqrt{n}}$$

and $t_{n-1,\alpha/2}$ is the number for which

$$P(t_{n-1} > t_{n-1,\alpha/2}) = \frac{\alpha}{2}$$

The random variable $t_{n-1}$ has a Student’s t distribution with (n - 1) degrees of freedom.
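A minimal sketch of the matched-pairs interval, working directly on the differences $d_i = x_i - y_i$. The paired data and the function name are hypothetical, and the t critical value is assumed to be read from a table.

```python
from statistics import mean, stdev


def matched_pairs_ci(x, y, t_crit):
    """Return (LCL, UCL) for mu_X - mu_Y from matched pairs.

    t_crit must be t_{n-1, alpha/2}, supplied by the caller.
    """
    if len(x) != len(y):
        raise ValueError("matched samples must have equal length")
    d = [xi - yi for xi, yi in zip(x, y)]
    n = len(d)
    d_bar = mean(d)
    s_d = stdev(d)  # sample standard deviation of the differences
    bound = t_crit * s_d / n ** 0.5
    return d_bar - bound, d_bar + bound


x = [10, 12, 9, 14, 11]  # hypothetical first measurements
y = [8, 11, 9, 10, 9]    # hypothetical matched measurements
# t_{4, 0.025} = 2.776 from a t table, giving a 95% interval
lcl, ucl = matched_pairs_ci(x, y, t_crit=2.776)
print(round(lcl, 3), round(ucl, 3))
```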

Confidence Intervals for Difference Between Means: Independent Samples (Normal Distributions and Known Population Variances)

Suppose that there are two independent random samples of $n_x$ and $n_y$ observations from normally distributed populations with means $\mu_X$ and $\mu_Y$ and variances $\sigma^2_X$ and $\sigma^2_Y$. If the observed sample means are $\bar{X}$ and $\bar{Y}$, then a $100(1 - \alpha)\%$ confidence interval for $(\mu_X - \mu_Y)$ is given by

$$(\bar{X} - \bar{Y}) - Z_{\alpha/2}\sqrt{\frac{\sigma^2_X}{n_x} + \frac{\sigma^2_Y}{n_y}} < \mu_X - \mu_Y < (\bar{X} - \bar{Y}) + Z_{\alpha/2}\sqrt{\frac{\sigma^2_X}{n_x} + \frac{\sigma^2_Y}{n_y}}$$

or equivalently,

$$(\bar{X} - \bar{Y}) \pm B$$

where the margin of error is given by

$$B = Z_{\alpha/2}\sqrt{\frac{\sigma^2_X}{n_x} + \frac{\sigma^2_Y}{n_y}}$$
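The two-sample interval with known variances can be sketched as below; the function name and all numeric inputs are hypothetical.

```python
from statistics import NormalDist


def diff_means_ci_known_variances(xbar, ybar, var_x, var_y, n_x, n_y, alpha):
    """Return (LCL, UCL) for mu_X - mu_Y when both population variances are known."""
    z = NormalDist().inv_cdf(1 - alpha / 2)
    bound = z * (var_x / n_x + var_y / n_y) ** 0.5
    diff = xbar - ybar
    return diff - bound, diff + bound


# Hypothetical summaries, 95% level
lcl, ucl = diff_means_ci_known_variances(
    xbar=10.5, ybar=9.2, var_x=4.0, var_y=9.0, n_x=25, n_y=36, alpha=0.05
)
print(round(lcl, 3), round(ucl, 3))
```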

Confidence Intervals for Two Means: Unknown Population Variances that are Assumed to be Equal

Suppose that there are two independent random samples with $n_x$ and $n_y$ observations from normally distributed populations with means $\mu_X$ and $\mu_Y$ and a common, but unknown, population variance. If the observed sample means are $\bar{X}$ and $\bar{Y}$, and the observed sample variances are $s^2_X$ and $s^2_Y$, then a $100(1 - \alpha)\%$ confidence interval for $(\mu_X - \mu_Y)$ is given by

$$(\bar{X} - \bar{Y}) - t_{n_x+n_y-2,\alpha/2}\sqrt{\frac{s^2_p}{n_x} + \frac{s^2_p}{n_y}} < \mu_X - \mu_Y < (\bar{X} - \bar{Y}) + t_{n_x+n_y-2,\alpha/2}\sqrt{\frac{s^2_p}{n_x} + \frac{s^2_p}{n_y}}$$

or equivalently,

$$(\bar{X} - \bar{Y}) \pm B$$

where the margin of error is

$$B = t_{n_x+n_y-2,\alpha/2}\sqrt{\frac{s^2_p}{n_x} + \frac{s^2_p}{n_y}}$$

Confidence Intervals for Two Means: Unknown Population Variances that are Assumed to be Equal (continued)

The pooled sample variance, $s^2_p$, is given by

$$s^2_p = \frac{(n_x - 1)s^2_X + (n_y - 1)s^2_Y}{n_x + n_y - 2}$$

and $t_{n_x+n_y-2,\alpha/2}$ is the number for which

$$P(t_{n_x+n_y-2} > t_{n_x+n_y-2,\alpha/2}) = \frac{\alpha}{2}$$

The random variable T approximately follows a Student’s t distribution with $n_x + n_y - 2$ degrees of freedom and is given by

$$T = \frac{(\bar{X} - \bar{Y}) - (\mu_X - \mu_Y)}{s_p\sqrt{\dfrac{1}{n_x} + \dfrac{1}{n_y}}}$$
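The pooled-variance interval can be sketched in two small steps: pool the sample variances, then build the margin of error. The function names and the summary statistics are hypothetical, and the t critical value is assumed to come from a table.

```python
def pooled_variance(s2_x, s2_y, n_x, n_y):
    """Pooled sample variance: each sample variance weighted by its degrees of freedom."""
    return ((n_x - 1) * s2_x + (n_y - 1) * s2_y) / (n_x + n_y - 2)


def diff_means_ci_pooled(xbar, ybar, s2_x, s2_y, n_x, n_y, t_crit):
    """Return (LCL, UCL) for mu_X - mu_Y assuming equal population variances.

    t_crit must be t_{n_x + n_y - 2, alpha/2}, supplied by the caller.
    """
    s2_p = pooled_variance(s2_x, s2_y, n_x, n_y)
    bound = t_crit * (s2_p / n_x + s2_p / n_y) ** 0.5
    diff = xbar - ybar
    return diff - bound, diff + bound


# Hypothetical summaries: 10 + 12 - 2 = 20 degrees of freedom,
# t_{20, 0.025} = 2.086 from a t table, giving a 95% interval
s2_p = pooled_variance(16.0, 25.0, 10, 12)
lcl, ucl = diff_means_ci_pooled(50.0, 45.0, 16.0, 25.0, 10, 12, t_crit=2.086)
print(round(s2_p, 2), round(lcl, 3), round(ucl, 3))
```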

Confidence Intervals for Two Means: Unknown Population Variances, Assumed Not Equal

Suppose that there are two independent random samples of $n_x$ and $n_y$ observations from normally distributed populations with means $\mu_X$ and $\mu_Y$, and it is assumed that the population variances are not equal. If the observed sample means and variances are $\bar{X}$, $\bar{Y}$, and $s^2_X$, $s^2_Y$, then a $100(1 - \alpha)\%$ confidence interval for $(\mu_X - \mu_Y)$ is given by

$$(\bar{X} - \bar{Y}) - t_{(v,\alpha/2)}\sqrt{\frac{s^2_X}{n_x} + \frac{s^2_Y}{n_y}} < \mu_X - \mu_Y < (\bar{X} - \bar{Y}) + t_{(v,\alpha/2)}\sqrt{\frac{s^2_X}{n_x} + \frac{s^2_Y}{n_y}}$$

where the margin of error is

$$B = t_{(v,\alpha/2)}\sqrt{\frac{s^2_X}{n_x} + \frac{s^2_Y}{n_y}}$$

Confidence Intervals for Two Means: Unknown Population Variances, Assumed Not Equal (continued)

The degrees of freedom, v, is given by

$$v = \frac{\left[\dfrac{s^2_X}{n_X} + \dfrac{s^2_Y}{n_Y}\right]^2}{\dfrac{(s^2_X/n_X)^2}{n_X - 1} + \dfrac{(s^2_Y/n_Y)^2}{n_Y - 1}}$$

If the sample sizes are equal, then the degrees of freedom reduces to

$$v = \left(1 + \frac{2}{\dfrac{s^2_X}{s^2_Y} + \dfrac{s^2_Y}{s^2_X}}\right)(n - 1)$$
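The two degrees-of-freedom expressions can be compared numerically. The sketch below implements both and checks that they agree when the sample sizes are equal; the function names and the input values are hypothetical.

```python
def df_unequal_variances(s2_x, s2_y, n_x, n_y):
    """Degrees of freedom v for the unequal-variance interval (general formula)."""
    a = s2_x / n_x
    b = s2_y / n_y
    return (a + b) ** 2 / (a ** 2 / (n_x - 1) + b ** 2 / (n_y - 1))


def df_equal_sample_sizes(s2_x, s2_y, n):
    """Reduced formula for v when both samples have n observations."""
    return (1 + 2 / (s2_x / s2_y + s2_y / s2_x)) * (n - 1)


# Hypothetical sample variances with n = 10 in each sample
v_general = df_unequal_variances(16.0, 25.0, 10, 10)
v_reduced = df_equal_sample_sizes(16.0, 25.0, 10)
print(round(v_general, 4), round(v_reduced, 4))
```

The result is generally not an integer. How to round it before using a t table is a convention the slides do not state, so the sketch leaves v unrounded.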

Confidence Intervals for the Difference Between Two Population Proportions (Large Samples)

Let $p_X$ denote the observed proportion of successes in a random sample of $n_X$ observations from a population with proportion $\pi_X$ of successes, and let $p_Y$ denote the proportion of successes observed in an independent random sample of $n_Y$ observations from a population with proportion $\pi_Y$ of successes. Then, if the sample sizes are large (generally at least forty observations in each sample), a $100(1 - \alpha)\%$ confidence interval for the difference between population proportions $(\pi_X - \pi_Y)$ is given by

$$(p_X - p_Y) \pm B$$

where the margin of error is

$$B = Z_{\alpha/2}\sqrt{\frac{p_X(1 - p_X)}{n_X} + \frac{p_Y(1 - p_Y)}{n_Y}}$$
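A minimal sketch of the interval for a difference between proportions. The function name and the example proportions are hypothetical.

```python
from statistics import NormalDist


def diff_proportions_ci(p_x, n_x, p_y, n_y, alpha):
    """Return (LCL, UCL) for the difference between two population proportions."""
    z = NormalDist().inv_cdf(1 - alpha / 2)
    bound = z * (p_x * (1 - p_x) / n_x + p_y * (1 - p_y) / n_y) ** 0.5
    diff = p_x - p_y
    return diff - bound, diff + bound


# Hypothetical samples: 60% of 200 versus 50% of 250, 95% level
lcl, ucl = diff_proportions_ci(p_x=0.6, n_x=200, p_y=0.5, n_y=250, alpha=0.05)
print(round(lcl, 4), round(ucl, 4))
```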

Sample Size for the Mean of a Normally Distributed Population with Known Population Variance

Suppose that a random sample from a normally distributed population with known variance $\sigma^2$ is selected. Then a $100(1 - \alpha)\%$ confidence interval for the population mean extends a distance B (sometimes called the bound, sampling error, or the margin of error) on each side of the sample mean, if the sample size, n, is

$$n = \frac{Z^2_{\alpha/2}\,\sigma^2}{B^2}$$
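The sample-size formula rarely yields a whole number. The sketch below rounds up to the next integer so that the margin of error does not exceed B; that rounding rule is an assumption added here, as are the function name and the example values.

```python
import math
from statistics import NormalDist


def sample_size_for_mean(sigma, bound, alpha):
    """Smallest integer n with Z_{alpha/2} * sigma / sqrt(n) <= bound."""
    z = NormalDist().inv_cdf(1 - alpha / 2)
    n_exact = (z ** 2) * (sigma ** 2) / bound ** 2
    return math.ceil(n_exact)  # assumed convention: always round up


# Hypothetical planning values: sigma = 10, desired B = 2, 95% level
n_required = sample_size_for_mean(sigma=10.0, bound=2.0, alpha=0.05)
print(n_required)
```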

Sample Size for Population Proportion

Suppose that a random sample is selected from a population. Then a $100(1 - \alpha)\%$ confidence interval for the population proportion, extending a distance of at most B on each side of the sample proportion, can be guaranteed if the sample size, n, is

$$n = \frac{0.25\,(Z_{\alpha/2})^2}{B^2}$$
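The constant 0.25 is the largest possible value of $p(1 - p)$, reached at p = 0.5, which is why this sample size works whatever the sample proportion turns out to be. A hedged sketch follows; rounding up to an integer, the function name and the example margin are assumptions.

```python
import math
from statistics import NormalDist


def sample_size_for_proportion(bound, alpha):
    """Smallest integer n that keeps the margin of error at or below bound
    for every possible sample proportion (worst case p = 0.5)."""
    z = NormalDist().inv_cdf(1 - alpha / 2)
    n_exact = 0.25 * z ** 2 / bound ** 2
    return math.ceil(n_exact)  # assumed convention: always round up


# Hypothetical target: margin of at most 0.03 at the 95% level
n_required = sample_size_for_proportion(bound=0.03, alpha=0.05)
print(n_required)
```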

Key Words

Bias
Bound
Confidence interval:
  For mean, known variance
  For mean, unknown variance
  For proportion
  For two means, matched
  For two means, variances equal
  For two means, variances not equal
  For variance
Confidence Level
Estimate
Estimator
Interval Half Width
Lower Confidence Limit (LCL)
Margin of Error
Minimum Variance Unbiased Estimator
Most Efficient Estimator
Point Estimate
Point Estimator
Relative Efficiency
Reliability Factor
Sample Size for Mean, Known Variance
Sample Size for Proportion
Sampling Error
Student’s t
Unbiased Estimator
Upper Confidence Limit (UCL)
Width
