Lecture 3: Review

- Review of Point and Interval Estimators
- Statistical Significance
- Hypothesis Testing

Point Estimates

An estimator (as opposed to an estimate) is a sample statistic that predicts a value of a parameter. For example, the sample mean and sample standard deviation are estimators.

A point estimate is a particular value of an estimator used to predict the population value. For example, an estimate of the population mean from the sample mean may be “12”.

Point Estimates

Two characteristics of point estimators are their efficiency and bias.

An efficient estimator is one that has the lowest standard error relative to other estimators.

An estimator is biased to the extent that the sampling distribution is not centred around the population parameter.

Efficiency and Bias

[Figure: three sampling distributions, showing a biased estimator, an unbiased but inefficient estimator, and an efficient estimator.]
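The difference between bias and efficiency can be illustrated with a small simulation. The sketch below is not from the lecture: it assumes a normal population with mean 0 and standard deviation 1, and it compares the sample mean with the sample median as estimators of the population mean. The function name, sample size, number of replications, and seed are all illustrative choices.

```python
import random
import statistics


def simulate_estimators(n=25, reps=4000, mu=0.0, sigma=1.0, seed=1):
    """Draw `reps` samples of size `n` from a normal population and return
    the sample means and sample medians, one of each per sample."""
    rng = random.Random(seed)
    means, medians = [], []
    for _ in range(reps):
        sample = [rng.gauss(mu, sigma) for _ in range(n)]
        means.append(statistics.mean(sample))
        medians.append(statistics.median(sample))
    return means, medians


means, medians = simulate_estimators()

# Bias: how far the centre of each sampling distribution lies from mu = 0.
bias_mean = statistics.mean(means)
bias_median = statistics.mean(medians)

# Efficiency: the standard error is the spread of each sampling distribution.
se_mean = statistics.stdev(means)
se_median = statistics.stdev(medians)

print(f"mean:   bias = {bias_mean:+.3f}, standard error = {se_mean:.3f}")
print(f"median: bias = {bias_median:+.3f}, standard error = {se_median:.3f}")
```

Under these assumptions both sampling distributions should be centred close to 0 (little or no bias), while the median should show the larger standard error, which makes it the less efficient of the two.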

Point Estimates

Sample Mean

$$\bar{y} = \frac{\sum y_i}{n}$$

Population Mean

$$\mu = \frac{\sum Y_i}{N}$$

Sample Mean as a point estimate of population mean

$$\hat{\mu} = \bar{y} = \frac{\sum y_i}{n}$$

Point Estimates

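Sample Standard Deviation

$$s = \sqrt{\frac{\sum (y_i - \bar{y})^2}{n}}$$

Population Standard Deviation

$$\sigma = \sqrt{\frac{\sum (Y_i - \mu)^2}{N}}$$

Sample Standard Deviation as estimate of Population Standard Deviation

$$\hat{\sigma} = \sqrt{\frac{\sum (y_i - \bar{y})^2}{n - 1}}$$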
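As a rough illustration of these point estimates, the sketch below computes a mean and two versions of the standard deviation with Python's standard `statistics` module. The data values are hypothetical and are not taken from the lecture; `pstdev` divides the sum of squared deviations by n, while `stdev` divides by n - 1.

```python
import statistics

# Hypothetical sample of n = 8 observations (not from the lecture).
y = [2, 4, 4, 4, 5, 5, 7, 9]

y_bar = statistics.mean(y)          # sample mean: sum(y) / n
sd_n = statistics.pstdev(y)         # sum of squares divided by n
sd_n_minus_1 = statistics.stdev(y)  # sum of squares divided by n - 1

print(f"mean = {y_bar}")
print(f"s.d. with divisor n     = {sd_n:.3f}")
print(f"s.d. with divisor n - 1 = {sd_n_minus_1:.3f}")
```

The n - 1 version is always slightly larger, and the gap between the two shrinks as n grows.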

Interval Estimates

Confidence intervals for a mean

A confidence interval is an interval estimate around a mean. It is a range of values within which the population mean (or other parameter, such as a proportion) has a certain probability of falling.

95% Confidence Interval of a Mean

[Figure: sampling distribution of means. The 95% confidence interval covers the central 95% of the area, between z = -1.96 and z = +1.96, with 2.5% of the area in each tail.]

Confidence Interval of a Mean

$$\text{C.I.} = \bar{y} \pm Z\,\hat{\sigma}_{\bar{y}}$$

Z is the z-score corresponding to the confidence probability;

$\hat{\sigma}_{\bar{y}} = s / \sqrt{n}$ is an estimate of the standard error of the mean (for large samples); and

s is the sample standard deviation.

Confidence Interval of a Mean

Example: You wish to estimate the average height of all Canadian men aged 18-21 from a sample of 50 with a mean height of 166 cm and a standard deviation of 30 cm. Calculate the 95% confidence interval for the mean.

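$$\hat{\sigma}_{\bar{y}} = \frac{s}{\sqrt{n}} = \frac{30}{\sqrt{50}} = 4.24$$

$$\text{C.I.}_{(95)} = \bar{y} \pm Z\,\hat{\sigma}_{\bar{y}} = 166 \pm 1.96(4.24) = 166 \pm 8.31$$

$$\text{C.I.}_{(95)} = 157.7 \text{ to } 174.3$$

We are 95% confident that the true population mean falls in the interval from about 158 cm to 174 cm.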
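The height example can be checked numerically. The following is a minimal sketch using only the Python standard library; the function name `mean_confidence_interval` is an illustrative choice, and the z-score comes from `statistics.NormalDist` rather than a printed table. With s = 30 and n = 50, the margin 1.96 × 30/√50 works out to roughly 8.3 cm.

```python
from math import sqrt
from statistics import NormalDist


def mean_confidence_interval(y_bar, s, n, confidence=0.95):
    """Large-sample confidence interval for a mean: y_bar +/- Z * s / sqrt(n)."""
    # Two-tailed z-score, e.g. about 1.96 for 95% confidence.
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    standard_error = s / sqrt(n)
    margin = z * standard_error
    return y_bar - margin, y_bar + margin


lower, upper = mean_confidence_interval(y_bar=166, s=30, n=50)
print(f"95% C.I.: {lower:.1f} cm to {upper:.1f} cm")
```

Passing `confidence=0.99` widens the interval, since the z-score rises from about 1.96 to about 2.58.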

Point Estimates for Proportions

The sample proportion of individuals in category x

$$p = \frac{X}{n}$$

The population proportion of individuals with characteristic x

$$\pi = \frac{X}{N}$$

Sample proportion is an estimator of the population proportion

$$\hat{\pi} = p = \frac{X}{n}$$

Confidence Intervals for Proportions

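The standard deviation of the sample probability distribution

$$s = \sqrt{p(1 - p)}$$

The standard error of the sample proportion

$$\hat{\sigma}_{\hat{\pi}} = \sqrt{\frac{\hat{\pi}(1 - \hat{\pi})}{n}}$$

Confidence interval for a proportion

$$\text{C.I.} = \hat{\pi} \pm Z\,\hat{\sigma}_{\hat{\pi}} = \hat{\pi} \pm Z\sqrt{\frac{\hat{\pi}(1 - \hat{\pi})}{n}}$$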


Example: Consider a poll predicting election results with a sample size of 1500. Of those who responded to a question about their voting intentions in an upcoming election, 840 (56%) answered that they would vote for the current governing party.

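If we use the sample proportion as an estimator of the population proportion, then

$$\hat{\pi} = 840 / 1500 = 0.56$$

and the proportion that would not vote for the governing party is

$$1 - \hat{\pi} = 1 - 0.56 = 0.44$$

The standard error of the estimated proportion is then:

$$\hat{\sigma}_{\hat{\pi}} = \sqrt{\frac{\hat{\pi}(1 - \hat{\pi})}{n}} = \sqrt{\frac{0.56 \times 0.44}{1500}} = \sqrt{0.000164} = 0.0128$$

The 99% confidence interval is:

$$\text{C.I.}_{99} = \hat{\pi} \pm 2.58\,\hat{\sigma}_{\hat{\pi}}$$

$$\text{C.I.}_{99} = 0.56 \pm 2.58(0.0128) = 0.56 \pm 0.033$$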

We are 99% confident that the true population proportion lies between 0.527 and 0.593.
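The polling example follows the same pattern as the interval for a mean, with the standard error built from the sample proportion. The sketch below is one way to reproduce it in Python; the function name is illustrative, and the 99% z-score (about 2.58) is taken from `statistics.NormalDist` instead of a table.

```python
from math import sqrt
from statistics import NormalDist


def proportion_confidence_interval(x, n, confidence=0.99):
    """Large-sample confidence interval for a proportion, where x is the
    number of sample members in the category of interest."""
    pi_hat = x / n
    standard_error = sqrt(pi_hat * (1 - pi_hat) / n)
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    margin = z * standard_error
    return pi_hat - margin, pi_hat + margin


lower, upper = proportion_confidence_interval(x=840, n=1500)
print(f"99% C.I.: {lower:.3f} to {upper:.3f}")
```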

Significance Tests

Elements of significance tests:

(1) Assumptions:
- Type of data
- Distributions
- Sampling
- Sample size

(2) Hypotheses:
- H0: There is no effect, relationship, or difference
- Ha: There is an effect, difference, or relationship

(3) Test statistic

(4) Significance level, critical value

(5) Decision whether or not to reject H0

Large Sample Z-Test for a Mean

Example:

You sample 400 high school teachers and submit them to an IQ test, which is known to have a population mean of 100. You wish to know whether the sample mean IQ of 115 is significantly different from the population mean. The sample s.d. is 18.


Assumptions: A z-test for a mean assumes random sampling and n greater than 30.

Hypotheses:

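$$H_0: \mu = 100$$

$$H_a: \mu \neq 100$$

Test Statistic:

$$z = \frac{\bar{y} - \mu_0}{\hat{\sigma}_{\bar{y}}} = \frac{\bar{y} - \mu_0}{s / \sqrt{n}}$$

$$z_{test} = \frac{115 - 100}{18 / \sqrt{400}} = \frac{15}{0.9} = 16.67$$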


Significance Level and Critical Value:

α = .05, .01, or .001

Find the appropriate critical values (z-scores) from the table of areas under the normal curve.

For a 2-tailed z-test at α = .05, zcritical = 1.96.

Decision:

Because ztest is higher than zcritical, we reject the null hypothesis and conclude that high school teachers do have significantly higher IQ scores than the general population.
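The steps of the teacher IQ example can be sketched in a few lines of Python. This is an illustration rather than a prescribed procedure; the function name `z_test_mean` is hypothetical, and the critical value is computed with `statistics.NormalDist` instead of being read from the table of areas under the normal curve.

```python
from math import sqrt
from statistics import NormalDist


def z_test_mean(y_bar, mu_0, s, n, alpha=0.05):
    """Two-tailed large-sample z-test of H0: mu = mu_0."""
    standard_error = s / sqrt(n)
    z = (y_bar - mu_0) / standard_error
    # Critical value that leaves alpha / 2 in each tail.
    z_critical = NormalDist().inv_cdf(1 - alpha / 2)
    reject_h0 = abs(z) > z_critical
    return z, z_critical, reject_h0


z, z_critical, reject_h0 = z_test_mean(y_bar=115, mu_0=100, s=18, n=400)
print(f"z_test = {z:.2f}, z_critical = {z_critical:.2f}, reject H0: {reject_h0}")
```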

Critical Values

While in 231 we always found a critical value for a test statistic given our determined α-level, and compared it to our observed value, we can instead find the probability of observing the test statistic value, and compare it to our determined α-level.

Computer output gives us the exact probability, rather than only a test statistic to compare against a table.
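The two approaches lead to the same decision, which the sketch below tries to show for a two-tailed z-test. The z values in the loop are arbitrary illustrations, not results from the lecture, and the helper name `two_tailed_p_value` is an assumption of this example.

```python
from statistics import NormalDist


def two_tailed_p_value(z):
    """Probability of a z-score at least as extreme as z, in either tail."""
    return 2 * (1 - NormalDist().cdf(abs(z)))


alpha = 0.05
z_critical = NormalDist().inv_cdf(1 - alpha / 2)  # about 1.96

for z in (1.50, 1.96, 2.58):  # illustrative z values only
    p = two_tailed_p_value(z)
    by_p_value = p < alpha
    by_critical_value = abs(z) > z_critical
    print(f"z = {z:.2f}: p = {p:.4f}, "
          f"reject by p-value: {by_p_value}, "
          f"reject by critical value: {by_critical_value}")
```

Comparing p with α and comparing |z| with the critical value are two views of the same rule, so the two columns of decisions agree.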

Large-Sample Z-tests for proportions

Example: You randomly sample 500 cattle and dairy farmers in Saskatchewan, asking them whether they have enough stored feed to last through a summer of severe drought. 275 report that they do not. You know that of the total population (all Canadian farmers), 50% do not have adequate stored feed. Is the proportion of Saskatchewan farmers that is unprepared significantly higher?


Assumptions:

Z-tests assume that the sample size is large enough that the sampling distribution will be approximately normal. In practice this means 30 cases or more. Random sampling is also assumed.

Hypotheses:

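$$H_0: \pi = \pi_0$$

$$H_0: \pi = 0.50$$

$$H_a: \pi > \pi_0$$

$$H_a: \pi > 0.50$$

Test Statistic:

$$z = \frac{\hat{\pi} - \pi_0}{\sigma_{\hat{\pi}}} = \frac{\hat{\pi} - \pi_0}{\sqrt{\pi_0(1 - \pi_0) / n}}$$

$$z_{test} = \frac{0.55 - 0.50}{\sqrt{.50(1 - .50) / 500}} = \frac{0.05}{\sqrt{.25 / 500}} = \frac{0.05}{0.0224} = 2.23$$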


Significance level or critical value:

If we look at the table of normal curve probabilities for the probability of finding a z value of 2.23 or greater, we find that the one-tail probability is about 0.0129.

This is greater than .01, so we would fail to reject the null at the .01 level. However, it is less than .05, so we reject the null at the .05 level.

Conclusion:

At the .05 level, a significantly higher proportion of Saskatchewan farmers reported that they did not have enough stored feed to withstand a drought (p=.0129).
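The farmer example can be reproduced as follows. This is a sketch under the one-tailed reading of the question (is the Saskatchewan proportion higher than 0.50?); the function name is illustrative. Because the code does not round intermediate values, it gives z of about 2.236 rather than the tabled 2.23, and a one-tail probability that differs slightly in the fourth decimal place but still falls between .01 and .05.

```python
from math import sqrt
from statistics import NormalDist


def z_test_proportion(x, n, pi_0):
    """One-tailed (upper) large-sample z-test of H0: pi = pi_0.
    The standard error uses pi_0, the value assumed under H0."""
    pi_hat = x / n
    standard_error = sqrt(pi_0 * (1 - pi_0) / n)
    z = (pi_hat - pi_0) / standard_error
    p_one_tail = 1 - NormalDist().cdf(z)
    return pi_hat, z, p_one_tail


pi_hat, z, p_one_tail = z_test_proportion(x=275, n=500, pi_0=0.50)
print(f"pi_hat = {pi_hat:.2f}, z = {z:.3f}, one-tail p = {p_one_tail:.4f}")
print("reject H0 at .05:", p_one_tail < 0.05)
print("reject H0 at .01:", p_one_tail < 0.01)
```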

Small-Sample Inference for Means

The t-distribution

The t-distribution is another bell-shaped symmetrical distribution centred on 0. The t-distribution differs from the normal distribution in that as sample size decreases, the tails of the t-distribution become thicker than normal. However, when n ≥ 30, the t-distribution is practically the same as the normal distribution.


Example: Suppose we have a sample of 25 young offenders whom we have interviewed. We are interested in the effects of being born to young parents. The average age of the mothers at the birth of the future young offenders was 20.8, with a s.d. of 3.5 years. The average age at first birth for women in Canada in 1990 was 23.2 years. Is the average age of the mothers of young offenders different than that of mothers in the general Canadian population?


Assumptions:

It is assumed that the sample is a simple random sample (SRS) from a population that is normally distributed (although the t-test is fairly robust to violations of this assumption).

Hypotheses:

$$H_0: \mu = \mu_0$$

$$H_0: \mu = 23.2$$

$$H_a: \mu \neq \mu_0$$

$$H_a: \mu \neq 23.2$$


Test Statistic:

$$t = \frac{\bar{y} - \mu_0}{\hat{\sigma}_{\bar{y}}} = \frac{\bar{y} - \mu_0}{s / \sqrt{n}}$$

$$t = \frac{20.8 - 23.2}{3.5 / \sqrt{25}} = \frac{-2.4}{0.7}$$

$$t_{actual} = -3.43$$

Significance level or critical value: We look for the probability of finding a t-value of -3.43, with (n-1) degrees of freedom.

If our chosen α-level is .05, we look for the tcritical value associated with a single-tail probability of .025 (half of the total 2-tailed probability).

tcritical at p=.05 (2-tailed), 24 df = 2.064.

Decision: We decide to reject the null hypothesis, because the absolute value of the actual t (3.43) is greater than our critical value of t (2.064) at p=.05. This means that the difference between the observed sample mean and the hypothesized population mean is sufficiently great that we are willing to conclude that the sample comes from a population with a different mean age at first birth than the Canadian population.
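The arithmetic of this example is easy to verify: evaluating (20.8 - 23.2) / (3.5 / √25) gives -2.4 / 0.7, or about -3.43. The sketch below does this in Python. The standard library has no t-distribution, so the critical value 2.064 (24 df, two-tailed, .05) is simply copied from the text as a constant; the function and constant names are illustrative.

```python
from math import sqrt

# Tabled critical value quoted in the text: two-tailed, alpha = .05, 24 df.
T_CRITICAL_24_DF = 2.064


def one_sample_t(y_bar, mu_0, s, n):
    """Return the one-sample t statistic and its degrees of freedom."""
    standard_error = s / sqrt(n)
    t = (y_bar - mu_0) / standard_error
    return t, n - 1


t, df = one_sample_t(y_bar=20.8, mu_0=23.2, s=3.5, n=25)
reject_h0 = abs(t) > T_CRITICAL_24_DF
print(f"t = {t:.2f} with {df} df, reject H0: {reject_h0}")
```

The comparison uses the absolute value of t because the test is two-tailed and the observed t is negative.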

Significance Tests

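Note that the forms of the z- and t-statistics are similar:

Large-sample test for means

$$z = \frac{\bar{y} - \mu_0}{\hat{\sigma}_{\bar{y}}} = \frac{\bar{y} - \mu_0}{s / \sqrt{n}}$$

Large-sample test for proportions

$$z = \frac{\hat{\pi} - \pi_0}{\sigma_{\hat{\pi}}} = \frac{\hat{\pi} - \pi_0}{\sqrt{\pi_0(1 - \pi_0) / n}}$$

Small-sample test for means

$$t = \frac{\bar{y} - \mu_0}{\hat{\sigma}_{\bar{y}}} = \frac{\bar{y} - \mu_0}{s / \sqrt{n}}, \qquad df = n - 1$$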

Decisions and Types of Errors

“Confusion” Matrix

                     H0 is true                H0 is false
Reject H0            Type I error (α)          Correct decision (1-β), or “Power”
Fail to Reject H0    Correct decision (1-α)    Type II error (β)

Type I and Type II Errors

[Figure: the sampling distribution under H0 and the sampling distribution under Ha, with the α region and the P-level marked.]
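The relationship between α, β, and power can be made concrete with a small calculation. The sketch below is an illustration only: it assumes a two-tailed z-test with a known standard deviation, and the values passed in (a null mean of 100, a standard deviation of 18, n = 400, and alternative means of 101 and 102) are hypothetical choices, loosely borrowed from the IQ example.

```python
from math import sqrt
from statistics import NormalDist


def z_test_power(mu_0, mu_a, sigma, n, alpha=0.05):
    """Probability that a two-tailed z-test rejects H0: mu = mu_0
    when the true mean is mu_a (sigma is treated as known)."""
    std_normal = NormalDist()
    z_critical = std_normal.inv_cdf(1 - alpha / 2)
    # Under Ha the test statistic is centred on `shift` instead of 0.
    shift = (mu_a - mu_0) / (sigma / sqrt(n))
    upper_tail = 1 - std_normal.cdf(z_critical - shift)
    lower_tail = std_normal.cdf(-z_critical - shift)
    return upper_tail + lower_tail


for mu_a in (100, 101, 102):  # hypothetical true means
    power = z_test_power(mu_0=100, mu_a=mu_a, sigma=18, n=400)
    beta = 1 - power
    print(f"true mean {mu_a}: power = {power:.3f}, beta = {beta:.3f}")
```

When the true mean equals the null value, the rejection probability is just α (a Type I error); as the true mean moves away from the null value, power rises and β falls.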