Hypothesis Tests for Means

• The context
• “Statistical significance”
• Hypothesis tests and confidence intervals
• The steps: hypothesis, test statistic, distribution, alpha and the rejection region, result
• p-Values
• One-sided vs. two-sided tests
• Hypothesis tests for proportions

The context

PARAMETERS
μ = population mean (unknown)
σ = population SD (might be known)

STATISTICS
n = sample size
x̄ = sample mean
s = sample SD (using n-1)

ALSO
μ0 = conjectured value of μ

Statistical significance

We’re trying to decide whether μ is equal to μ0.

As usual we use x̄ as an estimate of μ. Usually x̄ is at least a little different from μ0. But could the difference be due to random variation?

IF YES – then we DO NOT REJECT the hypothesis that μ is really equal to μ0. We say that x̄ is not significantly different from μ0.

IF NO – then we REJECT the hypothesis that μ = μ0. We say that x̄ IS significantly different from μ0.

Hypothesis tests are just confidence intervals

If we only cared about hypothesis tests for means, we could make this a lot simpler.

Just construct a confidence interval for μ, based on n, x̄, s (or σ) and your favorite confidence level C.

If μ0 is outside the confidence interval, then we reject the hypothesis that μ = μ0. The significance level is α = 1 – C.

That’s all there is to it. So why all the complex ritual of a hypothesis test?

Because there are other hypothesis tests, for other hypotheses (difference of two means, for example). For those tests, we need the ritual.
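A minimal sketch of the equivalence described above, for the case where σ is known. The function names and the numbers in the usage lines are hypothetical and chosen only for illustration; the point is that the interval check and the z check give the same decision when α = 1 – C.

```python
from math import sqrt
from statistics import NormalDist


def z_confidence_interval(xbar, sigma, n, conf_level=0.95):
    """Two-sided confidence interval for mu when sigma is known."""
    alpha = 1 - conf_level
    z_star = NormalDist().inv_cdf(1 - alpha / 2)  # two-sided critical value
    margin = z_star * sigma / sqrt(n)
    return xbar - margin, xbar + margin


def rejects_by_interval(mu0, xbar, sigma, n, conf_level=0.95):
    """Reject H0: mu = mu0 exactly when mu0 falls outside the interval."""
    low, high = z_confidence_interval(xbar, sigma, n, conf_level)
    return not (low <= mu0 <= high)


def rejects_by_z(mu0, xbar, sigma, n, alpha=0.05):
    """Reject H0: mu = mu0 when |z| exceeds the two-sided critical value."""
    z = (xbar - mu0) / (sigma / sqrt(n))
    return abs(z) > NormalDist().inv_cdf(1 - alpha / 2)


# Hypothetical data: xbar = 52, sigma = 10, n = 25 (so the standard error is 2).
for mu0 in (50, 47):
    print(mu0, rejects_by_interval(mu0, 52.0, 10, 25), rejects_by_z(mu0, 52.0, 10, 25))
```

With these made-up numbers, μ0 = 50 sits inside the 95% interval (z = 1.0) and μ0 = 47 sits outside it (z = 2.5), and both methods agree in each case.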

Hypothesis Test for μ

Cookbook using rejection regions

1. Choose hypotheses – H0 and HA.

2. Define a test statistic.

3. Predict the distribution of the test statistic, assuming that H0 is true.

4. Choose C and α (where α = 1 – C). Pick a rejection region.

5. Look at the observed value of the test statistic. Is it in the rejection region? If so, reject H0.

Choose hypotheses

Two-sided test: H0: μ = μ0   HA: μ ≠ μ0

One-sided tests: H0: μ = μ0   HA: μ > μ0

or H0: μ = μ0   HA: μ < μ0

Working rule: Always use two-sided tests.

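Define a test statistic

Choose

$$z = \frac{\bar{x} - \mu_0}{\sigma / \sqrt{n}}$$

or

$$t = \frac{\bar{x} - \mu_0}{s / \sqrt{n}}$$

Do you know σ? Maybe it comes with the null hypothesis. If so, use it (and use z). If not, use s (and use t).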
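A minimal sketch of the two test statistics, using only the Python standard library. The sample values, μ0 = 5.0, and σ = 0.25 are hypothetical; `statistics.stdev` uses the n-1 divisor, matching the definition of s above.

```python
from math import sqrt
from statistics import mean, stdev


def z_statistic(sample, mu0, sigma):
    """z = (xbar - mu0) / (sigma / sqrt(n)), for a known population SD."""
    n = len(sample)
    return (mean(sample) - mu0) / (sigma / sqrt(n))


def t_statistic(sample, mu0):
    """t = (xbar - mu0) / (s / sqrt(n)), with s the sample SD (n-1 divisor)."""
    n = len(sample)
    s = stdev(sample)
    return (mean(sample) - mu0) / (s / sqrt(n))


# Hypothetical sample of five measurements, tested against mu0 = 5.0.
sample = [4.8, 5.1, 5.4, 4.9, 5.3]
print(z_statistic(sample, mu0=5.0, sigma=0.25))  # sigma assumed known here
print(t_statistic(sample, mu0=5.0))              # sigma unknown, so use s
```

The only difference between the two functions is the SD in the denominator, which is why the choice between them comes down to whether σ is known.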

Distribution of the test statistic

ASSUME H0 IS TRUE.

Then (if you know σ) z has a STANDARD NORMAL distribution.

Or (if you’re using s) t has a “t” distribution with n-1 degrees of freedom.

(Standard normal case)

The rejection region is a range (or double-range) of values of the test statistic that are

(a) UNLIKELY if H0 is true

(b) roughly consistent with the alternative HA.

The rejection region should have probability α (given H0).

Two-sided case:

[Figure: standard normal curve with cutoffs marked at $-z^*_{\alpha/2}$ and $z^*_{\alpha/2}$]

Rejection region consists of two parts, each with probability α/2.

Predicting the distribution

• If you’re using t, just use t-critical values.

• For the one-sided case:

[Figure: standard normal curve with cutoff marked at $z^*_{\alpha}$]

Rejection region has probability α, all in one tail.
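The cutoffs for the standard normal case can be computed directly. This is a minimal sketch using `statistics.NormalDist` from the standard library; the function names and the labels "two-sided", "greater", and "less" are assumptions made for this example.

```python
from statistics import NormalDist


def z_critical(alpha, two_sided=True):
    """Cutoff z* leaving alpha/2 in each tail (two-sided) or alpha in one tail."""
    tail = alpha / 2 if two_sided else alpha
    return NormalDist().inv_cdf(1 - tail)


def in_rejection_region(z, alpha, alternative="two-sided"):
    """True when the observed z falls in the rejection region for HA."""
    if alternative == "two-sided":   # HA: mu != mu0
        return abs(z) > z_critical(alpha, two_sided=True)
    if alternative == "greater":     # HA: mu > mu0
        return z > z_critical(alpha, two_sided=False)
    if alternative == "less":        # HA: mu < mu0
        return z < -z_critical(alpha, two_sided=False)
    raise ValueError("alternative must be 'two-sided', 'greater', or 'less'")


print(round(z_critical(0.05), 3))                   # two-sided cutoff for alpha = 0.05
print(round(z_critical(0.05, two_sided=False), 3))  # one-sided cutoff for alpha = 0.05
print(in_rejection_region(1.8, 0.05), in_rejection_region(1.8, 0.05, "greater"))
```

A value such as z = 1.8 is rejected by the one-sided test but not by the two-sided test at the same α, which is one reason the choice of alternative has to be made before looking at the data.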

Chance of a Type I error

Note:

IF H0 is actually true, then there is still a probability of α that you will reject the null hypothesis.

[Figure: standard normal curve with cutoffs marked at $-z^*_{\alpha/2}$ and $z^*_{\alpha/2}$]

Chance of a Type I error

There are two possible bad results:

TYPE I ERROR (“act of commission”) – reject H0, when H0 is actually true.

The probability of a Type I error is α (given that H0 is true).

TYPE II ERROR (“act of omission”) – don’t reject H0, when H0 is actually false.

The probability of a Type II error depends on the actual value of μ.

Tradeoff

High α (say, 10%): you have a good chance of getting a statistically significant result, but it won’t impress anyone.

MORE TYPE I ERRORS

Low α (say, 1%): your significant results are more convincing, but you’ll have fewer of them.

MORE TYPE II ERRORS
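The tradeoff can be made concrete for a two-sided z test. If the true mean is μ rather than μ0, then z is normal with mean (μ – μ0)/(σ/√n) and SD 1, so the chance of landing outside the rejection region can be computed exactly. The sketch below assumes σ is known; the function name and the numbers (μ0 = 100, true μ = 103, σ = 15, n = 100) are hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def type_ii_probability(mu_true, mu0, sigma, n, alpha):
    """P(do not reject H0) for a two-sided z test when the true mean is mu_true."""
    std_normal = NormalDist()
    z_star = std_normal.inv_cdf(1 - alpha / 2)
    # Under the true mean, z is normal with this mean and SD 1.
    shift = (mu_true - mu0) / (sigma / sqrt(n))
    return std_normal.cdf(z_star - shift) - std_normal.cdf(-z_star - shift)


# Hypothetical setting: the true mean is 3 units above the conjectured value.
for alpha in (0.10, 0.05, 0.01):
    beta = type_ii_probability(mu_true=103, mu0=100, sigma=15, n=100, alpha=alpha)
    print(f"alpha = {alpha:.2f}  Type II probability = {beta:.3f}")
```

Lowering α widens the non-rejection region, so the Type II probability goes up for the same true μ and the same n.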

Is there a way to avoid choosing α in advance?

Determine p-value

The “p-value” is the answer to this question:

What fraction of x̄’s (assuming H0 is true) are more extreme than the one you actually obtained?

If HA: μ ≠ μ0, this means: what fraction are further from μ0 than the value you obtained?

If HA: μ > μ0, this means: what fraction are more than the value you obtained?

If HA: μ < μ0, this means: what fraction are less than the value you obtained?

Determine p-value

Example:

Do a test of H0: μ = μ0 vs. HA: μ ≠ μ0.

Get test statistic z = 2.30. What’s the p-value?

Probability of seeing 2.30 OR MORE: 0.0107

Probability of seeing 2.30 OR MORE EXTREME: 0.0214

p-value for 2-sided test: 0.0214

[Figure: standard normal curve with z = 2.30 marked; upper tail area 0.0107]

Determine p-value

Keep it simple?

p-value =
(for 1-sided test with z) = 1 - NORMSDIST(|z|)
(for 2-sided test with z) = 2 × (1 - NORMSDIST(|z|))
(for 1-sided test with t) = TDIST(|t|, n-1, 1)
(for 2-sided test with t) = TDIST(|t|, n-1, 2)

NORMSDIST and TDIST are spreadsheet functions. In TDIST, the second argument is the degrees of freedom (df) and the third is the number of tails.
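The two z formulas translate directly to Python's standard library, with `NormalDist().cdf` playing the role of NORMSDIST. This is a sketch only; the function name and the `tails` parameter are assumptions for the example. The t versions are not shown because the standard library has no t-distribution CDF.

```python
from statistics import NormalDist


def p_value_from_z(z, tails=2):
    """p-value for a z test: tails=1 for one-sided, tails=2 for two-sided.

    As with the spreadsheet formula, the one-sided version uses |z|, so it
    assumes the observed z lies in the direction named by HA.
    """
    upper_tail = 1 - NormalDist().cdf(abs(z))
    return tails * upper_tail


# The z = 2.30 example worked earlier.
print(round(p_value_from_z(2.30, tails=1), 4))
print(round(p_value_from_z(2.30, tails=2), 4))
```

For z = 2.30 this gives a single tail of about 0.0107 and a two-sided p-value of about 0.0214, matching the worked example.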

Determine p-value

The p-value is the border between α’s for which we reject H0 and α’s for which we do not reject H0.

REJECTION REGION VERSION: Pick α, and the rejection region, in advance. In this story, the p-value is an afterthought.

p-VALUE FIRST VERSION: Find the p-value first. Then if anyone has a favorite α, you can…

Reject H0 if p < α.

Do not reject if p > α.

Example: 1969 Draft Lottery

Null hypothesis (informally): The numbers for the second half of the year were drawn randomly from the population 1, 2, …, 366.

(Note: The mean of these numbers is 183.5, and their standard deviation is 105.6547. )

Null hypothesis (formally): H0 : μ = 183.5

(and this is one of those cases where σ = 105.6547 comes with the null hypothesis)

Alternative: HA : μ ≠ 183.5

Example: 1969 Draft Lottery

H0 : μ = 183.5   HA : μ ≠ 183.5

μ0 = 183.5   σ = 105.6547

Experiment: n = 184, x̄ = 160.92

Test statistic:

$$z = \frac{\bar{x} - \mu_0}{\sigma / \sqrt{n}} = \frac{\bar{x} - 183.5}{7.789} = -2.898$$

p-value: 0.00375

Conclusion: REJECT H0 (even at 1% significance level)
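The draft lottery numbers can be checked with a few lines of Python. This sketch uses only the values quoted in the example (n = 184, x̄ = 160.92) plus the population 1, 2, …, 366. Because x̄ is entered here rounded to two decimals, the computed z can differ from the quoted value in the third decimal place.

```python
from math import sqrt
from statistics import NormalDist, mean, pstdev

# Mean and SD of the population 1, 2, ..., 366 (these define H0).
population = range(1, 367)
mu0 = mean(population)      # 183.5
sigma = pstdev(population)  # population SD, about 105.6547

# Observed data as quoted in the example.
n = 184
xbar = 160.92

se = sigma / sqrt(n)                              # standard error of xbar
z = (xbar - mu0) / se                             # test statistic
p_value = 2 * (1 - NormalDist().cdf(abs(z)))      # two-sided p-value

print(f"SE = {se:.3f}, z = {z:.3f}, p-value = {p_value:.5f}")
print("Reject H0 at the 1% level:", p_value < 0.01)
```

The p-value is well under 0.01, so the conclusion to reject H0 holds even at the 1% significance level.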


Hypothesis tests for proportions

PARAMETER
p = population proportion

STATISTICS
n = sample size
k = number of “hits”
p̂ = k / n = sample proportion

Test statistic (p0 is the value of p under H0):

$$z = \frac{\hat{p} - p_0}{\mathrm{SE}(\hat{p})} = \frac{\hat{p} - p_0}{\sqrt{p_0 (1 - p_0) / n}}$$

(Minor subtlety: The distribution of the test statistic is based on H0, so we use p0 in the formula for SE. This is different from what we do in confidence intervals, but not by much.)

Another example

Suppose we have flipped 10000 coins, and obtained 5100 heads. Is this result statistically significant?

Choose: H0: p = 0.50   HA: p ≠ 0.50

Conditions? OK.

Distribution of p̂, given H0:

Normal, mean 0.50, SD = 0.005

Our value of p̂ is 0.51. That’s 2.0 SD’s above the mean.

What fraction of p̂ values would be further from 0.50 than 0.51?

ABOUT 4.5%, counting both tails. So, P-value is 0.045.
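The coin example can be reproduced with a short function. This is a sketch of the two-sided z test for a proportion, using p0 in the standard error as the test requires; the function name is an assumption for the example.

```python
from math import sqrt
from statistics import NormalDist


def proportion_z_test(k, n, p0):
    """Two-sided z test of H0: p = p0, given k hits in n trials."""
    p_hat = k / n
    se = sqrt(p0 * (1 - p0) / n)  # SE uses p0 because H0 is assumed true
    z = (p_hat - p0) / se
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return p_hat, se, z, p_value


# 5100 heads in 10000 flips, tested against a fair coin.
p_hat, se, z, p_value = proportion_z_test(k=5100, n=10000, p0=0.50)
print(f"p_hat = {p_hat:.2f}, SE = {se:.3f}, z = {z:.1f}, p-value = {p_value:.4f}")
```

The computed p-value is about 0.0455, consistent with the "about 4.5%" figure read from the normal curve.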

Result of test

Is a P-value of 0.045 good enough to reject H0?

If we choose α = 0.05, then yes. But that’s a very mild test for such an extraordinary claim.

If we pick α = 0.05, then about 5% of all our experiments will end in rejecting H0, even when H0 is true every time.
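A small simulation illustrates the point. Every simulated experiment below flips a fair coin, so H0 is true every time, yet roughly α of the experiments still reject it. The number of flips per experiment, the number of experiments, and the seed are arbitrary assumptions; because the head count is discrete, the observed rate is close to 5% rather than exactly 5%.

```python
import random
from math import sqrt
from statistics import NormalDist


def simulate_type_i_rate(n_flips=400, n_experiments=4000, alpha=0.05, seed=0):
    """Fraction of fair-coin experiments in which H0: p = 0.5 is rejected."""
    rng = random.Random(seed)
    z_star = NormalDist().inv_cdf(1 - alpha / 2)
    se = sqrt(0.5 * 0.5 / n_flips)  # SE of p_hat under H0
    rejections = 0
    for _ in range(n_experiments):
        # n_flips random bits stand in for n_flips fair coin flips;
        # counting the 1 bits gives the number of heads.
        heads = bin(rng.getrandbits(n_flips)).count("1")
        z = (heads / n_flips - 0.5) / se
        if abs(z) > z_star:
            rejections += 1
    return rejections / n_experiments


print(simulate_type_i_rate())
```

Each rejection in this simulation is a Type I error by construction, since the coin is fair in every experiment.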


So we should choose a lower value of α. In this case, our result isn’t really “statistically significant.”

We need a bigger sample!
