week101 decision errors and power when we perform a statistical test we hope that our decision will...

28
week10 1 Decision Errors and Power When we perform a statistical test we hope that our decision will be correct, but sometimes it will be wrong. There are two types of incorrect decisions. To help distinguish these two types of error, we give them specific names. The error made by rejecting the null hypothesis H 0 (accepting H a ) when in fact H 0 is true is called a type I error. The probability of making a type I error is denoted by . The error made by accepting the null hypothesis H 0 (rejecting H a ) when in fact H 0 is false is called a type II error. The probability of making a type II error is denoted by . The probability that a fixed level significant test will reject H 0 when a particular alternative value of the parameter is true is called the power of the test to detect that alternative.

Upload: melinda-anthony

Post on 03-Jan-2016

214 views

Category:

Documents


1 download

TRANSCRIPT

week10 1

Decision Errors and Power • When we perform a statistical test we hope that our decision

will be correct, but sometimes it will be wrong. There are two types of incorrect decisions. To help distinguish these two types of error, we give them specific names.

• The error made by rejecting the null hypothesis H0 (accepting Ha) when in fact H0 is true is called a type I error.

• The probability of making a type I error is denoted by .• The error made by accepting the null hypothesis H0

(rejecting Ha) when in fact H0 is false is called a type II error.• The probability of making a type II error is denoted by .• The probability that a fixed level significant test will reject H0

when a particular alternative value of the parameter is true is called the power of the test to detect that alternative.

week10 2

• Significance and type I error The significance level of any fixed level test is the probability

of a Type I error. That is is the probability that the test will reject the null hypothesis H0 when H0 is in fact true.

• Power and Type II error The power of a fixed level test against a particular alternative is

Power = 1- β = 1- P( accepting H0 when H0 is false) =

= P( rejecting H0 when H0 is false)

week10 3

Ways to increase Power

• Increase α. When we increase α the strength of evidence required for the rejection is less.

• Consider a particular Ha that is farther away from μ0.

Values of μ that are in Ha but lie close to μ0 are harder to detect (lower power) then values of μ that are far from μ0.

• Increase sample size. More data will provide more information about the population so we have a better chance of distinguishing values of µ.

• Decrease σ. This has the same effect as increasing the sample size: more information about µ. Improving the measurement process and restricting attention to a subpopulation are two common ways to decrease σ.

week10 4

Example • Example 6.16 discusses a test about the mean contents of cola

bottles. The hypotheses are

H0: = 300 Ha: < 300.

• The sample size is n = 6, and the population is assumed to have a normal distribution with = 3. A 5% significance test rejects H0 if z ≤ Z0.05 = -1.645 where the test statistic z is

• Power calculations help us see how large a shortfall in the bottle contents the test can be expected to detect.

(a) Find the power of this test against the alternative = 299.(b) Find the power against the alternative = 295.(c) Is the power against = 290 higher or lower than the value

you found in (b)? Explain why this result makes sense.

3003/ 6xz

week10 5

Solution• The rejection criterion z –1.645 is equivalent to 300 – 1.645(3/6) = 297.99, so we may proceed as

follows:

(a) P(reject H0 when = 299) = P( 297.99 when = 299) = P(Z ) = P(Z –0.83) = 0.2033.

(b) P(reject H0 when = 295) = P( 297.99 when = 295) = P(Z ) = P(Z 2.44) = 0.9927.

(c) The power against = 290 would be greater—it is further from the null value of 300, so it is easier to distinguish from the null hypothesis.

X

X

X

6/3

29999.297

6/3

29599.297

week10 6

Exercise• You have an SRS of size n = 9 from a normal distribution with

σ = 1. You wish to test the following

H0: = 0 Ha: > 0

• You decide to reject H0 if and to accept H0 otherwise.

(a) Find the probability of a Type I error, that is, the probability that your test rejects H0 when in fact = 0.

(b) Find the probability of a Type II error when = 0.3. This is the probability that your test accepts H0 when in fact = 0.3.

(c) Find the probability of a Type II error when = 1.

0X

week10 7

Qestion17 Final Exam Dec 2000

When testing H0: μ = 5 vs Ha: μ ≠ 5 at = 0.01 with n =40 suppose that the probability of a type II error () is equal to 0.02 when = 2. Which of the following statements are true?

) > 0.02 when = 3

) > 0.02 if the sample size was 50 (at = 2)

c) > 0.02 if had been twice as large. (at = 2)

d) The power of the test is at = 2 0.99

week10 8

Exercise A study was carried out to investigate the effectiveness of a

treatment. 1000 subjects participated in the study, with 500 being randomly assigned to the "treatment group" and the other 500 to the "control (or placebo) group". A statistically significant difference was reported between the responses of the two groups (P <0 .005).

State whether the following statements are true of false.a) There is a large difference between the effects of the treatment

and the placebo. b) There is strong evidence that the treatment is very effective. c) There is strong evidence that there is some difference in effect

between the treatment and the placebo. d) There is little evidence that the treatment has some effect. e) The probability that the null hypothesis is true is less than 0.005.

week10 9

Use and abuse of Tests

• The spirit of a test of significance is to give a clear statement of the degree of evidence provided by the sample against the null hypothesis. The P-value does this. There is no sharp evidence between “significant” and “not significant” only increasingly strong evidence as the P-value decreases.

• When large samples are available, even tiny deviations from the null hypothesis will be significant (small P-value). Statistically significant effect need not be practically important. Always plot the data and examine them carefully. Beware of outliers.

• On the other hand, lack of significant does not imply that H0 is true, especially when the test has low power. When planning a study, verify that the test you plan to use does have high probability of detecting an effect of the size you hope to find.

week10 10

• Significant tests are not always valid. Faulty data collection, outliers in the data, and testing a hypothesis on the same data that first suggested that hypothesis can invalidate a test. Many tests run at once will probably produce some significant results by chance alone, even if the null hypotheses are true.

• The reasoning behind statistical significance works well if you decide what effect you are seeking, design an experiment or sample to search for it, and use a test of significance to weight the evidence you get.

week10 11

Example

Suppose that the population of scores of the high school seniors that took the SAT-Verbal test this year follows a normal distribution with = 48 and = 90. A report claims that 10,000 students who took part in the national program for improving SAT-verbal scores had a significantly better score (at the 5% level of sig.) than the population as a whole. In order to determine if the improvement is of practical significance one should:

Find out the actual mean score for the 10,000 students. Fine out the actual p-value.

week10 12

Example 6.22 on page 425 in IPS

• Suppose that we are testing the hypothesis of no correlation between two variables. With 400 observation, an observed correlation of only r = 0.1 is significant evidence at the α = 0.05 level that the correlation in the population is not zero. The low significance level does not mean there is strong association, only that there is some evidence of some association.

• This is an example where the test results are statistically significant but not practically significant.

week10 13

Question 19 Final Exam Dec 1998

Suppose a researcher carries out 100 separate and independent tests of a particular null hypothesis, and finds that exactly 4 tests out of 100 are statistically sig. at the 5% level. Which of the following statements are true?

a) We may report to the media that we have found evidence for the alternative hypothesis at the 5% level.

b) If the null hypothesis were in fact true, the number of statistically sig. test results (5% level), out of 100 tests, should follow a binomial( n = 100, p = ½).

c) The null hypothesis has not been disproved, since the results are not at all unusual, i.e. the observed level of significance for these findings should be considered to be quite unimpressive and of high probability.

week10 14

The t distribution • Suppose that a SRS of size n is drawn from a N(μ, σ)

population. Then the one sample t statistic

has a t distribution with n -1 degrees of freedom.

• The t distribution has mean 0 and it is a symmetric distribution.

• The is a different t distribution for each sample size.

• A particular t distribution is specified by the degrees of freedom that comes from the sample standard deviation.

ns

xt

week10 15

Tests for the population mean when is unknown

• Suppose that a SRS of size n is drawn from a population having unknown mean μ and unknown stdev. . To test the hypothesis H0: μ = μ0 , we first estimate by s – the sample stdev., then compute the one-sample t statistic given by

• In terms of a random variable T having the t (n - 1) distribution, the P-value for the test of H0 against

Ha : μ > μ 0 is P( T ≥ t )

Ha : μ < μ 0 is P( T ≤ t )

Ha : μ ≠ μ 0 is 2·P( T ≥ |t|)

ns

xt 0

week10 16

Example • In a metropolitan area, the concentration of cadmium (Cd) in

leaf lettuce was measured in 6 representative gardens where sewage sludge was used as fertilizer. The following measurements (in mg/kg of dry weight) were obtained.

Cd 21 38 12 15 14 8 Is there strong evidence that the mean concentration of Cd is

higher than 12.

Descriptive Statistics

Variable N Mean Median TrMean StDev SE MeanCd 6 18.00 14.50 18.00 10.68 4.36

• The hypothesis to be tested are: H0: μ = 12 vs Ha: μ > 12.

week10 17

• The test statistics is:

The degrees of freedom are df = 6 – 1 = 5

Since t = 1.38 < 2.015, we cannot reject H0 at the 5% level and so there are no strong evidence.

The P-value is 0.1 < P(T(5) ≥ 1.38) < 0.15 and so is greater then 0.05 indicating a non significant result.

18 12 1.38/ 10.68/ 6

xts n

week10 18

CIs for the population mean when unknown

• Suppose that a SRS of size n is drawn from a population having unknown mean μ. A C-level CI for μ when is unknown is an interval of the form

where t* is the value for the t (n -1) density curve with area C between –t* and t*.

• Example:

Give a 95% CI for the mean Cd concentration.

n

stx

n

stx ** ,

week10 19

• MINITAB commands: Stat > Basic Statistics > 1-Sample t

• MINITAB outputs for the above problem:

T-Test of the Mean

Test of mu = 12.00 vs mu > 12.00

Variable N Mean StDev SE Mean T P

Cd 6 18.00 10.68 4.36 1.38 0.11

T Confidence Intervals

Variable N Mean StDev SE Mean 95.0 % CI

Cd 6 18.00 10.68 4.36 (6.79, 29.21)

week10 20

Question 3 Final exam Dec 2000

• In order to test H0: μ = 60 vs Ha: μ ≠ 60 a random sample of 9 observations (normally distributed) is obtained, yielding and s = 5. What is the p-value of the test for this sample?

a) greater than 0.10.b) between 0.05 and 0.10.c) between 0.025 and 0.05.d) between 0.01 and 0.025.e) less than 0.01.

55x

week10 21

Question

A manufacturing company claims that its new floodlight will last 1000 hours. After collecting a simple random sample of size ten, you determine that a 95% confidence interval for the true mean number of hours that the floodlights will last, , is (970, 995). Which of the following are true? (Assume all tests are two-sided.)

I) At any < .05, we can reject the null hypothesis that the true mean is 1000.

II) If a 99% confidence interval for the mean were determined here, the numerical value 972 would certainly lie in this interval.

III) If we wished to test the null hypothesis H0: = 988, we could say that the p-value must be < 0.05.

week10 22

Questions

1. Alpha (level of sig. α) is

a) the probability of rejecting H0 when H0 is true.b) the probability of supporting H0 when H0 is false.c) supporting H0 when H0 is true.d) rejecting H0 when H0 is false.

2. Confidence intervals can be used to do hypothesis tests for

a) left tail tests.b) right tail testsc) two tailed test

3. The Type II error is supporting a null hypothesis that is false. T/F

week10 23

Robustness of the t procedures

• Robust procedures

A statistical inference procedure is called robust if the probability calculations required are insensitive to violations of the assumptions made.

• t-procedures are quite robust against nonnormality of the population except in the case of outliers or strong skewness.

week10 24

Simulation study• Let’s generate 100 samples of size 10 from a moderately

skewed distribution (Chi-square distribution with 5 df ) and calculate the 95% t-intervals to see how many of them contain the true mean μ = 5.

• First let’s have a look at the histogram of the 1000 values generated from this distribution.

Variable N Mean Median TrMean StDev

C1 1000 4.9758 4.2788 4.7329 3.1618

3020100

400

300

200

100

0

C1

Fre

quency

week10 25

T Confidence Intervals

Variable N Mean StDev SE Mean 95.0 % CIC1 10 5.21 3.89 1.23 ( 2.43, 7.99). . . C4 10 4.449 1.593 0.504 ( 3.309, 5.589)C5 10 5.33 4.23 1.34 ( 2.31, 8.36)C6 10 3.267 2.312 0.731 ( 1.612, 4.921)*C7 10 4.981 2.988 0.945 ( 2.844, 7.118)C8 10 3.725 1.520 0.481 ( 2.638, 4.812)*C9 10 4.487 2.332 0.738 ( 2.819, 6.155). . .

C14 10 4.650 1.854 0.586 ( 3.324, 5.977)C15 10 2.973 2.163 0.684 ( 1.425, 4.520)*C16 10 4.685 2.254 0.713 ( 3.072, 6.297)C26 10 5.594 2.984 0.944 ( 3.459, 7.728)C27 10 3.468 2.078 0.657 ( 1.982, 4.955)*C28 10 5.59 3.84 1.22 ( 2.84, 8.34). . .

C62 10 5.689 3.113 0.984 ( 3.462, 7.916)C63 10 3.724 1.741 0.551 ( 2.479, 4.970)*C64 10 4.387 2.157 0.682 ( 2.843, 5.930). . .

C87 10 7.01 3.44 1.09 ( 4.55, 9.47)C88 10 3.281 2.265 0.716 ( 1.661, 4.902)*C89 10 4.78 3.20 1.01 ( 2.49, 7.06). . .

C99 10 6.52 4.24 1.34 ( 3.49, 9.56)C100 10 3.614 2.198 0.695 ( 2.042, 5.186)

The number of intervals not capturing the true mean (μ = 5) is 6/100.

week10 26

Example• 100 samples of size 15 were drawn from a very skewed

distribution (Chi-square distribution with d. f. 1)

Variable N Mean Median TrMean StDev

C1 1500 0.9947 0.4766 0.8059 1.3647

• The 95% CIs (t-intervals) for these 100 samples are given below.

151050

1500

1000

500

0

C1

Fre

qu

en

cy

week10 27

T Confidence IntervalsVariable N Mean StDev SE Mean 95.0 % CIC1 15 0.773 0.939 0.242 ( 0.253, 1.293)C2 15 1.093 1.491 0.385 ( 0.268, 1.919)C3 15 0.553 0.735 0.190 ( 0.146, 0.960)*C4 15 0.387 0.732 0.189 ( -0.019, 0.792)*C5 15 1.239 2.146 0.554 ( 0.051, 2.427)...C23 15 0.491 0.619 0.160 ( 0.148, 0.834)*C24 15 0.582 1.088 0.281 ( -0.020, 1.184)C25 15 0.550 0.660 0.170 ( 0.184, 0.915)*C26 15 0.634 0.769 0.199 ( 0.208, 1.060)C27 15 0.508 0.528 0.136 ( 0.216, 0.800)*... C51 15 1.122 1.292 0.334 ( 0.406, 1.837)C52 15 0.519 0.664 0.171 ( 0.151, 0.887)*C53 15 1.666 2.028 0.524 ( 0.543, 2.789)... C59 15 1.208 2.297 0.593 ( -0.065, 2.480)C60 15 0.644 0.525 0.136 ( 0.353, 0.935)*C61 15 1.088 1.122 0.290 ( 0.466, 1.709)

week10 28

T Confidence Intervals (continuation)

...

C79 15 0.895 0.931 0.240 ( 0.379, 1.411)

C80 15 0.391 0.767 0.198 ( -0.034, 0.816)*

C81 15 1.038 0.992 0.256 ( 0.488, 1.587)

C82 15 0.952 1.407 0.363 ( 0.173, 1.732)

C83 15 0.2763 0.2999 0.0774 ( 0.1102, 0.4424)*

C84 15 1.237 1.999 0.516 ( 0.130, 2.345)

...

C99 15 0.921 0.865 0.223 ( 0.442, 1.400)

C100 15 0.813 1.437 0.371 ( 0.018, 1.609)

The number of intervals not capturing the true mean (μ = 1) is 9/100.