tests of significance · tests of significance . proof 925 950 975 1000 p x 1000125 25 x 25 v x 979...

Introduction to Inference

Tests of Significance

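Proof

[Figure: sampling distribution of x̄, axis marked at 925, 950, 975, 1000]

$\mu_{\bar{x}} = 1000$, $\sigma_{\bar{x}} = \frac{125}{\sqrt{25}} = 25$

$\bar{x} = 979$

$$z = \frac{\bar{x} - \mu}{\sigma/\sqrt{n}} = \frac{979 - 1000}{25} = -.84$$

$P(\bar{x} < 979) = .2005$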

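Proof

[Figure: sampling distribution of x̄, axis marked at 925, 950, 975, 1000]

$\mu_{\bar{x}} = 1000$, $\sigma_{\bar{x}} = \frac{125}{\sqrt{25}} = 25$

$\bar{x} = 920$

$$z = \frac{\bar{x} - \mu}{\sigma/\sqrt{n}} = \frac{920 - 1000}{25} = -3.2$$

$P(\bar{x} < 920) = .0007$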
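The two "Proof" calculations can be checked numerically. This is a minimal sketch, assuming the sampling distribution of the sample mean is normal with mean 1000 and standard deviation 125/√25 = 25, as the slides indicate. The helper name `lower_tail_p` is illustrative only and does not come from the slides.

```python
from math import sqrt
from statistics import NormalDist


def lower_tail_p(x_bar, mu0, sigma, n):
    """Return (z, P(sample mean < x_bar)) for a normal sampling distribution."""
    se = sigma / sqrt(n)      # standard deviation of the sample mean
    z = (x_bar - mu0) / se    # standardized distance from the claimed mean
    return z, NormalDist().cdf(z)


z_979, p_979 = lower_tail_p(979, 1000, 125, 25)
z_920, p_920 = lower_tail_p(920, 1000, 125, 25)
print(f"x-bar = 979: z = {z_979:.2f}, P = {p_979:.4f}")
print(f"x-bar = 920: z = {z_920:.2f}, P = {p_920:.4f}")
```

A sample mean of 979 is not unusual if the true mean is 1000, while a sample mean of 920 would be very rare.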

Definitions

• A test of significance is a method for using sample data to decide between two competing claims about a population characteristic.

• The null hypothesis, denoted by H0, says that there is no effect or no change to a claim assumed to be true (e.g. H0: μ = 1000).

• The alternative hypothesis, denoted by Ha, is the competing claim (e.g. Ha: μ < 1000).

Note: population characteristic could

be , or hypothesized value

Chrysler Concord

• H0: μ = 8

• Ha: μ > 8

$$z = \frac{\bar{x} - \mu}{\sigma/\sqrt{n}} = \frac{8.7 - 8}{1/\sqrt{10}} \approx 2.21$$

Phrasing our decision

• In the justice system, what are the null and alternative hypotheses?

• H0: defendant is innocent

• Ha: defendant is guilty

• What does the jury state if the defendant wins?

• Not guilty

• Why?


• What is the goal of the prosecutor?

• The goal of a trial is to provide evidence that the defendant is guilty.

• When does the prosecutor win?

• What is the decision with respect to:

– the null hypothesis (H0: defendant is innocent)

– the alternative hypothesis (Ha: defendant is guilty)

• We reject the null because we have the evidence to believe the alternative.


• When does the defendant win?

• What is the decision with respect to:

– the null hypothesis (H0: defendant is innocent)

– the alternative hypothesis (Ha: defendant is guilty)

• We fail to reject the null because we do not have the evidence to believe the alternative.

Summary

• H0: defendant is innocent

• Ha: defendant is guilty

• We have the evidence:

– We reject the null because we have the evidence to believe the alternative.

• We don’t have the evidence:

– We fail to reject the null because we do not have the evidence to believe the alternative.

Chrysler Concord

• H0: μ = 8

• Ha: μ > 8

• p-value = .0134

• We reject H0. Since the p-value is so small, there is enough evidence to believe the mean Concord time is greater than 8 seconds.
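The Concord p-value can be reproduced with a short calculation. This sketch assumes a sample mean of 8.7, a standard deviation of 1, and a sample size of 10. Those three values are read from the z computation on the earlier Chrysler Concord slide and are an interpretation of it, not stated facts. The upper tail is used because Ha is "greater than".

```python
from math import sqrt
from statistics import NormalDist

mu0 = 8.0     # hypothesized mean time (seconds)
x_bar = 8.7   # sample mean (assumed from the earlier slide)
sigma = 1.0   # standard deviation (assumed from the earlier slide)
n = 10        # sample size (assumed from the earlier slide)

z = (x_bar - mu0) / (sigma / sqrt(n))
p_value = 1 - NormalDist().cdf(z)   # upper tail because Ha: mu > 8
print(f"z = {z:.2f}, p-value = {p_value:.4f}")
```

Under these assumptions the result agrees with the p-value of .0134 on the slide.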

K-mart light bulb

• H0: μ = 1000

• Ha: μ < 1000

• p-value = .1078

• We fail to reject H0. Since the p-value is not very small, there is not enough evidence to believe the mean lifetime is less than 1000 hours.

Remember:

Inference procedure overview

• State the procedure

• Define any variables

• Establish the conditions (assumptions)

• Use the appropriate formula

• Draw conclusions

Test of Significance Example

• A package delivery service claims it takes an average of 24 hours to send a package from New York to San Francisco. An independent consumer agency is doing a study to test the truth of the claim. Several complaints have led the agency to suspect that the delivery time is longer than 24 hours. Assume that the delivery times are normally distributed with a standard deviation (assumed known for now) of 2 hours. A random sample of 25 packages has been taken.

The thought process of a test

test of significance

μ = true mean delivery time

H0: μ = 24

Ha: μ > 24

Given a random sample

Given a normal distribution

Safe to infer a population of at least 250 packages

Thought process continued

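[Figure: sampling distribution of x̄, axis marked at 22.8, 23.2, 23.6, 24, 24.4, 24.8, 25.2, with x̄ = 24.85 marked]

$\mu_{\bar{x}} = 24$, $\sigma_{\bar{x}} = \frac{2}{\sqrt{25}} = 0.4$

$\bar{x} = 24.85$

$$z = \frac{\bar{x} - \mu}{\sigma/\sqrt{n}} = \frac{24.85 - 24}{.4} = 2.125$$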


let α = .05

test of significance

μ = true mean delivery time

H0: μ = 24    Ha: μ > 24

Assume a population of at least 250 packages

$$z = \frac{24.85 - 24}{2/\sqrt{25}} = 2.125$$

p-value = 1 − .9834 = .0166


• Question: What can I conclude?

• If I believe the statistic is just too extreme and unusual (p-value < α), I will reject the null hypothesis.

• If I believe the statistic is just normal chance variation (p-value > α), I will fail to reject the null hypothesis.
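The decision rule can be written as a small function and applied to the package delivery example. This is a sketch only: the function name `decide` is made up for illustration. The unrounded z = 2.125 gives a p-value of about .0168, slightly different from the slide's .0166, which comes from a table lookup at z rounded to 2.13.

```python
from math import sqrt
from statistics import NormalDist


def decide(p_value, alpha):
    """Return the conclusion of a significance test in the slides' wording."""
    return "reject H0" if p_value < alpha else "fail to reject H0"


# Package delivery example: H0: mu = 24, Ha: mu > 24
x_bar, mu0, sigma, n, alpha = 24.85, 24, 2, 25, 0.05
z = (x_bar - mu0) / (sigma / sqrt(n))
p_value = 1 - NormalDist().cdf(z)   # upper tail because Ha: mu > 24
print(f"z = {z:.3f}, p-value = {p_value:.4f}, decision: {decide(p_value, alpha)}")
```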

Thought process continued

test of significance

μ = true mean delivery time

H0: μ = 24    Ha: μ > 24

Assume a population of at least 250 packages

let α = .05

$$z = \frac{24.85 - 24}{2/\sqrt{25}} = 2.125$$

p-value = .0166

We reject H0. Since p-value < α, there is enough evidence to believe the mean delivery time is longer than 24 hours.

Second example

test of significance

μ = true mean VVIQ score

H0: μ = 67    Ha: μ < 67

Sample is large (n > 40), so the Central Limit Theorem ensures an approximately normal sampling distribution

Assume a population of at least 510 varsity athletes

let α = .05

$$z = \frac{64.6 - 67}{19.38/\sqrt{51}} = -.88$$

p-value = .1882

We fail to reject H0. Since p-value > α, there is not enough evidence to believe the mean VVIQ score is less than 67.
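The VVIQ test statistic and p-value can be checked the same way. This sketch assumes a sample mean of 64.6, a standard deviation of 19.38, and n = 51. Those values are read from the fraction on the slide and are an interpretation of it. The lower tail is used because Ha is "less than".

```python
from math import sqrt
from statistics import NormalDist

mu0 = 67        # hypothesized mean VVIQ score
x_bar = 64.6    # sample mean (assumed from the slide)
s = 19.38       # standard deviation (assumed from the slide)
n = 51          # sample size (assumed from the slide)
alpha = 0.05

z = (x_bar - mu0) / (s / sqrt(n))
p_value = NormalDist().cdf(z)       # lower tail because Ha: mu < 67
print(f"z = {z:.2f}, p-value = {p_value:.4f}, reject H0: {p_value < alpha}")
```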

1 proportion z-test

p = true proportion pure short

H0: p = .25    Ha: p ≠ .25

Given a random sample.

np = 1064(.25) > 10

n(1 – p) = 1064(1 – .25) > 10

Sample size is large enough to use normality

Safe to infer a population of at least 10,640 plants.

let α = .05

$$z = \frac{.2603 - .25}{\sqrt{\frac{.25(1 - .25)}{1064}}} = .78$$

p-value = .4361

We fail to reject H0. Since p-value > α, there is not enough evidence to believe the proportion of pure short is different from 25%.
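A two-sided one-proportion z-test along these lines can be sketched as follows. The function name `one_prop_z_test` is hypothetical. The sample proportion .2603 is taken from the slide in rounded form, so the computed p-value lands close to, but not exactly on, the slide's .4361.

```python
from math import sqrt
from statistics import NormalDist


def one_prop_z_test(p_hat, p0, n):
    """Two-sided one-proportion z-test; returns (z, p_value)."""
    se = sqrt(p0 * (1 - p0) / n)    # standard error computed under H0
    z = (p_hat - p0) / se
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))   # both tails because Ha: p != p0
    return z, p_value


n, p0 = 1064, 0.25
large_enough = n * p0 > 10 and n * (1 - p0) > 10   # condition checked on the slide
z, p_value = one_prop_z_test(0.2603, p0, n)
print(f"conditions met: {large_enough}, z = {z:.2f}, p-value = {p_value:.3f}")
```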

Choosing a level of significance

• How plausible is H0? If H0 represents a long-held belief, strong evidence (small α) might be needed to dissolve the belief.

• What are the consequences of rejecting H0? The choice of α will be heavily influenced by the consequences of rejecting or failing to reject.

Errors in the justice system

                      Actual truth
Jury decision     Guilty              Not guilty
Guilty            Correct decision    Type I error
Not guilty        Type II error       Correct decision

“No innocent man is jailed” justice system

                      Actual truth
Jury decision     Guilty                    Not guilty
Guilty                                      Type I error (smaller)
Not guilty        Type II error (larger)

“No guilty man goes free” justice system

                      Actual truth
Jury decision     Guilty                    Not guilty
Guilty                                      Type I error (larger)
Not guilty        Type II error (smaller)

Errors in the justice system

                                         Actual truth
Jury decision                       Guilty (Ha true)    Not guilty (H0 true)
Guilty (reject H0)                  Correct decision    Type I error
Not guilty (fail to reject H0)      Type II error       Correct decision

Type I and Type II errors

• If we believe Ha when in fact H0 is true, this is a Type I error.

• If we believe H0 when in fact Ha is true, this is a Type II error.

• Type I error: we reject H0 and it’s a mistake.

• Type II error: we fail to reject H0 and it’s a mistake.

APPLET

http://www.amstat.org/publications/jse/v11n3/java/Hypothesis/

Type I and Type II example

A distributor of handheld calculators receives very large shipments of calculators from a manufacturer. It is too costly and time consuming to inspect all incoming calculators, so when each shipment arrives, a sample is selected for inspection. Information from the sample is then used to test H0: p = .02 versus Ha: p < .02, where p is the true proportion of defective calculators in the shipment. If the null hypothesis is rejected, the distributor accepts the shipment of calculators. If the null hypothesis cannot be rejected, the entire shipment of calculators is returned to the manufacturer due to inferior quality. (A shipment is defined to be of inferior quality if it contains 2% or more defectives.)


• Type I error: We think the proportion of defective calculators is less than 2%, but it’s actually 2% (or more).

• Consequence: We accept a shipment that has too many defective calculators, so there is a potential loss in revenue.


• Type II error: We think the proportion of defective calculators is 2% (or more), but it’s actually less than 2%.

• Consequence: We return the shipment thinking there are too many defective calculators, but the shipment is OK.


• Distributor wants to avoid Type I error. Choose α = .01

• Calculator manufacturer wants to avoid Type II error. Choose α = .10

Concept of Power

• Definition?

• Power is the capability of accomplishing something…

• The power of a test of significance is…

Power Example

In a power generating plant, pressure in a certain line is supposed to maintain an average of 100 psi over any 4-hour period. If the average pressure exceeds 103 psi for a 4-hour period, serious complications can develop. During a given 4-hour period, thirty random measurements are to be taken. The standard deviation for these measurements is 4 psi (a graph of the data is reasonably normal). Test H0: μ = 100 psi versus the alternative “new” hypothesis μ = 103 psi at the alpha level of .01. Calculate the probability of a Type II error and the power of this test. In the context of the problem, explain what the power means.

Type I error and a

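[Figure: sampling distribution of x̄ under H0, axis marked at 100, 100.73, 101.46, 102.19]

$\frac{s}{\sqrt{n}} = \frac{4}{\sqrt{30}} = .73$

for α = .01, t* = 2.462

α is the probability that we think the mean pressure is above 100 psi, but actually the mean pressure is 100 psi (or less)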

Type I error and a

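[Figure: sampling distribution of x̄ under H0, axis marked at 100, 100.73, 101.46, 102.19, with the cutoff 101.80 marked]

$\frac{s}{\sqrt{n}} = \frac{4}{\sqrt{30}} = .73$

for α = .01, t* = 2.462

$$2.462 = \frac{\bar{x} - 100}{.73}, \qquad \bar{x} = 101.80$$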

Type II error and b

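[Figure: sampling distributions of x̄ centered at 100 and at 103, axis marked at 100, 100.73, 101.46, 102.19, with the cutoff 101.8 marked]

$$z_\beta = \frac{101.8 - 103}{.73} = -1.64$$

β = .0505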

Type II error and b

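[Figure: sampling distributions of x̄ centered at 100 and at 103, axis marked at 100, 100.73, 101.46, 102.19, with the cutoff 101.8 marked]

β = .0505

β is the probability that we think the mean pressure is 100 psi, but actually the pressure is greater than 100 psi.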

Power?

[Figure: sampling distributions of x̄ centered at 100 and at 103, axis marked at 100, 100.73, 101.46, 102.19]

β = .0505

Power = 1 − .0505 = .9495


For a sample size of 30, there is a .9495 probability that this test of significance will correctly detect that the mean pressure is above 100 psi when the true mean is 103 psi.
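The power calculation can be traced step by step in code. This sketch takes t* = 2.462 from the slide as a fixed constant rather than computing it, and uses the normal curve for β as the slides do. Without intermediate rounding, β comes out near .050 and power near .950. The slide's .0505 and .9495 come from rounding z to −1.64 before the table lookup.

```python
from math import sqrt
from statistics import NormalDist

mu0, mu_alt = 100, 103   # hypothesized mean and the "what if" true mean (psi)
s, n = 4, 30             # standard deviation and sample size
t_star = 2.462           # critical value for alpha = .01, taken from the slide

se = s / sqrt(n)                     # about .73
x_crit = mu0 + t_star * se           # reject H0 when the sample mean exceeds this
z_beta = (x_crit - mu_alt) / se      # position of the cutoff if the mean is really 103
beta = NormalDist().cdf(z_beta)      # P(fail to reject H0 when mu = 103)
power = 1 - beta
print(f"cutoff = {x_crit:.2f}, beta = {beta:.4f}, power = {power:.4f}")
```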

Concept of Power

• The power of a test of significance is the probability that the null hypothesis will be correctly rejected.

• Because the true value of μ is unknown, we cannot know what the power is for μ, but we are able to examine “what if” scenarios to provide important information.

• Power = 1 – β

Effects on the Power of a Test

• The larger the difference between the hypothesized value and the true value of the population characteristic, the higher the power.

• The larger the significance level, α, the higher the power of the test.

• The larger the sample size, the higher the power of the test.
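The three effects listed above can be illustrated with a few "what if" scenarios. This is a sketch under stated assumptions: it uses an upper-tailed z-test with a known standard deviation (a normal critical value, not the t* from the pressure example), and the function name `power_upper_tail` and the scenario values are chosen only for illustration.

```python
from math import sqrt
from statistics import NormalDist


def power_upper_tail(mu0, mu_true, sigma, n, alpha):
    """Power of an upper-tailed z-test of H0: mu = mu0 when the mean is mu_true."""
    std_normal = NormalDist()
    z_crit = std_normal.inv_cdf(1 - alpha)         # rejection cutoff in z units
    shift = (mu_true - mu0) / (sigma / sqrt(n))    # true difference in standard errors
    return 1 - std_normal.cdf(z_crit - shift)


base = power_upper_tail(100, 101, 4, 30, 0.01)
bigger_difference = power_upper_tail(100, 103, 4, 30, 0.01)
bigger_alpha = power_upper_tail(100, 101, 4, 30, 0.10)
bigger_sample = power_upper_tail(100, 101, 4, 120, 0.01)

for label, value in [("baseline", base),
                     ("larger true difference", bigger_difference),
                     ("larger alpha", bigger_alpha),
                     ("larger sample size", bigger_sample)]:
    print(f"{label}: power = {value:.3f}")
```

Each of the three changed scenarios gives a higher power than the baseline.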

APPLET

http://www.amstat.org/publications/jse/v11n3/java/Hypothesis/
