Introduction to Inference
Tests of Significance
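Proof
[Figure: sampling distribution of $\bar{x}$ with mean $\mu = 1000$ hours and standard deviation $\sigma/\sqrt{n} = 125/\sqrt{25} = 25$]
$\bar{x} = 979$
$z = \dfrac{\bar{x} - \mu}{\sigma/\sqrt{n}} = \dfrac{979 - 1000}{25} = -0.84$
$P(\bar{x} \le 979) = .2005$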
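Proof
[Figure: sampling distribution of $\bar{x}$ with mean $\mu = 1000$ hours and standard deviation $\sigma/\sqrt{n} = 125/\sqrt{25} = 25$]
$\bar{x} = 920$
$z = \dfrac{\bar{x} - \mu}{\sigma/\sqrt{n}} = \dfrac{920 - 1000}{25} = -3.2$
$P(\bar{x} \le 920) = .0007$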
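Both Proof slides use the same calculation, so it can be checked in a few lines. This is a minimal Python sketch using only the standard library, assuming (as our reading of the slides) a claimed mean of 1000 hours, a known σ of 125 hours and a sample of n = 25 bulbs; `lower_tail` is a hypothetical helper name.

```python
from math import sqrt
from statistics import NormalDist

MU0 = 1000   # claimed mean bulb lifetime in hours (the H0 value)
SIGMA = 125  # population standard deviation in hours, assumed known
N = 25       # sample size

SE = SIGMA / sqrt(N)  # standard deviation of x-bar: 125 / 5 = 25


def lower_tail(xbar):
    """Return z and P(x-bar <= xbar), computed as if H0 (mu = MU0) were true."""
    z = (xbar - MU0) / SE
    return z, NormalDist().cdf(z)


for xbar in (979, 920):
    z, p = lower_tail(xbar)
    print(f"x-bar = {xbar}: z = {z:.2f}, P = {p:.4f}")
```

A sample mean of 979 would be fairly common if the claim were true (about a 20% chance), while 920 would occur less than 0.1% of the time; that contrast is the point of the two slides.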
Definitions
• A test of significance is a method for using
sample data to decide between two competing
claims about a population characteristic.
• The null hypothesis, denoted by H0, states that
there is no effect or no change from a claim
assumed to be true (e.g. H0: μ = 1000).
• The alternative hypothesis, denoted by Ha, is
the competing claim (e.g. Ha: μ < 1000).
Note: the population characteristic could be μ or p,
and the value in H0 is the hypothesized value.
Chrysler Concord
• H0: μ = 8
• Ha: μ > 8
$z = \dfrac{\bar{x} - \mu}{\sigma/\sqrt{n}} = \dfrac{8.7 - 8}{1/\sqrt{10}} \approx 2.21$
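The Concord statistic, and the upper-tail p-value quoted for it later, can be reproduced the same way. The sketch assumes σ = 1 second and n = 10, which is our reading of the garbled 1/√10 in the slide's denominator; the tail is on the right because Ha: μ > 8.

```python
from math import sqrt
from statistics import NormalDist

mu0 = 8      # H0: mu = 8 seconds
sigma = 1    # assumed known population standard deviation (seconds)
n = 10       # assumed sample size
xbar = 8.7   # observed sample mean

z = (xbar - mu0) / (sigma / sqrt(n))  # about 2.21
p_value = 1 - NormalDist().cdf(z)    # upper tail, since Ha: mu > 8
print(f"z = {z:.2f}, p-value = {p_value:.4f}")
```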
Phrasing our decision
• In the justice system, what are our null and
alternative hypotheses?
• H0: defendant is innocent
• Ha: defendant is guilty
• What does the jury state if the defendant
wins?
• Not guilty
• Why?
Phrasing our decision
• What is the goal of the prosecutor?
• The prosecutor’s goal is to provide evidence that
the defendant is guilty.
• When does the prosecutor win?
• What is the decision with respect to:
– the null hypothesis (H0: defendant is innocent)
– the alternative hypothesis (Ha: defendant is guilty)
• We reject the null because we have the
evidence to believe the alternative.
Phrasing our decision
• When does the defendant win?
• What is the decision with respect to:
– the null hypothesis (H0: defendant is innocent)
– the alternative hypothesis (Ha: defendant is guilty)
• We fail to reject the null because we do not
have the evidence to believe the alternative.
Summary
• H0: defendant is innocent
• Ha: defendant is guilty
• We have the evidence:
– We reject the null because we have the
evidence to believe the alternative.
• We don’t have the evidence:
– We fail to reject the null because we do not
have the evidence to believe the alternative.
Chrysler Concord
• H0: μ = 8
• Ha: μ > 8
• p-value = .0134
• We reject H0: the p-value is so small that there
is enough evidence to believe the mean Concord
time is greater than 8 seconds.
K-mart light bulb
• H0: μ = 1000
• Ha: μ < 1000
• p-value = .1078
• We fail to reject H0: the p-value is not small
enough, so there is not enough evidence to
believe the mean lifetime is less than 1000 hours.
Remember:
Inference procedure overview
• State the procedure
• Define any variables
• Establish the conditions (assumptions)
• Use the appropriate formula
• Draw conclusions
Test of Significance Example
• A package delivery service claims it takes an
average of 24 hours to send a package from
New York to San Francisco. An independent
consumer agency is doing a study to test the
truth of the claim. Several complaints have led
the agency to suspect that the delivery time is
longer than 24 hours. Assume that the delivery
times are normally distributed with a known standard
deviation of 2 hours (an assumption for now). A
random sample of 25 packages has been taken.
The thought process of a test
test of significance
μ = true mean delivery time
H0: μ = 24
Ha: μ > 24
Given a random sample
Given a normal distribution
Safe to infer a population of at least 250 packages
Thought process continued
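[Figure: sampling distribution of $\bar{x}$ if H0 is true, with mean 24 and standard deviation $2/\sqrt{25} = .4$]
$\bar{x} = 24.85$
$z = \dfrac{\bar{x} - \mu}{\sigma/\sqrt{n}} = \dfrac{24.85 - 24}{.4} = 2.125$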
Thought process continued
Let α = .05
test of significance; μ = true mean delivery time
H0: μ = 24   Ha: μ > 24
Given a random sample
Given a normal distribution
Assume a population of at least 250 packages
$z = \dfrac{24.85 - 24}{2/\sqrt{25}} = 2.125$
p-value $= 1 - .9834 = .0166$ (normal table, with z rounded to 2.13)
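The arithmetic on these slides (standard error, z statistic, upper-tail area) fits in a short Python sketch using only the standard library; σ = 2 hours is the example's own stated assumption.

```python
from math import sqrt
from statistics import NormalDist

mu0 = 24      # H0: mu = 24 hours
sigma = 2     # population SD in hours, assumed known for now
n = 25        # packages sampled
xbar = 24.85  # observed sample mean delivery time

se = sigma / sqrt(n)               # 2 / 5 = 0.4
z = (xbar - mu0) / se              # 0.85 / 0.4 = 2.125
p_value = 1 - NormalDist().cdf(z)  # P(Z >= z) for Ha: mu > 24
print(f"z = {z:.3f}, p-value = {p_value:.4f}")
```

Computed without rounding, the tail area is about .0168; the slide's .0166 comes from looking up z rounded to 2.13 in a normal table. Either way it is well below α = .05.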
Thought process continued
• Question: What can I conclude?
• If I believe the statistic is too extreme and unusual to be chance variation (p-value < α), I will reject the null hypothesis.
• If I believe the statistic reflects ordinary chance variation (p-value > α), I will fail to reject the null hypothesis.
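The decision rule above is a single comparison. A minimal sketch follows; `decide` is a hypothetical helper, and treating p-value = α as "fail to reject" is our own convention, since the slide only covers the strict cases.

```python
def decide(p_value: float, alpha: float = 0.05) -> str:
    """Apply the p-value decision rule at significance level alpha."""
    if not 0 <= p_value <= 1:
        raise ValueError("p-value must be between 0 and 1")
    if p_value < alpha:
        return "reject H0"  # statistic too extreme to be chance variation
    return "fail to reject H0"  # consistent with chance variation (includes p == alpha)


# p-values that appear in these slides, judged at the default alpha = .05
for p in (0.0134, 0.1078, 0.0166):
    print(p, decide(p))
```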
Thought process continued
test of significance; μ = true mean delivery time
H0: μ = 24   Ha: μ > 24
Given a random sample
Given a normal distribution
Assume a population of at least 250 packages
Let α = .05
$z = \dfrac{24.85 - 24}{2/\sqrt{25}} = 2.125$, p-value $= .0166$
We reject H0. Since p-value < α, there is enough
evidence to believe the mean delivery time is longer
than 24 hours.
Second example
test of significance; μ = true mean VVIQ score
H0: μ = 67   Ha: μ < 67
Given a random sample
Sample is large (n > 40), so the Central Limit Theorem
ensures the sampling distribution of $\bar{x}$ is approximately normal
Assume a population of at least 510 varsity athletes
Let α = .05
$z = \dfrac{64.6 - 67}{19.38/\sqrt{51}} \approx -0.88$, p-value $= .1882$
We fail to reject H0. Since p-value > α, there is not
enough evidence to believe the mean VVIQ score is
less than 67.
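The VVIQ statistic uses the same formula with the sample standard deviation s standing in for σ, which the slide justifies by the large sample. The values below (x̄ = 64.6, s = 19.38, n = 51) are our reading of the garbled formula; the tail is on the left because Ha: μ < 67.

```python
from math import sqrt
from statistics import NormalDist

mu0 = 67     # H0: mu = 67
xbar = 64.6  # sample mean VVIQ score
s = 19.38    # sample standard deviation, used in place of sigma
n = 51       # sample size (large, so the CLT applies)

z = (xbar - mu0) / (s / sqrt(n))  # about -0.88
p_value = NormalDist().cdf(z)     # lower tail, since Ha: mu < 67
print(f"z = {z:.2f}, p-value = {p_value:.4f}")
```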
1 proportion z-test; p = true proportion of pure short plants
H0: p = .25   Ha: p ≠ .25
Given a random sample.
np = 1064(.25) = 266 > 10 and n(1 – p) = 1064(1 – .25) = 798 > 10
Sample size is large enough to use normality
Safe to infer a population of at least 10,640 plants.
Let α = .05
$z = \dfrac{.2603 - .25}{\sqrt{.25(1 - .25)/1064}} \approx 0.78$, p-value $= .4361$
We fail to reject H0. Since p-value > α, there is not
enough evidence to believe the proportion of pure
short plants is different from 25%.
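A sketch of the two-sided one-proportion z-test, including the large-sample check from the slide. `one_prop_z_test` is a hypothetical helper; with p̂ rounded to .2603 it gives a p-value of about .438 rather than the slide's .4361, presumably because the slide used the unrounded sample proportion.

```python
from math import sqrt
from statistics import NormalDist


def one_prop_z_test(p_hat: float, p0: float, n: int):
    """Two-sided one-proportion z-test of H0: p = p0 versus Ha: p != p0."""
    # Large-sample check used on the slide: n*p0 and n*(1 - p0) both above 10
    if n * p0 <= 10 or n * (1 - p0) <= 10:
        raise ValueError("sample too small for the normal approximation")
    se = sqrt(p0 * (1 - p0) / n)                  # standard error computed under H0
    z = (p_hat - p0) / se
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))  # both tails, since Ha is p != p0
    return z, p_value


z, p_value = one_prop_z_test(0.2603, 0.25, 1064)
print(f"z = {z:.2f}, p-value = {p_value:.4f}")
```

The p-value is doubled because a two-sided alternative counts results as extreme in either direction.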
Choosing a level of significance
• How plausible is H0? If H0 represents a
long-held belief, strong evidence (small α)
might be needed to dissolve the belief.
• What are the consequences of rejecting
H0? The choice of α will be heavily
influenced by the consequences of
rejecting or failing to reject.
Errors in the justice system
Jury decision \ Actual truth | Guilty | Not guilty
Guilty | Correct decision | Type I error
Not guilty | Type II error | Correct decision
“No innocent man is jailed” justice system
Type I error (an innocent defendant is found guilty): smaller
Type II error (a guilty defendant is found not guilty): larger
“No guilty man goes free” justice system
Type I error (an innocent defendant is found guilty): larger
Type II error (a guilty defendant is found not guilty): smaller
Errors in the justice system
Jury decision \ Actual truth | Guilty (Ha true) | Not guilty (H0 true)
Guilty (reject H0) | Correct decision | Type I error
Not guilty (fail to reject H0) | Type II error | Correct decision
Type I and Type II errors
• If we believe Ha when in fact H0 is true,
this is a type I error.
• If we believe H0 when in fact Ha is true,
this is a type II error.
• Type I error: if we reject H0 and it’s a
mistake.
• Type II error: if we fail to reject H0 and
it’s a mistake.
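One way to see that α controls the Type I error rate is to simulate many samples from a population in which H0 is true and count how often the test wrongly rejects it. The sketch below borrows the delivery-time setting (μ = 24, σ = 2, n = 25, α = .05) purely as an assumed scenario; `type_i_error_rate` is a hypothetical helper, and the rejection rate should come out close to .05.

```python
import random
from math import sqrt
from statistics import NormalDist, fmean


def type_i_error_rate(reps=20_000, mu0=24.0, sigma=2.0, n=25, alpha=0.05, seed=1):
    """Estimate P(reject H0 | H0 true) for the upper-tailed z-test by simulation."""
    rng = random.Random(seed)                 # seeded so the run is reproducible
    z_crit = NormalDist().inv_cdf(1 - alpha)  # z > z_crit is the same as p-value < alpha
    se = sigma / sqrt(n)
    rejections = 0
    for _ in range(reps):
        # Every sample comes from a population where H0 is actually true (mu = mu0)
        xbar = fmean(rng.gauss(mu0, sigma) for _ in range(n))
        if (xbar - mu0) / se > z_crit:
            rejections += 1                   # each rejection here is a Type I error
    return rejections / reps


print(type_i_error_rate())
```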
Type I and Type II example
A distributor of handheld calculators receives very large
shipments of calculators from a manufacturer. It is too
costly and time consuming to inspect all incoming
calculators, so when each shipment arrives, a sample is
selected for inspection. Information from the sample is
then used to test Ho: p = .02 versus Ha: p < .02, where p
is the true proportion of defective calculators in the
shipment. If the null hypothesis is rejected, the distributor
accepts the shipment of calculators. If the null hypothesis
cannot be rejected, the entire shipment of calculators is
returned to the manufacturer due to inferior quality. (A
shipment is defined to be of inferior quality if it contains
2% or more defectives.)
Type I and Type II example
• Type I error: We think the proportion of
defective calculators is less than 2%, but
it’s actually 2% (or more).
• Consequence: The distributor accepts a shipment that
has too many defective calculators, a potential
loss in revenue.
Type I and Type II example
• Type II error: We think the proportion of
defective calculators is 2%, but it’s actually
less than 2%.
• Consequence: The distributor returns the shipment,
thinking there are too many defective calculators,
but the shipment is actually acceptable.
Type I and Type II example
• The distributor wants to avoid a Type I error:
choose α = .01.
• The calculator manufacturer wants to avoid a
Type II error: choose α = .10.
Concept of Power
• Definition?
• Power is the capability of accomplishing
something…
• The power of a test of significance is…
Power Example
In a power generating plant, pressure in a certain line is
supposed to maintain an average of 100 psi over any
4-hour period. If the average pressure exceeds 103 psi
for a 4-hour period, serious complications can evolve.
During a given 4-hour period, thirty random
measurements are to be taken. The standard
deviation for these measurements is 4 psi (a graph of
the data is reasonably normal). Test H0: μ = 100 psi versus
the alternative “new” hypothesis Ha: μ = 103 psi at
the alpha level of .01. Calculate the probability of a Type II
error and the power of this test. In the context of the problem,
explain what the power means.
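Type I error and α
[Figure: sampling distribution of $\bar{x}$ if H0 is true, centered at 100 psi with standard error $s/\sqrt{n} = 4/\sqrt{30} \approx .73$]
For α = .01, t* = 2.462 (df = 29)
α is the probability that we think
the mean pressure is above 100 psi,
but actually the mean pressure is
100 psi (or less).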
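Type I error and α
[Figure: sampling distribution of $\bar{x}$ if H0 is true, with the rejection cutoff marked at 101.80 psi]
For α = .01, t* = 2.462
$\bar{x}^* = 100 + 2.462(.73) \approx 101.80$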
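The rejection cutoff is x̄* = μ0 + t*·s/√n. A sketch with the slide's numbers follows; t* = 2.462 is copied from the slide because Python's standard library has no t quantile function (SciPy's `scipy.stats.t.ppf(0.99, 29)` would compute it).

```python
from math import sqrt

mu0 = 100       # H0: mean pressure is 100 psi
s = 4           # standard deviation of the measurements (psi)
n = 30          # measurements in the 4-hour period
t_star = 2.462  # upper .01 critical value for df = n - 1 = 29 (from the slide)

se = s / sqrt(n)               # 4 / sqrt(30), about 0.73 psi
xbar_crit = mu0 + t_star * se  # reject H0 when x-bar exceeds this value
print(f"se = {se:.2f}, cutoff = {xbar_crit:.2f}")
```

Any sample mean above about 101.80 psi leads to rejecting H0.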
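Type II error and β
[Figure: sampling distributions of $\bar{x}$ centered at 100 psi and at 103 psi, with the cutoff at 101.8 psi]
$z = \dfrac{101.8 - 103}{.73} \approx -1.64$
$\beta = .0505$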
Type II error and β
[Figure: β = .0505 is the area under the curve centered at 103 psi to the left of the cutoff 101.8 psi]
β is the probability that we think the mean pressure is 100 psi,
but actually the mean pressure is greater than 100 psi (here, 103 psi).
Power?
[Figure: the area under the curve centered at 103 psi to the right of the cutoff 101.8 psi, with β = .0505 to the left]
Power $= 1 - .0505 = .9495$
For a sample size of 30, there is a .9495
probability that this test of significance will
correctly reject H0, detecting that the mean pressure is
above 100 psi, when the true mean pressure is 103 psi.
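β and the power follow from the same cutoff, now measured against the alternative mean of 103 psi. The sketch mirrors the slides in using a normal curve for this step; unrounded arithmetic gives z ≈ −1.643 and β ≈ .0502, slightly different from the slide's .0505, which comes from rounding z to −1.64.

```python
from math import sqrt
from statistics import NormalDist

se = 4 / sqrt(30)   # standard error of x-bar, about 0.73 psi
xbar_crit = 101.80  # rejection cutoff from the Type I error slides
mu_alt = 103        # "what if" true mean pressure

# beta: probability that x-bar falls below the cutoff (we fail to reject H0)
# even though the true mean is 103 psi
z_beta = (xbar_crit - mu_alt) / se
beta = NormalDist().cdf(z_beta)
power = 1 - beta    # probability of correctly rejecting H0 when mu = 103
print(f"z = {z_beta:.3f}, beta = {beta:.4f}, power = {power:.4f}")
```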
Concept of Power
• The power of a test of significance is
the probability that the null hypothesis
will be correctly rejected.
• Because the true value of μ is unknown,
we cannot know the actual power of the test,
but we are able to examine “what if”
scenarios to provide important
information.
• Power = 1 – β
Effects on the Power of a Test
• The larger the difference between the hypothesized value and the true value of the population characteristic, the higher the power.
• The larger the significance level, α, the higher the power of the test.
• The larger the sample size, the higher the power of the test.
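The three effects can be checked with a small power function. This is a sketch, not the slides' exact method: `power` is a hypothetical helper that uses a normal critical value instead of t* for simplicity, with defaults taken from the pressure example. At μ = μ0 it returns α, and each printed line varies one factor while holding the others fixed.

```python
from math import sqrt
from statistics import NormalDist

STD_NORMAL = NormalDist()


def power(mu_true, alpha=0.01, n=30, mu0=100.0, sigma=4.0):
    """Power of the upper-tailed z-test of H0: mu = mu0 when the true mean is mu_true."""
    se = sigma / sqrt(n)
    xbar_crit = mu0 + STD_NORMAL.inv_cdf(1 - alpha) * se   # rejection cutoff for x-bar
    return 1 - STD_NORMAL.cdf((xbar_crit - mu_true) / se)  # P(x-bar > cutoff | mu_true)


# Larger gap between the hypothesized and true mean
print([round(power(mu), 3) for mu in (101, 102, 103)])
# Larger significance level
print([round(power(103, alpha=a), 3) for a in (0.01, 0.05, 0.10)])
# Larger sample size
print([round(power(103, n=k), 3) for k in (10, 30, 60)])
```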