chapter 4(1) basic logic
Hypothesis Testing for
Continuous Variables
Yuantao Hao
19 Oct. 2009
Chapter 4
Methods of statistical inference:
Parameter estimation: interval estimation
Hypothesis testing
4.1 Specific logic and main steps of hypothesis testing
Example 4.1: Twenty cases were randomly selected from patients with a certain disease. The sample mean of blood sedimentation (erythrocyte sedimentation rate, mm/h) is 9.15 and the sample standard deviation is 2.13. Estimate the 95% and 99% confidence intervals of the population mean, assuming that blood sedimentation in this disease follows a normal distribution.
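Solution:
$$\bar{x} \pm t_{0.05/2,\,19}\,\frac{s}{\sqrt{n}} = 9.15 \pm 2.093 \times \frac{2.13}{\sqrt{20}} = 8.15 \text{ and } 10.15$$
$$\bar{x} \pm t_{0.01/2,\,19}\,\frac{s}{\sqrt{n}} = 9.15 \pm 2.861 \times \frac{2.13}{\sqrt{20}} = 7.79 \text{ and } 10.51$$
The 95% confidence interval is (8.15, 10.15),
the 99% confidence interval is (7.79, 10.51).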
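The interval arithmetic of Example 4.1 can be checked with a few lines of Python. This is a minimal sketch, not part of the original slides: the function name `confidence_interval` is hypothetical, and the critical values 2.093 and 2.861 are typed in as read from a t table (19 degrees of freedom) rather than computed.

```python
import math

# Summary statistics from Example 4.1
n = 20
mean = 9.15
sd = 2.13

# Two-sided critical values of t with 19 degrees of freedom,
# entered by hand from a t table (an assumption of this sketch)
T_CRIT = {0.95: 2.093, 0.99: 2.861}

def confidence_interval(mean, sd, n, t_crit):
    """Return (lower, upper) = mean -/+ t_crit * sd / sqrt(n)."""
    half_width = t_crit * sd / math.sqrt(n)
    return mean - half_width, mean + half_width

intervals = {
    level: confidence_interval(mean, sd, n, t_crit)
    for level, t_crit in T_CRIT.items()
}
for level, (lower, upper) in intervals.items():
    print(f"{level:.0%} CI: ({lower:.2f}, {upper:.2f})")
```

The higher confidence level uses the larger critical value, so the 99% interval is wider than the 95% interval and contains it.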
Another consideration:
However, researchers often have preconceived
ideas about what these parameters might be
and wish to test whether the data conform to
these ideas.
Question:
Is the population mean equal to 10.50, the
value reported in the literature?
This is a typical problem of
hypothesis testing.
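Sample mean $\bar{x} = 9.15$ versus reported population mean $\mu = 10.50$.
How to explain this difference?
Two guesses: either the population mean is 10.50 and the difference arose from sampling error alone, or the population mean is not 10.50.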
4.1.1 Set up the statistical hypotheses
$H_0: \mu = 10.50$ (null hypothesis)
$H_1: \mu \neq 10.50$ (alternative hypothesis)
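4.1.2 Select the test statistic and calculate its current value
$$t = \frac{\bar{X} - 10.50}{S/\sqrt{n}} \sim t \text{ distribution}, \quad \nu = n - 1$$
$$t = \frac{9.15 - 10.50}{2.13/\sqrt{20}} = -2.8345, \quad \nu = 20 - 1 = 19$$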
[Fig. 4.1 Demonstration for the current value of t and the P-value: t density, symmetric around 0, with -2.8345 and 2.8345 marked on the horizontal axis]
4.1.3 Determine the P-value
The P-value is defined as the probability that
the current situation, or an even more extreme
one, arises in the population specified by $H_0$.
The P-value can also be thought of as the
probability of obtaining a test statistic as
extreme as or more extreme than the actual
test statistic obtained, given that the null
hypothesis is true.
$$P = P(|t| \geq 2.8345), \quad 0.01 < P < 0.02$$
[Fig. 4.1 Demonstration for the current value of t and the P-value: the current situation is t = -2.8345; the more extreme situations are the two tails beyond -2.8345 and 2.8345]
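The two-tailed area can also be computed numerically instead of being bracketed from a t table. The sketch below is an illustration only: it integrates the Student t density with a composite Simpson rule using the Python standard library, and the helper names (`t_pdf`, `central_area`, `two_sided_p`) and the number of integration steps are choices made here, not something given in the slides.

```python
import math

def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    log_norm = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
                - 0.5 * math.log(df * math.pi))
    return math.exp(log_norm - (df + 1) / 2 * math.log1p(x * x / df))

def central_area(t_abs, df, steps=2000):
    """Area under the t density between -t_abs and +t_abs.

    Composite Simpson rule on [0, t_abs] (steps must be even),
    doubled because the density is symmetric around 0.
    """
    h = t_abs / steps
    total = t_pdf(0.0, df) + t_pdf(t_abs, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    return 2 * total * h / 3

def two_sided_p(t_value, df):
    """P(|t| >= |t_value|) when H0 is true."""
    return 1.0 - central_area(abs(t_value), df)

p = two_sided_p(-2.8345, 19)
print(f"two-sided P = {p:.4f}")
```

Because the density is symmetric, `two_sided_p(2.8345, 19)` returns the same value, and the result should fall inside the bracket 0.01 < P < 0.02 obtained from the table.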
4.1.4 Decision and conclusion
In general, the decision rule is:
When $P \leq \alpha$, reject $H_0$;
otherwise, do not reject $H_0$.
A negligibly small probability $\alpha$ should be
defined in advance, such as $\alpha = 0.05$.
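For a two-sided t test, comparing P with alpha is equivalent to comparing |t| with the tabulated critical value for that alpha. The sketch below illustrates this with the current value t = -2.8345 from Example 4.1; the function name `decide` is hypothetical, and the critical values for 19 degrees of freedom are entered by hand from a t table.

```python
# Two-sided critical values for 19 degrees of freedom, keyed by alpha.
# These are table values typed in by hand (an assumption of this sketch).
T_CRIT_19 = {0.05: 2.093, 0.01: 2.861}

def decide(t_value, alpha):
    """Reject H0 when |t| reaches the critical value, i.e. when P <= alpha."""
    if abs(t_value) >= T_CRIT_19[alpha]:
        return "reject H0"
    return "do not reject H0"

t_current = -2.8345  # current value of t in Example 4.1
decisions = {alpha: decide(t_current, alpha) for alpha in T_CRIT_19}
for alpha, decision in decisions.items():
    print(f"alpha = {alpha}: {decision}")
```

The same data lead to rejection at alpha = 0.05 but not at alpha = 0.01, which is why alpha has to be fixed before the data are examined.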
Statements:
For convenience, “reject $H_0$” is often stated as “there is a statistically significant difference” or “the difference is statistically significant”, but this does not mean that the difference is large or obvious.
Accordingly, “do not reject $H_0$” is often stated as “there is no statistically significant difference” or “the difference is not statistically significant”. It means that there is not enough evidence to reject $H_0$; it does not straightforwardly mean “accept $H_0$”.
Conclusion:
The result of the above example can be reported as: t = -2.8345, P < 0.02, reject $H_0$; that is, there is a statistically significant difference between the population mean and 10.50 mm/h, the value reported in the literature.
Taking the background into account, it is reasonable to consider that the blood sedimentation (mm/h) of this kind of patient might be lower than 10.50 on average.
Two Errors:
Type I error: $H_0$ is true, but it is rejected.
Type II error: $H_0$ is not true, but it is not rejected.
Table 1 Two-by-two table of truth and decision
Decision | Truth: H0 | Truth: H1
Not reject H0 | Correct conclusion (1-α) | False non-rejection of H0: Type II error (β), false negative result
Reject H0 | False rejection of H0: Type I error (α), false positive result | Correct conclusion: Power = (1-β)
Power: the probability of detecting a predefined statistically significant difference.
Making a Type I or Type II error often
results in monetary and nonmonetary
costs.
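The two error probabilities in Table 1 can be made concrete with a small simulation. The following is a sketch under stated assumptions, not a result from the slides: the population is taken to be normal with standard deviation 2.13 (borrowed from the sample in Example 4.1), the alternative mean 9.15 is a hypothetical "true" value chosen for illustration, and the critical value 2.093 is a table value for 19 degrees of freedom.

```python
import math
import random

N = 20          # sample size, as in Example 4.1
MU0 = 10.50     # population mean under H0
SIGMA = 2.13    # ASSUMPTION: population SD set equal to the sample SD
T_CRIT = 2.093  # two-sided critical value, alpha = 0.05, 19 df (t table)

def rejects_h0(sample):
    """Two-sided one-sample t test of H0: mu = MU0 at alpha = 0.05."""
    n = len(sample)
    mean = sum(sample) / n
    sd = math.sqrt(sum((x - mean) ** 2 for x in sample) / (n - 1))
    t = (mean - MU0) / (sd / math.sqrt(n))
    return abs(t) >= T_CRIT

def rejection_rate(true_mean, n_sims=20000, seed=1):
    """Fraction of simulated samples in which H0 is rejected."""
    rng = random.Random(seed)
    rejections = 0
    for _ in range(n_sims):
        sample = [rng.gauss(true_mean, SIGMA) for _ in range(N)]
        rejections += rejects_h0(sample)
    return rejections / n_sims

# H0 true: the rejection rate estimates alpha (Type I error probability)
type_i_rate = rejection_rate(true_mean=MU0)
# H0 false (hypothetical true mean 9.15): the rejection rate estimates power
power = rejection_rate(true_mean=9.15)
print(f"estimated Type I error rate: {type_i_rate:.3f}")
print(f"estimated power: {power:.3f}, estimated beta: {1 - power:.3f}")
```

When H0 is true the rejection rate should sit close to the chosen alpha of 0.05; when the true mean differs from 10.50, the rejection rate is the power, and one minus it estimates the Type II error probability for that particular alternative.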
4.2 The t Test for One Group of Data under Completely Randomized Design
Based on the mean and standard deviation of a
sample of n individuals randomly selected from
a normal distribution $N(\mu, \sigma^2)$, if one wants to judge
whether the population mean $\mu$ is equal to a
given constant $\mu_0$, the t test for one group of
data under completely randomized design can be
used.
Main steps:
(1) Set up the statistical hypotheses
$H_0: \mu = \mu_0$, $H_1: \mu \neq \mu_0$
(2) Select the test statistic and calculate its current value
$$t = \frac{\bar{X} - \mu_0}{S/\sqrt{n}}$$
(3) Determine the P-value
$$P = P(|t| \geq |\text{current value of the } t \text{ statistic}|)$$
(4) Decision and conclusion
Compare the P-value with the pre-assigned
small probability $\alpha$: if $P \leq \alpha$, reject $H_0$;
otherwise, do not reject $H_0$. Finally, state the
conclusion in light of the background.
Example 4.2
A large-scale survey reported that the mean
pulse rate of healthy males is 72 beats/min. A
physician randomly selected 25 healthy males in
a mountainous area and measured their pulses,
obtaining a sample mean of 75.2 beats/min and
a standard deviation of 6.5 beats/min. Can one
conclude that the mean pulse rate of healthy
males in the mountainous area is higher than that
in the general population?
Solution, step 1:
$H_0: \mu = 72$
$H_1: \mu \neq 72$ or $H_1: \mu > 72$?
$\alpha = 0.05$
One-sided and two-sided tests:
$H_1: \mu \neq \mu_0$ (two-sided test)
$H_1: \mu > \mu_0$ (one-sided test)
$H_1: \mu < \mu_0$ (one-sided test)
Definition:
A two-sided test is a test in which the values
of the parameter being studied under the
alternative hypothesis are allowed to be
either greater than or less than the values
of the parameter under the null hypothesis.
Definition:
A one-sided test is a test in which the values
of the parameter being studied under the
alternative hypothesis are allowed to be either
greater than or less than the values of the
parameter under the null hypothesis, but not both.
Solution:
$H_0: \mu = 72$, $H_1: \mu > 72$
$\alpha = 0.05$
t = 2.69, 0.005 < P < 0.01
Conclusion: the mean pulse rate of healthy males in the mountainous area is higher than that in the general population.
[Fig. 4.1 Demonstration for the current value of t and the P-value: one-sided P-value, the area beyond t = 2.69, 0.005 < P < 0.01]
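The difference between one-sided and two-sided P-values can be illustrated with a one-sample t test computed from summary statistics. This is a sketch only: the function names (`t_pdf`, `upper_tail`, `one_sample_t`) and the `alternative` labels are hypothetical choices made here, and the tail area is obtained by Simpson integration of the t density rather than from a table. One caution: with the figures as printed in Example 4.2 (mean 75.2, standard deviation 6.5, n = 25) the statistic evaluates to 3.2/1.3, about 2.46, whereas the reported 2.69 would correspond to a sample mean of about 75.5. In either case the one-sided P-value is below 0.05.

```python
import math

def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    log_norm = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
                - 0.5 * math.log(df * math.pi))
    return math.exp(log_norm - (df + 1) / 2 * math.log1p(x * x / df))

def upper_tail(t_value, df, steps=2000):
    """P(T >= t_value), via Simpson's rule on [0, |t_value|] (steps even)."""
    a = abs(t_value)
    h = a / steps
    total = t_pdf(0.0, df) + t_pdf(a, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    area = total * h / 3  # area between 0 and |t_value|
    return 0.5 - area if t_value >= 0 else 0.5 + area

def one_sample_t(mean, sd, n, mu0, alternative="two-sided"):
    """Return (t, df, P) for H0: mu = mu0 from summary statistics."""
    t = (mean - mu0) / (sd / math.sqrt(n))
    df = n - 1
    if alternative == "greater":      # H1: mu > mu0
        p = upper_tail(t, df)
    elif alternative == "less":       # H1: mu < mu0
        p = upper_tail(-t, df)
    elif alternative == "two-sided":  # H1: mu != mu0
        p = 2 * upper_tail(abs(t), df)
    else:
        raise ValueError(f"unknown alternative: {alternative}")
    return t, df, p

# Example 4.2 with the summary figures as printed
t, df, p_one = one_sample_t(75.2, 6.5, 25, 72, alternative="greater")
_, _, p_two = one_sample_t(75.2, 6.5, 25, 72, alternative="two-sided")
print(f"t = {t:.2f}, df = {df}")
print(f"one-sided P = {p_one:.4f}, two-sided P = {p_two:.4f}")
```

For a statistic that falls on the side named by the alternative hypothesis, the one-sided P-value is half the two-sided one, so the choice between the two kinds of test should be made before looking at the data.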
Exercise 1:
Suppose we want to test the hypothesis that mothers with low socioeconomic status (SES) deliver babies whose birth-weights are lower than “normal”.
To test this hypothesis, a list is obtained of birth-weights from 100 consecutive, full-term, live-born deliveries from the maternity ward of a hospital in a low-SES area.
The mean birth-weight is found to be 115 oz, with a sample standard deviation of 24 oz.
Suppose we know from a nationwide survey based on millions of deliveries that the mean birth-weight in the United States is 120 oz.
Can we actually say that the underlying mean birth-weight from this hospital is lower than the national average?
Questions:
1. How should the hypothesis be tested?
2. What are the type I and type II errors
for these data? What consequences would
each error have?
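Solution, step 1:
$H_0: \mu = 120$, $H_1: \mu < 120$
$\alpha = 0.05$
Step 2:
$$t = \frac{115 - 120}{24/\sqrt{100}} = -2.08, \quad \nu = 100 - 1 = 99$$
One-sided P-value: 0.01 < P < 0.05
[Fig. 4.1 Demonstration for the current value of t and the P-value: one-sided P-value, the area below t = -2.08]
Step 3: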
We can reject H0 at a significance level of
0.05.
The true mean birth-weight is significantly
lower in this hospital than in the general
population.
Two Errors:
A type I error would be deciding that the mean birth-weight in the hospital was lower than 120 oz when in fact it was 120 oz.
If a type I error is made, then a special-care nursery will be recommended, with all the extra costs involved, when in fact it is not needed.
A type II error would be deciding that the mean birth-weight was 120 oz when in fact it was lower than 120 oz.
If a type II error is made, a special-care nursery will not be recommended, when in fact it is needed. The nonmonetary cost of this decision is that low-birth-weight babies may not survive without the unique equipment in a special-care nursery.
THE END
THANKS!