Chapter 4
ESTIMATION AND TEST OF HYPOTHESES: ONE-SAMPLE, TWO-SAMPLE
INTRODUCTION TO HYPOTHESIS TESTING
Suppose you have to buy cornflakes from a salesman. The issue is not the price of the cornflakes but the amount of cornflakes in each box. The salesman claims that the cornflakes he is selling are packaged at 10 oz/box. You can take exactly four possible views of his claim:
1. He is honest: μ = 10 oz.
2. He is conservative and there is more than 10 oz/box: μ > 10 oz.
3. He is trying to cheat you and there is less than 10 oz/box: μ < 10 oz.
4. He is new on the job and does not really know the amount per box; his claim could be high or low: μ ≠ 10 oz.
If you think he is honest, you would just go ahead and order your cornflakes from him. You may, however, hold one of the other views, that he is
i) CONSERVATIVE,
ii) a LIAR, or
iii) CLUELESS.
The position you hold regarding the salesman can be any one of these, but not more than one. You cannot assume that he is both a liar and conservative, i.e. that μ < 10 oz and μ > 10 oz at the same time.
Proper use of the scientific method allows you to test one of these alternative positions through a sampling process. Remember that you can choose only one to test. How would you decide?
CASE I: Testing that the salesman is conservative
Suppose the salesman is remarkably shy and seems to lack self-confidence. You feel from his general conduct that he is being conservative in his claim of 10 oz/box. The situation can be summarized with a pair of hypotheses, which are really a pair of predictions.
A) The first is the salesman's claim, the prediction we will directly test. It is usually called Ho, or the null hypothesis. In this case Ho: μ = 10 oz.
B) The second is called the alternative or research hypothesis, which is your belief or position. The alternative hypothesis in this case is Ha: μ > 10 oz. Writing the null hypothesis as Ho: μ ≤ 10 oz, the predictions take the following form:
Ho: μ ≤ 10 oz (null hypothesis)
Ha: μ > 10 oz (alternative hypothesis)
We have now generated two mutually exclusive and all-inclusive possibilities. Therefore either Ho or Ha will be true, but not both.
Hypotheses
A. Salesman's claim (Ho)
B. Customer's belief or position (Ha)
In order to test the salesman's claim (Ho) against your view (Ha), you decide to do a small experiment. You select 25 boxes of cornflakes from a consignment, carefully empty each box, and weigh and record its contents. This experimental sampling is done after you have formulated the two hypotheses. If the first hypothesis were true, you would expect the sample mean of the 25 boxes to be close to or less than 10 oz.
If the second hypothesis were true, you would expect the sample mean to be significantly greater than 10 oz. We have to think about what "significantly greater" means in this context. In statistics, significantly less, more or different means that the result of the experiment would be a rare result if the null hypothesis were true. In other words, the result is far enough from the prediction in the null hypothesis that we feel we must reject that hypothesis.
This idea leads to the problem of what counts as a rare result, or a result rare enough to make us sufficiently suspicious of the null hypothesis. For now we will call a result rare if it could occur by chance less than 1 in 20 times when the null hypothesis is true. In that case we will reject the null hypothesis and consequently accept the alternative one. Let's now look at how this decision-making criterion works in CASE I.
Ho: μ ≤ 10 oz
Ha: μ > 10 oz
n = 25, and assume that σ = 1.0 oz and is widely known.
Suppose the mean of your 25-box sample is 10.36 oz. Is that significantly greater than 10 oz, so that we should reject the claim stated in Ho? Clearly it is greater than 10 oz, but is this mean rare enough under the claim of μ ≤ 10 oz for us to reject the claim?
To answer this question we will use the standard normal transformation to find the probability of X̄ ≥ 10.36 oz when the mean of the sampling distribution of X̄ is 10 oz. If this probability is less than 0.05 (1 in 20), we consider the result too rare to accept Ho.
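The right-tail probability described above can be sketched in a few lines of Python. This is only an illustration, not part of the original slides: it assumes σ = 1.0 oz (the value used in the Z calculation later in this chapter) and uses the standard library's `NormalDist` in place of a printed normal table.

```python
from math import sqrt
from statistics import NormalDist

mu0 = 10.0      # mean claimed under Ho (oz)
sigma = 1.0     # population standard deviation, assumed known (oz)
n = 25          # number of boxes sampled
xbar = 10.36    # observed sample mean (oz)

# Standard normal transformation of the sample mean
z = (xbar - mu0) / (sigma / sqrt(n))

# Right-tail probability P(Z >= z) when Ho is true
p_right = 1.0 - NormalDist().cdf(z)

print(f"z = {z:.2f}, P(Z >= z) = {p_right:.4f}")
print("reject Ho" if p_right < 0.05 else "do not reject Ho")
```

With these numbers z is 1.8 and the right-tail probability is about 0.036, which is below the 1-in-20 cutoff, so under this one-tailed alternative the sample mean would count as a rare result.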
CASE II: Testing that the salesman is a cheat
Suppose our salesman is a fast, smooth talker with fancy clothes and a new sports car. Your view might be that cornflakes salesmen only gain this type of affluence through unethical practices. You think this guy is a cheat. Your null hypothesis is Ho: μ ≥ 10 oz and your alternative hypothesis is Ha: μ < 10 oz. Notice that the two hypotheses are again mutually exclusive and all-inclusive, and that the equals sign is always in the null hypothesis.
It is the null hypothesis (the salesman's claim) that will be tested.
Ho: μ ≥ 10 oz
Ha: μ < 10 oz
Suppose you again sample 25 boxes to determine the average weight. The question you want to answer and the predictions (Ho, Ha) stemming from that question are again formulated before the sampling is done.
Again n = 25 and σ = 1.0 oz, and again we find X̄ = 10.36 oz. How does this result fit our predictions? If Ho is false, we expect the mean to be significantly less than 10 oz. Here the sample mean is above 10 oz, so it gives no evidence against Ho.
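For the left-tailed version of the test, a minimal sketch looks like this. It again assumes σ = 1.0 oz, and the critical value is computed with `NormalDist.inv_cdf` rather than read from a table; neither choice comes from the slides.

```python
from math import sqrt
from statistics import NormalDist

mu0, sigma, n, xbar = 10.0, 1.0, 25, 10.36  # sigma = 1.0 oz is assumed known

z = (xbar - mu0) / (sigma / sqrt(n))

# Left-tail probability P(Z <= z): only small values would support Ha: mu < 10 oz
p_left = NormalDist().cdf(z)

# Critical value that cuts off the lowest 5% of the standard normal curve
z_crit = NormalDist().inv_cdf(0.05)

print(f"z = {z:.2f}, critical value = {z_crit:.3f}, P(Z <= z) = {p_left:.4f}")
print("reject Ho" if z < z_crit else "do not reject Ho")
```

Because z is positive while the rejection region lies below about −1.645, Ho: μ ≥ 10 oz is not rejected.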
CASE III: Testing that the salesman is clueless
The last case is somewhat different from the first two, in that we really don't know whether to expect the mean of the sample to be higher or lower than the salesman's claim. The salesman is new on the job and does not know his product very well. The claim of 10 oz per box is what he has been told, but you don't have a sense that he is either overly conservative (CASE I) or dishonest (CASE II). Your alternative hypothesis here is less focused.
It is simply that the mean is different from 10 oz. The predictions become
Ho: μ = 10 oz
Ha: μ ≠ 10 oz
Under Ho we expect X̄ to be close to 10 oz, while under Ha we expect X̄ to be different from 10 oz in either direction, i.e. significantly smaller or significantly larger than 10 oz.
TYPICAL STEPS IN A STATISTICAL TEST OF HYPOTHESIS
1. State the problem: should I buy cornflakes from this salesman?
2. Formulate the null and alternative hypotheses:
Ho: μ = 10 oz
Ha: μ ≠ 10 oz
3. Choose the level of significance. This means choosing the probability of rejecting a true null hypothesis. We chose 1 in 20 in our cornflakes example, that is, 5% or 0.05. When Z is so extreme that it would occur less than 1 in 20 times if Ho were true, we reject Ho.
4. Determine the appropriate test statistic. Here we mean the index whose sampling distribution is known, so that objective criteria can be used to decide between Ho and Ha. In the cornflakes example we used a Z transformation because, under the Central Limit Theorem, X̄ was assumed to be normally or approximately normally distributed and the value of σ was known. Z is calculated as
$$Z = \frac{\bar{X} - \mu}{\sigma/\sqrt{n}}$$
5. Calculate the appropriate test statistic. Only after the first four steps are completed can one do the sampling and generate the so-called test statistic. Here
$$Z = \frac{10.36 - 10}{1.0/\sqrt{25}} = \frac{0.36}{0.20} = 1.8$$
6. Determine the critical values for the sampling distribution and the appropriate level of significance. For a two-tailed test and a level of significance of 1 in 20, the critical values are ±1.960 (Table C.3). These values or more extreme ones occur only 1 in 20 times if Ho is true. The critical values serve as cutoff points in the sampling distribution for the regions in which Ho is rejected.
7. Compare the test statistic to the critical values. In a two-tailed test the critical values are ±1.960 and the test statistic is 1.8, so −1.960 < 1.8 < 1.960.
8. Based on the comparison in step 7, accept or reject Ho. Since Z falls between the critical values, it is not extreme enough to reject Ho.
9. State your conclusion and answer the question posed in step 1. Here we accept Ho: the sample gives no significant evidence against the claim of 10 oz/box, so there is no reason not to buy from this salesman.
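The steps above can be sketched as a short script. This is a hedged illustration only: the variable names are invented here, σ = 1.0 oz is the assumed known standard deviation, and the critical value is computed from the standard normal distribution instead of being looked up in Table C.3.

```python
from math import sqrt
from statistics import NormalDist

# Steps 2-3: hypotheses Ho: mu = 10 oz vs Ha: mu != 10 oz, alpha = 0.05
mu0, alpha = 10.0, 0.05

# Steps 4-5: Z statistic (sigma = 1.0 oz assumed known, n = 25, mean 10.36 oz)
sigma, n, xbar = 1.0, 25, 10.36
z = (xbar - mu0) / (sigma / sqrt(n))

# Step 6: two-tailed critical value, cutting off alpha/2 in each tail
z_crit = NormalDist().inv_cdf(1 - alpha / 2)

# Steps 7-8: compare the statistic with the critical values and decide
reject = abs(z) > z_crit

# Two-tailed probability of a result at least this extreme when Ho is true
p_two = 2 * (1 - NormalDist().cdf(abs(z)))

print(f"z = {z:.2f}, critical values = ±{z_crit:.3f}, p = {p_two:.4f}")
print("reject Ho" if reject else "do not reject Ho")
```

The two-tailed probability comes out at roughly 0.072, above 0.05. The same sample mean has a right-tail probability of about 0.036, so the decision depends on which alternative hypothesis was fixed before sampling.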
TYPE I VS TYPE II ERROR IN HYPOTHESIS TESTING
Because the predictions in Ho and Ha are written so that they are mutually exclusive and all-inclusive, we have a situation where one is true and the other is automatically false.
When Ho is true, Ha is false.
If we accept Ho, we have done the right thing.
If we reject Ho, we have made an error.
This type of mistake (rejecting a true Ho) is called a Type I error.
When Ho is false, Ha is true.
If we accept Ho, we have made an error.
If we reject Ho, we have done the right thing.
This second type of mistake (accepting a false Ho) is called a Type II error.
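One way to see what a Type I error rate means is to simulate many samples from a population in which Ho really is true and count how often the test rejects it. The sketch below is an illustration under assumed values (the cornflakes numbers, a normal population, a fixed random seed); it is not taken from the slides.

```python
import random
from math import sqrt
from statistics import NormalDist, fmean

random.seed(1)  # fixed seed so the run is repeatable

mu0, sigma, n, alpha = 10.0, 1.0, 25, 0.05    # illustrative cornflakes values
z_crit = NormalDist().inv_cdf(1 - alpha / 2)  # two-tailed critical value

trials = 10_000
rejections = 0
for _ in range(trials):
    # Draw a sample from a population in which Ho (mu = 10 oz) really is true
    sample = [random.gauss(mu0, sigma) for _ in range(n)]
    z = (fmean(sample) - mu0) / (sigma / sqrt(n))
    if abs(z) > z_crit:
        rejections += 1   # rejecting a true Ho is a Type I error

type1_rate = rejections / trials
print(f"observed Type I error rate: {type1_rate:.3f}")
```

The observed rate should land close to the chosen α of 0.05, which is exactly what "the probability of rejecting a true null hypothesis" means.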
t-TEST (HYPOTHESIS INVOLVING THE MEAN)
Example 1. A forest ecologist studying the regeneration of rain forest communities in gaps caused by large trees falling during storms read that seedlings of the stinging (bow) tree, Dendrocnide excelsa, will grow 1.5 m/yr in direct sunlight in such gaps. In the gaps in her study plot she identified 9 specimens of this species and measured them in 2009 and again 1 yr later. Listed below are the changes in height (m) for the nine specimens.
1.9 2.5 1.6 2.0 1.5 2.7 1.9 1.0 2.0
Do her data support the published contention that seedlings of this species will average 1.5 m of growth per yr in direct sunlight?
Solution
Hypotheses:
Ho: μ = 1.5 m/yr
Ha: μ ≠ 1.5 m/yr
If the sample mean for the 9 specimens is close to 1.5 m/yr, we will accept Ho. If the sample mean is significantly larger or smaller than 1.5 m/yr, we will accept Ha (reject Ho). A significant difference means one so rare that it would occur by chance less than 5% of the time if Ho is true, i.e. α = 0.05. The test statistic will be
$$t = \frac{\bar{X} - \mu}{s/\sqrt{n}}$$
Here n = 9, X̄ = 1.90 m, s² = 0.260 m² and s = 0.51 m, and
$$t = \frac{1.90 - 1.50}{0.51/\sqrt{9}} = \frac{0.40}{0.51/3} = 2.35$$
Clearly the t-value of 2.35 is not zero, but is it far enough away from zero that we can comfortably reject Ho? With a predetermined α level of 0.05 we must get a t-value so far from zero that it would occur less than 5% of the time if Ho is true.
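From Table C.4 we have the following sampling distribution for t with v = n − 1 = 8 and α = 0.05 for a two-tailed test.
[Figure: t distribution with v = 8, centred on 0. The critical values −2.306 and +2.306 bound the central acceptance region; each tail beyond them is a rejection region with area 0.025. The observed t = 2.35 lies in the right-hand rejection region.]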
If Ho is true and we sample hundreds or thousands of times with samples of 9 specimens, and each time we calculate the t-value for the sample, these t-values would form a distribution with the shape indicated above. 2.5% of the samples would generate t-values below −2.306 and 2.5% of the samples would generate t-values above 2.306. So values as extreme as ±2.306 are rare if Ho is true.
The test statistic for this sample is 2.35, and since 2.35 > 2.306 the result would be considered rare for a true null hypothesis. We reject Ho based on this comparison and conclude that the average growth of stinging trees in direct sunlight is different from the published value and is, in fact, greater than 1.5 m/yr.
Rejecting Ho may lead to a Type I error.
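The arithmetic of this example can be checked with a short sketch. It uses only the nine measurements and the critical value quoted from Table C.4; the variable names are illustrative and not from the source.

```python
from math import sqrt
from statistics import mean, stdev

growth = [1.9, 2.5, 1.6, 2.0, 1.5, 2.7, 1.9, 1.0, 2.0]  # height changes (m)
mu0 = 1.5          # published growth rate under Ho (m/yr)
t_crit = 2.306     # two-tailed critical value for v = 8, alpha = 0.05 (from the text)

n = len(growth)
xbar = mean(growth)
s = stdev(growth)              # sample standard deviation (n - 1 in the denominator)
t = (xbar - mu0) / (s / sqrt(n))

print(f"n = {n}, mean = {xbar:.2f}, s^2 = {s**2:.3f}, s = {s:.2f}, t = {t:.2f}")
print("reject Ho" if abs(t) > t_crit else "do not reject Ho")
```

The computed mean, variance and t-value agree with the figures above (1.90 m, 0.260 m², 2.35), and |t| exceeds 2.306.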
EXAMPLE: TWO SAMPLE TEST
Watching an infomercial on TV, you hear the claim that, without changing your eating habits, a particular herbal extract taken daily will allow you to lose 5 lb in 5 days. You decide to test this claim by enlisting 12 of your classmates in an experiment. You weigh each subject, ask them to use the herbal extract for 5 days, and then weigh them again. From the results recorded below, test the infomercial's claim of 5 lb lost in 5 days.
Subject   Weight before (lb)   Weight after (lb)
1         128                  120
2         131                  123
3         165                  163
4         140                  141
5         178                  170
6         121                  118
7         190                  188
8         135                  136
9         118                  121
10        146                  140
11        212                  207
12        135                  126
Solution: Because the data are paired, we are not directly interested in the values presented above, but in the differences or changes within the pairs. Think of the data as two groups:
Group 1   Group 2
X11       X21
X12       X22
X13       X23
…         …
X1n       X2n
For the paired data here we wish to investigate the differences, or di's, where X11 − X21 = d1, X12 − X22 = d2, …, X1n − X2n = dn.
Expressing the data set in terms of these differences di (weight before minus weight after), we have the following table. Note the importance of the sign of these differences.
Subject   di     Subject   di
1         8      7         2
2         8      8         -1
3         2      9         -3
4         -1     10        6
5         8      11        5
6         3      12        9
The infomercial's claim of a loss of at least 5 lb in 5 days could be written Ho: μB − μA ≥ 5 lb, but Ho: μd ≥ 5 lb is somewhat more appealing. Because you would doubt the claim only if the average loss were smaller than 5 lb, the test is left-tailed:
Ho: μd ≥ 5 lb
Ha: μd < 5 lb
Choose α = 0.05. Since the two columns of data collapse into one column of interest, we treat these data now as a one-sample experiment.
There is no preliminary F test, and our only assumption is that the di's are approximately normally distributed. The test statistic for the paired-sample t test is
$$t = \frac{\bar{X}_d - \mu_d}{s_d/\sqrt{n}}$$
with v = n − 1, where n is the number of pairs of data points.
Here X̄d = 3.8 lb, sd = 4.1 lb and n = 12. We expect this statistic to be close to 0 if Ho is true, i.e. if the herbal extract allows you to lose 5 lb in 5 days. We expect it to be significantly less than 0 if the claim is false.
$$t = \frac{3.8 - 5}{4.1/\sqrt{12}} = -1.01$$
With v = n − 1 = 12 − 1 = 11, the critical value for this left-tailed test from Table C.4 is t0.05(11) = −1.796. Since −1.796 < −1.01, the test statistic does not deviate enough from what is expected under a true Ho for you to reject Ho. The data gathered from your classmates are consistent with the claim of an average loss of 5 lb in 5 days with the herbal extract. Because you accept Ho here, you may be making a Type II error (accepting a false Ho), but we have no way of quantifying the probability of this type of error.
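A minimal sketch of the paired calculation, starting from the raw before and after weights, is given below. The variable names are illustrative, and the critical value is simply the table value quoted in the text.

```python
from math import sqrt
from statistics import mean, stdev

before = [128, 131, 165, 140, 178, 121, 190, 135, 118, 146, 212, 135]  # lb
after = [120, 123, 163, 141, 170, 118, 188, 136, 121, 140, 207, 126]   # lb

claimed_loss = 5.0   # lb, value of mu_d under Ho
t_crit = -1.796      # left-tailed critical value t0.05(11) quoted in the text

# Paired differences: positive values are weight lost
d = [b - a for b, a in zip(before, after)]

n = len(d)
d_bar = mean(d)
s_d = stdev(d)
t = (d_bar - claimed_loss) / (s_d / sqrt(n))

print("differences:", d)
print(f"mean = {d_bar:.1f} lb, s_d = {s_d:.1f} lb, t = {t:.2f}")
print("reject Ho" if t < t_crit else "do not reject Ho")
```

Working from unrounded values gives t of about −0.98 instead of the −1.01 obtained from the rounded 3.8 and 4.1; either way the statistic stays above −1.796 and Ho is not rejected.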
EXAMPLE 3
An experiment was conducted to compare the performance of two varieties of wheat, A and B. Seven farms were randomly chosen for the experiment, and the yields in metric tons per hectare for each variety on each farm were as follows:
Farm   Yield of var. A   Yield of var. B
1      4.6               4.1
2      4.8               4.0
3      3.2               3.5
4      4.7               4.1
5      4.3               4.5
6      3.7               3.3
7      4.1               3.8
a) Why do you think both varieties were tested on each farm, rather than testing variety A on seven farms and variety B on seven different farms?
b) Carry out a hypothesis test to decide whether the mean yields are the same for the two varieties.
Solution: The experiment was designed to test both varieties on each farm because different farms may have significantly different yields due to differences in
i) soil characteristics
ii) microclimate
iii) cultivation practices
"Pairing" the data points accounts for most of the "between-farm" variability, so that any remaining difference in yield should be due solely to the wheat variety.
Farm   Difference (A − B)
1      0.5
2      0.8
3      -0.3
4      0.6
5      -0.2
6      0.4
7      0.3
The hypotheses are
Ho: μA − μB = 0, or μd = 0
Ha: μd ≠ 0
Let α = 0.05. Then n = 7, X̄d = 0.30 ton/hectare and sd = 0.41 ton/hectare, so
$$t = \frac{0.30 - 0}{0.41/\sqrt{7}} = 1.94$$
With v = 7 − 1 = 6, the critical values from Table C.4 are t0.025(6) = −2.447 and t0.975(6) = 2.447. Since −2.447 < 1.94 < 2.447, the test statistic does not deviate enough from 0, the expected t-value if Ho is true, to reject Ho. From the data given we cannot say that the yields of varieties A and B are significantly different.
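The wheat example can be reproduced from the yield table with the following sketch. Names are illustrative, and the critical value is the table value given in the text.

```python
from math import sqrt
from statistics import mean, stdev

yield_a = [4.6, 4.8, 3.2, 4.7, 4.3, 3.7, 4.1]  # t/ha, variety A on farms 1-7
yield_b = [4.1, 4.0, 3.5, 4.1, 4.5, 3.3, 3.8]  # t/ha, variety B on farms 1-7
t_crit = 2.447   # two-tailed critical value for v = 6, alpha = 0.05 (from the text)

# Farm-by-farm differences A - B, rounded to 1 decimal to hide float noise
d = [round(a - b, 1) for a, b in zip(yield_a, yield_b)]

n = len(d)
d_bar = mean(d)
s_d = stdev(d)
t = (d_bar - 0.0) / (s_d / sqrt(n))   # Ho: mu_d = 0

print("differences:", d)
print(f"mean = {d_bar:.2f}, s_d = {s_d:.2f}, t = {t:.2f}")
print("reject Ho" if abs(t) > t_crit else "do not reject Ho")
```

The statistic of about 1.94 falls inside ±2.447, matching the conclusion above.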
CHI-SQUARE TEST
Example: A geneticist interested in human populations has been studying growth patterns in US males since 1900. A monograph written in 1902 states that the mean height of adult US males is 67.0 inches with a standard deviation of 3.5 inches. Wishing to see if these values have changed over the 20th century, the geneticist measured a random sample of adult US males and found that X̄ = 69.4 inches and s = 4.0 inches. Are these values significantly different from the values published in 1902?
Solution: There are two questions here, one about the mean and the second about the standard deviation or variance. Two questions require two sets of hypotheses and two test statistics. For the question about the mean, the hypotheses are
Ho: μ = 67.0 inches
Ha: μ ≠ 67.0 inches
Here n = 28 and α = 0.01. This is a two-tailed test, with the question and hypotheses (Ho and Ha) formulated before the data were collected or analyzed.
$$t = \frac{\bar{X} - \mu}{s/\sqrt{n}} = \frac{69.4 - 67.0}{4.0/\sqrt{28}} = \frac{2.4}{0.76} = 3.16$$
Using an α level of 0.01 for v = n − 1 = 27, we find the critical values to be ±2.771 (Table C.4).
Since 3.16 > 2.771, we reject Ho and say that the modern mean is significantly different from that reported in 1902 and, in fact, is higher than the reported value (because the t-value falls in the right-hand tail). P(Type I error) < 0.01.
For the question about variance, the hypotheses are
Ho: σ² = 12.25 inch²
Ha: σ² ≠ 12.25 inch²
The question about variability is answered with a chi-square statistic. Here n = 28, so
$$\chi^2 = \frac{(n-1)s^2}{\sigma^2} = \frac{(28-1)(16)}{12.25} = 35.3$$
The value is expected to be close to 27 (n − 1) if Ho is true, and significantly different from 27 if Ha is true.
From Table C.5, using an α level of 0.01 for v = 27, we find the critical values for χ² to be 11.8 and 49.6. Since 11.8 < 35.3 < 49.6, we do not reject Ho here. There is no statistical support for Ha. The two-tailed P value for χ² = 35.3 lies between 0.500 (χ² = 31.5) and 0.200 (χ² = 36.7), indicating that the calculated value is not a rare event under the null hypothesis.
We would conclude that the mean height of adult US males is higher now than reported in 1902, but that the variability in heights is not significantly different today from that in 1902.
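Both tests in this example can be computed from the summary statistics alone. The sketch below is illustrative: the variable names are invented here and the critical values are the table values quoted in the text.

```python
from math import sqrt

n = 28
xbar, s = 69.4, 4.0          # sample mean and standard deviation (inches)
mu0, sigma0 = 67.0, 3.5      # values published in 1902

# Question 1: has the mean changed? (two-tailed t test, alpha = 0.01, v = 27)
t = (xbar - mu0) / (s / sqrt(n))
t_crit = 2.771               # from the text's Table C.4
mean_changed = abs(t) > t_crit

# Question 2: has the variance changed? (two-tailed chi-square test, alpha = 0.01, v = 27)
chi2 = (n - 1) * s**2 / sigma0**2
chi2_lo, chi2_hi = 11.8, 49.6   # from the text's Table C.5
variance_changed = not (chi2_lo < chi2 < chi2_hi)

print(f"t = {t:.2f} -> {'reject' if mean_changed else 'do not reject'} Ho for the mean")
print(f"chi-square = {chi2:.1f} -> {'reject' if variance_changed else 'do not reject'} Ho for the variance")
```

Without rounding the denominator to 0.76, t comes out at about 3.17 instead of 3.16; the decision is the same in both cases.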
CHI-SQUARE TEST FOR GOODNESS OF FIT
Assumptions for the χ² test for goodness of fit are that
1. An independent random sample of size n is drawn from the population.
2. The population can be divided into a set of k mutually exclusive categories.
3. The expected frequencies for each category must be specified. Let Ei denote the expected frequency for the i-th category. The sample size must be sufficiently large that each Ei is at least 5 (categories may be combined to achieve this).
The hypothesis test takes only one form:
Ho: The observed frequency distribution is the same as the hypothesized frequency distribution.
Ha: The observed and hypothesized frequency distributions are different.
Generally speaking, this is an example of a statistical test where one wishes to confirm the null hypothesis.
Test statistic
Let Oi denote the observed frequency of the i-th category. The test statistic is based on the difference between the observed and expected frequencies, Oi − Ei:
$$\chi^2 = \sum_{i=1}^{k} \frac{(O_i - E_i)^2}{E_i}$$
The intuition for the test is that if the observed and expected frequencies are nearly equal for each category, then each Oi − Ei will be small and, hence, χ² will be small. Small values of χ² should lead to acceptance of Ho, while large values lead to rejection. The test is always right-tailed: Ho is rejected only when the test statistic exceeds a specified value.
The statistic has an approximate chi-square distribution when Ho is true; the approximation improves as the sample size increases. The values of the chi-square distribution are tabulated in Table C.5.
GOODNESS OF FIT: EXAMPLE
The progeny of self-fertilized four-o'clocks were expected to flower red, pink and white in the ratio 1:2:1. There were 240 progeny produced, with 55 red plants, 132 pink plants and 53 white plants. Are these data reasonably consistent with the Mendelian 1:2:1 ratio?
Solution: The hypotheses are
Ho: The data are consistent with a Mendelian model (1:2:1)
Ha: The data are inconsistent with a Mendelian model (1:2:1)
The three colours are the three categories. In order to calculate the expected frequencies, no parameters need to be estimated. The Mendelian ratios are given: 25% red, 50% pink and 25% white. Since there are 240 observations, the expected number of red four-o'clocks is 0.25 × 240 = 60, i.e. Ei = 60. Similar calculations for pink and white yield the following table:
Category   Oi    Ei    (Oi − Ei)²/Ei
Red        55    60    0.42
Pink       132   120   1.20
White      53    60    0.82
Total      240   240   2.44
$$\chi^2 = \sum_{i=1}^{3} \frac{(O_i - E_i)^2}{E_i} = 0.42 + 1.20 + 0.82 = 2.44$$
v = df = number of categories − 1 = 3 − 1 = 2. Let α = 0.05. Because the test is right-tailed, the critical value occurs where $P(\chi^2 \le \chi^2_{1-\alpha}) = 1 - \alpha$. Thus in Table C.5, for df = 2 and P = 1 − α = 0.95, the critical value is found to be 5.99. Since 2.44 < 5.99, Ho is accepted. This supports the Mendelian 1:2:1 ratio.
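The goodness-of-fit calculation for the four-o'clock data can be sketched as follows. The dictionary layout and names are illustrative choices, not from the source, and the critical value is the Table C.5 value quoted above.

```python
observed = {"red": 55, "pink": 132, "white": 53}
ratio = {"red": 1, "pink": 2, "white": 1}     # Mendelian 1:2:1 model under Ho
chi2_crit = 5.99   # right-tail critical value for df = 2, alpha = 0.05 (from the text)

n = sum(observed.values())
ratio_total = sum(ratio.values())

# Expected frequency for each category: E_i = n * (model proportion)
expected = {k: n * ratio[k] / ratio_total for k in observed}

# Per-category contributions (O_i - E_i)^2 / E_i and their sum
terms = {k: (observed[k] - expected[k]) ** 2 / expected[k] for k in observed}
chi2 = sum(terms.values())
df = len(observed) - 1

for k in observed:
    print(f"{k:5s}  O = {observed[k]:3d}  E = {expected[k]:5.1f}  term = {terms[k]:.2f}")
print(f"chi-square = {chi2:.2f} with df = {df}")
print("reject Ho" if chi2 > chi2_crit else "do not reject Ho")
```

Summing the unrounded terms gives about 2.43, while the table's 2.44 comes from adding the rounded entries; both are well below 5.99.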