chapter 6 ranksumtest
12. Nonparametric test based on ranks
• A large-scale survey reported that the mean pulse rate of healthy males is 72 bpm. A physician randomly selected 25 healthy males in a mountainous area and measured their pulse rates, obtaining a sample mean of 75.2 bpm and a standard deviation of 6.5 bpm. Can one conclude that the mean pulse rate of healthy males in the mountainous area is higher than that of the general population (μ > μ0)?
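$$ t = \frac{\bar{x} - \mu_0}{S/\sqrt{n}} = \frac{75.2 - 72.0}{6.5/\sqrt{25}} = 2.462, \qquad P < 0.05 $$
[Figure: sampling distribution of t, with the supporting area and the rejection areas marked]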
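The arithmetic behind the t statistic can be checked with a few lines of Python. This is only a sketch using the summary statistics quoted in the example; the variable names are illustrative and are not part of the original slides.

```python
import math

# Summary statistics quoted in the pulse-rate example
n = 25          # sample size
x_bar = 75.2    # sample mean (bpm)
s = 6.5         # sample standard deviation (bpm)
mu0 = 72.0      # population mean under H0 (bpm)

# One-sample t statistic: (x_bar - mu0) / (s / sqrt(n))
standard_error = s / math.sqrt(n)
t = (x_bar - mu0) / standard_error
df = n - 1      # degrees of freedom

print(f"t = {t:.3f}, df = {df}")
```

The printed t should agree with the 2.462 shown above; the P-value itself still has to be read from a t table or computed with a statistics library.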
Parametric tests
The hypothesis-testing methods we have learnt so far:
(1) assume that the variable follows a normal distribution;
(2) test whether the means (parameters) are equal under that assumption.
They are therefore called parametric tests.
Non-parametric tests (distribution-free tests)
• They make no assumption about the form of the distribution.
• The chi-square test (Chapter 6) is one kind of non-parametric test.
• Rank sum tests are another kind of non-parametric test, based on the ranks of the data.
Non-parametric tests can be used in the following situations:
a. The distribution of the data is unknown;
b. The distribution of the data is skewed;
c. The data are ranked or imprecise;
d. A quick and brief analysis is wanted (e.g. for a pilot study).
Non-parametric tests are suitable for a variety of data:
• Measurement, enumeration or ordinal data
• Normally distributed or not
• Symmetric or not
However, if the data are suitable for a parametric test, a non-parametric test used in its place will have slightly lower power.
12.1 Wilcoxon’s signed rank sum test (matched pairs)
Example 12-1 In order to study the difference of intelligence between twin brothers, the intelligence scores of 12 pairs of twin brothers were measured. The results are listed in Table 12.2.
Table 12.2 Intelligence scores of 12 pairs of twin brothers

Pair No.  Senior  Junior  Difference        Rank of  Signed rank
i         x_i     y_i     d_i = y_i - x_i   |d_i|    R_i
(1)       (2)     (3)     (4)               (5)      (6)
1         86      88        2                3         3
2         71      77        6                7         7
3         77      76       -1                1.5      -1.5
4         68      64       -4                4        -4
5         91      96        5                5.5       5.5
6         72      72        0                -         -
7         77      65      -12               10       -10
8         91      90       -1                1.5      -1.5
9         70      65       -5                5.5      -5.5
10        71      80        9                9         9
11        88      81       -7                8        -8
12        87      72      -15               11       -11

T+ = 24.5; T- = 41.5
Steps:
(1) Hypotheses: H0: the median of the differences is 0; H1: the median of the differences is not 0; α = 0.05.
(2) Calculate the differences.
(3) Rank the absolute differences (omitting zeros) and reattach the signs.
(4) Rank sums and statistic: T = min{positive sum, negative sum} = min{24.5, 41.5} = 24.5.
(5) P-value and conclusion: from Table 10, with n = 11 non-zero differences, T lies within the range 10 to 56, so P > 0.05 and H0 is not rejected.
Conclusion: There is no evidence that the intelligence scores of the senior and junior twins differ in level.
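The ranking steps of the signed rank test can be sketched in plain Python. The helper `average_ranks` is a hypothetical name introduced here for illustration; it assigns tied values the mean of the ranks they would occupy, as described in step (3). The data are those of Table 12.2.

```python
def average_ranks(values):
    """Mid-ranks: 1 + (number of smaller values) + (number of other equal values) / 2."""
    return [1 + sum(v < x for v in values) + (sum(v == x for v in values) - 1) / 2
            for x in values]

# Intelligence scores from Table 12.2
senior = [86, 71, 77, 68, 91, 72, 77, 91, 70, 71, 88, 87]
junior = [88, 77, 76, 64, 96, 72, 65, 90, 65, 80, 81, 72]

# Step (2): differences d = junior - senior, with zero differences omitted
diffs = [y - x for x, y in zip(senior, junior) if y != x]

# Step (3): rank the absolute differences, then attach the signs
ranks = average_ranks([abs(d) for d in diffs])

# Step (4): positive and negative rank sums, and the statistic T
t_plus = sum(r for d, r in zip(diffs, ranks) if d > 0)
t_minus = sum(r for d, r in zip(diffs, ranks) if d < 0)
T = min(t_plus, t_minus)

print(f"n = {len(diffs)}, T+ = {t_plus}, T- = {t_minus}, T = {T}")
```

The critical range still has to be looked up in the table for the number of non-zero differences; the sketch only reproduces the rank sums.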
12.2 Wilcoxon’s rank sum test for two samples
Used for two independent samples when the variable does not follow a normal distribution, or when it is uncertain whether it does.
Table 12.4 Survival time of cats and rabbits without oxygen

Cats                       Rabbits
Survival time   Rank       Survival time   Rank
25               9.5       15               1.5
34              15         15               1.5
44              17         16               3
46              18.5       17               4
46              18.5       19               5
                           21               6.5
                           21               6.5
                           23               8
                           25               9.5
                           27              11
                           28              12.5
                           28              12.5
                           30              14
                           35              16

n1 = 5, R1 = 78.5          n2 = 14, R2 = 111.5
(1) Hypotheses: H0: the distributions of the two populations are the same; H1: the two distributions are not the same; α = 0.05.
(2) Rank all the observations of the two samples together. If the same value appears more than once (a tie), give each the mean rank. For example, “25” appears in both samples and would occupy ranks 9 and 10, so each receives (9+10)/2 = 9.5.
(3) Take the rank sum of the smaller sample: T = T1 = 78.5 (the rank sum R1 of the cats).
(4) P-value and conclusion (Table 11): T0.05,5,9 = 28~72. T is outside this range, so P < 0.05. The difference in survival time between the two animals is statistically significant.
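The pooled ranking and the rank sum of the smaller sample can be reproduced as follows. This is a minimal sketch: `average_ranks` is a hypothetical helper, and the critical range 28 to 72 is simply copied from the table lookup quoted above rather than computed.

```python
def average_ranks(values):
    """Mid-ranks: 1 + (number of smaller values) + (number of other equal values) / 2."""
    return [1 + sum(v < x for v in values) + (sum(v == x for v in values) - 1) / 2
            for x in values]

# Survival times from Table 12.4
cats = [25, 34, 44, 46, 46]
rabbits = [15, 15, 16, 17, 19, 21, 21, 23, 25, 27, 28, 28, 30, 35]

# Step (2): rank all observations together, ties get the mean rank
pooled = cats + rabbits
ranks = average_ranks(pooled)

# Step (3): rank sums; the cats form the smaller sample, so T = R1
R1 = sum(ranks[:len(cats)])
R2 = sum(ranks[len(cats):])
T = R1

# Step (4): compare with the tabulated range quoted in the text
lower, upper = 28, 72
outside = T < lower or T > upper

print(f"R1 = {R1}, R2 = {R2}, T outside {lower}~{upper}: {outside}")
```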
12.3.1 Kruskal-Wallis’ H test for comparing more than 2 samples
• Example 12.3 Fourteen newborn infants were grouped into 4 categories according to their mothers’ smoking habits:
A: smoking more than 20 cigarettes per day; B: smoking fewer than 20 cigarettes per day; C: ex-smoker; D: never smoked.
Their weights are listed in Table 12.7.
Table 12.7 The weights of newborn infants grouped by their mothers’ smoking habit

       Weight x_ij                Rank r_ij
       A     B     C     D        A     B     C     D
       2.7   2.9   3.3   3.5      3     4     7     11
       2.4   3.2   3.6   3.6      2     5.5   12.5  12.5
       2.2   3.2   3.4   3.7      1     5.5   9     14
       3.4         3.4            9           9
n_i                               4     3     4     3
R_i                               15    15    37.5  37.5
(1) Hypotheses: H0: the distributions of the four populations are all the same; H1: the distributions of the four populations are not all the same; α = 0.05.
(2) Rank all the observations of the four samples together (ties are handled in the same way as before).
(3) Rank sums for each sample: R1 = R2 = 15, R3 = R4 = 37.5.
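(4) Statistic H
If there are no ties:
$$ H = \frac{12}{N(N+1)} \sum_i \frac{R_i^2}{n_i} - 3(N+1) $$
If there are ties (t_j: number of individuals in the j-th tie):
$$ C = 1 - \frac{\sum_j (t_j^3 - t_j)}{N^3 - N}, \qquad H_C = \frac{H}{C} $$
Example 12.3 (Table 12.7):
$$ H = \frac{12}{14(14+1)} \left( \frac{15^2}{4} + \frac{15^2}{3} + \frac{37.5^2}{4} + \frac{37.5^2}{3} \right) - 3(14+1) = 9.375 $$
$$ C = 1 - \frac{(2^3 - 2) + (3^3 - 3) + (2^3 - 2)}{14^3 - 14} = 0.9868 $$
$$ H_C = \frac{9.375}{0.9868} = 9.500 $$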
(5) P-value and conclusion
Compare H with its critical value (C 7), or with
$$ \chi^2_{0.05,\nu}, \qquad \nu = k - 1 $$
k: number of samples.
Example 12.3 (Table 12.7): H_C = 9.500 > χ²(0.05, 3) = 7.815, so P < 0.05.
Conclusion: The weights are not all at an equal level.
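The whole Kruskal-Wallis calculation for Table 12.7 fits in a short script. This is a sketch under the formulas given above; `average_ranks` and the variable names are illustrative, and the critical value 7.815 is taken from the text rather than computed.

```python
from collections import Counter

def average_ranks(values):
    """Mid-ranks: 1 + (number of smaller values) + (number of other equal values) / 2."""
    return [1 + sum(v < x for v in values) + (sum(v == x for v in values) - 1) / 2
            for x in values]

# Newborn weights from Table 12.7, grouped by the mother's smoking habit
groups = {
    "A": [2.7, 2.4, 2.2, 3.4],
    "B": [2.9, 3.2, 3.2],
    "C": [3.3, 3.6, 3.4, 3.4],
    "D": [3.5, 3.6, 3.7],
}

# Rank all observations together
pooled = [w for weights in groups.values() for w in weights]
N = len(pooled)
ranks = average_ranks(pooled)

# Split the pooled ranks back into groups and sum them
rank_sums = {}
start = 0
for name, weights in groups.items():
    rank_sums[name] = sum(ranks[start:start + len(weights)])
    start += len(weights)

# H without tie correction
H = 12 / (N * (N + 1)) * sum(rank_sums[g] ** 2 / len(groups[g]) for g in groups) - 3 * (N + 1)

# Tie correction: t_j is the size of each group of tied values
tie_sizes = [t for t in Counter(pooled).values() if t > 1]
C = 1 - sum(t ** 3 - t for t in tie_sizes) / (N ** 3 - N)
H_c = H / C

critical = 7.815  # chi-square critical value for alpha = 0.05, 3 df, as quoted in the text
print(rank_sums)
print(f"H = {H:.3f}, C = {C:.4f}, H_c = {H_c:.3f}, reject H0: {H_c > critical}")
```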
12.3.2 Friedman test for the data from a randomized block design
Example 12.4 Riboflavin was measured in 3 samples of cabbage under four test conditions (A, B, C and D). The results are listed in Table 12.9. The question is whether the results differ between test conditions.
Table 12.9 The riboflavin in cabbages (μg/g); ranks within each sample are given in parentheses

          Test conditions
Sample    A          B          C          D
1         27.2 (2)   24.6 (1)   39.5 (4)   38.6 (3)
2         23.2 (1)   24.2 (2)   43.1 (4)   39.5 (3)
3         24.8 (2)   22.2 (1)   45.2 (4)   33.0 (3)
R_j       5          4          12         9
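With b blocks (here the 3 samples) and k treatments (here the 4 test conditions), the statistic is
$$ \chi^2 = \frac{\sum_{j=1}^{k} \left( R_j - b(k+1)/2 \right)^2}{bk(k+1)/12} = \frac{12}{bk(k+1)} \sum_{j=1}^{k} R_j^2 - 3b(k+1) $$
$$ \chi^2 = \frac{12}{3(4)(4+1)} \left( 5^2 + 4^2 + 12^2 + 9^2 \right) - 3(3)(4+1) = 8.2 $$
Since χ² = 8.2 > χ²(0.05, 3) = 7.815, P < 0.05 and H0 is rejected.
We conclude that the readings may be at different levels under the different test conditions.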
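The Friedman statistic for Table 12.9 can be sketched as below. Ranking is done within each block (row), then the rank sums per condition enter the formula above. `average_ranks` is again a hypothetical helper, and the critical value 7.815 is copied from the text.

```python
def average_ranks(values):
    """Mid-ranks: 1 + (number of smaller values) + (number of other equal values) / 2."""
    return [1 + sum(v < x for v in values) + (sum(v == x for v in values) - 1) / 2
            for x in values]

# Riboflavin readings from Table 12.9: one row per cabbage sample (block),
# one column per test condition A, B, C, D (treatment)
rows = [
    [27.2, 24.6, 39.5, 38.6],
    [23.2, 24.2, 43.1, 39.5],
    [24.8, 22.2, 45.2, 33.0],
]
b = len(rows)      # number of blocks
k = len(rows[0])   # number of treatments

# Rank within each block, then sum the ranks for each treatment
ranked = [average_ranks(row) for row in rows]
R = [sum(r[j] for r in ranked) for j in range(k)]

# Friedman chi-square statistic
chi2 = 12 / (b * k * (k + 1)) * sum(Rj ** 2 for Rj in R) - 3 * b * (k + 1)

critical = 7.815  # chi-square critical value for alpha = 0.05, k - 1 = 3 df, as quoted in the text
print(f"R = {R}, chi2 = {chi2:.1f}, reject H0: {chi2 > critical}")
```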
12.3.3 Multiple comparison of mean ranks
When the comparison among the four groups shows a significant difference, multiple comparisons are needed to find out which groups differ from each other. Z tests for pairwise comparison can be used.
H0: the locations of populations A and B are the same
H1: the locations of populations A and B are different
α = 0.05
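$$ Z_{ij} = \frac{\bar{R}_i - \bar{R}_j}{\sigma_{\bar{R}_i - \bar{R}_j}}, \qquad \sigma_{\bar{R}_i - \bar{R}_j} = \sqrt{\frac{N(N+1)}{12} \left( \frac{1}{n_i} + \frac{1}{n_j} \right)} $$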
From the ranks in Table 12.7 (n_i = 4, 3, 4, 3 and R_i = 15, 15, 37.5, 37.5 for groups A, B, C, D), the mean ranks are:
$$ \bar{R}_1 = 15/4 = 3.75, \qquad \bar{R}_2 = 15/3 = 5.00, \qquad \bar{R}_3 = 37.5/4 = 9.38, \qquad \bar{R}_4 = 37.5/3 = 12.50 $$
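(1) Hypotheses: H0: this pair of population distributions has the same location; H1: this pair of population distributions has different locations; α = 0.05.
(2) Calculate the Z value (groups 1 and 4):
$$ Z_{1,4} = \frac{\bar{R}_1 - \bar{R}_4}{\sqrt{\frac{N(N+1)}{12} \left( \frac{1}{n_1} + \frac{1}{n_4} \right)}} = \frac{3.75 - 12.50}{\sqrt{\frac{14(14+1)}{12} \left( \frac{1}{4} + \frac{1}{3} \right)}} = -2.74 $$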
(3) Decide the P-value:
$$ \alpha^* = \frac{0.05}{c} = \frac{0.05}{3} = 0.0167, \qquad Z_{0.0167} = 2.12 $$
|Z| = 2.74 > 2.12, so P < 0.0167.
The weights in the first group are at a different level from those in the fourth group. Since the mean rank of group 1 is lower than that of group 4, mothers who smoke may have babies with lower weights.
Conclusion: Smoking may lead to lower weights in newborns.
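The pairwise Z comparison of group 1 (A) with group 4 (D) can be reproduced as follows. This is a sketch only: the function name `z_statistic` is hypothetical, and the adjusted level 0.0167 and critical value 2.12 are taken directly from the text, not derived here.

```python
import math

# Group sizes and rank sums from Table 12.7
n = {"A": 4, "B": 3, "C": 4, "D": 3}
rank_sum = {"A": 15.0, "B": 15.0, "C": 37.5, "D": 37.5}

N = sum(n.values())                              # total number of observations
mean_rank = {g: rank_sum[g] / n[g] for g in n}   # mean rank of each group

def z_statistic(i, j):
    """Z for comparing the mean ranks of groups i and j."""
    sigma = math.sqrt(N * (N + 1) / 12 * (1 / n[i] + 1 / n[j]))
    return (mean_rank[i] - mean_rank[j]) / sigma

z = z_statistic("A", "D")

critical = 2.12  # critical value for the adjusted level 0.0167, as quoted in the text
print(f"mean ranks: {mean_rank}")
print(f"Z = {z:.2f}, significant: {abs(z) > critical}")
```

Calling `z_statistic` with other pairs of group labels gives the remaining comparisons under the same assumptions.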