chapter 14: nonparametric statistics

+ Chapter 14: Nonparametric Statistics

Lecture PowerPoint Slides

Discovering Statistics

2nd Edition Daniel T. Larose

+ Chapter 14 Overview

14.1 Introduction to Nonparametric Statistics

14.2 Sign Test

14.3 Wilcoxon Signed Ranks Test for Matched-Pairs Data

14.4 Wilcoxon Rank Sum Test for Two Independent Samples

14.5 Kruskal-Wallis Test

14.6 Rank Correlation Test

14.7 Runs Test for Randomness

+ The Big Picture

Where we are coming from and where we are headed…

In earlier chapters, we learned how to perform hypothesis tests for population parameters, such as the mean µ or proportion p.

Here in Chapter 14 we learn about a family of hypothesis tests, called nonparametric hypothesis tests, whose conditions are similar to those in earlier chapters but less stringent.

Congratulations on getting this far in your discovery of the field of statistics! Best of luck in the future!

+ 14.1: Introduction to Nonparametric Statistics

Objectives:

Explain what a nonparametric hypothesis test is and why we use it.

Describe what is meant by the efficiency of a nonparametric test.

Nonparametric Hypothesis Tests

In Chapters 9–13, we learned how to perform hypothesis tests for population parameters, such as µ and p. To perform each of these parametric hypothesis tests, certain conditions need to be satisfied.

Parametric hypothesis tests are used to test claims about a population parameter, such as the population mean µ or proportion p. Often, parametric tests require that the population follow a particular distribution, such as the normal distribution.

Nonparametric hypothesis tests, also called distribution-free hypothesis tests, generally have fewer required conditions. In particular, nonparametric tests do not require the population to follow a particular distribution, such as the normal distribution.

Nonparametric Hypothesis Tests

Advantages of Nonparametric Hypothesis Tests

1. May be used on a greater variety of data because they require fewer conditions than their parametric counterparts.

2. Can be applied to categorical (qualitative) data.

3. Manual computations tend to be easier than their parametric counterparts.

Disadvantages of Nonparametric Hypothesis Tests

1. Less efficient than parametric tests as they require a larger sample size to reject a null hypothesis.

2. Replace actual data values with either signs or ranks. Thus, the actual data values are wasted.

3. Technology often does not have dedicated procedures for performing these tests.

Nonparametric Hypothesis Tests

In general, parametric tests are more efficient than corresponding nonparametric tests. The efficiency of a nonparametric test is used to compare it with its corresponding parametric test.

The efficiency of a nonparametric hypothesis test is defined as the ratio of the sample size required for the corresponding parametric test to the sample size required for the nonparametric test, in order to achieve the same result (such as correctly rejecting the null hypothesis). The efficiency ratings are reported on the assumption that required conditions for both the parametric and nonparametric tests have been met.

Section | Situation | Parametric Test | Nonparametric Test | Efficiency
14.2 | Matched pairs | t or Z test | Sign test | 0.63
14.3 | Matched pairs | t or Z test | Wilcoxon signed ranks test | 0.95
14.4 | Two independent samples | t or Z test | Wilcoxon rank sum test | 0.95
14.5 | Several independent samples | ANOVA | Kruskal-Wallis test | 0.95
14.6 | Correlation | Linear correlation | Rank correlation test | 0.91
14.7 | Randomness | None | Runs test | --
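To make the efficiency ratio concrete, here is a minimal sketch that solves efficiency = (parametric sample size) / (nonparametric sample size) for the nonparametric sample size. The function name `nonparametric_sample_size` and the starting sample size of 50 are illustrative assumptions, not values from the slides.

```python
import math

def nonparametric_sample_size(n_parametric, efficiency):
    """Approximate sample size a nonparametric test needs in order to
    match a parametric test that uses n_parametric observations."""
    # efficiency = n_parametric / n_nonparametric, solved for n_nonparametric
    return math.ceil(n_parametric / efficiency)

# Hypothetical parametric sample size of 50
print(nonparametric_sample_size(50, 0.63))  # sign test (efficiency 0.63)
print(nonparametric_sample_size(50, 0.95))  # Wilcoxon tests (efficiency 0.95)
```

Under these assumptions, the sign test would need roughly 80 observations and the Wilcoxon tests roughly 53 to do the work of a parametric test with 50.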

+ 14.2: Sign Test

Objectives:

Perform the sign test for a single population median.

Carry out the sign test for matched-pair data from two dependent samples.

Perform the sign test for binomial data.

Sign Test for a Population Median

In Section 9.4, we learned how to perform the one-sample t test for the population mean µ. This is a parametric test requiring either a normal population or large sample. What do we do when we have neither? We use the sign test for the population median.

The sign test is a nonparametric hypothesis test in which the original data are transformed into plus or minus signs. The sign test may be conducted for (a) a single population median, (b) matched-pair data from two dependent samples, or (c) binomial data.

The sign test requires only that the sample data have been randomly selected. It is not required that the population be normally distributed.


Sign Test for the Population Median M (Small Sample ≤ 25)

If the data have been randomly selected, assign each value a (+) if greater than the hypothesized median or (–) if less than the hypothesized median.

Step 1: State the hypotheses. H0: M = M0 vs. Ha: M>, M<, or M ≠ M0

Step 2: Find the critical value and state the rejection rule. Use Table X, α, and the sample size n to identify Scrit.

Step 3: Find the test statistic Sdata.

Right-tailed test, Sdata = number of minus signs

Left-tailed test, Sdata = number of plus signs

Two-tailed test, Sdata = the smaller of the number of plus signs and the number of minus signs

Step 4: State the conclusion and the interpretation.
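The sign-counting in Step 3 can be sketched as follows. This is only an illustration: the function name `sign_test_statistic`, the sample values, and the choice to give no sign to values equal to the hypothesized median are assumptions, and the critical value Scrit still has to come from the table.

```python
def sign_test_statistic(data, m0, tail):
    """Return (plus_count, minus_count, s_data) for the sign test of H0: M = m0.

    tail is 'right', 'left', or 'two'.
    """
    plus_count = sum(1 for x in data if x > m0)
    minus_count = sum(1 for x in data if x < m0)
    # Assumption: values exactly equal to m0 receive no sign and are ignored.
    if tail == "right":
        s_data = minus_count
    elif tail == "left":
        s_data = plus_count
    elif tail == "two":
        s_data = min(plus_count, minus_count)
    else:
        raise ValueError("tail must be 'right', 'left', or 'two'")
    return plus_count, minus_count, s_data

# Made-up sample, hypothesized median M0 = 14
sample = [12, 15, 9, 20, 22, 18, 25, 11, 30, 17]
print(sign_test_statistic(sample, 14, "right"))
```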


Sign Test for the Population Median M (Large Sample > 25)

If the data have been randomly selected, assign each value a (+) if greater than the hypothesized median or (–) if less than the hypothesized median.

Step 1: State the hypotheses. H0: M = M0 vs. Ha: M>, M<, or M ≠ M0

Step 2: Find the critical value and state the rejection rule. Use Table X to find Zcrit.

Step 3: Find the test statistics Sdata and Zdata.

$$Z_{\text{data}} = \frac{(S_{\text{data}} + 0.5) - \dfrac{n}{2}}{\dfrac{\sqrt{n}}{2}}$$
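As a rough sketch of the large-sample computation, assuming the statistic Zdata = ((Sdata + 0.5) − n/2) / (√n / 2), with n taken to be the number of observations that received a sign. The function name and the numbers below are made up for illustration.

```python
import math

def sign_test_z(s_data, n):
    """Large-sample sign test statistic, with the 0.5 continuity correction."""
    return ((s_data + 0.5) - n / 2) / (math.sqrt(n) / 2)

# Hypothetical values: 10 of the less frequent sign among n = 36 signed observations
z_data = sign_test_z(10, 36)
print(z_data)
```

Here the numerator is 10.5 − 18 = −7.5 and the denominator is 6 / 2 = 3, so Zdata = −2.5, which would then be compared with Zcrit.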

+ 14.3: Wilcoxon Signed Ranks Test for Matched-Pair Data

Objectives:

Assess whether or not a data set is symmetric.

Carry out the Wilcoxon signed ranks test for matched-pair data from two dependent samples.

Perform the Wilcoxon signed ranks test for a single population median.

Assessing the Symmetry of a Data Set

In Section 2.2, we learned that a distribution is symmetric if there is an axis of symmetry that splits the image in half so one side is the mirror image of the other. In Section 3.5, we learned that a boxplot is a convenient method for assessing the symmetry of a dataset.

Boxplot Criterion for Assessing Symmetry

A data set is symmetric when its corresponding boxplot has whiskers of approximately equal length, and the median line is situated approximately in the center of the box.
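The boxplot criterion can be checked numerically by comparing the two whisker lengths and the two halves of the box. The sketch below makes several assumptions: the whiskers are taken to run to the minimum and maximum (no outlier rule), quartiles come from Python's `statistics.quantiles` with the "inclusive" method (other quartile conventions give slightly different values), and the function name is hypothetical. Deciding what counts as "approximately equal" is left to the reader.

```python
import statistics

def boxplot_symmetry_summary(data):
    """Return whisker lengths and box halves for a rough symmetry check."""
    q1, median, q3 = statistics.quantiles(data, n=4, method="inclusive")
    return {
        "lower_whisker": q1 - min(data),   # assumes whisker ends at the minimum
        "upper_whisker": max(data) - q3,   # assumes whisker ends at the maximum
        "lower_box": median - q1,
        "upper_box": q3 - median,
    }

summary = boxplot_symmetry_summary([1, 2, 3, 4, 5, 6, 7, 8, 9])
print(summary)
```

For this evenly spaced made-up data set, both whiskers and both box halves have length 2, which is what a symmetric data set looks like under the criterion.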

Wilcoxon Signed Ranks Test

In Section 14.2, we performed the sign test for both a single population median and for the population median of the difference between two dependent samples.

The Wilcoxon signed ranks test is a nonparametric hypothesis test in which the original data are transformed into their ranks. The Wilcoxon signed ranks test may be conducted for (a) a single population median, or (b) matched-pair data from two dependent samples.

To perform the test, data must be randomly selected and have a symmetric distribution. Order the observations or the absolute value of the differences from smallest to largest. Rank these values from smallest to largest, assigning the average rank to any values that are the same. Then, attach the sign of corresponding values to the ranks.
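The ranking procedure just described can be sketched as follows. The function name `signed_ranks`, the made-up differences, and the decision to drop zero differences are assumptions for illustration. T+ is the sum of the positive signed ranks and T− the sum of the negative ones.

```python
def signed_ranks(differences):
    """Rank |d| from smallest to largest (average rank for ties),
    then attach the sign of each d to its rank."""
    nonzero = [d for d in differences if d != 0]  # assumption: zeros are dropped
    order = sorted(range(len(nonzero)), key=lambda i: abs(nonzero[i]))
    ranks = [0.0] * len(nonzero)
    i = 0
    while i < len(order):
        j = i
        # Extend j across a group of tied absolute values
        while j + 1 < len(order) and abs(nonzero[order[j + 1]]) == abs(nonzero[order[i]]):
            j += 1
        average_rank = ((i + 1) + (j + 1)) / 2  # mean of positions i+1 .. j+1
        for k in range(i, j + 1):
            index = order[k]
            ranks[index] = average_rank if nonzero[index] > 0 else -average_rank
        i = j + 1
    return ranks

diffs = [3, -1, 4, -1, 5, -2]  # made-up matched-pair differences
ranks = signed_ranks(diffs)
t_plus = sum(r for r in ranks if r > 0)
t_minus = sum(r for r in ranks if r < 0)
print(ranks, t_plus, abs(t_minus))
```

The two tied values of −1 share the average rank 1.5, and T+ plus |T−| equals n(n + 1)/2 = 21, a quick check that the ranking is consistent.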


Wilcoxon Signed Ranks Test for Matched-Pair Data

(Small Sample ≤ 30)

If the data have been randomly selected and the distribution is symmetric, assign each value a signed rank.

Step 1: State the hypotheses. H0: Md = 0 vs. Ha: Md>, Md<, or Md ≠ 0

Step 2: Find the critical value and state the rejection rule. Use Table X, α, and the sample size n to identify Tcrit.

Step 3: Find the test statistic Tdata.

Right-tailed test, Tdata = |T–|

Left-tailed test, Tdata = T+

Two-tailed test, Tdata = the smaller of T+ or |T–|.



Wilcoxon Signed Ranks Test for Matched-Pair Data

(Large Sample > 30)

If the data have been randomly selected and the distribution is symmetric, assign each value a signed rank.

Step 1: State the hypotheses. H0: Md = 0 vs. Ha: Md>, Md<, or Md ≠ 0

Step 2: Find the critical value and state the rejection rule. Use Table X to identify Zcrit.

Step 3: Find the test statistic Zdata.

$$Z_{\text{data}} = \frac{T_{\text{data}} - \dfrac{n(n+1)}{4}}{\sqrt{\dfrac{n(n+1)(2n+1)}{24}}}$$
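A minimal sketch of the large-sample computation, assuming the statistic Zdata = (Tdata − n(n + 1)/4) / √(n(n + 1)(2n + 1)/24). The function name and the values Tdata = 200, n = 35 are made up for illustration.

```python
import math

def signed_ranks_z(t_data, n):
    """Large-sample Wilcoxon signed ranks statistic."""
    mean_t = n * (n + 1) / 4
    sd_t = math.sqrt(n * (n + 1) * (2 * n + 1) / 24)
    return (t_data - mean_t) / sd_t

# Hypothetical rank sum of 200 from n = 35 nonzero differences
print(round(signed_ranks_z(200, 35), 4))
```

With n = 35 the centering value is 35(36)/4 = 315, so a Tdata of 200 falls below it and produces a negative Zdata.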


We can use the same methods for the Wilcoxon signed ranks test for a single population median that we used for matched-pair data. However, there is no subtracting of sample values to find the differences.

Instead, subtract the hypothesized median from each data value and assign the signed ranks to the differences.

Null Hypothesis | Alternative Hypothesis | Type of Test
H0: M = M0 | Ha: M > M0 | Right-tailed
H0: M = M0 | Ha: M < M0 | Left-tailed
H0: M = M0 | Ha: M ≠ M0 | Two-tailed

+ 14.4: Wilcoxon Rank Sum Test for Two Independent Samples

Objective:

Perform the Wilcoxon rank sum test for the difference in population medians, using two independent samples.

Wilcoxon Rank Sum Test

In Section 14.3, we compared data from dependent samples. Recall that two samples are independent when the subjects selected for the first sample do not determine the subjects in the second sample. The two-sample t test that we learned in Section 10.2 required that either each sample be large or that each population be normally distributed.

The Wilcoxon rank sum test is a nonparametric hypothesis test in which the original data from two independent samples are transformed into their ranks. It tests whether the two population medians are equal or not.

In the Wilcoxon rank sum test, the two samples are temporarily combined, and the ranks of the combined data values are calculated. Then the ranks are summed separately for each sample.

R1 = the sum of the ranks for the first sample

R2 = the sum of the ranks for the second sample


Wilcoxon Rank Sum Test for Two Independent Samples

The requirements are: (a) two independent random samples, (b) each sample size > 10, and (c) the shapes of the distributions are the same.

Step 1: State the hypotheses. H0: M1= M2 vs. Ha: M1>, M1<, or M1 ≠ M2

Step 2: Find the critical value and state the rejection rule. See Table 14.14.



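Step 3: Find the test statistic Zdata.

$$Z_{\text{data}} = \frac{R_1 - \mu_R}{\sigma_R}$$

where

$$\mu_R = \frac{n_1(n_1 + n_2 + 1)}{2}, \qquad \sigma_R = \sqrt{\frac{n_1 n_2 (n_1 + n_2 + 1)}{12}}$$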
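The whole computation can be sketched as follows: combine the samples, rank the combined values (average rank for ties), sum the ranks of the first sample to get R1, and standardize using μR = n1(n1 + n2 + 1)/2 and σR = √(n1 n2 (n1 + n2 + 1)/12). The function names and the two made-up samples of size 11 are assumptions for illustration.

```python
import math

def average_ranks(values):
    """Ranks from smallest to largest, with tied values sharing the average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1  # mean of positions i+1 .. j+1
        i = j + 1
    return ranks

def rank_sum_z(sample1, sample2):
    """Return (R1, mu_R, sigma_R, Z_data) for the Wilcoxon rank sum test."""
    n1, n2 = len(sample1), len(sample2)
    ranks = average_ranks(list(sample1) + list(sample2))
    r1 = sum(ranks[:n1])  # ranks belonging to the first sample
    mu_r = n1 * (n1 + n2 + 1) / 2
    sigma_r = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    return r1, mu_r, sigma_r, (r1 - mu_r) / sigma_r

first = list(range(1, 22, 2))    # made-up data: 1, 3, ..., 21
second = list(range(2, 23, 2))   # made-up data: 2, 4, ..., 22
r1, mu_r, sigma_r, z_data = rank_sum_z(first, second)
print(r1, mu_r, round(z_data, 4))
```

In this made-up example the two samples interleave almost perfectly, so R1 = 121 sits close to μR = 126.5 and Zdata is near zero.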

+ 14.5: Kruskal-Wallis Test

Objective:

Perform the Kruskal-Wallis test for equal medians in three or more populations.

Kruskal-Wallis Test

In Section 14.4, we learned the Wilcoxon rank sum test, which tests whether the population medians of two independent random samples are equal. Here, we extend this method to three or more populations.

The Kruskal-Wallis test is a nonparametric hypothesis test in which the original data from three or more independent samples are transformed into their ranks. It tests whether the population medians are all equal.

Like the Wilcoxon rank sum test, the samples are temporarily combined, and the ranks of the combined data values are calculated. Then the ranks are summed separately for each sample.

R1 = the sum of the ranks for the first sample

R2 = the sum of the ranks for the second sample, and so on...

Rk = the sum of the ranks for the last sample


Kruskal-Wallis Test

The requirements are (a) k ≥ 3 independent random samples and (b) each sample size > 5.

Step 1: State the hypotheses. H0: The population medians are all equal vs. Ha: Not all population medians are equal.

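Step 2: Find the χ² critical value and state the rejection rule. Use Table X, α, and k – 1 degrees of freedom.

Step 3: Find the test statistic χ²data.

$$\chi^2_{\text{data}} = \frac{12}{N(N+1)}\left(\frac{R_1^2}{n_1} + \frac{R_2^2}{n_2} + \cdots + \frac{R_k^2}{n_k}\right) - 3(N+1)$$

where N = n1 + n2 + ... + nk is the total number of observations.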
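A minimal sketch of the Kruskal-Wallis computation, assuming the statistic χ²data = 12/(N(N + 1)) · (R1²/n1 + ... + Rk²/nk) − 3(N + 1), where N is the total number of observations. The function names and the three made-up samples (chosen with no ties, so simple ranking suffices) are assumptions for illustration.

```python
def kruskal_wallis_statistic(samples):
    """Return (rank_sums, chi2_data) for k independent samples.

    Assumes no tied values across the combined data, so each value's
    rank is simply its position in the sorted combined list.
    """
    combined = sorted(value for sample in samples for value in sample)
    rank_of = {value: position for position, value in enumerate(combined, start=1)}
    total_n = len(combined)
    rank_sums = [sum(rank_of[value] for value in sample) for sample in samples]
    weighted = sum(r ** 2 / len(sample) for r, sample in zip(rank_sums, samples))
    chi2_data = 12 / (total_n * (total_n + 1)) * weighted - 3 * (total_n + 1)
    return rank_sums, chi2_data

groups = [
    [1, 4, 7, 10, 13, 16],   # made-up sample 1
    [2, 5, 8, 11, 14, 17],   # made-up sample 2
    [3, 6, 9, 12, 15, 18],   # made-up sample 3
]
rank_sums, chi2_data = kruskal_wallis_statistic(groups)
print(rank_sums, round(chi2_data, 4))
```

The rank sums 51, 57, and 63 are close together, so the statistic is small (8/19, about 0.42) and would be compared with the χ² critical value with k – 1 = 2 degrees of freedom.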

+ 14.6: Rank Correlation Test

Objective:

Perform the rank correlation test for paired data.

Rank Correlation Test

In Chapter 4, we learned how to calculate the correlation coefficient, which measures the strength of linear association between two variables. Here, we will learn how to calculate the rank correlation of two variables, which is the correlation of the variables based on ranks.

The rank correlation test (Spearman’s rank correlation test) is based on the ranks of matched-pair data. This test may also be applied when the original data are ranks. In the rank correlation test, we investigate whether two variables are related by analyzing the ranks of matched-pair data. The rank correlation test may also be used to detect a nonlinear relationship between two variables.

The hypotheses for the rank correlation test are:

H0: There is no rank correlation between the two variables.

Ha: There is a rank correlation between the two variables.

To find the test statistic, we must calculate and square the paired differences of the ranks.

Rank Correlation Test

Rank Correlation Test (Small Sample ≤ 30)

The sample data must be randomly selected.

Step 1: State the hypotheses.

Step 2: Find the rcrit critical value and state the rejection rule. Use Table X, α, and sample size n.

Step 3: Find the test statistic rdata.

Rank the values of each variable from lowest to highest.

Find the difference d in ranks for each subject, square the differences, and add them up.

$$r_{\text{data}} = 1 - \frac{6\sum d^2}{n(n^2 - 1)}$$
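The steps above can be sketched as follows, assuming the statistic rdata = 1 − 6Σd² / (n(n² − 1)) and data with no tied values within either variable. The function names and the five made-up data pairs are assumptions for illustration.

```python
def rank_no_ties(values):
    """Rank values from lowest (1) to highest (n); assumes no ties."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0] * len(values)
    for position, index in enumerate(order, start=1):
        ranks[index] = position
    return ranks

def rank_correlation(x, y):
    """Rank correlation computed from the squared rank differences."""
    n = len(x)
    rank_x, rank_y = rank_no_ties(x), rank_no_ties(y)
    sum_d_squared = sum((a - b) ** 2 for a, b in zip(rank_x, rank_y))
    return 1 - 6 * sum_d_squared / (n * (n ** 2 - 1))

x = [10, 20, 30, 40, 50]   # made-up paired data
y = [1, 3, 2, 5, 4]
r_data = rank_correlation(x, y)
print(r_data)
```

Here the rank differences are 0, −1, 1, −1, 1, so Σd² = 4 and rdata = 1 − 24/120 = 0.8.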

Rank Correlation Test

Rank Correlation Test (Large Sample > 30)

The sample data must be randomly selected.

Step 1: State the hypotheses.

Step 2: Find the Zcrit critical value and state the rejection rule. Use Table 14.24.


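Step 3: Find the test statistic Zdata.

Rank the values of each variable from lowest to highest.

Find the difference d in ranks for each subject, square the differences, and add them up.

$$r_{\text{data}} = 1 - \frac{6\sum d^2}{n(n^2 - 1)}, \qquad Z_{\text{data}} = r_{\text{data}}\sqrt{n - 1}$$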
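For the large-sample case, a rough sketch that assumes the usual normal approximation, in which the rank correlation rdata is scaled as Zdata = rdata · √(n − 1). The function name and the values rdata = 0.5, n = 37 are made up for illustration.

```python
import math

def rank_correlation_z(r_data, n):
    """Large-sample statistic, assuming Z = r * sqrt(n - 1)."""
    return r_data * math.sqrt(n - 1)

# Hypothetical rank correlation of 0.5 from n = 37 pairs
print(rank_correlation_z(0.5, 37))
```

With these made-up values, Zdata = 0.5 · 6 = 3.0, which would then be compared with Zcrit.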

+ 14.7: Runs Test for Randomness

Objective:

Perform the runs test for randomness.

Runs Test for Randomness

Recall from Chapter 13 that one of the assumptions for the linear regression model was that the values y were independent. Here we learn a test for checking this assumption.

The runs test for randomness helps us determine whether the data in a sequence are random or if there is a pattern. The test applies to data that have two possible outcomes or data that can be re-expressed as one of two outcomes. The test works by counting the number of runs in the data set.

A sequence is an ordered set of data. A run is a sequence of observations sharing the same value (of two possible values), preceded or followed by data having the other possible value or by no data at all. The runs test for randomness tests whether the data in a sequence are random or whether there is a pattern in the sequence.

Runs Test for Randomness

The notation for the runs test for randomness is as follows:

n1 = the number of observations having the first outcome

n2 = the number of observations having the second outcome

n = the total number of observations

G = the number of runs in the sequence
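Counting these quantities for a sequence can be sketched as follows. The function name `runs_summary`, the made-up coin-flip sequence, and the convention that the "first outcome" is whichever value sorts first are assumptions for illustration.

```python
from itertools import groupby

def runs_summary(sequence):
    """Return n1, n2, n, and the number of runs G for a two-outcome sequence."""
    outcomes = sorted(set(sequence))
    if len(outcomes) != 2:
        raise ValueError("sequence must contain exactly two distinct outcomes")
    first_outcome = outcomes[0]  # assumption: 'first' means first in sort order
    n1 = sum(1 for x in sequence if x == first_outcome)
    n2 = len(sequence) - n1
    # groupby yields one group per maximal block of identical neighbors, i.e. per run
    g = sum(1 for _ in groupby(sequence))
    return {"n1": n1, "n2": n2, "n": len(sequence), "G": g}

flips = "HHTHTTTHHT"  # made-up sequence of heads and tails
print(runs_summary(flips))
```

The runs in this sequence are HH, T, H, TTT, HH, T, so G = 6, with n1 = 5 heads and n2 = 5 tails.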


Runs Test for Randomness (Small Samples n1 and n2 ≤ 20)

There are two conditions: (a) the data are ordered, and (b) each data value represents one of two distinct outcomes.

Step 1: State the hypotheses. H0: The sequence of data is random vs. Ha: The sequence of data is not random.

Step 2: Find the Gcrit critical value and state the rejection rule. Use Table X, α = 0.05, row n1, and column n2.

Step 3: Find the test statistic Gdata.

$$G_{\text{data}} = G$$


Runs Test for Randomness (Large Samples n1 or n2 > 20)

There are two conditions: (a) the data are ordered, and (b) each data value represents one of two distinct outcomes.

Step 1: State the hypotheses. H0: The sequence of data is random vs. Ha: The sequence of data is not random.

Step 2: Find the Zcrit critical value and state the rejection rule. Use Table 14.27.

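Step 3: Find the test statistic Zdata.

$$Z_{\text{data}} = \frac{G - \mu_G}{\sigma_G}$$

where

$$\mu_G = \frac{2 n_1 n_2}{n_1 + n_2} + 1, \qquad \sigma_G = \sqrt{\frac{(2 n_1 n_2)(2 n_1 n_2 - n_1 - n_2)}{(n_1 + n_2)^2 (n_1 + n_2 - 1)}}$$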
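A minimal sketch of the large-sample computation, assuming the statistic Zdata = (G − μG)/σG with μG = 2n1n2/(n1 + n2) + 1 and σG = √((2n1n2)(2n1n2 − n1 − n2) / ((n1 + n2)²(n1 + n2 − 1))). The function name and the values G = 20, n1 = n2 = 25 are made up for illustration.

```python
import math

def runs_test_z(g, n1, n2):
    """Return (mu_G, sigma_G, Z_data) for the large-sample runs test."""
    product = 2 * n1 * n2
    total = n1 + n2
    mu_g = product / total + 1
    sigma_g = math.sqrt(product * (product - total) / (total ** 2 * (total - 1)))
    return mu_g, sigma_g, (g - mu_g) / sigma_g

# Hypothetical sequence with 25 of each outcome and 20 runs
mu_g, sigma_g, z_data = runs_test_z(20, 25, 25)
print(mu_g, round(sigma_g, 4), round(z_data, 4))
```

With these made-up values the expected number of runs is μG = 26, so observing only 20 runs gives a negative Zdata, pointing toward fewer runs than randomness would typically produce.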
