chapter 14: nonparametric statistics

Chapter 14: Nonparametric Statistics. Lecture PowerPoint Slides. Discovering Statistics, 2nd Edition, Daniel T. Larose

Upload: damon-hansen

Post on 31-Dec-2015


DESCRIPTION

Chapter 14: Nonparametric Statistics. Lecture PowerPoint Slides. Chapter 14 Overview. 14.1 Introduction to Nonparametric Statistics 14.2 Sign Test 14.3 Wilcoxon Signed Ranks Test for Matched-Pairs Data 14.4 Wilcoxon Rank Sum Test for Two Independent Samples 14.5 Kruskal-Wallis Test

TRANSCRIPT

Page 1: Chapter 14: Nonparametric Statistics

Chapter 14: Nonparametric Statistics

Lecture PowerPoint Slides

Discovering Statistics

2nd Edition Daniel T. Larose

Page 2: Chapter 14: Nonparametric Statistics

Chapter 14 Overview

14.1 Introduction to Nonparametric Statistics

14.2 Sign Test

14.3 Wilcoxon Signed Ranks Test for Matched-Pairs Data

14.4 Wilcoxon Rank Sum Test for Two Independent Samples

14.5 Kruskal-Wallis Test

14.6 Rank Correlation Test

14.7 Runs Test for Randomness


Page 3: Chapter 14: Nonparametric Statistics

The Big Picture

Where we are coming from and where we are headed…

In earlier chapters, we learned how to perform hypothesis tests for population parameters, such as the mean µ or proportion p.

Here in Chapter 14 we learn about a family of hypothesis tests, called nonparametric hypothesis tests, whose conditions are similar to those in earlier chapters but less stringent.

Congratulations on getting this far in your discovery of the field of statistics! Best of luck in the future!


Page 4: Chapter 14: Nonparametric Statistics

14.1: Introduction to Nonparametric Statistics

Objectives:

Explain what a nonparametric hypothesis test is and why we use it.

Describe what is meant by the efficiency of a nonparametric test.


Page 5: Chapter 14: Nonparametric Statistics

Nonparametric Hypothesis Tests

In Chapters 9–13, we learned how to perform hypothesis tests for population parameters, such as µ and p. To perform each of these parametric hypothesis tests, certain conditions need to be satisfied.

Parametric hypothesis tests are used to test claims about a population parameter, such as the population mean µ or proportion p. Often, parametric tests require that the population follow a particular distribution, such as the normal distribution.

Nonparametric hypothesis tests, also called distribution-free hypothesis tests, generally have fewer required conditions. In particular, nonparametric tests do not require the population to follow a particular distribution, such as the normal distribution.

Page 6: Chapter 14: Nonparametric Statistics

Nonparametric Hypothesis Tests

Advantages of Nonparametric Hypothesis Tests

1. May be used on a greater variety of data because they require fewer conditions than their parametric counterparts.

2. Can be applied to categorical (qualitative) data.

3. Manual computations tend to be easier than their parametric counterparts.

Disadvantages of Nonparametric Hypothesis Tests

1. Less efficient than parametric tests: they generally need a larger sample size to reach the same result, such as correctly rejecting the null hypothesis.

2. Replace actual data values with either signs or ranks, so the information carried by the actual data values is discarded.

3. Technology often does not have dedicated procedures for performing these tests.

Page 7: Chapter 14: Nonparametric Statistics

Nonparametric Hypothesis Tests

In general, parametric tests are more efficient than corresponding nonparametric tests. The efficiency of a nonparametric test is used to compare it with its corresponding parametric test.

The efficiency of a nonparametric hypothesis test is defined as the ratio of the sample size required for the corresponding parametric test to the sample size required for the nonparametric test, in order to achieve the same result (such as correctly rejecting the null hypothesis). The efficiency ratings are reported on the assumption that required conditions for both the parametric and nonparametric tests have been met.

Section | Situation                   | Parametric Test    | Nonparametric Test         | Efficiency
14.2    | Matched pairs               | t or Z test        | Sign test                  | 0.63
14.3    | Matched pairs               | t or Z test        | Wilcoxon signed ranks test | 0.95
14.4    | Two independent samples     | t or Z test        | Wilcoxon rank sum test     | 0.95
14.5    | Several independent samples | ANOVA              | Kruskal-Wallis test        | 0.95
14.6    | Correlation                 | Linear correlation | Rank correlation test      | 0.91
14.7    | Randomness                  | None               | Runs test                  | --
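As a rough illustration of how to read an efficiency rating, the sketch below applies the definition of efficiency as a ratio of sample sizes. The function name `equivalent_parametric_n` is hypothetical and not part of the slides.

```python
import math


def equivalent_parametric_n(nonparametric_n: int, efficiency: float) -> float:
    """Sample size the parametric test would need to match a nonparametric
    test run on `nonparametric_n` observations.

    Uses efficiency = (parametric sample size) / (nonparametric sample size).
    """
    return efficiency * nonparametric_n


# Sign test (efficiency 0.63) versus Wilcoxon signed ranks test (0.95),
# each run on 100 observations.
for name, eff in [("sign test", 0.63), ("Wilcoxon signed ranks test", 0.95)]:
    n_param = equivalent_parametric_n(100, eff)
    print(f"{name}: 100 observations ~ {math.ceil(n_param)} for the t or Z test")
```

Read this way, a rating close to 1 means the nonparametric test gives up very little relative to its parametric counterpart when the conditions for both are met.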

Page 8: Chapter 14: Nonparametric Statistics

14.2: Sign Test

Objectives:

Perform the sign test for a single population median.

Carry out the sign test for matched-pair data from two dependent samples.

Perform the sign test for binomial data.


Page 9: Chapter 14: Nonparametric Statistics


Sign Test for a Population Median

In Section 9.4, we learned how to perform the one-sample t test for the population mean µ. This is a parametric test requiring either a normal population or large sample. What do we do when we have neither? We use the sign test for the population median.

The sign test is a nonparametric hypothesis test in which the original data are transformed into plus or minus signs. The sign test may be conducted for (a) a single population median, (b) matched-pair data from two dependent samples, or (c) binomial data.

The sign test requires only that the sample data have been randomly selected. It is not required that the population be normally distributed.

Page 10: Chapter 14: Nonparametric Statistics


Sign Test for a Population Median

Sign Test for the Population Median M (Small Sample ≤ 25)

If the data have been randomly selected, assign each value a (+) if greater than the hypothesized median or (–) if less than the hypothesized median.

Step 1: State the hypotheses. H0: M = M0 vs. Ha: M > M0, M < M0, or M ≠ M0

Step 2: Find the critical value and state the rejection rule. Use Table X, α, and the sample size n to identify Scrit.

Step 3: Find the test statistic Sdata.

Right-tailed test: Sdata = number of minus signs

Left-tailed test: Sdata = number of plus signs

Two-tailed test: Sdata = the smaller of the number of plus signs and the number of minus signs

Step 4: State the conclusion and the interpretation.
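The sign assignment and the choice of Sdata in Step 3 can be sketched as follows. This is a minimal sketch, not the textbook's procedure: the data are made up, values equal to the hypothesized median are dropped (an assumption the slides do not state), and an exact Binomial(n, 0.5) tail probability stands in for the table lookup of Scrit.

```python
from math import comb


def sign_test_statistic(data, m0, tail):
    """Return (S_data, n) for the sign test of H0: M = m0.

    tail is "right", "left", or "two". Values equal to m0 get no sign
    and are dropped (an assumption, not stated on the slide).
    """
    plus = sum(1 for x in data if x > m0)
    minus = sum(1 for x in data if x < m0)
    if tail == "right":
        s_data = minus
    elif tail == "left":
        s_data = plus
    elif tail == "two":
        s_data = min(plus, minus)
    else:
        raise ValueError("tail must be 'right', 'left', or 'two'")
    return s_data, plus + minus


def binomial_lower_tail(s, n):
    """P(X <= s) for X ~ Binomial(n, 0.5); used here instead of a table."""
    return sum(comb(n, k) for k in range(s + 1)) / 2 ** n


# Hypothetical sample; Ha: M > 13 (right-tailed).
sample = [12, 15, 9, 20, 18, 22, 17, 14, 25, 19]
s_data, n = sign_test_statistic(sample, 13, "right")
print(s_data, n, binomial_lower_tail(s_data, n))
```

A small Sdata is evidence against H0 in every case, which is why each tail counts the sign that should be rare if the alternative is true.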

Page 11: Chapter 14: Nonparametric Statistics


Sign Test for a Population Median

Sign Test for the Population Median M (Large Sample > 25)

If the data have been randomly selected, assign each value a (+) if greater than the hypothesized median or (–) if less than the hypothesized median.

Step 1: State the hypotheses. H0: M = M0 vs. Ha: M > M0, M < M0, or M ≠ M0

Step 2: Find the critical value and state the rejection rule. Use Table X to find Zcrit.

Step 3: Find the test statistics Sdata and Zdata.

$$ Z_{\text{data}} = \frac{(S_{\text{data}} + 0.5) - \dfrac{n}{2}}{\dfrac{\sqrt{n}}{2}} $$

Step 4: State the conclusion and the interpretation.
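For the large-sample case, Zdata in Step 3 is a continuity-corrected standardization of Sdata. A minimal sketch, with a hypothetical function name and made-up inputs:

```python
from math import sqrt


def sign_test_z(s_data: int, n: int) -> float:
    """Large-sample sign test statistic.

    Z_data = ((S_data + 0.5) - n/2) / (sqrt(n)/2), where n/2 and sqrt(n)/2
    are the mean and standard deviation of a Binomial(n, 0.5) count.
    """
    return ((s_data + 0.5) - n / 2) / (sqrt(n) / 2)


# Hypothetical values: 36 signed observations, S_data = 10.
print(sign_test_z(10, 36))
```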

Page 12: Chapter 14: Nonparametric Statistics

14.3: Wilcoxon Signed Ranks Test for Matched-Pair Data

Objectives:

Assess whether or not a data set is symmetric.

Carry out the Wilcoxon signed ranks test for matched-pair data from two dependent samples.

Perform the Wilcoxon signed ranks test for a single population median.


Page 13: Chapter 14: Nonparametric Statistics


Assessing the Symmetry of a Data Set

In Section 2.2, we learned that a distribution is symmetric if there is an axis of symmetry that splits the image in half so one side is the mirror image of the other. In Section 3.5, we learned that a boxplot is a convenient method for assessing the symmetry of a dataset.

Boxplot Criterion for Assessing Symmetry

A data set is symmetric when its corresponding boxplot has whiskers of approximately equal length, and the median line is situated approximately in the center of the box.

Page 14: Chapter 14: Nonparametric Statistics


Wilcoxon Signed Ranks Test

In Section 14.2, we performed the sign test for both a single population median and for the population median of the difference between two dependent samples.

The Wilcoxon signed ranks test is a nonparametric hypothesis test in which the original data are transformed into their ranks. The Wilcoxon signed ranks test may be conducted for (a) a single population median, or (b) matched-pair data from two dependent samples.

To perform the test, the data must be randomly selected and have a symmetric distribution. Take the absolute values of the differences (for a single median, the differences between each observation and the hypothesized median) and rank them from smallest to largest, assigning tied values the average of the ranks they would occupy. Then attach to each rank the sign of its corresponding difference.

Page 15: Chapter 14: Nonparametric Statistics


Wilcoxon Signed Ranks Test

Wilcoxon Signed Ranks Test for Matched-Pair Data

(Small Sample ≤ 30)

If the data have been randomly selected and the distribution is symmetric, assign each value a signed rank.

Step 1: State the hypotheses. H0: Md = 0 vs. Ha: Md > 0, Md < 0, or Md ≠ 0

Step 2: Find the critical value and state the rejection rule. Use Table X, α, and the sample size n to identify Tcrit.

Step 3: Find the test statistic Tdata.

Right-tailed test: Tdata = |T–|

Left-tailed test: Tdata = T+

Two-tailed test: Tdata = the smaller of T+ or |T–|

Step 4: State the conclusion and the interpretation.
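The signed-rank assignment and the choice of Tdata can be sketched as below. The function names and the differences are hypothetical, and zero differences are dropped before ranking, which is an assumption the slides do not state.

```python
def signed_ranks(differences):
    """Rank |d| from smallest to largest (average rank for ties) and
    attach the sign of each d. Zero differences are dropped (assumption)."""
    d = [x for x in differences if x != 0]
    order = sorted(range(len(d)), key=lambda i: abs(d[i]))
    ranks = [0.0] * len(d)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the block of tied absolute values.
        while j + 1 < len(order) and abs(d[order[j + 1]]) == abs(d[order[i]]):
            j += 1
        average = (i + j + 2) / 2  # mean of 1-based ranks i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = average
        i = j + 1
    return [r if x > 0 else -r for x, r in zip(d, ranks)]


def t_data(signed, tail):
    """T_data for a 'right', 'left', or 'two' tailed test."""
    t_plus = sum(r for r in signed if r > 0)
    t_minus_abs = -sum(r for r in signed if r < 0)
    if tail == "right":
        return t_minus_abs
    if tail == "left":
        return t_plus
    if tail == "two":
        return min(t_plus, t_minus_abs)
    raise ValueError("tail must be 'right', 'left', or 'two'")


# Hypothetical matched-pair differences.
diffs = [3, -1, 4, -1, 5, 0, -2]
ranks = signed_ranks(diffs)
print(ranks, t_data(ranks, "two"))
```

A quick check on any result: T+ and |T–| must add up to n(n + 1)/2, where n is the number of nonzero differences.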

Page 16: Chapter 14: Nonparametric Statistics


Wilcoxon Signed Ranks Test

Wilcoxon Signed Ranks Test for Matched-Pair Data

(Large Sample > 30)

If the data have been randomly selected and the distribution is symmetric, assign each value a signed rank.

Step 1: State the hypotheses. H0: Md = 0 vs. Ha: Md > 0, Md < 0, or Md ≠ 0

Step 2: Find the critical value and state the rejection rule. Use Table X to identify Zcrit.

Step 3: Find the test statistic Zdata.

$$ Z_{\text{data}} = \frac{T_{\text{data}} - \dfrac{n(n+1)}{4}}{\sqrt{\dfrac{n(n+1)(2n+1)}{24}}} $$

Step 4: State the conclusion and the interpretation.
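The large-sample Zdata in Step 3 standardizes Tdata by its mean and standard deviation under H0. A minimal sketch with a hypothetical function name and made-up inputs:

```python
from math import sqrt


def wilcoxon_signed_rank_z(t_data: float, n: int) -> float:
    """Large-sample Wilcoxon signed ranks statistic.

    Under H0, T has mean n(n+1)/4 and variance n(n+1)(2n+1)/24.
    """
    mean_t = n * (n + 1) / 4
    sd_t = sqrt(n * (n + 1) * (2 * n + 1) / 24)
    return (t_data - mean_t) / sd_t


# Hypothetical values: n = 32 nonzero differences, T_data = 150.
print(wilcoxon_signed_rank_z(150, 32))
```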

Page 17: Chapter 14: Nonparametric Statistics


Wilcoxon Signed Ranks Test

We can use the same methods for the Wilcoxon signed ranks test for a single population median that we used for matched-pair data. However, there is no subtracting of sample values to find the differences.

Instead, subtract the hypothesized median from each data value and assign the signed ranks to the differences.

Null Hypothesis | Alternative Hypothesis | Type of Test
H0: M = M0      | Ha: M > M0             | Right-tailed
H0: M = M0      | Ha: M < M0             | Left-tailed
H0: M = M0      | Ha: M ≠ M0             | Two-tailed

Page 18: Chapter 14: Nonparametric Statistics

14.4: Wilcoxon Rank Sum Test for Two Independent Samples

Objective:

Perform the Wilcoxon rank sum test for the difference in population medians, using two independent samples.


Page 19: Chapter 14: Nonparametric Statistics


Wilcoxon Rank Sum Test

In Section 14.3, we compared data from dependent samples. Recall that two samples are independent when the subjects selected for the first sample do not determine the subjects in the second sample. The two-sample t test that we learned in Section 10.2 required that either each sample be large or that each population be normally distributed.

The Wilcoxon rank sum test is a nonparametric hypothesis test in which the original data from two independent samples are transformed into their ranks. It tests whether the two population medians are equal or not.

In the Wilcoxon rank sum test, the two samples are temporarily combined, and the ranks of the combined data values are calculated. Then the ranks are summed separately for each sample.

R1 = the sum of the ranks for the first sample

R2 = the sum of the ranks for the second sample

Page 20: Chapter 14: Nonparametric Statistics


Wilcoxon Rank Sum Test

Wilcoxon Rank Sum Test for Two Independent Samples

The requirements are: (a) two independent random samples, (b) each sample size > 10, and (c) the shapes of the distributions are the same.

Step 1: State the hypotheses. H0: M1 = M2 vs. Ha: M1 > M2, M1 < M2, or M1 ≠ M2

Step 2: Find the critical value and state the rejection rule. See Table 14.14.

Step 3: Find the test statistic Zdata.

$$ Z_{\text{data}} = \frac{R_1 - \mu_R}{\sigma_R}, \qquad \mu_R = \frac{n_1(n_1 + n_2 + 1)}{2}, \qquad \sigma_R = \sqrt{\frac{n_1 n_2 (n_1 + n_2 + 1)}{12}} $$

Step 4: State the conclusion and the interpretation.
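The combine, rank, and sum procedure and the Zdata in Step 3 can be sketched as follows. The helper names and the two samples are hypothetical; tied values receive the average rank, as in the signed ranks test.

```python
from math import sqrt


def average_ranks(values):
    """1-based ranks of `values`, with ties given the average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average = (i + j + 2) / 2
        for k in range(i, j + 1):
            ranks[order[k]] = average
        i = j + 1
    return ranks


def rank_sum_z(sample1, sample2):
    """Return (R1, mu_R, sigma_R, Z_data) for the Wilcoxon rank sum test."""
    n1, n2 = len(sample1), len(sample2)
    ranks = average_ranks(list(sample1) + list(sample2))  # combine, then rank
    r1 = sum(ranks[:n1])                                  # ranks of sample 1
    mu_r = n1 * (n1 + n2 + 1) / 2
    sigma_r = sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    return r1, mu_r, sigma_r, (r1 - mu_r) / sigma_r


# Hypothetical samples, each with more than 10 values.
first = list(range(1, 22, 2))    # 11 odd numbers
second = list(range(2, 25, 2))   # 12 even numbers
print(rank_sum_z(first, second))
```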

Page 21: Chapter 14: Nonparametric Statistics

14.5: Kruskal-Wallis Test

Objective:

Perform the Kruskal-Wallis test for equal medians in three or more populations.


Page 22: Chapter 14: Nonparametric Statistics


Kruskal-Wallis Test

In Section 14.4, we learned the Wilcoxon rank sum test, which tests whether the population medians of two independent random samples are equal. Here, we extend this method to three or more populations.

The Kruskal-Wallis test is a nonparametric hypothesis test in which the original data from three or more independent samples are transformed into their ranks. It tests whether the population medians are all equal.

As in the Wilcoxon rank sum test, the samples are temporarily combined, and the ranks of the combined data values are calculated. Then the ranks are summed separately for each sample.

R1 = the sum of the ranks for the first sample

R2 = the sum of the ranks for the second sample, and so on

Rk = the sum of the ranks for the last sample

Page 23: Chapter 14: Nonparametric Statistics

Kruskal-Wallis Test

Kruskal-Wallis Test for Three or More Independent Samples

The requirements are (a) k ≥ 3 independent random samples and (b) each sample size > 5.

Step 1: State the hypotheses. H0: The population medians are all equal vs. Ha: Not all population medians are equal.

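Step 2: Find the χ² critical value and state the rejection rule. Use Table X, α, and k – 1 degrees of freedom.

Step 3: Find the test statistic χ²data, where N is the total number of observations in all k samples and ni is the size of sample i.

$$ \chi^2_{\text{data}} = \frac{12}{N(N+1)}\left(\frac{R_1^2}{n_1} + \frac{R_2^2}{n_2} + \cdots + \frac{R_k^2}{n_k}\right) - 3(N+1) $$

Step 4: State the conclusion and the interpretation.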
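The test statistic in Step 3 can be sketched in the same combine, rank, and sum style. The helper names and the three samples are hypothetical, and no correction for ties is applied beyond average ranks.

```python
def average_ranks(values):
    """1-based ranks of `values`, with ties given the average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average = (i + j + 2) / 2
        for k in range(i, j + 1):
            ranks[order[k]] = average
        i = j + 1
    return ranks


def kruskal_wallis_statistic(samples):
    """chi^2_data = 12 / (N(N+1)) * sum(R_i^2 / n_i) - 3(N+1)."""
    combined = [x for sample in samples for x in sample]
    ranks = average_ranks(combined)
    total_n = len(combined)
    weighted = 0.0
    start = 0
    for sample in samples:
        rank_sum = sum(ranks[start:start + len(sample)])  # R_i
        weighted += rank_sum ** 2 / len(sample)
        start += len(sample)
    return 12 / (total_n * (total_n + 1)) * weighted - 3 * (total_n + 1)


# Three hypothetical samples, each with more than 5 values.
groups = [
    [1, 4, 7, 10, 13, 16],
    [2, 5, 8, 11, 14, 17],
    [3, 6, 9, 12, 15, 18],
]
print(kruskal_wallis_statistic(groups))
```

The result is compared with the χ² critical value for k – 1 degrees of freedom, here 2.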

Page 24: Chapter 14: Nonparametric Statistics

14.6: Rank Correlation Test

Objective:

Perform the rank correlation test for paired data.


Page 25: Chapter 14: Nonparametric Statistics

Rank Correlation Test

In Chapter 4, we learned how to calculate the correlation coefficient, which measures the strength of linear association between two variables. Here, we will learn how to calculate the rank correlation of two variables, which is the correlation of the variables based on ranks.

The rank correlation test (Spearman’s rank correlation test) is based on the ranks of matched-pair data. This test may also be applied when the original data are ranks. In the rank correlation test, we investigate whether two variables are related by analyzing the ranks of matched-pair data. The rank correlation test may also be used to detect a nonlinear relationship between two variables.

The hypotheses for the rank correlation test are:

H0: There is no rank correlation between the two variables.

Ha: There is a rank correlation between the two variables.

To find the test statistic, we must calculate and square the paired differences of the ranks.

Page 26: Chapter 14: Nonparametric Statistics


Rank Correlation Test

Rank Correlation Test (Small Sample ≤ 30)

The sample data must be randomly selected.

Step 1: State the hypotheses.

Step 2: Find the rcrit critical value and state the rejection rule. Use Table X, α, and sample size n.

Step 3: Find the test statistic rdata.

Rank the values of each variable from lowest to highest.

Find the difference in ranks for each subject, square the differences, and add them up.

$$ r_{\text{data}} = 1 - \frac{6\sum d^2}{n(n^2 - 1)} $$

Step 4: State the conclusion and the interpretation.
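The ranking and squared-difference calculation in Step 3 can be sketched as below. The helper names and the paired data are hypothetical. The shortcut formula used here matches the ordinary correlation of the ranks exactly only when there are no ties.

```python
def average_ranks(values):
    """1-based ranks of `values`, with ties given the average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average = (i + j + 2) / 2
        for k in range(i, j + 1):
            ranks[order[k]] = average
        i = j + 1
    return ranks


def rank_correlation(x, y):
    """r_data = 1 - 6 * sum(d^2) / (n (n^2 - 1)), d = difference in ranks."""
    if len(x) != len(y):
        raise ValueError("x and y must be paired")
    n = len(x)
    rank_x, rank_y = average_ranks(x), average_ranks(y)
    sum_d_squared = sum((a - b) ** 2 for a, b in zip(rank_x, rank_y))
    return 1 - 6 * sum_d_squared / (n * (n * n - 1))


# Hypothetical paired data for five subjects.
x = [10, 20, 30, 40, 50]
y = [1, 3, 2, 5, 4]
print(rank_correlation(x, y))
```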

Page 27: Chapter 14: Nonparametric Statistics


Rank Correlation Test

Rank Correlation Test (Large Sample > 30)

The sample data must be randomly selected.

Step 1: State the hypotheses.

Step 2: Find the Zcrit critical value and state the rejection rule. Use Table 14.24.

Step 3: Find the test statistic Zdata.

Rank the values of each variable from lowest to highest.

Find the difference in ranks for each subject, square the differences, and add them up.

$$ Z_{\text{data}} = 1 - \frac{6\sum d^2}{n(n^2 - 1)} $$

Step 4: State the conclusion and the interpretation.

Page 28: Chapter 14: Nonparametric Statistics

14.7: Runs Test for Randomness

Objective:

Perform the runs test for randomness.


Page 29: Chapter 14: Nonparametric Statistics

Runs Test for Randomness

Recall from Chapter 13 that one of the assumptions for the linear regression model was that the values y were independent. Here we learn a test for checking this assumption.

The runs test for randomness helps us determine whether the data in a sequence are random or if there is a pattern. The test applies to data that have two possible outcomes or data that can be re-expressed as one of two outcomes. The test works by counting the number of runs in the data set.

A sequence is an ordered set of data. A run is a sequence of observations sharing the same value (of two possible values), preceded or followed by data having the other possible value or by no data at all. The runs test for randomness tests whether the data in a sequence are random or whether there is a pattern in the sequence.

Page 30: Chapter 14: Nonparametric Statistics


Runs Test for Randomness

The notation for the runs test for randomness is as follows:

n1 = the number of observations having the first outcome

n2 = the number of observations having the second outcome

n = the total number of observations

G = the number of runs in the sequence

Page 31: Chapter 14: Nonparametric Statistics


Runs Test for Randomness

Runs Test for Randomness (Small Samples n1 and n2 ≤ 20)

There are two conditions: (a) the data are ordered, and (b) each data value represents one of two distinct outcomes.

Step 1: State the hypotheses. H0: The sequence of data is random vs. Ha: The sequence of data is not random.

Step 2: Find the Gcrit critical value and state the rejection rule. Use Table X, α = 0.05, row n1, and column n2.

Step 3: Find the test statistic Gdata.

$$ G_{\text{data}} = G $$

Step 4: State the conclusion and the interpretation.
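Counting runs is the only computation the small-sample test needs. A minimal sketch, with hypothetical function names and a made-up two-outcome sequence:

```python
from collections import Counter


def count_runs(sequence):
    """G = number of runs: one run to start, plus one for every change
    of value between neighbouring observations."""
    items = list(sequence)
    if not items:
        return 0
    return 1 + sum(1 for prev, cur in zip(items, items[1:]) if prev != cur)


def outcome_counts(sequence):
    """Return the count of each of the two outcomes, i.e. n1 and n2."""
    counts = Counter(sequence)
    if len(counts) != 2:
        raise ValueError("the runs test needs exactly two distinct outcomes")
    return counts


# Hypothetical ordered sequence with outcomes M and F.
seq = "MMFFFMFFMM"
print(count_runs(seq), dict(outcome_counts(seq)))
```

In this sequence the runs are MM, FFF, M, FF, MM, so G = 5 with n1 = n2 = 5.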

Page 32: Chapter 14: Nonparametric Statistics


Runs Test for Randomness

Runs Test for Randomness (Large Samples n1 or n2 > 20)

There are two conditions: (a) the data are ordered, and (b) each data value represents one of two distinct outcomes.

Step 1: State the hypotheses. H0: The sequence of data is random vs. Ha: The sequence of data is not random.

Step 2: Find the Zcrit critical value and state the rejection rule. Use Table 14.27.

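Step 3: Find the test statistic Zdata.

$$ Z_{\text{data}} = \frac{G - \mu_G}{\sigma_G}, \qquad \mu_G = \frac{2 n_1 n_2}{n_1 + n_2} + 1, \qquad \sigma_G = \sqrt{\frac{(2 n_1 n_2)(2 n_1 n_2 - n_1 - n_2)}{(n_1 + n_2)^2 (n_1 + n_2 - 1)}} $$

Step 4: State the conclusion and the interpretation.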
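For the large-sample case, the number of runs G is standardized using its mean and standard deviation under randomness. A minimal sketch with a hypothetical function name and made-up counts:

```python
from math import sqrt


def runs_test_z(g: int, n1: int, n2: int):
    """Return (mu_G, sigma_G, Z) for the large-sample runs test.

    mu_G    = 2 n1 n2 / (n1 + n2) + 1
    sigma_G = sqrt( (2 n1 n2)(2 n1 n2 - n1 - n2)
                    / ((n1 + n2)^2 (n1 + n2 - 1)) )
    """
    product = 2 * n1 * n2
    total = n1 + n2
    mu_g = product / total + 1
    sigma_g = sqrt(product * (product - total) / (total ** 2 * (total - 1)))
    return mu_g, sigma_g, (g - mu_g) / sigma_g


# Hypothetical counts: n1 = n2 = 25 observations and G = 20 runs.
print(runs_test_z(20, 25, 25))
```

Too few runs (a negative Z) suggests clustering of like outcomes, and too many runs (a positive Z) suggests alternation. Either pattern counts against randomness.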
