Nonparametric Statistical Methods

Presented by Guo Cheng, Ning Liu, Faiza Khan, Zhenyu Zhang, Du Huang, Christopher Porcaro, Hongtao Zhao, Wei Huang

Upload: reina

Post on 06-Jan-2016


Page 1: Nonparametric Statistical Methods

Nonparametric Statistical Methods

Presented by Guo Cheng, Ning Liu, Faiza Khan, Zhenyu Zhang, Du Huang, Christopher Porcaro, Hongtao Zhao, Wei Huang

1

Page 2: Nonparametric Statistical Methods

Introduction

Page 3: Nonparametric Statistical Methods

Definition

Nonparametric methods (here, rank-based methods) are used when we have no idea about the population distribution from which the data are sampled.

Used for small sample sizes.

Used when the data are measured on an ordinal scale and only their ranks are meaningful.

3

Page 4: Nonparametric Statistical Methods

Outline

1. Sign Test
2. Wilcoxon Signed Rank Test
3. Inferences for Two Independent Samples
4. Inferences for Several Independent Samples
5. Friedman Test
6. Spearman’s Rank Correlation
7. Kendall’s Rank Correlation Coefficient

4

Page 5: Nonparametric Statistical Methods

1. Sign Test

5

Page 6: Nonparametric Statistical Methods

Parameter of interest: Median

The median is used as the parameter of interest because, for skewed distributions, it is a better measure of the center of the data than the mean.

6

Page 7: Nonparametric Statistical Methods

Hypothesis test

H0: µ = µ0 vs Ha: µ > µ0, where µ0 is a specified value and µ is the unknown population median

7

Page 8: Nonparametric Statistical Methods

Testing Procedure

Step 1: Given a random sample x1, x2, …, xn from a population with unknown median µ, count the number of xi’s that exceed µ0. Denote this count by s+, and let s- = n - s+ be the number of xi’s that fall below µ0.

Step 2: Reject H0 if s+ is large or, equivalently, if s- is small.

8

Page 9: Nonparametric Statistical Methods

How to reject H0?

To determine how large s+ must be in order to reject H0, we need to find out the distribution of the corresponding random variable S+.

Xi: random variable corresponding to the observed value xi

S+: random variable corresponding to s+

S-: random variable corresponding to s-

9

Page 10: Nonparametric Statistical Methods

Distribution of S+ and S-

10

Page 11: Nonparametric Statistical Methods

Calculating P-value
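Under H0 each observation exceeds µ0 with probability 1/2, so S+ follows a Binomial(n, 1/2) distribution and the upper-tail P-value is P(S+ ≥ s+). A minimal Python sketch of that calculation follows. The counts n = 10 and s+ = 8 are assumptions inferred from the SAS output shown for the thermostat example (M = 3, and the printed t statistic, mean and standard deviation imply n = 10); the raw data are elided in the slides, and the function name is illustrative.

```python
from math import comb


def sign_test_upper_p(s_plus: int, n: int) -> float:
    """Return P(S+ >= s_plus) when S+ ~ Binomial(n, 1/2), i.e. under H0."""
    return sum(comb(n, k) for k in range(s_plus, n + 1)) / 2 ** n


# Assumed counts (not stated in the slides): n = 10 observations, none equal
# to mu0, and s+ = 8, which matches SAS's M = (s+ - s-) / 2 = 3.
n, s_plus = 10, 8
p_one_sided = sign_test_upper_p(s_plus, n)
p_two_sided = min(1.0, 2 * p_one_sided)  # doubled tail for a two-sided test
print(p_one_sided, p_two_sided)
```

Under these assumptions the two-sided value is 112/1024, about 0.1094, which agrees with the "Pr >= |M|" entry in the SAS output.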

11

Page 12: Nonparametric Statistical Methods

Rejection criteria

12

Page 13: Nonparametric Statistical Methods

Large sample z-test
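For large n, S+ is approximately normal under H0 with mean n/2 and variance n/4. A standard form of the resulting statistic for the upper one-sided alternative is sketched below; the continuity correction of 1/2 is an assumption about which variant is intended here.

```latex
\[
E(S_+) = \frac{n}{2}, \qquad
\operatorname{Var}(S_+) = \frac{n}{4}, \qquad
z = \frac{s_+ - n/2 - 1/2}{\sqrt{n/4}}
\]
% Reject H_0 at level \alpha if z \ge z_\alpha.
```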

13

Page 14: Nonparametric Statistical Methods

Confidence Interval

14

Page 15: Nonparametric Statistical Methods

Example

15

Page 16: Nonparametric Statistical Methods

SAS code

16

DATA themostat;
INPUT temp;
datalines;
202.2
203.4
…
;
PROC UNIVARIATE DATA=themostat loccount mu0=200;
VAR temp;
RUN;

Page 17: Nonparametric Statistical Methods

SAS Output

Basic Statistical Measures

Location              Variability
Mean     201.7700     Std Deviation          2.41019
Median   201.7500     Variance               5.80900
Mode     .            Range                  8.30000
                      Interquartile Range    2.90000

Tests for Location: Mu0=200

Test           -Statistic-      -----p Value------
Student's t    t   2.322323     Pr > |t|     0.0453
Sign           M   3            Pr >= |M|    0.1094
Signed Rank    S   19.5         Pr >= |S|    0.048

17

Page 18: Nonparametric Statistical Methods

2. Wilcoxon signed rank test

18

Page 19: Nonparametric Statistical Methods

Inventor

Frank Wilcoxon (2 September 1892 in County Cork, Ireland – 18 November 1965, Tallahassee, Florida, USA) was a chemist and statistician, known for development of several statistical tests.

19

Page 20: Nonparametric Statistical Methods

What is it used for?

Two related samples

Matched samples

Repeated measurements on a single sample

Page 21: Nonparametric Statistical Methods

Hypothesis

21

Page 22: Nonparametric Statistical Methods

Testing procedure

22

Page 23: Nonparametric Statistical Methods

Example

23

Page 24: Nonparametric Statistical Methods

SAS codes

24

DATA thermo;
INPUT temp;
datalines;
202.2
203.4
…
;
PROC UNIVARIATE DATA=thermo loccount mu0=200;
TITLE "Wilcoxon signed rank test the thermostat";
VAR temp;
RUN;

Page 25: Nonparametric Statistical Methods

SAS outputs (selected results)

25


Basic Statistical Measures

Location              Variability
Mean     201.7700     Std Deviation          2.41019
Median   201.7500     Variance               5.80900
Mode     .            Range                  8.30000
                      Interquartile Range    2.90000

Tests for Location: Mu0=200

Test           -Statistic-      -----p Value------
Student's t    t   2.322323     Pr > |t|     0.0453
Sign           M   3            Pr >= |M|    0.1094
Signed Rank    S   19.5         Pr >= |S|    0.048
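The signed rank P-value in this output can be cross-checked against the exact null distribution of W+, the sum of the ranks of the positive differences. The sketch below is a hedged illustration, not the presenters' method: it assumes n = 10 with no ties and no zero differences, and W+ = 47, a value inferred from SAS's centered statistic S = W+ - n(n+1)/4 = 19.5. The function names are illustrative.

```python
def signed_rank_null_counts(n: int) -> list[int]:
    """counts[w] = number of sign patterns with W+ = w for ranks 1..n.

    Assumes no ties and no zero differences, so under H0 each rank enters
    W+ independently with probability 1/2.
    """
    max_w = n * (n + 1) // 2
    counts = [1] + [0] * max_w
    for rank in range(1, n + 1):
        # Subset-sum update: each rank is either in W+ or not.
        for w in range(max_w, rank - 1, -1):
            counts[w] += counts[w - rank]
    return counts


def signed_rank_upper_p(w_plus: int, n: int) -> float:
    """Return the exact P(W+ >= w_plus) under H0."""
    counts = signed_rank_null_counts(n)
    return sum(counts[w_plus:]) / 2 ** n


# Assumed values (the raw data are elided in the slides): n = 10, W+ = 47.
n, w_plus = 10, 47
p_upper = signed_rank_upper_p(w_plus, n)
print(p_upper, 2 * p_upper)
```

Under these assumptions the upper tail is 25/1024, so the doubled value is about 0.049, in line with the "Pr >= |S|" entry printed above.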

Page 26: Nonparametric Statistical Methods

Large sample approximation

26

Page 27: Nonparametric Statistical Methods

Derive E(x) & Var(x)
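A sketch of the standard derivation for the signed rank statistic, assuming no ties and no zero differences. The symbols W+ (sum of the ranks of the positive differences) and Z_i (indicator that the difference with rank i is positive) are notation introduced here, not taken from the slide.

```latex
\begin{align*}
W^{+} &= \sum_{i=1}^{n} i\,Z_i,
  \qquad Z_i \sim \mathrm{Bernoulli}(1/2) \text{ independently under } H_0 \\
E(W^{+}) &= \frac{1}{2}\sum_{i=1}^{n} i = \frac{n(n+1)}{4} \\
\operatorname{Var}(W^{+}) &= \frac{1}{4}\sum_{i=1}^{n} i^{2}
  = \frac{n(n+1)(2n+1)}{24}
\end{align*}
```

For n = 10 this gives E(W+) = 27.5, which is the centering constant SAS subtracts when it reports the signed rank statistic S.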

27

Page 28: Nonparametric Statistical Methods

Rejection region:

28

Page 29: Nonparametric Statistical Methods

3. Inferences for Two Independent Samples

29

Page 30: Nonparametric Statistical Methods

Hypothesis

Page 31: Nonparametric Statistical Methods

Definition

31

Page 32: Nonparametric Statistical Methods

Definition

32

Page 33: Nonparametric Statistical Methods

Wilcoxon rank sum test

33

Page 34: Nonparametric Statistical Methods

Mann-Whitney U test

34

Page 35: Nonparametric Statistical Methods

Between two tests

35

Page 36: Nonparametric Statistical Methods

Advantages

36

Page 37: Nonparametric Statistical Methods

For large samples

37

Page 38: Nonparametric Statistical Methods

For large samples

38

Page 39: Nonparametric Statistical Methods

Treatment of ties

39

Page 40: Nonparametric Statistical Methods

Example

To test whether the grades of two classes taught by the same teacher are the same, we randomly pick 7 students from Class A and 9 from Class B. Their scores are as follows:

A: 8.50 9.48 8.65 8.16 8.83 7.76 8.63

B: 8.27 8.20 8.25 8.14 9.00 8.10 7.20 8.32 7.70

40

Page 41: Nonparametric Statistical Methods

Example

Score   7.20   7.70   7.76   8.10   8.14   8.16   8.20   8.25
Class   B      B      A      B      B      A      B      B
Rank    1      2      3      4      5      6      7      8

Score   8.27   8.32   8.50   8.63   8.65   8.83   9.00   9.48
Class   B      B      A      A      A      A      B      A
Rank    9      10     11     12     13     14     15     16
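The ranking above leads directly to the rank sum statistic and its large-sample z value. The following Python sketch recomputes them from the example data using only the standard library; the variable names are illustrative, and the normal approximation with a 0.5 continuity correction is one common choice rather than the only one.

```python
from math import erfc, sqrt

a = [8.50, 9.48, 8.65, 8.16, 8.83, 7.76, 8.63]
b = [8.27, 8.20, 8.25, 8.14, 9.00, 8.10, 7.20, 8.32, 7.70]

# Rank the pooled sample; this data set has no ties, so plain ranks suffice.
pooled = sorted(a + b)
rank = {value: i + 1 for i, value in enumerate(pooled)}
w_a = sum(rank[v] for v in a)  # rank sum for Class A
w_b = sum(rank[v] for v in b)  # rank sum for Class B

n1, n2 = len(a), len(b)
total = n1 + n2
mean_a = n1 * (total + 1) / 2            # E(W_A) under H0
sd = sqrt(n1 * n2 * (total + 1) / 12)    # SD of W_A under H0 (no ties)
z = (w_a - mean_a - 0.5) / sd            # continuity correction of 0.5
p_one_sided = 0.5 * erfc(z / sqrt(2))    # upper-tail normal probability
u_a = w_a - n1 * (n1 + 1) / 2            # equivalent Mann-Whitney U for A

print(w_a, w_b, mean_a, round(sd, 6), round(z, 4), round(p_one_sided, 4), u_a)
```

The rank sums come out as 75 for Class A and 61 for Class B, with z close to 1.5878, which matches the SAS output for this example.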

41

Page 42: Nonparametric Statistical Methods

Example

42

Page 43: Nonparametric Statistical Methods

Example

43

Page 44: Nonparametric Statistical Methods

SAS code

Data exam;
Input group $ score @@;
Datalines;
A 8.50 A 9.48 A 8.65 A 8.16 A 8.83 A 7.76 A 8.63
B 8.27 B 8.20 B 8.25 B 8.14 B 9.00 B 8.10 B 7.20 B 8.32 B 7.70
;

44

Page 45: Nonparametric Statistical Methods

SAS code

Proc npar1way data=exam wilcoxon;
Var score;
Class group;
Exact wilcoxon;
Run;

45

Page 46: Nonparametric Statistical Methods

Output

Wilcoxon Scores (Rank Sums) for Variable score
Classified by Variable group

group   N   Sum of Scores   Expected Under H0   Std Dev Under H0   Mean Score
A       7   75.0            59.50               9.447222           10.714286
B       9   61.0            76.50               9.447222           6.777778

46

Page 47: Nonparametric Statistical Methods

Output

Wilcoxon Two-Sample Test

Statistic (S)                   75.0000

Normal Approximation
Z                               1.5878
One-Sided Pr > Z                0.0562
Two-Sided Pr > |Z|              0.1123

t Approximation
One-Sided Pr > Z                0.0666
Two-Sided Pr > |Z|              0.1332

Exact Test
One-Sided Pr >= S               0.0571
Two-Sided Pr >= |S - Mean|      0.1142

Z includes a continuity correction of 0.5.

47

Page 48: Nonparametric Statistical Methods

Output

48

Page 49: Nonparametric Statistical Methods

4. Inferences for Several Independent Samples

49

Page 50: Nonparametric Statistical Methods

Introduction

We know that if our data are normally distributed and the population standard deviations are equal, we can test for a difference among several populations by using the one-way ANOVA F test.

50

Page 51: Nonparametric Statistical Methods

When to use Kruskal-Wallis test?

But what happens when our data are not normal? This is when we use the nonparametric Kruskal-Wallis test to compare more than two populations, as long as our data come from a continuous distribution.

The idea of the Kruskal-Wallis (kw) rank test is to rank the data from all groups together and then apply one-way ANOVA to the ranks rather than to the original data.

51

Page 52: Nonparametric Statistical Methods

Kruskal-Wallis Test (kw Test)

A non-parametric method for testing whether samples originate from the same distribution.

Used for comparing more than two samples that are independent.  

52

Page 53: Nonparametric Statistical Methods

Kruskal-Wallis Test: History

William Henry Kruskal
October 10th, 1919 – April 21st, 2005
Obtained Bachelors and Masters degree in Mathematics at Harvard University and received his Ph.D. from Columbia University in 1955.

Wilson Allen Wallis
November 5th, 1912 – October 12th, 1998
Undergraduate work at the University of Minnesota and Graduate work at the University of Chicago in 1933.

53

Page 54: Nonparametric Statistical Methods

Kruskal-Wallis Test: Steps

1. State the hypotheses:
Null Hypothesis (Ho): The populations from which the samples come are identical.
Alternative Hypothesis (Ha): At least one population differs from the others.

54

Page 55: Nonparametric Statistical Methods

Kruskal-Wallis Test: Steps

2. Rank all the data. The lowest number gets the lowest rank and so on. Tied data gets the average of the ranks they would have obtained if they weren’t tied.

3. Add up the ranks within each sample. Label these sums L1, L2, L3, and L4 (one sum per sample).

55

Page 56: Nonparametric Statistical Methods

Kruskal-Wallis Test: Steps

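4. Find Test Statistic:

$$kw = \frac{12}{n(n+1)} \sum_{i} \frac{L_i^2}{n_i} - 3(n+1)$$

n = total number of observations in all samples
ni = number of observations in sample i
Li = total rank of each sample
kw = test statistic

5. Reject Ho if kw is greater than the chi-square table value with (number of samples - 1) degrees of freedom.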

56

Page 57: Nonparametric Statistical Methods

Kruskal-Wallis Test: Example

An experiment was done to compare four different ways of teaching a concept to a class of students. In this experiment, 28 tenth grade classes were randomly assigned to the four methods (7 classes per method). A 45 question test was given to each class. The average test scores of the classes are given in the following table. Apply the Kruskal-Wallis test to the test scores data set.

57

Page 58: Nonparametric Statistical Methods

Kruskal-Wallis Test: Example

Given

Data

Ranks of data values

58

Page 59: Nonparametric Statistical Methods

Kruskal-Wallis Test: Example

59

Page 60: Nonparametric Statistical Methods

Kruskal-Wallis Test: Example

60

Page 61: Nonparametric Statistical Methods

SAS Input

data test;
input methodname $ scores;
cards;
case 14.59
case 23.44
case 25.43
case 18.15
case 20.82
case 14.06
case 14.26
formula 20.27
formula 26.84
formula 14.71
formula 22.34
formula 19.49
formula 24.92
formula 20.20
equation 27.82
equation 24.92
equation 28.68
equation 23.32
equation 32.85
equation 33.90
equation 23.42
unitary 33.16
unitary 26.93
unitary 30.43
unitary 36.43
unitary 37.04
unitary 29.76
unitary 33.88
;
proc npar1way data=test wilcoxon;
class methodname;
var scores;
run;

61

Page 62: Nonparametric Statistical Methods

SAS Output

Wilcoxon Scores (Rank Sums) for Variable scores
Classified by Variable methodname

methodname   N   Sum of Scores   Expected Under H0   Std Dev Under H0   Mean Score
case         7   49.00           101.50              18.845498          7.000000
formula      7   66.50           101.50              18.845498          9.500000
equation     7   125.50          101.50              18.845498          17.928571
unitary      7   165.00          101.50              18.845498          23.571429

Average scores were used for ties.

Kruskal-Wallis Test

Chi-Square          18.1390
DF                  3
Pr > Chi-Square     0.0004
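The rank sums and the chi-square value in this output can be reproduced by hand. The Python sketch below follows the steps of the test: pool and rank the data with average ranks for ties, sum the ranks per method, and compute the statistic with the usual tie correction. Each method is taken to have seven scores, matching N = 7 in the output above; that reading of the data listing, and all variable names, are assumptions.

```python
groups = {
    "case":     [14.59, 23.44, 25.43, 18.15, 20.82, 14.06, 14.26],
    "formula":  [20.27, 26.84, 14.71, 22.34, 19.49, 24.92, 20.20],
    "equation": [27.82, 24.92, 28.68, 23.32, 32.85, 33.90, 23.42],
    "unitary":  [33.16, 26.93, 30.43, 36.43, 37.04, 29.76, 33.88],
}

pooled = sorted(v for values in groups.values() for v in values)
n = len(pooled)

# Midranks: tied values share the average of the positions they occupy.
positions = {}
for i, v in enumerate(pooled, start=1):
    positions.setdefault(v, []).append(i)
midrank = {v: sum(p) / len(p) for v, p in positions.items()}

rank_sums = {g: sum(midrank[v] for v in vals) for g, vals in groups.items()}

# Uncorrected statistic: 12 / (n(n+1)) * sum(L_i^2 / n_i) - 3(n+1).
h = 12 / (n * (n + 1)) * sum(
    rank_sums[g] ** 2 / len(vals) for g, vals in groups.items()
) - 3 * (n + 1)

# Tie correction: divide by 1 - sum(t^3 - t) / (n^3 - n) over tied groups.
tie_term = sum(len(p) ** 3 - len(p) for p in positions.values())
h_corrected = h / (1 - tie_term / (n ** 3 - n))

print(rank_sums, round(h_corrected, 4))
```

With three degrees of freedom (four methods minus one), the corrected statistic of about 18.139 is well above the usual chi-square critical values, consistent with the small P-value reported by SAS.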

62

Page 63: Nonparametric Statistical Methods

5. Friedman Test

63

Page 64: Nonparametric Statistical Methods

Introduction

The Friedman test is a distribution-free, rank-based test for comparing treatments in a randomized block design. It is named after the Nobel laureate economist Milton Friedman, who proposed it.

The Friedman test is a nonparametric version of repeated-measures ANOVA that can be performed on ordinal (ranked) data.

64

Page 65: Nonparametric Statistical Methods

Steps in the Friedman test

65

Page 66: Nonparametric Statistical Methods

Steps in the Friedman test

66

Page 67: Nonparametric Statistical Methods

Example

Suppose we have 8 treatments arranged in 3 blocks, and we test at significance level α = 0.025.

67

Page 68: Nonparametric Statistical Methods

Define Null and Alternative Hypothesis

H0: There is no difference among the 8 treatments

Ha: At least two of the 8 treatments differ

68

Page 69: Nonparametric Statistical Methods

Rank Sum

69

Page 70: Nonparametric Statistical Methods

Friedman Test

70

Page 71: Nonparametric Statistical Methods

Conclusion

71

Page 72: Nonparametric Statistical Methods

6. Spearman’s Rank Correlation Coefficient

72

Page 73: Nonparametric Statistical Methods

Introduction

From Pearson to Spearman
Spearman’s Rank Correlation Coefficient
Large-Sample Approximation
Hypothesis Test
Examples

73

Page 74: Nonparametric Statistical Methods

From Pearson to Spearman

Pearson’s
Measures only the degree of linear association
Based on the assumption of bivariate normality of the two variables

Spearman’s
Takes into account only the ranks
Measures the degree of monotone association
Inferences on the rank correlation coefficient are distribution-free

74

Page 75: Nonparametric Statistical Methods

From Pearson to Spearman

75

Page 76: Nonparametric Statistical Methods

From Pearson to Spearman

Charles Edward Spearman (10 Sept. 1863 – 17 Sept. 1945)

As a psychologist
① General factor of intelligence
② The nature and causes of variations in human

As a statistician
① Rank correlation
② Two-way analysis
③ Correlation coefficient

76

Page 77: Nonparametric Statistical Methods

Spearman’s Rank Correlation Coefficient

77

Page 78: Nonparametric Statistical Methods

Spearman’s Rank Correlation Coefficient
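For reference, the standard textbook form of Spearman's coefficient when there are no ties is sketched below. The symbols u_i and v_i (the ranks of x_i and y_i within their own samples) and d_i are notation assumed here; with ties, the coefficient is instead computed as Pearson's correlation applied to the midranks.

```latex
\[
r_S = 1 - \frac{6 \sum_{i=1}^{n} d_i^{2}}{n\,(n^{2} - 1)},
\qquad d_i = u_i - v_i
\]
% Large-sample approximation under H_0 (no association):
\[
z = r_S \sqrt{n - 1} \;\approx\; N(0, 1)
\]
```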

78

Page 79: Nonparametric Statistical Methods

Large sample approximation

79

Page 80: Nonparametric Statistical Methods

Hypothesis testing

80

Page 81: Nonparametric Statistical Methods

Example

Table 5.1 Wine Consumption and Heart Disease Deaths

81

Page 82: Nonparametric Statistical Methods

Example

82

Page 83: Nonparametric Statistical Methods

Example

Table 5.2 Ranks of Wine Consumption and Heart Disease Deaths

83

Page 84: Nonparametric Statistical Methods

Example

84

Page 85: Nonparametric Statistical Methods

Example

85

Page 86: Nonparametric Statistical Methods

7. Kendall’s Rank Correlation Coefficient

86

Page 87: Nonparametric Statistical Methods

Kendall’s Tau

It is a coefficient used to measure the association between two sets of ranked data observed in pairs.

Named after British statistician Maurice Kendall who developed it in 1938.

Ranges from -1.0 to 1.0

Tau-a (with no ties) and Tau-b (with ties)

87

Page 88: Nonparametric Statistical Methods

Formula for Tau-a

88

Page 89: Nonparametric Statistical Methods

Concordant and Discordant

89

Page 90: Nonparametric Statistical Methods

Example 1 Kendall’s tau-a

Raw data for 11 students in 2 exams:

Exam 1   Exam 2
85       85
98       95
90       80
83       75
57       70
63       65
77       73
99       93
80       79
96       88
69       74

90

Page 91: Nonparametric Statistical Methods

Ranks of exam results

Exam 1 (x)   Exam 2 (y)   c      d
1            2            9      1
2            1            9      0
3            3            8      0
4            5            6      1
5            4            6      0
6            7            4      1
7            6            4      0
8            9            2      1
9            8            2      0
10           11           0      1
11           10
                          C=50   D=5

91

Page 92: Nonparametric Statistical Methods

Calculation for τ̂

92

Page 93: Nonparametric Statistical Methods

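Steps for calculating τ̂

1. Sort the data by x in ascending order and pair the y ranks with x.
2. For each y, count c (the number of later y ranks that are larger) and d (the number of later y ranks that are smaller).
3. Sum the counts to get C and D.
4. Use the formula to calculate τ̂.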
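The steps above can be sketched in a few lines of Python. Instead of sorting and counting row by row, this version checks every pair of students directly, which gives the same totals C and D when there are no ties. It uses the raw scores from Example 1; the variable names are illustrative.

```python
from itertools import combinations

exam1 = [85, 98, 90, 83, 57, 63, 77, 99, 80, 96, 69]
exam2 = [85, 95, 80, 75, 70, 65, 73, 93, 79, 88, 74]

concordant = discordant = 0
for (x1, y1), (x2, y2) in combinations(zip(exam1, exam2), 2):
    product = (x1 - x2) * (y1 - y2)
    if product > 0:      # both exams order the two students the same way
        concordant += 1
    elif product < 0:    # the exams disagree on the order
        discordant += 1

n = len(exam1)
tau_a = (concordant - discordant) / (n * (n - 1) / 2)
print(concordant, discordant, round(tau_a, 5))
```

For these data the sketch gives C = 50, D = 5 and a coefficient of about 0.81818, the same value SAS reports for the exam scores.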

93

Page 94: Nonparametric Statistical Methods

Formula for tau-b (with ties)

94

Page 95: Nonparametric Statistical Methods

Example 2 Kendall’s tau-b

Wine Consumption and heart disease deaths data

i    Country       xi    yi    c    d
1    Ireland       0.7   300   0    18
2    Iceland       0.8   211   3    11
3    Norway        0.8   227   2    13
4    Finland       0.8   297   0    15
5    U.S.          1.2   199   5    9
6    U.K           1.3   285   0    13
7    Sweden        1.6   207   3    9
8    Netherlands   1.8   167   5    5
9    N. Z          1.9   266   0    10
10   Canada        2.4   191   2    7
11   Australia     2.5   211   1    7
12   Germany       2.7   172   1    6
13   Belgium       2.9   131   2    4
14   Denmark       2.9   220   0    5
15   Austria       3.9   167   0    4
16   Switzerland   5.8   115   0    3
17   Spain         6.5   86    1    1
18   Italy         7.9   107   0    1
19   France        9.1   71    0    0

C=25 D=141

95

Page 96: Nonparametric Statistical Methods

Calculation for tau-b
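A hedged Python sketch of one common form of tau-b for the wine data in Example 2: the numerator is C - D, and the denominator removes pairs tied on x and pairs tied on y from the total pair count. This tie convention is an assumption; under it, a pair tied on x or on y counts as neither concordant nor discordant. Because Belgium and Denmark share x = 2.9, that pair is excluded here, so the concordant count comes out one lower than the tabulated C = 25.

```python
from itertools import combinations
from math import sqrt

# (wine consumption x, heart disease deaths y), in the order tabulated.
data = [
    (0.7, 300), (0.8, 211), (0.8, 227), (0.8, 297), (1.2, 199),
    (1.3, 285), (1.6, 207), (1.8, 167), (1.9, 266), (2.4, 191),
    (2.5, 211), (2.7, 172), (2.9, 131), (2.9, 220), (3.9, 167),
    (5.8, 115), (6.5, 86), (7.9, 107), (9.1, 71),
]

c = d = x_ties = y_ties = 0
for (x1, y1), (x2, y2) in combinations(data, 2):
    if x1 == x2:
        x_ties += 1          # pair tied on wine consumption
    if y1 == y2:
        y_ties += 1          # pair tied on heart disease deaths
    product = (x1 - x2) * (y1 - y2)
    if product > 0:
        c += 1
    elif product < 0:
        d += 1

n_pairs = len(data) * (len(data) - 1) // 2
tau_b = (c - d) / sqrt((n_pairs - x_ties) * (n_pairs - y_ties))
print(c, d, x_ties, y_ties, round(tau_b, 4))
```

With this convention the sketch reports C = 24 and D = 141, and a tau-b of roughly -0.70, a strong negative association between wine consumption and heart disease deaths.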

96

Page 97: Nonparametric Statistical Methods

Hypothesis Test for τ

97

Page 98: Nonparametric Statistical Methods

Hypothesis test results

98

Page 99: Nonparametric Statistical Methods

Hypothesis test results

99

Page 100: Nonparametric Statistical Methods

100

Page 101: Nonparametric Statistical Methods

Example 1 extension

 

101

Page 102: Nonparametric Statistical Methods

102

Page 103: Nonparametric Statistical Methods

103

Page 104: Nonparametric Statistical Methods

SAS Code

Data exams;
Input exam1 exam2;
Datalines;
85 85
98 95
…
;
Run;
Proc corr data=exams kendall;
Var exam1 exam2;
Run;

104

Page 105: Nonparametric Statistical Methods

SAS output

The CORR Procedure

2 Variables: exam1 exam2

Simple Statistics

Variable   N    Mean       Std Dev    Median     Minimum    Maximum
exam1      11   81.54545   14.13056   83.00000   57.00000   99.00000
exam2      11   79.72727   9.58218    79.00000   65.00000   95.00000

Kendall Tau b Correlation Coefficients, N = 11
Prob > |tau| under H0: Tau=0

          exam1      exam2
exam1     1.00000    0.81818
                     0.0005
exam2     0.81818    1.00000
          0.0005

105

Page 106: Nonparametric Statistical Methods

8. Conclusion

106

Page 107: Nonparametric Statistical Methods

Summary

Nonparametric tests are very useful when we know little or nothing about the population distribution.

In particular, when the distribution is not normal and the t-test is not appropriate, nonparametric methods are needed.

The median is a better measure of central tendency than the mean for a non-normal population.

The data can be ordinal, and the sample size is usually small.

107

Page 108: Nonparametric Statistical Methods

Summary

In summary, we have briefly introduced some of the most common methods in our presentation, including:
Sign test
Wilcoxon rank sum test and signed rank test
Kruskal-Wallis Test
Friedman Test
Spearman’s Rank Correlation
Kendall’s Rank Correlation Coefficient

108

Page 109: Nonparametric Statistical Methods

Questions

109

Page 110: Nonparametric Statistical Methods

The End.

Thank You !

110