Applied Statistics Using SAS and SPSS
Topic: Chi-square tests
By Prof Kelly Fan, Cal. State Univ., East Bay
Outline
ALL variables must be categorical
Goal one: verify a distribution of Y
• One-sample Chi-square test (SPSS lesson 40; SAS handout)
Goal two: test the independence between two categorical variables
• Chi-square test for two-way contingency table (SPSS lesson 41; SAS section 3.G)
• McNemar’s test for paired data (SPSS lesson 44; SAS section 3.L)
• Measure the dependence (Phi and Kappa coefficients) (SPSS lesson 41, 44; SAS section 3.G, 3.M)
Goal three: test the independence between two categorical variables after controlling the third factor
• Mantel-Haenszel Chi-square test (SPSS in class; SAS section 3.Q)
Example: Postpartum Depression Study
Are women equally likely to show an increase, no change, or a decrease in depression as a function of childbirth?
Are the proportions associated with a decrease, no change, and an increase in depression from before to after childbirth the same?
Raw data vs. Grouped data
Raw data:
| ID | Name | Depression level after birth in comparison with before birth |
|---|---|---|
| 1 | *** | Same |
| 2 | *** | Less depressed |
| 3 | *** | More depressed |

Grouped data are shown in the next slide.
Example: Postpartum Depression Study
| Depression after birth in comparison with before birth | Observed frequencies | Hypothesized proportions | Expected frequencies |
|---|---|---|---|
| Less depressed (-1) | 14 | 1/3 | 20 |
| Neither less nor more depressed (0) | 33 | 1/3 | 20 |
| More depressed (1) | 13 | 1/3 | 20 |

From a random sample of 60 women
One-sample Chi-Square Test
Must be a random sample
The sample size must be large enough so that expected frequencies are greater than or equal to 5 for 80% or more of the categories
One-sample Chi-Square Test
Test statistic:
$$\chi^2 = \sum_i \frac{(o_i - e_i)^2}{e_i}$$
$o_i$ = the observed frequency of the i-th category
$e_i$ = the expected frequency of the i-th category
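As a rough illustration of the test statistic, the following Python sketch applies it to the postpartum depression counts (14, 33, 13) with hypothesized proportions of 1/3 each. The helper `chi2_sf` is a name introduced here for illustration, not a SAS or SPSS function; it relies on closed-form upper-tail expressions that hold for integer degrees of freedom only.

```python
import math

def chi2_sf(x, df):
    """Upper-tail probability P(X >= x) for a chi-square variable with integer df."""
    if x <= 0:
        return 1.0
    half = x / 2.0
    if df % 2 == 0:
        # Even df: exp(-x/2) times a finite sum of (x/2)^k / k!
        term, total = 1.0, 1.0
        for k in range(1, df // 2):
            term *= half / k
            total += term
        return math.exp(-half) * total
    # Odd df: erfc term plus a finite sum of (x/2)^(k - 1/2) / Gamma(k + 1/2)
    total = 0.0
    term = math.sqrt(half) / math.gamma(1.5)
    for k in range(1, (df - 1) // 2 + 1):
        total += term
        term *= half / (k + 0.5)
    return math.erfc(math.sqrt(half)) + math.exp(-half) * total

# Observed counts and hypothesized proportions from the postpartum example.
observed = [14, 33, 13]
proportions = [1 / 3, 1 / 3, 1 / 3]

n = sum(observed)
expected = [n * p for p in proportions]
chi_square = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
df = len(observed) - 1
p_value = chi2_sf(chi_square, df)

print(f"chi-square = {chi_square:.3f}, df = {df}, p = {p_value:.3f}")
```

Under these assumptions the statistic comes out near 12.7 on 2 degrees of freedom, in line with the SPSS output reported for this example.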
SPSS Output
1. Weight your data by count first (data>>weight cases)
2. Analyze >> Nonparametric Tests >> Legacy Dialogs >> Chi Square, count as test variable
Postpartum Depression
| | Observed N | Expected N | Residual |
|---|---|---|---|
| less depressed | 14 | 20.0 | -6.0 |
| same | 33 | 20.0 | 13.0 |
| more depressed | 13 | 20.0 | -7.0 |
| Total | 60 | | |

Test Statistics
| | Postpartum Depression |
|---|---|
| Chi-Square (a) | 12.700 |
| df | 2 |
| Asymp. Sig. | .002 |

a. 0 cells (.0%) have expected frequencies less than 5. The minimum expected cell frequency is 20.0.
Conclusion
Reject Ho
The proportions associated with a decrease, no change, and an increase in depression from before to after childbirth are significantly different from 1/3, 1/3, 1/3.
Example: Postpartum Depression Study
Are the proportions associated with a change and no change from before to after childbirth the same?
Example: Postpartum Depression Study
| Depression after birth in comparison with before birth | Observed frequencies | Hypothesized proportions | Expected frequencies |
|---|---|---|---|
| Same amount of depression (0) | 33 | 1/2 | 30 |
| More or less depressed (1) | 27 | 1/2 | 30 |

From a random sample of 60 women
SPSS Output
Postpartum Depression--Recoded
| | Observed N | Expected N | Residual |
|---|---|---|---|
| same | 33 | 30.0 | 3.0 |
| more or less depressed | 27 | 30.0 | -3.0 |
| Total | 60 | | |

Test Statistics
| | Postpartum Depression--Recoded |
|---|---|
| Chi-Square (a) | .600 |
| df | 1 |
| Asymp. Sig. | .439 |

a. 0 cells (.0%) have expected frequencies less than 5. The minimum expected cell frequency is 30.0.
Two-way Contingency Tables
Report frequencies on two variables
Such tables are also called crosstabs.
Contingency Tables (Crosstabs)
1991 General Social Survey
Frequency, by race (rows) and party identification (columns):
| Race | Democrat | Independent | Republican |
|---|---|---|---|
| White | 341 | 105 | 405 |
| Black | 103 | 15 | 11 |
Crosstabs Analysis (Two-way Chi-square test)
Chi-square test for testing the independence between two variables. Independence means:
1. The distribution of frequencies over rows stays the same regardless of the column
2. The distribution of frequencies over columns stays the same regardless of the row
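A minimal Python sketch of this test for the 1991 General Social Survey table, assuming the usual expected count under independence (row total × column total / n). The variable names are illustrative, and the p-value line uses a shortcut that is valid only because this table has 2 degrees of freedom.

```python
import math

# Rows: White, Black. Columns: Democrat, Independent, Republican.
table = [
    [341, 105, 405],
    [103, 15, 11],
]

row_totals = [sum(row) for row in table]
col_totals = [sum(col) for col in zip(*table)]
n = sum(row_totals)

# Expected count in each cell if race and party identification were independent.
expected = [[r * c / n for c in col_totals] for r in row_totals]

chi_square = sum(
    (table[i][j] - expected[i][j]) ** 2 / expected[i][j]
    for i in range(len(row_totals))
    for j in range(len(col_totals))
)
df = (len(row_totals) - 1) * (len(col_totals) - 1)

# For df = 2 only, the chi-square upper tail is exp(-x / 2).
p_value = math.exp(-chi_square / 2)

print(f"chi-square = {chi_square:.4f}, df = {df}, p = {p_value:.3g}")
```

With these counts the statistic should land close to 79.43 on 2 degrees of freedom, and the p-value is far below .0001.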
Measure of dependence for 2x2 tables
The phi coefficient measures the association between two categorical variables
• -1 ≤ phi ≤ 1
• | phi | indicates the strength of the association
• If the two variables are both ordinal, then the sign of phi indicates the direction of association
SPSS Output
P. 332: Data >> weight cases >> Weight cases by, select count variable
P. 333: Analyze >> descriptive statistics >> crosstabs, cell
SAS Output
| Statistic | DF | Value | Prob |
|---|---|---|---|
| Chi-Square | 2 | 79.4310 | <.0001 |
| Likelihood Ratio Chi-Square | 2 | 90.3311 | <.0001 |
| Mantel-Haenszel Chi-Square | 1 | 79.3336 | <.0001 |
| Phi Coefficient | | 0.2847 | |
| Contingency Coefficient | | 0.2738 | |
| Cramer's V | | 0.2847 | |

Sample Size = 980
Measure of dependence for non-2x2 tables
Cramer's V
• Ranges from 0 to 1
• V may be viewed as the association between two variables as a percentage of their maximum possible variation.
• V = phi for 2x2, 2x3 and 3x2 tables
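The association measures can be sketched from the Pearson chi-square value alone. The Python snippet below assumes the standard textbook definitions (phi = sqrt(chi-square / n), contingency coefficient = sqrt(chi-square / (chi-square + n)), Cramer's V = sqrt(chi-square / (n × min(r − 1, c − 1)))) and plugs in the chi-square value 79.4310 and sample size 980 from the SAS output for the 2x3 race by party table. Computed this way, phi is unsigned.

```python
import math

chi_square = 79.4310   # Pearson chi-square from the SAS output
n = 980                # sample size
n_rows, n_cols = 2, 3  # race (2 levels) by party identification (3 levels)

phi = math.sqrt(chi_square / n)
contingency_coefficient = math.sqrt(chi_square / (chi_square + n))
cramers_v = math.sqrt(chi_square / (n * min(n_rows - 1, n_cols - 1)))

print(f"phi = {phi:.4f}")
print(f"contingency coefficient = {contingency_coefficient:.4f}")
print(f"Cramer's V = {cramers_v:.4f}")
```

Because min(r − 1, c − 1) = 1 whenever one dimension has only two levels, V and phi coincide for this table.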
Fisher’s Exact Test for Independence
The Chi-squared tests are ONLY for large samples:
The sample size must be large enough so that expected frequencies are greater than or equal to 5 for 80% or more of the categories
When this condition is not met, Fisher’s exact test can be used instead.
SAS/SPSS Output
• SAS output: Fisher's Exact Test
Table Probability (P): 3.823E-22
Pr <= P: 2.787E-20
• SPSS output: in “crosstabs” window, click “exact”, then tick “exact”
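To make the two SAS quantities concrete, here is a hedged Python sketch of an exact test for a 2x3 table. It assumes the race by party identification counts from the 1991 General Social Survey example and the common approach of summing the probabilities of all tables with the same margins that are no more likely than the observed one. The function name `table_prob` is illustrative, and the enumeration is brute force, so it suits only small problems like this one.

```python
from math import comb

# Rows: White, Black. Columns: Democrat, Independent, Republican.
table = [
    [341, 105, 405],
    [103, 15, 11],
]

col_totals = [sum(col) for col in zip(*table)]
row2_total = sum(table[1])
n = sum(col_totals)
denominator = comb(n, row2_total)

def table_prob(x1, x2, x3):
    """Hypergeometric probability of second-row counts (x1, x2, x3) given fixed margins."""
    return (
        comb(col_totals[0], x1) * comb(col_totals[1], x2) * comb(col_totals[2], x3)
        / denominator
    )

p_observed = table_prob(*table[1])

# Enumerate every second row consistent with the margins; the first row is then determined.
p_value = 0.0
for x1 in range(min(col_totals[0], row2_total) + 1):
    for x2 in range(min(col_totals[1], row2_total - x1) + 1):
        x3 = row2_total - x1 - x2
        if x3 > col_totals[2]:
            continue
        p = table_prob(x1, x2, x3)
        # Small relative tolerance guards against floating-point ties.
        if p <= p_observed * (1 + 1e-7):
            p_value += p

print(f"table probability = {p_observed:.3e}")
print(f"exact p-value     = {p_value:.3e}")
```

For comparison, the SAS output above reports a table probability of 3.823E-22 and Pr <= P of 2.787E-20.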
Matched-pair Data
Comparing categorical responses for two “paired” samples, when either
• Each sample has the same subjects (that is, subjects are measured twice), or
• A natural pairing exists between each subject in one sample and a subject from the other sample (e.g. twins)
Example: Rating for Prime Minister
| First Survey | Second Survey: Approve | Second Survey: Disapprove |
|---|---|---|
| Approve | 794 | 150 |
| Disapprove | 86 | 570 |
Marginal Homogeneity
The probabilities of “success” for both samples are identical
E.g. the probabilities of “approve” at the first and second surveys are identical
McNemar Test (for 2x2 Tables only)
SAS: Section 3.L; SPSS: Lesson 44
Ho: marginal homogeneity
Ha: no marginal homogeneity
• Exact p-value
• Approximate p-value (when n12 + n21 > 10)
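A short Python sketch of both p-values for the Prime Minister rating table, assuming the usual McNemar statistic (n12 − n21)² / (n12 + n21) with 1 degree of freedom and, for the exact version, a two-sided binomial test on the discordant pairs with success probability 1/2. Here n12 = 150 (approve, then disapprove) and n21 = 86 (disapprove, then approve).

```python
import math

n12, n21 = 150, 86          # discordant pairs from the 2x2 table
discordant = n12 + n21

# Approximate (asymptotic) test: chi-square with 1 degree of freedom.
statistic = (n12 - n21) ** 2 / discordant
# For df = 1, the chi-square upper tail equals erfc(sqrt(x / 2)).
asymptotic_p = math.erfc(math.sqrt(statistic / 2))

# Exact test: under H0 each discordant pair falls in either cell with probability 1/2.
lower_tail = sum(math.comb(discordant, k) for k in range(min(n12, n21) + 1))
exact_p = min(1.0, 2 * lower_tail / 2 ** discordant)

print(f"McNemar statistic = {statistic:.4f}")
print(f"asymptotic p = {asymptotic_p:.3e}")
print(f"exact p      = {exact_p:.3e}")
```

Only the discordant cells enter the test; the 794 and 570 concordant pairs carry no information about a shift in approval.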
SAS Output
McNemar's Test
Statistic (S): 17.3559
DF: 1
Asymptotic Pr > S: <.0001
Exact Pr >= S: 3.716E-05
Simple Kappa Coefficient (level of agreement)
Kappa: 0.6996
ASE: 0.0180
95% Lower Conf Limit: 0.6644
95% Upper Conf Limit: 0.7348
Sample Size = 1600
In SPSS: Analyze >> Descriptive statistics >> crosstabs, in “statistics” tick “Kappa” and “McNemar”
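The kappa coefficient can be checked by hand. The Python sketch below assumes the standard definition of Cohen's simple kappa, (observed agreement − chance agreement) / (1 − chance agreement), applied to the two-survey approval table; the variable names are illustrative.

```python
# Rows: first survey (approve, disapprove). Columns: second survey (approve, disapprove).
table = [
    [794, 150],
    [86, 570],
]

n = sum(sum(row) for row in table)
row_totals = [sum(row) for row in table]
col_totals = [sum(col) for col in zip(*table)]

# Share of respondents giving the same answer in both surveys.
observed_agreement = sum(table[i][i] for i in range(len(table))) / n
# Agreement expected by chance if the two surveys were independent.
expected_agreement = sum(r * c for r, c in zip(row_totals, col_totals)) / n ** 2

kappa = (observed_agreement - expected_agreement) / (1 - expected_agreement)
print(f"kappa = {kappa:.4f}")
```

A kappa near 0.70 indicates fairly strong agreement between the two surveys, even though McNemar's test detects a shift in the marginal approval rate.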
SPSS Output
• SPSS(p. 361): Analyze >> Nonparametric tests >> Legacy dialogs >> 2 related samples; in “two-samples tests” tick “McNemar” and click “exact”, then tick “exact” again
Stratified 2 by 2 Tables (Meta-Analysis)
Goal: to investigate the effect of the risk factor (lack of sleep) on the outcome (failing a test)
Test Results, Boys
| Sleep | Fail | Pass |
|---|---|---|
| Low | 20 | 100 |
| High | 15 | 150 |

Test Results, Girls
| Sleep | Fail | Pass |
|---|---|---|
| Low | 30 | 100 |
| High | 25 | 200 |
Cochran Mantel-Haenszel Test
After importing your dataset, and providing names to variables, click on: ANALYZE >> DESCRIPTIVE STATISTICS >> CROSSTABS
• For ROWS, select the independent variable
• For COLUMNS, select the dependent variable
• For LAYERS, select the strata variable
• Under STATISTICS, click on COCHRAN’S AND MANTEL-HAENSZEL STATISTICS
NOTE: You will want to code the data so that the outcome present (Yes) category has the lower value (e.g. 1) and the outcome absent (No) category has the higher value (e.g. 2). Do the same for the risk factor: 1 for exposure; 2 for no exposure. Use Value Labels to keep output straight.
SPSS Output (screenshot only on the original slide)
SAS Output
Common Odds Ratio and Relative Risks
| Statistic | Method | Value | 95% Lower Confidence Limit | 95% Upper Confidence Limit |
|---|---|---|---|---|
| Odds Ratio | Mantel-Haenszel | 2.2289 | 1.4185 | 3.5024 |
| | Logit | 2.2318 | 1.4205 | 3.5064 |
| Relative Risk (Column 1) | Mantel-Haenszel | 1.9775 | 1.3474 | 2.9021 |
| | Logit | 1.9822 | 1.3508 | 2.9087 |
| Relative Risk (Column 2) | Mantel-Haenszel | 0.8891 | 0.8283 | 0.9544 |
| | Logit | 0.8936 | 0.8334 | 0.9582 |

Breslow-Day Test for Homogeneity of the Odds Ratios
| Chi-Square | DF | Pr > ChiSq |
|---|---|---|
| 0.1501 | 1 | 0.6985 |

Cochran-Mantel-Haenszel Statistics (Based on Table Scores)
| Statistic | Alternative Hypothesis | DF | Value | Prob |
|---|---|---|---|---|
| 1 | Nonzero Correlation | 1 | 12.4770 | 0.0004 |
| 2 | Row Mean Scores Differ | 1 | 12.4770 | 0.0004 |
| 3 | General Association | 1 | 12.4770 | 0.0004 |
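The headline numbers in this output can be approximated with a few lines of Python. The sketch below assumes the textbook Cochran-Mantel-Haenszel statistic for stratified 2x2 tables without a continuity correction, together with the Mantel-Haenszel common odds ratio, applied to the sleep and test-result counts for boys and girls. The layout of each tuple (a, b, c, d) is a convention chosen here for illustration.

```python
import math

# Each stratum: (a, b, c, d) = (low sleep & fail, low sleep & pass,
#                               high sleep & fail, high sleep & pass)
strata = {
    "Boys": (20, 100, 15, 150),
    "Girls": (30, 100, 25, 200),
}

diff_sum = 0.0   # sum over strata of a - E(a)
var_sum = 0.0    # sum over strata of Var(a) under the null hypothesis
or_num = 0.0     # numerator of the Mantel-Haenszel odds ratio
or_den = 0.0     # denominator of the Mantel-Haenszel odds ratio

for a, b, c, d in strata.values():
    n = a + b + c + d
    row1, row2 = a + b, c + d
    col1, col2 = a + c, b + d
    diff_sum += a - row1 * col1 / n
    var_sum += row1 * row2 * col1 * col2 / (n ** 2 * (n - 1))
    or_num += a * d / n
    or_den += b * c / n

cmh = diff_sum ** 2 / var_sum
# Chi-square upper tail for df = 1.
p_value = math.erfc(math.sqrt(cmh / 2))
common_odds_ratio = or_num / or_den

print(f"CMH statistic = {cmh:.4f}, p = {p_value:.4f}")
print(f"Mantel-Haenszel common odds ratio = {common_odds_ratio:.4f}")
```

Pooling within strata in this way tests the association between sleep and failing while holding sex fixed, which is the sense of "controlling the third factor" in the outline.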