Types of Categorical Data: Qualitative/Categorical Data, Nominal Categories, Ordinal Categories
![Page 1: Types of Categorical Data Qualitative/Categorical Data Nominal CategoriesOrdinal Categories](https://reader035.vdocuments.net/reader035/viewer/2022062222/5697c01f1a28abf838cd19d5/html5/thumbnails/1.jpg)
Statistical Tests to Analyze the Categorical Data
THE CHI-SQUARE TEST
BACKGROUND AND NEED OF THE TEST
Data collected in the field of medicine is often qualitative.
--- For example, the presence or absence of a symptom, classification of pregnancy as ‘high risk’ or ‘non-high risk’, the degree of severity of a disease (mild, moderate, severe)
The measure computed in each instance is a proportion, corresponding to the mean in the case of quantitative data such as height, weight, BMI, serum cholesterol.
To compare two or more proportions, the test of significance employed is the “Chi-square test”.
KARL PEARSON, IN 1900, DEVISED AN INDEX OF DISPERSION OR TEST CRITERION DENOTED AS “CHI-SQUARE” (χ²).
Introduction: What is the χ² test?
• χ² is a non-parametric test of statistical significance for bivariate tabular analysis.
• Any appropriately performed test of statistical significance lets you know the degree of confidence you can have in accepting or rejecting a hypothesis.
Introduction: What is the χ² test?
• The hypothesis tested with chi-square is whether two samples differ enough in some characteristic or aspect of their behavior that we can generalize from the samples that the populations from which they are drawn also differ in that behavior or characteristic.
Introduction: What is the χ² test?
• The χ² test is used to test a distribution observed in the field against another distribution determined by a null hypothesis.
• Being a statistical test, χ² can be expressed as a formula. When written in mathematical notation the formula looks like this:
Chi-Square

$$\chi^2 = \sum \frac{(o - e)^2}{e}$$

The figure (o − e)²/e is computed for each cell.
Reject $H_0$ if $\chi^2 > \chi^2_{\alpha,\,df}$, where $df = (r-1)(c-1)$ and

$$\chi^2 = \sum \frac{(O - E)^2}{E}$$

1. The summation is over all cells of the contingency table consisting of r rows and c columns.
2. O is the observed frequency.
3. E is the expected frequency, estimated as

$$\hat{E} = \frac{(\text{total of row in which the cell lies}) \cdot (\text{total of column in which the cell lies})}{\text{total of all cells}}$$

4. The degrees of freedom are df = (r-1)(c-1).
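The four points above can be sketched in a few lines of Python. This is a minimal illustration using only the standard library; the function name `chi_square_statistic` and the small 2 x 3 table are hypothetical and are not taken from the slides.

```python
def chi_square_statistic(observed):
    """Return (chi2, df, expected) for an r x c table of observed counts."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    n = sum(row_totals)  # total of all cells

    # Expected frequency of a cell = row total * column total / grand total.
    expected = [[r * c / n for c in col_totals] for r in row_totals]

    # Sum (O - E)^2 / E over every cell of the table.
    chi2 = sum(
        (o - e) ** 2 / e
        for obs_row, exp_row in zip(observed, expected)
        for o, e in zip(obs_row, exp_row)
    )
    df = (len(row_totals) - 1) * (len(col_totals) - 1)
    return chi2, df, expected


# Hypothetical 2 x 3 table of counts, used only to exercise the function.
chi2, df, expected = chi_square_statistic([[10, 20, 30], [20, 20, 20]])
print(round(chi2, 3), df, expected[0])
```

The statistic is then compared with the tabulated critical value for the computed degrees of freedom.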
When using the chi-square test, the researcher needs a clear idea of what is being investigated.
It is customary to define the object of the research by writing a hypothesis.
Chi-square is then used to decide whether the data give grounds to reject that hypothesis; the test cannot prove a hypothesis.
Hypothesis
The hypothesis is the most important part of a research project. It states exactly what the researcher is trying to establish. It must be written in a clear and concise way so that other people can easily understand the aims of the research project.
Chi-square test
Purpose
To find out whether the association between two categorical variables is statistically significant.
Null Hypothesis
There is no association between the two variables.
Requirements
Prior to using the chi-square test, there are certain requirements that must be met.
• The data must be in the form of frequencies counted in each of a set of categories. Percentages cannot be used.
• The total number observed must exceed 20.
Requirements
• The expected frequency under the H0 hypothesis in any one cell must not normally be less than 5.
• All the observations must be independent of each other. In other words, one observation must not have an influence upon another observation.
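The countable requirements above (frequencies rather than percentages, a total above 20, expected frequencies of at least 5) can be checked mechanically. The sketch below is one possible way to do so; the function name and its default thresholds are assumptions chosen to mirror the slides, and independence of observations cannot be verified from the counts alone.

```python
def check_chi_square_requirements(observed, min_total=20, min_expected=5):
    """Return a list of problems found; an empty list means the checks pass.

    Independence of the observations is a study-design question and is
    not checked here.
    """
    problems = []
    cells = [count for row in observed for count in row]
    if any(count < 0 or count != int(count) for count in cells):
        problems.append("cells must be non-negative whole-number frequencies")

    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    n = sum(row_totals)
    if n <= min_total:
        problems.append(f"total observed ({n}) does not exceed {min_total}")
    if n == 0:
        return problems  # expected frequencies are undefined for an empty table

    for i, r in enumerate(row_totals, start=1):
        for j, c in enumerate(col_totals, start=1):
            e = r * c / n
            if e < min_expected:
                problems.append(
                    f"expected frequency in cell ({i}, {j}) is {e:.2f}, "
                    f"below {min_expected}"
                )
    return problems


# A hypothetical table with comfortable counts, and a small table whose
# expected frequencies fall below 5 in two cells.
ok_table = [[30, 20], [15, 35]]
small_table = [[4, 16], [1, 21]]
print(check_chi_square_requirements(ok_table))
print(check_chi_square_requirements(small_table))
```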
APPLICATION OF CHI-SQUARE TEST
TESTING INDEPENDENCE (or ASSOCIATION)
TESTING FOR HOMOGENEITY
TESTING OF GOODNESS-OF-FIT
Chi-square test
Objective: To test whether smoking is a risk factor for MI
Null Hypothesis: There is no association between smoking and MI

| | D (MI) | No D (No MI) | Total |
|---|---|---|---|
| Smokers | 29 | 21 | 50 |
| Non-smokers | 16 | 34 | 50 |
| Total | 45 | 55 | 100 |

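Chi-Square: observed (O) frequencies, with expected (E) frequencies still to be filled in

| | MI | Non-MI |
|---|---|---|
| Smoker | O = 29 | O = 21 |
| Non-smoker | O = 16 | O = 34 |
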
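Chi-Square: observed (O) frequencies with row and column totals

| | MI | Non-MI | Total |
|---|---|---|---|
| Smoker | O = 29 | O = 21 | 50 |
| Non-smoker | O = 16 | O = 34 | 50 |
| Total | 45 | 55 | 100 |
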
Estimating the Expected Frequencies

$$\hat{E} = \frac{(\text{row total for this cell}) \cdot (\text{column total for this cell})}{n}$$

The table cross-classifies Classification 1 and Classification 2, with rows 1, 2, 3, ..., r (row totals $R_1, R_2, R_3, \ldots, R_r$) and columns 1, 2, 3, 4, ..., c (column totals $C_1, C_2, C_3, C_4, \ldots, C_c$). For the cell in row 2 and column 3:

$$\hat{E} = \frac{R_2 C_3}{n}$$
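Chi-Square: expected frequency for the first cell

| | MI | Non-MI | Total |
|---|---|---|---|
| Smoker | O = 29, E = 22.5 | O = 21 | 50 |
| Non-smoker | O = 16 | O = 34 | 50 |
| Total | 45 | 55 | 100 |

E (Smoker, MI) = (50 × 45) / 100 = 22.5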
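Chi-Square: observed (O) and expected (E) frequencies for every cell

| | MI | Non-MI | Total |
|---|---|---|---|
| Smoker | O = 29, E = 22.5 | O = 21, E = 27.5 | 50 |
| Non-smoker | O = 16, E = 22.5 | O = 34, E = 27.5 | 50 |
| Total | 45 | 55 | 100 |
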
Chi-Square: Degrees of Freedom
df = (r-1)(c-1) = (2-1)(2-1) = 1
Critical Value (Table A.6) = 3.84
χ² = 6.83
The calculated value (6.83) is greater than the critical (table) value (3.84) at the 0.05 level with 1 d.f.
Hence we reject our H0 and conclude that there is a highly statistically significant association between smoking and MI.
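The smoking and MI calculation can be reproduced step by step. The sketch below uses only the Python standard library; the variable names are illustrative, and the p-value line relies on the fact that, for 1 degree of freedom only, the upper tail of the chi-square distribution equals `erfc(sqrt(x / 2))`.

```python
import math

observed = [[29, 21],   # smokers:     MI, no MI
            [16, 34]]   # non-smokers: MI, no MI

row_totals = [sum(row) for row in observed]        # [50, 50]
col_totals = [sum(col) for col in zip(*observed)]  # [45, 55]
n = sum(row_totals)                                # 100

# Expected frequency = row total * column total / n for each cell.
expected = [[r * c / n for c in col_totals] for r in row_totals]

chi2 = sum(
    (o - e) ** 2 / e
    for obs_row, exp_row in zip(observed, expected)
    for o, e in zip(obs_row, exp_row)
)
df = (len(observed) - 1) * (len(observed[0]) - 1)

critical_value = 3.84                      # 0.05 level, 1 d.f., from the table
p_value = math.erfc(math.sqrt(chi2 / 2))   # valid for df = 1 only

print(expected)
print(f"chi2 = {chi2:.2f}, df = {df}, reject H0: {chi2 > critical_value}")
print(f"p is roughly {p_value:.3f}")
```

Each cell contributes (O − E)²/E with |O − E| = 6.5, which gives a statistic of about 6.83, in line with the value reported above and well beyond the 3.84 cut-off.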
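Chi-square table

| df | α = 0.05 | α = 0.01 | α = 0.001 |
|---|---|---|---|
| 1 | 3.84 | 6.64 | 10.83 |
| 2 | 5.99 | 9.21 | 13.82 |
| 3 | 7.82 | 11.35 | 16.27 |
| 4 | 9.49 | 13.28 | 18.47 |
| 5 | 11.07 | 15.09 | 20.52 |
| 6 | 12.59 | 16.81 | 22.46 |
| 7 | 14.07 | 18.48 | 24.32 |
| 8 | 15.51 | 20.09 | 26.13 |
| 9 | 16.92 | 21.67 | 27.88 |
| 10 | 18.31 | 23.21 | 29.59 |
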
Chi-square test
Find out whether gender is equally distributed among each age group (expected frequencies in parentheses).

| Gender | Age <30 | Age 30-45 | Age >45 | Total |
|---|---|---|---|---|
| Male | 60 (60) | 20 (30) | 40 (30) | 120 |
| Female | 40 (40) | 30 (20) | 10 (20) | 80 |
| Total | 100 | 50 | 50 | 200 |

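The slide leaves the gender-by-age calculation to the reader. A minimal sketch of how it could be carried out follows; the variable names are illustrative, the critical value 5.99 is the tabulated 0.05 value for 2 d.f., and the p-value uses the closed form `exp(-x / 2)` that holds for 2 degrees of freedom only.

```python
import math

# Rows: male, female. Columns: age <30, 30-45, >45.
observed = [[60, 20, 40],
            [40, 30, 10]]

row_totals = [sum(row) for row in observed]        # [120, 80]
col_totals = [sum(col) for col in zip(*observed)]  # [100, 50, 50]
n = sum(row_totals)                                # 200

expected = [[r * c / n for c in col_totals] for r in row_totals]

# Per-cell contributions (O - E)^2 / E, kept separately for inspection.
contributions = [
    [(o - e) ** 2 / e for o, e in zip(obs_row, exp_row)]
    for obs_row, exp_row in zip(observed, expected)
]
chi2 = sum(sum(row) for row in contributions)
df = (len(row_totals) - 1) * (len(col_totals) - 1)

critical_value = 5.99            # 0.05 level with 2 d.f.
p_value = math.exp(-chi2 / 2)    # exact upper tail for df = 2 only

print(expected)
print(f"chi2 = {chi2:.2f}, df = {df}, reject H0: {chi2 > critical_value}")
```

The computed expected frequencies match the values in parentheses in the table, and the statistic (about 16.67) exceeds the critical value, so under this sketch the hypothesis of an equal gender distribution across age groups would be rejected.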
Test for Homogeneity (Similarity)
To test the similarity between frequency distributions or groups. It is used, for example, to assess the similarity between non-responders and responders in a survey (expected frequencies in parentheses).

| Age (yrs) | Responders | Non-responders | Total |
|---|---|---|---|
| <20 | 76 (82) | 20 (14) | 96 |
| 20-29 | 288 (289) | 50 (49) | 338 |
| 30-39 | 312 (310) | 51 (53) | 363 |
| 40-49 | 187 (185) | 30 (32) | 217 |
| >50 | 77 (73) | 9 (13) | 86 |
| Total | 940 | 160 | 1100 |

χ² = 0.439 + 2.571 + 0.003 + 0.020 + 0.013 + 0.075 + 0.022 + 0.125 + 0.219 + 1.231 = 4.718
Degrees of freedom: df = (r-1)(c-1) = (5-1)(2-1) = 4
Critical value of χ² with 4 d.f. at 0.05 = 9.49
The calculated value (4.718) is less than the critical (table) value (9.49) at the 0.05 level with 4 d.f.
Hence we cannot reject our H0 and conclude that the distributions are similar; that is, non-responders do not differ from responders.
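The homogeneity calculation can be checked in two ways: with the expected frequencies as rounded in the table, which is what the sum above uses, and with unrounded expected frequencies computed from the margins. The sketch below does both; the variable names are illustrative.

```python
# Each row: (age group, (observed, rounded expected) for responders,
#            (observed, rounded expected) for non-responders)
rows = [
    ("<20",   (76, 82),   (20, 14)),
    ("20-29", (288, 289), (50, 49)),
    ("30-39", (312, 310), (51, 53)),
    ("40-49", (187, 185), (30, 32)),
    (">50",   (77, 73),   (9, 13)),
]

# 1) Using the rounded expected frequencies shown in the table.
chi2_rounded = sum(
    (o - e) ** 2 / e
    for _, responders, non_responders in rows
    for o, e in (responders, non_responders)
)

# 2) Using unrounded expected frequencies: row total * column total / n.
observed = [[resp[0], non[0]] for _, resp, non in rows]
row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
n = sum(row_totals)
chi2_exact = sum(
    (o - r * c / n) ** 2 / (r * c / n)
    for row, r in zip(observed, row_totals)
    for o, c in zip(row, col_totals)
)

df = (len(row_totals) - 1) * (len(col_totals) - 1)
critical_value = 9.49  # 0.05 level with 4 d.f.

print(f"rounded E: {chi2_rounded:.3f}, unrounded E: {chi2_exact:.3f}, df = {df}")
print("reject H0:", chi2_exact > critical_value)
```

The unrounded statistic comes out somewhat lower (roughly 4.4) than the figure obtained with rounded expected frequencies, but both are far below 9.49, so the conclusion is unchanged.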
Chi-square test
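Test statistics

For a 2 × 2 table with cells a, b, c, d and total n (written N in formula 3):

1. $$\chi^2 = \sum \frac{(o - e)^2}{e} \sim \chi^2_{(r-1)(c-1)\ \text{d.f.}}$$

2. $$\chi^2 = \frac{n(ad - bc)^2}{(a+c)(b+d)(c+d)(a+b)} \sim \chi^2_{1\ \text{d.f.}}$$

3. $$\chi^2_{\text{corrected}} = \frac{N\left(|ad - bc| - \frac{N}{2}\right)^2}{(a+c)(b+d)(a+b)(c+d)} \sim \chi^2_{1\ \text{d.f.}}$$

4. $$\chi^2 = \frac{(b - c)^2}{b + c} \sim \chi^2_{1\ \text{d.f.}}$$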
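As an illustration of the 2 × 2 shortcut formula and its continuity-corrected version, the sketch below applies both to the smoking and MI counts used earlier. The function name is hypothetical, and clamping the corrected difference at zero is an added safeguard rather than something stated on the slide.

```python
def chi_square_2x2(a, b, c, d, corrected=False):
    """Chi-square for a 2 x 2 table [[a, b], [c, d]] via the shortcut formula.

    With corrected=True, n/2 is subtracted from |ad - bc| before squaring
    (the continuity-corrected form); the difference is clamped at zero.
    """
    n = a + b + c + d
    diff = abs(a * d - b * c)
    if corrected:
        diff = max(diff - n / 2, 0)
    return n * diff ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))


# Smoking and MI counts from the earlier example.
plain = chi_square_2x2(29, 21, 16, 34)
with_correction = chi_square_2x2(29, 21, 16, 34, corrected=True)
print(f"uncorrected = {plain:.2f}, corrected = {with_correction:.2f}")
```

The uncorrected shortcut gives the same value as summing (O − E)²/E over the four cells; the corrected value is always somewhat smaller.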
The following data relate to suicidal feelings in samples of psychotic and neurotic patients:
The following data compare malocclusion of teeth with method of feeding infants.

| | Normal teeth | Malocclusion |
|---|---|---|
| Breast fed | 4 | 16 |
| Bottle fed | 1 | 21 |

Yates's correction was useful when calculations were done by hand. Statistical packages are now widely available, so it is better to use Fisher's exact test rather than Yates's correction, as it gives an exact result.
$$\text{Fisher's exact test} = \frac{R_1!\, R_2!\, C_1!\, C_2!}{n!\, a!\, b!\, c!\, d!}$$
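The factorial formula gives the probability of one particular table with fixed margins. A minimal sketch for the feeding-method data follows, using `math.comb` (Python 3.8 or later), which is algebraically equivalent to the factorial form. The function name is hypothetical, and the two-sided rule used here (adding up every table that is no more probable than the observed one) is a common convention assumed for illustration, not something stated on the slide.

```python
from math import comb


def fisher_exact_2x2(a, b, c, d):
    """Return (probability of the observed table, two-sided p-value)."""
    r1, r2 = a + b, c + d   # row totals
    c1 = a + c              # first column total
    n = r1 + r2

    def table_probability(k):
        # Probability that the top-left cell equals k with all margins fixed;
        # equal to R1! R2! C1! C2! / (n! a! b! c! d!) for that table.
        return comb(r1, k) * comb(r2, c1 - k) / comb(n, c1)

    p_observed = table_probability(a)
    lowest, highest = max(0, c1 - r2), min(r1, c1)
    p_two_sided = sum(
        p
        for p in (table_probability(k) for k in range(lowest, highest + 1))
        if p <= p_observed * (1 + 1e-9)  # small tolerance for float ties
    )
    return p_observed, p_two_sided


# Breast fed: 4 normal, 16 malocclusion; bottle fed: 1 normal, 21 malocclusion.
p_observed, p_two_sided = fisher_exact_2x2(4, 16, 1, 21)
print(f"P(observed table) = {p_observed:.4f}, two-sided p = {p_two_sided:.4f}")
```

Under these assumptions the two-sided p-value is well above 0.05, so the data would not show a significant association between feeding method and malocclusion.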
What do we do when we have paired samples and both the exposure and outcome variables are qualitative (binary)?
Problem
A researcher has done a matched case-control study of endometrial cancer (cases) and exposure to conjugated estrogens (exposed).
In the study cases were individually matched 1:1 to a non-cancer hospital-based control, based on age, race, date of admission, and hospital.
Data

| | Cases | Controls | Total |
|---|---|---|---|
| Exposed | 55 | 19 | 74 |
| Not exposed | 128 | 164 | 292 |
| Total | 183 | 183 | 366 |

• We can't use a chi-squared test: the observations are not independent, they're paired.
• We must present the 2 x 2 table differently: each cell should contain a count of the number of pairs meeting certain criteria, with the columns and rows respectively referring to each of the subjects in the matched pair.
• The information in the standard 2 x 2 table used for unmatched studies is insufficient because it doesn't say who is in which pair; it ignores the matching.
Data

| Cases | Controls: Exposed | Controls: Not exposed | Total |
|---|---|---|---|
| Exposed | 12 | 43 | 55 |
| Not exposed | 7 | 121 | 128 |
| Total | 19 | 164 | 183 |

McNemar’s test
Situation:
Two paired binary variables that form a particular type of 2 x 2 table
e.g. matched case-control study or cross-over trial
We construct a matched 2 x 2 table:

| Cases | Controls: Exposed | Controls: Not exposed | Total |
|---|---|---|---|
| Exposed | e | f | e+f |
| Not exposed | g | h | g+h |
| Total | e+g | f+h | n |

Formula
The odds ratio is: f/g
The test is:

$$\chi^2 = \frac{(|f - g| - 1)^2}{f + g}$$

Compare this to the χ² distribution on 1 df.
$$\chi^2 = \frac{(|43 - 7| - 1)^2}{43 + 7} = \frac{1225}{50} = 24.5$$

P < 0.001, Odds Ratio = 43/7 = 6.1
p1 - p2 = (55/183) – (19/183) = 0.197 (20%)
s.e.(p1 - p2) = 0.036
95% CI: 0.12 to 0.27 (or 12% to 27%)
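McNemar's statistic depends only on the two discordant cells of the matched table (f = 43 pairs with only the case exposed, g = 7 pairs with only the control exposed). The sketch below computes it with and without the continuity correction; the function name is hypothetical, and the p-value again uses the `erfc` identity that holds for 1 degree of freedom.

```python
import math


def mcnemar_chi2(f, g, continuity=True):
    """McNemar's chi-square from the discordant pair counts f and g."""
    diff = abs(f - g)
    if continuity:
        diff = max(diff - 1, 0)  # continuity correction, clamped at zero
    return diff ** 2 / (f + g)


f, g = 43, 7  # discordant pairs from the matched table

chi2_corrected = mcnemar_chi2(f, g)                      # (|43 - 7| - 1)^2 / 50
chi2_uncorrected = mcnemar_chi2(f, g, continuity=False)  # (43 - 7)^2 / 50
odds_ratio = f / g
p_value = math.erfc(math.sqrt(chi2_corrected / 2))       # df = 1 only

print(f"corrected = {chi2_corrected:.2f}, uncorrected = {chi2_uncorrected:.2f}")
print(f"odds ratio = {odds_ratio:.1f}, p < 0.001: {p_value < 0.001}")
```

Both versions are far above the 3.84 critical value for 1 d.f.; which of the two is reported depends on whether the continuity correction is applied.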
Degrees of freedom: df = (r-1)(c-1) = (2-1)(2-1) = 1
Critical Value (Table A.6) = 3.84
χ² = 25.92 (McNemar's statistic without the continuity correction; with the correction it is 24.5)
The calculated value (25.92) is greater than the critical (table) value (3.84) at the 0.05 level with 1 d.f.
Hence we reject our H0 and conclude that there is a highly statistically significant association between endometrial cancer and estrogens.
Stata Output

| Cases | Controls: Exposed | Controls: Unexposed | Total |
|---|---|---|---|
| Exposed | 12 | 43 | 55 |
| Unexposed | 7 | 121 | 128 |
| Total | 19 | 164 | 183 |

McNemar's chi2(1) = 25.92, Prob > chi2 = 0.0000
Exact McNemar significance probability = 0.0000

Proportion with factor: Cases .3005464, Controls .1038251

| | Estimate | [95% Conf. Interval] |
|---|---|---|
| difference | .1967213 | .1210924 to .2723502 |
| ratio | 2.894737 | 1.885462 to 4.444269 |
| rel. diff. | .2195122 | .1448549 to .2941695 |
| odds ratio | 6.142857 | 2.739772 to 16.18458 (exact) |

In Conclusion!
When both the study variables and outcome variables are categorical (qualitative), apply:
(i) Chi-square test
(ii) Fisher's exact test (small samples)
(iii) McNemar's test (for paired samples)