inference for categorical data
Inference for Categorical Data. William P. Wattles, Ph.D., Francis Marion University.
![Page 1: Inference for Categorical Data](https://reader035.vdocuments.net/reader035/viewer/2022062410/56815ee7550346895dcd9238/html5/thumbnails/1.jpg)
1
Inference for Categorical Data
William P. Wattles, Ph.D.
Francis Marion University
![Page 2: Inference for Categorical Data](https://reader035.vdocuments.net/reader035/viewer/2022062410/56815ee7550346895dcd9238/html5/thumbnails/2.jpg)
2
Continuous vs. Categorical
• Continuous (measurement) variables have many values
• Categorical variables have only certain values representing different categories
• Ordinal: a type of categorical variable with a natural order (e.g., year of college)
• Nominal: a type of categorical variable with no order (e.g., brand of cola)
![Page 3: Inference for Categorical Data](https://reader035.vdocuments.net/reader035/viewer/2022062410/56815ee7550346895dcd9238/html5/thumbnails/3.jpg)
3
Categorical Data
• Tells which category an individual is in rather than telling how much.
• Sex, race, and occupation are naturally categorical.
• A quantitative variable can be grouped to form a categorical variable.
• Analyze with counts or percents.
![Page 4: Inference for Categorical Data](https://reader035.vdocuments.net/reader035/viewer/2022062410/56815ee7550346895dcd9238/html5/thumbnails/4.jpg)
4
Describing relationships in categorical data
• No single graph portrays the relationship
• No single number summarizes the relationship either
• Convert counts to proportions or percents
![Page 5: Inference for Categorical Data](https://reader035.vdocuments.net/reader035/viewer/2022062410/56815ee7550346895dcd9238/html5/thumbnails/5.jpg)
5
Prediction
![Page 6: Inference for Categorical Data](https://reader035.vdocuments.net/reader035/viewer/2022062410/56815ee7550346895dcd9238/html5/thumbnails/6.jpg)
6
Prediction
![Page 7: Inference for Categorical Data](https://reader035.vdocuments.net/reader035/viewer/2022062410/56815ee7550346895dcd9238/html5/thumbnails/7.jpg)
7
Moving from descriptive to Inferential
• Chi Square Inference involves a test of independence.
• If the variables are independent, knowledge of one variable tells you nothing about the other.
![Page 8: Inference for Categorical Data](https://reader035.vdocuments.net/reader035/viewer/2022062410/56815ee7550346895dcd9238/html5/thumbnails/8.jpg)
8
Moving from descriptive to Inferential
• Inference involves expected counts.
– Expected count = the count that would occur if the variables were independent
![Page 9: Inference for Categorical Data](https://reader035.vdocuments.net/reader035/viewer/2022062410/56815ee7550346895dcd9238/html5/thumbnails/9.jpg)
9
Inference for two-way tables
• Chi Square test of independence.
• For more than two groups
• Cannot compare multiple groups one at a time.
![Page 10: Inference for Categorical Data](https://reader035.vdocuments.net/reader035/viewer/2022062410/56815ee7550346895dcd9238/html5/thumbnails/10.jpg)
10
To Analyze Categorical Data
• First obtain counts
• In Excel, this can be done with a pivot table
• Put the data in a matrix or two-way table
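Outside Excel, the counting step can be sketched in a few lines of Python. The records below are hypothetical and exist only to illustrate the tallying; they are not the survey data behind these slides.

```python
from collections import Counter

# Hypothetical raw records (sex, party), one per respondent; illustration only.
records = [
    ("Male", "Democrat"), ("Female", "Republican"), ("Male", "Democrat"),
    ("Female", "Independent"), ("Male", "Republican"), ("Female", "Republican"),
]

# Tally how many respondents fall in each (row, column) cell.
cell_counts = Counter(records)

rows = ["Male", "Female"]
cols = ["Republican", "Democrat", "Independent"]

# Arrange the tallies as a two-way table; a Counter returns 0 for unseen cells.
table = [[cell_counts[(r, c)] for c in cols] for r in rows]

for r, counts in zip(rows, table):
    print(r, counts)
```

This plays the same role as a pivot table: one row per level of the first variable, one column per level of the second.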
![Page 11: Inference for Categorical Data](https://reader035.vdocuments.net/reader035/viewer/2022062410/56815ee7550346895dcd9238/html5/thumbnails/11.jpg)
11
Matrix or two-way table
| | Republican | Democrat | Independent |
|---|---|---|---|
| Male | 18 | 43 | 14 |
| Female | 39 | 23 | 18 |
![Page 12: Inference for Categorical Data](https://reader035.vdocuments.net/reader035/viewer/2022062410/56815ee7550346895dcd9238/html5/thumbnails/12.jpg)
12
Inference for two-way tables
• Expected count
• The count that would occur if the variables are independent
![Page 13: Inference for Categorical Data](https://reader035.vdocuments.net/reader035/viewer/2022062410/56815ee7550346895dcd9238/html5/thumbnails/13.jpg)
13
Matrix or two-way table
• Rows
• Columns
• Distribution: how often each outcome occurred
• Marginal distribution: count for all entries in a row or column
![Page 14: Inference for Categorical Data](https://reader035.vdocuments.net/reader035/viewer/2022062410/56815ee7550346895dcd9238/html5/thumbnails/14.jpg)
14
Row and column totals
| | Republican | Democrat | Independent | Total |
|---|---|---|---|---|
| Male | 18 | 43 | 14 | 75 |
| Female | 39 | 23 | 18 | 80 |
| Total | 57 | 66 | 32 | 155 |
![Page 15: Inference for Categorical Data](https://reader035.vdocuments.net/reader035/viewer/2022062410/56815ee7550346895dcd9238/html5/thumbnails/15.jpg)
15
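| | Republican | Democrat | Independent | Total | Percent |
|---|---|---|---|---|---|
| Male | | | | 75 | 48% |
| Female | | | | 80 | 52% |
| Total | 57 | 66 | 32 | 155 | |
| Percent | 37% | 43% | 21% | | |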
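The marginal totals and percents on this slide can be recomputed from the observed counts. The following is a minimal sketch; the variable names are illustrative and not part of the original slides.

```python
# Observed counts from the two-way table (sex by party).
observed = {
    "Male":   {"Republican": 18, "Democrat": 43, "Independent": 14},
    "Female": {"Republican": 39, "Democrat": 23, "Independent": 18},
}

parties = list(observed["Male"])

# Marginal distribution of sex: total of each row.
row_totals = {sex: sum(cells.values()) for sex, cells in observed.items()}
# Marginal distribution of party: total of each column.
col_totals = {p: sum(observed[sex][p] for sex in observed) for p in parties}

n = sum(row_totals.values())

# Convert the marginal counts to whole-number percents of the grand total.
row_pct = {sex: round(100 * t / n) for sex, t in row_totals.items()}
col_pct = {p: round(100 * t / n) for p, t in col_totals.items()}

print(row_totals, row_pct)
print(col_totals, col_pct)
```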
![Page 16: Inference for Categorical Data](https://reader035.vdocuments.net/reader035/viewer/2022062410/56815ee7550346895dcd9238/html5/thumbnails/16.jpg)
16
Expected counts
• 37% (57 of 155) of all subjects are Republicans
• If the variables are independent, 37% of females should be Republican (expected value)
• 37% of 80 ≈ 29 (using the unrounded 57/155)
• 37% of 75 ≈ 28
![Page 17: Inference for Categorical Data](https://reader035.vdocuments.net/reader035/viewer/2022062410/56815ee7550346895dcd9238/html5/thumbnails/17.jpg)
17
Expected counts rounded
| | Republican | Democrat | Independent | Total |
|---|---|---|---|---|
| Male | 28 | 32 | 15 | 75 |
| Female | 29 | 34 | 17 | 80 |
| Total | 57 | 66 | 32 | 155 |
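The rounded expected counts can be reproduced with the usual rule, expected = row total × column total / grand total. This is a short sketch using the observed party table; the variable names are illustrative.

```python
# Observed counts: rows = (Male, Female), columns = (Republican, Democrat, Independent).
observed = [
    [18, 43, 14],
    [39, 23, 18],
]

row_totals = [sum(row) for row in observed]        # 75, 80
col_totals = [sum(col) for col in zip(*observed)]  # 57, 66, 32
n = sum(row_totals)                                # 155

# Expected count for each cell if sex and party were independent.
expected = [[r * c / n for c in col_totals] for r in row_totals]

# Round only for display, as on the slide; the test statistic uses unrounded values.
rounded = [[round(e) for e in row] for row in expected]
print(rounded)
```

Note that each row and column of expected counts keeps the same total as the observed table, which is a quick way to check the arithmetic.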
![Page 18: Inference for Categorical Data](https://reader035.vdocuments.net/reader035/viewer/2022062410/56815ee7550346895dcd9238/html5/thumbnails/18.jpg)
18
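Observed vs. Expected

Observed

| | Republican | Democrat | Independent | Total |
|---|---|---|---|---|
| Male | 18 | 43 | 14 | 75 |
| Female | 39 | 23 | 18 | 80 |
| Total | 57 | 66 | 32 | 155 |

Expected (rounded)

| | Republican | Democrat | Independent | Total |
|---|---|---|---|---|
| Male | 28 | 32 | 15 | 75 |
| Female | 29 | 34 | 17 | 80 |
| Total | 57 | 66 | 32 | 155 |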
![Page 19: Inference for Categorical Data](https://reader035.vdocuments.net/reader035/viewer/2022062410/56815ee7550346895dcd9238/html5/thumbnails/19.jpg)
19
Chi-Square
• Chi-square: a measure of how far the observed counts are from the expected counts
![Page 20: Inference for Categorical Data](https://reader035.vdocuments.net/reader035/viewer/2022062410/56815ee7550346895dcd9238/html5/thumbnails/20.jpg)
20
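Chi-square test of independence

$$X^2 = \sum \frac{(f_o - f_e)^2}{f_e}$$

where $f_o$ is the observed count and $f_e$ is the expected count in each cell.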
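As a worked instance of the formula, the sketch below sums (observed − expected)² / expected over the six cells of the party table. The function name is illustrative; the slides do not report a statistic for this table, so the value is simply what the formula gives.

```python
def chi_square_statistic(observed):
    """Sum of (f_o - f_e)^2 / f_e over every cell of a two-way table."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    n = sum(row_totals)
    total = 0.0
    for i, row in enumerate(observed):
        for j, f_o in enumerate(row):
            f_e = row_totals[i] * col_totals[j] / n  # expected count under independence
            total += (f_o - f_e) ** 2 / f_e
    return total

# Rows = (Male, Female), columns = (Republican, Democrat, Independent).
party_table = [[18, 43, 14], [39, 23, 18]]
x2 = chi_square_statistic(party_table)
print(round(x2, 2))  # about 14.15 for this table
```

The two Democrat and two Republican cells contribute almost all of the total, which is where the observed and expected tables differ most.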
![Page 21: Inference for Categorical Data](https://reader035.vdocuments.net/reader035/viewer/2022062410/56815ee7550346895dcd9238/html5/thumbnails/21.jpg)
21
Chi Square test of independence with SPSS
![Page 22: Inference for Categorical Data](https://reader035.vdocuments.net/reader035/viewer/2022062410/56815ee7550346895dcd9238/html5/thumbnails/22.jpg)
22
Chi Square test of independence with SPSS
![Page 23: Inference for Categorical Data](https://reader035.vdocuments.net/reader035/viewer/2022062410/56815ee7550346895dcd9238/html5/thumbnails/23.jpg)
23
Chi Square
![Page 24: Inference for Categorical Data](https://reader035.vdocuments.net/reader035/viewer/2022062410/56815ee7550346895dcd9238/html5/thumbnails/24.jpg)
24
Chi-square test of independence
• Degrees of Freedom
• df = (number of rows − 1) × (number of columns − 1)
• Compare the observed and expected counts.
• P-value comes from comparing the Chi-square statistic with critical values for a chi-square distribution
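The degrees of freedom and the p-value can be sketched without statistical software. The helper below is one possible implementation, not the method SPSS or Excel uses internally: it relies on the closed-form upper-tail formulas for integer degrees of freedom. The function names are illustrative; the checks use statistics reported on later slides.

```python
import math

def degrees_of_freedom(n_rows, n_cols):
    """df = (rows - 1) x (columns - 1) for a two-way table."""
    return (n_rows - 1) * (n_cols - 1)

def chi2_upper_tail(x, df):
    """P(chi-square with integer df >= x), i.e. the p-value for statistic x."""
    if x <= 0:
        return 1.0
    y = x / 2.0
    if df % 2 == 0:
        # Even df: exp(-y) * sum of y^k / k! for k = 0 .. df/2 - 1.
        return math.exp(-y) * sum(y ** k / math.factorial(k) for k in range(df // 2))
    # Odd df: start from the df = 1 case (erfc) and add the remaining terms.
    extra = sum(y ** (k + 0.5) / math.gamma(k + 1.5) for k in range((df - 1) // 2))
    return math.erfc(math.sqrt(y)) + math.exp(-y) * extra

print(degrees_of_freedom(4, 2), round(chi2_upper_tail(10.827, 3), 3))  # 3 0.013
print(degrees_of_freedom(3, 3), round(chi2_upper_tail(1.552, 4), 3))   # 4 0.817
```

A large statistic relative to its degrees of freedom gives a small p-value; the same statistic with more degrees of freedom is less surprising.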
![Page 25: Inference for Categorical Data](https://reader035.vdocuments.net/reader035/viewer/2022062410/56815ee7550346895dcd9238/html5/thumbnails/25.jpg)
25
Example
• Has the percentage of majors changed by school?
![Page 26: Inference for Categorical Data](https://reader035.vdocuments.net/reader035/viewer/2022062410/56815ee7550346895dcd9238/html5/thumbnails/26.jpg)
26
Data collection
http://www.fmarion.edu/about/FactBook2004/2005 Fall 2004 Graduates by Major
![Page 27: Inference for Categorical Data](https://reader035.vdocuments.net/reader035/viewer/2022062410/56815ee7550346895dcd9238/html5/thumbnails/27.jpg)
27
![Page 28: Inference for Categorical Data](https://reader035.vdocuments.net/reader035/viewer/2022062410/56815ee7550346895dcd9238/html5/thumbnails/28.jpg)
28
![Page 29: Inference for Categorical Data](https://reader035.vdocuments.net/reader035/viewer/2022062410/56815ee7550346895dcd9238/html5/thumbnails/29.jpg)
29
Chi Square
![Page 30: Inference for Categorical Data](https://reader035.vdocuments.net/reader035/viewer/2022062410/56815ee7550346895dcd9238/html5/thumbnails/30.jpg)
30
Marital Status, page 543
| Job grade | Single | Married | Divorced | Widowed |
|---|---|---|---|---|
| 1 | 58 | 874 | 15 | 8 |
| 2 | 222 | 3927 | 70 | 20 |
| 3 | 50 | 2396 | 34 | 10 |
| 4 | 7 | 533 | 7 | 4 |
![Page 31: Inference for Categorical Data](https://reader035.vdocuments.net/reader035/viewer/2022062410/56815ee7550346895dcd9238/html5/thumbnails/31.jpg)
31
Marital Status, page 543
| Test Statistics | Value | df | p-value |
|---|---|---|---|
| Pearson Chi-Square | 67.491 | 9 | 0.0000 |
![Page 32: Inference for Categorical Data](https://reader035.vdocuments.net/reader035/viewer/2022062410/56815ee7550346895dcd9238/html5/thumbnails/32.jpg)
32
Olive Oil, page 578
| | Low | Medium | High |
|---|---|---|---|
| Colon cancer | 398 | 397 | 430 |
| Rectal | 250 | 241 | 217 |
| Controls | 1368 | 1377 | 1409 |
Olive Oil
![Page 33: Inference for Categorical Data](https://reader035.vdocuments.net/reader035/viewer/2022062410/56815ee7550346895dcd9238/html5/thumbnails/33.jpg)
33
Olive Oil, page 578
| Test Statistics | Value | df | p-value |
|---|---|---|---|
| Pearson Chi-Square | 1.552 | 4 | 0.817 |
| Continuity Adjusted Chi-Square | 1.396 | 4 | 0.845 |
| Likelihood Ratio Chi-Square | 1.549 | 4 | 0.818 |
![Page 34: Inference for Categorical Data](https://reader035.vdocuments.net/reader035/viewer/2022062410/56815ee7550346895dcd9238/html5/thumbnails/34.jpg)
34
Business Majors, page 563
| | Female | Male |
|---|---|---|
| Accounting | 68 | 56 |
| Administration | 91 | 40 |
| Economics | 5 | 6 |
| Finance | 61 | 59 |
![Page 35: Inference for Categorical Data](https://reader035.vdocuments.net/reader035/viewer/2022062410/56815ee7550346895dcd9238/html5/thumbnails/35.jpg)
35
Business Majors, page 563
| Test Statistics | Value | df | p-value |
|---|---|---|---|
| Pearson Chi-Square | 10.827 | 3 | 0.013 |
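The Business Majors result can be checked end to end by comparing the statistic with a critical value, as an alternative to reading a p-value. The sketch below assumes alpha = .05 and uses standard chi-square table values; the function name is illustrative.

```python
# Observed counts from the Business Majors table: rows = major, columns = (Female, Male).
observed = [
    [68, 56],   # Accounting
    [91, 40],   # Administration
    [5, 6],     # Economics
    [61, 59],   # Finance
]

# Upper critical values of the chi-square distribution at alpha = .05 (standard table values).
CRITICAL_05 = {1: 3.841, 2: 5.991, 3: 7.815, 4: 9.488}

def chi_square_test(table, critical_values=CRITICAL_05):
    """Return (statistic, df, reject_Ho) for a test of independence at alpha = .05."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    statistic = 0.0
    for i, row in enumerate(table):
        for j, o in enumerate(row):
            e = row_totals[i] * col_totals[j] / n  # expected count under independence
            statistic += (o - e) ** 2 / e
    df = (len(table) - 1) * (len(table[0]) - 1)
    return statistic, df, statistic > critical_values[df]

statistic, df, reject = chi_square_test(observed)
print(round(statistic, 3), df, reject)  # 10.827 3 True
```

Because 10.827 exceeds the df = 3 critical value of 7.815, the decision agrees with the reported p-value of 0.013.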
![Page 36: Inference for Categorical Data](https://reader035.vdocuments.net/reader035/viewer/2022062410/56815ee7550346895dcd9238/html5/thumbnails/36.jpg)
36
Exam Three
• 37 multiple choice questions, 4 short answer
• T-tests and chi square on Excel
• General questions about analyzing categorical data and t-tests
• Review from earlier this term
![Page 37: Inference for Categorical Data](https://reader035.vdocuments.net/reader035/viewer/2022062410/56815ee7550346895dcd9238/html5/thumbnails/37.jpg)
37
Inference as a decision
• We must decide if the null hypothesis is true.
• We cannot know for sure.
• We choose an arbitrary standard that is conservative and set alpha at .05
• Our decision will be either correct or incorrect.
![Page 38: Inference for Categorical Data](https://reader035.vdocuments.net/reader035/viewer/2022062410/56815ee7550346895dcd9238/html5/thumbnails/38.jpg)
38
Type I and Type II errors
| | Ho is really True | Ho is really False |
|---|---|---|
| We reject Ho | Type I Error (false alarm) | Correct decision |
| We accept Ho | Correct decision | Type II Error (miss) |
![Page 39: Inference for Categorical Data](https://reader035.vdocuments.net/reader035/viewer/2022062410/56815ee7550346895dcd9238/html5/thumbnails/39.jpg)
39
Type I error
• If we reject Ho when in fact Ho is true, this is a Type I error.
• Statistical procedures are designed to minimize the probability of a Type I error, because such errors are more serious for science.
• With a Type I error we erroneously conclude that an independent variable works.
![Page 40: Inference for Categorical Data](https://reader035.vdocuments.net/reader035/viewer/2022062410/56815ee7550346895dcd9238/html5/thumbnails/40.jpg)
40
Type II error
• If we accept Ho when in fact Ho is false, this is a Type II error.
• A Type II error is serious to the researcher.
• The Power of a test is the probability that Ho will be rejected when it is, in fact, false.
![Page 41: Inference for Categorical Data](https://reader035.vdocuments.net/reader035/viewer/2022062410/56815ee7550346895dcd9238/html5/thumbnails/41.jpg)
41
Probability
| | Ho is really True | Ho is really False |
|---|---|---|
| We reject Ho | p = α | p = 1 − β |
| We accept Ho | p = 1 − α | p = β |
![Page 42: Inference for Categorical Data](https://reader035.vdocuments.net/reader035/viewer/2022062410/56815ee7550346895dcd9238/html5/thumbnails/42.jpg)
42
Power
• The goal of any scientific research is to reject Ho when Ho is false.
• To increase power:
– a. increase sample size
– b. increase alpha
– c. decrease sample variability
– d. increase the difference between the means
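A small simulation can illustrate alpha, power, and the effect of sample size for a 2×2 chi-square test. Everything in this sketch is an assumption made for illustration: the proportions (.34 versus .44), the group sizes, the number of trials, and the random seed are not from the original slides.

```python
import random

def chi_square_2x2(a, b, c, d):
    """Chi-square statistic for the 2x2 table [[a, b], [c, d]] (no continuity correction)."""
    n = a + b + c + d
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    if denom == 0:
        return 0.0  # an empty row or column carries no evidence against Ho
    return n * (a * d - b * c) ** 2 / denom

def rejection_rate(p1, p2, n_per_group, trials, rng, critical=3.841):
    """Fraction of simulated studies that reject Ho at alpha = .05 (df = 1)."""
    rejections = 0
    for _ in range(trials):
        # Draw the number of "yes" responses in each group.
        yes1 = sum(rng.random() < p1 for _ in range(n_per_group))
        yes2 = sum(rng.random() < p2 for _ in range(n_per_group))
        stat = chi_square_2x2(yes1, n_per_group - yes1, yes2, n_per_group - yes2)
        if stat > critical:
            rejections += 1
    return rejections / trials

rng = random.Random(0)  # fixed seed so the run is repeatable
type_i_rate = rejection_rate(0.40, 0.40, 100, 1000, rng)  # Ho true: rate should sit near alpha
power_small = rejection_rate(0.34, 0.44, 100, 1000, rng)  # Ho false, 100 per group
power_large = rejection_rate(0.34, 0.44, 400, 1000, rng)  # Ho false, 400 per group

print(type_i_rate, power_small, power_large)
```

When Ho is true the rejection rate is the Type I error rate, so it should stay close to .05. When Ho is false the rejection rate is the power, and it rises as the groups get larger.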
![Page 43: Inference for Categorical Data](https://reader035.vdocuments.net/reader035/viewer/2022062410/56815ee7550346895dcd9238/html5/thumbnails/43.jpg)
43
Categorical data example
• African-American students more likely to register via the web.
![Page 44: Inference for Categorical Data](https://reader035.vdocuments.net/reader035/viewer/2022062410/56815ee7550346895dcd9238/html5/thumbnails/44.jpg)
44
Table
| Students University-Wide | White n | White Percent | African-American n | African-American Percent |
|---|---|---|---|---|
| Register on the Web | 447 | 34% | 284 | 44% |
| Register with other method | 876 | 66% | 356 | 56% |
| Total | 1323 | | 640 | |
![Page 45: Inference for Categorical Data](https://reader035.vdocuments.net/reader035/viewer/2022062410/56815ee7550346895dcd9238/html5/thumbnails/45.jpg)
45
Web Registration by Race
[Chart: percent registering via the web by year (2000, 2001) for White and African-American students; vertical axis 0% to 60%; data labels shown: 34%, 25%, 44%, 29%]
![Page 46: Inference for Categorical Data](https://reader035.vdocuments.net/reader035/viewer/2022062410/56815ee7550346895dcd9238/html5/thumbnails/46.jpg)
46
Categorical Data Example
• African-American students university-wide (44%) were more likely than white students (34%) to use web registration, X²(1, N = 1963) = 20.7, p < .001.
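The reported result can be checked from the counts in the web registration table. The sketch below uses the shortcut formula for a 2×2 table and the df = 1 tail probability; the variable names are illustrative.

```python
import math

# Counts from the web registration table: rows = race, columns = (web, other method).
white_web, white_other = 447, 876
aa_web, aa_other = 284, 356

n = white_web + white_other + aa_web + aa_other

# Shortcut for a 2x2 table: X^2 = N (ad - bc)^2 / (product of the four marginal totals).
numerator = n * (white_web * aa_other - white_other * aa_web) ** 2
denominator = ((white_web + white_other) * (aa_web + aa_other)
               * (white_web + aa_web) * (white_other + aa_other))
chi_square = numerator / denominator

# With df = 1 the upper-tail probability is erfc(sqrt(X^2 / 2)).
p_value = math.erfc(math.sqrt(chi_square / 2))

print(f"X2(1, N = {n}) = {chi_square:.1f}, p < .001 is {p_value < 0.001}")
```

The report format lists the degrees of freedom and total sample size in parentheses, followed by the statistic and the p-value.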
![Page 47: Inference for Categorical Data](https://reader035.vdocuments.net/reader035/viewer/2022062410/56815ee7550346895dcd9238/html5/thumbnails/47.jpg)
47
![Page 48: Inference for Categorical Data](https://reader035.vdocuments.net/reader035/viewer/2022062410/56815ee7550346895dcd9238/html5/thumbnails/48.jpg)
48
Smoking among French Men
• Do these data show a relationship between education and smoking in French men?
![Page 49: Inference for Categorical Data](https://reader035.vdocuments.net/reader035/viewer/2022062410/56815ee7550346895dcd9238/html5/thumbnails/49.jpg)
49
![Page 50: Inference for Categorical Data](https://reader035.vdocuments.net/reader035/viewer/2022062410/56815ee7550346895dcd9238/html5/thumbnails/50.jpg)
50
![Page 51: Inference for Categorical Data](https://reader035.vdocuments.net/reader035/viewer/2022062410/56815ee7550346895dcd9238/html5/thumbnails/51.jpg)
51
The End
![Page 52: Inference for Categorical Data](https://reader035.vdocuments.net/reader035/viewer/2022062410/56815ee7550346895dcd9238/html5/thumbnails/52.jpg)
52
Benford’s Law, page 550
• Faking data?
![Page 53: Inference for Categorical Data](https://reader035.vdocuments.net/reader035/viewer/2022062410/56815ee7550346895dcd9238/html5/thumbnails/53.jpg)
53
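Problem 20.14

| Digit | Ratio | Observed |
|---|---|---|
| 1 | 0.301 | 6 |
| 2 | 0.176 | 4 |
| 3 | 0.125 | 6 |
| 4 | 0.097 | 7 |
| 5 | 0.079 | 3 |
| 6 | 0.067 | 5 |
| 7 | 0.058 | 6 |
| 8 | 0.051 | 4 |
| 9 | 0.046 | 4 |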
![Page 54: Inference for Categorical Data](https://reader035.vdocuments.net/reader035/viewer/2022062410/56815ee7550346895dcd9238/html5/thumbnails/54.jpg)
54
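| Digit | Ratio | Expected | Observed |
|---|---|---|---|
| 1 | 0.301 | 13.545 | 6 |
| 2 | 0.176 | 7.92 | 4 |
| 3 | 0.125 | 5.625 | 6 |
| 4 | 0.097 | 4.365 | 7 |
| 5 | 0.079 | 3.555 | 3 |
| 6 | 0.067 | 3.015 | 5 |
| 7 | 0.058 | 2.61 | 6 |
| 8 | 0.051 | 2.295 | 4 |
| 9 | 0.046 | 2.07 | 4 |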
![Page 55: Inference for Categorical Data](https://reader035.vdocuments.net/reader035/viewer/2022062410/56815ee7550346895dcd9238/html5/thumbnails/55.jpg)
55
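| Expected | Observed | (O − E)²/E |
|---|---|---|
| 13.545 | 6 | 4.20280731 |
| 7.92 | 4 | 1.94020202 |
| 5.625 | 6 | 0.025 |
| 4.365 | 7 | 1.59065865 |
| 3.555 | 3 | 0.08664557 |
| 3.015 | 5 | 1.30687396 |
| 2.61 | 6 | 4.40310345 |
| 2.295 | 4 | 1.26667756 |
| 2.07 | 4 | 1.7994686 |

Sum of the last column: 16.6214371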
![Page 56: Inference for Categorical Data](https://reader035.vdocuments.net/reader035/viewer/2022062410/56815ee7550346895dcd9238/html5/thumbnails/56.jpg)
56
Significance test
chitest p = 0.03430
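The Benford's Law calculation on the preceding slides is a goodness-of-fit test, and it can be reproduced step by step. The sketch below uses the rounded ratios from the slide and assumes df = number of digits − 1 = 8, which is consistent with the reported chitest value.

```python
import math

# Benford proportions for leading digits 1-9, rounded as on the slide.
ratios = [0.301, 0.176, 0.125, 0.097, 0.079, 0.067, 0.058, 0.051, 0.046]
observed = [6, 4, 6, 7, 3, 5, 6, 4, 4]

n = sum(observed)                    # 45 leading digits in total
expected = [r * n for r in ratios]   # expected count for each digit under Benford's Law

# Goodness-of-fit statistic: sum of (O - E)^2 / E over the nine digits.
chi_square = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
df = len(observed) - 1               # 8

# For even df the upper-tail probability has a closed form:
# exp(-y) * sum of y^k / k! for k = 0 .. df/2 - 1, with y = X^2 / 2.
y = chi_square / 2
p_value = math.exp(-y) * sum(y ** k / math.factorial(k) for k in range(df // 2))

print(round(chi_square, 4), df, round(p_value, 4))  # 16.6214 8 0.0343
```

Several expected counts here are below 5, so the chi-square approximation is rough and the p-value should be read with some caution.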
![Page 57: Inference for Categorical Data](https://reader035.vdocuments.net/reader035/viewer/2022062410/56815ee7550346895dcd9238/html5/thumbnails/57.jpg)
57
Example
• Survey2, Berk & Carey, page 261