Chi square test: dealing with a categorical dependent variable

Post on 21-Dec-2015


Chi Square Test

Dealing with a categorical dependent variable

So far:

Continuous DV
• Categorical IV: t-test, ANOVA
• Continuous IV: correlation, regression

Categorical DV
• Categorical IV: chi-square

Pearson Chi-Square:

• Works on frequencies: no mean and SD
• χ² statistic
• No assumption of normality: non-parametric test

Chi-Square test for goodness of fit

Is the frequency of balls of different colors equal in our bag?

Observed frequencies: 50, 30, 30, 10

Expected frequencies: 25%, 25%, 25%, 25%

Chi-Square test for goodness of fit

Observed frequencies: 50, 30, 30, 10 (total 120)

Expected frequencies under H0: 25%, 25%, 25%, 25% of 120 = 30, 30, 30, 30 (total 120)

Chi-Square test for goodness of fit

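Observed frequencies: 50, 30, 30, 10

Expected frequencies: 30, 30, 30, 30

$$\chi^2 = \sum \frac{(f_o - f_e)^2}{f_e}$$

$$\chi^2 = \frac{(50 - 30)^2}{30} + \frac{(30 - 30)^2}{30} + \frac{(30 - 30)^2}{30} + \frac{(10 - 30)^2}{30} = 26.67$$

• Difference: the numerator is the squared difference between observed and expected frequencies
• Normalize: dividing by the expected frequency puts each difference on a common scale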
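The calculation on this slide can be reproduced in a few lines. This is a minimal sketch using only the counts shown above; the function name `chi_square` is illustrative, not something from the slides.

```python
# Observed counts from the slide and expected counts under H0 (equal frequencies).
observed = [50, 30, 30, 10]
expected = [30, 30, 30, 30]

def chi_square(f_obs, f_exp):
    """Sum of (f_o - f_e)^2 / f_e over all categories."""
    return sum((o - e) ** 2 / e for o, e in zip(f_obs, f_exp))

chi2 = chi_square(observed, expected)
print(f"chi2 = {chi2:.2f}")  # prints "chi2 = 26.67"
```

Only the first and last categories contribute, since the two middle categories match their expected counts exactly.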

Chi-Square test for goodness of fit

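$$\chi^2 = 26.67$$

$$df = C - 1 = 4 - 1 = 3$$

Why C − 1: the total is fixed at 100%, so once three categories are set (25%, 25%, 25%) the fourth is no longer free; it is fixed at 25%.

Chi-square density with k degrees of freedom:

$$f(x; k) = \frac{(1/2)^{k/2}}{\Gamma(k/2)} \, x^{k/2 - 1} e^{-x/2}$$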

Chi-Square test for goodness of fit

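$$\chi^2 = 26.67$$

$$df = C - 1 = 4 - 1 = 3$$

Critical value (df = 3, α = 0.05) = 7.81

The observed value of 26.67 lies far beyond the critical value, so we reject H0.

$$\chi^2(3, n = 120) = 26.67, \; p < 0.001$$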
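As a rough check on the critical value and the p-value, the upper-tail area of the chi-square density can be approximated numerically. This is a sketch under stated assumptions: it integrates the standard chi-square density with a simple midpoint rule, and the function names and the `width` and `steps` parameters are illustrative choices, not part of the slides.

```python
import math

def chi2_pdf(x, k):
    """Chi-square density with k degrees of freedom, for x > 0."""
    return (0.5 ** (k / 2)) / math.gamma(k / 2) * x ** (k / 2 - 1) * math.exp(-x / 2)

def chi2_upper_tail(stat, k, width=100.0, steps=100_000):
    """Approximate P(X >= stat) with a midpoint rule on [stat, stat + width].

    The area beyond stat + width is ignored; it is negligible for the
    small degrees of freedom used here.
    """
    h = width / steps
    return h * sum(chi2_pdf(stat + (i + 0.5) * h, k) for i in range(steps))

p_critical = chi2_upper_tail(7.81, 3)    # area beyond the critical value
p_observed = chi2_upper_tail(26.67, 3)   # area beyond the observed statistic

print(f"P(X >= 7.81)  = {p_critical:.3f}")
print(f"P(X >= 26.67) = {p_observed:.2e}")
```

The area beyond 7.81 comes out close to 0.05, and the area beyond 26.67 is far below 0.001, in line with the reported result.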

Chi-Square test for goodness of fit

• The chi-square test for goodness of fit is like a one-sample t-test

• You can test your sample against any possible expected values, for example:

H0: 25%, 25%, 25%, 25%

H0: 10%, 10%, 10%, 70%
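Testing against different null hypotheses only changes how the expected counts are built. The sketch below converts hypothesized proportions into expected counts for the bag of 120 balls used earlier; `goodness_of_fit` is a hypothetical helper name.

```python
observed = [50, 30, 30, 10]

def goodness_of_fit(f_obs, proportions):
    """Chi-square statistic of observed counts against hypothesized proportions."""
    n = sum(f_obs)
    f_exp = [p * n for p in proportions]  # expected count = proportion x total
    return sum((o - e) ** 2 / e for o, e in zip(f_obs, f_exp))

chi2_equal = goodness_of_fit(observed, [0.25, 0.25, 0.25, 0.25])
chi2_skewed = goodness_of_fit(observed, [0.10, 0.10, 0.10, 0.70])

print(f"H0 equal:  chi2 = {chi2_equal:.2f}")
print(f"H0 skewed: chi2 = {chi2_skewed:.2f}")
```

Both hypotheses have the same df (C − 1 = 3), so the two statistics can be compared against the same critical value.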

Chi-Square test for independence

• When we have two or more sets of categorical data (IV and DV both categorical)

Observed frequencies (FO):

          None   Obama   McCain   Total
Male        10      50       35      95
Female      15      60       40     115
Total       25     110       75     210

Chi-Square test for independence

• Also called contingency table analysis

• H0: There is no relation between gender and voting preference (like correlation)

OR

• H0: There is no difference between the voting preferences of males and females (like t-test)

• The logic is the same as in the goodness of fit test: compare the observed frequencies with the frequencies we would expect if the two variables were independent

Chi-Square test for independence

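Observed frequencies (FO):

          None   Obama   McCain   Total
Male        10      50       35      95
Female      15      60       40     115
Total       25     110       75     210

Expected frequencies (FE) start from the column percentages of the total:

          None   Obama   McCain   Total
Total      12%     52%      36%    100%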

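Chi-Square test for independence

In case of independence, both rows follow the same percentages:

          None   Obama   McCain   Total
Male       12%     52%      36%      95
Female     12%     52%      36%     115
Total      12%     52%      36%    100%

Finally, applying these percentages to each row total gives the expected frequencies (FE):

          None   Obama   McCain
Male      11.4    49.4     34.2
Female    13.8    59.8     41.4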

Chi-Square test for independence

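• Another way: multiply the row total by the column total and divide by the grand total.

$$f_e = \frac{f_c \times f_r}{n} = \frac{\text{column total} \times \text{row total}}{\text{total}}$$

For the Male / None cell (row total 95, column total 25, grand total 210):

$$f_e = \frac{95 \times 25}{210}$$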
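The row-total times column-total rule can be applied to the whole table at once. This is a minimal sketch using the gender by vote counts from the slides; `expected_frequencies` is an illustrative name.

```python
observed = [
    [10, 50, 35],   # Male:   None, Obama, McCain
    [15, 60, 40],   # Female: None, Obama, McCain
]

def expected_frequencies(table):
    """Expected count for each cell: row total x column total / grand total."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    return [[r * c / n for c in col_totals] for r in row_totals]

expected = expected_frequencies(observed)
for row in expected:
    print([round(v, 1) for v in row])
```

Computed this way, without rounding the percentages first, the expected counts differ slightly from the slide's values (for example about 11.3 rather than 11.4 for the Male / None cell).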

Chi-Square test for independence

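• Now we can calculate the chi-square value:

Expected frequencies (FE):

          None   Obama   McCain
Male      11.4    49.4     34.2
Female    13.8    59.8     41.4

Observed frequencies (FO):

          None   Obama   McCain
Male        10      50       35
Female      15      60       40

$$\chi^2 = \sum \frac{(f_o - f_e)^2}{f_e}$$

$$\chi^2 = \frac{(10 - 11.4)^2}{11.4} + \frac{(15 - 13.8)^2}{13.8} + \dots = 0.35$$

$$df = (C - 1) \times (R - 1) = (3 - 1) \times (2 - 1) = 2$$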
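The full test statistic and its degrees of freedom can be sketched as follows. The function name is hypothetical, and the expected counts are computed from unrounded marginal totals, so the statistic comes out near 0.34 rather than the 0.35 obtained from the rounded percentages on the slides.

```python
observed = [
    [10, 50, 35],   # Male:   None, Obama, McCain
    [15, 60, 40],   # Female: None, Obama, McCain
]

def chi_square_independence(table):
    """Return (chi2, df) for a contingency table given as a list of rows."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, f_o in enumerate(row):
            f_e = row_totals[i] * col_totals[j] / n  # expected under independence
            chi2 += (f_o - f_e) ** 2 / f_e
    df = (len(col_totals) - 1) * (len(row_totals) - 1)
    return chi2, df

chi2, df = chi_square_independence(observed)
print(f"chi2 = {chi2:.3f}, df = {df}")
```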

Chi-Square test for independence

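$$df = (C - 1) \times (R - 1) = (3 - 1) \times (2 - 1) = 2$$

With the marginal totals given, only two cells of FE are free to vary; the others are fixed by the totals:

          None   Obama   McCain   Total
Male      11.4    49.4    Fixed      95
Female   Fixed   Fixed    Fixed     115
Total       25     110       75     210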

Chi-Square test for independence

$$\chi^2(2, n = 210) = 0.35, \; p = 0.84$$

There is no significant effect of gender on vote preference

Or

We cannot reject the null hypothesis that gender and vote preference are independent
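For df = 2 the chi-square upper-tail probability has a simple closed form, exp(−χ²/2), so the p-value can be checked directly. This is a small sketch; the 0.05 threshold is the conventional significance level, assumed here because the slides do not state one.

```python
import math

chi2 = 0.35   # statistic reported on the slide
alpha = 0.05  # assumed conventional significance level

# With 2 degrees of freedom, P(X >= x) = exp(-x / 2).
p_value = math.exp(-chi2 / 2)
reject_h0 = p_value < alpha

print(f"p = {p_value:.2f}, reject H0: {reject_h0}")
```

The p-value is far above the threshold, which matches the conclusion that the null hypothesis of independence cannot be rejected.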

Effect size in Chi square

• For a 2 x 2 table -> Phi coefficient

• For larger tables -> Cramer's V coefficient

$$\phi = \sqrt{\frac{\chi^2}{n}}$$

$$V = \sqrt{\frac{\chi^2}{n \times df^*}}$$

• Both measure the correlation between two categorical variables

• df* is the smaller of C − 1 and R − 1

• Phi of 0.1 is small, 0.3 medium, 0.5 large
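Applied to the gender by vote example, the effect size formulas can be sketched as below. The function names are illustrative, and the inputs (χ² = 0.35, n = 210, a 2 x 3 table) are the values reported in the earlier slides.

```python
import math

def phi_coefficient(chi2, n):
    """Effect size for a 2 x 2 table."""
    return math.sqrt(chi2 / n)

def cramers_v(chi2, n, n_rows, n_cols):
    """Effect size for larger tables; df* is the smaller of R - 1 and C - 1."""
    df_star = min(n_rows - 1, n_cols - 1)
    return math.sqrt(chi2 / (n * df_star))

v = cramers_v(0.35, 210, n_rows=2, n_cols=3)
print(f"Cramer's V = {v:.3f}")
```

The result is well below 0.1, which is consistent with the non-significant test.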

Assumptions of Chi Square

• Independence of observations: each subject falls in only one category

• Size of expected frequencies: be cautious with small cell frequencies

• No assumption of normality: non-parametric test

Likelihood ratio test: an alternative

• Instead of using chi-square, when dealing with categorical data we can calculate the log likelihood ratio:

$$G = 2 \sum f_o \ln\left(\frac{f_o}{f_e}\right)$$

• It is based on the ratio of observed to expected frequencies

Likelihood ratio test: an alternative

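$$G = 2 \sum f_o \ln\left(\frac{f_o}{f_e}\right)$$

Expected frequencies (FE):

          None   Obama   McCain
Male      11.4    49.4     34.2
Female    13.8    59.8     41.4

Observed frequencies (FO):

          None   Obama   McCain
Male        10      50       35
Female      15      60       40

$$G = 2 \times \left(10 \times \ln\frac{10}{11.4} + 15 \times \ln\frac{15}{13.8} + \dots\right) = 0.355$$

• G follows a chi-square distribution with df of (R − 1)(C − 1)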
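The G statistic can be checked with the observed and expected tables shown above. This is a minimal sketch that takes the slide's rounded expected frequencies as given; `g_statistic` is an illustrative name.

```python
import math

observed = [[10, 50, 35], [15, 60, 40]]
expected = [[11.4, 49.4, 34.2], [13.8, 59.8, 41.4]]  # rounded FE from the slide

def g_statistic(f_obs, f_exp):
    """G = 2 * sum of f_o * ln(f_o / f_e) over all cells."""
    total = 0.0
    for obs_row, exp_row in zip(f_obs, f_exp):
        for f_o, f_e in zip(obs_row, exp_row):
            total += f_o * math.log(f_o / f_e)
    return 2 * total

g = g_statistic(observed, expected)
print(f"G = {g:.3f}")
```

The value is very close to the Pearson chi-square of 0.35 for the same table, as expected when observed and expected frequencies are similar.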

Chi Square test with rank ordered data

• Rank order your data for the two variables
• Get the correlation of the two variables: Spearman r
• Calculate chi-square as follows:

$$M^2 = (N - 1) r^2$$

Frequencies by anxiety level:

        Anxiety Level
          1      2      3
1        10     50     35
2        15     60     40

Rank-ordered data, one row per case:

S   A   G
1   1   1
2   3   2
3   2   2
4   2   1
5   1   1
6   2   1
7   1   2
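The three steps can be sketched on the small seven-row table above. Reading its columns as subject (S), anxiety level (A), and group (G) is an assumption, since the slide does not spell out the headers; the helper names are also illustrative. Spearman's r is computed as the Pearson correlation of the ranks, with tied values sharing their average rank.

```python
import math

def average_ranks(values):
    """Rank values from 1..N, giving tied values the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        mean_rank = (start + end) / 2 + 1  # positions are 0-based, ranks 1-based
        for pos in range(start, end + 1):
            ranks[order[pos]] = mean_rank
        start = end + 1
    return ranks

def pearson_r(x, y):
    """Pearson correlation of two equal-length sequences."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

# Columns A and G of the table above (interpretation assumed).
anxiety = [1, 3, 2, 2, 1, 2, 1]
group = [1, 2, 2, 1, 1, 1, 2]

r = pearson_r(average_ranks(anxiety), average_ranks(group))  # Spearman r
m2 = (len(anxiety) - 1) * r ** 2                             # M^2 = (N - 1) r^2

print(f"Spearman r = {r:.3f}, M^2 = {m2:.3f}")
```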
