t10 statistical analysis

Statistical Analysis

By Rama Krishna Kompella

Relationships Between Variables

• The relationship between variables can be explained in various ways, such as:
  – Presence/absence of a relationship
  – Directionality of the relationship
  – Strength of association
  – Type of relationship


• Presence/absence of a relationship
  – E.g., if we want to study the customer satisfaction levels of a fast-food restaurant, we need to know whether the quality of food and customer satisfaction are related at all


• Direction of the relationship
  – The direction of a relationship can be either positive or negative
  – E.g., food quality perceptions are related positively to customer commitment toward a restaurant


• Strength of association
  – Associations are generally categorized as nonexistent, weak, moderate, or strong
  – E.g., quality of food is strongly associated with customer satisfaction in a fast-food restaurant


• Type of association
  – How can the link between Y and X best be described?
  – There are different ways in which two variables can share a relationship
    • Linear relationship
    • Curvilinear relationship
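The difference between a linear and a curvilinear relationship can be illustrated with a small sketch. The data below are hypothetical, and the helper `pearson_r` is written here only for illustration; it is not part of the slides. It suggests why a measure of linear association can miss a perfectly regular curved relationship.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

# Hypothetical data, chosen only for illustration.
x = [-3, -2, -1, 0, 1, 2, 3]
linear = [2 * v + 1 for v in x]      # straight-line relationship
curvilinear = [v ** 2 for v in x]    # U-shaped relationship

r_linear = pearson_r(x, linear)
r_curved = pearson_r(x, curvilinear)
print(round(r_linear, 3), round(r_curved, 3))
```

In this sketch the straight-line data give a correlation of essentially 1, while the U-shaped data give a correlation of essentially 0 even though Y is completely determined by X.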

Chi-Square (χ2) and Frequency Data

• Today the data that we analyze consists of frequencies; that is, the number of individuals falling into categories. In other words, the variables are measured on a nominal scale.

• The test statistic for frequency data is Pearson Chi-Square. The magnitude of Pearson Chi-Square reflects the amount of discrepancy between observed frequencies and expected frequencies.

Steps in Test of Hypothesis
1. Determine the appropriate test
2. Establish the level of significance: α
3. Formulate the statistical hypothesis
4. Calculate the test statistic
5. Determine the degrees of freedom
6. Compare computed test statistic against a tabled/critical value

1. Determine Appropriate Test
• Chi-square is used when both variables are measured on a nominal scale.
• It can be applied to interval or ratio data that have been categorized into a small number of groups.

• It assumes that the observations are randomly sampled from the population.

• All observations are independent (an individual can appear only once in a table and there are no overlapping categories).

• It does not make any assumptions about the shape of the distribution nor about the homogeneity of variances.

2. Establish Level of Significance

• α is a predetermined value
• The convention
  – α = .05
  – α = .01
  – α = .001

3. Determine The Hypothesis: Whether There is an Association or Not
• Ho : The two variables are independent
• Ha : The two variables are associated

4. Calculating Test Statistics
• Contrasts observed frequencies in each cell of a contingency table with expected frequencies.
• The expected frequencies represent the number of cases that would be found in each cell if the null hypothesis were true (i.e., the nominal variables are unrelated).
• If the two variables are unrelated, the expected frequency of a cell is the product of its row frequency and its column frequency divided by the number of cases.

$$F_e = \frac{F_r F_c}{N}$$
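A minimal sketch of the expected-frequency formula, assuming the contingency table is stored as a list of rows. The function name `expected_frequencies` and the 2 x 2 counts are hypothetical and serve only to illustrate the rule Fe = Fr Fc / N.

```python
def expected_frequencies(observed):
    """Return the table of expected counts Fe = Fr * Fc / N.

    `observed` is a list of rows, each a list of cell counts.
    """
    row_totals = [sum(row) for row in observed]          # Fr for each row
    col_totals = [sum(col) for col in zip(*observed)]    # Fc for each column
    n = sum(row_totals)                                  # N, number of cases
    return [[fr * fc / n for fc in col_totals] for fr in row_totals]

# Hypothetical 2 x 2 table, used only to illustrate the formula.
observed = [[20, 30],
            [30, 20]]
expected = expected_frequencies(observed)
print(expected)
```

Because every row and column total is 50 out of 100 cases, each expected count in this sketch is 50 * 50 / 100 = 25.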

$$\chi^2 = \sum \frac{(F_o - F_e)^2}{F_e}$$

where F_o is the observed frequency and F_e is the expected frequency of a cell, and the sum runs over all cells of the table.
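The statistic can be sketched in a few lines, assuming the observed and expected counts are already available as two tables of the same shape. The function name `chi_square` and the counts are hypothetical.

```python
def chi_square(observed, expected):
    """Sum (Fo - Fe)^2 / Fe over all cells of two same-shaped tables."""
    total = 0.0
    for obs_row, exp_row in zip(observed, expected):
        for fo, fe in zip(obs_row, exp_row):
            total += (fo - fe) ** 2 / fe
    return total

# Hypothetical observed counts and their expected counts under independence.
observed = [[20, 30], [30, 20]]
expected = [[25.0, 25.0], [25.0, 25.0]]
statistic = chi_square(observed, expected)
print(statistic)
```

Each of the four cells contributes (5)^2 / 25 = 1, so the larger the gap between observed and expected counts, the larger the statistic.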

5. Determine Degrees of Freedom

df = (R-1)(C-1)

where R is the number of levels in the row variable and C is the number of levels in the column variable.

6. Compare computed test statistic against a tabled/critical value

• The computed value of the Pearson chi-square statistic is compared with the critical value to determine if the computed value is improbable

• The critical tabled values are based on sampling distributions of the Pearson chi-square statistic

• If the calculated χ2 is greater than the tabled χ2 value, reject Ho
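Steps 5 and 6 can be sketched together. The critical values below are the standard tabled chi-square values for α = .05 and small df. The names `CRITICAL_05`, `degrees_of_freedom`, and `reject_null` are hypothetical, and the statistic of 4.0 is an illustrative number, not a result from the slides.

```python
# Tabled chi-square critical values at alpha = .05 for df = 1 to 4.
CRITICAL_05 = {1: 3.841, 2: 5.991, 3: 7.815, 4: 9.488}

def degrees_of_freedom(n_rows, n_cols):
    """df = (R - 1)(C - 1) for an R x C contingency table."""
    return (n_rows - 1) * (n_cols - 1)

def reject_null(statistic, n_rows, n_cols, critical=CRITICAL_05):
    """True if the computed statistic exceeds the tabled critical value."""
    df = degrees_of_freedom(n_rows, n_cols)
    return statistic > critical[df]

# The same hypothetical statistic leads to different decisions
# depending on the size of the table.
print(reject_null(4.0, 2, 2))   # df = 1, critical value 3.841
print(reject_null(4.0, 2, 3))   # df = 2, critical value 5.991
```

The sketch suggests why the degrees of freedom must be determined before the comparison: the critical value grows with the size of the table.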

Example

• Suppose a researcher is interested in buying preferences of environmentally conscious consumers.

• A questionnaire was developed and sent to a random sample of 90 voters.

• The researcher also collects information about the gender of the sample of 90 respondents.

Bivariate Frequency Table or Contingency Table

             Favor   Neutral   Oppose   f row
Male           10       10       30       50
Female         15       15       10       40
f column       25       25       40     n = 90

The cell counts are the observed frequencies. The "f row" column gives the row frequencies and the "f column" row gives the column frequencies.

1. Determine Appropriate Test

1. Gender (2 levels), nominal
2. Buying preference (3 levels), nominal

2. Establish Level of Significance

Alpha of .05

3. Determine The Hypothesis

• Ho : There is no difference between men and women in their opinion on pro-environmental products.

• Ha : There is an association between gender and opinion on pro-environmental products.



4. Calculating Test Statistics

             Favor        Neutral      Oppose       f row
Men          fo = 10      fo = 10      fo = 30        50
             fe = 13.9    fe = 13.9    fe = 22.2
Women        fo = 15      fo = 15      fo = 10        40
             fe = 11.1    fe = 11.1    fe = 17.8
f column       25           25           40         n = 90

For example, for men in a column with frequency 25, fe = 50*25/90 = 13.9; for women in the same column, fe = 40*25/90 = 11.1.


$$\chi^2 = \frac{(10 - 13.89)^2}{13.89} + \frac{(10 - 13.89)^2}{13.89} + \frac{(30 - 22.2)^2}{22.2} + \frac{(15 - 11.11)^2}{11.11} + \frac{(15 - 11.11)^2}{11.11} + \frac{(10 - 17.8)^2}{17.8} = 11.03$$
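The calculation can be checked with a short sketch that uses the observed counts from the gender by opinion table and unrounded expected frequencies. The variable names are hypothetical. It also records how much each cell contributes to the total.

```python
# Observed frequencies from the gender by opinion contingency table.
observed = {
    "Male":   {"Favor": 10, "Neutral": 10, "Oppose": 30},
    "Female": {"Favor": 15, "Neutral": 15, "Oppose": 10},
}

row_totals = {g: sum(cells.values()) for g, cells in observed.items()}
col_totals = {}
for cells in observed.values():
    for opinion, count in cells.items():
        col_totals[opinion] = col_totals.get(opinion, 0) + count
n = sum(row_totals.values())

# Contribution of each cell: (fo - fe)^2 / fe, with fe = row * column / n.
contributions = {}
for g, cells in observed.items():
    for opinion, fo in cells.items():
        fe = row_totals[g] * col_totals[opinion] / n
        contributions[(g, opinion)] = (fo - fe) ** 2 / fe

chi_square = sum(contributions.values())
largest = max(contributions, key=contributions.get)
print(round(chi_square, 3), largest)
```

With unrounded expected frequencies the total comes to about 11.025, which rounds to the 11.03 shown on the slide. The largest single contribution comes from the Female/Oppose cell.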

5. Determine Degrees of Freedom

df = (R-1)(C-1) =(2-1)(3-1) = 2

6. Compare computed test statistic against a tabled/critical value

• α = 0.05
• df = 2
• Critical tabled value = 5.991
• Test statistic, 11.03, exceeds critical value
• Null hypothesis is rejected
• Men and women differ significantly in their opinions on pro-environmental products

SPSS Output Example

Chi-Square Tests

                                Value      df   Asymp. Sig. (2-sided)
Pearson Chi-Square              11.025a     2          .004
Likelihood Ratio                11.365      2          .003
Linear-by-Linear Association     8.722      1          .003
N of Valid Cases                    90

a. 0 cells (.0%) have expected count less than 5. The minimum expected count is 11.11.
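The significance value reported for the Pearson chi-square can be approximated by hand in this particular case. The sketch below relies on a property of the chi-square distribution that is not stated in the slides: for df = 2 only, the upper-tail probability has the closed form exp(-x/2). The function name `p_value_df2` is hypothetical.

```python
import math

def p_value_df2(statistic):
    """Upper-tail probability of a chi-square variable with df = 2.

    Valid for df = 2 only, where the survival function is exp(-x / 2).
    """
    return math.exp(-statistic / 2)

p = p_value_df2(11.025)          # Pearson chi-square value from the SPSS output
p_critical = p_value_df2(5.991)  # tabled critical value at alpha = .05
print(round(p, 3), round(p_critical, 3))
```

The first value agrees with the .004 in the output to three decimals, and the second is approximately .05, which is how the tabled critical value and the significance level relate. Other degrees of freedom need a general chi-square distribution function from a statistics library.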

Additional Information in SPSS Output

• Exceptions that might distort χ2 assumptions
  – Associations in some but not all categories
  – Low expected frequency per cell
• Extent of association is not the same as statistical significance (demonstrated through an example)
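The last point can be sketched with two hypothetical 2 x 2 tables that have identical proportions but different sample sizes. Strength is measured here with phi, taken as the square root of χ2 / N. The tables and function names are assumptions made for illustration only.

```python
import math

def chi_square_2x2(table):
    """Pearson chi-square and sample size for a 2 x 2 table of counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, fo in enumerate(row):
            fe = row_totals[i] * col_totals[j] / n
            stat += (fo - fe) ** 2 / fe
    return stat, n

def phi(table):
    """Strength of association: sqrt(chi-square / N)."""
    stat, n = chi_square_2x2(table)
    return math.sqrt(stat / n)

small = [[12, 8], [8, 12]]                        # hypothetical, N = 40
large = [[c * 10 for c in row] for row in small]  # same proportions, N = 400

chi_small, _ = chi_square_2x2(small)
chi_large, _ = chi_square_2x2(large)
print(round(chi_small, 2), round(phi(small), 2))
print(round(chi_large, 2), round(phi(large), 2))
```

In this sketch the small table gives a χ2 of about 1.6, below the df = 1 critical value of 3.841, while the large table gives about 16 and is significant. The strength of association is about .2 in both, so significance reflects sample size as well as strength.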

Another Example: Heparin Lock Placement

Complication Incidence * Heparin Lock Placement Time Group Crosstabulation

                                                                    Heparin Lock Placement Time Group
Complication Incidence                                                  1          2        Total
Had complications      Count                                            9         11          20
                       Expected Count                                 10.0       10.0        20.0
                       % within Heparin Lock Placement Time Group     18.0%      22.0%       20.0%
Had NO complications   Count                                           41         39          80
                       Expected Count                                 40.0       40.0        80.0
                       % within Heparin Lock Placement Time Group     82.0%      78.0%       80.0%
Total                  Count                                           50         50         100
                       Expected Count                                 50.0       50.0       100.0
                       % within Heparin Lock Placement Time Group    100.0%     100.0%      100.0%

from Polit Text: Table 8-1

Time: 1 = 72 hrs, 2 = 96 hrs

Hypotheses in Heparin Lock Placement

• Ho: There is no association between complication incidence and heparin lock placement time. (The variables are independent.)

• Ha: There is an association between complication incidence and heparin lock placement time. (The variables are related.)

More of SPSS Output

Pearson Chi-Square

• Pearson Chi-Square = .250, p = .617

Since p > .05, we fail to reject the null hypothesis that the complication rate is unrelated to heparin lock placement time.

• Continuity correction is used in situations in which the expected frequency for any cell in a 2 by 2 table is less than 10.
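A minimal sketch of the uncorrected and continuity-corrected statistics for the 2 x 2 crosstabulation above. It assumes the usual Yates form of the correction, in which 0.5 is subtracted from each absolute difference before squaring, and uses the fact that for df = 1 the upper-tail probability equals erfc(sqrt(x/2)). Function names are hypothetical.

```python
import math

def chi_square_2x2(observed, yates=False):
    """Pearson chi-square for a 2 x 2 table, optionally with Yates' correction."""
    row_totals = [sum(r) for r in observed]
    col_totals = [sum(c) for c in zip(*observed)]
    n = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(observed):
        for j, fo in enumerate(row):
            fe = row_totals[i] * col_totals[j] / n
            diff = abs(fo - fe)
            if yates:
                # Assumed form of the correction: shrink |fo - fe| by 0.5.
                diff = max(diff - 0.5, 0.0)
            stat += diff ** 2 / fe
    return stat

def p_value_df1(stat):
    """Upper-tail chi-square probability for df = 1 only."""
    return math.erfc(math.sqrt(stat / 2))

table = [[9, 11], [41, 39]]   # counts from the crosstabulation
plain = chi_square_2x2(table)
corrected = chi_square_2x2(table, yates=True)
print(round(plain, 3), round(p_value_df1(plain), 3), round(corrected, 4))
```

The uncorrected value and its p-value agree with the reported .250 and .617. The corrected statistic is smaller, which makes the test more conservative.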

More SPSS Output

Symmetric Measures

                                               Value   Asymp. Std. Error(a)   Approx. T(b)   Approx. Sig.
Nominal by Nominal     Phi                     -.050                                             .617
                       Cramer's V               .050                                             .617
Interval by Interval   Pearson's R             -.050          .100               -.496           .621c
Ordinal by Ordinal     Spearman Correlation    -.050          .100               -.496           .621c
N of Valid Cases                                 100

a. Not assuming the null hypothesis.
b. Using the asymptotic standard error assuming the null hypothesis.
c. Based on normal approximation.

Phi Coefficient

• Pearson Chi-Square provides information about the existence of relationship between 2 nominal variables, but not about the magnitude of the relationship

• Phi coefficient is the measure of the strength of the association

In the Symmetric Measures output for this example, Phi = -.050 (N of valid cases = 100).

$$\phi = \sqrt{\frac{\chi^2}{N}}$$
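The phi value in the output can be reproduced with a short sketch. The first function applies the square root of χ2 / N, which only gives the magnitude. The second uses the standard cross-product formula for a 2 x 2 table, an addition not shown on the slides, to recover the sign. Function names are hypothetical.

```python
import math

def phi_from_chi_square(chi_square, n):
    """Magnitude of phi for a 2 x 2 table: sqrt(chi-square / N)."""
    return math.sqrt(chi_square / n)

def signed_phi(table):
    """Signed phi from the four cell counts of a 2 x 2 table."""
    (a, b), (c, d) = table
    numerator = a * d - b * c
    denominator = math.sqrt((a + b) * (c + d) * (a + c) * (b + d))
    return numerator / denominator

magnitude = phi_from_chi_square(0.250, 100)   # values from the SPSS output
signed = signed_phi([[9, 11], [41, 39]])      # counts from the crosstabulation
print(round(magnitude, 3), round(signed, 3))
```

Both routes give .05 in absolute value. The sign depends only on how the rows and columns happen to be ordered, so it carries no meaning for nominal variables.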

Cramer’s V

• When the table is larger than 2 by 2, a different index must be used to measure the strength of the relationship between the variables. One such index is Cramer’s V.

• If Cramer’s V is large, it means that there is a tendency for particular categories of the first variable to be associated with particular categories of the second variable.

In the Symmetric Measures output for this example, Cramer's V = .050 (N of valid cases = 100).

$$V = \sqrt{\frac{\chi^2}{N(k-1)}}$$

where N is the number of cases and k is the smallest of the number of rows or columns.
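A minimal sketch of Cramer's V, applied to the two examples in these slides using the chi-square values reported in the SPSS output. The function name `cramers_v` is hypothetical.

```python
import math

def cramers_v(chi_square, n, n_rows, n_cols):
    """Cramer's V = sqrt(chi-square / (N * (k - 1))), k = min(rows, columns)."""
    k = min(n_rows, n_cols)
    return math.sqrt(chi_square / (n * (k - 1)))

# Gender (2 levels) by opinion (3 levels): chi-square = 11.025, N = 90.
v_gender = cramers_v(11.025, 90, 2, 3)

# Complication incidence by time group (2 x 2): chi-square = .250, N = 100.
v_heparin = cramers_v(0.250, 100, 2, 2)

print(round(v_gender, 3), round(v_heparin, 3))
```

For the 2 x 2 table k - 1 = 1, so V reduces to the magnitude of phi (.050). For the gender example V comes to about .35, a noticeably stronger association.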

Q & As
