t10 statistical analysis
TRANSCRIPT
Statistical Analysis
By Rama Krishna Kompella
Relationships Between Variables
• The relationship between variables can be explained in various ways, such as:
– Presence / absence of a relationship
– Directionality of the relationship
– Strength of association
– Type of relationship
Relationships Between Variables
• Presence / absence of a relationship
– E.g., if we want to study the customer satisfaction levels of a fast-food restaurant, we need to know whether the quality of food and customer satisfaction are related
Relationships Between Variables
• Direction of the relationship
– The direction of a relationship can be either positive or negative
– E.g., food quality perceptions are related positively to customer commitment toward a restaurant.
Relationships Between Variables
• Strength of association
– Associations are generally categorized as nonexistent, weak, moderate, or strong.
– E.g., quality of food is strongly associated with customer satisfaction in a fast-food restaurant
Relationships Between Variables
• Type of association
– How can the link between Y and X best be described?
– There are different ways in which two variables can share a relationship
  • Linear relationship
  • Curvilinear relationship
Chi-Square (χ²) and Frequency Data
• Today the data that we analyze consists of frequencies; that is, the number of individuals falling into categories. In other words, the variables are measured on a nominal scale.
• The test statistic for frequency data is Pearson Chi-Square. The magnitude of Pearson Chi-Square reflects the amount of discrepancy between observed frequencies and expected frequencies.
Steps in Test of Hypothesis
1. Determine the appropriate test
2. Establish the level of significance: α
3. Formulate the statistical hypothesis
4. Calculate the test statistic
5. Determine the degrees of freedom
6. Compare computed test statistic against a tabled/critical value
1. Determine Appropriate Test
• Chi Square is used when both variables are measured on a nominal scale.
• It can be applied to interval or ratio data that have been categorized into a small number of groups.
• It assumes that the observations are randomly sampled from the population.
• All observations are independent (an individual can appear only once in a table and there are no overlapping categories).
• It does not make any assumptions about the shape of the distribution nor about the homogeneity of variances.
2. Establish Level of Significance
• α is a predetermined value
• The conventional values are:
– α = .05
– α = .01
– α = .001
3. Determine The Hypothesis: Whether There is an Association or Not
• Ho : The two variables are independent
• Ha : The two variables are associated
4. Calculating Test Statistics
• Contrasts observed frequencies in each cell of a contingency table with expected frequencies.
• The expected frequencies represent the number of cases that would be found in each cell if the null hypothesis were true (i.e., the nominal variables are unrelated).
• The expected frequency of a cell when the variables are unrelated is the product of its row and column frequencies divided by the number of cases.
Fe = (Fr × Fc) / N
4. Calculating Test Statistics
χ² = Σ (Fo − Fe)² / Fe
where Fo = observed frequency and Fe = expected frequency of each cell, summed over all cells
5. Determine Degrees of Freedom
df = (R-1)(C-1)
where R = number of levels in the row variable and C = number of levels in the column variable
6. Compare computed test statistic against a tabled/critical value
• The computed value of the Pearson chi-square statistic is compared with the critical value to determine if the computed value is improbable
• The critical tabled values are based on sampling distributions of the Pearson chi-square statistic
• If the calculated χ² is greater than the tabled χ² value, reject Ho
Example
• Suppose a researcher is interested in buying preferences of environmentally conscious consumers.
• A questionnaire was developed and sent to a random sample of 90 consumers.
• The researcher also collects information about the gender of the sample of 90 respondents.
Bivariate Frequency Table or Contingency Table
            Favor    Neutral    Oppose    f row
Male        10       10         30        50
Female      15       15         10        40
f column    25       25         40        n = 90
• The cell counts are the observed frequencies; "f row" gives the row frequencies and "f column" gives the column frequencies.
1. Determine Appropriate Test
• Gender (2 levels), nominal
• Buying preference (3 levels), nominal
• Both variables are nominal, so the chi-square test is appropriate
2. Establish Level of Significance
Alpha of .05
3. Determine The Hypothesis
• Ho : There is no difference between men and women in their opinion on pro-environmental products.
• Ha : There is an association between gender and opinion on pro-environmental products.
4. Calculating Test Statistics
            Favor        Neutral      Oppose       f row
Men         fo = 10      fo = 10      fo = 30      50
            fe = 13.9    fe = 13.9    fe = 22.2
Women       fo = 15      fo = 15      fo = 10      40
            fe = 11.1    fe = 11.1    fe = 17.8
f column    25           25           40           n = 90
• Expected frequency for Men / Favor: fe = 50*25/90 = 13.9
• Expected frequency for Women / Favor: fe = 40*25/90 = 11.1
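The expected frequencies in the table can be checked with a few lines of Python. This is a minimal sketch, not part of the original slides; the function name `expected_frequencies` and the variable names are illustrative assumptions.

```python
# Observed frequencies from the contingency table
# (rows: Men, Women; columns: Favor, Neutral, Oppose).
observed = [
    [10, 10, 30],
    [15, 15, 10],
]


def expected_frequencies(table):
    """Return Fe = Fr * Fc / N for every cell of a contingency table."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    return [[fr * fc / n for fc in col_totals] for fr in row_totals]


expected = expected_frequencies(observed)
for label, row in zip(["Men", "Women"], expected):
    print(label, [round(fe, 1) for fe in row])
```

Rounded to one decimal, the values agree with the fe entries in the table (13.9, 13.9, 22.2 for men and 11.1, 11.1, 17.8 for women).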
4. Calculating Test Statistics
χ² = (10 − 13.89)²/13.89 + (10 − 13.89)²/13.89 + (30 − 22.2)²/22.2 + (15 − 11.11)²/11.11 + (15 − 11.11)²/11.11 + (10 − 17.8)²/17.8
= 11.03
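The same sum can be reproduced in Python as a check on the hand calculation. This is a sketch under the assumption that unrounded expected frequencies are used; the function name `chi_square` is illustrative and not from the slides.

```python
# Observed frequencies (rows: Men, Women; columns: Favor, Neutral, Oppose).
observed = [
    [10, 10, 30],
    [15, 15, 10],
]


def chi_square(table):
    """Pearson chi-square: sum over all cells of (Fo - Fe)^2 / Fe."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    statistic = 0.0
    for i, row in enumerate(table):
        for j, fo in enumerate(row):
            fe = row_totals[i] * col_totals[j] / n  # expected frequency
            statistic += (fo - fe) ** 2 / fe
    return statistic


chi2 = chi_square(observed)
print(round(chi2, 3))
```

With unrounded expected frequencies the statistic is 11.025, which rounds to the 11.03 reported here.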
5. Determine Degrees of Freedom
df = (R-1)(C-1) =(2-1)(3-1) = 2
6. Compare computed test statistic against a tabled/critical value
• α = 0.05
• df = 2
• Critical tabled value = 5.991
• Test statistic, 11.03, exceeds critical value
• Null hypothesis is rejected
• Men and women differ significantly in their opinions on pro-environmental products
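The tabled critical value can be reproduced without a table in this particular case. For df = 2 only, the upper-tail probability of the chi-square distribution is exp(−x/2), so the critical value is −2·ln(α). The sketch below relies on that special case; the helper name `critical_value_df2` is an assumption for illustration, and other degrees of freedom would need a table or a statistics library.

```python
import math


def critical_value_df2(alpha):
    """Critical chi-square value for df = 2 only: solves exp(-x / 2) = alpha."""
    return -2.0 * math.log(alpha)


alpha = 0.05
chi2_computed = 11.03  # test statistic from the worked example
critical = critical_value_df2(alpha)
reject_h0 = chi2_computed > critical

print(round(critical, 3), reject_h0)
```

The computed critical value rounds to 5.991, and the comparison leads to rejecting Ho at α = .05.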
SPSS Output Example
Chi-Square Tests
                                Value      df    Asymp. Sig. (2-sided)
Pearson Chi-Square              11.025(a)  2     .004
Likelihood Ratio                11.365     2     .003
Linear-by-Linear Association    8.722      1     .003
N of Valid Cases                90
a. 0 cells (.0%) have expected count less than 5. The minimum expected count is 11.11.
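The first two rows of this output can be reproduced by hand. The sketch below is an illustration rather than SPSS's own procedure: it assumes the usual definition of the likelihood ratio statistic, 2·Σ Fo·ln(Fo/Fe), and uses the df = 2 special case in which the upper-tail probability is exp(−x/2). The helper name `p_value_df2` is hypothetical.

```python
import math

# Observed frequencies (rows: Men, Women; columns: Favor, Neutral, Oppose).
observed = [[10, 10, 30], [15, 15, 10]]
row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
n = sum(row_totals)

pearson = 0.0
likelihood_ratio = 0.0
for i, row in enumerate(observed):
    for j, fo in enumerate(row):
        fe = row_totals[i] * col_totals[j] / n  # expected frequency
        pearson += (fo - fe) ** 2 / fe
        likelihood_ratio += 2.0 * fo * math.log(fo / fe)


def p_value_df2(statistic):
    """Upper-tail probability of a chi-square variable, valid for df = 2 only."""
    return math.exp(-statistic / 2.0)


print(round(pearson, 3), round(p_value_df2(pearson), 3))
print(round(likelihood_ratio, 3), round(p_value_df2(likelihood_ratio), 3))
```

The rounded results agree with the Pearson and Likelihood Ratio rows of the output (values near 11.025 and 11.365, with significance of about .004 and .003).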
Additional Information in SPSS Output
• Exceptions that might distort χ² assumptions
– Associations in some but not all categories
– Low expected frequency per cell
• Extent of association is not the same as statistical significance (demonstrated through an example)
Another Example: Heparin Lock Placement
Complication Incidence * Heparin Lock Placement Time Group Crosstabulation
                                                                    Heparin Lock Placement Time Group
                                                                    1         2         Total
Had Complication      Count                                         9         11        20
                      Expected Count                                10.0      10.0      20.0
                      % within Heparin Lock Placement Time Group    18.0%     22.0%     20.0%
Had NO Complication   Count                                         41        39        80
                      Expected Count                                40.0      40.0      80.0
                      % within Heparin Lock Placement Time Group    82.0%     78.0%     80.0%
Total                 Count                                         50        50        100
                      Expected Count                                50.0      50.0      100.0
                      % within Heparin Lock Placement Time Group    100.0%    100.0%    100.0%
from Polit Text: Table 8-1
Time group: 1 = 72 hrs, 2 = 96 hrs
Hypotheses in Heparin Lock Placement
• Ho: There is no association between complication incidence and heparin lock placement time. (The variables are independent).
• Ha: There is an association between complication incidence and heparin lock placement time. (The variables are related).
More of SPSS Output
Pearson Chi-Square
• Pearson Chi-Square = .250, p = .617
• Since p > .05, we fail to reject the null hypothesis that the complication rate is unrelated to heparin lock placement time.
• Continuity correction is used in situations in which the expected frequency for any cell in a 2 by 2 table is less than 10.
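The reported statistic and p-value can be checked against the counts in the crosstabulation. The sketch below is illustrative and omits the continuity correction. For df = 1 the chi-square variable is a squared standard normal, so the upper-tail probability is erfc(√(x/2)); that identity holds for df = 1 only. Variable names are assumptions.

```python
import math

# Observed counts from the crosstabulation
# (rows: had complication, had no complication; columns: time group 1, time group 2).
observed = [[9, 11], [41, 39]]
row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
n = sum(row_totals)

chi2 = 0.0
for i, row in enumerate(observed):
    for j, fo in enumerate(row):
        fe = row_totals[i] * col_totals[j] / n  # expected count
        chi2 += (fo - fe) ** 2 / fe

# df = (2 - 1) * (2 - 1) = 1, so P(chi-square > x) = erfc(sqrt(x / 2)).
p_value = math.erfc(math.sqrt(chi2 / 2.0))

print(round(chi2, 3), round(p_value, 3))
```

The result matches the output: a chi-square of .250 with p of about .617, well above .05.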
More SPSS Output
Symmetric Measures
                                               Value    Asymp. Std. Error(a)    Approx. T(b)    Approx. Sig.
Nominal by Nominal      Phi                    -.050                                            .617
                        Cramer's V             .050                                             .617
Interval by Interval    Pearson's R            -.050    .100                    -.496           .621(c)
Ordinal by Ordinal      Spearman Correlation   -.050    .100                    -.496           .621(c)
N of Valid Cases                               100
a. Not assuming the null hypothesis.
b. Using the asymptotic standard error assuming the null hypothesis.
c. Based on normal approximation.
Phi Coefficient
• Pearson Chi-Square provides information about the existence of relationship between 2 nominal variables, but not about the magnitude of the relationship
• Phi coefficient is the measure of the strength of the association
• From the Symmetric Measures output: Phi = -.050
φ = √(χ² / N)
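As a check, the phi value in the output follows from the chi-square of .250 and N = 100. The square-root formula gives only the magnitude; the sign in the SPSS output reflects the direction of the association. The signed 2 by 2 formula used below, (ad − bc) / √(r1·r2·c1·c2), is a standard identity but is not shown on the slides, so treat it as an added assumption.

```python
import math

chi2 = 0.250  # Pearson chi-square from the output
n = 100       # number of valid cases

# Magnitude of phi from the chi-square statistic.
phi_magnitude = math.sqrt(chi2 / n)

# Signed phi computed directly from the 2 by 2 counts:
#            group 1  group 2
# had           a        b
# had none      c        d
a, b = 9, 11
c, d = 41, 39
phi_signed = (a * d - b * c) / math.sqrt((a + b) * (c + d) * (a + c) * (b + d))

print(round(phi_magnitude, 3), round(phi_signed, 3))
```

Both routes give a magnitude of .050, and the signed version reproduces the -.050 in the output.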
Cramer’s V
• When the table is larger than 2 by 2, a different index must be used to measure the strength of the relationship between the variables. One such index is Cramer’s V.
• If Cramer’s V is large, it means that there is a tendency for particular categories of the first variable to be associated with particular categories of the second variable.
• From the Symmetric Measures output: Cramer's V = .050
V = √(χ² / (N(k − 1)))
where N = number of cases and k = the smaller of the number of rows or columns
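Cramer's V can be computed for both examples in this deck. This is a minimal sketch; the function name `cramers_v` and its argument names are illustrative assumptions, and the chi-square values are taken from the two worked examples.

```python
import math


def cramers_v(chi2, n, n_rows, n_cols):
    """Cramer's V = sqrt(chi2 / (N * (k - 1))), with k = min(rows, columns)."""
    k = min(n_rows, n_cols)
    return math.sqrt(chi2 / (n * (k - 1)))


# Heparin lock example: 2 by 2 table, chi-square = .250, N = 100.
v_heparin = cramers_v(0.250, 100, 2, 2)

# Gender by buying preference example: 2 by 3 table, chi-square = 11.025, N = 90.
v_gender = cramers_v(11.025, 90, 2, 3)

print(round(v_heparin, 3), round(v_gender, 3))
```

For the 2 by 2 table, V equals the magnitude of phi (.050). For the gender example it comes out at about .35, a noticeably stronger association.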
Q & As