lt4011 week 22 slides
TESTS OF CORRELATION Week 22
IN THIS LECTURE
Last week we looked at what correlation was
This week, we learn:
Some different tests of correlation, and when we use them
How to produce tests of correlation in SPSS
The impact of significance on statistical results
DATA ANALYSIS
Univariate analysis (one variable at a time) = Most graphs and tables & descriptive statistics
Bivariate analysis (comparing two variables) = Contingency tables (cross-tabs) and the tests of correlation we look at today
Multivariate analysis (comparing more than two variables at a time) = not covered in this module!
CORRELATION – A REMINDER
The coefficient will lie between -1 and +1
The number indicates the strength of the relationship
The closer the number is to -1 or +1, the stronger the relationship; the closer to 0, the weaker the relationship
The sign (positive or negative) indicates the direction of the relationship
Correlations do not prove causality
The type of test you need depends on the type of data contained in the variables you are analysing
WHICH TEST TO USE?
Level    Nominal                                      Ordinal                                      Scale
Nominal  Contingency table + Chi-Square + Cramer’s V  Contingency table + Chi-Square + Cramer’s V  Contingency table + Chi-Square + Cramer’s V
Ordinal  Contingency table + Chi-Square + Cramer’s V  Spearman’s rho                               Spearman’s rho
Scale    Contingency table + Chi-Square + Cramer’s V  Spearman’s rho                               Pearson’s r
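The table follows a simple pattern: the lower of the two measurement levels decides the test. A minimal Python sketch of that lookup is shown here; the function name `choose_test` and the lowercase level labels are illustrative assumptions, not part of SPSS.

```python
# Measurement levels ordered from least to most informative.
LEVELS = ("nominal", "ordinal", "scale")


def choose_test(level_a: str, level_b: str) -> str:
    """Return the test the table suggests for two measurement levels."""
    a, b = level_a.lower(), level_b.lower()
    if a not in LEVELS or b not in LEVELS:
        raise ValueError(f"levels must be one of {LEVELS}")
    # The lower of the two levels decides which test applies.
    lowest = min(LEVELS.index(a), LEVELS.index(b))
    if lowest == 0:
        return "Contingency table + Chi-Square + Cramer's V"
    if lowest == 1:
        return "Spearman's rho"
    return "Pearson's r"


print(choose_test("ordinal", "scale"))
```

For example, `choose_test("scale", "nominal")` returns the contingency-table route, because one of the two variables is nominal.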
An example – same as last week
We want to know whether the amount of CO2 emissions is associated with gross domestic product (GDP, or growth in the economy) across a set of countries. In particular, we want to know whether higher CO2 emissions go together with higher GDP, and vice versa.
Using SPSS to calculate the correlation coefficient
If we were dealing with ordinal data, we would use the Spearman correlation.
The correlation coefficient between CO2 emissions and GDP is 0.721. This is considered a relatively strong positive correlation.
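To show what the two coefficients measure, here is a rough plain-Python sketch of Pearson's r and Spearman's rho (Pearson's r calculated on ranks). The function names and the `gdp` and `co2` figures are made up for illustration; they are not the lecture's data and will not reproduce the 0.721 above.

```python
import math


def pearson_r(x, y):
    """Pearson's r: co-variation of x and y scaled by their spreads."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    dx = [v - mean_x for v in x]
    dy = [v - mean_y for v in y]
    cross = sum(a * b for a, b in zip(dx, dy))
    return cross / math.sqrt(sum(a * a for a in dx) * sum(b * b for b in dy))


def ranks(values):
    """Rank values from 1 upwards; tied values share their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # Extend the group while the next value ties with the current one.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        average_rank = (start + end) / 2 + 1
        for i in order[start:end + 1]:
            result[i] = average_rank
        start = end + 1
    return result


def spearman_rho(x, y):
    """Spearman's rho: Pearson's r computed on the ranks of x and y."""
    return pearson_r(ranks(x), ranks(y))


# Made-up illustrative figures, NOT the lecture's CO2/GDP data.
gdp = [1.2, 2.5, 3.1, 4.8, 5.0, 7.4]
co2 = [0.9, 1.4, 2.9, 2.7, 4.1, 5.2]
print(round(pearson_r(gdp, co2), 3), round(spearman_rho(gdp, co2), 3))
```

Because Spearman's rho only uses the rank order, it suits ordinal data where the gaps between values carry no meaning.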
ONE TAIL OR TWO?
One-tailed and two-tailed tests are alternative methods for testing hypotheses
A one-tailed test looks for a relationship in only one direction, specified in advance
A two-tailed test allows for the possibility that the parameter could deviate in either direction
Unless you have a clear reason to predict the direction beforehand, you should use a two-tailed test
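The difference between the two kinds of test can be made concrete with a small sketch. It uses an exact permutation test, which is a different technique from the significance test a statistics package would normally report, and it assumes the predicted direction is positive. The function names and data are hypothetical, and the approach is only practical for very small samples because it tries every reordering.

```python
import math
from itertools import permutations


def pearson_r(x, y):
    """Pearson's r for two equal-length sequences."""
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    dx = [v - mean_x for v in x]
    dy = [v - mean_y for v in y]
    cross = sum(a * b for a, b in zip(dx, dy))
    return cross / math.sqrt(sum(a * a for a in dx) * sum(b * b for b in dy))


def permutation_p_values(x, y):
    """Return (one_tailed, two_tailed) p-values from every reordering of y.

    One-tailed: share of reorderings with r at least as positive as observed.
    Two-tailed: share with r at least as far from zero, in either direction.
    """
    observed = pearson_r(x, y)
    tolerance = 1e-12  # guards against floating-point rounding
    total = as_positive = as_extreme = 0
    for shuffled in permutations(y):
        r = pearson_r(x, shuffled)
        total += 1
        if r >= observed - tolerance:
            as_positive += 1
        if abs(r) >= abs(observed) - tolerance:
            as_extreme += 1
    return as_positive / total, as_extreme / total


# Made-up data with a positive relationship (illustration only).
x = [1, 2, 3, 4, 5, 6]
y = [2, 1, 4, 3, 6, 5]
one_tailed, two_tailed = permutation_p_values(x, y)
print(one_tailed, two_tailed)
```

When the observed correlation is positive, the one-tailed p-value can never exceed the two-tailed one, which is why a one-tailed test reaches significance more easily and needs justifying in advance.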
SO WHAT ABOUT SIGNIFICANCE?
Statistical significance, represented by p, is the probability of finding a relationship at least as strong as the one in your sample purely by chance, if no relationship exists in the population
The smaller p is, the more confident we can be that the finding reflects the population rather than chance
Generally speaking, the conventional threshold for calling something significant is p < 0.05: this means there are fewer than 5 chances in 100 that you have a sample that shows a relationship this strong when there is not one in the population
SO WHAT IF THE VARIABLES ARE NOMINAL?
We use a Chi-Square test to compare the frequencies we have to the frequencies we would expect to see if the variables were unrelated (i.e. observed counts versus expected counts)
The higher the Chi-square value, the greater the difference between these counts
The significance tells you the likelihood that a difference this large would occur by chance (should be <0.05)
Chi-square ONLY indicates whether or not the variables are related. It DOES NOT tell you anything about the strength or direction of any correlation (for strength, look at Cramer’s V)
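The comparison of observed and expected counts can be sketched in plain Python. This is a minimal illustration of the Pearson Chi-square statistic without any continuity correction, together with Cramer's V for strength. The function names and the 2x2 `table` are invented for the example, and the critical values are the standard table values for p = 0.05.

```python
import math


def expected_counts(observed):
    """Expected count per cell = row total * column total / grand total."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    grand_total = sum(row_totals)
    return [[r * c / grand_total for c in col_totals] for r in row_totals]


def chi_square(observed):
    """Return (Chi-square statistic, degrees of freedom) for a cross-tab."""
    expected = expected_counts(observed)
    statistic = sum(
        (o - e) ** 2 / e
        for obs_row, exp_row in zip(observed, expected)
        for o, e in zip(obs_row, exp_row)
    )
    df = (len(observed) - 1) * (len(observed[0]) - 1)
    return statistic, df


def cramers_v(observed):
    """Cramer's V: strength of association, from 0 (none) to 1 (perfect)."""
    statistic, _ = chi_square(observed)
    n = sum(sum(row) for row in observed)
    smaller_dim = min(len(observed), len(observed[0]))
    return math.sqrt(statistic / (n * (smaller_dim - 1)))


# Critical Chi-square values at p = 0.05, keyed by degrees of freedom.
CRITICAL_05 = {1: 3.841, 2: 5.991, 3: 7.815, 4: 9.488}

# Made-up 2x2 cross-tab of observed counts (illustration only).
table = [[30, 10], [20, 40]]
statistic, df = chi_square(table)
significant = statistic > CRITICAL_05[df]  # equivalent to p < 0.05
print(round(statistic, 3), df, significant, round(cramers_v(table), 3))
```

Exceeding the critical value for the table's degrees of freedom is the same decision as p < 0.05; Cramer's V then says how strong the association is.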
CHI-SQUARE TEST
Chi-square should only be produced alongside a cross-tabulation
It assumes that there are no expected counts of zero, and that fewer than 20% of cells have an expected count of less than 5. SPSS reports these checks as notes with the output!
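The two assumptions as stated on this slide can be checked by hand from the expected counts. The sketch below is a hypothetical helper, not SPSS's own routine; the function names and the example cross-tabs are assumptions made for illustration.

```python
def expected_counts(observed):
    """Expected count per cell = row total * column total / grand total."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    grand_total = sum(row_totals)
    return [[r * c / grand_total for c in col_totals] for r in row_totals]


def check_assumptions(observed):
    """Check the slide's two rules against the expected counts."""
    cells = [e for row in expected_counts(observed) for e in row]
    minimum = min(cells)
    share_below_5 = sum(1 for e in cells if e < 5) / len(cells)
    return {
        "minimum_expected": minimum,
        "share_below_5": share_below_5,
        # Rule 1: no expected count of zero.
        # Rule 2: fewer than 20% of cells with an expected count below 5.
        "ok": minimum > 0 and share_below_5 < 0.20,
    }


# Made-up cross-tabs: the first passes, the second has sparse cells.
print(check_assumptions([[30, 10], [20, 40]]))
print(check_assumptions([[1, 2], [3, 40]]))
```

If the check fails, a common remedy is to merge sparse categories before re-running the cross-tabulation.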
CHI-SQUARE TEST
[Annotated SPSS Chi-Square output; the output itself is not reproduced in this transcript. The annotations read:]
Chi-Square value: higher means more difference, but the number alone is meaningless without the significance
Degrees of freedom
Significance: should be <0.05
The Chi-square test is comparing observed counts with expected counts
Assumptions are checked in the notes to the output