Introduction to Correlation
Correlation – when a relationship exists between two sets of data
The news is filled with examples of correlation
◦ If you eat so many helpings of tomatoes…
◦ One alcoholic beverage a day…
◦ Driving faster than the speed limit…
◦ Women who smoke during pregnancy…
◦ If you eat only fast food for 30 days…
◦ If your parents did not have offspring, then you won’t either (huh?)
Make an XY scatterplot of the data, putting one variable on the x-axis and one variable on the y-axis.
Insert a linear trendline on the graph and include the R² value
Interpret the results
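The trendline and R² steps above can also be sketched outside a spreadsheet. The following is a minimal illustration in plain Python, assuming a least-squares line and the usual definition R² = 1 − SS_res / SS_tot. The data values and the function names `linear_trendline` and `r_squared` are made up for illustration and do not come from the slides' files.

```python
# Hypothetical, made-up data purely for illustration.
heights = [60, 62, 65, 68, 70, 72]          # x-axis variable
weights = [110, 120, 140, 155, 165, 180]    # y-axis variable


def linear_trendline(xs, ys):
    """Return (slope, intercept) of the least-squares line y = slope*x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def r_squared(xs, ys, slope, intercept):
    """Return R^2: the share of the variation in ys explained by the line."""
    mean_y = sum(ys) / len(ys)
    ss_res = sum((y - (slope * x + intercept)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    return 1 - ss_res / ss_tot


slope, intercept = linear_trendline(heights, weights)
r2 = r_squared(heights, weights, slope, intercept)
print(f"trendline: y = {slope:.3f}x + {intercept:.3f}, R^2 = {r2:.3f}")
```

Plotting the points and the line is left out here to keep the sketch dependency-free; the numbers it prints are what a charting tool would display next to the trendline.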
The higher the R² value, the better the trendline fits the data
If you only have a few data points, then you need a higher R² value in order to conclude there is a correlation
Crude estimate: if R² > 0.5, most people say there is a correlation; if R² < 0.3, the correlation is essentially non-existent
R² between 0.3 and 0.5? Gray area!
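The crude rule of thumb above can be written as a small helper. This is only a sketch: the function name `crude_verdict` is hypothetical, the cutoffs are the slide's rough estimates rather than a statistical test, and it deliberately ignores the number of data points, which the slide says also matters.

```python
def crude_verdict(r2):
    """Classify an R^2 value using the slide's rough cutoffs (not a formal test)."""
    if r2 > 0.5:
        return "correlation"
    if r2 < 0.3:
        return "essentially no correlation"
    # Everything from 0.3 to 0.5 (boundaries included) lands in the gray area.
    return "gray area"


for value in (0.74, 0.40, 0.10):
    print(value, "->", crude_verdict(value))
```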
Look at:
◦ CigarettesBirthweight.xls
◦ SpeedLimits.xls
◦ HeightWeight.xls
◦ Grades.xls
◦ WineConsumption.xls
◦ BreastCancerTemperature.xls
In SPSS, click on Analyze -> Correlate -> Bivariate
Select the two columns of data you want to analyze (move them from the left box to the right box)
You can actually pick more than two columns, but we’ll keep it simple for now
Make sure the checkbox for Pearson Correlation Coefficients is checked
Click OK to run the correlation
You should get an output window something like the following slide
The correlation between height and weight is 0.861
The Pearson Correlation value is not the same as Excel’s R-squared value; it can be positive or negative
Positive correlation: as the values of one variable increase, the values of a second variable increase (values from 0 to 1.0)
Negative correlation: as the values of one variable increase, the values of a second variable decrease (values from 0 to -1.0)
Note: Ignoring the sign, the SPSS R value will be at least as large as Excel’s R² value, because R² is simply R squared. For example, R = 0.5 is equivalent to R² = 0.25
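To make the relationship between R and R² concrete, here is a minimal sketch that computes the Pearson correlation by hand and then squares it. The function name `pearson_r` and the data are made up for illustration; the formula is the standard Pearson definition, which is assumed here to match what the SPSS output reports.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient: a value between -1.0 and 1.0."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


# Hypothetical, made-up data: y tends to fall as x rises.
x = [1, 2, 3, 4, 5, 6]
y = [9, 8, 8, 6, 5, 3]

r = pearson_r(x, y)
print(f"R   = {r:.3f}")       # negative, because y decreases as x increases
print(f"R^2 = {r ** 2:.3f}")  # never negative, and never larger than |R|
```

Squaring throws away the sign, which is why R² alone cannot say whether a correlation is positive or negative.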
There is a negative correlation between TV viewing and class grades—students who spend more time watching TV tend to have lower grades (or, students with higher grades tend to spend less time watching TV).
[Figure: two scatterplots, one labeled "Positive correlation" and one labeled "Negative correlation"]
When judging the strength of a correlation, a positive correlation is not necessarily stronger than a negative one; what matters is how far the value is from 0, not its sign
Which correlation is the strongest? -.34 .72 -.81 .40 -.12
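One way to check the answer is to compare the values by absolute value, since the sign only gives the direction of the relationship. A short sketch using the five values from the slide:

```python
correlations = [-0.34, 0.72, -0.81, 0.40, -0.12]

# Strength is the distance from 0, so compare absolute values.
strongest = max(correlations, key=abs)
print(strongest)  # -0.81
```

The largest positive value, 0.72, is a weaker correlation than -0.81 even though it is the larger number.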
If two variables are correlated, then we can predict one based on the other
But correlation does NOT imply cause!
It might be the case that having more education causes a person to earn a higher income.
It might be the case that having higher income allows a person to go to school more.
There could also be a third variable. Or a fourth. Or a fifth…
Causality – one variable, say A, actually causes the change in B. In the absence of any other evidence, data from observational studies simply cannot be used to establish causation.
Common underlying cause or causes (the most important explanation) – A is correlated with B, but there is a third factor C (the common underlying cause) that causes the changes in both A and B.
Example: as ice cream sales go up, so do crime rates. A plausible common cause is hot weather, which could push both up.
Sheer coincidence – the two variables have nothing to do with each other, but they still produce a strong R or R² value
Both variables are changing over time – divorce rates are going up and so are drug offenses. Is an increase in divorce causing more people to use drugs (and get caught)?
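The common underlying cause case can be illustrated with a small simulation. This is a sketch under stated assumptions: the data are randomly generated, not real; C is a made-up common cause, and A and B each depend on C plus independent noise but never on each other. The last step uses the standard first-order partial correlation formula, a technique not covered in the slides, to show what happens once C is accounted for.

```python
import math
import random


def pearson_r(xs, ys):
    """Pearson correlation coefficient: a value between -1.0 and 1.0."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


random.seed(0)  # fixed seed so the run is repeatable
n = 2000

# C is the hypothetical common cause; A and B never influence each other.
c = [random.gauss(0, 1) for _ in range(n)]
a = [ci + random.gauss(0, 0.5) for ci in c]
b = [ci + random.gauss(0, 0.5) for ci in c]

r_ab = pearson_r(a, b)
r_ac = pearson_r(a, c)
r_bc = pearson_r(b, c)

# Partial correlation of A and B with C held fixed.
partial_ab = (r_ab - r_ac * r_bc) / math.sqrt((1 - r_ac ** 2) * (1 - r_bc ** 2))

print(f"correlation of A and B:             {r_ab:.2f}")
print(f"correlation of A and B, C removed:  {partial_ab:.2f}")
```

In this setup A and B come out strongly correlated even though neither causes the other, and the correlation largely disappears once C is held fixed.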