Education 793 Class Notes
Joint Distributions and Correlation
1 October 2003
Today’s Agenda
• Class and lab announcements
• Your questions?
• Joint distributions
• Correlation analysis to regression
Joint Distributions
• In correlational studies, the researcher is interested in questions about the relationship between two or more variables.
• How are scores on one variable associated with scores on another variable?
• A joint distribution is a distribution in which pairs of scores for each subject are recorded.
Graphical Representation
• Scatterplots of the (x, y) pairs.
SticiGui: Scatterplots and Association
Definition:
Correlation - a measure of the strength of association between two variables.
Pearson Product-Moment Correlation: Measure of Association
• An index of the degree to which two variables that show a linear relationship in the scatterplot are associated
• Values range from –1 to +1, with 0 indicating no linear relationship
• The average crossproduct of the standard scores of two variables
• Computed as:
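The formula itself did not survive in these notes. Going by the preceding bullet (the average cross-product of standard scores), a likely form is sketched below. The N - 1 denominator is an assumption, chosen because the standard deviations reported later in these notes match the N - 1 (sample) formula; if the standard scores are built from population standard deviations, divide by N instead.

```latex
r_{xy} = \frac{\sum_{i=1}^{N} z_{x_i}\, z_{y_i}}{N - 1},
\qquad
z_{x_i} = \frac{x_i - \bar{x}}{s_x},
\quad
z_{y_i} = \frac{y_i - \bar{y}}{s_y}
```

Written without standard scores, the same quantity is the covariance of x and y divided by the product of their standard deviations.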
Important Properties
• Will underestimate curvilinear relationships
• As homogeneity of the group increases (scores vary less), the correlation coefficient tends to decrease
• Size of sample does not affect size of correlation coefficient
• A positive association means that as X increases, Y increases; a negative association means that as X increases, Y decreases
• Correlation is just the standardized version of the covariance (it does not depend on the magnitudes of sd_y and sd_x)
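Two of these properties are easy to check numerically. The sketch below is illustrative only: the scores are made up, and the helper names `covariance` and `pearson_r` are hypothetical, not taken from the course materials.

```python
from statistics import mean, stdev

def covariance(xs, ys):
    """Sample covariance (N - 1 denominator)."""
    mx, my = mean(xs), mean(ys)
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / (len(xs) - 1)

def pearson_r(xs, ys):
    """Covariance divided by the product of the two standard deviations."""
    return covariance(xs, ys) / (stdev(xs) * stdev(ys))

# Hypothetical scores, made up for illustration only.
hours = [1, 2, 3, 4, 5, 6]
score = [52, 55, 61, 60, 68, 71]

# Changing units (hours -> minutes) rescales the covariance but not r.
minutes = [60 * h for h in hours]
print(covariance(hours, score), covariance(minutes, score))
print(pearson_r(hours, score), pearson_r(minutes, score))

# A perfect but curvilinear relationship (y = x^2 on a symmetric range)
# gives r = 0: r only captures the linear part of an association.
xs = [-3, -2, -1, 0, 1, 2, 3]
ys = [x * x for x in xs]
print(pearson_r(xs, ys))
```

The covariance grows by a factor of 60 with the change of units, while r stays the same, which is what "standardized" means here.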
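Individual Contributions to r
Mean of x = 27.50; s = 17.08
Mean of y = 31.25; s = 18.87
[Scatterplot, both axes running from 0 to 50, showing the points (5, 45), (25, 45), (45, 5), and (35, 30); the quadrants around the two means are marked with the signs of the deviations: ++, +-, --, -+]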
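The four points recorded for this scatterplot are enough to reproduce the reported means and standard deviations and to see what each point contributes to r. This is a minimal sketch, assuming the slide's s values use the N - 1 (sample) formula, which is what `statistics.stdev` computes.

```python
from statistics import mean, stdev

# The four points that survive from the slide's scatterplot.
points = [(5, 45), (25, 45), (45, 5), (35, 30)]
xs = [x for x, _ in points]
ys = [y for _, y in points]

mx, my = mean(xs), mean(ys)      # 27.5 and 31.25, as on the slide
sx, sy = stdev(xs), stdev(ys)    # about 17.08 and 18.87 (N - 1 denominator)

# Each point's contribution is the product of its two standard scores.
contributions = [((x - mx) / sx) * ((y - my) / sy) for x, y in points]
for (x, y), c in zip(points, contributions):
    print(f"({x}, {y}): z_x * z_y = {c:+.3f}")

# Averaging the cross-products (dividing by N - 1 to match stdev) gives r.
r = sum(contributions) / (len(points) - 1)
print(f"r = {r:.3f}")
```

Every one of the four points lies above the mean on one variable and below it on the other, so each cross-product is negative and r comes out at about -0.84.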
Visualizing Correlations
[Figure: eight scatterplots, Plot A through Plot H]
Squared Correlation Coefficient or Coefficient of Determination: $r_{xy}^2$
Coefficient of Determination tells you how much (percent) of the variance in one set of scores is accounted for by knowing the other set of scores.
Shared Variance
$r_{xy}^2$ = shared variance / total variance
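One way to read "shared variance / total variance" is through the least-squares line: the variation in y that the line reproduces, divided by the total variation in y, equals the squared correlation. The sketch below checks this on made-up scores; the data and variable names are hypothetical.

```python
from statistics import mean

# Hypothetical paired scores, made up for illustration only.
xs = [2, 4, 5, 7, 9, 10]
ys = [3, 6, 5, 9, 10, 13]

mx, my = mean(xs), mean(ys)
sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
sxx = sum((x - mx) ** 2 for x in xs)
syy = sum((y - my) ** 2 for y in ys)   # total variation in y

r = sxy / (sxx * syy) ** 0.5

# Least-squares line predicting y from x.
slope = sxy / sxx
intercept = my - slope * mx
predicted = [intercept + slope * x for x in xs]

# Variation in y that the line reproduces ("shared") versus the total.
shared = sum((p - my) ** 2 for p in predicted)
print(f"r^2            = {r ** 2:.4f}")
print(f"shared / total = {shared / syy:.4f}")
```

The two printed values agree, because both ratios use the same sum-of-squares denominator.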
Restricting Range
Plot A: N = 255, r = .63
The Impact of Restricted Range
Plot B: N = 43, r = .17
Plot C: N = 4, r = .10
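The drop in r under range restriction can be reproduced with simulated data. This is a sketch under stated assumptions: the scores are randomly generated rather than taken from the plots, the cutoff of 60 is arbitrary, and `pearson_r` is a hypothetical helper.

```python
import random
from statistics import mean, stdev

def pearson_r(xs, ys):
    """Pearson r as covariance over the product of standard deviations."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / (len(xs) - 1)
    return cov / (stdev(xs) * stdev(ys))

rng = random.Random(793)  # fixed seed so the run is repeatable

# Simulated (not real) data: y depends linearly on x plus noise.
xs = [rng.gauss(50, 10) for _ in range(2000)]
ys = [0.6 * x + rng.gauss(0, 8) for x in xs]

r_full = pearson_r(xs, ys)

# Keep only cases with high x, e.g. students admitted on a cutoff score.
kept = [(x, y) for x, y in zip(xs, ys) if x >= 60]
r_restricted = pearson_r([x for x, _ in kept], [y for _, y in kept])

print(f"full range:       N = {len(xs)}, r = {r_full:.2f}")
print(f"restricted range: N = {len(kept)}, r = {r_restricted:.2f}")
```

The underlying relationship between x and y is the same in both samples; only the spread of x differs, and that alone lowers r.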
Correlation and Causality
Correlation does not equal causation
The higher the absolute value of a correlation, the stronger the relationship between two variables. Strength, though, does not explain the source of the relationship.
Causal Interpretation
Logical possibility | Symbolic representation | Causal explanation
1 | A → B | A causes B
2 | A ← B | B causes A
3 | A ← C → B | C causes both A and B
4 | D → C → A | D causes C which, in turn, causes A
  | D → A | D causes A directly
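Possibility 3 (a common cause) can be illustrated with a small simulation. This is a sketch with randomly generated values, not data from the course: A and B are each built from C plus independent noise and never influence each other, yet they correlate.

```python
import random
from statistics import mean, stdev

def pearson_r(xs, ys):
    """Pearson r as covariance over the product of standard deviations."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / (len(xs) - 1)
    return cov / (stdev(xs) * stdev(ys))

rng = random.Random(2003)  # fixed seed so the run is repeatable

# C causes both A and B; A and B never influence each other.
c = [rng.gauss(0, 1) for _ in range(5000)]
a = [ci + rng.gauss(0, 1) for ci in c]
b = [ci + rng.gauss(0, 1) for ci in c]
r_ab = pearson_r(a, b)

# Subtracting C's contribution leaves only the independent noise terms.
a_rest = [ai - ci for ai, ci in zip(a, c)]
b_rest = [bi - ci for bi, ci in zip(b, c)]
r_rest = pearson_r(a_rest, b_rest)

print(f"r(A, B)                 = {r_ab:.2f}")
print(f"r(A, B) with C removed  = {r_rest:.2f}")
```

The sizable correlation between A and B disappears once C's contribution is removed, which is why a strong r alone cannot identify which of the four causal patterns produced it.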
Extending Correlation to Regression
Goal:
To predict values of our dependent variable based on values of our independent variable(s) and our knowledge of the underlying relationship (measured by Pearson's r)
Requirements:
• Have data appropriate for computing r
• Be willing to specify nature of relationship (IV → DV)
Extending the Correlation
Aptitude and Performance
Creating the Prediction Equation
Calculating Y-hat
Predicting the DV
Residuals
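The content of the slides on the prediction equation, Y-hat, and residuals is not preserved in these notes. As a stand-in, here is a minimal sketch of the standard least-squares approach, in which the slope is r times the ratio of the standard deviations. The aptitude and performance scores are hypothetical, and it is an assumption that the lecture used this form of the equation.

```python
from statistics import mean, stdev

# Hypothetical aptitude (IV) and performance (DV) scores, illustration only.
aptitude = [45, 38, 52, 48, 25, 39, 51, 46, 55, 46]
performance = [6.95, 5.90, 7.50, 8.65, 6.20, 5.75, 7.75, 8.10, 9.00, 7.20]

mx, my = mean(aptitude), mean(performance)
sx, sy = stdev(aptitude), stdev(performance)
r = sum((x - mx) * (y - my) for x, y in zip(aptitude, performance)) / (
    (len(aptitude) - 1) * sx * sy
)

# Prediction equation: y-hat = a + b * x, with b = r * (s_y / s_x).
b = r * sy / sx
a = my - b * mx

def predict(x):
    """Predicted DV score (y-hat) for a given IV score."""
    return a + b * x

y_hat = [predict(x) for x in aptitude]

# Residual = observed score minus predicted score.
residuals = [y - yh for y, yh in zip(performance, y_hat)]

print(f"y-hat = {a:.3f} + {b:.3f} * x")
print(f"predicted performance at aptitude 50: {predict(50):.2f}")
print(f"sum of residuals: {sum(residuals):.6f}")
```

With a least-squares line the residuals sum to zero (up to rounding), and the line passes through the point of means.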
Standard Error of Estimate
• A natural extension of the standard deviation
– Deviations from the predicted value (ŷ)
– Squared
– Summed
– Divided by N (or N-2 when estimating parameters)
– This is an estimate of the error made when estimating y from x.
Formula for SE of Estimate
$$s_{y \cdot x}^2 = \frac{\sum (y - \hat{y})^2}{n - 2}$$
An alternative formula:
$$s_{y \cdot x}^2 = s_y^2 \left(1 - r_{xy}^2\right)$$
Since $r_{xy}^2$ = proportion of variance in y predictable from x,
$1 - r_{xy}^2$ is the proportion that is NOT predictable from x.
Hence, the error.
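The two formulas for the standard error of estimate can be compared numerically. The sketch below uses made-up scores and assumes that the variance of y is computed with an N - 1 denominator; under that assumption the alternative formula is slightly smaller than the residual-based one and matches it exactly once multiplied by (n - 1) / (n - 2).

```python
from statistics import mean, variance

# Hypothetical paired scores, for illustration only.
xs = [2, 4, 5, 7, 9, 10, 12, 15]
ys = [3, 6, 5, 9, 10, 13, 12, 18]
n = len(xs)

mx, my = mean(xs), mean(ys)
sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
sxx = sum((x - mx) ** 2 for x in xs)
syy = sum((y - my) ** 2 for y in ys)
r2 = sxy ** 2 / (sxx * syy)

# Least-squares line and its squared residuals.
slope = sxy / sxx
intercept = my - slope * mx
ss_resid = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))

# Definition: squared residuals, summed, divided by n - 2.
se2_definition = ss_resid / (n - 2)

# Alternative: variance of y times the proportion NOT predictable from x.
se2_alternative = variance(ys) * (1 - r2)   # variance() divides by n - 1

# Rescaling the alternative so both use the n - 2 denominator.
se2_adjusted = se2_alternative * (n - 1) / (n - 2)

print(f"from residuals:       s_y.x = {se2_definition ** 0.5:.4f}")
print(f"from s_y^2 (1 - r^2): s_y.x = {se2_alternative ** 0.5:.4f}")
print(f"adjusted alternative: s_y.x = {se2_adjusted ** 0.5:.4f}")
```

For large samples the factor (n - 1) / (n - 2) is close to 1, so the two formulas give nearly the same answer.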
For Next Week
• Chapter 8 p. 211-225
• Chapter 10 p. 249-271