Introduction to Regression Analysis
Two Purposes
• Explanation
  – Explain (or account for) the variance in a variable (e.g., explain why children's test scores vary).
  – We'll cover this later.
• Prediction
  – Construct an equation to predict scores on some variable.
  – Construct an equation that can be used in selecting individuals.
Prediction
• Use a set of scores collected from a sample to make predictions about individuals in the population (not in the sample).
• Use the scores to construct a mathematical (typically linear) equation that allows us to predict performance.
• Two types of scores are collected:
  – Usually a measure on one criterion (outcome, dependent) variable.
  – Scores on one or more predictor (independent) variables.
The equations
The equation for one individual's criterion score:

    Y_1 = f(X_{11}, X_{12}, \ldots) + e_1

The prediction equation for that individual's score:

    \hat{Y}_1 = f(X_{11}, X_{12}, \ldots)

The difference between the two equations (called a residual):

    e_1 = Y_1 - \hat{Y}_1
The function
The linear function has the form:

    f(X_{11}, X_{12}, \ldots) = \beta_1 X_{11} + \beta_2 X_{12} + \cdots

where the βs are weights (regression weights) selected such that the sum of squared errors is minimized (the least squares criterion):

    \min \sum e^2 = \min \sum (Y - \hat{Y})^2
Multiple Correlation
Minimizing the sum of squared errors causes the correlation between the actual criterion scores and the predicted scores to be maximized (as large as possible):

    R_{y\hat{y}} = \text{maximum}

This correlation is called a multiple correlation. It is the correlation between the criterion variable and a linear composite of the predictor variables.
Coefficient of Determination
The square of the multiple correlation, R^2, is called the coefficient of determination. It gives the proportion of shared variance (i.e., covariance) between the criterion variable and the weighted linear composite. Hence, the larger the R^2, the better the prediction equation.
Basic regression equation
The parametric regression model is given by

    Y_i = \alpha + \beta X_i + \varepsilon_i

and the model for a sample by

    Y_i = a + b X_i + e_i
Partitioning the Sum of Squares (SS_y)
SS_y is given by

    SS_y = \sum (Y - \bar{Y})^2

Now, consider the following identity:

    Y = \bar{Y} + (\hat{Y} - \bar{Y}) + (Y - \hat{Y})

Subtracting \bar{Y} from each side gives

    Y - \bar{Y} = (\hat{Y} - \bar{Y}) + (Y - \hat{Y})

Squaring and summing gives

    \sum (Y - \bar{Y})^2 = \sum [(\hat{Y} - \bar{Y}) + (Y - \hat{Y})]^2
                         = \sum (\hat{Y} - \bar{Y})^2 + \sum (Y - \hat{Y})^2 + 2 \sum (\hat{Y} - \bar{Y})(Y - \hat{Y})
Simplifying the previous equation
The cross-product term drops out (this is shown later), leaving

    \sum (Y - \bar{Y})^2 = \sum (\hat{Y} - \bar{Y})^2 + \sum (Y - \hat{Y})^2

that is,

    SS_y = SS_{reg} + SS_{res}

where SS_reg = sum of squares due to regression, and SS_res = residual sum of squares. Dividing through by the total sum of squares gives:

    \frac{SS_y}{SS_y} = \frac{SS_{reg}}{SS_y} + \frac{SS_{res}}{SS_y}, or 1 = \frac{SS_{reg}}{SS_y} + \frac{SS_{res}}{SS_y}
Calculation of squares and cross-products
Deviation squares and cross-products, and their sums, where y = Y − \bar{Y} and x = X − \bar{X}:

    Y      3      1      0      4      5     ΣY = 13,  \bar{Y} = 2.6
    y     .4   -1.6   -2.6    1.4    2.4
    y²   .16   2.56   6.76   1.96   5.76    Σy² = 17.2
    X      1      0     -1      1      2     ΣX = 3,   \bar{X} = .6
    x     .4    -.6   -1.6     .4    1.4
    x²   .16    .36   2.56    .16   1.96    Σx² = 5.2
    xy   .16    .96   4.16    .56   3.36    Σxy = 9.2
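These deviation scores and sums can be reproduced with a short script (a minimal sketch using the worked data from these slides; the variable names are mine):

```python
# Worked data from the slides: N = 5 pairs of scores.
Y = [3, 1, 0, 4, 5]
X = [1, 0, -1, 1, 2]

N = len(Y)
Y_bar = sum(Y) / N          # mean of Y: 2.6
X_bar = sum(X) / N          # mean of X: 0.6

y = [Yi - Y_bar for Yi in Y]                  # deviation scores for Y
x = [Xi - X_bar for Xi in X]                  # deviation scores for X

sum_y2 = sum(yi**2 for yi in y)               # Σy² = 17.2
sum_x2 = sum(xi**2 for xi in x)               # Σx² = 5.2
sum_xy = sum(xi * yi for xi, yi in zip(x, y)) # Σxy = 9.2

print(Y_bar, X_bar, sum_y2, sum_x2, sum_xy)
```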
Calculation of the coefficients
The slope:

    b = \frac{\sum xy}{\sum x^2} = \frac{9.2}{5.2} = 1.769

The intercept:

    a = \bar{Y} - b\bar{X} = 2.6 - (1.769)(.6) = 2.6 - 1.0615 = 1.538

And the regression line:

    \hat{Y} = 1.538 + 1.769X
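The slope and intercept can be checked numerically; a sketch assuming the same five score pairs:

```python
X = [1, 0, -1, 1, 2]
Y = [3, 1, 0, 4, 5]
N = len(Y)
X_bar, Y_bar = sum(X) / N, sum(Y) / N

x = [Xi - X_bar for Xi in X]
y = [Yi - Y_bar for Yi in Y]
sum_x2 = sum(xi**2 for xi in x)
sum_xy = sum(xi * yi for xi, yi in zip(x, y))

b = sum_xy / sum_x2          # slope: 9.2 / 5.2 ≈ 1.769
a = Y_bar - b * X_bar        # intercept: ≈ 1.538

Y_hat = [a + b * Xi for Xi in X]   # predicted scores on the regression line
```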
Calculation of SS_reg
From an earlier equation,

    SS_{reg} = \sum (\hat{Y} - \bar{Y})^2
             = \sum [(\bar{Y} + bx) - \bar{Y}]^2
             = \sum (bx)^2
             = b^2 \sum x^2
             = \left( \frac{\sum xy}{\sum x^2} \right)^2 \sum x^2
             = \frac{(\sum xy)^2}{\sum x^2}
             = \frac{(9.2)^2}{5.2} = \frac{84.64}{5.2} = 16.28
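A quick numerical check that the direct definition of SS_reg and the shortcut formula agree (sketch, slide data assumed):

```python
X = [1, 0, -1, 1, 2]
Y = [3, 1, 0, 4, 5]
N = len(Y)
X_bar, Y_bar = sum(X) / N, sum(Y) / N
x = [Xi - X_bar for Xi in X]
sum_x2 = sum(xi**2 for xi in x)
sum_xy = sum(xi * (Yi - Y_bar) for xi, Yi in zip(x, Y))
b = sum_xy / sum_x2
a = Y_bar - b * X_bar

# SS_reg directly from the predicted scores...
Y_hat = [a + b * Xi for Xi in X]
ss_reg = sum((yh - Y_bar)**2 for yh in Y_hat)

# ...and via the shortcut (Σxy)²/Σx² = 84.64/5.2 ≈ 16.28
ss_reg_short = sum_xy**2 / sum_x2
```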
The score, Y, is partitioned

    y = Y - \bar{Y} = (\hat{Y} - \bar{Y}) + (Y - \hat{Y})

Hence, y is partitioned into a deviation of the predicted score from the mean of the scores PLUS a deviation of the actual score from the predicted score. Our next step is to square the deviations and sum over all the scores.
Partitioning the sum of squared deviations (sum of squares, SS_y)

    \sum y^2 = \sum (Y - \bar{Y})^2 = \sum [(\hat{Y} - \bar{Y}) + (Y - \hat{Y})]^2
             = \sum (\hat{Y} - \bar{Y})^2 + \sum (Y - \hat{Y})^2

    SS_y = SS_{reg} + SS_{res}
What happened to the term, 2 \sum (\hat{Y} - \bar{Y})(Y - \hat{Y})? Showing that it reduces to zero requires some complicated algebra, recalling that

    \hat{Y} = a + b_{yx} X

and that

    b_{yx} = \sum xy / \sum x^2
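Rather than working through the algebra here, the vanishing of the cross-product term can at least be confirmed numerically for a least-squares line (sketch using the slide data):

```python
X = [1, 0, -1, 1, 2]
Y = [3, 1, 0, 4, 5]
N = len(Y)
X_bar, Y_bar = sum(X) / N, sum(Y) / N
x = [Xi - X_bar for Xi in X]
sum_x2 = sum(xi**2 for xi in x)
sum_xy = sum(xi * (Yi - Y_bar) for xi, Yi in zip(x, Y))
b = sum_xy / sum_x2
a = Y_bar - b * X_bar
Y_hat = [a + b * Xi for Xi in X]

# 2Σ(Ŷ − Ȳ)(Y − Ŷ): zero (up to floating-point error) for least squares
cross = 2 * sum((yh - Y_bar) * (Yi - yh) for yh, Yi in zip(Y_hat, Y))
print(cross)
```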
Calculation of proportions of sums of squares due to regression and due to error (or residual)

    \frac{\sum y^2}{\sum y^2} = \frac{SS_{reg}}{\sum y^2} + \frac{SS_{res}}{\sum y^2}

or

    1 = \frac{SS_{reg}}{\sum y^2} + \frac{SS_{res}}{\sum y^2}
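For the worked data, these proportions come out to about .946 and .054 (values derived from the slide sums, not stated on the slide):

```python
Y = [3, 1, 0, 4, 5]
X = [1, 0, -1, 1, 2]
N = len(Y)
Y_bar, X_bar = sum(Y) / N, sum(X) / N
y = [Yi - Y_bar for Yi in Y]
x = [Xi - X_bar for Xi in X]

ss_y = sum(yi**2 for yi in y)                  # Σy² = 17.2
sum_x2 = sum(xi**2 for xi in x)                # Σx² = 5.2
sum_xy = sum(xi * yi for xi, yi in zip(x, y))  # Σxy = 9.2

ss_reg = sum_xy**2 / sum_x2                    # ≈ 16.28
ss_res = ss_y - ss_reg                         # ≈ 0.92
prop_reg = ss_reg / ss_y                       # ≈ .946
prop_res = ss_res / ss_y                       # ≈ .054
```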
Alternative formulas for computing the sums of squares due to regression

    SS_{reg} = \sum (\hat{Y} - \bar{Y})^2
             = \sum [(\bar{Y} + bx) - \bar{Y}]^2
             = \sum (bx)^2
             = b^2 \sum x^2
             = \frac{(\sum xy)^2}{(\sum x^2)^2} \sum x^2
             = \frac{(\sum xy)^2}{\sum x^2}
             = b \sum xy
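All of these algebraically equivalent expressions should produce the same number; a sketch checking them against the slide data:

```python
X = [1, 0, -1, 1, 2]
Y = [3, 1, 0, 4, 5]
N = len(Y)
X_bar, Y_bar = sum(X) / N, sum(Y) / N
x = [Xi - X_bar for Xi in X]
y = [Yi - Y_bar for Yi in Y]
sum_x2 = sum(xi**2 for xi in x)
sum_xy = sum(xi * yi for xi, yi in zip(x, y))
b = sum_xy / sum_x2
Y_hat = [Y_bar + b * xi for xi in x]      # Ŷ = Ȳ + bx

forms = [
    sum((yh - Y_bar)**2 for yh in Y_hat), # Σ(Ŷ − Ȳ)²
    b**2 * sum_x2,                        # b²Σx²
    sum_xy**2 / sum_x2,                   # (Σxy)²/Σx²
    b * sum_xy,                           # bΣxy
]
```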
Test of the regression coefficient, b_yx (i.e., test the null hypothesis that b_yx = 0)
First compute the variance of estimate:

    s_{y.x}^2 = \frac{\sum (Y - \hat{Y})^2}{N - k - 1} = \frac{SS_{res}}{N - k - 1}

where k is the number of predictors.
Test of the regression coefficient, b_yx (continued)
Then obtain the standard error of estimate:

    s_{y.x} = \sqrt{s_{y.x}^2}

Then compute the standard error of the regression coefficient, s_b:

    s_b = \frac{s_{y.x}}{\sqrt{\sum x^2}} = \frac{s_{y.x}}{\sqrt{\sum (X - \bar{X})^2}}
The test of significance of the regression coefficient (b_yx)
The significance of the regression coefficient is tested using a t test with (N − k − 1) degrees of freedom:

    t = \frac{b_{yx}}{s_b}
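Putting the last three slides together for the worked data (a sketch; the resulting t ≈ 7.27 on 3 df is my computation, not a slide value):

```python
import math

X = [1, 0, -1, 1, 2]
Y = [3, 1, 0, 4, 5]
N, k = len(Y), 1          # one predictor
X_bar, Y_bar = sum(X) / N, sum(Y) / N
x = [Xi - X_bar for Xi in X]
sum_x2 = sum(xi**2 for xi in x)
sum_xy = sum(xi * (Yi - Y_bar) for xi, Yi in zip(x, Y))
b = sum_xy / sum_x2
a = Y_bar - b * X_bar

ss_res = sum((Yi - (a + b * Xi))**2 for Xi, Yi in zip(X, Y))
s2_yx = ss_res / (N - k - 1)      # variance of estimate
s_yx = math.sqrt(s2_yx)           # standard error of estimate
s_b = s_yx / math.sqrt(sum_x2)    # standard error of b
t = b / s_b                       # t on N - k - 1 = 3 df
```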
Computing regression using correlations
The correlation, in the population, is given by

    \rho_{xy} = \frac{\sigma_{xy}}{\sigma_x \sigma_y}

The population correlation coefficient, \rho_{xy}, is estimated by the sample correlation coefficient, r_xy:

    r_{xy} = \frac{\sum z_x z_y}{N} = \frac{s_{xy}}{s_x s_y} = \frac{\sum xy}{\sqrt{\sum x^2 \sum y^2}}
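The equivalence of the z-score form and the deviation-sum form can be verified on the worked data (sketch; z-scores here use the SD computed with N in the denominator, to match the Σz_xz_y/N form):

```python
import math

X = [1, 0, -1, 1, 2]
Y = [3, 1, 0, 4, 5]
N = len(Y)
X_bar, Y_bar = sum(X) / N, sum(Y) / N
x = [Xi - X_bar for Xi in X]
y = [Yi - Y_bar for Yi in Y]
sum_x2 = sum(xi**2 for xi in x)
sum_y2 = sum(yi**2 for yi in y)

# r from deviation sums: Σxy / sqrt(Σx²·Σy²)
r1 = sum(xi * yi for xi, yi in zip(x, y)) / math.sqrt(sum_x2 * sum_y2)

# r from standardized scores: Σ(z_x·z_y) / N
s_x, s_y = math.sqrt(sum_x2 / N), math.sqrt(sum_y2 / N)
r2 = sum((xi / s_x) * (yi / s_y) for xi, yi in zip(x, y)) / N
```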
Sums of squares, regression (SS_reg)
Recalling that R^2 gives the proportion of variance of Y accounted for (or explained) by X, we can obtain

    SS_{reg} = r^2 \sum y^2
    SS_{res} = (1 - r^2) \sum y^2

or, in other words, SS_reg is that portion of SS_y predicted or explained by the regression of Y on X.
Standard error of estimate
From SS_res we can compute the variance of estimate and standard error of estimate as

    s_{y.x}^2 = \frac{(1 - r^2) \sum y^2}{N - k - 1}, \qquad s_{y.x} = \sqrt{s_{y.x}^2}

(Note: alternative formulas were given earlier.)
Testing the Significance of r
The significance of a correlation coefficient, r, is tested using a t test with N − 2 degrees of freedom:

    t = \frac{r \sqrt{N - 2}}{\sqrt{1 - r^2}}
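For simple regression this t equals the t for b_yx; a sketch confirming that on the slide data (the value ≈ 7.27 is my computation):

```python
import math

X = [1, 0, -1, 1, 2]
Y = [3, 1, 0, 4, 5]
N = len(Y)
X_bar, Y_bar = sum(X) / N, sum(Y) / N
x = [Xi - X_bar for Xi in X]
y = [Yi - Y_bar for Yi in Y]

# sample correlation from deviation sums
r = sum(xi * yi for xi, yi in zip(x, y)) / math.sqrt(
    sum(xi**2 for xi in x) * sum(yi**2 for yi in y))

# t test of r, df = N - 2 = 3
t = r * math.sqrt(N - 2) / math.sqrt(1 - r**2)
```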
Testing the difference between two correlations
To test the difference between two Pearson correlation coefficients, use the “Comparing two correlation coefficients” calculator on my web site.
Testing the difference between two regression coefficients
This, also, is a t test:

    t = \frac{b_1 - b_2}{\sqrt{s_{b_1}^2 + s_{b_2}^2}}

where s_b^2 was given earlier.
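A minimal sketch of this formula as a function; the input values below are hypothetical, not from the slides:

```python
import math

def t_diff_slopes(b1, s_b1, b2, s_b2):
    """t for H0: b1 = b2, given two independent slopes and their
    standard errors (sketch of the slide's formula)."""
    return (b1 - b2) / math.sqrt(s_b1**2 + s_b2**2)

# hypothetical slopes and standard errors for illustration
t = t_diff_slopes(1.769, 0.243, 1.0, 0.30)
```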
Point-biserial and Phi correlation
These are both Pearson product-moment correlations.
The point-biserial correlation is used when one variable is a scale variable and the other represents a true dichotomy. For instance, the correlation between performance on an item (the dichotomous variable) and the total score on a test (the scaled variable).
Point-biserial and Phi correlation
The phi correlation is used when both variables represent a true dichotomy. For instance, the correlation between two test items.
Biserial and Tetrachoric correlation
These are non-Pearson correlations. Both are rarely used anymore.
The biserial correlation is used when one variable is truly a scaled variable and the other represents an artificial dichotomy. The tetrachoric correlation is used when both variables represent an artificial dichotomy.