Multiple Regression
Regression: attempts to predict one criterion variable using one predictor variable
Addresses the question: Does the predictor significantly predict the criterion?
Multiple Regression
Multiple Regression: attempts to predict one criterion variable using 2+ predictor variables
Addresses the questions: Do the predictors significantly predict the criterion? If so, which predictor is best?
Allows variance from one predictor to be removed before the rest are evaluated (like ANCOVA)
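For illustration (not part of the original slides), a minimal multiple-regression fit in Python, assuming statsmodels is available; the variables x1, x2, and y and the data are made up:

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

# Made-up data: two predictors and a criterion.
rng = np.random.default_rng(0)
n = 100
df = pd.DataFrame({"x1": rng.normal(size=n), "x2": rng.normal(size=n)})
df["y"] = 1.0 + 0.5 * df.x1 + 0.3 * df.x2 + rng.normal(size=n)

# Fit the criterion on both predictors; summary() reports each b with its
# t-test and the overall Model R-squared.
model = smf.ols("y ~ x1 + x2", data=df).fit()
print(model.summary())
```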
Multiple Regression
How to compare the predictive value of 2+ predictors
When comparing multiple predictors within an experiment, use the standardized b (β): β = b(s_x / s_y), the slope rescaled by the SDs of the predictor and the criterion
Like a z-score, β lets you compare performance between 2 variables with different metrics by expressing performance relative to a sample mean & SD
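A quick worked example of the conversion (the numbers are made up; the formula is the standard rescaling β = b · s_x / s_y):

```python
# Hypothetical values: an unstandardized slope and the two sample SDs.
b = 2.5      # unstandardized coefficient of the predictor
s_x = 4.0    # SD of the predictor
s_y = 10.0   # SD of the criterion

beta = b * s_x / s_y   # standardized coefficient
print(beta)            # 1.0 -- comparable across predictors with different metrics
```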
Multiple Regression
How to compare the predictive value of 2+ predictors
When comparing multiple predictors between experiments, use b
The SDs are highly variable between experiments (the SDs from Exp. 1 ≠ the SDs from Exp. 2), so the β's from the two experiments are not comparable
Analogy: you can't compare the z-score of your Stats grade from this semester with the z-score of your Stats grade if you take the class again next semester; if next semester's class is especially dumb, you appear to have gotten much smarter
Multiple Regression
The magnitude of the relationship between one predictor and a criterion (b/β) in a model depends on the other predictors in that model
E.g., the relationship between IQ and SES (with College GPA and Parents' SES in the model) will be different if more, fewer, or different predictors are included in the model
Multiple Regression
When comparing the results of 2 experiments using regression, the coefficients (b/β) will not be the same; they will be similar only to the extent that the regression models are similar
Why not?
Multiple Regression
Coefficients (b/β) represent partial and semipartial (part) correlations, not the traditional Pearson's r
Partial correlation – the correlation between 2 variables with the variance from one or more other variables removed, i.e. the correlation between the residuals of both variables once variance from one or more covariates has been removed
Multiple Regression
Partial correlation = the amount of the variance in the criterion that is associated with a predictor and that could not be explained by the other covariate(s)
Multiple Regression
Semipartial/Part correlation – the correlation between 2 variables with the variance from one or more variables removed from the predictor only (i.e. not from the criterion), i.e. the correlation between the criterion and the residuals of the predictor once variance from one or more covariates has been removed
Multiple Regression
Part correlation = the amount of variance that a predictor explains in a criterion once variance from the covariates has been removed, i.e. the percentage of the total variance left unexplained by the covariate that the predictor accounts for
Since the variance that is removed from the criterion depends on the other predictors in the model, different models yield different regression coefficients
In terms of the overlapping-variance (Venn) diagram on the original slide: Partial Correlation = B; Part Correlation = B / (A + B)
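A sketch (not from the slides) of both correlations computed directly from residuals with numpy; the data and variable names are invented:

```python
import numpy as np

def residualize(v, covariate):
    # Residuals of v after regressing it on the covariate (with intercept).
    X = np.column_stack([np.ones_like(covariate), covariate])
    coef, *_ = np.linalg.lstsq(X, v, rcond=None)
    return v - X @ coef

rng = np.random.default_rng(1)
n = 200
cov = rng.normal(size=n)                       # covariate
x = 0.6 * cov + rng.normal(size=n)             # predictor, overlaps with covariate
y = 0.5 * x + 0.4 * cov + rng.normal(size=n)   # criterion

rx = residualize(x, cov)
ry = residualize(y, cov)

partial = np.corrcoef(rx, ry)[0, 1]      # covariate removed from BOTH variables
semipartial = np.corrcoef(rx, y)[0, 1]   # covariate removed from the predictor only
print(partial, semipartial)
```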
Multiple Regression
How to compare the predictive value of 2+ predictors
Remember: regression coefficients are very unstable from sample to sample, so interpret only large differences between coefficients (> ~.2)
Multiple Regression
Like simple regression, multiple regression tests:
The ability of each predictor to predict the criterion variable (tests of the b's/β's)
The overall ability of the model (all predictors combined) to predict the criterion variable (Model R²)
Model R² = the total % of variance in the criterion accounted for by the predictors
Model R = the correlation between the predictors (as a set) and the criterion
It can also test:
Whether one or more predictors can predict the criterion when variance from one or more other predictors is removed
Whether each predictor significantly increases the Model R²
Multiple Regression
Predictors are evaluated with variance from the other predictors removed
There is more than one way to remove this variance:
Examine all predictors en masse, each with variance from all other predictors removed
Or remove variance from one or more predictors first, then look at the second set (like in factorial ANCOVA)
Multiple Regression
This is done by specifying different selection methods
Selection method = the method of entering predictors into a regression equation
There are four most commonly used methods ('commonly used' = the only 4 methods offered by SPSS)
Multiple Regression
Selection Methods
Simultaneous – adds all predictors at once, and is therefore effectively the absence of a selection method
Good if there is no theory to guide which predictors should be entered first (but when does that ever happen?)
Multiple Regression
Selection Methods
All Subsets – the computer finds the combination of predictors that maximizes the overall Model R²
But SPSS doesn't offer it, and it finds the best subset in your particular dataset; since data, not theory, guide the selection, there is no guarantee that the model will generalize to other datasets, particularly in smaller samples (see the sketch below)
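Since SPSS doesn't offer it, here is a brute-force sketch of all-subsets selection (invented data; maximizing raw R², which illustrates the overfitting worry noted above):

```python
import itertools
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(2)
n = 80
cols = ["x1", "x2", "x3", "x4"]
df = pd.DataFrame(rng.normal(size=(n, 4)), columns=cols)
df["y"] = 0.6 * df.x1 + 0.4 * df.x3 + rng.normal(size=n)

# Fit every non-empty subset of predictors and keep the largest Model R^2.
candidates = []
for k in range(1, len(cols) + 1):
    for subset in itertools.combinations(cols, k):
        r2 = smf.ols("y ~ " + " + ".join(subset), data=df).fit().rsquared
        candidates.append((r2, subset))

best_r2, best_subset = max(candidates)
print(best_subset, round(best_r2, 3))   # best in THIS dataset; may not generalize
```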
Multiple Regression
Selection Methods
Backward Elimination – starts with all predictors in the model and iteratively eliminates the predictor with the least unique variance related to the criterion, until all remaining predictors are significant (iterative = a process involving several steps)
Because it begins with all predictors, the predictors whose variance mostly overlaps with the other predictors (i.e. variance that would be partialled out) are the ones removed
But it is also atheoretical/based on the data only (see the sketch below)
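A sketch of the elimination loop (made-up data; at each step the predictor with the largest p-value is dropped until every remaining b is significant):

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(3)
n = 120
df = pd.DataFrame(rng.normal(size=(n, 3)), columns=["x1", "x2", "x3"])
df["y"] = 0.7 * df.x1 + rng.normal(size=n)   # only x1 truly predicts y

predictors = ["x1", "x2", "x3"]
alpha = 0.05
while predictors:
    fit = smf.ols("y ~ " + " + ".join(predictors), data=df).fit()
    pvals = fit.pvalues.drop("Intercept")    # p-value of each predictor's b
    worst = pvals.idxmax()
    if pvals[worst] <= alpha:                # every remaining predictor significant
        break
    predictors.remove(worst)                 # eliminate the least useful, refit

print(predictors)
```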
Multiple Regression
Selection Methods
Forward Selection – the opposite of backward elimination: starts with the predictor most strongly related to the criterion in the model and iteratively adds the predictor next most strongly related to the criterion, until a nonsignificant predictor is found
Step 1: enter the predictor most correlated with the criterion (P1)
Step 2: add the strongest remaining predictor once P1 is partialled out
But it is also atheoretical
Multiple Regression
Selection Methods
Stepwise
Technically, any selection method that proceeds iteratively (in steps) is stepwise (i.e. both backward elimination and forward selection)
However, the term usually refers to the method where the order of the predictors is determined in advance by the researcher based upon theory
Multiple Regression
Selection Method: Stepwise
Why would you use it? For the same reason as covariates in ANCOVA
Want to know if Measure A of treatment adherence is better than Measure B? Run a stepwise regression and enter Measure B first, then Measure A, with treatment outcome as the criterion (see the sketch below)
Addresses the question: Does Measure A predict treatment outcome even when variance from Measure B has already been removed (i.e. above and beyond Measure B)?
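A sketch of that two-step entry, with invented measures a and b and an invented outcome; the F test on the R² increase uses statsmodels' anova_lm:

```python
import numpy as np
import pandas as pd
import statsmodels.api as sm
import statsmodels.formula.api as smf

rng = np.random.default_rng(4)
n = 150
b_meas = rng.normal(size=n)                          # Measure B
a_meas = 0.5 * b_meas + rng.normal(size=n)           # Measure A (overlaps with B)
outcome = 0.4 * a_meas + 0.2 * b_meas + rng.normal(size=n)
df = pd.DataFrame({"outcome": outcome, "a": a_meas, "b": b_meas})

step1 = smf.ols("outcome ~ b", data=df).fit()        # Measure B entered first
step2 = smf.ols("outcome ~ b + a", data=df).fit()    # then Measure A added

print(step2.rsquared - step1.rsquared)   # Measure A's increase in Model R^2
print(sm.stats.anova_lm(step1, step2))   # F test: is that increase significant?
```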
Multiple Regression
Selection Method: Stepwise
Why would you use it? Running a repeated-measures design and want to make sure your groups are equal on pre-test scores? Enter the pre-test into the first step of your regression
Multiple Regression
Assumptions
Linearity of Regression: the variables are linearly related to one another
Normality in Arrays: the actual values of the DV are normally distributed around the predicted values (i.e. around the regression line), so the regression line is a good approximation of the population parameter
Homogeneity of Variance in Arrays: assumes that the variance of the criterion is equal at all levels of the predictor(s)
Multiple Regression
Issues to be aware of: range restriction, heterogeneous subsamples, and outliers
With multiple predictors, you must be aware of both univariate outliers (unusual values on one variable) and multivariate outliers (unusual combinations of values on two or more variables)
Multiple Regression
Outliers
Univariate outlier – a man weighing 500 lbs.
Multivariate outlier – a man who is 6' tall and weighs 120 lbs. (note that neither value is a univariate outlier, but together they are quite odd)
Three quantities define the presence of an outlier in multiple regression (see the sketch below):
Distance – distance from the regression line
Leverage – distance from the predictor mean
Influence – a combination of distance and leverage
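Not from the slides: statsmodels exposes all three diagnostics for a fitted OLS model (invented data; Cook's distance stands in for "influence"):

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(5)
n = 60
df = pd.DataFrame({"x": rng.normal(size=n)})
df["y"] = 0.5 * df.x + rng.normal(size=n)

fit = smf.ols("y ~ x", data=df).fit()
infl = fit.get_influence()

distance = infl.resid_studentized_external   # distance from the regression line
leverage = infl.hat_matrix_diag              # distance from the predictor mean
influence = infl.cooks_distance[0]           # combines distance and leverage

print(int(np.argmax(influence)))   # index of the most influential case
```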
Multiple Regression
Degree of Overlap in Predictors
Adding predictors is like adding covariates in ANCOVA: adding one that correlates too highly with the others leaves the Model R² essentially unchanged while decreasing the df, making the regression less powerful
Tolerance = 1 − the multiple R² from regressing a predictor on all of the other predictors; you want tolerance to be high (low tolerance signals too much overlap); see the sketch below
Also examine the bivariate correlations between predictors; if a correlation exceeds the measures' internal consistency (α), get rid of one of them
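A sketch of checking tolerance via the variance inflation factor (tolerance = 1/VIF); the collinear predictors are made up:

```python
import numpy as np
import pandas as pd
import statsmodels.api as sm
from statsmodels.stats.outliers_influence import variance_inflation_factor

rng = np.random.default_rng(6)
n = 100
x1 = rng.normal(size=n)
x2 = 0.9 * x1 + 0.1 * rng.normal(size=n)    # nearly redundant with x1
X = sm.add_constant(pd.DataFrame({"x1": x1, "x2": x2}))

for i, name in enumerate(X.columns):
    if name == "const":
        continue
    tolerance = 1.0 / variance_inflation_factor(X.values, i)
    print(name, round(tolerance, 3))        # near 0 => problematic overlap
```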
Multiple Regression
Multiple regression can also test for more complex relationships, such as mediation and moderation
Mediation – when one variable (the predictor) operates on another variable (the criterion) via a third variable (the mediator)
E.g., math self-efficacy mediates the relationship between math ability and interest in a math major
Must establish paths A & B, and show that path C is smaller when paths A & B are included in the model (i.e. math self-efficacy accounts for variance in interest in a math major above and beyond math ability)
1. Find significant correlations between the predictor and the mediator (path A) and between the mediator and the criterion (path B)
2. Run a stepwise regression with the predictor entered first, then the predictor and mediator entered together in step 2 (see the sketch after the criteria below)
Multiple Regression
The mediator should be a significant predictor of the criterion in step 2
The predictor-criterion relationship (b/β) should decrease from step 1 to step 2
Full mediation: the relationship is significant in step 1 but nonsignificant in step 2
Partial mediation: the relationship is significant in step 1, and smaller, but still significant, in step 2
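A sketch of steps 1-2 and the full/partial-mediation check, with invented math-domain variables:

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(7)
n = 200
ability = rng.normal(size=n)                     # predictor
efficacy = 0.6 * ability + rng.normal(size=n)    # mediator
interest = 0.5 * efficacy + rng.normal(size=n)   # criterion
df = pd.DataFrame({"ability": ability, "efficacy": efficacy,
                   "interest": interest})

step1 = smf.ols("interest ~ ability", data=df).fit()              # path C alone
step2 = smf.ols("interest ~ ability + efficacy", data=df).fit()   # add mediator

print(step1.params["ability"], step1.pvalues["ability"])   # b in step 1
print(step2.params["ability"], step2.pvalues["ability"])   # should shrink in step 2
print(step2.pvalues["efficacy"])                           # mediator significant?
```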
Multiple Regression
Partial mediation
Sobel's test (1982): tests the statistical significance of this mediation relationship
1. Regress the mediator on the predictor (path A) and the criterion on the mediator (path B) in 2 separate regressions
2. Calculate s_β for paths A & B, where s_β = β/t
3. Calculate a t-statistic, where df = n − 3 and
t = (β_A × β_B) / √(β_B² × s_A² + β_A² × s_B² + s_A² × s_B²)
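A direct translation of the three steps into code (hypothetical path coefficients and SEs; scipy supplies the t distribution):

```python
import numpy as np
from scipy import stats

def sobel_test(beta_a, se_a, beta_b, se_b, n):
    # Test of the indirect effect beta_a * beta_b, using the slide's formula
    # (the variant that includes the s_A^2 * s_B^2 term); df = n - 3.
    se_ab = np.sqrt(beta_b**2 * se_a**2 + beta_a**2 * se_b**2
                    + se_a**2 * se_b**2)
    t = (beta_a * beta_b) / se_ab
    p = 2 * stats.t.sf(abs(t), df=n - 3)
    return t, p

# Hypothetical values; recall s_beta = beta / t from step 2.
print(sobel_test(beta_a=0.60, se_a=0.08, beta_b=0.45, se_b=0.07, n=200))
```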
Multiple Regression
Multiple regression can also test for more complex relationships, such as mediation and moderation
Moderation (in regression) – when the strength of a predictor-criterion relationship changes as a result of a third variable (the moderator)
Compare: an interaction (in ANOVA) – when the strength of the relationship between an IV and the DV changes as a function of the levels of another IV
Multiple Regression
Moderation
Unlike in ANOVA, you have to create the interaction (moderation) term yourself by multiplying the predictor and the moderator; in SPSS, go to Transform → Compute
It is typical to enter the predictor and moderator in the first step of a regression and the interaction term in the second step, to determine the contribution of the interaction above and beyond the main-effect terms (see the sketch below)
Just like how variance is partitioned in a factorial ANOVA
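A sketch of the two-step moderation test (invented data; the variables are mean-centered before forming the product, a common practice):

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(8)
n = 200
df = pd.DataFrame({"pred": rng.normal(size=n), "moder": rng.normal(size=n)})
df["y"] = (0.4 * df["pred"] + 0.2 * df["moder"]
           + 0.5 * df["pred"] * df["moder"] + rng.normal(size=n))

# Create the interaction term yourself (SPSS: Transform -> Compute).
df["pred_c"] = df["pred"] - df["pred"].mean()
df["moder_c"] = df["moder"] - df["moder"].mean()
df["inter"] = df["pred_c"] * df["moder_c"]

step1 = smf.ols("y ~ pred_c + moder_c", data=df).fit()           # main effects
step2 = smf.ols("y ~ pred_c + moder_c + inter", data=df).fit()   # add interaction
print(step2.rsquared - step1.rsquared, step2.pvalues["inter"])
```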
Logistic Regression
Logistic Regression = used to predict a dichotomous criterion variable (only 2 levels) with 1+ continuous or discrete predictors
Can't use linear regression with a dichotomous criterion because:
1. A dichotomous criterion is not normally distributed around the regression line (i.e. the assumption of normality in arrays is violated)
2. The regression line fits the data more poorly at some values of the predictor than at others (i.e. the assumption of homogeneity of variance in arrays is violated)
Logistic Regression
Interpreting coefficients
In logistic regression, b represents the change in the log odds of the criterion for a one-point increase in the predictor
Raise e to the power b to find the odds ratio: e.g., b = −.0812 → e^(−.0812) = .9220
Logistic Regression
Interpreting coefficients (continued)
Continuous predictor: a one-point increase in the predictor corresponds to the odds of the criterion decreasing (because b is negative) by a factor of .922, i.e. roughly an 8% decrease in the odds
Dichotomous predictor: e^b gives the odds in one group vs. the other group (the sign of b indicates an increase or decrease); see the sketch below
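To close, a minimal logistic fit (invented data) showing the coefficient-to-odds conversion described above:

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(9)
n = 300
df = pd.DataFrame({"x": rng.normal(size=n)})
log_odds = -0.1 * df["x"]                 # true slope on the log-odds scale
p = 1.0 / (1.0 + np.exp(-log_odds))
df["y"] = (rng.uniform(size=n) < p).astype(int)   # dichotomous criterion

fit = smf.logit("y ~ x", data=df).fit()
b = fit.params["x"]        # change in log odds per one-point increase in x
print(b, np.exp(b))        # e^b = odds ratio, e.g. b = -.0812 -> .922
```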