Post on 19-Jan-2016
REGRESSION
Descriptions
Description
• Correlation – simply finding the relationship between two scores
○ Both the magnitude (how strong or how big)
○ And the direction (positive / negative)
Description
Regression, by contrast, seeks to use one of the variables as the predictor
Therefore you have an X variable (IV): the predictor
And a Y variable (DV): the criterion
Description
Predictor X variables – more flexible than ANOVA
Can be any combination of variables: continuous, Likert, categorical
Dependent Y variables – usually continuous, but you can predict categorical variables
Categorical outcomes are better handled with discriminant analysis or logistic regression
Description
Still not a causal design, unless you manipulate the X (IV) variable
However, sometimes it is very obvious which variable would be predictive
Smoking predicts cancer
REGRESSION
Research Questions
Research Questions
Usually want to know the relationship between the IVs and the DV and the importance of each IV
OR
Control for some variables' variance and then see if other IVs add any additional prediction
Compare sets of IVs and how predictive they are (which is better)
Research Questions
How good is the equation?
Is it better than chance? Or better than using the mean to predict scores?
Research Questions
Importance of IVs
Which IVs are the most important? Which contribute the most prediction to the equation?
Research Questions
Adding IVs
For example, PTSD scores are predictive of alcohol use
After we control for these scores, do meaning in life scores help predict alcohol use?
Research Questions
Non-linear relationships can be assessed and determined
So, you can use X² to help with curvilinear relationships that you might see when data screening
Research Questions
Controlling for other sets of IVs
Using demographics to control for unequal groups or additional variance over being people
Comparing sets of IVs
Using several IVs together to be predictive over another set of IVs
Research Questions
Making an equation to predict new people’s scores
After you have shown that your IVs are predictive, using those scores to assess new people’s performance
Entrance exams for school, military, etc.
REGRESSION PARTS
Equation
Y-hat = A + B1X1 + B2X2 + …
Y-hat = predicted value for each participant
A = constant, the predicted score when every X is zero (y-intercept)
Equation
Y-hat = A + B1X1 + B2X2 + …
B = coefficient
○ Holding all other variables constant, for every one unit increase in X there is a B unit increase in Y
○ The slope for that X variable, given the other Xs in the equation
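The equation above can be sketched in a few lines of NumPy. This is only an illustration: the six data points are made up, and `np.linalg.lstsq` stands in for whatever SPSS does internally.

```python
import numpy as np

# Hypothetical data: 6 participants, two predictors (X1, X2), one outcome (Y).
X = np.array([[1.0, 2.0], [2.0, 1.0], [3.0, 4.0],
              [4.0, 3.0], [5.0, 6.0], [6.0, 5.0]])
y = np.array([5.1, 5.9, 10.2, 10.8, 15.1, 16.0])

# A leading column of ones makes the first coefficient the constant A.
design = np.column_stack([np.ones(len(y)), X])
coefs, *_ = np.linalg.lstsq(design, y, rcond=None)
A, B1, B2 = coefs

# Y-hat = A + B1*X1 + B2*X2 for each participant.
y_hat = A + B1 * X[:, 0] + B2 * X[:, 1]
print(f"A = {A:.3f}, B1 = {B1:.3f}, B2 = {B2:.3f}")
```

Each B is the change in Y-hat for a one unit change in its X while the other X stays where it is.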
Equation
Standardized Equation
Y-hat = β1X1 + β2X2 + … (all variables as z-scores)
Beta (β) = standardized B (or z-score B if you like)
For each 1 standard deviation increase in X, there is a β standard deviation increase in Y
○ Difficult to interpret
○ BUT! β usually falls between -1 and 1, so you can treat it much like r (which means you can tell direction and magnitude)
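As a rough check on the idea that β is just B on z-scored variables, the sketch below computes β two ways on simulated data. The data, seed, and helper name `zscore` are all assumptions made for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200
x1 = rng.normal(50, 10, n)
x2 = rng.normal(5, 2, n)
y = 3 + 0.4 * x1 + 1.5 * x2 + rng.normal(0, 5, n)

def zscore(v):
    """Convert raw scores to z-scores (sample SD)."""
    return (v - v.mean()) / v.std(ddof=1)

# Unstandardized Bs from the raw-score regression.
design = np.column_stack([np.ones(n), x1, x2])
b = np.linalg.lstsq(design, y, rcond=None)[0]

# Betas from regressing z(Y) on z(X1), z(X2); no constant is needed
# because every z-scored variable has a mean of zero.
Z = np.column_stack([zscore(x1), zscore(x2)])
betas = np.linalg.lstsq(Z, zscore(y), rcond=None)[0]

# The same betas via the conversion beta = B * SD(X) / SD(Y).
sd_x = np.array([x1.std(ddof=1), x2.std(ddof=1)])
betas_from_b = b[1:] * sd_x / y.std(ddof=1)
```

Both routes give the same numbers, which is why β can be read in standard deviation units.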
Equation
Pearson product – moment correlation = RR is the correlation between y and y-hat R2 = variance accounted for in DV by all the
IVs (not just one like r, but ALL of them).
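A small sketch, on simulated data, of the two ways to reach R²: squaring the correlation between Y and Y-hat, or comparing residual variance with total variance. The data and seed are assumptions.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 100
X = rng.normal(size=(n, 2))
y = 1.0 + 0.8 * X[:, 0] - 0.5 * X[:, 1] + rng.normal(size=n)

design = np.column_stack([np.ones(n), X])
y_hat = design @ np.linalg.lstsq(design, y, rcond=None)[0]

# R is the Pearson correlation between observed and predicted scores.
R = np.corrcoef(y, y_hat)[0, 1]

# R² as the proportion of DV variance accounted for.
ss_res = np.sum((y - y_hat) ** 2)
ss_tot = np.sum((y - y.mean()) ** 2)
r2_from_ss = 1 - ss_res / ss_tot
```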
SR
Semipartial correlations = sr = part in SPSS
sr² = unique contribution of the IV to R² for those IVs
Increase in proportion of explained Y variance when X is added to the equation
A / DV variance (areas in the figure)
[Figure: diagram with areas labeled DV Variance, IV 1, IV 2, and A]
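PR
Partial correlation = pr = partial in SPSS
Proportion of the variance in Y not explained by the other predictors that is explained by this X only
A / B (areas in the figure)
pr ≥ sr (in absolute value)
[Figure: diagram with areas labeled DV Variance, IV 1, IV 2, A, and B]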
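The semipartial and partial definitions can be sketched as differences in R². The helper `r_squared`, the simulated variables, and the seed are assumptions for illustration; SPSS reports sr and pr directly.

```python
import numpy as np

def r_squared(X, y):
    """R² from an ordinary least squares fit that includes a constant."""
    design = np.column_stack([np.ones(len(y)), X])
    resid = y - design @ np.linalg.lstsq(design, y, rcond=None)[0]
    return 1 - (resid @ resid) / np.sum((y - y.mean()) ** 2)

rng = np.random.default_rng(2)
n = 150
iv1 = rng.normal(size=n)
iv2 = 0.6 * iv1 + rng.normal(size=n)          # correlated with iv1
dv = 0.5 * iv1 + 0.4 * iv2 + rng.normal(size=n)

r2_full = r_squared(np.column_stack([iv1, iv2]), dv)
r2_without_iv1 = r_squared(iv2, dv)

# sr²: increase in R² when iv1 is added last.
sr2 = r2_full - r2_without_iv1
# pr²: the same unique piece, relative to the DV variance
# the other predictor left unexplained.
pr2 = sr2 / (1 - r2_without_iv1)
```

Because pr² divides by something no larger than 1, it can never be smaller than sr².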
TYPES OF REGRESSION
ANOVA = Regression
ANOVA = Regression with discrete variables
However, you cannot easily create an ANOVA from a regression
Must convert continuous variables into discrete variables, which causes you to lose variance
More power with regression
Simple (SLR)
SLR involves only one IV and one DV.
It’s called simple because there’s only ONE thing predicting.
In this case, beta = r.
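A quick sketch of why beta equals r with a single predictor. The variable names echo the deck's later example, but the numbers are simulated assumptions.

```python
import numpy as np

rng = np.random.default_rng(3)
books = rng.normal(5, 2, 80)                      # hypothetical IV
grade = 70 + 2.0 * books + rng.normal(0, 6, 80)   # hypothetical DV

# np.polyfit with degree 1 returns (slope, intercept).
slope, intercept = np.polyfit(books, grade, 1)

# Standardize the slope: beta = B * SD(X) / SD(Y).
beta = slope * books.std(ddof=1) / grade.std(ddof=1)
r = np.corrcoef(books, grade)[0, 1]
```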
Multiple (MLR)
MLR uses several IVs and only one DV.
You can use a mix of variables – continuous, categorical, Likert, etc.
You can use MLR to figure out which IVs are the most important.
○ 3 types of MLR
Simultaneous/Standard
All of the variables are entered “at once”
Each variable is assessed as if it were the last variable entered
This “controls” for the other IVs, as we discussed for the interpretation of B.
Evaluates whether sr > 0
Simultaneous/Standard
If you have two highly correlated IVs the one with the biggest sr gets all the variance
Therefore the other IV will get very little variance associated with it and look unimportant
Sequential/Hierarchical
IVs enter the regression equation in an order specified by the researcher
First IV is basically tested against r (since there’s nothing else in the equation it gets all the variance)
Next IVs are tested against pr (they only get the left over variance)
Sequential/Hierarchical
What order?
Assigned by theoretical importance
Or you can control for nuisance variables in the first step
Sequential/Hierarchical
Using SETS of IVs instead of individuals
So, say you have a group of IVs that are super highly correlated but you don’t know how to combine them or do not want to eliminate them.
Instead you will process each step as a SET and you don’t care about each individual predictor
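A hierarchical regression can be sketched as two fits and a comparison of their R² values. Everything here is an assumption for illustration: the data are simulated, `control_set` and `new_set` are hypothetical names, and the F for the change is computed by hand rather than taken from any package.

```python
import numpy as np

def r_squared(X, y):
    """R² from an ordinary least squares fit that includes a constant."""
    design = np.column_stack([np.ones(len(y)), X])
    resid = y - design @ np.linalg.lstsq(design, y, rcond=None)[0]
    return 1 - (resid @ resid) / np.sum((y - y.mean()) ** 2)

rng = np.random.default_rng(4)
n = 120
control_set = rng.normal(size=(n, 2))   # step 1: nuisance variables
new_set = rng.normal(size=(n, 1))       # step 2: predictor(s) of interest
dv = 0.3 * control_set[:, 0] + 0.5 * new_set[:, 0] + rng.normal(size=n)

r2_step1 = r_squared(control_set, dv)
r2_step2 = r_squared(np.column_stack([control_set, new_set]), dv)

# Variance added by the step 2 set over and above step 1.
delta_r2 = r2_step2 - r2_step1

m = new_set.shape[1]                    # number of IVs added in step 2
k = control_set.shape[1] + m            # total IVs in the step 2 model
f_change = (delta_r2 / m) / ((1 - r2_step2) / (n - k - 1))
```

The same code works when step 2 adds a whole set of columns; only `m` changes.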
Stepwise/Statistical
Entry into the equation is based solely on statistical relationships and has nothing to do with theory or your experiment
Stepwise/Statistical
Forward – biggest IV is added first, then each IV is added as long as it accounts for enough variance
Backward – all are entered in the equation at first, and then each one is removed if it doesn’t account for enough variance
Stepwise – mix between the two (adds them but then may later delete them if they are no longer important).
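The forward method can be sketched as a loop that keeps adding the IV with the biggest gain in R². Note the assumption: this sketch stops when the gain falls below a fixed `min_gain`, a simplification; statistical packages typically use a significance test for entry instead. The data and names are hypothetical.

```python
import numpy as np

def r_squared(X, y):
    """R² from an ordinary least squares fit that includes a constant."""
    design = np.column_stack([np.ones(len(y)), X])
    resid = y - design @ np.linalg.lstsq(design, y, rcond=None)[0]
    return 1 - (resid @ resid) / np.sum((y - y.mean()) ** 2)

def forward_select(X, y, min_gain=0.01):
    """Add IVs one at a time while each adds at least min_gain to R²."""
    chosen, remaining, current = [], list(range(X.shape[1])), 0.0
    while remaining:
        gains = {j: r_squared(X[:, chosen + [j]], y) - current
                 for j in remaining}
        best = max(gains, key=gains.get)
        if gains[best] < min_gain:
            break
        chosen.append(best)
        remaining.remove(best)
        current += gains[best]
    return chosen

rng = np.random.default_rng(5)
X = rng.normal(size=(300, 3))
# Column 0 matters most, column 2 somewhat, column 1 not at all.
y = 2.0 * X[:, 0] + 1.0 * X[:, 2] + rng.normal(size=300)

order = forward_select(X, y)
print(order)
```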
ASSUMPTIONS
Number of People
Ratio of cases to IVs
If you have fewer cases than IVs you will get a perfect solution (aka account for all the variance in the DV)
But that doesn’t mean anything…
Number of People
Ratio of cases to IVs
G*Power: tells you how many cases you need given alpha, power, number of predictors, etc.
Rules of thumb: more than 50 + 8K cases (K = number of IVs)
Or 104 + K (for testing importance of predictors)
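The two rules of thumb are simple arithmetic; the function names below are hypothetical.

```python
def min_n_overall(k):
    """Rule of thumb for testing the overall equation: 50 + 8K cases."""
    return 50 + 8 * k

def min_n_predictors(k):
    """Rule of thumb for testing individual predictors: 104 + K cases."""
    return 104 + k

# With K = 3 IVs:
print(min_n_overall(3), min_n_predictors(3))  # prints "74 107"
```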
Number of People
How many people?However…you can have too many people.Any correlation or predictor will be
significant with very large N○ Practical versus statistical significance
Missing Data
Continuous data – linear trend at point, mean replace, etc.
Categorical data – best to leave it out because you can’t guess at it.
Outliers
Now, since IVs are continuous, we want to make sure there are no outliers on either the IVs or the DV
Mahalanobis distance
Outliers
Leverage – how much influence over the slope a point has
Cut off rule of thumb = (2K+2)/N
Discrepancy – how far away from other data points a point is (no influence)
Cook’s – influence – combination of both leverage and discrepancy
Cut off rule of thumb = 4/(N-K-1)
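The leverage and Cook's cut offs above can be applied by hand, as in this sketch. The data are simulated with one planted extreme case, and the formulas used are the standard hat-matrix definitions, not taken from the slides.

```python
import numpy as np

rng = np.random.default_rng(6)
n, k = 50, 1                       # N cases, K predictors
x = rng.normal(size=n)
y = 2 * x + rng.normal(size=n)
x[0], y[0] = 10.0, -20.0           # planted extreme case (hypothetical)

design = np.column_stack([np.ones(n), x])
hat = design @ np.linalg.inv(design.T @ design) @ design.T
leverage = np.diag(hat)            # one leverage value per case

resid = y - hat @ y
mse = resid @ resid / (n - k - 1)
# Cook's distance combines the residual (discrepancy) with leverage.
cooks = (resid ** 2 / ((k + 1) * mse)) * leverage / (1 - leverage) ** 2

leverage_cutoff = (2 * k + 2) / n
cooks_cutoff = 4 / (n - k - 1)
flagged = np.where((leverage > leverage_cutoff) | (cooks > cooks_cutoff))[0]
print(flagged)
```

Cases that exceed a cut off are candidates for a closer look, not automatic deletions.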
Multicollinearity
If IVs are too highly correlated there are several issues
SPSS may not run
SPSS picks which variable to go first depending on the type of analysis
Check – bivariate correlation table of IVs (you want them to be correlated with the DV!)
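The correlation-table check might look like this sketch. The data are simulated, and the .90 threshold is an assumption chosen for the example; the slides do not give a cut off.

```python
import numpy as np

rng = np.random.default_rng(7)
n = 100
iv1 = rng.normal(size=n)
iv2 = iv1 + rng.normal(scale=0.1, size=n)   # nearly a copy of iv1
iv3 = rng.normal(size=n)

ivs = np.column_stack([iv1, iv2, iv3])
corr = np.corrcoef(ivs, rowvar=False)       # bivariate correlation table

threshold = 0.90                            # assumed cut off for "too high"
pairs = [(i, j)
         for i in range(ivs.shape[1])
         for j in range(i + 1, ivs.shape[1])
         if abs(corr[i, j]) > threshold]
print(pairs)
```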
Normal/Linear
Normality – we want our IVs and DVs to be normally distributed
Residual Histogram
Linearity – relationships between IV and DV should be linear, or you will need a special X² term
Normality P-P Plot
Homogeneity/Homoscedasticity
Homogeneity – you want the IVs/DVs to have equal variances
Residual Plot (equal spread up and down - raining)
Homoscedasticity – you want the errors to be spread evenly across the values of the other variables
Residual Plot (equal spread up and down across the bottom – megaphones)
Theoretical Assumption
Independence of errors
You need to know that the scores of the first person tested are not affecting the scores of the last person tested
Mud on a scale
EXAMPLES
SLR
Data set 1
IVs
Books – number of books people read
Attend – attendance for class
DV
Grade – final grade in the class
SLR
Research Question:
Does the number of books predict final grade in the course?
Does attendance predict final grade in the course?
MLR - Simultaneous
Research Question
Do books and attendance both predict final course grade?
○ Overall – together?
○ Individual predictors?
MLR – Hierarchical
Research question: What predicts how well people take care of their cars?
We want to first control for demographics (age, gender)
And then use extroversion to predict how well people take care of their cars.
MLR Hierarchical
So after controlling for demographics, does extroversion predict?
Interactions
Dummy Coding Types
Two categorical
One categorical, one continuous
Two continuous
Dummy Coding
A way to do ANOVA in regression
If you have two levels, simply type them in as 0 and 1
If you have more than two levels, you need to enter each separately
Dummy Coding
More than two levels:
You will need Levels – 1 columns
F value – tells you the overall main effect
B value – compares that group to the group coded as all zeros
Dummy Coding
After you enter each variable separately, then enter them as a set in one (simultaneous) regression
The significance of the overall model will tell you if the main effect is significant
B gives you differences between groups (two levels)
Dummy Coding
How many friends do people have?
This example is from ANOVA.
IV: Health condition – excellent, fair or poor.
DV: Number of Friends.
Dummy Coding
Since we have three groups or levels, we’ll need to recode this variable into 2 variables.
One for excellent
One for fair
Poor is left blank (coded 0 on both).
Dummy Coding
Why not three?
Because that would be repetitive.
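The health-condition example can be sketched with two dummy columns. The friend counts below are made up for illustration; only the coding scheme comes from the slides.

```python
import numpy as np

# Hypothetical data: three people per health condition.
health = ["excellent"] * 3 + ["fair"] * 3 + ["poor"] * 3
friends = np.array([8.0, 9.0, 10.0, 5.0, 6.0, 7.0, 2.0, 3.0, 4.0])

# Levels - 1 = 2 columns; poor is the group coded as all zeros.
excellent = np.array([1.0 if h == "excellent" else 0.0 for h in health])
fair = np.array([1.0 if h == "fair" else 0.0 for h in health])

design = np.column_stack([np.ones(len(friends)), excellent, fair])
constant, b_excellent, b_fair = np.linalg.lstsq(design, friends, rcond=None)[0]

# A third dummy would be repetitive: it equals 1 - excellent - fair,
# so it adds a column without adding any new information (rank stays 3).
poor = 1.0 - excellent - fair
rank_with_three = np.linalg.matrix_rank(np.column_stack([design, poor]))
```

The constant is the mean of the poor group, and each B is that group's mean minus the poor group's mean.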
Interactions
Interactions – well, we automatically test for interactions in ANOVA, why not in regression?
In regression an interaction says that there are differences in the slope of the line predicting Y from one IV depending on the level of the other IV
Interactions
Nominal variable interactions:
So we have two categorical predictors.
Example – create interaction term
○ Testing Environment by Learning Environment.
Interactions - Nominal
Now that we’ve created our interaction terms, we can test them using a hierarchical regression
Step one – main effects
Step two – main effects and interactions
Interactions - Nominal
Now we examine step 1 for main effects
Step two for interactions
You ignore the main effects in Step 2
Interactions - Nominal
What does all that mean?!
After a significant ANOVA, you do a post hoc, correct?
Simple slopes – post hoc analyses for interactions in regression
○ These are “harder to get” than an ANOVA, but there are fewer “tests” to run, so technically more powerful/less type 1 error
Interactions - Nominal
You will write out the equation and figure out the slopes/means/picture for each condition combination.
Y-hat = 30.8 - 8(learning) - 14.1(testing) + 20.5(learning × testing)
Interactions - Nominal
Now we’ll fill in the equation for all the combinations.
Learning (0 or 1)
Testing (0 or 1)
Interaction (0 or 1 depending on the combination).
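Filling in the equation for every combination can be sketched as below. The coefficients are the ones from the slide's equation; the function name `predicted_score` is hypothetical.

```python
def predicted_score(learning, testing):
    """Slide equation with 0/1 codes for each environment."""
    interaction = learning * testing   # 1 only when both are coded 1
    return 30.8 - 8 * learning - 14.1 * testing + 20.5 * interaction

cells = {(learning, testing): round(predicted_score(learning, testing), 1)
         for learning in (0, 1) for testing in (0, 1)}

for (learning, testing), score in cells.items():
    print(f"learning={learning}, testing={testing}: {score}")
```

The four results are 30.8, 16.7, 22.8, and 29.2.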
Interaction - Nominal
          Dry (0)   Wet (1)
Dry (0)   30.8      16.7
Wet (1)   22.8      29.2
[Figure: chart of Score (y-axis) by Learning Environment (x-axis: Dry (0), Wet (1)), with series Dry (0) and Wet (1)]
Interactions - Mix
Data Set 4
IVs
Events – number of events attended
Status – low (0) versus high (1)
DV
Stress levels
How to
Create interaction
Transform > compute > multiply
Run regression as before
Step 1 – main effects
Step 2 – main effects and interaction
Interactions - Mix
LOW status, look at events slope.
B = .121, β = .52, t(57) = 3.94, p < .001, indicating that low status people feel more stress as the number of events they attend increases.
HIGH status, look at events slope.
B = .02, β = .10, t(57) = .55, p = .58, indicating that high status people feel the same amount of stress no matter how many events they attend.
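One way to get the events slope for each status group is to run the interaction model twice, flipping which group is coded 0. The sketch below assumes that approach and uses simulated data; it does not reproduce the values on the slide.

```python
import numpy as np

def fit(columns, y):
    """OLS coefficients (constant first) for a list of predictor columns."""
    design = np.column_stack([np.ones(len(y))] + columns)
    return np.linalg.lstsq(design, y, rcond=None)[0]

rng = np.random.default_rng(8)
n = 60
events = rng.integers(0, 30, n).astype(float)
status = np.repeat([0.0, 1.0], n // 2)            # low (0) versus high (1)
stress = (20 + 0.12 * events * (1 - status) - 3 * status
          + rng.normal(0, 2, n))

# With status coded low = 0, the events B is the slope for LOW status.
b_low = fit([events, status, events * status], stress)
slope_low = b_low[1]

# Flip the coding so high = 0; the events B is now the slope for HIGH status.
flipped = 1 - status
b_high = fit([events, flipped, events * flipped], stress)
slope_high = b_high[1]
```

The high status slope also equals the low status slope plus the interaction B from the first model.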
Interaction - Mix
              Low Events   High Events
Low Status    20.93        27.81
High Status   17.78        18.98
[Figure: chart of the values above by Low Events vs. High Events, with series Low Status and High Status]
Interactions - continuous
Most likely combination since you are running a regression
Create interaction term first (multiply them together)
Books * Attendance interaction to predict grades.
Interactions – continuous
Pick ONE variable to examine. Let’s go with attendance.
You can get the AVERAGE slope for attendance and books.
Since we picked attendance, we will look at the slope for books, β = -.532, t(37) = -1.21, p = .24. So at average attendance, reading books does not increase your grade.
Let’s create hi and lo terms for ONE of the variables.
AttendanceHI, AttendanceLO
AttendanceHI by Books, AttendanceLO by Books.
Interactions - continuous
Now, we can’t just use 1 and 0 for different groups
So we have to create “hi” and “lo” groups for one variable
This theory is also backwards… for the hi group, you subtract 1 SD, for the lo group you add 1 SD
Basically you are bringing them up or down to the mean
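The "backwards" hi and lo terms can be sketched as shifted copies of centered attendance. Subtracting 1 SD moves the zero point up to high attendance, so the books B in that model is the books slope at high attendance. The data are simulated and the variable names are assumptions.

```python
import numpy as np

def fit(columns, y):
    """OLS coefficients (constant first) for a list of predictor columns."""
    design = np.column_stack([np.ones(len(y))] + columns)
    return np.linalg.lstsq(design, y, rcond=None)[0]

rng = np.random.default_rng(9)
n = 40
books = rng.normal(5, 2, n)
attendance = rng.normal(15, 3, n)
grade = (60 + books + attendance
         + 0.3 * (books - 5) * (attendance - 15) + rng.normal(0, 4, n))

books_c = books - books.mean()
att_c = attendance - attendance.mean()
sd = attendance.std(ddof=1)

att_hi = att_c - sd    # zero now sits at 1 SD ABOVE the mean
att_lo = att_c + sd    # zero now sits at 1 SD BELOW the mean

b_avg = fit([books_c, att_c, books_c * att_c], grade)
b_hi = fit([books_c, att_hi, books_c * att_hi], grade)
b_lo = fit([books_c, att_lo, books_c * att_lo], grade)

slope_avg, slope_hi, slope_lo = b_avg[1], b_hi[1], b_lo[1]
```

The three books slopes differ by the interaction B times 1 SD of attendance.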
Interaction
[Figure: chart of Grade (y-axis) by Books Read (x-axis: low books, high books), with lines for High Attendance, Average Attendance, and Low Attendance]
Mediation
Mediation
Mediation occurs when the relationship between an X variable and a Y variable is eliminated or lowered when an additional Mediator variable is added to the equation.
Mediation Steps
Baron and Kenny
Step 1 – use X to predict Y to get c pathway.
Step 2 – use X to predict M to get a pathway.
Step 3 – use X and M to predict Y to get b pathway.
Step 4 – use the same regression to look at the c’ pathway.
Sobel test
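The four steps and the Sobel test can be sketched with three regressions. The data are simulated, the helper `ols` is hypothetical, and the Sobel formula used is the common first-order version, z = ab / sqrt(b²·SEa² + a²·SEb²).

```python
import numpy as np

def ols(columns, y):
    """Return OLS coefficients and standard errors (constant first)."""
    design = np.column_stack([np.ones(len(y))] + columns)
    coefs = np.linalg.lstsq(design, y, rcond=None)[0]
    resid = y - design @ coefs
    mse = resid @ resid / (len(y) - design.shape[1])
    se = np.sqrt(np.diag(mse * np.linalg.inv(design.T @ design)))
    return coefs, se

rng = np.random.default_rng(10)
n = 200
x = rng.normal(size=n)
m = 0.5 * x + rng.normal(size=n)              # mediator
y = 0.4 * m + 0.1 * x + rng.normal(size=n)

coefs1, _ = ols([x], y)                       # Step 1: X predicts Y
c = coefs1[1]
coefs2, se2 = ols([x], m)                     # Step 2: X predicts M
a, se_a = coefs2[1], se2[1]
coefs3, se3 = ols([x, m], y)                  # Steps 3 and 4: X and M predict Y
c_prime, b, se_b = coefs3[1], coefs3[2], se3[2]

# Sobel test of the indirect effect a*b.
sobel_z = (a * b) / np.sqrt(b ** 2 * se_a ** 2 + a ** 2 * se_b ** 2)
print(f"c = {c:.3f}, c' = {c_prime:.3f}, a*b = {a * b:.3f}, z = {sobel_z:.2f}")
```

Mediation shows up as c’ being smaller than c, with the drop equal to a times b.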