
REGRESSION: Descriptions

Description

• Correlation – simply finding the relationship between two scores

○ Both the magnitude (how strong or how big)
○ And direction (positive / negative)

Description

Whereas regression seeks to use one of the variables as the predictor
○ Therefore you have an X variable (IV) – the predictor
○ And a Y variable (DV) – the criterion

Description

Predictor X variables – more flexible than ANOVA
○ Can be any combination of variables: continuous, Likert, categorical
Dependent Y variables – usually continuous, but you can predict categorical variables
○ Better with discriminant analysis or logistic regression

Description

Still not causal design, unless you manipulate the X (IV) variable

However, sometimes it is very obvious which variable would be predictive
○ Smoking predicts cancer

REGRESSION: Research Questions

Research Questions

Usually want to know the relationship between the IVs and the DV, and the importance of each IV
OR
Control for the variance of some variables and then see if other IVs add any additional prediction
Compare sets of IVs and how predictive they are (which is better)

Research Questions

How good is the equation?
○ Is it better than chance? Or better than using the mean to predict scores?

Research Questions

Importance of IVs
○ Which IVs are the most important? Which contribute the most prediction to the equation?

Research Questions

Adding IVs
○ For example, PTSD scores are predictive of alcohol use
○ After we control for these scores, do meaning in life scores help predict alcohol use?

Research Questions

Non-linear relationships can be assessed and determined
○ So, you can use X² to help with curvilinear relationships that you might see when data screening
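A minimal sketch of the X² idea, assuming made-up data with a built-in curve (the numbers and variable names are hypothetical, not from any data set in these slides). It compares R² for a straight-line equation against one that also includes X².

```python
import numpy as np

def r_squared(columns, y):
    """R-squared for an equation with a constant plus the given predictor columns."""
    X = np.column_stack([np.ones(len(y))] + list(columns))
    coefs, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ coefs
    return 1.0 - resid @ resid / np.sum((y - y.mean()) ** 2)

# Hypothetical curvilinear data (assumption for illustration only).
rng = np.random.default_rng(10)
x = rng.uniform(-3, 3, size=200)
y = 2.0 + 0.5 * x - 1.5 * x**2 + rng.normal(size=200)

r2_linear = r_squared([x], y)            # X only
r2_quadratic = r_squared([x, x**2], y)   # X and X squared

print(f"R2 with X only: {r2_linear:.2f}, with X and X squared: {r2_quadratic:.2f}")
```

If the squared term adds a lot of prediction, the relationship is probably curvilinear.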

Research Questions

Controlling for other sets of IVs
○ Using demographics to control for unequal groups, or for additional variance that comes simply from being people
Comparing sets of IVs
○ Using several IVs together to be more predictive than another set of IVs

Research Questions

Making an equation to predict new people’s scores
○ After you have shown that your IVs are predictive, using those scores to assess new people’s performance
○ Entrance exams for school, military, etc.

REGRESSION PARTS

Equation

Y-hat = A + B1X1 + B2X2 + …
Y-hat = predicted value for each participant
A = constant, the value added to each score; it is the predicted score when every X is zero (y-intercept)

Equation

Y-hat = A + B1X1 + B2X2 + …
B = coefficient

○ Holding all other variables constant, for every one unit increase in X there is a B unit increase in Y

○ Slope for that X variable given all others are zero
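As a rough illustration of the equation, here is a minimal sketch that estimates A, B1 and B2 by least squares and builds Y-hat. The data are simulated and the names (`x1`, `x2`, `predict`) are hypothetical; SPSS would give the same coefficients for the same data.

```python
import numpy as np

# Hypothetical data: two predictors and one outcome (assumption for illustration).
rng = np.random.default_rng(0)
n = 200
x1 = rng.normal(size=n)
x2 = rng.normal(size=n)
y = 3.0 + 2.0 * x1 - 1.0 * x2 + rng.normal(scale=0.5, size=n)

# Design matrix: a column of ones for the constant A, then each X.
X = np.column_stack([np.ones(n), x1, x2])

# Least-squares estimates of A, B1, B2.
coefs, *_ = np.linalg.lstsq(X, y, rcond=None)
a, b1, b2 = coefs

# Y-hat = A + B1*X1 + B2*X2 for every participant.
y_hat = a + b1 * x1 + b2 * x2

def predict(x1_value, x2_value):
    """Predicted score for one (possibly new) person."""
    return a + b1 * x1_value + b2 * x2_value

# Holding X2 constant, a one-unit increase in X1 changes Y-hat by exactly B1.
change = predict(1.0, 0.5) - predict(0.0, 0.5)
print(f"A = {a:.2f}, B1 = {b1:.2f}, B2 = {b2:.2f}, change = {change:.2f}")
```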

Equation

Standardized Equation
Y-hat = β1X1 + β2X2 + … (all variables as z-scores, so there is no constant)
β (beta) = standardized B (or z-score B if you like)
For each 1 standard deviation increase in X, there is a β standard deviation increase in Y
○ Difficult to interpret
○ BUT! β is standardized, so it usually falls between -1 and 1 and you can treat it as if it were r (which means you can tell direction and magnitude)

Equation

Pearson product-moment correlation = R
○ R is the correlation between Y and Y-hat
R² = variance accounted for in the DV by all the IVs (not just one like r, but ALL of them).
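A small sketch of these two definitions, using simulated data (hypothetical values). R is computed as the plain correlation between Y and Y-hat, and R² is computed separately as the improvement over using the mean to predict scores, so you can see the two agree.

```python
import numpy as np

# Hypothetical data (assumption for illustration only).
rng = np.random.default_rng(2)
n = 150
x1 = rng.normal(size=n)
x2 = rng.normal(size=n)
y = 1.0 + 0.8 * x1 + 0.5 * x2 + rng.normal(size=n)

X = np.column_stack([np.ones(n), x1, x2])
coefs, *_ = np.linalg.lstsq(X, y, rcond=None)
y_hat = X @ coefs

# R: correlation between the actual scores and the predicted scores.
R = np.corrcoef(y, y_hat)[0, 1]

# R squared: 1 - (error using the equation) / (error using only the mean).
ss_residual = np.sum((y - y_hat) ** 2)
ss_total = np.sum((y - y.mean()) ** 2)
r_squared = 1.0 - ss_residual / ss_total

print(f"R = {R:.3f}, R squared = {r_squared:.3f}")
```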

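Semipartial correlations = sr = part in SPSS
○ Unique contribution of an IV to R² (as sr²) for those IVs
○ Increase in proportion of explained Y variance when X is added to the equation
○ A / DV variance (areas labeled in the slide's DV variance diagram)

Partial correlation = pr = partial in SPSS
○ Proportion of the variance in Y not explained by the other predictors that is explained by this X only
○ A / B (areas labeled in the slide's DV variance diagram)
○ pr > sr

[Figure: DV variance diagrams for sr and pr]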
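One way to see the difference between sr and pr is to compute both from R² values. This is a sketch on simulated data with hypothetical names; it uses the standard identities sr² = R²(full) − R²(without X) and pr² = sr² / (1 − R²(without X)).

```python
import numpy as np

def r_squared(columns, y):
    """R-squared for an equation with a constant plus the given predictor columns."""
    X = np.column_stack([np.ones(len(y))] + list(columns))
    coefs, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ coefs
    return 1.0 - resid @ resid / np.sum((y - y.mean()) ** 2)

# Hypothetical data with two correlated IVs (assumption for illustration).
rng = np.random.default_rng(3)
n = 300
x1 = rng.normal(size=n)
x2 = 0.6 * x1 + rng.normal(size=n)
y = 0.5 * x1 + 0.5 * x2 + rng.normal(size=n)

r2_full = r_squared([x1, x2], y)
r2_without_x1 = r_squared([x2], y)

# Squared semipartial: increase in explained DV variance when x1 is added.
sr2 = r2_full - r2_without_x1

# Squared partial: same unique piece, but out of the DV variance
# that the other predictor left unexplained.
pr2 = sr2 / (1.0 - r2_without_x1)

print(f"sr2 = {sr2:.3f}, pr2 = {pr2:.3f}")
```

Because pr² divides by a smaller piece of variance, it comes out at least as large as sr².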

TYPES OF REGRESSION

ANOVA = Regression

ANOVA = regression with discrete variables
○ However, you cannot easily create an ANOVA from a regression
○ Must convert continuous variables into discrete variables, which causes you to lose variance
More power with regression

Simple (SLR)

SLR involves only one IV and one DV.
○ It’s called simple because there’s only ONE thing predicting.
○ In this case, beta = r.
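A quick check of the beta = r claim, sketched on simulated data (values and names are hypothetical). The standardized slope is computed two ways: by regressing z-scores on z-scores, and by rescaling the raw slope B.

```python
import numpy as np

# Hypothetical data: one IV, one DV (assumption for illustration).
rng = np.random.default_rng(1)
x = rng.normal(50, 10, size=100)
y = 0.4 * x + rng.normal(0, 8, size=100)

def zscore(v):
    return (v - v.mean()) / v.std(ddof=1)

zx, zy = zscore(x), zscore(y)

# Slope of z(Y) on z(X); no constant is needed because both means are zero.
beta = (zx @ zy) / (zx @ zx)

# Raw slope B, then rescaled to standard deviation units.
b = np.polyfit(x, y, 1)[0]
beta_from_b = b * x.std(ddof=1) / y.std(ddof=1)

r = np.corrcoef(x, y)[0, 1]
print(f"beta = {beta:.3f}, r = {r:.3f}")
```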

Multiple (MLR)

MLR uses several IVs and only one DV.
○ You can use a mix of variables – continuous, categorical, Likert, etc.
○ You can use MLR to figure out which IVs are the most important.
○ 3 types of MLR

Simultaneous/Standard

All of the variables are entered “at once”
Each variable is assessed as if it were the last variable entered
○ This “controls” for the other IVs, as we talked about in the interpretation of B.
○ Evaluates whether sr > 0

Simultaneous/Standard

If you have two highly correlated IVs the one with the biggest sr gets all the variance

Therefore the other IV will get very little variance associated with it and look unimportant

Sequential/Hierarchical

IVs enter the regression equation in an order specified by the researcher

First IV is basically tested against r (since there’s nothing else in the equation it gets all the variance)

Next IVs are tested against pr (they only get the left over variance)

What order?
○ Assigned by theoretical importance
○ Or you can control for nuisance variables in the first step

Using SETS of IVs instead of individuals
○ So, say you have a group of IVs that are super highly correlated but you don’t know how to combine them or don’t want to eliminate them.
○ Instead you will process each step as a SET and you don’t care about each individual predictor

Stepwise/Statistical

Entry into the equation is based solely on the statistical relationship and has nothing to do with theory or your experiment

Stepwise/Statistical

Forward – biggest IV is added first, then each IV is added as long as it accounts for enough variance

Backward – all are entered in the equation at first, and then each one is removed if it doesn’t account for enough variance

Stepwise – mix between the two (adds them but then may later delete them if they are no longer important).

ASSUMPTIONS

Number of People

Ratio of cases to IVs
○ If you have fewer cases than IVs you will get a perfect solution (aka account for all the variance in the DV)

But that doesn’t mean anything…
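To see why that perfect solution is meaningless, here is a sketch where both the IVs and the DV are pure random noise (all values hypothetical). With fewer cases than IVs, the equation still accounts for all the variance.

```python
import numpy as np

rng = np.random.default_rng(11)
n_cases, n_ivs = 5, 8                      # fewer cases than IVs

ivs = rng.normal(size=(n_cases, n_ivs))    # predictors are pure noise
dv = rng.normal(size=n_cases)              # outcome is unrelated noise

X = np.column_stack([np.ones(n_cases), ivs])
coefs, *_ = np.linalg.lstsq(X, dv, rcond=None)
resid = dv - X @ coefs

r_squared = 1.0 - resid @ resid / np.sum((dv - dv.mean()) ** 2)
print(f"R squared = {r_squared:.4f}")
```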

Number of People

Ratio of cases to IVs
○ G*Power – tells you how many cases you need given alpha, power, number of predictors, etc.
○ Rules of thumb: more than 50 + 8(K) cases, where K is the number of IVs
○ Or 104 + K (for testing the importance of predictors)
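The two rules of thumb are simple arithmetic; a tiny sketch with hypothetical function names makes them easy to reuse for any number of IVs.

```python
def n_for_overall_model(k):
    """Rule of thumb for testing the overall equation: more than 50 + 8K cases."""
    return 50 + 8 * k

def n_for_individual_predictors(k):
    """Rule of thumb for testing the importance of predictors: 104 + K cases."""
    return 104 + k

k = 3  # example: three IVs
print(n_for_overall_model(k), n_for_individual_predictors(k))
```

A cautious choice would be to plan for whichever number is larger, and to treat both as rough guides next to a proper power analysis.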

Number of People

How many people?
○ However… you can have too many people.
○ Any correlation or predictor will be significant with a very large N
○ Practical versus statistical significance

Missing Data

Continuous data – linear trend at point, mean replace, etc.

Categorical data – best to leave it out because you can’t guess at it.

Outliers

Now, since IVs are continuous, we want to make sure there are no outliers on the IVs or the DVs
○ Mahalanobis distance
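A minimal sketch of a Mahalanobis check on simulated scores with one planted outlier (hypothetical data). The cut-off used here, the chi-square critical value at p < .001 with df equal to the number of variables, is a common convention and an assumption on my part; the slide does not state one.

```python
import numpy as np

# Hypothetical scores on three variables, plus one planted multivariate outlier.
rng = np.random.default_rng(5)
data = rng.normal(size=(100, 3))
data[0] = [6.0, -6.0, 6.0]

center = data.mean(axis=0)
cov_inv = np.linalg.inv(np.cov(data, rowvar=False))
diffs = data - center

# Squared Mahalanobis distance of every case from the center of the data.
d2 = np.einsum("ij,jk,ik->i", diffs, cov_inv, diffs)

# Assumed convention: chi-square critical value, df = 3, p < .001.
cutoff = 16.27
flagged = np.flatnonzero(d2 > cutoff)
print("Cases past the cut-off:", flagged)
```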

Outliers

Leverage – how much influence over the slope a point has
○ Cut off rule of thumb = (2K+2)/N

Discrepancy – how far away from other data points a point is (no influence)

Cook’s – influence – combination of both leverage and discrepancy
○ Cut off rule of thumb = 4/(N-K-1)
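Here is a sketch of both cut-offs in use, on simulated data with one planted bad case (hypothetical values). Leverage is taken from the diagonal of the hat matrix and Cook's distance from the usual textbook formula; SPSS reports a centered leverage value, so its numbers may be scaled differently from these.

```python
import numpy as np

# Hypothetical data: K = 2 IVs, N = 60 cases (assumption for illustration).
rng = np.random.default_rng(6)
n, k = 60, 2
ivs = rng.normal(size=(n, k))
y = 1.0 + ivs @ np.array([0.5, -0.3]) + rng.normal(size=n)
ivs[0] = [5.0, 5.0]   # planted case: extreme on the IVs...
y[0] = -10.0          # ...and far from where the equation would put it

X = np.column_stack([np.ones(n), ivs])
hat = X @ np.linalg.inv(X.T @ X) @ X.T
leverage = np.diag(hat)

coefs, *_ = np.linalg.lstsq(X, y, rcond=None)
resid = y - X @ coefs
p = k + 1                                  # number of estimated coefficients
mse = resid @ resid / (n - p)

# Cook's distance combines the residual (discrepancy) with leverage.
cooks = (resid**2 / (p * mse)) * leverage / (1.0 - leverage) ** 2

leverage_cutoff = (2 * k + 2) / n
cooks_cutoff = 4 / (n - k - 1)
print("High leverage:", np.flatnonzero(leverage > leverage_cutoff))
print("High Cook's:", np.flatnonzero(cooks > cooks_cutoff))
```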

Multicollinearity

If IVs are too highly correlated there are several issues
○ SPSS may not run
○ SPSS picks which variable to go first depending on the type of analysis
Check – bivariate correlation table of IVs (you do want them to be correlated with the DV!)
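The bivariate check can be sketched as a correlation table of the IVs with a flag for pairs that are too close. The data are simulated, and the .90 threshold is an assumption for illustration; the slide does not give a number.

```python
import numpy as np

# Hypothetical IVs: iv2 is nearly a copy of iv1, iv3 is unrelated.
rng = np.random.default_rng(7)
n = 200
iv1 = rng.normal(size=n)
iv2 = 0.9 * iv1 + 0.2 * rng.normal(size=n)
iv3 = rng.normal(size=n)

# Bivariate correlation table of the IVs.
corr = np.corrcoef(np.column_stack([iv1, iv2, iv3]), rowvar=False)

threshold = 0.90  # assumed cut-off, not from the slides
flagged = [
    (i, j)
    for i in range(corr.shape[0])
    for j in range(i + 1, corr.shape[1])
    if abs(corr[i, j]) > threshold
]
print(np.round(corr, 2))
print("Pairs that are too highly correlated:", flagged)
```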

Normal/Linear

Normality – we want our IVs and DVs to be normally distributed
○ Residual histogram

Linearity – relationships between the IVs and the DV should be linear, or you will need a special X² term
○ Normality P-P plot

Homogeneity/Homoscedasticity

Homogeneity – you want the IVs/DVs to have equal variances
○ Residual plot (equal spread up and down – raining)
Homoscedasticity – you want the errors to be spread evenly across the values of the other variables
○ Residual plot (equal spread up and down across the bottom – watch for megaphones)

Theoretical Assumption

Independence of errors
○ You need to know that the scores of the first person tested are not affecting the scores of the last person tested
○ Mud on a scale

EXAMPLES

Data set 1
IVs
○ Books – number of books people read
○ Attend – attendance for class
DV
○ Grade – final grade in the class
Research questions:
○ Does the number of books predict final grade in the course?
○ Does attendance predict final grade in the course?

MLR - Simultaneous

Research question
○ Do books and attendance both predict final course grade?
○ Overall – together?
○ Individual predictors?

MLR – Hierarchical

Research question: What predicts how well people take care of their cars?
○ We want to first control for demographics (age, gender)
○ And then use extroversion to predict how well people take care of their cars.

MLR Hierarchical

So after controlling for demographics, does extroversion predict?
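A sketch of the two-step logic on simulated data. The variable names follow the car-care example, but every number is made up. Step 1 enters the demographics, Step 2 adds extroversion, and the question is how much R² goes up. The F for the change uses the usual R² change formula.

```python
import numpy as np

def r_squared(columns, y):
    """R-squared for an equation with a constant plus the given predictor columns."""
    X = np.column_stack([np.ones(len(y))] + list(columns))
    coefs, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ coefs
    return 1.0 - resid @ resid / np.sum((y - y.mean()) ** 2)

# Hypothetical data (assumption for illustration only).
rng = np.random.default_rng(4)
n = 250
age = rng.normal(40, 12, size=n)
gender = rng.integers(0, 2, size=n).astype(float)
extroversion = rng.normal(size=n)
car_care = 0.02 * age + 0.3 * gender + 0.6 * extroversion + rng.normal(size=n)

# Step 1: demographics only. Step 2: demographics plus extroversion.
r2_step1 = r_squared([age, gender], car_care)
r2_step2 = r_squared([age, gender, extroversion], car_care)
delta_r2 = r2_step2 - r2_step1

# F test for the change in R squared.
k_added, k_full = 1, 3
f_change = (delta_r2 / k_added) / ((1.0 - r2_step2) / (n - k_full - 1))

print(f"Step 1 R2 = {r2_step1:.3f}, Step 2 R2 = {r2_step2:.3f}")
print(f"R2 change = {delta_r2:.3f}, F change = {f_change:.1f}")
```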

Interactions

Dummy Coding Types

Two categorical
One categorical, one continuous
Two continuous

Dummy Coding

A way to do ANOVA in regression
○ If you have two levels, simply type them in as 0 and 1
○ If you have more than two levels, you need to enter each separately

Dummy Coding

More than two levels:
○ You will need (levels – 1) columns
○ F value – tells you the overall main effect
○ B value – compares that group to the group coded as all zeros

Dummy Coding

After you code each variable separately, enter them as a set (one simultaneous regression)

The significance of the overall model will tell you if the main effect is significant

B gives you differences between groups (two levels)

Dummy Coding

How many friends do people have?
○ This example is from ANOVA.
○ IV: Health condition – excellent, fair or poor.
○ DV: Number of friends.

Dummy Coding

Since we have three groups or levels, we’ll need to recode this variable into 2 variables.
○ One for excellent
○ One for fair
○ Poor is the group left as zeros on both.

Dummy Coding

Why not three?
○ Because that would be repetitive.
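A minimal sketch of the health example with six made-up people (the friend counts are hypothetical). It builds the two dummy columns, shows that each B is the difference from the group coded as all zeros, and shows why a third column would be repetitive.

```python
import numpy as np

# Hypothetical data for the health-condition example.
health = np.array(["excellent", "fair", "poor", "excellent", "poor", "fair"])
friends = np.array([8.0, 5.0, 2.0, 10.0, 4.0, 7.0])

# Levels - 1 = 2 columns; "poor" is coded 0 on both.
excellent = (health == "excellent").astype(float)
fair = (health == "fair").astype(float)

X = np.column_stack([np.ones(len(friends)), excellent, fair])
a, b_excellent, b_fair = np.linalg.lstsq(X, friends, rcond=None)[0]

# A is the mean of the all-zeros group; each B is that group's mean minus A.
print(f"poor mean = {a:.1f}")
print(f"excellent vs poor = {b_excellent:.1f}, fair vs poor = {b_fair:.1f}")

# Why not three? A third dummy adds no new information:
# the three dummies sum to the constant column.
poor = (health == "poor").astype(float)
X_three = np.column_stack([np.ones(len(friends)), excellent, fair, poor])
rank_three = np.linalg.matrix_rank(X_three)   # still 3, not 4
```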

Interactions

Interactions – well, we automatically test for interactions in ANOVA, why not in regression?
○ In regression an interaction says that there are differences in the slope of the line predicting Y from one IV depending on the level of the other IV

Interactions

Nominal variable interactions:
○ So we have two categorical predictors.
○ Example – create interaction term
○ Testing environment by learning environment.

Interactions - Nominal

Now that we’ve created our interaction terms, we can test them using a hierarchical regression
○ Step one – main effects
○ Step two – main effects and interactions

Now we examine step one for main effects
Step two for interactions

You ignore the main effects in Step 2

What does all that mean?!
○ After a significant ANOVA, you do a post hoc, correct?
○ Simple slopes – post hoc analyses for interactions in regression
○ These are “harder to get” than in an ANOVA, but there are fewer “tests” to run so technically more powerful/less type 1 error

You will write out the equation and figure out the slopes/means/picture for each condition combination.

Equation = 30.8 + -8(learning) + -14.1(testing) + 20.5(learning × testing)

Now we’ll fill in the equation for all the combinations.
○ Learning (0 or 1)
○ Testing (0 or 1)
○ Interaction (0 or 1 depending on the combination).

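Interaction - Nominal

Predicted scores (rows = learning environment, columns = testing environment)

                    Testing: Dry (0)   Testing: Wet (1)
Learning: Dry (0)        30.8               16.7
Learning: Wet (1)        22.8               29.2

[Figure: the predicted scores plotted by Learning Environment, Dry (0) versus Wet (1), with one line each for Dry (0) and Wet (1) testing]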
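The four cell values come straight from filling in the equation, which can be sketched in a few lines. The coefficients are the ones given in the slide; the function and variable names are hypothetical.

```python
def predicted(learning, testing):
    """Fill in the equation for one combination of learning (0/1) and testing (0/1)."""
    return 30.8 + -8.0 * learning + -14.1 * testing + 20.5 * (learning * testing)

# Every combination of learning environment and testing environment.
cells = {
    (learning, testing): round(predicted(learning, testing), 1)
    for learning in (0, 1)
    for testing in (0, 1)
}
for (learning, testing), score in cells.items():
    print(f"learning = {learning}, testing = {testing}: {score}")

# Simple slopes: the effect of testing environment at each learning environment.
testing_slope_dry_learning = round(predicted(0, 1) - predicted(0, 0), 1)
testing_slope_wet_learning = round(predicted(1, 1) - predicted(1, 0), 1)
```

The testing slope flips sign between the two learning environments, which is the interaction.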

Interactions - Mix

Data Set 4
IVs
○ Events – number of events attended
○ Status – low (0) versus high (1)
DV
○ Stress levels

How to

Create interaction
○ Transform > Compute > multiply

Run regression as before
○ Step 1 – main effects
○ Step 2 – main effects and interaction

Interactions - Mix

LOW status, look at events slope.
○ B = .121, β = .52, t(57) = 3.94, p < .001, indicating that low status people feel more stress as the number of events they attend increases.

HIGH status, look at events slope.
○ B = .02, β = .10, t(57) = .55, p = .58, indicating that high status people feel the same amount of stress no matter how many events they attend.

Interaction - Mix

               Low Events   High Events
Low Status       20.93        27.81
High Status      17.78        18.98

[Figure: the values above plotted from low events to high events, with separate lines for low status and high status]

Interactions - continuous

Most likely combination since you are running a regression
○ Create interaction term first (multiply them together)
○ Books * Attendance interaction to predict grades.

Interactions – continuous

Pick ONE variable to examine. Let’s go with attendance.
○ You can get the AVERAGE slope for attendance and books. Since we picked attendance, we will look at the slope for books, β = -.532, t(37) = -1.21, p = .24. So at average attendance, reading books does not increase your grade.

Let’s create hi and lo terms for ONE of the variables.
○ AttendanceHI, AttendanceLO
○ AttendanceHI by Books, AttendanceLO by Books.

Interactions - continuous

Now, we can’t just use 1 and 0 for different groups
○ So we have to create “hi” and “lo” groups for one variable
○ This theory is also backwards… for the hi group, you subtract 1 SD, for the lo group you add 1 SD

Basically you are bringing them up or down to the mean
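A sketch of the hi/lo trick on simulated data. The variable names follow the books and attendance example, but every number is made up. The key idea is that the books coefficient is the books slope where the attendance term equals zero, so shifting attendance moves the place where that slope is read off.

```python
import numpy as np

# Hypothetical data with a built-in books x attendance interaction.
rng = np.random.default_rng(8)
n = 400
books = rng.normal(2, 1, size=n)
attend = rng.normal(15, 3, size=n)
grade = (60 + 1.0 * books + 1.0 * attend
         + 0.8 * (books - 2) * (attend - 15) + rng.normal(0, 3, size=n))

def books_slope(attend_version):
    """Slope for books at the point where attend_version equals zero."""
    X = np.column_stack([np.ones(n), books, attend_version, books * attend_version])
    coefs, *_ = np.linalg.lstsq(X, grade, rcond=None)
    return coefs[1]

sd = attend.std(ddof=1)
attend_centered = attend - attend.mean()   # zero = average attendance
attend_hi = attend_centered - sd           # subtract 1 SD: zero = high attendance
attend_lo = attend_centered + sd           # add 1 SD: zero = low attendance

slope_avg = books_slope(attend_centered)
slope_hi = books_slope(attend_hi)
slope_lo = books_slope(attend_lo)
print(f"books slope at low / average / high attendance: "
      f"{slope_lo:.2f} / {slope_avg:.2f} / {slope_hi:.2f}")
```

Each call rebuilds the interaction term from the shifted attendance variable, which mirrors creating AttendanceHI by Books and AttendanceLO by Books.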

Interaction

[Figure: simple slopes across Books Read (low books to high books), with separate lines for High Attendance, Average Attendance and Low Attendance]

Mediation

Mediation occurs when the relationship between an X variable and a Y variable is eliminated or lowered when an additional Mediator variable is added to the equation.

Mediation Steps

Baron and Kenny
○ Step 1 – use X to predict Y to get the c pathway.
○ Step 2 – use X to predict M to get the a pathway.
○ Step 3 – use X and M to predict Y to get the b pathway.
○ Step 4 – use the same regression to look at the c’ pathway.
Sobel test
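The four steps and the Sobel test can be sketched with three regressions on simulated data (all values and names hypothetical). The Sobel z here uses the common first-order formula, a × b divided by sqrt(b² × SEa² + a² × SEb²), which is an assumption about which version is intended.

```python
import numpy as np

def ols(columns, y):
    """Coefficients and standard errors (constant first) for a least-squares fit."""
    X = np.column_stack([np.ones(len(y))] + list(columns))
    coefs, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ coefs
    mse = resid @ resid / (len(y) - X.shape[1])
    se = np.sqrt(np.diag(mse * np.linalg.inv(X.T @ X)))
    return coefs, se

# Hypothetical data where M carries most of the effect of X on Y.
rng = np.random.default_rng(9)
n = 500
x = rng.normal(size=n)
m = 0.6 * x + rng.normal(size=n)
y = 0.5 * m + 0.1 * x + rng.normal(size=n)

# Step 1: X predicts Y (c pathway).
coefs_c, _ = ols([x], y)
c = coefs_c[1]

# Step 2: X predicts M (a pathway).
coefs_a, se_a = ols([x], m)
a, sa = coefs_a[1], se_a[1]

# Steps 3 and 4: X and M predict Y (b pathway and c' pathway).
coefs_b, se_b = ols([x, m], y)
c_prime, b, sb = coefs_b[1], coefs_b[2], se_b[2]

# Sobel test of the indirect effect a * b.
sobel_z = (a * b) / np.sqrt(b**2 * sa**2 + a**2 * sb**2)

print(f"c = {c:.3f}, a = {a:.3f}, b = {b:.3f}, c' = {c_prime:.3f}")
print(f"Sobel z = {sobel_z:.2f}")
```

Mediation shows up as c’ being smaller than c, with the drop equal to a × b.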
