
DAY 2 AGENDA

I. Statistical Significance

II. Correlational Analyses

III. Simple Linear Regression Analysis

IV. Assumptions of Regression Analysis

V. Multiple Regression Analysis

VI. Interaction Effects

VII. Testing Interaction Effects


I. STATISTICAL SIGNIFICANCE


STATISTICAL SIGNIFICANCE (CONCEPTUAL)

Yesterday we discussed causal inferences. Today we want to look at our ability to make inferences based on statistical significance.

We want to determine whether the results of our analyses are statistically significant.

What do we mean when we say that an analytic result is statistically significant?

It refers to the likelihood that the obtained value has occurred by chance.


INFERENTIAL STATISTICS

Population

This is the general group of individuals you are interested in knowing more about, but whom you typically cannot study in full because of the population’s size.

Sample

A group of individuals, selected from the population, who are involved in the researcher’s study.


INFERENTIAL STATISTICS (CONT.)

What the researcher is hoping to do is to draw conclusions about (or make inferences about) what is occurring in the population based on the data obtained from the sample.

In order to determine whether these inferences can indeed be made, the researcher needs to rely on inferential statistics.

The purpose of inferential statistics is to provide some indication of the degree of confidence a researcher can have that the statistic they’ve obtained from the sample is indicative of what is occurring in the population of interest.


INFERENTIAL STATISTICS (CONT.)

The purpose of data collection and the interpretation of inferential statistics is to determine whether the null hypothesis should be retained or rejected in favor of the alternative (more on this decision shortly).


PROBABILITY LEVEL AND ERRORS IN INFERENCE

The alpha level (α) is the threshold against which the obtained p value is compared; it represents the probability of making an error in inference that the researcher is willing to accept.

We also noted that it is to the researcher’s benefit to set this value low (perhaps 5% or .05).

However, why not just make alpha as low as possible?

Well, like most things in life, there are trade-offs.


PROBABILITY LEVEL AND ERRORS IN INFERENCE (CONT.)

Let’s discuss errors in inference a bit. To do that, we need to return to our discussion of null and alternative hypotheses from earlier.

                                  Null hypothesis is          Null hypothesis is
                                  true in the population      false in the population

Reject null hypothesis            Type I error                Correct decision

Fail to reject null hypothesis    Correct decision            Type II error


PROBABILITY LEVEL AND ERRORS IN INFERENCE (CONT.)

There is a trade-off between making a type I and a type II error.

If you set alpha too high (say .10 vs. .05), you increase your chance of making a type I error (concluding that a significant effect exists in the population, based on what is observed in the sample, when in fact no effect exists).

However, if you set your alpha too low (say .001 vs. .01), you increase your chance of making a type II error (concluding that an effect does not exist in the population, based on the sample data, when in fact an effect does exist).
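To make this trade-off concrete, here is a minimal Python sketch (the two-group t-test scenario, the sample size of 30 per group, and the effect size of 0.5 are illustrative assumptions, not part of the slides) that estimates how often the null hypothesis is rejected at different alpha levels, both when the null is true (a rejection is a Type I error) and when a real effect exists (a non-rejection is a Type II error):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)
n, reps = 30, 2000  # per-group sample size and number of simulated studies

def rejection_rate(alpha, true_effect):
    """Fraction of simulated two-group t-tests with p < alpha."""
    rejections = 0
    for _ in range(reps):
        control = rng.normal(0.0, 1.0, n)
        treatment = rng.normal(true_effect, 1.0, n)
        _, p = stats.ttest_ind(treatment, control)
        rejections += p < alpha
    return rejections / reps

for alpha in (0.10, 0.05, 0.001):
    type1 = rejection_rate(alpha, true_effect=0.0)   # null is actually true
    power = rejection_rate(alpha, true_effect=0.5)   # a real effect exists
    print(f"alpha={alpha:<6} Type I rate ~ {type1:.3f}   Type II rate ~ {1 - power:.3f}")
```

Lowering alpha pushes the simulated Type I rate down while the Type II rate climbs, which is exactly the trade-off described above.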


PROBABILITY LEVEL AND ERRORS IN INFERENCE (CONT.)

It really comes down to determining which error you are more comfortable with making when selecting an appropriate alpha level.

In most situations, making a type I error is a bigger problem than making a type II error. For example, telling the research community that a program is associated with improvements in mental health (when it is not) could lead to funds being wasted on taking the program to a wider scale.

However, sometimes a type II error is the bigger concern.

Pilot studies, for example.

In pilot studies, alpha might be set at .2 or .25.


PROBABILITY LEVEL AND ERRORS IN INFERENCE (CONT.)

Sample size and Type I errors

The size of the sample used in a study has a big impact on the chances of making a type I error.

In general, the chances of making a type I error increase as the sample gets smaller.

Why?

Think about how representative a small sample will be of the population, and how much confidence you can have that it reflects what is truly occurring in the population.


II. CORRELATIONAL ANALYSES


CORRELATION – WHAT IS IT?

An association between two variables, described in terms of both the extent (strength) and the direction of the association

Some examples of research questions that could be examined with correlational analysis:

Is degree of severity of maltreatment associated with years of education obtained?

Is having parents who are violent with one another associated with children being more violent?

Is association with prosocial peers related to less physical aggression?


CHARACTERISTICS OF CORRELATIONS - STRENGTH

Pearson Product Moment Correlation Coefficient

rxy or r

Magnitude or strength of r

Ranges between -1 and +1, with a value of 0 being an indication of no relationship between two variables.

The sign of the correlation coefficient does not tell us anything about the magnitude or strength of the association. Rather, the sign only tells us about the direction of the association (more on direction in a minute).


CHARACTERISTICS OF CORRELATIONS - STRENGTH (CONT.)

Coefficient values of r closer to 1.0, either positive or negative, indicate stronger associations between variables of interest, whereas values closer to 0 indicate weaker associations.

So, comparing strength (ignoring sign):

−.75 > +.50
−.90 > +.80
−.60 < +.70
−.52 > +.50
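For concreteness, here is a small Python sketch (the data are invented for illustration) that computes r with numpy; the sign carries only the direction, while the absolute value carries the strength:

```python
import numpy as np

# Illustrative data: hours of reading per week (x) and a reading-ability score (y)
x = np.array([1, 2, 3, 4, 5, 6, 7, 8], dtype=float)
y = np.array([2.1, 2.5, 3.8, 4.0, 5.2, 5.9, 6.8, 7.5])

r = np.corrcoef(x, y)[0, 1]   # Pearson product-moment correlation
print(f"r = {r:+.2f}")        # the sign gives the direction of the association
print(f"|r| = {abs(r):.2f}")  # the absolute value gives the strength
```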


CHARACTERISTICS OF CORRELATIONS - DIRECTION

Positive (Direct) Correlation – as the values of one variable increase, the values of a second variable also increase.

r = +.90

Graphically:


CHARACTERISTICS OF CORRELATIONS – DIRECTION (CONT.)

Examples of Positive Correlations

The number of hours parents spend reading to their children is positively correlated with the child’s reading ability.

The number of parenting classes parents attend is positively correlated with the amount of empathy they show towards their children.

One thing to keep in mind: always indicate the direction of the association when hypothesizing how variables might be related to one another.


CHARACTERISTICS OF CORRELATIONS – DIRECTION (CONT.)

Negative (Inverse) Correlation – as the values of one variable increase, the values of a second variable decrease.

r = - .90

Graphically:


CHARACTERISTICS OF CORRELATIONS – DIRECTION (CONT.)

Examples of Negative Correlations

The number of times teachers reward prosocial behavior in the classroom is negatively correlated with students’ trips to the principal’s office for conduct problems.

The number of parenting classes parents attend is negatively correlated with number of referrals to CPS for child abuse.


CHARACTERISTICS OF CORRELATIONS – DIRECTION (CONT.)

Curvilinear Association – as the values of one variable increase, the values of a second variable first increase and then decrease (or vice versa).

r = 0

Graphically:


CHARACTERISTICS OF CORRELATIONS – DIRECTION (CONT.)

No Correlation – There is no association between values of the two variables.

r = 0

Graphically:


CHARACTERISTICS OF CORRELATIONS – CAUSALITY

“Correlation does not imply causation”

Why?

Direction of causality

Var A → Var B, or Var B → Var A, or

Var A ↔ Var B

Example: A social psychologist finds that kids who watch more violent TV display greater levels of physical aggression. But which variable came first and which came second? Both directions are plausible.

Violent TV → Aggression, or Aggression → Violent TV

So, because data used in correlational analyses are often collected cross-sectionally, temporal precedence cannot be established.


CHARACTERISTICS OF CORRELATIONS – CAUSALITY (CONT.)

Third Variable Problem

Even when data are collected across time and used in correlational analyses, a researcher may still be unable to make assumptions about causality.

Why?

Victim of child abuse at age 6  →?→  Behavioral problems at age 10

                     Violent Parent
                    ↙              ↘
Victim of child abuse at age 6     Behavioral problems at age 10


CHARACTERISTICS OF CORRELATIONS – EXTREME VALUES

Computation of the correlation coefficient involves the mean of two sets of values (variables A and B).

As such, keep in mind that extreme values will impact a correlation coefficient significantly, particularly when the n is small.

Always graph your data points and determine whether you have outliers (data points more than 3 standard deviations from the mean).


OTHER TYPES OF CORRELATION COEFFICIENTS

The Pearson’s Correlation Coefficient is used when two variables are continuous (have continuous values).

However, when two variables are dichotomous (have only two possible values), we use what’s called the phi coefficient (Φ).

Example: correlating whether physical abuse (Y/N) is associated with mental health diagnosis (Y/N).

When a continuous variable is correlated with a dichotomous variable, we compute the point-biserial coefficient (rPB).
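As a quick illustration (the data below are invented), the phi coefficient can be computed as a Pearson r on two dichotomous variables, and scipy provides a point-biserial routine for the mixed case:

```python
import numpy as np
from scipy import stats

# Illustrative dichotomous variables: physical abuse history (0/1) and diagnosis (0/1)
abuse = np.array([1, 1, 0, 0, 1, 0, 1, 0, 1, 0])
diagnosis = np.array([1, 0, 0, 0, 1, 0, 1, 0, 1, 1])

# Phi coefficient: Pearson r computed on two dichotomous variables
phi = np.corrcoef(abuse, diagnosis)[0, 1]

# Point-biserial: one dichotomous and one continuous variable
symptom_score = np.array([12.0, 8.0, 5.0, 4.0, 11.0, 6.0, 13.0, 5.0, 10.0, 9.0])
r_pb, p_value = stats.pointbiserialr(abuse, symptom_score)

print(f"phi = {phi:.2f}")
print(f"point-biserial r = {r_pb:.2f} (p = {p_value:.3f})")
```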


III. SIMPLE LINEAR REGRESSION


SIMPLE REGRESSION – FUNDAMENTALS

We have been discussing the relationship between two variables, and we have previously examined the X, Y data in terms of scatterplots.


SIMPLE REGRESSION – FUNDAMENTALS (CONT.)

We are going to shift our focus a bit and begin to try to identify the line that best fits these data.

This line is called the Least Squares Regression Line.

Now, when we say the line that “best fits” the data, we are referring to the line that minimizes the squared errors around the line.

Why squared? Squaring removes the sign of the error terms: some of the errors will be above the line (positive) and some will be below it (negative), and squaring keeps them from canceling each other out.


LEAST SQUARES REGRESSION LINE

Here, in the example below, the line is drawn through the data points in such a way that the overall distance between the points and the line is as small as possible.


LEAST SQUARES REGRESSION LINE (CONT.)

So, the key principle to keep in mind when fitting the least squares regression line is that the line bisects the data points in such a way that the distance from the points to the line is minimized.

Fortunately, we have a more precise way of figuring out where the line of best fit or the least squares regression line should fall than simply eyeballing it.

We will be relying on an equation that you probably learned way back in high school and it’s the equation for a line.


LEAST SQUARES REGRESSION LINE - COMPUTING

Recall that the equation for a line is:

Y = mx + b

Where m is equal to the slope of the line and b is equal to the intercept.

When determining the line of best fit or the least squares regression line, we rely on the same formula, but it looks a little different now:

Predicted value of Y (Ŷ) = b0 + b1X

b0 represents the Y-intercept and b1 represents the slope of the line.
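A minimal Python sketch of these estimates (with made-up X and Y values) is shown below: the slope is the covariance of X and Y divided by the variance of X, the intercept follows from the means, and the fitted line can then be used for prediction:

```python
import numpy as np

# Illustrative X (predictor) and Y (outcome) values
x = np.array([1, 2, 3, 4, 5, 6], dtype=float)
y = np.array([2.0, 2.9, 4.2, 4.8, 6.1, 6.9])

# Least squares estimates: b1 = cov(X, Y) / var(X), b0 = mean(Y) - b1 * mean(X)
b1 = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
b0 = y.mean() - b1 * x.mean()

# np.polyfit with degree 1 recovers the same slope and intercept
slope, intercept = np.polyfit(x, y, 1)

# Use the fitted line to predict Y for a new value of X
x_new = 4.5
y_hat = b0 + b1 * x_new
print(f"b0 = {b0:.2f}, b1 = {b1:.2f}  (polyfit: {intercept:.2f}, {slope:.2f})")
print(f"predicted Y at X = {x_new}: {y_hat:.2f}")
```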


IMPORTANCE OF THE LEAST SQUARES REGRESSION LINE

Now, what’s the importance of being able to draw this least squares regression line?

As you will hopefully see in a second, the importance lies in prediction.

With the line of best fit, we’ll be able to predict scores on the Y variable if we know scores on X.

As I am sure I don’t have to tell you, the ability to predict scores on an outcome variable of interest is of extreme importance, not just in the social sciences, but across a wide variety of fields/disciplines.


LET’S THINK ABOUT WHAT WE’VE DONE

Let’s think conceptually about coming up with the equation for the least squares regression line.

What we have done is taken existing x and y data and used that data to create an equation for a line that best fits the existing data.

Characteristics of that line of best fit, like the slope and the y-intercept, can then be used to derive predicted values of Y.

As I said earlier, the ability to predict values on some outcome (Y) based on existing data is a tremendous resource used by a countless number of professionals.


IV. ASSUMPTIONS OF REGRESSION ANALYSIS


THE IMPORTANCE OF ASSUMPTIONS

Now that we have obtained a good understanding of the importance of regression analysis and the information we can obtain from it, I would like to discuss the assumptions, or ground rules, that dictate when it can and cannot be used.

Considering whether the data meet these assumptions is important because violating them can, at worst, increase the likelihood of committing a Type I or Type II error and, at the very least, cause the researcher to over- or under-estimate the size of the association.


ASSUMPTIONS OF REGRESSION ANALYSIS

We are going to focus our discussion on some of the more common assumptions of linear regression:

Assumption of linearity

Assumption of normality


ASSUMPTION OF LINEARITY

As we saw with correlational analysis, when we set out to perform a regression analysis, we assume that the relationship between our variables is a linear one.

If the relationship between the predictor and outcome variables is non-linear, the regression analysis will underestimate the true relationship between the variables.

This, of course, will increase our chance of committing a type II error (concluding that there is no relationship when there really is one).


VIOLATION OF ASSUMPTION OF LINEARITY – HOW TO AVOID OR DEAL WITH

Probably the first step in dealing with the assumption of linearity is to have a firm understanding of the hypothesized relationship between your variables.

Is there the possibility of a non-linear relationship?

The other recommendation is to always plot your data to determine the pattern.

Non-linear relationships between variables will usually be obvious in a plot and will call for other methods of examining these associations.
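One simple way to check this in practice, sketched below with invented curvilinear data, is to fit a straight line and look at the residuals; a systematic pattern in the residuals is a red flag for non-linearity:

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative data with a curvilinear (inverted-U) pattern
x = np.linspace(0, 10, 50)
y = -(x - 5) ** 2 + 25 + rng.normal(0, 1.5, x.size)

# Fit a straight line and inspect the residuals
slope, intercept = np.polyfit(x, y, 1)
residuals = y - (intercept + slope * x)

# With a truly linear relationship, residuals hover randomly around zero.
# A systematic pattern (negative, then positive, then negative again)
# signals that the linearity assumption is being violated.
for lo, hi in [(0, 3), (3, 7), (7, 10.1)]:
    band = residuals[(x >= lo) & (x < hi)]
    print(f"mean residual for x in [{lo}, {hi}): {band.mean():+.2f}")
```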


ASSUMPTION OF NORMALITY

Regression analysis assumes that the variables used in the analysis have normal distributions.


ASSUMPTION OF NORMALITY (CONT.)

Variables that are not normally distributed can drastically affect regression analysis’s ability to tell us how strong a predictor a variable is of an outcome.

Recall that our least squares regression line attempts to minimize the sum of the squared errors between the line and the data.

Well, if you have a distribution that isn’t normal, then the slope of that line will be affected, particularly when n is small.

The assumption of normality is violated when the distribution of a variable is skewed and/or kurtotic.


SKEWED AND KURTOTIC DISTRIBUTIONS


OUTLIERS

The distribution of a variable can be skewed and/or kurtotic for a couple of reasons.

The first has to do with an outlier, which has been defined as a score that is more than 3 standard deviations above or below the mean.

Outliers can occur as a result of mistakes in data entry or because the individual’s score on the variable truly is an outlier.

There are a few techniques and statistics that we can use through SPSS to detect the presence of outliers (e.g., leverage, Cook’s D). The easiest technique, however, is to plot your data.
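A minimal sketch of the 3-standard-deviation screening rule (with invented data; the SPSS diagnostics mentioned above are not reproduced here) might look like this:

```python
import numpy as np

rng = np.random.default_rng(1)

# Illustrative variable: 30 plausible scores plus one extreme value
scores = np.append(rng.normal(loc=14, scale=2, size=30), 60.0)

# Flag values more than 3 standard deviations from the mean
z = (scores - scores.mean()) / scores.std(ddof=1)
print("flagged as outliers (|z| > 3):", scores[np.abs(z) > 3])
```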


TRANSFORMATIONS AND DICHOTOMIZATIONS

In terms of dealing with skewed and/or kurtotic distributions, there are techniques we can use to lessen the problem.

Transformations are mathematical operations that are performed on each value of a given variable and they have the result of making the distribution more normal.

One example is to take the square root of each value – taking the square root reduces the spread of the distribution and the impact of outliers.

In cases where a variable is skewed because much of the distribution has a value of zero (90% or greater), the variable should be dichotomized and logistic regression used instead.
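Here is a small sketch of the square-root transformation described above, using an invented positively skewed variable and scipy’s skewness statistic to show the effect:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(7)

# Illustrative positively skewed variable (e.g., counts of incidents)
counts = rng.exponential(scale=3.0, size=200)

transformed = np.sqrt(counts)  # square-root transformation

print(f"skewness before: {stats.skew(counts):.2f}")
print(f"skewness after : {stats.skew(transformed):.2f}")
```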


V. MULTIPLE REGRESSION


MULTIPLE REGRESSION

In multiple regression we have at least three variables – 2 predictor variables and 1 outcome variable (realize that we can have more than 2 predictors).

X1 ──→  Y  ←── X2


USES OF MULTIPLE REGRESSION

The advantages of multiple regression over simple, bivariate regression are, I think, pretty obvious.

It is extremely unlikely, for example, that an outcome of interest will have only one predictor.

Thus, multiple regression can allow a researcher to examine multiple predictors, and in so doing, try to increase the percentage of variance explained in the outcome.

So, the first use of multiple regression is to try and increase R2.


USES OF MULTIPLE REGRESSION – INCREASING R2

Why might it be important to increase R2?

Well, when researchers are trying to maximize R2, they are usually interested in doing their best to predict Y (the outcome).

So, in situations like trying to predict the weather, success in a graduate school program, success with a particular type of treatment, etc., the researchers want to see whether adding more predictors to the model improves variance explained.

If adding a predictor does not improve R2, then data on that variable does not need to be gathered.
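The idea can be sketched in a few lines of Python (the data and coefficients below are invented): fit the model with one predictor, then with two, and compare R2:

```python
import numpy as np

rng = np.random.default_rng(3)
n = 200

# Illustrative data in which Y depends on both X1 and X2
x1 = rng.normal(size=n)
x2 = rng.normal(size=n)
y = 0.6 * x1 + 0.4 * x2 + rng.normal(scale=1.0, size=n)

def r_squared(predictors, outcome):
    """Ordinary least squares R^2 for the given list of predictor columns."""
    X = np.column_stack([np.ones(len(outcome))] + list(predictors))
    coefs, *_ = np.linalg.lstsq(X, outcome, rcond=None)
    residuals = outcome - X @ coefs
    return 1 - np.sum(residuals ** 2) / np.sum((outcome - outcome.mean()) ** 2)

print(f"R^2 with X1 only   : {r_squared([x1], y):.3f}")
print(f"R^2 with X1 and X2 : {r_squared([x1, x2], y):.3f}")
```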


USES OF MULTIPLE REGRESSION – DIFFERENTIATING BETWEEN PREDICTORS

The second use of multiple regression is as a means of comparing the predictive ability of different predictor variables.

Here, the question is whether X1 is predictive of Y given X2 and vice versa (is X2 predictive of Y given X1).

When researchers are interested in differentiating between predictors, while they are still interested in predicting Y, they are more interested in the process of getting to Y.


USES OF MULTIPLE REGRESSION – DIFFERENTIATING BETWEEN PREDICTORS (CONT.)

For example, trying to predict children’s scores on a behavioral index (e.g., aggression).

The researcher might be interested in knowing whether influences from outside of the home (e.g., peers, community) have a significant association with these behaviors over-and-above those in the home.

The implications would be that the design of an intervention program would have to consider both influences in maximizing R2 (change in kids’ behaviors).

These are not mutually exclusive considerations.


COLLINEARITY AND MULTICOLLINEARITY

Collinearity occurs when two predictors correlate with one another very strongly.

Multicollinearity occurs when three or more predictors correlate with one another very strongly.

Keep in mind that we’re not discussing the predictors’ association with the outcome variable, only their correlation with one another.

What’s the problem with collinearity and multicollinearity?


COLLINEARITY AND MULTICOLLINEARITY (CONT.)

The problem is that when we have two predictors that are highly correlated with one another, neither predicts a significant amount of unique variance in the outcome (see Venn diagram).

When we enter these two predictors into a multiple regression analysis, while the overall R2 will likely be significant, the independent predictors are likely not to be significant.


COLLINEARITY AND MULTICOLLINEARITY – VENN DIAGRAM

[Venn diagram: X1 and X2 overlap heavily with one another, so most of the variance they share with Y is shared jointly rather than uniquely]


COLLINEARITY AND MULTICOLLINEARITY (CONT.)

When two predictors are highly correlated with one another (r > .75), it is almost impossible to determine whether they account for a unique percentage of the variance in the outcome.

What I want to point out is that if the sole goal is to maximize R2, then collinearity and multicollinearity are not a problem.

However, when the primary goal of the study is to disentangle the relationship between the predictor variables, then collinearity and multicollinearity are definitely issues that need to be addressed.
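A quick way to screen for this, sketched below with two deliberately redundant invented predictors, is simply to correlate the predictors with one another before running the regression:

```python
import numpy as np

rng = np.random.default_rng(5)
n = 200

# Two predictors that are deliberately made almost redundant
x1 = rng.normal(size=n)
x2 = 0.9 * x1 + 0.1 * rng.normal(size=n)   # nearly a copy of x1

r_predictors = np.corrcoef(x1, x2)[0, 1]
print(f"correlation between predictors: r = {r_predictors:.2f}")
if abs(r_predictors) > 0.75:
    print("collinearity likely: unique contributions to Y will be hard to separate")
```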


VI. INTERACTION EFFECTS


INTERACTION EFFECTS

I think you will be able to see that adding an interaction effect to the model is more applicable to the types of questions researchers are interested in examining.

An interaction effect occurs when the nature of the relationship between one of our predictor variables and our outcome depends on the level of another predictor variable.

This other predictor variable, on which the relationship depends, is called a moderator variable.

I know this definition is wordy, so let’s graph these relationships out to see what we’re saying here.


INTERACTION EFFECTS EXPLAINED PICTORIALLY

Let’s say that we find that greater participation in an intervention (X1) is predictive of reductions in our outcome of interest (Y). This is what’s called a Main Effect.

Let’s further suppose that we now want to know whether our intervention is equally effective for different kinds of individuals, or whether the effect is specific to one group.


INTERACTION EFFECTS EXPLAINED PICTORIALLY (CONT.)

One possibility might be that females benefit from the program while males do not.

We are saying, relative to our definition of an interaction effect, that the impact of the intervention depends on the level of the participant’s sex (X2; the moderator).

[Graph: outcome (Y) plotted against intervention participation (X1), with separate lines for males and females]


[Four graphs: outcome (Y) plotted against intervention participation (X1), with separate lines for males and females in each panel, illustrating the different possible patterns discussed on the next slide]


OTHER POSSIBLE INTERACTION EFFECTS

The first graph of an interaction effect (top left) on the previous slide is a situation where both females and males are benefitting from the intervention, but females are deriving more benefit than males.

In the second graph of an interaction effect (top right), females are benefitting from the intervention, while for males the intervention is actually causing them harm (an iatrogenic effect).

The final two graphs (bottom left and right) are not interaction effects. Why? Because we see the same effect regardless of the participant’s sex. In the bottom left, we see a main effect, but no interaction.

In the bottom right, no main effect and no interaction effect.


IMPORTANT POINTS ABOUT INTERACTIONS

Always graph your data to see what the interaction effect looks like (a small plotting sketch follows this slide).

Knowing that you have a significant interaction effect does not, by itself, tell you how your variables are related to one another.

Also, as we discussed with correlational effects, when speaking about interaction effects you don’t want to just say that an interaction effect was found. You also want to specify what that relationship looks like.

From the graph where females are benefitting and males are being harmed, you may not have a significant main effect, since females’ improvement would be canceled out by males’ decline.
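A minimal plotting sketch (hypothetical fitted lines, drawn with matplotlib rather than SPSS) of what such a graph might look like:

```python
import numpy as np
import matplotlib.pyplot as plt

# Hypothetical fitted values: problem behavior by sessions attended, per group
sessions = np.array([0, 1, 2, 3, 4])
females = 8 - 1.5 * sessions   # steep improvement for females (assumed)
males = 8 - 0.2 * sessions     # much flatter line for males (assumed)

plt.plot(sessions, females, marker="o", label="Females")
plt.plot(sessions, males, marker="s", label="Males")
plt.xlabel("Sessions attended (X1)")
plt.ylabel("Problem behavior (Y)")
plt.title("Interaction: the slope depends on participant sex")
plt.legend()
plt.show()
```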


MODERATOR VARIABLES

I just want to point out that moderators do not just specify for whom an intervention might be more effective. They could also indicate for whom any relationship between predictor and outcome might be different.

For example, while there may be a significant association between community violence exposure and trauma symptoms for adolescents, there may be no association for young children.

In addition to moderators that specify for whom the relationship between predictor and outcome is stronger and weaker, other moderators specify whether there is an additive effect of two predictors on the outcome.


ADDITIVE EFFECTS

In the case of additive effects, we are likely to see significant main effects for each of our predictors, but an even stronger effect for the interaction term.

For example, let’s say that a therapist uses two different types of treatment strategies in their work with children exposed to maltreatment.

One group of youth receives the first strategy; a second group receives the second strategy; and a third group receives a combination of both strategies.

We might hypothesize that treatment strategies 1 and 2 would both independently improve mental health, but that the combination would impact mental health the most.


ADDITIVE EFFECTS (CONT.)

This type of additive effect is exactly what my colleagues and I examined in a study involving community violence exposure and caregiver transitions.

[Graph: outcome plotted over time, with separate lines for Strategy 1, Strategy 2, and Strategy 1+2]


CAREGIVER TRANSITIONS AND COMMUNITY VIOLENCE EXPOSURE

Previous studies have found a positive correlation between the number of caregiver transitions kids experience and the number of psychosocial problems they exhibit.

Also, previous studies find that the more community violence kids experience, the more psychosocial problems they exhibit.

We hypothesized that there should be an additive effect. In other words, CVE should add to, or enhance, the negative effects of caregiver transitions on psychosocial problems.


CAREGIVER TRANSITIONS AND COMMUNITY VIOLENCE EXPOSURE (CONT.)

This is exactly what we found.

While there were main effects of caregiver transitions and CVE on psychosocial problems, these main effects were qualified by a significant interaction effect – more specifically, an additive effect.

Kids who experienced high levels of both caregiver transitions and CVE experienced the highest levels of youth, caregiver, and teacher-reported psychosocial problems.

CVE enhanced, or added to, the relationship between caregiver transitions and psychosocial problems.


BUFFERING EFFECTS

Yet another type of interaction effect is something called a buffering effect.

Let’s contrast a buffering effect with the additive effect we just looked at.

In the additive effect, the moderator variable enhances the relationship between the predictor variable and outcome.

In the buffering effect, the moderator variable attenuates the relationship between predictor and outcome variable.


BUFFERING EFFECTS (CONT.)

Let’s say, for example, that you want to test a treatment program for newly-diagnosed depressed patients.

You randomly assign depressed participants to either a treatment or control group.

After 12 weeks of the study, you might find that, in the control group, there is a strong, positive correlation between the level of baseline depression symptoms and sleep disturbances across the 12-week period.

However, in the treatment group, after 12 weeks, you might find no relationship between baseline levels of depression and sleep problems.

We would say that the treatment program acted as a buffer against the harmful effects of depression.


DATING VIOLENCE AND THE ROLE OF CAREGIVERS AND PEERS

In a study published in 2013, we wanted to examine the relationship between adolescents’ exposure to intimate partner violence (IPV) and their subsequent involvement in dating violence.

We hypothesized that adolescents who had been exposed to IPV would be more likely to be perpetrators and victims of their own dating violence.

However, we further hypothesized that when adolescents had positive caregivers and peers in their lives, these influences would buffer the effects of previous exposure to IPV.


DATING VIOLENCE AND THE ROLE OF CAREGIVERS AND PEERS (CONT.)

This is exactly what we found.

Youth who had been exposed to IPV were more likely to be victims of dating violence.

Positive caregiver practices and prosocial peers buffered the impact of prior exposure to IPV.

When adolescents did not have positive caregivers or prosocial peers, there was a strong positive correlation between IPV exposure and both dating violence perpetration and victimization.

When adolescents did have these positive influences, the association was significantly attenuated.


VII. TESTING INTERACTION EFFECTS


A COMMON MISTAKE WITH TESTING INTERACTION EFFECTS – SPLITTING THE SAMPLE

One common mistake made by researchers is to test for an interaction effect between two predictors by splitting the sample.

For example, if I think that participants’ sex will determine who my intervention program is most effective for, I might split my sample into males and females.

After splitting the sample, I might then look at the bivariate regression with number of sessions attended predicting the outcome variable to see if there is a significant association.


A COMMON MISTAKE WITH TESTING INTERACTION EFFECTS - SPLITTING THE SAMPLE (CONT.)

There are a couple of problems with the practice of splitting one’s sample.

The first problem is that this practice reduces the power to detect a significant effect, which increases your chance of making what type of error?

The second problem with splitting your sample is that you do not examine the ability of the interaction term (X1X2) to add significant, unique variance to that explained by the independent predictors (main effects).

The only way to truly examine whether the interaction term adds unique variance is to conduct hierarchical multiple regression (step 1 & step 2).


TESTING INTERACTIONS IN SPSS

There are two general ways of assessing interaction effects.

1. Step 1 – Main effects and covariates (X1 + X2 + covariates).

Step 2 – Main effects, interaction effect, and covariates (X1 + X2 + X1X2 + covariates).

2. Step 1 – X1 + covariates

Step 2 – X2 + covariates

Step 3 – X1 + X2 + X1X2 + covariates

The choice of which method to use depends on whether you want to separate out your predictors and examine the unique variance associated with each.
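The slides describe these steps in SPSS; purely as an illustration of the logic, here is a Python sketch with invented data that fits Step 1 (main effects) and Step 2 (main effects plus the X1X2 product term) and compares the R2 values:

```python
import numpy as np

rng = np.random.default_rng(11)
n = 300

# Illustrative data: sessions attended (x1), sex coded 0/1 (x2), and an outcome
# whose response to x1 depends on x2 (i.e., a true interaction is present)
x1 = rng.uniform(0, 4, size=n)
x2 = rng.integers(0, 2, size=n).astype(float)
y = 8 - 0.2 * x1 - 1.5 * x1 * x2 + rng.normal(scale=1.0, size=n)

def r_squared(columns, outcome):
    """Ordinary least squares R^2 for the given predictor columns."""
    X = np.column_stack([np.ones(n)] + list(columns))
    b, *_ = np.linalg.lstsq(X, outcome, rcond=None)
    residuals = outcome - X @ b
    return 1 - np.sum(residuals ** 2) / np.sum((outcome - outcome.mean()) ** 2)

r2_step1 = r_squared([x1, x2], y)           # Step 1: main effects only
r2_step2 = r_squared([x1, x2, x1 * x2], y)  # Step 2: add the interaction term
print(f"Step 1 R^2 (main effects)        : {r2_step1:.3f}")
print(f"Step 2 R^2 (+ interaction term)  : {r2_step2:.3f}")
print(f"R^2 change due to the interaction: {r2_step2 - r2_step1:.3f}")
```

A meaningful increase in R2 from Step 1 to Step 2 (tested formally in the regression output) is the evidence that the interaction term explains unique variance over and above the main effects.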


TESTING INTERACTIONS IN SPSS (CONT.)

Remember that you only interpret main effects when the interaction term is not in the model (either in step 1 and/or in step 2).

This is because you only want to know about your predictors’ impact on Y relative to one another (not the interaction term).

When interpreting the significance of the interaction effect, you must include the individual predictors (X1 & X2) in the model (either in step 2 or step 3).

Remember, you want to know whether the interaction term explains a significant amount of the variance over-and-above the X1 and X2 predictors individually.