SW388R7 Data Analysis & Computers II Slide 1 Hierarchical Multiple Regression Differences between hierarchical and standard multiple regression Sample problem Steps in hierarchical multiple regression Homework Problems

Upload: lorin-caldwell

Post on 14-Jan-2016

SW388R7 Data Analysis & Computers II

Slide 1

Hierarchical Multiple Regression

Differences between hierarchical and standard multiple regression

Sample problem

Steps in hierarchical multiple regression

Homework Problems

Slide 2

Differences between standard and hierarchical multiple regression

Standard multiple regression is used to evaluate the relationship between a set of independent variables and a dependent variable.

Hierarchical regression is used to evaluate the relationship between a set of independent variables and the dependent variable, controlling for or taking into account the impact of a different set of independent variables on the dependent variable.

For example, a research hypothesis might state that there are differences between the average salary for male employees and female employees, even after we take into account differences between education levels and prior work experience.

In hierarchical regression, the independent variables are entered into the analysis in a sequence of blocks, or groups that may contain one or more variables. In the example above, education and work experience would be entered in the first block and sex would be entered in the second block.
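The block-entry idea can be sketched outside SPSS. The following is a minimal Python sketch under stated assumptions, not the SPSS procedure itself: it fits ordinary least squares through the normal equations, enters the blocks cumulatively, and reports R² and the change in R² at each step. The function names and the small salary data set are hypothetical, made up only for illustration.

```python
def solve(a, b):
    """Solve the linear system a x = b by Gauss-Jordan elimination with pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                m[r] = [x - factor * y for x, y in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def r_squared(columns, y):
    """R squared of an OLS fit of y on the given predictor columns plus an intercept."""
    n = len(y)
    rows = [[1.0] + [col[i] for col in columns] for i in range(n)]
    k = len(rows[0])
    xtx = [[sum(row[p] * row[q] for row in rows) for q in range(k)] for p in range(k)]
    xty = [sum(rows[i][p] * y[i] for i in range(n)) for p in range(k)]
    beta = solve(xtx, xty)
    fitted = [sum(b * x for b, x in zip(beta, row)) for row in rows]
    mean_y = sum(y) / n
    ss_residual = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
    ss_total = sum((yi - mean_y) ** 2 for yi in y)
    return 1.0 - ss_residual / ss_total


def hierarchical_r2(blocks, y):
    """Enter blocks cumulatively; return one (R squared, R squared change) pair per block."""
    entered, results, previous = [], [], 0.0
    for block in blocks:
        entered.extend(block)
        r2 = r_squared(entered, y)
        results.append((r2, r2 - previous))
        previous = r2
    return results


# Hypothetical data for the salary example (not taken from the slides).
education = [12.0, 16.0, 12.0, 14.0, 18.0, 16.0, 12.0, 14.0]
experience = [2.0, 5.0, 10.0, 3.0, 8.0, 1.0, 6.0, 4.0]
sex = [0.0, 1.0, 0.0, 1.0, 0.0, 1.0, 0.0, 1.0]
salary = [30.0, 45.0, 38.0, 40.0, 55.0, 42.0, 34.0, 41.0]

# Block 1 holds the controls, block 2 holds the predictor of interest.
steps = hierarchical_r2([[education, experience], [sex]], salary)
for number, (r2, change) in enumerate(steps, start=1):
    print(f"Block {number}: R squared = {r2:.3f}, change = {change:.3f}")
```

The second pair returned is the quantity the hierarchical hypothesis is about: how much explained variance sex adds once education and experience are already in the model.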

Slide 3

Differences in statistical results

SPSS shows the statistical results (Model Summary, ANOVA, Coefficients, etc.) as each block of variables is entered into the analysis.

In addition (if requested), SPSS prints and tests the key statistic used in evaluating the hierarchical hypothesis: change in R² for each additional block of variables.

The null hypothesis for the addition of each block of variables to the analysis is that the change in R² (contribution to the explanation of the variance in the dependent variable) is zero.

If the null hypothesis is rejected, then our interpretation indicates that the variables in block 2 had a relationship to the dependent variable, after controlling for the relationship of the block 1 variables to the dependent variable.
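The test of the change in R² is the standard F test for nested regression models. As a rough sketch (the function name and the numbers in the example are hypothetical, not results from this problem), the F statistic can be computed from the two R² values, the number of cases, and the number of variables:

```python
def r2_change_f(r2_reduced, r2_full, n_cases, k_full, m_added):
    """F statistic for the change in R squared when a block of variables is added.

    r2_reduced: R squared before the block is entered
    r2_full:    R squared after the block is entered
    n_cases:    number of valid cases
    k_full:     number of independent variables in the full model
    m_added:    number of variables in the added block
    """
    change = r2_full - r2_reduced
    df1 = m_added
    df2 = n_cases - k_full - 1
    f_change = (change / df1) / ((1.0 - r2_full) / df2)
    return change, f_change, df1, df2


# Hypothetical example: two controls, then one predictor, 104 cases.
change, f_change, df1, df2 = r2_change_f(0.10, 0.30, 104, 3, 1)
print(f"R squared change = {change:.2f}, F({df1}, {df2}) = {f_change:.2f}")
```

The probability reported by SPSS as "Sig. F Change" is the upper-tail probability of this F value with df1 and df2 degrees of freedom; the sketch stops at the statistic itself.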

Slide 4

Variations in hierarchical regression - 1

A hierarchical regression can have as many blocks as there are independent variables, i.e. the analyst can specify a hypothesis that specifies an exact order of entry for variables.

A more common hierarchical regression specifies two blocks of variables: a set of control variables entered in the first block and a set of predictor variables entered in the second block.

Control variables are often demographics that are thought to make a difference in scores on the dependent variable. Predictors are the variables whose effect our research question is really interested in, but whose effect we want to separate from that of the control variables.

Slide 5

Variations in hierarchical regression - 2

Support for a hierarchical hypothesis would be expected to require statistical significance for the addition of each block of variables.

However, many times, we want to exclude the effect of blocks of variables previously entered into the analysis, whether or not a previous block was statistically significant. The analysis is interested in obtaining the best indicator of the effect of the predictor variables. The statistical significance of previously entered variables is not interpreted.

The latter strategy is the one that we will employ in our problems.

Slide 6

Differences in solving hierarchical regression problems

R² change, i.e. the increase in R² when the predictor variables are added to the analysis, is interpreted rather than the overall R² for the model with all variables entered.

In the interpretation of individual relationships, the relationship between the predictors and the dependent variable is presented.

Similarly, in the validation analysis, we are only concerned with verifying the significance of the predictor variables. Differences in control variables are ignored.

Slide 7

A hierarchical regression problem

The problem asks us to examine the feasibility of doing multiple regression to evaluate the relationships among these variables. The inclusion of the “controlling for” phrase indicates that this is a hierarchical multiple regression problem.

Multiple regression is feasible if the dependent variable is metric and the independent variables (both predictors and controls) are metric or dichotomous, and the available data is sufficient to satisfy the sample size requirements.

Slide 8

Level of measurement - answer

"Spouse's highest academic degree" [spdeg] is ordinal, satisfying the metric level of measurement requirement for the dependent variable, if we follow the convention of treating ordinal level variables as metric. Since some data analysts do not agree with this convention, a note of caution should be included in our interpretation.

"Age" [age] is interval, satisfying the metric or dichotomous level of measurement requirement for independent variables.

"Highest academic degree" [degree] is ordinal, satisfying the metric or dichotomous level of measurement requirement for independent variables, if we follow the convention of treating ordinal level variables as metric. Since some data analysts do not agree with this convention, a note of caution should be included in our interpretation.

"Sex" [sex] is dichotomous, satisfying the metric or dichotomous level of measurement requirement for independent variables.

True with caution is the correct answer.

Hierarchical multiple regression requires that the dependent variable be metric and the independent variables be metric or dichotomous.

Slide 9

Sample size - question

The second question asks about the sample size requirements for multiple regression.

To answer this question, we will run the initial or baseline multiple regression to obtain some basic data about the problem and solution.

Slide 10

The baseline regression - 1

After we check for violations of assumptions and outliers, we will make a decision whether we should interpret the model that includes the transformed variables and omits outliers (the revised model), or whether we will interpret the model that uses the untransformed variables and includes all cases including the outliers (the baseline model).

In order to make this decision, we run the baseline regression before we examine assumptions and outliers, and record the R² for the baseline model. If using transformations and omitting outliers substantially improves the analysis (an increase in R² of 2% or more), we interpret the revised model. If the increase is smaller, we interpret the baseline model.

To run the baseline model, select Regression | Linear… from the Analyze menu.

Slide 11

The baseline regression - 2

First, move the dependent variable spdeg to the Dependent text box.

Second, move the control variables age and sex to the Independent(s) list box.

Third, select the method for entering the variables into the analysis from the drop down Method menu. In this example, we accept the default of Enter for direct entry of all variables in the first block, which will force the controls into the regression.

Fourth, click on the Next button to tell SPSS to add another block of variables to the regression analysis.

Slide 12

The baseline regression - 3

SPSS identifies that we will now be adding variables to a second block.

First, move the predictor independent variable degree to the Independent(s) list box for block 2.

Second, click on the Statistics… button to specify the statistics options that we want.

Slide 13

The baseline regression - 4

First, mark the checkboxes for Estimates on the Regression Coefficients panel.

Second, mark the checkboxes for Model Fit, Descriptives, and R squared change.

The R squared change statistic will tell us whether or not the variables added after the controls have a relationship to the dependent variable.

Third, mark the Durbin-Watson statistic on the Residuals panel.

Fourth, mark the Collinearity diagnostics to get tolerance values for testing multicollinearity.

Fifth, click on the Continue button to close the dialog box.
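Two of the statistics requested here have simple textbook definitions. The sketch below is an illustration only, with hypothetical function names and made-up residuals; it is not a reproduction of the SPSS output. The Durbin-Watson statistic is the sum of squared differences between successive residuals divided by the sum of squared residuals, and the tolerance of an independent variable is 1 minus the R² obtained when that variable is regressed on the other independent variables.

```python
def durbin_watson(residuals):
    """Durbin-Watson statistic; values near 2 suggest no first-order autocorrelation."""
    numerator = sum(
        (residuals[t] - residuals[t - 1]) ** 2 for t in range(1, len(residuals))
    )
    denominator = sum(e * e for e in residuals)
    return numerator / denominator


def tolerance(r2_with_other_predictors):
    """Tolerance of one independent variable, given the R squared from regressing
    it on the remaining independent variables. Values near 0 signal multicollinearity."""
    return 1.0 - r2_with_other_predictors


# Made-up residuals that alternate in sign (strong negative autocorrelation).
print(durbin_watson([1.0, -1.0, 1.0, -1.0]))
print(tolerance(0.75))
```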

Slide 14

The baseline regression - 5

Click on the OK button to request the regression output.

Slide 15

R² for the baseline model

Prior to any transformations of variables to satisfy the assumptions of multiple regression or the removal of outliers, the proportion of variance in the dependent variable explained by the independent variables (R²) was 28.1%.

The relationship is statistically significant, though we would not stop if it were not significant because the lack of significance may be a consequence of violation of assumptions or the inclusion of outliers.

The R² of 0.281 is the benchmark that we will use to evaluate the utility of transformations and the elimination of outliers.

Slide 16

Descriptive Statistics

                          Mean     Std. Deviation   N
SPOUSES HIGHEST DEGREE    1.78     1.281            136
AGE OF RESPONDENT         45.80    14.534           136
RESPONDENTS SEX           1.60     .491             136
RS HIGHEST DEGREE         1.65     1.220            136

Sample size – evidence and answer

Hierarchical multiple regression requires that the minimum ratio of valid cases to independent variables be at least 5 to 1. The ratio of valid cases (136) to number of independent variables (3) was 45.3 to 1, which was equal to or greater than the minimum ratio. The requirement for a minimum ratio of cases to independent variables was satisfied.

In addition, the ratio of 45.3 to 1 satisfied the preferred ratio of 15 cases per independent variable.

The answer to the question is true.
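The arithmetic behind this answer is a single division. As a small sketch (the function name is hypothetical; the thresholds of 5 and 15 and the counts of 136 cases and 3 independent variables come from the slide):

```python
def sample_size_check(n_valid, n_independent, minimum=5.0, preferred=15.0):
    """Return the ratio of cases to independent variables and whether it
    meets the minimum and the preferred ratio."""
    ratio = n_valid / n_independent
    return ratio, ratio >= minimum, ratio >= preferred


ratio, meets_minimum, meets_preferred = sample_size_check(136, 3)
print(f"{ratio:.1f} to 1, minimum met: {meets_minimum}, preferred met: {meets_preferred}")
```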

Slide 17

Assumption of normality for the dependent variable - question

Having satisfied the level of measurement and sample size requirements, we turn our attention to conformity with three of the assumptions of multiple regression: normality, linearity, and homoscedasticity.

First, we will evaluate the assumption of normality for the dependent variable.

Slide 18

Run the script to test normality

First, move the variables to the list boxes based on the role that the variable plays in the analysis and its level of measurement.

Second, click on the Normality option button to request that SPSS produce the output needed to evaluate the assumption of normality.

Third, mark the checkboxes for the transformations that we want to test in evaluating the assumption.

Fourth, click on the OK button to produce the output.

Slide 19

Descriptives

SPOUSES HIGHEST DEGREE                          Statistic   Std. Error
Mean                                            1.78        .110
95% Confidence Interval for Mean, Lower Bound   1.56
95% Confidence Interval for Mean, Upper Bound   2.00
5% Trimmed Mean                                 1.75
Median                                          1.00
Variance                                        1.640
Std. Deviation                                  1.281
Minimum                                         0
Maximum                                         4
Range                                           4
Interquartile Range                             2.00
Skewness                                        .573        .208
Kurtosis                                        -1.051      .413

Normality of the dependent variable: spouse’s highest degree

The dependent variable "spouse's highest academic degree" [spdeg] did not satisfy the criteria for a normal distribution. The skewness of the distribution (0.573) was between -1.0 and +1.0, but the kurtosis of the distribution (-1.051) fell outside the range from -1.0 to +1.0. The answer to the question is false.
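The rule of thumb applied here can be written as a one-line check. This is a sketch with a hypothetical function name; it assumes the limits of -1.0 and +1.0 are inclusive, which the slides do not state explicitly.

```python
def satisfies_normality_rule(skewness, kurtosis, limit=1.0):
    """True when both skewness and kurtosis fall between -limit and +limit.

    Assumption: the end points themselves count as acceptable.
    """
    return -limit <= skewness <= limit and -limit <= kurtosis <= limit


# Skewness and kurtosis reported for spouse's highest degree [spdeg].
print(satisfies_normality_rule(0.573, -1.051))
```

Both statistics must pass: here the skewness is acceptable but the kurtosis is not, so the variable fails the rule.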

Slide 20

Normality of the transformed dependent variable: spouse’s highest degree

The "log of spouse's highest academic degree [LGSPDEG=LG10(1+SPDEG)]" satisfied the criteria for a normal distribution. The skewness of the distribution (-0.091) was between -1.0 and +1.0 and the kurtosis of the distribution (-0.678) was between -1.0 and +1.0.

The "log of spouse's highest academic degree [LGSPDEG=LG10(1+SPDEG)]" was substituted for "spouse's highest academic degree" [spdeg] in the analysis.
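The transformation named in the variable label, LG10(1+SPDEG), is a base-10 logarithm taken after adding 1. Adding 1 matters because the minimum value of spdeg is 0 and the logarithm of 0 is undefined. A minimal sketch (the function name is hypothetical):

```python
import math


def log_transform(value):
    """Base-10 log of (1 + value), mirroring the label LGSPDEG = LG10(1 + SPDEG)."""
    return math.log10(1 + value)


# spdeg ranges from 0 to 4 in this data set.
for degree in range(5):
    print(degree, round(log_transform(degree), 3))
```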

Slide 21

Normality of the control variable: age

Next, we will evaluate the assumption of normality for the control variable, age.

Slide 22

Descriptives

AGE OF RESPONDENT                               Statistic   Std. Error
Mean                                            45.99       1.023
95% Confidence Interval for Mean, Lower Bound   43.98
95% Confidence Interval for Mean, Upper Bound   48.00
5% Trimmed Mean                                 45.31
Median                                          43.50
Variance                                        282.465
Std. Deviation                                  16.807
Minimum                                         19
Maximum                                         89
Range                                           70
Interquartile Range                             24.00
Skewness                                        .595        .148
Kurtosis                                        -.351       .295

Normality of the control variable: age

The independent variable "age" [age] satisfied the criteria for a normal distribution. The skewness of the distribution (0.595) was between -1.0 and +1.0 and the kurtosis of the distribution (-0.351) was between -1.0 and +1.0.

Slide 23

Normality of the predictor variable: highest academic degree

Next, we will evaluate the assumption of normality for the predictor variable, highest academic degree.

Slide 24

Descriptives

RS HIGHEST DEGREE                               Statistic   Std. Error
Mean                                            1.41        .071
95% Confidence Interval for Mean, Lower Bound   1.27
95% Confidence Interval for Mean, Upper Bound   1.55
5% Trimmed Mean                                 1.35
Median                                          1.00
Variance                                        1.341
Std. Deviation                                  1.158
Minimum                                         0
Maximum                                         4
Range                                           4
Interquartile Range                             1.00
Skewness                                        .948        .149
Kurtosis                                        -.051       .297

Normality of the predictor variable: respondent’s highest academic degree

The independent variable "highest academic degree" [degree] satisfied the criteria for a normal distribution. The skewness of the distribution (0.948) was between -1.0 and +1.0 and the kurtosis of the distribution (-0.051) was between -1.0 and +1.0.

Slide 25

Assumption of linearity for spouse’s degree and respondent’s degree - question

The metric independent variables satisfied the criteria for normality, but the dependent variable did not.

However, the logarithmic transformation of "spouse's highest academic degree" produced a variable that was normally distributed and will be tested as a substitute in the analysis.

The script for linearity will support our using the transformed dependent variable without having to add it to the data set.

Slide 26

Run the script to test linearity

First, click on the Linearity option button to request that SPSS produce the output needed to evaluate the assumption of linearity.

When the linearity option is selected, a default set of transformations to test is marked.

Second, since we have decided to use the log transformation of the dependent variable, we mark the check box for the Logarithmic transformation and clear the check box for the Untransformed version of the dependent variable.

Third, click on the OK button to produce the output.

Slide 27

Linearity test: spouse’s highest degree and respondent’s highest academic degree

The correlation between "highest academic degree" and logarithmic transformation of "spouse's highest academic degree" was statistically significant (r=.519, p<0.001). A linear relationship exists between these variables.

Slide 28

Linearity test: spouse’s highest degree and respondent’s age

The assessment of the linear relationship between logarithmic transformation of "spouse's highest academic degree" [LGSPDEG=LG10(1+SPDEG)] and "age" [age] indicated that the relationship was weak, rather than nonlinear. Neither the correlation between logarithmic transformation of "spouse's highest academic degree" and "age" nor the correlations with the transformations were statistically significant.

The correlation between "age" and logarithmic transformation of "spouse's highest academic degree" was not statistically significant (r=.009, p=0.921). The correlations for the transformations were: the logarithmic transformation (r=.061, p=0.482); the square root transformation (r=.034, p=0.692); the inverse transformation (r=.112, p=0.194); and the square transformation (r=-.037, p=0.668).
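The logic of this check is to correlate the dependent variable with the independent variable and with each transformation of it. The sketch below illustrates that logic only. The exact formulas the course script uses for its transformations are not shown in the slides, so the ones here (log10, square root, 1/x, x squared, applied to positive values) are assumptions, as are the function names and the toy data.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


# Assumed forms of the transformations; valid for positive values only.
TRANSFORMS = {
    "untransformed": lambda v: v,
    "logarithmic": lambda v: math.log10(v),
    "square root": lambda v: math.sqrt(v),
    "inverse": lambda v: 1.0 / v,
    "square": lambda v: v * v,
}


def linearity_correlations(x, y):
    """Correlation of y with x and with each transformation of x."""
    return {
        name: pearson_r([transform(v) for v in x], y)
        for name, transform in TRANSFORMS.items()
    }


# Toy data with an exactly linear relationship.
correlations = linearity_correlations(
    [1.0, 2.0, 3.0, 4.0, 5.0], [2.0, 4.0, 6.0, 8.0, 10.0]
)
for name, r in correlations.items():
    print(f"{name}: r = {r:.3f}")
```

If none of the correlations is statistically significant, as with age in this problem, the relationship is read as weak rather than nonlinear.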

Slide 29

Assumption of homogeneity of variance - question

Sex is the only dichotomous independent variable in the analysis. We will test it for homogeneity of variance using the logarithmic transformation of the dependent variable, which we have already decided to use.

Slide 30

Run the script to test homogeneity of variance

First, click on the Homogeneity of variance option button to request that SPSS produce the output needed to evaluate the assumption of homogeneity of variance.

When the homogeneity of variance option is selected, a default set of transformations to test is marked.

Second, since we have decided to use the log transformation of the dependent variable, we mark the check box for the Logarithmic transformation and clear the check box for the Untransformed version of the dependent variable.

Third, click on the OK button to produce the output.

Slide 31

Assumption of homogeneity of variance – evidence and answer

Based on the Levene Test, the variance in "log of spouse's highest academic degree [LGSPDEG=LG10(1+SPDEG)]" was homogeneous for the categories of "sex" [sex]. The probability associated with the Levene statistic (0.687) was p=0.409, greater than the level of significance for testing assumptions (0.01). The null hypothesis that the group variances were equal was not rejected.

The homogeneity of variance assumption was satisfied. The answer to the question is true.
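For readers who want to see what the Levene statistic measures, here is a minimal sketch of the classical mean-centered form. Which variant the script reports is not stated in the slides, so treat the centering choice as an assumption; the function name and the toy groups are hypothetical. The statistic is an analysis of variance computed on the absolute deviations of each value from its own group mean.

```python
def levene_statistic(groups):
    """Mean-centered Levene statistic W for a list of groups of values."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)

    # Absolute deviation of each value from its own group mean.
    deviations = []
    for g in groups:
        mean_g = sum(g) / len(g)
        deviations.append([abs(v - mean_g) for v in g])

    group_means = [sum(z) / len(z) for z in deviations]
    grand_mean = sum(sum(z) for z in deviations) / n_total

    between = sum(
        len(z) * (m - grand_mean) ** 2 for z, m in zip(deviations, group_means)
    )
    within = sum(
        (v - m) ** 2 for z, m in zip(deviations, group_means) for v in z
    )
    return (n_total - k) / (k - 1) * between / within


# Toy groups: the second is spread twice as widely as the first.
print(levene_statistic([[1.0, 2.0, 3.0], [0.0, 2.0, 4.0]]))
```

The probability is then read from the F distribution with k - 1 and N - k degrees of freedom, which this sketch does not compute. A probability above the 0.01 level used here means the hypothesis of equal variances is not rejected.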

Slide 32

Including the transformed variable in the data set - 1

In the evaluation for normality, we resolved a problem with normality for spouse’s highest academic degree with a logarithmic transformation. We need to add this transformed variable to the data set, so that we can incorporate it in our detection of outliers.

We can use the script to compute transformed variables and add them to the data set.

We select an assumption to test (Normality is the easiest), mark the check box for the transformation we want to retain, and clear the check box "Delete variables created in this analysis."

NOTE: this will leave the transformed variable in the data set. To remove it, you can delete the column or close the data set without saving.

Slide 33

Including the transformed variable in the data set - 2

First, move the variable SPDEG to the list box for the dependent variable.

Second, click on the Normality option button to request that SPSS do the test for normality, including the transformation we will mark.

Third, mark the transformation we want to retain (Logarithmic) and clear the checkboxes for the other transformations.

Fourth, clear the check box for the option "Delete variables created in this analysis".

Fifth, click on the OK button.

Slide 34

Including the transformed variable in the data set - 3

If we scroll to the rightmost column in the data editor, we see that the log of SPDEG is included in the data set.

Slide 35

Including the transformed variable in the list of variables in the script - 1

If we scroll to the bottom of the list of variables, we see that the log of SPDEG is not included in the list of available variables.

To tell the script to add the log of SPDEG to the list of variables in the script, click on the Reset button. This will start the script over again, with a new list of variables from the data set.

Slide 36

Including the transformed variable in the list of variables in the script - 2

If we scroll to the bottom of the list of variables now, we see that the log of SPDEG is included in the list of available variables.

Slide 37

Detection of outliers - question

In multiple regression, an outlier in the solution can be defined as a case that has a large residual because the equation did a poor job of predicting its value.

We will run the regression again incorporating any transformations we have decided to test, and have SPSS compute the standardized residual for each case. Cases with a standardized residual greater than +3.0 or less than -3.0 will be treated as outliers.
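The outlier rule can be sketched in a few lines. The sketch assumes that a standardized residual is the raw residual divided by the standard error of the estimate (the square root of the residual sum of squares over n - k - 1); the function names and the example residuals are hypothetical.

```python
import math


def standardize(residuals, n_predictors):
    """Divide each residual by the standard error of the estimate."""
    degrees_of_freedom = len(residuals) - n_predictors - 1
    std_error = math.sqrt(sum(e * e for e in residuals) / degrees_of_freedom)
    return [e / std_error for e in residuals]


def find_outliers(standardized_residuals, cutoff=3.0):
    """Return the positions of cases whose standardized residual exceeds the cutoff
    in absolute value."""
    return [
        index
        for index, z in enumerate(standardized_residuals)
        if abs(z) > cutoff
    ]


# Made-up standardized residuals: the cases at positions 1 and 3 are outliers.
print(find_outliers([0.5, -3.2, 2.9, 3.1]))
```

An empty list corresponds to the situation where SPSS prints no Casewise Diagnostics table.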

Slide 38

The revised regression using transformations

To run the regression to detect outliers, select the Linear Regression command from the menu that drops down when you click on the Dialog Recall button.

Slide 39

The revised regression: substituting transformed variables

Remove the variable SPDEG from the Dependent text box. Include the log of the variable, LGSPDEG, in its place.

Click on the Statistics… button to select statistics we will need for the analysis.

Slide 40

The revised regression: selecting statistics

First, mark the checkboxes for Estimates on the Regression Coefficients panel.

Second, mark the checkboxes for Model Fit, Descriptives, and R squared change.

The R squared change statistic will tell us whether or not the variables added after the controls have a relationship to the dependent variable.

Third, mark the Durbin-Watson statistic on the Residuals panel.

Fourth, mark the checkbox for the Casewise diagnostics, which will be used to identify outliers.

Fifth, mark the Collinearity diagnostics to get tolerance values for testing multicollinearity.

Sixth, click on the Continue button to close the dialog box.

Slide 41

The revised regression: saving standardized residuals

Mark the checkbox for Standardized Residuals so that SPSS saves a new variable in the data editor. We will use this variable to omit outliers in the revised regression model.

Click on the Continue button to close the dialog box.

Slide 42

The revised regression: obtaining output

Click on the OK button to obtain the output for the revised model.

Slide 43

Outliers in the analysis

If any cases have a standardized residual greater than +3.0 or less than -3.0, SPSS creates a table titled Casewise Diagnostics, in which it lists those cases and the values that make them outliers.

If there are no outliers, SPSS does not print the Casewise Diagnostics table. There was no table for this problem. The answer to the question is true.

We can verify that all standardized residuals were between -3.0 and +3.0 by looking at the minimum and maximum standardized residuals in the table of Residual Statistics. Both the minimum and maximum fell in the acceptable range.

Since there were no outliers, we can use the regression just completed to make our decision about which model to interpret.

Slide 44

Selecting the model to interpret - question

Since there were no outliers, we can use the regression just completed to make our decision about which model to interpret.

If the R² for the revised model is higher by 2% or more, we will base our interpretation on the revised model; otherwise, we will interpret the baseline model.

Slide 45

Selecting the model to interpret – evidence and answer

Prior to any transformations of variables to satisfy the assumptions of multiple regression and the removal of outliers, the proportion of variance in the dependent variable explained by the independent variables (R²) was 28.1%. After substituting transformed variables, the proportion of variance in the dependent variable explained by the independent variables (R²) was 27.1%.

Since the revised regression model did not explain at least two percent more variance than explained by the baseline regression analysis, the baseline regression model with all cases and the original form of all variables should be used for the interpretation.

The transformations used to satisfy the assumptions will not be used, so cautions should be added for the assumptions violated.

False is the correct answer to the question.
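The 2% decision rule used above can be written as a short Python sketch. The function name choose_model and the min_gain parameter are hypothetical; the R² values are the ones reported for this problem.

```python
def choose_model(r2_baseline, r2_revised, min_gain=0.02):
    """Pick the model to interpret: revised only if R-squared improves by min_gain or more."""
    gain = r2_revised - r2_baseline
    return "revised" if gain >= min_gain else "baseline"


# R-squared values from this problem: 28.1% baseline, 27.1% revised.
print(choose_model(0.281, 0.271))  # prints "baseline"
```

Because the revised model explained less variance than the baseline model, the rule selects the baseline model, in line with the answer on this slide.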

Re-running the baseline regression - 1

Having decided to use the baseline model for the interpretation of this analysis, the SPSS regression output was re-created.

To run the baseline regression again, select the Linear Regression command from the menu that drops down when you click on the Dialog Recall button.

Re-running the baseline regression - 2

Remove the transformed variable lgspdeg from the dependent variable textbox and add the variable spdeg.

Click on the Save button to remove the request to save standardized residuals to the data editor.

Re-running the baseline regression - 3

Clear the checkbox for Standardized Residuals so that SPSS does not save a new set of them in the data editor when it runs the new regression.

Click on the Continue button to close the dialog box.

Re-running the baseline regression - 4

Click on the OK button to request the regression output.

Assumption of independence of errors - question

We can now check the assumption of independence of errors for the analysis we will interpret.

Assumption of independence of errors – evidence and answer

Model Summary (c)

Model | R | R Square | Adjusted R Square | Std. Error of the Estimate | R Square Change | F Change | df1 | df2 | Sig. F Change | Durbin-Watson
1 | .014 (a) | .000 | -.015 | 1.290 | .000 | .013 | 2 | 133 | .987 |
2 | .531 (b) | .281 | .265 | 1.098 | .281 | 51.670 | 1 | 132 | .000 | 1.754

The columns R Square Change through Sig. F Change are the Change Statistics.

a. Predictors: (Constant), RESPONDENTS SEX, AGE OF RESPONDENT

b. Predictors: (Constant), RESPONDENTS SEX, AGE OF RESPONDENT, RS HIGHEST DEGREE

c. Dependent Variable: SPOUSES HIGHEST DEGREE

Having selected a regression model for interpretation, we can now examine the final assumption, independence of errors.

The Durbin-Watson statistic is used to test for the presence of serial correlation among the residuals, i.e., the assumption of independence of errors, which requires that the residuals or errors in prediction do not follow a pattern from case to case.

The value of the Durbin-Watson statistic ranges from 0 to 4. As a general rule of thumb, the residuals are not correlated if the Durbin-Watson statistic is approximately 2, and an acceptable range is 1.50 - 2.50.

The Durbin-Watson statistic for this problem is 1.754 which falls within the acceptable range.

If the Durbin-Watson statistic were not in the acceptable range, we would add a caution to the findings for a violation of regression assumptions. The answer to the question is true.
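For readers who want to see where the statistic comes from, the following Python sketch computes the Durbin-Watson statistic with its standard formula (the sum of squared differences between successive residuals divided by the sum of squared residuals) and applies the 1.50 to 2.50 rule of thumb. The function names and the example residuals are hypothetical; SPSS computes the statistic from the residuals in case order.

```python
def durbin_watson(residuals):
    """Sum of squared successive differences divided by sum of squared residuals."""
    numerator = sum(
        (residuals[t] - residuals[t - 1]) ** 2 for t in range(1, len(residuals))
    )
    denominator = sum(e ** 2 for e in residuals)
    return numerator / denominator


def errors_independent(dw, low=1.5, high=2.5):
    """Rule of thumb from the slide: acceptable if the statistic is within 1.50 to 2.50."""
    return low <= dw <= high


# Hypothetical residuals listed in case order.
example = [0.5, -0.3, 0.2, -0.4, 0.1, 0.3]
print(durbin_watson(example), errors_independent(durbin_watson(example)))

# The value reported for this problem falls inside the acceptable range.
print(errors_independent(1.754))  # prints True
```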

Multicollinearity - question

The final condition that can have an impact on our interpretation is multicollinearity.

Multicollinearity – evidence and answer

The tolerance values for all of the independent variables are larger than 0.10: "highest academic degree" [degree] (.990), "age" [age] (.954) and "sex" [sex] (.947).

Multicollinearity is not a problem in this regression analysis.

True is the correct answer to the question.
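Tolerance and the variance inflation factor (VIF) are reciprocals of each other, so the tolerance check can be reproduced from the VIF column of the Coefficients table. The sketch below is illustrative only; the dictionary keys and the function name are hypothetical, and the VIF values are the rounded ones SPSS printed for model 2.

```python
def tolerance_from_vif(vif):
    """Tolerance is the reciprocal of the variance inflation factor."""
    return 1.0 / vif


# Rounded VIF values reported for model 2 in this problem.
vifs = {"degree": 1.010, "age": 1.049, "sex": 1.056}
tolerances = {name: tolerance_from_vif(v) for name, v in vifs.items()}

# Multicollinearity is flagged when any tolerance is 0.10 or smaller.
no_multicollinearity = all(t > 0.10 for t in tolerances.values())
print(tolerances, no_multicollinearity)
```

Small differences in the third decimal place compared with the reported tolerances are expected, because the VIF inputs are already rounded.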

Overall relationship between dependent variable and independent variables - question

The first finding we want to confirm concerns the relationship between the dependent variable and the set of predictors after including the control variables in the analysis.

Overall relationship between dependent variable and independent variables – evidence and answer

Hierarchical multiple regression was performed to test the hypothesis that there was a relationship between the dependent variable "spouse's highest academic degree" [spdeg] and the predictor independent variable "highest academic degree" [degree] after controlling for the effect of the control independent variables "age" [age] and "sex" [sex]. In hierarchical regression, the interpretation of the overall relationship focuses on the change in R². If the change in R² is statistically significant, the overall relationship for all independent variables will be significant as well.

Overall relationship between dependent variable and independent variables – evidence and answer

Based on model 2 in the Model Summary table, where the predictor was added (F(1, 132) = 51.670, p<0.001), the predictor variable, highest academic degree, did contribute to the overall relationship with the dependent variable, spouse's highest academic degree. Since the probability of the F statistic (p<0.001) was less than or equal to the level of significance (0.05), the null hypothesis that the change in R² was equal to 0 was rejected. The research hypothesis that highest academic degree reduced the error in predicting spouse's highest academic degree was supported.
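The F change statistic reported above can be approximated from the Model Summary values with the usual formula for an R² change test: the change in R² per added predictor, divided by the unexplained variance of the full model per residual degree of freedom. The function name f_change is hypothetical, and because the inputs are rounded to three decimals the result is close to, but not exactly, the 51.670 that SPSS reports.

```python
def f_change(r2_change, r2_full, df1, df2):
    """F statistic for the change in R-squared when df1 predictors are added."""
    return (r2_change / df1) / ((1.0 - r2_full) / df2)


# Model 2 of this problem: R-squared change .281, full-model R-squared .281,
# df1 = 1 added predictor, df2 = 132 residual degrees of freedom.
f = f_change(0.281, 0.281, df1=1, df2=132)
print(round(f, 2))  # about 51.6 with these rounded inputs
```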

Overall relationship between dependent variable and independent variables – evidence and answer

The increase in R² from including the predictor variable ("highest academic degree") in the analysis was 0.281, not 0.241.

Using a proportional reduction in error interpretation for R², information provided by the predictor variable reduced our error in predicting "spouse's highest academic degree" [spdeg] by 28.1%, not 24.1%.

The answer to the question is false because the problem stated an incorrect statistical value.

Relationship of the predictor variable and the dependent variable - question

In these hierarchical regression problems, we will focus the interpretation of individual relationships on the predictor variables and ignore the contribution of the control variables.

Relationship of the predictor variable and the dependent variable – evidence and answer

Coefficients (a)

Model | Variable | B | Std. Error | Beta | t | Sig. | Tolerance | VIF
1 | (Constant) | 1.781 | .577 | | 3.085 | .002 | |
1 | AGE OF RESPONDENT | .001 | .008 | .009 | .100 | .920 | .956 | 1.046
1 | RESPONDENTS SEX | -.023 | .231 | -.009 | -.100 | .920 | .956 | 1.046
2 | (Constant) | .525 | .521 | | 1.007 | .316 | |
2 | AGE OF RESPONDENT | .003 | .007 | .037 | .495 | .622 | .954 | 1.049
2 | RESPONDENTS SEX | .114 | .198 | .044 | .575 | .566 | .947 | 1.056
2 | RS HIGHEST DEGREE | .559 | .078 | .533 | 7.188 | .000 | .990 | 1.010

B and Std. Error are the unstandardized coefficients, Beta is the standardized coefficient, and Tolerance and VIF are the collinearity statistics.

a. Dependent Variable: SPOUSES HIGHEST DEGREE

Based on the statistical test of the b coefficient (t = 7.188, p<0.001) for the independent variable "highest academic degree" [degree], the null hypothesis that the slope or b coefficient was equal to 0 (zero) was rejected. The research hypothesis that there was a relationship between "highest academic degree" and "spouse's highest academic degree" was supported.
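The t statistic for a coefficient is the b coefficient divided by its standard error, which can be checked against the Coefficients table. The sketch below uses the rounded values for "highest academic degree" in model 2; the function name t_statistic is hypothetical.

```python
def t_statistic(b, std_error):
    """t test of the null hypothesis that the slope equals zero."""
    return b / std_error


# Rounded table values for RS HIGHEST DEGREE in model 2.
t = t_statistic(0.559, 0.078)
print(round(t, 2))  # about 7.17 from the rounded values; SPSS reports 7.188
```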

Relationship of the predictor variable and the dependent variable – evidence and answer

The b coefficient for the relationship between the dependent variable "spouse's highest academic degree" [spdeg] and the independent variable "highest academic degree" [degree] was .559, which implies a direct relationship because the sign of the coefficient is positive. Higher numeric values for the independent variable "highest academic degree" [degree] are associated with higher numeric values for the dependent variable "spouse's highest academic degree" [spdeg].

The statement in the problem that "survey respondents who had higher academic degrees had spouses with higher academic degrees" is correct. The answer to the question is true with caution. Caution in interpreting the relationship should be exercised because an ordinal variable was treated as metric and because the assumption of normality was violated.

Validation analysis - question

The problem states the random number seed to use in the validation analysis.

Validation analysis: set the random number seed

To set the random number seed, select the Random Number Seed… command from the Transform menu.

Validate the results of your regression analysis by conducting a 75/25% cross-validation, using 998794 as the random number seed.

Set the random number seed

First, click on the Set seed to option button to activate the text box.

Second, type in the random seed stated in the problem.

Third, click on the OK button to complete the dialog box.

Note that SPSS does not provide you with any feedback about the change.

Validation analysis: compute the split variable

To enter the formula for the variable that will split the sample in two parts, click on the Compute… command.

The formula for the split variable

First, type the name for the new variable, split, into the Target Variable text box.

Second, enter the formula for the value of split in the text box. Based on the description that follows, the formula is uniform(1) <= 0.75.

The uniform(1) function generates a random decimal number between 0 and 1. The random number is compared to the value 0.75.

If the random number is less than or equal to 0.75, the value of the formula will be 1, the SPSS numeric equivalent to true. If the random number is larger than 0.75, the formula will return a 0, the SPSS numeric equivalent to false.

Third, click on the OK button to complete the dialog box.
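The logic of the split formula can be illustrated outside SPSS with a short Python sketch. It assumes a hypothetical helper named make_split and a hypothetical case count of 136; Python's random number generator is not the one SPSS uses, so the same seed will not reproduce the SPSS pattern of zeros and ones, only the same idea.

```python
import random


def make_split(n_cases, seed, train_fraction=0.75):
    """Return a list of 1s (training sample) and 0s (validation sample)."""
    rng = random.Random(seed)  # seeded so the split is repeatable
    # A uniform draw at or below 0.75 is "true" (1); otherwise "false" (0).
    return [1 if rng.random() <= train_fraction else 0 for _ in range(n_cases)]


split = make_split(136, seed=998794)  # 136 cases is a hypothetical count
training_cases = [i for i, s in enumerate(split) if s == 1]
print(len(training_cases), "of", len(split), "cases fall in the training sample")
```

Because each case is assigned independently, the training sample will be close to, but usually not exactly, 75% of the cases.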

The split variable in the data editor

In the data editor, the split variable shows a random pattern of zeros and ones.

To select the cases for the training sample, we select the cases where split = 1.

Repeat the regression for the validation

To run the regression for the validation training sample, select the Linear Regression command from the menu that drops down when you click on the Dialog Recall button.

Using "split" as the selection variable

First, scroll down the list of variables and highlight the variable split.

Second, click on the right arrow button to move the split variable to the Selection Variable text box.

Setting the value of split to select cases

When the variable named split is moved to the Selection Variable text box, SPSS adds "=?" after the name to prompt us to enter a specific value for split.

Click on the Rule… button to enter a value for split.

Completing the value selection

First, type the value for the training sample, 1, into the Value text box.

Second, click on the Continue button to complete the value entry.

Requesting output for the validation analysis

When the value entry dialog box is closed, SPSS adds the value we entered after the equal sign. This specification now tells SPSS to include in the analysis only those cases that have a value of 1 for the split variable.

Click on the OK button to request the output.

Validation analysis - 1

The validation analysis requires that the regression model for the 75% training sample replicate the pattern of statistical significance found for the full data set.

In the analysis of the 75% training sample, the relationship between the set of independent variables and the dependent variable was statistically significant, F(3, 103) = 11.569, p<0.001, as was the overall relationship in the analysis of the full data set, F(3, 132) = 17.235, p<0.001.

Validation analysis - 2

The validation of a hierarchical regression model also requires that the change in R² demonstrate statistical significance in the analysis of the 75% training sample.

The R² change of 0.249 satisfied this requirement (F change(1, 103) = 34.319, p<0.001).

Validation analysis - 3

The pattern of significance for the individual relationships between the dependent variable and the predictor variable was the same for the analysis using the full data set and the 75% training sample.

The relationship between highest academic degree and spouse's highest academic degree was statistically significant in both the analysis using the full data set (t=7.188, p<0.001) and the analysis using the 75% training sample (t=5.484, p<0.001). The pattern of statistical significance of the independent variables for the analysis using the 75% training sample matched the pattern identified in the analysis of the full data set.

Validation analysis - 4

The total proportion of variance explained in the model using the training sample was 25.2% (.502²), compared to 40.6% (.637²) for the validation sample. The value of R² for the validation sample was actually larger than the value of R² for the training sample, implying a better fit than obtained for the training sample.

This supports a conclusion that the regression model would be effective in predicting scores for cases other than those included in the sample.

The validation analysis supported the generalizability of the findings of the analysis to the population represented by the sample in the data set.

The answer to the question is true.
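The shrinkage comparison above amounts to squaring the two multiple R values and subtracting. The sketch below uses the R values from this problem; the function names are hypothetical, and the 2% limit is the shrinkage criterion used in the validation flow chart for this analysis.

```python
def r2_shrinkage(r_training, r_validation):
    """R-squared for the training sample minus R-squared for the validation sample."""
    return r_training ** 2 - r_validation ** 2


def supports_generalizability(shrinkage, max_shrinkage=0.02):
    """Shrinkage below 2% supports generalizability; negative shrinkage also passes."""
    return shrinkage < max_shrinkage


# Multiple R for the training sample (.502) and the validation sample (.637).
shrinkage = r2_shrinkage(0.502, 0.637)
print(round(shrinkage, 3), supports_generalizability(shrinkage))
```

A negative shrinkage, as here, means the validation sample was fit better than the training sample.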

Steps in complete hierarchical regression analysis

The following flow charts depict the process for solving the complete regression problem and determining the answer to each of the questions encountered in the complete analysis.

Text in italics (e.g. True, False, True with caution, Incorrect application of a statistic) represents the answer to each specific question.

Many of the steps in hierarchical regression analysis are identical to the steps in standard regression analysis. Steps that are different are identified with a magenta background, with the specifics of the difference underlined.

Complete hierarchical multiple regression analysis: level of measurement

Question: do variables included in the analysis satisfy the level of measurement requirements?

Examine all independent variables – controls as well as predictors.

First, is the dependent variable metric and are the independent variables metric or dichotomous? If no, the answer is Incorrect application of a statistic. If yes, continue.

Second, are ordinal variables included in the relationship? If no, the answer is True. If yes, the answer is True with caution.

Complete hierarchical multiple regression analysis: sample size

Question: Number of variables and cases satisfy sample size requirements?

Compute the baseline regression in SPSS. Include both controls and predictors in the count of independent variables.

First, is the ratio of cases to independent variables at least 5 to 1? If no, the answer is Inappropriate application of a statistic. If yes, continue.

Second, is the ratio of cases to independent variables at the preferred sample size of at least 15 to 1? If yes, the answer is True. If no, the answer is True with caution.
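The two ratio checks in this flow chart can be expressed as a small Python function. The function name and the example counts are hypothetical; the 5 to 1 and 15 to 1 thresholds are the ones stated on the slide.

```python
def sample_size_answer(n_cases, n_independent_vars):
    """Apply the 5-to-1 minimum and 15-to-1 preferred ratio rules."""
    # Count controls and predictors together as independent variables.
    ratio = n_cases / n_independent_vars
    if ratio < 5:
        return "Inappropriate application of a statistic"
    if ratio < 15:
        return "True with caution"
    return "True"


# Hypothetical example: 136 cases, 2 controls plus 1 predictor.
print(sample_size_answer(136, 3))  # prints "True"
```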

Complete hierarchical multiple regression analysis: assumption of normality

Question: each metric variable satisfies the assumption of normality?

Test the dependent variable and both control and predictor independent variables.

First, does the variable satisfy the criteria for a normal distribution? If yes, the answer is True. If no, the answer is False; continue.

Second, does a log, square root, or inverse transformation satisfy normality? If yes, use the transformation in the revised model; no caution is needed. If more than one transformation satisfies normality, use the one with the smallest skew. If no, use the untransformed variable in the analysis and add a caution to the interpretation for violation of normality.
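The tie-break in this flow chart (use the transformation with the smallest skew) can be sketched as follows. This illustration covers only the tie-break, not the normality criteria themselves. It assumes strictly positive data, a simple moment-based skewness that may differ slightly from the adjusted skewness SPSS prints, and a base-10 logarithm; the function names and the data are hypothetical.

```python
import math


def skewness(values):
    """Moment-based skewness: third central moment over variance to the 1.5 power."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    return m3 / m2 ** 1.5


def least_skewed_transformation(values):
    """Return the name of the transformation with the smallest absolute skew."""
    candidates = {
        "log": [math.log10(v) for v in values],
        "square root": [math.sqrt(v) for v in values],
        "inverse": [1.0 / v for v in values],
    }
    skews = {name: skewness(vals) for name, vals in candidates.items()}
    best = min(skews, key=lambda name: abs(skews[name]))
    return best, skews


# Hypothetical, strongly right-skewed data.
best, skews = least_skewed_transformation([1, 10, 100, 1000, 10000])
print(best, skews)
```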

Complete hierarchical multiple regression analysis: assumption of linearity

Question: relationship between dependent variable and metric independent variable satisfies assumption of linearity?

Test both control and predictor independent variables. If the independent variable was transformed to satisfy normality, skip the check for linearity. If the dependent variable was transformed for normality, use the transformed dependent variable in the test for linearity.

First, is the probability of the Pearson correlation (r) <= level of significance? If yes, the answer is True. If no, continue.

Second, is the probability of the correlation (r) for the relationship with any transformation of the independent variable <= level of significance? If yes, use the transformation in the revised model; if more than one transformation satisfies linearity, use the one with the largest r. If no, the relationship is weak and no caution is needed.

Complete hierarchical multiple regression analysis: assumption of homogeneity of variance

Question: variance in dependent variable is uniform across the categories of a dichotomous independent variable?

Test both control and predictor independent variables. If the dependent variable was transformed for normality, substitute the transformed dependent variable in the test for the assumption of homogeneity of variance.

Is the probability of the Levene statistic <= level of significance? If no, the answer is True. If yes, the answer is False; do not test transformations of the dependent variable, and add a caution to the interpretation for violation of homoscedasticity.

Complete hierarchical multiple regression analysis: detecting outliers

Question: After incorporating any transformations, no outliers were detected in the regression analysis.

If any variables were transformed for normality or linearity, substitute the transformed variables in the regression for the detection of outliers.

Is the standardized residual for any case greater than +/-3.00? If no, the answer is True. If yes, the answer is False; remove the outliers and run the revised regression again.

Complete hierarchical multiple regression analysis: picking regression model for interpretation

Question: interpretation based on model that includes transformation of variables and removes outliers?

Is the R² for the revised regression greater than the R² for the baseline regression by 2% or more? If yes, the answer is True; pick the revised regression with transformations and omitting outliers for interpretation. If no, the answer is False; pick the baseline regression with untransformed variables and all cases for interpretation.

Complete hierarchical multiple regression analysis: assumption of independence of errors

Question: serial correlation of errors is not a problem in this regression analysis?

Are the residuals independent, with a Durbin-Watson statistic between 1.5 and 2.5? If yes, the answer is True. If no, the answer is False.

NOTE: add a caution for violation of the assumption of independence of errors.

Complete hierarchical multiple regression analysis: multicollinearity

Question: Multicollinearity is not a problem in this regression analysis?

Is the tolerance for all independent variables greater than 0.10, indicating no multicollinearity? If yes, the answer is True. If no, the answer is False.

NOTE: halt the analysis until the problem is diagnosed.

Complete hierarchical multiple regression analysis: overall relationship

Question: Finding about overall relationship between dependent variable and independent variables.

First, is the probability of the F test of R² change less than or equal to the level of significance? If no, the answer is False. If yes, continue.

Second, is the strength of the R² change for the predictor variables interpreted correctly? If no, the answer is False. If yes, continue.

Third, is there a small sample, an ordinal variable, or a violation of an assumption in the relationship? If no, the answer is True. If yes, the answer is True with caution.
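The three decisions for the overall relationship can be chained in a small Python function. The function and argument names are hypothetical; the example call mirrors the sample problem, where the F change was significant but the problem statement quoted an R² change of 0.241 instead of 0.281, and 0.001 stands in as an upper bound for the reported p<0.001.

```python
def overall_relationship_answer(p_f_change, strength_correct, needs_caution, alpha=0.05):
    """Walk the overall-relationship decisions in order and return the answer."""
    if p_f_change > alpha:
        return "False"  # R-squared change is not statistically significant
    if not strength_correct:
        return "False"  # the stated R-squared change is wrong
    # Small sample, ordinal variables, or a violated assumption adds a caution.
    return "True with caution" if needs_caution else "True"


# Sample problem: significant F change, but the stated strength was incorrect.
print(overall_relationship_answer(0.001, strength_correct=False, needs_caution=True))
# prints "False"
```

The same pattern of early returns applies to the individual-relationship and validation flow charts, with their own questions substituted.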

Complete hierarchical multiple regression analysis: individual relationships

Question: Finding about individual relationship between independent variable and dependent variable.

First, is the probability of the t test between the predictors and the dependent variable <= level of significance? If no, the answer is False. If yes, continue.

Second, is the direction of the relationship between the predictors and the dependent variable interpreted correctly? If no, the answer is False. If yes, continue.

Third, is there a small sample, an ordinal variable, or a violation of an assumption in the relationship? If no, the answer is True. If yes, the answer is True with caution.

Complete hierarchical multiple regression analysis: individual relationships

Question: Finding about independent variable with largest impact on dependent variable.

First, does the stated variable have the largest beta coefficient (ignoring sign) among the predictors? If no, the answer is False. If yes, continue.

Second, is there a small sample, an ordinal variable, or a violation of an assumption in the relationship? If no, the answer is True. If yes, the answer is True with caution.

Complete hierarchical multiple regression analysis: validation analysis - 1

Question: The validation analysis supports the generalizability of the findings?

Set the random seed and randomly split the sample into a 75% training sample and a 25% validation sample.

First, is the probability of the ANOVA test for the training sample <= level of significance? If no, the answer is False. If yes, continue.

Second, is the probability of F for the R² change for the training sample <= level of significance? If no, the answer is False. If yes, continue.

Complete hierarchical multiple regression analysis: validation analysis - 2

Third, does the pattern of significance for the predictor variables in the training sample match the pattern for the full data set? If no, the answer is False. If yes, continue.

Fourth, is the shrinkage in R² (R² for training sample - R² for validation sample) < 2%? If no, the answer is False. If yes, the answer is True.

Homework Problems: Multiple Regression – Hierarchical Problems - 1

The hierarchical regression homework problems parallel the complete standard regression problems. The only assumption made in the problems is that there is no problem with missing data.

The complete hierarchical multiple regression will include:
• Testing assumptions of normality and linearity,
• Testing for outliers,
• Determining whether to use transformations or exclude outliers,
• Testing for independence of errors,
• Checking for multicollinearity, and
• Validating the generalizability of the analysis.

Homework Problems: Multiple Regression – Hierarchical Problems - 2

The statement of the hierarchical regression problem identifies the dependent variable, the predictor independent variables, and the control independent variables.

Homework Problems: Multiple Regression – Hierarchical Problems - 3

The findings, which must all be correct for a problem to be true, include:
• a finding about the R² change when the predictor independent variables are included in the regression, and
• an interpretive statement about each of the predictor independent variables.

The findings do not specify any finding about the control independent variables.

Homework Problems: Multiple Regression – Hierarchical Problems - 4

The first prerequisite for a problem is the satisfaction of the level of measurement and minimum sample size requirements.

Failing to satisfy either of these requirements results in an inappropriate application of a statistic.

Homework Problems: Multiple Regression – Hierarchical Problems - 5

The assumption of normality requires that each metric variable be tested. If the variable is not normal, transformations should be examined to see if we can improve the distribution of the variable. If transformations are unsuccessful, a caution is added to any true findings.

Page 96: SW388R7 Data Analysis & Computers II Slide 1 Hierarchical Multiple Regression Differences between hierarchical and standard multiple regression Sample

Slide 96

Homework Problems: Multiple Regression – Hierarchical Problems - 6

The assumption of linearity is examined for any metric independent variables that were not transformed for the assumption of normality.

Page 97: SW388R7 Data Analysis & Computers II Slide 1 Hierarchical Multiple Regression Differences between hierarchical and standard multiple regression Sample

Slide 97

Homework Problems: Multiple Regression – Hierarchical Problems - 7

After incorporating any transformations, we look for outliers using standardized residuals as the criterion.
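A minimal sketch of outlier screening with standardized residuals follows. It assumes a cutoff of ±3 and computes each standardized residual as the raw residual divided by the standard error of the estimate; both choices, along with the synthetic data and the planted outlier, are assumptions made for illustration.

```python
import numpy as np

def standardized_residuals(X, y):
    """OLS residuals divided by the standard error of the estimate."""
    X1 = np.column_stack([np.ones(len(y)), X])   # add the intercept column
    beta, *_ = np.linalg.lstsq(X1, y, rcond=None)
    resid = y - X1 @ beta
    dof = len(y) - X1.shape[1]                   # n - k - 1
    s = np.sqrt(resid @ resid / dof)
    return resid / s

# Synthetic data with one planted outlier at row 10 (hypothetical example).
rng = np.random.default_rng(1)
n = 100
x = rng.normal(size=n)
y = 2.0 + 3.0 * x + rng.normal(size=n)
y[10] += 15.0

z = standardized_residuals(x, y)
cutoff = 3.0                                     # assumed criterion
flagged = np.flatnonzero(np.abs(z) > cutoff).tolist()
```

The rows listed in `flagged` would be excluded before the revised regression is run.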

Page 98: SW388R7 Data Analysis & Computers II Slide 1 Hierarchical Multiple Regression Differences between hierarchical and standard multiple regression Sample

Slide 98

Homework Problems: Multiple Regression – Hierarchical Problems - 8

We compare the baseline regression (no transformations, no outliers excluded) with the revised regression (transformations applied, outliers excluded) to determine which analysis we will base our interpretation on.

Page 99: SW388R7 Data Analysis & Computers II Slide 1 Hierarchical Multiple Regression Differences between hierarchical and standard multiple regression Sample

Slide 99

Homework Problems: Multiple Regression – Hierarchical Problems - 9

We test for the assumption of independence of errors and the presence of multicollinearity.

If we violate the assumption of independence, we attach a caution to our findings.

If there is a multicollinearity problem, we halt the analysis, since we may be reporting erroneous findings.
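Both checks can be sketched with NumPy. The Durbin-Watson statistic is used here for independence of errors and tolerance (1 minus the R² from regressing each independent variable on the others) for multicollinearity. The acceptable Durbin-Watson range of 1.5 to 2.5 and the tolerance cutoff of 0.10 are assumed rules of thumb, and the data are synthetic.

```python
import numpy as np

def ols_residuals(X, y):
    """Residuals from an OLS fit of y on X (intercept added here)."""
    X1 = np.column_stack([np.ones(len(y)), X])
    beta, *_ = np.linalg.lstsq(X1, y, rcond=None)
    return y - X1 @ beta

def durbin_watson(resid):
    """Durbin-Watson statistic; values near 2 suggest uncorrelated errors."""
    return float(np.sum(np.diff(resid) ** 2) / np.sum(resid ** 2))

def tolerances(X):
    """Tolerance of each column: 1 - R^2 from regressing it on the others."""
    X = np.asarray(X, dtype=float)
    out = []
    for j in range(X.shape[1]):
        target = X[:, j]
        resid = ols_residuals(np.delete(X, j, axis=1), target)
        # Unexplained share of variance equals 1 - R^2, i.e. the tolerance.
        out.append(float(resid @ resid / np.sum((target - target.mean()) ** 2)))
    return out

# Synthetic example: x3 is almost an exact sum of x1 and x2.
rng = np.random.default_rng(2)
n = 200
x1 = rng.normal(size=n)
x2 = rng.normal(size=n)
x3 = x1 + x2 + rng.normal(scale=0.05, size=n)
y = 1.0 + 2.0 * x1 - x2 + rng.normal(size=n)

dw = durbin_watson(ols_residuals(np.column_stack([x1, x2]), y))
tol = tolerances(np.column_stack([x1, x2, x3]))

independence_ok = 1.5 <= dw <= 2.5       # assumed acceptable range
multicollinearity = min(tol) < 0.10      # assumed tolerance cutoff
```

In this example `multicollinearity` is true because x3 carries almost no information beyond x1 and x2, which is the situation in which the analysis would be halted.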

Page 100: SW388R7 Data Analysis & Computers II Slide 1 Hierarchical Multiple Regression Differences between hierarchical and standard multiple regression Sample

Slide 100

Homework Problems: Multiple Regression – Hierarchical Problems - 9

In hierarchical regression, we interpret the change in R² in the overall relationship that is associated with the inclusion of the predictor independent variables. The change in R² must be statistically significant and the magnitude of the R² change must be correctly stated.
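The R² change and its F statistic can be computed by fitting the control-only model and the full model and comparing them. The sketch below uses the standard formula F = (ΔR² / k_added) / ((1 − R²_full) / (n − k_full − 1)). The variable names (`educ`, `exper`, `sex`, `salary`) echo the salary example from the start of the presentation, but the data are synthetic and the coefficients are made up for illustration.

```python
import numpy as np

def r_squared(X, y):
    """R^2 from an OLS fit of y on the columns of X plus an intercept."""
    X1 = np.column_stack([np.ones(len(y)), X])
    beta, *_ = np.linalg.lstsq(X1, y, rcond=None)
    resid = y - X1 @ beta
    return float(1.0 - resid @ resid / np.sum((y - y.mean()) ** 2))

def r2_change(controls, predictors, y):
    """Return (R^2 change, F change, (df1, df2)) for adding the predictors.

    controls and predictors are 2-D arrays with one column per variable.
    """
    n = len(y)
    full = np.column_stack([controls, predictors])
    r2_reduced = r_squared(controls, y)      # block 1: controls only
    r2_full = r_squared(full, y)             # block 2: controls + predictors
    k_added = predictors.shape[1]
    df_resid = n - full.shape[1] - 1
    delta = r2_full - r2_reduced
    f_change = (delta / k_added) / ((1.0 - r2_full) / df_resid)
    return delta, float(f_change), (k_added, df_resid)

# Synthetic data (hypothetical coefficients, for illustration only).
rng = np.random.default_rng(3)
n = 200
educ = rng.normal(14.0, 2.0, size=n)
exper = rng.normal(10.0, 3.0, size=n)
sex = rng.integers(0, 2, size=n).astype(float)
salary = 20.0 + 2.0 * educ + 1.0 * exper + 5.0 * sex + rng.normal(0.0, 3.0, size=n)

controls = np.column_stack([educ, exper])
predictors = np.column_stack([sex])
delta, f_change, dfs = r2_change(controls, predictors, salary)
```

The p-value for `f_change` comes from an F distribution with the degrees of freedom in `dfs`; with SciPy installed, `scipy.stats.f.sf(f_change, *dfs)` would give it.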

Page 101: SW388R7 Data Analysis & Computers II Slide 1 Hierarchical Multiple Regression Differences between hierarchical and standard multiple regression Sample

Slide 101

Homework Problems: Multiple Regression – Hierarchical Problems - 10

The relationships between predictor independent variables and the dependent variable stated in the problem must be statistically significant, and worded correctly for the direction of the relationship.

Page 102: SW388R7 Data Analysis & Computers II Slide 1 Hierarchical Multiple Regression Differences between hierarchical and standard multiple regression Sample

Slide 102

Homework Problems: Multiple Regression – Hierarchical Problems - 11

We use a 75-25% validation strategy to support the generalizability of our findings. The validation must support:
• the significance of the overall relationship,
• the statistical significance of the change in R², and
• the pattern of significance for the individual predictors.

In addition, R² for the validation sample must not be more than 2% less than R² for the training sample.
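The shrinkage part of the validation can be sketched as follows. The sketch assumes that the validation R² is the squared correlation between observed values and the values predicted by the training-sample equation, and that "2%" means a difference of 0.02 in R². Both are assumptions, and the data are synthetic. The three significance criteria would still have to be checked by rerunning the regression on the training sample.

```python
import numpy as np

def fit_ols(X, y):
    """Return OLS coefficients (intercept first)."""
    X1 = np.column_stack([np.ones(len(y)), X])
    beta, *_ = np.linalg.lstsq(X1, y, rcond=None)
    return beta

def predict(beta, X):
    """Predicted values from coefficients produced by fit_ols."""
    return np.column_stack([np.ones(len(X)), X]) @ beta

def r2_from_predictions(y, y_hat):
    """Squared correlation between observed and predicted values."""
    return float(np.corrcoef(y, y_hat)[0, 1] ** 2)

# Synthetic data, for illustration only.
rng = np.random.default_rng(4)
n = 200
X = rng.normal(size=(n, 2))
y = 1.0 + 2.0 * X[:, 0] - X[:, 1] + rng.normal(size=n)

# Random 75% training / 25% validation split.
idx = rng.permutation(n)
cut = int(0.75 * n)
train, valid = idx[:cut], idx[cut:]

beta = fit_ols(X[train], y[train])
r2_train = r2_from_predictions(y[train], predict(beta, X[train]))
r2_valid = r2_from_predictions(y[valid], predict(beta, X[valid]))

shrinkage = r2_train - r2_valid
shrinkage_ok = shrinkage <= 0.02         # assumed reading of the 2% rule
```

A negative `shrinkage` means the validation sample fit slightly better than the training sample, which also satisfies the criterion.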

Page 103: SW388R7 Data Analysis & Computers II Slide 1 Hierarchical Multiple Regression Differences between hierarchical and standard multiple regression Sample

Slide 103

Homework Problems: Multiple Regression – Hierarchical Problems - 12

Cautions are added as limitations to the analysis, if needed.