TRANSCRIPT
Slide 1
Testing Multivariate Assumptions
The multivariate statistical techniques which we will cover in this class require one or more of the following assumptions about the data: normality of the metric variables, homoscedastic relationships between the dependent variable and the metric and nonmetric independent variables, linear relationships between the metric variables, and absence of correlated prediction errors.
Multivariate analysis requires that the assumptions be tested twice: first, for the separate variables as we are preparing to do the analysis, and second, for the multivariate model variate, which acts collectively for the variables in the analysis and thus must meet the same assumptions as individual variables. In this section, we will examine the tests that we normally perform prior to computing the multivariate statistic. Since the pattern of prediction errors cannot be examined without computing the multivariate statistic, we will defer that discussion until we examine each of the specific techniques.
If the data fails to meet the assumptions required by the analysis, we can attempt to correct the problem with a transformation of the variable. There are two classes of transformations that we attempt: for violations of normality and homoscedasticity, we transform the individual metric variable to an inverse, logarithmic, or squared form; for violations of linearity, we either do a power transformation, e.g. raise the data to a squared or square root power, or we add an additional polynomial variable that contains a power term.
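The two classes of transformations described above can be sketched numerically. The transcript works in SPSS; the following Python/NumPy version is only an illustrative stand-in, and the variable values are made up:

```python
import numpy as np

# Hypothetical positively skewed metric variable (values must be
# positive for the square root and logarithmic transformations).
x = np.array([1.0, 2.0, 4.0, 9.0, 25.0, 100.0])

# Class 1: transformations for violations of normality or
# homoscedasticity, ordered by the severity of skew they address.
sqrt_x = np.sqrt(x)    # square root: mild positive skew
log_x = np.log10(x)    # logarithm: stronger positive skew
inv_x = 1.0 / x        # inverse: severe positive skew

# Class 2: for violations of linearity, a power term can be added
# as an extra polynomial predictor rather than replacing the variable.
x_squared = x ** 2
```

Each candidate is then re-tested against the assumption, as the transcript emphasizes, since a transformation may only reduce rather than remove the violation.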
Slide 2
Testing Multivariate Assumptions - 2
Transforming variables is a trial and error process. We do the transformation and then see if it has corrected the problem with the data. It is not usually possible to be certain in advance that the transformation will correct the problem; sometimes it only reduces the degree of the violation. Even when the transformation might decrease the violation of the assumption, we might opt not to include it in the analysis because of the increased complexity it adds to the interpretation and discussion of the results.
It often happens that one transformation solves multiple problems. For example, skewed variables can produce violations of normality and homoscedasticity. No matter which test of assumptions identified the violation, our only remedy is a transformation of the metric variable to reduce the skewness.
Slide 3
1. Evaluating the Normality of Metric Variables
Whether the distribution of values for a metric variable complies with the definition of a normal curve is tested with histograms, normality plots, and statistical tests.
The histogram shows us the relative frequency of different ranges of values for the variable. If the variable is normally distributed, we expect the greatest frequency of values to occur in the center of the distribution, with decreasing frequency for values away from the center. In addition, a normally distributed variable will be symmetric, showing the same proportion of cases in the left and right tails of the distribution.
In a normality plot in SPSS, the actual distribution of cases is plotted in red against the distribution of cases that would be expected if the variable is normally distributed, plotted as a green line on the chart. Our conclusion about normality is based on the convergence or divergence between the plot of red points and the green line.
There are two statistical tests for normality: the Kolmogorov-Smirnov statistic with the Lilliefors correction factor for variables that have 50 cases or more, and the Shapiro-Wilk test for variables that have fewer than 50 cases. SPSS will compute the test which is appropriate to the sample size. The statistical test is regarded as sensitive to violations of normality, especially for a large sample, so we should examine the histogram and normality plot for confirmation of a distribution problem.
The statistical test for normality is a test of the null hypothesis that the distribution is normal. The desirable outcome is a significance value for the statistic more than 0.05 so that we fail to reject the null hypothesis. If we fail to reject the null hypothesis, we conclude that the variable is normally distributed and meets the normality assumption.
If the significance value of the normality test statistic is smaller than 0.05, we reject the null hypothesis of normality and see if a transformation of the variable can induce normality to meet the statistical assumption.
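The sample-size rule and the hypothesis-test logic above can be sketched in Python with SciPy. This is an assumption-laden analogue, not the SPSS procedure itself: scipy's `kstest` lacks the Lilliefors correction, so with parameters estimated from the data its p-value is conservative, and the data below is simulated:

```python
import numpy as np
from scipy import stats

def normality_test(x, alpha=0.05):
    """Rough analogue of the SPSS rule: Shapiro-Wilk below 50 cases,
    K-S against a fitted normal at 50 or more (no Lilliefors
    correction here, so the K-S p-value is only approximate)."""
    x = np.asarray(x)
    if len(x) < 50:
        stat, p = stats.shapiro(x)
        name = "Shapiro-Wilk"
    else:
        stat, p = stats.kstest(x, "norm", args=(x.mean(), x.std(ddof=1)))
        name = "Kolmogorov-Smirnov"
    # Null hypothesis: the distribution is normal. p > alpha means we
    # fail to reject, i.e. the variable meets the normality assumption.
    return name, p, p > alpha

rng = np.random.default_rng(0)
skewed = rng.exponential(scale=1.0, size=200)   # clearly non-normal
name, p, is_normal = normality_test(skewed)

small = rng.normal(size=30)                     # small sample branch
name_small, p_small, _ = normality_test(small)
```

For the skewed sample the test should reject the null hypothesis, which is the cue to try a transformation, exactly as the transcript describes for Price Level below.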
Slide 4
Requesting Statistics to Test Normality
First, we select the 'Descriptive Statistics | Explore…' command from the Analyze menu.
Second, we move the metric variables 'Delivery Speed' (x1), 'Price Level' (x2), 'Price Flexibility' (x3), 'Manufacturer Image' (x4), 'Service' (x5), 'Salesforce Image' (x6), 'Product Quality' (x7), 'Usage Level' (x9), and 'Satisfaction Level' (x10) to the 'Dependent List:' list box.
Third, we click on the Plots option button in the Display panel since all of our output will be from plots commands.
Fourth, we click on the Plots button to request the histograms and normality plots.
Slide 5
Requesting the Plot to Test Normality
First, we mark the 'None' option button in the 'Boxplots' panel.
Second, we mark the 'Normality plots with tests' check box.
Third, we mark the 'Histogram' check box in the 'Descriptive' panel.
Fourth, we click on the Continue button to close the Plots dialog box.
Fifth, we click on the OK button to request the output.
Slide 6
Output for the Statistical Tests of Normality
The null hypothesis in the K-S Lilliefors test of normality is that the data for the variable is normally distributed. The desirable outcome for this test is to fail to reject the null hypothesis.
When the probability value in the Sig. column is less than 0.05, we conclude that the variable is not normally distributed. In this table we see that Price Level, Price Flexibility, Manufacturer Image, Salesforce Image, and Product Quality are not normally distributed. We will consider transformations for some of these variables to see if we can induce normality.
Slide 7
The Histogram for Delivery Speed (X1)
According to the K-S Lilliefors test, the data for Delivery Speed is normally distributed, even though the graph shows slight departures.
Slide 8
The Normality Plot for Delivery Speed (X1)
In support of the K-S Lilliefors test that this variable is normal, the normality plot shows that the red points closely track the green line which represents the plot of a perfectly normal distribution. (Note that this is the Q-Q version of the normality plot; the text shows the P-P version of the normality plot. Both normality plots convey the same information and support the same interpretation.)
Slide 9
The Histogram for Price Level (X2)
According to the K-S Lilliefors test, Price Level was not normally distributed. The histogram suggests that the normality problem is due to positive skewing.
Slide 10
The Normality Plot for Price Level (X2)
When a distribution is skewed, the points at the ends of the distribution diverge from the green diagonal line representing normality.
Slide 11
Transformations to Induce Normality
The following chart shows prototypical departures from normality and the formula for the recommended transformation to correct the problem.
Slide 12
Computing the Square Root Transformation for Price Level
Based on the histogram for Price Level shown above, the square root transformation should be appropriate for inducing normality. First, we select the 'Compute…' command from the Transform menu.
Second, we type a name for the new variable in the 'Target Variable:' text box. I prefer to incorporate an abbreviation for the type of transformation and the original variable name into a mnemonic.
Third, we use the square root function, SQRT, to compute the square root of the original Price Level variable.
Fourth, we click on the OK button to produce the transformed variable.
Slide 13
Request the Normality Analysis for the Transformed Price Level Variable
First, we select the 'Descriptive Statistics | Explore…' command from the Analyze menu.
Second, we move the transformed variable, 'sqrt_x2' to the 'Dependent List: ' text box.
All of the other options that we selected on previous Explore commands remain in effect. Third, we click on the OK button to produce the output.
Slide 14
The K-S Lilliefors Test for the Transformed Price Level Variable
The Sig value for the K-S Lilliefors test is now greater than 0.05. We cannot reject the null hypothesis and we conclude that 'SQRT_X2' is normally distributed. The transformation was effective.
Slide 15
The Histogram for the Transformed Price Level Variable
The overall shape of the histogram for the transformed Price Level variable more closely approximates a normal distribution.
Slide 16
The Normality Plot for the Transformed Price Level Variable
The points previously representing skewed values pulling away from the green normality line now fit the green normality line more closely.
Slide 17
The Histogram for Price Flexibility (X3)
The histogram for the Price Flexibility variable supports the K-S Lilliefors test that this variable is not normally distributed. It appears flatter than a normal distribution, indicating a kurtosis problem, and also shows negative skewing.
Slide 18
The Normality Plot for Price Flexibility (X3)
When the variable has a kurtosis problem, it shows up as an s-shaped curve deviating from the green normality line. The points pulled away from the green line represent skewing.
Slide 19
Computing the Square Root Transformation for Price Flexibility
Since it is not obvious which transformation will correct the normality problems, we will compute all three. First, we select the 'Compute…' command from the Transform menu.
Second, we type in a mnemonic for the name of the new variable in the 'Target Variable: ' text box.
Third, we type in the formula for the square root transformation using the SQRT function. Note that since we have negative skewing, we use the reflection version of the transformation in which we subtract all values from a constant that is one unit larger than the largest value of the original variable.
Fourth, we click on the OK button to produce the new variable.
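The reflection trick used in these three Compute commands can be sketched numerically. The Python/NumPy version below is a stand-in for the SPSS SQRT and LG10 functions and a simple reciprocal, and the data values are hypothetical:

```python
import numpy as np

# Hypothetical negatively skewed variable: most cases cluster high,
# with a long tail toward low values.
x3 = np.array([2.1, 5.0, 6.2, 6.8, 7.1, 7.4, 7.7, 7.9, 8.0])

# Reflection: subtract every value from a constant one unit larger
# than the largest value, which converts negative skew to positive
# skew and guarantees all reflected values are at least 1.
k = x3.max() + 1           # the reflection constant
reflected = k - x3

# The three candidate transformations applied to the reflected values:
sqrt_x3 = np.sqrt(reflected)
log_x3 = np.log10(reflected)
inv_x3 = 1.0 / reflected
```

Because the reflected values are all 1 or greater, the logarithm and inverse are always defined, which is the practical reason the constant is chosen one unit above the maximum.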
Slide 20
Computing the Logarithmic Transformation for Price Flexibility
First, we select the 'Compute…' command from the Transform menu.
Second, we type in a mnemonic for the name of the new variable in the 'Target Variable: ' text box.
Third, we type in the formula for the logarithmic transformation using the LG10 function. Note that since we have negative skewing, we use the reflection version of the transformation in which we subtract all values from a constant that is one unit larger than the largest value of the original variable.
Fourth, we click on the OK button to produce the new variable.
Slide 21
Computing the Inverse Transformation for Price Flexibility
First, we select the 'Compute…' command from the Transform menu.
Second, we type in a mnemonic for the name of the new variable in the 'Target Variable: ' text box.
Third, we type in the formula for the inverse transformation. Note that since we have negative skewing, we use the reflection version of the transformation in which we subtract all values from a constant that is one unit larger than the largest value of the original variable.
Fourth, we click on the OK button to produce the new variable.
Slide 22
Request the Explore Command for the Three Transformed Variables
First, select the 'Descriptive Statistics | Explore…' command from the Analyze menu.
Second, move the variables sqrt_x3, log_x3, and inv_x3 to the 'Dependent List: ' list box.
Third, the options from the previous analysis remain in effect, so we click on the OK button to obtain the output.
Slide 23
The K-S Lilliefors Tests for the Transformed Variables
The tests of normality for all three transformed variables have Sig values less than 0.05, so we conclude that none of them follow a normal distribution. Since none of the transformations offer an improvement, we retain the original form of the variable in our analysis.
Slide 24
2. Evaluating Homogeneity of Variance for Non-metric Variables
The Levene statistic tests for equality of variance across subgroups on a non-metric variable. The null hypothesis in the test is that the variance of each subgroup is the same. The desired outcome is a failure to reject the null hypothesis. If we do reject the null hypothesis and conclude that the variance of at least one of the subgroups is not the same, we can use a special formula for computing the variance if one exists, such as we do with t-tests, or we can apply one of the transformations used to induce normality on the metric variable.
While the Levene statistic is available through several statistical procedures in SPSS, we can obtain it for any number of groups using the One-way ANOVA Procedure.
We will demonstrate this test by checking the homogeneity of variance for the metric variables 'Delivery Speed', 'Price Level', 'Price Flexibility', 'Manufacturer Image', 'Service', 'Salesforce Image', 'Product Quality', 'Usage Level', and 'Satisfaction Level' among the subgroups of the non-metric variable 'Firm Size.'
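The logic of the Levene test can be sketched in Python with SciPy. This is only an illustrative analogue of the SPSS procedure, and the two "firm size" groups below are simulated rather than taken from the HATCO data; note that scipy's `levene` defaults to the median-centered (Brown-Forsythe) variant, so `center='mean'` is passed to match the classic Levene statistic:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)

# Simulated metric variable split by a two-level non-metric factor
# such as Firm Size; the large-firm group has a much larger spread.
small_firms = rng.normal(loc=5.0, scale=1.0, size=60)
large_firms = rng.normal(loc=5.0, scale=3.0, size=60)

# Null hypothesis: the subgroup variances are equal. A p-value
# above 0.05 is the desirable outcome (fail to reject).
stat, p = stats.levene(small_firms, large_firms, center='mean')
equal_variance = p > 0.05
```

Here the test should reject the null hypothesis, which in the transcript's workflow is the signal either to use a variance-adjusted formula (as with t-tests) or to try one of the normality-inducing transformations on the metric variable.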
Slide 25
Requesting a One-way ANOVA
First, select the 'Compare Means | One-Way ANOVA...' command from the Analyze menu.
Second, move the metric variables 'Delivery Speed' (x1), 'Price Level' (x2), 'Price Flexibility' (x3), 'Manufacturer Image' (x4), 'Service' (x5), 'Salesforce Image' (x6), 'Product Quality' (x7), 'Usage Level' (x9), and 'Satisfaction Level' (x10) to the 'Dependent List:' list box.
Third, move the non-metric variable 'Firm Size' (x8) to the 'Factor:' text box.
Fourth, click on the 'Options...' button to request the Levene statistic.
Slide 26
Request the Levene Homogeneity of Variance Test
First, mark the check box for the 'Homogeneity-of-variance test' in the 'Statistics' panel.
Second, click on the Continue button to close the 'One-Way ANOVA: Options' dialog box.
Third, click on the OK button to close the 'One-Way ANOVA' dialog.
Slide 27
The Tests of Homogeneity of Variances
Using an alpha level of 0.05, we see that four metric variables do not have the same variance for both small and large firms: Manufacturer Image, Service, Salesforce Image, and Product Quality.
Slide 28
Compute the Transformed Variables for 'Manufacturer Image' (x4)
Enter the formulas for the logarithmic, square root, and inverse transformations for 'Manufacturer Image' in the Compute Variable dialog box.
Slide 29
Request the Levene Test for the Transformed Manufacturer Image Variables
First, select the 'Compare Means | One-Way ANOVA...' command from the Analyze menu.
Second, move the transformed variables to the 'Dependent List: ' list box and the non-metric variable 'Firm Size' (x8) to the 'Factor: ' text box.
Third, click on the 'Options...' button to request the Levene statistic.
Fourth, mark the check box for the 'Homogeneity-of-variance test' in the 'Statistics' panel.
Fifth, click on the Continue button to close the 'One-Way ANOVA: Options' dialog box and click on the OK button to close the 'One-Way ANOVA' dialog.
Slide 30
Levene Test Results for the Transformed Manufacturer Image Variables
The results of the Levene Tests of Homogeneity of Variances indicate that none of the transformations are effective in resolving the homogeneity of variance problem for the subgroups of Firm Size on the variable Manufacturer Image. We would note the problem in our statement about the limitations of our analysis.
Slide 31
Compute the Transformed Variables for 'Product Quality' (x7)
Enter the formulas for the logarithmic, square root, and inverse transformations for 'Product Quality' in the Compute Variable dialog box. Since the variable is negatively skewed, we use the reflection form of the transformations.
Slide 32
Request the Levene Test for the Transformed Product Quality Variables
First, select the 'Compare Means | One-Way ANOVA...' command from the Analyze menu.
Second, move the transformed variables to the 'Dependent List: ' list box and the non-metric variable 'Firm Size' (x8) to the 'Factor: ' text box.
Third, click on the 'Options...' button to request the Levene statistic.
Fourth, mark the check box for the 'Homogeneity-of-variance test' in the 'Statistics' panel.
Fifth, click on the Continue button to close the 'One-Way ANOVA: Options' dialog box and click on the OK button to close the 'One-Way ANOVA' dialog.
Slide 33
Results of the Levene Test for the Transformed Product Quality Variables
The results of the Levene Tests of Homogeneity of Variances indicate that either the logarithmic transformation or the square root transformation is effective in resolving the homogeneity of variance problem for the subgroups of Firm Size on the variable Product Quality.
Slide 34
3. Evaluating Linearity and Homoscedasticity of Metric Variables with Scatterplots
Other assumptions required for multivariate analysis focus on the relationships between pairs of metric variables. It is assumed that the relationship between metric variables is linear, and that the variance is homogeneous throughout the range of both metric variables. If both the linearity and the homoscedasticity assumptions are met, the plot of points will appear as a rectangular band in a scatterplot. If there is a strong relationship between the variables, the band will be narrow. If the relationship is weaker, the band becomes broader. If the pattern of points is curved instead of rectangular, there is a violation of the assumption of linearity. If the band of points is narrower at one end than it is at the other (funnel-shaped), there is a violation of the assumption of homogeneity of variance. Violations of the assumptions of linearity and homoscedasticity may be correctable through transformation of one or both variables, similar to the transformations employed for violations of the normality assumption. A diagnostic graphic with recommended transformations is available in the text on page 77.
SPSS provides a scatterplot matrix for examining the linearity and homoscedasticity of a set of metric variables as a diagnostic tool. If greater detail is required, a bivariate scatterplot for pairs of variables is available. We will request a scatterplot matrix for the eight metric variables from the HATCO data set, matching the scatterplot matrix on page 43 of the text. None of the relationships in this scatterplot matrix shows any serious problem with linearity or heteroscedasticity, so this exercise will not afford the opportunity to examine transformations. Examples of transformations to achieve linearity will be included in the next set of exercises, titled A Further Look at Transformations.
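The scatterplot judgment itself is visual, but the underlying linearity check can be approximated numerically: fit a straight line, then see whether adding a squared (polynomial) term meaningfully reduces the residual error. The Python sketch below uses simulated data, so the names and thresholds are illustrative assumptions rather than part of the SPSS workflow:

```python
import numpy as np

rng = np.random.default_rng(1)
x = rng.uniform(0, 10, size=200)

# Case 1: a genuinely linear relationship with homoscedastic noise.
y_lin = 2.0 * x + rng.normal(scale=0.5, size=200)

# Case 2: a curved relationship that violates the linearity assumption.
y_curved = x ** 2 + rng.normal(scale=0.5, size=200)

def extra_ss_reduction(x, y):
    """Fraction of the straight-line residual sum of squares removed
    by adding a squared term, i.e. how much the curvature matters."""
    resid1 = y - np.polyval(np.polyfit(x, y, 1), x)  # linear fit
    resid2 = y - np.polyval(np.polyfit(x, y, 2), x)  # quadratic fit
    return 1.0 - (resid2 ** 2).sum() / (resid1 ** 2).sum()

lin_gain = extra_ss_reduction(x, y_lin)        # near 0: a line suffices
curved_gain = extra_ss_reduction(x, y_curved)  # near 1: curvature present
```

A large reduction corresponds to the curved band of points the transcript describes, and is the cue to try a power transformation or add a polynomial term.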
Slide 35
Requesting the Scatterplot Matrix
First, select the 'Scatter...' command from the Graphs menu.
Second, in the 'Scatterplot' dialog box, click on the 'Matrix' thumbnail sketch to highlight it.
Third, click on the 'Define' button to access the next dialog box.
Slide 36
Specify the Variables to Include in the Scatterplot Matrix
First, highlight the variables 'Delivery Speed' (x1), 'Price Level' (x2), 'Price Flexibility' (x3), 'Manufacturer Image' (x4), 'Service' (x5), 'Salesforce Image' (x6), 'Product Quality' (x7), and 'Usage Level' (x9) in the list of available variables and click on the move arrow to add these variables to the list of 'Matrix Variables.'
Second, click on the OK button to request the output.
Slide 37
Add Fit Lines to the Scatterplot Matrix
The Scatterplot Matrix appears in the SPSS output viewer. To add fit or trend lines to each scatterplot to make it easier to interpret, we double click on the graph to open it in the SPSS chart editor.
Slide 38
Requesting the Fit Lines
First, select the 'Options...' command from the Chart menu.
Second, mark the 'Total' check box in the 'Fit Line' panel.
Third, click on the OK button to close the dialog box.
Slide 39
Changing the Thickness of the Fit Lines
The fit lines appear on each scatterplot as thin red lines that are difficult to see. We will change their thickness and color. First, click on one of the fit lines so that the editing handles appear at the end and middle of each.
Second, click on the 'Line Style' tool button to access the 'Line Styles' dialog box.
Third, highlight the second entry in the 'Weight' panel.
Fourth, click on the 'Apply' button to change the weight of the fit lines. Click on the 'Close' button to close the dialog box.
Slide 40
Changing the Color of the Fit Lines
First, with the fit lines still highlighted, click on the 'Color' tool button to open the 'Colors' dialog.
Second, highlight the blue color box.
Third, click on the 'Apply' button to change the color of the fit lines.
Fourth, click on the Close button to close the dialog box.
Slide 41
The Final Scatterplot Matrix
The final scatterplot matrix has the same basic appearance as the graphic on page 43 of the text. There are not any obvious violations of linearity or homoscedasticity, so the diagnostic task is completed. Were there violations, we would test the transformations used to achieve linearity to see if we can correct the violation of the assumption.