Chapter Sixteen Copyright © 2006 McGraw-Hill/Irwin Data Analysis: Testing for Association


Page 1

Chapter Sixteen

Copyright © 2006 McGraw-Hill/Irwin

Data Analysis: Testing for Association

Page 2

1. Understand and evaluate the types of relationships between variables.

2. Explain the concepts of association and covariation.

3. Discuss the differences in chi-square, Pearson correlation, and Spearman correlation.

4. Explain the concept of statistical significance versus practical significance.

5. Understand when and how to use regression analysis.

Learning Objectives

Page 3

• Relationship–consistent and systematic link between two or more variables

– First issue–are two or more variables related at all?

• Presence of a relationship–systematic relationship exists between two or more variables

• Statistical significance–measures whether a relationship is present

– Second issue–the direction of the relationship–positive or negative

Understand and evaluate the types of relationships between variables

Relationships Between Variables

Page 4

– Third issue–understanding the strength of the association

• Weak–low probability that a consistent and systematic relationship exists between the variables

• Moderate

• Strong–high probability a consistent and systematic relationship exists

Understand and evaluate the types of relationships between variables

Relationships Between Variables

Page 5

– Fourth issue–type of relationship

• Whether two variables are related, and the nature of that relationship

• Linear relationship–between two variables whereby the strength and nature of the relationship remains the same over the range of both variables

• Curvilinear relationship–between two variables whereby the strength and/or direction of their relationship changes over the range of both variables

Understand and evaluate the types of relationships between variables

Relationships Between Variables

Page 6

• Three questions to ask about a relationship between two variables

– Is there a relationship between the two variables we are interested in?

– How strong is the relationship?

– How can that relationship be best described?

Understand and evaluate the types of relationships between variables

Relationships Between Variables

Page 7

• Covariation–amount of change in one variable that is consistently related to the change in another variable of interest

• Scatter Diagram–graphic plot of the relative position of two variables using a horizontal and a vertical axis to represent the values of the respective variables

Explain the concepts of association and covariation

Using Covariation to Describe Variable Relationships

Page 8

Exhibit 16.1 (Explain the concepts of association and covariation)

Page 9

Exhibit 16.2 (Explain the concepts of association and covariation)

Page 10

Exhibit 16.3 (Explain the concepts of association and covariation)

Page 11

Exhibit 16.4 (Explain the concepts of association and covariation)

Page 12

• Chi-Square (χ²) Analysis–test for significance between the frequency distributions of two or more nominally scaled variables in a cross-tabulation table to determine if there is any association

– Assesses how closely the observed frequencies fit the pattern of the expected frequencies and is referred to as a "goodness-of-fit" test

– Used to analyze nominal data which cannot be analyzed with other types of statistical analysis, such as ANOVA or t-tests

– Results will be distorted if more than 20 percent of the cells have an expected count of less than 5

Discuss the differences in Chi-square, Pearson correlation, and Spearman correlation

Using Covariation to Describe Variable Relationships
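The expected-frequency logic behind the chi-square test can be sketched in a few lines of Python. This is a minimal illustration with hypothetical counts, not the textbook's own data:

```python
# Hypothetical 2x2 cross-tabulation: rows = gender, columns = brand preference.
observed = [[20, 30],
            [40, 10]]

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
n = sum(row_totals)

# Expected count per cell = (row total * column total) / grand total.
expected = [[r * c / n for c in col_totals] for r in row_totals]

# Chi-square statistic: sum over all cells of (observed - expected)^2 / expected.
chi_sq = sum((observed[i][j] - expected[i][j]) ** 2 / expected[i][j]
             for i in range(len(observed)) for j in range(len(observed[0])))

# All expected counts here are >= 5, so the "no more than 20 percent of
# cells below 5" caution from the slide is satisfied. With 1 degree of
# freedom the critical value at alpha = .05 is 3.84.
```

The statistic (about 16.67 here) is then compared against the critical chi-square value for the table's degrees of freedom to decide whether an association is present.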

Page 13

Exhibit 16.5 (Explain the concepts of association and covariation)

Page 14

• Pearson Correlation Coefficient–statistical measure of the strength of a linear relationship between two metric variables

– It varies between –1.00 and +1.00, with 0 representing absolutely no association between the two variables, and –1.00 or +1.00 representing perfect linkage between them

– The higher the correlation coefficient–the stronger the level of association

– The correlation coefficient can be either positive or negative–depending upon the direction of the relationship between two variables.

Using Covariation to Describe Variable Relationships

Discuss the differences in Chi-square, Pearson correlation, and Spearman correlation
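As a sketch, the Pearson coefficient can be computed directly from its definition: covariation between the variables divided by the product of their dispersions. The data below are hypothetical:

```python
import math

x = [1, 2, 3, 4, 5]   # hypothetical metric variable (e.g., ad spend)
y = [2, 4, 5, 4, 5]   # hypothetical metric variable (e.g., sales)

n = len(x)
mean_x, mean_y = sum(x) / n, sum(y) / n

# Numerator: covariation of x and y around their means.
num = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
# Denominator: product of the square roots of each variable's
# summed squared deviations.
den = math.sqrt(sum((a - mean_x) ** 2 for a in x)) * \
      math.sqrt(sum((b - mean_y) ** 2 for b in y))

r = num / den   # always falls between -1.00 and +1.00
```

For these numbers r is about 0.77, a positive association; the sign reflects the direction and the magnitude the strength of the relationship.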

Page 15

– Null hypothesis states that there is no association between the two variables in the population and that the correlation coefficient is zero

– If the correlation coefficient is statistically significant–the null hypothesis is rejected and the conclusion is that the two variables do share some association in the population

– The size of the correlation coefficient can be used to quantitatively describe the strength of the association between two variables

Using Covariation to Describe Variable Relationships

Discuss the differences in Chi-square, Pearson correlation, and Spearman correlation

Page 16

Exhibit 16.6 (Explain the concepts of association and covariation)

Page 17

• Pearson Correlation Coefficient–makes several assumptions about the nature of the data

– The two variables are assumed to have been measured using interval or ratio-scaled measures

– Nature of the relationship to be measured is linear

– Variables to be analyzed come from a bivariate normally distributed population

Using Covariation to Describe Variable Relationships

Discuss the differences in Chi-square, Pearson correlation, and Spearman correlation

Page 18

Exhibit 16.7 (Explain the concepts of association and covariation)

Page 19

• Coefficient of Determination (r2)–a number measuring the proportion of variation in one variable accounted for by another.

– The r2 measure can be thought of as a percentage and varies from 0.0 to 1.00

– The larger the size of the coefficient of determination, the stronger the linear relationship between the two variables under study

Using Covariation to Describe Variable Relationships

Discuss the differences in Chi-square, Pearson correlation, and Spearman correlation
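Since r² is literally the square of the Pearson coefficient, the computation is trivial; the r value below is hypothetical:

```python
r = 0.77        # hypothetical Pearson correlation between two variables
r_sq = r ** 2   # coefficient of determination

# r_sq is about 0.59: roughly 59 percent of the variation in one variable
# is accounted for by the other; the remaining 41 percent is unexplained.
```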

Page 20

• Spearman Rank Order Correlation Coefficient–a statistical measure of the linear association between two variables where both have been measured using ordinal (rank order) scales

– If either one of the variables is represented by rank order data–use the Spearman rank order correlation coefficient

– The Spearman rank order correlation coefficient tends to produce a lower coefficient than the Pearson and is considered the more conservative measure

Using Covariation to Describe Variable Relationships

Discuss the differences in Chi-square, Pearson correlation, and Spearman correlation

Page 21

Exhibit 16.8 (Explain the concepts of association and covariation)

Page 22

Exhibit 16.9 (Explain the concepts of association and covariation)

Page 23

• Regression Analysis–one method for arriving at more detailed answers than can be provided by the correlation coefficient

– A marketing manager interested in predicting future sales levels, or how much impact a potential price increase will have on the profits or market share of the company, has a number of ways to make such predictions

• Extrapolation from past behavior of the variable

• Simple guesses

• Use of a regression equation which compares information about related variables to assist in the prediction

What is Regression Analysis?

Understand when and how to use regression analysis

Page 24

• Bivariate Regression Analysis–a statistical technique which analyzes the linear relationship between two variables by estimating coefficients for an equation for a straight line. One variable is designated as a dependent variable and the other is called an independent or predictor variable

– Assumptions behind regression analysis

• Just like correlation analysis, regression analysis assumes that a linear relationship will provide a good description of the relationship between two variables

– Even though the common terminology of regression analysis uses the labels dependent and independent for the variables, those names do not mean that one variable causes the behavior of the other

What is Regression Analysis?

Understand when and how to use regression analysis

Page 25

• The use of a simple regression model assumes

1. The variables of interest are measured on interval or ratio scales (except in the case of dummy variables)

2. These variables come from a bivariate normal population

3. The error terms associated with making predictions are normally and independently distributed

What is Regression Analysis?

Understand when and how to use regression analysis

Page 26

Exhibit 16.10 (Explain the concepts of association and covariation)

Page 27

• Formula for a Straight Line

y = a + bX + ei

where

y = the dependent variable

a = the intercept (the point where the straight line intersects the y-axis when X = 0)

b = the slope (the change in y for every 1-unit change in X)

X = the independent variable used to predict y

ei = the error for the prediction

What is Regression Analysis?

Understand when and how to use regression analysis
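The intercept a and slope b can be estimated from data with the usual least-squares formulas. A minimal sketch with hypothetical, exactly linear data so the fit is easy to check by eye:

```python
x = [1, 2, 3, 4, 5]
y = [3, 5, 7, 9, 11]   # hypothetical data lying exactly on y = 1 + 2X

n = len(x)
mean_x, mean_y = sum(x) / n, sum(y) / n

# Slope: covariation of X and y divided by the variation in X.
b = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y)) / \
    sum((xi - mean_x) ** 2 for xi in x)

# Intercept: forces the fitted line through the point (mean_x, mean_y).
a = mean_y - b * mean_x
```

With real, noisy data the same formulas apply; the errors ei are then the gaps between each observed y and the fitted line.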

Page 28

• Regression Analysis–examine the relationship between the independent variable X and the dependent variable Y

• Least Squares Procedure–determines the best-fitting line by minimizing the vertical distances of all points from the line

• Test of Statistical Significance–a t-test is used to determine whether the computed intercept and slope are significantly different from zero

What is Regression Analysis?

Understand when and how to use regression analysis
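The t-test on a coefficient simply divides the estimate by its standard error. Both numbers below are hypothetical stand-ins for output a statistics package would report:

```python
b = 1.9        # hypothetical estimated slope
se_b = 0.374   # hypothetical standard error of that slope

t_stat = b / se_b
# Compare against the critical t value with n - 2 degrees of freedom;
# for moderate samples at alpha = .05 that is roughly 2.0, so a t near 5
# would lead us to reject H0: slope = 0.
```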

Page 29

Exhibit 16.11 (Explain the concepts of association and covariation)

Page 30

• Ordinary Least Squares (OLS)–a statistical procedure that estimates regression equation coefficients which produce the lowest sum of squared differences between the actual and predicted values of the dependent variable

• Errors in Regression

– Differences between the actual and predicted values of Y are represented by ei (the error term of the regression equation)

– Squaring the errors for each observation (the difference between the actual and predicted values of Y) and adding them up gives a total that represents an aggregate, or overall, measure of the accuracy of the regression equation

What is Regression Analysis?

Understand when and how to use regression analysis
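Putting the pieces together, here is a sketch of the errors ei and their squared total, the quantity OLS minimizes, using hypothetical data:

```python
x = [1, 2, 3, 4]
y = [2, 4, 5, 8]   # hypothetical observations

# Least-squares fit (same formulas as in the bivariate case).
n = len(x)
mean_x, mean_y = sum(x) / n, sum(y) / n
b = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y)) / \
    sum((xi - mean_x) ** 2 for xi in x)
a = mean_y - b * mean_x

# Errors: actual Y minus predicted Y for each observation.
predicted = [a + b * xi for xi in x]
errors = [yi - pi for yi, pi in zip(y, predicted)]

# Sum of squared errors: the aggregate accuracy measure the slide
# describes; no other straight line through these points yields less.
sse = sum(e ** 2 for e in errors)
```

Plotting the errors against the X values (as the next slide suggests) would reveal whether they look normally distributed with roughly equal variance.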

Page 31

• Errors in Regression

– Regression equations calculated through the use of OLS procedures will always give the lowest squared error totals, which is why both bivariate and multiple regression analysis are sometimes referred to as OLS regression

– The error terms also can be used to diagnose potential problems caused by data observations that do not meet the assumptions described above

– The pattern obtained by comparing the actual values of Y with the predicted Y values indicates whether the errors are normally distributed and/or have equal variances across the range of X values

What is Regression Analysis?

Understand when and how to use regression analysis

Page 32

Exhibit 16.12 (Explain the concepts of association and covariation)

Page 33

Exhibit 16.13 (Explain the concepts of association and covariation)

Page 34

Exhibit 16.14 (Explain the concepts of association and covariation)

Page 35

Exhibit 16.15 (Explain the concepts of association and covariation)

Page 36

• Adjusted R-square–adjustment reduces the R2 by taking into account the sample size and the number of independent variables in the regression equation. It tells you when the multiple regression equation has too many independent variables

• Explained variance–the amount of variation in the dependent construct that can be accounted for by the combination of independent variables

• Unexplained variance–the amount of variation in the dependent construct that cannot be accounted for by the combination of independent variables

• Regression coefficient–indicator of the importance of an independent variable in predicting a dependent variable. Large coefficients are good predictors and small coefficients are weak predictors

What is Regression Analysis?

Understand when and how to use regression analysis
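The adjustment can be written out directly; the R², n, and k values below are hypothetical:

```python
r_sq = 0.60   # hypothetical R-square from a multiple regression
n = 30        # sample size
k = 3         # number of independent variables

# Adjusted R-square: shrinks R-square as predictors are added
# relative to the available sample size.
adj_r_sq = 1 - (1 - r_sq) * (n - 1) / (n - k - 1)
# Here about 0.554, a bit below the unadjusted 0.60.
```

If adding another independent variable raises R² only trivially, the adjusted value can actually fall, which is the "too many independent variables" signal the slide describes.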

Page 37

• Statistical Significance of the Regression Coefficients

1. If yes–the first question about the relationship has been answered: Is there a relationship between the dependent and independent variable?

2. How strong is the relationship?–the coefficient of determination (r2) tells what percentage of the total variation in the dependent variable is explained by the independent variable

3. The r2 measure varies between .00 and 1.00–the size of r2 indicates the strength of the relationship–the closer to 1.00, the stronger the relationship

What is Regression Analysis?

Understand when and how to use regression analysis

Page 38

End here today

• Hand back Lab 1

Page 39

• Multiple Regression Analysis–a statistical technique which analyzes the linear relationship between a dependent variable and multiple independent variables by estimating coefficients for the equation for a straight line

Multiple Regression Analysis

Understand when and how to use regression analysis

Page 40

• Relationship that exists between each independent variable and the dependent measure is still linear

– To analyze the relationships, examine the regression coefficient for each independent variable

– Each coefficient describes the average amount of change to be expected in Y given a unit change in the value of that independent variable–this describes the relationship of the independent variable to the dependent variable

Multiple Regression Analysis

Understand when and how to use regression analysis

Page 41

• Concerns

• Each independent variable may be measured using a different scale

• The use of different scales does not allow direct comparisons between regression coefficients to see which independent variable has the most influence on the dependent variable

Multiple Regression Analysis

Understand when and how to use regression analysis

Page 42

• Standardized regression coefficient–corrects this problem

• Beta Coefficient–an estimated regression coefficient which has been recalculated to have a mean of 0 and a standard deviation of 1

– Such a change enables independent variables with different units of measurement to be directly compared on their association with the dependent variable

• Standardization removes the effects of different scales

Multiple Regression Analysis

Understand when and how to use regression analysis
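One common way to obtain a beta coefficient is to rescale the unstandardized slope by the ratio of the two standard deviations; all numbers below are hypothetical:

```python
b = 1.9       # hypothetical unstandardized regression coefficient
sd_x = 1.29   # hypothetical standard deviation of the independent variable
sd_y = 2.50   # hypothetical standard deviation of the dependent variable

# Beta: the slope the regression would have if both variables were
# standardized (mean 0, standard deviation 1), which makes coefficients
# measured on different scales directly comparable.
beta = b * sd_x / sd_y
```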

Page 43

• Regression coefficient–divided by its standard error to produce a t-test statistic, which is compared against the critical value to determine whether the null hypothesis can be rejected.

– Examine the t-test statistics for each regression coefficient

– Many times not all the independent variables in a regression equation will be statistically significant.

– When using multiple regression analysis, also examine the overall statistical significance of the regression model

Multiple Regression Analysis

Explain the concept of statistical significance versus practical significance

Page 44

• Model F statistic–measure is compared against a critical value to determine whether or not to reject the null hypothesis

– If the F statistic is statistically significant, it means that the chances of the regression model for your sample producing a large r2 when the population r2 is actually 0 are acceptably small

Multiple Regression Analysis

Explain the concept of statistical significance versus practical significance
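The model F statistic can be computed from R², the number of predictors k, and the sample size n; all values below are hypothetical:

```python
r_sq = 0.60   # hypothetical model R-square
k = 3         # number of independent variables
n = 30        # sample size

# F: explained variance per predictor over unexplained variance
# per remaining degree of freedom.
f_stat = (r_sq / k) / ((1 - r_sq) / (n - k - 1))
# About 13.0 here, well above typical critical F values at alpha = .05,
# so the null hypothesis that the population R-square is 0 is rejected.
```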

Page 45

• Appropriate procedure to follow in evaluating the results of a regression analysis

– Assess the statistical significance of the overall regression model using the F statistic and its associated probability

– Evaluate the obtained r2 to see how large it is

– Examine the individual regression coefficients and their t-test statistics to see which are statistically significant

– Look at the beta coefficients to assess the relative influence of each independent variable

Multiple Regression Analysis

Explain the concept of statistical significance versus practical significance

Page 46

Exhibit 16.16 (Explain the concept of statistical significance versus practical significance)

Page 47

• Dummy Variables–artificial variables introduced into a regression equation to represent the categories of a nominally scaled variable

• Sometimes the particular independent variables you may want to use to predict a dependent variable are not measured using interval or ratio scales

– In this case, use dummy variables

– There will be one dummy variable for each of the nominal categories of the independent variable and the values will typically be 0 or 1

Multiple Regression Analysis

Understand when and how to use regression analysis
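A sketch of the coding: one 0/1 dummy per category of a hypothetical nominal variable:

```python
# Hypothetical nominal variable: region of each respondent.
regions = ["north", "south", "west", "south", "north"]
categories = ["north", "south", "west"]

# One dummy variable per category, coded 1 when the respondent falls
# in that category and 0 otherwise.
dummies = {c: [1 if r == c else 0 for r in regions] for c in categories}

# In practice one category is typically dropped as the reference level,
# since the full set of dummies is perfectly collinear: each respondent's
# dummies sum to exactly 1.
```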

Page 48

Exhibit 16.17 (Understand when and how to use regression analysis)

Page 49

• Multicollinearity–a situation in which several independent variables are highly correlated with each other.

– This characteristic can result in difficulty in estimating separate or independent regression coefficients for the correlated variables

Multiple Regression Analysis

Understand when and how to use regression analysis
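A quick diagnostic is simply to correlate the independent variables with each other; a coefficient near 1 flags trouble. A sketch with hypothetical data:

```python
import math

def pearson(x, y):
    # Plain Pearson correlation between two equal-length lists.
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    num = sum((a - mx) * (b - my) for a, b in zip(x, y))
    den = math.sqrt(sum((a - mx) ** 2 for a in x) *
                    sum((b - my) ** 2 for b in y))
    return num / den

# Two hypothetical independent variables that nearly duplicate each other.
x1 = [1, 2, 3, 4, 5]
x2 = [2, 4, 6, 8, 11]

r = pearson(x1, x2)   # close to 1.0: a multicollinearity warning sign
```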

Page 50

• Multicollinearity inflates the standard error of the coefficient and lowers the t statistic associated with it

• The major impact is confined to the statistical significance of the individual regression coefficients.

• Multicollinearity problems do not have an impact on the size of the r2 or the ability to predict values of the dependent variable

Multiple Regression Analysis

Understand when and how to use regression analysis

Page 51

Exhibit 16.18 (Understand when and how to use regression analysis)

Page 52

• Relationships Between Variables

• Using Covariation to Describe Variable Relationships

• What is Regression Analysis?

• Multiple Regression

Summary

Page 53

The End

Copyright © 2006 McGraw-Hill/Irwin