regression analysis · 2015-06-03 · learn how to detect relationships between ordinal and...

135
Relationships Between Two Variables Regression Assumptions Regression Analysis BUS 735: Business Decision Making and Research BUS 735: Business Decision Making and Research Regression Analysis

Upload: others

Post on 15-Jul-2020

4 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: Regression Analysis · 2015-06-03 · Learn how to detect relationships between ordinal and categorical variables. Learn how to estimate a linear relationship between many variables

Relationships Between Two VariablesRegression

Assumptions

Regression Analysis

BUS 735: Business Decision Making and Research

BUS 735: Business Decision Making and Research Regression Analysis

Page 2: Regression Analysis · 2015-06-03 · Learn how to detect relationships between ordinal and categorical variables. Learn how to estimate a linear relationship between many variables

Relationships Between Two VariablesRegression

Assumptions

Goals of this section 1/ 23

Specific goals

Learn how to detect relationships between ordinal and categoricalvariables.

Learn how to estimate a linear relationship between many variables.

Learning objectives

LO2: Be able to construct and use multiple regression models (includingsome limited dependent variable models) to construct and testhypotheses considering complex relationships among multiple variables.

LO6: Be able to use standard computer packages such as SPSS andExcel to conduct the quantitative analyses described in the learningobjectives above.

LO7: Have a sound familiarity of various statistical and quantitativemethods in order to be able to approach a business decision problem andbe able to select appropriate methods to answer the question.

BUS 735: Business Decision Making and Research Regression Analysis

Page 3: Regression Analysis · 2015-06-03 · Learn how to detect relationships between ordinal and categorical variables. Learn how to estimate a linear relationship between many variables

Relationships Between Two VariablesRegression

Assumptions

Goals of this section 1/ 23

Specific goals

Learn how to detect relationships between ordinal and categoricalvariables.

Learn how to estimate a linear relationship between many variables.

Learning objectives

LO2: Be able to construct and use multiple regression models (includingsome limited dependent variable models) to construct and testhypotheses considering complex relationships among multiple variables.

LO6: Be able to use standard computer packages such as SPSS andExcel to conduct the quantitative analyses described in the learningobjectives above.

LO7: Have a sound familiarity of various statistical and quantitativemethods in order to be able to approach a business decision problem andbe able to select appropriate methods to answer the question.

BUS 735: Business Decision Making and Research Regression Analysis

Page 4: Regression Analysis · 2015-06-03 · Learn how to detect relationships between ordinal and categorical variables. Learn how to estimate a linear relationship between many variables

Relationships Between Two VariablesRegression

Assumptions

CorrelationChi-Squared Test of Independence

Correlation 2/ 23

Pearson linear correlation coefficient: a value between -1 and+1 that is used to measure the strength of a positive ornegative linear relationship.

Valid for interval or ratio data.Not appropriate for ordinal or nominal data.Test depends on assumptions behind the central limit theorem(CLT)

Spearman rank correlation: non-parametric test.

Valid for small sample sizes (when assumptions of CLT areviolated)Appropriate for interval, ratio, and even ordinal data.Still makes no sense to use for nominal data.

BUS 735: Business Decision Making and Research Regression Analysis

Page 5: Regression Analysis · 2015-06-03 · Learn how to detect relationships between ordinal and categorical variables. Learn how to estimate a linear relationship between many variables

Relationships Between Two VariablesRegression

Assumptions

CorrelationChi-Squared Test of Independence

Correlation 2/ 23

Pearson linear correlation coefficient: a value between -1 and+1 that is used to measure the strength of a positive ornegative linear relationship.

Valid for interval or ratio data.Not appropriate for ordinal or nominal data.Test depends on assumptions behind the central limit theorem(CLT)

Spearman rank correlation: non-parametric test.

Valid for small sample sizes (when assumptions of CLT areviolated)Appropriate for interval, ratio, and even ordinal data.Still makes no sense to use for nominal data.

BUS 735: Business Decision Making and Research Regression Analysis

Page 6: Regression Analysis · 2015-06-03 · Learn how to detect relationships between ordinal and categorical variables. Learn how to estimate a linear relationship between many variables

Relationships Between Two VariablesRegression

Assumptions

CorrelationChi-Squared Test of Independence

Correlation 2/ 23

Pearson linear correlation coefficient: a value between -1 and+1 that is used to measure the strength of a positive ornegative linear relationship.

Valid for interval or ratio data.Not appropriate for ordinal or nominal data.Test depends on assumptions behind the central limit theorem(CLT)

Spearman rank correlation: non-parametric test.

Valid for small sample sizes (when assumptions of CLT areviolated)Appropriate for interval, ratio, and even ordinal data.Still makes no sense to use for nominal data.

BUS 735: Business Decision Making and Research Regression Analysis

Page 7: Regression Analysis · 2015-06-03 · Learn how to detect relationships between ordinal and categorical variables. Learn how to estimate a linear relationship between many variables

Relationships Between Two VariablesRegression

Assumptions

CorrelationChi-Squared Test of Independence

Correlation 2/ 23

Pearson linear correlation coefficient: a value between -1 and+1 that is used to measure the strength of a positive ornegative linear relationship.

Valid for interval or ratio data.Not appropriate for ordinal or nominal data.Test depends on assumptions behind the central limit theorem(CLT)

Spearman rank correlation: non-parametric test.

Valid for small sample sizes (when assumptions of CLT areviolated)Appropriate for interval, ratio, and even ordinal data.Still makes no sense to use for nominal data.

BUS 735: Business Decision Making and Research Regression Analysis

Page 8: Regression Analysis · 2015-06-03 · Learn how to detect relationships between ordinal and categorical variables. Learn how to estimate a linear relationship between many variables

Relationships Between Two VariablesRegression

Assumptions

CorrelationChi-Squared Test of Independence

Correlation 2/ 23

Pearson linear correlation coefficient: a value between -1 and+1 that is used to measure the strength of a positive ornegative linear relationship.

Valid for interval or ratio data.Not appropriate for ordinal or nominal data.Test depends on assumptions behind the central limit theorem(CLT)

Spearman rank correlation: non-parametric test.

Valid for small sample sizes (when assumptions of CLT areviolated)Appropriate for interval, ratio, and even ordinal data.Still makes no sense to use for nominal data.

BUS 735: Business Decision Making and Research Regression Analysis

Page 9: Regression Analysis · 2015-06-03 · Learn how to detect relationships between ordinal and categorical variables. Learn how to estimate a linear relationship between many variables

Relationships Between Two VariablesRegression

Assumptions

CorrelationChi-Squared Test of Independence

Correlation 2/ 23

Pearson linear correlation coefficient: a value between -1 and+1 that is used to measure the strength of a positive ornegative linear relationship.

Valid for interval or ratio data.Not appropriate for ordinal or nominal data.Test depends on assumptions behind the central limit theorem(CLT)

Spearman rank correlation: non-parametric test.

Valid for small sample sizes (when assumptions of CLT areviolated)Appropriate for interval, ratio, and even ordinal data.Still makes no sense to use for nominal data.

BUS 735: Business Decision Making and Research Regression Analysis

Page 10: Regression Analysis · 2015-06-03 · Learn how to detect relationships between ordinal and categorical variables. Learn how to estimate a linear relationship between many variables

Relationships Between Two VariablesRegression

Assumptions

CorrelationChi-Squared Test of Independence

Correlation 2/ 23

Pearson linear correlation coefficient: a value between -1 and+1 that is used to measure the strength of a positive ornegative linear relationship.

Valid for interval or ratio data.Not appropriate for ordinal or nominal data.Test depends on assumptions behind the central limit theorem(CLT)

Spearman rank correlation: non-parametric test.

Valid for small sample sizes (when assumptions of CLT areviolated)Appropriate for interval, ratio, and even ordinal data.Still makes no sense to use for nominal data.

BUS 735: Business Decision Making and Research Regression Analysis

Page 11: Regression Analysis · 2015-06-03 · Learn how to detect relationships between ordinal and categorical variables. Learn how to estimate a linear relationship between many variables

Relationships Between Two VariablesRegression

Assumptions

CorrelationChi-Squared Test of Independence

Correlation 2/ 23

Pearson linear correlation coefficient: a value between -1 and+1 that is used to measure the strength of a positive ornegative linear relationship.

Valid for interval or ratio data.Not appropriate for ordinal or nominal data.Test depends on assumptions behind the central limit theorem(CLT)

Spearman rank correlation: non-parametric test.

Valid for small sample sizes (when assumptions of CLT areviolated)Appropriate for interval, ratio, and even ordinal data.Still makes no sense to use for nominal data.

BUS 735: Business Decision Making and Research Regression Analysis

Page 12: Regression Analysis · 2015-06-03 · Learn how to detect relationships between ordinal and categorical variables. Learn how to estimate a linear relationship between many variables

Relationships Between Two VariablesRegression

Assumptions

CorrelationChi-Squared Test of Independence

Chi-Squared Test for Independence 3/ 23

Used to determine if two categorical variables (eg: nominal)are related.

Example: Suppose a hotel manager surveys guest whoindicate they will not return:

Reason for Not Returning

Reason for Stay Price Location Amenities

Personal/Vacation 56 49 0Business 20 47 27

Data in the table are always frequencies that fall intoindividual categories.

Could use this table to test if two variables are independent.

BUS 735: Business Decision Making and Research Regression Analysis

Page 13: Regression Analysis · 2015-06-03 · Learn how to detect relationships between ordinal and categorical variables. Learn how to estimate a linear relationship between many variables

Relationships Between Two VariablesRegression

Assumptions

CorrelationChi-Squared Test of Independence

Chi-Squared Test for Independence 3/ 23

Used to determine if two categorical variables (eg: nominal)are related.

Example: Suppose a hotel manager surveys guest whoindicate they will not return:

Reason for Not Returning

Reason for Stay Price Location Amenities

Personal/Vacation 56 49 0Business 20 47 27

Data in the table are always frequencies that fall intoindividual categories.

Could use this table to test if two variables are independent.

BUS 735: Business Decision Making and Research Regression Analysis

Page 14: Regression Analysis · 2015-06-03 · Learn how to detect relationships between ordinal and categorical variables. Learn how to estimate a linear relationship between many variables

Relationships Between Two VariablesRegression

Assumptions

CorrelationChi-Squared Test of Independence

Chi-Squared Test for Independence 3/ 23

Used to determine if two categorical variables (eg: nominal)are related.

Example: Suppose a hotel manager surveys guest whoindicate they will not return:

Reason for Not Returning

Reason for Stay Price Location Amenities

Personal/Vacation 56 49 0Business 20 47 27

Data in the table are always frequencies that fall intoindividual categories.

Could use this table to test if two variables are independent.

BUS 735: Business Decision Making and Research Regression Analysis

Page 15: Regression Analysis · 2015-06-03 · Learn how to detect relationships between ordinal and categorical variables. Learn how to estimate a linear relationship between many variables

Relationships Between Two VariablesRegression

Assumptions

CorrelationChi-Squared Test of Independence

Chi-Squared Test for Independence 3/ 23

Used to determine if two categorical variables (eg: nominal)are related.

Example: Suppose a hotel manager surveys guest whoindicate they will not return:

Reason for Not Returning

Reason for Stay Price Location Amenities

Personal/Vacation 56 49 0Business 20 47 27

Data in the table are always frequencies that fall intoindividual categories.

Could use this table to test if two variables are independent.

BUS 735: Business Decision Making and Research Regression Analysis

Page 16: Regression Analysis · 2015-06-03 · Learn how to detect relationships between ordinal and categorical variables. Learn how to estimate a linear relationship between many variables

Relationships Between Two VariablesRegression

Assumptions

CorrelationChi-Squared Test of Independence

Chi-Squared Test for Independence 3/ 23

Used to determine if two categorical variables (eg: nominal)are related.

Example: Suppose a hotel manager surveys guest whoindicate they will not return:

Reason for Not Returning

Reason for Stay Price Location Amenities

Personal/Vacation 56 49 0Business 20 47 27

Data in the table are always frequencies that fall intoindividual categories.

Could use this table to test if two variables are independent.

BUS 735: Business Decision Making and Research Regression Analysis

Page 17: Regression Analysis · 2015-06-03 · Learn how to detect relationships between ordinal and categorical variables. Learn how to estimate a linear relationship between many variables

Relationships Between Two VariablesRegression

Assumptions

CorrelationChi-Squared Test of Independence

Test of independence 4/ 23

Null hypothesis: there is no relationship between the rowvariable and the column variable.

Alternative hypothesis: The two variables are dependent.

Test statistic:

χ2 =∑ (O − E )2

E

O: observed frequency in a cell from the contingency table.

E : expected frequency assuming variables are independent.

Large χ2 values indicate variables are dependent (reject thenull hypothesis).

BUS 735: Business Decision Making and Research Regression Analysis

Page 18: Regression Analysis · 2015-06-03 · Learn how to detect relationships between ordinal and categorical variables. Learn how to estimate a linear relationship between many variables

Relationships Between Two VariablesRegression

Assumptions

CorrelationChi-Squared Test of Independence

Test of independence 4/ 23

Null hypothesis: there is no relationship between the rowvariable and the column variable.

Alternative hypothesis: The two variables are dependent.

Test statistic:

χ2 =∑ (O − E )2

E

O: observed frequency in a cell from the contingency table.

E : expected frequency assuming variables are independent.

Large χ2 values indicate variables are dependent (reject thenull hypothesis).

BUS 735: Business Decision Making and Research Regression Analysis

Page 19: Regression Analysis · 2015-06-03 · Learn how to detect relationships between ordinal and categorical variables. Learn how to estimate a linear relationship between many variables

Relationships Between Two VariablesRegression

Assumptions

CorrelationChi-Squared Test of Independence

Test of independence 4/ 23

Null hypothesis: there is no relationship between the rowvariable and the column variable.

Alternative hypothesis: The two variables are dependent.

Test statistic:

χ2 =∑ (O − E )2

E

O: observed frequency in a cell from the contingency table.

E : expected frequency assuming variables are independent.

Large χ2 values indicate variables are dependent (reject thenull hypothesis).

BUS 735: Business Decision Making and Research Regression Analysis

Page 20: Regression Analysis · 2015-06-03 · Learn how to detect relationships between ordinal and categorical variables. Learn how to estimate a linear relationship between many variables

Relationships Between Two VariablesRegression

Assumptions

CorrelationChi-Squared Test of Independence

Test of independence 4/ 23

Null hypothesis: there is no relationship between the rowvariable and the column variable.

Alternative hypothesis: The two variables are dependent.

Test statistic:

χ2 =∑ (O − E )2

E

O: observed frequency in a cell from the contingency table.

E : expected frequency assuming variables are independent.

Large χ2 values indicate variables are dependent (reject thenull hypothesis).

BUS 735: Business Decision Making and Research Regression Analysis

Page 21: Regression Analysis · 2015-06-03 · Learn how to detect relationships between ordinal and categorical variables. Learn how to estimate a linear relationship between many variables

Relationships Between Two VariablesRegression

Assumptions

CorrelationChi-Squared Test of Independence

Test of independence 4/ 23

Null hypothesis: there is no relationship between the rowvariable and the column variable.

Alternative hypothesis: The two variables are dependent.

Test statistic:

χ2 =∑ (O − E )2

E

O: observed frequency in a cell from the contingency table.

E : expected frequency assuming variables are independent.

Large χ2 values indicate variables are dependent (reject thenull hypothesis).

BUS 735: Business Decision Making and Research Regression Analysis

Page 22: Regression Analysis · 2015-06-03 · Learn how to detect relationships between ordinal and categorical variables. Learn how to estimate a linear relationship between many variables

Relationships Between Two VariablesRegression

Assumptions

CorrelationChi-Squared Test of Independence

Test of independence 4/ 23

Null hypothesis: there is no relationship between the rowvariable and the column variable.

Alternative hypothesis: The two variables are dependent.

Test statistic:

χ2 =∑ (O − E )2

E

O: observed frequency in a cell from the contingency table.

E : expected frequency assuming variables are independent.

Large χ2 values indicate variables are dependent (reject thenull hypothesis).

BUS 735: Business Decision Making and Research Regression Analysis

Page 23: Regression Analysis · 2015-06-03 · Learn how to detect relationships between ordinal and categorical variables. Learn how to estimate a linear relationship between many variables

Relationships Between Two VariablesRegression

Assumptions

CorrelationChi-Squared Test of Independence

Test of independence 4/ 23

Null hypothesis: there is no relationship between the rowvariable and the column variable.

Alternative hypothesis: The two variables are dependent.

Test statistic:

χ2 =∑ (O − E )2

E

O: observed frequency in a cell from the contingency table.

E : expected frequency assuming variables are independent.

Large χ2 values indicate variables are dependent (reject thenull hypothesis).

BUS 735: Business Decision Making and Research Regression Analysis

Page 24: Regression Analysis · 2015-06-03 · Learn how to detect relationships between ordinal and categorical variables. Learn how to estimate a linear relationship between many variables

Relationships Between Two VariablesRegression

Assumptions

Single Variable RegressionMultiple RegressionVariance Decomposition

Regression 5/ 23

Regression line: equation of the line that describes the linearrelationship between variable x and variable y .

Need to assume that independent variables influencedependent variables.

x : independent or explanatory variable.y : dependent variable.Variable x can influence the value for variable y , but not viceversa.

Example: How does smoking affect lung capacity?

Example: How does advertising affect sales?

BUS 735: Business Decision Making and Research Regression Analysis

Page 25: Regression Analysis · 2015-06-03 · Learn how to detect relationships between ordinal and categorical variables. Learn how to estimate a linear relationship between many variables

Relationships Between Two VariablesRegression

Assumptions

Single Variable RegressionMultiple RegressionVariance Decomposition

Regression 5/ 23

Regression line: equation of the line that describes the linearrelationship between variable x and variable y .

Need to assume that independent variables influencedependent variables.

x : independent or explanatory variable.y : dependent variable.Variable x can influence the value for variable y , but not viceversa.

Example: How does smoking affect lung capacity?

Example: How does advertising affect sales?

BUS 735: Business Decision Making and Research Regression Analysis

Page 26: Regression Analysis · 2015-06-03 · Learn how to detect relationships between ordinal and categorical variables. Learn how to estimate a linear relationship between many variables

Relationships Between Two VariablesRegression

Assumptions

Single Variable RegressionMultiple RegressionVariance Decomposition

Regression 5/ 23

Regression line: equation of the line that describes the linearrelationship between variable x and variable y .

Need to assume that independent variables influencedependent variables.

x : independent or explanatory variable.y : dependent variable.Variable x can influence the value for variable y , but not viceversa.

Example: How does smoking affect lung capacity?

Example: How does advertising affect sales?

BUS 735: Business Decision Making and Research Regression Analysis

Page 27: Regression Analysis · 2015-06-03 · Learn how to detect relationships between ordinal and categorical variables. Learn how to estimate a linear relationship between many variables

Relationships Between Two VariablesRegression

Assumptions

Single Variable RegressionMultiple RegressionVariance Decomposition

Regression 5/ 23

Regression line: equation of the line that describes the linearrelationship between variable x and variable y .

Need to assume that independent variables influencedependent variables.

x : independent or explanatory variable.y : dependent variable.Variable x can influence the value for variable y , but not viceversa.

Example: How does smoking affect lung capacity?

Example: How does advertising affect sales?

BUS 735: Business Decision Making and Research Regression Analysis

Page 28: Regression Analysis · 2015-06-03 · Learn how to detect relationships between ordinal and categorical variables. Learn how to estimate a linear relationship between many variables

Relationships Between Two VariablesRegression

Assumptions

Single Variable RegressionMultiple RegressionVariance Decomposition

Regression 5/ 23

Regression line: equation of the line that describes the linearrelationship between variable x and variable y .

Need to assume that independent variables influencedependent variables.

x : independent or explanatory variable.y : dependent variable.Variable x can influence the value for variable y , but not viceversa.

Example: How does smoking affect lung capacity?

Example: How does advertising affect sales?

BUS 735: Business Decision Making and Research Regression Analysis

Page 29: Regression Analysis · 2015-06-03 · Learn how to detect relationships between ordinal and categorical variables. Learn how to estimate a linear relationship between many variables

Relationships Between Two VariablesRegression

Assumptions

Single Variable RegressionMultiple RegressionVariance Decomposition

Regression 5/ 23

Regression line: equation of the line that describes the linearrelationship between variable x and variable y .

Need to assume that independent variables influencedependent variables.

x : independent or explanatory variable.y : dependent variable.Variable x can influence the value for variable y , but not viceversa.

Example: How does smoking affect lung capacity?

Example: How does advertising affect sales?

BUS 735: Business Decision Making and Research Regression Analysis

Page 30: Regression Analysis · 2015-06-03 · Learn how to detect relationships between ordinal and categorical variables. Learn how to estimate a linear relationship between many variables

Relationships Between Two VariablesRegression

Assumptions

Single Variable RegressionMultiple RegressionVariance Decomposition

Regression 5/ 23

Regression line: equation of the line that describes the linearrelationship between variable x and variable y .

Need to assume that independent variables influencedependent variables.

x : independent or explanatory variable.y : dependent variable.Variable x can influence the value for variable y , but not viceversa.

Example: How does smoking affect lung capacity?

Example: How does advertising affect sales?

BUS 735: Business Decision Making and Research Regression Analysis

Page 31: Regression Analysis · 2015-06-03 · Learn how to detect relationships between ordinal and categorical variables. Learn how to estimate a linear relationship between many variables

Relationships Between Two VariablesRegression

Assumptions

Single Variable RegressionMultiple RegressionVariance Decomposition

Regression line 6/ 23

Population regression line:

yi = β0 + β1xi + εi

The actual coefficients β0 and β1 describing the relationshipbetween x and y are unknown.

Use sample data to come up with an estimate of theregression line:

yi = b0 + b1xi + ei

Since x and y are not perfectly correlated, still need to havean error term.

BUS 735: Business Decision Making and Research Regression Analysis

Page 32: Regression Analysis · 2015-06-03 · Learn how to detect relationships between ordinal and categorical variables. Learn how to estimate a linear relationship between many variables

Relationships Between Two VariablesRegression

Assumptions

Single Variable RegressionMultiple RegressionVariance Decomposition

Regression line 6/ 23

Population regression line:

yi = β0 + β1xi + εi

The actual coefficients β0 and β1 describing the relationshipbetween x and y are unknown.

Use sample data to come up with an estimate of theregression line:

yi = b0 + b1xi + ei

Since x and y are not perfectly correlated, still need to havean error term.

BUS 735: Business Decision Making and Research Regression Analysis

Page 33: Regression Analysis · 2015-06-03 · Learn how to detect relationships between ordinal and categorical variables. Learn how to estimate a linear relationship between many variables

Relationships Between Two VariablesRegression

Assumptions

Single Variable RegressionMultiple RegressionVariance Decomposition

Regression line 6/ 23

Population regression line:

yi = β0 + β1xi + εi

The actual coefficients β0 and β1 describing the relationshipbetween x and y are unknown.

Use sample data to come up with an estimate of theregression line:

yi = b0 + b1xi + ei

Since x and y are not perfectly correlated, still need to havean error term.

BUS 735: Business Decision Making and Research Regression Analysis

Page 34: Regression Analysis · 2015-06-03 · Learn how to detect relationships between ordinal and categorical variables. Learn how to estimate a linear relationship between many variables

Relationships Between Two VariablesRegression

Assumptions

Single Variable RegressionMultiple RegressionVariance Decomposition

Regression line 6/ 23

Population regression line:

yi = β0 + β1xi + εi

The actual coefficients β0 and β1 describing the relationshipbetween x and y are unknown.

Use sample data to come up with an estimate of theregression line:

yi = b0 + b1xi + ei

Since x and y are not perfectly correlated, still need to havean error term.

BUS 735: Business Decision Making and Research Regression Analysis

Page 35: Regression Analysis · 2015-06-03 · Learn how to detect relationships between ordinal and categorical variables. Learn how to estimate a linear relationship between many variables

Relationships Between Two VariablesRegression

Assumptions

Single Variable RegressionMultiple RegressionVariance Decomposition

Regression line 6/ 23

Population regression line:

yi = β0 + β1xi + εi

The actual coefficients β0 and β1 describing the relationshipbetween x and y are unknown.

Use sample data to come up with an estimate of theregression line:

yi = b0 + b1xi + ei

Since x and y are not perfectly correlated, still need to havean error term.

BUS 735: Business Decision Making and Research Regression Analysis

Page 36: Regression Analysis · 2015-06-03 · Learn how to detect relationships between ordinal and categorical variables. Learn how to estimate a linear relationship between many variables

Relationships Between Two VariablesRegression

Assumptions

Single Variable RegressionMultiple RegressionVariance Decomposition

Regression line 6/ 23

Population regression line:

yi = β0 + β1xi + εi

The actual coefficients β0 and β1 describing the relationshipbetween x and y are unknown.

Use sample data to come up with an estimate of theregression line:

yi = b0 + b1xi + ei

Since x and y are not perfectly correlated, still need to havean error term.

BUS 735: Business Decision Making and Research Regression Analysis

Page 37: Regression Analysis · 2015-06-03 · Learn how to detect relationships between ordinal and categorical variables. Learn how to estimate a linear relationship between many variables

Relationships Between Two VariablesRegression

Assumptions

Single Variable RegressionMultiple RegressionVariance Decomposition

Predicted values and residuals 7/ 23

Given a value for xi , can come up with a predicted value foryi , denoted yi .

yi = b0 + b1xi

This is not likely be the actual value for yi .

Residual is the difference in the sample between the actualvalue of yi and the predicted value, y .

ei = yi − y = yi − b0 − b1xi

BUS 735: Business Decision Making and Research Regression Analysis

Page 38: Regression Analysis · 2015-06-03 · Learn how to detect relationships between ordinal and categorical variables. Learn how to estimate a linear relationship between many variables

Relationships Between Two VariablesRegression

Assumptions

Single Variable RegressionMultiple RegressionVariance Decomposition

Predicted values and residuals 7/ 23

Given a value for xi , can come up with a predicted value foryi , denoted yi .

yi = b0 + b1xi

This is not likely be the actual value for yi .

Residual is the difference in the sample between the actualvalue of yi and the predicted value, y .

ei = yi − y = yi − b0 − b1xi

BUS 735: Business Decision Making and Research Regression Analysis

Page 39: Regression Analysis · 2015-06-03 · Learn how to detect relationships between ordinal and categorical variables. Learn how to estimate a linear relationship between many variables

Relationships Between Two VariablesRegression

Assumptions

Single Variable RegressionMultiple RegressionVariance Decomposition

Predicted values and residuals 7/ 23

Given a value for xi , can come up with a predicted value foryi , denoted yi .

yi = b0 + b1xi

This is not likely be the actual value for yi .

Residual is the difference in the sample between the actualvalue of yi and the predicted value, y .

ei = yi − y = yi − b0 − b1xi

BUS 735: Business Decision Making and Research Regression Analysis

Page 40: Regression Analysis · 2015-06-03 · Learn how to detect relationships between ordinal and categorical variables. Learn how to estimate a linear relationship between many variables

Relationships Between Two VariablesRegression

Assumptions

Single Variable RegressionMultiple RegressionVariance Decomposition

Predicted values and residuals 7/ 23

Given a value for xi , can come up with a predicted value foryi , denoted yi .

yi = b0 + b1xi

This is not likely be the actual value for yi .

Residual is the difference in the sample between the actualvalue of yi and the predicted value, y .

ei = yi − y = yi − b0 − b1xi

BUS 735: Business Decision Making and Research Regression Analysis

Page 41: Regression Analysis · 2015-06-03 · Learn how to detect relationships between ordinal and categorical variables. Learn how to estimate a linear relationship between many variables

Relationships Between Two VariablesRegression

Assumptions

Single Variable RegressionMultiple RegressionVariance Decomposition

Predicted values and residuals 7/ 23

Given a value for xi , can come up with a predicted value foryi , denoted yi .

yi = b0 + b1xi

This is not likely be the actual value for yi .

Residual is the difference in the sample between the actualvalue of yi and the predicted value, y .

ei = yi − y = yi − b0 − b1xi

BUS 735: Business Decision Making and Research Regression Analysis

Page 42: Regression Analysis · 2015-06-03 · Learn how to detect relationships between ordinal and categorical variables. Learn how to estimate a linear relationship between many variables

Relationships Between Two VariablesRegression

Assumptions

Single Variable RegressionMultiple RegressionVariance Decomposition

Multiple Regression 8/ 23

Multiple regression line (population):

yi = β0 + β1x1,i + β2x2 + ...+ βk−1xk−1 + εi

Multiple regression line (sample):

yi = b0 + b1x1,i + b2x2 + ...+ bkxk + ei

k : number of parameters (coefficients) you are estimating.εi : error term, since linear relationship between the x variablesand y are not perfect.ei : residual = the difference between the predicted value y andthe actual value yi .

BUS 735: Business Decision Making and Research Regression Analysis

Page 43: Regression Analysis · 2015-06-03 · Learn how to detect relationships between ordinal and categorical variables. Learn how to estimate a linear relationship between many variables

Relationships Between Two VariablesRegression

Assumptions

Single Variable RegressionMultiple RegressionVariance Decomposition

Multiple Regression 8/ 23

Multiple regression line (population):

yi = β0 + β1x1,i + β2x2 + ...+ βk−1xk−1 + εi

Multiple regression line (sample):

yi = b0 + b1x1,i + b2x2 + ...+ bkxk + ei

k : number of parameters (coefficients) you are estimating.εi : error term, since linear relationship between the x variablesand y are not perfect.ei : residual = the difference between the predicted value y andthe actual value yi .

BUS 735: Business Decision Making and Research Regression Analysis

Page 44: Regression Analysis · 2015-06-03 · Learn how to detect relationships between ordinal and categorical variables. Learn how to estimate a linear relationship between many variables

Relationships Between Two VariablesRegression

Assumptions

Single Variable RegressionMultiple RegressionVariance Decomposition

Multiple Regression 8/ 23

Multiple regression line (population):

yi = β0 + β1x1,i + β2x2 + ...+ βk−1xk−1 + εi

Multiple regression line (sample):

yi = b0 + b1x1,i + b2x2 + ...+ bkxk + ei

k : number of parameters (coefficients) you are estimating.εi : error term, since linear relationship between the x variablesand y are not perfect.ei : residual = the difference between the predicted value y andthe actual value yi .

BUS 735: Business Decision Making and Research Regression Analysis

Page 45: Regression Analysis · 2015-06-03 · Learn how to detect relationships between ordinal and categorical variables. Learn how to estimate a linear relationship between many variables

Relationships Between Two VariablesRegression

Assumptions

Single Variable RegressionMultiple RegressionVariance Decomposition

Multiple Regression 8/ 23

Multiple regression line (population):

yi = β0 + β1x1,i + β2x2 + ...+ βk−1xk−1 + εi

Multiple regression line (sample):

yi = b0 + b1x1,i + b2x2 + ...+ bkxk + ei

k : number of parameters (coefficients) you are estimating.εi : error term, since linear relationship between the x variablesand y are not perfect.ei : residual = the difference between the predicted value y andthe actual value yi .

BUS 735: Business Decision Making and Research Regression Analysis

Page 46: Regression Analysis · 2015-06-03 · Learn how to detect relationships between ordinal and categorical variables. Learn how to estimate a linear relationship between many variables

Relationships Between Two VariablesRegression

Assumptions

Single Variable RegressionMultiple RegressionVariance Decomposition

Multiple Regression 8/ 23

Multiple regression line (population):

yi = β0 + β1x1,i + β2x2 + ...+ βk−1xk−1 + εi

Multiple regression line (sample):

yi = b0 + b1x1,i + b2x2 + ...+ bkxk + ei

k : number of parameters (coefficients) you are estimating.εi : error term, since linear relationship between the x variablesand y are not perfect.ei : residual = the difference between the predicted value y andthe actual value yi .

BUS 735: Business Decision Making and Research Regression Analysis

Page 47: Regression Analysis · 2015-06-03 · Learn how to detect relationships between ordinal and categorical variables. Learn how to estimate a linear relationship between many variables

Relationships Between Two VariablesRegression

Assumptions

Single Variable RegressionMultiple RegressionVariance Decomposition

Multiple Regression 8/ 23

Multiple regression line (population):

yi = β0 + β1x1,i + β2x2 + ...+ βk−1xk−1 + εi

Multiple regression line (sample):

yi = b0 + b1x1,i + b2x2 + ...+ bkxk + ei

k : number of parameters (coefficients) you are estimating.εi : error term, since linear relationship between the x variablesand y are not perfect.ei : residual = the difference between the predicted value y andthe actual value yi .

BUS 735: Business Decision Making and Research Regression Analysis

Page 48: Regression Analysis · 2015-06-03 · Learn how to detect relationships between ordinal and categorical variables. Learn how to estimate a linear relationship between many variables

Relationships Between Two VariablesRegression

Assumptions

Single Variable RegressionMultiple RegressionVariance Decomposition

Multiple Regression 8/ 23

Multiple regression line (population):

yi = β0 + β1x1,i + β2x2 + ...+ βk−1xk−1 + εi

Multiple regression line (sample):

yi = b0 + b1x1,i + b2x2 + ...+ bkxk + ei

k : number of parameters (coefficients) you are estimating.εi : error term, since linear relationship between the x variablesand y are not perfect.ei : residual = the difference between the predicted value y andthe actual value yi .

BUS 735: Business Decision Making and Research Regression Analysis

Page 49: Regression Analysis · 2015-06-03 · Learn how to detect relationships between ordinal and categorical variables. Learn how to estimate a linear relationship between many variables

Relationships Between Two VariablesRegression

Assumptions

Single Variable RegressionMultiple RegressionVariance Decomposition

Least Squares Estimate 9/ 23

How should we obtain the “best fitting line”.

Ordinary least squares (OLS) method.

Choose sample estimates for the regression coefficients thatminimizes:

n∑i=0

(yi − yi )2

BUS 735: Business Decision Making and Research Regression Analysis

Page 50: Regression Analysis · 2015-06-03 · Learn how to detect relationships between ordinal and categorical variables. Learn how to estimate a linear relationship between many variables

Relationships Between Two VariablesRegression

Assumptions

Single Variable RegressionMultiple RegressionVariance Decomposition

Least Squares Estimate 9/ 23

How should we obtain the “best fitting line”.

Ordinary least squares (OLS) method.

Choose sample estimates for the regression coefficients thatminimizes:

n∑i=0

(yi − yi )2

BUS 735: Business Decision Making and Research Regression Analysis

Page 51: Regression Analysis · 2015-06-03 · Learn how to detect relationships between ordinal and categorical variables. Learn how to estimate a linear relationship between many variables

Relationships Between Two VariablesRegression

Assumptions

Single Variable RegressionMultiple RegressionVariance Decomposition

Least Squares Estimate 9/ 23

How should we obtain the “best fitting line”.

Ordinary least squares (OLS) method.

Choose sample estimates for the regression coefficients thatminimizes:

n∑i=0

(yi − yi )2

BUS 735: Business Decision Making and Research Regression Analysis

Page 52: Regression Analysis · 2015-06-03 · Learn how to detect relationships between ordinal and categorical variables. Learn how to estimate a linear relationship between many variables

Relationships Between Two VariablesRegression

Assumptions

Single Variable RegressionMultiple RegressionVariance Decomposition

Least Squares Estimate 9/ 23

How should we obtain the “best fitting line”.

Ordinary least squares (OLS) method.

Choose sample estimates for the regression coefficients thatminimizes:

n∑i=0

(yi − yi )2

BUS 735: Business Decision Making and Research Regression Analysis

Page 53: Regression Analysis · 2015-06-03 · Learn how to detect relationships between ordinal and categorical variables. Learn how to estimate a linear relationship between many variables

Relationships Between Two VariablesRegression

Assumptions

Single Variable RegressionMultiple RegressionVariance Decomposition

Interpreting the slope 10/ 23

Interpreting the slope, β: amount the y is predicted toincrease when increasing x by one unit.

When β < 0 there is a negative linear relationship.

When β > 0 there is a positive linear relationship.

When β = 0 there is no linear relationship between x and y .

SPSS reports sample estimates for coefficients, along with...

Estimates of the standard errors.T-test statistics for H0 : β = 0.P-values of the T-tests.Confidence intervals for the coefficients.

BUS 735: Business Decision Making and Research Regression Analysis

Page 54: Regression Analysis · 2015-06-03 · Learn how to detect relationships between ordinal and categorical variables. Learn how to estimate a linear relationship between many variables

Relationships Between Two VariablesRegression

Assumptions

Single Variable RegressionMultiple RegressionVariance Decomposition

Interpreting the slope 10/ 23

Interpreting the slope, β: amount the y is predicted toincrease when increasing x by one unit.

When β < 0 there is a negative linear relationship.

When β > 0 there is a positive linear relationship.

When β = 0 there is no linear relationship between x and y .

SPSS reports sample estimates for coefficients, along with...

Estimates of the standard errors.T-test statistics for H0 : β = 0.P-values of the T-tests.Confidence intervals for the coefficients.

BUS 735: Business Decision Making and Research Regression Analysis

Page 55: Regression Analysis · 2015-06-03 · Learn how to detect relationships between ordinal and categorical variables. Learn how to estimate a linear relationship between many variables

Relationships Between Two VariablesRegression

Assumptions

Single Variable RegressionMultiple RegressionVariance Decomposition

Interpreting the slope 10/ 23

Interpreting the slope, β: amount the y is predicted toincrease when increasing x by one unit.

When β < 0 there is a negative linear relationship.

When β > 0 there is a positive linear relationship.

When β = 0 there is no linear relationship between x and y .

SPSS reports sample estimates for coefficients, along with...

Estimates of the standard errors.T-test statistics for H0 : β = 0.P-values of the T-tests.Confidence intervals for the coefficients.

BUS 735: Business Decision Making and Research Regression Analysis

Page 56: Regression Analysis · 2015-06-03 · Learn how to detect relationships between ordinal and categorical variables. Learn how to estimate a linear relationship between many variables

Relationships Between Two VariablesRegression

Assumptions

Single Variable RegressionMultiple RegressionVariance Decomposition

Interpreting the slope 10/ 23

Interpreting the slope, β: amount the y is predicted toincrease when increasing x by one unit.

When β < 0 there is a negative linear relationship.

When β > 0 there is a positive linear relationship.

When β = 0 there is no linear relationship between x and y .

SPSS reports sample estimates for coefficients, along with...

Estimates of the standard errors.T-test statistics for H0 : β = 0.P-values of the T-tests.Confidence intervals for the coefficients.

BUS 735: Business Decision Making and Research Regression Analysis

Page 57: Regression Analysis · 2015-06-03 · Learn how to detect relationships between ordinal and categorical variables. Learn how to estimate a linear relationship between many variables

Relationships Between Two VariablesRegression

Assumptions

Single Variable RegressionMultiple RegressionVariance Decomposition

Interpreting the slope 10/ 23

Interpreting the slope, β: amount the y is predicted toincrease when increasing x by one unit.

When β < 0 there is a negative linear relationship.

When β > 0 there is a positive linear relationship.

When β = 0 there is no linear relationship between x and y .

SPSS reports sample estimates for coefficients, along with...

Estimates of the standard errors.T-test statistics for H0 : β = 0.P-values of the T-tests.Confidence intervals for the coefficients.

BUS 735: Business Decision Making and Research Regression Analysis

Page 58: Regression Analysis · 2015-06-03 · Learn how to detect relationships between ordinal and categorical variables. Learn how to estimate a linear relationship between many variables

Relationships Between Two VariablesRegression

Assumptions

Single Variable RegressionMultiple RegressionVariance Decomposition

Interpreting the slope 10/ 23

Interpreting the slope, β: amount the y is predicted toincrease when increasing x by one unit.

When β < 0 there is a negative linear relationship.

When β > 0 there is a positive linear relationship.

When β = 0 there is no linear relationship between x and y .

SPSS reports sample estimates for coefficients, along with...

Estimates of the standard errors.T-test statistics for H0 : β = 0.P-values of the T-tests.Confidence intervals for the coefficients.

BUS 735: Business Decision Making and Research Regression Analysis

Page 59: Regression Analysis · 2015-06-03 · Learn how to detect relationships between ordinal and categorical variables. Learn how to estimate a linear relationship between many variables

Relationships Between Two VariablesRegression

Assumptions

Single Variable RegressionMultiple RegressionVariance Decomposition

Interpreting the slope 10/ 23

Interpreting the slope, β: amount the y is predicted toincrease when increasing x by one unit.

When β < 0 there is a negative linear relationship.

When β > 0 there is a positive linear relationship.

When β = 0 there is no linear relationship between x and y .

SPSS reports sample estimates for coefficients, along with...

Estimates of the standard errors.T-test statistics for H0 : β = 0.P-values of the T-tests.Confidence intervals for the coefficients.

BUS 735: Business Decision Making and Research Regression Analysis

Page 60: Regression Analysis · 2015-06-03 · Learn how to detect relationships between ordinal and categorical variables. Learn how to estimate a linear relationship between many variables

Relationships Between Two VariablesRegression

Assumptions

Single Variable RegressionMultiple RegressionVariance Decomposition

Interpreting the slope 10/ 23

Interpreting the slope, β: amount the y is predicted toincrease when increasing x by one unit.

When β < 0 there is a negative linear relationship.

When β > 0 there is a positive linear relationship.

When β = 0 there is no linear relationship between x and y .

SPSS reports sample estimates for coefficients, along with...

Estimates of the standard errors.T-test statistics for H0 : β = 0.P-values of the T-tests.Confidence intervals for the coefficients.

BUS 735: Business Decision Making and Research Regression Analysis

Page 61: Regression Analysis · 2015-06-03 · Learn how to detect relationships between ordinal and categorical variables. Learn how to estimate a linear relationship between many variables

Relationships Between Two VariablesRegression

Assumptions

Single Variable RegressionMultiple RegressionVariance Decomposition

Interpreting the slope 10/ 23

Interpreting the slope, β: amount the y is predicted toincrease when increasing x by one unit.

When β < 0 there is a negative linear relationship.

When β > 0 there is a positive linear relationship.

When β = 0 there is no linear relationship between x and y .

SPSS reports sample estimates for coefficients, along with...

Estimates of the standard errors.T-test statistics for H0 : β = 0.P-values of the T-tests.Confidence intervals for the coefficients.

BUS 735: Business Decision Making and Research Regression Analysis

Page 62: Regression Analysis · 2015-06-03 · Learn how to detect relationships between ordinal and categorical variables. Learn how to estimate a linear relationship between many variables

Relationships Between Two VariablesRegression

Assumptions

Single Variable RegressionMultiple RegressionVariance Decomposition

Example: Public Expenditure 11/ 23

Data from 1960 about public expenditures per capita, andvariables that may influence it.

In SPSS, choose Analyze menu and select Regression andLinear.

Select EX (Expenditure per capita) as your dependentvariable. This is the variable your are interested in explaining.

Select your independent (aka explanatory) variables. Theseare the variables that you think can explain the dependentvariable. I suggest you select these:

ECAB: Economic AbilityMET: MetropolitanGROW: Growth rate of populationWEST: Western state = 1.

BUS 735: Business Decision Making and Research Regression Analysis

Page 63: Regression Analysis · 2015-06-03 · Learn how to detect relationships between ordinal and categorical variables. Learn how to estimate a linear relationship between many variables

Relationships Between Two VariablesRegression

Assumptions

Single Variable RegressionMultiple RegressionVariance Decomposition

Example: Public Expenditure 11/ 23

Data from 1960 about public expenditures per capita, andvariables that may influence it.

In SPSS, choose Analyze menu and select Regression andLinear.

Select EX (Expenditure per capita) as your dependentvariable. This is the variable your are interested in explaining.

Select your independent (aka explanatory) variables. Theseare the variables that you think can explain the dependentvariable. I suggest you select these:

ECAB: Economic AbilityMET: MetropolitanGROW: Growth rate of populationWEST: Western state = 1.

BUS 735: Business Decision Making and Research Regression Analysis

Page 64: Regression Analysis · 2015-06-03 · Learn how to detect relationships between ordinal and categorical variables. Learn how to estimate a linear relationship between many variables

Relationships Between Two VariablesRegression

Assumptions

Single Variable RegressionMultiple RegressionVariance Decomposition

Example: Public Expenditure 11/ 23

Data from 1960 about public expenditures per capita, andvariables that may influence it.

In SPSS, choose Analyze menu and select Regression andLinear.

Select EX (Expenditure per capita) as your dependentvariable. This is the variable your are interested in explaining.

Select your independent (aka explanatory) variables. Theseare the variables that you think can explain the dependentvariable. I suggest you select these:

ECAB: Economic AbilityMET: MetropolitanGROW: Growth rate of populationWEST: Western state = 1.

BUS 735: Business Decision Making and Research Regression Analysis

Page 65: Regression Analysis · 2015-06-03 · Learn how to detect relationships between ordinal and categorical variables. Learn how to estimate a linear relationship between many variables

Relationships Between Two VariablesRegression

Assumptions

Single Variable RegressionMultiple RegressionVariance Decomposition

Example: Public Expenditure 11/ 23

Data from 1960 about public expenditures per capita, andvariables that may influence it.

In SPSS, choose Analyze menu and select Regression andLinear.

Select EX (Expenditure per capita) as your dependentvariable. This is the variable your are interested in explaining.

Select your independent (aka explanatory) variables. Theseare the variables that you think can explain the dependentvariable. I suggest you select these:

ECAB: Economic AbilityMET: MetropolitanGROW: Growth rate of populationWEST: Western state = 1.

BUS 735: Business Decision Making and Research Regression Analysis

Page 66: Regression Analysis · 2015-06-03 · Learn how to detect relationships between ordinal and categorical variables. Learn how to estimate a linear relationship between many variables

Relationships Between Two VariablesRegression

Assumptions

Single Variable RegressionMultiple RegressionVariance Decomposition

Example: Public Expenditure 11/ 23

Data from 1960 about public expenditures per capita, andvariables that may influence it.

In SPSS, choose Analyze menu and select Regression andLinear.

Select EX (Expenditure per capita) as your dependentvariable. This is the variable your are interested in explaining.

Select your independent (aka explanatory) variables. Theseare the variables that you think can explain the dependentvariable. I suggest you select these:

ECAB: Economic AbilityMET: MetropolitanGROW: Growth rate of populationWEST: Western state = 1.

BUS 735: Business Decision Making and Research Regression Analysis

Page 67: Regression Analysis · 2015-06-03 · Learn how to detect relationships between ordinal and categorical variables. Learn how to estimate a linear relationship between many variables

Relationships Between Two VariablesRegression

Assumptions

Single Variable RegressionMultiple RegressionVariance Decomposition

Example: Public Expenditure 12/ 23

If the percentage of the population living in metropolitanareas in expected to increase by 1%, what change should weexpect in public expenditure?

Is this change statistically significantly different from zero?

Accounting for economic ability, metropolitan population, andpopulation growth, how much more to Western states spendon public expenditure per capita?

BUS 735: Business Decision Making and Research Regression Analysis

Page 68: Regression Analysis · 2015-06-03 · Learn how to detect relationships between ordinal and categorical variables. Learn how to estimate a linear relationship between many variables

Relationships Between Two VariablesRegression

Assumptions

Single Variable RegressionMultiple RegressionVariance Decomposition

Example: Public Expenditure 12/ 23

If the percentage of the population living in metropolitanareas in expected to increase by 1%, what change should weexpect in public expenditure?

Is this change statistically significantly different from zero?

Accounting for economic ability, metropolitan population, andpopulation growth, how much more to Western states spendon public expenditure per capita?

BUS 735: Business Decision Making and Research Regression Analysis

Page 69: Regression Analysis · 2015-06-03 · Learn how to detect relationships between ordinal and categorical variables. Learn how to estimate a linear relationship between many variables

Relationships Between Two VariablesRegression

Assumptions

Single Variable RegressionMultiple RegressionVariance Decomposition

Example: Public Expenditure 12/ 23

If the percentage of the population living in metropolitanareas in expected to increase by 1%, what change should weexpect in public expenditure?

Is this change statistically significantly different from zero?

Accounting for economic ability, metropolitan population, andpopulation growth, how much more to Western states spendon public expenditure per capita?

BUS 735: Business Decision Making and Research Regression Analysis

Page 70: Regression Analysis · 2015-06-03 · Learn how to detect relationships between ordinal and categorical variables. Learn how to estimate a linear relationship between many variables

Relationships Between Two VariablesRegression

Assumptions

Single Variable RegressionMultiple RegressionVariance Decomposition

Sum of Squares Measures of Variation 13/ 23

Sum of Squares Regression (SSR): measure of the amountof variability in the dependent (Y) variable that is explainedby the independent variables (X’s).

SSR =n∑

i=1

(yi − y)2

Sum of Squares Error (SSE): measure of the unexplainedvariability in the dependent variable.

SSE =n∑

i=1

(yi − yi )2

BUS 735: Business Decision Making and Research Regression Analysis

Page 71: Regression Analysis · 2015-06-03 · Learn how to detect relationships between ordinal and categorical variables. Learn how to estimate a linear relationship between many variables

Relationships Between Two VariablesRegression

Assumptions

Single Variable RegressionMultiple RegressionVariance Decomposition

Sum of Squares Measures of Variation 13/ 23

Sum of Squares Regression (SSR): measure of the amountof variability in the dependent (Y) variable that is explainedby the independent variables (X’s).

SSR =n∑

i=1

(yi − y)2

Sum of Squares Error (SSE): measure of the unexplainedvariability in the dependent variable.

SSE =n∑

i=1

(yi − yi )2

BUS 735: Business Decision Making and Research Regression Analysis

Page 72: Regression Analysis · 2015-06-03 · Learn how to detect relationships between ordinal and categorical variables. Learn how to estimate a linear relationship between many variables

Relationships Between Two VariablesRegression

Assumptions

Single Variable RegressionMultiple RegressionVariance Decomposition

Sum of Squares Measures of Variation 13/ 23

Sum of Squares Regression (SSR): measure of the amountof variability in the dependent (Y) variable that is explainedby the independent variables (X’s).

SSR =n∑

i=1

(yi − y)2

Sum of Squares Error (SSE): measure of the unexplainedvariability in the dependent variable.

SSE =n∑

i=1

(yi − yi )2

BUS 735: Business Decision Making and Research Regression Analysis

Page 73: Regression Analysis · 2015-06-03 · Learn how to detect relationships between ordinal and categorical variables. Learn how to estimate a linear relationship between many variables

Relationships Between Two VariablesRegression

Assumptions

Single Variable RegressionMultiple RegressionVariance Decomposition

Sum of Squares Measures of Variation 13/ 23

Sum of Squares Regression (SSR): measure of the amountof variability in the dependent (Y) variable that is explainedby the independent variables (X’s).

SSR =n∑

i=1

(yi − y)2

Sum of Squares Error (SSE): measure of the unexplainedvariability in the dependent variable.

SSE =n∑

i=1

(yi − yi )2

BUS 735: Business Decision Making and Research Regression Analysis

Page 74: Regression Analysis · 2015-06-03 · Learn how to detect relationships between ordinal and categorical variables. Learn how to estimate a linear relationship between many variables

Relationships Between Two VariablesRegression

Assumptions

Single Variable RegressionMultiple RegressionVariance Decomposition

Sum of Squares Measures of Variation 14/ 23

Sum of Squares Total (SST): measure of the totalvariability in the dependent variable.

SST =n∑

i=1

(yi − y)2

SST = SSR + SSE.

BUS 735: Business Decision Making and Research Regression Analysis

Page 75: Regression Analysis · 2015-06-03 · Learn how to detect relationships between ordinal and categorical variables. Learn how to estimate a linear relationship between many variables

Relationships Between Two VariablesRegression

Assumptions

Single Variable RegressionMultiple RegressionVariance Decomposition

Sum of Squares Measures of Variation 14/ 23

Sum of Squares Total (SST): measure of the totalvariability in the dependent variable.

SST =n∑

i=1

(yi − y)2

SST = SSR + SSE.

BUS 735: Business Decision Making and Research Regression Analysis

Page 76: Regression Analysis · 2015-06-03 · Learn how to detect relationships between ordinal and categorical variables. Learn how to estimate a linear relationship between many variables

Relationships Between Two VariablesRegression

Assumptions

Single Variable RegressionMultiple RegressionVariance Decomposition

Sum of Squares Measures of Variation 14/ 23

Sum of Squares Total (SST): measure of the totalvariability in the dependent variable.

SST =n∑

i=1

(yi − y)2

SST = SSR + SSE.

BUS 735: Business Decision Making and Research Regression Analysis

Page 77: Regression Analysis · 2015-06-03 · Learn how to detect relationships between ordinal and categorical variables. Learn how to estimate a linear relationship between many variables

Relationships Between Two VariablesRegression

Assumptions

Single Variable RegressionMultiple RegressionVariance Decomposition

Coefficient of determination 15/ 23

The coefficient of determination is the percentage ofvariability in y that is explained by x .

R2 =SSR

SST

R2 will always be between 0 and 1. The closer R2 is to 1, thebetter x is able to explain y .

The more variables you add to the regression, the higher R2

will be.

BUS 735: Business Decision Making and Research Regression Analysis

Page 78: Regression Analysis · 2015-06-03 · Learn how to detect relationships between ordinal and categorical variables. Learn how to estimate a linear relationship between many variables

Relationships Between Two VariablesRegression

Assumptions

Single Variable RegressionMultiple RegressionVariance Decomposition

Coefficient of determination 15/ 23

The coefficient of determination is the percentage ofvariability in y that is explained by x .

R2 =SSR

SST

R2 will always be between 0 and 1. The closer R2 is to 1, thebetter x is able to explain y .

The more variables you add to the regression, the higher R2

will be.

BUS 735: Business Decision Making and Research Regression Analysis

Page 79: Regression Analysis · 2015-06-03 · Learn how to detect relationships between ordinal and categorical variables. Learn how to estimate a linear relationship between many variables

Relationships Between Two VariablesRegression

Assumptions

Single Variable RegressionMultiple RegressionVariance Decomposition

Coefficient of determination 15/ 23

The coefficient of determination is the percentage ofvariability in y that is explained by x .

R2 =SSR

SST

R2 will always be between 0 and 1. The closer R2 is to 1, thebetter x is able to explain y .

The more variables you add to the regression, the higher R2

will be.

BUS 735: Business Decision Making and Research Regression Analysis

Page 80: Regression Analysis · 2015-06-03 · Learn how to detect relationships between ordinal and categorical variables. Learn how to estimate a linear relationship between many variables

Relationships Between Two VariablesRegression

Assumptions

Single Variable RegressionMultiple RegressionVariance Decomposition

Coefficient of determination 15/ 23

The coefficient of determination is the percentage ofvariability in y that is explained by x .

R2 =SSR

SST

R2 will always be between 0 and 1. The closer R2 is to 1, thebetter x is able to explain y .

The more variables you add to the regression, the higher R2

will be.

BUS 735: Business Decision Making and Research Regression Analysis

Page 81: Regression Analysis · 2015-06-03 · Learn how to detect relationships between ordinal and categorical variables. Learn how to estimate a linear relationship between many variables

Relationships Between Two VariablesRegression

Assumptions

Single Variable RegressionMultiple RegressionVariance Decomposition

Adjusted R216/ 23

R2 will likely increase (slightly) even by adding nonsensevariables.

Adding such variables increases in-sample fit, but will likelyhurt out-of-sample forecasting accuracy.

The Adjusted R2 penalizes R2 for additional variables.

R2adj = 1 − n − 1

n − k − 1

(1 − R2

)When the adjusted R2 increases when adding a variable, thenthe additional variable really did help explain the dependentvariable.

When the adjusted R2 decreases when adding a variable, thenthe additional variable does not help explain the dependentvariable.

BUS 735: Business Decision Making and Research Regression Analysis

Page 82: Regression Analysis · 2015-06-03 · Learn how to detect relationships between ordinal and categorical variables. Learn how to estimate a linear relationship between many variables

Relationships Between Two VariablesRegression

Assumptions

Single Variable RegressionMultiple RegressionVariance Decomposition

Adjusted R216/ 23

R2 will likely increase (slightly) even by adding nonsensevariables.

Adding such variables increases in-sample fit, but will likelyhurt out-of-sample forecasting accuracy.

The Adjusted R2 penalizes R2 for additional variables.

R2adj = 1 − n − 1

n − k − 1

(1 − R2

)When the adjusted R2 increases when adding a variable, thenthe additional variable really did help explain the dependentvariable.

When the adjusted R2 decreases when adding a variable, thenthe additional variable does not help explain the dependentvariable.

BUS 735: Business Decision Making and Research Regression Analysis

Page 83: Regression Analysis · 2015-06-03 · Learn how to detect relationships between ordinal and categorical variables. Learn how to estimate a linear relationship between many variables

Relationships Between Two VariablesRegression

Assumptions

Single Variable RegressionMultiple RegressionVariance Decomposition

Adjusted R216/ 23

R2 will likely increase (slightly) even by adding nonsensevariables.

Adding such variables increases in-sample fit, but will likelyhurt out-of-sample forecasting accuracy.

The Adjusted R2 penalizes R2 for additional variables.

R2adj = 1 − n − 1

n − k − 1

(1 − R2

)When the adjusted R2 increases when adding a variable, thenthe additional variable really did help explain the dependentvariable.

When the adjusted R2 decreases when adding a variable, thenthe additional variable does not help explain the dependentvariable.

BUS 735: Business Decision Making and Research Regression Analysis

Page 84: Regression Analysis · 2015-06-03 · Learn how to detect relationships between ordinal and categorical variables. Learn how to estimate a linear relationship between many variables

Relationships Between Two VariablesRegression

Assumptions

Single Variable RegressionMultiple RegressionVariance Decomposition

Adjusted R216/ 23

R2 will likely increase (slightly) even by adding nonsensevariables.

Adding such variables increases in-sample fit, but will likelyhurt out-of-sample forecasting accuracy.

The Adjusted R2 penalizes R2 for additional variables.

R2adj = 1 − n − 1

n − k − 1

(1 − R2

)When the adjusted R2 increases when adding a variable, thenthe additional variable really did help explain the dependentvariable.

When the adjusted R2 decreases when adding a variable, thenthe additional variable does not help explain the dependentvariable.

BUS 735: Business Decision Making and Research Regression Analysis

Page 85: Regression Analysis · 2015-06-03 · Learn how to detect relationships between ordinal and categorical variables. Learn how to estimate a linear relationship between many variables

Relationships Between Two VariablesRegression

Assumptions

Single Variable RegressionMultiple RegressionVariance Decomposition

Adjusted R216/ 23

R2 will likely increase (slightly) even by adding nonsensevariables.

Adding such variables increases in-sample fit, but will likelyhurt out-of-sample forecasting accuracy.

The Adjusted R2 penalizes R2 for additional variables.

R2adj = 1 − n − 1

n − k − 1

(1 − R2

)When the adjusted R2 increases when adding a variable, thenthe additional variable really did help explain the dependentvariable.

When the adjusted R2 decreases when adding a variable, thenthe additional variable does not help explain the dependentvariable.

BUS 735: Business Decision Making and Research Regression Analysis

Page 86: Regression Analysis · 2015-06-03 · Learn how to detect relationships between ordinal and categorical variables. Learn how to estimate a linear relationship between many variables

Relationships Between Two VariablesRegression

Assumptions

Single Variable RegressionMultiple RegressionVariance Decomposition

Adjusted R216/ 23

R2 will likely increase (slightly) even by adding nonsensevariables.

Adding such variables increases in-sample fit, but will likelyhurt out-of-sample forecasting accuracy.

The Adjusted R2 penalizes R2 for additional variables.

R2adj = 1 − n − 1

n − k − 1

(1 − R2

)When the adjusted R2 increases when adding a variable, thenthe additional variable really did help explain the dependentvariable.

When the adjusted R2 decreases when adding a variable, thenthe additional variable does not help explain the dependentvariable.

BUS 735: Business Decision Making and Research Regression Analysis

Page 87: Regression Analysis · 2015-06-03 · Learn how to detect relationships between ordinal and categorical variables. Learn how to estimate a linear relationship between many variables

Relationships Between Two VariablesRegression

Assumptions

Single Variable RegressionMultiple RegressionVariance Decomposition

F-test for Regression Fit 17/ 23

F-test for Regression Fit: Tests if the regression line explainsthe data.

Very, very, very similar to ANOVA F-test.

H0 : β1 = β2 = ... = βk = 0.

H1 : At least one of the variables has explanatory power (i.e.at least one coefficient is not equal to zero).

F =SSR/(k − 1)

SSE/(n − k)

Where k is the number of explanatory variables.

BUS 735: Business Decision Making and Research Regression Analysis

Page 88: Regression Analysis · 2015-06-03 · Learn how to detect relationships between ordinal and categorical variables. Learn how to estimate a linear relationship between many variables

Relationships Between Two VariablesRegression

Assumptions

Single Variable RegressionMultiple RegressionVariance Decomposition

F-test for Regression Fit 17/ 23

F-test for Regression Fit: Tests if the regression line explainsthe data.

Very, very, very similar to ANOVA F-test.

H0 : β1 = β2 = ... = βk = 0.

H1 : At least one of the variables has explanatory power (i.e.at least one coefficient is not equal to zero).

F =SSR/(k − 1)

SSE/(n − k)

Where k is the number of explanatory variables.

BUS 735: Business Decision Making and Research Regression Analysis

Page 89: Regression Analysis · 2015-06-03 · Learn how to detect relationships between ordinal and categorical variables. Learn how to estimate a linear relationship between many variables

Relationships Between Two VariablesRegression

Assumptions

Single Variable RegressionMultiple RegressionVariance Decomposition

F-test for Regression Fit 17/ 23

F-test for Regression Fit: Tests if the regression line explainsthe data.

Very, very, very similar to ANOVA F-test.

H0 : β1 = β2 = ... = βk = 0.

H1 : At least one of the variables has explanatory power (i.e.at least one coefficient is not equal to zero).

F =SSR/(k − 1)

SSE/(n − k)

Where k is the number of explanatory variables.

BUS 735: Business Decision Making and Research Regression Analysis

Page 90: Regression Analysis · 2015-06-03 · Learn how to detect relationships between ordinal and categorical variables. Learn how to estimate a linear relationship between many variables

Relationships Between Two VariablesRegression

Assumptions

Single Variable RegressionMultiple RegressionVariance Decomposition

F-test for Regression Fit 17/ 23

F-test for Regression Fit: Tests if the regression line explainsthe data.

Very, very, very similar to ANOVA F-test.

H0 : β1 = β2 = ... = βk = 0.

H1 : At least one of the variables has explanatory power (i.e.at least one coefficient is not equal to zero).

F =SSR/(k − 1)

SSE/(n − k)

Where k is the number of explanatory variables.

BUS 735: Business Decision Making and Research Regression Analysis

Page 91: Regression Analysis · 2015-06-03 · Learn how to detect relationships between ordinal and categorical variables. Learn how to estimate a linear relationship between many variables

Relationships Between Two VariablesRegression

Assumptions

Single Variable RegressionMultiple RegressionVariance Decomposition

F-test for Regression Fit 17/ 23

F-test for Regression Fit: Tests if the regression line explainsthe data.

Very, very, very similar to ANOVA F-test.

H0 : β1 = β2 = ... = βk = 0.

H1 : At least one of the variables has explanatory power (i.e.at least one coefficient is not equal to zero).

F =SSR/(k − 1)

SSE/(n − k)

Where k is the number of explanatory variables.

BUS 735: Business Decision Making and Research Regression Analysis

Page 92: Regression Analysis · 2015-06-03 · Learn how to detect relationships between ordinal and categorical variables. Learn how to estimate a linear relationship between many variables

Relationships Between Two VariablesRegression

Assumptions

Single Variable RegressionMultiple RegressionVariance Decomposition

F-test for Regression Fit 17/ 23

F-test for Regression Fit: Tests if the regression line explainsthe data.

Very, very, very similar to ANOVA F-test.

H0 : β1 = β2 = ... = βk = 0.

H1 : At least one of the variables has explanatory power (i.e.at least one coefficient is not equal to zero).

F =SSR/(k − 1)

SSE/(n − k)

Where k is the number of explanatory variables.

BUS 735: Business Decision Making and Research Regression Analysis

Page 93: Regression Analysis · 2015-06-03 · Learn how to detect relationships between ordinal and categorical variables. Learn how to estimate a linear relationship between many variables

Relationships Between Two VariablesRegression

Assumptions

Single Variable RegressionMultiple RegressionVariance Decomposition

Example: Public Expenditure 18/ 23

In the previous example, how much of the variability in publicexpenditure is explained by the following four variables:

ECAB: Economic AbilityMET: MetropolitanGROW: Growth rate of populationWEST: Western state = 1.

Is the combination of these variables significant in explainingpublic expenditure?

Re-run the regression, this time also including:

YOUNG: Percentage of population that is young.OLD: Percentage of population that is old.

BUS 735: Business Decision Making and Research Regression Analysis

Page 94: Regression Analysis · 2015-06-03 · Learn how to detect relationships between ordinal and categorical variables. Learn how to estimate a linear relationship between many variables

Relationships Between Two VariablesRegression

Assumptions

Single Variable RegressionMultiple RegressionVariance Decomposition

Example: Public Expenditure 18/ 23

In the previous example, how much of the variability in publicexpenditure is explained by the following four variables:

ECAB: Economic AbilityMET: MetropolitanGROW: Growth rate of populationWEST: Western state = 1.

Is the combination of these variables significant in explainingpublic expenditure?

Re-run the regression, this time also including:

YOUNG: Percentage of population that is young.OLD: Percentage of population that is old.

BUS 735: Business Decision Making and Research Regression Analysis

Page 95: Regression Analysis · 2015-06-03 · Learn how to detect relationships between ordinal and categorical variables. Learn how to estimate a linear relationship between many variables

Relationships Between Two VariablesRegression

Assumptions

Single Variable RegressionMultiple RegressionVariance Decomposition

Example: Public Expenditure 18/ 23

In the previous example, how much of the variability in publicexpenditure is explained by the following four variables:

ECAB: Economic AbilityMET: MetropolitanGROW: Growth rate of populationWEST: Western state = 1.

Is the combination of these variables significant in explainingpublic expenditure?

Re-run the regression, this time also including:

YOUNG: Percentage of population that is young.OLD: Percentage of population that is old.

BUS 735: Business Decision Making and Research Regression Analysis

Page 96: Regression Analysis · 2015-06-03 · Learn how to detect relationships between ordinal and categorical variables. Learn how to estimate a linear relationship between many variables

Relationships Between Two VariablesRegression

Assumptions

Single Variable RegressionMultiple RegressionVariance Decomposition

Example: Public Expenditure 18/ 23

In the previous example, how much of the variability in publicexpenditure is explained by the following four variables:

ECAB: Economic AbilityMET: MetropolitanGROW: Growth rate of populationWEST: Western state = 1.

Is the combination of these variables significant in explainingpublic expenditure?

Re-run the regression, this time also including:

YOUNG: Percentage of population that is young.OLD: Percentage of population that is old.

BUS 735: Business Decision Making and Research Regression Analysis

Page 97: Regression Analysis · 2015-06-03 · Learn how to detect relationships between ordinal and categorical variables. Learn how to estimate a linear relationship between many variables

Relationships Between Two VariablesRegression

Assumptions

Single Variable RegressionMultiple RegressionVariance Decomposition

Example: Public Expenditure 18/ 23

In the previous example, how much of the variability in publicexpenditure is explained by the following four variables:

ECAB: Economic AbilityMET: MetropolitanGROW: Growth rate of populationWEST: Western state = 1.

Is the combination of these variables significant in explainingpublic expenditure?

Re-run the regression, this time also including:

YOUNG: Percentage of population that is young.OLD: Percentage of population that is old.

BUS 735: Business Decision Making and Research Regression Analysis

Page 98: Regression Analysis · 2015-06-03 · Learn how to detect relationships between ordinal and categorical variables. Learn how to estimate a linear relationship between many variables

Relationships Between Two VariablesRegression

Assumptions

Single Variable RegressionMultiple RegressionVariance Decomposition

Example: Public Expenditure 18/ 23

In the previous example, how much of the variability in publicexpenditure is explained by the following four variables:

ECAB: Economic AbilityMET: MetropolitanGROW: Growth rate of populationWEST: Western state = 1.

Is the combination of these variables significant in explainingpublic expenditure?

Re-run the regression, this time also including:

YOUNG: Percentage of population that is young.OLD: Percentage of population that is old.

BUS 735: Business Decision Making and Research Regression Analysis

Page 99: Regression Analysis · 2015-06-03 · Learn how to detect relationships between ordinal and categorical variables. Learn how to estimate a linear relationship between many variables

Relationships Between Two VariablesRegression

Assumptions

Single Variable RegressionMultiple RegressionVariance Decomposition

Example: Public Expenditure 19/ 23

What happened to the coefficient of determination?

What happened to the adjusted coefficient of determination?What is your interpretation?

What happened to the estimated effect of the other variables:metropolitan area? Western state?

BUS 735: Business Decision Making and Research Regression Analysis

Page 100: Regression Analysis · 2015-06-03 · Learn how to detect relationships between ordinal and categorical variables. Learn how to estimate a linear relationship between many variables

Relationships Between Two VariablesRegression

Assumptions

Single Variable RegressionMultiple RegressionVariance Decomposition

Example: Public Expenditure 19/ 23

What happened to the coefficient of determination?

What happened to the adjusted coefficient of determination?What is your interpretation?

What happened to the estimated effect of the other variables:metropolitan area? Western state?

BUS 735: Business Decision Making and Research Regression Analysis

Page 101: Regression Analysis · 2015-06-03 · Learn how to detect relationships between ordinal and categorical variables. Learn how to estimate a linear relationship between many variables

Relationships Between Two VariablesRegression

Assumptions

Single Variable RegressionMultiple RegressionVariance Decomposition

Example: Public Expenditure 19/ 23

What happened to the coefficient of determination?

What happened to the adjusted coefficient of determination?What is your interpretation?

What happened to the estimated effect of the other variables:metropolitan area? Western state?

BUS 735: Business Decision Making and Research Regression Analysis

Page 102: Regression Analysis · 2015-06-03 · Learn how to detect relationships between ordinal and categorical variables. Learn how to estimate a linear relationship between many variables

Relationships Between Two VariablesRegression

Assumptions

Assumptions from the CLTCrucial Assumptions for RegressionMulticollinearityHomoscedasticity

Assumptions from the CLT 20/ 23

Using the normal distribution to compute p-values depends onresults from the Central Limit Theorem.

Sufficiently large sample size (much more than 30).

Useful for normality result from the Central Limit TheoremAlso necessary as you increase the number of explanatoryvariables.

Normally distributed dependent and independent variables

Useful for small sample sizes, but not essential as sample sizeincreases.

Types of data:

Dependent variable must be interval or ratio.Independent variable can be interval, ratio, or a dummyvariable.

BUS 735: Business Decision Making and Research Regression Analysis

Page 103: Regression Analysis · 2015-06-03 · Learn how to detect relationships between ordinal and categorical variables. Learn how to estimate a linear relationship between many variables

Relationships Between Two VariablesRegression

Assumptions

Assumptions from the CLTCrucial Assumptions for RegressionMulticollinearityHomoscedasticity

Assumptions from the CLT 20/ 23

Using the normal distribution to compute p-values depends onresults from the Central Limit Theorem.

Sufficiently large sample size (much more than 30).

Useful for normality result from the Central Limit TheoremAlso necessary as you increase the number of explanatoryvariables.

Normally distributed dependent and independent variables

Useful for small sample sizes, but not essential as sample sizeincreases.

Types of data:

Dependent variable must be interval or ratio.Independent variable can be interval, ratio, or a dummyvariable.

BUS 735: Business Decision Making and Research Regression Analysis

Page 104: Regression Analysis · 2015-06-03 · Learn how to detect relationships between ordinal and categorical variables. Learn how to estimate a linear relationship between many variables

Relationships Between Two VariablesRegression

Assumptions

Assumptions from the CLTCrucial Assumptions for RegressionMulticollinearityHomoscedasticity

Assumptions from the CLT 20/ 23

Using the normal distribution to compute p-values depends onresults from the Central Limit Theorem.

Sufficiently large sample size (much more than 30).

Useful for normality result from the Central Limit TheoremAlso necessary as you increase the number of explanatoryvariables.

Normally distributed dependent and independent variables

Useful for small sample sizes, but not essential as sample sizeincreases.

Types of data:

Dependent variable must be interval or ratio.Independent variable can be interval, ratio, or a dummyvariable.

BUS 735: Business Decision Making and Research Regression Analysis

Page 105: Regression Analysis · 2015-06-03 · Learn how to detect relationships between ordinal and categorical variables. Learn how to estimate a linear relationship between many variables

Relationships Between Two VariablesRegression

Assumptions

Assumptions from the CLTCrucial Assumptions for RegressionMulticollinearityHomoscedasticity

Assumptions from the CLT 20/ 23

Using the normal distribution to compute p-values depends onresults from the Central Limit Theorem.

Sufficiently large sample size (much more than 30).

Useful for normality result from the Central Limit TheoremAlso necessary as you increase the number of explanatoryvariables.

Normally distributed dependent and independent variables

Useful for small sample sizes, but not essential as sample sizeincreases.

Types of data:

Dependent variable must be interval or ratio.Independent variable can be interval, ratio, or a dummyvariable.

BUS 735: Business Decision Making and Research Regression Analysis

Page 106: Regression Analysis · 2015-06-03 · Learn how to detect relationships between ordinal and categorical variables. Learn how to estimate a linear relationship between many variables

Relationships Between Two VariablesRegression

Assumptions

Assumptions from the CLTCrucial Assumptions for RegressionMulticollinearityHomoscedasticity

Assumptions from the CLT 20/ 23

Using the normal distribution to compute p-values depends onresults from the Central Limit Theorem.

Sufficiently large sample size (much more than 30).

Useful for normality result from the Central Limit TheoremAlso necessary as you increase the number of explanatoryvariables.

Normally distributed dependent and independent variables

Useful for small sample sizes, but not essential as sample sizeincreases.

Types of data:

Dependent variable must be interval or ratio.Independent variable can be interval, ratio, or a dummyvariable.

BUS 735: Business Decision Making and Research Regression Analysis

Page 107: Regression Analysis · 2015-06-03 · Learn how to detect relationships between ordinal and categorical variables. Learn how to estimate a linear relationship between many variables

Relationships Between Two VariablesRegression

Assumptions

Assumptions from the CLTCrucial Assumptions for RegressionMulticollinearityHomoscedasticity

Assumptions from the CLT 20/ 23

Using the normal distribution to compute p-values depends onresults from the Central Limit Theorem.

Sufficiently large sample size (much more than 30).

Useful for normality result from the Central Limit TheoremAlso necessary as you increase the number of explanatoryvariables.

Normally distributed dependent and independent variables

Useful for small sample sizes, but not essential as sample sizeincreases.

Types of data:

Dependent variable must be interval or ratio.Independent variable can be interval, ratio, or a dummyvariable.

BUS 735: Business Decision Making and Research Regression Analysis

Page 108: Regression Analysis · 2015-06-03 · Learn how to detect relationships between ordinal and categorical variables. Learn how to estimate a linear relationship between many variables

Relationships Between Two VariablesRegression

Assumptions

Assumptions from the CLTCrucial Assumptions for RegressionMulticollinearityHomoscedasticity

Assumptions from the CLT 20/ 23

Using the normal distribution to compute p-values depends onresults from the Central Limit Theorem.

Sufficiently large sample size (much more than 30).

Useful for normality result from the Central Limit TheoremAlso necessary as you increase the number of explanatoryvariables.

Normally distributed dependent and independent variables

Useful for small sample sizes, but not essential as sample sizeincreases.

Types of data:

Dependent variable must be interval or ratio.Independent variable can be interval, ratio, or a dummyvariable.

BUS 735: Business Decision Making and Research Regression Analysis

Page 109: Regression Analysis · 2015-06-03 · Learn how to detect relationships between ordinal and categorical variables. Learn how to estimate a linear relationship between many variables

Relationships Between Two VariablesRegression

Assumptions

Assumptions from the CLTCrucial Assumptions for RegressionMulticollinearityHomoscedasticity

Crucial Assumptions for Regression 21/ 23

Linearity: a straight line reasonably describes the data.

Exceptions: experience on productivity, ordinal data likeeducation level on income.Consider transforming variables.

Stationarity:

The central limit theorem: behavior of statistics as sample sizeapproaches infinity!The mean and variance must exist and be constant.Big issue in economic and financial time series.

Exogeneity of explanatory variables.

Dependent variable must not influence explanatory variables.Explanatory variables must not be influenced by excludedvariables that can influence dependent variable.Example problem: how does advertising affect sales?

BUS 735: Business Decision Making and Research Regression Analysis

Page 110: Regression Analysis · 2015-06-03 · Learn how to detect relationships between ordinal and categorical variables. Learn how to estimate a linear relationship between many variables

Relationships Between Two VariablesRegression

Assumptions

Assumptions from the CLTCrucial Assumptions for RegressionMulticollinearityHomoscedasticity

Crucial Assumptions for Regression 21/ 23

Linearity: a straight line reasonably describes the data.

Exceptions: experience on productivity, ordinal data likeeducation level on income.Consider transforming variables.

Stationarity:

The central limit theorem: behavior of statistics as sample sizeapproaches infinity!The mean and variance must exist and be constant.Big issue in economic and financial time series.

Exogeneity of explanatory variables.

Dependent variable must not influence explanatory variables.Explanatory variables must not be influenced by excludedvariables that can influence dependent variable.Example problem: how does advertising affect sales?

BUS 735: Business Decision Making and Research Regression Analysis

Page 111: Regression Analysis · 2015-06-03 · Learn how to detect relationships between ordinal and categorical variables. Learn how to estimate a linear relationship between many variables

Relationships Between Two VariablesRegression

Assumptions

Assumptions from the CLTCrucial Assumptions for RegressionMulticollinearityHomoscedasticity

Crucial Assumptions for Regression 21/ 23

Linearity: a straight line reasonably describes the data.

Exceptions: experience on productivity, ordinal data likeeducation level on income.Consider transforming variables.

Stationarity:

The central limit theorem: behavior of statistics as sample sizeapproaches infinity!The mean and variance must exist and be constant.Big issue in economic and financial time series.

Exogeneity of explanatory variables.

Dependent variable must not influence explanatory variables.Explanatory variables must not be influenced by excludedvariables that can influence dependent variable.Example problem: how does advertising affect sales?

BUS 735: Business Decision Making and Research Regression Analysis

Page 112: Regression Analysis · 2015-06-03 · Learn how to detect relationships between ordinal and categorical variables. Learn how to estimate a linear relationship between many variables

Relationships Between Two VariablesRegression

Assumptions

Assumptions from the CLTCrucial Assumptions for RegressionMulticollinearityHomoscedasticity

Crucial Assumptions for Regression 21/ 23

Linearity: a straight line reasonably describes the data.

Exceptions: experience on productivity, ordinal data likeeducation level on income.Consider transforming variables.

Stationarity:

The central limit theorem: behavior of statistics as sample sizeapproaches infinity!The mean and variance must exist and be constant.Big issue in economic and financial time series.

Exogeneity of explanatory variables.

Dependent variable must not influence explanatory variables.Explanatory variables must not be influenced by excludedvariables that can influence dependent variable.Example problem: how does advertising affect sales?

BUS 735: Business Decision Making and Research Regression Analysis

Page 113: Regression Analysis · 2015-06-03 · Learn how to detect relationships between ordinal and categorical variables. Learn how to estimate a linear relationship between many variables

Relationships Between Two VariablesRegression

Assumptions

Assumptions from the CLTCrucial Assumptions for RegressionMulticollinearityHomoscedasticity

Crucial Assumptions for Regression 21/ 23

Linearity: a straight line reasonably describes the data.

Exceptions: experience on productivity, ordinal data likeeducation level on income.Consider transforming variables.

Stationarity:

The central limit theorem: behavior of statistics as sample sizeapproaches infinity!The mean and variance must exist and be constant.Big issue in economic and financial time series.

Exogeneity of explanatory variables.

Dependent variable must not influence explanatory variables.Explanatory variables must not be influenced by excludedvariables that can influence dependent variable.Example problem: how does advertising affect sales?

BUS 735: Business Decision Making and Research Regression Analysis

Page 114: Regression Analysis · 2015-06-03 · Learn how to detect relationships between ordinal and categorical variables. Learn how to estimate a linear relationship between many variables

Relationships Between Two VariablesRegression

Assumptions

Assumptions from the CLTCrucial Assumptions for RegressionMulticollinearityHomoscedasticity

Crucial Assumptions for Regression 21/ 23

Linearity: a straight line reasonably describes the data.

Exceptions: experience on productivity, ordinal data likeeducation level on income.Consider transforming variables.

Stationarity:

The central limit theorem: behavior of statistics as sample sizeapproaches infinity!The mean and variance must exist and be constant.Big issue in economic and financial time series.

Exogeneity of explanatory variables.

Dependent variable must not influence explanatory variables.Explanatory variables must not be influenced by excludedvariables that can influence dependent variable.Example problem: how does advertising affect sales?

BUS 735: Business Decision Making and Research Regression Analysis

Page 115: Regression Analysis · 2015-06-03 · Learn how to detect relationships between ordinal and categorical variables. Learn how to estimate a linear relationship between many variables

Relationships Between Two VariablesRegression

Assumptions

Assumptions from the CLTCrucial Assumptions for RegressionMulticollinearityHomoscedasticity

Crucial Assumptions for Regression 21/ 23

Linearity: a straight line reasonably describes the data.

Exceptions: experience on productivity, ordinal data likeeducation level on income.Consider transforming variables.

Stationarity:

The central limit theorem: behavior of statistics as sample sizeapproaches infinity!The mean and variance must exist and be constant.Big issue in economic and financial time series.

Exogeneity of explanatory variables.

Dependent variable must not influence explanatory variables.Explanatory variables must not be influenced by excludedvariables that can influence dependent variable.Example problem: how does advertising affect sales?

BUS 735: Business Decision Making and Research Regression Analysis

Page 116: Regression Analysis · 2015-06-03 · Learn how to detect relationships between ordinal and categorical variables. Learn how to estimate a linear relationship between many variables

Relationships Between Two VariablesRegression

Assumptions

Assumptions from the CLTCrucial Assumptions for RegressionMulticollinearityHomoscedasticity

Crucial Assumptions for Regression 21/ 23

Linearity: a straight line reasonably describes the data.

Exceptions: experience on productivity, ordinal data likeeducation level on income.Consider transforming variables.

Stationarity:

The central limit theorem: behavior of statistics as sample sizeapproaches infinity!The mean and variance must exist and be constant.Big issue in economic and financial time series.

Exogeneity of explanatory variables.

Dependent variable must not influence explanatory variables.Explanatory variables must not be influenced by excludedvariables that can influence dependent variable.Example problem: how does advertising affect sales?

BUS 735: Business Decision Making and Research Regression Analysis

Page 117: Regression Analysis · 2015-06-03 · Learn how to detect relationships between ordinal and categorical variables. Learn how to estimate a linear relationship between many variables

Relationships Between Two VariablesRegression

Assumptions

Assumptions from the CLTCrucial Assumptions for RegressionMulticollinearityHomoscedasticity

Crucial Assumptions for Regression 21/ 23

Linearity: a straight line reasonably describes the data.

Exceptions: experience on productivity, ordinal data likeeducation level on income.Consider transforming variables.

Stationarity:

The central limit theorem: behavior of statistics as sample sizeapproaches infinity!The mean and variance must exist and be constant.Big issue in economic and financial time series.

Exogeneity of explanatory variables.

Dependent variable must not influence explanatory variables.Explanatory variables must not be influenced by excludedvariables that can influence dependent variable.Example problem: how does advertising affect sales?

BUS 735: Business Decision Making and Research Regression Analysis

Page 118: Regression Analysis · 2015-06-03 · Learn how to detect relationships between ordinal and categorical variables. Learn how to estimate a linear relationship between many variables

Relationships Between Two VariablesRegression

Assumptions

Assumptions from the CLTCrucial Assumptions for RegressionMulticollinearityHomoscedasticity

Crucial Assumptions for Regression 21/ 23

Linearity: a straight line reasonably describes the data.

Exceptions: experience on productivity, ordinal data likeeducation level on income.Consider transforming variables.

Stationarity:

The central limit theorem: behavior of statistics as sample sizeapproaches infinity!The mean and variance must exist and be constant.Big issue in economic and financial time series.

Exogeneity of explanatory variables.

Dependent variable must not influence explanatory variables.Explanatory variables must not be influenced by excludedvariables that can influence dependent variable.Example problem: how does advertising affect sales?

BUS 735: Business Decision Making and Research Regression Analysis

Page 119: Regression Analysis · 2015-06-03 · Learn how to detect relationships between ordinal and categorical variables. Learn how to estimate a linear relationship between many variables

Relationships Between Two VariablesRegression

Assumptions

Assumptions from the CLTCrucial Assumptions for RegressionMulticollinearityHomoscedasticity

Crucial Assumptions for Regression 21/ 23

Linearity: a straight line reasonably describes the data.

Exceptions: experience on productivity, ordinal data likeeducation level on income.Consider transforming variables.

Stationarity:

The central limit theorem: behavior of statistics as sample sizeapproaches infinity!The mean and variance must exist and be constant.Big issue in economic and financial time series.

Exogeneity of explanatory variables.

Dependent variable must not influence explanatory variables.Explanatory variables must not be influenced by excludedvariables that can influence dependent variable.Example problem: how does advertising affect sales?

BUS 735: Business Decision Making and Research Regression Analysis

Page 120: Regression Analysis · 2015-06-03 · Learn how to detect relationships between ordinal and categorical variables. Learn how to estimate a linear relationship between many variables

Relationships Between Two VariablesRegression

Assumptions

Assumptions from the CLTCrucial Assumptions for RegressionMulticollinearityHomoscedasticity

Multicollinearity 22/ 23

Multicollinearity: when two or more of the explanatoryvariables are highly correlated.

With multicollinearity, it is difficult to determine the effectcoming from a specific individual variable.

Correlated variables will have standard errors for coefficientswill be large (coefficients will be statistically insignificant).

Examples:experience and age used to predict productivitysize of store (sq feet) and store sales used to predict demandfor inventories.parent’s income and parent’s education used to predict studentperformance.

Perfect multicollinearity - when two variables are perfectlycorrelated.

BUS 735: Business Decision Making and Research Regression Analysis

Page 121: Regression Analysis · 2015-06-03 · Learn how to detect relationships between ordinal and categorical variables. Learn how to estimate a linear relationship between many variables

Relationships Between Two VariablesRegression

Assumptions

Assumptions from the CLTCrucial Assumptions for RegressionMulticollinearityHomoscedasticity

Multicollinearity 22/ 23

Multicollinearity: when two or more of the explanatoryvariables are highly correlated.

With multicollinearity, it is difficult to determine the effectcoming from a specific individual variable.

Correlated variables will have standard errors for coefficientswill be large (coefficients will be statistically insignificant).

Examples:experience and age used to predict productivitysize of store (sq feet) and store sales used to predict demandfor inventories.parent’s income and parent’s education used to predict studentperformance.

Perfect multicollinearity - when two variables are perfectlycorrelated.

BUS 735: Business Decision Making and Research Regression Analysis

Page 122: Regression Analysis · 2015-06-03 · Learn how to detect relationships between ordinal and categorical variables. Learn how to estimate a linear relationship between many variables

Relationships Between Two VariablesRegression

Assumptions

Assumptions from the CLTCrucial Assumptions for RegressionMulticollinearityHomoscedasticity

Multicollinearity 22/ 23

Multicollinearity: when two or more of the explanatoryvariables are highly correlated.

With multicollinearity, it is difficult to determine the effectcoming from a specific individual variable.

Correlated variables will have standard errors for coefficientswill be large (coefficients will be statistically insignificant).

Examples:experience and age used to predict productivitysize of store (sq feet) and store sales used to predict demandfor inventories.parent’s income and parent’s education used to predict studentperformance.

Perfect multicollinearity - when two variables are perfectlycorrelated.

BUS 735: Business Decision Making and Research Regression Analysis

Page 123: Regression Analysis · 2015-06-03 · Learn how to detect relationships between ordinal and categorical variables. Learn how to estimate a linear relationship between many variables

Relationships Between Two VariablesRegression

Assumptions

Assumptions from the CLTCrucial Assumptions for RegressionMulticollinearityHomoscedasticity

Multicollinearity 22/ 23

Multicollinearity: when two or more of the explanatoryvariables are highly correlated.

With multicollinearity, it is difficult to determine the effectcoming from a specific individual variable.

Correlated variables will have standard errors for coefficientswill be large (coefficients will be statistically insignificant).

Examples:experience and age used to predict productivitysize of store (sq feet) and store sales used to predict demandfor inventories.parent’s income and parent’s education used to predict studentperformance.

Perfect multicollinearity - when two variables are perfectlycorrelated.

BUS 735: Business Decision Making and Research Regression Analysis

Page 124: Regression Analysis · 2015-06-03 · Learn how to detect relationships between ordinal and categorical variables. Learn how to estimate a linear relationship between many variables

Relationships Between Two VariablesRegression

Assumptions

Assumptions from the CLTCrucial Assumptions for RegressionMulticollinearityHomoscedasticity

Multicollinearity 22/ 23

Multicollinearity: when two or more of the explanatoryvariables are highly correlated.

With multicollinearity, it is difficult to determine the effectcoming from a specific individual variable.

Correlated variables will have standard errors for coefficientswill be large (coefficients will be statistically insignificant).

Examples:experience and age used to predict productivitysize of store (sq feet) and store sales used to predict demandfor inventories.parent’s income and parent’s education used to predict studentperformance.

Perfect multicollinearity - when two variables are perfectlycorrelated.

BUS 735: Business Decision Making and Research Regression Analysis

Page 125: Regression Analysis · 2015-06-03 · Learn how to detect relationships between ordinal and categorical variables. Learn how to estimate a linear relationship between many variables

Relationships Between Two VariablesRegression

Assumptions

Assumptions from the CLTCrucial Assumptions for RegressionMulticollinearityHomoscedasticity

Multicollinearity 22/ 23

Multicollinearity: when two or more of the explanatoryvariables are highly correlated.

With multicollinearity, it is difficult to determine the effectcoming from a specific individual variable.

Correlated variables will have standard errors for coefficientswill be large (coefficients will be statistically insignificant).

Examples:experience and age used to predict productivitysize of store (sq feet) and store sales used to predict demandfor inventories.parent’s income and parent’s education used to predict studentperformance.

Perfect multicollinearity - when two variables are perfectlycorrelated.

BUS 735: Business Decision Making and Research Regression Analysis

Page 126: Regression Analysis · 2015-06-03 · Learn how to detect relationships between ordinal and categorical variables. Learn how to estimate a linear relationship between many variables

Relationships Between Two VariablesRegression

Assumptions

Assumptions from the CLTCrucial Assumptions for RegressionMulticollinearityHomoscedasticity

Multicollinearity 22/ 23

Multicollinearity: when two or more of the explanatoryvariables are highly correlated.

With multicollinearity, it is difficult to determine the effectcoming from a specific individual variable.

Correlated variables will have standard errors for coefficientswill be large (coefficients will be statistically insignificant).

Examples:experience and age used to predict productivitysize of store (sq feet) and store sales used to predict demandfor inventories.parent’s income and parent’s education used to predict studentperformance.

Perfect multicollinearity - when two variables are perfectlycorrelated.

BUS 735: Business Decision Making and Research Regression Analysis

Page 127: Regression Analysis · 2015-06-03 · Learn how to detect relationships between ordinal and categorical variables. Learn how to estimate a linear relationship between many variables

Relationships Between Two VariablesRegression

Assumptions

Assumptions from the CLTCrucial Assumptions for RegressionMulticollinearityHomoscedasticity

Multicollinearity 22/ 23

Multicollinearity: when two or more of the explanatoryvariables are highly correlated.

With multicollinearity, it is difficult to determine the effectcoming from a specific individual variable.

Correlated variables will have standard errors for coefficientswill be large (coefficients will be statistically insignificant).

Examples:experience and age used to predict productivitysize of store (sq feet) and store sales used to predict demandfor inventories.parent’s income and parent’s education used to predict studentperformance.

Perfect multicollinearity - when two variables are perfectlycorrelated.

BUS 735: Business Decision Making and Research Regression Analysis

Page 128: Regression Analysis · 2015-06-03 · Learn how to detect relationships between ordinal and categorical variables. Learn how to estimate a linear relationship between many variables

Relationships Between Two VariablesRegression

Assumptions

Assumptions from the CLTCrucial Assumptions for RegressionMulticollinearityHomoscedasticity

Homoscedasticity 23/ 23

Homoscedasticity: when the variance of the error term isconstant (it does not depend on other variables).

Counter examples (heteroscedasticity):

Impact of income on demand for houses.Many economic and financial variables related to income sufferfrom this.

Heteroscedasticity is not too problematic:

Estimates will still be unbiased.Your standard errors will be downward biased (reject morethan you should).

May be evidence of a bigger problem: linearity or stationarity.

BUS 735: Business Decision Making and Research Regression Analysis

Page 129: Regression Analysis · 2015-06-03 · Learn how to detect relationships between ordinal and categorical variables. Learn how to estimate a linear relationship between many variables

Relationships Between Two VariablesRegression

Assumptions

Assumptions from the CLTCrucial Assumptions for RegressionMulticollinearityHomoscedasticity

Homoscedasticity 23/ 23

Homoscedasticity: when the variance of the error term isconstant (it does not depend on other variables).

Counter examples (heteroscedasticity):

Impact of income on demand for houses.Many economic and financial variables related to income sufferfrom this.

Heteroscedasticity is not too problematic:

Estimates will still be unbiased.Your standard errors will be downward biased (reject morethan you should).

May be evidence of a bigger problem: linearity or stationarity.

BUS 735: Business Decision Making and Research Regression Analysis

Page 130: Regression Analysis · 2015-06-03 · Learn how to detect relationships between ordinal and categorical variables. Learn how to estimate a linear relationship between many variables

Relationships Between Two VariablesRegression

Assumptions

Assumptions from the CLTCrucial Assumptions for RegressionMulticollinearityHomoscedasticity

Homoscedasticity 23/ 23

Homoscedasticity: when the variance of the error term isconstant (it does not depend on other variables).

Counter examples (heteroscedasticity):

Impact of income on demand for houses.Many economic and financial variables related to income sufferfrom this.

Heteroscedasticity is not too problematic:

Estimates will still be unbiased.Your standard errors will be downward biased (reject morethan you should).

May be evidence of a bigger problem: linearity or stationarity.

BUS 735: Business Decision Making and Research Regression Analysis

Page 131: Regression Analysis · 2015-06-03 · Learn how to detect relationships between ordinal and categorical variables. Learn how to estimate a linear relationship between many variables

Relationships Between Two VariablesRegression

Assumptions

Assumptions from the CLTCrucial Assumptions for RegressionMulticollinearityHomoscedasticity

Homoscedasticity 23/ 23

Homoscedasticity: when the variance of the error term isconstant (it does not depend on other variables).

Counter examples (heteroscedasticity):

Impact of income on demand for houses.Many economic and financial variables related to income sufferfrom this.

Heteroscedasticity is not too problematic:

Estimates will still be unbiased.Your standard errors will be downward biased (reject morethan you should).

May be evidence of a bigger problem: linearity or stationarity.

BUS 735: Business Decision Making and Research Regression Analysis

Page 132: Regression Analysis · 2015-06-03 · Learn how to detect relationships between ordinal and categorical variables. Learn how to estimate a linear relationship between many variables

Relationships Between Two VariablesRegression

Assumptions

Assumptions from the CLTCrucial Assumptions for RegressionMulticollinearityHomoscedasticity

Homoscedasticity 23/ 23

Homoscedasticity: when the variance of the error term isconstant (it does not depend on other variables).

Counter examples (heteroscedasticity):

Impact of income on demand for houses.Many economic and financial variables related to income sufferfrom this.

Heteroscedasticity is not too problematic:

Estimates will still be unbiased.Your standard errors will be downward biased (reject morethan you should).

May be evidence of a bigger problem: linearity or stationarity.

BUS 735: Business Decision Making and Research Regression Analysis

Page 133: Regression Analysis · 2015-06-03 · Learn how to detect relationships between ordinal and categorical variables. Learn how to estimate a linear relationship between many variables

Relationships Between Two VariablesRegression

Assumptions

Assumptions from the CLTCrucial Assumptions for RegressionMulticollinearityHomoscedasticity

Homoscedasticity 23/ 23

Homoscedasticity: when the variance of the error term isconstant (it does not depend on other variables).

Counter examples (heteroscedasticity):

Impact of income on demand for houses.Many economic and financial variables related to income sufferfrom this.

Heteroscedasticity is not too problematic:

Estimates will still be unbiased.Your standard errors will be downward biased (reject morethan you should).

May be evidence of a bigger problem: linearity or stationarity.

BUS 735: Business Decision Making and Research Regression Analysis

Page 134: Regression Analysis · 2015-06-03 · Learn how to detect relationships between ordinal and categorical variables. Learn how to estimate a linear relationship between many variables

Relationships Between Two VariablesRegression

Assumptions

Assumptions from the CLTCrucial Assumptions for RegressionMulticollinearityHomoscedasticity

Homoscedasticity 23/ 23

Homoscedasticity: when the variance of the error term isconstant (it does not depend on other variables).

Counter examples (heteroscedasticity):

Impact of income on demand for houses.Many economic and financial variables related to income sufferfrom this.

Heteroscedasticity is not too problematic:

Estimates will still be unbiased.Your standard errors will be downward biased (reject morethan you should).

May be evidence of a bigger problem: linearity or stationarity.

BUS 735: Business Decision Making and Research Regression Analysis

Page 135: Regression Analysis · 2015-06-03 · Learn how to detect relationships between ordinal and categorical variables. Learn how to estimate a linear relationship between many variables

Relationships Between Two VariablesRegression

Assumptions

Assumptions from the CLTCrucial Assumptions for RegressionMulticollinearityHomoscedasticity

Homoscedasticity 23/ 23

Homoscedasticity: when the variance of the error term isconstant (it does not depend on other variables).

Counter examples (heteroscedasticity):

Impact of income on demand for houses.Many economic and financial variables related to income sufferfrom this.

Heteroscedasticity is not too problematic:

Estimates will still be unbiased.Your standard errors will be downward biased (reject morethan you should).

May be evidence of a bigger problem: linearity or stationarity.

BUS 735: Business Decision Making and Research Regression Analysis