BCOR 1020 Business Statistics Lecture 25 – April 22, 2008


Page 1: BCOR 1020 Business Statistics Lecture 25 – April 22, 2008

BCOR 1020 Business Statistics

Lecture 25 – April 22, 2008

Page 2:

Overview

Chapter 12 – Linear Regression
– Ordinary Least Squares Formulas
– Tests for Significance
– Analysis of Variance: Overall Fit
– Confidence and Prediction Intervals for Y
– Example(s)

Page 3:

Chapter 12 – Ordinary Least Squares Formulas

• The ordinary least squares method (OLS) estimates the slope and intercept of the regression line so that the residuals are small.

• Recall that the residuals are the differences between observed y-values and the fitted y-values on the line…

• The sum of the residuals = 0 for any line…

• So, we consider the sum of the squared residuals (the SSE)…

Slope and Intercept:

e_i = y_i – ŷ_i

Page 4:

Chapter 12 – Ordinary Least Squares Formulas

• To find our OLS estimators, we need to find the values of b0 and b1 that minimize the SSE…

Slope and Intercept:

• The OLS estimator for the slope is:

b1 = Σ(x_i – x̄)(y_i – ȳ) / Σ(x_i – x̄)²  or  b1 = (Σx_i·y_i – n·x̄·ȳ) / (Σx_i² – n·x̄²)

• The OLS estimator for the intercept is:

b0 = ȳ – b1·x̄

These are computed by the regression function on your computer or calculator.
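To see what the computer is doing, here is a minimal Python sketch of the OLS formulas above (not part of the original slides; the tiny dataset is hypothetical, chosen so the line is recovered exactly):

```python
# Minimal sketch of the OLS slope/intercept formulas (hypothetical data).
def ols_fit(x, y):
    n = len(x)
    xbar = sum(x) / n
    ybar = sum(y) / n
    # Slope: b1 = sum((x - xbar)(y - ybar)) / sum((x - xbar)^2)
    sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
    sxx = sum((xi - xbar) ** 2 for xi in x)
    b1 = sxy / sxx
    # Intercept: b0 = ybar - b1 * xbar
    b0 = ybar - b1 * xbar
    return b0, b1

# On perfectly linear data (y = 2x + 1), OLS recovers the line exactly.
b0, b1 = ols_fit([1, 2, 3, 4], [3, 5, 7, 9])
print(b0, b1)  # -> 1.0 2.0
```

This is exactly what the regression function on your calculator or in Excel computes behind the scenes.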

Page 5:

Chapter 12 – Ordinary Least Squares Formulas

Example (Regression Output):
• We will consider the dataset “ShipCost” from your text (12.19 on p.438), which considers the relationship between Number of Orders (X) and Shipping Costs (Y).

• Using MegaStat we can generate a regression output (in handout)…

• Demonstration in Excel…

[Scatter plot of Ship Cost (Y) vs. Orders (X) with fitted line y = 4.9322x – 31.19, R² = 0.6717]

Page 6:

Chapter 12 – Ordinary Least Squares Formulas

Example (Regression Output):

Regression Analysis

r²            0.672        n          12
r             0.820        k           1
Std. Error  599.029        Dep. Var.  Ship Cost (Y)

Regression output                                                  confidence interval
variables     coefficients   std. error   t (df=10)   p-value    95% lower      95% upper
Intercept         -31.1895   1,059.8678      -0.029     .9771   -2,392.7222    2,330.3432
Orders (X)          4.9322       1.0905       4.523     .0011        2.5024        7.3619

ANOVA table

Source             SS                df   MS               F       p-value
Regression          7,340,819.5514    1    7,340,819.5514  20.46   .0011
Residual            3,588,357.1152   10      358,835.7115
Total              10,929,176.6667   11
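As a quick check (not in the original slides), the ANOVA figures fit together: SST = SSR + SSE, R² = SSR/SST, and F = MSR/MSE. A short Python sketch using the printed values:

```python
# Check the MegaStat output above for internal consistency (values from the table).
SSR = 7_340_819.5514   # regression (explained) sum of squares
SSE = 3_588_357.1152   # residual (unexplained) sum of squares
SST = 10_929_176.6667  # total sum of squares
n, k = 12, 1

assert abs(SSR + SSE - SST) < 0.01   # SST = SSR + SSE
r2 = SSR / SST
print(round(r2, 3))                  # -> 0.672, matching the output
F = (SSR / k) / (SSE / (n - k - 1))
print(round(F, 2))                   # -> 20.46, matching the ANOVA table
```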

Page 7:

Chapter 12 – Ordinary Least Squares Formulas

Assessing Fit:

• We want to explain the total variation in Y around its mean (SST, the total sum of squares).

• The regression sum of squares (SSR) is the explained variation in Y.

Page 8:

Chapter 12 – Ordinary Least Squares Formulas

Assessing Fit:

• The error sum of squares (SSE) is the unexplained variation in Y.

• If the fit is good, SSE will be relatively small compared to SST.

• A perfect fit is indicated by an SSE = 0.

• The magnitude of SSE depends on n and on the units of measurement.

Page 9:

Chapter 12 – Ordinary Least Squares Formulas

Coefficient of Determination:

R² = SSR/SST = 1 – SSE/SST, with 0 ≤ R² ≤ 1

• Often expressed as a percent, an R² = 1 (i.e., 100%) indicates perfect fit.

• In a bivariate regression, R² = (r)²

• R² is a measure of relative fit based on a comparison of SSR and SST.
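For the ShipCost output, the relation R² = (r)² is easy to verify in Python (a check added here, using the printed r = 0.820):

```python
# In bivariate regression, R^2 is the square of the correlation r (ShipCost output values).
r = 0.820
print(round(r ** 2, 3))   # -> 0.672, matching r-squared in the output
```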

Page 10:

Clickers

Suppose you have found the regression model for a given set of bivariate data. If the correlation is r = -0.72, what is the coefficient of determination?

(A) -0.5184

(B) 0.5184

(C) 0.7200

(D) 0.8485

(E) -0.8485

Page 11:

Chapter 12 – Test for Significance

Standard Error of Regression:

• The standard error (syx) is an overall measure of model fit:

syx = sqrt(SSE / (n – 2))

• If the fitted model’s predictions are perfect (SSE = 0), then syx = 0. Thus, a small syx indicates a better fit.

• Used to construct confidence intervals.

• Magnitude of syx depends on the units of measurement of Y and on data magnitude.
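Using the ShipCost ANOVA values, the standard error of the regression can be reproduced directly (a check added here, not in the original slides):

```python
import math

# Standard error of the regression, syx = sqrt(SSE / (n - 2)), ShipCost ANOVA values.
SSE = 3_588_357.1152
n = 12
s_yx = math.sqrt(SSE / (n - 2))
print(round(s_yx, 3))   # -> 599.029, matching "Std. Error" in the output
```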

Page 12:

Chapter 12 – Test for Significance

Confidence Intervals for Slope and Intercept:

• Standard error of the slope: s(b1) = syx / sqrt(Σ(x_i – x̄)²)

• Standard error of the intercept: s(b0) = syx · sqrt(Σx_i² / (n·Σ(x_i – x̄)²))

• Confidence interval for the true slope: b1 ± t(α/2, n–2) · s(b1)

• Confidence interval for the true intercept: b0 ± t(α/2, n–2) · s(b0)
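The 95% interval for the slope in the ShipCost output can be reproduced from the printed coefficient and standard error (a sketch added here; t_crit is t for α = .05 and df = 10 from Appendix D, rounded to 2.228):

```python
# 95% CI for the true slope: b1 +/- t * se(b1), using the ShipCost output values.
b1, se_b1 = 4.9322, 1.0905
t_crit = 2.228     # t(.025, 10) from Appendix D
lo = b1 - t_crit * se_b1
hi = b1 + t_crit * se_b1
print(round(lo, 4), round(hi, 4))
# -> 2.5026 7.3618 (vs. the output's 2.5024 and 7.3619; MegaStat uses an unrounded t)
```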

Page 13:

Chapter 12 – Test for Significance

• If β1 = 0, then X cannot influence Y and the regression model collapses to a constant β0 plus random error.

• The hypotheses to be tested are H0: β1 = 0 versus H1: β1 ≠ 0.

• These are tested in the standard regression output in any statistics package like MegaStat.

Hypothesis Tests:

Page 14:

Chapter 12 – Test for Significance

Hypothesis Tests:

• A t test is used with ν = n – 2 degrees of freedom. The test statistics for the slope and intercept are t = b1/s(b1) and t = b0/s(b0).

• t(α/2, n–2) is obtained from Appendix D or Excel for a given α.

• Reject H0 if |t| > t(α/2, n–2) or if p-value < α.

• The p-value is provided in the regression output.
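For the ShipCost slope, the t statistic is just the coefficient divided by its standard error, which matches the output (a check added here, using the printed values):

```python
# The slope's t statistic is the coefficient over its standard error (ShipCost output).
b1, se_b1 = 4.9322, 1.0905
t = b1 / se_b1
print(round(t, 3))   # -> 4.523, matching the output; p-value .0011 < .05, so reject H0
```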

Page 15:

Chapter 12 – Test for Significance

Example (Regression Output):
• Let’s revisit the regression output from the dataset “ShipCost” from your text (12.19 on p.438), which considers the relationship between Number of Orders (X) and Shipping Costs (Y).

• Go through tests for significance on β0 and β1.

Page 16:

Chapter 12 – Analysis of Variance

Decomposition of Variance:

• To explain the variation in the dependent variable around its mean, we decompose it as

SST (total variation around the mean) = SSR (variation explained by the regression) + SSE (unexplained or error variation)

• This same decomposition for the sums of squares is

Σ(y_i – ȳ)² = Σ(ŷ_i – ȳ)² + Σ(y_i – ŷ_i)²

Page 17:

Chapter 12 – Analysis of Variance

F Statistic for Overall Fit:

• For a bivariate regression, the F statistic is F = MSR/MSE = (SSR/1) / (SSE/(n – 2)), with 1 and n – 2 degrees of freedom.

• For a given sample size, a larger F statistic indicates a better fit.

• Reject H0 if F > F(1, n–2) from Appendix F for a given significance level α, or if p-value < α.
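A useful side note (added here): in a bivariate regression the overall-fit F statistic equals the square of the slope's t statistic, which is easy to verify with the ShipCost values:

```python
# In bivariate regression, F for overall fit equals the slope's t statistic squared.
t = 4.523                 # slope t statistic from the ShipCost output
print(round(t ** 2, 2))   # -> 20.46, matching F in the ANOVA table
```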

Page 18:

Chapter 12 – Analysis of Variance

Example (Regression Output):
• Let’s revisit the regression output from the dataset “ShipCost” from your text (12.19 on p.438), which considers the relationship between Number of Orders (X) and Shipping Costs (Y).

• Go through the Analysis of Variance (ANOVA) to assess overall fit.

Page 19:

Chapter 12 – Example

Example (Exam Scores):
• We will consider the dataset “ExamScores” from your text (Table 12.3 on p.434), which considers the relationship between Study Hours (X) and Exam Scores (Y).

• Generate MegaStat regression output.
• Output on overhead…

Page 20:

Clickers

If a randomly selected student had studied 12 hours for this exam, what score would this model predict (to the nearest %)?

(A) 51%

(B) 61%

(C) 73%

(D) 82%

Page 21:

Clickers

Find the p-value on the hypothesis test…

(A) 0.0012

(B) 0.0520

(C) 0.3940

(D) 1.9641

H0: β1 = 0
H1: β1 ≠ 0

Page 22:

Clickers

Recall from Tuesday’s lecture, the critical value for testing whether the correlation is significant is given by

r(α/2) = t(α/2, n–2) / sqrt(t(α/2, n–2)² + n – 2)

Compute the critical value and determine whether the correlation is significant using α = 10%.

(A) Yes, r is significant.
(B) No, r is not significant.

Page 23:

Clickers – Work…

Work…
Since n = 10 and α = 10%, t(α/2, n–2) = t(.05, 8) = 1.860. From the output, r = 0.628.

The critical value is

r(α/2) = t(α/2, n–2) / sqrt(t(α/2, n–2)² + n – 2) = 1.860 / sqrt(1.860² + 10 – 2) = 0.549

Since |r| = 0.628 > 0.549 = r(α/2), we can reject H0: ρ = 0 in favor of H1: ρ ≠ 0.

Or, using the t statistic…

T* = r·sqrt(n – 2) / sqrt(1 – r²) = 0.628·sqrt(10 – 2) / sqrt(1 – 0.628²) = 2.282

Since |T*| = 2.282 > t(α/2, n–2) = t(.05, 8) = 1.860, we reach the same conclusion. The correlation is significant.
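The arithmetic in the work above can be reproduced in a few lines of Python (a check added here, using the slide's n, α, and r):

```python
import math

# Reproduce the work above: critical value for r and the t statistic for the correlation.
n = 10
t_crit = 1.860                   # t(.05, 8) from Appendix D
r_crit = t_crit / math.sqrt(t_crit ** 2 + n - 2)
print(round(r_crit, 3))          # -> 0.549

r = 0.628
T = r * math.sqrt(n - 2) / math.sqrt(1 - r ** 2)
print(round(T, 3))               # -> 2.282
```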

Page 24:

Chapter 12 – Confidence & Prediction Intervals for Y

How to Construct an Interval Estimate for Y

• The regression line is an estimate of the conditional mean of Y.

• An interval estimate is used to show a range of likely values around the point estimate.

• Confidence interval for the conditional mean of Y:

ŷ ± t(α/2, n–2) · syx · sqrt(1/n + (x0 – x̄)² / Σ(x_i – x̄)²)

Page 25:

Chapter 12 – Confidence & Prediction Intervals for Y

How to Construct an Interval Estimate for Y

• Prediction interval for individual values of Y:

ŷ ± t(α/2, n–2) · syx · sqrt(1 + 1/n + (x0 – x̄)² / Σ(x_i – x̄)²)

• Prediction intervals are wider than confidence intervals because individual Y values vary more than the mean of Y.
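The two intervals differ only by the extra "1 +" under the square root. A Python sketch (added here, on a hypothetical small dataset; t_crit is t(.025, 3) for df = n – 2 = 3) shows the prediction interval is always the wider one:

```python
import math

# CI for the conditional mean of Y vs. PI for an individual Y at a point x0
# (hypothetical data, not from the text).
x = [1, 2, 3, 4, 5]
y = [2.1, 3.9, 6.2, 7.8, 10.1]
n = len(x)
xbar, ybar = sum(x) / n, sum(y) / n
sxx = sum((xi - xbar) ** 2 for xi in x)
b1 = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y)) / sxx
b0 = ybar - b1 * xbar
sse = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))
s_yx = math.sqrt(sse / (n - 2))              # standard error of the regression

x0, t_crit = 3.0, 3.182                      # t(.025, 3)
h = 1 / n + (x0 - xbar) ** 2 / sxx           # term shared by both intervals
ci_half = t_crit * s_yx * math.sqrt(h)       # half-width of the CI for the mean of Y
pi_half = t_crit * s_yx * math.sqrt(1 + h)   # half-width of the PI for an individual Y
assert pi_half > ci_half                     # the prediction interval is always wider
```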

Page 26:

Chapter 12 – Confidence & Prediction Intervals for Y

MegaStat’s Confidence and Prediction Intervals: