
Slides by JOHN LOUCKS
St. Edward's University

© 2008 Thomson South-Western. All Rights Reserved


Chapter 14 Simple Linear Regression

• Simple Linear Regression Model
• Least Squares Method
• Coefficient of Determination
• Model Assumptions
• Testing for Significance
• Using the Estimated Regression Equation for Estimation and Prediction
• Residual Analysis: Validating Model Assumptions
• Outliers and Influential Observations


Simple Linear Regression

• Managerial decisions often are based on the relationship between two or more variables.

• Regression analysis can be used to develop an equation showing how the variables are related.

• The variable being predicted is called the dependent variable and is denoted by y.

• The variables being used to predict the value of the dependent variable are called the independent variables and are denoted by x.


Simple Linear Regression

• Simple linear regression involves one independent variable and one dependent variable.

• The relationship between the two variables is approximated by a straight line.

• Regression analysis involving two or more independent variables is called multiple regression.


Simple Linear Regression Model

The equation that describes how y is related to x and an error term is called the regression model.

The simple linear regression model is:

y = β0 + β1x + ε

where: β0 and β1 are called parameters of the model, and ε is a random variable called the error term.


Simple Linear Regression Equation

The simple linear regression equation is:

E(y) = β0 + β1x

• E(y) is the expected value of y for a given x value.
• β1 is the slope of the regression line.
• β0 is the y-intercept of the regression line.
• The graph of the regression equation is a straight line.


Simple Linear Regression Equation

Positive Linear Relationship

[Graph: E(y) versus x; the regression line rises from intercept β0 with positive slope β1]


Simple Linear Regression Equation

Negative Linear Relationship

[Graph: E(y) versus x; the regression line falls from intercept β0 with negative slope β1]


Simple Linear Regression Equation

No Relationship

[Graph: E(y) versus x; the regression line is horizontal at intercept β0 (slope β1 is 0)]


Estimated Simple Linear Regression Equation

The estimated simple linear regression equation is:

ŷ = b0 + b1x

• ŷ is the estimated value of y for a given x value.
• b1 is the slope of the line.
• b0 is the y-intercept of the line.
• The graph is called the estimated regression line.


Estimation Process

Regression model: y = β0 + β1x + ε
Regression equation: E(y) = β0 + β1x
Unknown parameters: β0, β1

Sample data: (x1, y1), (x2, y2), . . . , (xn, yn)

Sample statistics: b0, b1
Estimated regression equation: ŷ = b0 + b1x

b0 and b1 provide estimates of β0 and β1.


Least Squares Method

Least Squares Criterion

min Σ(yi − ŷi)²

where:
 yi = observed value of the dependent variable for the ith observation
 ŷi = estimated value of the dependent variable for the ith observation


Least Squares Method

Slope for the Estimated Regression Equation:

b1 = Σ(xi − x̄)(yi − ȳ) / Σ(xi − x̄)²

where:
 xi = value of independent variable for ith observation
 yi = value of dependent variable for ith observation
 x̄ = mean value for independent variable
 ȳ = mean value for dependent variable


Least Squares Method

y-Intercept for the Estimated Regression Equation:

b0 = ȳ − b1x̄


Simple Linear Regression

Example: Reed Auto Sales

Reed Auto periodically has a special week-long sale. As part of the advertising campaign, Reed runs one or more television commercials during the weekend preceding the sale. Data from a sample of 5 previous sales are shown on the next slide.


Simple Linear Regression

Example: Reed Auto Sales

 Number of TV Ads (x)    Number of Cars Sold (y)
          1                        14
          3                        24
          2                        18
          1                        17
          3                        27
      Σx = 10                  Σy = 100
      x̄ = 2                    ȳ = 20


Estimated Regression Equation

Slope for the Estimated Regression Equation:
 b1 = Σ(xi − x̄)(yi − ȳ) / Σ(xi − x̄)² = 20/4 = 5

y-Intercept for the Estimated Regression Equation:
 b0 = ȳ − b1x̄ = 20 − 5(2) = 10

Estimated Regression Equation:
 ŷ = 10 + 5x
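The fitted values above are easy to check with a few lines of code. Below is a minimal Python sketch (an addition, not part of the original slides) that recomputes the least squares estimates b1 and b0 from the Reed Auto data.

```python
# Least squares estimates for the Reed Auto Sales data (illustrative sketch).
x = [1, 3, 2, 1, 3]       # number of TV ads
y = [14, 24, 18, 17, 27]  # number of cars sold

n = len(x)
x_bar = sum(x) / n
y_bar = sum(y) / n

# b1 = sum((xi - x_bar)(yi - y_bar)) / sum((xi - x_bar)^2)
sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
sxx = sum((xi - x_bar) ** 2 for xi in x)
b1 = sxy / sxx
b0 = y_bar - b1 * x_bar

print(b1, b0)  # 5.0 10.0, matching the slide: y-hat = 10 + 5x
```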


Scatter Diagram and Trend Line

[Scatter diagram of Cars Sold (0 to 30) versus TV Ads (0 to 4) with fitted trend line y = 5x + 10]


Coefficient of Determination

Relationship Among SST, SSR, SSE

SST = SSR + SSE

Σ(yi − ȳ)² = Σ(ŷi − ȳ)² + Σ(yi − ŷi)²

where:
 SST = total sum of squares
 SSR = sum of squares due to regression
 SSE = sum of squares due to error


Coefficient of Determination

The coefficient of determination is:

r² = SSR/SST

where:
 SSR = sum of squares due to regression
 SST = total sum of squares


Coefficient of Determination

r² = SSR/SST = 100/114 = .8772

The regression relationship is very strong; 87.7% of the variability in the number of cars sold can be explained by the linear relationship between the number of TV ads and the number of cars sold.
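The sums of squares behind r² can be reproduced directly; the following Python sketch (an addition, not from the slides) recomputes SST, SSR, SSE, and r² for the Reed Auto data using the fitted equation ŷ = 10 + 5x.

```python
# Sums of squares and the coefficient of determination (illustrative sketch).
x = [1, 3, 2, 1, 3]
y = [14, 24, 18, 17, 27]
b0, b1 = 10, 5                       # least squares estimates
y_hat = [b0 + b1 * xi for xi in x]   # predicted cars sold
y_bar = sum(y) / len(y)

sst = sum((yi - y_bar) ** 2 for yi in y)                # total sum of squares
ssr = sum((yh - y_bar) ** 2 for yh in y_hat)            # sum of squares due to regression
sse = sum((yi - yh) ** 2 for yi, yh in zip(y, y_hat))   # sum of squares due to error

print(sst, ssr, sse)        # 114.0 100.0 14.0  (SST = SSR + SSE)
print(round(ssr / sst, 4))  # 0.8772
```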


Sample Correlation Coefficient

rxy = (sign of b1) √(Coefficient of Determination)

rxy = (sign of b1) √r²

where:
 b1 = the slope of the estimated regression equation ŷ = b0 + b1x


Sample Correlation Coefficient

rxy = (sign of b1) √r²

The sign of b1 in the equation ŷ = 10 + 5x is “+”.

rxy = +√.8772 = +.9366
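A quick check in Python (an added sketch): attach the sign of b1 to the square root of r².

```python
import math

r2 = 100 / 114   # coefficient of determination from the previous slide
b1 = 5           # slope of the estimated regression equation

r_xy = math.copysign(math.sqrt(r2), b1)  # sample correlation with the sign of b1
print(round(r_xy, 4))  # 0.9366
```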


Assumptions About the Error Term ε

1. The error ε is a random variable with mean of zero.

2. The variance of ε, denoted by σ², is the same for all values of the independent variable.

3. The values of ε are independent.

4. The error ε is a normally distributed random variable.


Testing for Significance

To test for a significant regression relationship, we must conduct a hypothesis test to determine whether the value of β1 is zero.

Two tests are commonly used: the t test and the F test.

Both the t test and the F test require an estimate of σ², the variance of ε in the regression model.


Testing for Significance

An Estimate of σ²

The mean square error (MSE) provides the estimate of σ², and the notation s² is also used.

s² = MSE = SSE/(n − 2)

where:

SSE = Σ(yi − ŷi)² = Σ(yi − b0 − b1xi)²


Testing for Significance

An Estimate of σ

s = √MSE = √(SSE/(n − 2))

• To estimate σ we take the square root of s² (MSE).
• The resulting s is called the standard error of the estimate.
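In Python, the estimates of σ² and σ for the Reed Auto data can be sketched as follows (an addition, using SSE = 14 and n = 5 from the earlier slides).

```python
import math

sse = 14  # sum of squares due to error for the Reed Auto fit
n = 5     # number of observations

mse = sse / (n - 2)  # s^2 = MSE, the estimate of sigma^2
s = math.sqrt(mse)   # standard error of the estimate
print(round(mse, 3), round(s, 3))  # 4.667 2.16
```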


Testing for Significance: t Test

Hypotheses

 H0: β1 = 0
 Ha: β1 ≠ 0

Test Statistic

 t = b1 / sb1

where:

 sb1 = s / √Σ(xi − x̄)²


Testing for Significance: t Test

Rejection Rule

 Reject H0 if p-value < α
 or t < −tα/2 or t > tα/2

where: tα/2 is based on a t distribution with n − 2 degrees of freedom


Testing for Significance: t Test

1. Determine the hypotheses.
 H0: β1 = 0
 Ha: β1 ≠ 0

2. Specify the level of significance.
 α = .05

3. Select the test statistic.
 t = b1 / sb1

4. State the rejection rule.
 Reject H0 if p-value < .05 or |t| > 3.182 (with 3 degrees of freedom)


Testing for Significance: t Test

5. Compute the value of the test statistic.
 t = b1 / sb1 = 5 / 1.08 = 4.63

6. Determine whether to reject H0.
 t = 4.541 provides an area of .01 in the upper tail. Hence, the p-value is less than .02. (Also, t = 4.63 > 3.182.) We can reject H0.
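Steps 5 and 6 can be reproduced with the short Python sketch below (an addition, not from the slides); s and Σ(xi − x̄)² come from the earlier Reed Auto calculations.

```python
import math

b1 = 5                 # estimated slope
s = math.sqrt(14 / 3)  # standard error of the estimate (square root of MSE)
sxx = 4                # sum of (xi - x_bar)^2 for the TV ads data

s_b1 = s / math.sqrt(sxx)  # estimated standard deviation of b1
t = b1 / s_b1              # t test statistic
print(round(s_b1, 2), round(t, 2))  # 1.08 4.63

# Compare with the critical value t = 3.182 (alpha = .05, 3 degrees of freedom):
print(abs(t) > 3.182)  # True, so H0: beta1 = 0 is rejected
```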


Confidence Interval for β1

We can use a 95% confidence interval for β1 to test the hypotheses just used in the t test.

H0 is rejected if the hypothesized value of β1 is not included in the confidence interval for β1.


Confidence Interval for β1

The form of a confidence interval for β1 is:

 b1 ± tα/2 · sb1

where b1 is the point estimator and tα/2 · sb1 is the margin of error; tα/2 is the t value providing an area of α/2 in the upper tail of a t distribution with n − 2 degrees of freedom.


Confidence Interval for β1

Rejection Rule
 Reject H0 if 0 is not included in the confidence interval for β1.

95% Confidence Interval for β1
 b1 ± tα/2 · sb1 = 5 ± 3.182(1.08) = 5 ± 3.44, or 1.56 to 8.44

Conclusion
 0 is not included in the confidence interval. Reject H0.
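The same conclusion follows from computing the interval directly; here is a minimal Python sketch (an addition) of the 95% confidence interval calculation.

```python
b1 = 5         # point estimator of beta1
s_b1 = 1.08    # estimated standard deviation of b1
t_025 = 3.182  # t value for alpha/2 = .025 with 3 degrees of freedom

margin = t_025 * s_b1
lower, upper = b1 - margin, b1 + margin
print(round(lower, 2), round(upper, 2))  # 1.56 8.44

# Reject H0: beta1 = 0 because 0 is not inside the interval.
print(lower <= 0 <= upper)  # False
```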


Testing for Significance: F Test

Hypotheses

 H0: β1 = 0
 Ha: β1 ≠ 0

Test Statistic

 F = MSR/MSE


Testing for Significance: F Test

Rejection Rule

 Reject H0 if p-value < α or F > Fα

where: Fα is based on an F distribution with 1 degree of freedom in the numerator and n − 2 degrees of freedom in the denominator


Testing for Significance: F Test

1. Determine the hypotheses.
 H0: β1 = 0
 Ha: β1 ≠ 0

2. Specify the level of significance.
 α = .05

3. Select the test statistic.
 F = MSR/MSE

4. State the rejection rule.
 Reject H0 if p-value < .05 or F > 10.13 (with 1 d.f. in the numerator and 3 d.f. in the denominator)


Testing for Significance: F Test

5. Compute the value of the test statistic.
 F = MSR/MSE = 100/4.667 = 21.43

6. Determine whether to reject H0.
 F = 17.44 provides an area of .025 in the upper tail. Thus, the p-value corresponding to F = 21.43 is less than 2(.025) = .05. Hence, we reject H0.

The statistical evidence is sufficient to conclude that we have a significant relationship between the number of TV ads aired and the number of cars sold.
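The F statistic in step 5 can be reproduced with the Python sketch below (an addition), using SSR = 100 and SSE = 14 from the earlier slides.

```python
ssr = 100  # sum of squares due to regression
sse = 14   # sum of squares due to error
n = 5      # number of observations

msr = ssr / 1        # MSR = SSR / number of independent variables
mse = sse / (n - 2)  # MSE = SSE / (n - 2)
f = msr / mse
print(round(f, 2))   # 21.43

# Compare with the critical value F = 10.13 (alpha = .05; 1 and 3 degrees of freedom):
print(f > 10.13)     # True, so H0: beta1 = 0 is rejected
```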


Some Cautions about the Interpretation of Significance Tests

Just because we are able to reject H0: β1 = 0 and demonstrate statistical significance does not enable us to conclude that there is a linear relationship between x and y.

Rejecting H0: β1 = 0 and concluding that the relationship between x and y is significant does not enable us to conclude that a cause-and-effect relationship is present between x and y.


Using the Estimated Regression Equation for Estimation and Prediction

Confidence Interval Estimate of E(yp):
 ŷp ± tα/2 · sŷp

Prediction Interval Estimate of yp:
 ŷp ± tα/2 · sind

where: the confidence coefficient is 1 − α and tα/2 is based on a t distribution with n − 2 degrees of freedom


Point Estimation

If 3 TV ads are run prior to a sale, we expect the mean number of cars sold to be:

 ŷ = 10 + 5(3) = 25 cars
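The point estimate, together with the confidence and prediction intervals from the previous slide, can be sketched in Python as below. This is an addition: the slides do not show the formulas for sŷp and sind, so the usual textbook expressions s√(1/n + (xp − x̄)²/Σ(xi − x̄)²) and s√(1 + 1/n + (xp − x̄)²/Σ(xi − x̄)²) are assumed here.

```python
import math

x = [1, 3, 2, 1, 3]
y = [14, 24, 18, 17, 27]
b0, b1 = 10, 5
n = len(x)
x_bar = sum(x) / n
sxx = sum((xi - x_bar) ** 2 for xi in x)
s = math.sqrt(14 / (n - 2))  # standard error of the estimate
t_025 = 3.182                # t value with 3 degrees of freedom

xp = 3                       # 3 TV ads
y_p = b0 + b1 * xp           # point estimate: 25 cars

# Assumed standard error formulas (not reproduced on the slides):
s_yhat = s * math.sqrt(1 / n + (xp - x_bar) ** 2 / sxx)     # for the mean E(yp)
s_ind = s * math.sqrt(1 + 1 / n + (xp - x_bar) ** 2 / sxx)  # for an individual yp

print(y_p)  # 25
print(round(y_p - t_025 * s_yhat, 2), round(y_p + t_025 * s_yhat, 2))  # about 20.39 29.61
print(round(y_p - t_025 * s_ind, 2), round(y_p + t_025 * s_ind, 2))    # about 16.72 33.28
```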


Residual Analysis

Residual for Observation i:
 yi − ŷi

The residuals provide the best information about ε.

Much of residual analysis is based on an examination of graphical plots.

If the assumptions about the error term ε appear questionable, the hypothesis tests about the significance of the regression relationship and the interval estimation results may not be valid.
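The residuals for the Reed Auto data are easy to compute; the short Python sketch below (an addition) produces the values shown in the residual table a few slides later.

```python
x = [1, 3, 2, 1, 3]
y = [14, 24, 18, 17, 27]
b0, b1 = 10, 5

y_hat = [b0 + b1 * xi for xi in x]                 # predicted cars sold
residuals = [yi - yh for yi, yh in zip(y, y_hat)]  # yi - y-hat_i
print(y_hat)      # [15, 25, 20, 15, 25]
print(residuals)  # [-1, -1, -2, 2, 2]
```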


Residual Plot Against x

If the assumption that the variance of ε is the same for all values of x is valid, and the assumed regression model is an adequate representation of the relationship between the variables, then the residual plot should give an overall impression of a horizontal band of points.


Residual Plot Against x

[Plot: residuals (y − ŷ) versus x form a horizontal band around 0 (Good Pattern)]


Residual Plot Against x

[Plot: the spread of the residuals (y − ŷ) increases with x (Nonconstant Variance)]


Residual Plot Against x

[Plot: the residuals (y − ŷ) show a curved pattern against x (Model Form Not Adequate)]


Residual Plot Against x

Residuals

 Observation   Predicted Cars Sold   Residuals
      1                 15              -1
      2                 25              -1
      3                 20              -2
      4                 15               2
      5                 25               2


Residual Plot Against x

[TV Ads Residual Plot: residuals (−3 to +3) versus TV Ads (0 to 4)]


Standardized Residuals

Standardized Residual for Observation i:

 (yi − ŷi) / s(yi − ŷi)

where:

 s(yi − ŷi) = s √(1 − hi)

 hi = 1/n + (xi − x̄)² / Σ(xi − x̄)²
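A Python sketch of these formulas for the Reed Auto data is shown below (an addition). Note that the spreadsheet "Standard Residuals" column reproduced on the following slides appears to scale each residual by the sample standard deviation of the residuals rather than by s√(1 − hi), so its values differ somewhat from the leverage-adjusted values computed here.

```python
import math

x = [1, 3, 2, 1, 3]
y = [14, 24, 18, 17, 27]
b0, b1 = 10, 5
n = len(x)
x_bar = sum(x) / n
sxx = sum((xi - x_bar) ** 2 for xi in x)
s = math.sqrt(14 / (n - 2))  # standard error of the estimate

for i, (xi, yi) in enumerate(zip(x, y), start=1):
    resid = yi - (b0 + b1 * xi)                 # residual yi - y-hat_i
    h = 1 / n + (xi - x_bar) ** 2 / sxx         # leverage h_i
    std_resid = resid / (s * math.sqrt(1 - h))  # standardized residual
    print(i, resid, round(h, 2), round(std_resid, 3))
# First observation, for example: 1 -1 0.45 -0.624
```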


Standardized Residual Plot

The standardized residual plot can provide insight about the assumption that the error term ε has a normal distribution.

If this assumption is satisfied, the distribution of the standardized residuals should appear to come from a standard normal probability distribution.


Standardized Residual Plot

Standardized Residuals

 Observation   Predicted Y   Residuals   Standard Residuals
      1             15          -1            -0.535
      2             25          -1            -0.535
      3             20          -2            -1.069
      4             15           2             1.069
      5             25           2             1.069


Standardized Residual Plot

[Excel RESIDUAL OUTPUT: observations 1 to 5 with Predicted Y (15, 25, 20, 15, 25), Residuals (−1, −1, −2, 2, 2), and Standard Residuals (−0.534522, −0.534522, −1.069045, 1.069045, 1.069045)]

[Plot: Standard Residuals (−1.5 to 1.5) versus Cars Sold (0 to 30)]


Standardized Residual Plot

All of the standardized residuals are between −1.5 and +1.5, indicating that there is no reason to question the assumption that ε has a normal distribution.


Outliers and Influential Observations

Detecting Outliers

• An outlier is an observation that is unusual in comparison with the other data.

• Minitab classifies an observation as an outlier if its standardized residual value is < -2 or > +2.

• This standardized residual rule sometimes fails to identify an unusually large observation as being an outlier.

• This rule’s shortcoming can be circumvented by using studentized deleted residuals.

• The |i th studentized deleted residual| will be larger than the |i th standardized residual|.


End of Chapter 14