
Slides by JOHN LOUCKS
St. Edward's University

© 2008 Thomson South-Western. All Rights Reserved


Chapter 14 Simple Linear Regression

• Simple Linear Regression Model
• Least Squares Method
• Coefficient of Determination
• Model Assumptions
• Testing for Significance
• Using the Estimated Regression Equation for Estimation and Prediction
• Residual Analysis: Validating Model Assumptions
• Outliers and Influential Observations


Simple Linear Regression

• Managerial decisions often are based on the relationship between two or more variables.

• Regression analysis can be used to develop an equation showing how the variables are related.

• The variable being predicted is called the dependent variable and is denoted by y.

• The variables being used to predict the value of the dependent variable are called the independent variables and are denoted by x.


Simple Linear Regression

• Simple linear regression involves one independent variable and one dependent variable.

• The relationship between the two variables is approximated by a straight line.

• Regression analysis involving two or more independent variables is called multiple regression.


Simple Linear Regression Model

The equation that describes how y is related to x and an error term is called the regression model.

The simple linear regression model is:

y = β0 + β1x + ε

where: β0 and β1 are called parameters of the model, and ε is a random variable called the error term.


Simple Linear Regression Equation

The simple linear regression equation is:

E(y) = β0 + β1x

• E(y) is the expected value of y for a given x value.
• β1 is the slope of the regression line.
• β0 is the y-intercept of the regression line.
• The graph of the regression equation is a straight line.


Simple Linear Regression Equation

Positive Linear Relationship

[Graph: E(y) versus x; the regression line rises from intercept β0 with positive slope β1]


Simple Linear Regression Equation

Negative Linear Relationship

[Graph: E(y) versus x; the regression line falls from intercept β0 with negative slope β1]


Simple Linear Regression Equation

No Relationship

[Graph: E(y) versus x; the regression line is horizontal at intercept β0 (slope β1 is 0)]


Estimated Simple Linear Regression Equation

The estimated simple linear regression equation is:

ŷ = b0 + b1x

• ŷ is the estimated value of y for a given x value.
• b1 is the slope of the line.
• b0 is the y-intercept of the line.
• The graph is called the estimated regression line.


Estimation Process

Regression model: y = β0 + β1x + ε
Regression equation: E(y) = β0 + β1x
Unknown parameters: β0, β1

Sample data: (x1, y1), (x2, y2), . . . , (xn, yn)

Sample statistics: b0, b1
Estimated regression equation: ŷ = b0 + b1x

b0 and b1 provide estimates of β0 and β1.


Least Squares Method

Least Squares Criterion

min Σ(yi − ŷi)²

where:
 yi = observed value of the dependent variable for the ith observation
 ŷi = estimated value of the dependent variable for the ith observation


Least Squares Method

Slope for the Estimated Regression Equation:

b1 = Σ(xi − x̄)(yi − ȳ) / Σ(xi − x̄)²

where:
 xi = value of independent variable for ith observation
 yi = value of dependent variable for ith observation
 x̄ = mean value for independent variable
 ȳ = mean value for dependent variable


Least Squares Method

y-Intercept for the Estimated Regression Equation:

b0 = ȳ − b1x̄


Simple Linear Regression

Example: Reed Auto Sales

Reed Auto periodically has a special week-long sale. As part of the advertising campaign, Reed runs one or more television commercials during the weekend preceding the sale. Data from a sample of 5 previous sales are shown on the next slide.


Simple Linear Regression

Example: Reed Auto Sales

 Number of TV Ads (x)    Number of Cars Sold (y)
          1                        14
          3                        24
          2                        18
          1                        17
          3                        27
      Σx = 10                  Σy = 100
      x̄ = 2                    ȳ = 20


Estimated Regression Equation

Slope for the Estimated Regression Equation:
 b1 = Σ(xi − x̄)(yi − ȳ) / Σ(xi − x̄)² = 20/4 = 5

y-Intercept for the Estimated Regression Equation:
 b0 = ȳ − b1x̄ = 20 − 5(2) = 10

Estimated Regression Equation:
 ŷ = 10 + 5x
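The fitted values above are easy to check with a few lines of code. Below is a minimal Python sketch (an addition, not part of the original slides) that recomputes the least squares estimates b1 and b0 from the Reed Auto data.

```python
# Least squares estimates for the Reed Auto Sales data (illustrative sketch).
x = [1, 3, 2, 1, 3]       # number of TV ads
y = [14, 24, 18, 17, 27]  # number of cars sold

n = len(x)
x_bar = sum(x) / n
y_bar = sum(y) / n

# b1 = sum((xi - x_bar)(yi - y_bar)) / sum((xi - x_bar)^2)
sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
sxx = sum((xi - x_bar) ** 2 for xi in x)
b1 = sxy / sxx
b0 = y_bar - b1 * x_bar

print(b1, b0)  # 5.0 10.0, matching the slide: y-hat = 10 + 5x
```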


Scatter Diagram and Trend Line

[Scatter diagram of Cars Sold (0 to 30) versus TV Ads (0 to 4) with fitted trend line y = 5x + 10]


Coefficient of Determination

Relationship Among SST, SSR, SSE

SST = SSR + SSE

Σ(yi − ȳ)² = Σ(ŷi − ȳ)² + Σ(yi − ŷi)²

where:
 SST = total sum of squares
 SSR = sum of squares due to regression
 SSE = sum of squares due to error


Coefficient of Determination

The coefficient of determination is:

r² = SSR/SST

where:
 SSR = sum of squares due to regression
 SST = total sum of squares


Coefficient of Determination

r² = SSR/SST = 100/114 = .8772

The regression relationship is very strong; 87.7% of the variability in the number of cars sold can be explained by the linear relationship between the number of TV ads and the number of cars sold.
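The sums of squares behind r² can be reproduced directly; the following Python sketch (an addition, not from the slides) recomputes SST, SSR, SSE, and r² for the Reed Auto data using the fitted equation ŷ = 10 + 5x.

```python
# Sums of squares and the coefficient of determination (illustrative sketch).
x = [1, 3, 2, 1, 3]
y = [14, 24, 18, 17, 27]
b0, b1 = 10, 5                       # least squares estimates
y_hat = [b0 + b1 * xi for xi in x]   # predicted cars sold
y_bar = sum(y) / len(y)

sst = sum((yi - y_bar) ** 2 for yi in y)                # total sum of squares
ssr = sum((yh - y_bar) ** 2 for yh in y_hat)            # sum of squares due to regression
sse = sum((yi - yh) ** 2 for yi, yh in zip(y, y_hat))   # sum of squares due to error

print(sst, ssr, sse)        # 114.0 100.0 14.0  (SST = SSR + SSE)
print(round(ssr / sst, 4))  # 0.8772
```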


Sample Correlation Coefficient

rxy = (sign of b1) √(Coefficient of Determination)

rxy = (sign of b1) √r²

where:
 b1 = the slope of the estimated regression equation ŷ = b0 + b1x


Sample Correlation Coefficient

rxy = (sign of b1) √r²

The sign of b1 in the equation ŷ = 10 + 5x is “+”.

rxy = +√.8772 = +.9366
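A quick check in Python (an added sketch): attach the sign of b1 to the square root of r².

```python
import math

r2 = 100 / 114   # coefficient of determination from the previous slide
b1 = 5           # slope of the estimated regression equation

r_xy = math.copysign(math.sqrt(r2), b1)  # sample correlation with the sign of b1
print(round(r_xy, 4))  # 0.9366
```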


Assumptions About the Error Term ε

1. The error ε is a random variable with mean of zero.

2. The variance of ε, denoted by σ², is the same for all values of the independent variable.

3. The values of ε are independent.

4. The error ε is a normally distributed random variable.


Testing for Significance

To test for a significant regression relationship, we must conduct a hypothesis test to determine whether the value of β1 is zero.

Two tests are commonly used: the t test and the F test.

Both the t test and the F test require an estimate of σ², the variance of ε in the regression model.


Testing for Significance

An Estimate of σ²

The mean square error (MSE) provides the estimate of σ², and the notation s² is also used.

s² = MSE = SSE/(n − 2)

where:

SSE = Σ(yi − ŷi)² = Σ(yi − b0 − b1xi)²


Testing for Significance

An Estimate of σ

s = √MSE = √(SSE/(n − 2))

• To estimate σ we take the square root of s² (MSE).
• The resulting s is called the standard error of the estimate.
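In Python, the estimates of σ² and σ for the Reed Auto data can be sketched as follows (an addition, using SSE = 14 and n = 5 from the earlier slides).

```python
import math

sse = 14  # sum of squares due to error for the Reed Auto fit
n = 5     # number of observations

mse = sse / (n - 2)  # s^2 = MSE, the estimate of sigma^2
s = math.sqrt(mse)   # standard error of the estimate
print(round(mse, 3), round(s, 3))  # 4.667 2.16
```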


Testing for Significance: t Test

Hypotheses

 H0: β1 = 0
 Ha: β1 ≠ 0

Test Statistic

 t = b1 / sb1

where:

 sb1 = s / √Σ(xi − x̄)²


Testing for Significance: t Test

Rejection Rule

 Reject H0 if p-value < α
 or t < −tα/2 or t > tα/2

where: tα/2 is based on a t distribution with n − 2 degrees of freedom


Testing for Significance: t Test

1. Determine the hypotheses.
 H0: β1 = 0
 Ha: β1 ≠ 0

2. Specify the level of significance.
 α = .05

3. Select the test statistic.
 t = b1 / sb1

4. State the rejection rule.
 Reject H0 if p-value < .05 or |t| > 3.182 (with 3 degrees of freedom)


Testing for Significance: t Test

5. Compute the value of the test statistic.
 t = b1 / sb1 = 5 / 1.08 = 4.63

6. Determine whether to reject H0.
 t = 4.541 provides an area of .01 in the upper tail. Hence, the p-value is less than .02. (Also, t = 4.63 > 3.182.) We can reject H0.
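Steps 5 and 6 can be reproduced with the short Python sketch below (an addition, not from the slides); s and Σ(xi − x̄)² come from the earlier Reed Auto calculations.

```python
import math

b1 = 5                 # estimated slope
s = math.sqrt(14 / 3)  # standard error of the estimate (square root of MSE)
sxx = 4                # sum of (xi - x_bar)^2 for the TV ads data

s_b1 = s / math.sqrt(sxx)  # estimated standard deviation of b1
t = b1 / s_b1              # t test statistic
print(round(s_b1, 2), round(t, 2))  # 1.08 4.63

# Compare with the critical value t = 3.182 (alpha = .05, 3 degrees of freedom):
print(abs(t) > 3.182)  # True, so H0: beta1 = 0 is rejected
```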


Confidence Interval for β1

We can use a 95% confidence interval for β1 to test the hypotheses just used in the t test.

H0 is rejected if the hypothesized value of β1 is not included in the confidence interval for β1.


Confidence Interval for β1

The form of a confidence interval for β1 is:

 b1 ± tα/2 · sb1

where b1 is the point estimator and tα/2 · sb1 is the margin of error; tα/2 is the t value providing an area of α/2 in the upper tail of a t distribution with n − 2 degrees of freedom.


Confidence Interval for β1

Rejection Rule
 Reject H0 if 0 is not included in the confidence interval for β1.

95% Confidence Interval for β1
 b1 ± tα/2 · sb1 = 5 ± 3.182(1.08) = 5 ± 3.44, or 1.56 to 8.44

Conclusion
 0 is not included in the confidence interval. Reject H0.
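The same conclusion follows from computing the interval directly; here is a minimal Python sketch (an addition) of the 95% confidence interval calculation.

```python
b1 = 5         # point estimator of beta1
s_b1 = 1.08    # estimated standard deviation of b1
t_025 = 3.182  # t value for alpha/2 = .025 with 3 degrees of freedom

margin = t_025 * s_b1
lower, upper = b1 - margin, b1 + margin
print(round(lower, 2), round(upper, 2))  # 1.56 8.44

# Reject H0: beta1 = 0 because 0 is not inside the interval.
print(lower <= 0 <= upper)  # False
```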


Testing for Significance: F Test

Hypotheses

 H0: β1 = 0
 Ha: β1 ≠ 0

Test Statistic

 F = MSR/MSE


Testing for Significance: F Test

Rejection Rule

 Reject H0 if p-value < α or F > Fα

where: Fα is based on an F distribution with 1 degree of freedom in the numerator and n − 2 degrees of freedom in the denominator


Testing for Significance: F Test

1. Determine the hypotheses.
 H0: β1 = 0
 Ha: β1 ≠ 0

2. Specify the level of significance.
 α = .05

3. Select the test statistic.
 F = MSR/MSE

4. State the rejection rule.
 Reject H0 if p-value < .05 or F > 10.13 (with 1 d.f. in the numerator and 3 d.f. in the denominator)


Testing for Significance: F Test

5. Compute the value of the test statistic.
 F = MSR/MSE = 100/4.667 = 21.43

6. Determine whether to reject H0.
 F = 17.44 provides an area of .025 in the upper tail. Thus, the p-value corresponding to F = 21.43 is less than 2(.025) = .05. Hence, we reject H0.

The statistical evidence is sufficient to conclude that we have a significant relationship between the number of TV ads aired and the number of cars sold.
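The F statistic in step 5 can be reproduced with the Python sketch below (an addition), using SSR = 100 and SSE = 14 from the earlier slides.

```python
ssr = 100  # sum of squares due to regression
sse = 14   # sum of squares due to error
n = 5      # number of observations

msr = ssr / 1        # MSR = SSR / number of independent variables
mse = sse / (n - 2)  # MSE = SSE / (n - 2)
f = msr / mse
print(round(f, 2))   # 21.43

# Compare with the critical value F = 10.13 (alpha = .05; 1 and 3 degrees of freedom):
print(f > 10.13)     # True, so H0: beta1 = 0 is rejected
```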


Some Cautions about the Interpretation of Significance Tests

Just because we are able to reject H0: β1 = 0 and demonstrate statistical significance does not enable us to conclude that there is a linear relationship between x and y.

Rejecting H0: β1 = 0 and concluding that the relationship between x and y is significant does not enable us to conclude that a cause-and-effect relationship is present between x and y.


Using the Estimated Regression Equation for Estimation and Prediction

Confidence Interval Estimate of E(yp):
 ŷp ± tα/2 · sŷp

Prediction Interval Estimate of yp:
 ŷp ± tα/2 · sind

where: the confidence coefficient is 1 − α and tα/2 is based on a t distribution with n − 2 degrees of freedom


Point Estimation

If 3 TV ads are run prior to a sale, we expect the mean number of cars sold to be:

 ŷ = 10 + 5(3) = 25 cars
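The point estimate, together with the confidence and prediction intervals from the previous slide, can be sketched in Python as below. This is an addition: the slides do not show the formulas for sŷp and sind, so the usual textbook expressions s√(1/n + (xp − x̄)²/Σ(xi − x̄)²) and s√(1 + 1/n + (xp − x̄)²/Σ(xi − x̄)²) are assumed here.

```python
import math

x = [1, 3, 2, 1, 3]
y = [14, 24, 18, 17, 27]
b0, b1 = 10, 5
n = len(x)
x_bar = sum(x) / n
sxx = sum((xi - x_bar) ** 2 for xi in x)
s = math.sqrt(14 / (n - 2))  # standard error of the estimate
t_025 = 3.182                # t value with 3 degrees of freedom

xp = 3                       # 3 TV ads
y_p = b0 + b1 * xp           # point estimate: 25 cars

# Assumed standard error formulas (not reproduced on the slides):
s_yhat = s * math.sqrt(1 / n + (xp - x_bar) ** 2 / sxx)     # for the mean E(yp)
s_ind = s * math.sqrt(1 + 1 / n + (xp - x_bar) ** 2 / sxx)  # for an individual yp

print(y_p)  # 25
print(round(y_p - t_025 * s_yhat, 2), round(y_p + t_025 * s_yhat, 2))  # about 20.39 29.61
print(round(y_p - t_025 * s_ind, 2), round(y_p + t_025 * s_ind, 2))    # about 16.72 33.28
```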


Residual Analysis

Residual for Observation i:
 yi − ŷi

The residuals provide the best information about ε.

Much of residual analysis is based on an examination of graphical plots.

If the assumptions about the error term ε appear questionable, the hypothesis tests about the significance of the regression relationship and the interval estimation results may not be valid.
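The residuals for the Reed Auto data are easy to compute; the short Python sketch below (an addition) produces the values shown in the residual table a few slides later.

```python
x = [1, 3, 2, 1, 3]
y = [14, 24, 18, 17, 27]
b0, b1 = 10, 5

y_hat = [b0 + b1 * xi for xi in x]                 # predicted cars sold
residuals = [yi - yh for yi, yh in zip(y, y_hat)]  # yi - y-hat_i
print(y_hat)      # [15, 25, 20, 15, 25]
print(residuals)  # [-1, -1, -2, 2, 2]
```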


Residual Plot Against x

If the assumption that the variance of ε is the same for all values of x is valid, and the assumed regression model is an adequate representation of the relationship between the variables, then the residual plot should give an overall impression of a horizontal band of points.


Residual Plot Against x

[Plot: residuals (y − ŷ) versus x form a horizontal band around 0 (Good Pattern)]


Residual Plot Against x

[Plot: the spread of the residuals (y − ŷ) increases with x (Nonconstant Variance)]


Residual Plot Against x

[Plot: the residuals (y − ŷ) show a curved pattern against x (Model Form Not Adequate)]


Residual Plot Against x

Residuals

 Observation   Predicted Cars Sold   Residuals
      1                 15              -1
      2                 25              -1
      3                 20              -2
      4                 15               2
      5                 25               2


Residual Plot Against x

[TV Ads Residual Plot: residuals (−3 to +3) versus TV Ads (0 to 4)]


Standardized Residuals

Standardized Residual for Observation i:

 (yi − ŷi) / s(yi − ŷi)

where:

 s(yi − ŷi) = s √(1 − hi)

 hi = 1/n + (xi − x̄)² / Σ(xi − x̄)²
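A Python sketch of these formulas for the Reed Auto data is shown below (an addition). Note that the spreadsheet "Standard Residuals" column reproduced on the following slides appears to scale each residual by the sample standard deviation of the residuals rather than by s√(1 − hi), so its values differ somewhat from the leverage-adjusted values computed here.

```python
import math

x = [1, 3, 2, 1, 3]
y = [14, 24, 18, 17, 27]
b0, b1 = 10, 5
n = len(x)
x_bar = sum(x) / n
sxx = sum((xi - x_bar) ** 2 for xi in x)
s = math.sqrt(14 / (n - 2))  # standard error of the estimate

for i, (xi, yi) in enumerate(zip(x, y), start=1):
    resid = yi - (b0 + b1 * xi)                 # residual yi - y-hat_i
    h = 1 / n + (xi - x_bar) ** 2 / sxx         # leverage h_i
    std_resid = resid / (s * math.sqrt(1 - h))  # standardized residual
    print(i, resid, round(h, 2), round(std_resid, 3))
# First observation, for example: 1 -1 0.45 -0.624
```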


Standardized Residual Plot

The standardized residual plot can provide insight about the assumption that the error term ε has a normal distribution.

If this assumption is satisfied, the distribution of the standardized residuals should appear to come from a standard normal probability distribution.


Standardized Residual Plot

Standardized Residuals

 Observation   Predicted Y   Residuals   Standard Residuals
      1             15          -1            -0.535
      2             25          -1            -0.535
      3             20          -2            -1.069
      4             15           2             1.069
      5             25           2             1.069


Standardized Residual Plot

[Excel RESIDUAL OUTPUT: observations 1 to 5 with Predicted Y (15, 25, 20, 15, 25), Residuals (−1, −1, −2, 2, 2), and Standard Residuals (−0.534522, −0.534522, −1.069045, 1.069045, 1.069045)]

[Plot: Standard Residuals (−1.5 to 1.5) versus Cars Sold (0 to 30)]


Standardized Residual Plot

All of the standardized residuals are between −1.5 and +1.5, indicating that there is no reason to question the assumption that ε has a normal distribution.


Outliers and Influential Observations

Detecting Outliers

• An outlier is an observation that is unusual in comparison with the other data.

• Minitab classifies an observation as an outlier if its standardized residual value is < -2 or > +2.

• This standardized residual rule sometimes fails to identify an unusually large observation as being an outlier.

• This rule’s shortcoming can be circumvented by using studentized deleted residuals.

• The |i th studentized deleted residual| will be larger than the |i th standardized residual|.


End of Chapter 14