multiple regression complete example - cbachapter 3 multiple regression complete example econ 504...

Department of Quantitative Methods & Information Systems

Chapter 3

Multiple Regression

Complete Example

ECON 504

Dr. Mohammad Zainal Spring 2013

Review Goals

After completing this lecture, you should be able to:

Formulate null and alternative hypotheses for applications involving a single population mean or proportion

Formulate a decision rule for testing a hypothesis

Know how to use the test statistic, critical value, and p-value approaches to test the null hypothesis

Know what Type I and Type II errors are

Review Goals (continued)

Explain model building using multiple regression analysis

Apply multiple regression analysis to business decision-making situations

Analyze and interpret the computer output for a multiple regression model

Test the significance of the independent variables in a multiple regression model

Review Goals (continued)

Recognize potential problems in multiple regression analysis and take steps to correct the problems

Incorporate qualitative variables into the regression model by using dummy variables

Use variable transformations to model nonlinear relationships

What is a Hypothesis?

A hypothesis is a claim (assumption) about a population parameter:

population mean

population proportion

Example: The mean monthly cell phone bill in this city is μ = $42

Example: The proportion of adults in this city with cell phones is P = .68

The Null Hypothesis, H0

States the (numerical) assumption to be tested

Example: The average number of TV sets in U.S. homes is at least three (H0: μ ≥ 3)

Is always about a population parameter, not about a sample statistic: write H0: μ ≥ 3, not H0: x̄ ≥ 3

The Null Hypothesis, H0 (continued)

Begin with the assumption that the null hypothesis is true

Similar to the notion of innocent until proven guilty

Refers to the status quo

Always contains an “=”, “≤”, or “≥” sign

May or may not be rejected

The Alternative Hypothesis, HA

Is the opposite of the null hypothesis

e.g., The average number of TV sets in U.S. homes is less than 3 (HA: μ < 3)

Challenges the status quo

Never contains an “=”, “≤”, or “≥” sign

May or may not be accepted

Is generally the hypothesis that the researcher believes (or needs to support): a research hypothesis

Formulating Hypotheses

Example 1: Ford Motor Company has worked to reduce road noise inside the cab of the redesigned F150 pickup truck. It would like to report in its advertising that the truck is quieter. The average of the prior design was 68 decibels at 60 mph.

What is the appropriate hypothesis test?



H0: μ ≥ 68 (the truck is not quieter): the status quo

HA: μ < 68 (the truck is quieter): what Ford wants to support

If the null hypothesis is rejected, Ford has sufficient evidence to support the claim that the truck is now quieter.


Example 2: The average annual income of buyers of Ford F150 pickup trucks is claimed to be $65,000 per year. An industry analyst would like to test this claim.

What is the appropriate hypothesis test?



H0: μ = 65,000 (income is as claimed): the status quo

HA: μ ≠ 65,000 (income is different from what is claimed)

The analyst will believe the claim unless sufficient evidence is found to discredit it.

Hypothesis Testing Process

Claim: the population mean age is 50.

Null hypothesis: H0: μ = 50

Now select a random sample. Suppose the sample mean age is 20: x̄ = 20.

Is x̄ = 20 likely if μ = 50?

If not likely, REJECT the null hypothesis.

Reason for Rejecting H0

[Figure: sampling distribution of x̄, centered at μ = 50 if H0 is true, with the sample mean x̄ = 20 far in the lower tail]

If it is unlikely that we would get a sample mean of this value (20) if in fact μ = 50 were the population mean, then we reject the null hypothesis that μ = 50.

Errors in Making Decisions

Type I Error

Reject a true null hypothesis

Considered a serious type of error

The probability of a Type I error is α

Called the level of significance of the test

Set by the researcher in advance

Errors in Making Decisions (continued)

Type II Error

Fail to reject a false null hypothesis

The probability of a Type II error is β

β is a calculated value; the formula is discussed later in the chapter

Outcomes and Probabilities

Possible Hypothesis Test Outcomes (Key: Outcome (Probability)), by decision and state of nature:

Do not reject H0: if H0 is true, no error (1 − α); if H0 is false, Type II error (β)

Reject H0: if H0 is true, Type I error (α); if H0 is false, no error (1 − β)

Type I & II Error Relationship

Type I and Type II errors cannot happen at the same time

A Type I error can only occur if H0 is true

A Type II error can only occur if H0 is false

If the Type I error probability (α) decreases, then the Type II error probability (β) increases

Factors Affecting Type II Error

All else equal,

β increases when the difference between the hypothesized parameter and its true value decreases

β increases when α decreases

β increases when σ increases

β increases when n decreases

The formula used to compute the value of β is discussed later in the chapter

Level of Significance, α

Defines unlikely values of the sample statistic if the null hypothesis is true

Defines the rejection region of the sampling distribution

Is designated by α (level of significance)

Typical values are .01, .05, or .10

Is selected by the researcher at the beginning

Provides the critical value(s) of the test

Hypothesis Tests for the Mean

Hypothesis tests for μ: σ known or σ unknown

Assume first that the population standard deviation σ is known

Process of Hypothesis Testing

1. Specify the population parameter of interest

2. Formulate the null and alternative hypotheses

3. Specify the desired significance level, α

4. Define the rejection region

5. Take a random sample and determine whether or not the sample result is in the rejection region

6. Reach a decision and draw a conclusion

Level of Significance and the Rejection Region

Level of significance = α

Lower tail test. Example: H0: μ ≥ 3, HA: μ < 3. Reject H0 in the lower tail (area α) below −zα; do not reject H0 above −zα.

Upper tail test. Example: H0: μ ≤ 3, HA: μ > 3. Reject H0 in the upper tail (area α) above zα; do not reject H0 below zα.

Two tailed test. Example: H0: μ = 3, HA: μ ≠ 3. Reject H0 in either tail (area α/2 each) below −zα/2 or above zα/2; do not reject H0 between them.

Critical Value for Lower Tail Test

H0: μ ≥ 3

HA: μ < 3

The cutoff value, −zα or x̄α, is called a critical value: reject H0 below it, do not reject H0 above it.

$\bar{x}_\alpha = \mu - z_\alpha \frac{\sigma}{\sqrt{n}}$, with μ = 3 under H0


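Critical Value for Upper Tail Test

H0: μ ≤ 3

HA: μ > 3

The cutoff value, zα or x̄α, is called a critical value: reject H0 above it, do not reject H0 below it.

$\bar{x}_\alpha = \mu + z_\alpha \frac{\sigma}{\sqrt{n}}$, with μ = 3 under H0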

Critical Values for Two Tailed Tests

H0: μ = 3

HA: μ ≠ 3

There are two cutoff values (critical values): ±zα/2, or x̄α/2 (Lower) and x̄α/2 (Upper)

$\bar{x}_{\alpha/2} = \mu \pm z_{\alpha/2} \frac{\sigma}{\sqrt{n}}$, with μ = 3 under H0

Reject H0 in either tail (area α/2 each); do not reject H0 between the two critical values.

The Rejection Region

Lower tail test (example: H0: μ ≥ 3, HA: μ < 3): reject H0 if z < −zα, i.e., if x̄ < x̄α

Upper tail test (example: H0: μ ≤ 3, HA: μ > 3): reject H0 if z > zα, i.e., if x̄ > x̄α

Two tailed test (example: H0: μ = 3, HA: μ ≠ 3): reject H0 if z < −zα/2 or z > zα/2, i.e., if x̄ < x̄α/2(L) or x̄ > x̄α/2(U)

Two Equivalent Approaches to Hypothesis Testing

z units: For given α, find the critical z value(s): −zα, zα, or ±zα/2

Convert the sample mean x̄ to a z test statistic: $z = \frac{\bar{x} - \mu}{\sigma/\sqrt{n}}$

Reject H0 if z is in the rejection region; otherwise do not reject H0

x̄ units: Given α, calculate the critical value(s) x̄α, or x̄α/2(L) and x̄α/2(U)

The sample mean is the test statistic. Reject H0 if x̄ is in the rejection region; otherwise do not reject H0
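The two approaches can be sketched in Python with the standard library's `statistics.NormalDist` (Python 3.8+). This is a minimal sketch assuming σ is known; the function name `critical_values` and its `tail` argument are hypothetical, chosen only for illustration.

```python
from statistics import NormalDist


def critical_values(mu0, sigma, n, alpha, tail):
    """Critical values for a z test of the mean, in z units and in x-bar units.

    tail is "lower", "upper", or "two" (a hypothetical convention for this sketch).
    """
    se = sigma / n ** 0.5                        # standard error of x-bar: sigma / sqrt(n)
    std_normal = NormalDist()                    # standard normal: mean 0, sd 1
    if tail == "lower":
        z = -std_normal.inv_cdf(1 - alpha)       # -z_alpha
        return z, mu0 + z * se                   # x-bar_alpha = mu - z_alpha * se
    if tail == "upper":
        z = std_normal.inv_cdf(1 - alpha)        # z_alpha
        return z, mu0 + z * se                   # x-bar_alpha = mu + z_alpha * se
    if tail == "two":
        z = std_normal.inv_cdf(1 - alpha / 2)    # z_{alpha/2}
        return (-z, z), (mu0 - z * se, mu0 + z * se)
    raise ValueError("tail must be 'lower', 'upper', or 'two'")


# Lower tail test from the TV example: H0: mu >= 3, sigma = 0.8, n = 100, alpha = .05
z_crit, xbar_crit = critical_values(3, 0.8, 100, 0.05, "lower")
print(round(z_crit, 3), round(xbar_crit, 4))
```

Both return values describe the same rejection region, once on the z scale and once on the x̄ scale, which is why the two approaches always reach the same decision.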

Hypothesis Testing Example

Test the claim that the true mean number of TV sets in US homes is at least 3. (Assume σ = 0.8)

1. Specify the population value of interest

The mean number of TVs in US homes

2. Formulate the appropriate null and alternative hypotheses

H0: μ ≥ 3, HA: μ < 3 (This is a lower tail test)

3. Specify the desired level of significance

Suppose that α = .05 is chosen for this test


Hypothesis Testing Example (continued)

4. Determine the rejection region

This is a one-tailed test with α = .05. Since σ is known, the cutoff value is a z value: −zα = −1.645.

Reject H0 if z < −zα = −1.645; otherwise do not reject H0

Hypothesis Testing Example (continued)

5. Obtain sample evidence and compute the test statistic

Suppose a sample is taken with the following results: n = 100, x̄ = 2.84 (σ = 0.8 is assumed known)

Then the test statistic is:

$z = \frac{\bar{x} - \mu}{\sigma/\sqrt{n}} = \frac{2.84 - 3}{0.8/\sqrt{100}} = \frac{-0.16}{0.08} = -2.0$


Hypothesis Testing Example (continued)

6. Reach a decision and interpret the result

Since z = −2.0 < −1.645, we reject the null hypothesis that the mean number of TVs in US homes is at least 3. There is sufficient evidence that the mean is less than 3.

An alternate way of constructing the rejection region, now expressed in x̄ units rather than z units:

$\bar{x}_\alpha = \mu - z_\alpha \frac{\sigma}{\sqrt{n}} = 3 - 1.645 \frac{0.8}{\sqrt{100}} = 2.8684$

Reject H0 if x̄ < 2.8684; otherwise do not reject H0

Since x̄ = 2.84 < 2.8684, we reject the null hypothesis
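The TV example can be checked numerically. This sketch uses only `statistics.NormalDist` and the numbers given on the slides (σ = 0.8, n = 100, x̄ = 2.84, α = .05); it computes the decision both in z units and in x̄ units to show that they agree.

```python
from statistics import NormalDist

# TV example from the slides: H0: mu >= 3 vs HA: mu < 3, sigma known
mu0, sigma, n, xbar, alpha = 3.0, 0.8, 100, 2.84, 0.05

se = sigma / n ** 0.5                          # 0.8 / 10 = 0.08
z = (xbar - mu0) / se                          # test statistic, about -2.0
z_crit = -NormalDist().inv_cdf(1 - alpha)      # -z_alpha, about -1.645
xbar_crit = mu0 + z_crit * se                  # cutoff in x-bar units, about 2.8684

reject_in_z_units = z < z_crit
reject_in_xbar_units = xbar < xbar_crit        # same decision, different units
print(f"z = {z:.2f}, -z_alpha = {z_crit:.3f}, reject H0: {reject_in_z_units}")
print(f"x-bar = {xbar}, cutoff = {xbar_crit:.4f}, reject H0: {reject_in_xbar_units}")
```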

p-Value Approach to Testing

Convert the sample statistic (x̄) to a test statistic (a z value, if σ is known)

Determine the p-value from a table or computer

Compare the p-value with α:

If p-value < α, reject H0

If p-value ≥ α, do not reject H0

p-Value Approach to Testing (continued)

p-value: Probability of obtaining a test statistic at least as extreme (≤ or ≥) as the observed sample value, given H0 is true

Also called the observed level of significance

The smallest value of α for which H0 can be rejected

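p-value example

Example: How likely is it to see a sample mean of 2.84 (or something further below the mean) if the true mean is μ = 3.0?

$P(\bar{x} < 2.84 \mid \mu = 3.0) = P\left(z < \frac{2.84 - 3.0}{0.8/\sqrt{100}}\right) = P(z < -2.0) = .0228$

p-value = .0228 and α = .05; for comparison, the critical value x̄α = 2.8684 corresponds to z = −1.645, while x̄ = 2.84 corresponds to z = −2.0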

p-value example (continued)

Compare the p-value with α:

If p-value < α, reject H0

If p-value ≥ α, do not reject H0

Here: p-value = .0228, α = .05

Since .0228 < .05, we reject the null hypothesis
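The same p-value can be computed without a table. In this sketch, `NormalDist().cdf(z)` gives the lower-tail area P(Z ≤ z), which is the p-value for this lower tail test; the inputs are the slide's numbers.

```python
from statistics import NormalDist

mu0, sigma, n, xbar, alpha = 3.0, 0.8, 100, 2.84, 0.05

z = (xbar - mu0) / (sigma / n ** 0.5)   # -2.0
p_value = NormalDist().cdf(z)           # lower tail test: P(Z <= z)
decision = "reject H0" if p_value < alpha else "do not reject H0"
print(f"p-value = {p_value:.4f}: {decision}")
```

For an upper tail test the p-value would be `1 - NormalDist().cdf(z)`, and for a two tailed test twice the smaller tail area.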

Example: Upper Tail z Test for Mean (σ Known)

A phone industry manager thinks that customers' monthly cell phone bills have increased and now average over $52 per month. The company wishes to test this claim. (Assume σ = 10 is known)

Form the hypothesis test:

H0: μ ≤ 52 (the average is not over $52 per month)

HA: μ > 52 (the average is greater than $52 per month, i.e., sufficient evidence exists to support the manager’s claim)


Example: Find Rejection Region (continued)

Suppose that α = .10 is chosen for this test

Find the rejection region: this is an upper tail test, so the critical value is zα = 1.28

Reject H0 if z > 1.28

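Review: Finding Critical Value - One Tail

What is z given α = 0.10? The upper tail area is .10, so the area below z is .90 and the area between 0 and z is .50 − .10 = .40.

Standard Normal Distribution Table (Portion):

z | .07 | .08 | .09

1.1 | .3790 | .3810 | .3830

1.2 | .3980 | .3997 | .4015

1.3 | .4147 | .4162 | .4177

The entry closest to .40 is .3997 (row 1.2, column .08), so the critical value is z = 1.28.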
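A table lookup can be cross-checked with `statistics.NormalDist.inv_cdf`, the inverse of the standard normal CDF. This sketch shows that the exact critical value for α = .10 is about 1.2816, which the table rounds to 1.28, and reproduces the .3997 table entry.

```python
from statistics import NormalDist

std_normal = NormalDist()
alpha = 0.10
z_alpha = std_normal.inv_cdf(1 - alpha)    # area .90 below z, .10 above
table_entry = std_normal.cdf(1.28) - 0.5   # area between 0 and 1.28, as tabulated
print(round(z_alpha, 4), round(table_entry, 4))
```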

Example: Test Statistic (continued)

Obtain sample evidence and compute the test statistic

Suppose a sample is taken with the following results: n = 64, x̄ = 53.1 (σ = 10 was assumed known)

Then the test statistic is:

$z = \frac{\bar{x} - \mu}{\sigma/\sqrt{n}} = \frac{53.1 - 52}{10/\sqrt{64}} = 0.88$


Example: Decision (continued)

Reach a decision and interpret the result:

With α = .10, reject H0 if z > 1.28. Here z = 0.88.

Do not reject H0 since z = 0.88 ≤ 1.28, i.e., there is not sufficient evidence that the mean bill is over $52

p-Value Solution (continued)

Calculate the p-value and compare to α:

$P(\bar{x} \ge 53.1 \mid \mu = 52.0) = P\left(z \ge \frac{53.1 - 52.0}{10/\sqrt{64}}\right) = P(z \ge 0.88) = .5 - .3106 = .1894$

p-value = .1894

Do not reject H0 since p-value = .1894 > α = .10
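The cell phone example can be reproduced in a few lines. This sketch uses the slide's inputs (σ = 10, n = 64, x̄ = 53.1, α = .10); because the test is upper tailed, the p-value is the upper-tail area `1 - NormalDist().cdf(z)`.

```python
from statistics import NormalDist

# Cell phone bill example: H0: mu <= 52 vs HA: mu > 52, sigma = 10 known
mu0, sigma, n, xbar, alpha = 52.0, 10.0, 64, 53.1, 0.10

z = (xbar - mu0) / (sigma / n ** 0.5)       # 1.1 / 1.25 = 0.88
z_crit = NormalDist().inv_cdf(1 - alpha)    # about 1.28
p_value = 1 - NormalDist().cdf(z)           # upper tail test: P(Z >= z)

print(f"z = {z:.2f}, z_alpha = {z_crit:.2f}, p-value = {p_value:.4f}")
print("reject H0" if z > z_crit else "do not reject H0")
```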

Critical Value Approach to Testing

When σ is known, convert the sample statistic (x̄) to a z test statistic (hypothesis tests for μ: σ known)

The test statistic is:

$z = \frac{\bar{x} - \mu}{\sigma/\sqrt{n}}$

Critical Value Approach to Testing (continued)

When σ is unknown, convert the sample statistic (x̄) to a t test statistic (hypothesis tests for μ: σ unknown)

The test statistic is:

$t_{n-1} = \frac{\bar{x} - \mu}{s/\sqrt{n}}$

(The population must be approximately normal)

Hypothesis Tests for μ, σ Unknown

1. Specify the population value of interest

2. Formulate the appropriate null and alternative hypotheses

3. Specify the desired level of significance

4. Determine the rejection region (critical values are from the t-distribution with n-1 d.f.)

5. Obtain sample evidence and compute the t test statistic

6. Reach a decision and interpret the result


Example: Two-Tail Test (σ Unknown)

The average cost of a hotel room in New York is said to be $168 per night. A random sample of 25 hotels resulted in x̄ = $172.50 and s = $15.40. Test at the α = 0.05 level. (Assume the population distribution is normal)

H0: μ = 168

HA: μ ≠ 168

Example Solution: Two-Tail Test

H0: μ = 168, HA: μ ≠ 168

α = 0.05, n = 25

σ is unknown, so use a t statistic

Critical values: t24 = ±2.0639 (α/2 = .025 in each tail)

$t_{n-1} = \frac{\bar{x} - \mu}{s/\sqrt{n}} = \frac{172.50 - 168}{15.40/\sqrt{25}} = 1.46$

Since −2.0639 < 1.46 < 2.0639, do not reject H0: there is not sufficient evidence that the true mean cost is different from $168
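The t statistic for the hotel example is simple arithmetic. Python's standard library has no t distribution, so this sketch hard-codes the table value t = 2.0639 (24 d.f.) from the slide; if SciPy is available, `scipy.stats.t.ppf(0.975, 24)` should return approximately the same critical value.

```python
# Hotel example: H0: mu = 168 vs HA: mu != 168, sigma unknown
mu0, n, xbar, s = 168.0, 25, 172.50, 15.40
t_crit = 2.0639   # t_{.025} with n - 1 = 24 d.f., taken from a t table (as on the slide)

t = (xbar - mu0) / (s / n ** 0.5)          # 4.5 / 3.08
reject = abs(t) > t_crit                    # two-tail test: compare |t| with t_{alpha/2}
print(f"t = {t:.2f}, reject H0: {reject}")
```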

Type II Error

A Type II error is failing to reject a false H0; its probability is β.

Suppose we fail to reject H0: μ ≥ 52 when in fact the true mean is μ = 50.

[Figure: the hypothesized sampling distribution of x̄ centered at 52, with the rejection region in the lower tail, and the true distribution of x̄ if μ = 50; the range of x̄ where H0 is not rejected lies above the cutoff]

Here, β = P(x̄ ≥ cutoff) if μ = 50.

Calculating β

Suppose n = 64, σ = 6, and α = .05. For H0: μ ≥ 52, the cutoff is:

$\bar{x}_{cutoff} = \mu - z_\alpha \frac{\sigma}{\sqrt{n}} = 52 - 1.645 \frac{6}{\sqrt{64}} = 50.766$

So β = P(x̄ ≥ 50.766) if μ = 50:

$P(\bar{x} \ge 50.766 \mid \mu = 50) = P\left(z \ge \frac{50.766 - 50}{6/\sqrt{64}}\right) = P(z \ge 1.02) = .5 - .3461 = .1539$

Probability of a Type II error: β = .1539
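The β calculation follows the same two steps in code: find the cutoff under H0, then find the probability of landing on the "do not reject" side of it under the true mean. This sketch uses the slide's inputs and `NormalDist(mu, sigma)` for the sampling distribution of x̄ when μ = 50.

```python
from statistics import NormalDist

# Type II error example: H0: mu >= 52 (lower tail test), true mean mu = 50
mu0, mu_true, sigma, n, alpha = 52.0, 50.0, 6.0, 64, 0.05

se = sigma / n ** 0.5                           # 6 / 8 = 0.75
z_alpha = NormalDist().inv_cdf(1 - alpha)       # about 1.645
cutoff = mu0 - z_alpha * se                     # reject H0 only if x-bar < cutoff (about 50.766)

# beta = P(do not reject H0 | mu = 50) = P(x-bar >= cutoff | mu = 50)
beta = 1 - NormalDist(mu_true, se).cdf(cutoff)
print(f"cutoff = {cutoff:.3f}, beta = {beta:.4f}")
```

Because the slide rounds z to 1.02 before using the table, its β = .1539 differs slightly (in the fourth decimal place) from this unrounded computation.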

Hypothesis Tests in Minitab

[Figures: Minitab dialogs for hypothesis tests and sample Minitab output]

Hypothesis Tests Summary

Addressed hypothesis testing methodology

Performed z test for the mean (σ known)

Discussed the p-value approach to hypothesis testing

Performed one-tail and two-tail tests

Performed t test for the mean (σ unknown)

Performed z test for the proportion

Discussed Type II error and computed its probability

Multiple Regression Assumptions

The model errors are independent and random

The errors are normally distributed

The mean of the errors is zero

Errors have a constant variance

Errors (residuals) from the regression model: e = (y − ŷ)

Model Specification

Decide what you want to do and select the dependent variable

Determine the potential independent variables for your model

Gather sample data (observations) for all variables

The Correlation Matrix

Correlation between the dependent variable and selected independent variables can be found using Excel:

Data tab: Data Analysis / Correlation

Can check for statistical significance of correlation with a t test

Example

A distributor of frozen dessert pies wants to evaluate factors thought to influence demand

Dependent variable: Pie sales (units per week)

Independent variables: Price (in $), Advertising ($100s)

Data are collected for 15 weeks

Pie Sales Model

Multiple regression model: Sales = b0 + b1(Price) + b2(Advertising)

Week | Pie Sales | Price ($) | Advertising ($100s)

1 350 5.50 3.3

2 460 7.50 3.3

3 350 8.00 3.0

4 430 8.00 4.5

5 350 6.80 3.0

6 380 7.50 4.0

7 430 4.50 3.0

8 470 6.40 3.7

9 450 7.00 3.5

10 490 5.00 4.0

11 340 7.20 3.5

12 300 7.90 3.2

13 440 5.90 4.0

14 450 5.00 3.5

15 300 7.00 2.7

Correlation matrix (columns: Pie Sales, Price, Advertising):

Pie Sales 1

Price -0.44327 1

Advertising 0.55632 0.03044 1

Interpretation of Estimated Coefficients

Slope (bi): Estimates that the average value of y changes by bi units for each 1-unit increase in xi, holding all other variables constant

Example: if b1 = −20, then sales (y) is expected to decrease by an estimated 20 pies per week for each $1 increase in selling price (x1), net of the effects of changes due to advertising (x2)

y-intercept (b0): The estimated average value of y when all xi = 0 (assuming all xi = 0 is within the range of observed values)

Pie Sales Correlation Matrix

Price vs. Sales: r = −0.44327. There is a negative association between price and sales

Advertising vs. Sales: r = 0.55632. There is a positive association between advertising and sales

Scatter Diagrams

[Figure: scatter diagrams of Sales vs. Price and Sales vs. Advertising]

Estimating a Multiple Linear Regression Equation

Computer software is generally used to generate the coefficients and measures of goodness of fit for multiple regression

Excel: Data / Data Analysis / Regression

Minitab: Stat / Regression / Regression…

[Figures: Excel and Minitab regression dialogs]

Multiple Regression Output

Regression Statistics

Multiple R 0.72213

R Square 0.52148

Adjusted R Square 0.44172

Standard Error 47.46341

Observations 15

ANOVA df SS MS F Significance F

Regression 2 29460.027 14730.013 6.53861 0.01201

Residual 12 27033.306 2252.776

Total 14 56493.333

Coefficients Standard Error t Stat P-value Lower 95% Upper 95%

Intercept 306.52619 114.25389 2.68285 0.01993 57.58835 555.46404

Price -24.97509 10.83213 -2.30565 0.03979 -48.57626 -1.37392

Advertising 74.13096 25.96732 2.85478 0.01449 17.55303 130.70888

Estimated regression equation: $\widehat{\text{Sales}} = 306.526 - 24.975(\text{Price}) + 74.131(\text{Advertising})$
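The coefficients in the output are ordinary least squares estimates. As a minimal sketch of how they arise for exactly two predictors, this pure-Python version solves the 2×2 normal equations on centered data with Cramer's rule (no NumPy). The function name `fit_two_predictors` is hypothetical, and for more predictors a linear algebra library would be the practical choice.

```python
# Pie sales data (15 weeks), as in the data table
sales = [350, 460, 350, 430, 350, 380, 430, 470, 450, 490, 340, 300, 440, 450, 300]
price = [5.50, 7.50, 8.00, 8.00, 6.80, 7.50, 4.50, 6.40, 7.00, 5.00, 7.20, 7.90, 5.90, 5.00, 7.00]
advertising = [3.3, 3.3, 3.0, 4.5, 3.0, 4.0, 3.0, 3.7, 3.5, 4.0, 3.5, 3.2, 4.0, 3.5, 2.7]


def fit_two_predictors(y, x1, x2):
    """Least squares estimates (b0, b1, b2) for y = b0 + b1*x1 + b2*x2."""
    n = len(y)
    my, m1, m2 = sum(y) / n, sum(x1) / n, sum(x2) / n
    # Centered sums of squares and cross-products
    s11 = sum((a - m1) ** 2 for a in x1)
    s22 = sum((b - m2) ** 2 for b in x2)
    s12 = sum((a - m1) * (b - m2) for a, b in zip(x1, x2))
    s1y = sum((a - m1) * (c - my) for a, c in zip(x1, y))
    s2y = sum((b - m2) * (c - my) for b, c in zip(x2, y))
    # Solve [s11 s12; s12 s22] [b1; b2] = [s1y; s2y] by Cramer's rule
    det = s11 * s22 - s12 ** 2
    b1 = (s1y * s22 - s2y * s12) / det
    b2 = (s2y * s11 - s1y * s12) / det
    b0 = my - b1 * m1 - b2 * m2            # line passes through the means
    return b0, b1, b2


b0, b1, b2 = fit_two_predictors(sales, price, advertising)
fitted = [b0 + b1 * p + b2 * a for p, a in zip(price, advertising)]
mean_sales = sum(sales) / len(sales)
sse = sum((y - f) ** 2 for y, f in zip(sales, fitted))
sst = sum((y - mean_sales) ** 2 for y in sales)
print(f"Sales = {b0:.3f} + ({b1:.3f})(Price) + ({b2:.3f})(Advertising)")
print(f"SSE = {sse:.3f}, SST = {sst:.3f}, R^2 = {1 - sse / sst:.5f}")
```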

The Multiple Regression Equation

$\widehat{\text{Sales}} = 306.526 - 24.975(\text{Price}) + 74.131(\text{Advertising})$

where Sales is in number of pies per week, Price is in $, and Advertising is in $100s.

b1 = −24.975: sales will decrease, on average, by 24.975 pies per week for each $1 increase in selling price, net of the effects of changes due to advertising

b2 = 74.131: sales will increase, on average, by 74.131 pies per week for each $100 increase in advertising, net of the effects of changes due to price

Using The Model to Make Predictions

Predict sales for a week in which the selling price is $5.50 and advertising is $350:

$\widehat{\text{Sales}} = 306.526 - 24.975(5.50) + 74.131(3.5) = 428.62$

Note that Advertising is in $100s, so $350 means that x2 = 3.5

Predicted sales is 428.62 pies
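The prediction is a direct substitution into the estimated equation. This sketch uses the unrounded coefficients from the regression output and makes the unit conversion for advertising explicit.

```python
# Coefficients from the regression output
b0, b_price, b_adv = 306.52619, -24.97509, 74.13096

price = 5.50                # dollars
advertising = 350 / 100     # advertising is measured in $100s, so $350 -> 3.5

predicted_sales = b0 + b_price * price + b_adv * advertising
print(f"Predicted sales: {predicted_sales:.2f} pies")
```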

Predictions in Minitab

[Figure: Minitab prediction output, annotated with the predicted y value; the confidence interval for the mean y value, given these x’s; and the prediction interval for an individual y value, given these x’s]

Multiple Coefficient of Determination (R²)

Reports the proportion of total variation in y explained by all x variables taken together

$R^2 = \frac{SSR}{SST} = \frac{\text{Sum of squares regression}}{\text{Total sum of squares}}$


Multiple Coefficient of Determination (continued)

From the regression output (R Square = 0.52148; Regression SS = 29460.027; Total SS = 56493.333):

$R^2 = \frac{SSR}{SST} = \frac{29460.0}{56493.3} = .52148$

52.1% of the variation in pie sales is explained by the variation in price and advertising

Adjusted R²

R² never decreases when a new x variable is added to the model

This can be a disadvantage when comparing models

What is the net effect of adding a new variable?

We lose a degree of freedom when a new x variable is added

Did the new x variable add enough explanatory power to offset the loss of one degree of freedom?

Adjusted R² (continued)

Shows the proportion of variation in y explained by all x variables, adjusted for the number of x variables used:

$R_A^2 = 1 - (1 - R^2)\left(\frac{n - 1}{n - k - 1}\right)$

(where n = sample size, k = number of independent variables)

Penalizes excessive use of unimportant independent variables

Smaller than R²

Useful in comparing among models


Multiple Coefficient of Determination (continued)

From the regression output: Adjusted R Square = 0.44172, so $R_A^2 = .44172$

44.2% of the variation in pie sales is explained by the variation in price and advertising, taking into account the sample size and number of independent variables
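Both measures follow directly from the ANOVA table. This sketch plugs the output's sums of squares into R² = SSR/SST and into the adjusted R² formula with n = 15 and k = 2.

```python
# ANOVA quantities from the regression output
ssr, sse, sst = 29460.027, 27033.306, 56493.333
n, k = 15, 2   # observations, independent variables

r2 = ssr / sst
r2_adj = 1 - (1 - r2) * (n - 1) / (n - k - 1)
print(f"R^2 = {r2:.5f}, adjusted R^2 = {r2_adj:.5f}")
```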

Is the Model Significant?

F-Test for Overall Significance of the Model

Shows if there is a linear relationship between all of the x variables considered together and y

Use the F test statistic

Hypotheses:

H0: β1 = β2 = … = βk = 0 (no linear relationship)

HA: at least one βi ≠ 0 (at least one independent variable affects y)

F-Test for Overall Significance (continued)

Test statistic:

$F = \frac{MSR}{MSE} = \frac{SSR/k}{SSE/(n - k - 1)}$

where F has (numerator) D1 = k and (denominator) D2 = (n − k − 1) degrees of freedom

F-Test for Overall Significance (continued)

From the regression output (Regression MS = 14730.013, Residual MS = 2252.776, Significance F = 0.01201):

$F = \frac{MSR}{MSE} = \frac{14730.0}{2252.8} = 6.5386$

With 2 and 12 degrees of freedom; the P-value for the F-test is 0.01201 (Significance F)

F-Test for Overall Significance (continued)

H0: β1 = β2 = 0

HA: β1 and β2 not both zero

α = .05, df1 = 2, df2 = 12

Critical value: F.05 = 3.885

Test statistic: $F = \frac{MSR}{MSE} = 6.5386$

Decision: Since F = 6.5386 > 3.885, reject H0 at α = 0.05

Conclusion: The regression model does explain a significant portion of the variation in pie sales (there is evidence that at least one independent variable affects y)
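The F statistic can be recomputed from the sums of squares. The critical value 3.885 is hard-coded from the slide because the standard library has no F distribution; with SciPy, `scipy.stats.f.ppf(0.95, 2, 12)` should give approximately the same number.

```python
ssr, sse = 29460.027, 27033.306
n, k = 15, 2
f_crit = 3.885   # F_.05 with 2 and 12 d.f., from an F table (as on the slide)

msr = ssr / k                 # mean square regression
mse = sse / (n - k - 1)       # mean square error
f_stat = msr / mse
print(f"F = {f_stat:.4f}, reject H0: {f_stat > f_crit}")
```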

Are Individual Variables Significant?

Use t-tests of individual variable slopes

Shows if there is a linear relationship between the variable xi and y

Hypotheses:

H0: βi = 0 (no linear relationship)

HA: βi ≠ 0 (a linear relationship does exist between xi and y)


Are Individual Variables Significant? (continued)

Test statistic (df = n − k − 1):

$t = \frac{b_i - 0}{s_{b_i}}$


Are Individual Variables Significant? (continued)

From the regression output:

t-value for Price is t = −2.306, with p-value .0398

t-value for Advertising is t = 2.855, with p-value .0145

Inferences about the Slope: t Test Example

H0: βi = 0

HA: βi ≠ 0

d.f. = 15 − 2 − 1 = 12, α = .05, tα/2 = 2.1788 (reject H0 if t < −2.1788 or t > 2.1788)

From Excel output:

Coefficients Standard Error t Stat P-value

Price -24.97509 10.83213 -2.30565 0.03979

Advertising 74.13096 25.96732 2.85478 0.01449

Decision: The test statistic for each variable falls in the rejection region (p-values < .05), so reject H0 for each variable

Conclusion: There is evidence that both Price and Advertising affect pie sales at α = .05

Confidence Interval Estimate for the Slope

Confidence interval for the population slope βi:

$b_i \pm t_{\alpha/2}\, s_{b_i}$, where t has (n − k − 1) d.f.

Coefficients Standard Error … Lower 95% Upper 95%

Intercept 306.52619 114.25389 … 57.58835 555.46404

Price -24.97509 10.83213 … -48.57626 -1.37392

Advertising 74.13096 25.96732 … 17.55303 130.70888

Example (the effect of changes in price on pie sales, β1): weekly sales are estimated to be reduced by between 1.37 and 48.58 pies for each increase of $1 in the selling price
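The slope t statistics and confidence intervals use only the coefficients, their standard errors, and one table value. This sketch hard-codes t = 2.1788 (12 d.f.) from the slide; its interval endpoints agree with Excel's to about three decimals, since Excel uses an unrounded t value.

```python
# Coefficients and standard errors from the regression output
coefs = {"Price": (-24.97509, 10.83213), "Advertising": (74.13096, 25.96732)}
t_crit = 2.1788   # t_{.025} with n - k - 1 = 12 d.f., from a t table

results = {}
for name, (b, s_b) in coefs.items():
    t = (b - 0) / s_b                                # t = (b_i - 0) / s_{b_i}
    ci = (b - t_crit * s_b, b + t_crit * s_b)        # b_i +/- t_{alpha/2} * s_{b_i}
    results[name] = (t, ci)
    print(f"{name}: t = {t:.3f}, 95% CI = ({ci[0]:.3f}, {ci[1]:.3f}), "
          f"reject H0: {abs(t) > t_crit}")
```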

Standard Deviation of the Regression Model

The estimate of the standard deviation of the regression model is:

$s_\varepsilon = \sqrt{\frac{SSE}{n - k - 1}} = \sqrt{MSE}$

Is this value large or small? It must be compared to the mean size of y


Standard Deviation of the Regression Model (continued)

From the regression output (Standard Error = 47.46341): the standard deviation of the regression model is 47.46

Standard Deviation of the Regression Model (continued)

The standard deviation of the regression model is 47.46

A rough prediction range for pie sales in a given week is ±2(47.46) = ±94.92

Pie sales in the sample were in the 300 to 500 per week range, so this range is probably too large to be acceptable. The analyst may want to look for additional variables that can explain more of the variation in weekly sales
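The standard deviation of the model and the rough ±2s range follow from SSE alone. This sketch uses the residual sum of squares from the output; the unrounded half-width is about 94.93, slightly above the slide's 94.92, which doubles the rounded 47.46.

```python
from math import sqrt

sse, n, k = 27033.306, 15, 2
s_e = sqrt(sse / (n - k - 1))       # sqrt(MSE)
half_width = 2 * s_e                # rough +/- 2 standard deviation range
print(f"s = {s_e:.2f}, rough prediction range: +/- {half_width:.2f} pies")
```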

Multicollinearity

Multicollinearity: high correlation exists between two independent variables

This means the two variables contribute redundant information to the multiple regression model

Multicollinearity (continued)

Including two highly correlated independent variables can adversely affect the regression results

No new information provided

Can lead to unstable coefficients (large standard errors and low t-values)

Coefficient signs may not match prior expectations

Some Indications of Severe Multicollinearity

Incorrect signs on the coefficients

Large change in the value of a previous coefficient when a new variable is added to the model

A previously significant variable becomes insignificant when a new independent variable is added

The estimate of the standard deviation of the model increases when a variable is added to the model

Detect Collinearity (Variance Inflationary Factor)

VIFj is used to measure collinearity:

$VIF_j = \frac{1}{1 - R_j^2}$

R²j is the coefficient of determination when the jth independent variable is regressed against the remaining k − 1 independent variables

If VIFj ≥ 5, xj is highly correlated with the other explanatory variables

Detect Collinearity (continued)

Output for the pie sales example (Regression Analysis: Price and all other X):

Multiple R 0.030437581

R Square 0.000926446

Adjusted R Square -0.075925366

Observations 15

VIF 1.000927305

Since there are only two explanatory variables, only one VIF is reported

VIF is < 5, so there is no evidence of collinearity between Price and Advertising
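With only two explanatory variables, regressing Price on Advertising gives R²j equal to the square of their simple correlation, so the VIF can be checked from the Multiple R in the output above. This sketch relies on that two-variable shortcut; with more predictors R²j would have to come from a full auxiliary regression.

```python
# With two explanatory variables, R_j^2 = r^2 for their simple correlation r
r_price_adv = 0.030437581          # Multiple R from the auxiliary regression output
r2_j = r_price_adv ** 2
vif = 1 / (1 - r2_j)
print(f"R_j^2 = {r2_j:.9f}, VIF = {vif:.9f}, high collinearity: {vif >= 5}")
```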

Qualitative (Dummy) Variables

Categorical explanatory variable (dummy variable) with two or more levels: yes or no, on or off, male or female

Coded as 0 or 1

Regression intercepts are different if the variable is significant

Assumes equal slopes for other variables

The number of dummy variables needed is (number of levels − 1)

Dummy-Variable Model Example (with 2 Levels)

Let:

y = pie sales

x1 = price

x2 = holiday (x2 = 1 if a holiday occurred during the week; x2 = 0 if there was no holiday that week)

$\hat{y} = b_0 + b_1 x_1 + b_2 x_2$

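Dummy-Variable Model Example (with 2 Levels) (continued)

Holiday: $\hat{y} = b_0 + b_1 x_1 + b_2(1) = (b_0 + b_2) + b_1 x_1$

No Holiday: $\hat{y} = b_0 + b_1 x_1 + b_2(0) = b_0 + b_1 x_1$

The two lines have the same slope (b1) but different intercepts (b0 + b2 versus b0).

[Figure: y (sales) plotted against x1 (Price), showing two parallel lines with intercepts b0 + b2 (Holiday) and b0 (No Holiday)]

If H0: β2 = 0 is rejected, then “Holiday” has a significant effect on pie sales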

Interpreting the Dummy Variable Coefficient (with 2 Levels)

Example: $\widehat{\text{Sales}} = 300 - 30(\text{Price}) + 15(\text{Holiday})$

Sales: number of pies sold per week

Price: pie price in $

Holiday: 1 if a holiday occurred during the week; 0 if no holiday occurred

b2 = 15: on average, sales were 15 pies greater in weeks with a holiday than in weeks without a holiday, given the same price

Dummy-Variable Models (more than 2 Levels)

The number of dummy variables is one less than the number of levels

Example: y = house price; x1 = square feet

The style of the house is also thought to matter: Style = ranch, split level, condo

Three levels, so two dummy variables are needed

Dummy-Variable Models (more than 2 Levels) (continued)

Let the default category be “condo”:

x2 = 1 if ranch; 0 if not

x3 = 1 if split level; 0 if not

$\hat{y} = b_0 + b_1 x_1 + b_2 x_2 + b_3 x_3$

b2 shows the impact on price if the house is a ranch style, compared to a condo

b3 shows the impact on price if the house is a split level style, compared to a condo

Interpreting the Dummy Variable Coefficients (with 3 Levels)

Suppose the estimated equation is

$\hat{y} = 20.43 + 0.045x_1 + 23.53x_2 + 18.84x_3$

For a condo (x2 = x3 = 0): $\hat{y} = 20.43 + 0.045x_1$

For a ranch (x2 = 1, x3 = 0): $\hat{y} = 20.43 + 0.045x_1 + 23.53$

For a split level (x2 = 0, x3 = 1): $\hat{y} = 20.43 + 0.045x_1 + 18.84$

With the same square feet, a ranch will have an estimated average price 23.53 thousand dollars more than a condo

With the same square feet, a split level will have an estimated average price 18.84 thousand dollars more than a condo
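The dummy coding can be made concrete by turning a house style into the two 0/1 variables and evaluating the estimated equation. In this sketch the function name `predicted_price` and the 1,500 square foot house are hypothetical, chosen only for illustration; prices are in thousands of dollars, as in the interpretation above.

```python
# Estimated equation: y-hat = 20.43 + 0.045 x1 + 23.53 x2 + 18.84 x3  (y in $1000s)
# x1 = square feet; x2 = 1 for ranch; x3 = 1 for split level; condo is the default category
def predicted_price(square_feet, style):
    if style not in ("condo", "ranch", "split level"):
        raise ValueError(f"unknown style: {style}")
    x2 = 1 if style == "ranch" else 0
    x3 = 1 if style == "split level" else 0
    return 20.43 + 0.045 * square_feet + 23.53 * x2 + 18.84 * x3


sqft = 1500  # hypothetical house size, for illustration only
condo = predicted_price(sqft, "condo")
for style in ("condo", "ranch", "split level"):
    price = predicted_price(sqft, style)
    print(f"{style:12s} {price:8.2f}  difference vs. condo: {price - condo:6.2f}")
```

Whatever square footage is used, the differences from the condo baseline stay at 23.53 and 18.84, which is what "equal slopes, different intercepts" means.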

Interaction Effects

Hypothesizes interaction between pairs of x variables

Response to one x variable varies at different levels of another x variable

Contains two-way cross-product terms:

$y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \beta_3 x_3 + \beta_4 x_1 x_2 + \beta_5 x_1 x_3$

(basic terms: β0 + β1x1 + β2x2 + β3x3; interactive terms: β4x1x2 + β5x1x3)

Effect of Interaction

Given: $y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \beta_3 x_1 x_2 + \varepsilon$

Without the interaction term, the effect of x1 on y is measured by β1

With the interaction term, the effect of x1 on y is measured by β1 + β3x2

The effect changes as x2 increases

Interaction Example

$y = 1 + 2x_1 + 3x_2 + 4x_1 x_2$, where x2 = 0 or 1 (dummy variable)

x2 = 1: $y = 1 + 2x_1 + 3(1) + 4x_1(1) = 4 + 6x_1$

x2 = 0: $y = 1 + 2x_1 + 3(0) + 4x_1(0) = 1 + 2x_1$

The effect (slope) of x1 on y does depend on the value of x2

[Figure: y plotted against x1 (0 to 1.5) for x2 = 0 and x2 = 1]
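The example's two lines can be recovered numerically. For a fixed x2 the model is linear in x1, so this sketch reads the intercept off y(0, x2) and the slope off y(1, x2) − y(0, x2); the function name `y` simply mirrors the example's notation.

```python
def y(x1, x2):
    """Interaction model from the example: y = 1 + 2*x1 + 3*x2 + 4*x1*x2."""
    return 1 + 2 * x1 + 3 * x2 + 4 * x1 * x2


for x2 in (0, 1):
    intercept = y(0, x2)
    slope = y(1, x2) - y(0, x2)   # linear in x1 for fixed x2, so this is the slope
    print(f"x2 = {x2}: y = {intercept} + {slope}x1")
```

The slope changes from 2 to 6, that is, from β1 to β1 + β3, exactly as described under Effect of Interaction.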

Interaction Regression Model Worksheet

Case, i yi x1i x2i x1i x2i

1 1 1 3 3

2 4 8 5 40

3 1 3 2 6

4 3 5 6 30

: : : : :

Multiply x1 by x2 to get x1x2, then run a regression with y, x1, x2, and x1x2
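The worksheet step, building the cross-product column, is shown below for the four rows listed. Only the column construction is sketched; fitting the regression would need the full data set, which the worksheet elides with ":".

```python
# First four rows of the worksheet: (y, x1, x2)
rows = [(1, 1, 3), (4, 8, 5), (1, 3, 2), (3, 5, 6)]

# Predictor columns for the interaction model: x1, x2, and the cross-product x1*x2
design = [(x1, x2, x1 * x2) for _, x1, x2 in rows]
responses = [yi for yi, _, _ in rows]
for yi, (x1, x2, x1x2) in zip(responses, design):
    print(yi, x1, x2, x1x2)
```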

Evaluating Presence of Interaction

Hypothesize interaction between pairs of independent variables:

$y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \beta_3 x_1 x_2 + \varepsilon$

Hypotheses:

H0: β3 = 0 (no interaction between x1 and x2)

HA: β3 ≠ 0 (x1 interacts with x2)

Model Building

Goal is to develop a model with the best set of independent variables

Easier to interpret if unimportant variables are removed

Lower probability of collinearity

Stepwise regression procedure

Provides evaluation of alternative models as variables are added

Best-subset approach

Try all combinations and select the best using the highest adjusted R² and lowest sε

Stepwise Regression

Idea: develop the least squares regression equation in steps, through forward selection, backward elimination, or standard stepwise regression

The coefficient of partial determination is the measure of the marginal contribution of each independent variable, given that other independent variables are in the model

Best Subsets Regression

Idea: estimate all possible regression equations using all possible combinations of independent variables

Choose the best fit by looking for the highest adjusted R² and lowest standard error sε

Stepwise regression and best subsets regression can be performed using Minitab or other statistical software packages
