Economics 173 Business Statistics, Lecture 22, Fall 2001 © Professor J. Petry


http://www.cba.uiuc.edu/jpetry/Econ_173_fa01/


19.1 Introduction

• Regression analysis is one of the most commonly used techniques in statistics.

• It is considered powerful for several reasons:
– It can cover a variety of mathematical models:
• linear relationships;
• non-linear relationships;
• qualitative variables.
– It provides efficient methods for model building, to select the best-fitting set of variables.


19.2 Polynomial Models

• The independent variables may appear as functions of a number of predictor variables.
– Polynomial models of order p with one predictor variable:
$y = \beta_0 + \beta_1 x + \beta_2 x^2 + \dots + \beta_p x^p + \varepsilon$
– Polynomial models with two predictor variables. For example:
$y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \varepsilon$
$y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \beta_3 x_1 x_2 + \varepsilon$
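A polynomial model is still a linear regression model because it is linear in the coefficients $\beta_0, \dots, \beta_p$. Only the regressors are non-linear functions of the predictors. The minimal Python sketch below builds the regressor values for the one-predictor model of order p and for the two-predictor model with interaction. The function names `poly_row`, `two_var_interaction_row` and `predict` are hypothetical, and the coefficients in the usage lines are made up for illustration.

```python
def poly_row(x: float, p: int) -> list[float]:
    """Regressor values [1, x, x^2, ..., x^p] for a p-th order model in one predictor."""
    return [x ** k for k in range(p + 1)]


def two_var_interaction_row(x1: float, x2: float) -> list[float]:
    """Regressor values [1, x1, x2, x1*x2] for the first order model with interaction."""
    return [1.0, x1, x2, x1 * x2]


def predict(betas: list[float], row: list[float]) -> float:
    """Deterministic part of the model: sum of beta_j times regressor j (epsilon omitted)."""
    if len(betas) != len(row):
        raise ValueError("need one coefficient per regressor")
    return sum(b * r for b, r in zip(betas, row))


# Hypothetical coefficients, for illustration only.
print(poly_row(2.0, 3))                               # [1.0, 2.0, 4.0, 8.0]
print(predict([1.0, 0.5, -0.25], poly_row(2.0, 2)))   # 1 + 1 - 1 = 1.0
print(two_var_interaction_row(2.0, 3.0))              # [1.0, 2.0, 3.0, 6.0]
```

Because each model is linear in the betas, the same least squares machinery estimates all of these models. Only the columns fed to it change.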


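• Polynomial models with one predictor variable

– First order model (p = 1): $y = \beta_0 + \beta_1 x + \varepsilon$

– Second order model (p = 2): $y = \beta_0 + \beta_1 x + \beta_2 x^2 + \varepsilon$
[Figure: parabolas for $\beta_2 < 0$ (opening downward) and $\beta_2 > 0$ (opening upward)]

– Third order model (p = 3): $y = \beta_0 + \beta_1 x + \beta_2 x^2 + \beta_3 x^3 + \varepsilon$
[Figure: cubic curves for $\beta_3 < 0$ and $\beta_3 > 0$]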
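The next sketch shows how the sign of $\beta_2$ appears in practice. It fits the second order model by ordinary least squares, solving the normal equations $(X'X)b = X'y$ with plain Gaussian elimination. The data are hypothetical and noise-free, generated from $y = 2 + 3x - 0.5x^2$, so the fit should recover those coefficients. Statistical software typically uses numerically more stable methods, such as a QR decomposition, rather than forming $X'X$ directly. The function names are illustrative.

```python
def solve(a: list[list[float]], b: list[float]) -> list[float]:
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix [a | b]
    for col in range(n):
        # Swap in the row with the largest pivot to limit rounding error.
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    # Back substitution on the upper-triangular system.
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x


def fit_polynomial(xs: list[float], ys: list[float], p: int) -> list[float]:
    """Least squares estimates b0..bp for y = b0 + b1 x + ... + bp x^p + e."""
    rows = [[x ** k for k in range(p + 1)] for x in xs]
    k = p + 1
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * y for r, y in zip(rows, ys)) for i in range(k)]
    return solve(xtx, xty)


# Hypothetical noise-free data from y = 2 + 3x - 0.5x^2 (beta2 < 0).
xs = [0, 1, 2, 3, 4, 5, 6]
ys = [2 + 3 * x - 0.5 * x ** 2 for x in xs]
b = fit_polynomial(xs, ys, 2)
print([round(v, 6) for v in b])   # [2.0, 3.0, -0.5]
print("opens downward" if b[2] < 0 else "opens upward")
```

Passing `p=3` fits the third order model in the same way.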


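• Polynomial models with two predictor variables

– First order model: $y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \varepsilon$
[Figure: the response plane over axes $x_1$ and $x_2$, shown for $\beta_1 < 0$, $\beta_1 > 0$ and for $\beta_2 > 0$, $\beta_2 < 0$]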


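– First order model: $y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \varepsilon$

The effect of one predictor variable on y is independent of the effect of the other predictor variable on y. Fixing $x_2$ only shifts the intercept, so the lines in $x_1$ are parallel:
$x_2 = 1$: $y = [\beta_0 + \beta_2(1)] + \beta_1 x_1$
$x_2 = 2$: $y = [\beta_0 + \beta_2(2)] + \beta_1 x_1$
$x_2 = 3$: $y = [\beta_0 + \beta_2(3)] + \beta_1 x_1$

– First order model with interaction: $y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \beta_3 x_1 x_2 + \varepsilon$

The two variables interact to affect the value of y. Fixing $x_2$ changes both the intercept and the slope of $x_1$:
$x_2 = 1$: $y = [\beta_0 + \beta_2(1)] + (\beta_1 + \beta_3(1)) x_1$
$x_2 = 2$: $y = [\beta_0 + \beta_2(2)] + (\beta_1 + \beta_3(2)) x_1$
$x_2 = 3$: $y = [\beta_0 + \beta_2(3)] + (\beta_1 + \beta_3(3)) x_1$

[Figure: y plotted against $x_1$ for $x_2 = 1, 2, 3$ under each model]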
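The difference between the two first order models can be checked numerically. When $x_2$ is held fixed, the model reduces to a straight line in $x_1$ with intercept $\beta_0 + \beta_2 x_2$ and slope $\beta_1 + \beta_3 x_2$. The sketch below computes those lines for $x_2 = 1, 2, 3$, once with $\beta_3 = 0$ and once with $\beta_3 \neq 0$. The coefficients and the function name `line_at_fixed_x2` are hypothetical.

```python
def line_at_fixed_x2(b0: float, b1: float, b2: float, b3: float,
                     x2: float) -> tuple[float, float]:
    """Intercept and slope of y against x1 when x2 is held fixed.

    Model: y = b0 + b1*x1 + b2*x2 + b3*x1*x2 + e, so
    intercept = b0 + b2*x2 and slope = b1 + b3*x2.
    """
    return b0 + b2 * x2, b1 + b3 * x2


# Hypothetical coefficients b0 = 10, b1 = 2, b2 = 4.
for b3 in (0.0, 1.5):  # b3 = 0: no interaction; b3 != 0: interaction
    lines = [line_at_fixed_x2(10.0, 2.0, 4.0, b3, x2) for x2 in (1, 2, 3)]
    slopes = {slope for _, slope in lines}
    print(f"b3={b3}: (intercept, slope) = {lines} -> "
          f"{'parallel' if len(slopes) == 1 else 'not parallel'}")
```

With $\beta_3 = 0$ every line has slope 2. With $\beta_3 = 1.5$ the slopes grow with $x_2$, which is what "the two variables interact" means here.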


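– Second order model: $y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \beta_3 x_1^2 + \beta_4 x_2^2 + \varepsilon$

Fixing $x_2$ shifts the curve in $x_1$ up or down without changing its shape:
$x_2 = 1$: $y = [\beta_0 + \beta_2(1) + \beta_4(1^2)] + \beta_1 x_1 + \beta_3 x_1^2 + \varepsilon$
$x_2 = 2$: $y = [\beta_0 + \beta_2(2) + \beta_4(2^2)] + \beta_1 x_1 + \beta_3 x_1^2 + \varepsilon$
$x_2 = 3$: $y = [\beta_0 + \beta_2(3) + \beta_4(3^2)] + \beta_1 x_1 + \beta_3 x_1^2 + \varepsilon$

– Second order model with interaction: $y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \beta_3 x_1^2 + \beta_4 x_2^2 + \beta_5 x_1 x_2 + \varepsilon$

[Figure: y plotted against $x_1$ for $x_2 = 1, 2, 3$ under each model]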
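For the second order model with interaction, collect the terms in $x_1$ for a fixed value of $x_2$, as for the other models. The interaction term then changes the linear coefficient of $x_1$. As a result, the curves for different values of $x_2$ are no longer vertical shifts of one another:

```latex
\begin{aligned}
y &= \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \beta_3 x_1^2 + \beta_4 x_2^2 + \beta_5 x_1 x_2 + \varepsilon \\
  &= \left[\beta_0 + \beta_2 x_2 + \beta_4 x_2^2\right] + \left(\beta_1 + \beta_5 x_2\right) x_1 + \beta_3 x_1^2 + \varepsilon \\
x_2 = 1:\quad y &= \left[\beta_0 + \beta_2(1) + \beta_4(1^2)\right] + \left(\beta_1 + \beta_5(1)\right) x_1 + \beta_3 x_1^2 + \varepsilon
\end{aligned}
```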


• Example 19.1 Location for a new restaurant

– A fast food restaurant chain tries to identify new locations that are likely to be profitable.

– The primary market for such restaurants is middle-income adults and their children (between the ages of 5 and 12).

– Which regression model should be proposed to predict the profitability of new locations?


• Solution
– The dependent variable will be Gross Revenue.
– There are quadratic relationships between Revenue and each predictor variable. Why?
• Members of middle-class families are more likely to visit a fast food restaurant than members of poor or wealthy families.
[Figure: Revenue plotted against Income (low, middle, high)]
• Families with very young or older kids will not visit the restaurant as frequently as families with mid-range ages of kids.
[Figure: Revenue plotted against Age (low, middle, high)]
– The proposed model:
$\text{Revenue} = \beta_0 + \beta_1 \text{Income} + \beta_2 \text{Age} + \beta_3 \text{Income}^2 + \beta_4 \text{Age}^2 + \beta_5 (\text{Income})(\text{Age}) + \varepsilon$
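The quadratic terms let the model capture the "peak in the middle" pattern. If $\beta_3 < 0$, predicted revenue at a fixed Age is highest where its derivative with respect to Income is zero, i.e. $\text{Income}^* = -(\beta_1 + \beta_5 \cdot \text{Age}) / (2\beta_3)$. The sketch below builds the regressor row for the proposed model and computes that peak. The coefficients are hypothetical, since the lecture gives no estimates for this example. The function names are illustrative, and Income and Age are in whatever units the data use.

```python
def revenue_row(income: float, age: float) -> list[float]:
    """Regressors [1, Income, Age, Income^2, Age^2, Income*Age] for the proposed model."""
    return [1.0, income, age, income ** 2, age ** 2, income * age]


def predicted_revenue(betas: list[float], income: float, age: float) -> float:
    """b0 + b1 Inc + b2 Age + b3 Inc^2 + b4 Age^2 + b5 Inc*Age (error term omitted)."""
    return sum(b * r for b, r in zip(betas, revenue_row(income, age)))


def income_at_peak(betas: list[float], age: float) -> float:
    """Income that maximizes predicted revenue at a fixed Age.

    Solves dRevenue/dIncome = b1 + 2*b3*Income + b5*Age = 0; requires b3 < 0.
    """
    _, b1, _, b3, _, b5 = betas
    if b3 >= 0:
        raise ValueError("an interior maximum in Income requires b3 < 0")
    return -(b1 + b5 * age) / (2 * b3)


# Hypothetical coefficients b0..b5, for illustration only.
betas = [100.0, 25.0, 40.0, -0.25, -2.5, 0.5]
peak = income_at_peak(betas, age=8.0)
print(peak)  # 58.0
print(predicted_revenue(betas, peak, 8.0) > predicted_revenue(betas, peak + 10, 8.0))
```

With $\beta_5 \neq 0$ the revenue-maximizing income depends on Age. That dependence is what the interaction term allows.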


19.3 Qualitative Independent Variables

• In many real-life situations one or more independent variables are qualitative.

• Including qualitative variables in a regression analysis model is done via indicator variables.

• An indicator variable (I) can assume one out of two values, “zero” or “one”.

$I = 1$ if a first condition out of two is met; $I = 0$ if a second condition out of two is met.

Examples:
$I = 1$ if data were collected before 1980; $I = 0$ if data were collected after 1980.
$I = 1$ if the temperature was below 50°; $I = 0$ if the temperature was 50° or more.
$I = 1$ if a degree earned is in Finance; $I = 0$ if a degree earned is not in Finance.


Example 17.1 - continued

• The dealer believes that color is a variable that affects a car’s price.

• Three color categories are considered:
– White
– Silver
– Other colors

• Note: Color is a qualitative variable.

$I_1 = 1$ if the color is white; $I_1 = 0$ if the color is not white.

$I_2 = 1$ if the color is silver; $I_2 = 0$ if the color is not silver.

And what about “Other colors”? Set $I_1 = 0$ and $I_2 = 0$.


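• Solution
– The proposed model is
$y = \beta_0 + \beta_1(\text{Odometer}) + \beta_2 I_1 + \beta_3 I_2 + \varepsilon$
– To represent a qualitative variable that has m possible categories (levels), we must create m-1 indicator variables. Using all m indicators together with the intercept would make them perfectly collinear, because the m indicators always sum to 1.
– The data:

Price   Odometer   I-1   I-2   Color
5318    37388      1     0     White
5061    44758      1     0     White
5008    45833      0     0     Other
5795    30862      0     0     Other
5784    31705      0     1     Silver
5359    34010      0     1     Silver
.       .          .     .
.       .          .     .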
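The m-1 rule can be written as a small helper that maps a category to its indicator values. One level is chosen as the baseline and coded as all zeros. This is a minimal sketch: the function name `indicators` and the string labels are assumptions, with "other" playing the role of the baseline as in the car example.

```python
def indicators(category: str, levels: list[str], baseline: str) -> list[int]:
    """Return the m-1 indicator values for one observation of a qualitative variable.

    There is one indicator per non-baseline level, in the order given by `levels`;
    the baseline category is coded as all zeros.
    """
    if baseline not in levels:
        raise ValueError(f"baseline {baseline!r} is not one of the levels")
    if category not in levels:
        raise ValueError(f"unknown category: {category!r}")
    return [1 if category == level else 0 for level in levels if level != baseline]


levels = ["white", "silver", "other"]  # I1 -> white, I2 -> silver
for color in levels:
    print(color, indicators(color, levels, baseline="other"))
# white [1, 0]
# silver [0, 1]
# other [0, 0]
```

Choosing a different baseline changes what each coefficient is compared against, but not the fitted prices.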


From Excel we get the regression equation
PRICE = 6350 - .0278(ODOMETER) + 45.2 I1 + 148 I2

For one additional mile, the auction price decreases by 2.78 cents.

The equation for a car of the “Other color” category:
Price = 6350 - .0278(Odometer) + 45.2(0) + 148(0) = 6350 - .0278(Odometer)

The equation for a car of white color:
Price = 6350 - .0278(Odometer) + 45.2(1) + 148(0) = 6395.2 - .0278(Odometer)

The equation for a car of silver color:
Price = 6350 - .0278(Odometer) + 45.2(0) + 148(1) = 6498 - .0278(Odometer)

A white car sells, on the average, for $45.2 more than a car of the “Other color” category.

A silver color car sells, on the average, for $148 more than a car of the “Other color” category.

[Figure: three parallel lines of Price against Odometer, one per color category]
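All three color-specific equations come from the single fitted equation by plugging in the indicator values. The sketch below uses the rounded coefficients PRICE = 6350 - .0278(ODOMETER) + 45.2 I1 + 148 I2. The function name and the color labels are hypothetical.

```python
# Rounded coefficients from PRICE = 6350 - .0278(ODOMETER) + 45.2 I1 + 148 I2.
B0 = 6350.0
B_ODOMETER = -0.0278
B_WHITE = 45.2     # coefficient of I1
B_SILVER = 148.0   # coefficient of I2


def predicted_price(odometer: float, color: str) -> float:
    """Predicted auction price; color is 'white', 'silver' or 'other' (hypothetical labels)."""
    if color not in ("white", "silver", "other"):
        raise ValueError(f"unknown color: {color!r}")
    i1 = 1 if color == "white" else 0
    i2 = 1 if color == "silver" else 0
    return B0 + B_ODOMETER * odometer + B_WHITE * i1 + B_SILVER * i2


for color in ("other", "white", "silver"):
    # Intercept of each line (odometer = 0) and a prediction at 40,000 miles.
    print(color, round(predicted_price(0, color), 1), round(predicted_price(40000, color), 1))
```

The intercepts come out as 6350, 6395.2 and 6498. The odometer slope is the same for every color, so the gap between any two colors is constant at every mileage.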


SUMMARY OUTPUT

Regression Statistics
Multiple R          0.835482
R Square            0.69803
Adjusted R Square   0.688594
Standard Error      142.271
Observations        100

ANOVA
             df   SS        MS         F          Significance F
Regression   3    4491749   1497250    73.97095   7.22E-25
Residual     96   1943141   20241.05
Total        99   6434890

             Coefficients   Standard Error   t Stat     P-value    Lower 95%   Upper 95%
Intercept    6350.323       92.16653         68.90053   1.5E-83    6167.374    6533.272
Odometer     -0.02777       0.002369         -11.7242   3.14E-20   -0.03247    -0.02307
I-1          45.24098       34.08443         1.327321   0.187551   -22.4161    112.8981
I-2          147.738        38.18499         3.869007   0.000199   71.94135    223.5347

There is insufficient evidence to infer that a white car and a car of “Other color” sell for a different auction price (P-value for I-1 = 0.187551).

There is sufficient evidence to infer that a silver car sells for a larger price than a car of the “Other color” category (P-value for I-2 = 0.000199).
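Most of the numbers in the Excel output can be recomputed from a few of the others, which is a useful check when reading such a table. The sketch below uses the ANOVA sums of squares, with n = 100 and k = 3, to reproduce R Square, Adjusted R Square, F and the standard error. It then uses each coefficient and its standard error to reproduce the t statistic and the 95% limits. The critical value 1.985 for 96 degrees of freedom is an approximate table value and is not part of the output. The silver-car conclusion is one-sided ("larger price"), so its one-tailed p-value is half the reported two-tailed value when t is positive.

```python
import math

# Sums of squares from the ANOVA table of the car-price regression.
SSR, SSE, SST = 4491749.0, 1943141.0, 6434890.0
N, K = 100, 3  # observations; independent variables (Odometer, I-1, I-2)

mse = SSE / (N - K - 1)               # Residual MS (96 df)
r2 = SSR / SST                        # R Square
adj_r2 = 1 - mse / (SST / (N - 1))    # Adjusted R Square
f_stat = (SSR / K) / mse              # F = Regression MS / Residual MS
s_e = math.sqrt(mse)                  # Standard Error of estimate

T_CRIT = 1.985  # approximate t_{0.025} with 96 df (assumed table value)
ALPHA = 0.05

# (coefficient, standard error, two-tailed P-value) copied from the output.
coefs = {
    "I-1": (45.24098, 34.08443, 0.187551),
    "I-2": (147.738, 38.18499, 0.000199),
}

for name, (b, se, p_two) in coefs.items():
    t = b / se
    lower, upper = b - T_CRIT * se, b + T_CRIT * se
    # One-sided test of H1: beta > 0 uses half the two-tailed p-value when t > 0.
    p_one = p_two / 2 if t > 0 else 1 - p_two / 2
    print(f"{name}: t={t:.4f}, 95% CI=({lower:.1f}, {upper:.1f}), "
          f"two-tailed reject={p_two < ALPHA}, one-tailed reject={p_one < ALPHA}")

print(f"R2={r2:.5f} adjR2={adj_r2:.6f} F={f_stat:.2f} s_e={s_e:.3f}")
```

Small differences in the last digits of the confidence limits come from the rounded critical value.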


• Example

Create and identify indicator variables to represent the following qualitative variables.

• Religious affiliation (Catholic, Protestant, other)

• Working shift (8:00am to 4:00pm, 4:00pm to 12:00 midnight, 12:00 midnight to 8:00am)

• Supervisor (Ringo Star, Rondal Gondarfshkitka, Seymour Heinne, and Billy Bob Thorton)
1. Assume there are no other supervisors.
2. Assume there are other supervisors.


19.6 Model Building

• Identify the dependent variable, and clearly define it.
• List potential predictors.
– Bear in mind the problem of multicollinearity.
– Consider the cost of gathering, processing and storing data.
– Be selective in your choice (try to use as few variables as possible).


• Gather the required observations (have at least six observations for each independent variable).
• Identify several possible models.
– A scatter diagram of the dependent variable against each independent variable can be helpful in formulating the right model.
– If you are uncertain, start with first order and second order models, with and without interaction.
– Try other relationships (transformations) if the polynomial models fail to provide a good fit.
• Use statistical software to estimate the model.


• Determine whether the required conditions are satisfied. If not, attempt to correct the problem.

• Select the best model.
– Use the statistical output.
– Use your judgment!
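One way to act on "use the statistical output" is to compare candidate models by adjusted R². Adjusted R² penalizes added variables that do not reduce SSE enough, and it can be combined with the six-observations-per-variable rule of thumb. The sketch below is illustrative only. Candidate A uses the SSE and SST from the car example's output. Candidate B is a hypothetical model with two extra variables that leave SSE unchanged. The function names are assumptions.

```python
def adjusted_r2(sse: float, sst: float, n: int, k: int) -> float:
    """Adjusted R^2 = 1 - [SSE / (n - k - 1)] / [SST / (n - 1)]."""
    return 1 - (sse / (n - k - 1)) / (sst / (n - 1))


def enough_observations(n: int, k: int, per_variable: int = 6) -> bool:
    """Rule of thumb from the lecture: at least six observations per independent variable."""
    return n >= per_variable * k


def best_by_adjusted_r2(candidates, sst: float, n: int):
    """Return the (name, sse, k) candidate with the highest adjusted R^2.

    Candidates that violate the observations rule of thumb are skipped.
    """
    eligible = [c for c in candidates if enough_observations(n, c[2])]
    return max(eligible, key=lambda c: adjusted_r2(c[1], sst, n, c[2]))


SST, N = 6434890.0, 100
candidates = [
    ("A: Odometer, I-1, I-2", 1943141.0, 3),                 # from the car example output
    ("B: A plus two extra variables (hypothetical)", 1943141.0, 5),
]
for name, sse, k in candidates:
    print(name, round(adjusted_r2(sse, SST, N, k), 4))
print("selected:", best_by_adjusted_r2(candidates, SST, N)[0])
```

Adjusted R² is only one input. The required conditions and your own judgment still decide the final choice.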

