Economics 173 Business Statistics
Lecture 22
Fall, 2001 ©
Professor J. Petry
http://www.cba.uiuc.edu/jpetry/Econ_173_fa01/
19.1 Introduction
• Regression analysis is one of the most commonly used techniques in statistics.
• It is considered powerful for several reasons:
  – It can cover a variety of mathematical models:
    • linear relationships,
    • non-linear relationships,
    • qualitative variables.
  – It provides efficient methods for model building, to select the best-fitting set of variables.
19.2 Polynomial Models
• The independent variables may appear as functions of a number of predictor variables.
  – Polynomial models of order p with one predictor variable:
    $y = \beta_0 + \beta_1 x + \beta_2 x^2 + \dots + \beta_p x^p + \varepsilon$
  – Polynomial models with two predictor variables, for example:
    $y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \varepsilon$
    $y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \beta_3 x_1 x_2 + \varepsilon$
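• Polynomial models with one predictor variable
  – First order model (p = 1):
    $y = \beta_0 + \beta_1 x + \varepsilon$
  – Second order model (p = 2):
    $y = \beta_0 + \beta_1 x + \beta_2 x^2 + \varepsilon$
    [Figure: second order curves for $\beta_2 < 0$ and $\beta_2 > 0$]
  – Third order model (p = 3):
    $y = \beta_0 + \beta_1 x + \beta_2 x^2 + \beta_3 x^3 + \varepsilon$
    [Figure: third order curves for $\beta_3 < 0$ and $\beta_3 > 0$]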
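A polynomial model of order p is still linear in the coefficients $\beta_0, \dots, \beta_p$, so it can be estimated by ordinary least squares on the columns $1, x, x^2, \dots, x^p$. The sketch below shows, under that view, roughly what statistical software computes: it builds those columns and solves the normal equations $(X^\top X)b = X^\top y$ by Gaussian elimination. The function names and the noise-free sample data are hypothetical, and real packages use more numerically stable methods (such as QR decomposition), so treat this as an illustration only.

```python
from typing import List, Sequence


def design_row(x: float, p: int) -> List[float]:
    """Regressor values [1, x, x^2, ..., x^p] for a polynomial model of order p."""
    return [x ** j for j in range(p + 1)]


def solve(a: List[List[float]], b: List[float]) -> List[float]:
    """Solve a @ sol = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        # Move the row with the largest pivot into place for stability.
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    sol = [0.0] * n
    for r in range(n - 1, -1, -1):  # back-substitution
        s = sum(m[r][c] * sol[c] for c in range(r + 1, n))
        sol[r] = (m[r][n] - s) / m[r][r]
    return sol


def fit_polynomial(xs: Sequence[float], ys: Sequence[float], p: int) -> List[float]:
    """Least-squares estimates b0..bp from the normal equations (X'X) b = X'y."""
    X = [design_row(x, p) for x in xs]
    k = p + 1
    xtx = [[sum(row[i] * row[j] for row in X) for j in range(k)] for i in range(k)]
    xty = [sum(row[i] * y for row, y in zip(X, ys)) for i in range(k)]
    return solve(xtx, xty)


# Hypothetical exact data from y = 1 + 2x - 0.5x^2 (a second order model, beta_2 < 0).
xs = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0]
ys = [1 + 2 * x - 0.5 * x ** 2 for x in xs]
coef = fit_polynomial(xs, ys, 2)
print(coef)
```

Because the sample data contain no error term, the estimates recover the generating coefficients up to floating-point rounding.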
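• Polynomial models with two predictor variables
  – First order model:
    $y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \varepsilon$
    [Figure: response planes over $(x_1, x_2)$ for $\beta_1 < 0$, $\beta_1 > 0$, $\beta_2 > 0$, $\beta_2 < 0$]
    The effect of one predictor variable on y is independent of the effect of the other predictor variable on y. Holding $x_2$ fixed gives parallel lines in $x_1$ with the same slope $\beta_1$:
    $x_2 = 1$: $y = [\beta_0 + \beta_2(1)] + \beta_1 x_1$
    $x_2 = 2$: $y = [\beta_0 + \beta_2(2)] + \beta_1 x_1$
    $x_2 = 3$: $y = [\beta_0 + \beta_2(3)] + \beta_1 x_1$
  – First order model with interaction:
    $y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \beta_3 x_1 x_2 + \varepsilon$
    The two variables interact to affect the value of y: the slope in $x_1$ now changes with $x_2$:
    $x_2 = 1$: $y = [\beta_0 + \beta_2(1)] + [\beta_1 + \beta_3(1)]x_1$
    $x_2 = 2$: $y = [\beta_0 + \beta_2(2)] + [\beta_1 + \beta_3(2)]x_1$
    $x_2 = 3$: $y = [\beta_0 + \beta_2(3)] + [\beta_1 + \beta_3(3)]x_1$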
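Grouping the terms that multiply $x_1$ shows why the interaction model produces non-parallel lines. This is plain algebra on the model above; setting $\beta_3 = 0$ recovers the first order model with its common slope $\beta_1$.

```latex
\begin{aligned}
y &= \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \beta_3 x_1 x_2 + \varepsilon \\
  &= \underbrace{(\beta_0 + \beta_2 x_2)}_{\text{intercept}}
   + \underbrace{(\beta_1 + \beta_3 x_2)}_{\text{slope in } x_1}\, x_1 + \varepsilon
\end{aligned}
```

For example, moving from $x_2 = 1$ to $x_2 = 2$ changes the slope in $x_1$ by exactly $\beta_3$.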
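  – Second order model:
    $y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \beta_3 x_1^2 + \beta_4 x_2^2 + \varepsilon$
    Holding $x_2$ fixed shifts the same curve in $x_1$ up or down:
    $x_2 = 1$: $y = [\beta_0 + \beta_2(1) + \beta_4(1^2)] + \beta_1 x_1 + \beta_3 x_1^2 + \varepsilon$
    $x_2 = 2$: $y = [\beta_0 + \beta_2(2) + \beta_4(2^2)] + \beta_1 x_1 + \beta_3 x_1^2 + \varepsilon$
    $x_2 = 3$: $y = [\beta_0 + \beta_2(3) + \beta_4(3^2)] + \beta_1 x_1 + \beta_3 x_1^2 + \varepsilon$
  – Second order model with interaction:
    $y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \beta_3 x_1^2 + \beta_4 x_2^2 + \beta_5 x_1 x_2 + \varepsilon$
    [Figure: curves in $x_1$ for $x_2 = 1, 2, 3$]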
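The same regrouping applied to the second order model with interaction shows what the $\beta_5 x_1 x_2$ term changes: with $x_2 = k$ held fixed, both the intercept and the linear coefficient on $x_1$ depend on $k$, while the curvature $\beta_3$ does not. Setting $\beta_5 = 0$ gives back the curves listed for the second order model without interaction.

```latex
\begin{aligned}
y &= \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \beta_3 x_1^2 + \beta_4 x_2^2 + \beta_5 x_1 x_2 + \varepsilon \\
\left. y \right|_{x_2 = k} &= [\beta_0 + \beta_2 k + \beta_4 k^2] + (\beta_1 + \beta_5 k)\, x_1 + \beta_3 x_1^2 + \varepsilon
\end{aligned}
```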
• Example 19.1: Location for a new restaurant
  – A fast food restaurant chain tries to identify new locations that are likely to be profitable.
  – The primary market for such restaurants is middle-income adults and their children (between the ages of 5 and 12).
  – Which regression model should be proposed to predict the profitability of new locations?
• Solution
  – The dependent variable will be Gross Revenue.
  – There are quadratic relationships between Revenue and each predictor variable. Why?
    • Members of middle-class families are more likely to visit a fast food restaurant than members of poor or wealthy families.
      [Figure: Revenue vs. Income (low, middle, high)]
    • Families with very young or older kids will not visit the restaurant as frequently as families with kids in the middle of the age range.
      [Figure: Revenue vs. Age (low, middle, high)]
  – Proposed model:
    $\text{Revenue} = \beta_0 + \beta_1\,\text{Income} + \beta_2\,\text{Age} + \beta_3\,\text{Income}^2 + \beta_4\,\text{Age}^2 + \beta_5\,(\text{Income})(\text{Age}) + \varepsilon$
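To make the proposed model concrete, the sketch below computes the six regressor values for one candidate location and evaluates the model for a given coefficient vector. It also shows why quadratic terms fit the reasoning above: for a fixed Age, setting the derivative of Revenue with respect to Income to zero gives $\text{Income}^* = -(\beta_1 + \beta_5\,\text{Age})/(2\beta_3)$, which is a maximum when $\beta_3 < 0$. The function names and coefficient values are hypothetical; real coefficients would come from fitting the model to data.

```python
from typing import List


def revenue_design_row(income: float, age: float) -> List[float]:
    """Regressors for the proposed model: [1, Income, Age, Income^2, Age^2, Income*Age]."""
    return [1.0, income, age, income ** 2, age ** 2, income * age]


def predict_revenue(beta: List[float], income: float, age: float) -> float:
    """Evaluate b0 + b1*Income + b2*Age + b3*Income^2 + b4*Age^2 + b5*Income*Age."""
    return sum(b * x for b, x in zip(beta, revenue_design_row(income, age)))


def peak_income(beta: List[float], age: float) -> float:
    """Income that maximizes predicted revenue for a fixed Age (requires b3 < 0)."""
    b1, b3, b5 = beta[1], beta[3], beta[5]
    if b3 >= 0:
        raise ValueError("no interior maximum unless the Income^2 coefficient is negative")
    return -(b1 + b5 * age) / (2 * b3)


# Hypothetical coefficients: revenue rises with income, then falls (b3 < 0).
beta = [0.0, 10.0, 0.0, -0.5, 0.0, 0.0]
print(peak_income(beta, age=0.0))
```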
19.3 Qualitative Independent Variables
• In many real-life situations one or more independent variables are qualitative.
• Including qualitative variables in a regression analysis model is done via indicator variables.
• An indicator variable (I) can take one of two values, “zero” or “one”:
  I = 1 if the first of two conditions is met
      0 if the second of two conditions is met
• Examples:
  I = 1 if data were collected before 1980; 0 if data were collected after 1980
  I = 1 if the temperature was below 50°; 0 if the temperature was 50° or more
  I = 1 if a degree earned is in Finance; 0 if a degree earned is not in Finance
Example 17.1 - continued
• The dealer believes that color is a variable that affects a car’s price.
• Three color categories are considered:
  – White
  – Silver
  – Other colors
• Note: Color is a qualitative variable.
  I1 = 1 if the color is white; 0 if the color is not white
  I2 = 1 if the color is silver; 0 if the color is not silver
• And what about “Other colors”? Set I1 = 0 and I2 = 0.
• Solution
  – The proposed model is
    $y = \beta_0 + \beta_1(\text{Odometer}) + \beta_2 I_1 + \beta_3 I_2 + \varepsilon$
  – To represent a qualitative variable that has m possible categories (levels), we must create m - 1 indicator variables.
  – The data:

    Price   Odometer   I-1   I-2   Color
    5318    37388      1     0     White
    5061    44758      1     0     White
    5008    45833      0     0     Other color
    5795    30862      0     0     Other color
    5784    31705      0     1     Silver
    5359    34010      0     1     Silver
    .       .          .     .
    .       .          .     .
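A minimal sketch of the m - 1 rule: the hypothetical helper below (not a library API) turns a list of category labels into one 0/1 column per non-baseline level. With an intercept in the model, adding an m-th indicator would make the indicator columns sum to the intercept's column of ones (perfect multicollinearity), which is why one level is left out as the baseline. Applied to the six cars listed above with “other” as the baseline, it reproduces the I-1 and I-2 columns.

```python
from typing import Dict, List, Sequence


def indicator_columns(values: Sequence[str], levels: Sequence[str],
                      baseline: str) -> Dict[str, List[int]]:
    """Create m - 1 indicator (0/1) columns for a qualitative variable with
    m levels; observations at the baseline level are coded as all zeros."""
    if baseline not in levels:
        raise ValueError("baseline must be one of the levels")
    unknown = set(values) - set(levels)
    if unknown:
        raise ValueError(f"unexpected categories: {sorted(unknown)}")
    return {
        level: [1 if v == level else 0 for v in values]
        for level in levels
        if level != baseline
    }


# The six cars shown in the data listing, in the same order.
colors = ["white", "white", "other", "other", "silver", "silver"]
cols = indicator_columns(colors, ["white", "silver", "other"], baseline="other")
print(cols)  # "white" plays the role of I-1, "silver" the role of I-2
```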
From Excel we get the regression equation
PRICE = 6350 - .0278(ODOMETER) + 45.2 I1 + 148 I2
• For one additional mile, the auction price decreases by 2.78 cents (for cars in the same color category).
• The equation for a car of the “Other color” category (I1 = 0, I2 = 0):
  Price = 6350 - .0278(Odometer) + 45.2(0) + 148(0) = 6350 - .0278(Odometer)
• The equation for a car of white color (I1 = 1, I2 = 0):
  Price = 6350 - .0278(Odometer) + 45.2(1) + 148(0) = 6395.2 - .0278(Odometer)
• The equation for a car of silver color (I1 = 0, I2 = 1):
  Price = 6350 - .0278(Odometer) + 45.2(0) + 148(1) = 6498 - .0278(Odometer)
• A white car sells, on average, for $45.2 more than a car of the “Other color” category.
• A silver car sells, on average, for $148 more than a car of the “Other color” category.
[Figure: Price vs. Odometer, three parallel lines, one per color category]
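The three equations above can be evaluated with a short function. This sketch hard-codes the rounded coefficients of the Excel equation (6350, -.0278, 45.2, 148) and treats any color other than white or silver as the “Other color” baseline; the function and dictionary names are hypothetical.

```python
# Rounded coefficients from the Excel regression equation.
COEF = {"intercept": 6350.0, "odometer": -0.0278, "white": 45.2, "silver": 148.0}


def predicted_price(odometer: float, color: str) -> float:
    """Predicted auction price; any color except white/silver is 'Other color'."""
    i1 = 1 if color.lower() == "white" else 0   # I1 indicator
    i2 = 1 if color.lower() == "silver" else 0  # I2 indicator
    return (COEF["intercept"] + COEF["odometer"] * odometer
            + COEF["white"] * i1 + COEF["silver"] * i2)


print(predicted_price(36000, "other"))   # 6350 - 1000.8 = 5349.2 (up to float rounding)
print(predicted_price(36000, "silver"))  # $148 more than the line above
```

Because only the intercept changes with color, the difference between any two color categories is the same at every odometer reading, which is what the parallel lines express.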
SUMMARY OUTPUT

Regression Statistics
Multiple R           0.835482
R Square             0.69803
Adjusted R Square    0.688594
Standard Error       142.271
Observations         100

ANOVA
             df    SS         MS          F          Significance F
Regression   3     4491749    1497250     73.97095   7.22E-25
Residual     96    1943141    20241.05
Total        99    6434890

             Coefficients   Standard Error   t Stat     P-value    Lower 95%   Upper 95%
Intercept    6350.323       92.16653         68.90053   1.5E-83    6167.374    6533.272
Odometer     -0.02777       0.002369         -11.7242   3.14E-20   -0.03247    -0.02307
I-1          45.24098       34.08443         1.327321   0.187551   -22.4161    112.8981
I-2          147.738        38.18499         3.869007   0.000199   71.94135    223.5347
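As a check on how the SUMMARY OUTPUT numbers fit together, the sketch below recomputes R Square, Adjusted R Square, Standard Error and F from the ANOVA sums of squares using the standard formulas, with n = 100 observations and k = 3 predictors (Odometer, I-1, I-2). Small differences in the last digit are expected because the displayed SS values are rounded.

```python
import math

n, k = 100, 3            # observations; predictors (Odometer, I-1, I-2)
SSR = 4491749.0          # regression sum of squares (ANOVA table)
SSE = 1943141.0          # residual sum of squares (ANOVA table)
SST = SSR + SSE          # total sum of squares, 6434890

MSR = SSR / k                          # mean square for regression, df = 3
MSE = SSE / (n - k - 1)                # mean square error, df = 96
r_square = SSR / SST                   # share of variation explained
adj_r_square = 1 - MSE / (SST / (n - 1))
std_error = math.sqrt(MSE)             # standard error of estimate
f_stat = MSR / MSE                     # overall F statistic

print(f"R^2 = {r_square:.5f}, adj. R^2 = {adj_r_square:.6f}")
print(f"s = {std_error:.3f}, F = {f_stat:.3f}")
```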
• There is insufficient evidence (P-value = 0.1876 for I-1) to infer that a white car and a car of the “Other color” category sell for a different auction price.
• There is sufficient evidence (P-value = 0.0002 for I-2) to infer that a silver car sells for a higher price than a car of the “Other color” category.
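The two conclusions can be reproduced from the coefficient table: each t Stat is the coefficient divided by its standard error, and each 95% interval is the coefficient plus or minus $t_{0.025,96}$ times the standard error. The critical value 1.985 used below is an assumption recovered from the output itself, e.g. (112.8981 - 45.24098) / 34.08443. An interval that contains 0 matches the “insufficient evidence” conclusion at the 5% level. Because the claim about silver cars is one-sided, its one-tailed p-value is half the reported two-tailed P-value (t > 0), which is even smaller.

```python
from typing import Tuple

# Approximate t_{0.025, 96}; assumption recovered from the Excel 95% limits.
T_CRIT = 1.985


def t_and_interval(b: float, se: float,
                   t_crit: float = T_CRIT) -> Tuple[float, float, float]:
    """Return (t statistic for H0: beta = 0, lower 95% limit, upper 95% limit)."""
    return b / se, b - t_crit * se, b + t_crit * se


# (coefficient, standard error) pairs copied from the SUMMARY OUTPUT.
estimates = {
    "Odometer": (-0.02777, 0.002369),
    "I-1": (45.24098, 34.08443),
    "I-2": (147.738, 38.18499),
}

results = {name: t_and_interval(b, se) for name, (b, se) in estimates.items()}
for name, (t, lo, hi) in results.items():
    verdict = "excludes 0" if lo > 0 or hi < 0 else "contains 0"
    print(f"{name}: t = {t:.3f}, 95% CI = ({lo:.3f}, {hi:.3f}) {verdict}")
```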
• Example
Create and identify indicator variables to represent the following qualitative variables.
• Religious affiliation (Catholic, Protestant, other)
• Working shift (8:00am to 4:00pm, 4:00pm to 12:00 midnight, 12:00 midnight to 8:00am)
• Supervisor (Ringo Star, Rondal Gondarfshkitka, Seymour Heinne, and Billy Bob Thorton)
  1. Assume there are no other supervisors.
  2. Assume there are other supervisors.
19.6 Model Building
• Identify the dependent variable, and clearly define it.
• List potential predictors.
  – Bear in mind the problem of multicollinearity.
  – Consider the cost of gathering, processing and storing data.
  – Be selective in your choice (try to use as few variables as possible).
• Identify several possible models.
  – A scatter diagram of the dependent variable against each independent variable can be helpful in formulating the right model.
  – If you are uncertain, start with first order and second order models, with and without interaction.
  – Try other relationships (transformations) if the polynomial models fail to provide a good fit.
• Use statistical software to estimate the model.
• Gather the required observations (have at least six observations for each independent variable).
• Determine whether the required conditions are satisfied. If not, attempt to correct the problem.
• Select the best model.
  – Use the statistical output.
  – Use your judgment!
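One common way to “use the statistical output” when comparing candidate models is adjusted R², which only rises when an added term reduces SSE enough to justify the lost degree of freedom. Choosing it as the criterion is an assumption of this sketch; the required conditions still have to be checked, and judgment still applies. The helper names and the SSE values for the three candidates are hypothetical, chosen to show a model with interaction losing to a simpler one despite a slightly smaller SSE.

```python
from typing import Dict, Tuple


def adjusted_r_square(sse: float, sst: float, n: int, k: int) -> float:
    """Adjusted R^2 = 1 - [SSE / (n - k - 1)] / [SST / (n - 1)]."""
    return 1 - (sse / (n - k - 1)) / (sst / (n - 1))


def best_by_adjusted_r_square(candidates: Dict[str, Tuple[float, int]],
                              sst: float, n: int) -> str:
    """Pick the candidate with the largest adjusted R^2.

    candidates maps a model name to (SSE, k), where k is the number of
    predictor terms in that model, excluding the intercept.
    """
    return max(
        candidates,
        key=lambda name: adjusted_r_square(candidates[name][0], sst, n, candidates[name][1]),
    )


# Hypothetical SSE values for models fitted to the same n = 30 observations
# with SST = 1000 (illustrative numbers only, two predictors x1 and x2).
candidates = {
    "first order": (500.0, 2),                     # x1, x2
    "second order": (300.0, 4),                    # + x1^2, x2^2
    "second order with interaction": (298.0, 5),   # + x1*x2
}
best = best_by_adjusted_r_square(candidates, sst=1000.0, n=30)
print(best)
```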