© 2000 Prentice-Hall, Inc. Chap. 9 - 1
Forecasting Using the Simple Linear Regression
Model and Correlation
What is a forecast?
Using a statistical method on past data to predict the future.
Using experience, judgment and surveys to predict the future.
Why forecast?
• To enhance planning.
• To force thinking about the future.
• To fit corporate strategy to future conditions.
• To coordinate departments to the same future.
• To reduce corporate costs.
Kinds of Forecasts
Causal forecasts apply when changes in the variable you wish to predict (Y) are caused by changes in other variables (the X's).
Time series forecasts apply when the variable (Y) is predicted from its own prior values.
Regression can provide both kinds of forecasts.
Types of Relationships
[Figure: four scatter plot panels showing a positive linear relationship, a negative linear relationship, a relationship that is not linear, and no relationship.]
Relationships
If the relationship is not linear, the forecaster often has to apply a mathematical transformation to make the relationship linear.
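As a minimal sketch of one such transformation: if Y grows roughly exponentially with X, taking the natural logarithm of Y gives a relationship that is linear in X. The data below are hypothetical and constructed so that Y = 2·exp(0.5·X) exactly, and the helper name `fit_line` is illustrative rather than something from the slides.

```python
import math

def fit_line(xs, ys):
    """Least-squares intercept and slope of a straight line through (xs, ys)."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    ssx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    b1 = sxy / ssx
    b0 = y_bar - b1 * x_bar
    return b0, b1

# Hypothetical data: Y = 2 * exp(0.5 * X), which is curved when plotted against X.
xs = [1, 2, 3, 4, 5, 6]
ys = [2 * math.exp(0.5 * x) for x in xs]

# Transformation: ln(Y) = ln(2) + 0.5 * X is a straight line in X.
log_ys = [math.log(y) for y in ys]
b0, b1 = fit_line(xs, log_ys)

print("slope on the log scale:", round(b1, 4))
print("back-transformed intercept:", round(math.exp(b0), 4))
```

The fitted slope and the back-transformed intercept recover the 0.5 and 2 used to build the data, which is what "making the relationship linear" buys: ordinary linear regression can then be applied to the transformed variable.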
Correlation Analysis
• Correlation measures the strength of the linear relationship between variables.
• It can be used to find the best predictor variables.
• It does not assure that there is a causal relationship between the variables.
The Correlation Coefficient
• Ranges between -1 and 1.
• The closer to -1, the stronger the negative linear relationship.
• The closer to 1, the stronger the positive linear relationship.
• The closer to 0, the weaker any linear relationship.
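For reference, the sample correlation coefficient is commonly computed with the formula below. This is the standard definition, given here as background rather than taken from the slides; $\bar{X}$ and $\bar{Y}$ denote the sample means.

```latex
r = \frac{\sum_{i=1}^{n} (X_i - \bar{X})(Y_i - \bar{Y})}
         {\sqrt{\sum_{i=1}^{n} (X_i - \bar{X})^2 \, \sum_{i=1}^{n} (Y_i - \bar{Y})^2}}
```

The numerator is positive when X and Y tend to be above or below their means together and negative when they move in opposite directions, which is why the sign of r matches the direction of the linear relationship.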
Graphs of Various Correlation (r) Values
[Figure: five scatter plots of Y against X, one each for r = -1, r = -.6, r = 0, r = .6, and r = 1.]
The Scatter Diagram
A plot of all $(X_i, Y_i)$ pairs.
[Figure: example scatter plot, with X from 0 to 60 and Y from 0 to 80.]
The scatter diagram is used to visualize the relationship and to assess its linearity. It can also be used to identify outliers.
Regression Analysis
Regression Analysis can be used to model causality and make predictions.
Terminology: The variable to be predicted is called the dependent or response variable. The variables used in the prediction model are called independent, explanatory or predictor variables.
Simple Linear Regression Model
• The relationship between variables is described by a linear function.
• A change of one variable causes the other variable to change.
Population Linear Regression
• The population regression line is a straight line that describes the dependence of one variable on the other.
$Y_i = \beta_0 + \beta_1 X_i + \varepsilon_i$
where $Y_i$ is the dependent (response) variable, $\beta_0$ is the population Y intercept, $\beta_1$ is the population slope coefficient, $X_i$ is the independent (explanatory) variable, and $\varepsilon_i$ is the random error.
Population regression line: $\mu_{YX} = \beta_0 + \beta_1 X_i$
How is the best line found?
[Figure: observed values of Y plotted against X around the population regression line $\mu_{YX} = \beta_0 + \beta_1 X_i$, where $Y_i = \beta_0 + \beta_1 X_i + \varepsilon_i$ and $\varepsilon_i$ = random error.]
Sample Linear Regression
The sample regression line provides an estimate of the population regression line.
$Y_i = b_0 + b_1 X_i + e_i$
where $b_0$ is the sample Y intercept, which provides an estimate of $\beta_0$; $b_1$ is the sample slope coefficient, which provides an estimate of $\beta_1$; and $e_i$ is the residual.
Sample regression line: $\hat{Y}_i = b_0 + b_1 X_i$
Simple Linear Regression: An Example
You wish to examine the relationship between the square footage of produce stores and their annual sales. Sample data for 7 stores were obtained. Find the equation of the straight line that fits the data best.
Store   Square Feet   Annual Sales ($000)
1       1,726         3,681
2       1,542         3,395
3       2,816         6,653
4       5,555         9,543
5       1,292         3,318
6       2,208         5,563
7       1,313         3,760
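The slides obtain the best-fitting line from Excel. As a rough sketch of what that fit involves, the code below applies the standard least-squares formulas $b_1 = \sum (X_i - \bar{X})(Y_i - \bar{Y}) / \sum (X_i - \bar{X})^2$ and $b_0 = \bar{Y} - b_1 \bar{X}$ to the seven stores. These formulas are not written out in the deck, and the variable names are illustrative.

```python
# Produce store data from the table above (sales in $000).
square_feet = [1726, 1542, 2816, 5555, 1292, 2208, 1313]
annual_sales = [3681, 3395, 6653, 9543, 3318, 5563, 3760]

n = len(square_feet)
x_bar = sum(square_feet) / n
y_bar = sum(annual_sales) / n

# Sum of squared deviations of X, and sum of cross-products of deviations.
ssx = sum((x - x_bar) ** 2 for x in square_feet)
sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(square_feet, annual_sales))

b1 = sxy / ssx           # sample slope
b0 = y_bar - b1 * x_bar  # sample Y intercept

print(f"b0 = {b0:.3f}, b1 = {b1:.3f}")
```

The result should agree with the intercept and slope that the deck reports from the Excel printout.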
The Scatter Diagram
[Figure: Excel scatter plot of annual sales ($000) against square feet for the 7 stores.]
The Equation for the Regression Line
$\hat{Y}_i = b_0 + b_1 X_i = 1636.415 + 1.487 X_i$
From Excel Printout:
               Coefficients
Intercept      1636.414726
X Variable 1   1.486633657
Graph of the Regression Line
[Figure: scatter plot of annual sales ($000) against square feet with the fitted line $\hat{Y}_i = 1636.415 + 1.487 X_i$.]
Interpreting the Results
$\hat{Y}_i = 1636.415 + 1.487 X_i$
The slope of 1.487 means that for each increase of one unit in X, we predict the average of Y to increase by an estimated 1.487 units.
The model estimates that for each increase of 1 square foot in the size of the store, the expected annual sales increase by $1,487.
The Coefficient of Determination
$r^2 = \frac{SSR}{SST} = \frac{\text{regression sum of squares}}{\text{total sum of squares}}$
The coefficient of determination ($r^2$) measures the proportion of variation in Y explained by the independent variable X.
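To make the ratio concrete, here is a small sketch that computes SSR, SST, and $r^2$ for the produce store data. It also computes the error sum of squares (SSE, a name assumed here, not used on this slide) to show that SST splits into SSR + SSE.

```python
# Produce store data (sales in $000).
square_feet = [1726, 1542, 2816, 5555, 1292, 2208, 1313]
annual_sales = [3681, 3395, 6653, 9543, 3318, 5563, 3760]

n = len(square_feet)
x_bar = sum(square_feet) / n
y_bar = sum(annual_sales) / n

# Least-squares fit.
ssx = sum((x - x_bar) ** 2 for x in square_feet)
sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(square_feet, annual_sales))
b1 = sxy / ssx
b0 = y_bar - b1 * x_bar
predicted = [b0 + b1 * x for x in square_feet]

sst = sum((y - y_bar) ** 2 for y in annual_sales)                 # total variation in Y
ssr = sum((p - y_bar) ** 2 for p in predicted)                    # variation explained by X
sse = sum((y - p) ** 2 for y, p in zip(annual_sales, predicted))  # unexplained variation

r_squared = ssr / sst
print(f"SSR = {ssr:.1f}, SST = {sst:.1f}, r^2 = {r_squared:.4f}")
```

For these data roughly 94% of the variation in annual sales is explained by square footage, in line with the R Square value in the deck's Excel output.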
Coefficients of Determination ($r^2$) and Correlation ($r$)
[Figure: four scatter plots of Y against X, each with the fitted line $\hat{Y}_i = b_0 + b_1 X_i$, showing $r^2 = 1$, $r = +1$; $r^2 = .81$, $r = +0.9$; $r^2 = 0$, $r = 0$; and $r^2 = 1$, $r = -1$.]
Correlation: The Symbols
• The population correlation coefficient $\rho$ ('rho') measures the strength of the linear relationship between two variables.
• The sample correlation coefficient $r$ estimates $\rho$ based on a set of sample observations.
Example: Produce Stores
From Excel Printout:
Regression Statistics
Multiple R          0.9705572
R Square            0.94198129
Adjusted R Square   0.93037754
Standard Error      611.751517
Observations        7
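The figures in this printout can be reproduced by hand. The sketch below is one way to do so, assuming the usual textbook definitions: r from sums of deviations, adjusted $r^2 = 1 - (1 - r^2)(n - 1)/(n - 2)$ for a single predictor, and standard error $S_{YX} = \sqrt{SSE/(n - 2)}$. None of these formulas appear on the slide, so treat them as assumptions about what Excel is reporting.

```python
import math

# Produce store data (sales in $000).
square_feet = [1726, 1542, 2816, 5555, 1292, 2208, 1313]
annual_sales = [3681, 3395, 6653, 9543, 3318, 5563, 3760]

n = len(square_feet)
x_bar = sum(square_feet) / n
y_bar = sum(annual_sales) / n

ssx = sum((x - x_bar) ** 2 for x in square_feet)
ssy = sum((y - y_bar) ** 2 for y in annual_sales)
sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(square_feet, annual_sales))

r = sxy / math.sqrt(ssx * ssy)                        # "Multiple R" (one predictor)
r_squared = r ** 2                                    # "R Square"
adj_r_squared = 1 - (1 - r_squared) * (n - 1) / (n - 2)
sse = ssy - sxy ** 2 / ssx                            # error sum of squares
std_error = math.sqrt(sse / (n - 2))                  # "Standard Error" (S_YX)

print(f"r = {r:.7f}")
print(f"r^2 = {r_squared:.8f}")
print(f"adjusted r^2 = {adj_r_squared:.8f}")
print(f"standard error = {std_error:.6f}")
```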
Inferences About the Slope
• t Test for a Population Slope: Is there a linear relationship between X and Y?
• Null and Alternative Hypotheses
H0: $\beta_1 = 0$ (No Linear Relationship)
H1: $\beta_1 \neq 0$ (Linear Relationship)
• Test Statistic:
$t = \frac{b_1 - \beta_1}{S_{b_1}}$
where
$S_{b_1} = \frac{S_{YX}}{\sqrt{\sum_{i=1}^{n} (X_i - \bar{X})^2}}$
and df = n - 2
Example: Produce Stores
Data for 7 Stores:
Store   Square Feet   Annual Sales ($000)
1       1,726         3,681
2       1,542         3,395
3       2,816         6,653
4       5,555         9,543
5       1,292         3,318
6       2,208         5,563
7       1,313         3,760
Estimated Regression Equation: $\hat{Y}_i = 1636.415 + 1.487 X_i$
The slope of this model is 1.487.
Is the square footage of the store affecting its annual sales?
Inferences About the Slope: t Test Example
H0: $\beta_1 = 0$
H1: $\beta_1 \neq 0$
$\alpha = .05$
df = 7 - 2 = 5
Critical value(s): $\pm 2.5706$ (rejection region of .025 in each tail)
From Excel Printout:
               t Stat      P-value
Intercept      3.6244333   0.0151488
X Variable 1   9.009944    0.0002812
Test Statistic: t = 9.009944
Decision: Reject H0
Conclusion: There is evidence of a linear relationship.
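The t statistic in the printout can be checked by hand. The sketch below computes $S_{b_1} = S_{YX} / \sqrt{\sum (X_i - \bar{X})^2}$ and $t = b_1 / S_{b_1}$ under H0: $\beta_1 = 0$ for the produce store data. The critical value 2.5706 is copied from the slide rather than computed, and the variable names are illustrative.

```python
import math

# Produce store data (sales in $000).
square_feet = [1726, 1542, 2816, 5555, 1292, 2208, 1313]
annual_sales = [3681, 3395, 6653, 9543, 3318, 5563, 3760]

n = len(square_feet)
x_bar = sum(square_feet) / n
y_bar = sum(annual_sales) / n

ssx = sum((x - x_bar) ** 2 for x in square_feet)
ssy = sum((y - y_bar) ** 2 for y in annual_sales)
sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(square_feet, annual_sales))

b1 = sxy / ssx
sse = ssy - sxy ** 2 / ssx
s_yx = math.sqrt(sse / (n - 2))   # standard error of the estimate
s_b1 = s_yx / math.sqrt(ssx)      # standard error of the slope

t_stat = (b1 - 0) / s_b1          # hypothesized slope is 0 under H0
t_critical = 2.5706               # two-tail critical value for df = 5, alpha = .05 (from the slide)
reject_h0 = abs(t_stat) > t_critical

print(f"S_b1 = {s_b1:.4f}, t = {t_stat:.4f}, reject H0: {reject_h0}")
```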
Inferences About the Slope Using A Confidence Interval
Confidence Interval Estimate of the Slope: $b_1 \pm t_{n-2} S_{b_1}$
Excel Printout for Produce Stores:
               Lower 95%     Upper 95%
Intercept      475.810926    2797.01853
X Variable 1   1.06249037    1.91077694
At the 95% level of confidence, the confidence interval for the slope is (1.062, 1.911), which does not include 0.
Conclusion: There is a significant linear relationship between annual sales and the size of the store.
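As a sketch of how the interval $b_1 \pm t_{n-2} S_{b_1}$ is assembled, the code below rebuilds it from the raw produce store data. The t value 2.5706 is taken from the earlier t test slide rather than computed, so the endpoints may differ from Excel's in the last few decimal places.

```python
import math

# Produce store data (sales in $000).
square_feet = [1726, 1542, 2816, 5555, 1292, 2208, 1313]
annual_sales = [3681, 3395, 6653, 9543, 3318, 5563, 3760]

n = len(square_feet)
x_bar = sum(square_feet) / n
y_bar = sum(annual_sales) / n

ssx = sum((x - x_bar) ** 2 for x in square_feet)
ssy = sum((y - y_bar) ** 2 for y in annual_sales)
sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(square_feet, annual_sales))

b1 = sxy / ssx
s_yx = math.sqrt((ssy - sxy ** 2 / ssx) / (n - 2))
s_b1 = s_yx / math.sqrt(ssx)

t_value = 2.5706                  # t with df = n - 2 = 5 at 95% confidence (from the slides)
lower = b1 - t_value * s_b1
upper = b1 + t_value * s_b1
contains_zero = lower <= 0 <= upper

print(f"95% CI for the slope: ({lower:.3f}, {upper:.3f}); contains 0: {contains_zero}")
```

Because 0 lies outside the interval, the conclusion matches the t test: the data support a linear relationship.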
Residual Analysis
Residual analysis uses numerical measures and plots to evaluate the validity of the regression assumptions.
Linear Regression Assumptions
1. X is linearly related to Y.
2. The variance of the errors is constant for each value of X (homoscedasticity).
3. The errors are normally distributed.
4. If the data are collected over time, the errors must be independent.
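Checking these assumptions starts from the residuals $e_i = Y_i - \hat{Y}_i$. As a minimal sketch, the code below computes them for the produce store data and lists them in order of X, which is the raw material for a residual plot. The choice of this dataset and the variable names are illustrative.

```python
# Produce store data (sales in $000).
square_feet = [1726, 1542, 2816, 5555, 1292, 2208, 1313]
annual_sales = [3681, 3395, 6653, 9543, 3318, 5563, 3760]

n = len(square_feet)
x_bar = sum(square_feet) / n
y_bar = sum(annual_sales) / n

ssx = sum((x - x_bar) ** 2 for x in square_feet)
sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(square_feet, annual_sales))
b1 = sxy / ssx
b0 = y_bar - b1 * x_bar

# Residual = observed Y minus the Y predicted by the fitted line.
residuals = [y - (b0 + b1 * x) for x, y in zip(square_feet, annual_sales)]

# List residuals in order of X; a pattern here (curvature, fanning out)
# would hint at a violated assumption.
for x, e in sorted(zip(square_feet, residuals)):
    print(f"X = {x:5d}   e = {e:9.2f}")
```

With only seven stores any pattern is hard to judge, so a listing like this is a starting point for the plots rather than a formal test.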
Residual Analysis for Linearity
[Figure: two pairs of plots, one for a relationship that is not linear and one for a linear relationship; each pair shows Y against X and the residuals e against X.]
Residual Analysis for Homoscedasticity
[Figure: two pairs of plots, one showing heteroscedasticity and one showing homoscedasticity; each pair shows Y against X and the residuals e against X.]
Residual Analysis for Independence: The Durbin-Watson Statistic
It is used when data is collected over time.
It detects autocorrelation; that is, the residuals in one time period are related to residuals in another time period.
It measures violation of independence assumption.
$D = \frac{\sum_{i=2}^{n} (e_i - e_{i-1})^2}{\sum_{i=1}^{n} e_i^2}$
Calculate D and compare it to the value in Table E.8.
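The statistic is simple to compute once the residuals are listed in time order. The sketch below implements the formula directly; the function name and both residual series are hypothetical, chosen only to show how D behaves.

```python
def durbin_watson(residuals):
    """Return D = sum_{i=2..n} (e_i - e_{i-1})^2 / sum_{i=1..n} e_i^2."""
    numerator = sum(
        (residuals[i] - residuals[i - 1]) ** 2 for i in range(1, len(residuals))
    )
    denominator = sum(e ** 2 for e in residuals)
    return numerator / denominator

# Hypothetical residuals, listed in time order.
trending = [2.0, 1.5, 1.0, -0.5, -1.5, -2.5]   # neighbors resemble each other
alternating = [1.0, -1.0, 1.0, -1.0]           # neighbors have opposite signs

d_trending = durbin_watson(trending)
d_alternating = durbin_watson(alternating)

print(f"D (trending)    = {d_trending:.4f}")
print(f"D (alternating) = {d_alternating:.4f}")
```

As a general guideline, D lies between 0 and 4: values near 0 point to positive autocorrelation, values near 4 to negative autocorrelation, and values near 2 to little or none. The formal decision still comes from comparing D with the tabled critical values.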
Preparing Confidence Intervals for Forecasts
Interval Estimates for Different Values of X
[Figure: the sample regression line $\hat{Y}_i = b_0 + b_1 X_i$ with interval bands around it at a given X: the confidence interval for the mean of Y and the wider interval for an individual $Y_i$.]
Estimation of Predicted Values
Confidence Interval Estimate for $\mu_{YX}$, the Mean of Y Given a Particular $X_i$:
$\hat{Y}_i \pm t_{n-2} S_{YX} \sqrt{\frac{1}{n} + \frac{(X_i - \bar{X})^2}{\sum_{i=1}^{n} (X_i - \bar{X})^2}}$
where $t_{n-2}$ is the t value from the table with df = n - 2 and $S_{YX}$ is the standard error of the estimate.
The size of the interval varies according to the distance of $X_i$ from the mean, $\bar{X}$.
Estimation of Predicted Values
Confidence Interval Estimate for an Individual Response $Y_i$ at a Particular $X_i$:
$\hat{Y}_i \pm t_{n-2} S_{YX} \sqrt{1 + \frac{1}{n} + \frac{(X_i - \bar{X})^2}{\sum_{i=1}^{n} (X_i - \bar{X})^2}}$
The addition of 1 under the square root increases the width of the interval from that for the mean of Y.
Example: Produce Stores
Data for 7 Stores:
Store   Square Feet   Annual Sales ($000)
1       1,726         3,681
2       1,542         3,395
3       2,816         6,653
4       5,555         9,543
5       1,292         3,318
6       2,208         5,563
7       1,313         3,760
Regression Model Obtained: $\hat{Y}_i = 1636.415 + 1.487 X_i$
Predict the annual sales for a store with 2,000 square feet.
Estimation of Predicted Values: Example
Find the 95% confidence interval for the average annual sales for stores of 2,000 square feet.
Predicted Sales: $\hat{Y}_i = 1636.415 + 1.487 X_i = 4610.45$ ($000)
$\bar{X} = 2350.29$, $S_{YX} = 611.75$, $t_{n-2} = t_5 = 2.5706$
Confidence Interval Estimate for $\mu_{YX}$ (confidence interval for mean Y):
$\hat{Y}_i \pm t_{n-2} S_{YX} \sqrt{\frac{1}{n} + \frac{(X_i - \bar{X})^2}{\sum_{i=1}^{n} (X_i - \bar{X})^2}} = 4610.45 \pm 612.66$
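The half-width of this interval can be reproduced from the raw data. The sketch below does so for a store size of 2,000 square feet, using the t value 2.5706 from the slide. It works with unrounded coefficients, so its point forecast comes out near 4609.7, slightly below the slide's figure, which appears to be based on the rounded coefficients.

```python
import math

# Produce store data (sales in $000).
square_feet = [1726, 1542, 2816, 5555, 1292, 2208, 1313]
annual_sales = [3681, 3395, 6653, 9543, 3318, 5563, 3760]

n = len(square_feet)
x_bar = sum(square_feet) / n
y_bar = sum(annual_sales) / n

ssx = sum((x - x_bar) ** 2 for x in square_feet)
ssy = sum((y - y_bar) ** 2 for y in annual_sales)
sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(square_feet, annual_sales))

b1 = sxy / ssx
b0 = y_bar - b1 * x_bar
s_yx = math.sqrt((ssy - sxy ** 2 / ssx) / (n - 2))   # standard error of the estimate

x_new = 2000        # store size of interest, in square feet
t_value = 2.5706    # t with df = 5 at 95% confidence (from the slide)

y_hat = b0 + b1 * x_new
# Interval for the MEAN of Y at x_new: 1/n plus the distance-from-mean term.
half_width = t_value * s_yx * math.sqrt(1 / n + (x_new - x_bar) ** 2 / ssx)

print(f"predicted mean sales: {y_hat:.2f} ($000)")
print(f"95% CI: {y_hat - half_width:.2f} to {y_hat + half_width:.2f}")
```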
Estimation of Predicted Values: Example
Find the 95% confidence interval for annual sales of one particular store of 2,000 square feet.
Predicted Sales: $\hat{Y}_i = 1636.415 + 1.487 X_i = 4610.45$ ($000)
$\bar{X} = 2350.29$, $S_{YX} = 611.75$, $t_{n-2} = t_5 = 2.5706$
Confidence Interval Estimate for Individual Y:
$\hat{Y}_i \pm t_{n-2} S_{YX} \sqrt{1 + \frac{1}{n} + \frac{(X_i - \bar{X})^2}{\sum_{i=1}^{n} (X_i - \bar{X})^2}} = 4610.45 \pm 1687.68$
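As a sketch of how much the extra 1 under the square root matters, the code below computes the interval for an individual store of 2,000 square feet from the raw data. It uses unrounded intermediate values and the slide's t value of 2.5706, so the half-width comes out near 1687.70, a hair away from the slide's figure because of rounding.

```python
import math

# Produce store data (sales in $000).
square_feet = [1726, 1542, 2816, 5555, 1292, 2208, 1313]
annual_sales = [3681, 3395, 6653, 9543, 3318, 5563, 3760]

n = len(square_feet)
x_bar = sum(square_feet) / n
y_bar = sum(annual_sales) / n

ssx = sum((x - x_bar) ** 2 for x in square_feet)
ssy = sum((y - y_bar) ** 2 for y in annual_sales)
sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(square_feet, annual_sales))

b1 = sxy / ssx
b0 = y_bar - b1 * x_bar
s_yx = math.sqrt((ssy - sxy ** 2 / ssx) / (n - 2))

x_new = 2000        # store size of interest, in square feet
t_value = 2.5706    # t with df = 5 at 95% confidence (from the slide)
y_hat = b0 + b1 * x_new

leverage = 1 / n + (x_new - x_bar) ** 2 / ssx
mean_half_width = t_value * s_yx * math.sqrt(leverage)            # mean of Y
individual_half_width = t_value * s_yx * math.sqrt(1 + leverage)  # one store

print(f"interval for one store: {y_hat:.2f} +/- {individual_half_width:.2f} ($000)")
print(f"interval for the mean:  {y_hat:.2f} +/- {mean_half_width:.2f} ($000)")
```

The interval for a single store is nearly three times as wide as the interval for the mean, since it has to absorb the scatter of individual stores around the line as well as the uncertainty in the line itself.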