8/3/2019 Ryan Ehardt IME 416 Lab1 Milk Sales Lab Final
HAPPY COW MILK CO. LTD
LAB 1 Regression
Bob White
Ryan Ehardt
10/3/2011
This report forecasts the next 24 months of sales for the 1% milk product line of Happy Cow Milk Co. Ltd. It then compares and contrasts the different forecasting methods used, along with providing evaluation techniques for each method.
DATA:
In order to set up the regression we must first have data. Data is provided for the past five years (60 months) of 1% milk sales from Happy Cow Milk Company. Along with the data from Happy Cow, ten other variables are considered for our multiple regression method. Those data are: CPI of all milk in the USA, sales for retail and food service, retail sales of grocery stores, and the CPIs of breakfast cereal, eggs, cheese and related products, ice cream and related products, non-frozen noncarbonated juices, flour and prepared flour mixes, and cookies.
All data is adjusted for seasonality by finding the average of the sales data and then dividing the sales data for each period by that average. This gives the seasonality for each data point, but we must then average all values belonging to the same calendar month. For example, to find the seasonality index of January, we average the January values from years 2006 through 2010.
Finding the seasonality index (SI) is important because it lets us take the seasonality out of our data: instead of large swings due to the time of year, the data fluctuates far less. This helps when we forecast values into the future.
Listed in Table 1 is an example of how adjusted data is found from the raw data provided on the internet. Step 1 divides the raw data for each month by the average of the entire data set (60 months total). Next, Step 2 (Seasonality Index) averages the Step 1 values across the whole data set for each particular month; for example, the January value of 1.04547 in Step 2 is the average of the Step 1 values for January from 2006, 2007, 2008, 2009, and 2010.
Then, to get the seasonality out of the equation, we divide the raw data by the Step 2 value (the Seasonality Index).
Table 1. Finding SI and adjusted data for HCMC Lowfat Milk (1%) sales
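The two steps above (and the final deseasonalizing division) can be sketched in a few lines. This is a minimal sketch, not the lab's actual spreadsheet; the sales figures in the test are synthetic placeholders, not HCMC data.

```python
def seasonality_index(sales, period=12):
    """Step 1: divide each month by the overall average; Step 2: average
    those ratios across all years for each calendar month."""
    overall_avg = sum(sales) / len(sales)
    step1 = [x / overall_avg for x in sales]
    si = []
    for month in range(period):
        ratios = step1[month::period]  # same calendar month across years
        si.append(sum(ratios) / len(ratios))
    return si

def deseasonalize(sales, si, period=12):
    """Divide each raw value by its month's SI to remove seasonality."""
    return [x / si[i % period] for i, x in enumerate(sales)]
```

With a perfectly repeating monthly pattern, the deseasonalized series collapses to the overall average, which is a quick way to sanity-check the index.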
MOOOO VING AVERAGE METHODS:
You can perform moving averages over any number of months you like, but the most common choices for data such as this are three months and six months. A three-month moving average follows the individual data points more closely, whereas the six-month version tracks the overall trend, with less response to individual data points.
The three-month moving average takes the past three months' worth of data and averages them. That number is then used as the forecast for month four. Formula 1 shows this in more detail.
Formula 1. F(t+1) = (A(t) + A(t-1) + A(t-2)) / 3
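Formula 1 generalizes directly to any window length n. Here is a minimal sketch of both the in-sample forecasts and the next-month forecast; the sample numbers are placeholders, not the milk data.

```python
def moving_average_forecast(sales, n):
    """n-month moving average: the forecast for period i is the average of
    the previous n actuals. The first n periods have no forecast (None)."""
    return [None] * n + [sum(sales[i - n:i]) / n for i in range(n, len(sales))]

def next_month_forecast(sales, n):
    """Forecast for the month after the data ends (Formula 1 when n = 3)."""
    return sum(sales[-n:]) / n
```

Running both with n = 3 and n = 6 on the same series reproduces the contrast described above: the 3-month version reacts faster, the 6-month version is smoother.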
Table 2. Three month moving average methods for forecasting milk sales
Graph 1 illustrates the three-month moving average using raw data for the sales of 1% milk. As we can see, the trend line follows the data more closely in Graph 1 than it does in Graph 2 for the six-month moving average.
Graph 1 Three month moving average for sales of 1% milk from HCMC over five years
The six-month moving average uses the same idea, only it is formed from the averages of the past six months of data. Table 3 below shows how the six-month moving average works.
Table 3. Six-month moving average method for forecasting milk sales
Graph 2 Six month moving average, five years of sales for 1% milk from HCMC
Simple Linear Regression:
Simple linear regression fits a straight line to the data, which is then used to project future results along that line. The main governing formula for the straight line is found in
formula 2. Out of that formula, a (Formula 4) and b (Formula 3) must be found from the data
series that you are examining.
Formula 2. Y = a + bX
Formula 3. b = (n*sum(XY) - sum(X)*sum(Y)) / (n*sum(X^2) - (sum(X))^2)
Formula 4. a = mean(Y) - b*mean(X)
After applying these formulas to the dataset for 1% milk sales, the regression line came out to be
Y = 526.2 + 1.1(X) as shown on Graph 3 for the regression line over five years of sales data
Graph 3. Linear regression line, five years of sales for HCMCs 1% milk
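The slope and intercept computations (Formulas 3 and 4) are mechanical enough to sketch directly; the period index x = 1..n stands in for the months. This is an illustrative sketch, not the lab's spreadsheet.

```python
def linear_regression(y):
    """Least-squares fit of y = a + b*x with x = 1..n (Formulas 2-4)."""
    n = len(y)
    xs = range(1, n + 1)
    sx, sy = sum(xs), sum(y)
    sxy = sum(x * v for x, v in zip(xs, y))   # sum of X*Y
    sxx = sum(x * x for x in xs)              # sum of X^2
    b = (n * sxy - sx * sy) / (n * sxx - sx * sx)  # Formula 3
    a = sy / n - b * sx / n                        # Formula 4
    return a, b
```

Feeding in the 60 months of milk sales would reproduce a line of the form Y = 526.2 + 1.1(X) reported above.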
Non-Linear Regression:
For the non-linear regression a fourth order polynomial was used. This formula was found by
using the fourth order polynomial function in excel. The equation was:
y = -.00003X^4 + .0038X^3 - .1492X^2 + 2.5759X + 522.99
Graph 4. Non- Linear regression line, five years of sales for HCMCs 1% milk
Multiple Regression With 10 Variables:
In order to perform multiple regression we need data to compare our milk sales data to. For
this particular report, the data that was used for comparison was CPI of Milk sales X1, Bread X2,
Grocery Store Sales X3, Breakfast Cereal X4, Eggs X5, Cheese and Related Products X6, Ice Cream
and Related Products X7, Non-frozen Noncarbonated Juices X8, Cookies X9, and Flour and
Prepared Flour Mixes X10.
Multiple regression is performed in the same manner as linear regression, only instead of comparing the data that we are interested in against a linear index (1, 2, 3, ..., n), we compare the data that we are interested in (HCMC 1% Lowfat Milk Sales) with the other data sets mentioned above. We can compare the milk sales with all 10 variables, or fewer, to find the best match for our data set.
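The lab used Minitab for this fit, but the underlying computation is ordinary least squares via the normal equations. The sketch below is an illustration of that math, not a reproduction of the Minitab run; the test data are made-up placeholders.

```python
def solve(A, b):
    """Gaussian elimination with partial pivoting for a small linear system."""
    n = len(A)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]  # augmented matrix
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][c] * x[c] for c in range(r + 1, n))) / M[r][r]
    return x

def multiple_regression(y, predictors):
    """Least squares y = b0 + b1*X1 + ... by solving (X'X) beta = X'y,
    where X gets a leading column of ones for the intercept."""
    rows = [[1.0] + list(p) for p in zip(*predictors)]
    k = len(rows[0])
    XtX = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    Xty = [sum(r[i] * v for r, v in zip(rows, y)) for i in range(k)]
    return solve(XtX, Xty)
```

Passing the milk sales as y and any subset of X1..X10 as predictors gives the corresponding fitted coefficients, which is exactly the variable-subset experiment described below.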
When comparing all data points the resulting equation from Minitab statistical software was:
Y = 6 - 1.2X1 + .487X2 + .0062X3 - .212X4 + .107X5 + .514X6 + 1.12X7 + 1.2X8 + .716X9 - .995X10
This gave us an adjusted R^2 of 86.7%, an F value of 39.5, and a D.W. statistic of 2.571.
These are not terrible results, but by removing some of the variables, a higher adjusted R^2 and F value were found, along with a D.W. statistic closer to 2.
Table 4. Standard Error for variables 1-10
By observing the Standard Error associated with each variable, the variables with the largest Standard Error were removed until half were left (the half with the lowest error). The reason for this is to eliminate the variables known to introduce error into the base equation while still leaving enough variables to see whether they give better statistics together.
Below, Table 5 lists the various statistics found once variables with large Standard Error were taken out of the equation. Notice how having X1 and X3 yielded a D.W. statistic closer to 2, a higher adjusted R^2, and a slightly higher F value than having variable X3 alone.
Variables F Value D.W. Stat Adj. R^2
X3, X5, X6, X9 65.99 2.47592 84.6
X1, X3 163.62 2.12152 84.6
X3 162.42 1.23638 73.2
Table 5. Multiple regression results
The best outcome for multiple regression was using X1, and X3. The formula came out to be:
Y = 82.9 - .769X1 + .0139X3
Multiple Regression with Variable Transformations:
The purpose of transformations on our multiple regression variables is to see whether changing the underlying data by a certain degree helps smooth it into a better data set for performing a regression. As we can see from the table, most of the transformations yielded a D.W. statistic close to 2, an adjusted R^2 close to 85%, and an F value between 160 and 165.
All 56 transformations tried fall within the acceptable range for a good regression, but the
[Graph: Multiple regression (X1, X3) forecast vs. HCMC 1% milk sales, 2006-2010]
transformation that yielded a D.W. stat closest to 2 was the Log of X1 and X3^3.
X1 Transform X3 Transform F Value D.W. Stat Adj. R^2
Log(X1) X3^3 160.4 2.09701 84.4
ln(X1) X3^3 160.4 2.09701 84.4
ln(X1)^2 X3^3 160.55 2.09851 84.4
log(X1)^2 X3^3 160.55 2.09851 84.4
sqr. Rt X1 X3^3 160.76 2.10067 84.4
X1 X3^3 161.1 2.10418 84.4
Log(X1) X3^2 161.86 2.10728 84.5
ln(X1) X3^2 161.86 2.10728 84.5
ln(X1)^2 X3^2 162.01 2.10886 84.5
log(X1)^2 X3^2 162.01 2.10886 84.5
X1^2 X3^3 161.73 2.11075 84.5
sqr. Rt X1 X3^2 162.23 2.11113 84.5
Log(X1) X3 162.89 2.11362 84.6
Log(X1) X3 162.89 2.11362 84.6
ln(X1) X3 162.89 2.11362 84.6
Log(X1) sqr. Rt X3 163.23 2.11526 84.6
ln(X1) sqr. Rt X3 163.23 2.11526 84.6
ln(X1)^2 X3 163.04 2.11528 84.6
log(X1)^2 X3 163.04 2.11528 84.6
Log(X1) ln(X3)^2 163.42 2.11583 84.6
Log(X1) log(X3)^2 163.42 2.11583 84.6
ln(X1) ln(X3)^2 163.42 2.11583 84.6
ln(X1) log(X3)^2 163.42 2.11583 84.6
Log(X1) ln(X3) 163.45 2.11586 84.6
ln(X1) Log(X3) 163.45 2.11586 84.6
ln(X1)^2 sqr. Rt X3 163.39 2.11696 84.6
log(X1)^2 sqr. Rt X3 163.39 2.11696 84.6
ln(X1)^2 log(X3)^2 163.58 2.11756 84.6
log(X1)^2 ln(X3)^2 163.58 2.11756 84.6
ln(X1)^2 Log(X3) 163.61 2.1176 84.6
log(X1)^2 Log(X3) 163.61 2.1176 84.6
ln(X1)^2 ln(X3) 163.61 2.1176 84.6
log(X1)^2 ln(X3) 163.61 2.1176 84.6
sqr. Rt X1 X3 163.27 2.11766 84.6
sqr. Rt X1 ln(X3)^2 163.8 2.12003 84.7
sqr. Rt X1 log(X3)^2 163.8 2.12003 84.7
sqr. Rt X1 Log(X3) 163.84 2.12008 84.7
sqr. Rt X1 ln(X3) 163.84 2.12008 84.7
X1 sqr. Rt X3 163.97 2.12333 84.7
X1 ln(X3)^2 164.17 2.12404 84.7
X1 log(X3)^2 164.17 2.12404 84.7
X1 Log(X3) 164.2 2.1241 84.7
X1 ln(X3) 164.2 2.1241 84.7
X1^3 X3^2 163.8 2.12796 84.7
X1^2 X3 164.28 2.12869 84.7
X1^2 sqr. Rt X3 164.63 2.13064 84.7
X1^2 ln(X3)^2 164.83 2.13146 84.7
X1^2 log(X3)^2 164.83 2.13146 84.7
X1^2 Log(X3) 164.87 2.13155 84.7
X1^2 ln(X3) 164.87 2.13155 84.7
X1^3 X3 164.85 2.13518 84.7
X1^3 sqr. Rt X3 165.21 2.13725 84.8
X1^3 ln(X3)^2 165.41 2.13815 84.8
X1^3 log(X3)^2 165.41 2.13815 84.8
X1^3 Log(X3) 165.44 2.13826 84.8
X1^3 ln(X3) 165.44 2.13826 84.8
Table 6. List of multiple regression transformations
Since the multiple regression analysis found X1 and X3 to yield the best results, I wanted to find the optimal form of these two variables; it is important to note, though, that other variables could also be used in the transformations and could potentially give better results.
[Graph: Multiple regression with transformed variables (Log(X1), X3^3) vs. HCMC 1% milk sales, 2006-2010]
1st order Exponential Smoothing:
To gather data for the 1st order exponential smoothing, Formula 5 is used. 1st order smoothing is performed on both raw sales data and adjusted data. When the adjusted data is used, the forecasted data must be multiplied by the seasonality index and then compared against the raw sales data before finding the alpha which yields the lowest error. It is also important to notice that the main difference between this method and the moving average method is that, through the smoothing factor alpha, this method takes into account the entire data history and not just the last N periods.
Formula 5. F(t+1) = alpha*A(t) + (1 - alpha)*F(t)
An example of the first four months of data applying the 1st order exponential smoothing method to our adjusted sales data is shown in Table 7 below:
Table 7. First 4 months of sales data and accompanying 1st order smoothing results
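Formula 5 can be sketched as a short loop. One assumption here is the seed: the first forecast is set equal to the first actual, a common spreadsheet convention that the lab's own worksheet may or may not share.

```python
def exp_smooth(sales, alpha, f0=None):
    """1st order smoothing (Formula 5): F(t+1) = alpha*A(t) + (1-alpha)*F(t).
    The first forecast is seeded with the first actual unless f0 is given."""
    f = f0 if f0 is not None else sales[0]
    forecasts = [f]                # forecasts[t] predicts sales[t]
    for a in sales:
        f = alpha * a + (1 - alpha) * f
        forecasts.append(f)        # final entry is the next-period forecast
    return forecasts
```

In practice alpha would be tuned (e.g. with Solver) to minimize the error between forecasts and actuals, as the lab describes.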
2nd order Exponential Smoothing:
This method is an upgraded form of 1st order exponential smoothing. The difference is that it also smooths the trend in the data. There are four steps to finding the forecast for the 2nd order exponential smoothing method. The first step is to find the 1st order forecast. Next, multiply the error between the previous month's 1st order forecast and its sales by alpha (Formula 6). The third step is to multiply what you found in step 2 by alpha, then add (1 - alpha) multiplied by your previous trend from this step (you must start with an arbitrary trend, which you can then optimize in Solver) (Formula 7). The fourth step takes the 1st order results and adds on the trend found in step 3 (Formula 8).
Formula 6. Step 2: E(t) = alpha * (A(t-1) - F1(t-1))
[Graph: 1st Order Exponential Smoothing vs. HCMC 1% milk sales, 2006-2010]
Formula 7. Step 3: T(t) = alpha * E(t) + (1 - alpha) * T(t-1)
Formula 8. Step 4: F2(t) = F1(t) + T(t)
Once you have calculated step 4, this is your forecast, which you can multiply by the seasonality index if using adjusted data, find the error against the actuals, and solve for alpha.
Below in Table 8 is an example of the first 4 months from the 2nd order method.
Table 8. First 4 months of the 2nd order exponential smoothing method
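The four steps above can be sketched as one loop. The original worksheet formulas are not legible in this copy, so this is one literal reading of the prose steps (error smoothing, trend smoothing, then trend-adjusted forecast), not a verified reproduction of the lab's spreadsheet.

```python
def second_order_smooth(sales, alpha, trend0=0.0):
    """Trend-adjusted (2nd order) smoothing per the four steps described:
    step 1 is the 1st-order forecast, step 2 scales the previous error by
    alpha, step 3 smooths the trend, step 4 adds the trend back on."""
    f1 = sales[0]          # seed the 1st-order forecast with the first actual
    trend = trend0         # arbitrary starting trend (optimize in Solver)
    forecasts = []
    for a in sales:
        err = alpha * (a - f1)                      # step 2 (Formula 6)
        trend = alpha * err + (1 - alpha) * trend   # step 3 (Formula 7)
        f1 = alpha * a + (1 - alpha) * f1           # step 1 update
        forecasts.append(f1 + trend)                # step 4 (Formula 8)
    return forecasts
```

On a flat series the trend term stays at zero and the method degenerates to 1st order smoothing, which is a useful sanity check.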
Holt-Winter:
The Holt-Winter method is a type of 2nd order exponential smoothing method that takes into account a moving linear time series trend line. Holt-Winter uses the conventional alpha, but
[Graph: 2nd Order Exponential Smoothing vs. HCMC 1% milk sales, 2006-2010]
also includes a beta and a gamma. The first step in Holt-Winter is to take the centered moving average of your data. To calculate a 4-month centered moving average centered on the 3rd period, average the first four months, add that value to the average of months two through five, and divide the result by two. Continue this process, moving down by one month each time. The centered moving average is shown in Formula 9.
Formula 9. CMA(t) = [ (A(t-2) + A(t-1) + A(t) + A(t+1))/4 + (A(t-1) + A(t) + A(t+1) + A(t+2))/4 ] / 2
After you have found the centered moving averages, divide the sales data by the centered moving average. This gives a seasonal ratio for every month, which must then be combined across years: find the average of all the values for January, February, March, and so on. Next, these twelve indices must add up to 12. If they do not, add them up, divide that total by 12, and divide each of the monthly averages by the resulting factor so the indices sum to 12. Now you are ready to divide the sales data by the seasonal index you just found. This gives sixty months with the seasonality taken out. The next step is to perform a regression analysis on your new data. The Intercept and X variable will be used within the Holt-Winter formulas to project out into the future.
In order to get started projecting sales data out past your analysis period, you must use the Intercept found from regression as your initial level (the old forecast) and the X variable as your trend. Inserting this data into Formula 10 will give you the value for F that you need for the rest of the formulas.
Formula 10. F(t) = alpha * (A(t) / S(t-12)) + (1 - alpha) * (F(t-1) + T(t-1))
Formula 11. T(t) = beta * (F(t) - F(t-1)) + (1 - beta) * T(t-1)
Formula 12. S(t) = gamma * (A(t) / F(t)) + (1 - gamma) * S(t-12)
Formula 13. Forecast(t+1) = (F(t) + T(t)) * S(t-11)
The final formula, Formula 13, gives the forecast for the next month's data.
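The update cycle of the multiplicative Holt-Winter method (level, trend, and seasonal index each smoothed by its own parameter) can be sketched as follows. The parameter values, seeds, and seasonal index in the test are placeholders; the lab seeds the level and trend from its regression intercept and slope.

```python
def holt_winters(sales, si, alpha, beta, gamma, level0, trend0, period=12):
    """Multiplicative Holt-Winter: level (alpha), trend (beta), and
    seasonal index (gamma) are each exponentially smoothed. si is the
    starting 12-month seasonal index; level0/trend0 seed the recursion."""
    si = list(si)              # copy so the caller's index is untouched
    level, trend = level0, trend0
    forecasts = []
    for t, a in enumerate(sales):
        s = si[t % period]
        forecasts.append((level + trend) * s)                        # forecast
        new_level = alpha * (a / s) + (1 - alpha) * (level + trend)  # level
        trend = beta * (new_level - level) + (1 - beta) * trend      # trend
        si[t % period] = gamma * (a / new_level) + (1 - gamma) * s   # season
        level = new_level
    return forecasts
```

On a flat, non-seasonal series all three updates leave their components unchanged, so the forecasts stay at the level, which is a quick correctness check.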
[Graph: Holt-Winter forecast vs. HCMC 1% milk sales, 2006-2010]
Table 9. Holt-Winter for months 58 (-2), 59 (-1), 60 (0), and eight months into the future
In this table, I used simple linear regression to project sales data for the unknown future values. Period 0 corresponds to the last period for which we have sales data; period 1 is the first month for which we do not yet have fixed sales data and starts the validation period.
ARRSES:
The advantage of Adaptive Response Rate Single Exponential Smoothing is that the alpha is dynamic, meaning that it changes as t increases. Plugging our sales data into the following formulas, we are able to produce the results found in Table 10.
Formula 14. F(t+1) = alpha(t) * A(t) + (1 - alpha(t)) * F(t)
Formula 15. alpha(t+1) = | E(t) / M(t) |
Formula 16. E(t) = beta * e(t) + (1 - beta) * E(t-1)
Formula 17. e(t) = A(t) - F(t)
Formula 18. M(t) = beta * |e(t)| + (1 - beta) * M(t-1)
Table 10. ARRSES for the first four months of sales data
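Formulas 14-18 fit together in one loop: each period's error updates two smoothed error tracks, and their ratio becomes the next alpha. Seeding choices here (first forecast = first actual, alpha starting at beta, beta = 0.2) are assumptions, not values from the lab.

```python
def arrses(sales, beta=0.2):
    """Adaptive Response Rate smoothing (Formulas 14-18): alpha is
    recomputed each period as |smoothed error / smoothed absolute error|."""
    f = sales[0]           # seed forecast with the first actual (assumption)
    alpha = beta           # starting alpha (assumption)
    E = M = 0.0            # smoothed error and smoothed absolute error
    forecasts = [f]
    for a in sales:
        e = a - f                                  # Formula 17
        E = beta * e + (1 - beta) * E              # Formula 16
        M = beta * abs(e) + (1 - beta) * M         # Formula 18
        f = alpha * a + (1 - alpha) * f            # Formula 14
        alpha = abs(E / M) if M else beta          # Formula 15
        forecasts.append(f)
    return forecasts
```

When the data shift, E/M grows toward 1 and alpha rises, making the forecast respond faster; on stable data the ratio shrinks and the forecast settles.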
Time Series Method:
This method is a continuation of simple linear regression. It takes the sales data and divides it by the regression forecast, then multiplies that value by the linear regression value and the seasonal index.
[Graph: ARRSES forecast vs. HCMC 1% milk sales, 2006-2010]
NiuBi Method:
This method is based on the 6-month moving average method and the principle that you want more weight on the values closest to the point you are trying to forecast. This method takes the average of sales for periods i-1, i-2, and i-5.
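The description is terse, so the sketch below is one plausible reading: forecast period i from the observations at lags 1, 2, and 5 only, a selective, recency-weighted variant of the 6-month moving average. The lab's actual weighting may differ.

```python
def niubi_forecast(sales, i):
    """Forecast for period i (1-indexed) as the average of the actuals
    at periods i-1, i-2, and i-5 (an assumed reading of the method)."""
    picks = [sales[i - 2], sales[i - 3], sales[i - 6]]  # 0-indexed lags
    return sum(picks) / len(picks)
```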
Evaluation Techniques:
[Graph: Time Series method vs. HCMC 1% milk sales, 2006-2010]
[Graph: NiuBi method vs. HCMC 1% milk sales, 2006-2010]
The first evaluation technique is the Mean Squared Error (MSE), shown in Formula 19. Another is the Mean Absolute Deviation (MAD), Formula 20. They are both ways of looking at the error between your sales data and your forecasted sales. The MAD is generally preferred over the MSE since it doesn't require squaring. MAPE is another evaluation technique, shown in Formula 21, and unlike MSE and MAD it is not dependent on the magnitude of the demand values. The last of our common evaluation techniques is the Mean Percentage Error (Formula 22), which is the same as the MAPE just without the absolute value.
Formula 19. MSE = sum( (A(t) - F(t))^2 ) / n
Formula 20. MAD = sum( |A(t) - F(t)| ) / n
Formula 21. MAPE = (100/n) * sum( |(A(t) - F(t)) / A(t)| )
Formula 22. MPE = (100/n) * sum( (A(t) - F(t)) / A(t) )
One more test is the Theil U statistic. A Theil U between zero and one shows that the method you are using outperforms a naive forecast; however, unlike the four previous evaluation techniques, this statistic is not meant for comparing one regression method against another.
Formula 23. U = sqrt( sum( ((F(t+1) - A(t+1)) / A(t))^2 ) / sum( ((A(t+1) - A(t)) / A(t))^2 ) )
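All five evaluation techniques take the same two inputs, the actuals and the forecasts, so they can be sketched together; the test values are placeholders, not report results.

```python
def mse(actual, forecast):
    """Mean Squared Error (Formula 19)."""
    return sum((a - f) ** 2 for a, f in zip(actual, forecast)) / len(actual)

def mad(actual, forecast):
    """Mean Absolute Deviation (Formula 20)."""
    return sum(abs(a - f) for a, f in zip(actual, forecast)) / len(actual)

def mape(actual, forecast):
    """Mean Absolute Percentage Error (Formula 21)."""
    return 100 / len(actual) * sum(abs((a - f) / a) for a, f in zip(actual, forecast))

def mpe(actual, forecast):
    """Mean Percentage Error (Formula 22): MAPE without the absolute value."""
    return 100 / len(actual) * sum((a - f) / a for a, f in zip(actual, forecast))

def theil_u(actual, forecast):
    """Theil U (Formula 23): forecast errors relative to a naive
    'no change' forecast. Values below 1 beat the naive forecast."""
    num = sum(((forecast[t + 1] - actual[t + 1]) / actual[t]) ** 2
              for t in range(len(actual) - 1))
    den = sum(((actual[t + 1] - actual[t]) / actual[t]) ** 2
              for t in range(len(actual) - 1))
    return (num / den) ** 0.5
```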
To find the future data, we must extrapolate the multiple regression on X1 and X3, but to do so we must also have future values for X1 and X3. The way around this is to perform a simple linear regression on X1 and project it out, and likewise for X3, for the next two years. With X1 and X3 data for the next two years, we can continue our multiple regression analysis. The only problem with this approach is that it is only as good as our linear regression on those series, which had a higher error than the multiple regression. However, the error was not exceptionally greater than that of the multiple regression.
Since future sales are unknown, any method or device used to determine them will be wrong to some degree. However, we can use Crystal Ball to add variability to our data. Setting up Crystal Ball on the multiple regression with the future sales data as our random variables and forecasting MSE, it is 95% certain over 1000 trials that the MSE would fall between 566 and 1754. That is a high MSE for the data in this report; however, I attribute that number to the variation introduced by projecting on data that was itself projected out using a different method. Also, the farther out you go, the harder it is to project data. If we were only projecting out one year, the MSE would be much smaller.
Executive Summary:
Eighteen forecasting methods were compared, with MSEs ranging from 60 to 1700. Each method was performed, tested, and evaluated, and all of them were found to be acceptable. The end goal was to find the best possible forecast for Happy Cow Milk Company's 1% milk sales. The best method was found to be the multiple regression using X1 and X3 (CPI of milk and grocery store sales) based on MSE, MAD, and MAPE. This method was then used to forecast the next two years by using linear regression to project the X1 and X3 data. Crystal Ball was used to add variability to the unknown sales data in order to determine the MSE of forecasting two years out.