
  • 8/3/2019 Ryan Ehardt IME 416 Lab1 Milk Sales Lab Final

    1/23

    HAPPY COW MILK CO. LTD

LAB 1: Regression

Bob White

    Ryan Ehardt

    10/3/2011

This report forecasts the next 24 months of sales from the 1% milk product line for Happy Cow Milk Co. Ltd. It then compares and contrasts the different forecasting methods used and provides evaluation techniques for each method.


    DATA:

In order to set up the regression we must first have data. Data is provided for the past five years (60 months) of 1% milk sales from Happy Cow Milk Company. Along with the data from Happy Cow, ten other variables are considered for our multiple regression method. Those data are: CPI of all milk in the USA; sales for retail and food service; retail sales of grocery stores; and the CPIs of breakfast cereal, eggs, cheese and related products, ice cream and related products, non-frozen noncarbonated juices, flour and prepared flour mixes, and cookies.

All data is adjusted for seasonality by finding the average of the sales data and then dividing the sales data for each period by that average. This gives a seasonality ratio for each period, which must then be averaged across matching time periods. For example, to find the seasonality index for January, we average the January ratios from the years 2006 through 2010.

Finding the seasonality index (SI) of this data is important because it lets us take the seasonality out of our data: instead of large swings due to the time of year, the data fluctuates much less. This helps when we forecast our values into the future.

Listed in Table 1 is an example of how adjusted data is found from the raw data provided on the internet. Step 1 divides the raw data for each month by the average of the whole data set (60 months total). Step 2 (Seasonality Index) then averages the Step 1 values for each particular month across the whole data set; for example, the January value of 1.04547 in Step 2 is the average of the Step 1 values for January of 2006, 2007, 2008, 2009, and 2010.
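The two-step procedure above can be sketched in Python; the figures in the example are made up, since the HCMC data itself is not reproduced here.

```python
def seasonality_indices(sales, period=12):
    """Step 1: divide each observation by the grand mean of the whole series.
    Step 2: average the Step-1 ratios month by month across the years."""
    grand_mean = sum(sales) / len(sales)
    ratios = [x / grand_mean for x in sales]               # Step 1
    indices = []
    for m in range(period):                                # Step 2
        same_month = ratios[m::period]                     # e.g. every January
        indices.append(sum(same_month) / len(same_month))
    return indices

def deseasonalize(sales, indices, period=12):
    """Divide each raw observation by its month's seasonality index."""
    return [x / indices[i % period] for i, x in enumerate(sales)]
```

For a series that repeats a pure within-year pattern, the deseasonalized values collapse to the grand mean, which is exactly the flattening the adjustment is after.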


Then, to take the seasonality out of the equation, we divide the raw data by the Step 2 values (the Seasonality Index).

    Table 1. Finding SI and adjusted data for HCMC Lowfat Milk (1%) sales

MOOOOVING AVERAGE METHODS:

You can perform moving averages over any number of months, but the most common for data such as this are three-month and six-month. The three-month moving average follows the individual data points more closely, whereas the six-month tracks the overall trend, fluctuating less with individual data points.

The three-month moving average takes the past three months' worth of data and averages them. That number is then used as the forecast for month four. Formula 1 shows this in more detail.

Formula 1. F_(t+1) = (D_t + D_(t-1) + D_(t-2)) / 3
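A minimal sketch of the n-month moving-average forecast, with n = 3 matching Formula 1:

```python
def moving_average_forecast(sales, n=3):
    """Forecast for period t is the mean of the previous n observations;
    the first n periods have no forecast, marked here with None."""
    forecasts = [None] * n
    for t in range(n, len(sales)):
        forecasts.append(sum(sales[t - n:t]) / n)
    return forecasts
```

Passing n=6 gives the six-month variant discussed below.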


    Table 2. Three month moving average methods for forecasting milk sales

Graph 1 illustrates the three-month moving average using raw data for the sales of 1% milk. As we can see, the trend line follows the data more closely in Graph 1 than it does in Graph 2 for the six-month moving average.

    Graph 1 Three month moving average for sales of 1% milk from HCMC over five years

The six-month moving average uses the same idea, only it is formed from the averages of the past six months of data. Table 3 below shows how the six-month moving average works.



Table 3. Six month moving average method for forecasting milk sales

    Graph 2 Six month moving average, five years of sales for 1% milk from HCMC

    Simple Linear Regression:

Simple linear regression is a straight-line best fit of the data, which is then used to project future results based on that straight line. The main governing formula for the straight line is Formula 2. Out of that formula, a (Formula 4) and b (Formula 3) must be found from the data series that you are examining.

Formula 2. Y = a + bX

Formula 3. b = (n*ΣXY - ΣX*ΣY) / (n*ΣX^2 - (ΣX)^2)

Formula 4. a = (ΣY)/n - b*(ΣX)/n

After applying these formulas to the dataset for 1% milk sales, the regression line came out to be Y = 526.2 + 1.1(X), as shown on Graph 3 for the regression line over five years of sales data.

Graph 3. Linear regression line, five years of sales for HCMC's 1% milk
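The slope and intercept computations behind the fitted line can be sketched as follows, with the time index x = 1..n standing in for the months:

```python
def linear_regression(y):
    """Least-squares fit of y = a + b*x over x = 1, 2, ..., n."""
    n = len(y)
    xs = range(1, n + 1)
    sx, sy = sum(xs), sum(y)
    sxy = sum(x * v for x, v in zip(xs, y))
    sxx = sum(x * x for x in xs)
    b = (n * sxy - sx * sy) / (n * sxx - sx * sx)   # slope
    a = sy / n - b * sx / n                          # intercept
    return a, b
```

On a perfectly linear series the fit recovers the line exactly, which makes a quick sanity check easy.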

Non-Linear Regression:

For the non-linear regression a fourth-order polynomial was used. This formula was found by using the fourth-order polynomial trendline function in Excel. The equation was:

y = -0.00003X^4 + 0.0038X^3 - 0.1492X^2 + 2.5759X + 522.99
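The fitted polynomial can be evaluated at any month index with Horner's rule; the linear term is read here as 2.5759*X, since a fourth-order trendline carries one term per power:

```python
def poly4(x, coeffs):
    """Evaluate a polynomial via Horner's rule; coeffs are ordered from the
    highest power down to the constant term."""
    result = 0.0
    for c in coeffs:
        result = result * x + c
    return result

# Coefficients as reported by Excel's 4th-order trendline above:
HCMC_POLY = (-0.00003, 0.0038, -0.1492, 2.5759, 522.99)
```

At x = 0 the value is simply the constant term, the intercept of the trendline.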



Graph 4. Non-linear regression line, five years of sales for HCMC's 1% milk

Multiple Regression With 10 Variables:

In order to perform multiple regression we need data to compare our milk sales data against. For this particular report, the data used for comparison were CPI of milk sales (X1), bread (X2), grocery store sales (X3), breakfast cereal (X4), eggs (X5), cheese and related products (X6), ice cream and related products (X7), non-frozen noncarbonated juices (X8), cookies (X9), and flour and prepared flour mixes (X10).

Multiple regression is performed in the same manner as linear regression, only instead of comparing the data we are interested in against a simple time index (1, 2, 3, ... n), we compare it (HCMC 1% Lowfat Milk Sales) with the other data sets mentioned above.

We can compare the milk sales with all 10 variables, or with fewer, to find the best match for our data set.

When comparing all data points, the resulting equation from Minitab statistical software was:

Y = 6 - 1.2X1 + 0.487X2 + 0.0062X3 - 0.212X4 + 0.107X5 + 0.514X6 + 1.12X7 + 1.2X8 + 0.716X9 - 0.995X10



This gave us an adjusted R^2 of 86.7%, an F value of 39.5, and a D.W. stat of 2.571. These are not terrible results, but by removing some of the variables, a higher adjusted R^2 and F value were found, along with a D.W. stat closer to 2.

Table 4. Standard Error for variables 1-10

By observing the Standard Error associated with each variable, the variables with the largest Standard Error were taken out until half were left (the half with the lowest error). The reason for this is to eliminate the variables known to cause error in the base equation while still leaving enough variables to see whether they give better statistics together.

Below is Table 5, listing the various statistics found once the variables with large Standard Error were taken out of the equation. Notice how having X1 and X3 yielded a D.W. stat closer to 2, a higher adjusted R^2, and a slightly higher F value than having variable X3 alone.


    Variables F Value D.W. Stat Adj. R^2

    X3, X5, X6, X9 65.99 2.47592 84.6

    X1, X3 163.62 2.12152 84.6

    X3 162.42 1.23638 73.2

    Table 5. Multiple regression results

The best outcome for multiple regression was using X1 and X3. The formula came out to be:

Y = 82.9 - 0.769X1 + 0.0139X3
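The report's fits came from Minitab; a pure-Python sketch of the same two-predictor OLS fit, solving the normal equations (X'X)b = X'y by Gaussian elimination, looks like this (the inputs in the example are made up):

```python
def multiple_regression(y, x1, x3):
    """OLS fit of y = b0 + b1*x1 + b2*x3 via the normal equations."""
    rows = [[1.0, a, b] for a, b in zip(x1, x3)]
    # Build the 3x3 matrix X'X and the vector X'y.
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(3)] for i in range(3)]
    xty = [sum(r[i] * v for r, v in zip(rows, y)) for i in range(3)]
    # Solve the augmented system by Gaussian elimination with partial pivoting.
    m = [xtx[i] + [xty[i]] for i in range(3)]
    for col in range(3):
        piv = max(range(col, 3), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, 3):
            f = m[r][col] / m[col][col]
            for c in range(col, 4):
                m[r][c] -= f * m[col][c]
    beta = [0.0, 0.0, 0.0]
    for i in (2, 1, 0):
        beta[i] = (m[i][3] - sum(m[i][j] * beta[j] for j in range(i + 1, 3))) / m[i][i]
    return beta  # [intercept, coefficient on X1, coefficient on X3]
```

On synthetic data generated from a known plane, the fit recovers the coefficients to machine precision.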

Multiple Regression with Variable Transformations:

The purpose of transformations on our multiple regression variables is to see whether changing the underlying data by a certain degree helps smooth it into a better data set to regress on. As we can see from the table, most of the transformations yielded a D.W. stat close to 2, an adjusted R^2 close to 85%, and an F value between 160 and 165.

All 56 transformations tried fall within the acceptable range for a good regression, but the transformation that yielded a D.W. stat closest to 2 was the log of X1 paired with X3^3.

[Graph: multiple regression (X1, X3) forecast vs. sales of 1% milk, 2006-2010]

X1 Transform X3 Transform F Value D.W. Stat Adj. R^2
Log(X1) X3^3 160.4 2.09701 84.4

    ln(X1) X3^3 160.4 2.09701 84.4

    ln(X1)^2 X3^3 160.55 2.09851 84.4

    log(X1)^2 X3^3 160.55 2.09851 84.4

    sqr. Rt X1 X3^3 160.76 2.10067 84.4

    X1 X3^3 161.1 2.10418 84.4

    Log(X1) X3^2 161.86 2.10728 84.5

    ln(X1) X3^2 161.86 2.10728 84.5

    ln(X1)^2 X3^2 162.01 2.10886 84.5

    log(X1)^2 X3^2 162.01 2.10886 84.5

    X1^2 X3^3 161.73 2.11075 84.5

    sqr. Rt X1 X3^2 162.23 2.11113 84.5

    Log(X1) X3 162.89 2.11362 84.6


    ln(X1) X3 162.89 2.11362 84.6

    Log(X1) sqr. Rt X3 163.23 2.11526 84.6

    ln(X1) sqr. Rt X3 163.23 2.11526 84.6

    ln(X1)^2 X3 163.04 2.11528 84.6

    log(X1)^2 X3 163.04 2.11528 84.6

    Log(X1) ln(X3)^2 163.42 2.11583 84.6

    Log(X1) log(X3)^2 163.42 2.11583 84.6

    ln(X1) ln(X3)^2 163.42 2.11583 84.6

    ln(X1) log(X3)^2 163.42 2.11583 84.6

    Log(X1) ln(X3) 163.45 2.11586 84.6

    ln(X1) Log(X3) 163.45 2.11586 84.6

    ln(X1)^2 sqr. Rt X3 163.39 2.11696 84.6

    log(X1)^2 sqr. Rt X3 163.39 2.11696 84.6

    ln(X1)^2 log(X3)^2 163.58 2.11756 84.6

    log(X1)^2 ln(X3)^2 163.58 2.11756 84.6

    ln(X1)^2 Log(X3) 163.61 2.1176 84.6

    log(X1)^2 Log(X3) 163.61 2.1176 84.6

    ln(X1)^2 ln(X3) 163.61 2.1176 84.6

    log(X1)^2 ln(X3) 163.61 2.1176 84.6

    sqr. Rt X1 X3 163.27 2.11766 84.6

    sqr. Rt X1 ln(X3)^2 163.8 2.12003 84.7

    sqr. Rt X1 log(X3)^2 163.8 2.12003 84.7

    sqr. Rt X1 Log(X3) 163.84 2.12008 84.7

    sqr. Rt X1 ln(X3) 163.84 2.12008 84.7


    X1 sqr. Rt X3 163.97 2.12333 84.7

    X1 ln(X3)^2 164.17 2.12404 84.7

    X1 log(X3)^2 164.17 2.12404 84.7

    X1 Log(X3) 164.2 2.1241 84.7

    X1 ln(X3) 164.2 2.1241 84.7

    X1^3 X3^2 163.8 2.12796 84.7

    X1^2 X3 164.28 2.12869 84.7

    X1^2 sqr. Rt X3 164.63 2.13064 84.7

    X1^2 ln(X3)^2 164.83 2.13146 84.7

    X1^2 log(X3)^2 164.83 2.13146 84.7

    X1^2 Log(X3) 164.87 2.13155 84.7

    X1^2 ln(X3) 164.87 2.13155 84.7

    X1^3 X3 164.85 2.13518 84.7

    X1^3 sqr. Rt X3 165.21 2.13725 84.8

    X1^3 ln(X3)^2 165.41 2.13815 84.8

    X1^3 log(X3)^2 165.41 2.13815 84.8

    X1^3 Log(X3) 165.44 2.13826 84.8

    X1^3 ln(X3) 165.44 2.13826 84.8

    Table 6. List of multiple regression transformations

Since the multiple regression analysis found X1 and X3 to yield the best results, I wanted to find the optimum form of these two variables. It is important to note, however, that other variables could also be used in performing the transformations and could potentially give better results.
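The search amounts to refitting for each transform pair and keeping the one whose Durbin-Watson statistic lands closest to 2. The D.W. computation itself, plus the kinds of transforms tried in Table 6, can be sketched as (names illustrative):

```python
import math

def durbin_watson(residuals):
    """D.W. statistic: sum of squared successive differences of the residuals
    over the sum of squared residuals; values near 2 suggest little
    autocorrelation in the residuals."""
    num = sum((residuals[t] - residuals[t - 1]) ** 2 for t in range(1, len(residuals)))
    den = sum(e * e for e in residuals)
    return num / den

# Candidate single-variable transforms of the kind tried in Table 6:
TRANSFORMS = {
    "identity": lambda x: x,
    "log10": lambda x: math.log10(x),
    "ln": lambda x: math.log(x),
    "sqrt": lambda x: math.sqrt(x),
    "square": lambda x: x ** 2,
    "cube": lambda x: x ** 3,
}
```

Residuals that alternate in sign drive the statistic toward 4, while strongly trending residuals drive it toward 0, which is why "close to 2" is the target.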

[Graph: multiple regression with transform Log(X1), X3^3 vs. sales of 1% milk, 2006-2010]


1st order Exponential Smoothing:

To gather data for the 1st order exponential smoothing, Formula 5 is used. 1st order smoothing is performed on both raw sales data and adjusted data. When the adjusted data is used, the forecasted data must be multiplied by the seasonality index and then compared to the raw sales data before finding the alpha which yields the lowest error. It is also important to notice the main difference between this method and the moving average method: through the smoothing factor alpha, this method takes into account the entire history of the data and not just the last N periods.

Formula 5. F_(t+1) = α*D_t + (1 - α)*F_t

An example of applying the 1st order exponential smoothing method to the first four months of our adjusted sales data is shown in Table 7 below:

Table 7. First 4 months of sales data and accompanying 1st order smoothing results
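Formula 5's recursion, plus a grid search standing in for the Solver step that picks the lowest-error alpha, can be sketched as:

```python
def first_order_smoothing(sales, alpha):
    """F[t+1] = alpha*D[t] + (1 - alpha)*F[t]; the first forecast is seeded
    with the first observation. Returns forecasts aligned with the sales."""
    f = sales[0]
    forecasts = [f]
    for d in sales[:-1]:
        f = alpha * d + (1 - alpha) * f
        forecasts.append(f)
    return forecasts

def best_alpha(sales, step=0.01):
    """Grid-search stand-in for Solver: the alpha minimizing in-sample MSE."""
    def mse(a):
        f = first_order_smoothing(sales, a)
        return sum((d - x) ** 2 for d, x in zip(sales, f)) / len(sales)
    return min((k * step for k in range(1, int(1 / step))), key=mse)
```

On strongly trending data the search favors a large alpha, since smoothing lags behind a trend.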


2nd order Exponential Smoothing:

This method is an upgraded form of 1st order exponential smoothing. The difference is that it also smooths the trend in the data. There are four steps to finding the 2nd order exponential smoothing forecast. The first step is to find the 1st order forecast. Next, multiply the error between the previous month's 1st order forecast and its sales by alpha (Formula 6). The third step is to multiply what you found in step 2 by alpha, then add (1 - alpha) multiplied by your previous trend from this step (you must start with an arbitrary trend, which you can then optimize in Solver) (Formula 7). The fourth step takes the 1st order results and factors in the trend found in step 3 (Formula 8).

Formula 6. A_t = α*(D_(t-1) - F_(t-1))

[Graph: 1st order exponential smoothing forecast vs. sales of 1% milk, 2006-2010]


Formula 7. T_t = α*A_t + (1 - α)*T_(t-1), where A_t is the step-2 quantity from Formula 6

Formula 8. F2_t = F_t + T_t

Once you have calculated step 4, this is your forecast, which you can multiply by the seasonality index if using adjusted data, then find the error in the trend and solve for alpha.

Below in Table 8 is an example of the first four months from the 2nd order method.

Table 8. First 4 months of 2nd order exponential smoothing method
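Assuming the four steps read as written (step 2 scales last month's forecast error by alpha, step 3 smooths a running trend, step 4 adds that trend to the 1st order forecast), the method can be sketched as:

```python
def second_order_smoothing(sales, alpha, trend0=0.0):
    """Trend-adjusted ('2nd order') smoothing per the four steps above.
    trend0 is the arbitrary starting trend the text says to optimize later."""
    f = sales[0]                # seed the 1st order forecast (step 1)
    trend = trend0
    forecasts = [f + trend]
    for t in range(1, len(sales)):
        error = alpha * (sales[t - 1] - f)            # step 2
        trend = alpha * error + (1 - alpha) * trend   # step 3
        f = alpha * sales[t - 1] + (1 - alpha) * f    # step 1 update
        forecasts.append(f + trend)                   # step 4
    return forecasts
```

On a flat series the trend terms stay at zero and the method collapses to 1st order smoothing, as it should.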

[Graph: 2nd order exponential smoothing forecast vs. sales of 1% milk, 2006-2010]

Holt-Winter:

The Holt-Winter method is a type of 2nd order exponential smoothing that takes into account a moving linear time-series trend line. It uses the conventional alpha but also includes a beta and a gamma. The first step in Holt-Winter is to take the centered moving average of your data. To calculate a 4-month centered moving average centered on the 3rd period, average the first four months, add that value to the average of months two through five, and divide the result by two. Continue this process, moving down by one month each time. The centered moving average is shown in Formula 9.

Formula 9. CMA_t = [ (D_(t-2) + D_(t-1) + D_t + D_(t+1))/4 + (D_(t-1) + D_t + D_(t+1) + D_(t+2))/4 ] / 2
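The centered moving average in Formula 9 can be sketched as:

```python
def centered_moving_average(data, n=4):
    """Average of two adjacent n-month averages, so the result is centered
    between them (for n=4, the first value is centered on period 3)."""
    out = []
    for i in range(len(data) - n):
        first = sum(data[i:i + n]) / n
        second = sum(data[i + 1:i + 1 + n]) / n
        out.append((first + second) / 2)
    return out
```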

After you have found the centered moving averages, divide the sales data by the centered moving average. This gives a seasonal ratio for every month, but the ratios must be combined across years, so average all of the values for January, February, March, etc. Next, the twelve averages must add up to 12; if they do not, divide 12 by their total and multiply each of the original averages by the result. Now you are ready to divide the sales data by the seasonal indices you just found. This gives sixty months with the seasonality taken out. The next step is to perform a regression analysis on your new data. The intercept and X-variable coefficient will be used within the Holt-Winter formulas to project into the future.


In order to get started projecting sales data out past your analysis period, use the intercept found from the regression as your old forecast and the X-variable coefficient as your trend. Inserting these into Formula 10 gives the value of F needed for the rest of the formulas.

Formula 10. F_t = α*(D_t / S_(t-12)) + (1 - α)*(F_(t-1) + T_(t-1))

Formula 11. T_t = β*(F_t - F_(t-1)) + (1 - β)*T_(t-1)

Formula 12. S_t = γ*(D_t / F_t) + (1 - γ)*S_(t-12)

Formula 13. Forecast_(t+m) = (F_t + m*T_t) * S_(t-12+m)

The final formula, Formula 13, gives the forecast for the next month's data.
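One update of the standard multiplicative Holt-Winters recursions, which is the usual reading of Formulas 10-12 (d is the month's sales, f the level, t the trend, s_old the seasonal index from 12 months earlier):

```python
def holt_winters_step(d, f_prev, t_prev, s_old, alpha, beta, gamma):
    """Returns the updated level, trend, and seasonal index."""
    f = alpha * (d / s_old) + (1 - alpha) * (f_prev + t_prev)   # level
    t = beta * (f - f_prev) + (1 - beta) * t_prev               # trend
    s = gamma * (d / f) + (1 - gamma) * s_old                   # seasonal index
    return f, t, s
```

The m-month-ahead forecast (Formula 13) is then (f + m*t) times the seasonal index for the target month.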

[Graph: Holt-Winter forecast vs. sales of 1% milk, 2006-2010]


Table 9. Holt-Winter for months 58 (-2), 59 (-1), 60 (0), and eight months into the future

In this table, I used simple linear regression to project sales data for the unknown future values. Period 0 corresponds to the last period for which we have sales data; period 1 is the first month for which we do not yet have fixed sales data and starts the validation period.

ARRSES:

The advantage of Adaptive Response Rate Single Exponential Smoothing is that the alpha is dynamic, meaning that it changes as n increases. Plugging our sales data into the following formulas produces results as found in Table 10.

Formula 14. F_(t+1) = α_t*D_t + (1 - α_t)*F_t

Formula 15. e_t = D_t - F_t

Formula 16. E_t = β*e_t + (1 - β)*E_(t-1)

Formula 17. α_(t+1) = |E_t / M_t|

Formula 18. M_t = β*|e_t| + (1 - β)*M_(t-1)

    Table 10. ARRSES for the first four months of sales data
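The adaptive-alpha recursion can be sketched as follows; beta is the fixed smoothing constant used for the error terms, and the first forecast is seeded with the first observation:

```python
def arrses(sales, beta=0.2):
    """ARRSES: single exponential smoothing whose alpha is recomputed each
    period as |smoothed error / smoothed absolute error|."""
    f = sales[0]
    alpha = beta                 # conventional starting value for alpha
    e_s = m_s = 0.0              # smoothed error, smoothed absolute error
    forecasts = [f]
    for t in range(1, len(sales)):
        f = alpha * sales[t - 1] + (1 - alpha) * f
        forecasts.append(f)
        e = sales[t] - f                      # this period's forecast error
        e_s = beta * e + (1 - beta) * e_s
        m_s = beta * abs(e) + (1 - beta) * m_s
        if m_s != 0:
            alpha = abs(e_s / m_s)            # adaptive alpha for next period
    return forecasts
```

When the errors run consistently in one direction, the ratio |E/M| approaches 1 and alpha rises, which is the "adaptive response" the name refers to.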

Time Series Method:

This method is a continuation of simple linear regression. It takes the sales data and divides it by the regression forecast; that value is then multiplied by the linear regression value and the seasonal index.

[Graph: ARRSES forecast vs. sales of 1% milk, 2006-2010]


NiuBi Method:

This method is based upon the six-month moving average method and the principle that you want more weight on the values closer to the data points that you are trying to forecast. This method takes the average of sales for periods i-1, i-2, and i-5.
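Reading the description literally (the forecast for period i averages the sales from periods i-1, i-2 and i-5), a sketch with 1-based periods:

```python
def niubi_forecast(sales, i):
    """NiuBi forecast for 1-based period i: the mean of sales at periods
    i-1, i-2 and i-5 (list indices are therefore i-2, i-3 and i-6)."""
    return (sales[i - 2] + sales[i - 3] + sales[i - 6]) / 3
```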

    Evaluation Techniques:

[Graph: Time Series method forecast vs. sales of 1% milk, 2006-2010]

[Graph: NiuBi method forecast vs. sales of 1% milk, 2006-2010]


The first evaluation technique is the Mean Squared Error (MSE), shown in Formula 19. Another is the Mean Absolute Deviation (MAD), Formula 20. They are both ways of looking at the error between your sales data and your forecasted sales. The MAD is generally preferred over the MSE since it doesn't require squaring. MAPE, shown in Formula 21, is another evaluation technique; unlike MSE and MAD, it is not dependent on the magnitude of the values of demand. The last of our common evaluation techniques is the Mean Percentage Error (Formula 22), which is the same as the MAPE just without using the absolute value.

Formula 19. MSE = Σ(D_t - F_t)^2 / n

Formula 20. MAD = Σ|D_t - F_t| / n

Formula 21. MAPE = (100/n) * Σ|(D_t - F_t) / D_t|

Formula 22. MPE = (100/n) * Σ((D_t - F_t) / D_t)

One more test is the Theil U statistic. A Theil U between zero and one shows that the method you are using is significant; however, unlike the four previous evaluation techniques, this statistic cannot be used to compare one regression method against another.

Formula 23. U = sqrt( Σ((F_(t+1) - D_(t+1)) / D_t)^2 / Σ((D_(t+1) - D_t) / D_t)^2 )
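Formulas 19-23 translate directly (d holds the actual sales, f the forecasts, aligned period by period; Theil's U compares one-step forecast errors to the naive no-change forecast):

```python
def mse(d, f):
    return sum((a - b) ** 2 for a, b in zip(d, f)) / len(d)            # Formula 19

def mad(d, f):
    return sum(abs(a - b) for a, b in zip(d, f)) / len(d)              # Formula 20

def mape(d, f):
    return 100 / len(d) * sum(abs((a - b) / a) for a, b in zip(d, f))  # Formula 21

def mpe(d, f):
    return 100 / len(d) * sum((a - b) / a for a, b in zip(d, f))       # Formula 22

def theil_u(d, f):
    """Forecast percentage errors vs. naive percentage changes (Formula 23)."""
    num = sum(((f[t + 1] - d[t + 1]) / d[t]) ** 2 for t in range(len(d) - 1))
    den = sum(((d[t + 1] - d[t]) / d[t]) ** 2 for t in range(len(d) - 1))
    return (num / den) ** 0.5
```

A U of exactly 1 means the method does no better than the naive forecast, which is why values below 1 are what you want to see.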


To find the future data, we must extrapolate the multiple regression on X1 and X3, but to do so we must also have future values for X1 and X3. The way around this is to perform a simple linear regression on X1 and project it out, and likewise a simple linear regression on X3, for the next two years. With X1 and X3 data for the next two years, we can continue our multiple regression analysis. The only problem with this method is that it is only as good as our linear regression analysis on that data, which had a higher error than the multiple regression. However, the error was not exceptionally greater than that of the multiple regression.
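Projecting a predictor forward with its own straight-line fit, as described for X1 and X3, can be sketched as:

```python
def extrapolate_linear(series, periods_ahead):
    """Fit y = a + b*t over t = 1..n, then evaluate at t = n+1, n+2, ..."""
    n = len(series)
    xs = range(1, n + 1)
    sx, sy = sum(xs), sum(series)
    sxy = sum(x * v for x, v in zip(xs, series))
    sxx = sum(x * x for x in xs)
    b = (n * sxy - sx * sy) / (n * sxx - sx * sx)
    a = sy / n - b * sx / n
    return [a + b * (n + k) for k in range(1, periods_ahead + 1)]
```

The projected predictor values are then plugged into the fitted multiple-regression equation, which is why the forecast inherits the error of these straight-line fits.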


Since the future sales data is unknown, any method or device you use to determine it will be wrong to some degree. However, we can use Crystal Ball to add variability to our data. Setting up Crystal Ball on the multiple regression with the future sales data as our random variables and forecasting MSE, it is 95% certain that over 1000 trials the MSE would fall between 566 and 1754. That is a high MSE for the data in this report; however, I attribute that number to the variation of projecting on data that was itself projected out using a different method. Also, the farther out you go, the harder it is to project data. If we were only projecting out one year, the MSE would be much smaller.


    Executive Summary:

Eighteen forecasting methods were compared, with MSEs ranging from 60 to 1700. Each method was performed, tested, and determined to be acceptable or not; all of them were found to be acceptable. The end goal was to find the best possible forecast for Happy Cow Milk Company's 1% milk sales. The best method was found to be the multiple regression using X1 and X3 (CPI of milk and grocery store sales), based on MSE, MAD, and MAPE. This method was then used to forecast the next two years by using linear regression for the X1 and X3 data. Crystal Ball was used to add variability to the unknown sales data in order to determine the MSE of forecasting this data out two years.