
Linköpings universitet, SE–581 83 Linköping, +46 13 28 10 00, www.liu.se

Linköping University | Department of Computer and Information Science
Bachelor’s thesis, 16 ECTS | Computer Engineering

2019 | LIU-IDA/LITH-EX-G--19/055--SE

Improving sales forecast accuracy for restaurants
(Swedish title: Förbättrad träffsäkerhet i försäljningsprognoser för restauranger)

Rickard Adolfsson, Eric Andersson

Supervisor: George Osipov
Examiner: Ola Leifler

http://www.liu.se


Copyright

The publishers will keep this document online on the Internet (or its possible replacement) for a period of 25 years starting from the date of publication, barring exceptional circumstances. The online availability of the document implies permanent permission for anyone to read, to download, or to print out single copies for his/her own use and to use it unchanged for non-commercial research and educational purposes. Subsequent transfers of copyright cannot revoke this permission. All other uses of the document are conditional upon the consent of the copyright owner. The publisher has taken technical and administrative measures to assure authenticity, security and accessibility. According to intellectual property law, the author has the right to be mentioned when his/her work is accessed as described above and to be protected against infringement. For additional information about the Linköping University Electronic Press and its procedures for publication and for assurance of document integrity, please refer to its www home page: http://www.ep.liu.se/.

© Rickard Adolfsson, Eric Andersson


Abstract

Data mining and machine learning techniques are becoming increasingly popular for helping companies with decision-making, because these techniques can automatically search through very large amounts of data and discover patterns that can be hard to see with the human eye.

Onslip is one of the companies looking to achieve more value from its data. They provide a cloud-based cash register to small businesses, with a primary focus on restaurants. Restaurants are heavily affected by variations in sales. They sell products with short expiration dates and low profit margins, and a large share of their expenses is tied to personnel. By predicting future demand, it is possible to plan inventory levels and make more effective employee schedules, thus reducing food waste and putting less stress on workers.

The project described in this report examines how sales forecasts can be improved by incorporating factors known to affect sales in the training of machine learning models. Several different models are trained to predict the future sales of 130 different restaurants, using varying amounts of additional information. The accuracy of the predictions is then compared across models. Factors known to impact sales have been chosen and categorized into restaurant information, sales history, calendar data and weather information.

The results show that, by providing additional information, the vast majority of forecasts could be improved significantly. In 7 of 8 examined cases, the addition of more sales factors had an average positive effect on the predictions. The average improvement was 6.88% for product sales predictions and 26.62% for total sales. The sales history information was the most important to the models’ decisions, followed by the calendar category. It also became evident that not every factor that impacts sales had been captured, and that further improvement is possible by examining each company individually.

Contents

Abstract
Acknowledgments
Contents
List of Figures
List of Tables

1 Introduction
  1.1 Motivation
  1.2 Onslip
  1.3 Purpose
  1.4 Research questions
  1.5 Limitations

2 Theory
  2.1 Automated forecasting
  2.2 Data mining and machine learning
  2.3 Decision trees
  2.4 Decision forests
  2.5 Gradient boosting
  2.6 XGBoost
  2.7 Related work
    2.7.1 Weather’s effects on sales
  2.8 Forecast accuracy metrics
    2.8.1 Mean absolute error

3 Method
  3.1 Raw data
  3.2 Receipt data
  3.3 Sales history data
  3.4 External data collection
    3.4.1 Weather data
    3.4.2 Google Maps data
    3.4.3 Product data
    3.4.4 Calendar data
  3.5 The combined data sets
  3.6 Company selection
  3.7 Model implementation
  3.8 Evaluation

4 Results
  4.1 Model training
  4.2 Forecast accuracy
  4.3 Feature importance

5 Discussion
  5.1 Model training
  5.2 Result
    5.2.1 Restaurant variables
    5.2.2 Calendar variables
    5.2.3 Sales history variables
    5.2.4 Weather variables
  5.3 This work in a wider context

6 Conclusion
  6.1 Future work

References

A Appendix
  SMHI forecast parameters
  SMHI historical parameters
  Product sales data set
  Total sales data set
  Individual and collective training comparison
  Product sales MAE
  Total sales MAE
  Feature importance and Pearson correlation coefficient

List of Figures

2.1 Selected steps in the creation of a decision tree, and its corresponding partitioning of the training set
2.2 Boosting example

3.1 Flow diagram of the process to find the closest station
3.2 Flow diagram of the process to fetch weather data for the given station
3.3 Examples of companies with different sales patterns
3.4 Total number of products sold per day from January 2015 to May 2019

4.1 Average feature importance for the researched variables

5.1 Feature importance for the collectively trained model
5.2 Product sales for the two companies whose forecasts were affected worst by adding more information, with 156.6% and 69.71% decreases in prediction accuracy as compared to using only receipt information
5.3 Average sales per weekday
5.4 Average sales on Swedish national holidays
5.5 Relationship between the maximum distance to a selected weather station and the forecast improvement from adding weather factors

List of Tables

3.1 Receipt variables
3.2 Sales history variables
3.3 Selected SMHI parameters
3.4 Weather variables
3.5 Restaurant variables
3.6 Calendar variables
3.7 Company reduction
3.8 Data set reduction

4.1 Comparison of individual and collective training
4.2 Average improvement of model sales forecast for all selected companies
4.3 The 5 most over- and under-performing variables in the product sales data set
4.4 The 5 most over- and under-performing variables in the total sales data set

5.1 Effects of adding information on product sales forecasts
5.2 Effects of adding information on total sales forecasts
5.3 The 5 most and least improved total sales predictions with the addition of weather information

A.1 SMHI’s available forecast parameters
A.2 SMHI’s available historical parameters
A.3 Variables in the product sales data set
A.4 Variables in the total sales data set
A.5 MAE comparison between individually and collectively trained models for the total sales data set (1/3)
A.6 MAE comparison between individually and collectively trained models for the total sales data set (2/3)
A.7 MAE comparison between individually and collectively trained models for the total sales data set (3/3)
A.8 Effects of adding extra information on product sales forecast MAE (1/3)
A.9 Effects of adding extra information on product sales forecast MAE (2/3)
A.10 Effects of adding extra information on product sales forecast MAE (3/3)
A.11 Effects of adding extra information on total sales forecast MAE (1/3)
A.12 Effects of adding extra information on total sales forecast MAE (2/3)
A.13 Effects of adding extra information on total sales forecast MAE (3/3)
A.14 Feature importance and Pearson correlation coefficient for the variables in the product sales data set
A.15 Feature importance and Pearson correlation coefficients for the variables in the total sales data set

1 Introduction

This report describes an undergraduate thesis project for a Bachelor of Science degree in Computer Engineering at Linköping University. The thesis covers 16 ECTS and has been carried out in cooperation with the company Onslip in Linköping, Sweden.

1.1 Motivation

The Internet keeps growing and gains more users every year. In 2017, 49% of the world’s population was connected to the Internet [1]. One effect of this growth is that an increasing amount of data is generated in almost every field. In 2017, an average of 2.5 exabytes (10^18 bytes) of data was created every day [2], and if current growth continues, it is estimated that by 2020, 1.7 MB of new data will be created every second for every person on earth [3]. According to the International Data Corporation (IDC), the amount of data stored globally is expected to increase from 33 zettabytes (1 ZB = 10^21 bytes) in 2018 to 175 ZB by 2025 [4].

Many businesses are now trying to take advantage of this data: 53% of the nearly 4000 companies asked in a 2017 survey reported that they use data analysis tools to support decision-making, and another 20% expected to begin within the next two years [5]. In a recent survey of 63 global companies, 62% of respondents answered that they had seen "measurable results" from investing in ways to analyze their data [6].

Data analysis can be applied to many different areas of business, for example to improve customer experience, forecast demand and streamline operations. According to a survey by Bloomberg Businessweek Research Services, some of the goals for companies that are implementing data analysis tools are to reduce costs, increase profitability, manage risks, optimize internal operations and improve decision-making [7].

There have been several studies on the value created by using data for business analytics. According to Chen et al. [8], using data analysis tools can lead to "unprecedented intelligence on consumer opinion, customer needs, and recognizing new business opportunities". A literature review by Wamba et al. [9] found that data analysis has improved business performance in several ways, including reduced processing time and improved quality in manufacturing, better insights into consumer needs, customized products and services, improved decision-making, and greater innovation. A case study by Popovic et al. [10] found that "when firms utilize more big data analytics, they better forecast previously unpredictable outcomes, and improve process performance. As a result, firms realize operational process benefits in the form of cost reductions, better operations planning, lower inventory levels, better organization of the labor force and elimination of waste, while they leverage improvements in operations effectiveness and customer service."

One important source of data is customer transactions, since this type of data directly reflects customer actions. In 2015, 227.1 billion card transactions were made globally [11], which creates an enormous amount of interesting data for analysis. According to the Swedish Bookkeeping Act (Bokföringslagen, chapter 7, sections 1–2) [12], receipts of every transaction must be kept by the business for at least 7 years, stored in the form, physical or digital, in which the business obtained them. Not having to store physical copies is an advantage for businesses in terms of space and convenience, and the digital copies also give them the opportunity to use data science tools to analyze their activities and improve their results.

1.2 Onslip

Onslip is a Swedish company that sells a cloud-based cash register solution to small companies that run some kind of physical trading, regardless of industry. Onslip offers its own cash register, but it is also possible to use their software on several different platforms. Onslip has customers worldwide who use their product every day. With their solution, customers’ data on completed transactions are automatically saved in the cloud. Since Swedish law requires transaction data to be saved for at least 7 years, Onslip has a database with large amounts of data. Currently, Onslip has more than 4 years of saved data, which customers can access for statistics but which is otherwise not used to provide additional services. Onslip would like to offer more value to their customers by using the saved data.

1.3 Purpose

The purpose of this thesis is to evaluate how predictive machine learning algorithms can be improved, and thereby to develop a method for Onslip to analyze large amounts of data automatically. At the end of our work, Onslip will have insights into how they can search for patterns in their stored transactions and use these to predict the future sales of a product. Onslip will then, for example, be able to provide more accurate sales forecasts or suggest to their customers products that are suitable for promotions.

1.4 Research questions

• How can machine learning algorithms be utilized to predict future sales?

• Which factors affecting sales have an impact on the accuracy of predictions?

1.5 Limitations

• This work will focus on producing a solution for one specific area of business, restaurants, and not one general solution for all of Onslip’s customers.

• This work will not analyze algorithms or frameworks in depth, but will simply choose existing solutions based on previous research.


2 Theory

2.1 Automated forecasting

There are many applications for predicting future demand for the products and services that a business is supplying. Forecasts are used throughout most industries for planning production and business operations, purchasing materials, managing inventory, scheduling work hours, advertising, and much more [13]. Traditional forecast methods have mostly been based on the opinions of experienced employees or on statistical analysis of previous history, but in recent years machine learning algorithms have been applied to this area with much success [14]. Using machine learning techniques instead of relying on human expertise means that a business can spread and hold on to knowledge that only some employees have, and by automating the process it is possible to apply forecasting methods on a much larger scale.

2.2 Data mining and machine learning

Data mining is the process of discovering patterns in data by using methods and techniques from machine learning and statistics [15]. This process is usually automated and performed on databases. Much of data mining involves the application of machine learning tools to extract information from the data and find underlying structures. The goal is to find general patterns that explain something about the data, and use these to guide future decision-making.

In the context of data mining, machine learning can be explained as creating structured descriptions of data [15]. The form of these descriptions depends on the algorithm used to create them, and can consist of, for example, mathematical functions, rule sets or decision trees. The descriptions represent what has been learned from the data, and are used to make predictions about new and previously unseen data. By looking at which attributes of the data had the biggest impact in the creation of the descriptions, it is also possible to verify suspected relationships, or see other patterns that may have been unknown earlier.

The most common way of acquiring these descriptions is by looking at examples [15]. The machine learning model is provided with input data, and the objective is to find the best transformation of input data into corresponding output data. In unsupervised machine learning, the model does not know the output in advance, while supervised learning models are given examples in pairs of inputs and correct outputs. Unlike an unsupervised model, a supervised model receives feedback about how close its predictions are to the real answers, and it learns by trying to minimize the error indicated by this feedback. A classification model attempts to predict output values from a discrete set of categories, while a regression model makes predictions for continuous values.

2.3 Decision trees

A decision tree is a data structure commonly used to represent a machine learning model’s knowledge. At each internal node of the tree, the data set is split in two parts based on the values of one input variable [16]. Growing the tree involves deciding which variable to use for the split, and what value of the variable to split the data on. This is done through exhaustive search of the different variables, meaning that all possible splits are considered and the split that produces the lowest error is chosen. The number of splits to be considered can be reduced by using heuristic search and function optimization techniques.

The leaves of the tree contain the model’s prediction. In a classification model the values in the leaves are different categories, and in a regression model a leaf’s value is the mean of every data point in the training set that leads to this leaf. A few of the steps involved in creating a decision tree from a data set with two-dimensional input can be seen in Figure 2.1.

Figure 2.1: Selected steps in the creation of a decision tree, and its corresponding partitioning of the training set

Decision trees with many variables can grow very complex and often have problems with over-fitting: the model becomes so specialized to the training set that it loses accuracy when applied to other, unseen data [16]. One way to avoid this is to keep the tree as simple as possible. Typical criteria for when to stop growing the tree are when it has reached a maximum depth, when it has a certain number of leaves, or when the error reduction is less than a threshold value. It is also possible to prune the tree afterwards, a process that removes split nodes from the tree. A node is removed if its elimination either reduces the error, or does not increase the error too much while reducing the size of the tree sufficiently.
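The exhaustive split search, the leaf means and the stopping criteria described above can be sketched in a few lines of Python. This is a minimal illustration assuming a regression tree with squared error as the split criterion; the function names (`best_split`, `grow`, `predict`) and the parameters `max_depth`, `min_samples` and `min_gain` are hypothetical and not taken from the thesis or from any library. Pruning is not shown.

```python
from statistics import fmean


def sse(ys):
    """Squared error of a leaf that predicts the mean of ys."""
    if not ys:
        return 0.0
    m = fmean(ys)
    return sum((y - m) ** 2 for y in ys)


def best_split(X, y):
    """Exhaustive search over every variable and every threshold.

    Returns (error, variable index, threshold) for the split with the lowest
    total squared error, or None if no split is possible.
    """
    best = None
    for j in range(len(X[0])):
        values = sorted({row[j] for row in X})
        for lo, hi in zip(values, values[1:]):
            t = (lo + hi) / 2  # threshold halfway between neighbouring values
            left = [yi for row, yi in zip(X, y) if row[j] <= t]
            right = [yi for row, yi in zip(X, y) if row[j] > t]
            err = sse(left) + sse(right)
            if best is None or err < best[0]:
                best = (err, j, t)
    return best


def grow(X, y, depth=0, max_depth=3, min_samples=2, min_gain=1e-9):
    """Grow a regression tree; each leaf stores the mean target of its samples."""
    split = best_split(X, y) if len(y) >= min_samples else None
    # Stop at the maximum depth, when no split exists, or when the error
    # reduction of the best split is below the threshold min_gain.
    if depth >= max_depth or split is None or sse(y) - split[0] < min_gain:
        return {"leaf": fmean(y)}
    _, j, t = split
    children = {}
    for side, keep in (("left", lambda v: v <= t), ("right", lambda v: v > t)):
        idx = [i for i, row in enumerate(X) if keep(row[j])]
        children[side] = grow([X[i] for i in idx], [y[i] for i in idx],
                              depth + 1, max_depth, min_samples, min_gain)
    return {"variable": j, "threshold": t, **children}


def predict(tree, x):
    """Follow the splits from the root down to a leaf."""
    while "leaf" not in tree:
        tree = tree["left"] if x[tree["variable"]] <= tree["threshold"] else tree["right"]
    return tree["leaf"]


X = [[1, 0], [2, 1], [3, 0], [10, 1], [11, 0], [12, 1]]
y = [1.0, 1.0, 1.0, 5.0, 5.0, 5.0]
tree = grow(X, y)
print(tree["variable"], tree["threshold"], predict(tree, [2, 0]), predict(tree, [11, 1]))
```

Here the first variable separates the two groups perfectly, so the root splits on it, and both children stop growing because no further split reduces the error by at least `min_gain`.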


2.4 Decision forests

The process of applying several machine learning models to the same problem and combining their results is called ensembling [16]. This approach is based on an idea from probability theory: if several independent predictors are each correct with a probability higher than 50%, the combined prediction has a higher probability of being correct than any single predictor. If the ensemble consists of decision trees, it is called a forest.
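The probability argument above (closely related to Condorcet’s jury theorem) can be checked with a short calculation. The sketch assumes a binary outcome, an odd number of predictors, a simple majority vote and fully independent predictors that are each correct with the same probability p; the function name is hypothetical.

```python
from math import comb


def majority_vote_accuracy(n, p):
    """Probability that a strict majority of n independent predictors is correct.

    Assumes a binary outcome, an odd n, and that each predictor is correct
    independently with probability p.
    """
    need = n // 2 + 1  # smallest strict majority
    return sum(comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(need, n + 1))


for n in (1, 5, 25):
    print(n, round(majority_vote_accuracy(n, 0.6), 3))
```

With p = 0.6, five independent voters are right about 68% of the time instead of 60%, and the effect reverses when p is below 0.5. Models trained on the same data are rarely independent, so the gain in practice is typically smaller than this idealized calculation suggests.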

There are two different ways of creating forests: bagging and boosting. Bagging involves sampling subsets of the training data and creating new independent trees for each subset, so that they hopefully reflect different aspects of the data. These trees are usually grown from small subsets to reduce their similarity, but are allowed to grow deeper to be able to discover more complex patterns. The final prediction is decided using all of the models, for example by taking the most common, mean or median value.
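Bagging can be sketched as follows. To keep the example short it assumes a single input variable and uses one-split trees (stumps) as the individual models, whereas the text above describes deeper trees; the names `fit_stump` and `bagging`, and the choice of the median to combine the models, are illustrative assumptions.

```python
import random
from statistics import fmean, median


def fit_stump(xs, ys):
    """Fit a one-split regression tree on a single input variable."""
    pairs = sorted(zip(xs, ys))
    mean_all = fmean(ys)
    best_err, best = float("inf"), (None, mean_all, mean_all)
    for i in range(1, len(pairs)):
        if pairs[i - 1][0] == pairs[i][0]:
            continue  # equal x values cannot be separated
        left = [y for _, y in pairs[:i]]
        right = [y for _, y in pairs[i:]]
        ml, mr = fmean(left), fmean(right)
        err = sum((y - ml) ** 2 for y in left) + sum((y - mr) ** 2 for y in right)
        if err < best_err:
            best_err = err
            best = ((pairs[i - 1][0] + pairs[i][0]) / 2, ml, mr)
    t, ml, mr = best
    if t is None:  # every x is equal: predict the mean everywhere
        return lambda x: mean_all
    return lambda x: ml if x <= t else mr


def bagging(xs, ys, n_models=25, sample_size=None, seed=0):
    """Train independent stumps on random samples and combine them by the median."""
    rng = random.Random(seed)
    size = sample_size or len(xs)
    models = []
    for _ in range(n_models):
        idx = [rng.randrange(len(xs)) for _ in range(size)]  # sample with replacement
        models.append(fit_stump([xs[i] for i in idx], [ys[i] for i in idx]))
    return lambda x: median(m(x) for m in models)


xs = list(range(20))
ys = [0.0] * 10 + [10.0] * 10
forest = bagging(xs, ys)
print(forest(2), forest(17))
```

With the default `sample_size`, each model sees a bootstrap sample of the same size as the training set; passing a smaller `sample_size` gives the smaller, less similar subsets mentioned above.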

In boosting, the trees are not independent but are instead created sequentially based on the error of the previous trees [16]. By removing the already captured patterns, each following tree can focus on finding new aspects of the data that are harder to discover. One issue with this approach is that, because the trees do not train on the whole training set, they are more likely to find patterns that do not reflect the actual data. Therefore, the trees are usually not allowed to grow very deep, and are sometimes even limited to a single split.

The boosting algorithm starts by creating an initial model that roughly approximates the training data, usually by simply using the minimum or mean value [17]. The next step is to calculate how much the approximation differs from what is expected. The result is a set of vectors, one for every point in the training data, called the residual vectors, that describe in which direction and how far off each estimate is from its target. A second model is then trained to approximate these residuals, and the new model’s estimate of the residuals is added to the first model. The differences from the expected results are once again calculated, and another set of residual vectors is produced. This continues until a certain number of models have been trained, or the algorithm stops making sufficient progress.
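The procedure above can be sketched in Python, assuming squared error, a single input variable and one-split trees (stumps) as the individual models. The initial model is the mean of the targets, each new stump is fitted to the current residuals, and its estimate is added to the running prediction. The names `boost`, `fit_stump` and `mse` are hypothetical.

```python
from statistics import fmean


def fit_stump(xs, rs):
    """Fit a one-split regression tree to the residuals rs."""
    pairs = sorted(zip(xs, rs))
    mean_all = fmean(rs)
    best_err, best = float("inf"), (None, mean_all, mean_all)
    for i in range(1, len(pairs)):
        if pairs[i - 1][0] == pairs[i][0]:
            continue  # equal x values cannot be separated
        left = [r for _, r in pairs[:i]]
        right = [r for _, r in pairs[i:]]
        ml, mr = fmean(left), fmean(right)
        err = sum((r - ml) ** 2 for r in left) + sum((r - mr) ** 2 for r in right)
        if err < best_err:
            best_err, best = err, ((pairs[i - 1][0] + pairs[i][0]) / 2, ml, mr)
    t, ml, mr = best
    return (lambda x: ml) if t is None else (lambda x: ml if x <= t else mr)


def boost(xs, ys, n_models=50, tol=1e-9):
    f0 = fmean(ys)  # initial model: the mean of the training targets
    models, pred = [], [f0] * len(ys)
    for _ in range(n_models):
        residuals = [y - p for y, p in zip(ys, pred)]  # direction and size of each error
        if sum(r * r for r in residuals) < tol:
            break  # no meaningful progress left to make
        m = fit_stump(xs, residuals)  # next model approximates the residuals
        models.append(m)
        pred = [p + m(x) for p, x in zip(pred, xs)]  # add its estimate
    return lambda x: f0 + sum(m(x) for m in models)


def mse(model, xs, ys):
    return fmean([(y - model(x)) ** 2 for x, y in zip(xs, ys)])


xs = list(range(10))
ys = [1.0, 3.0, 2.0, 5.0, 4.0, 6.0, 8.0, 7.0, 9.0, 10.0]
for n in (0, 1, 5, 20):
    print(n, round(mse(boost(xs, ys, n_models=n), xs, ys), 4))
```

The training error shrinks as more stumps are added, which is exactly why the number of models or a minimum improvement is used as the stopping criterion: on training data alone, boosting will keep fitting ever smaller details.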

The combined prediction y of a boosted model can be represented as the sum of all the individual models’ predictions f_i(x):

$$y = \sum_{i=1}^{N} f_i(x) = F(x) \tag{2.1}$$

The current boosted model at step i can therefore be described as the previous boosted model combined with the current individual model:

$$F_i(x) = F_{i-1}(x) + f_i(x) \tag{2.2}$$

We can show the usefulness of boosting machine learning models with a simplified example using the function y = 5 + x + sin(x), seen in Figure 2.2a. The first model looks at where the curve intersects the y-axis, at (0, 5), and so approximates the function as f_1(x) = 5, shown together with the original function in Figure 2.2b. The errors produced by the first model’s predictions are shown in Figure 2.2d. The second model receives this curve as its input and sees that it can be represented quite well by a simple linear function, which matches it exactly at every multiple of π (≈ 3.14). It therefore approximates the curve as f_2(x) = x, which is added to the previous function, as can be seen in Figure 2.2c. The third model, again, is trained on the aggregated error produced by the predictions from the previous models, shown in Figure 2.2e. The model recognizes the pattern of a sine wave and adds f_3(x) = sin(x) to the approximation, which eliminates the error completely, as the boosted models have found the correct function. In real use cases the input is not a single variable, and the function is often much more complex, requires many more models to solve, and cannot be exactly approximated, but the process remains the same.


Figure 2.2: Boosting example. Panels: (a) y = 5 + x + sin(x); (b) y = 5 + x + sin(x) and y = 5; (c) y = 5 + x + sin(x) and y = 5 + x; (d) difference between the two functions in (b); (e) difference between the two functions in (c).
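The boosting example can be reproduced numerically. The sketch below assumes the three stage models are given exactly as in the text (f_1(x) = 5, f_2(x) = x, f_3(x) = sin(x)) rather than learned, combines them as in Eqs. 2.1 and 2.2, and measures the largest remaining error on a grid of x values after each stage. The grid and the helper name `boosted` are illustrative choices.

```python
import math


def target(x):
    return 5 + x + math.sin(x)


# Stage models taken from the example (assumed given, not learned here).
stages = [lambda x: 5.0, lambda x: x, math.sin]


def boosted(models):
    """F(x) as the sum of the individual models' predictions (Eq. 2.1)."""
    return lambda x: sum(f(x) for f in models)


xs = [i * 0.5 for i in range(-20, 21)]  # grid from -10 to 10
max_errors = []
for i in range(1, len(stages) + 1):
    F_i = boosted(stages[:i])  # F_i = F_{i-1} + f_i (Eq. 2.2)
    max_errors.append(max(abs(target(x) - F_i(x)) for x in xs))
print(max_errors)
```

The largest error drops from about 9.5 (the x + sin(x) curve of panel d) to just under 1 (the sine wave of panel e) and finally to 0.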

2.5 Gradient boosting

Gradient descent is an optimization technique used to minimize functions by taking small steps in the direction in which the function decreases most steeply [17]. This is done by first calculating the gradient ∇f of the function f at the current point using its partial derivatives:

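$$\nabla f(x) = \begin{bmatrix} \frac{\partial f(x)}{\partial x_1} & \cdots & \frac{\partial f(x)}{\partial x_N} \end{bmatrix}, \quad \text{where } x \text{ is a vector with } N \text{ dimensions} \tag{2.3}$$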

The gradient is a vector pointing in the direction of steepest ascent, with a length proportional to the steepness of the slope; gradient descent therefore steps in the opposite direction.

The second step of gradient descent uses the gradient to calculate the next point to investigate. To avoid going too far and missing the actual minimum, only a part of the gradient is applied. How large a part is controlled by the learning rate γ:

$$x_{i+1} = x_i - \gamma \cdot \nabla f(x_i) \tag{2.4}$$
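Eq. 2.4 can be sketched directly in Python. The example function f(x) = (x_1 − 3)² + 2(x_2 + 1)², its hand-written gradient and the default γ = 0.1 are assumptions chosen for illustration; the minimum is at (3, −1).

```python
def gradient_descent(grad, x0, gamma=0.1, steps=100):
    """Repeatedly apply x_{i+1} = x_i - gamma * grad f(x_i) (Eq. 2.4)."""
    x = list(x0)
    for _ in range(steps):
        g = grad(x)
        x = [xj - gamma * gj for xj, gj in zip(x, g)]
    return x


# f(x) = (x1 - 3)^2 + 2 * (x2 + 1)^2, with partial derivatives written out by hand.
grad_f = lambda x: [2 * (x[0] - 3), 4 * (x[1] + 1)]
print(gradient_descent(grad_f, [0.0, 0.0]))
```

The learning rate matters: each step multiplies the distance to the minimum in the second coordinate by 1 − 4γ, so with γ = 0.6 that factor is −1.4 and the iterates overshoot further every step and diverge.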

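This concept is used in the gradient boosting machine learning algorithm to minimize the error function in a similar way: the negative gradients of the error function with respect to the current predictions are calculated and used to train the next model. In Eq. 2.5, y denotes the target values, y_i the combined prediction after step i, and e the error function.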

$$y_{i+1} = y_i + \gamma\left(-\nabla e(y, y_i)\right) \tag{2.5}$$
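To make Eq. 2.5 concrete, the sketch below computes the negative gradient of two error functions with respect to the current predictions and applies one step directly to the predictions. In gradient boosting the next model would instead be trained to approximate these negative gradients (often called pseudo-residuals). The squared error is assumed to be half the sum of squared residuals, so its negative gradient is the ordinary residual; the absolute error, which MAE averages, gives only the sign of each residual (0 where the prediction is exact). The function names are hypothetical.

```python
def neg_gradient_squared_error(y, y_hat):
    """Negative gradient of 0.5 * sum((y - y_hat)^2): the ordinary residuals."""
    return [yi - pi for yi, pi in zip(y, y_hat)]


def neg_gradient_absolute_error(y, y_hat):
    """Negative (sub)gradient of sum(|y - y_hat|): the sign of each residual."""
    return [(yi > pi) - (yi < pi) for yi, pi in zip(y, y_hat)]


def step(y, y_hat, neg_grad, gamma=0.5):
    """One update y_{i+1} = y_i + gamma * (-grad e(y, y_i)) (Eq. 2.5)."""
    return [pi + gamma * gi for pi, gi in zip(y_hat, neg_grad(y, y_hat))]


y = [3.0, 5.0, 10.0]
y_hat = [6.0] * 3
print(step(y, y_hat, neg_gradient_squared_error))   # [4.5, 5.5, 8.0]
print(step(y, y_hat, neg_gradient_absolute_error))  # [5.5, 5.5, 6.5]
```

With squared error, predictions that are far off move further; with absolute error every prediction moves by the same amount, which makes the update less sensitive to outliers.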

2.6 XGBoost

In 2016, Chen and Guestrin [18] published their study of a tree boosting system called eXtreme Gradient Boosting, or XGBoost (XGB). XGBoost is an efficient implementation of the gradient boosted decision trees algorithm. The framework is fast and has both CPU and GPU implementations. It also supports multi-threaded parallelism, which makes it even faster.


XGBoost has two types of split-finding algorithms: a pre-sort-based algorithm called Exact Greedy, which is used by default, and a histogram-based algorithm. Exact Greedy first sorts each feature before enumerating all the possible splits and calculating the gradient statistics for each. Each tree that is created is given a score that represents how good it is. The trees are built sequentially so that the result of the previous tree can be used to help build the next tree.
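The XGBoost paper [18] scores a candidate split using the sums of first-order (G) and second-order (H) gradient statistics of the samples on each side: gain = ½[G_L²/(H_L + λ) + G_R²/(H_R + λ) − (G_L + G_R)²/(H_L + H_R + λ)] − γ, where λ is an L2 penalty on leaf weights and γ a penalty per additional leaf (not the learning rate of Eq. 2.4). The sketch below applies this to the Exact Greedy idea for a single feature: sort, then scan every split point while accumulating G and H. It is a simplified illustration rather than XGBoost’s own code; the parameter names `l2_penalty` and `leaf_penalty` are hypothetical stand-ins for λ and γ.

```python
def split_gain(GL, HL, GR, HR, l2_penalty=1.0, leaf_penalty=0.0):
    """Split gain from the XGBoost paper, computed from gradient statistics."""
    def score(G, H):
        return G * G / (H + l2_penalty)
    return 0.5 * (score(GL, HL) + score(GR, HR) - score(GL + GR, HL + HR)) - leaf_penalty


def exact_greedy_split(x, g, h, l2_penalty=1.0, leaf_penalty=0.0):
    """Sort one feature, then scan every split point, accumulating G and H."""
    order = sorted(range(len(x)), key=lambda i: x[i])
    G, H = sum(g), sum(h)
    GL = HL = 0.0
    best_gain, best_threshold = 0.0, None  # only splits with positive gain are kept
    for pos in range(len(order) - 1):
        i, nxt = order[pos], order[pos + 1]
        GL += g[i]
        HL += h[i]
        if x[i] == x[nxt]:
            continue  # cannot split between equal values
        gain = split_gain(GL, HL, G - GL, H - HL, l2_penalty, leaf_penalty)
        if gain > best_gain:
            best_gain, best_threshold = gain, (x[i] + x[nxt]) / 2
    return best_gain, best_threshold


# Loss 0.5 * (y - y_hat)^2 with current prediction 0: g_i = y_hat_i - y_i, h_i = 1.
y = [1.0, 1.0, 1.0, 5.0, 5.0, 5.0]
x = [3, 1, 2, 12, 10, 11]
g = [0.0 - yi for yi in y]
h = [1.0] * len(y)
print(exact_greedy_split(x, g, h))
```

The best split falls between x = 3 and x = 10, separating the two groups of targets; the λ term shrinks the scores of sides with few samples, which discourages splits that isolate very small groups.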

A common problem that can occur when using machine learning techniques on large data sets is that the values may not fit in the CPU cache, which can greatly increase computation times. XGBoost addresses this problem with a cache-aware algorithm and therefore achieves better performance than similar machine learning frameworks when applied to large data sets. This makes XGBoost well suited for situations where computational resources are limited.

Another advantage of XGBoost is that it has a built-in way to handle missing values, which is another common problem in machine learning. XGBoost solves this by using a default direction in each split, which is taken when the value for this feature is missing.
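The default-direction idea can be sketched by evaluating the same split twice, once with the rows that lack a value sent left and once sent right, and keeping the direction with the higher gain. This reuses the split gain formula from the XGBoost paper with a fixed threshold; XGBoost’s sparsity-aware algorithm also searches over thresholds, which is omitted here. The function names are hypothetical, and missing values are represented as `None`.

```python
def split_gain(GL, HL, GR, HR, l2_penalty=1.0, leaf_penalty=0.0):
    """Split gain from the XGBoost paper, computed from gradient statistics."""
    def score(G, H):
        return G * G / (H + l2_penalty)
    return 0.5 * (score(GL, HL) + score(GR, HR) - score(GL + GR, HL + HR)) - leaf_penalty


def default_direction(x, g, h, threshold, l2_penalty=1.0):
    """Choose where rows with a missing value (None) should go for a given split."""
    GL = HL = GR = HR = G_miss = H_miss = 0.0
    for xi, gi, hi in zip(x, g, h):
        if xi is None:
            G_miss += gi
            H_miss += hi
        elif xi <= threshold:
            GL += gi
            HL += hi
        else:
            GR += gi
            HR += hi
    gain_left = split_gain(GL + G_miss, HL + H_miss, GR, HR, l2_penalty)
    gain_right = split_gain(GL, HL, GR + G_miss, HR + H_miss, l2_penalty)
    return ("left", gain_left) if gain_left >= gain_right else ("right", gain_right)


y = [1.0, 1.0, 5.0, 5.0, 5.0]
x = [1.0, 2.0, 10.0, None, None]  # the last two rows lack a value for this feature
g = [0.0 - yi for yi in y]        # squared-error gradients at prediction 0
h = [1.0] * len(y)
print(default_direction(x, g, h, threshold=6.0))
```

Because the rows with missing values have targets like those on the right-hand side, sending them right gives the higher gain, so "right" becomes the default direction for this split.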

2.7 Related work

Predicting future sales of food with machine learning is not a new subject; however, much of the research in this area has focused on forecasting for retail grocery stores.

A survey by Tsoumakas [19] reviewed 13 research papers on machine learning in food sales predictions. Only one of these articles had researched a restaurant. The author found that most forecasts made daily predictions, while some used longer time spans of weeks or even quarters. The most commonly predicted output variable was the amount sold, in units or weight, followed by monetary amount. According to the survey, the most common input variables to the machine learning algorithms are historical sales figures for different intervals. These can be, for example, the amount sold on the same day last week or last year, and average sales for the last week or month. Other common inputs are characteristics of the date and time, such as the day of the week, month of the year, and whether the date is a holiday. Some inputs that occur less often in these papers are external factors that would require collecting data from outside the company, for example financial, social and weather factors, and different types of events occurring in the vicinity of the business. According to the author, there is an unexplored opportunity to use product information as input variables to predict demand for multiple products with the same model.

Doganis et al. [20] combined a neural network with a genetic variable selection algorithm to predict the daily sales of fresh milk for a dairy producer in Greece. They created additional input variables from the provided sales data, similar to those used in existing forecast methods. The examined variables were sales figures from the previous week of the current year, the previous week of last year, the corresponding day of last year, and the percentage change in sales from the previous year. The authors compared the forecast accuracy of the neural network to linear regression algorithms, and found that the neural network, which was the only model provided with the additional variables, had an average error of less than 5%, compared to 7-10% for the other models. The most useful input variables identified by the variable selection algorithm were the sales on the previous day and on the same day last week.

Žliobaite et al. [21] predict the weekly sales of products for a food wholesaler in the Netherlands. They start by categorizing products into those with predictable and those with random sales patterns. This is done by creating variables from the products' sales history, including variations of the mean and median values, quartiles, and standard deviation, and feeding these to an ensemble of classification models that decides the product category by majority vote. Forecasts are then made for the products with predictable sales patterns, using an ensemble of predictive algorithms. The input variables provided to the predictors in this study were daily product sales, average weekly product sales, daily total sales, product promotions, holidays, season, temperature, air pressure and rain. The most used variables were the product-related variables, the season, the temperature and one of the included holidays. The results show that the presented solution outperformed the baseline moving average forecast method, and that the accuracy could be improved further by lowering the threshold for categorizing a product as predictable. They also discuss the possibility of excluding the classification model and incorporating its input variables directly into the prediction model.

Islek and Ögüdücü [22] developed a forecasting method for a Turkish distributor of dried nuts and fruits. The company has almost 100 main distribution warehouses, each with its own sub-distribution warehouses. The input data included warehouse-related attributes, for example location, size, number of sub-warehouses and transportation vehicles, selling area in square meters, number of employees and amount of products sold weekly, as well as product information such as price and product categories. Their solution first used a bipartite graph clustering algorithm to group warehouses with similar sales patterns, and then a combined moving average and Bayesian network model to predict the weekly sales of individual products at each warehouse. The authors evaluated the forecast accuracy of three differently trained models. One handled all of the warehouses grouped together, one used clusters of main warehouses, and the last was given sub-warehouse clusters. The clustering algorithm generated 29 different main warehouse clusters and 97 clusters for the sub-distribution warehouses. The results showed that the error rate of the model dropped from 49% without clustering to 24% with main distribution warehouse clusters, and to 17% with sub-distribution warehouse clusters.

Liu and Ichise [23] performed a case study of a Japanese supermarket chain, in which they implemented a long short-term memory (LSTM) neural network with weather data as input parameters to predict the sales of weather-sensitive products. They used six different weather factors: solar radiation, rainfall, relative humidity, temperature, and north and east wind velocity. Their results show that their model's predictions had an accuracy of 61.94%. The authors mention plans to improve their work by adding other factors known to affect sales, such as area population, nearby competitors, pricing strategy and campaigns.

An early research paper that focused on restaurants was written by Takenaka et al. [24], who developed a forecasting method for service industry businesses based on the factors that interviewed managers took into account when forecasting manually. The examined factors include the weekday, rain, temperature and holidays. They found that their regression model could provide more accurate predictions for a restaurant in Tokyo than the restaurant's manager.

In a study from 2017, Bujisic et al. [25] researched 17 different weather factors and their effects on restaurant sales. They analyzed a data set consisting of every meal sold at a restaurant in southern Florida during 47 weeks, from March 2010 to March 2011. Their results showed that weather factors can have a significant effect on sales of individual products; however, not all products are affected by the weather, and the same weather factor has different effects on different products. They also found that the most important weather factor was temperature, followed by wind speed and air pressure.

Xinliang and Dandan [26] used a neural network to forecast daily sales of four restaurants located on a university campus in Shanghai. The input variables provided to their model were the restaurant's name, the date, the teaching week, the week of the year, whether the date was a holiday, temperature, precipitation, maximum wind speed, and four different search metrics from Baidu, China's largest search engine. The authors found that the name of the restaurant was the most important variable, followed by the teaching week, holiday, and one of the Baidu variables. The weather factors were shown to have low importance to the model, with temperature slightly above the other two.

Ma et al. [27] predicted future visitors to restaurants using a mix of k-nearest neighbours, random forests and XGBoost. Their data was obtained from large restaurant ordering sites and included 150 different restaurants. To compare different restaurants, the authors constructed several input variables from restaurant attributes such as a unique ID, latitude, longitude, genre, and location area. The results showed that XGBoost was the best individual model, and that the most important variables were the week of year, mean visitors, restaurant ID and maximum visitors.

Holmberg and Halldén [28] researched how to implement machine learning algorithms for restaurant sales forecasts. To improve the accuracy of the predictions, they included variables based on sales history, date characteristics and weather factors. The weather factors used for their model were temperature, average temperature of the last 7 days, rainfall, minutes of sunshine, wind speed, cloud cover, and snow depth. They examined two different machine learning algorithms and found that the XGBoost algorithm was more accurate than the long short-term memory neural network. The date variables were the most significant and the weather factors had the least impact. They also found that the daily sales were weather-dependent for all of the researched restaurants, and that introducing weather factors improved their models' performance by 2-4 percentage points. They suggest that future work could create more general models that make predictions for multiple restaurants, possibly by categorizing restaurants based on features such as latitude/longitude, number of inhabitants, size of the restaurant, and opening hours.

A few key findings can be extracted from these studies. While models that perform time series forecasting have access to previous sales figures by default, results can be improved by emphasizing certain patterns with dedicated input variables. Which input variables prove to be most important differs between studies, and depends on the variables included and the behaviour of each restaurant's customers. Some variables are, however, more likely to have a large effect: variables related to the date and to sales figures appear more often in the lists of most important variables. Almost all of the early studies use some kind of neural network architecture, but in more recent papers XGBoost becomes more popular, and it has been shown to be one of the best-performing algorithms for prediction and forecasting problems [19, 27–33].

2.7.1 Weather’s effects on sales

Several studies have shown that the weather can have a significant effect on business, both seasonally and from day to day. Murray et al. [34] identify three key reasons. Firstly, the weather affects an individual's willingness to go outside, and thereby that person's opportunity to make a purchase. If it is cold, or if it rains or snows heavily, people are much more likely to stay at home. Secondly, some weather factors and seasonal changes affect people's mood, which in turn affects their willingness to spend money. The lower levels of sunlight during the winter months can lead to Seasonal Affective Disorder [35], especially in the northern parts of the world [36]. Lastly, some products are very dependent on the season and can experience large variations in sales over the year. Drinks, for example, sell much more during the summer.

2.8 Forecast accuracy metrics

There is a wide range of methods for statistically evaluating the accuracy of a forecast. They differ in complexity and advantages, and usually no metric is best in every case [37]. The most appropriate measurement depends on the attributes of the input data and the predicted values, and should be chosen for each individual situation. Some metrics are sensitive to outliers in the data, so that one bad prediction can have a significant impact on the overall score. Others depend on the scale of the predicted values, so results for series of different magnitudes cannot easily be compared. Some lack a meaningful interpretation, which makes it hard to understand what they are meant to show.

A relative accuracy metric first evaluates a baseline forecast method, and then compares other forecasts against the performance of that baseline. This type of metric has the advantage of being independent of the scale of the predicted values, and only measures the difference in prediction accuracy.
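One common instance of a relative metric is the relative MAE: the mean absolute error of a forecast divided by that of a baseline on the same days. The minimal sketch below assumes a seasonal naive baseline (the amount sold on the same weekday one week earlier); the baseline choice, the function name and the numbers are illustrative assumptions, not part of the method described here.

```python
def relative_mae(observed, forecast, baseline):
    """MAE of a forecast divided by the MAE of a baseline forecast on the same days.

    Values below 1 mean the forecast beats the baseline. The ratio has no unit,
    so it can be compared between restaurants with very different sales volumes.
    """
    if not (len(observed) == len(forecast) == len(baseline)) or not observed:
        raise ValueError("need equally long, non-empty sequences")
    model_error = sum(abs(y - f) for y, f in zip(observed, forecast))
    baseline_error = sum(abs(y - b) for y, b in zip(observed, baseline))
    if baseline_error == 0:
        raise ValueError("baseline is perfect; the ratio is undefined")
    return model_error / baseline_error  # the 1/N factors of both MAEs cancel


# Illustrative numbers: the baseline is what was sold on the same weekday last week
observed = [100, 120, 90, 110]
last_week = [90, 100, 120, 100]
model = [105, 115, 95, 100]
print(relative_mae(observed, model, last_week))  # 25 / 70, about 0.357
```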

2.8.1 Mean absolute error

The mean absolute error (MAE) is a popular accuracy metric because it is easy to understand. It measures the average absolute difference between the observed and predicted values, in the same unit as the data, and is calculated using the formula below.

$$\mathrm{MAE} = \frac{1}{N} \sum_{i=1}^{N} \lvert y_i - \hat{y}_i \rvert \qquad (2.6)$$

where $y_i$ is the observed value, $\hat{y}_i$ is the model's prediction, and $N$ is the number of predictions.
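Equation (2.6) translates directly into a few lines of code. The sketch below is illustrative; the function name and the four days of sales are made up for the example.

```python
def mean_absolute_error(observed, predicted):
    """Equation (2.6): the average of |y_i - y_hat_i| over N predictions."""
    if len(observed) != len(predicted) or len(observed) == 0:
        raise ValueError("need two equally long, non-empty sequences")
    return sum(abs(y - y_hat) for y, y_hat in zip(observed, predicted)) / len(observed)


# Four days of observed sales and forecasts (illustrative numbers)
mae = mean_absolute_error([12, 7, 30, 15], [10, 9, 24, 15])
print(mae)  # errors 2, 2, 6, 0 -> 10 / 4 = 2.5
```

Because the result is in the unit of the data (here items sold), an MAE of 2.5 can be read as "on average 2.5 items off per day".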


3 Method

3.1 Raw data

For this study, Onslip provided a snapshot of one of their production databases, with data on every transaction processed from 2015-01-01 to 2019-05-07. Because Onslip's customers operate in various industries, it was reasoned that focusing on one type of business could make it easier to achieve results. If this project were successful, the insights gained could then be applied to more businesses. The restaurant category was chosen as it had the most customers and the most data.

The data used is obtained from the receipts of transactions in a business category that Onslip calls "Small ticket Food&Beverage". This group contains restaurants that receive orders of five products on average, and includes for example pizzerias, Thai restaurants and sushi restaurants. These are Onslip's main target group and constitute most of their customers.

The data included 11 481 129 individual transactions of 34 163 817 items in total, for a total amount of 1 854 132 529.49 SEK. In total, 374 unique companies and 59 076 different products were identified.

3.2 Receipt data

One data point for a transaction contains all the information printed on the receipt at the time of sale. Some of these fields are required by Swedish law, such as the date, time, items, monetary amount and tax rate of the purchase, and the organization number and address of the selling business. Other fields contain information added by the business, as Onslip allows its customers to add their own custom information to the receipts.

From the raw data, the name and address of the business were extracted, along with the date and time, individual items, quantities and prices of every purchase. Each company's extracted data is then grouped by combinations of item, price and date, and combined into one data set that shows the quantity of each product sold per day and price. An additional data set is created by summing the sales of all products, using the product, price and quantity variables, to show the total monetary amount sold at each restaurant per day. Because all predictions are made on a per-day basis, the time stamps are not included in any of the data sets. Table 3.1 shows all variables that were extracted from the receipt data.
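The two aggregations can be sketched with pandas as follows. This is a minimal illustration, assuming one row per receipt line and column names borrowed from Table 3.1; the receipt rows themselves are invented, and the actual extraction code may differ.

```python
import pandas as pd

# Invented receipt lines: one row per item line on a receipt
receipts = pd.DataFrame({
    "store-name": ["Pizzeria A"] * 4,
    "timestamp": pd.to_datetime(["2019-05-01 11:30", "2019-05-01 18:10",
                                 "2019-05-01 19:00", "2019-05-02 12:05"]),
    "id": [1, 1, 2, 1],
    "price": [95.0, 95.0, 30.0, 95.0],
    "quantity": [2, 1, 3, 1],
})

# Predictions are per day, so the time of day is dropped
receipts["date"] = receipts["timestamp"].dt.normalize()

# Product sales: quantity sold of each (product, price) combination per store and day
product_sales = (receipts
                 .groupby(["store-name", "id", "price", "date"], as_index=False)["quantity"]
                 .sum())

# Total sales: monetary amount (price * quantity) per store and day
receipts["sum"] = receipts["price"] * receipts["quantity"]
total_sales = receipts.groupby(["store-name", "date"], as_index=False)["sum"].sum()

print(product_sales)
print(total_sales)
```

Grouping on price as well as product ID keeps sales at different price levels apart, matching the "per day per price" granularity described above.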


Table 3.1: Receipt variables

| Variable | Description | Type | Data set |
|---|---|---|---|
| day | The day of the month of the date of sale | Integer | Both |
| id | The ID of a product | Integer | Product sales |
| month | The month of the year of the date of sale | Integer | Both |
| price | The sale price of a product | Float | Product sales |
| quantity | The total amount of a product sold on this date | Integer | Product sales |
| store-address | The address of the restaurant | String | Both |
| store-name | The name of the restaurant | String | Both |
| sum | The total amount in SEK sold on this date | Float | Total sales |
| time | The time of day of the sale | Time stamp | None |
| year | The year of the date of sale | Integer | Both |

3.3 Sales history data

Information about previous sales was calculated from the extracted receipt data. These variables were constructed based on results from previous research by Doganis et al. [20].

The data sets are first separated either by unique product and price combinations in the product data set, or by store-name in the total sales data set. The calculations are performed with the operations shift, which shifts all data points by the specified number of steps, rolling, which creates fixed-size rolling windows, and mean, which calculates the mean value in the given range. For data points whose values were shifted out of range, the first occurring value was used to fill the range backwards. When there was not sufficient data for the rolling operations, the mean value of the entire data set was used instead. The sales history data consisted of 14 additional variables, shown in Table 3.2 below.
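A minimal pandas sketch of these operations follows. It assumes one row per consecutive day within each group (so a shift of n rows equals n days), and it interprets "the mean value of the entire data set" as the mean of each group's whole series; both are assumptions, and the function name is hypothetical.

```python
import pandas as pd

LAGS = [1, 2, 3, 4, 5, 6, 7, 14, 28, 364]
WINDOWS = [7, 14, 28, 364]


def add_sales_history(df, group_cols, target):
    """Add the lag and rolling-mean variables of Table 3.2 within each group.

    Assumes one row per consecutive day in each group, so shifting n rows means n days.
    """
    df = df.sort_values(group_cols + ["date"]).copy()
    grouped = df.groupby(group_cols)[target]
    first_value = grouped.transform("first")
    series_mean = grouped.transform("mean")
    for lag in LAGS:
        # Rows shifted in from before the first date get the first occurring value,
        # which is what a backward fill from that value would produce
        df[f"{lag}_days_ago"] = grouped.shift(lag).fillna(first_value)
    for window in WINDOWS:
        # shift(1) excludes the date of sale; incomplete windows fall back to the series mean
        rolling_mean = grouped.transform(lambda s: s.shift(1).rolling(window).mean())
        df[f"mean_last_{window}_days"] = rolling_mean.fillna(series_mean)
    return df


# Ten days of total sales for one invented store
sales = pd.DataFrame({
    "store-name": ["A"] * 10,
    "date": pd.date_range("2019-01-01", periods=10),
    "sum": [float(v) for v in range(1, 11)],
})
history = add_sales_history(sales, ["store-name"], "sum")
print(history[["date", "sum", "1_days_ago", "mean_last_7_days"]])
```

Running the same function with `["id", "price"]` as `group_cols` would give the product-level variant.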

Table 3.2: Sales history variables

| Variable | Description | Type |
|---|---|---|
| 1_days_ago | The amount sold 1 day before the date of sale | Float/Integer |
| 2_days_ago | The amount sold 2 days before the date of sale | Float/Integer |
| 3_days_ago | The amount sold 3 days before the date of sale | Float/Integer |
| 4_days_ago | The amount sold 4 days before the date of sale | Float/Integer |
| 5_days_ago | The amount sold 5 days before the date of sale | Float/Integer |
| 6_days_ago | The amount sold 6 days before the date of sale | Float/Integer |
| 7_days_ago | The amount sold 7 days before the date of sale | Float/Integer |
| 14_days_ago | The amount sold 14 days before the date of sale | Float/Integer |
| 28_days_ago | The amount sold 28 days before the date of sale | Float/Integer |
| 364_days_ago | The amount sold 364 days before the date of sale | Float/Integer |
| mean_last_7_days | The mean amount sold in the 7 days before the date of sale | Float/Integer |
| mean_last_14_days | The mean amount sold in the 14 days before the date of sale | Float/Integer |
| mean_last_28_days | The mean amount sold in the 28 days before the date of sale | Float/Integer |
| mean_last_364_days | The mean amount sold in the 364 days before the date of sale | Float/Integer |

3.4 External data collection

Three key categories of external data have been identified as being of interest in earlier research: weather factors [21, 23, 24, 26, 28, 29, 31], attributes of the restaurants or products [22, 26, 27, 29, 33], and date characteristics [21, 23, 24, 26–31, 33]. This information is gathered from different publicly available sources, as described in the following sections.


3.4.1 Weather data

To obtain the weather information, the Swedish Meteorological and Hydrological Institute (SMHI) was chosen as the source. SMHI offers daily weather statistics, such as precipitation, temperature and wind speed, and provides both historical weather information going back several years and forecasts for the next ten days. This information can be retrieved through SMHI's Application Programming Interface (API). To fetch weather information for a specific area, a position is needed.

Which weather factors to use was decided by comparing the list of available parameters for historical data to the parameters in forecasts¹. The historical parameters were obtained from an API call, while the forecast parameters were taken from SMHI's API documentation.

Get all weather types available in the historical data: https://opendata-download-metobs.smhi.se/api/version/latest/parameter.json

Get all weather types available in the forecasts: https://opendata.smhi.se/apidocs/metfcst/parameters.html

Eight parameters that belonged to both sets were identified: air pressure, air temperature, rainfall, relative humidity, visibility, wind direction, wind gust, and wind speed. Rainfall was not found directly in the forecast parameter list, but could be calculated by checking whether the precipitation category was rain and then multiplying the mean precipitation intensity per hour by 24. Unfortunately, data could not be obtained for all the requested parameters and intervals, and those parameters were therefore not used. The remaining weather factors selected for this work are displayed in Table 3.3.
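The rainfall calculation amounts to a one-line rule. The sketch below assumes that SMHI's point forecasts report a precipitation category (`pcat`) and a mean precipitation intensity in mm/h (`pmean`), and that category code 3 means rain; these names and codes are assumptions to verify against SMHI's parameter documentation.

```python
# Assumptions: 'pcat' is SMHI's forecast precipitation category and 'pmean' the mean
# precipitation intensity in mm/h; category 3 is taken to mean rain. Verify with SMHI docs.
RAIN_CATEGORY = 3
HOURS_PER_DAY = 24


def daily_rainfall_mm(pcat: int, pmean_mm_per_hour: float) -> float:
    """Approximate a day's rainfall as mean intensity times 24 h, or 0 if it is not rain."""
    if pcat != RAIN_CATEGORY:
        return 0.0
    return pmean_mm_per_hour * HOURS_PER_DAY


print(daily_rainfall_mm(3, 0.5))  # 12.0
print(daily_rainfall_mm(1, 0.5))  # 0.0, not classified as rain
```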

Table 3.3: Selected SMHI parameters

| Parameter | Weather type | Attribute | Measured interval |
|---|---|---|---|
| 2 | Air temperature | Average | 1 time/day |
| 4 | Wind speed | Average | 1 time/hour |
| 5 | Rainfall | Sum | 1 time/day |

SMHI has two different APIs: one for retrieving older weather data and one for current and future weather data. To retrieve older weather data, a weather station must be specified as an argument to the API, while the other API only needs a position in longitude and latitude. The process of retrieving old weather data is divided into two parts. First, a list of all stations that have data for the chosen weather factor is fetched and saved; if a file containing all the weather stations for the selected weather factor already exists, it is loaded instead. The list is then searched, and the distance to each station is calculated to find the station nearest to the restaurant's location. Figure 3.1 shows a flow diagram of the first part. The call used to fetch all stations is shown below, where {parameter} is replaced with a parameter from Table 3.3.
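The distance step can be sketched as follows. The text does not state which distance formula was used; this sketch assumes the haversine great-circle distance and assumes each station is a dict with `id`, `latitude` and `longitude` keys, as in SMHI's station lists. Returning the whole sorted list, rather than only the nearest station, also supports falling back to the next nearest station.

```python
import math

EARTH_RADIUS_M = 6_371_000  # mean Earth radius in meters


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in meters between two points given in decimal degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    d_phi = phi2 - phi1
    d_lambda = math.radians(lon2 - lon1)
    a = math.sin(d_phi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(d_lambda / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def stations_by_distance(stations, lat, lon):
    """Return the stations sorted from nearest to farthest from (lat, lon)."""
    return sorted(stations, key=lambda s: haversine_m(lat, lon, s["latitude"], s["longitude"]))


# Illustrative stations; IDs and coordinates are made up for the example
stations = [
    {"id": 1, "latitude": 59.33, "longitude": 18.07},
    {"id": 2, "latitude": 58.40, "longitude": 15.53},
    {"id": 3, "latitude": 58.59, "longitude": 16.19},
]
ranked = stations_by_distance(stations, 58.41, 15.62)
print([s["id"] for s in ranked])  # [2, 3, 1]
```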

Get all stations' positions and IDs: https://opendata-download-metobs.smhi.se/api/version/latest/parameter/{parameter}.json

In the second step, the weather station is used as a parameter in a call to SMHI's API to request the data. The data is received as a Comma Separated Values (CSV) file. Some rows and columns only contain comments or unnecessary information and are therefore removed.

¹ The complete lists of SMHI parameters can be found in the Appendix, in Table A.1 on page 39 and Table A.2 on page 40.


Figure 3.1: Flow diagram of the process to find the closest station

Figure 3.2 shows how this second step is done. Also shown below is the call used to fetch historical weather data, where {station} is the ID of the chosen station.

Fetch historical weather data from a specific station: https://opendata-download-metobs.smhi.se/api/version/latest/parameter/{parameter}/station/{station}/period/corrected-archive/data.csv

Figure 3.2: Flow diagram of the process to fetch weather data for the given station

The historical weather data is obtained for two different time periods: corrected archive, which contains data from the beginning of collection until about three months before the current date, and latest months, which has data for the latest four months. This caused some overlap in the data. Because the corrected archive is more accurate according to SMHI, the values from the archive were selected whenever there were multiple data points for the same day.
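The rule for overlapping days can be sketched with pandas. This is a minimal illustration, assuming both periods have already been parsed into DataFrames with a `date` column; the column names, function name and values are invented.

```python
import pandas as pd


def merge_periods(archive: pd.DataFrame, latest: pd.DataFrame) -> pd.DataFrame:
    """Combine corrected-archive and latest-months data, preferring the archive on overlaps."""
    combined = pd.concat([archive, latest], ignore_index=True)
    # drop_duplicates keeps the first occurrence, and archive rows come first in the concatenation
    combined = combined.drop_duplicates(subset="date", keep="first")
    return combined.sort_values("date").reset_index(drop=True)


archive = pd.DataFrame({"date": pd.to_datetime(["2019-02-01", "2019-02-02", "2019-02-03"]),
                        "temp": [-3.0, -1.5, 0.5]})
latest = pd.DataFrame({"date": pd.to_datetime(["2019-02-03", "2019-02-04"]),
                       "temp": [0.7, 1.2]})
print(merge_periods(archive, latest)["temp"].tolist())  # [-3.0, -1.5, 0.5, 1.2]
```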

When obtaining the weather information, several problems arose with how the stations had saved their data. For example, many stations:

• Did not save weather history at all.

• Had stopped saving data several decades ago.

• Had not saved data continuously.

• Had saved data with different time intervals.

To solve these problems, it was first checked whether the station nearest to the position had any weather data available. Instead of excluding the restaurant if the weather station had no information, the next nearest weather station was examined, until a weather station with weather history was found. The station also had to have data for the dates in the requested interval; if it did not have weather history for those dates, the next weather station was examined. This was done because it was believed that slightly larger distances would not affect the weather much. This is analyzed further in the discussion chapter. Once a station with weather history for the requested dates had been found, the downloaded weather information was processed. For parameters that provided more than one data point per day, the mean value of all points for that day was calculated and used instead.
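The fallback search and the daily averaging can be sketched as below. `fetch` is a hypothetical stand-in for the download and CSV cleaning, assumed to return a dict from date to the values measured that day (empty when the station has no history), and the station IDs are assumed to be ordered from nearest to farthest. As a simplification, only the first and last available dates are checked, not gaps in between.

```python
from datetime import date


def first_station_with_history(ranked_station_ids, fetch, start, end):
    """Return (station_id, observations) for the nearest station whose history covers start..end.

    `fetch` is a hypothetical callable: station_id -> {date: [values measured that day]},
    returning an empty dict when the station has no history.
    """
    for station_id in ranked_station_ids:
        observations = fetch(station_id)
        # Simplification: checks only the first and last dates, not gaps in between
        if observations and min(observations) <= start and max(observations) >= end:
            return station_id, observations
    return None, {}


def daily_means(observations):
    """Average several data points per day (e.g. hourly wind speed) into one value per day."""
    return {day: sum(values) / len(values) for day, values in observations.items()}


# Invented data: station 10 has no history, station 11 stopped reporting long ago
fake_history = {
    10: {},
    11: {date(1990, 1, 1): [2.0]},
    12: {date(2019, 5, 1): [3.0, 5.0], date(2019, 5, 2): [4.0]},
}
station, obs = first_station_with_history([10, 11, 12], fake_history.get,
                                          date(2019, 5, 1), date(2019, 5, 2))
print(station, daily_means(obs))
```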

The final weather variables used in this project are shown in Table 3.4.

Table 3.4: Weather variables

| Variable | Description | Type |
|---|---|---|
| rain | The total amount of rainfall in millimeters on the date of sale | Float |
| temp | The average air temperature in degrees Celsius on the date of sale | Float |
| wind | The average wind speed in meters per second on the date of sale | Float |

3.4.2 Google Maps data

Initially, the plan was to include as much business information as possible about the restaurants, so that the model would be able to distinguish their unique sales patterns. This hypothesis was supported by previous research [21, 22, 27]. However, because Onslip does not require its customers to fill in any business-specific information beyond what is demanded by law, no such information was available for every company used in this study.

Instead, since there had been some success in using information obtained from Internet search engines as business indicators [26], and because the use of public information would increase replicability, the option of using Google Maps as an external information source was explored. Variables were constructed from the information available from Google, which was retrieved through its Web Services API using the Python library googlemaps².

In the first step of the retrieval process, one API call is made to the Geocode service for every address found in the receipt data, with the address and name of the restaurant. This returns a list of locations that match all or part of the search string. To filter out irrelevant results such as the street, city block and municipality, which would always be found as long as the address is legitimate, search results that only have the types 'street address', 'premise' or 'political' are removed. Because all of the companies included are located in Sweden, any results outside of Sweden are also discarded. After the filtering, the PlaceID of the location is extracted from the search result. This is an ID internal to Google Maps that uniquely identifies the specific location.
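The filtering step can be sketched as a pure function over geocoding results. The sketch assumes the result layout of the Geocoding API (a `types` list, `address_components` with a `country` component, and a `place_id`), where types are spelled with underscores such as `street_address`. Which result was kept when several remained is not stated, so taking the first one is an assumption. With the googlemaps library, the input list could come from a call such as `googlemaps.Client(key=...).geocode(f"{name}, {address}")`.

```python
# Types as spelled in Geocoding API responses (with underscores)
IRRELEVANT_TYPES = {"street_address", "premise", "political"}


def in_sweden(result):
    """True if the result's country address component has the ISO code SE."""
    for component in result.get("address_components", []):
        if "country" in component.get("types", []):
            return component.get("short_name") == "SE"
    return False


def pick_place_id(results):
    """Return the place_id of the first result that is not a bare address/area and is in Sweden."""
    for result in results:
        types = set(result.get("types", []))
        if types and types <= IRRELEVANT_TYPES:  # only irrelevant types
            continue
        if not in_sweden(result):
            continue
        return result["place_id"]  # assumption: the first remaining result is used
    return None


SE = [{"short_name": "SE", "types": ["country", "political"]}]
NO = [{"short_name": "NO", "types": ["country", "political"]}]
restaurant_types = ["restaurant", "food", "point_of_interest", "establishment"]
results = [
    {"place_id": "addr", "types": ["street_address"], "address_components": SE},
    {"place_id": "abroad", "types": restaurant_types, "address_components": NO},
    {"place_id": "ok", "types": restaurant_types, "address_components": SE},
]
print(pick_place_id(results))  # ok
```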

In the second step, the PlaceID is used in a call to the Place service, which returns the public information stored for this location. A few fields are extracted directly from this information, including latitude and longitude, star rating and number of ratings, while other fields go through more processing. The opening hours of the business are processed to save the number of hours open on each day of the week, along with whether the restaurant was open during lunch (defined as 11:00-13:00) or dinner (17:00-19:00).
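The opening-hours processing could look like the sketch below. It assumes the `opening_hours.periods` layout returned by Google's Place Details (each period has `open` and `close` entries with a `day` from 0 = Sunday and a `time` as "HHMM", and a period without `close` means always open). Two interpretations are assumptions: hours after midnight are counted towards the opening day, and "open during lunch" means the period covers the whole 11:00-13:00 window.

```python
LUNCH = (11 * 60, 13 * 60)   # 11:00-13:00, in minutes after midnight
DINNER = (17 * 60, 19 * 60)  # 17:00-19:00


def to_minutes(hhmm):
    """Convert a 'HHMM' string such as '1130' to minutes after midnight."""
    return int(hhmm[:2]) * 60 + int(hhmm[2:])


def summarize_opening_hours(periods):
    """Hours open and lunch/dinner flags per weekday (0 = Sunday, as in the Places API)."""
    summary = {day: {"hours_open": 0.0, "open_lunch": False, "open_dinner": False}
               for day in range(7)}
    for period in periods:
        if "close" not in period:  # open around the clock
            for s in summary.values():
                s.update(hours_open=24.0, open_lunch=True, open_dinner=True)
            return summary
        day = period["open"]["day"]
        start = to_minutes(period["open"]["time"])
        end = to_minutes(period["close"]["time"])
        if period["close"]["day"] != day:
            end += 24 * 60  # closes after midnight; count those hours towards the opening day
        s = summary[day]
        s["hours_open"] += (end - start) / 60
        # Assumption: "open during lunch/dinner" means the period covers the whole window
        s["open_lunch"] = s["open_lunch"] or (start <= LUNCH[0] and end >= LUNCH[1])
        s["open_dinner"] = s["open_dinner"] or (start <= DINNER[0] and end >= DINNER[1])
    return summary


periods = [
    {"open": {"day": 1, "time": "1100"}, "close": {"day": 1, "time": "2200"}},  # Monday
    {"open": {"day": 5, "time": "1600"}, "close": {"day": 6, "time": "0200"}},  # Friday night
]
hours = summarize_opening_hours(periods)
print(hours[1], hours[5])
```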

The latitude and longitude values, and the city where the business is located, are used in an additional API call to the Directions service to calculate the walking distance from the location to the city center, which is also saved.

Lastly, an API call to the Places service is made to find the number of nearby competitors. After removing overly general types such as 'store' and 'establishment', a search is done with the remaining types of the location and its latitude and longitude values. Google does not currently support including only results within a certain radius, so the results are filtered afterwards to count the number of competitors within 1000 m.
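The radius filter can be sketched as below. The thesis does not say how distances were computed or whether the restaurant itself was excluded; this sketch assumes the haversine distance, excludes the restaurant's own PlaceID, and assumes the Places result layout with `geometry.location.lat/lng` and `place_id`. The coordinates are invented.

```python
import math


def distance_m(lat1, lng1, lat2, lng2, radius_m=6_371_000):
    """Haversine great-circle distance in meters between two points in decimal degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    a = (math.sin((phi2 - phi1) / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(math.radians(lng2 - lng1) / 2) ** 2)
    return 2 * radius_m * math.asin(math.sqrt(a))


def count_competitors(lat, lng, places, own_place_id=None, max_distance_m=1000):
    """Count search results within max_distance_m of (lat, lng), skipping the restaurant itself."""
    count = 0
    for place in places:
        if place.get("place_id") == own_place_id:
            continue
        location = place["geometry"]["location"]
        if distance_m(lat, lng, location["lat"], location["lng"]) <= max_distance_m:
            count += 1
    return count


lat, lng = 58.4108, 15.6214
places = [
    {"place_id": "self", "geometry": {"location": {"lat": lat, "lng": lng}}},
    {"place_id": "near", "geometry": {"location": {"lat": lat + 0.0045, "lng": lng}}},  # about 500 m
    {"place_id": "far", "geometry": {"location": {"lat": lat + 0.018, "lng": lng}}},    # about 2 km
]
print(count_competitors(lat, lng, places, own_place_id="self"))  # 1
```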

The restaurant's information is saved in JSON format for easy access, and is later added to the data set by using the address of the restaurant where the sale was made to find the correct file and adding its contents to the saved information for each sale. When merged with the sales data, there are in total 9 additional variables related to the business taken from the Google Maps API, as can be seen in Table 3.5.

² https://pypi.org/project/googlemaps/

Table 3.5: Restaurant variables

| Variable | Description | Type |
|---|---|---|
| distance_to_city_center | Walking distance in meters from the restaurant's location to the geographical center of the city | Integer |
| hours_open | The number of hours the restaurant was open on the date of sale | Float |
| lat | Latitude value of the restaurant's location | Float |
| lng | Longitude value of the restaurant's location | Float |
| open_lunch | If the restaurant was open during lunch (11-13) on the date of sale | Boolean |
| open_dinner | If the restaurant was open during dinner (17-19) on the date of sale | Boolean |
| n_competitors | The number of restaurants with the same types within 1000 meters of the restaurant's location | Integer |
| n_ratings | The number of ratings received on Google Maps | Integer |
| rating | Average rating received on Google Maps | Float |

3.4.3 Product data

As with the business information, adding any information about specific products, for example their purchase cost, ingredients or sale price, is entirely optional for Onslip's customers, and this information was missing for almost every company. Information of this kind is also not publicly available, and collecting it would involve crawling each individual company's website, if it has one, hoping that every product is listed. It was therefore not feasible to add any additional product variables to the data sets.

3.4.4 Calendar data

The date from the receipt was used to add more variables related to the date of sale. The date is first converted to a datetime object in Python with the datetime module³. From this, several variables are created, such as the weekday, the day of the year, and the week of the year. Because it was believed that the weekday would have a significant impact, each weekday was also represented by an individual variable.

The datetime object is also used together with the holidays library⁴ to create additional variables representing whether the current or the following date is a holiday. As every Sunday is a holiday (or "red day") in Sweden, and the library methods reflect this behavior, the name of each holiday had to be checked, and any dates that were only tagged with 'Sunday' were not counted as holidays. There were 11 calendar variables added, shown in Table 3.6.
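The calendar variables can be sketched with the standard datetime module. To keep the sketch self-contained, holiday names are looked up through a function passed in as an argument; with the holidays library one could pass, for example, `holidays.Sweden().get`. The label used for ordinary Sundays differs between library versions and languages, so `SUNDAY_LABELS` is an assumption, as is the Monday = 0 encoding of `day_of_week`.

```python
from datetime import date, timedelta

WEEKDAYS = ["monday", "tuesday", "wednesday", "thursday", "friday", "saturday", "sunday"]
# Assumption: labels used for plain Sundays; these vary between library versions/languages
SUNDAY_LABELS = {"Sunday", "Söndag"}


def is_holiday(name):
    """True if a holiday name exists and is not just the Sunday tag ('Söndag, Pingstdagen' counts)."""
    if not name:
        return False
    labels = {part.strip() for part in name.split(",")}
    return bool(labels - SUNDAY_LABELS)


def calendar_features(day, holiday_name):
    """Calendar variables for one date of sale; holiday_name maps a date to a name or None."""
    features = {
        "day_of_week": day.weekday(),            # Monday = 0 ... Sunday = 6
        "day_of_year": day.timetuple().tm_yday,
        "week_of_year": day.isocalendar()[1],    # ISO week number
        "holiday_today": is_holiday(holiday_name(day)),
        "holiday_tomorrow": is_holiday(holiday_name(day + timedelta(days=1))),
    }
    for index, weekday in enumerate(WEEKDAYS):
        features[weekday] = day.weekday() == index  # one Boolean per weekday
    return features


# Illustrative holiday names for June 2019
names = {
    date(2019, 6, 2): "Söndag",
    date(2019, 6, 6): "Sveriges nationaldag",
    date(2019, 6, 9): "Söndag, Pingstdagen",
}
print(calendar_features(date(2019, 6, 5), names.get))
```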

3.5 The combined data sets

The weather, calendar, restaurant and sales history data were combined with the receipt data to create one large data set with all the examined variables. The sales history data was merged on the date of sale and either the product ID and price in the product data set, or the restaurant's address in the total sales data set. Weather data was added using the date of sale and the address of the restaurant, while calendar and restaurant data were merged on the date of sale and the restaurant address, respectively. Data for dates without corresponding dates in the receipt data was not included. If any external data could not be found for a company, regardless of the reason, the company was not included. The combined product sales data set contains 45 variables, and the total sales data set has 43. A list of all the variables is attached in the Appendix, in Table A.3 on page 41 and Table A.4 on page 42.

³ https://docs.python.org/2/library/datetime.html
⁴ https://pypi.org/project/holidays/

Table 3.6: Calendar variables

| Variable | Description | Type |
|---|---|---|
| day_of_week | The day of the week of the date of sale | Integer |
| day_of_year | The day of the year of the date of sale | Integer |
| friday | If the date of sale was a Friday | Boolean |
| holiday_today | If the date of sale was a holiday | Boolean |
| holiday_tomorrow | If the date after the date of sale was a holiday | Boolean |
| monday | If the date of sale was a Monday | Boolean |
| saturday | If the date of sale was a Saturday | Boolean |
| sunday | If the date of sale was a Sunday | Boolean |
| thursday | If the date of sale was a Thursday | Boolean |
| tuesday | If the date of sale was a Tuesday | Boolean |
| wednesday | If the date of sale was a Wednesday | Boolean |
| week_of_year | The week of the year of the date of sale | Integer |
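The merge described in Section 3.5 can be sketched with pandas left joins. Left joins keep only dates present in the receipt data, and companies with any missing external value are then dropped as a whole. The function name, the `(frame, keys)` pairs and the data are illustrative assumptions; the sales history merge would be one more pair keyed on date plus product ID and price (or address).

```python
import pandas as pd


def combine(sales, extras, company_col="store-address"):
    """Left-join each (frame, keys) pair onto the sales rows, then drop incomplete companies."""
    combined = sales
    for frame, keys in extras:
        # Left joins keep only dates that exist in the receipt data
        combined = combined.merge(frame, on=keys, how="left")
    # A company missing any external value is excluded entirely
    incomplete = combined.loc[combined.isna().any(axis=1), company_col].unique()
    return combined[~combined[company_col].isin(incomplete)].reset_index(drop=True)


days = pd.to_datetime(["2019-05-01", "2019-05-02"])
sales = pd.DataFrame({"store-address": ["A", "A", "B", "B"], "date": list(days) * 2,
                      "sum": [900.0, 1100.0, 400.0, 650.0]})
weather = pd.DataFrame({"store-address": ["A", "A", "B"], "date": [days[0], days[1], days[0]],
                        "temp": [12.0, 14.5, 11.0]})
calendar = pd.DataFrame({"date": days, "holiday_today": [True, False]})
restaurants = pd.DataFrame({"store-address": ["A", "B"], "rating": [4.3, 4.1]})

combined = combine(sales, [(weather, ["date", "store-address"]),
                           (calendar, "date"),
                           (restaurants, "store-address")])
print(combined)  # only store A remains; B has no weather for 2019-05-02
```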

3.6 Company selection

Not all the companies in the data provided by Onslip could be included in this study, mostly because of missing or incorrectly filled out fields in the information printed on the receipts. Some companies had names or addresses with spelling errors, and others had addresses that did not match the location of the restaurant. Because it would take a long time to inspect all transactions individually for spelling mistakes or to search manually for each restaurant's location, it was decided to discard these companies. There were also several fabricated companies that are used internally by Onslip for testing. Of the initial 374 companies, 116 did not have a legitimate address, and their positions could not be confirmed. This meant that there was nothing that could be used when fetching weather data from SMHI and restaurant data from Google Maps. Their transactions were therefore not included when training the models, and no forecasts were made for these companies.

After the data extraction and external data collection processes, 258 companies remained. To improve the predictions of the model trained on every company's data, these companies were analyzed by plotting their sales over time and categorized as shown in Table 3.7. Previous papers on this topic suggest that grouping data sources with similar sales patterns can improve the model's performance [22, 28].

Table 3.7: Company reduction

| Description | Companies |
|---|---|
| Missing data | 116 |
| Very few days | 22 |
| Only summer | 18 |
| Less than one year | 57 |
| Irregular breaks | 31 |
| Selected | 130 |
| Total | 374 |

Figure 3.3 shows examples of the different sales patterns that were identified. First, companies that had sales on 10 days or fewer were identified. Because they would not have enough data to find meaningful patterns, they were removed from the data sets. Then, it was discussed that companies that are open for just a few weeks during summer should be trained together in a separate data set, since they only had small amounts of data and some of the examined parameters, such as month, temperature, and holidays, did not show much variance. Companies that did not have one full year of sales were also not included, as it was believed that if a date had not occurred twice, it would be difficult to compare any seasonal impact of the examined parameters. Lastly, companies that had large gaps in their data at irregular intervals were filtered out. This did not include companies that were closed on the same days every year, for example during Christmas and New Year, or for a few weeks of summer vacation.

Figure 3.3: Examples of companies with different sales patterns: (a) continuous sales over multiple years, (b) fabricated data with over 2 000 items in a single day, (c) open only in the summer, (d) irregular breaks
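The selection rules above can be sketched as a small classifier over each company's days with sales. This is a minimal sketch, not the procedure actually used in this project (which relied on manually inspecting sales plots): the 10-day and one-year rules follow the text, while the break length (MAX_GAP_DAYS), the tolerance for treating a break as a yearly recurrence (SEASONAL_TOLERANCE) and all function names are hypothetical. Companies without a valid address ("Missing data") are assumed to have been removed beforehand.

```python
from datetime import date, timedelta
from typing import Iterable

# The 10-day and one-year rules come from the text; the two gap thresholds
# below are hypothetical values chosen only for illustration.
MIN_SALES_DAYS = 10
SUMMER_MONTHS = {6, 7, 8}
MAX_GAP_DAYS = 21        # assumption: breaks longer than this count as "large"
SEASONAL_TOLERANCE = 7   # assumption: breaks starting within a week of a break in another year recur


def find_gaps(days):
    """Return (first_closed_day, length) for each break longer than MAX_GAP_DAYS."""
    gaps = []
    for prev, curr in zip(days, days[1:]):
        length = (curr - prev).days - 1
        if length > MAX_GAP_DAYS:
            gaps.append((prev + timedelta(days=1), length))
    return gaps


def is_recurring(gap, gaps):
    """Treat a break as regular if a break in another year starts near the same day of year."""
    start = gap[0]
    for other_start, _ in gaps:
        if other_start.year != start.year:
            diff = abs(other_start.timetuple().tm_yday - start.timetuple().tm_yday)
            if min(diff, 365 - diff) <= SEASONAL_TOLERANCE:
                return True
    return False


def classify_company(sale_days: Iterable[date]) -> str:
    """Assign a company to one of the categories in Table 3.7 from its days with sales."""
    days = sorted(set(sale_days))
    if len(days) <= MIN_SALES_DAYS:
        return "very_few_days"
    if all(d.month in SUMMER_MONTHS for d in days):
        return "only_summer"
    if (days[-1] - days[0]).days < 365:
        return "less_than_one_year"
    gaps = find_gaps(days)
    if any(not is_recurring(g, gaps) for g in gaps):
        return "irregular_breaks"
    return "selected"
```

One limitation of such an automatic rule is that a company with only one summer in its history cannot show a recurring vacation break, so that break would be flagged as irregular; a manual review avoids this.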

Figure 3.4a shows the product sales quantities over time for all 374 companies that were initially extracted from the receipt data. Sales increase over the years and peak in the middle of each year. The increase is due to the growing number of Onslip's customers, and the peaks mostly come from restaurants that are only open during the summer. Figure 3.4b shows the same graph for the 130 companies remaining after the selection process. Compared with Figure 3.4a, the new data set has a much more even distribution, and the mid-year peaks have almost disappeared.

Figure 3.4: Total number of products sold per day from January 2015 to May 2019: (a) all 374 companies provided, (b) the selected 130 companies

At the end of the selection process, 130 companies had been identified that had mostly continuous sales over more than one year. Although only about 35% of the initial companies remained, they still accounted for almost 70% of the transactions, as shown in Table 3.8.


Table 3.8: Data set reduction

                All            Selected
Transactions    11 481 129     7 955 897
Items           34 163 817     17 659 400
Products        59 076         29 210
Amount          1 854 MSEK     1 125 MSEK
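As a quick check of the percentages quoted above, the retained share of each quantity in Table 3.8 can be computed directly. The snippet only restates the table's figures; the variable and function names are illustrative.

```python
# Figures copied from Table 3.8: (all companies, selected companies).
table_3_8 = {
    "Transactions": (11_481_129, 7_955_897),
    "Items": (34_163_817, 17_659_400),
    "Products": (59_076, 29_210),
    "Amount (MSEK)": (1_854, 1_125),
}
companies = (374, 130)


def retained_share(all_value, selected_value):
    """Percentage of the original quantity that remains after the selection."""
    return 100 * selected_value / all_value


print(f"Companies: {retained_share(*companies):.1f}%")
for name, (all_value, selected_value) in table_3_8.items():
    print(f"{name}: {retained_share(all_value, selected_value):.1f}%")
```

This gives roughly 35% of the companies and 69% of the transactions, but only about half of the items and products and about 61% of the sales amount.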

3.7 Model implementation

The combined data sets were split into training and test sets. The test set, which consisted of the last 28 days of the data set, was saved for evaluation, and the remaining days were used to train an XGB regression model. To use the regressor, all variables must be converted to integers, floating-point numbers or Boolean values. The name and address of each restaurant were therefore replaced with a unique ID. The prediction target was also separated from the other variables.
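A minimal sketch of the split and encoding steps, assuming each receipt-derived row is a Python dict with the hypothetical keys date, name and address plus a target column (here sum, as in the total sales set). The date is expanded into integer year, month and day columns so that every input is numeric, mirroring the day, month and year variables discussed later; this is an illustration, not the project's own code.

```python
from datetime import date, timedelta

TEST_DAYS = 28  # the last 28 days are held out for evaluation


def encode_ids(rows, key_fields=("name", "address")):
    """Replace each restaurant's name and address with a unique integer ID."""
    ids = {}
    encoded = []
    for row in rows:
        key = tuple(row[f] for f in key_fields)
        ids.setdefault(key, len(ids))  # an unseen restaurant gets the next free ID
        new_row = {k: v for k, v in row.items() if k not in key_fields}
        new_row["id"] = ids[key]
        encoded.append(new_row)
    return encoded


def split_last_days(rows, target="sum", test_days=TEST_DAYS):
    """Hold out the last `test_days` calendar days and separate the target."""
    last_day: date = max(r["date"] for r in rows)
    cutoff = last_day - timedelta(days=test_days)

    def to_xy(part):
        x, y = [], []
        for r in part:
            features = {k: v for k, v in r.items() if k not in (target, "date")}
            # XGB accepts only numeric or Boolean inputs, so the date becomes integers.
            features.update(year=r["date"].year, month=r["date"].month, day=r["date"].day)
            x.append(features)
            y.append(r[target])
        return x, y

    train = [r for r in rows if r["date"] <= cutoff]
    test = [r for r in rows if r["date"] > cutoff]
    return to_xy(train), to_xy(test)
```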

The regressor is initialized with parameters that decide how the training is performed. In this implementation, the maximum number of trees, the maximum depth of a tree, and the learning rate were changed from their default values. The values used instead were found by training several models with different parameter settings and comparing their performance; the values that occurred most often among the top-performing models were selected. Finding the best parameter settings took more than one day per company. Because of the large number of companies, it would have been too time-consuming to find the optimal parameters for every company, so the same parameter settings were used to train every model.
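The selection rule described above (keep, for each parameter, the value that occurs most often among the best models) can be sketched as follows. The search results are assumed to be (parameters, validation MAE) pairs; the number of top models considered (top_k) is not stated in the text and is a hypothetical choice.

```python
from collections import Counter


def most_common_settings(results, top_k=5):
    """For each parameter, return the value seen most often among the top_k models.

    `results` holds (params_dict, validation_mae) pairs; lower MAE is better.
    Ties between equally common values are resolved by first occurrence.
    """
    best = sorted(results, key=lambda item: item[1])[:top_k]
    chosen = {}
    for name in best[0][0]:
        counts = Counter(params[name] for params, _ in best)
        chosen[name] = counts.most_common(1)[0][0]
    return chosen
```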

To avoid overfitting, the early-stopping functionality of XGB was used. This lets the model evaluate its progress after each new tree is added and stop the training if there is no improvement. Early stopping requires a validation set, which is used to measure the accuracy of the current model; 10% of the training set was randomly chosen and set aside for this purpose. The training was stopped if the validation MAE did not improve for 100 consecutive iterations.

Once a model has been created, it can be saved to a file and later loaded to continue training or to make predictions. An extract of the code used to implement the model can be seen in Listing 3.1.

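import xgboost as xgb
from sklearn.model_selection import train_test_split

# x: input variables, y: prediction target (quantity or total sales amount).
# A random 10% of the training data is held out for early stopping.
x_train, x_val, y_train, y_val = train_test_split(x, y, test_size=0.1)

model = xgb.XGBRegressor(
    max_depth=3,
    learning_rate=0.01,
    n_estimators=6000,
    booster='gbtree',
    objective='reg:squarederror',
)

# Stop when the validation MAE has not improved for 100 rounds.
# In XGBoost 1.6 and later, early_stopping_rounds and eval_metric are
# passed to the XGBRegressor constructor instead of fit().
model.fit(
    X=x_train,
    y=y_train,
    early_stopping_rounds=100,
    eval_metric=['mae'],
    eval_set=[(x_val, y_val)],
)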

Listing 3.1: Code used to train the model


3.8 Evaluation

To evaluate the performance of a model, its sales forecast was compared to the actual sales values extracted from the receipt data. The model's accuracy was measured using the mean absolute error (MAE), which was chosen because it is easy to interpret and to explain to a person without knowledge of statistical analysis methods. Since the different forecasts are only compared against each other to measure their relative performance, the main concern was to obtain the results in a form that would allow good visualization for Onslip and its customers.

For each product or restaurant, the latest 28 days were withheld from the model during training and used as a test data set; 28 days was considered a good time frame for visualizing the forecast for Onslip and its customers. When the combined data sets were used to evaluate the model for specific companies, only the last 28 days of sales for that company were set aside. Before producing any forecasts, the prediction target column (quantity in the product set and sum in the total sales set) was removed from the test data. The model was then given the reduced data points and made an individual prediction of how much would be sold on each day with the given parameters. No predictions were made for test-set days without any sales, since a restaurant would not try to forecast its sales on a day when it is closed. The forecast was then compared to the real values, and the MAE for the whole test set was calculated.
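The evaluation metric can be sketched as a plain MAE over the test days that skips days without sales. This assumes the test period is given as two aligned lists of actual and predicted values, with days without sales appearing as 0 in the actual values; in the receipt data such days may simply be absent instead.

```python
def mae_on_open_days(actual, predicted):
    """Mean absolute error over the test days, skipping days without any sales."""
    pairs = [(a, p) for a, p in zip(actual, predicted) if a > 0]
    if not pairs:
        raise ValueError("no days with sales in the test period")
    return sum(abs(a - p) for a, p in pairs) / len(pairs)
```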

One initial evaluation was performed to determine whether it was better to train a model on all the available data or only on the data generated by each company. Because comparing every company with both data sets would take a long time, only the total sales set was used. First, for every company, an XGB regression model was trained on the combined data set, excluding that company's last 28 days of sales. The model then forecast the company's total sales amounts for those days, and the predictions were saved and evaluated. Then, also for every company, an additional regressor was trained using only that company's data, again excluding the last 28 days. New forecasts were made for the total sales amounts, which were also saved and evaluated. The two forecasts were then compared, and the relative difference in MAE was calculated for every company. Lastly, the mean difference over all companies was calculated.
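The comparison can be expressed as a relative difference in MAE per company, averaged over all companies. The text does not state the exact formula; this sketch assumes the difference is measured relative to the reference model's MAE, which reproduces the Improvement column of Table 5.3 (for company 12, (3217.45 - 2320.26) / 3217.45 ≈ 27.89%). The function names and input layout are hypothetical.

```python
def relative_improvement(reference_mae, new_mae):
    """Percentage reduction in MAE relative to the reference model (positive = better)."""
    return 100 * (reference_mae - new_mae) / reference_mae


def compare_training(maes):
    """`maes` maps company -> (collective_mae, individual_mae).

    Returns (companies preferring individual training, companies preferring
    collective training, mean improvement of individual over collective in %).
    Ties are counted as preferring collective training.
    """
    diffs = [relative_improvement(col, ind) for col, ind in maes.values()]
    prefer_individual = sum(1 for d in diffs if d > 0)
    return prefer_individual, len(diffs) - prefer_individual, sum(diffs) / len(diffs)
```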

To evaluate the impact of the different parameters on the predictions, a procedure similar to the one just described was used to train several regression models. Both data sets were used in this evaluation. For each company, and for both the product and total sales data sets, a model was first trained without any additional variables. The only input parameters available to these models were the dates, the restaurant's ID, and either the product ID, price and quantity, or the total sales amount. The predictions made by these first models were saved and their MAE was calculated, so that it could serve as a baseline for later models that were given more variables.

Because there were too many variables to examine them all individually⁵, the parameters were evaluated in the groups described earlier in this chapter. One data set was created for each of the examined parameter groups, and one with all the groups together. One model was trained per data set. Every model then made predictions for the same days of the test data, using only the input variables from its corresponding parameter group. The predictions were saved and evaluated, and their accuracy was compared against the baseline model to determine which parameters contribute most to a better sales forecast. For each company, the forecasts using more information were compared against the forecast made with only receipt information, and then the mean difference in MAE over all included companies was calculated.
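A sketch of how the baseline input set, one set per parameter group and an "all" set could be assembled. The column names are hypothetical: variables such as 1_day_ago, hours_open, day_of_week and temperature appear in this thesis, but the lists below are not the full set of 42 variables, and the baseline columns are those of the total sales set (date parts and restaurant ID).

```python
# Hypothetical, abbreviated column lists for the total sales data set.
BASE = ["year", "month", "day", "id"]
GROUPS = {
    "sales_history": ["1_day_ago", "7_days_ago", "364_days_ago"],
    "restaurant": ["hours_open", "open_lunch", "distance_to_city_center"],
    "calendar": ["day_of_week", "day_of_year", "week_of_year", "holiday_today"],
    "weather": ["temperature", "wind_speed", "rain"],
}


def feature_sets(base=BASE, groups=GROUPS):
    """One input set for the baseline, one per parameter group, and one with everything."""
    sets = {"baseline": list(base)}
    for name, columns in groups.items():
        sets[name] = list(base) + columns
    sets["all"] = list(base) + [c for columns in groups.values() for c in columns]
    return sets


def select_columns(rows, columns):
    """Keep only the chosen input columns in each row (rows are dicts)."""
    return [{c: row[c] for c in columns} for row in rows]
```

Each set would then be used to train its own model, and every model would be evaluated on the same 28 test days.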

⁵With 42 input variables there are 2⁴² ≈ 4.4 × 10¹² different combinations.


4 Results

4.1 Model training

Table 4.1 shows the results of comparing individually and collectively trained models for all the examined companies. For the majority of the companies, the model did not perform better when provided with all the available data; instead, the forecasts had lower MAE when the training was done with only that company's sales data. 75 of the 130 companies had better prediction accuracy when training on their own sales data, and 55 companies showed better results when training on all the data.

For the companies where the model performed better with individual training, the prediction accuracy was on average 14.37% worse when using all the available data, while the MAE improved by 14.22% on average for the companies that preferred collective training. Over all the examined companies, individual training showed an average improvement of 2.17%. The full comparisons between individual and collective training for all companies can be found in Tables A.5-A.7 starting on page 43.

Because more companies benefited from individual training, this method was used for the remainder of the project.

Table 4.1: Comparison of individual and collective training

Preferred method    Companies    Average difference
Individual          75           14.37%
Collective          55           14.22%

4.2 Forecast accuracy

To study how additional information affects the accuracy of a sales forecast, each category was examined separately by comparing the predictions of a model trained with the variables from that category as input parameters to those of the baseline model. The results of these comparisons are summarized in Table 4.2 for all the examined parameter groups and companies. The full comparisons can be found in Tables A.8-A.10 beginning on page 46, and Tables A.11-A.13 beginning on page 49.


Table 4.2: Average improvement of model sales forecast for all selected companies

           Sales history    Restaurant    Calendar    Weather    All
Product    4.67%            2.03%         5.55%       0.31%      6.88%
Total      12.74%           19.07%        21.14%      -0.56%     26.62%

The table shows that, on average, adding extra information has a substantial positive effect on the predictions of the examined algorithm. When given all the additional variables, the model's forecast accuracy increased by 6.88% on average for the product sales data set. Forecasts for the total sales data set show an even larger improvement, being on average 26.62% more accurate.

The calendar category has the largest impact for both data sets, followed by the sales history variables in the product sales data set and the restaurant variables in the total sales data set. The weather factors contribute the least, and even have a slightly negative effect on the accuracy of the predictions for the total sales data set.

4.3 Feature importance

A variable's feature importance (FI) is the proportion of all splits in the model's trees that were made on that variable. These values can be retrieved from the XGB model and were collected after the training had been completed for each company. The average importance of each variable over all companies' models is shown in Figure 4.1.
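Feature importance in this sense (share of splits) can be computed from per-feature split counts and then averaged over the companies' models. In the XGBoost Python package, such counts are available from Booster.get_score(importance_type="weight"), which omits features that were never used for a split; the sketch therefore treats missing features as 0. That the thesis used exactly this importance type is an assumption based on the definition above.

```python
def split_proportions(split_counts):
    """Turn per-feature split counts into proportions that sum to 1."""
    total = sum(split_counts.values())
    return {name: count / total for name, count in split_counts.items()}


def average_importance(per_company):
    """Average each feature's proportion over all companies; unused features count as 0."""
    features = {f for props in per_company for f in props}
    n = len(per_company)
    return {f: sum(props.get(f, 0.0) for props in per_company) / n for f in features}
```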

The figure shows that the sales history variables are the most significant, with 8 and 7 of the top 10 most important variables in the product and total sales data sets respectively. Even the sales history variables with lower weights are still in the top half overall. This category performs especially well for the product sales data set.

All of the weather factors are in the middle for both data sets, with temperature slightly above the other two.

The restaurant variables are the weakest category, although hours_open is a notable exception, with the second-highest importance in the total sales data set and a significant share of the splits in the product data set.

Many of the calendar variables show high feature importance, especially in the total sales data set, where day_of_week is ranked first. Among the individual weekday variables, only friday and saturday seem to be important.

The figure also shows that, in the product sales data set, most of the splits are performed on a few variables, while the feature importances for the total sales data set are more spread out.

The Pearson correlation coefficient (PCC) measures how linearly correlated two variables are. It ranges from -1 to +1, where a value close to ±1 means that the variables are strongly correlated, while a value near 0 means that there is little or no linear correlation. Under the assumption that the gradient boosting algorithm mainly picks up linear relationships in the data, it was expected that the feature importance of each input parameter would resemble its correlation coefficient with the prediction target.

Table 4.3 shows the five most over- and under-performing variables in the product sales data set, and Table 4.4 shows the corresponding variables for the total sales data set. The tables include each variable's feature importance and correlation coefficient, and its performance is measured as the difference between its rank by PCC (its expected significance) and its rank by FI (its actual significance). Feature importances and Pearson correlation coefficients for all examined variables can be found in Table A.14 on page 52 and Table A.15 on page 53.
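The comparison in Tables 4.3 and 4.4 can be sketched with a plain Pearson coefficient and two rankings. The Difference column appears to be the PCC rank minus the FI rank (for example holiday_today: 38 - 23 = +15), and the listed PCC ranks are consistent with ranking by absolute correlation; both points are inferred from the tables rather than stated in the text. Ties, such as several variables with FI 0.0000, are broken by input order here, which may differ from the thesis.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equally long, non-constant sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def rank(values, key):
    """1-based rank of each name when sorted by key(value) in descending order."""
    ordered = sorted(values, key=lambda name: key(values[name]), reverse=True)
    return {name: i + 1 for i, name in enumerate(ordered)}


def rank_difference(fi, pcc):
    """Positive result: the model uses the variable more than its |PCC| suggests."""
    fi_rank = rank(fi, key=lambda v: v)
    pcc_rank = rank(pcc, key=abs)
    return {name: pcc_rank[name] - fi_rank[name] for name in fi}
```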

The same four calendar variables are in the top five of both data sets: day_of_year, week_of_year, holiday_today and holiday_tomorrow. A similar pattern can be seen in the bottom five, where the same three weekday variables appear in both data sets: monday, tuesday and wednesday.

Figure 4.1: Average feature importance for the researched variables



Table 4.3: The 5 most over- and under-performing variables in the product sales data set

Variable            FI        FI rank    PCC        PCC rank    Difference
holiday_today       0.0072    23         0.0018     38          +15
day_of_year         0.0106    16         0.0083     27          +11
week_of_year        0.0089    18         0.0082     29          +11
id                  0.0531    5          -0.0893    15          +10
holiday_tomorrow    0.0038    31         0.0009     41          +10
...
wednesday           0.0032    34         -0.0127    24          -10
3_days_ago          0.0073    22         0.4889     11          -11
364_days_ago        0.0064    25         0.2644     14          -11
tuesday             0.0032    33         -0.0189    22          -11
monday              0.0000    39         -0.0195    21          -18

Table 4.4: The 5 most over- and under-performing variables in the total sales data set

Variable                   FI        FI rank    PCC        PCC rank    Difference
4_days_ago                 0.0151    22         -0.0045    42          +20
day_of_year                0.0250    12         0.0376     29          +17
week_of_year               0.0200    15         0.0358     31          +16
holiday_today              0.0150    23         -0.0128    38          +15
holiday_tomorrow           0.0115    27         0.0110     39          +12
...
open_lunch                 0.0046    33         0.0900     19          -14
wednesday                  0.0035    34         -0.0772    20          -14
tuesday                    0.0033    35         -0.1350    17          -18
distance_to_city_center    0.0000    39         0.0742     21          -18
monday                     0.0000    37         -0.1603    15          -21


5 Discussion

5.1 Model training

When the project began, the hope was to use all the available data to train a single model that could make forecasts for all the companies using Onslip's services. It was believed that more information would give a better result. A common model would also let Onslip offer sales forecasts to businesses that are not currently its customers, based on the data it already has. However, the results showed that individual training was the preferred method for most companies. This is also the method most used in earlier research, with only one other study performing predictions for a large number of restaurants [27].

The low accuracy of the collectively trained models was the initial reason for the company selection process. At first, using clustering algorithms to find companies and products with similar sales was discussed, but it was decided that this fell outside the scope of this project. The inspiration for trying to group companies according to their sales patterns came from an article by Islek and Ögüdücü [22], which grouped warehouses rather than companies, and from comments by Holmberg and Halldén on future extensions of their work [28]. Instead, an idea taken from an article by Žliobaite et al. [21], who excluded products with sales patterns classified as random, led to the manual selection process that was eventually used. When it was later discovered that individual training still showed better results, the companies without regular sales patterns had already been removed from the data set. The choice was made to continue focusing on improving forecasts for the selected companies rather than trying to incorporate more data.

One reason for the poor performance of collective training could be that the companies are simply too different to be used together. If the included companies had even more similar sales patterns, or perhaps belonged to the same franchise, the result might have been different. Previous research has combined the data of multiple restaurants in the same location with some success [26], but did not achieve the same results with restaurants in different locations [28]. This suggests that the restaurants' sales need to be somewhat similar for their data to be combined. Because of the large number of companies used in this study, the data includes many different types of restaurants that are affected in different ways by the same variations in the examined input variables. This could have created too much noise in the data for the model to pick up every combination of values that affects each company. This assumption is supported by the feature importances taken from the model that was trained with all the companies' available data, shown in Figure 5.1. As can be seen in this figure, almost all of the split decisions relied on the sales history variables, which indicates that the model did not find any effects of holidays or weather factors. While a company's sales figures are fairly accurate representations of its business, this was not the behaviour desired of the model, and it resembled manual forecast methods too closely.

Figure 5.1: Feature importance for the collectively trained model

Individual training did not provide a distinctly better result overall, and a significant share of the companies performed better with the collective model. It would have been interesting to spend more time finding out whether these companies had any common characteristics, but as there were other problems with using data combined from different companies, it was ultimately decided to continue with the individual training method.

To be able to use collective training at all, much of the responsibility was put on the restaurant variables. Their impact on the results of the different training methods is discussed further in Section 5.2.1.

The collectively trained model was only evaluated on the total sales data set. This decision was made partly because of time constraints: it took about a week to train and evaluate the collective models for total sales, and the product data set was several times larger. As there had already been issues when training with multiple companies, it was considered unlikely that the results would improve when combining all of their products.

5.2 Result

Additional information did not have a positive impact on the forecasts for all companies. Tables 5.1 and 5.2 summarize how the models' predictions were affected by each category.


The tables show that none of the examined categories improved the forecast accuracy for every company. The categories that most often had a positive impact on the forecasts were the sales history variables for the product sales data set and the restaurant variables for the total sales data set. With so many different companies, it is probably not possible to create input variables that would always increase the model's performance.

These results correspond well to the average improvements shown in Table 4.2, except for the restaurant category's impact in the product sales data set. Adding these variables had a positive impact on almost as many companies as the sales history and calendar categories, but improved the predictions by less than half as much on average.

Table 5.1: Effects of adding information on product sales forecasts (number of companies)

            Sales history    Restaurant    Calendar    Weather    All
Positive    106              98            105         67         111
Negative    24               32            25          63         19

Table 5.2: Effects of adding information on total sales forecasts (number of companies)

            Sales history    Restaurant    Calendar    Weather    All
Positive    100              114           112         72         119
Negative    30               16            18          58         11

The companies where the model made the least accurate predictions, and where additional information only made the predictions worse, further support the decision to exclude companies with irregular sales patterns. Figure 5.2 shows the two companies with the largest decrease in prediction accuracy between the default model and the model with all input variables. Both companies have irregular gaps of different sizes, and some days with much higher sales than their normal levels that are not explained by the variables examined in this project. Even though they are not as obvious as the examples given earlier, they would probably not have passed a more thorough selection process. Of the two companies, one is a food truck that moves around, and the other is a cafe belonging to a theater. Their sales depend on factors outside the scope of this project, such as the events happening where the food truck is parked for the day, or the popularity of a particular show. Although the low prediction accuracy for these types of companies is unfortunate, they may also benefit less from forecasting their own sales than from forecasting the number of visitors to related events.

Figure 5.2: Product sales for the two companies whose forecasts were most negatively affected by adding more information, with 156.6% and 69.71% decreases in prediction accuracy compared to using only receipt information

One issue with only including specific companies is that Onslip would, of course, like to offer its services to any company without putting requirements on its business hours or the behaviour of its customers. One suggestion is to provide a forecast solution with custom variables, which can be added based on the company's own knowledge. A company can then represent factors that uniquely affect its sales and hopefully improve the accuracy of the model. Another option is to use an algorithm that does not rely as heavily on finding linear relationships.

5.2.1 Restaurant variables

The variables in the restaurant category were added in the belief that if the model was given enough variables to distinguish different restaurants or companies from each other, all the available data could be combined and only one collective model would need to be trained. This proved not to be the case, as shown in Section 4.1 and discussed earlier in this chapter; the individually trained models performed better for more companies.

When the models only processed data generated by one company at a time, many of the restaurant-specific variables obtained from the Google Maps API became constants. For instance, for any company that did not have multiple restaurants, every restaurant-related variable except the three representing the restaurant's business hours had a single value throughout the entire data set. Because these variables do not vary with the sales at all, the model never uses them to split the data set. This is reflected in the feature importances presented in Section 4.3, where the restaurant variables make up 8 of the bottom 10 variables in the product data set, and 6 of the bottom 10 in the total sales data set.
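Checking a per-company data set for such constant columns is straightforward. The sketch assumes rows are dicts with hashable values and hypothetical column names; since a tree cannot split on a column with a single value, dropping these columns should not change what the model can learn, but it makes explicit which restaurant variables carry no information for a given company.

```python
def constant_columns(rows):
    """Names of columns that take a single value in every row of a data set."""
    if not rows:
        return []
    return [name for name in rows[0] if len({row[name] for row in rows}) == 1]


def drop_columns(rows, names):
    """Return copies of the rows without the given columns."""
    drop = set(names)
    return [{k: v for k, v in row.items() if k not in drop} for row in rows]
```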

Possible reasons for the restaurant category's disappointing results could be that the 9 selected restaurant variables were not enough to separate the different restaurants from each other, or that the variables were too similar to capture different aspects of what makes a restaurant unique. For replicability reasons, the restaurant-specific information used to create variables was limited to what could be found in publicly available sources. This also has the added benefit that an eventual commercial product would not require licenses for its users, or customizations for different information sources for every customer. Another source of error is that the information gathered from Google Maps is updated regularly and may not reflect the actual values at the time of sale.

The contribution from this category could hopefully be improved by adding information that more accurately represents what affects the sales of a restaurant, for example the number of employees, the number of available seats and the floor area of the sales space, as used in previous research by Islek and Ögüdücü [22] and discussed by Liu and Ichise [23]. Onslip does not currently have this type of knowledge about its customers, but it has been discussed as a potential requirement for businesses that want to use an eventual forecast product in the future.

It would also have been interesting to include more variables related to the location of the restaurant, for example the city's population and area. The idea behind the distance_to_city_center variable was to capture variations within the same city, but restaurants in Stockholm are expected to sell more on average than those in a smaller Swedish city, and this potential pattern was not represented by any variable in this project.

Another possibility is that the large number of variables added too much noise for the model. Lowering the number of input parameters could enable the model to put more importance on the restaurant's ID and distinguish the restaurants more easily. In a study by Xinliang and Dandan [26], only an ID variable was included to separate restaurants, and the model made more than half of its decisions on this variable. However, their data came from four restaurants, compared to our 130. The relative success of the product ID variable in the product data set, which did not include any other product-related variables due to lack of available information, adds further support for using fewer variables.


5.2.2 Calendar variables

The decision to add a variable for each specific weekday had a very mixed outcome. While the saturday and friday variables slightly outperformed their expected importance, the other five showed no real importance to the model. Figure 5.3 explains this: Fridays and Saturdays have significantly higher average sales than the other weekdays, whose values are more similar to each other.

Figure 5.3: Average sales per weekday

Additionally, the original day_of_week variable had the highest feature importance in the total sales data set, ranking 10 positions higher than expected, and it ranked 8 places higher than expected in the product sales data set, at rank 9. This result indicates that it may have been enough to represent the weekday with one single variable. A simple solution would be for a company to look at the specific weekdays when its sales increase and add variables for those days.
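The suggestion in the last sentence could be sketched as follows: find the weekdays whose average sales clearly exceed the overall daily average and create one Boolean variable per such weekday. The 15% threshold and the weekday_N naming are hypothetical choices, not values from this study.

```python
import datetime
from statistics import mean


def high_sales_weekdays(sales_by_date, threshold=1.15):
    """Weekdays (0 = Monday ... 6 = Sunday) whose average sales exceed
    `threshold` times the overall daily average; keys are datetime.date."""
    by_weekday = {}
    for day, amount in sales_by_date.items():
        by_weekday.setdefault(day.weekday(), []).append(amount)
    overall = mean(list(sales_by_date.values()))
    return sorted(wd for wd, amounts in by_weekday.items() if mean(amounts) > threshold * overall)


def weekday_flags(day: datetime.date, flagged_weekdays):
    """One Boolean input variable per selected weekday."""
    return {f"weekday_{wd}": day.weekday() == wd for wd in flagged_weekdays}
```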

Both holiday variables had a much bigger impact than their correlation coefficients suggested, and ranked 10-15 places higher than expected in the feature importance table for both data sets. However, despite over-performing, they were still only in the middle of the table and did not show the significance that was hoped for. One possible explanation is that many of the holidays are celebrated differently, and therefore also affect the sales pattern of a restaurant differently. For example, few people in Sweden go to a restaurant on Christmas Eve, while New Year's Day is one of the days when Swedes eat out the most. This is illustrated in Figure 5.4, which shows the average sales amount on each Swedish national holiday.

As of 2019, there are only 16 Swedish national holidays, which would make it possible to represent each one with its own variable, although it is not certain that this would improve the result. This design was used by Žliobaite et al. [21], who found that one of the examined holidays had a much bigger effect on sales than the others. There are also other yearly recurring days that may affect sales more than a holiday. As with the weekday variables, the best solution seems to be for each company to examine on which days of the year its sales deviate from normal levels. These days can then be represented in the model, perhaps grouped together in a few variables for days with similar impacts.
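A possible sketch of the grouping idea: compare the average sales on each named special day with the average on ordinary days, then map the days into a few groups that become Boolean input variables. The cut-offs (0.8 and 1.2 of normal sales), the group names and the input layout are hypothetical.

```python
import datetime
from statistics import mean
from typing import Dict, List


def group_special_days(sales_by_date: Dict[datetime.date, float],
                       special_days: Dict[str, List[datetime.date]],
                       low=0.8, high=1.2):
    """Map each named special day to "low", "normal" or "high" by comparing its
    average sales with the average on all other days."""
    special_dates = {d for dates in special_days.values() for d in dates}
    normal = mean([v for d, v in sales_by_date.items() if d not in special_dates])
    groups = {}
    for name, dates in special_days.items():
        observed = [sales_by_date[d] for d in dates if d in sales_by_date]
        if not observed:
            continue  # no sales recorded on this day, e.g. the restaurant was closed
        ratio = mean(observed) / normal
        groups[name] = "low" if ratio < low else "high" if ratio > high else "normal"
    return groups


def special_day_features(day, special_days, groups):
    """Boolean inputs marking whether `day` belongs to a low- or high-sales group."""
    features = {"special_low": False, "special_high": False}
    for name, dates in special_days.items():
        if day in dates and groups.get(name) in ("low", "high"):
            features[f"special_{groups[name]}"] = True
    return features
```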


Figure 5.4: Average sales on Swedish national holidays

Although the day, month and year variables do not explicitly belong to this category, they are discussed here as it seems most appropriate. All of these variables performed about as expected, and on average slightly better than their correlation coefficients suggested. There are known monthly and yearly variations in sales. For example, salaries and pensions are paid out around the end of the month, which usually leads to an increase in sales, and people go on vacation during the summer, which affects tourist cities and working cities differently. It was expected that some of this behaviour would be reflected in these variables, but the results showed that they had relatively low importance to the decisions of the model.

However, day_of_year and week_of_year were among the 5 most over-performing variables in both data sets. This suggests that the model discovered at least some seasonal variation in the sales data, but that these patterns were only picked up by a few of the variables. It is also likely that the importance of many of the calendar variables was overshadowed by the sales history category, which was also added to capture sales that vary over time.

5.2.3 Sales history variables

Based on the results of previous research, and because sales history is the basis for most manual forecast methods, these variables were expected to have a big impact on the predictions of the models. This can be seen in the feature importances shown in Figure 4.1, where 8 and 7 of the top 10 most important variables in the product and total sales data sets respectively are from the sales history category. The variables with sales figures from 2-6 days earlier, which fall on other weekdays than the forecast day, performed slightly worse than their PCC suggested, but in line with what was believed when the project started, as there are no known relations between sales on different weekdays.

One potential problem is the high importance of the 1_day_ago variable, which could indicate that the model's predictions miss the factors that cause the first day of rising or falling sales. If a company is not prepared at the start of a period of increased sales, it can be hard to catch up, and it may lose even more sales opportunities in the following days.

The 364_days_ago variable did not perform as well as expected. Many manual forecasts are made by simply looking at how much was sold on the corresponding day last year, so it was assumed that this variable would be more important than it actually was. This was most likely affected by the implementation used to create the sales history category. Because an X_days_ago variable needs that many previous days in the data set, the first 364 days did not have real values for 364_days_ago, and the missing values were replaced with the first occurring value. This creates almost a year of sales where the variable does not change. A more suitable solution could have been to replace the missing values with zeroes instead, or not to replace them at all and let XGBoost handle them with its default directions for missing values.
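The alternative handling suggested above can be sketched by building the X_days_ago variables with an explicit fill value. With fill=math.nan, the XGBoost Python package treats the value as missing (NaN is its default missing marker) and learns a default direction at each split; fill=0.0 gives the zero-filling variant. Only three lags are shown, and the sketch also treats days absent from the data (for example closed days) as missing, which is an assumption.

```python
import math
from datetime import date, timedelta
from typing import Dict

LAGS = (1, 7, 364)  # a subset of the X_days_ago variables


def lag_name(lag):
    """Variable names in the style used in this thesis, e.g. 1_day_ago, 7_days_ago."""
    return "1_day_ago" if lag == 1 else f"{lag}_days_ago"


def lag_features(sales_by_date: Dict[date, float], lags=LAGS, fill=math.nan):
    """Build X_days_ago inputs for every date; missing history becomes `fill`."""
    rows = {}
    for day in sorted(sales_by_date):
        rows[day] = {
            lag_name(lag): sales_by_date.get(day - timedelta(days=lag), fill)
            for lag in lags
        }
    return rows
```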

The sales history variables were much more important for the product data set than for total sales. This could be for similar reasons as why individual training gave better results: because the data contains products that relate differently to the input variables, they are more accurately represented by their previous sales figures than by any of the other factors.

The results obtained for the sales history variables are in line with previous research. Only one other study was found that used data from as many companies [27]. Its results showed that mean and maximum visitors were two of the most important variables. Because that study's prediction target was the number of visitors, these variables correspond to the sales history of the restaurant.

5.2.4 Weather variables

In the early stage of the project, it was expected that adding weather factors would have a bigger impact than the results eventually showed. As mentioned in the related work, two articles presented similar research in which weather factors had the least impact of the examined variables [26, 28]. One of these articles differed from this research in the number of weather variables. It only used SMHI’s historical weather data and could therefore use 7 variables. In this research, the weather factors were limited to only 3 variables, because SMHI’s forecast API did not provide the same type of data as its historical weather data. The other article used the same type of weather variables as this study, but used a back-propagation neural network instead of XGBoost. Another major difference is that the two studies analyzed only 3 and 4 restaurants, far fewer than this study.

Table 5.3 shows the 5 companies with the largest improvements and the 5 with the largest deteriorations in prediction accuracy after adding the weather variables. The table shows that weather factors can have a significant effect on the sales of individual companies. However, our results also showed that not all companies are affected by the weather, and that the same weather factor has different effects on different companies. The most important weather factor was temperature, followed by wind speed and rain.

Table 5.3: The 5 most and least improved total sales predictions with the addition of weather information

Company  Original   Weather    Improvement
12       3217.45    2320.26         27.89%
90       2291.10    1782.71         22.19%
44       12215.89   9640.02         21.09%
117      2835.36    2300.38         18.87%
85       4909.91    3992.60         18.68%
...      ...        ...                ...
6        3339.42    4321.65        -29.41%
84       987.22     1364.12        -38.18%
17       973.59     1356.42        -39.32%
20       8817.64    13269.31       -50.49%
71       3145.77    5725.31        -82.00%

As mentioned in Section 4.2, adding weather factors had a negative average impact on the total sales data set. However, Table 5.2 in Section 5.2 shows that 72 companies were positively affected, compared to 58 that were negatively affected, so the majority of companies still improved. Table 5.3 explains the negative average. The companies that were affected most negatively changed by more than the companies with the most positive effects, which pulled the average down. When the individual companies were analyzed to understand why the model performed poorly, it turned out that the 5 worst were smaller cafes, while the companies that improved the most were fast food restaurants.
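The asymmetry can be checked with the numbers in Table 5.3. The Improvement column matches the relative reduction from the Original to the Weather value. The sketch below therefore treats both columns as error values where lower is better (presumably MAE, as in Table A.5). This kind of percentage can never exceed 100%, but a deterioration has no lower bound, so a few large deteriorations can outweigh several moderate gains. The function names are hypothetical, and only the ten listed companies are used, not all 130.

```python
def improvement(original_mae, new_mae):
    """Percentage reduction in error; positive means the new model is better.

    The value can never exceed 100% (a perfect model), but it has no lower bound.
    """
    return 100.0 * (original_mae - new_mae) / original_mae


def summarize(pairs):
    """Return (number improved, number worsened, mean improvement in %)."""
    values = [improvement(original, new) for original, new in pairs]
    improved = sum(1 for v in values if v > 0)
    worsened = sum(1 for v in values if v < 0)
    return improved, worsened, sum(values) / len(values)


# (Original, Weather) pairs for the ten companies listed in Table 5.3.
table_5_3 = [
    (3217.45, 2320.26), (2291.10, 1782.71), (12215.89, 9640.02),
    (2835.36, 2300.38), (4909.91, 3992.60),
    (3339.42, 4321.65), (987.22, 1364.12), (973.59, 1356.42),
    (8817.64, 13269.31), (3145.77, 5725.31),
]
print(round(improvement(3217.45, 2320.26), 2))  # 27.89
improved, worsened, mean = summarize(table_5_3)
print(improved, worsened, round(mean, 2))
```

Half of these companies improve, yet their mean is negative. This matches the explanation above.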

One source of error in the weather information is the implementation of the search for historical data, which could have been better optimized. The search required a single weather station with data for the entire time period. Instead, any partial period found at a station could have been saved, and the remaining dates searched for among the other weather stations. This would have provided more accurate weather data for the restaurant’s location.
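The following sketches the proposed search. It assumes that the distance from the restaurant to each station has already been computed, and that each station's readings are available as a mapping from date to value. The data layout and the function name are hypothetical. Stations are visited from nearest to farthest, and each date is taken from the nearest station that has a value for it, instead of requiring one station to cover the whole period.

```python
def fill_from_stations(dates, stations):
    """Take each date's value from the nearest station that has a reading for it.

    dates:    list of dates (any hashable labels) that need a value
    stations: list of (distance_km, {date: value}) tuples, in any order
    Returns {date: (value, distance_km)}; dates that no station covers are left out.
    """
    result = {}
    # Visit stations from nearest to farthest; sorting on the distance only avoids comparing dicts.
    for distance, readings in sorted(stations, key=lambda station: station[0]):
        for date in dates:
            if date not in result and date in readings:
                result[date] = (readings[date], distance)
        if len(result) == len(set(dates)):
            break  # every date is covered, so farther stations are not needed
    return result


# Hypothetical example: the nearest station lacks "d3", which is taken from a farther one.
near = (2.0, {"d1": 1.5, "d2": 2.0})
far = (15.0, {"d1": 9.9, "d2": 9.9, "d3": 3.0})
filled = fill_from_stations(["d1", "d2", "d3"], [far, near])
print(filled)  # {'d1': (1.5, 2.0), 'd2': (2.0, 2.0), 'd3': (3.0, 15.0)}
print(max(distance for _, distance in filled.values()))  # 15.0
```

Storing the distance for each date would also make it possible to report the maximum distance used for each restaurant.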

Figure 5.5: Relationship between the maximum distance to a selected weather station and the forecast improvement from adding weather factors

Figure 5.5 shows how much each company’s forecast improved with the addition of weather parameters, and how far each company was from the weather station used. Some stations were found much further away than the overall average, which reinforces the proposal to improve the search. The figure also shows that the three worst-improving companies found their stations relatively close by. Two of the best-improving companies, by contrast, found their stations a little further away than the overall average. The spread in distance does not narrow as the improvement increases. There therefore seems to be no real connection between how close the restaurant is to the weather station and how the weather variables affect the model. All values used were also aggregated over the entire day, which should reduce the effect of local variations in some factors, for example rainfall.

An additional improvement would have been to not use the weather values directly, but instead to calculate how much they differed from the previous day, or from the weekly or monthly average. The same temperature, for example, will feel very different depending on the time of year, whereas a drop in temperature will more often cause people to stay home.
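The following sketches the suggested relative features for a daily temperature series: the change from the previous day and the deviation from the mean of the preceding days. The trailing window stands in for the weekly average mentioned above. Its length, the handling of the first day, and the function name are assumptions.

```python
def deviation_features(temps, window=7):
    """Return (change from previous day, deviation from trailing mean) for each day.

    The trailing mean covers up to `window` preceding days. The first day has no
    history, so both features are None there.
    """
    features = []
    for i, t in enumerate(temps):
        if i == 0:
            features.append((None, None))
            continue
        history = temps[max(0, i - window):i]
        trailing_mean = sum(history) / len(history)
        features.append((t - temps[i - 1], t - trailing_mean))
    return features


# Hypothetical daily mean temperatures in degrees Celsius, ending with a sudden drop.
print(deviation_features([10.0, 12.0, 14.0, 6.0], window=3))
# [(None, None), (2.0, 2.0), (2.0, 3.0), (-8.0, -6.0)]
```

The last day stands out in both features, even though 6 degrees on its own might be an ordinary temperature for the season.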

The wind and temperature information retrieved from SMHI contained more than one measurement per day. It therefore had to be converted to daily averages before it could be used together with the other examined variables. Afterwards, it was noted that it would have been better to calculate the average over the hours that each restaurant was open instead of over the whole day. This would have provided more accurate information on the conditions that customers experienced.
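The following sketches averaging only the measurements taken while a restaurant is open. It assumes hourly readings keyed by hour of day (Table A.2 lists hourly intervals for air temperature and wind speed) and opening hours given as whole hours. The names and example readings are hypothetical.

```python
def opening_hours_mean(readings, open_hour, close_hour):
    """Mean of the readings taken while the restaurant is open.

    readings: {hour_of_day: value} for one day; open_hour is inclusive, close_hour exclusive.
    Returns None if no reading falls within the opening hours.
    """
    values = [v for hour, v in readings.items() if open_hour <= hour < close_hour]
    if not values:
        return None
    return sum(values) / len(values)


def whole_day_mean(readings):
    """Mean of all readings for the day, as in the daily averages used in this project."""
    return sum(readings.values()) / len(readings)


# Hypothetical temperatures: cold morning and evening, warmer during opening hours 11-19.
temps = {6: 2.0, 11: 8.0, 14: 10.0, 20: 4.0}
print(whole_day_mean(temps))              # 6.0
print(opening_hours_mean(temps, 11, 19))  # 9.0
```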

From the beginning, the plan for this project was to provide Onslip with the basis for a new forecast service for their customers. The included weather parameters therefore also had to be available in SMHI’s weather forecasts. This led to a large decrease in the number of available weather factors, and only three factors were eventually used. For the research part of this project, it would have been interesting to include more weather parameters and study the effects of as many factors as possible on restaurant sales.

5.3 This work in a wider context

Not knowing customer demand is one of the biggest difficulties that restaurants face in reducing food waste [38]. Underestimates force a restaurant to turn away customers and potential revenue, while overestimates lead to unnecessary spending and more food being thrown away. With better sales estimates, a restaurant does not need to keep as much raw material in stock in preparation for an increase in customers. With lower inventory levels, less food will be left over and expire, which reduces food waste.

From an employee’s point of view, there are both advantages and disadvantages to more accurately estimating the number of visitors to a restaurant. A more balanced schedule will lead to less stressful shifts and a better work environment in the restaurant. However, there is a risk that more employees will be asked to cover only periods with increased numbers of customers, and will therefore receive fewer and more fragmented work hours.

One of the major concerns with the current trend of increased storage and use of data is the already large environmental impact of the information and communication technology (ICT) sector [39–41]. Machine learning tasks are often computationally demanding and therefore require large amounts of energy. Many companies offering computational resources have specialized hardware developed to accomplish these tasks faster. In a recent paper, Strubell et al. [42] analyzed the environmental impact of developing and training machine learning models for natural language processing. They found that, for the most expensive model they examined, performing all computations with eight NVIDIA P100 GPUs would emit more than 626,000 lb (about 284,000 kg) of carbon dioxide equivalents. This is roughly five times the lifetime emissions of an average car, including fuel. The XGBoost library used in this project is not as energy consuming as many other applications of machine learning. Even so, replacing manual forecast methods with automated ones will inevitably increase energy usage. It is important that this increase does not offset the potential energy savings from improved effectiveness.

The transaction data used in this study is sensitive for all related parties: Onslip, the restaurants, and their customers. It is therefore important that the data is stored securely. If the model’s decisions are to be explained transparently, the data also needs to be anonymized.


6 Conclusion

This project has analyzed the sales of multiple restaurants and examined how an automated forecast process should be developed for customers using Onslip’s services. The industry is diverse, and the same change in one factor can have considerably different effects on the sales of different companies. It therefore does not appear possible to provide a general solution. A better option seems to be to train individual models for each company, so that the factors with the greatest impact on each company’s sales pattern can be emphasized.

The results presented in this paper show that the predictions made by a machine learning model can be improved considerably by providing input variables for factors known to impact sales. When all of the factors examined in this project were added, the accuracy of predictions increased for 119 out of 130 companies, with an average improvement of 26.62%. The calendar variables provided the greatest average effect, and the restaurant information had a positive impact on the most companies.

To achieve the most accurate sales forecasts, we believe that each company needs to be analyzed individually to find the factors that correlate with increased or decreased sales. These factors should then be turned into input variables for the predictive machine learning algorithm. However, this can be a time-consuming process, and it must be weighed against the potential gains from improved forecasts.

6.1 Future work

Not every factor affecting sales was captured by the variables included in this project. Future work could look deeper into other factors that affect customer behaviour, and into more ways to optimize the forecast model. It would also be of interest to look at the results for companies that were excluded during the selection process, for example the companies that were only open during the summer.

One area that was not explored as much as hoped was the restaurant information category. Because the necessary information was unavailable, it was not possible to construct the intended variables. Other variables had to be chosen instead, and these did not reflect the patterns they were intended to capture. This showed clearly in the collectively trained models. They had difficulties distinguishing unique patterns and gave worse results than using each company’s data individually. A possible improvement would be to give the model information that is more closely related to the restaurant, for example its size, how many waiters it has, or whether it has outdoor seating. This would hopefully provide more useful variables and increase the prediction accuracy further.

It could also be interesting to explore using less information to describe each restaurant. This study found the product ID variable to be highly important, and previous research reported similar results. This suggests that it may be possible to combine multiple data sources and simply assign each source a unique number. However, this may require the number of input variables to be kept to a minimum, so as not to overwhelm the model with noise.

One way to allow a model to predict the sales of multiple restaurants or products, which has also been used and discussed in previous research on this topic, is to use clustering algorithms to find similarities. If different items can be properly grouped before predictive algorithms are applied, this may produce better results.
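As an illustration, the sketch below groups restaurants by their weekday sales profile, that is, each weekday's share of the week's sales. It uses k-means, one common clustering algorithm. The choice of algorithm, the profile feature and the deterministic initialization are assumptions made for this example. A library implementation such as scikit-learn's KMeans, which runs several initializations, would normally be preferred.

```python
import math


def weekday_profile(weekly_sales):
    """Turn seven weekday sales totals into each weekday's share of the week."""
    total = sum(weekly_sales)
    return [s / total for s in weekly_sales]


def kmeans(points, k, iterations=20):
    """Plain k-means; the first k points are the initial centroids, so results are deterministic."""
    centroids = [list(p) for p in points[:k]]
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: each point joins its nearest centroid (Euclidean distance).
        labels = [min(range(k), key=lambda c: math.dist(p, centroids[c])) for p in points]
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, label in zip(points, labels) if label == c]
            if members:
                centroids[c] = [sum(dim) / len(members) for dim in zip(*members)]
    return labels


# Hypothetical weekly sales (Monday first) for two lunch restaurants and two weekend-heavy ones.
restaurants = {
    "lunch_a": [100, 100, 100, 100, 100, 20, 10],
    "weekend_a": [30, 30, 40, 60, 120, 150, 120],
    "lunch_b": [90, 95, 100, 100, 110, 15, 5],
    "weekend_b": [20, 25, 35, 70, 130, 160, 110],
}
points = [weekday_profile(sales) for sales in restaurants.values()]
print(dict(zip(restaurants, kmeans(points, k=2))))
# {'lunch_a': 0, 'weekend_a': 1, 'lunch_b': 0, 'weekend_b': 1}
```

Normalizing to shares groups restaurants by the shape of their week rather than their size. In this example, the two lunch restaurants are grouped together even though their sales differ slightly.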

There is still much to investigate regarding the requirements for combining data from different sources. This project did not attempt to use data from multiple companies in the predictions of individual products, but Onslip has identified this as an interesting subject for future research. They are looking to investigate how similar products from different restaurants can be identified from product data such as names and sales patterns.
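One simple starting point for the name-based part of this matching is string similarity. The sketch below uses SequenceMatcher from Python's standard difflib module to pair each product with the most similar product name from another restaurant. The normalization, the 0.75 threshold and the example names are assumptions. In practice, name similarity would likely need to be combined with sales patterns, since different products can have similar names.

```python
from difflib import SequenceMatcher


def normalize(name):
    """Lowercase and collapse whitespace so that formatting differences are ignored."""
    return " ".join(name.lower().split())


def name_similarity(a, b):
    """Similarity ratio between 0 and 1 for two product names."""
    return SequenceMatcher(None, normalize(a), normalize(b)).ratio()


def best_matches(products, candidates, threshold=0.75):
    """Pair each product with its most similar candidate if the similarity reaches threshold."""
    matches = {}
    for name in products:
        best = max(candidates, key=lambda candidate: name_similarity(name, candidate))
        if name_similarity(name, best) >= threshold:
            matches[name] = best
    return matches


# Hypothetical product names from two restaurants.
ours = ["Caffe Latte", "Kanelbulle", "Soup of the day"]
theirs = ["caffe  latte", "Cappuccino", "Kanelbulle stor"]
print(best_matches(ours, theirs))
# {'Caffe Latte': 'caffe  latte', 'Kanelbulle': 'Kanelbulle stor'}
```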


References

[1] M. Meeker. Internet Trends 2018. p. 8. URL: https://cdn.relayto.com/media/files/JzrE69rBRKm9NhkUM2Fk_internettrendsreport2018.pdf. (Accessed: 26.04.2019).

[2] Domo Inc. Data never sleeps 5.0. URL: https://www.domo.com/learn/data-never-sleeps-5. (Accessed: 13.05.2019).

[3] Domo Inc. Data never sleeps 6.0. URL: https://web-assets.domo.com/blog/wp-content/uploads/2018/06/18_domo_data-never-sleeps-6verticals.pdf. (Accessed: 13.05.2019).

[4] David Reinsel, John Gantz, and John Rydning. The Digitization of the World: From Edge to Core. Tech. rep. 2018. URL: https://www.seagate.com/files/www-content/our-story/trends/files/idc-seagate-dataage-whitepaper.pdf. (Accessed: 13.05.2019).

[5] Dresner Advisory Services, LLC. 2017 Big Data Analytics Market Study. Tech. rep. 2017. URL: https://www.microstrategy.com/getmedia/cd052225-be60-49fd-ab1c-4984ebc3cde9/Dresner-Report-Big_Data_Analytic_Market_Study-WisdomofCrowdsSeries-2017.pdf. (Accessed: 13.05.2019).

[6] NewVantage Partners LLC. Big Data and AI Executive Survey 2019. Tech. rep. 2019. URL: https://newvantage.com/wp-content/uploads/2018/12/Big-Data-Executive-Survey-2019-Findings-Updated-010219-1.pdf. (Accessed: 13.05.2019).

[7] Bloomberg Businessweek Research Services. The Current State of Business Analytics: Where Do We Go From Here? Tech. rep. 2011. URL: http://docs.media.bitpipe.com/io_10x/io_104896/item_536511/busanalyticsstudy_wp_08232011_FINAL.pdf. (Accessed: 13.05.2019).

[8] Hsinchun Chen, Roger HL Chiang, and Veda C Storey. “Business intelligence and analytics: From big data to big impact”. In: MIS Quarterly 36.4 (2012).

[9] Samuel Fosso Wamba, Shahriar Akter, Andrew Edwards, Geoffrey Chopin, and Denis Gnanzou. “How ‘big data’ can make big impact: Findings from a systematic review and a longitudinal case study”. In: International Journal of Production Economics 165 (2015), pp. 234–246.


[10] Aleš Popovic, Ray Hackney, Rana Tassabehji, and Mauro Castelli. “The impact of big data analytics on firms’ high value business performance”. In: Information Systems Frontiers 20.2 (2018), pp. 209–222.

[11] The Nilson Report. Global Cards - 2015. URL: https://www.nilsonreport.com/publication_special_feature_article.php. (Accessed: 26.04.2019).

[12] Chapter 7, sections 1-2, Bokföringslag (1999:1078) [Swedish Accounting Act]. URL: https://www.riksdagen.se/sv/dokument-lagar/dokument/svensk-forfattningssamling/bokforingslag-19991078_sfs-1999-1078. (Accessed: 26.04.2019).

[13] James T Rothe. “Effectiveness of sales forecasting methods”. In: Industrial Marketing Management 7.2 (1978), pp. 114–118.

[14] Timothy D Rey, Chip Wells, and Justin Kauhl. “Using data mining in forecasting problems”. In: SAS Global Forum 2013: Data Mining and Text Analytics. Citeseer. 2013.

[15] Ian H Witten, Eibe Frank, Mark A Hall, and Christopher J Pal. Data Mining: Practical machine learning tools and techniques. Morgan Kaufmann, 2016.

[16] Lior Rokach and Oded Z Maimon. Data mining with decision trees: theory and applications. Vol. 81. World Scientific, 2014.

[17] Terence Parr and Jeremy Howard. How to explain gradient boosting. URL: https://explained.ai/gradient-boosting/index.html. (Accessed: 21.05.2019).

[18] Tianqi Chen and Carlos Guestrin. “Xgboost: A scalable tree boosting system”. In: Proceedings of the 22nd ACM SIGKDD international conference on knowledge discovery and data mining. ACM. 2016, pp. 785–794.

[19] Grigorios Tsoumakas. “A survey of machine learning techniques for food sales prediction”. In: Artificial Intelligence Review (2018), pp. 1–7.

[20] Philip Doganis, Alex Alexandridis, Panagiotis Patrinos, and Haralambos Sarimveis. “Time series sales forecasting for short shelf-life food products based on artificial neural networks and evolutionary computing”. In: Journal of Food Engineering 75.2 (2006), pp. 196–204.

[21] Indre Žliobaite, Jorn Bakker, and Mykola Pechenizkiy. “Beating the baseline prediction in food sales: How intelligent an intelligent predictor is?” In: Expert Systems with Applications 39.1 (2012), pp. 806–815.

[22] Irem Islek and Sule Gündüz Ögüdücü. “A retail demand forecasting model based on data mining techniques”. In: 2015 IEEE 24th International Symposium on Industrial Electronics (ISIE). IEEE. 2015, pp. 55–60.

[23] Xin Liu and Ryutaro Ichise. “Food sales prediction with meteorological data – a case study of a japanese chain supermarket”. In: International Conference on Data Mining and Big Data. Springer. 2017, pp. 93–104.

[24] Takeshi Takenaka and T Shimmura. “Practical and interactive demand forecasting method for retail and restaurant services”. In: Proc. of International Conference Advances in Production Management Systems. 3-4. 2011, p. 2.

[25] Milos Bujisic, Vanja Bogicevic, and H. G. Parsa. “The effect of weather factors on restaurant sales”. In: Journal of Foodservice Business Research 20.3 (2017), pp. 350–370.

[26] Liu Xinliang and Sun Dandan. “University Restaurant Sales Forecast Based on BP Neural Network–In Shanghai Jiao Tong University Case”. In: International Conference on Swarm Intelligence. Springer. 2017, pp. 338–347.

[27] Xu Ma, Yanshan Tian, Chu Luo, and Yuehui Zhang. “Predicting Future Visitors Of Restaurants Using Big Data”. In: 2018 International Conference on Machine Learning and Cybernetics (ICMLC). Vol. 1. IEEE. 2018, pp. 269–274.


[28] Mikael Holmberg and Pontus Halldén. Machine Learning for Restaurant Sales Forecast. Uppsala Universitet, 2018.

[29] Zongming Yin, Junzhang Zhu, and Xiaofeng Zhang. “Forecast customer flow using long short-term memory networks”. In: 2017 International Conference on Security, Pattern Analysis, and Cybernetics (SPAC). IEEE. 2017, pp. 61–66.

[30] Qifeng Zhou, Bin Xia, Wei Xue, Chunqiu Zeng, Ruyuan Han, and Tao Li. “An Advanced Inventory Data Mining System for Business Intelligence”. In: 2017 IEEE Third International Conference on Big Data Computing Service and Applications (BigDataService). IEEE. 2017, pp. 210–217.

[31] Odunayo David Adedeji. From Business Understanding to Deployment: An application of Machine Learning Algorithms to Forecast Customer Visits per Hour to a Fast-Casual Restaurant in Dublin. Dublin Institute of Technology, 2018.

[32] Sunitha Cheriyan, Shaniba Ibrahim, Saju Mohanan, and Susan Treesa. “Intelligent Sales Prediction Using Machine Learning Techniques”. In: 2018 International Conference on Computing, Electronics & Communications Engineering (iCCECE). IEEE. 2018, pp. 53–58.

[33] Bohdan M Pavlyshenko. “Machine-Learning Models for Sales Time Series Forecasting”. In: MDPI Data 4.1 (2019), p. 15.

[34] Kyle B. Murray, Fabrizio Di Muro, Adam Finn, and Peter Popkowski Leszczyc. “The effect of weather on consumer spending”. In: Journal of Retailing and Consumer Services 17 (2010), pp. 512–520.

[35] Norman E Rosenthal, David A Sack, J Christian Gillin, Alfred J Lewy, Frederick K Goodwin, Yolande Davenport, Peter S Mueller, David A Newsome, and Thomas A Wehr. “Seasonal affective disorder: a description of the syndrome and preliminary findings with light therapy”. In: Archives of General Psychiatry 41.1 (1984), pp. 72–80.

[36] Leora N Rosen, Steven D Targum, Michael Terman, Michael J Bryant, Howard Hoffman, Siegfried F Kasper, Joelle R Hamovit, John P Docherty, Betty Welch, and Norman E Rosenthal. “Prevalence of seasonal affective disorder at four latitudes”. In: Psychiatry Research 31.2 (1990), pp. 131–144.

[37] Maxim Vladimirovich Shcherbakov, Adriaan Brebels, Nataliya Lvovna Shcherbakova, Anton Pavlovich Tyukov, Timur Alexandrovich Janovsky, and Valeriy Anatol’evich Kamaev. “A survey of forecast error measures”. In: World Applied Sciences Journal 24.24 (2013), pp. 171–176.

[38] Nadia Ottosson and Matilda Holmgren. Från planering till servering. En studie om restaurangbranschens arbete för minskat matsvinn [From planning to serving: a study of the restaurant industry’s work to reduce food waste]. Lund Universitet, 2016.

[39] Erol Gelenbe and Yves Caseau. “The impact of information technology on energy consumption and carbon emissions”. In: Ubiquity 2015.June (2015), p. 1.

[40] Ward Van Heddeghem, Sofie Lambert, Bart Lannoo, Didier Colle, Mario Pickavet, and Piet Demeester. “Trends in worldwide ICT electricity consumption from 2007 to 2012”. In: Computer Communications 50 (2014), pp. 64–76.

[41] Anders Andrae and Tomas Edler. “On global electricity usage of communication technology: trends to 2030”. In: Challenges 6.1 (2015), pp. 117–157.

[42] Emma Strubell, Ananya Ganesh, and Andrew McCallum. “Energy and Policy Considerations for Deep Learning in NLP”. In: arXiv preprint arXiv:1906.02243 (2019).


A Appendix

SMHI forecast parameters

Table A.1: SMHI’s available forecast parameters

Parameter  Unit      Description                              Value range
msl        hPa       Air pressure                             Decimal number
t          °C        Air temperature                          Decimal number
vis        km        Horizontal visibility                    Decimal number
wd         degree    Wind direction                           Integer
ws         m/s       Wind speed                               Decimal number
r          %         Relative humidity                        Integer, 0-100
tstm       %         Thunder probability                      Integer, 0-100
tcc_mean   octas     Mean value of total cloud cover          Integer, 0-8
lcc_mean   octas     Mean value of low level cloud cover      Integer, 0-8
mcc_mean   octas     Mean value of medium level cloud cover   Integer, 0-8
hcc_mean   octas     Mean value of high level cloud cover     Integer, 0-8
gust       m/s       Wind gust speed                          Decimal number
pmin       mm/h      Minimum precipitation intensity          Decimal number
pmax       mm/h      Maximum precipitation intensity          Decimal number
spp        %         Percent of precipitation in frozen form  Integer, -9 or 0-100
pcat       category  Precipitation category                   Integer, 0-6
pmean      mm/h      Mean precipitation intensity             Decimal number
pmedian    mm/h      Median precipitation intensity           Decimal number
Wsymb2     code      Weather symbol                           Integer, 1-27


SMHI historical parameters

Table A.2: SMHI’s available historical parameters

Parameter  Attribute            Weather type                   Measured interval
1          Instantaneous value  Air temperature                1 time/hour
2          Average              Air temperature                1 time/day
3          Average              Wind direction                 1 time/hour
4          Average              Wind speed                     1 time/hour
5          Sum                  Rainfall                       1 time/day
6          Instantaneous value  Relative humidity              1 time/hour
7          Sum                  Rainfall                       1 time/hour
8          Instantaneous value  Snow depth                     1 time/day
9          Instantaneous value  Air pressure at sea level      1 time/hour
10         Sum                  Sunshine                       1 time/hour
11         Instantaneous value  Global irradiance              1 time/hour
12         Instantaneous value  Visibility                     1 time/hour
13         Instantaneous value  Present weather                1/hour, 8/day
14         Sum                  Rainfall                       4 times/hour
15         Max                  Precipitation intensity        4 times/hour
16         Instantaneous value  Total cloud coverage           1 time/hour
17         Sum                  Rainfall                       2 times/day
18         Instantaneous value  Rainfall                       1 time/day
19         Min                  Air temperature                1 time/day
20         Max                  Air temperature                1 time/day
21         Max                  Gust                           1 time/hour
22         Average              Air temperature                1 time/month
23         Sum                  Rainfall                       1 time/month
24         Average              Long wave irradiance           1 time/hour
25         Max                  Wind speed                     1 time/hour
26         Min                  Air temperature                2 times/day
27         Max                  Air temperature                2 times/day
28         Instantaneous value  Cloudbase, lowest layer        1 time/hour
29         Instantaneous value  Cloud coverage, lowest layer   1 time/hour
30         Instantaneous value  Cloudbase, second layer        1 time/hour
31         Instantaneous value  Cloud coverage, second layer   1 time/hour
32         Instantaneous value  Cloudbase, third layer         1 time/hour
33         Instantaneous value  Cloud coverage, third layer    1 time/hour
34         Instantaneous value  Cloudbase, fourth layer        1 time/hour
35         Instantaneous value  Cloud coverage, fourth layer   1 time/hour
36         Instantaneous value  Cloudbase, lowest base         1 time/hour
37         Instantaneous value  Cloud coverage, lowest base    1 time/hour
38         Max of average       Precipitation intensity        4 times/hour


Product sales data set

Table A.3: Variables in the product sales data set

Variable                 Type     Description
day                      Integer  The day of the month of the date of sale
day_of_week              Integer  The day of the week of the date of sale
day_of_year              Integer  The day of the year of the date of sale
distance_to_city_center  Integer  Walking distance in meters from the restaurant’s location to the geographical center of the city
friday                   Boolean  If the date of sale was a Friday
holiday_today            Boolean  If the date of sale was a holiday
holiday_tomorrow         Boolean  If the date after the date of sale was a holiday
hours_open               Float    The number of hours the restaurant was open on the date of sale
id                       Integer  The ID of a product
last_7_days              Integer  The mean amount sold in the 7 days before the date of sale
last_14_days             Integer  The mean amount sold in the 14 days before the date of sale
last_28_days             Integer  The mean amount sold in the 28 days before the date of sale
last_364_days            Integer  The mean amount sold in the 364 days before the date of sale
lat                      Float    Latitude value of the restaurant’s location
lng                      Float    Longitude value of the restaurant’s location
monday                   Boolean  If the date of sale was a Monday
month                    Integer  The month of the year of the date of sale
n_competitors            Integer  The number of restaurants with the same types within 1000 meters of the restaurant’s location
n_ratings                Integer  The number of ratings received on Google Maps
open_dinner              Boolean  If the restaurant was open during dinner (17-19) on the date of sale
open_lunch               Boolean  If the restaurant was open during lunch (11-13) on the date of sale
price                    Float    The sale price of a product
quantity                 Integer  The total amount of a product sold on this date
rain                     Float    The total amount of rainfall in millimeters on the date of sale
rating                   Float    Average rating received on Google Maps
saturday                 Boolean  If the date of sale was a Saturday
store-id                 Integer  A unique ID for the restaurant
sunday                   Boolean  If the date of sale was a Sunday
temp                     Float    The average air temperature in degrees Celsius on the date of sale
thursday                 Boolean  If the date of sale was a Thursday
tuesday                  Boolean  If the date of sale was a Tuesday
wednesday                Boolean  If the date of sale was a Wednesday
week_of_year             Integer  The week of the year of the date of sale
wind                     Float    The average wind speed in meters per second on the date of sale
year                     Integer  The year of the date of sale
1_days_ago               Integer  The amount sold 1 day before the date of sale
2_days_ago               Integer  The amount sold 2 days before the date of sale
3_days_ago               Integer  The amount sold 3 days before the date of sale
4_days_ago               Integer  The amount sold 4 days before the date of sale
5_days_ago               Integer  The amount sold 5 days before the date of sale
6_days_ago               Integer  The amount sold 6 days before the date of sale
7_days_ago               Integer  The amount sold 7 days before the date of sale
14_days_ago              Integer  The amount sold 14 days before the date of sale
28_days_ago              Integer  The amount sold 28 days before the date of sale
364_days_ago             Integer  The amount sold 364 days before the date of sale


Total sales data set

Table A.4: Variables in the total sales data set

Variable                 Type     Description
day                      Integer  The day of the month of the date of sale
day_of_week              Integer  The day of the week of the date of sale
day_of_year              Integer  The day of the year of the date of sale
distance_to_city_center  Integer  Walking distance in meters from the restaurant’s location to the geographical center of the city
friday                   Boolean  If the date of sale was a Friday
holiday_today            Boolean  If the date of sale was a holiday
holiday_tomorrow         Boolean  If the date after the date of sale was a holiday
hours_open               Float    The number of hours the restaurant was open on the date of sale
last_7_days              Float    The mean amount sold in the 7 days before the date of sale
last_14_days             Float    The mean amount sold in the 14 days before the date of sale
last_28_days             Float    The mean amount sold in the 28 days before the date of sale
last_364_days            Float    The mean amount sold in the 364 days before the date of sale
lat                      Float    Latitude value of the restaurant’s location
lng                      Float    Longitude value of the restaurant’s location
monday                   Boolean  If the date of sale was a Monday
month                    Integer  The month of the year of the date of sale
n_competitors            Integer  The number of restaurants with the same types within 1000 meters of the restaurant’s location
n_ratings                Integer  The number of ratings received on Google Maps
open_dinner              Boolean  If the restaurant was open during dinner (17-19) on the date of sale
open_lunch               Boolean  If the restaurant was open during lunch (11-13) on the date of sale
rain                     Float    The total amount of rainfall in millimeters on the date of sale
rating                   Float    Average rating received on Google Maps
saturday                 Boolean  If the date of sale was a Saturday
store-id                 Integer  A unique ID for the restaurant
sum                      Float    The total amount in SEK sold on this date
sunday                   Boolean  If the date of sale was a Sunday
temp                     Float    The average air temperature in degrees Celsius on the date of sale
thursday                 Boolean  If the date of sale was a Thursday
tuesday                  Boolean  If the date of sale was a Tuesday
wednesday                Boolean  If the date of sale was a Wednesday
week_of_year             Integer  The week of the year of the date of sale
wind                     Float    The average wind speed in meters per second on the date of sale
year                     Integer  The year of the date of sale
1_days_ago               Float    The amount sold 1 day before the date of sale
2_days_ago               Float    The amount sold 2 days before the date of sale
3_days_ago               Float    The amount sold 3 days before the date of sale
4_days_ago               Float    The amount sold 4 days before the date of sale
5_days_ago               Float    The amount sold 5 days before the date of sale
6_days_ago               Float    The amount sold 6 days before the date of sale
7_days_ago               Float    The amount sold 7 days before the date of sale
14_days_ago              Float    The amount sold 14 days before the date of sale
28_days_ago              Float    The amount sold 28 days before the date of sale
364_days_ago             Float    The amount sold 364 days before the date of sale


Individual and collective training comparison

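Table A.5: MAE comparison between individually and collectively trained models for the total sales data set. Difference is (Collective - Individual) / Collective; positive values mean that the individually trained model had the lower MAE.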

ID    Individual   Collective   Difference
1     2027.33      2042.44           0.74%
2     1578.72      1773.74          10.99%
3     1465.98      1300.27         -12.74%
4     1260.13      1330.74           5.31%
5     2684.66      2098.58         -27.93%
6     2284.53      2329.71           1.94%
7     3440.28      3081.56         -11.64%
8     11926.40     10789.57        -10.54%
9     1679.71      1409.01         -19.21%

10    708.80       1022.40          30.67%
11    663.09       659.14           -0.60%
12    1358.20      1533.61          11.44%
13    2294.77      1934.27         -18.64%
14    1630.55      1509.58          -8.01%
15    1899.09      2067.56           8.15%
16    1683.26      1771.90           5.00%
17    1059.21      1098.92           3.61%
18    1070.48      926.41          -15.55%
19    711.51       841.54           15.45%
20    5811.09      5591.59          -3.93%
21    7806.00      7340.00          -6.35%
22    3822.53      5608.66          31.85%
23    8585.33      9150.67           6.18%
24    1390.45      1512.10           8.05%
25    2579.24      2916.13          11.55%
26    1196.23      1294.73           7.61%
27    2319.72      3005.20          22.81%
28    2034.01      1843.77         -10.32%
29    4720.68      4308.20          -9.57%
30    1179.92      1179.11          -0.07%
31    2248.59      2373.19           5.25%
32    3649.56      3851.07           5.23%
33    2946.50      3107.88           5.19%
34    780.06       822.77            5.19%
35    2685.16      1541.69         -74.17%
36    983.62       1002.43           1.88%
37    6539.09      5100.85         -28.20%
38    5690.23      6717.46          15.29%
39    1798.51      1894.95           5.09%
40    414.31       541.57           23.50%
41    1163.67      1167.41           0.32%
42    4362.51      3473.77         -25.58%
43    2542.24      2412.05          -5.40%
44    6828.19      5476.38         -24.68%
45    1827.24      1698.02          -7.61%
46    978.70       1221.93          19.91%
47    3670.26      4300.86          14.66%
48    2581.84      2385.86          -8.21%
49    2541.58      2429.15          -4.63%



ID Individual Collective Difference
50 2524.27 2691.82 6.22%
51 2062.80 2051.97 -0.53%
52 1446.26 1525.66 5.20%
53 4998.02 3354.23 -49.01%
54 2764.33 3033.73 8.88%
55 1082.06 962.52 -12.42%
56 518.82 637.37 18.60%
57 560.67 585.10 4.18%
58 1551.59 1215.64 -27.64%
59 10040.05 8414.82 -19.31%
60 1581.87 1520.65 -4.03%
61 2313.14 2770.68 16.51%
62 2469.09 2663.29 7.29%
63 2568.29 2092.12 -22.76%
64 1107.53 1371.62 19.25%
65 2279.25 2155.32 -5.75%
66 1526.21 2291.16 33.39%
67 3146.40 5192.76 39.41%
68 2136.89 2269.82 5.86%
69 3641.80 6413.52 43.22%
70 2468.72 2257.07 -9.38%
71 1943.59 2466.45 21.20%
72 456.52 583.54 21.77%
73 960.11 870.77 -10.26%
74 6726.76 6386.59 -5.33%
75 1745.79 1881.80 7.23%
76 1027.12 1219.08 15.75%
77 1128.22 1235.11 8.65%
78 553.56 686.78 19.40%
79 4933.44 5802.99 14.98%
80 2282.84 1856.68 -22.95%
81 2084.40 2348.27 11.24%
82 4163.82 3912.17 -6.43%
83 484.78 571.75 15.21%
84 785.06 684.64 -14.67%
85 3408.06 3391.79 -0.48%
86 1812.39 2092.48 13.39%
87 1814.18 1936.84 6.33%
88 1305.56 1459.03 10.52%
89 904.43 964.10 6.19%
90 1922.65 1278.85 -50.34%
91 5059.90 6971.26 27.42%
92 462.19 745.05 37.97%
93 1177.74 1759.23 33.05%
94 2497.96 2945.95 15.21%
95 1958.71 2168.94 9.69%
96 4650.20 4161.44 -11.74%
97 2021.34 2175.97 7.11%
98 1179.72 1460.65 19.23%
99 1973.91 2141.21 7.81%



ID Individual Collective Difference
100 1085.78 986.81 -10.03%
101 2550.85 2268.96 -12.42%
102 1402.36 1657.41 15.39%
103 4429.78 4183.81 -5.88%
104 3914.75 2756.02 -42.04%
105 2399.10 2925.30 17.99%
106 7713.48 7543.71 -2.25%
107 197.64 395.53 50.03%
108 1071.55 1031.05 -3.93%
109 5254.62 5115.65 -2.72%
110 2314.42 2105.39 -9.93%
111 1652.77 1829.07 9.64%
112 5237.78 4954.43 -5.72%
113 1410.68 1357.44 -3.92%
114 2333.23 2364.42 1.32%
115 731.12 612.34 -19.40%
116 939.70 1145.70 17.98%
117 1613.81 1633.40 1.20%
118 2051.67 1883.32 -8.94%
119 6522.65 12289.38 46.92%
120 762.20 779.55 2.23%
121 1207.38 1500.21 19.52%
122 1776.64 1698.90 -4.58%
123 1668.93 1524.42 -9.48%
124 593.22 824.93 28.09%
125 1689.57 1840.25 8.19%
126 1982.70 2198.41 9.81%
127 6817.69 7809.78 12.70%
128 1830.33 1616.62 -13.22%
129 4671.42 3858.58 -21.07%
130 2698.53 2786.26 3.15%

Average 2.17%
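The Difference column is consistent with the gap between the two MAEs taken relative to the collective model's MAE, that is, (Collective - Individual) / Collective × 100. Under this reading, a positive value means the individually trained model had the lower error. Below is a minimal Python sketch that reproduces the first three rows of Table A.5. The function name is illustrative.

```python
def relative_difference(individual_mae: float, collective_mae: float) -> float:
    """(Collective - Individual) / Collective, in percent.

    Positive: the individually trained model had the lower MAE.
    """
    return (collective_mae - individual_mae) / collective_mae * 100


# (Individual, Collective) MAE pairs for IDs 1-3, copied from Table A.5.
rows = {1: (2027.33, 2042.44), 2: (1578.72, 1773.74), 3: (1465.98, 1300.27)}

diffs = {rid: round(relative_difference(ind, col), 2)
         for rid, (ind, col) in rows.items()}
print(diffs)  # {1: 0.74, 2: 10.99, 3: -12.74}
```

The Average row is presumably the arithmetic mean of this column over all 130 IDs.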


Product sales MAE

Table A.8: Effects of adding extra information on product sales forecast MAE (1/3)

Company Sale History Restaurant Calendar Weather All
Id MAE MAE Change MAE Change MAE Change MAE Change MAE Change
1 1.12 1.00 10.82% 1.10 1.88% 1.01 9.99% 1.12 0.11% 0.99 11.68%
2 1.21 1.20 0.79% 1.20 1.09% 1.25 -2.56% 1.22 -0.57% 1.19 2.32%
3 0.68 0.68 1.19% 0.67 1.86% 0.66 3.74% 0.69 -0.67% 0.68 0.61%
4 1.85 1.77 3.87% 1.86 -0.69% 1.66 9.88% 1.87 -1.16% 1.60 13.19%
5 1.12 0.98 12.39% 1.08 3.92% 0.94 16.11% 1.12 0.24% 0.94 16.71%
6 1.25 1.14 8.67% 1.24 0.74% 1.08 13.21% 1.31 -4.97% 1.13 9.30%
7 1.87 1.86 0.59% 1.85 0.71% 1.83 1.91% 1.96 -4.69% 1.82 2.77%
8 3.44 2.95 14.15% 3.42 0.44% 2.80 18.70% 3.44 0.07% 2.71 21.12%
9 0.94 0.94 -0.60% 0.92 1.87% 0.93 1.08% 0.96 -2.69% 0.93 0.40%
10 2.05 1.42 30.54% 1.57 23.26% 1.31 36.10% 1.57 23.13% 0.99 51.90%
11 1.29 1.09 15.91% 1.12 13.52% 1.03 20.35% 1.29 -0.27% 1.06 18.06%
12 0.64 0.60 7.02% 0.63 2.36% 0.61 5.69% 0.65 -0.92% 0.60 7.13%
13 2.83 2.86 -1.10% 2.57 8.92% 2.57 9.02% 2.92 -3.48% 2.68 4.99%
14 1.24 1.12 9.54% 1.20 3.06% 1.17 5.62% 1.23 0.88% 1.08 12.71%
15 3.28 2.91 11.35% 3.41 -3.88% 2.70 17.76% 3.30 -0.69% 2.66 19.06%
16 0.80 0.79 0.97% 0.77 3.81% 0.78 3.26% 0.80 0.34% 0.76 5.81%
17 0.96 0.97 -0.90% 0.95 1.30% 0.93 2.76% 1.03 -7.50% 0.97 -1.05%
18 1.11 1.01 9.22% 1.07 4.18% 0.98 11.45% 1.02 7.89% 1.00 9.65%
19 2.83 2.82 0.38% 2.75 2.59% 2.59 8.28% 2.75 2.78% 2.50 11.63%
20 2.34 2.33 0.46% 2.32 0.88% 2.28 2.50% 2.38 -1.72% 2.25 3.98%
21 2.11 1.93 8.81% 2.02 4.28% 2.01 4.85% 2.10 0.39% 1.83 13.28%
22 3.59 3.54 1.37% 3.59 -0.04% 3.60 -0.09% 2.78 22.73% 2.69 25.08%
23 2.64 2.28 13.86% 2.48 6.07% 2.33 11.68% 2.62 0.75% 2.29 13.17%
24 1.78 1.63 8.46% 1.65 7.79% 1.68 5.58% 1.78 0.23% 1.56 12.30%
25 2.09 2.19 -4.35% 2.02 3.44% 2.06 1.83% 2.24 -7.05% 2.20 -4.89%
26 1.15 1.09 5.47% 1.12 3.45% 1.15 0.50% 1.17 -1.31% 1.05 9.24%
27 2.21 1.96 11.54% 2.11 4.79% 1.97 10.96% 2.21 0.00% 1.92 13.11%
28 0.77 0.72 6.57% 0.76 2.06% 0.68 11.93% 0.77 0.19% 0.72 7.46%
29 1.43 1.34 6.79% 1.38 3.40% 1.37 4.40% 1.43 0.47% 1.34 6.63%
30 1.39 1.38 0.70% 1.35 2.62% 1.35 3.24% 1.37 1.14% 1.32 5.12%
31 1.58 1.54 2.29% 1.56 1.29% 1.55 1.66% 1.58 -0.08% 1.55 1.87%
32 1.21 1.20 0.95% 1.22 -0.51% 1.18 2.34% 1.24 -2.26% 1.24 -2.54%
33 1.22 1.19 2.42% 1.22 0.21% 1.19 2.19% 1.23 -1.20% 1.17 4.01%
34 1.15 1.04 9.76% 1.18 -2.29% 1.15 0.40% 1.16 -0.35% 1.09 5.52%
35 1.69 1.70 -0.22% 1.73 -2.13% 1.75 -3.32% 1.76 -3.81% 1.76 -3.83%
36 1.46 1.42 2.78% 1.49 -1.62% 1.48 -0.93% 1.46 0.00% 1.42 3.05%
37 1.55 1.26 18.85% 1.52 1.76% 1.21 21.66% 1.65 -6.28% 1.20 22.24%
38 2.43 2.31 5.08% 2.29 5.75% 2.20 9.43% 2.52 -3.88% 2.27 6.70%
39 1.20 1.13 6.10% 1.16 3.44% 1.12 7.30% 1.23 -1.88% 1.12 6.55%
40 0.48 0.48 1.19% 0.47 2.46% 0.48 0.41% 0.49 -1.56% 0.49 -0.85%
41 0.90 0.91 -1.24% 0.89 1.49% 0.89 1.72% 0.90 0.00% 0.90 0.61%
42 2.00 1.90 5.27% 1.95 2.54% 1.88 6.04% 1.95 2.42% 1.89 5.66%
43 2.61 2.36 9.62% 2.62 -0.27% 2.13 18.46% 2.65 -1.46% 2.20 15.68%
44 2.35 2.12 10.11% 2.28 2.97% 2.08 11.56% 2.30 2.28% 2.03 13.62%
45 0.81 0.81 0.67% 0.81 0.03% 0.79 2.70% 0.83 -2.57% 0.79 2.08%
46 0.86 0.79 7.33% 0.86 -0.36% 0.78 9.11% 0.84 1.96% 0.79 8.25%
47 0.86 0.83 3.32% 0.87 -1.14% 0.84 1.49% 0.83 2.81% 0.82 4.34%
48 2.70 2.58 4.59% 2.63 2.59% 2.39 11.29% 2.83 -4.92% 2.47 8.40%
49 0.40 0.34 15.26% 0.39 2.57% 0.38 6.58% 0.38 6.63% 0.34 16.11%


Table A.9: Effects of adding extra information on product sales forecast MAE (2/3)

Company Sale History Restaurant Calendar Weather All
Id MAE MAE Change MAE Change MAE Change MAE Change MAE Change
50 3.58 3.12 12.74% 3.21 10.23% 2.95 17.71% 3.54 1.10% 2.88 19.63%
51 1.88 1.83 2.97% 1.68 10.45% 1.86 0.93% 1.84 2.30% 1.62 13.89%
52 1.59 1.46 7.87% 1.56 1.67% 1.50 5.45% 1.57 1.16% 1.52 4.06%
53 3.10 2.95 4.80% 3.02 2.39% 2.73 11.79% 3.11 -0.61% 2.76 10.69%
54 0.84 0.77 7.98% 0.79 5.35% 0.80 4.94% 0.85 -1.48% 0.75 10.81%
55 0.97 0.93 3.77% 0.98 -0.79% 0.91 6.04% 0.97 0.00% 0.91 6.09%
56 0.69 0.70 -1.94% 0.69 0.82% 0.69 -0.23% 0.68 1.18% 0.71 -2.14%
57 0.90 0.90 0.06% 0.87 3.79% 0.90 -0.16% 0.89 1.29% 0.88 1.98%
58 1.64 1.63 0.79% 1.61 2.11% 1.67 -1.61% 1.67 -1.63% 1.62 1.47%
59 1.51 1.45 3.97% 1.51 0.34% 1.47 2.79% 1.53 -1.26% 1.46 3.45%
60 0.69 0.69 0.67% 0.70 -0.25% 0.68 2.29% 0.70 -0.47% 0.67 2.83%
61 1.69 1.58 6.54% 1.55 8.21% 1.57 7.06% 1.71 -0.76% 1.55 8.21%
62 1.73 1.69 2.06% 1.66 4.32% 1.70 1.56% 1.74 -0.79% 1.62 6.52%
63 3.18 3.04 4.48% 3.18 0.00% 3.20 -0.59% 3.25 -2.28% 3.12 1.77%
64 0.69 0.68 1.15% 0.69 -0.23% 0.67 2.71% 0.69 0.00% 0.68 1.33%
65 2.37 2.54 -6.87% 2.22 6.29% 2.26 4.66% 2.36 0.36% 2.41 -1.70%
66 3.05 2.84 6.90% 3.00 1.84% 3.07 -0.57% 3.06 -0.27% 2.67 12.46%
67 1.51 1.52 -0.38% 1.49 1.32% 1.52 -0.84% 1.51 0.00% 1.49 1.72%
68 1.17 1.16 0.13% 1.19 -1.82% 1.15 0.94% 1.23 -5.72% 1.17 -0.42%
69 1.60 1.55 3.57% 1.52 5.25% 1.56 2.78% 1.57 2.22% 1.53 4.31%
70 1.18 1.00 14.86% 1.06 10.14% 1.00 15.08% 1.15 2.37% 0.98 16.62%
71 2.00 2.04 -2.01% 2.05 -2.62% 1.99 0.51% 1.89 5.50% 1.92 3.80%
72 1.04 0.97 6.99% 1.03 0.72% 1.02 1.92% 1.02 1.46% 0.94 9.79%
73 0.89 0.85 4.28% 0.82 7.01% 0.87 1.68% 0.86 3.20% 0.88 0.87%
74 1.68 1.57 6.96% 1.68 0.06% 1.80 -6.58% 1.54 8.51% 1.52 9.88%
75 1.58 1.39 12.40% 1.49 6.14% 1.40 11.27% 1.52 4.06% 1.42 10.17%
76 0.61 0.58 4.64% 0.60 1.49% 0.56 7.46% 0.60 1.13% 0.54 10.89%
77 1.63 1.63 -0.23% 1.66 -1.94% 1.66 -2.00% 1.70 -4.16% 1.61 1.42%
78 0.59 0.58 2.86% 0.58 1.66% 0.60 -1.27% 0.63 -5.66% 0.61 -3.51%
79 2.04 1.94 5.02% 1.90 7.13% 1.83 10.27% 2.00 2.00% 1.83 10.45%
80 1.70 1.59 6.78% 1.79 -5.36% 1.73 -1.75% 1.74 -2.07% 1.67 1.69%
81 0.84 0.82 2.53% 0.84 0.22% 0.83 1.62% 0.84 -0.33% 0.83 0.70%
82 0.97 0.96 0.92% 0.97 0.61% 0.94 3.26% 0.95 2.79% 0.96 1.82%
83 0.99 0.94 4.49% 0.96 2.93% 0.87 11.82% 1.01 -2.45% 0.87 11.74%
84 1.18 1.21 -2.72% 1.19 -1.28% 1.20 -1.97% 1.27 -7.90% 1.27 -8.09%
85 3.51 3.50 0.19% 3.55 -1.16% 3.35 4.47% 3.38 3.86% 3.21 8.70%
86 1.51 1.42 6.23% 1.45 4.19% 1.40 7.70% 1.52 -0.46% 1.42 6.01%
87 0.88 0.88 -0.01% 0.89 -1.53% 0.87 0.58% 0.89 -1.14% 0.86 1.37%
88 0.82 0.81 2.14% 0.79 3.54% 0.80 3.03% 0.83 -1.02% 0.80 3.36%
89 0.96 0.90 5.62% 0.88 7.75% 0.88 8.53% 0.97 -0.77% 0.89 6.86%
90 1.02 0.95 6.93% 0.96 6.47% 0.99 3.41% 0.96 6.03% 0.94 7.78%
91 1.67 1.53 8.21% 1.61 3.61% 1.57 6.19% 1.57 5.65% 1.48 11.30%
92 0.60 0.62 -4.63% 0.60 -0.19% 0.60 -0.02% 0.63 -6.26% 0.63 -6.33%
93 1.44 1.53 -6.49% 1.43 0.72% 1.47 -1.78% 1.39 3.45% 1.38 4.04%
94 1.20 1.15 3.73% 1.22 -1.90% 1.20 -0.45% 1.21 -1.53% 1.23 -2.85%
95 1.11 1.08 2.54% 1.10 0.71% 1.13 -1.62% 1.12 -0.70% 1.10 1.17%
96 6.10 5.83 4.46% 6.75 -10.64% 5.42 11.20% 5.83 4.47% 5.86 4.02%
97 1.73 1.70 1.30% 1.73 -0.10% 1.66 3.91% 1.70 1.53% 1.70 1.33%
98 0.96 0.89 7.25% 0.96 0.13% 0.96 0.30% 0.95 0.88% 0.90 6.50%
99 1.15 1.13 1.77% 1.18 -2.64% 1.08 6.18% 1.16 -1.44% 1.10 3.96%


Table A.10: Effects of adding extra information on product sales forecast MAE (3/3)

Company Sale History Restaurant Calendar Weather All
Id MAE MAE Change MAE Change MAE Change MAE Change MAE Change

100 1.37 1.32 3.82% 1.36 0.97% 1.27 7.35% 1.36 0.63% 1.26 7.84%
101 2.57 2.84 -10.70% 2.51 2.34% 2.18 15.07% 2.39 7.05% 2.52 1.84%
102 0.77 0.74 3.23% 0.74 3.61% 0.72 6.38% 0.81 -4.98% 0.73 5.05%
103 4.80 3.71 22.68% 4.93 -2.60% 3.47 27.68% 4.76 0.98% 3.55 26.01%
104 1.55 1.49 3.45% 1.51 2.13% 1.49 3.31% 1.55 -0.17% 1.46 5.26%
105 0.92 0.93 -1.29% 0.90 2.45% 0.88 4.02% 0.97 -5.78% 0.90 1.74%
106 8.70 6.50 25.29% 8.41 3.30% 5.95 31.60% 9.00 -3.42% 6.30 27.56%
107 0.64 0.67 -4.97% 0.63 1.79% 0.66 -3.36% 0.67 -4.97% 0.65 -2.02%
108 1.24 1.30 -4.62% 1.25 -0.26% 1.29 -3.74% 1.33 -7.32% 1.29 -3.82%
109 2.46 2.14 13.25% 2.29 6.94% 2.06 16.59% 2.45 0.51% 2.06 16.30%
110 0.60 0.60 -0.15% 0.60 -0.85% 0.62 -3.21% 0.61 -1.44% 0.61 -1.30%
111 1.62 1.51 6.81% 1.55 4.39% 1.47 9.13% 1.59 1.52% 1.47 9.28%
112 2.81 2.27 18.94% 2.79 0.66% 2.29 18.42% 2.75 2.03% 2.19 22.10%
113 1.14 1.11 2.79% 1.12 1.65% 1.09 4.49% 1.14 0.00% 1.07 5.89%
114 1.16 1.18 -1.17% 1.16 0.30% 1.14 2.00% 1.22 -4.51% 1.17 -0.83%
115 0.57 0.61 -6.59% 0.57 0.38% 0.57 0.05% 0.61 -5.97% 0.61 -5.95%
116 0.93 0.89 4.77% 0.92 1.88% 0.94 -0.50% 0.95 -1.17% 0.88 5.73%
117 1.21 1.13 6.43% 1.19 1.25% 1.13 6.41% 1.23 -1.92% 1.11 7.98%
118 1.35 1.33 1.84% 1.29 4.60% 1.35 -0.01% 1.30 3.80% 1.25 7.56%
119 13.31 14.61 -9.73% 13.20 0.87% 13.13 1.39% 12.23 8.13% 12.68 4.79%
120 0.68 0.67 0.85% 0.66 3.03% 0.67 1.69% 0.64 5.12% 0.64 5.62%
121 1.24 1.21 2.47% 1.22 1.64% 1.22 1.62% 1.24 0.06% 1.24 0.01%
122 1.50 1.48 1.29% 1.48 1.10% 1.45 2.99% 1.54 -2.43% 1.50 -0.30%
123 0.79 0.72 8.61% 0.82 -3.32% 0.80 -0.81% 0.73 7.62% 0.73 8.17%
124 0.66 0.59 10.43% 0.66 0.61% 0.62 7.38% 0.66 0.60% 0.62 6.78%
125 0.76 0.75 1.57% 0.75 1.28% 0.76 0.10% 0.73 3.94% 0.73 4.26%
126 1.62 1.63 -0.17% 1.64 -1.25% 1.59 2.00% 1.58 2.94% 1.65 -1.43%
127 2.67 2.14 20.06% 2.63 1.79% 2.05 23.49% 2.67 0.00% 2.04 23.74%
128 2.94 2.78 5.22% 2.89 1.47% 2.89 1.64% 3.14 -6.90% 2.79 5.12%
129 2.59 2.41 7.19% 2.62 -1.04% 2.48 4.49% 2.55 1.51% 2.34 9.57%
130 3.39 3.10 8.61% 3.49 -2.87% 3.26 3.92% 2.72 19.83% 3.05 10.04%
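To see which single kind of extra information helps a given row most, one can compare the four single-group Change columns. Below is a minimal Python sketch. It assumes each table row is represented as a dict keyed by column name, a representation chosen here for illustration rather than taken from the thesis. The values are copied from Table A.8.

```python
# Single information groups; "All" presumably combines them and is left out here.
SINGLE_GROUPS = ("Sale History", "Restaurant", "Calendar", "Weather")


def best_single_group(changes: dict[str, float]) -> str:
    """Group whose Change (relative MAE reduction, %) is largest for one row."""
    return max(SINGLE_GROUPS, key=lambda group: changes[group])


# Change columns (%) for Ids 1, 10 and 22, copied from Table A.8.
table_a8 = {
    1: {"Sale History": 10.82, "Restaurant": 1.88, "Calendar": 9.99,
        "Weather": 0.11, "All": 11.68},
    10: {"Sale History": 30.54, "Restaurant": 23.26, "Calendar": 36.10,
         "Weather": 23.13, "All": 51.90},
    22: {"Sale History": 1.37, "Restaurant": -0.04, "Calendar": -0.09,
         "Weather": 22.73, "All": 25.08},
}

for row_id, changes in table_a8.items():
    best = best_single_group(changes)
    print(row_id, best, changes["All"] > changes[best])
# 1 Sale History True
# 10 Calendar True
# 22 Weather True
```

For these three rows, the All column exceeds the best single group, but this does not hold everywhere. For Id 13 in Table A.8, for example, Calendar (9.02%) beats All (4.99%).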


Total sales MAE

Table A.11: Effects of adding extra information on total sales forecast MAE (1/3)

Company Sale History Restaurant Calendar Weather All
Id MAE MAE Change MAE Change MAE Change MAE Change MAE Change
1 3228.18 2061.85 36.13% 2712.04 15.99% 1851.94 42.63% 2790.25 13.57% 2027.33 37.20%
2 2137.52 1897.53 11.23% 1991.31 6.84% 1800.60 15.76% 2281.36 -6.73% 1578.72 26.14%
3 2033.87 1739.15 14.49% 1559.95 23.30% 1323.56 34.92% 1920.01 5.60% 1465.98 27.92%
4 3064.48 1919.00 37.38% 2627.81 14.25% 1367.46 55.38% 3147.88 -2.72% 1260.13 58.88%
5 3719.68 2977.11 19.96% 3323.62 10.65% 2409.82 35.21% 3615.50 2.80% 2684.66 27.83%
6 3339.42 3259.10 2.41% 3511.27 -5.15% 2303.63 31.02% 4321.65 -29.41% 2284.53 31.59%
7 3835.24 3390.90 11.59% 2578.20 32.78% 3146.29 17.96% 3659.56 4.58% 3440.28 10.30%
8 17554.26 13891.53 20.87% 16629.84 5.27% 10537.55 39.97% 17430.03 0.71% 11926.40 32.06%
9 2142.77 1928.36 10.01% 2061.75 3.78% 2050.11 4.32% 2152.09 -0.43% 1679.71 21.61%
10 1605.63 801.10 50.11% 891.42 44.48% 1235.09 23.08% 1451.09 9.62% 708.80 55.86%
11 1177.09 849.80 27.81% 1067.64 9.30% 743.87 36.80% 1139.14 3.22% 663.09 43.67%
12 3217.45 2456.74 23.64% 2241.14 30.34% 1222.85 61.99% 2320.26 27.89% 1358.20 57.79%
13 2989.83 2700.68 9.67% 2565.32 14.20% 2411.33 19.35% 3039.09 -1.65% 2294.77 23.25%
14 2615.94 1675.18 35.96% 1982.50 24.21% 1869.47 28.54% 2380.44 9.00% 1630.55 37.67%
15 2583.52 2675.53 -3.56% 2030.73 21.40% 2237.44 13.40% 2327.94 9.89% 1899.09 26.49%
16 2710.58 2410.20 11.08% 2199.88 18.84% 2190.65 19.18% 2395.56 11.62% 1683.26 37.90%
17 973.59 1057.07 -8.57% 1088.47 -11.80% 1070.51 -9.95% 1356.42 -39.32% 1059.21 -8.79%
18 1213.83 1201.93 0.98% 1231.11 -1.42% 1373.59 -13.16% 1248.89 -2.89% 1070.48 11.81%
19 892.25 808.96 9.33% 593.66 33.46% 973.34 -9.09% 906.43 -1.59% 711.51 20.26%
20 8817.64 7942.39 9.93% 6248.74 29.13% 6027.94 31.64% 13269.31 -50.49% 5811.09 34.10%
21 13985.37 8514.17 39.12% 11355.65 18.80% 7905.66 43.47% 13566.90 2.99% 7806.00 44.18%
22 4090.13 4228.62 -3.39% 4068.26 0.53% 4589.10 -12.20% 4503.32 -10.10% 3822.53 6.54%
23 18323.23 8428.30 54.00% 17052.66 6.93% 17705.43 3.37% 20989.40 -14.55% 8585.33 53.15%
24 2594.81 1934.14 25.46% 1286.84 50.41% 1776.75 31.53% 2607.22 -0.48% 1390.45 46.41%
25 5010.63 3109.28 37.95% 2559.02 48.93% 3330.25 33.54% 5419.50 -8.16% 2579.24 48.52%
26 1573.85 1474.31 6.32% 1031.73 34.45% 1436.59 8.72% 1537.14 2.33% 1196.23 23.99%
27 2931.29 2509.65 14.38% 3698.99 -26.19% 3012.37 -2.77% 2931.29 0.00% 2319.72 20.86%
28 2950.37 2012.05 31.80% 2670.89 9.47% 1826.77 38.08% 2935.20 0.51% 2034.01 31.06%
29 5863.84 5445.49 7.13% 5783.32 1.37% 5135.12 12.43% 5249.16 10.48% 4720.68 19.50%
30 1565.77 1341.47 14.33% 1558.14 0.49% 1314.97 16.02% 1585.42 -1.25% 1179.92 24.64%
31 2858.15 2637.40 7.72% 2233.54 21.85% 2399.97 16.03% 2691.19 5.84% 2248.59 21.33%
32 4927.80 4399.25 10.73% 4991.78 -1.30% 3561.27 27.73% 4867.85 1.22% 3649.56 25.94%
33 4623.28 3423.98 25.94% 3914.67 15.33% 2951.28 36.16% 3889.07 15.88% 2946.50 36.27%
34 690.84 785.55 -13.71% 699.32 -1.23% 795.19 -15.10% 696.28 -0.79% 780.06 -12.91%
35 1582.25 3418.63 -116.06% 1328.50 16.04% 1484.11 6.20% 1555.83 1.67% 2685.16 -69.71%
36 988.80 1204.42 -21.81% 1078.74 -9.10% 1163.95 -17.71% 988.80 0.00% 983.62 0.52%
37 6381.57 6374.01 0.12% 6175.28 3.23% 4957.33 22.32% 6761.16 -5.95% 6539.09 -2.47%
38 13692.63 7713.89 43.66% 4888.84 64.30% 6328.33 53.78% 13707.60 -0.11% 5690.23 58.44%
39 3233.07 2803.61 13.28% 2040.22 36.90% 1747.37 45.95% 3219.42 0.42% 1798.51 44.37%
40 571.24 665.19 -16.45% 404.11 29.26% 589.96 -3.28% 586.74 -2.71% 414.31 27.47%
41 1383.25 1182.06 14.54% 1208.10 12.66% 1326.66 4.09% 1383.25 0.00% 1163.67 15.87%
42 8410.22 4552.80 45.87% 4877.18 42.01% 4775.28 43.22% 8508.19 -1.16% 4362.51 48.13%
43 4324.07 3375.20 21.94% 3890.06 10.04% 2951.95 31.73% 4204.27 2.77% 2542.24 41.21%
44 12215.89 7129.11 41.64% 9948.63 18.56% 6092.01 50.13% 9640.02 21.09% 6828.19 44.10%
45 2453.08 2244.31 8.51% 2182.37 11.04% 1813.88 26.06% 2389.40 2.60% 1827.24 25.51%
46 1360.20 1445.77 -6.29% 1150.82 15.39% 1180.42 13.22% 1553.22 -14.19% 978.70 28.05%
47 5782.67 4705.98 18.62% 4175.66 27.79% 3981.34 31.15% 5634.12 2.57% 3670.26 36.53%
48 3268.82 2609.55 20.17% 3046.93 6.79% 2598.08 20.52% 2953.90 9.63% 2581.84 21.02%
49 1909.40 2880.08 -50.84% 941.02 50.72% 2900.28 -51.89% 1983.74 -3.89% 2541.58 -33.11%
50 3489.56 3435.51 1.55% 2514.61 27.94% 3466.04 0.67% 3467.26 0.64% 2524.27 27.66%


Table A.12: Effects of adding extra information on total sales forecast MAE (2/3)

Company Sale History Restaurant Calendar Weather All
Id MAE MAE Change MAE Change MAE Change MAE Change MAE Change
51 3259.70 3217.92 1.28% 2051.75 37.06% 2752.17 15.57% 3395.98 -4.18% 2062.80 36.72%
52 1599.84 1486.25 7.10% 1499.40 6.28% 1531.48 4.27% 1503.40 6.03% 1446.26 9.60%
53 5042.88 5410.86 -7.30% 4971.96 1.41% 3512.03 30.36% 4871.06 3.41% 4998.02 0.89%
54 3988.00 3168.03 20.56% 2493.07 37.49% 2935.14 26.40% 3919.14 1.73% 2764.33 30.68%
55 1158.82 1174.69 -1.37% 936.80 19.16% 1085.61 6.32% 1158.82 0.00% 1082.06 6.62%
56 922.95 729.58 20.95% 445.60 51.72% 724.88 21.46% 859.77 6.85% 518.82 43.79%
57 756.50 618.14 18.29% 529.97 29.94% 562.59 25.63% 811.01 -7.21% 560.67 25.89%
58 1116.55 1582.53 -41.73% 1225.13 -9.72% 1584.42 -41.90% 1172.38 -5.00% 1551.59 -38.96%
59 16402.22 13341.73 18.66% 8270.27 49.58% 10380.62 36.71% 17082.56 -4.15% 10040.05 38.79%
60 2700.84 1880.06 30.39% 2252.55 16.60% 1390.84 48.50% 2703.49 -0.10% 1581.87 41.43%
61 5909.60 3215.85 45.58% 2575.25 56.42% 3566.58 39.65% 5358.00 9.33% 2313.14 60.86%
62 3521.30 3272.05 7.08% 2614.24 25.76% 2593.84 26.34% 3463.20 1.65% 2469.09 29.88%
63 2894.52 2717.29 6.12% 2935.13 -1.40% 2319.21 19.88% 2713.92 6.24% 2568.29 11.27%
64 1641.31 1468.15 10.55% 1088.92 33.66% 1295.15 21.09% 1641.31 0.00% 1107.53 32.52%
65 2642.28 2705.91 -2.41% 2416.70 8.54% 2598.80 1.65% 2620.27 0.83% 2279.25 13.74%
66 2917.53 1837.91 37.00% 2226.03 23.70% 3190.26 -9.35% 2622.86 10.10% 1526.21 47.69%
67 3601.66 2638.96 26.73% 3909.65 -8.55% 3347.55 7.06% 3601.66 0.00% 3146.40 12.64%
68 2597.27 2947.60 -13.49% 2649.68 -2.02% 2614.61 -0.67% 2519.89 2.98% 2136.89 17.73%
69 8576.16 5938.52 30.76% 4323.74 49.58% 6775.37 21.00% 8205.66 4.32% 3641.80 57.54%
70 5262.45 2724.22 48.23% 3261.09 38.03% 2070.53 60.65% 4514.86 14.21% 2468.72 53.09%
71 3145.77 2716.72 13.64% 2470.23 21.47% 3076.44 2.20% 5725.31 -82.00% 1943.59 38.22%
72 450.82 537.87 -19.31% 859.38 -90.63% 511.30 -13.42% 470.74 -4.42% 456.52 -1.26%
73 852.99 934.72 -9.58% 755.87 11.39% 905.32 -6.13% 861.42 -0.99% 960.11 -12.56%
74 5338.55 7502.65 -40.54% 5661.41 -6.05% 5323.67 0.28% 5506.61 -3.15% 6726.76 -26.00%
75 3138.82 2551.06 18.73% 1979.25 36.94% 2853.77 9.08% 2567.07 18.22% 1745.79 44.38%
76 1936.90 1314.64 32.13% 1422.96 26.53% 1107.27 42.83% 1724.41 10.97% 1027.12 46.97%
77 960.36 1079.93 -12.45% 991.34 -3.23% 983.00 -2.36% 1005.84 -4.74% 1128.22 -17.48%
78 834.94 791.45 5.21% 525.69 37.04% 637.02 23.70% 1019.97 -22.16% 553.56 33.70%
79 8439.30 6085.25 27.89% 6544.23 22.46% 6630.17 21.44% 8772.54 -3.95% 4933.44 41.54%
80 2884.56 2376.14 17.63% 2216.33 23.17% 1792.83 37.85% 2893.21 -0.30% 2282.84 20.86%
81 3407.07 2538.63 25.49% 2203.84 35.32% 2279.56 33.09% 3305.26 2.99% 2084.40 38.82%
82 5615.89 5619.87 -0.07% 4590.06 18.27% 4140.89 26.26% 5861.04 -4.37% 4163.82 25.86%
83 517.76 560.25 -8.21% 491.37 5.10% 430.11 16.93% 537.96 -3.90% 484.78 6.37%
84 987.22 993.88 -0.67% 770.10 21.99% 883.22 10.53% 1364.12 -38.18% 785.06 20.48%
85 4909.91 3118.08 36.49% 5033.75 -2.52% 4398.29 10.42% 3992.60 18.68% 3408.06 30.59%
86 4021.83 2530.34 37.08% 3215.65 20.05% 2030.29 49.52% 4120.06 -2.44% 1812.39 54.94%
87 2526.67 1879.83 25.60% 2504.59 0.87% 1556.40 38.40% 2534.22 -0.30% 1814.18 28.20%
88 2243.00 1770.77 21.05% 1611.40 28.16% 1570.85 29.97% 2194.83 2.15% 1305.56 41.79%
89 1202.58 975.92 18.85% 905.76 24.68% 877.43 27.04% 1341.60 -11.56% 904.43 24.79%
90 2291.10 2145.45 6.36% 1787.02 22.00% 1807.08 21.13% 1782.71 22.19% 1922.65 16.08%
91 21249.49 5401.39 74.58% 7672.07 63.90% 6089.06 71.34% 19517.78 8.15% 5059.90 76.19%
92 819.80 847.73 -3.41% 492.55 39.92% 881.83 -7.57% 837.80 -2.20% 462.19 43.62%
93 2775.62 1990.44 28.29% 1300.50 53.15% 2108.60 24.03% 2771.81 0.14% 1177.74 57.57%
94 3191.92 3018.72 5.43% 2508.08 21.42% 2580.96 19.14% 3303.96 -3.51% 2497.96 21.74%
95 3730.76 2997.30 19.66% 2335.19 37.41% 3450.05 7.52% 3912.79 -4.88% 1958.71 47.50%
96 10013.64 5090.52 49.16% 7520.17 24.90% 4839.85 51.67% 10087.53 -0.74% 4650.20 53.56%
97 3651.34 2898.63 20.61% 3070.86 15.90% 2593.51 28.97% 4196.12 -14.92% 2021.34 44.64%
98 2322.35 1705.91 26.54% 1317.13 43.28% 1101.61 52.56% 1989.11 14.35% 1179.72 49.20%
99 3317.02 2714.54 18.16% 3077.88 7.21% 1878.48 43.37% 3435.20 -3.56% 1973.91 40.49%


Table A.13: Effects of adding extra information on total sales forecast MAE (3/3)

Company Sale History Restaurant Calendar Weather All
Id MAE MAE Change MAE Change MAE Change MAE Change MAE Change

100 1728.55 1204.06 30.34% 1287.16 25.54% 1122.50 35.06% 1500.52 13.19% 1085.78 37.19%
101 2977.29 3076.73 -3.34% 2392.73 19.63% 2321.59 22.02% 2884.82 3.11% 2550.85 14.32%
102 2751.55 1665.92 39.46% 1265.96 53.99% 1287.19 53.22% 2758.83 -0.26% 1402.36 49.03%
103 6660.59 5082.14 23.70% 6238.68 6.33% 3891.15 41.58% 8496.03 -27.56% 4429.78 33.49%
104 6682.64 4113.57 38.44% 4066.85 39.14% 2937.25 56.05% 7276.88 -8.89% 3914.75 41.42%
105 3955.89 4479.22 -13.23% 2342.56 40.78% 3351.25 15.28% 3879.51 1.93% 2399.10 39.35%
106 15456.72 7928.18 48.71% 12608.46 18.43% 6252.36 59.55% 16346.48 -5.76% 7713.48 50.10%
107 293.81 313.82 -6.81% 190.45 35.18% 302.61 -3.00% 318.37 -8.36% 197.64 32.73%
108 1385.96 1238.87 10.61% 1364.68 1.54% 1276.22 7.92% 1388.09 -0.15% 1071.55 22.69%
109 7575.79 7555.69 0.27% 7031.66 7.18% 5226.28 31.01% 7082.12 6.52% 5254.62 30.64%
110 3047.32 2391.89 21.51% 2686.27 11.85% 2369.92 22.23% 2972.33 2.46% 2314.42 24.05%
111 2088.89 1944.98 6.89% 1873.36 10.32% 1800.13 13.82% 2069.78 0.91% 1652.77 20.88%
112 6625.53 5328.17 19.58% 5833.90 11.95% 4760.77 28.15% 5767.08 12.96% 5237.78 20.95%
113 1454.62 1687.53 -16.01% 1350.51 7.16% 1374.81 5.49% 1454.62 0.00% 1410.68 3.02%
114 3629.52 2906.26 19.93% 1704.74 53.03% 2791.07 23.10% 3049.50 15.98% 2333.23 35.72%
115 833.79 729.39 12.52% 648.41 22.23% 643.78 22.79% 756.27 9.30% 731.12 12.31%
116 1691.65 1348.90 20.26% 1317.95 22.09% 1133.60 32.99% 2170.31 -28.30% 939.70 44.45%
117 2835.36 1618.66 42.91% 1822.38 35.73% 1356.84 52.15% 2300.38 18.87% 1613.81 43.08%
118 2878.30 2780.90 3.38% 2119.81 26.35% 2382.44 17.23% 2760.92 4.08% 2051.67 28.72%
119 2541.97 4070.64 -60.14% 5112.43 -101.12% 2195.64 13.62% 2306.36 9.27% 6522.65 -156.60%
120 822.00 848.42 -3.21% 612.31 25.51% 785.68 4.42% 887.62 -7.98% 762.20 7.27%
121 1807.60 1731.25 4.22% 1369.63 24.23% 1356.90 24.93% 1703.83 5.74% 1207.38 33.21%
122 2297.77 2263.74 1.48% 1797.59 21.77% 1997.38 13.07% 2787.74 -21.32% 1776.64 22.68%
123 1763.75 1807.30 -2.47% 1608.91 8.78% 1692.01 4.07% 1728.75 1.98% 1668.93 5.38%
124 944.00 784.32 16.92% 828.05 12.28% 727.29 22.96% 1007.27 -6.70% 593.22 37.16%
125 4437.78 2318.43 47.76% 1673.92 62.28% 3745.46 15.60% 4478.30 -0.91% 1689.57 61.93%
126 3409.94 3431.91 -0.64% 2853.41 16.32% 3170.27 7.03% 3608.11 -5.81% 1982.70 41.86%
127 8875.97 7507.46 15.42% 8291.63 6.58% 6642.03 25.17% 8875.97 0.00% 6817.69 23.19%
128 1974.39 1933.13 2.09% 1442.95 26.92% 2004.80 -1.54% 1865.46 5.52% 1830.33 7.30%
129 7747.85 6448.80 16.77% 5140.35 33.65% 4331.49 44.09% 7144.55 7.79% 4671.42 39.71%
130 3132.51 2668.10 14.83% 2899.46 7.44% 2639.12 15.75% 2599.77 17.01% 2698.53 13.85%
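The Change columns in Tables A.8 to A.13 match the relative reduction in MAE with respect to the Company column: Change = (MAE_Company - MAE_variant) / MAE_Company × 100. A positive value therefore means that the added information lowered the error. This formula reproduces Id 1 of Table A.11 exactly. For the product tables, recomputing from the two-decimal MAEs agrees only approximately. For Id 1 of Table A.8, Sale History, (1.12 - 1.00) / 1.12 gives about 10.71%, whereas 10.82% is reported. This gap is presumably because the percentages were computed before rounding. Below is a minimal sketch; the function name is illustrative.

```python
def mae_change(baseline_mae: float, variant_mae: float) -> float:
    """Relative MAE reduction (in %) of a variant over the Company column.

    Positive: the extra information lowered the error.
    """
    return (baseline_mae - variant_mae) / baseline_mae * 100


# Id 1 of Table A.11: Company MAE and the MAE of each variant.
company = 3228.18
variants = {
    "Sale History": 2061.85,
    "Restaurant": 2712.04,
    "Calendar": 1851.94,
    "Weather": 2790.25,
    "All": 2027.33,
}

changes = {name: round(mae_change(company, mae), 2)
           for name, mae in variants.items()}
print(changes)
# {'Sale History': 36.13, 'Restaurant': 15.99, 'Calendar': 42.63, 'Weather': 13.57, 'All': 37.2}
```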


Feature importance and Pearson correlation coefficient

Table A.14: Feature importance and Pearson correlation coefficient for the variables in the product sales data set

Variable FI FI rank PCC PCC rank
last_7_days 0.2220 1 0.7278 2
last_14_days 0.1617 2 0.7321 1
last_28_days 0.0890 3 0.7249 3
7_days_ago 0.0888 4 0.6754 4
id 0.0531 5 -0.0893 15
last_364_days 0.0466 6 0.6326 6
14_days_ago 0.0443 7 0.6513 5
1_days_ago 0.0348 8 0.6250 7
day_of_week 0.0346 9 0.0367 17
28_days_ago 0.0234 10 0.6124 8
hours_open 0.0197 11 0.0552 16
price 0.0153 12 -0.0250 19
saturday 0.0145 13 0.0230 20
6_days_ago 0.0117 14 0.5735 9
2_days_ago 0.0109 15 0.5350 10
day_of_year 0.0106 16 0.0083 27
friday 0.0092 17 0.0268 18
week_of_year 0.0089 18 0.0082 29
5_days_ago 0.0086 19 0.4788 12
4_days_ago 0.0079 20 0.4707 13
temp 0.0076 21 0.0127 23
3_days_ago 0.0073 22 0.4889 11
holiday_today 0.0072 23 0.0018 38
day 0.0072 24 0.0060 32
364_days_ago 0.0064 25 0.2644 14
open_dinner 0.0062 26 0.0070 31
year 0.0060 27 -0.0089 26
month 0.0058 28 0.0079 30
wind 0.0055 29 -0.0023 36
rain 0.0048 30 -0.0045 33
holiday_tomorrow 0.0038 31 0.0009 41
thursday 0.0037 32 -0.0114 25
tuesday 0.0032 33 -0.0189 22
wednesday 0.0032 34 -0.0127 24
open_lunch 0.0028 35 0.0083 28
store-id 0.0019 36 0.0031 35
lat 0.0017 37 0.0037 34
lng 0.0003 38 0.0021 37
monday 0.0000 39 -0.0195 21
sunday 0.0000 40 -0.0017 39
distance_to_city_center 0.0000 41 -0.0009 40
n_competitors 0.0000 42 0.0005 43
n_ratings 0.0000 43 0.0005 44
rating 0.0000 44 0.0009 42


Table A.15: Feature importance and Pearson correlation coefficients for the variables in the total sales data set

Variable FI FI rank PCC PCC rank
day_of_week 0.1251 1 0.2543 11
hours_open 0.1078 2 0.4965 2
7_days_ago 0.1032 3 0.5187 1
last_7_days 0.0573 4 0.3595 6
friday 0.0508 5 0.2668 10
1_days_ago 0.0474 6 0.3823 5
14_days_ago 0.0451 7 0.4677 3
last_14_days 0.0343 8 0.3515 7
28_days_ago 0.0336 9 0.4131 4
last_28_days 0.0316 10 0.3279 8
last_364_days 0.0274 11 0.1118 18
day_of_year 0.0250 12 0.0376 29
364_days_ago 0.0246 13 0.2218 12
saturday 0.0220 14 0.1428 16
week_of_year 0.0200 15 0.0358 31
open_dinner 0.0199 16 0.1649 14
6_days_ago 0.0199 17 0.2689 9
year 0.0189 18 0.0569 23
2_days_ago 0.0182 19 0.1783 13
5_days_ago 0.0171 20 0.0400 26
temp 0.0161 21 0.0641 22
4_days_ago 0.0151 22 -0.0045 42
holiday_today 0.0150 23 -0.0128 38
3_days_ago 0.0144 24 0.0402 25
day 0.0142 25 0.0392 27
month 0.0115 26 0.0349 32
holiday_tomorrow 0.0115 27 0.0110 39
store-id 0.0100 28 0.0376 28
wind 0.0089 29 -0.0207 35
rain 0.0083 30 -0.0202 36
lat 0.0082 31 0.0289 34
thursday 0.0055 32 -0.0469 24
open_lunch 0.0046 33 0.0900 19
wednesday 0.0035 34 -0.0772 20
tuesday 0.0033 35 -0.1350 17
lng 0.0006 36 0.0162 37
monday 0.0000 37 -0.1603 15
sunday 0.0000 38 -0.0368 30
distance_to_city_center 0.0000 39 -0.0742 21
n_ratings 0.0000 40 0.0052 41
n_competitors 0.0000 41 0.0335 33
rating 0.0000 42 0.0097 40
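The PCC rank column appears to order variables by the absolute value of the coefficient rather than by its signed value. In Table A.14, for example, id (-0.0893) ranks 15th, ahead of hours_open (0.0552) at 16th. Below is a minimal Python sketch of the sample Pearson coefficient and of such an absolute-value ranking, using four PCC values from Table A.15. The helper names are illustrative, not taken from the thesis.

```python
import math


def pearson(x: list[float], y: list[float]) -> float:
    """Sample Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    ss_x = sum((a - mean_x) ** 2 for a in x)
    ss_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(ss_x * ss_y)


def rank_by_abs_pcc(pcc: dict[str, float]) -> list[str]:
    """Variable names ordered by |PCC|, strongest first (index 0 = rank 1)."""
    return sorted(pcc, key=lambda name: abs(pcc[name]), reverse=True)


# PCC values for four variables in Table A.15 (total sales data set).
sample = {"4_days_ago": -0.0045, "tuesday": -0.1350,
          "hours_open": 0.4965, "monday": -0.1603}
print(rank_by_abs_pcc(sample))  # ['hours_open', 'monday', 'tuesday', '4_days_ago']
print(pearson([1, 2, 3, 4], [2, 4, 6, 8]))  # 1.0
```

The resulting order matches the relative PCC ranks of these four variables in Table A.15 (2, 15, 17 and 42).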
