

Research Article
An Application of a Three-Stage XGBoost-Based Model to Sales Forecasting of a Cross-Border E-Commerce Enterprise

Shouwen Ji,1 Xiaojing Wang,1 Wenpeng Zhao,2 and Dong Guo3

1School of Traffic and Transportation, Beijing Jiaotong University, Haidian District, Beijing 100044, China
2Beijing Capital International Airport Co. Ltd., Beijing 100621, China
3School of Mechanical-Electronic and Vehicle Engineering, Beijing University of Civil Engineering and Architecture, Beijing 102600, China

Correspondence should be addressed to Xiaojing Wang; 17120889@bjtu.edu.cn

Received 25 June 2019; Revised 17 August 2019; Accepted 28 August 2019; Published 15 September 2019

Academic Editor: Elio Masciari

Copyright © 2019 Shouwen Ji et al. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

Sales forecasting is all the more vital for supply chain management in e-commerce, where a huge amount of transaction data is generated every minute. In order to enhance the logistics service experience of customers and optimize inventory management, e-commerce enterprises focus on improving the accuracy of sales prediction with machine learning algorithms. In this study, a C-A-XGBoost forecasting model is proposed, based on the XGBoost model, that takes both the sales features of commodities and the tendency of the data series into account. A C-XGBoost model is first established to forecast for each of the clusters produced by a two-step clustering algorithm, incorporating sales features into the C-XGBoost model as influencing factors of forecasting. Secondly, an A-XGBoost model is used to forecast the tendency, with the ARIMA model for the linear part and the XGBoost model for the nonlinear part. The final results are summed by assigning weights to the forecasting results of the C-XGBoost and A-XGBoost models. By comparison with the ARIMA, XGBoost, C-XGBoost, and A-XGBoost models, using data from the Jollychic cross-border e-commerce platform, the C-A-XGBoost is shown to outperform the other four models.

1. Introduction

In order to enhance the logistics service experience of customers in the e-commerce industry chain, supply chain collaboration [1] requires that commodities be stocked in advance in local warehouses of various markets around the world, which can effectively reduce logistics time. However, for cross-border e-commerce enterprises, the production and sales areas of e-commerce products are globalized, so it takes them longer to make preparations, from the procurement of commodities and transportation to customs quality inspection, etc. Therefore, algorithms and technologies of big data analysis are widely applied to predict the sales of e-commerce commodities, which provide the data basis for supply chain management and key technical support for the global supply chain schemes of cross-border e-commerce enterprises.

Besides the large quantity and diversity of transaction data [2], sales forecasts are affected by many other factors due to the complexity of the cross-border e-commerce market [3, 4]. Therefore, to improve the precision and efficiency of forecasting, the consideration of various factors in sales forecasting is still a challenge for e-commerce enterprises.

Plenty of studies have been undertaken on sales forecasting. The methods of sales forecasting adopted in these studies can roughly be divided into time series models (TSMs) and machine learning algorithms (MLAs) [5, 6].

TSMs range from exponential smoothing [7] to the ARIMA families [8], which have been used extensively to predict future trends by extrapolating based on historical observation data. Although TSMs have been proven to be useful for sales forecasting, their forecasting ability is limited by their assumption of linear behavior [9], and they do not take external factors such as price changes and promotions into account [10]. Therefore, univariate forecasting methods are usually adopted as a benchmark model in many studies [11, 12].

Another important branch of forecasting has been MLAs. Existing MLAs have been largely influenced by state-of-the-art forecasting techniques, which range from the artificial neural network (ANN), convolutional neural network (CNN), radial basis function (RBF) network, long short-term memory network (LSTM), and extreme learning machine (ELM) to support vector regression (SVR), etc. [13].

On the one hand, some existing forecasting studies have made comparisons between MLAs and TSMs [14]. Ansuj et al. showed the superiority of the ANN over the ARIMA method in sales forecasting [15]. Alon et al. compared the ANN with traditional methods, including Winters exponential smoothing, the Box–Jenkins ARIMA model, and multivariate regression, indicating that ANNs perform favorably in relation to the more traditional statistical methods [16]. Di Pillo et al. assessed the application of SVM to sales forecasting under promotion impacts, comparing it with ARIMA, Holt–Winters, and exponential smoothing [17].

On the other hand, MLAs based on TSMs have also been applied to sales prediction. Wang et al. proved the advantages of an integrated model combining the ARIMA with an ANN in modeling the linear and nonlinear parts of a data set [18]. In [19], an ARIMA forecasting model was established, and the residuals of the ARIMA model were trained and fitted by a BP neural network. A novel LSTM ensemble forecasting algorithm was presented by Choi and Lee [20] that effectively combines multiple forecast results from a set of individual LSTM networks. In order to better handle irregular sales patterns and take various factors into account, some algorithms have been devised to exploit more information in sales forecasting, as an increasing amount of data becomes available in e-commerce. Zhao and Wang [21] provided a novel approach to learning effective features automatically from structured data using a CNN. Bandara et al. attempted to incorporate sales demand patterns and cross-series information in a unified model by training an LSTM model [22]. More importantly, the ELM has been widely applied in forecasting: Luo et al. [23] proposed a novel data-driven method to predict user behavior by using an ELM with distribution optimization, and in [24] the ELM was enhanced under a deep learning framework to forecast wind speed.

Although there are various methods of forecasting, the choice of method is determined by the characteristics of different goods [25]. Kulkarni et al. [26] argued that product characteristics could have an impact on both searching and sales, because the characteristics inherent to products are the main attributes that potential consumers are interested in. Therefore, to better reflect the characteristics of goods in sales forecasting, clustering techniques have been introduced into forecasting [27]. For example, in [28, 29], both fuzzy neural networks and clustering methods were used to improve the results of neural networks. Lu and Wang [30] constructed an SVR to deal with the demand forecasting problem with the aid of hierarchical self-organizing maps and independent component analysis. Lu and Kao [31] put forward a sales forecasting method based on clustering using an extreme learning machine and a combination linkage method. Dai et al. [32] built a clustering-based sales forecasting scheme based on SVR. A clustering-based forecasting model combining clustering and machine learning methods was developed by Chen and Lu [33] for computer retailing sales forecasting.

Following the above literature review, a three-stage XGBoost-based forecasting model is constructed in this study to focus on the two aspects mentioned above: the sales features and the tendency of a data series.

Firstly, in order to forecast the sales features, various influencing factors of sales are introduced in this study via the two-step clustering algorithm [34], which is an improved algorithm based on BIRCH [35]. Then a C-XGBoost model based on clustering is presented to model each of the resulting clusters with the XGBoost algorithm, which has proved to be an efficient predictor in many data analysis contests, such as Kaggle, and in many recent studies [36, 37].

Secondly, to achieve higher predictive accuracy for the tendency of the data series, an A-XGBoost model is presented, integrating the strengths of the ARIMA and XGBoost models for the linear part and the nonlinear part of the data series, respectively. Therefore, a C-A-XGBoost model is constructed as the final combination model by weighting the C-XGBoost and A-XGBoost models, which takes both the multiple factors affecting the sales of goods and the trend of the time series into account.

The paper is organized into 5 sections, the rest of which are arranged as follows. In Section 2, the key models and algorithms employed in the study are briefly described, including feature selection, the two-step clustering algorithm, a method of parameter determination for the ARIMA, and the XGBoost. In Section 3, a three-stage XGBoost-based model is proposed to forecast both the sales features and the tendency of the time series. In Section 4, numerical examples are used to illustrate the validity of the proposed forecasting model. In Section 5, the conclusions, along with a note regarding future research directions, are summarized.

2. Methodologies

2.1. Feature Selection. With the emergence of web technologies, there is an ever-increasing growth in the amount of big data in the e-commerce environment [38]. Variety is one of the critical attributes of big data, as they are generated from a wide variety of sources and formats, including text, web, tweet, audio, video, click-stream, and log files [39]. In order to remove most irrelevant and redundant information from various data, many techniques of feature selection (removing variables that are irrelevant) and feature extraction (applying some transformations to the existing variables to obtain new ones) have been discussed to reduce the dimensionality of the data [40], including filter-based and wrapper feature selection. Wrapper feature selection employs a subroutine statistical resampling technique (such as cross-validation) in the actual learning algorithm to forecast the accuracy of feature subsets [41], which is a better choice for different algorithms modeling different data series. Instead, filter-based feature selection is suitable for different algorithms modeling the same data series [42].

In this study, wrapper feature selection in the forecasting and clustering algorithms is directly applied to removing unimportant attributes in multidimensional data based on the standard deviation (SD), the coefficient of variation (CV), the Pearson correlation coefficient (PCC), and feature importance scores (FIS), of which the details are as follows.

SD reflects the degree of dispersion of a data set, calculated as σ in equation (1), where N and μ denote the number of samples and the mean value of the samples x_i, respectively:

\sigma = \sqrt{\frac{1}{N}\sum_{i=1}^{N}\left(x_i - \mu\right)^2}. \quad (1)

CV is a statistic measuring the degree of variation of the observed values in the data, calculated as

c_v = \frac{\sigma}{\mu}. \quad (2)

PCC is a statistic used to reflect the degree of linear correlation between two variables, calculated as

r = \frac{1}{n-1}\sum_{i=1}^{n}\left(\frac{X_i - \bar{X}}{\sigma_X}\right)\left(\frac{Y_i - \bar{Y}}{\sigma_Y}\right), \quad (3)

where (X_i - \bar{X})/\sigma_X, \bar{X}, and \sigma_X represent the standard score, mean value, and standard deviation of X_i, respectively.

FIS provides a score indicating how useful or valuable each feature is in the construction of the boosted decision trees within the model. The more an attribute is used to make key decisions within the decision trees, the higher its relative importance [43]. The importance is calculated for a single decision tree from the performance measure increased at each attribute split point, weighted by the number of observations the node is responsible for. The performance measure may be the purity, such as the Gini index [44] used to select the split points, or another more specific error function. The feature importance is then averaged across all of the decision trees within the model [45].
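As a minimal illustration of these four statistics, the sketch below computes SD, CV, and PCC with NumPy and reads feature importance scores from a fitted XGBoost regressor; the data are random placeholders, not fields from the paper's data set.

```python
import numpy as np
from xgboost import XGBRegressor

def sd(x):
    # Standard deviation per equation (1)
    return np.sqrt(np.mean((x - x.mean()) ** 2))

def cv(x):
    # Coefficient of variation per equation (2)
    return sd(x) / x.mean()

def pcc(x, y):
    # Pearson correlation per equation (3), using sample standard scores
    zx = (x - x.mean()) / x.std(ddof=1)
    zy = (y - y.mean()) / y.std(ddof=1)
    return np.sum(zx * zy) / (len(x) - 1)

# FIS: importance scores from the boosted trees of a fitted model;
# "gain" averages the loss reduction at each feature's split points.
X, y = np.random.rand(200, 5), np.random.rand(200)  # placeholder data
model = XGBRegressor(n_estimators=100).fit(X, y)
fis = model.get_booster().get_score(importance_type="gain")
print({f: round(s, 2) for f, s in fis.items()})
```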

2.2. Two-Step Clustering Algorithm. Clustering aims at partitioning samples into several disjoint subsets, making samples in the same subset highly similar to each other [46]. The most widely applied clustering algorithms can broadly be categorized as partition, hierarchical, density-based, grid-based, and model-based methods [47, 48].

The selection of a clustering algorithm mainly depends on the scale and the type of the collected data. Clustering can be conducted using traditional algorithms when dealing with numeric or categorical data [49, 50]. BIRCH, one of the hierarchical methods, introduced by Zhang et al. [35], is especially suitable for large data sets of continuous attributes [51]. But in the case of large and mixed data, the two-step clustering algorithm in SPSS Modeler is advised in this study. The two-step clustering algorithm is a modified method based on BIRCH, setting the log-likelihood distance as the measure, which can measure both the distance between continuous data and the distance between categorical data [34]. Similar to BIRCH, the two-step clustering algorithm first performs a preclustering step, scanning the entire data set and storing the dense regions of data records in terms of summary statistics. A hierarchical clustering algorithm is then applied to clustering the dense regions. Apart from the ability to handle mixed types of attributes, the two-step clustering algorithm differs from BIRCH in automatically determining the appropriate number of clusters and in a new strategy of assigning cluster membership to noisy data.

As one of the hierarchical algorithms, the two-step clustering algorithm is also more efficient than partition algorithms in handling noise and outliers. More importantly, it has unique advantages over other algorithms in its automatic mechanism for determining the optimal number of clusters. Therefore, with regard to the large and mixed transaction data sets of e-commerce, the two-step clustering algorithm is a reliable choice for clustering goods; its key technologies and processes are illustrated in Figure 1.

2.2.1. Preclustering. The clustering feature (CF) tree growth in the BIRCH algorithm is used to read the data records in the data set one by one, in the process of which the handling of outliers is implemented. Then subclusters C_j are obtained from the data records in dense areas while generating a CF tree.

2.2.2. Clustering. Taking the subclusters C_j as the objects, the clusters C_J are obtained by merging the subclusters one by one based on agglomerative hierarchical clustering methods [52], until the optimal number of clusters is determined based on the minimum value of the Bayesian information criterion (BIC).

2.2.3. Cluster Membership Assignment. The data records are assigned to the nearest clusters by calculating the log-likelihood distance between the data records and the subclusters of the clusters C_J.

2.2.4. Validation of the Results. The performance of the clustering results is measured by the silhouette coefficient S, where a is the mean distance between a sample and the other samples of its own cluster and b is the mean distance between the sample and the samples of its nearest neighboring cluster. The higher the value of S, the better the clustering result:

S = \frac{b - a}{\max(a, b)}. \quad (4)
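SPSS Modeler's two-step implementation is proprietary, but a rough Python approximation of the same pipeline (a BIRCH-style CF-tree precluster followed by an agglomerative merge, then silhouette validation) can be sketched with scikit-learn. The feature matrix is a stand-in, and note that scikit-learn's Birch handles only continuous attributes, not the mixed-type log-likelihood distance described above.

```python
import numpy as np
from sklearn.cluster import Birch
from sklearn.metrics import silhouette_score
from sklearn.preprocessing import StandardScaler

X = StandardScaler().fit_transform(np.random.rand(1000, 11))  # stand-in features

# Birch builds a CF tree of dense subclusters (preclustering), then a
# global agglomerative step merges them into n_clusters final clusters.
labels = Birch(threshold=0.5, n_clusters=12).fit_predict(X)

# Mean silhouette coefficient per equation (4)
print("silhouette:", silhouette_score(X, labels))
```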

2.3. Parameter Determination of the ARIMA Model. ARIMA models are obtained from a combination of autoregressive and moving average models [53]. The Box–Jenkins methodology in time series theory is applied to establish an ARIMA(p, d, q) model, and its calculation steps can be found in [54]. The ARIMA has limitations in determining parameters, because its parameters are usually determined from plots of the ACF and PACF, which often leads to judgment deviations. However, a function named auto.arima() in the R package "forecast" [55] can be used to automatically generate an optimal ARIMA model for each time series based on the smallest Akaike information criterion (AIC) and BIC [56], which makes up for this disadvantage of the ARIMA.

Therefore, a combined method of parameter determination is proposed to improve the fitting performance of the ARIMA, which combines the results of the ACF and PACF plots with those of the auto.arima() function. The procedures are illustrated in Figure 2 and described as follows:

Step 1. Test stationarity and white noise by the augmented Dickey–Fuller (ADF) and Box–Pierce tests before modeling the ARIMA. If both the stationarity and white noise tests are passed, the ARIMA is suitable for the time series.
Step 2. Determine a part of the parameter combinations based on the ACF and PACF plots, and determine another part of the parameter combinations by the auto.arima() function in R.
Step 3. Model the ARIMA under the different parameter combinations and calculate the AIC values of the different models.
Step 4. Determine the optimal parameter combination of the ARIMA by the minimum AIC.
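The paper's toolchain for this procedure is the R "forecast" package; as an assumption for illustration only, the same steps can be sketched in Python with the pmdarima package (whose auto_arima mirrors auto.arima()) and statsmodels for the hand-picked candidate orders.

```python
import numpy as np
import pmdarima as pm
from statsmodels.tsa.arima.model import ARIMA

y = np.random.randn(277).cumsum()  # stand-in for the sales series

# Automatic order search on AIC (Step 2, second part)
auto = pm.auto_arima(y, d=1, information_criterion="aic", trace=False)
print("auto order:", auto.order, "AIC:", auto.aic())

# Candidate orders read off the ACF/PACF plots (Step 2, first part)
# are fit explicitly and compared on AIC; the smallest wins (Steps 3-4).
for order in [(2, 1, 2), (2, 1, 3), (2, 1, 4)]:
    print(order, "AIC:", ARIMA(y, order=order).fit().aic)
```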

2.4. XGBoost Algorithm. XGBoost is short for "eXtreme Gradient Boosting," proposed by Friedman [57]. As the relevant basic theory of XGBoost has been covered in plenty of previous papers [58, 59], the procedures of the algorithm [60] are described in this study rather than the basic theory.

2.4.1. Feature Selection. The specific steps of feature selection via XGBoost are as follows: data cleaning, data feature extraction, and data feature selection based on the feature importance scores.

2.4.2. Model Training. The model is trained based on the selected features with default parameters.

2.4.3. Parameter Optimization. Parameter optimization is aimed at minimizing the errors between the predicted values and actual values. There are three types of parameters in the algorithm, of which the descriptions are listed in Table 1.

The general steps for determining the hyperparameters of the XGBoost model are as follows:

Step 1. The number of estimators is first tuned to optimize the XGBoost while the learning rate and other parameters are fixed.
Step 2. Different combinations of max_depth and min_child_weight are tuned to optimize the XGBoost.
Step 3. Max delta step and gamma are tuned to make the model more conservative, with the parameters determined in Steps 1 and 2.
Step 4. Different combinations of subsample and colsample_bytree are tuned to prevent overfitting.
Step 5. The regularization parameters are increased to make the model more conservative.
Step 6. The learning rate is reduced to prevent overfitting.
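A compact sketch of Step 2 of this recipe using scikit-learn's GridSearchCV; the parameter ranges follow Section 4.2.3, and the training data are placeholders, not the paper's data.

```python
import numpy as np
from sklearn.model_selection import GridSearchCV
from xgboost import XGBRegressor

X, y = np.random.rand(500, 11), np.random.rand(500)  # placeholder training data

# Tune max_depth and min_child_weight together while the learning rate
# and remaining parameters keep their default values (Step 2).
grid = GridSearchCV(
    XGBRegressor(n_estimators=100, learning_rate=0.3),
    param_grid={"max_depth": [6, 7, 8, 9, 10],
                "min_child_weight": [1, 2, 3, 4, 5, 6]},
    scoring="neg_mean_absolute_error",
    cv=3,
).fit(X, y)
print(grid.best_params_)  # later steps fix these and tune gamma, subsample, etc.
```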

3. The Proposed Three-Stage Forecasting Model

In this research, a three-stage XGBoost-based forecasting model, named the C-A-XGBoost model, is proposed in consideration of both the sales features and the tendency of the data series.

[Figure 2: The procedures for parameter determination of the ARIMA model. The flowchart runs from the stationarity and white noise tests (with differencing where needed) through the parameter combinations from the ACF and PACF plots and the auto.arima() function, comparison of AIC values, to the optimal ARIMA.]

[Figure 1: The key technologies and processes of the two-step clustering algorithm: preclustering (with outlier handling), clustering (with automatic determination of the optimal number of clusters), and cluster membership assignment.]

In Stage 1, a novel C-XGBoost model is put forward based on clustering and the XGBoost, which incorporates different clustering features into the forecasting as influencing factors. The two-step clustering algorithm is first applied to partitioning the commodities into different clusters based on their features, and then each of the resulting clusters is modeled via XGBoost.

In Stage 2, an A-XGBoost model is presented by combining the ARIMA with the XGBoost to predict the tendency of the time series, which takes the strong linear fitting ability of the ARIMA and the strong nonlinear mapping ability of the XGBoost. The ARIMA is used to predict the linear part, and a rolling prediction method is employed to establish the XGBoost that revises the nonlinear part of the data series, namely the residuals of the ARIMA.

In Stage 3, a combination model named C-A-XGBoost is constructed from the C-XGBoost and A-XGBoost. The C-A-XGBoost is aimed at minimizing the error sum of squares by assigning weights to the results of the C-XGBoost and A-XGBoost, in which the weights reflect the reliability and credibility of the sales features and the tendency of the data series.

The procedures of the proposed three-stage model are demonstrated in Figure 3, of which the details are given as follows.

3.1. Stage 1: C-XGBoost Model. The two-step clustering algorithm is applied to clustering a data series into several disjoint clusters. Then each of the resulting clusters is set as the input and output sets to construct and optimize the corresponding C-XGBoost model. Finally, the testing samples are partitioned into the corresponding clusters by the trained two-step clustering model, and the prediction results are calculated based on the corresponding trained C-XGBoost models.

3.2. Stage 2: A-XGBoost Model. The optimal ARIMA, based on the minimum AIC after the data series pass the tests of stationarity and white noise, is trained and determined, of which the processes are described in Section 2.3. Then the residual vector e = (r_1, r_2, \ldots, r_n)^T between the predicted values and actual values is obtained from the trained ARIMA model. Next, the A-XGBoost is established by setting columns 1 to k and column (k+1) of R as the input and output, respectively, as illustrated in the following equation:

R =
\begin{bmatrix}
r_1 & r_2 & \cdots & r_k & r_{k+1} \\
r_2 & r_3 & \cdots & r_{k+1} & r_{k+2} \\
\vdots & \vdots & & \vdots & \vdots \\
r_{n-k-1} & r_{n-k} & \cdots & r_{n-2} & r_{n-1} \\
r_{n-k} & r_{n-k+1} & \cdots & r_{n-1} & r_n
\end{bmatrix}_{(n-k)\times(k+1)}. \quad (5)

The final results for the test set are calculated by summing the predicted results for the linear part from the trained ARIMA and those for the residuals from the established XGBoost.
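A sketch of this stage under stated assumptions: the residual series is a random stand-in, the window length k = 7 matches the experiment in Section 4.3.2, and a default XGBRegressor stands in for the tuned A-XGBoost.

```python
import numpy as np
from xgboost import XGBRegressor

def lag_matrix(res, k):
    # Sliding windows of length k+1, as in equation (5):
    # columns 1..k are the inputs, column k+1 is the output.
    return np.array([res[i:i + k + 1] for i in range(len(res) - k)])

residuals = np.random.randn(77)   # stand-in for the ARIMA residual vector
R = lag_matrix(residuals, k=7)    # shape (70, 8), cf. equation (14)
X_train, y_train = R[:, :-1], R[:, -1]
a_xgb = XGBRegressor(n_estimators=100).fit(X_train, y_train)
```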

3.3. Stage 3: C-A-XGBoost Model. In this stage, a combination strategy is explored to minimize the error sum of squares MSE in equation (6) by assigning the weights w_C and w_A to the C-XGBoost and A-XGBoost, respectively. The predicted results are calculated using equation (7), where \hat{Y}_{CA}(k), \hat{y}_C(k), and \hat{y}_A(k) denote the forecast values of the k-th sample via the C-A-XGBoost, C-XGBoost, and A-XGBoost, respectively. In equation (6), y(k) is the actual value of the k-th sample.

Table 1: The description of parameters in the XGBoost model.

Booster parameters:
- max_depth (maximum depth of a tree): increasing this value makes the model more complex and more likely to overfit.
- min_child_weight (minimum sum of instance weights in a child): the larger min_child_weight is, the more conservative the algorithm will be.
- max_delta_step (maximum delta step): it can help make the update step more conservative.
- gamma (minimum loss reduction): the larger gamma is, the more conservative the algorithm will be.
- subsample (subsample ratio of the training instances): used in the update to prevent overfitting.
- colsample_bytree (subsample ratio of columns for each tree): used in the update to prevent overfitting.
- eta (learning rate): step size shrinkage used in the update; can prevent overfitting.

Regularization parameters:
- alpha, lambda (regularization terms on weights): increasing these values makes the model more conservative.

Learning task parameters:
- reg:linear (learning objective): used to specify the learning task and the learning objective.

Command line parameters:
- number of estimators: used to specify the number of iterative calculations.


\min \ \mathrm{MSE} = \frac{1}{n}\sum_{k=1}^{n}\left[\hat{Y}_{CA}(k) - y(k)\right]^2, \quad (6)

\hat{Y}_{CA}(k) = w_C\,\hat{y}_C(k) + w_A\,\hat{y}_A(k). \quad (7)

The least squares method is employed to find the optimal weights (w_C and w_A), the calculation of which is simplified by transforming the equations into the following matrix operations. In equation (8), the matrix B consists of the predicted values of the C-XGBoost and A-XGBoost; in equation (9), the matrix W consists of the weights; in equation (10), the matrix Y consists of the actual values. Equation (11) is obtained by transforming equation (7) into matrix form. Equation (12) is derived from equation (11) by left-multiplying both sides by the transpose of B. According to equation (13), the optimal weights (w_C and w_A) are calculated:

B = \begin{bmatrix} \hat{y}_C(1) & \hat{y}_A(1) \\ \hat{y}_C(2) & \hat{y}_A(2) \\ \vdots & \vdots \\ \hat{y}_C(n) & \hat{y}_A(n) \end{bmatrix}, \quad (8)

W = \begin{bmatrix} w_C \\ w_A \end{bmatrix}, \quad (9)

Y = [y(1), y(2), \ldots, y(n)]^T, \quad (10)

BW = Y, \quad (11)

B^T B W = B^T Y, \quad (12)

W = (B^T B)^{-1} B^T Y. \quad (13)
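Equation (13) reduces to a few lines of NumPy; a least-squares solver is numerically safer than forming the explicit inverse. The prediction vectors below are stand-ins for the two models' outputs.

```python
import numpy as np

y_c = np.random.rand(34)  # stand-in C-XGBoost predictions on the test set
y_a = np.random.rand(34)  # stand-in A-XGBoost predictions
y = np.random.rand(34)    # actual SKU sales

B = np.column_stack([y_c, y_a])                  # equation (8)
w_c, w_a = np.linalg.lstsq(B, y, rcond=None)[0]  # solves BW = Y, equation (13)
y_ca = w_c * y_c + w_a * y_a                     # combined forecast, equation (7)
```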

4. Numerical Experiments and Comparisons

4.1. Data Description. To illustrate the effectiveness of the developed C-A-XGBoost model, the following data series are used to verify its forecasting performance.

4.1.1. Source Data Series. As listed in Table 2, there are eight data series in the source data. The data series range from Mar. 1, 2017 to Mar. 16, 2018.

4.1.2. Clustering Series. There are 10 continuous attributes and 6 categorical attributes in the clustering series, which are obtained by reconstructing the source data series. The attribute descriptions of the clustering series are given in Table 3.

4.2. Uniform Experimental Conditions. To verify the performance of the proposed model according to the performance evaluation indexes, some uniform experimental conditions are established as follows.

4.2.1. Uniform Data Set. As shown in Table 4, the data series are partitioned into a training set, validation set, and test set, so as to satisfy the requirements of the different models. The data application is described as follows:

(1) The clustering series covers samples of 381 days.
(2) For the C-XGBoost model, training set 1, namely the samples of the first 347 days in the clustering series, is utilized to establish the two-step clustering models. The resulting samples of the two-step clustering are used to construct the XGBoost models. The test set, with the remaining samples of 34 days, is selected to validate the C-XGBoost model. In detail, the test set is first partitioned into the corresponding clusters by the established two-step clustering model, and then it is applied to checking the validity of the corresponding C-XGBoost models.

[Figure 3: The procedures of the proposed three-stage model. Stage 1 (C-XGBoost): the data series is partitioned by two-step clustering, and an XGBoost prediction is made per cluster. Stage 2 (A-XGBoost): an ARIMA prediction is made, and its residuals are predicted by XGBoost. Stage 3 (C-A-XGBoost): the C-XGBoost and A-XGBoost predictions are combined.]


(3) For the A-XGBoost model, training set 2, with the samples of the 1st–277th days, is used to construct the ARIMA, and the validation set is used to calculate the residuals of the ARIMA forecast, which are used to train the A-XGBoost model. Then the test set is employed to verify the performance of the model.

(4) The test set contains the final 34 data samples, which are employed to fit the optimal combination weights for the C-XGBoost and A-XGBoost models.

4.2.2. Uniform Evaluation Indexes. Several performance measures have previously been applied to verifying the viability and effectiveness of forecasting models. As illustrated in Table 5, the common evaluation measurements are chosen to distinguish the optimal forecasting model. The smaller they are, the more accurate the model is.

4.2.3. Uniform Parameters of the XGBoost Model. The first priority for optimization is to tune max_depth and min_child_weight with the other parameters fixed, which is the most effective way of optimizing the XGBoost. The ranges of max_depth and min_child_weight are 6–10 and 1–6, respectively. The default values of the parameters are listed in Table 6.

4.3. Experiments with the C-A-XGBoost Model

4.3.1. C-XGBoost Model

(1) Step 1: Commodity clustering. The two-step clustering algorithm is first applied to training set 1. Standardization is applied to the continuous attributes; the noise percentage for outlier handling is 25%; the log-likelihood distance is the basis of distance measurement; and BIC is set as the clustering criterion.

As shown in Figure 4, the clustering series is partitioned into 12 homogeneous clusters based on the 11 features, denoted as C12_j (j = 1, 2, ..., 12), and the silhouette coefficient is 0.4.

As illustrated in Figure 5, the ratio of sizes is 2.64, and the percentage is neither too large nor too small for each cluster. Therefore, the cluster quality is acceptable.

(2) Step 2: Construct the C-XGBoost models. Features are first selected from each cluster C12_j of the 12 clusters based on the feature importance scores. After that, setting the selected features of each cluster and the SKU sales in Table 3 as the input and output varieties, respectively, a C-XGBoost model is constructed for each cluster C12_j, denoted as C12_j_XGBoost.

Take the cluster C12_3 among the 12 clusters as an example to illustrate the process of modeling the XGBoost.

For C12_3, the features listed in Table 3 are first filtered, and the 7 selected features are displayed in Figure 6. It can be observed that F1 (goods click), F3 (cart click), F5 (goods price), F6 (sales unique visitor), and F7 (original shop price) are the dominating factors, whereas F2 (temperature mean) and F4 (favorites click) contribute less to the prediction.

Setting the 11 features of the cluster C12_3 in Step 1 and the corresponding SKU sales in Table 3 as the input and output, respectively, the C12_3_XGBoost is pretrained under the default parameters in Table 6. For the prebuilt C12_3_XGBoost model, the value of ME is 0.393 and the value of MAE is 0.896.
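A minimal sketch of this per-cluster modeling, assuming the cluster labels from the clustering step are available as an array; the data, label assignments, and dictionary-of-models layout are illustrative, not the authors' code.

```python
import numpy as np
from xgboost import XGBRegressor

X = np.random.rand(2000, 11)             # stand-in selected features
y = np.random.rand(2000)                 # stand-in SKU sales
labels = np.random.randint(0, 12, 2000)  # stand-in cluster ids C12_j

# One C12_j_XGBoost model per resulting cluster, under default parameters
cluster_models = {
    j: XGBRegressor(n_estimators=100).fit(X[labels == j], y[labels == j])
    for j in np.unique(labels)
}
```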

(3) Step 3: Parameter optimization. XGBoost is a supervised learning algorithm, so the key to optimization is to determine the appropriate input and output variables.

Table 2: The description of the source data series.

- Customer behavior data (a): data date, goods click, cart click, favorites click
- Goods information data (b): goods id, SKU id (i), level, season, brand id
- Goods sales data (c): data date, SKU sales, goods price, original shop price
- The relationship between goods id and SKU id (d): goods id, SKU id
- Goods promotion price (e): data date, goods price, goods promotion price
- Marketing (f): data date, marketing plan
- Holidays (g): data date, holiday
- Temperature (h): data date, temperature mean

(a–f) The six data series are sourced from the historical data of the Saudi Arabian market on the Jollychic cross-border e-commerce trading platform (https://www.jollychic.com). (g) The data on holidays are captured from the URL http://shijian.cc/114jieri/2017. (h) The data on temperature are captured from the URL https://www.wunderground.com/weather/eg/saudi-arabia. (i) SKU's full name is stock keeping unit; each product has a unique SKU number.

Table 3: The description of the clustering series.

- Data date: date
- Goods code: goods code
- SKU code: SKU code
- SKU sales: sales of the SKU
- Goods price: selling price
- Original shop price: tag price
- Goods click: number of clicks on goods
- Cart click: number of clicks on purchasing carts
- Favorites click: number of clicks on favorites
- Sales unique visitor: number of unique visitors
- Goods season: seasonal attributes of goods
- Marketing: activity type code
- Plan: activity rhythm code
- Promotion: promotion code
- Holiday: the holiday of the day
- Temperature mean: mean of air temperatures (°F)


Table 6: Default parameter values of XGBoost.

number of estimators = 100; max_depth = 6; min_child_weight = 1; max_delta_step = 0; objective = reg:linear; subsample = 1; eta = 0.3; gamma = 0.1; colsample_bytree = 1; colsample_bylevel = 1; alpha = 0; lambda = 1; scale_pos_weight = 1.

Table 4: The description of the training set, validation set, and test set.

- Training set 1: clustering series; 50 weeks; Mar. 1, 2017 (Wed.) to Dec. 2, 2017 (Sat.); days 1–347.
- Training set 2: SKU code 94033; 50 weeks; Mar. 1, 2017 (Wed.) to Dec. 2, 2017 (Sat.); days 1–277.
- Validation set: SKU code 94033; 10 weeks; Dec. 3, 2017 (Sun.) to Feb. 10, 2018 (Sat.); days 278–347.
- Test set: SKU code 94033; 5 weeks; Feb. 11, 2018 (Sun.) to Mar. 16, 2018 (Fri.); days 348–381.

Table 5: The description of the evaluation indexes.

- ME = \frac{1}{b-a+1}\sum_{k=a}^{b}(y_k - \hat{y}_k): the mean error
- MSE = \frac{1}{b-a+1}\sum_{k=a}^{b}(y_k - \hat{y}_k)^2: the mean squared error
- RMSE = \sqrt{\frac{1}{b-a+1}\sum_{k=a}^{b}(y_k - \hat{y}_k)^2}: the root mean squared error
- MAE = \frac{1}{b-a+1}\sum_{k=a}^{b}|y_k - \hat{y}_k|: the mean absolute error

Here y_k is the sales of the k-th sample and \hat{y}_k denotes the corresponding prediction.
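A minimal sketch of the four indexes of Table 5 over a window of samples:

```python
import numpy as np

def evaluate(y_true, y_pred):
    # The four evaluation indexes of Table 5 over the samples a..b
    err = y_true - y_pred
    return {"ME": err.mean(),
            "MSE": (err ** 2).mean(),
            "RMSE": np.sqrt((err ** 2).mean()),
            "MAE": np.abs(err).mean()}
```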

[Figure 4: Model summary and cluster quality of the two-step clustering model: (a) the summary of the two-step clustering model (algorithm: two-step; inputs: 11; clusters: 12); (b) the silhouette coefficient of cohesion and separation for the 12 clusters, on a scale from −1.0 to 1.0 with poor, fair, and good bands.]

[Figure 5: Cluster sizes of the clustering series by the two-step clustering algorithm. The 12 clusters account for 6.4%, 9.3%, 6.2%, 11.0%, 5.7%, 6.9%, 13.5%, 12.7%, 9.3%, 8.2%, 5.1%, and 9.1% of the records; the smallest cluster holds 388,177 records (5.1%) and the largest 1,026,625 (13.5%), a ratio of 2.64.]


In contrast, parameter optimization has less impact on the accuracy of the algorithm. Therefore, in this paper, only the primary parameters, including max_depth and min_child_weight, are tuned to optimize the XGBoost [61]. The model can achieve a balanced point, because increasing the value of max_depth makes the model more complex and more likely to overfit, while increasing the value of min_child_weight makes the model more conservative.

The prebuilt C12_3_XGBoost model is optimized to minimize the ME and MAE by tuning max_depth (from 6 to 10) and min_child_weight (from 1 to 6) while the other parameters are fixed, in which the ranges of the parameters are determined according to numerous case studies with the XGBoost, such as [62]. The optimal parameter combination is determined by the minimum of the ME and MAE under the different parameter combinations.

Figure 7 shows the changes of the ME and MAE of the XGBoost as max_depth and min_child_weight change. It can be seen that both the ME and MAE are smallest when max_depth is 9 and min_child_weight is 2; that is, the model is then optimal.

(4) Step 4: Results on the test set. The test set is partitioned into the corresponding clusters by the two-step clustering model trained in Step 1. After that, Steps 2-3 are repeated for the test set.

As shown in Table 7, the test set is partitioned into the clusters C12_3 and C12_4. Then the corresponding models, C12_3_XGBoost and C12_4_XGBoost, are determined. C12_3_XGBoost has been trained and optimized as an example in Steps 2-3, and C12_4_XGBoost is likewise trained and optimized by repeating Steps 2-3. Finally, the prediction results are obtained by the optimized C12_3_XGBoost and C12_4_XGBoost.

As illustrated in Figure 8, the ME and MAE for C12_4_XGBoost change with the different values of max_depth and min_child_weight. The model performs best when max_depth is 10 and min_child_weight is 2, because both the ME and MAE are then smallest. The forecasting results for the test set are calculated and summarized in Table 7.

4.3.2. A-XGBoost Model

(1) Step 1: Test the stationarity and white noise of training set 2. For training set 2, the p values of the ADF test and Box–Pierce test are 0.01 and 3.331 × 10^−16, respectively, both lower than 0.05. Therefore, the time series is stationary and non-white noise, indicating that training set 2 is suitable for the ARIMA.
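These two tests could be run in Python with statsmodels (a sketch; the series is a stand-in for training set 2):

```python
import numpy as np
from statsmodels.tsa.stattools import adfuller
from statsmodels.stats.diagnostic import acorr_ljungbox

y = np.random.rand(277)  # stand-in for training set 2

adf_p = adfuller(y)[1]   # ADF test: p < 0.05 suggests stationarity
bp = acorr_ljungbox(y, lags=[10], boxpierce=True)
bp_p = float(bp["bp_pvalue"].iloc[0])  # Box-Pierce: p < 0.05 suggests non-white noise
print(adf_p, bp_p)
```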

(2) Step 2: Train the ARIMA model. According to Section 2.3, the parameter combinations are first determined by the ACF and PACF plots and by the auto.arima() function in the R package "forecast".

As shown in Figure 9(a), the SKU sales fluctuate significantly in the first 50 days compared with the sales after 50 days; in Figure 9(b), the plot of the ACF shows a pronounced trailing characteristic; and in Figure 9(c), the plot of the PACF shows a decreasing and oscillating pattern. Therefore, the first-order difference should be calculated.

As illustrated in Figure 10(a), the SKU sales fluctuate around zero after the first-order difference. Figures 10(b) and 10(c) present the plots of the ACF and PACF after the first-order difference, both of which show a decreasing and oscillating pattern. This indicates that training set 2 conforms to an ARMA process.

As a result, the possible optimal models are ARIMA(2,1,2), ARIMA(2,1,3), and ARIMA(2,1,4), according to the plots of the ACF and PACF in Figure 10.

Table 8 shows the AIC values of the ARIMA under different parameters, as generated by the auto.arima() function. It can be concluded that ARIMA(0,1,1) is the best of these models, because its AIC is the smallest.

To further determine the optimal model, the AIC and RMSE of the ARIMA models under different parameters are summarized in Table 9. The candidate models include the 3 possible optimal ARIMAs judged from Figure 10 and the best ARIMA generated by the auto.arima() function. According to the minimum principle, ARIMA(2,1,4) is optimal, because both its AIC and RMSE are the smallest.

[Figure 6: Feature importance scores of the C12_3_XGBoost model. The scores of the 7 selected features range from 254 to 1620; F1 (goods click), F3 (cart click), F5 (goods price), F6 (sales unique visitor), and F7 (original shop price) score high, while F2 (temperature mean) and F4 (favorites click) score lowest.]


[Figure 7: ME and MAE of C12_3_XGBoost under different parameters: (a) mean error of the training set; (b) mean absolute error of the training set, plotted against min_child_weight (1–6) for max_depth = 6–10.]

Table 7: The results of C-XGBoost for the test set.

- Days 348–372 (25 days), cluster C12_3, model C12_3_XGBoost, (max_depth, min_child_weight) = (9, 2): training set 1 ME 0.351, MAE 0.636; test set ME 4.385, MAE 4.400.
- Days 373–381 (9 days), cluster C12_4, model C12_4_XGBoost, (max_depth, min_child_weight) = (10, 2): training set 1 ME 0.339, MAE 0.591; test set ME 1.778, MAE 2.000.
- Days 348–381 (34 days), overall: test set ME 3.647, MAE 3.765.

[Figure 8: ME and MAE of C12_4_XGBoost under different parameters: (a) mean error of the training set; (b) mean absolute error of the training set, plotted against min_child_weight (1–6) for max_depth = 6–10.]


(3) Step 3: Calculate the residuals of the optimal ARIMA. The prediction results from the 278th to the 381st day are obtained by the trained ARIMA(2,1,4), denoted as ARIMA_forecast. Then the residuals between the prediction values ARIMA_forecast and the actual values SKU_sales are calculated, denoted as ARIMA_residuals.

(4) Step 4: Train the A-XGBoost by setting ARIMA_residuals as the input and output. As shown in equation (14), the output data are the 8th column of the matrix R, and the corresponding inputs are the residuals of the previous 7 days (columns 1 to 7):

R =
\begin{bmatrix}
r_{271} & r_{272} & \cdots & r_{277} & r_{278} \\
r_{272} & r_{273} & \cdots & r_{278} & r_{279} \\
\vdots & \vdots & & \vdots & \vdots \\
r_{339} & r_{340} & \cdots & r_{345} & r_{346} \\
r_{340} & r_{341} & \cdots & r_{346} & r_{347}
\end{bmatrix}_{70 \times 8}. \quad (14)

[Figure 10: Plots of (a) SKU sales by day, (b) ACF, and (c) PACF, after the first-order difference.]

[Figure 9: Plots of (a) SKU sales by day, (b) ACF, and (c) PACF.]


(5) Step 5: Calculate the predicted residuals of the test set using the A-XGBoost trained in Step 4, denoted as A-XGBoost_residuals. For the test set, the result of the 348th day is obtained by setting the ARIMA_residuals of the 341st–347th days as the input. Then the result of the 349th day is calculated by inputting the ARIMA_residuals of the 342nd–347th days and the A-XGBoost_residuals of the 348th day into the trained A-XGBoost. The process is repeated until the A-XGBoost_residuals of the 349th–381st days are obtained.
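The rolling step can be sketched as a loop that feeds each predicted residual back into the lag window; a_xgb is assumed to be a fitted regressor as in the earlier sketch, and the window length follows the 7-day lag of equation (14).

```python
import numpy as np

def rolling_residuals(a_xgb, last_window, horizon, k=7):
    # last_window: ARIMA residuals of the final k training days
    # (the 341st-347th here); each prediction is appended and reused.
    window = list(last_window)
    preds = []
    for _ in range(horizon):
        x = np.array(window[-k:]).reshape(1, -1)
        nxt = float(a_xgb.predict(x)[0])
        preds.append(nxt)
        window.append(nxt)
    return np.array(preds)

# A-XGBoost_forecast = ARIMA_forecast + rolling_residuals(...) per Step 6
```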

(6) Step 6: Calculate the final prediction results. For the test set, the final prediction results are calculated by summing the corresponding values of A-XGBoost_residuals and ARIMA_forecast, denoted as A-XGBoost_forecast. The results of the A-XGBoost are summarized in Table 10.

4.3.3. C-A-XGBoost Model. The optimal combination weights are determined by minimizing the MSE in equation (6). For the test set, the weights w_C and w_A are obtained from the matrix operation of equation (13), W = (B^T B)^{-1} B^T Y, giving w_C = 1.262 and w_A = 0.273.

4.4. Models for Comparison. In this section, the following models are chosen for comparison between the proposed models and other classical models:

- ARIMA: as one of the common time series models, it is used to predict the sales time series; the processes are the same as those of the ARIMA in Section 4.3.2.
- XGBoost: the XGBoost model is constructed and optimized by setting the selected features and the corresponding SKU sales as the input and output.
- C-XGBoost: taking the sales features of commodities into account, the XGBoost is used to forecast sales based on the clusters produced by the two-step clustering model; the procedures are the same as those in Section 4.3.1.
- A-XGBoost: the A-XGBoost is applied to revising the residuals of the ARIMA; namely, the ARIMA is first used to model the linear part of the time series, and then the XGBoost is used to model the nonlinear part; the relevant processes are described in Section 4.3.2.
- C-A-XGBoost: the model combines the advantages of the C-XGBoost and A-XGBoost; the procedures are displayed in Section 4.3.3.

4.5. Results of the Different Models. In this section, the test set is used to verify the superiority of the proposed C-A-XGBoost. Figure 11 shows the curve of the actual values SKU_sales and the five fitting curves of the predicted values from the 348th day to the 381st day, obtained by the ARIMA, XGBoost, C-XGBoost, A-XGBoost, and C-A-XGBoost.

It can be seen that the C-A-XGBoost has the best fitting performance to the original values, as its fitting curve is the most similar of the five to the curve of the actual values SKU_sales.

To further illustrate the superiority of the proposed C-A-XGBoost, the evaluation indexes mentioned in Section 4.2.2 are applied to distinguishing the best of the sales forecasting models. Table 11 provides a comparative summary of the indexes for the five models in Section 4.4.

According to Table 11, it can be concluded that the superiority of the proposed C-A-XGBoost is distinct compared with the other models, as all of its evaluation indexes are minimal.

The C-XGBoost is inferior to the C-A-XGBoost but outperforms the other three models, underlining that the C-XGBoost is superior to the single XGBoost.

The A-XGBoost performs better than the ARIMA, proving that the XGBoost is effective for the residual modification of the ARIMA.

According to the analysis above, the proposed C-A-XGBoost has the best forecasting performance for the sales of commodities in the cross-border e-commerce enterprise.

Table 10: The performance evaluation of the A-XGBoost.

Index                   Validation set    Test set
Minimum error           -0.003            -8.151
Maximum error           0.002             23.482
Mean error              0.000             1.213
Mean absolute error     0.001             4.566
Standard deviation      0.001             6.262
Linear correlation      1                 -0.154
Occurrences             70                34

Table 8: AIC values of the ARIMA models generated by the auto.arima() function.

ARIMA(2,1,2) with drift: AIC 2854.317
ARIMA(0,1,0) with drift: AIC 2983.036
ARIMA(1,1,0) with drift: AIC 2927.344
ARIMA(0,1,1) with drift: AIC 2851.296
ARIMA(0,1,0): AIC 2981.024
ARIMA(1,1,1) with drift: AIC 2852.543
ARIMA(0,1,2) with drift: AIC 2852.403
ARIMA(1,1,2) with drift: AIC 2852.172
ARIMA(0,1,1): AIC 2850.212
ARIMA(1,1,1): AIC 2851.586
ARIMA(0,1,2): AIC 2851.460
ARIMA(1,1,2): AIC 2851.120

Table 9: AIC values and RMSE of the ARIMA models under different parameters.

1. ARIMA(0,1,1): AIC 2850.170, RMSE 41.814
2. ARIMA(2,1,2): AIC 2852.980, RMSE 41.572
3. ARIMA(2,1,3): AIC 2854.940, RMSE 41.567
4. ARIMA(2,1,4): AIC 2848.850, RMSE 40.893


5. Conclusions and Future Directions

In this research, a new XGBoost-based forecasting model named C-A-XGBoost is proposed, which takes both the sales features and the tendency of the data series into account.

The C-XGBoost is first presented, combining clustering and the XGBoost, aiming at reflecting the sales features of commodities in the forecasting. The two-step clustering algorithm is applied to partitioning the data series into different clusters based on the selected features, which are used as the influencing factors for forecasting. After that, a corresponding C-XGBoost model is established for each cluster using the XGBoost.

The proposed A-XGBoost takes advantage of the ARIMA in predicting the tendency of the data series and overcomes the disadvantages of the ARIMA by applying the XGBoost to the nonlinear part of the data series. The optimal ARIMA is obtained by comparing the AICs under different parameters, and the trained ARIMA model is then used to predict the linear part of the data series. For the nonlinear part, a rolling prediction is conducted by the trained XGBoost, of which the input and output are the residuals from the ARIMA. The final results of the A-XGBoost are calculated by adding the residuals predicted by the XGBoost to the corresponding forecast values of the ARIMA.

In conclusion, the C-A-XGBoost is developed by assigning appropriate weights to the forecasting results of the C-XGBoost and A-XGBoost, so as to take advantage of their respective strengths. Consequently, a linear combination of the two models' forecasting results is calculated as the final predictive values.

To verify the effectiveness of the proposed C-A-XGBoost, the ARIMA, XGBoost, C-XGBoost, and A-XGBoost are employed for comparison. Meanwhile, four common evaluation indexes, including the ME, MSE, RMSE, and MAE, are utilized to check the forecasting performance of the C-A-XGBoost. The experiments demonstrate that the C-A-XGBoost outperforms the other models, indicating that the C-A-XGBoost provides theoretical support for the sales forecasts of the e-commerce company and can serve as a reference for selecting forecasting models. It is advisable for the e-commerce company to choose different forecasting models for different commodities instead of utilizing a single model.

Two potential extensions are put forward for future research. On the one hand, there may be no model in which all evaluation indexes are minimal, which makes choosing the optimal model difficult; a comprehensive evaluation index of forecasting performance will therefore be constructed to overcome this difficulty. On the other hand, sales forecasting is ultimately used to optimize inventory management, so some relevant factors should be considered, including inventory cost, order lead time, delivery time, and transportation time.

Data Availability

The data used to support the findings of this study are available from the corresponding author upon request.

[Figure 11: Comparison of the SKU sales with the predicted values of the five models in Section 4.4 from day 348 to day 381. The x-axis represents the day; the y-axis represents the sales of the SKU; curves with different colors represent the different models.]

Table 11: The performance evaluation of the ARIMA, XGBoost, A-XGBoost, C-XGBoost, and C-A-XGBoost.

Index    ARIMA      XGBoost    A-XGBoost    C-XGBoost    C-A-XGBoost
ME       -21.346    -3.588     1.213        3.647        0.288
MSE      480.980    36.588     39.532       23.353       10.769
RMSE     21.931     6.049      6.287        4.832        3.282
MAE      21.346     5.059      4.566        3.765        2.515


Conflicts of Interest

The authors declare that there are no conflicts of interest regarding the publication of this paper.

Acknowledgments

This research was supported by the National Key R&D Program of China through the China Development Research Foundation (CDRF), funded by the Ministry of Science and Technology (CDRF-SQ2017YFGH002106).

References

[1] Y. Jin, Data Science in Supply Chain Management: Data-Related Influences on Demand Planning, Proquest LLC, Ann Arbor, MI, USA, 2013.
[2] S. Akter and S. F. Wamba, "Big data analytics in e-commerce: a systematic review and agenda for future research," Electronic Markets, vol. 26, no. 2, pp. 173–194, 2016.
[3] J. L. Castle, M. P. Clements, and D. F. Hendry, "Forecasting by factors, by variables, by both or neither?," Journal of Econometrics, vol. 177, no. 2, pp. 305–319, 2013.
[4] A. Kawa, "Supply chains of cross-border e-commerce," in Proceedings of the Advanced Topics in Intelligent Information and Database Systems, Springer International Publishing, Kanazawa, Japan, April 2017.
[5] L. Song, T. Lv, X. Chen, and J. Gao, "Architecture of demand forecast for online retailers in China based on big data," in Proceedings of the International Conference on Human-Centered Computing, Springer, Colombo, Sri Lanka, January 2016.
[6] G. Iman, A. Ehsan, R. W. Gary, and A. Y. William, "An overview of energy demand forecasting methods published in 2005–2015," Energy Systems, vol. 8, no. 2, pp. 411–447, 2016.
[7] S. Gmbh, Forecasting with Exponential Smoothing, vol. 26, no. 1, Springer, Berlin, Germany, 2008.
[8] G. E. P. Box and G. M. Jenkins, "Time series analysis: forecasting and control," Journal of Time, vol. 31, no. 4, 303 pages, 2010.
[9] G. P. Zhang, "Time series forecasting using a hybrid ARIMA and neural network model," Neurocomputing, vol. 50, pp. 159–175, 2003.
[10] S. Ma, R. Fildes, and T. Huang, "Demand forecasting with high dimensional data: the case of SKU retail sales forecasting with intra- and inter-category promotional information," European Journal of Operational Research, vol. 249, no. 1, pp. 245–257, 2015.
[11] O. Gur Ali, S. SayIn, T. van Woensel, and J. Fransoo, "SKU demand forecasting in the presence of promotions," Expert Systems with Applications, vol. 36, no. 10, pp. 12340–12348, 2009.
[12] T. Huang, R. Fildes, and D. Soopramanien, "The value of competitive information in forecasting FMCG retail product sales and the variable selection problem," European Journal of Operational Research, vol. 237, no. 2, pp. 738–748, 2014.
[13] F. Cady, "Machine learning overview," in The Data Science Handbook, Wiley, Hoboken, NJ, USA, 2017.
[14] N. K. Ahmed, A. F. Atiya, N. E. Gayar, and H. El-Shishiny, "An empirical comparison of machine learning models for time series forecasting," Econometric Reviews, vol. 29, no. 5-6, pp. 594–621, 2010.
[15] A. P. Ansuj, M. E. Camargo, R. Radharamanan, and D. G. Petry, "Sales forecasting using time series and neural networks," Computers & Industrial Engineering, vol. 31, no. 1-2, pp. 421–424, 1996.
[16] I. Alon, M. Qi, and R. J. Sadowski, "Forecasting aggregate retail sales: a comparison of artificial neural networks and traditional methods," Journal of Retailing and Consumer Services, vol. 8, no. 3, pp. 147–156, 2001.
[17] G. Di Pillo, V. Latorre, S. Lucidi, and E. Procacci, "An application of support vector machines to sales forecasting under promotions," 4OR, vol. 14, no. 3, pp. 309–325, 2016.
[18] L. Wang, H. Zou, J. Su, L. Li, and S. Chaudhry, "An ARIMA-ANN hybrid model for time series forecasting," Systems Research and Behavioral Science, vol. 30, no. 3, pp. 244–259, 2013.
[19] S. Ji, H. Yu, Y. Guo, and Z. Zhang, "Research on sales forecasting based on ARIMA and BP neural network combined model," in Proceedings of the International Conference on Intelligent Information Processing, ACM, Wuhan, China, December 2016.
[20] J. Y. Choi and B. Lee, "Combining LSTM network ensemble via adaptive weighting for improved time series forecasting," Mathematical Problems in Engineering, vol. 2018, Article ID 2470171, 8 pages, 2018.
[21] K. Zhao and C. Wang, "Sales forecast in e-commerce using the convolutional neural network," 2017, https://arxiv.org/abs/1708.07946.
[22] K. Bandara, P. Shi, C. Bergmeir, H. Hewamalage, Q. Tran, and B. Seaman, "Sales demand forecast in e-commerce using a long short-term memory neural network methodology," 2019, https://arxiv.org/abs/1901.04028.
[23] X. Luo, C. Jiang, W. Wang, Y. Xu, J.-H. Wang, and W. Zhao, "User behavior prediction in social networks using weighted extreme learning machine with distribution optimization," Future Generation Computer Systems, vol. 93, pp. 1023–1035, 2018.
[24] L. Xiong, S. Jiankun, W. Long et al., "Short-term wind speed forecasting via stacked extreme learning machine with generalized correntropy," IEEE Transactions on Industrial Informatics, vol. 14, no. 11, pp. 4963–4971, 2018.
[25] J. L. Zhao, H. Zhu, and S. Zheng, "What is the value of an online retailer sharing demand forecast information?," Soft Computing, vol. 22, no. 16, pp. 5419–5428, 2018.
[26] G. Kulkarni, P. K. Kannan, and W. Moe, "Using online search data to forecast new product sales," Decision Support Systems, vol. 52, no. 3, pp. 604–611, 2012.
[27] A. Roy, "A novel multivariate fuzzy time series based forecasting algorithm incorporating the effect of clustering on prediction," Soft Computing, vol. 20, no. 5, pp. 1991–2019, 2016.
[28] R. J. Kuo and K. C. Xue, "A decision support system for sales forecasting through fuzzy neural networks with asymmetric fuzzy weights," Decision Support Systems, vol. 24, no. 2, pp. 105–126, 1998.
[29] P.-C. Chang, C.-H. Liu, and C.-Y. Fan, "Data clustering and fuzzy neural network for sales forecasting: a case study in printed circuit board industry," Knowledge-Based Systems, vol. 22, no. 5, pp. 344–355, 2009.
[30] C.-J. Lu and Y.-W. Wang, "Combining independent component analysis and growing hierarchical self-organizing maps with support vector regression in product demand forecasting," International Journal of Production Economics, vol. 128, no. 2, pp. 603–613, 2010.
[31] C.-J. Lu and L.-J. Kao, "A clustering-based sales forecasting scheme by using extreme learning machine and ensembling linkage methods with applications to computer server," Engineering Applications of Artificial Intelligence, vol. 55, pp. 231–238, 2016.
[32] W. Dai, Y.-Y. Chuang, and C.-J. Lu, "A clustering-based sales forecasting scheme using support vector regression for computer server," Procedia Manufacturing, vol. 2, pp. 82–86, 2015.
[33] I. F. Chen and C. J. Lu, "Sales forecasting by combining clustering and machine-learning techniques for computer retailing," Neural Computing and Applications, vol. 28, no. 9, pp. 2633–2647, 2016.
[34] T. Chiu, D. P. Fang, J. Chen, Y. Wang, and C. Jeris, "A robust and scalable clustering algorithm for mixed type attributes in a large database environment," in Proceedings of the Seventh ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, August 2001.
[35] T. Zhang, R. Ramakrishnan, and M. Livny, "Birch: a new data clustering algorithm and its applications," Data Mining and Knowledge Discovery, vol. 1, no. 2, pp. 141–182, 1997.
[36] L. Li, R. Situ, J. Gao, Z. Yang, and W. Liu, "A hybrid model combining convolutional neural network with XGBoost for predicting social media popularity," in Proceedings of the 2017 ACM on Multimedia Conference (MM '17), ACM, Mountain View, CA, USA, October 2017.
[37] J. Ke, H. Zheng, H. Yang, and X. Chen, "Short-term forecasting of passenger demand under on-demand ride services: a spatio-temporal deep learning approach," Transportation Research Part C: Emerging Technologies, vol. 85, pp. 591–608, 2017.
[38] K. Shimada, "Customer value creation in the information explosion era," in Proceedings of the 2014 Symposium on VLSI Technology, IEEE, Honolulu, HI, USA, June 2014.
[39] H. A. Abdelhafez, "Big data analytics: trends and case studies," in Encyclopedia of Business Analytics & Optimization, Association for Computing Machinery, New York, NY, USA, 2014.
[40] K. Kira and L. A. Rendell, "A practical approach to feature selection," Machine Learning Proceedings, vol. 48, no. 1, pp. 249–256, 1992.
[41] T. M. Khoshgoftaar, K. Gao, and L. A. Bullard, "A comparative study of filter-based and wrapper-based feature ranking techniques for software quality modeling," International Journal of Reliability, Quality and Safety Engineering, vol. 18, no. 4, pp. 341–364, 2011.

[42] M A Hall and L A Smith ldquoFeature selection for machinelearning comparing a correlation-based filter approach to thewrapperrdquo in Proceedings of the Twelfth International FloridaArtificial Intelligence Research Society Conference DBLPOrlando FL USA May 1999

[43] V A Huynh-+u Y Saeys L Wehenkel and P GeurtsldquoStatistical interpretation of machine learning-based featureimportance scores for biomarker discoveryrdquo Bioinformaticsvol 28 no 13 pp 1766ndash1774 2012

[44] M Sandri and P Zuccolotto ldquoA bias correction algorithm forthe Gini variable importance measure in classification treesrdquoJournal of Computational and Graphical Statistics vol 17no 3 pp 611ndash628 2008

[45] J Brownlee ldquoFeature importance and feature selection withxgboost in pythonrdquo 2016 httpsmachinelearningmasterycom

[46] N V Chawla S Eschrich and L O Hall ldquoCreating ensemblesof classifiersrdquo in Proceedings of the IEEE International Con-ference on Data Mining IEEE Computer Society San JoseCA USA November-December 2001

[47] A K Jain M N Murty and P J Flynn ldquoData clustering areviewrdquo ACM Computing Surveys vol 31 no 3 pp 264ndash3231999

[48] A Nagpal A Jatain and D Gaur ldquoReview based on dataclustering algorithmsrdquo in Proceedings of the IEEE Conferenceon Information amp Communication Technologies HainanChina September 2013

[49] Y Wang X Ma Y Lao and Y Wang ldquoA fuzzy-basedcustomer clustering approach with hierarchical structure forlogistics network optimizationrdquo Expert Systems with Appli-cations vol 41 no 2 pp 521ndash534 2014

[50] B Wang Y Miao H Zhao J Jin and Y Chen ldquoA biclus-tering-based method for market segmentation using customerpain pointsrdquo Engineering Applications of Artificial Intelligencevol 47 pp 101ndash109 2015

[51] M Halkidi Y Batistakis and M Vazirgiannis ldquoOn clusteringvalidation techniquesrdquo Journal of Intelligent InformationSystems Integrating Artificial Intelligence and DatabaseTechnologies vol 17 no 2-3 pp 107ndash145 2001

[52] R W Sembiring J M Zain and A Embong ldquoA comparativeagglomerative hierarchical clustering method to clusterimplemented courserdquo Journal of Computing vol 2 no 122010

[53] M Valipour M E Banihabib and S M R BehbahanildquoComparison of the ARIMA and the auto-regressive artificialneural network models in forecasting the monthly inflow ofthe dam reservoirrdquo Journal of Hydrology vol 476 2013

[54] E Erdem and J Shi ldquoArma based approaches for forecastingthe tuple of wind speed and directionrdquo Applied Energyvol 88 no 4 pp 1405ndash1414 2011

[55] R J Hyndman ldquoForecasting functions for time series and lin-ear modelsrdquo 2019 httpmirrorcostarsfucamirrorCRANwebpackagesforecastindexhtml

[56] S Aishwarya ldquoBuild high-performance time series modelsusing auto ARIMA in Python and Rrdquo 2018 httpswwwanalyticsvidhyacomblog201808auto-arima-time-series-modeling-python-r

[57] J H Friedman ldquoMachinerdquo Ae Annals of Statistics vol 29no 5 pp 1189ndash1232 2001

[58] T Chen and C Guestrin ldquoXgboost a scalable tree boostingsystemrdquo 2016 httpsarxivorgabs160302754

[59] A Gomez-Rıos J Luengo and F Herrera ldquoA study on thenoise label influence in boosting algorithms AdaBoost Gbmand XGBoostrdquo in Proceedings of the International Conferenceon Hybrid Artificial Intelligence Systems Logrontildeo Spain June2017

[60] J Wang C Lou R Yu J Gao and H Di ldquoResearch on hotmicro-blog forecast based on XGBOOSTand random forestrdquoin Proceedings of the 11th International Conference onKnowledge Science Engineering andManagement KSEM 2018pp 350ndash360 Changchun China August 2018

[61] C Li X Zheng Z Yang and L Kuang ldquoPredicting short-term electricity demand by combining the advantages ofARMA and XGBoost in fog computing environmentrdquoWireless Communications and Mobile Computing vol 2018Article ID 5018053 18 pages 2018

[62] A M Jain ldquoComplete guide to parameter tuning in XGBoostwith codes in Pythonrdquo 2016 httpswwwanalyticsvidhyacomblog201603complete-guide-parameter-tuning-xgboost-with-codes-python

Mathematical Problems in Engineering 15

Hindawiwwwhindawicom Volume 2018

MathematicsJournal of

Hindawiwwwhindawicom Volume 2018

Mathematical Problems in Engineering

Applied MathematicsJournal of

Hindawiwwwhindawicom Volume 2018

Probability and StatisticsHindawiwwwhindawicom Volume 2018

Journal of

Hindawiwwwhindawicom Volume 2018

Mathematical PhysicsAdvances in

Complex AnalysisJournal of

Hindawiwwwhindawicom Volume 2018

OptimizationJournal of

Hindawiwwwhindawicom Volume 2018

Hindawiwwwhindawicom Volume 2018

Engineering Mathematics

International Journal of

Hindawiwwwhindawicom Volume 2018

Operations ResearchAdvances in

Journal of

Hindawiwwwhindawicom Volume 2018

Function SpacesAbstract and Applied AnalysisHindawiwwwhindawicom Volume 2018

International Journal of Mathematics and Mathematical Sciences

Hindawiwwwhindawicom Volume 2018

Hindawi Publishing Corporation httpwwwhindawicom Volume 2013Hindawiwwwhindawicom

The Scientific World Journal

Volume 2018

Hindawiwwwhindawicom Volume 2018Volume 2018

Numerical AnalysisNumerical AnalysisNumerical AnalysisNumerical AnalysisNumerical AnalysisNumerical AnalysisNumerical AnalysisNumerical AnalysisNumerical AnalysisNumerical AnalysisNumerical AnalysisNumerical AnalysisAdvances inAdvances in Discrete Dynamics in

Nature and SocietyHindawiwwwhindawicom Volume 2018

Hindawiwwwhindawicom

Dierential EquationsInternational Journal of

Volume 2018

Hindawiwwwhindawicom Volume 2018

Decision SciencesAdvances in

Hindawiwwwhindawicom Volume 2018

AnalysisInternational Journal of

Hindawiwwwhindawicom Volume 2018

Stochastic AnalysisInternational Journal of

Submit your manuscripts atwwwhindawicom

Page 2: AnApplicationofaThree-StageXGBoost-BasedModeltoSales ...downloads.hindawi.com/journals/mpe/2019/8503252.pdf · AnApplicationofaThree-StageXGBoost-BasedModeltoSales ForecastingofaCross-BorderE-CommerceEnterprise

into account [10]. Therefore, univariate forecasting methods are usually adopted as benchmark models in many studies [11, 12].

Another important branch of forecasting is MLAs. Existing MLAs have been largely shaped by state-of-the-art forecasting techniques, which range from the artificial neural network (ANN), convolutional neural network (CNN), radial basis function (RBF), long short-term memory network (LSTM), and extreme learning machine (ELM) to support vector regression (SVR) [13].

On the one hand, some existing studies have compared MLAs with TSMs [14]. Ansuj et al. showed the superiority of the ANN over the ARIMA method in sales forecasting [15]. Alon et al. compared ANNs with traditional methods, including Winters exponential smoothing, the Box–Jenkins ARIMA model, and multivariate regression, indicating that ANNs perform favorably relative to the more traditional statistical methods [16]. Di Pillo et al. assessed the application of SVM to sales forecasting under promotion impacts, comparing it with ARIMA, Holt–Winters, and exponential smoothing [17].

On the other hand, MLAs built on TSMs have also been applied to sales prediction. Wang et al. proved the advantages of an integrated model combining ARIMA with ANN in modeling the linear and nonlinear parts of a data set [18]. In [19], an ARIMA forecasting model was established, and the residual of the ARIMA model was trained and fitted by a BP neural network. A novel LSTM ensemble forecasting algorithm was presented by Choi and Lee [20], which effectively combines multiple forecast results from a set of individual LSTM networks. In order to better handle irregular sales patterns and take various factors into account, some algorithms have been designed to exploit more information in sales forecasting, as an increasing amount of data becomes available in e-commerce. Zhao and Wang [21] provided a novel approach to learning effective features automatically from structured data using CNN. Bandara et al. attempted to incorporate sales demand patterns and cross-series information in a unified model by training an LSTM model [22]. More importantly, ELM has been widely applied in forecasting: Luo et al. [23] proposed a novel data-driven method to predict user behavior by using ELM with distribution optimization, and in [24], ELM was enhanced under a deep learning framework to forecast wind speed.

Although there are various methods of forecasting, the choice of method is determined by the characteristics of different goods [25]. Kulkarni et al. [26] argued that product characteristics can have an impact on both searching and sales, because the characteristics inherent to products are the main attributes that potential consumers are interested in. Therefore, to better reflect the characteristics of goods in sales forecasting, clustering techniques have been introduced into forecasting [27]. For example, in [28, 29], both fuzzy neural networks and clustering methods were used to improve the results of neural networks. Lu and Wang [30] constructed an SVR to deal with the demand forecasting problem with the aid of hierarchical self-organizing maps and independent component analysis. Lu and Kao [31] put forward a sales forecasting method based on clustering, using extreme learning machine and a combination linkage method. Dai et al. [32] built a clustering-based sales forecasting scheme based on SVR. A clustering-based forecasting model combining clustering and machine learning methods was developed by Chen and Lu [33] for computer retailing sales forecasting.

According to the above literature review, a three-stage XGBoost-based forecasting model is constructed in this study to focus on the two aspects mentioned above: the sales features and the tendency of a data series.

Firstly, in order to forecast the sales features, various influencing factors of sales are introduced through the two-step clustering algorithm [34], which is an improved algorithm based on BIRCH [35]. Then a C-XGBoost model based on clustering is presented to model each of the resulting clusters with the XGBoost algorithm, which has proved to be an efficient predictor in many data analysis contests, such as Kaggle, and in many recent studies [36, 37].

Secondly, to achieve higher predictive accuracy for the tendency of the data series, an A-XGBoost model is presented, integrating the strengths of the ARIMA and the XGBoost for the linear and nonlinear parts of the data series, respectively. Finally, a C-A-XGBoost model is constructed as the combination model by weighting the C-XGBoost and A-XGBoost, which takes both the multiple factors affecting the sales of goods and the trend of the time series into account.

The paper is organized into five sections, the rest of which is structured as follows. In Section 2, the key models and algorithms employed in the study are briefly described, including feature selection, the two-step clustering algorithm, a method of parameter determination for the ARIMA, and the XGBoost. In Section 3, a three-stage XGBoost-based model is proposed to forecast both the sales features and the tendency of the time series. In Section 4, numerical examples are used to illustrate the validity of the proposed forecasting model. In Section 5, the conclusions are summarized along with a note regarding future research directions.

2. Methodologies

2.1. Feature Selection. With the emergence of web technologies, there is an ever-increasing growth in the amount of big data in the e-commerce environment [38]. Variety is one of the critical attributes of big data, as they are generated from a wide variety of sources and formats, including text, web, tweet, audio, video, click-stream, and log files [39]. In order to remove most irrelevant and redundant information from such data, many techniques of feature selection (removing variables that are irrelevant) and feature extraction (applying transformations to the existing variables to obtain new ones) have been discussed to reduce the dimensionality of the data [40], including filter-based and wrapper feature selection. Wrapper feature selection employs a statistical resampling technique (such as cross-validation) as a subroutine of the actual learning algorithm to estimate the accuracy of feature subsets [41], which is the better choice when different algorithms model different data series. In contrast, filter-based feature selection is suitable when different algorithms model the same data series [42].

In this study, wrapper feature selection is directly applied in the forecasting and clustering algorithms to remove unimportant attributes from the multidimensional data, based on the standard deviation (SD), the coefficient of variation (CV), the Pearson correlation coefficient (PCC), and feature importance scores (FIS), the details of which are as follows.

SD reflects the degree of dispersion of a data set. It is calculated as \sigma, where N and \mu denote the number of samples and the mean value of the samples x_i, respectively:

\sigma = \sqrt{\frac{1}{N} \sum_{i=1}^{N} (x_i - \mu)^2}. \quad (1)

CV is a statistic that measures the degree of variation of the observed values in the data. It is calculated as c_v:

c_v = \frac{\sigma}{\mu}. \quad (2)

PCC is a statistic that reflects the degree of linear correlation between two variables. It is calculated as r:

r = \frac{1}{n-1} \sum_{i=1}^{n} \left( \frac{X_i - \bar{X}}{\sigma_X} \right) \left( \frac{Y_i - \bar{Y}}{\sigma_Y} \right), \quad (3)

where \bar{X} and \sigma_X denote the mean value and standard deviation of X_i, so that (X_i - \bar{X})/\sigma_X is the standard score of X_i; \bar{Y} and \sigma_Y are defined analogously.

FIS provides a score indicating how useful or valuable each feature is in the construction of the boosted decision trees within the model. The more an attribute is used to make key decisions in the decision trees, the higher its relative importance [43]. For a single decision tree, the importance of an attribute is calculated from the performance measure increase at each of its split points, weighted by the number of observations the node is responsible for. The performance measure may be a purity criterion, such as the Gini index [44], used to select the split points, or another, more specific error function. The feature importance is then averaged across all decision trees within the model [45].
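As an illustration, the following Python sketch shows how gain-based importance scores can be read from a trained XGBoost model. The DataFrame `df`, its column names, and the target `SKU_sales` are our illustrative assumptions, not the paper's code:

```python
import pandas as pd
import xgboost as xgb

# Hypothetical feature matrix and target; the column names are illustrative only.
features = ["goods_click", "cart_click", "favorites_click", "goods_price",
            "original_shop_price", "sales_unique_visitor", "temperature_mean"]
X, y = df[features], df["SKU_sales"]

# Fit a regressor, then read the gain-based feature importance scores.
model = xgb.XGBRegressor(n_estimators=100)
model.fit(X, y)

scores = model.get_booster().get_score(importance_type="gain")
for name, score in sorted(scores.items(), key=lambda kv: -kv[1]):
    print(f"{name}: {score:.1f}")
```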

2.2. Two-Step Clustering Algorithm. Clustering aims at partitioning samples into several disjoint subsets, making samples in the same subset highly similar to each other [46]. The most widely applied clustering algorithms can broadly be categorized as partition, hierarchical, density-based, grid-based, and model-based methods [47, 48].

The selection of a clustering algorithm mainly depends on the scale and the type of the collected data. Clustering can be conducted with traditional algorithms when dealing with purely numeric or categorical data [49, 50]. BIRCH, one of the hierarchical methods, introduced by Zhang et al. [35], is especially suitable for large data sets of continuous attributes [51]. For large and mixed data, however, the two-step clustering algorithm in SPSS Modeler is advised in this study. The two-step clustering algorithm is a modified method based on BIRCH that adopts the log-likelihood distance as its measure, which can quantify distances between continuous data as well as between categorical data [34]. Similar to BIRCH, the two-step clustering algorithm first performs a preclustering step of scanning the entire data set and storing the dense regions of data records in terms of summary statistics. A hierarchical clustering algorithm is then applied to cluster the dense regions. Apart from its ability to handle mixed types of attributes, the two-step clustering algorithm differs from BIRCH in automatically determining the appropriate number of clusters and in a new strategy of assigning cluster membership to noisy data.

As a hierarchical algorithm, the two-step clustering algorithm is also more efficient than partition algorithms in handling noise and outliers. More importantly, it has a unique advantage over other algorithms in its automatic mechanism for determining the optimal number of clusters. Therefore, with regard to the large and mixed transaction data sets of e-commerce, the two-step clustering algorithm is a reliable choice for clustering goods. Its key technologies and processes are illustrated in Figure 1.

2.2.1. Preclustering. The clustering feature (CF) tree growth of the BIRCH algorithm is used to read the data records one by one, during which outliers are handled. Subclusters C_j are then obtained from the data records in dense areas while the CF tree is generated.

2.2.2. Clustering. Taking the subclusters C_j as objects, the clusters C_J are obtained by merging the subclusters one by one with agglomerative hierarchical clustering methods [52], until the optimal number of clusters is determined by the minimum value of the Bayesian information criterion (BIC).

2.2.3. Cluster Membership Assignment. The data records are assigned to the nearest clusters by calculating the log-likelihood distance between the data records and the subclusters of the clusters C_J.

2.2.4. Validation of the Results. The performance of the clustering results is measured by the silhouette coefficient S, where a is the mean distance between a sample and the other members of its own cluster and b is the mean distance between the sample and the members of its nearest different cluster. The higher the value of S, the better the clustering result:

S = \frac{b - a}{\max(a, b)}. \quad (4)
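SPSS Modeler's TwoStep component is proprietary, but a rough open-source approximation of the same two-phase idea (CF-tree preclustering followed by agglomerative merging, validated by the silhouette coefficient) can be sketched in Python with scikit-learn. Note the caveats: scikit-learn handles numeric attributes only, so categorical fields would need encoding, and neither the log-likelihood distance nor the BIC-based automatic choice of K is reproduced; the threshold and K below are placeholders, not the paper's settings:

```python
import numpy as np
from sklearn.cluster import Birch, AgglomerativeClustering
from sklearn.metrics import silhouette_score
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 11))          # stand-in for the 11 clustering features
X = StandardScaler().fit_transform(X)    # standardize the continuous attributes

# Phase 1: CF-tree preclustering (BIRCH) produces dense subclusters.
pre = Birch(threshold=0.5, n_clusters=None).fit(X)
subcenters = pre.subcluster_centers_

# Phase 2: agglomerative merging of the subclusters into K clusters.
K = 12
merge = AgglomerativeClustering(n_clusters=K).fit(subcenters)
labels = merge.labels_[pre.predict(X)]   # map each record through its subcluster

print("silhouette coefficient:", silhouette_score(X, labels))
```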

2.3. Parameter Determination of the ARIMA Model. ARIMA models are obtained from a combination of autoregressive and moving average models [53]. The Box–Jenkins methodology in time series theory is applied to establish an ARIMA(p, d, q) model, and its calculation steps can be found in [54]. The ARIMA has limitations in determining parameters, because its parameters are usually determined from plots of the ACF and PACF, which often leads to judgment deviations. However, the function auto.arima() in the R package "forecast" [55] can automatically generate an optimal ARIMA model for each time series based on the smallest Akaike information criterion (AIC) and BIC [56], which makes up for this disadvantage of the ARIMA.

Therefore, a combined method of parameter determination is proposed to improve the fitting performance of the ARIMA, which merges the candidate models suggested by the ACF and PACF plots with those of the auto.arima() function. The procedures are illustrated in Figure 2 and described as follows.

Step 1. Test stationarity and white noise with the augmented Dickey–Fuller (ADF) and Box–Pierce tests before modeling. If both tests are passed, the ARIMA is suitable for the time series.

Step 2. Determine one set of candidate parameter combinations from the ACF and PACF plots, and another set with the auto.arima() function in R.

Step 3. Fit the ARIMA under the different parameter combinations, and calculate the AIC value of each model.

Step 4. Select the parameter combination whose ARIMA attains the minimum AIC.
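The paper carries out these steps in R; a comparable Python sketch, using statsmodels for the tests and the pmdarima port of auto.arima on an illustrative series `y`, might look as follows (the file name and search ranges are our assumptions):

```python
import numpy as np
from statsmodels.tsa.stattools import adfuller
from statsmodels.stats.diagnostic import acorr_ljungbox
import pmdarima as pm

y = np.loadtxt("sku_sales.txt")  # hypothetical daily sales series

# Step 1: stationarity (ADF) and white-noise (Box-Pierce) tests.
adf_p = adfuller(y)[1]
bp_p = acorr_ljungbox(y, lags=[10], boxpierce=True)["bp_pvalue"].iloc[0]
print(f"ADF p = {adf_p:.3f}, Box-Pierce p = {bp_p:.3g}")

# Steps 2-4: search (p, d, q) by minimum AIC, mirroring auto.arima().
model = pm.auto_arima(y, d=1, start_p=0, max_p=4, start_q=0, max_q=4,
                      seasonal=False, information_criterion="aic")
print(model.summary())
```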

2.4. XGBoost Algorithm. XGBoost is short for "eXtreme Gradient Boosting"; it is a scalable implementation of the gradient boosting framework proposed by Friedman [57]. As the basic theory of the XGBoost has been covered in plenty of previous papers [58, 59], the procedures of the algorithm [60] are described in this study rather than the basic theory.

2.4.1. Feature Selection. The specific steps of feature selection via the XGBoost are as follows: data cleaning, data feature extraction, and data feature selection based on the feature importance scores.

2.4.2. Model Training. The model is trained on the selected features with default parameters.

2.4.3. Parameter Optimization. Parameter optimization aims at minimizing the errors between the predicted and actual values. The parameters of the algorithm are described in Table 1.

The general steps for determining the hyperparameters of the XGBoost model are as follows:

Step 1. Tune the number of estimators first, with the learning rate and the other parameters fixed.

Step 2. Tune combinations of max_depth and min_child_weight.

Step 3. Tune max delta step and gamma to make the model more conservative, with the parameters determined in Steps 1 and 2 held fixed.

Step 4. Tune combinations of subsample and colsample_bytree to prevent overfitting.

Step 5. Increase the regularization parameters to make the model more conservative.

Step 6. Reduce the learning rate to prevent overfitting.
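A compact sketch of Step 2 of this procedure, offered as one plausible realization rather than the authors' exact code, is a grid search over max_depth and min_child_weight with the other parameters fixed (the training arrays here are synthetic stand-ins):

```python
import numpy as np
import xgboost as xgb
from sklearn.model_selection import GridSearchCV

# Stand-in training data; in the paper these are the selected features and SKU sales.
rng = np.random.default_rng(0)
X_train, y_train = rng.random((300, 7)), rng.random(300)

grid = GridSearchCV(
    estimator=xgb.XGBRegressor(n_estimators=100, learning_rate=0.3),
    param_grid={"max_depth": range(6, 11),         # 6-10, as in Section 4.2.3
                "min_child_weight": range(1, 7)},  # 1-6
    scoring="neg_mean_absolute_error",
    cv=3,
)
grid.fit(X_train, y_train)
print(grid.best_params_)
```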

[Figure 2: The procedures for parameter determination of the ARIMA model.]

[Figure 1: The key technologies and processes of the two-step clustering algorithm.]

3. The Proposed Three-Stage Forecasting Model

In this research, a three-stage XGBoost-based forecasting model, named the C-A-XGBoost model, is proposed in consideration of both the sales features and the tendency of the data series.

In Stage 1, a novel C-XGBoost model is put forward based on clustering and the XGBoost, which incorporates clustering features into forecasting as influencing factors. The two-step clustering algorithm is first applied to partition commodities into different clusters based on their features, and each of the resulting clusters is then modeled via the XGBoost.

In Stage 2, an A-XGBoost model is presented by combining the ARIMA with the XGBoost to predict the tendency of the time series, which exploits the linear fitting ability of the ARIMA and the strong nonlinear mapping ability of the XGBoost. The ARIMA is used to predict the linear part, and a rolling prediction method is employed to establish the XGBoost that revises the nonlinear part of the data series, namely, the residuals of the ARIMA.

In Stage 3, a combination model, named C-A-XGBoost, is constructed from the C-XGBoost and A-XGBoost. The C-A-XGBoost aims at minimizing the error sum of squares by assigning weights to the results of the C-XGBoost and A-XGBoost, where the weights reflect the reliability and credibility of the sales features and of the tendency of the data series.

The procedures of the proposed three-stage model are demonstrated in Figure 3, and the details are given as follows.

3.1. Stage 1: C-XGBoost Model. The two-step clustering algorithm is applied to cluster the data series into several disjoint clusters. Each of the resulting clusters is then set as the input and output sets to construct and optimize the corresponding C-XGBoost model. Finally, the testing samples are partitioned into the corresponding clusters by the trained two-step clustering model, and the prediction results are calculated by the corresponding trained C-XGBoost models.

3.2. Stage 2: A-XGBoost Model. The optimal ARIMA, chosen by the minimum AIC after the data series pass the tests of stationarity and white noise, is trained and determined as described in Section 2. Then the residual vector e = (r_1, r_2, \ldots, r_n)^T between the predicted and actual values is obtained from the trained ARIMA model. Next, the A-XGBoost is established by setting columns 1 to k and column (k+1) of the matrix R as the input and output, respectively, as illustrated in the following equation:

R = \begin{bmatrix} r_1 & r_2 & \cdots & r_k & r_{k+1} \\ r_2 & r_3 & \cdots & r_{k+1} & r_{k+2} \\ \vdots & \vdots & & \vdots & \vdots \\ r_{n-k-1} & r_{n-k} & \cdots & r_{n-2} & r_{n-1} \\ r_{n-k} & r_{n-k+1} & \cdots & r_{n-1} & r_n \end{bmatrix}_{(n-k) \times (k+1)}. \quad (5)

The final results for the test set are calculated by summing the predictions of the linear part by the trained ARIMA and the predictions of the residuals by the established XGBoost.
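A minimal sketch of this construction, assuming a residual array `resid` from the fitted ARIMA and the lag window k = 7 that the experiments later use, could be:

```python
import numpy as np
import xgboost as xgb

def lag_matrix(resid: np.ndarray, k: int) -> tuple[np.ndarray, np.ndarray]:
    """Build the (n-k) x (k+1) matrix of equation (5): the k lag columns
    are the input, and the (k+1)-th column is the output."""
    n = len(resid)
    R = np.stack([resid[i:i + k + 1] for i in range(n - k)])
    return R[:, :k], R[:, k]

resid = np.random.randn(77)               # stand-in for 77 ARIMA residuals
X, y = lag_matrix(resid, k=7)             # yields the 70 x 8 layout of eq. (14)

a_xgb = xgb.XGBRegressor(n_estimators=100)
a_xgb.fit(X, y)                           # A-XGBoost learns the nonlinear residual part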

3.3. Stage 3: C-A-XGBoost Model. In this stage, a combination strategy is explored to minimize the error sum of squares MSE in equation (6) by assigning the weights w_C and w_A to the C-XGBoost and A-XGBoost, respectively. The predicted results are calculated using equation (7), where \hat{Y}_{CA}(k), \hat{y}_C(k), and \hat{y}_A(k) denote the forecast values of the k-th sample by the C-A-XGBoost, C-XGBoost, and A-XGBoost, respectively, and y(k) in equation (6) is the actual value of the k-th sample.

Table 1: The description of parameters in the XGBoost model.

Type | Parameter | Description | Main purpose
Booster parameters | Max depth | Maximum depth of a tree | Increasing this value makes the model more complex and more likely to overfit.
Booster parameters | Min_child_weight | Minimum sum of weights in a child | The larger min_child_weight is, the more conservative the algorithm will be.
Booster parameters | Max delta step | Maximum delta step | It can help make the update step more conservative.
Booster parameters | Gamma | Minimum loss reduction | The larger gamma is, the more conservative the algorithm will be.
Booster parameters | Subsample | Subsample ratio of the training instances | It is used in the update to prevent overfitting.
Booster parameters | Colsample_bytree | Subsample ratio of columns for each tree | It is used in the update to prevent overfitting.
Booster parameters | Eta | Learning rate | Step size shrinkage used in the update; it can prevent overfitting.
Regularization parameters | Alpha, Lambda | Regularization terms on weights | Increasing these values makes the model more conservative.
Learning task parameters | Reg:linear | Learning objective | It is used to specify the learning task and the learning objective.
Command line parameters | Number of estimators | Number of estimators | It is used to specify the number of iterative calculations.


\min \; \mathrm{MSE} = \frac{1}{n} \sum_{k=1}^{n} \left[ \hat{Y}_{CA}(k) - y(k) \right]^2, \quad (6)

\hat{Y}_{CA}(k) = w_C \hat{y}_C(k) + w_A \hat{y}_A(k). \quad (7)

The least squares method is employed to find the optimal weights w_C and w_A, and the calculation is simplified by transforming the equations into the following matrix operations. In equation (8), the matrix B consists of the predicted values of the C-XGBoost and A-XGBoost; in equation (9), the matrix W consists of the weights; and in equation (10), the vector Y consists of the actual values. Equation (11) is the matrix form of equation (7); equation (12) is obtained by left-multiplying equation (11) by the transpose of B; and the optimal weights w_C and w_A are calculated according to equation (13):

B = \begin{bmatrix} \hat{y}_C(1) & \hat{y}_A(1) \\ \hat{y}_C(2) & \hat{y}_A(2) \\ \vdots & \vdots \\ \hat{y}_C(n) & \hat{y}_A(n) \end{bmatrix}, \quad (8)

W = \begin{bmatrix} w_C \\ w_A \end{bmatrix}, \quad (9)

Y = \begin{bmatrix} y(1) & y(2) & \cdots & y(n) \end{bmatrix}^{T}, \quad (10)

BW = Y, \quad (11)

B^{T} B W = B^{T} Y, \quad (12)

W = \left( B^{T} B \right)^{-1} B^{T} Y. \quad (13)
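In code, equation (13) is a single line of linear algebra. The sketch below (with placeholder prediction arrays of our own making) uses numpy's least-squares solver, which is numerically safer than forming (B^T B)^{-1} explicitly:

```python
import numpy as np

rng = np.random.default_rng(1)
y_true = rng.uniform(0, 30, size=34)           # actual sales of the 34 test days
y_c = y_true + rng.normal(0, 3, size=34)       # stand-in C-XGBoost predictions
y_a = y_true + rng.normal(0, 5, size=34)       # stand-in A-XGBoost predictions

B = np.column_stack([y_c, y_a])                # equation (8)
W, *_ = np.linalg.lstsq(B, y_true, rcond=None) # solves (13) in the least-squares sense
w_c, w_a = W
y_ca = w_c * y_c + w_a * y_a                   # equation (7)
print(f"w_C={w_c:.3f}, w_A={w_a:.3f}, MSE={np.mean((y_ca - y_true) ** 2):.3f}")
```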

4. Numerical Experiments and Comparisons

4.1. Data Description. To illustrate the effectiveness of the developed C-A-XGBoost model, the following data series are used to verify its forecasting performance.

4.1.1. Source Data Series. As listed in Table 2, there are eight data series in the source data, ranging from March 1, 2017, to March 16, 2018.

4.1.2. Clustering Series. There are 10 continuous attributes and 6 categorical attributes in the clustering series, which are obtained by reconstructing the source data series. The attributes of the clustering series are described in Table 3.

4.2. Uniform Experimental Conditions. To compare the performance of the proposed model against the evaluation indexes fairly, the following uniform experimental conditions are established.

4.2.1. Uniform Data Set. As shown in Table 4, the data series are partitioned into a training set, a validation set, and a test set so as to satisfy the requirements of the different models. The data are used as follows:

(1) The clustering series cover the samples of all 381 days.

(2) For the C-XGBoost model, training set 1, namely, the samples of the first 347 days in the clustering series, is used to establish the two-step clustering model, and the resulting clusters are used to construct the XGBoost models. The test set, with the remaining samples of 34 days, is selected to validate the C-XGBoost model: the test set is first partitioned into the corresponding clusters by the established two-step clustering model and is then applied to checking the validity of the corresponding C-XGBoost models.

[Figure 3: The procedures of the proposed three-stage model, from the two-step clustering and cluster-wise XGBoost prediction (Stage 1: C-XGBoost), through the ARIMA and residual XGBoost prediction (Stage 2: A-XGBoost), to the weighted combination prediction (Stage 3: C-A-XGBoost).]

(3) For the A-XGBoost model, training set 2, with the samples of the 1st–277th days, is used to construct the ARIMA, and the validation set is used to calculate the residuals of the ARIMA forecast, which are then used to train the A-XGBoost model. The test set is employed to verify the performance of the model.

(4) The test set holds the final 34 data samples, which are employed to fit the optimal combination weights for the C-XGBoost and A-XGBoost models.

4.2.2. Uniform Evaluation Indexes. Several performance measures have previously been applied to verify the viability and effectiveness of forecasting models. As illustrated in Table 5, common evaluation measurements are chosen to distinguish the optimal forecasting model: the smaller they are, the more accurate the model.

4.2.3. Uniform Parameters of the XGBoost Model. The first priority for optimization is to tune max_depth and min_child_weight with the other parameters fixed, which is the most effective way to optimize the XGBoost. The ranges of max_depth and min_child_weight are 6–10 and 1–6, respectively. The default parameter values are listed in Table 6.

4.3. Experiments with the C-A-XGBoost Model

4.3.1. C-XGBoost Model

(1) Step 1: Commodity clustering. The two-step clustering algorithm is first applied to training set 1. Standardization is applied to the continuous attributes, the noise percent of outlier handling is 25%, the log-likelihood distance is the basis of distance measurement, and BIC is set as the clustering criterion.

As shown in Figure 4, the clustering series are partitioned into 12 homogeneous clusters based on the 11 features, denoted as C12_j (j = 1, 2, ..., 12), and the silhouette coefficient is 0.4.

As illustrated in Figure 5, the ratio of cluster sizes (largest to smallest) is 2.64, and the percentage of records is neither too large nor too small for each cluster; therefore, the cluster quality is acceptable.

(2) Step 2: Construct the C-XGBoost models. Features are first selected from each cluster C12_j of the 12 clusters based on the feature importance scores. After that, setting the selected features of each cluster and the SKU sales in Table 3 as the input and output varieties, respectively, a C-XGBoost model is constructed for each cluster C12_j, denoted as C12_j_XGBoost.

Take the cluster C12_3 among the 12 clusters as an example to illustrate the process of modeling the XGBoost.

For C12_3, the features listed in Table 3 are first filtered, and the 7 selected features are displayed in Figure 6. It can be observed that F1 (goods click), F3 (cart click), F5 (goods price), F6 (sales unique visitor), and F7 (original shop price) are the dominating factors, whereas F2 (temperature mean) and F4 (favorites click) contribute less to the prediction.

Setting the 11 features of the cluster C12_3 in Step 1 and the corresponding SKU sales in Table 3 as the input and output, respectively, the C12_3_XGBoost is pretrained under the default parameters in Table 6. For the prebuilt C12_3_XGBoost model, the value of ME is 0.393 and the value of MAE is 0.896.

(3) Step 3: Parameter optimization. The XGBoost is a supervised learning algorithm, so the key to optimization is to determine the appropriate input and output variables.

Table 2: The description of the source data series.

Data series | Fields
Customer behavior data^a | Data date, goods click, cart click, favorites click
Goods information data^b | Goods id, SKU^i id, level, season, brand id
Goods sales data^c | Data date, SKU sales, goods price, original shop price
Relationship between goods id and SKU id^d | Goods id, SKU id
Goods promotion price^e | Data date, goods price, goods promotion price
Marketing^f | Data date, marketing plan
Holidays^g | Data date, holiday
Temperature^h | Data date, temperature mean

a–f: The six data series are sourced from the historical data of the Saudi Arabian market on the Jollychic cross-border e-commerce trading platform (https://www.jollychic.com). g: The data of holidays are captured from http://shijian.cc/114jieri/2017. h: The data of temperature are captured from https://www.wunderground.com/weather/eg/saudi-arabia. i: SKU stands for stock keeping unit; each product has a unique SKU number.

Table 3: The description of the clustering series.

Field | Meaning
Data date | Date
Goods code | Goods code
SKU code | SKU code
SKU sales | Sales of SKU
Goods price | Selling price
Original shop price | Tag price
Goods click | Number of clicks on goods
Cart click | Number of clicks on purchasing carts
Favorites click | Number of clicks on favorites
Sales unique visitor | Number of unique visitors
Goods season | Seasonal attributes of goods
Marketing | Activity type code
Plan | Activity rhythm code
Promotion | Promotion code
Holiday | The holiday of the day
Temperature mean | Mean of air temperatures (degrees F)

Table 6: Default parameter values of the XGBoost.

Parameter | Default value
Number of estimators | 100
Max depth | 6
Min_child_weight | 1
Max delta step | 0
Objective | reg:linear
Subsample | 1
Eta | 0.3
Gamma | 0.1
Colsample_bytree | 1
Colsample_bylevel | 1
Alpha | 0
Lambda | 1
Scale_pos_weight | 1

Table 4: The description of the training, validation, and test sets.

Data set | Samples | Number of weeks | Start date | End date | First day | Last day
Training set 1 | Clustering series | 50 | Mar 1, 2017 (Wed) | Dec 2, 2017 (Sat) | 1 | 347
Training set 2 | SKU code 94033 | 50 | Mar 1, 2017 (Wed) | Dec 2, 2017 (Sat) | 1 | 277
Validation set | SKU code 94033 | 10 | Dec 3, 2017 (Sun) | Feb 10, 2018 (Sat) | 278 | 347
Test set | SKU code 94033 | 5 | Feb 11, 2018 (Sun) | Mar 16, 2018 (Fri) | 348 | 381

Table 5: The description of the evaluation indexes.

Index | Expression | Description
ME | \mathrm{ME} = \frac{1}{b-a+1} \sum_{k=a}^{b} (y_k - \hat{y}_k) | The mean error
MSE | \mathrm{MSE} = \frac{1}{b-a+1} \sum_{k=a}^{b} (y_k - \hat{y}_k)^2 | The mean squared error
RMSE | \mathrm{RMSE} = \sqrt{\frac{1}{b-a+1} \sum_{k=a}^{b} (y_k - \hat{y}_k)^2} | The root mean squared error
MAE | \mathrm{MAE} = \frac{1}{b-a+1} \sum_{k=a}^{b} |y_k - \hat{y}_k| | The mean absolute error

Here y_k is the sales of the k-th sample, \hat{y}_k denotes the corresponding prediction, and a and b are the first and last sample indices of the evaluated set.
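These four indexes are straightforward to compute; a small sketch over a window of samples (here the whole array) might be:

```python
import numpy as np

def evaluation_indexes(y: np.ndarray, y_hat: np.ndarray) -> dict:
    """ME, MSE, RMSE, and MAE of Table 5 over the given samples."""
    err = y - y_hat
    return {
        "ME": err.mean(),
        "MSE": (err ** 2).mean(),
        "RMSE": np.sqrt((err ** 2).mean()),
        "MAE": np.abs(err).mean(),
    }

print(evaluation_indexes(np.array([3.0, 5.0, 4.0]), np.array([2.5, 6.0, 4.5])))
```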

[Figure 4: Model summary and cluster quality of the two-step clustering model. (a) Summary of the two-step clustering model (inputs: 11; clusters: 12). (b) Silhouette coefficient of cohesion and separation for the 12 clusters, on a poor/fair/good scale from -1.0 to 1.0.]

[Figure 5: Cluster sizes of the clustering series by the two-step clustering algorithm. The cluster percentages as labeled range from 5.1% to 13.5%; the smallest cluster holds 388,177 records (5.1%) and the largest 1,026,625 (13.5%), a ratio of sizes (largest to smallest) of 2.64.]

In contrast, parameter optimization has less impact on the accuracy of the algorithm. Therefore, in this paper, only the primary parameters, max_depth and min_child_weight, are tuned to optimize the XGBoost [61]. The model can reach a balanced point because increasing max_depth makes the model more complex and more likely to overfit, whereas increasing min_child_weight makes the model more conservative.

The prebuilt C12_3_XGBoost model is optimized to minimize ME and MAE by tuning max_depth (from 6 to 10) and min_child_weight (from 1 to 6) with the other parameters fixed; the parameter ranges are determined according to numerous case studies with the XGBoost, such as [62]. The optimal parameter combination is the one whose ME and MAE are minimal.

Figure 7 shows how ME and MAE change as max_depth and min_child_weight vary. Both ME and MAE are smallest when max_depth is 9 and min_child_weight is 2; that is, this model is optimal.

(4) Step 4: Results on the test set. The test set is partitioned into the corresponding clusters by the two-step clustering model trained in Step 1, and Steps 2 and 3 are then repeated for the test set.

As shown in Table 7, the test set is partitioned into the clusters C12_3 and C12_4, and the corresponding models C12_3_XGBoost and C12_4_XGBoost are determined. C12_3_XGBoost has been trained and optimized as the example in Steps 2 and 3, and C12_4_XGBoost is trained and optimized by repeating Steps 2 and 3. Finally, the prediction results are obtained from the optimized C12_3_XGBoost and C12_4_XGBoost.

As illustrated in Figure 8, ME and MAE for C12_4_XGBoost change with the different values of max_depth and min_child_weight; the model performs best when max_depth is 10 and min_child_weight is 2, where both ME and MAE are smallest. The forecasting results on the test set are summarized in Table 7.

4.3.2. A-XGBoost Model

(1) Step 1: Test the stationarity and white noise of training set 2. For training set 2, the p values of the ADF test and the Box–Pierce test are 0.01 and 3.331 x 10^-16, respectively, both lower than 0.05. Therefore, the time series is stationary and non-white-noise, indicating that training set 2 is suitable for the ARIMA.

(2) Step 2: Train the ARIMA model. Following Section 2.3, the candidate parameter combinations are first determined from the ACF and PACF plots and from the auto.arima() function in the R package "forecast".

As shown in Figure 9(a), the SKU sales fluctuate significantly in the first 50 days compared with the sales after 50 days; in Figure 9(b), the ACF plot shows a pronounced trailing characteristic; and in Figure 9(c), the PACF plot decreases and oscillates. Therefore, the first-order difference is taken.

As illustrated in Figure 10(a), the SKU sales fluctuate around zero after the first-order difference. Figures 10(b) and 10(c) present the ACF and PACF plots after the first-order difference, both of which decrease and oscillate, indicating that training set 2 conforms to an ARMA process.

From the ACF and PACF plots in Figure 10, the candidate optimal models are ARIMA(2, 1, 2), ARIMA(2, 1, 3), and ARIMA(2, 1, 4).

Table 8 shows the AIC values of the ARIMA models generated by the auto.arima() function under different parameters; by AIC, ARIMA(0, 1, 1) performs best among them.

To further determine the optimal model, the AIC and RMSE of the ARIMA models under different parameters are summarized in Table 9, covering the three candidate models judged from Figure 10 and the best ARIMA generated by auto.arima(). By the minimum principle, ARIMA(2, 1, 4) is optimal, since both its AIC and RMSE perform best.

[Figure 6: Feature importance scores of the C12_3_XGBoost model for the seven selected features: F1 (goods click), F2 (temperature mean), F3 (cart click), F4 (favorites click), F5 (goods price), F6 (sales unique visitor), and F7 (original shop price).]

[Figure 7: ME and MAE of C12_3_XGBoost under different parameters. (a) Mean error of the training set. (b) Mean absolute error of the training set. Curves are shown for max_depth = 6-10 as min_child_weight varies from 1 to 6.]

Table 7: The results of the C-XGBoost for the test set.

Test set (days) | Days | C12_j | Model | (max_depth, min_child_weight) | Training set ME | Training set MAE | Test set ME | Test set MAE
348th-372nd | 25 | 3 | C12_3_XGBoost | (9, 2) | 0.351 | 0.636 | 4.385 | 4.400
373rd-381st | 9 | 4 | C12_4_XGBoost | (10, 2) | 0.339 | 0.591 | 1.778 | 2.000
348th-381st | 34 | - | - | - | - | - | 3.647 | 3.765

[Figure 8: ME and MAE of C12_4_XGBoost under different parameters. (a) Mean error of the training set. (b) Mean absolute error of the training set. Curves are shown for max_depth = 6-10 as min_child_weight varies from 1 to 6.]

(3) Step 3: Calculate the residuals of the optimal ARIMA. The prediction results from the 278th to the 381st day are obtained with the trained ARIMA(2, 1, 4), denoted as ARIMA_forecast. Then the residuals between the predicted values ARIMA_forecast and the actual values SKU_sales are calculated, denoted as ARIMA_residuals.

(4) Step 4: Train the A-XGBoost by setting ARIMA_residuals as the input and output. As shown in equation (14), the output data are the 8th column of the matrix R, and the corresponding inputs are the residuals of the previous 7 days (columns 1 to 7):

R = \begin{bmatrix} r_{271} & r_{272} & \cdots & r_{277} & r_{278} \\ r_{272} & r_{273} & \cdots & r_{278} & r_{279} \\ \vdots & \vdots & & \vdots & \vdots \\ r_{339} & r_{340} & \cdots & r_{345} & r_{346} \\ r_{340} & r_{341} & \cdots & r_{346} & r_{347} \end{bmatrix}_{70 \times 8}. \quad (14)

[Figure 10: Plots of (a) the SKU sales over days, (b) the ACF, and (c) the PACF, after the first-order difference.]

[Figure 9: Plots of (a) the SKU sales over days, (b) the ACF, and (c) the PACF.]

(5) Step 5: Calculate the predicted residuals of the test set using the A-XGBoost trained in Step 4, denoted as A-XGBoost_residuals. For the test set, the result of the 348th day is obtained by setting ARIMA_residuals of the 341st-347th days as the input. The result of the 349th day is then calculated by feeding ARIMA_residuals of the 342nd-347th days together with A-XGBoost_residuals of the 348th day into the trained A-XGBoost. The process is repeated until A-XGBoost_residuals of the 349th-381st days are obtained.

(6) Step 6: Calculate the final prediction results. For the test set, the final prediction results, denoted as A-XGBoost_forecast, are calculated by summing the corresponding values of A-XGBoost_residuals and ARIMA_forecast. The results of the A-XGBoost are summarized in Table 10.
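The rolling scheme of Step 5 can be sketched as follows, where `a_xgb` is the model trained on equation (14)'s lag matrix and `hist` initially holds the last 7 ARIMA residuals of the validation set; the names are illustrative, not the paper's:

```python
import numpy as np

def rolling_residuals(a_xgb, hist: list[float], horizon: int) -> np.ndarray:
    """Predict `horizon` residuals one day at a time, feeding each
    prediction back into the 7-day input window."""
    preds = []
    window = list(hist)                      # last 7 known residuals
    for _ in range(horizon):
        r_hat = float(a_xgb.predict(np.array([window[-7:]]))[0])
        preds.append(r_hat)
        window.append(r_hat)                 # the forecast joins the window
    return np.array(preds)

# Final A-XGBoost forecast = ARIMA linear forecast + predicted residuals, e.g.:
# a_xgboost_forecast = arima_forecast + rolling_residuals(a_xgb, hist, 34)
```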

4.3.3. C-A-XGBoost Model. The optimal combination weights are determined by minimizing the MSE in equation (6). For the test set, the weights w_C and w_A are obtained from the matrix operation of equation (13), W = (B^T B)^{-1} B^T Y, yielding w_C = 1.262 and w_A = 0.273.

4.4. Models for Comparison. The following models are chosen for comparison between the proposed models and other classical models:

ARIMA: as one of the most common time series models, it is used to predict the sales sequence; the processes are the same as those of the ARIMA in Section 4.3.2.

XGBoost: the XGBoost model is constructed and optimized by setting the selected features and the corresponding SKU sales as the input and output.

C-XGBoost: taking the sales features of commodities into account, the XGBoost is used to forecast sales based on the clusters produced by the two-step clustering model; the procedures are the same as those in Section 4.3.1.

A-XGBoost: the A-XGBoost is applied to revising the residuals of the ARIMA; namely, the ARIMA first models the linear part of the time series, and the XGBoost then models the nonlinear part. The relevant processes are described in Section 4.3.2.

C-A-XGBoost: the model combines the advantages of the C-XGBoost and A-XGBoost; the procedures are displayed in Section 4.3.3.

4.5. Results of Different Models. In this section, the test set is used to verify the superiority of the proposed C-A-XGBoost.

Figure 11 shows the curve of the actual values SKU_sales and the five fitted curves of predicted values from the 348th to the 381st day, obtained by the ARIMA, XGBoost, C-XGBoost, A-XGBoost, and C-A-XGBoost.

It can be seen that the C-A-XGBoost has the best fitting performance to the original values, as its fitted curve is the most similar of the five to the curve of the actual SKU sales.

To further illustrate the superiority of the proposed C-A-XGBoost, the evaluation indexes mentioned in Section 4.2.2 are applied to identifying the best sales-forecasting model. Table 11 provides a comparative summary of the indexes for the five models in Section 4.4.

According to Table 11, the superiority of the proposed C-A-XGBoost over the other models is distinct, as all of its evaluation indexes are minimal.

The C-XGBoost is inferior to the C-A-XGBoost but outperforms the other three models, underlining that the C-XGBoost is superior to the single XGBoost.

The A-XGBoost performs better than the ARIMA, proving that the XGBoost is effective for the residual modification of the ARIMA.

According to the analysis above, the proposed C-A-XGBoost has the best forecasting performance for the sales of commodities in the cross-border e-commerce enterprise.

Table 10: The performance evaluation of the A-XGBoost.

Index | Validation set | Test set
Minimum error | -0.003 | -8.151
Maximum error | 0.002 | 23.482
Mean error | 0.000 | 1.213
Mean absolute error | 0.001 | 4.566
Standard deviation | 0.001 | 6.262
Linear correlation | 1 | -0.154
Occurrences | 70 | 34

Table 8: AIC values of the ARIMA models generated by the auto.arima() function.

ARIMA(p, d, q) | AIC
ARIMA(2, 1, 2) with drift | 2854.317
ARIMA(0, 1, 0) with drift | 2983.036
ARIMA(1, 1, 0) with drift | 2927.344
ARIMA(0, 1, 1) with drift | 2851.296
ARIMA(0, 1, 0) | 2981.024
ARIMA(1, 1, 1) with drift | 2852.543
ARIMA(0, 1, 2) with drift | 2852.403
ARIMA(1, 1, 2) with drift | 2852.172
ARIMA(0, 1, 1) | 2850.212
ARIMA(1, 1, 1) | 2851.586
ARIMA(0, 1, 2) | 2851.460
ARIMA(1, 1, 2) | 2851.120

Table 9: AIC values and RMSE of the ARIMA models under different parameters.

Model | ARIMA(p, d, q) | AIC | RMSE
1 | ARIMA(0, 1, 1) | 2850.170 | 41.814
2 | ARIMA(2, 1, 2) | 2852.980 | 41.572
3 | ARIMA(2, 1, 3) | 2854.940 | 41.567
4 | ARIMA(2, 1, 4) | 2848.850 | 40.893

5. Conclusions and Future Directions

In this research, a new XGBoost-based forecasting model named C-A-XGBoost is proposed, which takes both the sales features and the tendency of the data series into account.

The C-XGBoost is first presented, combining clustering and the XGBoost with the aim of reflecting the sales features of commodities in forecasting. The two-step clustering algorithm partitions the data series into different clusters based on the selected features, which serve as the influencing factors for forecasting. The corresponding C-XGBoost models are then established for the different clusters using the XGBoost.

The proposed A-XGBoost takes advantage of the ARIMA in predicting the tendency of the data series and overcomes the disadvantages of the ARIMA by applying the XGBoost to the nonlinear part of the series. The optimal ARIMA is obtained by comparing the AICs under different parameters, and the trained ARIMA then predicts the linear part of the data series. For the nonlinear part, a rolling prediction is conducted by the trained XGBoost, whose inputs and outputs are the residuals of the ARIMA. The final results of the A-XGBoost are calculated by adding the residuals predicted by the XGBoost to the corresponding forecast values of the ARIMA.

Finally, the C-A-XGBoost is developed by assigning appropriate weights to the forecasting results of the C-XGBoost and A-XGBoost so as to exploit their respective strengths, and a linear combination of the two models' forecasting results is calculated as the final predictive value.

To verify the effectiveness of the proposed C-A-XGBoost, the ARIMA, XGBoost, C-XGBoost, and A-XGBoost are employed for comparison, and four common evaluation indexes, ME, MSE, RMSE, and MAE, are utilized to check the forecasting performance. The experiments demonstrate that the C-A-XGBoost outperforms the other models, indicating that it can provide theoretical support for the sales forecasts of the e-commerce company and serve as a reference for selecting forecasting models. It is advisable for the e-commerce company to choose different forecasting models for different commodities instead of relying on a single model.

Two potential extensions are put forward for future research. On the one hand, there may be no model in which all evaluation indexes are minimal, which makes choosing the optimal model difficult; a comprehensive evaluation index of forecasting performance will therefore be constructed to overcome this difficulty. On the other hand, since sales forecasting ultimately serves inventory management, relevant factors should be considered, including inventory cost, order lead time, delivery time, and transportation time.

Data Availability

The data used to support the findings of this study are available from the corresponding author upon request.

[Figure 11: Comparison of the SKU sales with the predicted values of the five models in Section 4.4. The x-axis represents the day (348-381); the y-axis represents the SKU sales. Curves with different colors represent the actual SKU_sales and the predictions of the ARIMA, XGBoost, C-XGBoost, A-XGBoost, and C-A-XGBoost.]

Table 11: The performance evaluation of the ARIMA, XGBoost, A-XGBoost, C-XGBoost, and C-A-XGBoost.

Index | ARIMA | XGBoost | A-XGBoost | C-XGBoost | C-A-XGBoost
ME | -21.346 | -3.588 | 1.213 | 3.647 | 0.288
MSE | 480.980 | 36.588 | 39.532 | 23.353 | 10.769
RMSE | 21.931 | 6.049 | 6.287 | 4.832 | 3.282
MAE | 21.346 | 5.059 | 4.566 | 3.765 | 2.515


Conflicts of Interest

The authors declare that there are no conflicts of interest regarding the publication of this paper.

Acknowledgments

This research was supported by the National Key R&D Program of China through the China Development Research Foundation (CDRF), funded by the Ministry of Science and Technology (CDRF-SQ2017YFGH002106).

References

[1] Y. Jin, Data Science in Supply Chain Management: Data-Related Influences on Demand Planning, Proquest LLC, Ann Arbor, MI, USA, 2013.
[2] S. Akter and S. F. Wamba, "Big data analytics in e-commerce: a systematic review and agenda for future research," Electronic Markets, vol. 26, no. 2, pp. 173-194, 2016.
[3] J. L. Castle, M. P. Clements, and D. F. Hendry, "Forecasting by factors, by variables, by both or neither?," Journal of Econometrics, vol. 177, no. 2, pp. 305-319, 2013.
[4] A. Kawa, "Supply chains of cross-border e-commerce," in Proceedings of the Advanced Topics in Intelligent Information and Database Systems, Springer International Publishing, Kanazawa, Japan, April 2017.
[5] L. Song, T. Lv, X. Chen, and J. Gao, "Architecture of demand forecast for online retailers in China based on big data," in Proceedings of the International Conference on Human-Centered Computing, Springer, Colombo, Sri Lanka, January 2016.
[6] G. Iman, A. Ehsan, R. W. Gary, and A. Y. William, "An overview of energy demand forecasting methods published in 2005-2015," Energy Systems, vol. 8, no. 2, pp. 411-447, 2016.
[7] S. Gmbh, Forecasting with Exponential Smoothing, vol. 26, no. 1, Springer, Berlin, Germany, 2008.
[8] G. E. P. Box and G. M. Jenkins, "Time series analysis: forecasting and control," Journal of Time, vol. 31, no. 4, 303 pages, 2010.
[9] G. P. Zhang, "Time series forecasting using a hybrid ARIMA and neural network model," Neurocomputing, vol. 50, pp. 159-175, 2003.
[10] S. Ma, R. Fildes, and T. Huang, "Demand forecasting with high dimensional data: the case of SKU retail sales forecasting with intra- and inter-category promotional information," European Journal of Operational Research, vol. 249, no. 1, pp. 245-257, 2015.
[11] O. Gur Ali, S. SayIn, T. van Woensel, and J. Fransoo, "SKU demand forecasting in the presence of promotions," Expert Systems with Applications, vol. 36, no. 10, pp. 12340-12348, 2009.
[12] T. Huang, R. Fildes, and D. Soopramanien, "The value of competitive information in forecasting FMCG retail product sales and the variable selection problem," European Journal of Operational Research, vol. 237, no. 2, pp. 738-748, 2014.
[13] F. Cady, "Machine learning overview," in The Data Science Handbook, Wiley, Hoboken, NJ, USA, 2017.
[14] N. K. Ahmed, A. F. Atiya, N. E. Gayar, and H. El-Shishiny, "An empirical comparison of machine learning models for time series forecasting," Econometric Reviews, vol. 29, no. 5-6, pp. 594-621, 2010.
[15] A. P. Ansuj, M. E. Camargo, R. Radharamanan, and D. G. Petry, "Sales forecasting using time series and neural networks," Computers & Industrial Engineering, vol. 31, no. 1-2, pp. 421-424, 1996.
[16] I. Alon, M. Qi, and R. J. Sadowski, "Forecasting aggregate retail sales: a comparison of artificial neural networks and traditional methods," Journal of Retailing and Consumer Services, vol. 8, no. 3, pp. 147-156, 2001.
[17] G. Di Pillo, V. Latorre, S. Lucidi, and E. Procacci, "An application of support vector machines to sales forecasting under promotions," 4OR, vol. 14, no. 3, pp. 309-325, 2016.
[18] L. Wang, H. Zou, J. Su, L. Li, and S. Chaudhry, "An ARIMA-ANN hybrid model for time series forecasting," Systems Research and Behavioral Science, vol. 30, no. 3, pp. 244-259, 2013.
[19] S. Ji, H. Yu, Y. Guo, and Z. Zhang, "Research on sales forecasting based on ARIMA and BP neural network combined model," in Proceedings of the International Conference on Intelligent Information Processing, ACM, Wuhan, China, December 2016.
[20] J. Y. Choi and B. Lee, "Combining LSTM network ensemble via adaptive weighting for improved time series forecasting," Mathematical Problems in Engineering, vol. 2018, Article ID 2470171, 8 pages, 2018.
[21] K. Zhao and C. Wang, "Sales forecast in e-commerce using the convolutional neural network," 2017, https://arxiv.org/abs/1708.07946.
[22] K. Bandara, P. Shi, C. Bergmeir, H. Hewamalage, Q. Tran, and B. Seaman, "Sales demand forecast in e-commerce using a long short-term memory neural network methodology," 2019, https://arxiv.org/abs/1901.04028.
[23] X. Luo, C. Jiang, W. Wang, Y. Xu, J.-H. Wang, and W. Zhao, "User behavior prediction in social networks using weighted extreme learning machine with distribution optimization," Future Generation Computer Systems, vol. 93, pp. 1023-1035, 2018.
[24] L. Xiong, S. Jiankun, W. Long et al., "Short-term wind speed forecasting via stacked extreme learning machine with generalized correntropy," IEEE Transactions on Industrial Informatics, vol. 14, no. 11, pp. 4963-4971, 2018.
[25] J. L. Zhao, H. Zhu, and S. Zheng, "What is the value of an online retailer sharing demand forecast information?," Soft Computing, vol. 22, no. 16, pp. 5419-5428, 2018.
[26] G. Kulkarni, P. K. Kannan, and W. Moe, "Using online search data to forecast new product sales," Decision Support Systems, vol. 52, no. 3, pp. 604-611, 2012.
[27] A. Roy, "A novel multivariate fuzzy time series based forecasting algorithm incorporating the effect of clustering on prediction," Soft Computing, vol. 20, no. 5, pp. 1991-2019, 2016.
[28] R. J. Kuo and K. C. Xue, "A decision support system for sales forecasting through fuzzy neural networks with asymmetric fuzzy weights," Decision Support Systems, vol. 24, no. 2, pp. 105-126, 1998.
[29] P.-C. Chang, C.-H. Liu, and C.-Y. Fan, "Data clustering and fuzzy neural network for sales forecasting: a case study in printed circuit board industry," Knowledge-Based Systems, vol. 22, no. 5, pp. 344-355, 2009.
[30] C.-J. Lu and Y.-W. Wang, "Combining independent component analysis and growing hierarchical self-organizing maps with support vector regression in product demand forecasting," International Journal of Production Economics, vol. 128, no. 2, pp. 603-613, 2010.
[31] C.-J. Lu and L.-J. Kao, "A clustering-based sales forecasting scheme by using extreme learning machine and ensembling linkage methods with applications to computer server," Engineering Applications of Artificial Intelligence, vol. 55, pp. 231-238, 2016.
[32] W. Dai, Y.-Y. Chuang, and C.-J. Lu, "A clustering-based sales forecasting scheme using support vector regression for computer server," Procedia Manufacturing, vol. 2, pp. 82-86, 2015.
[33] I. F. Chen and C. J. Lu, "Sales forecasting by combining clustering and machine-learning techniques for computer retailing," Neural Computing and Applications, vol. 28, no. 9, pp. 2633-2647, 2016.
[34] T. Chiu, D. P. Fang, J. Chen, Y. Wang, and C. Jeris, "A robust and scalable clustering algorithm for mixed type attributes in a large database environment," in Proceedings of the Seventh ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, August 2001.
[35] T. Zhang, R. Ramakrishnan, and M. Livny, "Birch: a new data clustering algorithm and its applications," Data Mining and Knowledge Discovery, vol. 1, no. 2, pp. 141-182, 1997.
[36] L. Li, R. Situ, J. Gao, Z. Yang, and W. Liu, "A hybrid model combining convolutional neural network with XGBoost for predicting social media popularity," in Proceedings of the 2017 ACM on Multimedia Conference (MM '17), ACM, Mountain View, CA, USA, October 2017.
[37] J. Ke, H. Zheng, H. Yang, and X. Chen, "Short-term forecasting of passenger demand under on-demand ride services: a spatio-temporal deep learning approach," Transportation Research Part C: Emerging Technologies, vol. 85, pp. 591-608, 2017.
[38] K. Shimada, "Customer value creation in the information explosion era," in Proceedings of the 2014 Symposium on VLSI Technology, IEEE, Honolulu, HI, USA, June 2014.
[39] H. A. Abdelhafez, "Big data analytics: trends and case studies," in Encyclopedia of Business Analytics & Optimization, Association for Computing Machinery, New York, NY, USA, 2014.
[40] K. Kira and L. A. Rendell, "A practical approach to feature selection," Machine Learning Proceedings, vol. 48, no. 1, pp. 249-256, 1992.
[41] T. M. Khoshgoftaar, K. Gao, and L. A. Bullard, "A comparative study of filter-based and wrapper-based feature ranking techniques for software quality modeling," International Journal of Reliability, Quality and Safety Engineering, vol. 18, no. 4, pp. 341-364, 2011.
[42] M. A. Hall and L. A. Smith, "Feature selection for machine learning: comparing a correlation-based filter approach to the wrapper," in Proceedings of the Twelfth International Florida Artificial Intelligence Research Society Conference, DBLP, Orlando, FL, USA, May 1999.
[43] V. A. Huynh-Thu, Y. Saeys, L. Wehenkel, and P. Geurts, "Statistical interpretation of machine learning-based feature importance scores for biomarker discovery," Bioinformatics, vol. 28, no. 13, pp. 1766-1774, 2012.
[44] M. Sandri and P. Zuccolotto, "A bias correction algorithm for the Gini variable importance measure in classification trees," Journal of Computational and Graphical Statistics, vol. 17, no. 3, pp. 611-628, 2008.
[45] J. Brownlee, "Feature importance and feature selection with XGBoost in Python," 2016, https://machinelearningmastery.com.
[46] N. V. Chawla, S. Eschrich, and L. O. Hall, "Creating ensembles of classifiers," in Proceedings of the IEEE International Conference on Data Mining, IEEE Computer Society, San Jose, CA, USA, November-December 2001.
[47] A. K. Jain, M. N. Murty, and P. J. Flynn, "Data clustering: a review," ACM Computing Surveys, vol. 31, no. 3, pp. 264-323, 1999.
[48] A. Nagpal, A. Jatain, and D. Gaur, "Review based on data clustering algorithms," in Proceedings of the IEEE Conference on Information & Communication Technologies, Hainan, China, September 2013.
[49] Y. Wang, X. Ma, Y. Lao, and Y. Wang, "A fuzzy-based customer clustering approach with hierarchical structure for logistics network optimization," Expert Systems with Applications, vol. 41, no. 2, pp. 521-534, 2014.
[50] B. Wang, Y. Miao, H. Zhao, J. Jin, and Y. Chen, "A biclustering-based method for market segmentation using customer pain points," Engineering Applications of Artificial Intelligence, vol. 47, pp. 101-109, 2015.
[51] M. Halkidi, Y. Batistakis, and M. Vazirgiannis, "On clustering validation techniques," Journal of Intelligent Information Systems: Integrating Artificial Intelligence and Database Technologies, vol. 17, no. 2-3, pp. 107-145, 2001.
[52] R. W. Sembiring, J. M. Zain, and A. Embong, "A comparative agglomerative hierarchical clustering method to cluster implemented course," Journal of Computing, vol. 2, no. 12, 2010.
[53] M. Valipour, M. E. Banihabib, and S. M. R. Behbahani, "Comparison of the ARIMA and the auto-regressive artificial neural network models in forecasting the monthly inflow of the dam reservoir," Journal of Hydrology, vol. 476, 2013.
[54] E. Erdem and J. Shi, "ARMA based approaches for forecasting the tuple of wind speed and direction," Applied Energy, vol. 88, no. 4, pp. 1405-1414, 2011.
[55] R. J. Hyndman, "Forecasting functions for time series and linear models," 2019, http://mirror.costar.sfu.ca/mirror/CRAN/web/packages/forecast/index.html.
[56] S. Aishwarya, "Build high-performance time series models using auto ARIMA in Python and R," 2018, https://www.analyticsvidhya.com/blog/2018/08/auto-arima-time-series-modeling-python-r/.
[57] J. H. Friedman, "Greedy function approximation: a gradient boosting machine," The Annals of Statistics, vol. 29, no. 5, pp. 1189-1232, 2001.
[58] T. Chen and C. Guestrin, "XGBoost: a scalable tree boosting system," 2016, https://arxiv.org/abs/1603.02754.
[59] A. Gomez-Rios, J. Luengo, and F. Herrera, "A study on the noise label influence in boosting algorithms: AdaBoost, GBM and XGBoost," in Proceedings of the International Conference on Hybrid Artificial Intelligence Systems, Logrono, Spain, June 2017.
[60] J. Wang, C. Lou, R. Yu, J. Gao, and H. Di, "Research on hot micro-blog forecast based on XGBoost and random forest," in Proceedings of the 11th International Conference on Knowledge Science, Engineering and Management (KSEM 2018), pp. 350-360, Changchun, China, August 2018.
[61] C. Li, X. Zheng, Z. Yang, and L. Kuang, "Predicting short-term electricity demand by combining the advantages of ARMA and XGBoost in fog computing environment," Wireless Communications and Mobile Computing, vol. 2018, Article ID 5018053, 18 pages, 2018.
[62] A. M. Jain, "Complete guide to parameter tuning in XGBoost with codes in Python," 2016, https://www.analyticsvidhya.com/blog/2016/03/complete-guide-parameter-tuning-xgboost-with-codes-python/.

[35] T Zhang R Ramakrishnan and M Livny ldquoBirch a new dataclustering algorithm and its applicationsrdquo Data Mining andKnowledge Discovery vol 1 no 2 pp 141ndash182 1997

[36] L Li R Situ J Gao Z Yang and W Liu ldquoA hybrid modelcombining convolutional neural network with XGBoost forpredicting social media popularityrdquo in Proceedings of the 2017ACM on Multimedia ConferencemdashMM rsquo17 ACM MountainView CA USA October 2017

[37] J Ke H Zheng H Yang and X Chen ldquoShort-term fore-casting of passenger demand under on-demand ride servicesa spatio-temporal deep learning approachrdquo TransportationResearch Part C Emerging Technologies vol 85 pp 591ndash6082017

[38] K Shimada ldquoCustomer value creation in the informationexplosion erardquo in Proceedings of the 2014 Symposium on VLSITechnology IEEE Honolulu HI USA June 2014

[39] H A Abdelhafez ldquoBig data analytics trends and case studiesrdquoin Encyclopedia of Business Analytics amp Optimization Asso-ciation for ComputingMachinery New York NY USA 2014

[40] K Kira and L A Rendell ldquoA practical approach to featureselectionrdquo Machine Learning Proceedings vol 48 no 1pp 249ndash256 1992

[41] T M Khoshgoftaar K Gao and L A Bullard ldquoA com-parative study of filter-based and wrapper-based featureranking techniques for software quality modelingrdquo In-ternational Journal of Reliability Quality and Safety Engi-neering vol 18 no 4 pp 341ndash364 2011

[42] M A Hall and L A Smith ldquoFeature selection for machinelearning comparing a correlation-based filter approach to thewrapperrdquo in Proceedings of the Twelfth International FloridaArtificial Intelligence Research Society Conference DBLPOrlando FL USA May 1999

[43] V A Huynh-+u Y Saeys L Wehenkel and P GeurtsldquoStatistical interpretation of machine learning-based featureimportance scores for biomarker discoveryrdquo Bioinformaticsvol 28 no 13 pp 1766ndash1774 2012

[44] M Sandri and P Zuccolotto ldquoA bias correction algorithm forthe Gini variable importance measure in classification treesrdquoJournal of Computational and Graphical Statistics vol 17no 3 pp 611ndash628 2008

[45] J Brownlee ldquoFeature importance and feature selection withxgboost in pythonrdquo 2016 httpsmachinelearningmasterycom

[46] N V Chawla S Eschrich and L O Hall ldquoCreating ensemblesof classifiersrdquo in Proceedings of the IEEE International Con-ference on Data Mining IEEE Computer Society San JoseCA USA November-December 2001

[47] A K Jain M N Murty and P J Flynn ldquoData clustering areviewrdquo ACM Computing Surveys vol 31 no 3 pp 264ndash3231999

[48] A Nagpal A Jatain and D Gaur ldquoReview based on dataclustering algorithmsrdquo in Proceedings of the IEEE Conferenceon Information amp Communication Technologies HainanChina September 2013

[49] Y Wang X Ma Y Lao and Y Wang ldquoA fuzzy-basedcustomer clustering approach with hierarchical structure forlogistics network optimizationrdquo Expert Systems with Appli-cations vol 41 no 2 pp 521ndash534 2014

[50] B Wang Y Miao H Zhao J Jin and Y Chen ldquoA biclus-tering-based method for market segmentation using customerpain pointsrdquo Engineering Applications of Artificial Intelligencevol 47 pp 101ndash109 2015

[51] M Halkidi Y Batistakis and M Vazirgiannis ldquoOn clusteringvalidation techniquesrdquo Journal of Intelligent InformationSystems Integrating Artificial Intelligence and DatabaseTechnologies vol 17 no 2-3 pp 107ndash145 2001

[52] R W Sembiring J M Zain and A Embong ldquoA comparativeagglomerative hierarchical clustering method to clusterimplemented courserdquo Journal of Computing vol 2 no 122010

[53] M Valipour M E Banihabib and S M R BehbahanildquoComparison of the ARIMA and the auto-regressive artificialneural network models in forecasting the monthly inflow ofthe dam reservoirrdquo Journal of Hydrology vol 476 2013

[54] E Erdem and J Shi ldquoArma based approaches for forecastingthe tuple of wind speed and directionrdquo Applied Energyvol 88 no 4 pp 1405ndash1414 2011

[55] R J Hyndman ldquoForecasting functions for time series and lin-ear modelsrdquo 2019 httpmirrorcostarsfucamirrorCRANwebpackagesforecastindexhtml

[56] S Aishwarya ldquoBuild high-performance time series modelsusing auto ARIMA in Python and Rrdquo 2018 httpswwwanalyticsvidhyacomblog201808auto-arima-time-series-modeling-python-r

[57] J H Friedman ldquoMachinerdquo Ae Annals of Statistics vol 29no 5 pp 1189ndash1232 2001

[58] T Chen and C Guestrin ldquoXgboost a scalable tree boostingsystemrdquo 2016 httpsarxivorgabs160302754

[59] A Gomez-Rıos J Luengo and F Herrera ldquoA study on thenoise label influence in boosting algorithms AdaBoost Gbmand XGBoostrdquo in Proceedings of the International Conferenceon Hybrid Artificial Intelligence Systems Logrontildeo Spain June2017

[60] J Wang C Lou R Yu J Gao and H Di ldquoResearch on hotmicro-blog forecast based on XGBOOSTand random forestrdquoin Proceedings of the 11th International Conference onKnowledge Science Engineering andManagement KSEM 2018pp 350ndash360 Changchun China August 2018

[61] C Li X Zheng Z Yang and L Kuang ldquoPredicting short-term electricity demand by combining the advantages ofARMA and XGBoost in fog computing environmentrdquoWireless Communications and Mobile Computing vol 2018Article ID 5018053 18 pages 2018

[62] A M Jain ldquoComplete guide to parameter tuning in XGBoostwith codes in Pythonrdquo 2016 httpswwwanalyticsvidhyacomblog201603complete-guide-parameter-tuning-xgboost-with-codes-python

Mathematical Problems in Engineering 15

Hindawiwwwhindawicom Volume 2018

MathematicsJournal of

Hindawiwwwhindawicom Volume 2018

Mathematical Problems in Engineering

Applied MathematicsJournal of

Hindawiwwwhindawicom Volume 2018

Probability and StatisticsHindawiwwwhindawicom Volume 2018

Journal of

Hindawiwwwhindawicom Volume 2018

Mathematical PhysicsAdvances in

Complex AnalysisJournal of

Hindawiwwwhindawicom Volume 2018

OptimizationJournal of

Hindawiwwwhindawicom Volume 2018

Hindawiwwwhindawicom Volume 2018

Engineering Mathematics

International Journal of

Hindawiwwwhindawicom Volume 2018

Operations ResearchAdvances in

Journal of

Hindawiwwwhindawicom Volume 2018

Function SpacesAbstract and Applied AnalysisHindawiwwwhindawicom Volume 2018

International Journal of Mathematics and Mathematical Sciences

Hindawiwwwhindawicom Volume 2018

Hindawi Publishing Corporation httpwwwhindawicom Volume 2013Hindawiwwwhindawicom

The Scientific World Journal

Volume 2018

Hindawiwwwhindawicom Volume 2018Volume 2018

Numerical AnalysisNumerical AnalysisNumerical AnalysisNumerical AnalysisNumerical AnalysisNumerical AnalysisNumerical AnalysisNumerical AnalysisNumerical AnalysisNumerical AnalysisNumerical AnalysisNumerical AnalysisAdvances inAdvances in Discrete Dynamics in

Nature and SocietyHindawiwwwhindawicom Volume 2018

Hindawiwwwhindawicom

Dierential EquationsInternational Journal of

Volume 2018

Hindawiwwwhindawicom Volume 2018

Decision SciencesAdvances in

Hindawiwwwhindawicom Volume 2018

AnalysisInternational Journal of

Hindawiwwwhindawicom Volume 2018

Stochastic AnalysisInternational Journal of

Submit your manuscripts atwwwhindawicom

Page 3: AnApplicationofaThree-StageXGBoost-BasedModeltoSales ...downloads.hindawi.com/journals/mpe/2019/8503252.pdf · AnApplicationofaThree-StageXGBoost-BasedModeltoSales ForecastingofaCross-BorderE-CommerceEnterprise

series. Instead, filter-based feature selection is suitable for different algorithms modeling the same data series [42].

In this study, wrapper feature selection in the forecasting and clustering algorithms is directly applied to removing unimportant attributes from the multidimensional data, based on the standard deviation (SD), the coefficient of variation (CV), the Pearson correlation coefficient (PCC), and feature importance scores (FIS), the details of which are as follows.

SD reflects the degree of dispersion of a data set. It is calculated as σ in equation (1), where N and μ denote the number of samples and the mean value of the samples $x_i$, respectively:

$$\sigma = \sqrt{\frac{1}{N}\sum_{i=1}^{N}\left(x_i-\mu\right)^2}. \tag{1}$$

CV is a statistic measuring the degree of variation of the observed values in the data, calculated as

$$c_v = \frac{\sigma}{\mu}. \tag{2}$$

PCC is a statistic used to reflect the degree of linear correlation between two variables, calculated as

$$r = \frac{1}{n-1}\sum_{i=1}^{n}\left(\frac{X_i-\bar{X}}{\sigma_X}\right)\left(\frac{Y_i-\bar{Y}}{\sigma_Y}\right), \tag{3}$$

where $(X_i-\bar{X})/\sigma_X$, $\bar{X}$, and $\sigma_X$ represent the standard score, mean value, and standard deviation of $X_i$, respectively.

FIS provides a score indicating how useful or valuable each feature is in the construction of the boosted decision trees within the model. The more an attribute is used to make key decisions in the decision trees, the higher its relative importance [43]. For a single decision tree, the importance is calculated from the amount by which each attribute split point improves the performance measure, weighted by the number of observations the node is responsible for. The performance measure may be the purity (e.g., the Gini index [44]) used to select the split points, or another more specific error function. The feature importance is then averaged across all of the decision trees within the model [45].
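For illustration, the three filter statistics above can be computed per feature in a short routine. The following is a minimal sketch in Python; the helper name dispersion_stats and the target column name SKU_sales (taken from Table 3) are illustrative, not part of the original method:

```python
import pandas as pd

def dispersion_stats(df: pd.DataFrame, target: str = "SKU_sales") -> pd.DataFrame:
    """SD (equation (1)), CV (equation (2)), and PCC against the target
    (equation (3)) for every other numeric column of df."""
    y = df[target]
    rows = []
    for col in df.columns.drop(target):
        x = df[col]
        sd = x.std(ddof=0)  # population standard deviation, as in equation (1)
        rows.append({"feature": col,
                     "SD": sd,
                     "CV": sd / x.mean(),   # equation (2)
                     "PCC": x.corr(y)})     # Pearson correlation, equation (3)
    return pd.DataFrame(rows).set_index("feature")
```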

2.2. Two-Step Clustering Algorithm. Clustering aims at partitioning samples into several disjoint subsets, making samples in the same subset highly similar to each other [46]. The most widely applied clustering algorithms can broadly be categorized as partition, hierarchical, density-based, grid-based, and model-based methods [47, 48].

The selection of a clustering algorithm mainly depends on the scale and the type of the collected data. Clustering can be conducted using traditional algorithms when dealing with numeric or categorical data [49, 50]. BIRCH, one of the hierarchical methods, introduced by Zhang et al. [35], is especially suitable for large data sets of continuous attributes [51]. But in the case of large and mixed data, the two-step clustering algorithm in SPSS Modeler is advised in this study. The two-step clustering algorithm is a modified method based on BIRCH that sets the log-likelihood distance as the measure, which can quantify both the distance between continuous data and the distance between categorical data [34]. Similar to BIRCH, the two-step clustering algorithm first performs a preclustering step, scanning the entire data set and storing the dense regions of data records in terms of summary statistics. A hierarchical clustering algorithm is then applied to clustering the dense regions. Apart from the ability to handle mixed types of attributes, the two-step clustering algorithm differs from BIRCH in automatically determining the appropriate number of clusters and in a new strategy of assigning cluster membership to noisy data.

As one of the hierarchical algorithms, the two-step clustering algorithm is also more efficient than partition algorithms in handling noise and outliers. More importantly, it has a unique advantage over other algorithms in its automatic mechanism for determining the optimal number of clusters. Therefore, with regard to the large and mixed transaction data sets of e-commerce, the two-step clustering algorithm is a reliable choice for clustering goods; its key technologies and processes are illustrated in Figure 1.

2.2.1. Preclustering. The clustering feature (CF) tree growth of the BIRCH algorithm is used to read the data records one by one, in the process of which the handling of outliers is implemented. Subclusters C_j are then obtained from the data records in dense areas while the CF tree is generated.

2.2.2. Clustering. Taking the subclusters C_j as the objects, the clusters C_J are obtained by merging the subclusters one by one based on agglomerative hierarchical clustering methods [52], until the optimal number of clusters is determined by the minimum value of the Bayesian information criterion (BIC).

2.2.3. Cluster Membership Assignment. The data records are assigned to the nearest clusters by calculating the log-likelihood distance between the data records and the subclusters of the clusters C_J.

2.2.4. Validation of the Results. The performance of the clustering results is measured by the silhouette coefficient S in equation (4), where a is the mean distance between a sample and the other members of its own cluster and b is the mean distance between the sample and the members of its nearest neighboring cluster. The higher the value of S, the better the clustering result:

$$S = \frac{b-a}{\max(a,b)}. \tag{4}$$
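The two-step algorithm itself is specific to SPSS Modeler, but its BIRCH-style structure can be approximated in Python. The sketch below, under that stated approximation, uses scikit-learn's Birch for CF-tree preclustering and AgglomerativeClustering for the merging step, then validates with the silhouette coefficient of equation (4); unlike the two-step algorithm, the number of clusters is fixed here rather than chosen by BIC, and categorical fields would first need encoding:

```python
import numpy as np
from sklearn.cluster import AgglomerativeClustering, Birch
from sklearn.metrics import silhouette_score
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
X_num = rng.random((500, 10))          # placeholder for the 10 continuous attributes
X = StandardScaler().fit_transform(X_num)

# CF-tree preclustering followed by agglomerative merging of the
# subclusters, mirroring steps 2.2.1-2.2.2 (12 clusters as in Section 4.3.1).
model = Birch(threshold=0.5, n_clusters=AgglomerativeClustering(n_clusters=12))
labels = model.fit_predict(X)

print("silhouette S =", silhouette_score(X, labels))  # equation (4)
```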

2.3. Parameter Determination of the ARIMA Model. ARIMA models are obtained from a combination of autoregressive and moving average models [53]. The Box-Jenkins methodology in time series theory is applied to establish an ARIMA (p, d, q) model, and its calculation steps can be found in [54]. The ARIMA has limitations in determining its parameters, because they are usually read off plots of the ACF and PACF, which often leads to judgment deviations. However, the function auto.arima( ) in the R package "forecast" [55] can automatically generate an optimal ARIMA model for each time series based on the smallest Akaike information criterion (AIC) and BIC [56], which makes up for this disadvantage of the ARIMA.

Therefore, a combined method of parameter determination is proposed to improve the fitting performance of the ARIMA, which merges the results of the ACF and PACF plots with those of the auto.arima( ) function. The procedures are illustrated in Figure 2 and described as follows:

Step 1. Test stationarity and white noise by the augmented Dickey-Fuller (ADF) and Box-Pierce tests before modeling the ARIMA. If both the stationarity and white noise tests are passed, the ARIMA is suitable for the time series.
Step 2. Determine one part of the candidate parameter combinations from the ACF and PACF plots, and determine the other part with the auto.arima( ) function in R.
Step 3. Model the ARIMA under the different parameter combinations, and then calculate the AIC values of the different models.
Step 4. Determine the optimal parameter combination of the ARIMA as the one with the minimum AIC.
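A minimal Python sketch of this combined procedure follows (a recent statsmodels is assumed, so acorr_ljungbox returns a DataFrame; the series and the candidate orders, which echo those read off Figure 10 in Section 4.3.2, are placeholders):

```python
import numpy as np
import pandas as pd
from statsmodels.stats.diagnostic import acorr_ljungbox
from statsmodels.tsa.arima.model import ARIMA
from statsmodels.tsa.stattools import adfuller

y = pd.Series(np.random.default_rng(0).standard_normal(277).cumsum())  # placeholder series

# Step 1: stationarity (ADF) and white-noise (Box-Pierce) tests.
print("ADF p-value:", adfuller(y)[1])
print("Box-Pierce p-value:",
      acorr_ljungbox(y, lags=[10], boxpierce=True)["bp_pvalue"].iloc[0])

# Steps 2-4: candidates read off the ACF/PACF plots plus the order an
# auto-ARIMA search suggests; keep the fit with the smallest AIC.
candidates = [(2, 1, 2), (2, 1, 3), (2, 1, 4), (0, 1, 1)]
fits = {order: ARIMA(y, order=order).fit() for order in candidates}
best = min(fits, key=lambda order: fits[order].aic)
print("optimal order:", best, "AIC:", fits[best].aic)
```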

2.4. XGBoost Algorithm. XGBoost is short for "extreme gradient boosting," a scalable tree boosting system [58] rooted in the gradient boosting machine proposed by Friedman [57]. As the relevant basic theory of XGBoost has been covered in plenty of previous papers [58, 59], this study describes the procedures of the algorithm [60] rather than the basic theory.

2.4.1. Feature Selection. The specific steps of feature selection via XGBoost are as follows: data cleaning, feature extraction, and feature selection based on the scores of feature importance.

2.4.2. Model Training. The model is trained on the selected features with default parameters.

2.4.3. Parameter Optimization. Parameter optimization aims at minimizing the errors between the predicted values and the actual values. There are several types of parameters in the algorithm, of which the descriptions are listed in Table 1.

The general steps for determining the hyperparameters of the XGBoost model are as follows (see the sketch after this list):

Step 1. The number of estimators is tuned first to optimize XGBoost, with the learning rate and other parameters fixed.
Step 2. Different combinations of max_depth and min_child_weight are tuned to optimize XGBoost.
Step 3. Max_delta_step and gamma are tuned to make the model more conservative, with the parameters determined in Steps 1 and 2 held fixed.
Step 4. Different combinations of subsample and colsample_bytree are tuned to prevent overfitting.
Step 5. The regularization parameters are increased to make the model more conservative.
Step 6. The learning rate is reduced to prevent overfitting.
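One stage of this staged search can be sketched with the scikit-learn API of XGBoost. The data below are placeholders, and reg:squarederror is the current name of the reg:linear objective listed in Table 6:

```python
import numpy as np
from sklearn.model_selection import GridSearchCV
from xgboost import XGBRegressor

rng = np.random.default_rng(0)
X_train, y_train = rng.random((347, 7)), rng.random(347)   # placeholder data

model = XGBRegressor(objective="reg:squarederror", n_estimators=100,
                     learning_rate=0.3, gamma=0.1, random_state=0)

# Step 2 of the procedure: grid over max_depth and min_child_weight with
# everything else fixed (the ranges follow Section 4.2.3).
grid = GridSearchCV(model,
                    param_grid={"max_depth": list(range(6, 11)),
                                "min_child_weight": list(range(1, 7))},
                    scoring="neg_mean_absolute_error", cv=3)
grid.fit(X_train, y_train)
print(grid.best_params_)
```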

Figure 1: The key technologies and processes of the two-step clustering algorithm.

Figure 2: The procedures for parameter determination of the ARIMA model.

3. The Proposed Three-Stage Forecasting Model

In this research, a three-stage XGBoost-based forecasting model named C-A-XGBoost is proposed, in consideration of both the sales features and the tendency of the data series.

In Stage 1, a novel C-XGBoost model is put forward based on clustering and XGBoost, which incorporates clustering features into forecasting as influencing factors. The two-step clustering algorithm is first applied to partitioning the commodities into different clusters based on their features, and then each of the resulting clusters is modeled via XGBoost.

In Stage 2, an A-XGBoost model is presented, combining the ARIMA with XGBoost to predict the tendency of the time series; it takes advantage of the linear fitting ability of the ARIMA and the strong nonlinear mapping ability of XGBoost. The ARIMA is used to predict the linear part, and the rolling prediction method is employed to establish an XGBoost that revises the nonlinear part of the data series, namely, the residuals of the ARIMA.

In Stage 3, a combination model named C-A-XGBoost is constructed from C-XGBoost and A-XGBoost. The C-A-XGBoost aims at minimizing the error sum of squares by assigning weights to the results of C-XGBoost and A-XGBoost, in which the weights reflect the reliability and credibility of the sales features and the tendency of the data series.

The procedures of the proposed three-stage model are demonstrated in Figure 3, of which the details are given as follows.

3.1. Stage 1: C-XGBoost Model. The two-step clustering algorithm is applied to clustering the data series into several disjoint clusters. Each resulting cluster is then set as the input and output sets to construct and optimize the corresponding C-XGBoost model. Finally, the testing samples are partitioned into the corresponding clusters by the trained two-step clustering model, and the prediction results are calculated by the corresponding trained C-XGBoost models.

3.2. Stage 2: A-XGBoost Model. The optimal ARIMA, based on the minimum AIC after the data series passes the tests of stationarity and white noise, is trained and determined, of which the processes are described in Section 2. Then the residual vector $e = (r_1, r_2, \ldots, r_n)^{\mathrm{T}}$ between the predicted values and the actual values is obtained from the trained ARIMA model. Next, the A-XGBoost is established by setting columns 1 to k and column (k+1) of the matrix R as the input and output, respectively, as illustrated in the following equation:

$$R = \begin{bmatrix} r_1 & r_2 & \cdots & r_k & r_{k+1} \\ r_2 & r_3 & \cdots & r_{k+1} & r_{k+2} \\ \vdots & \vdots & & \vdots & \vdots \\ r_{n-k-1} & r_{n-k} & \cdots & r_{n-2} & r_{n-1} \\ r_{n-k} & r_{n-k+1} & \cdots & r_{n-1} & r_n \end{bmatrix}_{(n-k)\times(k+1)}. \tag{5}$$

The final results for the test set are calculated by summing the predictions of the linear part from the trained ARIMA and the predictions of the residuals from the established XGBoost.
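A sketch of this construction follows; the helper name lag_matrix is illustrative, and k = 7 matches the 70 × 8 matrix used later in Section 4.3.2:

```python
import numpy as np
from xgboost import XGBRegressor

def lag_matrix(residuals: np.ndarray, k: int = 7):
    """Arrange residuals into the (n-k) x (k+1) matrix R of equation (5):
    columns 1..k are the inputs, column k+1 is the output."""
    n = len(residuals)
    R = np.stack([residuals[i:i + k + 1] for i in range(n - k)])
    return R[:, :k], R[:, k]

# Placeholder for e = actual - ARIMA forecast on the validation window.
residuals = np.random.default_rng(0).standard_normal(77)
X_res, y_res = lag_matrix(residuals, k=7)        # 70 x 7 inputs, 70 outputs
a_xgb = XGBRegressor(objective="reg:squarederror").fit(X_res, y_res)
```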

3.3. Stage 3: C-A-XGBoost Model. In this stage, a combination strategy is explored to minimize the error sum of squares, MSE in equation (6), by assigning the weights $w_C$ and $w_A$ to C-XGBoost and A-XGBoost, respectively. The predicted results are calculated using equation (7), where $\hat{y}_C(k)$, $\hat{y}_A(k)$, and $\hat{Y}_{CA}(k)$ denote the forecast values of the k-th sample via C-XGBoost, A-XGBoost, and C-A-XGBoost, respectively. In equation (6), $y(k)$ is the actual value of the k-th sample.

Table 1: The description of the parameters in the XGBoost model.

Type | Parameter | Description | Main purpose
Booster parameters | max_depth | Maximum depth of a tree | Increasing this value makes the model more complex and more likely to overfit
Booster parameters | min_child_weight | Minimum sum of instance weights in a child | The larger min_child_weight is, the more conservative the algorithm will be
Booster parameters | max_delta_step | Maximum delta step | It can help make the update step more conservative
Booster parameters | gamma | Minimum loss reduction | The larger gamma is, the more conservative the algorithm will be
Booster parameters | subsample | Subsample ratio of the training instances | It is used in the update to prevent overfitting
Booster parameters | colsample_bytree | Subsample ratio of columns for each tree | It is used in the update to prevent overfitting
Booster parameters | eta | Learning rate | Step size shrinkage used in the update; it can prevent overfitting
Regularization parameters | alpha, lambda | Regularization terms on weights | Increasing these values makes the model more conservative
Learning task parameters | reg:linear | Learning objective | It specifies the learning task and the learning objective
Command line parameters | number of estimators | Number of estimators | It specifies the number of iterative calculations


$$\min \; \mathrm{MSE} = \frac{1}{n}\sum_{k=1}^{n}\left[\hat{Y}_{CA}(k) - y(k)\right]^2, \tag{6}$$

$$\hat{Y}_{CA}(k) = w_C\,\hat{y}_C(k) + w_A\,\hat{y}_A(k). \tag{7}$$

The least squares method is employed to find the optimal weights ($w_C$ and $w_A$), the calculation of which is simplified by transforming the equations into the following matrix operations. In equation (8), the matrix B consists of the predicted values of C-XGBoost and A-XGBoost. In equation (9), the matrix W consists of the weights. In equation (10), the matrix Y consists of the actual values. Equation (11) is obtained by rewriting equation (7) in matrix form. Equation (12) follows from equation (11) by left-multiplying both sides by the transpose of B. From equation (13), the optimal weights ($w_C$ and $w_A$) are calculated:

$$B = \begin{bmatrix} \hat{y}_C(1) & \hat{y}_A(1) \\ \hat{y}_C(2) & \hat{y}_A(2) \\ \vdots & \vdots \\ \hat{y}_C(n) & \hat{y}_A(n) \end{bmatrix}, \tag{8}$$

$$W = \begin{bmatrix} w_C \\ w_A \end{bmatrix}, \tag{9}$$

$$Y = \left[y(1), y(2), \ldots, y(n)\right]^{\mathrm{T}}, \tag{10}$$

$$BW = Y, \tag{11}$$

$$B^{\mathrm{T}}BW = B^{\mathrm{T}}Y, \tag{12}$$

$$W = \left(B^{\mathrm{T}}B\right)^{-1}B^{\mathrm{T}}Y. \tag{13}$$
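A short numpy sketch of the weight estimation follows; np.linalg.lstsq solves the normal equations of (12)-(13) in a numerically stable way, and the forecast vectors here are placeholders:

```python
import numpy as np

def combination_weights(y_c, y_a, y):
    """Solve W = (B^T B)^{-1} B^T Y (equations (8)-(13)) by least squares."""
    B = np.column_stack([y_c, y_a])             # equation (8)
    W, *_ = np.linalg.lstsq(B, y, rcond=None)   # stable form of the normal equations
    return W                                     # [w_C, w_A]

rng = np.random.default_rng(1)
y = rng.random(34)                               # 34-day test window, as in Section 4
y_c = y + 0.1 * rng.standard_normal(34)          # C-XGBoost forecasts (placeholders)
y_a = y + 0.2 * rng.standard_normal(34)          # A-XGBoost forecasts (placeholders)
w_c, w_a = combination_weights(y_c, y_a, y)
y_ca = w_c * y_c + w_a * y_a                     # equation (7)
```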

4. Numerical Experiments and Comparisons

4.1. Data Description. To illustrate the effectiveness of the developed C-A-XGBoost model, the following data series are used to verify its forecasting performance.

4.1.1. Source Data Series. As listed in Table 2, there are eight data series in the source data. The data series range from March 1, 2017, to March 16, 2018.

4.1.2. Clustering Series. There are 10 continuous attributes and 6 categorical attributes in the clustering series, which is obtained by reconstructing the source data series. The attribute descriptions of the clustering series are given in Table 3.

4.2. Uniform Experimental Conditions. To verify the performance of the proposed model according to the performance evaluation indexes, some uniform experimental conditions are established as follows.

4.2.1. Uniform Data Set. As shown in Table 4, the data series are partitioned into the training set, validation set, and test set so as to satisfy the requirements of the different models. The data application is described as follows:

(1) The clustering series covers samples of 381 days.
(2) For the C-XGBoost model, training set 1, namely, the samples of the first 347 days of the clustering series, is utilized to establish the two-step clustering models. The resulting samples of the two-step clustering are used to construct the XGBoost models. The test set, with the remaining samples of 34 days, is selected to validate the C-XGBoost model. In detail, the test set is first partitioned into the corresponding clusters by the established two-step clustering model, and then the test set is applied to checking the validity of the corresponding C-XGBoost models.

Figure 3: The procedures of the proposed three-stage model.


(3) For the A-XGBoost model, training set 2, with the samples of the 1st-277th days, is used to construct the ARIMA, and the validation set is used to calculate the residuals of the ARIMA forecast, which are then used to train the A-XGBoost model. The test set is employed to verify the performance of the model.
(4) The test set comprises the final 34 data samples, which are employed to fit the optimal combination weights for the C-XGBoost and A-XGBoost models.

4.2.2. Uniform Evaluation Indexes. Several performance measures have previously been applied to verifying the viability and effectiveness of forecasting models. As illustrated in Table 5, four common evaluation measurements are chosen to distinguish the optimal forecasting model. The smaller they are, the more accurate the model is.

4.2.3. Uniform Parameters of the XGBoost Model. The first priority in optimization is to tune max_depth and min_child_weight with the other parameters fixed, which is the most effective way of optimizing XGBoost. The ranges of max_depth and min_child_weight are 6-10 and 1-6, respectively. The default values of the parameters are listed in Table 6.

4.3. Experiments with the C-A-XGBoost Model

4.3.1. C-XGBoost Model

(1) Step 1: Commodity clustering. The two-step clustering algorithm is first applied to training set 1. Standardization is applied to the continuous attributes; the noise percent for outlier handling is 25%; the log-likelihood distance is the basis of distance measurement; and BIC is set as the clustering criterion.

As shown in Figure 4, the clustering series is partitioned into 12 homogeneous clusters based on 11 features, denoted as C12_j (j = 1, 2, ..., 12), and the silhouette coefficient is 0.4.

As illustrated in Figure 5, the ratio of sizes is 2.64, and the percentage of each cluster is neither too large nor too small. Therefore, the cluster quality is acceptable.

(2) Step 2: Construct the C-XGBoost models. Features are first selected for each cluster C12_j of the 12 clusters based on feature importance scores. After that, setting the selected features of each cluster and the SKU sales in Table 3 as the input and output variables, respectively, a C-XGBoost model is constructed for each cluster C12_j, denoted as C12_j_XGBoost.

Take the cluster C12_3 among the 12 clusters as an example to illustrate the process of modeling XGBoost.

For C12_3, the features listed in Table 3 are first filtered, and the 7 selected features are displayed in Figure 6. It can be observed that F1 (goods click), F3 (cart click), F5 (goods price), F6 (sales unique visitor), and F7 (original shop price) are the dominating factors, whereas F2 (temperature mean) and F4 (favorites click) contribute less to the prediction.

Setting the 11 features of the cluster C12_3 in Step 1 and the corresponding SKU sales in Table 3 as the input and output, respectively, the C12_3_XGBoost is pretrained under the default parameters in Table 6. For the prebuilt C12_3_XGBoost model, the value of ME is 0.393 and the value of MAE is 0.896.
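The per-cluster pretraining and feature ranking can be sketched as follows; the helper name train_cluster_model and the column names are illustrative, and the fixed parameters follow the Table 6 defaults:

```python
import pandas as pd
from xgboost import XGBRegressor

def train_cluster_model(cluster_df: pd.DataFrame, features, target="SKU_sales"):
    """Pretrain one C12_j_XGBoost with the Table 6 defaults, then rank the
    cluster's features by importance score, as in Figure 6."""
    model = XGBRegressor(objective="reg:squarederror", n_estimators=100,
                         max_depth=6, min_child_weight=1, learning_rate=0.3,
                         gamma=0.1)
    model.fit(cluster_df[features], cluster_df[target])
    scores = pd.Series(model.feature_importances_, index=features)
    return model, scores.sort_values(ascending=False)

# Usage: model, fis = train_cluster_model(df_c12_3, feature_list)
```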

(3) Step 3: Parameter optimization.

Table 2: The description of the source data series.

Data series | Fields
Customer behavior data (a) | Data date, goods click, cart click, favorites click
Goods information data (b) | Goods id, SKU id (i), level, season, brand id
Goods sales data (c) | Data date, SKU sales, goods price, original shop price
Relationship between goods id and SKU id (d) | Goods id, SKU id
Goods promote price (e) | Data date, goods price, goods promotion price
Marketing (f) | Data date, marketing plan
Holidays (g) | Data date, holiday
Temperature (h) | Data date, temperature mean

(a-f) The six data series are sourced from the historical data of the Saudi Arabian market on the Jollychic cross-border e-commerce trading platform (https://www.jollychic.com). (g) The holiday data are captured from http://shijian.cc/114jieri/2017. (h) The temperature data are captured from https://www.wunderground.com/weather/eg/saudi-arabia. (i) SKU is short for stock keeping unit; each product has a unique SKU number.

Table 3: The description of the clustering series.

Field | Meaning
Data date | Date
Goods code | Goods code
SKU code | SKU code
SKU sales | Sales of the SKU
Goods price | Selling price
Original shop price | Tag price
Goods click | Number of clicks on goods
Cart click | Number of clicks on purchasing carts
Favorites click | Number of clicks on favorites
Sales unique visitor | Number of unique visitors
Goods season | Seasonal attribute of goods
Marketing | Activity type code
Plan | Activity rhythm code
Promotion | Promotion code
Holiday | The holiday of the day
Temperature mean | Mean of air temperatures (°F)


Table 6: Default parameter values of XGBoost.

Parameter | Default value
Number of estimators | 100
max_depth | 6
min_child_weight | 1
max_delta_step | 0
Objective | reg:linear
subsample | 1
eta | 0.3
gamma | 0.1
colsample_bytree | 1
colsample_bylevel | 1
alpha | 0
lambda | 1
scale_pos_weight | 1

Table 4: The description of the training set, validation set, and test set.

Data set | Samples | Weeks | Start date | End date | First day | Last day
Training set 1 | Clustering series | 50 | Mar 1, 2017 (Wed) | Feb 10, 2018 (Sat) | 1 | 347
Training set 2 | SKU code 94033 | 40 | Mar 1, 2017 (Wed) | Dec 2, 2017 (Sat) | 1 | 277
Validation set | SKU code 94033 | 10 | Dec 3, 2017 (Sun) | Feb 10, 2018 (Sat) | 278 | 347
Test set | SKU code 94033 | 5 | Feb 11, 2018 (Sun) | Mar 16, 2018 (Fri) | 348 | 381

Table 5: The description of the evaluation indexes.

Index | Expression | Description
ME | $\mathrm{ME} = \frac{1}{b-a+1}\sum_{k=a}^{b}\left(y_k - \hat{y}_k\right)$ | The mean error
MSE | $\mathrm{MSE} = \frac{1}{b-a+1}\sum_{k=a}^{b}\left(y_k - \hat{y}_k\right)^2$ | The mean squared error
RMSE | $\mathrm{RMSE} = \sqrt{\frac{1}{b-a+1}\sum_{k=a}^{b}\left(y_k - \hat{y}_k\right)^2}$ | The root mean squared error
MAE | $\mathrm{MAE} = \frac{1}{b-a+1}\sum_{k=a}^{b}\left|y_k - \hat{y}_k\right|$ | The mean absolute error

Here $y_k$ is the sales of the k-th sample and $\hat{y}_k$ denotes the corresponding prediction.
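The four indexes of Table 5 can be computed jointly, as in this minimal sketch:

```python
import numpy as np

def evaluation_indexes(y: np.ndarray, y_hat: np.ndarray) -> dict:
    """ME, MSE, RMSE, and MAE over samples k = a..b, as defined in Table 5."""
    err = y - y_hat
    return {"ME": err.mean(),
            "MSE": (err ** 2).mean(),
            "RMSE": np.sqrt((err ** 2).mean()),
            "MAE": np.abs(err).mean()}
```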

Figure 4: Model summary and cluster quality of the two-step clustering model: (a) the summary of the two-step clustering model; (b) the silhouette coefficient of cohesion and separation for the 12 clusters.

Figure 5: Cluster sizes of the clustering series by the two-step clustering algorithm. The smallest cluster holds 388,177 records (5.1%) and the largest 1,026,625 records (13.5%), giving a ratio of sizes (largest to smallest) of 2.64.


XGBoost is a supervised learning algorithm, so the key to optimization is to determine the appropriate input and output variables; by contrast, parameter tuning has less impact on the accuracy of the algorithm. Therefore, in this paper, only the primary parameters, max_depth and min_child_weight, are tuned to optimize XGBoost [61]. The model can reach a balanced point, because increasing max_depth makes the model more complex and more likely to overfit, whereas increasing min_child_weight makes the model more conservative.

The prebuilt C12_3_XGBoost model is optimized to minimize ME and MAE by tuning max_depth (from 6 to 10) and min_child_weight (from 1 to 6) with the other parameters fixed; these ranges are set according to numerous case studies with XGBoost, such as [62]. The optimal parameter combination is the one that minimizes ME and MAE over the different combinations.

Figure 7 shows how the ME and MAE of the XGBoost change as max_depth and min_child_weight vary. Both ME and MAE are smallest when max_depth is 9 and min_child_weight is 2; that is, this model is optimal.

(4) Step 4: Results on the test set. The test set is partitioned into the corresponding clusters by the two-step clustering model trained in Step 1. After that, Steps 2-3 are repeated for the test set.

As shown in Table 7, the test set is partitioned into the clusters C12_3 and C12_4, and the corresponding models C12_3_XGBoost and C12_4_XGBoost are determined. C12_3_XGBoost has been trained and optimized as the example in Steps 2-3, and C12_4_XGBoost is trained and optimized likewise by repeating Steps 2-3. Finally, the prediction results are obtained by the optimized C12_3_XGBoost and C12_4_XGBoost.

As illustrated in Figure 8, the ME and MAE for C12_4_XGBoost change with the different values of max_depth and min_child_weight. The model performs best when max_depth is 10 and min_child_weight is 2, because both ME and MAE are smallest there. The forecasting results for the test set are calculated and summarized in Table 7.

4.3.2. A-XGBoost Model

(1) Step 1: Test stationarity and white noise of training set 2. For training set 2, the p values of the ADF test and the Box-Pierce test are 0.01 and 3.331 × 10^-16, respectively, both lower than 0.05. Therefore, the time series is stationary and non-white-noise, indicating that training set 2 is suitable for the ARIMA.

(2) Step 2: Train the ARIMA model. Following Section 2.3, the candidate parameter combinations are first determined from the ACF and PACF plots and from the auto.arima( ) function in the R package "forecast."

As shown in Figure 9(a), the SKU sales fluctuate significantly in the first 50 days compared with the sales afterwards; in Figure 9(b), the ACF plot shows a pronounced trailing pattern; in Figure 9(c), the PACF plot decreases while oscillating. Therefore, the first-order difference should be taken.

As illustrated in Figure 10(a), the SKU sales fluctuate around zero after the first-order difference. Figures 10(b) and 10(c) present the ACF and PACF plots after the first-order difference, both of which decrease while oscillating. This indicates that training set 2 conforms to an ARMA process.

As a result, the possible optimal models are ARIMA (2, 1, 2), ARIMA (2, 1, 3), and ARIMA (2, 1, 4), according to the ACF and PACF plots in Figure 10.

Table 8 shows the AIC values of the ARIMA models generated by the auto.arima( ) function under different parameters. Among them, ARIMA (0, 1, 1) is the best model, because its AIC is the smallest.

To further determine the optimal model, the AIC and RMSE of the ARIMA models under different parameters are summarized in Table 9. The candidates include the 3 possible optimal ARIMA models judged from Figure 10 and the best ARIMA generated by the auto.arima( ) function. By the minimum principle, ARIMA (2, 1, 4) is optimal, because both its AIC and its RMSE are the best.

Figure 6: Feature importance scores of the C12_3_XGBoost model. F1: goods click; F2: temperature mean; F3: cart click; F4: favorites click; F5: goods price; F6: sales unique visitor; F7: original shop price.


Figure 7: ME and MAE of C12_3_XGBoost under different parameters: (a) mean error of the training set; (b) mean absolute error of the training set.

Table 7: The results of C-XGBoost for the test set.

Test set days | Days | Cluster C12_j | Model | (max_depth, min_child_weight) | ME (training set 1) | MAE (training set 1) | ME (test set) | MAE (test set)
348th-372nd | 25 | 3 | C12_3_XGBoost | (9, 2) | 0.351 | 0.636 | 4.385 | 4.400
373rd-381st | 9 | 4 | C12_4_XGBoost | (10, 2) | 0.339 | 0.591 | 1.778 | 2.000
348th-381st | 34 | - | - | - | - | - | 3.647 | 3.765

Figure 8: ME and MAE of C12_4_XGBoost under different parameters: (a) mean error of the training set; (b) mean absolute error of the training set.


(3) Step 3: Calculate the residuals of the optimal ARIMA. The prediction results from the 278th to the 381st day are obtained with the trained ARIMA (2, 1, 4), denoted as ARIMA_forecast. Then the residuals between the predicted values ARIMA_forecast and the actual values SKU_sales are calculated, denoted as ARIMA_residuals.

(4) Step 4: Train the A-XGBoost by setting ARIMA_residuals as the input and output. As shown in equation (14), the output data are the 8th column of the matrix R, and the corresponding inputs are the residuals of the preceding 7 days (columns 1 to 7):

$$R = \begin{bmatrix} r_{271} & r_{272} & \cdots & r_{277} & r_{278} \\ r_{272} & r_{273} & \cdots & r_{278} & r_{279} \\ \vdots & \vdots & & \vdots & \vdots \\ r_{339} & r_{340} & \cdots & r_{345} & r_{346} \\ r_{340} & r_{341} & \cdots & r_{346} & r_{347} \end{bmatrix}_{70\times 8}. \tag{14}$$

Figure 9: Plots of (a) SKU sales over the days, (b) ACF, and (c) PACF.

Figure 10: Plots of (a) SKU sales over the days, (b) ACF, and (c) PACF after the first-order difference.


(5) Step 5: Calculate the predicted residuals of the test set using the A-XGBoost trained in Step 4, denoted as A-XGBoost_residuals. For the test set, the result of the 348th day is obtained by setting the ARIMA_residuals of the 341st-347th days as the input. The result of the 349th day is then calculated by feeding the ARIMA_residuals of the 342nd-347th days together with the A-XGBoost_residuals of the 348th day into the trained A-XGBoost. The process is repeated until the A-XGBoost_residuals of the 349th-381st days are obtained.
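A sketch of this rolling scheme follows; model is assumed to be an A-XGBoost fitted on the matrix of equation (14), and the helper name is illustrative:

```python
import numpy as np

def rolling_residual_forecast(model, last_window, horizon, k=7):
    """Step 5: roll the trained A-XGBoost forward; each predicted residual
    is appended to the window that feeds the next day's prediction."""
    window = list(last_window[-k:])   # ARIMA_residuals of days 341-347
    preds = []
    for _ in range(horizon):          # days 348-381 -> horizon = 34
        x = np.asarray(window[-k:], dtype=float).reshape(1, -1)
        r_hat = float(model.predict(x)[0])
        preds.append(r_hat)
        window.append(r_hat)          # predicted residual joins the window
    return np.asarray(preds)
```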

(6) Step 6: Calculate the final prediction results. For the test set, the final prediction results, denoted as A-XGBoost_forecast, are calculated by summing the corresponding values of A-XGBoost_residuals and ARIMA_forecast. The results of the A-XGBoost are summarized in Table 10.

4.3.3. C-A-XGBoost Model. The optimal combination weights are determined by minimizing the MSE in equation (6). For the test set, the weights $w_C$ and $w_A$ are obtained by the matrix operation of equation (13), $W = (B^{\mathrm{T}}B)^{-1}B^{\mathrm{T}}Y$, yielding $w_C = 1.262$ and $w_A = 0.273$.

4.4. Models for Comparison. In this section, the following models are chosen for comparison between the proposed models and other classical models:

ARIMA: as one of the most common time series models, the ARIMA is used to predict the sales sequence; the processes are the same as those of the ARIMA in Section 4.3.2.
XGBoost: the XGBoost model is constructed and optimized by setting the selected features and the corresponding SKU sales as the input and output.
C-XGBoost: taking the sales features of the commodities into account, XGBoost is used to forecast sales based on the clusters resulting from the two-step clustering algorithm; the procedures are the same as those in Section 4.3.1.
A-XGBoost: the A-XGBoost is applied to revising the residuals of the ARIMA; namely, the ARIMA is first used to model the linear part of the time series, and XGBoost is then used to model the nonlinear part; the relevant processes are described in Section 4.3.2.
C-A-XGBoost: this model combines the advantages of C-XGBoost and A-XGBoost; its procedures are displayed in Section 4.3.3.

4.5. Results of Different Models. In this section, the test set is used to verify the superiority of the proposed C-A-XGBoost. Figure 11 shows the curve of the actual values SKU_sales and the five fitting curves of the predicted values from the 348th day to the 381st day, obtained by the ARIMA, XGBoost, C-XGBoost, A-XGBoost, and C-A-XGBoost.

It can be seen that the C-A-XGBoost has the best fitting performance to the original values, as its fitting curve is the most similar of the five to the curve of the actual SKU sales.

To further illustrate the superiority of the proposed C-A-XGBoost, the evaluation indexes mentioned in Section 4.2.2 are applied to distinguishing the best sales forecasting model. Table 11 provides a comparative summary of the indexes for the five models in Section 4.4.

According to Table 11, the superiority of the proposed C-A-XGBoost is distinct compared with the other models, as all of its evaluation indexes are the smallest.

C-XGBoost is inferior to C-A-XGBoost but outperforms the other three models, underlining that C-XGBoost is superior to the single XGBoost.

A-XGBoost performs better than the ARIMA, proving that XGBoost is effective for the residual modification of the ARIMA.

According to the analysis above, the proposed C-A-XGBoost has the best forecasting performance for the sales of commodities in the cross-border e-commerce enterprise.

Table 10: The performance evaluation of the A-XGBoost.

Index | Validation set | Test set
Minimum error | -0.003 | -8.151
Maximum error | 0.002 | 23.482
Mean error | 0.000 | 1.213
Mean absolute error | 0.001 | 4.566
Standard deviation | 0.001 | 6.262
Linear correlation | 1 | -0.154
Occurrences | 70 | 34

Table 8: AIC values of the ARIMA models generated by the auto.arima( ) function.

ARIMA (p, d, q) | AIC
ARIMA (2, 1, 2) with drift | 2854.317
ARIMA (0, 1, 0) with drift | 2983.036
ARIMA (1, 1, 0) with drift | 2927.344
ARIMA (0, 1, 1) with drift | 2851.296
ARIMA (0, 1, 0) | 2981.024
ARIMA (1, 1, 1) with drift | 2852.543
ARIMA (0, 1, 2) with drift | 2852.403
ARIMA (1, 1, 2) with drift | 2852.172
ARIMA (0, 1, 1) | 2850.212
ARIMA (1, 1, 1) | 2851.586
ARIMA (0, 1, 2) | 2851.460
ARIMA (1, 1, 2) | 2851.120

Table 9: AIC values and RMSE of the ARIMA models under different parameters.

Model | ARIMA (p, d, q) | AIC | RMSE
1 | ARIMA (0, 1, 1) | 2850.170 | 41.814
2 | ARIMA (2, 1, 2) | 2852.980 | 41.572
3 | ARIMA (2, 1, 3) | 2854.940 | 41.567
4 | ARIMA (2, 1, 4) | 2848.850 | 40.893


5. Conclusions and Future Directions

In this research, a new XGBoost-based forecasting model named C-A-XGBoost is proposed, which takes both the sales features and the tendency of the data series into account.

The C-XGBoost is first presented, combining clustering and XGBoost and aiming at reflecting the sales features of the commodities in forecasting. The two-step clustering algorithm is applied to partitioning the data series into different clusters based on the selected features, which serve as the influencing factors for forecasting. After that, a corresponding C-XGBoost model is established for each cluster using XGBoost.

The proposed A-XGBoost takes advantage of the ARIMA in predicting the tendency of the data series and overcomes the disadvantages of the ARIMA by applying XGBoost to the nonlinear part of the data series. The optimal ARIMA is obtained by comparing the AICs under different parameters, and the trained ARIMA model is then used to predict the linear part of the data series. For the nonlinear part, rolling predictions are conducted by the trained XGBoost, of which the input and output are the residuals of the ARIMA. The final results of the A-XGBoost are calculated by adding the residuals predicted by the XGBoost to the corresponding forecast values of the ARIMA.

In conclusion, the C-A-XGBoost is developed by assigning appropriate weights to the forecasting results of the C-XGBoost and A-XGBoost so as to combine their respective strengths. Consequently, a linear combination of the two models' forecasting results is calculated as the final predicted values.

To verify the effectiveness of the proposed C-A-XGBoost, the ARIMA, XGBoost, C-XGBoost, and A-XGBoost are employed for comparison. Meanwhile, four common evaluation indexes, ME, MSE, RMSE, and MAE, are utilized to check the forecasting performance of the C-A-XGBoost. The experiments demonstrate that the C-A-XGBoost outperforms the other models, indicating that the C-A-XGBoost provides theoretical support for the sales forecasts of the e-commerce company and can serve as a reference for selecting forecasting models. It is advisable for the e-commerce company to choose different forecasting models for different commodities instead of utilizing a single model.

Two potential extensions are put forward for future research. On the one hand, there may be no model in which all evaluation indexes are minimal, which makes it difficult to choose the optimal model; therefore, a comprehensive evaluation index of forecasting performance will be constructed to overcome this difficulty. On the other hand, sales forecasting is ultimately used to optimize inventory management, so relevant factors should be considered, including inventory cost, order lead time, delivery time, and transportation time.

Data Availability

The data used to support the findings of this study are available from the corresponding author upon request.

Figure 11: Comparison of the SKU sales with the predicted values of the five models in Section 4.4. The x-axis represents the day; the y-axis represents the SKU sales; the curves with different colors represent the different models (SKU_sales, ARIMA, XGBoost, C-XGBoost, A-XGBoost, C-A-XGBoost).

Table 11: The performance evaluation of the ARIMA, XGBoost, A-XGBoost, C-XGBoost, and C-A-XGBoost.

Evaluation index | ARIMA | XGBoost | A-XGBoost | C-XGBoost | C-A-XGBoost
ME | -21.346 | -3.588 | 1.213 | 3.647 | 0.288
MSE | 480.980 | 36.588 | 39.532 | 23.353 | 10.769
RMSE | 21.931 | 6.049 | 6.287 | 4.832 | 3.282
MAE | 21.346 | 5.059 | 4.566 | 3.765 | 2.515


Conflicts of Interest

The authors declare that there are no conflicts of interest regarding the publication of this paper.

Acknowledgments

This research was supported by the National Key R&D Program of China through the China Development Research Foundation (CDRF), funded by the Ministry of Science and Technology (CDRF-SQ2017YFGH002106).

References

[1] Y. Jin, Data Science in Supply Chain Management: Data-Related Influences on Demand Planning, ProQuest LLC, Ann Arbor, MI, USA, 2013.
[2] S. Akter and S. F. Wamba, "Big data analytics in e-commerce: a systematic review and agenda for future research," Electronic Markets, vol. 26, no. 2, pp. 173-194, 2016.
[3] J. L. Castle, M. P. Clements, and D. F. Hendry, "Forecasting by factors, by variables, by both or neither?," Journal of Econometrics, vol. 177, no. 2, pp. 305-319, 2013.
[4] A. Kawa, "Supply chains of cross-border e-commerce," in Proceedings of the Advanced Topics in Intelligent Information and Database Systems, Springer International Publishing, Kanazawa, Japan, April 2017.
[5] L. Song, T. Lv, X. Chen, and J. Gao, "Architecture of demand forecast for online retailers in China based on big data," in Proceedings of the International Conference on Human-Centered Computing, Springer, Colombo, Sri Lanka, January 2016.
[6] G. Iman, A. Ehsan, R. W. Gary, and A. Y. William, "An overview of energy demand forecasting methods published in 2005-2015," Energy Systems, vol. 8, no. 2, pp. 411-447, 2016.
[7] S. Gmbh, Forecasting with Exponential Smoothing, vol. 26, no. 1, Springer, Berlin, Germany, 2008.
[8] G. E. P. Box and G. M. Jenkins, "Time series analysis: forecasting and control," Journal of Time, vol. 31, no. 4, 303 pages, 2010.
[9] G. P. Zhang, "Time series forecasting using a hybrid ARIMA and neural network model," Neurocomputing, vol. 50, pp. 159-175, 2003.
[10] S. Ma, R. Fildes, and T. Huang, "Demand forecasting with high dimensional data: the case of SKU retail sales forecasting with intra- and inter-category promotional information," European Journal of Operational Research, vol. 249, no. 1, pp. 245-257, 2015.
[11] O. Gur Ali, S. SayIn, T. van Woensel, and J. Fransoo, "SKU demand forecasting in the presence of promotions," Expert Systems with Applications, vol. 36, no. 10, pp. 12340-12348, 2009.
[12] T. Huang, R. Fildes, and D. Soopramanien, "The value of competitive information in forecasting FMCG retail product sales and the variable selection problem," European Journal of Operational Research, vol. 237, no. 2, pp. 738-748, 2014.
[13] F. Cady, "Machine learning overview," in The Data Science Handbook, Wiley, Hoboken, NJ, USA, 2017.
[14] N. K. Ahmed, A. F. Atiya, N. E. Gayar, and H. El-Shishiny, "An empirical comparison of machine learning models for time series forecasting," Econometric Reviews, vol. 29, no. 5-6, pp. 594-621, 2010.
[15] A. P. Ansuj, M. E. Camargo, R. Radharamanan, and D. G. Petry, "Sales forecasting using time series and neural networks," Computers & Industrial Engineering, vol. 31, no. 1-2, pp. 421-424, 1996.
[16] I. Alon, M. Qi, and R. J. Sadowski, "Forecasting aggregate retail sales: a comparison of artificial neural networks and traditional methods," Journal of Retailing and Consumer Services, vol. 8, no. 3, pp. 147-156, 2001.
[17] G. Di Pillo, V. Latorre, S. Lucidi, and E. Procacci, "An application of support vector machines to sales forecasting under promotions," 4OR, vol. 14, no. 3, pp. 309-325, 2016.
[18] L. Wang, H. Zou, J. Su, L. Li, and S. Chaudhry, "An ARIMA-ANN hybrid model for time series forecasting," Systems Research and Behavioral Science, vol. 30, no. 3, pp. 244-259, 2013.
[19] S. Ji, H. Yu, Y. Guo, and Z. Zhang, "Research on sales forecasting based on ARIMA and BP neural network combined model," in Proceedings of the International Conference on Intelligent Information Processing, ACM, Wuhan, China, December 2016.
[20] J. Y. Choi and B. Lee, "Combining LSTM network ensemble via adaptive weighting for improved time series forecasting," Mathematical Problems in Engineering, vol. 2018, Article ID 2470171, 8 pages, 2018.
[21] K. Zhao and C. Wang, "Sales forecast in e-commerce using the convolutional neural network," 2017, https://arxiv.org/abs/1708.07946.
[22] K. Bandara, P. Shi, C. Bergmeir, H. Hewamalage, Q. Tran, and B. Seaman, "Sales demand forecast in e-commerce using a long short-term memory neural network methodology," 2019, https://arxiv.org/abs/1901.04028.
[23] X. Luo, C. Jiang, W. Wang, Y. Xu, J.-H. Wang, and W. Zhao, "User behavior prediction in social networks using weighted extreme learning machine with distribution optimization," Future Generation Computer Systems, vol. 93, pp. 1023-1035, 2018.
[24] L. Xiong, S. Jiankun, W. Long et al., "Short-term wind speed forecasting via stacked extreme learning machine with generalized correntropy," IEEE Transactions on Industrial Informatics, vol. 14, no. 11, pp. 4963-4971, 2018.
[25] J. L. Zhao, H. Zhu, and S. Zheng, "What is the value of an online retailer sharing demand forecast information?," Soft Computing, vol. 22, no. 16, pp. 5419-5428, 2018.
[26] G. Kulkarni, P. K. Kannan, and W. Moe, "Using online search data to forecast new product sales," Decision Support Systems, vol. 52, no. 3, pp. 604-611, 2012.
[27] A. Roy, "A novel multivariate fuzzy time series based forecasting algorithm incorporating the effect of clustering on prediction," Soft Computing, vol. 20, no. 5, pp. 1991-2019, 2016.
[28] R. J. Kuo and K. C. Xue, "A decision support system for sales forecasting through fuzzy neural networks with asymmetric fuzzy weights," Decision Support Systems, vol. 24, no. 2, pp. 105-126, 1998.
[29] P.-C. Chang, C.-H. Liu, and C.-Y. Fan, "Data clustering and fuzzy neural network for sales forecasting: a case study in printed circuit board industry," Knowledge-Based Systems, vol. 22, no. 5, pp. 344-355, 2009.
[30] C.-J. Lu and Y.-W. Wang, "Combining independent component analysis and growing hierarchical self-organizing maps with support vector regression in product demand forecasting," International Journal of Production Economics, vol. 128, no. 2, pp. 603-613, 2010.
[31] C.-J. Lu and L.-J. Kao, "A clustering-based sales forecasting scheme by using extreme learning machine and ensembling linkage methods with applications to computer server," Engineering Applications of Artificial Intelligence, vol. 55, pp. 231-238, 2016.
[32] W. Dai, Y.-Y. Chuang, and C.-J. Lu, "A clustering-based sales forecasting scheme using support vector regression for computer server," Procedia Manufacturing, vol. 2, pp. 82-86, 2015.
[33] I. F. Chen and C. J. Lu, "Sales forecasting by combining clustering and machine-learning techniques for computer retailing," Neural Computing and Applications, vol. 28, no. 9, pp. 2633-2647, 2016.
[34] T. Chiu, D. P. Fang, J. Chen, Y. Wang, and C. Jeris, "A robust and scalable clustering algorithm for mixed type attributes in a large database environment," in Proceedings of the Seventh ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, August 2001.
[35] T. Zhang, R. Ramakrishnan, and M. Livny, "Birch: a new data clustering algorithm and its applications," Data Mining and Knowledge Discovery, vol. 1, no. 2, pp. 141-182, 1997.
[36] L. Li, R. Situ, J. Gao, Z. Yang, and W. Liu, "A hybrid model combining convolutional neural network with XGBoost for predicting social media popularity," in Proceedings of the 2017 ACM on Multimedia Conference - MM '17, ACM, Mountain View, CA, USA, October 2017.
[37] J. Ke, H. Zheng, H. Yang, and X. Chen, "Short-term forecasting of passenger demand under on-demand ride services: a spatio-temporal deep learning approach," Transportation Research Part C: Emerging Technologies, vol. 85, pp. 591-608, 2017.
[38] K. Shimada, "Customer value creation in the information explosion era," in Proceedings of the 2014 Symposium on VLSI Technology, IEEE, Honolulu, HI, USA, June 2014.
[39] H. A. Abdelhafez, "Big data analytics: trends and case studies," in Encyclopedia of Business Analytics & Optimization, Association for Computing Machinery, New York, NY, USA, 2014.
[40] K. Kira and L. A. Rendell, "A practical approach to feature selection," Machine Learning Proceedings, vol. 48, no. 1, pp. 249-256, 1992.
[41] T. M. Khoshgoftaar, K. Gao, and L. A. Bullard, "A comparative study of filter-based and wrapper-based feature ranking techniques for software quality modeling," International Journal of Reliability, Quality and Safety Engineering, vol. 18, no. 4, pp. 341-364, 2011.
[42] M. A. Hall and L. A. Smith, "Feature selection for machine learning: comparing a correlation-based filter approach to the wrapper," in Proceedings of the Twelfth International Florida Artificial Intelligence Research Society Conference, DBLP, Orlando, FL, USA, May 1999.
[43] V. A. Huynh-Thu, Y. Saeys, L. Wehenkel, and P. Geurts, "Statistical interpretation of machine learning-based feature importance scores for biomarker discovery," Bioinformatics, vol. 28, no. 13, pp. 1766-1774, 2012.
[44] M. Sandri and P. Zuccolotto, "A bias correction algorithm for the Gini variable importance measure in classification trees," Journal of Computational and Graphical Statistics, vol. 17, no. 3, pp. 611-628, 2008.
[45] J. Brownlee, "Feature importance and feature selection with XGBoost in Python," 2016, https://machinelearningmastery.com.
[46] N. V. Chawla, S. Eschrich, and L. O. Hall, "Creating ensembles of classifiers," in Proceedings of the IEEE International Conference on Data Mining, IEEE Computer Society, San Jose, CA, USA, November-December 2001.
[47] A. K. Jain, M. N. Murty, and P. J. Flynn, "Data clustering: a review," ACM Computing Surveys, vol. 31, no. 3, pp. 264-323, 1999.
[48] A. Nagpal, A. Jatain, and D. Gaur, "Review based on data clustering algorithms," in Proceedings of the IEEE Conference on Information & Communication Technologies, Hainan, China, September 2013.
[49] Y. Wang, X. Ma, Y. Lao, and Y. Wang, "A fuzzy-based customer clustering approach with hierarchical structure for logistics network optimization," Expert Systems with Applications, vol. 41, no. 2, pp. 521-534, 2014.
[50] B. Wang, Y. Miao, H. Zhao, J. Jin, and Y. Chen, "A biclustering-based method for market segmentation using customer pain points," Engineering Applications of Artificial Intelligence, vol. 47, pp. 101-109, 2015.
[51] M. Halkidi, Y. Batistakis, and M. Vazirgiannis, "On clustering validation techniques," Journal of Intelligent Information Systems: Integrating Artificial Intelligence and Database Technologies, vol. 17, no. 2-3, pp. 107-145, 2001.
[52] R. W. Sembiring, J. M. Zain, and A. Embong, "A comparative agglomerative hierarchical clustering method to cluster implemented course," Journal of Computing, vol. 2, no. 12, 2010.
[53] M. Valipour, M. E. Banihabib, and S. M. R. Behbahani, "Comparison of the ARIMA and the auto-regressive artificial neural network models in forecasting the monthly inflow of the dam reservoir," Journal of Hydrology, vol. 476, 2013.
[54] E. Erdem and J. Shi, "ARMA based approaches for forecasting the tuple of wind speed and direction," Applied Energy, vol. 88, no. 4, pp. 1405-1414, 2011.
[55] R. J. Hyndman, "Forecasting functions for time series and linear models," 2019, http://mirror.costar.sfu.ca/mirror/CRAN/web/packages/forecast/index.html.
[56] S. Aishwarya, "Build high-performance time series models using auto ARIMA in Python and R," 2018, https://www.analyticsvidhya.com/blog/2018/08/auto-arima-time-series-modeling-python-r.
[57] J. H. Friedman, "Greedy function approximation: a gradient boosting machine," The Annals of Statistics, vol. 29, no. 5, pp. 1189-1232, 2001.
[58] T. Chen and C. Guestrin, "XGBoost: a scalable tree boosting system," 2016, https://arxiv.org/abs/1603.02754.
[59] A. Gomez-Rios, J. Luengo, and F. Herrera, "A study on the noise label influence in boosting algorithms: AdaBoost, GBM and XGBoost," in Proceedings of the International Conference on Hybrid Artificial Intelligence Systems, Logroño, Spain, June 2017.
[60] J. Wang, C. Lou, R. Yu, J. Gao, and H. Di, "Research on hot micro-blog forecast based on XGBoost and random forest," in Proceedings of the 11th International Conference on Knowledge Science, Engineering and Management, KSEM 2018, pp. 350-360, Changchun, China, August 2018.
[61] C. Li, X. Zheng, Z. Yang, and L. Kuang, "Predicting short-term electricity demand by combining the advantages of ARMA and XGBoost in fog computing environment," Wireless Communications and Mobile Computing, vol. 2018, Article ID 5018053, 18 pages, 2018.
[62] A. M. Jain, "Complete guide to parameter tuning in XGBoost with codes in Python," 2016, https://www.analyticsvidhya.com/blog/2016/03/complete-guide-parameter-tuning-xgboost-with-codes-python.

[60] J Wang C Lou R Yu J Gao and H Di ldquoResearch on hotmicro-blog forecast based on XGBOOSTand random forestrdquoin Proceedings of the 11th International Conference onKnowledge Science Engineering andManagement KSEM 2018pp 350ndash360 Changchun China August 2018

[61] C Li X Zheng Z Yang and L Kuang ldquoPredicting short-term electricity demand by combining the advantages ofARMA and XGBoost in fog computing environmentrdquoWireless Communications and Mobile Computing vol 2018Article ID 5018053 18 pages 2018

[62] A M Jain ldquoComplete guide to parameter tuning in XGBoostwith codes in Pythonrdquo 2016 httpswwwanalyticsvidhyacomblog201603complete-guide-parameter-tuning-xgboost-with-codes-python

Mathematical Problems in Engineering 15

Hindawiwwwhindawicom Volume 2018

MathematicsJournal of

Hindawiwwwhindawicom Volume 2018

Mathematical Problems in Engineering

Applied MathematicsJournal of

Hindawiwwwhindawicom Volume 2018

Probability and StatisticsHindawiwwwhindawicom Volume 2018

Journal of

Hindawiwwwhindawicom Volume 2018

Mathematical PhysicsAdvances in

Complex AnalysisJournal of

Hindawiwwwhindawicom Volume 2018

OptimizationJournal of

Hindawiwwwhindawicom Volume 2018

Hindawiwwwhindawicom Volume 2018

Engineering Mathematics

International Journal of

Hindawiwwwhindawicom Volume 2018

Operations ResearchAdvances in

Journal of

Hindawiwwwhindawicom Volume 2018

Function SpacesAbstract and Applied AnalysisHindawiwwwhindawicom Volume 2018

International Journal of Mathematics and Mathematical Sciences

Hindawiwwwhindawicom Volume 2018

Hindawi Publishing Corporation httpwwwhindawicom Volume 2013Hindawiwwwhindawicom

The Scientific World Journal

Volume 2018

Hindawiwwwhindawicom Volume 2018Volume 2018

Numerical AnalysisNumerical AnalysisNumerical AnalysisNumerical AnalysisNumerical AnalysisNumerical AnalysisNumerical AnalysisNumerical AnalysisNumerical AnalysisNumerical AnalysisNumerical AnalysisNumerical AnalysisAdvances inAdvances in Discrete Dynamics in

Nature and SocietyHindawiwwwhindawicom Volume 2018

Hindawiwwwhindawicom

Dierential EquationsInternational Journal of

Volume 2018

Hindawiwwwhindawicom Volume 2018

Decision SciencesAdvances in

Hindawiwwwhindawicom Volume 2018

AnalysisInternational Journal of

Hindawiwwwhindawicom Volume 2018

Stochastic AnalysisInternational Journal of

Submit your manuscripts atwwwhindawicom

Page 4: AnApplicationofaThree-StageXGBoost-BasedModeltoSales ...downloads.hindawi.com/journals/mpe/2019/8503252.pdf · AnApplicationofaThree-StageXGBoost-BasedModeltoSales ForecastingofaCross-BorderE-CommerceEnterprise

The auto.arima() function in the R package "forecast" [55] is used to automatically generate an optimal ARIMA model for each time series based on the smallest Akaike information criterion (AIC) and BIC [56], which makes up for the disadvantage of the ARIMA in judging parameters.

Therefore, a combined method of parameter determination is proposed to improve the fitting performance of the ARIMA, which combines the results of the ACF and PACF plots with those of the auto.arima() function. The procedures are illustrated in Figure 2 and described as follows.

Step 1. Test stationarity and white noise by the augmented Dickey–Fuller (ADF) and Box–Pierce tests before modeling the ARIMA. If both the stationarity and white noise tests are passed, the ARIMA is suitable for the time series.

Step 2. Determine one part of the parameter combinations from the ACF and PACF plots, and determine the other part by the auto.arima() function in R.

Step 3. Model the ARIMA under the different parameter combinations and calculate the AIC value of each model.

Step 4. Determine the optimal parameter combination of the ARIMA as the one with the minimum AIC.

A code sketch of these four steps is given below.
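The following minimal sketch translates the four steps into Python, with statsmodels standing in for the R workflow described in the text; the candidate (p, d, q) orders are assumed inputs, standing in for those read off the ACF/PACF plots and those returned by auto.arima().

```python
# A sketch of the combined parameter-determination procedure (Steps 1-4).
import numpy as np
from statsmodels.tsa.stattools import adfuller
from statsmodels.stats.diagnostic import acorr_ljungbox
from statsmodels.tsa.arima.model import ARIMA

def select_arima(series, candidate_orders):
    # Step 1: stationarity (ADF) and white-noise (Box-Pierce) tests.
    adf_p = adfuller(series)[1]
    bp_p = acorr_ljungbox(series, lags=[10], boxpierce=True)["bp_pvalue"].iloc[0]
    if adf_p >= 0.05 or bp_p >= 0.05:
        raise ValueError("series is non-stationary or is white noise")
    # Steps 2-4: fit every candidate order and keep the minimum-AIC model.
    best_order, best_aic = None, np.inf
    for order in candidate_orders:
        aic = ARIMA(series, order=order).fit().aic
        if aic < best_aic:
            best_order, best_aic = order, aic
    return best_order, best_aic

# e.g., the candidates examined in Section 4.3.2:
# select_arima(sku_sales, [(0, 1, 1), (2, 1, 2), (2, 1, 3), (2, 1, 4)])
```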

2.4. XGBoost Algorithm. XGBoost is short for "Extreme Gradient Boosting" and builds on the gradient boosting machine proposed by Friedman [57]. As the relevant basic theory of the XGBoost has been covered in plenty of previous papers [58, 59], this study describes the procedures of the algorithm [60] rather than the basic theory.

2.4.1. Feature Selection. The specific steps of feature selection via the XGBoost are as follows: data cleaning, data feature extraction, and data feature selection based on the scores of feature importance.

2.4.2. Model Training. The model is trained on the selected features with the default parameters.

2.4.3. Parameter Optimization. Parameter optimization aims to minimize the errors between the predicted and actual values. There are three types of parameters in the algorithm, of which the descriptions are listed in Table 1.

The general steps of determining the hyperparameters of the XGBoost model are as follows (a condensed code sketch is given after the list):

Step 1. The number of estimators is first tuned to optimize the XGBoost while fixing the learning rate and the other parameters.

Step 2. Different combinations of max_depth and min_child_weight are tuned to optimize the XGBoost.

Step 3. Max_delta_step and gamma are tuned to make the model more conservative, with the parameters determined in Steps 1 and 2.

Step 4. Different combinations of subsample and colsample_bytree are tuned to prevent overfitting.

Step 5. The regularization parameters are increased to make the model more conservative.

Step 6. The learning rate is reduced to prevent overfitting.
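As a hedged illustration of Steps 1–6, the sketch below tunes one parameter group at a time with scikit-learn's GridSearchCV; the grid values are illustrative placeholders rather than the paper's settings, except for the max_depth and min_child_weight ranges taken from Section 4.2.3.

```python
# A condensed sketch of the six tuning steps; each stage freezes the
# parameters chosen so far and searches only its own group.
from sklearn.model_selection import GridSearchCV
from xgboost import XGBRegressor

def tune_xgboost(X, y):
    # Defaults from Table 6 ("reg:linear" is named "reg:squarederror"
    # in recent xgboost releases).
    params = {"n_estimators": 100, "learning_rate": 0.3, "gamma": 0.1,
              "objective": "reg:squarederror"}
    stage_grids = [
        {"n_estimators": [50, 100, 200]},                           # Step 1
        {"max_depth": list(range(6, 11)),
         "min_child_weight": list(range(1, 7))},                    # Step 2
        {"max_delta_step": [0, 1, 2], "gamma": [0, 0.1, 0.2]},      # Step 3
        {"subsample": [0.8, 1.0], "colsample_bytree": [0.8, 1.0]},  # Step 4
        {"reg_alpha": [0, 0.1, 1.0], "reg_lambda": [1.0, 2.0]},     # Step 5
        {"learning_rate": [0.3, 0.1, 0.05]},                        # Step 6
    ]
    for grid in stage_grids:
        search = GridSearchCV(XGBRegressor(**params), grid,
                              scoring="neg_mean_absolute_error", cv=3)
        search.fit(X, y)
        params.update(search.best_params_)  # freeze the winners of this stage
    return params
```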

3. The Proposed Three-Stage Forecasting Model

In this research, a three-stage XGBoost-based forecasting model named C-A-XGBoost is proposed in consideration of both the sales features and the tendency of the data series.

[Figure 1: The key technologies and processes of the two-step clustering algorithm: preclustering, outlier handling, automatic determination of the optimal number of clusters, and cluster membership assignment.]

[Figure 2: The procedures for parameter determination of the ARIMA model: stationarity and white noise tests (with differencing if needed), candidate parameter combinations from the ACF/PACF plots and the auto.arima() function, comparison of AIC values, and selection of the optimal ARIMA.]

In Stage 1, a novel C-XGBoost model is put forward based on clustering and the XGBoost, which incorporates different clustering features into forecasting as influencing factors. The two-step clustering algorithm is first applied to partition commodities into different clusters based on their features, and then each resulting cluster is modeled via the XGBoost.

In Stage 2, an A-XGBoost model is presented by combining the ARIMA with the XGBoost to predict the tendency of the time series, which takes advantage of the linear fitting ability of the ARIMA and the strong nonlinear mapping ability of the XGBoost. The ARIMA is used to predict the linear part, and the rolling prediction method is employed to establish the XGBoost to revise the nonlinear part of the data series, namely, the residuals of the ARIMA.

In Stage 3, a combination model named C-A-XGBoost is constructed based on the C-XGBoost and A-XGBoost. The C-A-XGBoost aims to minimize the error sum of squares by assigning weights to the results of the C-XGBoost and A-XGBoost, in which the weights reflect the reliability and credibility of the sales features and the tendency of the data series.

The procedures of the proposed three-stage model are demonstrated in Figure 3, of which the details are given as follows.

3.1. Stage 1: C-XGBoost Model. The two-step clustering algorithm is applied to cluster a data series into several disjoint clusters. Then, each resulting cluster is set as the input and output sets to construct and optimize the corresponding C-XGBoost model. Finally, testing samples are partitioned into the corresponding cluster by the trained two-step clustering model, and the prediction results are calculated by the corresponding trained C-XGBoost model.
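A minimal sketch of this stage follows, under two assumptions flagged here because they are not in the paper: scikit-learn's Birch (the algorithm underlying two-step clustering [35]) stands in for the two-step procedure, and the feature matrices are already numeric arrays.

```python
# Stage 1 sketch: cluster the training series, fit one XGBoost per
# cluster, then route each test sample to its cluster's model.
import numpy as np
from sklearn.cluster import Birch
from xgboost import XGBRegressor

def c_xgboost_predict(X_train, y_train, X_test, n_clusters=12):
    clusterer = Birch(n_clusters=n_clusters).fit(X_train)
    train_labels = clusterer.labels_
    models = {c: XGBRegressor().fit(X_train[train_labels == c],
                                    y_train[train_labels == c])
              for c in np.unique(train_labels)}
    test_labels = clusterer.predict(X_test)   # cluster membership assignment
    preds = np.zeros(len(X_test))
    for c, model in models.items():
        mask = test_labels == c
        if mask.any():
            preds[mask] = model.predict(X_test[mask])
    return preds
```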

3.2. Stage 2: A-XGBoost Model. The optimal ARIMA, based on the minimum AIC after the data series pass the tests of stationarity and white noise, is trained and determined, of which the processes are described in Section 2. Then, the residual vector $e = (r_1, r_2, \ldots, r_n)^T$ between the predicted values and the actual values is obtained from the trained ARIMA model. Next, the A-XGBoost is established by setting columns 1 to $k$ and column $(k+1)$ of the matrix $R$ as the input and output, respectively, as illustrated in the following equation:

$$R = \begin{bmatrix} r_1 & r_2 & \cdots & r_k & r_{k+1} \\ r_2 & r_3 & \cdots & r_{k+1} & r_{k+2} \\ \vdots & \vdots & & \vdots & \vdots \\ r_{n-k-1} & r_{n-k} & \cdots & r_{n-2} & r_{n-1} \\ r_{n-k} & r_{n-k+1} & \cdots & r_{n-1} & r_n \end{bmatrix}_{(n-k)\times(k+1)}. \quad (5)$$

The final results on the test set are calculated by summing the predictions of the linear part by the trained ARIMA and those of the residuals by the established XGBoost.
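A sketch of the construction in equation (5), assuming `residuals` is the ARIMA residual vector $(r_1, \ldots, r_n)$:

```python
# Build the rolling-window matrix of equation (5): each row holds k
# consecutive residuals as input and the next residual as output.
import numpy as np

def residual_lag_matrix(residuals, k):
    r = np.asarray(residuals, dtype=float)
    n = len(r)
    # Column j holds r_{j+1} ... r_{n-k+j}; R has shape (n-k) x (k+1).
    R = np.column_stack([r[j:n - k + j] for j in range(k + 1)])
    return R[:, :k], R[:, k]   # XGBoost input X and output y

# In Section 4.3.2, k = 7 and 77 residuals give the 70 x 8 matrix of
# equation (14).
```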

3.3. Stage 3: C-A-XGBoost Model. In this stage, a combination strategy is explored to minimize the error sum of squares, i.e., the MSE in equation (6), by assigning weights $w_C$ and $w_A$ to the C-XGBoost and A-XGBoost, respectively. The predicted results are calculated using equation (7), where $\hat{y}_C(k)$, $\hat{y}_A(k)$, and $\hat{Y}_{CA}(k)$ denote the forecast values of the $k$-th sample via the C-XGBoost, A-XGBoost, and C-A-XGBoost, respectively, and $y(k)$ in equation (6) is the actual value of the $k$-th sample.

Table 1: The description of parameters in the XGBoost model.

| Type of parameters | Parameters | Description | Main purpose |
| --- | --- | --- | --- |
| Booster parameters | Max_depth | Maximum depth of a tree | Increasing this value makes the model more complex and more likely to overfit |
| | Min_child_weight | Minimum sum of instance weights in a child | The larger min_child_weight is, the more conservative the algorithm will be |
| | Max_delta_step | Maximum delta step | It can help make the update step more conservative |
| | Gamma | Minimum loss reduction | The larger gamma is, the more conservative the algorithm will be |
| | Subsample | Subsample ratio of the training instances | Used in the update to prevent overfitting |
| | Colsample_bytree | Subsample ratio of columns for each tree | Used in the update to prevent overfitting |
| | Eta | Learning rate | Step size shrinkage used in the update; can prevent overfitting |
| Regularization parameters | Alpha, Lambda | Regularization terms on weights | Increasing these values makes the model more conservative |
| Learning task parameters | Reg:linear | Learning objective | Used to specify the learning task and the learning objective |
| Command line parameters | Number of estimators | Number of estimators | Used to specify the number of iterative calculations |


$$\min \ \mathrm{MSE} = \frac{1}{n}\sum_{k=1}^{n}\left[\hat{Y}_{CA}(k) - y(k)\right]^2, \quad (6)$$

$$\hat{Y}_{CA}(k) = w_C\,\hat{y}_C(k) + w_A\,\hat{y}_A(k). \quad (7)$$

The least squares method is employed to find the optimal weights ($w_C$ and $w_A$), the calculation of which is simplified by transforming the equations into the following matrix operations.

In equation (8), the matrix $B$ consists of the predicted values of the C-XGBoost and A-XGBoost. In equation (9), the matrix $W$ consists of the weights. In equation (10), the matrix $Y$ consists of the actual values. Equation (11) is obtained by rewriting equation (7) in matrix form. Equation (12) follows from left-multiplying equation (11) by the transpose of $B$. According to equation (13), the optimal weights ($w_C$ and $w_A$) are calculated.

$$B = \begin{bmatrix} \hat{y}_C(1) & \hat{y}_A(1) \\ \hat{y}_C(2) & \hat{y}_A(2) \\ \vdots & \vdots \\ \hat{y}_C(n) & \hat{y}_A(n) \end{bmatrix}, \quad (8)$$

$$W = \begin{bmatrix} w_C \\ w_A \end{bmatrix}, \quad (9)$$

$$Y = [y(1), y(2), \ldots, y(n)]^T, \quad (10)$$

$$BW = Y, \quad (11)$$

$$B^T B W = B^T Y, \quad (12)$$

$$W = \left(B^T B\right)^{-1} B^T Y. \quad (13)$$
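The following numpy sketch mirrors equations (8)–(13); `lstsq` solves the same normal equations without forming the inverse explicitly.

```python
# Least-squares combination weights of equations (8)-(13).
import numpy as np

def combination_weights(y_c_hat, y_a_hat, y_true):
    B = np.column_stack([y_c_hat, y_a_hat])         # equation (8)
    W, *_ = np.linalg.lstsq(B, y_true, rcond=None)  # equation (13)
    return W                                         # [w_C, w_A]
```

On the paper's test set, this procedure yields $w_C = 1.262$ and $w_A = 0.273$ (Section 4.3.3).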

4. Numerical Experiments and Comparisons

4.1. Data Description. To illustrate the effectiveness of the developed C-A-XGBoost model, the following data series are used to verify its forecasting performance.

4.1.1. Source Data Series. As listed in Table 2, there are eight data series in the source data, ranging from Mar. 1, 2017 to Mar. 16, 2018.

4.1.2. Clustering Series. There are 10 continuous attributes and 6 categorical attributes in the clustering series, which are obtained by reconstructing the source data series. The attribute descriptions of the clustering series are given in Table 3.

4.2. Uniform Experimental Conditions. To verify the performance of the proposed model according to the performance evaluation indexes, the following uniform experimental conditions are established.

4.2.1. Uniform Data Set. As shown in Table 4, the data series are partitioned into the training, validation, and test sets so as to satisfy the requirements of the different models. The data application is described as follows:

(1) The clustering series covers samples of 381 days.

(2) For the C-XGBoost model, training set 1, namely, the samples of the first 347 days in the clustering series, is utilized to establish the two-step clustering models. The resulting samples of the two-step clustering are used to construct the XGBoost models. The test set, with the remaining samples of 34 days, is selected to validate the C-XGBoost model. In detail, the test set is first partitioned into the corresponding clusters by the established two-step clustering model, and then the test set is applied to checking the validity of the corresponding C-XGBoost models.

[Figure 3: The procedures of the proposed three-stage model. Stage 1 (C-XGBoost): the data series are grouped by two-step clustering, and each cluster is predicted by its own XGBoost model. Stage 2 (A-XGBoost): the ARIMA predicts the linear part, and an XGBoost predicts its residuals. Stage 3 (C-A-XGBoost): the two predictions are combined.]


(3) For the A-XGBoost model, training set 2, with the samples of the 1st–277th days, is used to construct the ARIMA, and the validation set is used to calculate the residuals of the ARIMA forecast, which are then used to train the A-XGBoost model. The test set is employed to verify the performance of the model.

(4) The test set contains the final 34 data samples, which are employed to fit the optimal combination weights for the C-XGBoost and A-XGBoost models.

4.2.2. Uniform Evaluation Indexes. Several performance measures have previously been applied to verify the viability and effectiveness of forecasting models. As illustrated in Table 5, the common evaluation measurements are chosen to distinguish the optimal forecasting model; the smaller they are, the more accurate the model is.

4.2.3. Uniform Parameters of the XGBoost Model. The first priority for optimization is to tune max_depth and min_child_weight with the other parameters fixed, which is the most effective way to optimize the XGBoost. The ranges of max_depth and min_child_weight are 6–10 and 1–6, respectively. The default values of the parameters are listed in Table 6.

4.3. Experiments of the C-A-XGBoost Model

4.3.1. C-XGBoost Model

(1) Step 1: Commodity clustering. The two-step clustering algorithm is first applied to training set 1. Standardization is applied to the continuous attributes; the noise percent of outlier handling is 25%; the log-likelihood distance is the distance measure; and BIC is set as the clustering criterion.

As shown in Figure 4, the clustering series are partitioned into 12 homogeneous clusters based on the 11 features, denoted as C12_j (j = 1, 2, ..., 12), and the silhouette coefficient is 0.4.

As illustrated in Figure 5, the ratio of the largest to the smallest cluster size is 2.64, and the percentage of each cluster is neither too large nor too small. Therefore, the cluster quality is acceptable.

(2) Step 2: Construct the C-XGBoost models. Features are first selected from each cluster C12_j of the 12 clusters based on the feature importance scores. After that, setting the selected features of each cluster and the SKU sales in Table 3 as the input and output variables, respectively, a C-XGBoost model is constructed for each cluster C12_j, denoted as C12_j_XGBoost.

Take the cluster C12_3 of the 12 clusters as an example to illustrate the process of modeling the XGBoost.

For C12_3, the features listed in Table 3 are first filtered, and the 7 selected features are displayed in Figure 6. It can be observed that F1 (goods click), F3 (cart click), F5 (goods price), F6 (sales unique visitor), and F7 (original shop price) are the dominating factors, whereas F2 (temperature mean) and F4 (favorites click) contribute less to the prediction.

Setting the 11 features of the cluster C12_3 in Step 1 and the corresponding SKU sales in Table 3 as the input and output, respectively, the C12_3_XGBoost is pretrained under the default parameters in Table 6. For the prebuilt C12_3_XGBoost model, the ME is 0.393 and the MAE is 0.896.

(3) Step 3: Parameter optimization. XGBoost is a supervised learning algorithm, so the key to optimization is to determine the appropriate input and output variables.

Table 2: The description of the source data series.

| Data series | Fields |
| --- | --- |
| Customer behavior data^a | Data date, goods click, cart click, favorites click |
| Goods information data^b | Goods id, SKU^i id, level, season, brand id |
| Goods sales data^c | Data date, SKU sales, goods price, original shop price |
| Relationship between goods id and SKU id^d | Goods id, SKU id |
| Goods promotion price^e | Data date, goods price, goods promotion price |
| Marketing^f | Data date, marketing plan |
| Holidays^g | Data date, holiday |
| Temperature^h | Data date, temperature mean |

^a–f The six data series are sourced from the historical data of the Saudi Arabian market on the Jollychic cross-border e-commerce trading platform (https://www.jollychic.com). ^g The holiday data are captured from https://shijian.cc/114jieri/2017. ^h The temperature data are captured from https://www.wunderground.com/weather/eg/saudi-arabia. ^i SKU stands for stock keeping unit; each product has a unique SKU number.

Table 3: The description of the clustering series.

| Fields | Meaning of fields |
| --- | --- |
| Data date | Date |
| Goods code | Goods code |
| SKU code | SKU code |
| SKU sales | Sales of the SKU |
| Goods price | Selling price |
| Original shop price | Tag price |
| Goods click | Number of clicks on goods |
| Cart click | Number of clicks on purchasing carts |
| Favorites click | Number of clicks on favorites |
| Sales unique visitor | Number of unique visitors |
| Goods season | Seasonal attribute of goods |
| Marketing | Activity type code |
| Plan | Activity rhythm code |
| Promotion | Promotion code |
| Holiday | The holiday of the day |
| Temperature mean | Mean air temperature (°F) |


Table 6: Default parameter values of the XGBoost.

| Parameter | Default value |
| --- | --- |
| Number of estimators | 100 |
| Max_depth | 6 |
| Min_child_weight | 1 |
| Max_delta_step | 0 |
| Objective | reg:linear |
| Subsample | 1 |
| Eta | 0.3 |
| Gamma | 0.1 |
| Colsample_bytree | 1 |
| Colsample_bylevel | 1 |
| Alpha | 0 |
| Lambda | 1 |
| Scale_pos_weight | 1 |

Table 4: The description of the training set, validation set, and test set (day indexes count from Mar. 1, 2017 as day 1).

| Data set | Samples | Number of weeks | Start date | End date | First day | Last day |
| --- | --- | --- | --- | --- | --- | --- |
| Training set 1 | Clustering series | 50 | Mar. 1, 2017 (Wed) | Feb. 10, 2018 (Sat) | 1 | 347 |
| Training set 2 | SKU code 94033 | 40 | Mar. 1, 2017 (Wed) | Dec. 2, 2017 (Sat) | 1 | 277 |
| Validation set | SKU code 94033 | 10 | Dec. 3, 2017 (Sun) | Feb. 10, 2018 (Sat) | 278 | 347 |
| Test set | SKU code 94033 | 5 | Feb. 11, 2018 (Sun) | Mar. 16, 2018 (Fri) | 348 | 381 |

Table 5: The description of the evaluation indexes, where $y_k$ is the actual sales of the $k$-th sample and $\hat{y}_k$ denotes the corresponding prediction.

| Evaluation index | Expression | Description |
| --- | --- | --- |
| ME | $\mathrm{ME} = \frac{1}{b-a+1}\sum_{k=a}^{b}(y_k - \hat{y}_k)$ | The mean error |
| MSE | $\mathrm{MSE} = \frac{1}{b-a+1}\sum_{k=a}^{b}(y_k - \hat{y}_k)^2$ | The mean squared error |
| RMSE | $\mathrm{RMSE} = \sqrt{\frac{1}{b-a+1}\sum_{k=a}^{b}(y_k - \hat{y}_k)^2}$ | The root mean squared error |
| MAE | $\mathrm{MAE} = \frac{1}{b-a+1}\sum_{k=a}^{b}|y_k - \hat{y}_k|$ | The mean absolute error |
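The four indexes translate directly into code; a short sketch assuming numpy arrays of actual and predicted values:

```python
# The evaluation indexes of Table 5 for actual values y and predictions y_hat.
import numpy as np

def evaluation_indexes(y, y_hat):
    err = np.asarray(y, dtype=float) - np.asarray(y_hat, dtype=float)
    mse = (err ** 2).mean()
    return {"ME": err.mean(), "MSE": mse,
            "RMSE": np.sqrt(mse), "MAE": np.abs(err).mean()}
```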

[Figure 4: Model summary and cluster quality of the two-step clustering model: (a) the summary of the two-step clustering model (inputs: 11 features; clusters: 12); (b) the silhouette coefficient of cohesion and separation for the 12 clusters, on a scale from −1.0 (poor) to 1.0 (good).]

[Figure 5: Cluster sizes of the clustering series by the two-step clustering algorithm. The cluster percentages range from 5.1% to 13.5%; the smallest cluster contains 388,177 records and the largest 1,026,625 records, a ratio of 2.64.]


In contrast, parameter optimization has less impact on the accuracy of the algorithm. Therefore, in this paper, only the primary parameters, max_depth and min_child_weight, are tuned to optimize the XGBoost [61]. The model can reach a balanced point because increasing max_depth makes the model more complex and more likely to overfit, whereas increasing min_child_weight makes the model more conservative.

The prebuilt C12_3_XGBoost model is optimized to minimize the ME and MAE by tuning max_depth (from 6 to 10) and min_child_weight (from 1 to 6) while the other parameters are fixed; the parameter ranges are chosen according to case studies with the XGBoost such as [62]. The optimal parameter combination is the one with the minimum ME and MAE among the different combinations.

Figure 7 shows how the ME and MAE of the XGBoost change as max_depth and min_child_weight vary. It can be seen that both the ME and MAE are the smallest when max_depth is 9 and min_child_weight is 2; that is, the model is optimal at this combination.

(4) Step 4: Results on the test set. The test set is partitioned into the corresponding clusters by the two-step clustering model trained in Step 1. After that, Steps 2–3 are repeated for the test set.

As shown in Table 7, the test set is partitioned into the clusters C12_3 and C12_4, and the corresponding models C12_3_XGBoost and C12_4_XGBoost are determined. C12_3_XGBoost has been trained and optimized as the example in Steps 2–3, and C12_4_XGBoost is trained and optimized by repeating Steps 2–3. Finally, the prediction results are obtained by the optimized C12_3_XGBoost and C12_4_XGBoost.

As illustrated in Figure 8, the ME and MAE of C12_4_XGBoost change with the values of max_depth and min_child_weight. The model performs best when max_depth is 10 and min_child_weight is 2, where both the ME and MAE are the smallest. The forecasting results on the test set are summarized in Table 7.

4.3.2. A-XGBoost Model

(1) Step 1: Test the stationarity and white noise of training set 2. For training set 2, the p values of the ADF test and the Box–Pierce test are 0.01 and 3.331 × 10⁻¹⁶, respectively, both lower than 0.05. Therefore, the time series is stationary and non-white noise, indicating that training set 2 is suitable for the ARIMA.

(2) Step 2: Train the ARIMA model. According to Section 2.3, the parameter combinations are first determined by the ACF and PACF plots and by the auto.arima() function in the R package "forecast".

As shown in Figure 9(a), the SKU sales fluctuate significantly in the first 50 days compared with the sales after 50 days; in Figure 9(b), the ACF plot shows a pronounced trailing characteristic; in Figure 9(c), the PACF plot decreases and oscillates. Therefore, the first-order difference should be taken.

As illustrated in Figure 10(a), the SKU sales fluctuate around zero after the first-order difference. Figures 10(b) and 10(c) present the ACF and PACF plots after the first-order difference, both of which decrease and oscillate, indicating that training set 2 conforms to an ARMA process.

As a result, the possible optimal models are ARIMA(2, 1, 2), ARIMA(2, 1, 3), and ARIMA(2, 1, 4), according to the ACF and PACF plots in Figure 10.

Table 8 shows the AIC values of the ARIMA models under different parameters generated by the auto.arima() function. It can be concluded that ARIMA(0, 1, 1) is the best of these because its AIC is the smallest.

To further determine the optimal model, the AIC and RMSE of the ARIMA models under different parameters are summarized in Table 9. The candidates include the 3 possible optimal ARIMA models judged from Figure 10 and the best ARIMA generated by the auto.arima() function. According to the minimum principle, ARIMA(2, 1, 4) is optimal because both its AIC and RMSE are the smallest.

[Figure 6: Feature importance scores of the C12_3_XGBoost model for the 7 selected features: F1 goods click, F2 temperature mean, F3 cart click, F4 favorites click, F5 goods price, F6 sales unique visitor, F7 original shop price.]


[Figure 7: ME and MAE of C12_3_XGBoost under different parameters (max_depth from 6 to 10, min_child_weight from 1 to 6): (a) mean error of the training set; (b) mean absolute error of the training set.]

Table 7: The results of the C-XGBoost for the test set.

| Test set | Days | C12_j | C12_j_XGBoost model | (Max_depth, min_child_weight) | Training set 1 ME | Training set 1 MAE | Test set ME | Test set MAE |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 348th–372nd | 25 | 3 | C12_3_XGBoost | (9, 2) | 0.351 | 0.636 | 4.385 | 4.400 |
| 373rd–381st | 9 | 4 | C12_4_XGBoost | (10, 2) | 0.339 | 0.591 | 1.778 | 2.000 |
| 348th–381st | 34 | — | — | — | — | — | 3.647 | 3.765 |

[Figure 8: ME and MAE of C12_4_XGBoost under different parameters (max_depth from 6 to 10, min_child_weight from 1 to 6): (a) mean error of the training set; (b) mean absolute error of the training set.]


(3) Step 3: Calculate the residuals of the optimal ARIMA. The prediction results from the 278th to the 381st day are obtained by the trained ARIMA(2, 1, 4), denoted as ARIMA_forecast. Then, the residuals between the predicted values ARIMA_forecast and the actual values SKU_sales are calculated, denoted as ARIMA_residuals.

(4) Step 4: Train the A-XGBoost by setting the ARIMA_residuals as the input and output. As shown in equation (14), the output data are the 8th column of the matrix $R$, and the corresponding inputs are the residuals of the previous 7 days (columns 1 to 7):

$$R = \begin{bmatrix} r_{271} & r_{272} & \cdots & r_{277} & r_{278} \\ r_{272} & r_{273} & \cdots & r_{278} & r_{279} \\ \vdots & \vdots & & \vdots & \vdots \\ r_{339} & r_{340} & \cdots & r_{345} & r_{346} \\ r_{340} & r_{341} & \cdots & r_{346} & r_{347} \end{bmatrix}_{70\times 8}. \quad (14)$$

[Figure 10: Plots of (a) the differenced SKU sales over days, (b) the ACF, and (c) the PACF after the first-order difference.]

[Figure 9: Plots of (a) the SKU sales over days, (b) the ACF, and (c) the PACF.]


(5) Step 5: Calculate the predicted residuals of the test set using the A-XGBoost trained in Step 4, denoted as A-XGBoost_residuals. For the test set, the result of the 348th day is obtained by setting the ARIMA_residuals of the 341st–347th days as the input. Then, the result of the 349th day is calculated by inputting the ARIMA_residuals of the 342nd–347th days and the A-XGBoost_residuals of the 348th day into the trained A-XGBoost. The process is repeated until the A-XGBoost_residuals of the 349th–381st days are obtained.
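A sketch of this rolling loop, assuming `model` is the A-XGBoost trained in Step 4 and `arima_residuals` holds the residuals up to day 347:

```python
# Rolling one-step-ahead residual forecast (Step 5): each prediction is
# appended to the history and re-enters the 7-day input window.
import numpy as np

def rolling_residual_forecast(model, arima_residuals, horizon=34, k=7):
    history = list(arima_residuals)          # ends at day 347
    preds = []
    for _ in range(horizon):                 # days 348-381
        x = np.array(history[-k:], dtype=float).reshape(1, -1)
        r_hat = float(model.predict(x)[0])
        preds.append(r_hat)
        history.append(r_hat)                # feed the prediction back
    return np.array(preds)

# Step 6 then adds these residual predictions to ARIMA_forecast.
```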

(6) Step 6: Calculate the final prediction results. For the test set, the final prediction results, denoted as A-XGBoost_forecast, are calculated by summing the corresponding values of A-XGBoost_residuals and ARIMA_forecast. The results of the A-XGBoost are summarized in Table 10.

4.3.3. C-A-XGBoost Model. The optimal combination weights are determined by minimizing the MSE in equation (6).

For the test set, the weights $w_C$ and $w_A$ are obtained by the matrix operation in equation (13), $W = (B^T B)^{-1} B^T Y$, which yields $w_C = 1.262$ and $w_A = 0.273$.

4.4. Models for Comparison. In this section, the following models are chosen for comparison between the proposed models and other classical models:

ARIMA: as one of the common time series models, it is used to predict the sales of the time sequence; its processes are the same as those of the ARIMA in Section 4.3.2.

XGBoost: the XGBoost model is constructed and optimized by setting the selected features and the corresponding SKU sales as the input and output.

C-XGBoost: taking the sales features of commodities into account, the XGBoost forecasts sales based on the clusters produced by the two-step clustering model; the procedures are the same as those in Section 4.3.1.

A-XGBoost: the A-XGBoost is applied to revising the residuals of the ARIMA; namely, the ARIMA first models the linear part of the time series, and then the XGBoost models the nonlinear part. The relevant processes are described in Section 4.3.2.

C-A-XGBoost: the model combines the advantages of the C-XGBoost and A-XGBoost; its procedures are given in Section 4.3.3.

4.5. Results of Different Models. In this section, the test set is used to verify the superiority of the proposed C-A-XGBoost.

Figure 11 shows the curve of the actual values SKU_sales and the five fitted curves of predicted values from the 348th to the 381st day, obtained by the ARIMA, XGBoost, C-XGBoost, A-XGBoost, and C-A-XGBoost.

It can be seen that the C-A-XGBoost fits the original values best: of the five fitted curves, its curve is the closest to the curve of the actual SKU sales.

To further illustrate the superiority of the proposed C-A-XGBoost, the evaluation indexes mentioned in Section 4.2.2 are applied to distinguish the best sales forecasting model. Table 11 provides a comparative summary of the indexes for the five models in Section 4.4.

According to Table 11, the superiority of the proposed C-A-XGBoost over the other models is distinct, as all of its evaluation indexes are the smallest.

The C-XGBoost is inferior to the C-A-XGBoost but outperforms the other three models, underlining that the C-XGBoost is superior to the single XGBoost.

The A-XGBoost performs better than the ARIMA, showing that the XGBoost is effective for revising the residuals of the ARIMA.

According to the above analysis, the proposed C-A-XGBoost has the best forecasting performance for commodity sales of the cross-border e-commerce enterprise.

Table 10: The performance evaluation of the A-XGBoost.

| A-XGBoost | Validation set | Test set |
| --- | --- | --- |
| Minimum error | −0.003 | −8.151 |
| Maximum error | 0.002 | 23.482 |
| Mean error | 0.000 | 1.213 |
| Mean absolute error | 0.001 | 4.566 |
| Standard deviation | 0.001 | 6.262 |
| Linear correlation | 1 | −0.154 |
| Occurrences | 70 | 34 |

Table 8: AIC values of the ARIMA models generated by the auto.arima() function.

| ARIMA (p, d, q) | AIC | ARIMA (p, d, q) | AIC |
| --- | --- | --- | --- |
| ARIMA(2, 1, 2) with drift | 2854.317 | ARIMA(0, 1, 2) with drift | 2852.403 |
| ARIMA(0, 1, 0) with drift | 2983.036 | ARIMA(1, 1, 2) with drift | 2852.172 |
| ARIMA(1, 1, 0) with drift | 2927.344 | ARIMA(0, 1, 1) | 2850.212 |
| ARIMA(0, 1, 1) with drift | 2851.296 | ARIMA(1, 1, 1) | 2851.586 |
| ARIMA(0, 1, 0) | 2981.024 | ARIMA(0, 1, 2) | 2851.460 |
| ARIMA(1, 1, 1) with drift | 2852.543 | ARIMA(1, 1, 2) | 2851.120 |

Table 9: AIC and RMSE values of the ARIMA models under different parameters.

| ARIMA model | ARIMA (p, d, q) | AIC | RMSE |
| --- | --- | --- | --- |
| 1 | ARIMA(0, 1, 1) | 2850.170 | 41.814 |
| 2 | ARIMA(2, 1, 2) | 2852.980 | 41.572 |
| 3 | ARIMA(2, 1, 3) | 2854.940 | 41.567 |
| 4 | ARIMA(2, 1, 4) | 2848.850 | 40.893 |


5. Conclusions and Future Directions

In this research, a new XGBoost-based forecasting model named C-A-XGBoost is proposed, which takes both the sales features and the tendency of the data series into account.

The C-XGBoost is first presented, combining clustering and the XGBoost, with the aim of reflecting the sales features of commodities in the forecast. The two-step clustering algorithm partitions the data series into different clusters based on the selected features, which are used as the influencing factors for forecasting. After that, a corresponding C-XGBoost model is established for each cluster using the XGBoost.

The proposed A-XGBoost takes advantage of the ARIMA in predicting the tendency of the data series and overcomes the drawbacks of the ARIMA by applying the XGBoost to the nonlinear part of the data series. The optimal ARIMA is obtained by comparing the AICs under different parameters, and the trained ARIMA then predicts the linear part of the data series. For the nonlinear part, rolling prediction is conducted by the trained XGBoost, whose input and output are the residuals of the ARIMA. The final results of the A-XGBoost are calculated by adding the residuals predicted by the XGBoost to the corresponding forecast values of the ARIMA.

In conclusion, the C-A-XGBoost is developed by assigning appropriate weights to the forecasting results of the C-XGBoost and A-XGBoost so as to exploit their respective strengths. Consequently, a linear combination of the two models' forecasting results is calculated as the final predicted values.

To verify the effectiveness of the proposed C-A-XGBoost, the ARIMA, XGBoost, C-XGBoost, and A-XGBoost are employed for comparison, and four common evaluation indexes, ME, MSE, RMSE, and MAE, are utilized to check the forecasting performance. The experiment demonstrates that the C-A-XGBoost outperforms the other models, indicating that it provides theoretical support for the sales forecast of the e-commerce company and can serve as a reference for selecting forecasting models. It is advisable for the e-commerce company to choose different forecasting models for different commodities instead of utilizing a single model.

Two potential extensions are put forward for future research. On the one hand, there may be no model in which all evaluation indicators are minimal, which makes choosing the optimal model difficult; therefore, a comprehensive evaluation index of forecasting performance will be constructed to overcome this difficulty. On the other hand, sales forecasting is ultimately used to optimize inventory management, so some relevant factors should be considered, including inventory cost, order lead time, delivery time, and transportation time.

Data Availability

The data used to support the findings of this study are available from the corresponding author upon request.

[Figure 11: Comparison of the SKU sales with the predicted values of the five models in Section 4.4. The x-axis represents the day (348–381); the y-axis represents the SKU sales; curves with different colors represent the different models.]

Table 11: The performance evaluation of the ARIMA, XGBoost, A-XGBoost, C-XGBoost, and C-A-XGBoost.

| Evaluation index | ARIMA | XGBoost | A-XGBoost | C-XGBoost | C-A-XGBoost |
| --- | --- | --- | --- | --- | --- |
| ME | −21.346 | −3.588 | 1.213 | 3.647 | 0.288 |
| MSE | 480.980 | 36.588 | 39.532 | 23.353 | 10.769 |
| RMSE | 21.931 | 6.049 | 6.287 | 4.832 | 3.282 |
| MAE | 21.346 | 5.059 | 4.566 | 3.765 | 2.515 |


Conflicts of Interest

The authors declare that there are no conflicts of interest regarding the publication of this paper.

Acknowledgments

This research was supported by the National Key R&D Program of China through the China Development Research Foundation (CDRF), funded by the Ministry of Science and Technology (CDRF-SQ2017YFGH002106).

References

[1] Y. Jin, Data Science in Supply Chain Management: Data-Related Influences on Demand Planning, Proquest LLC, Ann Arbor, MI, USA, 2013.

[2] S. Akter and S. F. Wamba, "Big data analytics in e-commerce: a systematic review and agenda for future research," Electronic Markets, vol. 26, no. 2, pp. 173–194, 2016.

[3] J. L. Castle, M. P. Clements, and D. F. Hendry, "Forecasting by factors, by variables, by both or neither?," Journal of Econometrics, vol. 177, no. 2, pp. 305–319, 2013.

[4] A. Kawa, "Supply chains of cross-border e-commerce," in Proceedings of the Advanced Topics in Intelligent Information and Database Systems, Springer International Publishing, Kanazawa, Japan, April 2017.

[5] L. Song, T. Lv, X. Chen, and J. Gao, "Architecture of demand forecast for online retailers in China based on big data," in Proceedings of the International Conference on Human-Centered Computing, Springer, Colombo, Sri Lanka, January 2016.

[6] G. Iman, A. Ehsan, R. W. Gary, and A. Y. William, "An overview of energy demand forecasting methods published in 2005–2015," Energy Systems, vol. 8, no. 2, pp. 411–447, 2016.

[7] S. Gmbh, Forecasting with Exponential Smoothing, vol. 26, no. 1, Springer, Berlin, Germany, 2008.

[8] G. E. P. Box and G. M. Jenkins, "Time series analysis: forecasting and control," Journal of Time, vol. 31, no. 4, 303 pages, 2010.

[9] G. P. Zhang, "Time series forecasting using a hybrid ARIMA and neural network model," Neurocomputing, vol. 50, pp. 159–175, 2003.

[10] S. Ma, R. Fildes, and T. Huang, "Demand forecasting with high dimensional data: the case of SKU retail sales forecasting with intra- and inter-category promotional information," European Journal of Operational Research, vol. 249, no. 1, pp. 245–257, 2015.

[11] O. Gur Ali, S. SayIn, T. van Woensel, and J. Fransoo, "SKU demand forecasting in the presence of promotions," Expert Systems with Applications, vol. 36, no. 10, pp. 12340–12348, 2009.

[12] T. Huang, R. Fildes, and D. Soopramanien, "The value of competitive information in forecasting FMCG retail product sales and the variable selection problem," European Journal of Operational Research, vol. 237, no. 2, pp. 738–748, 2014.

[13] F. Cady, "Machine learning overview," in The Data Science Handbook, Wiley, Hoboken, NJ, USA, 2017.

[14] N. K. Ahmed, A. F. Atiya, N. E. Gayar, and H. El-Shishiny, "An empirical comparison of machine learning models for time series forecasting," Econometric Reviews, vol. 29, no. 5-6, pp. 594–621, 2010.

[15] A. P. Ansuj, M. E. Camargo, R. Radharamanan, and D. G. Petry, "Sales forecasting using time series and neural networks," Computers & Industrial Engineering, vol. 31, no. 1-2, pp. 421–424, 1996.

[16] I. Alon, M. Qi, and R. J. Sadowski, "Forecasting aggregate retail sales: a comparison of artificial neural networks and traditional methods," Journal of Retailing and Consumer Services, vol. 8, no. 3, pp. 147–156, 2001.

[17] G. Di Pillo, V. Latorre, S. Lucidi, and E. Procacci, "An application of support vector machines to sales forecasting under promotions," 4OR, vol. 14, no. 3, pp. 309–325, 2016.

[18] L. Wang, H. Zou, J. Su, L. Li, and S. Chaudhry, "An ARIMA-ANN hybrid model for time series forecasting," Systems Research and Behavioral Science, vol. 30, no. 3, pp. 244–259, 2013.

[19] S. Ji, H. Yu, Y. Guo, and Z. Zhang, "Research on sales forecasting based on ARIMA and BP neural network combined model," in Proceedings of the International Conference on Intelligent Information Processing, ACM, Wuhan, China, December 2016.

[20] J. Y. Choi and B. Lee, "Combining LSTM network ensemble via adaptive weighting for improved time series forecasting," Mathematical Problems in Engineering, vol. 2018, Article ID 2470171, 8 pages, 2018.

[21] K. Zhao and C. Wang, "Sales forecast in e-commerce using the convolutional neural network," 2017, https://arxiv.org/abs/1708.07946.

[22] K. Bandara, P. Shi, C. Bergmeir, H. Hewamalage, Q. Tran, and B. Seaman, "Sales demand forecast in e-commerce using a long short-term memory neural network methodology," 2019, https://arxiv.org/abs/1901.04028.

[23] X. Luo, C. Jiang, W. Wang, Y. Xu, J.-H. Wang, and W. Zhao, "User behavior prediction in social networks using weighted extreme learning machine with distribution optimization," Future Generation Computer Systems, vol. 93, pp. 1023–1035, 2018.

[24] L. Xiong, S. Jiankun, W. Long et al., "Short-term wind speed forecasting via stacked extreme learning machine with generalized correntropy," IEEE Transactions on Industrial Informatics, vol. 14, no. 11, pp. 4963–4971, 2018.

[25] J. L. Zhao, H. Zhu, and S. Zheng, "What is the value of an online retailer sharing demand forecast information?," Soft Computing, vol. 22, no. 16, pp. 5419–5428, 2018.

[26] G. Kulkarni, P. K. Kannan, and W. Moe, "Using online search data to forecast new product sales," Decision Support Systems, vol. 52, no. 3, pp. 604–611, 2012.

[27] A. Roy, "A novel multivariate fuzzy time series based forecasting algorithm incorporating the effect of clustering on prediction," Soft Computing, vol. 20, no. 5, pp. 1991–2019, 2016.

[28] R. J. Kuo and K. C. Xue, "A decision support system for sales forecasting through fuzzy neural networks with asymmetric fuzzy weights," Decision Support Systems, vol. 24, no. 2, pp. 105–126, 1998.

[29] P.-C. Chang, C.-H. Liu, and C.-Y. Fan, "Data clustering and fuzzy neural network for sales forecasting: a case study in printed circuit board industry," Knowledge-Based Systems, vol. 22, no. 5, pp. 344–355, 2009.

[30] C.-J. Lu and Y.-W. Wang, "Combining independent component analysis and growing hierarchical self-organizing maps with support vector regression in product demand forecasting," International Journal of Production Economics, vol. 128, no. 2, pp. 603–613, 2010.

[31] C.-J. Lu and L.-J. Kao, "A clustering-based sales forecasting scheme by using extreme learning machine and ensembling linkage methods with applications to computer server," Engineering Applications of Artificial Intelligence, vol. 55, pp. 231–238, 2016.

[32] W. Dai, Y.-Y. Chuang, and C.-J. Lu, "A clustering-based sales forecasting scheme using support vector regression for computer server," Procedia Manufacturing, vol. 2, pp. 82–86, 2015.

[33] I. F. Chen and C. J. Lu, "Sales forecasting by combining clustering and machine-learning techniques for computer retailing," Neural Computing and Applications, vol. 28, no. 9, pp. 2633–2647, 2016.

[34] T. Chiu, D. P. Fang, J. Chen, Y. Wang, and C. Jeris, "A robust and scalable clustering algorithm for mixed type attributes in a large database environment," in Proceedings of the Seventh ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, August 2001.

[35] T. Zhang, R. Ramakrishnan, and M. Livny, "Birch: a new data clustering algorithm and its applications," Data Mining and Knowledge Discovery, vol. 1, no. 2, pp. 141–182, 1997.

[36] L. Li, R. Situ, J. Gao, Z. Yang, and W. Liu, "A hybrid model combining convolutional neural network with XGBoost for predicting social media popularity," in Proceedings of the 2017 ACM on Multimedia Conference—MM '17, ACM, Mountain View, CA, USA, October 2017.

[37] J. Ke, H. Zheng, H. Yang, and X. Chen, "Short-term forecasting of passenger demand under on-demand ride services: a spatio-temporal deep learning approach," Transportation Research Part C: Emerging Technologies, vol. 85, pp. 591–608, 2017.

[38] K. Shimada, "Customer value creation in the information explosion era," in Proceedings of the 2014 Symposium on VLSI Technology, IEEE, Honolulu, HI, USA, June 2014.

[39] H. A. Abdelhafez, "Big data analytics: trends and case studies," in Encyclopedia of Business Analytics & Optimization, Association for Computing Machinery, New York, NY, USA, 2014.

[40] K. Kira and L. A. Rendell, "A practical approach to feature selection," Machine Learning Proceedings, vol. 48, no. 1, pp. 249–256, 1992.

[41] T. M. Khoshgoftaar, K. Gao, and L. A. Bullard, "A comparative study of filter-based and wrapper-based feature ranking techniques for software quality modeling," International Journal of Reliability, Quality and Safety Engineering, vol. 18, no. 4, pp. 341–364, 2011.

[42] M. A. Hall and L. A. Smith, "Feature selection for machine learning: comparing a correlation-based filter approach to the wrapper," in Proceedings of the Twelfth International Florida Artificial Intelligence Research Society Conference, DBLP, Orlando, FL, USA, May 1999.

[43] V. A. Huynh-Thu, Y. Saeys, L. Wehenkel, and P. Geurts, "Statistical interpretation of machine learning-based feature importance scores for biomarker discovery," Bioinformatics, vol. 28, no. 13, pp. 1766–1774, 2012.

[44] M. Sandri and P. Zuccolotto, "A bias correction algorithm for the Gini variable importance measure in classification trees," Journal of Computational and Graphical Statistics, vol. 17, no. 3, pp. 611–628, 2008.

[45] J. Brownlee, "Feature importance and feature selection with XGBoost in Python," 2016, https://machinelearningmastery.com.

[46] N. V. Chawla, S. Eschrich, and L. O. Hall, "Creating ensembles of classifiers," in Proceedings of the IEEE International Conference on Data Mining, IEEE Computer Society, San Jose, CA, USA, November-December 2001.

[47] A. K. Jain, M. N. Murty, and P. J. Flynn, "Data clustering: a review," ACM Computing Surveys, vol. 31, no. 3, pp. 264–323, 1999.

[48] A. Nagpal, A. Jatain, and D. Gaur, "Review based on data clustering algorithms," in Proceedings of the IEEE Conference on Information & Communication Technologies, Hainan, China, September 2013.

[49] Y. Wang, X. Ma, Y. Lao, and Y. Wang, "A fuzzy-based customer clustering approach with hierarchical structure for logistics network optimization," Expert Systems with Applications, vol. 41, no. 2, pp. 521–534, 2014.

[50] B. Wang, Y. Miao, H. Zhao, J. Jin, and Y. Chen, "A biclustering-based method for market segmentation using customer pain points," Engineering Applications of Artificial Intelligence, vol. 47, pp. 101–109, 2015.

[51] M. Halkidi, Y. Batistakis, and M. Vazirgiannis, "On clustering validation techniques," Journal of Intelligent Information Systems: Integrating Artificial Intelligence and Database Technologies, vol. 17, no. 2-3, pp. 107–145, 2001.

[52] R. W. Sembiring, J. M. Zain, and A. Embong, "A comparative agglomerative hierarchical clustering method to cluster implemented course," Journal of Computing, vol. 2, no. 12, 2010.

[53] M. Valipour, M. E. Banihabib, and S. M. R. Behbahani, "Comparison of the ARIMA and the auto-regressive artificial neural network models in forecasting the monthly inflow of the dam reservoir," Journal of Hydrology, vol. 476, 2013.

[54] E. Erdem and J. Shi, "ARMA based approaches for forecasting the tuple of wind speed and direction," Applied Energy, vol. 88, no. 4, pp. 1405–1414, 2011.

[55] R. J. Hyndman, "Forecasting functions for time series and linear models," 2019, http://mirror.costar.sfu.ca/mirror/CRAN/web/packages/forecast/index.html.

[56] S. Aishwarya, "Build high-performance time series models using auto ARIMA in Python and R," 2018, https://www.analyticsvidhya.com/blog/2018/08/auto-arima-time-series-modeling-python-r/.

[57] J. H. Friedman, "Greedy function approximation: a gradient boosting machine," The Annals of Statistics, vol. 29, no. 5, pp. 1189–1232, 2001.

[58] T. Chen and C. Guestrin, "XGBoost: a scalable tree boosting system," 2016, https://arxiv.org/abs/1603.02754.

[59] A. Gómez-Ríos, J. Luengo, and F. Herrera, "A study on the noise label influence in boosting algorithms: AdaBoost, GBM and XGBoost," in Proceedings of the International Conference on Hybrid Artificial Intelligence Systems, Logroño, Spain, June 2017.

[60] J. Wang, C. Lou, R. Yu, J. Gao, and H. Di, "Research on hot micro-blog forecast based on XGBoost and random forest," in Proceedings of the 11th International Conference on Knowledge Science, Engineering and Management, KSEM 2018, pp. 350–360, Changchun, China, August 2018.

[61] C. Li, X. Zheng, Z. Yang, and L. Kuang, "Predicting short-term electricity demand by combining the advantages of ARMA and XGBoost in fog computing environment," Wireless Communications and Mobile Computing, vol. 2018, Article ID 5018053, 18 pages, 2018.

[62] A. M. Jain, "Complete guide to parameter tuning in XGBoost with codes in Python," 2016, https://www.analyticsvidhya.com/blog/2016/03/complete-guide-parameter-tuning-xgboost-with-codes-python/.

Mathematical Problems in Engineering 15

Hindawiwwwhindawicom Volume 2018

MathematicsJournal of

Hindawiwwwhindawicom Volume 2018

Mathematical Problems in Engineering

Applied MathematicsJournal of

Hindawiwwwhindawicom Volume 2018

Probability and StatisticsHindawiwwwhindawicom Volume 2018

Journal of

Hindawiwwwhindawicom Volume 2018

Mathematical PhysicsAdvances in

Complex AnalysisJournal of

Hindawiwwwhindawicom Volume 2018

OptimizationJournal of

Hindawiwwwhindawicom Volume 2018

Hindawiwwwhindawicom Volume 2018

Engineering Mathematics

International Journal of

Hindawiwwwhindawicom Volume 2018

Operations ResearchAdvances in

Journal of

Hindawiwwwhindawicom Volume 2018

Function SpacesAbstract and Applied AnalysisHindawiwwwhindawicom Volume 2018

International Journal of Mathematics and Mathematical Sciences

Hindawiwwwhindawicom Volume 2018

Hindawi Publishing Corporation httpwwwhindawicom Volume 2013Hindawiwwwhindawicom

The Scientific World Journal

Volume 2018

Hindawiwwwhindawicom Volume 2018Volume 2018

Numerical AnalysisNumerical AnalysisNumerical AnalysisNumerical AnalysisNumerical AnalysisNumerical AnalysisNumerical AnalysisNumerical AnalysisNumerical AnalysisNumerical AnalysisNumerical AnalysisNumerical AnalysisAdvances inAdvances in Discrete Dynamics in

Nature and SocietyHindawiwwwhindawicom Volume 2018

Hindawiwwwhindawicom

Dierential EquationsInternational Journal of

Volume 2018

Hindawiwwwhindawicom Volume 2018

Decision SciencesAdvances in

Hindawiwwwhindawicom Volume 2018

AnalysisInternational Journal of

Hindawiwwwhindawicom Volume 2018

Stochastic AnalysisInternational Journal of

Submit your manuscripts atwwwhindawicom

Page 5: AnApplicationofaThree-StageXGBoost-BasedModeltoSales ...downloads.hindawi.com/journals/mpe/2019/8503252.pdf · AnApplicationofaThree-StageXGBoost-BasedModeltoSales ForecastingofaCross-BorderE-CommerceEnterprise

consideration of both the sales features and tendency of dataseries

In Stage 1 a novel C-XGBoost model is put forwardbased on the clustering and XGBoost which incorporatesdifferent clustering features into forecasting as influencingfactors +e two-step clustering algorithm is first applied topartitioning commodities into different clusters based onfeatures and then each cluster in the resulting clusters ismodeled via XGBoost

In Stage 2 an A-XGBoost model is presented by com-bining the ARIMA with XGBoost to predict the tendency oftime series which takes the strength of linear fitting ability ofARIMA and the strong nonlinear mapping ability ofXGBoost ARIMA is used to predict the linear part and therolling prediction method is employed to establish XGBoostto revise the nonlinear part of the data series namely re-siduals of the ARIMA

In Stage 3 a combination model is constructed based onC-XGBoost and A-XGBoost named C-A-XGBoost +eC-A-XGBoost is aimed at minimizing the sum errors ofsquares by assigning weights to the results of C-XGBoostand A-XGBoost in which the weights reflect the reliabilityand credibility of sales features and tendency of data series

+e procedures of the proposed three-stage model aredemonstrated in Figure 3 of which the details are given asfollows

31 Stage 1 C-XGBoost Model +e two-step clustering al-gorithm is applied to clustering a data series into severaldisjoint clusters+en each cluster in the resulting clusters isset as the input and output sets to construct and optimize thecorresponding C-XGBoost model Finally testing samplesare partitioned into the corresponding cluster by the trainedtwo-step clustering model and then the prediction results

are calculated based on the corresponding trainedC-XGBoost model

32 Stage 2 A-XGBoost Model +e optimal ARIMA basedon the minimum of AIC after the data series pass the tests ofstationarity and white noise is trained and determined ofwhich the processes are described in Section 2 +en theresidual vector e (r1 r2 rn)Τ between the predictedvalues and actual values are obtained by the trained ARIMAmodel Next the A-XGBoost is established by setting col-umns from 1 to k and column (k+ 1) in R as the input andoutput respectively as is illustrated in the followingequation

R

r1 r2 middot middot middot rk rk+1

r2 r3 middot middot middot rk+1 rk+2

⋮ ⋮ middot middot middot ⋮ ⋮

rnminus kminus 1 rnminus k middot middot middot rnminus 2 rnminus 1

rnminus k rnminus k+1 middot middot middot rnminus 1 rn

⎡⎢⎢⎢⎢⎢⎢⎢⎢⎢⎢⎢⎢⎢⎢⎢⎢⎢⎢⎢⎢⎢⎢⎢⎢⎢⎢⎢⎢⎢⎢⎢⎢⎢⎢⎢⎢⎢⎢⎢⎢⎢⎢⎢⎢⎣

⎤⎥⎥⎥⎥⎥⎥⎥⎥⎥⎥⎥⎥⎥⎥⎥⎥⎥⎥⎥⎥⎥⎥⎥⎥⎥⎥⎥⎥⎥⎥⎥⎥⎥⎥⎥⎥⎥⎥⎥⎥⎥⎥⎥⎥⎦

(nminus k)times(k+1)

(5)

+e final results of the test set are calculated by summingthe predicted results of the linear part by the trained ARIMAand that of residuals with the established XGBoost

33 Stage 3 C-A-XGBoost Model In this stage a combi-nation strategy is explored to minimize the error sum ofsquaresMSE in equation (6) by assigning weights wC andwAto C-XGBoost and A-XGBoost respectively +e predictedresults are calculated using equation (7) where 1113954YCA(k)1113954yC(k) and 1113954yA(k) denote the corresponding forecast valuesof the k-th sample via C-XGBoost A-XGBoost and C-A-XGBoost respectively In equation (6) y(k) is the actualvalue of the k-th sample

Table 1: The description of parameters in the XGBoost model.

| Type of parameters | Parameters | Description of parameters | Main purpose |
|---|---|---|---|
| Booster parameters | Max_depth | Maximum depth of a tree | Increasing this value makes the model more complex and more likely to overfit |
| | Min_child_weight | Minimum sum of instance weights needed in a child | The larger min_child_weight is, the more conservative the algorithm will be |
| | Max_delta_step | Maximum delta step | It can help make the update step more conservative |
| | Gamma | Minimum loss reduction | The larger gamma is, the more conservative the algorithm will be |
| | Subsample | Subsample ratio of the training instances | It is used in the update to prevent overfitting |
| | Colsample_bytree | Subsample ratio of columns for each tree | It is used in the update to prevent overfitting |
| | Eta | Learning rate | Step size shrinkage used in the update; it can prevent overfitting |
| Regularization parameters | Alpha, Lambda | Regularization terms on weights | Increasing these values makes the model more conservative |
| Learning task parameters | Reg linear | Learning objective | It is used to specify the learning task and the learning objective |
| Command line parameters | Number of estimators | Number of estimators | It is used to specify the number of iterative calculations |


$$\min \text{MSE} = \frac{1}{n} \sum_{k=1}^{n} \left[ \hat{Y}_{CA}(k) - y(k) \right]^2, \quad (6)$$

$$\hat{Y}_{CA}(k) = w_C \hat{y}_C(k) + w_A \hat{y}_A(k). \quad (7)$$

The least squares method is employed to find the optimal weights ($w_C$ and $w_A$), the calculation of which is simplified by transforming the equations into the following matrix operations.

In equation (8), the matrix $B$ consists of the predicted values of the C-XGBoost and A-XGBoost. In equation (9), the matrix $W$ consists of the weights. In equation (10), the matrix $Y$ consists of the actual values. Equation (11) is obtained by transforming equation (7) into matrix form. Equation (12) is obtained from equation (11) by left-multiplying both sides by the transpose of the matrix $B$. According to equation (13), the optimal weights ($w_C$ and $w_A$) are calculated.

$$B = \begin{bmatrix} \hat{y}_C(1) & \hat{y}_A(1) \\ \hat{y}_C(2) & \hat{y}_A(2) \\ \vdots & \vdots \\ \hat{y}_C(n) & \hat{y}_A(n) \end{bmatrix}, \quad (8)$$

$$W = \begin{bmatrix} w_C \\ w_A \end{bmatrix}, \quad (9)$$

$$Y = [y(1), y(2), \ldots, y(n)]^T, \quad (10)$$

$$BW = Y, \quad (11)$$

$$B^T B W = B^T Y, \quad (12)$$

$$W = \left(B^T B\right)^{-1} B^T Y. \quad (13)$$
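For concreteness, the weight computation in equations (8)-(13) can be written as a short least-squares routine. The following is a minimal Python sketch, not the authors' code; the array names are illustrative, and numpy.linalg.lstsq is used instead of explicitly forming $(B^T B)^{-1}$, which is the numerically safer equivalent.

```python
import numpy as np

def combination_weights(y_c_hat, y_a_hat, y_true):
    """Solve W = (B^T B)^(-1) B^T Y (equations (8)-(13)) for [w_C, w_A]."""
    B = np.column_stack([y_c_hat, y_a_hat])  # n x 2 matrix of the two forecasts
    Y = np.asarray(y_true, dtype=float)      # n actual values
    W, *_ = np.linalg.lstsq(B, Y, rcond=None)
    return W

# Illustrative usage on a test set:
# w_c, w_a = combination_weights(c_xgb_pred, a_xgb_pred, sku_sales)
# y_ca = w_c * c_xgb_pred + w_a * a_xgb_pred   # equation (7)
```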

4. Numerical Experiments and Comparisons

4.1. Data Description. To illustrate the effectiveness of the developed C-A-XGBoost model, the following data series are used to verify the forecasting performance.

4.1.1. Source Data Series. As listed in Table 2, there are eight data series in the source data series. The data series range from March 1, 2017, to March 16, 2018.

4.1.2. Clustering Series. There are 10 continuous attributes and 6 categorical attributes in the clustering series, which are obtained by reconstructing the source data series. The attribute descriptions of the clustering series are illustrated in Table 3.

4.2. Uniform Experimental Conditions. To verify the performance of the proposed model according to the performance evaluation indexes, some uniform experimental conditions are established as follows.

4.2.1. Uniform Data Set. As shown in Table 4, the data series are partitioned into the training set, validation set, and test set so as to satisfy the requirements of different models. The data application is described as follows:

(1) The clustering series cover samples of 381 days.
(2) For the C-XGBoost model, training set 1, namely, the samples of the first 347 days in the clustering series, is utilized to establish the two-step clustering models. The resulting samples of the two-step clustering are used to construct the XGBoost models. The test set with the remaining samples of 34 days is selected to validate the C-XGBoost model. In detail, the test set is first partitioned into the corresponding clusters by the established two-step clustering model, and then the test set is applied to checking the validity of the corresponding C-XGBoost models.

Figure 3: The procedures of the proposed three-stage model (Stage 1: C-XGBoost, two-step clustering of the data series followed by per-cluster XGBoost prediction; Stage 2: A-XGBoost, ARIMA prediction with XGBoost revision of the residuals; Stage 3: C-A-XGBoost, combination of the two predictions).


(3) For the A-XGBoost model, training set 2 with the samples of the 1st-277th days is used to construct the ARIMA, and the validation set is used to calculate the residuals of the ARIMA forecast, which are used to train the A-XGBoost model. Then, the test set is employed to verify the performance of the model.

(4) The test set contains the final 34 data samples, which are employed to fit the optimal combination weights for the C-XGBoost and A-XGBoost models.

4.2.2. Uniform Evaluation Indexes. Several performance measures have previously been applied to verifying the viability and effectiveness of forecasting models. As illustrated in Table 5, the common evaluation measurements are chosen to distinguish the optimal forecasting model. The smaller they are, the more accurate the model is.

4.2.3. Uniform Parameters of the XGBoost Model. The first priority for optimization is to tune max_depth and min_child_weight with the other parameters fixed, which is the most effective way of optimizing the XGBoost. The ranges of max_depth and min_child_weight are 6-10 and 1-6, respectively. The default values of the parameters are listed in Table 6.

4.3. Experiments of the C-A-XGBoost Model

4.3.1. C-XGBoost Model

(1) Step 1: Commodity clustering. The two-step clustering algorithm is first applied to training set 1. Standardization is applied to the continuous attributes, the noise percent of outlier handling is 25%, the log-likelihood distance is the basis of distance measurement, and BIC is set as the clustering criterion.

As shown in Figure 4, the clustering series are partitioned into 12 homogeneous clusters based on 11 features, denoted as C12_j (j = 1, 2, ..., 12), and the silhouette coefficient is 0.4.

As illustrated in Figure 5, the ratio of cluster sizes is 2.64, and the percentage is neither too large nor too small for each cluster. Therefore, the cluster quality is acceptable.
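The two-step algorithm itself is typically run in a statistical package; as a rough Python analogue (an assumption for illustration, not the toolchain used in this study), a BIRCH pre-clustering pass followed by an agglomerative merge into 12 clusters reproduces the overall two-step structure. The mixed-attribute log-likelihood distance and the BIC-based choice of the cluster number are not reproduced here, and the data below are synthetic stand-ins for the continuous attributes.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler
from sklearn.cluster import Birch
from sklearn.metrics import silhouette_score

rng = np.random.default_rng(0)
X_continuous = rng.normal(size=(2000, 10))   # stand-in for the 10 continuous attributes

X_scaled = StandardScaler().fit_transform(X_continuous)   # standardization step
two_step = Birch(threshold=0.5, n_clusters=12)  # CF-tree pass, then merge to 12 clusters
labels = two_step.fit_predict(X_scaled)
print(silhouette_score(X_scaled, labels))    # cohesion/separation check, cf. Figure 4(b)
```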

(2) Step 2: Construct the C-XGBoost models. Features are first selected from each cluster C12_j of the 12 clusters based on feature importance scores. After that, setting the selected features of each cluster and the SKU sales in Table 3 as the input and output variables, respectively, the C-XGBoost models are constructed for each cluster C12_j, denoted as C12_j_XGBoost.

Take the cluster C12_3 of the 12 clusters as an example to illustrate the process of modeling the XGBoost.

For C12_3, the features listed in Table 3 are first filtered, and the 7 selected features are displayed in Figure 6. It can be observed that F1 (goods click), F3 (cart click), F5 (goods price), F6 (sales unique visitor), and F7 (original shop price) are the dominating factors, whereas F2 (temperature mean) and F4 (favorites click) contribute less to the prediction.

Setting the 11 features of the cluster C12_3 in Step 1 and the corresponding SKU sales in Table 3 as the input and output, respectively, the C12_3_XGBoost is pretrained under the default parameters in Table 6. For the prebuilt C12_3_XGBoost model, the value of ME is 0.393 and the value of MAE is 0.896.
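As an illustration of this pretraining step, the sketch below fits an XGBoost regressor under the Table 6 defaults and reads off split-frequency feature scores like those in Figure 6. It is a hedged example with synthetic stand-in data, not the authors' code; note that recent xgboost releases spell the "reg linear" objective as "reg:squarederror".

```python
import numpy as np
import xgboost as xgb

rng = np.random.default_rng(1)
X = rng.normal(size=(500, 11))          # stand-in for the 11 clustering features
y = rng.poisson(3.0, size=500)          # stand-in for SKU sales

# Default values from Table 6 (eta appears as learning_rate)
model = xgb.XGBRegressor(
    n_estimators=100, max_depth=6, min_child_weight=1, max_delta_step=0,
    objective="reg:squarederror", subsample=1, learning_rate=0.3,
    gamma=0.1, colsample_bytree=1, colsample_bylevel=1,
    reg_alpha=0, reg_lambda=1)
model.fit(X, y)

# Split-count feature scores, analogous to the F-scores of Figure 6
print(model.get_booster().get_score(importance_type="weight"))
```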

(3) Step 3: Parameter optimization. XGBoost is a supervised learning algorithm, so the key to optimization is to determine the appropriate input and output variables.

Table 2: The description of source data series.

| Data series | Fields |
|---|---|
| Customer behavior data^a | Data date, goods click, cart click, favorites click |
| Goods information data^b | Goods id, SKU^i id, level, season, brand id |
| Goods sales data^c | Data date, SKU sales, goods price, original shop price |
| The relationship between goods id and SKU id^d | Goods id, SKU id |
| Goods promote price^e | Data date, goods price, goods promotion price |
| Marketing^f | Data date, marketing plan |
| Holidays^g | Data date, holiday |
| Temperature^h | Data date, temperature mean |

^a-f The six data series are sourced from the historical data of the Saudi Arabian market on the Jollychic cross-border e-commerce trading platform (https://www.jollychic.com). ^g The data of holidays are captured from the URL https://shijian.cc/114jieri/2017. ^h The data of temperature are captured from the URL https://www.wunderground.com/weather/eg/saudi-arabia. ^i SKU's full name is stock keeping unit; each product has a unique SKU number.

Table 3: The description of clustering series.

| Fields | Meaning of fields | Fields | Meaning of fields |
|---|---|---|---|
| Data date | Date | Favorites click | Number of clicks on favorites |
| Goods code | Goods code | Sales unique visitor | Number of unique visitors |
| SKU code | SKU code | Goods season | Seasonal attributes of goods |
| SKU sales | Sales of SKU | Marketing | Activity type code |
| Goods price | Selling price | Plan | Activity rhythm code |
| Original shop price | Tag price | Promotion | Promotion code |
| Goods click | Number of clicks on goods | Holiday | The holiday of the day |
| Cart click | Number of clicks on purchasing carts | Temperature mean | Mean of air temperatures (°F) |


Table 6: Default parameter values of XGBoost.

| Parameter | Default value | Parameter | Default value |
|---|---|---|---|
| Number of estimators | 100 | Gamma | 0.1 |
| Max_depth | 6 | Colsample_bytree | 1 |
| Min_child_weight | 1 | Colsample_bylevel | 1 |
| Max_delta_step | 0 | Alpha | 0 |
| Objective | Reg linear | Lambda | 1 |
| Subsample | 1 | Scale_pos_weight | 1 |
| Eta | 0.3 | | |

Table 4: The description of the training set, validation set, and test set.

| Data set | Samples | Number of weeks | Start date | End date | First day | Last day |
|---|---|---|---|---|---|---|
| Training set 1 | Clustering series | 50 | Mar. 1, 2017 (Wed.) | Feb. 10, 2018 (Sat.) | 1 | 347 |
| Training set 2 | SKU code 94033 | 40 | Mar. 1, 2017 (Wed.) | Dec. 2, 2017 (Sat.) | 1 | 277 |
| Validation set | SKU code 94033 | 10 | Dec. 3, 2017 (Sun.) | Feb. 10, 2018 (Sat.) | 278 | 347 |
| Test set | SKU code 94033 | 5 | Feb. 11, 2018 (Sun.) | Mar. 16, 2018 (Fri.) | 348 | 381 |

Table 5: The description of evaluation indexes.

| Evaluation index | Expression | Description |
|---|---|---|
| ME | $\text{ME} = \frac{1}{b-a+1}\sum_{k=a}^{b}(y_k - \hat{y}_k)$ | The mean error |
| MSE | $\text{MSE} = \frac{1}{b-a+1}\sum_{k=a}^{b}(y_k - \hat{y}_k)^2$ | The mean squared error |
| RMSE | $\text{RMSE} = \sqrt{\frac{1}{b-a+1}\sum_{k=a}^{b}(y_k - \hat{y}_k)^2}$ | The root mean squared error |
| MAE | $\text{MAE} = \frac{1}{b-a+1}\sum_{k=a}^{b}\lvert y_k - \hat{y}_k\rvert$ | The mean absolute error |

$y_k$ is the sales of the $k$-th sample; $\hat{y}_k$ denotes the corresponding prediction.
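The four indexes of Table 5 are straightforward to compute over a window of samples; the following minimal helpers (illustrative, not from the paper) evaluate them.

```python
import numpy as np

def me(y, y_hat):   return np.mean(np.asarray(y) - np.asarray(y_hat))          # mean error
def mse(y, y_hat):  return np.mean((np.asarray(y) - np.asarray(y_hat)) ** 2)   # mean squared error
def rmse(y, y_hat): return np.sqrt(mse(y, y_hat))                              # root mean squared error
def mae(y, y_hat):  return np.mean(np.abs(np.asarray(y) - np.asarray(y_hat)))  # mean absolute error
```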

Figure 4: Model summary and cluster quality of the two-step clustering model. (a) The summary of the two-step clustering model (algorithm: two-step; inputs: 11; clusters: 12). (b) The silhouette coefficient of cohesion and separation for the 12 clusters (scale from -1.0 to 1.0: poor, fair, good).

Figure 5: Cluster sizes of the clustering series by the two-step clustering algorithm (size of smallest cluster: 388,177 (5.1%); size of largest cluster: 1,026,625 (13.5%); ratio of sizes, largest cluster to smallest cluster: 2.64).


In contrast, parameter optimization has less impact on the accuracy of the algorithm. Therefore, in this paper, only the primary parameters, including max_depth and min_child_weight, are tuned to optimize the XGBoost [61]. The model can reach a balanced point because increasing the value of max_depth makes the model more complex and more likely to overfit, whereas increasing the value of min_child_weight makes the model more conservative.

The prebuilt C12_3_XGBoost model is optimized to minimize ME and MAE by tuning max_depth (from 6 to 10) and min_child_weight (from 1 to 6) with the other parameters fixed, in which the ranges of the parameters are determined according to numerous case studies with the XGBoost, such as [62]. The optimal parameter combination is determined by the minimum of the ME and MAE under the different parameter combinations.

Figure 7 shows the changes of ME and MAE of the XGBoost as max_depth and min_child_weight change. It can be seen that both the ME and MAE are the smallest when max_depth is 9 and min_child_weight is 2; that is, the model is then optimal.
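A grid search over the two primary parameters, with the Table 6 defaults held fixed, reproduces this tuning procedure. The sketch below is illustrative (synthetic stand-in data; selecting by the pair (|ME|, MAE) is a simplification of reading the two curves in Figure 7).

```python
import numpy as np
import xgboost as xgb

rng = np.random.default_rng(2)
X_train = rng.normal(size=(400, 11))     # stand-in cluster features
y_train = rng.poisson(3.0, size=400)     # stand-in SKU sales

best = None
for depth in range(6, 11):                      # max_depth in 6-10
    for mcw in range(1, 7):                     # min_child_weight in 1-6
        m = xgb.XGBRegressor(n_estimators=100, learning_rate=0.3, gamma=0.1,
                             max_depth=depth, min_child_weight=mcw,
                             objective="reg:squarederror")
        m.fit(X_train, y_train)
        pred = m.predict(X_train)
        score = (abs(np.mean(y_train - pred)), np.mean(np.abs(y_train - pred)))
        if best is None or score < best[0]:
            best = (score, depth, mcw)
print(best)   # ((|ME|, MAE), max_depth, min_child_weight)
```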

(4) Step 4: Results on the test set. The test set is partitioned into the corresponding clusters by the two-step clustering model trained in Step 1. After that, Steps 2 and 3 are repeated for the test set.

As shown in Table 7, the test set is partitioned into the clusters C12_3 and C12_4. Then, the corresponding models C12_3_XGBoost and C12_4_XGBoost are determined. C12_3_XGBoost has been trained and optimized as an example in Steps 2 and 3, and C12_4_XGBoost is likewise trained and optimized by repeating Steps 2 and 3. Finally, the prediction results are obtained by the optimized C12_3_XGBoost and C12_4_XGBoost.

As illustrated in Figure 8, the ME and MAE of C12_4_XGBoost change with the different values of max_depth and min_child_weight. The model performs best when max_depth is 10 and min_child_weight is 2, because both the ME and MAE are then the smallest. The forecasting results of the test set are calculated and summarized in Table 7.

4.3.2. A-XGBoost Model

(1) Step 1: Test stationarity and white noise of training set 2. For training set 2, the p values of the ADF test and the Box-Pierce test are 0.01 and 3.331 × 10⁻¹⁶, respectively, both lower than 0.05. Therefore, the time series is stationary and nonwhite noise, indicating that training set 2 is suitable for the ARIMA.
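Both tests are available in Python's statsmodels; the hedged sketch below (a stand-in series, not the paper's data) shows the calls that would produce the ADF and Box-Pierce p values.

```python
import numpy as np
from statsmodels.tsa.stattools import adfuller
from statsmodels.stats.diagnostic import acorr_ljungbox

rng = np.random.default_rng(3)
sales = rng.poisson(30, size=277).astype(float)   # stand-in for training set 2

adf_stat, adf_p, *_ = adfuller(sales)             # p < 0.05 => stationary
bp = acorr_ljungbox(sales, lags=[10], boxpierce=True)
print(adf_p, bp["bp_pvalue"].iloc[0])             # p < 0.05 => not white noise
```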

(2) Step 2: Train the ARIMA model. According to Section 2.3, the candidate parameter combinations are first determined by the ACF and PACF plots and the auto.arima() function in the R package "forecast".

As shown in Figure 9(a), the SKU sales fluctuate significantly in the first 50 days compared with the sales after 50 days; in Figure 9(b), the ACF plot shows a strong trailing characteristic; in Figure 9(c), the PACF plot shows a decreasing and oscillating pattern. Therefore, the first-order difference should be calculated.

As illustrated in Figure 10(a), the SKU sales fluctuate around zero after the first-order difference. Figures 10(b) and 10(c) graphically present the ACF and PACF plots after the first-order difference, both of which show a decreasing and oscillating pattern. This indicates that training set 2 conforms to the ARMA.

As a result, the possible optimal models are ARIMA(2, 1, 2), ARIMA(2, 1, 3), and ARIMA(2, 1, 4), according to the ACF and PACF plots in Figure 10.

Table 8 shows the AIC values of the ARIMA under different parameters, which are generated by the auto.arima() function. It can be concluded that ARIMA(0, 1, 1) is the best of these models because its AIC is the lowest.

To further determine the optimal model, the AIC and RMSE of the ARIMA models under different parameters are summarized in Table 9. The candidates include the 3 possible optimal ARIMA models judged from Figure 10 and the best ARIMA generated by the auto.arima() function. According to the minimum principle, ARIMA(2, 1, 4) is optimal because both its AIC and RMSE are the smallest.
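The same order selection can be scripted: fit each candidate and keep the order with the lowest AIC. A minimal statsmodels sketch follows (stand-in series; the R auto.arima() search itself is not reproduced).

```python
import numpy as np
from statsmodels.tsa.arima.model import ARIMA

rng = np.random.default_rng(3)
sales = rng.poisson(30, size=277).astype(float)   # stand-in for training set 2

candidates = [(0, 1, 1), (2, 1, 2), (2, 1, 3), (2, 1, 4)]  # the Table 9 orders
fits = {order: ARIMA(sales, order=order).fit() for order in candidates}
best_order = min(fits, key=lambda o: fits[o].aic)  # ARIMA(2, 1, 4) in the paper
print(best_order, fits[best_order].aic)
```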

Figure 6: Feature importance scores of the C12_3_XGBoost model (F1: goods click; F2: temperature mean; F3: cart click; F4: favorites click; F5: goods price; F6: sales unique visitor; F7: original shop price).


Figure 7: ME and MAE of C12_3_XGBoost under different parameters (max_depth = 6-10, min_child_weight = 1-6). (a) Mean error of the training set. (b) Mean absolute error of the training set.

Table 7: The results of C-XGBoost for the test set.

| Test set | Days | C12_j | C12_j_XGBoost model | (Depth, min_child_weight) | Training set ME | Training set MAE | Test set ME | Test set MAE |
|---|---|---|---|---|---|---|---|---|
| 348th-372nd | 25 | 3 | C12_3_XGBoost | (9, 2) | 0.351 | 0.636 | 4.385 | 4.400 |
| 373rd-381st | 9 | 4 | C12_4_XGBoost | (10, 2) | 0.339 | 0.591 | 1.778 | 2.000 |
| 348th-381st | 34 | — | — | — | — | — | 3.647 | 3.765 |

Figure 8: ME and MAE of C12_4_XGBoost under different parameters (max_depth = 6-10, min_child_weight = 1-6). (a) Mean error of the training set. (b) Mean absolute error of the training set.


(3) Step 3: Calculate the residuals of the optimal ARIMA. The prediction results from the 278th to the 381st day are obtained using the trained ARIMA(2, 1, 4), denoted as ARIMA_forecast. Then, the residuals between the predicted values ARIMA_forecast and the actual values SKU_sales are calculated, denoted as ARIMA_residuals.

(4) Step 4: Train the A-XGBoost by setting ARIMA_residuals as the input and output. As shown in equation (14), the output data are the 8th column of the matrix R, and the corresponding inputs are the residuals of the preceding 7 days (columns 1 to 7):

$$R = \begin{bmatrix} r_{271} & r_{272} & \cdots & r_{277} & r_{278} \\ r_{272} & r_{273} & \cdots & r_{278} & r_{279} \\ \vdots & \vdots & & \vdots & \vdots \\ r_{339} & r_{340} & \cdots & r_{345} & r_{346} \\ r_{340} & r_{341} & \cdots & r_{346} & r_{347} \end{bmatrix}_{70 \times 8}. \quad (14)$$
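Constructing this rolling matrix and fitting the residual model takes only a few lines; the sketch below is illustrative, with synthetic residuals standing in for the 77 ARIMA residuals r_271 to r_347.

```python
import numpy as np
import xgboost as xgb

def lag_matrix(res, k=7):
    """Rows of equation (14): k lagged residuals plus the next residual."""
    res = np.asarray(res, dtype=float)
    return np.array([res[i:i + k + 1] for i in range(len(res) - k)])

rng = np.random.default_rng(4)
arima_residuals = rng.normal(size=77)        # stand-in for r_271 ... r_347

R = lag_matrix(arima_residuals, k=7)         # shape (70, 8), as in equation (14)
a_xgb = xgb.XGBRegressor(objective="reg:squarederror")
a_xgb.fit(R[:, :7], R[:, 7])                 # inputs: columns 1-7; output: column 8
```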

Figure 10: Plots after the first-order difference of (a) SKU sales over days, (b) ACF, and (c) PACF.

Figure 9: Plots of (a) SKU sales over days, (b) ACF, and (c) PACF.


(5) Step 5: Calculate the predicted residuals of the test set using the A-XGBoost trained in Step 4, denoted as A-XGBoost_residuals. For the test set, the result for the 348th day is obtained by setting the ARIMA_residuals of the 341st-347th days as the input. Then, the result for the 349th day is calculated by inputting the ARIMA_residuals of the 342nd-347th days and the A-XGBoost_residuals of the 348th day into the trained A-XGBoost. The process is repeated until the A-XGBoost_residuals of the 349th-381st days are obtained.
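This rolling scheme is a loop in which each predicted residual is appended to the input window. The following sketch continues the previous one (a_xgb and arima_residuals are the stand-ins defined there, and arima_forecast would be the fitted ARIMA's 34-day forecast, which is assumed here rather than computed).

```python
import numpy as np

window = list(arima_residuals[-7:])          # ARIMA_residuals of days 341-347
xgb_residuals = []
for _ in range(34):                          # test-set days 348-381
    r_hat = float(a_xgb.predict(np.array(window[-7:], ndmin=2))[0])
    xgb_residuals.append(r_hat)              # A-XGBoost_residuals
    window.append(r_hat)                     # the prediction feeds the next step

# Step 6: A-XGBoost_forecast = ARIMA_forecast + predicted residuals
# a_xgb_forecast = arima_forecast + np.array(xgb_residuals)
```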

(6) Step 6: Calculate the final prediction results. For the test set, the final prediction results are calculated by summing the corresponding values of A-XGBoost_residuals and ARIMA_forecast, denoted as A-XGBoost_forecast. The results of the A-XGBoost are summarized in Table 10.

4.3.3. C-A-XGBoost Model. The optimal combination weights are determined by minimizing the MSE in equation (6).

For the test set, the weights $w_C$ and $w_A$ are obtained from the matrix operation in equation (13), $W = (B^T B)^{-1} B^T Y$, which yields $w_C = 1.262$ and $w_A = 0.273$.

4.4. Models for Comparison. In this section, the following models are chosen for comparison between the proposed models and other classical models:

ARIMA: as one of the most common time series models, it is used to predict the sales of the time sequence, following the same processes as the ARIMA in Section 4.3.2.
XGBoost: the XGBoost model is constructed and optimized by setting the selected features and the corresponding SKU sales as the input and output.
C-XGBoost: taking the sales features of commodities into account, the XGBoost is used to forecast sales based on the clusters resulting from the two-step clustering model. The procedures are the same as those in Section 4.3.1.
A-XGBoost: the A-XGBoost is applied to revising the residuals of the ARIMA. Namely, the ARIMA is first used to model the linear part of the time series, and then the XGBoost is used to model the nonlinear part. The relevant processes are described in Section 4.3.2.
C-A-XGBoost: the model combines the advantages of the C-XGBoost and A-XGBoost, of which the procedures are displayed in Section 4.3.3.

4.5. Results of Different Models. In this section, the test set is used to verify the superiority of the proposed C-A-XGBoost.

Figure 11 shows the curve of the actual values SKU_sales and the five fitting curves of the predicted values from the 348th day to the 381st day, obtained by the ARIMA, XGBoost, C-XGBoost, A-XGBoost, and C-A-XGBoost.

It can be seen that the C-A-XGBoost has the best fit to the original values, as its fitting curve is the most similar of the five to the curve of the actual SKU sales.

To further illustrate the superiority of the proposed C-A-XGBoost, the evaluation indexes mentioned in Section 4.2.2 are applied to identifying the best sales forecasting model. Table 11 provides a comparative summary of the indexes for the five models in Section 4.4.

According to Table 11, it can be concluded that the superiority of the proposed C-A-XGBoost over the other models is distinct, as all of its evaluation indexes are the smallest.

The C-XGBoost is inferior to the C-A-XGBoost but outperforms the other three models, underlining that the C-XGBoost is superior to the single XGBoost.

The A-XGBoost performs better than the ARIMA, proving that the XGBoost is effective for the residual modification of the ARIMA.

According to the analysis above, the proposed C-A-XGBoost has the best forecasting performance for the sales of commodities in the cross-border e-commerce enterprise.

Table 10: The performance evaluation of the A-XGBoost.

| A-XGBoost | Validation set | Test set |
|---|---|---|
| Minimum error | −0.003 | −8.151 |
| Maximum error | 0.002 | 23.482 |
| Mean error | 0.000 | 1.213 |
| Mean absolute error | 0.001 | 4.566 |
| Standard deviation | 0.001 | 6.262 |
| Linear correlation | 1 | −0.154 |
| Occurrences | 70 | 34 |

Table 8: AIC values of the resulting ARIMA models by the auto.arima() function.

| ARIMA (p, d, q) | AIC | ARIMA (p, d, q) | AIC |
|---|---|---|---|
| ARIMA(2, 1, 2) with drift | 2854.317 | ARIMA(0, 1, 2) with drift | 2852.403 |
| ARIMA(0, 1, 0) with drift | 2983.036 | ARIMA(1, 1, 2) with drift | 2852.172 |
| ARIMA(1, 1, 0) with drift | 2927.344 | ARIMA(0, 1, 1) | 2850.212 |
| ARIMA(0, 1, 1) with drift | 2851.296 | ARIMA(1, 1, 1) | 2851.586 |
| ARIMA(0, 1, 0) | 2981.024 | ARIMA(0, 1, 2) | 2851.460 |
| ARIMA(1, 1, 1) with drift | 2852.543 | ARIMA(1, 1, 2) | 2851.120 |

Table 9: AIC values and RMSE of ARIMA models under different parameters.

| ARIMA model | ARIMA (p, d, q) | AIC | RMSE |
|---|---|---|---|
| 1 | ARIMA(0, 1, 1) | 2850.170 | 41.814 |
| 2 | ARIMA(2, 1, 2) | 2852.980 | 41.572 |
| 3 | ARIMA(2, 1, 3) | 2854.940 | 41.567 |
| 4 | ARIMA(2, 1, 4) | 2848.850 | 40.893 |


5. Conclusions and Future Directions

In this research, a new XGBoost-based forecasting model named C-A-XGBoost is proposed, which takes both the sales features and the tendency of the data series into account.

The C-XGBoost is first presented, combining clustering and the XGBoost with the aim of reflecting the sales features of commodities in forecasting. The two-step clustering algorithm is applied to partitioning the data series into different clusters based on selected features, which are used as the influencing factors for forecasting. After that, the corresponding C-XGBoost models are established for the different clusters using the XGBoost.

The proposed A-XGBoost takes advantage of the ARIMA in predicting the tendency of the data series and overcomes the disadvantages of the ARIMA by applying the XGBoost to the nonlinear part of the data series. The optimal ARIMA is obtained by comparing the AICs under different parameters, and the trained ARIMA model is then used to predict the linear part of the data series. For the nonlinear part, a rolling prediction is conducted by the trained XGBoost, of which the input and output are the residuals produced by the ARIMA. The final results of the A-XGBoost are calculated by adding the residuals predicted by the XGBoost to the corresponding forecast values of the ARIMA.

In conclusion, the C-A-XGBoost is developed by assigning appropriate weights to the forecasting results of the C-XGBoost and A-XGBoost so as to exploit their respective strengths. Consequently, a linear combination of the two models' forecasting results is calculated as the final predicted values.

To verify the effectiveness of the proposed C-A-XGBoost, the ARIMA, XGBoost, C-XGBoost, and A-XGBoost are employed for comparison. Meanwhile, four common evaluation indexes, including ME, MSE, RMSE, and MAE, are utilized to check the forecasting performance of the C-A-XGBoost. The experiments demonstrate that the C-A-XGBoost outperforms the other models, indicating that the C-A-XGBoost provides theoretical support for the sales forecasting of the e-commerce company and can serve as a reference for selecting forecasting models. It is advisable for the e-commerce company to choose different forecasting models for different commodities instead of utilizing a single model.

Two potential extensions are put forward for future research. On the one hand, there may be no model in which all evaluation indicators are minimal, which makes choosing the optimal model difficult; therefore, a comprehensive evaluation index of forecasting performance will be constructed to overcome this difficulty. On the other hand, sales forecasting is ultimately used to optimize inventory management, so relevant factors should be considered, including inventory cost, order lead time, delivery time, and transportation time.

Data Availability

The data used to support the findings of this study are available from the corresponding author upon request.

Figure 11: Comparison of the SKU sales with the predicted values of the five models in Section 4.4. The x-axis represents the day; the y-axis represents the SKU sales. The curves with different colors represent the different models (SKU_sales, ARIMA, XGBoost, C-XGBoost, A-XGBoost, and C-A-XGBoost).

Table 11: The performance evaluation of the ARIMA, XGBoost, A-XGBoost, C-XGBoost, and C-A-XGBoost.

| Evaluation index | ARIMA | XGBoost | A-XGBoost | C-XGBoost | C-A-XGBoost |
|---|---|---|---|---|---|
| ME | −21.346 | −3.588 | 1.213 | 3.647 | 0.288 |
| MSE | 480.980 | 36.588 | 39.532 | 23.353 | 10.769 |
| RMSE | 21.931 | 6.049 | 6.287 | 4.832 | 3.282 |
| MAE | 21.346 | 5.059 | 4.566 | 3.765 | 2.515 |


Conflicts of Interest

The authors declare that there are no conflicts of interest regarding the publication of this paper.

Acknowledgments

This research was supported by the National Key R&D Program of China through the China Development Research Foundation (CDRF), funded by the Ministry of Science and Technology (CDRF-SQ2017YFGH002106).

References

[1] Y. Jin, Data Science in Supply Chain Management: Data-Related Influences on Demand Planning, Proquest LLC, Ann Arbor, MI, USA, 2013.
[2] S. Akter and S. F. Wamba, "Big data analytics in e-commerce: a systematic review and agenda for future research," Electronic Markets, vol. 26, no. 2, pp. 173-194, 2016.
[3] J. L. Castle, M. P. Clements, and D. F. Hendry, "Forecasting by factors, by variables, by both or neither?," Journal of Econometrics, vol. 177, no. 2, pp. 305-319, 2013.
[4] A. Kawa, "Supply chains of cross-border e-commerce," in Proceedings of the Advanced Topics in Intelligent Information and Database Systems, Springer International Publishing, Kanazawa, Japan, April 2017.
[5] L. Song, T. Lv, X. Chen, and J. Gao, "Architecture of demand forecast for online retailers in China based on big data," in Proceedings of the International Conference on Human-Centered Computing, Springer, Colombo, Sri Lanka, January 2016.
[6] G. Iman, A. Ehsan, R. W. Gary, and A. Y. William, "An overview of energy demand forecasting methods published in 2005-2015," Energy Systems, vol. 8, no. 2, pp. 411-447, 2016.
[7] S. Gmbh, Forecasting with Exponential Smoothing, vol. 26, no. 1, Springer, Berlin, Germany, 2008.
[8] G. E. P. Box and G. M. Jenkins, "Time series analysis: forecasting and control," Journal of Time, vol. 31, no. 4, 303 pages, 2010.
[9] G. P. Zhang, "Time series forecasting using a hybrid ARIMA and neural network model," Neurocomputing, vol. 50, pp. 159-175, 2003.
[10] S. Ma, R. Fildes, and T. Huang, "Demand forecasting with high dimensional data: the case of SKU retail sales forecasting with intra- and inter-category promotional information," European Journal of Operational Research, vol. 249, no. 1, pp. 245-257, 2015.
[11] O. Gur Ali, S. SayIn, T. van Woensel, and J. Fransoo, "SKU demand forecasting in the presence of promotions," Expert Systems with Applications, vol. 36, no. 10, pp. 12340-12348, 2009.
[12] T. Huang, R. Fildes, and D. Soopramanien, "The value of competitive information in forecasting FMCG retail product sales and the variable selection problem," European Journal of Operational Research, vol. 237, no. 2, pp. 738-748, 2014.
[13] F. Cady, "Machine learning overview," in The Data Science Handbook, Wiley, Hoboken, NJ, USA, 2017.
[14] N. K. Ahmed, A. F. Atiya, N. E. Gayar, and H. El-Shishiny, "An empirical comparison of machine learning models for time series forecasting," Econometric Reviews, vol. 29, no. 5-6, pp. 594-621, 2010.
[15] A. P. Ansuj, M. E. Camargo, R. Radharamanan, and D. G. Petry, "Sales forecasting using time series and neural networks," Computers & Industrial Engineering, vol. 31, no. 1-2, pp. 421-424, 1996.
[16] I. Alon, M. Qi, and R. J. Sadowski, "Forecasting aggregate retail sales: a comparison of artificial neural networks and traditional methods," Journal of Retailing and Consumer Services, vol. 8, no. 3, pp. 147-156, 2001.
[17] G. Di Pillo, V. Latorre, S. Lucidi, and E. Procacci, "An application of support vector machines to sales forecasting under promotions," 4OR, vol. 14, no. 3, pp. 309-325, 2016.
[18] L. Wang, H. Zou, J. Su, L. Li, and S. Chaudhry, "An ARIMA-ANN hybrid model for time series forecasting," Systems Research and Behavioral Science, vol. 30, no. 3, pp. 244-259, 2013.
[19] S. Ji, H. Yu, Y. Guo, and Z. Zhang, "Research on sales forecasting based on ARIMA and BP neural network combined model," in Proceedings of the International Conference on Intelligent Information Processing, ACM, Wuhan, China, December 2016.
[20] J. Y. Choi and B. Lee, "Combining LSTM network ensemble via adaptive weighting for improved time series forecasting," Mathematical Problems in Engineering, vol. 2018, Article ID 2470171, 8 pages, 2018.
[21] K. Zhao and C. Wang, "Sales forecast in e-commerce using the convolutional neural network," 2017, https://arxiv.org/abs/1708.07946.
[22] K. Bandara, P. Shi, C. Bergmeir, H. Hewamalage, Q. Tran, and B. Seaman, "Sales demand forecast in e-commerce using a long short-term memory neural network methodology," 2019, https://arxiv.org/abs/1901.04028.
[23] X. Luo, C. Jiang, W. Wang, Y. Xu, J.-H. Wang, and W. Zhao, "User behavior prediction in social networks using weighted extreme learning machine with distribution optimization," Future Generation Computer Systems, vol. 93, pp. 1023-1035, 2018.
[24] L. Xiong, S. Jiankun, W. Long et al., "Short-term wind speed forecasting via stacked extreme learning machine with generalized correntropy," IEEE Transactions on Industrial Informatics, vol. 14, no. 11, pp. 4963-4971, 2018.
[25] J. L. Zhao, H. Zhu, and S. Zheng, "What is the value of an online retailer sharing demand forecast information?," Soft Computing, vol. 22, no. 16, pp. 5419-5428, 2018.
[26] G. Kulkarni, P. K. Kannan, and W. Moe, "Using online search data to forecast new product sales," Decision Support Systems, vol. 52, no. 3, pp. 604-611, 2012.
[27] A. Roy, "A novel multivariate fuzzy time series based forecasting algorithm incorporating the effect of clustering on prediction," Soft Computing, vol. 20, no. 5, pp. 1991-2019, 2016.
[28] R. J. Kuo and K. C. Xue, "A decision support system for sales forecasting through fuzzy neural networks with asymmetric fuzzy weights," Decision Support Systems, vol. 24, no. 2, pp. 105-126, 1998.
[29] P.-C. Chang, C.-H. Liu, and C.-Y. Fan, "Data clustering and fuzzy neural network for sales forecasting: a case study in printed circuit board industry," Knowledge-Based Systems, vol. 22, no. 5, pp. 344-355, 2009.
[30] C.-J. Lu and Y.-W. Wang, "Combining independent component analysis and growing hierarchical self-organizing maps with support vector regression in product demand forecasting," International Journal of Production Economics, vol. 128, no. 2, pp. 603-613, 2010.
[31] C.-J. Lu and L.-J. Kao, "A clustering-based sales forecasting scheme by using extreme learning machine and ensembling linkage methods with applications to computer server," Engineering Applications of Artificial Intelligence, vol. 55, pp. 231-238, 2016.
[32] W. Dai, Y.-Y. Chuang, and C.-J. Lu, "A clustering-based sales forecasting scheme using support vector regression for computer server," Procedia Manufacturing, vol. 2, pp. 82-86, 2015.
[33] I. F. Chen and C. J. Lu, "Sales forecasting by combining clustering and machine-learning techniques for computer retailing," Neural Computing and Applications, vol. 28, no. 9, pp. 2633-2647, 2016.
[34] T. Chiu, D. P. Fang, J. Chen, Y. Wang, and C. Jeris, "A robust and scalable clustering algorithm for mixed type attributes in a large database environment," in Proceedings of the Seventh ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, August 2001.
[35] T. Zhang, R. Ramakrishnan, and M. Livny, "Birch: a new data clustering algorithm and its applications," Data Mining and Knowledge Discovery, vol. 1, no. 2, pp. 141-182, 1997.
[36] L. Li, R. Situ, J. Gao, Z. Yang, and W. Liu, "A hybrid model combining convolutional neural network with XGBoost for predicting social media popularity," in Proceedings of the 2017 ACM on Multimedia Conference (MM '17), ACM, Mountain View, CA, USA, October 2017.
[37] J. Ke, H. Zheng, H. Yang, and X. Chen, "Short-term forecasting of passenger demand under on-demand ride services: a spatio-temporal deep learning approach," Transportation Research Part C: Emerging Technologies, vol. 85, pp. 591-608, 2017.
[38] K. Shimada, "Customer value creation in the information explosion era," in Proceedings of the 2014 Symposium on VLSI Technology, IEEE, Honolulu, HI, USA, June 2014.
[39] H. A. Abdelhafez, "Big data analytics: trends and case studies," in Encyclopedia of Business Analytics & Optimization, Association for Computing Machinery, New York, NY, USA, 2014.
[40] K. Kira and L. A. Rendell, "A practical approach to feature selection," Machine Learning Proceedings, vol. 48, no. 1, pp. 249-256, 1992.
[41] T. M. Khoshgoftaar, K. Gao, and L. A. Bullard, "A comparative study of filter-based and wrapper-based feature ranking techniques for software quality modeling," International Journal of Reliability, Quality and Safety Engineering, vol. 18, no. 4, pp. 341-364, 2011.
[42] M. A. Hall and L. A. Smith, "Feature selection for machine learning: comparing a correlation-based filter approach to the wrapper," in Proceedings of the Twelfth International Florida Artificial Intelligence Research Society Conference, DBLP, Orlando, FL, USA, May 1999.
[43] V. A. Huynh-Thu, Y. Saeys, L. Wehenkel, and P. Geurts, "Statistical interpretation of machine learning-based feature importance scores for biomarker discovery," Bioinformatics, vol. 28, no. 13, pp. 1766-1774, 2012.
[44] M. Sandri and P. Zuccolotto, "A bias correction algorithm for the Gini variable importance measure in classification trees," Journal of Computational and Graphical Statistics, vol. 17, no. 3, pp. 611-628, 2008.
[45] J. Brownlee, "Feature importance and feature selection with xgboost in python," 2016, https://machinelearningmastery.com.
[46] N. V. Chawla, S. Eschrich, and L. O. Hall, "Creating ensembles of classifiers," in Proceedings of the IEEE International Conference on Data Mining, IEEE Computer Society, San Jose, CA, USA, November-December 2001.
[47] A. K. Jain, M. N. Murty, and P. J. Flynn, "Data clustering: a review," ACM Computing Surveys, vol. 31, no. 3, pp. 264-323, 1999.
[48] A. Nagpal, A. Jatain, and D. Gaur, "Review based on data clustering algorithms," in Proceedings of the IEEE Conference on Information & Communication Technologies, Hainan, China, September 2013.
[49] Y. Wang, X. Ma, Y. Lao, and Y. Wang, "A fuzzy-based customer clustering approach with hierarchical structure for logistics network optimization," Expert Systems with Applications, vol. 41, no. 2, pp. 521-534, 2014.
[50] B. Wang, Y. Miao, H. Zhao, J. Jin, and Y. Chen, "A biclustering-based method for market segmentation using customer pain points," Engineering Applications of Artificial Intelligence, vol. 47, pp. 101-109, 2015.
[51] M. Halkidi, Y. Batistakis, and M. Vazirgiannis, "On clustering validation techniques," Journal of Intelligent Information Systems: Integrating Artificial Intelligence and Database Technologies, vol. 17, no. 2-3, pp. 107-145, 2001.
[52] R. W. Sembiring, J. M. Zain, and A. Embong, "A comparative agglomerative hierarchical clustering method to cluster implemented course," Journal of Computing, vol. 2, no. 12, 2010.
[53] M. Valipour, M. E. Banihabib, and S. M. R. Behbahani, "Comparison of the ARIMA and the auto-regressive artificial neural network models in forecasting the monthly inflow of the dam reservoir," Journal of Hydrology, vol. 476, 2013.
[54] E. Erdem and J. Shi, "Arma based approaches for forecasting the tuple of wind speed and direction," Applied Energy, vol. 88, no. 4, pp. 1405-1414, 2011.
[55] R. J. Hyndman, "Forecasting functions for time series and linear models," 2019, http://mirror.costar.sfu.ca/mirror/CRAN/web/packages/forecast/index.html.
[56] S. Aishwarya, "Build high-performance time series models using auto ARIMA in Python and R," 2018, https://www.analyticsvidhya.com/blog/2018/08/auto-arima-time-series-modeling-python-r.
[57] J. H. Friedman, "Greedy function approximation: a gradient boosting machine," The Annals of Statistics, vol. 29, no. 5, pp. 1189-1232, 2001.
[58] T. Chen and C. Guestrin, "Xgboost: a scalable tree boosting system," 2016, https://arxiv.org/abs/1603.02754.
[59] A. Gomez-Rios, J. Luengo, and F. Herrera, "A study on the noise label influence in boosting algorithms: AdaBoost, GBM, and XGBoost," in Proceedings of the International Conference on Hybrid Artificial Intelligence Systems, Logroño, Spain, June 2017.
[60] J. Wang, C. Lou, R. Yu, J. Gao, and H. Di, "Research on hot micro-blog forecast based on XGBOOST and random forest," in Proceedings of the 11th International Conference on Knowledge Science, Engineering and Management (KSEM 2018), pp. 350-360, Changchun, China, August 2018.
[61] C. Li, X. Zheng, Z. Yang, and L. Kuang, "Predicting short-term electricity demand by combining the advantages of ARMA and XGBoost in fog computing environment," Wireless Communications and Mobile Computing, vol. 2018, Article ID 5018053, 18 pages, 2018.
[62] A. M. Jain, "Complete guide to parameter tuning in XGBoost with codes in Python," 2016, https://www.analyticsvidhya.com/blog/2016/03/complete-guide-parameter-tuning-xgboost-with-codes-python.


Page 6: AnApplicationofaThree-StageXGBoost-BasedModeltoSales ...downloads.hindawi.com/journals/mpe/2019/8503252.pdf · AnApplicationofaThree-StageXGBoost-BasedModeltoSales ForecastingofaCross-BorderE-CommerceEnterprise

min MSE 1n

1113944

n

k1YCAand

(k) minus y(k)1113876 11138772 (6)

1113954YCA(k) wC1113954yC(k) + wA1113954yA(k) (7)

+e least squares are employed in exploring the optimalweights (wC and wA) the calculation of which is simplifiedby transforming the equations into the following matrixoperations

In equation (8) the matrix B consists of the predictedvalues of C-XGBoost and A-XGBoost

In equation (9) the matrix W consists of the weightsIn equation (10) the matrix Y consists of the actual

valuesEquation (11) is obtained by transforming the equation

(7) into the matrix formEquation (12) is calculated based on equation (11) left

multiplying by the transpose of the matrix BAccording to equation (13) the optional weights (wC

and wA) are calculated

B

yCand

(1)

yCand

(2)

yCand

(n)

⎡⎢⎢⎢⎢⎢⎢⎢⎢⎢⎢⎢⎢⎢⎢⎢⎢⎢⎢⎢⎢⎢⎢⎢⎢⎢⎢⎢⎢⎢⎢⎢⎢⎢⎢⎢⎢⎣

yAand

(1)

yAand

(2)

yAand

(n)

⎤⎥⎥⎥⎥⎥⎥⎥⎥⎥⎥⎥⎥⎥⎥⎥⎥⎥⎥⎥⎥⎥⎥⎥⎥⎥⎥⎥⎥⎥⎥⎥⎥⎥⎥⎥⎥⎦

(8)

W wC

wA1113890 1113891 (9)

Y [y(1) y(2) y(n)] (10)

BW Y (11)

BTBW BTY (12)

W BTB1113872 1113873minus 1BTY (13)

4 Numerical Experiments and Comparisons

41 Data Description To illustrate the effectiveness of thedeveloped C-A-XGBoost model the following data series areused to verify the forecasting performance

411 Source Data Series As listed in Table 2 there are eightdata series in source data series +e data series range fromMar 1 2017 to Mar 16 2018

412 Clustering Series +ere are 10 continuous attributesand 6 categorical attributes in clustering series which areobtained by reconstructing the source data series +e at-tribute descriptions of the clustering series are illustrated inTable 3

42 Uniform Experimental Conditions To verify the per-formance of the proposed model according to performanceevaluation indexes some uniform experimental conditionsare established as follows

421 Uniform Data Set As shown in Table 4 the data seriesare partitioned into the training set validation set and testset so as to satisfy the requirements of different models +edata application is described as follows

(1) +e clustering series cover samples of 381 days(2) For the C-XGBoost model training set 1 namely

samples of the first 347 days in clustering series isutilized to establish the two-step clustering models+e resulting samples of two-step clustering are usedto construct XGBoost models +e test set with theremaining samples of 34 days is selected to validatethe C-XGBoost model In detail the test set is firstpartitioned into the corresponding clusters by theestablished two-step clustering model and then thetest set is applied to checking the validity of thecorresponding C-XGBoost models

Stage 2 A-XGBoost

Stage 1 C-XGBoost

Two-stepclustering

Residuals

ARIMAprediction

XGBoostprediction

A-XGBoostprediction

C-XGBoostprediction

Clusters XGBoostprediction

Data seriesCombination

prediction

Stage 3C-A-XGBoost

Figure 3 +e procedures of the proposed three-stage model

6 Mathematical Problems in Engineering

(3) For A-XGBoost model the training set 2 with thesamples of 1stndash277th days are used to construct theARIMA and the validation set is used to calculate theresiduals of ARIMA forecast which are used to trainthe A-XGBoost model +en the test set is employedto verify the performance of the model

(4) +e test set had the final 34 data samples which areemployed to fit the optimal combination weights forC-XGBoost and A-XGBoost models

422 Uniform Evaluation Indexes Several performancemeasures have previously been applied to verifying the vi-ability and effectiveness of forecasting models As illustratedin Table 5 the common evaluationmeasurements are chosento distinguish the optimal forecasting model +e smallerthey are the more accurate the model is

423 Uniform Parameters of the XGBoost Model +e firstpriority for optimization is to tune depth and min_-child_weight with other parameters fixed which are themost effective way for optimizing the XGBoost +e rangesof depth and child weigh are 6ndash10 and 1ndash6 respectivelyDefault values of parameters are listed in Table 6

43 Experiments of C-A-XGBoost Model

431 C-XGBoost Model

(1) Step 1 Commodity clustering +e two-step clusteringalgorithm is first applied to training set 1 Standardizationapplies to the continuous attributes the noise percent ofoutliers handling is 25 log-likelihood distance is the basis ofdistance measurement BIC is set as the clustering criterion

As shown in Figure 4 the clustering series are parti-tioned into 12 homogeneous clusters based on 11 featuresdenoted as C12 j (j 1 2 12) and the silhouettecoefficient is 04

As illustrated in Figure 5 the ratio of sizes is 264 and thepercentage is not too large or too small for each cluster+erefore cluster quality is acceptable

(2) Step 2 Construct the C-XGBoost models Features arefirst selected from each cluster C12 j of the 12 clusters basedon feature importance scores After that setting the selectedfeatures of each cluster and SKU sales in Table 3 as the inputand output varieties respectively the C-XGBoost models areconstructed for each cluster C12 j denoted asC12 j XGBoost

Take the cluster C12 3 in the 12 clusters as an example toillustrate the processes of modeling XGBoost

For C12 3 the features listed in Table 3 are first filteredand the 7 selected features are displayed in Figure 6 It can beobserved that F1 (goods click) F3 (cart click) F5 (goodsprice) F6 (sales unique visitor) and F7 (original shop price)are the dominating factors However F2 (temperaturemean) and F4 (favorites click) have fewer contributions tothe prediction

Setting the 11 features of the cluster C12 3 in Step 1 andthe corresponding SKU sales in Table 3 as the input andoutput respectively theC12 3 XGBoost is pretrained underthe default parameters in Table 6 For the prebuiltC12 3 XGBoost model the value of ME is 0393 and thevalue of MAE is 0896

(3) Step 3 Parameter optimization XGBoost is an algorithmwith supervised learning so the key to optimization is to

Table 2 +e description of source data series

Data series FieldsCustomer behavior dataa Data date goods click cart click favorites clickGoods information datab Goods id SKUi id level season brand idGoods sales datac Data date SKU sales goods price original shop price+e relationship between goods id and SKU idd Goods id SKU idGoods promote pricee Data date goods price goods promotion priceMarketingf Data date marketing planHolidaysg Data date holidayTemperatureh Data date temperature meanandashf+e six data series are sourced from the historical data of the Saudi Arabian market in Jollychic cross-border e-commerce trading platform (httpswwwjollychiccom) g+e data of holidays are captured from the URL httpshijiancc114jieri2017 h+e data of temperature are captured from the URL httpswwwwundergroundcomweatheregsaudi-arabia iSKUrsquos full name is stock keeping unit Each product has a unique SKU number

Table 3 +e description of clustering series

Fields Meaning of fields Fields Meaning of fieldsData date Date Favorites click Number of clicks on favoritesGoods code Goods code Sales unique visitor Number of unique visitorsSKU code SKU code Goods season Seasonal attributes of goodsSKU sales Sales of SKU Marketing Activity type codeGoods price Selling price Plan Activity rhythm codeOriginal shop price Tag price Promotion Promotion codeGoods click Number of clicks on goods Holiday +e holiday of the dayCart click Number of clicks on purchasing carts Temperature mean Mean of air temperatures (degF)

Mathematical Problems in Engineering 7

Table 6 Default parameters values of XGBoost

Parameters Number of estimators Max depth Min_child_weight Max delta step Objective Subsample EtaDefault value 100 6 1 0 Reg linear 1 03Parameters Gamma Col sample by tree Col sample by level Alpha Lambda Scale position weightDefault value 01 1 1 0 1 1

Table 4 +e description of the training set validation set and test set

Data set Samples Number of weeks Start date End date +e first day +e last dayTraining set 1 Clustering series 50 Mar1 2017 (WED) Dec2 2017 (SAT) 1 347Training set 2 SKU code 94033 50 Mar1 2017 (WED) Dec2 2017 (SAT) 1 277Validation set SKU code 94033 10 Dec3 2017 (SUN) Feb10 2018 (SAT) 278 347Test set SKU code 94033 5 Feb11 2018 (SUN) Mar16 2018 (FRI) 348 381

Table 5 +e description of evaluation indexes

Evaluation indexes Expression Description

ME ME 1(b minus a + 1)1113936kab(yk minus yk

and) +e mean sum error

MSE MSE 1(b minus a + 1)1113936bka(yk minus yk

and)2 +e mean squared error

RMSE RMSE

1(b minus a + 1)1113936bka(yk minus yk

and)2

1113969

+e root mean squared errorMAE MAE 1(b minus a + 1)1113936

bka|yk minus yk

and| +e mean absolute error

yk is the sales of the k-th sample yk

anddenotes the corresponding prediction

Algorithm

Inputs Clusters

Two-step

1112

(a)

10ndash05ndash10 05ndash00Poor Fair Good

(b)

Figure 4 Model summary and cluster quality of the two-step clustering model (a) +e summary of the two-step clustering model (b) +eSilhouette coefficient of cohesion and separation for 12 clusters

64 93

62

110

57 69

135 127

93

82

51

91

1

2

3

4

5

6

7

8

9

10

11

12

Size of smallest cluster

Size of largest cluster

Ratio of sizeslargest cluster tosmallest cluster

388177 (51)

1026625 (135)

264

Figure 5 Cluster sizes of the clustering series by two-step clustering algorithm

8 Mathematical Problems in Engineering

determine the appropriate input and output variables Incontrast parameter optimization has less impact on theaccuracy of the algorithm +erefore in this paper onlythe primary parameters including max_depth andmin_child_weight are tuned to optimize the XGBoost[61] +e model can achieve a balanced point becauseincreasing the value of max_depth will make the modelmore complex and more likely to be overfit but increasingthe value of min_child_weight will make the model moreconservative

+e prebuilt C12 3 XGBoost model is optimized tominimize ME andMAE by tuning max_depth (from 6 to 10)and min_child_weight (from 1 to 6) when other parametersare fixed in which the ranges of parameters are determinedaccording to lots of case studies with the XGBoost such as[62] +e optimal parameter combination is determined bythe minimum of the ME and MAE under different pa-rameter combination

Figure 7 shows the changes of ME and MAE based onXGBoost as depths and min_child_weight change It can beseen that both the ME andMAE are the smallest when depthis 9 and min_child_weight is 2 +at is the model is optimal

(4) Step 4 Results on the test set +e test set is partitionedinto the corresponding clusters by the trained two-stepclustering model in Step 1 After that the Steps 2-3 arerepeated for the test set

As shown in Table 7 the test set is partitioned into theclusters C12_3 and C12 4 +en the corresponding modelsC12 3 XGBoost and C12 4 XGBoost are determinedC12 3 XGBoost has been trained and optimized as an ex-ample in Steps 2-3 and the C12 4 XGBoost is also trainedand optimized by repeating Steps 2-3 Finally the predictionresults are obtained by the optimized C12 3 XGBoost andC12 4 XGBoost

As illustrated in Figure 8 ME and MAE forC12 4 XGBoost change with the different values of depthand min_child_weight +e model performs the best whendepth is 10 and min_child_weight is 2 because both the MEand MAE are the smallest +e forecasting results of the testset are calculated and summarized in Table 7

432 A-XGBoost Model

(1) Step 1 Test stationarity and white noise of training set 2For training set 2 the p value of the ADF test and BoxndashPierce test are 001 and 3331 times 10minus 16 respectively which arelower than 005 +erefore the time series is stationary andnonwhite noise indicating that training set 2 is suitable forthe ARIMA

(2) Step 2 Train ARIMA model According to Section 23parameter combinations are firstly determined by ACF andPACF plots and autoarima ( ) function in R packageldquoforecastrdquo

As shown in Figure 9(a) SKU sales have a significantfluctuation in the first 50 days compared with the sales after50 days in Figure 9(b) the plot of ACF has a high trailingcharacteristic in Figure 9(c) the plot of PACF has a de-creasing and oscillating phenomenon +erefore the first-order difference should be calculated

As illustrated in Figure 10(a) SKU sales fluctuatearound zero after the first-order difference Figures 10(b)and 10(c) graphically present plots of ACF and PACF afterthe first-order difference both of which have a decreasingand oscillating phenomenon It indicates that the trainingset 2 conforms to the ARMA

As a result the possible optimal models are ARIMA (21 2) ARIMA (2 1 3) and ARIMA (2 1 4) according to theplots of ACF and PACF in Figure 10

Table 8 shows the AIC values of the ARIMA underdifferent parameters which are generated by the autoar-ima ( ) function It can be concluded that the ARIMA (0 1 1)is the best model because its AIC has the bestperformance

To further determine the optimal model the AIC andRMSE of ARIMA models under different parameters aresummarized in Table 9 +e possible optimal models in-clude the 3 possible optimal ARIMA judged by Figure 10and the best ARIMA generated by the autoarima ( )function According to the minimum principles theARIMA (2 1 4) is optimal because both AIC and RMSEhave the best performance

662

754

846

254

770

443

1620

F7

F1

F2

F3

F4

F5

F6

200 1600140012001000800600400Feature importance score

Feat

ures

F7 original shop price

F6 sales unique visitor

F5 goods price

F1 goods click

F2 temperature mean

F3 cart click

F4 favorites click

0

Figure 6 Feature importance score of the C12_3_XGBoost model

Mathematical Problems in Engineering 9

035

0355

036

0365

037

0375

038

0385

039

0395

04

1 2 3 4 5 6

ME

of tr

aini

ng se

t

Min_child_weight

Max_depth = 6Max_depth = 8

Max_depth = 7Max_depth = 9

(a)

063

068

073

078

083

088

093

1 2 3 4 5 6

MA

E of

trai

ning

set

Min_child_weight

Max_depth = 7Max_depth = 9

Max_depth = 6Max_depth = 8Max_depth = 10

(b)

Figure 7 ME and MAE of C12_3_XGBoost under different parameters (a) Mean error of the training set (b) Mean absolute error of thetraining set

Table 7 +e results of C-XGBoost for the test set

Test set Days C12_j C12_j_XGBoost model Depth and min_child_weightTraining set 1 Test setME MAE ME MAE

348thndash372th 25 3 C12_3_XGBoost model (9 2) 0351 0636 4385 4400373rdndash381st 9 4 C12_4_XGBoost model (10 2) 0339 0591 1778 2000348thndash381st 34 mdash mdash mdash mdash mdash 3647 3765

0330335

0340345

0350355

0360365

0370375

0380385

039

1 2 3 4 5 6

ME

of tr

aini

ng se

t

Min_child_weight

Max_depth = 7Max_depth = 9

Max_depth = 6Max_depth = 8Max_depth = 10

(a)

059

0615

064

0665

069

0715

074

0765

079

1 2 3 4 5 6

MA

E of

trai

ning

set

Min_child_weight

Max_depth = 7Max_depth = 9

Max_depth = 6Max_depth = 8Max_depth = 10

(b)

Figure 8 ME and MAE of C12_4_XGBoost under different parameters (a) Mean error of the training set (b) Mean absolute error of thetraining set

10 Mathematical Problems in Engineering

(3) Step 3 Calculate residuals of the optimal ARIMA +eprediction results from the 278th to the 381st day are ob-tained by using the trained ARIMA (2 1 4) denoted asARIMA forecast +en residuals between the predictionvalues ARIMA forecast and the actual values SKU_sales arecalculated denoted as ARIMA residuals

(4) Step 4 Train A-XGBoost by setting ARIMA residuals asthe input and output As shown in equation (14) the outputdata are composed of 8 columns of the matrix R and the

corresponding inputs are the residuals of the last 7 days(from Column 1 to 7)

R

r271 r272 middot middot middot r277 r278

r272 r273 middot middot middot r278 r279

⋮ ⋮ middot middot middot ⋮ ⋮

r339 r340 middot middot middot r345 r346

r340 r341 middot middot middot r346 r347

⎡⎢⎢⎢⎢⎢⎢⎢⎢⎢⎢⎢⎢⎢⎢⎢⎢⎢⎢⎢⎢⎢⎢⎢⎢⎢⎢⎢⎢⎢⎢⎢⎢⎢⎢⎢⎢⎢⎢⎢⎢⎢⎢⎢⎢⎣

⎤⎥⎥⎥⎥⎥⎥⎥⎥⎥⎥⎥⎥⎥⎥⎥⎥⎥⎥⎥⎥⎥⎥⎥⎥⎥⎥⎥⎥⎥⎥⎥⎥⎥⎥⎥⎥⎥⎥⎥⎥⎥⎥⎥⎥⎦

70times8

(14)

Day

SKU_salesdiff

ndash400

0

200

50 100 150 200 2500

(a)

ndash04

ndash02

00

02

Lag

ACF

5 201510

(b)

ndash04

ndash02

00

02

Lag

PACF

2015105

(c)

Figure 10 Plots of (a) SKU sales with days change (b) ACF and (c) PACF after the first-order difference

Day

SKU

_sal

es

50 100 150 200 2500

100

300

0

(a)

02

04

Lag

ACF

00

ndash0215105 20

(b)

Lag

PACF 02

04

00

025 201510

(c)

Figure 9 Plots of (a) SKU sales with days change (b) ACF and (c) PACF

Mathematical Problems in Engineering 11

(5) Step 5 Calculate predicted residuals of the test setusing the trained A-XGBoost in Step 4 denoted asA minus XGBoost_residuals For the test set the result of the 348thday is obtained by the setting ARIMA residuals of the341stndash347th day as the input +en the result of the 349thday can be calculated by inputting ARIMA_residuals of the342ndndash347th day and A minus XGBoost_residuals of the 348thday into the trained A-XGBoost+e processes are repeateduntil the A minus XGBoost_residuals of the 349thndash381st day areobtained

(6) Step 6 Calculate the final prediction results For the testset calculate the final prediction results by summing overthe corresponding values of A minus XGBoost_residuals andARIMA_forecast denoted as A minus XGBoost_forecast +eresults of the A-XGBoost are summarized in Table 10

433 C-A-XGBoost Model +e optimal combinationweights are determined by minimizing the MSE in equation(6)

For the test set the weights wC and wA are obtained basedon the matrix operation equation (13) W (BTB)minus 1BTYwhere wC 1262 and wA 0273

44 Models for Comparison In this section the followingmodels are chosen for the comparison between the proposedmodels and other classical models

ARIMA: as one of the most common time series models, it is used to predict the sales of the time sequence; the processes are the same as those of the ARIMA in Section 4.3.2.

XGBoost: the XGBoost model is constructed and optimized by setting the selected features and the corresponding SKU sales as input and output.

C-XGBoost: taking the sales features of commodities into account, the XGBoost is used to forecast sales based on the clusters produced by the two-step clustering model; the procedures are the same as those in Section 4.3.1.

A-XGBoost: the A-XGBoost is applied to revising the residuals of the ARIMA. Namely, the ARIMA is first used to model the linear part of the time series, and then the XGBoost is used to model the nonlinear part; the relevant processes are described in Section 4.3.2.

C-A-XGBoost: this model combines the advantages of the C-XGBoost and the A-XGBoost; its procedures are displayed in Section 4.3.3.

4.5. Results of Different Models. In this section, the test set is used to verify the superiority of the proposed C-A-XGBoost.

Figure 11 shows the curve of the actual values SKU_sales and the five fitting curves of the predicted values from the 348th day to the 381st day, obtained by the ARIMA, XGBoost, C-XGBoost, A-XGBoost, and C-A-XGBoost.

The C-A-XGBoost shows the best fit to the actual values, as its fitting curve is the closest of the five to the curve of the actual SKU sales.

To further illustrate the superiority of the proposed C-A-XGBoost, the evaluation indexes mentioned in Section 4.2.2 are applied to identify the best sales forecasting model. Table 11 provides a comparative summary of the indexes for the five models in Section 4.4.
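For reference, the four indexes can be computed as in the following sketch (y_true and y_pred are hypothetical arrays of actual and predicted sales):

import numpy as np

def evaluate(y_true, y_pred):
    # ME, MSE, RMSE, and MAE as defined in Section 4.2.2
    err = y_true - y_pred
    return {"ME": err.mean(),
            "MSE": (err ** 2).mean(),
            "RMSE": np.sqrt((err ** 2).mean()),
            "MAE": np.abs(err).mean()}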

According to Table 11, the superiority of the proposed C-A-XGBoost over the other models is distinct, as all of its evaluation indexes are the smallest.

The C-XGBoost is inferior to the C-A-XGBoost but outperforms the other three models, underlining that the C-XGBoost is superior to the single XGBoost.

The A-XGBoost performs better than the ARIMA, proving that the XGBoost is effective for residual modification of the ARIMA.

According to the analysis above, the proposed C-A-XGBoost has the best forecasting performance for the sales of commodities in the cross-border e-commerce enterprise.

Table 10: The performance evaluation of A-XGBoost.

A-XGBoost             Validation set    Test set
Minimum error         -0.003            -8.151
Maximum error          0.002            23.482
Mean error             0.000             1.213
Mean absolute error    0.001             4.566
Standard deviation     0.001             6.262
Linear correlation     1                -0.154
Occurrences           70                34

Table 8: AIC values of the candidate ARIMA models generated by the auto.arima() function.

ARIMA (p, d, q)               AIC         ARIMA (p, d, q)               AIC
ARIMA (2, 1, 2) with drift    2854.317    ARIMA (0, 1, 2) with drift    2852.403
ARIMA (0, 1, 0) with drift    2983.036    ARIMA (1, 1, 2) with drift    2852.172
ARIMA (1, 1, 0) with drift    2927.344    ARIMA (0, 1, 1)               2850.212
ARIMA (0, 1, 1) with drift    2851.296    ARIMA (1, 1, 1)               2851.586
ARIMA (0, 1, 0)               2981.024    ARIMA (0, 1, 2)               2851.460
ARIMA (1, 1, 1) with drift    2852.543    ARIMA (1, 1, 2)               2851.120

Table 9: AIC values and RMSE of ARIMA models under different parameters.

ARIMA model    ARIMA (p, d, q)    AIC         RMSE
1              ARIMA (0, 1, 1)    2850.170    41.814
2              ARIMA (2, 1, 2)    2852.980    41.572
3              ARIMA (2, 1, 3)    2854.940    41.567
4              ARIMA (2, 1, 4)    2848.850    40.893
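The candidates in Tables 8 and 9 come from the auto.arima() search of the R package "forecast" together with the ACF/PACF-based candidates. A comparable search can be sketched in Python with pmdarima (the function and arguments below are pmdarima's, not the paper's R code):

from pmdarima import auto_arima

# AIC-driven search over (p, d, q), with and without drift, on training set 2
model = auto_arima(train, start_p=0, start_q=0, max_p=5, max_q=5,
                   d=None, seasonal=False, trace=True,
                   information_criterion="aic")
print(model.summary())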


5. Conclusions and Future Directions

In this research, a new XGBoost-based forecasting model named C-A-XGBoost is proposed, which takes both the sales features and the tendency of the data series into account.

The C-XGBoost is first presented, combining clustering and the XGBoost and aiming at reflecting the sales features of commodities in the forecast. The two-step clustering algorithm is applied to partition the data series into different clusters based on the selected features, which serve as the influencing factors for forecasting. After that, a corresponding C-XGBoost model is established for each cluster using the XGBoost.

The proposed A-XGBoost takes advantage of the ARIMA in predicting the tendency of the data series and overcomes the ARIMA's disadvantages by applying the XGBoost to the nonlinear part of the data series. The optimal ARIMA is obtained by comparing the AICs under different parameters, and the trained ARIMA model is then used to predict the linear part of the data series. For the nonlinear part, a rolling prediction is conducted by the trained XGBoost, whose inputs and outputs are the residuals produced by the ARIMA. The final results of the A-XGBoost are calculated by adding the residuals predicted by the XGBoost to the corresponding forecast values of the ARIMA.

In conclusion, the C-A-XGBoost is developed by assigning appropriate weights to the forecasting results of the C-XGBoost and A-XGBoost so as to exploit their respective strengths; a linear combination of the two models' forecasting results is calculated as the final predictive values.

To verify the effectiveness of the proposed C-A-XGBoost, the ARIMA, XGBoost, C-XGBoost, and A-XGBoost are employed for comparison. Meanwhile, four common evaluation indexes, ME, MSE, RMSE, and MAE, are utilized to check the forecasting performance of the C-A-XGBoost. The experiment demonstrates that the C-A-XGBoost outperforms the other models, which provides theoretical support for the sales forecast of the e-commerce company and can serve as a reference for selecting forecasting models. It is advisable for the e-commerce company to choose different forecasting models for different commodities instead of utilizing a single model.

Two potential extensions are put forward for future research. On the one hand, there may be no model in which all evaluation indicators are minimal, which makes it difficult to choose the optimal model; a comprehensive evaluation index of forecasting performance will therefore be constructed to overcome this difficulty. On the other hand, sales forecasting is ultimately used to optimize inventory management, so relevant factors such as inventory cost, order lead time, delivery time, and transportation time should also be considered.

Data Availability

The data used to support the findings of this study are available from the corresponding author upon request.

[Figure 11: Comparison of the SKU sales with the predicted values of the five models in Section 4.4. The x-axis represents the day (348-381); the y-axis represents the sales of the SKU. The curves with different colors represent the actual SKU_sales and the predictions of the ARIMA, XGBoost, C-XGBoost, A-XGBoost, and C-A-XGBoost.]

Table 11: The performance evaluation of ARIMA, XGBoost, A-XGBoost, C-XGBoost, and C-A-XGBoost.

Evaluation indexes    ARIMA      XGBoost    A-XGBoost    C-XGBoost    C-A-XGBoost
ME                    -21.346    -3.588      1.213        3.647        0.288
MSE                   480.980    36.588     39.532       23.353       10.769
RMSE                   21.931     6.049      6.287        4.832        3.282
MAE                    21.346     5.059      4.566        3.765        2.515


Conflicts of Interest

The authors declare that there are no conflicts of interest regarding the publication of this paper.

Acknowledgments

This research was supported by the National Key R&D Program of China through the China Development Research Foundation (CDRF), funded by the Ministry of Science and Technology (CDRF-SQ2017YFGH002106).

References

[1] Y. Jin, Data Science in Supply Chain Management: Data-Related Influences on Demand Planning, ProQuest LLC, Ann Arbor, MI, USA, 2013.

[2] S. Akter and S. F. Wamba, "Big data analytics in e-commerce: a systematic review and agenda for future research," Electronic Markets, vol. 26, no. 2, pp. 173-194, 2016.

[3] J. L. Castle, M. P. Clements, and D. F. Hendry, "Forecasting by factors, by variables, by both or neither?," Journal of Econometrics, vol. 177, no. 2, pp. 305-319, 2013.

[4] A. Kawa, "Supply chains of cross-border e-commerce," in Proceedings of the Advanced Topics in Intelligent Information and Database Systems, Springer International Publishing, Kanazawa, Japan, April 2017.

[5] L. Song, T. Lv, X. Chen, and J. Gao, "Architecture of demand forecast for online retailers in China based on big data," in Proceedings of the International Conference on Human-Centered Computing, Springer, Colombo, Sri Lanka, January 2016.

[6] G. Iman, A. Ehsan, R. W. Gary, and A. Y. William, "An overview of energy demand forecasting methods published in 2005-2015," Energy Systems, vol. 8, no. 2, pp. 411-447, 2016.

[7] S. Gmbh, Forecasting with Exponential Smoothing, vol. 26, no. 1, Springer, Berlin, Germany, 2008.

[8] G. E. P. Box and G. M. Jenkins, "Time series analysis: forecasting and control," Journal of Time, vol. 31, no. 4, 303 pages, 2010.

[9] G. P. Zhang, "Time series forecasting using a hybrid ARIMA and neural network model," Neurocomputing, vol. 50, pp. 159-175, 2003.

[10] S. Ma, R. Fildes, and T. Huang, "Demand forecasting with high dimensional data: the case of SKU retail sales forecasting with intra- and inter-category promotional information," European Journal of Operational Research, vol. 249, no. 1, pp. 245-257, 2015.

[11] O. Gur Ali, S. Sayin, T. van Woensel, and J. Fransoo, "SKU demand forecasting in the presence of promotions," Expert Systems with Applications, vol. 36, no. 10, pp. 12340-12348, 2009.

[12] T. Huang, R. Fildes, and D. Soopramanien, "The value of competitive information in forecasting FMCG retail product sales and the variable selection problem," European Journal of Operational Research, vol. 237, no. 2, pp. 738-748, 2014.

[13] F. Cady, "Machine learning overview," in The Data Science Handbook, Wiley, Hoboken, NJ, USA, 2017.

[14] N. K. Ahmed, A. F. Atiya, N. E. Gayar, and H. El-Shishiny, "An empirical comparison of machine learning models for time series forecasting," Econometric Reviews, vol. 29, no. 5-6, pp. 594-621, 2010.

[15] A. P. Ansuj, M. E. Camargo, R. Radharamanan, and D. G. Petry, "Sales forecasting using time series and neural networks," Computers & Industrial Engineering, vol. 31, no. 1-2, pp. 421-424, 1996.

[16] I. Alon, M. Qi, and R. J. Sadowski, "Forecasting aggregate retail sales: a comparison of artificial neural networks and traditional methods," Journal of Retailing and Consumer Services, vol. 8, no. 3, pp. 147-156, 2001.

[17] G. Di Pillo, V. Latorre, S. Lucidi, and E. Procacci, "An application of support vector machines to sales forecasting under promotions," 4OR, vol. 14, no. 3, pp. 309-325, 2016.

[18] L. Wang, H. Zou, J. Su, L. Li, and S. Chaudhry, "An ARIMA-ANN hybrid model for time series forecasting," Systems Research and Behavioral Science, vol. 30, no. 3, pp. 244-259, 2013.

[19] S. Ji, H. Yu, Y. Guo, and Z. Zhang, "Research on sales forecasting based on ARIMA and BP neural network combined model," in Proceedings of the International Conference on Intelligent Information Processing, ACM, Wuhan, China, December 2016.

[20] J. Y. Choi and B. Lee, "Combining LSTM network ensemble via adaptive weighting for improved time series forecasting," Mathematical Problems in Engineering, vol. 2018, Article ID 2470171, 8 pages, 2018.

[21] K. Zhao and C. Wang, "Sales forecast in e-commerce using the convolutional neural network," 2017, https://arxiv.org/abs/1708.07946.

[22] K. Bandara, P. Shi, C. Bergmeir, H. Hewamalage, Q. Tran, and B. Seaman, "Sales demand forecast in e-commerce using a long short-term memory neural network methodology," 2019, https://arxiv.org/abs/1901.04028.

[23] X. Luo, C. Jiang, W. Wang, Y. Xu, J.-H. Wang, and W. Zhao, "User behavior prediction in social networks using weighted extreme learning machine with distribution optimization," Future Generation Computer Systems, vol. 93, pp. 1023-1035, 2018.

[24] L. Xiong, S. Jiankun, W. Long et al., "Short-term wind speed forecasting via stacked extreme learning machine with generalized correntropy," IEEE Transactions on Industrial Informatics, vol. 14, no. 11, pp. 4963-4971, 2018.

[25] J. L. Zhao, H. Zhu, and S. Zheng, "What is the value of an online retailer sharing demand forecast information?," Soft Computing, vol. 22, no. 16, pp. 5419-5428, 2018.

[26] G. Kulkarni, P. K. Kannan, and W. Moe, "Using online search data to forecast new product sales," Decision Support Systems, vol. 52, no. 3, pp. 604-611, 2012.

[27] A. Roy, "A novel multivariate fuzzy time series based forecasting algorithm incorporating the effect of clustering on prediction," Soft Computing, vol. 20, no. 5, pp. 1991-2019, 2016.

[28] R. J. Kuo and K. C. Xue, "A decision support system for sales forecasting through fuzzy neural networks with asymmetric fuzzy weights," Decision Support Systems, vol. 24, no. 2, pp. 105-126, 1998.

[29] P.-C. Chang, C.-H. Liu, and C.-Y. Fan, "Data clustering and fuzzy neural network for sales forecasting: a case study in printed circuit board industry," Knowledge-Based Systems, vol. 22, no. 5, pp. 344-355, 2009.

[30] C.-J. Lu and Y.-W. Wang, "Combining independent component analysis and growing hierarchical self-organizing maps with support vector regression in product demand forecasting," International Journal of Production Economics, vol. 128, no. 2, pp. 603-613, 2010.

[31] C.-J. Lu and L.-J. Kao, "A clustering-based sales forecasting scheme by using extreme learning machine and ensembling linkage methods with applications to computer server," Engineering Applications of Artificial Intelligence, vol. 55, pp. 231-238, 2016.

[32] W. Dai, Y.-Y. Chuang, and C.-J. Lu, "A clustering-based sales forecasting scheme using support vector regression for computer server," Procedia Manufacturing, vol. 2, pp. 82-86, 2015.

[33] I. F. Chen and C. J. Lu, "Sales forecasting by combining clustering and machine-learning techniques for computer retailing," Neural Computing and Applications, vol. 28, no. 9, pp. 2633-2647, 2016.

[34] T. Chiu, D. P. Fang, J. Chen, Y. Wang, and C. Jeris, "A robust and scalable clustering algorithm for mixed type attributes in a large database environment," in Proceedings of the Seventh ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, August 2001.

[35] T. Zhang, R. Ramakrishnan, and M. Livny, "Birch: a new data clustering algorithm and its applications," Data Mining and Knowledge Discovery, vol. 1, no. 2, pp. 141-182, 1997.

[36] L. Li, R. Situ, J. Gao, Z. Yang, and W. Liu, "A hybrid model combining convolutional neural network with XGBoost for predicting social media popularity," in Proceedings of the 2017 ACM on Multimedia Conference, MM '17, ACM, Mountain View, CA, USA, October 2017.

[37] J. Ke, H. Zheng, H. Yang, and X. Chen, "Short-term forecasting of passenger demand under on-demand ride services: a spatio-temporal deep learning approach," Transportation Research Part C: Emerging Technologies, vol. 85, pp. 591-608, 2017.

[38] K. Shimada, "Customer value creation in the information explosion era," in Proceedings of the 2014 Symposium on VLSI Technology, IEEE, Honolulu, HI, USA, June 2014.

[39] H. A. Abdelhafez, "Big data analytics: trends and case studies," in Encyclopedia of Business Analytics & Optimization, Association for Computing Machinery, New York, NY, USA, 2014.

[40] K. Kira and L. A. Rendell, "A practical approach to feature selection," Machine Learning Proceedings, vol. 48, no. 1, pp. 249-256, 1992.

[41] T. M. Khoshgoftaar, K. Gao, and L. A. Bullard, "A comparative study of filter-based and wrapper-based feature ranking techniques for software quality modeling," International Journal of Reliability, Quality and Safety Engineering, vol. 18, no. 4, pp. 341-364, 2011.

[42] M. A. Hall and L. A. Smith, "Feature selection for machine learning: comparing a correlation-based filter approach to the wrapper," in Proceedings of the Twelfth International Florida Artificial Intelligence Research Society Conference, DBLP, Orlando, FL, USA, May 1999.

[43] V. A. Huynh-Thu, Y. Saeys, L. Wehenkel, and P. Geurts, "Statistical interpretation of machine learning-based feature importance scores for biomarker discovery," Bioinformatics, vol. 28, no. 13, pp. 1766-1774, 2012.

[44] M. Sandri and P. Zuccolotto, "A bias correction algorithm for the Gini variable importance measure in classification trees," Journal of Computational and Graphical Statistics, vol. 17, no. 3, pp. 611-628, 2008.

[45] J. Brownlee, "Feature importance and feature selection with XGBoost in Python," 2016, https://machinelearningmastery.com.

[46] N. V. Chawla, S. Eschrich, and L. O. Hall, "Creating ensembles of classifiers," in Proceedings of the IEEE International Conference on Data Mining, IEEE Computer Society, San Jose, CA, USA, November-December 2001.

[47] A. K. Jain, M. N. Murty, and P. J. Flynn, "Data clustering: a review," ACM Computing Surveys, vol. 31, no. 3, pp. 264-323, 1999.

[48] A. Nagpal, A. Jatain, and D. Gaur, "Review based on data clustering algorithms," in Proceedings of the IEEE Conference on Information & Communication Technologies, Hainan, China, September 2013.

[49] Y. Wang, X. Ma, Y. Lao, and Y. Wang, "A fuzzy-based customer clustering approach with hierarchical structure for logistics network optimization," Expert Systems with Applications, vol. 41, no. 2, pp. 521-534, 2014.

[50] B. Wang, Y. Miao, H. Zhao, J. Jin, and Y. Chen, "A biclustering-based method for market segmentation using customer pain points," Engineering Applications of Artificial Intelligence, vol. 47, pp. 101-109, 2015.

[51] M. Halkidi, Y. Batistakis, and M. Vazirgiannis, "On clustering validation techniques," Journal of Intelligent Information Systems: Integrating Artificial Intelligence and Database Technologies, vol. 17, no. 2-3, pp. 107-145, 2001.

[52] R. W. Sembiring, J. M. Zain, and A. Embong, "A comparative agglomerative hierarchical clustering method to cluster implemented course," Journal of Computing, vol. 2, no. 12, 2010.

[53] M. Valipour, M. E. Banihabib, and S. M. R. Behbahani, "Comparison of the ARIMA and the auto-regressive artificial neural network models in forecasting the monthly inflow of the dam reservoir," Journal of Hydrology, vol. 476, 2013.

[54] E. Erdem and J. Shi, "ARMA based approaches for forecasting the tuple of wind speed and direction," Applied Energy, vol. 88, no. 4, pp. 1405-1414, 2011.

[55] R. J. Hyndman, "Forecasting functions for time series and linear models," 2019, http://mirror.costar.sfu.ca/mirror/CRAN/web/packages/forecast/index.html.

[56] S. Aishwarya, "Build high-performance time series models using auto ARIMA in Python and R," 2018, https://www.analyticsvidhya.com/blog/2018/08/auto-arima-time-series-modeling-python-r.

[57] J. H. Friedman, "Greedy function approximation: a gradient boosting machine," The Annals of Statistics, vol. 29, no. 5, pp. 1189-1232, 2001.

[58] T. Chen and C. Guestrin, "XGBoost: a scalable tree boosting system," 2016, https://arxiv.org/abs/1603.02754.

[59] A. Gómez-Ríos, J. Luengo, and F. Herrera, "A study on the noise label influence in boosting algorithms: AdaBoost, GBM and XGBoost," in Proceedings of the International Conference on Hybrid Artificial Intelligence Systems, Logroño, Spain, June 2017.

[60] J. Wang, C. Lou, R. Yu, J. Gao, and H. Di, "Research on hot micro-blog forecast based on XGBoost and random forest," in Proceedings of the 11th International Conference on Knowledge Science, Engineering and Management, KSEM 2018, pp. 350-360, Changchun, China, August 2018.

[61] C. Li, X. Zheng, Z. Yang, and L. Kuang, "Predicting short-term electricity demand by combining the advantages of ARMA and XGBoost in fog computing environment," Wireless Communications and Mobile Computing, vol. 2018, Article ID 5018053, 18 pages, 2018.

[62] A. M. Jain, "Complete guide to parameter tuning in XGBoost with codes in Python," 2016, https://www.analyticsvidhya.com/blog/2016/03/complete-guide-parameter-tuning-xgboost-with-codes-python.







Table 6 Default parameters values of XGBoost

Parameters Number of estimators Max depth Min_child_weight Max delta step Objective Subsample EtaDefault value 100 6 1 0 Reg linear 1 03Parameters Gamma Col sample by tree Col sample by level Alpha Lambda Scale position weightDefault value 01 1 1 0 1 1

Table 4 +e description of the training set validation set and test set

Data set Samples Number of weeks Start date End date +e first day +e last dayTraining set 1 Clustering series 50 Mar1 2017 (WED) Dec2 2017 (SAT) 1 347Training set 2 SKU code 94033 50 Mar1 2017 (WED) Dec2 2017 (SAT) 1 277Validation set SKU code 94033 10 Dec3 2017 (SUN) Feb10 2018 (SAT) 278 347Test set SKU code 94033 5 Feb11 2018 (SUN) Mar16 2018 (FRI) 348 381

Table 5: The description of the evaluation indexes.

| Evaluation index | Expression | Description |
| --- | --- | --- |
| ME | $\mathrm{ME} = \frac{1}{b-a+1}\sum_{k=a}^{b}\left(y_k - \hat{y}_k\right)$ | The mean error |
| MSE | $\mathrm{MSE} = \frac{1}{b-a+1}\sum_{k=a}^{b}\left(y_k - \hat{y}_k\right)^2$ | The mean squared error |
| RMSE | $\mathrm{RMSE} = \sqrt{\frac{1}{b-a+1}\sum_{k=a}^{b}\left(y_k - \hat{y}_k\right)^2}$ | The root mean squared error |
| MAE | $\mathrm{MAE} = \frac{1}{b-a+1}\sum_{k=a}^{b}\left|y_k - \hat{y}_k\right|$ | The mean absolute error |

Note: $y_k$ is the sales of the k-th sample, and $\hat{y}_k$ denotes the corresponding prediction.
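For illustration, a minimal Python sketch of the four indexes in Table 5 follows; the arrays `y_true` and `y_pred` are hypothetical placeholders, not data from the study.

```python
# Minimal sketch of the evaluation indexes in Table 5 (ME, MSE, RMSE, MAE),
# where y_true holds the actual sales y_k and y_pred the predictions.
import numpy as np

def evaluate(y_true, y_pred):
    err = y_true - y_pred                      # y_k - y_k_hat
    return {
        "ME":   np.mean(err),                  # mean error (signed bias)
        "MSE":  np.mean(err ** 2),             # mean squared error
        "RMSE": np.sqrt(np.mean(err ** 2)),    # root mean squared error
        "MAE":  np.mean(np.abs(err)),          # mean absolute error
    }

# Hypothetical example values
y_true = np.array([12.0, 15.0, 9.0, 20.0])
y_pred = np.array([10.5, 16.0, 8.0, 18.5])
print(evaluate(y_true, y_pred))
```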

Figure 4: Model summary and cluster quality of the two-step clustering model. (a) The summary of the two-step clustering model (algorithm: two-step; inputs: 11; clusters: 12). (b) The Silhouette coefficient of cohesion and separation for the 12 clusters, shown on the poor/fair/good scale from -1.0 to 1.0.

Figure 5: Cluster sizes of the clustering series by the two-step clustering algorithm. The sizes of the 12 clusters are shown as percentage shares; the smallest cluster contains 388,177 records (5.1%), the largest contains 1,026,625 records (13.5%), and the ratio of the largest cluster to the smallest is 2.64.


determine the appropriate input and output variables. In contrast, parameter optimization has less impact on the accuracy of the algorithm. Therefore, in this paper, only the primary parameters, including max_depth and min_child_weight, are tuned to optimize the XGBoost [61]. The model can reach a balanced point because increasing the value of max_depth makes the model more complex and more likely to overfit, whereas increasing the value of min_child_weight makes the model more conservative.

The prebuilt C12_3_XGBoost model is optimized to minimize the ME and MAE by tuning max_depth (from 6 to 10) and min_child_weight (from 1 to 6) while the other parameters are fixed; the parameter ranges are determined according to numerous case studies of the XGBoost, such as [62]. The optimal parameter combination is the one that minimizes the ME and MAE over the different parameter combinations.
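A minimal sketch of this tuning loop is given below. It is not the authors' code: the training data are synthetic placeholders, the remaining parameters follow the Table 6 defaults, and the selection rule (smallest MAE, with ME reported alongside) is an assumption.

```python
# Hedged sketch of the max_depth / min_child_weight grid search.
import itertools
import numpy as np
from xgboost import XGBRegressor

rng = np.random.default_rng(42)
X_train = rng.random((347, 7))          # 7 selected features (placeholder)
y_train = rng.random(347) * 30          # daily SKU sales (placeholder)

results = []
for depth, mcw in itertools.product(range(6, 11), range(1, 7)):
    model = XGBRegressor(
        n_estimators=100, max_depth=depth, min_child_weight=mcw,
        learning_rate=0.3, gamma=0.1, subsample=1.0,
        objective="reg:squarederror",   # modern name for reg:linear
    )
    model.fit(X_train, y_train)
    pred = model.predict(X_train)
    me = float(np.mean(y_train - pred))
    mae = float(np.mean(np.abs(y_train - pred)))
    results.append((mae, me, depth, mcw))

mae, me, depth, mcw = min(results)      # select the smallest MAE
print(f"best: depth={depth}, min_child_weight={mcw}, ME={me:.3f}, MAE={mae:.3f}")
```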

Figure 7 shows the changes of the ME and MAE of the XGBoost as max_depth and min_child_weight change. It can be seen that both the ME and MAE are the smallest when max_depth is 9 and min_child_weight is 2; that is, the model is then optimal.

(4) Step 4: Results on the test set. The test set is partitioned into the corresponding clusters by the two-step clustering model trained in Step 1. After that, Steps 2-3 are repeated for the test set.

As shown in Table 7, the test set is partitioned into the clusters C12_3 and C12_4, and the corresponding models C12_3_XGBoost and C12_4_XGBoost are determined. C12_3_XGBoost has already been trained and optimized as the example in Steps 2-3, and C12_4_XGBoost is trained and optimized by repeating Steps 2-3. Finally, the prediction results are obtained from the optimized C12_3_XGBoost and C12_4_XGBoost, as sketched below.
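The routing of test samples to cluster-specific models can be sketched as follows; `assign_cluster` stands in for the trained two-step clustering model (implemented with the clustering tool in the paper) and is a hypothetical placeholder here.

```python
# Sketch of Step 4: route each test sample to its own cluster's model.
import numpy as np

def predict_by_cluster(X_test, assign_cluster, models):
    """models: dict mapping cluster id -> trained C12_j_XGBoost regressor;
    assign_cluster: callable returning the cluster id of one sample."""
    clusters = np.array([assign_cluster(x) for x in X_test])
    y_pred = np.empty(len(X_test))
    for c in np.unique(clusters):
        mask = clusters == c
        y_pred[mask] = models[c].predict(X_test[mask])
    return y_pred
```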

As illustrated in Figure 8, the ME and MAE of C12_4_XGBoost change with the values of max_depth and min_child_weight. The model performs best when max_depth is 10 and min_child_weight is 2, because both the ME and MAE are then the smallest. The forecasting results for the test set are calculated and summarized in Table 7.

4.3.2. A-XGBoost Model

(1) Step 1: Test the stationarity and white noise of training set 2. For training set 2, the p values of the ADF test and the Box-Pierce test are 0.01 and 3.331 x 10^-16, respectively, both lower than 0.05. Therefore, the time series is stationary and is not white noise, indicating that training set 2 is suitable for the ARIMA.
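These two checks can be reproduced in Python as sketched below, assuming `sales` holds the training-set-2 series (a synthetic placeholder here); the lag choice for the Box-Pierce statistic is an assumption.

```python
# Sketch of the Step 1 checks: ADF test (stationarity) and Box-Pierce test
# (white noise) on the sales series.
import numpy as np
from statsmodels.tsa.stattools import adfuller
from statsmodels.stats.diagnostic import acorr_ljungbox

sales = np.random.default_rng(0).random(277) * 30   # placeholder series

adf_stat, adf_p, *_ = adfuller(sales)               # H0: unit root (non-stationary)
bp = acorr_ljungbox(sales, lags=[20], boxpierce=True)
bp_p = float(bp["bp_pvalue"].iloc[0])               # H0: white noise

print(f"ADF p={adf_p:.3f} (stationary if < 0.05), "
      f"Box-Pierce p={bp_p:.3g} (non-white noise if < 0.05)")
```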

(2) Step 2: Train the ARIMA model. According to Section 2.3, the candidate parameter combinations are first determined from the ACF and PACF plots and the auto.arima() function in the R package "forecast".

As shown in Figure 9(a), SKU sales fluctuate significantly in the first 50 days compared with the sales afterwards; in Figure 9(b), the ACF plot shows a pronounced trailing characteristic; and in Figure 9(c), the PACF plot decreases with oscillation. Therefore, the first-order difference should be taken.

As illustrated in Figure 10(a), SKU sales fluctuate around zero after the first-order difference. Figures 10(b) and 10(c) present the ACF and PACF plots after the first-order difference, both of which decrease with oscillation, indicating that training set 2 now conforms to an ARMA process.

As a result, the possible optimal models are ARIMA(2, 1, 2), ARIMA(2, 1, 3), and ARIMA(2, 1, 4), according to the ACF and PACF plots in Figure 10.

Table 8 shows the AIC values of the ARIMA models under different parameters, as generated by the auto.arima() function. Among these, ARIMA(0, 1, 1) is the best model because its AIC is the smallest.

To further determine the optimal model, the AIC and RMSE of the ARIMA models under different parameters are summarized in Table 9. The candidates include the three possible optimal ARIMA models judged from Figure 10 and the best ARIMA generated by the auto.arima() function. According to the minimum principle, ARIMA(2, 1, 4) is optimal because both its AIC and RMSE are the smallest.
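The Table 9 comparison can be sketched in Python as follows, assuming `sales` is a placeholder for training set 2; the original work used R's forecast package, so this is an equivalent illustration rather than the authors' procedure.

```python
# Sketch of the Table 9 comparison: fit the candidate ARIMA orders and
# compare their AIC and in-sample RMSE.
import numpy as np
from statsmodels.tsa.arima.model import ARIMA

sales = np.random.default_rng(1).random(277) * 30   # placeholder series

for order in [(0, 1, 1), (2, 1, 2), (2, 1, 3), (2, 1, 4)]:
    res = ARIMA(sales, order=order).fit()
    rmse = float(np.sqrt(np.mean(res.resid ** 2)))
    print(f"ARIMA{order}: AIC={res.aic:.3f}, RMSE={rmse:.3f}")
```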

Figure 6: Feature importance scores of the C12_3_XGBoost model (F1: goods click; F2: temperature mean; F3: cart click; F4: favorites click; F5: goods price; F6: sales unique visitor; F7: original shop price). The scores shown are 662, 754, 846, 254, 770, 443, and 1620 for F7, F1, F2, F3, F4, F5, and F6, respectively.


Figure 7: ME and MAE of C12_3_XGBoost under different parameters (max_depth from 6 to 10, min_child_weight from 1 to 6). (a) Mean error of the training set. (b) Mean absolute error of the training set.

Table 7: The results of C-XGBoost for the test set.

| Test set | Days | C12_j | C12_j_XGBoost model | (max_depth, min_child_weight) | Training set 1 ME | Training set 1 MAE | Test set ME | Test set MAE |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 348th-372nd | 25 | 3 | C12_3_XGBoost | (9, 2) | 0.351 | 0.636 | 4.385 | 4.400 |
| 373rd-381st | 9 | 4 | C12_4_XGBoost | (10, 2) | 0.339 | 0.591 | 1.778 | 2.000 |
| 348th-381st | 34 | - | - | - | - | - | 3.647 | 3.765 |

Figure 8: ME and MAE of C12_4_XGBoost under different parameters (max_depth from 6 to 10, min_child_weight from 1 to 6). (a) Mean error of the training set. (b) Mean absolute error of the training set.


(3) Step 3: Calculate the residuals of the optimal ARIMA. The prediction results from the 278th to the 381st day are obtained by the trained ARIMA(2, 1, 4), denoted as ARIMA_forecast. Then the residuals between the prediction values ARIMA_forecast and the actual values SKU_sales are calculated, denoted as ARIMA_residuals.

(4) Step 4: Train the A-XGBoost by setting ARIMA_residuals as the input and output. As shown in equation (14), the output data are the eighth column of the matrix R, and the corresponding inputs are the residuals of the previous 7 days (Columns 1 to 7):

$$
R =
\begin{bmatrix}
r_{271} & r_{272} & \cdots & r_{277} & r_{278} \\
r_{272} & r_{273} & \cdots & r_{278} & r_{279} \\
\vdots  & \vdots  &        & \vdots  & \vdots  \\
r_{339} & r_{340} & \cdots & r_{345} & r_{346} \\
r_{340} & r_{341} & \cdots & r_{346} & r_{347}
\end{bmatrix}_{70 \times 8}
\qquad (14)
$$
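A sketch of this step follows: it builds the 70 x 8 sliding-window matrix of equation (14) from a residual series and fits an XGBoost regressor on it. The residual array and the XGBoost parameters are placeholders, not the authors' exact configuration.

```python
# Sketch of equation (14): rows are 8-day windows of ARIMA residuals,
# the first 7 columns are inputs and the 8th column is the output.
import numpy as np
from xgboost import XGBRegressor

arima_residuals = np.random.default_rng(2).random(347) * 2 - 1  # placeholder,
# indexed so that arima_residuals[t - 1] is the residual of day t (1-based)

window = 7
rows = [arima_residuals[t - 1 : t - 1 + window + 1]   # days t .. t+7
        for t in range(271, 341)]                     # 70 rows, days 271-347
R = np.array(rows)                                    # shape (70, 8)

X, y = R[:, :window], R[:, window]                    # columns 1-7 -> column 8
a_xgb = XGBRegressor(n_estimators=100, max_depth=6, learning_rate=0.3,
                     objective="reg:squarederror").fit(X, y)
```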

Figure 10: Plots of (a) SKU sales with days change, (b) ACF, and (c) PACF after the first-order difference.

Figure 9: Plots of (a) SKU sales with days change, (b) ACF, and (c) PACF.


(5) Step 5: Calculate the predicted residuals of the test set using the A-XGBoost trained in Step 4, denoted as A-XGBoost_residuals. For the test set, the result for the 348th day is obtained by setting the ARIMA_residuals of the 341st-347th days as the input. The result for the 349th day is then calculated by inputting the ARIMA_residuals of the 342nd-347th days and the A-XGBoost_residuals of the 348th day into the trained A-XGBoost. This process is repeated until the A-XGBoost_residuals of the 349th-381st days are obtained.

(6) Step 6: Calculate the final prediction results. For the test set, the final prediction results are calculated by summing the corresponding values of A-XGBoost_residuals and ARIMA_forecast, denoted as A-XGBoost_forecast; a sketch of Steps 5-6 is given below. The results of the A-XGBoost are summarized in Table 10.
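The rolling prediction of Steps 5-6 can be sketched as follows, assuming `a_xgb` is the regressor trained in Step 4; the function names and the 34-day horizon layout are illustrative assumptions consistent with the text.

```python
# Sketch of Steps 5-6: roll the trained A-XGBoost forward over the test
# days, feeding each predicted residual back in, then add the ARIMA forecasts.
import numpy as np

def rolling_a_xgboost(a_xgb, residual_history, arima_forecast, horizon=34):
    """residual_history: ARIMA residuals up to day 347;
    arima_forecast: ARIMA predictions for days 348-381."""
    history = list(residual_history)
    predicted = []
    for _ in range(horizon):
        x = np.array(history[-7:]).reshape(1, -1)   # residuals of the last 7 days
        r_hat = float(a_xgb.predict(x)[0])
        predicted.append(r_hat)
        history.append(r_hat)                       # feed the prediction back in
    return np.array(arima_forecast) + np.array(predicted)  # A-XGBoost_forecast
```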

4.3.3. C-A-XGBoost Model. The optimal combination weights are determined by minimizing the MSE in equation (6).

For the test set, the weights $w_C$ and $w_A$ are obtained by the matrix operation in equation (13), $W = (B^{T}B)^{-1}B^{T}Y$, where $w_C = 1.262$ and $w_A = 0.273$; a sketch of this fit follows.
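The least-squares fit of the combination weights can be sketched as follows; the forecast and actual arrays are placeholders, with the columns of B holding the C-XGBoost and A-XGBoost forecasts and Y the actual sales.

```python
# Sketch of W = (B^T B)^{-1} B^T Y via numpy's least-squares solver.
import numpy as np

c_forecast = np.random.default_rng(3).random(34) * 30   # placeholder
a_forecast = np.random.default_rng(4).random(34) * 30   # placeholder
y_actual   = np.random.default_rng(5).random(34) * 30   # placeholder

B = np.column_stack([c_forecast, a_forecast])
w_c, w_a = np.linalg.lstsq(B, y_actual, rcond=None)[0]  # combination weights
combined = w_c * c_forecast + w_a * a_forecast          # C-A-XGBoost forecast
```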

4.4. Models for Comparison. In this section, the following models are chosen for comparison between the proposed models and other classical models:

ARIMA: as one of the most common time series models, it is used to predict the sales time sequence; the processes are the same as those of the ARIMA in Section 4.3.2.

XGBoost: the XGBoost model is constructed and optimized by setting the selected features and the corresponding SKU sales as the input and output.

C-XGBoost: taking the sales features of commodities into account, the XGBoost is used to forecast sales based on the clusters produced by the two-step clustering model; the procedures are the same as those in Section 4.3.1.

A-XGBoost: the A-XGBoost is applied to revising the residuals of the ARIMA. Namely, the ARIMA is first used to model the linear part of the time series, and then the XGBoost is used to model the nonlinear part. The relevant processes are described in Section 4.3.2.

C-A-XGBoost: the model combines the advantages of the C-XGBoost and A-XGBoost; its procedures are displayed in Section 4.3.3.

4.5. Results of Different Models. In this section, the test set is used to verify the superiority of the proposed C-A-XGBoost.

Figure 11 shows the curve of the actual values SKU_sales and the five fitting curves of predicted values from the 348th day to the 381st day, obtained by the ARIMA, XGBoost, C-XGBoost, A-XGBoost, and C-A-XGBoost.

It can be seen that the C-A-XGBoost has the best fitting performance to the original values, as its fitting curve is the most similar of the five to the curve of the actual values SKU_sales.

To further illustrate the superiority of the proposed C-A-XGBoost, the evaluation indexes mentioned in Section 4.2.2 are applied to distinguishing the best sales forecasting model. Table 11 provides a comparative summary of the indexes for the five models in Section 4.4.

According to Table 11, the superiority of the proposed C-A-XGBoost is distinct compared with the other models, as all of its evaluation indexes are the smallest.

C-XGBoost is inferior to C-A-XGBoost but outperforms the other three models, underlining that the C-XGBoost is superior to the single XGBoost.

A-XGBoost performs better than ARIMA, proving that the XGBoost is effective for the residual modification of the ARIMA.

According to the analysis above, the proposed C-A-XGBoost has the best forecasting performance for the sales of commodities in the cross-border e-commerce enterprise.

Table 10: The performance evaluation of A-XGBoost.

| A-XGBoost | Validation set | Test set |
| --- | --- | --- |
| Minimum error | -0.003 | -8.151 |
| Maximum error | 0.002 | 23.482 |
| Mean error | 0.000 | 1.213 |
| Mean absolute error | 0.001 | 4.566 |
| Standard deviation | 0.001 | 6.262 |
| Linear correlation | 1 | -0.154 |
| Occurrences | 70 | 34 |

Table 8: AIC values of the resulting ARIMA models by the auto.arima() function.

| ARIMA (p, d, q) | AIC | ARIMA (p, d, q) | AIC |
| --- | --- | --- | --- |
| ARIMA(2, 1, 2) with drift | 2854.317 | ARIMA(0, 1, 2) with drift | 2852.403 |
| ARIMA(0, 1, 0) with drift | 2983.036 | ARIMA(1, 1, 2) with drift | 2852.172 |
| ARIMA(1, 1, 0) with drift | 2927.344 | ARIMA(0, 1, 1) | 2850.212 |
| ARIMA(0, 1, 1) with drift | 2851.296 | ARIMA(1, 1, 1) | 2851.586 |
| ARIMA(0, 1, 0) | 2981.024 | ARIMA(0, 1, 2) | 2851.460 |
| ARIMA(1, 1, 1) with drift | 2852.543 | ARIMA(1, 1, 2) | 2851.120 |

Table 9: AIC values and RMSE of ARIMA models under different parameters.

| ARIMA model | ARIMA (p, d, q) | AIC | RMSE |
| --- | --- | --- | --- |
| 1 | ARIMA(0, 1, 1) | 2850.170 | 41.814 |
| 2 | ARIMA(2, 1, 2) | 2852.980 | 41.572 |
| 3 | ARIMA(2, 1, 3) | 2854.940 | 41.567 |
| 4 | ARIMA(2, 1, 4) | 2848.850 | 40.893 |


5. Conclusions and Future Directions

In this research, a new XGBoost-based forecasting model named C-A-XGBoost is proposed, which takes both the sales features and the tendency of the data series into account.

The C-XGBoost is first presented, combining clustering and the XGBoost and aiming at reflecting the sales features of commodities in forecasting. The two-step clustering algorithm is applied to partitioning the data series into different clusters based on selected features, which are used as the influencing factors for forecasting. After that, the corresponding C-XGBoost models are established for the different clusters using the XGBoost.

The proposed A-XGBoost takes advantage of the ARIMA in predicting the tendency of the data series and overcomes its disadvantages by applying the XGBoost to the nonlinear part of the series. The optimal ARIMA is obtained by comparing the AICs under different parameters, and the trained ARIMA model is then used to predict the linear part of the data series. For the nonlinear part, a rolling prediction is conducted by the trained XGBoost, whose input and output are the residuals produced by the ARIMA. The final results of the A-XGBoost are calculated by adding the residuals predicted by the XGBoost to the corresponding forecast values of the ARIMA.

In conclusion, the C-A-XGBoost is developed by assigning appropriate weights to the forecasting results of the C-XGBoost and A-XGBoost so as to exploit their respective strengths. Consequently, a linear combination of the two models' forecasting results is calculated as the final predictive values.

To verify the effectiveness of the proposed C-A-XGBoost, the ARIMA, XGBoost, C-XGBoost, and A-XGBoost are employed for comparison. Meanwhile, four common evaluation indexes, including ME, MSE, RMSE, and MAE, are utilized to check the forecasting performance of the C-A-XGBoost. The experiment demonstrates that the C-A-XGBoost outperforms the other models, indicating that the C-A-XGBoost provides theoretical support for the sales forecasts of the e-commerce company and can serve as a reference for selecting forecasting models. It is advisable for the e-commerce company to choose different forecasting models for different commodities instead of utilizing a single model.

Two potential extensions are put forward for future research. On the one hand, there may be no model in which all evaluation indicators are minimal, which makes choosing the optimal model difficult; therefore, a comprehensive evaluation index of forecasting performance will be constructed to overcome this difficulty. On the other hand, sales forecasting is ultimately used to optimize inventory management, so relevant factors should be considered, including inventory cost, order lead time, delivery time, and transportation time.

Data Availability

The data used to support the findings of this study are available from the corresponding author upon request.

Figure 11: Comparison of the SKU sales with the predicted values of the five models in Section 4.4. The x-axis represents the day; the y-axis represents the sales of the SKU. The curves with different colors represent different models.

Table 11: The performance evaluation of ARIMA, XGBoost, A-XGBoost, C-XGBoost, and C-A-XGBoost.

| Evaluation indexes | ARIMA | XGBoost | A-XGBoost | C-XGBoost | C-A-XGBoost |
| --- | --- | --- | --- | --- | --- |
| ME | -21.346 | -3.588 | 1.213 | 3.647 | 0.288 |
| MSE | 480.980 | 36.588 | 39.532 | 23.353 | 10.769 |
| RMSE | 21.931 | 6.049 | 6.287 | 4.832 | 3.282 |
| MAE | 21.346 | 5.059 | 4.566 | 3.765 | 2.515 |


Conflicts of Interest

The authors declare that there are no conflicts of interest regarding the publication of this paper.

Acknowledgments

This research was supported by the National Key R&D Program of China through the China Development Research Foundation (CDRF), funded by the Ministry of Science and Technology (CDRF-SQ2017YFGH002106).

References

[1] Y. Jin, Data Science in Supply Chain Management: Data-Related Influences on Demand Planning, ProQuest LLC, Ann Arbor, MI, USA, 2013.
[2] S. Akter and S. F. Wamba, "Big data analytics in e-commerce: a systematic review and agenda for future research," Electronic Markets, vol. 26, no. 2, pp. 173–194, 2016.
[3] J. L. Castle, M. P. Clements, and D. F. Hendry, "Forecasting by factors, by variables, by both or neither?," Journal of Econometrics, vol. 177, no. 2, pp. 305–319, 2013.
[4] A. Kawa, "Supply chains of cross-border e-commerce," in Proceedings of the Advanced Topics in Intelligent Information and Database Systems, Springer International Publishing, Kanazawa, Japan, April 2017.
[5] L. Song, T. Lv, X. Chen, and J. Gao, "Architecture of demand forecast for online retailers in China based on big data," in Proceedings of the International Conference on Human-Centered Computing, Springer, Colombo, Sri Lanka, January 2016.
[6] G. Iman, A. Ehsan, R. W. Gary, and A. Y. William, "An overview of energy demand forecasting methods published in 2005–2015," Energy Systems, vol. 8, no. 2, pp. 411–447, 2016.
[7] S. Gmbh, Forecasting with Exponential Smoothing, vol. 26, no. 1, Springer, Berlin, Germany, 2008.
[8] G. E. P. Box and G. M. Jenkins, "Time series analysis: forecasting and control," Journal of Time, vol. 31, no. 4, 303 pages, 2010.
[9] G. P. Zhang, "Time series forecasting using a hybrid ARIMA and neural network model," Neurocomputing, vol. 50, pp. 159–175, 2003.
[10] S. Ma, R. Fildes, and T. Huang, "Demand forecasting with high dimensional data: the case of SKU retail sales forecasting with intra- and inter-category promotional information," European Journal of Operational Research, vol. 249, no. 1, pp. 245–257, 2015.
[11] O. Gur Ali, S. SayIn, T. van Woensel, and J. Fransoo, "SKU demand forecasting in the presence of promotions," Expert Systems with Applications, vol. 36, no. 10, pp. 12340–12348, 2009.
[12] T. Huang, R. Fildes, and D. Soopramanien, "The value of competitive information in forecasting FMCG retail product sales and the variable selection problem," European Journal of Operational Research, vol. 237, no. 2, pp. 738–748, 2014.
[13] F. Cady, "Machine learning overview," in The Data Science Handbook, Wiley, Hoboken, NJ, USA, 2017.
[14] N. K. Ahmed, A. F. Atiya, N. E. Gayar, and H. El-Shishiny, "An empirical comparison of machine learning models for time series forecasting," Econometric Reviews, vol. 29, no. 5-6, pp. 594–621, 2010.
[15] A. P. Ansuj, M. E. Camargo, R. Radharamanan, and D. G. Petry, "Sales forecasting using time series and neural networks," Computers & Industrial Engineering, vol. 31, no. 1-2, pp. 421–424, 1996.
[16] I. Alon, M. Qi, and R. J. Sadowski, "Forecasting aggregate retail sales: a comparison of artificial neural networks and traditional methods," Journal of Retailing and Consumer Services, vol. 8, no. 3, pp. 147–156, 2001.
[17] G. Di Pillo, V. Latorre, S. Lucidi, and E. Procacci, "An application of support vector machines to sales forecasting under promotions," 4OR, vol. 14, no. 3, pp. 309–325, 2016.
[18] L. Wang, H. Zou, J. Su, L. Li, and S. Chaudhry, "An ARIMA-ANN hybrid model for time series forecasting," Systems Research and Behavioral Science, vol. 30, no. 3, pp. 244–259, 2013.
[19] S. Ji, H. Yu, Y. Guo, and Z. Zhang, "Research on sales forecasting based on ARIMA and BP neural network combined model," in Proceedings of the International Conference on Intelligent Information Processing, ACM, Wuhan, China, December 2016.
[20] J. Y. Choi and B. Lee, "Combining LSTM network ensemble via adaptive weighting for improved time series forecasting," Mathematical Problems in Engineering, vol. 2018, Article ID 2470171, 8 pages, 2018.
[21] K. Zhao and C. Wang, "Sales forecast in e-commerce using the convolutional neural network," 2017, https://arxiv.org/abs/1708.07946.
[22] K. Bandara, P. Shi, C. Bergmeir, H. Hewamalage, Q. Tran, and B. Seaman, "Sales demand forecast in e-commerce using a long short-term memory neural network methodology," 2019, https://arxiv.org/abs/1901.04028.
[23] X. Luo, C. Jiang, W. Wang, Y. Xu, J.-H. Wang, and W. Zhao, "User behavior prediction in social networks using weighted extreme learning machine with distribution optimization," Future Generation Computer Systems, vol. 93, pp. 1023–1035, 2018.
[24] L. Xiong, S. Jiankun, W. Long et al., "Short-term wind speed forecasting via stacked extreme learning machine with generalized correntropy," IEEE Transactions on Industrial Informatics, vol. 14, no. 11, pp. 4963–4971, 2018.
[25] J. L. Zhao, H. Zhu, and S. Zheng, "What is the value of an online retailer sharing demand forecast information?," Soft Computing, vol. 22, no. 16, pp. 5419–5428, 2018.
[26] G. Kulkarni, P. K. Kannan, and W. Moe, "Using online search data to forecast new product sales," Decision Support Systems, vol. 52, no. 3, pp. 604–611, 2012.
[27] A. Roy, "A novel multivariate fuzzy time series based forecasting algorithm incorporating the effect of clustering on prediction," Soft Computing, vol. 20, no. 5, pp. 1991–2019, 2016.
[28] R. J. Kuo and K. C. Xue, "A decision support system for sales forecasting through fuzzy neural networks with asymmetric fuzzy weights," Decision Support Systems, vol. 24, no. 2, pp. 105–126, 1998.
[29] P.-C. Chang, C.-H. Liu, and C.-Y. Fan, "Data clustering and fuzzy neural network for sales forecasting: a case study in printed circuit board industry," Knowledge-Based Systems, vol. 22, no. 5, pp. 344–355, 2009.
[30] C.-J. Lu and Y.-W. Wang, "Combining independent component analysis and growing hierarchical self-organizing maps with support vector regression in product demand forecasting," International Journal of Production Economics, vol. 128, no. 2, pp. 603–613, 2010.
[31] C.-J. Lu and L.-J. Kao, "A clustering-based sales forecasting scheme by using extreme learning machine and ensembling linkage methods with applications to computer server," Engineering Applications of Artificial Intelligence, vol. 55, pp. 231–238, 2016.
[32] W. Dai, Y.-Y. Chuang, and C.-J. Lu, "A clustering-based sales forecasting scheme using support vector regression for computer server," Procedia Manufacturing, vol. 2, pp. 82–86, 2015.
[33] I. F. Chen and C. J. Lu, "Sales forecasting by combining clustering and machine-learning techniques for computer retailing," Neural Computing and Applications, vol. 28, no. 9, pp. 2633–2647, 2016.
[34] T. Chiu, D. P. Fang, J. Chen, Y. Wang, and C. Jeris, "A robust and scalable clustering algorithm for mixed type attributes in a large database environment," in Proceedings of the Seventh ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, August 2001.
[35] T. Zhang, R. Ramakrishnan, and M. Livny, "Birch: a new data clustering algorithm and its applications," Data Mining and Knowledge Discovery, vol. 1, no. 2, pp. 141–182, 1997.
[36] L. Li, R. Situ, J. Gao, Z. Yang, and W. Liu, "A hybrid model combining convolutional neural network with XGBoost for predicting social media popularity," in Proceedings of the 2017 ACM on Multimedia Conference, MM '17, ACM, Mountain View, CA, USA, October 2017.
[37] J. Ke, H. Zheng, H. Yang, and X. Chen, "Short-term forecasting of passenger demand under on-demand ride services: a spatio-temporal deep learning approach," Transportation Research Part C: Emerging Technologies, vol. 85, pp. 591–608, 2017.
[38] K. Shimada, "Customer value creation in the information explosion era," in Proceedings of the 2014 Symposium on VLSI Technology, IEEE, Honolulu, HI, USA, June 2014.
[39] H. A. Abdelhafez, "Big data analytics: trends and case studies," in Encyclopedia of Business Analytics & Optimization, Association for Computing Machinery, New York, NY, USA, 2014.
[40] K. Kira and L. A. Rendell, "A practical approach to feature selection," Machine Learning Proceedings, vol. 48, no. 1, pp. 249–256, 1992.
[41] T. M. Khoshgoftaar, K. Gao, and L. A. Bullard, "A comparative study of filter-based and wrapper-based feature ranking techniques for software quality modeling," International Journal of Reliability, Quality and Safety Engineering, vol. 18, no. 4, pp. 341–364, 2011.
[42] M. A. Hall and L. A. Smith, "Feature selection for machine learning: comparing a correlation-based filter approach to the wrapper," in Proceedings of the Twelfth International Florida Artificial Intelligence Research Society Conference, DBLP, Orlando, FL, USA, May 1999.
[43] V. A. Huynh-Thu, Y. Saeys, L. Wehenkel, and P. Geurts, "Statistical interpretation of machine learning-based feature importance scores for biomarker discovery," Bioinformatics, vol. 28, no. 13, pp. 1766–1774, 2012.
[44] M. Sandri and P. Zuccolotto, "A bias correction algorithm for the Gini variable importance measure in classification trees," Journal of Computational and Graphical Statistics, vol. 17, no. 3, pp. 611–628, 2008.
[45] J. Brownlee, "Feature importance and feature selection with XGBoost in Python," 2016, https://machinelearningmastery.com.
[46] N. V. Chawla, S. Eschrich, and L. O. Hall, "Creating ensembles of classifiers," in Proceedings of the IEEE International Conference on Data Mining, IEEE Computer Society, San Jose, CA, USA, November–December 2001.
[47] A. K. Jain, M. N. Murty, and P. J. Flynn, "Data clustering: a review," ACM Computing Surveys, vol. 31, no. 3, pp. 264–323, 1999.
[48] A. Nagpal, A. Jatain, and D. Gaur, "Review based on data clustering algorithms," in Proceedings of the IEEE Conference on Information & Communication Technologies, Hainan, China, September 2013.
[49] Y. Wang, X. Ma, Y. Lao, and Y. Wang, "A fuzzy-based customer clustering approach with hierarchical structure for logistics network optimization," Expert Systems with Applications, vol. 41, no. 2, pp. 521–534, 2014.
[50] B. Wang, Y. Miao, H. Zhao, J. Jin, and Y. Chen, "A biclustering-based method for market segmentation using customer pain points," Engineering Applications of Artificial Intelligence, vol. 47, pp. 101–109, 2015.
[51] M. Halkidi, Y. Batistakis, and M. Vazirgiannis, "On clustering validation techniques," Journal of Intelligent Information Systems: Integrating Artificial Intelligence and Database Technologies, vol. 17, no. 2-3, pp. 107–145, 2001.
[52] R. W. Sembiring, J. M. Zain, and A. Embong, "A comparative agglomerative hierarchical clustering method to cluster implemented course," Journal of Computing, vol. 2, no. 12, 2010.
[53] M. Valipour, M. E. Banihabib, and S. M. R. Behbahani, "Comparison of the ARIMA and the auto-regressive artificial neural network models in forecasting the monthly inflow of the dam reservoir," Journal of Hydrology, vol. 476, 2013.
[54] E. Erdem and J. Shi, "ARMA based approaches for forecasting the tuple of wind speed and direction," Applied Energy, vol. 88, no. 4, pp. 1405–1414, 2011.
[55] R. J. Hyndman, "Forecasting functions for time series and linear models," 2019, http://mirror.costar.sfu.ca/mirror/CRAN/web/packages/forecast/index.html.
[56] S. Aishwarya, "Build high-performance time series models using auto ARIMA in Python and R," 2018, https://www.analyticsvidhya.com/blog/2018/08/auto-arima-time-series-modeling-python-r/.
[57] J. H. Friedman, "Greedy function approximation: a gradient boosting machine," The Annals of Statistics, vol. 29, no. 5, pp. 1189–1232, 2001.
[58] T. Chen and C. Guestrin, "XGBoost: a scalable tree boosting system," 2016, https://arxiv.org/abs/1603.02754.
[59] A. Gómez-Ríos, J. Luengo, and F. Herrera, "A study on the noise label influence in boosting algorithms: AdaBoost, GBM and XGBoost," in Proceedings of the International Conference on Hybrid Artificial Intelligence Systems, Logroño, Spain, June 2017.
[60] J. Wang, C. Lou, R. Yu, J. Gao, and H. Di, "Research on hot micro-blog forecast based on XGBOOST and random forest," in Proceedings of the 11th International Conference on Knowledge Science, Engineering and Management, KSEM 2018, pp. 350–360, Changchun, China, August 2018.
[61] C. Li, X. Zheng, Z. Yang, and L. Kuang, "Predicting short-term electricity demand by combining the advantages of ARMA and XGBoost in fog computing environment," Wireless Communications and Mobile Computing, vol. 2018, Article ID 5018053, 18 pages, 2018.
[62] A. M. Jain, "Complete guide to parameter tuning in XGBoost with codes in Python," 2016, https://www.analyticsvidhya.com/blog/2016/03/complete-guide-parameter-tuning-xgboost-with-codes-python/.

Mathematical Problems in Engineering 15

Hindawiwwwhindawicom Volume 2018

MathematicsJournal of

Hindawiwwwhindawicom Volume 2018

Mathematical Problems in Engineering

Applied MathematicsJournal of

Hindawiwwwhindawicom Volume 2018

Probability and StatisticsHindawiwwwhindawicom Volume 2018

Journal of

Hindawiwwwhindawicom Volume 2018

Mathematical PhysicsAdvances in

Complex AnalysisJournal of

Hindawiwwwhindawicom Volume 2018

OptimizationJournal of

Hindawiwwwhindawicom Volume 2018

Hindawiwwwhindawicom Volume 2018

Engineering Mathematics

International Journal of

Hindawiwwwhindawicom Volume 2018

Operations ResearchAdvances in

Journal of

Hindawiwwwhindawicom Volume 2018

Function SpacesAbstract and Applied AnalysisHindawiwwwhindawicom Volume 2018

International Journal of Mathematics and Mathematical Sciences

Hindawiwwwhindawicom Volume 2018

Hindawi Publishing Corporation httpwwwhindawicom Volume 2013Hindawiwwwhindawicom

The Scientific World Journal

Volume 2018

Hindawiwwwhindawicom Volume 2018Volume 2018

Numerical AnalysisNumerical AnalysisNumerical AnalysisNumerical AnalysisNumerical AnalysisNumerical AnalysisNumerical AnalysisNumerical AnalysisNumerical AnalysisNumerical AnalysisNumerical AnalysisNumerical AnalysisAdvances inAdvances in Discrete Dynamics in

Nature and SocietyHindawiwwwhindawicom Volume 2018

Hindawiwwwhindawicom

Dierential EquationsInternational Journal of

Volume 2018

Hindawiwwwhindawicom Volume 2018

Decision SciencesAdvances in

Hindawiwwwhindawicom Volume 2018

AnalysisInternational Journal of

Hindawiwwwhindawicom Volume 2018

Stochastic AnalysisInternational Journal of

Submit your manuscripts atwwwhindawicom

Page 9: AnApplicationofaThree-StageXGBoost-BasedModeltoSales ...downloads.hindawi.com/journals/mpe/2019/8503252.pdf · AnApplicationofaThree-StageXGBoost-BasedModeltoSales ForecastingofaCross-BorderE-CommerceEnterprise

determine the appropriate input and output variables Incontrast parameter optimization has less impact on theaccuracy of the algorithm +erefore in this paper onlythe primary parameters including max_depth andmin_child_weight are tuned to optimize the XGBoost[61] +e model can achieve a balanced point becauseincreasing the value of max_depth will make the modelmore complex and more likely to be overfit but increasingthe value of min_child_weight will make the model moreconservative

+e prebuilt C12 3 XGBoost model is optimized tominimize ME andMAE by tuning max_depth (from 6 to 10)and min_child_weight (from 1 to 6) when other parametersare fixed in which the ranges of parameters are determinedaccording to lots of case studies with the XGBoost such as[62] +e optimal parameter combination is determined bythe minimum of the ME and MAE under different pa-rameter combination

Figure 7 shows the changes of ME and MAE based onXGBoost as depths and min_child_weight change It can beseen that both the ME andMAE are the smallest when depthis 9 and min_child_weight is 2 +at is the model is optimal

(4) Step 4 Results on the test set +e test set is partitionedinto the corresponding clusters by the trained two-stepclustering model in Step 1 After that the Steps 2-3 arerepeated for the test set

As shown in Table 7 the test set is partitioned into theclusters C12_3 and C12 4 +en the corresponding modelsC12 3 XGBoost and C12 4 XGBoost are determinedC12 3 XGBoost has been trained and optimized as an ex-ample in Steps 2-3 and the C12 4 XGBoost is also trainedand optimized by repeating Steps 2-3 Finally the predictionresults are obtained by the optimized C12 3 XGBoost andC12 4 XGBoost

As illustrated in Figure 8 ME and MAE forC12 4 XGBoost change with the different values of depthand min_child_weight +e model performs the best whendepth is 10 and min_child_weight is 2 because both the MEand MAE are the smallest +e forecasting results of the testset are calculated and summarized in Table 7

432 A-XGBoost Model

(1) Step 1 Test stationarity and white noise of training set 2For training set 2 the p value of the ADF test and BoxndashPierce test are 001 and 3331 times 10minus 16 respectively which arelower than 005 +erefore the time series is stationary andnonwhite noise indicating that training set 2 is suitable forthe ARIMA

(2) Step 2 Train ARIMA model According to Section 23parameter combinations are firstly determined by ACF andPACF plots and autoarima ( ) function in R packageldquoforecastrdquo

As shown in Figure 9(a) SKU sales have a significantfluctuation in the first 50 days compared with the sales after50 days in Figure 9(b) the plot of ACF has a high trailingcharacteristic in Figure 9(c) the plot of PACF has a de-creasing and oscillating phenomenon +erefore the first-order difference should be calculated

As illustrated in Figure 10(a) SKU sales fluctuatearound zero after the first-order difference Figures 10(b)and 10(c) graphically present plots of ACF and PACF afterthe first-order difference both of which have a decreasingand oscillating phenomenon It indicates that the trainingset 2 conforms to the ARMA

As a result the possible optimal models are ARIMA (21 2) ARIMA (2 1 3) and ARIMA (2 1 4) according to theplots of ACF and PACF in Figure 10

Table 8 shows the AIC values of the ARIMA underdifferent parameters which are generated by the autoar-ima ( ) function It can be concluded that the ARIMA (0 1 1)is the best model because its AIC has the bestperformance

To further determine the optimal model the AIC andRMSE of ARIMA models under different parameters aresummarized in Table 9 +e possible optimal models in-clude the 3 possible optimal ARIMA judged by Figure 10and the best ARIMA generated by the autoarima ( )function According to the minimum principles theARIMA (2 1 4) is optimal because both AIC and RMSEhave the best performance

662

754

846

254

770

443

1620

F7

F1

F2

F3

F4

F5

F6

200 1600140012001000800600400Feature importance score

Feat

ures

F7 original shop price

F6 sales unique visitor

F5 goods price

F1 goods click

F2 temperature mean

F3 cart click

F4 favorites click

0

Figure 6 Feature importance score of the C12_3_XGBoost model

Mathematical Problems in Engineering 9

035

0355

036

0365

037

0375

038

0385

039

0395

04

1 2 3 4 5 6

ME

of tr

aini

ng se

t

Min_child_weight

Max_depth = 6Max_depth = 8

Max_depth = 7Max_depth = 9

(a)

063

068

073

078

083

088

093

1 2 3 4 5 6

MA

E of

trai

ning

set

Min_child_weight

Max_depth = 7Max_depth = 9

Max_depth = 6Max_depth = 8Max_depth = 10

(b)

Figure 7 ME and MAE of C12_3_XGBoost under different parameters (a) Mean error of the training set (b) Mean absolute error of thetraining set

Table 7 +e results of C-XGBoost for the test set

Test set Days C12_j C12_j_XGBoost model Depth and min_child_weightTraining set 1 Test setME MAE ME MAE

348thndash372th 25 3 C12_3_XGBoost model (9 2) 0351 0636 4385 4400373rdndash381st 9 4 C12_4_XGBoost model (10 2) 0339 0591 1778 2000348thndash381st 34 mdash mdash mdash mdash mdash 3647 3765

0330335

0340345

0350355

0360365

0370375

0380385

039

1 2 3 4 5 6

ME

of tr

aini

ng se

t

Min_child_weight

Max_depth = 7Max_depth = 9

Max_depth = 6Max_depth = 8Max_depth = 10

(a)

059

0615

064

0665

069

0715

074

0765

079

1 2 3 4 5 6

MA

E of

trai

ning

set

Min_child_weight

Max_depth = 7Max_depth = 9

Max_depth = 6Max_depth = 8Max_depth = 10

(b)

Figure 8 ME and MAE of C12_4_XGBoost under different parameters (a) Mean error of the training set (b) Mean absolute error of thetraining set

10 Mathematical Problems in Engineering

(3) Step 3 Calculate residuals of the optimal ARIMA +eprediction results from the 278th to the 381st day are ob-tained by using the trained ARIMA (2 1 4) denoted asARIMA forecast +en residuals between the predictionvalues ARIMA forecast and the actual values SKU_sales arecalculated denoted as ARIMA residuals

(4) Step 4 Train A-XGBoost by setting ARIMA residuals asthe input and output As shown in equation (14) the outputdata are composed of 8 columns of the matrix R and the

corresponding inputs are the residuals of the last 7 days(from Column 1 to 7)

R

r271 r272 middot middot middot r277 r278

r272 r273 middot middot middot r278 r279

⋮ ⋮ middot middot middot ⋮ ⋮

r339 r340 middot middot middot r345 r346

r340 r341 middot middot middot r346 r347

⎡⎢⎢⎢⎢⎢⎢⎢⎢⎢⎢⎢⎢⎢⎢⎢⎢⎢⎢⎢⎢⎢⎢⎢⎢⎢⎢⎢⎢⎢⎢⎢⎢⎢⎢⎢⎢⎢⎢⎢⎢⎢⎢⎢⎢⎣

⎤⎥⎥⎥⎥⎥⎥⎥⎥⎥⎥⎥⎥⎥⎥⎥⎥⎥⎥⎥⎥⎥⎥⎥⎥⎥⎥⎥⎥⎥⎥⎥⎥⎥⎥⎥⎥⎥⎥⎥⎥⎥⎥⎥⎥⎦

70times8

(14)

Day

SKU_salesdiff

ndash400

0

200

50 100 150 200 2500

(a)

ndash04

ndash02

00

02

Lag

ACF

5 201510

(b)

ndash04

ndash02

00

02

Lag

PACF

2015105

(c)

Figure 10 Plots of (a) SKU sales with days change (b) ACF and (c) PACF after the first-order difference

Day

SKU

_sal

es

50 100 150 200 2500

100

300

0

(a)

02

04

Lag

ACF

00

ndash0215105 20

(b)

Lag

PACF 02

04

00

025 201510

(c)

Figure 9 Plots of (a) SKU sales with days change (b) ACF and (c) PACF

Mathematical Problems in Engineering 11

(5) Step 5 Calculate predicted residuals of the test setusing the trained A-XGBoost in Step 4 denoted asA minus XGBoost_residuals For the test set the result of the 348thday is obtained by the setting ARIMA residuals of the341stndash347th day as the input +en the result of the 349thday can be calculated by inputting ARIMA_residuals of the342ndndash347th day and A minus XGBoost_residuals of the 348thday into the trained A-XGBoost+e processes are repeateduntil the A minus XGBoost_residuals of the 349thndash381st day areobtained

(6) Step 6 Calculate the final prediction results For the testset calculate the final prediction results by summing overthe corresponding values of A minus XGBoost_residuals andARIMA_forecast denoted as A minus XGBoost_forecast +eresults of the A-XGBoost are summarized in Table 10

433 C-A-XGBoost Model +e optimal combinationweights are determined by minimizing the MSE in equation(6)

For the test set the weights wC and wA are obtained basedon the matrix operation equation (13) W (BTB)minus 1BTYwhere wC 1262 and wA 0273

44 Models for Comparison In this section the followingmodels are chosen for the comparison between the proposedmodels and other classical models

ARIMA As one of the common time series model it isused to predict sales of time sequence of which theprocesses are the same as the ARIMA in Section 432XGBoost +e XGBoost model is constructed and op-timized by setting the selected features and the cor-responding SKU sales as input and outputC-XGBoost Taking sales features of commodities intoaccount the XGBoost is used to forecast sales based onthe resulting clusters by the two-step clustering model+e procedures are the same as that in Section 431

A-XGBoost +e A-XGBoost is applied to revising re-siduals of the ARIMA Namely the ARIMA is firstlyused to model the linear part of the time series andthen XGBoost is used to model the nonlinear part +erelevant processes are described in Section 432C-A-XGBoost +e model combines the advantages ofC-XGBoost and A-XGBoost of which the proceduresare displayed in Section 433

45 Results of DifferentModels In this section the test set isused to verify the superiority of the proposed C-A-XGBoost

Figure 11 shows the curve of actual values SKU_sales andfive fitting curves of predicted values from the 348th day tothe 381st day which is obtained by the ARIMA XGBoostC-XGBoost A-XGBoost and C-A-XGBoost

It can be seen that C-A-XGBoost has the best fittingperformance to the original value as its fitting curve is themost similar in five fitting curves to the curve of actual valuesSKU sales

To further illustrate the superiority of the proposed C-A-XGBoost the evaluation indexes mentioned in Section 422are applied to distinguishing the best model of the salesforecast Table 11 provides a comparative summary of theindexes for the five models in Section 44

According to Table 11 it can be concluded that thesuperiority of the proposed C-A-XGBoost is distinct com-pared with the other models as its evaluation indexes areminimized

C-XGBoost is inferior to C-A-XGBoost but outperformsthe other three models underlining that C-XGBoost issuperior to the single XGBoost

A-XGBoost has a superior performance relative toARIMA proving that XGBoost is effective for residualmodification of ARIMA

According to the analysis above the proposed C-A-XGBoost has the best forecasting performance for sales ofcommodities in the cross-border e-commerce enterprise

Table 10 +e performance evaluation of A-XGBoost

A-XGBoost Validation set Test setMinimum error minus 0003 minus 8151Maximum error 0002 23482Mean error 0000 1213Mean absolute error 0001 4566Standard deviation 0001 6262Linear correlation 1 minus 0154Occurrences 70 34

Table 8 AIC values of the resulting ARIMA by autoairma ( ) function

ARIMA (p d q) AIC ARIMA (p d q) AICARIMA (2 1 2) with drift 2854317 ARIMA (0 1 2) with drift 2852403ARIMA (0 1 0) with drift 2983036 ARIMA (1 1 2) with drift 2852172ARIMA (1 1 0) with drift 2927344 ARIMA (0 1 1) 2850212ARIMA (0 1 1) with drift 2851296 ARIMA (1 1 1) 2851586ARIMA (0 1 0) 2981024 ARIMA (0 1 2) 2851460ARIMA (1 1 1) with drift 2852543 ARIMA (1 1 2) 2851120

Table 9 AIC values and RMSE of ARIMA models under differentparameters

ARIMA model ARIMA (p d q) AIC RMSE1 ARIMA (0 1 1) 2850170 418142 ARIMA (2 1 2) 2852980 415723 ARIMA (2 1 3) 2854940 415674 ARIMA (2 1 4) 2848850 40893

12 Mathematical Problems in Engineering

5 Conclusions and Future Directions

In this research a new XGBoost-based forecasting modelnamed C-A-XGBoost is proposed which takes the salesfeatures and tendency of data series into account

+e C-XGBoost is first presented combining the clus-tering and XGBoost aiming at reflecting sales features ofcommodities into forecasting +e two-step clustering al-gorithm is applied to partitioning data series into differentclusters based on selected features which are used as theinfluencing factors for forecasting After that the corre-sponding C-XGBoost models are established for differentclusters using the XGBoost

+e proposed A-XGBoost takes the advantages of theARIMA in predicting the tendency of data series andovercomes the disadvantages of the ARIMA by applying theXGBoost to dealing with the nonlinear part of the data series+e optimal ARIMA is obtained in comparison of AICsunder different parameters and then the trained ARIMAmodel is used to predict the linear part of the data series Fornonlinear part of data series the rolling prediction is con-ducted by the trained XGBoost of which the input andoutput are the resulting residuals by the ARIMA +e finalresults of the A-XGBoost are calculated by adding thepredicted residuals by the XGBoost to the correspondingforecast values by the ARIMA

In conclusion the C-A-XGBoost is developed byassigning appropriate weights to the forecasting results ofthe C-XGBoost and A-XGBoost so as to take their respectivestrengths Consequently a linear combination of the two

modelsrsquo forecasting results is calculated as the final pre-dictive values

To verify the effectiveness of the proposed C-A-XG-Boost the ARIMA XGBoost C-XGBoost and A-XGBoostare employed for comparison Meanwhile four commonevaluation indexes including ME MSE RMSE and MAEare utilized to check the forecasting performance of C-A-XGBoost +e experiment demonstrates that the C-A-XGBoost outperforms other models indicating that C-A-XGBoost has provided theoretical support for sales forecastof the e-commerce company and can serve as a referencefor selecting forecasting models It is advisable for thee-commerce company to choose different forecastingmodels for different commodities instead of utilizing asingle model

+e two potential extensions are put forward for futureresearch On the one hand owing to the fact that there maybe no model in which all evaluation indicators are minimalwhich leads to the difficulty in choosing the optimal model+erefore a comprehensive evaluation index of forecastingperformance will be constructed to overcome the difficultyOn the other hand sales forecasting is actually used tooptimize inventory management so some relevant factorsshould be considered including inventory cost order leadtime delivery time and transportation time

Data Availability

+e data used to support the findings of this study areavailable from the corresponding author upon request

0

10

20

30

380370360350Day

SKU

_sal

es

SKU_salesARIMAXGBoost

C-XGBoostA-XGBoostC-A-XGBoost

Figure 11 Comparison of the SKU sales with the predicted values of five models in Section 44 +e x-axis represents the day +e y-axisrepresents the sales of SKU +e curves with different colors represent different models

Table 11 +e performance evaluation of ARIMA XGBoost A-XGBoost C-XGBoost and C-A-XGBoost

Evaluation indexes ARIMA XGBoost A-XGBoost C-XGBoost C-A-XGBoostME minus 21346 minus 3588 1213 3647 0288MSE 480980 36588 39532 23353 10769RMSE 21931 6049 6287 4832 3282MAE 21346 5059 4566 3765 2515

Mathematical Problems in Engineering 13

Conflicts of Interest

+e authors declare that there are no conflicts of interestregarding the publication of this paper

Acknowledgments

+is research was supported by the National Key RampDProgram of China through the China Development ResearchFoundation (CDRF) funded by the Ministry of Science andTechnology (CDRF-SQ2017YFGH002106)

References

[1] Y Jin Data Science in Supply Chain Management Data-Related Influences on Demand Planning Proquest Llc AnnArbor MI USA 2013

[2] S Akter and S F Wamba ldquoBig data analytics in e-commercea systematic review and agenda for future researchrdquo ElectronicMarkets vol 26 no 2 pp 173ndash194 2016

[3] J L Castle M P Clements and D F Hendry ldquoForecasting byfactors by variables by both or neitherrdquo Journal ofEconometrics vol 177 no 2 pp 305ndash319 2013

[4] A Kawa ldquoSupply chains of cross-border e-commercerdquo inProceedings of the Advanced Topics in Intelligent Informationand Database Systems Springer International PublishingKanazawa Japan April 2017

[5] L Song T Lv X Chen and J Gao ldquoArchitecture of demandforecast for online retailers in China based on big datardquo inProceedings of the International Conference on Human-Cen-tered Computing Springer Colombo Sri Lanka January 2016

[6] G Iman A Ehsan R W Gary and A Y William ldquoAnoverview of energy demand forecasting methods published in2005ndash2015rdquo Energy Systems vol 8 no 2 pp 411ndash447 2016

[7] S Gmbh Forecasting with Exponential Smoothing vol 26no 1 Springer Berlin Germany 2008

[8] G E P Box and G M Jenkins ldquoTime series analysis fore-casting and controlrdquo Journal of Time vol 31 no 4 303 pages2010

[9] G P Zhang ldquoTime series forecasting using a hybrid ARIMAand neural network modelrdquo Neurocomputing vol 50pp 159ndash175 2003

[10] S Ma R Fildes and T Huang ldquoDemand forecasting withhigh dimensional data the case of SKU retail sales forecastingwith intra- and inter-category promotional informationrdquoEuropean Journal of Operational Research vol 249 no 1pp 245ndash257 2015

[11] O Gur Ali S SayIn T van Woensel and J Fransoo ldquoSKUdemand forecasting in the presence of promotionsrdquo ExpertSystems with Applications vol 36 no 10 pp 12340ndash123482009

[12] T Huang R Fildes and D Soopramanien ldquo+e value ofcompetitive information in forecasting FMCG retail productsales and the variable selection problemrdquo European Journal ofOperational Research vol 237 no 2 pp 738ndash748 2014

[13] F Cady ldquoMachine learning overviewrdquo in Ae Data ScienceHandbook Wiley Hoboken NJ USA 2017

[14] N K Ahmed A F Atiya N E Gayar and H El-ShishinyldquoAn empirical comparison of machine learning models fortime series forecastingrdquo Econometric Reviews vol 29 no 5-6pp 594ndash621 2010

[15] A P Ansuj M E Camargo R Radharamanan andD G Petry ldquoSales forecasting using time series and neural

networksrdquo Computers amp Industrial Engineering vol 31no 1-2 pp 421ndash424 1996

[16] I Alon M Qi and R J Sadowski ldquoForecasting aggregateretail sales a comparison of artificial neural networks andtraditional methodsrdquo Journal of Retailing and ConsumerServices vol 8 no 3 pp 147ndash156 2001

[17] G Di Pillo V Latorre S Lucidi and E Procacci ldquoAn ap-plication of support vector machines to sales forecastingunder promotionsrdquo 4OR vol 14 no 3 pp 309ndash325 2016

[18] L Wang H Zou J Su L Li and S Chaudhry ldquoAn ARIMA-ANN hybrid model for time series forecastingrdquo SystemsResearch and Behavioral Science vol 30 no 3 pp 244ndash2592013

[19] S Ji H Yu Y Guo and Z Zhang ldquoResearch on salesforecasting based on ARIMA and BP neural network com-bined modelrdquo in Proceedings of the International Conferenceon Intelligent Information Processing ACM Wuhan ChinaDecember 2016

[20] J Y Choi and B Lee ldquoCombining LSTM network ensemblevia adaptive weighting for improved time series forecastingrdquoMathematical Problems in Engineering vol 2018 Article ID2470171 8 pages 2018

[21] K Zhao and CWang ldquoSales forecast in e-commerce using theconvolutional neural networkrdquo 2017 httpsarxivorgabs170807946

[22] K Bandara P Shi C Bergmeir H Hewamalage Q Tran andB Seaman ldquoSales demand forecast in e-commerce using along short-termmemory neural network methodologyrdquo 2019httpsarxivorgabs190104028

[23] X Luo C Jiang W Wang Y Xu J-H Wang and W ZhaoldquoUser behavior prediction in social networks using weightedextreme learning machine with distribution optimizationrdquoFuture Generation Computer Systems vol 93 pp 1023ndash10352018

[24] L Xiong S Jiankun W Long et al ldquoShort-term wind speedforecasting via stacked extreme learning machine with gen-eralized correntropyrdquo IEEE Transactions on Industrial In-formatics vol 14 no 11 pp 4963ndash4971 2018

[25] J L Zhao H Zhu and S Zheng ldquoWhat is the value of anonline retailer sharing demand forecast informationrdquo SoftComputing vol 22 no 16 pp 5419ndash5428 2018

[26] G Kulkarni P K Kannan andW Moe ldquoUsing online searchdata to forecast new product salesrdquo Decision Support Systemsvol 52 no 3 pp 604ndash611 2012

[27] A Roy ldquoA novel multivariate fuzzy time series based fore-casting algorithm incorporating the effect of clustering onpredictionrdquo Soft Computing vol 20 no 5 pp 1991ndash20192016

[28] R J Kuo and K C Xue ldquoA decision support system for salesforecasting through fuzzy neural networks with asymmetricfuzzy weightsrdquo Decision Support Systems vol 24 no 2pp 105ndash126 1998

[29] P-C Chang C-H Liu and C-Y Fan ldquoData clustering andfuzzy neural network for sales forecasting a case study inprinted circuit board industryrdquo Knowledge-Based Systemsvol 22 no 5 pp 344ndash355 2009

[30] C-J Lu and Y-W Wang ldquoCombining independent com-ponent analysis and growing hierarchical self-organizingmaps with support vector regression in product demandforecastingrdquo International Journal of Production Economicsvol 128 no 2 pp 603ndash613 2010

[31] C-J Lu and L-J Kao ldquoA clustering-based sales forecastingscheme by using extreme learning machine and ensemblinglinkage methods with applications to computer serverrdquo

14 Mathematical Problems in Engineering

Engineering Applications of Artificial Intelligence vol 55pp 231ndash238 2016

[32] W Dai Y-Y Chuang and C-J Lu ldquoA clustering-based salesforecasting scheme using support vector regression forcomputer serverrdquo Procedia Manufacturing vol 2 pp 82ndash862015

[33] I F Chen and C J Lu ldquoSales forecasting by combiningclustering and machine-learning techniques for computerretailingrdquo Neural Computing and Applications vol 28 no 9pp 2633ndash2647 2016

[34] T Chiu D P Fang J Chen Y Wang and C Jeris ldquoA robustand scalable clustering algorithm for mixed type attributes ina large database environmentrdquo in Proceedings of the SeventhACM SIGKDD International Conference on Knowledge Dis-covery and Data Mining August 2001

[35] T Zhang R Ramakrishnan and M Livny ldquoBirch a new dataclustering algorithm and its applicationsrdquo Data Mining andKnowledge Discovery vol 1 no 2 pp 141ndash182 1997

[36] L Li R Situ J Gao Z Yang and W Liu ldquoA hybrid modelcombining convolutional neural network with XGBoost forpredicting social media popularityrdquo in Proceedings of the 2017ACM on Multimedia ConferencemdashMM rsquo17 ACM MountainView CA USA October 2017

[37] J Ke H Zheng H Yang and X Chen ldquoShort-term fore-casting of passenger demand under on-demand ride servicesa spatio-temporal deep learning approachrdquo TransportationResearch Part C Emerging Technologies vol 85 pp 591ndash6082017

[38] K Shimada ldquoCustomer value creation in the informationexplosion erardquo in Proceedings of the 2014 Symposium on VLSITechnology IEEE Honolulu HI USA June 2014


Figure 7: ME and MAE of C12_3_XGBoost under different parameters. (a) Mean error of the training set. (b) Mean absolute error of the training set. In both panels the x-axis is min_child_weight (1–6), with one curve per max_depth value (6–9 in panel (a); 6–10 in panel (b)).

Table 7: The results of C-XGBoost for the test set.

Test set       Days   C12_j   C12_j_XGBoost model   (max_depth, min_child_weight)   Training set ME / MAE   Test set ME / MAE
348th–372nd    25     3       C12_3_XGBoost model   (9, 2)                          0.351 / 0.636           4.385 / 4.400
373rd–381st    9      4       C12_4_XGBoost model   (10, 2)                         0.339 / 0.591           1.778 / 2.000
348th–381st    34     —       —                     —                               —                       3.647 / 3.765

Figure 8: ME and MAE of C12_4_XGBoost under different parameters. (a) Mean error of the training set. (b) Mean absolute error of the training set. In both panels the x-axis is min_child_weight (1–6), with one curve per max_depth value (6–10).


(3) Step 3: Calculate residuals of the optimal ARIMA. The prediction results from the 278th to the 381st day are obtained by using the trained ARIMA(2, 1, 4), denoted as ARIMA_forecast. Then the residuals between the prediction values ARIMA_forecast and the actual values SKU_sales are calculated, denoted as ARIMA_residuals.

(4) Step 4: Train the A-XGBoost by setting ARIMA_residuals as the input and output. As shown in equation (14), the output data are composed of the 8th column of the matrix R, and the corresponding inputs are the residuals of the last 7 days (Columns 1 to 7):

R = \begin{bmatrix}
r_{271} & r_{272} & \cdots & r_{277} & r_{278} \\
r_{272} & r_{273} & \cdots & r_{278} & r_{279} \\
\vdots  & \vdots  &        & \vdots  & \vdots  \\
r_{339} & r_{340} & \cdots & r_{345} & r_{346} \\
r_{340} & r_{341} & \cdots & r_{346} & r_{347}
\end{bmatrix}_{70 \times 8}. \tag{14}
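A minimal sketch of Steps 3 and 4 follows, assuming `arima_residuals` is a one-dimensional array of the ARIMA residuals ordered in time and starting at r271; the variable names and hyperparameters are illustrative, not taken from the paper's code:

```python
import numpy as np
import xgboost as xgb

# Assumption: `arima_residuals` is a 1-D array of ARIMA residuals,
# ordered in time, whose first element corresponds to r_271.
window = 7    # inputs: residuals of the last 7 days (Columns 1-7 of R)
n_rows = 70   # number of rows of R in equation (14)

# Slide an 8-day window over the residual series to build R (70 x 8).
R = np.array([arima_residuals[i:i + window + 1] for i in range(n_rows)])
X_train, y_train = R[:, :window], R[:, window]   # Column 8 is the output

a_xgb = xgb.XGBRegressor(objective="reg:squarederror", n_estimators=100)
a_xgb.fit(X_train, y_train)
```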

Figure 9: Plots of (a) SKU sales with days change, (b) ACF, and (c) PACF.

Figure 10: Plots of (a) SKU sales with days change, (b) ACF, and (c) PACF after the first-order difference.


(5) Step 5: Calculate the predicted residuals of the test set using the A-XGBoost trained in Step 4, denoted as A-XGBoost_residuals. For the test set, the result of the 348th day is obtained by setting the ARIMA_residuals of the 341st–347th day as the input. Then the result of the 349th day can be calculated by inputting the ARIMA_residuals of the 342nd–347th day and the A-XGBoost_residuals of the 348th day into the trained A-XGBoost. The processes are repeated until the A-XGBoost_residuals of the 349th–381st day are obtained (see the sketch after Step 6).

(6) Step 6: Calculate the final prediction results. For the test set, the final prediction results are calculated by summing the corresponding values of A-XGBoost_residuals and ARIMA_forecast, denoted as A-XGBoost_forecast. The results of the A-XGBoost are summarized in Table 10.
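Steps 5 and 6 amount to a one-step-ahead rolling forecast in which each predicted residual is fed back as an input for the next day. A minimal sketch, assuming `a_xgb` is the model trained in Step 4, `arima_residuals` ends at the 347th day, and `arima_forecast_test` holds the ARIMA forecasts for the 348th–381st day (all names are illustrative):

```python
import numpy as np

history = list(arima_residuals[-7:])   # ARIMA residuals of the 341st-347th day
a_xgboost_residuals = []
for _ in range(34):                    # 348th-381st day of the test set
    x = np.asarray(history[-7:]).reshape(1, -1)
    pred = float(a_xgb.predict(x)[0])
    a_xgboost_residuals.append(pred)
    history.append(pred)               # predicted residual feeds the next input

# Step 6: final A-XGBoost forecast = ARIMA forecast + predicted residual
a_xgboost_forecast = arima_forecast_test + np.asarray(a_xgboost_residuals)
```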

4.3.3. C-A-XGBoost Model. The optimal combination weights are determined by minimizing the MSE in equation (6).

For the test set, the weights wC and wA are obtained by the matrix operation in equation (13), W = (B^T B)^{-1} B^T Y, where wC = 1.262 and wA = 0.273.
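A minimal sketch of this weight estimation, assuming `c_xgb_forecast` and `a_xgb_forecast` hold the two models' forecasts for the 34 test days and `sku_sales_test` the actual sales (illustrative names); np.linalg.lstsq solves the same least-squares problem as W = (B^T B)^{-1} B^T Y:

```python
import numpy as np

B = np.column_stack([c_xgb_forecast, a_xgb_forecast])  # 34 x 2 forecast matrix
Y = sku_sales_test                                     # actual sales, length 34
W, *_ = np.linalg.lstsq(B, Y, rcond=None)              # minimizes ||BW - Y||^2
w_C, w_A = W                                           # reported as 1.262 and 0.273
c_a_xgboost_forecast = B @ W                           # combined prediction
```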

4.4. Models for Comparison. In this section, the following models are chosen for the comparison between the proposed models and other classical models.

ARIMA: As one of the most common time series models, it is used to predict sales of the time sequence; the processes are the same as those of the ARIMA in Section 4.3.2.

XGBoost: The XGBoost model is constructed and optimized by setting the selected features and the corresponding SKU sales as input and output.

C-XGBoost: Taking the sales features of commodities into account, the XGBoost is used to forecast sales based on the clusters resulting from the two-step clustering model. The procedures are the same as those in Section 4.3.1.

A-XGBoost: The A-XGBoost is applied to revising the residuals of the ARIMA. Namely, the ARIMA is first used to model the linear part of the time series, and then the XGBoost is used to model the nonlinear part. The relevant processes are described in Section 4.3.2.

C-A-XGBoost: The model combines the advantages of C-XGBoost and A-XGBoost; its procedures are displayed in Section 4.3.3.

4.5. Results of Different Models. In this section, the test set is used to verify the superiority of the proposed C-A-XGBoost.

Figure 11 shows the curve of the actual values SKU_sales and the five fitting curves of predicted values from the 348th day to the 381st day, obtained by the ARIMA, XGBoost, C-XGBoost, A-XGBoost, and C-A-XGBoost.

It can be seen that the C-A-XGBoost has the best fitting performance, as its curve is the closest of the five fitting curves to the curve of the actual values SKU_sales.

To further illustrate the superiority of the proposed C-A-XGBoost, the evaluation indexes mentioned in Section 4.2.2 are applied to distinguishing the best model of the sales forecast. Table 11 provides a comparative summary of the indexes for the five models in Section 4.4.
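For reference, the four indexes can be computed as below; this is a minimal sketch in which the sign convention of the ME (prediction minus actual) is an assumption:

```python
import numpy as np

def evaluate(y_true, y_pred):
    err = np.asarray(y_pred) - np.asarray(y_true)  # sign convention assumed
    return {
        "ME": err.mean(),                    # mean error (bias)
        "MSE": (err ** 2).mean(),            # mean squared error
        "RMSE": np.sqrt((err ** 2).mean()),  # root mean squared error
        "MAE": np.abs(err).mean(),           # mean absolute error
    }
```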

According to Table 11, it can be concluded that the superiority of the proposed C-A-XGBoost is distinct compared with the other models, as its evaluation indexes are minimized.

C-XGBoost is inferior to C-A-XGBoost but outperforms the other three models, underlining that C-XGBoost is superior to the single XGBoost.

A-XGBoost has a superior performance relative to ARIMA, proving that the XGBoost is effective for the residual modification of the ARIMA.

According to the analysis above, the proposed C-A-XGBoost has the best forecasting performance for the sales of commodities in the cross-border e-commerce enterprise.

Table 10: The performance evaluation of A-XGBoost.

A-XGBoost             Validation set   Test set
Minimum error         −0.003           −8.151
Maximum error         0.002            23.482
Mean error            0.000            1.213
Mean absolute error   0.001            4.566
Standard deviation    0.001            6.262
Linear correlation    1                −0.154
Occurrences           70               34

Table 8: AIC values of the resulting ARIMA models by the auto.arima() function.

ARIMA (p, d, q)              AIC        ARIMA (p, d, q)              AIC
ARIMA(2, 1, 2) with drift    2854.317   ARIMA(0, 1, 2) with drift    2852.403
ARIMA(0, 1, 0) with drift    2983.036   ARIMA(1, 1, 2) with drift    2852.172
ARIMA(1, 1, 0) with drift    2927.344   ARIMA(0, 1, 1)               2850.212
ARIMA(0, 1, 1) with drift    2851.296   ARIMA(1, 1, 1)               2851.586
ARIMA(0, 1, 0)               2981.024   ARIMA(0, 1, 2)               2851.460
ARIMA(1, 1, 1) with drift    2852.543   ARIMA(1, 1, 2)               2851.120

Table 9: AIC values and RMSE of ARIMA models under different parameters.

ARIMA model   ARIMA (p, d, q)   AIC        RMSE
1             ARIMA(0, 1, 1)    2850.170   41.814
2             ARIMA(2, 1, 2)    2852.980   41.572
3             ARIMA(2, 1, 3)    2854.940   41.567
4             ARIMA(2, 1, 4)    2848.850   40.893
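The AIC searches behind Tables 8 and 9 were run with R's auto.arima(); a roughly equivalent sketch in Python with statsmodels (the series name `sku_sales_train` and the (p, q) grid are assumptions) would be:

```python
import itertools
from statsmodels.tsa.arima.model import ARIMA

best_order, best_aic = None, float("inf")
for p, q in itertools.product(range(3), range(5)):   # d = 1 is fixed
    try:
        res = ARIMA(sku_sales_train, order=(p, 1, q)).fit()
    except Exception:
        continue                                     # skip orders that fail to fit
    if res.aic < best_aic:
        best_order, best_aic = (p, 1, q), res.aic

print(best_order, best_aic)   # the paper selects ARIMA(2, 1, 4)
```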


5. Conclusions and Future Directions

In this research, a new XGBoost-based forecasting model named C-A-XGBoost is proposed, which takes both the sales features and the tendency of the data series into account.

The C-XGBoost is first presented, combining clustering and XGBoost, aiming at reflecting the sales features of commodities in forecasting. The two-step clustering algorithm is applied to partitioning the data series into different clusters based on selected features, which are used as the influencing factors for forecasting. After that, the corresponding C-XGBoost models are established for the different clusters using the XGBoost.

The proposed A-XGBoost takes advantage of the ARIMA in predicting the tendency of the data series and overcomes the disadvantages of the ARIMA by applying the XGBoost to the nonlinear part of the series. The optimal ARIMA is obtained by comparing the AICs under different parameters, and the trained ARIMA model is then used to predict the linear part of the data series. For the nonlinear part, a rolling prediction is conducted by the trained XGBoost, whose input and output are the residuals produced by the ARIMA. The final results of the A-XGBoost are calculated by adding the residuals predicted by the XGBoost to the corresponding forecast values of the ARIMA.

In conclusion, the C-A-XGBoost is developed by assigning appropriate weights to the forecasting results of the C-XGBoost and A-XGBoost so as to exploit their respective strengths. Consequently, a linear combination of the two models' forecasting results is calculated as the final predictive values.

To verify the effectiveness of the proposed C-A-XGBoost, the ARIMA, XGBoost, C-XGBoost, and A-XGBoost are employed for comparison. Meanwhile, four common evaluation indexes, including ME, MSE, RMSE, and MAE, are utilized to check the forecasting performance of the C-A-XGBoost. The experiment demonstrates that the C-A-XGBoost outperforms the other models, indicating that it provides theoretical support for the sales forecasts of the e-commerce company and can serve as a reference for selecting forecasting models. It is advisable for the e-commerce company to choose different forecasting models for different commodities instead of utilizing a single model.

Two potential extensions are put forward for future research. On the one hand, there may be no single model in which all evaluation indicators are minimal, which makes choosing the optimal model difficult; therefore, a comprehensive evaluation index of forecasting performance will be constructed to overcome this difficulty. On the other hand, sales forecasting is ultimately used to optimize inventory management, so relevant factors should be considered, including inventory cost, order lead time, delivery time, and transportation time.

Data Availability

The data used to support the findings of this study are available from the corresponding author upon request.

Figure 11: Comparison of the SKU sales with the predicted values of the five models in Section 4.4. The x-axis represents the day (348th–381st); the y-axis represents the SKU sales. The curves with different colors represent the different models: SKU_sales, ARIMA, XGBoost, C-XGBoost, A-XGBoost, and C-A-XGBoost.

Table 11: The performance evaluation of ARIMA, XGBoost, A-XGBoost, C-XGBoost, and C-A-XGBoost.

Evaluation index   ARIMA     XGBoost   A-XGBoost   C-XGBoost   C-A-XGBoost
ME                 −21.346   −3.588    1.213       3.647       0.288
MSE                480.980   36.588    39.532      23.353      10.769
RMSE               21.931    6.049     6.287       4.832       3.282
MAE                21.346    5.059     4.566       3.765       2.515


Conflicts of Interest

The authors declare that there are no conflicts of interest regarding the publication of this paper.

Acknowledgments

This research was supported by the National Key R&D Program of China through the China Development Research Foundation (CDRF), funded by the Ministry of Science and Technology (CDRF-SQ2017YFGH002106).

References

[1] Y. Jin, Data Science in Supply Chain Management: Data-Related Influences on Demand Planning, ProQuest LLC, Ann Arbor, MI, USA, 2013.
[2] S. Akter and S. F. Wamba, "Big data analytics in e-commerce: a systematic review and agenda for future research," Electronic Markets, vol. 26, no. 2, pp. 173–194, 2016.
[3] J. L. Castle, M. P. Clements, and D. F. Hendry, "Forecasting by factors, by variables, by both or neither," Journal of Econometrics, vol. 177, no. 2, pp. 305–319, 2013.
[4] A. Kawa, "Supply chains of cross-border e-commerce," in Proceedings of the Advanced Topics in Intelligent Information and Database Systems, Springer International Publishing, Kanazawa, Japan, April 2017.
[5] L. Song, T. Lv, X. Chen, and J. Gao, "Architecture of demand forecast for online retailers in China based on big data," in Proceedings of the International Conference on Human-Centered Computing, Springer, Colombo, Sri Lanka, January 2016.
[6] G. Iman, A. Ehsan, R. W. Gary, and A. Y. William, "An overview of energy demand forecasting methods published in 2005–2015," Energy Systems, vol. 8, no. 2, pp. 411–447, 2016.
[7] S. Gmbh, Forecasting with Exponential Smoothing, vol. 26, no. 1, Springer, Berlin, Germany, 2008.
[8] G. E. P. Box and G. M. Jenkins, "Time series analysis: forecasting and control," Journal of Time, vol. 31, no. 4, 303 pages, 2010.
[9] G. P. Zhang, "Time series forecasting using a hybrid ARIMA and neural network model," Neurocomputing, vol. 50, pp. 159–175, 2003.
[10] S. Ma, R. Fildes, and T. Huang, "Demand forecasting with high dimensional data: the case of SKU retail sales forecasting with intra- and inter-category promotional information," European Journal of Operational Research, vol. 249, no. 1, pp. 245–257, 2015.
[11] O. Gur Ali, S. SayIn, T. van Woensel, and J. Fransoo, "SKU demand forecasting in the presence of promotions," Expert Systems with Applications, vol. 36, no. 10, pp. 12340–12348, 2009.
[12] T. Huang, R. Fildes, and D. Soopramanien, "The value of competitive information in forecasting FMCG retail product sales and the variable selection problem," European Journal of Operational Research, vol. 237, no. 2, pp. 738–748, 2014.
[13] F. Cady, "Machine learning overview," in The Data Science Handbook, Wiley, Hoboken, NJ, USA, 2017.
[14] N. K. Ahmed, A. F. Atiya, N. E. Gayar, and H. El-Shishiny, "An empirical comparison of machine learning models for time series forecasting," Econometric Reviews, vol. 29, no. 5-6, pp. 594–621, 2010.
[15] A. P. Ansuj, M. E. Camargo, R. Radharamanan, and D. G. Petry, "Sales forecasting using time series and neural networks," Computers & Industrial Engineering, vol. 31, no. 1-2, pp. 421–424, 1996.
[16] I. Alon, M. Qi, and R. J. Sadowski, "Forecasting aggregate retail sales: a comparison of artificial neural networks and traditional methods," Journal of Retailing and Consumer Services, vol. 8, no. 3, pp. 147–156, 2001.
[17] G. Di Pillo, V. Latorre, S. Lucidi, and E. Procacci, "An application of support vector machines to sales forecasting under promotions," 4OR, vol. 14, no. 3, pp. 309–325, 2016.
[18] L. Wang, H. Zou, J. Su, L. Li, and S. Chaudhry, "An ARIMA-ANN hybrid model for time series forecasting," Systems Research and Behavioral Science, vol. 30, no. 3, pp. 244–259, 2013.
[19] S. Ji, H. Yu, Y. Guo, and Z. Zhang, "Research on sales forecasting based on ARIMA and BP neural network combined model," in Proceedings of the International Conference on Intelligent Information Processing, ACM, Wuhan, China, December 2016.
[20] J. Y. Choi and B. Lee, "Combining LSTM network ensemble via adaptive weighting for improved time series forecasting," Mathematical Problems in Engineering, vol. 2018, Article ID 2470171, 8 pages, 2018.
[21] K. Zhao and C. Wang, "Sales forecast in e-commerce using the convolutional neural network," 2017, https://arxiv.org/abs/1708.07946.
[22] K. Bandara, P. Shi, C. Bergmeir, H. Hewamalage, Q. Tran, and B. Seaman, "Sales demand forecast in e-commerce using a long short-term memory neural network methodology," 2019, https://arxiv.org/abs/1901.04028.
[23] X. Luo, C. Jiang, W. Wang, Y. Xu, J.-H. Wang, and W. Zhao, "User behavior prediction in social networks using weighted extreme learning machine with distribution optimization," Future Generation Computer Systems, vol. 93, pp. 1023–1035, 2018.
[24] L. Xiong, S. Jiankun, W. Long et al., "Short-term wind speed forecasting via stacked extreme learning machine with generalized correntropy," IEEE Transactions on Industrial Informatics, vol. 14, no. 11, pp. 4963–4971, 2018.
[25] J. L. Zhao, H. Zhu, and S. Zheng, "What is the value of an online retailer sharing demand forecast information?," Soft Computing, vol. 22, no. 16, pp. 5419–5428, 2018.
[26] G. Kulkarni, P. K. Kannan, and W. Moe, "Using online search data to forecast new product sales," Decision Support Systems, vol. 52, no. 3, pp. 604–611, 2012.
[27] A. Roy, "A novel multivariate fuzzy time series based forecasting algorithm incorporating the effect of clustering on prediction," Soft Computing, vol. 20, no. 5, pp. 1991–2019, 2016.
[28] R. J. Kuo and K. C. Xue, "A decision support system for sales forecasting through fuzzy neural networks with asymmetric fuzzy weights," Decision Support Systems, vol. 24, no. 2, pp. 105–126, 1998.
[29] P.-C. Chang, C.-H. Liu, and C.-Y. Fan, "Data clustering and fuzzy neural network for sales forecasting: a case study in printed circuit board industry," Knowledge-Based Systems, vol. 22, no. 5, pp. 344–355, 2009.
[30] C.-J. Lu and Y.-W. Wang, "Combining independent component analysis and growing hierarchical self-organizing maps with support vector regression in product demand forecasting," International Journal of Production Economics, vol. 128, no. 2, pp. 603–613, 2010.
[31] C.-J. Lu and L.-J. Kao, "A clustering-based sales forecasting scheme by using extreme learning machine and ensembling linkage methods with applications to computer server," Engineering Applications of Artificial Intelligence, vol. 55, pp. 231–238, 2016.
[32] W. Dai, Y.-Y. Chuang, and C.-J. Lu, "A clustering-based sales forecasting scheme using support vector regression for computer server," Procedia Manufacturing, vol. 2, pp. 82–86, 2015.
[33] I. F. Chen and C. J. Lu, "Sales forecasting by combining clustering and machine-learning techniques for computer retailing," Neural Computing and Applications, vol. 28, no. 9, pp. 2633–2647, 2016.
[34] T. Chiu, D. P. Fang, J. Chen, Y. Wang, and C. Jeris, "A robust and scalable clustering algorithm for mixed type attributes in a large database environment," in Proceedings of the Seventh ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, August 2001.
[35] T. Zhang, R. Ramakrishnan, and M. Livny, "Birch: a new data clustering algorithm and its applications," Data Mining and Knowledge Discovery, vol. 1, no. 2, pp. 141–182, 1997.
[36] L. Li, R. Situ, J. Gao, Z. Yang, and W. Liu, "A hybrid model combining convolutional neural network with XGBoost for predicting social media popularity," in Proceedings of the 2017 ACM on Multimedia Conference (MM '17), ACM, Mountain View, CA, USA, October 2017.
[37] J. Ke, H. Zheng, H. Yang, and X. Chen, "Short-term forecasting of passenger demand under on-demand ride services: a spatio-temporal deep learning approach," Transportation Research Part C: Emerging Technologies, vol. 85, pp. 591–608, 2017.
[38] K. Shimada, "Customer value creation in the information explosion era," in Proceedings of the 2014 Symposium on VLSI Technology, IEEE, Honolulu, HI, USA, June 2014.
[39] H. A. Abdelhafez, "Big data analytics: trends and case studies," in Encyclopedia of Business Analytics & Optimization, Association for Computing Machinery, New York, NY, USA, 2014.
[40] K. Kira and L. A. Rendell, "A practical approach to feature selection," Machine Learning Proceedings, vol. 48, no. 1, pp. 249–256, 1992.
[41] T. M. Khoshgoftaar, K. Gao, and L. A. Bullard, "A comparative study of filter-based and wrapper-based feature ranking techniques for software quality modeling," International Journal of Reliability, Quality and Safety Engineering, vol. 18, no. 4, pp. 341–364, 2011.
[42] M. A. Hall and L. A. Smith, "Feature selection for machine learning: comparing a correlation-based filter approach to the wrapper," in Proceedings of the Twelfth International Florida Artificial Intelligence Research Society Conference, DBLP, Orlando, FL, USA, May 1999.
[43] V. A. Huynh-Thu, Y. Saeys, L. Wehenkel, and P. Geurts, "Statistical interpretation of machine learning-based feature importance scores for biomarker discovery," Bioinformatics, vol. 28, no. 13, pp. 1766–1774, 2012.
[44] M. Sandri and P. Zuccolotto, "A bias correction algorithm for the Gini variable importance measure in classification trees," Journal of Computational and Graphical Statistics, vol. 17, no. 3, pp. 611–628, 2008.
[45] J. Brownlee, "Feature importance and feature selection with XGBoost in Python," 2016, https://machinelearningmastery.com.
[46] N. V. Chawla, S. Eschrich, and L. O. Hall, "Creating ensembles of classifiers," in Proceedings of the IEEE International Conference on Data Mining, IEEE Computer Society, San Jose, CA, USA, November-December 2001.
[47] A. K. Jain, M. N. Murty, and P. J. Flynn, "Data clustering: a review," ACM Computing Surveys, vol. 31, no. 3, pp. 264–323, 1999.
[48] A. Nagpal, A. Jatain, and D. Gaur, "Review based on data clustering algorithms," in Proceedings of the IEEE Conference on Information & Communication Technologies, Hainan, China, September 2013.
[49] Y. Wang, X. Ma, Y. Lao, and Y. Wang, "A fuzzy-based customer clustering approach with hierarchical structure for logistics network optimization," Expert Systems with Applications, vol. 41, no. 2, pp. 521–534, 2014.
[50] B. Wang, Y. Miao, H. Zhao, J. Jin, and Y. Chen, "A biclustering-based method for market segmentation using customer pain points," Engineering Applications of Artificial Intelligence, vol. 47, pp. 101–109, 2015.
[51] M. Halkidi, Y. Batistakis, and M. Vazirgiannis, "On clustering validation techniques," Journal of Intelligent Information Systems: Integrating Artificial Intelligence and Database Technologies, vol. 17, no. 2-3, pp. 107–145, 2001.
[52] R. W. Sembiring, J. M. Zain, and A. Embong, "A comparative agglomerative hierarchical clustering method to cluster implemented course," Journal of Computing, vol. 2, no. 12, 2010.
[53] M. Valipour, M. E. Banihabib, and S. M. R. Behbahani, "Comparison of the ARIMA and the auto-regressive artificial neural network models in forecasting the monthly inflow of the dam reservoir," Journal of Hydrology, vol. 476, 2013.
[54] E. Erdem and J. Shi, "ARMA based approaches for forecasting the tuple of wind speed and direction," Applied Energy, vol. 88, no. 4, pp. 1405–1414, 2011.
[55] R. J. Hyndman, "Forecasting functions for time series and linear models," 2019, http://mirror.costar.sfu.ca/mirror/CRAN/web/packages/forecast/index.html.
[56] S. Aishwarya, "Build high-performance time series models using auto ARIMA in Python and R," 2018, https://www.analyticsvidhya.com/blog/2018/08/auto-arima-time-series-modeling-python-r.
[57] J. H. Friedman, "Greedy function approximation: a gradient boosting machine," The Annals of Statistics, vol. 29, no. 5, pp. 1189–1232, 2001.
[58] T. Chen and C. Guestrin, "XGBoost: a scalable tree boosting system," 2016, https://arxiv.org/abs/1603.02754.
[59] A. Gómez-Ríos, J. Luengo, and F. Herrera, "A study on the noise label influence in boosting algorithms: AdaBoost, GBM and XGBoost," in Proceedings of the International Conference on Hybrid Artificial Intelligence Systems, Logroño, Spain, June 2017.
[60] J. Wang, C. Lou, R. Yu, J. Gao, and H. Di, "Research on hot micro-blog forecast based on XGBoost and random forest," in Proceedings of the 11th International Conference on Knowledge Science, Engineering and Management (KSEM 2018), pp. 350–360, Changchun, China, August 2018.
[61] C. Li, X. Zheng, Z. Yang, and L. Kuang, "Predicting short-term electricity demand by combining the advantages of ARMA and XGBoost in fog computing environment," Wireless Communications and Mobile Computing, vol. 2018, Article ID 5018053, 18 pages, 2018.
[62] A. M. Jain, "Complete guide to parameter tuning in XGBoost with codes in Python," 2016, https://www.analyticsvidhya.com/blog/2016/03/complete-guide-parameter-tuning-xgboost-with-codes-python.

Mathematical Problems in Engineering 15

Hindawiwwwhindawicom Volume 2018

MathematicsJournal of

Hindawiwwwhindawicom Volume 2018

Mathematical Problems in Engineering

Applied MathematicsJournal of

Hindawiwwwhindawicom Volume 2018

Probability and StatisticsHindawiwwwhindawicom Volume 2018

Journal of

Hindawiwwwhindawicom Volume 2018

Mathematical PhysicsAdvances in

Complex AnalysisJournal of

Hindawiwwwhindawicom Volume 2018

OptimizationJournal of

Hindawiwwwhindawicom Volume 2018

Hindawiwwwhindawicom Volume 2018

Engineering Mathematics

International Journal of

Hindawiwwwhindawicom Volume 2018

Operations ResearchAdvances in

Journal of

Hindawiwwwhindawicom Volume 2018

Function SpacesAbstract and Applied AnalysisHindawiwwwhindawicom Volume 2018

International Journal of Mathematics and Mathematical Sciences

Hindawiwwwhindawicom Volume 2018

Hindawi Publishing Corporation httpwwwhindawicom Volume 2013Hindawiwwwhindawicom

The Scientific World Journal

Volume 2018

Hindawiwwwhindawicom Volume 2018Volume 2018

Numerical AnalysisNumerical AnalysisNumerical AnalysisNumerical AnalysisNumerical AnalysisNumerical AnalysisNumerical AnalysisNumerical AnalysisNumerical AnalysisNumerical AnalysisNumerical AnalysisNumerical AnalysisAdvances inAdvances in Discrete Dynamics in

Nature and SocietyHindawiwwwhindawicom Volume 2018

Hindawiwwwhindawicom

Dierential EquationsInternational Journal of

Volume 2018

Hindawiwwwhindawicom Volume 2018

Decision SciencesAdvances in

Hindawiwwwhindawicom Volume 2018

AnalysisInternational Journal of

Hindawiwwwhindawicom Volume 2018

Stochastic AnalysisInternational Journal of

Submit your manuscripts atwwwhindawicom

Page 11: AnApplicationofaThree-StageXGBoost-BasedModeltoSales ...downloads.hindawi.com/journals/mpe/2019/8503252.pdf · AnApplicationofaThree-StageXGBoost-BasedModeltoSales ForecastingofaCross-BorderE-CommerceEnterprise

(3) Step 3 Calculate residuals of the optimal ARIMA +eprediction results from the 278th to the 381st day are ob-tained by using the trained ARIMA (2 1 4) denoted asARIMA forecast +en residuals between the predictionvalues ARIMA forecast and the actual values SKU_sales arecalculated denoted as ARIMA residuals

(4) Step 4 Train A-XGBoost by setting ARIMA residuals asthe input and output As shown in equation (14) the outputdata are composed of 8 columns of the matrix R and the

corresponding inputs are the residuals of the last 7 days(from Column 1 to 7)

R

r271 r272 middot middot middot r277 r278

r272 r273 middot middot middot r278 r279

⋮ ⋮ middot middot middot ⋮ ⋮

r339 r340 middot middot middot r345 r346

r340 r341 middot middot middot r346 r347

⎡⎢⎢⎢⎢⎢⎢⎢⎢⎢⎢⎢⎢⎢⎢⎢⎢⎢⎢⎢⎢⎢⎢⎢⎢⎢⎢⎢⎢⎢⎢⎢⎢⎢⎢⎢⎢⎢⎢⎢⎢⎢⎢⎢⎢⎣

⎤⎥⎥⎥⎥⎥⎥⎥⎥⎥⎥⎥⎥⎥⎥⎥⎥⎥⎥⎥⎥⎥⎥⎥⎥⎥⎥⎥⎥⎥⎥⎥⎥⎥⎥⎥⎥⎥⎥⎥⎥⎥⎥⎥⎥⎦

70times8

(14)

Day

SKU_salesdiff

ndash400

0

200

50 100 150 200 2500

(a)

ndash04

ndash02

00

02

Lag

ACF

5 201510

(b)

ndash04

ndash02

00

02

Lag

PACF

2015105

(c)

Figure 10 Plots of (a) SKU sales with days change (b) ACF and (c) PACF after the first-order difference

Day

SKU

_sal

es

50 100 150 200 2500

100

300

0

(a)

02

04

Lag

ACF

00

ndash0215105 20

(b)

Lag

PACF 02

04

00

025 201510

(c)

Figure 9 Plots of (a) SKU sales with days change (b) ACF and (c) PACF

Mathematical Problems in Engineering 11

(5) Step 5 Calculate predicted residuals of the test setusing the trained A-XGBoost in Step 4 denoted asA minus XGBoost_residuals For the test set the result of the 348thday is obtained by the setting ARIMA residuals of the341stndash347th day as the input +en the result of the 349thday can be calculated by inputting ARIMA_residuals of the342ndndash347th day and A minus XGBoost_residuals of the 348thday into the trained A-XGBoost+e processes are repeateduntil the A minus XGBoost_residuals of the 349thndash381st day areobtained

(6) Step 6 Calculate the final prediction results For the testset calculate the final prediction results by summing overthe corresponding values of A minus XGBoost_residuals andARIMA_forecast denoted as A minus XGBoost_forecast +eresults of the A-XGBoost are summarized in Table 10

433 C-A-XGBoost Model +e optimal combinationweights are determined by minimizing the MSE in equation(6)

For the test set the weights wC and wA are obtained basedon the matrix operation equation (13) W (BTB)minus 1BTYwhere wC 1262 and wA 0273

44 Models for Comparison In this section the followingmodels are chosen for the comparison between the proposedmodels and other classical models

ARIMA As one of the common time series model it isused to predict sales of time sequence of which theprocesses are the same as the ARIMA in Section 432XGBoost +e XGBoost model is constructed and op-timized by setting the selected features and the cor-responding SKU sales as input and outputC-XGBoost Taking sales features of commodities intoaccount the XGBoost is used to forecast sales based onthe resulting clusters by the two-step clustering model+e procedures are the same as that in Section 431

A-XGBoost +e A-XGBoost is applied to revising re-siduals of the ARIMA Namely the ARIMA is firstlyused to model the linear part of the time series andthen XGBoost is used to model the nonlinear part +erelevant processes are described in Section 432C-A-XGBoost +e model combines the advantages ofC-XGBoost and A-XGBoost of which the proceduresare displayed in Section 433

45 Results of DifferentModels In this section the test set isused to verify the superiority of the proposed C-A-XGBoost

Figure 11 shows the curve of actual values SKU_sales andfive fitting curves of predicted values from the 348th day tothe 381st day which is obtained by the ARIMA XGBoostC-XGBoost A-XGBoost and C-A-XGBoost

It can be seen that C-A-XGBoost has the best fittingperformance to the original value as its fitting curve is themost similar in five fitting curves to the curve of actual valuesSKU sales

To further illustrate the superiority of the proposed C-A-XGBoost the evaluation indexes mentioned in Section 422are applied to distinguishing the best model of the salesforecast Table 11 provides a comparative summary of theindexes for the five models in Section 44

According to Table 11 it can be concluded that thesuperiority of the proposed C-A-XGBoost is distinct com-pared with the other models as its evaluation indexes areminimized

C-XGBoost is inferior to C-A-XGBoost but outperformsthe other three models underlining that C-XGBoost issuperior to the single XGBoost

A-XGBoost has a superior performance relative toARIMA proving that XGBoost is effective for residualmodification of ARIMA

According to the analysis above the proposed C-A-XGBoost has the best forecasting performance for sales ofcommodities in the cross-border e-commerce enterprise

Table 10 +e performance evaluation of A-XGBoost

A-XGBoost Validation set Test setMinimum error minus 0003 minus 8151Maximum error 0002 23482Mean error 0000 1213Mean absolute error 0001 4566Standard deviation 0001 6262Linear correlation 1 minus 0154Occurrences 70 34

Table 8 AIC values of the resulting ARIMA by autoairma ( ) function

ARIMA (p d q) AIC ARIMA (p d q) AICARIMA (2 1 2) with drift 2854317 ARIMA (0 1 2) with drift 2852403ARIMA (0 1 0) with drift 2983036 ARIMA (1 1 2) with drift 2852172ARIMA (1 1 0) with drift 2927344 ARIMA (0 1 1) 2850212ARIMA (0 1 1) with drift 2851296 ARIMA (1 1 1) 2851586ARIMA (0 1 0) 2981024 ARIMA (0 1 2) 2851460ARIMA (1 1 1) with drift 2852543 ARIMA (1 1 2) 2851120

Table 9 AIC values and RMSE of ARIMA models under differentparameters

ARIMA model ARIMA (p d q) AIC RMSE1 ARIMA (0 1 1) 2850170 418142 ARIMA (2 1 2) 2852980 415723 ARIMA (2 1 3) 2854940 415674 ARIMA (2 1 4) 2848850 40893

12 Mathematical Problems in Engineering

5 Conclusions and Future Directions

In this research a new XGBoost-based forecasting modelnamed C-A-XGBoost is proposed which takes the salesfeatures and tendency of data series into account

+e C-XGBoost is first presented combining the clus-tering and XGBoost aiming at reflecting sales features ofcommodities into forecasting +e two-step clustering al-gorithm is applied to partitioning data series into differentclusters based on selected features which are used as theinfluencing factors for forecasting After that the corre-sponding C-XGBoost models are established for differentclusters using the XGBoost

+e proposed A-XGBoost takes the advantages of theARIMA in predicting the tendency of data series andovercomes the disadvantages of the ARIMA by applying theXGBoost to dealing with the nonlinear part of the data series+e optimal ARIMA is obtained in comparison of AICsunder different parameters and then the trained ARIMAmodel is used to predict the linear part of the data series Fornonlinear part of data series the rolling prediction is con-ducted by the trained XGBoost of which the input andoutput are the resulting residuals by the ARIMA +e finalresults of the A-XGBoost are calculated by adding thepredicted residuals by the XGBoost to the correspondingforecast values by the ARIMA

In conclusion the C-A-XGBoost is developed byassigning appropriate weights to the forecasting results ofthe C-XGBoost and A-XGBoost so as to take their respectivestrengths Consequently a linear combination of the two

modelsrsquo forecasting results is calculated as the final pre-dictive values

To verify the effectiveness of the proposed C-A-XG-Boost the ARIMA XGBoost C-XGBoost and A-XGBoostare employed for comparison Meanwhile four commonevaluation indexes including ME MSE RMSE and MAEare utilized to check the forecasting performance of C-A-XGBoost +e experiment demonstrates that the C-A-XGBoost outperforms other models indicating that C-A-XGBoost has provided theoretical support for sales forecastof the e-commerce company and can serve as a referencefor selecting forecasting models It is advisable for thee-commerce company to choose different forecastingmodels for different commodities instead of utilizing asingle model

+e two potential extensions are put forward for futureresearch On the one hand owing to the fact that there maybe no model in which all evaluation indicators are minimalwhich leads to the difficulty in choosing the optimal model+erefore a comprehensive evaluation index of forecastingperformance will be constructed to overcome the difficultyOn the other hand sales forecasting is actually used tooptimize inventory management so some relevant factorsshould be considered including inventory cost order leadtime delivery time and transportation time

Data Availability

+e data used to support the findings of this study areavailable from the corresponding author upon request

0

10

20

30

380370360350Day

SKU

_sal

es

SKU_salesARIMAXGBoost

C-XGBoostA-XGBoostC-A-XGBoost

Figure 11 Comparison of the SKU sales with the predicted values of five models in Section 44 +e x-axis represents the day +e y-axisrepresents the sales of SKU +e curves with different colors represent different models

Table 11 +e performance evaluation of ARIMA XGBoost A-XGBoost C-XGBoost and C-A-XGBoost

Evaluation indexes ARIMA XGBoost A-XGBoost C-XGBoost C-A-XGBoostME minus 21346 minus 3588 1213 3647 0288MSE 480980 36588 39532 23353 10769RMSE 21931 6049 6287 4832 3282MAE 21346 5059 4566 3765 2515

Mathematical Problems in Engineering 13

Conflicts of Interest

+e authors declare that there are no conflicts of interestregarding the publication of this paper

Acknowledgments

+is research was supported by the National Key RampDProgram of China through the China Development ResearchFoundation (CDRF) funded by the Ministry of Science andTechnology (CDRF-SQ2017YFGH002106)

References

[1] Y Jin Data Science in Supply Chain Management Data-Related Influences on Demand Planning Proquest Llc AnnArbor MI USA 2013

[2] S Akter and S F Wamba ldquoBig data analytics in e-commercea systematic review and agenda for future researchrdquo ElectronicMarkets vol 26 no 2 pp 173ndash194 2016

[3] J L Castle M P Clements and D F Hendry ldquoForecasting byfactors by variables by both or neitherrdquo Journal ofEconometrics vol 177 no 2 pp 305ndash319 2013

[4] A Kawa ldquoSupply chains of cross-border e-commercerdquo inProceedings of the Advanced Topics in Intelligent Informationand Database Systems Springer International PublishingKanazawa Japan April 2017

[5] L Song T Lv X Chen and J Gao ldquoArchitecture of demandforecast for online retailers in China based on big datardquo inProceedings of the International Conference on Human-Cen-tered Computing Springer Colombo Sri Lanka January 2016

[6] G Iman A Ehsan R W Gary and A Y William ldquoAnoverview of energy demand forecasting methods published in2005ndash2015rdquo Energy Systems vol 8 no 2 pp 411ndash447 2016

[7] S Gmbh Forecasting with Exponential Smoothing vol 26no 1 Springer Berlin Germany 2008

[8] G E P Box and G M Jenkins ldquoTime series analysis fore-casting and controlrdquo Journal of Time vol 31 no 4 303 pages2010

[9] G P Zhang ldquoTime series forecasting using a hybrid ARIMAand neural network modelrdquo Neurocomputing vol 50pp 159ndash175 2003

[10] S Ma R Fildes and T Huang ldquoDemand forecasting withhigh dimensional data the case of SKU retail sales forecastingwith intra- and inter-category promotional informationrdquoEuropean Journal of Operational Research vol 249 no 1pp 245ndash257 2015

[11] O Gur Ali S SayIn T van Woensel and J Fransoo ldquoSKUdemand forecasting in the presence of promotionsrdquo ExpertSystems with Applications vol 36 no 10 pp 12340ndash123482009

[12] T Huang R Fildes and D Soopramanien ldquo+e value ofcompetitive information in forecasting FMCG retail productsales and the variable selection problemrdquo European Journal ofOperational Research vol 237 no 2 pp 738ndash748 2014

[13] F Cady ldquoMachine learning overviewrdquo in Ae Data ScienceHandbook Wiley Hoboken NJ USA 2017

[14] N K Ahmed A F Atiya N E Gayar and H El-ShishinyldquoAn empirical comparison of machine learning models fortime series forecastingrdquo Econometric Reviews vol 29 no 5-6pp 594ndash621 2010

[15] A P Ansuj M E Camargo R Radharamanan andD G Petry ldquoSales forecasting using time series and neural

networksrdquo Computers amp Industrial Engineering vol 31no 1-2 pp 421ndash424 1996

[16] I Alon M Qi and R J Sadowski ldquoForecasting aggregateretail sales a comparison of artificial neural networks andtraditional methodsrdquo Journal of Retailing and ConsumerServices vol 8 no 3 pp 147ndash156 2001

[17] G Di Pillo V Latorre S Lucidi and E Procacci ldquoAn ap-plication of support vector machines to sales forecastingunder promotionsrdquo 4OR vol 14 no 3 pp 309ndash325 2016

[18] L Wang H Zou J Su L Li and S Chaudhry ldquoAn ARIMA-ANN hybrid model for time series forecastingrdquo SystemsResearch and Behavioral Science vol 30 no 3 pp 244ndash2592013

[19] S Ji H Yu Y Guo and Z Zhang ldquoResearch on salesforecasting based on ARIMA and BP neural network com-bined modelrdquo in Proceedings of the International Conferenceon Intelligent Information Processing ACM Wuhan ChinaDecember 2016

[20] J Y Choi and B Lee ldquoCombining LSTM network ensemblevia adaptive weighting for improved time series forecastingrdquoMathematical Problems in Engineering vol 2018 Article ID2470171 8 pages 2018

[21] K Zhao and CWang ldquoSales forecast in e-commerce using theconvolutional neural networkrdquo 2017 httpsarxivorgabs170807946

[22] K Bandara P Shi C Bergmeir H Hewamalage Q Tran andB Seaman ldquoSales demand forecast in e-commerce using along short-termmemory neural network methodologyrdquo 2019httpsarxivorgabs190104028

[23] X Luo C Jiang W Wang Y Xu J-H Wang and W ZhaoldquoUser behavior prediction in social networks using weightedextreme learning machine with distribution optimizationrdquoFuture Generation Computer Systems vol 93 pp 1023ndash10352018

[24] L Xiong S Jiankun W Long et al ldquoShort-term wind speedforecasting via stacked extreme learning machine with gen-eralized correntropyrdquo IEEE Transactions on Industrial In-formatics vol 14 no 11 pp 4963ndash4971 2018

[25] J L Zhao H Zhu and S Zheng ldquoWhat is the value of anonline retailer sharing demand forecast informationrdquo SoftComputing vol 22 no 16 pp 5419ndash5428 2018

[26] G Kulkarni P K Kannan andW Moe ldquoUsing online searchdata to forecast new product salesrdquo Decision Support Systemsvol 52 no 3 pp 604ndash611 2012

[27] A Roy ldquoA novel multivariate fuzzy time series based fore-casting algorithm incorporating the effect of clustering onpredictionrdquo Soft Computing vol 20 no 5 pp 1991ndash20192016

[28] R J Kuo and K C Xue ldquoA decision support system for salesforecasting through fuzzy neural networks with asymmetricfuzzy weightsrdquo Decision Support Systems vol 24 no 2pp 105ndash126 1998

[29] P-C Chang C-H Liu and C-Y Fan ldquoData clustering andfuzzy neural network for sales forecasting a case study inprinted circuit board industryrdquo Knowledge-Based Systemsvol 22 no 5 pp 344ndash355 2009

[30] C-J Lu and Y-W Wang ldquoCombining independent com-ponent analysis and growing hierarchical self-organizingmaps with support vector regression in product demandforecastingrdquo International Journal of Production Economicsvol 128 no 2 pp 603ndash613 2010

[31] C-J Lu and L-J Kao ldquoA clustering-based sales forecastingscheme by using extreme learning machine and ensemblinglinkage methods with applications to computer serverrdquo

14 Mathematical Problems in Engineering

Engineering Applications of Artificial Intelligence vol 55pp 231ndash238 2016

[32] W Dai Y-Y Chuang and C-J Lu ldquoA clustering-based salesforecasting scheme using support vector regression forcomputer serverrdquo Procedia Manufacturing vol 2 pp 82ndash862015

[33] I F Chen and C J Lu ldquoSales forecasting by combiningclustering and machine-learning techniques for computerretailingrdquo Neural Computing and Applications vol 28 no 9pp 2633ndash2647 2016

[34] T Chiu D P Fang J Chen Y Wang and C Jeris ldquoA robustand scalable clustering algorithm for mixed type attributes ina large database environmentrdquo in Proceedings of the SeventhACM SIGKDD International Conference on Knowledge Dis-covery and Data Mining August 2001

[35] T Zhang R Ramakrishnan and M Livny ldquoBirch a new dataclustering algorithm and its applicationsrdquo Data Mining andKnowledge Discovery vol 1 no 2 pp 141ndash182 1997

[36] L Li R Situ J Gao Z Yang and W Liu ldquoA hybrid modelcombining convolutional neural network with XGBoost forpredicting social media popularityrdquo in Proceedings of the 2017ACM on Multimedia ConferencemdashMM rsquo17 ACM MountainView CA USA October 2017

[37] J Ke H Zheng H Yang and X Chen ldquoShort-term fore-casting of passenger demand under on-demand ride servicesa spatio-temporal deep learning approachrdquo TransportationResearch Part C Emerging Technologies vol 85 pp 591ndash6082017

[38] K Shimada ldquoCustomer value creation in the informationexplosion erardquo in Proceedings of the 2014 Symposium on VLSITechnology IEEE Honolulu HI USA June 2014

[39] H A Abdelhafez ldquoBig data analytics trends and case studiesrdquoin Encyclopedia of Business Analytics amp Optimization Asso-ciation for ComputingMachinery New York NY USA 2014

[40] K Kira and L A Rendell ldquoA practical approach to featureselectionrdquo Machine Learning Proceedings vol 48 no 1pp 249ndash256 1992

[41] T M Khoshgoftaar K Gao and L A Bullard ldquoA com-parative study of filter-based and wrapper-based featureranking techniques for software quality modelingrdquo In-ternational Journal of Reliability Quality and Safety Engi-neering vol 18 no 4 pp 341ndash364 2011

[42] M A Hall and L A Smith ldquoFeature selection for machinelearning comparing a correlation-based filter approach to thewrapperrdquo in Proceedings of the Twelfth International FloridaArtificial Intelligence Research Society Conference DBLPOrlando FL USA May 1999

[43] V A Huynh-+u Y Saeys L Wehenkel and P GeurtsldquoStatistical interpretation of machine learning-based featureimportance scores for biomarker discoveryrdquo Bioinformaticsvol 28 no 13 pp 1766ndash1774 2012

[44] M Sandri and P Zuccolotto ldquoA bias correction algorithm forthe Gini variable importance measure in classification treesrdquoJournal of Computational and Graphical Statistics vol 17no 3 pp 611ndash628 2008

[45] J Brownlee ldquoFeature importance and feature selection withxgboost in pythonrdquo 2016 httpsmachinelearningmasterycom

[46] N V Chawla S Eschrich and L O Hall ldquoCreating ensemblesof classifiersrdquo in Proceedings of the IEEE International Con-ference on Data Mining IEEE Computer Society San JoseCA USA November-December 2001

[47] A K Jain M N Murty and P J Flynn ldquoData clustering areviewrdquo ACM Computing Surveys vol 31 no 3 pp 264ndash3231999

[48] A Nagpal A Jatain and D Gaur ldquoReview based on dataclustering algorithmsrdquo in Proceedings of the IEEE Conferenceon Information amp Communication Technologies HainanChina September 2013

[49] Y Wang X Ma Y Lao and Y Wang ldquoA fuzzy-basedcustomer clustering approach with hierarchical structure forlogistics network optimizationrdquo Expert Systems with Appli-cations vol 41 no 2 pp 521ndash534 2014

[50] B Wang Y Miao H Zhao J Jin and Y Chen ldquoA biclus-tering-based method for market segmentation using customerpain pointsrdquo Engineering Applications of Artificial Intelligencevol 47 pp 101ndash109 2015

[51] M Halkidi Y Batistakis and M Vazirgiannis ldquoOn clusteringvalidation techniquesrdquo Journal of Intelligent InformationSystems Integrating Artificial Intelligence and DatabaseTechnologies vol 17 no 2-3 pp 107ndash145 2001

[52] R W Sembiring J M Zain and A Embong ldquoA comparativeagglomerative hierarchical clustering method to clusterimplemented courserdquo Journal of Computing vol 2 no 122010

[53] M Valipour M E Banihabib and S M R BehbahanildquoComparison of the ARIMA and the auto-regressive artificialneural network models in forecasting the monthly inflow ofthe dam reservoirrdquo Journal of Hydrology vol 476 2013

[54] E Erdem and J Shi ldquoArma based approaches for forecastingthe tuple of wind speed and directionrdquo Applied Energyvol 88 no 4 pp 1405ndash1414 2011

[55] R J Hyndman ldquoForecasting functions for time series and lin-ear modelsrdquo 2019 httpmirrorcostarsfucamirrorCRANwebpackagesforecastindexhtml

[56] S Aishwarya ldquoBuild high-performance time series modelsusing auto ARIMA in Python and Rrdquo 2018 httpswwwanalyticsvidhyacomblog201808auto-arima-time-series-modeling-python-r

[57] J H Friedman ldquoMachinerdquo Ae Annals of Statistics vol 29no 5 pp 1189ndash1232 2001

[58] T Chen and C Guestrin ldquoXgboost a scalable tree boostingsystemrdquo 2016 httpsarxivorgabs160302754

[59] A Gomez-Rıos J Luengo and F Herrera ldquoA study on thenoise label influence in boosting algorithms AdaBoost Gbmand XGBoostrdquo in Proceedings of the International Conferenceon Hybrid Artificial Intelligence Systems Logrontildeo Spain June2017

[60] J Wang C Lou R Yu J Gao and H Di ldquoResearch on hotmicro-blog forecast based on XGBOOSTand random forestrdquoin Proceedings of the 11th International Conference onKnowledge Science Engineering andManagement KSEM 2018pp 350ndash360 Changchun China August 2018

[61] C Li X Zheng Z Yang and L Kuang ldquoPredicting short-term electricity demand by combining the advantages ofARMA and XGBoost in fog computing environmentrdquoWireless Communications and Mobile Computing vol 2018Article ID 5018053 18 pages 2018

[62] A M Jain ldquoComplete guide to parameter tuning in XGBoostwith codes in Pythonrdquo 2016 httpswwwanalyticsvidhyacomblog201603complete-guide-parameter-tuning-xgboost-with-codes-python

Mathematical Problems in Engineering 15

Hindawiwwwhindawicom Volume 2018

MathematicsJournal of

Hindawiwwwhindawicom Volume 2018

Mathematical Problems in Engineering

Applied MathematicsJournal of

Hindawiwwwhindawicom Volume 2018

Probability and StatisticsHindawiwwwhindawicom Volume 2018

Journal of

Hindawiwwwhindawicom Volume 2018

Mathematical PhysicsAdvances in

Complex AnalysisJournal of

Hindawiwwwhindawicom Volume 2018

OptimizationJournal of

Hindawiwwwhindawicom Volume 2018

Hindawiwwwhindawicom Volume 2018

Engineering Mathematics

International Journal of

Hindawiwwwhindawicom Volume 2018

Operations ResearchAdvances in

Journal of

Hindawiwwwhindawicom Volume 2018

Function SpacesAbstract and Applied AnalysisHindawiwwwhindawicom Volume 2018

International Journal of Mathematics and Mathematical Sciences

Hindawiwwwhindawicom Volume 2018

Hindawi Publishing Corporation httpwwwhindawicom Volume 2013Hindawiwwwhindawicom

The Scientific World Journal

Volume 2018

Hindawiwwwhindawicom Volume 2018Volume 2018

Numerical AnalysisNumerical AnalysisNumerical AnalysisNumerical AnalysisNumerical AnalysisNumerical AnalysisNumerical AnalysisNumerical AnalysisNumerical AnalysisNumerical AnalysisNumerical AnalysisNumerical AnalysisAdvances inAdvances in Discrete Dynamics in

Nature and SocietyHindawiwwwhindawicom Volume 2018

Hindawiwwwhindawicom

Dierential EquationsInternational Journal of

Volume 2018

Hindawiwwwhindawicom Volume 2018

Decision SciencesAdvances in

Hindawiwwwhindawicom Volume 2018

AnalysisInternational Journal of

Hindawiwwwhindawicom Volume 2018

Stochastic AnalysisInternational Journal of

Submit your manuscripts atwwwhindawicom

Page 12: AnApplicationofaThree-StageXGBoost-BasedModeltoSales ...downloads.hindawi.com/journals/mpe/2019/8503252.pdf · AnApplicationofaThree-StageXGBoost-BasedModeltoSales ForecastingofaCross-BorderE-CommerceEnterprise

(5) Step 5 Calculate predicted residuals of the test setusing the trained A-XGBoost in Step 4 denoted asA minus XGBoost_residuals For the test set the result of the 348thday is obtained by the setting ARIMA residuals of the341stndash347th day as the input +en the result of the 349thday can be calculated by inputting ARIMA_residuals of the342ndndash347th day and A minus XGBoost_residuals of the 348thday into the trained A-XGBoost+e processes are repeateduntil the A minus XGBoost_residuals of the 349thndash381st day areobtained

(6) Step 6 Calculate the final prediction results For the testset calculate the final prediction results by summing overthe corresponding values of A minus XGBoost_residuals andARIMA_forecast denoted as A minus XGBoost_forecast +eresults of the A-XGBoost are summarized in Table 10

433 C-A-XGBoost Model +e optimal combinationweights are determined by minimizing the MSE in equation(6)

For the test set the weights wC and wA are obtained basedon the matrix operation equation (13) W (BTB)minus 1BTYwhere wC 1262 and wA 0273

44 Models for Comparison In this section the followingmodels are chosen for the comparison between the proposedmodels and other classical models

ARIMA As one of the common time series model it isused to predict sales of time sequence of which theprocesses are the same as the ARIMA in Section 432XGBoost +e XGBoost model is constructed and op-timized by setting the selected features and the cor-responding SKU sales as input and outputC-XGBoost Taking sales features of commodities intoaccount the XGBoost is used to forecast sales based onthe resulting clusters by the two-step clustering model+e procedures are the same as that in Section 431

A-XGBoost +e A-XGBoost is applied to revising re-siduals of the ARIMA Namely the ARIMA is firstlyused to model the linear part of the time series andthen XGBoost is used to model the nonlinear part +erelevant processes are described in Section 432C-A-XGBoost +e model combines the advantages ofC-XGBoost and A-XGBoost of which the proceduresare displayed in Section 433

45 Results of DifferentModels In this section the test set isused to verify the superiority of the proposed C-A-XGBoost

Figure 11 shows the curve of actual values SKU_sales andfive fitting curves of predicted values from the 348th day tothe 381st day which is obtained by the ARIMA XGBoostC-XGBoost A-XGBoost and C-A-XGBoost

It can be seen that C-A-XGBoost has the best fittingperformance to the original value as its fitting curve is themost similar in five fitting curves to the curve of actual valuesSKU sales

To further illustrate the superiority of the proposed C-A-XGBoost the evaluation indexes mentioned in Section 422are applied to distinguishing the best model of the salesforecast Table 11 provides a comparative summary of theindexes for the five models in Section 44

According to Table 11 it can be concluded that thesuperiority of the proposed C-A-XGBoost is distinct com-pared with the other models as its evaluation indexes areminimized

C-XGBoost is inferior to C-A-XGBoost but outperformsthe other three models underlining that C-XGBoost issuperior to the single XGBoost

A-XGBoost has a superior performance relative toARIMA proving that XGBoost is effective for residualmodification of ARIMA

According to the analysis above the proposed C-A-XGBoost has the best forecasting performance for sales ofcommodities in the cross-border e-commerce enterprise

Table 10 +e performance evaluation of A-XGBoost

A-XGBoost Validation set Test setMinimum error minus 0003 minus 8151Maximum error 0002 23482Mean error 0000 1213Mean absolute error 0001 4566Standard deviation 0001 6262Linear correlation 1 minus 0154Occurrences 70 34

Table 8 AIC values of the resulting ARIMA by autoairma ( ) function

ARIMA (p d q) AIC ARIMA (p d q) AICARIMA (2 1 2) with drift 2854317 ARIMA (0 1 2) with drift 2852403ARIMA (0 1 0) with drift 2983036 ARIMA (1 1 2) with drift 2852172ARIMA (1 1 0) with drift 2927344 ARIMA (0 1 1) 2850212ARIMA (0 1 1) with drift 2851296 ARIMA (1 1 1) 2851586ARIMA (0 1 0) 2981024 ARIMA (0 1 2) 2851460ARIMA (1 1 1) with drift 2852543 ARIMA (1 1 2) 2851120

Table 9 AIC values and RMSE of ARIMA models under differentparameters

ARIMA model ARIMA (p d q) AIC RMSE1 ARIMA (0 1 1) 2850170 418142 ARIMA (2 1 2) 2852980 415723 ARIMA (2 1 3) 2854940 415674 ARIMA (2 1 4) 2848850 40893

12 Mathematical Problems in Engineering

5 Conclusions and Future Directions

In this research a new XGBoost-based forecasting modelnamed C-A-XGBoost is proposed which takes the salesfeatures and tendency of data series into account

+e C-XGBoost is first presented combining the clus-tering and XGBoost aiming at reflecting sales features ofcommodities into forecasting +e two-step clustering al-gorithm is applied to partitioning data series into differentclusters based on selected features which are used as theinfluencing factors for forecasting After that the corre-sponding C-XGBoost models are established for differentclusters using the XGBoost

+e proposed A-XGBoost takes the advantages of theARIMA in predicting the tendency of data series andovercomes the disadvantages of the ARIMA by applying theXGBoost to dealing with the nonlinear part of the data series+e optimal ARIMA is obtained in comparison of AICsunder different parameters and then the trained ARIMAmodel is used to predict the linear part of the data series Fornonlinear part of data series the rolling prediction is con-ducted by the trained XGBoost of which the input andoutput are the resulting residuals by the ARIMA +e finalresults of the A-XGBoost are calculated by adding thepredicted residuals by the XGBoost to the correspondingforecast values by the ARIMA

In conclusion, the C-A-XGBoost is developed by assigning appropriate weights to the forecasting results of the C-XGBoost and A-XGBoost so as to exploit their respective strengths; a linear combination of the two models' forecasting results is calculated as the final predictive values.
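
One simple way to realize such a weighted combination is a convex-weight search on a validation window; the grid search below is an illustrative stand-in for the paper's weight-assignment step, not its exact procedure.

    # Sketch: pick the convex weight w minimising validation MSE and blend
    # the C-XGBoost and A-XGBoost forecasts into the C-A-XGBoost forecast.
    import numpy as np

    def combine(c_fc, a_fc, actual):
        """Return the weight w in [0, 1] with smallest MSE and the blended forecast."""
        ws = np.linspace(0.0, 1.0, 101)              # assumed weight grid
        errs = [np.mean((w * c_fc + (1 - w) * a_fc - actual) ** 2) for w in ws]
        w = ws[int(np.argmin(errs))]
        return w, w * c_fc + (1 - w) * a_fc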

To verify the effectiveness of the proposed C-A-XGBoost, the ARIMA, XGBoost, C-XGBoost, and A-XGBoost are employed for comparison. Meanwhile, four common evaluation indexes, ME, MSE, RMSE, and MAE, are utilized to check the forecasting performance of the C-A-XGBoost. The experiment demonstrates that the C-A-XGBoost outperforms the other models, indicating that it provides theoretical support for the sales forecasts of the e-commerce company and can serve as a reference for selecting forecasting models. It is advisable for the e-commerce company to choose different forecasting models for different commodities instead of utilizing a single model.
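
For reference, the four indexes can be written out explicitly, so the Table 11 figures can be reproduced from any model's test-set forecasts; the sign convention of the error term is an assumption.

    # Sketch: the four evaluation indexes (ME, MSE, RMSE, MAE).
    import numpy as np

    def evaluate(actual, predicted):
        e = np.asarray(predicted) - np.asarray(actual)  # error sign assumed
        return {
            "ME": e.mean(),                    # mean error (bias)
            "MSE": np.mean(e ** 2),            # mean squared error
            "RMSE": np.sqrt(np.mean(e ** 2)),  # root mean squared error
            "MAE": np.abs(e).mean(),           # mean absolute error
        }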

Two potential extensions are put forward for future research. On the one hand, there may be no model in which all evaluation indicators are minimal, which makes it difficult to choose the optimal model; therefore, a comprehensive evaluation index of forecasting performance will be constructed to overcome this difficulty. On the other hand, sales forecasting is ultimately used to optimize inventory management, so relevant factors should also be considered, including inventory cost, order lead time, delivery time, and transportation time.

Data Availability

The data used to support the findings of this study are available from the corresponding author upon request.


Figure 11. Comparison of the SKU sales with the predicted values of the five models in Section 4.4. The x-axis represents the day; the y-axis represents the sales of the SKU; the curves with different colors represent different models.

Table 11. The performance evaluation of ARIMA, XGBoost, A-XGBoost, C-XGBoost, and C-A-XGBoost.

Evaluation index    ARIMA       XGBoost    A-XGBoost    C-XGBoost    C-A-XGBoost
ME                  -21.346     -3.588      1.213        3.647        0.288
MSE                 480.980     36.588     39.532       23.353       10.769
RMSE                 21.931      6.049      6.287        4.832        3.282
MAE                  21.346      5.059      4.566        3.765        2.515


Conflicts of Interest

The authors declare that there are no conflicts of interest regarding the publication of this paper.

Acknowledgments

This research was supported by the National Key R&D Program of China through the China Development Research Foundation (CDRF), funded by the Ministry of Science and Technology (CDRF-SQ2017YFGH002106).

References

[1] Y. Jin, Data Science in Supply Chain Management: Data-Related Influences on Demand Planning, ProQuest LLC, Ann Arbor, MI, USA, 2013.
[2] S. Akter and S. F. Wamba, "Big data analytics in e-commerce: a systematic review and agenda for future research," Electronic Markets, vol. 26, no. 2, pp. 173–194, 2016.
[3] J. L. Castle, M. P. Clements, and D. F. Hendry, "Forecasting by factors, by variables, by both or neither?," Journal of Econometrics, vol. 177, no. 2, pp. 305–319, 2013.
[4] A. Kawa, "Supply chains of cross-border e-commerce," in Proceedings of the Advanced Topics in Intelligent Information and Database Systems, Springer International Publishing, Kanazawa, Japan, April 2017.
[5] L. Song, T. Lv, X. Chen, and J. Gao, "Architecture of demand forecast for online retailers in China based on big data," in Proceedings of the International Conference on Human-Centered Computing, Springer, Colombo, Sri Lanka, January 2016.
[6] G. Iman, A. Ehsan, R. W. Gary, and A. Y. William, "An overview of energy demand forecasting methods published in 2005–2015," Energy Systems, vol. 8, no. 2, pp. 411–447, 2016.
[7] S. Gmbh, Forecasting with Exponential Smoothing, vol. 26, no. 1, Springer, Berlin, Germany, 2008.
[8] G. E. P. Box and G. M. Jenkins, "Time series analysis: forecasting and control," Journal of Time, vol. 31, no. 4, 303 pages, 2010.
[9] G. P. Zhang, "Time series forecasting using a hybrid ARIMA and neural network model," Neurocomputing, vol. 50, pp. 159–175, 2003.
[10] S. Ma, R. Fildes, and T. Huang, "Demand forecasting with high dimensional data: the case of SKU retail sales forecasting with intra- and inter-category promotional information," European Journal of Operational Research, vol. 249, no. 1, pp. 245–257, 2015.
[11] O. Gur Ali, S. Sayın, T. van Woensel, and J. Fransoo, "SKU demand forecasting in the presence of promotions," Expert Systems with Applications, vol. 36, no. 10, pp. 12340–12348, 2009.
[12] T. Huang, R. Fildes, and D. Soopramanien, "The value of competitive information in forecasting FMCG retail product sales and the variable selection problem," European Journal of Operational Research, vol. 237, no. 2, pp. 738–748, 2014.
[13] F. Cady, "Machine learning overview," in The Data Science Handbook, Wiley, Hoboken, NJ, USA, 2017.
[14] N. K. Ahmed, A. F. Atiya, N. E. Gayar, and H. El-Shishiny, "An empirical comparison of machine learning models for time series forecasting," Econometric Reviews, vol. 29, no. 5-6, pp. 594–621, 2010.
[15] A. P. Ansuj, M. E. Camargo, R. Radharamanan, and D. G. Petry, "Sales forecasting using time series and neural networks," Computers & Industrial Engineering, vol. 31, no. 1-2, pp. 421–424, 1996.
[16] I. Alon, M. Qi, and R. J. Sadowski, "Forecasting aggregate retail sales: a comparison of artificial neural networks and traditional methods," Journal of Retailing and Consumer Services, vol. 8, no. 3, pp. 147–156, 2001.
[17] G. Di Pillo, V. Latorre, S. Lucidi, and E. Procacci, "An application of support vector machines to sales forecasting under promotions," 4OR, vol. 14, no. 3, pp. 309–325, 2016.
[18] L. Wang, H. Zou, J. Su, L. Li, and S. Chaudhry, "An ARIMA-ANN hybrid model for time series forecasting," Systems Research and Behavioral Science, vol. 30, no. 3, pp. 244–259, 2013.
[19] S. Ji, H. Yu, Y. Guo, and Z. Zhang, "Research on sales forecasting based on ARIMA and BP neural network combined model," in Proceedings of the International Conference on Intelligent Information Processing, ACM, Wuhan, China, December 2016.
[20] J. Y. Choi and B. Lee, "Combining LSTM network ensemble via adaptive weighting for improved time series forecasting," Mathematical Problems in Engineering, vol. 2018, Article ID 2470171, 8 pages, 2018.
[21] K. Zhao and C. Wang, "Sales forecast in e-commerce using the convolutional neural network," 2017, https://arxiv.org/abs/1708.07946.
[22] K. Bandara, P. Shi, C. Bergmeir, H. Hewamalage, Q. Tran, and B. Seaman, "Sales demand forecast in e-commerce using a long short-term memory neural network methodology," 2019, https://arxiv.org/abs/1901.04028.
[23] X. Luo, C. Jiang, W. Wang, Y. Xu, J.-H. Wang, and W. Zhao, "User behavior prediction in social networks using weighted extreme learning machine with distribution optimization," Future Generation Computer Systems, vol. 93, pp. 1023–1035, 2018.
[24] L. Xiong, S. Jiankun, W. Long et al., "Short-term wind speed forecasting via stacked extreme learning machine with generalized correntropy," IEEE Transactions on Industrial Informatics, vol. 14, no. 11, pp. 4963–4971, 2018.
[25] J. L. Zhao, H. Zhu, and S. Zheng, "What is the value of an online retailer sharing demand forecast information?," Soft Computing, vol. 22, no. 16, pp. 5419–5428, 2018.
[26] G. Kulkarni, P. K. Kannan, and W. Moe, "Using online search data to forecast new product sales," Decision Support Systems, vol. 52, no. 3, pp. 604–611, 2012.
[27] A. Roy, "A novel multivariate fuzzy time series based forecasting algorithm incorporating the effect of clustering on prediction," Soft Computing, vol. 20, no. 5, pp. 1991–2019, 2016.
[28] R. J. Kuo and K. C. Xue, "A decision support system for sales forecasting through fuzzy neural networks with asymmetric fuzzy weights," Decision Support Systems, vol. 24, no. 2, pp. 105–126, 1998.
[29] P.-C. Chang, C.-H. Liu, and C.-Y. Fan, "Data clustering and fuzzy neural network for sales forecasting: a case study in printed circuit board industry," Knowledge-Based Systems, vol. 22, no. 5, pp. 344–355, 2009.
[30] C.-J. Lu and Y.-W. Wang, "Combining independent component analysis and growing hierarchical self-organizing maps with support vector regression in product demand forecasting," International Journal of Production Economics, vol. 128, no. 2, pp. 603–613, 2010.
[31] C.-J. Lu and L.-J. Kao, "A clustering-based sales forecasting scheme by using extreme learning machine and ensembling linkage methods with applications to computer server," Engineering Applications of Artificial Intelligence, vol. 55, pp. 231–238, 2016.
[32] W. Dai, Y.-Y. Chuang, and C.-J. Lu, "A clustering-based sales forecasting scheme using support vector regression for computer server," Procedia Manufacturing, vol. 2, pp. 82–86, 2015.
[33] I. F. Chen and C. J. Lu, "Sales forecasting by combining clustering and machine-learning techniques for computer retailing," Neural Computing and Applications, vol. 28, no. 9, pp. 2633–2647, 2016.
[34] T. Chiu, D. P. Fang, J. Chen, Y. Wang, and C. Jeris, "A robust and scalable clustering algorithm for mixed type attributes in a large database environment," in Proceedings of the Seventh ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, August 2001.
[35] T. Zhang, R. Ramakrishnan, and M. Livny, "Birch: a new data clustering algorithm and its applications," Data Mining and Knowledge Discovery, vol. 1, no. 2, pp. 141–182, 1997.
[36] L. Li, R. Situ, J. Gao, Z. Yang, and W. Liu, "A hybrid model combining convolutional neural network with XGBoost for predicting social media popularity," in Proceedings of the 2017 ACM on Multimedia Conference (MM '17), ACM, Mountain View, CA, USA, October 2017.
[37] J. Ke, H. Zheng, H. Yang, and X. Chen, "Short-term forecasting of passenger demand under on-demand ride services: a spatio-temporal deep learning approach," Transportation Research Part C: Emerging Technologies, vol. 85, pp. 591–608, 2017.
[38] K. Shimada, "Customer value creation in the information explosion era," in Proceedings of the 2014 Symposium on VLSI Technology, IEEE, Honolulu, HI, USA, June 2014.
[39] H. A. Abdelhafez, "Big data analytics: trends and case studies," in Encyclopedia of Business Analytics & Optimization, Association for Computing Machinery, New York, NY, USA, 2014.
[40] K. Kira and L. A. Rendell, "A practical approach to feature selection," Machine Learning Proceedings, vol. 48, no. 1, pp. 249–256, 1992.
[41] T. M. Khoshgoftaar, K. Gao, and L. A. Bullard, "A comparative study of filter-based and wrapper-based feature ranking techniques for software quality modeling," International Journal of Reliability, Quality and Safety Engineering, vol. 18, no. 4, pp. 341–364, 2011.
[42] M. A. Hall and L. A. Smith, "Feature selection for machine learning: comparing a correlation-based filter approach to the wrapper," in Proceedings of the Twelfth International Florida Artificial Intelligence Research Society Conference, DBLP, Orlando, FL, USA, May 1999.
[43] V. A. Huynh-Thu, Y. Saeys, L. Wehenkel, and P. Geurts, "Statistical interpretation of machine learning-based feature importance scores for biomarker discovery," Bioinformatics, vol. 28, no. 13, pp. 1766–1774, 2012.
[44] M. Sandri and P. Zuccolotto, "A bias correction algorithm for the Gini variable importance measure in classification trees," Journal of Computational and Graphical Statistics, vol. 17, no. 3, pp. 611–628, 2008.
[45] J. Brownlee, "Feature importance and feature selection with xgboost in python," 2016, https://machinelearningmastery.com.
[46] N. V. Chawla, S. Eschrich, and L. O. Hall, "Creating ensembles of classifiers," in Proceedings of the IEEE International Conference on Data Mining, IEEE Computer Society, San Jose, CA, USA, November-December 2001.
[47] A. K. Jain, M. N. Murty, and P. J. Flynn, "Data clustering: a review," ACM Computing Surveys, vol. 31, no. 3, pp. 264–323, 1999.
[48] A. Nagpal, A. Jatain, and D. Gaur, "Review based on data clustering algorithms," in Proceedings of the IEEE Conference on Information & Communication Technologies, Hainan, China, September 2013.
[49] Y. Wang, X. Ma, Y. Lao, and Y. Wang, "A fuzzy-based customer clustering approach with hierarchical structure for logistics network optimization," Expert Systems with Applications, vol. 41, no. 2, pp. 521–534, 2014.
[50] B. Wang, Y. Miao, H. Zhao, J. Jin, and Y. Chen, "A biclustering-based method for market segmentation using customer pain points," Engineering Applications of Artificial Intelligence, vol. 47, pp. 101–109, 2015.
[51] M. Halkidi, Y. Batistakis, and M. Vazirgiannis, "On clustering validation techniques," Journal of Intelligent Information Systems: Integrating Artificial Intelligence and Database Technologies, vol. 17, no. 2-3, pp. 107–145, 2001.
[52] R. W. Sembiring, J. M. Zain, and A. Embong, "A comparative agglomerative hierarchical clustering method to cluster implemented course," Journal of Computing, vol. 2, no. 12, 2010.
[53] M. Valipour, M. E. Banihabib, and S. M. R. Behbahani, "Comparison of the ARIMA and the auto-regressive artificial neural network models in forecasting the monthly inflow of the dam reservoir," Journal of Hydrology, vol. 476, 2013.
[54] E. Erdem and J. Shi, "Arma based approaches for forecasting the tuple of wind speed and direction," Applied Energy, vol. 88, no. 4, pp. 1405–1414, 2011.
[55] R. J. Hyndman, "Forecasting functions for time series and linear models," 2019, http://mirror.costar.sfu.ca/mirror/CRAN/web/packages/forecast/index.html.
[56] S. Aishwarya, "Build high-performance time series models using auto ARIMA in Python and R," 2018, https://www.analyticsvidhya.com/blog/2018/08/auto-arima-time-series-modeling-python-r.
[57] J. H. Friedman, "Greedy function approximation: a gradient boosting machine," The Annals of Statistics, vol. 29, no. 5, pp. 1189–1232, 2001.
[58] T. Chen and C. Guestrin, "Xgboost: a scalable tree boosting system," 2016, https://arxiv.org/abs/1603.02754.
[59] A. Gómez-Ríos, J. Luengo, and F. Herrera, "A study on the noise label influence in boosting algorithms: AdaBoost, GBM and XGBoost," in Proceedings of the International Conference on Hybrid Artificial Intelligence Systems, Logroño, Spain, June 2017.
[60] J. Wang, C. Lou, R. Yu, J. Gao, and H. Di, "Research on hot micro-blog forecast based on XGBOOST and random forest," in Proceedings of the 11th International Conference on Knowledge Science, Engineering and Management, KSEM 2018, pp. 350–360, Changchun, China, August 2018.
[61] C. Li, X. Zheng, Z. Yang, and L. Kuang, "Predicting short-term electricity demand by combining the advantages of ARMA and XGBoost in fog computing environment," Wireless Communications and Mobile Computing, vol. 2018, Article ID 5018053, 18 pages, 2018.
[62] A. M. Jain, "Complete guide to parameter tuning in XGBoost with codes in Python," 2016, https://www.analyticsvidhya.com/blog/2016/03/complete-guide-parameter-tuning-xgboost-with-codes-python.
