
ICIC Express Letters, Part B: Applications, ICIC International © 2020, ISSN 2185-2766
Volume 11, Number 7, July 2020, pp. 631–639

EVALUATION PERFORMANCE OF SVR GENETIC ALGORITHM AND HYBRID PSO IN RAINFALL FORECASTING

Rezzy Eko Caraka1,3, Rung Ching Chen1,∗, Toni Toharudin2
Muhammad Tahmid4, Bens Pardamean3,5 and Richard Mahendra Putra4

1College of Informatics, Chaoyang University of Technology
Jifong East Road, Wufong Dist., Taichung City 41349, Taiwan
[email protected]; ∗Corresponding author: [email protected]

2Department of Statistics, Padjadjaran University
Bandung, West Java 45363, Indonesia

3Bioinformatics and Data Science Research Center
5Department of Computer Science, BINUS Graduate Program – Master of Computer Science
Bina Nusantara University
Jl. K. H. Syahdan No. 9, Kemanggisan, Palmerah, Jakarta 11480, Indonesia

4Meteorological, Climatological, and Geophysical Agency (BMKG)
Jl. Angkasa 1 No. 2 Kemayoran, Jakarta 10720, Indonesia

Received April 2019; accepted May 2020

Abstract. Climate is an essential natural factor that is dynamic and challenging to predict, so accurate climate prediction is needed. In this paper, we use support vector regression (SVR) with different kernels: polynomial, sigmoid, and RBF. At the same time, we employ a genetic algorithm (GA), particle swarm optimization (PSO), and hybrid particle swarm optimization (HPSO). SVR-GA is a population-based algorithm that can optimize problems whose search space is very broad and complex; this property also allows genetic algorithms to escape local optima. In contrast, SVR-PSO and SVR-HPSO have no genetic operators: PSO uses only an internal velocity and a memory that is essential to the algorithm. We compare SVR-PSO, SVR-HPSO, and SVR-GA on rainfall data, contrasting inputs selected by correlation with inputs selected by ARIMA. We found that the correlation-based input provides better accuracy than the ARIMA-based input.
Keywords: SVR, GA, HPSO, PSO

1. Introduction. Climate is a natural phenomenon that is very important and influential for human life. Knowledge of weather and climate patterns, especially rainfall, is needed in many sectors such as agriculture, plantations, and transportation. In the agriculture and plantation sectors, information that predicts the amount of monthly rainfall in each region helps determine the right cropping patterns and plant varieties to produce a good harvest, because rainfall directly affects water availability. Therefore, an accurate, fast, and site-specific forecast is needed to predict future rainfall and minimize losses. Recently, models based on combining concepts have received more attention in climatology forecasting. Depending on the combination method, combined models can be categorized into ensemble models and modular (or hybrid) models. The machine learning methods commonly applied to rainfall forecasting include artificial neural

DOI: 10.24507/icicelb.11.07.631



networks (ANN). [1] compared ANN with singular spectrum analysis (SSA) and multi-response support vector regression (MSVR); the comparison indicated that modular models (ANN-SVR for daily rainfall simulations and MSVR for monthly rainfall simulations) outperformed the other models. [2] developed vector autoregressive (VAR) and generalized space-time autoregressive (GSTAR) models as feature selection for SVR. The main concept of SVR is to maximize the margin around the hyperplane and to obtain the data points that become the support vectors. In this research, we apply SVR-PSO, SVR-HPSO, and SVR-GA to model rainfall. We use the best SVR input based on ARIMA and on the rainfall correlation in each year.

2. Methods. SVR is an extended model of the support vector machine (SVM) [3] for prediction and regression cases [4]. SVR applies the concept of an ε-insensitive loss function and has reliable performance in predicting time-series data [5]. If ε = 0, a perfect regression is obtained [6]. The concept of SVR is based on risk minimization [7]: estimating a function by minimizing the upper limit of the generalization error, so that SVR can avoid overfitting:

$$f(x) = w \cdot x + b$$

In the case of nonlinearity, a nonlinear mapping $\phi: R^1 \to F$ is introduced, where $F$ is the feature space of $\phi$, so that a complex nonlinear regression problem in $R^1$ becomes a simple linear regression problem in $F$. The regression function after transformation becomes:

$$f(x) = w \cdot \phi(x) + b$$

where w is a weight vector, ϕ(x) is the function that maps x into that higher dimension, and b is a bias. To evaluate how well the regression function fits, the ε-insensitive loss function is used:

$$L_\varepsilon(y, f(x)) = \begin{cases} 0, & \text{for } |y - f(x)| \le \varepsilon \\ |y - f(x)| - \varepsilon, & \text{otherwise} \end{cases}$$
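As an illustrative sketch (ours, not the authors' code), the ε-insensitive loss can be written in a few lines of Python; the default ε = 0.5 matches the parameter setting used later in the paper:

```python
def eps_insensitive_loss(y, f_x, eps=0.5):
    """epsilon-insensitive loss: zero inside the eps tube around f(x),
    linear in the residual outside it."""
    residual = abs(y - f_x)
    return 0.0 if residual <= eps else residual - eps

# Inside the tube -> no penalty; outside -> |y - f(x)| - eps
print(eps_insensitive_loss(10.0, 10.3))  # 0.0
print(eps_insensitive_loss(10.0, 12.0))  # 1.5
```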

The ε-insensitive loss function measures the empirical risk, i.e., the difference between the target and the estimate [8]. Therefore, the parameter ε must be set to minimize the empirical risk, using slack variables ξ, ξ* that describe the deviation of training data outside the ε-insensitive zone. Besides minimizing the empirical error with the ε-insensitive loss, we must also minimize the Euclidean norm ||w||, which is related to the generalization ability of the trained SVR model. The regression problem can then be expressed as the following quadratic optimization problem:

$$L(w, \xi) = \frac{1}{2}\|w\|^2 + C\left[\sum_{i=1}^{n}\left(\xi_i^2 + \xi_i^{*2}\right)\right], \quad C > 0$$

subject to

$$y_i - w \cdot \phi(x_i) - b \le \varepsilon + \xi_i, \qquad w \cdot \phi(x_i) + b - y_i \le \varepsilon + \xi_i^*, \qquad \xi_i, \xi_i^* \ge 0$$

where C is the penalty coefficient that determines the trade-off between the training error and the generalization error; the value of C needs to be tuned, and the optimal C usually lies in the range 1–1000. To solve the quadratic optimization problem, we can use the dual Lagrangian:

$$f(x_i) = w \cdot \phi(x_i) + b = \sum_{j=1}^{n} \alpha_j K(x_i, x_j) + b$$


where K(xi, xj) is a kernel function. SVR uses kernel functions to transform nonlinear inputs into a higher-dimensional feature space [9]. Generally, problems in the real world are rarely linearly separable [10]; the kernel function can handle such nonlinear cases [11]. This paper combines optimization methods to find the optimal kernel function parameters: the genetic algorithm (GA) [12], particle swarm optimization (PSO) [13], and hybrid PSO [14]. As shown in Figure 1, a population of chromosomes is generated randomly and allowed to multiply according to the law of evolution, in the hope of producing a prime individual chromosome.
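To make the kernel options concrete, here is a minimal Python sketch (ours, not the paper's code) of the three kernels compared in this study; the RBF default uses the paper's γ = 0.01, while the degree and coefficient values are illustrative assumptions:

```python
import math

def rbf_kernel(x, z, gamma=0.01):
    # K(x, z) = exp(-gamma * ||x - z||^2)
    sq_dist = sum((xi - zi) ** 2 for xi, zi in zip(x, z))
    return math.exp(-gamma * sq_dist)

def poly_kernel(x, z, degree=3, coef0=1.0):
    # K(x, z) = (x . z + coef0)^degree  (degree/coef0 are illustrative)
    dot = sum(xi * zi for xi, zi in zip(x, z))
    return (dot + coef0) ** degree

def sigmoid_kernel(x, z, alpha=0.01, coef0=0.0):
    # K(x, z) = tanh(alpha * x . z + coef0)  (alpha/coef0 are illustrative)
    dot = sum(xi * zi for xi, zi in zip(x, z))
    return math.tanh(alpha * dot + coef0)

# A point is maximally similar to itself under the RBF kernel:
print(rbf_kernel([1.0, 2.0], [1.0, 2.0]))  # 1.0
```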

Figure 1. SVR-GA model

To optimize the SVR kernel parameters (C and γ) simultaneously [15], each chromosome is defined in two parts: the C gene and the γ gene, where C consists of C1, C2, . . . , Cn and γ consists of γ1, γ2, . . . , γn. Binary coding is used to represent chromosomes. As Figure 2 illustrates, the number of genes for each parameter is determined by the range of values given for that parameter. Each gene contains a parameter value as a 0 or 1, which is processed through crossover and mutation.

Figure 2. The chromosome of GA
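As a sketch of how such a binary chromosome might be decoded (our illustration; the 8-bit gene lengths and the γ range are assumptions, with the C range taken from the 1–1000 interval mentioned above):

```python
def decode_gene(bits, lo, hi):
    """Map a list of 0/1 genes to a real value in [lo, hi]."""
    as_int = int("".join(str(b) for b in bits), 2)
    return lo + (hi - lo) * as_int / (2 ** len(bits) - 1)

# An 8+8-bit chromosome: first half is the C gene, second half the gamma gene.
chromosome = [1, 1, 1, 1, 1, 1, 1, 1,   0, 0, 0, 0, 0, 0, 0, 0]
C = decode_gene(chromosome[:8], 1.0, 1000.0)     # -> 1000.0
gamma = decode_gene(chromosome[8:], 0.001, 1.0)  # -> 0.001
print(C, gamma)
```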

This parameter value is decoded as input to the SVR process. The assessment criterion used in this study is the accuracy of the SVR model, determined by the root mean square error (RMSE). The results of the crossover then mutate at specific genes and positions (x). Next, each particle knows the best value over all data (Gbest). Particles always move towards the potentially optimal solution. The movement speed is governed by a velocity that is renewed every iteration. The change in velocity of each particle is influenced by the previous velocity, the Pbest position, and the Gbest. Different random values are generated as the Pbest and Gbest accelerations. Particle swarm optimization (PSO) is an algorithm that finds minimum or maximum function values based on a population. PSO has the advantage of finding complex


non-linear optimization values. PSO is similar to the genetic algorithm in that it starts with a random population in the form of a matrix. However, PSO does not have evolution operators, namely the crossover and mutation of the genetic algorithm. A row of the matrix is called a particle (a chromosome, in the genetic algorithm) and holds the values of the variables. Each particle moves from its original position to a better position with a velocity. Initialization of HPSO is the same as that of PSO: a number of particles are distributed to search for the global best position during optimization.

Based on Figure 3, each particle considers itself the owner of the best location (Pbest) in each iteration. The global best position (Gbest) is determined after searching for the most optimal position among all particles that have declared themselves Pbest. Gbest is essential information for the movement of the other particles, because it influences particle movement in the next iteration. Particles that have not yet reached an optimal position move towards the particle that found the best one. The equations for updating particle velocity and position are as follows:

$$V_{id}^{t+1} = w \cdot V_{id}^{t} + c_1 \cdot r_1 \cdot (pbest_{id} - x_{id}) + c_2 \cdot r_2 \cdot (gbest_{id} - x_{id})$$

$$x_{id}^{t+1} = x_{id}^{t} + V_{id}^{t+1}$$

where V_id is the velocity, x_id the particle position, t the iteration, d the dimension index, c1 and c2 the acceleration constants, and r1 and r2 random values. We use a linearly decreasing inertia weight w, with α = 0.9, β = 0.4, and a maximum of 1000 iterations:

$$w = \alpha - \frac{\alpha - \beta}{\text{maximum iteration}} \times t$$
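The update rules above can be sketched in Python as follows (our illustration; the acceleration constants c1 = c2 = 2 are a common textbook choice, not values reported in the paper):

```python
import random

def inertia_weight(t, t_max, alpha=0.9, beta=0.4):
    # Linearly decrease the inertia weight from alpha down to beta.
    return alpha - (alpha - beta) * t / t_max

def pso_step(x, v, pbest, gbest, w, c1=2.0, c2=2.0, rand=random.random):
    """One velocity/position update for a single particle, per dimension."""
    r1, r2 = rand(), rand()
    v_new = [w * vi + c1 * r1 * (pb - xi) + c2 * r2 * (gb - xi)
             for vi, xi, pb, gb in zip(v, x, pbest, gbest)]
    x_new = [xi + vi for xi, vi in zip(x, v_new)]
    return x_new, v_new

print(inertia_weight(0, 1000))     # 0.9 at the first iteration
print(inertia_weight(1000, 1000))  # 0.4 at the last iteration
```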

Figure 3. SVR-hybrid particle swarm model

3. Analysis. The research location is Manado's Sam Ratulangi Meteorology Station. Climatologically, rain in Manado is monsoonal, with peak average rainfall in January; July is the peak of the dry season. Between these opposite regimes, the atmospheric interactions also differ: cloud cover conditions, sea surface temperature, monsoon winds, and convectivity. Forecasting with the SVR method will be based


on input from the ARIMA model, which already has significant parameters, compared against input from grouping the data per year based on correlation. Figure 4 represents the correlation of January rainfall from 2010 to 2018; 3 main groups can be formed: Group 1 (January 2012, January 2015, January 2016), Group 2 (January 2010, January 2014, January 2018), and Group 3 (January 2011, January 2013, and January 2017). A comparison is then conducted between inputs based on the correlation groups and inputs based on ARIMA.

Figure 4. Group input SVR from correlation

Based on the decomposition of the ARIMA model in Table 1, the lags Yt−1, Yt−2, Yt−4, and Yt−12 were obtained. The following are the results of the decomposition of several ARIMA models that already have significant parameters, along with the Y lag inputs.

Table 1. ARIMA([1, 2, 4, 12], 0, 0)

Parameter       Lag   White noise   Normality test   Lag input SVR
φ1  = 0.6112     6      0.2017        < 0.0100          Yt−1
                12      0.1651
φ2  = 0.1721    18      0.2851                          Yt−2
                24      0.4039
φ4  = 0.1569    30      0.5698                          Yt−4
                36      0.2450
φ12 = 0.2415    42      0.2341                          Yt−12
                48      0.2212
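As an illustrative sketch (not the authors' code), the lag inputs Yt−1, Yt−2, Yt−4, and Yt−12 from Table 1 can be assembled into an SVR design matrix like this:

```python
def lag_matrix(series, lags=(1, 2, 4, 12)):
    """Build (X, y): each row of X holds Y_{t-1}, Y_{t-2}, Y_{t-4}, Y_{t-12},
    and y holds the corresponding target Y_t."""
    max_lag = max(lags)
    X, y = [], []
    for t in range(max_lag, len(series)):
        X.append([series[t - k] for k in lags])
        y.append(series[t])
    return X, y

# With a toy series 0..19, the first usable target is Y_12:
X, y = lag_matrix(list(range(20)))
print(y[0], X[0])  # 12 [11, 10, 8, 0]
```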

Based on the input lags, these lags become the input (X) for forecasting with SVR. After getting the input, the next step is to determine the parameters of the SVR model. We perform SVR with different kernel functions, with the parameter setting C (cost) = 100, γ = 0.01, and ε = 0.5. To adjust these parameters, this study uses trial and error: several choices of parameter ranges are combined to get a good forecasting result. To simplify the selection, the optimal range of the parameter ε is sought while the ranges of C and γ are fixed. The chromosomes selected as prospective parents are each given a uniform random number on (0, 1). Figure 5 illustrates the crossovers on chromosomes. If the random value is less than the crossover probability (Pc = 0.8), the chromosome is selected as a parent and a crossover occurs.


Figure 5. Crossover on chromosome

Figure 6. Accuracy

Based on Figure 6, RMSE is a more intuitive alternative to MSE because it has the same measurement scale as the data being evaluated. A low RMSE indicates that the variation in the values produced by a forecast model is close to the variation in the observed values. Moreover, the SVR model with correlation-based input provides higher accuracy than the one with ARIMA-based input. Besides the smaller RMSE, the computational time required by the SVR-PSO and SVR-HPSO methods is shorter than that of SVR-GA. The quick computation comes from the accuracy of the coefficients used, so that the number of iterations is not too high.

The procedure of SVR-PSO and SVR-HPSO has many similarities to SVR-GA: the system begins with a population of random solutions and then seeks optimality through randomly generated updates. However, SVR-PSO does not have evolution operators such as mutation and crossover. Instead, the potential solutions, individuals called particles, "fly" after the currently optimum individuals to reach the optimum particles. Each individual keeps track of its position in the problem space; the trace of positions is interpreted as the best solution obtained so far, analogous to fitness in SVR-GA. Thus, the information-sharing mechanism of SVR-PSO and SVR-HPSO differs significantly from that of SVR-GA. In SVR-GA, each individual, called a chromosome, shares information with the others, so the entire population moves as a whole towards optimality. In SVR-PSO and SVR-HPSO, only gbest (or lbest) gives information to the others: a one-way information-sharing mechanism in which the evolutionary process only looks for the best solution, so all particles converge rapidly towards it. Figure 7 shows the actual data (*) against the predicted data (–); if the predicted data follow the actual data, the model has learned well.
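The RMSE criterion discussed above is straightforward to compute; as a sketch (ours, not the paper's code):

```python
import math

def rmse(actual, predicted):
    """Root mean square error: same units as the observations, unlike MSE."""
    n = len(actual)
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / n)

print(rmse([100.0, 120.0], [103.0, 116.0]))  # sqrt(12.5), about 3.536
```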


Figure 7. Best SVR-HPSO, in-sample (top) and out-of-sample (bottom)

Figure 7 shows that even though we propose hybrid methods, weaknesses remain in predicting the data accurately, because rain events are so dynamic. Besides, Figure 8 shows differences in precipitation from 2015 to 2018: the redder the colour, the higher the potential for rain in that area, while blue indicates that the potential for rain decreases in that month.

4. Conclusion. The most significant contribution of this paper is evaluating the performance of SVR with different kernels (linear, RBF, sigmoid, polynomial). We apply



Figure 8. (color online) MSG precipitation in Sulawesi January 2015 (a),2016 (b), 2017 (c), 2018 (d)

correlation and ARIMA to get the best input for SVR, taking account of both the accuracy and the interpretability of the forecast results. Secondly, we employ three suitable evolutionary algorithms, GA, PSO, and HPSO, to reduce the performance volatility of an SVR model across different parameters. Based on the simulation results, SVR-HPSO was able to handle complex and parallel problems. SVR-HPSO also has other advantages: a simple concept, easy implementation, and efficient calculation compared with mathematical algorithms and other heuristic optimization techniques. SVR-GA, however, can handle various kinds of objective (fitness) functions, whether balanced or unbalanced, linear or nonlinear, continuous or discontinuous, or with random noise. Comparing SVR-GA, SVR-PSO, and SVR-HPSO, SVR-HPSO is more flexible in maintaining a balance between global and local searches of its search space. In testing, the number of iterations significantly affects the fitness obtained each time, because of the frequent position updates that occur in each iteration; the number of iterations makes the computation slower or faster. During testing of the inertia weights in SVR-PSO and SVR-HPSO, we found that high accuracy can be obtained if the weight is greater


than the inertia. There will also be a decrease in speed at each iteration. In other words, the higher the inertia weight, the more the velocity of the particle is slowed at the starting point of the search for a solution; if the speed slows at the start, the search gains an opportunity for local exploitation. However, since rainfall is dynamic, future research should treat outliers (extreme rainfall conditions), apply feature engineering, and try other algorithm tuning.

Acknowledgment. This paper is supported by the Ministry of Science and Technology, Taiwan, under Grants MOST-107-2221-E-324-018-MY2 and MOST-106-2218-E-324-002, and by the BDSRC, Bina Nusantara University, Indonesia.

REFERENCES

[1] C. L. Wu and K. W. Chau, Prediction of rainfall time series using modular soft computing methods, Eng. Appl. Artif. Intell., 2013.
[2] D. D. Prastyo, F. S. Nabila, Suhartono, M. H. Lee, N. Suhermi and S. F. Fam, VAR and GSTAR-based feature selection in support vector regression for multivariate spatio-temporal forecasting, in Communications in Computer and Information Science, 2019.
[3] C. Cortes and V. Vapnik, Support-vector networks, Mach. Learn., vol.20, no.3, pp.273-297, 1995.
[4] S. R. Gunn, Support vector machines for classification and regression, ISIS Tech. Rep., vol.14, no.2, pp.230-267, 1998.
[5] R. E. Caraka and S. A. Bakar, Evaluation performance of hybrid localized multi kernel SVR (LMKSVR) in electrical load data using 4 different optimizations, J. Eng. Appl. Sci., vol.13, no.17, 2018.
[6] H. Yasin, R. E. Caraka, Tarno and A. Hoyyi, Prediction of crude oil prices using support vector regression (SVR) with grid search – cross validation algorithm, Glob. J. Pure Appl. Math., vol.12, no.4, pp.3009-3020, 2016.
[7] R. E. Caraka, S. A. Bakar, B. Pardamean and A. Budiarto, Hybrid support vector regression in electric load during national holiday season, in Proceedings – 2017 International Conference on Innovative and Creative Information Technology: Computational Intelligence and IoT (ICITech 2017), 2018.
[8] C. Li, H. Zhang, H. Zhang and Y. Liu, Short-term traffic flow prediction algorithm by support vector regression based on artificial bee colony optimization, ICIC Express Letters, vol.13, no.6, pp.475-482, 2019.
[9] T. Hofmann, B. Scholkopf and A. J. Smola, Kernel methods in machine learning, Ann. Stat., vol.36, no.3, pp.1171-1220, 2008.
[10] R. E. Caraka, S. A. Bakar and M. Tahmid, Rainfall forecasting multi kernel support vector regression seasonal autoregressive integrated moving average, AIP Conf. Proc., vol.020014, 2019.
[11] Y. F. Ju and S. W. Wu, Village electrical load prediction by genetic algorithm and SVR, in Proceedings – 2010 3rd IEEE International Conference on Computer Science and Information Technology (ICCSIT 2010), 2010.
[12] D. Whitley, A genetic algorithm tutorial, Stat. Comput., 1994.
[13] J. Kennedy and R. Eberhart, Particle swarm optimization, Proceedings of IEEE International Conference on Neural Networks, vol.4, pp.1942-1948, 1995.
[14] K. Premalatha and A. M. Natarajan, Hybrid PSO and GA for global maximization, Int. J. Open Probl. Compt. Math., 2009.
[15] H. Yasin, R. E. Caraka, A. Hoyyi and Sugito, Stock price modeling using localized multiple kernel learning support vector machine, ICIC Express Letters, Part B: Applications, vol.11, no.4, pp.333-339, 2020.