classifying imbalanced data sets by a novel re-sample and...
TRANSCRIPT
![Page 1: Classifying Imbalanced Data Sets by a Novel RE-Sample and ...downloads.hindawi.com/journals/mpe/2018/5036710.pdf · Cost-Sensitive Stacked Generalization Method ... In this paper,](https://reader035.vdocuments.net/reader035/viewer/2022070801/5f02a4577e708231d4054843/html5/thumbnails/1.jpg)
Research ArticleClassifying Imbalanced Data Sets by a Novel RE-Sample andCost-Sensitive Stacked Generalization Method
Jianhong Yan and Suqing Han
Department of Computer Science Taiyuan Normal University Taiyuan 030012 China
Correspondence should be addressed to Jianhong Yan yan jian hong163com
Received 3 June 2017 Revised 1 October 2017 Accepted 6 November 2017 Published 23 January 2018
Academic Editor Michele Migliore
Copyright copy 2018 JianhongYan and SuqingHanThis is an open access article distributed under theCreative CommonsAttributionLicense which permits unrestricted use distribution and reproduction in anymedium provided the originalwork is properly cited
Learning with imbalanced data sets is considered as one of the key topics in machine learning community Stacking ensemble is anefficient algorithm for normal balance data sets However stacking ensemble was seldom applied in imbalance data In this paperwe proposed a novel RE-sample and Cost-Sensitive Stacked Generalization (RECSG) method based on 2-layer learning modelsThe first step is Level 0model generalization including data preprocessing and basemodel trainingThe second step is Level 1 modelgeneralization involving cost-sensitive classifier and logistic regression algorithm In the learning phase preprocessing techniquescan be embedded in imbalance data learning methods In the cost-sensitive algorithm cost matrix is combined with both datacharacters and algorithms In the RECSGmethod ensemble algorithm is combined with imbalance data techniques According tothe experiment results obtained with 17 public imbalanced data sets as indicated by various evaluation metrics (AUC GeoMeanand AGeoMean) the proposed method showed the better classification performances than other ensemble and single algorithmsThe proposed method is especially more efficient when the performance of base classifier is low All these demonstrated that theproposed method could be applied in the class imbalance problem
1 Introduction
Classification learning becomes complicated if the classdistribution of the data is imbalanced The class imbalanceproblem occurs when the number of representative instancesis much less than that of other instances Recently theclassification problem of imbalanced data appears frequentlyand has been widely concerned [1ndash3]
Usually imbalanced data sets are grouped into binaryclass and the majority class and minority class are respec-tively denoted as negative class and positive class Tra-ditional techniques are divided into four categories RE-sample technique is to increase theminority class of instances(oversampling) [4] or decrease the majority class of instances(undersampling) [5 6] Existing algorithms are improvedby increasing the weight of positive instances [7] Classi-fier ensemble method is widely adopted to deal with theimbalance problem in the last decade [8] In cost-sensitivealgorithms data characters are incorporated withmisclassifi-cation costs in the classification phase [9 10] In general cost-sensitive and algorithms levels are more associated with the
imbalance problemwhereas data level and ensemble learningcan be used and independent of the single classifier
Ensemble methods involving training base classifiersintegrating their results and generating a single final classlabel can increase the accuracy of classifiers Bagging algo-rithm [11] and Ada AdaBoost boost algorithm [12 13] are themost common ensemble classification algorithms Ensem-ble algorithms combined with the other three techniquesare widely applied in the classification of imbalanced datasets Cost-sensitive learning targets the imbalanced learn-ing problem by using different cost matrices that can beconsidered as a numerical representation of the penalty ofmisclassifying examples from one class to another Datacharacters are incorporated with misclassification costs inthe classification phase Cost-sensitive learning is closelyrelated to learning from imbalanced data In order to solvethe imbalance problem ensemble algorithms were combinedwith data preprocessing cost-sensitive method and relevantalgorithms in previous studies Four ensemble methods forimbalance data sets are commonly used boosting- bagging-cost-sensitive-boosting and hybrid ensemble methods
HindawiMathematical Problems in EngineeringVolume 2018 Article ID 5036710 13 pageshttpsdoiorg10115520185036710
2 Mathematical Problems in Engineering
In boosting-based ensembles the data preprocessingtechnique is embedded into boosting algorithms In each iter-ation the data distribution is changed by altering the weightto train the next classifier towards the positive class Thesealgorithmsmainly include SMOTEBoost [14] RUSBoost [15]MSMOTEBoost [16] and DataBoost-IM [17] algorithms Inbagging-based ensembles the algorithms combine baggingwith data preprocessing techniques The algorithms are usu-ally simpler than their integration in boosting because ofsimplicity and good generalization ability of bagging Thefamily includes but not limited to OverBagging Under-OverBagging [18] UnderBagging [19] and IIVotes [20]In cost-sensitive-boosting ensembles the general learningframework of AdaBoost is maintained but amisclassificationcost adjustment function is introduced into the weightupdating formula These ensembles are usually different inthemodification way of the weight update rule AdaCost [21]CSB1 CSB2 [22] RareBoost [23] AdaC1 AdaC2 and AdaC3[24] are the most representative approaches Unlike previousalgorithms hybrid ensemble methods adopt the doubleensemble learningmethods For example EasyEnsemble andBalanceCascade use bagging as the main ensemble learningmethod but each base classifier is trained by AdaBoostTherefore the final classifier is an ensemble of ensembles
Stacking algorithm is another ensemble method andhas the same base classifiers to the Bagging and AdaBoostalgorithms but the structure is different Stacking algorithmhas a two-level structure consisting of Level 0 classifiers andLevel 1 classifiers Stacking algorithm involves two steps Thefirst step is to collect the output of each model into a newset of data For each instance in the original training set thedata set represents every modelrsquos prediction of that instancersquosclass and the models are base classifiers In the second stepbased on the new data and true labels of each instance in theoriginal training set a learning algorithm is employed to trainthe second-layer model In Wolperts terminology the firststep is referred to as the Level 0 layer and the second-stagelearning algorithm is referred to as the Level 1 layer [25]
Stacking ensemble is a general method in which a high-level model is combined with the lower-level models Stack-ing ensemble can achieve the higher predictive accuracyChen et al adopted ant colony optimization to configurestacking ensembles for data mining [26] Kadkhodaei andMoghadam proposed an entropy-based approach to searchfor the best combination of the base classifiers in ensembleclassifiers based on stack generalization [27] Czarnowski andJędrzejowicz focused on the machine classification with datareduction based on stacked generalization [28] Most of theprevious studies were focused on the way to use or generatestacking algorithm However the stacking ensemble does notconsider data distribution and is suitable for the commondata sets other than imbalance data
In order to solve the imbalance problem the paper intro-duces cost-sensitive learning into stacking ensemble and addsamisclassification cost adjustment function into theweight ofinstance and classifier In this waymisclassification costsmaybe considered in the data set as a form of data space weightingto select the best distribution for training On the other handin the combine stage metatechniques can be integrated with
cost-sensitive classifiers to replace standard cost-minimizingtechniques The weights for the misclassification of positiveinstances are higher and the weights for misclassification ofnegative instance are relatively lower The method providesan option for imbalanced learning domains
In this paper RE-sample and Cost-Sensitive StackedGeneralization (RECSG) method is proposed in order tosolve the imbalance problem In the method preprocessedimbalance data are used to train the Level 0 layer modelUnlike common ensemble algorithm for imbalanced datathe proposed method utilized cost-sensitive algorithm asthe Level 1 (meta)layer Stacking methods combined withimbalance data approaches including cost-sensitive learninghad been reported Kotsiantis proposed a stacking variantmethodologywith cost-sensitivemodels as base learners [29]In the method the model tree was replaced by MLR inmetalayer to determine the class with the highest probabilityassociated with the true class Lo et al proposed a cost-sensitive stacking method for audio tag annotation andretrieval [30] In these methods cost-sensitive learners areadopted in the base-layer model and the metalayer modelis trained by other learning algorithms such as SVM anddecision tree In this paper Level 0model generalizer involvesresampling the data and training the base classifier The cost-sensitive algorithm is used to train the Level 1 metalayerclassifier The two layers adopt the imbalanced data algo-rithms and take the full advantages of mature methods Level1 layer model has a bias towards the performance of theminority class Therefore the method proposed in the studyis more efficient than other methods in which cost-sensitivealgorithms are only used in the Level 0 layer
The method was compared with common classificationmethods including other ensemble algorithms Additionallythe evaluation metrics of the algorithm were analyzed basedon the results of statistical tests Statistical tests of evaluationmetrics demonstrated that the proposed new approach couldeffectively solve the imbalanced problem
The paper is structured as follows Related ensembleapproaches and cost-sensitive algorithms are introduced inSection 2 Section 3 introduces the details of proposedRECSG approach including Level 0 and Level 1 modelgeneralizers In Section 4 experiments and correspondingresults and analysis are presented Statistical tests of eval-uation metrics of algorithm performance are analyzed anddiscussed Finally Section 5 discusses the advantages anddisadvantages of the proposed method
2 Background
21 Performance Evaluation in Imbalanced Domains Theevaluation metric is a vital factor for the classifier model andperformance assessment In Table 1 the confusion matrixdemonstrates the results of incorrectly and correctly classifiedinstances of each class in the two classes of problems
Accuracy is the most popular evaluation metric How-ever it cannot effectively measure the correct rates of allthe classes so it is not an appropriate metric for imbalancedata sets For this reason in addition to accuracy moresuitable metrics should be considered in the imbalance
Mathematical Problems in Engineering 3
Table 1 Confusion matrix for performance evaluation
Positive prediction Negative predictionPositive class True positive (TP) False negative (FN)Negative class False positive (FP) True negative (TN)
problem Other metrics have been proposed to measurethe classification performance independently Based on theconfusion matrix (Table 1) these metrics are defined as
Precision = tp(tp + fp)
Recall = tp(tp + fn)
119865-Measure = (1 + 120573)2 lowast Recall lowast Precision1205732 lowast Recall + Precision
(1)
where 120573 is a coefficient to adjust the relative importance ofprecision versus recall (usually 120573 = 1)
FPrate = FP(FP + TN)
TPrate = TP(TP + FN)
(2)
The used combined evaluation metrics of these measuresinclude receiver operating characteristic (ROC) graphic [31]the area under the ROC curve (AUC) [2] average geometricmean of sensitivity and specificity (GeoMean) [32] (see (4))and average adjusted geometric mean (AGeoMean) [33] (see(5)) where 119873119899 refers to the proportion of the majoritysamples These metrics are defined as
AUC = (1 + TPrate minus FPrate)2 (3)
GeoMean = radicTPrate lowast TNrate (4)
AGeoMean
=
GeoMean + TNrate lowast 1198731198991 + 119873119899
TPrate gt 00 TPrate = 0
(5)
22 Stacking Bagging and AdaBoost are the most commonensemble learning algorithms In bagging method differentbase classifier models generate different classification resultsand the final decision is decided by majority voting InAdaBoost algorithm a series of base weak classifiers aretrained with the whole training set and the final decisionis generated by a weighted majority voting scheme In eachround of training iteration different weights are attributed toeach instance In the two algorithms the base classifiers arethe same
Stacking ensemble is another ensemble algorithm inwhich the prediction result of base classifiers is used as the
attribute to train the combination function in metalayer clas-sifier [25] The algorithm has a two-level structure consistingof Level 0 classifiers and Level 1 classifiers It was proposedby Wolpert and used by Breiman [34] and Leblanc andTibshirani [35]
Set a data set
119863 = (119883119899 119884119899) 119899 = 1 119873 (6)
where 119883119899 is a vector representing the attribute values ofthe instance 119899 and 119884119899 is the class value All 119873 instancesare randomly split into 119895 equivalent parts and 119895-fold cross-validation is used to train the model The prediction resultsof every time testing set gives a vector which includes 119899instances
1199101 1199102 1199101198991015840 (7)
Set 119872119895 is the difference base learning algorithm modelobtained with the data set 119863 119895 = 1 119896 and119872119895 is the partof Level 0 models Level 0 layer consists of 119895 base classifierswhich are employed to estimate the class probability of eachinstance
In119872119895 119895-fold cross-validation of training data set119863 givesthe prediction result
119910(119895)1 119910(119895)2 119910(119895)119899 119895 = 1 119896 (8)
All the119872119895 prediction results generate input space of Level1 model and the real value of instance is treated as the outputspace The model is expressed as follows
Input output
119910(1)1 119910(2)1 119910(119894)1 119910(119896)1 1199101119910(1)2 119910(2)2 119910(119894)2 119910(119896)2 1199102
119910(1)119899 119910(2)119899 119910(119894)119899 119910(119896)119899 119910119899
(9)
The above intermediate data are considered as the train-ing data of the Level 1 layer model The input data aretreated as features and the real value of instance is treatedas the output space The next step is to train the data withsome fused leaning algorithms The process is called Level 1generalizer and Level 1 model is denoted by which can beregarded as the function of (119910(1) 119910(2) 119910(119894) 119910(119896))
In the stacking process Level 0 (119872119895 119895 = 1 119896 ) iscombined with Level 1 () Given a new instance 1199091015840 models119872119895 produce a prediction result vector
119910(1) 119910(2) 119910(119894) 119910(119896) 119894 = 1 119896 (10)
Then model is used to combine base classifiers andpredict the final result of 1199091015840
In this paper for imbalance data we propose a stackedgeneralization based on cost-sensitive classification In Level0 model generalizer layer we firstly resample the imbal-ance data Resampling approaches include oversampling and
4 Mathematical Problems in Engineering
Training dataset
andtest dataset
Level 0 (base)-layer
resample(oversample) (undersample) (SMOTE)
Level 0 (base)-layerclassification middot middot middot
middot middot middot
middot middot middot
Ensemble spaceLevel 1 (meta)layerclassification
New data2New data3New data1
LMGJF1 LMGJF2 LMGJFk
M1M2 Mk
Decision space of M2 Decision space of Mk
MGN
Decision space of M1
u1 xiNi=1 u1 x
i
N
i=1( ) ( ) u2 xi
Ni=1 u2 x
i
N
i=1( ) ( ) uk xi
Ni=1 uk x
i
N
i=1( ) ( )
yiGN
N
i=1
D = xi yi
N
i=1( )
DNMN = xi
N
i=1( )
DGN = u1 xi yiNi=1 D
GN = u1 x
i yi
N
i=1 ( ) ( )
Figure 1 Flowchart of the RECSG architecture
undersampling Secondly Model119872119895 119895 = 1 119896 is trainedby the classification with new data and the prediction output119910(119895)119894 by 119895 = 1 119896 is produced with original data In Lever1 model generalizer layer model is generated by cost-sensitive classification based on logistic regression
3 Stacked Generalization forthe Imbalance Problem
The proposed RECSG architecture includes two layers Thefirst layer (Level 0) consisting of classifiers ensemble is calledthe base layer the second layer (Level 1) combined withbase classifiers is called the metalayer The flowchart of thearchitecture is shown in Figure 1
31 Level 0 Model Generalizers For the imbalance prob-lem Level 0 model generalizer step of RECSG includespreprocessing data and training the base classifier Firstlythe oversampling (SMOTE) method has been used in datapreprocessing of base classifier Secondly the base classifiermodel is trained with the new data set In this level weemployed three base algorithms Naıve Bayes (NB) [36]decision tree C45 [37] and 119896-nearest neighbors (119896-NN) [38]
(1) Naıve Bayes (NB) Given instance 119909 set 119875(119894 | 119909) is theposterior probability of class 119894 then
119875119894 (119909) =119875 (119894 | 119909)Σ119894119875 (119894 | 119909)
(11)
where NB uses an Laplacian estimate for estimating theconditional probabilities
Mathematical Problems in Engineering 5
Table 2 Cost matrix
Actual negative Actual positivePredict negative 119862 (0 0) = 11988800 119862 (0 1) = 11988801Predict positive 119862 (1 0) = 11988810 119862 (1 1) = 11988811
Table 3 Cost matrix of credit data sets
Actual bad Actual goodPredict bad 0 1Predict good 5 0
(2) Decision Tree (C45) Decision tree classification is amethod usually used in data mining [39] A decision tree isa tree where each input feature labels a nonleaf node oneclass or probability over the class labels each leaf node of thetree By recursive partitioning a tree can be ldquolearnedrdquo untilno prediction value is added or the subset has the same valueWhen the parameter of training data selects informationentropy the decision tree is named C45
(3) 119896-NN k-NN is a nonparametric algorithm used forregression or classification An instance is classified by amajority vote of its 119896-nearest neighbors If 119896 = 1 the instanceis simply classified in accordance with the class label of thenearest neighbor
All the above algorithms are simple and have low com-plexity so they are applicable weak base classifiers
32 Level 1 Model Generalizers Prediction results of severalbase classifiers in Level 0 layer are used as input space andtrue value class of instance is used as output space Based onLevel 0 layer Level 1 layer model is trained by other learningalgorithms For imbalanced data the cost-sensitive algorithmis used to train the Level 1 metalayer classifier in the paper
321 Cost-Sensitive Classifier Aiming at imbalanced datalearning problem in cost-sensitive classifier different costmatrices are used as the penalty ofmisclassified instance [40]For example a cost matrix 119862 has the following structure in abinary classification scenario in Table 2
In the cost matrix the row indicated alternative predictedclasser whereas the column indicates actual classes The costof false negative is notated as 11988801 and the cost of false positiveis notated as 11988810 Conceptually the cost of correctly classifiedinstances should always be less than the cost of incorrectlyclassified instances In the imbalance problem 11988810 is alwaysgreater than 11988801 For the German credit data set previouslyreported as the part of the Stalog project [41] the cost matrixis provided in Table 3
The cost of false good prediction is greater than the costof false bad prediction in the view of economical reason Soin Level 1 classifier of stacking for the imbalance problem thecost-sensitive algorithm is adopted
322 Logistic Regression Ting and Witten illustrated thatMLR (Multiresponse Linear Regression) algorithm had an
advantage over other Level 1 generalizers in stacking [42]In this paper the logistic regression classifier is used as themetalayer algorithm Based on the metalayer the cost of mis-classification is considered in the cost-sensitive algorithm Inlogistic regression classifier the prediction result of Level 0layer is used as the attribute of Level 1 metalayer and the realvalue of instance is used as output spaceThe linear regressionfor class 119894 is simply obtained as
LR119894 (119909) =119870
sum119895=1
120572119895119894119875119895119894 (119909) 119895 = 1 119896 (12)
Details of the implementation of RECSG are presentedbelow
33 Algorithm Pseudocode Pseudocode 1 presents the pro-posed RECSG approach Input parameters are two data setstraining set and testing set Output predicts class labelsof the test samples The first step in the process is datapreprocessing resample new instances and train model119872119895 (119895 = 1 119896) where 119895 is the number of base classifiersand 119906119895(119909119894) is the generated function of 119909119894 based on model119872119895(lines (1)-(2) in Pseudocode 1)
Then the metalayer model is constructed based onthe data 119863meta which is firstly predicted with Level 0 layer(line (3) in Pseudocode 1) Finally Lever 1 layer classification(cost-sensitive and logistic regression) is used to predict theultimate value of the tested samples
4 Empirical Investigation
The experiment aims to verify whether the RECSG approachcan improve the classification performance for the imbalanceproblem In this paper the RECSG approach was comparedwith other algorithms involving various combination ensem-ble techniques and imbalanced approaches For eachmethodthe same training set testing set and validation set were used
The experimental system has been implemented and iscomposed of 7 learning methods implemented in Weka [43]namely Naıve Bayes(1) (NB) C45(j48)(2) 119896-nearest neigh-bor (k-NN)(3) cost-sensitive(4) AdaBoost(5) Bagging(6) andstacking(7) The brief description the standard version andparameters of these methods are shown in Table 4
The RECSG approach includes 2 layers Level 1 layer(metalearning) system consists of 2 learning methods imple-mented inWeka namely simple logistic regression(9) (MLP)and cost-sensitive classifier(4) MLP is the base classificationalgorithm of cost-sensitive classifier Level 0 layer (base -learning) of stacking is composed of 3 classifiers NaıveBayes(1) C45(2) and 119896-NN(3) Evaluation metrics of algo-rithmperformance includeAUCGeoMean andAGeoMean
41 Experimental Settings Experiments were implementedwith 17 data sets from the UCI Machine Learning Repository[44] These data sets cover various fields and are based onIR measure values (from 054 to 0014) unique data setnames a varying number of samples (from 173 to 2338)
6 Mathematical Problems in Engineering
Input Training set119863 = (119909119894 119910119894)119873119894=1 Test dataset119863test = (1199091015840119894 )1198731015840
119894=1
Output Predict class labels of the test samples 1199101015840119894meta1198731015840
119894=1
For each119895 = 1 2 119870 do(1) Resample imbalance data and generate 119895ndashfold cross-validation sets to obtain New data119895(2) Train and compute 119906119895(119909119894)119873119894=1 and 119906119895(1199091015840119894 )119873
1015840
119894=1 in Level-0 (base)-layer classifier119872119895end(3) Construct119863meta = 1199061(119909119894) 119910119894119873
1015840
119894=1 and1198631015840meta = 1199061(1199091015840119894 ) 1199101198941198731015840
119894=1
(4) Based on the data119863meta classification (cost-sensitive and Logistic Regression) is used togenerate Level-1 (meta)-layer model through with1198631015840meta to predict 1199101015840119894meta119873
1015840
119894=1
Pseudocode 1 Pseudocodes of RE-sample and Cost-Sensitive Stacked Generalization
Table 4 Base classifier methods and corresponding ensemble algorithms
Methods Descriptions and parameters Abbreviations(1) Naıve Bayes wekaclassifiersbayesNaiveBayes NB(2) C45 wekaclassifierstreesJ48 -C 025 -M 2 C45
(3) 119896-nearest neighborwekaclassifierslazyIBk -K 1 -W 0 -A
wekacoreneighboursearchLinearNNSearch -AwekacoreEuclideanDistance -R first-last
KNN
(4) Cost-sensitive with Naıve Bayes wekaclassifiersmetaAdaBoostM1 -P 100 -S 1 -I 10-W wekaclassifiersbayesNaiveBayes Cost-NB
(5) AdaBoost with Naıve Bayes wekaclassifiersmetaAdaBoostM1 -P 100 -S 1 -I 10-W wekaclassifiersbayesNaiveBayes Ada-NB
(6) AdaBoost with cost-sensitive and Naıve Bayes
wekaclassifiersmetaAdaBoostM1 -P 100 -S 1-I10-W wekaClassifiersmetaCostSensitiveClassifier-- -cost-matrix [10 30 10 10] -S 1 -W weka
classifiersbayesNaiveBayes
Ada-Cost-NB
(7) Bagging with Naıve BayeswekaclassifiersmetaBagging -P 100 -S 1
-num-slots 1 -I 10 -WwekaclassifiersbayesNaiveBayes
Bag-NB
(8) Bagging with cost-sensitive and Naıve Bayes
wekaclassifiersmetaBagging -P 100 -S 1-num-slots 1 -I 10 -W
wekaclassifiersmetaCostSensitive Classifier ---cost-matrix [10 30 10 10] -S 1 -W
wekaclassifiersbayesNaiveBayes
Bag-Cost-NB
(9) Stacking with NB KNN C45 and logistic
wekaclassifiersmetaStacking -X 10 -M wekaclassifiersfunctionsLogistic -R 10E-8 -M -1ndashnum ndashdecimal-places 4 -S 1 -num-slots 1 -B
weka classifiersbayesNaiveBayes -Bwekaclassifiers lazyIBk -K 1 -W 0 -A
wekacoreneighboursearch LinearNNSearch-A wekacoreEuclideanDistance -R
first-last -B wekaclassifierstreesJ48 -C025 -M 2
Stacking-log
(10) Stacking cost-sensitive with NB KNN C45and logistic
wekaclassifiersmetaStacking -X 10 -M wekaclassifiersmetaCostSensitiveClassifier ndashcost
-matrix [10 30 10 10] -S 1 -Wwekaclassifiers functionsLogistic -- -R 10E-8 -M-1 ndashnum ndashdecimal ndashplaces 4 -S 1 -num-slots 1 -B
weka classifiers bayesNaiveBayes -BwekaclassifierslazyIBk -K 1 -W 0 -A
wekacoreneighboursearchLinearNN Search-A wekacoreEuclideanDistance -R first-
last -B wekaclassifierstreesJ48 -C 025-M 2
Stacking-Cost-log
Mathematical Problems in Engineering 7
Table 5 Summary of imbalanced data sets
Data sets sam (min maj) IR feawisconsin 683 (239 444) 054 9pima 768 (268 500) 053 8haberman 306 (81 225) 036 3Vehicle1 846 (217 629) 0345 18Vehicle0 846 (199 647) 0307 18segment0 2308 (329 1979) 017 19yeast-0-3-5-9 vs 7-8 506 (50 456) 011 8yeast-0-2-5-6 vs 3-7-8-9 1004 (99 905) 011 8vowel0 988 (90 898) 010 13led7digit-0-2-4-5-6-7-8-9 vs 1 443 (37 406) 009 7Glass2 214 (17 197) 008 10cleveland-0 vs 4 173 (13 160) 008 13glass-0-1-6 vs 5 184 (9 175) 005 9car-good 1728 (69 1659) 004 6flare-F 1066 (43 1023) 004 11car-vgood 1728 (65 1663) 0039 6abalone-17 vs 7-8-9-10 2338 (58 2280) 00254 9
and variations in the amount of class overlap (see the KEELrepository [45]) Multiclass data sets were modified to obtaintwo-class imbalanced problems so that the union of one ormore classes became the positive class and the union of oneor more of the remaining classes was labeled as the negativeclass A brief description of the used date set is presented inTable 5 It includes the total number of instances (Sam)the total number of each class instances (Min Maj) theimbalance ratio (IR = the ratio of the number of minorityclass instance to majority class instance) and the number offeatures (Fea)
Our system was compared with other ensemble algo-rithms including AdaBoost with Naıve Bayes AdaBoost withcost-sensitive bagging with Naıve Bayes bagging with cost-sensitive and stacking cost-sensitive with NB k-NN C45and logistic regression All the experiments were performedby 10-fold cross-validation
42 Experimental Results Tables 6 7 and 8 respectivelyshow the results of the three metrics (AUC GeoMean andAGeoMean) for the algorithms in Table 4 obtained with thedata set in Table 5The best results are emphasized in boldfaceon each data set in these tables
The results showed that the performance of the proposedRECSG method was the best for 12 of 17 data sets in termsof GeoMean and AGeoMean and for 10 of 17 data sets interms of AUC Some methods are better than others in someevaluation metrics but not in all metric and most data sets
In Table 6 (AUC) the performance of the RECSGmethodis better than that of the 3 single-base classification methodsfor 14 of 17 data sets and better than that of other 5 ensemblealgorithms for 13 of 17 data sets In Table 7 (GeoMean)the RECSG method outperforms all the 3 single-base clas-sification methods in 15 of 17 data sets and outperformsother 5 ensemble algorithms in 15 of 17 data sets In Table 8
(AGeoMean) the RECSG method outperforms all the 3single-base classification methods in 14 of 17 data sets andoutperforms other 5 ensemble algorithms in 16 of 17 data setsFor the 17 data sets the evaluation metrics of GeoMean andAGeoMean are better than AUC
43 Statistical Tests Statistical test [46] is adopted to comparedifferent algorithms In the paper we use two types of com-parisons pairwise comparison (between a pair of algorithms)and multiple comparisons (among a group of algorithms)
431 Pair of Algorithms We performed statistic 119879-teststo explore whether RECSG approach is the significantlybetter than other algorithms in the three metrics (AUCGeoMean and AGeoMean) Table 9 shows the results ofRECSG approach compared with the other methods interms of AUC GeoMean and AGeoMean The values inthe square brackets indicate the number of the metrics withstatistically significant difference in the 119879-test performedwith the confidence level 120572 = 005 in the three evaluationmetrics As for the evaluation metric of AUC the RECSGoutperformed NB for 11 of 17 data sets and 5 data sets showedthe statistically significant differencewith the confidence levelof 120572 = 005
432 Multiple Comparisons We performed Holm post hoctest [47] and selected multiple groups to test the threemetrics (AUC GeoMean and AGeoMean) The post hocprocedures can determine whether a comparison hypothesisshould be rejected at a specified confidence level 120572 Statisticalexperiment was performed in the platform on the websitehttpteccitiususcesstac Table 10 provides the results thatRECSGapproach is significantly differentwith the confidencelevel of120572 = 005 in terms of AUCGeoMean andAGeoMean1198670 indicates that there is no significant difference According
8 Mathematical Problems in Engineering
Table6Re
sults
ofotherm
etho
dsin
term
sofA
UC
Datas
etNB
C45
KNN
Cost-N
BAd
a-NB
Ada-Cost-N
BBa
g-NB
Bag-Cost-NB
Stacking
-log
Stacking
-Cost-log
wisc
onsin
098
60957
0953
0989
0981
0982
099
0989
0982
0982
pima
0819
0751
0662
0819
0804
0808
0817
0819
0819
0819
haberm
an064
0615
053
064
0667
0625
0638
064
70636
063
Vehicle
10711
0722
0671
0711
0769
0723
0715
0712
0782
0782
Vehicle
00814
093
092
0814
0784
0781
0808
0808
0977
097
7segm
ent0
0987
0971
0996
0987
0944
0959
0987
0986
0999
099
9yeast-0
-3-5-9
vs7-8
075
0675
0653
075
0751
0686
075
0754
0769
0773
yeast-0
-2-5-6
vs3-7-8-9
0832
0687
0782
0832
079
0808
084
0842
084
7082
vowel0
098
0974
1098
0996
0994
0981
098
11
led7digit-0
-2-4-5-6-7-8-9
vs1
095
40925
0827
0954
0898
0893
0953
0949
0909
0904
Glass2
0716
0812
0587
0716
0737
0735
0748
0737
0713
0721
cleveland
-0vs
40909
0824
0754
0909
0826
0891
0915
0913
0979
0976
glass-0-1-6
vs5
0934
0989
0825
0934
0975
0905
0983
0985
09
099
car-good
0976
0493
064
80976
0968
0971
0976
0976
0653
097
7flare-F
0908
0525
059
0911
0876
0893
0908
0912
0825
0806
car-vgoo
d0997
0992
069
0997
0997
0997
0997
0997
0998
099
8abalon
e-17
vs7-8-9-10
0775
0613
0624
0775
0817
073
0773
0774
0752
0726
Mean
0864
0791
0748
0864
0858
0846
0869
0869
0855
0875
Mathematical Problems in Engineering 9
Table7Re
sults
ofotherm
etho
dsin
term
sofG
eoMean
Datas
etNB
C45
KNN
Cost-N
BAd
a-NB
Ada-Cost-N
BBa
g-NB
Bag-Cost-NB
Stacking
-log
Stacking
-Cost-log
wisc
onsin
0965
0954
0953
0968
0959
0967
0968
0966
096
4097
0pima
0719
0697
0649
0736
0715
0739
0723
0731
0703
0741
haberm
an0419
0566
0455
0586
046
60573
0434
0594
0361
060
5Ve
hicle
10677
0622
0652
0674
0607
0677
0673
0661
0561
0712
Vehicle
00724
0912
0919
0726
0735
0733
0719
0733
0894
094
6segm
ent0
0897
0977
0995
0892
0897
0906
0902
0895
099
40993
yeast-0
-3-5-9
vs7-8
046
70463
060
9046
4046
60550
044
50485
044
4060
3yeast-0
-2-5-6
vs3-7-8-9
0762
0590
0762
0793
0762
0792
0762
0787
0706
079
8vowel0
0920
0969
1000
0913
0944
0929
0921
0914
1000
1000
led7digit-0
-2-4-5-6-7-8-9
vs1
0853
0874
0818
0869
0863
0870
0845
0850
0846
0874
Glass2
0578
0472
0470
0564
0578
0610
0586
0568
000
00607
cleveland
-0vs
4089
90724
0722
0936
0611
0863
0866
0861
0866
0866
glass-0-1-6
vs5
0932
0877
0810
0932
0932
0872
0872
0869
0814
0877
car-good
000
0000
00558
0794
0520
0783
000
00778
0561
0813
flare-F
0751
000
00450
0781
0541
064
70738
0781
000
00774
car-vgoo
d0496
0888
0626
0986
0917
0925
0481
0971
0957
098
8abalon
e-17
vs7-8-9-10
0628
0321
0506
064
00319
0368
0627
0630
0293
066
4Mean
0687
064
10703
0779
0696
0753
0680
0769
064
50814
10 Mathematical Problems in Engineering
Table8Re
sults
ofotherm
etho
dsin
term
sofA
GeoMean
Datas
etNB
C45
KNN
Cost-N
BAd
a-NB
Ada-Cost-N
BBa
g-NB
Bag-Cost-NB
Stacking
-log
Stacking
-Cost-log
wisc
onsin
0961
0960
0961
0962
0963
0962
0962
0960
0967
096
7pima
076
80743
0706
0734
0767
0736
0775
0729
0770
0726
haberm
an064
20680
0601
0718
066
00694
0653
0721
0613
0663
Vehicle
10690
0721
0728
0658
0722
0669
0690
0654
0709
0740
Vehicle
0066
60929
0934
0658
0681
0663
066
40663
0924
093
6segm
ent0
0859
0986
0996
0850
0859
0871
0863
0854
0995
099
5yeast-0
-3-5-9
vs7-8
0715
0706
0782
0709
0713
0737
0703
0721
0700
0770
yeast-0
-2-5-6
vs3-7-8-9
0856
0778
0855
0857
0856
0855
0856
0854
0839
086
6vowel0
0929
0981
1000
0909
0966
0948
0931
0910
1000
1000
led7digit-0
-2-4-5-6-7-8-9
vs1
0853
0874
0818
0869
0863
0870
0845
0850
0846
0874
Glass2
0511
0701
0695
0493
0511
0551
0539
0515
000
00744
cleveland
-0vs
4092
70845
0841
0943
0783
0914
0918
0910
0918
0918
glass-0-1-6
vs5
095
40932
0894
0954
0954
0923
0923
0919
0902
0932
car-good
000
0000
00763
0878
0747
0870
000
00873
0769
089
0flare-F
0840
000
00705
0842
0751
0795
0836
0842
000
0084
5car-vgoo
d0743
0934
0793
0986
0953
0956
0725
0980
0974
099
0abalon
e-17
vs7-8-9-10
0736
0655
0745
0716
0650
0672
0734
0711
064
1080
4Mean
0744
0731
0813
0808
0788
0805
0742
08039
0739
086
2
Mathematical Problems in Engineering 11
Table 9 Comparison between RECSG and other algorithms in terms of winslosses of the results
NB C45 KNN Cost-NB Ada-NB Ada-Cost-NB Bag-NB Bag-Cost-
NB Stacking-log
Stacking-Cost-log(AUC)
11 [5]6 [3] 15 [11]2 [2] 17 [15]0 [2] 11 [6]6 [5] 13 [7]4 [2] 14 [9]3 [1] 10 [6]7 [5] 10 [6]7 [5] 11 [3]6 [2]
Stacking-Cost-log(GeoMean)
15 [13]2 [2] 17 [13]0 [0] 15 [13]2 [1] 14 [8]3 [2] 16 [12]1 [1] 16 [10]1 [0] 17 [13]0 [0] 16 [10]3 [0] 16 [12]1 [0]
Stacking-Cost-log(AGeoMean)
14 [10]3 [1] 15 [9]2 [2] 15 [12]2 [1] 13 [7]4 [3] 15 [11]2 [2] 15 [10]2 [2] 16 [10]1 [1] 15 [8]2 [1] 16 [10]1 [1]
Table 10 Holm post hoc test to show differences of Stacking-Cost-log (AUC) and other algorithms
Rejected (1198670) Accepted (1198670)Stacking-Cost-log (AUC) versus other algorithms 3 6Stacking-Cost-log (GeoMean) versus other algorithms 8 1Stacking-Cost-log (AGeoMean) versus other algorithms 9 0
to the evaluation metric of GeoMean the RECSG is signifi-cantly better than other 8 methods with the confidence levelof 120572 = 005
44 Discussion According to the experiments performedwith 17 different imbalance data sets and the comparisonwith other 9 classification methods the proposed RECSGmethod showed the higher performance than other methodsin terms of AUC GeoMean and AGeoMean as illustratedin Tables 7 8 and 9 The RECSG method showed the betterperformance for 10 of 17 cases in terms of AUC and for 12 of17 cases in terms of GeoMean and AGeoMean The methodoutperformed other 5 ensemble algorithms for 16 of 17 datasets in terms ofGeoMean andAGeoMean and for 13 of 17 datasets in terms of AUC GeoMean and AGeoMean are betterthan AUC in evaluation metrics in 17 data sets The means ofRECSGmethod in terms of GeoMean AGeoMean and AUCare all higher than other methods The Holm post hoc testshows that RECSGmethod significantly outperforms the restwith the confidence level 120572 = 005 in terms of AGeoMean for8 of 9methods in terms of AGeoMean and for 3 of 9methodsin terms of AUC
Experimental results and statistical tests show that theRECSG approach has improved the classification perfor-mance of imbalanced data sets The reasons can be explainedas follows
Firstly stacking algorithm uses to combine the resultof base classifier whereas bagging employs majority voteBagging is only a simple decision combinationmethodwhichrequires neither cross-validation nor Level 1 learning In thispaper stacked generalization adopts logistic regressionthus providing the simplest linear combination of pooling theLevel 0 modelsrsquo confidence
Secondly cost-sensitive algorithm can affect imbalancedata sets in two aspects Firstly cost-sensitive modificationscan be applied in the probabilistic estimationMoreover cost-sensitive factors change theweight of instance and theweight
of the minority class is higher than that of the majorityclass Therefore the method directly changes the number ofcommon instances without discarding or duplicating any ofthe rare instances
Thirdly RECSG method introduces cost-sensitive learn-ing into stacking ensemble The cost-sensitive function in replaces the error-minimizing function thus allowing Level 1learning to be prone to focus on the minority class
The results in Tables 7 and 8 demonstrate that the RECSGmethod has the higher performancewhen the evaluation per-formance metric of base classifier is weaker (such as Vehicle1Vehicle0 and car-vgood) The reason is that the alternativemethods of NB C45 and 119896-NN have shortcomings Theindependence assumption hampers the performance of NBin some data sets In C45 tree construction the selectionof attributes affects the model performances Error ratio of119896-NN is relatively high when data sets are in imbalanceTherefore logistic regression adopted in can improve theperformance when base classifier is weaker
The performance of the RECSG method is generallybetter when the IR is low (such as Glass2 car-good flare-Fcar-vgood and abalone-17 vs 7-8-9-10) The performance ofRECSG method is probably related to the setting of the costmatrix Different data sets should use different cost matricesFor the purpose of simplicity in the paper we have adoptedthe same cost matrices whichmay bemore suitable to low IRvalues
5 Conclusions
In this paper in order to solve the class imbalance problemwe proposed the RECSG method based on 2-layer learningmodels Experimental results and statistical tests showedthat the RECSG approach improved the classification per-formance The proposed RECSG approach may have therelatively high computational complexity in the trainingstage because the approach involves 2-layer classifier models
12 Mathematical Problems in Engineering
which consist of several base classifiers and metaclassifiersThe number and kinds of the base-level classifiers areclosely related to the performance of stacking algorithmIn the fusion stage of base classifiers the selection of themetaclassifier is also important In this paper in order tovalidate the performance improvement compared with othercurrent classification algorithms we only randomly selected3 classification algorithms (NB 119896-NN and C45) as baseclassifier and cost-sensitive algorithm as metaclassifier inthe fuse stage Selection of the number or kind of the baseclassifiers and metaclassifier was not discussed Thereforewe will explore the diversity and quality of base classifier inthe future The adoption of these strategies would improvethe prediction performance and reduce the training time ofstacking algorithm for imbalance problems
Conflicts of Interest
The author declares that there are no conflicts of interestregarding the publication of this paper
Acknowledgments
This research is supported by the Natural Science Foundationof Shanxi Province under Grant no 2015011039
References
[1] O Loyola-Gonzalez J F Martınez-Trinidad J A Carrasco-Ochoa and M Garcıa-Borroto ldquoEffect of class imbalance onquality measures for contrast patterns an experimental studyrdquoInformation Sciences vol 374 pp 179ndash192 2016
[2] C Beyan and R Fisher ldquoClassifying imbalanced data setsusing similarity based hierarchical decompositionrdquo PatternRecognition vol 48 no 5 pp 1653ndash1672 2015
[3] H He and E A Garcia ldquoLearning from imbalanced datardquo IEEETransactions on Knowledge and Data Engineering vol 21 no 9pp 1263ndash1284 2009
[4] V C Chawla K W Bowyer L O Hall and W P KegelmeyerldquoSMOTE synthetic minority over-sampling techniquerdquo Journalof Artificial Intelligence Research vol 16 pp 321ndash357 2002
[5] X-Y Liu J X Wu and Z-H Zhou ldquoExploratory under-sampling for class-imbalance learningrdquo in Proceedings of the 6thInternational Conference on Data Mining (ICDM rsquo06) pp 965ndash969 IEEE Hong Kong December 2006
[6] N Japkowicz ldquoThe class imbalance problem significance andstrategiesrdquo in Proceedings of the International Conference onArtificial Intelligence 2000
[7] B Zadrozny and C Elkan ldquoLearning and making decisionswhen costs and probabilities are both unknownrdquo in Proceedingsof the the seventh ACM SIGKDD international conference pp204ndash213 San Francisco Calif USA August 2001
[8] M Galar A Fernandez E Barrenechea H Bustince andF Herrera ldquoA review on ensembles for the class imbalanceproblem bagging boosting and hybrid-based approachesrdquoIEEE Transactions on Systems Man and Cybernetics Part CApplications and Reviews vol 42 no 4 pp 463ndash484 2012
[9] N V Chawla D A Cieslak L O Hall and A Joshi ldquoAuto-matically countering imbalance and its empirical relationship
to costrdquo Data Mining and Knowledge Discovery vol 17 no 2pp 225ndash252 2008
[10] A Freitas A Costapereira and P Brazdil ldquoCost-SensitiveDecision Trees Applied to Medical Datardquo in InternationalConference on Data Warehousing and Knowledge Discovery pp303ndash312 2007
[11] L Breiman ldquoBagging predictorsrdquoMachine Learning vol 24 no2 pp 123ndash140 1996
[12] R E Schapire ldquoThe strength of weak learnabilityrdquo MachineLearning vol 5 no 2 pp 197ndash227 1990
[13] Y Freund and R E Schapire ldquoA decision-theoretic generaliza-tion of on-line learning and an application to boostingrdquo Journalof Computer and System Sciences vol 55 no 1 part 2 pp 119ndash139 1997
[14] N V Chawla A Lazarevic L O Hall and K W BowyerldquoSMOTEBoost improving prediction of the minority class inboostingrdquo in Proceeding of Knowledge Discovery in Databasesvol 2838 pp 107ndash119
[15] C Seiffert T M Khoshgoftaar J VanHulse and A NapolitanoldquoRUSBoost A hybrid approach to alleviating class imbalancerdquoIEEE Transactions on Systems Man and Cybernetics Systemsvol 40 no 1 pp 185ndash197 2010
[16] S Hu Y Liang L Ma and Y He ldquoMSMOTE Improvingclassification performance when training data is imbalancedrdquoin Proceedings of the 2nd International Workshop on ComputerScience and Engineering WCSE 2009 pp 13ndash17 China October2009
[17] H Guo and H L Viktor ldquoLearning from imbalanced data setswith boosting and data generationrdquoACMSIGKDDExplorationsNewsletter vol 6 no 1 pp 30ndash39 2004
[18] SWang and X Yao ldquoDiversity analysis on imbalanced data setsby using ensemble modelsrdquo in Proceedings of the 2009 IEEESymposium on Computational Intelligence and Data MiningCIDM 2009 pp 324ndash331 USA April 2009
[19] R Barandela J S Srsquoanchez and R M Valdovinos ldquoNewapplications of ensembles of classifiersrdquo PAA Pattern Analysisand Applications vol 6 no 3 pp 245ndash256 2003
[20] J Błaszczynski M Deckert J Stefanowski and S WilkldquoIntegrating selective pre-processing of imbalanced data withIvotes ensemblerdquo Lecture Notes in Computer Science (includingsubseries Lecture Notes in Artificial Intelligence and Lecture Notesin Bioinformatics) Preface vol 6086 pp 148ndash157 2010
[21] W Fan S J Stolfo J Zhang and P K Chan ldquoAdacostMisclassication cost-sensitive boostingrdquo in the 6th Int ConfMach Learning pp 97ndash105 San Francisco CA USA 1999
[22] K M Ting ldquoA comparative study of cost-sensitive boostingalgorithmsrdquo in Proc 17th Int Conf Mach Learning pp 983ndash990 Stanford CA USA 2000
[23] M Joshi V Kumar and R Agarwal ldquoEvaluating boosting algo-rithms to classify rare classes comparison and improvementsrdquoinProceedings of the 2001 IEEE International Conference onDataMining pp 257ndash264 San Jose CA USA
[24] Y Sun M S Kamel A K C Wong and Y Wang ldquoCost-sensitive boosting for classification of imbalanced datardquo PatternRecognition vol 40 no 12 pp 3358ndash3378 2007
[25] D H Wolpert ldquoStacked generalizationrdquo Neural Networks vol5 no 2 pp 241ndash259 1992
[26] Y Chen M L Wong and H Li ldquoApplying Ant Colony Opti-mization to configuring stacking ensembles for data miningrdquoExpert Systems with Applications vol 41 no 6 pp 2688ndash27022014
Mathematical Problems in Engineering 13
[27] H Kadkhodaei and A M E Moghadam ldquoAn entropy basedapproach to find the best combination of the base classi-fiers in ensemble classifiers based on stack generalizationrdquoin Proceedings of the 4th International Conference on ControlInstrumentation and Automation ICCIA 2016 pp 425ndash429Iran January 2016
[28] I Czarnowski and P Jędrzejowicz ldquoAn approach to machineclassification based on stacked generalization and instanceselectionrdquo in Proceedings of the IEEE International Conferenceon Systems Man and Cybernetics SMC 2016 pp 4836ndash4841Hungary 2017
[29] S Kotsiantis ldquoStacking cost sensitive modelsrdquo in Proceedings ofthe 12th Pan-Hellenic Conference on Informatics PCI 2008 pp217ndash221 Greece August 2008
[30] H Y Lo J C Wang H MWang and S D Lin ldquoCost-sensitivestacking for audio tag annotation and retrievalrdquo in Proceedingsof the 36th IEEE International Conference on Acoustics Speechand Signal Processing ICASSP 2011 pp 2308ndash2311 CzechRepublic May 2011
[31] JHuang andCX Ling ldquoUsingAUCand accuracy in evaluatinglearning algorithmsrdquo IEEETransactions on Knowledge andDataEngineering vol 17 no 3 pp 299ndash310 2005
[32] M Kubat and S Matwin ldquoAddressing the curse of imbalancedtraining sets one sided selection inrdquo in Proceedings of the 14thInternational Conference on Machine Learning (ICMLrsquo97 pp179ndash186 1997
[33] R Batuwita and V Palade ldquoAdjusted geometric-mean A novelperformance measure for imbalanced bioinformatics datasetslearningrdquo Journal of Bioinformatics and Computational Biologyvol 10 no 4 Article ID 1250003 2012
[34] L Breiman ldquoStacked regressionsrdquoMachine Learning vol 24 no1 pp 49ndash64 1996
[35] M Leblanc and R Tibshirani ldquoCombining estimates in regres-sion and classificationrdquo Journal of the American StatisticalAssociation vol 91 no 436 pp 1641ndash1650 1996
[36] B Cestnik ldquoEstimating Probabilities A Crucial Task inMachine Learningrdquo in Proceedings of the European Conferenceon Artificial Intelligence pp 147ndash149 1990
[37] J R Quinlan C45 Program for machine learning MorganKaufmann 1993
[38] N S Altman ldquoAn introduction to kernel and nearest-neighbornonparametric regressionrdquo The American Statistician vol 46no 3 pp 175ndash185 1992
[39] J R Quinlan ldquoInduction of decision treesrdquo Machine Learningvol 1 no 1 pp 81ndash106 1986
[40] C Elkan ldquoThe foundations of cost-sensitive learningrdquo in Pro-ceedings of the 17th International Joint Conference on ArtificialIntelligence (IJCAI rsquo01) pp 973ndash978 New York NY USAAugust 2001
[41] D Michie D Spiegel halter J C Taylor et alMachine learningneural and statistical classification Ellis Horwood neural andstatistical classification 1995
[42] K M Ting and I F Witten ldquoIssues in stacked generalizationrdquoArtif Intell Res vol 10 no 1 pp 271ndash289 1999
[43] M Hall E Frank G Holmes B Pfahringer P Reutemann andI H Witten ldquoThe WEKA data mining software an updaterdquoACM SIGKDD Explorations Newsletter vol 11 no 1 pp 10ndash182009
[44] A Frank and A Asuncion UCI Machine Learning RepositoryUniversity of California School of Information and ComputerScience Irvine CA USA httparchiveicsucieduml
[45] J Alcala-Fdez A Fernandez J Luengo et al ldquoKEEL data-mining software tool data set repository integration ofalgorithms and experimental analysis frameworkrdquo Journal ofMultiple-Valued Logic and Soft Computing vol 17 no 2-3 pp255ndash287 2011
[46] S Garcıa A Fernandez J Luengo and F Herrera ldquoA study ofstatistical techniques and performance measures for genetics-based machine learning accuracy and interpretabilityrdquo SoftComputing vol 13 no 10 pp 959ndash977 2009
[47] S Holm ldquoA simple sequentially rejective multiple test proce-durerdquo Scandinavian Journal of Statistics vol 6 no 2 pp 65ndash701979
Hindawiwwwhindawicom Volume 2018
MathematicsJournal of
Hindawiwwwhindawicom Volume 2018
Mathematical Problems in Engineering
Applied MathematicsJournal of
Hindawiwwwhindawicom Volume 2018
Probability and StatisticsHindawiwwwhindawicom Volume 2018
Journal of
Hindawiwwwhindawicom Volume 2018
Mathematical PhysicsAdvances in
Complex AnalysisJournal of
Hindawiwwwhindawicom Volume 2018
OptimizationJournal of
Hindawiwwwhindawicom Volume 2018
Hindawiwwwhindawicom Volume 2018
Engineering Mathematics
International Journal of
Hindawiwwwhindawicom Volume 2018
Operations ResearchAdvances in
Journal of
Hindawiwwwhindawicom Volume 2018
Function SpacesAbstract and Applied AnalysisHindawiwwwhindawicom Volume 2018
International Journal of Mathematics and Mathematical Sciences
Hindawiwwwhindawicom Volume 2018
Hindawi Publishing Corporation httpwwwhindawicom Volume 2013Hindawiwwwhindawicom
The Scientific World Journal
Volume 2018
Hindawiwwwhindawicom Volume 2018Volume 2018
Numerical AnalysisNumerical AnalysisNumerical AnalysisNumerical AnalysisNumerical AnalysisNumerical AnalysisNumerical AnalysisNumerical AnalysisNumerical AnalysisNumerical AnalysisNumerical AnalysisNumerical AnalysisAdvances inAdvances in Discrete Dynamics in
Nature and SocietyHindawiwwwhindawicom Volume 2018
Hindawiwwwhindawicom
Dierential EquationsInternational Journal of
Volume 2018
Hindawiwwwhindawicom Volume 2018
Decision SciencesAdvances in
Hindawiwwwhindawicom Volume 2018
AnalysisInternational Journal of
Hindawiwwwhindawicom Volume 2018
Stochastic AnalysisInternational Journal of
Submit your manuscripts atwwwhindawicom
![Page 2: Classifying Imbalanced Data Sets by a Novel RE-Sample and ...downloads.hindawi.com/journals/mpe/2018/5036710.pdf · Cost-Sensitive Stacked Generalization Method ... In this paper,](https://reader035.vdocuments.net/reader035/viewer/2022070801/5f02a4577e708231d4054843/html5/thumbnails/2.jpg)
2 Mathematical Problems in Engineering
In boosting-based ensembles the data preprocessingtechnique is embedded into boosting algorithms In each iter-ation the data distribution is changed by altering the weightto train the next classifier towards the positive class Thesealgorithmsmainly include SMOTEBoost [14] RUSBoost [15]MSMOTEBoost [16] and DataBoost-IM [17] algorithms Inbagging-based ensembles the algorithms combine baggingwith data preprocessing techniques The algorithms are usu-ally simpler than their integration in boosting because ofsimplicity and good generalization ability of bagging Thefamily includes but not limited to OverBagging Under-OverBagging [18] UnderBagging [19] and IIVotes [20]In cost-sensitive-boosting ensembles the general learningframework of AdaBoost is maintained but amisclassificationcost adjustment function is introduced into the weightupdating formula These ensembles are usually different inthemodification way of the weight update rule AdaCost [21]CSB1 CSB2 [22] RareBoost [23] AdaC1 AdaC2 and AdaC3[24] are the most representative approaches Unlike previousalgorithms hybrid ensemble methods adopt the doubleensemble learningmethods For example EasyEnsemble andBalanceCascade use bagging as the main ensemble learningmethod but each base classifier is trained by AdaBoostTherefore the final classifier is an ensemble of ensembles
Stacking algorithm is another ensemble method andhas the same base classifiers to the Bagging and AdaBoostalgorithms but the structure is different Stacking algorithmhas a two-level structure consisting of Level 0 classifiers andLevel 1 classifiers Stacking algorithm involves two steps Thefirst step is to collect the output of each model into a newset of data For each instance in the original training set thedata set represents every modelrsquos prediction of that instancersquosclass and the models are base classifiers In the second stepbased on the new data and true labels of each instance in theoriginal training set a learning algorithm is employed to trainthe second-layer model In Wolperts terminology the firststep is referred to as the Level 0 layer and the second-stagelearning algorithm is referred to as the Level 1 layer [25]
Stacking ensemble is a general method in which a high-level model is combined with the lower-level models Stack-ing ensemble can achieve the higher predictive accuracyChen et al adopted ant colony optimization to configurestacking ensembles for data mining [26] Kadkhodaei andMoghadam proposed an entropy-based approach to searchfor the best combination of the base classifiers in ensembleclassifiers based on stack generalization [27] Czarnowski andJędrzejowicz focused on the machine classification with datareduction based on stacked generalization [28] Most of theprevious studies were focused on the way to use or generatestacking algorithm However the stacking ensemble does notconsider data distribution and is suitable for the commondata sets other than imbalance data
In order to solve the imbalance problem the paper intro-duces cost-sensitive learning into stacking ensemble and addsamisclassification cost adjustment function into theweight ofinstance and classifier In this waymisclassification costsmaybe considered in the data set as a form of data space weightingto select the best distribution for training On the other handin the combine stage metatechniques can be integrated with
cost-sensitive classifiers to replace standard cost-minimizingtechniques The weights for the misclassification of positiveinstances are higher and the weights for misclassification ofnegative instance are relatively lower The method providesan option for imbalanced learning domains
In this paper RE-sample and Cost-Sensitive StackedGeneralization (RECSG) method is proposed in order tosolve the imbalance problem In the method preprocessedimbalance data are used to train the Level 0 layer modelUnlike common ensemble algorithm for imbalanced datathe proposed method utilized cost-sensitive algorithm asthe Level 1 (meta)layer Stacking methods combined withimbalance data approaches including cost-sensitive learninghad been reported Kotsiantis proposed a stacking variantmethodologywith cost-sensitivemodels as base learners [29]In the method the model tree was replaced by MLR inmetalayer to determine the class with the highest probabilityassociated with the true class Lo et al proposed a cost-sensitive stacking method for audio tag annotation andretrieval [30] In these methods cost-sensitive learners areadopted in the base-layer model and the metalayer modelis trained by other learning algorithms such as SVM anddecision tree In this paper Level 0model generalizer involvesresampling the data and training the base classifier The cost-sensitive algorithm is used to train the Level 1 metalayerclassifier The two layers adopt the imbalanced data algo-rithms and take the full advantages of mature methods Level1 layer model has a bias towards the performance of theminority class Therefore the method proposed in the studyis more efficient than other methods in which cost-sensitivealgorithms are only used in the Level 0 layer
The method was compared with common classificationmethods including other ensemble algorithms Additionallythe evaluation metrics of the algorithm were analyzed basedon the results of statistical tests Statistical tests of evaluationmetrics demonstrated that the proposed new approach couldeffectively solve the imbalanced problem
The paper is structured as follows Related ensembleapproaches and cost-sensitive algorithms are introduced inSection 2 Section 3 introduces the details of proposedRECSG approach including Level 0 and Level 1 modelgeneralizers In Section 4 experiments and correspondingresults and analysis are presented Statistical tests of eval-uation metrics of algorithm performance are analyzed anddiscussed Finally Section 5 discusses the advantages anddisadvantages of the proposed method
2 Background
21 Performance Evaluation in Imbalanced Domains Theevaluation metric is a vital factor for the classifier model andperformance assessment In Table 1 the confusion matrixdemonstrates the results of incorrectly and correctly classifiedinstances of each class in the two classes of problems
Accuracy is the most popular evaluation metric How-ever it cannot effectively measure the correct rates of allthe classes so it is not an appropriate metric for imbalancedata sets For this reason in addition to accuracy moresuitable metrics should be considered in the imbalance
Mathematical Problems in Engineering 3
Table 1 Confusion matrix for performance evaluation
Positive prediction Negative predictionPositive class True positive (TP) False negative (FN)Negative class False positive (FP) True negative (TN)
problem Other metrics have been proposed to measurethe classification performance independently Based on theconfusion matrix (Table 1) these metrics are defined as
Precision = tp(tp + fp)
Recall = tp(tp + fn)
119865-Measure = (1 + 120573)2 lowast Recall lowast Precision1205732 lowast Recall + Precision
(1)
where 120573 is a coefficient to adjust the relative importance ofprecision versus recall (usually 120573 = 1)
FPrate = FP(FP + TN)
TPrate = TP(TP + FN)
(2)
The used combined evaluation metrics of these measuresinclude receiver operating characteristic (ROC) graphic [31]the area under the ROC curve (AUC) [2] average geometricmean of sensitivity and specificity (GeoMean) [32] (see (4))and average adjusted geometric mean (AGeoMean) [33] (see(5)) where 119873119899 refers to the proportion of the majoritysamples These metrics are defined as
AUC = (1 + TPrate minus FPrate)2 (3)
GeoMean = radicTPrate lowast TNrate (4)
AGeoMean
=
GeoMean + TNrate lowast 1198731198991 + 119873119899
TPrate gt 00 TPrate = 0
(5)
22 Stacking Bagging and AdaBoost are the most commonensemble learning algorithms In bagging method differentbase classifier models generate different classification resultsand the final decision is decided by majority voting InAdaBoost algorithm a series of base weak classifiers aretrained with the whole training set and the final decisionis generated by a weighted majority voting scheme In eachround of training iteration different weights are attributed toeach instance In the two algorithms the base classifiers arethe same
Stacking ensemble is another ensemble algorithm inwhich the prediction result of base classifiers is used as the
attribute to train the combination function in metalayer clas-sifier [25] The algorithm has a two-level structure consistingof Level 0 classifiers and Level 1 classifiers It was proposedby Wolpert and used by Breiman [34] and Leblanc andTibshirani [35]
Set a data set
119863 = (119883119899 119884119899) 119899 = 1 119873 (6)
where 119883119899 is a vector representing the attribute values ofthe instance 119899 and 119884119899 is the class value All 119873 instancesare randomly split into 119895 equivalent parts and 119895-fold cross-validation is used to train the model The prediction resultsof every time testing set gives a vector which includes 119899instances
1199101 1199102 1199101198991015840 (7)
Set 119872119895 is the difference base learning algorithm modelobtained with the data set 119863 119895 = 1 119896 and119872119895 is the partof Level 0 models Level 0 layer consists of 119895 base classifierswhich are employed to estimate the class probability of eachinstance
In119872119895 119895-fold cross-validation of training data set119863 givesthe prediction result
119910(119895)1 119910(119895)2 119910(119895)119899 119895 = 1 119896 (8)
All the119872119895 prediction results generate input space of Level1 model and the real value of instance is treated as the outputspace The model is expressed as follows
Input output
119910(1)1 119910(2)1 119910(119894)1 119910(119896)1 1199101119910(1)2 119910(2)2 119910(119894)2 119910(119896)2 1199102
119910(1)119899 119910(2)119899 119910(119894)119899 119910(119896)119899 119910119899
(9)
The above intermediate data are considered as the train-ing data of the Level 1 layer model The input data aretreated as features and the real value of instance is treatedas the output space The next step is to train the data withsome fused leaning algorithms The process is called Level 1generalizer and Level 1 model is denoted by which can beregarded as the function of (119910(1) 119910(2) 119910(119894) 119910(119896))
In the stacking process Level 0 (119872119895 119895 = 1 119896 ) iscombined with Level 1 () Given a new instance 1199091015840 models119872119895 produce a prediction result vector
119910(1) 119910(2) 119910(119894) 119910(119896) 119894 = 1 119896 (10)
Then model is used to combine base classifiers andpredict the final result of 1199091015840
In this paper for imbalance data we propose a stackedgeneralization based on cost-sensitive classification In Level0 model generalizer layer we firstly resample the imbal-ance data Resampling approaches include oversampling and
4 Mathematical Problems in Engineering
Training dataset
andtest dataset
Level 0 (base)-layer
resample(oversample) (undersample) (SMOTE)
Level 0 (base)-layerclassification middot middot middot
middot middot middot
middot middot middot
Ensemble spaceLevel 1 (meta)layerclassification
New data2New data3New data1
LMGJF1 LMGJF2 LMGJFk
M1M2 Mk
Decision space of M2 Decision space of Mk
MGN
Decision space of M1
u1 xiNi=1 u1 x
i
N
i=1( ) ( ) u2 xi
Ni=1 u2 x
i
N
i=1( ) ( ) uk xi
Ni=1 uk x
i
N
i=1( ) ( )
yiGN
N
i=1
D = xi yi
N
i=1( )
DNMN = xi
N
i=1( )
DGN = u1 xi yiNi=1 D
GN = u1 x
i yi
N
i=1 ( ) ( )
Figure 1 Flowchart of the RECSG architecture
undersampling Secondly Model119872119895 119895 = 1 119896 is trainedby the classification with new data and the prediction output119910(119895)119894 by 119895 = 1 119896 is produced with original data In Lever1 model generalizer layer model is generated by cost-sensitive classification based on logistic regression
3 Stacked Generalization forthe Imbalance Problem
The proposed RECSG architecture includes two layers Thefirst layer (Level 0) consisting of classifiers ensemble is calledthe base layer the second layer (Level 1) combined withbase classifiers is called the metalayer The flowchart of thearchitecture is shown in Figure 1
31 Level 0 Model Generalizers For the imbalance prob-lem Level 0 model generalizer step of RECSG includespreprocessing data and training the base classifier Firstlythe oversampling (SMOTE) method has been used in datapreprocessing of base classifier Secondly the base classifiermodel is trained with the new data set In this level weemployed three base algorithms Naıve Bayes (NB) [36]decision tree C45 [37] and 119896-nearest neighbors (119896-NN) [38]
(1) Naıve Bayes (NB) Given instance 119909 set 119875(119894 | 119909) is theposterior probability of class 119894 then
119875119894 (119909) =119875 (119894 | 119909)Σ119894119875 (119894 | 119909)
(11)
where NB uses an Laplacian estimate for estimating theconditional probabilities
Mathematical Problems in Engineering 5
Table 2 Cost matrix
Actual negative Actual positivePredict negative 119862 (0 0) = 11988800 119862 (0 1) = 11988801Predict positive 119862 (1 0) = 11988810 119862 (1 1) = 11988811
Table 3 Cost matrix of credit data sets
Actual bad Actual goodPredict bad 0 1Predict good 5 0
(2) Decision Tree (C45) Decision tree classification is amethod usually used in data mining [39] A decision tree isa tree where each input feature labels a nonleaf node oneclass or probability over the class labels each leaf node of thetree By recursive partitioning a tree can be ldquolearnedrdquo untilno prediction value is added or the subset has the same valueWhen the parameter of training data selects informationentropy the decision tree is named C45
(3) 119896-NN k-NN is a nonparametric algorithm used forregression or classification An instance is classified by amajority vote of its 119896-nearest neighbors If 119896 = 1 the instanceis simply classified in accordance with the class label of thenearest neighbor
All the above algorithms are simple and have low com-plexity so they are applicable weak base classifiers
32 Level 1 Model Generalizers Prediction results of severalbase classifiers in Level 0 layer are used as input space andtrue value class of instance is used as output space Based onLevel 0 layer Level 1 layer model is trained by other learningalgorithms For imbalanced data the cost-sensitive algorithmis used to train the Level 1 metalayer classifier in the paper
321 Cost-Sensitive Classifier Aiming at imbalanced datalearning problem in cost-sensitive classifier different costmatrices are used as the penalty ofmisclassified instance [40]For example a cost matrix 119862 has the following structure in abinary classification scenario in Table 2
In the cost matrix the row indicated alternative predictedclasser whereas the column indicates actual classes The costof false negative is notated as 11988801 and the cost of false positiveis notated as 11988810 Conceptually the cost of correctly classifiedinstances should always be less than the cost of incorrectlyclassified instances In the imbalance problem 11988810 is alwaysgreater than 11988801 For the German credit data set previouslyreported as the part of the Stalog project [41] the cost matrixis provided in Table 3
The cost of false good prediction is greater than the costof false bad prediction in the view of economical reason Soin Level 1 classifier of stacking for the imbalance problem thecost-sensitive algorithm is adopted
322 Logistic Regression Ting and Witten illustrated thatMLR (Multiresponse Linear Regression) algorithm had an
advantage over other Level 1 generalizers in stacking [42]In this paper the logistic regression classifier is used as themetalayer algorithm Based on the metalayer the cost of mis-classification is considered in the cost-sensitive algorithm Inlogistic regression classifier the prediction result of Level 0layer is used as the attribute of Level 1 metalayer and the realvalue of instance is used as output spaceThe linear regressionfor class 119894 is simply obtained as
LR119894 (119909) =119870
sum119895=1
120572119895119894119875119895119894 (119909) 119895 = 1 119896 (12)
Details of the implementation of RECSG are presentedbelow
33 Algorithm Pseudocode Pseudocode 1 presents the pro-posed RECSG approach Input parameters are two data setstraining set and testing set Output predicts class labelsof the test samples The first step in the process is datapreprocessing resample new instances and train model119872119895 (119895 = 1 119896) where 119895 is the number of base classifiersand 119906119895(119909119894) is the generated function of 119909119894 based on model119872119895(lines (1)-(2) in Pseudocode 1)
Then the metalayer model is constructed based onthe data 119863meta which is firstly predicted with Level 0 layer(line (3) in Pseudocode 1) Finally Lever 1 layer classification(cost-sensitive and logistic regression) is used to predict theultimate value of the tested samples
4 Empirical Investigation
The experiment aims to verify whether the RECSG approachcan improve the classification performance for the imbalanceproblem In this paper the RECSG approach was comparedwith other algorithms involving various combination ensem-ble techniques and imbalanced approaches For eachmethodthe same training set testing set and validation set were used
The experimental system has been implemented and iscomposed of 7 learning methods implemented in Weka [43]namely Naıve Bayes(1) (NB) C45(j48)(2) 119896-nearest neigh-bor (k-NN)(3) cost-sensitive(4) AdaBoost(5) Bagging(6) andstacking(7) The brief description the standard version andparameters of these methods are shown in Table 4
The RECSG approach includes 2 layers Level 1 layer(metalearning) system consists of 2 learning methods imple-mented inWeka namely simple logistic regression(9) (MLP)and cost-sensitive classifier(4) MLP is the base classificationalgorithm of cost-sensitive classifier Level 0 layer (base -learning) of stacking is composed of 3 classifiers NaıveBayes(1) C45(2) and 119896-NN(3) Evaluation metrics of algo-rithmperformance includeAUCGeoMean andAGeoMean
41 Experimental Settings Experiments were implementedwith 17 data sets from the UCI Machine Learning Repository[44] These data sets cover various fields and are based onIR measure values (from 054 to 0014) unique data setnames a varying number of samples (from 173 to 2338)
6 Mathematical Problems in Engineering
Input Training set119863 = (119909119894 119910119894)119873119894=1 Test dataset119863test = (1199091015840119894 )1198731015840
119894=1
Output Predict class labels of the test samples 1199101015840119894meta1198731015840
119894=1
For each119895 = 1 2 119870 do(1) Resample imbalance data and generate 119895ndashfold cross-validation sets to obtain New data119895(2) Train and compute 119906119895(119909119894)119873119894=1 and 119906119895(1199091015840119894 )119873
1015840
119894=1 in Level-0 (base)-layer classifier119872119895end(3) Construct119863meta = 1199061(119909119894) 119910119894119873
1015840
119894=1 and1198631015840meta = 1199061(1199091015840119894 ) 1199101198941198731015840
119894=1
(4) Based on the data119863meta classification (cost-sensitive and Logistic Regression) is used togenerate Level-1 (meta)-layer model through with1198631015840meta to predict 1199101015840119894meta119873
1015840
119894=1
Pseudocode 1 Pseudocodes of RE-sample and Cost-Sensitive Stacked Generalization
Table 4 Base classifier methods and corresponding ensemble algorithms
Methods Descriptions and parameters Abbreviations(1) Naıve Bayes wekaclassifiersbayesNaiveBayes NB(2) C45 wekaclassifierstreesJ48 -C 025 -M 2 C45
(3) 119896-nearest neighborwekaclassifierslazyIBk -K 1 -W 0 -A
wekacoreneighboursearchLinearNNSearch -AwekacoreEuclideanDistance -R first-last
KNN
(4) Cost-sensitive with Naıve Bayes wekaclassifiersmetaAdaBoostM1 -P 100 -S 1 -I 10-W wekaclassifiersbayesNaiveBayes Cost-NB
(5) AdaBoost with Naıve Bayes wekaclassifiersmetaAdaBoostM1 -P 100 -S 1 -I 10-W wekaclassifiersbayesNaiveBayes Ada-NB
(6) AdaBoost with cost-sensitive and Naıve Bayes
wekaclassifiersmetaAdaBoostM1 -P 100 -S 1-I10-W wekaClassifiersmetaCostSensitiveClassifier-- -cost-matrix [10 30 10 10] -S 1 -W weka
classifiersbayesNaiveBayes
Ada-Cost-NB
(7) Bagging with Naıve BayeswekaclassifiersmetaBagging -P 100 -S 1
-num-slots 1 -I 10 -WwekaclassifiersbayesNaiveBayes
Bag-NB
(8) Bagging with cost-sensitive and Naıve Bayes
wekaclassifiersmetaBagging -P 100 -S 1-num-slots 1 -I 10 -W
wekaclassifiersmetaCostSensitive Classifier ---cost-matrix [10 30 10 10] -S 1 -W
wekaclassifiersbayesNaiveBayes
Bag-Cost-NB
(9) Stacking with NB KNN C45 and logistic
wekaclassifiersmetaStacking -X 10 -M wekaclassifiersfunctionsLogistic -R 10E-8 -M -1ndashnum ndashdecimal-places 4 -S 1 -num-slots 1 -B
weka classifiersbayesNaiveBayes -Bwekaclassifiers lazyIBk -K 1 -W 0 -A
wekacoreneighboursearch LinearNNSearch-A wekacoreEuclideanDistance -R
first-last -B wekaclassifierstreesJ48 -C025 -M 2
Stacking-log
(10) Stacking cost-sensitive with NB KNN C45and logistic
wekaclassifiersmetaStacking -X 10 -M wekaclassifiersmetaCostSensitiveClassifier ndashcost
-matrix [10 30 10 10] -S 1 -Wwekaclassifiers functionsLogistic -- -R 10E-8 -M-1 ndashnum ndashdecimal ndashplaces 4 -S 1 -num-slots 1 -B
weka classifiers bayesNaiveBayes -BwekaclassifierslazyIBk -K 1 -W 0 -A
wekacoreneighboursearchLinearNN Search-A wekacoreEuclideanDistance -R first-
last -B wekaclassifierstreesJ48 -C 025-M 2
Stacking-Cost-log
Mathematical Problems in Engineering 7
Table 5 Summary of imbalanced data sets
Data sets sam (min maj) IR feawisconsin 683 (239 444) 054 9pima 768 (268 500) 053 8haberman 306 (81 225) 036 3Vehicle1 846 (217 629) 0345 18Vehicle0 846 (199 647) 0307 18segment0 2308 (329 1979) 017 19yeast-0-3-5-9 vs 7-8 506 (50 456) 011 8yeast-0-2-5-6 vs 3-7-8-9 1004 (99 905) 011 8vowel0 988 (90 898) 010 13led7digit-0-2-4-5-6-7-8-9 vs 1 443 (37 406) 009 7Glass2 214 (17 197) 008 10cleveland-0 vs 4 173 (13 160) 008 13glass-0-1-6 vs 5 184 (9 175) 005 9car-good 1728 (69 1659) 004 6flare-F 1066 (43 1023) 004 11car-vgood 1728 (65 1663) 0039 6abalone-17 vs 7-8-9-10 2338 (58 2280) 00254 9
and variations in the amount of class overlap (see the KEELrepository [45]) Multiclass data sets were modified to obtaintwo-class imbalanced problems so that the union of one ormore classes became the positive class and the union of oneor more of the remaining classes was labeled as the negativeclass A brief description of the used date set is presented inTable 5 It includes the total number of instances (Sam)the total number of each class instances (Min Maj) theimbalance ratio (IR = the ratio of the number of minorityclass instance to majority class instance) and the number offeatures (Fea)
Our system was compared with other ensemble algo-rithms including AdaBoost with Naıve Bayes AdaBoost withcost-sensitive bagging with Naıve Bayes bagging with cost-sensitive and stacking cost-sensitive with NB k-NN C45and logistic regression All the experiments were performedby 10-fold cross-validation
42 Experimental Results Tables 6 7 and 8 respectivelyshow the results of the three metrics (AUC GeoMean andAGeoMean) for the algorithms in Table 4 obtained with thedata set in Table 5The best results are emphasized in boldfaceon each data set in these tables
The results showed that the performance of the proposedRECSG method was the best for 12 of 17 data sets in termsof GeoMean and AGeoMean and for 10 of 17 data sets interms of AUC Some methods are better than others in someevaluation metrics but not in all metric and most data sets
In Table 6 (AUC) the performance of the RECSGmethodis better than that of the 3 single-base classification methodsfor 14 of 17 data sets and better than that of other 5 ensemblealgorithms for 13 of 17 data sets In Table 7 (GeoMean)the RECSG method outperforms all the 3 single-base clas-sification methods in 15 of 17 data sets and outperformsother 5 ensemble algorithms in 15 of 17 data sets In Table 8
(AGeoMean) the RECSG method outperforms all the 3single-base classification methods in 14 of 17 data sets andoutperforms other 5 ensemble algorithms in 16 of 17 data setsFor the 17 data sets the evaluation metrics of GeoMean andAGeoMean are better than AUC
43 Statistical Tests Statistical test [46] is adopted to comparedifferent algorithms In the paper we use two types of com-parisons pairwise comparison (between a pair of algorithms)and multiple comparisons (among a group of algorithms)
431 Pair of Algorithms We performed statistic 119879-teststo explore whether RECSG approach is the significantlybetter than other algorithms in the three metrics (AUCGeoMean and AGeoMean) Table 9 shows the results ofRECSG approach compared with the other methods interms of AUC GeoMean and AGeoMean The values inthe square brackets indicate the number of the metrics withstatistically significant difference in the 119879-test performedwith the confidence level 120572 = 005 in the three evaluationmetrics As for the evaluation metric of AUC the RECSGoutperformed NB for 11 of 17 data sets and 5 data sets showedthe statistically significant differencewith the confidence levelof 120572 = 005
432 Multiple Comparisons We performed Holm post hoctest [47] and selected multiple groups to test the threemetrics (AUC GeoMean and AGeoMean) The post hocprocedures can determine whether a comparison hypothesisshould be rejected at a specified confidence level 120572 Statisticalexperiment was performed in the platform on the websitehttpteccitiususcesstac Table 10 provides the results thatRECSGapproach is significantly differentwith the confidencelevel of120572 = 005 in terms of AUCGeoMean andAGeoMean1198670 indicates that there is no significant difference According
8 Mathematical Problems in Engineering
Table6Re
sults
ofotherm
etho
dsin
term
sofA
UC
Datas
etNB
C45
KNN
Cost-N
BAd
a-NB
Ada-Cost-N
BBa
g-NB
Bag-Cost-NB
Stacking
-log
Stacking
-Cost-log
wisc
onsin
098
60957
0953
0989
0981
0982
099
0989
0982
0982
pima
0819
0751
0662
0819
0804
0808
0817
0819
0819
0819
haberm
an064
0615
053
064
0667
0625
0638
064
70636
063
Vehicle
10711
0722
0671
0711
0769
0723
0715
0712
0782
0782
Vehicle
00814
093
092
0814
0784
0781
0808
0808
0977
097
7segm
ent0
0987
0971
0996
0987
0944
0959
0987
0986
0999
099
9yeast-0
-3-5-9
vs7-8
075
0675
0653
075
0751
0686
075
0754
0769
0773
yeast-0
-2-5-6
vs3-7-8-9
0832
0687
0782
0832
079
0808
084
0842
084
7082
vowel0
098
0974
1098
0996
0994
0981
098
11
led7digit-0
-2-4-5-6-7-8-9
vs1
095
40925
0827
0954
0898
0893
0953
0949
0909
0904
Glass2
0716
0812
0587
0716
0737
0735
0748
0737
0713
0721
cleveland
-0vs
40909
0824
0754
0909
0826
0891
0915
0913
0979
0976
glass-0-1-6
vs5
0934
0989
0825
0934
0975
0905
0983
0985
09
099
car-good
0976
0493
064
80976
0968
0971
0976
0976
0653
097
7flare-F
0908
0525
059
0911
0876
0893
0908
0912
0825
0806
car-vgoo
d0997
0992
069
0997
0997
0997
0997
0997
0998
099
8abalon
e-17
vs7-8-9-10
0775
0613
0624
0775
0817
073
0773
0774
0752
0726
Mean
0864
0791
0748
0864
0858
0846
0869
0869
0855
0875
Mathematical Problems in Engineering 9
Table7Re
sults
ofotherm
etho
dsin
term
sofG
eoMean
Datas
etNB
C45
KNN
Cost-N
BAd
a-NB
Ada-Cost-N
BBa
g-NB
Bag-Cost-NB
Stacking
-log
Stacking
-Cost-log
wisc
onsin
0965
0954
0953
0968
0959
0967
0968
0966
096
4097
0pima
0719
0697
0649
0736
0715
0739
0723
0731
0703
0741
haberm
an0419
0566
0455
0586
046
60573
0434
0594
0361
060
5Ve
hicle
10677
0622
0652
0674
0607
0677
0673
0661
0561
0712
Vehicle
00724
0912
0919
0726
0735
0733
0719
0733
0894
094
6segm
ent0
0897
0977
0995
0892
0897
0906
0902
0895
099
40993
yeast-0
-3-5-9
vs7-8
046
70463
060
9046
4046
60550
044
50485
044
4060
3yeast-0
-2-5-6
vs3-7-8-9
0762
0590
0762
0793
0762
0792
0762
0787
0706
079
8vowel0
0920
0969
1000
0913
0944
0929
0921
0914
1000
1000
led7digit-0
-2-4-5-6-7-8-9
vs1
0853
0874
0818
0869
0863
0870
0845
0850
0846
0874
Glass2
0578
0472
0470
0564
0578
0610
0586
0568
000
00607
cleveland
-0vs
4089
90724
0722
0936
0611
0863
0866
0861
0866
0866
glass-0-1-6
vs5
0932
0877
0810
0932
0932
0872
0872
0869
0814
0877
car-good
000
0000
00558
0794
0520
0783
000
00778
0561
0813
flare-F
0751
000
00450
0781
0541
064
70738
0781
000
00774
car-vgoo
d0496
0888
0626
0986
0917
0925
0481
0971
0957
098
8abalon
e-17
vs7-8-9-10
0628
0321
0506
064
00319
0368
0627
0630
0293
066
4Mean
0687
064
10703
0779
0696
0753
0680
0769
064
50814
10 Mathematical Problems in Engineering
Table8Re
sults
ofotherm
etho
dsin
term
sofA
GeoMean
Datas
etNB
C45
KNN
Cost-N
BAd
a-NB
Ada-Cost-N
BBa
g-NB
Bag-Cost-NB
Stacking
-log
Stacking
-Cost-log
wisc
onsin
0961
0960
0961
0962
0963
0962
0962
0960
0967
096
7pima
076
80743
0706
0734
0767
0736
0775
0729
0770
0726
haberm
an064
20680
0601
0718
066
00694
0653
0721
0613
0663
Vehicle
10690
0721
0728
0658
0722
0669
0690
0654
0709
0740
Vehicle
0066
60929
0934
0658
0681
0663
066
40663
0924
093
6segm
ent0
0859
0986
0996
0850
0859
0871
0863
0854
0995
099
5yeast-0
-3-5-9
vs7-8
0715
0706
0782
0709
0713
0737
0703
0721
0700
0770
yeast-0
-2-5-6
vs3-7-8-9
0856
0778
0855
0857
0856
0855
0856
0854
0839
086
6vowel0
0929
0981
1000
0909
0966
0948
0931
0910
1000
1000
led7digit-0
-2-4-5-6-7-8-9
vs1
0853
0874
0818
0869
0863
0870
0845
0850
0846
0874
Glass2
0511
0701
0695
0493
0511
0551
0539
0515
000
00744
cleveland
-0vs
4092
70845
0841
0943
0783
0914
0918
0910
0918
0918
glass-0-1-6
vs5
095
40932
0894
0954
0954
0923
0923
0919
0902
0932
car-good
000
0000
00763
0878
0747
0870
000
00873
0769
089
0flare-F
0840
000
00705
0842
0751
0795
0836
0842
000
0084
5car-vgoo
d0743
0934
0793
0986
0953
0956
0725
0980
0974
099
0abalon
e-17
vs7-8-9-10
0736
0655
0745
0716
0650
0672
0734
0711
064
1080
4Mean
0744
0731
0813
0808
0788
0805
0742
08039
0739
086
2
Mathematical Problems in Engineering 11
Table 9 Comparison between RECSG and other algorithms in terms of winslosses of the results
NB C45 KNN Cost-NB Ada-NB Ada-Cost-NB Bag-NB Bag-Cost-
NB Stacking-log
Stacking-Cost-log(AUC)
11 [5]6 [3] 15 [11]2 [2] 17 [15]0 [2] 11 [6]6 [5] 13 [7]4 [2] 14 [9]3 [1] 10 [6]7 [5] 10 [6]7 [5] 11 [3]6 [2]
Stacking-Cost-log(GeoMean)
15 [13]2 [2] 17 [13]0 [0] 15 [13]2 [1] 14 [8]3 [2] 16 [12]1 [1] 16 [10]1 [0] 17 [13]0 [0] 16 [10]3 [0] 16 [12]1 [0]
Stacking-Cost-log(AGeoMean)
14 [10]3 [1] 15 [9]2 [2] 15 [12]2 [1] 13 [7]4 [3] 15 [11]2 [2] 15 [10]2 [2] 16 [10]1 [1] 15 [8]2 [1] 16 [10]1 [1]
Table 10 Holm post hoc test to show differences of Stacking-Cost-log (AUC) and other algorithms
Rejected (1198670) Accepted (1198670)Stacking-Cost-log (AUC) versus other algorithms 3 6Stacking-Cost-log (GeoMean) versus other algorithms 8 1Stacking-Cost-log (AGeoMean) versus other algorithms 9 0
to the evaluation metric of GeoMean the RECSG is signifi-cantly better than other 8 methods with the confidence levelof 120572 = 005
44 Discussion According to the experiments performedwith 17 different imbalance data sets and the comparisonwith other 9 classification methods the proposed RECSGmethod showed the higher performance than other methodsin terms of AUC GeoMean and AGeoMean as illustratedin Tables 7 8 and 9 The RECSG method showed the betterperformance for 10 of 17 cases in terms of AUC and for 12 of17 cases in terms of GeoMean and AGeoMean The methodoutperformed other 5 ensemble algorithms for 16 of 17 datasets in terms ofGeoMean andAGeoMean and for 13 of 17 datasets in terms of AUC GeoMean and AGeoMean are betterthan AUC in evaluation metrics in 17 data sets The means ofRECSGmethod in terms of GeoMean AGeoMean and AUCare all higher than other methods The Holm post hoc testshows that RECSGmethod significantly outperforms the restwith the confidence level 120572 = 005 in terms of AGeoMean for8 of 9methods in terms of AGeoMean and for 3 of 9methodsin terms of AUC
Experimental results and statistical tests show that theRECSG approach has improved the classification perfor-mance of imbalanced data sets The reasons can be explainedas follows
Firstly stacking algorithm uses to combine the resultof base classifier whereas bagging employs majority voteBagging is only a simple decision combinationmethodwhichrequires neither cross-validation nor Level 1 learning In thispaper stacked generalization adopts logistic regressionthus providing the simplest linear combination of pooling theLevel 0 modelsrsquo confidence
Secondly cost-sensitive algorithm can affect imbalancedata sets in two aspects Firstly cost-sensitive modificationscan be applied in the probabilistic estimationMoreover cost-sensitive factors change theweight of instance and theweight
of the minority class is higher than that of the majorityclass Therefore the method directly changes the number ofcommon instances without discarding or duplicating any ofthe rare instances
Thirdly RECSG method introduces cost-sensitive learn-ing into stacking ensemble The cost-sensitive function in replaces the error-minimizing function thus allowing Level 1learning to be prone to focus on the minority class
The results in Tables 7 and 8 demonstrate that the RECSGmethod has the higher performancewhen the evaluation per-formance metric of base classifier is weaker (such as Vehicle1Vehicle0 and car-vgood) The reason is that the alternativemethods of NB C45 and 119896-NN have shortcomings Theindependence assumption hampers the performance of NBin some data sets In C45 tree construction the selectionof attributes affects the model performances Error ratio of119896-NN is relatively high when data sets are in imbalanceTherefore logistic regression adopted in can improve theperformance when base classifier is weaker
The performance of the RECSG method is generallybetter when the IR is low (such as Glass2 car-good flare-Fcar-vgood and abalone-17 vs 7-8-9-10) The performance ofRECSG method is probably related to the setting of the costmatrix Different data sets should use different cost matricesFor the purpose of simplicity in the paper we have adoptedthe same cost matrices whichmay bemore suitable to low IRvalues
5 Conclusions
In this paper in order to solve the class imbalance problemwe proposed the RECSG method based on 2-layer learningmodels Experimental results and statistical tests showedthat the RECSG approach improved the classification per-formance The proposed RECSG approach may have therelatively high computational complexity in the trainingstage because the approach involves 2-layer classifier models
12 Mathematical Problems in Engineering
which consist of several base classifiers and metaclassifiersThe number and kinds of the base-level classifiers areclosely related to the performance of stacking algorithmIn the fusion stage of base classifiers the selection of themetaclassifier is also important In this paper in order tovalidate the performance improvement compared with othercurrent classification algorithms we only randomly selected3 classification algorithms (NB 119896-NN and C45) as baseclassifier and cost-sensitive algorithm as metaclassifier inthe fuse stage Selection of the number or kind of the baseclassifiers and metaclassifier was not discussed Thereforewe will explore the diversity and quality of base classifier inthe future The adoption of these strategies would improvethe prediction performance and reduce the training time ofstacking algorithm for imbalance problems
Conflicts of Interest
The author declares that there are no conflicts of interestregarding the publication of this paper
Acknowledgments
This research is supported by the Natural Science Foundationof Shanxi Province under Grant no 2015011039
References
[1] O Loyola-Gonzalez J F Martınez-Trinidad J A Carrasco-Ochoa and M Garcıa-Borroto ldquoEffect of class imbalance onquality measures for contrast patterns an experimental studyrdquoInformation Sciences vol 374 pp 179ndash192 2016
[2] C Beyan and R Fisher ldquoClassifying imbalanced data setsusing similarity based hierarchical decompositionrdquo PatternRecognition vol 48 no 5 pp 1653ndash1672 2015
[3] H He and E A Garcia ldquoLearning from imbalanced datardquo IEEETransactions on Knowledge and Data Engineering vol 21 no 9pp 1263ndash1284 2009
[4] V C Chawla K W Bowyer L O Hall and W P KegelmeyerldquoSMOTE synthetic minority over-sampling techniquerdquo Journalof Artificial Intelligence Research vol 16 pp 321ndash357 2002
[5] X-Y Liu J X Wu and Z-H Zhou ldquoExploratory under-sampling for class-imbalance learningrdquo in Proceedings of the 6thInternational Conference on Data Mining (ICDM rsquo06) pp 965ndash969 IEEE Hong Kong December 2006
[6] N Japkowicz ldquoThe class imbalance problem significance andstrategiesrdquo in Proceedings of the International Conference onArtificial Intelligence 2000
[7] B Zadrozny and C Elkan ldquoLearning and making decisionswhen costs and probabilities are both unknownrdquo in Proceedingsof the the seventh ACM SIGKDD international conference pp204ndash213 San Francisco Calif USA August 2001
[8] M Galar A Fernandez E Barrenechea H Bustince andF Herrera ldquoA review on ensembles for the class imbalanceproblem bagging boosting and hybrid-based approachesrdquoIEEE Transactions on Systems Man and Cybernetics Part CApplications and Reviews vol 42 no 4 pp 463ndash484 2012
[9] N V Chawla D A Cieslak L O Hall and A Joshi ldquoAuto-matically countering imbalance and its empirical relationship
to costrdquo Data Mining and Knowledge Discovery vol 17 no 2pp 225ndash252 2008
[10] A Freitas A Costapereira and P Brazdil ldquoCost-SensitiveDecision Trees Applied to Medical Datardquo in InternationalConference on Data Warehousing and Knowledge Discovery pp303ndash312 2007
[11] L Breiman ldquoBagging predictorsrdquoMachine Learning vol 24 no2 pp 123ndash140 1996
[12] R E Schapire ldquoThe strength of weak learnabilityrdquo MachineLearning vol 5 no 2 pp 197ndash227 1990
[13] Y Freund and R E Schapire ldquoA decision-theoretic generaliza-tion of on-line learning and an application to boostingrdquo Journalof Computer and System Sciences vol 55 no 1 part 2 pp 119ndash139 1997
[14] N V Chawla A Lazarevic L O Hall and K W BowyerldquoSMOTEBoost improving prediction of the minority class inboostingrdquo in Proceeding of Knowledge Discovery in Databasesvol 2838 pp 107ndash119
[15] C Seiffert T M Khoshgoftaar J VanHulse and A NapolitanoldquoRUSBoost A hybrid approach to alleviating class imbalancerdquoIEEE Transactions on Systems Man and Cybernetics Systemsvol 40 no 1 pp 185ndash197 2010
[16] S Hu Y Liang L Ma and Y He ldquoMSMOTE Improvingclassification performance when training data is imbalancedrdquoin Proceedings of the 2nd International Workshop on ComputerScience and Engineering WCSE 2009 pp 13ndash17 China October2009
[17] H Guo and H L Viktor ldquoLearning from imbalanced data setswith boosting and data generationrdquoACMSIGKDDExplorationsNewsletter vol 6 no 1 pp 30ndash39 2004
[18] SWang and X Yao ldquoDiversity analysis on imbalanced data setsby using ensemble modelsrdquo in Proceedings of the 2009 IEEESymposium on Computational Intelligence and Data MiningCIDM 2009 pp 324ndash331 USA April 2009
[19] R Barandela J S Srsquoanchez and R M Valdovinos ldquoNewapplications of ensembles of classifiersrdquo PAA Pattern Analysisand Applications vol 6 no 3 pp 245ndash256 2003
[20] J Błaszczynski M Deckert J Stefanowski and S WilkldquoIntegrating selective pre-processing of imbalanced data withIvotes ensemblerdquo Lecture Notes in Computer Science (includingsubseries Lecture Notes in Artificial Intelligence and Lecture Notesin Bioinformatics) Preface vol 6086 pp 148ndash157 2010
[21] W Fan S J Stolfo J Zhang and P K Chan ldquoAdacostMisclassication cost-sensitive boostingrdquo in the 6th Int ConfMach Learning pp 97ndash105 San Francisco CA USA 1999
[22] K M Ting ldquoA comparative study of cost-sensitive boostingalgorithmsrdquo in Proc 17th Int Conf Mach Learning pp 983ndash990 Stanford CA USA 2000
[23] M Joshi V Kumar and R Agarwal ldquoEvaluating boosting algo-rithms to classify rare classes comparison and improvementsrdquoinProceedings of the 2001 IEEE International Conference onDataMining pp 257ndash264 San Jose CA USA
[24] Y Sun M S Kamel A K C Wong and Y Wang ldquoCost-sensitive boosting for classification of imbalanced datardquo PatternRecognition vol 40 no 12 pp 3358ndash3378 2007
[25] D H Wolpert ldquoStacked generalizationrdquo Neural Networks vol5 no 2 pp 241ndash259 1992
[26] Y Chen M L Wong and H Li ldquoApplying Ant Colony Opti-mization to configuring stacking ensembles for data miningrdquoExpert Systems with Applications vol 41 no 6 pp 2688ndash27022014
Mathematical Problems in Engineering 13
[27] H Kadkhodaei and A M E Moghadam ldquoAn entropy basedapproach to find the best combination of the base classi-fiers in ensemble classifiers based on stack generalizationrdquoin Proceedings of the 4th International Conference on ControlInstrumentation and Automation ICCIA 2016 pp 425ndash429Iran January 2016
[28] I Czarnowski and P Jędrzejowicz ldquoAn approach to machineclassification based on stacked generalization and instanceselectionrdquo in Proceedings of the IEEE International Conferenceon Systems Man and Cybernetics SMC 2016 pp 4836ndash4841Hungary 2017
[29] S Kotsiantis ldquoStacking cost sensitive modelsrdquo in Proceedings ofthe 12th Pan-Hellenic Conference on Informatics PCI 2008 pp217ndash221 Greece August 2008
[30] H Y Lo J C Wang H MWang and S D Lin ldquoCost-sensitivestacking for audio tag annotation and retrievalrdquo in Proceedingsof the 36th IEEE International Conference on Acoustics Speechand Signal Processing ICASSP 2011 pp 2308ndash2311 CzechRepublic May 2011
[31] JHuang andCX Ling ldquoUsingAUCand accuracy in evaluatinglearning algorithmsrdquo IEEETransactions on Knowledge andDataEngineering vol 17 no 3 pp 299ndash310 2005
[32] M Kubat and S Matwin ldquoAddressing the curse of imbalancedtraining sets one sided selection inrdquo in Proceedings of the 14thInternational Conference on Machine Learning (ICMLrsquo97 pp179ndash186 1997
[33] R Batuwita and V Palade ldquoAdjusted geometric-mean A novelperformance measure for imbalanced bioinformatics datasetslearningrdquo Journal of Bioinformatics and Computational Biologyvol 10 no 4 Article ID 1250003 2012
[34] L Breiman ldquoStacked regressionsrdquoMachine Learning vol 24 no1 pp 49ndash64 1996
[35] M Leblanc and R Tibshirani ldquoCombining estimates in regres-sion and classificationrdquo Journal of the American StatisticalAssociation vol 91 no 436 pp 1641ndash1650 1996
[36] B Cestnik ldquoEstimating Probabilities A Crucial Task inMachine Learningrdquo in Proceedings of the European Conferenceon Artificial Intelligence pp 147ndash149 1990
[37] J R Quinlan C45 Program for machine learning MorganKaufmann 1993
[38] N S Altman ldquoAn introduction to kernel and nearest-neighbornonparametric regressionrdquo The American Statistician vol 46no 3 pp 175ndash185 1992
[39] J R Quinlan ldquoInduction of decision treesrdquo Machine Learningvol 1 no 1 pp 81ndash106 1986
[40] C Elkan ldquoThe foundations of cost-sensitive learningrdquo in Pro-ceedings of the 17th International Joint Conference on ArtificialIntelligence (IJCAI rsquo01) pp 973ndash978 New York NY USAAugust 2001
[41] D Michie D Spiegel halter J C Taylor et alMachine learningneural and statistical classification Ellis Horwood neural andstatistical classification 1995
[42] K M Ting and I F Witten ldquoIssues in stacked generalizationrdquoArtif Intell Res vol 10 no 1 pp 271ndash289 1999
[43] M Hall E Frank G Holmes B Pfahringer P Reutemann andI H Witten ldquoThe WEKA data mining software an updaterdquoACM SIGKDD Explorations Newsletter vol 11 no 1 pp 10ndash182009
[44] A Frank and A Asuncion UCI Machine Learning RepositoryUniversity of California School of Information and ComputerScience Irvine CA USA httparchiveicsucieduml
[45] J Alcala-Fdez A Fernandez J Luengo et al ldquoKEEL data-mining software tool data set repository integration ofalgorithms and experimental analysis frameworkrdquo Journal ofMultiple-Valued Logic and Soft Computing vol 17 no 2-3 pp255ndash287 2011
[46] S Garcıa A Fernandez J Luengo and F Herrera ldquoA study ofstatistical techniques and performance measures for genetics-based machine learning accuracy and interpretabilityrdquo SoftComputing vol 13 no 10 pp 959ndash977 2009
[47] S Holm ldquoA simple sequentially rejective multiple test proce-durerdquo Scandinavian Journal of Statistics vol 6 no 2 pp 65ndash701979
Hindawiwwwhindawicom Volume 2018
MathematicsJournal of
Hindawiwwwhindawicom Volume 2018
Mathematical Problems in Engineering
Applied MathematicsJournal of
Hindawiwwwhindawicom Volume 2018
Probability and StatisticsHindawiwwwhindawicom Volume 2018
Journal of
Hindawiwwwhindawicom Volume 2018
Mathematical PhysicsAdvances in
Complex AnalysisJournal of
Hindawiwwwhindawicom Volume 2018
OptimizationJournal of
Hindawiwwwhindawicom Volume 2018
Hindawiwwwhindawicom Volume 2018
Engineering Mathematics
International Journal of
Hindawiwwwhindawicom Volume 2018
Operations ResearchAdvances in
Journal of
Hindawiwwwhindawicom Volume 2018
Function SpacesAbstract and Applied AnalysisHindawiwwwhindawicom Volume 2018
International Journal of Mathematics and Mathematical Sciences
Hindawiwwwhindawicom Volume 2018
Hindawi Publishing Corporation httpwwwhindawicom Volume 2013Hindawiwwwhindawicom
The Scientific World Journal
Volume 2018
Hindawiwwwhindawicom Volume 2018Volume 2018
Numerical AnalysisNumerical AnalysisNumerical AnalysisNumerical AnalysisNumerical AnalysisNumerical AnalysisNumerical AnalysisNumerical AnalysisNumerical AnalysisNumerical AnalysisNumerical AnalysisNumerical AnalysisAdvances inAdvances in Discrete Dynamics in
Nature and SocietyHindawiwwwhindawicom Volume 2018
Hindawiwwwhindawicom
Dierential EquationsInternational Journal of
Volume 2018
Hindawiwwwhindawicom Volume 2018
Decision SciencesAdvances in
Hindawiwwwhindawicom Volume 2018
AnalysisInternational Journal of
Hindawiwwwhindawicom Volume 2018
Stochastic AnalysisInternational Journal of
Submit your manuscripts atwwwhindawicom
![Page 3: Classifying Imbalanced Data Sets by a Novel RE-Sample and ...downloads.hindawi.com/journals/mpe/2018/5036710.pdf · Cost-Sensitive Stacked Generalization Method ... In this paper,](https://reader035.vdocuments.net/reader035/viewer/2022070801/5f02a4577e708231d4054843/html5/thumbnails/3.jpg)
Mathematical Problems in Engineering 3
Table 1 Confusion matrix for performance evaluation
Positive prediction Negative predictionPositive class True positive (TP) False negative (FN)Negative class False positive (FP) True negative (TN)
problem Other metrics have been proposed to measurethe classification performance independently Based on theconfusion matrix (Table 1) these metrics are defined as
Precision = tp(tp + fp)
Recall = tp(tp + fn)
119865-Measure = (1 + 120573)2 lowast Recall lowast Precision1205732 lowast Recall + Precision
(1)
where 120573 is a coefficient to adjust the relative importance ofprecision versus recall (usually 120573 = 1)
FPrate = FP(FP + TN)
TPrate = TP(TP + FN)
(2)
The used combined evaluation metrics of these measuresinclude receiver operating characteristic (ROC) graphic [31]the area under the ROC curve (AUC) [2] average geometricmean of sensitivity and specificity (GeoMean) [32] (see (4))and average adjusted geometric mean (AGeoMean) [33] (see(5)) where 119873119899 refers to the proportion of the majoritysamples These metrics are defined as
AUC = (1 + TPrate minus FPrate)2 (3)
GeoMean = radicTPrate lowast TNrate (4)
AGeoMean
=
GeoMean + TNrate lowast 1198731198991 + 119873119899
TPrate gt 00 TPrate = 0
(5)
22 Stacking Bagging and AdaBoost are the most commonensemble learning algorithms In bagging method differentbase classifier models generate different classification resultsand the final decision is decided by majority voting InAdaBoost algorithm a series of base weak classifiers aretrained with the whole training set and the final decisionis generated by a weighted majority voting scheme In eachround of training iteration different weights are attributed toeach instance In the two algorithms the base classifiers arethe same
Stacking ensemble is another ensemble algorithm inwhich the prediction result of base classifiers is used as the
attribute to train the combination function in metalayer clas-sifier [25] The algorithm has a two-level structure consistingof Level 0 classifiers and Level 1 classifiers It was proposedby Wolpert and used by Breiman [34] and Leblanc andTibshirani [35]
Set a data set
119863 = (119883119899 119884119899) 119899 = 1 119873 (6)
where 119883119899 is a vector representing the attribute values ofthe instance 119899 and 119884119899 is the class value All 119873 instancesare randomly split into 119895 equivalent parts and 119895-fold cross-validation is used to train the model The prediction resultsof every time testing set gives a vector which includes 119899instances
1199101 1199102 1199101198991015840 (7)
Set 119872119895 is the difference base learning algorithm modelobtained with the data set 119863 119895 = 1 119896 and119872119895 is the partof Level 0 models Level 0 layer consists of 119895 base classifierswhich are employed to estimate the class probability of eachinstance
In119872119895 119895-fold cross-validation of training data set119863 givesthe prediction result
119910(119895)1 119910(119895)2 119910(119895)119899 119895 = 1 119896 (8)
All the119872119895 prediction results generate input space of Level1 model and the real value of instance is treated as the outputspace The model is expressed as follows
Input output
119910(1)1 119910(2)1 119910(119894)1 119910(119896)1 1199101119910(1)2 119910(2)2 119910(119894)2 119910(119896)2 1199102
119910(1)119899 119910(2)119899 119910(119894)119899 119910(119896)119899 119910119899
(9)
The above intermediate data are considered as the train-ing data of the Level 1 layer model The input data aretreated as features and the real value of instance is treatedas the output space The next step is to train the data withsome fused leaning algorithms The process is called Level 1generalizer and Level 1 model is denoted by which can beregarded as the function of (119910(1) 119910(2) 119910(119894) 119910(119896))
In the stacking process Level 0 (119872119895 119895 = 1 119896 ) iscombined with Level 1 () Given a new instance 1199091015840 models119872119895 produce a prediction result vector
119910(1) 119910(2) 119910(119894) 119910(119896) 119894 = 1 119896 (10)
Then model is used to combine base classifiers andpredict the final result of 1199091015840
In this paper for imbalance data we propose a stackedgeneralization based on cost-sensitive classification In Level0 model generalizer layer we firstly resample the imbal-ance data Resampling approaches include oversampling and
4 Mathematical Problems in Engineering
Training dataset
andtest dataset
Level 0 (base)-layer
resample(oversample) (undersample) (SMOTE)
Level 0 (base)-layerclassification middot middot middot
middot middot middot
middot middot middot
Ensemble spaceLevel 1 (meta)layerclassification
New data2New data3New data1
LMGJF1 LMGJF2 LMGJFk
M1M2 Mk
Decision space of M2 Decision space of Mk
MGN
Decision space of M1
u1 xiNi=1 u1 x
i
N
i=1( ) ( ) u2 xi
Ni=1 u2 x
i
N
i=1( ) ( ) uk xi
Ni=1 uk x
i
N
i=1( ) ( )
yiGN
N
i=1
D = xi yi
N
i=1( )
DNMN = xi
N
i=1( )
DGN = u1 xi yiNi=1 D
GN = u1 x
i yi
N
i=1 ( ) ( )
Figure 1 Flowchart of the RECSG architecture
undersampling Secondly Model119872119895 119895 = 1 119896 is trainedby the classification with new data and the prediction output119910(119895)119894 by 119895 = 1 119896 is produced with original data In Lever1 model generalizer layer model is generated by cost-sensitive classification based on logistic regression
3 Stacked Generalization forthe Imbalance Problem
The proposed RECSG architecture includes two layers Thefirst layer (Level 0) consisting of classifiers ensemble is calledthe base layer the second layer (Level 1) combined withbase classifiers is called the metalayer The flowchart of thearchitecture is shown in Figure 1
31 Level 0 Model Generalizers For the imbalance prob-lem Level 0 model generalizer step of RECSG includespreprocessing data and training the base classifier Firstlythe oversampling (SMOTE) method has been used in datapreprocessing of base classifier Secondly the base classifiermodel is trained with the new data set In this level weemployed three base algorithms Naıve Bayes (NB) [36]decision tree C45 [37] and 119896-nearest neighbors (119896-NN) [38]
(1) Naıve Bayes (NB) Given instance 119909 set 119875(119894 | 119909) is theposterior probability of class 119894 then
119875119894 (119909) =119875 (119894 | 119909)Σ119894119875 (119894 | 119909)
(11)
where NB uses an Laplacian estimate for estimating theconditional probabilities
Mathematical Problems in Engineering 5
Table 2 Cost matrix
Actual negative Actual positivePredict negative 119862 (0 0) = 11988800 119862 (0 1) = 11988801Predict positive 119862 (1 0) = 11988810 119862 (1 1) = 11988811
Table 3 Cost matrix of credit data sets
Actual bad Actual goodPredict bad 0 1Predict good 5 0
(2) Decision Tree (C45) Decision tree classification is amethod usually used in data mining [39] A decision tree isa tree where each input feature labels a nonleaf node oneclass or probability over the class labels each leaf node of thetree By recursive partitioning a tree can be ldquolearnedrdquo untilno prediction value is added or the subset has the same valueWhen the parameter of training data selects informationentropy the decision tree is named C45
(3) 119896-NN k-NN is a nonparametric algorithm used forregression or classification An instance is classified by amajority vote of its 119896-nearest neighbors If 119896 = 1 the instanceis simply classified in accordance with the class label of thenearest neighbor
All the above algorithms are simple and have low com-plexity so they are applicable weak base classifiers
32 Level 1 Model Generalizers Prediction results of severalbase classifiers in Level 0 layer are used as input space andtrue value class of instance is used as output space Based onLevel 0 layer Level 1 layer model is trained by other learningalgorithms For imbalanced data the cost-sensitive algorithmis used to train the Level 1 metalayer classifier in the paper
321 Cost-Sensitive Classifier Aiming at imbalanced datalearning problem in cost-sensitive classifier different costmatrices are used as the penalty ofmisclassified instance [40]For example a cost matrix 119862 has the following structure in abinary classification scenario in Table 2
In the cost matrix the row indicated alternative predictedclasser whereas the column indicates actual classes The costof false negative is notated as 11988801 and the cost of false positiveis notated as 11988810 Conceptually the cost of correctly classifiedinstances should always be less than the cost of incorrectlyclassified instances In the imbalance problem 11988810 is alwaysgreater than 11988801 For the German credit data set previouslyreported as the part of the Stalog project [41] the cost matrixis provided in Table 3
The cost of false good prediction is greater than the costof false bad prediction in the view of economical reason Soin Level 1 classifier of stacking for the imbalance problem thecost-sensitive algorithm is adopted
322 Logistic Regression Ting and Witten illustrated thatMLR (Multiresponse Linear Regression) algorithm had an
advantage over other Level 1 generalizers in stacking [42]In this paper the logistic regression classifier is used as themetalayer algorithm Based on the metalayer the cost of mis-classification is considered in the cost-sensitive algorithm Inlogistic regression classifier the prediction result of Level 0layer is used as the attribute of Level 1 metalayer and the realvalue of instance is used as output spaceThe linear regressionfor class 119894 is simply obtained as
LR119894 (119909) =119870
sum119895=1
120572119895119894119875119895119894 (119909) 119895 = 1 119896 (12)
Details of the implementation of RECSG are presentedbelow
33 Algorithm Pseudocode Pseudocode 1 presents the pro-posed RECSG approach Input parameters are two data setstraining set and testing set Output predicts class labelsof the test samples The first step in the process is datapreprocessing resample new instances and train model119872119895 (119895 = 1 119896) where 119895 is the number of base classifiersand 119906119895(119909119894) is the generated function of 119909119894 based on model119872119895(lines (1)-(2) in Pseudocode 1)
Then the metalayer model is constructed based onthe data 119863meta which is firstly predicted with Level 0 layer(line (3) in Pseudocode 1) Finally Lever 1 layer classification(cost-sensitive and logistic regression) is used to predict theultimate value of the tested samples
4 Empirical Investigation
The experiment aims to verify whether the RECSG approachcan improve the classification performance for the imbalanceproblem In this paper the RECSG approach was comparedwith other algorithms involving various combination ensem-ble techniques and imbalanced approaches For eachmethodthe same training set testing set and validation set were used
The experimental system has been implemented and iscomposed of 7 learning methods implemented in Weka [43]namely Naıve Bayes(1) (NB) C45(j48)(2) 119896-nearest neigh-bor (k-NN)(3) cost-sensitive(4) AdaBoost(5) Bagging(6) andstacking(7) The brief description the standard version andparameters of these methods are shown in Table 4
The RECSG approach includes 2 layers Level 1 layer(metalearning) system consists of 2 learning methods imple-mented inWeka namely simple logistic regression(9) (MLP)and cost-sensitive classifier(4) MLP is the base classificationalgorithm of cost-sensitive classifier Level 0 layer (base -learning) of stacking is composed of 3 classifiers NaıveBayes(1) C45(2) and 119896-NN(3) Evaluation metrics of algo-rithmperformance includeAUCGeoMean andAGeoMean
41 Experimental Settings Experiments were implementedwith 17 data sets from the UCI Machine Learning Repository[44] These data sets cover various fields and are based onIR measure values (from 054 to 0014) unique data setnames a varying number of samples (from 173 to 2338)
6 Mathematical Problems in Engineering
Input Training set119863 = (119909119894 119910119894)119873119894=1 Test dataset119863test = (1199091015840119894 )1198731015840
119894=1
Output Predict class labels of the test samples 1199101015840119894meta1198731015840
119894=1
For each119895 = 1 2 119870 do(1) Resample imbalance data and generate 119895ndashfold cross-validation sets to obtain New data119895(2) Train and compute 119906119895(119909119894)119873119894=1 and 119906119895(1199091015840119894 )119873
1015840
119894=1 in Level-0 (base)-layer classifier119872119895end(3) Construct119863meta = 1199061(119909119894) 119910119894119873
1015840
119894=1 and1198631015840meta = 1199061(1199091015840119894 ) 1199101198941198731015840
119894=1
(4) Based on the data119863meta classification (cost-sensitive and Logistic Regression) is used togenerate Level-1 (meta)-layer model through with1198631015840meta to predict 1199101015840119894meta119873
1015840
119894=1
Pseudocode 1 Pseudocodes of RE-sample and Cost-Sensitive Stacked Generalization
Table 4 Base classifier methods and corresponding ensemble algorithms
Methods Descriptions and parameters Abbreviations(1) Naıve Bayes wekaclassifiersbayesNaiveBayes NB(2) C45 wekaclassifierstreesJ48 -C 025 -M 2 C45
(3) 119896-nearest neighborwekaclassifierslazyIBk -K 1 -W 0 -A
wekacoreneighboursearchLinearNNSearch -AwekacoreEuclideanDistance -R first-last
KNN
(4) Cost-sensitive with Naıve Bayes wekaclassifiersmetaAdaBoostM1 -P 100 -S 1 -I 10-W wekaclassifiersbayesNaiveBayes Cost-NB
(5) AdaBoost with Naıve Bayes wekaclassifiersmetaAdaBoostM1 -P 100 -S 1 -I 10-W wekaclassifiersbayesNaiveBayes Ada-NB
(6) AdaBoost with cost-sensitive and Naıve Bayes
wekaclassifiersmetaAdaBoostM1 -P 100 -S 1-I10-W wekaClassifiersmetaCostSensitiveClassifier-- -cost-matrix [10 30 10 10] -S 1 -W weka
classifiersbayesNaiveBayes
Ada-Cost-NB
(7) Bagging with Naıve BayeswekaclassifiersmetaBagging -P 100 -S 1
-num-slots 1 -I 10 -WwekaclassifiersbayesNaiveBayes
Bag-NB
(8) Bagging with cost-sensitive and Naıve Bayes
wekaclassifiersmetaBagging -P 100 -S 1-num-slots 1 -I 10 -W
wekaclassifiersmetaCostSensitive Classifier ---cost-matrix [10 30 10 10] -S 1 -W
wekaclassifiersbayesNaiveBayes
Bag-Cost-NB
(9) Stacking with NB KNN C45 and logistic
wekaclassifiersmetaStacking -X 10 -M wekaclassifiersfunctionsLogistic -R 10E-8 -M -1ndashnum ndashdecimal-places 4 -S 1 -num-slots 1 -B
weka classifiersbayesNaiveBayes -Bwekaclassifiers lazyIBk -K 1 -W 0 -A
wekacoreneighboursearch LinearNNSearch-A wekacoreEuclideanDistance -R
first-last -B wekaclassifierstreesJ48 -C025 -M 2
Stacking-log
(10) Stacking cost-sensitive with NB KNN C45and logistic
wekaclassifiersmetaStacking -X 10 -M wekaclassifiersmetaCostSensitiveClassifier ndashcost
-matrix [10 30 10 10] -S 1 -Wwekaclassifiers functionsLogistic -- -R 10E-8 -M-1 ndashnum ndashdecimal ndashplaces 4 -S 1 -num-slots 1 -B
weka classifiers bayesNaiveBayes -BwekaclassifierslazyIBk -K 1 -W 0 -A
wekacoreneighboursearchLinearNN Search-A wekacoreEuclideanDistance -R first-
last -B wekaclassifierstreesJ48 -C 025-M 2
Stacking-Cost-log
Mathematical Problems in Engineering 7
Table 5 Summary of imbalanced data sets
Data sets sam (min maj) IR feawisconsin 683 (239 444) 054 9pima 768 (268 500) 053 8haberman 306 (81 225) 036 3Vehicle1 846 (217 629) 0345 18Vehicle0 846 (199 647) 0307 18segment0 2308 (329 1979) 017 19yeast-0-3-5-9 vs 7-8 506 (50 456) 011 8yeast-0-2-5-6 vs 3-7-8-9 1004 (99 905) 011 8vowel0 988 (90 898) 010 13led7digit-0-2-4-5-6-7-8-9 vs 1 443 (37 406) 009 7Glass2 214 (17 197) 008 10cleveland-0 vs 4 173 (13 160) 008 13glass-0-1-6 vs 5 184 (9 175) 005 9car-good 1728 (69 1659) 004 6flare-F 1066 (43 1023) 004 11car-vgood 1728 (65 1663) 0039 6abalone-17 vs 7-8-9-10 2338 (58 2280) 00254 9
and variations in the amount of class overlap (see the KEELrepository [45]) Multiclass data sets were modified to obtaintwo-class imbalanced problems so that the union of one ormore classes became the positive class and the union of oneor more of the remaining classes was labeled as the negativeclass A brief description of the used date set is presented inTable 5 It includes the total number of instances (Sam)the total number of each class instances (Min Maj) theimbalance ratio (IR = the ratio of the number of minorityclass instance to majority class instance) and the number offeatures (Fea)
Our system was compared with other ensemble algo-rithms including AdaBoost with Naıve Bayes AdaBoost withcost-sensitive bagging with Naıve Bayes bagging with cost-sensitive and stacking cost-sensitive with NB k-NN C45and logistic regression All the experiments were performedby 10-fold cross-validation
42 Experimental Results Tables 6 7 and 8 respectivelyshow the results of the three metrics (AUC GeoMean andAGeoMean) for the algorithms in Table 4 obtained with thedata set in Table 5The best results are emphasized in boldfaceon each data set in these tables
The results showed that the performance of the proposedRECSG method was the best for 12 of 17 data sets in termsof GeoMean and AGeoMean and for 10 of 17 data sets interms of AUC Some methods are better than others in someevaluation metrics but not in all metric and most data sets
In Table 6 (AUC) the performance of the RECSGmethodis better than that of the 3 single-base classification methodsfor 14 of 17 data sets and better than that of other 5 ensemblealgorithms for 13 of 17 data sets In Table 7 (GeoMean)the RECSG method outperforms all the 3 single-base clas-sification methods in 15 of 17 data sets and outperformsother 5 ensemble algorithms in 15 of 17 data sets In Table 8
(AGeoMean) the RECSG method outperforms all the 3single-base classification methods in 14 of 17 data sets andoutperforms other 5 ensemble algorithms in 16 of 17 data setsFor the 17 data sets the evaluation metrics of GeoMean andAGeoMean are better than AUC
43 Statistical Tests Statistical test [46] is adopted to comparedifferent algorithms In the paper we use two types of com-parisons pairwise comparison (between a pair of algorithms)and multiple comparisons (among a group of algorithms)
431 Pair of Algorithms We performed statistic 119879-teststo explore whether RECSG approach is the significantlybetter than other algorithms in the three metrics (AUCGeoMean and AGeoMean) Table 9 shows the results ofRECSG approach compared with the other methods interms of AUC GeoMean and AGeoMean The values inthe square brackets indicate the number of the metrics withstatistically significant difference in the 119879-test performedwith the confidence level 120572 = 005 in the three evaluationmetrics As for the evaluation metric of AUC the RECSGoutperformed NB for 11 of 17 data sets and 5 data sets showedthe statistically significant differencewith the confidence levelof 120572 = 005
432 Multiple Comparisons We performed Holm post hoctest [47] and selected multiple groups to test the threemetrics (AUC GeoMean and AGeoMean) The post hocprocedures can determine whether a comparison hypothesisshould be rejected at a specified confidence level 120572 Statisticalexperiment was performed in the platform on the websitehttpteccitiususcesstac Table 10 provides the results thatRECSGapproach is significantly differentwith the confidencelevel of120572 = 005 in terms of AUCGeoMean andAGeoMean1198670 indicates that there is no significant difference According
8 Mathematical Problems in Engineering
Table6Re
sults
ofotherm
etho
dsin
term
sofA
UC
Datas
etNB
C45
KNN
Cost-N
BAd
a-NB
Ada-Cost-N
BBa
g-NB
Bag-Cost-NB
Stacking
-log
Stacking
-Cost-log
wisc
onsin
098
60957
0953
0989
0981
0982
099
0989
0982
0982
pima
0819
0751
0662
0819
0804
0808
0817
0819
0819
0819
haberm
an064
0615
053
064
0667
0625
0638
064
70636
063
Vehicle
10711
0722
0671
0711
0769
0723
0715
0712
0782
0782
Vehicle
00814
093
092
0814
0784
0781
0808
0808
0977
097
7segm
ent0
0987
0971
0996
0987
0944
0959
0987
0986
0999
099
9yeast-0
-3-5-9
vs7-8
075
0675
0653
075
0751
0686
075
0754
0769
0773
yeast-0
-2-5-6
vs3-7-8-9
0832
0687
0782
0832
079
0808
084
0842
084
7082
vowel0
098
0974
1098
0996
0994
0981
098
11
led7digit-0
-2-4-5-6-7-8-9
vs1
095
40925
0827
0954
0898
0893
0953
0949
0909
0904
Glass2
0716
0812
0587
0716
0737
0735
0748
0737
0713
0721
cleveland
-0vs
40909
0824
0754
0909
0826
0891
0915
0913
0979
0976
glass-0-1-6
vs5
0934
0989
0825
0934
0975
0905
0983
0985
09
099
car-good
0976
0493
064
80976
0968
0971
0976
0976
0653
097
7flare-F
0908
0525
059
0911
0876
0893
0908
0912
0825
0806
car-vgoo
d0997
0992
069
0997
0997
0997
0997
0997
0998
099
8abalon
e-17
vs7-8-9-10
0775
0613
0624
0775
0817
073
0773
0774
0752
0726
Mean
0864
0791
0748
0864
0858
0846
0869
0869
0855
0875
Mathematical Problems in Engineering 9
Table7Re
sults
ofotherm
etho
dsin
term
sofG
eoMean
Datas
etNB
C45
KNN
Cost-N
BAd
a-NB
Ada-Cost-N
BBa
g-NB
Bag-Cost-NB
Stacking
-log
Stacking
-Cost-log
wisc
onsin
0965
0954
0953
0968
0959
0967
0968
0966
096
4097
0pima
0719
0697
0649
0736
0715
0739
0723
0731
0703
0741
haberm
an0419
0566
0455
0586
046
60573
0434
0594
0361
060
5Ve
hicle
10677
0622
0652
0674
0607
0677
0673
0661
0561
0712
Vehicle
00724
0912
0919
0726
0735
0733
0719
0733
0894
094
6segm
ent0
0897
0977
0995
0892
0897
0906
0902
0895
099
40993
yeast-0
-3-5-9
vs7-8
046
70463
060
9046
4046
60550
044
50485
044
4060
3yeast-0
-2-5-6
vs3-7-8-9
0762
0590
0762
0793
0762
0792
0762
0787
0706
079
8vowel0
0920
0969
1000
0913
0944
0929
0921
0914
1000
1000
led7digit-0
-2-4-5-6-7-8-9
vs1
0853
0874
0818
0869
0863
0870
0845
0850
0846
0874
Glass2
0578
0472
0470
0564
0578
0610
0586
0568
000
00607
cleveland
-0vs
4089
90724
0722
0936
0611
0863
0866
0861
0866
0866
glass-0-1-6
vs5
0932
0877
0810
0932
0932
0872
0872
0869
0814
0877
car-good
000
0000
00558
0794
0520
0783
000
00778
0561
0813
flare-F
0751
000
00450
0781
0541
064
70738
0781
000
00774
car-vgoo
d0496
0888
0626
0986
0917
0925
0481
0971
0957
098
8abalon
e-17
vs7-8-9-10
0628
0321
0506
064
00319
0368
0627
0630
0293
066
4Mean
0687
064
10703
0779
0696
0753
0680
0769
064
50814
10 Mathematical Problems in Engineering
Table8Re
sults
ofotherm
etho
dsin
term
sofA
GeoMean
Datas
etNB
C45
KNN
Cost-N
BAd
a-NB
Ada-Cost-N
BBa
g-NB
Bag-Cost-NB
Stacking
-log
Stacking
-Cost-log
wisc
onsin
0961
0960
0961
0962
0963
0962
0962
0960
0967
096
7pima
076
80743
0706
0734
0767
0736
0775
0729
0770
0726
haberm
an064
20680
0601
0718
066
00694
0653
0721
0613
0663
Vehicle
10690
0721
0728
0658
0722
0669
0690
0654
0709
0740
Vehicle
0066
60929
0934
0658
0681
0663
066
40663
0924
093
6segm
ent0
0859
0986
0996
0850
0859
0871
0863
0854
0995
099
5yeast-0
-3-5-9
vs7-8
0715
0706
0782
0709
0713
0737
0703
0721
0700
0770
yeast-0
-2-5-6
vs3-7-8-9
0856
0778
0855
0857
0856
0855
0856
0854
0839
086
6vowel0
0929
0981
1000
0909
0966
0948
0931
0910
1000
1000
led7digit-0
-2-4-5-6-7-8-9
vs1
0853
0874
0818
0869
0863
0870
0845
0850
0846
0874
Glass2
0511
0701
0695
0493
0511
0551
0539
0515
000
00744
cleveland
-0vs
4092
70845
0841
0943
0783
0914
0918
0910
0918
0918
glass-0-1-6
vs5
095
40932
0894
0954
0954
0923
0923
0919
0902
0932
car-good
000
0000
00763
0878
0747
0870
000
00873
0769
089
0flare-F
0840
000
00705
0842
0751
0795
0836
0842
000
0084
5car-vgoo
d0743
0934
0793
0986
0953
0956
0725
0980
0974
099
0abalon
e-17
vs7-8-9-10
0736
0655
0745
0716
0650
0672
0734
0711
064
1080
4Mean
0744
0731
0813
0808
0788
0805
0742
08039
0739
086
2
Mathematical Problems in Engineering 11
Table 9 Comparison between RECSG and other algorithms in terms of winslosses of the results
NB C45 KNN Cost-NB Ada-NB Ada-Cost-NB Bag-NB Bag-Cost-
NB Stacking-log
Stacking-Cost-log(AUC)
11 [5]6 [3] 15 [11]2 [2] 17 [15]0 [2] 11 [6]6 [5] 13 [7]4 [2] 14 [9]3 [1] 10 [6]7 [5] 10 [6]7 [5] 11 [3]6 [2]
Stacking-Cost-log(GeoMean)
15 [13]2 [2] 17 [13]0 [0] 15 [13]2 [1] 14 [8]3 [2] 16 [12]1 [1] 16 [10]1 [0] 17 [13]0 [0] 16 [10]3 [0] 16 [12]1 [0]
Stacking-Cost-log(AGeoMean)
14 [10]3 [1] 15 [9]2 [2] 15 [12]2 [1] 13 [7]4 [3] 15 [11]2 [2] 15 [10]2 [2] 16 [10]1 [1] 15 [8]2 [1] 16 [10]1 [1]
Table 10 Holm post hoc test to show differences of Stacking-Cost-log (AUC) and other algorithms
Rejected (1198670) Accepted (1198670)Stacking-Cost-log (AUC) versus other algorithms 3 6Stacking-Cost-log (GeoMean) versus other algorithms 8 1Stacking-Cost-log (AGeoMean) versus other algorithms 9 0
to the evaluation metric of GeoMean the RECSG is signifi-cantly better than other 8 methods with the confidence levelof 120572 = 005
44 Discussion According to the experiments performedwith 17 different imbalance data sets and the comparisonwith other 9 classification methods the proposed RECSGmethod showed the higher performance than other methodsin terms of AUC GeoMean and AGeoMean as illustratedin Tables 7 8 and 9 The RECSG method showed the betterperformance for 10 of 17 cases in terms of AUC and for 12 of17 cases in terms of GeoMean and AGeoMean The methodoutperformed other 5 ensemble algorithms for 16 of 17 datasets in terms ofGeoMean andAGeoMean and for 13 of 17 datasets in terms of AUC GeoMean and AGeoMean are betterthan AUC in evaluation metrics in 17 data sets The means ofRECSGmethod in terms of GeoMean AGeoMean and AUCare all higher than other methods The Holm post hoc testshows that RECSGmethod significantly outperforms the restwith the confidence level 120572 = 005 in terms of AGeoMean for8 of 9methods in terms of AGeoMean and for 3 of 9methodsin terms of AUC
Experimental results and statistical tests show that theRECSG approach has improved the classification perfor-mance of imbalanced data sets The reasons can be explainedas follows
Firstly stacking algorithm uses to combine the resultof base classifier whereas bagging employs majority voteBagging is only a simple decision combinationmethodwhichrequires neither cross-validation nor Level 1 learning In thispaper stacked generalization adopts logistic regressionthus providing the simplest linear combination of pooling theLevel 0 modelsrsquo confidence
Secondly cost-sensitive algorithm can affect imbalancedata sets in two aspects Firstly cost-sensitive modificationscan be applied in the probabilistic estimationMoreover cost-sensitive factors change theweight of instance and theweight
of the minority class is higher than that of the majorityclass Therefore the method directly changes the number ofcommon instances without discarding or duplicating any ofthe rare instances
Thirdly RECSG method introduces cost-sensitive learn-ing into stacking ensemble The cost-sensitive function in replaces the error-minimizing function thus allowing Level 1learning to be prone to focus on the minority class
The results in Tables 7 and 8 demonstrate that the RECSGmethod has the higher performancewhen the evaluation per-formance metric of base classifier is weaker (such as Vehicle1Vehicle0 and car-vgood) The reason is that the alternativemethods of NB C45 and 119896-NN have shortcomings Theindependence assumption hampers the performance of NBin some data sets In C45 tree construction the selectionof attributes affects the model performances Error ratio of119896-NN is relatively high when data sets are in imbalanceTherefore logistic regression adopted in can improve theperformance when base classifier is weaker
The performance of the RECSG method is generallybetter when the IR is low (such as Glass2 car-good flare-Fcar-vgood and abalone-17 vs 7-8-9-10) The performance ofRECSG method is probably related to the setting of the costmatrix Different data sets should use different cost matricesFor the purpose of simplicity in the paper we have adoptedthe same cost matrices whichmay bemore suitable to low IRvalues
5 Conclusions
In this paper in order to solve the class imbalance problemwe proposed the RECSG method based on 2-layer learningmodels Experimental results and statistical tests showedthat the RECSG approach improved the classification per-formance The proposed RECSG approach may have therelatively high computational complexity in the trainingstage because the approach involves 2-layer classifier models
12 Mathematical Problems in Engineering
which consist of several base classifiers and metaclassifiersThe number and kinds of the base-level classifiers areclosely related to the performance of stacking algorithmIn the fusion stage of base classifiers the selection of themetaclassifier is also important In this paper in order tovalidate the performance improvement compared with othercurrent classification algorithms we only randomly selected3 classification algorithms (NB 119896-NN and C45) as baseclassifier and cost-sensitive algorithm as metaclassifier inthe fuse stage Selection of the number or kind of the baseclassifiers and metaclassifier was not discussed Thereforewe will explore the diversity and quality of base classifier inthe future The adoption of these strategies would improvethe prediction performance and reduce the training time ofstacking algorithm for imbalance problems
Conflicts of Interest
The author declares that there are no conflicts of interestregarding the publication of this paper
Acknowledgments
This research is supported by the Natural Science Foundationof Shanxi Province under Grant no 2015011039
References
[1] O Loyola-Gonzalez J F Martınez-Trinidad J A Carrasco-Ochoa and M Garcıa-Borroto ldquoEffect of class imbalance onquality measures for contrast patterns an experimental studyrdquoInformation Sciences vol 374 pp 179ndash192 2016
[2] C Beyan and R Fisher ldquoClassifying imbalanced data setsusing similarity based hierarchical decompositionrdquo PatternRecognition vol 48 no 5 pp 1653ndash1672 2015
[3] H He and E A Garcia ldquoLearning from imbalanced datardquo IEEETransactions on Knowledge and Data Engineering vol 21 no 9pp 1263ndash1284 2009
[4] V C Chawla K W Bowyer L O Hall and W P KegelmeyerldquoSMOTE synthetic minority over-sampling techniquerdquo Journalof Artificial Intelligence Research vol 16 pp 321ndash357 2002
[5] X-Y Liu J X Wu and Z-H Zhou ldquoExploratory under-sampling for class-imbalance learningrdquo in Proceedings of the 6thInternational Conference on Data Mining (ICDM rsquo06) pp 965ndash969 IEEE Hong Kong December 2006
[6] N Japkowicz ldquoThe class imbalance problem significance andstrategiesrdquo in Proceedings of the International Conference onArtificial Intelligence 2000
[7] B Zadrozny and C Elkan ldquoLearning and making decisionswhen costs and probabilities are both unknownrdquo in Proceedingsof the the seventh ACM SIGKDD international conference pp204ndash213 San Francisco Calif USA August 2001
[8] M Galar A Fernandez E Barrenechea H Bustince andF Herrera ldquoA review on ensembles for the class imbalanceproblem bagging boosting and hybrid-based approachesrdquoIEEE Transactions on Systems Man and Cybernetics Part CApplications and Reviews vol 42 no 4 pp 463ndash484 2012
[9] N V Chawla D A Cieslak L O Hall and A Joshi ldquoAuto-matically countering imbalance and its empirical relationship
to costrdquo Data Mining and Knowledge Discovery vol 17 no 2pp 225ndash252 2008
[10] A Freitas A Costapereira and P Brazdil ldquoCost-SensitiveDecision Trees Applied to Medical Datardquo in InternationalConference on Data Warehousing and Knowledge Discovery pp303ndash312 2007
[11] L Breiman ldquoBagging predictorsrdquoMachine Learning vol 24 no2 pp 123ndash140 1996
[12] R E Schapire ldquoThe strength of weak learnabilityrdquo MachineLearning vol 5 no 2 pp 197ndash227 1990
[13] Y Freund and R E Schapire ldquoA decision-theoretic generaliza-tion of on-line learning and an application to boostingrdquo Journalof Computer and System Sciences vol 55 no 1 part 2 pp 119ndash139 1997
[14] N V Chawla A Lazarevic L O Hall and K W BowyerldquoSMOTEBoost improving prediction of the minority class inboostingrdquo in Proceeding of Knowledge Discovery in Databasesvol 2838 pp 107ndash119
[15] C Seiffert T M Khoshgoftaar J VanHulse and A NapolitanoldquoRUSBoost A hybrid approach to alleviating class imbalancerdquoIEEE Transactions on Systems Man and Cybernetics Systemsvol 40 no 1 pp 185ndash197 2010
[16] S Hu Y Liang L Ma and Y He ldquoMSMOTE Improvingclassification performance when training data is imbalancedrdquoin Proceedings of the 2nd International Workshop on ComputerScience and Engineering WCSE 2009 pp 13ndash17 China October2009
[17] H Guo and H L Viktor ldquoLearning from imbalanced data setswith boosting and data generationrdquoACMSIGKDDExplorationsNewsletter vol 6 no 1 pp 30ndash39 2004
[18] SWang and X Yao ldquoDiversity analysis on imbalanced data setsby using ensemble modelsrdquo in Proceedings of the 2009 IEEESymposium on Computational Intelligence and Data MiningCIDM 2009 pp 324ndash331 USA April 2009
[19] R Barandela J S Srsquoanchez and R M Valdovinos ldquoNewapplications of ensembles of classifiersrdquo PAA Pattern Analysisand Applications vol 6 no 3 pp 245ndash256 2003
[20] J Błaszczynski M Deckert J Stefanowski and S WilkldquoIntegrating selective pre-processing of imbalanced data withIvotes ensemblerdquo Lecture Notes in Computer Science (includingsubseries Lecture Notes in Artificial Intelligence and Lecture Notesin Bioinformatics) Preface vol 6086 pp 148ndash157 2010
[21] W Fan S J Stolfo J Zhang and P K Chan ldquoAdacostMisclassication cost-sensitive boostingrdquo in the 6th Int ConfMach Learning pp 97ndash105 San Francisco CA USA 1999
[22] K M Ting ldquoA comparative study of cost-sensitive boostingalgorithmsrdquo in Proc 17th Int Conf Mach Learning pp 983ndash990 Stanford CA USA 2000
[23] M Joshi V Kumar and R Agarwal ldquoEvaluating boosting algo-rithms to classify rare classes comparison and improvementsrdquoinProceedings of the 2001 IEEE International Conference onDataMining pp 257ndash264 San Jose CA USA
[24] Y Sun M S Kamel A K C Wong and Y Wang ldquoCost-sensitive boosting for classification of imbalanced datardquo PatternRecognition vol 40 no 12 pp 3358ndash3378 2007
[25] D H Wolpert ldquoStacked generalizationrdquo Neural Networks vol5 no 2 pp 241ndash259 1992
[26] Y Chen M L Wong and H Li ldquoApplying Ant Colony Opti-mization to configuring stacking ensembles for data miningrdquoExpert Systems with Applications vol 41 no 6 pp 2688ndash27022014
Mathematical Problems in Engineering 13
[27] H Kadkhodaei and A M E Moghadam ldquoAn entropy basedapproach to find the best combination of the base classi-fiers in ensemble classifiers based on stack generalizationrdquoin Proceedings of the 4th International Conference on ControlInstrumentation and Automation ICCIA 2016 pp 425ndash429Iran January 2016
[28] I Czarnowski and P Jędrzejowicz ldquoAn approach to machineclassification based on stacked generalization and instanceselectionrdquo in Proceedings of the IEEE International Conferenceon Systems Man and Cybernetics SMC 2016 pp 4836ndash4841Hungary 2017
[29] S Kotsiantis ldquoStacking cost sensitive modelsrdquo in Proceedings ofthe 12th Pan-Hellenic Conference on Informatics PCI 2008 pp217ndash221 Greece August 2008
[30] H Y Lo J C Wang H MWang and S D Lin ldquoCost-sensitivestacking for audio tag annotation and retrievalrdquo in Proceedingsof the 36th IEEE International Conference on Acoustics Speechand Signal Processing ICASSP 2011 pp 2308ndash2311 CzechRepublic May 2011
[31] JHuang andCX Ling ldquoUsingAUCand accuracy in evaluatinglearning algorithmsrdquo IEEETransactions on Knowledge andDataEngineering vol 17 no 3 pp 299ndash310 2005
[32] M Kubat and S Matwin ldquoAddressing the curse of imbalancedtraining sets one sided selection inrdquo in Proceedings of the 14thInternational Conference on Machine Learning (ICMLrsquo97 pp179ndash186 1997
[33] R Batuwita and V Palade ldquoAdjusted geometric-mean A novelperformance measure for imbalanced bioinformatics datasetslearningrdquo Journal of Bioinformatics and Computational Biologyvol 10 no 4 Article ID 1250003 2012
[34] L Breiman ldquoStacked regressionsrdquoMachine Learning vol 24 no1 pp 49ndash64 1996
[35] M Leblanc and R Tibshirani ldquoCombining estimates in regres-sion and classificationrdquo Journal of the American StatisticalAssociation vol 91 no 436 pp 1641ndash1650 1996
[36] B Cestnik ldquoEstimating Probabilities A Crucial Task inMachine Learningrdquo in Proceedings of the European Conferenceon Artificial Intelligence pp 147ndash149 1990
[37] J R Quinlan C45 Program for machine learning MorganKaufmann 1993
[38] N S Altman ldquoAn introduction to kernel and nearest-neighbornonparametric regressionrdquo The American Statistician vol 46no 3 pp 175ndash185 1992
[39] J R Quinlan ldquoInduction of decision treesrdquo Machine Learningvol 1 no 1 pp 81ndash106 1986
[40] C Elkan ldquoThe foundations of cost-sensitive learningrdquo in Pro-ceedings of the 17th International Joint Conference on ArtificialIntelligence (IJCAI rsquo01) pp 973ndash978 New York NY USAAugust 2001
[41] D Michie D Spiegel halter J C Taylor et alMachine learningneural and statistical classification Ellis Horwood neural andstatistical classification 1995
[42] K M Ting and I F Witten ldquoIssues in stacked generalizationrdquoArtif Intell Res vol 10 no 1 pp 271ndash289 1999
[43] M Hall E Frank G Holmes B Pfahringer P Reutemann andI H Witten ldquoThe WEKA data mining software an updaterdquoACM SIGKDD Explorations Newsletter vol 11 no 1 pp 10ndash182009
[44] A Frank and A Asuncion UCI Machine Learning RepositoryUniversity of California School of Information and ComputerScience Irvine CA USA httparchiveicsucieduml
[45] J Alcala-Fdez A Fernandez J Luengo et al ldquoKEEL data-mining software tool data set repository integration ofalgorithms and experimental analysis frameworkrdquo Journal ofMultiple-Valued Logic and Soft Computing vol 17 no 2-3 pp255ndash287 2011
[46] S Garcıa A Fernandez J Luengo and F Herrera ldquoA study ofstatistical techniques and performance measures for genetics-based machine learning accuracy and interpretabilityrdquo SoftComputing vol 13 no 10 pp 959ndash977 2009
[47] S Holm ldquoA simple sequentially rejective multiple test proce-durerdquo Scandinavian Journal of Statistics vol 6 no 2 pp 65ndash701979
Hindawiwwwhindawicom Volume 2018
MathematicsJournal of
Hindawiwwwhindawicom Volume 2018
Mathematical Problems in Engineering
Applied MathematicsJournal of
Hindawiwwwhindawicom Volume 2018
Probability and StatisticsHindawiwwwhindawicom Volume 2018
Journal of
Hindawiwwwhindawicom Volume 2018
Mathematical PhysicsAdvances in
Complex AnalysisJournal of
Hindawiwwwhindawicom Volume 2018
OptimizationJournal of
Hindawiwwwhindawicom Volume 2018
Hindawiwwwhindawicom Volume 2018
Engineering Mathematics
International Journal of
Hindawiwwwhindawicom Volume 2018
Operations ResearchAdvances in
Journal of
Hindawiwwwhindawicom Volume 2018
Function SpacesAbstract and Applied AnalysisHindawiwwwhindawicom Volume 2018
International Journal of Mathematics and Mathematical Sciences
Hindawiwwwhindawicom Volume 2018
Hindawi Publishing Corporation httpwwwhindawicom Volume 2013Hindawiwwwhindawicom
The Scientific World Journal
Volume 2018
Hindawiwwwhindawicom Volume 2018Volume 2018
Numerical AnalysisNumerical AnalysisNumerical AnalysisNumerical AnalysisNumerical AnalysisNumerical AnalysisNumerical AnalysisNumerical AnalysisNumerical AnalysisNumerical AnalysisNumerical AnalysisNumerical AnalysisAdvances inAdvances in Discrete Dynamics in
Nature and SocietyHindawiwwwhindawicom Volume 2018
Hindawiwwwhindawicom
Dierential EquationsInternational Journal of
Volume 2018
Hindawiwwwhindawicom Volume 2018
Decision SciencesAdvances in
Hindawiwwwhindawicom Volume 2018
AnalysisInternational Journal of
Hindawiwwwhindawicom Volume 2018
Stochastic AnalysisInternational Journal of
Submit your manuscripts atwwwhindawicom
![Page 4: Classifying Imbalanced Data Sets by a Novel RE-Sample and ...downloads.hindawi.com/journals/mpe/2018/5036710.pdf · Cost-Sensitive Stacked Generalization Method ... In this paper,](https://reader035.vdocuments.net/reader035/viewer/2022070801/5f02a4577e708231d4054843/html5/thumbnails/4.jpg)
4 Mathematical Problems in Engineering
Training dataset
andtest dataset
Level 0 (base)-layer
resample(oversample) (undersample) (SMOTE)
Level 0 (base)-layerclassification middot middot middot
middot middot middot
middot middot middot
Ensemble spaceLevel 1 (meta)layerclassification
New data2New data3New data1
LMGJF1 LMGJF2 LMGJFk
M1M2 Mk
Decision space of M2 Decision space of Mk
MGN
Decision space of M1
u1 xiNi=1 u1 x
i
N
i=1( ) ( ) u2 xi
Ni=1 u2 x
i
N
i=1( ) ( ) uk xi
Ni=1 uk x
i
N
i=1( ) ( )
yiGN
N
i=1
D = xi yi
N
i=1( )
DNMN = xi
N
i=1( )
DGN = u1 xi yiNi=1 D
GN = u1 x
i yi
N
i=1 ( ) ( )
Figure 1 Flowchart of the RECSG architecture
undersampling Secondly Model119872119895 119895 = 1 119896 is trainedby the classification with new data and the prediction output119910(119895)119894 by 119895 = 1 119896 is produced with original data In Lever1 model generalizer layer model is generated by cost-sensitive classification based on logistic regression
3 Stacked Generalization forthe Imbalance Problem
The proposed RECSG architecture includes two layers Thefirst layer (Level 0) consisting of classifiers ensemble is calledthe base layer the second layer (Level 1) combined withbase classifiers is called the metalayer The flowchart of thearchitecture is shown in Figure 1
31 Level 0 Model Generalizers For the imbalance prob-lem Level 0 model generalizer step of RECSG includespreprocessing data and training the base classifier Firstlythe oversampling (SMOTE) method has been used in datapreprocessing of base classifier Secondly the base classifiermodel is trained with the new data set In this level weemployed three base algorithms Naıve Bayes (NB) [36]decision tree C45 [37] and 119896-nearest neighbors (119896-NN) [38]
(1) Naıve Bayes (NB) Given instance 119909 set 119875(119894 | 119909) is theposterior probability of class 119894 then
119875119894 (119909) =119875 (119894 | 119909)Σ119894119875 (119894 | 119909)
(11)
where NB uses an Laplacian estimate for estimating theconditional probabilities
Mathematical Problems in Engineering 5
Table 2 Cost matrix
Actual negative Actual positivePredict negative 119862 (0 0) = 11988800 119862 (0 1) = 11988801Predict positive 119862 (1 0) = 11988810 119862 (1 1) = 11988811
Table 3 Cost matrix of credit data sets
Actual bad Actual goodPredict bad 0 1Predict good 5 0
(2) Decision Tree (C45) Decision tree classification is amethod usually used in data mining [39] A decision tree isa tree where each input feature labels a nonleaf node oneclass or probability over the class labels each leaf node of thetree By recursive partitioning a tree can be ldquolearnedrdquo untilno prediction value is added or the subset has the same valueWhen the parameter of training data selects informationentropy the decision tree is named C45
(3) 119896-NN k-NN is a nonparametric algorithm used forregression or classification An instance is classified by amajority vote of its 119896-nearest neighbors If 119896 = 1 the instanceis simply classified in accordance with the class label of thenearest neighbor
All the above algorithms are simple and have low com-plexity so they are applicable weak base classifiers
32 Level 1 Model Generalizers Prediction results of severalbase classifiers in Level 0 layer are used as input space andtrue value class of instance is used as output space Based onLevel 0 layer Level 1 layer model is trained by other learningalgorithms For imbalanced data the cost-sensitive algorithmis used to train the Level 1 metalayer classifier in the paper
321 Cost-Sensitive Classifier Aiming at imbalanced datalearning problem in cost-sensitive classifier different costmatrices are used as the penalty ofmisclassified instance [40]For example a cost matrix 119862 has the following structure in abinary classification scenario in Table 2
In the cost matrix the row indicated alternative predictedclasser whereas the column indicates actual classes The costof false negative is notated as 11988801 and the cost of false positiveis notated as 11988810 Conceptually the cost of correctly classifiedinstances should always be less than the cost of incorrectlyclassified instances In the imbalance problem 11988810 is alwaysgreater than 11988801 For the German credit data set previouslyreported as the part of the Stalog project [41] the cost matrixis provided in Table 3
The cost of false good prediction is greater than the costof false bad prediction in the view of economical reason Soin Level 1 classifier of stacking for the imbalance problem thecost-sensitive algorithm is adopted
322 Logistic Regression Ting and Witten illustrated thatMLR (Multiresponse Linear Regression) algorithm had an
advantage over other Level 1 generalizers in stacking [42]In this paper the logistic regression classifier is used as themetalayer algorithm Based on the metalayer the cost of mis-classification is considered in the cost-sensitive algorithm Inlogistic regression classifier the prediction result of Level 0layer is used as the attribute of Level 1 metalayer and the realvalue of instance is used as output spaceThe linear regressionfor class 119894 is simply obtained as
LR119894 (119909) =119870
sum119895=1
120572119895119894119875119895119894 (119909) 119895 = 1 119896 (12)
Details of the implementation of RECSG are presentedbelow
33 Algorithm Pseudocode Pseudocode 1 presents the pro-posed RECSG approach Input parameters are two data setstraining set and testing set Output predicts class labelsof the test samples The first step in the process is datapreprocessing resample new instances and train model119872119895 (119895 = 1 119896) where 119895 is the number of base classifiersand 119906119895(119909119894) is the generated function of 119909119894 based on model119872119895(lines (1)-(2) in Pseudocode 1)
Then the metalayer model is constructed based onthe data 119863meta which is firstly predicted with Level 0 layer(line (3) in Pseudocode 1) Finally Lever 1 layer classification(cost-sensitive and logistic regression) is used to predict theultimate value of the tested samples
4 Empirical Investigation
The experiment aims to verify whether the RECSG approachcan improve the classification performance for the imbalanceproblem In this paper the RECSG approach was comparedwith other algorithms involving various combination ensem-ble techniques and imbalanced approaches For eachmethodthe same training set testing set and validation set were used
The experimental system has been implemented and iscomposed of 7 learning methods implemented in Weka [43]namely Naıve Bayes(1) (NB) C45(j48)(2) 119896-nearest neigh-bor (k-NN)(3) cost-sensitive(4) AdaBoost(5) Bagging(6) andstacking(7) The brief description the standard version andparameters of these methods are shown in Table 4
The RECSG approach includes 2 layers Level 1 layer(metalearning) system consists of 2 learning methods imple-mented inWeka namely simple logistic regression(9) (MLP)and cost-sensitive classifier(4) MLP is the base classificationalgorithm of cost-sensitive classifier Level 0 layer (base -learning) of stacking is composed of 3 classifiers NaıveBayes(1) C45(2) and 119896-NN(3) Evaluation metrics of algo-rithmperformance includeAUCGeoMean andAGeoMean
41 Experimental Settings Experiments were implementedwith 17 data sets from the UCI Machine Learning Repository[44] These data sets cover various fields and are based onIR measure values (from 054 to 0014) unique data setnames a varying number of samples (from 173 to 2338)
6 Mathematical Problems in Engineering
Input Training set119863 = (119909119894 119910119894)119873119894=1 Test dataset119863test = (1199091015840119894 )1198731015840
119894=1
Output Predict class labels of the test samples 1199101015840119894meta1198731015840
119894=1
For each119895 = 1 2 119870 do(1) Resample imbalance data and generate 119895ndashfold cross-validation sets to obtain New data119895(2) Train and compute 119906119895(119909119894)119873119894=1 and 119906119895(1199091015840119894 )119873
1015840
119894=1 in Level-0 (base)-layer classifier119872119895end(3) Construct119863meta = 1199061(119909119894) 119910119894119873
1015840
119894=1 and1198631015840meta = 1199061(1199091015840119894 ) 1199101198941198731015840
119894=1
(4) Based on the data119863meta classification (cost-sensitive and Logistic Regression) is used togenerate Level-1 (meta)-layer model through with1198631015840meta to predict 1199101015840119894meta119873
1015840
119894=1
Pseudocode 1 Pseudocodes of RE-sample and Cost-Sensitive Stacked Generalization
Table 4 Base classifier methods and corresponding ensemble algorithms
Methods Descriptions and parameters Abbreviations(1) Naıve Bayes wekaclassifiersbayesNaiveBayes NB(2) C45 wekaclassifierstreesJ48 -C 025 -M 2 C45
(3) 119896-nearest neighborwekaclassifierslazyIBk -K 1 -W 0 -A
wekacoreneighboursearchLinearNNSearch -AwekacoreEuclideanDistance -R first-last
KNN
(4) Cost-sensitive with Naıve Bayes wekaclassifiersmetaAdaBoostM1 -P 100 -S 1 -I 10-W wekaclassifiersbayesNaiveBayes Cost-NB
(5) AdaBoost with Naıve Bayes wekaclassifiersmetaAdaBoostM1 -P 100 -S 1 -I 10-W wekaclassifiersbayesNaiveBayes Ada-NB
(6) AdaBoost with cost-sensitive and Naıve Bayes
wekaclassifiersmetaAdaBoostM1 -P 100 -S 1-I10-W wekaClassifiersmetaCostSensitiveClassifier-- -cost-matrix [10 30 10 10] -S 1 -W weka
classifiersbayesNaiveBayes
Ada-Cost-NB
(7) Bagging with Naıve BayeswekaclassifiersmetaBagging -P 100 -S 1
-num-slots 1 -I 10 -WwekaclassifiersbayesNaiveBayes
Bag-NB
(8) Bagging with cost-sensitive and Naıve Bayes
wekaclassifiersmetaBagging -P 100 -S 1-num-slots 1 -I 10 -W
wekaclassifiersmetaCostSensitive Classifier ---cost-matrix [10 30 10 10] -S 1 -W
wekaclassifiersbayesNaiveBayes
Bag-Cost-NB
(9) Stacking with NB KNN C45 and logistic
wekaclassifiersmetaStacking -X 10 -M wekaclassifiersfunctionsLogistic -R 10E-8 -M -1ndashnum ndashdecimal-places 4 -S 1 -num-slots 1 -B
weka classifiersbayesNaiveBayes -Bwekaclassifiers lazyIBk -K 1 -W 0 -A
wekacoreneighboursearch LinearNNSearch-A wekacoreEuclideanDistance -R
first-last -B wekaclassifierstreesJ48 -C025 -M 2
Stacking-log
(10) Stacking cost-sensitive with NB KNN C45and logistic
wekaclassifiersmetaStacking -X 10 -M wekaclassifiersmetaCostSensitiveClassifier ndashcost
-matrix [10 30 10 10] -S 1 -Wwekaclassifiers functionsLogistic -- -R 10E-8 -M-1 ndashnum ndashdecimal ndashplaces 4 -S 1 -num-slots 1 -B
weka classifiers bayesNaiveBayes -BwekaclassifierslazyIBk -K 1 -W 0 -A
wekacoreneighboursearchLinearNN Search-A wekacoreEuclideanDistance -R first-
last -B wekaclassifierstreesJ48 -C 025-M 2
Stacking-Cost-log
Mathematical Problems in Engineering 7
Table 5 Summary of imbalanced data sets
Data sets sam (min maj) IR feawisconsin 683 (239 444) 054 9pima 768 (268 500) 053 8haberman 306 (81 225) 036 3Vehicle1 846 (217 629) 0345 18Vehicle0 846 (199 647) 0307 18segment0 2308 (329 1979) 017 19yeast-0-3-5-9 vs 7-8 506 (50 456) 011 8yeast-0-2-5-6 vs 3-7-8-9 1004 (99 905) 011 8vowel0 988 (90 898) 010 13led7digit-0-2-4-5-6-7-8-9 vs 1 443 (37 406) 009 7Glass2 214 (17 197) 008 10cleveland-0 vs 4 173 (13 160) 008 13glass-0-1-6 vs 5 184 (9 175) 005 9car-good 1728 (69 1659) 004 6flare-F 1066 (43 1023) 004 11car-vgood 1728 (65 1663) 0039 6abalone-17 vs 7-8-9-10 2338 (58 2280) 00254 9
and variations in the amount of class overlap (see the KEELrepository [45]) Multiclass data sets were modified to obtaintwo-class imbalanced problems so that the union of one ormore classes became the positive class and the union of oneor more of the remaining classes was labeled as the negativeclass A brief description of the used date set is presented inTable 5 It includes the total number of instances (Sam)the total number of each class instances (Min Maj) theimbalance ratio (IR = the ratio of the number of minorityclass instance to majority class instance) and the number offeatures (Fea)
Our system was compared with other ensemble algo-rithms including AdaBoost with Naıve Bayes AdaBoost withcost-sensitive bagging with Naıve Bayes bagging with cost-sensitive and stacking cost-sensitive with NB k-NN C45and logistic regression All the experiments were performedby 10-fold cross-validation
42 Experimental Results Tables 6 7 and 8 respectivelyshow the results of the three metrics (AUC GeoMean andAGeoMean) for the algorithms in Table 4 obtained with thedata set in Table 5The best results are emphasized in boldfaceon each data set in these tables
The results showed that the performance of the proposedRECSG method was the best for 12 of 17 data sets in termsof GeoMean and AGeoMean and for 10 of 17 data sets interms of AUC Some methods are better than others in someevaluation metrics but not in all metric and most data sets
In Table 6 (AUC) the performance of the RECSGmethodis better than that of the 3 single-base classification methodsfor 14 of 17 data sets and better than that of other 5 ensemblealgorithms for 13 of 17 data sets In Table 7 (GeoMean)the RECSG method outperforms all the 3 single-base clas-sification methods in 15 of 17 data sets and outperformsother 5 ensemble algorithms in 15 of 17 data sets In Table 8
(AGeoMean) the RECSG method outperforms all the 3single-base classification methods in 14 of 17 data sets andoutperforms other 5 ensemble algorithms in 16 of 17 data setsFor the 17 data sets the evaluation metrics of GeoMean andAGeoMean are better than AUC
43 Statistical Tests Statistical test [46] is adopted to comparedifferent algorithms In the paper we use two types of com-parisons pairwise comparison (between a pair of algorithms)and multiple comparisons (among a group of algorithms)
431 Pair of Algorithms We performed statistic 119879-teststo explore whether RECSG approach is the significantlybetter than other algorithms in the three metrics (AUCGeoMean and AGeoMean) Table 9 shows the results ofRECSG approach compared with the other methods interms of AUC GeoMean and AGeoMean The values inthe square brackets indicate the number of the metrics withstatistically significant difference in the 119879-test performedwith the confidence level 120572 = 005 in the three evaluationmetrics As for the evaluation metric of AUC the RECSGoutperformed NB for 11 of 17 data sets and 5 data sets showedthe statistically significant differencewith the confidence levelof 120572 = 005
432 Multiple Comparisons We performed Holm post hoctest [47] and selected multiple groups to test the threemetrics (AUC GeoMean and AGeoMean) The post hocprocedures can determine whether a comparison hypothesisshould be rejected at a specified confidence level 120572 Statisticalexperiment was performed in the platform on the websitehttpteccitiususcesstac Table 10 provides the results thatRECSGapproach is significantly differentwith the confidencelevel of120572 = 005 in terms of AUCGeoMean andAGeoMean1198670 indicates that there is no significant difference According
8 Mathematical Problems in Engineering
Table6Re
sults
ofotherm
etho
dsin
term
sofA
UC
Datas
etNB
C45
KNN
Cost-N
BAd
a-NB
Ada-Cost-N
BBa
g-NB
Bag-Cost-NB
Stacking
-log
Stacking
-Cost-log
wisc
onsin
098
60957
0953
0989
0981
0982
099
0989
0982
0982
pima
0819
0751
0662
0819
0804
0808
0817
0819
0819
0819
haberm
an064
0615
053
064
0667
0625
0638
064
70636
063
Vehicle
10711
0722
0671
0711
0769
0723
0715
0712
0782
0782
Vehicle
00814
093
092
0814
0784
0781
0808
0808
0977
097
7segm
ent0
0987
0971
0996
0987
0944
0959
0987
0986
0999
099
9yeast-0
-3-5-9
vs7-8
075
0675
0653
075
0751
0686
075
0754
0769
0773
yeast-0
-2-5-6
vs3-7-8-9
0832
0687
0782
0832
079
0808
084
0842
084
7082
vowel0
098
0974
1098
0996
0994
0981
098
11
led7digit-0
-2-4-5-6-7-8-9
vs1
095
40925
0827
0954
0898
0893
0953
0949
0909
0904
Glass2
0716
0812
0587
0716
0737
0735
0748
0737
0713
0721
cleveland
-0vs
40909
0824
0754
0909
0826
0891
0915
0913
0979
0976
glass-0-1-6
vs5
0934
0989
0825
0934
0975
0905
0983
0985
09
099
car-good
0976
0493
064
80976
0968
0971
0976
0976
0653
097
7flare-F
0908
0525
059
0911
0876
0893
0908
0912
0825
0806
car-vgoo
d0997
0992
069
0997
0997
0997
0997
0997
0998
099
8abalon
e-17
vs7-8-9-10
0775
0613
0624
0775
0817
073
0773
0774
0752
0726
Mean
0864
0791
0748
0864
0858
0846
0869
0869
0855
0875
Mathematical Problems in Engineering 9
Table7Re
sults
ofotherm
etho
dsin
term
sofG
eoMean
Datas
etNB
C45
KNN
Cost-N
BAd
a-NB
Ada-Cost-N
BBa
g-NB
Bag-Cost-NB
Stacking
-log
Stacking
-Cost-log
wisc
onsin
0965
0954
0953
0968
0959
0967
0968
0966
096
4097
0pima
0719
0697
0649
0736
0715
0739
0723
0731
0703
0741
haberm
an0419
0566
0455
0586
046
60573
0434
0594
0361
060
5Ve
hicle
10677
0622
0652
0674
0607
0677
0673
0661
0561
0712
Vehicle
00724
0912
0919
0726
0735
0733
0719
0733
0894
094
6segm
ent0
0897
0977
0995
0892
0897
0906
0902
0895
099
40993
yeast-0
-3-5-9
vs7-8
046
70463
060
9046
4046
60550
044
50485
044
4060
3yeast-0
-2-5-6
vs3-7-8-9
0762
0590
0762
0793
0762
0792
0762
0787
0706
079
8vowel0
0920
0969
1000
0913
0944
0929
0921
0914
1000
1000
led7digit-0
-2-4-5-6-7-8-9
vs1
0853
0874
0818
0869
0863
0870
0845
0850
0846
0874
Glass2
0578
0472
0470
0564
0578
0610
0586
0568
000
00607
cleveland
-0vs
4089
90724
0722
0936
0611
0863
0866
0861
0866
0866
glass-0-1-6
vs5
0932
0877
0810
0932
0932
0872
0872
0869
0814
0877
car-good
000
0000
00558
0794
0520
0783
000
00778
0561
0813
flare-F
0751
000
00450
0781
0541
064
70738
0781
000
00774
car-vgoo
d0496
0888
0626
0986
0917
0925
0481
0971
0957
098
8abalon
e-17
vs7-8-9-10
0628
0321
0506
064
00319
0368
0627
0630
0293
066
4Mean
0687
064
10703
0779
0696
0753
0680
0769
064
50814
10 Mathematical Problems in Engineering
Table8Re
sults
ofotherm
etho
dsin
term
sofA
GeoMean
Datas
etNB
C45
KNN
Cost-N
BAd
a-NB
Ada-Cost-N
BBa
g-NB
Bag-Cost-NB
Stacking
-log
Stacking
-Cost-log
wisc
onsin
0961
0960
0961
0962
0963
0962
0962
0960
0967
096
7pima
076
80743
0706
0734
0767
0736
0775
0729
0770
0726
haberm
an064
20680
0601
0718
066
00694
0653
0721
0613
0663
Vehicle
10690
0721
0728
0658
0722
0669
0690
0654
0709
0740
Vehicle
0066
60929
0934
0658
0681
0663
066
40663
0924
093
6segm
ent0
0859
0986
0996
0850
0859
0871
0863
0854
0995
099
5yeast-0
-3-5-9
vs7-8
0715
0706
0782
0709
0713
0737
0703
0721
0700
0770
yeast-0
-2-5-6
vs3-7-8-9
0856
0778
0855
0857
0856
0855
0856
0854
0839
086
6vowel0
0929
0981
1000
0909
0966
0948
0931
0910
1000
1000
led7digit-0
-2-4-5-6-7-8-9
vs1
0853
0874
0818
0869
0863
0870
0845
0850
0846
0874
Glass2
0511
0701
0695
0493
0511
0551
0539
0515
000
00744
cleveland
-0vs
4092
70845
0841
0943
0783
0914
0918
0910
0918
0918
glass-0-1-6
vs5
095
40932
0894
0954
0954
0923
0923
0919
0902
0932
car-good
000
0000
00763
0878
0747
0870
000
00873
0769
089
0flare-F
0840
000
00705
0842
0751
0795
0836
0842
000
0084
5car-vgoo
d0743
0934
0793
0986
0953
0956
0725
0980
0974
099
0abalon
e-17
vs7-8-9-10
0736
0655
0745
0716
0650
0672
0734
0711
064
1080
4Mean
0744
0731
0813
0808
0788
0805
0742
08039
0739
086
2
Mathematical Problems in Engineering 11
Table 9 Comparison between RECSG and other algorithms in terms of winslosses of the results
NB C45 KNN Cost-NB Ada-NB Ada-Cost-NB Bag-NB Bag-Cost-
NB Stacking-log
Stacking-Cost-log(AUC)
11 [5]6 [3] 15 [11]2 [2] 17 [15]0 [2] 11 [6]6 [5] 13 [7]4 [2] 14 [9]3 [1] 10 [6]7 [5] 10 [6]7 [5] 11 [3]6 [2]
Stacking-Cost-log(GeoMean)
15 [13]2 [2] 17 [13]0 [0] 15 [13]2 [1] 14 [8]3 [2] 16 [12]1 [1] 16 [10]1 [0] 17 [13]0 [0] 16 [10]3 [0] 16 [12]1 [0]
Stacking-Cost-log(AGeoMean)
14 [10]3 [1] 15 [9]2 [2] 15 [12]2 [1] 13 [7]4 [3] 15 [11]2 [2] 15 [10]2 [2] 16 [10]1 [1] 15 [8]2 [1] 16 [10]1 [1]
Table 10 Holm post hoc test to show differences of Stacking-Cost-log (AUC) and other algorithms
Rejected (1198670) Accepted (1198670)Stacking-Cost-log (AUC) versus other algorithms 3 6Stacking-Cost-log (GeoMean) versus other algorithms 8 1Stacking-Cost-log (AGeoMean) versus other algorithms 9 0
to the evaluation metric of GeoMean the RECSG is signifi-cantly better than other 8 methods with the confidence levelof 120572 = 005
44 Discussion According to the experiments performedwith 17 different imbalance data sets and the comparisonwith other 9 classification methods the proposed RECSGmethod showed the higher performance than other methodsin terms of AUC GeoMean and AGeoMean as illustratedin Tables 7 8 and 9 The RECSG method showed the betterperformance for 10 of 17 cases in terms of AUC and for 12 of17 cases in terms of GeoMean and AGeoMean The methodoutperformed other 5 ensemble algorithms for 16 of 17 datasets in terms ofGeoMean andAGeoMean and for 13 of 17 datasets in terms of AUC GeoMean and AGeoMean are betterthan AUC in evaluation metrics in 17 data sets The means ofRECSGmethod in terms of GeoMean AGeoMean and AUCare all higher than other methods The Holm post hoc testshows that RECSGmethod significantly outperforms the restwith the confidence level 120572 = 005 in terms of AGeoMean for8 of 9methods in terms of AGeoMean and for 3 of 9methodsin terms of AUC
Experimental results and statistical tests show that theRECSG approach has improved the classification perfor-mance of imbalanced data sets The reasons can be explainedas follows
Firstly stacking algorithm uses to combine the resultof base classifier whereas bagging employs majority voteBagging is only a simple decision combinationmethodwhichrequires neither cross-validation nor Level 1 learning In thispaper stacked generalization adopts logistic regressionthus providing the simplest linear combination of pooling theLevel 0 modelsrsquo confidence
Secondly cost-sensitive algorithm can affect imbalancedata sets in two aspects Firstly cost-sensitive modificationscan be applied in the probabilistic estimationMoreover cost-sensitive factors change theweight of instance and theweight
of the minority class is higher than that of the majorityclass Therefore the method directly changes the number ofcommon instances without discarding or duplicating any ofthe rare instances
Thirdly RECSG method introduces cost-sensitive learn-ing into stacking ensemble The cost-sensitive function in replaces the error-minimizing function thus allowing Level 1learning to be prone to focus on the minority class
The results in Tables 7 and 8 demonstrate that the RECSGmethod has the higher performancewhen the evaluation per-formance metric of base classifier is weaker (such as Vehicle1Vehicle0 and car-vgood) The reason is that the alternativemethods of NB C45 and 119896-NN have shortcomings Theindependence assumption hampers the performance of NBin some data sets In C45 tree construction the selectionof attributes affects the model performances Error ratio of119896-NN is relatively high when data sets are in imbalanceTherefore logistic regression adopted in can improve theperformance when base classifier is weaker
The performance of the RECSG method is generallybetter when the IR is low (such as Glass2 car-good flare-Fcar-vgood and abalone-17 vs 7-8-9-10) The performance ofRECSG method is probably related to the setting of the costmatrix Different data sets should use different cost matricesFor the purpose of simplicity in the paper we have adoptedthe same cost matrices whichmay bemore suitable to low IRvalues
5 Conclusions
In this paper in order to solve the class imbalance problemwe proposed the RECSG method based on 2-layer learningmodels Experimental results and statistical tests showedthat the RECSG approach improved the classification per-formance The proposed RECSG approach may have therelatively high computational complexity in the trainingstage because the approach involves 2-layer classifier models
12 Mathematical Problems in Engineering
which consist of several base classifiers and metaclassifiersThe number and kinds of the base-level classifiers areclosely related to the performance of stacking algorithmIn the fusion stage of base classifiers the selection of themetaclassifier is also important In this paper in order tovalidate the performance improvement compared with othercurrent classification algorithms we only randomly selected3 classification algorithms (NB 119896-NN and C45) as baseclassifier and cost-sensitive algorithm as metaclassifier inthe fuse stage Selection of the number or kind of the baseclassifiers and metaclassifier was not discussed Thereforewe will explore the diversity and quality of base classifier inthe future The adoption of these strategies would improvethe prediction performance and reduce the training time ofstacking algorithm for imbalance problems
Conflicts of Interest
The author declares that there are no conflicts of interestregarding the publication of this paper
Acknowledgments
This research is supported by the Natural Science Foundationof Shanxi Province under Grant no 2015011039
References
[1] O Loyola-Gonzalez J F Martınez-Trinidad J A Carrasco-Ochoa and M Garcıa-Borroto ldquoEffect of class imbalance onquality measures for contrast patterns an experimental studyrdquoInformation Sciences vol 374 pp 179ndash192 2016
[2] C Beyan and R Fisher ldquoClassifying imbalanced data setsusing similarity based hierarchical decompositionrdquo PatternRecognition vol 48 no 5 pp 1653ndash1672 2015
[3] H He and E A Garcia ldquoLearning from imbalanced datardquo IEEETransactions on Knowledge and Data Engineering vol 21 no 9pp 1263ndash1284 2009
[4] V C Chawla K W Bowyer L O Hall and W P KegelmeyerldquoSMOTE synthetic minority over-sampling techniquerdquo Journalof Artificial Intelligence Research vol 16 pp 321ndash357 2002
[5] X-Y Liu J X Wu and Z-H Zhou ldquoExploratory under-sampling for class-imbalance learningrdquo in Proceedings of the 6thInternational Conference on Data Mining (ICDM rsquo06) pp 965ndash969 IEEE Hong Kong December 2006
[6] N Japkowicz ldquoThe class imbalance problem significance andstrategiesrdquo in Proceedings of the International Conference onArtificial Intelligence 2000
[7] B Zadrozny and C Elkan ldquoLearning and making decisionswhen costs and probabilities are both unknownrdquo in Proceedingsof the the seventh ACM SIGKDD international conference pp204ndash213 San Francisco Calif USA August 2001
[8] M Galar A Fernandez E Barrenechea H Bustince andF Herrera ldquoA review on ensembles for the class imbalanceproblem bagging boosting and hybrid-based approachesrdquoIEEE Transactions on Systems Man and Cybernetics Part CApplications and Reviews vol 42 no 4 pp 463ndash484 2012
[9] N V Chawla D A Cieslak L O Hall and A Joshi ldquoAuto-matically countering imbalance and its empirical relationship
to costrdquo Data Mining and Knowledge Discovery vol 17 no 2pp 225ndash252 2008
[10] A Freitas A Costapereira and P Brazdil ldquoCost-SensitiveDecision Trees Applied to Medical Datardquo in InternationalConference on Data Warehousing and Knowledge Discovery pp303ndash312 2007
[11] L Breiman ldquoBagging predictorsrdquoMachine Learning vol 24 no2 pp 123ndash140 1996
[12] R E Schapire ldquoThe strength of weak learnabilityrdquo MachineLearning vol 5 no 2 pp 197ndash227 1990
[13] Y Freund and R E Schapire ldquoA decision-theoretic generaliza-tion of on-line learning and an application to boostingrdquo Journalof Computer and System Sciences vol 55 no 1 part 2 pp 119ndash139 1997
[14] N V Chawla A Lazarevic L O Hall and K W BowyerldquoSMOTEBoost improving prediction of the minority class inboostingrdquo in Proceeding of Knowledge Discovery in Databasesvol 2838 pp 107ndash119
[15] C Seiffert T M Khoshgoftaar J VanHulse and A NapolitanoldquoRUSBoost A hybrid approach to alleviating class imbalancerdquoIEEE Transactions on Systems Man and Cybernetics Systemsvol 40 no 1 pp 185ndash197 2010
[16] S Hu Y Liang L Ma and Y He ldquoMSMOTE Improvingclassification performance when training data is imbalancedrdquoin Proceedings of the 2nd International Workshop on ComputerScience and Engineering WCSE 2009 pp 13ndash17 China October2009
[17] H Guo and H L Viktor ldquoLearning from imbalanced data setswith boosting and data generationrdquoACMSIGKDDExplorationsNewsletter vol 6 no 1 pp 30ndash39 2004
[18] SWang and X Yao ldquoDiversity analysis on imbalanced data setsby using ensemble modelsrdquo in Proceedings of the 2009 IEEESymposium on Computational Intelligence and Data MiningCIDM 2009 pp 324ndash331 USA April 2009
[19] R Barandela J S Srsquoanchez and R M Valdovinos ldquoNewapplications of ensembles of classifiersrdquo PAA Pattern Analysisand Applications vol 6 no 3 pp 245ndash256 2003
[20] J Błaszczynski M Deckert J Stefanowski and S WilkldquoIntegrating selective pre-processing of imbalanced data withIvotes ensemblerdquo Lecture Notes in Computer Science (includingsubseries Lecture Notes in Artificial Intelligence and Lecture Notesin Bioinformatics) Preface vol 6086 pp 148ndash157 2010
[21] W Fan S J Stolfo J Zhang and P K Chan ldquoAdacostMisclassication cost-sensitive boostingrdquo in the 6th Int ConfMach Learning pp 97ndash105 San Francisco CA USA 1999
[22] K M Ting ldquoA comparative study of cost-sensitive boostingalgorithmsrdquo in Proc 17th Int Conf Mach Learning pp 983ndash990 Stanford CA USA 2000
[23] M Joshi V Kumar and R Agarwal ldquoEvaluating boosting algo-rithms to classify rare classes comparison and improvementsrdquoinProceedings of the 2001 IEEE International Conference onDataMining pp 257ndash264 San Jose CA USA
[24] Y Sun M S Kamel A K C Wong and Y Wang ldquoCost-sensitive boosting for classification of imbalanced datardquo PatternRecognition vol 40 no 12 pp 3358ndash3378 2007
[25] D H Wolpert ldquoStacked generalizationrdquo Neural Networks vol5 no 2 pp 241ndash259 1992
[26] Y Chen M L Wong and H Li ldquoApplying Ant Colony Opti-mization to configuring stacking ensembles for data miningrdquoExpert Systems with Applications vol 41 no 6 pp 2688ndash27022014
Mathematical Problems in Engineering 13
[27] H Kadkhodaei and A M E Moghadam ldquoAn entropy basedapproach to find the best combination of the base classi-fiers in ensemble classifiers based on stack generalizationrdquoin Proceedings of the 4th International Conference on ControlInstrumentation and Automation ICCIA 2016 pp 425ndash429Iran January 2016
[28] I Czarnowski and P Jędrzejowicz ldquoAn approach to machineclassification based on stacked generalization and instanceselectionrdquo in Proceedings of the IEEE International Conferenceon Systems Man and Cybernetics SMC 2016 pp 4836ndash4841Hungary 2017
[29] S Kotsiantis ldquoStacking cost sensitive modelsrdquo in Proceedings ofthe 12th Pan-Hellenic Conference on Informatics PCI 2008 pp217ndash221 Greece August 2008
[30] H Y Lo J C Wang H MWang and S D Lin ldquoCost-sensitivestacking for audio tag annotation and retrievalrdquo in Proceedingsof the 36th IEEE International Conference on Acoustics Speechand Signal Processing ICASSP 2011 pp 2308ndash2311 CzechRepublic May 2011
[31] JHuang andCX Ling ldquoUsingAUCand accuracy in evaluatinglearning algorithmsrdquo IEEETransactions on Knowledge andDataEngineering vol 17 no 3 pp 299ndash310 2005
[32] M Kubat and S Matwin ldquoAddressing the curse of imbalancedtraining sets one sided selection inrdquo in Proceedings of the 14thInternational Conference on Machine Learning (ICMLrsquo97 pp179ndash186 1997
[33] R Batuwita and V Palade ldquoAdjusted geometric-mean A novelperformance measure for imbalanced bioinformatics datasetslearningrdquo Journal of Bioinformatics and Computational Biologyvol 10 no 4 Article ID 1250003 2012
[34] L Breiman ldquoStacked regressionsrdquoMachine Learning vol 24 no1 pp 49ndash64 1996
[35] M Leblanc and R Tibshirani ldquoCombining estimates in regres-sion and classificationrdquo Journal of the American StatisticalAssociation vol 91 no 436 pp 1641ndash1650 1996
[36] B Cestnik ldquoEstimating Probabilities A Crucial Task inMachine Learningrdquo in Proceedings of the European Conferenceon Artificial Intelligence pp 147ndash149 1990
[37] J R Quinlan C45 Program for machine learning MorganKaufmann 1993
[38] N S Altman ldquoAn introduction to kernel and nearest-neighbornonparametric regressionrdquo The American Statistician vol 46no 3 pp 175ndash185 1992
[39] J R Quinlan ldquoInduction of decision treesrdquo Machine Learningvol 1 no 1 pp 81ndash106 1986
[40] C Elkan ldquoThe foundations of cost-sensitive learningrdquo in Pro-ceedings of the 17th International Joint Conference on ArtificialIntelligence (IJCAI rsquo01) pp 973ndash978 New York NY USAAugust 2001
[41] D Michie D Spiegel halter J C Taylor et alMachine learningneural and statistical classification Ellis Horwood neural andstatistical classification 1995
[42] K M Ting and I F Witten ldquoIssues in stacked generalizationrdquoArtif Intell Res vol 10 no 1 pp 271ndash289 1999
[43] M Hall E Frank G Holmes B Pfahringer P Reutemann andI H Witten ldquoThe WEKA data mining software an updaterdquoACM SIGKDD Explorations Newsletter vol 11 no 1 pp 10ndash182009
[44] A Frank and A Asuncion UCI Machine Learning RepositoryUniversity of California School of Information and ComputerScience Irvine CA USA httparchiveicsucieduml
[45] J Alcala-Fdez A Fernandez J Luengo et al ldquoKEEL data-mining software tool data set repository integration ofalgorithms and experimental analysis frameworkrdquo Journal ofMultiple-Valued Logic and Soft Computing vol 17 no 2-3 pp255ndash287 2011
[46] S Garcıa A Fernandez J Luengo and F Herrera ldquoA study ofstatistical techniques and performance measures for genetics-based machine learning accuracy and interpretabilityrdquo SoftComputing vol 13 no 10 pp 959ndash977 2009
[47] S Holm ldquoA simple sequentially rejective multiple test proce-durerdquo Scandinavian Journal of Statistics vol 6 no 2 pp 65ndash701979
Hindawiwwwhindawicom Volume 2018
MathematicsJournal of
Hindawiwwwhindawicom Volume 2018
Mathematical Problems in Engineering
Applied MathematicsJournal of
Hindawiwwwhindawicom Volume 2018
Probability and StatisticsHindawiwwwhindawicom Volume 2018
Journal of
Hindawiwwwhindawicom Volume 2018
Mathematical PhysicsAdvances in
Complex AnalysisJournal of
Hindawiwwwhindawicom Volume 2018
OptimizationJournal of
Hindawiwwwhindawicom Volume 2018
Hindawiwwwhindawicom Volume 2018
Engineering Mathematics
International Journal of
Hindawiwwwhindawicom Volume 2018
Operations ResearchAdvances in
Journal of
Hindawiwwwhindawicom Volume 2018
Function SpacesAbstract and Applied AnalysisHindawiwwwhindawicom Volume 2018
International Journal of Mathematics and Mathematical Sciences
Hindawiwwwhindawicom Volume 2018
Hindawi Publishing Corporation httpwwwhindawicom Volume 2013Hindawiwwwhindawicom
The Scientific World Journal
Volume 2018
Hindawiwwwhindawicom Volume 2018Volume 2018
Numerical AnalysisNumerical AnalysisNumerical AnalysisNumerical AnalysisNumerical AnalysisNumerical AnalysisNumerical AnalysisNumerical AnalysisNumerical AnalysisNumerical AnalysisNumerical AnalysisNumerical AnalysisAdvances inAdvances in Discrete Dynamics in
Nature and SocietyHindawiwwwhindawicom Volume 2018
Hindawiwwwhindawicom
Dierential EquationsInternational Journal of
Volume 2018
Hindawiwwwhindawicom Volume 2018
Decision SciencesAdvances in
Hindawiwwwhindawicom Volume 2018
AnalysisInternational Journal of
Hindawiwwwhindawicom Volume 2018
Stochastic AnalysisInternational Journal of
Submit your manuscripts atwwwhindawicom
![Page 5: Classifying Imbalanced Data Sets by a Novel RE-Sample and ...downloads.hindawi.com/journals/mpe/2018/5036710.pdf · Cost-Sensitive Stacked Generalization Method ... In this paper,](https://reader035.vdocuments.net/reader035/viewer/2022070801/5f02a4577e708231d4054843/html5/thumbnails/5.jpg)
Mathematical Problems in Engineering 5
Table 2 Cost matrix
Actual negative Actual positivePredict negative 119862 (0 0) = 11988800 119862 (0 1) = 11988801Predict positive 119862 (1 0) = 11988810 119862 (1 1) = 11988811
Table 3 Cost matrix of credit data sets
Actual bad Actual goodPredict bad 0 1Predict good 5 0
(2) Decision Tree (C45) Decision tree classification is amethod usually used in data mining [39] A decision tree isa tree where each input feature labels a nonleaf node oneclass or probability over the class labels each leaf node of thetree By recursive partitioning a tree can be ldquolearnedrdquo untilno prediction value is added or the subset has the same valueWhen the parameter of training data selects informationentropy the decision tree is named C45
(3) 119896-NN k-NN is a nonparametric algorithm used forregression or classification An instance is classified by amajority vote of its 119896-nearest neighbors If 119896 = 1 the instanceis simply classified in accordance with the class label of thenearest neighbor
All the above algorithms are simple and have low com-plexity so they are applicable weak base classifiers
32 Level 1 Model Generalizers Prediction results of severalbase classifiers in Level 0 layer are used as input space andtrue value class of instance is used as output space Based onLevel 0 layer Level 1 layer model is trained by other learningalgorithms For imbalanced data the cost-sensitive algorithmis used to train the Level 1 metalayer classifier in the paper
321 Cost-Sensitive Classifier Aiming at imbalanced datalearning problem in cost-sensitive classifier different costmatrices are used as the penalty ofmisclassified instance [40]For example a cost matrix 119862 has the following structure in abinary classification scenario in Table 2
In the cost matrix the row indicated alternative predictedclasser whereas the column indicates actual classes The costof false negative is notated as 11988801 and the cost of false positiveis notated as 11988810 Conceptually the cost of correctly classifiedinstances should always be less than the cost of incorrectlyclassified instances In the imbalance problem 11988810 is alwaysgreater than 11988801 For the German credit data set previouslyreported as the part of the Stalog project [41] the cost matrixis provided in Table 3
The cost of false good prediction is greater than the costof false bad prediction in the view of economical reason Soin Level 1 classifier of stacking for the imbalance problem thecost-sensitive algorithm is adopted
322 Logistic Regression Ting and Witten illustrated thatMLR (Multiresponse Linear Regression) algorithm had an
advantage over other Level 1 generalizers in stacking [42]In this paper the logistic regression classifier is used as themetalayer algorithm Based on the metalayer the cost of mis-classification is considered in the cost-sensitive algorithm Inlogistic regression classifier the prediction result of Level 0layer is used as the attribute of Level 1 metalayer and the realvalue of instance is used as output spaceThe linear regressionfor class 119894 is simply obtained as
LR119894 (119909) =119870
sum119895=1
120572119895119894119875119895119894 (119909) 119895 = 1 119896 (12)
Details of the implementation of RECSG are presentedbelow
33 Algorithm Pseudocode Pseudocode 1 presents the pro-posed RECSG approach Input parameters are two data setstraining set and testing set Output predicts class labelsof the test samples The first step in the process is datapreprocessing resample new instances and train model119872119895 (119895 = 1 119896) where 119895 is the number of base classifiersand 119906119895(119909119894) is the generated function of 119909119894 based on model119872119895(lines (1)-(2) in Pseudocode 1)
Then the metalayer model is constructed based onthe data 119863meta which is firstly predicted with Level 0 layer(line (3) in Pseudocode 1) Finally Lever 1 layer classification(cost-sensitive and logistic regression) is used to predict theultimate value of the tested samples
4 Empirical Investigation
The experiment aims to verify whether the RECSG approachcan improve the classification performance for the imbalanceproblem In this paper the RECSG approach was comparedwith other algorithms involving various combination ensem-ble techniques and imbalanced approaches For eachmethodthe same training set testing set and validation set were used
The experimental system has been implemented and iscomposed of 7 learning methods implemented in Weka [43]namely Naıve Bayes(1) (NB) C45(j48)(2) 119896-nearest neigh-bor (k-NN)(3) cost-sensitive(4) AdaBoost(5) Bagging(6) andstacking(7) The brief description the standard version andparameters of these methods are shown in Table 4
The RECSG approach includes 2 layers Level 1 layer(metalearning) system consists of 2 learning methods imple-mented inWeka namely simple logistic regression(9) (MLP)and cost-sensitive classifier(4) MLP is the base classificationalgorithm of cost-sensitive classifier Level 0 layer (base -learning) of stacking is composed of 3 classifiers NaıveBayes(1) C45(2) and 119896-NN(3) Evaluation metrics of algo-rithmperformance includeAUCGeoMean andAGeoMean
41 Experimental Settings Experiments were implementedwith 17 data sets from the UCI Machine Learning Repository[44] These data sets cover various fields and are based onIR measure values (from 054 to 0014) unique data setnames a varying number of samples (from 173 to 2338)
6 Mathematical Problems in Engineering
Input Training set119863 = (119909119894 119910119894)119873119894=1 Test dataset119863test = (1199091015840119894 )1198731015840
119894=1
Output Predict class labels of the test samples 1199101015840119894meta1198731015840
119894=1
For each119895 = 1 2 119870 do(1) Resample imbalance data and generate 119895ndashfold cross-validation sets to obtain New data119895(2) Train and compute 119906119895(119909119894)119873119894=1 and 119906119895(1199091015840119894 )119873
1015840
119894=1 in Level-0 (base)-layer classifier119872119895end(3) Construct119863meta = 1199061(119909119894) 119910119894119873
1015840
119894=1 and1198631015840meta = 1199061(1199091015840119894 ) 1199101198941198731015840
119894=1
(4) Based on the data119863meta classification (cost-sensitive and Logistic Regression) is used togenerate Level-1 (meta)-layer model through with1198631015840meta to predict 1199101015840119894meta119873
1015840
119894=1
Pseudocode 1 Pseudocodes of RE-sample and Cost-Sensitive Stacked Generalization
Table 4 Base classifier methods and corresponding ensemble algorithms
Methods Descriptions and parameters Abbreviations(1) Naıve Bayes wekaclassifiersbayesNaiveBayes NB(2) C45 wekaclassifierstreesJ48 -C 025 -M 2 C45
(3) 119896-nearest neighborwekaclassifierslazyIBk -K 1 -W 0 -A
wekacoreneighboursearchLinearNNSearch -AwekacoreEuclideanDistance -R first-last
KNN
(4) Cost-sensitive with Naıve Bayes wekaclassifiersmetaAdaBoostM1 -P 100 -S 1 -I 10-W wekaclassifiersbayesNaiveBayes Cost-NB
(5) AdaBoost with Naıve Bayes wekaclassifiersmetaAdaBoostM1 -P 100 -S 1 -I 10-W wekaclassifiersbayesNaiveBayes Ada-NB
(6) AdaBoost with cost-sensitive and Naıve Bayes
wekaclassifiersmetaAdaBoostM1 -P 100 -S 1-I10-W wekaClassifiersmetaCostSensitiveClassifier-- -cost-matrix [10 30 10 10] -S 1 -W weka
classifiersbayesNaiveBayes
Ada-Cost-NB
(7) Bagging with Naıve BayeswekaclassifiersmetaBagging -P 100 -S 1
-num-slots 1 -I 10 -WwekaclassifiersbayesNaiveBayes
Bag-NB
(8) Bagging with cost-sensitive and Naıve Bayes
wekaclassifiersmetaBagging -P 100 -S 1-num-slots 1 -I 10 -W
wekaclassifiersmetaCostSensitive Classifier ---cost-matrix [10 30 10 10] -S 1 -W
wekaclassifiersbayesNaiveBayes
Bag-Cost-NB
(9) Stacking with NB KNN C45 and logistic
wekaclassifiersmetaStacking -X 10 -M wekaclassifiersfunctionsLogistic -R 10E-8 -M -1ndashnum ndashdecimal-places 4 -S 1 -num-slots 1 -B
weka classifiersbayesNaiveBayes -Bwekaclassifiers lazyIBk -K 1 -W 0 -A
wekacoreneighboursearch LinearNNSearch-A wekacoreEuclideanDistance -R
first-last -B wekaclassifierstreesJ48 -C025 -M 2
Stacking-log
(10) Stacking cost-sensitive with NB KNN C45and logistic
wekaclassifiersmetaStacking -X 10 -M wekaclassifiersmetaCostSensitiveClassifier ndashcost
-matrix [10 30 10 10] -S 1 -Wwekaclassifiers functionsLogistic -- -R 10E-8 -M-1 ndashnum ndashdecimal ndashplaces 4 -S 1 -num-slots 1 -B
weka classifiers bayesNaiveBayes -BwekaclassifierslazyIBk -K 1 -W 0 -A
wekacoreneighboursearchLinearNN Search-A wekacoreEuclideanDistance -R first-
last -B wekaclassifierstreesJ48 -C 025-M 2
Stacking-Cost-log
Mathematical Problems in Engineering 7
Table 5 Summary of imbalanced data sets
Data sets sam (min maj) IR feawisconsin 683 (239 444) 054 9pima 768 (268 500) 053 8haberman 306 (81 225) 036 3Vehicle1 846 (217 629) 0345 18Vehicle0 846 (199 647) 0307 18segment0 2308 (329 1979) 017 19yeast-0-3-5-9 vs 7-8 506 (50 456) 011 8yeast-0-2-5-6 vs 3-7-8-9 1004 (99 905) 011 8vowel0 988 (90 898) 010 13led7digit-0-2-4-5-6-7-8-9 vs 1 443 (37 406) 009 7Glass2 214 (17 197) 008 10cleveland-0 vs 4 173 (13 160) 008 13glass-0-1-6 vs 5 184 (9 175) 005 9car-good 1728 (69 1659) 004 6flare-F 1066 (43 1023) 004 11car-vgood 1728 (65 1663) 0039 6abalone-17 vs 7-8-9-10 2338 (58 2280) 00254 9
and variations in the amount of class overlap (see the KEELrepository [45]) Multiclass data sets were modified to obtaintwo-class imbalanced problems so that the union of one ormore classes became the positive class and the union of oneor more of the remaining classes was labeled as the negativeclass A brief description of the used date set is presented inTable 5 It includes the total number of instances (Sam)the total number of each class instances (Min Maj) theimbalance ratio (IR = the ratio of the number of minorityclass instance to majority class instance) and the number offeatures (Fea)
Our system was compared with other ensemble algo-rithms including AdaBoost with Naıve Bayes AdaBoost withcost-sensitive bagging with Naıve Bayes bagging with cost-sensitive and stacking cost-sensitive with NB k-NN C45and logistic regression All the experiments were performedby 10-fold cross-validation
42 Experimental Results Tables 6 7 and 8 respectivelyshow the results of the three metrics (AUC GeoMean andAGeoMean) for the algorithms in Table 4 obtained with thedata set in Table 5The best results are emphasized in boldfaceon each data set in these tables
The results showed that the performance of the proposedRECSG method was the best for 12 of 17 data sets in termsof GeoMean and AGeoMean and for 10 of 17 data sets interms of AUC Some methods are better than others in someevaluation metrics but not in all metric and most data sets
In Table 6 (AUC) the performance of the RECSGmethodis better than that of the 3 single-base classification methodsfor 14 of 17 data sets and better than that of other 5 ensemblealgorithms for 13 of 17 data sets In Table 7 (GeoMean)the RECSG method outperforms all the 3 single-base clas-sification methods in 15 of 17 data sets and outperformsother 5 ensemble algorithms in 15 of 17 data sets In Table 8
(AGeoMean) the RECSG method outperforms all the 3single-base classification methods in 14 of 17 data sets andoutperforms other 5 ensemble algorithms in 16 of 17 data setsFor the 17 data sets the evaluation metrics of GeoMean andAGeoMean are better than AUC
43 Statistical Tests Statistical test [46] is adopted to comparedifferent algorithms In the paper we use two types of com-parisons pairwise comparison (between a pair of algorithms)and multiple comparisons (among a group of algorithms)
431 Pair of Algorithms We performed statistic 119879-teststo explore whether RECSG approach is the significantlybetter than other algorithms in the three metrics (AUCGeoMean and AGeoMean) Table 9 shows the results ofRECSG approach compared with the other methods interms of AUC GeoMean and AGeoMean The values inthe square brackets indicate the number of the metrics withstatistically significant difference in the 119879-test performedwith the confidence level 120572 = 005 in the three evaluationmetrics As for the evaluation metric of AUC the RECSGoutperformed NB for 11 of 17 data sets and 5 data sets showedthe statistically significant differencewith the confidence levelof 120572 = 005
432 Multiple Comparisons We performed Holm post hoctest [47] and selected multiple groups to test the threemetrics (AUC GeoMean and AGeoMean) The post hocprocedures can determine whether a comparison hypothesisshould be rejected at a specified confidence level 120572 Statisticalexperiment was performed in the platform on the websitehttpteccitiususcesstac Table 10 provides the results thatRECSGapproach is significantly differentwith the confidencelevel of120572 = 005 in terms of AUCGeoMean andAGeoMean1198670 indicates that there is no significant difference According
8 Mathematical Problems in Engineering
Table6Re
sults
ofotherm
etho
dsin
term
sofA
UC
Datas
etNB
C45
KNN
Cost-N
BAd
a-NB
Ada-Cost-N
BBa
g-NB
Bag-Cost-NB
Stacking
-log
Stacking
-Cost-log
wisc
onsin
098
60957
0953
0989
0981
0982
099
0989
0982
0982
pima
0819
0751
0662
0819
0804
0808
0817
0819
0819
0819
haberm
an064
0615
053
064
0667
0625
0638
064
70636
063
Vehicle
10711
0722
0671
0711
0769
0723
0715
0712
0782
0782
Vehicle
00814
093
092
0814
0784
0781
0808
0808
0977
097
7segm
ent0
0987
0971
0996
0987
0944
0959
0987
0986
0999
099
9yeast-0
-3-5-9
vs7-8
075
0675
0653
075
0751
0686
075
0754
0769
0773
yeast-0
-2-5-6
vs3-7-8-9
0832
0687
0782
0832
079
0808
084
0842
084
7082
vowel0
098
0974
1098
0996
0994
0981
098
11
led7digit-0
-2-4-5-6-7-8-9
vs1
095
40925
0827
0954
0898
0893
0953
0949
0909
0904
Glass2
0716
0812
0587
0716
0737
0735
0748
0737
0713
0721
cleveland
-0vs
40909
0824
0754
0909
0826
0891
0915
0913
0979
0976
glass-0-1-6
vs5
0934
0989
0825
0934
0975
0905
0983
0985
09
099
car-good
0976
0493
064
80976
0968
0971
0976
0976
0653
097
7flare-F
0908
0525
059
0911
0876
0893
0908
0912
0825
0806
car-vgoo
d0997
0992
069
0997
0997
0997
0997
0997
0998
099
8abalon
e-17
vs7-8-9-10
0775
0613
0624
0775
0817
073
0773
0774
0752
0726
Mean
0864
0791
0748
0864
0858
0846
0869
0869
0855
0875
Mathematical Problems in Engineering 9
Table7Re
sults
ofotherm
etho
dsin
term
sofG
eoMean
Datas
etNB
C45
KNN
Cost-N
BAd
a-NB
Ada-Cost-N
BBa
g-NB
Bag-Cost-NB
Stacking
-log
Stacking
-Cost-log
wisc
onsin
0965
0954
0953
0968
0959
0967
0968
0966
096
4097
0pima
0719
0697
0649
0736
0715
0739
0723
0731
0703
0741
haberm
an0419
0566
0455
0586
046
60573
0434
0594
0361
060
5Ve
hicle
10677
0622
0652
0674
0607
0677
0673
0661
0561
0712
Vehicle
00724
0912
0919
0726
0735
0733
0719
0733
0894
094
6segm
ent0
0897
0977
0995
0892
0897
0906
0902
0895
099
40993
yeast-0
-3-5-9
vs7-8
046
70463
060
9046
4046
60550
044
50485
044
4060
3yeast-0
-2-5-6
vs3-7-8-9
0762
0590
0762
0793
0762
0792
0762
0787
0706
079
8vowel0
0920
0969
1000
0913
0944
0929
0921
0914
1000
1000
led7digit-0
-2-4-5-6-7-8-9
vs1
0853
0874
0818
0869
0863
0870
0845
0850
0846
0874
Glass2
0578
0472
0470
0564
0578
0610
0586
0568
000
00607
cleveland
-0vs
4089
90724
0722
0936
0611
0863
0866
0861
0866
0866
glass-0-1-6
vs5
0932
0877
0810
0932
0932
0872
0872
0869
0814
0877
car-good
000
0000
00558
0794
0520
0783
000
00778
0561
0813
flare-F
0751
000
00450
0781
0541
064
70738
0781
000
00774
car-vgoo
d0496
0888
0626
0986
0917
0925
0481
0971
0957
098
8abalon
e-17
vs7-8-9-10
0628
0321
0506
064
00319
0368
0627
0630
0293
066
4Mean
0687
064
10703
0779
0696
0753
0680
0769
064
50814
10 Mathematical Problems in Engineering
Table8Re
sults
ofotherm
etho
dsin
term
sofA
GeoMean
Datas
etNB
C45
KNN
Cost-N
BAd
a-NB
Ada-Cost-N
BBa
g-NB
Bag-Cost-NB
Stacking
-log
Stacking
-Cost-log
wisc
onsin
0961
0960
0961
0962
0963
0962
0962
0960
0967
096
7pima
076
80743
0706
0734
0767
0736
0775
0729
0770
0726
haberm
an064
20680
0601
0718
066
00694
0653
0721
0613
0663
Vehicle
10690
0721
0728
0658
0722
0669
0690
0654
0709
0740
Vehicle
0066
60929
0934
0658
0681
0663
066
40663
0924
093
6segm
ent0
0859
0986
0996
0850
0859
0871
0863
0854
0995
099
5yeast-0
-3-5-9
vs7-8
0715
0706
0782
0709
0713
0737
0703
0721
0700
0770
yeast-0
-2-5-6
vs3-7-8-9
0856
0778
0855
0857
0856
0855
0856
0854
0839
086
6vowel0
0929
0981
1000
0909
0966
0948
0931
0910
1000
1000
led7digit-0
-2-4-5-6-7-8-9
vs1
0853
0874
0818
0869
0863
0870
0845
0850
0846
0874
Glass2
0511
0701
0695
0493
0511
0551
0539
0515
000
00744
cleveland
-0vs
4092
70845
0841
0943
0783
0914
0918
0910
0918
0918
glass-0-1-6
vs5
095
40932
0894
0954
0954
0923
0923
0919
0902
0932
car-good
000
0000
00763
0878
0747
0870
000
00873
0769
089
0flare-F
0840
000
00705
0842
0751
0795
0836
0842
000
0084
5car-vgoo
d0743
0934
0793
0986
0953
0956
0725
0980
0974
099
0abalon
e-17
vs7-8-9-10
0736
0655
0745
0716
0650
0672
0734
0711
064
1080
4Mean
0744
0731
0813
0808
0788
0805
0742
08039
0739
086
2
Mathematical Problems in Engineering 11
Table 9 Comparison between RECSG and other algorithms in terms of winslosses of the results
NB C45 KNN Cost-NB Ada-NB Ada-Cost-NB Bag-NB Bag-Cost-
NB Stacking-log
Stacking-Cost-log(AUC)
11 [5]6 [3] 15 [11]2 [2] 17 [15]0 [2] 11 [6]6 [5] 13 [7]4 [2] 14 [9]3 [1] 10 [6]7 [5] 10 [6]7 [5] 11 [3]6 [2]
Stacking-Cost-log(GeoMean)
15 [13]2 [2] 17 [13]0 [0] 15 [13]2 [1] 14 [8]3 [2] 16 [12]1 [1] 16 [10]1 [0] 17 [13]0 [0] 16 [10]3 [0] 16 [12]1 [0]
Stacking-Cost-log(AGeoMean)
14 [10]3 [1] 15 [9]2 [2] 15 [12]2 [1] 13 [7]4 [3] 15 [11]2 [2] 15 [10]2 [2] 16 [10]1 [1] 15 [8]2 [1] 16 [10]1 [1]
Table 10 Holm post hoc test to show differences of Stacking-Cost-log (AUC) and other algorithms
Rejected (1198670) Accepted (1198670)Stacking-Cost-log (AUC) versus other algorithms 3 6Stacking-Cost-log (GeoMean) versus other algorithms 8 1Stacking-Cost-log (AGeoMean) versus other algorithms 9 0
to the evaluation metric of GeoMean the RECSG is signifi-cantly better than other 8 methods with the confidence levelof 120572 = 005
44 Discussion According to the experiments performedwith 17 different imbalance data sets and the comparisonwith other 9 classification methods the proposed RECSGmethod showed the higher performance than other methodsin terms of AUC GeoMean and AGeoMean as illustratedin Tables 7 8 and 9 The RECSG method showed the betterperformance for 10 of 17 cases in terms of AUC and for 12 of17 cases in terms of GeoMean and AGeoMean The methodoutperformed other 5 ensemble algorithms for 16 of 17 datasets in terms ofGeoMean andAGeoMean and for 13 of 17 datasets in terms of AUC GeoMean and AGeoMean are betterthan AUC in evaluation metrics in 17 data sets The means ofRECSGmethod in terms of GeoMean AGeoMean and AUCare all higher than other methods The Holm post hoc testshows that RECSGmethod significantly outperforms the restwith the confidence level 120572 = 005 in terms of AGeoMean for8 of 9methods in terms of AGeoMean and for 3 of 9methodsin terms of AUC
Experimental results and statistical tests show that theRECSG approach has improved the classification perfor-mance of imbalanced data sets The reasons can be explainedas follows
Firstly stacking algorithm uses to combine the resultof base classifier whereas bagging employs majority voteBagging is only a simple decision combinationmethodwhichrequires neither cross-validation nor Level 1 learning In thispaper stacked generalization adopts logistic regressionthus providing the simplest linear combination of pooling theLevel 0 modelsrsquo confidence
Secondly cost-sensitive algorithm can affect imbalancedata sets in two aspects Firstly cost-sensitive modificationscan be applied in the probabilistic estimationMoreover cost-sensitive factors change theweight of instance and theweight
of the minority class is higher than that of the majorityclass Therefore the method directly changes the number ofcommon instances without discarding or duplicating any ofthe rare instances
Thirdly RECSG method introduces cost-sensitive learn-ing into stacking ensemble The cost-sensitive function in replaces the error-minimizing function thus allowing Level 1learning to be prone to focus on the minority class
The results in Tables 7 and 8 demonstrate that the RECSGmethod has the higher performancewhen the evaluation per-formance metric of base classifier is weaker (such as Vehicle1Vehicle0 and car-vgood) The reason is that the alternativemethods of NB C45 and 119896-NN have shortcomings Theindependence assumption hampers the performance of NBin some data sets In C45 tree construction the selectionof attributes affects the model performances Error ratio of119896-NN is relatively high when data sets are in imbalanceTherefore logistic regression adopted in can improve theperformance when base classifier is weaker
The performance of the RECSG method is generallybetter when the IR is low (such as Glass2 car-good flare-Fcar-vgood and abalone-17 vs 7-8-9-10) The performance ofRECSG method is probably related to the setting of the costmatrix Different data sets should use different cost matricesFor the purpose of simplicity in the paper we have adoptedthe same cost matrices whichmay bemore suitable to low IRvalues
5 Conclusions
In this paper in order to solve the class imbalance problemwe proposed the RECSG method based on 2-layer learningmodels Experimental results and statistical tests showedthat the RECSG approach improved the classification per-formance The proposed RECSG approach may have therelatively high computational complexity in the trainingstage because the approach involves 2-layer classifier models
12 Mathematical Problems in Engineering
which consist of several base classifiers and metaclassifiersThe number and kinds of the base-level classifiers areclosely related to the performance of stacking algorithmIn the fusion stage of base classifiers the selection of themetaclassifier is also important In this paper in order tovalidate the performance improvement compared with othercurrent classification algorithms we only randomly selected3 classification algorithms (NB 119896-NN and C45) as baseclassifier and cost-sensitive algorithm as metaclassifier inthe fuse stage Selection of the number or kind of the baseclassifiers and metaclassifier was not discussed Thereforewe will explore the diversity and quality of base classifier inthe future The adoption of these strategies would improvethe prediction performance and reduce the training time ofstacking algorithm for imbalance problems
Conflicts of Interest
The author declares that there are no conflicts of interestregarding the publication of this paper
Acknowledgments
This research is supported by the Natural Science Foundationof Shanxi Province under Grant no 2015011039
References
[1] O Loyola-Gonzalez J F Martınez-Trinidad J A Carrasco-Ochoa and M Garcıa-Borroto ldquoEffect of class imbalance onquality measures for contrast patterns an experimental studyrdquoInformation Sciences vol 374 pp 179ndash192 2016
[2] C Beyan and R Fisher ldquoClassifying imbalanced data setsusing similarity based hierarchical decompositionrdquo PatternRecognition vol 48 no 5 pp 1653ndash1672 2015
[3] H He and E A Garcia ldquoLearning from imbalanced datardquo IEEETransactions on Knowledge and Data Engineering vol 21 no 9pp 1263ndash1284 2009
[4] V C Chawla K W Bowyer L O Hall and W P KegelmeyerldquoSMOTE synthetic minority over-sampling techniquerdquo Journalof Artificial Intelligence Research vol 16 pp 321ndash357 2002
[5] X-Y Liu J X Wu and Z-H Zhou ldquoExploratory under-sampling for class-imbalance learningrdquo in Proceedings of the 6thInternational Conference on Data Mining (ICDM rsquo06) pp 965ndash969 IEEE Hong Kong December 2006
[6] N Japkowicz ldquoThe class imbalance problem significance andstrategiesrdquo in Proceedings of the International Conference onArtificial Intelligence 2000
[7] B Zadrozny and C Elkan ldquoLearning and making decisionswhen costs and probabilities are both unknownrdquo in Proceedingsof the the seventh ACM SIGKDD international conference pp204ndash213 San Francisco Calif USA August 2001
[8] M Galar A Fernandez E Barrenechea H Bustince andF Herrera ldquoA review on ensembles for the class imbalanceproblem bagging boosting and hybrid-based approachesrdquoIEEE Transactions on Systems Man and Cybernetics Part CApplications and Reviews vol 42 no 4 pp 463ndash484 2012
[9] N V Chawla D A Cieslak L O Hall and A Joshi ldquoAuto-matically countering imbalance and its empirical relationship
to costrdquo Data Mining and Knowledge Discovery vol 17 no 2pp 225ndash252 2008
[10] A Freitas A Costapereira and P Brazdil ldquoCost-SensitiveDecision Trees Applied to Medical Datardquo in InternationalConference on Data Warehousing and Knowledge Discovery pp303ndash312 2007
[11] L Breiman ldquoBagging predictorsrdquoMachine Learning vol 24 no2 pp 123ndash140 1996
[12] R E Schapire ldquoThe strength of weak learnabilityrdquo MachineLearning vol 5 no 2 pp 197ndash227 1990
[13] Y Freund and R E Schapire ldquoA decision-theoretic generaliza-tion of on-line learning and an application to boostingrdquo Journalof Computer and System Sciences vol 55 no 1 part 2 pp 119ndash139 1997
[14] N V Chawla A Lazarevic L O Hall and K W BowyerldquoSMOTEBoost improving prediction of the minority class inboostingrdquo in Proceeding of Knowledge Discovery in Databasesvol 2838 pp 107ndash119
[15] C Seiffert T M Khoshgoftaar J VanHulse and A NapolitanoldquoRUSBoost A hybrid approach to alleviating class imbalancerdquoIEEE Transactions on Systems Man and Cybernetics Systemsvol 40 no 1 pp 185ndash197 2010
[16] S Hu Y Liang L Ma and Y He ldquoMSMOTE Improvingclassification performance when training data is imbalancedrdquoin Proceedings of the 2nd International Workshop on ComputerScience and Engineering WCSE 2009 pp 13ndash17 China October2009
[17] H Guo and H L Viktor ldquoLearning from imbalanced data setswith boosting and data generationrdquoACMSIGKDDExplorationsNewsletter vol 6 no 1 pp 30ndash39 2004
[18] SWang and X Yao ldquoDiversity analysis on imbalanced data setsby using ensemble modelsrdquo in Proceedings of the 2009 IEEESymposium on Computational Intelligence and Data MiningCIDM 2009 pp 324ndash331 USA April 2009
[19] R Barandela J S Srsquoanchez and R M Valdovinos ldquoNewapplications of ensembles of classifiersrdquo PAA Pattern Analysisand Applications vol 6 no 3 pp 245ndash256 2003
[20] J Błaszczynski M Deckert J Stefanowski and S WilkldquoIntegrating selective pre-processing of imbalanced data withIvotes ensemblerdquo Lecture Notes in Computer Science (includingsubseries Lecture Notes in Artificial Intelligence and Lecture Notesin Bioinformatics) Preface vol 6086 pp 148ndash157 2010
[21] W Fan S J Stolfo J Zhang and P K Chan ldquoAdacostMisclassication cost-sensitive boostingrdquo in the 6th Int ConfMach Learning pp 97ndash105 San Francisco CA USA 1999
[22] K M Ting ldquoA comparative study of cost-sensitive boostingalgorithmsrdquo in Proc 17th Int Conf Mach Learning pp 983ndash990 Stanford CA USA 2000
[23] M Joshi V Kumar and R Agarwal ldquoEvaluating boosting algo-rithms to classify rare classes comparison and improvementsrdquoinProceedings of the 2001 IEEE International Conference onDataMining pp 257ndash264 San Jose CA USA
[24] Y Sun M S Kamel A K C Wong and Y Wang ldquoCost-sensitive boosting for classification of imbalanced datardquo PatternRecognition vol 40 no 12 pp 3358ndash3378 2007
[25] D H Wolpert ldquoStacked generalizationrdquo Neural Networks vol5 no 2 pp 241ndash259 1992
[26] Y Chen M L Wong and H Li ldquoApplying Ant Colony Opti-mization to configuring stacking ensembles for data miningrdquoExpert Systems with Applications vol 41 no 6 pp 2688ndash27022014
Mathematical Problems in Engineering 13
[27] H Kadkhodaei and A M E Moghadam ldquoAn entropy basedapproach to find the best combination of the base classi-fiers in ensemble classifiers based on stack generalizationrdquoin Proceedings of the 4th International Conference on ControlInstrumentation and Automation ICCIA 2016 pp 425ndash429Iran January 2016
[28] I Czarnowski and P Jędrzejowicz ldquoAn approach to machineclassification based on stacked generalization and instanceselectionrdquo in Proceedings of the IEEE International Conferenceon Systems Man and Cybernetics SMC 2016 pp 4836ndash4841Hungary 2017
[29] S Kotsiantis ldquoStacking cost sensitive modelsrdquo in Proceedings ofthe 12th Pan-Hellenic Conference on Informatics PCI 2008 pp217ndash221 Greece August 2008
[30] H Y Lo J C Wang H MWang and S D Lin ldquoCost-sensitivestacking for audio tag annotation and retrievalrdquo in Proceedingsof the 36th IEEE International Conference on Acoustics Speechand Signal Processing ICASSP 2011 pp 2308ndash2311 CzechRepublic May 2011
[31] JHuang andCX Ling ldquoUsingAUCand accuracy in evaluatinglearning algorithmsrdquo IEEETransactions on Knowledge andDataEngineering vol 17 no 3 pp 299ndash310 2005
[32] M Kubat and S Matwin ldquoAddressing the curse of imbalancedtraining sets one sided selection inrdquo in Proceedings of the 14thInternational Conference on Machine Learning (ICMLrsquo97 pp179ndash186 1997
[33] R Batuwita and V Palade ldquoAdjusted geometric-mean A novelperformance measure for imbalanced bioinformatics datasetslearningrdquo Journal of Bioinformatics and Computational Biologyvol 10 no 4 Article ID 1250003 2012
[34] L Breiman ldquoStacked regressionsrdquoMachine Learning vol 24 no1 pp 49ndash64 1996
[35] M Leblanc and R Tibshirani ldquoCombining estimates in regres-sion and classificationrdquo Journal of the American StatisticalAssociation vol 91 no 436 pp 1641ndash1650 1996
[36] B Cestnik ldquoEstimating Probabilities A Crucial Task inMachine Learningrdquo in Proceedings of the European Conferenceon Artificial Intelligence pp 147ndash149 1990
[37] J R Quinlan C45 Program for machine learning MorganKaufmann 1993
[38] N S Altman ldquoAn introduction to kernel and nearest-neighbornonparametric regressionrdquo The American Statistician vol 46no 3 pp 175ndash185 1992
[39] J R Quinlan ldquoInduction of decision treesrdquo Machine Learningvol 1 no 1 pp 81ndash106 1986
[40] C Elkan ldquoThe foundations of cost-sensitive learningrdquo in Pro-ceedings of the 17th International Joint Conference on ArtificialIntelligence (IJCAI rsquo01) pp 973ndash978 New York NY USAAugust 2001
[41] D Michie D Spiegel halter J C Taylor et alMachine learningneural and statistical classification Ellis Horwood neural andstatistical classification 1995
[42] K M Ting and I F Witten ldquoIssues in stacked generalizationrdquoArtif Intell Res vol 10 no 1 pp 271ndash289 1999
[43] M Hall E Frank G Holmes B Pfahringer P Reutemann andI H Witten ldquoThe WEKA data mining software an updaterdquoACM SIGKDD Explorations Newsletter vol 11 no 1 pp 10ndash182009
[44] A Frank and A Asuncion UCI Machine Learning RepositoryUniversity of California School of Information and ComputerScience Irvine CA USA httparchiveicsucieduml
[45] J Alcala-Fdez A Fernandez J Luengo et al ldquoKEEL data-mining software tool data set repository integration ofalgorithms and experimental analysis frameworkrdquo Journal ofMultiple-Valued Logic and Soft Computing vol 17 no 2-3 pp255ndash287 2011
[46] S Garcıa A Fernandez J Luengo and F Herrera ldquoA study ofstatistical techniques and performance measures for genetics-based machine learning accuracy and interpretabilityrdquo SoftComputing vol 13 no 10 pp 959ndash977 2009
[47] S Holm ldquoA simple sequentially rejective multiple test proce-durerdquo Scandinavian Journal of Statistics vol 6 no 2 pp 65ndash701979
Hindawiwwwhindawicom Volume 2018
MathematicsJournal of
Hindawiwwwhindawicom Volume 2018
Mathematical Problems in Engineering
Applied MathematicsJournal of
Hindawiwwwhindawicom Volume 2018
Probability and StatisticsHindawiwwwhindawicom Volume 2018
Journal of
Hindawiwwwhindawicom Volume 2018
Mathematical PhysicsAdvances in
Complex AnalysisJournal of
Hindawiwwwhindawicom Volume 2018
OptimizationJournal of
Hindawiwwwhindawicom Volume 2018
Hindawiwwwhindawicom Volume 2018
Engineering Mathematics
International Journal of
Hindawiwwwhindawicom Volume 2018
Operations ResearchAdvances in
Journal of
Hindawiwwwhindawicom Volume 2018
Function SpacesAbstract and Applied AnalysisHindawiwwwhindawicom Volume 2018
International Journal of Mathematics and Mathematical Sciences
Hindawiwwwhindawicom Volume 2018
Hindawi Publishing Corporation httpwwwhindawicom Volume 2013Hindawiwwwhindawicom
The Scientific World Journal
Volume 2018
Hindawiwwwhindawicom Volume 2018Volume 2018
Numerical AnalysisNumerical AnalysisNumerical AnalysisNumerical AnalysisNumerical AnalysisNumerical AnalysisNumerical AnalysisNumerical AnalysisNumerical AnalysisNumerical AnalysisNumerical AnalysisNumerical AnalysisAdvances inAdvances in Discrete Dynamics in
Nature and SocietyHindawiwwwhindawicom Volume 2018
Hindawiwwwhindawicom
Dierential EquationsInternational Journal of
Volume 2018
Hindawiwwwhindawicom Volume 2018
Decision SciencesAdvances in
Hindawiwwwhindawicom Volume 2018
AnalysisInternational Journal of
Hindawiwwwhindawicom Volume 2018
Stochastic AnalysisInternational Journal of
Submit your manuscripts atwwwhindawicom
![Page 6: Classifying Imbalanced Data Sets by a Novel RE-Sample and ...downloads.hindawi.com/journals/mpe/2018/5036710.pdf · Cost-Sensitive Stacked Generalization Method ... In this paper,](https://reader035.vdocuments.net/reader035/viewer/2022070801/5f02a4577e708231d4054843/html5/thumbnails/6.jpg)
6 Mathematical Problems in Engineering
Input Training set119863 = (119909119894 119910119894)119873119894=1 Test dataset119863test = (1199091015840119894 )1198731015840
119894=1
Output Predict class labels of the test samples 1199101015840119894meta1198731015840
119894=1
For each119895 = 1 2 119870 do(1) Resample imbalance data and generate 119895ndashfold cross-validation sets to obtain New data119895(2) Train and compute 119906119895(119909119894)119873119894=1 and 119906119895(1199091015840119894 )119873
1015840
119894=1 in Level-0 (base)-layer classifier119872119895end(3) Construct119863meta = 1199061(119909119894) 119910119894119873
1015840
119894=1 and1198631015840meta = 1199061(1199091015840119894 ) 1199101198941198731015840
119894=1
(4) Based on the data119863meta classification (cost-sensitive and Logistic Regression) is used togenerate Level-1 (meta)-layer model through with1198631015840meta to predict 1199101015840119894meta119873
1015840
119894=1
Pseudocode 1 Pseudocodes of RE-sample and Cost-Sensitive Stacked Generalization
Table 4 Base classifier methods and corresponding ensemble algorithms
Methods Descriptions and parameters Abbreviations(1) Naıve Bayes wekaclassifiersbayesNaiveBayes NB(2) C45 wekaclassifierstreesJ48 -C 025 -M 2 C45
(3) 119896-nearest neighborwekaclassifierslazyIBk -K 1 -W 0 -A
wekacoreneighboursearchLinearNNSearch -AwekacoreEuclideanDistance -R first-last
KNN
(4) Cost-sensitive with Naıve Bayes wekaclassifiersmetaAdaBoostM1 -P 100 -S 1 -I 10-W wekaclassifiersbayesNaiveBayes Cost-NB
(5) AdaBoost with Naıve Bayes wekaclassifiersmetaAdaBoostM1 -P 100 -S 1 -I 10-W wekaclassifiersbayesNaiveBayes Ada-NB
(6) AdaBoost with cost-sensitive and Naıve Bayes
wekaclassifiersmetaAdaBoostM1 -P 100 -S 1-I10-W wekaClassifiersmetaCostSensitiveClassifier-- -cost-matrix [10 30 10 10] -S 1 -W weka
classifiersbayesNaiveBayes
Ada-Cost-NB
(7) Bagging with Naıve BayeswekaclassifiersmetaBagging -P 100 -S 1
-num-slots 1 -I 10 -WwekaclassifiersbayesNaiveBayes
Bag-NB
(8) Bagging with cost-sensitive and Naıve Bayes
wekaclassifiersmetaBagging -P 100 -S 1-num-slots 1 -I 10 -W
wekaclassifiersmetaCostSensitive Classifier ---cost-matrix [10 30 10 10] -S 1 -W
wekaclassifiersbayesNaiveBayes
Bag-Cost-NB
(9) Stacking with NB KNN C45 and logistic
wekaclassifiersmetaStacking -X 10 -M wekaclassifiersfunctionsLogistic -R 10E-8 -M -1ndashnum ndashdecimal-places 4 -S 1 -num-slots 1 -B
weka classifiersbayesNaiveBayes -Bwekaclassifiers lazyIBk -K 1 -W 0 -A
wekacoreneighboursearch LinearNNSearch-A wekacoreEuclideanDistance -R
first-last -B wekaclassifierstreesJ48 -C025 -M 2
Stacking-log
(10) Stacking cost-sensitive with NB KNN C45and logistic
wekaclassifiersmetaStacking -X 10 -M wekaclassifiersmetaCostSensitiveClassifier ndashcost
-matrix [10 30 10 10] -S 1 -Wwekaclassifiers functionsLogistic -- -R 10E-8 -M-1 ndashnum ndashdecimal ndashplaces 4 -S 1 -num-slots 1 -B
weka classifiers bayesNaiveBayes -BwekaclassifierslazyIBk -K 1 -W 0 -A
wekacoreneighboursearchLinearNN Search-A wekacoreEuclideanDistance -R first-
last -B wekaclassifierstreesJ48 -C 025-M 2
Stacking-Cost-log
Mathematical Problems in Engineering 7
Table 5 Summary of imbalanced data sets
Data sets sam (min maj) IR feawisconsin 683 (239 444) 054 9pima 768 (268 500) 053 8haberman 306 (81 225) 036 3Vehicle1 846 (217 629) 0345 18Vehicle0 846 (199 647) 0307 18segment0 2308 (329 1979) 017 19yeast-0-3-5-9 vs 7-8 506 (50 456) 011 8yeast-0-2-5-6 vs 3-7-8-9 1004 (99 905) 011 8vowel0 988 (90 898) 010 13led7digit-0-2-4-5-6-7-8-9 vs 1 443 (37 406) 009 7Glass2 214 (17 197) 008 10cleveland-0 vs 4 173 (13 160) 008 13glass-0-1-6 vs 5 184 (9 175) 005 9car-good 1728 (69 1659) 004 6flare-F 1066 (43 1023) 004 11car-vgood 1728 (65 1663) 0039 6abalone-17 vs 7-8-9-10 2338 (58 2280) 00254 9
and variations in the amount of class overlap (see the KEELrepository [45]) Multiclass data sets were modified to obtaintwo-class imbalanced problems so that the union of one ormore classes became the positive class and the union of oneor more of the remaining classes was labeled as the negativeclass A brief description of the used date set is presented inTable 5 It includes the total number of instances (Sam)the total number of each class instances (Min Maj) theimbalance ratio (IR = the ratio of the number of minorityclass instance to majority class instance) and the number offeatures (Fea)
Our system was compared with other ensemble algo-rithms including AdaBoost with Naıve Bayes AdaBoost withcost-sensitive bagging with Naıve Bayes bagging with cost-sensitive and stacking cost-sensitive with NB k-NN C45and logistic regression All the experiments were performedby 10-fold cross-validation
42 Experimental Results Tables 6 7 and 8 respectivelyshow the results of the three metrics (AUC GeoMean andAGeoMean) for the algorithms in Table 4 obtained with thedata set in Table 5The best results are emphasized in boldfaceon each data set in these tables
The results showed that the performance of the proposedRECSG method was the best for 12 of 17 data sets in termsof GeoMean and AGeoMean and for 10 of 17 data sets interms of AUC Some methods are better than others in someevaluation metrics but not in all metric and most data sets
In Table 6 (AUC) the performance of the RECSGmethodis better than that of the 3 single-base classification methodsfor 14 of 17 data sets and better than that of other 5 ensemblealgorithms for 13 of 17 data sets In Table 7 (GeoMean)the RECSG method outperforms all the 3 single-base clas-sification methods in 15 of 17 data sets and outperformsother 5 ensemble algorithms in 15 of 17 data sets In Table 8
(AGeoMean) the RECSG method outperforms all the 3single-base classification methods in 14 of 17 data sets andoutperforms other 5 ensemble algorithms in 16 of 17 data setsFor the 17 data sets the evaluation metrics of GeoMean andAGeoMean are better than AUC
43 Statistical Tests Statistical test [46] is adopted to comparedifferent algorithms In the paper we use two types of com-parisons pairwise comparison (between a pair of algorithms)and multiple comparisons (among a group of algorithms)
431 Pair of Algorithms We performed statistic 119879-teststo explore whether RECSG approach is the significantlybetter than other algorithms in the three metrics (AUCGeoMean and AGeoMean) Table 9 shows the results ofRECSG approach compared with the other methods interms of AUC GeoMean and AGeoMean The values inthe square brackets indicate the number of the metrics withstatistically significant difference in the 119879-test performedwith the confidence level 120572 = 005 in the three evaluationmetrics As for the evaluation metric of AUC the RECSGoutperformed NB for 11 of 17 data sets and 5 data sets showedthe statistically significant differencewith the confidence levelof 120572 = 005
432 Multiple Comparisons We performed Holm post hoctest [47] and selected multiple groups to test the threemetrics (AUC GeoMean and AGeoMean) The post hocprocedures can determine whether a comparison hypothesisshould be rejected at a specified confidence level 120572 Statisticalexperiment was performed in the platform on the websitehttpteccitiususcesstac Table 10 provides the results thatRECSGapproach is significantly differentwith the confidencelevel of120572 = 005 in terms of AUCGeoMean andAGeoMean1198670 indicates that there is no significant difference According
8 Mathematical Problems in Engineering
Table6Re
sults
ofotherm
etho
dsin
term
sofA
UC
Datas
etNB
C45
KNN
Cost-N
BAd
a-NB
Ada-Cost-N
BBa
g-NB
Bag-Cost-NB
Stacking
-log
Stacking
-Cost-log
wisc
onsin
098
60957
0953
0989
0981
0982
099
0989
0982
0982
pima
0819
0751
0662
0819
0804
0808
0817
0819
0819
0819
haberm
an064
0615
053
064
0667
0625
0638
064
70636
063
Vehicle
10711
0722
0671
0711
0769
0723
0715
0712
0782
0782
Vehicle
00814
093
092
0814
0784
0781
0808
0808
0977
097
7segm
ent0
0987
0971
0996
0987
0944
0959
0987
0986
0999
099
9yeast-0
-3-5-9
vs7-8
075
0675
0653
075
0751
0686
075
0754
0769
0773
yeast-0
-2-5-6
vs3-7-8-9
0832
0687
0782
0832
079
0808
084
0842
084
7082
vowel0
098
0974
1098
0996
0994
0981
098
11
led7digit-0
-2-4-5-6-7-8-9
vs1
095
40925
0827
0954
0898
0893
0953
0949
0909
0904
Glass2
0716
0812
0587
0716
0737
0735
0748
0737
0713
0721
cleveland
-0vs
40909
0824
0754
0909
0826
0891
0915
0913
0979
0976
glass-0-1-6
vs5
0934
0989
0825
0934
0975
0905
0983
0985
09
099
car-good
0976
0493
064
80976
0968
0971
0976
0976
0653
097
7flare-F
0908
0525
059
0911
0876
0893
0908
0912
0825
0806
car-vgoo
d0997
0992
069
0997
0997
0997
0997
0997
0998
099
8abalon
e-17
vs7-8-9-10
0775
0613
0624
0775
0817
073
0773
0774
0752
0726
Mean
0864
0791
0748
0864
0858
0846
0869
0869
0855
0875
Mathematical Problems in Engineering 9
Table7Re
sults
ofotherm
etho
dsin
term
sofG
eoMean
Datas
etNB
C45
KNN
Cost-N
BAd
a-NB
Ada-Cost-N
BBa
g-NB
Bag-Cost-NB
Stacking
-log
Stacking
-Cost-log
wisc
onsin
0965
0954
0953
0968
0959
0967
0968
0966
096
4097
0pima
0719
0697
0649
0736
0715
0739
0723
0731
0703
0741
haberm
an0419
0566
0455
0586
046
60573
0434
0594
0361
060
5Ve
hicle
10677
0622
0652
0674
0607
0677
0673
0661
0561
0712
Vehicle
00724
0912
0919
0726
0735
0733
0719
0733
0894
094
6segm
ent0
0897
0977
0995
0892
0897
0906
0902
0895
099
40993
yeast-0
-3-5-9
vs7-8
046
70463
060
9046
4046
60550
044
50485
044
4060
3yeast-0
-2-5-6
vs3-7-8-9
0762
0590
0762
0793
0762
0792
0762
0787
0706
079
8vowel0
0920
0969
1000
0913
0944
0929
0921
0914
1000
1000
led7digit-0
-2-4-5-6-7-8-9
vs1
0853
0874
0818
0869
0863
0870
0845
0850
0846
0874
Glass2
0578
0472
0470
0564
0578
0610
0586
0568
000
00607
cleveland
-0vs
4089
90724
0722
0936
0611
0863
0866
0861
0866
0866
glass-0-1-6
vs5
0932
0877
0810
0932
0932
0872
0872
0869
0814
0877
car-good
000
0000
00558
0794
0520
0783
000
00778
0561
0813
flare-F
0751
000
00450
0781
0541
064
70738
0781
000
00774
car-vgoo
d0496
0888
0626
0986
0917
0925
0481
0971
0957
098
8abalon
e-17
vs7-8-9-10
0628
0321
0506
064
00319
0368
0627
0630
0293
066
4Mean
0687
064
10703
0779
0696
0753
0680
0769
064
50814
10 Mathematical Problems in Engineering
Table8Re
sults
ofotherm
etho
dsin
term
sofA
GeoMean
Datas
etNB
C45
KNN
Cost-N
BAd
a-NB
Ada-Cost-N
BBa
g-NB
Bag-Cost-NB
Stacking
-log
Stacking
-Cost-log
wisc
onsin
0961
0960
0961
0962
0963
0962
0962
0960
0967
096
7pima
076
80743
0706
0734
0767
0736
0775
0729
0770
0726
haberm
an064
20680
0601
0718
066
00694
0653
0721
0613
0663
Vehicle
10690
0721
0728
0658
0722
0669
0690
0654
0709
0740
Vehicle
0066
60929
0934
0658
0681
0663
066
40663
0924
093
6segm
ent0
0859
0986
0996
0850
0859
0871
0863
0854
0995
099
5yeast-0
-3-5-9
vs7-8
0715
0706
0782
0709
0713
0737
0703
0721
0700
0770
yeast-0
-2-5-6
vs3-7-8-9
0856
0778
0855
0857
0856
0855
0856
0854
0839
086
6vowel0
0929
0981
1000
0909
0966
0948
0931
0910
1000
1000
led7digit-0
-2-4-5-6-7-8-9
vs1
0853
0874
0818
0869
0863
0870
0845
0850
0846
0874
Glass2
0511
0701
0695
0493
0511
0551
0539
0515
000
00744
cleveland
-0vs
4092
70845
0841
0943
0783
0914
0918
0910
0918
0918
glass-0-1-6
vs5
095
40932
0894
0954
0954
0923
0923
0919
0902
0932
car-good
000
0000
00763
0878
0747
0870
000
00873
0769
089
0flare-F
0840
000
00705
0842
0751
0795
0836
0842
000
0084
5car-vgoo
d0743
0934
0793
0986
0953
0956
0725
0980
0974
099
0abalon
e-17
vs7-8-9-10
0736
0655
0745
0716
0650
0672
0734
0711
064
1080
4Mean
0744
0731
0813
0808
0788
0805
0742
08039
0739
086
2
Mathematical Problems in Engineering 11
Table 9 Comparison between RECSG and other algorithms in terms of winslosses of the results
NB C45 KNN Cost-NB Ada-NB Ada-Cost-NB Bag-NB Bag-Cost-
NB Stacking-log
Stacking-Cost-log(AUC)
11 [5]6 [3] 15 [11]2 [2] 17 [15]0 [2] 11 [6]6 [5] 13 [7]4 [2] 14 [9]3 [1] 10 [6]7 [5] 10 [6]7 [5] 11 [3]6 [2]
Stacking-Cost-log(GeoMean)
15 [13]2 [2] 17 [13]0 [0] 15 [13]2 [1] 14 [8]3 [2] 16 [12]1 [1] 16 [10]1 [0] 17 [13]0 [0] 16 [10]3 [0] 16 [12]1 [0]
Stacking-Cost-log(AGeoMean)
14 [10]3 [1] 15 [9]2 [2] 15 [12]2 [1] 13 [7]4 [3] 15 [11]2 [2] 15 [10]2 [2] 16 [10]1 [1] 15 [8]2 [1] 16 [10]1 [1]
Table 10 Holm post hoc test to show differences of Stacking-Cost-log (AUC) and other algorithms
Rejected (1198670) Accepted (1198670)Stacking-Cost-log (AUC) versus other algorithms 3 6Stacking-Cost-log (GeoMean) versus other algorithms 8 1Stacking-Cost-log (AGeoMean) versus other algorithms 9 0
to the evaluation metric of GeoMean the RECSG is signifi-cantly better than other 8 methods with the confidence levelof 120572 = 005
44 Discussion According to the experiments performedwith 17 different imbalance data sets and the comparisonwith other 9 classification methods the proposed RECSGmethod showed the higher performance than other methodsin terms of AUC GeoMean and AGeoMean as illustratedin Tables 7 8 and 9 The RECSG method showed the betterperformance for 10 of 17 cases in terms of AUC and for 12 of17 cases in terms of GeoMean and AGeoMean The methodoutperformed other 5 ensemble algorithms for 16 of 17 datasets in terms ofGeoMean andAGeoMean and for 13 of 17 datasets in terms of AUC GeoMean and AGeoMean are betterthan AUC in evaluation metrics in 17 data sets The means ofRECSGmethod in terms of GeoMean AGeoMean and AUCare all higher than other methods The Holm post hoc testshows that RECSGmethod significantly outperforms the restwith the confidence level 120572 = 005 in terms of AGeoMean for8 of 9methods in terms of AGeoMean and for 3 of 9methodsin terms of AUC
Experimental results and statistical tests show that theRECSG approach has improved the classification perfor-mance of imbalanced data sets The reasons can be explainedas follows
Firstly stacking algorithm uses to combine the resultof base classifier whereas bagging employs majority voteBagging is only a simple decision combinationmethodwhichrequires neither cross-validation nor Level 1 learning In thispaper stacked generalization adopts logistic regressionthus providing the simplest linear combination of pooling theLevel 0 modelsrsquo confidence
Secondly cost-sensitive algorithm can affect imbalancedata sets in two aspects Firstly cost-sensitive modificationscan be applied in the probabilistic estimationMoreover cost-sensitive factors change theweight of instance and theweight
of the minority class is higher than that of the majorityclass Therefore the method directly changes the number ofcommon instances without discarding or duplicating any ofthe rare instances
Thirdly RECSG method introduces cost-sensitive learn-ing into stacking ensemble The cost-sensitive function in replaces the error-minimizing function thus allowing Level 1learning to be prone to focus on the minority class
The results in Tables 7 and 8 demonstrate that the RECSGmethod has the higher performancewhen the evaluation per-formance metric of base classifier is weaker (such as Vehicle1Vehicle0 and car-vgood) The reason is that the alternativemethods of NB C45 and 119896-NN have shortcomings Theindependence assumption hampers the performance of NBin some data sets In C45 tree construction the selectionof attributes affects the model performances Error ratio of119896-NN is relatively high when data sets are in imbalanceTherefore logistic regression adopted in can improve theperformance when base classifier is weaker
The performance of the RECSG method is generallybetter when the IR is low (such as Glass2 car-good flare-Fcar-vgood and abalone-17 vs 7-8-9-10) The performance ofRECSG method is probably related to the setting of the costmatrix Different data sets should use different cost matricesFor the purpose of simplicity in the paper we have adoptedthe same cost matrices whichmay bemore suitable to low IRvalues
5 Conclusions
In this paper in order to solve the class imbalance problemwe proposed the RECSG method based on 2-layer learningmodels Experimental results and statistical tests showedthat the RECSG approach improved the classification per-formance The proposed RECSG approach may have therelatively high computational complexity in the trainingstage because the approach involves 2-layer classifier models
12 Mathematical Problems in Engineering
which consist of several base classifiers and metaclassifiersThe number and kinds of the base-level classifiers areclosely related to the performance of stacking algorithmIn the fusion stage of base classifiers the selection of themetaclassifier is also important In this paper in order tovalidate the performance improvement compared with othercurrent classification algorithms we only randomly selected3 classification algorithms (NB 119896-NN and C45) as baseclassifier and cost-sensitive algorithm as metaclassifier inthe fuse stage Selection of the number or kind of the baseclassifiers and metaclassifier was not discussed Thereforewe will explore the diversity and quality of base classifier inthe future The adoption of these strategies would improvethe prediction performance and reduce the training time ofstacking algorithm for imbalance problems
Conflicts of Interest
The author declares that there are no conflicts of interestregarding the publication of this paper
Acknowledgments
This research is supported by the Natural Science Foundationof Shanxi Province under Grant no 2015011039
References
[1] O Loyola-Gonzalez J F Martınez-Trinidad J A Carrasco-Ochoa and M Garcıa-Borroto ldquoEffect of class imbalance onquality measures for contrast patterns an experimental studyrdquoInformation Sciences vol 374 pp 179ndash192 2016
[2] C Beyan and R Fisher ldquoClassifying imbalanced data setsusing similarity based hierarchical decompositionrdquo PatternRecognition vol 48 no 5 pp 1653ndash1672 2015
[3] H He and E A Garcia ldquoLearning from imbalanced datardquo IEEETransactions on Knowledge and Data Engineering vol 21 no 9pp 1263ndash1284 2009
[4] V C Chawla K W Bowyer L O Hall and W P KegelmeyerldquoSMOTE synthetic minority over-sampling techniquerdquo Journalof Artificial Intelligence Research vol 16 pp 321ndash357 2002
[5] X-Y Liu J X Wu and Z-H Zhou ldquoExploratory under-sampling for class-imbalance learningrdquo in Proceedings of the 6thInternational Conference on Data Mining (ICDM rsquo06) pp 965ndash969 IEEE Hong Kong December 2006
[6] N Japkowicz ldquoThe class imbalance problem significance andstrategiesrdquo in Proceedings of the International Conference onArtificial Intelligence 2000
[7] B Zadrozny and C Elkan ldquoLearning and making decisionswhen costs and probabilities are both unknownrdquo in Proceedingsof the the seventh ACM SIGKDD international conference pp204ndash213 San Francisco Calif USA August 2001
[8] M Galar A Fernandez E Barrenechea H Bustince andF Herrera ldquoA review on ensembles for the class imbalanceproblem bagging boosting and hybrid-based approachesrdquoIEEE Transactions on Systems Man and Cybernetics Part CApplications and Reviews vol 42 no 4 pp 463ndash484 2012
[9] N V Chawla D A Cieslak L O Hall and A Joshi ldquoAuto-matically countering imbalance and its empirical relationship
to costrdquo Data Mining and Knowledge Discovery vol 17 no 2pp 225ndash252 2008
[10] A Freitas A Costapereira and P Brazdil ldquoCost-SensitiveDecision Trees Applied to Medical Datardquo in InternationalConference on Data Warehousing and Knowledge Discovery pp303ndash312 2007
[11] L Breiman ldquoBagging predictorsrdquoMachine Learning vol 24 no2 pp 123ndash140 1996
[12] R E Schapire ldquoThe strength of weak learnabilityrdquo MachineLearning vol 5 no 2 pp 197ndash227 1990
[13] Y Freund and R E Schapire ldquoA decision-theoretic generaliza-tion of on-line learning and an application to boostingrdquo Journalof Computer and System Sciences vol 55 no 1 part 2 pp 119ndash139 1997
[14] N V Chawla A Lazarevic L O Hall and K W BowyerldquoSMOTEBoost improving prediction of the minority class inboostingrdquo in Proceeding of Knowledge Discovery in Databasesvol 2838 pp 107ndash119
[15] C Seiffert T M Khoshgoftaar J VanHulse and A NapolitanoldquoRUSBoost A hybrid approach to alleviating class imbalancerdquoIEEE Transactions on Systems Man and Cybernetics Systemsvol 40 no 1 pp 185ndash197 2010
[16] S Hu Y Liang L Ma and Y He ldquoMSMOTE Improvingclassification performance when training data is imbalancedrdquoin Proceedings of the 2nd International Workshop on ComputerScience and Engineering WCSE 2009 pp 13ndash17 China October2009
[17] H Guo and H L Viktor ldquoLearning from imbalanced data setswith boosting and data generationrdquoACMSIGKDDExplorationsNewsletter vol 6 no 1 pp 30ndash39 2004
[18] SWang and X Yao ldquoDiversity analysis on imbalanced data setsby using ensemble modelsrdquo in Proceedings of the 2009 IEEESymposium on Computational Intelligence and Data MiningCIDM 2009 pp 324ndash331 USA April 2009
[19] R Barandela J S Srsquoanchez and R M Valdovinos ldquoNewapplications of ensembles of classifiersrdquo PAA Pattern Analysisand Applications vol 6 no 3 pp 245ndash256 2003
[20] J Błaszczynski M Deckert J Stefanowski and S WilkldquoIntegrating selective pre-processing of imbalanced data withIvotes ensemblerdquo Lecture Notes in Computer Science (includingsubseries Lecture Notes in Artificial Intelligence and Lecture Notesin Bioinformatics) Preface vol 6086 pp 148ndash157 2010
[21] W Fan S J Stolfo J Zhang and P K Chan ldquoAdacostMisclassication cost-sensitive boostingrdquo in the 6th Int ConfMach Learning pp 97ndash105 San Francisco CA USA 1999
[22] K M Ting ldquoA comparative study of cost-sensitive boostingalgorithmsrdquo in Proc 17th Int Conf Mach Learning pp 983ndash990 Stanford CA USA 2000
[23] M Joshi V Kumar and R Agarwal ldquoEvaluating boosting algo-rithms to classify rare classes comparison and improvementsrdquoinProceedings of the 2001 IEEE International Conference onDataMining pp 257ndash264 San Jose CA USA
[24] Y Sun M S Kamel A K C Wong and Y Wang ldquoCost-sensitive boosting for classification of imbalanced datardquo PatternRecognition vol 40 no 12 pp 3358ndash3378 2007
[25] D H Wolpert ldquoStacked generalizationrdquo Neural Networks vol5 no 2 pp 241ndash259 1992
[26] Y Chen M L Wong and H Li ldquoApplying Ant Colony Opti-mization to configuring stacking ensembles for data miningrdquoExpert Systems with Applications vol 41 no 6 pp 2688ndash27022014
Mathematical Problems in Engineering 13
[27] H Kadkhodaei and A M E Moghadam ldquoAn entropy basedapproach to find the best combination of the base classi-fiers in ensemble classifiers based on stack generalizationrdquoin Proceedings of the 4th International Conference on ControlInstrumentation and Automation ICCIA 2016 pp 425ndash429Iran January 2016
[28] I Czarnowski and P Jędrzejowicz ldquoAn approach to machineclassification based on stacked generalization and instanceselectionrdquo in Proceedings of the IEEE International Conferenceon Systems Man and Cybernetics SMC 2016 pp 4836ndash4841Hungary 2017
[29] S Kotsiantis ldquoStacking cost sensitive modelsrdquo in Proceedings ofthe 12th Pan-Hellenic Conference on Informatics PCI 2008 pp217ndash221 Greece August 2008
[30] H Y Lo J C Wang H MWang and S D Lin ldquoCost-sensitivestacking for audio tag annotation and retrievalrdquo in Proceedingsof the 36th IEEE International Conference on Acoustics Speechand Signal Processing ICASSP 2011 pp 2308ndash2311 CzechRepublic May 2011
[31] JHuang andCX Ling ldquoUsingAUCand accuracy in evaluatinglearning algorithmsrdquo IEEETransactions on Knowledge andDataEngineering vol 17 no 3 pp 299ndash310 2005
[32] M Kubat and S Matwin ldquoAddressing the curse of imbalancedtraining sets one sided selection inrdquo in Proceedings of the 14thInternational Conference on Machine Learning (ICMLrsquo97 pp179ndash186 1997
[33] R Batuwita and V Palade ldquoAdjusted geometric-mean A novelperformance measure for imbalanced bioinformatics datasetslearningrdquo Journal of Bioinformatics and Computational Biologyvol 10 no 4 Article ID 1250003 2012
[34] L Breiman ldquoStacked regressionsrdquoMachine Learning vol 24 no1 pp 49ndash64 1996
[35] M Leblanc and R Tibshirani ldquoCombining estimates in regres-sion and classificationrdquo Journal of the American StatisticalAssociation vol 91 no 436 pp 1641ndash1650 1996
[36] B Cestnik ldquoEstimating Probabilities A Crucial Task inMachine Learningrdquo in Proceedings of the European Conferenceon Artificial Intelligence pp 147ndash149 1990
[37] J R Quinlan C45 Program for machine learning MorganKaufmann 1993
[38] N S Altman ldquoAn introduction to kernel and nearest-neighbornonparametric regressionrdquo The American Statistician vol 46no 3 pp 175ndash185 1992
[39] J R Quinlan ldquoInduction of decision treesrdquo Machine Learningvol 1 no 1 pp 81ndash106 1986
[40] C Elkan ldquoThe foundations of cost-sensitive learningrdquo in Pro-ceedings of the 17th International Joint Conference on ArtificialIntelligence (IJCAI rsquo01) pp 973ndash978 New York NY USAAugust 2001
[41] D Michie D Spiegel halter J C Taylor et alMachine learningneural and statistical classification Ellis Horwood neural andstatistical classification 1995
[42] K M Ting and I F Witten ldquoIssues in stacked generalizationrdquoArtif Intell Res vol 10 no 1 pp 271ndash289 1999
[43] M Hall E Frank G Holmes B Pfahringer P Reutemann andI H Witten ldquoThe WEKA data mining software an updaterdquoACM SIGKDD Explorations Newsletter vol 11 no 1 pp 10ndash182009
[44] A Frank and A Asuncion UCI Machine Learning RepositoryUniversity of California School of Information and ComputerScience Irvine CA USA httparchiveicsucieduml
[45] J Alcala-Fdez A Fernandez J Luengo et al ldquoKEEL data-mining software tool data set repository integration ofalgorithms and experimental analysis frameworkrdquo Journal ofMultiple-Valued Logic and Soft Computing vol 17 no 2-3 pp255ndash287 2011
[46] S Garcıa A Fernandez J Luengo and F Herrera ldquoA study ofstatistical techniques and performance measures for genetics-based machine learning accuracy and interpretabilityrdquo SoftComputing vol 13 no 10 pp 959ndash977 2009
[47] S Holm ldquoA simple sequentially rejective multiple test proce-durerdquo Scandinavian Journal of Statistics vol 6 no 2 pp 65ndash701979
Hindawiwwwhindawicom Volume 2018
MathematicsJournal of
Hindawiwwwhindawicom Volume 2018
Mathematical Problems in Engineering
Applied MathematicsJournal of
Hindawiwwwhindawicom Volume 2018
Probability and StatisticsHindawiwwwhindawicom Volume 2018
Journal of
Hindawiwwwhindawicom Volume 2018
Mathematical PhysicsAdvances in
Complex AnalysisJournal of
Hindawiwwwhindawicom Volume 2018
OptimizationJournal of
Hindawiwwwhindawicom Volume 2018
Hindawiwwwhindawicom Volume 2018
Engineering Mathematics
International Journal of
Hindawiwwwhindawicom Volume 2018
Operations ResearchAdvances in
Journal of
Hindawiwwwhindawicom Volume 2018
Function SpacesAbstract and Applied AnalysisHindawiwwwhindawicom Volume 2018
International Journal of Mathematics and Mathematical Sciences
Hindawiwwwhindawicom Volume 2018
Hindawi Publishing Corporation httpwwwhindawicom Volume 2013Hindawiwwwhindawicom
The Scientific World Journal
Volume 2018
Hindawiwwwhindawicom Volume 2018Volume 2018
Numerical AnalysisNumerical AnalysisNumerical AnalysisNumerical AnalysisNumerical AnalysisNumerical AnalysisNumerical AnalysisNumerical AnalysisNumerical AnalysisNumerical AnalysisNumerical AnalysisNumerical AnalysisAdvances inAdvances in Discrete Dynamics in
Nature and SocietyHindawiwwwhindawicom Volume 2018
Hindawiwwwhindawicom
Dierential EquationsInternational Journal of
Volume 2018
Hindawiwwwhindawicom Volume 2018
Decision SciencesAdvances in
Hindawiwwwhindawicom Volume 2018
AnalysisInternational Journal of
Hindawiwwwhindawicom Volume 2018
Stochastic AnalysisInternational Journal of
Submit your manuscripts atwwwhindawicom
![Page 7: Classifying Imbalanced Data Sets by a Novel RE-Sample and ...downloads.hindawi.com/journals/mpe/2018/5036710.pdf · Cost-Sensitive Stacked Generalization Method ... In this paper,](https://reader035.vdocuments.net/reader035/viewer/2022070801/5f02a4577e708231d4054843/html5/thumbnails/7.jpg)
Mathematical Problems in Engineering 7
Table 5 Summary of imbalanced data sets
Data sets sam (min maj) IR feawisconsin 683 (239 444) 054 9pima 768 (268 500) 053 8haberman 306 (81 225) 036 3Vehicle1 846 (217 629) 0345 18Vehicle0 846 (199 647) 0307 18segment0 2308 (329 1979) 017 19yeast-0-3-5-9 vs 7-8 506 (50 456) 011 8yeast-0-2-5-6 vs 3-7-8-9 1004 (99 905) 011 8vowel0 988 (90 898) 010 13led7digit-0-2-4-5-6-7-8-9 vs 1 443 (37 406) 009 7Glass2 214 (17 197) 008 10cleveland-0 vs 4 173 (13 160) 008 13glass-0-1-6 vs 5 184 (9 175) 005 9car-good 1728 (69 1659) 004 6flare-F 1066 (43 1023) 004 11car-vgood 1728 (65 1663) 0039 6abalone-17 vs 7-8-9-10 2338 (58 2280) 00254 9
and variations in the amount of class overlap (see the KEELrepository [45]) Multiclass data sets were modified to obtaintwo-class imbalanced problems so that the union of one ormore classes became the positive class and the union of oneor more of the remaining classes was labeled as the negativeclass A brief description of the used date set is presented inTable 5 It includes the total number of instances (Sam)the total number of each class instances (Min Maj) theimbalance ratio (IR = the ratio of the number of minorityclass instance to majority class instance) and the number offeatures (Fea)
Our system was compared with other ensemble algo-rithms including AdaBoost with Naıve Bayes AdaBoost withcost-sensitive bagging with Naıve Bayes bagging with cost-sensitive and stacking cost-sensitive with NB k-NN C45and logistic regression All the experiments were performedby 10-fold cross-validation
42 Experimental Results Tables 6 7 and 8 respectivelyshow the results of the three metrics (AUC GeoMean andAGeoMean) for the algorithms in Table 4 obtained with thedata set in Table 5The best results are emphasized in boldfaceon each data set in these tables
The results showed that the performance of the proposedRECSG method was the best for 12 of 17 data sets in termsof GeoMean and AGeoMean and for 10 of 17 data sets interms of AUC Some methods are better than others in someevaluation metrics but not in all metric and most data sets
In Table 6 (AUC) the performance of the RECSGmethodis better than that of the 3 single-base classification methodsfor 14 of 17 data sets and better than that of other 5 ensemblealgorithms for 13 of 17 data sets In Table 7 (GeoMean)the RECSG method outperforms all the 3 single-base clas-sification methods in 15 of 17 data sets and outperformsother 5 ensemble algorithms in 15 of 17 data sets In Table 8
(AGeoMean) the RECSG method outperforms all the 3single-base classification methods in 14 of 17 data sets andoutperforms other 5 ensemble algorithms in 16 of 17 data setsFor the 17 data sets the evaluation metrics of GeoMean andAGeoMean are better than AUC
43 Statistical Tests Statistical test [46] is adopted to comparedifferent algorithms In the paper we use two types of com-parisons pairwise comparison (between a pair of algorithms)and multiple comparisons (among a group of algorithms)
431 Pair of Algorithms We performed statistic 119879-teststo explore whether RECSG approach is the significantlybetter than other algorithms in the three metrics (AUCGeoMean and AGeoMean) Table 9 shows the results ofRECSG approach compared with the other methods interms of AUC GeoMean and AGeoMean The values inthe square brackets indicate the number of the metrics withstatistically significant difference in the 119879-test performedwith the confidence level 120572 = 005 in the three evaluationmetrics As for the evaluation metric of AUC the RECSGoutperformed NB for 11 of 17 data sets and 5 data sets showedthe statistically significant differencewith the confidence levelof 120572 = 005
432 Multiple Comparisons We performed Holm post hoctest [47] and selected multiple groups to test the threemetrics (AUC GeoMean and AGeoMean) The post hocprocedures can determine whether a comparison hypothesisshould be rejected at a specified confidence level 120572 Statisticalexperiment was performed in the platform on the websitehttpteccitiususcesstac Table 10 provides the results thatRECSGapproach is significantly differentwith the confidencelevel of120572 = 005 in terms of AUCGeoMean andAGeoMean1198670 indicates that there is no significant difference According
8 Mathematical Problems in Engineering
Table6Re
sults
ofotherm
etho
dsin
term
sofA
UC
Datas
etNB
C45
KNN
Cost-N
BAd
a-NB
Ada-Cost-N
BBa
g-NB
Bag-Cost-NB
Stacking
-log
Stacking
-Cost-log
wisc
onsin
098
60957
0953
0989
0981
0982
099
0989
0982
0982
pima
0819
0751
0662
0819
0804
0808
0817
0819
0819
0819
haberm
an064
0615
053
064
0667
0625
0638
064
70636
063
Vehicle
10711
0722
0671
0711
0769
0723
0715
0712
0782
0782
Vehicle
00814
093
092
0814
0784
0781
0808
0808
0977
097
7segm
ent0
0987
0971
0996
0987
0944
0959
0987
0986
0999
099
9yeast-0
-3-5-9
vs7-8
075
0675
0653
075
0751
0686
075
0754
0769
0773
yeast-0
-2-5-6
vs3-7-8-9
0832
0687
0782
0832
079
0808
084
0842
084
7082
vowel0
098
0974
1098
0996
0994
0981
098
11
led7digit-0
-2-4-5-6-7-8-9
vs1
095
40925
0827
0954
0898
0893
0953
0949
0909
0904
Glass2
0716
0812
0587
0716
0737
0735
0748
0737
0713
0721
cleveland
-0vs
40909
0824
0754
0909
0826
0891
0915
0913
0979
0976
glass-0-1-6
vs5
0934
0989
0825
0934
0975
0905
0983
0985
09
099
car-good
0976
0493
064
80976
0968
0971
0976
0976
0653
097
7flare-F
0908
0525
059
0911
0876
0893
0908
0912
0825
0806
car-vgoo
d0997
0992
069
0997
0997
0997
0997
0997
0998
099
8abalon
e-17
vs7-8-9-10
0775
0613
0624
0775
0817
073
0773
0774
0752
0726
Mean
0864
0791
0748
0864
0858
0846
0869
0869
0855
0875
Mathematical Problems in Engineering 9
Table7Re
sults
ofotherm
etho
dsin
term
sofG
eoMean
Datas
etNB
C45
KNN
Cost-N
BAd
a-NB
Ada-Cost-N
BBa
g-NB
Bag-Cost-NB
Stacking
-log
Stacking
-Cost-log
wisc
onsin
0965
0954
0953
0968
0959
0967
0968
0966
096
4097
0pima
0719
0697
0649
0736
0715
0739
0723
0731
0703
0741
haberm
an0419
0566
0455
0586
046
60573
0434
0594
0361
060
5Ve
hicle
10677
0622
0652
0674
0607
0677
0673
0661
0561
0712
Vehicle
00724
0912
0919
0726
0735
0733
0719
0733
0894
094
6segm
ent0
0897
0977
0995
0892
0897
0906
0902
0895
099
40993
yeast-0
-3-5-9
vs7-8
046
70463
060
9046
4046
60550
044
50485
044
4060
3yeast-0
-2-5-6
vs3-7-8-9
0762
0590
0762
0793
0762
0792
0762
0787
0706
079
8vowel0
0920
0969
1000
0913
0944
0929
0921
0914
1000
1000
led7digit-0
-2-4-5-6-7-8-9
vs1
0853
0874
0818
0869
0863
0870
0845
0850
0846
0874
Glass2
0578
0472
0470
0564
0578
0610
0586
0568
000
00607
cleveland
-0vs
4089
90724
0722
0936
0611
0863
0866
0861
0866
0866
glass-0-1-6
vs5
0932
0877
0810
0932
0932
0872
0872
0869
0814
0877
car-good
000
0000
00558
0794
0520
0783
000
00778
0561
0813
flare-F
0751
000
00450
0781
0541
064
70738
0781
000
00774
car-vgoo
d0496
0888
0626
0986
0917
0925
0481
0971
0957
098
8abalon
e-17
vs7-8-9-10
0628
0321
0506
064
00319
0368
0627
0630
0293
066
4Mean
0687
064
10703
0779
0696
0753
0680
0769
064
50814
10 Mathematical Problems in Engineering
Table8Re
sults
ofotherm
etho
dsin
term
sofA
GeoMean
Datas
etNB
C45
KNN
Cost-N
BAd
a-NB
Ada-Cost-N
BBa
g-NB
Bag-Cost-NB
Stacking
-log
Stacking
-Cost-log
wisc
onsin
0961
0960
0961
0962
0963
0962
0962
0960
0967
096
7pima
076
80743
0706
0734
0767
0736
0775
0729
0770
0726
haberm
an064
20680
0601
0718
066
00694
0653
0721
0613
0663
Vehicle
10690
0721
0728
0658
0722
0669
0690
0654
0709
0740
Vehicle
0066
60929
0934
0658
0681
0663
066
40663
0924
093
6segm
ent0
0859
0986
0996
0850
0859
0871
0863
0854
0995
099
5yeast-0
-3-5-9
vs7-8
0715
0706
0782
0709
0713
0737
0703
0721
0700
0770
yeast-0
-2-5-6
vs3-7-8-9
0856
0778
0855
0857
0856
0855
0856
0854
0839
086
6vowel0
0929
0981
1000
0909
0966
0948
0931
0910
1000
1000
led7digit-0
-2-4-5-6-7-8-9
vs1
0853
0874
0818
0869
0863
0870
0845
0850
0846
0874
Glass2
0511
0701
0695
0493
0511
0551
0539
0515
000
00744
cleveland
-0vs
4092
70845
0841
0943
0783
0914
0918
0910
0918
0918
glass-0-1-6
vs5
095
40932
0894
0954
0954
0923
0923
0919
0902
0932
car-good
000
0000
00763
0878
0747
0870
000
00873
0769
089
0flare-F
0840
000
00705
0842
0751
0795
0836
0842
000
0084
5car-vgoo
d0743
0934
0793
0986
0953
0956
0725
0980
0974
099
0abalon
e-17
vs7-8-9-10
0736
0655
0745
0716
0650
0672
0734
0711
064
1080
4Mean
0744
0731
0813
0808
0788
0805
0742
08039
0739
086
2
Mathematical Problems in Engineering 11
Table 9 Comparison between RECSG and other algorithms in terms of winslosses of the results
NB C45 KNN Cost-NB Ada-NB Ada-Cost-NB Bag-NB Bag-Cost-
NB Stacking-log
Stacking-Cost-log(AUC)
11 [5]6 [3] 15 [11]2 [2] 17 [15]0 [2] 11 [6]6 [5] 13 [7]4 [2] 14 [9]3 [1] 10 [6]7 [5] 10 [6]7 [5] 11 [3]6 [2]
Stacking-Cost-log(GeoMean)
15 [13]2 [2] 17 [13]0 [0] 15 [13]2 [1] 14 [8]3 [2] 16 [12]1 [1] 16 [10]1 [0] 17 [13]0 [0] 16 [10]3 [0] 16 [12]1 [0]
Stacking-Cost-log(AGeoMean)
14 [10]3 [1] 15 [9]2 [2] 15 [12]2 [1] 13 [7]4 [3] 15 [11]2 [2] 15 [10]2 [2] 16 [10]1 [1] 15 [8]2 [1] 16 [10]1 [1]
Table 10 Holm post hoc test to show differences of Stacking-Cost-log (AUC) and other algorithms
Rejected (1198670) Accepted (1198670)Stacking-Cost-log (AUC) versus other algorithms 3 6Stacking-Cost-log (GeoMean) versus other algorithms 8 1Stacking-Cost-log (AGeoMean) versus other algorithms 9 0
to the evaluation metric of GeoMean the RECSG is signifi-cantly better than other 8 methods with the confidence levelof 120572 = 005
44 Discussion According to the experiments performedwith 17 different imbalance data sets and the comparisonwith other 9 classification methods the proposed RECSGmethod showed the higher performance than other methodsin terms of AUC GeoMean and AGeoMean as illustratedin Tables 7 8 and 9 The RECSG method showed the betterperformance for 10 of 17 cases in terms of AUC and for 12 of17 cases in terms of GeoMean and AGeoMean The methodoutperformed other 5 ensemble algorithms for 16 of 17 datasets in terms ofGeoMean andAGeoMean and for 13 of 17 datasets in terms of AUC GeoMean and AGeoMean are betterthan AUC in evaluation metrics in 17 data sets The means ofRECSGmethod in terms of GeoMean AGeoMean and AUCare all higher than other methods The Holm post hoc testshows that RECSGmethod significantly outperforms the restwith the confidence level 120572 = 005 in terms of AGeoMean for8 of 9methods in terms of AGeoMean and for 3 of 9methodsin terms of AUC
Experimental results and statistical tests show that theRECSG approach has improved the classification perfor-mance of imbalanced data sets The reasons can be explainedas follows
Firstly stacking algorithm uses to combine the resultof base classifier whereas bagging employs majority voteBagging is only a simple decision combinationmethodwhichrequires neither cross-validation nor Level 1 learning In thispaper stacked generalization adopts logistic regressionthus providing the simplest linear combination of pooling theLevel 0 modelsrsquo confidence
Secondly cost-sensitive algorithm can affect imbalancedata sets in two aspects Firstly cost-sensitive modificationscan be applied in the probabilistic estimationMoreover cost-sensitive factors change theweight of instance and theweight
of the minority class is higher than that of the majorityclass Therefore the method directly changes the number ofcommon instances without discarding or duplicating any ofthe rare instances
Thirdly RECSG method introduces cost-sensitive learn-ing into stacking ensemble The cost-sensitive function in replaces the error-minimizing function thus allowing Level 1learning to be prone to focus on the minority class
The results in Tables 7 and 8 demonstrate that the RECSGmethod has the higher performancewhen the evaluation per-formance metric of base classifier is weaker (such as Vehicle1Vehicle0 and car-vgood) The reason is that the alternativemethods of NB C45 and 119896-NN have shortcomings Theindependence assumption hampers the performance of NBin some data sets In C45 tree construction the selectionof attributes affects the model performances Error ratio of119896-NN is relatively high when data sets are in imbalanceTherefore logistic regression adopted in can improve theperformance when base classifier is weaker
The performance of the RECSG method is generallybetter when the IR is low (such as Glass2 car-good flare-Fcar-vgood and abalone-17 vs 7-8-9-10) The performance ofRECSG method is probably related to the setting of the costmatrix Different data sets should use different cost matricesFor the purpose of simplicity in the paper we have adoptedthe same cost matrices whichmay bemore suitable to low IRvalues
5 Conclusions
In this paper in order to solve the class imbalance problemwe proposed the RECSG method based on 2-layer learningmodels Experimental results and statistical tests showedthat the RECSG approach improved the classification per-formance The proposed RECSG approach may have therelatively high computational complexity in the trainingstage because the approach involves 2-layer classifier models
12 Mathematical Problems in Engineering
which consist of several base classifiers and metaclassifiersThe number and kinds of the base-level classifiers areclosely related to the performance of stacking algorithmIn the fusion stage of base classifiers the selection of themetaclassifier is also important In this paper in order tovalidate the performance improvement compared with othercurrent classification algorithms we only randomly selected3 classification algorithms (NB 119896-NN and C45) as baseclassifier and cost-sensitive algorithm as metaclassifier inthe fuse stage Selection of the number or kind of the baseclassifiers and metaclassifier was not discussed Thereforewe will explore the diversity and quality of base classifier inthe future The adoption of these strategies would improvethe prediction performance and reduce the training time ofstacking algorithm for imbalance problems
Conflicts of Interest
The author declares that there are no conflicts of interestregarding the publication of this paper
Acknowledgments
This research is supported by the Natural Science Foundationof Shanxi Province under Grant no 2015011039
References
[1] O Loyola-Gonzalez J F Martınez-Trinidad J A Carrasco-Ochoa and M Garcıa-Borroto ldquoEffect of class imbalance onquality measures for contrast patterns an experimental studyrdquoInformation Sciences vol 374 pp 179ndash192 2016
[2] C Beyan and R Fisher ldquoClassifying imbalanced data setsusing similarity based hierarchical decompositionrdquo PatternRecognition vol 48 no 5 pp 1653ndash1672 2015
[3] H He and E A Garcia ldquoLearning from imbalanced datardquo IEEETransactions on Knowledge and Data Engineering vol 21 no 9pp 1263ndash1284 2009
[4] V C Chawla K W Bowyer L O Hall and W P KegelmeyerldquoSMOTE synthetic minority over-sampling techniquerdquo Journalof Artificial Intelligence Research vol 16 pp 321ndash357 2002
[5] X-Y Liu J X Wu and Z-H Zhou ldquoExploratory under-sampling for class-imbalance learningrdquo in Proceedings of the 6thInternational Conference on Data Mining (ICDM rsquo06) pp 965ndash969 IEEE Hong Kong December 2006
[6] N Japkowicz ldquoThe class imbalance problem significance andstrategiesrdquo in Proceedings of the International Conference onArtificial Intelligence 2000
[7] B Zadrozny and C Elkan ldquoLearning and making decisionswhen costs and probabilities are both unknownrdquo in Proceedingsof the the seventh ACM SIGKDD international conference pp204ndash213 San Francisco Calif USA August 2001
[8] M Galar A Fernandez E Barrenechea H Bustince andF Herrera ldquoA review on ensembles for the class imbalanceproblem bagging boosting and hybrid-based approachesrdquoIEEE Transactions on Systems Man and Cybernetics Part CApplications and Reviews vol 42 no 4 pp 463ndash484 2012
[9] N V Chawla D A Cieslak L O Hall and A Joshi ldquoAuto-matically countering imbalance and its empirical relationship
to costrdquo Data Mining and Knowledge Discovery vol 17 no 2pp 225ndash252 2008
[10] A Freitas A Costapereira and P Brazdil ldquoCost-SensitiveDecision Trees Applied to Medical Datardquo in InternationalConference on Data Warehousing and Knowledge Discovery pp303ndash312 2007
[11] L Breiman ldquoBagging predictorsrdquoMachine Learning vol 24 no2 pp 123ndash140 1996
[12] R E Schapire ldquoThe strength of weak learnabilityrdquo MachineLearning vol 5 no 2 pp 197ndash227 1990
[13] Y Freund and R E Schapire ldquoA decision-theoretic generaliza-tion of on-line learning and an application to boostingrdquo Journalof Computer and System Sciences vol 55 no 1 part 2 pp 119ndash139 1997
[14] N V Chawla A Lazarevic L O Hall and K W BowyerldquoSMOTEBoost improving prediction of the minority class inboostingrdquo in Proceeding of Knowledge Discovery in Databasesvol 2838 pp 107ndash119
[15] C Seiffert T M Khoshgoftaar J VanHulse and A NapolitanoldquoRUSBoost A hybrid approach to alleviating class imbalancerdquoIEEE Transactions on Systems Man and Cybernetics Systemsvol 40 no 1 pp 185ndash197 2010
[16] S Hu Y Liang L Ma and Y He ldquoMSMOTE Improvingclassification performance when training data is imbalancedrdquoin Proceedings of the 2nd International Workshop on ComputerScience and Engineering WCSE 2009 pp 13ndash17 China October2009
[17] H Guo and H L Viktor ldquoLearning from imbalanced data setswith boosting and data generationrdquoACMSIGKDDExplorationsNewsletter vol 6 no 1 pp 30ndash39 2004
[18] SWang and X Yao ldquoDiversity analysis on imbalanced data setsby using ensemble modelsrdquo in Proceedings of the 2009 IEEESymposium on Computational Intelligence and Data MiningCIDM 2009 pp 324ndash331 USA April 2009
[19] R Barandela J S Srsquoanchez and R M Valdovinos ldquoNewapplications of ensembles of classifiersrdquo PAA Pattern Analysisand Applications vol 6 no 3 pp 245ndash256 2003
[20] J Błaszczynski M Deckert J Stefanowski and S WilkldquoIntegrating selective pre-processing of imbalanced data withIvotes ensemblerdquo Lecture Notes in Computer Science (includingsubseries Lecture Notes in Artificial Intelligence and Lecture Notesin Bioinformatics) Preface vol 6086 pp 148ndash157 2010
[21] W Fan S J Stolfo J Zhang and P K Chan ldquoAdacostMisclassication cost-sensitive boostingrdquo in the 6th Int ConfMach Learning pp 97ndash105 San Francisco CA USA 1999
[22] K M Ting ldquoA comparative study of cost-sensitive boostingalgorithmsrdquo in Proc 17th Int Conf Mach Learning pp 983ndash990 Stanford CA USA 2000
[23] M Joshi V Kumar and R Agarwal ldquoEvaluating boosting algo-rithms to classify rare classes comparison and improvementsrdquoinProceedings of the 2001 IEEE International Conference onDataMining pp 257ndash264 San Jose CA USA
[24] Y Sun M S Kamel A K C Wong and Y Wang ldquoCost-sensitive boosting for classification of imbalanced datardquo PatternRecognition vol 40 no 12 pp 3358ndash3378 2007
[25] D H Wolpert ldquoStacked generalizationrdquo Neural Networks vol5 no 2 pp 241ndash259 1992
[26] Y Chen M L Wong and H Li ldquoApplying Ant Colony Opti-mization to configuring stacking ensembles for data miningrdquoExpert Systems with Applications vol 41 no 6 pp 2688ndash27022014
Mathematical Problems in Engineering 13
[27] H Kadkhodaei and A M E Moghadam ldquoAn entropy basedapproach to find the best combination of the base classi-fiers in ensemble classifiers based on stack generalizationrdquoin Proceedings of the 4th International Conference on ControlInstrumentation and Automation ICCIA 2016 pp 425ndash429Iran January 2016
[28] I Czarnowski and P Jędrzejowicz ldquoAn approach to machineclassification based on stacked generalization and instanceselectionrdquo in Proceedings of the IEEE International Conferenceon Systems Man and Cybernetics SMC 2016 pp 4836ndash4841Hungary 2017
[29] S Kotsiantis ldquoStacking cost sensitive modelsrdquo in Proceedings ofthe 12th Pan-Hellenic Conference on Informatics PCI 2008 pp217ndash221 Greece August 2008
[30] H Y Lo J C Wang H MWang and S D Lin ldquoCost-sensitivestacking for audio tag annotation and retrievalrdquo in Proceedingsof the 36th IEEE International Conference on Acoustics Speechand Signal Processing ICASSP 2011 pp 2308ndash2311 CzechRepublic May 2011
[31] JHuang andCX Ling ldquoUsingAUCand accuracy in evaluatinglearning algorithmsrdquo IEEETransactions on Knowledge andDataEngineering vol 17 no 3 pp 299ndash310 2005
[32] M Kubat and S Matwin ldquoAddressing the curse of imbalancedtraining sets one sided selection inrdquo in Proceedings of the 14thInternational Conference on Machine Learning (ICMLrsquo97 pp179ndash186 1997
[33] R Batuwita and V Palade ldquoAdjusted geometric-mean A novelperformance measure for imbalanced bioinformatics datasetslearningrdquo Journal of Bioinformatics and Computational Biologyvol 10 no 4 Article ID 1250003 2012
[34] L Breiman ldquoStacked regressionsrdquoMachine Learning vol 24 no1 pp 49ndash64 1996
[35] M Leblanc and R Tibshirani ldquoCombining estimates in regres-sion and classificationrdquo Journal of the American StatisticalAssociation vol 91 no 436 pp 1641ndash1650 1996
[36] B Cestnik ldquoEstimating Probabilities A Crucial Task inMachine Learningrdquo in Proceedings of the European Conferenceon Artificial Intelligence pp 147ndash149 1990
[37] J R Quinlan C45 Program for machine learning MorganKaufmann 1993
[38] N S Altman ldquoAn introduction to kernel and nearest-neighbornonparametric regressionrdquo The American Statistician vol 46no 3 pp 175ndash185 1992
[39] J R Quinlan ldquoInduction of decision treesrdquo Machine Learningvol 1 no 1 pp 81ndash106 1986
[40] C Elkan ldquoThe foundations of cost-sensitive learningrdquo in Pro-ceedings of the 17th International Joint Conference on ArtificialIntelligence (IJCAI rsquo01) pp 973ndash978 New York NY USAAugust 2001
[41] D Michie D Spiegel halter J C Taylor et alMachine learningneural and statistical classification Ellis Horwood neural andstatistical classification 1995
[42] K M Ting and I F Witten ldquoIssues in stacked generalizationrdquoArtif Intell Res vol 10 no 1 pp 271ndash289 1999
[43] M Hall E Frank G Holmes B Pfahringer P Reutemann andI H Witten ldquoThe WEKA data mining software an updaterdquoACM SIGKDD Explorations Newsletter vol 11 no 1 pp 10ndash182009
[44] A Frank and A Asuncion UCI Machine Learning RepositoryUniversity of California School of Information and ComputerScience Irvine CA USA httparchiveicsucieduml
[45] J Alcala-Fdez A Fernandez J Luengo et al ldquoKEEL data-mining software tool data set repository integration ofalgorithms and experimental analysis frameworkrdquo Journal ofMultiple-Valued Logic and Soft Computing vol 17 no 2-3 pp255ndash287 2011
[46] S Garcıa A Fernandez J Luengo and F Herrera ldquoA study ofstatistical techniques and performance measures for genetics-based machine learning accuracy and interpretabilityrdquo SoftComputing vol 13 no 10 pp 959ndash977 2009
[47] S Holm ldquoA simple sequentially rejective multiple test proce-durerdquo Scandinavian Journal of Statistics vol 6 no 2 pp 65ndash701979
Hindawiwwwhindawicom Volume 2018
MathematicsJournal of
Hindawiwwwhindawicom Volume 2018
Mathematical Problems in Engineering
Applied MathematicsJournal of
Hindawiwwwhindawicom Volume 2018
Probability and StatisticsHindawiwwwhindawicom Volume 2018
Journal of
Hindawiwwwhindawicom Volume 2018
Mathematical PhysicsAdvances in
Complex AnalysisJournal of
Hindawiwwwhindawicom Volume 2018
OptimizationJournal of
Hindawiwwwhindawicom Volume 2018
Hindawiwwwhindawicom Volume 2018
Engineering Mathematics
International Journal of
Hindawiwwwhindawicom Volume 2018
Operations ResearchAdvances in
Journal of
Hindawiwwwhindawicom Volume 2018
Function SpacesAbstract and Applied AnalysisHindawiwwwhindawicom Volume 2018
International Journal of Mathematics and Mathematical Sciences
Hindawiwwwhindawicom Volume 2018
Hindawi Publishing Corporation httpwwwhindawicom Volume 2013Hindawiwwwhindawicom
The Scientific World Journal
Volume 2018
Hindawiwwwhindawicom Volume 2018Volume 2018
Numerical AnalysisNumerical AnalysisNumerical AnalysisNumerical AnalysisNumerical AnalysisNumerical AnalysisNumerical AnalysisNumerical AnalysisNumerical AnalysisNumerical AnalysisNumerical AnalysisNumerical AnalysisAdvances inAdvances in Discrete Dynamics in
Nature and SocietyHindawiwwwhindawicom Volume 2018
Hindawiwwwhindawicom
Dierential EquationsInternational Journal of
Volume 2018
Hindawiwwwhindawicom Volume 2018
Decision SciencesAdvances in
Hindawiwwwhindawicom Volume 2018
AnalysisInternational Journal of
Hindawiwwwhindawicom Volume 2018
Stochastic AnalysisInternational Journal of
Submit your manuscripts atwwwhindawicom
![Page 8: Classifying Imbalanced Data Sets by a Novel RE-Sample and ...downloads.hindawi.com/journals/mpe/2018/5036710.pdf · Cost-Sensitive Stacked Generalization Method ... In this paper,](https://reader035.vdocuments.net/reader035/viewer/2022070801/5f02a4577e708231d4054843/html5/thumbnails/8.jpg)
8 Mathematical Problems in Engineering
Table6Re
sults
ofotherm
etho
dsin
term
sofA
UC
Datas
etNB
C45
KNN
Cost-N
BAd
a-NB
Ada-Cost-N
BBa
g-NB
Bag-Cost-NB
Stacking
-log
Stacking
-Cost-log
wisc
onsin
098
60957
0953
0989
0981
0982
099
0989
0982
0982
pima
0819
0751
0662
0819
0804
0808
0817
0819
0819
0819
haberm
an064
0615
053
064
0667
0625
0638
064
70636
063
Vehicle
10711
0722
0671
0711
0769
0723
0715
0712
0782
0782
Vehicle
00814
093
092
0814
0784
0781
0808
0808
0977
097
7segm
ent0
0987
0971
0996
0987
0944
0959
0987
0986
0999
099
9yeast-0
-3-5-9
vs7-8
075
0675
0653
075
0751
0686
075
0754
0769
0773
yeast-0
-2-5-6
vs3-7-8-9
0832
0687
0782
0832
079
0808
084
0842
084
7082
vowel0
098
0974
1098
0996
0994
0981
098
11
led7digit-0
-2-4-5-6-7-8-9
vs1
095
40925
0827
0954
0898
0893
0953
0949
0909
0904
Glass2
0716
0812
0587
0716
0737
0735
0748
0737
0713
0721
cleveland
-0vs
40909
0824
0754
0909
0826
0891
0915
0913
0979
0976
glass-0-1-6
vs5
0934
0989
0825
0934
0975
0905
0983
0985
09
099
car-good
0976
0493
064
80976
0968
0971
0976
0976
0653
097
7flare-F
0908
0525
059
0911
0876
0893
0908
0912
0825
0806
car-vgoo
d0997
0992
069
0997
0997
0997
0997
0997
0998
099
8abalon
e-17
vs7-8-9-10
0775
0613
0624
0775
0817
073
0773
0774
0752
0726
Mean
0864
0791
0748
0864
0858
0846
0869
0869
0855
0875
Mathematical Problems in Engineering 9
Table7Re
sults
ofotherm
etho
dsin
term
sofG
eoMean
Datas
etNB
C45
KNN
Cost-N
BAd
a-NB
Ada-Cost-N
BBa
g-NB
Bag-Cost-NB
Stacking
-log
Stacking
-Cost-log
wisc
onsin
0965
0954
0953
0968
0959
0967
0968
0966
096
4097
0pima
0719
0697
0649
0736
0715
0739
0723
0731
0703
0741
haberm
an0419
0566
0455
0586
046
60573
0434
0594
0361
060
5Ve
hicle
10677
0622
0652
0674
0607
0677
0673
0661
0561
0712
Vehicle
00724
0912
0919
0726
0735
0733
0719
0733
0894
094
6segm
ent0
0897
0977
0995
0892
0897
0906
0902
0895
099
40993
yeast-0
-3-5-9
vs7-8
046
70463
060
9046
4046
60550
044
50485
044
4060
3yeast-0
-2-5-6
vs3-7-8-9
0762
0590
0762
0793
0762
0792
0762
0787
0706
079
8vowel0
0920
0969
1000
0913
0944
0929
0921
0914
1000
1000
led7digit-0
-2-4-5-6-7-8-9
vs1
0853
0874
0818
0869
0863
0870
0845
0850
0846
0874
Glass2
0578
0472
0470
0564
0578
0610
0586
0568
000
00607
cleveland
-0vs
4089
90724
0722
0936
0611
0863
0866
0861
0866
0866
glass-0-1-6
vs5
0932
0877
0810
0932
0932
0872
0872
0869
0814
0877
car-good
000
0000
00558
0794
0520
0783
000
00778
0561
0813
flare-F
0751
000
00450
0781
0541
064
70738
0781
000
00774
car-vgoo
d0496
0888
0626
0986
0917
0925
0481
0971
0957
098
8abalon
e-17
vs7-8-9-10
0628
0321
0506
064
00319
0368
0627
0630
0293
066
4Mean
0687
064
10703
0779
0696
0753
0680
0769
064
50814
10 Mathematical Problems in Engineering
Table8Re
sults
ofotherm
etho
dsin
term
sofA
GeoMean
Datas
etNB
C45
KNN
Cost-N
BAd
a-NB
Ada-Cost-N
BBa
g-NB
Bag-Cost-NB
Stacking
-log
Stacking
-Cost-log
wisc
onsin
0961
0960
0961
0962
0963
0962
0962
0960
0967
096
7pima
076
80743
0706
0734
0767
0736
0775
0729
0770
0726
haberm
an064
20680
0601
0718
066
00694
0653
0721
0613
0663
Vehicle
10690
0721
0728
0658
0722
0669
0690
0654
0709
0740
Vehicle
0066
60929
0934
0658
0681
0663
066
40663
0924
093
6segm
ent0
0859
0986
0996
0850
0859
0871
0863
0854
0995
099
5yeast-0
-3-5-9
vs7-8
0715
0706
0782
0709
0713
0737
0703
0721
0700
0770
yeast-0
-2-5-6
vs3-7-8-9
0856
0778
0855
0857
0856
0855
0856
0854
0839
086
6vowel0
0929
0981
1000
0909
0966
0948
0931
0910
1000
1000
led7digit-0
-2-4-5-6-7-8-9
vs1
0853
0874
0818
0869
0863
0870
0845
0850
0846
0874
Glass2
0511
0701
0695
0493
0511
0551
0539
0515
000
00744
cleveland
-0vs
4092
70845
0841
0943
0783
0914
0918
0910
0918
0918
glass-0-1-6
vs5
095
40932
0894
0954
0954
0923
0923
0919
0902
0932
car-good
000
0000
00763
0878
0747
0870
000
00873
0769
089
0flare-F
0840
000
00705
0842
0751
0795
0836
0842
000
0084
5car-vgoo
d0743
0934
0793
0986
0953
0956
0725
0980
0974
099
0abalon
e-17
vs7-8-9-10
0736
0655
0745
0716
0650
0672
0734
0711
064
1080
4Mean
0744
0731
0813
0808
0788
0805
0742
08039
0739
086
2
Mathematical Problems in Engineering 11
Table 9 Comparison between RECSG and other algorithms in terms of winslosses of the results
NB C45 KNN Cost-NB Ada-NB Ada-Cost-NB Bag-NB Bag-Cost-
NB Stacking-log
Stacking-Cost-log(AUC)
11 [5]6 [3] 15 [11]2 [2] 17 [15]0 [2] 11 [6]6 [5] 13 [7]4 [2] 14 [9]3 [1] 10 [6]7 [5] 10 [6]7 [5] 11 [3]6 [2]
Stacking-Cost-log(GeoMean)
15 [13]2 [2] 17 [13]0 [0] 15 [13]2 [1] 14 [8]3 [2] 16 [12]1 [1] 16 [10]1 [0] 17 [13]0 [0] 16 [10]3 [0] 16 [12]1 [0]
Stacking-Cost-log(AGeoMean)
14 [10]3 [1] 15 [9]2 [2] 15 [12]2 [1] 13 [7]4 [3] 15 [11]2 [2] 15 [10]2 [2] 16 [10]1 [1] 15 [8]2 [1] 16 [10]1 [1]
Table 10 Holm post hoc test to show differences of Stacking-Cost-log (AUC) and other algorithms
Rejected (1198670) Accepted (1198670)Stacking-Cost-log (AUC) versus other algorithms 3 6Stacking-Cost-log (GeoMean) versus other algorithms 8 1Stacking-Cost-log (AGeoMean) versus other algorithms 9 0
to the evaluation metric of GeoMean the RECSG is signifi-cantly better than other 8 methods with the confidence levelof 120572 = 005
44 Discussion According to the experiments performedwith 17 different imbalance data sets and the comparisonwith other 9 classification methods the proposed RECSGmethod showed the higher performance than other methodsin terms of AUC GeoMean and AGeoMean as illustratedin Tables 7 8 and 9 The RECSG method showed the betterperformance for 10 of 17 cases in terms of AUC and for 12 of17 cases in terms of GeoMean and AGeoMean The methodoutperformed other 5 ensemble algorithms for 16 of 17 datasets in terms ofGeoMean andAGeoMean and for 13 of 17 datasets in terms of AUC GeoMean and AGeoMean are betterthan AUC in evaluation metrics in 17 data sets The means ofRECSGmethod in terms of GeoMean AGeoMean and AUCare all higher than other methods The Holm post hoc testshows that RECSGmethod significantly outperforms the restwith the confidence level 120572 = 005 in terms of AGeoMean for8 of 9methods in terms of AGeoMean and for 3 of 9methodsin terms of AUC
Experimental results and statistical tests show that theRECSG approach has improved the classification perfor-mance of imbalanced data sets The reasons can be explainedas follows
Firstly stacking algorithm uses to combine the resultof base classifier whereas bagging employs majority voteBagging is only a simple decision combinationmethodwhichrequires neither cross-validation nor Level 1 learning In thispaper stacked generalization adopts logistic regressionthus providing the simplest linear combination of pooling theLevel 0 modelsrsquo confidence
Secondly cost-sensitive algorithm can affect imbalancedata sets in two aspects Firstly cost-sensitive modificationscan be applied in the probabilistic estimationMoreover cost-sensitive factors change theweight of instance and theweight
of the minority class is higher than that of the majorityclass Therefore the method directly changes the number ofcommon instances without discarding or duplicating any ofthe rare instances
Thirdly RECSG method introduces cost-sensitive learn-ing into stacking ensemble The cost-sensitive function in replaces the error-minimizing function thus allowing Level 1learning to be prone to focus on the minority class
The results in Tables 7 and 8 demonstrate that the RECSGmethod has the higher performancewhen the evaluation per-formance metric of base classifier is weaker (such as Vehicle1Vehicle0 and car-vgood) The reason is that the alternativemethods of NB C45 and 119896-NN have shortcomings Theindependence assumption hampers the performance of NBin some data sets In C45 tree construction the selectionof attributes affects the model performances Error ratio of119896-NN is relatively high when data sets are in imbalanceTherefore logistic regression adopted in can improve theperformance when base classifier is weaker
The performance of the RECSG method is generallybetter when the IR is low (such as Glass2 car-good flare-Fcar-vgood and abalone-17 vs 7-8-9-10) The performance ofRECSG method is probably related to the setting of the costmatrix Different data sets should use different cost matricesFor the purpose of simplicity in the paper we have adoptedthe same cost matrices whichmay bemore suitable to low IRvalues
5 Conclusions
In this paper in order to solve the class imbalance problemwe proposed the RECSG method based on 2-layer learningmodels Experimental results and statistical tests showedthat the RECSG approach improved the classification per-formance The proposed RECSG approach may have therelatively high computational complexity in the trainingstage because the approach involves 2-layer classifier models
12 Mathematical Problems in Engineering
which consist of several base classifiers and metaclassifiersThe number and kinds of the base-level classifiers areclosely related to the performance of stacking algorithmIn the fusion stage of base classifiers the selection of themetaclassifier is also important In this paper in order tovalidate the performance improvement compared with othercurrent classification algorithms we only randomly selected3 classification algorithms (NB 119896-NN and C45) as baseclassifier and cost-sensitive algorithm as metaclassifier inthe fuse stage Selection of the number or kind of the baseclassifiers and metaclassifier was not discussed Thereforewe will explore the diversity and quality of base classifier inthe future The adoption of these strategies would improvethe prediction performance and reduce the training time ofstacking algorithm for imbalance problems
Conflicts of Interest
The author declares that there are no conflicts of interestregarding the publication of this paper
Acknowledgments
This research is supported by the Natural Science Foundationof Shanxi Province under Grant no 2015011039
References
[1] O Loyola-Gonzalez J F Martınez-Trinidad J A Carrasco-Ochoa and M Garcıa-Borroto ldquoEffect of class imbalance onquality measures for contrast patterns an experimental studyrdquoInformation Sciences vol 374 pp 179ndash192 2016
[2] C Beyan and R Fisher ldquoClassifying imbalanced data setsusing similarity based hierarchical decompositionrdquo PatternRecognition vol 48 no 5 pp 1653ndash1672 2015
[3] H He and E A Garcia ldquoLearning from imbalanced datardquo IEEETransactions on Knowledge and Data Engineering vol 21 no 9pp 1263ndash1284 2009
[4] V C Chawla K W Bowyer L O Hall and W P KegelmeyerldquoSMOTE synthetic minority over-sampling techniquerdquo Journalof Artificial Intelligence Research vol 16 pp 321ndash357 2002
[5] X-Y Liu J X Wu and Z-H Zhou ldquoExploratory under-sampling for class-imbalance learningrdquo in Proceedings of the 6thInternational Conference on Data Mining (ICDM rsquo06) pp 965ndash969 IEEE Hong Kong December 2006
[6] N Japkowicz ldquoThe class imbalance problem significance andstrategiesrdquo in Proceedings of the International Conference onArtificial Intelligence 2000
[7] B Zadrozny and C Elkan ldquoLearning and making decisionswhen costs and probabilities are both unknownrdquo in Proceedingsof the the seventh ACM SIGKDD international conference pp204ndash213 San Francisco Calif USA August 2001
[8] M Galar A Fernandez E Barrenechea H Bustince andF Herrera ldquoA review on ensembles for the class imbalanceproblem bagging boosting and hybrid-based approachesrdquoIEEE Transactions on Systems Man and Cybernetics Part CApplications and Reviews vol 42 no 4 pp 463ndash484 2012
[9] N V Chawla D A Cieslak L O Hall and A Joshi ldquoAuto-matically countering imbalance and its empirical relationship
to costrdquo Data Mining and Knowledge Discovery vol 17 no 2pp 225ndash252 2008
[10] A Freitas A Costapereira and P Brazdil ldquoCost-SensitiveDecision Trees Applied to Medical Datardquo in InternationalConference on Data Warehousing and Knowledge Discovery pp303ndash312 2007
[11] L Breiman ldquoBagging predictorsrdquoMachine Learning vol 24 no2 pp 123ndash140 1996
[12] R E Schapire ldquoThe strength of weak learnabilityrdquo MachineLearning vol 5 no 2 pp 197ndash227 1990
[13] Y Freund and R E Schapire ldquoA decision-theoretic generaliza-tion of on-line learning and an application to boostingrdquo Journalof Computer and System Sciences vol 55 no 1 part 2 pp 119ndash139 1997
[14] N V Chawla A Lazarevic L O Hall and K W BowyerldquoSMOTEBoost improving prediction of the minority class inboostingrdquo in Proceeding of Knowledge Discovery in Databasesvol 2838 pp 107ndash119
[15] C Seiffert T M Khoshgoftaar J VanHulse and A NapolitanoldquoRUSBoost A hybrid approach to alleviating class imbalancerdquoIEEE Transactions on Systems Man and Cybernetics Systemsvol 40 no 1 pp 185ndash197 2010
[16] S Hu Y Liang L Ma and Y He ldquoMSMOTE Improvingclassification performance when training data is imbalancedrdquoin Proceedings of the 2nd International Workshop on ComputerScience and Engineering WCSE 2009 pp 13ndash17 China October2009
[17] H Guo and H L Viktor ldquoLearning from imbalanced data setswith boosting and data generationrdquoACMSIGKDDExplorationsNewsletter vol 6 no 1 pp 30ndash39 2004
[18] SWang and X Yao ldquoDiversity analysis on imbalanced data setsby using ensemble modelsrdquo in Proceedings of the 2009 IEEESymposium on Computational Intelligence and Data MiningCIDM 2009 pp 324ndash331 USA April 2009
[19] R Barandela J S Srsquoanchez and R M Valdovinos ldquoNewapplications of ensembles of classifiersrdquo PAA Pattern Analysisand Applications vol 6 no 3 pp 245ndash256 2003
[20] J Błaszczynski M Deckert J Stefanowski and S WilkldquoIntegrating selective pre-processing of imbalanced data withIvotes ensemblerdquo Lecture Notes in Computer Science (includingsubseries Lecture Notes in Artificial Intelligence and Lecture Notesin Bioinformatics) Preface vol 6086 pp 148ndash157 2010
[21] W Fan S J Stolfo J Zhang and P K Chan ldquoAdacostMisclassication cost-sensitive boostingrdquo in the 6th Int ConfMach Learning pp 97ndash105 San Francisco CA USA 1999
[22] K M Ting ldquoA comparative study of cost-sensitive boostingalgorithmsrdquo in Proc 17th Int Conf Mach Learning pp 983ndash990 Stanford CA USA 2000
[23] M Joshi V Kumar and R Agarwal ldquoEvaluating boosting algo-rithms to classify rare classes comparison and improvementsrdquoinProceedings of the 2001 IEEE International Conference onDataMining pp 257ndash264 San Jose CA USA
[24] Y Sun M S Kamel A K C Wong and Y Wang ldquoCost-sensitive boosting for classification of imbalanced datardquo PatternRecognition vol 40 no 12 pp 3358ndash3378 2007
[25] D H Wolpert ldquoStacked generalizationrdquo Neural Networks vol5 no 2 pp 241ndash259 1992
[26] Y Chen M L Wong and H Li ldquoApplying Ant Colony Opti-mization to configuring stacking ensembles for data miningrdquoExpert Systems with Applications vol 41 no 6 pp 2688ndash27022014
Mathematical Problems in Engineering 13
[27] H Kadkhodaei and A M E Moghadam ldquoAn entropy basedapproach to find the best combination of the base classi-fiers in ensemble classifiers based on stack generalizationrdquoin Proceedings of the 4th International Conference on ControlInstrumentation and Automation ICCIA 2016 pp 425ndash429Iran January 2016
[28] I Czarnowski and P Jędrzejowicz ldquoAn approach to machineclassification based on stacked generalization and instanceselectionrdquo in Proceedings of the IEEE International Conferenceon Systems Man and Cybernetics SMC 2016 pp 4836ndash4841Hungary 2017
[29] S Kotsiantis ldquoStacking cost sensitive modelsrdquo in Proceedings ofthe 12th Pan-Hellenic Conference on Informatics PCI 2008 pp217ndash221 Greece August 2008
[30] H Y Lo J C Wang H MWang and S D Lin ldquoCost-sensitivestacking for audio tag annotation and retrievalrdquo in Proceedingsof the 36th IEEE International Conference on Acoustics Speechand Signal Processing ICASSP 2011 pp 2308ndash2311 CzechRepublic May 2011
[31] JHuang andCX Ling ldquoUsingAUCand accuracy in evaluatinglearning algorithmsrdquo IEEETransactions on Knowledge andDataEngineering vol 17 no 3 pp 299ndash310 2005
[32] M Kubat and S Matwin ldquoAddressing the curse of imbalancedtraining sets one sided selection inrdquo in Proceedings of the 14thInternational Conference on Machine Learning (ICMLrsquo97 pp179ndash186 1997
[33] R Batuwita and V Palade ldquoAdjusted geometric-mean A novelperformance measure for imbalanced bioinformatics datasetslearningrdquo Journal of Bioinformatics and Computational Biologyvol 10 no 4 Article ID 1250003 2012
[34] L Breiman ldquoStacked regressionsrdquoMachine Learning vol 24 no1 pp 49ndash64 1996
[35] M Leblanc and R Tibshirani ldquoCombining estimates in regres-sion and classificationrdquo Journal of the American StatisticalAssociation vol 91 no 436 pp 1641ndash1650 1996
[36] B Cestnik ldquoEstimating Probabilities A Crucial Task inMachine Learningrdquo in Proceedings of the European Conferenceon Artificial Intelligence pp 147ndash149 1990
[37] J R Quinlan C45 Program for machine learning MorganKaufmann 1993
[38] N S Altman ldquoAn introduction to kernel and nearest-neighbornonparametric regressionrdquo The American Statistician vol 46no 3 pp 175ndash185 1992
[39] J R Quinlan ldquoInduction of decision treesrdquo Machine Learningvol 1 no 1 pp 81ndash106 1986
[40] C Elkan ldquoThe foundations of cost-sensitive learningrdquo in Pro-ceedings of the 17th International Joint Conference on ArtificialIntelligence (IJCAI rsquo01) pp 973ndash978 New York NY USAAugust 2001
[41] D Michie D Spiegel halter J C Taylor et alMachine learningneural and statistical classification Ellis Horwood neural andstatistical classification 1995
[42] K M Ting and I F Witten ldquoIssues in stacked generalizationrdquoArtif Intell Res vol 10 no 1 pp 271ndash289 1999
[43] M Hall E Frank G Holmes B Pfahringer P Reutemann andI H Witten ldquoThe WEKA data mining software an updaterdquoACM SIGKDD Explorations Newsletter vol 11 no 1 pp 10ndash182009
[44] A Frank and A Asuncion UCI Machine Learning RepositoryUniversity of California School of Information and ComputerScience Irvine CA USA httparchiveicsucieduml
[45] J Alcala-Fdez A Fernandez J Luengo et al ldquoKEEL data-mining software tool data set repository integration ofalgorithms and experimental analysis frameworkrdquo Journal ofMultiple-Valued Logic and Soft Computing vol 17 no 2-3 pp255ndash287 2011
[46] S Garcıa A Fernandez J Luengo and F Herrera ldquoA study ofstatistical techniques and performance measures for genetics-based machine learning accuracy and interpretabilityrdquo SoftComputing vol 13 no 10 pp 959ndash977 2009
[47] S Holm ldquoA simple sequentially rejective multiple test proce-durerdquo Scandinavian Journal of Statistics vol 6 no 2 pp 65ndash701979
Hindawiwwwhindawicom Volume 2018
MathematicsJournal of
Hindawiwwwhindawicom Volume 2018
Mathematical Problems in Engineering
Applied MathematicsJournal of
Hindawiwwwhindawicom Volume 2018
Probability and StatisticsHindawiwwwhindawicom Volume 2018
Journal of
Hindawiwwwhindawicom Volume 2018
Mathematical PhysicsAdvances in
Complex AnalysisJournal of
Hindawiwwwhindawicom Volume 2018
OptimizationJournal of
Hindawiwwwhindawicom Volume 2018
Hindawiwwwhindawicom Volume 2018
Engineering Mathematics
International Journal of
Hindawiwwwhindawicom Volume 2018
Operations ResearchAdvances in
Journal of
Hindawiwwwhindawicom Volume 2018
Function SpacesAbstract and Applied AnalysisHindawiwwwhindawicom Volume 2018
International Journal of Mathematics and Mathematical Sciences
Hindawiwwwhindawicom Volume 2018
Hindawi Publishing Corporation httpwwwhindawicom Volume 2013Hindawiwwwhindawicom
The Scientific World Journal
Volume 2018
Hindawiwwwhindawicom Volume 2018Volume 2018
Numerical AnalysisNumerical AnalysisNumerical AnalysisNumerical AnalysisNumerical AnalysisNumerical AnalysisNumerical AnalysisNumerical AnalysisNumerical AnalysisNumerical AnalysisNumerical AnalysisNumerical AnalysisAdvances inAdvances in Discrete Dynamics in
Nature and SocietyHindawiwwwhindawicom Volume 2018
Hindawiwwwhindawicom
Dierential EquationsInternational Journal of
Volume 2018
Hindawiwwwhindawicom Volume 2018
Decision SciencesAdvances in
Hindawiwwwhindawicom Volume 2018
AnalysisInternational Journal of
Hindawiwwwhindawicom Volume 2018
Stochastic AnalysisInternational Journal of
Submit your manuscripts atwwwhindawicom
![Page 9: Classifying Imbalanced Data Sets by a Novel RE-Sample and ...downloads.hindawi.com/journals/mpe/2018/5036710.pdf · Cost-Sensitive Stacked Generalization Method ... In this paper,](https://reader035.vdocuments.net/reader035/viewer/2022070801/5f02a4577e708231d4054843/html5/thumbnails/9.jpg)
Mathematical Problems in Engineering 9
Table7Re
sults
ofotherm
etho
dsin
term
sofG
eoMean
Datas
etNB
C45
KNN
Cost-N
BAd
a-NB
Ada-Cost-N
BBa
g-NB
Bag-Cost-NB
Stacking
-log
Stacking
-Cost-log
wisc
onsin
0965
0954
0953
0968
0959
0967
0968
0966
096
4097
0pima
0719
0697
0649
0736
0715
0739
0723
0731
0703
0741
haberm
an0419
0566
0455
0586
046
60573
0434
0594
0361
060
5Ve
hicle
10677
0622
0652
0674
0607
0677
0673
0661
0561
0712
Vehicle
00724
0912
0919
0726
0735
0733
0719
0733
0894
094
6segm
ent0
0897
0977
0995
0892
0897
0906
0902
0895
099
40993
yeast-0
-3-5-9
vs7-8
046
70463
060
9046
4046
60550
044
50485
044
4060
3yeast-0
-2-5-6
vs3-7-8-9
0762
0590
0762
0793
0762
0792
0762
0787
0706
079
8vowel0
0920
0969
1000
0913
0944
0929
0921
0914
1000
1000
led7digit-0
-2-4-5-6-7-8-9
vs1
0853
0874
0818
0869
0863
0870
0845
0850
0846
0874
Glass2
0578
0472
0470
0564
0578
0610
0586
0568
000
00607
cleveland
-0vs
4089
90724
0722
0936
0611
0863
0866
0861
0866
0866
glass-0-1-6
vs5
0932
0877
0810
0932
0932
0872
0872
0869
0814
0877
car-good
000
0000
00558
0794
0520
0783
000
00778
0561
0813
flare-F
0751
000
00450
0781
0541
064
70738
0781
000
00774
car-vgoo
d0496
0888
0626
0986
0917
0925
0481
0971
0957
098
8abalon
e-17
vs7-8-9-10
0628
0321
0506
064
00319
0368
0627
0630
0293
066
4Mean
0687
064
10703
0779
0696
0753
0680
0769
064
50814
10 Mathematical Problems in Engineering
Table8Re
sults
ofotherm
etho
dsin
term
sofA
GeoMean
Datas
etNB
C45
KNN
Cost-N
BAd
a-NB
Ada-Cost-N
BBa
g-NB
Bag-Cost-NB
Stacking
-log
Stacking
-Cost-log
wisc
onsin
0961
0960
0961
0962
0963
0962
0962
0960
0967
096
7pima
076
80743
0706
0734
0767
0736
0775
0729
0770
0726
haberm
an064
20680
0601
0718
066
00694
0653
0721
0613
0663
Vehicle
10690
0721
0728
0658
0722
0669
0690
0654
0709
0740
Vehicle
0066
60929
0934
0658
0681
0663
066
40663
0924
093
6segm
ent0
0859
0986
0996
0850
0859
0871
0863
0854
0995
099
5yeast-0
-3-5-9
vs7-8
0715
0706
0782
0709
0713
0737
0703
0721
0700
0770
yeast-0
-2-5-6
vs3-7-8-9
0856
0778
0855
0857
0856
0855
0856
0854
0839
086
6vowel0
0929
0981
1000
0909
0966
0948
0931
0910
1000
1000
led7digit-0
-2-4-5-6-7-8-9
vs1
0853
0874
0818
0869
0863
0870
0845
0850
0846
0874
Glass2
0511
0701
0695
0493
0511
0551
0539
0515
000
00744
cleveland
-0vs
4092
70845
0841
0943
0783
0914
0918
0910
0918
0918
glass-0-1-6
vs5
095
40932
0894
0954
0954
0923
0923
0919
0902
0932
car-good
000
0000
00763
0878
0747
0870
000
00873
0769
089
0flare-F
0840
000
00705
0842
0751
0795
0836
0842
000
0084
5car-vgoo
d0743
0934
0793
0986
0953
0956
0725
0980
0974
099
0abalon
e-17
vs7-8-9-10
0736
0655
0745
0716
0650
0672
0734
0711
064
1080
4Mean
0744
0731
0813
0808
0788
0805
0742
08039
0739
086
2
Mathematical Problems in Engineering 11
Table 9 Comparison between RECSG and other algorithms in terms of winslosses of the results
NB C45 KNN Cost-NB Ada-NB Ada-Cost-NB Bag-NB Bag-Cost-
NB Stacking-log
Stacking-Cost-log(AUC)
11 [5]6 [3] 15 [11]2 [2] 17 [15]0 [2] 11 [6]6 [5] 13 [7]4 [2] 14 [9]3 [1] 10 [6]7 [5] 10 [6]7 [5] 11 [3]6 [2]
Stacking-Cost-log(GeoMean)
15 [13]2 [2] 17 [13]0 [0] 15 [13]2 [1] 14 [8]3 [2] 16 [12]1 [1] 16 [10]1 [0] 17 [13]0 [0] 16 [10]3 [0] 16 [12]1 [0]
Stacking-Cost-log(AGeoMean)
14 [10]3 [1] 15 [9]2 [2] 15 [12]2 [1] 13 [7]4 [3] 15 [11]2 [2] 15 [10]2 [2] 16 [10]1 [1] 15 [8]2 [1] 16 [10]1 [1]
Table 10 Holm post hoc test to show differences of Stacking-Cost-log (AUC) and other algorithms
Rejected (1198670) Accepted (1198670)Stacking-Cost-log (AUC) versus other algorithms 3 6Stacking-Cost-log (GeoMean) versus other algorithms 8 1Stacking-Cost-log (AGeoMean) versus other algorithms 9 0
to the evaluation metric of GeoMean the RECSG is signifi-cantly better than other 8 methods with the confidence levelof 120572 = 005
44 Discussion According to the experiments performedwith 17 different imbalance data sets and the comparisonwith other 9 classification methods the proposed RECSGmethod showed the higher performance than other methodsin terms of AUC GeoMean and AGeoMean as illustratedin Tables 7 8 and 9 The RECSG method showed the betterperformance for 10 of 17 cases in terms of AUC and for 12 of17 cases in terms of GeoMean and AGeoMean The methodoutperformed other 5 ensemble algorithms for 16 of 17 datasets in terms ofGeoMean andAGeoMean and for 13 of 17 datasets in terms of AUC GeoMean and AGeoMean are betterthan AUC in evaluation metrics in 17 data sets The means ofRECSGmethod in terms of GeoMean AGeoMean and AUCare all higher than other methods The Holm post hoc testshows that RECSGmethod significantly outperforms the restwith the confidence level 120572 = 005 in terms of AGeoMean for8 of 9methods in terms of AGeoMean and for 3 of 9methodsin terms of AUC
Experimental results and statistical tests show that theRECSG approach has improved the classification perfor-mance of imbalanced data sets The reasons can be explainedas follows
Firstly stacking algorithm uses to combine the resultof base classifier whereas bagging employs majority voteBagging is only a simple decision combinationmethodwhichrequires neither cross-validation nor Level 1 learning In thispaper stacked generalization adopts logistic regressionthus providing the simplest linear combination of pooling theLevel 0 modelsrsquo confidence
Secondly cost-sensitive algorithm can affect imbalancedata sets in two aspects Firstly cost-sensitive modificationscan be applied in the probabilistic estimationMoreover cost-sensitive factors change theweight of instance and theweight
of the minority class is higher than that of the majorityclass Therefore the method directly changes the number ofcommon instances without discarding or duplicating any ofthe rare instances
Thirdly RECSG method introduces cost-sensitive learn-ing into stacking ensemble The cost-sensitive function in replaces the error-minimizing function thus allowing Level 1learning to be prone to focus on the minority class
The results in Tables 7 and 8 demonstrate that the RECSGmethod has the higher performancewhen the evaluation per-formance metric of base classifier is weaker (such as Vehicle1Vehicle0 and car-vgood) The reason is that the alternativemethods of NB C45 and 119896-NN have shortcomings Theindependence assumption hampers the performance of NBin some data sets In C45 tree construction the selectionof attributes affects the model performances Error ratio of119896-NN is relatively high when data sets are in imbalanceTherefore logistic regression adopted in can improve theperformance when base classifier is weaker
The performance of the RECSG method is generallybetter when the IR is low (such as Glass2 car-good flare-Fcar-vgood and abalone-17 vs 7-8-9-10) The performance ofRECSG method is probably related to the setting of the costmatrix Different data sets should use different cost matricesFor the purpose of simplicity in the paper we have adoptedthe same cost matrices whichmay bemore suitable to low IRvalues
5 Conclusions
In this paper in order to solve the class imbalance problemwe proposed the RECSG method based on 2-layer learningmodels Experimental results and statistical tests showedthat the RECSG approach improved the classification per-formance The proposed RECSG approach may have therelatively high computational complexity in the trainingstage because the approach involves 2-layer classifier models
12 Mathematical Problems in Engineering
which consist of several base classifiers and metaclassifiersThe number and kinds of the base-level classifiers areclosely related to the performance of stacking algorithmIn the fusion stage of base classifiers the selection of themetaclassifier is also important In this paper in order tovalidate the performance improvement compared with othercurrent classification algorithms we only randomly selected3 classification algorithms (NB 119896-NN and C45) as baseclassifier and cost-sensitive algorithm as metaclassifier inthe fuse stage Selection of the number or kind of the baseclassifiers and metaclassifier was not discussed Thereforewe will explore the diversity and quality of base classifier inthe future The adoption of these strategies would improvethe prediction performance and reduce the training time ofstacking algorithm for imbalance problems
Conflicts of Interest
The author declares that there are no conflicts of interestregarding the publication of this paper
Acknowledgments
This research is supported by the Natural Science Foundationof Shanxi Province under Grant no 2015011039
References
[1] O Loyola-Gonzalez J F Martınez-Trinidad J A Carrasco-Ochoa and M Garcıa-Borroto ldquoEffect of class imbalance onquality measures for contrast patterns an experimental studyrdquoInformation Sciences vol 374 pp 179ndash192 2016
[2] C Beyan and R Fisher ldquoClassifying imbalanced data setsusing similarity based hierarchical decompositionrdquo PatternRecognition vol 48 no 5 pp 1653ndash1672 2015
[3] H He and E A Garcia ldquoLearning from imbalanced datardquo IEEETransactions on Knowledge and Data Engineering vol 21 no 9pp 1263ndash1284 2009
[4] V C Chawla K W Bowyer L O Hall and W P KegelmeyerldquoSMOTE synthetic minority over-sampling techniquerdquo Journalof Artificial Intelligence Research vol 16 pp 321ndash357 2002
[5] X-Y Liu J X Wu and Z-H Zhou ldquoExploratory under-sampling for class-imbalance learningrdquo in Proceedings of the 6thInternational Conference on Data Mining (ICDM rsquo06) pp 965ndash969 IEEE Hong Kong December 2006
[6] N Japkowicz ldquoThe class imbalance problem significance andstrategiesrdquo in Proceedings of the International Conference onArtificial Intelligence 2000
[7] B Zadrozny and C Elkan ldquoLearning and making decisionswhen costs and probabilities are both unknownrdquo in Proceedingsof the the seventh ACM SIGKDD international conference pp204ndash213 San Francisco Calif USA August 2001
[8] M Galar A Fernandez E Barrenechea H Bustince andF Herrera ldquoA review on ensembles for the class imbalanceproblem bagging boosting and hybrid-based approachesrdquoIEEE Transactions on Systems Man and Cybernetics Part CApplications and Reviews vol 42 no 4 pp 463ndash484 2012
[9] N V Chawla D A Cieslak L O Hall and A Joshi ldquoAuto-matically countering imbalance and its empirical relationship
to costrdquo Data Mining and Knowledge Discovery vol 17 no 2pp 225ndash252 2008
[10] A Freitas A Costapereira and P Brazdil ldquoCost-SensitiveDecision Trees Applied to Medical Datardquo in InternationalConference on Data Warehousing and Knowledge Discovery pp303ndash312 2007
[11] L Breiman ldquoBagging predictorsrdquoMachine Learning vol 24 no2 pp 123ndash140 1996
[12] R E Schapire ldquoThe strength of weak learnabilityrdquo MachineLearning vol 5 no 2 pp 197ndash227 1990
[13] Y Freund and R E Schapire ldquoA decision-theoretic generaliza-tion of on-line learning and an application to boostingrdquo Journalof Computer and System Sciences vol 55 no 1 part 2 pp 119ndash139 1997
[14] N V Chawla A Lazarevic L O Hall and K W BowyerldquoSMOTEBoost improving prediction of the minority class inboostingrdquo in Proceeding of Knowledge Discovery in Databasesvol 2838 pp 107ndash119
[15] C Seiffert T M Khoshgoftaar J VanHulse and A NapolitanoldquoRUSBoost A hybrid approach to alleviating class imbalancerdquoIEEE Transactions on Systems Man and Cybernetics Systemsvol 40 no 1 pp 185ndash197 2010
[16] S Hu Y Liang L Ma and Y He ldquoMSMOTE Improvingclassification performance when training data is imbalancedrdquoin Proceedings of the 2nd International Workshop on ComputerScience and Engineering WCSE 2009 pp 13ndash17 China October2009
[17] H Guo and H L Viktor ldquoLearning from imbalanced data setswith boosting and data generationrdquoACMSIGKDDExplorationsNewsletter vol 6 no 1 pp 30ndash39 2004
[18] SWang and X Yao ldquoDiversity analysis on imbalanced data setsby using ensemble modelsrdquo in Proceedings of the 2009 IEEESymposium on Computational Intelligence and Data MiningCIDM 2009 pp 324ndash331 USA April 2009
[19] R Barandela J S Srsquoanchez and R M Valdovinos ldquoNewapplications of ensembles of classifiersrdquo PAA Pattern Analysisand Applications vol 6 no 3 pp 245ndash256 2003
[20] J Błaszczynski M Deckert J Stefanowski and S WilkldquoIntegrating selective pre-processing of imbalanced data withIvotes ensemblerdquo Lecture Notes in Computer Science (includingsubseries Lecture Notes in Artificial Intelligence and Lecture Notesin Bioinformatics) Preface vol 6086 pp 148ndash157 2010
[21] W Fan S J Stolfo J Zhang and P K Chan ldquoAdacostMisclassication cost-sensitive boostingrdquo in the 6th Int ConfMach Learning pp 97ndash105 San Francisco CA USA 1999
[22] K M Ting ldquoA comparative study of cost-sensitive boostingalgorithmsrdquo in Proc 17th Int Conf Mach Learning pp 983ndash990 Stanford CA USA 2000
[23] M Joshi V Kumar and R Agarwal ldquoEvaluating boosting algo-rithms to classify rare classes comparison and improvementsrdquoinProceedings of the 2001 IEEE International Conference onDataMining pp 257ndash264 San Jose CA USA
[24] Y Sun M S Kamel A K C Wong and Y Wang ldquoCost-sensitive boosting for classification of imbalanced datardquo PatternRecognition vol 40 no 12 pp 3358ndash3378 2007
[25] D H Wolpert ldquoStacked generalizationrdquo Neural Networks vol5 no 2 pp 241ndash259 1992
[26] Y Chen M L Wong and H Li ldquoApplying Ant Colony Opti-mization to configuring stacking ensembles for data miningrdquoExpert Systems with Applications vol 41 no 6 pp 2688ndash27022014
Mathematical Problems in Engineering 13
[27] H Kadkhodaei and A M E Moghadam ldquoAn entropy basedapproach to find the best combination of the base classi-fiers in ensemble classifiers based on stack generalizationrdquoin Proceedings of the 4th International Conference on ControlInstrumentation and Automation ICCIA 2016 pp 425ndash429Iran January 2016
[28] I Czarnowski and P Jędrzejowicz ldquoAn approach to machineclassification based on stacked generalization and instanceselectionrdquo in Proceedings of the IEEE International Conferenceon Systems Man and Cybernetics SMC 2016 pp 4836ndash4841Hungary 2017
[29] S Kotsiantis ldquoStacking cost sensitive modelsrdquo in Proceedings ofthe 12th Pan-Hellenic Conference on Informatics PCI 2008 pp217ndash221 Greece August 2008
[30] H Y Lo J C Wang H MWang and S D Lin ldquoCost-sensitivestacking for audio tag annotation and retrievalrdquo in Proceedingsof the 36th IEEE International Conference on Acoustics Speechand Signal Processing ICASSP 2011 pp 2308ndash2311 CzechRepublic May 2011
[31] JHuang andCX Ling ldquoUsingAUCand accuracy in evaluatinglearning algorithmsrdquo IEEETransactions on Knowledge andDataEngineering vol 17 no 3 pp 299ndash310 2005
[32] M Kubat and S Matwin ldquoAddressing the curse of imbalancedtraining sets one sided selection inrdquo in Proceedings of the 14thInternational Conference on Machine Learning (ICMLrsquo97 pp179ndash186 1997
[33] R Batuwita and V Palade ldquoAdjusted geometric-mean A novelperformance measure for imbalanced bioinformatics datasetslearningrdquo Journal of Bioinformatics and Computational Biologyvol 10 no 4 Article ID 1250003 2012
[34] L Breiman ldquoStacked regressionsrdquoMachine Learning vol 24 no1 pp 49ndash64 1996
[35] M Leblanc and R Tibshirani ldquoCombining estimates in regres-sion and classificationrdquo Journal of the American StatisticalAssociation vol 91 no 436 pp 1641ndash1650 1996
[36] B Cestnik ldquoEstimating Probabilities A Crucial Task inMachine Learningrdquo in Proceedings of the European Conferenceon Artificial Intelligence pp 147ndash149 1990
[37] J R Quinlan C45 Program for machine learning MorganKaufmann 1993
[38] N S Altman ldquoAn introduction to kernel and nearest-neighbornonparametric regressionrdquo The American Statistician vol 46no 3 pp 175ndash185 1992
[39] J R Quinlan ldquoInduction of decision treesrdquo Machine Learningvol 1 no 1 pp 81ndash106 1986
[40] C Elkan ldquoThe foundations of cost-sensitive learningrdquo in Pro-ceedings of the 17th International Joint Conference on ArtificialIntelligence (IJCAI rsquo01) pp 973ndash978 New York NY USAAugust 2001
[41] D Michie D Spiegel halter J C Taylor et alMachine learningneural and statistical classification Ellis Horwood neural andstatistical classification 1995
[42] K M Ting and I F Witten ldquoIssues in stacked generalizationrdquoArtif Intell Res vol 10 no 1 pp 271ndash289 1999
[43] M Hall E Frank G Holmes B Pfahringer P Reutemann andI H Witten ldquoThe WEKA data mining software an updaterdquoACM SIGKDD Explorations Newsletter vol 11 no 1 pp 10ndash182009
[44] A Frank and A Asuncion UCI Machine Learning RepositoryUniversity of California School of Information and ComputerScience Irvine CA USA httparchiveicsucieduml
[45] J Alcala-Fdez A Fernandez J Luengo et al ldquoKEEL data-mining software tool data set repository integration ofalgorithms and experimental analysis frameworkrdquo Journal ofMultiple-Valued Logic and Soft Computing vol 17 no 2-3 pp255ndash287 2011
[46] S Garcıa A Fernandez J Luengo and F Herrera ldquoA study ofstatistical techniques and performance measures for genetics-based machine learning accuracy and interpretabilityrdquo SoftComputing vol 13 no 10 pp 959ndash977 2009
[47] S Holm ldquoA simple sequentially rejective multiple test proce-durerdquo Scandinavian Journal of Statistics vol 6 no 2 pp 65ndash701979
Hindawiwwwhindawicom Volume 2018
MathematicsJournal of
Hindawiwwwhindawicom Volume 2018
Mathematical Problems in Engineering
Applied MathematicsJournal of
Hindawiwwwhindawicom Volume 2018
Probability and StatisticsHindawiwwwhindawicom Volume 2018
Journal of
Hindawiwwwhindawicom Volume 2018
Mathematical PhysicsAdvances in
Complex AnalysisJournal of
Hindawiwwwhindawicom Volume 2018
OptimizationJournal of
Hindawiwwwhindawicom Volume 2018
Hindawiwwwhindawicom Volume 2018
Engineering Mathematics
International Journal of
Hindawiwwwhindawicom Volume 2018
Operations ResearchAdvances in
Journal of
Hindawiwwwhindawicom Volume 2018
Function SpacesAbstract and Applied AnalysisHindawiwwwhindawicom Volume 2018
International Journal of Mathematics and Mathematical Sciences
Hindawiwwwhindawicom Volume 2018
Hindawi Publishing Corporation httpwwwhindawicom Volume 2013Hindawiwwwhindawicom
The Scientific World Journal
Volume 2018
Hindawiwwwhindawicom Volume 2018Volume 2018
Numerical AnalysisNumerical AnalysisNumerical AnalysisNumerical AnalysisNumerical AnalysisNumerical AnalysisNumerical AnalysisNumerical AnalysisNumerical AnalysisNumerical AnalysisNumerical AnalysisNumerical AnalysisAdvances inAdvances in Discrete Dynamics in
Nature and SocietyHindawiwwwhindawicom Volume 2018
Hindawiwwwhindawicom
Dierential EquationsInternational Journal of
Volume 2018
Hindawiwwwhindawicom Volume 2018
Decision SciencesAdvances in
Hindawiwwwhindawicom Volume 2018
AnalysisInternational Journal of
Hindawiwwwhindawicom Volume 2018
Stochastic AnalysisInternational Journal of
Submit your manuscripts atwwwhindawicom
![Page 10: Classifying Imbalanced Data Sets by a Novel RE-Sample and ...downloads.hindawi.com/journals/mpe/2018/5036710.pdf · Cost-Sensitive Stacked Generalization Method ... In this paper,](https://reader035.vdocuments.net/reader035/viewer/2022070801/5f02a4577e708231d4054843/html5/thumbnails/10.jpg)
10 Mathematical Problems in Engineering
Table8Re
sults
ofotherm
etho
dsin
term
sofA
GeoMean
Datas
etNB
C45
KNN
Cost-N
BAd
a-NB
Ada-Cost-N
BBa
g-NB
Bag-Cost-NB
Stacking
-log
Stacking
-Cost-log
wisc
onsin
0961
0960
0961
0962
0963
0962
0962
0960
0967
096
7pima
076
80743
0706
0734
0767
0736
0775
0729
0770
0726
haberm
an064
20680
0601
0718
066
00694
0653
0721
0613
0663
Vehicle
10690
0721
0728
0658
0722
0669
0690
0654
0709
0740
Vehicle
0066
60929
0934
0658
0681
0663
066
40663
0924
093
6segm
ent0
0859
0986
0996
0850
0859
0871
0863
0854
0995
099
5yeast-0
-3-5-9
vs7-8
0715
0706
0782
0709
0713
0737
0703
0721
0700
0770
yeast-0
-2-5-6
vs3-7-8-9
0856
0778
0855
0857
0856
0855
0856
0854
0839
086
6vowel0
0929
0981
1000
0909
0966
0948
0931
0910
1000
1000
led7digit-0
-2-4-5-6-7-8-9
vs1
0853
0874
0818
0869
0863
0870
0845
0850
0846
0874
Glass2
0511
0701
0695
0493
0511
0551
0539
0515
000
00744
cleveland
-0vs
4092
70845
0841
0943
0783
0914
0918
0910
0918
0918
glass-0-1-6
vs5
095
40932
0894
0954
0954
0923
0923
0919
0902
0932
car-good
000
0000
00763
0878
0747
0870
000
00873
0769
089
0flare-F
0840
000
00705
0842
0751
0795
0836
0842
000
0084
5car-vgoo
d0743
0934
0793
0986
0953
0956
0725
0980
0974
099
0abalon
e-17
vs7-8-9-10
0736
0655
0745
0716
0650
0672
0734
0711
064
1080
4Mean
0744
0731
0813
0808
0788
0805
0742
08039
0739
086
2
Mathematical Problems in Engineering 11
Table 9 Comparison between RECSG and other algorithms in terms of winslosses of the results
NB C45 KNN Cost-NB Ada-NB Ada-Cost-NB Bag-NB Bag-Cost-
NB Stacking-log
Stacking-Cost-log(AUC)
11 [5]6 [3] 15 [11]2 [2] 17 [15]0 [2] 11 [6]6 [5] 13 [7]4 [2] 14 [9]3 [1] 10 [6]7 [5] 10 [6]7 [5] 11 [3]6 [2]
Stacking-Cost-log(GeoMean)
15 [13]2 [2] 17 [13]0 [0] 15 [13]2 [1] 14 [8]3 [2] 16 [12]1 [1] 16 [10]1 [0] 17 [13]0 [0] 16 [10]3 [0] 16 [12]1 [0]
Stacking-Cost-log(AGeoMean)
14 [10]3 [1] 15 [9]2 [2] 15 [12]2 [1] 13 [7]4 [3] 15 [11]2 [2] 15 [10]2 [2] 16 [10]1 [1] 15 [8]2 [1] 16 [10]1 [1]
Table 10 Holm post hoc test to show differences of Stacking-Cost-log (AUC) and other algorithms
Rejected (1198670) Accepted (1198670)Stacking-Cost-log (AUC) versus other algorithms 3 6Stacking-Cost-log (GeoMean) versus other algorithms 8 1Stacking-Cost-log (AGeoMean) versus other algorithms 9 0
to the evaluation metric of GeoMean the RECSG is signifi-cantly better than other 8 methods with the confidence levelof 120572 = 005
44 Discussion According to the experiments performedwith 17 different imbalance data sets and the comparisonwith other 9 classification methods the proposed RECSGmethod showed the higher performance than other methodsin terms of AUC GeoMean and AGeoMean as illustratedin Tables 7 8 and 9 The RECSG method showed the betterperformance for 10 of 17 cases in terms of AUC and for 12 of17 cases in terms of GeoMean and AGeoMean The methodoutperformed other 5 ensemble algorithms for 16 of 17 datasets in terms ofGeoMean andAGeoMean and for 13 of 17 datasets in terms of AUC GeoMean and AGeoMean are betterthan AUC in evaluation metrics in 17 data sets The means ofRECSGmethod in terms of GeoMean AGeoMean and AUCare all higher than other methods The Holm post hoc testshows that RECSGmethod significantly outperforms the restwith the confidence level 120572 = 005 in terms of AGeoMean for8 of 9methods in terms of AGeoMean and for 3 of 9methodsin terms of AUC
Experimental results and statistical tests show that theRECSG approach has improved the classification perfor-mance of imbalanced data sets The reasons can be explainedas follows
Firstly stacking algorithm uses to combine the resultof base classifier whereas bagging employs majority voteBagging is only a simple decision combinationmethodwhichrequires neither cross-validation nor Level 1 learning In thispaper stacked generalization adopts logistic regressionthus providing the simplest linear combination of pooling theLevel 0 modelsrsquo confidence
Secondly cost-sensitive algorithm can affect imbalancedata sets in two aspects Firstly cost-sensitive modificationscan be applied in the probabilistic estimationMoreover cost-sensitive factors change theweight of instance and theweight
of the minority class is higher than that of the majorityclass Therefore the method directly changes the number ofcommon instances without discarding or duplicating any ofthe rare instances
Thirdly RECSG method introduces cost-sensitive learn-ing into stacking ensemble The cost-sensitive function in replaces the error-minimizing function thus allowing Level 1learning to be prone to focus on the minority class
The results in Tables 7 and 8 demonstrate that the RECSGmethod has the higher performancewhen the evaluation per-formance metric of base classifier is weaker (such as Vehicle1Vehicle0 and car-vgood) The reason is that the alternativemethods of NB C45 and 119896-NN have shortcomings Theindependence assumption hampers the performance of NBin some data sets In C45 tree construction the selectionof attributes affects the model performances Error ratio of119896-NN is relatively high when data sets are in imbalanceTherefore logistic regression adopted in can improve theperformance when base classifier is weaker
The performance of the RECSG method is generallybetter when the IR is low (such as Glass2 car-good flare-Fcar-vgood and abalone-17 vs 7-8-9-10) The performance ofRECSG method is probably related to the setting of the costmatrix Different data sets should use different cost matricesFor the purpose of simplicity in the paper we have adoptedthe same cost matrices whichmay bemore suitable to low IRvalues
5 Conclusions
In this paper in order to solve the class imbalance problemwe proposed the RECSG method based on 2-layer learningmodels Experimental results and statistical tests showedthat the RECSG approach improved the classification per-formance The proposed RECSG approach may have therelatively high computational complexity in the trainingstage because the approach involves 2-layer classifier models
12 Mathematical Problems in Engineering
which consist of several base classifiers and metaclassifiersThe number and kinds of the base-level classifiers areclosely related to the performance of stacking algorithmIn the fusion stage of base classifiers the selection of themetaclassifier is also important In this paper in order tovalidate the performance improvement compared with othercurrent classification algorithms we only randomly selected3 classification algorithms (NB 119896-NN and C45) as baseclassifier and cost-sensitive algorithm as metaclassifier inthe fuse stage Selection of the number or kind of the baseclassifiers and metaclassifier was not discussed Thereforewe will explore the diversity and quality of base classifier inthe future The adoption of these strategies would improvethe prediction performance and reduce the training time ofstacking algorithm for imbalance problems
Conflicts of Interest
The author declares that there are no conflicts of interestregarding the publication of this paper
Acknowledgments
This research is supported by the Natural Science Foundationof Shanxi Province under Grant no 2015011039
References
[1] O Loyola-Gonzalez J F Martınez-Trinidad J A Carrasco-Ochoa and M Garcıa-Borroto ldquoEffect of class imbalance onquality measures for contrast patterns an experimental studyrdquoInformation Sciences vol 374 pp 179ndash192 2016
[2] C Beyan and R Fisher ldquoClassifying imbalanced data setsusing similarity based hierarchical decompositionrdquo PatternRecognition vol 48 no 5 pp 1653ndash1672 2015
[3] H He and E A Garcia ldquoLearning from imbalanced datardquo IEEETransactions on Knowledge and Data Engineering vol 21 no 9pp 1263ndash1284 2009
[4] V C Chawla K W Bowyer L O Hall and W P KegelmeyerldquoSMOTE synthetic minority over-sampling techniquerdquo Journalof Artificial Intelligence Research vol 16 pp 321ndash357 2002
[5] X-Y Liu J X Wu and Z-H Zhou ldquoExploratory under-sampling for class-imbalance learningrdquo in Proceedings of the 6thInternational Conference on Data Mining (ICDM rsquo06) pp 965ndash969 IEEE Hong Kong December 2006
[6] N Japkowicz ldquoThe class imbalance problem significance andstrategiesrdquo in Proceedings of the International Conference onArtificial Intelligence 2000
[7] B Zadrozny and C Elkan ldquoLearning and making decisionswhen costs and probabilities are both unknownrdquo in Proceedingsof the the seventh ACM SIGKDD international conference pp204ndash213 San Francisco Calif USA August 2001
[8] M Galar A Fernandez E Barrenechea H Bustince andF Herrera ldquoA review on ensembles for the class imbalanceproblem bagging boosting and hybrid-based approachesrdquoIEEE Transactions on Systems Man and Cybernetics Part CApplications and Reviews vol 42 no 4 pp 463ndash484 2012
[9] N V Chawla D A Cieslak L O Hall and A Joshi ldquoAuto-matically countering imbalance and its empirical relationship
to costrdquo Data Mining and Knowledge Discovery vol 17 no 2pp 225ndash252 2008
[10] A Freitas A Costapereira and P Brazdil ldquoCost-SensitiveDecision Trees Applied to Medical Datardquo in InternationalConference on Data Warehousing and Knowledge Discovery pp303ndash312 2007
[11] L Breiman ldquoBagging predictorsrdquoMachine Learning vol 24 no2 pp 123ndash140 1996
[12] R E Schapire ldquoThe strength of weak learnabilityrdquo MachineLearning vol 5 no 2 pp 197ndash227 1990
[13] Y Freund and R E Schapire ldquoA decision-theoretic generaliza-tion of on-line learning and an application to boostingrdquo Journalof Computer and System Sciences vol 55 no 1 part 2 pp 119ndash139 1997
[14] N V Chawla A Lazarevic L O Hall and K W BowyerldquoSMOTEBoost improving prediction of the minority class inboostingrdquo in Proceeding of Knowledge Discovery in Databasesvol 2838 pp 107ndash119
[15] C Seiffert T M Khoshgoftaar J VanHulse and A NapolitanoldquoRUSBoost A hybrid approach to alleviating class imbalancerdquoIEEE Transactions on Systems Man and Cybernetics Systemsvol 40 no 1 pp 185ndash197 2010
[16] S Hu Y Liang L Ma and Y He ldquoMSMOTE Improvingclassification performance when training data is imbalancedrdquoin Proceedings of the 2nd International Workshop on ComputerScience and Engineering WCSE 2009 pp 13ndash17 China October2009
[17] H Guo and H L Viktor ldquoLearning from imbalanced data setswith boosting and data generationrdquoACMSIGKDDExplorationsNewsletter vol 6 no 1 pp 30ndash39 2004
[18] SWang and X Yao ldquoDiversity analysis on imbalanced data setsby using ensemble modelsrdquo in Proceedings of the 2009 IEEESymposium on Computational Intelligence and Data MiningCIDM 2009 pp 324ndash331 USA April 2009
[19] R Barandela J S Srsquoanchez and R M Valdovinos ldquoNewapplications of ensembles of classifiersrdquo PAA Pattern Analysisand Applications vol 6 no 3 pp 245ndash256 2003
[20] J Błaszczynski M Deckert J Stefanowski and S WilkldquoIntegrating selective pre-processing of imbalanced data withIvotes ensemblerdquo Lecture Notes in Computer Science (includingsubseries Lecture Notes in Artificial Intelligence and Lecture Notesin Bioinformatics) Preface vol 6086 pp 148ndash157 2010
[21] W Fan S J Stolfo J Zhang and P K Chan ldquoAdacostMisclassication cost-sensitive boostingrdquo in the 6th Int ConfMach Learning pp 97ndash105 San Francisco CA USA 1999
[22] K M Ting ldquoA comparative study of cost-sensitive boostingalgorithmsrdquo in Proc 17th Int Conf Mach Learning pp 983ndash990 Stanford CA USA 2000
[23] M Joshi V Kumar and R Agarwal ldquoEvaluating boosting algo-rithms to classify rare classes comparison and improvementsrdquoinProceedings of the 2001 IEEE International Conference onDataMining pp 257ndash264 San Jose CA USA
[24] Y Sun M S Kamel A K C Wong and Y Wang ldquoCost-sensitive boosting for classification of imbalanced datardquo PatternRecognition vol 40 no 12 pp 3358ndash3378 2007
[25] D H Wolpert ldquoStacked generalizationrdquo Neural Networks vol5 no 2 pp 241ndash259 1992
[26] Y Chen M L Wong and H Li ldquoApplying Ant Colony Opti-mization to configuring stacking ensembles for data miningrdquoExpert Systems with Applications vol 41 no 6 pp 2688ndash27022014
Mathematical Problems in Engineering 13
[27] H Kadkhodaei and A M E Moghadam ldquoAn entropy basedapproach to find the best combination of the base classi-fiers in ensemble classifiers based on stack generalizationrdquoin Proceedings of the 4th International Conference on ControlInstrumentation and Automation ICCIA 2016 pp 425ndash429Iran January 2016
[28] I Czarnowski and P Jędrzejowicz ldquoAn approach to machineclassification based on stacked generalization and instanceselectionrdquo in Proceedings of the IEEE International Conferenceon Systems Man and Cybernetics SMC 2016 pp 4836ndash4841Hungary 2017
[29] S Kotsiantis ldquoStacking cost sensitive modelsrdquo in Proceedings ofthe 12th Pan-Hellenic Conference on Informatics PCI 2008 pp217ndash221 Greece August 2008
[30] H Y Lo J C Wang H MWang and S D Lin ldquoCost-sensitivestacking for audio tag annotation and retrievalrdquo in Proceedingsof the 36th IEEE International Conference on Acoustics Speechand Signal Processing ICASSP 2011 pp 2308ndash2311 CzechRepublic May 2011
[31] JHuang andCX Ling ldquoUsingAUCand accuracy in evaluatinglearning algorithmsrdquo IEEETransactions on Knowledge andDataEngineering vol 17 no 3 pp 299ndash310 2005
[32] M Kubat and S Matwin ldquoAddressing the curse of imbalancedtraining sets one sided selection inrdquo in Proceedings of the 14thInternational Conference on Machine Learning (ICMLrsquo97 pp179ndash186 1997
[33] R Batuwita and V Palade ldquoAdjusted geometric-mean A novelperformance measure for imbalanced bioinformatics datasetslearningrdquo Journal of Bioinformatics and Computational Biologyvol 10 no 4 Article ID 1250003 2012
[34] L Breiman ldquoStacked regressionsrdquoMachine Learning vol 24 no1 pp 49ndash64 1996
[35] M Leblanc and R Tibshirani ldquoCombining estimates in regres-sion and classificationrdquo Journal of the American StatisticalAssociation vol 91 no 436 pp 1641ndash1650 1996
[36] B Cestnik ldquoEstimating Probabilities A Crucial Task inMachine Learningrdquo in Proceedings of the European Conferenceon Artificial Intelligence pp 147ndash149 1990
[37] J R Quinlan C45 Program for machine learning MorganKaufmann 1993
[38] N S Altman ldquoAn introduction to kernel and nearest-neighbornonparametric regressionrdquo The American Statistician vol 46no 3 pp 175ndash185 1992
[39] J R Quinlan ldquoInduction of decision treesrdquo Machine Learningvol 1 no 1 pp 81ndash106 1986
[40] C Elkan ldquoThe foundations of cost-sensitive learningrdquo in Pro-ceedings of the 17th International Joint Conference on ArtificialIntelligence (IJCAI rsquo01) pp 973ndash978 New York NY USAAugust 2001
[41] D Michie D Spiegel halter J C Taylor et alMachine learningneural and statistical classification Ellis Horwood neural andstatistical classification 1995
[42] K M Ting and I F Witten ldquoIssues in stacked generalizationrdquoArtif Intell Res vol 10 no 1 pp 271ndash289 1999
[43] M Hall E Frank G Holmes B Pfahringer P Reutemann andI H Witten ldquoThe WEKA data mining software an updaterdquoACM SIGKDD Explorations Newsletter vol 11 no 1 pp 10ndash182009
[44] A Frank and A Asuncion UCI Machine Learning RepositoryUniversity of California School of Information and ComputerScience Irvine CA USA httparchiveicsucieduml
[45] J Alcala-Fdez A Fernandez J Luengo et al ldquoKEEL data-mining software tool data set repository integration ofalgorithms and experimental analysis frameworkrdquo Journal ofMultiple-Valued Logic and Soft Computing vol 17 no 2-3 pp255ndash287 2011
[46] S Garcıa A Fernandez J Luengo and F Herrera ldquoA study ofstatistical techniques and performance measures for genetics-based machine learning accuracy and interpretabilityrdquo SoftComputing vol 13 no 10 pp 959ndash977 2009
[47] S Holm ldquoA simple sequentially rejective multiple test proce-durerdquo Scandinavian Journal of Statistics vol 6 no 2 pp 65ndash701979
Hindawiwwwhindawicom Volume 2018
MathematicsJournal of
Hindawiwwwhindawicom Volume 2018
Mathematical Problems in Engineering
Applied MathematicsJournal of
Hindawiwwwhindawicom Volume 2018
Probability and StatisticsHindawiwwwhindawicom Volume 2018
Journal of
Hindawiwwwhindawicom Volume 2018
Mathematical PhysicsAdvances in
Complex AnalysisJournal of
Hindawiwwwhindawicom Volume 2018
OptimizationJournal of
Hindawiwwwhindawicom Volume 2018
Hindawiwwwhindawicom Volume 2018
Engineering Mathematics
International Journal of
Hindawiwwwhindawicom Volume 2018
Operations ResearchAdvances in
Journal of
Hindawiwwwhindawicom Volume 2018
Function SpacesAbstract and Applied AnalysisHindawiwwwhindawicom Volume 2018
International Journal of Mathematics and Mathematical Sciences
Hindawiwwwhindawicom Volume 2018
Hindawi Publishing Corporation httpwwwhindawicom Volume 2013Hindawiwwwhindawicom
The Scientific World Journal
Volume 2018
Hindawiwwwhindawicom Volume 2018Volume 2018
Numerical AnalysisNumerical AnalysisNumerical AnalysisNumerical AnalysisNumerical AnalysisNumerical AnalysisNumerical AnalysisNumerical AnalysisNumerical AnalysisNumerical AnalysisNumerical AnalysisNumerical AnalysisAdvances inAdvances in Discrete Dynamics in
Nature and SocietyHindawiwwwhindawicom Volume 2018
Hindawiwwwhindawicom
Dierential EquationsInternational Journal of
Volume 2018
Hindawiwwwhindawicom Volume 2018
Decision SciencesAdvances in
Hindawiwwwhindawicom Volume 2018
AnalysisInternational Journal of
Hindawiwwwhindawicom Volume 2018
Stochastic AnalysisInternational Journal of
Submit your manuscripts atwwwhindawicom
![Page 11: Classifying Imbalanced Data Sets by a Novel RE-Sample and ...downloads.hindawi.com/journals/mpe/2018/5036710.pdf · Cost-Sensitive Stacked Generalization Method ... In this paper,](https://reader035.vdocuments.net/reader035/viewer/2022070801/5f02a4577e708231d4054843/html5/thumbnails/11.jpg)
Mathematical Problems in Engineering 11
Table 9 Comparison between RECSG and other algorithms in terms of winslosses of the results
NB C45 KNN Cost-NB Ada-NB Ada-Cost-NB Bag-NB Bag-Cost-
NB Stacking-log
Stacking-Cost-log(AUC)
11 [5]6 [3] 15 [11]2 [2] 17 [15]0 [2] 11 [6]6 [5] 13 [7]4 [2] 14 [9]3 [1] 10 [6]7 [5] 10 [6]7 [5] 11 [3]6 [2]
Stacking-Cost-log(GeoMean)
15 [13]2 [2] 17 [13]0 [0] 15 [13]2 [1] 14 [8]3 [2] 16 [12]1 [1] 16 [10]1 [0] 17 [13]0 [0] 16 [10]3 [0] 16 [12]1 [0]
Stacking-Cost-log(AGeoMean)
14 [10]3 [1] 15 [9]2 [2] 15 [12]2 [1] 13 [7]4 [3] 15 [11]2 [2] 15 [10]2 [2] 16 [10]1 [1] 15 [8]2 [1] 16 [10]1 [1]
Table 10 Holm post hoc test to show differences of Stacking-Cost-log (AUC) and other algorithms
Rejected (1198670) Accepted (1198670)Stacking-Cost-log (AUC) versus other algorithms 3 6Stacking-Cost-log (GeoMean) versus other algorithms 8 1Stacking-Cost-log (AGeoMean) versus other algorithms 9 0
to the evaluation metric of GeoMean the RECSG is signifi-cantly better than other 8 methods with the confidence levelof 120572 = 005
44 Discussion According to the experiments performedwith 17 different imbalance data sets and the comparisonwith other 9 classification methods the proposed RECSGmethod showed the higher performance than other methodsin terms of AUC GeoMean and AGeoMean as illustratedin Tables 7 8 and 9 The RECSG method showed the betterperformance for 10 of 17 cases in terms of AUC and for 12 of17 cases in terms of GeoMean and AGeoMean The methodoutperformed other 5 ensemble algorithms for 16 of 17 datasets in terms ofGeoMean andAGeoMean and for 13 of 17 datasets in terms of AUC GeoMean and AGeoMean are betterthan AUC in evaluation metrics in 17 data sets The means ofRECSGmethod in terms of GeoMean AGeoMean and AUCare all higher than other methods The Holm post hoc testshows that RECSGmethod significantly outperforms the restwith the confidence level 120572 = 005 in terms of AGeoMean for8 of 9methods in terms of AGeoMean and for 3 of 9methodsin terms of AUC
Experimental results and statistical tests show that theRECSG approach has improved the classification perfor-mance of imbalanced data sets The reasons can be explainedas follows
Firstly stacking algorithm uses to combine the resultof base classifier whereas bagging employs majority voteBagging is only a simple decision combinationmethodwhichrequires neither cross-validation nor Level 1 learning In thispaper stacked generalization adopts logistic regressionthus providing the simplest linear combination of pooling theLevel 0 modelsrsquo confidence
Secondly cost-sensitive algorithm can affect imbalancedata sets in two aspects Firstly cost-sensitive modificationscan be applied in the probabilistic estimationMoreover cost-sensitive factors change theweight of instance and theweight
of the minority class is higher than that of the majorityclass Therefore the method directly changes the number ofcommon instances without discarding or duplicating any ofthe rare instances
Thirdly RECSG method introduces cost-sensitive learn-ing into stacking ensemble The cost-sensitive function in replaces the error-minimizing function thus allowing Level 1learning to be prone to focus on the minority class
The results in Tables 7 and 8 demonstrate that the RECSGmethod has the higher performancewhen the evaluation per-formance metric of base classifier is weaker (such as Vehicle1Vehicle0 and car-vgood) The reason is that the alternativemethods of NB C45 and 119896-NN have shortcomings Theindependence assumption hampers the performance of NBin some data sets In C45 tree construction the selectionof attributes affects the model performances Error ratio of119896-NN is relatively high when data sets are in imbalanceTherefore logistic regression adopted in can improve theperformance when base classifier is weaker
The performance of the RECSG method is generallybetter when the IR is low (such as Glass2 car-good flare-Fcar-vgood and abalone-17 vs 7-8-9-10) The performance ofRECSG method is probably related to the setting of the costmatrix Different data sets should use different cost matricesFor the purpose of simplicity in the paper we have adoptedthe same cost matrices whichmay bemore suitable to low IRvalues
5 Conclusions
In this paper in order to solve the class imbalance problemwe proposed the RECSG method based on 2-layer learningmodels Experimental results and statistical tests showedthat the RECSG approach improved the classification per-formance The proposed RECSG approach may have therelatively high computational complexity in the trainingstage because the approach involves 2-layer classifier models
12 Mathematical Problems in Engineering
which consist of several base classifiers and metaclassifiersThe number and kinds of the base-level classifiers areclosely related to the performance of stacking algorithmIn the fusion stage of base classifiers the selection of themetaclassifier is also important In this paper in order tovalidate the performance improvement compared with othercurrent classification algorithms we only randomly selected3 classification algorithms (NB 119896-NN and C45) as baseclassifier and cost-sensitive algorithm as metaclassifier inthe fuse stage Selection of the number or kind of the baseclassifiers and metaclassifier was not discussed Thereforewe will explore the diversity and quality of base classifier inthe future The adoption of these strategies would improvethe prediction performance and reduce the training time ofstacking algorithm for imbalance problems
Conflicts of Interest
The author declares that there are no conflicts of interestregarding the publication of this paper
Acknowledgments
This research is supported by the Natural Science Foundationof Shanxi Province under Grant no 2015011039
References
[1] O Loyola-Gonzalez J F Martınez-Trinidad J A Carrasco-Ochoa and M Garcıa-Borroto ldquoEffect of class imbalance onquality measures for contrast patterns an experimental studyrdquoInformation Sciences vol 374 pp 179ndash192 2016
[2] C Beyan and R Fisher ldquoClassifying imbalanced data setsusing similarity based hierarchical decompositionrdquo PatternRecognition vol 48 no 5 pp 1653ndash1672 2015
[3] H He and E A Garcia ldquoLearning from imbalanced datardquo IEEETransactions on Knowledge and Data Engineering vol 21 no 9pp 1263ndash1284 2009
[4] V C Chawla K W Bowyer L O Hall and W P KegelmeyerldquoSMOTE synthetic minority over-sampling techniquerdquo Journalof Artificial Intelligence Research vol 16 pp 321ndash357 2002
[5] X-Y Liu J X Wu and Z-H Zhou ldquoExploratory under-sampling for class-imbalance learningrdquo in Proceedings of the 6thInternational Conference on Data Mining (ICDM rsquo06) pp 965ndash969 IEEE Hong Kong December 2006
[6] N Japkowicz ldquoThe class imbalance problem significance andstrategiesrdquo in Proceedings of the International Conference onArtificial Intelligence 2000
[7] B Zadrozny and C Elkan ldquoLearning and making decisionswhen costs and probabilities are both unknownrdquo in Proceedingsof the the seventh ACM SIGKDD international conference pp204ndash213 San Francisco Calif USA August 2001
[8] M Galar A Fernandez E Barrenechea H Bustince andF Herrera ldquoA review on ensembles for the class imbalanceproblem bagging boosting and hybrid-based approachesrdquoIEEE Transactions on Systems Man and Cybernetics Part CApplications and Reviews vol 42 no 4 pp 463ndash484 2012
[9] N V Chawla D A Cieslak L O Hall and A Joshi ldquoAuto-matically countering imbalance and its empirical relationship
to costrdquo Data Mining and Knowledge Discovery vol 17 no 2pp 225ndash252 2008
[10] A Freitas A Costapereira and P Brazdil ldquoCost-SensitiveDecision Trees Applied to Medical Datardquo in InternationalConference on Data Warehousing and Knowledge Discovery pp303ndash312 2007
[11] L Breiman ldquoBagging predictorsrdquoMachine Learning vol 24 no2 pp 123ndash140 1996
[12] R E Schapire ldquoThe strength of weak learnabilityrdquo MachineLearning vol 5 no 2 pp 197ndash227 1990
[13] Y Freund and R E Schapire ldquoA decision-theoretic generaliza-tion of on-line learning and an application to boostingrdquo Journalof Computer and System Sciences vol 55 no 1 part 2 pp 119ndash139 1997
[14] N V Chawla A Lazarevic L O Hall and K W BowyerldquoSMOTEBoost improving prediction of the minority class inboostingrdquo in Proceeding of Knowledge Discovery in Databasesvol 2838 pp 107ndash119
[15] C Seiffert T M Khoshgoftaar J VanHulse and A NapolitanoldquoRUSBoost A hybrid approach to alleviating class imbalancerdquoIEEE Transactions on Systems Man and Cybernetics Systemsvol 40 no 1 pp 185ndash197 2010
[16] S Hu Y Liang L Ma and Y He ldquoMSMOTE Improvingclassification performance when training data is imbalancedrdquoin Proceedings of the 2nd International Workshop on ComputerScience and Engineering WCSE 2009 pp 13ndash17 China October2009
[17] H Guo and H L Viktor ldquoLearning from imbalanced data setswith boosting and data generationrdquoACMSIGKDDExplorationsNewsletter vol 6 no 1 pp 30ndash39 2004
[18] SWang and X Yao ldquoDiversity analysis on imbalanced data setsby using ensemble modelsrdquo in Proceedings of the 2009 IEEESymposium on Computational Intelligence and Data MiningCIDM 2009 pp 324ndash331 USA April 2009
[19] R Barandela J S Srsquoanchez and R M Valdovinos ldquoNewapplications of ensembles of classifiersrdquo PAA Pattern Analysisand Applications vol 6 no 3 pp 245ndash256 2003
[20] J Błaszczynski M Deckert J Stefanowski and S WilkldquoIntegrating selective pre-processing of imbalanced data withIvotes ensemblerdquo Lecture Notes in Computer Science (includingsubseries Lecture Notes in Artificial Intelligence and Lecture Notesin Bioinformatics) Preface vol 6086 pp 148ndash157 2010
[21] W Fan S J Stolfo J Zhang and P K Chan ldquoAdacostMisclassication cost-sensitive boostingrdquo in the 6th Int ConfMach Learning pp 97ndash105 San Francisco CA USA 1999
[22] K M Ting ldquoA comparative study of cost-sensitive boostingalgorithmsrdquo in Proc 17th Int Conf Mach Learning pp 983ndash990 Stanford CA USA 2000
[23] M Joshi V Kumar and R Agarwal ldquoEvaluating boosting algo-rithms to classify rare classes comparison and improvementsrdquoinProceedings of the 2001 IEEE International Conference onDataMining pp 257ndash264 San Jose CA USA
[24] Y Sun M S Kamel A K C Wong and Y Wang ldquoCost-sensitive boosting for classification of imbalanced datardquo PatternRecognition vol 40 no 12 pp 3358ndash3378 2007
[25] D H Wolpert ldquoStacked generalizationrdquo Neural Networks vol5 no 2 pp 241ndash259 1992
[26] Y Chen M L Wong and H Li ldquoApplying Ant Colony Opti-mization to configuring stacking ensembles for data miningrdquoExpert Systems with Applications vol 41 no 6 pp 2688ndash27022014
Mathematical Problems in Engineering 13
[27] H Kadkhodaei and A M E Moghadam ldquoAn entropy basedapproach to find the best combination of the base classi-fiers in ensemble classifiers based on stack generalizationrdquoin Proceedings of the 4th International Conference on ControlInstrumentation and Automation ICCIA 2016 pp 425ndash429Iran January 2016
[28] I Czarnowski and P Jędrzejowicz ldquoAn approach to machineclassification based on stacked generalization and instanceselectionrdquo in Proceedings of the IEEE International Conferenceon Systems Man and Cybernetics SMC 2016 pp 4836ndash4841Hungary 2017
[29] S Kotsiantis ldquoStacking cost sensitive modelsrdquo in Proceedings ofthe 12th Pan-Hellenic Conference on Informatics PCI 2008 pp217ndash221 Greece August 2008
[30] H Y Lo J C Wang H MWang and S D Lin ldquoCost-sensitivestacking for audio tag annotation and retrievalrdquo in Proceedingsof the 36th IEEE International Conference on Acoustics Speechand Signal Processing ICASSP 2011 pp 2308ndash2311 CzechRepublic May 2011
[31] JHuang andCX Ling ldquoUsingAUCand accuracy in evaluatinglearning algorithmsrdquo IEEETransactions on Knowledge andDataEngineering vol 17 no 3 pp 299ndash310 2005
[32] M Kubat and S Matwin ldquoAddressing the curse of imbalancedtraining sets one sided selection inrdquo in Proceedings of the 14thInternational Conference on Machine Learning (ICMLrsquo97 pp179ndash186 1997
[33] R Batuwita and V Palade ldquoAdjusted geometric-mean A novelperformance measure for imbalanced bioinformatics datasetslearningrdquo Journal of Bioinformatics and Computational Biologyvol 10 no 4 Article ID 1250003 2012
[34] L Breiman ldquoStacked regressionsrdquoMachine Learning vol 24 no1 pp 49ndash64 1996
[35] M Leblanc and R Tibshirani ldquoCombining estimates in regres-sion and classificationrdquo Journal of the American StatisticalAssociation vol 91 no 436 pp 1641ndash1650 1996
[36] B Cestnik ldquoEstimating Probabilities A Crucial Task inMachine Learningrdquo in Proceedings of the European Conferenceon Artificial Intelligence pp 147ndash149 1990
[37] J R Quinlan C45 Program for machine learning MorganKaufmann 1993
[38] N S Altman ldquoAn introduction to kernel and nearest-neighbornonparametric regressionrdquo The American Statistician vol 46no 3 pp 175ndash185 1992
[39] J R Quinlan ldquoInduction of decision treesrdquo Machine Learningvol 1 no 1 pp 81ndash106 1986
[40] C Elkan ldquoThe foundations of cost-sensitive learningrdquo in Pro-ceedings of the 17th International Joint Conference on ArtificialIntelligence (IJCAI rsquo01) pp 973ndash978 New York NY USAAugust 2001
[41] D Michie D Spiegel halter J C Taylor et alMachine learningneural and statistical classification Ellis Horwood neural andstatistical classification 1995
[42] K M Ting and I F Witten ldquoIssues in stacked generalizationrdquoArtif Intell Res vol 10 no 1 pp 271ndash289 1999
[43] M Hall E Frank G Holmes B Pfahringer P Reutemann andI H Witten ldquoThe WEKA data mining software an updaterdquoACM SIGKDD Explorations Newsletter vol 11 no 1 pp 10ndash182009
[44] A Frank and A Asuncion UCI Machine Learning RepositoryUniversity of California School of Information and ComputerScience Irvine CA USA httparchiveicsucieduml
[45] J Alcala-Fdez A Fernandez J Luengo et al ldquoKEEL data-mining software tool data set repository integration ofalgorithms and experimental analysis frameworkrdquo Journal ofMultiple-Valued Logic and Soft Computing vol 17 no 2-3 pp255ndash287 2011
[46] S Garcıa A Fernandez J Luengo and F Herrera ldquoA study ofstatistical techniques and performance measures for genetics-based machine learning accuracy and interpretabilityrdquo SoftComputing vol 13 no 10 pp 959ndash977 2009
[47] S Holm ldquoA simple sequentially rejective multiple test proce-durerdquo Scandinavian Journal of Statistics vol 6 no 2 pp 65ndash701979
Hindawiwwwhindawicom Volume 2018
MathematicsJournal of
Hindawiwwwhindawicom Volume 2018
Mathematical Problems in Engineering
Applied MathematicsJournal of
Hindawiwwwhindawicom Volume 2018
Probability and StatisticsHindawiwwwhindawicom Volume 2018
Journal of
Hindawiwwwhindawicom Volume 2018
Mathematical PhysicsAdvances in
Complex AnalysisJournal of
Hindawiwwwhindawicom Volume 2018
OptimizationJournal of
Hindawiwwwhindawicom Volume 2018
Hindawiwwwhindawicom Volume 2018
Engineering Mathematics
International Journal of
Hindawiwwwhindawicom Volume 2018
Operations ResearchAdvances in
Journal of
Hindawiwwwhindawicom Volume 2018
Function SpacesAbstract and Applied AnalysisHindawiwwwhindawicom Volume 2018
International Journal of Mathematics and Mathematical Sciences
Hindawiwwwhindawicom Volume 2018
Hindawi Publishing Corporation httpwwwhindawicom Volume 2013Hindawiwwwhindawicom
The Scientific World Journal
Volume 2018
Hindawiwwwhindawicom Volume 2018Volume 2018
Numerical AnalysisNumerical AnalysisNumerical AnalysisNumerical AnalysisNumerical AnalysisNumerical AnalysisNumerical AnalysisNumerical AnalysisNumerical AnalysisNumerical AnalysisNumerical AnalysisNumerical AnalysisAdvances inAdvances in Discrete Dynamics in
Nature and SocietyHindawiwwwhindawicom Volume 2018
Hindawiwwwhindawicom
Dierential EquationsInternational Journal of
Volume 2018
Hindawiwwwhindawicom Volume 2018
Decision SciencesAdvances in
Hindawiwwwhindawicom Volume 2018
AnalysisInternational Journal of
Hindawiwwwhindawicom Volume 2018
Stochastic AnalysisInternational Journal of
Submit your manuscripts atwwwhindawicom
![Page 12: Classifying Imbalanced Data Sets by a Novel RE-Sample and ...downloads.hindawi.com/journals/mpe/2018/5036710.pdf · Cost-Sensitive Stacked Generalization Method ... In this paper,](https://reader035.vdocuments.net/reader035/viewer/2022070801/5f02a4577e708231d4054843/html5/thumbnails/12.jpg)
12 Mathematical Problems in Engineering
which consist of several base classifiers and metaclassifiersThe number and kinds of the base-level classifiers areclosely related to the performance of stacking algorithmIn the fusion stage of base classifiers the selection of themetaclassifier is also important In this paper in order tovalidate the performance improvement compared with othercurrent classification algorithms we only randomly selected3 classification algorithms (NB 119896-NN and C45) as baseclassifier and cost-sensitive algorithm as metaclassifier inthe fuse stage Selection of the number or kind of the baseclassifiers and metaclassifier was not discussed Thereforewe will explore the diversity and quality of base classifier inthe future The adoption of these strategies would improvethe prediction performance and reduce the training time ofstacking algorithm for imbalance problems
Conflicts of Interest
The author declares that there are no conflicts of interestregarding the publication of this paper
Acknowledgments
This research is supported by the Natural Science Foundationof Shanxi Province under Grant no 2015011039
References
[1] O Loyola-Gonzalez J F Martınez-Trinidad J A Carrasco-Ochoa and M Garcıa-Borroto ldquoEffect of class imbalance onquality measures for contrast patterns an experimental studyrdquoInformation Sciences vol 374 pp 179ndash192 2016
[2] C Beyan and R Fisher ldquoClassifying imbalanced data setsusing similarity based hierarchical decompositionrdquo PatternRecognition vol 48 no 5 pp 1653ndash1672 2015
[3] H He and E A Garcia ldquoLearning from imbalanced datardquo IEEETransactions on Knowledge and Data Engineering vol 21 no 9pp 1263ndash1284 2009
[4] V C Chawla K W Bowyer L O Hall and W P KegelmeyerldquoSMOTE synthetic minority over-sampling techniquerdquo Journalof Artificial Intelligence Research vol 16 pp 321ndash357 2002
[5] X-Y Liu J X Wu and Z-H Zhou ldquoExploratory under-sampling for class-imbalance learningrdquo in Proceedings of the 6thInternational Conference on Data Mining (ICDM rsquo06) pp 965ndash969 IEEE Hong Kong December 2006
[6] N Japkowicz ldquoThe class imbalance problem significance andstrategiesrdquo in Proceedings of the International Conference onArtificial Intelligence 2000
[7] B Zadrozny and C Elkan ldquoLearning and making decisionswhen costs and probabilities are both unknownrdquo in Proceedingsof the the seventh ACM SIGKDD international conference pp204ndash213 San Francisco Calif USA August 2001
[8] M Galar A Fernandez E Barrenechea H Bustince andF Herrera ldquoA review on ensembles for the class imbalanceproblem bagging boosting and hybrid-based approachesrdquoIEEE Transactions on Systems Man and Cybernetics Part CApplications and Reviews vol 42 no 4 pp 463ndash484 2012
[9] N V Chawla D A Cieslak L O Hall and A Joshi ldquoAuto-matically countering imbalance and its empirical relationship
to costrdquo Data Mining and Knowledge Discovery vol 17 no 2pp 225ndash252 2008
[10] A Freitas A Costapereira and P Brazdil ldquoCost-SensitiveDecision Trees Applied to Medical Datardquo in InternationalConference on Data Warehousing and Knowledge Discovery pp303ndash312 2007
[11] L Breiman ldquoBagging predictorsrdquoMachine Learning vol 24 no2 pp 123ndash140 1996
[12] R E Schapire ldquoThe strength of weak learnabilityrdquo MachineLearning vol 5 no 2 pp 197ndash227 1990
[13] Y Freund and R E Schapire ldquoA decision-theoretic generaliza-tion of on-line learning and an application to boostingrdquo Journalof Computer and System Sciences vol 55 no 1 part 2 pp 119ndash139 1997
[14] N V Chawla A Lazarevic L O Hall and K W BowyerldquoSMOTEBoost improving prediction of the minority class inboostingrdquo in Proceeding of Knowledge Discovery in Databasesvol 2838 pp 107ndash119
[15] C Seiffert T M Khoshgoftaar J VanHulse and A NapolitanoldquoRUSBoost A hybrid approach to alleviating class imbalancerdquoIEEE Transactions on Systems Man and Cybernetics Systemsvol 40 no 1 pp 185ndash197 2010
[16] S Hu Y Liang L Ma and Y He ldquoMSMOTE Improvingclassification performance when training data is imbalancedrdquoin Proceedings of the 2nd International Workshop on ComputerScience and Engineering WCSE 2009 pp 13ndash17 China October2009
[17] H Guo and H L Viktor ldquoLearning from imbalanced data setswith boosting and data generationrdquoACMSIGKDDExplorationsNewsletter vol 6 no 1 pp 30ndash39 2004
[18] SWang and X Yao ldquoDiversity analysis on imbalanced data setsby using ensemble modelsrdquo in Proceedings of the 2009 IEEESymposium on Computational Intelligence and Data MiningCIDM 2009 pp 324ndash331 USA April 2009
[19] R Barandela J S Srsquoanchez and R M Valdovinos ldquoNewapplications of ensembles of classifiersrdquo PAA Pattern Analysisand Applications vol 6 no 3 pp 245ndash256 2003
[20] J Błaszczynski M Deckert J Stefanowski and S WilkldquoIntegrating selective pre-processing of imbalanced data withIvotes ensemblerdquo Lecture Notes in Computer Science (includingsubseries Lecture Notes in Artificial Intelligence and Lecture Notesin Bioinformatics) Preface vol 6086 pp 148ndash157 2010
[21] W Fan S J Stolfo J Zhang and P K Chan ldquoAdacostMisclassication cost-sensitive boostingrdquo in the 6th Int ConfMach Learning pp 97ndash105 San Francisco CA USA 1999
[22] K M Ting ldquoA comparative study of cost-sensitive boostingalgorithmsrdquo in Proc 17th Int Conf Mach Learning pp 983ndash990 Stanford CA USA 2000
[23] M Joshi V Kumar and R Agarwal ldquoEvaluating boosting algo-rithms to classify rare classes comparison and improvementsrdquoinProceedings of the 2001 IEEE International Conference onDataMining pp 257ndash264 San Jose CA USA
[24] Y Sun M S Kamel A K C Wong and Y Wang ldquoCost-sensitive boosting for classification of imbalanced datardquo PatternRecognition vol 40 no 12 pp 3358ndash3378 2007
[25] D H Wolpert ldquoStacked generalizationrdquo Neural Networks vol5 no 2 pp 241ndash259 1992
[26] Y Chen M L Wong and H Li ldquoApplying Ant Colony Opti-mization to configuring stacking ensembles for data miningrdquoExpert Systems with Applications vol 41 no 6 pp 2688ndash27022014
Mathematical Problems in Engineering 13
[27] H Kadkhodaei and A M E Moghadam ldquoAn entropy basedapproach to find the best combination of the base classi-fiers in ensemble classifiers based on stack generalizationrdquoin Proceedings of the 4th International Conference on ControlInstrumentation and Automation ICCIA 2016 pp 425ndash429Iran January 2016
[28] I Czarnowski and P Jędrzejowicz ldquoAn approach to machineclassification based on stacked generalization and instanceselectionrdquo in Proceedings of the IEEE International Conferenceon Systems Man and Cybernetics SMC 2016 pp 4836ndash4841Hungary 2017
[29] S Kotsiantis ldquoStacking cost sensitive modelsrdquo in Proceedings ofthe 12th Pan-Hellenic Conference on Informatics PCI 2008 pp217ndash221 Greece August 2008
[30] H Y Lo J C Wang H MWang and S D Lin ldquoCost-sensitivestacking for audio tag annotation and retrievalrdquo in Proceedingsof the 36th IEEE International Conference on Acoustics Speechand Signal Processing ICASSP 2011 pp 2308ndash2311 CzechRepublic May 2011
[31] JHuang andCX Ling ldquoUsingAUCand accuracy in evaluatinglearning algorithmsrdquo IEEETransactions on Knowledge andDataEngineering vol 17 no 3 pp 299ndash310 2005
[32] M Kubat and S Matwin ldquoAddressing the curse of imbalancedtraining sets one sided selection inrdquo in Proceedings of the 14thInternational Conference on Machine Learning (ICMLrsquo97 pp179ndash186 1997
[33] R Batuwita and V Palade ldquoAdjusted geometric-mean A novelperformance measure for imbalanced bioinformatics datasetslearningrdquo Journal of Bioinformatics and Computational Biologyvol 10 no 4 Article ID 1250003 2012
[34] L Breiman ldquoStacked regressionsrdquoMachine Learning vol 24 no1 pp 49ndash64 1996
[35] M Leblanc and R Tibshirani ldquoCombining estimates in regres-sion and classificationrdquo Journal of the American StatisticalAssociation vol 91 no 436 pp 1641ndash1650 1996
[36] B Cestnik ldquoEstimating Probabilities A Crucial Task inMachine Learningrdquo in Proceedings of the European Conferenceon Artificial Intelligence pp 147ndash149 1990
[37] J R Quinlan C45 Program for machine learning MorganKaufmann 1993
[38] N S Altman ldquoAn introduction to kernel and nearest-neighbornonparametric regressionrdquo The American Statistician vol 46no 3 pp 175ndash185 1992
[39] J R Quinlan ldquoInduction of decision treesrdquo Machine Learningvol 1 no 1 pp 81ndash106 1986
[40] C Elkan ldquoThe foundations of cost-sensitive learningrdquo in Pro-ceedings of the 17th International Joint Conference on ArtificialIntelligence (IJCAI rsquo01) pp 973ndash978 New York NY USAAugust 2001
[41] D Michie D Spiegel halter J C Taylor et alMachine learningneural and statistical classification Ellis Horwood neural andstatistical classification 1995
[42] K M Ting and I F Witten ldquoIssues in stacked generalizationrdquoArtif Intell Res vol 10 no 1 pp 271ndash289 1999
[43] M Hall E Frank G Holmes B Pfahringer P Reutemann andI H Witten ldquoThe WEKA data mining software an updaterdquoACM SIGKDD Explorations Newsletter vol 11 no 1 pp 10ndash182009
[44] A Frank and A Asuncion UCI Machine Learning RepositoryUniversity of California School of Information and ComputerScience Irvine CA USA httparchiveicsucieduml
[45] J Alcala-Fdez A Fernandez J Luengo et al ldquoKEEL data-mining software tool data set repository integration ofalgorithms and experimental analysis frameworkrdquo Journal ofMultiple-Valued Logic and Soft Computing vol 17 no 2-3 pp255ndash287 2011
[46] S Garcıa A Fernandez J Luengo and F Herrera ldquoA study ofstatistical techniques and performance measures for genetics-based machine learning accuracy and interpretabilityrdquo SoftComputing vol 13 no 10 pp 959ndash977 2009
[47] S Holm ldquoA simple sequentially rejective multiple test proce-durerdquo Scandinavian Journal of Statistics vol 6 no 2 pp 65ndash701979
Hindawiwwwhindawicom Volume 2018
MathematicsJournal of
Hindawiwwwhindawicom Volume 2018
Mathematical Problems in Engineering
Applied MathematicsJournal of
Hindawiwwwhindawicom Volume 2018
Probability and StatisticsHindawiwwwhindawicom Volume 2018
Journal of
Hindawiwwwhindawicom Volume 2018
Mathematical PhysicsAdvances in
Complex AnalysisJournal of
Hindawiwwwhindawicom Volume 2018
OptimizationJournal of
Hindawiwwwhindawicom Volume 2018
Hindawiwwwhindawicom Volume 2018
Engineering Mathematics
International Journal of
Hindawiwwwhindawicom Volume 2018
Operations ResearchAdvances in
Journal of
Hindawiwwwhindawicom Volume 2018
Function SpacesAbstract and Applied AnalysisHindawiwwwhindawicom Volume 2018
International Journal of Mathematics and Mathematical Sciences
Hindawiwwwhindawicom Volume 2018
Hindawi Publishing Corporation httpwwwhindawicom Volume 2013Hindawiwwwhindawicom
The Scientific World Journal
Volume 2018
Hindawiwwwhindawicom Volume 2018Volume 2018
Numerical AnalysisNumerical AnalysisNumerical AnalysisNumerical AnalysisNumerical AnalysisNumerical AnalysisNumerical AnalysisNumerical AnalysisNumerical AnalysisNumerical AnalysisNumerical AnalysisNumerical AnalysisAdvances inAdvances in Discrete Dynamics in
Nature and SocietyHindawiwwwhindawicom Volume 2018
Hindawiwwwhindawicom
Dierential EquationsInternational Journal of
Volume 2018
Hindawiwwwhindawicom Volume 2018
Decision SciencesAdvances in
Hindawiwwwhindawicom Volume 2018
AnalysisInternational Journal of
Hindawiwwwhindawicom Volume 2018
Stochastic AnalysisInternational Journal of
Submit your manuscripts atwwwhindawicom
![Page 13: Classifying Imbalanced Data Sets by a Novel RE-Sample and ...downloads.hindawi.com/journals/mpe/2018/5036710.pdf · Cost-Sensitive Stacked Generalization Method ... In this paper,](https://reader035.vdocuments.net/reader035/viewer/2022070801/5f02a4577e708231d4054843/html5/thumbnails/13.jpg)
Mathematical Problems in Engineering 13
[27] H Kadkhodaei and A M E Moghadam ldquoAn entropy basedapproach to find the best combination of the base classi-fiers in ensemble classifiers based on stack generalizationrdquoin Proceedings of the 4th International Conference on ControlInstrumentation and Automation ICCIA 2016 pp 425ndash429Iran January 2016
[28] I Czarnowski and P Jędrzejowicz ldquoAn approach to machineclassification based on stacked generalization and instanceselectionrdquo in Proceedings of the IEEE International Conferenceon Systems Man and Cybernetics SMC 2016 pp 4836ndash4841Hungary 2017
[29] S Kotsiantis ldquoStacking cost sensitive modelsrdquo in Proceedings ofthe 12th Pan-Hellenic Conference on Informatics PCI 2008 pp217ndash221 Greece August 2008
[30] H Y Lo J C Wang H MWang and S D Lin ldquoCost-sensitivestacking for audio tag annotation and retrievalrdquo in Proceedingsof the 36th IEEE International Conference on Acoustics Speechand Signal Processing ICASSP 2011 pp 2308ndash2311 CzechRepublic May 2011
[31] JHuang andCX Ling ldquoUsingAUCand accuracy in evaluatinglearning algorithmsrdquo IEEETransactions on Knowledge andDataEngineering vol 17 no 3 pp 299ndash310 2005
[32] M Kubat and S Matwin ldquoAddressing the curse of imbalancedtraining sets one sided selection inrdquo in Proceedings of the 14thInternational Conference on Machine Learning (ICMLrsquo97 pp179ndash186 1997
[33] R Batuwita and V Palade ldquoAdjusted geometric-mean A novelperformance measure for imbalanced bioinformatics datasetslearningrdquo Journal of Bioinformatics and Computational Biologyvol 10 no 4 Article ID 1250003 2012
[34] L Breiman ldquoStacked regressionsrdquoMachine Learning vol 24 no1 pp 49ndash64 1996
[35] M Leblanc and R Tibshirani ldquoCombining estimates in regres-sion and classificationrdquo Journal of the American StatisticalAssociation vol 91 no 436 pp 1641ndash1650 1996
[36] B Cestnik ldquoEstimating Probabilities A Crucial Task inMachine Learningrdquo in Proceedings of the European Conferenceon Artificial Intelligence pp 147ndash149 1990
[37] J R Quinlan C45 Program for machine learning MorganKaufmann 1993
[38] N S Altman ldquoAn introduction to kernel and nearest-neighbornonparametric regressionrdquo The American Statistician vol 46no 3 pp 175ndash185 1992
[39] J R Quinlan ldquoInduction of decision treesrdquo Machine Learningvol 1 no 1 pp 81ndash106 1986
[40] C Elkan ldquoThe foundations of cost-sensitive learningrdquo in Pro-ceedings of the 17th International Joint Conference on ArtificialIntelligence (IJCAI rsquo01) pp 973ndash978 New York NY USAAugust 2001
[41] D Michie D Spiegel halter J C Taylor et alMachine learningneural and statistical classification Ellis Horwood neural andstatistical classification 1995
[42] K M Ting and I F Witten ldquoIssues in stacked generalizationrdquoArtif Intell Res vol 10 no 1 pp 271ndash289 1999
[43] M Hall E Frank G Holmes B Pfahringer P Reutemann andI H Witten ldquoThe WEKA data mining software an updaterdquoACM SIGKDD Explorations Newsletter vol 11 no 1 pp 10ndash182009
[44] A Frank and A Asuncion UCI Machine Learning RepositoryUniversity of California School of Information and ComputerScience Irvine CA USA httparchiveicsucieduml
[45] J Alcala-Fdez A Fernandez J Luengo et al ldquoKEEL data-mining software tool data set repository integration ofalgorithms and experimental analysis frameworkrdquo Journal ofMultiple-Valued Logic and Soft Computing vol 17 no 2-3 pp255ndash287 2011
[46] S Garcıa A Fernandez J Luengo and F Herrera ldquoA study ofstatistical techniques and performance measures for genetics-based machine learning accuracy and interpretabilityrdquo SoftComputing vol 13 no 10 pp 959ndash977 2009
[47] S Holm ldquoA simple sequentially rejective multiple test proce-durerdquo Scandinavian Journal of Statistics vol 6 no 2 pp 65ndash701979
Hindawiwwwhindawicom Volume 2018
MathematicsJournal of
Hindawiwwwhindawicom Volume 2018
Mathematical Problems in Engineering
Applied MathematicsJournal of
Hindawiwwwhindawicom Volume 2018
Probability and StatisticsHindawiwwwhindawicom Volume 2018
Journal of
Hindawiwwwhindawicom Volume 2018
Mathematical PhysicsAdvances in
Complex AnalysisJournal of
Hindawiwwwhindawicom Volume 2018
OptimizationJournal of
Hindawiwwwhindawicom Volume 2018
Hindawiwwwhindawicom Volume 2018
Engineering Mathematics
International Journal of
Hindawiwwwhindawicom Volume 2018
Operations ResearchAdvances in
Journal of
Hindawiwwwhindawicom Volume 2018
Function SpacesAbstract and Applied AnalysisHindawiwwwhindawicom Volume 2018
International Journal of Mathematics and Mathematical Sciences
Hindawiwwwhindawicom Volume 2018
Hindawi Publishing Corporation httpwwwhindawicom Volume 2013Hindawiwwwhindawicom
The Scientific World Journal
Volume 2018
Hindawiwwwhindawicom Volume 2018Volume 2018
Numerical AnalysisNumerical AnalysisNumerical AnalysisNumerical AnalysisNumerical AnalysisNumerical AnalysisNumerical AnalysisNumerical AnalysisNumerical AnalysisNumerical AnalysisNumerical AnalysisNumerical AnalysisAdvances inAdvances in Discrete Dynamics in
Nature and SocietyHindawiwwwhindawicom Volume 2018
Hindawiwwwhindawicom
Dierential EquationsInternational Journal of
Volume 2018
Hindawiwwwhindawicom Volume 2018
Decision SciencesAdvances in
Hindawiwwwhindawicom Volume 2018
AnalysisInternational Journal of
Hindawiwwwhindawicom Volume 2018
Stochastic AnalysisInternational Journal of
Submit your manuscripts atwwwhindawicom
![Page 14: Classifying Imbalanced Data Sets by a Novel RE-Sample and ...downloads.hindawi.com/journals/mpe/2018/5036710.pdf · Cost-Sensitive Stacked Generalization Method ... In this paper,](https://reader035.vdocuments.net/reader035/viewer/2022070801/5f02a4577e708231d4054843/html5/thumbnails/14.jpg)
Hindawiwwwhindawicom Volume 2018
MathematicsJournal of
Hindawiwwwhindawicom Volume 2018
Mathematical Problems in Engineering
Applied MathematicsJournal of
Hindawiwwwhindawicom Volume 2018
Probability and StatisticsHindawiwwwhindawicom Volume 2018
Journal of
Hindawiwwwhindawicom Volume 2018
Mathematical PhysicsAdvances in
Complex AnalysisJournal of
Hindawiwwwhindawicom Volume 2018
OptimizationJournal of
Hindawiwwwhindawicom Volume 2018
Hindawiwwwhindawicom Volume 2018
Engineering Mathematics
International Journal of
Hindawiwwwhindawicom Volume 2018
Operations ResearchAdvances in
Journal of
Hindawiwwwhindawicom Volume 2018
Function SpacesAbstract and Applied AnalysisHindawiwwwhindawicom Volume 2018
International Journal of Mathematics and Mathematical Sciences
Hindawiwwwhindawicom Volume 2018
Hindawi Publishing Corporation httpwwwhindawicom Volume 2013Hindawiwwwhindawicom
The Scientific World Journal
Volume 2018
Hindawiwwwhindawicom Volume 2018Volume 2018
Numerical AnalysisNumerical AnalysisNumerical AnalysisNumerical AnalysisNumerical AnalysisNumerical AnalysisNumerical AnalysisNumerical AnalysisNumerical AnalysisNumerical AnalysisNumerical AnalysisNumerical AnalysisAdvances inAdvances in Discrete Dynamics in
Nature and SocietyHindawiwwwhindawicom Volume 2018
Hindawiwwwhindawicom
Dierential EquationsInternational Journal of
Volume 2018
Hindawiwwwhindawicom Volume 2018
Decision SciencesAdvances in
Hindawiwwwhindawicom Volume 2018
AnalysisInternational Journal of
Hindawiwwwhindawicom Volume 2018
Stochastic AnalysisInternational Journal of
Submit your manuscripts atwwwhindawicom