Research Article
Microcluster-Based Incremental Ensemble Learning for Noisy, Nonstationary Data Streams

Sanmin Liu,1,2 Shan Xue,2 Fanzhen Liu,2 Jieren Cheng,3 Xiulai Li,3,4 Chao Kong,1 and Jia Wu2

1School of Computer and Information, Anhui Polytechnic University, Wuhu 241000, China
2Department of Computing, Macquarie University, Sydney 2109, Australia
3School of Computer Science & Cyberspace Security, Hainan University, Haikou 570228, China
4Hainan Hairui Zhong Chuang Technology Co. Ltd., Haikou 570228, China

Correspondence should be addressed to Shan Xue: [email protected]

Received 23 October 2019; Revised 26 December 2019; Accepted 1 February 2020; Published 5 May 2020

Guest Editor: Xuyun Zhang

Copyright © 2020 Sanmin Liu et al. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

Data stream classification is a promising prediction task with relevance to many practical environments. However, in environments with concept drift and noise, data stream classification faces many challenges. Hence, a new incremental ensemble model is presented for classifying nonstationary data streams with noise. Our approach integrates three strategies: incremental learning to monitor and adapt to concept drift; ensemble learning to improve model stability; and a microclustering procedure that distinguishes drift from noise and predicts the labels of incoming instances via majority vote. Experiments with two synthetic datasets designed to test for both gradual and abrupt drift show that our method provides more accurate classification in nonstationary data streams with noise than two popular baselines.

1. Introduction

The velocity and voracity with which we are now producing data is making streaming data ubiquitous in real-world applications [1]. For example, intrusion detection [2], credit fraud detection [3], network traffic management [4], and recommendation systems [5] all rely on data streams. However, data streams have some unique characteristics that make them more difficult to manipulate. First, the data can be generated at very fast speeds and in huge volumes. Second, concept drift occurs in data streams, so existing models no longer work as effectively as they once did. Last, physical constraints mean that only a certain amount of knowledge can be used or extracted from a data stream at any point in time and, once that time has elapsed, it can be very difficult to go back and retrieve more knowledge. Thus, data stream mining confronts many challenges.

Revealing the knowledge hidden in data streams is broadly known as data stream mining, which spans data stream classification, clustering, and other data analytics tasks [6]. Data stream classification is, arguably, the most common analytics task in many practical applications. Due to its time-sequence characteristics, research on data stream classification confronts many difficulties. For example, to keep track of concept drift, the model not only needs to be retrained frequently, but its processing and memory overheads must also stay low to cope with the velocity and volume of the data. In a traditional data mining scenario, the model merely needs to extract knowledge from a static dataset with a joint distribution function that does not change. However, a model for data stream classification needs to extract knowledge from instances that are generated over time and where the joint distribution function is variable, i.e., in the presence of concept drift [7, 8]. According to many studies, concept drift is the main barrier to data stream classification.

To date, solutions to classification in nonstationary data stream environments have been based on either online or ensemble learning, and those methods improve classification performance. Concerning concept drift in an imbalanced data stream setting, an ensemble learning model with resampling technology was presented [7]. A combined online ensemble method was used to simultaneously consider concept drift and the high-dimensionality problem [9]. Additionally, in light of various classification scenarios, many supervised learning approaches have recently been widely explored [10-17], and some have been applied to data stream classification, such as the support vector machine (SVM) and Bayesian techniques.

In nonstationary streaming data environments, these investigations solved some of the problems, including concept drift, the curse of dimensionality, and imbalanced learning. However, some open problems remain to be addressed. For example, few studies have considered how to effectively and simultaneously cope with both concept drift and noise in nonstationary data streams. To deal with these problems, we design a new classification approach that constructs microclusters to serve as a pool of base classifiers. The final prediction of an incoming instance's class label is made by a majority vote of the microclusters. At the same time, an incremental learning strategy, combined with ensemble learning and a smoothing operator, does the work of adapting the model to concept drift, distinguishing noise, and maintaining stability.

In summary, our paper makes three main contributions:

(1) A technique for constructing a set of microclusters as base classifiers by redefining the concept of the cluster feature previously used in hierarchical cluster analysis. Good classification results can be achieved with nonstationary data streams by combining numerous microclusters. Additionally, a microcluster combined with incremental learning is a very convenient way to absorb new knowledge and keep track of concept drift.

(2) A smoothing strategy designed to shift the centroids of microclusters and control the balance between historical and new instances. This approach makes the best use of historical knowledge and can also overcome problems with a shortage of drifted data.

(3) A majority vote strategy and incremental learning that enhance the stability and adaptability of the model in nonstationary data streams with noise. Thus, the proposed model leverages the advantages of both ensemble and incremental learning to maintain high accuracy in class label prediction.

This paper is organized as follows. The background work is discussed in Section 2, and then Section 3 outlines the basic concepts. Section 4 describes the proposed model and provides a complexity analysis of the algorithm. In Section 5, the experimental schema and results are illustrated. Section 6 presents conclusions and future plans.

2. Related Work

An excellent data stream classification approach has the ability to learn incrementally and adapt to concept drift as well [18]. In general, two important kinds of incremental learning methods are of concern: instance-incremental learning [19, 20], which learns from one instance at a time, and batch-incremental learning [21], which learns from a set of instances at once. In the instance-incremental learning group, Crammer et al. [19] developed an online passive-aggressive (PA) algorithmic framework based on SVM that forces the classification hyperplane to move to satisfy the minimum loss constraint when the classifier misclassifies an instance. This framework has been widely explored in many practical settings [22, 23]. The work in [24] presented an instance-incremental method with a weighted one-class SVM that could handle gradual drift in nonstationary data streams. Instance-incremental learning has also been based on the extreme learning machine as a way to boost classification speeds [25]. When the data stream is stable, each incoming instance is used to update the classifier; however, when concept drift happens, a weakly performing classifier is deleted. This is a very flexible approach to classifying real-time data streams. In the batch-incremental learning domain, Lu et al. [26] provided a novel dynamic weighted majority approach to deal with imbalance problems and concept drift. This method uses dynamic weighted ensemble learning to keep the classification model stable and batch-incremental learning to track concept drift.

Between the two modes, instance-based incremental learning is more flexible and scalable for real-time data stream classification. It is also a more suitable approach for environments where it is difficult to label instances and understand concepts in advance [20]. Hence, we turn our attention to instance-incremental learning for the remainder of this paper, using the simple term incremental learning hereafter.

The impetus for studying ensemble learning in conjunction with data stream classification came from a desire to improve classification models' stability [27-31]. These models consist of a set of base classifiers and a merging method, which combines the base classifiers' outputs into a final output. The SEA algorithm [29] is one of the early ensemble methods. When the SEA ensemble model is not full, each newly arriving data chunk is used to build a new base classifier. If the limit has been reached, a new classifier is still constructed for every newly arriving data chunk, but it replaces the classifier with the worst performance. According to a majority vote policy, the ensemble method SEA outputs the final predictions. Another similar work is the accuracy-weighted ensemble method [30], in which the important point is to allocate to each base classifier a weight that is an estimate of its accuracy on the newest data chunk. This idea suggests that the newest data chunk represents the target concept with high probability, so classifiers with higher accuracy should be given more importance. Also, when the maximum ensemble scale has been reached, the base classifier with the worst performance is deleted and a new base classifier joins the ensemble model. Another iterative ensemble method was developed based on boosting and batch-incremental learning [31]. This method adds a suitable number of base classifiers to the classification model with each newly arriving data chunk instead of adding just one. The experimental results suggest that the iterative boosting ensemble classification method is a promising way to perform classification tasks in nonstationary data stream environments. Beyond concept drift, imbalanced class distributions are another challenge in data stream classification that can be tackled with ensemble learning. Zhang et al.'s [27] method of dealing with this problem is a two-pronged approach. The first tack is to divide the majority class into N subsets of roughly the same size as the minority class and then construct N new balanced training subsets from the minority class and the divided subsets. Next, the ensemble model is created using a neural network with backpropagation as the base learning algorithm. The base classifiers' diversity is one of the important factors of a learning system. Hence, Jackowski [32] introduced two error-trend diversity measurements, pair errors and pool errors, to find and keep track of concept drift in streaming data settings. Experiments with this model show that the diversity measurements can not only be used to enhance the ensemble model's performance but also to effectively control the scale of the ensemble model. Based on the above analysis, we think that ensemble learning is currently the most promising research direction for data stream classification.

From this review, we distill several observations: incremental learning can dynamically reveal new knowledge in data streams; ensemble learning can improve the stability of classification models for nonstationary data streams; and a suitable algorithm can enhance a classification model's flexibility. These three observations form the basis of the three integrated strategies in our method for simultaneously tackling concept drift and noise.

3. Basic Concept and Problem Definition

This section begins with a description of the basic concepts used in this paper, followed by a detailed analysis of the research problem.

3.1. Data Stream. According to related studies, in this paper we consider a data stream to consist of a series of labeled instances, namely $S = \{X_1, X_2, \ldots, X_t, \ldots\}$, where $X_t = (X, y)$, in which $X$ stands for a feature vector representing an instance that characterizes the features of an object, and $y$ is $X_t$'s class label. When $y$ is $+1$, $X_t$ represents a positive instance; otherwise, $X_t$ is a negative instance.

According to the above definition, we explore a mapping function $f: X \rightarrow y$ with high accuracy, which stands for a classification model that can output the incoming instance $X$'s class label. Only supervised learning is considered in this paper. Therefore, the classification model $f$ is constructed from a labeled dataset and, once built, it can output the class label $+1$ or $-1$ for an incoming instance. In addition, for the purposes of this paper, the real label is acquired after the mapping function $f$ outputs its prediction for the incoming instance.

3.2. Concept Drift. According to the work in [33], when the joint probability distribution $P$ of the data changes over time, there exists concept drift. In other words, $P_t(X, y) \neq P_{t+1}(X, y)$, where the subscript $t$ stands for the time stamp, $X$ is the vector of feature attribute values, and $y$ is a class label. According to the changing rate of the concept, gradual drift and abrupt drift [34] are discussed. Generally speaking, gradual drift is a slower rate of change from one concept to another, as illustrated in Figure 1(a). When the distribution $P_t$ is abruptly different from the distribution $P_{t+1}$ at time $t+1$, we say that abrupt drift occurs, as seen in Figure 1(b). In Figure 1, the difference between gradual drift and abrupt drift is clearly visible, and both kinds of drift are considered in this paper.

3.3. Problem Definition. Noisy instances and concept drift appear to have similar distributions in nonstationary data streams. It is therefore critical to differentiate noise from concept drift, and that is the motivation of this paper: to build a classification model that can find and keep track of concept drift in nonstationary streaming data with noise. Meanwhile, in order to catch concept drift, the classification model should be updated by incremental learning. The research problem is demonstrated in Figure 2.

From Figure 2, we can clearly understand the problem definition of this paper and identify the noisy instance in nonstationary streaming data. In Figure 2, the dotted circle represents a microcluster and the dotted straight line suggests the distribution of instances. The current case is shown in Figure 2(a). As time goes on, an instance arrives and the microcluster is updated at time stamp t + 1, as seen in Figure 2(b), which represents the case of incremental learning. At time stamp t + 2, in Figure 2(c), the incoming instance with a positive class label lies in an old microcluster with a different class label. In this case, the new instance is regarded as a noisy instance and will be discarded; this is why this instance no longer exists at time stamp t + 3. In Figure 2(d), the incoming instance forms a concept drift and leads to the construction of a new microcluster.

Based on the above analysis, our solution involves three strategies to deal with the research problem illustrated in Figure 2: incremental learning to track concept drift, ensemble learning to enhance the model's stability, and a microclustering method to distinguish drift from noise and make the final label predictions. In the next section, we outline these strategies in detail and discuss the three scenarios illustrated in Figure 2.

4. Adaptive Incremental Ensemble Data Stream Classification Method

This section describes the microcluster and the data stream classification model, followed by the corresponding algorithm.

4.1. Definition of Microcluster. Microclusters, which act as the classifiers in our model, are constructed from cluster features, a technique originally developed as part of hierarchical cluster analysis [35]. The structure of a cluster feature is defined as $CF = \langle SS, LS, n \rangle$. Based on the cluster feature, we give the definition of the microcluster used in this paper.

Definition 1. A microcluster (MC) is represented as $\langle SS, LS, n, C_{id}, CL, \alpha \rangle$, where $SS$ and $LS$ are used to compute the boundary of the MC: $SS$ denotes the square sum of the attributes of the instances in the MC, as calculated in equation (1), and $LS$ is a vector that saves the sum of each attribute, as in equation (2). $n$ is the number of instances. $C_{id}$ represents the MC's centroid, which changes over time as shown in equation (3). $CL$ is the MC's class label, and $\alpha$ counts the number of times the MC correctly classifies an incoming instance; $\alpha$ is initialized to 0.

$$SS = \sum_{i}^{n} \sum_{j}^{l} x_{ij}^{2}, \qquad (1)$$

$$LS = \left( \sum_{i}^{n} x_{i1}, \ldots, \sum_{i}^{n} x_{ij}, \ldots, \sum_{i}^{n} x_{il} \right), \qquad (2)$$

where $l$ is the dimension of the instance.

$$C_{id} = (1 - \sigma) \times C_{id}^{t-1} + \sigma \times (LS/n), \qquad (3)$$

where $C_{id}^{t-1}$ is the MC's centroid at the previous time stamp $t - 1$ and $\sigma \in [0, 1]$ is the smoothing parameter.

The size of the MC is represented by the cluster's radius $r$, which is calculated as follows:

$$r = \sqrt{\frac{SS}{n} - \left\| \frac{LS}{n} \right\|^{2}}, \qquad (4)$$

where $\|LS/n\|$ represents the length of the vector $LS/n$.
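To make this bookkeeping concrete, the following Python sketch implements the microcluster structure of Definition 1 together with equations (1)-(4). It is a minimal illustration under our own naming conventions (the paper's implementation uses the Weka package, per Section 5.3); the class and method names are assumptions, not the authors' code.

import numpy as np

class MicroCluster:
    """Cluster-feature summary <SS, LS, n, C_id, CL, alpha> of Definition 1."""

    def __init__(self, x, label, sigma=0.65):
        x = np.asarray(x, dtype=float)
        self.SS = float(np.sum(x ** 2))  # equation (1): square sum of attributes
        self.LS = x.copy()               # equation (2): per-attribute linear sum
        self.n = 1                       # number of absorbed instances
        self.centroid = x.copy()         # C_id starts at the founding instance
        self.CL = label                  # class label of the microcluster
        self.alpha = 0                   # correct-classification counter
        self.sigma = sigma               # smoothing parameter of equation (3)

    def absorb(self, x):
        """Merge an incoming instance with a matching label (Scenario 1)."""
        x = np.asarray(x, dtype=float)
        self.SS += float(np.sum(x ** 2))
        self.LS += x
        self.n += 1
        # Equation (3): centroid shift smoothed between history and new data.
        self.centroid = (1.0 - self.sigma) * self.centroid \
            + self.sigma * (self.LS / self.n)

    def radius(self):
        """Equation (4): r = sqrt(SS/n - ||LS/n||^2), the cluster boundary."""
        val = self.SS / self.n - float(np.linalg.norm(self.LS / self.n)) ** 2
        return float(np.sqrt(max(val, 0.0)))  # clamp tiny round-off negatives

With $\sigma = 0.65$, as chosen in Section 5.4.1, the centroid leans toward the running mean $LS/n$ while retaining a share of its previous position.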

4.2. Data Stream Classification Model Based on Microclusters. The classification model consists of three phases: classification, incremental learning, and updating. A framework of the model is given in Figure 3. The processes and calculations are presented in detail in this part and summarized in the corresponding algorithm, presented as Algorithm 1.

4.2.1. Phase 1 (Classification): The k-Nearest Microclusters Classify the Incoming Instance. When an incoming instance arrives, the Euclidean distance is computed between the incoming instance and each microcluster in the pool. Based on these distances, the k-nearest microclusters are selected, and then each selected microcluster assigns its own label to the incoming instance. According to equation (5), the final label of the incoming instance is determined by majority vote:

$$\hat{y} = \arg\max_{j} \sum_{j=1}^{c} \sum_{i=1}^{k} f_{i}(x), \qquad (5)$$

where $k$ stands for the number of microclusters participating in the classification and $c$ denotes the number of classes.
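A sketch of this voting phase follows, reusing the hypothetical MicroCluster class above; predict realizes the k-nearest selection and the majority vote of equation (5), and update_alpha applies the reward/penalty rule on α described below, once the true label arrives. k = 3 follows the setup in Section 5.3; the helper names are assumptions.

import numpy as np

def predict(pool, x, k=3):
    """Phase 1: vote the label of x among its k nearest microclusters."""
    x = np.asarray(x, dtype=float)
    # Rank microclusters by Euclidean distance from x to their centroids.
    order = sorted(range(len(pool)),
                   key=lambda i: np.linalg.norm(x - pool[i].centroid))
    nearest = order[:k]
    # Equation (5): majority vote over the labels of the k voters.
    votes = {}
    for i in nearest:
        votes[pool[i].CL] = votes.get(pool[i].CL, 0) + 1
    y_hat = max(votes, key=votes.get)
    return y_hat, nearest

def update_alpha(pool, nearest, y_hat, y_true):
    """Reward the voters when the final prediction is correct, else penalize."""
    delta = 1 if y_hat == y_true else -1
    for i in nearest:
        pool[i].alpha += delta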

Figure 1: The two types of concept drift: (a) gradual drift, where $P_t$ changes slowly into $P_{t+1}$; (b) abrupt drift, where $P_{t+1}$ differs sharply from $P_t$.

Figure 2: A demonstration of the problem definition. In the data stream, (a) microclusters (dotted circles) are developed from historical instances (green) of the positive class (circles) and negative class (triangles); (b) when a new instance (red circle) arrives, a microcluster is incrementally updated (red dotted circle); (c) the old microcluster contains a noisy instance (red circle); and (d) concept drift (red circle) is detected and a new microcluster (red dotted circle) is built.


Once an incoming instance is classified, the microclusters are immediately updated. If the final prediction is correct, i.e., if all the microclusters that voted have the same class label as the final prediction, the value of α increases by 1; otherwise, it decreases by 1.

4.2.2. Phase 2 (Incremental Learning): The Nearest Microcluster Is Updated Based on the Incoming Instance. Following the first-test-and-then-train principle, the nearest microcluster is immediately updated to ensure the model quickly adapts to the new concept, or a new microcluster is constructed in this phase, as depicted in Figure 4.

Figure 3: The framework of the data stream classification model. In Phase 1, the incoming instance (x_t, y_t) is classified: k microclusters are selected and a mediation rule combines their outputs into the final prediction. Phase 2 performs incremental learning, and Phase 3 updates the pool of microclusters.

Input: the instances S = {X1, X2, ..., Xt, ...}; the pool maximum limit M; the smoothing parameter σ.
Output: the pool of microclusters P*.

(1) P(0) = ∪_{i=1}^{L} MC_i ← the pool of initial microclusters (MCs), formed by k-means
(2) for each instance X_t = (X, y) do
      Phase 1: Classification
(3)   d ← the distance between X and each MC
(4)   MC(t) ← select the k-nearest microclusters to classify the instance X
(5)   ŷ ← the predicted class label of instance X, obtained by majority vote via equation (5)
(6)   α ← update the parameter of the k-nearest microclusters
      Phase 2: Incremental Learning
(7)   if Scenario 1 then
(8)     MC(t)* ← update the structure of the nearest microcluster by equations (1)-(3); the number of instances in the microcluster is incremented by 1
(9)   else if Scenario 2 then
(10)    X ← treat the instance as a noisy point and neglect it
(11)  else if Scenario 3 then
(12)    MC(t)_X ← build a new microcluster on instance X
        Phase 3: Updating the Pool
(13)    if L < M then
(14)      P(t) = P(t) + MC(t)_X
(15)      L = L + 1
(16)    else
(17)      MC(t)_worst ← the worst microcluster
(18)      MC(t)_X ← replace MC(t)_worst
(19)    end if
(20)  end if
(21) end for
(22) return P* ← the microcluster pool at the required time stamp t

ALGORITHM 1: MCBIE.



Scenario 1: the incoming instance's label is the same as the nearest microcluster's label, so the incoming instance is used to retrain this microcluster. The terms SS, LS, and C_id of the nearest microcluster are recalculated by equations (1)-(3), the number of instances in this microcluster is incremented by 1, and the radius of the microcluster is updated by equation (4). This scenario is shown in Figure 4(a). In fact, when the incoming instance drops into the nearest microcluster, we carry out the same operation; that is, the incoming instance is merged into the nearest microcluster.

Scenario 2: the incoming instance's label differs from the nearest microcluster's label, and the incoming instance lies inside the boundary of the nearest microcluster, as seen in Figure 4(b). In this paper, there is a fundamental assumption that two adjacent instances are highly likely to represent the same concept, i.e., the probability that they share the same class label is very high. According to this fundamental assumption, the incoming instance is treated as noise and deleted.

Scenario 3: in contrast to Scenario 2, the incoming instance's label is different from the nearest microcluster's label, and the incoming instance does not drop into the nearest microcluster, as shown in Figure 4(c). This scenario suggests that the incoming instance is derived from a different joint probability distribution. Under this circumstance, we consider that a new concept has appeared, and a microcluster is constructed from the incoming instance by the method described in Section 4.1. Because there is only one instance in this new microcluster when it is constructed, its label CL is the same as that of the incoming instance, and its centroid is the incoming instance itself. The terms SS and LS of the new microcluster are computed by equations (1) and (2), and the value of α is 0. A sketch of this three-way dispatch is given below.
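The dispatch below sketches the three scenarios in code, reusing the hypothetical MicroCluster class from Section 4.1; the boundary test and function name are our assumptions.

import numpy as np

def incremental_update(pool, x, y, sigma=0.65):
    """Phase 2: route a labelled instance (x, y) to one of the three scenarios.

    Returns a new MicroCluster for Scenario 3, or None otherwise.
    """
    x = np.asarray(x, dtype=float)
    nearest = min(pool, key=lambda mc: np.linalg.norm(x - mc.centroid))
    if y == nearest.CL:
        nearest.absorb(x)        # Scenario 1: same concept, retrain in place
        return None
    if np.linalg.norm(x - nearest.centroid) <= nearest.radius():
        return None              # Scenario 2: conflicting label inside the
                                 # boundary, so treat the instance as noise
    return MicroCluster(x, y, sigma)  # Scenario 3: concept drift, new cluster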

4.2.3. Phase 3 (Updating): The Pool of Microclusters Is Updated. As time passes, new microclusters are continuously created, and eventually the pool will reach its limit. Once full, the microcluster with the worst performance is replaced with the new microcluster. Through this cyclical update, the classification model can effectively catch concept change, which leads to improved classification accuracy. Generally speaking, the smaller the value of α, the worse the performance of the microcluster. Therefore, the microcluster with the smallest α is selected for replacement.
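A sketch of this replacement policy, with the pool limit M = 30 used in Section 5.3; the function name is an assumption.

def update_pool(pool, new_mc, max_size=30):
    """Phase 3: admit a new microcluster, evicting the smallest-alpha one
    (the worst performer) once the pool is full."""
    if new_mc is None:
        return  # Scenarios 1 and 2 produce no new microcluster
    if len(pool) < max_size:
        pool.append(new_mc)
    else:
        worst = min(range(len(pool)), key=lambda i: pool[i].alpha)
        pool[worst] = new_mc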

4.3. Algorithm and Complexity Analysis. Summarizing the above phases and scenarios of the data stream classification model, the algorithm of microcluster-based incremental ensemble classification, named MCBIE, is expressed as Algorithm 1.

The MCBIE algorithm includes three phases, which achieve three functions, namely, classification, incremental learning, and updating the pool. Line 1 trains the initial microclusters and builds the pool of microclusters. Lines 3 to 6 classify an incoming instance X and update the performance of the microclusters. According to the three different scenarios, the function of Phase 2 is accomplished in lines 7 to 12. Finally, when the number of base classifiers reaches the upper bound M, the worst microcluster is deleted and the new microcluster is added to the microcluster pool; otherwise, the new microcluster is put directly into the microcluster pool. This is illustrated in lines 13 to 19.

In terms of complexity, the analysis of Algorithm 1 shows that its core operation is the distance calculation in the classification phase. The complexity here depends mainly on two aspects: the dimension of the instance (l) and the number of microclusters serving as base classifiers (M) in the ensemble model. Thus, the presented algorithm's time complexity is approximately O(l · M). In the presented algorithm, previous instances are not retained over time; only the statistical information of each microcluster is recorded, such as SS, LS, and C_id, which saves storage memory.

5. Experiments

5.1. Datasets. To evaluate MCBIE, we conduct simulation experiments with two synthetic datasets: the Hyperplane data stream and the SEA data stream, taken from Massive Online Analysis (MOA) [36]. The Hyperplane data stream is designed to test for gradual drift, while the SEA data stream is designed to test for abrupt drift. These are among the most popular datasets in the data stream classification domain. Further details are as follows.

Hyperplane data stream [37]: in the l-dimensional space, a hyperplane comprises the point set X that satisfies $\sum_{i=1}^{l} a_i x_i = a_0 = \frac{1}{2}\sum_{i=1}^{l} a_i$, where $x_i$ represents the i-th dimension of X. Instances for which $\sum_{i=1}^{l} a_i x_i \ge a_0$ belong to the positive class, and instances for which $\sum_{i=1}^{l} a_i x_i < a_0$ belong to the negative class. A hyperplane in l-dimensional space may slowly rotate as its parameters change, simulating time-changing concepts. In this paper, the value of l is 10, there are 6 attributes with concept drift, and 20,000 instances are generated. Three different noise ratios (20%, 25%, and 30%, respectively) are injected into the data stream.

SEA data stream [29]: the instances in this data stream are generated from three attributes with continuous values $x_1, x_2, x_3 \in [0, 10]$. When $x_1 + x_2 \ge \theta$ is satisfied, the instance belongs to the positive class; otherwise, the label of the instance is negative. To simulate concept drift, the threshold value $\theta \in \{8, 9, 7, 9.5\}$ changes over time. The generator produces 5,000 instances for each threshold value, so the whole SEA data stream includes 20,000 instances. The SEA data stream with two different noise ratios (20% and 30%) is applied in this experiment to test abrupt drift.

5.2. Baselines. The PA algorithmic framework [19] and the Hoeffding tree [38] are selected as baselines against which to compare the presented method, MCBIE; these two approaches are frequently chosen as benchmarks in many studies [20, 22, 23, 38]. Moreover, as a well-known classical algorithm, the Hoeffding tree algorithm is integrated into the MOA platform [36]. Therefore, we have followed suit in our paper.

The PA algorithmic framework [19] is an online incremental learning framework for binary classification based on SVM. Given an instance X, the classification model outputs the prediction as follows:

$$\hat{y} = \operatorname{sign}(\vec{w} \cdot X), \qquad (6)$$

where $\vec{w}$ represents a vector of weights and $\hat{y}$ is the prediction for instance X.

After $\hat{y}$ is output, the framework acquires the ground-truth class label $y$ and computes a loss value from the following equation:

$$l_s = \max\{0, 1 - y \cdot (\vec{w} \cdot X)\}. \qquad (7)$$

The vector of weights $\vec{w}$ is then updated using

$$\vec{w} = \vec{w} + \tau \cdot y \cdot X, \qquad (8)$$

where $\tau \ge 0$ is a Lagrange multiplier whose value is calculated by equation (9) in three different ways, namely, PA, PA-I, and PA-II:

$$\tau = \frac{l_s}{\|X\|^2} \quad \text{(PA)},$$
$$\tau = \min\left\{C, \frac{l_s}{\|X\|^2}\right\} \quad \text{(PA-I)},$$
$$\tau = \frac{l_s}{\|X\|^2 + (1/(2 \cdot C))} \quad \text{(PA-II)}, \qquad (9)$$

where $C$ is a positive parameter referred to as the aggressiveness parameter of the algorithm. A detailed outline of the derivation procedure can be found in [19].
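The per-instance update of the three PA variants can be sketched as follows. This is a linear-kernel illustration of equations (6)-(9) with C = 1 as in Section 5.3 (the experiments themselves run the baselines with a Gaussian kernel in MATLAB); the function name is an assumption.

import numpy as np

def pa_step(w, x, y, variant="PA-I", C=1.0):
    """One passive-aggressive update: predict, measure hinge loss, move w."""
    x = np.asarray(x, dtype=float)
    y_hat = np.sign(np.dot(w, x))            # equation (6): prediction
    ls = max(0.0, 1.0 - y * np.dot(w, x))    # equation (7): hinge loss
    sq = float(np.dot(x, x))
    if variant == "PA":
        tau = ls / sq
    elif variant == "PA-I":
        tau = min(C, ls / sq)
    else:                                    # PA-II
        tau = ls / (sq + 1.0 / (2.0 * C))
    return w + tau * y * x, y_hat            # equation (8): weight update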

The Hoeffding tree [38] is a decision tree for online learning from high-volume data streams, and it is built from each instance in constant time. According to the Hoeffding bound, we can estimate the number of instances needed to build a tree node. The Hoeffding bound is independent of the distribution function that generates the instances. Moreover, the Hoeffding bound is used to construct a Hoeffding tree that approximates the one that would be produced by batch learning. In light of its incremental nature, the Hoeffding tree is widely used in data stream classification.
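For completeness, the Hoeffding bound referred to here states that, after observing $n$ independent samples of a random variable with range $R$, the true mean differs from the observed mean by less than $\epsilon$ with probability $1 - \delta$, where

$$\epsilon = \sqrt{\frac{R^{2}\ln(1/\delta)}{2n}}.$$

This is the test the Hoeffding tree uses to decide when enough instances have been seen to commit to a split attribute [38].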

5.3. Experiment Setup. Following the first-test-and-then-train principle [39], each incoming instance is first tested, and then the model is retrained with that instance under an incremental paradigm. To assess the classification model's performance, the classification accuracy is computed every one hundred instances during the process of data stream classification.

Both our MCBIE and the baselines are initialized on the first 100 instances, and the model resulting from that initialization is used to predict the following instances in the data stream. In MCBIE, we use these 100 instances to train 6 initial microclusters as base classifiers using the k-means algorithm. At every time stamp, the three nearest microclusters of each incoming instance are selected to assert the label information. The maximum scale of the microcluster pool is 30; once full, a new microcluster takes the place of the worst-performing microcluster in the pool. We use the Weka package to implement the MCBIE algorithm. The Hoeffding tree algorithm (denoted HT) is run on the MOA platform with its parameters set to their default values. PA, PA-I, and PA-II with a Gaussian kernel are executed in MATLAB, and the constant C is set to 1.

5.4. Experiment Results and Analysis. The simulation experiments are designed to evaluate MCBIE from two sides: first, we assess the sensitivity of the smoothing parameter σ; second, we justify the feasibility and validity of MCBIE.

5.4.1. Experiment 1: Sensitivity Analysis of the Smoothing Parameter. Following the experimental setup in Section 5.3, we vary the smoothing parameter σ in MCBIE from 0.1 to 1. When the smoothing parameter σ is either too big or too small, MCBIE's average accuracy and corresponding standard deviation do not reach the desired result on the Hyperplane and SEA data streams.

Figure 4: Three different scenarios: (a) the incoming instance's label is the same as the nearest microcluster's label; (b) the incoming instance's label differs from the nearest microcluster's label, which represents a noisy instance; and (c) the incoming instance, as a new concept, does not drop into the nearest microcluster and its label is different from the nearest microcluster's label. Note: the color represents the class label, the rectangle represents the incoming instance, and the circle represents the nearest microcluster.

Through observation and analysis, we find that the smoothing parameter σ regulates the balance between the historical and new instances used to compute the centroid of the microcluster. When σ = 0, the centroid of the microcluster does not move and only its radius changes. On the contrary, at the maximum value σ = 1, the microcluster's centroid is simply the mean of its instances, which implies that all instances contribute equally to the centroid. However, because concept drift occurs in nonstationary data stream environments, instances at different time stamps should contribute differently to the centroid of the microcluster. The experiment results justify this viewpoint. According to the analysis of the experiment results, we conclude that the best value of σ lies in the interval [0.6, 0.7]; hence, we chose σ = 0.65 for the subsequent experiments.

5.4.2. Experiment 2: Feasibility and Validity of MCBIE. All the experimental results for both the Hyperplane and SEA data streams are shown in Table 1, with the maximum value in each column marked in bold.

From Table 1, we see that the average accuracy of MCBIE reaches the highest values of 69.6%, 64.8%, and 61.8% on the Hyperplane data stream with noise ratios of 20%, 25%, and 30%, respectively. The corresponding standard variances of the three average accuracies are 0.051, 0.046, and 0.047, which are relatively low compared with the baselines. On average, MCBIE provides the most accurate classifications with the least standard variance among all the baselines on the Hyperplane data stream. On the SEA data stream, the average classification accuracy of MCBIE is 70.8% at 20% noise and 63.2% at 30% noise, respectively. Again, the standard variances are the smallest compared to all baselines, which demonstrates MCBIE's stability in nonstationary data streams with noise. Based on the above experiment results, we may draw the conclusion that MCBIE is an effective and well-performing classification approach.

Through further analysis of the experiment results in Table 1, some interesting phenomena emerge. As the noise ratio grows, the performance of MCBIE improves to a greater degree than that of the other methods. For instance, with a noise ratio of 20% on the SEA data stream, MCBIE ranks only second, behind PA-I; however, at 30% noise, MCBIE becomes the most accurate model. Given the same noise ratios, the experiment results show that the classification models perform better on the SEA data stream than on the Hyperplane data stream. This suggests that it is more difficult for a classification model to learn knowledge from gradual drift than from abrupt drift. Of all the baselines, PA-I provides the best performance, which indicates that selecting an appropriate learning rate τ is very important for incremental learning. The Hoeffding tree baseline has the largest standard variance, which shows that the Hoeffding tree is unstable.

Last but not least, we want to show that MCBIE adapts well to concept drift. Figures 5 and 6 illustrate the accuracy curves for the MCBIE method on the Hyperplane and SEA data streams. Figure 5 suggests that MCBIE can tackle concept drift in time on the Hyperplane data stream with the different noise ratios. When concept drift occurs, the curve in Figure 5 descends sharply, indicating that the concept captured by the model is inconsistent with the current concept: MCBIE's accuracy decreases when the concept changes. However, the performance of MCBIE improves immediately after the model is retrained and updated with the incoming instances through incremental learning. The intensity of ascent and descent in Figure 5 reflects the classification model's ability to catch concept drift. In Figure 6, similar phenomena can be seen on the SEA data stream.

To demonstrate MCBIE's superiority, based on the above analysis, we choose the two best methods, MCBIE and PA-I, to illustrate the ability to perform the prediction task in a streaming data setting with noise and concept drift. The accuracy curves are plotted in Figures 7 and 8. In Figure 7, the accuracy curves suggest that both methods have the ability to keep track of concept drift, and they show clearly that our method is superior to PA-I in terms of accuracy on the Hyperplane data stream with the three different noise ratios.

Table 1: Average classification accuracy and standard variance.

Method | Hyperplane (20% noise) | Hyperplane (25% noise) | Hyperplane (30% noise) | SEA (20% noise) | SEA (30% noise)
PA     | 0.622 ± 0.056 | 0.581 ± 0.054 | 0.548 ± 0.048 | 0.661 ± 0.061 | 0.577 ± 0.056
PA-I   | 0.678 ± 0.049 | 0.626 ± 0.050 | 0.582 ± 0.045 | 0.726 ± 0.060 | 0.617 ± 0.061
PA-II  | 0.646 ± 0.053 | 0.599 ± 0.053 | 0.558 ± 0.047 | 0.688 ± 0.062 | 0.595 ± 0.059
HT     | 0.582 ± 0.067 | 0.567 ± 0.061 | 0.556 ± 0.060 | 0.705 ± 0.068 | 0.625 ± 0.059
MCBIE  | 0.696 ± 0.051 | 0.648 ± 0.046 | 0.618 ± 0.047 | 0.708 ± 0.053 | 0.632 ± 0.046

Figure 5: The accuracy curve for the MCBIE method on the Hyperplane data stream at 20%, 25%, and 30% noise.

Figure 6: The accuracy curve for the MCBIE method on the SEA data stream at 20% and 30% noise.

Figure 7: The accuracy curves for the two best methods over the Hyperplane data streams: (a) Hyperplane with 20% noise; (b) Hyperplane with 25% noise; (c) Hyperplane with 30% noise.

Moreover, in all three cases, both the maximum and minimum accuracy of MCBIE are higher than those of PA-I. From the analysis of the accuracy curves over the SEA data stream, with regard to the ability to adapt to concept drift, the two methods appear equally capable of handling nonstationary data stream classification, as demonstrated in Figure 8. Furthermore, as the noise ratio grows, MCBIE behaves better than PA-I, for example, in terms of stability.

From these analyses, we conclude that the MCBIE method is able to conduct nonstationary data stream classification with high accuracy in environments characterized by both concept drift and noise.

Figure 8: The accuracy curves for the two best methods over the SEA data streams: (a) SEA with 20% noise; (b) SEA with 30% noise.

6. Conclusions

Classification tasks in nonstationary data streams face two main problems, concept drift and noise, which require the classification model not only to cope with concept drift but also to differentiate noise from concept drift. To deal with these problems, a novel method named MCBIE was proposed, which can achieve the classification task in nonstationary data streams with noise. Aiming to enhance MCBIE's performance, three strategies are used to alleviate the influence of concept drift and noise: incremental learning helps the microclusters, acting as classifiers, to catch concept change quickly; the ensemble strategy alleviates the confusion between noise and concept drift; and the smoothing parameter absorbs useful information from historical knowledge. Compared with the baseline methods, the experiment results justify that our method, MCBIE, has the ability to perform classification in nonstationary streaming data settings. However, three problems are worthy of further attention: (1) the noise recognition ability of our method in abrupt drift environments needs to be further strengthened; (2) in addition to accuracy, the stability of the model needs to be improved; and (3) when a concept reoccurs, it is important to design more appropriate strategies for the replacement of microclusters.

Data Availability

The data used to support the findings of this study have been deposited in the GitHub repository (https://github.com/FanzhenLiu/ComplexityJournal).

Conflicts of Interest

The authors declare that there are no conflicts of interest regarding the publication of this paper.

Acknowledgments

This study was supported by the Natural Science Foundation of Anhui Province (nos. 1608085MF147 and 1908085MF183), the Humanities and Social Science Foundation of the Ministry of Education (no. 18YJA630114), a Major Project of Natural Science Research in the Colleges and Universities of Anhui Province (no. KJ2019ZD15), MQNS (no. 9201701203), MQEPS (no. 96804590), MQRSG (no. 95109718), and the Investigative Analytics Collaborative Research Project between Macquarie University and Data61 CSIRO.

References

[1] G. Ditzler, M. Roveri, C. Alippi, and R. Polikar, "Learning in nonstationary environments: a survey," IEEE Computational Intelligence Magazine, vol. 10, no. 4, pp. 12-25, 2015.


[2] A. Jadhav, A. Jadhav, P. Jadhav, and P. Kulkarni, "A novel approach for the design of network intrusion detection system (NIDS)," in Proceedings of the 2013 International Conference on Sensor Network Security Technology and Privacy Communication System, IEEE, New York, NY, USA, pp. 22-27, December 2013.

[3] A. Salazar, G. Safont, A. Soriano, and L. Vergara, "Automatic credit card fraud detection based on non-linear signal processing," in Proceedings of the 2012 IEEE International Carnahan Conference on Security Technology, IEEE, Newton, MA, USA, pp. 207-212, October 2012.

[4] T. Bujlow, T. Riaz, and J. M. Pedersen, "A method for classification of network traffic based on C5.0 machine learning algorithm," in Proceedings of the 2012 International Conference on Computing, Networking and Communications, IEEE, Maui, HI, USA, pp. 237-241, February 2012.

[5] L. Gao, J. Wu, C. Zhou, and Y. Hu, "Collaborative dynamic sparse topic regression with user profile evolution for item recommendation," in Proceedings of the Thirty-First AAAI Conference on Artificial Intelligence, New York, NY, USA, February 2017.

[6] S. Xue, J. Lu, and G. Zhang, "Cross-domain network representations," Pattern Recognition, vol. 94, pp. 135-148, 2019.

[7] S. Ren, W. Zhu, B. Liao et al., "Selection-based resampling ensemble algorithm for nonstationary imbalanced stream data learning," Knowledge-Based Systems, vol. 163, pp. 705-722, 2019.

[8] J. Sun, H. Fujita, P. Chen, and H. Li, "Dynamic financial distress prediction with concept drift based on time weighting combined with AdaBoost support vector machine ensemble," Knowledge-Based Systems, vol. 120, pp. 4-14, 2017.

[9] T. Zhai, Y. Gao, H. Wang, and L. Cao, "Classification of high-dimensional evolving data streams via a resource-efficient online ensemble," Data Mining and Knowledge Discovery, vol. 31, no. 5, pp. 1242-1265, 2017.

[10] W.-X. Lu, C. Zhou, and J. Wu, "Big social network influence maximization via recursively estimating influence spread," Knowledge-Based Systems, vol. 113, pp. 143-154, 2016.

[11] Y. Zhang, J. Wu, C. Zhou, and Z. Cai, "Instance cloned extreme learning machine," Pattern Recognition, vol. 68, pp. 52-65, 2017.

[12] J. Wu, Z. Cai, S. Zeng, and X. Zhu, "Artificial immune system for attribute weighted naive Bayes classification," in Proceedings of the 2013 International Joint Conference on Neural Networks (IJCNN), IEEE, Dallas, TX, USA, pp. 1-8, August 2013.

[13] J. Wu, S. Pan, X. Zhu, C. Zhang, and X. Wu, "Multi-instance learning with discriminative bag mapping," IEEE Transactions on Knowledge and Data Engineering, vol. 30, no. 6, pp. 1065-1080, 2018.

[14] P. ZareMoodi, S. K. Siahroudi, and H. Beigy, "A support vector based approach for classification beyond the learned label space in data streams," in Proceedings of the 31st Annual ACM Symposium on Applied Computing, ACM, Pisa, Italy, pp. 910-915, April 2016.

[15] S. Ramirez-Gallego, B. Krawczyk, S. Garcia, M. Wozniak, J. M. Benitez, and F. Herrera, "Nearest neighbor classification for high-speed big data streams using Spark," IEEE Transactions on Systems, Man, and Cybernetics: Systems, vol. 47, no. 10, pp. 2727-2739, 2017.

[16] J. Gama, R. Fernandes, and R. Rocha, "Decision trees for mining data streams," Intelligent Data Analysis, vol. 10, no. 1, pp. 23-45, 2006.

[17] H. L. Hammer, A. Yazidi, and B. J. Oommen, "On the classification of dynamical data streams using novel "Anti-Bayesian" techniques," Pattern Recognition, vol. 76, pp. 108-124, 2018.

[18] M. A. Maloof and R. S. Michalski, "Incremental learning with partial instance memory," Artificial Intelligence, vol. 154, no. 1-2, pp. 95-126, 2004.

[19] K. Crammer, O. Dekel, J. Keshet, S. Shalev-Shwartz, and Y. Singer, "Online passive-aggressive algorithms," Journal of Machine Learning Research, vol. 7, pp. 551-585, 2006.

[20] M. Tennant, F. Stahl, O. Rana, and J. B. Gomes, "Scalable real-time classification of data streams with concept drift," Future Generation Computer Systems, vol. 75, pp. 187-199, 2017.

[21] J. Read, A. Bifet, B. Pfahringer, and G. Holmes, "Batch-incremental versus instance-incremental learning in dynamic and evolving data," in Proceedings of the International Symposium on Intelligent Data Analysis, Springer, Helsinki, Finland, pp. 313-323, October 2012.

[22] J. Lu, D. Sahoo, P. Zhao, and S. C. Hoi, "Sparse passive-aggressive learning for bounded online kernel methods," ACM Transactions on Intelligent Systems and Technology, vol. 9, no. 4, p. 45, 2018.

[23] M. Oide, A. Takahashi, T. Abe, and T. Suganuma, "User-oriented video streaming service based on passive aggressive learning," International Journal of Software Science and Computational Intelligence, vol. 9, no. 1, pp. 35-54, 2017.

[24] B. Krawczyk and M. Wozniak, "One-class classifiers with incremental learning and forgetting for data streams with concept drift," Soft Computing, vol. 19, no. 12, pp. 3387-3400, 2015.

[25] S. Xu and J. Wang, "A fast incremental extreme learning machine algorithm for data streams classification," Expert Systems with Applications, vol. 65, pp. 332-344, 2016.

[26] Y. Lu, Y.-M. Cheung, and Y. Y. Tang, "Dynamic weighted majority for incremental learning of imbalanced data streams with concept drift," in Proceedings of the 2017 International Joint Conference on Artificial Intelligence, pp. 2393-2399, Melbourne, Australia, August 2017.

[27] Y. Zhang, J. Yu, W. Liu, and K. Ota, "Ensemble classification for skewed data streams based on neural network," International Journal of Uncertainty, Fuzziness and Knowledge-Based Systems, vol. 26, p. 08, 2018.

[28] B. Krawczyk, L. L. Minku, J. Gama, J. Stefanowski, and M. Wozniak, "Ensemble learning for data stream analysis: a survey," Information Fusion, vol. 37, pp. 132-156, 2017.

[29] W. N. Street and Y. Kim, "A streaming ensemble algorithm (SEA) for large-scale classification," in Proceedings of the Seventh ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, ACM, San Francisco, CA, USA, pp. 377-382, August 2001.

[30] H. Wang, W. Fan, P. S. Yu, and J. Han, "Mining concept-drifting data streams using ensemble classifiers," in Proceedings of the Ninth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, ACM, Washington, DC, USA, pp. 226-235, December 2003.

[31] J. R. B. Junior and M. do Carmo Nicoletti, "An iterative boosting-based ensemble for streaming data classification," Information Fusion, vol. 45, pp. 66-78, 2019.

[32] K. Jackowski, "New diversity measure for data stream classification ensembles," Engineering Applications of Artificial Intelligence, vol. 74, pp. 23-34, 2018.

[33] G. I. Webb, R. Hyde, H. Cao, H. L. Nguyen, and F. Petitjean, "Characterizing concept drift," Data Mining and Knowledge Discovery, vol. 30, no. 4, pp. 964-994, 2016.

[34] A. Tsymbal, "The problem of concept drift: definitions and related work," Computer Science Department, vol. 106, no. 2, 2004.

[35] T. Zhang, R. Ramakrishnan, and M. Livny, "BIRCH," ACM SIGMOD Record, vol. 25, no. 2, pp. 103-114, 1996.

[36] A. Bifet, G. Holmes, R. Kirkby, and B. Pfahringer, "MOA: massive online analysis," Journal of Machine Learning Research, vol. 11, pp. 1601-1604, 2010.

[37] G. Hulten, L. Spencer, and P. Domingos, "Mining time-changing data streams," in Proceedings of the Seventh ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, ACM, San Francisco, CA, USA, pp. 97-106, December 2001.

[38] P. Domingos and G. Hulten, "Mining high-speed data streams," in Proceedings of the Fifth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, vol. 2, p. 4, Boston, MA, USA, April 2000.

[39] A. Bifet, G. Holmes, B. Pfahringer, and R. Gavalda, "Improving adaptive bagging methods for evolving data streams," in Proceedings of the 2009 Asian Conference on Machine Learning, Springer, Berlin, Germany, pp. 23-37, November 2009.

12 Complexity

Page 2: Microcluster-BasedIncrementalEnsembleLearningforNoisy ...downloads.hindawi.com/journals/complexity/2020/6147378.pdf · concept drift in nonstationary streaming data with noise. Meanwhile,inordertocatchconceptdrift,theclassification

imbalanced streams data setting an ensemble learningmodel was presented with resampling technology [7] Acombined online ensemble method was used to simulta-neously consider concept drift and the high-dimensionproblem [9] Additionally in the light of various classifi-cation scenarios many supervised learning approaches re-cently have been widely explored [10ndash17] and some havebeen applied in data stream classification such as supportvector machine (SVM) and Bayesian technique

In nonstationary streaming data environment theseinvestigations solved some of the problems includingconcept drift the curse of dimensionality and imbalancedlearning However there are still some open problems to beaddressed For example few studies have considered how toeffectively and simultaneously cope with both concept driftand noise in nonstationary data streams To deal with theseproblems we design a new classification approach thatconstructs microclusters to serve as a pool of base classifiersFinal prediction of incoming instancersquos class label is made bya majority vote of the microclusters At the same time anincremental learning strategy combined with an ensemblelearning and a smoothing operator does the work ofadapting the model to concept drift distinguishing noiseand maintaining stability

In a word there exist the three main contributions in ourpaper

(1) A technique for constructing a set of microclusters asbase classifiers by redefining the concept of clusterfeature previously used in hierarchical clusteranalysis Good classification results can be achievedwith nonstationary data streams by combining nu-merous microclusters Additionally microclustercombined with incremental learning is a very con-venient way to absorb new knowledge and keep trackof concept drift

(2) A smoothing strategy designed to shift the centroidsof microcluster and control the balance betweenhistorical and new instances -is approach makesthe best use of historical knowledge and can alsoovercome problems with a shortage of drifted data

(3) A majority vote strategy and an incremental learningenhance the stability and adaptability of the model innonstationary data streams with noise -us theproposed model leverages the advantages of bothensemble and incremental learning to maintain highaccuracy in class label prediction

-is paper is organized as follows-e background workis discussed in Section 2 and then Section 3 outlines thebasic concept Section 4 describes the proposed model andprovides a complexity analysis of the algorithm In Section5experimental schema and results are illustrated Section 6describes conclusion and future plans

2 Related Work

An excellent data stream classification approach has theability to learn incrementally and adapt to concept drift as

well [18] In general two important kinds of incrementallearning method are concerned instance-incrementallearning [19 20] which learns an instance at a time andbatch-incremental learning [21] which learns from instanceset once In the instance-incremental learning groupCrammer et al [19] developed an online passive-aggressivealgorithmic (PA) framework based on SVM that forces theclassification hyperplane to move to satisfy the minimumloss constraint when the classifier misclassifies an instance-is framework has been widely explored for many practicalsettings [22 23] In work [24] it presented the instance-incremental method with weighted one-class SVM thatcould solve gradual drift in nonstationary data streamsInstance-incremental learning has also been based on ex-treme learning machine as a way to boost classificationspeeds [25]When data stream is stable incoming instance isused to update the classifier however when concept drifthappens a weakly performing classifier is deleted -is is avery flexible approach to classifying real-time data streamsIn the batch-incremental learning domain Lu et al [26]provided a novel dynamic weighted majority approach todeal with imbalance problems and concept drift -ismethod uses dynamic weighted ensemble learning to keepthe classification model stable and batch-incrementallearning to track concept drift

Between the two modes instance-based incrementallearning is more flexible and scalable for real-time datastream classification It is also a more suitable approach forenvironments where it is difficult to label instances andunderstand concepts in advance [20] Hence we turn ourattention to instance-incremental learning for the remainderof this paper using the simple term incremental learninghereafter

The impetus for studying ensemble learning in conjunction with data stream classification came from a desire to improve the classification model's stability [27-31]. These models comprise a set of base classifiers and a merging method, which combines the base classifiers' outputs into the final output of the ensemble. The SEA algorithm [29] is one of the early ensemble methods. When the SEA ensemble model is not full, each newly arriving data chunk is used to build a new base classifier. If the limit has been reached, a new classifier is still constructed for every newly arriving data chunk, but it replaces the classifier with the worst performance. The SEA ensemble method outputs its final predictions according to a majority vote policy. Another similar work is the accuracy-weighted ensemble method [30], whose key point is to allocate a weight to each base classifier that estimates its accuracy on the newest data chunk. This idea suggests that the newest data chunk represents the target concept with high probability, so classifiers with higher accuracy should be given more importance. Also, when the maximum ensemble scale has been reached, the base classifier with the worst performance is deleted and a new base classifier joins the ensemble model. Another iterative ensemble method was developed based on boosting and batch-incremental learning [31]. This method adds a suitable number of base classifiers to the classification model with each newly arriving data chunk instead of adding just one. The experimental results suggest that the iterative boosting ensemble classification method is a promising way to perform classification tasks in nonstationary data stream environments. Beyond concept drift, imbalanced class distributions are another challenge in data stream classification that can be tackled with ensemble learning. Zhang et al.'s [27] method of dealing with this problem is a two-pronged approach. The first tack is to divide the majority into N subsets of roughly the same size as the minority and then construct N new balanced training subsets from the minority and the divided subsets. Next, the ensemble model is created using a neural network with backpropagation as the base learning algorithm. The base classifiers' diversity is one of the important factors of a learning system. Hence, Jackowski [32] introduced two error trend diversity measurements, pair errors and pool errors, to find and keep track of concept drift in streaming data settings. Experiments with this model show that the diversity measurements can not only be used to enhance the ensemble model's performance but also to effectively control the scale of the ensemble model. Based on the above analysis, we think that ensemble learning is currently the most promising research direction for data stream classification.

From this review we distill several observations: incremental learning can dynamically reveal new knowledge in data streams; ensemble learning can improve the stability of classification models for nonstationary data streams; and a suitable algorithm can enhance a classification model's flexibility. These three observations form the basis of the three integrated strategies in our method for simultaneously tackling concept drift and noise.

3. Basic Concept and Problem Definition

This section begins with a description of the basic concepts used in this paper and then explores a detailed analysis of the research problem.

3.1. Data Stream. Following related studies, in this paper we consider a data stream to consist of a series of labeled instances, namely, S = {X_1, X_2, ..., X_t, ...}, where X_t = (X, y), in which X stands for a feature vector characterizing the features of an object and y is X_t's class label. When y is +1, X_t represents a positive instance; otherwise, X_t is a negative instance.

According to the above definition, we seek a mapping function f: X -> y with high accuracy, which stands for a classification model that can output the incoming instance X's class label. Only supervised learning is considered in this paper. Therefore, the classification model f is constructed from a labeled dataset and, once built, it can output the class label +1 or -1 for the incoming instance. In addition, for the purposes of this paper, the real label is acquired after the mapping function f outputs its prediction for the incoming instance.

3.2. Concept Drift. According to the work [33], concept drift exists when the joint probability distribution P of the data changes over time. In other words, P_t(X, y) != P_{t+1}(X, y), where the subscript t stands for the time stamp, X is the feature vector, and y is a class label. According to the changing rate of the concept, gradual drift and abrupt drift [34] are discussed. Generally speaking, gradual drift is a slower rate of change from one concept to another, as illustrated in Figure 1(a). When the distribution P_t is abruptly different from the distribution P_{t+1} at time t + 1, we say that abrupt drift occurs, as seen in Figure 1(b). In Figure 1, the difference between gradual drift and abrupt drift is clearly visible, and these two kinds of drift are considered in this paper.

3.3. Problem Definition. Noisy instances and concept drift appear to have similar distributions in nonstationary data streams. It is therefore critical to differentiate noise from concept drift, and that is the motivation of this paper: to build a classification model that can find and keep track of concept drift in nonstationary streaming data with noise. Meanwhile, in order to catch concept drift, the classification model should be updated by incremental learning. The research problem is demonstrated in Figure 2.

Figure 2 clarifies the problem definition of this paper and shows how a noisy instance is identified in nonstationary streaming data. The dotted circle represents a microcluster, and the dotted straight line suggests the distribution of the instances. The current case is shown in Figure 2(a). As time goes on, a new instance arrives and the microcluster is updated at time stamp t + 1, as seen in Figure 2(b), which represents the case of incremental learning. At time stamp t + 2, in Figure 2(c), the incoming instance with a positive class label lies inside an old microcluster with a different class label. In this case, the new instance is regarded as a noisy instance and is discarded; this is why the instance no longer exists at time stamp t + 3. In Figure 2(d), the incoming instance forms a concept drift and leads to the construction of a new microcluster.

Based on the above analysis, our solution involves three strategies to deal with the research problem as illustrated in Figure 2: incremental learning to track concept drift, ensemble learning to enhance the model's stability, and a microclustering method to distinguish drift from noise and make the final label predictions. In the next section, we outline these strategies in detail and discuss the three scenarios illustrated in Figure 2.

4. Adaptive Incremental Ensemble Data Stream Classification Method

This section describes the microcluster and the data stream classification model, followed by the corresponding algorithm.

4.1. Definition of Microcluster. Microclusters, the classifiers in our model, are constructed from cluster features, a technique originally developed as part of hierarchical cluster analysis [35]. The structure of a cluster feature is defined as CF = <SS, LS, n>. Based on the cluster feature, we give the definition of the microcluster used in this paper.

Definition 1. A microcluster (MC) is represented as <SS, LS, n, Cid, CL, alpha>, where SS and LS are used to compute the boundary of the MC: SS denotes the square sum of the attributes of the instances in the MC, as calculated in equation (1), and LS is a vector that saves the sum of each attribute, as in equation (2). n is the number of instances. Cid presents the MC's centroid, which changes over time as shown in equation (3). CL is the MC's class label, and alpha counts the number of incoming instances that the MC correctly classifies; alpha is initialized to 0.

SS = \sum_{i=1}^{n} \sum_{j=1}^{l} x_{ij}^{2}, \qquad (1)

LS = \left( \sum_{i=1}^{n} x_{i1}, \ldots, \sum_{i=1}^{n} x_{ij}, \ldots, \sum_{i=1}^{n} x_{il} \right), \qquad (2)

where l is the dimension of the instance.

Cid_{t} = (1 - \sigma) \times Cid_{t-1} + \sigma \times (LS/n), \qquad (3)

where Cid_{t-1} is the MC's centroid at the previous time stamp t - 1 and \sigma \in [0, 1] stands for the smoothing parameter.

The size of the MC is represented by the cluster's radius r, which is calculated as follows:

r = \sqrt{ \frac{SS}{n} - \left\| \frac{LS}{n} \right\|^{2} }, \qquad (4)

where \| LS/n \| represents the length of the vector LS/n.
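To make Definition 1 concrete, the following minimal Python sketch mirrors the tuple <SS, LS, n, Cid, CL, alpha>; the class name MicroCluster, the method names, and the NumPy-based implementation are our own assumptions for illustration, not the authors' code.

```python
import numpy as np

class MicroCluster:
    """Minimal sketch of the microcluster <SS, LS, n, Cid, CL, alpha> from Definition 1."""

    def __init__(self, x, label, sigma=0.65):
        x = np.asarray(x, dtype=float)
        self.SS = float(np.sum(x ** 2))   # square sum of attributes, equation (1)
        self.LS = x.copy()                # per-attribute linear sum, equation (2)
        self.n = 1                        # number of absorbed instances
        self.centroid = x.copy()          # Cid; starts at the founding instance
        self.label = label                # CL, the microcluster's class label
        self.alpha = 0                    # count of correct classifications
        self.sigma = sigma                # smoothing parameter in [0, 1]

    def absorb(self, x):
        """Merge an instance into the microcluster (the Scenario 1 update)."""
        x = np.asarray(x, dtype=float)
        self.SS += float(np.sum(x ** 2))
        self.LS += x
        self.n += 1
        # Smoothed centroid shift, equation (3)
        self.centroid = (1 - self.sigma) * self.centroid + self.sigma * (self.LS / self.n)

    def radius(self):
        """Cluster radius r from equation (4); guards against tiny negative values."""
        val = self.SS / self.n - float(np.dot(self.LS / self.n, self.LS / self.n))
        return float(np.sqrt(max(val, 0.0)))
```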

4.2. Data Stream Classification Model Based on Microclusters. The classification model consists of three phases: classification, incremental learning, and updating. A framework of the model is given in Figure 3. The processes and calculations are presented in detail in this part and summarized in the corresponding algorithm, presented as Algorithm 1.

4.2.1. Phase 1 (Classification): The k-Nearest Microclusters Classify the Incoming Instance. When an incoming instance arrives, the Euclidean distance is computed between the incoming instance and each microcluster in the pool. Based on these Euclidean distances, the k-nearest microclusters are selected, and then each microcluster assigns its own label to the incoming instance. According to equation (5), the final label of the incoming instance is decided by majority vote:

y = \arg\max_{j} \sum_{j=1}^{c} \sum_{i=1}^{k} f_{i}(x), \qquad (5)

where k stands for the number of microclusters participating in the classification and c denotes the number of classes.
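As a sketch of Phase 1 (continuing the hypothetical MicroCluster class above), the function below ranks the pool by Euclidean distance to each centroid and takes the majority vote of the k nearest; it also applies one plausible reading of the alpha update described shortly (reward voters that agree with the final prediction, penalize the rest).

```python
from collections import Counter
import numpy as np

def classify(pool, x, k=3):
    """Phase 1: the k-nearest microclusters vote on the label of incoming instance x."""
    x = np.asarray(x, dtype=float)
    # Euclidean distance from x to every microcluster centroid in the pool
    nearest = sorted(pool, key=lambda mc: np.linalg.norm(x - mc.centroid))[:k]
    # Majority vote over the voters' class labels, as in equation (5)
    y_hat = Counter(mc.label for mc in nearest).most_common(1)[0][0]
    # One reading of the alpha update: voters agreeing with the final
    # prediction gain 1; disagreeing voters lose 1.
    for mc in nearest:
        mc.alpha += 1 if mc.label == y_hat else -1
    return y_hat, nearest
```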

Figure 1: The two types of concept drift. (a) Gradual drift. (b) Abrupt drift. (Each panel shows the distribution P_t at time t changing to P_{t+1} at time t + 1.)

Figure 2: A demonstration of the problem definition. In the data stream at time stamps t to t + 3, (a) microclusters (dotted circles) are developed from historical instances (green) with positive class (circles) and negative class (triangles); (b) when a new instance (red circle) arrives, a microcluster is incrementally updated (red dotted circle); (c) the old microcluster contains a noisy instance (red circle); and (d) concept drift (red circle) is detected and a new microcluster (red dotted circle) is built.


Once the incoming instance is classified, the microclusters are immediately updated. If the final prediction is correct, i.e., if all the microclusters that voted have the same class label as the final prediction, the value of alpha increases by 1; otherwise, it decreases by 1.

4.2.2. Phase 2 (Incremental Learning): The Nearest Microcluster Will Be Updated Based on the Incoming Instance. Following the first-test-and-then-train principle, the nearest microcluster is immediately updated to ensure the model quickly adapts to the new concept, or a new microcluster is constructed in this phase, as depicted in Figure 4.

Figure 3: The framework of the data stream classification model. (Phase 1: classify the incoming instance by selecting k microclusters and combining their outputs with a mediation rule; Phase 2: incremental learning; Phase 3: updating the pool.)

Input: the instances S = {X_1, X_2, ..., X_t, ...}, the pool maximum limit M, and the smoothing parameter σ.
Output: the pool of microclusters P*.

(1) P(0) = ∪_{i=1..L} MC_i ← the pool of initial microclusters (MC), formed by k-means
(2) for each instance X_t = (X, y) do
      Phase 1: Classification
(3)   d ← distance between X and each MC
(4)   MC(t) ← select the k-nearest microclusters to classify the instance X
(5)   ŷ ← the predicted class label of instance X, gained by majority vote in equation (5)
(6)   α ← update the parameter of the k-nearest microclusters
      Phase 2: Incremental Learning
(7)   if Scenario 1 then
(8)     MC(t)* ← update the structure of the nearest microcluster by equations (1)-(3), and increment the number of instances in the microcluster by 1
(9)   else if Scenario 2 then
(10)    X ← consider the instance a noisy point and neglect it
(11)  else if Scenario 3 then
(12)    MC(t)_X ← build a new microcluster on instance X
        Phase 3: Updating Pool
(13)    if L < M then
(14)      P(t) = P(t) + MC(t)_X
(15)      L = L + 1
(16)    else
(17)      MC(t)_worst ← the worst microcluster
(18)      MC(t)_X ← replace MC(t)_worst
(19)    end if
(20)  end if
(21) end for
(22) return P* ← the microcluster pool at the required time stamp t

Algorithm 1: MCBIE.


Scenario 1: when the incoming instance's label is the same as the nearest microcluster's label, the incoming instance is used to retrain this microcluster. The terms SS, LS, and Cid of the nearest microcluster are recalculated by equations (1)-(3). The number of instances in this microcluster is incremented by 1, and the radius of the microcluster is also updated by equation (4). This scenario is shown in Figure 4(a). As a matter of fact, when the incoming instance drops into the nearest microcluster, we carry out the same operation; that is, the incoming instance is merged into the nearest microcluster.

Scenario 2: the incoming instance's label differs from the nearest microcluster's label, and the incoming instance lies inside the boundary of the nearest microcluster, as seen in Figure 4(b). In this paper, there is a fundamental assumption that two adjacent instances are highly likely to represent the same concept, i.e., the probability that they share the same class label is very high. According to this assumption, the incoming instance is treated as noise and deleted.

Scenario 3: in contrast to Scenario 2, the incoming instance's label is different from the nearest microcluster's label, and the incoming instance does not drop into the nearest microcluster, as shown in Figure 4(c). This scenario suggests that the incoming instance is derived from a different joint probability distribution. Under this circumstance, we think a new concept has appeared, and a microcluster will be constructed from the incoming instance by the method described in Section 4.1. Because there is only one instance in this new microcluster when it is constructed, its label CL will be the same as the incoming instance's, and its centroid will be the incoming instance itself. The terms SS and LS of the new microcluster are computed by equations (1) and (2), and the value of alpha is 0. These three update paths are sketched in code below.
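The three scenarios can be summarized in a short dispatch routine; this is a sketch continuing the earlier hypothetical MicroCluster code (absorb, radius, and the constructor are our own names), with boundary membership tested against the cluster radius.

```python
import numpy as np

def incremental_update(x, y, nearest, sigma=0.65):
    """Phase 2: apply Scenario 1, 2, or 3 using the nearest microcluster."""
    mc = nearest[0]  # nearest microcluster from the Phase 1 distance ranking
    dist = np.linalg.norm(np.asarray(x, dtype=float) - mc.centroid)
    if y == mc.label:
        mc.absorb(x)        # Scenario 1: same concept, merge the instance
        return None
    if dist <= mc.radius():
        return None         # Scenario 2: conflicting label inside the boundary, treat as noise and discard
    # Scenario 3: conflicting label outside the boundary, a new concept appears
    return MicroCluster(x, y, sigma)
```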

4.2.3. Phase 3 (Updating): The Pool of Microclusters Is Updated. As time passes, new microclusters are continuously created, and eventually the pool will reach its limit. Once full, the microcluster with the worst performance is replaced with the new microcluster. Through this cyclical update, the classification model can effectively catch concept change, which leads to improved classification accuracy. Generally speaking, the smaller the value of alpha, the worse the performance of the microcluster. Therefore, the microcluster with the smallest alpha is selected for replacement, as in the sketch below.
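Phase 3 then reduces to a bounded-pool replacement policy keyed on alpha; a minimal sketch under the same assumptions:

```python
def update_pool(pool, new_mc, max_size=30):
    """Phase 3: insert a new microcluster, evicting the worst (smallest alpha) one when the pool is full."""
    if new_mc is None:
        return                                   # nothing to add this round
    if len(pool) < max_size:
        pool.append(new_mc)                      # pool not full: just add
    else:
        worst = min(range(len(pool)), key=lambda i: pool[i].alpha)
        pool[worst] = new_mc                     # replace the worst-performing microcluster
```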

4.3. Algorithm and Complexity Analysis. Summarizing the above phases and scenarios of the data stream classification model, the algorithm of microcluster-based incremental ensemble classification, named MCBIE, is expressed in Algorithm 1.

The MCBIE algorithm includes three phases, which achieve three functions, namely, classification, incremental learning, and updating the pool. Line 1 trains the initial microclusters and builds a pool of microclusters. Lines 3 to 6 perform the classification of an incoming instance X and update the performance of the microclusters. According to the three different scenarios, the function of Phase 2 is accomplished in lines 7 to 12. Finally, if the number of base classifiers reaches the upper bound M, the worst microcluster is deleted and the new microcluster is added to the microcluster pool; otherwise, the new microcluster is put directly into the microcluster pool. This is illustrated in lines 13 to 19.

In terms of complexity, through the analysis of Algorithm 1, we see that the core operation of the MCBIE algorithm is calculating distances in the classification phase. The complexity here depends mainly on two aspects: the dimension of the instance (l) and the number of microclusters serving as base classifiers (M) in the ensemble model. Thus, the presented algorithm's time complexity is approximately O(l · M). In the presented algorithm, previous instances are not retained over time; only the statistical information of each microcluster, such as SS, LS, and Cid, is recorded, which saves storage memory.

5. Experiments

5.1. Datasets. To evaluate MCBIE, we conduct simulation experiments with two synthetic datasets. The two datasets selected are the Hyperplane data stream and the SEA data stream, taken from Massive Online Analysis (MOA) [36]. The Hyperplane data stream is designed to test for gradual drift, while the SEA data stream is designed to test for abrupt drift. These are among the most popular datasets in the data stream classification domain. Further details are as follows.

Hyperplane data stream [37]: in the l-dimensional space, a hyperplane comprises the set of points X that satisfy \sum_{i=1}^{l} a_i x_i = a_0 = \frac{1}{2} \sum_{i=1}^{l} a_i, where x_i represents the i-th dimension of X. Instances for which \sum_{i=1}^{l} a_i x_i \ge a_0 represent the positive class, and instances for which \sum_{i=1}^{l} a_i x_i < a_0 represent the negative class. A hyperplane in l-dimensional space can slowly rotate as its parameters change, simulating time-changing concepts. In this paper, the value of l is 10, 6 attributes carry concept drift, and 20,000 instances are generated. Three different noise ratios (20%, 25%, and 30%, respectively) are injected into the data stream.

SEA data stream [29]: the instances in this data stream are generated from three attributes with continuous values x_1, x_2, x_3 \in [0, 10]. When x_1 + x_2 \ge \theta is satisfied, the instance belongs to the positive class; otherwise, its label is negative. To simulate concept drift, the threshold value \theta \in \{8, 9, 7, 9.5\} changes over time; 5,000 instances are generated for each threshold value, so the whole SEA data stream comprises 20,000 instances. The SEA data stream with two different noise ratios (20% and 30%) is applied in this experiment to test abrupt drift (a sketch of such a generator follows).
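For illustration, a sketch of a SEA-style generator with label noise is given below; the threshold schedule and noise injection by random label flipping follow the description above, while the function name and parameters are our own assumptions rather than the MOA implementation.

```python
import random

def sea_stream(n_per_concept=5000, thetas=(8, 9, 7, 9.5), noise_ratio=0.2, seed=42):
    """Yield (x, y) pairs of a SEA-style stream: four concepts, labels flipped with probability noise_ratio."""
    rng = random.Random(seed)
    for theta in thetas:                      # abrupt drift at each threshold change
        for _ in range(n_per_concept):
            x = [rng.uniform(0, 10) for _ in range(3)]
            y = +1 if x[0] + x[1] >= theta else -1
            if rng.random() < noise_ratio:    # inject class noise by flipping the label
                y = -y
            yield x, y
```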

5.2. Baselines. The PA algorithmic framework [19] and the Hoeffding tree [38] are selected as baselines to compare with the presented method, MCBIE; these two approaches are frequently chosen as benchmarks in many studies [20, 22, 23, 38]. Moreover, as a well-known classical algorithm, the Hoeffding tree algorithm is integrated into the MOA platform [36]. Therefore, we have followed suit in our paper.

The PA algorithmic framework [19] is an online incremental learning framework for binary classification based on SVM. Given an instance X, the classification model outputs its prediction as follows:

\hat{y} = \operatorname{sign}(\vec{w} \cdot X), \qquad (6)

where \vec{w} represents a vector of weights and \hat{y} is the prediction for instance X.

After \hat{y} is output, the algorithm acquires the ground truth class label y and computes a loss value with the following equation:

ls = \max\{0, 1 - y \cdot (\vec{w} \cdot X)\}. \qquad (7)

The vector of weights \vec{w} is then updated using

\vec{w} = \vec{w} + \tau \cdot y \cdot X, \qquad (8)

where \tau \ge 0 is a Lagrange multiplier whose value is calculated by equation (9) in three different ways, namely, PA, PA-I, and PA-II:

\tau = \frac{ls}{\|X\|^{2}} \ (\text{PA}), \qquad \tau = \min\left\{C, \frac{ls}{\|X\|^{2}}\right\} \ (\text{PA-I}), \qquad \tau = \frac{ls}{\|X\|^{2} + 1/(2C)} \ (\text{PA-II}), \qquad (9)

where C is a positive parameter referred to as the aggressiveness parameter of the algorithm. A detailed outline of the derivation procedure can be found in [19].
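To make the baseline concrete, a linear (non-kernel) sketch of one PA round under equations (6)-(9) is shown below; the experiments actually use a Gaussian kernel variant, so this is only the core update rule, with our own function name and signature.

```python
import numpy as np

def pa_update(w, x, y, C=1.0, variant="PA-I"):
    """One passive-aggressive round: predict with eq. (6), suffer hinge loss (7), update via (8)-(9)."""
    x = np.asarray(x, dtype=float)
    y_hat = np.sign(np.dot(w, x))                    # prediction, equation (6)
    ls = max(0.0, 1.0 - y * np.dot(w, x))            # hinge loss, equation (7)
    sq = np.dot(x, x)                                # squared norm of the instance
    if variant == "PA":
        tau = ls / sq
    elif variant == "PA-I":
        tau = min(C, ls / sq)
    else:                                            # PA-II
        tau = ls / (sq + 1.0 / (2.0 * C))
    return w + tau * y * x, y_hat                    # weight update, equation (8)
```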

The Hoeffding tree [38] is a decision tree for online learning from high-volume data streams, and it is built from each instance in constant time. According to the Hoeffding bound, we can estimate the number of instances needed to split a tree node. The Hoeffding bound is independent of the distribution function that generates the instances. Moreover, the Hoeffding bound is used to construct a tree that approximates the one produced by batch learning. In light of its incremental nature, the Hoeffding tree is widely used in data stream classification.

5.3. Experiment Setup. Following the first-test-and-then-train principle [39], each incoming instance is first tested, and then the model is retrained with the incoming instance under an incremental paradigm. To assess the classification model's performance, the classification accuracy is computed every one hundred instances during the process of data stream classification.

Both our MCBIE and the baselines are initialized on the first 100 instances, and the model resulting from that initialization is used to predict the following instances in the data stream. In MCBIE, we use these 100 instances to train 6 initial microclusters as base classifiers using the k-means algorithm. At every time stamp, the three nearest microclusters of each incoming instance are selected to assert the label information. The maximum scale of the microcluster pool is 30; once full, a new microcluster takes the place of the worst-performing microcluster in the pool. We use the Weka package to implement the MCBIE algorithm. The Hoeffding tree algorithm (denoted HT) is run on the MOA platform with its parameters set to their default values. PA, PA-I, and PA-II with a Gaussian kernel are executed in MATLAB, with the constant C equal to 1.
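The evaluation protocol can then be summarized as a short prequential (first-test-then-train) loop; this sketch assumes the hypothetical helper functions from the earlier sketches and the parameter values above.

```python
def prequential_mcbie(stream, pool, window=100):
    """First-test-then-train evaluation: record accuracy every `window` instances."""
    accuracies, correct = [], 0
    for t, (x, y) in enumerate(stream, start=1):
        y_hat, nearest = classify(pool, x, k=3)       # test first (Phase 1)
        correct += int(y_hat == y)
        new_mc = incremental_update(x, y, nearest)    # then train (Phase 2)
        update_pool(pool, new_mc, max_size=30)        # Phase 3
        if t % window == 0:
            accuracies.append(correct / window)       # accuracy per 100 instances
            correct = 0
    return accuracies
```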

5.4. Experiment Result and Analysis. The simulation experiments are designed to evaluate MCBIE from two sides. First, we want to assess the sensitivity of the smoothing parameter σ; second, we want to justify the feasibility and validity of MCBIE.

5.4.1. Experiment 1: Sensitivity Analysis of the Smoothing Parameter. Following the experimental setup in Section 5.3, we examine the effect of the smoothing parameter σ in MCBIE from 0.1 to 1. When the smoothing parameter σ is either too big or too small, MCBIE's average accuracy and the corresponding standard deviation do not reach the desired result on the Hyperplane data stream and the SEA data stream.

Figure 4: Three different scenarios. (a) The incoming instance's label is the same as the nearest microcluster's label; (b) the incoming instance's label differs from the nearest microcluster's label, which represents a noisy instance; and (c) the incoming instance, as a new concept, does not drop into the nearest microcluster, and its label is different from the nearest microcluster's label. Note: the color represents the class label, the rectangle denotes the incoming instance, and the circle represents the nearest microcluster.


Through observation and analysis, we find that the smoothing parameter σ regulates the balance between the historical and new instances used to compute the centroid of the microcluster. When σ = 0, the centroid of the microcluster does not move, and only its radius changes. On the contrary, when σ = 1, its maximum value, the microcluster's centroid is the mean of the instances, which implies that all instances contribute equally to the centroid. However, because concept drift occurs in nonstationary data stream environments, instances at different time stamps should contribute differently to the centroid of the microcluster. The experiment results justify this viewpoint. According to the analysis of the experiment results, we conclude that the best value of σ lies in the interval [0.6, 0.7]; hence, we chose σ = 0.65 for the subsequent experiments.

5.4.2. Experiment 2: Feasibility and Validity of MCBIE. All the experimental results for both the Hyperplane and SEA data streams are shown in Table 1, where the maximum value in each column is marked in bold.

From Table 1, we see that the average accuracy of MCBIE reaches the highest values of 69.6%, 64.8%, and 61.8% on the Hyperplane data stream with noise ratios of 20%, 25%, and 30%, respectively. The corresponding standard variances of the three average accuracies are 0.051, 0.046, and 0.047, which are relatively low compared with the baselines. On average, MCBIE provides the most accurate classifications with the least standard variance among all the baselines on the Hyperplane data stream. On the SEA data stream, the average classification accuracy of MCBIE is 70.8% at 20% noise and 63.2% at 30% noise, respectively. Again, the standard variances are the smallest compared to all baselines, which demonstrates MCBIE's stability in nonstationary data streams with noise. Based on the above experiment results, we may draw the conclusion that MCBIE is an effective and well-performing classification approach.

Further analysis of the experiment results in Table 1 reveals some interesting phenomena. As the noise ratio grows, MCBIE's performance improves relative to the other methods. For instance, with a noise ratio of 20% on the SEA data stream, MCBIE ranks only second, behind PA-I; however, at 30% noise, MCBIE becomes the most accurate model. Given the same noise ratios, the experiment results show that the classification models perform better on the SEA data stream than on the Hyperplane data stream. This suggests that it is more difficult for a classification model to learn knowledge from gradual drift than from abrupt drift. Of all the baselines, PA-I provides the best performance, which indicates that selecting an appropriate learning ratio τ is very important for incremental learning. The Hoeffding tree baseline has the largest standard variance, which shows that the Hoeffding tree is unstable.

Last but not least, we want to show that MCBIE adapts well to concept drift. Figures 5 and 6 illustrate the accuracy curves for the MCBIE method on the Hyperplane and SEA data streams. Figure 5 suggests that MCBIE can tackle concept drift in time on the Hyperplane data stream with the different noise ratios. When concept drift occurs, the curve in Figure 5 descends sharply, indicating that the concept captured by the model is inconsistent with the current concept; MCBIE's accuracy decreases when the concept changes. However, the performance of MCBIE improves immediately after the model is retrained and updated with the incoming instances through incremental learning. The intensity of the ascents and descents in Figure 5 reflects the classification model's ability to catch concept drift. In Figure 6, similar phenomena can be seen on the SEA data stream.

To demonstrate MCBIE's superiority based on the above analysis, we choose the two best methods, MCBIE and PA-I, to illustrate the ability to perform the prediction task in a streaming data setting with noise and concept drift. The accuracy curves are plotted in Figures 7 and 8. From Figure 7, the accuracy curves suggest that these two methods have the ability to keep track of concept drift and show clearly that our method is superior to PA-I in terms of accuracy on the Hyperplane data stream with the three different noise ratios.

Table 1: Average classification accuracy and standard variance.

Method | Hyperplane, 20% noise | Hyperplane, 25% noise | Hyperplane, 30% noise | SEA, 20% noise | SEA, 30% noise
PA    | 0.622 ± 0.056 | 0.581 ± 0.054 | 0.548 ± 0.048 | 0.661 ± 0.061 | 0.577 ± 0.056
PA-I  | 0.678 ± 0.049 | 0.626 ± 0.050 | 0.582 ± 0.045 | 0.726 ± 0.060 | 0.617 ± 0.061
PA-II | 0.646 ± 0.053 | 0.599 ± 0.053 | 0.558 ± 0.047 | 0.688 ± 0.062 | 0.595 ± 0.059
HT    | 0.582 ± 0.067 | 0.567 ± 0.061 | 0.556 ± 0.060 | 0.705 ± 0.068 | 0.625 ± 0.059
MCBIE | 0.696 ± 0.051 | 0.648 ± 0.046 | 0.618 ± 0.047 | 0.708 ± 0.053 | 0.632 ± 0.046

Figure 5: The accuracy curve for the MCBIE method on the Hyperplane data stream (20%, 25%, and 30% noise).


Figure 6: The accuracy curve for the MCBIE method on the SEA data stream (20% and 30% noise).

Figure 7: The accuracy curves for the two best methods (MCBIE and PA-I) over the Hyperplane data streams. (a) Hyperplane with 20% noise. (b) Hyperplane with 25% noise. (c) Hyperplane with 30% noise.


Moreover, in all three cases, the maximum and minimum accuracy of MCBIE are higher than those of PA-I. Analyzing the accuracy curves on the SEA data stream with respect to the ability to adapt to concept drift, the two methods appear equally capable of handling nonstationary data stream classification, as demonstrated in Figure 8. Furthermore, as the noise ratio grows, MCBIE outperforms PA-I, for example, in terms of stability.

Figure 8: The accuracy curves for the two best methods (MCBIE and PA-I) over the SEA data streams. (a) SEA with 20% noise. (b) SEA with 30% noise.

From these analyses, we conclude that the MCBIE method is able to conduct nonstationary data stream classification with high accuracy in environments characterized by both concept drift and noise.

6. Conclusions

The classification task in nonstationary data streams faces two main problems, concept drift and noise, which require the classification model not only to cope with concept drift but also to differentiate noise from concept drift. In order to deal with these problems, a novel method named MCBIE was proposed, which can achieve the classification task in nonstationary data streams with noise. Aiming to enhance MCBIE's performance, three strategies are used to alleviate the influence of concept drift and noise. In this paper, incremental learning helps the microcluster, as a classifier, catch concept change quickly, and the ensemble strategy alleviates the disturbance between noise and concept drift. The function of the smoothing parameter is to absorb useful information from historical knowledge. Compared with the baseline methods, the experiment results justify that our method, MCBIE, has the ability to perform classification in nonstationary streaming data settings. However, three problems are worthy of further concern: (1) the noise recognition ability of our method in abrupt drift environments needs to be further strengthened; (2) in addition to accuracy, the stability of the model needs to be improved; and (3) when a concept reoccurs, it is important to design more appropriate strategies for the replacement of microclusters.

Data Availability

The data used to support the findings of this study have been deposited in the GitHub repository (https://github.com/FanzhenLiu/ComplexityJournal).

Conflicts of Interest

The authors declare that there are no conflicts of interest regarding the publication of this paper.

Acknowledgments

This study was supported by the Natural Science Foundation of Anhui Province (nos. 1608085MF147 and 1908085MF183), the Humanities and Social Science Foundation of the Ministry of Education (no. 18YJA630114), a Major Project of Natural Science Research in the Colleges and Universities of Anhui Province (no. KJ2019ZD15), MQNS (no. 9201701203), MQEPS (no. 96804590), MQRSG (no. 95109718), and the Investigative Analytics Collaborative Research Project between Macquarie University and Data61 CSIRO.

References

[1] G. Ditzler, M. Roveri, C. Alippi, and R. Polikar, "Learning in nonstationary environments: a survey," IEEE Computational Intelligence Magazine, vol. 10, no. 4, pp. 12-25, 2015.

[2] A. Jadhav, A. Jadhav, P. Jadhav, and P. Kulkarni, "A novel approach for the design of network intrusion detection system (NIDS)," in Proceedings of the 2013 International Conference on Sensor Network Security Technology and Privacy Communication System, IEEE, New York, NY, USA, pp. 22-27, December 2013.

[3] A. Salazar, G. Safont, A. Soriano, and L. Vergara, "Automatic credit card fraud detection based on non-linear signal processing," in Proceedings of the 2012 IEEE International Carnahan Conference on Security Technology, IEEE, Newton, MA, USA, pp. 207-212, October 2012.

[4] T. Bujlow, T. Riaz, and J. M. Pedersen, "A method for classification of network traffic based on C5.0 machine learning algorithm," in Proceedings of the 2012 International Conference on Computing, Networking and Communications, IEEE, Maui, HI, USA, pp. 237-241, February 2012.

[5] L. Gao, J. Wu, C. Zhou, and Y. Hu, "Collaborative dynamic sparse topic regression with user profile evolution for item recommendation," in Proceedings of the Thirty-First AAAI Conference on Artificial Intelligence, New York, NY, USA, February 2017.

[6] S. Xue, J. Lu, and G. Zhang, "Cross-domain network representations," Pattern Recognition, vol. 94, pp. 135-148, 2019.

[7] S. Ren, W. Zhu, B. Liao et al., "Selection-based resampling ensemble algorithm for nonstationary imbalanced stream data learning," Knowledge-Based Systems, vol. 163, pp. 705-722, 2019.

[8] J. Sun, H. Fujita, P. Chen, and H. Li, "Dynamic financial distress prediction with concept drift based on time weighting combined with Adaboost support vector machine ensemble," Knowledge-Based Systems, vol. 120, pp. 4-14, 2017.

[9] T. Zhai, Y. Gao, H. Wang, and L. Cao, "Classification of high-dimensional evolving data streams via a resource-efficient online ensemble," Data Mining and Knowledge Discovery, vol. 31, no. 5, pp. 1242-1265, 2017.

[10] W.-X. Lu, C. Zhou, and J. Wu, "Big social network influence maximization via recursively estimating influence spread," Knowledge-Based Systems, vol. 113, pp. 143-154, 2016.

[11] Y. Zhang, J. Wu, C. Zhou, and Z. Cai, "Instance cloned extreme learning machine," Pattern Recognition, vol. 68, pp. 52-65, 2017.

[12] J. Wu, Z. Cai, S. Zeng, and X. Zhu, "Artificial immune system for attribute weighted naive Bayes classification," in Proceedings of the 2013 International Joint Conference on Neural Networks (IJCNN), IEEE, Dallas, TX, USA, pp. 1-8, August 2013.

[13] J. Wu, S. Pan, X. Zhu, C. Zhang, and X. Wu, "Multi-instance learning with discriminative bag mapping," IEEE Transactions on Knowledge and Data Engineering, vol. 30, no. 6, pp. 1065-1080, 2018.

[14] P. ZareMoodi, S. K. Siahroudi, and H. Beigy, "A support vector based approach for classification beyond the learned label space in data streams," in Proceedings of the 31st Annual ACM Symposium on Applied Computing, ACM, Pisa, Italy, pp. 910-915, April 2016.

[15] S. Ramirez-Gallego, B. Krawczyk, S. Garcia, M. Wozniak, J. M. Benitez, and F. Herrera, "Nearest neighbor classification for high-speed big data streams using Spark," IEEE Transactions on Systems, Man, and Cybernetics: Systems, vol. 47, no. 10, pp. 2727-2739, 2017.

[16] J. Gama, R. Fernandes, and R. Rocha, "Decision trees for mining data streams," Intelligent Data Analysis, vol. 10, no. 1, pp. 23-45, 2006.

[17] H. L. Hammer, A. Yazidi, and B. J. Oommen, "On the classification of dynamical data streams using novel 'Anti-Bayesian' techniques," Pattern Recognition, vol. 76, pp. 108-124, 2018.

[18] M. A. Maloof and R. S. Michalski, "Incremental learning with partial instance memory," Artificial Intelligence, vol. 154, no. 1-2, pp. 95-126, 2004.

[19] K. Crammer, O. Dekel, J. Keshet, S. Shalev-Shwartz, and Y. Singer, "Online passive-aggressive algorithms," Journal of Machine Learning Research, vol. 7, pp. 551-585, 2006.

[20] M. Tennant, F. Stahl, O. Rana, and J. B. Gomes, "Scalable real-time classification of data streams with concept drift," Future Generation Computer Systems, vol. 75, pp. 187-199, 2017.

[21] J. Read, A. Bifet, B. Pfahringer, and G. Holmes, "Batch-incremental versus instance-incremental learning in dynamic and evolving data," in Proceedings of the International Symposium on Intelligent Data Analysis, Springer, Helsinki, Finland, pp. 313-323, October 2012.

[22] J. Lu, D. Sahoo, P. Zhao, and S. C. Hoi, "Sparse passive-aggressive learning for bounded online kernel methods," ACM Transactions on Intelligent Systems and Technology, vol. 9, no. 4, p. 45, 2018.

[23] M. Oide, A. Takahashi, T. Abe, and T. Suganuma, "User-oriented video streaming service based on passive aggressive learning," International Journal of Software Science and Computational Intelligence, vol. 9, no. 1, pp. 35-54, 2017.

[24] B. Krawczyk and M. Wozniak, "One-class classifiers with incremental learning and forgetting for data streams with concept drift," Soft Computing, vol. 19, no. 12, pp. 3387-3400, 2015.

[25] S. Xu and J. Wang, "A fast incremental extreme learning machine algorithm for data streams classification," Expert Systems with Applications, vol. 65, pp. 332-344, 2016.

[26] Y. Lu, Y.-M. Cheung, and Y. Y. Tang, "Dynamic weighted majority for incremental learning of imbalanced data streams with concept drift," in Proceedings of the 2017 International Joint Conference on Artificial Intelligence, pp. 2393-2399, Melbourne, Australia, August 2017.

[27] Y. Zhang, J. Yu, W. Liu, and K. Ota, "Ensemble classification for skewed data streams based on neural network," International Journal of Uncertainty, Fuzziness and Knowledge-Based Systems, vol. 26, 2018.

[28] B. Krawczyk, L. L. Minku, J. Gama, J. Stefanowski, and M. Wozniak, "Ensemble learning for data stream analysis: a survey," Information Fusion, vol. 37, pp. 132-156, 2017.

[29] W. N. Street and Y. Kim, "A streaming ensemble algorithm (SEA) for large-scale classification," in Proceedings of the Seventh ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, ACM, San Francisco, CA, USA, pp. 377-382, August 2001.

[30] H. Wang, W. Fan, P. S. Yu, and J. Han, "Mining concept-drifting data streams using ensemble classifiers," in Proceedings of the Ninth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, ACM, Washington, DC, USA, pp. 226-235, December 2003.

[31] J. R. B. Junior and M. do Carmo Nicoletti, "An iterative boosting-based ensemble for streaming data classification," Information Fusion, vol. 45, pp. 66-78, 2019.

[32] K. Jackowski, "New diversity measure for data stream classification ensembles," Engineering Applications of Artificial Intelligence, vol. 74, pp. 23-34, 2018.

[33] G. I. Webb, R. Hyde, H. Cao, H. L. Nguyen, and F. Petitjean, "Characterizing concept drift," Data Mining and Knowledge Discovery, vol. 30, no. 4, pp. 964-994, 2016.

[34] A. Tsymbal, "The problem of concept drift: definitions and related work," Computer Science Department, vol. 106, no. 2, 2004.

[35] T. Zhang, R. Ramakrishnan, and M. Livny, "BIRCH: an efficient data clustering method for very large databases," ACM SIGMOD Record, vol. 25, no. 2, pp. 103-114, 1996.

[36] A. Bifet, G. Holmes, R. Kirkby, and B. Pfahringer, "MOA: massive online analysis," Journal of Machine Learning Research, vol. 11, pp. 1601-1604, 2010.

[37] G. Hulten, L. Spencer, and P. Domingos, "Mining time-changing data streams," in Proceedings of the Seventh ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, ACM, San Francisco, CA, USA, pp. 97-106, December 2001.

[38] P. Domingos and G. Hulten, "Mining high-speed data streams," in Proceedings of the Fifth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, vol. 2, p. 4, Boston, MA, USA, April 2000.

[39] A. Bifet, G. Holmes, B. Pfahringer, and R. Gavalda, "Improving adaptive bagging methods for evolving data streams," in Proceedings of the 2009 Asian Conference on Machine Learning, Springer, Berlin, Germany, pp. 23-37, November 2009.

12 Complexity

Page 3: Microcluster-BasedIncrementalEnsembleLearningforNoisy ...downloads.hindawi.com/journals/complexity/2020/6147378.pdf · concept drift in nonstationary streaming data with noise. Meanwhile,inordertocatchconceptdrift,theclassification

instead of adding just one -e experimental results suggestthat the iterative boosting ensemble classification method isa promising way to perform classification task in nonsta-tionary data stream environment Beyond concept driftimbalanced class distributions are another challenge withdata stream classification that can be tackled with ensemblelearning Zhang et alrsquos [27] method of dealing with thisproblem is a two-pronged approach -e first tack is todivide themajority into N subsets of roughly the same size asthe minority and then construct N new balanced trainingsubsets from the minority and divided subsets Next theensemble model is created using a neural network withbackpropagation as the base learning algorithm -e baseclassifiersrsquo diversity is one of the important factors oflearning system Hence Jackowski [32] introduced the ideaof two error trend diversity measurements pair errors andpool errors to find and keep track with concept drift instreaming data setting Experiments with this model showthat the diversity measurements can not only be used toenhance the ensemble modelrsquos performance but also to holdeffectively the scale of ensemble model Based on the aboveanalysis we think that ensemble learning is currently themostpromising research direction for data stream classification

From this review we distill several observations in-cremental learning can dynamically reveal new knowledge indata streams Ensemble learning can improve the stability ofclassification models for nonstationary data streams -esuitable algorithm can enhance a classification modelrsquosflexibility -ese three observations form the basis of threeintegrated strategies in our method for simultaneouslytackling concept drift and noise

3 Basic Concept and Problem Definition

-is section firstly begins with a description of the basicconcepts used in this paper and then a detailed analysis ofthe research problem is explored

31 Data Stream According to the related studies in thispaper we think that data stream consists of a series of labeledinstances namely S X1X2 Xt 1113864 1113865 whereXt (X y) in which X stands for a feature vector whichrepresents an instance characterizing the features of anobject and y is Xtrsquos class label When y is +1 Xt representspositive instance On the contrary Xt is negative instance

According to the above definition we explore a mappingfunction f X⟶ y with high accuracy which stands forclassification model that can output the incoming instanceXrsquos class label Only supervised learning is considered in thispaper -erefore the classification model f is constructedfrom a labeled dataset and once built it can output the classlabel +1 or minus1 for the incoming instance In addition for thepurposes of this paper the real label is acquired after themapping function f outputs the prediction of incominginstance

32 Concept Drift According to the work [33] when a jointprobability distribution P of data changes evolving over

time there exists concept drift In other wordsPt(X y)nePt+1(X y) where the subscript t stands for thetime stamp X suggests the vector which represents the valueof feature attribute and y is a class label According to thechanging rate of concept gradual drift and abrupt drift [34]are discussed Generally speaking gradual drift is a slowerrate of change from one concept to another one and it isillustrated in Figure 1(a) When the distribution Pt isabruptly differentfrom the distribution Pt+1 at t + 1 we saythat abrupt drift occurs and it is seen in Figure 1(b) InFigure 1 the difference between gradual drift and abruptdrift is clearly found and these two kinds of drift areconcerned in this paper

33 Problem Definition Noisy instances and concept driftappear to have similar distributions in nonstationary datastreams It is therefore critical to differentiate noise fromconcept drift and that is the motivation of this paper to builda classification model that can find and keep track withconcept drift in nonstationary streaming data with noiseMeanwhile in order to catch concept drift the classificationmodel should be updated by incremental learning -eresearch problem is demonstrated in Figure 2

From Figure 2 we understand clearly the problemdefinition of this paper and identify the noisy instance innonstationary streaming data -e dotted circle represents amicrocluster and the dotted straight line suggests the dis-tribution for instances in Figure 2-e current case is shownin Figure 2(a) When time goes on the instance is comingand the microcluster is updated at time stamp t + 1 as seenfrom Figure 2(b) which represents the case of incrementallearning At time stamp t + 2 in Figure 2(c) the incominginstance with positive class label lies in the old microclusterwith a different class label In this case the new instance isregarded as a noisy instance and will be discarded this is whythis instance no longer exists at time stamp t + 3 InFigure 2(d) the incoming instance forms a concept drift andleads to a new microcluster construction

Based on the above analysis our solution involves threestrategies to deal with the research problem as illustrated inFigure 2 incremental learning to track concept drift en-semble learning to enhance the modelrsquos stability andmicroclustering method to distinguish drift from noise andmake the final label predictions In the next section weoutline these strategies in detail and discuss the three sce-narios illustrated in Figure 2

4 Adaptive Incremental Ensemble Data StreamClassification Method

-is section describes microcluster and data stream classi-fication model followed by the corresponding algorithm

41 Definition of Microcluster Microclusters as classifiers inour model are constructed by cluster features which is atechnique that was originally developed as part of hierar-chical cluster analysis [35] -e structure of cluster feature is

Complexity 3

defined as CF langSS LS nrang Based on the cluster feature wegive the definition of microcluster used in this paper

Definition 1 Microcluster (MC) is represented aslangSS LS n Ci d CL αrang where SS and LS are used to computethe boundary of MC that SS denotes the square sum of theattributes of the instances in MC as calculated in equation(1) and LS is a vector saves the sum of each attribute as inequation (2) n suggests the number of instances Ci d

presents MCrsquos centroid which changes over time as shownin equation (3) CL is MCrsquos class label and α counts thenumber that MC correctly classifies incoming instance andα is initiated as 0

SS 1113944n

i

1113944

l

j

x2ij (1)

LS 1113944n

i

xi1 1113944n

i

xij 1113944n

i

xil⎛⎝ ⎞⎠ (2)

where l is the dimension of the instance

Cid (1 minus σ) times Cidtminus1 + σ times(LSn) (3)

where Cidtminus1 is MCrsquos centroid on the previous time stampt minus 1 and σ isin [0 1] stands for smoothing parameter

-e size of MC is represented by clusterrsquos radius r whichis calculated as follows

r

SS

nminus

LS

n

21113971

(4)

where LSn represents the length of vector

42 Data Stream Classification Model Based on MicroclusterClassification model consists of three phases classificationincremental learning and updating A framework of themodel is given in Figure 3-e processes and calculations arepresented in detail in this part and summarized into thecorresponding algorithm presented as Algorithm 1

421 Phase 1 (Classification) 8e k-Nearest MicroclustersClassify the Incoming Instance When an incoming instancearrives Euclidean distance is computed between the in-coming instance and each microcluster in pool Based onEuclidean distances the k-nearest microclusters are selectedand then each microcluster will assign its own label to theincoming instance According to equation (5) the final labelof incoming instance is voted by the merged method

y argmaxj

1113944

c

j11113944

k

i1fi(x) (5)

where k stands for the number of microclusters participatingin the classification and c denotes the number of class

t

pt

t + 1

Pt+1

(a)

pt

t + 1

Pt+1

t

(b)

Figure 1 -e two types of concept drift (a) Gradual drift (b) Abrupt drift

Incremental updation ofmicroclusterMicrocluster

(a) (b) (c) (d)

Building new microcluster

t t + 1 t + 2 t + 3

Noisy instance

Figure 2 A demonstration of problem definition In data stream (a) microclusters (shape dotted circle) are developed by historicalinstances (color green) with positive class (shape circle) and negative class (shape triangle) (b) when new instance (color red shape circle)comes microcluster is updated (color red shape dotted circled) and (c) the old microcluster involves noisy instance (color red shapecircle) and (d) concept drift (color red shape circle) is detected and a new microcluster (color red shape dotted circle) is built

4 Complexity

Once incoming instance is classified microcluster isimmediately updated If the final prediction is correct ie ifall the microclusters who voted have the same class label asthe final prediction the value of α increases by 1 otherwiseit decreases by 1

422 Phase 2 (Incremental Learning) 8e Nearest Micro-cluster Will Be Updated Based on the Incoming InstanceFollowing the first-test-and-then-train principle thenearest microcluster is immediately updated to ensure themodel quickly adapts to the new concept or the new

Mediation rule for combining the output of k microclusters

hellip(xt yt) (xt+1 yt+1)hellip Output prediction

Select k microclusters

Phase 2 incremental learning

Phase 1 classify the incoming instance

Phase 3 updating the pool

Microcluster

Microcluster

Figure 3 -e framework of data stream classification model

Input -e instances S X1 X2 Xt 1113864 1113865the pool maximum limit M andthe smoothing parameter σ

Output -e pool of microcluster Plowast

(1) P(0) cupLi1MCi⟵ the pool of initial microclusters (MC) which is formed by k-means(2) for each instance Xt (X y) do

Phase 1 Classification(3) d⟵ distance between X and MC

(4) MC(t)⟵ select the k-nearest microclusters to classify the instance X

(5) 1113954y⟵ the predicted class label of instance X gained by majority vote in equation (5)(6) α⟵ update the parameter of the k-nearest microcluster

Phase 2 Incremental Learning(7) if Scenario 1 then(8) MC(t)lowast ⟵ update the structure of nearest microcluster by equations (1)ndash(3) and the number of the instances in

microcluster will be incremented by 1(9) else if Scenario 2 then(10) X⟵ consider the instance as a noisy point and neglect it(11) else if Scenario 3 then(12) MC

(t)X ⟵ build a new microcluster on instance X

Phase 3 Updating Pool(13) ifLltM then(14) P(t) P(t) + MC

(t)X

(15) L L + 1(16) else(17) MC

(t)worst⟵ the worst microcluster

(18) MC(t)X ⟵ replace MC

(t)worst

(19) end if(20) end if(21) end for(22) return Plowast ⟵ microcluster pool at required time stamp t

ALGORITHM 1 MCBIE

Complexity 5

microcluster is constructed in this phase which is depictedin Figure 4

Scenario 1when incoming instancersquos label is the same asthe nearest microclusterrsquos label incoming instance isused to retrain this microcluster -e terms SS LS andCi d of the nearest microcluster are recalculated byequations (1)ndash(3) -e number of instances in thismicrocluster is incremented by 1 -e radius of micro-cluster is also updated by equation (4) -is scenario isshown in Figure 4(a) As a matter of fact when the in-coming instance drops into the nearest microcluster wecarry out the same operation that is the incoming in-stance is merged into the nearest microclusterScenario 2 incoming instancersquos label varies from thenearest microclusterrsquos label and incoming instance liesinside the boundary of the nearest microcluster as seenfrom Figure 4(b) In this paper there exists the fun-damental assumption that two adjacent instances arehighly likely to represent the same concept ie theprobability that they share the same class label is veryhigh According to the fundamental assumption theincoming instance will be treated as noise and deletedScenario 3 in contrast to Scenario 2 incoming instancersquoslabel is different from the nearest microclusterrsquos label andincoming instance does not drop into the nearestmicrocluster as shown in Figure 4(c) -is scenariosuggests that incoming instance is derived from thedifferent joint probability distribution Under thiscircumstance we think new concept happens and amicrocluster will be constructed with incoming in-stance by the method described in Section 41 Becausethere is only one instance in this new microclusterwhen it is constructed its label CL will be the same asthe incoming instance and its centroid will be theincoming instance itself -e terms SS and LS of thenew microcluster are computed by equations (1) and(2) and the value of α is 0

423 Phase 3 (Updating) 8e Pool of Microcluster IsUpdated As time passes new microclusters are continu-ously being created and eventually the pool will reach itslimit Once full the microcluster with the worst performancewill be replaced with new microcluster By this cyclicalupdate the classification model can effectively catch conceptchange and it leads to improve the classification accuracyGenerally speaking the smaller the value of α the worse theperformance of themicrocluster-erefore the microclusterwith the smallest α is selected for replacement

43 Algorithm and Complexity Analysis In summary of theabove phases and scenarios in data stream classificationmodel the algorithm of microcluster-based incrementalensemble classification named as MCBIE is expressed inAlgorithm 1

-e algorithm of MCBIE includes three phases whichachieve three functions namely classification incrementallearning and updating pool Line 1 is to train the initial

microcluster and build a pool of microclusters Lines 3 to 6achieve the classification for an incoming instance X andupdate the performance of the microcluster According to thethree different scenarios the function of Phase 2 is accom-plished in lines 7 to 12 Finally the size of base classifierreaches the upper-bound M the worst microcluster will bedeleted and the new microcluster is added to microclusterpool On the contrary the new microcluster is directly putinto microcluster pool It is illustrated in lines 13 to 19

In terms of complexity through the analysis of Algo-rithm 1 we know the core operation included by the al-gorithm MCBIE is to calculate the distance in classificationphase -e complexity here depends on mainly two aspectsthe dimensions of the instance (l) and the number ofmicroclusters as base classifier (M) in the ensemble model-us the presented algorithmrsquos time complexity is ap-proximately O(l middot M) In the presented algorithm theprevious instances are not reserved over time and the sta-tistical information of microcluster is recorded such as SSLS and Ci d which can save the storage memory by this way

5 Experiments

51 Datasets To evaluate MCBIE we conduct simulationexperiments with two synthetic datasets -e two datasetsselected are the Hyperplane data stream and the SEA datastream taken from Massive Online Analysis (MOA) [36]Hyperplane data stream is designed to test for gradual driftwhile SEA data stream is designed to test for abrupt drift-ese are the most popular datasets in the data streamclassification domain Further details are as follows

Hyperplane data stream [37] in the l-dimensionalspace a hyperplane includes the point set X whichsatisfies 1113936

li1 aixi a0 1113936

li1 ai where xi represents

the i-th dimension of X Instances for which1113936

li1 aixi ge a0 represent positive class and instances for

which 1113936li1 aixi lt a0 represent negative class A hy-

perplane in l-dimensional space may slowly rotate bychanging the parameters for simulating time-changingconcepts In this paper the value of l is 10 and there are6 attributes with concept drift and it generates 20000instances -ree different noise ratios (respectively20 25 and 30) are injected into data streamSEA data stream [29] the instances in this data streamare generated from three attributes with continuousvalues x1 x2 x3 isin [0 10] When it satisfies x1 + x2 ge θthe instance is positive class otherwise the label ofinstance is negative To simulate concept drift thethreshold value θ 8 9 7 95 will change over timeIt generates 5000 instances with each threshold valueand the whole SEA data stream includes 20000 in-stances SEA data stream with two different noise ratios(20 and 30) is applied in this experiment to test theabrupt drift

52 Baselines -e PA algorithmic framework [19] andHoeffding tree [38] are selected as baselines to compare withthe presented methodMCBIE and these two approaches are

6 Complexity

frequently chosen as the benchmark in many studies[20 22 23 38] Moreover as a well-known classical algo-rithm the Hoeffding tree algorithm is integrated into theMOA platform [36] -erefore we have followed suit in ourpaper

The PA algorithmic framework [19] is an online incremental learning framework for binary classification based on SVM. Given an instance X, the classification model outputs the prediction

$\hat{y} = \operatorname{sign}(\vec{w} \cdot X)$, (6)

where $\vec{w}$ represents a vector of weights and $\hat{y}$ is the prediction for instance X.

After $\hat{y}$ is output, the algorithm acquires the ground-truth class label y and computes a loss value from

$l_s = \max\{0,\; 1 - y \cdot (\vec{w} \cdot X)\}$. (7)

The weight vector $\vec{w}$ is then updated using

$\vec{w} = \vec{w} + \tau \cdot y \cdot X$, (8)

where $\tau \ge 0$ is a Lagrange multiplier whose value is calculated by equation (9) in three different ways, corresponding to PA, PA-I, and PA-II:

$\tau = \dfrac{l_s}{\|X\|^2}$ (PA), $\quad \tau = \min\left\{C,\; \dfrac{l_s}{\|X\|^2}\right\}$ (PA-I), $\quad \tau = \dfrac{l_s}{\|X\|^2 + 1/(2C)}$ (PA-II), (9)

where C is a positive parameter referred to as the aggressiveness parameter of the algorithm. A detailed outline of the derivation can be found in [19].
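As a sketch, one round of the PA-I variant of equations (6)-(9) can be written in a few lines (simplified to the linear case; the paper's experiments run the kernelized form in MATLAB, the function name is ours, and y is assumed to be in {-1, +1}):

```python
import numpy as np

def pa1_update(w, x, y, C=1.0):
    """One PA-I round: hinge loss (eq. (7)), step size tau clipped at the
    aggressiveness C (eq. (9)), and the additive weight update (eq. (8))."""
    loss = max(0.0, 1.0 - y * np.dot(w, x))   # hinge loss on the current instance
    tau = min(C, loss / np.dot(x, x))         # PA-I step size
    return w + tau * y * x                    # updated weight vector
```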

The Hoeffding tree [38] is a decision tree for online learning from high-volume data streams, built from each instance in constant time. The Hoeffding bound lets us estimate the number of instances needed before a tree node may be split, and it is independent of the distribution function that generates the instances. A tree constructed with the Hoeffding bound closely approximates the one that batch learning would produce. In light of its incremental nature, the Hoeffding tree is widely used in data stream classification.
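For reference, the Hoeffding bound in question states that, after $n$ independent observations of a random variable with range $R$, the sample mean lies within

$$\epsilon = \sqrt{\frac{R^{2}\ln(1/\delta)}{2n}}$$

of the true mean with probability at least $1 - \delta$; because this guarantee is distribution-free, [38] uses it to decide how many instances suffice before splitting a node.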

5.3. Experiment Setup. Following the first-test-and-then-train principle [39], each incoming instance is first tested, and then the model is retrained with that instance under an incremental paradigm. To assess the classification model's performance, classification accuracy is computed every one hundred instances during the data stream classification process.

Both our MCBIE and the baselines are initialized on the first 100 instances, and the model resulting from that initialization is used to predict the following instances in the data stream. In MCBIE, we use these 100 instances to train 6 initial microclusters as base classifiers via the k-means algorithm. At every time stamp, the three nearest microclusters of each incoming instance are selected to assert the label information. The maximum size of the microcluster pool is 30; once the pool is full, a new microcluster takes the place of the worst-performing one. We use the Weka package to implement the MCBIE algorithm. The Hoeffding tree algorithm (denoted HT) is run on the MOA platform with its parameters set to their default values. PA, PA-I, and PA-II with a Gaussian kernel are executed in MATLAB, with the constant C equal to 1.
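The evaluation loop can be summarized by the following sketch (illustrative only; `predict` and `learn` are placeholder methods standing in for the respective Weka, MOA, and MATLAB implementations):

```python
def prequential(stream, model, window=100):
    """First-test-then-train: each instance is predicted before the model
    learns from it; accuracy is recorded per window of 100 instances."""
    correct, seen, accuracies = 0, 0, []
    for x, y in stream:
        correct += int(model.predict(x) == y)   # test first
        model.learn(x, y)                       # then train
        seen += 1
        if seen % window == 0:
            accuracies.append(correct / window)
            correct = 0
    return accuracies
```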

5.4. Experiment Results and Analysis. The simulation experiments are designed to evaluate MCBIE on two fronts: first, to assess the sensitivity of the smoothing parameter σ; second, to justify the feasibility and validity of MCBIE.

5.4.1. Experiment 1: Sensitivity Analysis of the Smoothing Parameter. Following the experimental setup in Section 5.3, we examine the effect of the smoothing parameter σ in MCBIE over the range 0.1 to 1. When σ is either too large or too small, MCBIE's average accuracy and corresponding standard deviation fall short of the desired result on both the Hyperplane and SEA data streams.


Figure 4: Three different scenarios. (a) The incoming instance's label is the same as the nearest microcluster's label. (b) The incoming instance's label differs from the nearest microcluster's label, representing a noisy instance. (c) The incoming instance, as a new concept, does not fall into the nearest microcluster and its label differs from the nearest microcluster's label. Note: color represents the class label, the rectangle indicates the incoming instance, and the circle represents the nearest microcluster.


Through observation and analysis, we find that the smoothing parameter σ regulates the balance between the historical and new instances used to compute the centroid of a microcluster. When σ = 0, the centroid of the microcluster does not move and only its radius changes. At the other extreme, when σ = 1, the microcluster's centroid is simply the mean of its instances, implying that all instances contribute equally to the centroid. However, because concept drift occurs in nonstationary data stream environments, instances at different time stamps should contribute differently to the centroid. The experimental results support this viewpoint. Based on our analysis of the results, we conclude that the best value of σ lies in the interval [0.6, 0.7]; hence we chose σ = 0.65 for the subsequent experiments.
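Concretely, the σ-controlled centroid update defined in equation (3) is an exponential smoothing of the microcluster mean; the one-line sketch below (function name ours) makes the two extremes visible: σ = 0 freezes the centroid, while σ = 1 reduces it to the plain mean LS/n.

```python
def update_centroid(c_prev, ls, n, sigma=0.65):
    """Equation (3): C_id = (1 - sigma) * C_id(t-1) + sigma * (LS / n).
    sigma = 0 keeps the previous centroid; sigma = 1 weights all
    historical instances equally via the plain mean LS / n."""
    return [(1 - sigma) * c + sigma * (s / n) for c, s in zip(c_prev, ls)]
```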

5.4.2. Experiment 2: Feasibility and Validity of MCBIE. All the experimental results for both the Hyperplane and SEA data streams are shown in Table 1. The best value in each column belongs to MCBIE, except on the SEA data stream at 20% noise, where PA-I leads.

From Table 1, we see that the average accuracy of MCBIE reaches the highest values of 69.6%, 64.8%, and 61.8% on the Hyperplane data stream with noise ratios of 20%, 25%, and 30%, respectively. The corresponding standard deviations of these three average accuracies are 0.051, 0.046, and 0.047, which are relatively low compared with the baselines. On average, MCBIE provides the most accurate classifications with the smallest deviation of all methods on the Hyperplane data stream. On the SEA data stream, the average classification accuracy of MCBIE is 70.8% at 20% noise and 63.2% at 30% noise. Again, the standard deviations are the smallest of all methods, which demonstrates MCBIE's stability in nonstationary data streams with noise. Based on these results, we conclude that MCBIE is an effective and well-performing classification approach.

Further analysis of the results in Table 1 reveals some interesting phenomena. As the noise ratio grows, MCBIE's performance degrades less than that of the other methods. For instance, with a noise ratio of 20% on the SEA data stream, MCBIE ranks only second, behind PA-I; at 30% noise, however, MCBIE becomes the most accurate model. Given the same noise ratios, the results show that the classification models perform better on the SEA data stream than on the Hyperplane data stream, which suggests that it is more difficult for a classification model to learn from gradual drift than from abrupt drift. Of all the baselines, PA-I provides the best performance, indicating that selecting an appropriate learning rate τ is very important for incremental learning. The Hoeffding tree baseline has the largest standard deviation, which reveals its instability.

Last but not least, we want to show that MCBIE adapts well to concept drift. Figures 5 and 6 illustrate the accuracy curves of the MCBIE method on the Hyperplane and SEA data streams. Figure 5 suggests that MCBIE can tackle concept drift promptly on the Hyperplane data stream at the different noise ratios. When concept drift occurs, the curve in Figure 5 descends sharply, indicating that the concept captured by the model is inconsistent with the current concept: MCBIE's accuracy decreases when the concept changes. However, performance recovers immediately after the model is retrained and updated with incoming instances through incremental learning. The intensity of ascent and descent in Figure 5 reflects the classification model's ability to catch concept drift. Figure 6 shows similar phenomena on the SEA data stream.

To demonstrate MCBIE's superiority based on the above analysis, we choose the two best methods, MCBIE and PA-I, to illustrate their ability to perform prediction in a streaming setting with noise and concept drift. The accuracy curves are plotted in Figures 7 and 8. From Figure 7, the accuracy curves suggest that both methods can keep track of concept drift, and they show clearly that our method is superior to PA-I in terms of accuracy on the Hyperplane data stream at all three noise ratios.

Table 1: Average classification accuracy and standard deviation.

Method   Hyperplane 20% noise   Hyperplane 25% noise   Hyperplane 30% noise   SEA 20% noise    SEA 30% noise
PA       0.622 ± 0.056          0.581 ± 0.054          0.548 ± 0.048          0.661 ± 0.061    0.577 ± 0.056
PA-I     0.678 ± 0.049          0.626 ± 0.050          0.582 ± 0.045          0.726 ± 0.060    0.617 ± 0.061
PA-II    0.646 ± 0.053          0.599 ± 0.053          0.558 ± 0.047          0.688 ± 0.062    0.595 ± 0.059
HT       0.582 ± 0.067          0.567 ± 0.061          0.556 ± 0.060          0.705 ± 0.068    0.625 ± 0.059
MCBIE    0.696 ± 0.051          0.648 ± 0.046          0.618 ± 0.047          0.708 ± 0.053    0.632 ± 0.046

Figure 5: The accuracy curve of the MCBIE method on the Hyperplane data stream (20%, 25%, and 30% noise).


Figure 6: The accuracy curve of the MCBIE method on the SEA data stream (20% and 30% noise).

Figure 7: The accuracy curves of the two best methods (MCBIE and PA-I) over Hyperplane data streams: (a) Hyperplane with 20% noise; (b) Hyperplane with 25% noise; (c) Hyperplane with 30% noise.


Moreover, in all three cases, the maximum and minimum accuracy of MCBIE are higher than those of PA-I. Regarding the ability to adapt to concept drift on the SEA data stream, the two methods appear equally capable of handling nonstationary data stream classification, as demonstrated in Figure 8. Moreover, as the noise ratio grows, MCBIE outperforms PA-I in aspects such as stability.

Figure 8: The accuracy curves of the two best methods (MCBIE and PA-I) over SEA data streams: (a) SEA with 20% noise; (b) SEA with 30% noise.

From these analyses, we conclude that the MCBIE method is able to conduct nonstationary data stream classification with high accuracy in environments characterized by both concept drift and noise.

6. Conclusions

Classification in nonstationary data streams faces two main problems, concept drift and noise, which require the classification model not only to cope with concept drift but also to differentiate noise from drift. To deal with these problems, a novel method named MCBIE was proposed that can perform the classification task in nonstationary data streams with noise. To enhance MCBIE's performance, three strategies are used to alleviate the influence of concept drift and noise: incremental learning helps the microcluster classifiers catch concept change quickly; the ensemble strategy alleviates the confusion between noise and concept drift; and the smoothing parameter absorbs useful information from historical knowledge. Compared with the baseline methods, the experimental results justify that MCBIE can perform classification in nonstationary streaming settings. However, three problems deserve further attention: (1) the noise recognition ability of our method in abrupt drift environments needs to be strengthened; (2) in addition to accuracy, the stability of the model needs to be improved; (3) when concepts recur, it is important to design more appropriate strategies for microcluster replacement.

Data Availability

The data used to support the findings of this study have been deposited in the GitHub repository (https://github.com/FanzhenLiu/ComplexityJournal).

Conflicts of Interest

The authors declare that there are no conflicts of interest regarding the publication of this paper.

Acknowledgments

This study was supported by the Natural Science Foundation of Anhui Province (nos. 1608085MF147 and 1908085MF183), the Humanities and Social Science Foundation of the Ministry of Education (no. 18YJA630114), a Major Project of Natural Science Research in the Colleges and Universities of Anhui Province (no. KJ2019ZD15), MQNS (no. 9201701203), MQEPS (no. 96804590), MQRSG (no. 95109718), and the Investigative Analytics Collaborative Research Project between Macquarie University and Data61 CSIRO.

References

[1] G. Ditzler, M. Roveri, C. Alippi, and R. Polikar, "Learning in nonstationary environments: a survey," IEEE Computational Intelligence Magazine, vol. 10, no. 4, pp. 12–25, 2015.



[2] A. Jadhav, A. Jadhav, P. Jadhav, and P. Kulkarni, "A novel approach for the design of network intrusion detection system (NIDS)," in Proceedings of the 2013 International Conference on Sensor Network Security Technology and Privacy Communication System, IEEE, New York, NY, USA, pp. 22–27, December 2013.
[3] A. Salazar, G. Safont, A. Soriano, and L. Vergara, "Automatic credit card fraud detection based on non-linear signal processing," in Proceedings of the 2012 IEEE International Carnahan Conference on Security Technology, IEEE, Newton, MA, USA, pp. 207–212, October 2012.
[4] T. Bujlow, T. Riaz, and J. M. Pedersen, "A method for classification of network traffic based on C5.0 machine learning algorithm," in Proceedings of the 2012 International Conference on Computing, Networking and Communications, IEEE, Maui, HI, USA, pp. 237–241, February 2012.
[5] L. Gao, J. Wu, C. Zhou, and Y. Hu, "Collaborative dynamic sparse topic regression with user profile evolution for item recommendation," in Proceedings of the Thirty-First AAAI Conference on Artificial Intelligence, New York, NY, USA, February 2017.
[6] S. Xue, J. Lu, and G. Zhang, "Cross-domain network representations," Pattern Recognition, vol. 94, pp. 135–148, 2019.
[7] S. Ren, W. Zhu, B. Liao et al., "Selection-based resampling ensemble algorithm for nonstationary imbalanced stream data learning," Knowledge-Based Systems, vol. 163, pp. 705–722, 2019.
[8] J. Sun, H. Fujita, P. Chen, and H. Li, "Dynamic financial distress prediction with concept drift based on time weighting combined with AdaBoost support vector machine ensemble," Knowledge-Based Systems, vol. 120, pp. 4–14, 2017.
[9] T. Zhai, Y. Gao, H. Wang, and L. Cao, "Classification of high-dimensional evolving data streams via a resource-efficient online ensemble," Data Mining and Knowledge Discovery, vol. 31, no. 5, pp. 1242–1265, 2017.
[10] W.-X. Lu, C. Zhou, and J. Wu, "Big social network influence maximization via recursively estimating influence spread," Knowledge-Based Systems, vol. 113, pp. 143–154, 2016.
[11] Y. Zhang, J. Wu, C. Zhou, and Z. Cai, "Instance cloned extreme learning machine," Pattern Recognition, vol. 68, pp. 52–65, 2017.
[12] J. Wu, Z. Cai, S. Zeng, and X. Zhu, "Artificial immune system for attribute weighted naive Bayes classification," in Proceedings of the 2013 International Joint Conference on Neural Networks (IJCNN), IEEE, Dallas, TX, USA, pp. 1–8, August 2013.
[13] J. Wu, S. Pan, X. Zhu, C. Zhang, and X. Wu, "Multi-instance learning with discriminative bag mapping," IEEE Transactions on Knowledge and Data Engineering, vol. 30, no. 6, pp. 1065–1080, 2018.
[14] P. ZareMoodi, S. K. Siahroudi, and H. Beigy, "A support vector based approach for classification beyond the learned label space in data streams," in Proceedings of the 31st Annual ACM Symposium on Applied Computing, ACM, Pisa, Italy, pp. 910–915, April 2016.
[15] S. Ramirez-Gallego, B. Krawczyk, S. Garcia, M. Wozniak, J. M. Benitez, and F. Herrera, "Nearest neighbor classification for high-speed big data streams using Spark," IEEE Transactions on Systems, Man, and Cybernetics: Systems, vol. 47, no. 10, pp. 2727–2739, 2017.
[16] J. Gama, R. Fernandes, and R. Rocha, "Decision trees for mining data streams," Intelligent Data Analysis, vol. 10, no. 1, pp. 23–45, 2006.
[17] H. L. Hammer, A. Yazidi, and B. J. Oommen, "On the classification of dynamical data streams using novel 'Anti-Bayesian' techniques," Pattern Recognition, vol. 76, pp. 108–124, 2018.
[18] M. A. Maloof and R. S. Michalski, "Incremental learning with partial instance memory," Artificial Intelligence, vol. 154, no. 1-2, pp. 95–126, 2004.
[19] K. Crammer, O. Dekel, J. Keshet, S. Shalev-Shwartz, and Y. Singer, "Online passive-aggressive algorithms," Journal of Machine Learning Research, vol. 7, pp. 551–585, 2006.
[20] M. Tennant, F. Stahl, O. Rana, and J. B. Gomes, "Scalable real-time classification of data streams with concept drift," Future Generation Computer Systems, vol. 75, pp. 187–199, 2017.
[21] J. Read, A. Bifet, B. Pfahringer, and G. Holmes, "Batch-incremental versus instance-incremental learning in dynamic and evolving data," in Proceedings of the International Symposium on Intelligent Data Analysis, Springer, Helsinki, Finland, pp. 313–323, October 2012.
[22] J. Lu, D. Sahoo, P. Zhao, and S. C. Hoi, "Sparse passive-aggressive learning for bounded online kernel methods," ACM Transactions on Intelligent Systems and Technology, vol. 9, no. 4, p. 45, 2018.
[23] M. Oide, A. Takahashi, T. Abe, and T. Suganuma, "User-oriented video streaming service based on passive aggressive learning," International Journal of Software Science and Computational Intelligence, vol. 9, no. 1, pp. 35–54, 2017.
[24] B. Krawczyk and M. Wozniak, "One-class classifiers with incremental learning and forgetting for data streams with concept drift," Soft Computing, vol. 19, no. 12, pp. 3387–3400, 2015.
[25] S. Xu and J. Wang, "A fast incremental extreme learning machine algorithm for data streams classification," Expert Systems with Applications, vol. 65, pp. 332–344, 2016.
[26] Y. Lu, Y.-M. Cheung, and Y. Y. Tang, "Dynamic weighted majority for incremental learning of imbalanced data streams with concept drift," in Proceedings of the 2017 International Joint Conference on Artificial Intelligence, pp. 2393–2399, Melbourne, Australia, August 2017.
[27] Y. Zhang, J. Yu, W. Liu, and K. Ota, "Ensemble classification for skewed data streams based on neural network," International Journal of Uncertainty, Fuzziness and Knowledge-Based Systems, vol. 26, p. 08, 2018.
[28] B. Krawczyk, L. L. Minku, J. Gama, J. Stefanowski, and M. Wozniak, "Ensemble learning for data stream analysis: a survey," Information Fusion, vol. 37, pp. 132–156, 2017.
[29] W. N. Street and Y. Kim, "A streaming ensemble algorithm (SEA) for large-scale classification," in Proceedings of the Seventh ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, ACM, San Francisco, CA, USA, pp. 377–382, August 2001.
[30] H. Wang, W. Fan, P. S. Yu, and J. Han, "Mining concept-drifting data streams using ensemble classifiers," in Proceedings of the Ninth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, ACM, Washington, DC, USA, pp. 226–235, December 2003.
[31] J. R. B. Junior and M. do Carmo Nicoletti, "An iterative boosting-based ensemble for streaming data classification," Information Fusion, vol. 45, pp. 66–78, 2019.
[32] K. Jackowski, "New diversity measure for data stream classification ensembles," Engineering Applications of Artificial Intelligence, vol. 74, pp. 23–34, 2018.
[33] G. I. Webb, R. Hyde, H. Cao, H. L. Nguyen, and F. Petitjean, "Characterizing concept drift," Data Mining and Knowledge Discovery, vol. 30, no. 4, pp. 964–994, 2016.
[34] A. Tsymbal, "The problem of concept drift: definitions and related work," Computer Science Department, vol. 106, no. 2, 2004.
[35] T. Zhang, R. Ramakrishnan, and M. Livny, "BIRCH," ACM SIGMOD Record, vol. 25, no. 2, pp. 103–114, 1996.
[36] A. Bifet, G. Holmes, R. Kirkby, and B. Pfahringer, "MOA: massive online analysis," Journal of Machine Learning Research, vol. 11, pp. 1601–1604, 2010.
[37] G. Hulten, L. Spencer, and P. Domingos, "Mining time-changing data streams," in Proceedings of the Seventh ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, ACM, San Francisco, CA, USA, pp. 97–106, December 2001.
[38] P. Domingos and G. Hulten, "Mining high-speed data streams," in Proceedings of the Fifth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, vol. 2, p. 4, Boston, MA, USA, April 2000.
[39] A. Bifet, G. Holmes, B. Pfahringer, and R. Gavalda, "Improving adaptive bagging methods for evolving data streams," in Proceedings of the 2009 Asian Conference on Machine Learning, Springer, Berlin, Germany, pp. 23–37, November 2009.

Page 4: Microcluster-BasedIncrementalEnsembleLearningforNoisy ...downloads.hindawi.com/journals/complexity/2020/6147378.pdf · concept drift in nonstationary streaming data with noise. Meanwhile,inordertocatchconceptdrift,theclassification

defined as CF langSS LS nrang Based on the cluster feature wegive the definition of microcluster used in this paper

Definition 1 Microcluster (MC) is represented aslangSS LS n Ci d CL αrang where SS and LS are used to computethe boundary of MC that SS denotes the square sum of theattributes of the instances in MC as calculated in equation(1) and LS is a vector saves the sum of each attribute as inequation (2) n suggests the number of instances Ci d

presents MCrsquos centroid which changes over time as shownin equation (3) CL is MCrsquos class label and α counts thenumber that MC correctly classifies incoming instance andα is initiated as 0

SS 1113944n

i

1113944

l

j

x2ij (1)

LS 1113944n

i

xi1 1113944n

i

xij 1113944n

i

xil⎛⎝ ⎞⎠ (2)

where l is the dimension of the instance

Cid (1 minus σ) times Cidtminus1 + σ times(LSn) (3)

where Cidtminus1 is MCrsquos centroid on the previous time stampt minus 1 and σ isin [0 1] stands for smoothing parameter

-e size of MC is represented by clusterrsquos radius r whichis calculated as follows

r

SS

nminus

LS

n

21113971

(4)

where LSn represents the length of vector

42 Data Stream Classification Model Based on MicroclusterClassification model consists of three phases classificationincremental learning and updating A framework of themodel is given in Figure 3-e processes and calculations arepresented in detail in this part and summarized into thecorresponding algorithm presented as Algorithm 1

421 Phase 1 (Classification) 8e k-Nearest MicroclustersClassify the Incoming Instance When an incoming instancearrives Euclidean distance is computed between the in-coming instance and each microcluster in pool Based onEuclidean distances the k-nearest microclusters are selectedand then each microcluster will assign its own label to theincoming instance According to equation (5) the final labelof incoming instance is voted by the merged method

y argmaxj

1113944

c

j11113944

k

i1fi(x) (5)

where k stands for the number of microclusters participatingin the classification and c denotes the number of class

t

pt

t + 1

Pt+1

(a)

pt

t + 1

Pt+1

t

(b)

Figure 1 -e two types of concept drift (a) Gradual drift (b) Abrupt drift

Incremental updation ofmicroclusterMicrocluster

(a) (b) (c) (d)

Building new microcluster

t t + 1 t + 2 t + 3

Noisy instance

Figure 2 A demonstration of problem definition In data stream (a) microclusters (shape dotted circle) are developed by historicalinstances (color green) with positive class (shape circle) and negative class (shape triangle) (b) when new instance (color red shape circle)comes microcluster is updated (color red shape dotted circled) and (c) the old microcluster involves noisy instance (color red shapecircle) and (d) concept drift (color red shape circle) is detected and a new microcluster (color red shape dotted circle) is built

4 Complexity

Once incoming instance is classified microcluster isimmediately updated If the final prediction is correct ie ifall the microclusters who voted have the same class label asthe final prediction the value of α increases by 1 otherwiseit decreases by 1

422 Phase 2 (Incremental Learning) 8e Nearest Micro-cluster Will Be Updated Based on the Incoming InstanceFollowing the first-test-and-then-train principle thenearest microcluster is immediately updated to ensure themodel quickly adapts to the new concept or the new

Mediation rule for combining the output of k microclusters

hellip(xt yt) (xt+1 yt+1)hellip Output prediction

Select k microclusters

Phase 2 incremental learning

Phase 1 classify the incoming instance

Phase 3 updating the pool

Microcluster

Microcluster

Figure 3 -e framework of data stream classification model

Input -e instances S X1 X2 Xt 1113864 1113865the pool maximum limit M andthe smoothing parameter σ

Output -e pool of microcluster Plowast

(1) P(0) cupLi1MCi⟵ the pool of initial microclusters (MC) which is formed by k-means(2) for each instance Xt (X y) do

Phase 1 Classification(3) d⟵ distance between X and MC

(4) MC(t)⟵ select the k-nearest microclusters to classify the instance X

(5) 1113954y⟵ the predicted class label of instance X gained by majority vote in equation (5)(6) α⟵ update the parameter of the k-nearest microcluster

Phase 2 Incremental Learning(7) if Scenario 1 then(8) MC(t)lowast ⟵ update the structure of nearest microcluster by equations (1)ndash(3) and the number of the instances in

microcluster will be incremented by 1(9) else if Scenario 2 then(10) X⟵ consider the instance as a noisy point and neglect it(11) else if Scenario 3 then(12) MC

(t)X ⟵ build a new microcluster on instance X

Phase 3 Updating Pool(13) ifLltM then(14) P(t) P(t) + MC

(t)X

(15) L L + 1(16) else(17) MC

(t)worst⟵ the worst microcluster

(18) MC(t)X ⟵ replace MC

(t)worst

(19) end if(20) end if(21) end for(22) return Plowast ⟵ microcluster pool at required time stamp t

ALGORITHM 1 MCBIE

Complexity 5

microcluster is constructed in this phase which is depictedin Figure 4

Scenario 1when incoming instancersquos label is the same asthe nearest microclusterrsquos label incoming instance isused to retrain this microcluster -e terms SS LS andCi d of the nearest microcluster are recalculated byequations (1)ndash(3) -e number of instances in thismicrocluster is incremented by 1 -e radius of micro-cluster is also updated by equation (4) -is scenario isshown in Figure 4(a) As a matter of fact when the in-coming instance drops into the nearest microcluster wecarry out the same operation that is the incoming in-stance is merged into the nearest microclusterScenario 2 incoming instancersquos label varies from thenearest microclusterrsquos label and incoming instance liesinside the boundary of the nearest microcluster as seenfrom Figure 4(b) In this paper there exists the fun-damental assumption that two adjacent instances arehighly likely to represent the same concept ie theprobability that they share the same class label is veryhigh According to the fundamental assumption theincoming instance will be treated as noise and deletedScenario 3 in contrast to Scenario 2 incoming instancersquoslabel is different from the nearest microclusterrsquos label andincoming instance does not drop into the nearestmicrocluster as shown in Figure 4(c) -is scenariosuggests that incoming instance is derived from thedifferent joint probability distribution Under thiscircumstance we think new concept happens and amicrocluster will be constructed with incoming in-stance by the method described in Section 41 Becausethere is only one instance in this new microclusterwhen it is constructed its label CL will be the same asthe incoming instance and its centroid will be theincoming instance itself -e terms SS and LS of thenew microcluster are computed by equations (1) and(2) and the value of α is 0

423 Phase 3 (Updating) 8e Pool of Microcluster IsUpdated As time passes new microclusters are continu-ously being created and eventually the pool will reach itslimit Once full the microcluster with the worst performancewill be replaced with new microcluster By this cyclicalupdate the classification model can effectively catch conceptchange and it leads to improve the classification accuracyGenerally speaking the smaller the value of α the worse theperformance of themicrocluster-erefore the microclusterwith the smallest α is selected for replacement

43 Algorithm and Complexity Analysis In summary of theabove phases and scenarios in data stream classificationmodel the algorithm of microcluster-based incrementalensemble classification named as MCBIE is expressed inAlgorithm 1

-e algorithm of MCBIE includes three phases whichachieve three functions namely classification incrementallearning and updating pool Line 1 is to train the initial

microcluster and build a pool of microclusters Lines 3 to 6achieve the classification for an incoming instance X andupdate the performance of the microcluster According to thethree different scenarios the function of Phase 2 is accom-plished in lines 7 to 12 Finally the size of base classifierreaches the upper-bound M the worst microcluster will bedeleted and the new microcluster is added to microclusterpool On the contrary the new microcluster is directly putinto microcluster pool It is illustrated in lines 13 to 19

In terms of complexity through the analysis of Algo-rithm 1 we know the core operation included by the al-gorithm MCBIE is to calculate the distance in classificationphase -e complexity here depends on mainly two aspectsthe dimensions of the instance (l) and the number ofmicroclusters as base classifier (M) in the ensemble model-us the presented algorithmrsquos time complexity is ap-proximately O(l middot M) In the presented algorithm theprevious instances are not reserved over time and the sta-tistical information of microcluster is recorded such as SSLS and Ci d which can save the storage memory by this way

5 Experiments

51 Datasets To evaluate MCBIE we conduct simulationexperiments with two synthetic datasets -e two datasetsselected are the Hyperplane data stream and the SEA datastream taken from Massive Online Analysis (MOA) [36]Hyperplane data stream is designed to test for gradual driftwhile SEA data stream is designed to test for abrupt drift-ese are the most popular datasets in the data streamclassification domain Further details are as follows

Hyperplane data stream [37] in the l-dimensionalspace a hyperplane includes the point set X whichsatisfies 1113936

li1 aixi a0 1113936

li1 ai where xi represents

the i-th dimension of X Instances for which1113936

li1 aixi ge a0 represent positive class and instances for

which 1113936li1 aixi lt a0 represent negative class A hy-

perplane in l-dimensional space may slowly rotate bychanging the parameters for simulating time-changingconcepts In this paper the value of l is 10 and there are6 attributes with concept drift and it generates 20000instances -ree different noise ratios (respectively20 25 and 30) are injected into data streamSEA data stream [29] the instances in this data streamare generated from three attributes with continuousvalues x1 x2 x3 isin [0 10] When it satisfies x1 + x2 ge θthe instance is positive class otherwise the label ofinstance is negative To simulate concept drift thethreshold value θ 8 9 7 95 will change over timeIt generates 5000 instances with each threshold valueand the whole SEA data stream includes 20000 in-stances SEA data stream with two different noise ratios(20 and 30) is applied in this experiment to test theabrupt drift

52 Baselines -e PA algorithmic framework [19] andHoeffding tree [38] are selected as baselines to compare withthe presented methodMCBIE and these two approaches are

6 Complexity

frequently chosen as the benchmark in many studies[20 22 23 38] Moreover as a well-known classical algo-rithm the Hoeffding tree algorithm is integrated into theMOA platform [36] -erefore we have followed suit in ourpaper

-e PA algorithmic framework [19] is an online in-cremental learning framework for binary classificationbased on SVM Given instance X the classificationmodel outputs the prediction as follows

1113954y sign(wrarr

middot X) (6)

where wrarr represents a vector of weights and 1113954y is the

prediction of instance X

After the 1113954y is output it acquires the ground truth classlabel y and computes a loss value resulting from the fol-lowing equation

ls max 0 1 minus y middot (wrarr

middot X)1113864 1113865 (7)

-e vector of weights wrarr is then updated using

wrarr

wrarr

+ τ middot y middot X (8)

where τ ge 0 is a Lagrange multiplier whose value is calcu-lated by equation (9) in three different methods namely PAPA-I and PA-II

τ ls

X2(PA)

τ min C lsX2

1113872 11138731113966 1113967(PA minus I)

τ ls

X2 +(1(2 middot C))(PA minus II)

(9)

where C is a positive parameter and referred to as aggres-siveness parameter of the algorithm A detailed outline of thederivation procedure can be found in [19]

Hoeffding tree [38] is a decision tree for online learningfrom the high-volume data stream and it is built fromeach instance in constant time According to Hoeffdingbound we can estimate the number of instances whichare needed to build the tree node-eHoeffding boundhas nothing to do with the distribution function that

generates the instances Moreover the Hoeffdingbound is used to construct Hoeffding tree which isapproximated to the one produced by batch learningIn the light of its incremental nature of the Hoeffdingtree it is used widely in data stream classification

53 Experiment Setup Following the first-test-and-then-train principle [39] each incoming instance is first testedand then the model is retrained with the incoming instanceunder an incremental paradigm To assess the classificationmodelrsquos performance in this paper the classification accu-racy is computed every one hundred instances during theprocess of data stream classification

Both our MCBIE and the baselines are initialized on thefirst 100 instances and the model resulting from that ini-tialization is used to predict the following instance in datastream In MCBIE we use these 100 instances to train 6initial microclusters as base classifiers by using the k-meansalgorithm At every time stamp the three nearest micro-clusters of each incoming instance are selected to assert thelabel information -e maximum scale of microcluster poolis 30 and once full a new microcluster which takes the placeof the worst-performing microcluster joins in the pool Weuse Weka package to implement the MCBIE algorithmHoeffding tree algorithm (named as HT) is run in MOAplatform with the parameters set to their default values PAPA-I and PA-II with Gaussian kernel are executed inMATLAB and the constant C is equal to 1

54 Experiment Result and Analysis -e simulation ex-periments are designed to evaluate MCBIE in two sidesFirst we want to assess the sensitivity of the smoothingparameter σ second we want to justify the feasibility andvalidity of MCBIE

541 Experiment 1 Sensitivity Analysis of the SmoothingParameter Following the experimental setup in Section 53we verify the function of smoothing parameter σ in MCBIEfrom 01 to 1 When the smoothing parameter σ is either toobig or too small the MCBIErsquos average accuracy and cor-responding standard deviation do not reach the desiredresult on the Hyperplane data stream and SEA data stream

(a) (b) (c)

Figure 4-ree different scenarios (a) incoming instancersquos label is the same as the nearest microclusterrsquos label (b) incoming instancersquos labelvaries from the nearest microclusterrsquos label which represents noisy instance and (c) incoming instance as a new concept does not drop intothe nearest microcluster and its label is different from the nearest microclusterrsquos label Note the color represents the class label the rectanglesuggests the incoming instance and the circle represents the nearest microcluster

Complexity 7

-rough the observation and analysis we find thesmoothing parameter σ could regulate the balance betweenthe historical and new instances used to compute the cen-troid of the microcluster When σ 0 the centroid of themicrocluster will not move and only its radius changes Onthe contrary when σ 1 reaches the maximum value themicroclusterrsquos centroid is a mean of instances It suggests allinstances have the same importance to the centroidHowever because concept drift will occur in nonstationarydata stream environment instance at different time stampsshould have different contributions to the centroid ofmicrocluster Experiment results justify this viewpointAccording to the analysis of experiment results a conclusionis made that the best value of σ is located at an interval[06 07] hence we chose σ 065 for subsequentexperiments

542 Experiment 2 Feasibility and Validity of MCBIEAll the experimental results with both the Hyperplane andSEA data streams are shown in Table 1 At the same time themaximum value in each column is marked in bold

From Table 1 we see the average accuracy of MCBIEreaches the highest value of 696 648 and 618 onHyperplane data stream with the noise ratio of 20 25and 30 respectively -e corresponding standard varianceof the three average accuracies is 0051 0046 and 0047 andthe standard variances about accuracy are relatively lowcompared with the baselines On average MCBIE providesthe most accurate classifications with the least standardvariance among all the baselines with the Hyperplane datastream On the SEA data stream the average classificationaccuracy of MCBIE is 708 at 20 noise and 632 at 30noise respectively Again the standard variances are thesmallest compared to all baselines which demonstrateMCBIErsquos stability in nonstationary data streams with noiseBased on the above experiment results we may draw aconclusion that MCBIE is an effective and well-performingclassification approach

-rough the further analysis of experiment results inTable 1 some interesting phenomena exist As the noise ratiogrows the performance of MCBIE is improved to a greaterdegree than the other methods For instance with a noiseratio of 20 on the SEA data stream MCBIE ranks only thesecond lead behind PA-I However at 30 noise MCBIEbecomes the most accurate model Given the same noiseratios the experiment results show that the classificationmodel on SEA data stream performs better than on Hy-perplane data stream -is suggests it is more difficult forclassification model to learn knowledge from gradual drift

than from abrupt drift Of all the baselines PA-I provides thebest performance which indicates that selecting an ap-propriate learning ratio τ is very important for incrementallearning-eHoeffding tree baseline has the largest standardvariance and it shows that Hoeffding tree has instability

Last but not least we want to showMCBIE adapts well toconcept drift Figures 5 and 6 illustrate accuracy curve for theMCBIE method with Hyperplane and SEA data streamsFigure 5 suggests that MCBIE can tackle concept drift intime on the Hyperplane data stream with the different noiseratios When concept drift occurs the curve plot in Figure 5sharply descends and it indicates the concept included by themodel is inconsistent with the current concept MCBIErsquosaccuracy decreases when concept changes However theperformance of MCBIE improves immediately after themodel is retrained and updated with the incoming instancethrough incremental learning -e intensity of ascent anddescent in Figure 5 reflects that the classification model hasthe ability to catch concept drift In Figure 6 we easilyunderstand that the similar phenomena are presented withthe SEA data stream

To demonstrateMCBIErsquos superiority based on the aboveanalysis we choose the two best methods MCBIE and PA-Ito illustrate the ability to perform the prediction task instreaming data setting with noise and concept drift -eaccuracy curve is plotted in Figures 7 and 8 From Figure 7the accuracy curve suggests that these two methods have theability to keep track with concept drift and shows clearly thatour method is superior to the PA-I in terms of accuracy overthe Hyperplane data stream with the three different noise

Table 1 Average classification accuracy and standard variance

Hyperplane data stream SEA data stream20 noise 25 noise 30 noise 20 noise 30 noise

PA 0622 plusmn 0056 0581 plusmn 0054 0548 plusmn 0048 0661 plusmn 0061 0577 plusmn 0056PA-I 0678 plusmn 0049 0626 plusmn 0050 0582 plusmn 0045 0726 plusmn 0060 0617 plusmn 0061PA-II 0646 plusmn 0053 0599 plusmn 0053 0558 plusmn 0047 0688 plusmn 0062 0595 plusmn 0059HT 0582 plusmn 0067 0567 plusmn 0061 0556 plusmn 0060 0705 plusmn 0068 0625 plusmn 0059MCBIE 0696 plusmn 0051 0648 plusmn 0046 0618 plusmn 0047 0708 plusmn 0053 0632 plusmn 0046

0 20 40 60 80 100 120 140 160 180 200045

053

061

069

077

085

Data streamsA

ccur

acy

20 noise25 noise30 noise

Figure 5 -e accuracy curve for the MCBIE method with Hy-perplane data stream

8 Complexity

20 noise30 noise

05

058

066

074

082

09

Acc

urac

y20 40 60 80 100 120 140 160 180 2000

Data streams

Figure 6 -e accuracy curve for the MCBIE method with SEA data stream

MCBIEPA-I

05

055

06

065

07

075

08

085

09

Acc

urac

y

20 40 60 80 100 120 140 160 180 2000Data streams

(a)

MCBIEPA-I

20 40 60 80 100 120 140 160 180 2000Data streams

045

05

055

06

065

07

075

08

Acc

urac

y

(b)

MCBIEPA-I

20 40 60 80 100 120 140 160 180 2000Data streams

045

05

055

06

065

07

075

Acc

urac

y

(c)

Figure 7 -e accuracy curve for the two best methods over Hyperplane data streams (a) Hyperplane with 20 noise (b) Hyperplane with25 noise (c) Hyperplane with 30 noise

Complexity 9

ratios Moreover in three cases the maximum and mini-mum accuracy of MCBIE is higher than that of the PA-I-rough the analysis of the accuracy curve over SEA datastream concerned with the ability to adapt to concept driftthese two methods seem to have the same function to dealwith nonstationary data stream classification as demon-strated in Figure 8 Moreover with the growth of noise ratiothe MCBIE has a better performance than PA-I such asstability

From these analyses we conclude that the MCBIEmethod is able to conduct nonstationary data stream clas-sification with high accuracy in environments characterizedby both concept drift and noise

6 Conclusions

Classification task in nonstationary data streams faces thetwo main problems concept drift and noise which requirethe classification model to not only cope with concept driftbut also differentiate noise from concept drift In order todeal with these problems a novel method named MCBIEwas proposed which can achieve the classification task innonstationary data streams with noise Aiming to enhanceMCBIErsquos performance the three strategies are used to al-leviate the influence of concept drift and noise In this paperincremental learning can help microcluster as classifier tocatch the concept change fast and ensemble strategy alle-viates the disturbance between noise and concept drift -efunction of smoothing parameter is to absorb the usefulinformation from historical knowledge Compared with thebaseline methods experiment results justify that ourmethod MCBIE has the ability to perform classification innonstationary streaming data setting However the threeproblems are worthy to be further concerned (1) how toimprove the noise recognition ability of our method in

abrupt drift environment needs to be further strengthened(2) in addition to accuracy the stability of model needs to beimproved (3) when concept reoccurs it is important todesign more appropriate strategies for the replacement ofmicrocluster

Data Availability

-e data used to support the findings of this study have beendeposited in the GitHub repository (httpsgithubcomFanzhenLiuComplexityJournal)

Conflicts of Interest

-e authors declare that there are no conflicts of interestregarding the publication of this paper

Acknowledgments

-is study was supported by the Natural Science Foundationof Anhui Province (nos 1608085MF147 and1908085MF183) the Humanities and Social ScienceFoundation of theMinistry of Education (no 18YJA630114)a Major Project of Natural Science Research in the Collegesand Universities of Anhui Province (no KJ2019ZD15)MQNS (no 9201701203) MQEPS (no 96804590) MQRSG(no 95109718) and the Investigative Analytics CollaborativeResearch Project between Macquarie University and Data61CSIRO

References

[1] G Ditzler M Roveri C Alippi and R Polikar ldquoLearning innonstationary environments a surveyrdquo IEEE ComputationalIntelligence Magazine vol 10 no 4 pp 12ndash25 2015

053

059

065

071

077

083

087

Acc

urac

y

20 40 60 80 100 120 140 160 180 2000Data streams

MCBIEPA-I

(a)

045

05

055

06

065

07

075

08

Acc

urac

y

20 40 60 80 100 120 140 160 180 2000Data streams

MCBIEPA-I

(b)

Figure 8 -e accuracy curve for the two best methods over SEA data streams (a) SEA with 20 noise (b) SEA with 30 noise

10 Complexity

[2] A Jadhav A Jadhav P Jadhav and P Kulkarni ldquoA novelapproach for the design of network intrusion detection system(NIDS)rdquo in Proceedings of 2013 International Conference onSensor Network Security Technology and Privacy Communi-cation System IEEE New York NY USA pp 22ndash27 De-cember 2013

[3] A Salazar G Safont A Soriano and L Vergara ldquoAutomaticcredit card fraud detection based on non-linear signal pro-cessingrdquo in Proceedings of 2012 IEEE International CarnahanConference on Security Technology IEEE Newton MA USApp 207ndash212 October 2012

[4] T Bujlow T Riaz and J M Pedersen ldquoA method forclassification of network traffic based on c5 0 machinelearning algorithmrdquo in Proceedings of the 2012 InternationalConference on Computing Networking and CommunicationsIEEE Maui HI USA pp 237ndash241 February 2012

[5] L Gao J Wu C Zhou and Y Hu ldquoCollaborative dynamicsparse topic regression with user profile evolution for itemrecommendationrdquo in Proceedings of the 8irty-First AAAIConference on Artificial Intelligence New York NY USAFebruary 2017

[6] S Xue J Lu and G Zhang ldquoCross-domain network repre-sentationsrdquo Pattern Recognition vol 94 pp 135ndash148 2019

[7] S Ren W Zhu B Liao et al ldquoSelection-based resamplingensemble algorithm for nonstationary imbalanced streamdata learningrdquo Knowledge-Based Systems vol 163 pp 705ndash722 2019

[8] J Sun H Fujita P Chen and H Li ldquoDynamic financialdistress prediction with concept drift based on time weightingcombined with adaboost support vector machine ensemblerdquoKnowledge-Based Systems vol 120 pp 4ndash14 2017

[9] T Zhai Y Gao H Wang and L Cao ldquoClassification of high-dimensional evolving data streams via a resource-efficientonline ensemblerdquo Data Mining and Knowledge Discoveryvol 31 no 5 pp 1242ndash1265 2017

[10] W-X Lu C Zhou and J Wu ldquoBig social network influencemaximization via recursively estimating influence spreadrdquoKnowledge-Based Systems vol 113 pp 143ndash154 2016

[11] Y Zhang J Wu C Zhou and Z Cai ldquoInstance cloned ex-treme learning machinerdquo Pattern Recognition vol 68pp 52ndash65 2017

[12] J Wu Z Cai S Zeng and X Zhu ldquoArtificial immune systemfor attribute weighted naive bayes classificationrdquo in Proceedingsof the 2013 International Joint Conference on Neural Networks(IJCNN) IEEE Dallas TX USA pp 1ndash8 August 2013

[13] J Wu S Pan X Zhu C Zhang and X Wu ldquoMulti-instancelearning with discriminative bagmappingrdquo IEEE Transactionson Knowledge and Data Engineering vol 30 no 6pp 1065ndash1080 2018

[14] P ZareMoodi S K Siahroudi and H Beigy ldquoA supportvector based approach for classification beyond the learnedlabel space in data streamsrdquo in Proceedings of the 31st AnnualACM Symposium on Applied Computing ACM Pisa Italypp 910ndash915 April 2016

[15] S Ramirez-Gallego B Krawczyk S Garcia M WozniakJ M Benitez and F Herrera ldquoNearest neighbor classificationfor high-speed big data streams using sparkrdquo IEEE Trans-actions on Systems Man and Cybernetics Systems vol 47no 10 pp 2727ndash2739 2017

[16] J Gama R Fernandes and R Rocha ldquoDecision trees formining data streamsrdquo Intelligent Data Analysis vol 10 no 1pp 23ndash45 2006

[17] H L Hammer A Yazidi and B J Oommen ldquoOn theclassification of dynamical data streams using novel ldquoAnti-

Bayesianrdquo techniquesrdquo Pattern Recognition vol 76 pp 108ndash124 2018

[18] M A Maloof and R S Michalski ldquoIncremental learning withpartial instance memoryrdquo Artificial Intelligence vol 154no 1-2 pp 95ndash126 2004

[19] K Crammer O Dekel J Keshet S Shalev-Shwartz andY Singer ldquoOnline passive-aggressive algorithmsrdquo Journal ofMachine Learning Research vol 7 pp 551ndash585 2006

[20] M Tennant F Stahl O Rana and J B Gomes ldquoScalable real-time classification of data streams with concept driftrdquo FutureGeneration Computer Systems vol 75 pp 187ndash199 2017

[21] J Read A Bifet B Pfahringer and G Holmes ldquoBatch-in-cremental versus instance-incremental learning in dynamicand evolving datardquo in Proceedings of International Symposiumon Intelligent Data Analysis Springer Helsinki Finlandpp 313ndash323 October 2012

[22] J Lu D Sahoo P Zhao and S C Hoi ldquoSparse passive-ag-gressive learning for bounded online kernel methodsrdquo ACMTransactions on Intelligent Systems and Technology vol 9no 4 p 45 2018

[23] M Oide A Takahashi T Abe and T Suganuma ldquoUser-oriented video streaming service based on passive aggressivelearningrdquo International Journal of Software Science andComputational Intelligence vol 9 no 1 pp 35ndash54 2017

[24] B Krawczyk and M Wozniak ldquoOne-class classifiers with in-cremental learning and forgetting for data streamswith conceptdriftrdquo Soft Computing vol 19 no 12 pp 3387ndash3400 2015

[25] S Xu and J Wang ldquoA fast incremental extreme learningmachine algorithm for data streams classificationrdquo ExpertSystems with Applications vol 65 pp 332ndash344 2016

[26] Y Lu Y-M Cheung and Y Y Tang ldquoDynamic weightedmajority for incremental learning of imbalanced data streamswith concept driftrdquo in Proceedings of the 2017 InternationalJoint Conference on Artificial Intelligence pp 2393ndash2399Melbourne Australia August 2017

[27] Y Zhang J Yu W Liu and K Ota ldquoEnsemble classificationfor skewed data streams based on neural networkrdquo Inter-national Journal of Uncertainty Fuzziness and Knowledge-Based Systems vol 26 p 08 2018

[28] B Krawczyk L L Minku J Gama J Stefanowski andM Wozniak ldquoEnsemble learning for data stream analysis asurveyrdquo Information Fusion vol 37 pp 132ndash156 2017

[29] W N Street and Y Kim ldquoA streaming ensemble algorithm(sea) for large-scale classificationrdquo in Proceedings of theSeventh ACM SIGKDD International Conference on Knowl-edge Discovery and Data Mining ACM San Francisco CAUSA pp 377ndash382 August 2001

[30] H Wang W Fan P S Yu and J Han ldquoMining concept-drifting data streams using ensemble classifiersrdquo in Pro-ceedings of the Ninth ACM SIGKDD International Conferenceon Knowledge Discovery and Data Mining ACM Wash-ington DC USA pp 226ndash235 December 2003

[31] J R B Junior and M do Carmo Nicoletti ldquoAn iterativeboosting-based ensemble for streaming data classificationrdquoInformation Fusion vol 45 pp 66ndash78 2019

[32] K Jackowski ldquoNew diversity measure for data stream clas-sification ensemblesrdquo Engineering Applications of ArtificialIntelligence vol 74 pp 23ndash34 2018

[33] G I Webb R Hyde H Cao H L Nguyen and F PetitjeanldquoCharacterizing concept driftrdquo Data Mining and KnowledgeDiscovery vol 30 no 4 pp 964ndash994 2016

[34] A Tsymbal ldquo-e problem of concept drift definitions andrelated workrdquo Computer Science Department vol 106 no 22004

Complexity 11

[35] T Zhang R Ramakrishnan and M Livny ldquoBirchrdquo ACMSIGMOD Record vol 25 no 2 pp 103ndash114 1996

[36] A Bifet G Holmes R Kirkby and B Pfahringer ldquoMoaMassive online analysisrdquo Journal of Machine Learning Re-search vol 11 pp 1601ndash1604 2010

[37] G Hulten L Spencer and P Domingos ldquoMining time-changing data streamsrdquo in Proceedings of the Seventh ACMSIGKDD International Conference on Knowledge Discoveryand Data Mining ACM San Francisco CA USA pp 97ndash106December 2001

[38] P Domingos and G Hulten ldquoMining high-speed datastreamsrdquo in Proceedings of the Fifth ACM SIGKDD Inter-national Conference on Knowledge Discovery and DataMining vol 2 p 4 Boston MA USA April 2000

[39] A Bifet G Holmes B Pfahringer and R Gavalda ldquoIm-proving adaptive bagging methods for evolving data streamsrdquoin Proceedings of 2009 Asian Conference on Machine LearningSpringer Berlin Germany pp 23ndash37 November 2009

12 Complexity

Page 5: Microcluster-BasedIncrementalEnsembleLearningforNoisy ...downloads.hindawi.com/journals/complexity/2020/6147378.pdf · concept drift in nonstationary streaming data with noise. Meanwhile,inordertocatchconceptdrift,theclassification

Once incoming instance is classified microcluster isimmediately updated If the final prediction is correct ie ifall the microclusters who voted have the same class label asthe final prediction the value of α increases by 1 otherwiseit decreases by 1

422 Phase 2 (Incremental Learning) 8e Nearest Micro-cluster Will Be Updated Based on the Incoming InstanceFollowing the first-test-and-then-train principle thenearest microcluster is immediately updated to ensure themodel quickly adapts to the new concept or the new

Mediation rule for combining the output of k microclusters

hellip(xt yt) (xt+1 yt+1)hellip Output prediction

Select k microclusters

Phase 2 incremental learning

Phase 1 classify the incoming instance

Phase 3 updating the pool

Microcluster

Microcluster

Figure 3 -e framework of data stream classification model

Input -e instances S X1 X2 Xt 1113864 1113865the pool maximum limit M andthe smoothing parameter σ

Output -e pool of microcluster Plowast

(1) P(0) cupLi1MCi⟵ the pool of initial microclusters (MC) which is formed by k-means(2) for each instance Xt (X y) do

Phase 1 Classification(3) d⟵ distance between X and MC

(4) MC(t)⟵ select the k-nearest microclusters to classify the instance X

(5) 1113954y⟵ the predicted class label of instance X gained by majority vote in equation (5)(6) α⟵ update the parameter of the k-nearest microcluster

Phase 2 Incremental Learning(7) if Scenario 1 then(8) MC(t)lowast ⟵ update the structure of nearest microcluster by equations (1)ndash(3) and the number of the instances in

microcluster will be incremented by 1(9) else if Scenario 2 then(10) X⟵ consider the instance as a noisy point and neglect it(11) else if Scenario 3 then(12) MC

(t)X ⟵ build a new microcluster on instance X

Phase 3 Updating Pool(13) ifLltM then(14) P(t) P(t) + MC

(t)X

(15) L L + 1(16) else(17) MC

(t)worst⟵ the worst microcluster

(18) MC(t)X ⟵ replace MC

(t)worst

(19) end if(20) end if(21) end for(22) return Plowast ⟵ microcluster pool at required time stamp t

ALGORITHM 1 MCBIE

Complexity 5

microcluster is constructed in this phase which is depictedin Figure 4

Scenario 1when incoming instancersquos label is the same asthe nearest microclusterrsquos label incoming instance isused to retrain this microcluster -e terms SS LS andCi d of the nearest microcluster are recalculated byequations (1)ndash(3) -e number of instances in thismicrocluster is incremented by 1 -e radius of micro-cluster is also updated by equation (4) -is scenario isshown in Figure 4(a) As a matter of fact when the in-coming instance drops into the nearest microcluster wecarry out the same operation that is the incoming in-stance is merged into the nearest microclusterScenario 2 incoming instancersquos label varies from thenearest microclusterrsquos label and incoming instance liesinside the boundary of the nearest microcluster as seenfrom Figure 4(b) In this paper there exists the fun-damental assumption that two adjacent instances arehighly likely to represent the same concept ie theprobability that they share the same class label is veryhigh According to the fundamental assumption theincoming instance will be treated as noise and deletedScenario 3 in contrast to Scenario 2 incoming instancersquoslabel is different from the nearest microclusterrsquos label andincoming instance does not drop into the nearestmicrocluster as shown in Figure 4(c) -is scenariosuggests that incoming instance is derived from thedifferent joint probability distribution Under thiscircumstance we think new concept happens and amicrocluster will be constructed with incoming in-stance by the method described in Section 41 Becausethere is only one instance in this new microclusterwhen it is constructed its label CL will be the same asthe incoming instance and its centroid will be theincoming instance itself -e terms SS and LS of thenew microcluster are computed by equations (1) and(2) and the value of α is 0

423 Phase 3 (Updating) 8e Pool of Microcluster IsUpdated As time passes new microclusters are continu-ously being created and eventually the pool will reach itslimit Once full the microcluster with the worst performancewill be replaced with new microcluster By this cyclicalupdate the classification model can effectively catch conceptchange and it leads to improve the classification accuracyGenerally speaking the smaller the value of α the worse theperformance of themicrocluster-erefore the microclusterwith the smallest α is selected for replacement

43 Algorithm and Complexity Analysis In summary of theabove phases and scenarios in data stream classificationmodel the algorithm of microcluster-based incrementalensemble classification named as MCBIE is expressed inAlgorithm 1

The MCBIE algorithm includes three phases, which achieve three functions: classification, incremental learning, and updating the pool. Line 1 trains the initial microclusters and builds a pool of microclusters. Lines 3 to 6 classify an incoming instance X and update the performance of the microclusters. According to the three different scenarios, the function of Phase 2 is accomplished in lines 7 to 12. Finally, if the number of base classifiers has reached the upper bound M, the worst microcluster is deleted and the new microcluster is added to the microcluster pool; otherwise, the new microcluster is put directly into the pool. This is illustrated in lines 13 to 19.

In terms of complexity, analysis of Algorithm 1 shows that the core operation of MCBIE is the distance calculation in the classification phase. The complexity depends mainly on two aspects: the dimensionality of the instance (l) and the number of microclusters serving as base classifiers (M) in the ensemble model. Thus, the presented algorithm's time complexity is approximately O(l · M); for example, with l = 10 and M = 30 as in our experiments, classifying one instance requires 30 distance computations in 10 dimensions. In the presented algorithm, previous instances are not retained over time; only the statistical information of each microcluster, such as SS, LS, and Cid, is recorded, which saves storage memory.

5. Experiments

5.1. Datasets. To evaluate MCBIE, we conduct simulation experiments with two synthetic datasets: the Hyperplane data stream and the SEA data stream, taken from Massive Online Analysis (MOA) [36]. The Hyperplane data stream is designed to test for gradual drift, while the SEA data stream is designed to test for abrupt drift. These are among the most popular datasets in the data stream classification domain. Further details are as follows.

Hyperplane data stream [37]: in the l-dimensional space, a hyperplane comprises the point set X satisfying $\sum_{i=1}^{l} a_i x_i = a_0 = \frac{1}{2}\sum_{i=1}^{l} a_i$, where $x_i$ represents the i-th dimension of X. Instances for which $\sum_{i=1}^{l} a_i x_i \ge a_0$ represent the positive class, and instances for which $\sum_{i=1}^{l} a_i x_i < a_0$ represent the negative class. A hyperplane in l-dimensional space can be rotated slowly by changing its parameters, thereby simulating time-changing concepts. In this paper, the value of l is 10, 6 attributes carry concept drift, and 20,000 instances are generated. Three different noise ratios (20%, 25%, and 30%, respectively) are injected into the data stream.

SEA data stream [29]: the instances in this data stream are generated from three attributes with continuous values $x_1, x_2, x_3 \in [0, 10]$. When $x_1 + x_2 \ge \theta$, the instance belongs to the positive class; otherwise, the label of the instance is negative. To simulate concept drift, the threshold value $\theta \in \{8, 9, 7, 9.5\}$ changes over time. 5,000 instances are generated with each threshold value, so the whole SEA data stream includes 20,000 instances. The SEA data stream with two different noise ratios (20% and 30%) is applied in this experiment to test abrupt drift.
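To make the generators concrete, the sketch below produces a SEA-like stream. It is an approximation rather than MOA's own generator, and the label-flipping noise model is our assumption, since the paper does not spell out how noise is injected.

    import numpy as np

    def sea_stream(n_per_concept=5000, thresholds=(8, 9, 7, 9.5),
                   noise_ratio=0.20, seed=0):
        """Yield (x, y) pairs: x1 + x2 >= theta is the positive class,
        theta changes abruptly every n_per_concept instances, and a
        noise_ratio fraction of labels is flipped at random."""
        rng = np.random.default_rng(seed)
        for theta in thresholds:                     # four concepts, 20,000 instances total
            for _ in range(n_per_concept):
                x = rng.uniform(0.0, 10.0, size=3)   # x3 is irrelevant to the label
                y = 1 if x[0] + x[1] >= theta else -1
                if rng.random() < noise_ratio:       # inject label noise
                    y = -y
                yield x, y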

5.2. Baselines. The PA algorithmic framework [19] and the Hoeffding tree [38] are selected as baselines against which to compare the presented method, MCBIE; these two approaches are frequently chosen as benchmarks in many studies [20, 22, 23, 38]. Moreover, as a well-known classical algorithm, the Hoeffding tree is integrated into the MOA platform [36]. We have therefore followed suit in this paper.

The PA algorithmic framework [19] is an online incremental learning framework for binary classification based on SVM. Given an instance X, the classification model outputs the prediction as follows:

$$\hat{y} = \operatorname{sign}(\vec{w} \cdot X), \qquad (6)$$

where $\vec{w}$ represents a vector of weights and $\hat{y}$ is the prediction for instance X.

After $\hat{y}$ is output, the model acquires the ground-truth class label y and computes a loss value from the following equation:

$$l_s = \max\{0,\; 1 - y \cdot (\vec{w} \cdot X)\}. \qquad (7)$$

The vector of weights $\vec{w}$ is then updated using

$$\vec{w} = \vec{w} + \tau \cdot y \cdot X, \qquad (8)$$

where $\tau \ge 0$ is a Lagrange multiplier whose value is calculated by equation (9) in three different ways, namely, PA, PA-I, and PA-II:

$$\tau = \frac{l_s}{\|X\|^2}\;(\text{PA}), \qquad \tau = \min\left\{C,\; \frac{l_s}{\|X\|^2}\right\}\;(\text{PA-I}), \qquad \tau = \frac{l_s}{\|X\|^2 + 1/(2C)}\;(\text{PA-II}), \qquad (9)$$

where C is a positive parameter referred to as the aggressiveness parameter of the algorithm. A detailed outline of the derivation procedure can be found in [19].
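The three update rules are easy to state in code. The sketch below is the linear form of equations (6)–(9); the experiments in this paper run the kernelized variant in MATLAB, so this is illustrative only.

    import numpy as np

    def pa_round(w, x, y, variant="PA-I", C=1.0):
        """One passive-aggressive round: predict, suffer hinge loss,
        and update the weight vector per equations (6)-(9)."""
        y_hat = np.sign(w @ x)                    # eq. (6): prediction
        loss = max(0.0, 1.0 - y * (w @ x))        # eq. (7): hinge loss
        sq_norm = x @ x
        if variant == "PA":
            tau = loss / sq_norm
        elif variant == "PA-I":
            tau = min(C, loss / sq_norm)
        else:                                     # PA-II
            tau = loss / (sq_norm + 1.0 / (2.0 * C))
        return w + tau * y * x, y_hat             # eq. (8): update

When the hinge loss is zero, τ = 0 and the weights are left untouched (the "passive" case); otherwise the update is just large enough to correct the margin violation (the "aggressive" case).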

The Hoeffding tree [38] is a decision tree for online learning from high-volume data streams, built from each instance in constant time. The Hoeffding bound lets us estimate the number of instances needed before a tree node can be split, and it is independent of the distribution function that generates the instances. A Hoeffding tree constructed with this bound closely approximates the tree produced by batch learning. In light of its incremental nature, the Hoeffding tree is widely used in data stream classification.
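As a worked example of the bound itself (the formula is standard for Hoeffding trees, not quoted from this paper): for a split criterion with range R observed over n instances, the true mean lies within ε of the observed mean with probability 1 − δ, and a split is made once the gap between the two best attributes exceeds ε.

    import math

    def hoeffding_eps(value_range, delta, n):
        """Hoeffding bound: with probability 1 - delta, the observed mean
        of n samples with range `value_range` is within eps of the true
        mean, regardless of the generating distribution [38]."""
        return math.sqrt(value_range ** 2 * math.log(1.0 / delta) / (2.0 * n))

    # e.g., with R = 1, delta = 1e-7, and n = 2000 observed instances,
    # eps is about 0.063, so a split is safe once the criterion gap
    # between the two best attributes exceeds ~0.063.
    print(hoeffding_eps(1.0, 1e-7, 2000))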

5.3. Experiment Setup. Following the first-test-then-train principle [39], each incoming instance is first tested, and then the model is retrained with that instance under an incremental paradigm. To assess the classification model's performance, classification accuracy is computed every one hundred instances during the process of data stream classification.
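A generic test-then-train loop looks as follows; the model object with predict and update methods is a hypothetical interface, and the 100-instance accuracy window matches the evaluation described above.

    def prequential_accuracy(model, stream, window=100):
        """First-test-then-train [39]: predict each instance before
        using it to retrain, logging accuracy per `window` instances."""
        correct, curve = 0, []
        for i, (x, y) in enumerate(stream, start=1):
            correct += int(model.predict(x) == y)  # test first
            model.update(x, y)                     # then train
            if i % window == 0:
                curve.append(correct / window)
                correct = 0
        return curve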

Both our MCBIE and the baselines are initialized on the first 100 instances, and the model resulting from that initialization is used to predict the following instances in the data stream. In MCBIE, we use these 100 instances to train 6 initial microclusters as base classifiers using the k-means algorithm. At every time stamp, the three nearest microclusters of each incoming instance are selected to assert the label information. The maximum size of the microcluster pool is 30; once full, a new microcluster joins the pool in place of the worst-performing one. We use the Weka package to implement the MCBIE algorithm. The Hoeffding tree algorithm (denoted HT) is run on the MOA platform with its parameters set to their default values. PA, PA-I, and PA-II with a Gaussian kernel are executed in MATLAB, with the constant C equal to 1.
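One plausible reading of the initialization (the paper says 6 microclusters are trained by k-means on the first 100 instances, but not how clusters are apportioned between classes) is to run k-means within each class, e.g., 3 clusters per class for a binary stream:

    import numpy as np
    from sklearn.cluster import KMeans

    def init_pool(X0, y0, per_class=3, sigma=0.65):
        """Seed the pool from the first 100 instances: k-means within each
        class, then summarize each cluster as a MicroCluster (sketched in
        Section 4.2.2)."""
        pool = []
        for label in np.unique(y0):
            Xc = X0[y0 == label]
            km = KMeans(n_clusters=per_class, n_init=10, random_state=0).fit(Xc)
            for k in range(per_class):
                members = Xc[km.labels_ == k]
                mc = MicroCluster(members[0], label, sigma)
                for x in members[1:]:
                    mc.merge(x)
                pool.append(mc)
        return pool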

5.4. Experiment Result and Analysis. The simulation experiments are designed to evaluate MCBIE from two angles: first, we assess the sensitivity of the smoothing parameter σ; second, we justify the feasibility and validity of MCBIE.

5.4.1. Experiment 1: Sensitivity Analysis of the Smoothing Parameter. Following the experimental setup in Section 5.3, we examine the effect of the smoothing parameter σ in MCBIE over the range 0.1 to 1. When σ is either too large or too small, MCBIE's average accuracy and the corresponding standard deviation fail to reach the desired result on both the Hyperplane data stream and the SEA data stream.

Figure 4: Three different scenarios: (a) the incoming instance's label is the same as the nearest microcluster's label; (b) the incoming instance's label differs from the nearest microcluster's label, representing a noisy instance; and (c) the incoming instance, as a new concept, does not drop into the nearest microcluster and its label differs from the nearest microcluster's label. Note: the color represents the class label, the rectangle indicates the incoming instance, and the circle represents the nearest microcluster.


Through observation and analysis, we find that the smoothing parameter σ regulates the balance between the historical and new instances used to compute the centroid of the microcluster. When σ = 0, the centroid of the microcluster does not move and only its radius changes. At the other extreme, when σ = 1 (its maximum value), the microcluster's centroid is the plain mean of its instances, meaning all instances carry the same weight in the centroid. However, because concept drift occurs in nonstationary data stream environments, instances at different time stamps should contribute differently to the centroid of the microcluster. The experiment results support this viewpoint: the best value of σ lies in the interval [0.6, 0.7], so we chose σ = 0.65 for the subsequent experiments.
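These two limiting cases pin down the shape of the update. One reconstruction consistent with them (equation (3) itself appears earlier in the paper, so this is our hedged reading rather than the authors' exact formula) is

$$C_{id}^{(t)} = (1-\sigma)\, C_{id}^{(t-1)} + \sigma\, \frac{LS^{(t)}}{n^{(t)}},$$

so that σ = 0 freezes the centroid at its historical value, while σ = 1 recovers the running mean LS/n in which every instance counts equally, matching the two cases described above.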

5.4.2. Experiment 2: Feasibility and Validity of MCBIE. All the experimental results on both the Hyperplane and SEA data streams are shown in Table 1, with the maximum value in each column marked in bold.

From Table 1, the average accuracy of MCBIE reaches the highest values of 69.6%, 64.8%, and 61.8% on the Hyperplane data stream with noise ratios of 20%, 25%, and 30%, respectively. The corresponding standard deviations of the three average accuracies are 0.051, 0.046, and 0.047, which are relatively low compared with the baselines. On average, MCBIE provides the most accurate classifications with the least deviation among all the methods on the Hyperplane data stream. On the SEA data stream, the average classification accuracy of MCBIE is 70.8% at 20% noise and 63.2% at 30% noise. Again, the standard deviations are the smallest of all the methods, which demonstrates MCBIE's stability in nonstationary data streams with noise. Based on these experiment results, we may conclude that MCBIE is an effective and well-performing classification approach.

Further analysis of the experiment results in Table 1 reveals some interesting phenomena. As the noise ratio grows, MCBIE's advantage over the other methods increases. For instance, with a noise ratio of 20% on the SEA data stream, MCBIE ranks only second, behind PA-I; however, at 30% noise, MCBIE becomes the most accurate model. Given the same noise ratios, the experiment results show that the classification models perform better on the SEA data stream than on the Hyperplane data stream, which suggests it is more difficult for a classification model to learn knowledge from gradual drift than from abrupt drift. Of all the baselines, PA-I provides the best performance, indicating that selecting an appropriate learning rate τ is very important for incremental learning. The Hoeffding tree baseline has the largest standard deviation, which shows its instability.

Last but not least, we want to show that MCBIE adapts well to concept drift. Figures 5 and 6 illustrate the accuracy curves for the MCBIE method on the Hyperplane and SEA data streams. Figure 5 suggests that MCBIE can tackle concept drift in time on the Hyperplane data stream at the different noise ratios. When concept drift occurs, the curve in Figure 5 descends sharply, indicating that the concept held by the model is inconsistent with the current concept: MCBIE's accuracy decreases when the concept changes. However, the performance of MCBIE recovers immediately after the model is retrained and updated with the incoming instances through incremental learning. The intensity of ascent and descent in Figure 5 reflects the classification model's ability to catch concept drift. Figure 6 shows similar phenomena on the SEA data stream.

To demonstrate MCBIE's superiority, based on the above analysis we choose the two best methods, MCBIE and PA-I, to illustrate their ability to perform the prediction task in a streaming data setting with noise and concept drift. The accuracy curves are plotted in Figures 7 and 8. In Figure 7, the accuracy curves suggest that both methods can keep track of concept drift, and they show clearly that our method is superior to PA-I in terms of accuracy on the Hyperplane data stream at all three noise ratios.

Table 1: Average classification accuracy and standard deviation.

Method | Hyperplane, 20% noise | Hyperplane, 25% noise | Hyperplane, 30% noise | SEA, 20% noise | SEA, 30% noise
PA     | 0.622 ± 0.056         | 0.581 ± 0.054         | 0.548 ± 0.048         | 0.661 ± 0.061  | 0.577 ± 0.056
PA-I   | 0.678 ± 0.049         | 0.626 ± 0.050         | 0.582 ± 0.045         | 0.726 ± 0.060  | 0.617 ± 0.061
PA-II  | 0.646 ± 0.053         | 0.599 ± 0.053         | 0.558 ± 0.047         | 0.688 ± 0.062  | 0.595 ± 0.059
HT     | 0.582 ± 0.067         | 0.567 ± 0.061         | 0.556 ± 0.060         | 0.705 ± 0.068  | 0.625 ± 0.059
MCBIE  | 0.696 ± 0.051         | 0.648 ± 0.046         | 0.618 ± 0.047         | 0.708 ± 0.053  | 0.632 ± 0.046

Figure 5: The accuracy curve for the MCBIE method with the Hyperplane data stream (x-axis: data stream in 100-instance blocks, 0–200; y-axis: accuracy, 0.45–0.85; one curve per noise ratio: 20%, 25%, and 30%).


Figure 6: The accuracy curve for the MCBIE method with the SEA data stream (y-axis: accuracy, 0.5–0.9; curves for 20% and 30% noise).

Figure 7: The accuracy curves for the two best methods, MCBIE and PA-I, over the Hyperplane data streams: (a) Hyperplane with 20% noise; (b) Hyperplane with 25% noise; (c) Hyperplane with 30% noise.

Figure 8: The accuracy curves for the two best methods, MCBIE and PA-I, over the SEA data streams: (a) SEA with 20% noise; (b) SEA with 30% noise.


Moreover, in all three cases, both the maximum and minimum accuracy of MCBIE are higher than those of PA-I. Regarding the ability to adapt to concept drift on the SEA data stream, the two methods appear equally capable of handling nonstationary data stream classification, as demonstrated in Figure 8. Nevertheless, as the noise ratio grows, MCBIE outperforms PA-I in qualities such as stability.

From these analyses, we conclude that the MCBIE method can conduct nonstationary data stream classification with high accuracy in environments characterized by both concept drift and noise.

6. Conclusions

The classification task in nonstationary data streams faces two main problems, concept drift and noise, which require the classification model not only to cope with concept drift but also to differentiate noise from it. To deal with these problems, a novel method named MCBIE was proposed that achieves the classification task in nonstationary data streams with noise. To enhance MCBIE's performance, three strategies are used to alleviate the influence of concept drift and noise: incremental learning helps the microcluster classifiers catch concept change quickly, the ensemble strategy alleviates the confusion between noise and concept drift, and the smoothing parameter absorbs useful information from historical knowledge. Compared with the baseline methods, the experiment results confirm that our method MCBIE can perform classification in nonstationary streaming data settings. However, three problems deserve further attention: (1) the noise recognition ability of our method in abrupt drift environments needs to be further strengthened; (2) in addition to accuracy, the stability of the model needs to be improved; and (3) when concepts reoccur, it is important to design more appropriate strategies for the replacement of microclusters.

Data Availability

The data used to support the findings of this study have been deposited in the GitHub repository (https://github.com/FanzhenLiu/ComplexityJournal).

Conflicts of Interest

The authors declare that there are no conflicts of interest regarding the publication of this paper.

Acknowledgments

This study was supported by the Natural Science Foundation of Anhui Province (nos. 1608085MF147 and 1908085MF183), the Humanities and Social Science Foundation of the Ministry of Education (no. 18YJA630114), a Major Project of Natural Science Research in the Colleges and Universities of Anhui Province (no. KJ2019ZD15), MQNS (no. 9201701203), MQEPS (no. 96804590), MQRSG (no. 95109718), and the Investigative Analytics Collaborative Research Project between Macquarie University and Data61 CSIRO.

References

[1] G. Ditzler, M. Roveri, C. Alippi, and R. Polikar, "Learning in nonstationary environments: a survey," IEEE Computational Intelligence Magazine, vol. 10, no. 4, pp. 12–25, 2015.
[2] A. Jadhav, A. Jadhav, P. Jadhav, and P. Kulkarni, "A novel approach for the design of network intrusion detection system (NIDS)," in Proceedings of the 2013 International Conference on Sensor Network Security Technology and Privacy Communication System, IEEE, New York, NY, USA, pp. 22–27, December 2013.
[3] A. Salazar, G. Safont, A. Soriano, and L. Vergara, "Automatic credit card fraud detection based on non-linear signal processing," in Proceedings of the 2012 IEEE International Carnahan Conference on Security Technology, IEEE, Newton, MA, USA, pp. 207–212, October 2012.
[4] T. Bujlow, T. Riaz, and J. M. Pedersen, "A method for classification of network traffic based on C5.0 machine learning algorithm," in Proceedings of the 2012 International Conference on Computing, Networking and Communications, IEEE, Maui, HI, USA, pp. 237–241, February 2012.
[5] L. Gao, J. Wu, C. Zhou, and Y. Hu, "Collaborative dynamic sparse topic regression with user profile evolution for item recommendation," in Proceedings of the Thirty-First AAAI Conference on Artificial Intelligence, New York, NY, USA, February 2017.
[6] S. Xue, J. Lu, and G. Zhang, "Cross-domain network representations," Pattern Recognition, vol. 94, pp. 135–148, 2019.
[7] S. Ren, W. Zhu, B. Liao et al., "Selection-based resampling ensemble algorithm for nonstationary imbalanced stream data learning," Knowledge-Based Systems, vol. 163, pp. 705–722, 2019.
[8] J. Sun, H. Fujita, P. Chen, and H. Li, "Dynamic financial distress prediction with concept drift based on time weighting combined with Adaboost support vector machine ensemble," Knowledge-Based Systems, vol. 120, pp. 4–14, 2017.
[9] T. Zhai, Y. Gao, H. Wang, and L. Cao, "Classification of high-dimensional evolving data streams via a resource-efficient online ensemble," Data Mining and Knowledge Discovery, vol. 31, no. 5, pp. 1242–1265, 2017.
[10] W.-X. Lu, C. Zhou, and J. Wu, "Big social network influence maximization via recursively estimating influence spread," Knowledge-Based Systems, vol. 113, pp. 143–154, 2016.
[11] Y. Zhang, J. Wu, C. Zhou, and Z. Cai, "Instance cloned extreme learning machine," Pattern Recognition, vol. 68, pp. 52–65, 2017.
[12] J. Wu, Z. Cai, S. Zeng, and X. Zhu, "Artificial immune system for attribute weighted naive Bayes classification," in Proceedings of the 2013 International Joint Conference on Neural Networks (IJCNN), IEEE, Dallas, TX, USA, pp. 1–8, August 2013.
[13] J. Wu, S. Pan, X. Zhu, C. Zhang, and X. Wu, "Multi-instance learning with discriminative bag mapping," IEEE Transactions on Knowledge and Data Engineering, vol. 30, no. 6, pp. 1065–1080, 2018.
[14] P. ZareMoodi, S. K. Siahroudi, and H. Beigy, "A support vector based approach for classification beyond the learned label space in data streams," in Proceedings of the 31st Annual ACM Symposium on Applied Computing, ACM, Pisa, Italy, pp. 910–915, April 2016.
[15] S. Ramirez-Gallego, B. Krawczyk, S. Garcia, M. Wozniak, J. M. Benitez, and F. Herrera, "Nearest neighbor classification for high-speed big data streams using Spark," IEEE Transactions on Systems, Man, and Cybernetics: Systems, vol. 47, no. 10, pp. 2727–2739, 2017.
[16] J. Gama, R. Fernandes, and R. Rocha, "Decision trees for mining data streams," Intelligent Data Analysis, vol. 10, no. 1, pp. 23–45, 2006.
[17] H. L. Hammer, A. Yazidi, and B. J. Oommen, "On the classification of dynamical data streams using novel 'Anti-Bayesian' techniques," Pattern Recognition, vol. 76, pp. 108–124, 2018.
[18] M. A. Maloof and R. S. Michalski, "Incremental learning with partial instance memory," Artificial Intelligence, vol. 154, no. 1-2, pp. 95–126, 2004.
[19] K. Crammer, O. Dekel, J. Keshet, S. Shalev-Shwartz, and Y. Singer, "Online passive-aggressive algorithms," Journal of Machine Learning Research, vol. 7, pp. 551–585, 2006.
[20] M. Tennant, F. Stahl, O. Rana, and J. B. Gomes, "Scalable real-time classification of data streams with concept drift," Future Generation Computer Systems, vol. 75, pp. 187–199, 2017.
[21] J. Read, A. Bifet, B. Pfahringer, and G. Holmes, "Batch-incremental versus instance-incremental learning in dynamic and evolving data," in Proceedings of the International Symposium on Intelligent Data Analysis, Springer, Helsinki, Finland, pp. 313–323, October 2012.
[22] J. Lu, D. Sahoo, P. Zhao, and S. C. Hoi, "Sparse passive-aggressive learning for bounded online kernel methods," ACM Transactions on Intelligent Systems and Technology, vol. 9, no. 4, p. 45, 2018.
[23] M. Oide, A. Takahashi, T. Abe, and T. Suganuma, "User-oriented video streaming service based on passive aggressive learning," International Journal of Software Science and Computational Intelligence, vol. 9, no. 1, pp. 35–54, 2017.
[24] B. Krawczyk and M. Wozniak, "One-class classifiers with incremental learning and forgetting for data streams with concept drift," Soft Computing, vol. 19, no. 12, pp. 3387–3400, 2015.
[25] S. Xu and J. Wang, "A fast incremental extreme learning machine algorithm for data streams classification," Expert Systems with Applications, vol. 65, pp. 332–344, 2016.
[26] Y. Lu, Y.-M. Cheung, and Y. Y. Tang, "Dynamic weighted majority for incremental learning of imbalanced data streams with concept drift," in Proceedings of the 2017 International Joint Conference on Artificial Intelligence, pp. 2393–2399, Melbourne, Australia, August 2017.
[27] Y. Zhang, J. Yu, W. Liu, and K. Ota, "Ensemble classification for skewed data streams based on neural network," International Journal of Uncertainty, Fuzziness and Knowledge-Based Systems, vol. 26, p. 08, 2018.
[28] B. Krawczyk, L. L. Minku, J. Gama, J. Stefanowski, and M. Wozniak, "Ensemble learning for data stream analysis: a survey," Information Fusion, vol. 37, pp. 132–156, 2017.
[29] W. N. Street and Y. Kim, "A streaming ensemble algorithm (SEA) for large-scale classification," in Proceedings of the Seventh ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, ACM, San Francisco, CA, USA, pp. 377–382, August 2001.
[30] H. Wang, W. Fan, P. S. Yu, and J. Han, "Mining concept-drifting data streams using ensemble classifiers," in Proceedings of the Ninth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, ACM, Washington, DC, USA, pp. 226–235, December 2003.
[31] J. R. B. Junior and M. do Carmo Nicoletti, "An iterative boosting-based ensemble for streaming data classification," Information Fusion, vol. 45, pp. 66–78, 2019.
[32] K. Jackowski, "New diversity measure for data stream classification ensembles," Engineering Applications of Artificial Intelligence, vol. 74, pp. 23–34, 2018.
[33] G. I. Webb, R. Hyde, H. Cao, H. L. Nguyen, and F. Petitjean, "Characterizing concept drift," Data Mining and Knowledge Discovery, vol. 30, no. 4, pp. 964–994, 2016.
[34] A. Tsymbal, "The problem of concept drift: definitions and related work," Computer Science Department, vol. 106, no. 2, 2004.
[35] T. Zhang, R. Ramakrishnan, and M. Livny, "BIRCH," ACM SIGMOD Record, vol. 25, no. 2, pp. 103–114, 1996.
[36] A. Bifet, G. Holmes, R. Kirkby, and B. Pfahringer, "MOA: massive online analysis," Journal of Machine Learning Research, vol. 11, pp. 1601–1604, 2010.
[37] G. Hulten, L. Spencer, and P. Domingos, "Mining time-changing data streams," in Proceedings of the Seventh ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, ACM, San Francisco, CA, USA, pp. 97–106, August 2001.
[38] P. Domingos and G. Hulten, "Mining high-speed data streams," in Proceedings of the Fifth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, vol. 2, p. 4, Boston, MA, USA, April 2000.
[39] A. Bifet, G. Holmes, B. Pfahringer, and R. Gavalda, "Improving adaptive bagging methods for evolving data streams," in Proceedings of the 2009 Asian Conference on Machine Learning, Springer, Berlin, Germany, pp. 23–37, November 2009.

Page 6: Microcluster-BasedIncrementalEnsembleLearningforNoisy ...downloads.hindawi.com/journals/complexity/2020/6147378.pdf · concept drift in nonstationary streaming data with noise. Meanwhile,inordertocatchconceptdrift,theclassification

microcluster is constructed in this phase which is depictedin Figure 4

Scenario 1when incoming instancersquos label is the same asthe nearest microclusterrsquos label incoming instance isused to retrain this microcluster -e terms SS LS andCi d of the nearest microcluster are recalculated byequations (1)ndash(3) -e number of instances in thismicrocluster is incremented by 1 -e radius of micro-cluster is also updated by equation (4) -is scenario isshown in Figure 4(a) As a matter of fact when the in-coming instance drops into the nearest microcluster wecarry out the same operation that is the incoming in-stance is merged into the nearest microclusterScenario 2 incoming instancersquos label varies from thenearest microclusterrsquos label and incoming instance liesinside the boundary of the nearest microcluster as seenfrom Figure 4(b) In this paper there exists the fun-damental assumption that two adjacent instances arehighly likely to represent the same concept ie theprobability that they share the same class label is veryhigh According to the fundamental assumption theincoming instance will be treated as noise and deletedScenario 3 in contrast to Scenario 2 incoming instancersquoslabel is different from the nearest microclusterrsquos label andincoming instance does not drop into the nearestmicrocluster as shown in Figure 4(c) -is scenariosuggests that incoming instance is derived from thedifferent joint probability distribution Under thiscircumstance we think new concept happens and amicrocluster will be constructed with incoming in-stance by the method described in Section 41 Becausethere is only one instance in this new microclusterwhen it is constructed its label CL will be the same asthe incoming instance and its centroid will be theincoming instance itself -e terms SS and LS of thenew microcluster are computed by equations (1) and(2) and the value of α is 0

423 Phase 3 (Updating) 8e Pool of Microcluster IsUpdated As time passes new microclusters are continu-ously being created and eventually the pool will reach itslimit Once full the microcluster with the worst performancewill be replaced with new microcluster By this cyclicalupdate the classification model can effectively catch conceptchange and it leads to improve the classification accuracyGenerally speaking the smaller the value of α the worse theperformance of themicrocluster-erefore the microclusterwith the smallest α is selected for replacement

43 Algorithm and Complexity Analysis In summary of theabove phases and scenarios in data stream classificationmodel the algorithm of microcluster-based incrementalensemble classification named as MCBIE is expressed inAlgorithm 1

-e algorithm of MCBIE includes three phases whichachieve three functions namely classification incrementallearning and updating pool Line 1 is to train the initial

microcluster and build a pool of microclusters Lines 3 to 6achieve the classification for an incoming instance X andupdate the performance of the microcluster According to thethree different scenarios the function of Phase 2 is accom-plished in lines 7 to 12 Finally the size of base classifierreaches the upper-bound M the worst microcluster will bedeleted and the new microcluster is added to microclusterpool On the contrary the new microcluster is directly putinto microcluster pool It is illustrated in lines 13 to 19

In terms of complexity through the analysis of Algo-rithm 1 we know the core operation included by the al-gorithm MCBIE is to calculate the distance in classificationphase -e complexity here depends on mainly two aspectsthe dimensions of the instance (l) and the number ofmicroclusters as base classifier (M) in the ensemble model-us the presented algorithmrsquos time complexity is ap-proximately O(l middot M) In the presented algorithm theprevious instances are not reserved over time and the sta-tistical information of microcluster is recorded such as SSLS and Ci d which can save the storage memory by this way

5 Experiments

51 Datasets To evaluate MCBIE we conduct simulationexperiments with two synthetic datasets -e two datasetsselected are the Hyperplane data stream and the SEA datastream taken from Massive Online Analysis (MOA) [36]Hyperplane data stream is designed to test for gradual driftwhile SEA data stream is designed to test for abrupt drift-ese are the most popular datasets in the data streamclassification domain Further details are as follows

Hyperplane data stream [37] in the l-dimensionalspace a hyperplane includes the point set X whichsatisfies 1113936

li1 aixi a0 1113936

li1 ai where xi represents

the i-th dimension of X Instances for which1113936

li1 aixi ge a0 represent positive class and instances for

which 1113936li1 aixi lt a0 represent negative class A hy-

perplane in l-dimensional space may slowly rotate bychanging the parameters for simulating time-changingconcepts In this paper the value of l is 10 and there are6 attributes with concept drift and it generates 20000instances -ree different noise ratios (respectively20 25 and 30) are injected into data streamSEA data stream [29] the instances in this data streamare generated from three attributes with continuousvalues x1 x2 x3 isin [0 10] When it satisfies x1 + x2 ge θthe instance is positive class otherwise the label ofinstance is negative To simulate concept drift thethreshold value θ 8 9 7 95 will change over timeIt generates 5000 instances with each threshold valueand the whole SEA data stream includes 20000 in-stances SEA data stream with two different noise ratios(20 and 30) is applied in this experiment to test theabrupt drift

52 Baselines -e PA algorithmic framework [19] andHoeffding tree [38] are selected as baselines to compare withthe presented methodMCBIE and these two approaches are

6 Complexity

frequently chosen as the benchmark in many studies[20 22 23 38] Moreover as a well-known classical algo-rithm the Hoeffding tree algorithm is integrated into theMOA platform [36] -erefore we have followed suit in ourpaper

-e PA algorithmic framework [19] is an online in-cremental learning framework for binary classificationbased on SVM Given instance X the classificationmodel outputs the prediction as follows

1113954y sign(wrarr

middot X) (6)

where wrarr represents a vector of weights and 1113954y is the

prediction of instance X

After the 1113954y is output it acquires the ground truth classlabel y and computes a loss value resulting from the fol-lowing equation

ls max 0 1 minus y middot (wrarr

middot X)1113864 1113865 (7)

-e vector of weights wrarr is then updated using

wrarr

wrarr

+ τ middot y middot X (8)

where τ ge 0 is a Lagrange multiplier whose value is calcu-lated by equation (9) in three different methods namely PAPA-I and PA-II

τ ls

X2(PA)

τ min C lsX2

1113872 11138731113966 1113967(PA minus I)

τ ls

X2 +(1(2 middot C))(PA minus II)

(9)

where C is a positive parameter and referred to as aggres-siveness parameter of the algorithm A detailed outline of thederivation procedure can be found in [19]

Hoeffding tree [38] is a decision tree for online learningfrom the high-volume data stream and it is built fromeach instance in constant time According to Hoeffdingbound we can estimate the number of instances whichare needed to build the tree node-eHoeffding boundhas nothing to do with the distribution function that

generates the instances Moreover the Hoeffdingbound is used to construct Hoeffding tree which isapproximated to the one produced by batch learningIn the light of its incremental nature of the Hoeffdingtree it is used widely in data stream classification

53 Experiment Setup Following the first-test-and-then-train principle [39] each incoming instance is first testedand then the model is retrained with the incoming instanceunder an incremental paradigm To assess the classificationmodelrsquos performance in this paper the classification accu-racy is computed every one hundred instances during theprocess of data stream classification

Both our MCBIE and the baselines are initialized on thefirst 100 instances and the model resulting from that ini-tialization is used to predict the following instance in datastream In MCBIE we use these 100 instances to train 6initial microclusters as base classifiers by using the k-meansalgorithm At every time stamp the three nearest micro-clusters of each incoming instance are selected to assert thelabel information -e maximum scale of microcluster poolis 30 and once full a new microcluster which takes the placeof the worst-performing microcluster joins in the pool Weuse Weka package to implement the MCBIE algorithmHoeffding tree algorithm (named as HT) is run in MOAplatform with the parameters set to their default values PAPA-I and PA-II with Gaussian kernel are executed inMATLAB and the constant C is equal to 1

54 Experiment Result and Analysis -e simulation ex-periments are designed to evaluate MCBIE in two sidesFirst we want to assess the sensitivity of the smoothingparameter σ second we want to justify the feasibility andvalidity of MCBIE

541 Experiment 1 Sensitivity Analysis of the SmoothingParameter Following the experimental setup in Section 53we verify the function of smoothing parameter σ in MCBIEfrom 01 to 1 When the smoothing parameter σ is either toobig or too small the MCBIErsquos average accuracy and cor-responding standard deviation do not reach the desiredresult on the Hyperplane data stream and SEA data stream

(a) (b) (c)

Figure 4-ree different scenarios (a) incoming instancersquos label is the same as the nearest microclusterrsquos label (b) incoming instancersquos labelvaries from the nearest microclusterrsquos label which represents noisy instance and (c) incoming instance as a new concept does not drop intothe nearest microcluster and its label is different from the nearest microclusterrsquos label Note the color represents the class label the rectanglesuggests the incoming instance and the circle represents the nearest microcluster

Complexity 7

-rough the observation and analysis we find thesmoothing parameter σ could regulate the balance betweenthe historical and new instances used to compute the cen-troid of the microcluster When σ 0 the centroid of themicrocluster will not move and only its radius changes Onthe contrary when σ 1 reaches the maximum value themicroclusterrsquos centroid is a mean of instances It suggests allinstances have the same importance to the centroidHowever because concept drift will occur in nonstationarydata stream environment instance at different time stampsshould have different contributions to the centroid ofmicrocluster Experiment results justify this viewpointAccording to the analysis of experiment results a conclusionis made that the best value of σ is located at an interval[06 07] hence we chose σ 065 for subsequentexperiments

542 Experiment 2 Feasibility and Validity of MCBIEAll the experimental results with both the Hyperplane andSEA data streams are shown in Table 1 At the same time themaximum value in each column is marked in bold

From Table 1 we see the average accuracy of MCBIEreaches the highest value of 696 648 and 618 onHyperplane data stream with the noise ratio of 20 25and 30 respectively -e corresponding standard varianceof the three average accuracies is 0051 0046 and 0047 andthe standard variances about accuracy are relatively lowcompared with the baselines On average MCBIE providesthe most accurate classifications with the least standardvariance among all the baselines with the Hyperplane datastream On the SEA data stream the average classificationaccuracy of MCBIE is 708 at 20 noise and 632 at 30noise respectively Again the standard variances are thesmallest compared to all baselines which demonstrateMCBIErsquos stability in nonstationary data streams with noiseBased on the above experiment results we may draw aconclusion that MCBIE is an effective and well-performingclassification approach

-rough the further analysis of experiment results inTable 1 some interesting phenomena exist As the noise ratiogrows the performance of MCBIE is improved to a greaterdegree than the other methods For instance with a noiseratio of 20 on the SEA data stream MCBIE ranks only thesecond lead behind PA-I However at 30 noise MCBIEbecomes the most accurate model Given the same noiseratios the experiment results show that the classificationmodel on SEA data stream performs better than on Hy-perplane data stream -is suggests it is more difficult forclassification model to learn knowledge from gradual drift

than from abrupt drift Of all the baselines PA-I provides thebest performance which indicates that selecting an ap-propriate learning ratio τ is very important for incrementallearning-eHoeffding tree baseline has the largest standardvariance and it shows that Hoeffding tree has instability

Last but not least we want to showMCBIE adapts well toconcept drift Figures 5 and 6 illustrate accuracy curve for theMCBIE method with Hyperplane and SEA data streamsFigure 5 suggests that MCBIE can tackle concept drift intime on the Hyperplane data stream with the different noiseratios When concept drift occurs the curve plot in Figure 5sharply descends and it indicates the concept included by themodel is inconsistent with the current concept MCBIErsquosaccuracy decreases when concept changes However theperformance of MCBIE improves immediately after themodel is retrained and updated with the incoming instancethrough incremental learning -e intensity of ascent anddescent in Figure 5 reflects that the classification model hasthe ability to catch concept drift In Figure 6 we easilyunderstand that the similar phenomena are presented withthe SEA data stream

To demonstrateMCBIErsquos superiority based on the aboveanalysis we choose the two best methods MCBIE and PA-Ito illustrate the ability to perform the prediction task instreaming data setting with noise and concept drift -eaccuracy curve is plotted in Figures 7 and 8 From Figure 7the accuracy curve suggests that these two methods have theability to keep track with concept drift and shows clearly thatour method is superior to the PA-I in terms of accuracy overthe Hyperplane data stream with the three different noise

Table 1 Average classification accuracy and standard variance

Hyperplane data stream SEA data stream20 noise 25 noise 30 noise 20 noise 30 noise

PA 0622 plusmn 0056 0581 plusmn 0054 0548 plusmn 0048 0661 plusmn 0061 0577 plusmn 0056PA-I 0678 plusmn 0049 0626 plusmn 0050 0582 plusmn 0045 0726 plusmn 0060 0617 plusmn 0061PA-II 0646 plusmn 0053 0599 plusmn 0053 0558 plusmn 0047 0688 plusmn 0062 0595 plusmn 0059HT 0582 plusmn 0067 0567 plusmn 0061 0556 plusmn 0060 0705 plusmn 0068 0625 plusmn 0059MCBIE 0696 plusmn 0051 0648 plusmn 0046 0618 plusmn 0047 0708 plusmn 0053 0632 plusmn 0046

0 20 40 60 80 100 120 140 160 180 200045

053

061

069

077

085

Data streamsA

ccur

acy

20 noise25 noise30 noise

Figure 5 -e accuracy curve for the MCBIE method with Hy-perplane data stream

8 Complexity

20 noise30 noise

05

058

066

074

082

09

Acc

urac

y20 40 60 80 100 120 140 160 180 2000

Data streams

Figure 6 -e accuracy curve for the MCBIE method with SEA data stream

MCBIEPA-I

05

055

06

065

07

075

08

085

09

Acc

urac

y

20 40 60 80 100 120 140 160 180 2000Data streams

(a)

MCBIEPA-I

20 40 60 80 100 120 140 160 180 2000Data streams

045

05

055

06

065

07

075

08

Acc

urac

y

(b)

MCBIEPA-I

20 40 60 80 100 120 140 160 180 2000Data streams

045

05

055

06

065

07

075

Acc

urac

y

(c)

Figure 7 -e accuracy curve for the two best methods over Hyperplane data streams (a) Hyperplane with 20 noise (b) Hyperplane with25 noise (c) Hyperplane with 30 noise

Complexity 9

ratios Moreover in three cases the maximum and mini-mum accuracy of MCBIE is higher than that of the PA-I-rough the analysis of the accuracy curve over SEA datastream concerned with the ability to adapt to concept driftthese two methods seem to have the same function to dealwith nonstationary data stream classification as demon-strated in Figure 8 Moreover with the growth of noise ratiothe MCBIE has a better performance than PA-I such asstability

From these analyses we conclude that the MCBIEmethod is able to conduct nonstationary data stream clas-sification with high accuracy in environments characterizedby both concept drift and noise

6 Conclusions

Classification task in nonstationary data streams faces thetwo main problems concept drift and noise which requirethe classification model to not only cope with concept driftbut also differentiate noise from concept drift In order todeal with these problems a novel method named MCBIEwas proposed which can achieve the classification task innonstationary data streams with noise Aiming to enhanceMCBIErsquos performance the three strategies are used to al-leviate the influence of concept drift and noise In this paperincremental learning can help microcluster as classifier tocatch the concept change fast and ensemble strategy alle-viates the disturbance between noise and concept drift -efunction of smoothing parameter is to absorb the usefulinformation from historical knowledge Compared with thebaseline methods experiment results justify that ourmethod MCBIE has the ability to perform classification innonstationary streaming data setting However the threeproblems are worthy to be further concerned (1) how toimprove the noise recognition ability of our method in

abrupt drift environment needs to be further strengthened(2) in addition to accuracy the stability of model needs to beimproved (3) when concept reoccurs it is important todesign more appropriate strategies for the replacement ofmicrocluster

Data Availability

-e data used to support the findings of this study have beendeposited in the GitHub repository (httpsgithubcomFanzhenLiuComplexityJournal)

Conflicts of Interest

-e authors declare that there are no conflicts of interestregarding the publication of this paper

Acknowledgments

-is study was supported by the Natural Science Foundationof Anhui Province (nos 1608085MF147 and1908085MF183) the Humanities and Social ScienceFoundation of theMinistry of Education (no 18YJA630114)a Major Project of Natural Science Research in the Collegesand Universities of Anhui Province (no KJ2019ZD15)MQNS (no 9201701203) MQEPS (no 96804590) MQRSG(no 95109718) and the Investigative Analytics CollaborativeResearch Project between Macquarie University and Data61CSIRO

References

[1] G Ditzler M Roveri C Alippi and R Polikar ldquoLearning innonstationary environments a surveyrdquo IEEE ComputationalIntelligence Magazine vol 10 no 4 pp 12ndash25 2015

053

059

065

071

077

083

087

Acc

urac

y

20 40 60 80 100 120 140 160 180 2000Data streams

MCBIEPA-I

(a)

045

05

055

06

065

07

075

08

Acc

urac

y

20 40 60 80 100 120 140 160 180 2000Data streams

MCBIEPA-I

(b)

Figure 8 -e accuracy curve for the two best methods over SEA data streams (a) SEA with 20 noise (b) SEA with 30 noise

10 Complexity

[2] A Jadhav A Jadhav P Jadhav and P Kulkarni ldquoA novelapproach for the design of network intrusion detection system(NIDS)rdquo in Proceedings of 2013 International Conference onSensor Network Security Technology and Privacy Communi-cation System IEEE New York NY USA pp 22ndash27 De-cember 2013

[3] A Salazar G Safont A Soriano and L Vergara ldquoAutomaticcredit card fraud detection based on non-linear signal pro-cessingrdquo in Proceedings of 2012 IEEE International CarnahanConference on Security Technology IEEE Newton MA USApp 207ndash212 October 2012

[4] T Bujlow T Riaz and J M Pedersen ldquoA method forclassification of network traffic based on c5 0 machinelearning algorithmrdquo in Proceedings of the 2012 InternationalConference on Computing Networking and CommunicationsIEEE Maui HI USA pp 237ndash241 February 2012

[5] L Gao J Wu C Zhou and Y Hu ldquoCollaborative dynamicsparse topic regression with user profile evolution for itemrecommendationrdquo in Proceedings of the 8irty-First AAAIConference on Artificial Intelligence New York NY USAFebruary 2017

[6] S Xue J Lu and G Zhang ldquoCross-domain network repre-sentationsrdquo Pattern Recognition vol 94 pp 135ndash148 2019

[7] S Ren W Zhu B Liao et al ldquoSelection-based resamplingensemble algorithm for nonstationary imbalanced streamdata learningrdquo Knowledge-Based Systems vol 163 pp 705ndash722 2019

[8] J Sun H Fujita P Chen and H Li ldquoDynamic financialdistress prediction with concept drift based on time weightingcombined with adaboost support vector machine ensemblerdquoKnowledge-Based Systems vol 120 pp 4ndash14 2017

[9] T Zhai Y Gao H Wang and L Cao ldquoClassification of high-dimensional evolving data streams via a resource-efficientonline ensemblerdquo Data Mining and Knowledge Discoveryvol 31 no 5 pp 1242ndash1265 2017

[10] W-X Lu C Zhou and J Wu ldquoBig social network influencemaximization via recursively estimating influence spreadrdquoKnowledge-Based Systems vol 113 pp 143ndash154 2016

[11] Y Zhang J Wu C Zhou and Z Cai ldquoInstance cloned ex-treme learning machinerdquo Pattern Recognition vol 68pp 52ndash65 2017

[12] J Wu Z Cai S Zeng and X Zhu ldquoArtificial immune systemfor attribute weighted naive bayes classificationrdquo in Proceedingsof the 2013 International Joint Conference on Neural Networks(IJCNN) IEEE Dallas TX USA pp 1ndash8 August 2013

[13] J Wu S Pan X Zhu C Zhang and X Wu ldquoMulti-instancelearning with discriminative bagmappingrdquo IEEE Transactionson Knowledge and Data Engineering vol 30 no 6pp 1065ndash1080 2018

[14] P ZareMoodi S K Siahroudi and H Beigy ldquoA supportvector based approach for classification beyond the learnedlabel space in data streamsrdquo in Proceedings of the 31st AnnualACM Symposium on Applied Computing ACM Pisa Italypp 910ndash915 April 2016

[15] S Ramirez-Gallego B Krawczyk S Garcia M WozniakJ M Benitez and F Herrera ldquoNearest neighbor classificationfor high-speed big data streams using sparkrdquo IEEE Trans-actions on Systems Man and Cybernetics Systems vol 47no 10 pp 2727ndash2739 2017

[16] J Gama R Fernandes and R Rocha ldquoDecision trees formining data streamsrdquo Intelligent Data Analysis vol 10 no 1pp 23ndash45 2006

[17] H L Hammer A Yazidi and B J Oommen ldquoOn theclassification of dynamical data streams using novel ldquoAnti-

Bayesianrdquo techniquesrdquo Pattern Recognition vol 76 pp 108ndash124 2018

[18] M A Maloof and R S Michalski ldquoIncremental learning withpartial instance memoryrdquo Artificial Intelligence vol 154no 1-2 pp 95ndash126 2004

[19] K Crammer O Dekel J Keshet S Shalev-Shwartz andY Singer ldquoOnline passive-aggressive algorithmsrdquo Journal ofMachine Learning Research vol 7 pp 551ndash585 2006

[20] M Tennant F Stahl O Rana and J B Gomes ldquoScalable real-time classification of data streams with concept driftrdquo FutureGeneration Computer Systems vol 75 pp 187ndash199 2017

[21] J Read A Bifet B Pfahringer and G Holmes ldquoBatch-in-cremental versus instance-incremental learning in dynamicand evolving datardquo in Proceedings of International Symposiumon Intelligent Data Analysis Springer Helsinki Finlandpp 313ndash323 October 2012

[22] J Lu D Sahoo P Zhao and S C Hoi ldquoSparse passive-ag-gressive learning for bounded online kernel methodsrdquo ACMTransactions on Intelligent Systems and Technology vol 9no 4 p 45 2018

[23] M Oide A Takahashi T Abe and T Suganuma ldquoUser-oriented video streaming service based on passive aggressivelearningrdquo International Journal of Software Science andComputational Intelligence vol 9 no 1 pp 35ndash54 2017

[24] B Krawczyk and M Wozniak ldquoOne-class classifiers with in-cremental learning and forgetting for data streamswith conceptdriftrdquo Soft Computing vol 19 no 12 pp 3387ndash3400 2015

[25] S Xu and J Wang ldquoA fast incremental extreme learningmachine algorithm for data streams classificationrdquo ExpertSystems with Applications vol 65 pp 332ndash344 2016

[26] Y Lu Y-M Cheung and Y Y Tang ldquoDynamic weightedmajority for incremental learning of imbalanced data streamswith concept driftrdquo in Proceedings of the 2017 InternationalJoint Conference on Artificial Intelligence pp 2393ndash2399Melbourne Australia August 2017

[27] Y Zhang J Yu W Liu and K Ota ldquoEnsemble classificationfor skewed data streams based on neural networkrdquo Inter-national Journal of Uncertainty Fuzziness and Knowledge-Based Systems vol 26 p 08 2018

[28] B Krawczyk L L Minku J Gama J Stefanowski andM Wozniak ldquoEnsemble learning for data stream analysis asurveyrdquo Information Fusion vol 37 pp 132ndash156 2017

[29] W N Street and Y Kim ldquoA streaming ensemble algorithm(sea) for large-scale classificationrdquo in Proceedings of theSeventh ACM SIGKDD International Conference on Knowl-edge Discovery and Data Mining ACM San Francisco CAUSA pp 377ndash382 August 2001

[30] H Wang W Fan P S Yu and J Han ldquoMining concept-drifting data streams using ensemble classifiersrdquo in Pro-ceedings of the Ninth ACM SIGKDD International Conferenceon Knowledge Discovery and Data Mining ACM Wash-ington DC USA pp 226ndash235 December 2003

[31] J R B Junior and M do Carmo Nicoletti ldquoAn iterativeboosting-based ensemble for streaming data classificationrdquoInformation Fusion vol 45 pp 66ndash78 2019

[32] K Jackowski ldquoNew diversity measure for data stream clas-sification ensemblesrdquo Engineering Applications of ArtificialIntelligence vol 74 pp 23ndash34 2018

[33] G I Webb R Hyde H Cao H L Nguyen and F PetitjeanldquoCharacterizing concept driftrdquo Data Mining and KnowledgeDiscovery vol 30 no 4 pp 964ndash994 2016

[34] A Tsymbal ldquo-e problem of concept drift definitions andrelated workrdquo Computer Science Department vol 106 no 22004

Complexity 11

[35] T Zhang R Ramakrishnan and M Livny ldquoBirchrdquo ACMSIGMOD Record vol 25 no 2 pp 103ndash114 1996

[36] A Bifet G Holmes R Kirkby and B Pfahringer ldquoMoaMassive online analysisrdquo Journal of Machine Learning Re-search vol 11 pp 1601ndash1604 2010

[37] G Hulten L Spencer and P Domingos ldquoMining time-changing data streamsrdquo in Proceedings of the Seventh ACMSIGKDD International Conference on Knowledge Discoveryand Data Mining ACM San Francisco CA USA pp 97ndash106December 2001

[38] P Domingos and G Hulten ldquoMining high-speed datastreamsrdquo in Proceedings of the Fifth ACM SIGKDD Inter-national Conference on Knowledge Discovery and DataMining vol 2 p 4 Boston MA USA April 2000

[39] A Bifet G Holmes B Pfahringer and R Gavalda ldquoIm-proving adaptive bagging methods for evolving data streamsrdquoin Proceedings of 2009 Asian Conference on Machine LearningSpringer Berlin Germany pp 23ndash37 November 2009

12 Complexity

Page 7: Microcluster-BasedIncrementalEnsembleLearningforNoisy ...downloads.hindawi.com/journals/complexity/2020/6147378.pdf · concept drift in nonstationary streaming data with noise. Meanwhile,inordertocatchconceptdrift,theclassification

frequently chosen as the benchmark in many studies[20 22 23 38] Moreover as a well-known classical algo-rithm the Hoeffding tree algorithm is integrated into theMOA platform [36] -erefore we have followed suit in ourpaper

-e PA algorithmic framework [19] is an online in-cremental learning framework for binary classificationbased on SVM Given instance X the classificationmodel outputs the prediction as follows

1113954y sign(wrarr

middot X) (6)

where wrarr represents a vector of weights and 1113954y is the

prediction of instance X

After the 1113954y is output it acquires the ground truth classlabel y and computes a loss value resulting from the fol-lowing equation

ls max 0 1 minus y middot (wrarr

middot X)1113864 1113865 (7)

-e vector of weights wrarr is then updated using

wrarr

wrarr

+ τ middot y middot X (8)

where τ ge 0 is a Lagrange multiplier whose value is calcu-lated by equation (9) in three different methods namely PAPA-I and PA-II

τ ls

X2(PA)

τ min C lsX2

1113872 11138731113966 1113967(PA minus I)

τ ls

X2 +(1(2 middot C))(PA minus II)

(9)

where C is a positive parameter and referred to as aggres-siveness parameter of the algorithm A detailed outline of thederivation procedure can be found in [19]

Hoeffding tree [38] is a decision tree for online learningfrom the high-volume data stream and it is built fromeach instance in constant time According to Hoeffdingbound we can estimate the number of instances whichare needed to build the tree node-eHoeffding boundhas nothing to do with the distribution function that

generates the instances Moreover the Hoeffdingbound is used to construct Hoeffding tree which isapproximated to the one produced by batch learningIn the light of its incremental nature of the Hoeffdingtree it is used widely in data stream classification

53 Experiment Setup Following the first-test-and-then-train principle [39] each incoming instance is first testedand then the model is retrained with the incoming instanceunder an incremental paradigm To assess the classificationmodelrsquos performance in this paper the classification accu-racy is computed every one hundred instances during theprocess of data stream classification

Both our MCBIE and the baselines are initialized on thefirst 100 instances and the model resulting from that ini-tialization is used to predict the following instance in datastream In MCBIE we use these 100 instances to train 6initial microclusters as base classifiers by using the k-meansalgorithm At every time stamp the three nearest micro-clusters of each incoming instance are selected to assert thelabel information -e maximum scale of microcluster poolis 30 and once full a new microcluster which takes the placeof the worst-performing microcluster joins in the pool Weuse Weka package to implement the MCBIE algorithmHoeffding tree algorithm (named as HT) is run in MOAplatform with the parameters set to their default values PAPA-I and PA-II with Gaussian kernel are executed inMATLAB and the constant C is equal to 1

54 Experiment Result and Analysis -e simulation ex-periments are designed to evaluate MCBIE in two sidesFirst we want to assess the sensitivity of the smoothingparameter σ second we want to justify the feasibility andvalidity of MCBIE

541 Experiment 1 Sensitivity Analysis of the SmoothingParameter Following the experimental setup in Section 53we verify the function of smoothing parameter σ in MCBIEfrom 01 to 1 When the smoothing parameter σ is either toobig or too small the MCBIErsquos average accuracy and cor-responding standard deviation do not reach the desiredresult on the Hyperplane data stream and SEA data stream

(a) (b) (c)

Figure 4-ree different scenarios (a) incoming instancersquos label is the same as the nearest microclusterrsquos label (b) incoming instancersquos labelvaries from the nearest microclusterrsquos label which represents noisy instance and (c) incoming instance as a new concept does not drop intothe nearest microcluster and its label is different from the nearest microclusterrsquos label Note the color represents the class label the rectanglesuggests the incoming instance and the circle represents the nearest microcluster

Complexity 7

-rough the observation and analysis we find thesmoothing parameter σ could regulate the balance betweenthe historical and new instances used to compute the cen-troid of the microcluster When σ 0 the centroid of themicrocluster will not move and only its radius changes Onthe contrary when σ 1 reaches the maximum value themicroclusterrsquos centroid is a mean of instances It suggests allinstances have the same importance to the centroidHowever because concept drift will occur in nonstationarydata stream environment instance at different time stampsshould have different contributions to the centroid ofmicrocluster Experiment results justify this viewpointAccording to the analysis of experiment results a conclusionis made that the best value of σ is located at an interval[06 07] hence we chose σ 065 for subsequentexperiments

542 Experiment 2 Feasibility and Validity of MCBIEAll the experimental results with both the Hyperplane andSEA data streams are shown in Table 1 At the same time themaximum value in each column is marked in bold

From Table 1 we see the average accuracy of MCBIEreaches the highest value of 696 648 and 618 onHyperplane data stream with the noise ratio of 20 25and 30 respectively -e corresponding standard varianceof the three average accuracies is 0051 0046 and 0047 andthe standard variances about accuracy are relatively lowcompared with the baselines On average MCBIE providesthe most accurate classifications with the least standardvariance among all the baselines with the Hyperplane datastream On the SEA data stream the average classificationaccuracy of MCBIE is 708 at 20 noise and 632 at 30noise respectively Again the standard variances are thesmallest compared to all baselines which demonstrateMCBIErsquos stability in nonstationary data streams with noiseBased on the above experiment results we may draw aconclusion that MCBIE is an effective and well-performingclassification approach

Further analysis of the results in Table 1 reveals some interesting phenomena. As the noise ratio grows, the performance of MCBIE degrades less than that of the other methods. For instance, with a noise ratio of 20% on the SEA data stream, MCBIE ranks only second, behind PA-I; however, at 30% noise, MCBIE becomes the most accurate model. At the same noise ratios, the classification models perform better on the SEA data stream than on the Hyperplane data stream, which suggests it is more difficult for a classification model to learn knowledge under gradual drift than under abrupt drift. Of all the baselines, PA-I provides the best performance, which indicates that selecting an appropriate learning rate τ is very important for incremental learning. The Hoeffding tree baseline has the largest standard deviation, which shows that the Hoeffding tree is unstable in this setting.
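For reference, the role of τ can be seen in the standard passive-aggressive updates of Crammer et al. [19], which the PA baselines follow. For a linear model $w_t$, label $y_t \in \{-1, +1\}$, and hinge loss $\ell_t = \max\{0,\, 1 - y_t (w_t \cdot x_t)\}$, the update is $w_{t+1} = w_t + \tau_t y_t x_t$ with

$$\tau_t = \frac{\ell_t}{\lVert x_t \rVert^2} \;(\text{PA}), \qquad \tau_t = \min\Big\{C,\ \frac{\ell_t}{\lVert x_t \rVert^2}\Big\} \;(\text{PA-I}), \qquad \tau_t = \frac{\ell_t}{\lVert x_t \rVert^2 + \frac{1}{2C}} \;(\text{PA-II}),$$

where C is the aggressiveness constant (C = 1 in our experiments). PA-I caps the step size, which helps explain its robustness to noisy labels relative to plain PA.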

Last but not least, we want to show that MCBIE adapts well to concept drift. Figures 5 and 6 illustrate the accuracy curves of the MCBIE method on the Hyperplane and SEA data streams. Figure 5 suggests that MCBIE can tackle concept drift in time on the Hyperplane data stream at the different noise ratios. When concept drift occurs, the curve in Figure 5 descends sharply, indicating that the concept captured by the model is inconsistent with the current concept: MCBIE's accuracy decreases when the concept changes. However, the performance of MCBIE recovers immediately after the model is retrained and updated with the incoming instances through incremental learning. The intensity of ascent and descent in Figure 5 reflects the classification model's ability to catch concept drift. Figure 6 shows similar phenomena on the SEA data stream.
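The curves in Figures 5 and 6 come from this predict-then-update regime. A minimal test-then-train (prequential) loop, reusing the sketches above and an assumed update_pool step that absorbs each labeled instance with smoothing σ = 0.65, might look as follows; the per-window accuracy is what such curves plot:

```python
def prequential_accuracy(pool, stream, window=100):
    """Test-then-train evaluation (our sketch): predict each incoming
    instance first, then update the ensemble with its true label.
    Returns per-window accuracies, as plotted in Figures 5-8."""
    hits, curve = [], []
    for x, y in stream:                        # stream yields (instance, label) pairs
        hits.append(predict_label(pool, x) == y)
        update_pool(pool, x, y, sigma=0.65)    # assumed incremental update step
        if len(hits) % window == 0:
            curve.append(np.mean(hits[-window:]))
    return curve
```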

To further demonstrate MCBIE's superiority, based on the above analysis we take the two best methods, MCBIE and PA-I, and compare their ability to perform the prediction task in a streaming setting with noise and concept drift. The accuracy curves are plotted in Figures 7 and 8. Figure 7 suggests that both methods can track concept drift, and it shows clearly that our method is superior to PA-I in terms of accuracy on the Hyperplane data stream at all three noise ratios.

Table 1: Average classification accuracy ± standard deviation (* marks the best value in each column).

           Hyperplane data stream                          SEA data stream
Method     20% noise        25% noise        30% noise        20% noise        30% noise
PA         0.622 ± 0.056    0.581 ± 0.054    0.548 ± 0.048    0.661 ± 0.061    0.577 ± 0.056
PA-I       0.678 ± 0.049    0.626 ± 0.050    0.582 ± 0.045    0.726 ± 0.060*   0.617 ± 0.061
PA-II      0.646 ± 0.053    0.599 ± 0.053    0.558 ± 0.047    0.688 ± 0.062    0.595 ± 0.059
HT         0.582 ± 0.067    0.567 ± 0.061    0.556 ± 0.060    0.705 ± 0.068    0.625 ± 0.059
MCBIE      0.696 ± 0.051*   0.648 ± 0.046*   0.618 ± 0.047*   0.708 ± 0.053    0.632 ± 0.046*

Figure 5: The accuracy curve of the MCBIE method on the Hyperplane data stream at 20%, 25%, and 30% noise (x-axis: data stream timestamp; y-axis: accuracy).


Figure 6: The accuracy curve of the MCBIE method on the SEA data stream at 20% and 30% noise (x-axis: data stream timestamp; y-axis: accuracy).

Figure 7: Accuracy curves of the two best methods (MCBIE and PA-I) on the Hyperplane data stream: (a) Hyperplane with 20% noise; (b) Hyperplane with 25% noise; (c) Hyperplane with 30% noise.


Moreover, in all three cases, both the maximum and minimum accuracy of MCBIE are higher than those of PA-I. Regarding the accuracy curves on the SEA data stream, the two methods appear comparably able to adapt to concept drift in nonstationary data stream classification, as demonstrated in Figure 8. However, as the noise ratio grows, MCBIE outperforms PA-I in other respects, such as stability.

Figure 8: Accuracy curves of the two best methods (MCBIE and PA-I) on the SEA data stream: (a) SEA with 20% noise; (b) SEA with 30% noise.

From these analyses, we conclude that the MCBIE method is able to perform nonstationary data stream classification with high accuracy in environments characterized by both concept drift and noise.

6. Conclusions

Classification in nonstationary data streams faces two main problems, concept drift and noise, which require the classification model not only to cope with concept drift but also to differentiate noise from it. To deal with these problems, we proposed a novel method named MCBIE that accomplishes the classification task in nonstationary data streams with noise. To enhance MCBIE's performance, three strategies are used to alleviate the influence of concept drift and noise: incremental learning helps the microclusters, acting as classifiers, catch concept change quickly; the ensemble strategy alleviates the confusion between noise and concept drift; and the smoothing parameter absorbs useful information from historical knowledge. Compared with the baseline methods, the experiment results confirm that MCBIE is able to perform classification in nonstationary streaming data settings. However, three problems deserve further attention: (1) the noise recognition ability of our method in abrupt drift environments needs to be further strengthened; (2) in addition to accuracy, the stability of the model needs to be improved; and (3) when a concept reoccurs, it is important to design more appropriate strategies for microcluster replacement.

Data Availability

The data used to support the findings of this study have been deposited in the GitHub repository (https://github.com/FanzhenLiu/ComplexityJournal).

Conflicts of Interest

The authors declare that there are no conflicts of interest regarding the publication of this paper.

Acknowledgments

This study was supported by the Natural Science Foundation of Anhui Province (nos. 1608085MF147 and 1908085MF183), the Humanities and Social Science Foundation of the Ministry of Education (no. 18YJA630114), a Major Project of Natural Science Research in the Colleges and Universities of Anhui Province (no. KJ2019ZD15), MQNS (no. 9201701203), MQEPS (no. 96804590), MQRSG (no. 95109718), and the Investigative Analytics Collaborative Research Project between Macquarie University and Data61 CSIRO.

References

[1] G. Ditzler, M. Roveri, C. Alippi, and R. Polikar, "Learning in nonstationary environments: a survey," IEEE Computational Intelligence Magazine, vol. 10, no. 4, pp. 12–25, 2015.


[2] A. Jadhav, A. Jadhav, P. Jadhav, and P. Kulkarni, "A novel approach for the design of network intrusion detection system (NIDS)," in Proceedings of the 2013 International Conference on Sensor Network Security Technology and Privacy Communication System, IEEE, New York, NY, USA, pp. 22–27, December 2013.

[3] A. Salazar, G. Safont, A. Soriano, and L. Vergara, "Automatic credit card fraud detection based on non-linear signal processing," in Proceedings of the 2012 IEEE International Carnahan Conference on Security Technology, IEEE, Newton, MA, USA, pp. 207–212, October 2012.

[4] T. Bujlow, T. Riaz, and J. M. Pedersen, "A method for classification of network traffic based on C5.0 machine learning algorithm," in Proceedings of the 2012 International Conference on Computing, Networking and Communications, IEEE, Maui, HI, USA, pp. 237–241, February 2012.

[5] L. Gao, J. Wu, C. Zhou, and Y. Hu, "Collaborative dynamic sparse topic regression with user profile evolution for item recommendation," in Proceedings of the Thirty-First AAAI Conference on Artificial Intelligence, New York, NY, USA, February 2017.

[6] S. Xue, J. Lu, and G. Zhang, "Cross-domain network representations," Pattern Recognition, vol. 94, pp. 135–148, 2019.

[7] S. Ren, W. Zhu, B. Liao et al., "Selection-based resampling ensemble algorithm for nonstationary imbalanced stream data learning," Knowledge-Based Systems, vol. 163, pp. 705–722, 2019.

[8] J. Sun, H. Fujita, P. Chen, and H. Li, "Dynamic financial distress prediction with concept drift based on time weighting combined with Adaboost support vector machine ensemble," Knowledge-Based Systems, vol. 120, pp. 4–14, 2017.

[9] T. Zhai, Y. Gao, H. Wang, and L. Cao, "Classification of high-dimensional evolving data streams via a resource-efficient online ensemble," Data Mining and Knowledge Discovery, vol. 31, no. 5, pp. 1242–1265, 2017.

[10] W.-X. Lu, C. Zhou, and J. Wu, "Big social network influence maximization via recursively estimating influence spread," Knowledge-Based Systems, vol. 113, pp. 143–154, 2016.

[11] Y. Zhang, J. Wu, C. Zhou, and Z. Cai, "Instance cloned extreme learning machine," Pattern Recognition, vol. 68, pp. 52–65, 2017.

[12] J. Wu, Z. Cai, S. Zeng, and X. Zhu, "Artificial immune system for attribute weighted naive Bayes classification," in Proceedings of the 2013 International Joint Conference on Neural Networks (IJCNN), IEEE, Dallas, TX, USA, pp. 1–8, August 2013.

[13] J. Wu, S. Pan, X. Zhu, C. Zhang, and X. Wu, "Multi-instance learning with discriminative bag mapping," IEEE Transactions on Knowledge and Data Engineering, vol. 30, no. 6, pp. 1065–1080, 2018.

[14] P. ZareMoodi, S. K. Siahroudi, and H. Beigy, "A support vector based approach for classification beyond the learned label space in data streams," in Proceedings of the 31st Annual ACM Symposium on Applied Computing, ACM, Pisa, Italy, pp. 910–915, April 2016.

[15] S. Ramirez-Gallego, B. Krawczyk, S. Garcia, M. Wozniak, J. M. Benitez, and F. Herrera, "Nearest neighbor classification for high-speed big data streams using Spark," IEEE Transactions on Systems, Man, and Cybernetics: Systems, vol. 47, no. 10, pp. 2727–2739, 2017.

[16] J. Gama, R. Fernandes, and R. Rocha, "Decision trees for mining data streams," Intelligent Data Analysis, vol. 10, no. 1, pp. 23–45, 2006.

[17] H. L. Hammer, A. Yazidi, and B. J. Oommen, "On the classification of dynamical data streams using novel "Anti-Bayesian" techniques," Pattern Recognition, vol. 76, pp. 108–124, 2018.

[18] M. A. Maloof and R. S. Michalski, "Incremental learning with partial instance memory," Artificial Intelligence, vol. 154, no. 1-2, pp. 95–126, 2004.

[19] K. Crammer, O. Dekel, J. Keshet, S. Shalev-Shwartz, and Y. Singer, "Online passive-aggressive algorithms," Journal of Machine Learning Research, vol. 7, pp. 551–585, 2006.

[20] M. Tennant, F. Stahl, O. Rana, and J. B. Gomes, "Scalable real-time classification of data streams with concept drift," Future Generation Computer Systems, vol. 75, pp. 187–199, 2017.

[21] J. Read, A. Bifet, B. Pfahringer, and G. Holmes, "Batch-incremental versus instance-incremental learning in dynamic and evolving data," in Proceedings of the International Symposium on Intelligent Data Analysis, Springer, Helsinki, Finland, pp. 313–323, October 2012.

[22] J. Lu, D. Sahoo, P. Zhao, and S. C. Hoi, "Sparse passive-aggressive learning for bounded online kernel methods," ACM Transactions on Intelligent Systems and Technology, vol. 9, no. 4, p. 45, 2018.

[23] M. Oide, A. Takahashi, T. Abe, and T. Suganuma, "User-oriented video streaming service based on passive aggressive learning," International Journal of Software Science and Computational Intelligence, vol. 9, no. 1, pp. 35–54, 2017.

[24] B. Krawczyk and M. Wozniak, "One-class classifiers with incremental learning and forgetting for data streams with concept drift," Soft Computing, vol. 19, no. 12, pp. 3387–3400, 2015.

[25] S. Xu and J. Wang, "A fast incremental extreme learning machine algorithm for data streams classification," Expert Systems with Applications, vol. 65, pp. 332–344, 2016.

[26] Y. Lu, Y.-M. Cheung, and Y. Y. Tang, "Dynamic weighted majority for incremental learning of imbalanced data streams with concept drift," in Proceedings of the 2017 International Joint Conference on Artificial Intelligence, pp. 2393–2399, Melbourne, Australia, August 2017.

[27] Y. Zhang, J. Yu, W. Liu, and K. Ota, "Ensemble classification for skewed data streams based on neural network," International Journal of Uncertainty, Fuzziness and Knowledge-Based Systems, vol. 26, p. 08, 2018.

[28] B. Krawczyk, L. L. Minku, J. Gama, J. Stefanowski, and M. Wozniak, "Ensemble learning for data stream analysis: a survey," Information Fusion, vol. 37, pp. 132–156, 2017.

[29] W. N. Street and Y. Kim, "A streaming ensemble algorithm (SEA) for large-scale classification," in Proceedings of the Seventh ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, ACM, San Francisco, CA, USA, pp. 377–382, August 2001.

[30] H. Wang, W. Fan, P. S. Yu, and J. Han, "Mining concept-drifting data streams using ensemble classifiers," in Proceedings of the Ninth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, ACM, Washington, DC, USA, pp. 226–235, December 2003.

[31] J. R. B. Junior and M. do Carmo Nicoletti, "An iterative boosting-based ensemble for streaming data classification," Information Fusion, vol. 45, pp. 66–78, 2019.

[32] K. Jackowski, "New diversity measure for data stream classification ensembles," Engineering Applications of Artificial Intelligence, vol. 74, pp. 23–34, 2018.

[33] G. I. Webb, R. Hyde, H. Cao, H. L. Nguyen, and F. Petitjean, "Characterizing concept drift," Data Mining and Knowledge Discovery, vol. 30, no. 4, pp. 964–994, 2016.

[34] A. Tsymbal, "The problem of concept drift: definitions and related work," Computer Science Department, vol. 106, no. 2, 2004.

[35] T. Zhang, R. Ramakrishnan, and M. Livny, "BIRCH: an efficient data clustering method for very large databases," ACM SIGMOD Record, vol. 25, no. 2, pp. 103–114, 1996.

[36] A. Bifet, G. Holmes, R. Kirkby, and B. Pfahringer, "MOA: massive online analysis," Journal of Machine Learning Research, vol. 11, pp. 1601–1604, 2010.

[37] G. Hulten, L. Spencer, and P. Domingos, "Mining time-changing data streams," in Proceedings of the Seventh ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, ACM, San Francisco, CA, USA, pp. 97–106, December 2001.

[38] P. Domingos and G. Hulten, "Mining high-speed data streams," in Proceedings of the Fifth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, vol. 2, p. 4, Boston, MA, USA, April 2000.

[39] A. Bifet, G. Holmes, B. Pfahringer, and R. Gavalda, "Improving adaptive bagging methods for evolving data streams," in Proceedings of the 2009 Asian Conference on Machine Learning, Springer, Berlin, Germany, pp. 23–37, November 2009.

12 Complexity

Page 8: Microcluster-BasedIncrementalEnsembleLearningforNoisy ...downloads.hindawi.com/journals/complexity/2020/6147378.pdf · concept drift in nonstationary streaming data with noise. Meanwhile,inordertocatchconceptdrift,theclassification

-rough the observation and analysis we find thesmoothing parameter σ could regulate the balance betweenthe historical and new instances used to compute the cen-troid of the microcluster When σ 0 the centroid of themicrocluster will not move and only its radius changes Onthe contrary when σ 1 reaches the maximum value themicroclusterrsquos centroid is a mean of instances It suggests allinstances have the same importance to the centroidHowever because concept drift will occur in nonstationarydata stream environment instance at different time stampsshould have different contributions to the centroid ofmicrocluster Experiment results justify this viewpointAccording to the analysis of experiment results a conclusionis made that the best value of σ is located at an interval[06 07] hence we chose σ 065 for subsequentexperiments

542 Experiment 2 Feasibility and Validity of MCBIEAll the experimental results with both the Hyperplane andSEA data streams are shown in Table 1 At the same time themaximum value in each column is marked in bold

From Table 1 we see the average accuracy of MCBIEreaches the highest value of 696 648 and 618 onHyperplane data stream with the noise ratio of 20 25and 30 respectively -e corresponding standard varianceof the three average accuracies is 0051 0046 and 0047 andthe standard variances about accuracy are relatively lowcompared with the baselines On average MCBIE providesthe most accurate classifications with the least standardvariance among all the baselines with the Hyperplane datastream On the SEA data stream the average classificationaccuracy of MCBIE is 708 at 20 noise and 632 at 30noise respectively Again the standard variances are thesmallest compared to all baselines which demonstrateMCBIErsquos stability in nonstationary data streams with noiseBased on the above experiment results we may draw aconclusion that MCBIE is an effective and well-performingclassification approach

-rough the further analysis of experiment results inTable 1 some interesting phenomena exist As the noise ratiogrows the performance of MCBIE is improved to a greaterdegree than the other methods For instance with a noiseratio of 20 on the SEA data stream MCBIE ranks only thesecond lead behind PA-I However at 30 noise MCBIEbecomes the most accurate model Given the same noiseratios the experiment results show that the classificationmodel on SEA data stream performs better than on Hy-perplane data stream -is suggests it is more difficult forclassification model to learn knowledge from gradual drift

than from abrupt drift Of all the baselines PA-I provides thebest performance which indicates that selecting an ap-propriate learning ratio τ is very important for incrementallearning-eHoeffding tree baseline has the largest standardvariance and it shows that Hoeffding tree has instability

Last but not least we want to showMCBIE adapts well toconcept drift Figures 5 and 6 illustrate accuracy curve for theMCBIE method with Hyperplane and SEA data streamsFigure 5 suggests that MCBIE can tackle concept drift intime on the Hyperplane data stream with the different noiseratios When concept drift occurs the curve plot in Figure 5sharply descends and it indicates the concept included by themodel is inconsistent with the current concept MCBIErsquosaccuracy decreases when concept changes However theperformance of MCBIE improves immediately after themodel is retrained and updated with the incoming instancethrough incremental learning -e intensity of ascent anddescent in Figure 5 reflects that the classification model hasthe ability to catch concept drift In Figure 6 we easilyunderstand that the similar phenomena are presented withthe SEA data stream

To demonstrateMCBIErsquos superiority based on the aboveanalysis we choose the two best methods MCBIE and PA-Ito illustrate the ability to perform the prediction task instreaming data setting with noise and concept drift -eaccuracy curve is plotted in Figures 7 and 8 From Figure 7the accuracy curve suggests that these two methods have theability to keep track with concept drift and shows clearly thatour method is superior to the PA-I in terms of accuracy overthe Hyperplane data stream with the three different noise

Table 1 Average classification accuracy and standard variance

Hyperplane data stream SEA data stream20 noise 25 noise 30 noise 20 noise 30 noise

PA 0622 plusmn 0056 0581 plusmn 0054 0548 plusmn 0048 0661 plusmn 0061 0577 plusmn 0056PA-I 0678 plusmn 0049 0626 plusmn 0050 0582 plusmn 0045 0726 plusmn 0060 0617 plusmn 0061PA-II 0646 plusmn 0053 0599 plusmn 0053 0558 plusmn 0047 0688 plusmn 0062 0595 plusmn 0059HT 0582 plusmn 0067 0567 plusmn 0061 0556 plusmn 0060 0705 plusmn 0068 0625 plusmn 0059MCBIE 0696 plusmn 0051 0648 plusmn 0046 0618 plusmn 0047 0708 plusmn 0053 0632 plusmn 0046

0 20 40 60 80 100 120 140 160 180 200045

053

061

069

077

085

Data streamsA

ccur

acy

20 noise25 noise30 noise

Figure 5 -e accuracy curve for the MCBIE method with Hy-perplane data stream

8 Complexity

20 noise30 noise

05

058

066

074

082

09

Acc

urac

y20 40 60 80 100 120 140 160 180 2000

Data streams

Figure 6 -e accuracy curve for the MCBIE method with SEA data stream

MCBIEPA-I

05

055

06

065

07

075

08

085

09

Acc

urac

y

20 40 60 80 100 120 140 160 180 2000Data streams

(a)

MCBIEPA-I

20 40 60 80 100 120 140 160 180 2000Data streams

045

05

055

06

065

07

075

08

Acc

urac

y

(b)

MCBIEPA-I

20 40 60 80 100 120 140 160 180 2000Data streams

045

05

055

06

065

07

075

Acc

urac

y

(c)

Figure 7 -e accuracy curve for the two best methods over Hyperplane data streams (a) Hyperplane with 20 noise (b) Hyperplane with25 noise (c) Hyperplane with 30 noise

Complexity 9

ratios Moreover in three cases the maximum and mini-mum accuracy of MCBIE is higher than that of the PA-I-rough the analysis of the accuracy curve over SEA datastream concerned with the ability to adapt to concept driftthese two methods seem to have the same function to dealwith nonstationary data stream classification as demon-strated in Figure 8 Moreover with the growth of noise ratiothe MCBIE has a better performance than PA-I such asstability

From these analyses we conclude that the MCBIEmethod is able to conduct nonstationary data stream clas-sification with high accuracy in environments characterizedby both concept drift and noise

6 Conclusions

Classification task in nonstationary data streams faces thetwo main problems concept drift and noise which requirethe classification model to not only cope with concept driftbut also differentiate noise from concept drift In order todeal with these problems a novel method named MCBIEwas proposed which can achieve the classification task innonstationary data streams with noise Aiming to enhanceMCBIErsquos performance the three strategies are used to al-leviate the influence of concept drift and noise In this paperincremental learning can help microcluster as classifier tocatch the concept change fast and ensemble strategy alle-viates the disturbance between noise and concept drift -efunction of smoothing parameter is to absorb the usefulinformation from historical knowledge Compared with thebaseline methods experiment results justify that ourmethod MCBIE has the ability to perform classification innonstationary streaming data setting However the threeproblems are worthy to be further concerned (1) how toimprove the noise recognition ability of our method in

abrupt drift environment needs to be further strengthened(2) in addition to accuracy the stability of model needs to beimproved (3) when concept reoccurs it is important todesign more appropriate strategies for the replacement ofmicrocluster

Data Availability

-e data used to support the findings of this study have beendeposited in the GitHub repository (httpsgithubcomFanzhenLiuComplexityJournal)

Conflicts of Interest

-e authors declare that there are no conflicts of interestregarding the publication of this paper

Acknowledgments

-is study was supported by the Natural Science Foundationof Anhui Province (nos 1608085MF147 and1908085MF183) the Humanities and Social ScienceFoundation of theMinistry of Education (no 18YJA630114)a Major Project of Natural Science Research in the Collegesand Universities of Anhui Province (no KJ2019ZD15)MQNS (no 9201701203) MQEPS (no 96804590) MQRSG(no 95109718) and the Investigative Analytics CollaborativeResearch Project between Macquarie University and Data61CSIRO

References

[1] G Ditzler M Roveri C Alippi and R Polikar ldquoLearning innonstationary environments a surveyrdquo IEEE ComputationalIntelligence Magazine vol 10 no 4 pp 12ndash25 2015

053

059

065

071

077

083

087

Acc

urac

y

20 40 60 80 100 120 140 160 180 2000Data streams

MCBIEPA-I

(a)

045

05

055

06

065

07

075

08

Acc

urac

y

20 40 60 80 100 120 140 160 180 2000Data streams

MCBIEPA-I

(b)

Figure 8 -e accuracy curve for the two best methods over SEA data streams (a) SEA with 20 noise (b) SEA with 30 noise

10 Complexity

[2] A Jadhav A Jadhav P Jadhav and P Kulkarni ldquoA novelapproach for the design of network intrusion detection system(NIDS)rdquo in Proceedings of 2013 International Conference onSensor Network Security Technology and Privacy Communi-cation System IEEE New York NY USA pp 22ndash27 De-cember 2013

[3] A Salazar G Safont A Soriano and L Vergara ldquoAutomaticcredit card fraud detection based on non-linear signal pro-cessingrdquo in Proceedings of 2012 IEEE International CarnahanConference on Security Technology IEEE Newton MA USApp 207ndash212 October 2012

[4] T Bujlow T Riaz and J M Pedersen ldquoA method forclassification of network traffic based on c5 0 machinelearning algorithmrdquo in Proceedings of the 2012 InternationalConference on Computing Networking and CommunicationsIEEE Maui HI USA pp 237ndash241 February 2012

[5] L Gao J Wu C Zhou and Y Hu ldquoCollaborative dynamicsparse topic regression with user profile evolution for itemrecommendationrdquo in Proceedings of the 8irty-First AAAIConference on Artificial Intelligence New York NY USAFebruary 2017

[6] S Xue J Lu and G Zhang ldquoCross-domain network repre-sentationsrdquo Pattern Recognition vol 94 pp 135ndash148 2019

[7] S Ren W Zhu B Liao et al ldquoSelection-based resamplingensemble algorithm for nonstationary imbalanced streamdata learningrdquo Knowledge-Based Systems vol 163 pp 705ndash722 2019

[8] J Sun H Fujita P Chen and H Li ldquoDynamic financialdistress prediction with concept drift based on time weightingcombined with adaboost support vector machine ensemblerdquoKnowledge-Based Systems vol 120 pp 4ndash14 2017

[9] T Zhai Y Gao H Wang and L Cao ldquoClassification of high-dimensional evolving data streams via a resource-efficientonline ensemblerdquo Data Mining and Knowledge Discoveryvol 31 no 5 pp 1242ndash1265 2017

[10] W-X Lu C Zhou and J Wu ldquoBig social network influencemaximization via recursively estimating influence spreadrdquoKnowledge-Based Systems vol 113 pp 143ndash154 2016

[11] Y Zhang J Wu C Zhou and Z Cai ldquoInstance cloned ex-treme learning machinerdquo Pattern Recognition vol 68pp 52ndash65 2017

[12] J Wu Z Cai S Zeng and X Zhu ldquoArtificial immune systemfor attribute weighted naive bayes classificationrdquo in Proceedingsof the 2013 International Joint Conference on Neural Networks(IJCNN) IEEE Dallas TX USA pp 1ndash8 August 2013

[13] J Wu S Pan X Zhu C Zhang and X Wu ldquoMulti-instancelearning with discriminative bagmappingrdquo IEEE Transactionson Knowledge and Data Engineering vol 30 no 6pp 1065ndash1080 2018

[14] P ZareMoodi S K Siahroudi and H Beigy ldquoA supportvector based approach for classification beyond the learnedlabel space in data streamsrdquo in Proceedings of the 31st AnnualACM Symposium on Applied Computing ACM Pisa Italypp 910ndash915 April 2016

[15] S Ramirez-Gallego B Krawczyk S Garcia M WozniakJ M Benitez and F Herrera ldquoNearest neighbor classificationfor high-speed big data streams using sparkrdquo IEEE Trans-actions on Systems Man and Cybernetics Systems vol 47no 10 pp 2727ndash2739 2017

[16] J Gama R Fernandes and R Rocha ldquoDecision trees formining data streamsrdquo Intelligent Data Analysis vol 10 no 1pp 23ndash45 2006

[17] H L Hammer A Yazidi and B J Oommen ldquoOn theclassification of dynamical data streams using novel ldquoAnti-

Bayesianrdquo techniquesrdquo Pattern Recognition vol 76 pp 108ndash124 2018

[18] M A Maloof and R S Michalski ldquoIncremental learning withpartial instance memoryrdquo Artificial Intelligence vol 154no 1-2 pp 95ndash126 2004

[19] K Crammer O Dekel J Keshet S Shalev-Shwartz andY Singer ldquoOnline passive-aggressive algorithmsrdquo Journal ofMachine Learning Research vol 7 pp 551ndash585 2006

[20] M Tennant F Stahl O Rana and J B Gomes ldquoScalable real-time classification of data streams with concept driftrdquo FutureGeneration Computer Systems vol 75 pp 187ndash199 2017

[21] J Read A Bifet B Pfahringer and G Holmes ldquoBatch-in-cremental versus instance-incremental learning in dynamicand evolving datardquo in Proceedings of International Symposiumon Intelligent Data Analysis Springer Helsinki Finlandpp 313ndash323 October 2012

[22] J Lu D Sahoo P Zhao and S C Hoi ldquoSparse passive-ag-gressive learning for bounded online kernel methodsrdquo ACMTransactions on Intelligent Systems and Technology vol 9no 4 p 45 2018

[23] M Oide A Takahashi T Abe and T Suganuma ldquoUser-oriented video streaming service based on passive aggressivelearningrdquo International Journal of Software Science andComputational Intelligence vol 9 no 1 pp 35ndash54 2017

[24] B Krawczyk and M Wozniak ldquoOne-class classifiers with in-cremental learning and forgetting for data streamswith conceptdriftrdquo Soft Computing vol 19 no 12 pp 3387ndash3400 2015

[25] S Xu and J Wang ldquoA fast incremental extreme learningmachine algorithm for data streams classificationrdquo ExpertSystems with Applications vol 65 pp 332ndash344 2016

[26] Y Lu Y-M Cheung and Y Y Tang ldquoDynamic weightedmajority for incremental learning of imbalanced data streamswith concept driftrdquo in Proceedings of the 2017 InternationalJoint Conference on Artificial Intelligence pp 2393ndash2399Melbourne Australia August 2017

[27] Y Zhang J Yu W Liu and K Ota ldquoEnsemble classificationfor skewed data streams based on neural networkrdquo Inter-national Journal of Uncertainty Fuzziness and Knowledge-Based Systems vol 26 p 08 2018

[28] B Krawczyk L L Minku J Gama J Stefanowski andM Wozniak ldquoEnsemble learning for data stream analysis asurveyrdquo Information Fusion vol 37 pp 132ndash156 2017

[29] W N Street and Y Kim ldquoA streaming ensemble algorithm(sea) for large-scale classificationrdquo in Proceedings of theSeventh ACM SIGKDD International Conference on Knowl-edge Discovery and Data Mining ACM San Francisco CAUSA pp 377ndash382 August 2001

[30] H Wang W Fan P S Yu and J Han ldquoMining concept-drifting data streams using ensemble classifiersrdquo in Pro-ceedings of the Ninth ACM SIGKDD International Conferenceon Knowledge Discovery and Data Mining ACM Wash-ington DC USA pp 226ndash235 December 2003

[31] J R B Junior and M do Carmo Nicoletti ldquoAn iterativeboosting-based ensemble for streaming data classificationrdquoInformation Fusion vol 45 pp 66ndash78 2019

[32] K Jackowski ldquoNew diversity measure for data stream clas-sification ensemblesrdquo Engineering Applications of ArtificialIntelligence vol 74 pp 23ndash34 2018

[33] G I Webb R Hyde H Cao H L Nguyen and F PetitjeanldquoCharacterizing concept driftrdquo Data Mining and KnowledgeDiscovery vol 30 no 4 pp 964ndash994 2016

[34] A Tsymbal ldquo-e problem of concept drift definitions andrelated workrdquo Computer Science Department vol 106 no 22004

Complexity 11

[35] T Zhang R Ramakrishnan and M Livny ldquoBirchrdquo ACMSIGMOD Record vol 25 no 2 pp 103ndash114 1996

[36] A Bifet G Holmes R Kirkby and B Pfahringer ldquoMoaMassive online analysisrdquo Journal of Machine Learning Re-search vol 11 pp 1601ndash1604 2010

[37] G Hulten L Spencer and P Domingos ldquoMining time-changing data streamsrdquo in Proceedings of the Seventh ACMSIGKDD International Conference on Knowledge Discoveryand Data Mining ACM San Francisco CA USA pp 97ndash106December 2001

[38] P Domingos and G Hulten ldquoMining high-speed datastreamsrdquo in Proceedings of the Fifth ACM SIGKDD Inter-national Conference on Knowledge Discovery and DataMining vol 2 p 4 Boston MA USA April 2000

[39] A Bifet G Holmes B Pfahringer and R Gavalda ldquoIm-proving adaptive bagging methods for evolving data streamsrdquoin Proceedings of 2009 Asian Conference on Machine LearningSpringer Berlin Germany pp 23ndash37 November 2009

12 Complexity

Page 9: Microcluster-BasedIncrementalEnsembleLearningforNoisy ...downloads.hindawi.com/journals/complexity/2020/6147378.pdf · concept drift in nonstationary streaming data with noise. Meanwhile,inordertocatchconceptdrift,theclassification

20 noise30 noise

05

058

066

074

082

09

Acc

urac

y20 40 60 80 100 120 140 160 180 2000

Data streams

Figure 6 -e accuracy curve for the MCBIE method with SEA data stream

MCBIEPA-I

05

055

06

065

07

075

08

085

09

Acc

urac

y

20 40 60 80 100 120 140 160 180 2000Data streams

(a)

MCBIEPA-I

20 40 60 80 100 120 140 160 180 2000Data streams

045

05

055

06

065

07

075

08

Acc

urac

y

(b)

MCBIEPA-I

20 40 60 80 100 120 140 160 180 2000Data streams

045

05

055

06

065

07

075

Acc

urac

y

(c)

Figure 7 -e accuracy curve for the two best methods over Hyperplane data streams (a) Hyperplane with 20 noise (b) Hyperplane with25 noise (c) Hyperplane with 30 noise

Complexity 9

ratios Moreover in three cases the maximum and mini-mum accuracy of MCBIE is higher than that of the PA-I-rough the analysis of the accuracy curve over SEA datastream concerned with the ability to adapt to concept driftthese two methods seem to have the same function to dealwith nonstationary data stream classification as demon-strated in Figure 8 Moreover with the growth of noise ratiothe MCBIE has a better performance than PA-I such asstability

From these analyses we conclude that the MCBIEmethod is able to conduct nonstationary data stream clas-sification with high accuracy in environments characterizedby both concept drift and noise

6 Conclusions

Classification task in nonstationary data streams faces thetwo main problems concept drift and noise which requirethe classification model to not only cope with concept driftbut also differentiate noise from concept drift In order todeal with these problems a novel method named MCBIEwas proposed which can achieve the classification task innonstationary data streams with noise Aiming to enhanceMCBIErsquos performance the three strategies are used to al-leviate the influence of concept drift and noise In this paperincremental learning can help microcluster as classifier tocatch the concept change fast and ensemble strategy alle-viates the disturbance between noise and concept drift -efunction of smoothing parameter is to absorb the usefulinformation from historical knowledge Compared with thebaseline methods experiment results justify that ourmethod MCBIE has the ability to perform classification innonstationary streaming data setting However the threeproblems are worthy to be further concerned (1) how toimprove the noise recognition ability of our method in

abrupt drift environment needs to be further strengthened(2) in addition to accuracy the stability of model needs to beimproved (3) when concept reoccurs it is important todesign more appropriate strategies for the replacement ofmicrocluster

Data Availability

-e data used to support the findings of this study have beendeposited in the GitHub repository (httpsgithubcomFanzhenLiuComplexityJournal)

Conflicts of Interest

-e authors declare that there are no conflicts of interestregarding the publication of this paper

Acknowledgments

-is study was supported by the Natural Science Foundationof Anhui Province (nos 1608085MF147 and1908085MF183) the Humanities and Social ScienceFoundation of theMinistry of Education (no 18YJA630114)a Major Project of Natural Science Research in the Collegesand Universities of Anhui Province (no KJ2019ZD15)MQNS (no 9201701203) MQEPS (no 96804590) MQRSG(no 95109718) and the Investigative Analytics CollaborativeResearch Project between Macquarie University and Data61CSIRO

References

[1] G Ditzler M Roveri C Alippi and R Polikar ldquoLearning innonstationary environments a surveyrdquo IEEE ComputationalIntelligence Magazine vol 10 no 4 pp 12ndash25 2015

053

059

065

071

077

083

087

Acc

urac

y

20 40 60 80 100 120 140 160 180 2000Data streams

MCBIEPA-I

(a)

045

05

055

06

065

07

075

08

Acc

urac

y

20 40 60 80 100 120 140 160 180 2000Data streams

MCBIEPA-I

(b)

Figure 8 -e accuracy curve for the two best methods over SEA data streams (a) SEA with 20 noise (b) SEA with 30 noise

10 Complexity

[2] A Jadhav A Jadhav P Jadhav and P Kulkarni ldquoA novelapproach for the design of network intrusion detection system(NIDS)rdquo in Proceedings of 2013 International Conference onSensor Network Security Technology and Privacy Communi-cation System IEEE New York NY USA pp 22ndash27 De-cember 2013

[3] A Salazar G Safont A Soriano and L Vergara ldquoAutomaticcredit card fraud detection based on non-linear signal pro-cessingrdquo in Proceedings of 2012 IEEE International CarnahanConference on Security Technology IEEE Newton MA USApp 207ndash212 October 2012

[4] T Bujlow T Riaz and J M Pedersen ldquoA method forclassification of network traffic based on c5 0 machinelearning algorithmrdquo in Proceedings of the 2012 InternationalConference on Computing Networking and CommunicationsIEEE Maui HI USA pp 237ndash241 February 2012

[5] L Gao J Wu C Zhou and Y Hu ldquoCollaborative dynamicsparse topic regression with user profile evolution for itemrecommendationrdquo in Proceedings of the 8irty-First AAAIConference on Artificial Intelligence New York NY USAFebruary 2017

[6] S Xue J Lu and G Zhang ldquoCross-domain network repre-sentationsrdquo Pattern Recognition vol 94 pp 135ndash148 2019

[7] S Ren W Zhu B Liao et al ldquoSelection-based resamplingensemble algorithm for nonstationary imbalanced streamdata learningrdquo Knowledge-Based Systems vol 163 pp 705ndash722 2019

[8] J Sun H Fujita P Chen and H Li ldquoDynamic financialdistress prediction with concept drift based on time weightingcombined with adaboost support vector machine ensemblerdquoKnowledge-Based Systems vol 120 pp 4ndash14 2017

[9] T Zhai Y Gao H Wang and L Cao ldquoClassification of high-dimensional evolving data streams via a resource-efficientonline ensemblerdquo Data Mining and Knowledge Discoveryvol 31 no 5 pp 1242ndash1265 2017

[10] W-X Lu C Zhou and J Wu ldquoBig social network influencemaximization via recursively estimating influence spreadrdquoKnowledge-Based Systems vol 113 pp 143ndash154 2016

[11] Y Zhang J Wu C Zhou and Z Cai ldquoInstance cloned ex-treme learning machinerdquo Pattern Recognition vol 68pp 52ndash65 2017

[12] J Wu Z Cai S Zeng and X Zhu ldquoArtificial immune systemfor attribute weighted naive bayes classificationrdquo in Proceedingsof the 2013 International Joint Conference on Neural Networks(IJCNN) IEEE Dallas TX USA pp 1ndash8 August 2013

[13] J Wu S Pan X Zhu C Zhang and X Wu ldquoMulti-instancelearning with discriminative bagmappingrdquo IEEE Transactionson Knowledge and Data Engineering vol 30 no 6pp 1065ndash1080 2018

[14] P ZareMoodi S K Siahroudi and H Beigy ldquoA supportvector based approach for classification beyond the learnedlabel space in data streamsrdquo in Proceedings of the 31st AnnualACM Symposium on Applied Computing ACM Pisa Italypp 910ndash915 April 2016

[15] S Ramirez-Gallego B Krawczyk S Garcia M WozniakJ M Benitez and F Herrera ldquoNearest neighbor classificationfor high-speed big data streams using sparkrdquo IEEE Trans-actions on Systems Man and Cybernetics Systems vol 47no 10 pp 2727ndash2739 2017

[16] J Gama R Fernandes and R Rocha ldquoDecision trees formining data streamsrdquo Intelligent Data Analysis vol 10 no 1pp 23ndash45 2006

[17] H L Hammer A Yazidi and B J Oommen ldquoOn theclassification of dynamical data streams using novel ldquoAnti-

Bayesianrdquo techniquesrdquo Pattern Recognition vol 76 pp 108ndash124 2018

[18] M A Maloof and R S Michalski ldquoIncremental learning withpartial instance memoryrdquo Artificial Intelligence vol 154no 1-2 pp 95ndash126 2004

[19] K Crammer O Dekel J Keshet S Shalev-Shwartz andY Singer ldquoOnline passive-aggressive algorithmsrdquo Journal ofMachine Learning Research vol 7 pp 551ndash585 2006

[20] M Tennant F Stahl O Rana and J B Gomes ldquoScalable real-time classification of data streams with concept driftrdquo FutureGeneration Computer Systems vol 75 pp 187ndash199 2017

[21] J Read A Bifet B Pfahringer and G Holmes ldquoBatch-in-cremental versus instance-incremental learning in dynamicand evolving datardquo in Proceedings of International Symposiumon Intelligent Data Analysis Springer Helsinki Finlandpp 313ndash323 October 2012

[22] J Lu D Sahoo P Zhao and S C Hoi ldquoSparse passive-ag-gressive learning for bounded online kernel methodsrdquo ACMTransactions on Intelligent Systems and Technology vol 9no 4 p 45 2018

[23] M Oide A Takahashi T Abe and T Suganuma ldquoUser-oriented video streaming service based on passive aggressivelearningrdquo International Journal of Software Science andComputational Intelligence vol 9 no 1 pp 35ndash54 2017

[24] B Krawczyk and M Wozniak ldquoOne-class classifiers with in-cremental learning and forgetting for data streamswith conceptdriftrdquo Soft Computing vol 19 no 12 pp 3387ndash3400 2015

[25] S Xu and J Wang ldquoA fast incremental extreme learningmachine algorithm for data streams classificationrdquo ExpertSystems with Applications vol 65 pp 332ndash344 2016

[26] Y Lu Y-M Cheung and Y Y Tang ldquoDynamic weightedmajority for incremental learning of imbalanced data streamswith concept driftrdquo in Proceedings of the 2017 InternationalJoint Conference on Artificial Intelligence pp 2393ndash2399Melbourne Australia August 2017

[27] Y Zhang J Yu W Liu and K Ota ldquoEnsemble classificationfor skewed data streams based on neural networkrdquo Inter-national Journal of Uncertainty Fuzziness and Knowledge-Based Systems vol 26 p 08 2018

[28] B Krawczyk L L Minku J Gama J Stefanowski andM Wozniak ldquoEnsemble learning for data stream analysis asurveyrdquo Information Fusion vol 37 pp 132ndash156 2017

[29] W N Street and Y Kim ldquoA streaming ensemble algorithm(sea) for large-scale classificationrdquo in Proceedings of theSeventh ACM SIGKDD International Conference on Knowl-edge Discovery and Data Mining ACM San Francisco CAUSA pp 377ndash382 August 2001

[30] H Wang W Fan P S Yu and J Han ldquoMining concept-drifting data streams using ensemble classifiersrdquo in Pro-ceedings of the Ninth ACM SIGKDD International Conferenceon Knowledge Discovery and Data Mining ACM Wash-ington DC USA pp 226ndash235 December 2003

[31] J R B Junior and M do Carmo Nicoletti ldquoAn iterativeboosting-based ensemble for streaming data classificationrdquoInformation Fusion vol 45 pp 66ndash78 2019

[32] K Jackowski ldquoNew diversity measure for data stream clas-sification ensemblesrdquo Engineering Applications of ArtificialIntelligence vol 74 pp 23ndash34 2018

[33] G I Webb R Hyde H Cao H L Nguyen and F PetitjeanldquoCharacterizing concept driftrdquo Data Mining and KnowledgeDiscovery vol 30 no 4 pp 964ndash994 2016

[34] A Tsymbal ldquo-e problem of concept drift definitions andrelated workrdquo Computer Science Department vol 106 no 22004

Complexity 11

[35] T Zhang R Ramakrishnan and M Livny ldquoBirchrdquo ACMSIGMOD Record vol 25 no 2 pp 103ndash114 1996

[36] A Bifet G Holmes R Kirkby and B Pfahringer ldquoMoaMassive online analysisrdquo Journal of Machine Learning Re-search vol 11 pp 1601ndash1604 2010

[37] G Hulten L Spencer and P Domingos ldquoMining time-changing data streamsrdquo in Proceedings of the Seventh ACMSIGKDD International Conference on Knowledge Discoveryand Data Mining ACM San Francisco CA USA pp 97ndash106December 2001

[38] P Domingos and G Hulten ldquoMining high-speed datastreamsrdquo in Proceedings of the Fifth ACM SIGKDD Inter-national Conference on Knowledge Discovery and DataMining vol 2 p 4 Boston MA USA April 2000

[39] A Bifet G Holmes B Pfahringer and R Gavalda ldquoIm-proving adaptive bagging methods for evolving data streamsrdquoin Proceedings of 2009 Asian Conference on Machine LearningSpringer Berlin Germany pp 23ndash37 November 2009

12 Complexity

Page 10: Microcluster-BasedIncrementalEnsembleLearningforNoisy ...downloads.hindawi.com/journals/complexity/2020/6147378.pdf · concept drift in nonstationary streaming data with noise. Meanwhile,inordertocatchconceptdrift,theclassification

ratios Moreover in three cases the maximum and mini-mum accuracy of MCBIE is higher than that of the PA-I-rough the analysis of the accuracy curve over SEA datastream concerned with the ability to adapt to concept driftthese two methods seem to have the same function to dealwith nonstationary data stream classification as demon-strated in Figure 8 Moreover with the growth of noise ratiothe MCBIE has a better performance than PA-I such asstability

From these analyses we conclude that the MCBIEmethod is able to conduct nonstationary data stream clas-sification with high accuracy in environments characterizedby both concept drift and noise

6 Conclusions

Classification task in nonstationary data streams faces thetwo main problems concept drift and noise which requirethe classification model to not only cope with concept driftbut also differentiate noise from concept drift In order todeal with these problems a novel method named MCBIEwas proposed which can achieve the classification task innonstationary data streams with noise Aiming to enhanceMCBIErsquos performance the three strategies are used to al-leviate the influence of concept drift and noise In this paperincremental learning can help microcluster as classifier tocatch the concept change fast and ensemble strategy alle-viates the disturbance between noise and concept drift -efunction of smoothing parameter is to absorb the usefulinformation from historical knowledge Compared with thebaseline methods experiment results justify that ourmethod MCBIE has the ability to perform classification innonstationary streaming data setting However the threeproblems are worthy to be further concerned (1) how toimprove the noise recognition ability of our method in

abrupt drift environment needs to be further strengthened(2) in addition to accuracy the stability of model needs to beimproved (3) when concept reoccurs it is important todesign more appropriate strategies for the replacement ofmicrocluster

Data Availability

-e data used to support the findings of this study have beendeposited in the GitHub repository (httpsgithubcomFanzhenLiuComplexityJournal)

Conflicts of Interest

-e authors declare that there are no conflicts of interestregarding the publication of this paper

Acknowledgments

-is study was supported by the Natural Science Foundationof Anhui Province (nos 1608085MF147 and1908085MF183) the Humanities and Social ScienceFoundation of theMinistry of Education (no 18YJA630114)a Major Project of Natural Science Research in the Collegesand Universities of Anhui Province (no KJ2019ZD15)MQNS (no 9201701203) MQEPS (no 96804590) MQRSG(no 95109718) and the Investigative Analytics CollaborativeResearch Project between Macquarie University and Data61CSIRO

References

[1] G Ditzler M Roveri C Alippi and R Polikar ldquoLearning innonstationary environments a surveyrdquo IEEE ComputationalIntelligence Magazine vol 10 no 4 pp 12ndash25 2015

053

059

065

071

077

083

087

Acc

urac

y

20 40 60 80 100 120 140 160 180 2000Data streams

MCBIEPA-I

(a)

045

05

055

06

065

07

075

08

Acc

urac

y

20 40 60 80 100 120 140 160 180 2000Data streams

MCBIEPA-I

(b)

Figure 8 -e accuracy curve for the two best methods over SEA data streams (a) SEA with 20 noise (b) SEA with 30 noise

10 Complexity

[2] A Jadhav A Jadhav P Jadhav and P Kulkarni ldquoA novelapproach for the design of network intrusion detection system(NIDS)rdquo in Proceedings of 2013 International Conference onSensor Network Security Technology and Privacy Communi-cation System IEEE New York NY USA pp 22ndash27 De-cember 2013

[3] A Salazar G Safont A Soriano and L Vergara ldquoAutomaticcredit card fraud detection based on non-linear signal pro-cessingrdquo in Proceedings of 2012 IEEE International CarnahanConference on Security Technology IEEE Newton MA USApp 207ndash212 October 2012

[4] T Bujlow T Riaz and J M Pedersen ldquoA method forclassification of network traffic based on c5 0 machinelearning algorithmrdquo in Proceedings of the 2012 InternationalConference on Computing Networking and CommunicationsIEEE Maui HI USA pp 237ndash241 February 2012

[5] L Gao J Wu C Zhou and Y Hu ldquoCollaborative dynamicsparse topic regression with user profile evolution for itemrecommendationrdquo in Proceedings of the 8irty-First AAAIConference on Artificial Intelligence New York NY USAFebruary 2017

[6] S Xue J Lu and G Zhang ldquoCross-domain network repre-sentationsrdquo Pattern Recognition vol 94 pp 135ndash148 2019

[7] S Ren W Zhu B Liao et al ldquoSelection-based resamplingensemble algorithm for nonstationary imbalanced streamdata learningrdquo Knowledge-Based Systems vol 163 pp 705ndash722 2019

[8] J Sun H Fujita P Chen and H Li ldquoDynamic financialdistress prediction with concept drift based on time weightingcombined with adaboost support vector machine ensemblerdquoKnowledge-Based Systems vol 120 pp 4ndash14 2017

[9] T Zhai Y Gao H Wang and L Cao ldquoClassification of high-dimensional evolving data streams via a resource-efficientonline ensemblerdquo Data Mining and Knowledge Discoveryvol 31 no 5 pp 1242ndash1265 2017

[10] W-X Lu C Zhou and J Wu ldquoBig social network influencemaximization via recursively estimating influence spreadrdquoKnowledge-Based Systems vol 113 pp 143ndash154 2016

[11] Y Zhang J Wu C Zhou and Z Cai ldquoInstance cloned ex-treme learning machinerdquo Pattern Recognition vol 68pp 52ndash65 2017

[12] J Wu Z Cai S Zeng and X Zhu ldquoArtificial immune systemfor attribute weighted naive bayes classificationrdquo in Proceedingsof the 2013 International Joint Conference on Neural Networks(IJCNN) IEEE Dallas TX USA pp 1ndash8 August 2013

[13] J Wu S Pan X Zhu C Zhang and X Wu ldquoMulti-instancelearning with discriminative bagmappingrdquo IEEE Transactionson Knowledge and Data Engineering vol 30 no 6pp 1065ndash1080 2018

[14] P ZareMoodi S K Siahroudi and H Beigy ldquoA supportvector based approach for classification beyond the learnedlabel space in data streamsrdquo in Proceedings of the 31st AnnualACM Symposium on Applied Computing ACM Pisa Italypp 910ndash915 April 2016

[15] S Ramirez-Gallego B Krawczyk S Garcia M WozniakJ M Benitez and F Herrera ldquoNearest neighbor classificationfor high-speed big data streams using sparkrdquo IEEE Trans-actions on Systems Man and Cybernetics Systems vol 47no 10 pp 2727ndash2739 2017

[16] J Gama R Fernandes and R Rocha ldquoDecision trees formining data streamsrdquo Intelligent Data Analysis vol 10 no 1pp 23ndash45 2006

[17] H L Hammer A Yazidi and B J Oommen ldquoOn theclassification of dynamical data streams using novel ldquoAnti-

Bayesianrdquo techniquesrdquo Pattern Recognition vol 76 pp 108ndash124 2018

[18] M A Maloof and R S Michalski ldquoIncremental learning withpartial instance memoryrdquo Artificial Intelligence vol 154no 1-2 pp 95ndash126 2004

[19] K Crammer O Dekel J Keshet S Shalev-Shwartz andY Singer ldquoOnline passive-aggressive algorithmsrdquo Journal ofMachine Learning Research vol 7 pp 551ndash585 2006

[20] M Tennant F Stahl O Rana and J B Gomes ldquoScalable real-time classification of data streams with concept driftrdquo FutureGeneration Computer Systems vol 75 pp 187ndash199 2017

[21] J Read A Bifet B Pfahringer and G Holmes ldquoBatch-in-cremental versus instance-incremental learning in dynamicand evolving datardquo in Proceedings of International Symposiumon Intelligent Data Analysis Springer Helsinki Finlandpp 313ndash323 October 2012

[22] J Lu D Sahoo P Zhao and S C Hoi ldquoSparse passive-ag-gressive learning for bounded online kernel methodsrdquo ACMTransactions on Intelligent Systems and Technology vol 9no 4 p 45 2018

[23] M Oide A Takahashi T Abe and T Suganuma ldquoUser-oriented video streaming service based on passive aggressivelearningrdquo International Journal of Software Science andComputational Intelligence vol 9 no 1 pp 35ndash54 2017

[24] B Krawczyk and M Wozniak ldquoOne-class classifiers with in-cremental learning and forgetting for data streamswith conceptdriftrdquo Soft Computing vol 19 no 12 pp 3387ndash3400 2015

[25] S Xu and J Wang ldquoA fast incremental extreme learningmachine algorithm for data streams classificationrdquo ExpertSystems with Applications vol 65 pp 332ndash344 2016

[26] Y Lu Y-M Cheung and Y Y Tang ldquoDynamic weightedmajority for incremental learning of imbalanced data streamswith concept driftrdquo in Proceedings of the 2017 InternationalJoint Conference on Artificial Intelligence pp 2393ndash2399Melbourne Australia August 2017

[27] Y Zhang J Yu W Liu and K Ota ldquoEnsemble classificationfor skewed data streams based on neural networkrdquo Inter-national Journal of Uncertainty Fuzziness and Knowledge-Based Systems vol 26 p 08 2018

[28] B Krawczyk L L Minku J Gama J Stefanowski andM Wozniak ldquoEnsemble learning for data stream analysis asurveyrdquo Information Fusion vol 37 pp 132ndash156 2017

[29] W N Street and Y Kim ldquoA streaming ensemble algorithm(sea) for large-scale classificationrdquo in Proceedings of theSeventh ACM SIGKDD International Conference on Knowl-edge Discovery and Data Mining ACM San Francisco CAUSA pp 377ndash382 August 2001

[30] H Wang W Fan P S Yu and J Han ldquoMining concept-drifting data streams using ensemble classifiersrdquo in Pro-ceedings of the Ninth ACM SIGKDD International Conferenceon Knowledge Discovery and Data Mining ACM Wash-ington DC USA pp 226ndash235 December 2003

[31] J R B Junior and M do Carmo Nicoletti ldquoAn iterativeboosting-based ensemble for streaming data classificationrdquoInformation Fusion vol 45 pp 66ndash78 2019

[32] K Jackowski ldquoNew diversity measure for data stream clas-sification ensemblesrdquo Engineering Applications of ArtificialIntelligence vol 74 pp 23ndash34 2018

[33] G I Webb R Hyde H Cao H L Nguyen and F PetitjeanldquoCharacterizing concept driftrdquo Data Mining and KnowledgeDiscovery vol 30 no 4 pp 964ndash994 2016

[34] A Tsymbal ldquo-e problem of concept drift definitions andrelated workrdquo Computer Science Department vol 106 no 22004

Complexity 11

[35] T Zhang R Ramakrishnan and M Livny ldquoBirchrdquo ACMSIGMOD Record vol 25 no 2 pp 103ndash114 1996

[36] A Bifet G Holmes R Kirkby and B Pfahringer ldquoMoaMassive online analysisrdquo Journal of Machine Learning Re-search vol 11 pp 1601ndash1604 2010

[37] G Hulten L Spencer and P Domingos ldquoMining time-changing data streamsrdquo in Proceedings of the Seventh ACMSIGKDD International Conference on Knowledge Discoveryand Data Mining ACM San Francisco CA USA pp 97ndash106December 2001

[38] P Domingos and G Hulten ldquoMining high-speed datastreamsrdquo in Proceedings of the Fifth ACM SIGKDD Inter-national Conference on Knowledge Discovery and DataMining vol 2 p 4 Boston MA USA April 2000

[39] A Bifet G Holmes B Pfahringer and R Gavalda ldquoIm-proving adaptive bagging methods for evolving data streamsrdquoin Proceedings of 2009 Asian Conference on Machine LearningSpringer Berlin Germany pp 23ndash37 November 2009

12 Complexity

Page 11: Microcluster-BasedIncrementalEnsembleLearningforNoisy ...downloads.hindawi.com/journals/complexity/2020/6147378.pdf · concept drift in nonstationary streaming data with noise. Meanwhile,inordertocatchconceptdrift,theclassification

[2] A. Jadhav, A. Jadhav, P. Jadhav, and P. Kulkarni, "A novel approach for the design of network intrusion detection system (NIDS)," in Proceedings of the 2013 International Conference on Sensor Network Security Technology and Privacy Communication System, IEEE, New York, NY, USA, pp. 22–27, December 2013.

[3] A. Salazar, G. Safont, A. Soriano, and L. Vergara, "Automatic credit card fraud detection based on non-linear signal processing," in Proceedings of the 2012 IEEE International Carnahan Conference on Security Technology, IEEE, Newton, MA, USA, pp. 207–212, October 2012.

[4] T. Bujlow, T. Riaz, and J. M. Pedersen, "A method for classification of network traffic based on C5.0 machine learning algorithm," in Proceedings of the 2012 International Conference on Computing, Networking and Communications, IEEE, Maui, HI, USA, pp. 237–241, February 2012.

[5] L. Gao, J. Wu, C. Zhou, and Y. Hu, "Collaborative dynamic sparse topic regression with user profile evolution for item recommendation," in Proceedings of the Thirty-First AAAI Conference on Artificial Intelligence, New York, NY, USA, February 2017.

[6] S. Xue, J. Lu, and G. Zhang, "Cross-domain network representations," Pattern Recognition, vol. 94, pp. 135–148, 2019.

[7] S. Ren, W. Zhu, B. Liao et al., "Selection-based resampling ensemble algorithm for nonstationary imbalanced stream data learning," Knowledge-Based Systems, vol. 163, pp. 705–722, 2019.

[8] J. Sun, H. Fujita, P. Chen, and H. Li, "Dynamic financial distress prediction with concept drift based on time weighting combined with AdaBoost support vector machine ensemble," Knowledge-Based Systems, vol. 120, pp. 4–14, 2017.

[9] T. Zhai, Y. Gao, H. Wang, and L. Cao, "Classification of high-dimensional evolving data streams via a resource-efficient online ensemble," Data Mining and Knowledge Discovery, vol. 31, no. 5, pp. 1242–1265, 2017.

[10] W.-X. Lu, C. Zhou, and J. Wu, "Big social network influence maximization via recursively estimating influence spread," Knowledge-Based Systems, vol. 113, pp. 143–154, 2016.

[11] Y. Zhang, J. Wu, C. Zhou, and Z. Cai, "Instance cloned extreme learning machine," Pattern Recognition, vol. 68, pp. 52–65, 2017.

[12] J. Wu, Z. Cai, S. Zeng, and X. Zhu, "Artificial immune system for attribute weighted naive Bayes classification," in Proceedings of the 2013 International Joint Conference on Neural Networks (IJCNN), IEEE, Dallas, TX, USA, pp. 1–8, August 2013.

[13] J. Wu, S. Pan, X. Zhu, C. Zhang, and X. Wu, "Multi-instance learning with discriminative bag mapping," IEEE Transactions on Knowledge and Data Engineering, vol. 30, no. 6, pp. 1065–1080, 2018.

[14] P. ZareMoodi, S. K. Siahroudi, and H. Beigy, "A support vector based approach for classification beyond the learned label space in data streams," in Proceedings of the 31st Annual ACM Symposium on Applied Computing, ACM, Pisa, Italy, pp. 910–915, April 2016.

[15] S. Ramirez-Gallego, B. Krawczyk, S. Garcia, M. Wozniak, J. M. Benitez, and F. Herrera, "Nearest neighbor classification for high-speed big data streams using Spark," IEEE Transactions on Systems, Man, and Cybernetics: Systems, vol. 47, no. 10, pp. 2727–2739, 2017.

[16] J. Gama, R. Fernandes, and R. Rocha, "Decision trees for mining data streams," Intelligent Data Analysis, vol. 10, no. 1, pp. 23–45, 2006.

[17] H. L. Hammer, A. Yazidi, and B. J. Oommen, "On the classification of dynamical data streams using novel 'Anti-Bayesian' techniques," Pattern Recognition, vol. 76, pp. 108–124, 2018.

[18] M. A. Maloof and R. S. Michalski, "Incremental learning with partial instance memory," Artificial Intelligence, vol. 154, no. 1-2, pp. 95–126, 2004.

[19] K. Crammer, O. Dekel, J. Keshet, S. Shalev-Shwartz, and Y. Singer, "Online passive-aggressive algorithms," Journal of Machine Learning Research, vol. 7, pp. 551–585, 2006.

[20] M. Tennant, F. Stahl, O. Rana, and J. B. Gomes, "Scalable real-time classification of data streams with concept drift," Future Generation Computer Systems, vol. 75, pp. 187–199, 2017.

[21] J. Read, A. Bifet, B. Pfahringer, and G. Holmes, "Batch-incremental versus instance-incremental learning in dynamic and evolving data," in Proceedings of the International Symposium on Intelligent Data Analysis, Springer, Helsinki, Finland, pp. 313–323, October 2012.

[22] J. Lu, D. Sahoo, P. Zhao, and S. C. Hoi, "Sparse passive-aggressive learning for bounded online kernel methods," ACM Transactions on Intelligent Systems and Technology, vol. 9, no. 4, p. 45, 2018.

[23] M. Oide, A. Takahashi, T. Abe, and T. Suganuma, "User-oriented video streaming service based on passive aggressive learning," International Journal of Software Science and Computational Intelligence, vol. 9, no. 1, pp. 35–54, 2017.

[24] B. Krawczyk and M. Wozniak, "One-class classifiers with incremental learning and forgetting for data streams with concept drift," Soft Computing, vol. 19, no. 12, pp. 3387–3400, 2015.

[25] S. Xu and J. Wang, "A fast incremental extreme learning machine algorithm for data streams classification," Expert Systems with Applications, vol. 65, pp. 332–344, 2016.

[26] Y. Lu, Y.-M. Cheung, and Y. Y. Tang, "Dynamic weighted majority for incremental learning of imbalanced data streams with concept drift," in Proceedings of the 2017 International Joint Conference on Artificial Intelligence, pp. 2393–2399, Melbourne, Australia, August 2017.

[27] Y. Zhang, J. Yu, W. Liu, and K. Ota, "Ensemble classification for skewed data streams based on neural network," International Journal of Uncertainty, Fuzziness and Knowledge-Based Systems, vol. 26, p. 08, 2018.

[28] B. Krawczyk, L. L. Minku, J. Gama, J. Stefanowski, and M. Wozniak, "Ensemble learning for data stream analysis: a survey," Information Fusion, vol. 37, pp. 132–156, 2017.

[29] W. N. Street and Y. Kim, "A streaming ensemble algorithm (SEA) for large-scale classification," in Proceedings of the Seventh ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, ACM, San Francisco, CA, USA, pp. 377–382, August 2001.

[30] H. Wang, W. Fan, P. S. Yu, and J. Han, "Mining concept-drifting data streams using ensemble classifiers," in Proceedings of the Ninth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, ACM, Washington, DC, USA, pp. 226–235, December 2003.

[31] J. R. B. Junior and M. do Carmo Nicoletti, "An iterative boosting-based ensemble for streaming data classification," Information Fusion, vol. 45, pp. 66–78, 2019.

[32] K. Jackowski, "New diversity measure for data stream classification ensembles," Engineering Applications of Artificial Intelligence, vol. 74, pp. 23–34, 2018.

[33] G. I. Webb, R. Hyde, H. Cao, H. L. Nguyen, and F. Petitjean, "Characterizing concept drift," Data Mining and Knowledge Discovery, vol. 30, no. 4, pp. 964–994, 2016.

[34] A. Tsymbal, "The problem of concept drift: definitions and related work," Computer Science Department, Trinity College Dublin, vol. 106, no. 2, 2004.


[35] T. Zhang, R. Ramakrishnan, and M. Livny, "BIRCH: an efficient data clustering method for very large databases," ACM SIGMOD Record, vol. 25, no. 2, pp. 103–114, 1996.

[36] A. Bifet, G. Holmes, R. Kirkby, and B. Pfahringer, "MOA: massive online analysis," Journal of Machine Learning Research, vol. 11, pp. 1601–1604, 2010.

[37] G. Hulten, L. Spencer, and P. Domingos, "Mining time-changing data streams," in Proceedings of the Seventh ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, ACM, San Francisco, CA, USA, pp. 97–106, December 2001.

[38] P. Domingos and G. Hulten, "Mining high-speed data streams," in Proceedings of the Fifth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, vol. 2, p. 4, Boston, MA, USA, April 2000.

[39] A. Bifet, G. Holmes, B. Pfahringer, and R. Gavalda, "Improving adaptive bagging methods for evolving data streams," in Proceedings of the 2009 Asian Conference on Machine Learning, Springer, Berlin, Germany, pp. 23–37, November 2009.
