pbl

Applied Soft Computing 13 (2013) 654666

Contents lists available at SciVerse ScienceDirect

Applied Soft Computing

j ourna l ho me p age: www.elsev ier .co

Meta-c Bfor clas

G. SateesSchool of Comp

a r t i c l

Article history:Received 2 FebReceived in reAccepted 31 AAvailable onlin

Keywords:Meta-cognitivSelf-regulatoryRadial basis fuMulti-category classicationProjection Based Learning

Netwquenhumanent ction

optimergy fy chlemezation

The interaction of cognitive component and meta-cognitive component address the what-to-learn, when-to-learn and how-to-learn human learning principles efciently. The performance of the PBL-McRBFN isevaluated using a set of benchmark classication problems from UCI machine learning repository and twopractical problems, viz., the acoustic emission signal classication and the mammogram for cancer classi-cation. The statistical performance evaluation on these problems has proven the superior performance

1. Introdu

Neural nmate the coHence, fromemployed tcation proaccurately mlabels. Sevearchitectureness, indusrecognitionnosis [4], prinformationthe literatution due towidely useto RBF learnclassied indient desce

CorresponE-mail add

1568-4946/$ http://dx.doi.oof PBL-McRBFN classier over results reported in the literature. 2012 Elsevier B.V. All rights reserved.

ction

etworks are powerful tools that can be used to approxi-mplex nonlinear inputoutput relationships efciently.

the last few decades neural networks are extensivelyo solve real world classication problems [1]. In a classi-blem, the objective is to learn the decision surface thataps an input feature space to an output space of class

ral learning algorithms for different neural networks have been used in various problems in science, busi-

try and medicine, including the handwritten character [2], speech recognition [3], biomedical medical diag-ediction of bankruptcy [5], text categorization [6] and

retrieval [7]. Among various architectures reported inre, Radial Basis Function (RBF) network gaining atten-

its localization property of Gaussian function, andd in classication problems. Signicant contributionsing algorithms for classication problems are broadlyto two categories: (a) Batch learning algorithms: Gra-nt based learning was used to determine the network

ding author. Tel.: +65 6790 6185.ress: [email protected] (S. Suresh).

parameters [8]. Here, the complete training data are presented mul-tiple times, until the training error is minimum. Alternatively, onecan implement random input parameter selection with least squaresolution for the output weight [9,10]. In both cases, the numberof Gaussian functions required to approximate the true func-tion is determined heuristically. (b) Sequential learning algorithms:The number of Gaussian neurons required to approximate theinputoutput relationship is determined automatically [1115].Here, the training samples are presented one-by-one and discardedafter learning. Resource Allocation Network (RAN) [11] was therst sequential learning algorithm introduced in the literature. RANevolves the network architecture required to approximate the truefunction using novelty based neuron growth criterion. MinimalResource Allocation Network (MRAN) [12] uses a similar approach,but it incorporates error based neuron growing/pruning criterion.Hence, MRAN determines compact network architecture than RANalgorithm. Growing and Pruning Radial Basis Function Network[13] selects growing/pruning criteria of the network based on thesignicance of a neuron. A sequential learning algorithm usingrecursive least squares presented in [14], referred as an On-lineSequential Extreme Learning Machine (OS-ELM). OS-ELM choosesinput weights randomly with xed number of hidden neurons andanalytically determines the output weights using minimum normleast-squares. In case of sparse and imbalance data sets, the random

see front matter 2012 Elsevier B.V. All rights reserved.rg/10.1016/j.asoc.2012.08.047ognitive RBF Network and its Projectionsication problems

h Babu, S. Suresh

uter Engineering, Nanyang Technological University, Singapore

e i n f o

ruary 2012vised form 24 May 2012ugust 2012e 23 September 2012

e learning thresholdsnction network

a b s t r a c t

Meta-cognitive Radial Basis Functionrithm for classication problems in sePBL-McRBFN. McRBFN is inspired by ponents, namely the cognitive compois a single hidden layer radial basis funnent, the PBL algorithm computes theanalytical minima of the nonlinear enprocess in the cognitive component badapts the learning strategies by imptions are considered for proper initialim/l ocate /asoc

ased Learning algorithm

ork (McRBFN) and its Projection Based Learning (PBL) algo-tial framework is proposed in this paper and is referred to asn meta-cognitive learning principles. McRBFN has two com-and the meta-cognitive component. The cognitive component

network with evolving architecture. In the cognitive compo-al output weights with least computational effort by nding

unction. The meta-cognitive component controls the learningoosing the best learning strategy for the current sample andnting self-regulation. In addition, sample overlapping condi-

of new hidden neurons, thus minimizes the misclassication.

G.S. Babu, S. Suresh / Applied Soft Computing 13 (2013) 654666 655

(b)(a)Metacognitive Component

C

Metacognition

tion a

selection ofin the OS-E[16]. In neu(EFuNNs) [1shown in [1the functionlems. A Seq(SMC-RBF) classicatioand paramupdating thof the curreupdating a

Aforemein the traincontained inprocessing remembericognitive inhas revealeers adopt se[18,19]. Mecognitive frprocesses, dand evaluatbasis functisuitable leacess then itNetwork (Mwhat-to-leation from tself-regulat

Self-adaComplex-va(CSRAN) [2cognition berror and happropriatehelps in impevident thawith suitabability of a n(a) the samterion whic(b) the newmay overlaclassicatioand (d) useparameter and Meta-caddress theof meta-cogally intensi

edgeon ied Pn ne

propnd a

we p fast

re arlogy ed inson as acti

is anmponenta-cogationformnitivnents thenentts m

BFNon mrensnamnenthiddctury ads of weig

bases areorithut/hple

rgy m ofogramCognition

ControlMonitoring PredictedOutput

Fig. 1. (a) Nelson and Narens Model of meta-cogni

input weights with xed number of hidden neuronsLM affects the performance signicantly as shown inral-fuzzy framework, Evolving Fuzzy Neural Networks7] is the novel sequential learning algorithm. It has been5] that the aforementioned algorithms works well for

approximation problems than the classication prob-uential Multi-Category Radial Basis Function network[15] considers the similarity measure within class, mis-n rate and prediction error are used in neuron growingeter update criterion. SMC-RBF has been shown thate nearest neuron parameters in the same class as thatnt sample helps in improving the performance than

nearest neuron in any class.ntioned neural network algorithms use all the samplesing data set to gain knowledge about the information

the samples. In other words, they possess information-abilities of humans, including perception, learning,ng, judging, and problem-solving, and these abilities are

nature. However, recent studies on human learningd that the learning process is effective when the learn-lf-regulation in learning process using meta-cognitionta-cognition means cognition about cognition. In a meta-amework, human-beings think about their cognitiveevelop new strategies to improve their cognitive skillse the information contained in their memory. If a radialon network analyzes its cognitive process and choosesrning strategies adaptively to improve its cognitive pro-

is referred to as Meta-Cognitive Radial Basis FunctioncRBFN). Such a McRBFN must be capable of deciding

rn, when-to-learn and how-to-learn the decision func-he stream of training data by emulating the humaned learning.ptive Resource Allocation Network (SRAN) [20] andlued Self-regulating Resource Allocation Network1] address the what-to-learn component of meta-y selecting signicant samples using misclassicationinge loss error. It has been shown that the selecting

samples for learning and removing repetitive samplesroving the generalization performance. Therefore, it ist emulating the three components of human learningle learning strategies would improve the generalizationeural network. The drawbacks in these algorithms are:ples for training are selected based on simple error cri-h is not sufcient to address the signicance of samples;

knowlcognitiproposfunctioexceptwork apaper,and itsrithm.

Thephysioreportby Nelvarioumodeltwo cocompoto metinformThe inthe cogcompoinformcompoponenstate.

McRcognitiand Nanents composingle architedata bweightinput minedneuroning algthe inpthe saman eneprobleear pr hidden neuron center is allocated independently whichp with already existed neuron centers leading to mis-n; (c) knowledge gained from past samples is not used;s computationally intensive extended Kalman lter forupdate. Meta-cognitive Neural Network (McNN) [22]ognitive Neuro-Fuzzy Inference System (McFIS) [23]

rst two issues efciently by using three componentsnition. However, McNN and McFIS use computation-ve parameter update and does not utilize the past

and real caalgorithm tsystem of liweights, cofunction. Tdynamic mand self-regthe learningthe four str(RBF Neural Network)ognitive Component

Best learning Strategy

nd (b) McRBFN Model.

stored in the network. Similar works using meta-n complex domain are reported in [24,25]. Recentlyrojection Based Learning in meta-cognitive radial basistwork [26] addresses the above issues in batch modeer utilization of the past knowledge stored in the net-pplied to solve biomedical problems in [2729]. In thisropose a meta-cognitive radial basis function networkand efcient projection based sequential learning algo-

e several meta-cognition models available in humanand a brief survey of various meta-cognition models are

[30]. Among the various models, the model proposednd Narens in [31] is simple and clearly highlights theons in human meta-cognition as shown in Fig. 1(a). Thealogous to the meta-cognition in human-beings and hasnents, the cognitive component and the meta-cognitive. The information ow from the cognitive componentnitive component is considered monitoring, while the

ow in the reverse direction is considered control.ation owing from the meta-cognitive component toe component either changes the state of the cognitive

or changes the cognitive component itself. Monitoring meta-cognitive component about the state of cognitive, thus continuously updating the meta-cognitive com-odel of cognitive component, including, no change in

is developed based on the Nelson and Narens meta-odel [31] as shown in Fig. 1(b). Analogous to the Nelson

meta-cognition model [31], McRBFN has two compo-ely the cognitive component and the meta-cognitive

as shown in Fig. 1(b). The cognitive component is aen layer radial basis function network with evolvinge. The cognitive component learns from the trainingding new hidden neurons and updating the outputhidden neurons to approximate the true function. Thehts of hidden neurons (center and width) are deter-d on the training data and output weights of hidden

estimated using the projection based sequential learn-m. When a neuron is added to the cognitive component,idden layer parameters are xed based on the input of

and the output weights are estimated by minimizingfunction given by the hinge loss error as in [32]. The

nding optimal weights is rst formulated as a lin-ming problem using the principles of minimization

lculus [33,34]. The Projection Based Learning (PBL)hen converts the linear programming problem into anear equations and provides a solution for the optimalrresponding to the minimum energy point of the energyhe meta-cognitive component of McRBFN contains aodel of the cognitive component, knowledge measuresulated thresholds. Meta-cognitive component controls

process of the cognitive component by choosing one ofategies for each sample in the training data set. When a

656 G.S. Babu, S. Suresh / Applied Soft Computing 13 (2013) 654666

sample is presented to McRBFN, the meta-cognitive component ofMcRBFN measures the knowledge contained in the current train-ing sample with respect to the cognitive component using itsknowledge measures. Predicted class label, maximum hinge errorand class-wures of theobtained frmethods totightly by tbetween thtion helps paper, McRspherical ping the learthe meta-coing strategithese stratethe cognitivachieves begies are adathresholds.the overlaptance fromusing the PProjection Basis Netwo

The perevaluated ution problelearning repbinary clasfactor. In ais comparethe literatuall/average test [37]. Tranking of etistical signthe performated using emission sition for brethat PBL-Mcmance than

The outlmeta-cogniSection 3 prsier on a sand comparliterature. S

2. Meta-coclassicatio

In this setion netwothe classicradial basissequential form.

2.1. Problem

Given stwhere xt =

sample, and ct (1, n) is its class label. Where n is the total numberof classes. The coded class labels (yt = [yt1, . . . , ytj , . . . , ytn]

T ) Rnare given by:

1

1

jectiisionro hachiectur

deta

cRBF

RBFNnd thive cork wie me

of tgulatg prr straininnentg sam

infog pry foro-lea

presive co

Cogn cognrwar

layenenthoutsian

pred

1

wk

wkjput ninpu

p

(

lk

uronf the

cognhm fhat o34]. Tjecting a

funcise signicance are considered as knowledge meas- meta-cognitive component. Class-wise signicance isom spherical potential, which is used widely in kernel

determine whether all the data points are enclosedhe Gaussian kernels [35]. Here, the squared distancee current sample and the hyper-dimensional projec-in measuring the novelty in the data. Since, in thisBFN addresses classication problems, we redene theotential in class-wise framework and is used in devis-ning strategies. Using the above mentioned measuresgnitive component constructs two sample based learn-es and two neuron based learning strategies. One ofgies is selected for the current training sample such thate component learns the true function accurately andtter generalization performance. These learning strate-pted by meta-cognitive component using self-regulated

In addition, the meta-cognitive component identiesping/non-overlapping conditions by measuring the dis-

nearest neuron in the inter/intra-class. The McRBFNBL to obtain the network parameters is referred to as,Based Learning algorithm for a Meta-cognitive Radialrk (PBL-McRBFN).formance of the proposed PBL-McRBFN classier issing set of benchmark binary/multi-category classica-ms from University of California, Irvine (UCI) machineository [36]. We consider ve multi-category and ve

sication problems with varying values of imbalancell these problems, the performance of PBL-McRBFNd against the best performing classiers available inre using class-wise performance measures like over-efciency and a non-parametric statistical signicancehe non-parametric Friedman test based on the meanach algorithm over multiple data sets indicate the sta-icance of the proposed PBL-McRBFN classier. Finally,ance of PBL-McRBFN classier has also been evalu-

two practical classication problems viz., the acousticgnal classication [38] and the mammogram classica-ast cancer detection [39]. The results clearly highlightRBFN classier provides a better generalization perfor-

the results reported in the literature.ine of this paper is as follows: Section 2 describes thetive radial basis network for classication problems.esents the performance evaluation of PBL-McRBFN clas-et of benchmark and practical classication problems,es with the best performing classiers available in theection 4 summarizes the conclusions from this study.

gnitive radial basis function network forn problems

ction, we describe the meta-cognitive radial basis func-rk for solving classication problems. First, we deneation problem. Next, we present the meta-cognitive

function network architecture. Finally, we present thelearning algorithm and summarize in a pseudo-code

denition

ream of training data samples, {(x1, c1), . . ., (xt, ct), . . . },[xt1, . . . , x

tm]T Rm is the m-dimensional input of the tth

ytj =

The obing decwith zeple to architegies in

2.2. M

Mcnent acognitnetworon. Thmodelself-relearninthe founew trcompotraininon thislearninstrategwhen-t

Wecognit

2.2.1. The

feed fooutputcompo

WitK Gausxt, the

ytj =Kk=

wherejth outto the

htk = ex

whereden neclass o

Thealgoritlar to tin [33,

ProLearnienergyif ct = jotherwise j = 1, . . . , n (1)

ve of McRBFN classier is to approximate the underly- function that maps xt Rm yt Rn. McRBFN beginsidden neuron and selects suitable strategy for each sam-ve this objective. In the next section, we describe thee of McRBFN and discuss each of these learning strate-il.

N architecture

has two components, namely the cognitive compo-e meta-cognitive component, as shown in Fig. 2. Themponent is a single hidden layer radial basis functionth evolving architecture starting from zero hidden neu-ta-cognitive component of McRBFN contains dynamiche cognitive component, knowledge measures anded thresholds. Meta-cognitive component controls theocess of the cognitive component by choosing one ofategies for each sample in the training data set. When ag sample presented to the McRBFN, the meta-cognitive

of McRBFN estimates the knowledge present in the newple with respect to the cognitive component. Based

rmation, the meta-cognitive component controls theocess of the cognitive component by selecting suitable

the current training sample to address what-to-learn,rn and how-to-learn properly.ent a detailed description of the cognitive and the meta-mponents of McRBFN in the following sections:

itive component of McRBFNitive component of McRBFN is a single hidden layeredd radial basis function network with a linear input andrs. The neurons in the hidden layer of the cognitive

of McRBFN employ the Gaussian activation function. loss of generality, we assume that the McRBFN builds

neurons from t 1 training samples. For a given inputicted output of the jth output neuron (yt

j) of McRBFN is

jhtk, j = 1, . . . , n (2)

is the weight connecting the kth hidden neuron to theeuron and ht

kis the response of the kth hidden neuron

t xt is given by

xt lk2

(lk)2

)(3)

Rm is the center and lk

R+ is the width of the kth hid-. Here, the superscript l represents the corresponding

hidden neuron.itive component uses Projection Based Learning (PBL)

or learning process. The strategy proposed here is simi-f fast learning algorithm for single layer neural networkhe PBL algorithm is described as follows.on Based Learning algorithm: The Projection Basedlgorithm works on the principle of minimization oftion and nds the optimal network output parameters


cRBF

for which ththe minimu

The conserror at McRis dened a

Ji =nj=1

(eij)

where eijis

eij ={

0

yij

When yijyij 1

yij

otherwisej = 1, . . . , n (5)

1, the energy function for ith sample becomes

yij)2

=nj=1

0(yij

Kk=1wkjh

ik

)2, i = 1, . . . , t (6)

g samples, the overall energy function is dened as

1

Ji =12

ti=1

nj=1

0(yij

Kk=1wkjh

ik

)2(7)

the response of the kth hidden neuron for ith training

mal output weights (W RKn) are estimated suchal energy reaches its minimum.

minRKn

J(W) (8)

l W* corresponding to the minimum energy point offunction (J(W*)) is obtained by equating the rst order

k=1 i=1

Eq. (10) can

Kk=1akpwkj =

which can b

AW = B

where the p

akp =ti=1hik

and the out

bpj =ti=1hip

Eq. (11) givoutput weiclosed-form

Propositioni.e.xi, whenN.

vative of J(W) with respect to the output weight to zero,

p = 1, . . . , K; j = 1, . . . , n (9)

e rst partial derivative to zero and re-arranging we get

ipwkj =

thipy

ij (10)i=1

be written as

bpj, p = 1, . . . , K; j = 1, . . . , n (11)

e represented in matrix form as

(12)

rojection matrix A RKK is given by

hip, k = 1, . . . , K; p = 1, . . . , K (13)

put matrix B RKn is

yij, p = 1, . . . , K; j = 1, . . . , n (14)

es the set of K n linear equations with K n unknownghts W. We state the following prepositions to nd the

solution for these set of linear equations.

1. The responses of the hidden neurons are unique. k /= p, hi

k/= hip; k, p = 1, . . ., K, i = 1, . . ., t.


Proof. Let us assume that for a given xi, hip = hik; when k /= p, thisassumption is valid if and only if

lp == lk AND lp == lk (15)

But the paiselected sigsignicant in Eq. (33).(Et) and claneuron is ato the currcurrent samhence, the rfor all samp

Propositionfor at least f

Proof. Let0, i.e., hi

k=

, or lk

The inpu

that |xj| < 1 growth straallocated bawidths are ddistances wof the hidde

We state

Theorem 1matrix, and

Proof. Fro(13),

Apk =ti=1h

it can be inf

Akk =ti=1hik

From PropoTherefore E

Akk =ti=1

|h

Hence the ppositive, i.e

The off-d

Akj =ti=1hik

From Eqs. (1A is a symm

A symmqTAq > 0. Lethat q11 = 1 qT1Aq1 = A10. Thereforevector qk = [

qTkAqk = Ak

Let p RK be the linear transformed sum of K such unit basisvectors, i.e., p = q1t1 + + qktk + + qKtK, where tk R is the trans-formation constant. Then,

Kk=1

K K

wn in,

k R

he pble.

solus givtive s is g

) =

secoing o

functoutptiongy po

he Tn be

1B

Meta menitivolds.onit

of thmplt of Mg sawled

claser (pled

pted. Usie mg stre stranitivlizat

met

dictedted c

g mj1

um he er

clasr of vectors lk

and lp are allocated based upon thenicant training samples for addition of neurons, thesesamples are selected using neuron growth criterion as

Neuron growth criterion uses maximum hinge errorss-wise signicance ( c). c dened such that a newdded such that when there is no neuron present nearent sample which produces signicant output for theple. So there are no two neuron centers are equal andesponse of the kth and pth hidden neurons are not equalles.

2. The response of the each hidden neuron is non-zeroew samples.

us assume that the response of kth hidden neuron is 0 xi. This is possible if and only if xi , or l

k

0t variables xi are normalized in a circle of radius 1 such; j = 1, . . ., m. As shown in overlapping conditions of thetegy in subsection 2.2.3 that hidden neuron centers aresed upon the selected signicant training samples andetermined based upon inter/intra class nearest neuronhich are nonzero positive values. Hence, the responsen neuron is non-zero for at least few samples.

the following theorem, using the Propositions 1 and 2.

. The projection matrix A is a positive denite symmetrichence it is invertible.

m the denition of the projection matrix A given in Eq.

iphik, p = 1, . . . , K; k = 1, . . . , K (16)

er that the diagonal elements of the A are:

hik, k = 1, . . . , K (17)

sition 2, the hidden neurons response are non-zero.q. (17) can be written as

ik|2 > 0 (18)

rojection matrix diagonal elements are non-zero, and., Ai

kk R+ > 0.

iagonal elements of the projection matrix (A) are:

hij =ti=1hijh

ik = Ajk (19)

7) and (19), it can be inferred that the projection matrixetric matrix.etric matrix is positive denite iff for any q /= 0,t us consider an unit basis vector q1 RK1 suchand q12 q1K = 0, i.e., q1 = [1 0 0 0]T. Therefore,1 In Eq. (17), it was shown that k = 1, . . . , K, Akk R >, A11 R > 0 qT1Aq1 > 0. Similarly, for an unit basis0 1 0]T, the product qT

kAqk is given by

k > 0; k = 1, . . . , K (20)

pTAp =

As shoHence

|tk|2Ak

Thus, tinverti

Thetions aderivaweight

2J(Wwlp2

As the follow

1 The 2 The

equaener

Using t(12) ca

W = A

2.2.2. The

the cogthreshnent mmodel(tth) saponentraininits knodictedclassiof knoware adasampleolds, thlearninof thesthe coggenera

Thebelow:

Prepredic

ct = ar

Maximmize th(yt). In(qktk)TAk=1

(qktk) =k=1

|tk|2Akk (21)

Eq. (17), Akk R> 0. Also, that |tk|2 R> 0 is evident.

> 0; k = 1, . . . , K Kk=1

|tk|2Akk R > 0 (22)

rojection matrix A is positive denite, and hence it is

tion for W obtained as a solution to the set of equa-en in Eq. (12) is minimum, if 2J/wlp2 > 0. The secondof the energy function (J) with respect to the outputiven by,

t

i=1hiph

ip =

ti=1

|hip|2 > 0 (23)

nd derivative of the energy function J(W) is positive, thebservations can be made from Eq. (23):

ion J is a convex function.ut weight W* obtained as a solution to the set of linears (Eq. (12)) is the weight corresponding to the minimumint of the energy function (J).

heorem 1, the solution for the system of equations in Eq. determined as follows:

(24)

-cognitive component of McRBFNta-cognitive component contains dynamic model ofe component, knowledge measures and self-regulated

During the learning process, meta-cognitive compo-ors the cognitive component and updates its dynamice cognitive component. When a new training samplee is presented to the McRBFN, the meta-cognitive com-cRBFN estimates the knowledge present in the new

mple with respect to the cognitive component usingge measures. The meta-cognitive component uses pre-s label (ct), maximum hinge error (Et), condence of(ct |xt)) and class-wise signicance ( c) as the measuresge in the new training sample. Self-regulated thresholds

to capture the knowledge presented in the new trainingng the knowledge measures and self-regulated thresh-eta-cognitive component constructs two sample basedategies and two neuron based learning strategies. Onetegies is selected for the new training sample such thate component learn them accurately and achieves betterion performance.a-cognitive component measures are dened as shown

class label (ct): Using the predicted output (yt), the

lass label (ct) can be obtained as

ax,...,n

ytj (25)

inge error (Et): The objective of the classier is to mini-ror between the predicted output (y

t) and actual output

sication problems, it has been shown in [32,40] that


the classier developed using hinge loss error estimates the poste-rior probability more accurately than the classier developed usingmean square error. Hence, in McRBFN, we use the hinge loss error(

et =[et1,

t t]T)

n

The max

Et = maxj1,2,...,

Condence otion or pred

p(j|xt) =mi

Class-wise smapped onK Gaussian dimensionapotential ofdistance fro[35].

In McRBdescribe thfeature spanew data xoriginal spafeature spac

= ||h(xt) As shown in

= h(xt , xt

From the athe rst terconstants. Smay be disc

2K

Kk=1

Since wedistributionthe classiespherical poc with respeKc be the nuwise spherias

c = 1Kc

Kck=1

The sphetained in thto one) indiedge in thepotential (c

2.2.3. LearnMeta-co

gies using twhich direhuman lealearn). The

cognitive component by selecting one of the following four learningstrategies for the new training sample.

Sample delete strategy: If the new training sample containsmationeata son ghiddtion,

hidmetete thdateple r

info of togniearnmete

incipelow

ple dtrain

posle dan bing p

ct A

metales p

to 1rocecingy sam

mayctede raning o-trainon gcan

the on toon gr

ct O

e cnd ermarnireshhold

very willosenloserork ty. He

entange. . . , ej, . . . , en R dened as in Eq. (5).

imum absolute hinge error (Et) is given by

n

etj (26)f Classier (p(ct |xt)): The condence level of classica-icted posterior probability is given as

n(1, max(1, ytj)) + 1

2, j = ct (27)

ignicance ( c): In general, the input feature (xt) is to a hyper-dimensional spherical feature space S usingneurons, i.e., xt H. Therefore, all H(xt) lie on a hyper-l sphere as shown in [41]. The knowledge or spherical

any sample in original space is expressed as a squaredm the hyper-dimensional mapping S centered at h0

FN, the center () and width () of the Gaussian neuronse feature space S. Let the center of the K-dimensionalce be h0 = 1K

Kk=1h(k). The knowledge present in the

t can be expressed as the potential of the data in thece, which is squared distance from the K-dimensionale to the center h0. The potential ( ) is given as

h0||2 (28) [35], the above equation can be expressed as

) 2K

Kk=1h(xt , lk) +

1K2

Kk,r=1

h(lk, lr) (29)

bove equation, we can see that for Gaussian functionm (h(xt, xt)) and last term (1/K2

Kk,r=1h

(lk, lr

)) are

ince potential is a measure of novelty, these constantsarded and the potential can be reduced to

h(xt , lk) (30)

are addressing classication problems, the class-wise plays a vital role and it will inuence the performancer signicantly [15]. Hence, we use the measure of thetential of the new training sample xt belonging to classct to the neurons associated to same class (i.e., l = c). Letmber of neurons associated with the class c, then class-cal potential or class-wise signicance ( c) is dened

h(xt , ck) (31)

rical potential explicitly indicates the knowledge con-e sample, a higher value of spherical potential (closecates that the sample is similar to the existing knowl-

cognitive component and a smaller value of sphericallose to zero) indicates that the sample is novel.

ing strategiesgnitive component devices various learning strate-he knowledge measures and self-regulated thresholds,ctly addresses the basic principles of self-regulatedrning (i.e., what-to-learn, when-to-learn and how-to-meta-cognitive part controls the learning process in

inforcomping d

Neurnew addia new

Paraupdato up

Samsomestagethe cout lpara

The prdetail b

Samnew matesampand clearn

ct ==

The sampcloseing pRedumanworkexpein thlearnover

Neursignifromneurneur(ct /=

wherold aThe tfor lec ththresthenworkis chsen cnetwabilisuremthe ron similar to the knowledge present in the cognitivent, then delete the new training sample from the train-et without using it in the learning process.rowth strategy: Use the new training sample to add aen neuron in the cognitive component. During neuronsample overlapping conditions are identied to allocateden neuron appropriately.r update strategy: The new training sample is used toe parameters of the cognitive component. PBL is used

the parameters.eserve strategy: The new training sample containsrmation but not signicant, they can be used at laterhe learning process for ne tuning the parameters oftive component. These sample may be discarded with-ing or used for ne tuning the cognitive componentrs in a later stage.

le behind these four learning strategies are described in:

elete strategy: When the predicted class label of theing sample is same as the actual class label and the esti-terior probability is close to 1, then the new trainingoes not provide additional information to the classiere deleted from training sequence without being used inrocess. The sample deletion criterion is given by

ND p(ct |xt) d (32)

-cognitive deletion threshold (d) controls number ofarticipating in the learning process. If one selects d

then all the training samples participates in the learn-ss which results in over-training with similar samples.

d beyond the desired accuracy results in deletion of tooples from the training sequence. But, the resultant net-

not satisfy the desired accuracy. Hence, it is xed at the accuracy level. In our simulation studies, it is selectedge of [0.90.95]. The sample deletion strategy preventsf samples with similar information, and thereby, avoidsing and reduces the computational effort.rowth strategy: When a new training sample containst information and the predicted class label is differentactual class label then one need to add a new hidden

represent the knowledge contained in the sample. Theowth criterion is given by

R Et a)

AND c(xt) c (33)

is the meta-cognitive knowledge measurement thresh-a is the self-adaptive meta-cognitive addition threshold.s c and a allows samples with signicant knowledgeng rst then uses the other samples for ne tuning. Ifold is chosen closer to zero and the initial value of a

is chosen closer to the maximum value of hinge error, few neurons will be added to the network. Such a net-

not approximate the function properly. If c threshold closer to one and the initial value of a threshold is cho-

to the minimum value of hinge error, then the resultantmay contain many neurons with poor generalizationnce, the range for the meta-cognitive knowledge mea-

threshold can be selected in the interval [0.30.7] and for the initial value of self-adaptive meta-cognitive


addition threshold can be selected in the interval [1.31.7]. Thea is adapted as follows

a := a + (1 )Et (34)

where isclose to oonly whesignican

The newor will beron in theneuron stin the samthe paramlearning anearest nsample. Tinitializatsier signproposed Inter/in

sample Existing

neuronLet nrS bethe neare

nrS = arg

Let the EunrS and n

dS = ||xt

Using theoverlappi Distinct

both inlnrI) thclass cluhiddendeterm

cK+1 =

where responsthe ran

No-overintra-clwith this less tclasses.width (

cK+1 =

Minimuing samto the itance roverlapnew hidneuronis initia

cK+1 =

where is center shift factor which determines how much cen-ter has to be shifted from the new training sample location. Inour simulation studies value is xed to 0.1.

Signicant overlapping with the inter-class: When a new trainingple

the ice rerlapw hiduron

+1 =boves in mhen ledghts ae siz

re ht

ons ing sent inurone. Thffect

as p+1

,p =

=

aK+1,

,K+1

ize o

1)n =

bK+1

,j =K

re yi ion (

1

1

ly th

tK

+1

] the slope that controls rate of self-adaptation and is setne. The a adaptation allows McRBFN to add neuronsn presented samples to the cognitive network containst information.

training sample may have overlap with other classes from a distinct cluster far away from the nearest neu-

same class. Therefore, one need to identify the currentatus (overlapping with other classes or distinct clustere class) with respect to exiting neurons and initializeeters of the new neuron (K + 1). The existing sequentiallgorithms initialize width based on the distance witheuron and output weight as error based on the currenthe inuence of past samples is not considered in weightion. Hence, it will affect the performance of the clas-icantly. The above mentioned issues are dealt in the

McRBFN astra class nearest neuron distances from the current

for width determination. knowledge of past samples stored in the network as

center is used to initialize the weight of new neuron. the nearest hidden neuron in the intra-class and nrI best hidden neuron in the inter-class. They are dened as

minl==c;k

xt lk; nrI = arg minl /= c;kxt lk (35)

clidian distances between the new training sample torI are given as follows

cnrS ||; dI = ||xt lnrI || (36)

nearest neuron distances, we can determine theng/no-overlapping conditions as follows:

sample: when a new training sample is far away fromtra/inter class nearest neurons (dS >> cnrS AND dI >>en the new training sample does not overlap with anyster, and is from a distinct cluster. In this case, the new

neuron center (cK+1) and width (cK+1) parameters are

ined as

xt; cK+1 =

(xt)Txt (37)

is a positive constant which controls the overlap of thees of the hidden units in the input space, which lies inge 0.5 1.lapping: When a new training sample is close to theass nearest neuron then the sample does not overlape other classes, i.e., the intra/inter class distance ratiohan 1, then the sample does not overlap with the other

In this case, the new hidden neuron center (cK+1) andcK+1) parameters are determined as

xt; cK+1 = xt cnrS (38)

m overlapping with the inter-class: when a new train-ple is close to the inter-class nearest neuron comparedntra-class nearest neuron, i.e., the intra/inter class dis-atio is in range 11.5, then the sample has minimumping with the other class. In this case, the center of theden neuron is shifted away from the inter-class nearest

and shifted towards the intra-class nearest neuron, andlized as

xt + (cnrS lnrI); cK+1 = cK+1 cnrS (39)

samto tanovnene

cK

The ahelp

Wknowweig

Th

wheneurlearnpresof nespacthe eused

aK

aK+1

and

aK+1

The s

Bt(K+

and

bK+1

wheneur

yij =

Final[W

wtK is very close to the inter-class nearest neuron comparedntra-class nearest neuron, i.e., the intra/inter class dis-atio is more than 1.5, then the sample has signicantping with the other class. In this case, the center of theden neuron is shifted away from the inter-class nearest

and is initialized as

xt (lnrI xt); cK+1 = cK+1 lnrI (40) mentioned center and width determination conditionsinimizing the misclassication in McRBFN classier.

a neuron is added to McRBFN, based on the existinge of past samples stored in the network the outputre estimated using the PBL as follows:e of matrix A is increased from K K to (K + 1) (K + 1)

(41)

=[ht1, h

t2, . . . , h

tK

]is a vector of the existing K hidden

response for new (tth) training sample. In sequentialamples are discarded after learning, but the information

the past samples are stored in the work. The centers provides the distribution of past samples in featureese centers can be used as pseudo-samples to capture

of past samples. Hence, existing hidden neurons areseudo-samples to calculate aK+1 and aK+1,K+1 terms.R1K is assigned as

K+1i=1hiK+1h

ip, p = 1, . . . , K where hip

exp

(

li lp2

(lp)2

)(42)

K+1 R+ value is assigned as

=K+1i=1hiK+1h

iK+1 (43)

f matrix B is increased from K n to (K + 1) nBt1Kn + (ht)T(yt)TbK+1

(44) 1 nR is a row vector assigned as+1

i=1hiK+1y

ij, j = 1, . . . , n (45)

s the pseudo-output for the ith pseudo sample or hiddenli) given as

if l = j otherwise j = 1, . . . , n (46)

e output weights are estimated as

=(At(K+1)(K+1)

)1Bt(K+1)n (47)


where WtK is the output weight matrix for K hidden neurons, andwtK+1 is the vector of output weights for new hidden neuron afterlearning from tth sample. The inverse of a matrix At(K+1)(K+1) iscalculated recursively using matrix identities as

where

aK+1(AtK

(AtKK )1

After calcthe result

WtK =[IK

[Wt1K +

(wtK+1 =

Parameteis used to(WK = [w

ct == ct A

where uthresholdimum hiadapting will be putant netwIf a loweupdating sequenceparamete[0.40.7]

The u

u := uwhere paramete

When aters, the Pfollows:

J(WtK )wpj

=

Equating (55), we g

(At1 + (h

By substituting Bt1 = At1Wt1K &At1 +(ht)T

ht = At andadding/subtracting the term

(ht)T

htWt1K on both sides Eq. (56)reduced to

(At

ly th

WtK

e et

ple rfy eitone

pusies

samally,in th

sam

L-M

summ give

ocodhm.: Preseata stt: Decen fea

zatione parearnin

eta-coithspect t

CompFindscondusingd on aponenple De

== clete

n Grow

ct /= a ne

ose t

7) to (ate t

resh

eters U

ct ==ate thate thq. (54)ple ReE

curren the sased toIF

itive cO

BL-My d(48)

AtKK = At1 +(ht)T

ht , = aK+1,K+1 1K

)1aTK+1 and

(AtKK

)1is calculated as

= (At1)1 (At1)

1(ht)Tht(At1)

1

1 + ht(At1)1(ht)T(49)

ulating inverse of matrix in Eq. (47) using Eqs. (48)&(49),ant equations are

K +(At1KK

)1aTK+1aK+1

](50)

At1KK)1(

ht)T(

yt)T] (At1KK)1aTK+1bK+1

(51)

1

[aK+1

(Wt1K +

(At1KK

)1(ht)T(

yt)T) bK+1]

(52)

rs update strategy: The current (tth) training sample update the output weights of the cognitive component1, w2, . . . , wK ]

T ) if the following criterion is satised.

ND Et u (53)

is the self-adaptive meta-cognitive parameter update. If u threshold is chosen closer to 50% of max-nge error, then very few samples will be used forthe network parameters and most of the samplesshed to the end of the training sequence. The resul-ork will not approximate the function accurately.r value is chosen, then all samples will be used inthe network parameters without altering the training. Hence, the range for the initial value of meta-cognitiver update threshold can be selected in the interval.is adapted based on the hinge error as:

+ (1 )Et (54)

is the slope that controls the rate of self-adaption ofr update and is set close to one.

sample is used to update the output weight parame-BL algorithm updates the output weight parameters as

J(WtK )wpj

+ Jt(WtK )

wpj= 0, p = 1, . . . , K; j = 1, . . . , n

(55)

the rst partial derivative to zero and re-arranging Eq.et

t)Tht)WtK (Bt1 + (ht)T (yt)T ) = 0 (56)

WtK =Final

WtK =wher(5).

Samsatiscompple ismodthese

Ideable when

2.3. PB

To form is

PseudAlgorit

Inputfrom dOutpubetwe

STARTInitiali

ThStart lDO

Mwre

BasecomSam

IF ct

De

Neuro

ELSEIFAdd

Cho

(3Upd

th

Param

ELSEIFUpdUpdto ESamELSThe

end oflater u

END

The cognENDD

END

In Plearn b)1(AtWt1K + (ht)T ((yt)T htWt1K )) (57)e output weights are updated as

1 +(At)1(

ht)T(

et)T

(58)

is the hinge loss error for tth sample obtained from Eq.

eserve strategy: If the new training sample does nother the deletion or the neuron growth or the cognitivent parameters update criterion, then the current sam-hed to the rear of the training sequence. Since McRBFNthe strategies based on the current sample knowledge,ples may be used in later stage.

training process stops when no further sample is avail-e data stream. However, in real-time, training stopsples in the reserve remains same.

cRBFN classication algorithm

arize, the PBL-McRBFN algorithm in a pseudo coden in Pseudo code 1:

e 1. Pseudo code for the PBL-McRBFN classication

nt the training data one-by-one to the networkream.ision function that estimates the relationshipture space and class label.

: Assign the rst sample as the rst neuron(K=1).ameters of the neuron are chosen as shown in Eq. (37).g for samples t = 2, 3,...

gnitive component computes the signicance of the sample

o the cognitive component:

utes the cognitive component output ytusing Eq. (2).

the predicted class label ct , maximum hinge error Et ,ence of classier p(ct |xt ) and class-wise signicance c

Eqs.(25),(26) and (31).bove calculated measures the meta-cognitivet selects one of the following strategies:lete Strategy:tANDp(ct |xt ) d THEN

the sample from the sequence without learning.

th Strategy:

ctOREt aAND c(xt ) c THENuron to the network (K = K+1).he parameters of the new hidden neuron using Eqs.52).he self-adaptive meta-cognitive addition

old according to Eq. (34)pdate Strategy:

ctANDEt u THENe parameters of the cognitive component using Eq. (58)e self-adaptive meta-cognitive update threshold according

serve Strategy:

t sample xt , yt is pushed to the rearmple stack to be used in future. They can be

ne-tune the cognitive component parameters.

omponent executes the above selected strategy.

cRBFN, sample delete strategy address the what-to-eleting insignicant samples from training data set,


Table 1Description of benchmark data sets selected from UCI machine learning repository for performance study.

Data sets No. of features No. of classes No. of samples I.F

Training Testing Training Testing

Image segm 210 IRIS 45 WINE 60Vehicle class 424a

Glass identi 109a

HEART 70 Liver disorde 200 PIMA 400 Breast cance 300Ionosphere ( 100

a Training sa

neuron grothe how-tolearns fromthresholds when-to-leaaccording t

3. Perform

PBL-McRmark multimachine lethe best peliterature (Sstandard suwith varyinusing Imba

I.F = 1 nN

where Nj isclass j and Nthe numbeber of samppresented ilems chosensets and thePBL-McRBFproblems: ttoring data for breast c

All the sion a desktoFor ELM clathe construulations forthe LIBSVM(c,) are opmeasures u

3.1. Perform

The clasciencies anmultiple clacomparison

3.1.1. ClassThe conf

mance and

anc is de

jj

j 1

qjj isand

traition aer-al

nj=1

Statis clasclassparean tsh thescridmaultip

ith class

irji) o

n by

12M(L +

folloegree

squaentation (IS) 19 7 4 3

13 3ication (VC) 18 4 cation (GI) 9 6

13 2 rs (LD) 6 2

8 2 r (BC) 9 2ION) 34 2

mples are repeated three times randomly as suggested in [15].

wth strategy and parameters update strategy address-learn efciently by which the cognitive component

the samples, and self-adaptive nature of meta-cognitivein addition to the sample reserve strategy address thern by presenting the samples in the learning processo the knowledge present in the sample.

ance evaluation of PBL-McRBFN classier

BFN classier performance is evaluated on bench--category and binary classication problems from UCIarning repository. The performance is compared withrforming sequential learning algorithm reported in theRAN) [20], batch ELM classier [16] and also with thepport vector machines [42]. The data sets are choseng sample imbalance. The sample imbalance is measuredlance Factor (I.F) as

minj=1...n

Nj (59)

the total number of training samples belonging to the =n

j=1Nj . The description of these data sets includingr of input features, the number of classes, the num-les in the training/testing and the imbalance factor aren Table 1. From Table 1, it can be observed that the prob-

for the study have both balanced and unbalanced data imbalance factors of the data sets vary widely. Finally,N classier is used to solve two real-world classicationhe acoustic emission signal processing for health moni-set presented in [38] and the mammogram classicationancer detection data set presented in [43].mulations are conducted in MATLAB 2010 environmentp PC with Intel Core 2 Duo, 2.66GHz CPU and 3GB RAM.ssier, the number of hidden neurons are obtained usingctive-destructive procedure presented in [44]. The sim-

batch SVM with Gaussian kernels are carried out using package in C [45]. For SVM classier, the parameters

performwhich

j =q

N

whereclass j j in theevaluathe ov

a = 1n

3.1.2. The

of an is comFriedmestablibrief d

Frieover mon theall the

1/M

is give

2F = L

whichL 1 dsum oftimized using grid search technique. The performancesed to compare the classiers are described below.

ance measures

s-wise performance measures like overall/average ef-d a statistical signicance test on performance ofssiers on multiple data sets are used for performance.

-wise measureusion matrix Q is used to obtain the class-level perfor-global performance of the various classiers. Class-level

Iman anmore conse

FF =(M

M(L which follodegrees of as the probdistributionthe statisticclassier iscondence 2100 0 0105 0 0118 0 0.29422 0.1 0.12105 0.68 0.77

200 0.14 0.1145 0.17 0.14368 0.22 0.39383 0.26 0.33251 0.28 0.28

e is measured by the percentage classication (j)ned as:

00% (60)

the total number of correctly classied samples in theNj is the total number of samples belonging to a classning/testing data set. The global measures used in there the average per-class classication accuracy (a) and

l classication accuracy (o) dened as:

j, o =

nj=1qjj

N 100% (61)

tical signicance testsication efciency itself is not a conclusive measureier performance [37]. Since the developed classierd with multiple classiers over multiple data sets, theest followed by the Benferroni-Dunn test is used toe statistical signicance of PBL-McRBFN classier. Aption of the conducted test is given below.n Test: It is is used to compare multiple classiers (L)le data sets (M). Let rj

ibe the rank of the jth classier

data set. Under the null-hypothesis, which states thatiers are equivalent and so their average rank Rj (Rj =ver all data sets should be equal, the Friedman statistic

1)

j

R2j L(L + 1)2

4

(62)ws the 2 (Chi-square distribution) distribution withs of freedom. A 2 distribution is the distribution of ares of L independent standard normal variables.

d Davenport showed that Friedmans statistic (2F ) isrvative and derived a better statistic [46]. It is given by

1)2F 1) 2F

(63)

ws the F-distribution with L 1 and (L 1)(M 1)freedom is used in this paper. F-distribution is denedability distribution of the ratio of two independent 2

s over their respective degrees of freedom. The aim ofal test is to prove that the performance of PBL-McRBFN

substantially different from the other classiers with alevel of value 1 . If calculated FF > F/2,(L1),(L1)(M1)


or FF < F1/2,(L1),(L1)(M1), then the null-hypothesis is rejected.The Statistical tables for critical values can be found in [47].

Post-hoc Test: The Benferroni-Dunn test [48] is a post-hoc testthat can be performed after rejection of the null-hypothesis. It isused to comsiers. Thisare signicaby at least ter i perforis calculated

CD = qL

where criticdivided by

3.2. Perform

The clasefcienciesMcRBFN, SRTable 2 conclassicatioTable 2, webetter thanbetter thanaddition, thples to leararchitecture

Well balalization pethan SRAN ers. On IS achieve 2% achieves apclassiers. Suses fewer generalizatgeneralizatrithm, whicon the currto avoid ovuses only 8classier.

In orderposed PBL-ELM classiclassier. Osamples forELM algorit

The testi89 samplesdeveloped ugeneralizatrons (ELM*

testing efachieve 90.strategy premaking abil

Imbalancsets, the gemately 2 1ELM and SMcRBFN redecision su

decision surface. Class-overlap based criterion in initializing thecenters and width of new neuron in PBL-McRBFN and meta-cognitive learning helps PBL-McRBFN to achieves signicantlybetter generalization performance. For example, in VC data set pro-

PBL- efcassit hasigh imassire that SRsses

of pr (a) poseor thtely. e tess, 5%% ims.

ary dN acovernd Btingith fwer e tesemeeciing troblecRBF

imb

Statishis sN clan tion 3are sexpehliger ovmpar).

parassiere pr

is 16. (63)

stati 27

hypo statiejecd PBiersxt, wd PB64), canhat pare PBL-McRBFN classier against all the other clas- test assumes that the performances of two classiersntly different if the corresponding average ranks differhe Critical Difference (CD), i.e, (Ri Rj) > CD then classi-ms signicantly than classier j. The critical difference

using

(L + 1)6M

(64)

al values q are based on the Studentized range statistic2 as given in [37].

ance evaluation on UCI benchmark data sets

s-wise performance measures (average/overall) testing, number of hidden neurons and samples used for PBL-AN, ELM and SVM classiers are reported in Table 2. Thetains results of both the binary and the multi-categoryn data sets from UCI machine learning repository. From

can see that PBL-McRBFN classier performs slightly the best performing SRAN classier and signicantly

ELM and SVM classiers on all the 10 data sets. Ine proposed PBL-McRBFN classier requires fewer sam-n the decision function and develops compact neural

to achieve better generalization performance.anced data sets: In IS, IRIS, WINE data sets, the gener-rformance of PBL-McRBFN is approximately 2% moreclassier and 3 4 % more than ELM and SVM classi-data set proposed PBL-McRBFN uses fewer samples toimprovement over SRAN and proposed PBL-McRBFNproximately 3 4 % improvement over ELM and SVMimilar to IS, on IRIS and WINE data sets, PBL-McRBFNsamples with less number of neurons to achieve betterion performance. PBL-McRBFN classier achieves betterion performance using meta-cognitive learning algo-h selects appropriate samples to used in learning basedent knowledge. Also, deletes many redundant sampleser training. For example, in IS data set, PBL-McRBFN9 samples out of 210 training samples to build the best

to highlight the above-mentioned advantages of pro-McRBFN classier, we conduct a simulation study iner with only training samples used by PBL-McRBFNn IS data set, PBL-McRBFN classier selects the best 89

training and these samples are used in batch learninghm and we refer this classier as ELM*.ng performance of ELM* classier (which uses the best

sequence) is better than the original ELM classiersing 210 training samples. Also, ELM* achieves better

ion performance with smaller number of hidden neu-requires only 32 hidden neurons to achieve 92.14%

ciency whereas ELM requires 49 hidden neurons to23%). This study clearly indicates that sample deletionsent in PBL-McRBFN helps in achieving better decisionity.ed data sets: In VC, GI, HEART, LD, PIMA, BC, ION dataneralization performance of PBL-McRBFN is approxi-0 % more than SRAN classier, and 2 15 % more thanVM classiers. In case of imbalance data sets, PBL-quire more number of neurons to approximate therface with minimal samples for approximating the

posed testingELM cldata seSuch hSVM cl6% mofact ththe claIn caseciencythe proedge faccuraaveragsampleand 15neuron

BinMcRBF2 7 % PIMA aage tesSVM wuses feaveragimprovclass spcapturance pPBL-Msample

3.2.1. In t

McRBFFriedmin Sectranks is 2.5) test higclassiour co(M = 10

Non-4 claset a(62))in Eqied3 andnull iedcan rposeclass

NeposeEq. (signisee tMcRBFN uses fewer samples to achieve better averageiency approximately 2% improvement over SRAN anders, and 10% improvement over SVM classier. The GI

imbalance factor of 0.68 in training and 0.77 in testing.balance inuences the performance of SRAN, ELM and

ers. On GI data set, SRAN overall testing efciency (o) isan the average testing efciency (a). This is due to theAN classier is not able to capture the knowledge forwhich contain smaller number of samples accurately.oposed PBL-McRBFN classier, the average testing ef-is 8% more than the overall testing efciency (o). Thusd PBL-McRBFN classier is able to captures the knowl-e classes which contain smaller number of samplesOn GI data set proposed PBL-McRBFN achieves betterting efciency 12% improvement over SRAN with fewer

improvement over ELM with less number of neurons,provement over SVM classier with fewer number of

ata sets: On HEART and LD data sets proposed PBL-hieves better average testing efciency approximately

SRAN, ELM and SVM with less number of neurons. OnC data sets proposed PBL-McRBFN achieves better aver-

efciency approximately 1 2 % over SRAN, ELM andewer samples. On ION data set proposed PBL-McRBFNsamples with less number of neurons to achieve betterting efciency 5% improvement over SRAN and 8 9 %nt over ELM and SVM. The overlapping conditions andc criterion in learning strategies of PBL-McRBFN helps inhe knowledge accurately in case of high sample imbal-ms. From the Table 2, we can say that the proposedN improves average/overall efciency even under highalance.

tical signicance analysisection, we highlight the signicance of proposed PBL-assier on multiple data set using non-parametricest followed by the Benferroni-Dunn test as described.1.2. The Friedman test identify the measured averageignicantly different from the mean rank (mean rankcted under the null-hypothesis. The Benferroni-Dunnhts statistical difference in performance of PBL-McRBFNer other classiers. From the Table 2, we can see thatison study uses four classiers (L = 4) and ten data sets

metric test using overall testing efciency (o): Ranks of allrs based on the overall testing efciency for each dataovided in Table 3. The Friedman statistic (2F as in Eq..89 and modied (Iman and Davenport) statistic (FF as) is 11.59. For four classiers and ten data sets, the mod-stic is distributed according to the F-distribution withdegrees-of-freedom. The critical value for rejecting thethesis at signicance level of 0.05 is 3.65. Since, mod-stic is greater than the critical value (11.59 3.65), wet the null hypothesis. Hence, we can say that the pro-L-McRBFN classier performs better than the existing

on these data sets.e conduct the Benferroni-Dunn test to compare the pro-L-McRBFN classier with the all other classiers. Fromthe critical difference (CD) is calculated as 1.382 for ace level of 0.05 (q0.05 = 2.394). From Table 3, we canthe difference in average rank between the proposed


Table 2Performance comparison of PBL-McRBFN with SRAN, ELM and SVM.

Data sets PBL-McRBFN SRAN ELM SVM

K Samples Testing K Samples Testing K Testing SVa Testing

a o a o a

IS 92.29 49 90.23 90.23 127 91.38 91.38IRIS 96.19 10 96.19 96.19 13 96.19 96.19WINE 97.19 10 97.46 98.04 36 97.46 98.04VC 76.86 150 77.01 77.59 340 70.62 68.51GI 80.95 80 81.31 87.43 183 70.47 75.61HEART 77.53 36 76.50 75.91 42 75.50 75.10LD 65.78 100 72.41 71.41 141 71.03 70.21PIMA 74.90 100 76.63 75.25 221 77.45 76.43BC 97.26 66 96.35 96.48 24 96.61 97.06ION 91.88 32 89.64 87.52 43 91.24 88.51

a Number o

Table 3Ranks based o

Data sets

ISIRIS WINE VC GI HEART LD PIMABC IONAverage ran

PBL-McRB2.05 and critical dithe Benfeclassier siers.

Non-paraall 4 classdata set ain Eq. (6213.9. Sinc(13.9 3.6that the pthe other

From Tbetween tclassiersgreater thaverage tthat the pthe othermance reclassicatset for heaclassicat[43].

3.3. Acoust

The streducer due deformatioThese signtion/identi

difcic em

furtic emmple

acoUsed o a Used o

50 89 94.19 94.19 47 113 92.29 6 20 98.10 98.10 8 29 96.19

11 29 98.31 98.69 12 46 96.61 175 318 78.91 79.09 113 437 75.12 71 115 84.76 92.72 59 159 86.21 20 69 81.50 81.47 28 56 78.50 87 116 73.1 72.63 91 151 66.90

100 162 79.62 76.67 97 230 78.53 13 45 97.39 97.85 7 91 96.87 18 58 96.41 96.47 21 86 90.84

f support vectors

n the overall (o) and average (a) testing efciencies.

PBL-McRBFN SRAN

o a o a

1 1 2 21 1 3 3 1 1 4 4 1 1 3 3 2 1 1 3 1 1 2 2 1 1 4 4 1 1 2 4 1 1 2 2 1 1 3 2

k (Rj) 1.1 1 2.6 2.9

FN classier and the other three classiers are 1.5,2.05. The difference in average rank is greater than thefference. Hence, based on the overall testing efciencyrroni-Dunn test shows that the proposed PBL-McRBFNis signicantly better than the SRAN, ELM and SVM clas-

a very acoustplexityacoustthe coof suchmetric test using average testing efciency (a): Ranks ofiers based on the average testing efciency for eachre provided in Table 3. The Friedman statistic (2F as)) is 18.21 and modied statistic (FF as in Eq. (63)) ise, modied statistic is greater than the critical value5), we can reject the null hypothesis. Hence, we can sayroposed PBL-McRBFN classier performs better than

classiers on these data sets.able 3, we can see that the difference in average rankhe proposed PBL-McRBFN classier and the other three

are 1.9, 1.95 and 2.15. The difference in average rank isan the critical difference (1.382). Hence, based on theesting efciency, the Benferroni-Dunn test also showsroposed PBL-McRBFN classier performs better than

well known classiers. Next, we present the perfor-sults of PBL-McRBFN classier on the two real-worldion problem data sets, viz., an acoustic emission datalth monitoring presented in [38] and the mammogramion data set for breast cancer detection presented in

ic emission signal classication for health monitoring

ss or pressure waves produced by the sensitive trans-to the transient energy released by the irreversiblen in the material are called as acoustic emission signals.als are produced by various sources and classica-cation of sources using the acoustic emission signals is

classier. Ttic emissionstudy as gicharacterizof the 4 souspark sourcare used forples are useof input fea[38].

The percompared Table 4. It caicant samplachieve an classier pesion signals

Table 4Performance c

Classier

PBL-McRBFNSRAN ELMSVM

a Number oELM SVM

o a o a

4 4 3 33 3 3 32.5 2.5 2.5 2.52 2 4 43 2 4 43 3 4 42 2 3 34 3 3 24 4 3 34 4 2 33.15 2.95 3.15 3.15

ult problem. The presence of ambient noise and pseudoission signals in practical situations increases the com-her. In addition, the supercial similarities between theission signals produced by different sources increasesxity further. In this section, we address classicationustic emission signals using the proposed PBL-McRBFN

he experimental data provided for the burst type acous-

signals from the metallic surface is considered for ourven in [38]. The burst type acoustic emission signal ised by 5 features and these signals are classied into onerces, namely, the pencil source, the pulse source, thee and the noise source. Out of 199 samples, 62 samples

training (as highlighted in [38]) and the remaining sam-d for testing the classier. For details on characteristicstures and the experimental setup, one should refer to

formance study results of PBL-McRBFN classier areagainst the SRAN, ELM, and SVM, and presented inn be seen that PBL-McRBFN classier uses only 9 signif-es to build the classier and requires only 5 neurons toover-all testing efciency of 99.27%. Thus, PBL-McRBFNrforms an efcient classication of the acoustic emis-

using a compact network.

omparison on acoustic emission signal problem.

Hidden Samples Testing

Neurons Used o a

5 9 99.27 98.9110 39 99.27 98.9110 62 99.27 98.9122a 62 98.54 97.95

f support vectors.


Table 5Performance comparison on mammogram classication problem.

Classier Hidden Samples Testing

PBL-McRBFNSRAN ELMSVM

1 Number o

3.4. Mamm

Mammocer, as tumbefore theyically, idenabnormal mas either mtumor is debenign or mprocedure tpatient. A nin a mammsies, thus spcosts. In thbeen used. of the identeither maligPBL-McRBFsier is evadetails on t[43].

The perfison with From the tahighly efcismaller numSVM classierably.

Thus, frowith SRAN, classicatioMcRBFN cla

4. Conclus

In this pFunction N(PBL) algorwork. The learning ofcognitive coimplementiwhen-to-leaping conditinitializatiomizes the mPBL-McRBFmulti-categlearning reppractical claparison witindicates thclassier.

Acknowledgements

The authors would like to thank the Nanyang Technologicalsity-Ministry of Defence (NTU-MINDEF), Singapore, for theal sut thi

nces

. Zhantems, 462eCunel, Baputa

Li, T.Jlied Sri, G. rk to c340avi, Cwork ks, Ap. RuizrmatKhan,lied S. RumpagatB. Huaeme oNeuraB. Huachine

Platt, ation ingwection ral Co

B. Huaorithms on 422Y. Lianine ses on Nuresher usi513ureshdied09) 54Kasabine knernet. Rive

essmedern Lsaacsorning:p of Teureshource 230SureshcompnsactiSateesblems96.Subramneuro336Neurons Used o a

22 60 100 10025 45 90.91 91.6730 97 90.91 90.0261 97 90.91 91.67

f support vectors

ogram classication for breast cancer detection

gram is a better means for early diagnosis of breast can-ors and abnormalities show up in mammogram much

can be detected through physical examinations. Clin-tication of malignant tissues involves detecting theasses or tumors, if any, and then classifying the massalignant or benign as given in [39]. However, once atected, the only method of determining whether it isalignant is by conducting a biopsy, which is an invasivehat involves the removal of the cells or tissue from aon-invasive method of identifying the abnormalitiesogram can reduce the number of unnecessary biop-aring the patients of inconvenience and saving medicalis study, mammogram database available in [43] hasThe 9 input features extracted from the mammogramied abnormal mass are used to classify the tumor asnant or benign. Here, 97 samples are used to developN classier and the performance of PBL-McRBFN clas-luated using the remaining 11 samples. For furtherhe input features and the data set, one should refer to

ormance results of PBL-McRBFN classier, in compar-the SRAN, ELM and SVM are presented in Table 5.ble, it is seen that PBL-McRBFN classier performs aent classication with 100% classication accuracy withber of hidden neurons. When compared to SRAN, ELM,ers, performance of PBL-McRBFN is improved consid-

m the performance study of PBL-McRBFN conductedELM, SVM for chosen benchmark data sets and practicaln problems,it can be observed that the proposed PBL-ssier performs better than other classiers.

ions

aper, we have presented a Meta-cognitive Radial Basisetwork (McRBFN) and its Projection Based Learningithm for classication problems in sequential frame-meta-cognitive component in McRBFN controls the

the cognitive component in McRBFN. The meta-mponent adapts the learning process appropriately byng self-regulation and hence it decides what-to-learn,rn and how-to-learn efciently. In addition, the overlap-ions present in neuron growth strategy helps in propern of new hidden neuron parameters and also mini-

Univernanciconduc

Refere

[1] G.BSys451

[2] Y. LJackCom

[3] F.F.App

[4] S. Awo330

[5] V. Rnetban

[6] M.EInfo

[7] M. App

[8] D.Epro

[9] G.-schon

[10] G.-ma

[11] J.C.put

[12] L. YfunNeu

[13] G.-algtion228

[14] N.-onltion

[15] S. Ssi134

[16] S. Smo(20

[17] N. onlCyb

[18] W.PassMo

[19] R. Ileashi

[20] S. Sres301

[21] S. for Tra

[22] G. pro86

[23] K. for 360isclassication error. The performance of the proposedN classier has been evaluated using the benchmarkory, binary classication problems from UCI machineository with wide range of imbalance factor and twossication problems. The statistical performance com-h the well-known classiers in the literature clearlye superior performance of the proposed PBL-McRBFN

[24] R. Savithcomplex-(5) (2012

[25] R. Savithfor a Full209218

[26] G. Sateescognitive2012 Inte290729pport (Grant number: MINDEF-NTU-JPP/11/02/05) tos study.

g, Neural network for classication: a survey, IEEE Transactions onMan and Cybernetics Part C: Applications and Reviews 30 (4) (2000)., B. Boser, J.S. Denker, D. Henderson, R.E. Howard, W. Hubbard, L.D.ckpropagation applied to handwritten zip code recognition, Neuraltion. 1 (1989) 541551.. Cox, A neural network model for speech intelligibility quantication,oft Computing 7 (1) (2007) 145155.Saha, In search of an optimization technique for articial neural net-lassify abnormal heart sounds, Applied Soft Computing 9 (1) (2009).. Pramodh, Threshold accepting trained principal component neuraland feature subset selection: application to bankruptcy prediction inplied Soft Computing 8 (4) (2008) 15391548., P. Srinivasan, Hierarchical text categorization using neural networks,ion Retrieval 5 (2002) 87118.

S.W. Khor, Web document clustering using a hybrid neural network,oft Computing 4 (4) (2004) 423432.elhart, G.E. Hinton, R.J. Williams, Learning representations by back-ion errors, nature, Nature 323 (1986) 533536.ng, Q.Y. Zhu, C.K. Siew, Extreme learning machine: a new learningf feedforward neural networks, IEEE International Joint Conferencel Networks. Proceedings 2 (2004) 985990.ng, X. Ding, H. Zhou, Optimization method based extreme learningfor classication, Neurocomputing 74 (1-3) (2010) 155163.A resource allocation network for function interpolation, Neural Com-3 (2) (1991) 213225.i, N. Sundararajan, P. Saratchandran, A sequential learning scheme forapproximation using minimal radial basis function neural networks,mputation 9 (2) (1997) 461478.ng, P. Saratchandran, N. Sundararajan, An efcient sequential learning

for growing and pruning RBF (GAP-RBF) networks, IEEE transac-Systems, Man, and Cybernetics. Part B, Cybernetics 34 (6) (2004)92.g, G.-B. Huang, P. Saratchandran, N. Sundararajan, A fast and accuratequential learning algorithm for feedforward networks., IEEE Transac-eural Networks 17 (6) (2006) 14111423.

, N. Sundararajan, P. Saratchandran, A sequential multi-category clas-ng radial basis function networks, Neurocomputing 71 (1) (2008)58., R.V. Babu, H.J. Kim, No-reference image quality assessment using

extreme learning machine classier, Applied Soft Computing 9 (2)1552.ov, Evolving fuzzy neural networks for supervised/unsupervisedowledge-based learning, IEEE Transactions on Systems, Man, andics, Part B: Cybernetics 31 (6) (2001) 902918.rs, Autonomy at all costs: an ethnography of metacognitive self-nt and self-management among experienced language learners, Theanguage Journal 85 (2) (2001) 279290.n, F. Fujita, Metacognitive knowledge monitoring and self-regulated

academic success and reections on learning, Journal of the Scholar-aching and Learning 6 (1) (2006) 3955., K. Dong, H.J. Kim, A sequential learning algorithm for self-adaptiveallocation network classier, Neurocomputing 73 (1618) (2010)19., R. Savitha, N. Sundararajan, A sequential learning algorithmlex-valued self-regulating resource allocation network-CSRAN, IEEEons on Neural Networks 22 (7) (2011) 10611072.h Babu, S. Suresh, Meta-cognitive neural network for classication

in a sequential learning framework, Neurocomputing 81 (2012)

anian, S. Suresh, A meta-cognitive sequential learning algorithm-fuzzy inference system, Applied Soft Computing 12 (11) (2012)

14.a, S. Suresh, N. Sundararajan, Metacognitive learning in a fullyvalued radial basis function neural network, Neural Computation 24) 12971328.a, S. Suresh, N. Sundararajan, A meta-cognitive learning algorithmy Complex-valued Relaxation Network, Neural Networks 32 (2012).h Babu, R. Savitha, S. Suresh, A projection based learning in meta-

radial basis function network for classication problems, in: Thernational Joint Conference on Neural Networks (IJCNN), 2012, pp.

14.


[27] G. Sateesh Babu, S. Suresh, B.S. Mahanand, Alzheimers disease detection usinga Projection Based Learning Meta-cognitive RBF Network, in: The 2012 Inter-national Joint Conference on Neural Networks (IJCNN), 2012, pp. 408415.

[28] G. Sateesh Babu, S. Suresh, K. Uma Sangumathi, H. Kim, A Projection BasedLearning Meta-cognitive RBF network classier for effective diagnosis ofParkinsons disease, in: J. Wang, G. Yen, M. Polycarpou (Eds.), Advances in Neu-ral Networks ISNN 2012, vol. 7368 of Lecture Notes in Computer Science,Springer, Berlin / Heidelberg, 2012, pp. 611620.

[29] G. Sateesh Babu, S. Suresh, Parkinsons disease prediction using gene expres-sion a projection based learning meta-cognitive neural classier approach,Expert Systems with Applications (2012), http://dx.doi.org/10.1016/j.eswa.2012.08.070

[30] M.T. Cox, Metacognition in computation: a selected research review, ArticialIntelligence 169 (2) (2005) 104141.

[31] T.O. Nelson, L. Narens, Metamemory: A Theoretical Framework and New Find-ings, Allyn and Bacon, Boston, USA, 1992.

[32] S. Suresh, N. Sundararajan, P. Saratchandran, Risk-sensitive loss functions forsparse multi-category classication problems, Information Sciences 178 (12)(2008) 26212638.

[33] E. Castillo, O. Fontenla-Romero, B. Guijarro-Berdinas, A. Alonso-Betanzos, Aglobal optimum approach for one-layer neural networks, Neural Computation14 (6) (2002) 14291449.

[34] E. Castillo, B. Guijarro-Berdinas, O. Fontenla-Romero, A. Alonso-Betanzos, Avery fast learning method for neural networks based on sensitivity analysis,Journal of Machine Learning Research 7 (2006) 11591182.

[35] H. Hoffmann, Kernel PCA for novelty detection, Pattern Recognition 40 (3)(2007) 863874.

[36] C. Blake, C. Merz, UCI repository of machine learning databases, University ofCalifornia, Irvine, Department of Information and Computer Sciences, 1998,http://archive.ics.uci.edu/ml/

[37] J. Demsar, Statistical comparisons of classiers over multiple data sets, TheJournal of Machine Learning Research 7 (2006) 130.

[38] S.N. Omkar, S. Suresh, T.R. Raghavendra, V. Mani, Acoustic emission sig-nal classication using fuzzy C-means clustering, Proceedings of the ICONIP02, 9th International Conference on Neural Information Processing 4 (2002)18271831.

[39] C. Aize, Q. Song, X. Yang, S. Liu, C. Guo, Mammographic mass detection by vicinalsupport vector machine, Proceedings of the ICNN 04, International Conferenceon Neural Networks 3 (2004) 19531958.

[40] T. Zhang, Statistical behavior and consistency of classication methods basedon convex risk minimization, Annals of Statistics 32 (1) (2004) 5685.

[41] B. Scholkopf, A.J. Smola, Learning with Kernels, MIT Press, Cambridge, MA, 2002.[42] C. Cortes, V. Vapnik, Support-vector networks, Machine Learning 20 (3) (1995)

273297.[43] J. Sucklin

matakis, digital m1069 (19

[44] S. Suresh, S.N. Omkar, V. Mani, T.N.G. Prakash, Lift coefcient prediction athigh angle of attack using recurrent neural network, Aerospace Science andTechnology 7 (8) (2003) 595602.

[45] C.-C. Chang, C.-J. Lin, LIBSVM: a library for support vector machines, ACM Trans-actions on Intelligent Systems and Technology 2 (2011) 27:1-27:27, softwareavailable at http://www.csie.ntu.edu.tw/ cjlin/libsvm

[46] R.L. Iman, J.M. Davenport, Approximations of the critical region of the Friedmanstatistic, Communications in Statistics (1980) 571595.

[47] J.H. Zar, Biostatistical Analysis, 4th Ed., Prentice-Hall, Englewood Clifs, NewJersey, 1999.

[48] O.J. Dunn, Multiple comparisons among means, Journal of the American Statis-tical Association 56 (293) (1961) 5264.

Mr. Giduthuri Sateesh Babu received the B.Tech degreein electrical and electronics engineering from Jawahar-lal Nehru Technological University, India, in 2007, andM.Tech degree in electrical engineering from Indian Insti-tute of Technology Delhi, India, in 2009. From 2009 to2010, he worked as a senior software engineer in Sam-sung R&D centre, India. He is currently a Ph.D. studentwith School of Computer Engineering, Nanyang Tech-nological University, Singapore. His research interestsinclude machine learning, cognitive computing, neuralnetworks, control systems, optimization and medicalinformatics.

Dr. Sundaram Suresh received the B.E degree in electricaland electronics engineering from Bharathiyar Univer-sity in 1999, and M.E (2001) and Ph.D. (2005) degreesin aerospace engineering from Indian Institute of Sci-ence, India. He was post-doctoral researcher in schoolof electrical engineering, Nanyang Technological Univer-sity from 2005 to 2007. From 2007 to 2008, he was inINRIA-Sophia Antipolis, France as ERCIM research fel-low. He was in Korea University for a short period as avisiting faculty in Industrial Engineering. From January2009 to December 2009, he was in Indian Institute ofTechnology-Delhi as an Assistant Professor in Departmentof Electrical Engineering. Currently, he is working as an

Assistant Professor in School of Computer Engineering, Nanyang Technological Uni-Singapore since 2010. He was awarded best young faculty for the year

IIT-Designg, J. Parker, D.R. Dance, S. Astley, I. Hutt, C. Boggis, I. Ricketts, E. Sta-N. Cerneaz, S. Kok, et al., The mammographic image analysis societyammogram database, Experta Medica International Congress Series94) 375378.

versity, 2009 byvehicle dvision.elhi His research interest includes ight control, unmanned aerial, machine learning, applied game theory, optimization and computer

Meta-cognitive RBF Network and its Projection Based Learning algorithm for classification problems1 Introduction2 Meta-cognitive radial basis function network for classification problems2.1 Problem definition2.2 McRBFN architecture2.2.1 Cognitive component of McRBFN2.2.2 Meta-cognitive component of McRBFN2.2.3 Learning strategies

2.3 PBL-McRBFN classification algorithm

3 Performance evaluation of PBL-McRBFN classifier3.1 Performance measures3.1.1 Class-wise measure3.1.2 Statistical significance test

3.2 Performance evaluation on UCI benchmark data sets3.2.1 Statistical significance analysis

3.3 Acoustic emission signal classification for health monitoring3.4 Mammogram classification for breast cancer detection

4 ConclusionsAcknowledgementsReferences

pbl

Documents

discardedafter learning

batch learning algorithms

various problems

twopractical problems

human learning principles

network architecture

metacognitive component

training error