Applied Soft Computing 13 (2013) 654–666

Meta-cognitive RBF Network and its Projection Based Learning algorithm for classification problems

G. Sateesh Babu, S. Suresh
School of Computer Engineering, Nanyang Technological University, Singapore
Corresponding author: Tel.: +65 6790 6185. E-mail address: [email protected] (S. Suresh).

Article history: Received 2 February 2012; Received in revised form 24 May 2012; Accepted 31 August 2012; Available online 23 September 2012.

Keywords: Meta-cognitive learning; Self-regulatory thresholds; Radial basis function network; Multi-category classification; Projection Based Learning.

Abstract

A 'Meta-cognitive Radial Basis Function Network' (McRBFN) and its 'Projection Based Learning' (PBL) algorithm for classification problems in a sequential framework are proposed in this paper; the combination is referred to as PBL-McRBFN. McRBFN is inspired by human meta-cognitive learning principles. McRBFN has two components, namely the cognitive component and the meta-cognitive component. The cognitive component is a single hidden layer radial basis function network with evolving architecture. In the cognitive component, the PBL algorithm computes the optimal output weights with least computational effort by finding the analytical minimum of a nonlinear energy function. The meta-cognitive component controls the learning process in the cognitive component by choosing the best learning strategy for the current sample and adapts the learning strategies through self-regulation. In addition, sample overlapping conditions are considered for proper initialization of new hidden neurons, thus minimizing misclassification. The interaction of the cognitive component and the meta-cognitive component addresses the what-to-learn, when-to-learn and how-to-learn principles of human learning efficiently. The performance of PBL-McRBFN is evaluated using a set of benchmark classification problems from the UCI machine learning repository and two practical problems, viz., acoustic emission signal classification and mammogram classification for cancer detection. The statistical performance evaluation on these problems shows the superior performance of the PBL-McRBFN classifier over results reported in the literature.

© 2012 Elsevier B.V. All rights reserved.

1. Introduction

Neural networks are powerful tools that can approximate complex nonlinear input–output relationships efficiently. Hence, over the last few decades neural networks have been extensively employed to solve real-world classification problems [1]. In a classification problem, the objective is to learn the decision surface that accurately maps an input feature space to an output space of class labels. Several learning algorithms for different neural network architectures have been used in various problems in science, business, industry and medicine, including handwritten character recognition [2], speech recognition [3], biomedical diagnosis [4], prediction of bankruptcy [5], text categorization [6] and information retrieval [7]. Among the architectures reported in the literature, the Radial Basis Function (RBF) network has been gaining attention due to the localization property of its Gaussian function, and it is widely used in classification problems.

Significant contributions to RBF learning algorithms for classification problems are broadly classified into two categories. (a) Batch learning algorithms: gradient descent based learning was used to determine the network parameters [8]. Here, the complete training data are presented multiple times until the training error reaches a minimum. Alternatively, one can implement random input parameter selection with a least squares solution for the output weights [9,10]. In both cases, the number of Gaussian functions required to approximate the true function is determined heuristically. (b) Sequential learning algorithms: the number of Gaussian neurons required to approximate the input–output relationship is determined automatically [11–15]. Here, the training samples are presented one-by-one and discarded after learning. The Resource Allocation Network (RAN) [11] was the first sequential learning algorithm introduced in the literature. RAN evolves the network architecture required to approximate the true function using a novelty based neuron growth criterion. The Minimal Resource Allocation Network (MRAN) [12] uses a similar approach, but incorporates an error based neuron growing/pruning criterion; hence, MRAN determines a more compact network architecture than RAN. The Growing and Pruning Radial Basis Function Network [13] selects its growing/pruning criteria based on the significance of a neuron. A sequential learning algorithm using recursive least squares is presented in [14], referred to as the On-line Sequential Extreme Learning Machine (OS-ELM). OS-ELM chooses input weights randomly with a fixed number of hidden neurons and analytically determines the output weights using minimum-norm least squares. In the case of sparse and imbalanced data sets, the random



selection of input weights with a fixed number of hidden neurons in OS-ELM affects the performance significantly, as shown in [16]. In the neuro-fuzzy framework, the Evolving Fuzzy Neural Network (EFuNN) [17] is a notable sequential learning algorithm. It has been shown in [15] that the aforementioned algorithms work better for function approximation problems than for classification problems. The Sequential Multi-Category Radial Basis Function network (SMC-RBF) [15] considers the similarity measure within a class, the misclassification rate and the prediction error in its neuron growing and parameter update criteria. SMC-RBF has shown that updating the parameters of the nearest neuron in the same class as the current sample improves performance more than updating the nearest neuron in any class.

The aforementioned neural network algorithms use all the samples in the training data set to gain knowledge about the information contained in the samples. In other words, they possess the information-processing abilities of humans, including perception, learning, remembering, judging and problem-solving, and these abilities are cognitive in nature. However, recent studies on human learning have revealed that the learning process is effective when learners adopt self-regulation in the learning process using meta-cognition [18,19]. Meta-cognition means cognition about cognition. In a meta-cognitive framework, human beings think about their cognitive processes, develop new strategies to improve their cognitive skills and evaluate the information contained in their memory. If a radial basis function network analyzes its cognitive process and chooses suitable learning strategies adaptively to improve that process, then it is referred to as a Meta-Cognitive Radial Basis Function Network (McRBFN). Such a McRBFN must be capable of deciding what-to-learn, when-to-learn and how-to-learn the decision function from the stream of training data by emulating human self-regulated learning.

The Self-adaptive Resource Allocation Network (SRAN) [20] and the Complex-valued Self-regulating Resource Allocation Network (CSRAN) [21] address the what-to-learn component of meta-cognition by selecting significant samples using misclassification error and hinge loss error. It has been shown that selecting appropriate samples for learning and removing repetitive samples helps in improving the generalization performance. Therefore, it is evident that emulating the three components of human learning with suitable learning strategies would improve the generalization ability of a neural network. The drawbacks in these algorithms are: (a) the samples for training are selected based on a simple error criterion, which is not sufficient to address the significance of samples; (b) a new hidden neuron center is allocated independently, so it may overlap with already existing neuron centers and lead to misclassification; (c) knowledge gained from past samples is not used; and (d) they use a computationally intensive extended Kalman filter for parameter update. The Meta-cognitive Neural Network (McNN) [22] and the Meta-cognitive Neuro-Fuzzy Inference System (McFIS) [23] address the first two issues efficiently by using three components of meta-cognition. However, McNN and McFIS use a computationally intensive parameter update and do not utilize the past knowledge stored in the network. Similar works using meta-cognition in the complex domain are reported in [24,25]. The recently proposed Projection Based Learning in a meta-cognitive radial basis function network [26] addresses the above issues in batch mode, except for proper utilization of the past knowledge stored in the network, and has been applied to solve biomedical problems in [27–29]. In this paper, we propose a meta-cognitive radial basis function network and its fast and efficient projection based sequential learning algorithm.

There are several meta-cognition models available in human physiology, and a brief survey of various meta-cognition models is reported in [30]. Among the various models, the model proposed by Nelson and Narens in [31] is simple and clearly highlights the various actions in human meta-cognition, as shown in Fig. 1(a). The model is analogous to meta-cognition in human beings and has two components, the cognitive component and the meta-cognitive component. The information flow from the cognitive component to the meta-cognitive component is considered monitoring, while the flow in the reverse direction is considered control. The information flowing from the meta-cognitive component to the cognitive component either changes the state of the cognitive component or changes the cognitive component itself. Monitoring informs the meta-cognitive component about the state of the cognitive component, thus continuously updating the meta-cognitive component's model of the cognitive component, including the case of no change in state.

Fig. 1. (a) Nelson and Narens model of meta-cognition and (b) McRBFN model.

McRBFN is developed based on the Nelson and Narens meta-cognition model [31], as shown in Fig. 1(b). Analogous to that model, McRBFN has two components, namely the cognitive component and the meta-cognitive component. The cognitive component is a single hidden layer radial basis function network with evolving architecture. It learns from the training data by adding new hidden neurons and updating the output weights of hidden neurons to approximate the true function. The input weights of the hidden neurons (centers and widths) are determined based on the training data, and the output weights are estimated using the projection based sequential learning algorithm. When a neuron is added to the cognitive component, its input/hidden layer parameters are fixed based on the input of the current sample, and the output weights are estimated by minimizing an energy function given by the hinge loss error, as in [32]. The problem of finding the optimal weights is first formulated as a linear programming problem using the principles of minimization and real calculus [33,34]. The Projection Based Learning (PBL) algorithm then converts the linear programming problem into a system of linear equations and provides a solution for the optimal weights corresponding to the minimum energy point of the energy function. The meta-cognitive component of McRBFN contains a dynamic model of the cognitive component, knowledge measures and self-regulated thresholds. The meta-cognitive component controls the learning process of the cognitive component by choosing one of four strategies for each sample in the training data set.

When a sample is presented to McRBFN, the meta-cognitive component of McRBFN measures the knowledge contained in the current training sample with respect to the cognitive component using its knowledge measures. The predicted class label, the maximum hinge error and the class-wise significance are considered as knowledge measures of the meta-cognitive component. Class-wise significance is obtained from the spherical potential, which is widely used in kernel methods to determine whether all the data points are enclosed tightly by the Gaussian kernels [35]. Here, the squared distance between the current sample and the hyper-dimensional projection helps in measuring the novelty in the data. Since McRBFN addresses classification problems, we redefine the spherical potential in a class-wise framework and use it in devising the learning strategies. Using the above mentioned measures, the meta-cognitive component constructs two sample based learning strategies and two neuron based learning strategies. One of these strategies is selected for the current training sample such that the cognitive component learns the true function accurately and achieves better generalization performance. The learning strategies are adapted by the meta-cognitive component using self-regulated thresholds. In addition, the meta-cognitive component identifies overlapping/non-overlapping conditions by measuring the distance from the nearest neuron in the inter/intra-class. The McRBFN using PBL to obtain the network parameters is referred to as the Projection Based Learning algorithm for a Meta-cognitive Radial Basis Function Network (PBL-McRBFN).

The performance of the proposed PBL-McRBFN classifier is evaluated using a set of benchmark binary/multi-category classification problems from the University of California, Irvine (UCI) machine learning repository [36]. We consider five multi-category and five binary classification problems with varying values of imbalance factor. On all these problems, the performance of PBL-McRBFN is compared against the best performing classifiers available in the literature using class-wise performance measures such as overall/average efficiency and a non-parametric statistical significance test [37]. The non-parametric Friedman test, based on the mean ranking of each algorithm over multiple data sets, indicates the statistical significance of the proposed PBL-McRBFN classifier. Finally, the performance of the PBL-McRBFN classifier is also evaluated on two practical classification problems, viz., acoustic emission signal classification [38] and mammogram classification for breast cancer detection [39]. The results clearly highlight that the PBL-McRBFN classifier provides better generalization performance than the results reported in the literature.

The outline of this paper is as follows. Section 2 describes the meta-cognitive radial basis function network for classification problems. Section 3 presents the performance evaluation of the PBL-McRBFN classifier on a set of benchmark and practical classification problems and compares it with the best performing classifiers available in the literature. Section 4 summarizes the conclusions from this study.

2. Meta-cognitive radial basis function network for classification problems

In this section, we describe the meta-cognitive radial basis function network for solving classification problems. First, we define the classification problem. Next, we present the meta-cognitive radial basis function network architecture. Finally, we present the sequential learning algorithm and summarize it in pseudo-code form.

2.1. Problem definition

Given a stream of training data samples {(x^1, c^1), ..., (x^t, c^t), ...}, where x^t = [x_1^t, ..., x_m^t]^T ∈ R^m is the m-dimensional input of the t-th sample and c^t ∈ (1, n) is its class label, with n the total number of classes, the coded class labels y^t = [y_1^t, ..., y_j^t, ..., y_n^t]^T ∈ R^n are given by

  y_j^t = 1 if c^t = j, and y_j^t = −1 otherwise,   j = 1, ..., n        (1)

The objective of the McRBFN classifier is to approximate the underlying decision function that maps x^t ∈ R^m → y^t ∈ R^n. McRBFN begins with zero hidden neurons and selects a suitable strategy for each sample to achieve this objective. In the next section, we describe the architecture of McRBFN and discuss each of these learning strategies in detail.

2.2. McRBFN architecture

McRBFN has two components, namely the cognitive component and the meta-cognitive component, as shown in Fig. 2. The cognitive component is a single hidden layer radial basis function network with evolving architecture, starting from zero hidden neurons. The meta-cognitive component of McRBFN contains a dynamic model of the cognitive component, knowledge measures and self-regulated thresholds. The meta-cognitive component controls the learning process of the cognitive component by choosing one of four strategies for each sample in the training data set. When a new training sample is presented to McRBFN, the meta-cognitive component estimates the knowledge present in the new training sample with respect to the cognitive component. Based on this information, it controls the learning process of the cognitive component by selecting a suitable strategy for the current training sample, so that what-to-learn, when-to-learn and how-to-learn are addressed properly.

We present a detailed description of the cognitive and the meta-cognitive components of McRBFN in the following sections.

2.2.1. Cognitive component of McRBFN

The cognitive component of McRBFN is a single hidden layer feed-forward radial basis function network with linear input and output layers. The neurons in the hidden layer employ the Gaussian activation function. Without loss of generality, we assume that McRBFN has built K Gaussian neurons from the first t − 1 training samples. For a given input x^t, the predicted output of the j-th output neuron, ŷ_j^t, is

  ŷ_j^t = Σ_{k=1}^{K} w_{kj} h_k^t,   j = 1, ..., n        (2)

where w_{kj} is the weight connecting the k-th hidden neuron to the j-th output neuron and h_k^t, the response of the k-th hidden neuron to the input x^t, is given by

  h_k^t = exp( −‖x^t − μ_k^l‖² / (σ_k^l)² )        (3)

where μ_k^l ∈ R^m is the center and σ_k^l ∈ R^+ is the width of the k-th hidden neuron. The superscript l denotes the class associated with the hidden neuron.

The cognitive component uses the Projection Based Learning (PBL) algorithm for its learning process. The strategy proposed here is similar to that of the fast learning algorithm for single layer neural networks in [33,34]. The PBL algorithm is described as follows.

Projection Based Learning algorithm: The Projection Based Learning algorithm works on the principle of minimization of an energy function and finds the optimal network output parameters for which the energy function attains its minimum.
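For illustration, a minimal NumPy sketch of the cognitive component's forward pass (Eqs. (2) and (3)) is given below. The function and variable names are illustrative only and are not taken from the paper; centers is assumed to be a (K, m) array, widths a (K,) array and W a (K, n) output weight matrix.

import numpy as np

def gaussian_response(x, centers, widths):
    # h_k = exp(-||x - mu_k||^2 / sigma_k^2) for each of the K hidden neurons (Eq. (3))
    d2 = np.sum((centers - x) ** 2, axis=1)   # squared distances, shape (K,)
    return np.exp(-d2 / widths ** 2)

def predict_output(x, centers, widths, W):
    # y_hat_j = sum_k w_kj * h_k (Eq. (2)); returns a vector of length n
    h = gaussian_response(x, centers, widths)
    return h @ W

The predicted class label of Eq. (25) later in the paper is simply the argmax over this output vector.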

The energy function considered is the sum of squared hinge loss errors at the McRBFN output neurons. For the i-th sample it is defined as

  J_i = (1/2) Σ_{j=1}^{n} (e_j^i)²        (4)

where the hinge loss error e_j^i is

  e_j^i = 0 if y_j^i ŷ_j^i > 1, and e_j^i = y_j^i − ŷ_j^i otherwise,   j = 1, ..., n        (5)

When y_j^i ŷ_j^i ≤ 1, the energy function for the i-th sample becomes

  J_i = (1/2) Σ_{j=1}^{n} (y_j^i − ŷ_j^i)² = (1/2) Σ_{j=1}^{n} ( y_j^i − Σ_{k=1}^{K} w_{kj} h_k^i )²,   i = 1, ..., t        (6)

For t training samples, the overall energy function is defined as

  J(W) = Σ_{i=1}^{t} J_i = (1/2) Σ_{i=1}^{t} Σ_{j=1}^{n} ( y_j^i − Σ_{k=1}^{K} w_{kj} h_k^i )²        (7)

where h_k^i is the response of the k-th hidden neuron to the i-th training sample. The optimal output weights W ∈ R^{K×n} are estimated such that the total energy reaches its minimum:

  W* = arg min_{W ∈ R^{K×n}} J(W)        (8)

The optimal W* corresponding to the minimum energy point of the energy function, J(W*), is obtained by equating the first order partial derivative of J(W) with respect to each output weight to zero:

  ∂J(W)/∂w_{pj} = 0,   p = 1, ..., K; j = 1, ..., n        (9)

Equating the first partial derivative to zero and re-arranging, we get

  Σ_{k=1}^{K} ( Σ_{i=1}^{t} h_k^i h_p^i ) w_{kj} = Σ_{i=1}^{t} h_p^i y_j^i        (10)

Eq. (10) can be written as

  Σ_{k=1}^{K} a_{kp} w_{kj} = b_{pj},   p = 1, ..., K; j = 1, ..., n        (11)

which can be represented in matrix form as

  A W = B        (12)

where the projection matrix A ∈ R^{K×K} is given by

  a_{kp} = Σ_{i=1}^{t} h_k^i h_p^i,   k = 1, ..., K; p = 1, ..., K        (13)

and the output matrix B ∈ R^{K×n} is

  b_{pj} = Σ_{i=1}^{t} h_p^i y_j^i,   p = 1, ..., K; j = 1, ..., n        (14)

Eq. (11) gives a set of K × n linear equations with K × n unknown output weights W.
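A minimal sketch of the batch PBL solution follows: the projection matrix A and output matrix B of Eqs. (13) and (14) are accumulated from the hidden responses and coded labels of the samples seen so far, and the linear system of Eq. (12) is solved for the output weights. Function and array names (H for the (t, K) response matrix, Y for the (t, n) coded labels) are assumptions for illustration.

import numpy as np

def pbl_output_weights(H, Y):
    A = H.T @ H                   # a_kp = sum_i h_k^i h_p^i   (Eq. (13))
    B = H.T @ Y                   # b_pj = sum_i h_p^i y_j^i   (Eq. (14))
    return np.linalg.solve(A, B)  # W such that A W = B        (Eq. (12))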

We state the following propositions to find the closed-form solution for this set of linear equations.

Proposition 1. The responses of the hidden neurons are unique, i.e., for every x^i, when k ≠ p, h_k^i ≠ h_p^i; k, p = 1, ..., K, i = 1, ..., t.

Proof. Let us assume that, for a given x^i, h_p^i = h_k^i when k ≠ p. This assumption is valid if and only if

  μ_p^l == μ_k^l AND σ_p^l == σ_k^l        (15)

But the pair of center vectors μ_k^l and μ_p^l are allocated from the significant training samples selected for the addition of neurons, and these significant samples are selected using the neuron growth criterion in Eq. (33). The neuron growth criterion uses the maximum hinge error (E^t) and the class-wise significance (ψ_c); ψ_c is defined such that a new neuron is added only when no existing neuron near the current sample produces a significant output for it. Hence no two neuron centers are equal and the responses of the k-th and p-th hidden neurons are not equal for all samples.

Proposition 2. The response of each hidden neuron is non-zero for at least a few samples.

Proof. Let us assume that the response of the k-th hidden neuron is 0, i.e., h_k^i = 0 for every x^i. This is possible if and only if ‖x^i‖ → ∞ or σ_k^l → 0. The input variables x^i are normalized within a circle of radius 1 such that |x_j| < 1, j = 1, ..., m. As shown in the overlapping conditions of the growth strategy in subsection 2.2.3, hidden neuron centers are allocated from the selected significant training samples and widths are determined from inter/intra-class nearest neuron distances, which are nonzero positive values. Hence, the response of each hidden neuron is non-zero for at least a few samples.

We state the following theorem, using Propositions 1 and 2.

Theorem 1. The projection matrix A is a positive definite symmetric matrix, and hence it is invertible.

Proof. From the definition of the projection matrix A given in Eq. (13),

  A_{pk} = Σ_{i=1}^{t} h_p^i h_k^i,   p = 1, ..., K; k = 1, ..., K        (16)

it can be inferred that the diagonal elements of A are

  A_{kk} = Σ_{i=1}^{t} h_k^i h_k^i,   k = 1, ..., K        (17)

From Proposition 2, the hidden neuron responses are non-zero. Therefore Eq. (17) can be written as

  A_{kk} = Σ_{i=1}^{t} |h_k^i|² > 0        (18)

Hence the diagonal elements of the projection matrix are non-zero and positive, i.e., A_{kk} ∈ R^+ > 0.

The off-diagonal elements of the projection matrix A are

  A_{kj} = Σ_{i=1}^{t} h_k^i h_j^i = Σ_{i=1}^{t} h_j^i h_k^i = A_{jk}        (19)

From Eqs. (17) and (19), it can be inferred that the projection matrix A is a symmetric matrix.

A symmetric matrix is positive definite iff q^T A q > 0 for any q ≠ 0. Consider a unit basis vector q_1 ∈ R^{K×1} such that q_{11} = 1 and q_{12} = ... = q_{1K} = 0, i.e., q_1 = [1 0 0 ... 0]^T. Therefore q_1^T A q_1 = A_{11}. In Eq. (17) it was shown that A_{kk} ∈ R > 0 for all k = 1, ..., K; therefore A_{11} > 0 and q_1^T A q_1 > 0. Similarly, for a unit basis vector q_k = [0 ... 1 ... 0]^T, the product q_k^T A q_k is given by

  q_k^T A q_k = A_{kk} > 0,   k = 1, ..., K        (20)

Let p ∈ R^K be a linear transformed sum of K such unit basis vectors, i.e., p = q_1 t_1 + ... + q_k t_k + ... + q_K t_K, where t_k ∈ R is a transformation constant. Then

  p^T A p = Σ_{k=1}^{K} (q_k t_k)^T A (q_k t_k) = Σ_{k=1}^{K} |t_k|² A_{kk}        (21)

As shown in Eq. (17), A_{kk} > 0, and clearly |t_k|² > 0. Hence

  |t_k|² A_{kk} > 0, k = 1, ..., K  ⇒  Σ_{k=1}^{K} |t_k|² A_{kk} > 0        (22)

Thus the projection matrix A is positive definite, and hence it is invertible.

The solution W obtained from the set of equations in Eq. (12) is a minimum if ∂²J/∂w_{pj}² > 0. The second derivative of the energy function J with respect to the output weights is given by

  ∂²J(W)/∂w_{pj}² = Σ_{i=1}^{t} h_p^i h_p^i = Σ_{i=1}^{t} |h_p^i|² > 0        (23)

As the second derivative of the energy function J(W) is positive, the following observations can be made from Eq. (23):

1. The energy function J is a convex function.
2. The output weight W* obtained as the solution to the set of linear equations (Eq. (12)) is the weight corresponding to the minimum energy point of the energy function J.

Using Theorem 1, the solution of the system of equations in Eq. (12) can be determined as

  W = A^{−1} B        (24)

2.2.2. Meta-cognitive component of McRBFN

The meta-cognitive component contains a dynamic model of the cognitive component, knowledge measures and self-regulated thresholds. During the learning process, the meta-cognitive component monitors the cognitive component and updates its dynamic model of the cognitive component. When a new (t-th) training sample is presented to McRBFN, the meta-cognitive component estimates the knowledge present in the new training sample with respect to the cognitive component using its knowledge measures. The meta-cognitive component uses the predicted class label (ĉ^t), the maximum hinge error (E^t), the confidence of the classifier (p̂(c^t|x^t)) and the class-wise significance (ψ_c) as the measures of knowledge in the new training sample. Self-regulated thresholds are adapted to capture the knowledge present in the new training sample. Using the knowledge measures and the self-regulated thresholds, the meta-cognitive component constructs two sample based learning strategies and two neuron based learning strategies. One of these strategies is selected for the new training sample such that the cognitive component learns it accurately and achieves better generalization performance.

The meta-cognitive measures are defined as follows.

Predicted class label (ĉ^t): Using the predicted output ŷ^t, the predicted class label is obtained as

  ĉ^t = arg max_{j ∈ 1,...,n} ŷ_j^t        (25)

Maximum hinge error (E^t): The objective of the classifier is to minimize the error between the predicted output ŷ^t and the actual output y^t. For classification problems, it has been shown in [32,40] that a classifier developed using the hinge loss error estimates the posterior probability more accurately than a classifier developed using the mean square error. Hence, in McRBFN, we use the hinge loss error e^t = [e_1^t, ..., e_j^t, ..., e_n^t]^T ∈ R^n, defined as in Eq. (5). The maximum absolute hinge error is given by

  E^t = max_{j ∈ 1,2,...,n} |e_j^t|        (26)

Confidence of classifier (p̂(c^t|x^t)): The confidence level of classification, or predicted posterior probability, is given as

  p̂(j|x^t) = ( min(1, max(−1, ŷ_j^t)) + 1 ) / 2,   j = c^t        (27)

Class-wise significance (ψ_c): In general, the input feature x^t is mapped onto a hyper-dimensional spherical feature space S using the K Gaussian neurons, i.e., x^t → H. Therefore all H(x^t) lie on a hyper-dimensional sphere, as shown in [41]. The knowledge or spherical potential of any sample in the original space is expressed as a squared distance from the hyper-dimensional mapping S centered at h_0 [35].

In McRBFN, the centers (μ) and widths (σ) of the Gaussian neurons describe the feature space S. Let the center of the K-dimensional feature space be h_0 = (1/K) Σ_{k=1}^{K} h(μ_k). The knowledge present in the new data x^t can be expressed as the potential of the data in the original space, which is the squared distance from the K-dimensional feature space to the center h_0. The potential ψ is given as

  ψ = ‖h(x^t) − h_0‖²        (28)

As shown in [35], the above equation can be expressed as

  ψ = h(x^t, x^t) − (2/K) Σ_{k=1}^{K} h(x^t, μ_k^l) + (1/K²) Σ_{k,r=1}^{K} h(μ_k^l, μ_r^l)        (29)

From the above equation we can see that, for the Gaussian function, the first term h(x^t, x^t) and the last term (1/K²) Σ_{k,r=1}^{K} h(μ_k^l, μ_r^l) are constants. Since the potential is a measure of novelty, these constants may be discarded and the potential reduces to

  ψ = (2/K) Σ_{k=1}^{K} h(x^t, μ_k^l)        (30)

Since we are addressing classification problems, the class-wise distribution plays a vital role and influences the performance of the classifier significantly [15]. Hence, we measure the spherical potential of the new training sample x^t belonging to class c with respect to the neurons associated with the same class (i.e., l = c). Let K_c be the number of neurons associated with class c; then the class-wise spherical potential, or class-wise significance, is defined as

  ψ_c = (1/K_c) Σ_{k=1}^{K_c} h(x^t, μ_k^c)        (31)

The spherical potential explicitly indicates the knowledge contained in the sample: a higher value (close to one) indicates that the sample is similar to the existing knowledge in the cognitive component, while a smaller value (close to zero) indicates that the sample is novel.
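A minimal sketch of the four knowledge measures (Eqs. (25)-(27) and (31)) is shown below. The function name and argument layout are assumptions: y_hat and y_true are length-n vectors, c_true is the 0-based index of the true class, and class_centers/class_widths hold the Gaussian parameters of the neurons already associated with that class.

import numpy as np

def knowledge_measures(y_hat, y_true, c_true, x, class_centers, class_widths):
    c_hat = int(np.argmax(y_hat))                                # predicted class label (Eq. (25))
    e = np.where(y_true * y_hat > 1.0, 0.0, y_true - y_hat)      # hinge loss error (Eq. (5))
    E = float(np.max(np.abs(e)))                                 # maximum hinge error (Eq. (26))
    p_conf = (min(1.0, max(-1.0, float(y_hat[c_true]))) + 1.0) / 2.0  # confidence (Eq. (27))
    if len(class_centers) > 0:                                   # class-wise significance (Eq. (31))
        d2 = np.sum((class_centers - x) ** 2, axis=1)
        psi_c = float(np.mean(np.exp(-d2 / class_widths ** 2)))
    else:
        psi_c = 0.0                                              # no neuron of this class yet: sample is novel
    return c_hat, E, p_conf, psi_c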

2.2.3. Learning strategies

The meta-cognitive component devises its learning strategies from the knowledge measures and the self-regulated thresholds, and thereby directly addresses the basic principles of self-regulated human learning (what-to-learn, when-to-learn and how-to-learn). The meta-cognitive part controls the learning process in the cognitive component by selecting one of the following four learning strategies for the new training sample.

Sample delete strategy: If the new training sample contains information similar to the knowledge already present in the cognitive component, then delete the new training sample from the training data set without using it in the learning process.

Neuron growth strategy: Use the new training sample to add a new hidden neuron to the cognitive component. During neuron addition, sample overlapping conditions are identified so that the new hidden neuron is allocated appropriately.

Parameter update strategy: The new training sample is used to update the parameters of the cognitive component; PBL is used to perform the update.

Sample reserve strategy: If the new training sample contains some information but is not significant, it can be used at a later stage of the learning process for fine tuning the parameters of the cognitive component. Such samples may eventually be discarded without learning or used later for fine tuning.

The principle behind these four learning strategies is described in detail below.

Sample delete strategy: When the predicted class label of the new training sample is the same as the actual class label and the estimated posterior probability is close to 1, the new training sample does not provide additional information to the classifier and can be deleted from the training sequence without being used in the learning process. The sample deletion criterion is given by

  ĉ^t == c^t AND p̂(c^t|x^t) ≥ β_d        (32)

The meta-cognitive deletion threshold β_d controls the number of samples participating in the learning process. If one selects β_d close to 1, then all the training samples participate in the learning process, which results in over-training with similar samples. Reducing β_d below the desired accuracy results in deletion of too many samples from the training sequence, and the resultant network may not satisfy the desired accuracy. Hence, it is fixed close to the expected accuracy level; in our simulation studies it is selected in the range [0.9-0.95]. The sample deletion strategy prevents learning of samples with similar information and thereby avoids over-training and reduces the computational effort.

Neuron growth strategy: When a new training sample contains significant information and the predicted class label is different from the actual class label, a new hidden neuron must be added to represent the knowledge contained in the sample. The neuron growth criterion is given by

  ( ĉ^t ≠ c^t OR E^t ≥ β_a ) AND ψ_c(x^t) ≤ β_c        (33)

where β_c is the meta-cognitive knowledge measurement threshold and β_a is the self-adaptive meta-cognitive addition threshold. The thresholds β_c and β_a allow samples with significant knowledge to be learnt first, with the remaining samples used later for fine tuning. If β_c is chosen close to zero and the initial value of β_a close to the maximum value of the hinge error, very few neurons will be added to the network and the network will not approximate the function properly. If β_c is chosen close to one and the initial value of β_a close to the minimum value of the hinge error, the resultant network may contain many neurons with poor generalization ability. Hence, the meta-cognitive knowledge measurement threshold can be selected in the interval [0.3-0.7] and the initial value of the self-adaptive meta-cognitive addition threshold in the interval [1.3-1.7]. The threshold β_a is adapted as follows:

  β_a := δ β_a + (1 − δ) E^t        (34)

where δ is the slope that controls the rate of self-adaptation and is set close to one. The adaptation of β_a allows McRBFN to add neurons only when the samples presented to the cognitive network contain significant information.
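The sample-delete and neuron-growth checks, together with the self-adaptive addition threshold, can be summarized in the following minimal sketch of Eqs. (32)-(34). The helper name, the dictionary layout and the initial threshold values (taken from the ranges quoted above) are illustrative assumptions, not the authors' code; the remaining "update or reserve" branch is resolved later by Eq. (53).

def select_strategy(c_hat, c_true, E, p_conf, psi_c, thr):
    # thr holds beta_d, beta_a, beta_c and the slope delta
    if c_hat == c_true and p_conf >= thr["beta_d"]:                        # Eq. (32)
        return "delete"
    if (c_hat != c_true or E >= thr["beta_a"]) and psi_c <= thr["beta_c"]:  # Eq. (33)
        thr["beta_a"] = thr["delta"] * thr["beta_a"] + (1 - thr["delta"]) * E  # Eq. (34)
        return "grow"
    return "update_or_reserve"

thresholds = {"beta_d": 0.92, "beta_a": 1.5, "beta_c": 0.5, "delta": 0.98}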

The new training sample may overlap with other classes or may come from a distinct cluster far away from the nearest neuron of the same class. Therefore, one needs to identify the status of the current sample (overlapping with other classes, or a distinct cluster in the same class) with respect to the existing neurons and initialize the parameters of the new neuron (K + 1) accordingly. Existing sequential learning algorithms initialize the width based on the distance to the nearest neuron and the output weight based on the error of the current sample; the influence of past samples is not considered in the weight initialization, which affects the performance of the classifier significantly. These issues are addressed in the proposed McRBFN as follows:

• Inter/intra-class nearest neuron distances from the current sample are used for width determination.
• The knowledge of past samples stored in the network as neuron centers is used to initialize the weights of the new neuron.

Let nrS be the nearest hidden neuron in the intra-class and nrI the nearest hidden neuron in the inter-class. They are defined as

  nrS = arg min_{l==c; k} ‖x^t − μ_k^l‖;   nrI = arg min_{l≠c; k} ‖x^t − μ_k^l‖        (35)

The Euclidean distances from the new training sample to nrS and nrI are

  dS = ‖x^t − μ_c^{nrS}‖;   dI = ‖x^t − μ_l^{nrI}‖        (36)

Using these nearest neuron distances, we determine the overlapping/no-overlapping conditions as follows:

• Distinct sample: When a new training sample is far away from both the intra- and inter-class nearest neurons (dS >> σ^{nrS} AND dI >> σ^{nrI}), the new training sample does not overlap with any class cluster and comes from a distinct cluster. In this case, the new hidden neuron center μ_{K+1}^c and width σ_{K+1}^c are determined as

  μ_{K+1}^c = x^t;   σ_{K+1}^c = κ √((x^t)^T x^t)        (37)

where κ is a positive constant that controls the overlap of the responses of the hidden units in the input space and lies in the range 0.5 ≤ κ ≤ 1.

• No overlapping: When a new training sample is close to the intra-class nearest neuron, i.e., the intra/inter-class distance ratio is less than 1, the sample does not overlap with the other classes. In this case, the new hidden neuron parameters are determined as

  μ_{K+1}^c = x^t;   σ_{K+1}^c = κ ‖x^t − μ_c^{nrS}‖        (38)

• Minimum overlapping with the inter-class: When a new training sample is close to the inter-class nearest neuron compared to the intra-class nearest neuron, i.e., the intra/inter-class distance ratio is in the range 1-1.5, the sample has minimum overlap with the other class. In this case, the center of the new hidden neuron is shifted away from the inter-class nearest neuron and towards the intra-class nearest neuron, and is initialized as

  μ_{K+1}^c = x^t + ζ (μ_c^{nrS} − μ_l^{nrI});   σ_{K+1}^c = κ ‖μ_{K+1}^c − μ_c^{nrS}‖        (39)

where ζ is a center shift factor that determines how far the center is shifted from the new training sample location. In our simulation studies the value of ζ is fixed at 0.1.

• Significant overlapping with the inter-class: When a new training sample is very close to the inter-class nearest neuron compared to the intra-class nearest neuron, i.e., the intra/inter-class distance ratio is more than 1.5, the sample has significant overlap with the other class. In this case, the center of the new hidden neuron is shifted away from the inter-class nearest neuron and is initialized as

  μ_{K+1}^c = x^t − ζ (μ_l^{nrI} − x^t);   σ_{K+1}^c = κ ‖μ_{K+1}^c − μ_l^{nrI}‖        (40)

The above center and width determination conditions help in minimizing misclassification in the McRBFN classifier.

When a neuron is added to McRBFN, the output weights are estimated using PBL based on the existing knowledge of past samples stored in the network, as follows. The size of the matrix A is increased from K × K to (K + 1) × (K + 1):

  A^t_{(K+1)×(K+1)} = [ A^{t−1}_{K×K} + (h^t)^T h^t    a_{K+1}^T ;  a_{K+1}    a_{K+1,K+1} ]        (41)

where h^t = [h_1^t, h_2^t, ..., h_K^t] is the vector of responses of the existing K hidden neurons to the new (t-th) training sample. In sequential learning, samples are discarded after learning, but the information present in the past samples is stored in the network: the neuron centers describe the distribution of past samples in the feature space. These centers can therefore be used as pseudo-samples to capture the effect of past samples. Hence, the existing hidden neurons are used as pseudo-samples to calculate the a_{K+1} and a_{K+1,K+1} terms.

The row vector a_{K+1} ∈ R^{1×K} is assigned as

  a_{K+1,p} = Σ_{i=1}^{K+1} h_{K+1}^i h_p^i,   p = 1, ..., K,  where  h_p^i = exp( −‖μ_i^l − μ_p^l‖² / (σ_p^l)² )        (42)

and the scalar a_{K+1,K+1} ∈ R^+ is assigned as

  a_{K+1,K+1} = Σ_{i=1}^{K+1} h_{K+1}^i h_{K+1}^i        (43)

The size of the matrix B is increased from K × n to (K + 1) × n:

  B^t_{(K+1)×n} = [ B^{t−1}_{K×n} + (h^t)^T (y^t)^T ;  b_{K+1} ]        (44)

where b_{K+1} ∈ R^{1×n} is a row vector assigned as

  b_{K+1,j} = Σ_{i=1}^{K+1} h_{K+1}^i y_j^i,   j = 1, ..., n        (45)

and y_j^i is the pseudo-output for the i-th pseudo-sample (hidden neuron μ_i^l), given as

  y_j^i = 1 if l = j, and y_j^i = −1 otherwise,   j = 1, ..., n        (46)

Finally the output weights are estimated as

  [ W_K^t ; w_{K+1}^t ] = ( A^t_{(K+1)×(K+1)} )^{−1} B^t_{(K+1)×n}        (47)

where W_K^t is the output weight matrix of the K existing hidden neurons and w_{K+1}^t is the vector of output weights of the new hidden neuron after learning from the t-th sample.
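A minimal sketch of the overlap-based initialization of a new hidden neuron (Eqs. (35)-(40)) is given below. The function name and arguments are illustrative; mu_intra/sigma_intra and mu_inter/sigma_inter are assumed to be the center and width of the intra- and inter-class nearest neurons, the ">>" test of Eq. (37) is read here simply as exceeding the corresponding width, and kappa and zeta follow the ranges quoted in the text.

import numpy as np

def init_new_neuron(x, mu_intra, sigma_intra, mu_inter, sigma_inter,
                    kappa=0.7, zeta=0.1):
    dS = np.linalg.norm(x - mu_intra)            # intra-class nearest neuron distance
    dI = np.linalg.norm(x - mu_inter)            # inter-class nearest neuron distance
    if dS > sigma_intra and dI > sigma_inter:    # distinct sample           (Eq. (37))
        center = x
        width = kappa * np.sqrt(x @ x)
    elif dS / dI < 1.0:                          # no overlapping            (Eq. (38))
        center = x
        width = kappa * dS
    elif dS / dI <= 1.5:                         # minimum overlapping       (Eq. (39))
        center = x + zeta * (mu_intra - mu_inter)
        width = kappa * np.linalg.norm(center - mu_intra)
    else:                                        # significant overlapping   (Eq. (40))
        center = x - zeta * (mu_inter - x)
        width = kappa * np.linalg.norm(center - mu_inter)
    return center, width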

The inverse of the matrix A^t_{(K+1)×(K+1)} is calculated recursively using the block matrix identities as

  ( A^t_{(K+1)×(K+1)} )^{−1} = [ (A^t_{K×K})^{−1} + (A^t_{K×K})^{−1} a_{K+1}^T a_{K+1} (A^t_{K×K})^{−1} / Δ    −(A^t_{K×K})^{−1} a_{K+1}^T / Δ ;  −a_{K+1} (A^t_{K×K})^{−1} / Δ    1/Δ ]        (48)

where A^t_{K×K} = A^{t−1} + (h^t)^T h^t, Δ = a_{K+1,K+1} − a_{K+1} (A^t_{K×K})^{−1} a_{K+1}^T, and (A^t_{K×K})^{−1} is calculated as

  (A^t_{K×K})^{−1} = (A^{t−1})^{−1} − [ (A^{t−1})^{−1} (h^t)^T h^t (A^{t−1})^{−1} ] / [ 1 + h^t (A^{t−1})^{−1} (h^t)^T ]        (49)

After calculating the inverse of the matrix in Eq. (47) using Eqs. (48) and (49), the resultant equations are

  W_K^t = [ I_K + (A^{t−1}_{K×K})^{−1} a_{K+1}^T a_{K+1} / Δ ] [ W_K^{t−1} + (A^{t−1}_{K×K})^{−1} (h^t)^T (y^t)^T ] − (A^{t−1}_{K×K})^{−1} a_{K+1}^T b_{K+1} / Δ        (50, 51)

  w_{K+1}^t = (1/Δ) [ b_{K+1} − a_{K+1} ( W_K^{t−1} + (A^{t−1}_{K×K})^{−1} (h^t)^T (y^t)^T ) ]        (52)

Parameter update strategy: The current (t-th) training sample is used to update the output weights of the cognitive component, W_K = [w_1, w_2, ..., w_K]^T, if the following criterion is satisfied:

  ĉ^t == c^t AND E^t ≥ β_u        (53)

where β_u is the self-adaptive meta-cognitive parameter update threshold. If β_u is chosen close to 50% of the maximum hinge error, very few samples will be used for adapting the network parameters and most samples will be pushed to the end of the training sequence; the resultant network will not approximate the function accurately. If a lower value is chosen, all samples will be used in updating the network parameters without altering the training sequence. Hence, the initial value of the meta-cognitive parameter update threshold can be selected in the interval [0.4-0.7]. β_u is adapted based on the hinge error as

  β_u := δ β_u + (1 − δ) E^t        (54)

where δ is the slope that controls the rate of self-adaptation of the parameter update threshold and is set close to one.

When a sample is used to update the output weight parameters, the PBL algorithm updates them as follows:

  ∂J(W_K^t)/∂w_{pj} = ∂J^{t−1}(W_K^t)/∂w_{pj} + ∂J^t(W_K^t)/∂w_{pj} = 0,   p = 1, ..., K; j = 1, ..., n        (55)

Equating the first partial derivative to zero and re-arranging Eq. (55), we get

  ( A^{t−1} + (h^t)^T h^t ) W_K^t − ( B^{t−1} + (h^t)^T (y^t)^T ) = 0        (56)

By substituting B^{t−1} = A^{t−1} W_K^{t−1} and A^{t−1} + (h^t)^T h^t = A^t, and adding/subtracting the term (h^t)^T h^t W_K^{t−1} on both sides, Eq. (56) reduces to

  W_K^t = W_K^{t−1} + (A^t)^{−1} (h^t)^T ( (y^t)^T − h^t W_K^{t−1} )        (57)

Finally the output weights are updated as

  W_K^t = W_K^{t−1} + (A^t)^{−1} (h^t)^T (e^t)^T        (58)

where e^t is the hinge loss error of the t-th sample obtained from Eq. (5).

Sample reserve strategy: If the new training sample does not satisfy the deletion, the neuron growth or the cognitive component parameter update criterion, the current sample is pushed to the rear of the training sequence. Since McRBFN modifies the strategies based on the knowledge in the current sample, these samples may be used at a later stage.

Ideally, the training process stops when no further sample is available in the data stream. However, in real time, training stops when the set of samples in the reserve remains the same.

2.3. PBL-McRBFN classification algorithm

To summarize, the PBL-McRBFN algorithm is given in pseudo-code form in Pseudo code 1.

Pseudo code 1. Pseudo code for the PBL-McRBFN classification algorithm.

Input: Present the training data one-by-one to the network from the data stream.
Output: Decision function that estimates the relationship between the feature space and the class label.
START
Initialization: Assign the first sample as the first neuron (K = 1). The parameters of the neuron are chosen as shown in Eq. (37).
Start learning for samples t = 2, 3, ...
DO
  The meta-cognitive component computes the significance of the sample with respect to the cognitive component:
    Compute the cognitive component output ŷ^t using Eq. (2).
    Find the predicted class label ĉ^t, maximum hinge error E^t, confidence of the classifier p̂(c^t|x^t) and class-wise significance ψ_c using Eqs. (25), (26) and (31).
  Based on the above measures, the meta-cognitive component selects one of the following strategies:
  Sample Delete Strategy:
  IF ĉ^t == c^t AND p̂(c^t|x^t) ≥ β_d THEN
    Delete the sample from the sequence without learning.
  Neuron Growth Strategy:
  ELSEIF (ĉ^t ≠ c^t OR E^t ≥ β_a) AND ψ_c(x^t) ≤ β_c THEN
    Add a neuron to the network (K = K + 1).
    Choose the parameters of the new hidden neuron using Eqs. (37) to (52).
    Update the self-adaptive meta-cognitive addition threshold according to Eq. (34).
  Parameters Update Strategy:
  ELSEIF ĉ^t == c^t AND E^t ≥ β_u THEN
    Update the parameters of the cognitive component using Eq. (58).
    Update the self-adaptive meta-cognitive update threshold according to Eq. (54).
  Sample Reserve Strategy:
  ELSE
    The current sample (x^t, y^t) is pushed to the rear of the sample stack to be used in future; it can later be used to fine-tune the cognitive component parameters.
  END
  The cognitive component executes the selected strategy.
ENDDO
END

In PBL-McRBFN, the sample delete strategy addresses what-to-learn by deleting insignificant samples from the training data set; the neuron growth strategy and the parameter update strategy address how-to-learn, through which the cognitive component learns from the samples; and the self-adaptive nature of the meta-cognitive thresholds, together with the sample reserve strategy, addresses when-to-learn by presenting samples to the learning process according to the knowledge present in each sample.
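The output-weight refresh performed in the Parameters Update Strategy branch of Pseudo code 1 (Eqs. (55)-(58)) can be sketched as follows. The function name and shapes are assumptions, and the inverse of A^t is shown here with a direct linear solve rather than the recursive identity of Eq. (49), purely for clarity.

import numpy as np

def parameter_update(A, W, h, e):
    # A: (K, K) projection matrix, W: (K, n) output weights,
    # h: (K,) hidden responses for sample t, e: (n,) hinge loss error (Eq. (5))
    A_new = A + np.outer(h, h)                            # A^t = A^{t-1} + (h^t)^T h^t
    W_new = W + np.linalg.solve(A_new, np.outer(h, e))    # Eq. (58)
    return A_new, W_new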

Table 1
Description of benchmark data sets selected from the UCI machine learning repository for the performance study.

Data sets                      No. of features  No. of classes  No. of samples (Training / Testing)  I.F (Training / Testing)
Image segmentation (IS)        19               7               210 / 2100                           0 / 0
IRIS                           4                3               45 / 105                             0 / 0
WINE                           13               3               60 / 118                             0 / 0.29
Vehicle classification (VC)    18               4               424(a) / 422                         0.1 / 0.12
Glass identification (GI)      9                6               109(a) / 105                         0.68 / 0.77
HEART                          13               2               70 / 200                             0.14 / 0.1
Liver disorders (LD)           6                2               200 / 145                            0.17 / 0.14
PIMA                           8                2               400 / 368                            0.22 / 0.39
Breast cancer (BC)             9                2               300 / 383                            0.26 / 0.33
Ionosphere (ION)               34               2               100 / 251                            0.28 / 0.28

(a) Training samples are repeated three times randomly as suggested in [15].

3. Performance evaluation of the PBL-McRBFN classifier

The performance of the PBL-McRBFN classifier is evaluated on benchmark multi-category and binary classification problems from the UCI machine learning repository. The performance is compared with the best performing sequential learning algorithm reported in the literature (SRAN) [20], the batch ELM classifier [16] and also with standard support vector machines [42]. The data sets are chosen with varying sample imbalance. The sample imbalance is measured using the Imbalance Factor (I.F):

  I.F = 1 − (n/N) min_{j=1...n} N_j        (59)

where N_j is the total number of training samples belonging to class j and N = Σ_{j=1}^{n} N_j. The description of these data sets, including the number of input features, the number of classes, the number of training/testing samples and the imbalance factor, is presented in Table 1. From Table 1, it can be observed that the problems chosen for the study include both balanced and unbalanced data sets and that the imbalance factors of the data sets vary widely. Finally, the PBL-McRBFN classifier is used to solve two real-world classification problems: the acoustic emission signal processing for health monitoring data set presented in [38] and the mammogram classification for breast cancer detection data set presented in [43].

All simulations are conducted in the MATLAB 2010 environment on a desktop PC with an Intel Core 2 Duo 2.66 GHz CPU and 3 GB RAM. For the ELM classifier, the number of hidden neurons is obtained using the constructive-destructive procedure presented in [44]. The simulations for batch SVM with Gaussian kernels are carried out using the LIBSVM package in C [45]. For the SVM classifier, the parameters (c, γ) are optimized using a grid search technique. The performance measures used to compare the classifiers are described below.

3.1. Performance measures

Class-wise performance measures such as overall/average efficiencies and a statistical significance test on the performance of multiple classifiers over multiple data sets are used for the performance comparison.

3.1.1. Class-wise measures

The confusion matrix Q is used to obtain the class-level and global performance of the various classifiers. Class-level performance is measured by the percentage classification η_j, defined as

  η_j = (q_jj / N_j) × 100%        (60)

where q_jj is the total number of correctly classified samples of class j and N_j is the total number of samples belonging to class j in the training/testing data set. The global measures used in the evaluation are the average per-class classification accuracy η_a and the overall classification accuracy η_o, defined as

  η_a = (1/n) Σ_{j=1}^{n} η_j;   η_o = ( Σ_{j=1}^{n} q_jj / N ) × 100%        (61)

3.1.2. Statistical significance test

The classification efficiency by itself is not a conclusive measure of classifier performance [37]. Since the developed classifier is compared with multiple classifiers over multiple data sets, the Friedman test followed by the Bonferroni-Dunn test is used to establish the statistical significance of the PBL-McRBFN classifier. A brief description of the conducted tests is given below.

Friedman test: This test is used to compare multiple classifiers (L) over multiple data sets (M). Let r_i^j be the rank of the j-th classifier on the i-th data set. Under the null hypothesis, which states that all the classifiers are equivalent and so their average ranks R_j (R_j = (1/M) Σ_i r_i^j) over all data sets should be equal, the Friedman statistic is given by

  χ_F² = ( 12M / (L(L+1)) ) [ Σ_j R_j² − L(L+1)²/4 ]        (62)

which follows the χ² (Chi-square) distribution with L − 1 degrees of freedom. A χ² distribution is the distribution of a sum of squares of L independent standard normal variables.

Iman and Davenport showed that Friedman's statistic χ_F² is overly conservative and derived a better statistic [46], given by

  F_F = (M − 1) χ_F² / ( M(L − 1) − χ_F² )        (63)

which follows the F-distribution with L − 1 and (L − 1)(M − 1) degrees of freedom and is used in this paper. The F-distribution is defined as the probability distribution of the ratio of two independent χ² distributions over their respective degrees of freedom. The aim of the statistical test is to show that the performance of the PBL-McRBFN classifier is substantially different from the other classifiers at a confidence level of 1 − α. If the calculated F_F > F_{α/2,(L−1),(L−1)(M−1)} or F_F < F_{1−α/2,(L−1),(L−1)(M−1)}, then the null hypothesis is rejected.
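A minimal sketch of the Friedman statistic and the Iman-Davenport correction (Eqs. (62) and (63)) is shown below. The function name is illustrative and ranks is assumed to be an (M, L) array holding the rank of each of the L classifiers on each of the M data sets.

import numpy as np

def friedman_statistics(ranks):
    M, L = ranks.shape
    R = ranks.mean(axis=0)                       # average rank of each classifier
    chi2_F = 12.0 * M / (L * (L + 1)) * (np.sum(R ** 2) - L * (L + 1) ** 2 / 4.0)  # Eq. (62)
    F_F = (M - 1) * chi2_F / (M * (L - 1) - chi2_F)                                # Eq. (63)
    return chi2_F, F_F

The computed F_F is then compared against the F-distribution critical values with L − 1 and (L − 1)(M − 1) degrees of freedom, as described above.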

    or FF < F1/2,(L1),(L1)(M1), then the null-hypothesis is rejected.The Statistical tables for critical values can be found in [47].

Post-hoc test: The Bonferroni-Dunn test [48] is a post-hoc test that can be performed after rejection of the null-hypothesis. It is used to compare the PBL-McRBFN classifier against all the other classifiers. The test assumes that the performances of two classifiers are significantly different if the corresponding average ranks differ by at least the critical difference (CD), i.e., if (R_i − R_j) > CD then classifier i performs significantly better than classifier j. The critical difference is calculated using

CD = q_\alpha \sqrt{\frac{L(L+1)}{6M}}    (64)

where the critical values q_α are based on the Studentized range statistic divided by √2, as given in [37].
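A minimal sketch of Eq. (64) (again illustrative Python, not the authors' code) is given below; with the values used later in Section 3.2.1 (L = 4, M = 10, q_0.05 = 2.394) it reproduces the critical difference of 1.382 quoted there.

    import math

    def critical_difference(q_alpha, L, M):
        """Bonferroni-Dunn critical difference, Eq. (64)."""
        return q_alpha * math.sqrt(L * (L + 1) / (6.0 * M))

    print(round(critical_difference(2.394, L=4, M=10), 3))   # 1.382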

3.2. Performance evaluation on UCI benchmark data sets

The class-wise performance measures (average/overall testing efficiencies), the number of hidden neurons and the number of samples used for the PBL-McRBFN, SRAN, ELM and SVM classifiers are reported in Table 2. Table 2 contains the results of both the binary and the multi-category classification data sets from the UCI machine learning repository. From Table 2, we can see that the PBL-McRBFN classifier performs slightly better than the best performing SRAN classifier and significantly better than the ELM and SVM classifiers on all the 10 data sets. In addition, the proposed PBL-McRBFN classifier requires fewer samples to learn the decision function and develops a compact neural architecture to achieve better generalization performance.

Well-balanced data sets: In the IS, IRIS and WINE data sets, the generalization performance of PBL-McRBFN is approximately 2% better than the SRAN classifier and 3-4% better than the ELM and SVM classifiers. On the IS data set, the proposed PBL-McRBFN uses fewer samples to achieve a 2% improvement over SRAN and approximately a 3-4% improvement over the ELM and SVM classifiers. Similar to IS, on the IRIS and WINE data sets PBL-McRBFN uses fewer samples, with a smaller number of neurons, to achieve better generalization performance. The PBL-McRBFN classifier achieves better generalization performance using the meta-cognitive learning algorithm, which selects appropriate samples to be used in learning based on the current knowledge and deletes many redundant samples to avoid over-training. For example, in the IS data set, PBL-McRBFN uses only 89 samples out of 210 training samples to build the best classifier.

In order to highlight the above-mentioned advantages of the proposed PBL-McRBFN classifier, we conduct a simulation study on the ELM classifier with only the training samples used by the PBL-McRBFN classifier. On the IS data set, the PBL-McRBFN classifier selects the best 89 samples for training; these samples are used in the batch ELM learning algorithm, and we refer to this classifier as ELM*.

The testing performance of the ELM* classifier (which uses the best 89-sample sequence) is better than that of the original ELM classifier developed using all 210 training samples. Also, ELM* achieves better generalization performance with a smaller number of hidden neurons: ELM* requires only 32 hidden neurons to achieve 92.14% testing efficiency, whereas ELM requires 49 hidden neurons to achieve 90.23%. This study clearly indicates that the sample deletion strategy present in PBL-McRBFN helps in achieving better decision making ability.
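To make the ELM* construction above concrete, the sketch below shows a generic batch ELM (random hidden layer, minimum-norm least-squares output weights) trained on a selected subset of the training data. It is a minimal illustration under those standard ELM assumptions, not the implementation used in this paper; the data, sizes and subset indices are placeholders.

    import numpy as np

    def train_elm(X, T, n_hidden, rng):
        """X: (n, d) inputs, T: (n, c) one-hot targets; returns random hidden layer and output weights."""
        W = rng.normal(size=(X.shape[1], n_hidden))   # random input weights
        b = rng.normal(size=n_hidden)                 # random biases
        H = 1.0 / (1.0 + np.exp(-(X @ W + b)))        # sigmoidal hidden-layer responses
        beta = np.linalg.pinv(H) @ T                  # minimum-norm least-squares output weights
        return W, b, beta

    def predict_elm(X, W, b, beta):
        H = 1.0 / (1.0 + np.exp(-(X @ W + b)))
        return np.argmax(H @ beta, axis=1)

    rng = np.random.default_rng(0)
    X = rng.normal(size=(210, 10))                    # synthetic stand-in for a training set
    y = rng.integers(0, 3, size=210)
    T = np.eye(3)[y]
    selected = np.arange(89)                          # placeholder for the 89 samples chosen by PBL-McRBFN
    W, b, beta = train_elm(X[selected], T[selected], n_hidden=32, rng=rng)
    predictions = predict_elm(X, W, b, beta)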

Imbalanced data sets: In the VC, GI, HEART, LD, PIMA, BC and ION data sets, the generalization performance of PBL-McRBFN is approximately 2-10% better than the SRAN classifier and 2-15% better than the ELM and SVM classifiers. In the case of imbalanced data sets, PBL-McRBFN requires more neurons, but minimal samples, to approximate the decision surface. The class-overlap based criterion used in initializing the centers and widths of new neurons in PBL-McRBFN, together with the meta-cognitive learning, helps PBL-McRBFN achieve significantly better generalization performance. For example, in the VC data set the proposed PBL-McRBFN uses fewer samples to achieve a better average testing efficiency: approximately a 2% improvement over the SRAN and ELM classifiers and a 10% improvement over the SVM classifier. The GI data set has an imbalance factor of 0.68 in training and 0.77 in testing. Such high imbalance influences the performance of the SRAN, ELM and SVM classifiers. On the GI data set, the SRAN overall testing efficiency (η_o) is 6% higher than its average testing efficiency (η_a). This is due to the fact that the SRAN classifier is not able to accurately capture the knowledge of the classes which contain a smaller number of samples. In the case of the proposed PBL-McRBFN classifier, the average testing efficiency is 8% higher than the overall testing efficiency (η_o); thus the proposed PBL-McRBFN classifier is able to capture the knowledge of the classes which contain a smaller number of samples accurately. On the GI data set, the proposed PBL-McRBFN achieves a better average testing efficiency: a 12% improvement over SRAN with fewer samples, a 5% improvement over ELM with fewer neurons, and a 15% improvement over the SVM classifier with fewer neurons.

Binary data sets: On the HEART and LD data sets, the proposed PBL-McRBFN achieves a better average testing efficiency, approximately 2-7% over SRAN, ELM and SVM, with fewer neurons. On the PIMA and BC data sets, the proposed PBL-McRBFN achieves a better average testing efficiency, approximately 1-2% over SRAN, ELM and SVM, with fewer samples. On the ION data set, the proposed PBL-McRBFN uses fewer samples and fewer neurons to achieve a better average testing efficiency: a 5% improvement over SRAN and an 8-9% improvement over ELM and SVM. The overlapping conditions and the class-specific criterion in the learning strategies of PBL-McRBFN help in capturing the knowledge accurately in the case of high sample imbalance problems. From Table 2, we can say that the proposed PBL-McRBFN improves the average/overall efficiency even under high sample imbalance.

3.2.1. Statistical significance analysis

In this section, we highlight the significance of the proposed PBL-McRBFN classifier on multiple data sets using the non-parametric Friedman test followed by the Bonferroni-Dunn test, as described in Section 3.1.2. The Friedman test identifies whether the measured average ranks are significantly different from the mean rank (mean rank is 2.5) expected under the null-hypothesis. The Bonferroni-Dunn test highlights the statistical difference in performance of PBL-McRBFN over the other classifiers. From Table 2, we can see that our comparison study uses four classifiers (L = 4) and ten data sets (M = 10).

Non-parametric test using the overall testing efficiency (η_o): The ranks of all 4 classifiers based on the overall testing efficiency for each data set are provided in Table 3. The Friedman statistic (χ²_F as in Eq. (62)) is 16.89 and the modified (Iman and Davenport) statistic (F_F as in Eq. (63)) is 11.59. For four classifiers and ten data sets, the modified statistic is distributed according to the F-distribution with 3 and 27 degrees of freedom. The critical value for rejecting the null hypothesis at a significance level of 0.05 is 3.65. Since the modified statistic is greater than the critical value (11.59 > 3.65), we can reject the null hypothesis. Hence, we can say that the proposed PBL-McRBFN classifier performs better than the existing classifiers on these data sets.

Next, we conduct the Bonferroni-Dunn test to compare the proposed PBL-McRBFN classifier with all the other classifiers. From Eq. (64), the critical difference (CD) is calculated as 1.382 for a significance level of 0.05 (q_0.05 = 2.394).


Table 2
Performance comparison of PBL-McRBFN with SRAN, ELM and SVM (K: hidden neurons; Samples: training samples used; η_o/η_a: overall/average testing efficiency in %).

Data set   PBL-McRBFN (K / Samples / ηo / ηa)   SRAN (K / Samples / ηo / ηa)   ELM (K / ηo / ηa)     SVM (SVa / ηo / ηa)
IS         50 / 89 / 94.19 / 94.19              47 / 113 / 92.29 / 92.29       49 / 90.23 / 90.23    127 / 91.38 / 91.38
IRIS       6 / 20 / 98.10 / 98.10               8 / 29 / 96.19 / 96.19         10 / 96.19 / 96.19    13 / 96.19 / 96.19
WINE       11 / 29 / 98.31 / 98.69              12 / 46 / 96.61 / 97.19        10 / 97.46 / 98.04    36 / 97.46 / 98.04
VC         175 / 318 / 78.91 / 79.09            113 / 437 / 75.12 / 76.86      150 / 77.01 / 77.59   340 / 70.62 / 68.51
GI         71 / 115 / 84.76 / 92.72             59 / 159 / 86.21 / 80.95       80 / 81.31 / 87.43    183 / 70.47 / 75.61
HEART      20 / 69 / 81.50 / 81.47              28 / 56 / 78.50 / 77.53        36 / 76.50 / 75.91    42 / 75.50 / 75.10
LD         87 / 116 / 73.1 / 72.63              91 / 151 / 66.90 / 65.78       100 / 72.41 / 71.41   141 / 71.03 / 70.21
PIMA       100 / 162 / 79.62 / 76.67            97 / 230 / 78.53 / 74.90       100 / 76.63 / 75.25   221 / 77.45 / 76.43
BC         13 / 45 / 97.39 / 97.85              7 / 91 / 96.87 / 97.26         66 / 96.35 / 96.48    24 / 96.61 / 97.06
ION        18 / 58 / 96.41 / 96.47              21 / 86 / 90.84 / 91.88        32 / 89.64 / 87.52    43 / 91.24 / 88.51

a Number of support vectors.

Table 3
Ranks based on the overall (ηo) and average (ηa) testing efficiencies.

Data set      PBL-McRBFN      SRAN          ELM            SVM
              ηo     ηa       ηo     ηa     ηo     ηa      ηo     ηa
IS            1      1        2      2      4      4       3      3
IRIS          1      1        3      3      3      3       3      3
WINE          1      1        4      4      2.5    2.5     2.5    2.5
VC            1      1        3      3      2      2       4      4
GI            2      1        1      3      3      2       4      4
HEART         1      1        2      2      3      3       4      4
LD            1      1        4      4      2      2       3      3
PIMA          1      1        2      4      4      3       3      2
BC            1      1        2      2      4      4       3      3
ION           1      1        3      2      4      4       2      3
Average rank  1.1    1        2.6    2.9    3.15   2.95    3.15   3.15

From Table 3, we can see that the differences in average rank between the proposed PBL-McRBFN classifier and the other three classifiers are 1.5, 2.05 and 2.05. These differences in average rank are greater than the critical difference. Hence, based on the overall testing efficiency, the Bonferroni-Dunn test shows that the proposed PBL-McRBFN classifier is significantly better than the SRAN, ELM and SVM classifiers.

Non-parametric test using the average testing efficiency (η_a): The ranks of all 4 classifiers based on the average testing efficiency for each data set are provided in Table 3. The Friedman statistic (χ²_F as in Eq. (62)) is 18.21 and the modified statistic (F_F as in Eq. (63)) is 13.9. Since the modified statistic is greater than the critical value (13.9 > 3.65), we can reject the null hypothesis. Hence, we can say that the proposed PBL-McRBFN classifier performs better than the other classifiers on these data sets.

From Table 3, we can see that the differences in average rank between the proposed PBL-McRBFN classifier and the other three classifiers are 1.9, 1.95 and 2.15. These differences are greater than the critical difference (1.382). Hence, based on the average testing efficiency, the Bonferroni-Dunn test also shows that the proposed PBL-McRBFN classifier performs better than the other well-known classifiers. Next, we present the performance results of the PBL-McRBFN classifier on the two real-world classification problem data sets, viz., the acoustic emission data set for health monitoring presented in [38] and the mammogram classification data set for breast cancer detection presented in [43].

3.3. Acoustic emission signal classification for health monitoring

The stress or pressure waves produced by a sensitive transducer due to the transient energy released by irreversible deformation in the material are called acoustic emission signals. These signals are produced by various sources, and the classification/identification of the sources using the acoustic emission signals is a very difficult problem. The presence of ambient noise and pseudo acoustic emission signals in practical situations increases the complexity further. In addition, the superficial similarities between the acoustic emission signals produced by different sources increase the complexity further. In this section, we address the classification of such acoustic emission signals using the proposed PBL-McRBFN classifier.

The experimental data provided for the burst-type acoustic emission signals from a metallic surface is considered for our study, as given in [38]. The burst-type acoustic emission signal is characterized by 5 features, and these signals are classified into one of 4 sources, namely, the pencil source, the pulse source, the spark source and the noise source. Out of 199 samples, 62 samples are used for training (as highlighted in [38]) and the remaining samples are used for testing the classifier. For details on the characteristics of the input features and the experimental setup, one should refer to [38].

The performance study results of the PBL-McRBFN classifier are compared against the SRAN, ELM and SVM classifiers and presented in Table 4. It can be seen that the PBL-McRBFN classifier uses only 9 significant samples to build the classifier and requires only 5 neurons to achieve an overall testing efficiency of 99.27%. Thus, PBL-McRBFN performs an efficient classification of the acoustic emission signals using a compact network.

Table 4
Performance comparison on the acoustic emission signal problem.

Classifier    Hidden neurons   Samples used   Testing ηo   Testing ηa
PBL-McRBFN    5                9              99.27        98.91
SRAN          10               39             99.27        98.91
ELM           10               62             99.27        98.91
SVM           22a              62             98.54        97.95

a Number of support vectors.


Table 5
Performance comparison on the mammogram classification problem.

Classifier    Hidden neurons   Samples used   Testing ηo   Testing ηa
PBL-McRBFN    22               60             100          100
SRAN          25               45             90.91        91.67
ELM           30               97             90.91        90.02
SVM           61a              97             90.91        91.67

a Number of support vectors.

3.4. Mammogram classification for breast cancer detection

The mammogram is a better means for the early diagnosis of breast cancer, as tumors and abnormalities show up in a mammogram much before they can be detected through physical examinations. Clinically, the identification of malignant tissues involves detecting the abnormal masses or tumors, if any, and then classifying the mass as either malignant or benign, as given in [39]. However, once a tumor is detected, the only method of determining whether it is benign or malignant is by conducting a biopsy, which is an invasive procedure that involves the removal of cells or tissue from a patient. A non-invasive method of identifying the abnormalities in a mammogram can reduce the number of unnecessary biopsies, thus sparing the patients the inconvenience and saving medical costs. In this study, the mammogram database available in [43] has been used. The 9 input features extracted from the mammogram of the identified abnormal mass are used to classify the tumor as either malignant or benign. Here, 97 samples are used to develop the PBL-McRBFN classifier and the performance of the PBL-McRBFN classifier is evaluated using the remaining 11 samples. For further details on the input features and the data set, one should refer to [43].

The performance results of the PBL-McRBFN classifier, in comparison with the SRAN, ELM and SVM, are presented in Table 5. From the table, it is seen that the PBL-McRBFN classifier performs a highly efficient classification with 100% classification accuracy and a smaller number of hidden neurons. When compared to the SRAN, ELM and SVM classifiers, the performance of PBL-McRBFN is improved considerably.

Thus, from the performance study of PBL-McRBFN conducted against SRAN, ELM and SVM on the chosen benchmark data sets and practical classification problems, it can be observed that the proposed PBL-McRBFN classifier performs better than the other classifiers.

4. Conclusions

In this paper, we have presented a Meta-cognitive Radial Basis Function Network (McRBFN) and its Projection Based Learning (PBL) algorithm for classification problems in a sequential framework. The meta-cognitive component in McRBFN controls the learning of the cognitive component in McRBFN. The meta-cognitive component adapts the learning process appropriately by implementing self-regulation and hence decides what-to-learn, when-to-learn and how-to-learn efficiently. In addition, the overlapping conditions present in the neuron growth strategy help in the proper initialization of new hidden neuron parameters and also minimize the misclassification error. The performance of the proposed PBL-McRBFN classifier has been evaluated using benchmark multi-category and binary classification problems from the UCI machine learning repository, covering a wide range of imbalance factors, and two practical classification problems. The statistical performance comparison with the well-known classifiers in the literature clearly indicates the superior performance of the proposed PBL-McRBFN classifier.

    Acknowledgements

The authors would like to thank the Nanyang Technological University-Ministry of Defence (NTU-MINDEF), Singapore, for the financial support (Grant number: MINDEF-NTU-JPP/11/02/05) to conduct this study.

References

[1] G.B. Zhang, Neural network for classification: a survey, IEEE Transactions on Systems, Man and Cybernetics, Part C: Applications and Reviews 30 (4) (2000) 451–462.
[2] Y. LeCun, B. Boser, J.S. Denker, D. Henderson, R.E. Howard, W. Hubbard, L.D. Jackel, Backpropagation applied to handwritten zip code recognition, Neural Computation 1 (1989) 541–551.
[3] F.F. Li, T.J. Cox, A neural network model for speech intelligibility quantification, Applied Soft Computing 7 (1) (2007) 145–155.
[4] S. Ari, G. Saha, In search of an optimization technique for artificial neural network to classify abnormal heart sounds, Applied Soft Computing 9 (1) (2009) 330–340.
[5] V. Ravi, C. Pramodh, Threshold accepting trained principal component neural network and feature subset selection: application to bankruptcy prediction in banks, Applied Soft Computing 8 (4) (2008) 1539–1548.
[6] M.E. Ruiz, P. Srinivasan, Hierarchical text categorization using neural networks, Information Retrieval 5 (2002) 87–118.
[7] M. Khan, S.W. Khor, Web document clustering using a hybrid neural network, Applied Soft Computing 4 (4) (2004) 423–432.
[8] D.E. Rumelhart, G.E. Hinton, R.J. Williams, Learning representations by back-propagating errors, Nature 323 (1986) 533–536.
[9] G.-B. Huang, Q.Y. Zhu, C.K. Siew, Extreme learning machine: a new learning scheme of feedforward neural networks, in: IEEE International Joint Conference on Neural Networks, Proceedings, vol. 2, 2004, pp. 985–990.
[10] G.-B. Huang, X. Ding, H. Zhou, Optimization method based extreme learning machine for classification, Neurocomputing 74 (1–3) (2010) 155–163.
[11] J.C. Platt, A resource allocation network for function interpolation, Neural Computation 3 (2) (1991) 213–225.
[12] L. Yingwei, N. Sundararajan, P. Saratchandran, A sequential learning scheme for function approximation using minimal radial basis function neural networks, Neural Computation 9 (2) (1997) 461–478.
[13] G.-B. Huang, P. Saratchandran, N. Sundararajan, An efficient sequential learning algorithm for growing and pruning RBF (GAP-RBF) networks, IEEE Transactions on Systems, Man, and Cybernetics, Part B: Cybernetics 34 (6) (2004) 2284–2292.
[14] N.-Y. Liang, G.-B. Huang, P. Saratchandran, N. Sundararajan, A fast and accurate online sequential learning algorithm for feedforward networks, IEEE Transactions on Neural Networks 17 (6) (2006) 1411–1423.
[15] S. Suresh, N. Sundararajan, P. Saratchandran, A sequential multi-category classifier using radial basis function networks, Neurocomputing 71 (2008) 1345–1358.
[16] S. Suresh, R.V. Babu, H.J. Kim, No-reference image quality assessment using modified extreme learning machine classifier, Applied Soft Computing 9 (2) (2009) 541–552.
[17] N. Kasabov, Evolving fuzzy neural networks for supervised/unsupervised online knowledge-based learning, IEEE Transactions on Systems, Man, and Cybernetics, Part B: Cybernetics 31 (6) (2001) 902–918.
[18] W.P. Rivers, Autonomy at all costs: an ethnography of metacognitive self-assessment and self-management among experienced language learners, The Modern Language Journal 85 (2) (2001) 279–290.
[19] R. Isaacson, F. Fujita, Metacognitive knowledge monitoring and self-regulated learning: academic success and reflections on learning, Journal of the Scholarship of Teaching and Learning 6 (1) (2006) 39–55.
[20] S. Suresh, K. Dong, H.J. Kim, A sequential learning algorithm for self-adaptive resource allocation network classifier, Neurocomputing 73 (16–18) (2010) 3012–3019.
[21] S. Suresh, R. Savitha, N. Sundararajan, A sequential learning algorithm for complex-valued self-regulating resource allocation network - CSRAN, IEEE Transactions on Neural Networks 22 (7) (2011) 1061–1072.
[22] G. Sateesh Babu, S. Suresh, Meta-cognitive neural network for classification problems in a sequential learning framework, Neurocomputing 81 (2012) 86–96.
[23] K. Subramanian, S. Suresh, A meta-cognitive sequential learning algorithm for neuro-fuzzy inference system, Applied Soft Computing 12 (11) (2012) 3603–3614.
[24] R. Savitha, S. Suresh, N. Sundararajan, Metacognitive learning in a fully complex-valued radial basis function neural network, Neural Computation 24 (5) (2012) 1297–1328.
[25] R. Savitha, S. Suresh, N. Sundararajan, A meta-cognitive learning algorithm for a Fully Complex-valued Relaxation Network, Neural Networks 32 (2012) 209–218.
[26] G. Sateesh Babu, R. Savitha, S. Suresh, A projection based learning in meta-cognitive radial basis function network for classification problems, in: The 2012 International Joint Conference on Neural Networks (IJCNN), 2012, pp. 2907–2914.


[27] G. Sateesh Babu, S. Suresh, B.S. Mahanand, Alzheimer's disease detection using a Projection Based Learning Meta-cognitive RBF Network, in: The 2012 International Joint Conference on Neural Networks (IJCNN), 2012, pp. 408–415.
[28] G. Sateesh Babu, S. Suresh, K. Uma Sangumathi, H. Kim, A Projection Based Learning Meta-cognitive RBF network classifier for effective diagnosis of Parkinson's disease, in: J. Wang, G. Yen, M. Polycarpou (Eds.), Advances in Neural Networks - ISNN 2012, vol. 7368 of Lecture Notes in Computer Science, Springer, Berlin/Heidelberg, 2012, pp. 611–620.
[29] G. Sateesh Babu, S. Suresh, Parkinson's disease prediction using gene expression - a projection based learning meta-cognitive neural classifier approach, Expert Systems with Applications (2012), http://dx.doi.org/10.1016/j.eswa.2012.08.070
[30] M.T. Cox, Metacognition in computation: a selected research review, Artificial Intelligence 169 (2) (2005) 104–141.
[31] T.O. Nelson, L. Narens, Metamemory: A Theoretical Framework and New Findings, Allyn and Bacon, Boston, USA, 1992.
[32] S. Suresh, N. Sundararajan, P. Saratchandran, Risk-sensitive loss functions for sparse multi-category classification problems, Information Sciences 178 (12) (2008) 2621–2638.
[33] E. Castillo, O. Fontenla-Romero, B. Guijarro-Berdinas, A. Alonso-Betanzos, A global optimum approach for one-layer neural networks, Neural Computation 14 (6) (2002) 1429–1449.
[34] E. Castillo, B. Guijarro-Berdinas, O. Fontenla-Romero, A. Alonso-Betanzos, A very fast learning method for neural networks based on sensitivity analysis, Journal of Machine Learning Research 7 (2006) 1159–1182.
[35] H. Hoffmann, Kernel PCA for novelty detection, Pattern Recognition 40 (3) (2007) 863–874.
[36] C. Blake, C. Merz, UCI repository of machine learning databases, University of California, Irvine, Department of Information and Computer Sciences, 1998, http://archive.ics.uci.edu/ml/
[37] J. Demsar, Statistical comparisons of classifiers over multiple data sets, The Journal of Machine Learning Research 7 (2006) 1–30.
[38] S.N. Omkar, S. Suresh, T.R. Raghavendra, V. Mani, Acoustic emission signal classification using fuzzy C-means clustering, in: Proceedings of the ICONIP '02, 9th International Conference on Neural Information Processing, vol. 4, 2002, pp. 1827–1831.
[39] C. Aize, Q. Song, X. Yang, S. Liu, C. Guo, Mammographic mass detection by vicinal support vector machine, in: Proceedings of the ICNN '04, International Conference on Neural Networks, vol. 3, 2004, pp. 1953–1958.
[40] T. Zhang, Statistical behavior and consistency of classification methods based on convex risk minimization, Annals of Statistics 32 (1) (2004) 56–85.
[41] B. Scholkopf, A.J. Smola, Learning with Kernels, MIT Press, Cambridge, MA, 2002.
[42] C. Cortes, V. Vapnik, Support-vector networks, Machine Learning 20 (3) (1995) 273–297.
[43] J. Suckling, J. Parker, D.R. Dance, S. Astley, I. Hutt, C. Boggis, I. Ricketts, E. Stamatakis, N. Cerneaz, S. Kok, et al., The mammographic image analysis society digital mammogram database, Excerpta Medica International Congress Series 1069 (1994) 375–378.
[44] S. Suresh, S.N. Omkar, V. Mani, T.N.G. Prakash, Lift coefficient prediction at high angle of attack using recurrent neural network, Aerospace Science and Technology 7 (8) (2003) 595–602.
[45] C.-C. Chang, C.-J. Lin, LIBSVM: a library for support vector machines, ACM Transactions on Intelligent Systems and Technology 2 (2011) 27:1–27:27, software available at http://www.csie.ntu.edu.tw/~cjlin/libsvm
[46] R.L. Iman, J.M. Davenport, Approximations of the critical region of the Friedman statistic, Communications in Statistics (1980) 571–595.
[47] J.H. Zar, Biostatistical Analysis, 4th ed., Prentice-Hall, Englewood Cliffs, New Jersey, 1999.
[48] O.J. Dunn, Multiple comparisons among means, Journal of the American Statistical Association 56 (293) (1961) 52–64.

Mr. Giduthuri Sateesh Babu received the B.Tech degree in electrical and electronics engineering from Jawaharlal Nehru Technological University, India, in 2007, and the M.Tech degree in electrical engineering from the Indian Institute of Technology Delhi, India, in 2009. From 2009 to 2010, he worked as a senior software engineer in the Samsung R&D centre, India. He is currently a Ph.D. student with the School of Computer Engineering, Nanyang Technological University, Singapore. His research interests include machine learning, cognitive computing, neural networks, control systems, optimization and medical informatics.

Dr. Sundaram Suresh received the B.E degree in electrical and electronics engineering from Bharathiyar University in 1999, and the M.E (2001) and Ph.D. (2005) degrees in aerospace engineering from the Indian Institute of Science, India. He was a post-doctoral researcher in the school of electrical engineering, Nanyang Technological University, from 2005 to 2007. From 2007 to 2008, he was in INRIA-Sophia Antipolis, France, as an ERCIM research fellow. He was in Korea University for a short period as visiting faculty in Industrial Engineering. From January 2009 to December 2009, he was in the Indian Institute of Technology Delhi as an Assistant Professor in the Department of Electrical Engineering. He has been working as an Assistant Professor in the School of Computer Engineering, Nanyang Technological University, Singapore, since 2010. He was awarded best young faculty for the year 2009 by IIT-Delhi. His research interest includes flight control, unmanned aerial vehicle design, machine learning, applied game theory, optimization and computer vision.
