

Review Article
On Feature Selection and Rule Extraction for High Dimensional Data: A Case of Diffuse Large B-Cell Lymphomas Microarrays Classification

Narissara Eiamkanitchat,1 Nipon Theera-Umpon,1,2 and Sansanee Auephanwiriyakul2,3

1 Department of Electrical Engineering, Faculty of Engineering, Chiang Mai University, Chiang Mai 50200, Thailand
2 Biomedical Engineering Center, Chiang Mai University, Chiang Mai 50200, Thailand
3 Department of Computer Engineering, Faculty of Engineering, Chiang Mai University, Chiang Mai 50200, Thailand

Correspondence should be addressed to Nipon Theera-Umpon; [email protected]

Received 28 April 2015; Accepted 13 September 2015

Academic Editor: Chunlin Chen

Hindawi Publishing Corporation, Mathematical Problems in Engineering, Volume 2015, Article ID 275831, 12 pages, http://dx.doi.org/10.1155/2015/275831

Copyright © 2015 Narissara Eiamkanitchat et al. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

Neurofuzzy methods capable of selecting a handful of useful features are very useful in the analysis of high dimensional datasets. A neurofuzzy classification scheme that can create proper linguistic features and simultaneously select informative features for a high dimensional dataset is presented and applied to the diffuse large B-cell lymphomas (DLBCL) microarray classification problem. The classification scheme is the combination of an embedded linguistic feature creation and tuning algorithm, feature selection, and rule-based classification in one neural network framework. The adjustable linguistic features are embedded in the network structure via fuzzy membership functions. The network performs the classification task on the high dimensional DLBCL microarray dataset either by direct calculation or by the rule-based approach. The 10-fold cross validation is applied to ensure the validity of the results. Very good results from both direct calculation and logical rules are achieved. The results show that the network can select a small set of informative features in this high dimensional dataset. By comparison to other previously proposed methods, our method yields better classification performance.

1. Introduction

An innovation in computational intelligence mechanisms that not only achieves high accuracy but can also be interpreted easily by humans is an interesting research topic. To achieve interpretability, linguistic features are more desirable than other types of features, so an algorithm for finding appropriate symbolic descriptors to represent ordinary continuous features must be developed for the classification mechanism. Numerous research works on neural networks have achieved high classification accuracy [1–4]. Rules generated from a neural network were shown to outperform those from a decision tree under noisy conditions [1]. The subset method, which conducts a breadth-first search over the input links for all hidden and output nodes, was proposed in [2]. Knowledge insertion was applied to reduce training times and improve various features of neural networks [3, 4]. However, these algorithms are difficult to comprehend due to the large number of parameters and the complicated structure inside the networks. Methods to extract rules from neural networks without considering linguistic features have been proposed in some research works [5, 6]. Fuzzy sets are an appropriate choice for turning numeric data into more interpretable linguistic information for humans [7–11]. Methods that transform numeric data into linguistic terms before training and then extract the rules have also been proposed [12–18]. A supervised type of neural network with a structure that supports simple rule extraction was proposed in [12]. Rules were extracted from neural networks using structural learning based on the matrix of importance index [13]. Other rule extraction methods were proposed by determining typical fuzzy membership functions using the expectation maximization (EM) algorithm [14] or by determining context-dependent membership functions for crisp and fuzzy linguistic variables, which allowed different linguistic variables in different rules [15]. Logical rule extraction from data was proposed with the assumption that a set of symbolic or continuous-valued predicate functions has been defined for some objects, thus providing feature values for the categorization of these objects [16]. Nice examples of neurofuzzy methods for a medical prediction problem and a biometric classification problem can be found in [17] and [18], respectively. Nevertheless, discovering proper linguistic features for continuous data representation in these methods involves a tradeoff between simplicity and accuracy.

Due to the linguistic feature requirement in some situations, depending on the classification model, the input features are increased $m$ times, corresponding to the $m$ linguistic terms for each original feature. This problem is more serious in high dimensional datasets. High dimensional feature vectors may contain noninformative features and feature redundancy, which in turn can cause unnecessary computation cost and difficulty in creating classification rules. Therefore, an algorithm for informative feature selection must be developed. Although some neurofuzzy methods have been utilized for feature selection in high dimensional datasets, they are not widely used. On the other hand, there are several research works on the use of neurofuzzy methods as a classifier [19, 20]. In [21], a neurofuzzy method was utilized to select good features by utilizing a relationship between fuzzy criteria. In [22], a neurofuzzy scheme which combines neural networks and fuzzy rule base systems was proposed for simultaneous feature selection and fuzzy rule-based classification. There were three subprocesses in the learning phase: the network structure was changed in phases 2 and 3, and the parameters of the membership functions were fine-tuned in phase 3. Although we have a similar purpose to [22], the network structures and learning methods are different. In addition, our method can automatically create linguistic features, and all parameters are modified automatically in one learning phase without changing the network structure or retraining the network, while in [22] there is more than one learning phase and that algorithm cannot fine-tune parameters without retraining the network with a different structure.

We initially proposed a neurofuzzy method that could select features and simultaneously create rules for low-dimensional datasets [23, 24]. Although the method in this paper is similar, the training process for a high dimensional dataset is different. To emphasize its usefulness, Chen and Lin adopted our method for skin color detection [25]. With the consideration of an uncomplicated network structure, the main components, including the linguistic feature creation and tuning algorithm, feature selection, and rule-based classification, were embedded in one neural network mechanism. The problem of using linguistic features is to define a particular linguistic set for each feature in a given dataset. The fuzzy membership function was therefore embedded in our network for linguistic feature creation, rather than using the ordinary feature. The reason for this combination model is the applicability of the neural network's learning algorithms developed for fuzzy membership functions. The original features were transformed to linguistic features and then classified into informative and noninformative classes, that is, either +1 or −1. Features with high weight values, referred to as the informative features, were therefore selected.

In this paper, we investigate the usefulness of our proposed neurofuzzy method by applying it to the high dimensional diffuse large B-cell lymphomas (DLBCL) microarray classification problem. The number of features in this microarray dataset (7070 features) is much larger than what we have tried previously. Moreover, the number of samples is very small (77 samples). Therefore, this problem is very challenging, and it is interesting to see whether our method would work on this dataset with a huge number of features but a very small number of samples. The findings of informative features and rules will be useful in the diagnosis of this kind of cancer. The results will also indicate the generalization of our method.

This paper is organized as follows. A neurofuzzy method with feature selection and rule extraction and its training scheme designed for a high dimensional dataset are described in Section 2. The experimental setup, data description, experimental results, and discussion are given in Section 3. Section 4 concludes the paper.

2. Neurofuzzy Method with Feature Selection and Rule Extraction

Three requirements are considered in the design of a neural network structure for rule extraction. Low complication of the network structure is the first requirement; therefore, only three layers (an input, a hidden, and an output layer) are constructed in the neural network. Consistent with the first requirement, a small number of linguistic variables is the second requirement. The set of linguistic terms {small, medium, large} is sufficiently understood. The final requirement is that there is only one best combination rule used for classification. The combination rule is created with the consideration of the class order described in the next section.

The neural network designed based on the aforementioned requirements is displayed in Figure 1. The original features are fed forward to the input layer. Each original feature is reproduced $m$ times, corresponding to the number of specified linguistic terms, and used as the input to the hidden layer. Instead of using static linguistic features from preprocessing, we add the hidden layer with fuzzy logic membership functions. A Gaussian membership function is used to represent the membership value of each group. In addition to the weight updating equations, modifications of the center $c$ and the spread $\sigma$ are required during the training process. The second layer is the combination of fuzzification and informative linguistic feature classification. The class of the linguistic features is decided and fed as the input to the output layer.

The number of nodes constructed in the input layer is the same as the number of original input features. For the forward pass, each node is reproduced $m$ times. In our structure, $m$ is equal to 3. The number of outputs from this layer is therefore triple that of the original input features, represented by

$$ y_{ij}^{\text{in}} = x_i, \qquad (1) $$


[Figure 1 shows the three-layer structure: inputs $x_1, x_2, x_3, \ldots, x_p$ in the input layer; S, M, and L membership nodes with bipolar (±1) outputs in the hidden layer; and output nodes $O_1, O_2, \ldots, O_q$.]

Figure 1: Our neurofuzzy classification model.

where $i = 1, 2, \ldots, p$ denotes the order number of the original input and $j = 1, 2, 3$ denotes the order number of the linguistic feature created for each original input. Between the input layer and the hidden layer, all connection weights are set to unity.

The next layer is the hidden layer. In addition to calculating the fuzzy values corresponding to the specified linguistics of the original input, the maximum membership value is classified to the informative class. Since we use 3 linguistic terms, that is, small (S), medium (M), and large (L), for each original feature, the number of nodes constructed in this hidden layer is 3 times that of the input layer. Each node uses a Gaussian membership function to specify the membership values of the original feature, that is,

$$ \mu_{ij} = e^{-\frac{1}{2}\left(y_{ij}^{\text{in}} - c_j\right)^2 / \sigma_j^2}, \qquad (2) $$

where $\mu_{ij}$ is the membership value of $y_{ij}^{\text{in}}$, which is the original input $i$ from the linguistic node $j$. Each node in this layer has two parameters to consider. The first parameter is the spread $\sigma_{ij}$ for each original feature $i$. The initial values of $\sigma_{ij}$ for $j$ = S, M, and L come from the spreads, divided by 3 ($\sigma_{ij}/3$), of all data points in linguistic term $j$. The second parameter is the center $c_{ij}$. The initial values of $c_{ij}$ for $j$ = S, M, and L are set to $(c_{ij} - (\sigma_{ij}/3))$, $c_{ij}$, and $(c_{ij} + (\sigma_{ij}/3))$, respectively, where $c_{ij}$ is the center of the membership set of linguistic term $j$ of original feature $i$.
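This initialization can be sketched in a few lines of NumPy. The sketch below is our illustration, not the authors' code: the exact estimator of the per-term spread is not fully specified in the text, so we use the per-feature standard deviation, which reproduces the equal-spread, equally spaced pattern seen in Table 2 (the center of M at the feature mean, S and L shifted by one spread).

```python
import numpy as np

def init_membership_params(X):
    """Initialize Gaussian membership parameters for the S, M, L terms.

    X: (n_samples, n_features) array of original features.
    Returns centers c[i, j] and spreads s[i, j] for j in (S, M, L).
    NOTE: a sketch only -- the estimator of the initial spread is an
    assumption (per-feature standard deviation), chosen to match the
    equal-spread, equally spaced initial values reported in Table 2.
    """
    mean = X.mean(axis=0)            # center of the Medium term
    sigma = X.std(axis=0)            # common initial spread per feature
    c = np.stack([mean - sigma, mean, mean + sigma], axis=1)  # S, M, L centers
    s = np.stack([sigma, sigma, sigma], axis=1)               # equal spreads
    return c, s
```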

For each original input, the most informative linguistic feature is defined as the feature with the maximum membership value corresponding to the input value $y_{ij}^{\text{in}}$. The parameters $c_j$ and $\sigma_j$, which are the mean and standard deviation of only the most informative linguistic variable, are modified. Considering two linguistic terms (classes), $S$ (the most informative linguistic term) and $T$ (the noninformative linguistic term), the output $y_{ij}^{h}$ of the hidden layer is equal to +1 if the input $y_{ij}^{\text{in}}$ belongs to class $S$. Alternatively, $y_{ij}^{h}$ is equal to −1 if the input belongs to class $T$, for $j = 1, 2, 3$. Therefore, the output from the hidden layer is ±1. Each original feature has one informative linguistic term, represented by +1, and two noninformative linguistic terms, represented by −1. All outputs with identified informative values are used as input to the final layer.

In the output layer, the number of nodes is equal to the number of classes in the dataset. Weights are fully connected between this layer and the hidden layer. Hence, we utilize this layer for the feature selection purpose. The importance of the linguistic features is specified by the corresponding weight values. The sigmoid function is used as the activation function in this layer. Therefore, the output is

$$ y_k^{o} = \varphi(z_k) = \frac{1}{1 + e^{-z_k}}, \qquad (3) $$

where

$$ z_k = \sum_j w_{jk}\, y_j^{h}. \qquad (4) $$

The subscript $k$, ordering from 1 to $q$, represents the class indices of the dataset. $w_{jk}$ represents the weight connected between node $j$ in the hidden layer and node $k$ in the output layer. The summation of the products between the weights and the outputs from the hidden layer is represented by $z_k$.
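To make the forward computation in (1)–(4) concrete, here is a minimal NumPy sketch (our illustration, not the authors' code). The winner-take-all mapping of the maximum membership to +1 and the remaining terms to −1 follows the hidden-layer description above; the array shapes are assumptions.

```python
import numpy as np

def forward_pass(x, c, s, W):
    """One forward pass of the neurofuzzy network for a single sample.

    x: (p,) original features; c, s: (p, 3) Gaussian centers/spreads
    for the S, M, L terms (eq. (2)); W: (3p, q) hidden-to-output weights.
    Returns class outputs y_o (eq. (3)) and bipolar hidden outputs y_h.
    """
    # Eqs. (1)-(2): replicate each feature over its 3 linguistic terms
    mu = np.exp(-0.5 * ((x[:, None] - c) / s) ** 2)        # (p, 3)

    # Hidden layer: +1 for the most informative term, -1 otherwise
    y_h = -np.ones_like(mu)
    y_h[np.arange(mu.shape[0]), mu.argmax(axis=1)] = 1.0
    y_h = y_h.reshape(-1)                                   # (3p,)

    # Eqs. (3)-(4): sigmoid of the weighted sum per class node
    z = y_h @ W                                             # (q,)
    y_o = 1.0 / (1.0 + np.exp(-z))
    return y_o, y_h
```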

2.1. Parameter Tuning. The algorithm for parameter tuning of the proposed model is slightly different from the conventional algorithm, as shown in Algorithm 1.

While (performance ≤ threshold)
    Compute the delta values (local gradient values) at the output nodes using (7)
    Update the weights between the hidden and output layers using (8)
    For each input feature
        If (the hidden node connecting to the input feature belongs to the informative class)
            Compute the delta values (local gradient values) at the hidden nodes using (9)
            Update the mean and standard deviation using (10)
        Else
            Retain the original values
        End If
    End For
End While

Algorithm 1

2.1.1. Weight Modification. The standard error backpropagation algorithm is used in the backward pass of the proposed model. Considering the $k$th neuron of the output layer at iteration $n$, the error signal is defined by

$$ e_k(n) = d_k(n) - y_k^{o}(n), \qquad (5) $$

where $e_k(n)$ and $d_k(n)$ represent the error signal and the desired output, respectively. The output signal of neuron $k$ in the output layer is represented by $y_k^{o}(n)$. The instantaneous error

energy value of neuron $k$ is defined as $\frac{1}{2}e_k^2(n)$. The total error energy can be calculated by

$$ E(n) = \frac{1}{2}\sum_k e_k^2(n). \qquad (6) $$

Backpropagating from the output layer, the delta value (local gradient value) is defined as

$$ \delta^{o}(n) = e_k(n)\,\varphi'(z_k(n)), \qquad (7) $$

where $\varphi'(z_k(n)) = \partial y_k^{o}(n)/\partial z_k(n)$. Given the learning rate $\eta_w$, the connected weights between the hidden layer and the output layer are updated by

$$ w_{jk}(n+1) = w_{jk}(n) - \eta_w\,\delta^{o}(n)\,y_j^{h}(n). \qquad (8) $$

2.1.2. Membership Function Parameter Modification. In the hidden layer, the update process follows the parameter tuning algorithm displayed at the beginning of this section. Only the membership functions of the informative class are updated. Since Gaussian membership functions are chosen for the classified informative linguistic feature, we update 2 parameters, that is, $c$ and $\sigma$. Similar to the output layer, we perform the backpropagation algorithm in this hidden layer with the delta value defined as

$$ \delta^{h}(n) = \varphi'(\mu_{ij}(n)) \sum_k \delta^{o}(n)\, w_{jk}(n). \qquad (9) $$

The parameters $c$ and $\sigma$ belonging to the informative linguistic features at iteration $(n+1)$ are updated by

$$ c_{ij}(n+1) = c_{ij}(n) + \Delta c_{ij}(n), \qquad \sigma_{ij}(n+1) = \sigma_{ij}(n) + \Delta\sigma_{ij}(n), \qquad (10) $$

where $\Delta c_{ij}$ and $\Delta\sigma_{ij}$ are defined by

$$ \Delta c_{ij}(n) = -\eta_c\,\delta^{h}(n)\,y_{ij}^{\text{in}}, \qquad \Delta\sigma_{ij}(n) = -\eta_\sigma\,\delta^{h}(n)\,y_{ij}^{\text{in}}. \qquad (11) $$
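The backward pass in (5)–(11) can be sketched as follows, reusing forward_pass() from the sketch above. This is a near-literal transcription under our reading of Algorithm 1 (only the winning membership function of each feature is adjusted), with two assumptions flagged in the comments: $\varphi'(\mu)$ in (9) is read as a sigmoid-style derivative $\mu(1-\mu)$, and the leading minus signs printed in (8) and (11) are kept as-is, which presumes the paper's delta sign convention.

```python
import numpy as np  # reuses forward_pass() from the sketch above

def train_step(x, d, c, s, W, eta_w=0.1, eta_c=0.001, eta_s=0.001):
    """One training iteration transcribing eqs. (5)-(11) for one sample.

    d: (q,) desired outputs. Updates W, c, s in place and returns them.
    A sketch, not the original code; see the caveats in the lead-in.
    """
    y_o, y_h = forward_pass(x, c, s, W)
    e = d - y_o                                   # eq. (5)
    delta_o = e * y_o * (1.0 - y_o)               # eq. (7): e * phi'(z)
    W -= eta_w * np.outer(y_h, delta_o)           # eq. (8) as printed
                                                  # (conventional backprop would use +=)
    mu = np.exp(-0.5 * ((x[:, None] - c) / s) ** 2)
    for i, j in enumerate(mu.argmax(axis=1)):     # informative term only
        h = 3 * i + j                             # hidden node index
        # eq. (9), reading phi'(mu) as the sigmoid-style derivative
        delta_h = mu[i, j] * (1.0 - mu[i, j]) * np.dot(W[h], delta_o)
        c[i, j] += -eta_c * delta_h * x[i]        # eqs. (10)-(11) as printed
        s[i, j] += -eta_s * delta_h * x[i]
    return W, c, s
```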

Table 1: Genes corresponding to the selected features.

Feature index   Gene
83              MDM4
87              STX16
207             NR1D2
355             DCLRE1A
450             PARK7
546             ATIC
931             HG4263-HT4533 at
2164            CSRP1
2360            NASP
2479            PGK1
6264            HLA-DPB1 2
7043            HLA-A 2
7052            ITK 2
7057            PLGLB2

$\eta_c$ and $\eta_\sigma$ are the learning rates for the parameters $c$ and $\sigma$, respectively.

In Figure 2(a), the Wisconsin breast cancer dataset, with 9 original features and 683 data points, is used to illustrate the initial membership functions. Figure 2(b) illustrates all the corresponding parameters tuned after training. For the DLBCL dataset used in this research, the number of original features is too large; therefore, we cannot display the initial membership functions of all features. However, the means and standard deviations of the initial membership functions of a set of 14 selected features are shown later in Table 2.

2.2. Rule Extraction Methods. For the typical direct calculation in a neural network, the output is $y^{o}$, as shown in (3). The class decision is simply the class with the maximum corresponding output. For the rule extraction purpose, however, the weight values are used to verify the importance of the features after the training process. In each fold of the 10-fold cross validation, after the learning phase, the connection weights between the hidden layer and the output layer are sorted to prioritize the informative features, as described in more detail below.

[Figure 2 originally shows, for each of the nine original features, the S, M, and L membership functions before (a) and after (b) training.]

Figure 2: (a) Initial membership functions of the Wisconsin breast cancer dataset. (b) Updated membership functions after training by the neurofuzzy classification model.


Table 2: Means ($c_j$) and variances ($\sigma_j$) of membership functions S, M, L of the 14 selected features before training for the DLBCL dataset (initial values).

Feature   c_S        c_M        c_L        σ_S       σ_M       σ_L
83        185636     254651     323665     69014     69014     69014
87        200414     255633     310852     55219     55219     55219
207       −164730    −89712     −14693     75018     75018     75018
355       −22187     11675      45538      33862     33862     33862
450       3384316    4126221    4868126    741905    741905    741905
546       1810664    2396104    2981544    585440    585440    585440
931       −148675    −26558     95558      122117    122117    122117
2164      865360     1032987    1200614    167627    167627    167627
2360      1107608    1373013    1638418    265405    265405    265405
2479      25192      42494      59795      17301     17301     17301
6264      5419876    6722623    8025370    1302747   1302747   1302747
7043      11535574   12983805   14432037   1448232   1448232   1448232
7052      259998     393052     526106     133054    133054    133054
7057      422074     516844     611614     94770     94770     94770

The algorithm selects the informative linguistic features twice. The first selection is done by considering the bipolar output from the hidden layer: the linguistic feature with an output of +1 is considered an informative feature. The second selection is done by considering the weight values between the hidden layer and the output layer: the larger the weight value, the more informative the feature. Considering the proposed network for logical rule extraction, the IF part is extracted from the hidden layer. As mentioned previously, we use 3 membership functions representing the linguistic terms S, M, L for each original feature. The summation of the products of the weights and the outputs from the hidden layer is interpreted as the logical OR in the extracted rules. The final classified class from the output layer is interpreted as THEN in the classification rules. After finishing the training phase, the weights are sorted. We use the final values of the weights to select the "Top N" informative features, as sketched below.
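A minimal sketch of this second selection stage, under the shapes assumed in the earlier sketches (a hypothetical helper, not the authors' code): rank the hidden-to-output weights per class node and keep the Top N (feature, term) pairs.

```python
import numpy as np

def top_n_features(W, n, terms=("S", "M", "L")):
    """Rank linguistic features for each class by their weight values.

    W: (3p, q) hidden-to-output weights after training.
    Returns, per class, the n (original_feature_index, term) pairs with
    the largest weights -- the 'Top N' informative linguistic features.
    """
    ranked = []
    for k in range(W.shape[1]):
        order = np.argsort(W[:, k])[::-1][:n]      # largest weights first
        ranked.append([(h // 3, terms[h % 3]) for h in order])
    return ranked
```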

From the structure described earlier, the rule extracted from our network can be interpreted by 2 approaches. Both approaches combine all conditions into only 1 rule. The first approach is the "simple OR" rule. All $N$ features are used to create a simple OR rule for each class. For example, assume a 2-class problem with $N$ set to 3. Considering class 1, suppose the first informative order is "Feature 1 is Large", the second informative order is "Feature 5 is Medium", and the third informative order is "Feature 3 is Small". Considering class 2, suppose the first informative order is "Feature 10 is Small", the second informative order is "Feature 5 is Small", and the third informative order is "Feature 6 is Large". In case of the class order "Class 1 then Class 2", the rule automatically generated by the simple OR approach of our proposed method is Rule 1:

Rule 1:
IF "Feature 1 is Large" OR "Feature 5 is Medium" OR "Feature 3 is Small" THEN "Class is 1"
ELSEIF "Feature 10 is Small" OR "Feature 5 is Small" OR "Feature 6 is Large" THEN "Class is 2"
END

In case of the class order "Class 2 then Class 1", the rule is slightly modified to Rule 2:

Rule 2:
IF "Feature 10 is Small" OR "Feature 5 is Small" OR "Feature 6 is Large" THEN "Class is 2"
ELSEIF "Feature 1 is Large" OR "Feature 5 is Medium" OR "Feature 3 is Small" THEN "Class is 1"
END

The second approach creates a rule with the consideration of the order of the informative linguistic features and the class order. We call this approach the "layered" rule. All $N$ linguistic features are used with the consideration of the order of informative features and class order. In case of the class order "Class 1 then Class 2", the rule automatically generated by the layered approach for the same scenario as above is Rule 3:

Rule 3:
IF "Feature 1 is Large" THEN "Class is 1"
ELSEIF "Feature 10 is Small" THEN "Class is 2"
ELSEIF "Feature 5 is Medium" THEN "Class is 1"
ELSEIF "Feature 5 is Small" THEN "Class is 2"
ELSEIF "Feature 3 is Small" THEN "Class is 1"
ELSEIF "Feature 6 is Large" THEN "Class is 2"
END

In case of the class order "Class 2 then Class 1", the extracted rule will be Rule 4:

Rule 4:
IF "Feature 10 is Small" THEN "Class is 2"
ELSEIF "Feature 1 is Large" THEN "Class is 1"
ELSEIF "Feature 5 is Small" THEN "Class is 2"
ELSEIF "Feature 5 is Medium" THEN "Class is 1"
ELSEIF "Feature 6 is Large" THEN "Class is 2"
ELSEIF "Feature 3 is Small" THEN "Class is 1"
END
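Both rule styles are mechanical given the per-class ranked conditions, so they can be generated automatically. The sketch below is our illustration with hypothetical inputs; running the example reproduces Rules 1 and 3 above.

```python
def simple_or_rule(ranked, class_order):
    """ranked: dict class -> conditions, most informative first."""
    lines = []
    for i, cls in enumerate(class_order):
        kw = "IF" if i == 0 else "ELSEIF"
        cond = " OR ".join(f'"{c}"' for c in ranked[cls])
        lines.append(f'{kw} {cond} THEN "Class is {cls}"')
    return "\n".join(lines + ["END"])

def layered_rule(ranked, class_order):
    """Interleave conditions by informative order, respecting class order."""
    lines = []
    for level in range(max(len(v) for v in ranked.values())):
        for cls in class_order:
            if level < len(ranked[cls]):
                kw = "IF" if not lines else "ELSEIF"
                lines.append(f'{kw} "{ranked[cls][level]}" THEN "Class is {cls}"')
    return "\n".join(lines + ["END"])

# Example reproducing Rules 1 and 3:
ranked = {1: ["Feature 1 is Large", "Feature 5 is Medium", "Feature 3 is Small"],
          2: ["Feature 10 is Small", "Feature 5 is Small", "Feature 6 is Large"]}
print(simple_or_rule(ranked, (1, 2)))
print(layered_rule(ranked, (1, 2)))
```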

2.3. Neurofuzzy Method with Feature Selection and Rule Extraction for a High Dimensional Dataset via the Iterative Partition Method. An iterative partition method is designed for application to a dataset that has a large number of features. The idea is to partition the entire set of features into subclusters; each cluster is used in informative feature selection. The structure of the algorithm is displayed in Figure 3. The first step in the algorithm is to define the desired number of features ($F$) to be used in the final step. The original dataset is partitioned into $n$ subsets, and each subset is used as an input to the neurofuzzy method. All selected features are then combined to create the dataset with selected features. The partitioning and feature selection are performed iteratively until the desired number of informative features is achieved.

[Figure 3 originally diagrams this loop: partition the original dataset into n subsets, select features within each subset, recombine the selected features, and repeat while the number of features exceeds F.]

Figure 3: Iterative partition method for the neurofuzzy method with feature selection and rule extraction for a large dataset.
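The loop in Figure 3 can be sketched as follows. This is our reading of the procedure, assuming a select_features(subset_indices) callback that trains the neurofuzzy network on one feature subset and returns the indices it keeps; the subset size is a hypothetical parameter (the paper does not fix it).

```python
import numpy as np

def iterative_partition(n_features, select_features, F, subset_size=500, rng=None):
    """Repeatedly partition the features and keep only the selected ones
    until at most F informative features remain (Figure 3).

    Assumes select_features() returns a strict subset on each pass so
    that the loop terminates.
    """
    rng = rng or np.random.default_rng()
    keep = np.arange(n_features)
    while keep.size > F:
        rng.shuffle(keep)
        subsets = np.array_split(keep, max(1, keep.size // subset_size))
        # Select informative features within each subset, then recombine
        keep = np.concatenate([select_features(sub) for sub in subsets])
    return keep
```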

3. Experimental Results and Discussion

The diffuse large B-cell lymphomas (DLBCL) dataset, consisting of 77 microarray experiments with 7070 gene expression levels [27], was utilized in this research. It was made available to the public at http://www.broadinstitute.org/cgi-bin/cancer/datasets.cgi. There are two classes in the dataset: diffuse large B-cell lymphoma (DLBCL), referred to as class 1, and follicular lymphoma (FL), referred to as class 2. These 2 types of B-cell lineage malignancies have very different clinical presentations, natural histories, and responses to therapy [27]. Because DLBCLs are the most common lymphoid malignancy in adults, a method that can efficiently classify these 2 lymphomas is therefore very desirable. In the dataset, there are 58 samples of the DLBCL class and 19 samples of the FL class.

Table 3: Means ($c_j$) and variances ($\sigma_j$) of membership functions S, M, L of the 14 selected features after training for the DLBCL dataset (final values).

Feature   c_S        c_M        c_L        σ_S       σ_M       σ_L
83        42217      244422     448788     70018     68702     68640
87        89961      244306     401299     55559     53710     53294
207       −316059    −88806     138408     73797     75289     75806
355       −84419     12311      109255     31586     31900     37146
450       1811843    4047102    6272187    746321    748685    750148
546       594454     2361230    4125430    590479    591615    595343
931       −404615    −26100     348347     124371    124241    125706
2164      552324     1048629    1545862    168960    168791    173626
2360      566830     1351743    2133887    262543    263800    265277
2479      −10591     41380      95801      17087     18011     23355
6264      2972543    6890328    10810781   1308614   1309833   1308325
7043      8683574    12940488   17192838   1424231   1424378   1422902
7052      −21680     387083     797666     137781    137389    139518
7057      217169     507375     803842     100208    102060    98289

The selected informative features from the learning phase are used for the classification task by both direct calculation and logical rule. The 10-fold cross validation is performed in the experiments. The results displayed are the average results on the validation sets over the 10 cross validations. During the training step, because the number of features was too large, the original features were divided into small subsets rather than using the entire 7070 features at once. The 10-fold cross validation was performed on each subset. The informative features from each subset were selected and combined to form the final set of informative features. The learning rate for weight updating was set to 0.1, and the learning rates used in updating $c$ and $\sigma$ were set to 0.001.

3.1. Classification Results on DLBCL Microarrays by Direct Calculation Using Selected Features. We tried several choices of the number of selected features ($N$) and found that $N = 14$ was adequate for this problem. After training, 14 features were automatically selected by our method. The set of features that yielded the best results on the validation sets among those in the 10-fold cross validation consisted of features 83 (MDM4), 87 (STX16), 207 (NR1D2), 355 (DCLRE1A), 450 (PARK7), 546 (ATIC), 931 (HG4263-HT4533 at), 2164 (CSRP1), 2360 (NASP), 2479 (PGK1), 6264 (HLA-DPB1 2), 7043 (HLA-A 2), 7052 (ITK 2), and 7057 (PLGLB2). The genes corresponding to the selected features are shown in Table 1. The values of the means and standard deviations of all membership functions of the 14 selected features before and after training are shown in Tables 2 and 3, respectively.

The classification rates on the validation sets of 10-fold cross validation achieved by using the direct calculation were 92.21%, 89.61%, 84.42%, and 84.42% when the numbers of selected linguistic features were set to 14, 10, 5, and 3, respectively. These results show that the proposed method can select a set of informative features out of a huge pool of features. As shown in Table 4, the classification performance is comparable to those of previously proposed methods [26, 27]. However, rather than using random initial weights connecting the hidden layer and the output layer, we used the weights achieved in the current cross validation as the initial weights for the next cross validation. This constrained weight initialization yielded 100.00%, 97.40%, 90.91%, and 92.21% using the direct calculation when the numbers of selected linguistic features were set to 14, 10, 5, and 3, respectively. To ensure that the set of all parameters achieved here could attain 100% correct classification on this dataset, we used the networks to classify all 77 microarrays (rather than considering the results from 10-fold cross validation, in which the outputs could differ when the random groups differ). We found that each of the 10 networks from the 10-fold cross validation still yielded 100% correct classification on the entire dataset.

3.2. Classification Results on DLBCL Microarrays by Logical Rule Using Selected Features. One of the good features of our method is the automatic rule extraction. Even though this approach usually does not yield performance as good as the direct calculation, it provides rules understandable to humans. This is more desirable from the human interpretation aspect than the black-box based direct calculation.

Table 4: Comparison between the proposed method and other algorithms on the DLBCL dataset.

Method                                                    Number of features selected   Classification rate (%)
Naïve Bayes [26]                                          3–8                           83.76
Our method without constrained weight initialization     10                            89.61
Decision trees [26]                                       3–8                           85.46
Our method without constrained weight initialization     14                            92.21
k-NN [26]                                                 3–8                           88.60
Weighted voting model [27]                                30                            92.20
VizRank [26]                                              3–8                           93.03
Our method with constrained weight initialization        10                            97.40
SVM [26]                                                  3–8                           97.85
Our method with constrained weight initialization        14                            100.00

The $N$ selected linguistic features were used to create rules using both the simple OR and layered approaches mentioned in Section 2.2. A classification rate of 90.91% on the validation sets using only 5 selected linguistic features was achieved by the simple OR rule with the class order "Class 1 then Class 2", where Class 1 and Class 2 denote the DLBCL class and the FL class, respectively. For more details of the results, Table 5 shows the top-10 linguistic features for the DLBCL dataset selected by our method in each cross validation of the 10-fold cross validation. The classification rates in the 10 cross validations were 75.00%, 100.00%, 100.00%, 75.00%, 100.00%, 87.50%, 75.00%, 100.00%, 100.00%, and 100.00%, respectively. The classification rate from the layered rule was 81.82% using the same top 5 linguistic features. The classification rates in the 10 cross validations were 75.00%, 87.50%, 100.00%, 87.50%, 62.50%, 75.00%, 75.00%, 85.71%, 85.71%, and 85.71%, respectively. When using the aforementioned constrained weight initialization, the classification rates were the same. We also tried to increase the number of selected features to 10, but it did not help; the results were the same as those using 5 features.

Table 5: Top-10 linguistic features for the DLBCL dataset selected by our method in 10-fold cross validation (most informative: right; least informative: left).

Cross validation 1, Class 1:
  Linguistic: S M M S L S M L S S
  Original:   7057 207 931 87 2479 7052 546 2360 83 7043
Cross validation 1, Class 2:
  Linguistic: S M L L M S L L L S
  Original:   450 7052 6264 207 83 2360 83 87 7043 546
Cross validation 2, Class 1:
  Linguistic: L L M M S S S M L S
  Original:   355 2479 207 931 87 7052 83 546 2360 7043
Cross validation 2, Class 2:
  Linguistic: S M L L L M S L L S
  Original:   355 7052 83 6264 207 83 2360 87 7043 546
Cross validation 3, Class 1:
  Linguistic: S M S M L S M L S S
  Original:   7052 931 7057 207 2479 87 546 2360 83 7043
Cross validation 3, Class 2:
  Linguistic: L S S L S L M S L L
  Original:   7057 2479 450 207 2360 83 83 546 87 7043
Cross validation 4, Class 1:
  Linguistic: M L M L M S S S L S
  Original:   931 2164 207 2479 546 7052 87 83 2360 7043
Cross validation 4, Class 2:
  Linguistic: S L S L L S M L S L
  Original:   2479 6264 450 207 83 2360 83 87 546 7043
Cross validation 5, Class 1:
  Linguistic: L M L M S S M L S S
  Original:   2164 931 2479 207 7052 87 546 2360 83 7043
Cross validation 5, Class 2:
  Linguistic: S S L S L L M L S L
  Original:   2479 450 6264 2360 83 207 83 87 546 7043
Cross validation 6, Class 1:
  Linguistic: L M S M L S S M L S
  Original:   355 207 7052 931 2479 87 83 546 2360 7043
Cross validation 6, Class 2:
  Linguistic: L L S L M L S L S L
  Original:   7057 6264 450 207 83 83 2360 87 546 7043
Cross validation 7, Class 1:
  Linguistic: S S M M L S M S L S
  Original:   7052 7057 931 207 2479 87 546 83 2360 7043
Cross validation 7, Class 2:
  Linguistic: L S S L L S M L S L
  Original:   6264 2479 450 207 83 2360 83 87 546 7043
Cross validation 8, Class 1:
  Linguistic: L M M S S S L S L S
  Original:   355 546 931 7052 87 7057 2479 83 2360 7043
Cross validation 8, Class 2:
  Linguistic: L L S M S L L L S L
  Original:   7057 207 2360 83 2479 6264 83 87 546 7043
Cross validation 9, Class 1:
  Linguistic: S S M L M S M S L S
  Original:   7057 7052 931 2479 207 87 546 83 2360 7043
Cross validation 9, Class 2:
  Linguistic: L S S L L S M L S L
  Original:   6264 450 2479 207 83 2360 83 87 546 7043
Cross validation 10, Class 1:
  Linguistic: M L M S S S M S L S
  Original:   207 2479 931 87 7052 7057 546 83 2360 7043
Cross validation 10, Class 2:
  Linguistic: L L S L S L M L S L
  Original:   6264 7057 450 207 2360 83 83 87 546 7043


From Table 5, it can be seen that the sets of informative features for class 1 and class 2 differ across the cross validations. That results in a number of different rules. However, in a real application, we have to come up with the best rules among them. We propose to use the summation of the feature informative levels over all 10 cross validations. The feature informative level is simply defined by the informative order. For example, if only 10 linguistic features are considered, the most informative one has a feature informative level of 1.0, the second one 0.9, and so on. Hence, the tenth most informative feature has a feature informative level of 0.1, and the remaining features get a feature informative level of 0.0. The informative levels of each feature are then summed across all cross validations to yield the overall informative level of that feature, as sketched below.
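The aggregation is easy to state in code. A minimal sketch of our reading: each cross validation contributes 1.0 for its most informative linguistic feature down to 0.1 for the tenth, and the contributions are summed per (feature, term) pair, yielding totals like those reported in Table 6.

```python
from collections import defaultdict

def overall_informative_levels(rankings, top=10):
    """rankings: per cross validation, a list of (feature, term) pairs
    ordered from most to least informative."""
    levels = defaultdict(float)
    for ranking in rankings:
        for rank, key in enumerate(ranking[:top]):
            levels[key] += (top - rank) / 10.0   # 1.0, 0.9, ..., 0.1
        # features outside the top list implicitly contribute 0.0
    return dict(levels)

# e.g. the pair (7043, "S") ranked first in all 10 folds sums to 10.0
```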

We show the overall feature informative levels from the 10-fold cross validation using the top-10 features in Table 6. In this case, the highest possible informative level is 10.0. Out of the 42 linguistic features for each class, there were only 12 and 13 linguistic features with nonzero informative levels for class 1 and class 2, respectively. That means the remaining features did not appear at all in the top-10 list of any cross validation. The results showed that, for class 1, the first 5 most informative linguistic features, ranked from most to less informative, were "Feature 7043 (HLA-A 2) is Small", "Feature 2360 (NASP) is Large", "Feature 83 (MDM4) is Small", "Feature 546 (ATIC) is Medium", and "Feature 87 (STX16) is Small", respectively. For class 2, the first 5 most informative linguistic features were "Feature 7043 is Large", "Feature 546 is Small", "Feature 87 is Large", "Feature 83 is Medium", and "Feature 2360 is Small", respectively. It is worth noting that the last statement could also be "Feature 83 is Large" because it has the same feature informative level of 5.5 as "Feature 2360 is Small". This information was used to create rules as described in Section 2.2. Hence, the rule extracted using the simple OR approach is Rule 5:

Rule 5:
IF "HLA-A 2 is Small" OR "NASP is Large" OR "MDM4 is Small" OR "ATIC is Medium" OR "STX16 is Small" THEN "Class is DLBCL"
ELSEIF "HLA-A 2 is Large" OR "ATIC is Small" OR "STX16 is Large" OR "MDM4 is Medium" OR "NASP is Small" THEN "Class is FL"
END

Using the layered approach, the extracted rule is Rule 6:

Rule 6:
IF "HLA-A 2 is Small" THEN "Class is DLBCL"
ELSEIF "HLA-A 2 is Large" THEN "Class is FL"
ELSEIF "NASP is Large" THEN "Class is DLBCL"
ELSEIF "ATIC is Small" THEN "Class is FL"
ELSEIF "MDM4 is Small" THEN "Class is DLBCL"
ELSEIF "STX16 is Large" THEN "Class is FL"
ELSEIF "ATIC is Medium" THEN "Class is DLBCL"
ELSEIF "MDM4 is Medium" THEN "Class is FL"
ELSEIF "STX16 is Small" THEN "Class is DLBCL"
ELSEIF "NASP is Small" THEN "Class is FL"
END

Table 6: Feature informative levels from the 10-fold cross validation using top-10 features.

Class 1
Original feature   Linguistic term   Informative level
7043               S                 10.0
2360               L                 8.7
83                 S                 8.1
546                M                 6.5
87                 S                 5.5
2479               L                 4.2
7052               S                 3.9
207                M                 2.8
931                M                 2.8
7057               S                 1.9
355                L                 0.3
2164               L                 0.3

Class 2
Original feature   Linguistic term   Informative level
7043               L                 9.8
546                S                 9.1
87                 L                 8.1
83                 M                 6.2
2360               S                 5.5
83                 L                 5.5
207                L                 4.1
6264               L                 2.3
450                S                 2.0
2479               S                 1.4
7057               L                 0.5
7052               M                 0.4
355                S                 0.1

4. Conclusion

The classification problem of diffuse large B-cell lymphoma (DLBCL) versus follicular lymphoma (FL) based on high dimensional microarray data was investigated using our neurofuzzy classification scheme. Our direct calculation method could achieve a 100% classification rate on the validation sets of 10-fold cross validation by using only 14 out of the 7070 features in the dataset. These 14 features, including genes MDM4, STX16, NR1D2, DCLRE1A, PARK7, ATIC, HG4263-HT4533 at, CSRP1, NASP, PGK1, HLA-DPB1 2, HLA-A 2, ITK 2, and PLGLB2, were automatically selected by our method. The method could also identify the informative linguistic features for each class. For the DLBCL class, the first 5 most informative linguistic features were "HLA-A 2 is Small", "NASP is Large", "MDM4 is Small", "ATIC is Medium", and "STX16 is Small", respectively. For the FL class, the first 5 most informative linguistic features were "HLA-A 2 is Large", "ATIC is Small", "STX16 is Large", "MDM4 is Medium", and "NASP is Small", respectively. The terms Small, Medium, and Large of each original feature were automatically determined by our method. The informative linguistic features were used to create rules that achieved a 90.91% classification rate on the validation sets of 10-fold cross validation. Even though this rule creation approach yielded worse classification performance than the direct calculation, it is more desirable from the human interpretation aspect. It can be seen that very good results are achieved on this standard high dimensional dataset. The set of selected informative genes will be useful for further investigation in the fields of bioinformatics or medicine. To ensure that this set of selected features can be used in general, it should be applied to more DLBCL versus FL cases.

Conflict of Interests

The authors declare no conflict of interests

References

[1] G. G. Towell, J. W. Shavlik, and M. O. Noordenier, "Refinement of approximate domain theories by knowledge based neural network," in Proceedings of the 8th National Conference on Artificial Intelligence, pp. 861–866, Boston, Mass, USA, July-August 1990.
[2] L. M. Fu, "Knowledge-based connectionism for revising domain theories," IEEE Transactions on Systems, Man and Cybernetics, vol. 23, no. 1, pp. 173–182, 1993.
[3] L. M. Fu, "Learning capacity and sample complexity on expert networks," IEEE Transactions on Neural Networks, vol. 7, no. 6, pp. 1517–1520, 1996.
[4] S. Snyders and C. W. Omlin, "What inductive bias gives good neural network training performance?" in Proceedings of the International Joint Conference on Neural Networks (IJCNN '00), pp. 445–450, Como, Italy, July 2000.
[5] D.-X. Zhang, Y. Liu, and Z.-Q. Wang, "Effectively extracting rules from trained neural networks based on the characteristics of the classification hypersurfaces," in Proceedings of the International Conference on Machine Learning and Cybernetics (ICMLC '05), vol. 3, pp. 1541–1546, IEEE, Guangzhou, China, August 2005.
[6] L. Fu, "Rule generation from neural networks," IEEE Transactions on Systems, Man and Cybernetics, vol. 24, no. 8, pp. 1114–1124, 1994.
[7] F. Beloufa and M. A. Chikh, "Design of fuzzy classifier for diabetes disease using modified artificial bee colony algorithm," Computer Methods and Programs in Biomedicine, vol. 112, no. 1, pp. 92–103, 2013.
[8] J. Luo and D. Chen, "Fuzzy information granulation based decision support applications," in Proceedings of the International Symposium on Information Processing (ISIP '08), pp. 197–201, Moscow, Russia, May 2008.
[9] A. T. Azar, S. A. El-Said, and A. E. Hassanien, "Fuzzy and hard clustering analysis for thyroid disease," Computer Methods and Programs in Biomedicine, vol. 111, no. 1, pp. 1–16, 2013.
[10] C.-H. L. Lee, A. Liu, and W.-S. Chen, "Pattern discovery of fuzzy time series for financial prediction," IEEE Transactions on Knowledge and Data Engineering, vol. 18, no. 5, pp. 613–625, 2006.
[11] R. Yang, Z. Wang, P.-A. Heng, and K.-S. Leung, "Classification of heterogeneous fuzzy data by Choquet integral with fuzzy-valued integrand," IEEE Transactions on Fuzzy Systems, vol. 15, no. 5, pp. 931–942, 2007.
[12] P. Hartono and S. Hashimoto, "An interpretable neural network ensemble," in Proceedings of the 33rd Annual Conference of the IEEE Industrial Electronics Society (IECON '07), pp. 228–232, Taipei, Taiwan, November 2007.
[13] T.-G. Fan and X.-Z. Wang, "A new approach to weighted fuzzy production rule extraction from neural networks," in Proceedings of the International Conference on Machine Learning and Cybernetics, pp. 3348–3351, Shanghai, China, August 2004.
[14] M. Gao, X. Hong, and C. J. Harris, "A neurofuzzy classifier for two class problems," in Proceedings of the 12th UK Workshop on Computational Intelligence (UKCI '12), pp. 1–6, Edinburgh, UK, September 2012.
[15] W. Duch, R. Adamczak, and K. Grabczewski, "A new methodology of extraction, optimization and application of crisp and fuzzy logical rules," IEEE Transactions on Neural Networks, vol. 12, no. 2, pp. 277–306, 2001.
[16] W. Duch, R. Setiono, and J. M. Zurada, "Computational intelligence methods for rule-based data understanding," Proceedings of the IEEE, vol. 92, no. 5, pp. 771–805, 2004.
[17] A. T. Azar, "Adaptive network based on fuzzy inference system for equilibrated urea concentration prediction," Computer Methods and Programs in Biomedicine, vol. 111, no. 3, pp. 578–591, 2013.
[18] A. Ouarda, A. Ahlem, B. Brahim, and B. Kheir, "Comparison of neuro-fuzzy models for classification fingerprint images," International Journal of Computer Applications, vol. 65, no. 9, pp. 12–16, 2013.
[19] Z. Yang, Y. Wang, and G. Ouyang, "Adaptive neuro-fuzzy inference system for classification of background EEG signals from ESES patients and controls," The Scientific World Journal, vol. 2014, Article ID 140863, 8 pages, 2014.
[20] M. S. Al-Batah, N. A. M. Isa, M. F. Klaib, and M. A. Al-Betar, "Multiple adaptive neuro-fuzzy inference system with automatic features extraction algorithm for cervical cancer recognition," Computational and Mathematical Methods in Medicine, vol. 2014, Article ID 181245, 12 pages, 2014.
[21] J.-A. Martinez-Leon, J.-M. Cano-Izquierdo, and J. Ibarrola, "Feature selection applying statistical and neurofuzzy methods to EEG-based BCI," Computational Intelligence and Neuroscience, vol. 2015, Article ID 781207, 17 pages, 2015.
[22] D. Chakraborty and N. R. Pal, "A neuro-fuzzy scheme for simultaneous feature selection and fuzzy rule-based classification," IEEE Transactions on Neural Networks, vol. 15, no. 1, pp. 110–123, 2004.
[23] N. Eiamkanitchat, N. Theera-Umpon, and S. Auephanwiriyakul, "A novel neuro-fuzzy method for linguistic feature selection and rule-based classification," in Proceedings of the 2nd International Conference on Computer and Automation Engineering (ICCAE '10), vol. 2, pp. 247–252, Singapore, February 2010.
[24] N. Eiamkanitchat, N. Theera-Umpon, and S. Auephanwiriyakul, "Colon tumor microarray classification using neural network with feature selection and rule-based classification," in Advances in Neural Network Research and Applications, vol. 67 of Lecture Notes in Electrical Engineering, pp. 363–372, Springer, Berlin, Germany, 2010.
[25] C.-H. Chen and C.-J. Lin, "Compensatory neurofuzzy inference systems for pattern classification," in Proceedings of the International Symposium on Computer, Consumer and Control (IS3C '12), pp. 88–91, IEEE, Taichung, Taiwan, June 2012.
[26] M. Mramor, G. Leban, J. Demsar, and B. Zupan, "Visualization-based cancer microarray data classification analysis," Bioinformatics, vol. 23, no. 16, pp. 2147–2154, 2007.
[27] M. A. Shipp, K. N. Ross, P. Tamayo, et al., "Diffuse large B-cell lymphoma outcome prediction by gene-expression profiling and supervised machine learning," Nature Medicine, vol. 8, no. 1, pp. 68–74, 2002.

Submit your manuscripts athttpwwwhindawicom

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

MathematicsJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Mathematical Problems in Engineering

Hindawi Publishing Corporationhttpwwwhindawicom

Differential EquationsInternational Journal of

Volume 2014

Applied MathematicsJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Probability and StatisticsHindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Mathematical PhysicsAdvances in

Complex AnalysisJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

OptimizationJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

CombinatoricsHindawi Publishing Corporationhttpwwwhindawicom Volume 2014

International Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Operations ResearchAdvances in

Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Function Spaces

Abstract and Applied AnalysisHindawi Publishing Corporationhttpwwwhindawicom Volume 2014

International Journal of Mathematics and Mathematical Sciences

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

The Scientific World JournalHindawi Publishing Corporation httpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Algebra

Discrete Dynamics in Nature and Society

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Decision SciencesAdvances in

Discrete MathematicsJournal of

Hindawi Publishing Corporationhttpwwwhindawicom

Volume 2014 Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Stochastic AnalysisInternational Journal of

Page 2: Review Article On Feature Selection and Rule Extraction ...with feature selection and rule extraction and its training scheme designed for a high dimensional dataset is described in

2 Mathematical Problems in Engineering

linguistic variables in different rules [15] The logical ruleextraction from data was proposed with an assumption thata set of symbolic or continuous valued predicate functionshas been defined for some objects thus providing values offeatures for categorization of these objects [16] Nice examplesof neurofuzzy methods for a medical prediction problem anda biometric classification problem can be found in [17] and[18] respectively Nevertheless to discover proper linguisticfeatures for continuous data representation in these methodsis the tradeoff between simplicity and accuracy

Due to linguistic feature requirement in some situationsdepending on classification models input features are 119898times increased corresponding to 119898 linguistic terms foreach original feature This problem is more serious in highdimensional datasets The high dimensional feature vectorsmay contain noninformative features and feature redundancythat in turn can cause unnecessary computation cost anddifficulty in creating classification rules Therefore an algo-rithm for informative feature selection must be developedAlthough some neurofuzzy methods were utilized for featureselection in high dimensional datasets they are not widelyused On the other hand there are several research workson the use of neurofuzzy methods as a classifier [19 20]In [21] a neurofuzzy method was utilized to select goodfeatures by utilizing a relationship between fuzzy criteria In[22] a neurofuzzy scheme which combines neural networksand fuzzy rule base systems was proposed for simultaneousfeature selection and fuzzy rule-based classification Therewere three subprocesses in the learning phase The networkstructure was changed in phases 2 and 3 and the param-eters of membership functions were fine-tuned in phase 3Although we have similar purpose with [22] the networkstructures and learning methods are different In additionour method can automatically create linguistic features andall parameters are modified automatically in one learningphase without changing the network structure or retrainingnetwork while in [22] there is more than one learning phaseand that algorithm cannot fine-tune parameters withoutretraining network with different structure

We initially proposed a neurofuzzy method that couldselect features and simultaneously create rules for low-dimensional datasets [23 24] Although the method in thispaper is similar the training process for a high dimensionaldataset in this paper is different To emphasize its usefulnessChen and Lin have adopted our method into skin colordetection [25] With the consideration of the uncomplicatednetwork structure the main components including linguisticfeature creation and tuning algorithm feature selectionand rule-based classification were embedded in one neuralnetwork mechanism The problem of using linguistic featureis to define a particular linguistic set for each feature in agiven datasetThe fuzzymembership functionwas embeddedin our network for linguistic feature creation rather thanusing the ordinary feature The reason of this combinationmodel is the applicability of the neural networkrsquos learn-ing algorithms developed for fuzzy membership functionThe original features were transformed to linguistic fea-tures and then classified to informative and noninformativeclasses that is either +1 or minus1 Features with high weight

values referred to as the informative features were thereforeselected

In this paper we investigate the usefulness of our pro-posed neurofuzzy method by applying it to the high dimen-sional diffuse large B-cell lymphomas (DLBCL) microar-ray classification problem The number of features in thismicroarray dataset (7070 features) is much larger than whatwe have tried previously Moreover the number of samplesis very small (77 samples) Therefore this problem is verychallenging and it is interesting to see whether our methodwould work in this dataset with huge number of features butvery small number of samples The findings of informativefeatures and rules will be useful in diagnosis of this kind ofcancer The results will also indicate the generalization of ourmethod

This paper is organized as follows. A neurofuzzy method with feature selection and rule extraction, and its training scheme designed for a high dimensional dataset, is described in Section 2. The experimental setup, data description, experimental results, and discussion are given in Section 3. Section 4 concludes the paper.

2. Neurofuzzy Method with Feature Selection and Rule Extraction

Three requirements are considered in the design of a neural network structure for rule extraction. Low complexity of the network structure is the first requirement; therefore, only three layers (an input, a hidden, and an output layer) are constructed. Consistent with the first requirement, a small number of linguistic variables is the second requirement; the set of linguistic terms {small, medium, large} is sufficiently understandable. The final requirement is that only one best combination rule is used for classification. The combination rule is created with consideration of the class order, as described in the next section.

The neural network designed based on the aforementioned requirements is displayed in Figure 1. The original features are fed forward to the input layer. Each original feature is reproduced m times, corresponding to the number of specified linguistic terms, and used as the input to the hidden layer. Instead of using static linguistic features from preprocessing, we equip the hidden layer with fuzzy membership functions. A Gaussian membership function is used to represent the membership value of each group. In addition to the weight updating equations, modifications of the center c and the spread σ are required during the training process. The second layer thus combines fuzzification and informative linguistic feature classification. The class of each linguistic feature is decided and fed as the input to the output layer.

The number of nodes constructed in the input layer is the same as the number of original input features. In the forward pass, each node is reproduced m times. In our structure, m is equal to 3. The number of outputs from this layer is therefore triple that of the original input features and is represented by

y^{in}_{ij} = x_{i},  (1)


Figure 1: Our neurofuzzy classification model (inputs x_1, ..., x_p in the input layer; S, M, L linguistic nodes with bipolar ±1 outputs in the hidden layer; output nodes O_1, ..., O_q in the output layer).

where i = 1, 2, ..., p denotes the order number of the original input and j = 1, 2, 3 denotes the order number of the linguistic feature created for each original input. Between the input layer and the hidden layer, all connection weights are set to unity.

The next layer is the hidden layer. In addition to calculating the fuzzy values corresponding to the specified linguistic terms of the original input, the term with the maximum membership value is classified to the informative class. Since we use 3 linguistic terms, that is, small (S), medium (M), and large (L), for each original feature, the number of nodes constructed in this hidden layer is three times that of the input layer. Each node uses a Gaussian membership function to specify the membership value of the original feature, that is,

\mu_{ij} = \exp\left(-\frac{1}{2}\,\frac{(y^{in}_{ij} - c_{ij})^{2}}{\sigma_{ij}^{2}}\right),  (2)

where μ_ij is the membership value of y^in_ij, that is, of original input i at linguistic node j. Each node in this layer has two parameters to consider. The first parameter is the spread σ_ij of original feature i; the initial value of σ_ij for j = S, M, and L is one third of the spread of the data (σ_i/3), where σ_i denotes the spread of all data points of feature i. The second parameter is the center c_ij; the initial values of c_ij for j = S, M, and L are set to (c_i − σ_i/3), c_i, and (c_i + σ_i/3), respectively, where c_i denotes the center of the data of original feature i.

For each original input, the most informative linguistic feature is defined as the one with the maximum membership value for the input value y^in_ij. Only the parameters c_ij and σ_ij, the mean and standard deviation of the most informative linguistic variable, are modified. Considering two linguistic classes, S (the most informative linguistic term) and T (the noninformative linguistic terms), the output y^h_ij of the hidden layer is equal to +1 if the input y^in_ij belongs to class S, and to -1 if it belongs to class T, for j = 1, 2, 3. Therefore, the output from the hidden layer is ±1: each original feature has one informative linguistic term represented by +1 and two noninformative linguistic terms represented by -1. All outputs with identified informative values are used as input to the final layer.
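As an illustration only (not the authors' code), the following sketch implements this forward pass in Python with NumPy, under the initialization described above; all function and variable names are hypothetical.

import numpy as np

def init_membership_params(X):
    # X: (num_samples, p) matrix of original features.
    # Initial spread of every S/M/L term is one third of the feature's spread;
    # initial centers are mean - spread/3, mean, and mean + spread/3.
    mean, spread = X.mean(axis=0), X.std(axis=0)
    c = np.stack([mean - spread / 3, mean, mean + spread / 3], axis=1)  # (p, 3)
    sigma = np.tile((spread / 3)[:, None], (1, 3))                      # (p, 3)
    return c, sigma

def hidden_layer(x, c, sigma):
    # Gaussian membership of each of the 3 linguistic terms, as in (2).
    mu = np.exp(-0.5 * ((x[:, None] - c) ** 2) / sigma ** 2)            # (p, 3)
    # The term with maximum membership is informative (+1); the others are -1.
    y_h = np.where(mu == mu.max(axis=1, keepdims=True), 1.0, -1.0)
    return mu, y_h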

In the output layer, the number of nodes is equal to the number of classes in the dataset. Weights are fully connected between this layer and the hidden layer; hence, we utilize this layer for the feature selection purpose. The importance of a linguistic feature is specified by the corresponding weight value. The sigmoid function is used as the activation function in this layer. Therefore, the output is

y^{o}_{k} = \varphi(z_{k}) = \frac{1}{1 + e^{-z_{k}}},  (3)

where

z_{k} = \sum_{j} w_{jk}\, y^{h}_{j}.  (4)

The subscript k, running from 1 to q, represents the class indices of the dataset; w_jk represents the weight connecting node j in the hidden layer and node k in the output layer; and z_k is the summation of the products between the weights and the outputs from the hidden layer.
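A minimal sketch of this output layer under the definitions in (3) and (4) (illustrative only; the flattening of the hidden outputs into one vector is an assumption):

import numpy as np

def output_layer(y_h, W):
    # y_h: length-3p vector of bipolar hidden outputs; W: (3p, q) weight matrix.
    z = y_h @ W                       # z_k in (4)
    y_o = 1.0 / (1.0 + np.exp(-z))    # sigmoid y^o_k in (3)
    return y_o                        # direct calculation: predict np.argmax(y_o)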

2.1. Parameter Tuning. The algorithm for parameter tuning of the proposed model is slightly different from the conventional algorithm, as shown in Algorithm 1.

2.1.1. Weight Modification. The standard error backpropagation algorithm is used in the backward pass of the proposed model. Considering the kth neuron of the output layer at iteration n, the error signal is defined by

e_{k}(n) = d_{k}(n) - y^{o}_{k}(n),  (5)

where e_k(n) and d_k(n) represent the error signal and the desired output, respectively. The output signal of neuron k in the output layer is represented by y^o_k(n). The instantaneous error


While (performance ≤ threshold)
  Compute the delta values (local gradients) at the output nodes using (7)
  Update the weights between the hidden and output layers using (8)
  For each input feature
    If (the hidden node connected to the input feature belongs to the informative class)
      Compute the delta values (local gradients) at the hidden nodes using (9)
      Update the mean and standard deviation using (10)
    Else
      Retain the original values
    End If
  End For
End While

Algorithm 1

energy value of neuron k is defined as (1/2)e_{k}^{2}(n). The total error energy can be calculated by

E(n) = \frac{1}{2}\sum_{k} e_{k}^{2}(n).  (6)

Backpropagating from the output layer, the delta value (local gradient) is defined as

\delta^{o}(n) = e_{k}(n)\,\varphi'(z_{k}(n)),  (7)

where \varphi'(z_{k}(n)) = \partial y^{o}_{k}(n)/\partial z_{k}(n). Given the learning rate \eta_{w}, the weights connecting the hidden layer and the output layer are updated by

w_{jk}(n+1) = w_{jk}(n) - \eta_{w}\,\delta^{o}(n)\,y^{h}_{j}(n).  (8)
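A small illustrative sketch of this weight update step (hypothetical names; the sign convention follows (8) as printed, and the sigmoid derivative form is assumed from (3)):

import numpy as np

def update_weights(W, y_h, y_o, d, eta_w=0.1):
    # Error signal (5) and output delta (7); for the sigmoid in (3),
    # phi'(z) = y_o * (1 - y_o).
    delta_o = (d - y_o) * y_o * (1.0 - y_o)
    # Weight update (8), keeping the minus sign used in the paper.
    return W - eta_w * np.outer(y_h, delta_o)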

2.1.2. Membership Function Parameter Modification. In the hidden layer, the update process follows the parameter tuning algorithm displayed at the beginning of this section. Only the membership functions of the informative class are updated. Since Gaussian membership functions are chosen for the classified informative linguistic features, we update two parameters, that is, c and σ. Similar to the output layer, we perform the backpropagation algorithm in this hidden layer with the delta value defined as

\delta^{h}(n) = \varphi'(\mu_{ij}(n)) \sum_{k} \delta^{o}(n)\, w_{jk}(n).  (9)

The parameters c and σ belonging to the informative linguistic features at iteration (n + 1) are updated by

c_{ij}(n+1) = c_{ij}(n) + \Delta c_{ij}(n),
\sigma_{ij}(n+1) = \sigma_{ij}(n) + \Delta\sigma_{ij}(n),  (10)

where \Delta c_{ij} and \Delta\sigma_{ij} are defined by

\Delta c_{ij}(n) = -\eta_{c}\,\delta^{h}(n)\,y^{in}_{ij},
\Delta\sigma_{ij}(n) = -\eta_{\sigma}\,\delta^{h}(n)\,y^{in}_{ij}.  (11)

Table 1: Genes corresponding to the selected features.

Feature index   Gene
83              MDM4
87              STX16
207             NR1D2
355             DCLRE1A
450             PARK7
546             ATIC
931             HG4263-HT4533 at
2164            CSRP1
2360            NASP
2479            PGK1
6264            HLA-DPB1 2
7043            HLA-A 2
7052            ITK 2
7057            PLGLB2

η_c and η_σ are the learning rates for the parameters c and σ, respectively.

In Figure 2(a), the Wisconsin breast cancer dataset, with 9 original features and 683 data points, is used to illustrate the initial membership functions. Figure 2(b) shows all the corresponding parameters tuned after training. For the DLBCL dataset used in this research, the number of original features is too large to display the initial membership functions of all features. However, the means and standard deviations of the initial membership functions of the 14 selected features are shown later in Table 2.
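The sketch below illustrates the membership function updates (9)-(11) for the informative terms only, as prescribed by Algorithm 1. It is a sketch under stated assumptions, not the authors' implementation; in particular, the form of φ'(μ) is assumed sigmoid-like, and the flattened indexing of W is an assumption.

import numpy as np

def update_membership(c, sigma, mu, y_in, informative, delta_o, W,
                      eta_c=0.001, eta_sigma=0.001):
    # c, sigma, mu, y_in: (p, 3) arrays; W: (3p, q); delta_o: (q,).
    for (i, j) in informative:                 # informative term of each feature
        # Hidden delta (9); phi'(mu) assumed to be mu * (1 - mu).
        back = np.dot(W[3 * i + j, :], delta_o)
        delta_h = mu[i, j] * (1.0 - mu[i, j]) * back
        # Updates (10)-(11); noninformative terms retain their values.
        c[i, j] -= eta_c * delta_h * y_in[i, j]
        sigma[i, j] -= eta_sigma * delta_h * y_in[i, j]
    return c, sigma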

2.2. Rule Extraction Methods. For the typical direct calculation in a neural network, the output is y^o, as shown in (3). The class decision is simply the class with the maximum output. For the rule extraction purpose, however, the weight values are used to verify the importance of the features after the training process. In each fold of the 10-fold cross validation, after the learning phase, the connection weights between the hidden layer and the output layer are sorted to prioritize the informative features, as described in more detail below.


Figure 2: (a) Initial membership functions (S, M, L) of the nine features of the Wisconsin breast cancer dataset. (b) Updated membership functions after training by the neurofuzzy classification model.


Table 2: Means (c_j) and variances (σ_j) of membership functions S, M, L of the 14 selected features before training for the DLBCL dataset.

Feature   c_S        c_M        c_L        σ_S       σ_M       σ_L
83        18.5636    25.4651    32.3665    6.9014    6.9014    6.9014
87        20.0414    25.5633    31.0852    5.5219    5.5219    5.5219
207       -16.4730   -8.9712    -1.4693    7.5018    7.5018    7.5018
355       -2.2187    1.1675     4.5538     3.3862    3.3862    3.3862
450       338.4316   412.6221   486.8126   74.1905   74.1905   74.1905
546       181.0664   239.6104   298.1544   58.5440   58.5440   58.5440
931       -14.8675   -2.6558    9.5558     12.2117   12.2117   12.2117
2164      86.5360    103.2987   120.0614   16.7627   16.7627   16.7627
2360      110.7608   137.3013   163.8418   26.5405   26.5405   26.5405
2479      2.5192     4.2494     5.9795     1.7301    1.7301    1.7301
6264      541.9876   672.2623   802.5370   130.2747  130.2747  130.2747
7043      1153.5574  1298.3805  1443.2037  144.8232  144.8232  144.8232
7052      25.9998    39.3052    52.6106    13.3054   13.3054   13.3054
7057      42.2074    51.6844    61.1614    9.4770    9.4770    9.4770

The algorithm selects the informative linguistic features twice. The first selection is done by considering the bipolar output from the hidden layer: a linguistic feature with an output of +1 is considered informative. The second selection is done by considering the weight values between the hidden layer and the output layer: the larger the weight value, the more informative the feature. Considering the proposed network for logical rule extraction, the IF part is extracted from the hidden layer. As mentioned previously, we use 3 membership functions representing the linguistic terms S, M, L for each original feature. The summation of the products of the weights and the outputs from the hidden layer is interpreted as the logical OR in the extracted rules. The final class decision from the output layer is interpreted as the THEN part of the classification rules. After the training phase is finished, the weights are sorted, and we use the final weight values to select the "Top N" informative features.
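For illustration, a minimal sketch of this second selection step (hypothetical names; assumes the trained weight matrix W from the earlier sketches):

import numpy as np

def top_n_linguistic_features(W, n):
    # W: (3p, q) trained weights between hidden and output layers.
    # For each class k, return the indices of its n linguistic features
    # with the largest weights, ordered least -> most informative.
    return [np.argsort(W[:, k])[-n:] for k in range(W.shape[1])]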

From the structure described earlier, the rules extracted from our network can be interpreted by two approaches; both combine all conditions into only one rule. The first approach is the "simple OR" rule, in which all N features are used to create a simple OR rule for each class. For example, assume a 2-class problem with N set to 3. For class 1, suppose the first, second, and third informative linguistic features are "Feature 1 is Large", "Feature 5 is Medium", and "Feature 3 is Small"; for class 2, suppose they are "Feature 10 is Small", "Feature 5 is Small", and "Feature 6 is Large". In the case of the class order "Class 1 then Class 2", the rule automatically generated by the simple OR approach of our proposed method is Rule 1.

In the case of the class order "Class 2 then Class 1", the rule is slightly modified to Rule 2.

The second approach creates a rule considering both the order of the informative linguistic features and the class order; we call this the "layered" rule. All N linguistic features are arranged according to the informative order and the class order. In the case of the class order "Class 1 then Class 2", the rule automatically generated by the layered approach for the same scenario as above is Rule 3.

In the case of the class order "Class 2 then Class 1", the extracted rule is Rule 4.
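As an illustration of how such rule text can be generated from the ordered feature lists (a sketch with hypothetical names, not the authors' code):

def simple_or_rule(per_class):
    # per_class maps a class label to its ordered condition strings, e.g.
    # {1: ["Feature 1 is Large", "Feature 5 is Medium", "Feature 3 is Small"],
    #  2: ["Feature 10 is Small", "Feature 5 is Small", "Feature 6 is Large"]}.
    lines, kw = [], "IF"
    for cls, feats in per_class.items():
        conds = " OR ".join('"%s"' % f for f in feats)
        lines.append('%s %s THEN "Class is %s"' % (kw, conds, cls))
        kw = "ELSEIF"
    return "\n".join(lines + ["END"])

def layered_rule(per_class):
    # Alternate the classes following the informative order and the class order.
    lines, kw = [], "IF"
    for conds in zip(*per_class.values()):
        for cls, f in zip(per_class.keys(), conds):
            lines.append('%s "%s" THEN "Class is %s"' % (kw, f, cls))
            kw = "ELSEIF"
    return "\n".join(lines + ["END"])

With the example lists above and the class order "Class 1 then Class 2", simple_or_rule reproduces Rule 1 and layered_rule reproduces Rule 3; reversing the class order in the dictionary yields Rules 2 and 4.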

2.3. Neurofuzzy Method with Feature Selection and Rule Extraction for a High Dimensional Dataset via an Iterative Partition Method. An iterative partition method is designed for application to a dataset with a large number of features. The idea is to partition the entire set of features into subclusters, each of which is used for informative feature selection. The structure of the algorithm is displayed in Figure 3. The first step of the algorithm is to define the desired number of features (F) to be used in the final step. The original dataset is partitioned into n subsets, and each subset is used as an input to the neurofuzzy method. All selected features are then combined to create the dataset with selected features. The partitioning and feature selection are performed iteratively until the desired number of informative features is reached.
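A compact sketch of this iterative partition loop (illustrative; select_fn stands for one run of the neurofuzzy feature selection on a feature subset and is a hypothetical placeholder):

import numpy as np

def iterative_partition_select(X, y, F, n_subsets, select_fn):
    # X: (num_samples, num_features); select_fn(X_sub, y, cols) is assumed to
    # return a strict subset of cols judged informative by the neurofuzzy network.
    cols = np.arange(X.shape[1])
    while cols.size > F:
        kept = []
        for part in np.array_split(cols, n_subsets):   # partition the features
            kept.extend(select_fn(X[:, part], y, part))
        cols = np.asarray(kept)                        # combine and repeat
    return cols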

3. Experimental Results and Discussion

The diffuse large B-cell lymphomas (DLBCL) dataset, consisting of 77 microarray experiments with 7,070 gene expression levels [27], was utilized in this research. It was made available to the public at http://www.broadinstitute.org/cgi-bin/cancer/datasets.cgi. There are two classes in the dataset: diffuse large B-cell lymphoma (DLBCL), referred to as class 1, and follicular lymphoma (FL), referred to as class 2. These two types of B-cell lineage malignancies have very different clinical presentations, natural histories, and responses to therapy [27]. Because DLBCLs are the most common lymphoid malignancy in adults, a method that can efficiently classify these two lymphomas is very


Table 3: Means (c_j) and variances (σ_j) of membership functions S, M, L of the 14 selected features after training for the DLBCL dataset.

Feature   c_S        c_M        c_L        σ_S       σ_M       σ_L
83        4.2217     24.4422    44.8788    7.0018    6.8702    6.8640
87        8.9961     24.4306    40.1299    5.5559    5.3710    5.3294
207       -31.6059   -8.8806    13.8408    7.3797    7.5289    7.5806
355       -8.4419    1.2311     10.9255    3.1586    3.1900    3.7146
450       181.1843   404.7102   627.2187   74.6321   74.8685   75.0148
546       59.4454    236.1230   412.5430   59.0479   59.1615   59.5343
931       -40.4615   -2.6100    34.8347    12.4371   12.4241   12.5706
2164      55.2324    104.8629   154.5862   16.8960   16.8791   17.3626
2360      56.6830    135.1743   213.3887   26.2543   26.3800   26.5277
2479      -1.0591    4.1380     9.5801     1.7087    1.8011    2.3355
6264      297.2543   689.0328   1081.0781  130.8614  130.9833  130.8325
7043      868.3574   1294.0488  1719.2838  142.4231  142.4378  142.2902
7052      -2.1680    38.7083    79.7666    13.7781   13.7389   13.9518
7057      21.7169    50.7375    80.3842    10.0208   10.2060   9.8289

Figure 3: Iterative partition method for the neurofuzzy method with feature selection and rule extraction on a large dataset. The original dataset is partitioned into n subsets (Sub 1, ..., Sub n); the features selected from all subsets form a reduced dataset, which is repartitioned until the number of features ≤ F, yielding the final dataset with selected features.

desirable. In the dataset, there are 58 samples of the DLBCL class and 19 samples of the FL class.

The selected informative features from the learning phase are used for the classification task by both direct calculation and logical rules. The 10-fold cross validation is performed in the experiments; the results displayed are the average results on the validation sets over the 10 cross validations. During the training step, because the number of features was too large, the original features were divided into small subsets rather than using the entire 7,070 features at once. The 10-fold cross validation was performed on each subset. The informative features selected from each subset were combined to form the final set of informative features. The learning rate for weight updating was set to 0.1, and the learning rates used in updating c and σ were set to 0.001.

3.1. Classification Results on DLBCL Microarrays by Direct Calculation Using Selected Features. We tried several choices of the number of selected features (N) and found that N = 14 was adequate for this problem. After training, 14 features were automatically selected by our method. The set of features that yielded the best results on the validation sets among those in the 10-fold cross validation consisted of features 83 (MDM4), 87 (STX16), 207 (NR1D2), 355 (DCLRE1A), 450 (PARK7), 546 (ATIC), 931 (HG4263-HT4533 at), 2164 (CSRP1), 2360 (NASP), 2479 (PGK1), 6264 (HLA-DPB1 2), 7043 (HLA-A 2), 7052 (ITK 2), and 7057 (PLGLB2). The genes corresponding to the selected features are shown in Table 1. The values of the means and standard deviations of all membership functions of the 14 selected features before and after training are shown in Tables 2 and 3, respectively.

The classification rates on the validation sets of the 10-fold cross validation achieved by direct calculation were 92.21%, 89.61%, 84.42%, and 84.42% when the numbers of selected linguistic features were set to 14, 10, 5, and 3, respectively. These results show that the proposed method can select a set of informative features out of a huge pool of features. As shown in Table 4, the classification performance is comparable to that of previously proposed methods [26, 27]. However, rather than using random initial weights between the hidden layer and the output layer, we used the weights achieved in the current cross validation as the initial weights for the next cross validation. This constrained weight initialization yielded 100.00%, 97.40%, 90.91%, and 92.21% using the direct calculation when the numbers of selected linguistic features were set to 14, 10,


IF "Feature 1 is Large" OR "Feature 5 is Medium" OR "Feature 3 is Small" THEN "Class is 1"
ELSEIF "Feature 10 is Small" OR "Feature 5 is Small" OR "Feature 6 is Large" THEN "Class is 2"
END

Rule 1

IF "Feature 10 is Small" OR "Feature 5 is Small" OR "Feature 6 is Large" THEN "Class is 2"
ELSEIF "Feature 1 is Large" OR "Feature 5 is Medium" OR "Feature 3 is Small" THEN "Class is 1"
END

Rule 2

IF "Feature 1 is Large" THEN "Class is 1"
ELSEIF "Feature 10 is Small" THEN "Class is 2"
ELSEIF "Feature 5 is Medium" THEN "Class is 1"
ELSEIF "Feature 5 is Small" THEN "Class is 2"
ELSEIF "Feature 3 is Small" THEN "Class is 1"
ELSEIF "Feature 6 is Large" THEN "Class is 2"
END

Rule 3

IF "Feature 10 is Small" THEN "Class is 2"
ELSEIF "Feature 1 is Large" THEN "Class is 1"
ELSEIF "Feature 5 is Small" THEN "Class is 2"
ELSEIF "Feature 5 is Medium" THEN "Class is 1"
ELSEIF "Feature 6 is Large" THEN "Class is 2"
ELSEIF "Feature 3 is Small" THEN "Class is 1"
END

Rule 4

5, and 3, respectively. To ensure that the set of parameters achieved here could attain 100% correct classification on this dataset, we used the networks to classify all 77 microarrays (rather than considering the results from the 10-fold cross validation, in which the outputs could differ when the random groups differ). We found that each of the 10 networks from the 10-fold cross validation still yielded 100% correct classification on the entire dataset.

3.2. Classification Results on DLBCL Microarrays by Logical Rules Using Selected Features. One of the good features of our method is the automatic rule extraction. Even though this approach usually does not perform as well as the direct calculation, it provides rules understandable by humans. This is more desirable from the human interpretation aspect than the black-box direct calculation.

Table 4: Comparison between the proposed method and other algorithms on the DLBCL dataset.

Method                                                  Number of features selected   Classification rate (%)
Naïve Bayes [26]                                        3–8                           83.76
Our method without constrained weight initialization    10                            89.61
Decision trees [26]                                     3–8                           85.46
Our method without constrained weight initialization    14                            92.21
k-NN [26]                                               3–8                           88.60
Weighted voting model [27]                              30                            92.20
VizRank [26]                                            3–8                           93.03
Our method with constrained weight initialization       10                            97.40
SVM [26]                                                3–8                           97.85
Our method with constrained weight initialization       14                            100.00

The N selected linguistic features were used to create rules using both the simple OR and layered approaches mentioned in Section 2.2. A classification rate of 90.91% on the validation sets, using only the top 5 selected linguistic features, was achieved by the simple OR rule with the class order "Class 1 then Class 2", where Class 1 and Class 2 denote the DLBCL class and the FL class, respectively. For more detail, Table 5 shows the top-10 linguistic features for the DLBCL dataset selected by our method in each fold of the 10-fold cross validation. The classification rates in the 10 cross validations were 75.00%, 100.00%, 100.00%, 75.00%, 100.00%, 87.50%, 75.00%, 100.00%, 100.00%, and 100.00%, respectively. The classification rate from the layered rule was 81.82% using the same top 5 linguistic features; the classification rates in the 10 cross validations were 75.00%, 87.50%, 100.00%, 87.50%, 62.50%, 75.00%, 75.00%, 85.71%, 85.71%, and 85.71%, respectively. When using the aforementioned constrained


IF "HLA-A 2 is Small" OR "NASP is Large" OR "MDM4 is Small" OR "ATIC is Medium" OR "STX16 is Small" THEN "Class is DLBCL"
ELSEIF "HLA-A 2 is Large" OR "ATIC is Small" OR "STX16 is Large" OR "MDM4 is Medium" OR "NASP is Small" THEN "Class is FL"
END

Rule 5

Table 5: Top-10 linguistic features for the DLBCL dataset selected by our method in 10-fold cross validation (most informative: right; least informative: left).

Cross validation 1
  Class 1  Linguistic: S M M S L S M L S S
           Original:   7057 207 931 87 2479 7052 546 2360 83 7043
  Class 2  Linguistic: S M L L M S L L L S
           Original:   450 7052 6264 207 83 2360 83 87 7043 546
Cross validation 2
  Class 1  Linguistic: L L M M S S S M L S
           Original:   355 2479 207 931 87 7052 83 546 2360 7043
  Class 2  Linguistic: S M L L L M S L L S
           Original:   355 7052 83 6264 207 83 2360 87 7043 546
Cross validation 3
  Class 1  Linguistic: S M S M L S M L S S
           Original:   7052 931 7057 207 2479 87 546 2360 83 7043
  Class 2  Linguistic: L S S L S L M S L L
           Original:   7057 2479 450 207 2360 83 83 546 87 7043
Cross validation 4
  Class 1  Linguistic: M L M L M S S S L S
           Original:   931 2164 207 2479 546 7052 87 83 2360 7043
  Class 2  Linguistic: S L S L L S M L S L
           Original:   2479 6264 450 207 83 2360 83 87 546 7043
Cross validation 5
  Class 1  Linguistic: L M L M S S M L S S
           Original:   2164 931 2479 207 7052 87 546 2360 83 7043
  Class 2  Linguistic: S S L S L L M L S L
           Original:   2479 450 6264 2360 83 207 83 87 546 7043
Cross validation 6
  Class 1  Linguistic: L M S M L S S M L S
           Original:   355 207 7052 931 2479 87 83 546 2360 7043
  Class 2  Linguistic: L L S L M L S L S L
           Original:   7057 6264 450 207 83 83 2360 87 546 7043
Cross validation 7
  Class 1  Linguistic: S S M M L S M S L S
           Original:   7052 7057 931 207 2479 87 546 83 2360 7043
  Class 2  Linguistic: L S S L L S M L S L
           Original:   6264 2479 450 207 83 2360 83 87 546 7043
Cross validation 8
  Class 1  Linguistic: L M M S S S L S L S
           Original:   355 546 931 7052 87 7057 2479 83 2360 7043
  Class 2  Linguistic: L L S M S L L L S L
           Original:   7057 207 2360 83 2479 6264 83 87 546 7043
Cross validation 9
  Class 1  Linguistic: S S M L M S M S L S
           Original:   7057 7052 931 2479 207 87 546 83 2360 7043
  Class 2  Linguistic: L S S L L S M L S L
           Original:   6264 450 2479 207 83 2360 83 87 546 7043
Cross validation 10
  Class 1  Linguistic: M L M S S S M S L S
           Original:   207 2479 931 87 7052 7057 546 83 2360 7043
  Class 2  Linguistic: L L S L S L M L S L
           Original:   6264 7057 450 207 2360 83 83 87 546 7043


IF "HLA-A 2 is Small" THEN "Class is DLBCL"
ELSEIF "HLA-A 2 is Large" THEN "Class is FL"
ELSEIF "NASP is Large" THEN "Class is DLBCL"
ELSEIF "ATIC is Small" THEN "Class is FL"
ELSEIF "MDM4 is Small" THEN "Class is DLBCL"
ELSEIF "STX16 is Large" THEN "Class is FL"
ELSEIF "ATIC is Medium" THEN "Class is DLBCL"
ELSEIF "MDM4 is Medium" THEN "Class is FL"
ELSEIF "STX16 is Small" THEN "Class is DLBCL"
ELSEIF "NASP is Small" THEN "Class is FL"
END

Rule 6

weight initialization, the classification rates were the same. We also tried increasing the number of selected features to 10, but it did not help; the results were the same as those using 5 features.

From Table 5, it can be seen that the sets of informative features for class 1 and class 2 differ across the cross validations, which results in a number of different rules. In a real application, however, we have to come up with the best rules among them. We propose to use the summation of the feature informative levels over all 10 cross validations. The feature informative level is simply defined by the informative order. For example, if only 10 linguistic features are considered, the most informative one has a feature informative level of 1.0, the second one 0.9, and so on; the tenth most informative feature has a level of 0.1, and the remaining features get a level of 0.0. The informative levels of each feature are then summed across all cross validations to yield the overall informative level of that feature.
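A brief sketch of this scoring scheme (illustrative names only):

def overall_informative_levels(rankings):
    # rankings: one list per fold; each lists that fold's top-10 linguistic
    # features ordered from most to least informative.
    levels = {}
    for fold in rankings:
        for rank, feat in enumerate(fold):
            # Most informative scores 1.0, the tenth scores 0.1.
            levels[feat] = levels.get(feat, 0.0) + (1.0 - 0.1 * rank)
    return sorted(levels.items(), key=lambda kv: -kv[1])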

We show the overall feature informative levels from the 10-fold cross validation using the top-10 features in Table 6. In this case, the highest possible informative level is 10.0. Out of the 42 linguistic features for each class, there were only 12 and 13 linguistic features with nonzero informative levels for class 1 and class 2, respectively; the remaining features did not appear at all in the top-10 list of any cross validation. The results showed that, for class 1, the first 5 most informative linguistic features, ranked from the most to the less informative, were "Feature 7043 (HLA-A 2) is Small", "Feature 2360 (NASP) is Large", "Feature 83 (MDM4) is Small", "Feature 546 (ATIC) is Medium", and "Feature 87 (STX16) is Small". For class 2, the first 5 most informative linguistic features were "Feature 7043 is Large", "Feature 546 is Small", "Feature 87 is Large", "Feature 83 is Medium", and "Feature 2360 is Small". It is worth noting that the last statement could also be "Feature 83 is Large", because it has the same feature informative level of 5.5 as "Feature 2360 is Small". This information was used to create rules as described in Section 2.2. Hence, the rule extracted using the simple OR approach is Rule 5.

Using the layered approach, the extracted rule is Rule 6.

Table 6: Feature informative levels from the 10-fold cross validation using top-10 features.

Class 1
Original feature   Linguistic term   Informative level
7043               S                 10.0
2360               L                 8.7
83                 S                 8.1
546                M                 6.5
87                 S                 5.5
2479               L                 4.2
7052               S                 3.9
207                M                 2.8
931                M                 2.8
7057               S                 1.9
355                L                 0.3
2164               L                 0.3

Class 2
Original feature   Linguistic term   Informative level
7043               L                 9.8
546                S                 9.1
87                 L                 8.1
83                 M                 6.2
2360               S                 5.5
83                 L                 5.5
207                L                 4.1
6264               L                 2.3
450                S                 2.0
2479               S                 1.4
7057               L                 0.5
7052               M                 0.4
355                S                 0.1

4. Conclusion

The classification problem of diffuse large B-cell lymphoma (DLBCL) versus follicular lymphoma (FL) based on high dimensional microarray data was investigated with our neurofuzzy classification scheme. Our direct calculation method achieved a 100% classification rate on the validation sets of the 10-fold cross validation by using only 14 out of the 7,070 features in the dataset. These 14 features, comprising the genes MDM4, STX16, NR1D2, DCLRE1A, PARK7, ATIC, HG4263-HT4533 at, CSRP1, NASP, PGK1, HLA-DPB1 2, HLA-A 2, ITK 2, and PLGLB2, were automatically selected by our method. The method could also identify the informative linguistic features for each class. For the DLBCL class, the first 5 most informative linguistic features were "HLA-A 2 is Small", "NASP is Large", "MDM4 is Small", "ATIC is Medium", and "STX16 is Small". For the FL class, the first 5 most informative linguistic features were "HLA-A 2 is Large", "ATIC is Small", "STX16 is Large", "MDM4 is Medium", and "NASP is Small". The terms Small, Medium, and Large of each original feature were automatically determined by our method. The informative linguistic features were used to create rules that achieved a 90.91% classification rate on the validation sets of the 10-fold


cross validation. Even though this rule creation approach yielded worse classification performance than the direct calculation, it is more desirable from the human interpretation aspect. Very good results were achieved on this standard high dimensional dataset. The set of selected informative genes will be useful for further investigation in the fields of bioinformatics and medicine. To ensure that this set of selected features can be used in general, it should be applied to more DLBCL versus FL cases.

Conflict of Interests

The authors declare no conflict of interests

References

[1] G. G. Towell, J. W. Shavlik, and M. O. Noordenier, "Refinement of approximate domain theories by knowledge based neural network," in Proceedings of the 8th National Conference on Artificial Intelligence, pp. 861–866, Boston, Mass, USA, July-August 1990.

[2] L. M. Fu, "Knowledge-based connectionism for revising domain theories," IEEE Transactions on Systems, Man, and Cybernetics, vol. 23, no. 1, pp. 173–182, 1993.

[3] L. M. Fu, "Learning capacity and sample complexity on expert networks," IEEE Transactions on Neural Networks, vol. 7, no. 6, pp. 1517–1520, 1996.

[4] S. Snyders and C. W. Omlin, "What inductive bias gives good neural network training performance?" in Proceedings of the International Joint Conference on Neural Networks (IJCNN '00), pp. 445–450, Como, Italy, July 2000.

[5] D.-X. Zhang, Y. Liu, and Z.-Q. Wang, "Effectively extracting rules from trained neural networks based on the characteristics of the classification hypersurfaces," in Proceedings of the International Conference on Machine Learning and Cybernetics (ICMLC '05), vol. 3, pp. 1541–1546, IEEE, Guangzhou, China, August 2005.

[6] L. Fu, "Rule generation from neural networks," IEEE Transactions on Systems, Man, and Cybernetics, vol. 24, no. 8, pp. 1114–1124, 1994.

[7] F. Beloufa and M. A. Chikh, "Design of fuzzy classifier for diabetes disease using modified artificial bee colony algorithm," Computer Methods and Programs in Biomedicine, vol. 112, no. 1, pp. 92–103, 2013.

[8] J. Luo and D. Chen, "Fuzzy information granulation based decision support applications," in Proceedings of the International Symposium on Information Processing (ISIP '08), pp. 197–201, Moscow, Russia, May 2008.

[9] A. T. Azar, S. A. El-Said, and A. E. Hassanien, "Fuzzy and hard clustering analysis for thyroid disease," Computer Methods and Programs in Biomedicine, vol. 111, no. 1, pp. 1–16, 2013.

[10] C.-H. L. Lee, A. Liu, and W.-S. Chen, "Pattern discovery of fuzzy time series for financial prediction," IEEE Transactions on Knowledge and Data Engineering, vol. 18, no. 5, pp. 613–625, 2006.

[11] R. Yang, Z. Wang, P.-A. Heng, and K.-S. Leung, "Classification of heterogeneous fuzzy data by Choquet integral with fuzzy-valued integrand," IEEE Transactions on Fuzzy Systems, vol. 15, no. 5, pp. 931–942, 2007.

[12] P. Hartono and S. Hashimoto, "An interpretable neural network ensemble," in Proceedings of the 33rd Annual Conference of the IEEE Industrial Electronics Society (IECON '07), pp. 228–232, Taipei, Taiwan, November 2007.

[13] T.-G. Fan and X.-Z. Wang, "A new approach to weighted fuzzy production rule extraction from neural networks," in Proceedings of the International Conference on Machine Learning and Cybernetics, pp. 3348–3351, Shanghai, China, August 2004.

[14] M. Gao, X. Hong, and C. J. Harris, "A neurofuzzy classifier for two class problems," in Proceedings of the 12th UK Workshop on Computational Intelligence (UKCI '12), pp. 1–6, Edinburgh, UK, September 2012.

[15] W. Duch, R. Adamczak, and K. Grabczewski, "A new methodology of extraction, optimization and application of crisp and fuzzy logical rules," IEEE Transactions on Neural Networks, vol. 12, no. 2, pp. 277–306, 2001.

[16] W. Duch, R. Setiono, and J. M. Zurada, "Computational intelligence methods for rule-based data understanding," Proceedings of the IEEE, vol. 92, no. 5, pp. 771–805, 2004.

[17] A. T. Azar, "Adaptive network based on fuzzy inference system for equilibrated urea concentration prediction," Computer Methods and Programs in Biomedicine, vol. 111, no. 3, pp. 578–591, 2013.

[18] A. Ouarda, A. Ahlem, B. Brahim, and B. Kheir, "Comparison of neuro-fuzzy models for classification fingerprint images," International Journal of Computer Applications, vol. 65, no. 9, pp. 12–16, 2013.

[19] Z. Yang, Y. Wang, and G. Ouyang, "Adaptive neuro-fuzzy inference system for classification of background EEG signals from ESES patients and controls," The Scientific World Journal, vol. 2014, Article ID 140863, 8 pages, 2014.

[20] M. S. Al-Batah, N. A. M. Isa, M. F. Klaib, and M. A. Al-Betar, "Multiple adaptive neuro-fuzzy inference system with automatic features extraction algorithm for cervical cancer recognition," Computational and Mathematical Methods in Medicine, vol. 2014, Article ID 181245, 12 pages, 2014.

[21] J.-A. Martinez-Leon, J.-M. Cano-Izquierdo, and J. Ibarrola, "Feature selection applying statistical and neurofuzzy methods to EEG-based BCI," Computational Intelligence and Neuroscience, vol. 2015, Article ID 781207, 17 pages, 2015.

[22] D. Chakraborty and N. R. Pal, "A neuro-fuzzy scheme for simultaneous feature selection and fuzzy rule-based classification," IEEE Transactions on Neural Networks, vol. 15, no. 1, pp. 110–123, 2004.

[23] N. Eiamkanitchat, N. Theera-Umpon, and S. Auephanwiriyakul, "A novel neuro-fuzzy method for linguistic feature selection and rule-based classification," in Proceedings of the 2nd International Conference on Computer and Automation Engineering (ICCAE '10), vol. 2, pp. 247–252, Singapore, February 2010.

[24] N. Eiamkanitchat, N. Theera-Umpon, and S. Auephanwiriyakul, "Colon tumor microarray classification using neural network with feature selection and rule-based classification," in Advances in Neural Network Research and Applications, vol. 67 of Lecture Notes in Electrical Engineering, pp. 363–372, Springer, Berlin, Germany, 2010.

[25] C.-H. Chen and C.-J. Lin, "Compensatory neurofuzzy inference systems for pattern classification," in Proceedings of the International Symposium on Computer, Consumer and Control (IS3C '12), pp. 88–91, IEEE, Taichung, Taiwan, June 2012.

[26] M. Mramor, G. Leban, J. Demsar, and B. Zupan, "Visualization-based cancer microarray data classification analysis," Bioinformatics, vol. 23, no. 16, pp. 2147–2154, 2007.

[27] M. A. Shipp, K. N. Ross, P. Tamayo et al., "Diffuse large B-cell lymphoma outcome prediction by gene-expression profiling and supervised machine learning," Nature Medicine, vol. 8, no. 1, pp. 68–74, 2002.


Page 3: Review Article On Feature Selection and Rule Extraction ...with feature selection and rule extraction and its training scheme designed for a high dimensional dataset is described in

Mathematical Problems in Engineering 3

x1

x2

x3

xp

S

M

L

S

M

L

S

M

L

S

M

L

plusmn1

plusmn1

plusmn1

plusmn1

plusmn1

plusmn1

plusmn1

plusmn1

plusmn1

plusmn1

plusmn1

plusmn1

O1

O2

Oq

Input layer Hidden layer Output layer

Figure 1 Our neurofuzzy classification model

where 119894 = 1 2 119901 denotes the order number of the originalinput and 119895 = 1 2 3 denotes the order number of thelinguistic feature created for each original input Between theinput layer and hidden layer all connection weights are set tounity

The next layer is the hidden layer In addition to calcu-lating fuzzy values corresponding to the specified linguisticsof the original input the maximum membership value isclassified to the informative class Since we use 3 linguisticterms that is small (S) medium (M) and large (L) foreach original feature the number of nodes constructed inthis hidden layer is 3 times of that of the input layer Eachnode uses Gaussian membership function to specify themembership values of the original feature that is

120583119894119895= 119890(minus(12)(119910

in119894119895minus119888119895)21205902

119895) (2)

120583119894119895is the membership value of 119910in

119894119895which is the original input

119894 from the linguistic node 119895 Each node in this layer has two

parameters to consider The first parameter is the spread 120590119894119895

for each original feature 119894 The initial values of 120590119894119895for 119895 = S

M and L come from the spreads divided by 3 (1205901198941198953) of all

data points in linguistic term 119895 The second parameter is thecenter 119888

119894119895 The initial values of 119888

119894119895for 119895 = S M and L are set to

(119888119894119895minus(1205901198941198953)) 119888

119894119895 and (119888

119894119895+(1205901198941198953)) respectively where 119888

119894119895is the

center of the membership set of linguistic term 119895 of originalfeature 119894

For each original input the most informative linguisticfeature is defined as the feature with maximum membershipvalue corresponding to the input value 119910in

119894119895 The parameters

119888119895and 120590

119895which are mean and standard deviation of only the

most informative linguistic variable are modified Considertwo linguistic terms (classes) of 119878 (the most informativelinguistic term) and 119879 (the noninformative linguistic term)the output 119910ℎ

119894119895of the hidden layer is equal to +1 if the input

119910in119894119895

belongs to class 119878 Alternatively 119910ℎ119894119895is equal to minus1 if

the input belongs to class 119879 for 119895 = 1 2 3 Therefore theoutput from the hidden layer is plusmn1 Each original featurehas one informative linguistic term represented by +1 andtwo noninformative linguistic terms represented by minus1 Alloutputs with identified informative values are used as inputto the final layer

In the output layer the number of nodes is equal to thenumber of the classes in the dataset Weights are fully con-nected between this layer and hidden layer Hence we utilizethis layer for feature selection purpose The importance oflinguistic features is specified by the corresponding weightvalues Sigmoid function is used as the activation function inthis layer Therefore the output is

119910119900

119896= 120593 (119911

119896) =

1

(1 + 119890minus119911119896)

(3)

where

119911119896= sum

119895

119908119895119896119910ℎ

119895 (4)

The subscript 119896 ordering from 1 to 119902 represents the classindices of the dataset 119908

119895119896represents the weight connected

between node 119895 in the hidden layer and node 119896 in the outputlayer The summation of the product between weights andoutputs from the hidden layer is represented by 119911

119896

21 Parameter Tuning Thealgorithm for parameter tuning ofthe proposedmodel is slightly different from the conventionalalgorithm as Algorithm 1

211 Weight Modification The standard error backpropaga-tion algorithm is used in the backward pass of the proposedmodel Consider the 119896th neuron of the output layer atiteration 119899 the error signal is defined by

119890119896 (119899) = 119889119896 (

119899) minus 119910119900

119896(119899) (5)

where 119890119896(119899) and 119889

119896(119899) represent the error signal and desired

output respectively The output signal of neural 119896 in theoutput layer is represented by 119910119900

119896(119899) The instantaneous error

4 Mathematical Problems in Engineering

While (performance le threshold)Compute the delta values (local gradient values) at the output nodes using (7)Update weights between hidden and output layers using (8)For each input features

If (hidden node connecting to input feature belongs to informative class)Compute the delta values (local gradient values) at the output nodes using (9)Update mean and standard deviation using (10)

ElseRetain original values

End IfEnd For

End While

Algorithm 1

energy value of neuron 119896 is defined as (12)1198902119896(119899) The total

error energy of neuron 119896 can be calculated by

119864 (119899) =

1

2

sum

119896

1198902

119896(119899) (6)

Backpropagating from the output layer the delta value (localgradient value) is defined as

120575119900(119899) = 119890119896

1205931015840(119911119896 (119899)) (7)

where 1205931015840(119911119896(119899)) = 120597119910

119896(119899)120597119911

119896(119899) Given the learning rate

120578119908 the connected weights between the hidden layer and the

output layer are updated by

119908119895119896 (119899 + 1) = 119908119895119896 (

119899) minus 120578119908120575119900(119899) 119910ℎ

119895(119899) (8)

212 Membership Function Parameter Modification In thehidden layer the update process follows the parameter tuningalgorithm displayed at the beginning of this section Onlymembership functions of the informative class are updatedSinceGaussianmembership functions are chosen for the clas-sified informative linguistic feature we update 2 parametersthat is 119888 and 120590 Similar to the output layer we perform thebackpropagation algorithm in this hidden layer with the deltavalue defined as

120575ℎ(119899) = 120593

1015840(120583119894119895 (119899))sum

119896

120575119900(119899) 119908119895119896 (

119899) (9)

The parameters 119888 and 120590 belonging to the informative linguis-tic features at iteration (119899 + 1) are updated by

119888119894119895 (119899 + 1) = 119888119894119895 (

119899) + Δ119888119894119895 (119899)

120590119894119895 (119899 + 1) = 120590119894119895 (

119899) + Δ120590119894119895 (119899)

(10)

where Δ119888119894119895and Δ120590

119894119895are defined by

Δ119888119894119895 (119899) = minus120578119888

120575ℎ(119899) 119910

in119894119895

Δ120590119894119895 (119899) = minus120578120590

120575ℎ(119899) 119910

in119894119895

(11)

Table 1 Genes corresponding to the selected features

Feature index Gene83 MDM487 STX16207 NR1D2355 DCLRE1A450 PARK7546 ATIC931 HG4263-HT4533 at2164 CSRP12360 NASP2479 PGK16264 HLA-DPB1 27043 HLA-A 27052 ITK 27057 PLGLB2

120578119888and 120578

120590are the learning rates for the parameters 119888 and 120590

respectivelyIn Figure 2(a) the Wisconsin breast cancer dataset with

9 original features and 683 data points is used to illustratethe initial membership functions Figure 2(b) shows the illus-trators of all corresponding parameters tuned after trainingFor the DLBCL dataset used in this research the number oforiginal features is too large Therefore we cannot displaythe initial membership functions of all features However themeans and standard deviations of a set of 14 selected featuresrsquoinitial membership functions of are shown later in Table 1

22 Rule Extraction Methods For the typical direct calcula-tion in a neural network the output is 119910119900 as shown in (3)The class decision is simply the class with the correspondingmaximum output For the rule extraction purpose howeverthe weight values are used to verify the importance of thefeatures after the training process In each fold of the 10-fold cross validation after the learning phase the connectionweights between the hidden layer and the output layer aresorted to prioritize the informative features to be describedin more detail below

Mathematical Problems in Engineering 5

SML

SML

SML

Original MF of feature 1 Original MF of feature 2 Original MF of feature 3

Original MF of feature 4

002040608

1

2 4 6 8 1000

02040608

1

2 4 6 8 100

Original MF of feature 5

2 4 6 8 100

Original MF of feature 6

002040608

1

0

05

1

0

05

1

0

05

1

0

05

1

0

05

1

0

05

1

2 4 8 100 6

Original MF of feature 9

2 4 6 8 100

Original MF of feature 8

2 4 8 100 6

Original MF of feature 7

2 4 6 8 100 2 4 6 8 100 2 4 6 8 100

(a)

Updated MF of feature 1 Updated MF of feature 2 Updated MF of feature 3

SML

SML

SML

002040608

1

0

05

1

0

05

1

0

05

1

0

05

1

002040608

1

002040608

1

0

05

1

0

05

1

2 4 6 8 100

Updated MF of feature 4

2 4 6 8 100

Updated MF of feature 5

2 4 6 8 100

Updated MF of feature 6

2 4 6 8 100

Updated MF of feature 7

2 4 6 8 100

Updated MF of feature 8

2 4 6 8 100

Updated MF of feature 9

2 4 6 8 1002 4 6 8 1002 4 6 8 100

(b)

Figure 2 (a) Initial membership functions of the Wisconsin breast cancer dataset (b) Updated membership functions after training by theneurofuzzy classification model

6 Mathematical Problems in Engineering

Table 2 Means (119888119895) and variances (120590

119895) of membership functions S M L of 14 selected features before training for the DLBCL dataset

Initial valuesFeature 119888S 119888M 119888L 120590S 120590M 120590L

83 185636 254651 323665 69014 69014 6901487 200414 255633 310852 55219 55219 55219207 minus164730 minus89712 minus14693 75018 75018 75018355 minus22187 11675 45538 33862 33862 33862450 3384316 4126221 4868126 741905 741905 741905546 1810664 2396104 2981544 585440 585440 585440931 minus148675 minus26558 95558 122117 122117 1221172164 865360 1032987 1200614 167627 167627 1676272360 1107608 1373013 1638418 265405 265405 2654052479 25192 42494 59795 17301 17301 173016264 5419876 6722623 8025370 1302747 1302747 13027477043 11535574 12983805 14432037 1448232 1448232 14482327052 259998 393052 526106 133054 133054 1330547057 422074 516844 611614 94770 94770 94770

The algorithm selects the informative linguistic featurestwice The first selection is done by considering the bipolaroutput from the hidden layer The linguistic feature withoutput of +1 is considered an informative featureThe secondselection is done by considering the weight values betweenthe hidden layer and the output layer The larger weightvalue indicates the more informative feature Consider theproposed network for logical rule extraction the IF part isextracted from the hidden layer As mentioned previouslywe use 3 membership functions representing the linguisticterms SM L for each original feature The summation ofthe product of weights and output from the hidden layer isinterpreted to the logical OR in the extracted rules The finalclassified class from the output layer is interpreted to THENin the classification rules After finishing the training phasethe weights are sorted We use the final values of weights toselect the ldquoTop119873rdquo informative features

From the structure described earlier the rule extractedfrom our network can be interpreted by 2 approaches Bothapproaches combine all conditions to only 1 rule The firstapproach is the ldquosimple ORrdquo rule All 119873 features are used tocreate a simple OR rule of each class for example when a 2-class problem is assumed and119873 is set to 3 Considering class 1if the first informative order is ldquoFeature 1 is Largerdquo the secondinformative order is ldquoFeature 5 is Mediumrdquo and the thirdinformative order is ldquoFeature 3 is Smallrdquo Considering class2 if the first informative order is ldquoFeature 10 is Smallrdquo thesecond informative order is ldquoFeature 5 is Smallrdquo and the thirdinformative order is ldquoFeature 6 is Largerdquo In case of the classorder ldquoClass 1 then Class 2rdquo the rule automatically generatedfrom the simple OR approach of our proposed method is asRule 1

In case of the class order ldquoClass 2 then Class 1rdquo the rulewill be slightly modified to Rule 2

The second approach creates a rule with the considerationof the order of informative linguistic features and class orderWe call this approach the ldquolayeredrdquo rule All N linguisticfeatures are created with the consideration of the order of

informative features and class order In case of the class orderldquoClass 1 then Class 2rdquo the rule automatically generated fromthe layered approach for the same scenario as above is asRule 3

In case of the class order ldquoClass 2 then Class 1rdquo theextracted rule will be Rule 4

23 Neurofuzzy Method with Feature Selection and RuleExtraction for High Dimensional Dataset via Iterative Par-tition Method An iterative partition method is designedfor the application on a dataset that has a large amount offeatures The idea is to partition the entire set of featuresinto subclusters Each cluster is used in informative featureselection The structure of the algorithm is displayed inFigure 3The first step in the algorithm is to define the desirednumber of features (119865) to be used in the final stepTheoriginaldataset is partitioned into 119899 subset Each subset is used asan input to the neurofuzzy method All selected features arethen combined to create the dataset with selected featuresThe partitioning and feature selection is iteratively performeduntil the desired number of informative features is achieved

3 Experimental Results and Discussion

The diffuse large B-cell lymphomas (DLBCL) dataset con-sisting of 77 microarray experiments with 7070 gene expres-sion levels [27] was utilized in this research It was madeavailable to the public at httpwwwbroadinstituteorgcgi-bincancerdatasetscgi There are two classes in thedataset that is diffuse large B-cell lymphoma (DLBCL) isreferred to as class 1 and follicular lymphoma (FL) is referredto as class 2 These 2 types of B-cell lineage malignancieshave very different clinical presentations natural historiesand response to therapy [27] Because DLBCLs are the mostcommon lymphoid malignancy in adults a method thatcan efficiently classify these 2 lymphomas is therefore very

Mathematical Problems in Engineering 7

Table 3 Means (119888119895) and variances (120590

119895) of membership functions S M L of 14 selected features after training for the DLBCL dataset

Final valuesFeature 119888S 119888M 119888L 120590S 120590M 120590L

83 42217 244422 448788 70018 68702 6864087 89961 244306 401299 55559 53710 53294207 minus316059 minus88806 138408 73797 75289 75806355 minus84419 12311 109255 31586 31900 37146450 1811843 4047102 6272187 746321 748685 750148546 594454 2361230 4125430 590479 591615 595343931 minus404615 minus26100 348347 124371 124241 1257062164 552324 1048629 1545862 168960 168791 1736262360 566830 1351743 2133887 262543 263800 2652772479 minus10591 41380 95801 17087 18011 233556264 2972543 6890328 10810781 1308614 1309833 13083257043 8683574 12940488 17192838 1424231 1424378 14229027052 minus21680 387083 797666 137781 137389 1395187057 217169 507375 803842 100208 102060 98289

No

Original dataset

Sub 1 Sub 2 Sub 3 Sub 4 Sub n

Dataset with selected features

Partition into n subsets

Final dataset with selected features

Yes

middot middot middot

features le F

Figure 3 Iterative partition method for neurofuzzy method withfeature selection and rule extraction for large dataset

desirable In the dataset there are 58 samples of DLBCL classand 19 samples of FL class

The selected informative features from the learning phaseare used for classification task in both direct calculationand logical rule The 10-fold cross validation is performedin the experiments The results displayed are the averageresults on validation sets over 10 cross validations Duringthe training step because the number of features was toolarge the original features were divided into small subsets

rather than using the entire 7070 features at once The 10-fold cross validation was performed on each subset Theinformative features from each subset were selected and formthe final informative features by combining them togetherThe learning rate for weight updating was set to 01 and thelearning rates used in updating 119888 and 120590 were set to 0001

31 Classification Results on DLBCL Microarrays by DirectCalculation Using Selected Features We tried several choicesof the number of selected features (119873) and found that119873 = 14 was adequate for this problem After training 14features were automatically selected by our method The setof features that yielded the best results on validation setsamong those in 10-fold cross validation consisted of features83 (MDM4) 87 (STX16) 207 (NR1D2) 355 (DCLRE1A)450 (PARK7) 546 (ATIC) 931 (HG4263-HT4533 at) 2164(CSRP1) 2360 (NASP) 2479 (PGK1) 6264 (HLA-DPB1 2)7043 (HLA-A 2) 7052 (ITK 2) and 7057 (PLGLB2) Thegenes corresponding to the selected features are shown inTable 1 The values of means and standard deviations of allmembership functions of the 14 selected features before andafter training are shown in Tables 2 and 3 respectively

The classification rates on the validation sets of 10-fold cross validation achieved by using the direct calculation were 92.21%, 89.61%, 84.42%, and 84.42% when the numbers of selected linguistic features were set to 14, 10, 5, and 3, respectively. These results show that the proposed method can select a set of informative features out of a huge pool of features. As shown in Table 4, the classification performance is comparable to those achieved by previously proposed methods [26, 27]. However, rather than using random initial weights connecting the hidden layer and the output layer, we used the weights achieved in the current cross validation as the initial weights for the next cross validation. This constrained weight initialization yielded 100.00%, 97.40%, 90.91%, and 92.21% using the direct calculation when the numbers of selected linguistic features were set to 14, 10, 5, and 3, respectively.


IF "Feature 1 is Large" OR "Feature 5 is Medium" OR "Feature 3 is Small"
THEN "Class is 1"
ELSEIF "Feature 10 is Small" OR "Feature 5 is Small" OR "Feature 6 is Large"
THEN "Class is 2"
END

Rule 1

IF "Feature 10 is Small" OR "Feature 5 is Small" OR "Feature 6 is Large"
THEN "Class is 2"
ELSEIF "Feature 1 is Large" OR "Feature 5 is Medium" OR "Feature 3 is Small"
THEN "Class is 1"
END

Rule 2

IF "Feature 1 is Large" THEN "Class is 1"
ELSEIF "Feature 10 is Small" THEN "Class is 2"
ELSEIF "Feature 5 is Medium" THEN "Class is 1"
ELSEIF "Feature 5 is Small" THEN "Class is 2"
ELSEIF "Feature 3 is Small" THEN "Class is 1"
ELSEIF "Feature 6 is Large" THEN "Class is 2"
END

Rule 3

IF "Feature 10 is Small" THEN "Class is 2"
ELSEIF "Feature 1 is Large" THEN "Class is 1"
ELSEIF "Feature 5 is Small" THEN "Class is 2"
ELSEIF "Feature 5 is Medium" THEN "Class is 1"
ELSEIF "Feature 6 is Large" THEN "Class is 2"
ELSEIF "Feature 3 is Small" THEN "Class is 1"
END

Rule 4

To ensure that the set of all parameters achieved here could get 100% correct classification on this dataset, we used the networks to classify all 77 microarrays (rather than considering the results from 10-fold cross validation, in which the outputs could differ when the random groups differ). We found that each of the 10 networks from 10-fold cross validation still yielded 100% correct classification on the entire dataset.
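The constrained weight initialization used above can be stated compactly: fold k + 1 of the cross validation starts from the weights trained in fold k rather than from random values. The sketch below shows the idea with a toy single-layer model standing in for the full neurofuzzy network; train_network, its settings, and the 0/1 class coding are illustrative assumptions.

    import numpy as np
    from sklearn.model_selection import StratifiedKFold

    def train_network(X_tr, y_tr, init_W=None, epochs=200, lr=0.1):
        # Toy stand-in for one neurofuzzy training run: a logistic unit
        # trained by gradient descent. init_W, when given, seeds training.
        rng = np.random.default_rng(0)
        W = rng.normal(scale=0.1, size=X_tr.shape[1]) if init_W is None else init_W.copy()
        for _ in range(epochs):
            p = 1.0 / (1.0 + np.exp(-(X_tr @ W)))
            W -= lr * X_tr.T @ (p - y_tr) / len(y_tr)
        return W

    def constrained_cv(X, y, n_folds=10):
        # Fold k + 1 is initialized with the weights trained in fold k.
        W, scores = None, []
        skf = StratifiedKFold(n_splits=n_folds, shuffle=True, random_state=0)
        for tr, va in skf.split(X, y):
            W = train_network(X[tr], y[tr], init_W=W)   # carry weights forward
            pred = (X[va] @ W > 0).astype(int)
            scores.append((pred == y[va]).mean())
        return float(np.mean(scores))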

3.2. Classification Results on DLBCL Microarrays by Logical Rule Using Selected Features. One of the good features of our method is the automatic rule extraction. Even though this approach usually does not yield as good performance as the direct calculation, it provides rules understandable by humans. This is more desirable from the human interpretation aspect than the black-box-based direct calculation.
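Rules 1-4 above (and Rules 5 and 6 below) can be executed mechanically once each linguistic condition is given a crisp test. The sketch below assumes a condition "fires" when its term attains the maximal membership among S, M, and L for that feature; this crisp reading is an illustrative assumption rather than the exact decision used in the experiments.

    # Reuses gaussian_mf from the previous sketch. mf_params maps a feature
    # index to {"Small": (c, sigma), "Medium": (...), "Large": (...)}.

    def term_fires(sample, feature, term, mf_params):
        # A term fires when it has the largest membership for this feature.
        ms = {t: gaussian_mf(sample[feature], c, s)
              for t, (c, s) in mf_params[feature].items()}
        return max(ms, key=ms.get) == term

    def layered_rule(sample, conditions, mf_params, default=None):
        # Layered form (Rules 3, 4, and 6): conditions are tried in
        # informative order; the first one that fires decides the class.
        for feature, term, label in conditions:
            if term_fires(sample, feature, term, mf_params):
                return label
        return default

    # Rule 3 encoded as data, most informative condition first:
    rule3 = [(1, "Large", 1), (10, "Small", 2), (5, "Medium", 1),
             (5, "Small", 2), (3, "Small", 1), (6, "Large", 2)]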

Table 4: Comparison between the proposed method and other algorithms on the DLBCL dataset.

Method                                                  Number of features selected    Classification rate (%)
Naïve Bayes [26]                                        3-8                            83.76
Our method without constrained weight initialization   10                             89.61
Decision trees [26]                                     3-8                            85.46
Our method without constrained weight initialization   14                             92.21
k-NN [26]                                               3-8                            88.60
Weighted voting model [27]                              30                             92.20
VizRank [26]                                            3-8                            93.03
Our method with constrained weight initialization      10                             97.40
SVM [26]                                                3-8                            97.85
Our method with constrained weight initialization      14                             100.00

The N selected linguistic features were used to create rules using both the simple OR and layered approaches, as mentioned in Section 2.2. The classification rate of 90.91% on the validation sets using only 5 selected linguistic features was achieved using the simple OR rule with the class order "Class 1 then Class 2," where Class 1 and Class 2 denote the DLBCL class and FL class, respectively. For more details of the results, Table 5 shows the top-10 linguistic features for the DLBCL dataset selected by our method in each cross validation of the 10-fold cross validation. The classification rates in all 10 cross validations were 75.00%, 100.00%, 100.00%, 75.00%, 100.00%, 87.50%, 75.00%, 100.00%, 100.00%, and 100.00%, respectively. The classification rate from the layered rule was 81.82% using the same top 5 linguistic features. The details of the classification rates in all 10 cross validations were 75.00%, 87.50%, 100.00%, 87.50%, 62.50%, 75.00%, 75.00%, 85.71%, 85.71%, and 85.71%, respectively.


IF "HLA-A_2 is Small" OR "NASP is Large" OR "MDM4 is Small" OR "ATIC is Medium" OR "STX16 is Small"
THEN "Class is DLBCL"
ELSEIF "HLA-A_2 is Large" OR "ATIC is Small" OR "STX16 is Large" OR "MDM4 is Medium" OR "NASP is Small"
THEN "Class is FL"
END

Rule 5

Table 5: Top-10 linguistic features for the DLBCL dataset selected by our method in 10-fold cross validation (most informative: rightmost; least informative: leftmost).

Cross validation  Class  Feature type  Feature ranking
1                 1      Linguistic    S    M    M    S    L    S    M    L    S    S
                         Original      7057 207  931  87   2479 7052 546  2360 83   7043
                  2      Linguistic    S    M    L    L    M    S    L    L    L    S
                         Original      450  7052 6264 207  83   2360 83   87   7043 546
2                 1      Linguistic    L    L    M    M    S    S    S    M    L    S
                         Original      355  2479 207  931  87   7052 83   546  2360 7043
                  2      Linguistic    S    M    L    L    L    M    S    L    L    S
                         Original      355  7052 83   6264 207  83   2360 87   7043 546
3                 1      Linguistic    S    M    S    M    L    S    M    L    S    S
                         Original      7052 931  7057 207  2479 87   546  2360 83   7043
                  2      Linguistic    L    S    S    L    S    L    M    S    L    L
                         Original      7057 2479 450  207  2360 83   83   546  87   7043
4                 1      Linguistic    M    L    M    L    M    S    S    S    L    S
                         Original      931  2164 207  2479 546  7052 87   83   2360 7043
                  2      Linguistic    S    L    S    L    L    S    M    L    S    L
                         Original      2479 6264 450  207  83   2360 83   87   546  7043
5                 1      Linguistic    L    M    L    M    S    S    M    L    S    S
                         Original      2164 931  2479 207  7052 87   546  2360 83   7043
                  2      Linguistic    S    S    L    S    L    L    M    L    S    L
                         Original      2479 450  6264 2360 83   207  83   87   546  7043
6                 1      Linguistic    L    M    S    M    L    S    S    M    L    S
                         Original      355  207  7052 931  2479 87   83   546  2360 7043
                  2      Linguistic    L    L    S    L    M    L    S    L    S    L
                         Original      7057 6264 450  207  83   83   2360 87   546  7043
7                 1      Linguistic    S    S    M    M    L    S    M    S    L    S
                         Original      7052 7057 931  207  2479 87   546  83   2360 7043
                  2      Linguistic    L    S    S    L    L    S    M    L    S    L
                         Original      6264 2479 450  207  83   2360 83   87   546  7043
8                 1      Linguistic    L    M    M    S    S    S    L    S    L    S
                         Original      355  546  931  7052 87   7057 2479 83   2360 7043
                  2      Linguistic    L    L    S    M    S    L    L    L    S    L
                         Original      7057 207  2360 83   2479 6264 83   87   546  7043
9                 1      Linguistic    S    S    M    L    M    S    M    S    L    S
                         Original      7057 7052 931  2479 207  87   546  83   2360 7043
                  2      Linguistic    L    S    S    L    L    S    M    L    S    L
                         Original      6264 450  2479 207  83   2360 83   87   546  7043
10                1      Linguistic    M    L    M    S    S    S    M    S    L    S
                         Original      207  2479 931  87   7052 7057 546  83   2360 7043
                  2      Linguistic    L    L    S    L    S    L    M    L    S    L
                         Original      6264 7057 450  207  2360 83   83   87   546  7043


IF "HLA-A_2 is Small" THEN "Class is DLBCL"
ELSEIF "HLA-A_2 is Large" THEN "Class is FL"
ELSEIF "NASP is Large" THEN "Class is DLBCL"
ELSEIF "ATIC is Small" THEN "Class is FL"
ELSEIF "MDM4 is Small" THEN "Class is DLBCL"
ELSEIF "STX16 is Large" THEN "Class is FL"
ELSEIF "ATIC is Medium" THEN "Class is DLBCL"
ELSEIF "MDM4 is Medium" THEN "Class is FL"
ELSEIF "STX16 is Small" THEN "Class is DLBCL"
ELSEIF "NASP is Small" THEN "Class is FL"
END

Rule 6

When using the aforementioned constrained weight initialization, the classification rates were the same. We also tried to increase the number of selected features to 10, but it did not help; the results were the same as those using 5 features.

From Table 5, it can be seen that the sets of informative features for class 1 and class 2 are different across the cross validations. That will result in a number of different rules. However, in a real application, we have to come up with the best rules among them. We propose to use the summation of the feature informative level over all 10 cross validations. The feature informative level is simply defined by the informative order. For example, if only 10 linguistic features are considered, the most informative one will have the feature informative level of 1.0, the second one will have that of 0.9, and so on. Hence, the tenth most informative feature will have the feature informative level of 0.1, and the remaining features will get the feature informative level of 0.0. The informative levels of each feature are then summed across all cross validations to yield the overall informative level of that feature.
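This scoring is straightforward to reproduce. The sketch below computes the overall informative levels exactly as just described, using the class-1 rankings of the first two folds of Table 5 as sample input.

    from collections import defaultdict

    def overall_informative_levels(rankings):
        # rankings: one list per fold with the top-10 linguistic features
        # ordered least-to-most informative (the layout of Table 5).
        levels = defaultdict(float)
        for fold in rankings:
            for rank, ling_feat in enumerate(reversed(fold)):  # best first
                levels[ling_feat] += 1.0 - 0.1 * rank          # 1.0, 0.9, ..., 0.1
        return dict(levels)

    # Class-1 rankings of cross validations 1 and 2, read off Table 5:
    fold1 = [(7057, "S"), (207, "M"), (931, "M"), (87, "S"), (2479, "L"),
             (7052, "S"), (546, "M"), (2360, "L"), (83, "S"), (7043, "S")]
    fold2 = [(355, "L"), (2479, "L"), (207, "M"), (931, "M"), (87, "S"),
             (7052, "S"), (83, "S"), (546, "M"), (2360, "L"), (7043, "S")]
    print(overall_informative_levels([fold1, fold2])[(7043, "S")])  # 2.0 so far

Summing over all 10 folds in this way yields the levels reported in Table 6.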

We show the overall feature informative levels from the 10-fold cross validation using the top-10 features in Table 6. In this case, the highest possible informative level is 10.0. Out of 42 linguistic features for each class, there were only 12 and 13 linguistic features with nonzero informative levels for class 1 and class 2, respectively. That means the remaining features did not appear at all in the top-10 list of any cross validation. The results showed that, for class 1, the first 5 most informative linguistic features, ranked from the most informative to the less informative, were "Feature 7043 (HLA-A_2) is Small," "Feature 2360 (NASP) is Large," "Feature 83 (MDM4) is Small," "Feature 546 (ATIC) is Medium," and "Feature 87 (STX16) is Small," respectively. For class 2, the first 5 most informative linguistic features were "Feature 7043 is Large," "Feature 546 is Small," "Feature 87 is Large," "Feature 83 is Medium," and "Feature 2360 is Small," respectively. It is worth noting that the last statement can also be "Feature 83 is Large" because it has the same feature informative level of 5.5 as "Feature 2360 is Small." This information was used to create rules as described in Section 2.2. Hence, the rule extracted using the simple OR approach is Rule 5.

Using the layered approach, the extracted rule is Rule 6.

Table 6: Feature informative levels from the 10-fold cross validation using the top-10 features.

Original feature   Linguistic term   Informative level
Class 1
7043               S                 10.0
2360               L                 8.7
83                 S                 8.1
546                M                 6.5
87                 S                 5.5
2479               L                 4.2
7052               S                 3.9
207                M                 2.8
931                M                 2.8
7057               S                 1.9
355                L                 0.3
2164               L                 0.3
Class 2
7043               L                 9.8
546                S                 9.1
87                 L                 8.1
83                 M                 6.2
2360               S                 5.5
83                 L                 5.5
207                L                 4.1
6264               L                 2.3
450                S                 2.0
2479               S                 1.4
7057               L                 0.5
7052               M                 0.4
355                S                 0.1

4. Conclusion

The classification problem of diffuse large B-cell lymphoma (DLBCL) versus follicular lymphoma (FL) based on high dimensional microarray data was investigated by our neurofuzzy classification scheme. Our direct calculation method could achieve a 100% classification rate on the validation sets of 10-fold cross validation by using only 14 out of the 7070 features in the dataset. These 14 features, corresponding to the genes MDM4, STX16, NR1D2, DCLRE1A, PARK7, ATIC, HG4263-HT4533_at, CSRP1, NASP, PGK1, HLA-DPB1_2, HLA-A_2, ITK_2, and PLGLB2, were automatically selected by our method. The method could also identify the informative linguistic features for each class. For the DLBCL class, the first 5 most informative linguistic features were "HLA-A_2 is Small," "NASP is Large," "MDM4 is Small," "ATIC is Medium," and "STX16 is Small," respectively. For class 2, the first 5 most informative linguistic features were "HLA-A_2 is Large," "ATIC is Small," "STX16 is Large," "MDM4 is Medium," and "NASP is Small," respectively. The terms Small, Medium, and Large of each original feature were automatically determined by our method. The informative linguistic features were used to create rules that achieved a 90.91% classification rate on the validation sets of 10-fold


cross validation. Even though this rule creation approach yielded worse classification performance than the direct calculation, it is more desirable from the human interpretation aspect. It can be seen that very good results are achieved on this standard high dimensional dataset. The set of selected informative genes will be useful for further investigation in the fields of bioinformatics or medicine. To ensure that this set of selected features can be used in general, it should be applied to more DLBCL versus FL cases.

Conflict of Interests

The authors declare no conflict of interests

References

[1] G. G. Towell, J. W. Shavlik, and M. O. Noordewier, "Refinement of approximate domain theories by knowledge-based neural networks," in Proceedings of the 8th National Conference on Artificial Intelligence, pp. 861-866, Boston, Mass, USA, July-August 1990.

[2] L. M. Fu, "Knowledge-based connectionism for revising domain theories," IEEE Transactions on Systems, Man, and Cybernetics, vol. 23, no. 1, pp. 173-182, 1993.

[3] L. M. Fu, "Learning capacity and sample complexity on expert networks," IEEE Transactions on Neural Networks, vol. 7, no. 6, pp. 1517-1520, 1996.

[4] S. Snyders and C. W. Omlin, "What inductive bias gives good neural network training performance?" in Proceedings of the International Joint Conference on Neural Networks (IJCNN '00), pp. 445-450, Como, Italy, July 2000.

[5] D.-X. Zhang, Y. Liu, and Z.-Q. Wang, "Effectively extracting rules from trained neural networks based on the characteristics of the classification hypersurfaces," in Proceedings of the International Conference on Machine Learning and Cybernetics (ICMLC '05), vol. 3, pp. 1541-1546, IEEE, Guangzhou, China, August 2005.

[6] L. Fu, "Rule generation from neural networks," IEEE Transactions on Systems, Man, and Cybernetics, vol. 24, no. 8, pp. 1114-1124, 1994.

[7] F. Beloufa and M. A. Chikh, "Design of fuzzy classifier for diabetes disease using modified artificial bee colony algorithm," Computer Methods and Programs in Biomedicine, vol. 112, no. 1, pp. 92-103, 2013.

[8] J. Luo and D. Chen, "Fuzzy information granulation based decision support applications," in Proceedings of the International Symposium on Information Processing (ISIP '08), pp. 197-201, Moscow, Russia, May 2008.

[9] A. T. Azar, S. A. El-Said, and A. E. Hassanien, "Fuzzy and hard clustering analysis for thyroid disease," Computer Methods and Programs in Biomedicine, vol. 111, no. 1, pp. 1-16, 2013.

[10] C.-H. L. Lee, A. Liu, and W.-S. Chen, "Pattern discovery of fuzzy time series for financial prediction," IEEE Transactions on Knowledge and Data Engineering, vol. 18, no. 5, pp. 613-625, 2006.

[11] R. Yang, Z. Wang, P.-A. Heng, and K.-S. Leung, "Classification of heterogeneous fuzzy data by Choquet integral with fuzzy-valued integrand," IEEE Transactions on Fuzzy Systems, vol. 15, no. 5, pp. 931-942, 2007.

[12] P. Hartono and S. Hashimoto, "An interpretable neural network ensemble," in Proceedings of the 33rd Annual Conference of the IEEE Industrial Electronics Society (IECON '07), pp. 228-232, Taipei, Taiwan, November 2007.

[13] T.-G. Fan and X.-Z. Wang, "A new approach to weighted fuzzy production rule extraction from neural networks," in Proceedings of the International Conference on Machine Learning and Cybernetics, pp. 3348-3351, Shanghai, China, August 2004.

[14] M. Gao, X. Hong, and C. J. Harris, "A neurofuzzy classifier for two class problems," in Proceedings of the 12th UK Workshop on Computational Intelligence (UKCI '12), pp. 1-6, Edinburgh, UK, September 2012.

[15] W. Duch, R. Adamczak, and K. Grabczewski, "A new methodology of extraction, optimization and application of crisp and fuzzy logical rules," IEEE Transactions on Neural Networks, vol. 12, no. 2, pp. 277-306, 2001.

[16] W. Duch, R. Setiono, and J. M. Zurada, "Computational intelligence methods for rule-based data understanding," Proceedings of the IEEE, vol. 92, no. 5, pp. 771-805, 2004.

[17] A. T. Azar, "Adaptive network based on fuzzy inference system for equilibrated urea concentration prediction," Computer Methods and Programs in Biomedicine, vol. 111, no. 3, pp. 578-591, 2013.

[18] A. Ouarda, A. Ahlem, B. Brahim, and B. Kheir, "Comparison of neuro-fuzzy models for classification fingerprint images," International Journal of Computer Applications, vol. 65, no. 9, pp. 12-16, 2013.

[19] Z. Yang, Y. Wang, and G. Ouyang, "Adaptive neuro-fuzzy inference system for classification of background EEG signals from ESES patients and controls," The Scientific World Journal, vol. 2014, Article ID 140863, 8 pages, 2014.

[20] M. S. Al-Batah, N. A. M. Isa, M. F. Klaib, and M. A. Al-Betar, "Multiple adaptive neuro-fuzzy inference system with automatic features extraction algorithm for cervical cancer recognition," Computational and Mathematical Methods in Medicine, vol. 2014, Article ID 181245, 12 pages, 2014.

[21] J.-A. Martinez-Leon, J.-M. Cano-Izquierdo, and J. Ibarrola, "Feature selection applying statistical and neurofuzzy methods to EEG-based BCI," Computational Intelligence and Neuroscience, vol. 2015, Article ID 781207, 17 pages, 2015.

[22] D. Chakraborty and N. R. Pal, "A neuro-fuzzy scheme for simultaneous feature selection and fuzzy rule-based classification," IEEE Transactions on Neural Networks, vol. 15, no. 1, pp. 110-123, 2004.

[23] N. Eiamkanitchat, N. Theera-Umpon, and S. Auephanwiriyakul, "A novel neuro-fuzzy method for linguistic feature selection and rule-based classification," in Proceedings of the 2nd International Conference on Computer and Automation Engineering (ICCAE '10), vol. 2, pp. 247-252, Singapore, February 2010.

[24] N. Eiamkanitchat, N. Theera-Umpon, and S. Auephanwiriyakul, "Colon tumor microarray classification using neural network with feature selection and rule-based classification," in Advances in Neural Network Research and Applications, vol. 67 of Lecture Notes in Electrical Engineering, pp. 363-372, Springer, Berlin, Germany, 2010.

[25] C.-H. Chen and C.-J. Lin, "Compensatory neurofuzzy inference systems for pattern classification," in Proceedings of the International Symposium on Computer, Consumer and Control (IS3C '12), pp. 88-91, IEEE, Taichung, Taiwan, June 2012.

[26] M. Mramor, G. Leban, J. Demsar, and B. Zupan, "Visualization-based cancer microarray data classification analysis," Bioinformatics, vol. 23, no. 16, pp. 2147-2154, 2007.

[27] M. A. Shipp, K. N. Ross, P. Tamayo et al., "Diffuse large B-cell lymphoma outcome prediction by gene-expression profiling and supervised machine learning," Nature Medicine, vol. 8, no. 1, pp. 68-74, 2002.

The diffuse large B-cell lymphomas (DLBCL) dataset con-sisting of 77 microarray experiments with 7070 gene expres-sion levels [27] was utilized in this research It was madeavailable to the public at httpwwwbroadinstituteorgcgi-bincancerdatasetscgi There are two classes in thedataset that is diffuse large B-cell lymphoma (DLBCL) isreferred to as class 1 and follicular lymphoma (FL) is referredto as class 2 These 2 types of B-cell lineage malignancieshave very different clinical presentations natural historiesand response to therapy [27] Because DLBCLs are the mostcommon lymphoid malignancy in adults a method thatcan efficiently classify these 2 lymphomas is therefore very

Mathematical Problems in Engineering 7

Table 3 Means (119888119895) and variances (120590

119895) of membership functions S M L of 14 selected features after training for the DLBCL dataset

Final valuesFeature 119888S 119888M 119888L 120590S 120590M 120590L

83 42217 244422 448788 70018 68702 6864087 89961 244306 401299 55559 53710 53294207 minus316059 minus88806 138408 73797 75289 75806355 minus84419 12311 109255 31586 31900 37146450 1811843 4047102 6272187 746321 748685 750148546 594454 2361230 4125430 590479 591615 595343931 minus404615 minus26100 348347 124371 124241 1257062164 552324 1048629 1545862 168960 168791 1736262360 566830 1351743 2133887 262543 263800 2652772479 minus10591 41380 95801 17087 18011 233556264 2972543 6890328 10810781 1308614 1309833 13083257043 8683574 12940488 17192838 1424231 1424378 14229027052 minus21680 387083 797666 137781 137389 1395187057 217169 507375 803842 100208 102060 98289

No

Original dataset

Sub 1 Sub 2 Sub 3 Sub 4 Sub n

Dataset with selected features

Partition into n subsets

Final dataset with selected features

Yes

middot middot middot

features le F

Figure 3 Iterative partition method for neurofuzzy method withfeature selection and rule extraction for large dataset

desirable In the dataset there are 58 samples of DLBCL classand 19 samples of FL class

The selected informative features from the learning phaseare used for classification task in both direct calculationand logical rule The 10-fold cross validation is performedin the experiments The results displayed are the averageresults on validation sets over 10 cross validations Duringthe training step because the number of features was toolarge the original features were divided into small subsets

rather than using the entire 7070 features at once The 10-fold cross validation was performed on each subset Theinformative features from each subset were selected and formthe final informative features by combining them togetherThe learning rate for weight updating was set to 01 and thelearning rates used in updating 119888 and 120590 were set to 0001

31 Classification Results on DLBCL Microarrays by DirectCalculation Using Selected Features We tried several choicesof the number of selected features (119873) and found that119873 = 14 was adequate for this problem After training 14features were automatically selected by our method The setof features that yielded the best results on validation setsamong those in 10-fold cross validation consisted of features83 (MDM4) 87 (STX16) 207 (NR1D2) 355 (DCLRE1A)450 (PARK7) 546 (ATIC) 931 (HG4263-HT4533 at) 2164(CSRP1) 2360 (NASP) 2479 (PGK1) 6264 (HLA-DPB1 2)7043 (HLA-A 2) 7052 (ITK 2) and 7057 (PLGLB2) Thegenes corresponding to the selected features are shown inTable 1 The values of means and standard deviations of allmembership functions of the 14 selected features before andafter training are shown in Tables 2 and 3 respectively

The classification rates on the validation sets of 10-foldcross validation achieved by using the direct calculation were9221 8961 8442 and 8442 when the numbersof selected linguistic features were set to 14 10 5 and 3respectively These results show that the proposed methodcan select a set of informative features out of a huge pool offeatures As shown in Table 4 the classification performanceis comparable to those performed by previously proposedmethods [26 27] However rather than using random initialweights connecting between the hidden layer and the outputlayer we used the weights achieved in the current crossvalidation to be the initial weights for the next cross valida-tion This constrained weight initialization yielded 100009740 9091 and 9221using the direct calculation whenthe numbers of selected linguistic features were set to 14 10

8 Mathematical Problems in Engineering

IF ldquoFeature 1 is Largerdquo OR ldquoFeature 5 is Mediumrdquo OR ldquoFeature 3 is SmallrdquoTHEN ldquoClass is 1rdquo

ELSEIF ldquoFeature 10 is Smallrdquo OR ldquoFeature 5 is Smallrdquo OR ldquoFeature 6 is LargerdquoTHEN ldquoClass is 2rdquo

END

Rule 1

IF ldquoFeature 10 is Smallrdquo OR ldquoFeature 5 is Smallrdquo OR ldquoFeature 6 is LargerdquoTHEN ldquoClass is 2rdquo

ELSEIF ldquoFeature 1 is Largerdquo OR ldquoFeature 5 is Mediumrdquo OR ldquoFeature 3 is SmallrdquoTHEN ldquoClass is 1rdquo

END

Rule 2

IF ldquoFeature 1 is Largerdquo THEN ldquoClass is 1rdquoELSEIF ldquoFeature 10 is Smallrdquo THEN ldquoClass is 2rdquoELSEIF ldquoFeature 5 is Mediumrdquo THEN ldquoClass is 1rdquoELSEIF ldquoFeature 5 is Smallrdquo THEN ldquoClass is 2rdquoELSEIF ldquoFeature 3 is Smallrdquo THEN ldquoClass is 1rdquoELSEIF ldquoFeature 6 is Largerdquo THEN ldquoClass is 2rdquoEND

Rule 3

IF ldquoFeature 10 is Smallrdquo THEN ldquoClass is 2rdquoELSEIF ldquoFeature 1 is Largerdquo THEN ldquoClass is 1rdquoELSEIF ldquoFeature 5 is Smallrdquo THEN ldquoClass is 2rdquoELSEIF ldquoFeature 5 is Mediumrdquo THEN ldquoClass is 1rdquoELSEIF ldquoFeature 6 is Largerdquo THEN ldquoClass is 2rdquoELSEIF ldquoFeature 3 is Smallrdquo THEN ldquoClass is 1rdquoEND

Rule 4

5 and 3 respectively To ensure that the set of all parametersachieved here could get 100 correct classification on thisdataset we tried to use the networks to classify all 77microarrays (rather than considering the results from 10-foldcross validation in which the outputs could be different whenthe random groups are different) We found that each of all10 networks from 10-fold cross validation still yielded 100correct classification on the entire dataset

32 Classification Results on DLBCL Microarrays by LogicalRule Using Selected Features One of the good features ofour method is the automatic rule extraction Even thoughthis approach usually does not yield as good performanceas the direct calculation it provides rules understandable forhumanThis ismore desirable from the human interpretationaspect than the black-box based direct calculation

Table 4 Comparison between the proposed method and otheralgorithms on the DLBCL dataset

MethodNumber offeaturesselected

Classificationrate ()

Naıve Bayes [26] 3ndash8 8376Our method without constrainedweight initialization 10 8961

Decision trees [26] 3ndash8 8546Our method without constrainedweight initialization 14 9221

119896-NN [26] 3ndash8 8860Weighted voting model [27] 30 9220VizRank [26] 3ndash8 9303Our method with constrainedweight initialization 10 9740

SVM [26] 3ndash8 9785Our method with constrainedweight initialization 14 10000

The119873 selected linguistic featureswere used to create rulesusing both simple OR and layered approaches asmentioned inSection 22The classification rate of 9091 on validation setsusing only 5 selected linguistic features was achieved usingthe simple OR rule with the class order ldquoClass 1 then Class2rdquo where Class 1 and Class 2 denote the DLBCL class andFL class respectively For more details of the results Table 5shows the top-10 linguistic features for the DLBCL datasetselected by our method in each cross validation of the 10-fold cross validation The classification rates in all 10 crossvalidations were 7500 10000 10000 7500 100008750 7500 10000 10000 and 10000 respectivelyThe classification rate from the layered rule was 8182 usingthe same top 5 linguistic featuresThe details of classificationrates in all 10 cross validations were 7500 8750 100008750 6250 7500 7500 8571 8571 and 8571respectively When using the aforementioned constrained

Mathematical Problems in Engineering 9

IF ldquoHLA-A 2 is Smallrdquo OR ldquoNASP is Largerdquo OR ldquoMDM4 is Smallrdquo OR ldquoATIC is Mediumrdquo OR ldquoSTX16 is SmallrdquoTHEN ldquoClass is DLBCLrdquo

ELSEIF ldquoHLA-A 2 is Largerdquo OR ldquoATIC is Smallrdquo OR ldquoSTX16 is Largerdquo OR ldquoMDM4 is Mediumrdquo OR ldquoNASP is SmallrdquoTHEN ldquoClass is FLrdquo

END

Rule 5

Table 5 Top-10 linguistic features for DLBCL dataset selected by our method in 10-fold cross validation (most informative right leastinformative left)

Cross validation Class Feature type Feature ranking

11 Linguistic S M M S L S M L S S

Original 7057 207 931 87 2479 7052 546 2360 83 7043

2 Linguistic S M L L M S L L L SOriginal 450 7052 6264 207 83 2360 83 87 7043 546

21 Linguistic L L M M S S S M L S

Original 355 2479 207 931 87 7052 83 546 2360 7043

2 Linguistic S M L L L M S L L SOriginal 355 7052 83 6264 207 83 2360 87 7043 546

31 Linguistic S M S M L S M L S S

Original 7052 931 7057 207 2479 87 546 2360 83 7043

2 Linguistic L S S L S L M S L LOriginal 7057 2479 450 207 2360 83 83 546 87 7043

41 Linguistic M L M L M S S S L S

Original 931 2164 207 2479 546 7052 87 83 2360 7043

2 Linguistic S L S L L S M L S LOriginal 2479 6264 450 207 83 2360 83 87 546 7043

51 Linguistic L M L M S S M L S S

Original 2164 931 2479 207 7052 87 546 2360 83 7043

2 Linguistic S S L S L L M L S LOriginal 2479 450 6264 2360 83 207 83 87 546 7043

61 Linguistic L M S M L S S M L S

Original 355 207 7052 931 2479 87 83 546 2360 7043

2 Linguistic L L S L M L S L S LOriginal 7057 6264 450 207 83 83 2360 87 546 7043

71 Linguistic S S M M L S M S L S

Original 7052 7057 931 207 2479 87 546 83 2360 7043

2 Linguistic L S S L L S M L S LOriginal 6264 2479 450 207 83 2360 83 87 546 7043

81 Linguistic L M M S S S L S L S

Original 355 546 931 7052 87 7057 2479 83 2360 7043

2 Linguistic L L S M S L L L S LOriginal 7057 207 2360 83 2479 6264 83 87 546 7043

91 Linguistic S S M L M S M S L S

Original 7057 7052 931 2479 207 87 546 83 2360 7043

2 Linguistic L S S L L S M L S LOriginal 6264 450 2479 207 83 2360 83 87 546 7043

101 Linguistic M L M S S S M S L S

Original 207 2479 931 87 7052 7057 546 83 2360 7043

2 Linguistic L L S L S L M L S LOriginal 6264 7057 450 207 2360 83 83 87 546 7043

10 Mathematical Problems in Engineering

IF ldquoHLA-A 2 is Smallrdquo THEN ldquoClass is DLBCLrdquoELSEIF ldquoHLA-A 2 is Largerdquo THEN ldquoClass is FLrdquoELSEIF ldquoNASP is Largerdquo THEN ldquoClass is DLBCLrdquoELSEIF ldquoATIC is Smallrdquo THEN ldquoClass is FLrdquoELSEIF ldquoMDM4 is Smallrdquo THEN ldquoClass is DLBCLrdquoELSEIF ldquoSTX16 is Largerdquo THEN ldquoClass is FLrdquoELSEIF ldquoATIC is Mediumrdquo THEN ldquoClass is DLBCLrdquoELSEIF ldquoMDM4 is Mediumrdquo THEN ldquoClass is FLrdquoELSEIF ldquoSTX16 is Smallrdquo THEN ldquoClass is DLBCLrdquoELSEIF ldquoNASP is Smallrdquo THEN ldquoClass is FLrdquoEND

Rule 6

weight initialization the classification rates were the sameWe also tried to increase the number of selected features to10 but it did not help The results were the same as that using5 features

From Table 5 it can be seen that the sets of informativefeatures for class 1 and class 2 are different across the crossvalidations That will result in a number of different rulesHowever in the real application we will have to come upwith the best rules among them We propose to use thesummation of the feature informative level in all 10 crossvalidations The feature informative level is simply definedby the informative order For example if only 10 linguisticfeatures are considered the most informative one will havethe feature informative level of 10 the second one will havethat of 09 and so on Hence the tenth most informativefeature will have the feature informative factor of 01 and theremaining will get the feature informative level of 00 Theinformative levels of each feature are then summed across allcross validations to yield the overall informative level of thatfeature

We show the overall feature informative levels from the10-fold cross validation using top-10 features in Table 6 Inthis case the highest possible informative level is 100 Outof 42 linguistic features for each class there were only 12 and13 linguistic features with nonzero informative level for class1 and class 2 respectively That means the remaining featuresdid not appear at all in the top-10 list of any cross validationThe results showed that for class 1 the first 5most informativelinguistic features ranking from the most informative to theless informative were ldquoFeature 7043 (HLA-A 2) is SmallrdquoldquoFeature 2360 (NASP) is Largerdquo ldquoFeature 83 (MDM4) isSmallrdquo ldquoFeature 546 (ATIC) is Mediumrdquo and ldquoFeature 87(STX16) is Smallrdquo respectively For class 2 the first 5 mostinformative linguistic features were ldquoFeature 7043 is LargerdquoldquoFeature 546 is Smallrdquo ldquoFeature 87 is Largerdquo ldquoFeature 83is Mediumrdquo and ldquoFeature 2360 is Smallrdquo respectively It isworthwhile noting that the last statement can also be ldquoFeature83 is Largerdquo because it has the same feature informative levelof 55 as for ldquoFeature 2360 is Smallrdquo This information wasused to create rules as described in Section 22 Hence therule extracted using the simple OR approach is as Rule 5

Using the layered approach the extracted rule is in Rule 6

Table 6 Feature informative levels from the 10-fold cross validationusing top-10 features

Original feature Linguistic term Informative levelClass 1

7043 S 1002360 L 8783 S 81546 M 6587 S 552479 L 427052 S 39207 M 28931 M 287057 S 19355 L 032164 L 03

Class 27043 L 98546 S 9187 L 8183 M 622360 S 5583 L 55207 L 416264 L 23450 S 202479 S 147057 L 057052 M 04355 S 01

4 Conclusion

The classification problem of diffuse large B-cell lymphoma(DLBCL) versus follicular lymphoma (FL) based on highdimensional microarray data was investigated by our neu-rofuzzy classification scheme Our direct calculation methodcould achieve 100 classification rate on the validation setsof 10-fold cross validation by using only 14 out of 7070features in the dataset These 14 features including genesMDM4 STX16 NR1D2 DCLRE1A PARK7 ATIC HG4263-HT4533 at CSRP1 NASP PGK1 HLA-DPB1 2 HLA-A 2ITK 2 and PLGLB2 were automatically selected by ourmethod The method could also identify the informativelinguistic features for each class For the DLBCL class thefirst 5 most informative linguistic features were ldquoHLA-A 2is Smallrdquo ldquoNASP is Largerdquo ldquoMDM4 is Smallrdquo ldquoATIC isMediumrdquo and ldquoSTX16 is Smallrdquo respectively For class 2the first 5 most informative linguistic features were ldquoHLA-A 2 is Largerdquo ldquoATIC is Smallrdquo ldquoSTX16 is Largerdquo ldquoMDM4is Mediumrdquo and ldquoNASP is Smallrdquo respectively The termsSmall Medium and Large of each original feature wereautomatically determined by our method The informativelinguistic features were used to create rules that achieved9091 classification rate on the validation sets of 10-fold

Mathematical Problems in Engineering 11

cross validation Even though this rule creation approachyielded worse classification performance than the direct cal-culation it is more desirable from the human interpretationaspect It can be seen that very good results are achievedin this standard high dimensional dataset A set of selectedinformative genes will be useful for further investigation inthe fields of bioinformatics or medicines To ensure that thisset of selected features can be used in general it should beapplied to more DLBCL versus FL cases

Conflict of Interests

The authors declare no conflict of interests

References

[1] G G Towell J W Shavlik and M O Noordenier ldquoRefinementof approximate domain theories by knowledge based neuralnetworkrdquo in Proceedings of the 8th National Conference onArtificial Intelligence pp 861ndash866 Boston Mass USA July-August 1990

[2] L M Fu ldquoKnowledge-based connectionism for revisingdomain theoriesrdquo IEEE Transactions on Systems Man andCybernetics vol 23 no 1 pp 173ndash182 1993

[3] L M Fu ldquoLearning capacity and sample complexity on expertnetworksrdquo IEEE Transactions on Neural Networks vol 7 no 6pp 1517ndash1520 1996

[4] S Snyders and C W Omlin ldquoWhat inductive bias gives goodneural network training performancerdquo in Proceedings of theInternational Joint Conference on Neural Networks (IJCNN rsquo00)pp 445ndash450 Como Italy July 2000

[5] D-X Zhang Y Liu and Z I-Q Wang ldquoEffectively extractingrules from trained neural networks based on the characteristicsof the classification hypersurfacesrdquo inProceedings of the Interna-tional Conference onMachine Learning and Cybernetics (ICMLCrsquo05) vol 3 pp 1541ndash1546 IEEE Guangzhou China August2005

[6] L Fu ldquoRule generation from neural networksrdquo IEEE Transac-tions on Systems Man and Cybernetics vol 24 no 8 pp 1114ndash1124 1994

[7] F Beloufa and M A Chikh ldquoDesign of fuzzy classifier fordiabetes disease usingmodified artificial bee colony algorithmrdquoComputer Methods and Programs in Biomedicine vol 112 no 1pp 92ndash103 2013

[8] J Luo andD Chen ldquoFuzzy information granulation based deci-sion support applicationsrdquo in Proceedings of the InternationalSymposium on Information Processing (ISIP rsquo08) pp 197ndash201Moscow Russia May 2008

[9] A T Azar S A El-Said and A E Hassanien ldquoFuzzy and hardclustering analysis for thyroid diseaserdquo Computer Methods andPrograms in Biomedicine vol 111 no 1 pp 1ndash16 2013

[10] C-H L Lee A Liu and W-S Chen ldquoPattern discovery offuzzy time series for financial predictionrdquo IEEE Transactionson Knowledge and Data Engineering vol 18 no 5 pp 613ndash6252006

[11] R Yang Z Wang P-A Heng and K-S Leung ldquoClassificationof heterogeneous fuzzy data by choquet integral with fuzzy-valued integrandrdquo IEEE Transactions on Fuzzy Systems vol 15no 5 pp 931ndash942 2007

[12] P Hartono and S Hashimoto ldquoAn interpretable neural networkensemblerdquo in Proceedings of the 33rd Annual Conference of theIEEE Industrial Electronics Society (IECON rsquo07) pp 228ndash232Taipei Taiwan November 2007

[13] T-G Fan and X-Z Wang ldquoA new approach to weightedfuzzy production rule extraction from neural networksrdquo inProceedings of the International Conference onMachine Learningand Cybernetics pp 3348ndash3351 Shanghai China August 2004

[14] M Gao X Hong and C J Harris ldquoA neurofuzzy classifier fortwo class problemsrdquo in Proceedings of the 12th UKWorkshop onComputational Intelligence (UKCI rsquo12) pp 1ndash6 Edinburgh UKSeptember 2012

[15] W Duch R Adamczak and K Grabczewski ldquoA new method-ology of extraction optimization and application of crisp andfuzzy logical rulesrdquo IEEE Transactions on Neural Networks vol12 no 2 pp 277ndash306 2001

[16] W Duch R Setiono and J M Zurada ldquoComputational intelli-gence methods for rule-based data understandingrdquo Proceedingsof the IEEE vol 92 no 5 pp 771ndash805 2004

[17] A T Azar ldquoAdaptive network based on fuzzy inference sys-tem for equilibrated urea concentration predictionrdquo ComputerMethods and Programs in Biomedicine vol 111 no 3 pp 578ndash591 2013

[18] A Ouarda A Ahlem B Brahim and B Kheir ldquoComparisonof neuro-fuzzy models for classification fingerprint imagesrdquoInternational Journal of Computer Applications vol 65 no 9pp 12ndash16 2013

[19] Z Yang Y Wang and G Ouyang ldquoAdaptive neuro-fuzzyinference system for classification of background EEG signalsfrom ESES patients and controlsrdquo The Scientific World Journalvol 2014 Article ID 140863 8 pages 2014

[20] M S Al-Batah N A M Isa M F Klaib and M A Al-Betar ldquoMultiple adaptive neuro-fuzzy inference system withautomatic features extraction algorithm for cervical cancerrecognitionrdquo Computational and Mathematical Methods inMedicine vol 2014 Article ID 181245 12 pages 2014

[21] J-A Martinez-Leon J-M Cano-Izquierdo and J IbarrolaldquoFeature selection applying statistical and neurofuzzy methodsto EEG-based BCIrdquo Computational Intelligence and Neuro-science vol 2015 Article ID 781207 17 pages 2015

[22] D Chakraborty andN R Pal ldquoA neuro-fuzzy scheme for simul-taneous feature selection and fuzzy rule-based classificationrdquoIEEE Transactions onNeural Networks vol 15 no 1 pp 110ndash1232004

[23] N Eiamkanitchat N Theera-Umpon and S Auephanwiriya-kul ldquoA novel neuro-fuzzymethod for linguistic feature selectionand rule-based classificationrdquo in Proceedings of the 2nd Inter-national Conference on Computer and Automation Engineering(ICCAE rsquo10) vol 2 pp 247ndash252 Singapore February 2010

[24] N Eiamkanitchat N Theera-Umpon and S Auephanwiriya-kul ldquoColon tumor microarray classification using neural net-work with feature selection and rule-based classificationrdquo inAdvances in Neural Network Research and Applications vol 67of Lecture Notes in Electrical Engineering pp 363ndash372 SpringerBerlin Germany 2010

[25] C-H Chen and C-J Lin ldquoCompensatory neurofuzzy inferencesystems for pattern classificationrdquo in Proceedings of the Interna-tional Symposium on Computer Consumer and Control (IS3Crsquo12) pp 88ndash91 IEEE Taichung Taiwan June 2012

12 Mathematical Problems in Engineering

[26] MMramor G Leban J Demsar and B Zupan ldquoVisualization-based cancer microarray data classification analysisrdquo Bioinfor-matics vol 23 no 16 pp 2147ndash2154 2007

[27] M A Shipp K N Ross P Tamayo et al ldquoDiffuse large B-celllymphoma outcome prediction by gene-expression profilingand supervised machine learningrdquo Nature Medicine vol 8 no1 pp 68ndash74 2002

Submit your manuscripts athttpwwwhindawicom

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

MathematicsJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Mathematical Problems in Engineering

Hindawi Publishing Corporationhttpwwwhindawicom

Differential EquationsInternational Journal of

Volume 2014

Applied MathematicsJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Probability and StatisticsHindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Mathematical PhysicsAdvances in

Complex AnalysisJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

OptimizationJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

CombinatoricsHindawi Publishing Corporationhttpwwwhindawicom Volume 2014

International Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Operations ResearchAdvances in

Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Function Spaces

Abstract and Applied AnalysisHindawi Publishing Corporationhttpwwwhindawicom Volume 2014

International Journal of Mathematics and Mathematical Sciences

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

The Scientific World JournalHindawi Publishing Corporation httpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Algebra

Discrete Dynamics in Nature and Society

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Decision SciencesAdvances in

Discrete MathematicsJournal of

Hindawi Publishing Corporationhttpwwwhindawicom

Volume 2014 Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Stochastic AnalysisInternational Journal of

Page 5: Review Article On Feature Selection and Rule Extraction ...with feature selection and rule extraction and its training scheme designed for a high dimensional dataset is described in

Mathematical Problems in Engineering 5

SML

SML

SML

Original MF of feature 1 Original MF of feature 2 Original MF of feature 3

Original MF of feature 4

002040608

1

2 4 6 8 1000

02040608

1

2 4 6 8 100

Original MF of feature 5

2 4 6 8 100

Original MF of feature 6

002040608

1

0

05

1

0

05

1

0

05

1

0

05

1

0

05

1

0

05

1

2 4 8 100 6

Original MF of feature 9

2 4 6 8 100

Original MF of feature 8

2 4 8 100 6

Original MF of feature 7

2 4 6 8 100 2 4 6 8 100 2 4 6 8 100

(a)

Updated MF of feature 1 Updated MF of feature 2 Updated MF of feature 3

SML

SML

SML

002040608

1

0

05

1

0

05

1

0

05

1

0

05

1

002040608

1

002040608

1

0

05

1

0

05

1

2 4 6 8 100

Updated MF of feature 4

2 4 6 8 100

Updated MF of feature 5

2 4 6 8 100

Updated MF of feature 6

2 4 6 8 100

Updated MF of feature 7

2 4 6 8 100

Updated MF of feature 8

2 4 6 8 100

Updated MF of feature 9

2 4 6 8 1002 4 6 8 1002 4 6 8 100

(b)

Figure 2 (a) Initial membership functions of the Wisconsin breast cancer dataset (b) Updated membership functions after training by theneurofuzzy classification model

6 Mathematical Problems in Engineering

Table 2 Means (119888119895) and variances (120590

119895) of membership functions S M L of 14 selected features before training for the DLBCL dataset

Initial valuesFeature 119888S 119888M 119888L 120590S 120590M 120590L

83 185636 254651 323665 69014 69014 6901487 200414 255633 310852 55219 55219 55219207 minus164730 minus89712 minus14693 75018 75018 75018355 minus22187 11675 45538 33862 33862 33862450 3384316 4126221 4868126 741905 741905 741905546 1810664 2396104 2981544 585440 585440 585440931 minus148675 minus26558 95558 122117 122117 1221172164 865360 1032987 1200614 167627 167627 1676272360 1107608 1373013 1638418 265405 265405 2654052479 25192 42494 59795 17301 17301 173016264 5419876 6722623 8025370 1302747 1302747 13027477043 11535574 12983805 14432037 1448232 1448232 14482327052 259998 393052 526106 133054 133054 1330547057 422074 516844 611614 94770 94770 94770

The algorithm selects the informative linguistic featurestwice The first selection is done by considering the bipolaroutput from the hidden layer The linguistic feature withoutput of +1 is considered an informative featureThe secondselection is done by considering the weight values betweenthe hidden layer and the output layer The larger weightvalue indicates the more informative feature Consider theproposed network for logical rule extraction the IF part isextracted from the hidden layer As mentioned previouslywe use 3 membership functions representing the linguisticterms SM L for each original feature The summation ofthe product of weights and output from the hidden layer isinterpreted to the logical OR in the extracted rules The finalclassified class from the output layer is interpreted to THENin the classification rules After finishing the training phasethe weights are sorted We use the final values of weights toselect the ldquoTop119873rdquo informative features

From the structure described earlier the rule extractedfrom our network can be interpreted by 2 approaches Bothapproaches combine all conditions to only 1 rule The firstapproach is the ldquosimple ORrdquo rule All 119873 features are used tocreate a simple OR rule of each class for example when a 2-class problem is assumed and119873 is set to 3 Considering class 1if the first informative order is ldquoFeature 1 is Largerdquo the secondinformative order is ldquoFeature 5 is Mediumrdquo and the thirdinformative order is ldquoFeature 3 is Smallrdquo Considering class2 if the first informative order is ldquoFeature 10 is Smallrdquo thesecond informative order is ldquoFeature 5 is Smallrdquo and the thirdinformative order is ldquoFeature 6 is Largerdquo In case of the classorder ldquoClass 1 then Class 2rdquo the rule automatically generatedfrom the simple OR approach of our proposed method is asRule 1

In case of the class order "Class 2 then Class 1", the rule is slightly modified to Rule 2.

The second approach creates a rule with consideration of both the order of the informative linguistic features and the class order. We call this approach the "layered" rule. All N linguistic features are arranged by informative order, alternating between the classes. In case of the class order "Class 1 then Class 2", the rule automatically generated by the layered approach for the same scenario as above is Rule 3.

In case of the class order "Class 2 then Class 1", the extracted rule will be Rule 4.
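As an illustrative sketch (not the authors' implementation), both rule forms can be generated mechanically from per-class ranked lists of linguistic features; with the example rankings above, the two helpers below reproduce Rule 1 and Rule 3:

def simple_or_rule(ranked, class_order):
    # ranked: dict class_label -> list of (feature, term), most informative first.
    # All conditions of a class are OR-ed together into one IF/ELSEIF branch.
    lines = []
    for i, cls in enumerate(class_order):
        conds = " OR ".join(f'"Feature {f} is {t}"' for f, t in ranked[cls])
        lines.append(f'{"IF" if i == 0 else "ELSEIF"} {conds} THEN "Class is {cls}"')
    lines.append("END")
    return "\n".join(lines)

def layered_rule(ranked, class_order):
    # Interleaves the classes' conditions by informative order, one condition per branch.
    lines, first = [], True
    depth = max(len(ranked[c]) for c in class_order)
    for i in range(depth):
        for cls in class_order:
            if i < len(ranked[cls]):
                f, t = ranked[cls][i]
                lines.append(f'{"IF" if first else "ELSEIF"} "Feature {f} is {t}" THEN "Class is {cls}"')
                first = False
    lines.append("END")
    return "\n".join(lines)

ranked = {1: [(1, "Large"), (5, "Medium"), (3, "Small")],
          2: [(10, "Small"), (5, "Small"), (6, "Large")]}
print(simple_or_rule(ranked, (1, 2)))  # reproduces Rule 1
print(layered_rule(ranked, (1, 2)))    # reproduces Rule 3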

2.3. Neurofuzzy Method with Feature Selection and Rule Extraction for a High Dimensional Dataset via an Iterative Partition Method. An iterative partition method is designed for application to a dataset that has a large number of features. The idea is to partition the entire set of features into subclusters; each cluster is used in informative feature selection. The structure of the algorithm is displayed in Figure 3. The first step in the algorithm is to define the desired number of features (F) to be used in the final step. The original dataset is partitioned into n subsets, and each subset is used as an input to the neurofuzzy method. All selected features are then combined to create a dataset with selected features. The partitioning and feature selection are performed iteratively until the desired number of informative features is achieved, as sketched below.
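A minimal sketch of this loop, assuming a routine select_fn that stands in for one training-and-selection pass of the neurofuzzy network and returns fewer features than it receives (all names here are ours):

def iterative_partition_select(features, select_fn, n_subsets, F):
    # Repeatedly partition the feature list into n_subsets groups, run the
    # selection on each group, and pool the survivors, until at most F remain.
    while len(features) > F:
        size = -(-len(features) // n_subsets)  # ceiling division
        subsets = [features[i:i + size] for i in range(0, len(features), size)]
        features = [f for subset in subsets for f in select_fn(subset)]
    return features

# Toy selection that keeps every tenth feature, just to exercise the loop.
print(iterative_partition_select(list(range(7070)), lambda s: s[::10], n_subsets=5, F=14))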

3. Experimental Results and Discussion

The diffuse large B-cell lymphomas (DLBCL) dataset, consisting of 77 microarray experiments with 7070 gene expression levels [27], was utilized in this research. It was made available to the public at http://www.broadinstitute.org/cgi-bin/cancer/datasets.cgi. There are two classes in the dataset: diffuse large B-cell lymphoma (DLBCL) is referred to as class 1, and follicular lymphoma (FL) is referred to as class 2. These 2 types of B-cell lineage malignancies have very different clinical presentations, natural histories, and responses to therapy [27]. Because DLBCLs are the most common lymphoid malignancy in adults, a method that can efficiently classify these 2 lymphomas is therefore very


Table 3: Means (c_j) and variances (σ_j) of the membership functions S, M, and L of the 14 selected features after training (final values) for the DLBCL dataset.

Feature    c_S        c_M        c_L        σ_S       σ_M       σ_L
83         42217      244422     448788     70018     68702     68640
87         89961      244306     401299     55559     53710     53294
207        -316059    -88806     138408     73797     75289     75806
355        -84419     12311      109255     31586     31900     37146
450        1811843    4047102    6272187    746321    748685    750148
546        594454     2361230    4125430    590479    591615    595343
931        -404615    -26100     348347     124371    124241    125706
2164       552324     1048629    1545862    168960    168791    173626
2360       566830     1351743    2133887    262543    263800    265277
2479       -10591     41380      95801      17087     18011     23355
6264       2972543    6890328    10810781   1308614   1309833   1308325
7043       8683574    12940488   17192838   1424231   1424378   1422902
7052       -21680     387083     797666     137781    137389    139518
7057       217169     507375     803842     100208    102060    98289

Figure 3: Iterative partition method for the neurofuzzy method with feature selection and rule extraction for a large dataset. The flowchart partitions the original dataset into n subsets (Sub 1, ..., Sub n), selects features from each to form a dataset with selected features, and loops back ("No") until the number of features ≤ F, at which point ("Yes") the final dataset with selected features is produced.

desirable. In the dataset, there are 58 samples of the DLBCL class and 19 samples of the FL class.

The selected informative features from the learning phase are used for the classification task by both the direct calculation and the logical rule approaches. The 10-fold cross validation is performed in the experiments, and the results displayed are the average results on the validation sets over the 10 cross validations. During the training step, because the number of features was too large, the original features were divided into small subsets rather than using the entire 7070 features at once. The 10-fold cross validation was performed on each subset. The informative features from each subset were selected and then combined to form the final set of informative features. The learning rate for weight updating was set to 0.1, and the learning rates used in updating c and σ were set to 0.001.
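For clarity, one parameter-update step with these two learning rates can be sketched as follows; the gradient terms stand in for the network's backpropagated derivatives, which are not restated in this section, so this is only an assumed gradient-descent form:

ETA_W = 0.1     # learning rate for the hidden-to-output weights
ETA_MF = 0.001  # learning rate for the membership-function parameters c and sigma

def sgd_step(w, c, sigma, grad_w, grad_c, grad_sigma):
    # One gradient-descent update of a weight and one membership function's (c, sigma).
    w -= ETA_W * grad_w
    c -= ETA_MF * grad_c
    sigma -= ETA_MF * grad_sigma
    return w, c, sigma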

3.1. Classification Results on DLBCL Microarrays by Direct Calculation Using Selected Features. We tried several choices of the number of selected features (N) and found that N = 14 was adequate for this problem. After training, 14 features were automatically selected by our method. The set of features that yielded the best results on the validation sets among those in the 10-fold cross validation consisted of features 83 (MDM4), 87 (STX16), 207 (NR1D2), 355 (DCLRE1A), 450 (PARK7), 546 (ATIC), 931 (HG4263-HT4533_at), 2164 (CSRP1), 2360 (NASP), 2479 (PGK1), 6264 (HLA-DPB1_2), 7043 (HLA-A_2), 7052 (ITK_2), and 7057 (PLGLB2). The genes corresponding to the selected features are shown in Table 1. The values of the means and standard deviations of all membership functions of the 14 selected features before and after training are shown in Tables 2 and 3, respectively.

The classification rates on the validation sets of the 10-fold cross validation achieved by using the direct calculation were 92.21%, 89.61%, 84.42%, and 84.42% when the numbers of selected linguistic features were set to 14, 10, 5, and 3, respectively. These results show that the proposed method can select a set of informative features out of a huge pool of features. As shown in Table 4, the classification performance is comparable to those of previously proposed methods [26, 27]. However, rather than using random initial weights connecting the hidden layer and the output layer, we used the weights achieved in the current cross validation as the initial weights for the next cross validation. This constrained weight initialization yielded 100.00%, 97.40%, 90.91%, and 92.21% using the direct calculation when the numbers of selected linguistic features were set to 14, 10,


IF "Feature 1 is Large" OR "Feature 5 is Medium" OR "Feature 3 is Small" THEN "Class is 1"
ELSEIF "Feature 10 is Small" OR "Feature 5 is Small" OR "Feature 6 is Large" THEN "Class is 2"
END

Rule 1

IF "Feature 10 is Small" OR "Feature 5 is Small" OR "Feature 6 is Large" THEN "Class is 2"
ELSEIF "Feature 1 is Large" OR "Feature 5 is Medium" OR "Feature 3 is Small" THEN "Class is 1"
END

Rule 2

IF "Feature 1 is Large" THEN "Class is 1"
ELSEIF "Feature 10 is Small" THEN "Class is 2"
ELSEIF "Feature 5 is Medium" THEN "Class is 1"
ELSEIF "Feature 5 is Small" THEN "Class is 2"
ELSEIF "Feature 3 is Small" THEN "Class is 1"
ELSEIF "Feature 6 is Large" THEN "Class is 2"
END

Rule 3

IF "Feature 10 is Small" THEN "Class is 2"
ELSEIF "Feature 1 is Large" THEN "Class is 1"
ELSEIF "Feature 5 is Small" THEN "Class is 2"
ELSEIF "Feature 5 is Medium" THEN "Class is 1"
ELSEIF "Feature 6 is Large" THEN "Class is 2"
ELSEIF "Feature 3 is Small" THEN "Class is 1"
END

Rule 4

5, and 3, respectively. To ensure that the set of all parameters achieved here could reach 100% correct classification on this dataset, we used the networks to classify all 77 microarrays (rather than considering the results from the 10-fold cross validation, in which the outputs could differ when the random groups differ). We found that each of the 10 networks from the 10-fold cross validation still yielded 100% correct classification on the entire dataset.

3.2. Classification Results on DLBCL Microarrays by Logical Rule Using Selected Features. One of the good features of our method is the automatic rule extraction. Even though this approach usually does not yield performance as good as the direct calculation, it provides rules understandable by humans. This is more desirable from the human interpretation aspect than the black-box based direct calculation.

Table 4: Comparison between the proposed method and other algorithms on the DLBCL dataset.

Method                                                  Number of features selected    Classification rate (%)
Naïve Bayes [26]                                        3–8                            83.76
Our method without constrained weight initialization    10                             89.61
Decision trees [26]                                     3–8                            85.46
Our method without constrained weight initialization    14                             92.21
k-NN [26]                                               3–8                            88.60
Weighted voting model [27]                              30                             92.20
VizRank [26]                                            3–8                            93.03
Our method with constrained weight initialization       10                             97.40
SVM [26]                                                3–8                            97.85
Our method with constrained weight initialization       14                             100.00

The N selected linguistic features were used to create rules using both the simple OR and the layered approaches, as mentioned in Section 2.2. A classification rate of 90.91% on the validation sets using only 5 selected linguistic features was achieved using the simple OR rule with the class order "Class 1 then Class 2", where Class 1 and Class 2 denote the DLBCL class and the FL class, respectively. For more details of the results, Table 5 shows the top-10 linguistic features for the DLBCL dataset selected by our method in each cross validation of the 10-fold cross validation. The classification rates in all 10 cross validations were 75.00%, 100.00%, 100.00%, 75.00%, 100.00%, 87.50%, 75.00%, 100.00%, 100.00%, and 100.00%, respectively. The classification rate from the layered rule was 81.82% using the same top 5 linguistic features. The classification rates in all 10 cross validations were 75.00%, 87.50%, 100.00%, 87.50%, 62.50%, 75.00%, 75.00%, 85.71%, 85.71%, and 85.71%, respectively. When using the aforementioned constrained


IF "HLA-A_2 is Small" OR "NASP is Large" OR "MDM4 is Small" OR "ATIC is Medium" OR "STX16 is Small" THEN "Class is DLBCL"
ELSEIF "HLA-A_2 is Large" OR "ATIC is Small" OR "STX16 is Large" OR "MDM4 is Medium" OR "NASP is Small" THEN "Class is FL"
END

Rule 5

Table 5: Top-10 linguistic features for the DLBCL dataset selected by our method in 10-fold cross validation (ranked from least informative on the left to most informative on the right).

Cross validation 1
  Class 1  Linguistic: S M M S L S M L S S
           Original:   7057 207 931 87 2479 7052 546 2360 83 7043
  Class 2  Linguistic: S M L L M S L L L S
           Original:   450 7052 6264 207 83 2360 83 87 7043 546

Cross validation 2
  Class 1  Linguistic: L L M M S S S M L S
           Original:   355 2479 207 931 87 7052 83 546 2360 7043
  Class 2  Linguistic: S M L L L M S L L S
           Original:   355 7052 83 6264 207 83 2360 87 7043 546

Cross validation 3
  Class 1  Linguistic: S M S M L S M L S S
           Original:   7052 931 7057 207 2479 87 546 2360 83 7043
  Class 2  Linguistic: L S S L S L M S L L
           Original:   7057 2479 450 207 2360 83 83 546 87 7043

Cross validation 4
  Class 1  Linguistic: M L M L M S S S L S
           Original:   931 2164 207 2479 546 7052 87 83 2360 7043
  Class 2  Linguistic: S L S L L S M L S L
           Original:   2479 6264 450 207 83 2360 83 87 546 7043

Cross validation 5
  Class 1  Linguistic: L M L M S S M L S S
           Original:   2164 931 2479 207 7052 87 546 2360 83 7043
  Class 2  Linguistic: S S L S L L M L S L
           Original:   2479 450 6264 2360 83 207 83 87 546 7043

Cross validation 6
  Class 1  Linguistic: L M S M L S S M L S
           Original:   355 207 7052 931 2479 87 83 546 2360 7043
  Class 2  Linguistic: L L S L M L S L S L
           Original:   7057 6264 450 207 83 83 2360 87 546 7043

Cross validation 7
  Class 1  Linguistic: S S M M L S M S L S
           Original:   7052 7057 931 207 2479 87 546 83 2360 7043
  Class 2  Linguistic: L S S L L S M L S L
           Original:   6264 2479 450 207 83 2360 83 87 546 7043

Cross validation 8
  Class 1  Linguistic: L M M S S S L S L S
           Original:   355 546 931 7052 87 7057 2479 83 2360 7043
  Class 2  Linguistic: L L S M S L L L S L
           Original:   7057 207 2360 83 2479 6264 83 87 546 7043

Cross validation 9
  Class 1  Linguistic: S S M L M S M S L S
           Original:   7057 7052 931 2479 207 87 546 83 2360 7043
  Class 2  Linguistic: L S S L L S M L S L
           Original:   6264 450 2479 207 83 2360 83 87 546 7043

Cross validation 10
  Class 1  Linguistic: M L M S S S M S L S
           Original:   207 2479 931 87 7052 7057 546 83 2360 7043
  Class 2  Linguistic: L L S L S L M L S L
           Original:   6264 7057 450 207 2360 83 83 87 546 7043


IF "HLA-A_2 is Small" THEN "Class is DLBCL"
ELSEIF "HLA-A_2 is Large" THEN "Class is FL"
ELSEIF "NASP is Large" THEN "Class is DLBCL"
ELSEIF "ATIC is Small" THEN "Class is FL"
ELSEIF "MDM4 is Small" THEN "Class is DLBCL"
ELSEIF "STX16 is Large" THEN "Class is FL"
ELSEIF "ATIC is Medium" THEN "Class is DLBCL"
ELSEIF "MDM4 is Medium" THEN "Class is FL"
ELSEIF "STX16 is Small" THEN "Class is DLBCL"
ELSEIF "NASP is Small" THEN "Class is FL"
END

Rule 6

weight initialization, the classification rates were the same. We also tried to increase the number of selected features to 10, but it did not help; the results were the same as those using 5 features.

From Table 5, it can be seen that the sets of informative features for class 1 and class 2 differ across the cross validations. That results in a number of different rules. However, in a real application, we have to come up with the best rules among them. We propose to use the summation of the feature informative levels over all 10 cross validations. The feature informative level is simply defined by the informative order. For example, if only 10 linguistic features are considered, the most informative one has a feature informative level of 1.0, the second one 0.9, and so on. Hence, the tenth most informative feature has a feature informative level of 0.1, and the remaining features get a feature informative level of 0.0. The informative levels of each feature are then summed across all cross validations to yield the overall informative level of that feature.
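A small sketch of this scoring scheme (the function and variable names are ours); with 10 cross validations and top-10 lists, the maximum attainable level is 10.0:

from collections import defaultdict

def overall_informative_levels(per_cv_rankings, top=10):
    # per_cv_rankings: one ranked list per cross validation, each containing
    # (original_feature, term) tuples from most to least informative.
    # The k-th most informative feature scores (top - k) / top, that is,
    # 1.0, 0.9, ..., 0.1; features outside the top list score 0.0.
    levels = defaultdict(float)
    for ranking in per_cv_rankings:
        for k, key in enumerate(ranking[:top]):
            levels[key] += (top - k) / top
    return dict(levels)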

We show the overall feature informative levels from the 10-fold cross validation using the top-10 features in Table 6. In this case, the highest possible informative level is 10.0. Out of the 42 linguistic features for each class, there were only 12 and 13 linguistic features with nonzero informative levels for class 1 and class 2, respectively. That means the remaining features did not appear at all in the top-10 list of any cross validation. The results showed that, for class 1, the first 5 most informative linguistic features, ranked from most to least informative, were "Feature 7043 (HLA-A_2) is Small", "Feature 2360 (NASP) is Large", "Feature 83 (MDM4) is Small", "Feature 546 (ATIC) is Medium", and "Feature 87 (STX16) is Small", respectively. For class 2, the first 5 most informative linguistic features were "Feature 7043 is Large", "Feature 546 is Small", "Feature 87 is Large", "Feature 83 is Medium", and "Feature 2360 is Small", respectively. It is worth noting that the last statement could also be "Feature 83 is Large" because it has the same feature informative level of 5.5 as "Feature 2360 is Small". This information was used to create rules as described in Section 2.2. Hence, the rule extracted using the simple OR approach is Rule 5.

Using the layered approach, the extracted rule is Rule 6.

Table 6: Feature informative levels from the 10-fold cross validation using the top-10 features.

Class 1
Original feature    Linguistic term    Informative level
7043                S                  10.0
2360                L                  8.7
83                  S                  8.1
546                 M                  6.5
87                  S                  5.5
2479                L                  4.2
7052                S                  3.9
207                 M                  2.8
931                 M                  2.8
7057                S                  1.9
355                 L                  0.3
2164                L                  0.3

Class 2
Original feature    Linguistic term    Informative level
7043                L                  9.8
546                 S                  9.1
87                  L                  8.1
83                  M                  6.2
2360                S                  5.5
83                  L                  5.5
207                 L                  4.1
6264                L                  2.3
450                 S                  2.0
2479                S                  1.4
7057                L                  0.5
7052                M                  0.4
355                 S                  0.1

4. Conclusion

The classification problem of diffuse large B-cell lymphoma (DLBCL) versus follicular lymphoma (FL) based on high dimensional microarray data was investigated by our neurofuzzy classification scheme. Our direct calculation method could achieve a 100% classification rate on the validation sets of the 10-fold cross validation by using only 14 out of the 7070 features in the dataset. These 14 features, corresponding to the genes MDM4, STX16, NR1D2, DCLRE1A, PARK7, ATIC, HG4263-HT4533_at, CSRP1, NASP, PGK1, HLA-DPB1_2, HLA-A_2, ITK_2, and PLGLB2, were automatically selected by our method. The method could also identify the informative linguistic features for each class. For the DLBCL class, the first 5 most informative linguistic features were "HLA-A_2 is Small", "NASP is Large", "MDM4 is Small", "ATIC is Medium", and "STX16 is Small", respectively. For the FL class, the first 5 most informative linguistic features were "HLA-A_2 is Large", "ATIC is Small", "STX16 is Large", "MDM4 is Medium", and "NASP is Small", respectively. The terms Small, Medium, and Large of each original feature were automatically determined by our method. The informative linguistic features were used to create rules that achieved a 90.91% classification rate on the validation sets of the 10-fold


cross validation. Even though this rule creation approach yielded worse classification performance than the direct calculation, it is more desirable from the human interpretation aspect. Very good results are achieved on this standard high dimensional dataset. The set of selected informative genes will be useful for further investigation in the fields of bioinformatics and medicine. To ensure that this set of selected features can be used in general, it should be applied to more DLBCL versus FL cases.

Conflict of Interests

The authors declare no conflict of interests

References

[1] G. G. Towell, J. W. Shavlik, and M. O. Noordewier, "Refinement of approximate domain theories by knowledge-based neural networks," in Proceedings of the 8th National Conference on Artificial Intelligence, pp. 861–866, Boston, Mass, USA, July-August 1990.

[2] L. M. Fu, "Knowledge-based connectionism for revising domain theories," IEEE Transactions on Systems, Man, and Cybernetics, vol. 23, no. 1, pp. 173–182, 1993.

[3] L. M. Fu, "Learning capacity and sample complexity on expert networks," IEEE Transactions on Neural Networks, vol. 7, no. 6, pp. 1517–1520, 1996.

[4] S. Snyders and C. W. Omlin, "What inductive bias gives good neural network training performance?" in Proceedings of the International Joint Conference on Neural Networks (IJCNN '00), pp. 445–450, Como, Italy, July 2000.

[5] D.-X. Zhang, Y. Liu, and Z.-Q. Wang, "Effectively extracting rules from trained neural networks based on the characteristics of the classification hypersurfaces," in Proceedings of the International Conference on Machine Learning and Cybernetics (ICMLC '05), vol. 3, pp. 1541–1546, IEEE, Guangzhou, China, August 2005.

[6] L. Fu, "Rule generation from neural networks," IEEE Transactions on Systems, Man, and Cybernetics, vol. 24, no. 8, pp. 1114–1124, 1994.

[7] F. Beloufa and M. A. Chikh, "Design of fuzzy classifier for diabetes disease using modified artificial bee colony algorithm," Computer Methods and Programs in Biomedicine, vol. 112, no. 1, pp. 92–103, 2013.

[8] J. Luo and D. Chen, "Fuzzy information granulation based decision support applications," in Proceedings of the International Symposium on Information Processing (ISIP '08), pp. 197–201, Moscow, Russia, May 2008.

[9] A. T. Azar, S. A. El-Said, and A. E. Hassanien, "Fuzzy and hard clustering analysis for thyroid disease," Computer Methods and Programs in Biomedicine, vol. 111, no. 1, pp. 1–16, 2013.

[10] C.-H. L. Lee, A. Liu, and W.-S. Chen, "Pattern discovery of fuzzy time series for financial prediction," IEEE Transactions on Knowledge and Data Engineering, vol. 18, no. 5, pp. 613–625, 2006.

[11] R. Yang, Z. Wang, P.-A. Heng, and K.-S. Leung, "Classification of heterogeneous fuzzy data by Choquet integral with fuzzy-valued integrand," IEEE Transactions on Fuzzy Systems, vol. 15, no. 5, pp. 931–942, 2007.

[12] P. Hartono and S. Hashimoto, "An interpretable neural network ensemble," in Proceedings of the 33rd Annual Conference of the IEEE Industrial Electronics Society (IECON '07), pp. 228–232, Taipei, Taiwan, November 2007.

[13] T.-G. Fan and X.-Z. Wang, "A new approach to weighted fuzzy production rule extraction from neural networks," in Proceedings of the International Conference on Machine Learning and Cybernetics, pp. 3348–3351, Shanghai, China, August 2004.

[14] M. Gao, X. Hong, and C. J. Harris, "A neurofuzzy classifier for two class problems," in Proceedings of the 12th UK Workshop on Computational Intelligence (UKCI '12), pp. 1–6, Edinburgh, UK, September 2012.

[15] W. Duch, R. Adamczak, and K. Grabczewski, "A new methodology of extraction, optimization and application of crisp and fuzzy logical rules," IEEE Transactions on Neural Networks, vol. 12, no. 2, pp. 277–306, 2001.

[16] W. Duch, R. Setiono, and J. M. Zurada, "Computational intelligence methods for rule-based data understanding," Proceedings of the IEEE, vol. 92, no. 5, pp. 771–805, 2004.

[17] A. T. Azar, "Adaptive network based on fuzzy inference system for equilibrated urea concentration prediction," Computer Methods and Programs in Biomedicine, vol. 111, no. 3, pp. 578–591, 2013.

[18] A. Ouarda, A. Ahlem, B. Brahim, and B. Kheir, "Comparison of neuro-fuzzy models for classification fingerprint images," International Journal of Computer Applications, vol. 65, no. 9, pp. 12–16, 2013.

[19] Z. Yang, Y. Wang, and G. Ouyang, "Adaptive neuro-fuzzy inference system for classification of background EEG signals from ESES patients and controls," The Scientific World Journal, vol. 2014, Article ID 140863, 8 pages, 2014.

[20] M. S. Al-Batah, N. A. M. Isa, M. F. Klaib, and M. A. Al-Betar, "Multiple adaptive neuro-fuzzy inference system with automatic features extraction algorithm for cervical cancer recognition," Computational and Mathematical Methods in Medicine, vol. 2014, Article ID 181245, 12 pages, 2014.

[21] J.-A. Martinez-Leon, J.-M. Cano-Izquierdo, and J. Ibarrola, "Feature selection applying statistical and neurofuzzy methods to EEG-based BCI," Computational Intelligence and Neuroscience, vol. 2015, Article ID 781207, 17 pages, 2015.

[22] D. Chakraborty and N. R. Pal, "A neuro-fuzzy scheme for simultaneous feature selection and fuzzy rule-based classification," IEEE Transactions on Neural Networks, vol. 15, no. 1, pp. 110–123, 2004.

[23] N. Eiamkanitchat, N. Theera-Umpon, and S. Auephanwiriyakul, "A novel neuro-fuzzy method for linguistic feature selection and rule-based classification," in Proceedings of the 2nd International Conference on Computer and Automation Engineering (ICCAE '10), vol. 2, pp. 247–252, Singapore, February 2010.

[24] N. Eiamkanitchat, N. Theera-Umpon, and S. Auephanwiriyakul, "Colon tumor microarray classification using neural network with feature selection and rule-based classification," in Advances in Neural Network Research and Applications, vol. 67 of Lecture Notes in Electrical Engineering, pp. 363–372, Springer, Berlin, Germany, 2010.

[25] C.-H. Chen and C.-J. Lin, "Compensatory neurofuzzy inference systems for pattern classification," in Proceedings of the International Symposium on Computer, Consumer and Control (IS3C '12), pp. 88–91, IEEE, Taichung, Taiwan, June 2012.


[26] M. Mramor, G. Leban, J. Demsar, and B. Zupan, "Visualization-based cancer microarray data classification analysis," Bioinformatics, vol. 23, no. 16, pp. 2147–2154, 2007.

[27] M. A. Shipp, K. N. Ross, P. Tamayo et al., "Diffuse large B-cell lymphoma outcome prediction by gene-expression profiling and supervised machine learning," Nature Medicine, vol. 8, no. 1, pp. 68–74, 2002.



6 Mathematical Problems in Engineering

Table 2 Means (119888119895) and variances (120590

119895) of membership functions S M L of 14 selected features before training for the DLBCL dataset

Initial valuesFeature 119888S 119888M 119888L 120590S 120590M 120590L

83 185636 254651 323665 69014 69014 6901487 200414 255633 310852 55219 55219 55219207 minus164730 minus89712 minus14693 75018 75018 75018355 minus22187 11675 45538 33862 33862 33862450 3384316 4126221 4868126 741905 741905 741905546 1810664 2396104 2981544 585440 585440 585440931 minus148675 minus26558 95558 122117 122117 1221172164 865360 1032987 1200614 167627 167627 1676272360 1107608 1373013 1638418 265405 265405 2654052479 25192 42494 59795 17301 17301 173016264 5419876 6722623 8025370 1302747 1302747 13027477043 11535574 12983805 14432037 1448232 1448232 14482327052 259998 393052 526106 133054 133054 1330547057 422074 516844 611614 94770 94770 94770

The algorithm selects the informative linguistic featurestwice The first selection is done by considering the bipolaroutput from the hidden layer The linguistic feature withoutput of +1 is considered an informative featureThe secondselection is done by considering the weight values betweenthe hidden layer and the output layer The larger weightvalue indicates the more informative feature Consider theproposed network for logical rule extraction the IF part isextracted from the hidden layer As mentioned previouslywe use 3 membership functions representing the linguisticterms SM L for each original feature The summation ofthe product of weights and output from the hidden layer isinterpreted to the logical OR in the extracted rules The finalclassified class from the output layer is interpreted to THENin the classification rules After finishing the training phasethe weights are sorted We use the final values of weights toselect the ldquoTop119873rdquo informative features

From the structure described earlier the rule extractedfrom our network can be interpreted by 2 approaches Bothapproaches combine all conditions to only 1 rule The firstapproach is the ldquosimple ORrdquo rule All 119873 features are used tocreate a simple OR rule of each class for example when a 2-class problem is assumed and119873 is set to 3 Considering class 1if the first informative order is ldquoFeature 1 is Largerdquo the secondinformative order is ldquoFeature 5 is Mediumrdquo and the thirdinformative order is ldquoFeature 3 is Smallrdquo Considering class2 if the first informative order is ldquoFeature 10 is Smallrdquo thesecond informative order is ldquoFeature 5 is Smallrdquo and the thirdinformative order is ldquoFeature 6 is Largerdquo In case of the classorder ldquoClass 1 then Class 2rdquo the rule automatically generatedfrom the simple OR approach of our proposed method is asRule 1

In case of the class order ldquoClass 2 then Class 1rdquo the rulewill be slightly modified to Rule 2

The second approach creates a rule with the considerationof the order of informative linguistic features and class orderWe call this approach the ldquolayeredrdquo rule All N linguisticfeatures are created with the consideration of the order of

informative features and class order In case of the class orderldquoClass 1 then Class 2rdquo the rule automatically generated fromthe layered approach for the same scenario as above is asRule 3

In case of the class order ldquoClass 2 then Class 1rdquo theextracted rule will be Rule 4

23 Neurofuzzy Method with Feature Selection and RuleExtraction for High Dimensional Dataset via Iterative Par-tition Method An iterative partition method is designedfor the application on a dataset that has a large amount offeatures The idea is to partition the entire set of featuresinto subclusters Each cluster is used in informative featureselection The structure of the algorithm is displayed inFigure 3The first step in the algorithm is to define the desirednumber of features (119865) to be used in the final stepTheoriginaldataset is partitioned into 119899 subset Each subset is used asan input to the neurofuzzy method All selected features arethen combined to create the dataset with selected featuresThe partitioning and feature selection is iteratively performeduntil the desired number of informative features is achieved

3 Experimental Results and Discussion

The diffuse large B-cell lymphomas (DLBCL) dataset con-sisting of 77 microarray experiments with 7070 gene expres-sion levels [27] was utilized in this research It was madeavailable to the public at httpwwwbroadinstituteorgcgi-bincancerdatasetscgi There are two classes in thedataset that is diffuse large B-cell lymphoma (DLBCL) isreferred to as class 1 and follicular lymphoma (FL) is referredto as class 2 These 2 types of B-cell lineage malignancieshave very different clinical presentations natural historiesand response to therapy [27] Because DLBCLs are the mostcommon lymphoid malignancy in adults a method thatcan efficiently classify these 2 lymphomas is therefore very

Mathematical Problems in Engineering 7

Table 3 Means (119888119895) and variances (120590

119895) of membership functions S M L of 14 selected features after training for the DLBCL dataset

Final valuesFeature 119888S 119888M 119888L 120590S 120590M 120590L

83 42217 244422 448788 70018 68702 6864087 89961 244306 401299 55559 53710 53294207 minus316059 minus88806 138408 73797 75289 75806355 minus84419 12311 109255 31586 31900 37146450 1811843 4047102 6272187 746321 748685 750148546 594454 2361230 4125430 590479 591615 595343931 minus404615 minus26100 348347 124371 124241 1257062164 552324 1048629 1545862 168960 168791 1736262360 566830 1351743 2133887 262543 263800 2652772479 minus10591 41380 95801 17087 18011 233556264 2972543 6890328 10810781 1308614 1309833 13083257043 8683574 12940488 17192838 1424231 1424378 14229027052 minus21680 387083 797666 137781 137389 1395187057 217169 507375 803842 100208 102060 98289

No

Original dataset

Sub 1 Sub 2 Sub 3 Sub 4 Sub n

Dataset with selected features

Partition into n subsets

Final dataset with selected features

Yes

middot middot middot

features le F

Figure 3 Iterative partition method for neurofuzzy method withfeature selection and rule extraction for large dataset

desirable In the dataset there are 58 samples of DLBCL classand 19 samples of FL class

The selected informative features from the learning phaseare used for classification task in both direct calculationand logical rule The 10-fold cross validation is performedin the experiments The results displayed are the averageresults on validation sets over 10 cross validations Duringthe training step because the number of features was toolarge the original features were divided into small subsets

rather than using the entire 7070 features at once The 10-fold cross validation was performed on each subset Theinformative features from each subset were selected and formthe final informative features by combining them togetherThe learning rate for weight updating was set to 01 and thelearning rates used in updating 119888 and 120590 were set to 0001

31 Classification Results on DLBCL Microarrays by DirectCalculation Using Selected Features We tried several choicesof the number of selected features (119873) and found that119873 = 14 was adequate for this problem After training 14features were automatically selected by our method The setof features that yielded the best results on validation setsamong those in 10-fold cross validation consisted of features83 (MDM4) 87 (STX16) 207 (NR1D2) 355 (DCLRE1A)450 (PARK7) 546 (ATIC) 931 (HG4263-HT4533 at) 2164(CSRP1) 2360 (NASP) 2479 (PGK1) 6264 (HLA-DPB1 2)7043 (HLA-A 2) 7052 (ITK 2) and 7057 (PLGLB2) Thegenes corresponding to the selected features are shown inTable 1 The values of means and standard deviations of allmembership functions of the 14 selected features before andafter training are shown in Tables 2 and 3 respectively

The classification rates on the validation sets of 10-foldcross validation achieved by using the direct calculation were9221 8961 8442 and 8442 when the numbersof selected linguistic features were set to 14 10 5 and 3respectively These results show that the proposed methodcan select a set of informative features out of a huge pool offeatures As shown in Table 4 the classification performanceis comparable to those performed by previously proposedmethods [26 27] However rather than using random initialweights connecting between the hidden layer and the outputlayer we used the weights achieved in the current crossvalidation to be the initial weights for the next cross valida-tion This constrained weight initialization yielded 100009740 9091 and 9221using the direct calculation whenthe numbers of selected linguistic features were set to 14 10

8 Mathematical Problems in Engineering

IF ldquoFeature 1 is Largerdquo OR ldquoFeature 5 is Mediumrdquo OR ldquoFeature 3 is SmallrdquoTHEN ldquoClass is 1rdquo

ELSEIF ldquoFeature 10 is Smallrdquo OR ldquoFeature 5 is Smallrdquo OR ldquoFeature 6 is LargerdquoTHEN ldquoClass is 2rdquo

END

Rule 1

IF ldquoFeature 10 is Smallrdquo OR ldquoFeature 5 is Smallrdquo OR ldquoFeature 6 is LargerdquoTHEN ldquoClass is 2rdquo

ELSEIF ldquoFeature 1 is Largerdquo OR ldquoFeature 5 is Mediumrdquo OR ldquoFeature 3 is SmallrdquoTHEN ldquoClass is 1rdquo

END

Rule 2

IF ldquoFeature 1 is Largerdquo THEN ldquoClass is 1rdquoELSEIF ldquoFeature 10 is Smallrdquo THEN ldquoClass is 2rdquoELSEIF ldquoFeature 5 is Mediumrdquo THEN ldquoClass is 1rdquoELSEIF ldquoFeature 5 is Smallrdquo THEN ldquoClass is 2rdquoELSEIF ldquoFeature 3 is Smallrdquo THEN ldquoClass is 1rdquoELSEIF ldquoFeature 6 is Largerdquo THEN ldquoClass is 2rdquoEND

Rule 3

IF ldquoFeature 10 is Smallrdquo THEN ldquoClass is 2rdquoELSEIF ldquoFeature 1 is Largerdquo THEN ldquoClass is 1rdquoELSEIF ldquoFeature 5 is Smallrdquo THEN ldquoClass is 2rdquoELSEIF ldquoFeature 5 is Mediumrdquo THEN ldquoClass is 1rdquoELSEIF ldquoFeature 6 is Largerdquo THEN ldquoClass is 2rdquoELSEIF ldquoFeature 3 is Smallrdquo THEN ldquoClass is 1rdquoEND

Rule 4

5 and 3 respectively To ensure that the set of all parametersachieved here could get 100 correct classification on thisdataset we tried to use the networks to classify all 77microarrays (rather than considering the results from 10-foldcross validation in which the outputs could be different whenthe random groups are different) We found that each of all10 networks from 10-fold cross validation still yielded 100correct classification on the entire dataset

32 Classification Results on DLBCL Microarrays by LogicalRule Using Selected Features One of the good features ofour method is the automatic rule extraction Even thoughthis approach usually does not yield as good performanceas the direct calculation it provides rules understandable forhumanThis ismore desirable from the human interpretationaspect than the black-box based direct calculation

Table 4 Comparison between the proposed method and otheralgorithms on the DLBCL dataset

MethodNumber offeaturesselected

Classificationrate ()

Naıve Bayes [26] 3ndash8 8376Our method without constrainedweight initialization 10 8961

Decision trees [26] 3ndash8 8546Our method without constrainedweight initialization 14 9221

119896-NN [26] 3ndash8 8860Weighted voting model [27] 30 9220VizRank [26] 3ndash8 9303Our method with constrainedweight initialization 10 9740

SVM [26] 3ndash8 9785Our method with constrainedweight initialization 14 10000

The119873 selected linguistic featureswere used to create rulesusing both simple OR and layered approaches asmentioned inSection 22The classification rate of 9091 on validation setsusing only 5 selected linguistic features was achieved usingthe simple OR rule with the class order ldquoClass 1 then Class2rdquo where Class 1 and Class 2 denote the DLBCL class andFL class respectively For more details of the results Table 5shows the top-10 linguistic features for the DLBCL datasetselected by our method in each cross validation of the 10-fold cross validation The classification rates in all 10 crossvalidations were 7500 10000 10000 7500 100008750 7500 10000 10000 and 10000 respectivelyThe classification rate from the layered rule was 8182 usingthe same top 5 linguistic featuresThe details of classificationrates in all 10 cross validations were 7500 8750 100008750 6250 7500 7500 8571 8571 and 8571respectively When using the aforementioned constrained

Mathematical Problems in Engineering 9

IF ldquoHLA-A 2 is Smallrdquo OR ldquoNASP is Largerdquo OR ldquoMDM4 is Smallrdquo OR ldquoATIC is Mediumrdquo OR ldquoSTX16 is SmallrdquoTHEN ldquoClass is DLBCLrdquo

ELSEIF ldquoHLA-A 2 is Largerdquo OR ldquoATIC is Smallrdquo OR ldquoSTX16 is Largerdquo OR ldquoMDM4 is Mediumrdquo OR ldquoNASP is SmallrdquoTHEN ldquoClass is FLrdquo

END

Rule 5

Table 5 Top-10 linguistic features for DLBCL dataset selected by our method in 10-fold cross validation (most informative right leastinformative left)

Cross validation Class Feature type Feature ranking

11 Linguistic S M M S L S M L S S

Original 7057 207 931 87 2479 7052 546 2360 83 7043

2 Linguistic S M L L M S L L L SOriginal 450 7052 6264 207 83 2360 83 87 7043 546

21 Linguistic L L M M S S S M L S

Original 355 2479 207 931 87 7052 83 546 2360 7043

2 Linguistic S M L L L M S L L SOriginal 355 7052 83 6264 207 83 2360 87 7043 546

31 Linguistic S M S M L S M L S S

Original 7052 931 7057 207 2479 87 546 2360 83 7043

2 Linguistic L S S L S L M S L LOriginal 7057 2479 450 207 2360 83 83 546 87 7043

41 Linguistic M L M L M S S S L S

Original 931 2164 207 2479 546 7052 87 83 2360 7043

2 Linguistic S L S L L S M L S LOriginal 2479 6264 450 207 83 2360 83 87 546 7043

51 Linguistic L M L M S S M L S S

Original 2164 931 2479 207 7052 87 546 2360 83 7043

2 Linguistic S S L S L L M L S LOriginal 2479 450 6264 2360 83 207 83 87 546 7043

61 Linguistic L M S M L S S M L S

Original 355 207 7052 931 2479 87 83 546 2360 7043

2 Linguistic L L S L M L S L S LOriginal 7057 6264 450 207 83 83 2360 87 546 7043

71 Linguistic S S M M L S M S L S

Original 7052 7057 931 207 2479 87 546 83 2360 7043

2 Linguistic L S S L L S M L S LOriginal 6264 2479 450 207 83 2360 83 87 546 7043

81 Linguistic L M M S S S L S L S

Original 355 546 931 7052 87 7057 2479 83 2360 7043

2 Linguistic L L S M S L L L S LOriginal 7057 207 2360 83 2479 6264 83 87 546 7043

91 Linguistic S S M L M S M S L S

Original 7057 7052 931 2479 207 87 546 83 2360 7043

2 Linguistic L S S L L S M L S LOriginal 6264 450 2479 207 83 2360 83 87 546 7043

101 Linguistic M L M S S S M S L S

Original 207 2479 931 87 7052 7057 546 83 2360 7043

2 Linguistic L L S L S L M L S LOriginal 6264 7057 450 207 2360 83 83 87 546 7043

10 Mathematical Problems in Engineering

IF ldquoHLA-A 2 is Smallrdquo THEN ldquoClass is DLBCLrdquoELSEIF ldquoHLA-A 2 is Largerdquo THEN ldquoClass is FLrdquoELSEIF ldquoNASP is Largerdquo THEN ldquoClass is DLBCLrdquoELSEIF ldquoATIC is Smallrdquo THEN ldquoClass is FLrdquoELSEIF ldquoMDM4 is Smallrdquo THEN ldquoClass is DLBCLrdquoELSEIF ldquoSTX16 is Largerdquo THEN ldquoClass is FLrdquoELSEIF ldquoATIC is Mediumrdquo THEN ldquoClass is DLBCLrdquoELSEIF ldquoMDM4 is Mediumrdquo THEN ldquoClass is FLrdquoELSEIF ldquoSTX16 is Smallrdquo THEN ldquoClass is DLBCLrdquoELSEIF ldquoNASP is Smallrdquo THEN ldquoClass is FLrdquoEND

Rule 6

weight initialization the classification rates were the sameWe also tried to increase the number of selected features to10 but it did not help The results were the same as that using5 features

From Table 5 it can be seen that the sets of informativefeatures for class 1 and class 2 are different across the crossvalidations That will result in a number of different rulesHowever in the real application we will have to come upwith the best rules among them We propose to use thesummation of the feature informative level in all 10 crossvalidations The feature informative level is simply definedby the informative order For example if only 10 linguisticfeatures are considered the most informative one will havethe feature informative level of 10 the second one will havethat of 09 and so on Hence the tenth most informativefeature will have the feature informative factor of 01 and theremaining will get the feature informative level of 00 Theinformative levels of each feature are then summed across allcross validations to yield the overall informative level of thatfeature

We show the overall feature informative levels from the10-fold cross validation using top-10 features in Table 6 Inthis case the highest possible informative level is 100 Outof 42 linguistic features for each class there were only 12 and13 linguistic features with nonzero informative level for class1 and class 2 respectively That means the remaining featuresdid not appear at all in the top-10 list of any cross validationThe results showed that for class 1 the first 5most informativelinguistic features ranking from the most informative to theless informative were ldquoFeature 7043 (HLA-A 2) is SmallrdquoldquoFeature 2360 (NASP) is Largerdquo ldquoFeature 83 (MDM4) isSmallrdquo ldquoFeature 546 (ATIC) is Mediumrdquo and ldquoFeature 87(STX16) is Smallrdquo respectively For class 2 the first 5 mostinformative linguistic features were ldquoFeature 7043 is LargerdquoldquoFeature 546 is Smallrdquo ldquoFeature 87 is Largerdquo ldquoFeature 83is Mediumrdquo and ldquoFeature 2360 is Smallrdquo respectively It isworthwhile noting that the last statement can also be ldquoFeature83 is Largerdquo because it has the same feature informative levelof 55 as for ldquoFeature 2360 is Smallrdquo This information wasused to create rules as described in Section 22 Hence therule extracted using the simple OR approach is as Rule 5

Using the layered approach the extracted rule is in Rule 6

Table 6 Feature informative levels from the 10-fold cross validationusing top-10 features

Original feature Linguistic term Informative levelClass 1

7043 S 1002360 L 8783 S 81546 M 6587 S 552479 L 427052 S 39207 M 28931 M 287057 S 19355 L 032164 L 03

Class 27043 L 98546 S 9187 L 8183 M 622360 S 5583 L 55207 L 416264 L 23450 S 202479 S 147057 L 057052 M 04355 S 01

4 Conclusion

The classification problem of diffuse large B-cell lymphoma(DLBCL) versus follicular lymphoma (FL) based on highdimensional microarray data was investigated by our neu-rofuzzy classification scheme Our direct calculation methodcould achieve 100 classification rate on the validation setsof 10-fold cross validation by using only 14 out of 7070features in the dataset These 14 features including genesMDM4 STX16 NR1D2 DCLRE1A PARK7 ATIC HG4263-HT4533 at CSRP1 NASP PGK1 HLA-DPB1 2 HLA-A 2ITK 2 and PLGLB2 were automatically selected by ourmethod The method could also identify the informativelinguistic features for each class For the DLBCL class thefirst 5 most informative linguistic features were ldquoHLA-A 2is Smallrdquo ldquoNASP is Largerdquo ldquoMDM4 is Smallrdquo ldquoATIC isMediumrdquo and ldquoSTX16 is Smallrdquo respectively For class 2the first 5 most informative linguistic features were ldquoHLA-A 2 is Largerdquo ldquoATIC is Smallrdquo ldquoSTX16 is Largerdquo ldquoMDM4is Mediumrdquo and ldquoNASP is Smallrdquo respectively The termsSmall Medium and Large of each original feature wereautomatically determined by our method The informativelinguistic features were used to create rules that achieved9091 classification rate on the validation sets of 10-fold

Mathematical Problems in Engineering 11

cross validation Even though this rule creation approachyielded worse classification performance than the direct cal-culation it is more desirable from the human interpretationaspect It can be seen that very good results are achievedin this standard high dimensional dataset A set of selectedinformative genes will be useful for further investigation inthe fields of bioinformatics or medicines To ensure that thisset of selected features can be used in general it should beapplied to more DLBCL versus FL cases

Conflict of Interests

The authors declare no conflict of interests

References

[1] G G Towell J W Shavlik and M O Noordenier ldquoRefinementof approximate domain theories by knowledge based neuralnetworkrdquo in Proceedings of the 8th National Conference onArtificial Intelligence pp 861ndash866 Boston Mass USA July-August 1990

[2] L M Fu ldquoKnowledge-based connectionism for revisingdomain theoriesrdquo IEEE Transactions on Systems Man andCybernetics vol 23 no 1 pp 173ndash182 1993

[3] L M Fu ldquoLearning capacity and sample complexity on expertnetworksrdquo IEEE Transactions on Neural Networks vol 7 no 6pp 1517ndash1520 1996

[4] S Snyders and C W Omlin ldquoWhat inductive bias gives goodneural network training performancerdquo in Proceedings of theInternational Joint Conference on Neural Networks (IJCNN rsquo00)pp 445ndash450 Como Italy July 2000

[5] D-X Zhang Y Liu and Z I-Q Wang ldquoEffectively extractingrules from trained neural networks based on the characteristicsof the classification hypersurfacesrdquo inProceedings of the Interna-tional Conference onMachine Learning and Cybernetics (ICMLCrsquo05) vol 3 pp 1541ndash1546 IEEE Guangzhou China August2005

[6] L Fu ldquoRule generation from neural networksrdquo IEEE Transac-tions on Systems Man and Cybernetics vol 24 no 8 pp 1114ndash1124 1994

[7] F Beloufa and M A Chikh ldquoDesign of fuzzy classifier fordiabetes disease usingmodified artificial bee colony algorithmrdquoComputer Methods and Programs in Biomedicine vol 112 no 1pp 92ndash103 2013

[8] J Luo andD Chen ldquoFuzzy information granulation based deci-sion support applicationsrdquo in Proceedings of the InternationalSymposium on Information Processing (ISIP rsquo08) pp 197ndash201Moscow Russia May 2008

[9] A T Azar S A El-Said and A E Hassanien ldquoFuzzy and hardclustering analysis for thyroid diseaserdquo Computer Methods andPrograms in Biomedicine vol 111 no 1 pp 1ndash16 2013

[10] C-H L Lee A Liu and W-S Chen ldquoPattern discovery offuzzy time series for financial predictionrdquo IEEE Transactionson Knowledge and Data Engineering vol 18 no 5 pp 613ndash6252006

[11] R Yang Z Wang P-A Heng and K-S Leung ldquoClassificationof heterogeneous fuzzy data by choquet integral with fuzzy-valued integrandrdquo IEEE Transactions on Fuzzy Systems vol 15no 5 pp 931ndash942 2007

[12] P Hartono and S Hashimoto ldquoAn interpretable neural networkensemblerdquo in Proceedings of the 33rd Annual Conference of theIEEE Industrial Electronics Society (IECON rsquo07) pp 228ndash232Taipei Taiwan November 2007

[13] T-G Fan and X-Z Wang ldquoA new approach to weightedfuzzy production rule extraction from neural networksrdquo inProceedings of the International Conference onMachine Learningand Cybernetics pp 3348ndash3351 Shanghai China August 2004

[14] M Gao X Hong and C J Harris ldquoA neurofuzzy classifier fortwo class problemsrdquo in Proceedings of the 12th UKWorkshop onComputational Intelligence (UKCI rsquo12) pp 1ndash6 Edinburgh UKSeptember 2012

[15] W Duch R Adamczak and K Grabczewski ldquoA new method-ology of extraction optimization and application of crisp andfuzzy logical rulesrdquo IEEE Transactions on Neural Networks vol12 no 2 pp 277ndash306 2001

[16] W Duch R Setiono and J M Zurada ldquoComputational intelli-gence methods for rule-based data understandingrdquo Proceedingsof the IEEE vol 92 no 5 pp 771ndash805 2004

[17] A T Azar ldquoAdaptive network based on fuzzy inference sys-tem for equilibrated urea concentration predictionrdquo ComputerMethods and Programs in Biomedicine vol 111 no 3 pp 578ndash591 2013

[18] A Ouarda A Ahlem B Brahim and B Kheir ldquoComparisonof neuro-fuzzy models for classification fingerprint imagesrdquoInternational Journal of Computer Applications vol 65 no 9pp 12ndash16 2013

[19] Z Yang Y Wang and G Ouyang ldquoAdaptive neuro-fuzzyinference system for classification of background EEG signalsfrom ESES patients and controlsrdquo The Scientific World Journalvol 2014 Article ID 140863 8 pages 2014

[20] M S Al-Batah N A M Isa M F Klaib and M A Al-Betar ldquoMultiple adaptive neuro-fuzzy inference system withautomatic features extraction algorithm for cervical cancerrecognitionrdquo Computational and Mathematical Methods inMedicine vol 2014 Article ID 181245 12 pages 2014

[21] J-A Martinez-Leon J-M Cano-Izquierdo and J IbarrolaldquoFeature selection applying statistical and neurofuzzy methodsto EEG-based BCIrdquo Computational Intelligence and Neuro-science vol 2015 Article ID 781207 17 pages 2015

[22] D Chakraborty andN R Pal ldquoA neuro-fuzzy scheme for simul-taneous feature selection and fuzzy rule-based classificationrdquoIEEE Transactions onNeural Networks vol 15 no 1 pp 110ndash1232004

[23] N Eiamkanitchat N Theera-Umpon and S Auephanwiriya-kul ldquoA novel neuro-fuzzymethod for linguistic feature selectionand rule-based classificationrdquo in Proceedings of the 2nd Inter-national Conference on Computer and Automation Engineering(ICCAE rsquo10) vol 2 pp 247ndash252 Singapore February 2010

[24] N Eiamkanitchat N Theera-Umpon and S Auephanwiriya-kul ldquoColon tumor microarray classification using neural net-work with feature selection and rule-based classificationrdquo inAdvances in Neural Network Research and Applications vol 67of Lecture Notes in Electrical Engineering pp 363ndash372 SpringerBerlin Germany 2010

[25] C-H Chen and C-J Lin ldquoCompensatory neurofuzzy inferencesystems for pattern classificationrdquo in Proceedings of the Interna-tional Symposium on Computer Consumer and Control (IS3Crsquo12) pp 88ndash91 IEEE Taichung Taiwan June 2012

12 Mathematical Problems in Engineering

[26] MMramor G Leban J Demsar and B Zupan ldquoVisualization-based cancer microarray data classification analysisrdquo Bioinfor-matics vol 23 no 16 pp 2147ndash2154 2007

[27] M A Shipp K N Ross P Tamayo et al ldquoDiffuse large B-celllymphoma outcome prediction by gene-expression profilingand supervised machine learningrdquo Nature Medicine vol 8 no1 pp 68ndash74 2002

Submit your manuscripts athttpwwwhindawicom

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

MathematicsJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Mathematical Problems in Engineering

Hindawi Publishing Corporationhttpwwwhindawicom

Differential EquationsInternational Journal of

Volume 2014

Applied MathematicsJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Probability and StatisticsHindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Mathematical PhysicsAdvances in

Complex AnalysisJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

OptimizationJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

CombinatoricsHindawi Publishing Corporationhttpwwwhindawicom Volume 2014

International Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Operations ResearchAdvances in

Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Function Spaces

Abstract and Applied AnalysisHindawi Publishing Corporationhttpwwwhindawicom Volume 2014

International Journal of Mathematics and Mathematical Sciences

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

The Scientific World JournalHindawi Publishing Corporation httpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Algebra

Discrete Dynamics in Nature and Society

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Decision SciencesAdvances in

Discrete MathematicsJournal of

Hindawi Publishing Corporationhttpwwwhindawicom

Volume 2014 Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Stochastic AnalysisInternational Journal of

Page 7: Review Article On Feature Selection and Rule Extraction ...with feature selection and rule extraction and its training scheme designed for a high dimensional dataset is described in

Mathematical Problems in Engineering 7

Table 3 Means (119888119895) and variances (120590

119895) of membership functions S M L of 14 selected features after training for the DLBCL dataset

Final valuesFeature 119888S 119888M 119888L 120590S 120590M 120590L

83 42217 244422 448788 70018 68702 6864087 89961 244306 401299 55559 53710 53294207 minus316059 minus88806 138408 73797 75289 75806355 minus84419 12311 109255 31586 31900 37146450 1811843 4047102 6272187 746321 748685 750148546 594454 2361230 4125430 590479 591615 595343931 minus404615 minus26100 348347 124371 124241 1257062164 552324 1048629 1545862 168960 168791 1736262360 566830 1351743 2133887 262543 263800 2652772479 minus10591 41380 95801 17087 18011 233556264 2972543 6890328 10810781 1308614 1309833 13083257043 8683574 12940488 17192838 1424231 1424378 14229027052 minus21680 387083 797666 137781 137389 1395187057 217169 507375 803842 100208 102060 98289

No

Original dataset

Sub 1 Sub 2 Sub 3 Sub 4 Sub n

Dataset with selected features

Partition into n subsets

Final dataset with selected features

Yes

middot middot middot

features le F

Figure 3 Iterative partition method for neurofuzzy method withfeature selection and rule extraction for large dataset

desirable In the dataset there are 58 samples of DLBCL classand 19 samples of FL class

The selected informative features from the learning phaseare used for classification task in both direct calculationand logical rule The 10-fold cross validation is performedin the experiments The results displayed are the averageresults on validation sets over 10 cross validations Duringthe training step because the number of features was toolarge the original features were divided into small subsets

rather than using the entire 7070 features at once The 10-fold cross validation was performed on each subset Theinformative features from each subset were selected and formthe final informative features by combining them togetherThe learning rate for weight updating was set to 01 and thelearning rates used in updating 119888 and 120590 were set to 0001

31 Classification Results on DLBCL Microarrays by DirectCalculation Using Selected Features We tried several choicesof the number of selected features (119873) and found that119873 = 14 was adequate for this problem After training 14features were automatically selected by our method The setof features that yielded the best results on validation setsamong those in 10-fold cross validation consisted of features83 (MDM4) 87 (STX16) 207 (NR1D2) 355 (DCLRE1A)450 (PARK7) 546 (ATIC) 931 (HG4263-HT4533 at) 2164(CSRP1) 2360 (NASP) 2479 (PGK1) 6264 (HLA-DPB1 2)7043 (HLA-A 2) 7052 (ITK 2) and 7057 (PLGLB2) Thegenes corresponding to the selected features are shown inTable 1 The values of means and standard deviations of allmembership functions of the 14 selected features before andafter training are shown in Tables 2 and 3 respectively

The classification rates on the validation sets of 10-foldcross validation achieved by using the direct calculation were9221 8961 8442 and 8442 when the numbersof selected linguistic features were set to 14 10 5 and 3respectively These results show that the proposed methodcan select a set of informative features out of a huge pool offeatures As shown in Table 4 the classification performanceis comparable to those performed by previously proposedmethods [26 27] However rather than using random initialweights connecting between the hidden layer and the outputlayer we used the weights achieved in the current crossvalidation to be the initial weights for the next cross valida-tion This constrained weight initialization yielded 100009740 9091 and 9221using the direct calculation whenthe numbers of selected linguistic features were set to 14 10


IF "Feature 1 is Large" OR "Feature 5 is Medium" OR "Feature 3 is Small" THEN "Class is 1"
ELSEIF "Feature 10 is Small" OR "Feature 5 is Small" OR "Feature 6 is Large" THEN "Class is 2"
END

Rule 1

IF "Feature 10 is Small" OR "Feature 5 is Small" OR "Feature 6 is Large" THEN "Class is 2"
ELSEIF "Feature 1 is Large" OR "Feature 5 is Medium" OR "Feature 3 is Small" THEN "Class is 1"
END

Rule 2

IF "Feature 1 is Large" THEN "Class is 1"
ELSEIF "Feature 10 is Small" THEN "Class is 2"
ELSEIF "Feature 5 is Medium" THEN "Class is 1"
ELSEIF "Feature 5 is Small" THEN "Class is 2"
ELSEIF "Feature 3 is Small" THEN "Class is 1"
ELSEIF "Feature 6 is Large" THEN "Class is 2"
END

Rule 3

IF "Feature 10 is Small" THEN "Class is 2"
ELSEIF "Feature 1 is Large" THEN "Class is 1"
ELSEIF "Feature 5 is Small" THEN "Class is 2"
ELSEIF "Feature 5 is Medium" THEN "Class is 1"
ELSEIF "Feature 6 is Large" THEN "Class is 2"
ELSEIF "Feature 3 is Small" THEN "Class is 1"
END

Rule 4
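Rules 1-4 above differ only in how the linguistic antecedents are ordered: the simple OR form (Rules 1 and 2) tries all antecedents of one class before the other, while the layered form (Rules 3 and 4) interleaves the terms by importance. The following is a minimal sketch of both evaluation orders, assuming each sample has already been mapped to its linguistic values (the term with the largest membership per feature); the fallback class when no antecedent fires is an assumption of the sketch.

def simple_or_rule(sample, class_terms, class_order):
    """sample: dict feature -> linguistic term, e.g. {1: 'Large', 5: 'Small'}.
    class_terms: dict class -> list of (feature, term) antecedents.
    Fires every antecedent of one class before trying the next (Rules 1-2)."""
    for cls in class_order:
        if any(sample.get(f) == t for f, t in class_terms[cls]):
            return cls
    return class_order[-1]  # fallback when nothing fires (sketch assumption)

def layered_rule(sample, interleaved_terms):
    """interleaved_terms: list of (feature, term, class) in decreasing
    importance, alternating between the classes (Rules 3-4)."""
    for f, t, cls in interleaved_terms:
        if sample.get(f) == t:
            return cls
    return interleaved_terms[-1][2]  # fallback (sketch assumption)

# Rule 1 expressed as data, with class order "Class 1 then Class 2"
class_terms = {1: [(1, 'Large'), (5, 'Medium'), (3, 'Small')],
               2: [(10, 'Small'), (5, 'Small'), (6, 'Large')]}
print(simple_or_rule({1: 'Large', 5: 'Small'}, class_terms, class_order=(1, 2)))  # -> 1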

To ensure that the set of all parameters achieved here could yield 100% correct classification on this dataset, we used the networks to classify all 77 microarrays (rather than considering the results from 10-fold cross validation, in which the outputs could be different when the random groups are different). We found that each of the 10 networks from the 10-fold cross validation still yielded 100% correct classification on the entire dataset.

3.2. Classification Results on DLBCL Microarrays by Logical Rule Using Selected Features. One of the good features of our method is the automatic rule extraction. Even though this approach usually does not yield performance as good as the direct calculation, it provides rules understandable to humans. This is more desirable from the human interpretation aspect than the black-box direct calculation.

Table 4: Comparison between the proposed method and other algorithms on the DLBCL dataset.

Method                                                   Number of features selected   Classification rate (%)
Naïve Bayes [26]                                         3-8                           83.76
Our method without constrained weight initialization     10                            89.61
Decision trees [26]                                      3-8                           85.46
Our method without constrained weight initialization     14                            92.21
k-NN [26]                                                3-8                           88.60
Weighted voting model [27]                               30                            92.20
VizRank [26]                                             3-8                           93.03
Our method with constrained weight initialization        10                            97.40
SVM [26]                                                 3-8                           97.85
Our method with constrained weight initialization        14                            100.00

The N selected linguistic features were used to create rules using both the simple OR and the layered approaches, as mentioned in Section 2.2. A classification rate of 90.91% on the validation sets using only 5 selected linguistic features was achieved using the simple OR rule with the class order "Class 1, then Class 2," where Class 1 and Class 2 denote the DLBCL class and the FL class, respectively. For more details of the results, Table 5 shows the top-10 linguistic features for the DLBCL dataset selected by our method in each cross validation of the 10-fold cross validation. The classification rates in all 10 cross validations were 75.00%, 100.00%, 100.00%, 75.00%, 100.00%, 87.50%, 75.00%, 100.00%, 100.00%, and 100.00%, respectively. The classification rate from the layered rule was 81.82% using the same top 5 linguistic features. The classification rates in all 10 cross validations were 75.00%, 87.50%, 100.00%, 87.50%, 62.50%, 75.00%, 75.00%, 85.71%, 85.71%, and 85.71%, respectively. When using the aforementioned constrained


IF "HLA-A 2 is Small" OR "NASP is Large" OR "MDM4 is Small" OR "ATIC is Medium" OR "STX16 is Small" THEN "Class is DLBCL"
ELSEIF "HLA-A 2 is Large" OR "ATIC is Small" OR "STX16 is Large" OR "MDM4 is Medium" OR "NASP is Small" THEN "Class is FL"
END

Rule 5

Table 5: Top-10 linguistic features for the DLBCL dataset selected by our method in 10-fold cross validation (most informative: right; least informative: left).

Cross validation   Class   Feature type   Feature ranking
1                  1       Linguistic     S M M S L S M L S S
                           Original       7057 207 931 87 2479 7052 546 2360 83 7043
                   2       Linguistic     S M L L M S L L L S
                           Original       450 7052 6264 207 83 2360 83 87 7043 546
2                  1       Linguistic     L L M M S S S M L S
                           Original       355 2479 207 931 87 7052 83 546 2360 7043
                   2       Linguistic     S M L L L M S L L S
                           Original       355 7052 83 6264 207 83 2360 87 7043 546
3                  1       Linguistic     S M S M L S M L S S
                           Original       7052 931 7057 207 2479 87 546 2360 83 7043
                   2       Linguistic     L S S L S L M S L L
                           Original       7057 2479 450 207 2360 83 83 546 87 7043
4                  1       Linguistic     M L M L M S S S L S
                           Original       931 2164 207 2479 546 7052 87 83 2360 7043
                   2       Linguistic     S L S L L S M L S L
                           Original       2479 6264 450 207 83 2360 83 87 546 7043
5                  1       Linguistic     L M L M S S M L S S
                           Original       2164 931 2479 207 7052 87 546 2360 83 7043
                   2       Linguistic     S S L S L L M L S L
                           Original       2479 450 6264 2360 83 207 83 87 546 7043
6                  1       Linguistic     L M S M L S S M L S
                           Original       355 207 7052 931 2479 87 83 546 2360 7043
                   2       Linguistic     L L S L M L S L S L
                           Original       7057 6264 450 207 83 83 2360 87 546 7043
7                  1       Linguistic     S S M M L S M S L S
                           Original       7052 7057 931 207 2479 87 546 83 2360 7043
                   2       Linguistic     L S S L L S M L S L
                           Original       6264 2479 450 207 83 2360 83 87 546 7043
8                  1       Linguistic     L M M S S S L S L S
                           Original       355 546 931 7052 87 7057 2479 83 2360 7043
                   2       Linguistic     L L S M S L L L S L
                           Original       7057 207 2360 83 2479 6264 83 87 546 7043
9                  1       Linguistic     S S M L M S M S L S
                           Original       7057 7052 931 2479 207 87 546 83 2360 7043
                   2       Linguistic     L S S L L S M L S L
                           Original       6264 450 2479 207 83 2360 83 87 546 7043
10                 1       Linguistic     M L M S S S M S L S
                           Original       207 2479 931 87 7052 7057 546 83 2360 7043
                   2       Linguistic     L L S L S L M L S L
                           Original       6264 7057 450 207 2360 83 83 87 546 7043


IF "HLA-A 2 is Small" THEN "Class is DLBCL"
ELSEIF "HLA-A 2 is Large" THEN "Class is FL"
ELSEIF "NASP is Large" THEN "Class is DLBCL"
ELSEIF "ATIC is Small" THEN "Class is FL"
ELSEIF "MDM4 is Small" THEN "Class is DLBCL"
ELSEIF "STX16 is Large" THEN "Class is FL"
ELSEIF "ATIC is Medium" THEN "Class is DLBCL"
ELSEIF "MDM4 is Medium" THEN "Class is FL"
ELSEIF "STX16 is Small" THEN "Class is DLBCL"
ELSEIF "NASP is Small" THEN "Class is FL"
END

Rule 6

weight initialization, the classification rates were the same. We also tried to increase the number of selected features to 10, but it did not help; the results were the same as those using 5 features.

From Table 5, it can be seen that the sets of informative features for class 1 and class 2 differ across the cross validations. That results in a number of different rules. However, in a real application, we have to come up with the best rules among them. We propose to use the summation of the feature informative levels over all 10 cross validations. The feature informative level is simply defined by the informative order. For example, if only 10 linguistic features are considered, the most informative one has a feature informative level of 1.0, the second has 0.9, and so on. Hence, the tenth most informative feature has a feature informative level of 0.1, and the remaining features get a feature informative level of 0.0. The informative levels of each feature are then summed across all cross validations to yield the overall informative level of that feature.
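As a worked sketch of this scoring, the following sums the rank-based levels over folds; the two toy folds are illustrative, not data from Table 5.

from collections import defaultdict

def overall_informative_levels(per_fold_rankings):
    """per_fold_rankings: one list per fold of (feature, term) pairs,
    ordered from most to least informative (at most 10 entries).
    Rank 1 scores 1.0, rank 2 scores 0.9, ..., rank 10 scores 0.1;
    unranked features implicitly score 0.0. Levels are summed over folds."""
    levels = defaultdict(float)
    for ranking in per_fold_rankings:
        for rank, key in enumerate(ranking, start=1):
            levels[key] += (11 - rank) / 10.0
    return dict(levels)

# Two toy folds; with 10 folds the maximum possible level is 10.0.
folds = [[('7043', 'S'), ('2360', 'L')],
         [('7043', 'S'), ('83', 'S')]]
print(overall_informative_levels(folds))
# {('7043', 'S'): 2.0, ('2360', 'L'): 0.9, ('83', 'S'): 0.9}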

We show the overall feature informative levels from the 10-fold cross validation using the top-10 features in Table 6. In this case, the highest possible informative level is 10.0. Out of the 42 linguistic features for each class, there were only 12 and 13 linguistic features with nonzero informative levels for class 1 and class 2, respectively. That means the remaining features did not appear at all in the top-10 list of any cross validation. The results showed that, for class 1, the first 5 most informative linguistic features, ranked from the most informative to the less informative, were "Feature 7043 (HLA-A 2) is Small," "Feature 2360 (NASP) is Large," "Feature 83 (MDM4) is Small," "Feature 546 (ATIC) is Medium," and "Feature 87 (STX16) is Small," respectively. For class 2, the first 5 most informative linguistic features were "Feature 7043 is Large," "Feature 546 is Small," "Feature 87 is Large," "Feature 83 is Medium," and "Feature 2360 is Small," respectively. It is worth noting that the last statement can also be "Feature 83 is Large" because it has the same feature informative level of 5.5 as "Feature 2360 is Small." This information was used to create rules as described in Section 2.2. Hence, the rule extracted using the simple OR approach is Rule 5.

Using the layered approach, the extracted rule is shown in Rule 6.
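A minimal sketch of turning such per-class informative levels into the two rule forms follows; the dictionaries levels_c1 and levels_c2 stand for the class 1 and class 2 columns of Table 6 and are assumed precomputed (e.g., by overall_informative_levels above).

def build_rules(levels_c1, levels_c2, n_terms=5):
    """Sort each class's linguistic terms by informative level and emit
    the simple OR rule (all class-1 terms, then all class-2 terms, as in
    Rule 5) and the layered rule (terms alternating between the classes
    in decreasing importance, as in Rule 6). Ties sort arbitrarily,
    matching the interchangeability noted in the text."""
    top1 = sorted(levels_c1, key=levels_c1.get, reverse=True)[:n_terms]
    top2 = sorted(levels_c2, key=levels_c2.get, reverse=True)[:n_terms]
    simple_or = [(f, t, 'DLBCL') for f, t in top1] + \
                [(f, t, 'FL') for f, t in top2]
    layered = []
    for (f1, t1), (f2, t2) in zip(top1, top2):   # interleave rank by rank
        layered += [(f1, t1, 'DLBCL'), (f2, t2, 'FL')]
    return simple_or, layered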

Table 6: Feature informative levels from the 10-fold cross validation using top-10 features.

Original feature   Linguistic term   Informative level
Class 1
7043               S                 10.0
2360               L                 8.7
83                 S                 8.1
546                M                 6.5
87                 S                 5.5
2479               L                 4.2
7052               S                 3.9
207                M                 2.8
931                M                 2.8
7057               S                 1.9
355                L                 0.3
2164               L                 0.3
Class 2
7043               L                 9.8
546                S                 9.1
87                 L                 8.1
83                 M                 6.2
2360               S                 5.5
83                 L                 5.5
207                L                 4.1
6264               L                 2.3
450                S                 2.0
2479               S                 1.4
7057               L                 0.5
7052               M                 0.4
355                S                 0.1

4. Conclusion

The classification problem of diffuse large B-cell lymphoma (DLBCL) versus follicular lymphoma (FL) based on high dimensional microarray data was investigated by our neurofuzzy classification scheme. Our direct calculation method could achieve a 100% classification rate on the validation sets of 10-fold cross validation by using only 14 out of the 7070 features in the dataset. These 14 features, comprising the genes MDM4, STX16, NR1D2, DCLRE1A, PARK7, ATIC, HG4263-HT4533 at, CSRP1, NASP, PGK1, HLA-DPB1 2, HLA-A 2, ITK 2, and PLGLB2, were automatically selected by our method. The method could also identify the informative linguistic features for each class. For the DLBCL class, the first 5 most informative linguistic features were "HLA-A 2 is Small," "NASP is Large," "MDM4 is Small," "ATIC is Medium," and "STX16 is Small," respectively. For the FL class, the first 5 most informative linguistic features were "HLA-A 2 is Large," "ATIC is Small," "STX16 is Large," "MDM4 is Medium," and "NASP is Small," respectively. The terms Small, Medium, and Large of each original feature were automatically determined by our method. The informative linguistic features were used to create rules that achieved a 90.91% classification rate on the validation sets of 10-fold


cross validation. Even though this rule creation approach yielded worse classification performance than the direct calculation, it is more desirable from the human interpretation aspect. It can be seen that very good results are achieved on this standard high dimensional dataset. The set of selected informative genes will be useful for further investigation in the fields of bioinformatics and medicine. To ensure that this set of selected features can be used in general, it should be applied to more DLBCL versus FL cases.

Conflict of Interests

The authors declare no conflict of interests.

References

[1] G. G. Towell, J. W. Shavlik, and M. O. Noordewier, "Refinement of approximate domain theories by knowledge-based neural networks," in Proceedings of the 8th National Conference on Artificial Intelligence, pp. 861-866, Boston, Mass, USA, July-August 1990.

[2] L. M. Fu, "Knowledge-based connectionism for revising domain theories," IEEE Transactions on Systems, Man, and Cybernetics, vol. 23, no. 1, pp. 173-182, 1993.

[3] L. M. Fu, "Learning capacity and sample complexity on expert networks," IEEE Transactions on Neural Networks, vol. 7, no. 6, pp. 1517-1520, 1996.

[4] S. Snyders and C. W. Omlin, "What inductive bias gives good neural network training performance?" in Proceedings of the International Joint Conference on Neural Networks (IJCNN '00), pp. 445-450, Como, Italy, July 2000.

[5] D.-X. Zhang, Y. Liu, and Z.-Q. Wang, "Effectively extracting rules from trained neural networks based on the characteristics of the classification hypersurfaces," in Proceedings of the International Conference on Machine Learning and Cybernetics (ICMLC '05), vol. 3, pp. 1541-1546, IEEE, Guangzhou, China, August 2005.

[6] L. Fu, "Rule generation from neural networks," IEEE Transactions on Systems, Man, and Cybernetics, vol. 24, no. 8, pp. 1114-1124, 1994.

[7] F. Beloufa and M. A. Chikh, "Design of fuzzy classifier for diabetes disease using modified artificial bee colony algorithm," Computer Methods and Programs in Biomedicine, vol. 112, no. 1, pp. 92-103, 2013.

[8] J. Luo and D. Chen, "Fuzzy information granulation based decision support applications," in Proceedings of the International Symposium on Information Processing (ISIP '08), pp. 197-201, Moscow, Russia, May 2008.

[9] A. T. Azar, S. A. El-Said, and A. E. Hassanien, "Fuzzy and hard clustering analysis for thyroid disease," Computer Methods and Programs in Biomedicine, vol. 111, no. 1, pp. 1-16, 2013.

[10] C.-H. L. Lee, A. Liu, and W.-S. Chen, "Pattern discovery of fuzzy time series for financial prediction," IEEE Transactions on Knowledge and Data Engineering, vol. 18, no. 5, pp. 613-625, 2006.

[11] R. Yang, Z. Wang, P.-A. Heng, and K.-S. Leung, "Classification of heterogeneous fuzzy data by Choquet integral with fuzzy-valued integrand," IEEE Transactions on Fuzzy Systems, vol. 15, no. 5, pp. 931-942, 2007.

[12] P. Hartono and S. Hashimoto, "An interpretable neural network ensemble," in Proceedings of the 33rd Annual Conference of the IEEE Industrial Electronics Society (IECON '07), pp. 228-232, Taipei, Taiwan, November 2007.

[13] T.-G. Fan and X.-Z. Wang, "A new approach to weighted fuzzy production rule extraction from neural networks," in Proceedings of the International Conference on Machine Learning and Cybernetics, pp. 3348-3351, Shanghai, China, August 2004.

[14] M. Gao, X. Hong, and C. J. Harris, "A neurofuzzy classifier for two class problems," in Proceedings of the 12th UK Workshop on Computational Intelligence (UKCI '12), pp. 1-6, Edinburgh, UK, September 2012.

[15] W. Duch, R. Adamczak, and K. Grabczewski, "A new methodology of extraction, optimization and application of crisp and fuzzy logical rules," IEEE Transactions on Neural Networks, vol. 12, no. 2, pp. 277-306, 2001.

[16] W. Duch, R. Setiono, and J. M. Zurada, "Computational intelligence methods for rule-based data understanding," Proceedings of the IEEE, vol. 92, no. 5, pp. 771-805, 2004.

[17] A. T. Azar, "Adaptive network based on fuzzy inference system for equilibrated urea concentration prediction," Computer Methods and Programs in Biomedicine, vol. 111, no. 3, pp. 578-591, 2013.

[18] A. Ouarda, A. Ahlem, B. Brahim, and B. Kheir, "Comparison of neuro-fuzzy models for classification fingerprint images," International Journal of Computer Applications, vol. 65, no. 9, pp. 12-16, 2013.

[19] Z. Yang, Y. Wang, and G. Ouyang, "Adaptive neuro-fuzzy inference system for classification of background EEG signals from ESES patients and controls," The Scientific World Journal, vol. 2014, Article ID 140863, 8 pages, 2014.

[20] M. S. Al-Batah, N. A. M. Isa, M. F. Klaib, and M. A. Al-Betar, "Multiple adaptive neuro-fuzzy inference system with automatic features extraction algorithm for cervical cancer recognition," Computational and Mathematical Methods in Medicine, vol. 2014, Article ID 181245, 12 pages, 2014.

[21] J.-A. Martinez-Leon, J.-M. Cano-Izquierdo, and J. Ibarrola, "Feature selection applying statistical and neurofuzzy methods to EEG-based BCI," Computational Intelligence and Neuroscience, vol. 2015, Article ID 781207, 17 pages, 2015.

[22] D. Chakraborty and N. R. Pal, "A neuro-fuzzy scheme for simultaneous feature selection and fuzzy rule-based classification," IEEE Transactions on Neural Networks, vol. 15, no. 1, pp. 110-123, 2004.

[23] N. Eiamkanitchat, N. Theera-Umpon, and S. Auephanwiriyakul, "A novel neuro-fuzzy method for linguistic feature selection and rule-based classification," in Proceedings of the 2nd International Conference on Computer and Automation Engineering (ICCAE '10), vol. 2, pp. 247-252, Singapore, February 2010.

[24] N. Eiamkanitchat, N. Theera-Umpon, and S. Auephanwiriyakul, "Colon tumor microarray classification using neural network with feature selection and rule-based classification," in Advances in Neural Network Research and Applications, vol. 67 of Lecture Notes in Electrical Engineering, pp. 363-372, Springer, Berlin, Germany, 2010.

[25] C.-H. Chen and C.-J. Lin, "Compensatory neurofuzzy inference systems for pattern classification," in Proceedings of the International Symposium on Computer, Consumer and Control (IS3C '12), pp. 88-91, IEEE, Taichung, Taiwan, June 2012.


[26] M. Mramor, G. Leban, J. Demsar, and B. Zupan, "Visualization-based cancer microarray data classification analysis," Bioinformatics, vol. 23, no. 16, pp. 2147-2154, 2007.

[27] M. A. Shipp, K. N. Ross, P. Tamayo et al., "Diffuse large B-cell lymphoma outcome prediction by gene-expression profiling and supervised machine learning," Nature Medicine, vol. 8, no. 1, pp. 68-74, 2002.

Submit your manuscripts athttpwwwhindawicom

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

MathematicsJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Mathematical Problems in Engineering

Hindawi Publishing Corporationhttpwwwhindawicom

Differential EquationsInternational Journal of

Volume 2014

Applied MathematicsJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Probability and StatisticsHindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Mathematical PhysicsAdvances in

Complex AnalysisJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

OptimizationJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

CombinatoricsHindawi Publishing Corporationhttpwwwhindawicom Volume 2014

International Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Operations ResearchAdvances in

Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Function Spaces

Abstract and Applied AnalysisHindawi Publishing Corporationhttpwwwhindawicom Volume 2014

International Journal of Mathematics and Mathematical Sciences

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

The Scientific World JournalHindawi Publishing Corporation httpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Algebra

Discrete Dynamics in Nature and Society

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Decision SciencesAdvances in

Discrete MathematicsJournal of

Hindawi Publishing Corporationhttpwwwhindawicom

Volume 2014 Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Stochastic AnalysisInternational Journal of

Page 8: Review Article On Feature Selection and Rule Extraction ...with feature selection and rule extraction and its training scheme designed for a high dimensional dataset is described in

8 Mathematical Problems in Engineering

IF ldquoFeature 1 is Largerdquo OR ldquoFeature 5 is Mediumrdquo OR ldquoFeature 3 is SmallrdquoTHEN ldquoClass is 1rdquo

ELSEIF ldquoFeature 10 is Smallrdquo OR ldquoFeature 5 is Smallrdquo OR ldquoFeature 6 is LargerdquoTHEN ldquoClass is 2rdquo

END

Rule 1

IF ldquoFeature 10 is Smallrdquo OR ldquoFeature 5 is Smallrdquo OR ldquoFeature 6 is LargerdquoTHEN ldquoClass is 2rdquo

ELSEIF ldquoFeature 1 is Largerdquo OR ldquoFeature 5 is Mediumrdquo OR ldquoFeature 3 is SmallrdquoTHEN ldquoClass is 1rdquo

END

Rule 2

IF ldquoFeature 1 is Largerdquo THEN ldquoClass is 1rdquoELSEIF ldquoFeature 10 is Smallrdquo THEN ldquoClass is 2rdquoELSEIF ldquoFeature 5 is Mediumrdquo THEN ldquoClass is 1rdquoELSEIF ldquoFeature 5 is Smallrdquo THEN ldquoClass is 2rdquoELSEIF ldquoFeature 3 is Smallrdquo THEN ldquoClass is 1rdquoELSEIF ldquoFeature 6 is Largerdquo THEN ldquoClass is 2rdquoEND

Rule 3

IF ldquoFeature 10 is Smallrdquo THEN ldquoClass is 2rdquoELSEIF ldquoFeature 1 is Largerdquo THEN ldquoClass is 1rdquoELSEIF ldquoFeature 5 is Smallrdquo THEN ldquoClass is 2rdquoELSEIF ldquoFeature 5 is Mediumrdquo THEN ldquoClass is 1rdquoELSEIF ldquoFeature 6 is Largerdquo THEN ldquoClass is 2rdquoELSEIF ldquoFeature 3 is Smallrdquo THEN ldquoClass is 1rdquoEND

Rule 4

5 and 3 respectively To ensure that the set of all parametersachieved here could get 100 correct classification on thisdataset we tried to use the networks to classify all 77microarrays (rather than considering the results from 10-foldcross validation in which the outputs could be different whenthe random groups are different) We found that each of all10 networks from 10-fold cross validation still yielded 100correct classification on the entire dataset

32 Classification Results on DLBCL Microarrays by LogicalRule Using Selected Features One of the good features ofour method is the automatic rule extraction Even thoughthis approach usually does not yield as good performanceas the direct calculation it provides rules understandable forhumanThis ismore desirable from the human interpretationaspect than the black-box based direct calculation

Table 4 Comparison between the proposed method and otheralgorithms on the DLBCL dataset

MethodNumber offeaturesselected

Classificationrate ()

Naıve Bayes [26] 3ndash8 8376Our method without constrainedweight initialization 10 8961

Decision trees [26] 3ndash8 8546Our method without constrainedweight initialization 14 9221

119896-NN [26] 3ndash8 8860Weighted voting model [27] 30 9220VizRank [26] 3ndash8 9303Our method with constrainedweight initialization 10 9740

SVM [26] 3ndash8 9785Our method with constrainedweight initialization 14 10000

The119873 selected linguistic featureswere used to create rulesusing both simple OR and layered approaches asmentioned inSection 22The classification rate of 9091 on validation setsusing only 5 selected linguistic features was achieved usingthe simple OR rule with the class order ldquoClass 1 then Class2rdquo where Class 1 and Class 2 denote the DLBCL class andFL class respectively For more details of the results Table 5shows the top-10 linguistic features for the DLBCL datasetselected by our method in each cross validation of the 10-fold cross validation The classification rates in all 10 crossvalidations were 7500 10000 10000 7500 100008750 7500 10000 10000 and 10000 respectivelyThe classification rate from the layered rule was 8182 usingthe same top 5 linguistic featuresThe details of classificationrates in all 10 cross validations were 7500 8750 100008750 6250 7500 7500 8571 8571 and 8571respectively When using the aforementioned constrained

Mathematical Problems in Engineering 9

IF ldquoHLA-A 2 is Smallrdquo OR ldquoNASP is Largerdquo OR ldquoMDM4 is Smallrdquo OR ldquoATIC is Mediumrdquo OR ldquoSTX16 is SmallrdquoTHEN ldquoClass is DLBCLrdquo

ELSEIF ldquoHLA-A 2 is Largerdquo OR ldquoATIC is Smallrdquo OR ldquoSTX16 is Largerdquo OR ldquoMDM4 is Mediumrdquo OR ldquoNASP is SmallrdquoTHEN ldquoClass is FLrdquo

END

Rule 5

Table 5 Top-10 linguistic features for DLBCL dataset selected by our method in 10-fold cross validation (most informative right leastinformative left)

Cross validation Class Feature type Feature ranking

11 Linguistic S M M S L S M L S S

Original 7057 207 931 87 2479 7052 546 2360 83 7043

2 Linguistic S M L L M S L L L SOriginal 450 7052 6264 207 83 2360 83 87 7043 546

21 Linguistic L L M M S S S M L S

Original 355 2479 207 931 87 7052 83 546 2360 7043

2 Linguistic S M L L L M S L L SOriginal 355 7052 83 6264 207 83 2360 87 7043 546

31 Linguistic S M S M L S M L S S

Original 7052 931 7057 207 2479 87 546 2360 83 7043

2 Linguistic L S S L S L M S L LOriginal 7057 2479 450 207 2360 83 83 546 87 7043

41 Linguistic M L M L M S S S L S

Original 931 2164 207 2479 546 7052 87 83 2360 7043

2 Linguistic S L S L L S M L S LOriginal 2479 6264 450 207 83 2360 83 87 546 7043

51 Linguistic L M L M S S M L S S

Original 2164 931 2479 207 7052 87 546 2360 83 7043

2 Linguistic S S L S L L M L S LOriginal 2479 450 6264 2360 83 207 83 87 546 7043

61 Linguistic L M S M L S S M L S

Original 355 207 7052 931 2479 87 83 546 2360 7043

2 Linguistic L L S L M L S L S LOriginal 7057 6264 450 207 83 83 2360 87 546 7043

71 Linguistic S S M M L S M S L S

Original 7052 7057 931 207 2479 87 546 83 2360 7043

2 Linguistic L S S L L S M L S LOriginal 6264 2479 450 207 83 2360 83 87 546 7043

81 Linguistic L M M S S S L S L S

Original 355 546 931 7052 87 7057 2479 83 2360 7043

2 Linguistic L L S M S L L L S LOriginal 7057 207 2360 83 2479 6264 83 87 546 7043

91 Linguistic S S M L M S M S L S

Original 7057 7052 931 2479 207 87 546 83 2360 7043

2 Linguistic L S S L L S M L S LOriginal 6264 450 2479 207 83 2360 83 87 546 7043

101 Linguistic M L M S S S M S L S

Original 207 2479 931 87 7052 7057 546 83 2360 7043

2 Linguistic L L S L S L M L S LOriginal 6264 7057 450 207 2360 83 83 87 546 7043

10 Mathematical Problems in Engineering

IF ldquoHLA-A 2 is Smallrdquo THEN ldquoClass is DLBCLrdquoELSEIF ldquoHLA-A 2 is Largerdquo THEN ldquoClass is FLrdquoELSEIF ldquoNASP is Largerdquo THEN ldquoClass is DLBCLrdquoELSEIF ldquoATIC is Smallrdquo THEN ldquoClass is FLrdquoELSEIF ldquoMDM4 is Smallrdquo THEN ldquoClass is DLBCLrdquoELSEIF ldquoSTX16 is Largerdquo THEN ldquoClass is FLrdquoELSEIF ldquoATIC is Mediumrdquo THEN ldquoClass is DLBCLrdquoELSEIF ldquoMDM4 is Mediumrdquo THEN ldquoClass is FLrdquoELSEIF ldquoSTX16 is Smallrdquo THEN ldquoClass is DLBCLrdquoELSEIF ldquoNASP is Smallrdquo THEN ldquoClass is FLrdquoEND

Rule 6

weight initialization the classification rates were the sameWe also tried to increase the number of selected features to10 but it did not help The results were the same as that using5 features

From Table 5 it can be seen that the sets of informativefeatures for class 1 and class 2 are different across the crossvalidations That will result in a number of different rulesHowever in the real application we will have to come upwith the best rules among them We propose to use thesummation of the feature informative level in all 10 crossvalidations The feature informative level is simply definedby the informative order For example if only 10 linguisticfeatures are considered the most informative one will havethe feature informative level of 10 the second one will havethat of 09 and so on Hence the tenth most informativefeature will have the feature informative factor of 01 and theremaining will get the feature informative level of 00 Theinformative levels of each feature are then summed across allcross validations to yield the overall informative level of thatfeature

We show the overall feature informative levels from the10-fold cross validation using top-10 features in Table 6 Inthis case the highest possible informative level is 100 Outof 42 linguistic features for each class there were only 12 and13 linguistic features with nonzero informative level for class1 and class 2 respectively That means the remaining featuresdid not appear at all in the top-10 list of any cross validationThe results showed that for class 1 the first 5most informativelinguistic features ranking from the most informative to theless informative were ldquoFeature 7043 (HLA-A 2) is SmallrdquoldquoFeature 2360 (NASP) is Largerdquo ldquoFeature 83 (MDM4) isSmallrdquo ldquoFeature 546 (ATIC) is Mediumrdquo and ldquoFeature 87(STX16) is Smallrdquo respectively For class 2 the first 5 mostinformative linguistic features were ldquoFeature 7043 is LargerdquoldquoFeature 546 is Smallrdquo ldquoFeature 87 is Largerdquo ldquoFeature 83is Mediumrdquo and ldquoFeature 2360 is Smallrdquo respectively It isworthwhile noting that the last statement can also be ldquoFeature83 is Largerdquo because it has the same feature informative levelof 55 as for ldquoFeature 2360 is Smallrdquo This information wasused to create rules as described in Section 22 Hence therule extracted using the simple OR approach is as Rule 5

Using the layered approach the extracted rule is in Rule 6

Table 6 Feature informative levels from the 10-fold cross validationusing top-10 features

Original feature Linguistic term Informative levelClass 1

7043 S 1002360 L 8783 S 81546 M 6587 S 552479 L 427052 S 39207 M 28931 M 287057 S 19355 L 032164 L 03

Class 27043 L 98546 S 9187 L 8183 M 622360 S 5583 L 55207 L 416264 L 23450 S 202479 S 147057 L 057052 M 04355 S 01

4 Conclusion

The classification problem of diffuse large B-cell lymphoma(DLBCL) versus follicular lymphoma (FL) based on highdimensional microarray data was investigated by our neu-rofuzzy classification scheme Our direct calculation methodcould achieve 100 classification rate on the validation setsof 10-fold cross validation by using only 14 out of 7070features in the dataset These 14 features including genesMDM4 STX16 NR1D2 DCLRE1A PARK7 ATIC HG4263-HT4533 at CSRP1 NASP PGK1 HLA-DPB1 2 HLA-A 2ITK 2 and PLGLB2 were automatically selected by ourmethod The method could also identify the informativelinguistic features for each class For the DLBCL class thefirst 5 most informative linguistic features were ldquoHLA-A 2is Smallrdquo ldquoNASP is Largerdquo ldquoMDM4 is Smallrdquo ldquoATIC isMediumrdquo and ldquoSTX16 is Smallrdquo respectively For class 2the first 5 most informative linguistic features were ldquoHLA-A 2 is Largerdquo ldquoATIC is Smallrdquo ldquoSTX16 is Largerdquo ldquoMDM4is Mediumrdquo and ldquoNASP is Smallrdquo respectively The termsSmall Medium and Large of each original feature wereautomatically determined by our method The informativelinguistic features were used to create rules that achieved9091 classification rate on the validation sets of 10-fold

Mathematical Problems in Engineering 11

cross validation Even though this rule creation approachyielded worse classification performance than the direct cal-culation it is more desirable from the human interpretationaspect It can be seen that very good results are achievedin this standard high dimensional dataset A set of selectedinformative genes will be useful for further investigation inthe fields of bioinformatics or medicines To ensure that thisset of selected features can be used in general it should beapplied to more DLBCL versus FL cases

Conflict of Interests

The authors declare no conflict of interests

References

[1] G G Towell J W Shavlik and M O Noordenier ldquoRefinementof approximate domain theories by knowledge based neuralnetworkrdquo in Proceedings of the 8th National Conference onArtificial Intelligence pp 861ndash866 Boston Mass USA July-August 1990

[2] L M Fu ldquoKnowledge-based connectionism for revisingdomain theoriesrdquo IEEE Transactions on Systems Man andCybernetics vol 23 no 1 pp 173ndash182 1993

[3] L M Fu ldquoLearning capacity and sample complexity on expertnetworksrdquo IEEE Transactions on Neural Networks vol 7 no 6pp 1517ndash1520 1996

[4] S Snyders and C W Omlin ldquoWhat inductive bias gives goodneural network training performancerdquo in Proceedings of theInternational Joint Conference on Neural Networks (IJCNN rsquo00)pp 445ndash450 Como Italy July 2000

[5] D-X Zhang Y Liu and Z I-Q Wang ldquoEffectively extractingrules from trained neural networks based on the characteristicsof the classification hypersurfacesrdquo inProceedings of the Interna-tional Conference onMachine Learning and Cybernetics (ICMLCrsquo05) vol 3 pp 1541ndash1546 IEEE Guangzhou China August2005

[6] L Fu ldquoRule generation from neural networksrdquo IEEE Transac-tions on Systems Man and Cybernetics vol 24 no 8 pp 1114ndash1124 1994

[7] F Beloufa and M A Chikh ldquoDesign of fuzzy classifier fordiabetes disease usingmodified artificial bee colony algorithmrdquoComputer Methods and Programs in Biomedicine vol 112 no 1pp 92ndash103 2013

[8] J Luo andD Chen ldquoFuzzy information granulation based deci-sion support applicationsrdquo in Proceedings of the InternationalSymposium on Information Processing (ISIP rsquo08) pp 197ndash201Moscow Russia May 2008

[9] A T Azar S A El-Said and A E Hassanien ldquoFuzzy and hardclustering analysis for thyroid diseaserdquo Computer Methods andPrograms in Biomedicine vol 111 no 1 pp 1ndash16 2013

[10] C-H L Lee A Liu and W-S Chen ldquoPattern discovery offuzzy time series for financial predictionrdquo IEEE Transactionson Knowledge and Data Engineering vol 18 no 5 pp 613ndash6252006

[11] R Yang Z Wang P-A Heng and K-S Leung ldquoClassificationof heterogeneous fuzzy data by choquet integral with fuzzy-valued integrandrdquo IEEE Transactions on Fuzzy Systems vol 15no 5 pp 931ndash942 2007

[12] P Hartono and S Hashimoto ldquoAn interpretable neural networkensemblerdquo in Proceedings of the 33rd Annual Conference of theIEEE Industrial Electronics Society (IECON rsquo07) pp 228ndash232Taipei Taiwan November 2007

[13] T-G Fan and X-Z Wang ldquoA new approach to weightedfuzzy production rule extraction from neural networksrdquo inProceedings of the International Conference onMachine Learningand Cybernetics pp 3348ndash3351 Shanghai China August 2004

[14] M Gao X Hong and C J Harris ldquoA neurofuzzy classifier fortwo class problemsrdquo in Proceedings of the 12th UKWorkshop onComputational Intelligence (UKCI rsquo12) pp 1ndash6 Edinburgh UKSeptember 2012

[15] W Duch R Adamczak and K Grabczewski ldquoA new method-ology of extraction optimization and application of crisp andfuzzy logical rulesrdquo IEEE Transactions on Neural Networks vol12 no 2 pp 277ndash306 2001

[16] W Duch R Setiono and J M Zurada ldquoComputational intelli-gence methods for rule-based data understandingrdquo Proceedingsof the IEEE vol 92 no 5 pp 771ndash805 2004

[17] A T Azar ldquoAdaptive network based on fuzzy inference sys-tem for equilibrated urea concentration predictionrdquo ComputerMethods and Programs in Biomedicine vol 111 no 3 pp 578ndash591 2013

[18] A Ouarda A Ahlem B Brahim and B Kheir ldquoComparisonof neuro-fuzzy models for classification fingerprint imagesrdquoInternational Journal of Computer Applications vol 65 no 9pp 12ndash16 2013

[19] Z Yang Y Wang and G Ouyang ldquoAdaptive neuro-fuzzyinference system for classification of background EEG signalsfrom ESES patients and controlsrdquo The Scientific World Journalvol 2014 Article ID 140863 8 pages 2014

[20] M S Al-Batah N A M Isa M F Klaib and M A Al-Betar ldquoMultiple adaptive neuro-fuzzy inference system withautomatic features extraction algorithm for cervical cancerrecognitionrdquo Computational and Mathematical Methods inMedicine vol 2014 Article ID 181245 12 pages 2014

[21] J-A Martinez-Leon J-M Cano-Izquierdo and J IbarrolaldquoFeature selection applying statistical and neurofuzzy methodsto EEG-based BCIrdquo Computational Intelligence and Neuro-science vol 2015 Article ID 781207 17 pages 2015

[22] D Chakraborty andN R Pal ldquoA neuro-fuzzy scheme for simul-taneous feature selection and fuzzy rule-based classificationrdquoIEEE Transactions onNeural Networks vol 15 no 1 pp 110ndash1232004

[23] N Eiamkanitchat N Theera-Umpon and S Auephanwiriya-kul ldquoA novel neuro-fuzzymethod for linguistic feature selectionand rule-based classificationrdquo in Proceedings of the 2nd Inter-national Conference on Computer and Automation Engineering(ICCAE rsquo10) vol 2 pp 247ndash252 Singapore February 2010

[24] N Eiamkanitchat N Theera-Umpon and S Auephanwiriya-kul ldquoColon tumor microarray classification using neural net-work with feature selection and rule-based classificationrdquo inAdvances in Neural Network Research and Applications vol 67of Lecture Notes in Electrical Engineering pp 363ndash372 SpringerBerlin Germany 2010

[25] C-H Chen and C-J Lin ldquoCompensatory neurofuzzy inferencesystems for pattern classificationrdquo in Proceedings of the Interna-tional Symposium on Computer Consumer and Control (IS3Crsquo12) pp 88ndash91 IEEE Taichung Taiwan June 2012

12 Mathematical Problems in Engineering

[26] MMramor G Leban J Demsar and B Zupan ldquoVisualization-based cancer microarray data classification analysisrdquo Bioinfor-matics vol 23 no 16 pp 2147ndash2154 2007

[27] M A Shipp K N Ross P Tamayo et al ldquoDiffuse large B-celllymphoma outcome prediction by gene-expression profilingand supervised machine learningrdquo Nature Medicine vol 8 no1 pp 68ndash74 2002

Submit your manuscripts athttpwwwhindawicom

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

MathematicsJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Mathematical Problems in Engineering

Hindawi Publishing Corporationhttpwwwhindawicom

Differential EquationsInternational Journal of

Volume 2014

Applied MathematicsJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Probability and StatisticsHindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Mathematical PhysicsAdvances in

Complex AnalysisJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

OptimizationJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

CombinatoricsHindawi Publishing Corporationhttpwwwhindawicom Volume 2014

International Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Operations ResearchAdvances in

Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Function Spaces

Abstract and Applied AnalysisHindawi Publishing Corporationhttpwwwhindawicom Volume 2014

International Journal of Mathematics and Mathematical Sciences

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

The Scientific World JournalHindawi Publishing Corporation httpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Algebra

Discrete Dynamics in Nature and Society

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Decision SciencesAdvances in

Discrete MathematicsJournal of

Hindawi Publishing Corporationhttpwwwhindawicom

Volume 2014 Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Stochastic AnalysisInternational Journal of

Page 9: Review Article On Feature Selection and Rule Extraction ...with feature selection and rule extraction and its training scheme designed for a high dimensional dataset is described in

Mathematical Problems in Engineering 9

IF ldquoHLA-A 2 is Smallrdquo OR ldquoNASP is Largerdquo OR ldquoMDM4 is Smallrdquo OR ldquoATIC is Mediumrdquo OR ldquoSTX16 is SmallrdquoTHEN ldquoClass is DLBCLrdquo

ELSEIF ldquoHLA-A 2 is Largerdquo OR ldquoATIC is Smallrdquo OR ldquoSTX16 is Largerdquo OR ldquoMDM4 is Mediumrdquo OR ldquoNASP is SmallrdquoTHEN ldquoClass is FLrdquo

END

Rule 5

Table 5 Top-10 linguistic features for DLBCL dataset selected by our method in 10-fold cross validation (most informative right leastinformative left)

Cross validation Class Feature type Feature ranking

11 Linguistic S M M S L S M L S S

Original 7057 207 931 87 2479 7052 546 2360 83 7043

2 Linguistic S M L L M S L L L SOriginal 450 7052 6264 207 83 2360 83 87 7043 546

21 Linguistic L L M M S S S M L S

Original 355 2479 207 931 87 7052 83 546 2360 7043

2 Linguistic S M L L L M S L L SOriginal 355 7052 83 6264 207 83 2360 87 7043 546

31 Linguistic S M S M L S M L S S

Original 7052 931 7057 207 2479 87 546 2360 83 7043

2 Linguistic L S S L S L M S L LOriginal 7057 2479 450 207 2360 83 83 546 87 7043

41 Linguistic M L M L M S S S L S

Original 931 2164 207 2479 546 7052 87 83 2360 7043

2 Linguistic S L S L L S M L S LOriginal 2479 6264 450 207 83 2360 83 87 546 7043

51 Linguistic L M L M S S M L S S

Original 2164 931 2479 207 7052 87 546 2360 83 7043

2 Linguistic S S L S L L M L S LOriginal 2479 450 6264 2360 83 207 83 87 546 7043

61 Linguistic L M S M L S S M L S

Original 355 207 7052 931 2479 87 83 546 2360 7043

2 Linguistic L L S L M L S L S LOriginal 7057 6264 450 207 83 83 2360 87 546 7043

71 Linguistic S S M M L S M S L S

Original 7052 7057 931 207 2479 87 546 83 2360 7043

2 Linguistic L S S L L S M L S LOriginal 6264 2479 450 207 83 2360 83 87 546 7043

81 Linguistic L M M S S S L S L S

Original 355 546 931 7052 87 7057 2479 83 2360 7043

2 Linguistic L L S M S L L L S LOriginal 7057 207 2360 83 2479 6264 83 87 546 7043

91 Linguistic S S M L M S M S L S

Original 7057 7052 931 2479 207 87 546 83 2360 7043

2 Linguistic L S S L L S M L S LOriginal 6264 450 2479 207 83 2360 83 87 546 7043

101 Linguistic M L M S S S M S L S

Original 207 2479 931 87 7052 7057 546 83 2360 7043

2 Linguistic L L S L S L M L S LOriginal 6264 7057 450 207 2360 83 83 87 546 7043

10 Mathematical Problems in Engineering

IF ldquoHLA-A 2 is Smallrdquo THEN ldquoClass is DLBCLrdquoELSEIF ldquoHLA-A 2 is Largerdquo THEN ldquoClass is FLrdquoELSEIF ldquoNASP is Largerdquo THEN ldquoClass is DLBCLrdquoELSEIF ldquoATIC is Smallrdquo THEN ldquoClass is FLrdquoELSEIF ldquoMDM4 is Smallrdquo THEN ldquoClass is DLBCLrdquoELSEIF ldquoSTX16 is Largerdquo THEN ldquoClass is FLrdquoELSEIF ldquoATIC is Mediumrdquo THEN ldquoClass is DLBCLrdquoELSEIF ldquoMDM4 is Mediumrdquo THEN ldquoClass is FLrdquoELSEIF ldquoSTX16 is Smallrdquo THEN ldquoClass is DLBCLrdquoELSEIF ldquoNASP is Smallrdquo THEN ldquoClass is FLrdquoEND

Rule 6

weight initialization the classification rates were the sameWe also tried to increase the number of selected features to10 but it did not help The results were the same as that using5 features

From Table 5 it can be seen that the sets of informativefeatures for class 1 and class 2 are different across the crossvalidations That will result in a number of different rulesHowever in the real application we will have to come upwith the best rules among them We propose to use thesummation of the feature informative level in all 10 crossvalidations The feature informative level is simply definedby the informative order For example if only 10 linguisticfeatures are considered the most informative one will havethe feature informative level of 10 the second one will havethat of 09 and so on Hence the tenth most informativefeature will have the feature informative factor of 01 and theremaining will get the feature informative level of 00 Theinformative levels of each feature are then summed across allcross validations to yield the overall informative level of thatfeature

We show the overall feature informative levels from the10-fold cross validation using top-10 features in Table 6 Inthis case the highest possible informative level is 100 Outof 42 linguistic features for each class there were only 12 and13 linguistic features with nonzero informative level for class1 and class 2 respectively That means the remaining featuresdid not appear at all in the top-10 list of any cross validationThe results showed that for class 1 the first 5most informativelinguistic features ranking from the most informative to theless informative were ldquoFeature 7043 (HLA-A 2) is SmallrdquoldquoFeature 2360 (NASP) is Largerdquo ldquoFeature 83 (MDM4) isSmallrdquo ldquoFeature 546 (ATIC) is Mediumrdquo and ldquoFeature 87(STX16) is Smallrdquo respectively For class 2 the first 5 mostinformative linguistic features were ldquoFeature 7043 is LargerdquoldquoFeature 546 is Smallrdquo ldquoFeature 87 is Largerdquo ldquoFeature 83is Mediumrdquo and ldquoFeature 2360 is Smallrdquo respectively It isworthwhile noting that the last statement can also be ldquoFeature83 is Largerdquo because it has the same feature informative levelof 55 as for ldquoFeature 2360 is Smallrdquo This information wasused to create rules as described in Section 22 Hence therule extracted using the simple OR approach is as Rule 5

Using the layered approach the extracted rule is in Rule 6

Table 6 Feature informative levels from the 10-fold cross validationusing top-10 features

Original feature Linguistic term Informative levelClass 1

7043 S 1002360 L 8783 S 81546 M 6587 S 552479 L 427052 S 39207 M 28931 M 287057 S 19355 L 032164 L 03

Class 27043 L 98546 S 9187 L 8183 M 622360 S 5583 L 55207 L 416264 L 23450 S 202479 S 147057 L 057052 M 04355 S 01

4 Conclusion

The classification problem of diffuse large B-cell lymphoma(DLBCL) versus follicular lymphoma (FL) based on highdimensional microarray data was investigated by our neu-rofuzzy classification scheme Our direct calculation methodcould achieve 100 classification rate on the validation setsof 10-fold cross validation by using only 14 out of 7070features in the dataset These 14 features including genesMDM4 STX16 NR1D2 DCLRE1A PARK7 ATIC HG4263-HT4533 at CSRP1 NASP PGK1 HLA-DPB1 2 HLA-A 2ITK 2 and PLGLB2 were automatically selected by ourmethod The method could also identify the informativelinguistic features for each class For the DLBCL class thefirst 5 most informative linguistic features were ldquoHLA-A 2is Smallrdquo ldquoNASP is Largerdquo ldquoMDM4 is Smallrdquo ldquoATIC isMediumrdquo and ldquoSTX16 is Smallrdquo respectively For class 2the first 5 most informative linguistic features were ldquoHLA-A 2 is Largerdquo ldquoATIC is Smallrdquo ldquoSTX16 is Largerdquo ldquoMDM4is Mediumrdquo and ldquoNASP is Smallrdquo respectively The termsSmall Medium and Large of each original feature wereautomatically determined by our method The informativelinguistic features were used to create rules that achieved9091 classification rate on the validation sets of 10-fold

Mathematical Problems in Engineering 11

cross validation Even though this rule creation approachyielded worse classification performance than the direct cal-culation it is more desirable from the human interpretationaspect It can be seen that very good results are achievedin this standard high dimensional dataset A set of selectedinformative genes will be useful for further investigation inthe fields of bioinformatics or medicines To ensure that thisset of selected features can be used in general it should beapplied to more DLBCL versus FL cases

Conflict of Interests

The authors declare no conflict of interests

References

[1] G G Towell J W Shavlik and M O Noordenier ldquoRefinementof approximate domain theories by knowledge based neuralnetworkrdquo in Proceedings of the 8th National Conference onArtificial Intelligence pp 861ndash866 Boston Mass USA July-August 1990

[2] L M Fu ldquoKnowledge-based connectionism for revisingdomain theoriesrdquo IEEE Transactions on Systems Man andCybernetics vol 23 no 1 pp 173ndash182 1993

[3] L M Fu ldquoLearning capacity and sample complexity on expertnetworksrdquo IEEE Transactions on Neural Networks vol 7 no 6pp 1517ndash1520 1996

[4] S Snyders and C W Omlin ldquoWhat inductive bias gives goodneural network training performancerdquo in Proceedings of theInternational Joint Conference on Neural Networks (IJCNN rsquo00)pp 445ndash450 Como Italy July 2000

[5] D-X Zhang Y Liu and Z I-Q Wang ldquoEffectively extractingrules from trained neural networks based on the characteristicsof the classification hypersurfacesrdquo inProceedings of the Interna-tional Conference onMachine Learning and Cybernetics (ICMLCrsquo05) vol 3 pp 1541ndash1546 IEEE Guangzhou China August2005

[6] L Fu ldquoRule generation from neural networksrdquo IEEE Transac-tions on Systems Man and Cybernetics vol 24 no 8 pp 1114ndash1124 1994

[7] F Beloufa and M A Chikh ldquoDesign of fuzzy classifier fordiabetes disease usingmodified artificial bee colony algorithmrdquoComputer Methods and Programs in Biomedicine vol 112 no 1pp 92ndash103 2013

[8] J Luo andD Chen ldquoFuzzy information granulation based deci-sion support applicationsrdquo in Proceedings of the InternationalSymposium on Information Processing (ISIP rsquo08) pp 197ndash201Moscow Russia May 2008

[9] A T Azar S A El-Said and A E Hassanien ldquoFuzzy and hardclustering analysis for thyroid diseaserdquo Computer Methods andPrograms in Biomedicine vol 111 no 1 pp 1ndash16 2013

[10] C-H L Lee A Liu and W-S Chen ldquoPattern discovery offuzzy time series for financial predictionrdquo IEEE Transactionson Knowledge and Data Engineering vol 18 no 5 pp 613ndash6252006

[11] R Yang Z Wang P-A Heng and K-S Leung ldquoClassificationof heterogeneous fuzzy data by choquet integral with fuzzy-valued integrandrdquo IEEE Transactions on Fuzzy Systems vol 15no 5 pp 931ndash942 2007

[12] P Hartono and S Hashimoto ldquoAn interpretable neural networkensemblerdquo in Proceedings of the 33rd Annual Conference of theIEEE Industrial Electronics Society (IECON rsquo07) pp 228ndash232Taipei Taiwan November 2007

[13] T-G Fan and X-Z Wang ldquoA new approach to weightedfuzzy production rule extraction from neural networksrdquo inProceedings of the International Conference onMachine Learningand Cybernetics pp 3348ndash3351 Shanghai China August 2004

[14] M Gao X Hong and C J Harris ldquoA neurofuzzy classifier fortwo class problemsrdquo in Proceedings of the 12th UKWorkshop onComputational Intelligence (UKCI rsquo12) pp 1ndash6 Edinburgh UKSeptember 2012

[15] W Duch R Adamczak and K Grabczewski ldquoA new method-ology of extraction optimization and application of crisp andfuzzy logical rulesrdquo IEEE Transactions on Neural Networks vol12 no 2 pp 277ndash306 2001

[16] W Duch R Setiono and J M Zurada ldquoComputational intelli-gence methods for rule-based data understandingrdquo Proceedingsof the IEEE vol 92 no 5 pp 771ndash805 2004

[17] A T Azar ldquoAdaptive network based on fuzzy inference sys-tem for equilibrated urea concentration predictionrdquo ComputerMethods and Programs in Biomedicine vol 111 no 3 pp 578ndash591 2013

[18] A Ouarda A Ahlem B Brahim and B Kheir ldquoComparisonof neuro-fuzzy models for classification fingerprint imagesrdquoInternational Journal of Computer Applications vol 65 no 9pp 12ndash16 2013

[19] Z Yang Y Wang and G Ouyang ldquoAdaptive neuro-fuzzyinference system for classification of background EEG signalsfrom ESES patients and controlsrdquo The Scientific World Journalvol 2014 Article ID 140863 8 pages 2014

[20] M S Al-Batah N A M Isa M F Klaib and M A Al-Betar ldquoMultiple adaptive neuro-fuzzy inference system withautomatic features extraction algorithm for cervical cancerrecognitionrdquo Computational and Mathematical Methods inMedicine vol 2014 Article ID 181245 12 pages 2014

[21] J-A Martinez-Leon J-M Cano-Izquierdo and J IbarrolaldquoFeature selection applying statistical and neurofuzzy methodsto EEG-based BCIrdquo Computational Intelligence and Neuro-science vol 2015 Article ID 781207 17 pages 2015

[22] D Chakraborty andN R Pal ldquoA neuro-fuzzy scheme for simul-taneous feature selection and fuzzy rule-based classificationrdquoIEEE Transactions onNeural Networks vol 15 no 1 pp 110ndash1232004

[23] N Eiamkanitchat N Theera-Umpon and S Auephanwiriya-kul ldquoA novel neuro-fuzzymethod for linguistic feature selectionand rule-based classificationrdquo in Proceedings of the 2nd Inter-national Conference on Computer and Automation Engineering(ICCAE rsquo10) vol 2 pp 247ndash252 Singapore February 2010

[24] N Eiamkanitchat N Theera-Umpon and S Auephanwiriya-kul ldquoColon tumor microarray classification using neural net-work with feature selection and rule-based classificationrdquo inAdvances in Neural Network Research and Applications vol 67of Lecture Notes in Electrical Engineering pp 363ndash372 SpringerBerlin Germany 2010

[25] C-H Chen and C-J Lin ldquoCompensatory neurofuzzy inferencesystems for pattern classificationrdquo in Proceedings of the Interna-tional Symposium on Computer Consumer and Control (IS3Crsquo12) pp 88ndash91 IEEE Taichung Taiwan June 2012

12 Mathematical Problems in Engineering

[26] MMramor G Leban J Demsar and B Zupan ldquoVisualization-based cancer microarray data classification analysisrdquo Bioinfor-matics vol 23 no 16 pp 2147ndash2154 2007

[27] M A Shipp K N Ross P Tamayo et al ldquoDiffuse large B-celllymphoma outcome prediction by gene-expression profilingand supervised machine learningrdquo Nature Medicine vol 8 no1 pp 68ndash74 2002

Submit your manuscripts athttpwwwhindawicom

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

MathematicsJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Mathematical Problems in Engineering

Hindawi Publishing Corporationhttpwwwhindawicom

Differential EquationsInternational Journal of

Volume 2014

Applied MathematicsJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Probability and StatisticsHindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Mathematical PhysicsAdvances in

Complex AnalysisJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

OptimizationJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

CombinatoricsHindawi Publishing Corporationhttpwwwhindawicom Volume 2014

International Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Operations ResearchAdvances in

Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Function Spaces

Abstract and Applied AnalysisHindawi Publishing Corporationhttpwwwhindawicom Volume 2014

International Journal of Mathematics and Mathematical Sciences

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

The Scientific World JournalHindawi Publishing Corporation httpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Algebra

Discrete Dynamics in Nature and Society

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Decision SciencesAdvances in

Discrete MathematicsJournal of

Hindawi Publishing Corporationhttpwwwhindawicom

Volume 2014 Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Stochastic AnalysisInternational Journal of

Page 10: Review Article On Feature Selection and Rule Extraction ...with feature selection and rule extraction and its training scheme designed for a high dimensional dataset is described in

10 Mathematical Problems in Engineering

IF ldquoHLA-A 2 is Smallrdquo THEN ldquoClass is DLBCLrdquoELSEIF ldquoHLA-A 2 is Largerdquo THEN ldquoClass is FLrdquoELSEIF ldquoNASP is Largerdquo THEN ldquoClass is DLBCLrdquoELSEIF ldquoATIC is Smallrdquo THEN ldquoClass is FLrdquoELSEIF ldquoMDM4 is Smallrdquo THEN ldquoClass is DLBCLrdquoELSEIF ldquoSTX16 is Largerdquo THEN ldquoClass is FLrdquoELSEIF ldquoATIC is Mediumrdquo THEN ldquoClass is DLBCLrdquoELSEIF ldquoMDM4 is Mediumrdquo THEN ldquoClass is FLrdquoELSEIF ldquoSTX16 is Smallrdquo THEN ldquoClass is DLBCLrdquoELSEIF ldquoNASP is Smallrdquo THEN ldquoClass is FLrdquoEND

Rule 6

weight initialization the classification rates were the sameWe also tried to increase the number of selected features to10 but it did not help The results were the same as that using5 features

From Table 5 it can be seen that the sets of informativefeatures for class 1 and class 2 are different across the crossvalidations That will result in a number of different rulesHowever in the real application we will have to come upwith the best rules among them We propose to use thesummation of the feature informative level in all 10 crossvalidations The feature informative level is simply definedby the informative order For example if only 10 linguisticfeatures are considered the most informative one will havethe feature informative level of 10 the second one will havethat of 09 and so on Hence the tenth most informativefeature will have the feature informative factor of 01 and theremaining will get the feature informative level of 00 Theinformative levels of each feature are then summed across allcross validations to yield the overall informative level of thatfeature

We show the overall feature informative levels from the10-fold cross validation using top-10 features in Table 6 Inthis case the highest possible informative level is 100 Outof 42 linguistic features for each class there were only 12 and13 linguistic features with nonzero informative level for class1 and class 2 respectively That means the remaining featuresdid not appear at all in the top-10 list of any cross validationThe results showed that for class 1 the first 5most informativelinguistic features ranking from the most informative to theless informative were ldquoFeature 7043 (HLA-A 2) is SmallrdquoldquoFeature 2360 (NASP) is Largerdquo ldquoFeature 83 (MDM4) isSmallrdquo ldquoFeature 546 (ATIC) is Mediumrdquo and ldquoFeature 87(STX16) is Smallrdquo respectively For class 2 the first 5 mostinformative linguistic features were ldquoFeature 7043 is LargerdquoldquoFeature 546 is Smallrdquo ldquoFeature 87 is Largerdquo ldquoFeature 83is Mediumrdquo and ldquoFeature 2360 is Smallrdquo respectively It isworthwhile noting that the last statement can also be ldquoFeature83 is Largerdquo because it has the same feature informative levelof 55 as for ldquoFeature 2360 is Smallrdquo This information wasused to create rules as described in Section 22 Hence therule extracted using the simple OR approach is as Rule 5

Using the layered approach, the extracted rule is shown as Rule 6.
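As far as we can read from the clause order of Rule 6, the layered construction amounts to interleaving the two classes' ranked linguistic features, clause by clause. The sketch below is our own reading of that procedure, not the authors' code; it reproduces Rule 6 from the two top-5 lists above.

from itertools import chain

dlbcl_top5 = [("HLA-A 2", "Small"), ("NASP", "Large"), ("MDM4", "Small"),
              ("ATIC", "Medium"), ("STX16", "Small")]
fl_top5 = [("HLA-A 2", "Large"), ("ATIC", "Small"), ("STX16", "Large"),
           ("MDM4", "Medium"), ("NASP", "Small")]

# Alternate class 1 and class 2 antecedents, most informative first.
layered = list(chain.from_iterable(
    ((c1, "DLBCL"), (c2, "FL")) for c1, c2 in zip(dlbcl_top5, fl_top5)))

for i, ((gene, term), label) in enumerate(layered):
    keyword = "IF" if i == 0 else "ELSEIF"
    print(f'{keyword} "{gene} is {term}" THEN "Class is {label}"')
print("END")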

Table 6: Feature informative levels from the 10-fold cross validation using top-10 features.

Original feature    Linguistic term    Informative level

Class 1
7043    S    10.0
2360    L     8.7
  83    S     8.1
 546    M     6.5
  87    S     5.5
2479    L     4.2
7052    S     3.9
 207    M     2.8
 931    M     2.8
7057    S     1.9
 355    L     0.3
2164    L     0.3

Class 2
7043    L     9.8
 546    S     9.1
  87    L     8.1
  83    M     6.2
2360    S     5.5
  83    L     5.5
 207    L     4.1
6264    L     2.3
 450    S     2.0
2479    S     1.4
7057    L     0.5
7052    M     0.4
 355    S     0.1

4. Conclusion

The classification problem of diffuse large B-cell lymphoma (DLBCL) versus follicular lymphoma (FL) based on high dimensional microarray data was investigated by our neurofuzzy classification scheme. Our direct calculation method achieved a 100% classification rate on the validation sets of the 10-fold cross validation using only 14 out of 7070 features in the dataset. These 14 features, including the genes MDM4, STX16, NR1D2, DCLRE1A, PARK7, ATIC, HG4263-HT4533_at, CSRP1, NASP, PGK1, HLA-DPB1 2, HLA-A 2, ITK 2, and PLGLB2, were automatically selected by our method. The method could also identify the informative linguistic features for each class. For the DLBCL class, the first 5 most informative linguistic features were “HLA-A 2 is Small”, “NASP is Large”, “MDM4 is Small”, “ATIC is Medium”, and “STX16 is Small”, respectively. For the FL class, the first 5 most informative linguistic features were “HLA-A 2 is Large”, “ATIC is Small”, “STX16 is Large”, “MDM4 is Medium”, and “NASP is Small”, respectively. The terms Small, Medium, and Large of each original feature were automatically determined by our method. The informative linguistic features were used to create rules that achieved a 90.91% classification rate on the validation sets of the 10-fold cross validation. Even though this rule creation approach yielded worse classification performance than the direct calculation, it is more desirable from the human interpretation aspect. It can be seen that very good results are achieved on this standard high dimensional dataset. The set of selected informative genes will be useful for further investigation in the fields of bioinformatics or medicine. To ensure that this set of selected features can be used in general, it should be applied to more DLBCL versus FL cases.

Conflict of Interests

The authors declare no conflict of interests.

References

[1] G. G. Towell, J. W. Shavlik, and M. O. Noordewier, “Refinement of approximate domain theories by knowledge-based neural networks,” in Proceedings of the 8th National Conference on Artificial Intelligence, pp. 861–866, Boston, Mass, USA, July-August 1990.

[2] L. M. Fu, “Knowledge-based connectionism for revising domain theories,” IEEE Transactions on Systems, Man, and Cybernetics, vol. 23, no. 1, pp. 173–182, 1993.

[3] L. M. Fu, “Learning capacity and sample complexity on expert networks,” IEEE Transactions on Neural Networks, vol. 7, no. 6, pp. 1517–1520, 1996.

[4] S. Snyders and C. W. Omlin, “What inductive bias gives good neural network training performance?” in Proceedings of the International Joint Conference on Neural Networks (IJCNN '00), pp. 445–450, Como, Italy, July 2000.

[5] D.-X. Zhang, Y. Liu, and Z.-Q. Wang, “Effectively extracting rules from trained neural networks based on the characteristics of the classification hypersurfaces,” in Proceedings of the International Conference on Machine Learning and Cybernetics (ICMLC '05), vol. 3, pp. 1541–1546, IEEE, Guangzhou, China, August 2005.

[6] L. Fu, “Rule generation from neural networks,” IEEE Transactions on Systems, Man, and Cybernetics, vol. 24, no. 8, pp. 1114–1124, 1994.

[7] F. Beloufa and M. A. Chikh, “Design of fuzzy classifier for diabetes disease using modified artificial bee colony algorithm,” Computer Methods and Programs in Biomedicine, vol. 112, no. 1, pp. 92–103, 2013.

[8] J. Luo and D. Chen, “Fuzzy information granulation based decision support applications,” in Proceedings of the International Symposium on Information Processing (ISIP '08), pp. 197–201, Moscow, Russia, May 2008.

[9] A. T. Azar, S. A. El-Said, and A. E. Hassanien, “Fuzzy and hard clustering analysis for thyroid disease,” Computer Methods and Programs in Biomedicine, vol. 111, no. 1, pp. 1–16, 2013.

[10] C.-H. L. Lee, A. Liu, and W.-S. Chen, “Pattern discovery of fuzzy time series for financial prediction,” IEEE Transactions on Knowledge and Data Engineering, vol. 18, no. 5, pp. 613–625, 2006.

[11] R. Yang, Z. Wang, P.-A. Heng, and K.-S. Leung, “Classification of heterogeneous fuzzy data by Choquet integral with fuzzy-valued integrand,” IEEE Transactions on Fuzzy Systems, vol. 15, no. 5, pp. 931–942, 2007.

[12] P. Hartono and S. Hashimoto, “An interpretable neural network ensemble,” in Proceedings of the 33rd Annual Conference of the IEEE Industrial Electronics Society (IECON '07), pp. 228–232, Taipei, Taiwan, November 2007.

[13] T.-G. Fan and X.-Z. Wang, “A new approach to weighted fuzzy production rule extraction from neural networks,” in Proceedings of the International Conference on Machine Learning and Cybernetics, pp. 3348–3351, Shanghai, China, August 2004.

[14] M. Gao, X. Hong, and C. J. Harris, “A neurofuzzy classifier for two class problems,” in Proceedings of the 12th UK Workshop on Computational Intelligence (UKCI '12), pp. 1–6, Edinburgh, UK, September 2012.

[15] W. Duch, R. Adamczak, and K. Grabczewski, “A new methodology of extraction, optimization and application of crisp and fuzzy logical rules,” IEEE Transactions on Neural Networks, vol. 12, no. 2, pp. 277–306, 2001.

[16] W. Duch, R. Setiono, and J. M. Zurada, “Computational intelligence methods for rule-based data understanding,” Proceedings of the IEEE, vol. 92, no. 5, pp. 771–805, 2004.

[17] A. T. Azar, “Adaptive network based on fuzzy inference system for equilibrated urea concentration prediction,” Computer Methods and Programs in Biomedicine, vol. 111, no. 3, pp. 578–591, 2013.

[18] A. Ouarda, A. Ahlem, B. Brahim, and B. Kheir, “Comparison of neuro-fuzzy models for classification fingerprint images,” International Journal of Computer Applications, vol. 65, no. 9, pp. 12–16, 2013.

[19] Z. Yang, Y. Wang, and G. Ouyang, “Adaptive neuro-fuzzy inference system for classification of background EEG signals from ESES patients and controls,” The Scientific World Journal, vol. 2014, Article ID 140863, 8 pages, 2014.

[20] M. S. Al-Batah, N. A. M. Isa, M. F. Klaib, and M. A. Al-Betar, “Multiple adaptive neuro-fuzzy inference system with automatic features extraction algorithm for cervical cancer recognition,” Computational and Mathematical Methods in Medicine, vol. 2014, Article ID 181245, 12 pages, 2014.

[21] J.-A. Martinez-Leon, J.-M. Cano-Izquierdo, and J. Ibarrola, “Feature selection applying statistical and neurofuzzy methods to EEG-based BCI,” Computational Intelligence and Neuroscience, vol. 2015, Article ID 781207, 17 pages, 2015.

[22] D. Chakraborty and N. R. Pal, “A neuro-fuzzy scheme for simultaneous feature selection and fuzzy rule-based classification,” IEEE Transactions on Neural Networks, vol. 15, no. 1, pp. 110–123, 2004.

[23] N. Eiamkanitchat, N. Theera-Umpon, and S. Auephanwiriyakul, “A novel neuro-fuzzy method for linguistic feature selection and rule-based classification,” in Proceedings of the 2nd International Conference on Computer and Automation Engineering (ICCAE '10), vol. 2, pp. 247–252, Singapore, February 2010.

[24] N. Eiamkanitchat, N. Theera-Umpon, and S. Auephanwiriyakul, “Colon tumor microarray classification using neural network with feature selection and rule-based classification,” in Advances in Neural Network Research and Applications, vol. 67 of Lecture Notes in Electrical Engineering, pp. 363–372, Springer, Berlin, Germany, 2010.

[25] C.-H. Chen and C.-J. Lin, “Compensatory neurofuzzy inference systems for pattern classification,” in Proceedings of the International Symposium on Computer, Consumer and Control (IS3C '12), pp. 88–91, IEEE, Taichung, Taiwan, June 2012.

[26] M. Mramor, G. Leban, J. Demsar, and B. Zupan, “Visualization-based cancer microarray data classification analysis,” Bioinformatics, vol. 23, no. 16, pp. 2147–2154, 2007.

[27] M. A. Shipp, K. N. Ross, P. Tamayo et al., “Diffuse large B-cell lymphoma outcome prediction by gene-expression profiling and supervised machine learning,” Nature Medicine, vol. 8, no. 1, pp. 68–74, 2002.
