[advances in intelligent systems and computing] proceedings of seventh international conference on...

J. C. Bansal et al. (eds.), Proceedings of Seventh International Conference on Bio-Inspired Computing: Theories and Applications (BIC-TA 2012), Advances in Intelligent Systems and Computing 201, DOI: 10.1007/978-81-322-1038-2_9, © Springer India 2013

Improved Real-Time Discretize Network Intrusion Detection System

Heba F. Eid, Ahmad Taher Azar and Aboul Ella Hassanien

Abstract Intrusion detection systems (IDSs) are an essential component of network defense. Many classification algorithms have been proposed for the design of network IDSs. Data preprocessing is a common phase before classification learning and can improve network IDS performance. One important preprocessing step is discretization, in which continuous features are converted into nominal ones. This paper addresses the impact of applying discretization when building a network IDS. Furthermore, it explores how the quality of the classification algorithms is affected when discretization is combined with a genetic algorithm (GA) as a feature selection method. To evaluate the performance of the introduced network IDS, several classifiers are used: rule-based classifiers (Ridor, Decision table), tree classifiers (REPTree, C4.5, Random Forest) and the Naïve Bayes classifier. Several groups of experiments are conducted on the NSL-KDD dataset. The experiments show that discretization has a positive influence on the time needed to classify the test instances, which is an important factor if a real-time network IDS is desired.

Keywords: Discretization; Real-time network intrusion detection; Feature selection; Network intrusion classification.

Heba F. Eid, Faculty of Science, Al-Azhar University, Cairo, Egypt, e-mail: [email protected]

Ahmad Taher Azar, Faculty of Engineering, Misr University for Science & Technology, e-mail: ah-mad T [email protected]

Aboul Ella Hassanien, Faculty of Computers and Information, Cairo University, e-mail: [email protected]


1 Introduction

In 1980, Anderson [1] proposed the concept of the intrusion detection system (IDS). Intrusion detection is a major research problem in network security. The goal of a network IDS is to automatically identify unusual access or attacks in order to secure network channels [2, 3].

However, many issues need to be considered when building an IDS, such as data collection, data preprocessing and classification accuracy. Several machine-learning techniques have been proposed for the design of IDSs. In particular, these techniques are developed to classify incoming network traces as normal or intrusive. The classification algorithms include rule-based classifiers (Ridor, Decision table [4]), tree classifiers (REPTree, C4.5 [5], Random Forest [6]) and the Naïve Bayes classifier [7].

Classification is the prediction of the class label of an unknown instance. Instances are usually described by a set of features, which can be nominal or continuous. Discretization, which converts continuous features into nominal ones, is therefore one of the important data preprocessing steps. Many discretization methods have been proposed during the last two decades [8–10].

Another important research challenge in constructing a high-performance IDS is dealing with data that contain a large number of features. Feature selection is therefore required. It reduces computational complexity and removes redundant information, which speeds up the learning algorithm and increases its accuracy [11]. Different feature selection methods have been proposed to enhance the performance of IDSs [12]. The genetic algorithm (GA) [13, 14] is a global search algorithm that has been used successfully for feature selection tasks.

This paper addresses the impact of applying discretization when building a network IDS with different classification algorithms: rule-based classifiers (Ridor, Decision table), tree classifiers (REPTree, C4.5, Random Forest) and the Naïve Bayes classifier. Furthermore, it explores how the quality of these classifiers is affected when discretization is combined with GA feature selection in building the network intrusion detection system.

The rest of this paper is organized as follows: Section 2 gives an overview of the pre-processing approaches: discretization, feature selection and the genetic algorithm. Section 3 describes the proposed framework of the network intrusion detection system. The experimental results and conclusions are presented in Sections 4 and 5, respectively.


2 Pre-Processing Approaches

2.1 Discretization

Discretization is the process of converting the continuous space of a feature into a nominal space [15]. The goal of the discretization process is to find a set of cut-points that split the range into a small number of intervals. Each cut-point is a real value within the range of the continuous values, which divides the range into two intervals: one greater than the cut-point and the other less than or equal to the cut-point value [16]. Discretization is usually performed as a pre-processing phase before the learning algorithm.

Discretization methods can be classified into five categories [17]:

1. Supervised vs. Unsupervised
2. Static vs. Dynamic
3. Global vs. Local
4. Top-down (splitting) vs. Bottom-up (merging)
5. Direct vs. Incremental

Supervised methods use the class labels during the discretization process. In contrast, unsupervised methods do not use information about the class labels and generate discretization schemes based only on the distribution of the values of the continuous attributes. Research shows that supervised methods perform better than unsupervised methods [18]. Dynamic and static methods differ in whether the method takes the interdependence among features into account [19]. Global methods use the entire value space of a numeric attribute for the discretization, while local methods use a subset of instances when deriving the discretization. Top-down (splitting) methods start with one interval containing all values of a feature and split it into smaller intervals in subsequent iterations, while bottom-up (merging) methods start with the maximal number of sub-intervals and merge them until a stopping criterion or an optimal number of intervals is reached [9]. Direct methods divide the range into equal-width intervals and require the user to determine the number of intervals. Incremental methods begin with a simple discretization and go through an improvement process until a stopping criterion terminates the discretization [20].
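As an illustration of the direct, equal-width approach mentioned above, the following Python sketch computes the cut-points for a user-chosen number of intervals and maps each value to an interval index. It follows the convention that a value equal to a cut-point belongs to the lower interval. The function names and the sample values are hypothetical and are not part of the original study.

```python
from bisect import bisect_left


def equal_width_cut_points(values, n_intervals):
    """Return the n_intervals - 1 cut-points that split [min, max] into equal widths."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / n_intervals
    return [lo + i * width for i in range(1, n_intervals)]


def discretize(value, cut_points):
    """Map a continuous value to an interval index.

    A value equal to a cut-point falls into the lower interval
    ("less than or equal to the cut-point").
    """
    return bisect_left(cut_points, value)


# Hypothetical continuous feature values (not taken from NSL-KDD).
durations = [0.0, 2.0, 4.0, 6.0, 8.0, 10.0]
cuts = equal_width_cut_points(durations, 5)
bins = [discretize(v, cuts) for v in durations]
print(cuts, bins)
```

Because this scheme ignores the class labels, it is an unsupervised method in the terminology above.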

Fayyad et al. [21] proposed the Information Entropy Maximization (IEM) discretization method. It is a supervised, local, splitting and incremental discretization method. The criteria of the IEM algorithm are based on information entropy, and the cut-points are set between points with different class labels.
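The core idea of an entropy-based cut-point can be sketched in a few lines of Python. The sketch below evaluates only a single binary split: it considers midpoints between adjacent sorted values whose class labels differ and keeps the one with the lowest class-weighted entropy. It is a simplified illustration under these assumptions, not the full method of [21], which applies the split recursively and uses a stopping criterion. All names and sample data are hypothetical.

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((c / total) * log2(c / total) for c in Counter(labels).values())


def best_cut_point(values, labels):
    """Return (cut_point, weighted_entropy) for the best single binary split.

    Candidate cut-points are midpoints between adjacent sorted values whose
    class labels differ. Returns None when no such boundary exists.
    """
    pairs = sorted(zip(values, labels))
    best = None
    for i in range(1, len(pairs)):
        (v_prev, y_prev), (v_cur, y_cur) = pairs[i - 1], pairs[i]
        if y_prev == y_cur or v_prev == v_cur:
            continue  # not a class boundary, or no room for a cut
        cut = (v_prev + v_cur) / 2
        left = [y for _, y in pairs[:i]]
        right = [y for _, y in pairs[i:]]
        score = (len(left) * entropy(left) + len(right) * entropy(right)) / len(pairs)
        if best is None or score < best[1]:
            best = (cut, score)
    return best


# Hypothetical feature values with their class labels.
values = [1, 2, 3, 10, 11, 12]
labels = ["normal"] * 3 + ["attack"] * 3
result = best_cut_point(values, labels)
print(result)
```

In this toy example the two classes are perfectly separable, so the selected cut leaves both intervals with zero entropy.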


2.2 Feature Selection

Feature selection (FS) is a preprocessing step performed before classification. Its purpose is to improve classification performance by removing redundant or irrelevant features. FS methods generate a new set of features by selecting only a subset of the original features.

Based on the evaluation criteria, feature selection methods fall into two categories: the filter approach [22, 23] and the wrapper approach [24, 25]. Filter approaches evaluate and select the new set of features based on the general characteristics of the data, without involving any machine learning algorithm. Frequently used filter methods include chi-square [26], information gain [27] and Pearson correlation coefficients [28]. Wrapper approaches use the classification performance of a predetermined machine learning algorithm as the evaluation criterion for selecting the new feature set. Machine learning algorithms such as the genetic algorithm (GA) [29], ID3 [30] and Bayesian networks [7] are commonly used as induction algorithms for wrapper approaches.

2.3 Genetic Algorithm

The genetic algorithm (GA) is an adaptive search technique initially introduced by Holland [29]. It is a computational model designed to simulate evolutionary processes in nature. GA includes three fundamental operators that act on chromosomes: selection, crossover and mutation.

1. Selection: A population is created from a group of random individuals. The individuals in the population are then evaluated by a fitness function. Two individuals (parents) are selected for the next generation based on their fitness.

2. Crossover: Crossover randomly chooses a point in the two selected parents and exchanges the remaining segments between them to create new individuals.

3. Mutation: Mutation randomly changes one or more components of a selected individual. This process continues until a suitable solution has been found or a certain number of generations have passed [31].

Given a well-bounded problem, GAs can search for a global optimum, which makes them well suited to feature selection.
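The crossover and mutation operators described above can be illustrated on bit-string chromosomes, where each bit indicates whether a feature is kept. The Python sketch below assumes single-point crossover and independent bit-flip mutation; the paper does not state which operator variants or rates were used, so the function names and the mutation rate are hypothetical.

```python
import random


def single_point_crossover(parent_a, parent_b, rng):
    """Swap the tails of two equal-length chromosomes at a random point."""
    point = rng.randrange(1, len(parent_a))
    child_a = parent_a[:point] + parent_b[point:]
    child_b = parent_b[:point] + parent_a[point:]
    return child_a, child_b


def mutate(chromosome, rate, rng):
    """Flip each bit independently with probability `rate`."""
    return [1 - bit if rng.random() < rate else bit for bit in chromosome]


rng = random.Random(0)  # fixed seed so that the example is repeatable
a = [1] * 8
b = [0] * 8
child_a, child_b = single_point_crossover(a, b, rng)
mutant = mutate(child_a, rate=0.1, rng=rng)
print(child_a, child_b, mutant)
```

With all-ones and all-zeros parents, the crossover point is visible directly in the children as the position where the bits change.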

3 A Framework of Real-Time Discretize Network Intrusion Detection

The framework of the proposed anomaly intrusion detection approach is shown in Fig. 1. It comprises three fundamental phases: (1) dataset pre-processing by mapping and IEM discretization, (2) data reduction by GA feature selection, and (3) intrusion detection, in which a new connection is classified into one of five outcomes.


Fig. 1 Real-time Discretize Network Intrusion Detection Framework

3.1 Pre-processing Phase

The following two pre-processing stages were applied to the NSL-KDD dataset:

1. Mapping:

• Symbolic features to numeric values.
• Attack names to one of the five classes: 0 for Normal, 1 for DoS (denial of service), 2 for U2R (user-to-root: unauthorized access to root privileges), 3 for R2L (remote-to-local: unauthorized access to a local machine from a remote machine), and 4 for Probe (probing: information gathering attacks).

2. Discretization: Features were discretized by the Information Entropy Maximization (IEM) discretization method.
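A minimal Python sketch of the mapping stage is given below. The class codes follow the list above, and the attack names are the examples given in this paper. The exact label spellings in the dataset files may differ, so the lookup table, the function names and the integer coding of symbolic values are assumptions made for illustration only.

```python
# Class codes as defined in the mapping stage.
CLASS_CODES = {"Normal": 0, "DoS": 1, "U2R": 2, "R2L": 3, "Probe": 4}

# Example attack names from this paper; the spellings are assumed, not read from the dataset.
ATTACK_CATEGORY = {
    "neptune": "DoS", "smurf": "DoS", "pod": "DoS", "teardrop": "DoS",
    "guess-password": "R2L", "ftp-write": "R2L", "imap": "R2L", "phf": "R2L",
    "buffer-overflow": "U2R", "load-module": "U2R", "perl": "U2R", "spy": "U2R",
    "port-sweep": "Probe", "ip-sweep": "Probe", "nmap": "Probe", "satan": "Probe",
}


def class_code(label):
    """Map a record label (normal or an attack name) to its numeric class."""
    name = label.strip().lower()
    if name == "normal":
        return CLASS_CODES["Normal"]
    return CLASS_CODES[ATTACK_CATEGORY[name]]


def encode_symbolic(column):
    """Map each distinct symbolic value to an integer, in order of first appearance."""
    codes = {}
    encoded = [codes.setdefault(value, len(codes)) for value in column]
    return encoded, codes


# Illustrative values for a symbolic feature such as the protocol type.
encoded, codes = encode_symbolic(["tcp", "udp", "tcp", "icmp"])
print(encoded, codes, class_code("Neptune"))
```

An unknown attack name raises a KeyError here, which makes missing table entries visible instead of silently assigning a class.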

3.2 GA Feature Selection Phase

GA is applied as a feature selection method to reduce the dimensionality of the dataset. GA reduces the NSL-KDD dataset from 41 features to 14 features, a reduction of about 66% of the feature space (27 of 41 features). Algorithm 1 gives the main steps of the genetic algorithm-based feature selection.

Algorithm 1 Genetic algorithm-based feature selection
1: Initialize a population M(0) of random individuals over the 41 NSL-KDD features.
2: while the fitness threshold or the maximum number of generations has not been reached do
3:   Evaluate the fitness f(m) of each individual m in the current population M(t).
4:   Select the best-fit individuals using selection probabilities P(m) for each individual m in M(t).
5:   Generate new individuals M(t + 1) through crossover and mutation operations to produce offspring.
6:   Replace the least-fit individuals with the new ones.
7: end while
8: Return the best n features of the NSL-KDD dataset.
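The loop of Algorithm 1 can be sketched in Python as follows. Several choices here are assumptions rather than the authors' settings: the fitness function is a toy stand-in (the paper does not specify the one used), tournament selection replaces the selection probabilities P(m), and each generation is fully rebuilt with the best individual carried over instead of replacing only the least-fit individuals. The population size, rates and the set of "relevant" features are hypothetical.

```python
import random

N_FEATURES = 41  # number of NSL-KDD features reported in the paper


def toy_fitness(mask, relevant):
    """Hypothetical fitness: reward relevant features, lightly penalise subset size."""
    hits = sum(1 for i in relevant if mask[i])
    return hits - 0.1 * sum(mask)


def tournament(population, scores, rng, k=3):
    """Return the fittest of k randomly drawn individuals."""
    contenders = rng.sample(range(len(population)), k)
    return population[max(contenders, key=lambda i: scores[i])]


def ga_select(fitness, n_features=N_FEATURES, pop_size=30, generations=40,
              mutation_rate=0.02, seed=0):
    """Return the indices of the features kept by the best individual found."""
    rng = random.Random(seed)
    population = [[rng.randint(0, 1) for _ in range(n_features)]
                  for _ in range(pop_size)]
    best = max(population, key=fitness)
    for _ in range(generations):
        scores = [fitness(ind) for ind in population]
        next_gen = [best[:]]  # elitism: the best individual always survives
        while len(next_gen) < pop_size:
            mum = tournament(population, scores, rng)
            dad = tournament(population, scores, rng)
            point = rng.randrange(1, n_features)  # single-point crossover
            child = mum[:point] + dad[point:]
            child = [1 - b if rng.random() < mutation_rate else b for b in child]
            next_gen.append(child)
        population = next_gen
        best = max(population, key=fitness)
    return [i for i, bit in enumerate(best) if bit]


relevant = {0, 3, 7, 12, 20}  # hypothetical "useful" feature indices


def fitness(mask):
    return toy_fitness(mask, relevant)


selected = ga_select(fitness)
print(selected)
```

In a wrapper setting, the toy fitness would be replaced by the cross-validated performance of a classifier trained on the masked feature set, which is far more expensive to evaluate.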

3.3 Intrusion Detection Phase

We evaluate the performance of the proposed high-speed network intrusion detection framework on a set of different classifiers. The set includes rule-based classifiers (Ridor, Decision table), tree classifiers (REPTree, C4.5, Random Forest) and the Naïve Bayes classifier.

4 Experiments and Analysis

4.1 Network Dataset Characteristics

The NSL-KDD dataset [32] is a benchmark used for evaluating network intrusion detection systems. It consists of selected records of the complete KDD'99 dataset [33]. Each NSL-KDD connection record contains 41 features (e.g., protocol type, service, and flag) and is labeled as either normal or an attack. The training set contains a total of 22 attack types, and the testing set contains an additional 17 attack types. The attacks fall into four categories:

1. DoS: denial of service, e.g., Neptune, Smurf, Pod and Teardrop.
2. R2L: unauthorized access to a local machine from a remote machine, e.g., Guess-password, Ftp-write, Imap and Phf.
3. U2R: unauthorized access to root privileges, e.g., Buffer-overflow, Load-module, Perl and Spy.
4. Probe: collecting information that is helpful for mounting a future attack, e.g., Port-sweep, IP-sweep, Nmap and Satan.


4.2 Comparison Criteria

The comparison criteria used to evaluate the proposed network intrusion detection system are: (1) the speed of the ID system and (2) the classification accuracy.

The classification performance of the ID system is measured in terms of precision, recall and F-measure, which are calculated from the confusion matrix given in Table 1. The F-measure is the harmonic mean of precision and recall and assesses the trade-off between them. An ID system should achieve a high recall without loss of precision.

Table 1 Confusion Matrix

                          Predicted Class
                          Normal                  Attack
Actual Class   Normal     True positives (TP)     False positives (FP)
               Attack     False negatives (FN)    True negatives (TN)

Here, TP and TN indicate that normal and attack events are successfully labeled as normal and attack, respectively. FP refers to normal events predicted as attacks, while FN refers to attack events incorrectly predicted as normal [34].

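\[ \text{Recall} = \frac{TP}{TP + FN} \tag{1} \]

\[ \text{Precision} = \frac{TP}{TP + FP} \tag{2} \]

\[ F\text{-measure} = \frac{2 \times \text{Recall} \times \text{Precision}}{\text{Recall} + \text{Precision}} \tag{3} \]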
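Equations (1) to (3) translate directly into code. The following Python sketch computes the three measures from confusion-matrix counts; the counts are hypothetical and serve only as a worked example.

```python
def recall(tp, fn):
    """Eq. (1): TP / (TP + FN)."""
    return tp / (tp + fn)


def precision(tp, fp):
    """Eq. (2): TP / (TP + FP)."""
    return tp / (tp + fp)


def f_measure(tp, fp, fn):
    """Eq. (3): harmonic mean of recall and precision."""
    r, p = recall(tp, fn), precision(tp, fp)
    return 2 * r * p / (r + p)


# Hypothetical counts, not results from the paper.
tp, fp, fn = 90, 10, 30
print(recall(tp, fn), precision(tp, fp), f_measure(tp, fp, fn))
```

With these counts, recall is 0.75, precision is 0.90 and the F-measure is 180/220, about 0.818, which lies between the two but closer to the smaller value.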

4.3 Results and Analysis

The proposed real-time network intrusion detection system is evaluated using the NSL-KDD dataset, from which 59586 records are randomly selected. All experiments were performed on an Intel Core 2 Duo 2.26 GHz processor with 2 GB of RAM.

We evaluate the proposed framework on different categories of classifiers: tree classifiers (REPTree, C4.5, Random Forest), rule-based classifiers (Ridor, Decision table) and the Naïve Bayes classifier.

Tables 2 and 3 show the F-measures and speed achieved for the different sets of classifiers under three settings: without any preprocessing, with IEM discretization, and with IEM discretization combined with GA feature selection (14 features). The comparison results are based on 10-fold cross-validation.
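For readers unfamiliar with the protocol, the Python sketch below shows how 10-fold cross-validation partitions the 59586 records: every record is used for testing exactly once and for training in the other nine folds. The shuffling seed and the plain (non-stratified) partitioning are assumptions; the paper does not state how the folds were formed.

```python
import random


def k_fold_indices(n_records, k=10, seed=0):
    """Yield (train_indices, test_indices) for each of the k folds."""
    indices = list(range(n_records))
    random.Random(seed).shuffle(indices)
    folds = [indices[i::k] for i in range(k)]  # k nearly equal-sized folds
    for i in range(k):
        test = folds[i]
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, test


N_RECORDS = 59586  # number of records used in the experiments
splits = list(k_fold_indices(N_RECORDS))
print([len(test) for _, test in splits])
```

The reported F-measure for a classifier would then be computed over the predictions collected from the ten test folds.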

As Table 2 shows, applying the IEM discretization method greatly improves the speed of the systems, especially for the C4.5 classifier (from 43.46 s to 3.05 s), which is very important for real-time network intrusion detection systems. The classification accuracy of the REPTree and Random Forest classifiers is largely unaffected by discretization (changes of at most 0.2 percentage points), while it improves slightly for the C4.5 classifier (from 98.8% to 99.0%). The speed improves further when discretization is combined with GA feature selection, which reduces the NSL-KDD dimensions from 41 features to 14 features.

Table 2 Comparison of F-measures and speed for tree classifiers

                        REPTree                   C4.5                      Random Forest
Preprocess approach     F-measure  speed (sec.)   F-measure  speed (sec.)   F-measure  speed (sec.)
None                    98.3%      6.07           98.8%      43.46          99.2%      34.58
Discretization          98.1%      3.75           99.0%      3.05           99.1%      2.87
Discretization + GA     98.7%      1.20           98.8%      0.77           99.3%      1.76

Table 3 Comparison of F-measures and speed for rule-based and Naïve Bayes classifiers

                        Ridor                     Decision table            Naïve Bayes
Preprocess approach     F-measure  speed (sec.)   F-measure  speed (sec.)   F-measure  speed (sec.)
None                    98.3%      435.57         96.4%      136.6          72.16%     4.21
Discretization          97.2%      129.16         96.3%      132.0          93.6%      0.21
Discretization + GA     97.8%      61.14          97.9%      25.5           94.5%      0.09

Table 3 gives the impact of applying discretization alone and discretization combined with GA feature selection. For the Ridor classifier, the system speed improves markedly (from 435.57 s to 129.16 s with discretization and to 61.14 s with added GA feature selection). Discretization also has a clearly positive impact on the Naïve Bayes classifier: the F-measure rises from 72.16% to 93.6% and the time falls from 4.21 s to 0.21 s.
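The relative speed gains can be read off Tables 2 and 3 by dividing the time without preprocessing by the time with each preprocessing setting. The short Python sketch below does this arithmetic; the times are copied from the two tables, while the dictionary keys and the function name are hypothetical.

```python
# Times in seconds, copied from Tables 2 and 3.
TIMES = {
    "REPTree":        {"none": 6.07,   "disc": 3.75,   "disc_ga": 1.20},
    "C4.5":           {"none": 43.46,  "disc": 3.05,   "disc_ga": 0.77},
    "Random Forest":  {"none": 34.58,  "disc": 2.87,   "disc_ga": 1.76},
    "Ridor":          {"none": 435.57, "disc": 129.16, "disc_ga": 61.14},
    "Decision table": {"none": 136.6,  "disc": 132.0,  "disc_ga": 25.5},
    "Naive Bayes":    {"none": 4.21,   "disc": 0.21,   "disc_ga": 0.09},
}


def speedups(times):
    """Return {classifier: (speedup with discretization, speedup with discretization + GA)}."""
    return {name: (t["none"] / t["disc"], t["none"] / t["disc_ga"])
            for name, t in times.items()}


ratios = speedups(TIMES)
for name, (disc, disc_ga) in ratios.items():
    print(f"{name:15s} {disc:6.1f}x {disc_ga:6.1f}x")
```

For example, C4.5 runs about 14 times faster with discretization alone (43.46 / 3.05), whereas the Decision table gains little from discretization (136.6 / 132.0) and benefits mainly from the reduced feature set.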


Fig. 2 An overall speed comparison of the proposed network ID framework.

Figure 2 shows the speed comparison of the proposed network ID framework for the different sets of classifiers. It also gives the accuracy comparison for the Naïve Bayes classifier.

5 Conclusions

In this study, a real-time discretized network ID framework is proposed. We explore the impact of applying IEM discretization and GA feature selection on the performance of a network IDS. Different classification algorithms, namely rule-based classifiers (Ridor, Decision table), tree classifiers (REPTree, C4.5, Random Forest) and the Naïve Bayes classifier, are used to evaluate the classification time and accuracy of the introduced network ID framework. Experiments on the NSL-KDD dataset show that IEM discretization greatly reduces the time needed to classify the test instances, which is an important factor for a real-time network IDS. IEM discretization also has a positive impact on the classification accuracy, especially for the Naïve Bayes classifier.

References

1. J. Anderson, “Computer security threat monitoring and surveillance”, Technical Report, James P. Anderson Co., Fort Washington, PA, 1980.

2. C. Tsai, Y. Hsu, C. Lin and W. Lin, “Intrusion detection by machine learning: A review”, Expert Systems with Applications, vol. 36, pp. 11994-12000, 2009.

3. H. Debar, M. Dacier and A. Wespi, “Towards a taxonomy of intrusion-detection systems”, Computer Networks, vol. 31, pp. 805-822, 1999.

4. X. Xu, “Adaptive Intrusion Detection Based on Machine Learning: Feature Extraction, Classifier Construction and Sequential Pattern Prediction”, International Journal of Web Services Practices, vol. 2, pp. 49-58, 2006.

5. J. Quinlan, “C4.5: Programs for Machine Learning”, Morgan Kaufmann, San Mateo, 1993.

6. L. Breiman, “Random Forests”, Machine Learning, vol. 45, pp. 5-32, 2001.

7. F. Jemili, M. Zaghdoud and M. Ahmed, “Intrusion detection based on Hybrid propagation in Bayesian Networks”, In Proceedings of the IEEE International Conference on Intelligence and Security Informatics, pp. 137-142, 2009.

8. J. Rissanen, “Modeling by shortest data description”, Automatica, vol. 14, pp. 465-471, 1978.

9. R. Kerber, “Chimerge: discretization of numeric attributes”, In Proceedings of the 9th International Conference of Artificial Intelligence, Cambridge, UK, pp. 123-128, 1992.

10. C. Tsai, C. Lee and W. Yang, “A discretization algorithm based on class-attribute contingency coefficient”, Information Sciences, vol. 178, pp. 714-731, 2008.

11. F. Amiri, M. Yousefi, C. Lucas, A. Shakery and N. Yazdani, “Mutual information-based feature selection for intrusion detection systems”, Journal of Network and Computer Applications, vol. 34, pp. 1184-1199, 2011.

12. C. Tsang, S. Kwong and H. Wang, “Genetic-fuzzy rule mining approach and evaluation of feature selection techniques for anomaly intrusion detection”, Pattern Recognition, vol. 40, pp. 2373-2391, 2007.

13. K. Chan, C. Kwong, Y. Tsim, M. Aydin and T. Fogarty, “A new orthogonal array based crossover, with analysis of gene interactions, for evolutionary algorithms and its application to car door design”, Expert Systems with Applications, vol. 37, pp. 3853-3862, 2010.

14. Y. Li, S. Zhang and X. Zeng, “Research of multi-population agent genetic algorithm for feature selection”, Expert Systems with Applications, vol. 36, pp. 11570-11581, 2009.

15. M. Mizianty, L. Kurgan and M. Ogiela, “Discretization as the enabling technique for the Naïve Bayes and semi-Naïve Bayes-based classification”, The Knowledge Engineering Review, vol. 25, pp. 421-449, 2010.

16. S. Kotsiantis and D. Kanellopoulos, “Discretization Techniques: A recent survey”, GESTS International Transactions on Computer Science and Engineering, vol. 32, pp. 47-58, 2006.

17. H. Liu, F. Hussain, C. Tan and M. Dash, “Discretization: an enabling technique”, Data Mining and Knowledge Discovery, vol. 6, pp. 393-423, 2002.

18. J. Dougherty, R. Kohavi and M. Sahami, “Supervised and unsupervised discretization of continuous features”, In Proceedings of the 12th International Conference on Machine Learning, San Francisco: Morgan Kaufmann, pp. 194-202, 1995.

19. H. Steck and T. Jaakkola, “Predictive discretization during model selection”, In Proceedings of DAGM Symposium In Pattern Recognition, Tübingen, Germany, pp. 1-8, 2004.

20. J. Cerquides and R. Lopez, “Proposal and Empirical Comparison of a Parallelizable Distance Based Discretization Method”, In Proceedings of the III International Conference on Knowledge Discovery and Data Mining (KDDM97), Newport Beach, California, USA, pp. 139-142, 1997.


21. U. Fayyad and K. Irani, “Multi-interval discretization of continuous-valued attributes for classification learning”, In Proceedings of the International Joint Conference on Uncertainty in AI, Morgan Kaufmann, San Francisco, CA, USA, pp. 1022-1027, 1993.

22. M. Dash, K. Choi, P. Scheuermann and H. Liu, “Feature selection for clustering-a filter solution”, In Proceedings of the Second International Conference on Data Mining, pp. 115-122, 2002.

23. L. Yu and H. Liu, “Feature selection for high-dimensional data: a fast correlation-based filter solution”, In Proceedings of the Twentieth International Conference on Machine Learning, pp. 856-863, 2003.

24. R. Kohavi and G. John, “Wrappers for feature subset selection”, Artificial Intelligence, vol. 2, pp. 273-324, 1997.

25. Y. Kim, W. Street and F. Menczer, “Feature selection for unsupervised learning via evolutionary search”, In Proceedings of the Sixth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 365-369, 2000.

26. X. Jin, A. Xu, R. Bie and P. Guo, “Machine learning techniques and chi-square feature selection for cancer classification using SAGE gene expression profiles”, Lecture Notes in Computer Science, 3916, DOI: 10.1007/1169173011, pp. 106-115, 2006.

27. M. Ben-Bassat, “Pattern recognition and reduction of dimensionality”, Handbook of Statistics II, North-Holland, Amsterdam, vol. 1, 1982.

28. H. Peng, F. Long and C. Ding, “Feature selection based on mutual information criteria of max-dependency, max-relevance, and min-redundancy”, IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 27, pp. 1226-1238, 2005.

29. J. Holland, “Adaptation in Natural and Artificial Systems”, University of Michigan Press, Ann Arbor, MI, 1975.

30. J. R. Quinlan, “Induction of Decision Trees”, Machine Learning, vol. 1, pp. 81-106, 1986.

31. B. Jiang, X. Ding, L. Ma, Y. He, T. Wang and W. Xie, “A Hybrid Feature Selection Algorithm: Combination of Symmetrical Uncertainty and Genetic Algorithms”, In Proceedings of the Second International Symposium on Optimization and Systems Biology (OSB’08), China, pp. 152-157, 2008.

32. I. Cohen, Q. Tian, X. Zhou and T. Huang, “Feature Selection Using Principal Feature Analysis”, In Proceedings of the 15th International Conference on Multimedia, Augsburg, Germany, September, pp. 25-29, 2007.

33. KDD’99 dataset, http://kdd.ics.uci.edu/databases, Irvine, CA, USA, July, 2010.

34. R. Duda, P. Hart and P. Stork, “Pattern Classification (Second Edition)”, John Wiley & Sons, USA, 2001.

