DOI: 10.21884/IJMTER.2016.3155.RHOQX
DIFFERENT FEATURE REDUCTION TECHNIQUES FOR INTRUSION DETECTION SYSTEM
Rasha Thamer Shawe (1) and Assist. Prof. Safana H. Abbas (2)
(1,2) College of Education in Computer Science, Al-Mustansiriya University, Baghdad, Iraq.
Abstract— Due to the growing number of computer networks in recent years, there has been increasing interest in intrusion detection systems (IDS), which monitor the activity occurring in a computer network and analyze it to recognize intrusions and protect the network. Most existing IDS use all of the features available in the network packet to analyze and look for instructive patterns, although some of these features are redundant or irrelevant, which makes the process time consuming and degrades performance. In this paper, three different dimensionality reduction algorithms, Principal Component Analysis (PCA), Linear Discriminant Analysis (LDA), and Singular Value Decomposition (SVD), were used, and the experimental results were compared to find the most efficient algorithm, i.e., the one that gives the highest result when applied with a classification algorithm. A Back Propagation Neural Network (BPNN) was used as the classification algorithm to detect the different attack types.
Keywords— IDS, KDDCUP1999, Feature Reduction, Attack Detection.
I. INTRODUCTION
Security is a major issue for all networks in today's enterprise environment. Intruders and hackers have made many successful attempts to bring down high-profile company networks and web services.
An intrusion is the transfer of unwanted, malicious, or hazardous content to the network; the system being monitored can be a web server, a database, or a group of computers. An intervention may be merely unwanted or actively harmful, such as a Trojan horse that infects the computer system by reading, writing, or even deleting files [1].
Systems that attempt to detect malicious behavior targeted against a network and its resources are called Intrusion Detection Systems (IDS). They are network security tools that process local audit data or monitor network traffic to identify evidence of an occurring attack. An IDS can either search for specific known patterns, called signatures, in its input stream ("misuse-based") or detect certain deviations from expected behavior ("anomaly-based") that indicate hostile activities against the protected network.
Intrusion detection systems (IDS) constitute, besides firewalls and cryptography, the third building block of a secure computer system installation and can discover intrusions in all three stages of an intrusion [2].
Intrusion detection is the process of monitoring the events occurring in a computer system or network and analyzing them for signs of intrusion. It detects any attempt to compromise confidentiality, integrity, or availability, or to bypass the security mechanisms of a computer or network. Attackers may access the system from the inside or from an external network such as the Internet; they may be unauthorized users of systems trying to gain additional privileges not granted to them, or authorized users who misuse the privileges granted to them [3,4].
The term "Data Mining" was introduced in the 1990s, but DM is the evolution of a field with a long history [5]. Data mining is a multidisciplinary field drawing work from areas including database systems, statistics, machine learning, information retrieval, pattern recognition, artificial intelligence (AI), knowledge-based systems, and data visualization [6]. DM techniques are increasingly employed in traditional scientific discovery disciplines, such as the biological, medical, chemical, physical, and social sciences, and in a variety of other knowledge industries, such as government and education, with the aim of discovering previously unknown patterns and correlations, as well as predicting trends and behaviors [7].
International Journal of Modern Trends in Engineering and Research (IJMTER)
Volume 03, Issue 12, [December – 2016] ISSN (Online):2349–9745; ISSN (Print):2393-8161
@IJMTER-2016, All rights Reserved
II. INTRUSION DETECTION SYSTEM WITH DATA MINING
A Data Mining (DM)-based intrusion detection (ID) framework can detect novel intrusions accurately and automatically. DM methods automatically find patterns in the dataset and use these patterns to detect novel intrusions. Comparing detection tools that use DM with conventional signature-based methods shows that DM-based detection methods more than double the current detection rates for new malware [8].
Generally, various fields like marketing, finance, and government business depend heavily on data mining, and it has proved successful. The most important data mining techniques relied on in most fields include classification, which maps a data entity into one of several identified groups [9].
A huge number of intrusion detection models have been suggested in both scientific research and the business area since the first intrusion detection system appeared. Although these models differ greatly in the mechanisms they use to collect and process data, many of them depend on a relatively general architectural model [10].
Most commercial and traditional IDSs are weak and cannot make excellent decisions. These systems essentially rely on a misuse detection mechanism. Misuse detection tries to find patterns of program or user behavior that match common threat patterns stored as signatures. The hand-coded signatures are supplied by researchers based on their deep knowledge of intrusion strategies. New attacks, in contrast, can be detected by anomaly detection techniques.
Anomaly detection constructs models of normal system behavior, called profiles, which are used to discover novel patterns that differ significantly from the profiles. These deviations could indicate real attacks or simply be novel behaviors that must be incorporated into the profiles. The most important benefit of anomaly detection is that it can discover attacks that have never been seen before; in practice, a human analyst must sort through the deviations to ascertain which represent real intrusions. The main disadvantage of anomaly detection is its high false positive (FP) rate. Novel attack patterns may then be added to the collection of misuse detection signatures [11].
From this discussion, it becomes obvious that current traditional IDSs face many limitations. This has led to an increased interest in data mining for intrusion detection. In comparison to traditional IDS, IDS based on DM are generally more precise and require far less manual processing and input from human experts [11].
An Intrusion Detection System (IDS) is a computer program that attempts to perform intrusion detection (ID) by misuse detection, anomaly detection, or a combination of the two. An IDS should preferably perform its task in real time [12]. Intrusion detection is therefore needed as another wall to protect computer systems.
The central elements of an intrusion detection engine are: the resources to be protected in a target system, i.e., user accounts, file systems, system kernels, etc.; models that characterize the "normal" or "legitimate" behavior of these resources; and techniques that compare the actual system activities with the established models and identify those that are "abnormal" or "intrusive".
III. PROPOSED SYSTEM
The proposed system consists mainly of two major tasks:
1. Feature Reduction.
2. Attack Detection.
The proposed intrusion detection system is illustrated in Figure (1) and consists of the following stages: input KDD dataset; preprocessing (encoding the KDD dataset); feature extraction and dimensionality reduction (using PCA, LDA, SVD); classification (BPNN); and performance measurement.
"Figure 1. Proposed Intrusion Detection System"
Input Dataset Stage: In the first step, the proposed system takes the KDD Cup 99 dataset as input and passes it to the pre-processing step.
Pre-processing Stage: The KDDCUP1999 dataset contains a number of features in different formats; some are numeric and others are characters. The dataset is therefore converted into a uniform format to be used in the next phase.
Feature Extraction and Dimensionality Reduction Stage: The dimensionality reduction step performs feature extraction by extracting suitable features from the dataset and reducing the KDD dimensions. In this step, different algorithms are used, namely Principal Component Analysis (PCA), Linear Discriminant Analysis (LDA), and Singular Value Decomposition (SVD). This step reduces the dimensionality of the dataset, and the extracted features are given as input to the next step.
Detection Model Stage: A back propagation neural network (BPNN) is used as the classifier in this step. It takes the output dataset of the previous step as input and trains the network. The BPNN model is used as both a linear and a non-linear classifier; for the non-linear classification stage, different kernels such as Gaussian, linear, and polynomial are used as transformation and mapping approaches.
Performance Measurement Stage: Evaluation of the classification outputs using different criteria.
IV. PROCEDURE USED IN PREPROCESSING
The KDD'99 input dataset contains a number of features in different formats; some are numeric and others are characters. These different formats are converted into a uniform format before being passed to the next phase.
Since some features of the KDD CUP1999 dataset are continuous, these features have been normalized to make them more convenient for the data mining, dimensionality reduction, and classification algorithms. Normalization is used in data preprocessing to scale the feature values into a small specified range such as -1.0 to 1.0 or 0.0 to 1.0. Normalizing the input values of each feature measured in the training samples helps to speed up the learning phase.
4.1 Dataset Labeling
The dataset is labeled using the 10% corrected KDD dataset with its whole feature space, where the class label is located in the last feature of each record. A screenshot of the KDD 99 dataset taken from the MATLAB environment is shown in Figure (2).
"Figure 2. Sample data of the 10% corrected KDD cup dataset."
Each dataset record contains 42 features (e.g., protocol type, service, and flag) and is labeled as either normal or as one specific attack type, as shown in Figure (3). If we take a sample from the dataset before scaling (normalization), for example the first row, we notice that feature (42) holds the attack-type label, here "normal", as described in Table (1).
"Figure 3. First row (data sample) of the 10% corrected KDD cup dataset"
"Table 1. Class labels and number of records of the "10% KDD'99" dataset"
Attack Type       Original Records   Records after removing duplicates   Category
Back              2203               994                                 DoS
Land              21                 19                                  DoS
Neptune           107201             51820                               DoS
Pod               264                206                                 DoS
Smurf             280790             641                                 DoS
Teardrop          979                918                                 DoS
Satan             1589               908                                 Probe
Ipsweep           1247               651                                 Probe
Nmap              231                158                                 Probe
Portsweep         1040               416                                 Probe
Normal            97277              87831                               Normal
guess_passwd      53                 53                                  R2L
ftp_write         8                  8                                   R2L
Imap              12                 12                                  R2L
Phf               4                  4                                   R2L
Multihop          7                  7                                   R2L
Warezmaster       20                 20                                  R2L
Warezclient       1020               1020                                R2L
Spy               2                  2                                   R2L
buffer_overflow   30                 30                                  U2R
Loadmodule        9                  9                                   U2R
Perl              3                  3                                   U2R
Rootkit           10                 10                                  U2R
So, the dataset is labeled according to the following attacks, which fall into one of the five categories listed in Table (2): "Table 2. Our class labeling of the "10% KDD'99" dataset"
Attack Type                 Description                                         Sub Types                                       Label
(DoS) Denial of Service     Attacker tries to prevent legitimate users          Smurf, Neptune, Back, Teardrop, Pod, Land       1
                            from using a service
Normal                      Normal data with no attack                          normal                                          2
Probe                       Attacker scans the network to gather                Satan, Ipsweep, Portsweep, Nmap                 3
                            information about the target hosts
(R2L) Remote to Local       Attacker does not have an account on the            Warezclient, guess_passwd, Warezmaster,         4
                            victim machine, hence tries to gain access          Imap, ftp_write, Multihop, Phf, Spy
(U2R) User to Root          Attacker has local access to the victim machine     buffer_overflow, Rootkit, Loadmodule, Perl      5
                            and tries to gain super user privileges
The algorithm used for class labeling is shown in Algorithm (1).
Algorithm 1. KDD 99 Class Labeling
Input: Normalized 10% KDD dataset T = D(F, C)
Output: Class labels
1. Initialize the class-label vector L = [ ]
2. Repeat
3.   For each record in T
4.     Read feature number (42), the attack-type name
5.     If the attack name belongs to a category in Table (2)
6.       Assign the corresponding category label (1-5) to the record
7.   Obtain the newly labeled instances from the old ones
8. Until all records in T are labeled
9. Return the labeled dataset
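The labeling step above can be sketched in Python; the dictionary below is an illustrative mapping built from the categories of Table (2), not the authors' code:

```python
# Hypothetical sketch of Algorithm 1: map the attack name held in
# feature (42) of each record to one of the five class labels of Table (2).
ATTACK_CATEGORY = {
    # DoS -> 1
    "smurf": 1, "neptune": 1, "back": 1, "teardrop": 1, "pod": 1, "land": 1,
    # normal -> 2
    "normal": 2,
    # Probe -> 3
    "satan": 3, "ipsweep": 3, "portsweep": 3, "nmap": 3,
    # R2L -> 4
    "warezclient": 4, "guess_passwd": 4, "warezmaster": 4, "imap": 4,
    "ftp_write": 4, "multihop": 4, "phf": 4, "spy": 4,
    # U2R -> 5
    "buffer_overflow": 5, "rootkit": 5, "loadmodule": 5, "perl": 5,
}

def label_record(record):
    """record: list of 42 field values; the last field holds the attack name."""
    attack = record[41].rstrip(".")   # KDD labels carry a trailing dot
    return ATTACK_CATEGORY[attack]
```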
There are many nominal values, like http, icmp, and SF, in the dataset; therefore we have to transform these nominal values into numeric values in advance. For example, the protocol type "tcp" is mapped to 1, "udp" to 2, and "icmp" to 3. Table (3) is followed to transform the nominal values of the dataset features into numeric values. "Table 3. Transformation Table"
Type            Feature Name    Numeric value
Protocol-type   TCP             1
                UDP             2
                ICMP            3
Flag            SF              1
                S1              2
                REJ             3
                S2              4
                S0              5
                S3              6
                RSTO            7
                RSTR            8
                RSTOS0          9
                OTH             10
                SH              11
Service         All services    1 to 66
Attack          All attacks     1 to 23
After the transformation, the original KDDCUP1999 dataset becomes as shown in Figure (4).
"Figure 4. Pre-processing of the original KDDCUP1999 dataset before and after transformation."
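The Table (3) transformation can be sketched as follows. This is an illustrative Python snippet; the service and attack maps are assumed to be built from the 66 service names and 23 attack names observed in the data:

```python
# Nominal-to-numeric encoding of one KDD record, following Table (3).
PROTOCOL = {"tcp": 1, "udp": 2, "icmp": 3}
FLAG = {"SF": 1, "S1": 2, "REJ": 3, "S2": 4, "S0": 5, "S3": 6,
        "RSTO": 7, "RSTR": 8, "RSTOS0": 9, "OTH": 10, "SH": 11}

def encode(record, service_map, attack_map):
    """Replace the nominal fields of one 42-field KDD record with codes."""
    r = list(record)                 # copy so the original stays intact
    r[1] = PROTOCOL[r[1]]            # feature 2: protocol type (1..3)
    r[2] = service_map[r[2]]         # feature 3: service (1..66)
    r[3] = FLAG[r[3]]                # feature 4: flag (1..11)
    r[41] = attack_map[r[41]]        # feature 42: attack name (1..23)
    return r
```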
4.1.1 Mean range
The second step of pre-processing the KDD'99 dataset is to map each feature into the range [0,1]. We do that by finding the maximum and minimum value of a given feature, then transforming the feature into the range [0,1] by using

    x' = (x - x_min) / (x_max - x_min)                                (1)

4.1.2 Normalization
The third step is to normalize those features depending on the minimum and maximum values calculated in the previous step. It is estimated using the statistical normalization described in Equation (2). The statistical normalization is defined as

    z = (x - mu) / sigma                                              (2)

where mu = (1/n) * sum_i x_i is the mean of the values for a given attribute
before transformation
0,icmp,ecr_i,SF,1032,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,511,511,0.00,0.00,0.00,0.00,1.00,0.00,0.00,255,255,1.
00,0.00,1.00,0.00,0.00,0.00,0.00,0.00,smurf.
0,tcp,http,SF,248,2129,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,23,23,0.00,0.00,0.00,0.00,1.00,0.00,0.00,23,255,1.00,0
.00,0.04,0.03,0.00,0.00,0.00,0.00,normal.
after transformation
0,3,8,1,1032,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,511,511,0.00,0.00,0.00,0.00,1.00,0.00,0.00,255,255,1.00,0.00,
1.00,0.00,0.00,0.00,0.00,0.00,6.
0,1,1,1,248,2129,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,23,23,0.00,0.00,0.00,0.00,1.00,0.00,0.00,23,255,1.00,0.00,0.
04,0.03,0.00,0.00,0.00,0.00,1.
and sigma is its standard deviation, sigma = sqrt( (1/n) * sum_i (x_i - mu)^2 )    (3)
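Equations (1)-(3) can be sketched in Python with NumPy (a minimal illustration, not the authors' MATLAB implementation):

```python
import numpy as np

def min_max_scale(x):
    """Min-max scaling of one feature vector into [0, 1] (Equation 1)."""
    x = np.asarray(x, dtype=float)
    rng = x.max() - x.min()
    if rng == 0:                      # constant feature: map everything to 0
        return np.zeros_like(x)
    return (x - x.min()) / rng

def z_score(x):
    """Statistical (z-score) normalization (Equations 2 and 3)."""
    x = np.asarray(x, dtype=float)
    mu = x.mean()                     # mean of the attribute values
    sigma = x.std()                   # population standard deviation
    return (x - mu) / sigma

scaled = min_max_scale([0, 5, 10])    # -> [0.0, 0.5, 1.0]
```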
4.2 Feature Extraction and Dimensionality Reduction
Feature extraction and dimensionality reduction are defined as follows: given a set of candidate features, select the subset of features that performs best under some classification algorithm. This process not only reduces the cost of recognition by reducing the number of features, but can also provide better classification accuracy due to finite dataset size effects.
4.2.1 Principal Component Analysis (PCA)
PCA is a technique used for feature extraction. The data used in the intrusion detection problem are high dimensional in nature, and it is desirable to reduce the dimensionality of the data for easy exploration and analysis; PCA is often used for this purpose.
The PCA algorithm is shown in Algorithm (2).
Algorithm 2. Principal Component Analysis (PCA)
Input: Data matrix (features of KDD 99), number of principal components K
Output: New dimensions
1. Compute the mean of the transactions: mu = (1/n) * sum_i x_i
2. Subtract the mean from each transaction: phi_i = x_i - mu
3. Compute the covariance matrix C of the mean-centered data
4. Compute the eigenvectors and eigenvalues of C
5. Sort the eigenvectors by decreasing eigenvalue
6. Keep only the K best eigenvectors (K features with their values)
7. Project the mean-centered data onto the K eigenvectors to obtain the new dimensions
8. Return the reduced data
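A minimal NumPy sketch of Algorithm (2), assuming the rows of X are KDD records and the columns are features:

```python
import numpy as np

def pca_reduce(X, k):
    """Reduce an n-by-d data matrix X to k dimensions (sketch of Algorithm 2)."""
    X = np.asarray(X, dtype=float)
    mu = X.mean(axis=0)                  # step 1: mean of the transactions
    A = X - mu                           # step 2: subtract the mean
    C = np.cov(A, rowvar=False)          # step 3: covariance matrix
    vals, vecs = np.linalg.eigh(C)       # step 4: eigen-decomposition (C symmetric)
    order = np.argsort(vals)[::-1]       # step 5: sort by decreasing eigenvalue
    W = vecs[:, order[:k]]               # step 6: keep the K best eigenvectors
    return A @ W                         # step 7: project onto the new axes
```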
4.2.2 Linear Discriminant Analysis (LDA)
Linear discriminant analysis (LDA) is another method used for dimensionality reduction and feature extraction (selection). LDA seeks to reduce dimensionality while preserving as much of the class discriminatory information as possible. The LDA algorithm steps are shown in Algorithm (3).
Algorithm 3. Linear Discriminant Analysis (LDA)
Input: Data matrix with class labels
Output: New dimensions
1. Compute the d-dimensional mean vectors for the different classes from the dataset.
2. Compute the scatter matrices (between-class scatter S_B and within-class scatter S_W).
3. Compute the eigenvectors (e_1, ..., e_d) and corresponding eigenvalues (lambda_1, ..., lambda_d) of S_W^-1 * S_B.
4. Sort the eigenvectors by decreasing eigenvalue.
5. Choose the k eigenvectors with the largest eigenvalues to form a d x k matrix W (where every column represents an eigenvector).
6. Use this eigenvector matrix to transform the samples onto the new subspace: Y = X * W (where W is the d x k projection matrix and Y contains the transformed k-dimensional samples in the new subspace).
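Algorithm (3) can be sketched as follows (a NumPy illustration; `np.linalg.pinv` is used instead of a plain inverse in case the within-class scatter is singular):

```python
import numpy as np

def lda_reduce(X, y, k):
    """Project X onto the k most discriminative axes (sketch of Algorithm 3)."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    d = X.shape[1]
    mu = X.mean(axis=0)
    Sw = np.zeros((d, d))                      # within-class scatter
    Sb = np.zeros((d, d))                      # between-class scatter
    for c in np.unique(y):
        Xc = X[y == c]
        mc = Xc.mean(axis=0)
        Sw += (Xc - mc).T @ (Xc - mc)
        diff = (mc - mu).reshape(-1, 1)
        Sb += len(Xc) * diff @ diff.T
    # eigenvectors of Sw^-1 Sb, sorted by decreasing eigenvalue
    vals, vecs = np.linalg.eig(np.linalg.pinv(Sw) @ Sb)
    order = np.argsort(vals.real)[::-1]
    W = vecs[:, order[:k]].real                # d-by-k projection matrix
    return X @ W
```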
4.2.3 Singular-Value Decomposition (SVD)
SVD is a powerful computational tool commonly used in the solution of matrix rank estimation. The Singular-Value Decomposition (SVD) algorithm steps are described in Algorithm (4).
Algorithm 4. Singular-Value Decomposition (SVD)
Input: Data matrix A
Output: New dimensions
1. Apply SVD to the matrix as A = U S V^T, where
   A is an m x n matrix (m is the number of sessions (vectors), n is the number of attributes),
   U is an m x m matrix of the left eigenvectors,
   S is an m x n diagonal matrix,
   V is an n x n matrix of the right eigenvectors.
2. Construct the covariance matrix from this decomposition: A^T A = V S^2 V^T, where V is an orthogonal matrix.
3. The square roots of the eigenvalues of A^T A are the singular values of A.
4. Keep the K largest singular directions and project the data onto them.
5. Return the reduced dimensions.
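A short NumPy sketch of Algorithm (4):

```python
import numpy as np

def svd_reduce(A, k):
    """Keep the k largest singular directions (sketch of Algorithm 4)."""
    A = np.asarray(A, dtype=float)
    A = A - A.mean(axis=0)               # mean-centre each attribute
    U, s, Vt = np.linalg.svd(A, full_matrices=False)   # A = U S V^T
    # the singular values s are the square roots of the eigenvalues of A^T A
    return A @ Vt[:k].T                  # project onto the top-k right vectors
```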
4.4 Back Propagation (BP)
As mentioned before, the mathematical model of the biological neural network is called an artificial neural network. One of the neural network models used in almost all fields is the back propagation neural network. The back propagation algorithm is used in multi-layered feed-forward ANNs: the artificial neurons are organized in layers and send their signals "forward", and then the errors are propagated backwards. The network receives the input signal through the neurons in the input layer, and the output of the network is given by the neurons in the output layer; there may be one or more intermediate hidden layers. The back propagation algorithm uses supervised learning, which means that the algorithm is provided with examples of the inputs and the outputs the network must compute, and the error, the difference between the actual and expected results, is calculated. The idea of the back propagation algorithm is to reduce this error until the ANN has learned the training data. Training begins with initial (random) weights, and the goal is to adjust (update) them so that the error becomes minimal.
The following pseudocode in Algorithm (5) describes the BP algorithm.
Algorithm 5. Back Propagation Neural Network
Input: Input features or domain
Output: Class type
1. Initialize all weights with small random numbers in [-1,1]
2. Repeat
3.   For every pattern in the training set
4.     Present the pattern to the network
5.     // Propagate the input forward through the network
6.     For each layer in the network
7.       For each neuron in the layer
8.         Calculate the weighted sum of the inputs to the neuron
9.         Add the threshold (bias) to the sum
10.        Calculate the activation function for the neuron
11.      End
12.    End
13.    // Propagate the errors backward through the network
14.    For each layer in the network
15.      For each neuron in the layer
16.        Calculate the neuron's signal error
17.        Update each neuron's weight in the network
18.      End
19.    End
20.    Calculate the global error
21.    Calculate the error function
22.  End
23. Until (maximum number of iterations is reached) or (error < threshold)
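A minimal sketch of Algorithm (5) for a single hidden layer with sigmoid activations; this is an illustrative implementation, not the authors' network configuration:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def train_bpnn(X, y, hidden=4, lr=0.5, epochs=3000, seed=1):
    """Batch back propagation for a 2-layer sigmoid network (Algorithm 5)."""
    rng = np.random.default_rng(seed)
    X = np.asarray(X, float)
    y = np.asarray(y, float).reshape(-1, 1)
    W1 = rng.uniform(-1, 1, (X.shape[1], hidden))   # step 1: weights in [-1, 1]
    b1 = np.zeros(hidden)                           # thresholds (biases)
    W2 = rng.uniform(-1, 1, (hidden, 1))
    b2 = np.zeros(1)
    for _ in range(epochs):
        h = sigmoid(X @ W1 + b1)          # forward pass: hidden layer
        out = sigmoid(h @ W2 + b2)        # forward pass: output layer
        err = out - y                     # difference between actual and expected
        d_out = err * out * (1 - out)     # backward pass: output error signal
        d_h = (d_out @ W2.T) * h * (1 - h)
        W2 -= lr * h.T @ d_out            # update weights and thresholds
        b2 -= lr * d_out.sum(axis=0)
        W1 -= lr * X.T @ d_h
        b1 -= lr * d_h.sum(axis=0)
    return (W1, b1, W2, b2)

def predict(X, params):
    W1, b1, W2, b2 = params
    h = sigmoid(np.asarray(X, float) @ W1 + b1)
    return (sigmoid(h @ W2 + b2) > 0.5).astype(int)
```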
4.5 Evaluation
To assess the validity and accuracy of the intrusion detection and classification system based on feature selection and dimensionality reduction, we need to introduce the measures used to validate the classification results.
A confusion matrix for intrusion detection is defined as an n x n matrix, where n denotes the number of classes. A confusion matrix contains information about the actual and predicted classifications made by a classification system, and the performance of such systems is commonly evaluated using the data in the matrix. Each column of the matrix represents the instances in a predicted class, while each row represents the instances in an actual class. The confusion matrix shows which classes are correctly classified and which are misclassified. The confusion matrix is used to evaluate the parameters shown in Table (4). "Table 4. Confusion Matrix"
                     Predicted Class
                     Yes     No
Actual Class  Yes    TP      FN
              No     FP      TN
The performance of a neural network can be evaluated using various parameters. Standard parameters include classification accuracy, detection rate, and false positive rate; these are calculated from the True Positives (TP), False Negatives (FN), False Positives (FP), and True Negatives (TN).
True Positive (TP): correct classifications of positive cases.
True Negative (TN): correct classifications of negative cases.
False Positive (FP): incorrect classifications of negative cases into the positive class.
False Negative (FN): incorrect classifications of positive cases into the negative class.
Accuracy
The recognition rate (accuracy) is defined as the ratio of the number of correct recognition decisions to the total number of attempts, as given in Equation (5):

    Accuracy = (TP + TN) / (TP + TN + FP + FN)                        (5)

Detection Rate
The detection rate is defined as the ratio of the number of true positives to the total number of positive samples, as described in Equation (6):

    Detection Rate = TP / (TP + FN)                                   (6)

False Alarm
The false alarm rate is defined as the ratio of the number of false positive detections to the total number of false positive and true negative test samples, as described in Equation (7):

    False Alarm = FP / (FP + TN)                                      (7)
These parameters categorize data behavior in intrusion detection for the binary classes (Normal and Attack) in terms of true negatives, true positives, false positives, and false negatives.
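Equations (5)-(7) can be computed directly from the confusion-matrix counts; a small helper using the standard definitions:

```python
def evaluate(tp, tn, fp, fn):
    """Accuracy, detection rate, and false-alarm rate (Equations 5-7)."""
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    detection = tp / (tp + fn)        # true positives over all actual attacks
    false_alarm = fp / (fp + tn)      # false positives over all actual normals
    return accuracy, detection, false_alarm

# e.g. evaluate(tp=90, tn=80, fp=20, fn=10) -> (0.85, 0.9, 0.2)
```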
V. KDD'99 INPUT DATASET
The proposed system takes the KDD Cup 99 dataset as input, where the total number of records we used as a sample is 10000. After taking the input, it is given to the next phase for pre-processing. Each record contains 41 features plus a label marking it as either normal or as exactly one specific attack type falling into one of the four attack categories: denial of service (DoS), user to root (U2R), remote to local (R2L), and probing.
VI. RESULT
This section shows the overall performance results of the neural network (BPNN) on the KDD Cup 99 training and testing datasets using the three different algorithms (PCA, LDA, and SVD) proposed in our intrusion detection classification system based on feature reduction. Three algorithms are used for
reducing the 42 features of the KDD dataset, and one classification algorithm to detect the four types of ID attacks.
Principal Component Analysis (PCA)
PCA is the first algorithm that we used for dimensionality reduction, with the neural network (BPNN) as the attack detector for the intrusion detection system. In this case we tested PCA using three different values of k (k = 21, 11, and 7) out of the original 42-feature space, chosen by trial and error.
Table (5) shows the confusion matrix results after applying the neural network (BPNN) classification algorithm on the KDD Cup 99 training and testing datasets when selecting just (k=21) features. "Table 5. Confusion matrix when using PCA with K=21 and BPNN on the training and testing datasets"
Training dataset: original features = 42, K selected = 21, training time = 0.62184
  TP = 91.7534   TN = 9.0144    FP = 8.4685    FN = 90.9856
  Detection = 0.9103   False alarm = 0.4844   Accuracy = 91.2586
Testing dataset: testing time = 50.4142
  TP = 85.3264   TN = 10.9542   FP = 14.6736   FN = 89.0458
  Detection = 0.8862   False alarm = 0.5726   Accuracy = 87.1861
Table (6) shows different results for (k=11) applied on the training and testing datasets. "Table 6. Confusion matrix when using PCA with K=11 and BPNN on the training and testing datasets"
Training dataset: original features = 42, K selected = 11, training time = 164.703
  TP = 88.7526   TN = 12.3076   FP = 11.2474   FN = 87.6924
  Detection = 0.8782   False alarm = 0.4775   Accuracy = 88.2225
Testing dataset: testing time = 51.31030
  TP = 87.5391   TN = 7.0826    FP = 12.4609   FN = 92.9174
  Detection = 0.9251   False alarm = 0.6376   Accuracy = 90.2282
Table (7) shows different results for (k=7) applied on the training and testing datasets.
"Table 7. Confusion matrix when using PCA with K=7 and BPNN on the training and testing datasets"
Training dataset: original features = 42, K selected = 7, training time = 139.7367
  TP = 92.8880   TN = 8.3342    FP = 7.1120    FN = 91.6658
  Detection = 0.9177   False alarm = 0.4604   Accuracy = 92.2769
Testing dataset: testing time = 50.22079
  TP = 85.6803   TN = 11.5016   FP = 14.3197   FN = 88.4984
  Detection = 0.8816   False alarm = 0.5546   Accuracy = 87.0894
Figure (5) shows the training error curve for k = 21, 11, and 7 using PCA and the (BPNN) classification algorithm.
"Figure 5. Training error curve with respect to the iteration number using PCA with (k=21,11,7) and the neural network (BPNN) classification algorithm (panels: K=21, K=11, K=7)."
Figure (6) shows the testing error curve with respect to the iteration number using PCA with k = 21, 11, and 7 and the (BPNN) classification algorithm.
"Figure 6. Testing error curve depending on (BPNN) and (PCA) using k=21,11,7 (panels: K=21, K=11, K=7)."
Linear Discriminant Analysis (LDA)
LDA was used as the second dimensionality reduction function. This algorithm selected only (k=4) features from the whole 42-feature dataset. Table (8) shows the confusion matrix results after applying the (BPNN) classification on the KDD Cup 99 datasets using the selected k.
"Table 8. Confusion matrix when using LDA with BPNN on the training and testing datasets"
Training dataset: original features = 42, K selected = 4, training time = 155.9450
  TP = 44.1059   TN = 55.8941   FP = 55.8941   FN = 44.1059
  Detection = 0.4411   False alarm = 0.5000   Accuracy = 44.1059
Testing dataset: testing time = 43.74335
  TP = 44.3063   TN = 55.6132   FP = 55.6937   FN = 44.3868
  Detection = 0.4434   False alarm = 0.5004   Accuracy = 44.3466
Figure (7) shows the training error curve for (k=4) using (BPNN) and (LDA), and Figure (8) shows the testing error curve with respect to the iteration number using LDA with k=4 and the (BPNN) classification algorithm.
"Figure 7. Training error curve depending on (BPNN) and (LDA) using K=4."
"Figure 8. Testing error curve depending on (BPNN) and (LDA) using K=4."
Singular-Value Decomposition (SVD)
SVD is the third algorithm used for dimensionality reduction, with the neural network (BPNN) as the attack detector for the intrusion detection system. We used SVD with different numbers of selected features, called the k-feature selection (the reduced number of features we want to keep after dimensionality reduction), applied after mean centering (normalizing) the data matrix for each feature. Since there is no prior clue about the best k selection, we tested SVD using three different values of k (k = 21, 11, and 7) out of the original 42-feature space, chosen by trial and error.
Tables (9) shows the confusion matrix results after applying the Neural Network (BPNN)
classification algorithm on KDD Cup 99 training and testing datasets depending on selection just
(k=21) features. "Table 9.confusion matrix when using SVD with K=21 and BPNN in the training and testing dataset"
Training dataset confusion matrix:
  Original features: 42, selected features (k): 21
  Training time: 90.521816
  TP: 96.4787, TN: 33.1344, FP: 3.5213, FN: 66.8656
  Detection rate: 0.7444, false alarm rate: 0.0961, accuracy: 81.6722
Testing dataset confusion matrix:
  Testing time: 14.646747
  TP: 75.9007, TN: 3.5883, FP: 24.0993, FN: 96.4117
  Detection rate: 0.9549, false alarm rate: 0.8704, accuracy: 86.1562
Table (10) shows the corresponding results for (k=11) on the training and testing datasets.
"Table 10. Confusion matrix when using SVD with k=11 and BPNN on the training and testing datasets"
Training dataset confusion matrix:
  Original features: 42, selected features (k): 11
  Training time: 92.203391
  TP: 93.6704, TN: 7.2423, FP: 6.3296, FN: 92.7577
  Detection rate: 0.9282, false alarm rate: 0.4664, accuracy: 93.2141
Testing dataset confusion matrix:
  Testing time: 8.060197
  TP: 57.5514, TN: 5.9335, FP: 42.4486, FN: 94.0665
  Detection rate: 0.9065, false alarm rate: 0.8774, accuracy: 75.8090
Table (11) shows the corresponding results for (k=7) on the training and testing datasets.
"Table 11. Confusion matrix when using SVD with k=7 and BPNN on the training and testing datasets"
Training dataset confusion matrix:
  Original features: 42, selected features (k): 7
  Training time: 45.154621
  TP: 89.5418, TN: 11.0414, FP: 10.4582, FN: 88.9586
  Detection rate: 0.8902, false alarm rate: 0.4864, accuracy: 89.2502
Testing dataset confusion matrix:
  Testing time: 10.289327
  TP: 43.6686, TN: 56.2010, FP: 56.3314, FN: 43.7990
  Detection rate: 0.4373, false alarm rate: 0.5006, accuracy: 43.7338
Figure (9) shows the training error curves and Figure (10) shows the testing error curves, both with respect to the iteration number, when using SVD with k = 21, 11, and 7 and the BPNN classification algorithm.
"Figure 9. Training error curve with respect to the iteration number by using SVD with (k=21, 11, 7) and the Neural Network (BPNN) classification algorithm."
"Figure 10. Testing error curve depending on (BPNN) and (SVD) using k=21, 11, 7."
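The paper gives no implementation details for these error curves, but the general shape of such an experiment can be sketched as a one-hidden-layer backpropagation network in numpy that records the mean squared training error at every iteration. The synthetic data and layer sizes below are stand-ins, not the paper's configuration:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy stand-in data: 7 reduced features, binary label (attack vs. normal).
X = rng.random((200, 7))
y = (X.sum(axis=1) > 3.5).astype(float).reshape(-1, 1)

# One hidden layer with sigmoid activations, trained by batch gradient descent.
W1 = rng.normal(0, 0.5, (7, 10)); b1 = np.zeros(10)
W2 = rng.normal(0, 0.5, (10, 1)); b2 = np.zeros(1)
sig = lambda z: 1.0 / (1.0 + np.exp(-z))

errors = []
lr = 0.5
for _ in range(500):
    h = sig(X @ W1 + b1)                       # forward pass
    out = sig(h @ W2 + b2)
    err = out - y
    errors.append(float(np.mean(err ** 2)))    # one point on the training error curve
    d2 = err * out * (1 - out)                 # backward pass (delta rule)
    d1 = (d2 @ W2.T) * h * (1 - h)
    W2 -= lr * h.T @ d2 / len(X); b2 -= lr * d2.mean(axis=0)
    W1 -= lr * X.T @ d1 / len(X); b1 -= lr * d1.mean(axis=0)

print(errors[0], errors[-1])   # the error should fall over the iterations
```

Plotting `errors` against the iteration index yields a training error curve of the kind shown in Figures 7 and 9.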
Tables (12) and (13) show the overall performance results of the Neural Network (BPNN) on KDD Cup 99 for the training and testing datasets, using the three dimensionality reduction algorithms (PCA, LDA and SVD) proposed in our system.
"Table 12. Accuracy for using BPNN with three different dimensionality reduction algorithms on the training dataset"
Dataset features   Algorithm   Feature no.   Accuracy
42                 PCA         7             92.2769
42                 LDA         4             44.1059
42                 SVD         7             89.2502
Figure (11) illustrates the performance results on the training dataset for the three dimensionality reduction algorithms used with the Neural Network (BPNN) classification algorithm for attack detection.
"Figure 11. Accuracy of using BPNN classification with the three dimensionality reduction algorithms for attack detection"
"Table 13. Accuracy for using BPNN with three different dimensionality reduction algorithms on the testing dataset"
Dataset features   Algorithm   Feature no.   Accuracy
42                 PCA         7             87.0894
42                 LDA         4             44.3466
42                 SVD         7             43.7338
Figure (12) illustrates the performance results on the testing dataset for the three feature extraction and dimensionality reduction algorithms used with the Neural Network (BPNN) classification algorithm for attack detection.
"Figure 12. Accuracy of using BPNN classification with the three dimensionality reduction algorithms for attack detection"
VII. CONCLUSIONS
The aim of this paper is to propose an efficient dimensionality reduction algorithm implemented in an ID system. Three dimensionality reduction algorithms were used together with one classification algorithm, the Back Propagation Neural Network (BPNN), to detect the four types of attack in addition to normal traffic.
Several experiments have been performed to measure the performance of the proposed IDS. From the experimental results, the following conclusions can be derived for each dimensionality reduction algorithm combined with the Back Propagation Neural Network (BPNN):
A. Using Principal Component Analysis (PCA) with BPNN
1. According to tables (5), (6) and (7) (PCA with BPNN), it can be noticed that the best accuracy was obtained when k = 11.
2. From table (5), when k = 21 the accuracy is 87.1861; from table (6), when k = 11 the accuracy is 90.2282; and from table (7), when k = 7 the accuracy is 87.894.
B. Using Linear Discriminant Analysis (LDA) with BPNN
From table (8), when k = 4 the accuracy is 44.3466.
C. Using Singular Value Decomposition (SVD) with BPNN
1. According to tables (9), (10) and (11) (SVD with BPNN), it can be noticed that the best accuracy was obtained when k = 21.
2. From table (9), when k = 21 the accuracy is 86.1562; from table (10), when k = 11 the accuracy is 75.8090; and from table (11), when k = 7 the accuracy is 43.7338.
So it can be noticed from the above results that PCA achieved the highest performance when used with BPNN.
REFERENCES
[1] R. Bace and P. Mell, "NIST Special Publication on Intrusion Detection Systems", Infidel, Inc., Scotts Valley, CA, 2004.
[2] C. Kruegel, F. Valeur and G. Vigna, "Intrusion Detection and Correlation: Challenges and Solutions", Springer Science+Business Media, Inc., 2005.
[3] E. A. Fisch and G. B. White, "Secure Computers and Networks: Analysis, Design, and Implementation", CRC Press, 2000.
[4] B. Laing and J. Alderson, "How to Guide – Implementing a Network Based Intrusion Detection System", Sovereign House, 2000.
[5] S. A. Hassan, "A Technique for Mining Association Rules in Multidimensional Databases", M.Sc. Thesis, Department of Computer Science, University of Technology, 2008.
[6] J. Han and M. Kamber, "Data Mining: Concepts and Techniques", Morgan Kaufmann Publishers, 2006.
[7] E. G. Giannopoulou, "Data Mining in Medical and Biological Research", In-Tech, Vienna, Austria, 2008.
[8] M. G. Schultz, E. Eskin, E. Zadok and S. J. Stolfo, "Data Mining Methods for Detection of New Malicious Executables", Proceedings of the 2001 IEEE Symposium on Security and Privacy (S&P 2001), 2001.
[9] M. Gudadhe, P. Prasad and K. Wankhade, "A New Data Mining Based Network Intrusion Detection Model", Int'l Conf. on Computer Telecommunication Technology (ICCCT'10), IEEE, 2010.
[10] A. Lazarevic, V. Kumar and J. Srivastava, "Intrusion Detection: A Survey", Computer Science Department, University of Minnesota, 2005.
[11] J. Han and M. Kamber, "Data Mining: Concepts and Techniques", Second Edition, University of Illinois at Urbana-Champaign, Elsevier Inc., 2006.
[12] J. S. Balasubramaniyan, J. O. Garcia-Fernandez, D. Isacoff, E. Spafford and D. Zamboni, "An Architecture for Intrusion Detection using Autonomous Agents", Center for Education and Research in Information Assurance and Security, Purdue University, Technical Report, 1998.