DOI: 10.21884/IJMTER.2016.3155.RHOQX
DIFFERENT FEATURE REDUCTION TECHNIQUES FOR INTRUSION DETECTION SYSTEM
Rasha Thamer Shawe (1) and Assist. Prof. Safana H. Abbas (2)
(1,2) College of Education in Computer Science, Al-Mustansiriya University, Baghdad, Iraq.
Abstract— Due to the growing number of computer networks in recent years, there has been increasing interest in intrusion detection systems (IDS), which monitor the activity occurring in a computer network and analyze it to recognize intrusions and protect the network. Most existing IDS use all of the features available in the network packet to analyze and look for instructive patterns, although some of these features are redundant or irrelevant, which makes the process time consuming and degrades performance. In this paper, three different dimensionality reduction algorithms, Principal Component Analysis (PCA), Linear Discriminant Analysis (LDA), and Singular Value Decomposition (SVD), were used, and the experimental results were compared to find the most efficient algorithm, i.e., the one that gives the highest result when applied with a classification algorithm. A Back Propagation Neural Network (BPNN) was used as the classification algorithm to detect the different attack types.
Keywords— IDS, KDDCUP1999, Feature Reduction, Attack Detection.
I. INTRODUCTION
Security is a major issue for all networks in today's enterprise environment. Intruders and hackers have made many successful attempts to bring down high-profile company networks and web services.
An intrusion is the transfer of unwanted, malicious, or hazardous content to the network; the system being monitored can be a web server, a database, or a group of computers. An intervention may be merely unwanted or actively harmful, such as a Trojan horse that infects the computer system by reading, writing, or even deleting files [1].
Systems that attempt to detect malicious behavior targeted against a network and its resources are called Intrusion Detection Systems (IDS). They are network security tools that process local audit data or monitor network traffic to identify evidence of an occurring attack. An IDS can either search for specific known patterns, called signatures, in its input stream ("misuse-based") or detect certain deviations from expected behavior ("anomaly-based") that indicate hostile activities against the protected network.
Intrusion detection systems (IDS) constitute, besides firewalls and cryptography, the third building block of a secure computer system installation and can discover intrusions in all three stages of an intrusion [2].
Intrusion detection is the process of monitoring the events occurring in a computer system or network and analyzing them for signs of intrusion. It detects any attempt to compromise confidentiality, integrity, or availability, or to bypass the security mechanisms of a computer or network. Attackers may access the system from the inside or from an external network such as the Internet; they may be unauthorized users of systems trying to gain additional privileges not granted to them, or authorized users who misuse the privileges granted to them [3,4].
The term "Data Mining" was introduced in the 1990s, but DM is the evolution of a field with a long history [5]. Data mining is a multidisciplinary field drawing work from areas including database systems, statistics, machine learning, information retrieval, pattern recognition, artificial intelligence (AI), knowledge-based systems, and data visualization [6]. DM techniques are increasingly employed in traditional scientific discovery disciplines, such as the biological, medical, chemical, physical, and social sciences, and in a variety of other knowledge industries, such as government and education, with the aim of discovering previously unknown patterns and correlations, as well as predicting trends and behaviors [7].
International Journal of Modern Trends in Engineering and Research (IJMTER)
Volume 03, Issue 12, [December – 2016] ISSN (Online):2349–9745; ISSN (Print):2393-8161
@IJMTER-2016, All rights Reserved
II. INTRUSION DETECTION SYSTEM WITH DATA MINING
A Data Mining (DM)-based intrusion detection (ID) framework can detect novel intrusions accurately and automatically. DM methods automatically find patterns in the dataset and use these patterns to detect novel intrusions. Comparing detection tools that use DM with conventional signature-based methods shows that DM-based detection methods more than double the current detection rates for new malware [8].
Generally, various fields like marketing, finance, and government business depend heavily on data mining, and it has proved successful. The most important data mining techniques relied on in most fields include classification, which maps a data entity into one of several identified groups [9].
A huge number of intrusion detection models have been suggested in both scientific research and the business area since the first intrusion detection system appeared. Although these models differ greatly in the mechanisms they use to collect and process data, many of them depend on a relatively general architectural model [10].
Most commercial and traditional IDSs are weak and cannot make excellent decisions. These systems essentially rely on a misuse detection mechanism. Misuse detection tries to find patterns of program or user behavior that match common threat patterns stored as signatures. The hand-coded signatures are supplied by researchers based on their deep knowledge of intrusion strategies. New attacks, in contrast, can be detected by anomaly detection techniques.
Anomaly detection constructs models of normal system behavior, called profiles, which are used to discover novel patterns that differ significantly from the profiles. These deviations could indicate real attacks or simply be novel behaviors that must be incorporated into the profiles. The most important benefit of anomaly detection is that it can discover attacks that have never been seen before; in practice, a human analyst must sort through the deviations to ascertain which represent real intrusions. The main disadvantage of anomaly detection is its high false positive (FP) rate. Novel attack patterns may then be added to the collection of misuse detection signatures [11].
From this discussion, it becomes obvious that current traditional IDSs face many limitations. This has led to an increased interest in data mining for intrusion detection. In comparison to traditional IDS, IDS based on DM are generally more precise and require far less manual processing and input from human experts [11].
An Intrusion Detection System (IDS) is a computer program that attempts to perform intrusion detection (ID) by misuse detection, anomaly detection, or a combination of the two. An IDS should preferably perform its task in real time [12]. Intrusion detection is therefore needed as another wall to protect computer systems.
The central elements of an intrusion detection engine are: the resources to be protected in a target system, i.e., user accounts, file systems, system kernels, etc.; models that characterize the "normal" or "legitimate" behavior of these resources; and techniques that compare the actual system activities with the established models and identify those that are "abnormal" or "intrusive".
III. PROPOSED SYSTEM
The proposed system consists mainly of two major tasks:
1. Feature Reduction.
2. Attack Detection.
The proposed intrusion detection system is illustrated in Figure (1) and consists of the following stages: input KDD dataset; preprocessing (encoding the KDD dataset); feature extraction and dimensionality reduction (using PCA, LDA, SVD); classification (BPNN); and performance measurement.
"Figure 1. Proposed Intrusion Detection System"
Input Dataset Stage: In the first step, the proposed system takes the KDD Cup 99 dataset as input and passes it to the pre-processing step.
Pre-processing Stage: The KDDCUP1999 dataset contains a number of features in different formats; some are numeric and others are characters. The dataset is therefore converted into a uniform format to be used in the next phase.
Feature Extraction and Dimensionality Reduction Stage: The dimensionality reduction step performs feature extraction by extracting suitable features from the dataset and reducing the KDD dimensions. In this step, different algorithms are used, namely Principal Component Analysis (PCA), Linear Discriminant Analysis (LDA), and Singular Value Decomposition (SVD). This step reduces the dimensionality of the dataset, and the extracted features are given as input to the next step.
Detection Model Stage: A back propagation neural network (BPNN) is used as the classifier in this step. It takes the output dataset of the previous step as input and trains the network. The BPNN model is used as both a linear and a non-linear classifier; for the non-linear classification stage, different kernels such as Gaussian, linear, and polynomial are used as transformation and mapping approaches.
Performance Measurement Stage: Evaluation of the classification outputs using different criteria.
IV. PROCEDURE USED IN PREPROCESSING
The KDD'99 input dataset contains a number of features in different formats; some are numeric and others are characters. These different formats are converted into a uniform format before being passed to the next phase.
Since some features of the KDD CUP1999 dataset are continuous, these features have been normalized to make them more convenient for the data mining, dimensionality reduction, and classification algorithms. Normalization is used in data preprocessing to scale the feature values into a small specified range such as -1.0 to 1.0 or 0.0 to 1.0. Normalizing the input values of each feature measured in the training samples helps to speed up the learning phase.
4.1 Dataset Labeling
The dataset is labeled using the 10% corrected KDD dataset with its whole feature space, where the class label is located in the last feature of each record. A screenshot of the KDD 99 dataset taken from the MATLAB environment is shown in Figure (2).
"Figure 2. Sample data of the 10% corrected KDD cup dataset."
Each dataset record contains 42 features (e.g., protocol type, service, and flag) and is labeled as either normal or as one specific attack type, as shown in Figure (3). If we take a sample from the dataset before scaling (normalization), for example the first row, we notice that feature (42) holds the attack-type label, here "normal", as described in Table (1).
"Figure 3. First row (data sample) of the 10% corrected KDD cup dataset"
"Table 1. Class labels and number of records of the "10% KDD'99" dataset"
Attack Type       Original Records   Records after removing duplicates   Category
Back              2203               994                                 DoS
Land              21                 19                                  DoS
Neptune           107201             51820                               DoS
Pod               264                206                                 DoS
Smurf             280790             641                                 DoS
Teardrop          979                918                                 DoS
Satan             1589               908                                 Probe
Ipsweep           1247               651                                 Probe
Nmap              231                158                                 Probe
Portsweep         1040               416                                 Probe
Normal            97277              87831                               Normal
guess_passwd      53                 53                                  R2L
ftp_write         8                  8                                   R2L
Imap              12                 12                                  R2L
Phf               4                  4                                   R2L
Multihop          7                  7                                   R2L
Warezmaster       20                 20                                  R2L
Warezclient       1020               1020                                R2L
Spy               2                  2                                   R2L
buffer_overflow   30                 30                                  U2R
Loadmodule        9                  9                                   U2R
Perl              3                  3                                   U2R
Rootkit           10                 10                                  U2R
So, the dataset is labeled according to the following attacks, which fall into one of the five categories listed in Table (2): "Table 2. Our class labeling of the "10% KDD'99" dataset"
Attack Type                 Description                                         Sub Types                                       Label
(DoS) Denial of Service     Attacker tries to prevent legitimate users          Smurf, Neptune, Back, Teardrop, Pod, Land       1
                            from using a service
Normal                      Normal data with no attack                          normal                                          2
Probe                       Attacker scans the network to gather                Satan, Ipsweep, Portsweep, Nmap                 3
                            information about the target hosts
(R2L) Remote to Local       Attacker does not have an account on the            Warezclient, guess_passwd, Warezmaster,         4
                            victim machine, hence tries to gain access          Imap, ftp_write, Multihop, Phf, Spy
(U2R) User to Root          Attacker has local access to the victim machine     buffer_overflow, Rootkit, Loadmodule, Perl      5
                            and tries to gain super user privileges
The algorithm used for class labeling is shown in Algorithm (1).
Algorithm 1. KDD 99 Class Labeling
Input: Normalized 10% KDD dataset T = D(F, C)
Output: Class labels
1. Initialize the class-label vector L = [ ]
2. Repeat
3.   For each record in T
4.     Read feature number (42), the attack-type name
5.     If the attack name belongs to a category in Table (2)
6.       Assign the corresponding category label (1-5) to the record
7.   Obtain the newly labeled instances from the old ones
8. Until all records in T are labeled
9. Return the labeled dataset
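The labeling step above can be sketched in Python; the dictionary below is an illustrative mapping built from the categories of Table (2), not the authors' code:

```python
# Hypothetical sketch of Algorithm 1: map the attack name held in
# feature (42) of each record to one of the five class labels of Table (2).
ATTACK_CATEGORY = {
    # DoS -> 1
    "smurf": 1, "neptune": 1, "back": 1, "teardrop": 1, "pod": 1, "land": 1,
    # normal -> 2
    "normal": 2,
    # Probe -> 3
    "satan": 3, "ipsweep": 3, "portsweep": 3, "nmap": 3,
    # R2L -> 4
    "warezclient": 4, "guess_passwd": 4, "warezmaster": 4, "imap": 4,
    "ftp_write": 4, "multihop": 4, "phf": 4, "spy": 4,
    # U2R -> 5
    "buffer_overflow": 5, "rootkit": 5, "loadmodule": 5, "perl": 5,
}

def label_record(record):
    """record: list of 42 field values; the last field holds the attack name."""
    attack = record[41].rstrip(".")   # KDD labels carry a trailing dot
    return ATTACK_CATEGORY[attack]
```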
There are many nominal values, like http, icmp, and SF, in the dataset; therefore we have to transform these nominal values into numeric values in advance. For example, the protocol type "tcp" is mapped to 1, "udp" to 2, and "icmp" to 3. Table (3) is followed to transform the nominal values of the dataset features into numeric values. "Table 3. Transformation Table"
Type            Feature Name    Numeric value
Protocol-type   TCP             1
                UDP             2
                ICMP            3
Flag            SF              1
                S1              2
                REJ             3
                S2              4
                S0              5
                S3              6
                RSTO            7
                RSTR            8
                RSTOS0          9
                OTH             10
                SH              11
Service         All services    1 to 66
Attack          All attacks     1 to 23
After the transformation, the original KDDCUP1999 dataset becomes as shown in Figure (4).
"Figure 4. Pre-processing of the original KDDCUP1999 dataset before and after transformation."
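The Table (3) transformation can be sketched as follows. This is an illustrative Python snippet; the service and attack maps are assumed to be built from the 66 service names and 23 attack names observed in the data:

```python
# Nominal-to-numeric encoding of one KDD record, following Table (3).
PROTOCOL = {"tcp": 1, "udp": 2, "icmp": 3}
FLAG = {"SF": 1, "S1": 2, "REJ": 3, "S2": 4, "S0": 5, "S3": 6,
        "RSTO": 7, "RSTR": 8, "RSTOS0": 9, "OTH": 10, "SH": 11}

def encode(record, service_map, attack_map):
    """Replace the nominal fields of one 42-field KDD record with codes."""
    r = list(record)                 # copy so the original stays intact
    r[1] = PROTOCOL[r[1]]            # feature 2: protocol type (1..3)
    r[2] = service_map[r[2]]         # feature 3: service (1..66)
    r[3] = FLAG[r[3]]                # feature 4: flag (1..11)
    r[41] = attack_map[r[41]]        # feature 42: attack name (1..23)
    return r
```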
4.1.1 Mean range
The second step of pre-processing the KDD'99 dataset is to map each feature into the range [0,1]. We do that by finding the maximum and minimum value of a given feature, then transforming the feature into the range [0,1] by using

    x' = (x - x_min) / (x_max - x_min)                                (1)

4.1.2 Normalization
The third step is to normalize those features depending on the minimum and maximum values calculated in the previous step. It is estimated using the statistical normalization described in Equation (2). The statistical normalization is defined as

    z = (x - mu) / sigma                                              (2)

where mu = (1/n) * sum_i x_i is the mean of the values for a given attribute
before transformation
0,icmp,ecr_i,SF,1032,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,511,511,0.00,0.00,0.00,0.00,1.00,0.00,0.00,255,255,1.
00,0.00,1.00,0.00,0.00,0.00,0.00,0.00,smurf.
0,tcp,http,SF,248,2129,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,23,23,0.00,0.00,0.00,0.00,1.00,0.00,0.00,23,255,1.00,0
.00,0.04,0.03,0.00,0.00,0.00,0.00,normal.
after transformation
0,3,8,1,1032,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,511,511,0.00,0.00,0.00,0.00,1.00,0.00,0.00,255,255,1.00,0.00,
1.00,0.00,0.00,0.00,0.00,0.00,6.
0,1,1,1,248,2129,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,23,23,0.00,0.00,0.00,0.00,1.00,0.00,0.00,23,255,1.00,0.00,0.
04,0.03,0.00,0.00,0.00,0.00,1.
and sigma is its standard deviation, sigma = sqrt( (1/n) * sum_i (x_i - mu)^2 )    (3)
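Equations (1)-(3) can be sketched in Python with NumPy (a minimal illustration, not the authors' MATLAB implementation):

```python
import numpy as np

def min_max_scale(x):
    """Min-max scaling of one feature vector into [0, 1] (Equation 1)."""
    x = np.asarray(x, dtype=float)
    rng = x.max() - x.min()
    if rng == 0:                      # constant feature: map everything to 0
        return np.zeros_like(x)
    return (x - x.min()) / rng

def z_score(x):
    """Statistical (z-score) normalization (Equations 2 and 3)."""
    x = np.asarray(x, dtype=float)
    mu = x.mean()                     # mean of the attribute values
    sigma = x.std()                   # population standard deviation
    return (x - mu) / sigma

scaled = min_max_scale([0, 5, 10])    # -> [0.0, 0.5, 1.0]
```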
4.2 Feature Extraction and Dimensionality Reduction
Feature extraction and dimensionality reduction are defined as follows: given a set of candidate features, select the subset of features that performs best under some classification algorithm. This process not only reduces the cost of recognition by reducing the number of features, but can also provide better classification accuracy due to finite dataset size effects.
4.2.1 Principal Component Analysis (PCA)
PCA is a technique used for feature extraction. The data used in the intrusion detection problem are high dimensional in nature, and it is desirable to reduce the dimensionality of the data for easy exploration and analysis; PCA is often used for this purpose.
The PCA algorithm is shown in Algorithm (2).
Algorithm 2. Principal Component Analysis (PCA)
Input: Data matrix (features of KDD 99), number of principal components K
Output: New dimensions
1. Compute the mean of the transactions: mu = (1/n) * sum_i x_i
2. Subtract the mean from each transaction: phi_i = x_i - mu
3. Compute the covariance matrix C of the mean-centered data
4. Compute the eigenvectors and eigenvalues of C
5. Sort the eigenvectors by decreasing eigenvalue
6. Keep only the K best eigenvectors (K features with their values)
7. Project the mean-centered data onto the K eigenvectors to obtain the new dimensions
8. Return the reduced data
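A minimal NumPy sketch of Algorithm (2), assuming the rows of X are KDD records and the columns are features:

```python
import numpy as np

def pca_reduce(X, k):
    """Reduce an n-by-d data matrix X to k dimensions (sketch of Algorithm 2)."""
    X = np.asarray(X, dtype=float)
    mu = X.mean(axis=0)                  # step 1: mean of the transactions
    A = X - mu                           # step 2: subtract the mean
    C = np.cov(A, rowvar=False)          # step 3: covariance matrix
    vals, vecs = np.linalg.eigh(C)       # step 4: eigen-decomposition (C symmetric)
    order = np.argsort(vals)[::-1]       # step 5: sort by decreasing eigenvalue
    W = vecs[:, order[:k]]               # step 6: keep the K best eigenvectors
    return A @ W                         # step 7: project onto the new axes
```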
4.2.2 Linear Discriminant Analysis (LDA)
Linear discriminant analysis (LDA) is another method used for dimensionality reduction and feature extraction (selection). LDA seeks to reduce dimensionality while preserving as much of the class discriminatory information as possible. The LDA algorithm steps are shown in Algorithm (3).
Algorithm 3. Linear Discriminant Analysis (LDA)
Input: Data matrix with class labels
Output: New dimensions
1. Compute the d-dimensional mean vectors for the different classes from the dataset.
2. Compute the scatter matrices (between-class scatter S_B and within-class scatter S_W).
3. Compute the eigenvectors (e_1, ..., e_d) and corresponding eigenvalues (lambda_1, ..., lambda_d) of S_W^-1 * S_B.
4. Sort the eigenvectors by decreasing eigenvalue.
5. Choose the k eigenvectors with the largest eigenvalues to form a d x k matrix W (where every column represents an eigenvector).
6. Use this eigenvector matrix to transform the samples onto the new subspace: Y = X * W (where W is the d x k projection matrix and Y contains the transformed k-dimensional samples in the new subspace).
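Algorithm (3) can be sketched as follows (a NumPy illustration; `np.linalg.pinv` is used instead of a plain inverse in case the within-class scatter is singular):

```python
import numpy as np

def lda_reduce(X, y, k):
    """Project X onto the k most discriminative axes (sketch of Algorithm 3)."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    d = X.shape[1]
    mu = X.mean(axis=0)
    Sw = np.zeros((d, d))                      # within-class scatter
    Sb = np.zeros((d, d))                      # between-class scatter
    for c in np.unique(y):
        Xc = X[y == c]
        mc = Xc.mean(axis=0)
        Sw += (Xc - mc).T @ (Xc - mc)
        diff = (mc - mu).reshape(-1, 1)
        Sb += len(Xc) * diff @ diff.T
    # eigenvectors of Sw^-1 Sb, sorted by decreasing eigenvalue
    vals, vecs = np.linalg.eig(np.linalg.pinv(Sw) @ Sb)
    order = np.argsort(vals.real)[::-1]
    W = vecs[:, order[:k]].real                # d-by-k projection matrix
    return X @ W
```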
4.2.3 Singular-Value Decomposition (SVD)
SVD is a powerful computational tool commonly used in the solution of matrix rank estimation. The Singular-Value Decomposition (SVD) algorithm steps are described in Algorithm (4).
Algorithm 4. Singular-Value Decomposition (SVD)
Input: Data matrix A
Output: New dimensions
1. Apply SVD to the matrix as A = U S V^T, where
   A is an m x n matrix (m is the number of sessions (vectors), n is the number of attributes),
   U is an m x m matrix of the left eigenvectors,
   S is an m x n diagonal matrix,
   V is an n x n matrix of the right eigenvectors.
2. Construct the covariance matrix from this decomposition: A^T A = V S^2 V^T, where V is an orthogonal matrix.
3. The square roots of the eigenvalues of A^T A are the singular values of A.
4. Keep the K largest singular directions and project the data onto them.
5. Return the reduced dimensions.
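A short NumPy sketch of Algorithm (4):

```python
import numpy as np

def svd_reduce(A, k):
    """Keep the k largest singular directions (sketch of Algorithm 4)."""
    A = np.asarray(A, dtype=float)
    A = A - A.mean(axis=0)               # mean-centre each attribute
    U, s, Vt = np.linalg.svd(A, full_matrices=False)   # A = U S V^T
    # the singular values s are the square roots of the eigenvalues of A^T A
    return A @ Vt[:k].T                  # project onto the top-k right vectors
```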
4.4 Back Propagation (BP)
As mentioned before, the mathematical model of the biological neural network is called an artificial neural network. One of the neural network models used in almost all fields is the back propagation neural network. The back propagation algorithm is used in multi-layered feed-forward ANNs: the artificial neurons are organized in layers and send their signals "forward", and then the errors are propagated backwards. The network receives the input signal through the neurons in the input layer, and the output of the network is given by the neurons in the output layer; there may be one or more intermediate hidden layers. The back propagation algorithm uses supervised learning, which means that the algorithm is provided with examples of the inputs and the outputs the network must compute, and the error, the difference between the actual and expected results, is calculated. The idea of the back propagation algorithm is to reduce this error until the ANN has learned the training data. Training begins with initial (random) weights, and the goal is to adjust (update) them so that the error becomes minimal.
The following pseudocode in Algorithm (5) describes the BP algorithm.
Algorithm 5. Back Propagation Neural Network
Input: Input features or domain
Output: Class type
1. Initialize all weights with small random numbers in [-1,1]
2. Repeat
3.   For every pattern in the training set
4.     Present the pattern to the network
5.     // Propagate the input forward through the network
6.     For each layer in the network
7.       For each neuron in the layer
8.         Calculate the weighted sum of the inputs to the neuron
9.         Add the threshold (bias) to the sum
10.        Calculate the activation function for the neuron
11.      End
12.    End
13.    // Propagate the errors backward through the network
14.    For each layer in the network
15.      For each neuron in the layer
16.        Calculate the neuron's signal error
17.        Update each neuron's weight in the network
18.      End
19.    End
20.    Calculate the global error
21.    Calculate the error function
22.  End
23. Until (maximum number of iterations is reached) or (error < threshold)
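A minimal sketch of Algorithm (5) for a single hidden layer with sigmoid activations; this is an illustrative implementation, not the authors' network configuration:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def train_bpnn(X, y, hidden=4, lr=0.5, epochs=3000, seed=1):
    """Batch back propagation for a 2-layer sigmoid network (Algorithm 5)."""
    rng = np.random.default_rng(seed)
    X = np.asarray(X, float)
    y = np.asarray(y, float).reshape(-1, 1)
    W1 = rng.uniform(-1, 1, (X.shape[1], hidden))   # step 1: weights in [-1, 1]
    b1 = np.zeros(hidden)                           # thresholds (biases)
    W2 = rng.uniform(-1, 1, (hidden, 1))
    b2 = np.zeros(1)
    for _ in range(epochs):
        h = sigmoid(X @ W1 + b1)          # forward pass: hidden layer
        out = sigmoid(h @ W2 + b2)        # forward pass: output layer
        err = out - y                     # difference between actual and expected
        d_out = err * out * (1 - out)     # backward pass: output error signal
        d_h = (d_out @ W2.T) * h * (1 - h)
        W2 -= lr * h.T @ d_out            # update weights and thresholds
        b2 -= lr * d_out.sum(axis=0)
        W1 -= lr * X.T @ d_h
        b1 -= lr * d_h.sum(axis=0)
    return (W1, b1, W2, b2)

def predict(X, params):
    W1, b1, W2, b2 = params
    h = sigmoid(np.asarray(X, float) @ W1 + b1)
    return (sigmoid(h @ W2 + b2) > 0.5).astype(int)
```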
4.5 Evaluation
To assess the validity and accuracy of the intrusion detection and classification system based on feature selection and dimensionality reduction, we need to introduce the measures used to validate the classification results.
A confusion matrix for intrusion detection is defined as an n x n matrix, where n denotes the number of classes. A confusion matrix contains information about the actual and predicted classifications made by a classification system, and the performance of such systems is commonly evaluated using the data in the matrix. Each column of the matrix represents the instances in a predicted class, while each row represents the instances in an actual class. The confusion matrix shows which classes are correctly classified and which are misclassified. The confusion matrix is used to evaluate the parameters shown in Table (4). "Table 4. Confusion Matrix"
                     Predicted Class
                     Yes     No
Actual Class  Yes    TP      FN
              No     FP      TN
The performance of a neural network can be evaluated using various parameters. Standard parameters include classification accuracy, detection rate, and false positive rate; these are calculated from the True Positives (TP), False Negatives (FN), False Positives (FP), and True Negatives (TN).
True Positive (TP): correct classifications of positive cases.
True Negative (TN): correct classifications of negative cases.
False Positive (FP): incorrect classifications of negative cases into the positive class.
False Negative (FN): incorrect classifications of positive cases into the negative class.
Accuracy
The recognition rate (accuracy) is defined as the ratio of the number of correct recognition decisions to the total number of attempts, as given in Equation (5):

    Accuracy = (TP + TN) / (TP + TN + FP + FN)                        (5)

Detection Rate
The detection rate is defined as the ratio of the number of true positives to the total number of positive samples, as described in Equation (6):

    Detection Rate = TP / (TP + FN)                                   (6)

False Alarm
The false alarm rate is defined as the ratio of the number of false positive detections to the total number of false positive and true negative test samples, as described in Equation (7):

    False Alarm = FP / (FP + TN)                                      (7)
These parameters categorize data behavior in intrusion detection for the binary classes (Normal and Attack) in terms of true negatives, true positives, false positives, and false negatives.
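Equations (5)-(7) can be computed directly from the confusion-matrix counts; a small helper using the standard definitions:

```python
def evaluate(tp, tn, fp, fn):
    """Accuracy, detection rate, and false-alarm rate (Equations 5-7)."""
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    detection = tp / (tp + fn)        # true positives over all actual attacks
    false_alarm = fp / (fp + tn)      # false positives over all actual normals
    return accuracy, detection, false_alarm

# e.g. evaluate(tp=90, tn=80, fp=20, fn=10) -> (0.85, 0.9, 0.2)
```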
V. KDD'99 INPUT DATASET
The proposed system takes the KDD Cup 99 dataset as input, where the total number of records we used as a sample is 10000. After taking the input, it is given to the next phase for pre-processing. Each record contains 41 features plus a label marking it as either normal or as exactly one specific attack type falling into one of the four attack categories: denial of service (DoS), user to root (U2R), remote to local (R2L), and probing.
VI. RESULT
This section shows the overall performance results of the neural network (BPNN) on the KDD Cup 99 training and testing datasets using the three different algorithms (PCA, LDA, and SVD) proposed in our intrusion detection classification system based on feature reduction. Three algorithms are used for
reducing the 42 features of the KDD dataset, and one classification algorithm to detect the four types of ID attacks.
Principal Component Analysis (PCA)
PCA is the first algorithm that we used for dimensionality reduction, with the neural network (BPNN) as the attack detector for the intrusion detection system. In this case we tested PCA using three different values of k (k = 21, 11, and 7) out of the original 42-feature space, chosen by trial and error.
Table (5) shows the confusion matrix results after applying the neural network (BPNN) classification algorithm on the KDD Cup 99 training and testing datasets when selecting just (k=21) features. "Table 5. Confusion matrix when using PCA with K=21 and BPNN on the training and testing datasets"
Training dataset: original features = 42, K selected = 21, training time = 0.62184
  TP = 91.7534   TN = 9.0144    FP = 8.4685    FN = 90.9856
  Detection = 0.9103   False alarm = 0.4844   Accuracy = 91.2586
Testing dataset: testing time = 50.4142
  TP = 85.3264   TN = 10.9542   FP = 14.6736   FN = 89.0458
  Detection = 0.8862   False alarm = 0.5726   Accuracy = 87.1861
Table (6) shows different results for (k=11) applied on the training and testing datasets. "Table 6. Confusion matrix when using PCA with K=11 and BPNN on the training and testing datasets"
Training dataset: original features = 42, K selected = 11, training time = 164.703
  TP = 88.7526   TN = 12.3076   FP = 11.2474   FN = 87.6924
  Detection = 0.8782   False alarm = 0.4775   Accuracy = 88.2225
Testing dataset: testing time = 51.31030
  TP = 87.5391   TN = 7.0826    FP = 12.4609   FN = 92.9174
  Detection = 0.9251   False alarm = 0.6376   Accuracy = 90.2282
Table (7) shows different results for (k=7) applied on the training and testing datasets.
"Table 7. Confusion matrix when using PCA with K=7 and BPNN on the training and testing datasets"
Training dataset: original features = 42, K selected = 7, training time = 139.7367
  TP = 92.8880   TN = 8.3342    FP = 7.1120    FN = 91.6658
  Detection = 0.9177   False alarm = 0.4604   Accuracy = 92.2769
Testing dataset: testing time = 50.22079
  TP = 85.6803   TN = 11.5016   FP = 14.3197   FN = 88.4984
  Detection = 0.8816   False alarm = 0.5546   Accuracy = 87.0894
Figure (5) shows the training error curve for k = 21, 11, and 7 using PCA and the (BPNN) classification algorithm.
"Figure 5. Training error curve with respect to the iteration number using PCA with (k=21,11,7) and the neural network (BPNN) classification algorithm (panels: K=21, K=11, K=7)."
Figure (6) shows the testing error curve with respect to the iteration number using PCA with k = 21, 11, and 7 and the (BPNN) classification algorithm.
"Figure 6. Testing error curve depending on (BPNN) and (PCA) using k=21,11,7 (panels: K=21, K=11, K=7)."
Linear Discriminant Analysis (LDA)
LDA was used as the second dimensionality reduction function. This algorithm selected only (k=4) features from the whole 42-feature dataset. Table (8) shows the confusion matrix results after applying the (BPNN) classification on the KDD Cup 99 datasets using the selected k.
"Table 8. Confusion matrix when using LDA with BPNN on the training and testing datasets"
Training dataset: original features = 42, K selected = 4, training time = 155.9450
  TP = 44.1059   TN = 55.8941   FP = 55.8941   FN = 44.1059
  Detection = 0.4411   False alarm = 0.5000   Accuracy = 44.1059
Testing dataset: testing time = 43.74335
  TP = 44.3063   TN = 55.6132   FP = 55.6937   FN = 44.3868
  Detection = 0.4434   False alarm = 0.5004   Accuracy = 44.3466
Figure (7) shows the training error curve for (k=4) using (BPNN) and (LDA), and Figure (8) shows the testing error curve with respect to the iteration number using LDA with k=4 and the (BPNN) classification algorithm.
"Figure 7. Training error curve depending on (BPNN) and (LDA) using K=4."
"Figure 8. Testing error curve depending on (BPNN) and (LDA) using K=4."
Singular-Value Decomposition (SVD)
SVD is the third algorithm used for dimensionality reduction, with the neural network (BPNN) as the attack detector for the intrusion detection system. We used SVD with different numbers of selected features, called the k-feature selection (the reduced number of features we want to keep after dimensionality reduction), applied after mean centering (normalizing) the data matrix for each feature. Since there is no prior clue about the best k selection, we tested SVD using three different values of k (k = 21, 11, and 7) out of the original 42-feature space, chosen by trial and error.
Tables (9) shows the confusion matrix results after applying the Neural Network (BPNN)
classification algorithm on KDD Cup 99 training and testing datasets depending on selection just
(k=21) features. "Table 9.confusion matrix when using SVD with K=21 and BPNN in the training and testing dataset"
Training dataset confusion matrix:
  Original features: 42, selected features (k): 21
  Training time: 90.521816
  TP: 96.4787, TN: 33.1344, FP: 3.5213, FN: 66.8656
  Detection rate: 0.7444, false alarm rate: 0.0961, accuracy: 81.6722
Testing dataset confusion matrix:
  Testing time: 14.646747
  TP: 75.9007, TN: 3.5883, FP: 24.0993, FN: 96.4117
  Detection rate: 0.9549, false alarm rate: 0.8704, accuracy: 86.1562
Table (10) shows the corresponding results for (k=11) on the training and testing datasets.
"Table 10. Confusion matrix when using SVD with k=11 and BPNN on the training and testing datasets"
Training dataset confusion matrix:
  Original features: 42, selected features (k): 11
  Training time: 92.203391
  TP: 93.6704, TN: 7.2423, FP: 6.3296, FN: 92.7577
  Detection rate: 0.9282, false alarm rate: 0.4664, accuracy: 93.2141
Testing dataset confusion matrix:
  Testing time: 8.060197
  TP: 57.5514, TN: 5.9335, FP: 42.4486, FN: 94.0665
  Detection rate: 0.9065, false alarm rate: 0.8774, accuracy: 75.8090
Table (11) shows the corresponding results for (k=7) on the training and testing datasets.
"Table 11. Confusion matrix when using SVD with k=7 and BPNN on the training and testing datasets"
Training dataset confusion matrix:
  Original features: 42, selected features (k): 7
  Training time: 45.154621
  TP: 89.5418, TN: 11.0414, FP: 10.4582, FN: 88.9586
  Detection rate: 0.8902, false alarm rate: 0.4864, accuracy: 89.2502
Testing dataset confusion matrix:
  Testing time: 10.289327
  TP: 43.6686, TN: 56.2010, FP: 56.3314, FN: 43.7990
  Detection rate: 0.4373, false alarm rate: 0.5006, accuracy: 43.7338
Figure (9) shows the training error curves and Figure (10) shows the testing error curves, both with respect to the iteration number, when using SVD with k = 21, 11, and 7 and the BPNN classification algorithm.
"Figure 9. Training error curve with respect to the iteration number by using SVD with (k=21, 11, 7) and the Neural Network (BPNN) classification algorithm."
"Figure 10. Testing error curve depending on (BPNN) and (SVD) using k=21, 11, 7."
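The paper gives no implementation details for these error curves, but the general shape of such an experiment can be sketched as a one-hidden-layer backpropagation network in numpy that records the mean squared training error at every iteration. The synthetic data and layer sizes below are stand-ins, not the paper's configuration:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy stand-in data: 7 reduced features, binary label (attack vs. normal).
X = rng.random((200, 7))
y = (X.sum(axis=1) > 3.5).astype(float).reshape(-1, 1)

# One hidden layer with sigmoid activations, trained by batch gradient descent.
W1 = rng.normal(0, 0.5, (7, 10)); b1 = np.zeros(10)
W2 = rng.normal(0, 0.5, (10, 1)); b2 = np.zeros(1)
sig = lambda z: 1.0 / (1.0 + np.exp(-z))

errors = []
lr = 0.5
for _ in range(500):
    h = sig(X @ W1 + b1)                       # forward pass
    out = sig(h @ W2 + b2)
    err = out - y
    errors.append(float(np.mean(err ** 2)))    # one point on the training error curve
    d2 = err * out * (1 - out)                 # backward pass (delta rule)
    d1 = (d2 @ W2.T) * h * (1 - h)
    W2 -= lr * h.T @ d2 / len(X); b2 -= lr * d2.mean(axis=0)
    W1 -= lr * X.T @ d1 / len(X); b1 -= lr * d1.mean(axis=0)

print(errors[0], errors[-1])   # the error should fall over the iterations
```

Plotting `errors` against the iteration index yields a training error curve of the kind shown in Figures 7 and 9.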
Tables (12) and (13) show the overall performance results of the Neural Network (BPNN) on KDD Cup 99 for the training and testing datasets, using the three dimensionality reduction algorithms (PCA, LDA and SVD) proposed in our system.
"Table 12. Accuracy for using BPNN with three different dimensionality reduction algorithms on the training dataset"
Dataset features   Algorithm   Feature no.   Accuracy
42                 PCA         7             92.2769
42                 LDA         4             44.1059
42                 SVD         7             89.2502
Figure (11) illustrates the performance results on the training dataset for the three dimensionality reduction algorithms used with the Neural Network (BPNN) classification algorithm for attack detection.
"Figure 11. Accuracy of using BPNN classification with the three dimensionality reduction algorithms for attack detection"
"Table 13. Accuracy for using BPNN with three different dimensionality reduction algorithms on the testing dataset"
Dataset features   Algorithm   Feature no.   Accuracy
42                 PCA         7             87.0894
42                 LDA         4             44.3466
42                 SVD         7             43.7338
Figure (12) illustrates the performance results on the testing dataset for the three feature extraction and dimensionality reduction algorithms used with the Neural Network (BPNN) classification algorithm for attack detection.
"Figure 12. Accuracy of using BPNN classification with the three dimensionality reduction algorithms for attack detection"
VII. CONCLUSIONS
The aim of this paper is to propose an efficient dimensionality reduction algorithm implemented in an ID system. Three dimensionality reduction algorithms were used together with one classification algorithm, the Back Propagation Neural Network (BPNN), to detect the four types of attack in addition to normal traffic.
Several experiments have been performed to measure the performance of the proposed IDS. From the experimental results, the following conclusions can be derived for each dimensionality reduction algorithm combined with the Back Propagation Neural Network (BPNN):
A. Using Principal Component Analysis (PCA) with BPNN
1. According to tables (5), (6) and (7) (PCA with BPNN), it can be noticed that the best accuracy was obtained when k = 11.
2. From table (5), when k = 21 the accuracy is 87.1861; from table (6), when k = 11 the accuracy is 90.2282; and from table (7), when k = 7 the accuracy is 87.894.
B. Using Linear Discriminant Analysis (LDA) with BPNN
From table (8), when k = 4 the accuracy is 44.3466.
C. Using Singular Value Decomposition (SVD) with BPNN
1. According to tables (9), (10) and (11) (SVD with BPNN), it can be noticed that the best accuracy was obtained when k = 21.
2. From table (9), when k = 21 the accuracy is 86.1562; from table (10), when k = 11 the accuracy is 75.8090; and from table (11), when k = 7 the accuracy is 43.7338.
So it can be noticed from the above results that PCA achieved the highest performance when used with BPNN.
REFERENCES
[1] R. Bace and P. Mell, "NIST Special Publication on Intrusion Detection Systems", Infidel, Inc., Scotts Valley, CA, 2004.
[2] C. Kruegel, F. Valeur and G. Vigna, "Intrusion Detection and Correlation: Challenges and Solutions", Springer Science+Business Media, Inc., 2005.
[3] E. A. Fisch and G. B. White, "Secure Computers and Networks: Analysis, Design, and Implementation", CRC Press, 2000.
[4] B. Laing and J. Alderson, "How to Guide – Implementing a Network Based Intrusion Detection System", Sovereign House, 2000.
[5] S. A. Hassan, "A Technique for Mining Association Rules in Multidimensional Databases", M.Sc. Thesis, Department of Computer Science, University of Technology, 2008.
[6] J. Han and M. Kamber, "Data Mining: Concepts and Techniques", Morgan Kaufmann Publishers, 2006.
[7] E. G. Giannopoulou, "Data Mining in Medical and Biological Research", In-Tech, Vienna, Austria, 2008.
[8] M. G. Schultz, E. Eskin, E. Zadok and S. J. Stolfo, "Data Mining Methods for Detection of New Malicious Executables", Proceedings of the 2001 IEEE Symposium on Security and Privacy (S&P 2001), 2001.
[9] M. Gudadhe, P. Prasad and K. Wankhade, "A New Data Mining Based Network Intrusion Detection Model", Int'l Conf. on Computer Telecommunication Technology (ICCCT'10), IEEE, 2010.
[10] A. Lazarevic, V. Kumar and J. Srivastava, "Intrusion Detection: A Survey", Computer Science Department, University of Minnesota, 2005.
[11] J. Han and M. Kamber, "Data Mining: Concepts and Techniques", Second Edition, University of Illinois at Urbana-Champaign, Elsevier Inc., 2006.
[12] J. S. Balasubramaniyan, J. O. Garcia-Fernandez, D. Isacoff, E. Spafford and D. Zamboni, "An Architecture for Intrusion Detection using Autonomous Agents", Center for Education and Research in Information Assurance and Security, Purdue University, Technical Report, 1998.