Masquerader Detection using Probabilistic Approach Md.Arquam National Institute of Technology , Delhi

Upload: arquam-md

Post on 12-Apr-2017

What is cloud computing, and what are the various threats to cloud and data security?
Who is a masquerader, and how does he affect security?
Proposed technique: user search profiling and bogus information
Modeling User Search Behavior for Masquerade Detection, Malek Ben Salem and Salvatore J. Stolfo, IEEE conference
Method for user search profiling: one-class Support Vector Machines (SVM)
Detection rate: 80.7%, false rate: 13.7%
1000:600 m:484.2 400:60 m

Literature survey

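Naive Bayes Classifier
Bayes' theorem gives a rule for conditional probability: the conditional probability P(A | B) is the probability that event A occurs given that event B occurs. In masquerade detection, we use Naive Bayes classification to determine the probability that a block of commands belongs to a user. For example, to estimate which class y an instance x = {x1, x2, x3, ..., xn} belongs to, we compute

$$P(y \mid x) = \frac{P(y)\prod_{i=1}^{n} P(x_i \mid y)}{P(x)},$$

assuming the features are conditionally independent given y, and assign x to the class with the largest posterior probability.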
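A minimal sketch of the Naive Bayes scoring step in Python, assuming labeled training blocks exist for both classes (for example, the user's own blocks for class 1 and other users' blocks for class 0; this labeling is an assumption, not the project's stated setup). The helper names `train_nb` and `classify_block` are hypothetical, and Laplace smoothing with `alpha=1.0` is an added choice so that a command never seen in training does not force the whole product to zero. Probabilities are summed in log space, which picks the same class as multiplying them but avoids floating-point underflow on long blocks.

```python
import math
from collections import Counter


def train_nb(blocks_by_class, alpha=1.0):
    """Estimate log P(y) and log P(command | y) with Laplace smoothing.

    blocks_by_class maps a class label to a list of blocks (lists of commands).
    """
    vocab = {cmd for blocks in blocks_by_class.values() for b in blocks for cmd in b}
    total_blocks = sum(len(blocks) for blocks in blocks_by_class.values())
    model = {}
    for label, blocks in blocks_by_class.items():
        counts = Counter(cmd for b in blocks for cmd in b)
        # One extra vocabulary slot reserves probability mass for unseen commands.
        denom = sum(counts.values()) + alpha * (len(vocab) + 1)
        model[label] = {
            "log_prior": math.log(len(blocks) / total_blocks),
            "log_lik": {cmd: math.log((counts[cmd] + alpha) / denom) for cmd in vocab},
            "log_unseen": math.log(alpha / denom),
        }
    return model


def classify_block(model, block):
    """Return the label y maximizing log P(y) + sum_i log P(x_i | y)."""
    def score(params):
        return params["log_prior"] + sum(
            params["log_lik"].get(cmd, params["log_unseen"]) for cmd in block
        )
    return max(model, key=lambda label: score(model[label]))


# Toy data (hypothetical): 1 = normal user, 0 = masquerader.
model = train_nb({
    1: [["ls", "cd", "ls", "vi"], ["cd", "ls", "make"]],
    0: [["netstat", "ps", "kill"], ["ps", "nmap", "ps"]],
})
print(classify_block(model, ["ls", "vi", "cd"]))  # 1
print(classify_block(model, ["ps", "kill"]))      # 0
```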

SVM: What is it and how it works
An SVM is a non-probabilistic binary linear classifier and is widely used for text classification.

An SVM model is a representation of the examples as points in space, mapped so that the examples of the separate categories are divided by a clear gap that is as wide as possible. New examples are then mapped into that same space and predicted to belong to a category based on which side of the gap they fall on.

Using the kernel trick, an SVM can also perform non-linear classification by implicitly mapping its inputs into a higher-dimensional feature space.
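To make the "widest gap" idea concrete, here is a minimal sketch of training a linear SVM in Python with the Pegasos stochastic subgradient method on the regularized hinge loss. This is a swapped-in training algorithm chosen for brevity, not the solver LibSVM uses, and it covers only the linear case (no kernel). The bias is handled by appending a constant 1 to every feature vector, so it is regularized along with the weights; `lam`, `epochs` and the function names are hypothetical.

```python
import random


def train_linear_svm(X, y, lam=0.01, epochs=50, seed=0):
    """Pegasos: stochastic subgradient descent on lam/2*||w||^2 + mean hinge loss.

    X is a list of feature vectors, y a list of labels in {-1, +1}.
    A constant 1.0 is appended to each vector so w[-1] acts as the bias.
    """
    rng = random.Random(seed)
    data = [(list(x) + [1.0], label) for x, label in zip(X, y)]
    w = [0.0] * len(data[0][0])
    t = 0
    for _ in range(epochs):
        rng.shuffle(data)
        for x, label in data:
            t += 1
            eta = 1.0 / (lam * t)  # decreasing step size
            margin = label * sum(wj * xj for wj, xj in zip(w, x))
            # Shrink w: the subgradient step for the regularizer.
            w = [(1.0 - eta * lam) * wj for wj in w]
            if margin < 1.0:
                # Hinge loss is active: move w toward classifying x correctly.
                w = [wj + eta * label * xj for wj, xj in zip(w, x)]
    return w


def predict(w, x):
    """Return +1 or -1 depending on which side of the hyperplane x falls."""
    score = sum(wj * xj for wj, xj in zip(w, list(x) + [1.0]))
    return 1 if score >= 0 else -1


X = [[2, 2], [3, 3], [2, 3], [-2, -2], [-3, -1], [-1, -3]]
y = [1, 1, 1, -1, -1, -1]
w = train_linear_svm(X, y)
print([predict(w, x) for x in X])  # [1, 1, 1, -1, -1, -1]
```

New points are classified the same way, by the sign of the score, which matches the description of predicting a category from the side of the gap a point falls on.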

Data Set and tools used
For our project, we used the Schonlau dataset, which contains Unix commands (with arguments stripped) collected from 50 users. Each user file has 15000 commands collected over time, and we consider a block of data to be 100 commands. Each user differs from the others in the set of commands that he or she frequently and repeatedly uses.
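A minimal sketch, in Python, of turning one user's command sequence into fixed-size blocks. It assumes each Schonlau user file lists one command per line (an assumption about the file layout), and `load_user_commands` and `split_into_blocks` are hypothetical helper names. With 15000 commands and a block size of 100, each user yields 150 blocks; the `block_size` parameter also covers the alternative block sizes considered later.

```python
def load_user_commands(path):
    """Read a user's commands, assuming one command per line."""
    with open(path) as f:
        return [line.strip() for line in f if line.strip()]


def split_into_blocks(commands, block_size=100):
    """Split commands into consecutive, non-overlapping blocks of block_size.

    A trailing partial block (fewer than block_size commands) is dropped.
    """
    n_full = len(commands) // block_size
    return [commands[i * block_size:(i + 1) * block_size] for i in range(n_full)]


commands = ["ls", "cd", "vi"] * 5000  # stand-in for a 15000-command user file
for size in (100, 200, 500):
    print(size, len(split_into_blocks(commands, size)))
# 100 150
# 200 75
# 500 30
```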

Tools used:

LibSVM: a library for SVMs that provides functions for training and testing, with interfaces for languages such as Java and Python and for environments such as Windows and MATLAB.
TextBlob: a library used here for Naive Bayes classification.

TF-IDF
Term Frequency (TF): the number of times a term occurs in a document. It may be the raw frequency, a logarithmically scaled frequency ($\mathrm{tf}(t,d) = 1 + \log f_{t,d}$ for $f_{t,d} > 0$), or an augmented frequency.

Inverse Document Frequency (IDF): a measure of how much information a term provides, that is, whether the term is common or rare across all documents.

How can one user be distinguished from another, i.e., what is a user's signature?

TF-IDF is commonly used by search engines to rank web pages against a user query.
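A minimal Python sketch of TF-IDF weighting applied to command blocks, assuming each block plays the role of a document and each command the role of a term. It uses the logarithmic term frequency tf(t, d) = 1 + log f(t, d) and the common form idf(t) = log(N / df(t)), where N is the number of blocks and df(t) the number of blocks containing t; the function names are hypothetical. A command that appears in every block gets weight 0, which is why TF-IDF highlights the commands that set one block (or one user) apart.

```python
import math
from collections import Counter


def tf_log(count):
    """Logarithmic term frequency: 1 + log f for f > 0, else 0."""
    return 1.0 + math.log(count) if count > 0 else 0.0


def tfidf_vectors(blocks):
    """Return one {command: tf * idf} dict per block."""
    n_blocks = len(blocks)
    # Document frequency: in how many blocks each command appears.
    df = Counter(cmd for block in blocks for cmd in set(block))
    idf = {cmd: math.log(n_blocks / df[cmd]) for cmd in df}
    return [
        {cmd: tf_log(c) * idf[cmd] for cmd, c in Counter(block).items()}
        for block in blocks
    ]


blocks = [["ls", "ls", "cd"], ["ls", "vi"], ["ls", "make"]]
for vec in tfidf_vectors(blocks):
    print(vec)
```

Here `ls` occurs in all three blocks and receives weight 0.0, while `cd`, `vi` and `make` each occur in only one block and receive weight log 3.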

Parameters affecting the accuracy
Block size: blocks of sizes such as 200 or 500 can be considered.

Kernel function: the choice of kernel function used in the SVM.

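Radial basis function: $K(x_i, x_j) = \exp(-\gamma \lVert x_i - x_j \rVert^2)$

Polynomial function: $K(x_i, x_j) = (\gamma\, x_i^\top x_j + c)^d$

Hyperbolic tangent: $K(x_i, x_j) = \tanh(\gamma\, x_i^\top x_j + c)$

Parameters in the kernel function: γ, d, c, etc.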
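The three kernels can be written as plain Python functions. This sketch assumes the usual LibSVM-style forms, with gamma, c and d as the kernel parameters; the default values and the helper `dot` are hypothetical. In LibSVM itself the kernel is selected with the `-t` option (1 = polynomial, 2 = RBF, 3 = sigmoid/tanh), and these parameters are set with `-g` (gamma), `-r` (coef0, here c) and `-d` (degree).

```python
import math


def dot(x, y):
    """Inner product <x, y>."""
    return sum(a * b for a, b in zip(x, y))


def rbf_kernel(x, y, gamma=0.5):
    """K(x, y) = exp(-gamma * ||x - y||^2)."""
    return math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(x, y)))


def polynomial_kernel(x, y, gamma=1.0, c=1.0, d=3):
    """K(x, y) = (gamma * <x, y> + c) ** d."""
    return (gamma * dot(x, y) + c) ** d


def tanh_kernel(x, y, gamma=1.0, c=0.0):
    """K(x, y) = tanh(gamma * <x, y> + c)."""
    return math.tanh(gamma * dot(x, y) + c)


x, z = [1.0, 2.0], [3.0, 4.0]
print(rbf_kernel(x, x))                     # 1.0 (identical points)
print(polynomial_kernel(x, z, c=1.0, d=2))  # 144.0 = (11 + 1) ** 2
print(tanh_kernel(x, z))
```

Tuning the block size together with gamma, c and d (for example by grid search on held-out blocks) is one way to study how these parameters affect accuracy.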

Goal of the project
The goal is to identify the masquerader's blocks of data among the user's blocks of data, using both Naive Bayes and SVM, with a high detection rate and a low false rate (taking advantage of both methods).

1. A block in which every command differs from those appearing in the training dataset.
2. A block in which 25% of the commands appear for the first time and 75% of the commands belong to the user. The 75% belonging to the user can be further classified as:
- all commands are non-repeating;
- commands repeat more than k times (k is the sensitivity).
3. A block in which more than 25% of the commands are unique, appearing for the first time.
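One possible reading of these three cases, as a Python sketch: a command counts as "appearing for the first time" when it is absent from the user's training commands, the fraction is taken over all commands in the block, and case 2 covers blocks in which at most 25% of the commands are new. These interpretations, the function name `block_category` and the default `k=3` are assumptions for illustration.

```python
from collections import Counter


def block_category(block, training_commands, k=3):
    """Return (case, repeats_more_than_k) for a test block.

    repeats_more_than_k is only meaningful for case 2: it is True when some
    command already known from training occurs more than k times in the block.
    """
    known = set(training_commands)
    new_fraction = sum(1 for cmd in block if cmd not in known) / len(block)
    if new_fraction == 1.0:
        return 1, False  # every command is absent from the training data
    if new_fraction > 0.25:
        return 3, False  # more than 25% of the commands are new
    # Case 2: at most 25% new; check how the known commands repeat.
    known_counts = Counter(cmd for cmd in block if cmd in known)
    return 2, any(c > k for c in known_counts.values())


training = ["ls", "cd", "vi"]
print(block_category(["nmap", "ps"], training))                   # (1, False)
print(block_category(["ls", "cd", "nmap", "ps"], training))       # (3, False)
print(block_category(["ls", "ls", "ls", "ls", "nmap"], training))  # (2, True)
```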

Module 1: we applied NB to our dataset and calculated its detection rate.

Module 2: we applied the same data to the SVM and analysed its performance.

Module 3: we applied the proposed method, calculated its detection rate, and compared it with the performance of Modules 1 and 2.

Bayes Simulation
The training data consist of 2000 commands from the Schonlau dataset, fed to the classifier as a single training block. Test data are then fed in blocks of 50 commands, so a total of 2500 commands (50 blocks) are used for testing. Each test block is classified as either class 0 (masquerader) or class 1 (normal user).
Module 1: the mean detection rate of NB is 66.78% and the false positive rate is 7.8%.
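The detection and false positive rates reported for each module can be computed from the true and predicted block labels. A minimal Python sketch, assuming the usual definitions (detection rate = fraction of masquerader blocks labeled 0; false positive rate = fraction of normal-user blocks labeled 0) and the class convention 0 = masquerader, 1 = normal user; the function name is hypothetical and the labels below are toy values, not results from this project.

```python
def detection_and_false_positive_rates(true_labels, predicted_labels):
    """Labels: 0 = masquerader, 1 = normal user.

    Detection rate      = masquerader blocks predicted 0 / all masquerader blocks.
    False positive rate = normal blocks predicted 0 / all normal blocks.
    """
    pairs = list(zip(true_labels, predicted_labels))
    masq = [p for t, p in pairs if t == 0]
    normal = [p for t, p in pairs if t == 1]
    detection_rate = sum(1 for p in masq if p == 0) / len(masq)
    false_positive_rate = sum(1 for p in normal if p == 0) / len(normal)
    return detection_rate, false_positive_rate


true = [0, 0, 0, 0, 1, 1, 1, 1, 1, 1]
pred = [0, 0, 0, 1, 1, 1, 1, 1, 1, 0]
dr, fpr = detection_and_false_positive_rates(true, pred)
print(f"detection rate {dr:.2%}, false positive rate {fpr:.2%}")
# detection rate 75.00%, false positive rate 16.67%
```

A "mean" rate, as reported for the modules, would then be the average of these per-user values.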

Module 1: Bayes Simulation

Detection rate: 66.78%
False rate: 17.8%

Module 2: SVM Simulation

Mean detection rate: 80.1%; false positive rate: 21.08%.

Proposed Method Working
1. Find the probability of occurrence of all distinct commands.
2. Sum them for each block.
3. Find the mean sum over all the blocks.
4. Now, for each block:
   If blocksum > avgmean, it is a non-masquerader.
   Else If blocksum
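A minimal Python sketch of the listed steps, under stated assumptions: command probabilities are estimated from the user's training commands, each block's sum runs over its distinct commands (commands unseen in training contribute 0), and the mean is taken over the blocks being tested. The final branch of the rule is cut off in the source, so blocks whose sum does not exceed the mean are only marked `"check"` (a hypothetical placeholder), not given a final label. All function names are hypothetical.

```python
from collections import Counter


def command_probabilities(training_commands):
    """P(cmd) = occurrences of cmd / total number of training commands."""
    counts = Counter(training_commands)
    total = len(training_commands)
    return {cmd: c / total for cmd, c in counts.items()}


def block_sum(block, probs):
    """Sum P(cmd) over the distinct commands in a block; unseen commands add 0."""
    return sum(probs.get(cmd, 0.0) for cmd in set(block))


def classify_blocks(blocks, probs):
    """Label blocks whose sum exceeds the mean block sum as non-masquerader."""
    sums = [block_sum(b, probs) for b in blocks]
    mean_sum = sum(sums) / len(sums)
    # "check" is a placeholder: the rule for the remaining blocks is not given here.
    return ["non-masquerader" if s > mean_sum else "check" for s in sums]


probs = command_probabilities(["ls"] * 6 + ["cd"] * 3 + ["vi"])
print(classify_blocks([["ls", "cd", "ls"], ["nmap", "ps"], ["vi", "ls"]], probs))
# ['non-masquerader', 'check', 'non-masquerader']
```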