
Artificial Intelligence Techniques for Misuse and Anomaly Detection

Computer Science Department, University of Wyoming, Laramie, WY 82071

Project 1: Misuse Detection With Semantic Analogy

Faculty: Diana Spears, William Spears, John Hitchcock (UW, consultant)

Ph.D. Student: Adriana Alina Bilt

(Relevant disciplines: AI planning, AI machine learning, case-based reasoning, complexity theory)

Misuse Detection

Misuse: Unauthorized behavior specified by usage patterns called signatures.

In this project, we are currently working with signatures that are sequences of commands.

The most significant open problem in misuse detection: False negatives, i.e., errors of omission.

Project Objectives

1. Develop a misuse detection algorithm that dramatically reduces the number of false negatives.

2. (will be specified later)

Our Approach

Analogy, also called Case-Based Reasoning (CBR), is used to match a current, ongoing intrusion against attack signatures previously stored in a database.

It uses a flexible match between a new intrusion sequence and a previously stored signature.

We use semantic, rather than syntactic, analogy.

Example

Man-in-the-middle attacks: ARP spoofing: Switch Sniff (SS)

[Diagram: Computer 1 and Computer 2 connected through a LAN switch, with the attacker on the same LAN]

New Switch Sniff (NSS):

A0: ping C
A1: ping S
A2: arpredirect -t <ip C> <ip S>
A3: fragrouter -B1
A4: linsniff
A5: ftp <ip S>; <username C>; <password C>

Old Switch Sniff (OSS):

A0: ping L1
A1: ping L2
A2: arp
A3: ifconfig
A4: ./arp-sk -w -d <ip L1> -D <ip L1> -S <ip L2>
A5: echo 1 > /proc/sys/net/ipv4/ip_forward
A6: tcpdump -i eth0 > packets
A7: telnet <ip L2>; <username L1>; <password L1>

The Traditional Approach to Misuse Detection: Exact Match

Comparing the New Switch Sniff (NSS) against the Old Switch Sniff (OSS) command by command, it is extremely hard to find an exact match, despite these being very similar attacks!

Part I of Our Approach: Plan Recognition

ping L1:

preconditions (knowledge in database):
  is_machine( L1 )
  knows( self, name( L1 ) )

postconditions (knowledge in database):
  up( L1 )
  is_in_arp( L1, self )
  knows( self, ip_address( L1 ) )

Knowledge State 0:
• is_machine( L1 )
• knows( self, name( L1 ) )

ping L1

Knowledge State 1:
• up( L1 )
• is_in_arp( L1, self )
• knows( self, ip_address( L1 ) )
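As a rough illustration, an action schema like this one might be encoded as follows. This is a minimal Python sketch, assuming predicates are encoded as strings; the ActionSchema class and apply helper are illustrative names, not the project's actual representation:

    from dataclasses import dataclass

    @dataclass(frozen=True)
    class ActionSchema:
        """A command annotated with the knowledge states before/after it."""
        command: str
        preconditions: frozenset   # knowledge required before the command
        postconditions: frozenset  # knowledge that holds after the command

    # The "ping L1" schema above, with predicates encoded as strings:
    ping_L1 = ActionSchema(
        command="ping L1",
        preconditions=frozenset({"is_machine(L1)", "knows(self, name(L1))"}),
        postconditions=frozenset({"up(L1)", "is_in_arp(L1, self)",
                                  "knows(self, ip_address(L1))"}),
    )

    def apply(schema, knowledge):
        """Advance a knowledge state through one action, if its preconditions hold."""
        if schema.preconditions <= knowledge:
            return knowledge | schema.postconditions
        return None  # preconditions not met

    # Knowledge State 0 -> Knowledge State 1:
    state1 = apply(ping_L1, frozenset({"is_machine(L1)", "knows(self, name(L1))"}))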

Automatically annotate each command with states of knowledge before/after (called “action schemas”):

New Switch Sniff (NSS):

A0: ping C
A1: ping S
A2: arpredirect -t <ip C> <ip S>
A3: fragrouter -B1
A4: linsniff
A5: ftp <ip S>; <username C>; <password C>

Old Switch Sniff (OSS):

A0: ping L1
A1: ping L2
A2: arp
A3: ifconfig
A4: ./arp-sk -w -d <ip L1> -D <ip L1> -S <ip L2>
A5: echo 1 > /proc/sys/net/ipv4/ip_forward
A6: tcpdump -i eth0 > packets
A7: telnet <ip L2>; <username L1>; <password L1>

Part II of Our Approach: Semantic Analogy

Annotated NSS:

pre 0:  is_machine( a ), knows( self, name( a ) )
post 0: up( a ), is_in_arp( a, self ), knows( self, ip_address( a ) )
pre 1:  is_machine( b ), knows( self, name( b ) )
…
post 4: up( b ), knows( self, password( var_any, b ) ), knows( self, username( var_any, b ) ), see( self, traffic( a, b ) ) or see( self, traffic( a, through( b ) ) )
pre 5:  …
post 5: has_access( self, b )

Annotated OSS:

pre 0:  is_machine( a ), knows( self, name( a ) )
post 0: up( a ), is_in_arp( a, self ), knows( self, ip_address( a ) )
pre 1:  is_machine( b ), knows( self, name( b ) )
…
post 6: up( b ), knows( self, password( var_any, b ) ), knows( self, username( var_any, b ) ), see( self, traffic( a, b ) ) or see( self, traffic( a, through( b ) ) )
pre 7:  …
post 7: has_access( self, b )

Easy to find a match using deeper semantics!!
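One simple way to realize such a flexible, semantic match is to compare the pre/postcondition predicates that two annotated signatures accumulate, rather than their raw command strings. The Python sketch below uses a Jaccard-style overlap score; this is an assumption-laden illustration, not one of the project's actual similarity metrics (those are compared later):

    def semantic_overlap(annotated_a, annotated_b):
        """
        Score how well two annotated signatures match semantically.
        Each argument is a list of (preconditions, postconditions) set
        pairs, one pair per command in the signature.
        """
        preds_a = set().union(*(pre | post for pre, post in annotated_a))
        preds_b = set().union(*(pre | post for pre, post in annotated_b))
        return len(preds_a & preds_b) / len(preds_a | preds_b)  # Jaccard score in [0, 1]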

Our Important Contribution to Misuse Detection So Far

We have already substantially reduced the likelihood of making errors of omission over prior approaches by:

1. Adding PLAN RECOGNITION to fill in the deeper semantic knowledge.
2. Doing SEMANTIC ANALOGY.

Project Objectives

1. Develop a misuse detection algorithm that dramatically reduces the number of false negatives.

2. Develop a similarity metric (for analogy) that performs well experimentally on intrusions, but is also universal (not specific to one type of intrusion or one computer language).

Similarity Metrics That Will Be Experimentally Compared

Exact match: City-block, Euclidean distance, Maximum, and three metrics of our own.

Partial match: City-block, Euclidean distance, Maximum, and three metrics of our own.
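For concreteness, here is how the three standard distances might look, assuming each signature has already been encoded as an equal-length numeric feature vector (the encoding itself, and our three own metrics, are not shown; this is only a sketch under that assumption):

    def city_block(u, v):
        """L1 (city-block/Manhattan) distance between two feature vectors."""
        return sum(abs(a - b) for a, b in zip(u, v))

    def euclidean(u, v):
        """L2 (Euclidean) distance."""
        return sum((a - b) ** 2 for a, b in zip(u, v)) ** 0.5

    def maximum(u, v):
        """L-infinity (maximum/Chebyshev) distance."""
        return max(abs(a - b) for a, b in zip(u, v))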

One Attack Metric That We Invented

Attack distance: min_atk_dst(p) = the minimum number of steps to achieve any possible goal.

• Designed to be well-suited to intrusion detection.
• Can identify attacks almost completely different from any ever seen!
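Under the plan-recognition view above, one way to compute such a metric is a breadth-first search over knowledge states, counting action steps until any goal is satisfied. A minimal sketch, assuming action schemas are (preconditions, postconditions) set pairs and each goal is a set of predicates that must all hold; these representations are assumptions, not the project's definitions:

    from collections import deque

    def min_atk_dst(state, schemas, goals):
        """Minimum number of action steps from knowledge state `state` to ANY goal."""
        start = frozenset(state)
        frontier, seen = deque([(start, 0)]), {start}
        while frontier:
            knowledge, steps = frontier.popleft()
            if any(goal <= knowledge for goal in goals):
                return steps
            for pre, post in schemas:
                if pre <= knowledge:
                    nxt = knowledge | post
                    if nxt not in seen:
                        seen.add(nxt)
                        frontier.append((nxt, steps + 1))
        return float("inf")  # no goal reachable from this state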

NEXT STEPS: Experimentally select the best performing metric, and then formalize it…

Experiments are currently in progress.

[Figure: similarity vs. attack distance]

A Formalism for Universal Similarity: Kolmogorov Similarity (Li & Vitanyi)

Kolmogorov complexity = the length of the shortest description of an item.

K(x) = the length of the shortest compressed binary version from which x can be fully reproduced.

K(x|y) = the length of the shortest program that computes x given that y was already computed.

Kolmogorov similarity d(x,y) = the length of the shortest program to compute one item from the other.

Advantages of Kolmogorov similarity:

It is theoretically founded, rather than “ad-hoc.”

It is "universal" - it “incorporates” every similarity metric in a large class C of interest, i.e.,

  ∀ f ∈ C, ∀ x, y:  d(x,y) ≤ f(x,y) + O(1/k),  where k = max{K(x), K(y)}

It is representation independent - it does not depend on the representation of the intrusion signatures (e.g., sequences of commands versus worms).
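Since K is uncomputable, practical uses of Kolmogorov similarity approximate it with a real compressor. A minimal sketch using zlib as the stand-in compressor; the normalized compression distance shown here is one standard approximation in the Li & Vitanyi tradition, not necessarily the variant this project will develop:

    import zlib

    def clen(data: bytes) -> int:
        """Compressed length: a computable stand-in for Kolmogorov complexity K."""
        return len(zlib.compress(data, level=9))

    def ncd(x: bytes, y: bytes) -> float:
        """Normalized compression distance, approximating d(x, y)."""
        cx, cy, cxy = clen(x), clen(y), clen(x + y)
        return (cxy - min(cx, cy)) / max(cx, cy)

    # e.g., compare two command sequences joined into byte strings:
    # ncd(b"ping C; ping S; arpredirect", b"ping L1; ping L2; arp")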

Kolmogorov Similarity

Problem with Li & Vitanyi's metric: for our purposes, it is too general.

We plan to develop a more domain-specific variant of Kolmogorov similarity for intrusion detection.

Theoretical part of our research to be done:

1. Extend the Li & Vitanyi similarity metric with our domain-specific concerns (such as weights).
   - Develop our own compression algorithm that includes the domain-specific concerns in it.
   - Extend our current attack metric to also do compression on the attack scenarios.

2. Prove that universality still holds for our new metric (but most likely with respect to a different class of metrics than Li & Vitanyi used).

Experimental part of our research to be done:

1. Experimentally evaluate and compare our entire approach on a test suite of previously unseen attacks and non-attacks, to measure its competitive accuracy as a misuse detection tool. (Project member and intrusion expert Jinwook Shin is developing all data.)

Project 2: Ensembles of Anomaly NIDS for Classification

Faculty: Diana Spears, Peter Polyakov (Math Dept, Kelly's advisor, no cost to ONR)

M.S. Students: Carlos Kelly, Christer Karlson (EPSCoR grant, no cost to ONR)

(Relevant disciplines: mathematical formalization, AI machine learning)

Anomaly Network Intrusion Detection Systems (NIDS)

They learn (typically using machine learning) a model of normal network traffic and/or packets from negative examples (normal behavior) only.

They utilize a distance metric to determine the difference between a new observation and the learned model.

They classify an observation as an attack if it deviates sufficiently from the model.

[Diagram: a stream of examples feeds the classifier (learner), which produces a model by induction]
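A toy illustration of this learn-then-deviate loop, using a per-feature z-score model over numeric feature vectors; this is only a sketch of the general idea, and LERAD and PAYL (discussed below) are far more sophisticated:

    import statistics

    def learn_model(normal_examples):
        """Learn per-feature (mean, stdev) from normal traffic only."""
        return [(statistics.mean(col), statistics.stdev(col))
                for col in zip(*normal_examples)]

    def anomaly_score(model, observation):
        """Distance between a new observation and the learned model."""
        return max(abs(x - mean) / (std or 1.0)
                   for (mean, std), x in zip(model, observation))

    def classify(model, observation, threshold=3.0):
        """Flag as an attack if the observation deviates sufficiently."""
        return "attack" if anomaly_score(model, observation) > threshold else "normal"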

A Major Drawback of Current Anomaly NIDS

Classification as "attack" or "not attack" can be inaccurate and is not informative enough, e.g., what kind of attack is it?

We would like a front-end to a misuse detection algorithm that helps narrow down the class of the attack.

Our Approach

Combine existing fast and efficient NIDS (which are really machine learning "classifiers") into an ensemble to increase the information.

We will employ a combined theoretical and experimental approach.

We are currently working with the classifiers LERAD (Mahoney & Chan) and PAYL (Wang & Stolfo), using data from the DARPA Lincoln Labs Intrusion Database.

Ensembles of Classifiers

A very popular recent trend in machine learning.

Main idea: Create an ensemble of existing classifiers to increase the accuracy, e.g., by doing a majority vote (see the sketch below).

Our novel twist: Create an ensemble of classifiers to increase the information, i.e., don't just classify as "positive" or "negative" – give a more specific class of attack if "positive." How? First we need to look at the "biases" of the classifiers…
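The standard, accuracy-oriented ensemble mentioned above is just a vote. A minimal sketch, assuming each classifier is a callable returning an "attack"/"normal" label (the interface is hypothetical):

    from collections import Counter

    def majority_vote(classifiers, observation):
        """Each classifier votes on the observation; the majority label wins."""
        votes = Counter(clf(observation) for clf in classifiers)
        return votes.most_common(1)[0][0]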

Representational Bias of a Classifier

Representational bias: Any aspect of a learner that affects its choice of model/hypothesis.

For example:

A TCP packet has fields, each of which contains a range of values. These are examples of basic attributes of a packet.

A TCP packet also has attributes, like size, that are not explicitly contained in the packet but are potentially useful for induction and can be inferred. This is an example of a constructed attribute.

Main Idea

Research Question: How do the biases of the NIDS affect which types of attacks they are likely to detect?

[Diagram: the bias of Anomaly NIDS #1 and the bias of Anomaly NIDS #2 each point to the attack classes (#1, #2, #3) that the corresponding NIDS is likely to detect]

Classification Matrix

                                       Anomaly NIDS #2 Classification
                                       attack             not an attack
Anomaly NIDS #1      attack            Attack Class #1    Attack Class #2
Classification       not an attack     Attack Class #3

The ensemble will output the probabilities of attack classes.
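To illustrate the intended output, the ensemble could look up the pair of verdicts in the classification matrix once it has been filled in experimentally. The matrix entries below are placeholders for illustration only, not results:

    # Hypothetical matrix: joint (NIDS #1, NIDS #2) verdicts mapped to a
    # probability distribution over attack classes (placeholder values).
    CLASSIFICATION_MATRIX = {
        ("attack", "attack"):  {"class 1": 0.7, "class 2": 0.2, "class 3": 0.1},
        ("attack", "normal"):  {"class 2": 0.8, "class 3": 0.2},
        ("normal", "attack"):  {"class 3": 0.9, "class 1": 0.1},
        ("normal", "normal"):  {},  # no alarm raised
    }

    def ensemble_output(verdict_1, verdict_2):
        """Return the probability distribution over attack classes for an alarm."""
        return CLASSIFICATION_MATRIX[(verdict_1, verdict_2)]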

Contributions Completed

Mathematical formalizations of LERAD and PAYL.

Explicitly formalized the biases of the two programs.

Created a testbed for attacks with LERAD and PAYL.

Research To Be Done:

Run many experiments with LERAD and PAYL in order to fill in the classification matrix.

Update and refine the mathematical formalizations of the system biases as needed for the classification matrix.

Revise the standard attack taxonomy to maximize information gain in the classification matrix.

Program the ensemble to output a probability distribution over classes of attacks associated with each alarm.

Test the accuracy of our ensemble on a test suite of previously unseen attack and non-attack examples, and compare its accuracy against that of LERAD and PAYL individually.

Our approach is potentially scalable to many classifiers, but that will not be done within the scope of the current MURI.

Experiments are currently in progress.

Project 3: The Basic Building Blocks of Attacks

Faculty: Diana Spears (UW), William Spears (UW), Sampath Kannan (UPenn), Insup Lee (UPenn), Oleg Sokolsky (UPenn)

M.S. Student: Jinwook Shin (UW)

(Relevant disciplines: graphical models, compiler theory, AI machine learning)

Research Question

What is an attack? In particular, "What are the basic building blocks of attacks?" (Question posed by Insup Lee.)

To the best of our knowledge, this specific research question has not been previously addressed.

Motivation: Attack building blocks can be used in misuse detection systems, to look for key signatures of an attack.

"basic building blocks"

The essential elements common to all attack programs (“exploits”) in a certain class

e.g., format string attacks, worms

our goal

Two Challenges to Address

Formal model (UW): Find a good formal model for the exploits (examples). This formalism will provide a common mathematical framework for the examples, to facilitate induction.

Induction algorithm (UW and Penn): Develop an efficient and practical induction algorithm. Induction will find the intersection (commonalities) of the attack (positive) examples and exclude the commonalities of the non-attack (negative) examples, as sketched below.
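At its most abstract, that induction step is set intersection and difference. A sketch, assuming each example has already been reduced to a set of abstract features (e.g., labels of abstracted DDG subgraphs); the real algorithm must operate over graphs, not flat sets:

    def induce_building_blocks(attack_examples, non_attack_examples):
        """
        Candidate building blocks: features common to ALL attack examples,
        minus any features that are also common to all non-attack examples.
        Each example is a set of abstract features.
        """
        attack_common = set.intersection(*attack_examples)
        benign_common = (set.intersection(*non_attack_examples)
                         if non_attack_examples else set())
        return attack_common - benign_common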

Case Studies of Exploits (found on the web)

Format string attacks, using "attack operators" (an "attack operator" is a group of meaningful instructions for a specific attack).

Ba[e]gle worms, using a Windows API call trace.

Remote buffer overflow attacks, using Unix/Linux system calls and Unix/Linux C library calls.

Overview of Our Approach

Input: (binary) attack programs.

Formalization turns each binary attack program into a Data Dependency Graph (DDG); semantic abstraction generalizes the DDGs; induction over the abstracted DDGs then extracts the commonalities.

Output: attack patterns (the building blocks of attacks).

Induction from graphical examples is novel!

Research Challenges

Problem 1 – Abstraction (being addressed by UW): Program modules with the same semantics can be written in many different ways (implementation-specific control flows, dead code, etc.). We need to automatically abstract the language in order to find commonalities. We are currently focusing on inferring common subgoals, expressed within subgraphs of DDGs.

Problem 2 – Alignment (being addressed by UPenn): Before finding the intersection between abstract DDGs by induction, we first need to align their corresponding states.

Attack Programs’ Goals

There can be as many types of attacks as there are program bugs, and hence as many ways to write attack programs, but…

The goal of most security attacks is to gain unauthorized access to a computer system by taking control of a vulnerable privileged program.

The attack steps can differ from one attack class to another. But one final step is common to all attacks: the transfer of control to malevolent code…

Control Transfer

How does an attack program change a program's control flow?

Initially, an attacker has no control over the target program, but the attacker can control input(s) to the target program. A vulnerability in the program allows the malicious inputs to cause unexpected changes in memory locations that are not supposed to be affected by the inputs.

Once the inputs are injected, the unexpected values can propagate into other locations, generating more unexpected values.

[Diagram: the attack program sends malicious inputs to the target program]

Attack Model

We model an attack as a sequence of (memory operation and function call) steps for generating malicious inputs to a target program.

[Diagram: the target program consumes the malicious inputs and produces outputs]

Output Backtrace

To extract only the relevant memory operations and function calls, we perform an automated static analysis of each exploit by beginning with the input(s) to the target program and then backtracing (following causality chains) through the control and data flow of the exploit.
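A minimal sketch of the backtracing step, assuming the DDG is represented as a mapping from each operation node to the set of nodes it depends on; the real analysis also follows control flow, which this sketch omits:

    def backtrace(ddg, target_inputs):
        """
        Collect only the memory operations and function calls that the
        target program's inputs causally depend on, by walking the data
        dependency graph backward from each input.
        """
        relevant, stack = set(), list(target_inputs)
        while stack:
            node = stack.pop()
            if node not in relevant:
                relevant.add(node)
                stack.extend(ddg.get(node, ()))
        return relevant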

Research Already Accomplished

Study of attack patterns
Automatic DDG generator
Automatic output backtracer

(This involved a considerable amount of work in going from binary exploits to this point. Portions of the process were analogous to those in the development of Tim Teitelbaum's Synthesizer Generator.)

Research To Be Done

Abstraction algorithm for DDGs
Efficient induction algorithm over graphical examples
Analysis of the time complexity of the induction algorithm
Final evaluation of the approach

Currently in progress.