
A Survey of Host Based Intrusion Detection Systems

(HIDS)

Emre Can Sezer

Dept. of Comp. Science

North Carolina State University

2

Outline

• Introduction and Motivation
• Model Creation Techniques
• Sampling of Models:
  – N-gram model
  – Callgraph, Abstract Stack models
• Impossible Path Exploit
• Brief overview of VtPath, Dyck and VPStatic
• Data attacks
• Conclusion

3

Introduction

• Terminology
  – IDS: Intrusion detection system
  – IPS: Intrusion prevention system
  – HIDS/NIDS: Host/Network Based IDS
• Anomaly vs. Intrusion Detection
  – Anomaly detection also captures misuse:
    • There is no intrusion; however, due to bad programming or administration, the process behaves differently than normal (e.g. a bug in the code)
    • Intrusions are also anomalies
• Difference between IDS and IPS
  – Detection happens after the attack is conducted (e.g. the memory is already corrupted due to a buffer overflow attack)
  – Prevention stops the attack before it reaches the system (e.g. shield does packet filtering)

4

Introduction (Cont'd)

• Idea behind HIDS
  – Define normal behavior for a process
    • Create a model that captures the behavior of a program during normal execution.
  – Monitor the process
    • Raise a flag if the program behaves abnormally

5

Why System Calls? (Motivation)

• The program is a layer between user inputs and the operating system

• A compromised program cannot cause significant damage to the underlying system without using system calls

• e.g. creating a new process, accessing a file, etc.

6

Model Creation Techniques

• Models are created using two different methods:
  – Training: The program's behavior is captured during a training period in which it is assumed that no attacks occur. Alternatively, synthetic inputs can be crafted to simulate normal operation.
  – Static analysis: The information required by the model is extracted from either source code or binary code by means of static analysis.
• Training is easy; however, the model may miss some normal behavior and therefore produce false positives.
• Static analysis based models produce no false positives, yet dynamic libraries and source code availability pose problems.

7

Definitions for Model Analysis

• If a model is training based, it is possible that not every normal sequence is in the database. This results in some normal sequences being flagged as intrusions. This is called a false positive.

• If a model fails to flag an intrusion, this is called a false negative.

• Accuracy: An accurate model has few or no false positives.

• Completeness: A complete model has no false negatives.

• Convergence Rate: The amount of training required for the model to reach a certain accuracy
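The definitions above can be pictured as set differences. The following is a minimal sketch; the toy sets of length-2 system call sequences are hypothetical and only illustrate the terminology.

```python
# Hypothetical toy sets of length-2 system call sequences (illustrative only).
normal_behavior = {("open", "read"), ("read", "close"), ("read", "write")}
model = {("open", "read"), ("read", "close"), ("read", "execve")}

# Normal sequences the model does not contain would be flagged: false positives.
false_positives = normal_behavior - model

# Sequences the model accepts although they are not normal behavior:
# an intrusion using them would go unflagged (false negatives).
false_negatives = model - normal_behavior

print(false_positives)
print(false_negatives)
```

In this picture an accurate model keeps the first difference small, and a complete model keeps the second one empty.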

8

A Visual Description of False Positives and Completeness

[Figure: the set of normal behavior and the set of behavior accepted by the model, drawn as overlapping regions.]

9


[Figure: false positives, the part of normal behavior that lies outside the model.]

10


[Figure: false negatives, the part of the model that lies outside normal behavior.]

11

N-Gram

• Pioneering work in the field.
• Forrest et al., A Sense of Self for Unix Processes, 1996.
• Tries to define normal behavior for a process by using sequences of system calls.
• As the name of their paper implies, they show that short, fixed-length sequences of system calls distinguish applications from one another.
• For every application a model is constructed, and at runtime the process is monitored for compliance with the model.
• Definition: The list of system calls issued by a program for the duration of its execution is called a system call trace.

12

N-Gram: Building the Model by Training

• Slide a window of length N over a given system call trace and extract unique sequences of system calls.

Example:

[Figure: a system call trace and the unique sequences database extracted from it.]
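The window-sliding step can be sketched in a few lines. This is a minimal illustration, not the authors' implementation; the function name `build_database` and the training trace are hypothetical.

```python
def build_database(trace, n):
    """Return the set of unique length-n windows found in a system call trace."""
    return {tuple(trace[i:i + n]) for i in range(len(trace) - n + 1)}

# Hypothetical training trace; the call names are illustrative only.
trace = ["open", "read", "mmap", "mmap", "open", "read", "mmap"]
database = build_database(trace, 3)

for sequence in sorted(database):
    print(sequence)
```

The trace contains five windows of length 3, but the first and last are identical, so the database holds four unique sequences.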

13

N-Gram: Monitoring

• Monitoring
  – A window is slid across the system call trace as the program issues them, and the sequence is searched in the database.
  – If the sequence is in the database then the issued system call is valid.
  – If not, then the system call sequence is either an intrusion or a normal operation that was not observed during training (false positive).
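A monitoring loop along these lines might look as follows. It is a sketch under assumptions: the database, the runtime calls, and the name `monitor` are all hypothetical, and a real monitor would receive calls from the kernel rather than from a list.

```python
from collections import deque

def monitor(calls, database, n):
    """Yield (index, window) for every length-n window that is not in the database."""
    window = deque(maxlen=n)  # keeps only the n most recent system calls
    for i, call in enumerate(calls):
        window.append(call)
        if len(window) == n and tuple(window) not in database:
            yield i, tuple(window)

# Hypothetical database and runtime trace.
database = {("open", "read", "mmap"), ("read", "mmap", "close")}
runtime_calls = ["open", "read", "mmap", "close", "execve"]
alerts = list(monitor(runtime_calls, database, 3))
print(alerts)
```

Each alert still needs interpretation: the mismatch is either an intrusion or normal behavior that was never seen in training.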

14

Experimental Results for N-Gram

• Databases for different processes with different window sizes are constructed.
• A normal sendmail system call trace obtained from a user session is tested against all processes' databases.
• The table shows that sendmail's sequences are unique to sendmail and are considered anomalous by other models.

The table shows the number of mismatched sequences and their percentage with respect to the total number of subsequences in the user session.

15

Problems with Sequence Based Approaches

• The minimal foreign sequence problem
  – The database (window size 3) includes: …, (S0,S3,S4), (S3,S4,S2), …
  – An attack sequence S0,S3,S4,S2 cannot be detected, because each of its length-3 windows is already in the database.
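The problem can be reproduced with a small experiment. The two training traces below are hypothetical; they were chosen so that S0,S3,S4 and S3,S4,S2 both occur in training while the full sequence S0,S3,S4,S2 never does.

```python
def windows(seq, n):
    """All length-n windows of a sequence, in order."""
    return [tuple(seq[i:i + n]) for i in range(len(seq) - n + 1)]

def build_database(traces, n):
    return {w for trace in traces for w in windows(trace, n)}

# Hypothetical training traces.
training = [["S0", "S3", "S4", "S1"], ["S1", "S3", "S4", "S2"]]
attack = ["S0", "S3", "S4", "S2"]

def detected(n):
    """True if some length-n window of the attack is missing from the database."""
    database = build_database(training, n)
    return any(w not in database for w in windows(attack, n))

print(detected(3), detected(4))
```

With a window of 3 the attack passes unnoticed; only a window of at least 4 exposes it, which is the limitation that fixed-length sequence models cannot escape.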

16

Problems with Sequence Based Approaches (Cont'd)

• Code insertion:
  – As long as the order in which an attacker issues system calls is accepted as normal, he can insert and run his own code on the system (e.g. via a buffer overflow).

17

FSA Model

• Sekar et al., A Fast Automaton-Based Method for Detecting Anomalous Program Behaviors, 2001.
• Builds a non-deterministic finite state automaton (FSA) by training.
• Uses program counter (PC) information to address code insertion problems.
• Once the PC is coupled with system calls, every system call site in the code becomes unique.
• Instead of using sequences and being limited by their length, they use a finite state automaton to express every possible sequence.
• The first piece of research to use PC information and automata.
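The idea of coupling the PC with system calls can be sketched as follows. This is a simplified illustration, not the construction from the paper: the edge labelling convention (the edge carries the call issued at the destination site), the addresses, and the function names are all assumptions.

```python
def learn_fsa(trace, start="START"):
    """Learn transitions (previous site, syscall, current site) from (pc, syscall) pairs."""
    transitions = set()
    state = start
    for pc, syscall in trace:
        transitions.add((state, syscall, pc))
        state = pc
    return transitions

def accepts(transitions, trace, start="START"):
    """Replay a runtime trace; reject on the first transition never seen in training."""
    state = start
    for pc, syscall in trace:
        if (state, syscall, pc) not in transitions:
            return False
        state = pc
    return True

# Hypothetical training trace: read() is issued twice from the same call site.
training = [(0x10, "open"), (0x24, "read"), (0x24, "read"), (0x30, "close")]
fsa = learn_fsa(training)

shorter_run = [(0x10, "open"), (0x24, "read"), (0x30, "close")]
injected = [(0x10, "open"), (0x7FFF0000, "read"), (0x30, "close")]
print(accepts(fsa, shorter_run), accepts(fsa, injected))
```

The shorter run is accepted although it was never observed as a whole, since the automaton is not bound to a window length. The injected trace issues the same system calls in an accepted order but from an unknown address, so it is rejected.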

18

FSA Example

[Figure: an example code fragment and the corresponding FSA built from it.]

• S0,S3,S4,S2 is captured. There is no length limitation.
• Note the non-determinism in states 1, 3, 6 and 8.

19

Convergence Comparison

• Experiment is run on ftpd.
• FSA model converges faster than N-gram.

20

Callgraph and Abstract Stack Models

• Wagner et al., Intrusion Detection via Static Analysis, 2001.
• Uses a finite state automaton to model the process behavior.
• It is based on static analysis of source code.
• They introduce three methods:
  – Callgraph (NFA)
  – Abstract Stack (PDA)
  – Digraph (a static version of N-gram with window size of 2, not mentioned here)

21

Callgraph Model

• A control flow graph (CFG) is extracted from the source code by static analysis.

• Every procedure f has an entry(f) and exit(f) state. At this point the graph is disconnected.

• It is assumed that there is exactly one system call between nodes in the automaton and these system calls are represented by edges.

• Every function call site v, calling f, is split into two nodes v and v’. Epsilon edges are added from v to entry(f) and from exit(f) to v’.

• The result is a single, connected, big graph.

22

Callgraph Example

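[Figure: callgraph automaton for two functions f and g. Nodes: Entry(g), v, w, Exit(g), Entry(f), Exit(f). System call edges: open(), close(), exit(), getuid(), geteuid(). Annotations: the entry point, a function call site split into two nodes, and the epsilon edges.]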

23

Monitoring Callgraph

• The IDS is given system call information alone, and no PC information.

• When a system call is received, the automaton is simulated to transition between states. If such a transition does not exist in the model, the IDS raises a flag.

• Due to non-determinism, there might be more than one possible state at a given time. In this case every possible state in the program is simulated against the system call and the ones that do not have a transition on the given system call are dropped.

• Non-determinism usually incurs too much computational overhead in large programs.
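The monitoring procedure is an ordinary NFA simulation over sets of states. The sketch below uses a hypothetical automaton in which g() calls f() at two sites v and w; the edge list is an assumption made for illustration, not a transcription of the slide's figure.

```python
EPS = None  # label used for epsilon edges

# Hypothetical callgraph automaton: g() calls f() at call sites v and w.
EDGES = [
    ("entry_g", "open", "v"),
    ("v", EPS, "entry_f"),
    ("entry_f", "getuid", "exit_f"),
    ("entry_f", "geteuid", "exit_f"),
    ("exit_f", EPS, "v'"),
    ("exit_f", EPS, "w'"),
    ("v'", "close", "w"),
    ("w", EPS, "entry_f"),
    ("w'", "exit", "exit_g"),
]

def closure(states):
    """Add every state reachable through epsilon edges."""
    result, todo = set(states), list(states)
    while todo:
        s = todo.pop()
        for src, label, dst in EDGES:
            if src == s and label is EPS and dst not in result:
                result.add(dst)
                todo.append(dst)
    return result

def accepts(trace):
    """Simulate the NFA; an empty set of states means the IDS raises a flag."""
    current = closure({"entry_g"})
    for call in trace:
        current = closure({dst for src, label, dst in EDGES
                           if src in current and label == call})
        if not current:
            return False
    return True

print(accepts(["open", "getuid", "close", "geteuid", "exit"]))
print(accepts(["open", "close"]))
print(accepts(["open", "getuid", "exit"]))
```

The last trace is accepted even though no real execution of such a program produces it: after f returns, the simulation cannot tell which call site it should return to, so both return nodes stay in the state set.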

24

Imprecision in Callgraph

[Figure: the callgraph automaton with a valid path and an impossible path highlighted.]

• The impossible path cannot be detected by the model, since all of its transitions are valid.
• The return address in f can be overwritten.

25

Abstract Stack Model

• The more information an IDS has, the more accurately it can model the behavior of a program.

• Abstract Stack model makes use of the call stack.

• In order to incorporate this information into their model, they use a push-down automaton (PDA).

• The idea is to have an abstract copy of the call stack in the PDA stack.

• At any given state, the PDA’s stack contains the list of return addresses in the call stack.

26

Push-down automata

• Like FSAs, PDAs have a set of states and a transition function.
• They differ from FSAs by also having a stack. They accept context-free languages.
• At every transition, a symbol can be pushed onto or popped from the stack.
• They can accept either by final state or by empty stack, which are equivalent in terms of computational power.
• A PDA is stronger than an FSA. It can accept regular languages and also some non-regular ones such as 0^n 1^n.

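[Figure: a PDA with states Start and End and a stack. In Start, reading 0 pushes a 0; reading 1 moves to End and pops a 0; in End, reading 1 pops a 0.]

Once you see a 1, switch to the End state. The stack contains as many 0s as seen in the input. If the stack is empty at the end of the input, accept.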
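The two-state automaton described above can be written out directly. This is a minimal sketch with acceptance by empty stack; the function name is hypothetical, and the empty word is accepted as the case n = 0.

```python
def accepts_0n1n(word):
    """Push on 0 while in Start, pop on 1 (moving to End), accept on empty stack."""
    state, stack = "Start", []
    for symbol in word:
        if state == "Start" and symbol == "0":
            stack.append("0")
        elif symbol == "1" and stack:
            state = "End"   # after the first 1, no further 0 may be read
            stack.pop()
        else:
            return False    # a 0 after a 1, or a 1 with nothing left to pop
    return not stack        # leftover 0s mean more 0s than 1s

for word in ["0011", "001", "0101"]:
    print(word, accepts_0n1n(word))
```

No finite state automaton can do this for unbounded n, because it would have to count the 0s without a stack.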

27

Detecting the IPE Attack

• Consider the previous example of an impossible path.

• The Abstract Stack model will detect the attack since it stores stack information. When returning from state Exit(f), the stack will have the return address v’.

• State v’ does not have a transition on system call exit() hence the attack will be detected.
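The detection argument can be checked with a small simulation that keeps an abstract copy of the call stack. The automaton below is a hypothetical program in which g() calls f() at sites v and w; the tables and names are assumptions for illustration. This particular example happens to be deterministic, whereas a general Abstract Stack monitor has to track several configurations at once.

```python
# Hypothetical model: call site -> (callee entry, return node pushed on the stack).
CALLS = {"v": ("entry_f", "v'"), "w": ("entry_f", "w'")}
RETURNS = {"exit_f"}  # exit nodes: leaving them pops the return node
SYSCALLS = {
    ("entry_g", "open"): "v",
    ("entry_f", "getuid"): "exit_f",
    ("entry_f", "geteuid"): "exit_f",
    ("v'", "close"): "w",
    ("w'", "exit"): "exit_g",
}

def settle(state, stack):
    """Follow call (push) and return (pop) moves until a system call is needed."""
    while True:
        if state in CALLS:
            entry, ret = CALLS[state]
            state, stack = entry, stack + (ret,)
        elif state in RETURNS and stack:
            state, stack = stack[-1], stack[:-1]
        else:
            return state, stack

def accepts(trace):
    state, stack = settle("entry_g", ())
    for call in trace:
        nxt = SYSCALLS.get((state, call))
        if nxt is None:
            return False  # no transition on this system call: raise a flag
        state, stack = settle(nxt, stack)
    return True

print(accepts(["open", "getuid", "close", "geteuid", "exit"]))
print(accepts(["open", "getuid", "exit"]))
```

After the first call to f the stack holds v', so the model returns to v', which only allows close(). The trace that jumps straight to exit() is therefore rejected.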

28

Performance Issues

• Both the Callgraph and Abstract Stack models have very high operational costs. The reason for this is non-determinism.
• Non-determinism manifests itself in two ways:
  – State non-determinism: The automaton can be in a number of different states. When a system call is received, all these states need to be checked for valid transitions.
  – Stack non-determinism: Only applies to the Abstract Stack model. There can be a number of different ways a state can be reached, resulting in more than one stack configuration.

29

State and Stack Exposure Techniques

• Exposing state can greatly reduce the non-determinism in the model. The state of the program can be exposed by using PC information.

• The stack can be exposed in two ways:
  – Indirectly as in Abstract Stack, where the PDA has transitions that simulate the call stack.
  – Directly by stack walk, simply obtaining the list of return addresses from the call stack.

30

VtPath

• Feng et al., Anomaly Detection Using Call Stack Information, ….
• Inspired by Abstract Stack, they use call stack information in their model.
• It is training based; it has a better convergence rate than the FSA model and comparable false positive rates.
• Uses virtual stack lists (VSLs) to create virtual paths between two consecutive system calls and keeps a database of these virtual paths.
• It uses PC information and a stack walk to get the VSLs.
• The model is a collection of virtual paths.
• More resistant to IPEs. It can capture the IPE presented in Wagner et al.'s paper.
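One way to picture a virtual path is as the stack frames that are left and then entered between two consecutive system calls. The sketch below uses a simplified encoding that is an assumption of this example, not the paper's exact definition; the addresses and function names are hypothetical.

```python
def virtual_path(prev_vsl, curr_vsl):
    """Frames exited (deepest first) and frames entered between two system calls."""
    common = 0
    while (common < min(len(prev_vsl), len(curr_vsl))
           and prev_vsl[common] == curr_vsl[common]):
        common += 1
    exits = tuple(("exit", addr) for addr in reversed(prev_vsl[common:]))
    entries = tuple(("entry", addr) for addr in curr_vsl[common:])
    return exits + entries

def train(vsls):
    """Collect the virtual paths between consecutive stack lists of a normal run."""
    return {virtual_path(a, b) for a, b in zip(vsls, vsls[1:])}

# Hypothetical stack lists (outermost frame first, system call site last).
normal = [(0x100, 0x210), (0x100, 0x220, 0x310), (0x100, 0x230)]
model = train(normal)

# A step that ends at an address never seen in training.
attack_step = virtual_path((0x100, 0x220, 0x310), (0x100, 0x999))
print(attack_step in model)
```

During monitoring, each pair of consecutive system calls yields one such path, and a path missing from the database is reported as an anomaly.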

31

Dyck

• Giffin et al., Efficient Context-Sensitive Intrusion Detection, 2004.
• Uses static analysis of binary code.
• Exposes the stack by inserting null-calls before and after function call sites.
• The null-calls are inserted by binary rewriting and let the monitor keep track of function calls.
• With null-calls, the stack becomes deterministic and the performance improves greatly compared to a non-deterministic PDA.
• This model is called a stack-deterministic PDA (SDPDA) in a later paper by the same authors: Feng et al., Formalizing Sensitivity in Static Analysis for Intrusion Detection, 2004.
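The effect of null-calls on the stack can be sketched with a monitor that sees them as ordinary events. The event encoding ("pre", "post", "sys") and the site names are hypothetical, and the sketch only checks call nesting; a full model would also check the system call transitions.

```python
def check_nesting(events):
    """Return True if every post-call null call matches the most recent open pre-call."""
    stack = []
    for kind, value in events:
        if kind == "pre":
            stack.append(value)          # null-call before a call site: push the site
        elif kind == "post":
            if not stack or stack.pop() != value:
                return False             # returned to a site that did not make the call
        # "sys" events do not touch the stack in this sketch
    return True

# Hypothetical event streams for a program that calls f() at sites v and w.
valid = [("sys", "open"), ("pre", "v"), ("sys", "getuid"), ("post", "v"),
         ("sys", "close"), ("pre", "w"), ("sys", "geteuid"), ("post", "w"),
         ("sys", "exit")]
hijacked = [("sys", "open"), ("pre", "v"), ("sys", "getuid"), ("post", "w"),
            ("sys", "exit")]
print(check_nesting(valid), check_nesting(hijacked))
```

Because every push and pop is announced by the program itself, the monitor holds exactly one stack at any time instead of a set of possible stacks.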

32

Dyck: Model Example

[Figure: C source code example (left) and its Dyck instrumentation (right).]

33

Dyck: Model Example (Cont'd)

[Figure: the Callgraph model (left) and the Dyck model without squelching (right).]

34

VPStatic

• Feng et al., Formalizing Sensitivity in Static Analysis for Intrusion Detection, 2004.
• Static analysis version of VtPath.
• Instead of using sequences, they define transitions on a PDA.
• The state of the program is exposed by using PC information.
• The stack is exposed by using stack walk and VSLs.
• The goal is to create a deterministic PDA by exposing stack and state information.
• The model is fully deterministic.
• Operating the deterministic PDA is less expensive; however, the bottleneck in VPStatic is the stack walk operation.

35

Overview of the Models

• The trend has been towards more complicated automata and static analysis.

• Models using state exposure are immune to code insertion, e.g. FSA, VtPath, VPStatic.

• Models using stack exposure are immune to control-flow hijacking, e.g. Abstract Stack, Dyck, VPStatic.

• Still, if an attack does not issue system calls, these models might fail.

36

Data Flow Attack

• A variation of the IPE.
• The control flow is altered but not hijacked.
• Instead of overwriting return addresses to change the control flow, data used as a predicate in a branch is overwritten.
• The Data Flow Attack does not traverse any function boundaries, evading even PDA based models.

• The models need to be flow sensitive in order to capture such an attack.

37

Data Flow Attack Example

• The system call sequences <sys_1, sys_5, sys_3> and <sys_2, sys_5, sys_4> are normal sequences.

• Any of the aforementioned models will also accept <sys_1, sys_5, sys_4> and <sys_2, sys_5, sys_3>.

• There is no way the model can relate the first loop to the second.

• Execution path history needs to be known to be able to detect such an attack.

[Figure: example code; only the call sys_5(); survives from it.]

38

User ID Hijacking

• Example attack on WU-FTPD.
• When a user issues a get or a put command, the effective user id (EUID) is temporarily escalated to root in order to perform setsockopt().
• Using a format string vulnerability, pw->pw_uid can be set to 0 (root), giving root privileges to the user.

FILE * getdatasock( ... ) {
    ...
    seteuid(0);
    setsockopt( ... );
    ...
    seteuid(pw->pw_uid);
    ...
}

39

Decision-Making Data Hijacking

• The following example code is taken from a SSH implementation.

• The function detect_attack() has an integer overflow vulnerability.

• Using the vulnerability, the authenticated flag can be set to non-zero, allowing a user root privilege without him ever supplying a password.

void do_authentication(char *user, ...) {
    int authenticated = 0;
    ...
    while (!authenticated) {
        /* Get a packet from the client */
        type = packet_read();  /* calls detect_attack() internally */
        switch (type) {
        ...
        case SSH_CMSG_AUTH_PASSWORD:
            if (auth_password(user, password))
                authenticated = 1;
        case ...
        }
        if (authenticated) break;
    }
    /* Perform session preparation. */
    do_authenticated(pw);
}

40

Why Automata Can’t Capture Data Flow Attack

• With the system call issued in between the branches (sys_5), the model loses all execution path information.
• None of the models mentioned is flow sensitive or path sensitive.
• There are no function calls, so stack exposure is ineffective against this attack.
• In the absence of function calls, all the models only keep track of consecutive system calls. In other words, they are only as powerful as N-gram with a window size of 2.

[Figure: automaton Start → {Sys_1, Sys_2} → Sys_5 → {Sys_3, Sys_4} → End, with the normal path and the abnormal path marked.]
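The loss of path information at sys_5 can be demonstrated with an automaton that only remembers consecutive system calls. The sketch learns its edges from the two normal sequences named earlier; the function names are hypothetical.

```python
def learn_edges(traces):
    """Collect consecutive (previous, next) system call pairs, as an automaton would."""
    edges = set()
    for trace in traces:
        path = ["Start", *trace, "End"]
        edges.update(zip(path, path[1:]))
    return edges

def accepts(edges, trace):
    path = ["Start", *trace, "End"]
    return all(pair in edges for pair in zip(path, path[1:]))

normal = [["sys_1", "sys_5", "sys_3"], ["sys_2", "sys_5", "sys_4"]]
edges = learn_edges(normal)

print(accepts(edges, ["sys_1", "sys_5", "sys_4"]))
print(accepts(edges, ["sys_2", "sys_5", "sys_3"]))
```

Both abnormal sequences are accepted: once the automaton is in the sys_5 state, it no longer knows whether it arrived from sys_1 or from sys_2.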

41

A Different Look at System-Call Based IDS’s

• The problem is recording every possible system call trace an application can produce.

• In doing so, other security issues such as code injection and mimicry attacks must be considered.

• The models we have seen are compact approximations for these infinite sets.

42

Back To Training Based Models

• The data attack can be detected in two ways:
  – Finer grained methods: Live analysis of variables, or checking predicates at branches.
  – Using training: Normal user sessions will not produce sequences seen with the data attack.
• Using finer grained models defeats the purpose of having system-call based IDSs.

43

Execution Path History

• Given a node v in the graph, the execution path will go through a number of branch instructions and loops before reaching this state.

• If we were able to keep track of the execution path that was taken up to node v, we could append that information to the node using training.

• In the data attack example, node Sys_3 would know that it should only be reached by an execution path that has been through node Sys_1.

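[Figure: the automaton Start → {Sys_1, Sys_2} → Sys_5 → {Sys_3, Sys_4} → End, with Sys_3 annotated with the history (Start, Sys_1, Sys_5) and Sys_4 annotated with the history (Start, Sys_2, Sys_5).]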

44

Obtaining Execution Path History

• One major tool in accomplishing this task could be null-call insertion.
  – It is used in the Dyck model to keep track of function call sites. The idea can be applied to every branch that issues a system call.

• A sequence of previously issued system calls can be appended to every node.

• During monitoring, the execution path history will be matched against the possible histories at every node.
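A naive version of history matching can be sketched as follows. It stores every complete history per node, which is only workable for toy examples; the function names are hypothetical and the traces are the normal sequences from the data attack example.

```python
from collections import defaultdict

def learn_histories(traces):
    """Map each node to the set of execution path histories seen before reaching it."""
    histories = defaultdict(set)
    for trace in traces:
        path = ["Start", *trace]
        for i in range(1, len(path)):
            histories[path[i]].add(tuple(path[:i]))
    return histories

def accepts(histories, trace):
    """Check at every node that the path taken so far is a known history of that node."""
    path = ["Start", *trace]
    return all(tuple(path[:i]) in histories[path[i]] for i in range(1, len(path)))

normal = [["sys_1", "sys_5", "sys_3"], ["sys_2", "sys_5", "sys_4"]]
histories = learn_histories(normal)

print(histories["sys_3"])
print(accepts(histories, ["sys_1", "sys_5", "sys_4"]))
```

The abnormal sequence is now rejected at sys_4, whose only known history passes through sys_2.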

45

Performance Considerations

• Every node will have a great number of possible histories that need to be tracked. Considering the size of today's applications, recording a list of these paths for every node is clearly not possible.

• Compact representations must be developed. For example, a node that has only a single incoming edge need not keep an entire record of histories, as its history is a single system call appended to its predecessor's execution path history.

• Also, not every node on the path is critical. If most of the execution paths have common substrings, there might be a way to extract the important information from the sequence.

46

Conclusion

• When it comes to real time intrusion detection, false positives are unacceptable. This has led researchers towards complicated models based on static analysis, such as Dyck and VPStatic.

• Yet, the data attacks show that even these models are not complete.

• Still, these models should not be underrated, since they can capture code insertion, stack corruption and impossible path attacks with no false positives.
