sampling techniques to accelerate pattern matching in network intrusion detection systems

Sampling Techniques to Accelerate Pattern

Matching in Network Intrusion Detection Systems

Author ： Domenico Ficara, Gianni Antichi, Andrea Di Pietro, Stefano Giordano, Gregorio Procissi, Fabio VitucciPublisher ： 2010 IEEE International Conference on Communications (ICC)Presenter ： Wen-Tse LiangDate ： 2011/5/4

Introduction Sampling DFAS REGEX SAMPLING RULES

Regex rewriting

DOUBLE STAGE SCHEME First stage: Sampled DFA Second stage: Reverse DFA

EXPERIMENTAL RESULTS

Outline

The previous works proposing acceleration techniques rely on multiplying the amount of bytes (strides) processed per cycle, with the obvious problem of memory blow-up (due to the exponential growth of edge numbers with the stride size).

Introduction

Our approach to the finite automata speed up is completely innovative: sampling the text, thus having less symbols to process.

Clearly, sampling introduces some issues and a certain probability of false alarms is introduced. We address these issues by using together a “sampled” DFA and a “reverse” DFA

Introduction

Our idea is to speed up the process by “sampling” the traffic stream:

we extract a byte every θ bytes from the stream, where θ is the sampling period.

The sampled bytes are then used as input to a proper sampled DFA. The outcome is that all regular traffic is processed θ times faster.

Sampling DFAS

Example: the regex ab.∗cd is sampled (with θ = 2) to [ab].∗[cd] and matched against a text of 16 bytes.

A Motivating Example

Lemma 1: Let DFA A describe a single regular

expression R and let a text T match R. The corresponding sampled DFA AS will match

the sampled text SθT if the sampling period θ satisfies the following condition:

REGEX SAMPLING RULES

Regex rewriting simple string str

concatenation of regular expressions a and b: ab

union of regular expressions a and b: a|b

the case of a star closure of a character a followed by a regex

an example helps better understand the rules: let us sample . ∗ abcde ∗ fgh with period θ =

2. By applying the rules, it follows that:

S2[.*ab.*cd] = .*(a|b).*(c|d) S2[.*abcde*fgh ] = .*(ac|bd)e*(fh|g)

S3[.*abcde*fgh ] = .*(ad|b|c)e*(f|g|h)

First stage: Sampled DFA

By sampling all the regexes belonging to the set, we obtain the “sampled” rules on which the “sampled DFA” has to be built.

Such a resulting automaton is a simple DFA and does not require additional information on the states or on the transitions.

DOUBLE STAGE SCHEME

Second stage: Reverse DFA we propose a novel scheme with a reverse DFA. This

requires a slightly larger amount of off-line processing: all the regexes have to be independently reversed and a new DFA has to be built according to such new rules.

More precisely, to take into account all the characters belonging to the string, the correct starting point for the reverse DFA is the (k+1)-th sampled char in the text:

This way we process some useless characters (less than θ), but the correctness of the detection in ensured.

DOUBLE STAGE SCHEME

Algorithm 1 Pseudo-code for the lookup procedure.

DOUBLE STAGE SCHEME

EXPERIMENTAL RESULTS

sampling techniques to accelerate pattern matching in network intrusion detection systems

sampled rules

sampled dfaby sampling

sampling period

sampled bytes

corresponding sampled

proper sampled dfa

sampling techniques

sampled dfasecond stage

Documents

towards faster string matching for intrusion detection

efficient packet pattern matching for gigabit network...

1 - cs7701 – fall 2004 deterministic memory-efficient...

how to leverage matching gifts to accelerate your...

masters thesis: an examination of pattern matching ... ·...

intrusion detection intrusion detection intrusion detection...

algorithms to accelerate multiple regular expressions...

honeywell intrusion systems honeywell intrusion systems

matching tcp/ip packet to detect stepping-stone intrusion

algorithms to accelerate multiple regular expressions...

multiple pattern matching for network security...

a digest and pattern matching-based intrusion detection...

algorithms to accelerate multiple regular expressions...

accelerate - the hon companyconnect.hon.com › documents...

host/hybrid intrusion detection - home | george mason...

virtualizing suricata* ips using hyperscan pattern...

deterministic memory- efficient string matching algorithms...

a high throughput string matching architecture for intrusion...

intrusion tolerance mete geleŞ. overview ...

intrusion detection & intrusion prevention systems