a hybrid finite automaton for practical deep packet inspection department of computer science and...

24
A hybrid finite automaton for practical deep packet inspection Department of Computer Science and Information Engineering National Cheng Kung University, Taiwan R.O.C. Authors: Michela Becchi; Patrick Crowley Publisher: ACM Conference on Emerging Network Experiment and Technology, CoNEXT 2007, New York, NY, USA, December 10-13, 2007. ACM 2007, ISBN 978- 1-59593-770-4 Present: Chia-Ming ,Chang Date: 4, 22, 2009 1

Post on 19-Dec-2015

216 views

Category:

Documents


1 download

TRANSCRIPT

Page 1: A hybrid finite automaton for practical deep packet inspection Department of Computer Science and Information Engineering National Cheng Kung University,

A hybrid finite automaton for practical deep packet

inspection

Department of Computer Science and Information Engineering National Cheng Kung University, Taiwan R.O.C.

Authors: Michela Becchi; Patrick Crowley

Publisher: ACM Conference on Emerging Network Experiment and Technology, CoNEXT 2007, New York, NY, USA, December 10-13, 2007. ACM 2007, ISBN 978-1-59593-770-4

Present: Chia-Ming ,Chang Date: 4, 22, 2009

1

Page 2: A hybrid finite automaton for practical deep packet inspection Department of Computer Science and Information Engineering National Cheng Kung University,

Outline

1. Introduction 2. State Blow-up 3. Hybrid-FA 4. Experimental results 5. concluding remarks

2

Page 3: A hybrid finite automaton for practical deep packet inspection Department of Computer Science and Information Engineering National Cheng Kung University,

Introduction (1/5)

3

For example:The regular expression “prefix.{100}suffix”, which matches ifand only if “prefix” is separated from “suffix” by 100characters,

“require well over 1 million states to be represented in a DFA.”

The regular expression above would require just 101+ (# charsin prefix) + (# chars in suffix) NFA states.

Page 4: A hybrid finite automaton for practical deep packet inspection Department of Computer Science and Information Engineering National Cheng Kung University,

Introduction (2/5)

4

Problem :DFAs corresponding to large sets of regularexpressions, each one representing a different rule, can beprohibitively large

First method: an explosion in states can occur when many rules are grouped together into a single DFA, Yu et al. have proposed segregating rules into multiple groups andevaluating the corresponding DFAs concurrently.

pro and con : This solution decreases memory space requirements, sometimesdramatically, but increases memory bandwidth linearly withthe number of active DFAs.

PS: The memory bandwidth requirement can be expressed interms of the number of memory operations to be performedfor each input character processed.

Page 5: A hybrid finite automaton for practical deep packet inspection Department of Computer Science and Information Engineering National Cheng Kung University,

Introduction (3/5)

5

Problem :DFAs corresponding to large sets of regularexpressions, each one representing a different rule, can beprohibitively large

Second method: aims at reducing the memory space requirement of any

given DFA and is based on two observations.

(A) the memory space required to store a DFA strictly depends on the number of transitions between states.

(B) many states in DFAs have identical sets of outgoing transitions. (remove redundancy)

pro and con : This solution compression technique proposed trades off memory storage requirements with processing time.

Page 6: A hybrid finite automaton for practical deep packet inspection Department of Computer Science and Information Engineering National Cheng Kung University,

Introduction (4/5)

6

we propose a hybrid DFA-NFA finiteautomaton (Hybrid-FA), a solution bringing together thestrengths of both DFAs and NFAs.

we are interested in two distinct situations:

a) State blow-up happens when compiling a single regular expression in isolation.

b) Given two regular expressions RE1 and RE2 the corresponding DFA1 and DFA2 can be built without incurring state explosion. However, when compiling RE1 and RE2 into a unique DFA12, either the number of states in DFA12 is significantly greater than the sum of DFA1 and DFA2, or DFA12 cannot be built at all, due to exponential state explosion.

Page 7: A hybrid finite automaton for practical deep packet inspection Department of Computer Science and Information Engineering National Cheng Kung University,

Introduction (5/5)

3

we are interested in two distinct situations:

a) =>DFAs are not a feasible representation of the given regular expression

b) =>keeping the two DFAs separated and operating them in parallel However, given a set of regular expressions, it would be beneficial to

be able to predict this situation without testing all possible combinations.

The goals of this work are the following:• Explore two distinct conditions which lead to the state blow up during subset construction.• Propose a hybrid automaton which deals with the above problems in a unified way.• Refine the proposal in order to provide an acceptable worse case bound on the memory bandwidth requirement.

Page 8: A hybrid finite automaton for practical deep packet inspection Department of Computer Science and Information Engineering National Cheng Kung University,

Outline

1. Introduction 2. State Blow-up 3. Hybrid-FA 4. Experimental results 5. concluding remarks

7

Page 9: A hybrid finite automaton for practical deep packet inspection Department of Computer Science and Information Engineering National Cheng Kung University,

State Blow-up (1/4)

8

In practical rule-sets, dot-star conditions do not causestate blow-up when individual regular expressions arecompiled in isolation.

Figure1, which compares DFAs accepting regular expressions abcd and ab.*cd,

It can be observed that the number of states is the same in the two cases; the transitions are “moved toward” the tail of the DFA in thesecond one.

Page 10: A hybrid finite automaton for practical deep packet inspection Department of Computer Science and Information Engineering National Cheng Kung University,

State Blow-up (2/4)

9

Figure 2 illustrates this situation in the composite DFAfor regular expressions “ab.*cd” and “efgh”. Notice that thesub-DFA matching “efgh” is replicated: first in states 2, 4, 7and 10, and second in states 6, 9, 11 and 12. The secondreplica originates from state 3, which derives fromexpanding the dot-star condition.

Page 11: A hybrid finite automaton for practical deep packet inspection Department of Computer Science and Information Engineering National Cheng Kung University,

State Blow-up (3/4)

10

Figure 3 illustrates this fact on the simple regular expression“ab.{3}cd”. Clearly, the size of the DFA grows rapidly withthe cardinality of the counting constraints

Page 12: A hybrid finite automaton for practical deep packet inspection Department of Computer Science and Information Engineering National Cheng Kung University,

State Blow-up (4/4)

11

In contrast to dot-star terms, counting constraints onwildcards and large character ranges cause exponentialstate blow-up when creating DFAs even for single regularexpressions.

This can be explained as follows: when expanding the counting constraint, all possible occurrences of the regular expression prefix must be considered at eachinstance of the wildcard or character range.

Page 13: A hybrid finite automaton for practical deep packet inspection Department of Computer Science and Information Engineering National Cheng Kung University,

Outline

1. Introduction 2. State Blow-up 3. Hybrid-FA 4. Experimental results 5. concluding remarks

12

Page 14: A hybrid finite automaton for practical deep packet inspection Department of Computer Science and Information Engineering National Cheng Kung University,

Hybrid-FA (1/5)

13

processing symbol a in NFA state 0 leads to NFA states 0 and 1, processing the same characterin (DFA) state 0 of the hybrid automaton leads to a (DFA)state tagged 0-1.

It can be noted that the border state 0-2-5has two distinct outgoing transitions on character c:

Border state

NFA-Part

DFA-Part

Page 15: A hybrid finite automaton for practical deep packet inspection Department of Computer Science and Information Engineering National Cheng Kung University,

Hybrid-FA (2/5)

14

resulting hybrid-FA will exhibit some useful properties. Specifically:

i) the starting state will be a DFA-state;

ii) the NFA part of the automaton will remain inactive till a border state is reached; iii) There will be no backwards activation of the DFA coming from the NFA.

Page 16: A hybrid finite automaton for practical deep packet inspection Department of Computer Science and Information Engineering National Cheng Kung University,

Hybrid-FA (3/5)

15

Page 17: A hybrid finite automaton for practical deep packet inspection Department of Computer Science and Information Engineering National Cheng Kung University,

Hybrid-FA (4/5)

16

As long as the border state 2 is not traversed. In other words, as long as the prefix of the first regular expression “ab” is not matched, processing is restricted to the DFA portion.

Page 18: A hybrid finite automaton for practical deep packet inspection Department of Computer Science and Information Engineering National Cheng Kung University,

Hybrid-FA (5/5)

17

Page 19: A hybrid finite automaton for practical deep packet inspection Department of Computer Science and Information Engineering National Cheng Kung University,

Outline

1. Introduction 2. State Blow-up 3. Hybrid-FA 4. Experimental results 5. concluding remarks

18

Page 20: A hybrid finite automaton for practical deep packet inspection Department of Computer Science and Information Engineering National Cheng Kung University,

Experimental results (1/3)

19

982 distinct regular expressions: 25% contain longcounting constraints, generally located at the end of theregular expressions, 11.4% contain .* conditions and54.89% [^c1c2...ck]* conditions.

The Snort IDS performs packet payload inspectiononly after header filtering (i.e. packet classification).

we clustered rules with common header andperformed experiments on some of the largest groups.

Page 21: A hybrid finite automaton for practical deep packet inspection Department of Computer Science and Information Engineering National Cheng Kung University,

Experimental results (2/3)

20

Rule-sets group1 and group2 do not contain counting constraints,

Rule-sets group3-group6 of counting constraints encountered consist of 20 to 1024 repetitions of large character ranges;

Over 10 million states

Page 22: A hybrid finite automaton for practical deep packet inspection Department of Computer Science and Information Engineering National Cheng Kung University,

Experimental results (3/3)

21

The memory bandwidth requirement can be expressed interms of the number of memory operations to be performedfor each input character processed.

Page 23: A hybrid finite automaton for practical deep packet inspection Department of Computer Science and Information Engineering National Cheng Kung University,

Outline

1. Introduction 2. State Blow-up 3. Hybrid-FA 4. Experimental results 5. concluding remarks

22

Page 24: A hybrid finite automaton for practical deep packet inspection Department of Computer Science and Information Engineering National Cheng Kung University,

concluding remarks (1/1)

23

The primary contribution of this work is the hybrid-FA,which is, to our knowledge, the first automaton that iscapable of evaluating all the regular-expression types foundin common NIDS systems such as Snort and can beimplemented efficiently in practical high-speed systems.

The key characteristics of a hybrid-FA are: a modestmemory storage requirement comparable to those of anNFA solution, an average case memory bandwidthrequirement similar to that of a single DFA solution