Memory-Efficient Regular Expression Search Using State Merging
Authors: Michela Becchi, Srihari Cadambi
Publisher: INFOCOM 2007. 26th IEEE International Conference on Computer Communications. IEEE
Presenter: Ching-Hsuan Shih
Date: 2014/04/09
Department of Computer Science and Information Engineering National Cheng Kung University, Taiwan R.O.C.
Outline
• Introduction
• Related Work
• State Merging: A Motivational Example
• State Merging in DFAs
• Bitmap-based Data Structures for DFAs
• Experimental Results
National Cheng Kung University CSIE Computer & Internet Architecture Lab
Introduction (1/2)
Network Intrusion Detection System (NIDS)
• A device or software application that monitors a network for malicious activity.
• Most IDSs observe network packets, system logs, or network flows.
Regular Expression
• Current rule-sets such as Snort, Bro, and many others are replacing strings with the more powerful and expressive regular expressions.
Introduction (2/2)
The classical method to perform regular expression search is to use a deterministic finite automaton (DFA).
The main problem with DFAs is prohibitive memory usage:
• The number of states in a DFA scales poorly with the size and number of wildcards in the regular expressions that the DFA represents.
We propose a novel technique that allows non-equivalent states in a DFA to be merged using a scheme where the transitions in the DFA are labeled.
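As a minimal sketch of table-driven DFA search, the following builds a toy automaton that detects the substring "ab" in a byte stream. The state numbering, the `build_table` helper, and the 256-entry rows are illustrative assumptions, not taken from the paper. The sketch shows the appeal of a DFA (exactly one table lookup per input byte) and the source of the memory cost (every state carries a next-state entry for every possible byte).

```python
ALPHABET_SIZE = 256  # assumption: one table column per byte value

def build_table():
    """Toy DFA for 'input contains the substring ab'.

    State 0: nothing matched, state 1: last byte was 'a',
    state 2: 'ab' has been seen (accepting and absorbing).
    """
    table = [[0] * ALPHABET_SIZE for _ in range(3)]
    for state in (0, 1):
        table[state][ord("a")] = 1      # an 'a' always (re)starts a match
    table[1][ord("b")] = 2              # 'a' followed by 'b' completes it
    table[2] = [2] * ALPHABET_SIZE      # stay in the accepting state
    return table

def dfa_search(table, data: bytes) -> bool:
    """Scan the input with one table lookup per byte."""
    state = 0
    for byte in data:
        state = table[state][byte]
    return state == 2

def table_entries(table) -> int:
    """Number of stored next-state entries (states x alphabet size)."""
    return sum(len(row) for row in table)
```

Even this three-state automaton stores 3 × 256 entries, which is why reducing the number of states (or the entries per state) matters for large rule-sets.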
Related Work
Delayed DFA (D2FA) [6]:
• It identifies two (or more) states that transition to the same set of destinations on the same input characters.
• D2FA achieves memory compaction by removing duplicated transitions, but this happens at the expense of latency.
• States with a default transition require more than one transition per input character.
In [14]:
• The authors propose increasing the speed of regular expression search by expanding the alphabet.
• They process two characters (bytes) for every state transition in the DFA.
• This produces an exponential increase in memory usage.
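The latency cost of default transitions noted for D2FA can be illustrated with a small sketch. The automaton below is hypothetical (a three-letter alphabet `a`, `b`, `c`, with made-up states and the names `LABELED`, `DEFAULT`, `step`, `run`), and it is not the construction algorithm from [6]; it only shows the lookup behaviour: a state stores just the transitions that differ from its default state, and a missing entry forces an extra hop that consumes no input.

```python
# Hypothetical compressed automaton over the alphabet {a, b, c}.
# Each state keeps only the transitions that differ from its default state.
LABELED = {
    0: {"a": 1, "b": 0, "c": 0},  # root keeps a full row, so lookups always terminate
    1: {"b": 2},                  # all other characters are inherited from state 0
    2: {"c": 3},
    3: {},
}
DEFAULT = {1: 0, 2: 0, 3: 0}      # default transition of each non-root state

def step(state, ch):
    """Return (next_state, lookups) for one input character."""
    lookups = 1
    while ch not in LABELED[state]:
        state = DEFAULT[state]    # follow the default transition; no input consumed
        lookups += 1
    return LABELED[state][ch], lookups

def run(text):
    """Process the text and return (final_state, total_lookups)."""
    state, total = 0, 0
    for ch in text:
        state, n = step(state, ch)
        total += n
    return state, total

def stored_transitions():
    """Count the labeled transitions actually stored."""
    return sum(len(row) for row in LABELED.values())
```

In this toy automaton 5 labeled transitions are stored instead of the 12 a full table would need, but `run("aab")` takes 4 lookups for 3 characters, because the second `a` arrives in state 1, which has no entry for it and falls back to state 0.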
State Merging: A Motivational Example (1/4)
State Merging: A Motivational Example (2/4)
The merged state is represented as 3_4.
The transition [g-i]/0, j/1 indicates that the same next state, in this case state 5, is reached from state 3_4 upon receiving input characters g, h, i with label 0 or input character j with label 1.
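One way to picture the merged state, as a sketch only, is to key each outgoing transition on the pair (input character, label). Only the transitions [g-i]/0 and j/1 to state 5 come from the slide; the dictionary layout, the function name, and the reading of label 0 as original state 3 and label 1 as original state 4 are assumptions made for illustration.

```python
# Outgoing transitions of merged state 3_4, keyed by (input character, label).
# Assumption: label 0 identifies original state 3, label 1 original state 4.
STATE_3_4 = {}
for ch in "ghi":
    STATE_3_4[(ch, 0)] = 5        # [g-i]/0 -> state 5
STATE_3_4[("j", 1)] = 5           # j/1     -> state 5

def next_from_3_4(ch, label):
    """Return the next state, or None if the slide lists no transition for this pair."""
    return STATE_3_4.get((ch, label))
```

The label disambiguates which original state the merged state is standing in for: `j` leads to state 5 only when the label is 1, and `g`, `h`, `i` only when it is 0.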
State Merging: A Motivational Example (3/4)
State Merging: A Motivational Example (4/4)
The merged state is represented as 1_2.
The transition a.0/0,1 from state 3_4 to state 1_2 means:
• The transition carries with it a label 0 that tells its destination state, 1_2, that the transition is meant for underlying original state 1.
• The transition is taken when its source state 3_4 receives labels 0 or 1.
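The two bullets above can be sketched as a traversal step that carries a label from one merged state to the next. The `LabeledTransition` record, the `TRANSITIONS` table, and the `step` function are hypothetical names introduced here; only the single transition a.0/0,1 from 3_4 to 1_2 is taken from the slide, and the rest of the automaton is deliberately left out.

```python
from typing import NamedTuple, Optional, Tuple

class LabeledTransition(NamedTuple):
    char: str                 # input character that triggers the transition
    dest_label: int           # label handed to the destination state
    source_labels: frozenset  # labels under which the source state may take it
    target: str               # name of the destination (merged) state

# Only the transition shown on the slide: a.0/0,1 from 3_4 to 1_2.
TRANSITIONS = {
    "3_4": [LabeledTransition("a", 0, frozenset({0, 1}), "1_2")],
}

def step(state: str, label: int, ch: str) -> Optional[Tuple[str, int]]:
    """Return (next_state, next_label), or None if no listed transition applies."""
    for t in TRANSITIONS.get(state, []):
        if t.char == ch and label in t.source_labels:
            return t.target, t.dest_label
    return None
```

Whether state 3_4 was entered with label 0 or label 1, reading `a` moves to 1_2 carrying label 0, so the next lookup in 1_2 behaves as original state 1 would.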
State Merging in DFAs (1/3)
A. Labels
For every transition connecting two merged states, we define source labels and destination labels, e.g. c.ld/l0, l1…
B. Legality of State Merging
State Merging in DFAs (2/3)
C. Merging and Labeling Algorithm
State Merging in DFAs (3/3)
Bitmap-based Data Structure for DFAs (1/3)
Basic:
Bitmap-based Data Structure for DFAs (2/3)
Bitmap-based:
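The figure for this slide is not reproduced in the transcript, so the following is only a sketch of one common bitmap scheme, assumed here for illustration rather than taken from the paper. Each state keeps a 256-bit bitmap with a bit set wherever the next state differs from that of the previous character, plus a short list of the distinct next states; a lookup counts the set bits up to the input character to index into that list. The function names and the 32-bit entry size used in the size estimate are assumptions.

```python
ALPHABET_SIZE = 256
ENTRY_BYTES = 4  # assumption: 32-bit next-state entries

def compress_row(row):
    """Compress one 256-entry next-state row into (bitmap, distinct_next_states)."""
    bitmap = 0
    distinct = []
    prev = None
    for c, nxt in enumerate(row):
        if c == 0 or nxt != prev:   # a new run of identical next states starts here
            bitmap |= 1 << c
            distinct.append(nxt)
        prev = nxt
    return bitmap, distinct

def lookup(bitmap, distinct, c):
    """Next state for character code c: count set bits in positions 0..c."""
    mask = (1 << (c + 1)) - 1
    ones = bin(bitmap & mask).count("1")
    return distinct[ones - 1]       # bit 0 is always set, so ones >= 1

def row_bytes_full():
    """Size of an uncompressed row."""
    return ALPHABET_SIZE * ENTRY_BYTES

def row_bytes_compressed(distinct):
    """Size of the bitmap plus the list of distinct next states."""
    return ALPHABET_SIZE // 8 + len(distinct) * ENTRY_BYTES
```

Under these assumptions, a row whose next state changes only around a short character range (for example a row that goes to one state on `a` to `c` and to another state everywhere else) shrinks from 1024 bytes to a 32-byte bitmap plus three entries. Rows dominated by character ranges compress well under this kind of scheme because long runs of identical next states cost a single entry.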
Bitmap-based Data Structure for DFAs (3/3)
Bitmap-based merged data structure:
Experimental Results (1/2)
Note that the Snort rule-sets have lower percentages of distinct next state transitions than the Bro rule-sets. This is due to the large number of character ranges (both in the form [c1-c2] and \d, \D, \w, \W, \s, \S) and to the fact that Snort regular expressions are not case sensitive.
Experimental Results (2/2)
The width of the transition table is set to 32 bits.