Memory-Efficient Regular Expression Search Using State Merging

Authors: Michela Becchi, Srihari Cadambi
Publisher: INFOCOM 2007. 26th IEEE International Conference on Computer Communications. IEEE
Presenter: Ching-Hsuan Shih
Date: 2014/04/09
Department of Computer Science and Information Engineering, National Cheng Kung University, Taiwan R.O.C.

Outline
• Introduction
• Related Work
• State Merging: A Motivational Example
• State Merging in DFAs
• Bitmap-based Data Structures for DFAs
• Experimental Results

National Cheng Kung University CSIE Computer & Internet Architecture Lab

Introduction (1/2)

Network Intrusion Detection System (NIDS)
• A device or software application that monitors a network for malicious activity.
• Most IDSs observe network packets, system logs, or network flows.

Regular Expression
• Current rule-sets such as Snort, Bro, and many others are replacing strings with more powerful and expressive regular expressions.


Introduction (2/2)

The classical method to perform regular expression search is to use a deterministic finite automaton (DFA).

The main problem with DFAs is prohibitive memory usage:
• The number of states in a DFA scales poorly with the size and number of wildcards in the regular expressions it represents.

We propose a novel technique that allows non-equivalent states in a DFA to be merged using a scheme where the transitions in the DFA are labeled.


Related Work


Delayed DFA (D2FA) [6]:
• It identifies two (or more) states that transition to the same set of destinations on the same input characters, and replaces the shared transitions with a single default transition.
• D2FA achieves memory compaction by removing duplicated transitions, but at the expense of latency.
• States with a default transition may require more than one transition per input character.

In [14]:
• The authors propose increasing the speed of regular expression search by expanding the alphabet.
• They process two characters (bytes) for every state transition in the DFA.
• This produces an exponential increase in memory usage.
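
Returning to the default transitions used by D2FA, the latency cost can be seen in a small sketch. This is only an illustration, not the D2FA construction from [6]: the state names, the `labeled` and `default` dictionaries, and the `step` helper are all hypothetical. A state keeps only the transitions that differ from its default state and follows the default pointer otherwise, so one input character may cost several lookups.

```python
# Hypothetical two-state fragment: state "B" stores only the transition that
# differs from state "A" and defers everything else to "A" via a default pointer.
labeled = {
    "A": {"a": "A", "b": "B", "c": "A"},  # fully specified state
    "B": {"c": "B"},                       # only the differing transition
}
default = {"B": "A"}  # default transition: B falls back to A


def step(state, char):
    """Return (next_state, lookups) for one input character."""
    lookups = 1
    while char not in labeled[state]:
        state = default[state]  # follow the default transition
        lookups += 1
    return labeled[state][char], lookups


def stored_transitions():
    """Count the transitions actually kept in memory (labeled plus default)."""
    return sum(len(t) for t in labeled.values()) + len(default)
```

In this toy fragment, five transitions are stored instead of the six a full table would need, but reading `a` in state `B` takes two lookups rather than one.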


State Merging: A Motivational Example (1/4)


State Merging: A Motivational Example (2/4)


The merged state is represented as 3_4.

The transition [g-i]/0, j/1 indicates that the same next state, in this case state 5, is reached from state 3_4 upon receiving input character g, h, or i with label 0, or input character j with label 1.
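
As a rough sketch of how such a labeled transition could be looked up, each entry of the merged state can be keyed by the input character together with the label under which it applies. The dictionary layout and the `next_state` helper are assumptions made for illustration, not the paper's data structure, and only the transitions named above are encoded.

```python
# Transitions leaving merged state 3_4, taken from the example above:
# [g-i]/0 and j/1 both lead to state 5.
# Key: (input character, label held by 3_4); value: next state.
transitions_3_4 = {}
for ch in "ghi":
    transitions_3_4[(ch, 0)] = 5  # [g-i] with label 0
transitions_3_4[("j", 1)] = 5     # j with label 1


def next_state(char, label):
    """Return the next state of 3_4, or None if no transition is listed here."""
    return transitions_3_4.get((char, label))
```

The same character can therefore lead to different outcomes depending on the label: `j` reaches state 5 only when 3_4 holds label 1.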


State Merging: A Motivational Example (3/4)


State Merging: A Motivational Example (4/4)


The merged state is represented as 1_2.

The transition a.0/0,1 from state 3_4 to state 1_2 means:
• The transition carries a label 0 that tells its destination state, 1_2, that the transition is meant for the underlying original state 1.
• The transition is taken when its source state 3_4 has received label 0 or 1.
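
The two roles of the labels can be sketched as a single step function. This is a minimal sketch under assumptions: the `LabeledTransition` type, the `TRANSITIONS` table, and the `step` helper are hypothetical names, and only the one transition described above is encoded.

```python
from typing import FrozenSet, NamedTuple, Optional, Tuple


class LabeledTransition(NamedTuple):
    char: str                      # input character, e.g. "a"
    dest_label: int                # label handed to the destination state
    source_labels: FrozenSet[int]  # labels under which the transition fires
    dest_state: str                # merged destination state


# a.0/0,1 from state 3_4 to state 1_2
TRANSITIONS = {
    "3_4": [LabeledTransition("a", 0, frozenset({0, 1}), "1_2")],
}


def step(state: str, label: int, char: str) -> Optional[Tuple[str, int]]:
    """Advance one character; return (next state, next label) or None."""
    for t in TRANSITIONS.get(state, []):
        if t.char == char and label in t.source_labels:
            return t.dest_state, t.dest_label
    return None
```

The search therefore tracks a (state, label) pair rather than a state alone: whichever label 3_4 holds, reading `a` moves to 1_2 with label 0.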


State Merging in DFAs (1/3)


A. Labels
For every transition connecting two merged states, we define source labels and destination labels, e.g. c.ld/l0, l1…
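
For illustration, this notation can be parsed mechanically. The sketch below assumes the form c.ld/l0,l1,... with a single input character c, an integer destination label ld, and integer source labels; the `parse_labeled_transition` helper is a hypothetical name and not part of the paper.

```python
import re

# One character, a dot, the destination label, a slash, then the source labels.
_PATTERN = re.compile(r"^(?P<char>.)\.(?P<dest>\d+)/(?P<src>\d+(?:\s*,\s*\d+)*)$")


def parse_labeled_transition(text):
    """Split 'c.ld/l0,l1,...' into (char, destination label, source labels)."""
    m = _PATTERN.match(text.strip())
    if m is None:
        raise ValueError(f"not a labeled transition: {text!r}")
    sources = tuple(int(s) for s in m.group("src").split(","))
    return m.group("char"), int(m.group("dest")), sources
```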

B. Legality of State Merging


State Merging in DFAs (2/3)


C. Merging and Labeling Algorithm


State Merging in DFAs (3/3)


Bitmap-based Data Structure for DFAs (1/3)

Basic:


Bitmap-based Data Structure for DFAs (2/3)

Bitmap-based:


Bitmap-based Data Structure for DFAs (3/3)

Bitmap-based merged data structure:


Experimental Results (1/2)

Note that the Snort rule-sets have lower percentages of distinct next state transitions than the Bro rule-sets. This is due to the large number of character ranges (both in the form [c1-c2] and \d, \D, \w, \W, \s, \S) and to the fact that Snort regular expressions are not case sensitive.
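
One way to see the effect of character ranges and case insensitivity is to count the distinct next states in a single state's transition row. The sketch below is illustrative only: the rows, the target state numbers, and the `build_row` and `distinct_fraction` helpers are assumptions, and the metric reported in the experiments may be defined differently.

```python
import string


def build_row(chars, target, default=0, alphabet_size=256):
    """Next-state row: `target` for the given characters, `default` elsewhere."""
    row = [default] * alphabet_size
    for ch in chars:
        row[ord(ch)] = target
    return row


def distinct_fraction(row):
    """Fraction of the row's entries that are distinct next states."""
    return len(set(row)) / len(row)


# A state with one literal character per branch (hypothetical targets 1, 2, 3).
literal_row = build_row("a", 1)
literal_row[ord("b")] = 2
literal_row[ord("c")] = 3

# A state matching the case-insensitive range [a-z] with a single target.
range_row = build_row(string.ascii_letters, 1)
```

Here the range row sends 52 characters to the same next state, so it holds only two distinct next states, while the literal row holds four.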


Experimental Results (2/2)

The width of the transition table is set to 32 bits.
