
Post on 22-Jun-2020


Optimization of Pattern Matching Algorithm for Memory Based Architecture

Cheng-Hung Lin, Yu-Tang Tai, and Shih-Chieh Chang

National Tsing Hua University, Taiwan, R.O.C

Outline

Memory architecture for string matching

Basic idea

Novel Algorithm for memory architecture

Experimental results and conclusions

Introduction

A Network Intrusion Detection System (NIDS) detects network attacks by identifying attack patterns.

Software-only approaches can no longer meet the high throughput demands of today’s networks.

Hardware approaches are used for acceleration:
– Logic architecture
– Memory architecture

Advantage of Memory Architecture

Young H. Cho and William H. Mangione-Smith, “A Pattern Matching Co-processor for Network Security,” in Proc. 42nd IEEE/ACM Design Automation Conference, Anaheim, CA, June 13-17, 2005.

M. Aldwairi, T. Conte, and P. Franzon, “Configurable String Matching Hardware for Speeding up Intrusion Detection,” ACM SIGARCH Computer Architecture News, 33(1):99–107, 2005.

S. Dharmapurikar and J. Lockwood, “Fast and Scalable Pattern Matching for Content Filtering,” in Proc. Symposium on Architectures for Networking and Communications Systems (ANCS), Oct. 2005.

The memory architecture has attracted a lot of attention because of its easy re-configurability and scalability.

Memory Architecture

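[Figure: FSM for the attack patterns “bcdf” and “pcdg” (states 0 to 8) stored in memory. A decoder selects the memory row of the current state; each row holds 256 next-state entries (NS1 … NS256, 8 bits each) and a 16-bit match vector (MV). The input byte drives an 8-bit-wide 256:1 multiplexer that selects the next state.]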
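The lookup in the figure can be sketched in a few lines of Python. This is a minimal illustration, not the authors' hardware: the names `ROWS`, `next_state`, and `scan` are hypothetical, the table is built by brute force from the pattern prefixes, and the entries are plain Python integers rather than packed 8-bit and 16-bit fields.

```python
PATTERNS = ["bcdf", "pcdg"]  # the two attack patterns from the slide

# States are the distinct prefixes of the patterns; state 0 is the empty prefix.
prefixes = [""]
for pat in PATTERNS:
    for k in range(1, len(pat) + 1):
        if pat[:k] not in prefixes:
            prefixes.append(pat[:k])
state_of = {p: s for s, p in enumerate(prefixes)}


def next_state(prefix: str, byte: int) -> int:
    """Longest suffix of prefix+byte that is itself a pattern prefix."""
    text = prefix + chr(byte)
    for start in range(len(text) + 1):
        if text[start:] in state_of:
            return state_of[text[start:]]
    return 0  # not reached: the empty suffix always maps to state 0


# One memory row per state: 256 next-state entries plus a match vector
# with one bit per pattern (bit i is set when pattern i ends in this state).
ROWS = []
for p in prefixes:
    ns = [next_state(p, b) for b in range(256)]
    mv = sum(1 << i for i, pat in enumerate(PATTERNS) if p.endswith(pat))
    ROWS.append((ns, mv))


def scan(data: bytes):
    """Return (end_index, match_vector) for every match found in data."""
    state, hits = 0, []
    for i, byte in enumerate(data):
        ns, _ = ROWS[state]   # the decoder selects the row of the current state
        state = ns[byte]      # the 256:1 multiplexer picks the next state
        if ROWS[state][1]:
            hits.append((i, ROWS[state][1]))
    return hits
```

Each input byte costs exactly one row read, which is why the memory size, rather than the logic, dominates this architecture.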

Major Issue of Memory Architecture

Due to the increasing number of attacks, the required memory increases tremendously.
– The performance, cost, and power consumption are related to the memory size.
– Reducing the memory size has become imperative.


Review of Aho-Corasick Algorithm

The Aho-Corasick (AC) algorithm reduces the number of state transitions and the memory size.
– Solid lines represent valid transitions.
– Dotted lines represent failure transitions.
– Failure transitions are introduced to reduce the number of outgoing transitions.

[Figure: AC state machine of “bcdf” and “pcdg”. From state 0, states 1 to 4 follow b, c, d, f and states 5 to 8 follow p, c, d, g.]
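The AC state machine for the two patterns can be reproduced with the textbook construction. The sketch below is a generic Aho-Corasick implementation, not code from the presentation; the function names `build_ac` and `search` are illustrative, and the match vector is kept as an integer bit mask (bit i for pattern i).

```python
from collections import deque


def build_ac(patterns):
    """Build valid (goto) transitions, failure transitions, and match vectors."""
    goto = [{}]   # goto[s][ch] -> next state (solid lines)
    out = [0]     # match vector of each state
    for idx, pat in enumerate(patterns):
        s = 0
        for ch in pat:
            if ch not in goto[s]:
                goto.append({})
                out.append(0)
                goto[s][ch] = len(goto) - 1
            s = goto[s][ch]
        out[s] |= 1 << idx

    fail = [0] * len(goto)             # failure transitions (dotted lines)
    queue = deque(goto[0].values())    # depth-1 states fail to state 0
    while queue:
        r = queue.popleft()
        for ch, s in goto[r].items():
            queue.append(s)
            f = fail[r]
            while f and ch not in goto[f]:
                f = fail[f]
            fail[s] = goto[f].get(ch, 0)
            out[s] |= out[fail[s]]
    return goto, fail, out


def search(text, goto, fail, out):
    """Return (end_index, match_vector) for every match in text."""
    s, hits = 0, []
    for i, ch in enumerate(text):
        while s and ch not in goto[s]:
            s = fail[s]
        s = goto[s].get(ch, 0)
        if out[s]:
            hits.append((i, out[s]))
    return hits
```

For “bcdf” and “pcdg” this yields nine states (0 to 8) with every failure transition pointing to state 0, matching the figure.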

Observation

Many string patterns are similar because of common sub-strings.
The similarity does not lead to a small state machine.

[Figure: AC state machine of “bcdf” and “pcdg”; the common sub-string “cd” still occupies separate states on each path.]

Merge Similar States

The merg_FSM is a different machine:
– a smaller number of states and transitions.
– a smaller memory in the memory architecture.

[Figure: the AC state machine (states 0 to 8) next to the merg_FSM, in which states 2 and 6 are merged into state 26 and states 3 and 7 into state 37.]

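Problem of merg_FSM

Directly merging similar states results in an erroneous state machine.

[Figure: AC state machine and merg_FSM traversed with the input stream {p, c, d, f}. The AC state machine reports no match, but the merg_FSM follows p, c, d, f through states 5, 26, and 37 to the final state 4 of “bcdf”: a false positive.]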
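The false positive is easy to reproduce. The following sketch hard-codes the merg_FSM from the slide as a Python dictionary (the names `MERG_FSM`, `FINAL`, and `run_naive` are illustrative) and relies on the fact that, in this small example, every failure transition leads back to state 0.

```python
# merg_FSM of "bcdf" and "pcdg": states 2/6 and 3/7 are merged into 26 and 37.
MERG_FSM = {
    0: {"b": 1, "p": 5},
    1: {"c": 26},
    5: {"c": 26},
    26: {"d": 37},
    37: {"f": 4, "g": 8},
    4: {},
    8: {},
}
FINAL = {4: "bcdf", 8: "pcdg"}  # final state -> pattern it claims to match


def run_naive(text):
    """Traverse the merged machine with no path tracking at all."""
    state, reported = 0, []
    for i, ch in enumerate(text):
        if ch not in MERG_FSM[state]:
            state = 0  # every failure transition in this example leads to state 0
        state = MERG_FSM[state].get(ch, 0)
        if state in FINAL:
            reported.append((i, FINAL[state]))
    return reported
```

Calling `run_naive("pcdf")` reports “bcdf” at index 3 although “bcdf” never occurs in the input, because the merged states no longer remember which path entered them.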


State Traversal Mechanism

The merg_FSM table is stored in memory.
A state traversal mechanism is used to memorize the precedent state and differentiate merged states.

[Figure: merg_FSM with the state traversal mechanism next to the AC state machine; in the merged state 26 the mechanism must tell whether the traversal corresponds to the original state 2 or state 6.]

New State Information

The AC state machine stores a match vector in each state.
The new state machine stores:
– pathVec, which stores path information.
– ifFinal, which indicates whether the state is a final state.

[Figure: AC state machine with match vectors. State 4 stores 01, state 8 stores 10, and all other states store 00.]

[Figure: new state machine with pathVec_ifFinal labels. State 0: 11_0; states 1 to 3: 01_0; state 4: 01_1; states 5 to 7: 10_0; state 8: 10_1.]
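A minimal sketch of how the pathVec and ifFinal labels could be derived while inserting the patterns into the state machine. The function names are hypothetical, and the sketch assumes the simple case shown on the slide, where bit i of pathVec is set for every state on the path of pattern i and no pattern is a suffix of another path.

```python
def build_path_info(patterns):
    """Return (children, path_vec, if_final) for the pattern trie."""
    children = [{}]
    path_vec = [(1 << len(patterns)) - 1]  # state 0 lies on every pattern's path
    if_final = [0]
    for idx, pat in enumerate(patterns):
        s = 0
        for ch in pat:
            if ch not in children[s]:
                children.append({})
                path_vec.append(0)
                if_final.append(0)
                children[s][ch] = len(children) - 1
            s = children[s][ch]
            path_vec[s] |= 1 << idx        # pattern idx passes through state s
        if_final[s] = 1                    # pattern idx ends in state s
    return children, path_vec, if_final


def label(path_vec, if_final, width):
    """Format a state label as on the slide, for example '01_1'."""
    return f"{path_vec:0{width}b}_{if_final}"
```

For “bcdf” and “pcdg”, states 0 to 8 receive the labels 11_0, 01_0, 01_0, 01_0, 01_1, 10_0, 10_0, 10_0, 10_1, which agrees with the figure.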

Pseudo-Equivalent States

Definition: Two states are pseudo-equivalent if they have
– identical input transitions,
– identical failure transitions,
– identical ifFinal,
– but different next states.

[Figure: new state machine with pathVec_ifFinal labels; states 2 and 6, and states 3 and 7, are pseudo-equivalent.]

Merge Pseudo-Equivalent States

Pseudo-equivalent states are merged.
pathVec and ifFinal are updated by a union over the merged states.

[Figure: before merging, states 1 to 4 carry 01_0, 01_0, 01_0, 01_1 and states 5 to 8 carry 10_0, 10_0, 10_0, 10_1. After merging, states 2 and 6 become state 26 and states 3 and 7 become state 37, both labeled 11_0.]
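The merge itself can be sketched as a renaming of states followed by a bitwise OR of the stored vectors. The tables below are hand-copied from the two-pattern example, and `MERGE` simply lists the pseudo-equivalent pairs rather than discovering them; all names are illustrative.

```python
# New state machine of "bcdf" and "pcdg" before merging.
GOTO = {
    0: {"b": 1, "p": 5},
    1: {"c": 2}, 2: {"d": 3}, 3: {"f": 4}, 4: {},
    5: {"c": 6}, 6: {"d": 7}, 7: {"g": 8}, 8: {},
}
PATH_VEC = {0: 0b11, 1: 0b01, 2: 0b01, 3: 0b01, 4: 0b01,
            5: 0b10, 6: 0b10, 7: 0b10, 8: 0b10}
IF_FINAL = {s: int(s in (4, 8)) for s in GOTO}
MERGE = {2: 26, 6: 26, 3: 37, 7: 37}  # pseudo-equivalent pairs -> merged state


def merge_states(goto, path_vec, if_final, merge):
    """Rename merged states and union (bitwise OR) their pathVec and ifFinal."""
    def rename(s):
        return merge.get(s, s)

    new_goto, new_vec, new_final = {}, {}, {}
    for s, edges in goto.items():
        m = rename(s)
        row = new_goto.setdefault(m, {})
        for ch, target in edges.items():
            row[ch] = rename(target)
        new_vec[m] = new_vec.get(m, 0) | path_vec[s]
        new_final[m] = new_final.get(m, 0) | if_final[s]
    return new_goto, new_vec, new_final
```

The result has seven states, and the merged states 26 and 37 both carry pathVec 11, as on the slide.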

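State Traversal Mechanism

PreReg traces the precedent pathVec in each state.

[Figure: merg_FSM with pathVec_ifFinal labels (state 0: 11_0; state 1: 01_0; state 26: 11_0; state 37: 11_0; state 4: 01_1; state 5: 10_0; state 8: 10_1), traversed with the input stream {p, c, d, f}. Each memory row supplies the next state, pathVec, and ifFinal. preReg starts at 11, becomes 10 after p, and ends at 00 when state 4 is reached, so no match is reported.]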
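The preReg trace for a given input can be checked with a few lines of Python. This sketch only follows valid transitions of the merged machine and ANDs preReg with the pathVec of each state it enters; the table names and the `trace` helper are illustrative, and failure transitions are deliberately left out.

```python
# pathVec of each merg_FSM state, copied from the slide.
PATH_VEC = {0: 0b11, 1: 0b01, 26: 0b11, 37: 0b11, 4: 0b01, 5: 0b10, 8: 0b10}
# Valid transitions of the merg_FSM: (state, symbol) -> next state.
NEXT = {
    (0, "b"): 1, (0, "p"): 5,
    (1, "c"): 26, (5, "c"): 26,
    (26, "d"): 37,
    (37, "f"): 4, (37, "g"): 8,
}


def trace(stream):
    """Return the successive preReg values along a path of valid transitions."""
    state, pre_reg = 0, 0b11      # all bits of preReg start at 1
    history = [pre_reg]
    for ch in stream:
        state = NEXT[(state, ch)]
        pre_reg &= PATH_VEC[state]  # keep only patterns consistent with the path
        history.append(pre_reg)
    return history
```

For the input stream {p, c, d, f} the values are 11, 10, 10, 10, 00: preReg drops to zero in state 4, so the final state is not reported as a match. For {b, c, d, f} preReg stays at 01 and the match is genuine.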


Experiment I

Perform experiments on Snort rule sets.

Compare our approach with the Aho-Corasick algorithm.

A.V. Aho and M.J. Corasick. Efficient String Matching: An Aid to Bibliographic Search. In Communications of the ACM 1975.

Compare with Traditional AC

                                  Traditional AC [24]            Our algorithm
Rule sets     # of      # of      # of     # of     Memory       # of     # of     Memory       Memory
              patterns  char.     trans.   states   (bytes)      trans.   states   (bytes)      reduct.
Oracle        138       4,674     2,180    2,185    880,009      1,389    1,221    452,533      49%
Sql           44        1,089     421      422      129,290      321      284      87,011       33%
Backdoor      57        599       563      565      191,253      523      497      152,268      20%
Web-iis       113       2,047     1,533    1,537    569,651      1,273    1,155    428,072      25%
Web-php       115       2,455     1,670    1,675    620,797      1,295    1,142    423,254      32%
Web-misc      310       4,711     3,576    3,587    1,444,664    3,031    2,734    1,101,119    24%
Web-cgi       347       5,339     3,407    3,419    1,377,002    2,672    2,358    949,685      31%
Total rules   1,595     20,921    17,472   17,522   8,745,668    14,704   13,381   6,248,927    29%
Ratio                             1        1        1            84%      76%      71%          29%
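The memory-reduction column follows directly from the two memory columns as 1 − (our memory ÷ traditional AC memory), rounded to a whole percent. A short check, with the byte counts copied from the table and the helper name `reduction_percent` chosen for illustration:

```python
# (traditional AC bytes, our algorithm bytes), copied from the table.
MEMORY_BYTES = {
    "Oracle": (880_009, 452_533),
    "Sql": (129_290, 87_011),
    "Backdoor": (191_253, 152_268),
    "Web-iis": (569_651, 428_072),
    "Web-php": (620_797, 423_254),
    "Web-misc": (1_444_664, 1_101_119),
    "Web-cgi": (1_377_002, 949_685),
    "Total rules": (8_745_668, 6_248_927),
}


def reduction_percent(before: int, after: int) -> int:
    """Memory reduction relative to the baseline, as a rounded percentage."""
    return round(100 * (1 - after / before))


REDUCTIONS = {name: reduction_percent(*pair) for name, pair in MEMORY_BYTES.items()}
```

The computed values reproduce the reported column, from 49% for Oracle down to 20% for Backdoor, with 29% over the total rule set.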

Experiment II

Enhance the bit-split algorithm with our method.
– The results are compared with the original bit-split algorithm.

L. Tan and T. Sherwood. A high throughput string matching architecture for intrusion detection and prevention. In ISCA’05.

Compare with Traditional Bit-Split

                                  Bit-split [8]                  Bit-split + Our algorithm
Rule sets     # of      # of      # of     # of     Memory       # of     # of     Memory       Memory
              patterns  char.     trans.   states   (bytes)      trans.   states   (bytes)      reduct.
Oracle        138       4,674     6,645    6,665    633,175      4,146    3,603    358,499      43%
Sql           44        1,089     1,211    1,215    110,565      866      769      72,671       34%
Backdoor      57        599       1,697    1,705    155,155      1,441    1,305    126,585      18%
Web-iis       113       2,047     4,869    4,885    464,075      3,844    3,374    335,713      28%
Web-php       115       2,455     4,991    5,011    476,045      3,871    3,345    332,828      30%
Web-misc      310       4,711     10,959   11,003   1,067,291    8,861    7,816    797,232      25%
Web-cgi       347       5,339     9,901    9,949    965,053      7,875    6,957    709,614      26%
Total rules   1,595     20,921    53,930   54,130   5,467,130    43,550   38,701   4,237,760    22%
Ratio                             1        1        1            81%      71%      78%          22%

Conclusion

We provide the concept of merging pseudo-equivalent states to reduce the number of states and transitions.

We propose a state traversal mechanism that works with the merg_FSM without producing false positive matches.

Experimental results demonstrate a significant reduction in memory requirement.

Thank You!

Backup

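Cycle Problem

Merging out-of-order sections of pseudo-equivalent states creates a cycle problem.

[Figure: state machine with one path a, b, c, d, e, f through states 0 to 6 and a second path from state 0 through states 7 to 12 that follows w, d, e, b, c, g, so the sections “de” and “bc” appear in the opposite order.]

For example, the input string “abcdebcdef” will be mistaken for a match of the pattern “abcdef”.

[Figure: the merged machine, in which the merged “bc” and “de” sections form a cycle.]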

Construction of State Traversal Machine

Construction of the state traversal machine consists of two steps:

– Step 1: Construct the valid transitions, failure transitions, pathVec, and ifFinal functions.

– Step 2: Merge the pseudo-equivalent states.

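Example

Consider three patterns “abcdef”, “apcdeg”, and “awcdeh”.

[Figure: state machine after Step 1, labeled “16 states”. States 1 to 6 follow a, b, c, d, e, f; states 7 to 11 follow p, c, d, e, g from state 1; states 12 to 16 follow w, c, d, e, h from state 1. pathVec_ifFinal labels: state 1 is 111_0; states 2 to 5 are 001_0 and state 6 is 001_1; states 7 to 10 are 010_0 and state 11 is 010_1; states 12 to 15 are 100_0 and state 16 is 100_1.]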

Merging Pseudo-equivalent States

Merging consists of merging the failure transitions and performing the union on the pathVec of the merged states.

[Figure: three successive merging steps on the example machine. The pseudo-equivalent c, d, and e states of the three paths are merged one group at a time, and each merged state is labeled 111_0. The final machine is labeled “10 states”.]
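The two construction steps can be sketched for the three example patterns. This is a simplified illustration under stated assumptions: “identical input transitions” is read as “entered on the same symbol”, states are grouped by the key (input symbol, failure state, ifFinal), and the sketch only identifies candidate groups. It does not guard against the cycle problem from the backup slides or against conflicting outgoing transitions. All names are hypothetical.

```python
from collections import deque


def build_machine(patterns):
    """Step 1: valid transitions, failure transitions, pathVec, and ifFinal."""
    goto, in_sym = [{}], [None]
    path_vec, if_final = [(1 << len(patterns)) - 1], [0]
    for idx, pat in enumerate(patterns):
        s = 0
        for ch in pat:
            if ch not in goto[s]:
                goto.append({})
                in_sym.append(ch)       # symbol on which the state is entered
                path_vec.append(0)
                if_final.append(0)
                goto[s][ch] = len(goto) - 1
            s = goto[s][ch]
            path_vec[s] |= 1 << idx
        if_final[s] = 1

    fail = [0] * len(goto)
    queue = deque(goto[0].values())
    while queue:                        # breadth-first failure computation
        r = queue.popleft()
        for ch, s in goto[r].items():
            queue.append(s)
            f = fail[r]
            while f and ch not in goto[f]:
                f = fail[f]
            fail[s] = goto[f].get(ch, 0)
    return goto, fail, path_vec, if_final, in_sym


def candidate_groups(patterns):
    """Step 2 (candidates only): group states and union their pathVec."""
    goto, fail, path_vec, if_final, in_sym = build_machine(patterns)
    groups = {}
    for s in range(1, len(goto)):       # state 0 is never merged
        key = (in_sym[s], fail[s], if_final[s])
        members, vec = groups.get(key, ([], 0))
        groups[key] = (members + [s], vec | path_vec[s])
    return len(goto) - 1, groups
```

For “abcdef”, “apcdeg”, and “awcdeh” the sketch starts from 16 non-initial states and ends with 10 groups; the three c states (3, 8, and 13) fall into one group whose pathVec is 111.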

State Traversal Algorithm

Algorithm: State traversal pattern matching algorithm

Input: A text string x = a1 a2 … an, where each ai is an input symbol, and a state traversal machine M with valid transition function g, failure transition function f, path function pathVec, and final function ifFinal.

Output: Locations at which keywords occur in x.

Method:
begin
    state ← 0
    preReg ← 1…1    // all bits are initialized to 1
    for i ← 1 until n do
    begin
        preReg ← preReg & pathVec(state)
        while g(state, ai) == fail || preReg == 0 do
        begin
            state ← f(state)
            preReg ← 1…1
        end
        state ← g(state, ai)
        if ifFinal(state) = 1 then
        begin
            print i
            print preReg
        end
    end
end
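A runnable sketch of the algorithm on the merged machine of “bcdf” and “pcdg”. The tables are hand-copied from the earlier slides and the names are illustrative. One deliberate deviation is labeled in the code: before reporting a match, the sketch also ANDs preReg with the pathVec of the state just entered. This is an interpretation chosen so that the input {p, c, d, f} ends with preReg 00 and no match, as in the State Traversal Mechanism slide; it is an assumption, not a statement of the authors' implementation.

```python
ALL_ONES = 0b11  # one bit per pattern: bit 0 = "bcdf", bit 1 = "pcdg"

G = {  # valid transitions of the merged machine
    0: {"b": 1, "p": 5},
    1: {"c": 26}, 5: {"c": 26},
    26: {"d": 37},
    37: {"f": 4, "g": 8},
    4: {}, 8: {},
}
F = {s: 0 for s in G}  # every failure transition in this example leads to state 0
PATH_VEC = {0: 0b11, 1: 0b01, 26: 0b11, 37: 0b11, 4: 0b01, 5: 0b10, 8: 0b10}
IF_FINAL = {4: 1, 8: 1}


def g(state, ch):
    """Valid transition, or None for 'fail'; state 0 loops on unknown symbols."""
    if ch in G[state]:
        return G[state][ch]
    return 0 if state == 0 else None


def state_traversal_match(text):
    """Return (position, preReg) pairs; positions are 1-based as in a1 … an."""
    state, pre_reg, hits = 0, ALL_ONES, []
    for i, ch in enumerate(text, start=1):
        pre_reg &= PATH_VEC[state]
        while g(state, ch) is None or pre_reg == 0:
            state = F[state]
            pre_reg = ALL_ONES
        state = g(state, ch)
        # Assumption (not in the pseudocode): filter by the entered state's
        # pathVec so a path that is inconsistent with the final state is dropped.
        surviving = pre_reg & PATH_VEC[state]
        if IF_FINAL.get(state) and surviving:
            hits.append((i, surviving))
    return hits
```

With these tables, `state_traversal_match("pcdf")` reports nothing, while “bcdf” and “pcdg” are reported with preReg 01 and 10 respectively.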
