Optimization of Pattern Matching Algorithm for Memory Based Architecture Cheng-Hung Lin, Yu-Tang Tai, and Shih-Chieh Chang National Tsing Hua University, Taiwan, R.O.C


Post on 22-Jun-2020


Optimization of Pattern Matching Algorithm for Memory Based Architecture

Cheng-Hung Lin, Yu-Tang Tai, and Shih-Chieh Chang

National Tsing Hua University, Taiwan, R.O.C


Outline

Memory architecture for string matching

Basic idea

Novel Algorithm for memory architecture

Experimental results and conclusions


Introduction

A network intrusion detection system (NIDS) detects network attacks by identifying attack patterns.

Software-only approaches can no longer meet the high throughput requirements of today’s networks.

Hardware approaches are used for acceleration:
– Logic architecture
– Memory architecture


Advantage of Memory Architecture

Young H. Cho and William H. Mangione-Smith, “A Pattern Matching Co-processor for Network Security,” in Proc. 42nd IEEE/ACM Design Automation Conference, Anaheim, CA, June 13-17, 2005.

M. Aldwairi, T. Conte, and P. Franzon, “Configurable String Matching Hardware for Speeding up Intrusion Detection,” ACM SIGARCH Computer Architecture News, 33(1):99–107, 2005.

S. Dharmapurikar and J. Lockwood, “Fast and Scalable Pattern Matching for Content Filtering,” in Proc. Symposium on Architectures for Networking and Communications Systems (ANCS), Oct. 2005.

The memory architecture has attracted a lot of attention because of its easy re-configurability and scalability.


Memory Architecture

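[Figure: Finite state machine (FSM) for the attack patterns “bcdf” and “pcdg” (states 0–8) and its memory implementation. The current state is decoded to address one memory row. Each row holds 256 next-state entries (NS1 … NS256, 8 bits each) and a 16-bit match vector (MV). The input byte selects the next state through a 256:1 multiplexer.]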
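The lookup described on this slide can be sketched in Python as follows. This is a minimal illustration, not the authors' hardware design: the function names `build_memory_table` and `scan` are assumptions, the match vector is modeled as an integer with one bit per pattern, and the table is filled with a standard breadth-first Aho-Corasick construction so that every row has all 256 next-state entries.

```python
from collections import deque

ALPHABET = 256  # one next-state entry per possible input byte

def build_memory_table(patterns):
    """Return (table, match_vector): one 256-entry row and one bit vector per state."""
    goto = [{}]          # trie edges: goto[state][byte] -> child state
    match_vector = [0]   # bit i is set if pattern i ends in this state
    for bit, pattern in enumerate(patterns):
        s = 0
        for byte in pattern.encode("ascii"):
            if byte not in goto[s]:
                goto.append({})
                match_vector.append(0)
                goto[s][byte] = len(goto) - 1
            s = goto[s][byte]
        match_vector[s] |= 1 << bit

    fail = [0] * len(goto)
    table = [[0] * ALPHABET for _ in goto]
    queue = deque()
    for byte, child in goto[0].items():
        table[0][byte] = child
        queue.append(child)
    while queue:  # breadth-first, so the row of fail[s] is complete before s is filled
        s = queue.popleft()
        match_vector[s] |= match_vector[fail[s]]
        for byte in range(ALPHABET):
            child = goto[s].get(byte)
            if child is None:
                table[s][byte] = table[fail[s]][byte]
            else:
                fail[child] = table[fail[s]][byte]
                table[s][byte] = child
                queue.append(child)
    return table, match_vector

def scan(table, match_vector, data):
    """One table read per input byte; report (position, match vector) on a hit."""
    state, hits = 0, []
    for pos, byte in enumerate(data):
        state = table[state][byte]  # the input byte plays the role of the 256:1 MUX select
        if match_vector[state]:
            hits.append((pos, match_vector[state]))
    return hits
```

With the patterns “bcdf” and “pcdg” the table has 9 rows of 256 entries each, which illustrates why the number of states dominates the memory size.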


Major Issue of Memory Architecture

Due to the increasing number of attacks, the required memory increases tremendously.
– The performance, cost, and power consumption are related to the memory size.
– Reducing the memory size has become imperative.


Outline

Memory architecture for string matching

Basic idea

Novel algorithm for memory architecture

Experimental results and Conclusions


Review of Aho-Corasick Algorithm

The Aho-Corasick (AC) algorithm greatly reduces the number of state transitions and the memory size.
– Solid lines represent valid transitions.
– Dotted lines represent failure transitions.
– Failure transitions are introduced to reduce the number of outgoing transitions.

[Figure: AC state machine of “bcdf” and “pcdg”, states 0–8.]
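As a rough illustration of valid and failure transitions, the following Python sketch builds an AC machine that stores only the valid transitions and one failure transition per state. The names `build_ac` and `search` are assumptions made for this example, and the construction is the textbook one rather than anything specific to these slides.

```python
from collections import deque

def build_ac(patterns):
    """Return (goto, fail, final) with only the valid transitions stored."""
    goto, final = [{}], [False]
    for pattern in patterns:
        s = 0
        for ch in pattern:
            if ch not in goto[s]:
                goto.append({})
                final.append(False)
                goto[s][ch] = len(goto) - 1
            s = goto[s][ch]
        final[s] = True

    fail = [0] * len(goto)            # depth-1 states fail to the root
    queue = deque(goto[0].values())
    while queue:
        s = queue.popleft()
        for ch, child in goto[s].items():
            f = fail[s]
            while f and ch not in goto[f]:
                f = fail[f]
            fail[child] = goto[f].get(ch, 0)
            final[child] = final[child] or final[fail[child]]
            queue.append(child)
    return goto, fail, final

def search(text, goto, fail, final):
    """Return the end positions of all pattern occurrences in text."""
    s, hits = 0, []
    for i, ch in enumerate(text):
        while s and ch not in goto[s]:
            s = fail[s]               # follow failure transitions (dotted lines)
        s = goto[s].get(ch, 0)        # take the valid transition (solid line) or stay at 0
        if final[s]:
            hits.append(i)
    return hits
```

For “bcdf” and “pcdg” this stores 8 valid transitions in total, and every failure transition leads to state 0.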


Observation

Many string patterns are similar because of common sub-strings.

The similarity does not lead to a small state machine.

[Figure: AC state machine of “bcdf” and “pcdg”, states 0–8. The common sub-string “cd” still occupies separate states on each path.]


Merge Similar States

The merg_FSM is a different machine:
– smaller number of states and transitions.
– smaller memory in the memory architecture.

[Figure: The AC state machine (states 0–8) next to the merg_FSM, in which states 2 and 6 are merged into state 26 and states 3 and 7 into state 37.]


Problem of merg_FSM

Directly merging similar states results in an erroneous state machine.

[Figure: AC state machine and merg_FSM processing the input stream {p, c, d, f}. The merg_FSM reports a false positive.]
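The false positive can be reproduced with a small Python sketch of the directly merged machine. The transition table below is transcribed from the merg_FSM figure (merged states 26 and 37). The names `MERGED`, `FINAL`, and `run_naive` are illustrative assumptions, and failure handling is simplified to "retry from state 0", which is sufficient here because every failure transition of this machine leads to state 0.

```python
# Directly merged machine for "bcdf" and "pcdg": states 2/6 -> 26, states 3/7 -> 37.
MERGED = {
    (0, "b"): 1, (0, "p"): 5,
    (1, "c"): 26, (5, "c"): 26,
    (26, "d"): 37,
    (37, "f"): 4, (37, "g"): 8,
}
FINAL = {4: "bcdf", 8: "pcdg"}

def run_naive(text):
    """Run the merged machine with no memory of the path taken."""
    state, reported = 0, []
    for ch in text:
        nxt = MERGED.get((state, ch))
        if nxt is None:
            nxt = MERGED.get((0, ch), 0)  # fail to state 0 and retry the symbol
        state = nxt
        if state in FINAL:
            reported.append(FINAL[state])
    return reported

print(run_naive("pcdf"))  # the machine reports "bcdf" although the input does not contain it
```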


Outline

Memory architecture for string matching

Basic Idea

Novel Algorithm for memory architecture

Experimental results and Conclusions


State Traversal Mechanism

Store the merg_FSM table in memory.

The state traversal mechanism is used to memorize the precedent state and differentiate merged states.

[Figure: The merg_FSM with the state traversal mechanism, next to the AC state machine. In merged state 26 the mechanism has to decide whether the original state was 2 or 6.]


New State Information

The AC state machine stores a match vector.

The new state machine stores
– pathVec, which stores path information.
– ifFinal, which indicates whether the state is a final state.

[Figure: AC state machine annotated with match vectors (01 at state 4, 10 at state 8, 00 elsewhere) and the new state machine annotated with pathVec_ifFinal: 11_0 at state 0; 01_0, 01_0, 01_0, 01_1 at states 1–4; 10_0, 10_0, 10_0, 10_1 at states 5–8.]
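One way to compute these labels is sketched below in Python. It assumes that bit i of pathVec is set for every state on the path of pattern i, including the root, and that ifFinal is 1 only in the last state of a pattern, which is how the labels in the figure read. The function name `build_path_info` is an assumption made for this example.

```python
def build_path_info(patterns):
    """Return (children, path_vec, if_final) for the pattern trie."""
    children, path_vec, if_final = [{}], [0], [0]
    for bit, pattern in enumerate(patterns):
        s = 0
        path_vec[0] |= 1 << bit          # the root lies on every pattern's path
        for ch in pattern:
            if ch not in children[s]:
                children.append({})
                path_vec.append(0)
                if_final.append(0)
                children[s][ch] = len(children) - 1
            s = children[s][ch]
            path_vec[s] |= 1 << bit      # state s lies on the path of pattern `bit`
        if_final[s] = 1                  # last state of the pattern
    return children, path_vec, if_final

_, vec, fin = build_path_info(["bcdf", "pcdg"])
labels = [f"{v:02b}_{f}" for v, f in zip(vec, fin)]
print(labels)
```

For the two example patterns the labels come out in state order 0 to 8, in the same pathVec_ifFinal notation as the slide.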


Pseudo-Equivalent States

Definition: Two states are pseudo-equivalent if they have
– identical input transitions
– identical failure transitions
– identical ifFinal
– but different next states.

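[Figure: State machine of “bcdf” and “pcdg” annotated with pathVec_ifFinal (11_0 at state 0; 01_0, 01_0, 01_0, 01_1 at states 1–4; 10_0, 10_0, 10_0, 10_1 at states 5–8). States 2 and 6, and states 3 and 7, are pseudo-equivalent.]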
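The definition can be written as a small predicate. The sketch below is an illustration only: the `StateInfo` record and its field names are assumptions, and the state data is filled in by hand for the “bcdf” / “pcdg” machine.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class StateInfo:
    input_symbol: str    # symbol on the transition entering the state
    failure: int         # target of the failure transition
    if_final: int        # 1 if the state is a final state
    next_states: tuple   # ids of the states reached by valid transitions

def pseudo_equivalent(a: StateInfo, b: StateInfo) -> bool:
    """Same input transition, failure transition, and ifFinal, but different next states."""
    return (a.input_symbol == b.input_symbol
            and a.failure == b.failure
            and a.if_final == b.if_final
            and a.next_states != b.next_states)

# States of the AC machine for "bcdf" (1-4) and "pcdg" (5-8).
STATES = {
    1: StateInfo("b", 0, 0, (2,)), 2: StateInfo("c", 0, 0, (3,)),
    3: StateInfo("d", 0, 0, (4,)), 4: StateInfo("f", 0, 1, ()),
    5: StateInfo("p", 0, 0, (6,)), 6: StateInfo("c", 0, 0, (7,)),
    7: StateInfo("d", 0, 0, (8,)), 8: StateInfo("g", 0, 1, ()),
}

print(pseudo_equivalent(STATES[2], STATES[6]))  # both entered on 'c'
print(pseudo_equivalent(STATES[1], STATES[5]))  # entered on 'b' and 'p'
```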


Merge Pseudo-Equivalent States

[Figure: The state machine before and after merging. States 2 and 6 become state 26 and states 3 and 7 become state 37, each labeled 11_0.]

Pseudo-equivalent states are merged.

pathVec and ifFinal are updated by a union of the merged states.
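A sketch of the merge step in Python, under the assumption that the pairs to merge are already known. The function `merge_states` and the `alias` mapping are illustrative names. The merged states keep the ids 2 and 3 here, where the slide labels them 26 and 37.

```python
def merge_states(transitions, path_vec, if_final, alias):
    """Merge states; alias maps each removed state id to the state that absorbs it."""
    def rep(s):
        return alias.get(s, s)

    # Redirect both ends of every transition to the representative state.
    merged_trans = {(rep(s), ch): rep(t) for (s, ch), t in transitions.items()}
    merged_vec, merged_final = {}, {}
    for s in path_vec:
        r = rep(s)
        merged_vec[r] = merged_vec.get(r, 0) | path_vec[s]       # union of pathVec
        merged_final[r] = merged_final.get(r, 0) | if_final[s]   # union of ifFinal
    return merged_trans, merged_vec, merged_final

TRANSITIONS = {
    (0, "b"): 1, (1, "c"): 2, (2, "d"): 3, (3, "f"): 4,
    (0, "p"): 5, (5, "c"): 6, (6, "d"): 7, (7, "g"): 8,
}
PATH_VEC = {0: 0b11, 1: 0b01, 2: 0b01, 3: 0b01, 4: 0b01,
            5: 0b10, 6: 0b10, 7: 0b10, 8: 0b10}
IF_FINAL = {0: 0, 1: 0, 2: 0, 3: 0, 4: 1, 5: 0, 6: 0, 7: 0, 8: 1}

trans, vec, fin = merge_states(TRANSITIONS, PATH_VEC, IF_FINAL, {6: 2, 7: 3})
print(len(vec), f"{vec[2]:02b}_{fin[2]}", f"{vec[3]:02b}_{fin[3]}")
```

Nine states shrink to seven, and both merged states carry the label 11_0.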


State Traversal Mechanism

preReg traces the precedent pathVec in each state.

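[Figure: merg_FSM annotated with pathVec_ifFinal (11_0 at states 0, 26, and 37; 01_0 at state 1; 10_0 at state 5; 01_1 at state 4; 10_1 at state 8), with a table listing next state, pathVec, ifFinal, and preReg for the input stream {p, c, d, f}.]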
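The role of preReg on this input can be traced with a few lines of Python. The tables are read off the figure labels. The function `trace` is an illustrative helper that assumes the input stays on valid transitions (no failure handling), and it ANDs preReg with the pathVec of each state as it is entered.

```python
GOTO = {
    (0, "b"): 1, (0, "p"): 5,
    (1, "c"): 26, (5, "c"): 26,
    (26, "d"): 37,
    (37, "f"): 4, (37, "g"): 8,
}
PATH_VEC = {0: 0b11, 1: 0b01, 5: 0b10, 26: 0b11, 37: 0b11, 4: 0b01, 8: 0b10}
IF_FINAL = {4, 8}

def trace(text):
    """Return (state, preReg, match?) after each input symbol."""
    state, pre_reg, rows = 0, 0b11, []   # preReg starts with all bits set
    for ch in text:
        state = GOTO[(state, ch)]        # assumes a valid transition exists
        pre_reg &= PATH_VEC[state]       # keep only the paths still consistent
        rows.append((state, f"{pre_reg:02b}", state in IF_FINAL and pre_reg != 0))
    return rows

for row in trace("pcdf"):
    print(row)
```

For {p, c, d, f} preReg stays at 10 through states 5, 26, and 37 and drops to 00 on entering state 4, so no match is reported. For {b, c, d, f} it stays at 01 and state 4 is a genuine match.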


Outline

Memory architecture for string matching

Basic Idea

Novel algorithm for memory architecture

Experimental results and Conclusions


Experiment I

Perform experiments on Snort rule sets.

Compare our approach with the Aho-Corasick algorithm.

A. V. Aho and M. J. Corasick, “Efficient String Matching: An Aid to Bibliographic Search,” Communications of the ACM, 1975.


Comparison with Traditional AC

| Rule set | # of patterns | # of char. | Traditional AC [24]: # of trans. | Traditional AC [24]: # of states | Traditional AC [24]: memory (bytes) | Our algorithm: # of trans. | Our algorithm: # of states | Our algorithm: memory (bytes) | Memory reduct. |
|---|---|---|---|---|---|---|---|---|---|
| Oracle | 138 | 4,674 | 2,180 | 2,185 | 880,009 | 1,389 | 1,221 | 452,533 | 49% |
| Sql | 44 | 1,089 | 421 | 422 | 129,290 | 321 | 284 | 87,011 | 33% |
| Backdoor | 57 | 599 | 563 | 565 | 191,253 | 523 | 497 | 152,268 | 20% |
| Web-iis | 113 | 2,047 | 1,533 | 1,537 | 569,651 | 1,273 | 1,155 | 428,072 | 25% |
| Web-php | 115 | 2,455 | 1,670 | 1,675 | 620,797 | 1,295 | 1,142 | 423,254 | 32% |
| Web-misc | 310 | 4,711 | 3,576 | 3,587 | 1,444,664 | 3,031 | 2,734 | 1,101,119 | 24% |
| Web-cgi | 347 | 5,339 | 3,407 | 3,419 | 1,377,002 | 2,672 | 2,358 | 949,685 | 31% |
| Total rules | 1,595 | 20,921 | 17,472 | 17,522 | 8,745,668 | 14,704 | 13,381 | 6,248,927 | 29% |
| Ratio | | | 1 | 1 | 1 | 84% | 76% | 71% | 29% |
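The "Memory reduct." column follows from the two memory columns as 1 − (our memory / traditional AC memory). The short Python check below recomputes it from the byte counts in the table. The dictionary `ROWS` and the function `reduction` are names introduced for this example only.

```python
# (traditional AC bytes, our algorithm bytes, published reduction in %)
ROWS = {
    "Oracle":      (880_009, 452_533, 49),
    "Sql":         (129_290, 87_011, 33),
    "Backdoor":    (191_253, 152_268, 20),
    "Web-iis":     (569_651, 428_072, 25),
    "Web-php":     (620_797, 423_254, 32),
    "Web-misc":    (1_444_664, 1_101_119, 24),
    "Web-cgi":     (1_377_002, 949_685, 31),
    "Total rules": (8_745_668, 6_248_927, 29),
}

def reduction(before, after):
    """Memory reduction in percent, rounded to the nearest integer."""
    return round(100 * (1 - after / before))

for name, (before, after, published) in ROWS.items():
    print(f"{name:12s} computed {reduction(before, after)}%  published {published}%")
```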


Experiment II

Enhance the bit-split algorithm with our method.
– The results are compared with the original bit-split algorithm.

L. Tan and T. Sherwood. A high throughput string matching architecture for intrusion detection and prevention. In ISCA’05.


Comparison with Traditional Bit-Split

| Rule set | # of patterns | # of char. | Bit-split [8]: # of trans. | Bit-split [8]: # of states | Bit-split [8]: memory (bytes) | Bit-split + our algorithm: # of trans. | Bit-split + our algorithm: # of states | Bit-split + our algorithm: memory (bytes) | Memory reduct. |
|---|---|---|---|---|---|---|---|---|---|
| Oracle | 138 | 4,674 | 6,645 | 6,665 | 633,175 | 4,146 | 3,603 | 358,499 | 43% |
| Sql | 44 | 1,089 | 1,211 | 1,215 | 110,565 | 866 | 769 | 72,671 | 34% |
| Backdoor | 57 | 599 | 1,697 | 1,705 | 155,155 | 1,441 | 1,305 | 126,585 | 18% |
| Web-iis | 113 | 2,047 | 4,869 | 4,885 | 464,075 | 3,844 | 3,374 | 335,713 | 28% |
| Web-php | 115 | 2,455 | 4,991 | 5,011 | 476,045 | 3,871 | 3,345 | 332,828 | 30% |
| Web-misc | 310 | 4,711 | 10,959 | 11,003 | 1,067,291 | 8,861 | 7,816 | 797,232 | 25% |
| Web-cgi | 347 | 5,339 | 9,901 | 9,949 | 965,053 | 7,875 | 6,957 | 709,614 | 26% |
| Total rules | 1,595 | 20,921 | 53,930 | 54,130 | 5,467,130 | 43,550 | 38,701 | 4,237,760 | 22% |
| Ratio | | | 1 | 1 | 1 | 81% | 71% | 78% | 22% |


Conclusion

We introduce the concept of merging pseudo-equivalent states to reduce the number of states and transitions.

We propose a state traversal mechanism that works with the merg_FSM without producing false positive matches.

Experimental results demonstrate a significant reduction in memory requirement.


Thank You!


Backup


Cycle Problem

Merging sections of pseudo-equivalent states that occur in a different order in two patterns creates a cycle problem.

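[Figure: State machine, states 0–12, with one path labeled a, b, c, d, e, f and a second path labeled w, d, e, b, c, g. The sections “bc” and “de” appear in both paths but in a different order.]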


Cycle Problem

For example, the input string “abcdebcdef” will be mistaken as a match of the pattern “abcdef.”

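[Figure: The merged state machine. After merging, a transition on b leads from the merged e state back to the merged b state, which closes a cycle.]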
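The cycle can be reproduced with a Python sketch. It assumes the two patterns are “abcdef” and “wdebcg”, read off the edge labels of the figure, and that the b, c, d, and e states of both patterns are merged pairwise. The state ids and the names `GOTO`, `PATH_VEC`, `FINAL`, and `run` are illustrative. Misses simply return to state 0, which does not matter for the inputs used here.

```python
GOTO = {
    (0, "a"): 1, (1, "b"): 2, (2, "c"): 3, (3, "d"): 4, (4, "e"): 5, (5, "f"): 6,
    (0, "w"): 7, (7, "d"): 4,   # "wde..." enters the merged d and e states
    (5, "b"): 2,                # "...ebc..." goes back to the merged b state: the cycle
    (3, "g"): 12,
}
# bit 0 = "abcdef", bit 1 = "wdebcg"; merged states lie on both paths
PATH_VEC = {0: 0b11, 1: 0b01, 2: 0b11, 3: 0b11, 4: 0b11, 5: 0b11,
            6: 0b01, 7: 0b10, 12: 0b10}
FINAL = {6: "abcdef", 12: "wdebcg"}

def run(text):
    """Traverse the merged machine while tracking preReg."""
    state, pre_reg, reported = 0, 0b11, []
    for ch in text:
        state = GOTO.get((state, ch), 0)   # simplification: a miss returns to state 0
        pre_reg = (pre_reg & PATH_VEC[state]) if state else 0b11
        if state in FINAL and pre_reg:
            reported.append(FINAL[state])
    return reported

print(run("abcdebcdef"))  # reports "abcdef" although the input does not contain it
```

Because every state on the cycle has pathVec 11, preReg never drops to zero, so path tracking alone does not catch this case.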


Construction of State Traversal Machine

Construction of the state traversal machine consists of two steps:

– Step 1: Construct the valid transitions, failure transitions, pathVec, and ifFinal functions.

– Step 2: Merge the pseudo-equivalent states.


Example

Consider three patterns “abcdef”, “apcdeg”, “awcdeh”.

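[Figure: State machine for the three patterns, states 0–16, each state annotated with pathVec_ifFinal. States on the path of “abcdef” carry pathVec 001, states on the path of “apcdeg” carry 010, and states on the path of “awcdeh” carry 100. States shared by all three patterns carry 111.]

16 states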


Merging Pseudo-equivalent States

[Figure: First merging step for the three-pattern example, with states annotated with pathVec_ifFinal. The merged state is labeled 111_0.]

Merging consists of
– merging the failure transitions
– performing the union on the pathVec of the merged states


Merging Pseudo-equivalent States

[Figure: Second merging step for the three-pattern example. A further group of pseudo-equivalent states is merged into a state labeled 111_0.]


Merging Pseudo-equivalent States

[Figure: Final merging step for the three-pattern example. The states reached on c, d, and e are shared by all three patterns and labeled 111_0.]

10 states
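The state counts of this example can be checked with a short Python sketch that groups states by the three "identical" criteria of the definition (input symbol, failure target, ifFinal). This is a simplification under stated assumptions: it ignores the cycle problem from the backup slides and only counts groups, it does not build the merged machine. The function names are illustrative, and the root state 0 is not counted.

```python
from collections import deque

def build_trie(patterns):
    """Return (children, symbol, final); symbol[s] is the symbol entering state s."""
    children, symbol, final = [{}], [None], [0]
    for pattern in patterns:
        s = 0
        for ch in pattern:
            if ch not in children[s]:
                children.append({})
                symbol.append(ch)
                final.append(0)
                children[s][ch] = len(children) - 1
            s = children[s][ch]
        final[s] = 1
    return children, symbol, final

def failure_links(children):
    """Standard breadth-first computation of the AC failure transitions."""
    fail = [0] * len(children)
    queue = deque(children[0].values())
    while queue:
        s = queue.popleft()
        for ch, child in children[s].items():
            f = fail[s]
            while f and ch not in children[f]:
                f = fail[f]
            fail[child] = children[f].get(ch, 0)
            queue.append(child)
    return fail

def count_states(patterns):
    """Return (states before merging, states after merging), excluding the root."""
    children, symbol, final = build_trie(patterns)
    fail = failure_links(children)
    groups = {(symbol[s], fail[s], final[s]) for s in range(1, len(children))}
    return len(children) - 1, len(groups)

print(count_states(["abcdef", "apcdeg", "awcdeh"]))
```

For the three patterns this yields 16 states before merging and 10 after, in line with the slide.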


State Traversal Algorithm

Algorithm: State traversal pattern matching algorithm

Input: A text string x = a1 a2 … an, where each ai is an input symbol, and a state traversal machine M with valid transition function g, failure transition function f, path function pathVec, and final function ifFinal.

Output: Locations at which keywords occur in x.

Method:
begin
    state ← 0
    preReg ← 1…1    // all bits are initialized to 1
    for i ← 1 until n do
    begin
        preReg ← preReg & pathVec(state)
        while g(state, ai) == fail || preReg == 0 do
        begin
            state ← f(state)
            preReg ← 1…1
        end
        state ← g(state, ai)
        if ifFinal(state) == 1 then
        begin
            print i
            print preReg
        end
    end
end
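A minimal Python sketch of this loop for the merged machine of “bcdf” and “pcdg” follows. The tables `GOTO`, `FAIL`, `PATH_VEC`, and `IF_FINAL` and the function `match` are illustrative names, not taken from the slides. One deliberate difference from the pseudocode is an assumption of this sketch: preReg is tested against the pathVec of the state about to be entered, before the ifFinal test, so that the input {p, c, d, f} is rejected when it tries to enter state 4 rather than one symbol later.

```python
ALL = 0b11  # one bit per pattern: bit 0 = "bcdf", bit 1 = "pcdg"

GOTO = {
    (0, "b"): 1, (0, "p"): 5,
    (1, "c"): 26, (5, "c"): 26,
    (26, "d"): 37,
    (37, "f"): 4, (37, "g"): 8,
}
FAIL = {0: 0, 1: 0, 5: 0, 26: 0, 37: 0, 4: 0, 8: 0}
PATH_VEC = {0: 0b11, 1: 0b01, 5: 0b10, 26: 0b11, 37: 0b11, 4: 0b01, 8: 0b10}
IF_FINAL = {0: 0, 1: 0, 5: 0, 26: 0, 37: 0, 4: 1, 8: 1}

def match(text):
    """Return (position, preReg) for every match; preReg identifies the pattern."""
    state, pre_reg, hits = 0, ALL, []
    for i, ch in enumerate(text):
        while True:
            nxt = GOTO.get((state, ch))
            if nxt is not None and pre_reg & PATH_VEC[nxt]:
                break                          # valid transition on a consistent path
            if state == 0:
                nxt = 0                        # g(0, a) never fails: stay at the root
                break
            state, pre_reg = FAIL[state], ALL  # failure transition resets preReg
        state = nxt
        pre_reg &= PATH_VEC[state]
        if IF_FINAL[state]:
            hits.append((i, pre_reg))
    return hits

print(match("pcdf"))       # no match: the false positive of the merg_FSM is avoided
print(match("xpcdgbcdf"))  # both patterns are found
```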