Research and optimization of pattern matching algorithm based on Intrusion Detection System
Author: QIN Hai-sheng, LI Xin-hua, WEI Hai-lan, LI Jun-hui
Publisher: International Conference on Business Management and Electronic Information 2011
Presenter: Zi-Yang Ou
Date: 2011/9/14
Outline
Introduction
Pattern Matching Algorithm
Single Pattern Matching Algorithm
Multi-Pattern Matching Algorithm
Experimental Analysis
Introduction
An intrusion detection system (IDS), the second line of defense after the firewall in a computer security system, can significantly improve computer security.
At present, the analysis module of an intrusion detection system relies on pattern matching.
Pattern Matching Algorithm
Pattern matching, i.e. string matching, is the task of finding, in a target string T = t1t2…tn, every substring that exactly matches a given pattern string P = p1p2…pm.
Single pattern matching:
Only one pattern string is searched for in text T.
Multi-pattern matching:
Several pattern strings are searched for in text T at the same time.
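To make the two problem statements concrete, here is a brute-force Python baseline (hypothetical helper names, not from the paper). It tries every alignment, O(nm) per pattern, and the multi-pattern version simply repeats the single-pattern search once per pattern, which is the inefficiency the multi-pattern algorithms below try to remove.

```python
def naive_single(text: str, pattern: str) -> list[int]:
    """Return every 0-based position where pattern occurs exactly in text."""
    m = len(pattern)
    return [i for i in range(len(text) - m + 1) if text[i:i + m] == pattern]


def naive_multi(text: str, patterns: list[str]) -> dict[str, list[int]]:
    """Multi-pattern matching the slow way: one single-pattern pass per pattern."""
    return {p: naive_single(text, p) for p in patterns}


print(naive_single("abcabab", "ab"))                # [0, 3, 5]
print(naive_multi("abcabab", ["ab", "cab", "ba"]))  # {'ab': [0, 3, 5], 'cab': [2], 'ba': [4]}
```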
Single Pattern Matching Algorithm
BM algorithm
BMHS algorithm
Improved single pattern matching algorithm: BMHSL
BM
The pattern string P is aligned with the text T starting from the left and slides from left to right, but at each alignment the characters are compared from right to left.
If the match fails, two shift functions computed during preprocessing, Badchar and Goodsuffix, give the distance P moves to the right (BM takes the larger of the two); T and P are then realigned and compared again.
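The Badchar rule on its own is already enough for a working, if weaker, search. A minimal Python sketch with hypothetical function names: `last_occurrence` is the Badchar preprocessing, and `bm_badchar_only` is BM with the Goodsuffix rule deliberately left out.

```python
def last_occurrence(pattern: str) -> dict[str, int]:
    """Badchar preprocessing: rightmost index of each character in the pattern."""
    return {c: i for i, c in enumerate(pattern)}  # later indices overwrite earlier ones


def badchar_shift(last: dict[str, int], j: int, c: str) -> int:
    """Shift after text character c mismatches pattern position j.

    Aligns the rightmost occurrence of c in P under the mismatch; a character
    absent from P lets the pattern jump past it. Never returns less than 1.
    """
    return max(1, j - last.get(c, -1))


def bm_badchar_only(text: str, pattern: str) -> list[int]:
    """BM search simplified to the Badchar rule alone (Goodsuffix omitted)."""
    n, m = len(text), len(pattern)
    last = last_occurrence(pattern)
    matches, k = [], 0
    while k <= n - m:
        j = m - 1
        while j >= 0 and pattern[j] == text[k + j]:  # compare right to left
            j -= 1
        if j < 0:
            matches.append(k)
            k += 1  # simplest safe move after a full match
        else:
            k += badchar_shift(last, j, text[k + j])
    return matches
```

For P = just, a mismatch on d at the last position gives a shift of 4 (d is not in P), while a mismatch on s gives 1, which lines up the s of P.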
BM
Goodsuffix: suppose a substring U at the end of P has already matched a substring of T. If U occurs again elsewhere in P, the pattern is moved just far enough to align that occurrence with the matched text; otherwise it leaps over the matched text, keeping aligned only a prefix of P that matches a suffix of U, if there is one.
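A minimal Python sketch of full BM combining both functions. The Goodsuffix table is computed by brute force (O(m^2) per entry), chosen for readability rather than the linear-time preprocessing usually used; `goodsuffix_shift` and `bm_search` are hypothetical names. The search also records each alignment tried, so the BM examples that follow can be replayed.

```python
def goodsuffix_shift(pattern: str, j: int) -> int:
    """Goodsuffix shift after a mismatch at pattern position j (j = -1: full match).

    Finds the smallest shift s that keeps the matched suffix U = pattern[j+1:]
    consistent with the shifted pattern and puts a different character under
    the mismatch. If no copy of U fits, s grows until only a prefix of P
    overlaps U, or P clears it entirely (s = m).
    """
    m = len(pattern)
    for s in range(1, m + 1):
        suffix_ok = all(i - s < 0 or pattern[i - s] == pattern[i] for i in range(j + 1, m))
        if suffix_ok and (j - s < 0 or pattern[j - s] != pattern[j]):
            return s
    return m  # unreachable: s = m always satisfies both conditions


def bm_search(text: str, pattern: str) -> tuple[list[int], list[int]]:
    """Boyer-Moore: returns (match positions, every alignment tried)."""
    n, m = len(text), len(pattern)
    if m == 0:
        raise ValueError("pattern must be non-empty")
    last = {c: i for i, c in enumerate(pattern)}                  # Badchar table
    gs = {j: goodsuffix_shift(pattern, j) for j in range(-1, m)}  # Goodsuffix table
    matches, alignments, k = [], [], 0
    while k <= n - m:
        alignments.append(k)
        j = m - 1
        while j >= 0 and pattern[j] == text[k + j]:  # right-to-left comparison
            j -= 1
        if j < 0:
            matches.append(k)
            k += gs[-1]                               # shift by the period of P
        else:
            badchar = j - last.get(text[k + j], -1)
            k += max(badchar, gs[j])                  # take the larger of the two shifts
    return matches, alignments


print(bm_search("abcdejustIn", "just"))  # ([5], [0, 4, 5])
print(bm_search("abcabab", "cabab"))     # ([2], [0, 2])
```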
BM
Example 1 (P = just, T = abcdejustIn):
a b c d e j u s t I n
j u s t
        j u s t
          j u s t
Example 2 (P = cabab, T = abcabab):
a b c a b a b
c a b a b
    c a b a b
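BMHS
BMHS aligns and compares P with T as BM does, but its shift depends only on the text character T[k+1] immediately after the current window, where k is the position in T aligned with the last character of P.
If T[k+1] does not occur in the pattern string, P moves right by m+1; otherwise P moves so that the rightmost occurrence of T[k+1] in P is aligned with it.
Shift table for P = just (m = 4):
j u s t others
4 3 2 1 5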
Example 1 (P = just, T = abcdejustIn):
a b c d e j u s t I n
j u s t
          j u s t
Example 2 (P = cabab, T = abcabab):
a b c a b a b
c a b a b
    c a b a b
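A minimal Python sketch of BMHS as described above, with hypothetical names. The window start is `s`, so `text[s + m]` plays the role of T[k+1]. Returning the alignments tried as well as the matches reproduces the two examples.

```python
def bmhs_table(pattern: str) -> dict[str, int]:
    """Shift for each character of P: m - (index of its rightmost occurrence)."""
    m = len(pattern)
    return {c: m - i for i, c in enumerate(pattern)}  # rightmost occurrence wins


def bmhs_search(text: str, pattern: str) -> tuple[list[int], list[int]]:
    """BMHS search: returns (match positions, alignments tried)."""
    n, m = len(text), len(pattern)
    shift = bmhs_table(pattern)
    matches, alignments, s = [], [], 0
    while s <= n - m:
        alignments.append(s)
        j = m - 1
        while j >= 0 and pattern[j] == text[s + j]:  # right-to-left, as in BM
            j -= 1
        if j < 0:
            matches.append(s)
        if s + m >= n:  # no character after the window: nothing left to test
            break
        s += shift.get(text[s + m], m + 1)  # character absent from P: jump m + 1
    return matches, alignments


print(bmhs_table("just"))                 # {'j': 4, 'u': 3, 's': 2, 't': 1}
print(bmhs_search("abcdejustIn", "just"))  # ([5], [0, 5])
print(bmhs_search("abcabab", "cabab"))     # ([2], [0, 2])
```

Compared with the BM trace for P = just, the first move is 5 instead of 4 because the decision uses e, the character just past the window.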
BMHSL
1. Reduce unnecessary matching attempts (for English text).
2. Make each move as long as possible:
Let Σ1 be the character set of P and Σ2 the character set of T.
Count the frequency in P of every character in Σ1 ∩ Σ2, choose the character K that appears least often, and record each position locate[i] at which K appears in T, together with the distance d[i] = locate[i+1] - locate[i].
If T[k+1] belongs to Σ1, move to locate[i+1].
Otherwise, if d[i] + d[i+1] + ... + d[i+j] > m+1, move to locate[i+j].
BMHSL
Example: T = affdgefccfgh, P = fgh (m = 3), K = f
a f f d g e f c c f g h
f g h
f g h => d[2] + d[3] > 4 (that is, > m+1), move to locate[3]
f g h
f g h
locate[i] of K: 2, 3, 7, 10
d[i]: 1, 4, 3
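The slides leave tie-breaking and indexing implicit, so this minimal Python sketch of the BMHSL preprocessing assumes that ties in P-frequency go to the character that appears first in P and that positions are 1-based; both choices reproduce the worked example. Function names are hypothetical. It does not implement the BMHSL shift rule itself; `k_anchored_search` instead shows one simple, exact use of locate[]: only windows that put P's first K over an occurrence of K in T can match, so only those are verified.

```python
from collections import Counter


def bmhsl_preprocess(text: str, pattern: str):
    """Return (K, locate, d) as in the BMHSL example; K is None if P and T share no character."""
    sigma2 = set(text)                                            # Σ2: characters of T
    common = [c for c in dict.fromkeys(pattern) if c in sigma2]   # Σ1 ∩ Σ2, in order of P
    if not common:
        return None, [], []
    freq_in_p = Counter(pattern)
    k_char = min(common, key=lambda c: freq_in_p[c])  # assumption: first in P wins ties
    locate = [i + 1 for i, c in enumerate(text) if c == k_char]  # 1-based positions
    d = [b - a for a, b in zip(locate, locate[1:])]              # gaps between K's
    return k_char, locate, d


def k_anchored_search(text: str, pattern: str) -> list[int]:
    """Verify only windows that align P's first K with an occurrence of K in T."""
    k_char, locate, _ = bmhsl_preprocess(text, pattern)
    if k_char is None:
        return []
    off, m = pattern.index(k_char), len(pattern)
    starts = (pos - 1 - off for pos in locate)  # convert to 0-based window starts
    return [s for s in starts if 0 <= s <= len(text) - m and text[s:s + m] == pattern]


print(bmhsl_preprocess("affdgefccfgh", "fgh"))   # ('f', [2, 3, 7, 10], [1, 4, 3])
print(k_anchored_search("affdgefccfgh", "fgh"))  # [9]
```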
Multi-Pattern Matching Algorithm
When several pattern strings must be matched, running a single pattern matching algorithm once per pattern is inefficient.
AC algorithm
AC-BM algorithm
Improved AC-BM algorithm: AC-BML
AC
In the preprocessing stage, the AC algorithm builds a tree-shaped finite state automaton from all the pattern strings.
Matching starts from the root of the tree. If the next scanned character does not continue any pattern from the current state, the automaton follows a failure link to the state for the longest suffix of the current state that is also a path from the root.
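A compact Python sketch of the AC automaton with hypothetical names: the tree is a list of dicts (`goto`), failure links are computed breadth-first, and each state's output set also collects the patterns of its failure state, so overlapping matches are reported. The pattern set in the usage line is the standard textbook example, not taken from the paper.

```python
from collections import deque


def build_ac(patterns: list[str]):
    """Build the AC automaton: goto (the pattern tree), failure links, outputs."""
    goto: list[dict[str, int]] = [{}]  # state 0 is the root
    fail: list[int] = [0]
    out: list[set[str]] = [set()]
    for p in patterns:                 # 1) insert every pattern into the tree
        state = 0
        for ch in p:
            if ch not in goto[state]:
                goto.append({})
                fail.append(0)
                out.append(set())
                goto[state][ch] = len(goto) - 1
            state = goto[state][ch]
        out[state].add(p)
    queue = deque(goto[0].values())    # 2) depth-1 states keep fail = root
    while queue:                       # BFS, so shallower failure links exist first
        r = queue.popleft()
        for ch, s in goto[r].items():
            queue.append(s)
            f = fail[r]
            while f and ch not in goto[f]:  # fall back to shorter suffixes
                f = fail[f]
            fail[s] = goto[f].get(ch, 0)
            out[s] |= out[fail[s]]          # also report patterns ending at the suffix
    return goto, fail, out


def ac_search(text: str, patterns: list[str]) -> list[tuple[int, str]]:
    """Scan text once; return sorted (start, pattern) pairs for every match."""
    goto, fail, out = build_ac(patterns)
    state, hits = 0, []
    for i, ch in enumerate(text):
        while state and ch not in goto[state]:
            state = fail[state]             # mismatch: jump to the suffix state
        state = goto[state].get(ch, 0)
        hits.extend((i - len(p) + 1, p) for p in out[state])
    return sorted(hits)


print(ac_search("ushers", ["he", "she", "his", "hers"]))
# [(1, 'she'), (2, 'he'), (2, 'hers')]
```

The text is read exactly once; on a mismatch the automaton never moves back in T, it only follows failure links inside the tree.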
AC-BM
In 1993, building on AC, Jang Jong applied the skipping idea of BM and proposed AC-BM. In the preprocessing stage, a pattern tree is built from the pattern strings as in AC. In the matching process, the pattern tree is aligned with the right end of the target string according to the shortest pattern string, and characters are then matched from right to left along the pattern tree.
When a match fails, the pattern tree moves left.
Its Goodsuffix function is the same as BM's.
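The slides do not give AC-BM's shift tables in detail, so as a hedged illustration of its core idea (a Badchar-style skip limited by the shortest pattern), here is a Set-Horspool-style search in Python. This is a related but simpler technique, not AC-BM itself: it moves the window right instead of moving a tree left, has no Goodsuffix rule, and verifies candidates by direct comparison rather than through a pattern tree. `set_horspool` is a hypothetical name.

```python
def set_horspool(text: str, patterns: list[str]) -> list[tuple[int, str]]:
    """Multi-pattern search that skips ahead with one shared Badchar-style table.

    `end` is the window's last text position; each step moves it right by at
    most lmin, the length of the shortest pattern.
    """
    lmin = min(len(p) for p in patterns)
    if lmin == 0:
        raise ValueError("patterns must be non-empty")
    # shift[c]: smallest safe move when text[end] == c. Only the last lmin
    # positions of each pattern (excluding its final one) can shorten it.
    shift: dict[str, int] = {}
    for p in patterns:
        for j in range(len(p) - lmin, len(p) - 1):
            shift[p[j]] = min(shift.get(p[j], lmin), len(p) - 1 - j)
    hits, end = [], lmin - 1
    while end < len(text):
        for p in patterns:                 # report every pattern ending at `end`
            start = end - len(p) + 1
            if start >= 0 and text.startswith(p, start):
                hits.append((start, p))
        end += shift.get(text[end], lmin)  # characters outside the table skip lmin
    return sorted(hits)


print(set_horspool("abbcfbaebbcfgh", ["bbc", "fgh", "bae"]))
# [(1, 'bbc'), (5, 'bae'), (8, 'bbc'), (11, 'fgh')]
```

The shift is safe because any pattern ending within the skipped positions would need the current character at one of the positions the table already accounts for.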
AC-BM
[Figure: AC-BM matching example on the target string T = a b b c f b a e b b c f g h]
AC-BML
When several pattern strings share the same prefix, a single pattern matching algorithm can be applied to that prefix, which improves the efficiency of the whole matching process.
AC-BML builds on the AC-BM algorithm and combines it with the BMHSL algorithm:
Apply BMHSL to the common prefix of the pattern strings to find and record every location w[i] in the text where the prefix matches.
At the beginning, align the root node of the pattern tree directly with location w[1] in the text.
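A minimal Python sketch of the prefix stage, assuming all patterns share a non-empty common prefix. Python's `str.find` stands in for BMHSL when locating the w[i], and each candidate location is verified by direct comparison rather than by walking an AC-BM pattern tree; the point is only that the w[i] suffice, since every match of every pattern must start at some w[i]. Function names are hypothetical, and positions are 0-based.

```python
import os


def prefix_locations(text: str, prefix: str) -> list[int]:
    """All positions w[i] where the shared prefix occurs (str.find stands in for BMHSL)."""
    locs, i = [], text.find(prefix)
    while i != -1:
        locs.append(i)
        i = text.find(prefix, i + 1)
    return locs


def acbml_sketch(text: str, patterns: list[str]) -> list[tuple[int, str]]:
    """Check the patterns only at the locations w[i] of their shared prefix."""
    prefix = os.path.commonprefix(patterns)
    if not prefix:
        raise ValueError("this sketch needs patterns that share a non-empty prefix")
    hits = []
    for w in prefix_locations(text, prefix):  # w[1], w[2], ... in increasing order
        for p in patterns:                    # AC-BML would walk the pattern tree here
            if text.startswith(p, w):
                hits.append((w, p))
    return hits


text = "attain the attack; attach it"
print(prefix_locations(text, "atta"))                        # [0, 11, 19]
print(acbml_sketch(text, ["attack", "attach", "attain"]))
# [(0, 'attain'), (11, 'attack'), (19, 'attach')]
```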
If the match fails, move the pattern tree left; the distance is determined by the following functions.
Badchar(c): the same as in AC-BM.
distance(i) = w[i] - w[i-1]
If distance(i) is greater, move the pattern tree to w[i-1] and match there.
If Badchar(c) is greater, set distance(i) = w[i] - w[i-2], then compare the two functions again.
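The repeated comparison between distance(i) and Badchar(c) can be written as a small loop. This Python sketch assumes the locations are stored in a 0-based list `w`, that `badchar` is the already-computed Badchar(c) value, and that a tie counts as distance(i) winning, which the slides do not specify. `next_location` is a hypothetical name.

```python
from typing import Optional


def next_location(w: list[int], i: int, badchar: int) -> Optional[int]:
    """Choose the next prefix location after the match at w[i] fails.

    Starts with distance(i) = w[i] - w[i-1]; while Badchar(c) is larger,
    widens to w[i] - w[i-2], w[i] - w[i-3], ... Returns the index of the
    location to move the pattern tree to, or None if none is far enough.
    """
    j = i - 1
    while j >= 0:
        if w[i] - w[j] >= badchar:  # assumption: a tie counts as "distance wins"
            return j
        j -= 1                      # Badchar is larger: try the next location
    return None


w = [2, 5, 9, 14]               # hypothetical prefix locations
print(next_location(w, 3, 4))   # 2  (distance 5 >= 4, move to w[2] = 9)
print(next_location(w, 3, 7))   # 1  (5 < 7, widened distance 9 >= 7)
print(next_location(w, 3, 20))  # None
```

Because every skipped location is closer than the Badchar shift allows, no candidate that could still match is passed over.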
Experimental Analysis