author : sarang dharmapurikar, john lockwood publisher : ieee journal on selected areas in...
TRANSCRIPT
![Page 1: Author : Sarang Dharmapurikar, John Lockwood Publisher : IEEE Journal on Selected Areas in Communications, 2006 Presenter : Jo-Ning Yu Date : 2010/12/29](https://reader036.vdocuments.net/reader036/viewer/2022062805/5697bfc11a28abf838ca4367/html5/thumbnails/1.jpg)
Author : Author : Sarang Dharmapurikar, John LockwoodSarang Dharmapurikar, John Lockwood
Publisher : Publisher : IEEE Journal on Selected Areas in Communications, 2006IEEE Journal on Selected Areas in Communications, 2006
Presenter : Presenter : Jo-Ning YuJo-Ning Yu
Date : Date : 2010/12/292010/12/29
![Page 2: Author : Sarang Dharmapurikar, John Lockwood Publisher : IEEE Journal on Selected Areas in Communications, 2006 Presenter : Jo-Ning Yu Date : 2010/12/29](https://reader036.vdocuments.net/reader036/viewer/2022062805/5697bfc11a28abf838ca4367/html5/thumbnails/2.jpg)
Simple Multipattern Matching Algorithm
Scaling to Arbitrary String Lengths Basic Ideas Data Structures and Algorithm
Evaluation with Snort
Outline2
![Page 3: Author : Sarang Dharmapurikar, John Lockwood Publisher : IEEE Journal on Selected Areas in Communications, 2006 Presenter : Jo-Ning Yu Date : 2010/12/29](https://reader036.vdocuments.net/reader036/viewer/2022062805/5697bfc11a28abf838ca4367/html5/thumbnails/3.jpg)
Notation T[i…j] : the substring of T starting at location i and ending at j.
Method Use hash tables to maintain for each set of strings with different
lengths.
Let L be the largest length of any string in S.
When we consider a window of L bytes starting at location i, i.e., T[i…(i + L - 1)], we have L possible prefix strings T[i…(i + L -1)], T[i…(i + L - 2)], …,T[i…i] each of which must be looked up individually in the corresponding hash table.
Simple Multi-pattern Matching Algorithm
3
![Page 4: Author : Sarang Dharmapurikar, John Lockwood Publisher : IEEE Journal on Selected Areas in Communications, 2006 Presenter : Jo-Ning Yu Date : 2010/12/29](https://reader036.vdocuments.net/reader036/viewer/2022062805/5697bfc11a28abf838ca4367/html5/thumbnails/4.jpg)
After all these prefixes have been looked up we can move the window forward by a single byte such that now we consider bytes from location i + 1 forming the window T[(i+1)…(i+L)].
Secondly, for applications such as NIDS, most of the network data does not contain any string of interest, so we use Bloom filters to reduce the unsuccessful hash table accesses.
We maintain a separate Bloom filter corresponding to each hash table kept in the off-chip memory.
Before we execute a hash table query, we first query the corresponding on-chip Bloom filter to check if the string under inspection is present in the off-chip hash table.
4
Simple Multi-pattern Matching Algorithm
![Page 5: Author : Sarang Dharmapurikar, John Lockwood Publisher : IEEE Journal on Selected Areas in Communications, 2006 Presenter : Jo-Ning Yu Date : 2010/12/29](https://reader036.vdocuments.net/reader036/viewer/2022062805/5697bfc11a28abf838ca4367/html5/thumbnails/5.jpg)
5
Simple Multi-pattern Matching Algorithm
![Page 6: Author : Sarang Dharmapurikar, John Lockwood Publisher : IEEE Journal on Selected Areas in Communications, 2006 Presenter : Jo-Ning Yu Date : 2010/12/29](https://reader036.vdocuments.net/reader036/viewer/2022062805/5697bfc11a28abf838ca4367/html5/thumbnails/6.jpg)
Use r = 4 parallel machines.
6
Simple Multi-pattern Matching Algorithm
![Page 7: Author : Sarang Dharmapurikar, John Lockwood Publisher : IEEE Journal on Selected Areas in Communications, 2006 Presenter : Jo-Ning Yu Date : 2010/12/29](https://reader036.vdocuments.net/reader036/viewer/2022062805/5697bfc11a28abf838ca4367/html5/thumbnails/7.jpg)
Hash computation of the DetectString1 algorithm over long
strings (like 100 characters) consumes more resources and introduces more latency.
We extend the algorithm to handle arbitrarily large strings.
7
Simple Multi-pattern Matching Algorithm
![Page 8: Author : Sarang Dharmapurikar, John Lockwood Publisher : IEEE Journal on Selected Areas in Communications, 2006 Presenter : Jo-Ning Yu Date : 2010/12/29](https://reader036.vdocuments.net/reader036/viewer/2022062805/5697bfc11a28abf838ca4367/html5/thumbnails/8.jpg)
Background Aho-Corasick automaton example:
Basic ideas To split up longer strings into multiple fixed length shorter
segments and use our first algorithm to match the individual segments.
8
Scaling to Arbitrary String Lengths
![Page 9: Author : Sarang Dharmapurikar, John Lockwood Publisher : IEEE Journal on Selected Areas in Communications, 2006 Presenter : Jo-Ning Yu Date : 2010/12/29](https://reader036.vdocuments.net/reader036/viewer/2022062805/5697bfc11a28abf838ca4367/html5/thumbnails/9.jpg)
For instance, if we can detect strings of up to length four characters using Bloom filter technique and if we would like to detect a string “technically” then we cut this string into three pieces “tech”, “nica” and “lly”. These segments can be stitched in a chain and represented as a small state machine with four states q0, q1, q2 and q3.
9
Scaling to Arbitrary String Lengths
![Page 10: Author : Sarang Dharmapurikar, John Lockwood Publisher : IEEE Journal on Selected Areas in Communications, 2006 Presenter : Jo-Ning Yu Date : 2010/12/29](https://reader036.vdocuments.net/reader036/viewer/2022062805/5697bfc11a28abf838ca4367/html5/thumbnails/10.jpg)
lllye
We first segment each string into k character segments (k = 4 for the purpose of illustration) and the left over portion which we call tails.
We treat each of these k-character segments as a symbol.
There are six unique symbols in the given example namely {tech, nica, tele, phon, elep, hant} and the tails associated with these strings are {l, lly, e}.
10
Scaling to Arbitrary String LengthsConstruction
SymbolTail
![Page 11: Author : Sarang Dharmapurikar, John Lockwood Publisher : IEEE Journal on Selected Areas in Communications, 2006 Presenter : Jo-Ning Yu Date : 2010/12/29](https://reader036.vdocuments.net/reader036/viewer/2022062805/5697bfc11a28abf838ca4367/html5/thumbnails/11.jpg)
* We have shown the failure transition to only non-q0 states for clarity in the figure.
11
Scaling to Arbitrary String Lengths
![Page 12: Author : Sarang Dharmapurikar, John Lockwood Publisher : IEEE Journal on Selected Areas in Communications, 2006 Presenter : Jo-Ning Yu Date : 2010/12/29](https://reader036.vdocuments.net/reader036/viewer/2022062805/5697bfc11a28abf838ca4367/html5/thumbnails/12.jpg)
* The tails do not create any state to which the automaton jumps.* Tails simply indicate the completion of a string.
12
Scaling to Arbitrary String Lengths
Since this algorithm essentially builds an Aho-Corasick Non-deterministic Finite Automaton (NFA) that jumps ahead by k characters at a time, we refer to this machine as Jump-ahead Aho-Corasick NFA (JACK-NFA).
![Page 13: Author : Sarang Dharmapurikar, John Lockwood Publisher : IEEE Journal on Selected Areas in Communications, 2006 Presenter : Jo-Ning Yu Date : 2010/12/29](https://reader036.vdocuments.net/reader036/viewer/2022062805/5697bfc11a28abf838ca4367/html5/thumbnails/13.jpg)
13
SearchScaling to Arbitrary String Lengths
* Example 1:Text : technicallytech nica lly
![Page 14: Author : Sarang Dharmapurikar, John Lockwood Publisher : IEEE Journal on Selected Areas in Communications, 2006 Presenter : Jo-Ning Yu Date : 2010/12/29](https://reader036.vdocuments.net/reader036/viewer/2022062805/5697bfc11a28abf838ca4367/html5/thumbnails/14.jpg)
If a string were to appear in the middle of the k character group, it will not be
detected.
For instance, if we begin scanning text “xytechnical...,” then the machine will view the text as “xyte chni cal...,” and since “xyte” is not the beginning of any string and is not a valid 4-character segment associated with state q0, the automaton will never jump to a valid state causing the machine to miss the detection.
14
Scaling to Arbitrary String LengthsProblem
![Page 15: Author : Sarang Dharmapurikar, John Lockwood Publisher : IEEE Journal on Selected Areas in Communications, 2006 Presenter : Jo-Ning Yu Date : 2010/12/29](https://reader036.vdocuments.net/reader036/viewer/2022062805/5697bfc11a28abf838ca4367/html5/thumbnails/15.jpg)
We deploy k machines each of which scans the text with one
byte offset.
In our example, if we deploy 4 machines, then the first machine scans the text as “xyte chni cal…,” whereas the second machine scans it as “ytec hnic al...,” and the third machine scans it as “tech nica l…,” and finally the fourth machine scans it as “echn ical ...”. Since the string appears at the correct boundary for the third machine, it will be detected by it.
Therefore, by using k machines in parallel, we will never miss detection of a string.
15
Scaling to Arbitrary String LengthsSolution
![Page 16: Author : Sarang Dharmapurikar, John Lockwood Publisher : IEEE Journal on Selected Areas in Communications, 2006 Presenter : Jo-Ning Yu Date : 2010/12/29](https://reader036.vdocuments.net/reader036/viewer/2022062805/5697bfc11a28abf838ca4367/html5/thumbnails/16.jpg)
These k machines need not be physical machines. They can be four virtual machines emulated by a single physical machine.
16
Scaling to Arbitrary String Lengths
consume one character at a time
![Page 17: Author : Sarang Dharmapurikar, John Lockwood Publisher : IEEE Journal on Selected Areas in Communications, 2006 Presenter : Jo-Ning Yu Date : 2010/12/29](https://reader036.vdocuments.net/reader036/viewer/2022062805/5697bfc11a28abf838ca4367/html5/thumbnails/17.jpg)
To shift the text stream by r characters at a time, we use r physical machines, which emulate k virtual machines ,where r k.≦
17
Scaling to Arbitrary String Lengths
consume two characters at a time
![Page 18: Author : Sarang Dharmapurikar, John Lockwood Publisher : IEEE Journal on Selected Areas in Communications, 2006 Presenter : Jo-Ning Yu Date : 2010/12/29](https://reader036.vdocuments.net/reader036/viewer/2022062805/5697bfc11a28abf838ca4367/html5/thumbnails/18.jpg)
18
Scaling to Arbitrary String Lengths Data Structures and Algorithm
![Page 19: Author : Sarang Dharmapurikar, John Lockwood Publisher : IEEE Journal on Selected Areas in Communications, 2006 Presenter : Jo-Ning Yu Date : 2010/12/29](https://reader036.vdocuments.net/reader036/viewer/2022062805/5697bfc11a28abf838ca4367/html5/thumbnails/19.jpg)
19
Scaling to Arbitrary String Lengths
LPM2 can be performed in the same way as LPM1 algorithm with the exception that we have to check for the prefixes (tails) associated with only the current state of the machine.
k = 4
ji
![Page 20: Author : Sarang Dharmapurikar, John Lockwood Publisher : IEEE Journal on Selected Areas in Communications, 2006 Presenter : Jo-Ning Yu Date : 2010/12/29](https://reader036.vdocuments.net/reader036/viewer/2022062805/5697bfc11a28abf838ca4367/html5/thumbnails/20.jpg)
Implementation of a physical machine.
20
Scaling to Arbitrary String Lengths
![Page 21: Author : Sarang Dharmapurikar, John Lockwood Publisher : IEEE Journal on Selected Areas in Communications, 2006 Presenter : Jo-Ning Yu Date : 2010/12/29](https://reader036.vdocuments.net/reader036/viewer/2022062805/5697bfc11a28abf838ca4367/html5/thumbnails/21.jpg)
21
Evaluation with Snort - 1
* The string length up to 16 bytes.
![Page 22: Author : Sarang Dharmapurikar, John Lockwood Publisher : IEEE Journal on Selected Areas in Communications, 2006 Presenter : Jo-Ning Yu Date : 2010/12/29](https://reader036.vdocuments.net/reader036/viewer/2022062805/5697bfc11a28abf838ca4367/html5/thumbnails/22.jpg)
22
Evaluation with Snort - 2
* K = 16