![Page 1: Decompression-Free Inspection: DPI for Shared Dictionary Compression over HTTP](https://reader035.vdocuments.net/reader035/viewer/2022062501/5681676c550346895ddc5551/html5/thumbnails/1.jpg)
Decompression-Free Inspection: DPI for Shared Dictionary Compression over HTTP
Anat Bremler-Barr, Interdisciplinary Center Herzliya
Shimrit Tzur David, Interdisciplinary Center Herzliya & The Hebrew University, Jerusalem
David Hay, The Hebrew University, Jerusalem
Yaron Koral, Tel Aviv University
![Page 2: Decompression-Free Inspection: DPI for Shared Dictionary Compression over HTTP](https://reader035.vdocuments.net/reader035/viewer/2022062501/5681676c550346895ddc5551/html5/thumbnails/2.jpg)
Outline
Motivation
Background
◦ Aho-Corasick (AC) algorithm
Our solution
◦ The offline phase
◦ The online phase
Experimental Results
![Page 3: Decompression-Free Inspection: DPI for Shared Dictionary Compression over HTTP](https://reader035.vdocuments.net/reader035/viewer/2022062501/5681676c550346895ddc5551/html5/thumbnails/3.jpg)
Deep Packet Inspection (DPI)
Search for patterns in the packets' payload
Signature-based NIDS
◦ Intrusion prevention
Web-application firewalls
◦ Leakage prevention
◦ Content filtering
Challenges:
◦ Thousands of known malicious patterns
◦ Real time, link rate
The performance of security tools is dominated by the pattern-matching engine (Fisk & Varghese 2002)
![Page 4: Decompression-Free Inspection: DPI for Shared Dictionary Compression over HTTP](https://reader035.vdocuments.net/reader035/viewer/2022062501/5681676c550346895ddc5551/html5/thumbnails/4.jpg)
Compressed HTTP
19% increase in 8 months!
84.1% of the top 1,000 sites compress their traffic.
Data compression is done by adding references to repeated data.
There are two types of compression:
◦ Intra-response compression: the references point to bytes within the same response (Gzip/Deflate)
◦ Inter-response/connection compression: the references point to bytes in a separate file, called a dictionary (Google's SDCH)
![Page 5: Decompression-Free Inspection: DPI for Shared Dictionary Compression over HTTP](https://reader035.vdocuments.net/reader035/viewer/2022062501/5681676c550346895ddc5551/html5/thumbnails/5.jpg)
Example – Intra-Response Compression
File1.html: abcdefgabcd
File2.html: abcdxyzbcdtr
Encode repeated strings by pointer: (distance, length)
TCP Connection Setup
GET File1.html
abcdefg(7,4)
GET File2.html
abcdxyz(6,3)tr
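The pointer notation above can be expanded mechanically. The following is a minimal sketch that decodes the slide's toy text notation only; it is not a Deflate bitstream decoder, and the function name `decode_intra` is hypothetical.

```python
import re

def decode_intra(encoded: str) -> str:
    """Expand 'literal(distance,length)' toy notation into plain text."""
    out = []
    # Each token is either a run of literal characters or a (distance,length) pointer.
    for literal, dist, length in re.findall(r"([^()]+)|\((\d+),(\d+)\)", encoded):
        if literal:
            out.extend(literal)
        else:
            start = len(out) - int(dist)      # pointer counts back from the current end
            for i in range(int(length)):
                out.append(out[start + i])    # copy one symbol at a time so overlaps work
    return "".join(out)

print(decode_intra("abcdefg(7,4)"))    # File1.html
print(decode_intra("abcdxyz(6,3)tr"))  # File2.html
```

Every pointer refers back into the same response, so each response can be decoded with no state shared between File1.html and File2.html.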
![Page 6: Decompression-Free Inspection: DPI for Shared Dictionary Compression over HTTP](https://reader035.vdocuments.net/reader035/viewer/2022062501/5681676c550346895ddc5551/html5/thumbnails/6.jpg)
Example – Inter-Response Compression
Dictionary: abcd
File1.html: abcdefgabcd
File2.html: abcdxyzbcdtr
Copy repeated strings from the dictionary: (address, length)
TCP Connection Setup
GET File1.html
Delta file: (0,4)efg(0,4)
GET File2.html
Delta file: (0,4)xyz(1,3)tr
GET dictionary
abcd
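The delta files on this slide can be expanded against the dictionary in the same way. This is a sketch of the slide's toy notation only (real SDCH carries its deltas in VCDIFF); `decode_delta` is a hypothetical name.

```python
import re

def decode_delta(delta: str, dictionary: str) -> str:
    """Expand 'literal(address,length)' toy notation using a shared dictionary."""
    out = []
    for literal, addr, length in re.findall(r"([^()]+)|\((\d+),(\d+)\)", delta):
        if literal:
            out.append(literal)                    # bytes sent as-is
        else:
            a, n = int(addr), int(length)
            out.append(dictionary[a:a + n])        # bytes copied from the dictionary
    return "".join(out)

DICTIONARY = "abcd"
print(decode_delta("(0,4)efg(0,4)", DICTIONARY))    # File1.html
print(decode_delta("(0,4)xyz(1,3)tr", DICTIONARY))  # File2.html
```

Unlike the intra-response case, both responses point into the same dictionary, so anything learned about the dictionary once can be reused for every delta file that references it.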
![Page 7: Decompression-Free Inspection: DPI for Shared Dictionary Compression over HTTP](https://reader035.vdocuments.net/reader035/viewer/2022062501/5681676c550346895ddc5551/html5/thumbnails/7.jpg)
Current NIDS Operation (1)
[Diagram: Client, NIDS, Server]
Request: GET \index.html, Accept-Encoding: SDCH
Response: HTTP uncompressed
NIDS: scan for intrusions
![Page 8: Decompression-Free Inspection: DPI for Shared Dictionary Compression over HTTP](https://reader035.vdocuments.net/reader035/viewer/2022062501/5681676c550346895ddc5551/html5/thumbnails/8.jpg)
Current NIDS Operation (2)
[Diagram: Client, NIDS, Server]
Request: GET \index.html, Accept-Encoding: SDCH
Response: HTTP compressed
NIDS: either do not scan, or decompress, scan, and compress
![Page 9: Decompression-Free Inspection: DPI for Shared Dictionary Compression over HTTP](https://reader035.vdocuments.net/reader035/viewer/2022062501/5681676c550346895ddc5551/html5/thumbnails/9.jpg)
Our Solution
[Diagram: Client, NIDS, Server]
Request: GET \index.html, Accept-Encoding: SDCH
Response: HTTP compressed
NIDS: scan directly with no decompression
![Page 10: Decompression-Free Inspection: DPI for Shared Dictionary Compression over HTTP](https://reader035.vdocuments.net/reader035/viewer/2022062501/5681676c550346895ddc5551/html5/thumbnails/10.jpg)
Our Solution: Decompression-Free Scanning
Focused on inter-response compression
Our algorithm works in two phases
◦ Offline phase: scanning the dictionary
◦ Online phase: scanning the delta files
Works at the rate of the compressed traffic
◦ Gains a 56% improvement compared with scanning the plain text directly
![Page 11: Decompression-Free Inspection: DPI for Shared Dictionary Compression over HTTP](https://reader035.vdocuments.net/reader035/viewer/2022062501/5681676c550346895ddc5551/html5/thumbnails/11.jpg)
![Page 12: Decompression-Free Inspection: DPI for Shared Dictionary Compression over HTTP](https://reader035.vdocuments.net/reader035/viewer/2022062501/5681676c550346895ddc5551/html5/thumbnails/12.jpg)
Aho-Corasick (AC) Algorithm
Finite State Machine (FSM)
◦ Regular states, accepting states
Goto function (black arrows)
◦ g(state, symbol) → state
Each state corresponds to a label: the sequence of characters on its goto path from the root.
◦ The length of the label is the depth of the state
Failure function (red arrows)
◦ f(state) → state
◦ Taken when no goto transition exists for the next symbol
◦ Leads to the state whose label is the longest suffix of the current state's label
[Figure: the AC automaton for the patterns below, states s0 to s14, goto edges labeled with symbols]
Patterns: E, BE, BD, BCAA, BCD, CDBCAB
The label of s14 is BCAA
g(s11, B) = s12
g(s11, A) = ?
f(s11) = s13, so g(s11, A) is resolved through g(s13, A) = s14
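The goto and failure functions can be sketched in a few lines. This is a minimal textbook construction, not the authors' implementation; `build_ac`, `step`, and `state_of` are hypothetical names, and the pattern insertion order is an assumption chosen so that the state ids come out the same as the numbering inferred from the slide.

```python
from collections import deque

def build_ac(patterns):
    """Return (goto, fail, depth, out) lists indexed by state id; state 0 is the root."""
    goto, fail, depth, out = [{}], [0], [0], [set()]
    for pat in patterns:                      # build the trie (goto function)
        s = 0
        for ch in pat:
            if ch not in goto[s]:
                goto.append({}); fail.append(0)
                depth.append(depth[s] + 1); out.append(set())
                goto[s][ch] = len(goto) - 1
            s = goto[s][ch]
        out[s].add(pat)
    queue = deque(goto[0].values())           # breadth-first failure computation
    while queue:
        s = queue.popleft()
        for ch, t in goto[s].items():
            f = fail[s]
            while f and ch not in goto[f]:
                f = fail[f]
            fail[t] = goto[f].get(ch, 0)
            out[t] |= out[fail[t]]            # inherit matches ending at the same byte
            queue.append(t)
    return goto, fail, depth, out

def step(goto, fail, s, ch):
    """Follow failure transitions until a goto transition on ch exists (or the root)."""
    while s and ch not in goto[s]:
        s = fail[s]
    return goto[s].get(ch, 0)

# Assumed insertion order; it reproduces ids such as s11 = CDBCA and s14 = BCAA.
goto, fail, depth, out = build_ac(["E", "BE", "BD", "BCD", "CDBCAB", "BCAA"])

def state_of(label):
    s = 0
    for ch in label:
        s = goto[s][ch]
    return s

s11 = state_of("CDBCA")
print(s11, fail[s11], step(goto, fail, s11, "A"))
```

From s11 there is no goto transition on A, so `step` takes the failure transition to the state labeled BCA and continues from there to the state labeled BCAA.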
![Page 13: Decompression-Free Inspection: DPI for Shared Dictionary Compression over HTTP](https://reader035.vdocuments.net/reader035/viewer/2022062501/5681676c550346895ddc5551/html5/thumbnails/13.jpg)
Aho-Corasick Insights
The automaton remembers only its current state
◦ The input text ends with the label of the current state
◦ This label is the longest suffix of the text that can be a prefix of a match
No future match can begin before this label
[Figure: the AC automaton for the patterns E, BE, BD, BCAA, BCD, CDBCAB, states s0 to s14]
![Page 14: Decompression-Free Inspection: DPI for Shared Dictionary Compression over HTTP](https://reader035.vdocuments.net/reader035/viewer/2022062501/5681676c550346895ddc5551/html5/thumbnails/14.jpg)
![Page 15: Decompression-Free Inspection: DPI for Shared Dictionary Compression over HTTP](https://reader035.vdocuments.net/reader035/viewer/2022062501/5681676c550346895ddc5551/html5/thumbnails/15.jpg)
Accelerator Algorithm Idea
The algorithm operates in two phases:
The Offline Phase:
◦ Scan the dictionary and store information about the pattern matching results
The Online Phase:
◦ Scan the delta file and skip almost all referenced bytes that were already scanned for patterns.
![Page 16: Decompression-Free Inspection: DPI for Shared Dictionary Compression over HTTP](https://reader035.vdocuments.net/reader035/viewer/2022062501/5681676c550346895ddc5551/html5/thumbnails/16.jpg)
The Offline Phase
The dictionary is scanned using AC (from its first byte, starting at s0). We save the state after each byte.

| Position | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | 11 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Symbol | D | B | E | A | A | C | D | B | C | A | B | C |
| State | S0 | S2 | S3 | S0 | S0 | S7 | S8 | S9 | S10 | S11 | S12 | S5 |

[Figure: the AC automaton for the patterns E, BE, BD, BCAA, BCD, CDBCAB, states s0 to s14]
We also save information about the matched patterns that are found in the dictionary
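The offline phase can be sketched as a plain AC scan that records the state after every dictionary byte together with the matches found. This is a minimal sketch, not the authors' code: `build_ac`, `step`, and `offline_scan` are hypothetical names, and the pattern insertion order is an assumption chosen to reproduce the state numbers shown on the slide.

```python
from collections import deque

def build_ac(patterns):
    """Return (goto, fail, depth, out) lists indexed by state id; state 0 is the root."""
    goto, fail, depth, out = [{}], [0], [0], [set()]
    for pat in patterns:
        s = 0
        for ch in pat:
            if ch not in goto[s]:
                goto.append({}); fail.append(0)
                depth.append(depth[s] + 1); out.append(set())
                goto[s][ch] = len(goto) - 1
            s = goto[s][ch]
        out[s].add(pat)
    queue = deque(goto[0].values())
    while queue:
        s = queue.popleft()
        for ch, t in goto[s].items():
            f = fail[s]
            while f and ch not in goto[f]:
                f = fail[f]
            fail[t] = goto[f].get(ch, 0)
            out[t] |= out[fail[t]]
            queue.append(t)
    return goto, fail, depth, out

def step(goto, fail, s, ch):
    while s and ch not in goto[s]:
        s = fail[s]
    return goto[s].get(ch, 0)

def offline_scan(dictionary, ac):
    """Scan the dictionary once; keep the state after each byte and all matches."""
    goto, fail, depth, out = ac
    saved, matches, s = [], [], 0
    for pos, ch in enumerate(dictionary):
        s = step(goto, fail, s, ch)
        saved.append(s)                                    # one state pointer per symbol
        matches.extend((pos, pat) for pat in sorted(out[s]))  # (end position, pattern)
    return saved, matches

ac = build_ac(["E", "BE", "BD", "BCD", "CDBCAB", "BCAA"])  # assumed insertion order
saved, matches = offline_scan("DBEAACDBCABC", ac)
print(saved)    # state ids for positions 0..11
print(matches)  # patterns found inside the dictionary
```

The saved list holds one entry per dictionary symbol, which is where the "one pointer per symbol" memory cost of the method comes from.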
![Page 17: Decompression-Free Inspection: DPI for Shared Dictionary Compression over HTTP](https://reader035.vdocuments.net/reader035/viewer/2022062501/5681676c550346895ddc5551/html5/thumbnails/17.jpg)
Challenges
Dictionary (positions 0 to 11): D B E A A C D B C A B C
Delta file: ABDB(5,4)AAB(1,4)
The uncompressed data is: A B D B C D B C A A B B E A A
Patterns/Signatures: E, BE, BD, BCAA, BCD, CDBCAB
Types of matches: right boundary, internal, left boundary
We copy from an arbitrary position in the dictionary while the automaton is in an arbitrary state
◦ We show that no matter in what state and at which symbol we start to copy, the resulting state is reachable via failure transitions from the saved state.
![Page 18: Decompression-Free Inspection: DPI for Shared Dictionary Compression over HTTP](https://reader035.vdocuments.net/reader035/viewer/2022062501/5681676c550346895ddc5551/html5/thumbnails/18.jpg)
The Online Phase
Scan the delta file:
Uncompressed bytes: scan using AC.
Copy instruction (p,x)
◦ The copied data was already scanned in the offline phase.
◦ We skip the scan for almost all of these bytes.
Handling internal matches is trivial; see the paper for details.
![Page 19: Decompression-Free Inspection: DPI for Shared Dictionary Compression over HTTP](https://reader035.vdocuments.net/reader035/viewer/2022062501/5681676c550346895ddc5551/html5/thumbnails/19.jpg)
The Online Phase - Right Boundary
When encountering copy instruction (p,x), we want to stop scanning and jump to state[p+x-1]
◦ If the label of the state is longer than the copy value
  ▪ The label begins before the copy value
  ▪ The context of this state differs from that of the online scan
  ▪ We take failure transitions to find a state with a sufficiently short label.
◦ Otherwise
  ▪ The label of the state is contained in the copy value
  ▪ This is the longest suffix that can lead to a match
![Page 20: Decompression-Free Inspection: DPI for Shared Dictionary Compression over HTTP](https://reader035.vdocuments.net/reader035/viewer/2022062501/5681676c550346895ddc5551/html5/thumbnails/20.jpg)
Example – Right Boundary
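Uncompressed data: …B
COPY(7,4): BCAB
[Figure: the AC automaton for the patterns E, BE, BD, BCAA, BCD, CDBCAB, states s0 to s14]

| Position | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | 11 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Symbol | D | B | E | A | A | C | D | B | C | A | B | C |
| State | S0 | S2 | S3 | S0 | S0 | S7 | S8 | S9 | S10 | S11 | S12 | S5 |

Go to State[10] = s12. depth(s12) > 4.
Go to f(s12) = s2. depth(s2) ≤ 4.
Current state is s2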
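The right-boundary rule from this example can be sketched directly on the saved table. The state list is taken from the slide; the two depth values are assumptions derived from reading s12 as the state labeled CDBCAB and s2 as the state labeled B, and `right_boundary` is a hypothetical name.

```python
# Saved states after each byte of the dictionary DBEAACDBCABC (from the slide's table).
SAVED = [0, 2, 3, 0, 0, 7, 8, 9, 10, 11, 12, 5]
# Only the two states touched by the example are listed.
DEPTH = {12: 6, 2: 1}   # assumed label lengths: CDBCAB and B
FAIL = {12: 2}          # f(s12) = s2, as stated on the slide

def right_boundary(p, x, saved=SAVED, depth=DEPTH, fail=FAIL):
    """Jump to the state saved at the end of the copy, then shorten it if needed."""
    state = saved[p + x - 1]
    hops = 0
    while depth[state] > x:      # label would start before the copied bytes
        state = fail[state]
        hops += 1
    return state, hops

print(right_boundary(7, 4))  # state after COPY(7,4), and failure transitions taken
```

The number of hops returned is the overhead this step adds; when the saved state's depth is already at most x, the loop does not run at all.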
![Page 21: Decompression-Free Inspection: DPI for Shared Dictionary Compression over HTTP](https://reader035.vdocuments.net/reader035/viewer/2022062501/5681676c550346895ddc5551/html5/thumbnails/21.jpg)
The Online Phase - Left Boundary
When encountering copy instruction (p,x), we want to stop scanning and jump to state[p+x-1]
◦ If the number of bytes we have read from the copy value is less than the depth of the current state
  ▪ The label of the state begins before the copied bytes
  ▪ We scan the copy value until we reach a state whose label is no longer than the number of bytes read.
◦ Otherwise
  ▪ The label of the state is contained in the copy value
  ▪ Both offline and online scans have the same context
![Page 22: Decompression-Free Inspection: DPI for Shared Dictionary Compression over HTTP](https://reader035.vdocuments.net/reader035/viewer/2022062501/5681676c550346895ddc5551/html5/thumbnails/22.jpg)
Example – Left Boundary
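Uncompressed data: …B
COPY(5,4): CDBC
[Figure: the AC automaton for the patterns E, BE, BD, BCAA, BCD, CDBCAB, states s0 to s14]

| Position | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | 11 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Symbol | D | B | E | A | A | C | D | B | C | A | B | C |
| State | S0 | S2 | S3 | S0 | S0 | S7 | S8 | S9 | S10 | S11 | S12 | S5 |

j=0, depth=1: continue
j=1, depth=2: continue
j=2, depth=3: continue
j=3: stop scanning (depth(s9) ≤ 3)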
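Putting the left-boundary scan, the jump, and the right-boundary correction together gives the following sketch of the online phase. It is an illustration under stated assumptions rather than the authors' implementation: all function names are hypothetical, the delta file is represented as a Python list of literals and (p, x) tuples, and the internal-match filter (report an offline match only if it lies entirely inside the skipped part of the copy) is my reading of the step the slides defer to the paper.

```python
from collections import deque

def build_ac(patterns):
    goto, fail, depth, out = [{}], [0], [0], [set()]
    for pat in patterns:
        s = 0
        for ch in pat:
            if ch not in goto[s]:
                goto.append({}); fail.append(0)
                depth.append(depth[s] + 1); out.append(set())
                goto[s][ch] = len(goto) - 1
            s = goto[s][ch]
        out[s].add(pat)
    queue = deque(goto[0].values())
    while queue:
        s = queue.popleft()
        for ch, t in goto[s].items():
            f = fail[s]
            while f and ch not in goto[f]:
                f = fail[f]
            fail[t] = goto[f].get(ch, 0)
            out[t] |= out[fail[t]]
            queue.append(t)
    return goto, fail, depth, out

def step(goto, fail, s, ch):
    while s and ch not in goto[s]:
        s = fail[s]
    return goto[s].get(ch, 0)

def offline_scan(text, ac):
    goto, fail, depth, out = ac
    saved, matches, s = [], [], 0
    for pos, ch in enumerate(text):
        s = step(goto, fail, s, ch)
        saved.append(s)
        matches.extend((pos, pat) for pat in sorted(out[s]))
    return saved, matches

def scan_delta(delta, dictionary, ac, saved, dict_matches):
    """Return (final state, matches as (end position, pattern), bytes actually scanned)."""
    goto, fail, depth, out = ac
    state, pos, scanned, found = 0, 0, 0, []

    def feed(ch):                               # ordinary AC step on one byte
        nonlocal state, pos, scanned
        state = step(goto, fail, state, ch)
        found.extend((pos, pat) for pat in out[state])
        pos += 1
        scanned += 1

    for item in delta:
        if isinstance(item, str):               # uncompressed bytes: scan with AC
            for ch in item:
                feed(ch)
            continue
        p, x = item
        base, j = pos, 0
        while j < x and depth[state] > j:       # left boundary: label still starts before the copy
            feed(dictionary[p + j])
            j += 1
        if j == x:                              # whole copy had to be scanned
            continue
        for end, pat in dict_matches:           # internal matches inside the skipped bytes
            if p + j <= end < p + x and end - len(pat) + 1 >= p:
                found.append((base + end - p, pat))
        state = saved[p + x - 1]                # jump to the saved state
        while depth[state] > x:                 # right boundary: shorten via failure transitions
            state = fail[state]
        pos = base + x
    return state, sorted(found), scanned

DICT = "DBEAACDBCABC"
ac = build_ac(["E", "BE", "BD", "BCD", "CDBCAB", "BCAA"])
saved, dict_matches = offline_scan(DICT, ac)
state, found, scanned = scan_delta(["ABDB", (5, 4), "AAB", (1, 4)], DICT, ac, saved, dict_matches)
print(state, found, scanned)
```

On the delta file from the Challenges slide, this sketch reports the same matches as a plain AC scan of the 15 uncompressed bytes while stepping the automaton over fewer of them; the count of bytes actually scanned is what the scan ratio measures.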
![Page 23: Decompression-Free Inspection: DPI for Shared Dictionary Compression over HTTP](https://reader035.vdocuments.net/reader035/viewer/2022062501/5681676c550346895ddc5551/html5/thumbnails/23.jpg)
![Page 24: Decompression-Free Inspection: DPI for Shared Dictionary Compression over HTTP](https://reader035.vdocuments.net/reader035/viewer/2022062501/5681676c550346895ddc5551/html5/thumbnails/24.jpg)
Experimental Results
Input:
◦ google.com dictionary
◦ Pages for the 1,000 most popular Google queries
Patterns
◦ Snort
The synthetic case
◦ A patterns file for each input file, so that the input files have different percentages of matches, from 25% to 100%.
![Page 25: Decompression-Free Inspection: DPI for Shared Dictionary Compression over HTTP](https://reader035.vdocuments.net/reader035/viewer/2022062501/5681676c550346895ddc5551/html5/thumbnails/25.jpg)
The Algorithm Overheads
1. Traversing the failure transitions
◦ In the right boundary
2. Scanning the copy value
◦ In the left boundary
3. Memory consumption:
◦ The additional information of the offline phase.
◦ Total: 420 KB (per dictionary)
◦ Can be further reduced by a variable-length pointer encoding.
![Page 26: Decompression-Free Inspection: DPI for Shared Dictionary Compression over HTTP](https://reader035.vdocuments.net/reader035/viewer/2022062501/5681676c550346895ddc5551/html5/thumbnails/26.jpg)
Failure Transitions – Right Boundaries
If length ≥ depth, no failure transition is taken
In our experiments:
◦ The average is 2.35 failure transitions per file (with an average of 557 copy instructions per file)
![Page 27: Decompression-Free Inspection: DPI for Shared Dictionary Compression over HTTP](https://reader035.vdocuments.net/reader035/viewer/2022062501/5681676c550346895ddc5551/html5/thumbnails/27.jpg)
Scanning the Copy Value - Left Boundary
Compression ratio: compressed/uncompressed
Scan ratio: scanned/uncompressed
Snort
◦ Low percentage of matches
◦ Scan ratio ≈ compression ratio
The synthetic case
◦ High percentage of matches
◦ Unrealistic case
◦ Scan ratio is between 1.05 and 1.2 times the compression ratio.
![Page 28: Decompression-Free Inspection: DPI for Shared Dictionary Compression over HTTP](https://reader035.vdocuments.net/reader035/viewer/2022062501/5681676c550346895ddc5551/html5/thumbnails/28.jpg)
Regular Expression Results
Strings were extracted from the regular expression and were added to the pattern set.
When needed, we use an off-the-shelf Perl-compatible regular expression (PCRE) engine to scan additional parts of the text.
The overhead of the regular expressions is around 1%, which is almost negligible
![Page 29: Decompression-Free Inspection: DPI for Shared Dictionary Compression over HTTP](https://reader035.vdocuments.net/reader035/viewer/2022062501/5681676c550346895ddc5551/html5/thumbnails/29.jpg)
Questions?
![Page 30: Decompression-Free Inspection: DPI for Shared Dictionary Compression over HTTP](https://reader035.vdocuments.net/reader035/viewer/2022062501/5681676c550346895ddc5551/html5/thumbnails/30.jpg)
Regular Expression
Very common in security-oriented patterns.
◦ In Snort, 55% of the rules contain a regular expression.
Composed of anchors and PCRE tokens.
For example, in the pattern: abc[1-9]*xyza{3,7}
The anchors are:
◦ abc
◦ xyz
The PCRE tokens are:
◦ [1-9]*
◦ a{3,7}
![Page 31: Decompression-Free Inspection: DPI for Shared Dictionary Compression over HTTP](https://reader035.vdocuments.net/reader035/viewer/2022062501/5681676c550346895ddc5551/html5/thumbnails/31.jpg)
Dealing with Regular Expression
1. The anchors are extracted from the regular expression offline.
2. The anchors are added to the patterns set.
3. If all the anchors of a regular expression were matched:
◦ Run an off-the-shelf regular expression engine until a mismatch, a full pattern match, or the end of the (limited) text is reached.
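The three steps can be sketched with Python's standard `re` module standing in for the off-the-shelf engine. This is an illustrative sketch only: the anchors are extracted by hand, plain substring tests stand in for the AC pass that would normally report anchor matches, and `scan_with_anchors` is a hypothetical name.

```python
import re

REGEX = re.compile(r"abc[1-9]*xyza{3,7}")   # example pattern from the slides
ANCHORS = ["abc", "xyz"]                     # extracted offline (by hand in this sketch)

def scan_with_anchors(text):
    """Invoke the regex engine only when every anchor of the expression was seen."""
    # Stand-in for the AC scan: in the real flow the anchors are part of the pattern set.
    if not all(anchor in text for anchor in ANCHORS):
        return None                          # some anchor is missing: the regex cannot match
    m = REGEX.search(text)
    return m.group(0) if m else None

print(scan_with_anchors("--abc123xyzaaaa--"))  # all anchors present, regex engine is run
print(scan_with_anchors("--abc123--"))         # anchor xyz missing, regex engine is skipped
```

The regex engine is the expensive part, so gating it on the anchors means most traffic is handled by string matching alone.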
![Page 32: Decompression-Free Inspection: DPI for Shared Dictionary Compression over HTTP](https://reader035.vdocuments.net/reader035/viewer/2022062501/5681676c550346895ddc5551/html5/thumbnails/32.jpg)
Regular Expression – Limited Search
In most cases, we can limit the search in at least one direction.
◦ If all tokens before the first anchor have a limited size, there is a bounded number of characters we should examine before the matched anchor.
◦ If all tokens after the last anchor have a limited size, there is a bounded number of characters we should examine after the matched anchor.
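For the example pattern abc[1-9]*xyza{3,7}, no token precedes the first anchor and the only token after the last anchor, a{3,7}, consumes at most 7 characters, so the search can be bounded on both sides. The sketch below illustrates that bound under these assumptions; the constants are worked out by hand for this one pattern and `limited_search` is a hypothetical name.

```python
import re

REGEX = re.compile(r"abc[1-9]*xyza{3,7}")
FIRST_ANCHOR, LAST_ANCHOR = "abc", "xyz"
MAX_BEFORE_FIRST = 0   # no tokens precede "abc" in this pattern
MAX_AFTER_LAST = 7     # a{3,7} consumes at most 7 characters

def limited_search(text):
    """Run the regex only on the window implied by the anchor positions."""
    first = text.find(FIRST_ANCHOR)          # earliest possible match start
    last = text.rfind(LAST_ANCHOR)           # latest possible final anchor
    if first < 0 or last < 0:
        return None
    lo = max(0, first - MAX_BEFORE_FIRST)
    hi = min(len(text), last + len(LAST_ANCHOR) + MAX_AFTER_LAST)
    m = REGEX.search(text, lo, hi)           # pos/endpos restrict the searched region
    return (m.start(), m.end(), hi - lo) if m else None

text = "q" * 50 + "abc12xyzaaa" + "q" * 50
print(limited_search(text))  # (match start, match end, characters handed to the engine)
```

Here the engine is handed a short window of the 111-character text instead of the whole input.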
![Page 33: Decompression-Free Inspection: DPI for Shared Dictionary Compression over HTTP](https://reader035.vdocuments.net/reader035/viewer/2022062501/5681676c550346895ddc5551/html5/thumbnails/33.jpg)
Memory Consumption
1. Doubling the size of the dictionary (for saving the offline scan results, one pointer per symbol)
2. Saving the matched list (for internal matches)
Our experiments:
◦ Match list size: 40,000
◦ Dictionary size: 116K symbols
◦ Pointer size: 17 bits
Total memory consumption is 420 KB (per dictionary)
◦ Can be further reduced by a variable-length pointer encoding.
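As a rough check of these figures, the saved-state array alone can be worked out from the dictionary and pointer sizes. Attributing the remainder to the match list is an assumption; the slides give only the total.

```latex
\begin{aligned}
\text{saved-state array} &= 116\text{K symbols} \times 17\ \text{bits}
  = 1972\text{K bits} = 246.5\ \text{KB} \\
\text{remainder} &= 420\ \text{KB} - 246.5\ \text{KB} = 173.5\ \text{KB}
\end{aligned}
```

If that remainder is all match-list storage, it comes to a little over 4 bytes for each of the 40,000 entries.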