Decompression-Free Inspection: DPI for Shared Dictionary Compression over HTTP

Author: Anat Bremler-Barr, Yaron Koral, Shimrit Tzur David, David Hay

Publisher: IEEE INFOCOM, 2012

Presenter: Kai-Yang, Liu

Date: 2011/12/21

INTRODUCTION

•Gzip and Deflate compress each individual response well, but in many cases a group of pages shares a large amount of common data.

•Therefore, next-generation compression methods are inter-file: a single dictionary can be referenced by several files.

The VCDIFF Compression Algorithm

•The VCDIFF encoding process uses three types of instructions, called delta instructions:

ADD(i , str) means to append to the output i bytes, which are specified in parameter str.

RUN(i , b) means to append i times the byte b.

COPY(p , x) means that the interval [p , p + x) should be copied from the dictionary.

Example

Dictionary : DBEAACDBCABC

The plain text that should be considered is therefore

ABDDBEAAAACDBCABABCAACBCDBADBC
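The three delta instructions can be sketched as a small decoder. This is a simplified model of the slide notation, not the VCDIFF wire format. The tuple encoding, the function name `apply_delta`, and the delta file itself are assumptions: `DELTA` is just one instruction sequence that reproduces the plain text above, not necessarily the one used on the slide.

```python
DICTIONARY = "DBEAACDBCABC"
PLAIN_TEXT = "ABDDBEAAAACDBCABABCAACBCDBADBC"

# Hypothetical delta file, written as (opcode, arg1, arg2) tuples.
DELTA = [
    ("ADD", 3, "ABD"),
    ("COPY", 0, 5),    # DBEAA
    ("COPY", 3, 8),    # AACDBCAB
    ("COPY", 9, 3),    # ABC
    ("COPY", 3, 3),    # AAC
    ("ADD", 5, "BCDBA"),
    ("COPY", 6, 3),    # DBC
]


def apply_delta(dictionary, delta):
    """Rebuild the plain text from a dictionary and a list of delta instructions."""
    out = []
    for op, a, b in delta:
        if op == "ADD":        # ADD(i, str): append the i bytes given in str
            if len(b) != a:
                raise ValueError("ADD length does not match its string")
            out.append(b)
        elif op == "RUN":      # RUN(i, b): append byte b, i times
            out.append(b * a)
        elif op == "COPY":     # COPY(p, x): append dictionary[p : p + x]
            out.append(dictionary[a:a + b])
        else:
            raise ValueError(f"unknown instruction {op!r}")
    return "".join(out)


print(apply_delta(DICTIONARY, DELTA) == PLAIN_TEXT)  # True
```

Decoding like this is exactly the work the inspection algorithm tries to avoid for COPY instructions, since the copied bytes were already scanned once as part of the dictionary.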

Aho-Corasick Algorithm

•Pattern set:

{ E, BE, BD, BCD, BCAA, CDBCAB }
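As a point of reference, a textbook Aho-Corasick automaton for this pattern set can be sketched as follows. The function names and the list-based state layout are assumptions made for illustration. The scan runs over the uncompressed example text, which is the baseline that the decompression-free method is meant to improve on.

```python
from collections import deque

PATTERNS = ["E", "BE", "BD", "BCD", "BCAA", "CDBCAB"]
PLAIN_TEXT = "ABDDBEAAAACDBCABABCAACBCDBADBC"


def build_automaton(patterns):
    """Return (goto, fail, out) tables; state 0 is the root."""
    goto, fail, out = [{}], [0], [[]]
    for pat in patterns:                      # build the trie
        s = 0
        for ch in pat:
            if ch not in goto[s]:
                goto[s][ch] = len(goto)
                goto.append({})
                fail.append(0)
                out.append([])
            s = goto[s][ch]
        out[s].append(pat)
    queue = deque(goto[0].values())           # depth-1 states fail to the root
    while queue:                              # breadth-first failure links
        s = queue.popleft()
        for ch, t in goto[s].items():
            f = fail[s]
            while f and ch not in goto[f]:
                f = fail[f]
            fail[t] = goto[f].get(ch, 0)
            out[t] = out[t] + out[fail[t]]    # inherit matches of the suffix state
            queue.append(t)
    return goto, fail, out


def search(text, automaton):
    """Return (end_position, pattern) for every pattern occurrence in text."""
    goto, fail, out = automaton
    s, hits = 0, []
    for i, ch in enumerate(text):
        while s and ch not in goto[s]:
            s = fail[s]
        s = goto[s].get(ch, 0)
        hits.extend((i, pat) for pat in out[s])
    return hits


print(sorted(search(PLAIN_TEXT, build_automaton(PATTERNS))))
# [(2, 'BD'), (5, 'BE'), (5, 'E'), (15, 'CDBCAB'), (20, 'BCAA'), (24, 'BCD')]
```

Positions are 0-based and mark the last symbol of each match.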

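The Offline Phase

•The dictionary is scanned from the first symbol using the Aho-Corasick algorithm. The scan fills two structures:

State array : the automaton state reached after scanning each dictionary symbol.

Match list : the patterns matched in the dictionary, with the positions where they end.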

The Offline Phase

Dictionary : DBEAACDBCABC
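To make the two offline structures concrete, the sketch below computes them for this dictionary and the pattern set given earlier. It is a brute-force stand-in for the automaton scan: the depth of the Aho-Corasick state at a position equals the length of the longest suffix of the scanned text that is a prefix of some pattern, so the depths can be derived directly from that definition. The function name and the output layout are assumptions, and the values are computed by this sketch rather than copied from the slide.

```python
DICTIONARY = "DBEAACDBCABC"
PATTERNS = ["E", "BE", "BD", "BCD", "BCAA", "CDBCAB"]


def offline_phase(dictionary, patterns):
    """Return (state_depth, match_list), one entry per dictionary position."""
    # Every prefix of every pattern corresponds to one automaton state.
    prefixes = {pat[:k] for pat in patterns for k in range(len(pat) + 1)}
    state_depth, match_list = [], []
    for q in range(len(dictionary)):
        seen = dictionary[:q + 1]
        # Depth of the state reached after scanning dictionary[q].
        state_depth.append(max(len(pre) for pre in prefixes if seen.endswith(pre)))
        # Patterns whose last symbol is dictionary[q].
        match_list.append(sorted(pat for pat in patterns if seen.endswith(pat)))
    return state_depth, match_list


depths, matches = offline_phase(DICTIONARY, PATTERNS)
print(depths)   # [0, 1, 2, 0, 0, 1, 2, 3, 4, 5, 6, 2]
print({q: m for q, m in enumerate(matches) if m})
# {2: ['BE', 'E'], 10: ['CDBCAB']}
```

A real implementation would obtain both structures from a single automaton pass over the dictionary instead of the quadratic suffix checks used here.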

Four Kinds of Matches

•Patterns that are fully contained within an ADD instruction.

•Patterns that are fully contained within a COPY instruction.

•Patterns whose prefix is within a COPY instruction.

•Patterns whose suffix is within a COPY instruction.

The Online Phase

•The delta file is scanned with the AC algorithm, one instruction at a time.

•ADD instruction : simply scan it by traversing the automaton.

•COPY (p , x) instruction :

Step1: Starting from the current state, scan the copied symbols from the dictionary one by one, until scanning a symbol b_{p+i} brings us to a state in the automaton whose depth is less than or equal to i.

※ Finds all the patterns of the fourth category

The Online Phase

Step2: We check the Match list to find any patterns in the dictionary that end within the interval [p , p+x).

※ Finds all the patterns of the second category

Step3: We obtain the state State[p+x-1]. From that state, we follow failure transitions in the automaton until we reach a state s whose depth is less than or equal to x.
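The three steps can be put together in a compact sketch. Several things here are assumptions made for illustration: automaton states are represented by their label strings (so a state's depth is the length of its label) instead of a transition table, the stopping test in Step 1 counts the number of copied symbols consumed so far, Step 2 keeps only matches that start inside the copied interval, and `DELTA` is a hypothetical delta file that decodes to the example plain text.

```python
PATTERNS = ["E", "BE", "BD", "BCD", "BCAA", "CDBCAB"]
DICTIONARY = "DBEAACDBCABC"
PLAIN_TEXT = "ABDDBEAAAACDBCABABCAACBCDBADBC"
DELTA = [("ADD", 3, "ABD"), ("COPY", 0, 5), ("COPY", 3, 8), ("COPY", 9, 3),
         ("COPY", 3, 3), ("ADD", 5, "BCDBA"), ("COPY", 6, 3)]
# One state per pattern prefix; the empty string is the root.
PREFIXES = {pat[:k] for pat in PATTERNS for k in range(len(pat) + 1)}


def step(state, ch):
    """Transition: longest suffix of state + ch that is a pattern prefix."""
    cand = state + ch
    while cand not in PREFIXES:
        cand = cand[1:]
    return cand


def fail(state):
    """Failure link: longest proper suffix of state that is a pattern prefix."""
    cand = state[1:]
    while cand not in PREFIXES:
        cand = cand[1:]
    return cand


def outputs(state):
    """Patterns ending at the position where this state was reached."""
    return [pat for pat in PATTERNS if state.endswith(pat)]


def scan_plain(text):
    """Baseline: ordinary scan of uncompressed text."""
    s, found = "", []
    for i, ch in enumerate(text):
        s = step(s, ch)
        found += [(i, pat) for pat in outputs(s)]
    return found


def scan_delta(delta, dictionary=DICTIONARY):
    """Scan a delta file without rescanning most copied bytes."""
    states, s = [], ""
    for ch in dictionary:                      # offline phase: State array
        s = step(s, ch)
        states.append(s)
    match_list = [outputs(st) for st in states]
    s, pos, found = "", 0, []                  # pos = plain-text bytes so far
    for op, a, b in delta:
        if op in ("ADD", "RUN"):               # literal bytes: ordinary scan
            for ch in (b if op == "ADD" else b * a):
                s = step(s, ch)
                found += [(pos, pat) for pat in outputs(s)]
                pos += 1
            continue
        p, x, j = a, b, 0
        # Step 1: scan until the state no longer reaches back before the copy.
        while j < x and len(s) > j:
            s = step(s, dictionary[p + j])
            found += [(pos + j, pat) for pat in outputs(s)]
            j += 1
        # Step 2: reuse dictionary matches that lie inside the copied interval.
        for k in range(j, x):
            found += [(pos + k, pat) for pat in match_list[p + k]
                      if len(pat) <= k + 1]
        # Step 3: restore the state at the end of the copy via failure links.
        if j < x:
            s = states[p + x - 1]
            while len(s) > x:
                s = fail(s)
        pos += x
    return found


print(sorted(scan_delta(DELTA)) == sorted(scan_plain(PLAIN_TEXT)))  # True
```

On this input the two COPY instructions that begin while the automaton is at the root are handled without scanning any copied symbol, which is where the saving comes from.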

Dictionary : DBEAACDBCABC

Match List: (shown as a sequence of figures on the slides)


Optimizations

•Efficient pattern lookups in the Match list:

Save the Match list as a balanced tree.

Add an array of pointers of the dictionary size.

Given a COPY (p , x) instruction, one can cache the corresponding internal matches within [p , p+x-1] in a hash table whose key is “(p , x)”.

REGULAR EXPRESSIONS INSPECTION

•Anchors are extracted from the regular expression offline. Then, our algorithm is applied to the SDCH-compressed traffic with the anchors as the pattern set.

•For example, take the regular expression \d{6}ABCDE\s+123456\d*XYZ$. If we matched the anchor ABCDE at position x1 and the anchor XYZ at position x2, the interval [x1-10 , x2] should be passed to the regular expression engine for re-examination. The offset 10 reaches back over the 5 symbols of ABCDE and the 6 digits required by \d{6}, assuming x1 is the position of the anchor's last symbol.
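The anchor-then-recheck idea can be sketched with Python's `re` module. Here `str.find` and `str.rfind` stand in for the multi-pattern scan that would locate the anchors, positions are taken to be the index of an anchor's last symbol, and the function name `recheck` is an assumption. Because only the window is handed to the engine, `$` matches at the end of the window; a deployment would also need to confirm that x2 is the end of the inspected payload.

```python
import re

REGEX = re.compile(r"\d{6}ABCDE\s+123456\d*XYZ$")
ANCHOR_1, ANCHOR_2 = "ABCDE", "XYZ"
LOOKBACK = 10  # 5 symbols of ABCDE plus 6 digits, counted back from x1


def recheck(text):
    """Run the full regular expression only on the window around the anchors."""
    start_1 = text.find(ANCHOR_1)
    start_2 = text.rfind(ANCHOR_2)
    if start_1 < 0 or start_2 < 0:
        return False                     # an anchor is missing: no match possible
    x1 = start_1 + len(ANCHOR_1) - 1     # last symbol of ABCDE
    x2 = start_2 + len(ANCHOR_2) - 1     # last symbol of XYZ
    if x2 <= x1:
        return False                     # anchors appear in the wrong order
    window = text[max(0, x1 - LOOKBACK):x2 + 1]
    return REGEX.search(window) is not None


print(recheck("xx123456ABCDE  123456789XYZ"))  # True
print(recheck("12345ABCDE 123456XYZ"))         # False (only five leading digits)
print(recheck("no anchors in this text"))      # False
```

Only inputs where both anchors were found reach the regular expression engine, and then only the window is examined rather than the whole text.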

Experimental Results

•Data Sets : We first downloaded the dictionary from google.com and used the 1000 most popular Google search queries.

•Pattern Sets : The signatures data sets are drawn from a snapshot of Snort rules as of October 2010.

•We also constructed a synthetic patterns file for each input file.
