
1

Efficient Rule Matching for Large Scale Systems

Packet Classification – A Case Study

Alok Tongaonkar

Stony Brook University


2

Rule Based Systems

Applications in security:
- Intrusion detection systems
- Firewalls
- Access control systems

Policy specified in terms of a database of rules

Enforcement involves identifying the applicable rule(s)

3

Fundamental Operation

Given an input p with attributes {p1, p2, ..., pk}, identify the rules Ri from {R1, R2, ..., Rn} that match p.

Ri: condition -> action

e.g. R1: dhost == PLUTO && dport == HTTP && content: “Bad command” -> DENY

Challenge

Rule matching algorithms do not scale well – either in space or in time
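
The fundamental operation can be sketched as a plain scan over the rule database. This is a minimal illustration, assuming a packet is a dictionary of attribute values and a rule is a named (condition, action) pair; `Rule`, `match_all`, and the sample packet are hypothetical names, not part of the original system.

```python
from typing import Callable, Dict, List, NamedTuple

Packet = Dict[str, str]

class Rule(NamedTuple):
    name: str
    condition: Callable[[Packet], bool]   # the "condition" side of Ri
    action: str                           # the "action" side of Ri

def match_all(rules: List[Rule], p: Packet) -> List[Rule]:
    """Check every rule against p and return the ones whose condition holds."""
    return [r for r in rules if r.condition(p)]

# R1 from the slide, written as a Python predicate (illustrative encoding).
rules = [
    Rule(
        "R1",
        lambda p: p.get("dhost") == "PLUTO"
        and p.get("dport") == "HTTP"
        and "Bad command" in p.get("content", ""),
        "DENY",
    ),
]

packet = {"dhost": "PLUTO", "dport": "HTTP", "content": "xx Bad command xx"}
matched = match_all(rules, packet)
```

Every rule is evaluated for every packet, which is exactly the cost the Challenge above refers to.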

4

Matching Algorithms

n: no. of rules, k: no. of attributes

Linear Search
- Match one rule at a time
- Space efficient: O(n*k)
- Matching time grows linearly with the number of rules: O(n)

Table-based Search
- Columns correspond to attributes, rows correspond to rules
- Wastes space when many rules specify “*” for many attributes: O(n*k)
- Efficient matching in hardware/multiprocessor: match different attributes in parallel and combine results
- In a uniprocessor environment, matching time is O(n)
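
A small sketch of the table-based idea, assuming each column is matched independently and the per-column results are combined with a bitwise AND. The table contents, `ATTRS`, `column_mask`, and `match` are made up for illustration; on a uniprocessor the column scans below still cost O(n) each.

```python
WILDCARD = "*"

# Rows are rules, columns are attributes (hypothetical example table).
ATTRS = ["dhost", "dport", "proto"]
TABLE = [
    ["PLUTO", "80", "tcp"],   # rule 0
    ["*",     "22", "tcp"],   # rule 1
    ["*",     "*",  "udp"],   # rule 2
]

def column_mask(col, value):
    """Bitmask of the rules whose entry in column `col` accepts `value`."""
    mask = 0
    for row, rule in enumerate(TABLE):
        if rule[col] in (WILDCARD, value):
            mask |= 1 << row
    return mask

def match(packet):
    """Combine the per-attribute masks; bit i survives only if rule i accepts every attribute."""
    result = (1 << len(TABLE)) - 1
    for col, attr in enumerate(ATTRS):
        result &= column_mask(col, packet[attr])
    return [row for row in range(len(TABLE)) if (result >> row) & 1]
```

In hardware or on multiple processors the calls to `column_mask` could run in parallel, leaving only the final AND as sequential work.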

5

Matching Algorithms contd.

Decision Tree (Trie-like structure)
- Each node corresponds to a test on an attribute
- Matching time: O(k)
  - The no. of attributes is an order of magnitude smaller than the no. of rules
- Size: can be exponential in n
- Minimization of a decision tree is an NP-complete problem!

Goal

Develop efficient techniques for rule matching that scale to support thousands of rules

6

Outline

Problem Formulation

Techniques
- Minimize duplication
- Benign non-determinism
- Polynomial bound
- Utility

Results

7

Packet Classification

A mechanism that
- inspects network packets
- determines how to process a packet based on the values of header fields and the payload

Applications
- Firewalls: identify the highest priority matching rule
- Intrusion Detection Systems: use unordered rules, identify all matching rules
- Network Monitoring: determine whether a packet satisfies any of the conditions

8

Objective

Promote sharing of tests
- Not restricted to equality tests
- We need to support inequalities, disequalities, and bit-masking operations

Flexibility to support diverse applications
- Ordered (firewalls) and unordered (intrusion detection) rule sets
- Packet-filtering (network monitoring)

9

Problem Formulation

Tests involve a variable x and one or two constants (denoted by c).
- Equality tests: x == c, e.g. tcp_sport == 80
- Equality tests with bitmasks: x & c1 == c, e.g. tcp_flags & 0x03 == 0x03
- Disequality tests: x != c, e.g. tcp_sport != 80
- Disequality tests with bitmasks: x & c1 != c, e.g. tcp_flags & 0x03 != 0x03
- Inequality tests: x <= c, e.g. tcp_dport <= 1024
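
The five kinds of tests can be captured by one small record type. The following is a sketch under the assumption that a packet is a dictionary of integer header fields; `FieldTest` and its field names are illustrative, not taken from the original implementation.

```python
from dataclasses import dataclass
from typing import Dict, Optional

@dataclass(frozen=True)
class FieldTest:
    var: str                      # the variable x, e.g. "tcp_sport"
    op: str                       # "==", "!=", or "<="
    const: int                    # the constant c
    mask: Optional[int] = None    # c1 in "x & c1 == c"; None means no bitmask

    def holds(self, packet: Dict[str, int]) -> bool:
        """Evaluate this test against the packet's value for `var`."""
        value = packet[self.var]
        if self.mask is not None:
            value &= self.mask
        if self.op == "==":
            return value == self.const
        if self.op == "!=":
            return value != self.const
        if self.op == "<=":
            return value <= self.const
        raise ValueError(f"unsupported operator: {self.op}")

# Sample packet (made-up values) and the example tests from the slide.
pkt = {"tcp_sport": 80, "tcp_dport": 443, "tcp_flags": 0x13}
eq = FieldTest("tcp_sport", "==", 80)
masked_eq = FieldTest("tcp_flags", "==", 0x03, mask=0x03)
neq = FieldTest("tcp_sport", "!=", 80)
masked_neq = FieldTest("tcp_flags", "!=", 0x03, mask=0x03)
le = FieldTest("tcp_dport", "<=", 1024)
```

Keeping the tests as data rather than as opaque predicates is what makes it possible to compare, share, and decompose them when building a tree.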

10

Rules and priorities

A rule R is a conjunction of tests, e.g.
(dport == 22) && (sport <= 1024) && (flags & 0xb == 0x3)

A set of rules may be partially ordered by a priority relation. The priority of R is denoted as Pri(R).

A rule R matches a packet p if:
- the packet satisfies R, i.e., R(p) is true
- the packet does not satisfy any rule that has higher priority than R
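
The matching definition above can be written out directly. This sketch assumes the partial order is given as a map from each rule to the set of rules with strictly higher priority; `CONDS`, `HIGHER`, rules R2 and R3, and `matching_rules` are hypothetical and exist only to exercise the definition.

```python
from typing import Callable, Dict, List, Set

Packet = Dict[str, int]

# R1 is the example rule from the slide; R2 and R3 are made up for illustration.
CONDS: Dict[str, Callable[[Packet], bool]] = {
    "R1": lambda p: p["dport"] == 22 and p["sport"] <= 1024 and (p["flags"] & 0xB) == 0x3,
    "R2": lambda p: p["dport"] == 22,
    "R3": lambda p: p["sport"] <= 1024,
}

# Partial order: R1 has higher priority than R2; R3 is unordered w.r.t. both.
HIGHER: Dict[str, Set[str]] = {"R2": {"R1"}}

def matching_rules(conds, higher, p: Packet) -> List[str]:
    """R matches p if R(p) is true and p satisfies no rule with higher priority than R."""
    satisfied = {name for name, cond in conds.items() if cond(p)}
    return sorted(
        name for name in satisfied
        if not (higher.get(name, set()) & satisfied)
    )
```

With an empty `HIGHER` map this returns all satisfied rules (the intrusion detection case); with a total order it returns only the highest priority rule (the firewall case).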

11

Decision Tree for Packet Classification

[Figure: decision tree for the rules below. The root, labeled {R1, R2, R3}, branches on icmp_type (icmp_type == ECHO, icmp_type == ECHO_REPLY, or icmp_type != ECHO && icmp_type != ECHO_REPLY). Each branch then tests ttl == 1 versus ttl != 1, and nodes are labeled with rule sets.]

R1: (icmp_type == ECHO)
R2: (icmp_type == ECHO_REPLY) && (ttl == 1)
R3: (ttl == 1)
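
A hand-built version of such a tree for R1, R2, and R3, as a minimal sketch. The leaf sets are derived from the three rules rather than read off the figure, the nested-dictionary layout and `classify` are illustrative, and the constants use the standard ICMP type numbers (8 for echo request, 0 for echo reply).

```python
ECHO, ECHO_REPLY = 8, 0   # standard ICMP type numbers

# Internal nodes test one field; leaves are lists of matching rules.
TREE = {
    "field": "icmp_type",
    "branches": {
        ECHO:       {"field": "ttl", "branches": {1: ["R1", "R3"]}, "else": ["R1"]},
        ECHO_REPLY: {"field": "ttl", "branches": {1: ["R2", "R3"]}, "else": []},
    },
    # icmp_type != ECHO && icmp_type != ECHO_REPLY
    "else":         {"field": "ttl", "branches": {1: ["R3"]}, "else": []},
}

def classify(node, packet):
    """Walk from the root to a leaf, testing each field at most once."""
    while isinstance(node, dict):
        node = node["branches"].get(packet[node["field"]], node["else"])
    return node
```

Each packet follows a single path whose length is bounded by the number of fields, independent of the number of rules.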

12

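Exponential Blowup

R1: x == 1
R2: x == 2
R3: x == 3
R4: x == 4
R5: y == 1
R6: y == 2
R7: y == 3
R8: y == 4

[Figure: decision tree that tests x at the root (edges 1, 2, 3, 4, else) and then y under each x branch (edges 1, 2, 3, 4, else). Leaves shown include {R1, R5}, {R1, R6}, {R2, R5}, {R2, R6}.]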

13

Decision Tree Construction

Decompose and reorder tests to increase sharing of tests among rules

R1: x == 5
R2: x & 0x03 != 1

[Figure: two alternative trees for R1 and R2, built from the tests x == 5 / x != 5 and x & 0x03 == 1 / x & 0x03 != 1, with nodes labeled {R1, R2}, {R1}, {R2}, and {}.]

14

Condition Factorization

- Decomposing rules into combinations of more primitive tests
- Similar to factorization of integers
- Based on the residue operation, analogous to integer division

Residue
- We want to determine if there is a match for a rule C1
- We have so far tested a condition C2
- A residue captures the additional tests that need to be performed at this point to verify C1

15

Residue Operation

The residue C1/C2 is another condition C3 such that:
1. C2 ∧ C3 ⇒ C1
2. C1 ∧ C2 ⇒ C3

Examples
- C1: x ∈ [1, 20], C2: x ∈ [15, 25], C3: x <= 20
- C1: x ∈ [1, 20], C2: x == 15, C3: true
- C1: x ∈ [1, 20], C2: x == 35, C3: false
- C1: x ∈ [1, 20], C2: y == 15, C3: x ∈ [1, 20]
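
The four examples can be reproduced with a small residue function restricted to interval conditions. This is a sketch under that restriction only: an equality x == c is encoded as the interval [c, c], an unbounded side is encoded as infinity, and `Interval` and `residue` are illustrative names rather than the original implementation.

```python
from typing import NamedTuple, Union

INF = float("inf")

class Interval(NamedTuple):
    var: str
    lo: float
    hi: float

def residue(c1: Interval, c2: Interval) -> Union[bool, Interval]:
    """Return C1/C2: True, False, or the interval test that is still required."""
    if c1.var != c2.var:
        return c1                          # C2 says nothing about C1's variable
    if c2.hi < c1.lo or c2.lo > c1.hi:
        return False                       # C1 and C2 cannot both hold
    need_lo = c1.lo > c2.lo                # C2 does not already imply x >= c1.lo
    need_hi = c1.hi < c2.hi                # C2 does not already imply x <= c1.hi
    if not need_lo and not need_hi:
        return True                        # C2 alone implies C1
    return Interval(c1.var,
                    c1.lo if need_lo else -INF,
                    c1.hi if need_hi else INF)

c1 = Interval("x", 1, 20)
r_overlap = residue(c1, Interval("x", 15, 25))    # x <= 20
r_inside = residue(c1, Interval("x", 15, 15))     # true
r_outside = residue(c1, Interval("x", 35, 35))    # false
r_other = residue(c1, Interval("y", 15, 15))      # x in [1, 20]
```

In the first example only the upper bound survives, because having already established x >= 15 makes the lower bound of C1 redundant.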

16

Computing Residue on Tests

17

Build Algorithm

- Recursive procedure
- Takes a node s as its first parameter
- Builds the sub-tree that is rooted at s
- It takes two other parameters:
  - Candidate Set (Cs): rules that haven’t completed a match, but future matches can’t be ruled out either.
  - Match Set (Ms): all rules for which a match can be announced at s.
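
A much simplified sketch of a recursive build with a candidate set and a match set. It assumes equality tests only, represents each rule as a map from variable to constant, returns the node instead of taking it as a parameter, and chooses the next variable alphabetically as a placeholder for the real test-selection strategy. `build`, `classify`, and the data layout are illustrative.

```python
def build(cs, ms):
    """Build the subtree for candidate set `cs` (rule -> remaining tests) and match set `ms`."""
    node = {"match": sorted(ms), "test": None, "edges": {}}
    if not cs:
        return node                                   # nothing left to test: leaf
    # Placeholder choice of test: first remaining variable in alphabetical order.
    var = sorted({v for rem in cs.values() for v in rem})[0]
    node["test"] = var
    values = sorted({rem[var] for rem in cs.values() if var in rem})
    for val in values + ["else"]:
        child_cs, child_ms = {}, set(ms)
        for name, rem in cs.items():
            if var in rem and rem[var] != val:
                continue                              # rule is ruled out on this edge
            rest = {v: c for v, c in rem.items() if v != var}   # residue w.r.t. the test
            if rest:
                child_cs[name] = rest                 # still a candidate
            else:
                child_ms.add(name)                    # match can be announced
        node["edges"][val] = build(child_cs, child_ms)
    return node

def classify(node, packet):
    """Follow one edge per tested variable and return the match set at the leaf."""
    while node["test"] is not None:
        val = packet[node["test"]]
        node = node["edges"].get(val, node["edges"]["else"])
    return node["match"]

RULES = {"R1": {"x": 1, "y": 1}, "R2": {"x": 2, "y": 2}, "R3": {"y": 3}}
tree = build(RULES, set())
```

A rule that does not test the chosen variable is copied into every child's candidate set, which is the source of the duplication addressed by the techniques that follow.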

18

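Minimize Duplication

R1: x == 1 && y == 1
R2: x == 2 && y == 2
R3: y == 3

[Figure: tree that tests x first (edges 1, 2, else) and then y under every branch. The test y == 3 for R3 is repeated in all three sub-trees, giving eight leaves: {R1}, {R3}, {} under x == 1; {R2}, {R3}, {} under x == 2; {R3}, {} under else.]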

19

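Minimize Duplication

R1: x == 1 && y == 1
R2: x == 2 && y == 2
R3: y == 3

[Figure: tree that tests y first (edges 1, 2, 3, else). Under y == 1, x is tested (1: {R1}, else: {}); under y == 2, x is tested (2: {R2}, else: {}); y == 3 leads directly to {R3}; else leads to {}. Six leaves in total.]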
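
One plausible way to score candidate tests for duplication, offered as an assumption rather than the measure actually used: a rule that does not test a variable must be copied onto every outgoing edge of a node that tests it. The sketch below counts those extra copies for equality-only rules; `duplication` and `pick_test` are hypothetical helpers.

```python
RULES = {"R1": {"x": 1, "y": 1}, "R2": {"x": 2, "y": 2}, "R3": {"y": 3}}

def duplication(rules, var):
    """Extra rule copies created by branching on `var` at the root."""
    values = {tests[var] for tests in rules.values() if var in tests}
    edges = len(values) + 1                 # one edge per constant, plus "else"
    untested = sum(1 for tests in rules.values() if var not in tests)
    return untested * (edges - 1)           # each untested rule appears on every edge

def pick_test(rules):
    """Choose the variable whose test duplicates the fewest rules."""
    variables = sorted({v for tests in rules.values() for v in tests})
    return min(variables, key=lambda v: duplication(rules, v))
```

For the three rules above, branching on x copies R3 onto all three edges, while branching on y copies nothing, which agrees with the smaller of the two trees.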

20

Benign Non-determinism

- Two rules R1 and R2 are said to be independent of each other if they do not have a common test
- Build separate trees for each independent set
- Match packets against each tree: non-determinism without incurring any performance penalties
- If R1 and R2 are independent, a packet may match R1, R2, both, or neither
  - Number of nodes of the tree for R1 is k1, for R2 is k2
  - Number of states of the tree for R1 ∪ R2 is k1 * k2
  - Combined number of nodes of the independent trees for R1 and R2 is k1 + k2
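
A sketch of how independent sets might be computed, assuming that "a common test" means a test on the same variable and that independence is extended transitively (connected components). The rule set is the eight-rule x/y example from the Exponential Blowup slide; `independent_sets` and the leaf-count arithmetic are illustrative.

```python
from math import prod

RULES = {
    "R1": {"x": 1}, "R2": {"x": 2}, "R3": {"x": 3}, "R4": {"x": 4},
    "R5": {"y": 1}, "R6": {"y": 2}, "R7": {"y": 3}, "R8": {"y": 4},
}

def independent_sets(rules):
    """Group rules into connected components of the 'tests a common variable' relation."""
    groups = []                                  # list of (variables, rule names)
    for name, tests in rules.items():
        merged_vars, merged_names = set(tests), [name]
        untouched = []
        for gvars, gnames in groups:
            if gvars & merged_vars:              # shares a variable: merge the groups
                merged_vars |= gvars
                merged_names = gnames + merged_names
            else:
                untouched.append((gvars, gnames))
        groups = untouched + [(merged_vars, merged_names)]
    return sorted(sorted(names) for _, names in groups)

groups = independent_sets(RULES)

# Leaves per tree: one per constant plus "else". This shortcut is valid only
# because each group here tests a single variable.
sizes = [len({c for name in g for c in RULES[name].values()}) + 1 for g in groups]
combined_leaves = prod(sizes)    # one tree over both variables: k1 * k2
separate_leaves = sum(sizes)     # two independent trees: k1 + k2
```

Here the single combined tree needs 25 leaves while the two independent trees need 10 between them, and the gap widens multiplicatively with every additional independent set.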

21

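Exponential Blowup

R1: x == 1
R2: x == 2
R3: x == 3
R4: x == 4
R5: y == 1
R6: y == 2
R7: y == 3
R8: y == 4

[Figure: the combined tree that tests x and then y under every x branch (leaves shown include {R1, R5}, {R1, R6}, {R2, R5}, {R2, R6}), contrasted with two independent trees: one testing only x (leaves {R1}, {R2}, ...) and one testing only y (leaves {R5}, {R6}, ...).]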

22

Ensuring Polynomial Bounds

- Breadth of a tree is a function of the breadth of its sub-trees
- Select a polynomial bound to satisfy at each node
- Pick tests that satisfy the bound
- Otherwise, pick a test that comes closest to satisfying this constraint and make some outgoing edges nondeterministic

23

Improving Matching Time

Utility: how much a test goes towards checking a rule
- Based on the notion of assigning costs to tests and rules
- Compare the cost of a rule with the combined cost of a test and the residue of the rule w.r.t. the test

Select strategy (size reduction is more important than matching time):
1. Pick a discriminating test when available
   - Pick the test with higher utility
2. Examine opportunities for benign non-determinism
3. Pick tests that satisfy the polynomial bound

24

Tree Size

[Figure: no. of nodes (0 to 70000) versus no. of rules (0 to 300). Series: Condition Factorization, Snort NG.]

25

Matching Time

[Figure: matching time (per packet) in ns (0 to 90) versus no. of rules (0 to 300). Series: Condition Factorization, Snort NG, Snort 2.]

26

Summary

- Developed a new technique for fast packet classification
  - Flexible: supports diverse applications in a uniform framework
  - Promotes sharing of tests
- Developed novel techniques for generating packet classification trees that
  - have polynomial size
  - have virtually constant matching time
- Demonstrated the gains from our technique for intrusion detection systems and firewalls
