Efficient Rule Matching for Large Scale Systems. Packet Classification – A Case Study. Alok Tongaonkar, Stony Brook University

Upload: juniper-little

Post on 13-Jan-2016



Efficient Rule Matching for Large Scale Systems

Packet Classification – A Case Study

Alok Tongaonkar

Stony Brook University


Rule Based Systems

- Applications in security: intrusion detection systems, firewalls, access control systems
- Policy specified in terms of a database of rules
- Enforcement involves identifying the applicable rule(s)


Fundamental Operation

- Given an input p with attributes {p1, p2, ..., pk}, identify the rules Ri from {R1, R2, ..., Rn} that match p
- Ri: condition -> action
- e.g. R1: dhost == PLUTO && dport == HTTP && content: “Bad command” -> DENY
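The fundamental operation can be sketched as a scan over condition -> action pairs. This is only a minimal illustration of the problem statement, not the technique the talk develops; the `Rule` class, the dictionary packet model, and the string constants standing in for PLUTO and HTTP are assumptions made for the example.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

Packet = Dict[str, object]  # hypothetical packet model: attribute name -> value


@dataclass
class Rule:
    name: str
    condition: Callable[[Packet], bool]  # the "condition" side of Ri
    action: str                          # the "action" side of Ri


def matching_rules(rules: List[Rule], p: Packet) -> List[Rule]:
    """Return every rule whose condition holds for packet p."""
    return [r for r in rules if r.condition(p)]


# R1 from the slide; "PLUTO" and "HTTP" are placeholder constants.
R1 = Rule(
    name="R1",
    condition=lambda p: p.get("dhost") == "PLUTO"
    and p.get("dport") == "HTTP"
    and "Bad command" in str(p.get("content", "")),
    action="DENY",
)

packet = {"dhost": "PLUTO", "dport": "HTTP", "content": "xx Bad command xx"}
actions = [r.action for r in matching_rules([R1], packet)]
```

Every rule is evaluated for every packet here, which is the cost the rest of the talk tries to avoid.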

Challenge

Rule matching algorithms do not scale well – either in space or in time


Matching Algorithms

n – no. of rules, k – no. of attributes

- Linear Search
  - Match one rule at a time
  - Space efficient – O(n*k)
  - Matching time grows linearly with the number of rules – O(n)
- Table-based Search
  - Columns correspond to attributes
  - Rows correspond to rules
  - Wastes space when many rules specify “*” for many attributes – O(n*k)
  - Efficient matching in hardware/multiprocessor – match different attributes in parallel and combine results
  - In a uniprocessor environment, matching time is O(n)
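A table-based search can be sketched as follows, assuming a small hypothetical rule table. Each column is looked up independently, which is the part hardware can run in parallel, and the per-column row sets are then intersected. The table contents and function names are illustrative only.

```python
from typing import Dict, List, Set

WILDCARD = "*"

# Rows are rules, columns are attributes (hypothetical three-rule table).
ATTRIBUTES = ["dhost", "dport", "proto"]
TABLE: List[List[str]] = [
    ["PLUTO", "80", "tcp"],  # rule 0
    ["*", "80", "*"],        # rule 1
    ["MARS", "*", "udp"],    # rule 2
]


def column_matches(col: int, value: str) -> Set[int]:
    """Rows whose entry in this column equals the value or is a wildcard."""
    return {row for row, entries in enumerate(TABLE)
            if entries[col] in (WILDCARD, value)}


def table_search(packet: Dict[str, str]) -> Set[int]:
    """Look up each attribute independently, then intersect the row sets."""
    result = set(range(len(TABLE)))
    for col, attr in enumerate(ATTRIBUTES):
        result &= column_matches(col, packet[attr])
    return result


hits = table_search({"dhost": "PLUTO", "dport": "80", "proto": "tcp"})
```

On a single processor each column scan still visits all n rows, which is why the matching time stays linear in the number of rules.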


Matching Algorithms contd.

- Decision Tree (Trie-like structure)
  - Each node corresponds to a test on an attribute
  - Matching time – O(k)
    - No. of attributes is an order of magnitude smaller than no. of rules
  - Size – can be exponential in n
  - Minimization of a decision tree is an NP-complete problem!

Goal

Develop efficient techniques for rule matching that scale to support thousands of rules


Outline

- Problem Formulation
- Techniques
  - Minimize duplication
  - Benign non-determinism
  - Polynomial bound
  - Utility
- Results


Packet Classification

A mechanism that
- inspects network packets
- determines how to process a packet based on the values of header fields and the payload

Applications
- Firewalls – identify the highest priority matching rule
- Intrusion Detection Systems
  - Use unordered rules
  - Identify all matching rules
- Network Monitoring – determine whether a packet satisfies any of the conditions


Objective

- Promote sharing of tests
  - not restricted to equality tests
  - we need to support inequalities, disequalities, and bit-masking operations
- Flexibility to support diverse applications
  - Ordered (firewalls) and unordered (intrusion detection) rule sets
  - Packet-filtering (network monitoring)


Problem Formulation

Tests involve a variable x and one or two constants (denoted by c).

- Equality tests: x == c
  - tcp_sport == 80
- Equality tests with bitmasks: x & c1 == c
  - tcp_flags & 0x03 == 0x03
- Disequality tests: x != c
  - tcp_sport != 80
- Disequality tests with bitmasks: x & c1 != c
  - tcp_flags & 0x03 != 0x03
- Inequality tests: x <= c
  - tcp_dport <= 1024
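The five test forms can be captured in one small data type. The sketch below is an assumed encoding, not the talk's own representation: `FieldTest` and its fields are hypothetical names, and a packet is modelled as a dictionary of integer header fields.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass(frozen=True)
class FieldTest:
    var: str                    # packet field, e.g. "tcp_sport"
    op: str                     # one of "==", "!=", "<="
    const: int                  # the constant c
    mask: Optional[int] = None  # the constant c1, when a bitmask is applied

    def holds(self, packet: Dict[str, int]) -> bool:
        value = packet[self.var]
        if self.mask is not None:
            value &= self.mask
        if self.op == "==":
            return value == self.const
        if self.op == "!=":
            return value != self.const
        if self.op == "<=":
            return value <= self.const
        raise ValueError(f"unsupported operator: {self.op}")


packet = {"tcp_sport": 80, "tcp_dport": 8080, "tcp_flags": 0x13}

# The five example tests from the slide, in the same order.
results = [
    FieldTest("tcp_sport", "==", 80).holds(packet),
    FieldTest("tcp_flags", "==", 0x03, mask=0x03).holds(packet),
    FieldTest("tcp_sport", "!=", 80).holds(packet),
    FieldTest("tcp_flags", "!=", 0x03, mask=0x03).holds(packet),
    FieldTest("tcp_dport", "<=", 1024).holds(packet),
]
```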


Rules and priorities

- A rule R is a conjunction of tests
  - (dport == 22) && (sport <= 1024) && (flags & 0xb == 0x3)
- A set of rules may be partially ordered by a priority relation
  - The priority of R is denoted as Pri(R).
- A rule R matches a packet p if:
  - the packet satisfies R, i.e., R(p) is true
  - the packet does not satisfy any rule that has higher priority than R
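The matching definition above can be written out directly. In this sketch the rule set, the packet fields, and the `HIGHER` table encoding the partial order are all assumptions for illustration; `HIGHER[r]` is taken to list every rule with strictly higher priority than `r` (already transitively closed).

```python
from typing import Callable, Dict, List, Set

Packet = Dict[str, int]
Condition = Callable[[Packet], bool]

# Hypothetical rule set; each condition is a conjunction of tests.
RULES: Dict[str, Condition] = {
    "R1": lambda p: p["dport"] == 22 and p["sport"] <= 1024 and p["flags"] & 0xB == 0x3,
    "R2": lambda p: p["dport"] == 22,
    "R3": lambda p: p["sport"] <= 1024,
}

# Partial order: R2 and R3 are incomparable with each other, both are below R1.
HIGHER: Dict[str, Set[str]] = {"R1": set(), "R2": {"R1"}, "R3": {"R1"}}


def matches(name: str, p: Packet) -> bool:
    """R matches p if R(p) holds and no higher-priority rule is satisfied."""
    if not RULES[name](p):
        return False
    return not any(RULES[h](p) for h in HIGHER[name])


def matching_rules(p: Packet) -> List[str]:
    return [name for name in RULES if matches(name, p)]


pkt_a = {"dport": 22, "sport": 1000, "flags": 0x3}  # satisfies R1, R2 and R3
pkt_b = {"dport": 22, "sport": 1000, "flags": 0x0}  # satisfies R2 and R3 only
```

With an empty `HIGHER` table every satisfied rule matches, which corresponds to the unordered rule sets used for intrusion detection.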


Decision Tree for Packet Classification

[Figure: decision tree for rules R1–R3. The root, labelled with the candidate set {R1, R2, R3}, has edges for icmp_type == ECHO, icmp_type == ECHO_REPLY, and (icmp_type != ECHO && icmp_type != ECHO_REPLY); each edge is followed by a test of ttl == 1 versus ttl != 1. Node labels in the figure: {R1, R3}, {R2, R3}, {R3}, {R1}, {}.]

R1: (icmp_type == ECHO)
R2: (icmp_type == ECHO_REPLY) && (ttl == 1)
R3: (ttl == 1)
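A decision tree for R1–R3 can be written by hand as nested tests. The sketch below assumes the tree tests icmp_type first and ttl second, and it is derived from the three rule definitions rather than copied from the figure. The function names are illustrative; the ICMP type numbers 8 and 0 are the standard values for echo request and echo reply.

```python
from typing import Dict, FrozenSet

ECHO, ECHO_REPLY = 8, 0  # ICMP type numbers for echo request and echo reply

Packet = Dict[str, int]


def classify_tree(p: Packet) -> FrozenSet[str]:
    """Walk a hand-written decision tree: icmp_type first, then ttl."""
    if p["icmp_type"] == ECHO:
        return frozenset({"R1", "R3"}) if p["ttl"] == 1 else frozenset({"R1"})
    if p["icmp_type"] == ECHO_REPLY:
        return frozenset({"R2", "R3"}) if p["ttl"] == 1 else frozenset()
    # icmp_type != ECHO && icmp_type != ECHO_REPLY
    return frozenset({"R3"}) if p["ttl"] == 1 else frozenset()


def classify_linear(p: Packet) -> FrozenSet[str]:
    """Reference result: evaluate R1, R2, R3 one at a time."""
    matched = set()
    if p["icmp_type"] == ECHO:
        matched.add("R1")
    if p["icmp_type"] == ECHO_REPLY and p["ttl"] == 1:
        matched.add("R2")
    if p["ttl"] == 1:
        matched.add("R3")
    return frozenset(matched)


packets = [{"icmp_type": t, "ttl": ttl}
           for t in (ECHO, ECHO_REPLY, 3) for ttl in (1, 64)]
agree = all(classify_tree(p) == classify_linear(p) for p in packets)
```

The tree inspects each field at most once per packet, while the linear version re-tests icmp_type and ttl for every rule.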


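Exponential Blowup

R1: x == 1    R2: x == 2    R3: x == 3    R4: x == 4
R5: y == 1    R6: y == 2    R7: y == 3    R8: y == 4

[Figure: decision tree that tests x (edges 1, 2, 3, 4, else) and then y (edges 1, 2, 3, 4, else) under the x edges; the leaves shown include {R1, R5}, {R1, R6}, {R2, R5}, {R2, R6}.]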


Decision Tree Construction

Decompose and reorder tests to increase sharing of tests among rules

R1: x == 5
R2: x & 0x03 != 1

[Figure: decision tree whose root tests x & 0x03 == 1 versus x & 0x03 != 1, followed by x == 5 versus x != 5. Node labels in the figure: {R1, R2}, {R1}, {R2}, {}.]


Condition Factorization

- Decomposing rules into a combination of more primitive tests
- Similar to factorization of integers
- Based on the residue operation – analogous to integer division

Residue

- We want to determine if there is a match for a rule C1
- We have so far tested a condition C2
- A residue captures the additional tests that need to be performed at this point to verify C1


Residue Operation

The residue C1/C2 is another condition C3 such that:
1. C2 ∧ C3 ⇒ C1
2. C1 ∧ C2 ⇒ C3

Examples
- C1: x ∈ [1, 20], C2: x ∈ [15, 25], C3: x <= 20
- C1: x ∈ [1, 20], C2: x == 15, C3: true
- C1: x ∈ [1, 20], C2: x == 35, C3: false
- C1: x ∈ [1, 20], C2: y == 15, C3: x ∈ [1, 20]
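For conditions that are intervals over a single variable, a residue satisfying both properties can be computed by keeping only the bounds of C1 that C2 does not already imply. This is a sketch for that restricted case under an assumed tuple encoding; it is not the talk's general residue procedure, which also covers bitmask and disequality tests.

```python
from typing import Optional, Tuple, Union

# A condition is (variable, low, high), meaning low <= variable <= high.
# None means "no bound on that side"; an equality x == c is (x, c, c).
Interval = Tuple[str, Optional[int], Optional[int]]
Residue = Union[bool, Interval]

NEG_INF, POS_INF = float("-inf"), float("inf")


def _low(bound: Optional[int]) -> float:
    return NEG_INF if bound is None else bound


def _high(bound: Optional[int]) -> float:
    return POS_INF if bound is None else bound


def residue(c1: Interval, c2: Interval) -> Residue:
    """Return C3 = C1/C2 for single-variable interval conditions."""
    var1, lo1, hi1 = c1
    var2, lo2, hi2 = c2
    if var1 != var2:
        return c1  # C2 says nothing about C1's variable
    if max(_low(lo1), _low(lo2)) > min(_high(hi1), _high(hi2)):
        return False  # C1 and C2 cannot both hold
    # Keep a bound of C1 only if C2 does not already guarantee it.
    keep_lo = lo1 if _low(lo1) > _low(lo2) else None
    keep_hi = hi1 if _high(hi1) < _high(hi2) else None
    if keep_lo is None and keep_hi is None:
        return True  # C2 alone implies C1
    return (var1, keep_lo, keep_hi)


c1 = ("x", 1, 20)
examples = [
    residue(c1, ("x", 15, 25)),  # x <= 20
    residue(c1, ("x", 15, 15)),  # true
    residue(c1, ("x", 35, 35)),  # false
    residue(c1, ("y", 15, 15)),  # x in [1, 20]
]
```

The four calls reproduce the four examples listed on the slide.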


Computing Residue on Tests


Build Algorithm

- Recursive procedure
- Takes a node s as its first parameter
- Builds the sub-tree that is rooted at s
- It takes two other parameters
  - Candidate Set (Cs) – rules that haven’t completed a match, but future matches can’t be ruled out either.
  - Match Set (Ms) – all rules for which a match can be announced at s.
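The recursion can be sketched for a simplified setting in which every rule is a conjunction of equality tests. Several things here are assumptions rather than the talk's algorithm: `build` returns the node instead of receiving it as a parameter, test selection is a naive "most frequently used variable" heuristic, and the residue of a rule is computed by simply dropping the test that was just performed.

```python
from dataclasses import dataclass, field
from typing import Dict, FrozenSet, Optional

Tests = Dict[str, int]  # remaining equality tests of a rule: variable -> constant


@dataclass
class Node:
    matched: FrozenSet[str]                 # Ms: matches announced at this node
    var: Optional[str] = None               # variable tested here (None for a leaf)
    edges: Dict[int, "Node"] = field(default_factory=dict)  # one edge per constant
    other: Optional["Node"] = None          # the "else" edge


def build(cs: Dict[str, Tests], ms: FrozenSet[str]) -> Node:
    """Build the sub-tree for candidate set cs and match set ms."""
    # Rules with no remaining tests have completed their match.
    ms = ms | {name for name, tests in cs.items() if not tests}
    cs = {name: tests for name, tests in cs.items() if tests}
    if not cs:
        return Node(matched=ms)

    # Naive test selection: the variable used by the most candidate rules.
    counts: Dict[str, int] = {}
    for tests in cs.values():
        for v in tests:
            counts[v] = counts.get(v, 0) + 1
    var = max(sorted(counts), key=lambda v: counts[v])

    node = Node(matched=ms, var=var)
    for value in sorted({t[var] for t in cs.values() if var in t}):
        # Keep rules that do not test var, or whose test on var is satisfied.
        branch = {n: {k: c for k, c in t.items() if k != var}
                  for n, t in cs.items() if t.get(var, value) == value}
        node.edges[value] = build(branch, ms)
    # On the else edge, every rule that tests var is ruled out.
    node.other = build({n: t for n, t in cs.items() if var not in t}, ms)
    return node


def classify(node: Node, packet: Dict[str, int]) -> FrozenSet[str]:
    """Follow one edge per tested variable until a leaf is reached."""
    while node.var is not None:
        node = node.edges.get(packet[node.var], node.other)
    return node.matched


rules = {"R1": {"x": 1, "y": 1}, "R2": {"x": 2, "y": 2}, "R3": {"y": 3}}
root = build(rules, frozenset())
```

Calling `classify(root, {"x": 1, "y": 1})` walks the y edge and then the x edge before announcing R1.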


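Minimize Duplication

R1: x == 1 && y == 1
R2: x == 2 && y == 2
R3: y == 3

[Figure: decision tree that tests x first (edges 1, 2, else) and then y under every edge. Its eight leaves are {R1}, {R3}, {}, {}, {R3}, {R2}, {}, {R3}, so {R3} appears three times.]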


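Minimize Duplication

R1: x == 1 && y == 1
R2: x == 2 && y == 2
R3: y == 3

[Figure: decision tree for the same rules that tests y first (edges 1, 2, 3, else) and tests x only under y == 1 and y == 2. Its six leaves are {R3}, {R1}, {}, {}, {}, {R2}, so {R3} appears once.]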
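The effect of test order on duplication can be checked by counting leaves for both orders. The sketch below handles only equality tests and takes the variable order as an input; `count_leaves` is an illustrative helper, not part of the talk's algorithm.

```python
from typing import Dict, List

Tests = Dict[str, int]  # equality tests of a rule: variable -> constant

RULES: Dict[str, Tests] = {
    "R1": {"x": 1, "y": 1},
    "R2": {"x": 2, "y": 2},
    "R3": {"y": 3},
}


def count_leaves(cands: Dict[str, Tests], order: List[str]) -> int:
    """Count leaves of the tree that tests variables in the given order."""
    pending = {n: t for n, t in cands.items() if t}  # rules with tests left
    var = next((v for v in order if any(v in t for t in pending.values())), None)
    if var is None:
        return 1  # nothing left to test: this node is a leaf

    total = 0
    for value in {t[var] for t in pending.values() if var in t}:
        # Rules that do not test var stay candidates on every edge.
        branch = {n: {k: c for k, c in t.items() if k != var}
                  for n, t in pending.items() if t.get(var, value) == value}
        total += count_leaves(branch, order)
    # else edge: only rules that do not test var survive
    total += count_leaves({n: t for n, t in pending.items() if var not in t}, order)
    return total


x_first = count_leaves(RULES, ["x", "y"])
y_first = count_leaves(RULES, ["y", "x"])
```

Testing x first carries R3 into every sub-tree, while testing y first resolves R3 once, which is why the second tree has fewer leaves.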


Benign Non-determinism

- Two rules R1 and R2 are said to be independent of each other if they do not have a common test
- Build separate trees for each independent set
- Match packets against each tree – non-determinism without incurring any performance penalties
- If R1 and R2 are independent, a packet may match R1, R2, both, or neither.
- Number of nodes of the tree for R1 is k1, for R2 is k2.
- Number of states of the tree for R1 ∪ R2 is k1 * k2.
- Combined number of nodes of the independent trees for R1 and R2 is k1 + k2.
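Grouping rules into independent sets can be sketched as follows, using the eight rules R1..R8 over x and y from the Exponential Blowup slide. Two assumptions are made here: "no common test" is approximated as "no common variable", and k1 and k2 count leaves (four constants plus one else edge) rather than all nodes.

```python
from typing import Dict, List, Set

# Variables tested by each rule: R1..R4 test x, R5..R8 test y.
RULE_VARS: Dict[str, Set[str]] = {f"R{i}": {"x"} for i in range(1, 5)}
RULE_VARS.update({f"R{i}": {"y"} for i in range(5, 9)})


def independent_sets(rule_vars: Dict[str, Set[str]]) -> List[Set[str]]:
    """Group rules so that rules in different groups share no variable."""
    groups: List[Set[str]] = []      # rule names per group
    group_vars: List[Set[str]] = []  # variables seen per group
    for name, vars_ in rule_vars.items():
        hits = [i for i, gv in enumerate(group_vars) if gv & vars_]
        merged_rules, merged_vars = {name}, set(vars_)
        for i in reversed(hits):     # merge every group this rule touches
            merged_rules |= groups.pop(i)
            merged_vars |= group_vars.pop(i)
        groups.append(merged_rules)
        group_vars.append(merged_vars)
    return groups


groups = independent_sets(RULE_VARS)

# With 4 constants plus an "else" edge, each single-variable tree has 5 leaves.
k1 = k2 = 5
combined_leaves = k1 * k2  # one tree over both x and y
separate_leaves = k1 + k2  # one tree per independent set
```

A packet is then matched against each tree in turn, and the results are combined.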


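Exponential Blowup

R1: x == 1    R2: x == 2    R3: x == 3    R4: x == 4
R5: y == 1    R6: y == 2    R7: y == 3    R8: y == 4

[Figure: the combined tree that tests x and then y, with leaves such as {R1, R5}, {R1, R6}, {R2, R5}, {R2, R6}, shown next to two separate trees: one that tests only x (leaves {R1}, {R2}, ...) and one that tests only y (leaves {R5}, {R6}, ...).]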


Ensuring Polynomial Bounds

- Breadth of the tree is a function of the breadth of its sub-trees
- Select a polynomial bound to satisfy at each node
- Pick tests that satisfy the bound
- Otherwise, pick a test that comes closest to satisfying this constraint and make some outgoing edges nondeterministic


Improving Matching Time

- Utility – how much a test goes towards checking a rule
  - based on the notion of assigning costs to tests and rules
  - compare the cost of a rule with the combined cost of a test and the residue of the rule w.r.t. the test
- Select strategy
  - Size reduction is more important than matching time
  1. Pick a discriminating test when available
     - Pick the test with higher utility
  2. Examine opportunities for benign non-determinism
  3. Pick tests that satisfy the polynomial bound


Tree Size

[Figure: no. of nodes (axis 0 to 70000) versus no. of rules (axis 0 to 300) for Condition Factorization and Snort NG.]


Matching Time

[Figure: matching time per packet in ns (axis 0 to 90) versus no. of rules (axis 0 to 300) for Condition Factorization, Snort NG, and Snort 2.]


Summary

- Developed a new technique for fast packet classification
  - Flexible – supports diverse applications in a uniform framework
  - Promotes sharing of tests
- Developed novel techniques for generating packet classification trees that
  - have polynomial size
  - have virtually constant matching time
- Demonstrated the gains from our technique for intrusion detection systems and firewalls